virID, a novel microbial sequence identification algorithm, identifies no known or novel viral sequences associated with BPDCN. (A) Schematic of the virID computational pipeline. virID can be run with assembly (left) or without assembly (right). In the assembly-based approach, SPAdes is used to conduct de novo assembly of GATK-PathSeq unassigned reads to generate contigs. Contigs are then subjected to nucleotide (MegaBLAST) and translated amino acid (DIAMOND) searches against reference databases. The number of reads supporting each contig is determined by mapping reads back to contigs with the BWA-MEM aligner. Results are then integrated to report the abundance of microorganisms in the input reads. In the read-based approach, reads are directly subjected to BLAST and DIAMOND searches. (B) Taxonomic representation of assembly-based assignment when applied to 4 virus-positive and 2 virus-negative Merkel cell carcinomas. MCPyV was excluded from the virID reference databases. The top 15 genera by mean abundance are displayed. The tree is taxonomic and does not incorporate phylogenetic distances. Units are reads assigned to each genus per 1 million human reads. (C) Taxonomic representation of read-based assignment of GATK-PathSeq unmapped reads. The top 15 genera by mean abundance are displayed. Units are reads assigned to each genus per 1 million human reads. (D, left) Schematic of the kmer-enrichment approach. First, all 21mers were identified in the unassigned reads from BPDCN skin and control skin samples using jellyfish (ref. 22). Next, 21mers present in at least 2 BPDCN samples were kept, and any 21mer present in any control sample was removed. Reads containing any of the remaining 21mers were subjected to a BLASTN homology search. (D, right) Results from the BLASTN search of kmer-enriched reads. The top 15 genera by mean abundance are displayed. Units are reads assigned to each genus per 1 million human reads.
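The kmer-enrichment filter described for panel D can be illustrated with a minimal Python sketch. It is not the authors' implementation: the study counted 21mers with jellyfish, whereas this sketch uses in-memory Python sets, considers the forward strand only (whether canonical kmers were used is not stated in the caption), and all function names (`kmers`, `sample_kmers`, `enriched_kmers`, `select_reads`) are hypothetical. The `min_cases=2` default mirrors the "at least 2 BPDCN samples" rule.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def kmers(read: str, k: int = 21) -> Set[str]:
    """Distinct k-mers in one read (forward strand only; an assumption)."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def sample_kmers(reads: Iterable[str], k: int = 21) -> Set[str]:
    """Union of the k-mers found in all reads of one sample."""
    found: Set[str] = set()
    for read in reads:
        found |= kmers(read, k)
    return found


def enriched_kmers(
    cases: Dict[str, List[str]],
    controls: Dict[str, List[str]],
    k: int = 21,
    min_cases: int = 2,
) -> Set[str]:
    """Keep k-mers seen in >= min_cases case samples and in no control sample."""
    # Count the number of case samples (not reads) in which each k-mer occurs.
    presence: Counter = Counter()
    for reads in cases.values():
        presence.update(sample_kmers(reads, k))
    kept = {kmer for kmer, n in presence.items() if n >= min_cases}
    # Remove every k-mer that appears in any control sample.
    for reads in controls.values():
        kept -= sample_kmers(reads, k)
    return kept


def select_reads(reads: Iterable[str], kept: Set[str], k: int = 21) -> List[str]:
    """Reads that contain at least one enriched k-mer (candidates for BLASTN)."""
    return [read for read in reads if kmers(read, k) & kept]
```

With toy inputs and `k=3`, a kmer shared by two case samples and absent from the control is retained, and only reads containing it would be passed on to the homology search. For real sequencing data a dedicated kmer counter is the practical choice, since holding every 21mer of every sample in a Python set would not scale.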