Combining cellular barcoding with multiplex deep sequencing: setup and method validation. (A) Individually barcoded LSK48−150+ cells were monoclonally expanded in liquid culture. (B) Different numbers of cells from expanded barcode cultures were mixed to generate samples with different barcode ratios and different total cell numbers. After isolation of genomic DNA, individual samples were amplified with primers bearing multiplex tags. Pooled PCR products were analyzed on an Illumina HiSeq 2000 (Illumina, Inc., San Diego, CA). The steps of data processing and noise filtering are outlined. Dmin refers to the minimal distance (number of nucleotide differences) between two barcodes. (C) Number of unique sequencing reads (sequences that differ at any position from all other reads) remaining after removal of noise, calculated from calibration samples. (D) Calculated read frequencies (proportions of the total number of reads in a multiplexed sample) attributed to true barcodes and to various sources of noise. (E) Distribution of barcode frequencies in calibration samples with different cell content (1000 to 500 000 cells). The original ratio of mixed barcodes was 1:1:1:1:1:1:1:2 (top barcode, green). Each color represents a distinct barcode. (F) Barcode analysis in a sample with highly unequal barcode composition (1:5:10:10:25:25:50:55). Barcodes comprising 0.55% of the total mix (gray) could be quantitatively detected.
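
The expected barcode frequencies implied by the mixing ratios in panels E and F, and the notion of Dmin, can be checked with a short calculation. The Python sketch below is illustrative only and is not the authors' pipeline: the function names are hypothetical, the toy barcode sequences are made up, and Dmin is assumed here to be the smallest pairwise Hamming distance between equal-length barcodes.

```python
from itertools import combinations


def expected_fractions(ratios):
    """Convert a mixing ratio (e.g. 1:5:10) into expected barcode fractions."""
    total = sum(ratios)
    return [r / total for r in ratios]


def hamming(a, b):
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def d_min(barcodes):
    """Smallest pairwise Hamming distance in a set of barcodes (assumed meaning of Dmin)."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


# Panel E: 1:1:1:1:1:1:1:2, the doubled barcode should make up 2/9 of the mix.
equal_mix = expected_fractions([1, 1, 1, 1, 1, 1, 1, 2])
print(f"{equal_mix[-1]:.1%}")   # 22.2%

# Panel F: 1:5:10:10:25:25:50:55, the rarest barcode is 1 part in 181.
skewed_mix = expected_fractions([1, 5, 10, 10, 25, 25, 50, 55])
print(f"{skewed_mix[0]:.2%}")   # 0.55%

# Made-up sequences, used only to illustrate the distance calculation.
toy_barcodes = ["ACGTAC", "ACGTTC", "TCGAAC"]
print(d_min(toy_barcodes))      # 1
```

The 0.55% value quoted for panel F follows directly from the ratio: the parts sum to 181, so the least abundant barcode is expected at 1/181 of the mix.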