Figure 2.
trackSeq improves CFD gene detection by removing scRNA-Seq confounders. (A) HSC gene expression landscape (density). Log2(counts+1) used throughout study. (B) trackSeq stratification steps that progressively filter out confounders unrelated to CFD. NOstrat compares gene expression across all cells, TSDstrat across cells with similar lifetimes (removes CC variance), and KINstrat across sisters (identical TSD, removes cross-clonal and CC variance). trackSeq adds fate directionality information. The different stratifications identify drastically different candidates. Dotted line: boundary of NOstrat detection.23 (C) TSDstrat removes CC effects but still detects many more NOstrat candidates than expected by chance (P = 4.27 × 10⁻¹⁷¹, hypergeometric test). (D) TSDstrat candidates have higher cross-clonal than intra-clonal variance and are thus unlikely CFD regulators. Unrelated: 203 randomly paired cells. Paired Wilcoxon rank sum test. Error bars: Tukey throughout study. (E) KINstrat selects candidates with larger intra-clonal variance than expected by chance. Sister expression differences for representative genes with Kinship stratification ranks. “Real”: absolute sister differences of measured log2(counts+1) expression. “Random”: total raw counts for each gene randomly re-distributed between sisters (see Methods). (F) KINstrat candidates have increased intra-clonal variance. Paired Wilcoxon rank sum test. (G) KINstrat reveals different candidates. (H) Information from trackSeq on sister fate directionality (here lysosome inheritance) improves detection of subtle but directed differences by accumulating directed differences but averaging out uncoordinated differences. (I) Paired trackSeq candidate expression for ACD daughters with lysosome ratio >1.6×, n = 59 pairs. To simulate sister analysis without information on fate directionality, random fates were assigned. Paired t test.
(J) Eighty-nine percent of the top 500 trackSeq candidates are missed by other stratification methods. Single-cell resolution is required for differential candidate identification. (K) Decreasing single-cell resolution from trackSeq to pooled cell experiments was simulated by incrementally removing available information from trackSeq data: First, “Pooled trackSeq” averages LysoHigh and LysoLow cells into a LysoHigh or LysoLow pool, respectively. Second, “Pseudo bulk” removes single-cell library size normalization. Third, “Bulk” further removes single-cell library quality control (QC) information that would not be available for complementary DNA libraries from pooled cells. Top 500 candidates identified for each step using paired t tests (supplemental Figure 3A) between LysoHigh/LysoLow pools to relate changes in candidate identification to loss of single-cell information. (L) Loss of single-cell library information worsens trackSeq candidate re-identification. “Pooled trackSeq”/“Pseudo bulk”/“Bulk” identify 34.8%/19.8%/22.4% of trackSeq candidates.
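The principle behind panels H and I, that a known fate direction lets small consistent sister differences accumulate while randomly assigned fates let them cancel, can be illustrated with a minimal Python sketch. This is not the study's analysis code. The function names, the baseline expression of 5.0, the shift, and the noise level are hypothetical values chosen for illustration; only the pair count (n = 59) is taken from the legend.

```python
import random
import statistics


def simulate_pairs(n_pairs, shift, noise_sd, rng):
    """Simulate hypothetical sister pairs on a log2(counts+1)-like scale.

    The LysoHigh sister carries a small constant shift relative to the
    LysoLow sister; both are drawn around an assumed baseline of 5.0.
    """
    low = [rng.gauss(5.0, noise_sd) for _ in range(n_pairs)]
    high = [rng.gauss(5.0 + shift, noise_sd) for _ in range(n_pairs)]
    return high, low


def mean_directed_difference(high, low):
    """Mean of (LysoHigh - LysoLow) across sister pairs."""
    return statistics.fmean(h - l for h, l in zip(high, low))


def randomize_fates(high, low, rng):
    """Swap the two sister labels in each pair with probability 0.5.

    This mimics an analysis without fate directionality: the size of each
    sister difference is kept, but its sign becomes arbitrary.
    """
    new_high, new_low = [], []
    for h, l in zip(high, low):
        if rng.random() < 0.5:
            h, l = l, h
        new_high.append(h)
        new_low.append(l)
    return new_high, new_low


if __name__ == "__main__":
    rng = random.Random(1)
    # n = 59 pairs as in panel I; shift and noise are assumed values.
    high, low = simulate_pairs(n_pairs=59, shift=0.3, noise_sd=1.0, rng=rng)
    rand_high, rand_low = randomize_fates(high, low, rng)
    print("directed mean difference:", mean_directed_difference(high, low))
    print("random-fate mean difference:", mean_directed_difference(rand_high, rand_low))
```

With directed labels the mean difference is expected to sit near the simulated shift, whereas with random labels the expected mean is zero because the signs are arbitrary. The absolute sister differences are identical in both cases, so undirected sister comparisons would not separate such a gene from noise.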