Figure 5.
Machine learning identifies the 20 best gene predictors for separating young and aged HSCs in scRNA-seq data. (A) Machine learning predicts young and aged HSCs with high accuracy. Top panel: schematic representation of the application of machine learning to transcriptomic data from different scRNA-seq sets. Lower panel: output of one of the algorithms used (AdaBoost) for individual cells from different sets. Young (circles) and aged (triangles) single cells are separated along PC1 (x-axis) and PC2 (y-axis) and color-coded according to the match between the measured age of the cells and the age predicted by the algorithm (gray for young cells, red for aged cells, and orange for mismatched cells). (B) Machine learning scores vary depending on which training set is used. Heatmap from AdaBoost training depicting the overall scores for the different scRNA-seq sets used. The overall score is color-coded from blue (lower scores) to yellow (higher scores). Sets on the x-axis (training sets) were used to train the algorithm, and the sets on the y-axis (test sets) were scored with the trained algorithm. (C) The best machine learning gene predictors are also highly enriched for membrane-associated proteins. Horizontal stacked plot comparing the 20 best predictors extracted from the machine learning algorithms (best predictors) with AS genes. The percentage of genes in each cellular location is shown on the x-axis, with categories indicated by different colors. The symbols of the 20 best predictor genes are listed below. ER, endoplasmic reticulum.