Figure 7.
Cell-type specificity of regulatory elements is encoded in the underlying motif composition. (A) (i) UMAP representation of ATAC-seq regions in CD34+ cells (gray), with heptad TF-bound HSC-MPP-specific regions colored in purple. (ii) An XGBoost machine learning model, trained and tested on motif counts from a mixture of the regions specified in panel Ai and background regions, predicts cell type with high accuracy. The receiver operating characteristic (ROC) curve shows the performance of the model in predicting HSC-MPP-specific regions. (iii) Beeswarm plot of the top 12 representative motifs in HSC-MPP-specific regions, ranked by their absolute importance to the predictive model. Each row shows the motif (and its canonical TF family, if known) and the corresponding SHAP values for the cell type in question (right) and for the other cell types (left). The feature count indicates normalized motif counts, ranging from 0 to 1. (B) (i) UMAP representation of ATAC-seq regions in CD34+ cells (gray), with heptad TF-bound GMP-specific regions colored in green. (ii) ROC curve showing the performance of the model in predicting GMP-specific regions. (iii) Beeswarm plot of the top 12 representative motifs in GMP-specific regions, ranked by their absolute importance to the predictive model. (C) (i) UMAP representation of ATAC-seq regions in CD34+ cells (gray), with heptad TF-bound MEP-specific regions colored in orange. (ii) ROC curve showing the performance of the model in predicting MEP-specific regions. (iii) Beeswarm plot of the top 12 representative motifs in MEP-specific regions, ranked by their absolute importance to the predictive model.
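The legend does not state how motif counts were scaled to the 0 to 1 range or how the ROC curves were summarized. The following minimal Python sketch illustrates both quantities under stated assumptions. It assumes per-motif min-max scaling of a region-by-motif count matrix, and it computes the area under the ROC curve through the rank-sum (Mann-Whitney U) identity, with cell-type-specific regions labeled 1 and background regions labeled 0. The helper names `min_max_normalize` and `roc_auc` are hypothetical. The sketch does not reproduce the XGBoost model or the SHAP analysis used for the figure.

```python
from typing import List, Sequence


def min_max_normalize(counts: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale each motif (column) of a region-by-motif count matrix to [0, 1].

    Assumption: per-motif min-max scaling; the legend only states that the
    normalized counts range from 0 to 1.
    """
    n_motifs = len(counts[0])
    lows = [min(row[j] for row in counts) for j in range(n_motifs)]
    highs = [max(row[j] for row in counts) for j in range(n_motifs)]
    normalized = []
    for row in counts:
        new_row = []
        for j, value in enumerate(row):
            span = highs[j] - lows[j]
            # A motif with constant counts carries no signal; map it to 0.
            new_row.append((value - lows[j]) / span if span > 0 else 0.0)
        normalized.append(new_row)
    return normalized


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: 1 for cell-type-specific regions, 0 for background regions.
    scores: classifier scores; tied scores receive their average rank.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative region")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Extend j over the run of tied scores starting at position i.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)


if __name__ == "__main__":
    print(min_max_normalize([[0, 4], [2, 8], [4, 0]]))
    print(roc_auc([1, 0, 1, 0], [0.8, 0.6, 0.4, 0.2]))  # 0.75
```

An AUC of 1.0 means that every cell-type-specific region scores above every background region, and 0.5 corresponds to chance. The rank-based form gives the same value as integrating the empirical ROC curve with the trapezoidal rule.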
