Developmental trajectories of BCR::ABL1-positive ALL can be determined by gene expression. (A) Uniform manifold approximation and projection (UMAP) plot shows unsupervised clustering of 493 BCP-ALL patients (GMALL study group) based on 2802 genes previously established¹⁵ for allocation to 21 molecular disease subtypes. A total of 18 subtypes represented in this adult cohort are shown. Arrows indicate separation of BCR::ABL1-positive patients into 2 distinct clusters. (B) BCR::ABL1-positive samples from this cohort (n = 113) were reanalyzed by UMAP analysis with systematic variation of 30 setting combinations for the parameters “min_dist” and “n_neighbors” (supplemental Figure 1). Sample-to-sample distances for each setting were calculated, z-transformed, and averaged. Hierarchical clustering of the averaged distances is shown. To define the final number of clusters, the dendrogram was progressively split at each junction and the integrity of the resulting clusters was determined using machine learning (SVM linear). When the predictability (Cohen κ) of a cluster decreased below 0.8, no further cluster splitting was performed (for details, see supplemental Figure 2). This resulted in 2 main clusters (C1 and C2) with 4 subclusters (C1a, C1b, C2a, C2b), which could be reliably predicted. (C) To test whether similar clusters were present in other cohorts, 2 machine learning classifiers (1 for the 2 main clusters and 1 for the 4 subclusters) were trained on the basis of 178 and 331 LASSO genes, respectively, derived from the GMALL discovery cohort (supplemental Tables 2-6). Gene expression data from validation cohorts (Munich Leukemia Laboratory [MLL], n = 61; St. Jude Children's Research Hospital, n = 104; and Princess Margaret Cancer Centre [PMCC], n = 49) were used for hierarchical clustering together with the GMALL reference cohort after batch correction.
Newly established classifiers were used for sample allocation to the 2 main clusters and 4 subclusters (supplemental Table 1), which are shown in the annotation. (D) UMAP plots obtained from the data in panel C, showing the classifier predictions for the main clusters (left) and subclusters (right). (E) Bone marrow/peripheral blood samples at first diagnosis of ALL were fluorescence-activated cell sorted into hematopoietic compartments on cover slides and used for BCR::ABL1 fluorescence in situ hybridization (FISH) (supplemental Figure 4). Bars depict the frequency of BCR::ABL1-positive cells in the corresponding compartments: myeloid cells (CD45lowCD19−CD10−CD34+/−CD13/33+), mature B cells (CD45highCD19+CD10−CD20+), T cells (CD45highCD19−CD3+CD16/65−), or B lymphoid precursor/ALL cells (CD45lowCD19+CD10+; in 1 case with pro-B immunophenotype, ALL cells were identified only by CD45lowCD19+). FISH signal constellations and their distribution in the analyzed cells are detailed in supplemental Table 12. Note: ∗fewer than 100 cells analyzed, ¥fewer than 50 cells analyzed.
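
The two numerical steps described for panel B (z-transforming and averaging sample-to-sample distances across UMAP settings, and stopping cluster splitting once Cohen κ falls below 0.8) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the UMAP embeddings and the SVM predictions are taken as given inputs, all function names are hypothetical, and Euclidean distance with a population standard deviation is assumed because the legend does not specify either.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def pairwise_distances(embedding):
    """Euclidean distance for every sample pair (i < j) in one embedding."""
    return [dist(a, b) for a, b in combinations(embedding, 2)]


def z_transform(values):
    """Center to mean 0 and scale to unit standard deviation.

    Assumes the values are not all identical (standard deviation > 0).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def consensus_distances(embeddings):
    """Average z-transformed pairwise distances across embeddings.

    `embeddings` holds one coordinate list per parameter setting, with
    the samples in the same order in every embedding.
    """
    per_setting = [z_transform(pairwise_distances(e)) for e in embeddings]
    return [mean(column) for column in zip(*per_setting)]


def cohen_kappa(true_labels, predicted_labels):
    """Cohen kappa: observed agreement corrected for chance agreement."""
    true_labels, predicted_labels = list(true_labels), list(predicted_labels)
    n = len(true_labels)
    observed = sum(t == p for t, p in zip(true_labels, predicted_labels)) / n
    classes = set(true_labels) | set(predicted_labels)
    expected = sum(
        (true_labels.count(c) / n) * (predicted_labels.count(c) / n)
        for c in classes
    )
    if expected == 1:
        raise ValueError("kappa is undefined when only one class is present")
    return (observed - expected) / (1 - expected)


def split_is_reliable(true_labels, predicted_labels, threshold=0.8):
    """Accept a dendrogram split only if kappa stays at or above the threshold."""
    return cohen_kappa(true_labels, predicted_labels) >= threshold
```

The vector returned by `consensus_distances` lists sample pairs in the order (0,1), (0,2), ..., which is the usual condensed distance layout expected by hierarchical clustering routines. Because each setting is z-transformed before averaging, settings that produce embeddings on different scales contribute equally to the consensus.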