Background

Higher-risk myelodysplastic syndromes/neoplasms (HR-MDS) and secondary acute myeloid leukemia (sAML) share overlapping molecular features, yet both remain clinically diverse and carry a poor prognosis. Machine learning (ML)-based genomic clustering offers an opportunity to redefine disease subgroups beyond traditional clinical prognostic schemes or pathologic classifications. Prognostic scoring systems emphasize outcomes, whereas pathologic classifications emphasize morphology; as a result, patients (pts) with different genetic backgrounds are placed in the same outcome or morphologic subgroup. We compared our ML-based unsupervised clustering (UC) model, named molecular nosology,¹ with the taxonomy model² to evaluate differences in cluster architecture and clinical relevance in HR-MDS and sAML.

Methods

We analyzed HR-MDS (≥10% blasts) and sAML pts with available molecular data at diagnosis from a well-annotated cohort. Our molecular nosology model used UC via a binary latent-factor model to generate molecular clusters (MCs). The taxonomy model assigned pts to predefined genomic classes. We compared MC size, genomic enrichment, and overall survival (OS) across diagnostic categories. Concordance between the two models was assessed using the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI). Purity and entropy were calculated to assess how concentrated (purity) or dispersed (entropy) each taxonomy group was across MCs.
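The two concordance measures named above can be illustrated with a minimal sketch. This is not the study's analysis code: the function names and toy labels are hypothetical, and the NMI normalization (arithmetic mean of the two entropies) is an assumption, since the abstract does not state which variant was used. Library routines such as scikit-learn's `adjusted_rand_score` and `normalized_mutual_info_score` would be a natural alternative.

```python
from collections import Counter
from math import comb, log


def contingency(labels_a, labels_b):
    """Count patients in each (label_a, label_b) pair."""
    return Counter(zip(labels_a, labels_b))


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting agreement between two partitions, corrected for chance."""
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two patients")
    sum_cells = sum(comb(c, 2) for c in contingency(labels_a, labels_b).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one group in both partitions
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the mean entropy (assumed normalization)."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mi = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in contingency(labels_a, labels_b).items()
    )
    h_a = sum(-(c / n) * log(c / n) for c in count_a.values())
    h_b = sum(-(c / n) * log(c / n) for c in count_b.values())
    mean_h = (h_a + h_b) / 2
    return mi / mean_h if mean_h > 0 else 1.0


# Toy labels, invented for illustration only (not study data).
taxonomy = ["TP53-complex", "TP53-complex", "no event", "no event", "mNOS", "mNOS"]
nosology = ["MC13", "MC13", "MC2", "MC2", "MC1", "MC6"]
print(adjusted_rand_index(taxonomy, nosology))
print(normalized_mutual_information(taxonomy, nosology))
```

Both measures equal 1 when the two partitions are identical up to relabeling. ARI is near 0 (and can be negative) when agreement is no better than chance, and NMI is 0 when the partitions are independent.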

Results

Among 898 pts, 367 (41%) had HR-MDS and 531 (59%) had sAML. Median age was 71 (IQR 65–77) years. Compared to sAML, HR-MDS pts had similar age (71 vs. 72, p=0.556) but higher hemoglobin (9.9 vs. 9.3, p<0.001) and platelet counts (91 vs. 50, p<0.001). Normal karyotype was more common in HR-MDS (55% vs. 42%, p<0.001). TET2 (24%), ASXL1 (24%), RUNX1 (23%), and SRSF2 (22%) were the most frequently mutated genes.

Using our nosology model, we identified 14 MCs.¹ The most frequent were MC2 (normal karyotype) at 22% and MC13 (TP53-complex) at 17%, followed by MC1 (+8/ASXL1/TET2/RUNX1, 12%), MC6 (SRSF2/RAS-pathway, 9%), and MC9 and MC14 (each 8%). Less common clusters included MC12 (5%) and MC3–MC5 (2–3%).

For comparison, the taxonomy model included 18 genomic groups: TP53-complex (20%), mNOS (13%), bi-TET2, IDH-STAG2, and no event (each 10%), CCUS-like, SETBP1/-7, and AML-like (each 6%), SRSF2 (4%), del(5q), BCOR/L1, EZH2-ASXL1, and SF3B1 (each 3%), and five rare subgroups (DDX41, der(1;7), U2AF1¹⁵⁷, U2AF1³⁴, ZRSR2; all <1%).

Our MCs aligned closely with several taxonomy groups: 82% of TP53-complex cases were assigned to MC13, 99% of no event cases mapped to MC2, and 97% of del(5q) cases mapped to MC8. Bi-TET2 cases were distributed across MC6 and MC12 (both TET2-enriched); SETBP1/-7 to MC11; and EZH2-ASXL1 cases to MC1 and MC12. In contrast, mNOS, AML-like, and CCUS-like groups were genomically heterogeneous and spanned multiple MCs.

ARI and NMI were 0.391 and 0.458, respectively, indicating moderate concordance. Purity was highest in no event (0.989), del(5q) (0.967), and TP53-complex (0.823), and lowest in IDH-STAG2 (0.247), CCUS-like (0.241), and AML-like (0.347), all three with high entropy (>2.5), reflecting genomic diversity not captured by taxonomy groups alone.
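As an illustration of how per-group purity and entropy behave, the following minimal sketch computes both for each taxonomy group from paired labels. The function name and toy labels are hypothetical, and the logarithm base (base 2, so entropy is in bits) is an assumption, since the abstract does not state it.

```python
from collections import Counter, defaultdict
from math import log2


def purity_and_entropy(taxonomy_groups, molecular_clusters):
    """Return {taxonomy group: (purity, entropy in bits)} across clusters."""
    clusters_by_group = defaultdict(Counter)
    for group, cluster in zip(taxonomy_groups, molecular_clusters):
        clusters_by_group[group][cluster] += 1

    summary = {}
    for group, counts in clusters_by_group.items():
        total = sum(counts.values())
        shares = [c / total for c in counts.values()]
        purity = max(shares)  # share of the group in its dominant cluster
        entropy = sum(-p * log2(p) for p in shares)  # spread across clusters
        summary[group] = (purity, entropy)
    return summary


# Toy labels, invented for illustration only (not study data).
groups = ["no event"] * 4 + ["mNOS"] * 4
clusters = ["MC2"] * 4 + ["MC1", "MC6", "MC9", "MC14"]
for group, (purity, entropy) in purity_and_entropy(groups, clusters).items():
    print(f"{group}: purity={purity:.3f}, entropy={entropy:.3f}")
```

In this toy example, a group that maps entirely to one cluster has purity 1 and entropy 0, whereas a group split evenly over four clusters has purity 0.25 and entropy of 2 bits, mirroring the contrast between the concentrated and heterogeneous taxonomy groups reported above.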

Median OS for the entire cohort was 17 mo (95% CI: 14–20). HR-MDS pts had longer OS than sAML pts (26 vs. 12 mo, p<0.001). Within most taxonomy groups and MCs, OS did not differ significantly between HR-MDS and sAML. The exceptions were MC3 (NA vs. 11 mo, p=0.014), MC6 (111 vs. 15 mo, p<0.001), and MC9 (18 vs. 9 mo, p=0.027) in the nosology model, and SF3B1 (129 vs. 23 mo, p=0.033), del(5q) (33 vs. 6 mo, p=0.022), bi-TET2 (25 vs. 8 mo, p=0.019), and mNOS (34 vs. 9 mo, p<0.001) in the taxonomy model.

Conclusions

Some molecular taxonomy groups (e.g., TP53-complex, del(5q), and no event) showed strong concordance with our nosology-defined clusters, which served as the gold standard for genomically distinct subsets, irrespective of diagnosis or blast percentage, which may represent the stage of the disease. In contrast, taxonomy-defined categories such as CCUS-like, AML-like, and mNOS were more genetically diverse and less aligned with specific clusters in the nosology model. With few exceptions (e.g., SF3B1 or bi-TET2), median OS within most clusters was not significantly influenced by blast count, underscoring the potential of molecular-driven clustering to reveal underlying disease biology not captured by morphology and to refine classification in MDS/AML and sAML.

References

1. Kewan, Durmaz, et al. Nat Commun 2023

2. Bernard, et al. Blood 2024
