Background Myeloid malignancies such as myeloproliferative neoplasms (MPNs) and myelodysplastic/myeloproliferative neoplasms (MDS/MPNs) are challenging to diagnose in routine practice because their clinical and laboratory features overlap. These entities also share genetic abnormalities, which further complicates their distinction. Interpreting the complex mutational patterns in these disorders may benefit from advanced analytical approaches. We hypothesized that machine learning methods could differentiate among these entities based on their mutational patterns, and we compared two ensemble modeling approaches for classifying them.

Methods We analyzed whole-exome sequencing (WES) data from patients with essential thrombocythemia (ET), polycythemia vera (PV), myelofibrosis (MF), chronic myelomonocytic leukemia (CMML), and atypical chronic myeloid leukemia (aCML). We included mutations in 44 genes recurrently altered in these neoplasms and commonly represented in commercial next-generation sequencing (NGS) panels. We also defined 28 gene combinations, 9 pathway/protein-type variables (JAK–STAT, DNA methylation, histone modification, RNA splicing, RAS signaling, TP53, tyrosine kinases, cohesin proteins, and transcription factors), and 36 pathway combinations, for a total of 117 genetic features (44 + 28 + 9 + 36). Two ensemble approaches were developed: (1) binomial (one-vs-all) and (2) multinomial (all classes simultaneously), each combining LASSO logistic regression, random forest, and XGBoost. Model performance was evaluated with a 70/30 train–validation split using accuracy, sensitivity, specificity, F1-score, and area under the receiver operating characteristic (ROC) curve (AUC).
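The feature construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the gene-to-pathway map is a small assumed subset (the abstract does not list the 44 genes or the 28 gene combinations), and the function name `encode_patient` and the `gene:`/`pathway:`/`pair:` key prefixes are hypothetical. One observation the sketch makes explicit is that 36 equals the number of unordered pairs of the 9 pathway variables, which is consistent with, but not confirmed by, the abstract.

```python
from itertools import combinations

# The nine pathway/protein-type groups named in the Methods.
PATHWAYS = [
    "JAK-STAT", "DNA methylation", "histone modification", "RNA splicing",
    "RAS signaling", "TP53", "tyrosine kinases", "cohesin proteins",
    "transcription factors",
]

# ASSUMPTION: illustrative subset only, not the authors' 44-gene list.
GENE_TO_PATHWAY = {
    "JAK2": "JAK-STAT",
    "TET2": "DNA methylation",
    "DNMT3A": "DNA methylation",
    "ASXL1": "histone modification",
    "EZH2": "histone modification",
    "SRSF2": "RNA splicing",
    "SF3B1": "RNA splicing",
    "NRAS": "RAS signaling",
    "KRAS": "RAS signaling",
    "TP53": "TP53",
}


def encode_patient(mutated_genes):
    """Return a dict of 0/1 features for one patient's mutated genes."""
    mutated = set(mutated_genes)
    features = {}
    # Gene-level indicators (prefixed so gene TP53 and pathway TP53 differ).
    for gene in GENE_TO_PATHWAY:
        features[f"gene:{gene}"] = int(gene in mutated)
    # Pathway-level indicators: 1 if any mapped gene in the pathway is mutated.
    hit = {GENE_TO_PATHWAY[g] for g in mutated if g in GENE_TO_PATHWAY}
    for pathway in PATHWAYS:
        features[f"pathway:{pathway}"] = int(pathway in hit)
    # Pathway-pair indicators: 1 only if both pathways are hit.
    for a, b in combinations(PATHWAYS, 2):
        features[f"pair:{a}+{b}"] = int(a in hit and b in hit)
    return features


pathway_pairs = list(combinations(PATHWAYS, 2))
# Feature count as reported: genes + gene combinations + pathways + pathway pairs.
total_features = 44 + 28 + 9 + len(pathway_pairs)

example = encode_patient(["JAK2", "TET2"])
print(len(pathway_pairs), total_features)
print(example["pair:JAK-STAT+DNA methylation"], example["pathway:RNA splicing"])
```

Binary indicators of this kind can be passed directly to penalized regression and tree-based learners, which is one plausible reason to encode co-mutation explicitly rather than rely on the models to discover interactions.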

Results The cohort comprised 373 patients, including 110 from our institution and 263 from the literature: ET (n=92, 24.6%), PV (n=93, 24.9%), MF (n=97, 26%), CMML (n=78, 20.9%), and aCML (n=13, 3.4%).

In the binomial ensemble models, F1-scores were 0.98 (aCML), 0.93 (CMML), 0.79 (MF), 0.77 (PV), and 0.74 (ET). Sensitivity ranged from 64% to 100%, with lower values among the classic Ph-negative MPNs (ET, PV, MF). Specificity ranged from 20% to 93%, with lower specificity for aCML, the least frequent disease in our cohort. Accuracy was 0.96 (aCML), 0.90 (CMML), 0.71 (MF), 0.71 (PV), and 0.66 (ET).
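As a worked illustration of how these one-vs-all metrics relate to one another, the sketch below computes them from a 2x2 confusion table. The counts are hypothetical and are not study data; the example also assumes that the larger group is scored as the positive class, which the abstract does not state. Under those assumptions, a very small negative class can yield low specificity alongside high accuracy and F1-score.

```python
def binary_metrics(tp, fn, fp, tn):
    """Compute standard metrics from the four cells of a 2x2 confusion table."""
    sensitivity = tp / (tp + fn)            # true positive rate (recall)
    specificity = tn / (tn + fp)            # true negative rate
    precision = tp / (tp + fp)              # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
        "accuracy": accuracy,
    }


# HYPOTHETICAL counts for 100 validation cases: 95 in the majority (positive)
# class, all classified correctly, and 5 in the rare (negative) class, of
# which only 1 is classified correctly.
metrics = binary_metrics(tp=95, fn=0, fp=4, tn=1)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these counts, four of the five rare-class cases are missed, yet accuracy and F1-score stay high because the rare class contributes few cases overall.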

The multinomial ensemble model achieved an overall accuracy of 0.60, three times the 20% expected from random guessing among five classes, and F1-scores that matched or exceeded those of the binomial approach: 0.98 (aCML), 0.94 (CMML), 0.83 (MF), 0.81 (PV), and 0.89 (ET). Sensitivity ranged from 75% (PV) to 100% (aCML), and specificity from 20% (aCML) to 91.3% (CMML). In the validation cohort, common misclassifications included ET labeled as PV (n=12) or MF (n=5), and MF labeled as PV (n=9) or ET (n=3). ROC analysis showed AUCs of 0.929 (aCML), 0.978 (CMML), 0.829 (PV), 0.794 (MF), and 0.784 (ET).
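The multiclass quantities reported above can be derived from a single confusion matrix, as in the sketch below. The 5x5 matrix is hypothetical and does not reproduce the study's validation results; only the class labels, the cohort sizes (97 MF of 373), and the 0.60 accuracy come from the abstract. The helper names are illustrative. The majority-class baseline is computed from the full cohort, since the abstract does not give the class counts in the validation split.

```python
CLASSES = ["ET", "PV", "MF", "CMML", "aCML"]

# HYPOTHETICAL confusion matrix: rows are true classes, columns are predictions.
CONFUSION = [
    [6, 3, 1, 0, 0],  # true ET
    [2, 7, 1, 0, 0],  # true PV
    [1, 2, 7, 0, 0],  # true MF
    [0, 0, 1, 9, 0],  # true CMML
    [0, 0, 0, 1, 1],  # true aCML
]


def overall_accuracy(matrix):
    """Fraction of cases on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total


def one_vs_rest(matrix, k):
    """Sensitivity and specificity for class k against all other classes."""
    total = sum(sum(row) for row in matrix)
    tp = matrix[k][k]
    fn = sum(matrix[k]) - tp                     # true k, predicted other
    fp = sum(row[k] for row in matrix) - tp      # true other, predicted k
    tn = total - tp - fn - fp
    return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}


accuracy = overall_accuracy(CONFUSION)
per_class = {name: one_vs_rest(CONFUSION, i) for i, name in enumerate(CLASSES)}

# Baselines: uniform guessing over five classes, and always predicting the
# largest class in the full cohort (MF, 97 of 373 patients).
chance = 1 / len(CLASSES)
majority_baseline = 97 / 373
reported_ratio = round(0.60 / chance, 2)  # reported accuracy relative to chance

print(f"accuracy={accuracy:.3f} chance={chance:.2f} "
      f"majority={majority_baseline:.3f} reported_ratio={reported_ratio}")
for name, m in per_class.items():
    print(name, f"sens={m['sensitivity']:.2f}", f"spec={m['specificity']:.2f}")
```

Comparing accuracy against the majority-class baseline as well as uniform chance gives a slightly stricter reference point when class sizes are unequal.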

Conclusion Machine-learning ensemble models can classify chronic myeloid neoplasms using mutational profiles alone. In our analysis, the multinomial approach performed better for differential diagnosis and potential clinical application, with particularly strong performance for CMML and aCML (high accuracy and minimal misclassification). Genetic similarities among ET, PV, and MF likely limit performance in those subgroups; incorporating additional mutational data (e.g., JAK2 allele burden) and/or clinical features (e.g., hemoglobin, platelet count) may improve accuracy. Further data collection to balance class sizes may also improve sensitivity and specificity. Ensemble models could help standardize disease classification and enable faster, more accurate diagnoses. Validation in larger, diverse cohorts is warranted to establish clinical utility.
