Figure 1.
Schematic workflow for developing the 2-step machine-learning model. Model development comprised (1) collection of clinical and laboratory data routinely available for patients with BMF from 2 independent cohorts; (2) curation of germ line variants identified by genetic testing to assign each patient a label (target classification) corresponding to BMF etiology, acquired or inherited: all patients with pathogenic or likely pathogenic variants were labeled as inherited cases, patients without germ line variants or with only benign/likely benign variants were labeled as acquired cases, and patients with VUS were not included in the training data set; (3) data preparation; (4) K-means clustering of cases from the training cohort; (5) optimization of a classification machine-learning algorithm for the cluster with the highest number of cases (cluster A); and (6) validation of the model in an external data set. The predictive model was then applied to predict BMF etiology in patients with VUS.
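The labeling rule in step 2 and the cluster selection in step 5 can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `assign_label` and `largest_cluster`, the string spellings of the variant classes, and the precedence of a pathogenic/likely pathogenic variant over a co-occurring VUS are all assumptions made for illustration.

```python
from collections import Counter
from typing import Hashable, Iterable, Optional

# Variant classes named in the caption; the exact string spellings are assumptions.
PATHOGENIC = {"pathogenic", "likely pathogenic"}
BENIGN = {"benign", "likely benign"}
VUS = "VUS"


def assign_label(variant_classes: Iterable[str]) -> Optional[str]:
    """Return the training label for one patient, or None if the patient is excluded.

    Assumed precedence: any pathogenic/likely pathogenic variant -> "inherited";
    otherwise any VUS -> excluded from training (None);
    otherwise (no variants, or only benign/likely benign) -> "acquired".
    """
    classes = set(variant_classes)
    unknown = classes - PATHOGENIC - BENIGN - {VUS}
    if unknown:
        raise ValueError(f"unrecognized variant classes: {sorted(unknown)}")
    if classes & PATHOGENIC:
        return "inherited"
    if VUS in classes:
        return None
    return "acquired"


def largest_cluster(assignments: Iterable[Hashable]) -> Hashable:
    """Return the cluster identifier that contains the most cases."""
    return Counter(assignments).most_common(1)[0][0]


# Hypothetical patients, each described by the classes of their germ line variants.
patients = {
    "P1": ["likely pathogenic"],
    "P2": [],
    "P3": ["VUS", "benign"],
}
labels = {pid: assign_label(v) for pid, v in patients.items()}
training_ids = [pid for pid, label in labels.items() if label is not None]
```

Here `training_ids` keeps only the patients that received a label, which mirrors how patients with VUS are held out of training and scored later by the fitted model.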