In this issue of Blood, Gutierrez-Rodrigues et al1 use machine learning to predict disease etiology in a large retrospective cohort of patients with well-annotated bone marrow failure syndrome (BMFS) treated at the National Institutes of Health (NIH) and the University of São Paulo. Using artificial intelligence (AI), the authors were able to differentiate between inherited and acquired BMFS with good accuracy, especially in acquired BMFS, from 25 clinical and laboratory parameters and without requiring germline genomic data. Telomere length (TL) was the top-ranking predictor for this delineation. A model without TL was also useful but had low sensitivity and underperformed for telomere biology disorders without classical triad findings. Overall, the model could potentially be used to prioritize patients for genetic testing.
Aplastic anemia (AA) is a rare condition and, in the acquired setting, a diagnosis of exclusion; patients can present with acute deterioration of their hematological indices or a slowly progressive decline in blood counts.2 It is critical to rule out drug-, toxin-, or virus-induced transient marrow suppression as well as infiltrative and malignant processes like hypoplastic myelodysplastic syndromes (MDSs).3 Once a diagnosis of AA is confirmed, ascertaining the etiology (acquired or inherited) of the underlying BMF is important but can be challenging and time-consuming, especially in adults with no classic physical features and no family history. Although genetic testing to identify an inherited phenotype is becoming increasingly available even in low-resource settings, the turnaround time for genetic panel results can be 3 months or longer. Inappropriate or delayed therapeutic decisions while waiting for the genetic panel results could potentially harm patients. Against this backdrop, the authors should be commended for using machine learning tools with simple clinical and laboratory inputs to help physicians and hematologists differentiate the 2 forms of BMFS in a timely manner.
Although AI domains like machine learning are not frequently reported in research studies and the algorithms may be too complicated for routine clinical application by practitioners and researchers,4 the authors have given a simple overview of the process and have also successfully developed a free online tool/app for differentiating types of AA. This will allow others in the BMF field to replicate and adapt this approach to their own clinical datasets.
It is extremely gratifying to note that, even in the era of expanding molecular genomics, a simple but focused clinical evaluation (physical abnormalities, mucocutaneous triad, early graying, multiorgan disease) with a good history and family pedigree assessment, supplemented with routine laboratory parameters (long-standing cytopenias/macrocytosis, mean corpuscular volume, red cell distribution width) and TL, is sufficient to accurately identify the etiology of BMF in the vast majority of patients. Nothing trumps clinical acumen! Collating and integrating all 25 clinical/laboratory variables to elucidate the cause of the BMF was possible only through machine learning.
TL analysis has been a helpful screening tool in the initial assessment of AA and, not surprisingly, was the key variable among the 25 included for accurate prediction of the diagnosis. Interestingly, although the majority of patients had TL measured by the gold standard, flow fluorescence in situ hybridization (FISH), the use of other techniques, including quantitative polymerase chain reaction and Southern blotting, did not negatively affect the performance of the model. For resource-restricted centers, 2 options are vitally useful: TL can be entered as a percentile category (<1st vs 1st to 10th vs >10th percentile) ascertained through nonflow FISH methods, or a model that entirely excludes TL and relies only on simple clinical/laboratory predictors can be used, which maintains accuracy, albeit with lower sensitivity (see figure). The ability to identify specifically acquired (vs inherited) AA, especially in patients aged >18 years with severe pancytopenia, and to initiate immunosuppression is not impacted by the availability of TL testing.
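The 3 TL percentile categories mentioned above can be illustrated with a minimal Python sketch. This is not the authors' code or the input format of their online tool: the function name, the category labels, and the handling of values falling exactly on the 1st or 10th percentile are assumptions made here for illustration only.

```python
def tl_percentile_category(percentile: float) -> str:
    """Map an age-adjusted TL percentile (0-100) to one of 3 categories.

    Hypothetical helper: the boundary handling (exactly 1 or exactly 10
    falls in the middle category) is an assumption, not taken from the study.
    """
    if not 0 <= percentile <= 100:
        raise ValueError("percentile must be between 0 and 100")
    if percentile < 1:
        return "<1st"
    if percentile <= 10:
        return "1st-10th"
    return ">10th"


if __name__ == "__main__":
    # Example values only; these are not patient data.
    for value in (0.4, 5.0, 37.0):
        print(value, tl_percentile_category(value))
```

A coarse category of this kind could, in principle, be recorded from any TL assay that reports an age-adjusted percentile, which is what would make such an input practical where flow FISH is unavailable.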
Some key variables, like paroxysmal nocturnal hemoglobinuria (PNH) clones and karyotype analysis, were not incorporated in the current main model because of missing data. Future iterations of the model should include these variables, which could potentially improve accuracy and sensitivity. Not surprisingly, the presence of PNH clones correlated with acquired AA,5 as shown previously and in the subanalysis of patients in the NIH cohort. The model should also remain adaptable: additional indices like fetal hemoglobin and evolving baseline parameters like acquired somatic mutations,6 HLA loss/mutations, and abnormalities detected by molecular array karyotyping should be incorporated and validated in future iterations.
The median age was 28 and 23 years in the training and validation datasets, respectively, and age was a key variable identified in the predictive model. As AA has a bimodal incidence, the predictive power should be applicable at both extremes of age: the high likelihood of inherited BMFS in pediatric cohorts and the possibility of hypoplastic MDS, acquired AA, or even DDX41-related myeloid neoplasms in older cohorts7 (see figure).
Increasing use of genomics will identify variants of unknown significance (VUS) in germline genomic panels. Patients who carry a VUS, have classic features of inherited BMFS, and are predicted by the model to have inherited disease could potentially have pathogenic variants in genes that are not covered by the existing panel. These patients should be screened by whole genome sequencing to identify potential uncharacterized gene mutations.
The first AI-driven approach to delineate BMF cases is a timely tool, especially for general hematologists and low-resource centers, with immediate utility through the proposed online application using routine clinical/laboratory parameters. The flexibility of this tool to incorporate additional variables could potentially improve the model (see figure). Newer machine learning tools in AA that use baseline hematological parameters to predict response and clonal evolution,8 including malignant transformation to myeloid neoplasms, are also likely to add value and expand the role of AI in AA, a rare disease with significant morbid complications if not diagnosed, categorized appropriately, and managed in a timely manner.
Conflict-of-interest disclosure: The author declares no competing financial interests.