In this issue of Blood, Fenwarth et al demonstrate the potential applicability of a knowledge bank (KB) to inform decision making as to whether patients ages 18 to 59 years with acute myeloid leukemia (AML) should undergo allogeneic hematopoietic stem cell transplantation (HSCT) in first complete remission (CR1) or whether it should be deferred to a later stage.1
The KB is a multistage model built on clinical, cytogenetic, molecular, and treatment variables from 1540 patients enrolled in 3 German-Austrian AML Study Group trials conducted from 1993 to 2004.2 By incorporating these data, the algorithm can construct personalized survival simulations. The authors apply their model to 545 HSCT-eligible patients enrolled in the more recent ALFA-0702 trial (2009-2013).3 They report that by integrating the KB with European LeukemiaNet (ELN) 2017 stratification and NPM1 minimal residual disease assessment, they can identify not only patients who would benefit from transplantation in CR1 but also those who would be specifically harmed by it.
When physicians and patients commit to HSCT, they do so presuming that it affords the best opportunity for long-term survival. Large prospective randomized trials of HSCT in transplant-eligible AML patients in CR1 have not been conducted because of a lack of clinical equipoise. Instead, we have relied on donor vs no donor analyses to inform our HSCT recommendations.4 Although it is devastating to witness a death from transplant-related complications, one might rationalize this loss by believing that, on a population-wide scale predicated on well-annotated trial-based data, the choice to transplant was indeed justified. The model established by Fenwarth et al allows refinement of the decision-making process, potentially protecting those patients for whom HSCT in CR1 would be detrimental and helping physicians abide by their sacred pledge to “do no harm.”
Still, this algorithm requires additional validation and is not ready to be deployed for clinical use. The trials upon which the KB was generated were performed 16 to 27 years ago. Chemotherapy for AML, HSCT strategies, and available antimicrobial agents have evolved considerably since then, leading to incrementally improved outcomes. The more recent ALFA-0702 trial,3 on which this model was validated, used in part a strategy of clofarabine and intermediate-dose cytarabine as intensification therapy that is not routinely used for this purpose today. It is also important to recognize that this model was generated from data on clinical trial subjects, who may not be representative of “real-world” patients seen in the community.
The KB was not designed to be static. The authors have enhanced it by incorporating new data that were not available when the model was first developed. However, current strategies for detecting minimal residual disease are not limited to NPM1-mutated patients and involve multiparameter flow cytometry and genomic tools targeting other mutations. A significant challenge to this computer-based predictive model is that prognostic biomarkers are being identified at a rapid clip, far outpacing the ability to construct validated models that can be immediately applied to clinical decision making.
The ELN 2017 scheme is considered the current standard for risk classification in AML and is used to guide the type of consolidation therapy.5 Similar to other risk stratification systems, the ELN is useful in population-based studies. However, its ability to predict the fate of individuals is limited. The C-statistic of the ELN, a measure reflecting predictive accuracy, is ∼60.1,2 A C-statistic of 100 indicates perfect concordance between the prediction provided by the model and actual outcome. In contrast, a C-statistic of 50 indicates random concordance. C-statistic values of 60 to 70, 70 to 80, and 80 to 90 are commonly considered to reflect poor, fair, and good concordance, respectively.6 The KB, which includes detailed clinical and genetic features, yielded a modest but meaningful improvement in the C-statistic compared with the ELN (68.9 vs 63.0). Although the KB only partially accounts for variance in outcome, this still represents progress. Objective tools guiding clinical decision making can reduce the influence of the personal biases that often enter into medical care.7
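For readers unfamiliar with the statistic, the concordance measure cited above can be illustrated with a minimal sketch. The function and toy data below are hypothetical and are not drawn from the study; they simply show how Harrell's C (the usual survival-analysis form of the C-statistic) counts patient pairs in which the patient with the higher predicted risk in fact fails earlier:

```python
from itertools import combinations

def harrells_c(risk, time, event):
    """Harrell's C: among usable pairs, the fraction in which the
    patient with the higher predicted risk has the shorter survival.
    Returned on the 0-1 scale; multiply by 100 for the percentage
    scale used in the text (50 = random, 100 = perfect)."""
    concordant = tied = usable = 0
    for i, j in combinations(range(len(risk)), 2):
        # Order the pair so patient a has the shorter follow-up time.
        a, b = (i, j) if time[i] < time[j] else (j, i)
        if not event[a]:
            continue  # shorter time is censored: pair is not usable
        usable += 1
        if risk[a] > risk[b]:
            concordant += 1
        elif risk[a] == risk[b]:
            tied += 1  # tied predictions count as half concordant
    return (concordant + 0.5 * tied) / usable

# Hypothetical toy data: higher risk score, shorter survival.
risk = [0.9, 0.7, 0.4, 0.2]
time = [6, 12, 30, 60]   # months of follow-up
event = [1, 1, 1, 0]     # 1 = death observed, 0 = censored
print(harrells_c(risk, time, event) * 100)  # 100.0 on these data
```

On real cohorts the ordering of risk and outcome is imperfect, which is why models such as the ELN and the KB land in the 60s rather than near 100.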
The question, then, remains: How can we improve our ability to foresee the results of possible therapeutic paths in AML? As elegantly discussed by Estey and Gale,6 latent covariates, not captured in registries or clinical trials, contribute to imperfect prediction. However, by the law of diminishing returns, adding more features to the model will only marginally increase accuracy.8 Another consideration is the ratio between effect size and sample size. Mathematical simulations show that 5000 to 10 000 patients are needed to detect an association with outcome for a gene that has a moderate prognostic effect and is present in 1% of the population.1 Therefore, in the era of next-generation sequencing, huge collaborative registries should be formed to capitalize on the wealth of data available. Furthermore, methods such as shrinkage techniques and machine learning algorithms capable of dealing with high-dimensional data (ie, where the number of features is high relative to the number of patients) are needed.9 Finally, with all humility, physicians must acknowledge that there is inherent uncertainty to prediction. It is unrealistic to expect that features at the beginning of a patient’s journey, or even at the time of transplantation, will unambiguously determine their fate. Instead, prediction should be dynamic, recalculating probabilities throughout the disease course based on preceding events. This sort of Bayesian approach was recently applied in diffuse large B-cell lymphoma, yielding sequential individualized estimation of disease-free survival following diagnosis, interim, and end-of-treatment response evaluation.10
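The dynamic recalculation described above amounts to repeated application of Bayes' rule, most simply expressed on the odds scale. The sketch below is purely illustrative: the baseline risk and likelihood ratios are hypothetical numbers chosen for the example, not values from any of the cited studies:

```python
def bayes_update(prior, likelihood_ratio):
    """Bayes' rule on the odds scale:
    posterior odds = prior odds x likelihood ratio."""
    odds = prior / (1 - prior) * likelihood_ratio
    return odds / (1 + odds)

# Hypothetical scenario: a patient starts with a 40% estimated
# relapse risk. A favorable interim result (assumed LR = 0.25)
# and a favorable end-of-treatment result (assumed LR = 0.5)
# each lower the running estimate.
p = 0.40
p = bayes_update(p, 0.25)  # after interim response evaluation
p = bayes_update(p, 0.5)   # after end-of-treatment evaluation
print(round(p, 3))         # prints 0.077
```

Each new observation shifts the running probability, so the estimate a clinician acts on at any decision point reflects everything observed so far rather than only the baseline features.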
These caveats notwithstanding, the work presented in this manuscript gives us a glimpse of what the future may hold. Some physicians might be wary of black box computer-generated algorithms, fearful that they threaten the contribution of clinical judgment to decision making. Yet these models integrate many more variables than any clinician could juggle in his or her head. Tools such as these complement our clinical experience and can only enhance our ability to make what may well be lifesaving recommendations to our patients (see figure). When the stakes are as high as they are for HSCT, we must embrace every opportunity to steer our patients to safe harbor.
Conflict-of-interest disclosure: The authors declare no competing financial interests.