Abstract
Prognostic factors determined at diagnosis are predictive for outcome whereas achievement of morphological complete remission (CR) is still an important end point during treatment. Residual disease after therapy may reflect the sum of all diagnosis and postdiagnosis resistance mechanisms/factors; its measurement could hypothetically be very instrumental for guiding treatment. The possibility of defining residual disease (minimal residual disease [MRD]) far below the level of 5% blast cells is changing the landscape of risk classification. In this manuscript, the various methods, all different in sensitivity, specificity, and phase of development, to assess MRD are discussed. Currently, the 2 methods mostly used are flow cytometry–based immune MRD (multiparameter flow cytometry [MPFC]) and molecular MRD assessed by real-time quantitative polymerase chain reaction. Both have advantages and disadvantages that are summarized in detail. Many studies in children as well as adults already demonstrated that MRD detection by MPFC or molecular MRD provides strong prognostic information in acute myeloid leukemia (AML) after both induction and consolidation. These studies are summarized in this review. The general conclusion of this review is that a better definition of disease burden than morphological CR is now emerging. MRD assessed by flow or molecular techniques should become standard in every clinical trial in AML. Harmonization of antibody panels, introduction of single-cell tube systems (for determination of residual leukemic stem cells), and standardized analytical programs will pave the way for individual risk assessment and become a surrogate end point for survival in studies investigating new drugs, hopefully resulting in faster drug approval in AML.
To have knowledge of the various methods available to assess MRD and their appropriate application
To know the clinical utility of MRD
Acute myeloid leukemia (AML) is a heterogeneous disease characterized by a multitude of molecular abnormalities. Better understanding of this complex mutational landscape has not resulted in dramatic changes in treatment in the last decades.1 At diagnosis, several factors have prognostic impact for outcome, although achievement of morphological complete remission (CR) is still an important end point. There is no cure without CR. The current definition of remission is based on criteria established decades ago. The morphologic definition of CR, that is, bone marrow (BM) containing <5% blast cells (with neutrophil count >1.0 × 10^9/L and platelets >100 × 10^9/L), is practical, cheap, and clinically relevant: it defines a group of patients below the 5% cutoff level who perform relatively well compared with those above it.
The term remission now also includes CRp (CR with platelets <100 × 10^9/L) and CRi (all criteria for CR except for neutrophil and/or platelet recovery). It has been shown that CRi and CRp are associated with shorter survival, underlining the poorer quality of these types of remission.
Independent of the definition of CR, the majority of CR patients relapse within a few years after diagnosis. A multitude of factors at diagnosis, including clinical parameters and cytogenetic as well as molecular factors and biological properties of the leukemic cells, have important prognostic impact for outcome in the whole patient population. Such risk factors were shown to correlate with quality of remission reflected by frequencies of residual disease.2-5
The outcome of AML treatment is highly variable and not individually predictable. It thus seems that prognosticators at diagnosis will not be able to reach the ultimate goal of individualized risk assessment. One important issue is that the prognostic impact of these factors in the present risk groups does not take into account the contribution of several factors such as cellular resistance mechanisms at diagnosis, and moreover, postdiagnosis factors, which include dosage, compliance, pharmacological resistance, and probably other unknown features. The only postdiagnosis prognostic factor that is now included in risk stratification is the CR status after the first course of induction remission therapy. Patients achieving late CR have a more dismal outcome. The earlier-mentioned factors, which corroborate proper risk classification, are only partly covered by inclusion of CR status. To illustrate the complexity of risk stratification, on the one hand relapses occur in very-good-risk patients, but, on the other hand, cures are possible in very-poor-prognosis AML. Because residual disease after different cycles of therapy may reflect the sum of all diagnosis and postdiagnosis resistance mechanisms/factors, its measurement could hypothetically be very instrumental for guiding treatment.
The possibility of defining residual disease far below the level of 5% blast cells is changing the landscape of risk classification. This so-called minimal residual disease (MRD; also referred to as measurable residual disease) approach at present establishes the presence of leukemia cells down to levels of 1:1000 to 1:10^6 white blood cells (WBCs), compared with 1:20 for morphology.6
Different platforms are available for assessing residual disease in AML, including traditional light microscopy, fluorescence in situ hybridization, cytogenetics, multiparameter flow cytometry (MPFC), including detection of leukemia stem cells (LSCs), real-time quantitative polymerase chain reaction (RT-qPCR), and next-generation sequencing (NGS), all different in sensitivity, specificity, and phase of development.6
Morphological assessment of blast percentage is highly unreliable because of limited sensitivity and interobserver variability. Striking discrepancies between standard morphology and flow cytometry–determined CR status have been clearly shown: some BM samples classified as CR by morphology appeared to have in excess of 5% blasts by MPFC, whereas some morphologically assessed non-CR cases, probably due to BM regeneration, in fact showed no or very few leukemic blasts by MPFC.7 An urgent need for a more sensitive, specific, and objective assessment of MRD than classical morphology is obvious in order to develop a more individualized approach to treatment decisions.
Methods for MRD assessment
MRD detection by PCR
RT-qPCR allows MRD detection in cases with chimeric fusion genes generated by balanced chromosomal rearrangements. Common targets are PML-RARA/t(15;17), RUNX1-RUNX1T1/t(8;21), CBFB-MYH11/inv(16)/t(16;16), DEK-CAN (NUP214)/t(6;9), t(11q23)/MLL fusions, and t(5;11)/NUP98-NSD1. Other genetic alterations that can be used for MRD detection include: (1) insertions/duplications (eg, NPM1, FLT3-ITD, MLL-PTD); (2) point mutations (CEBPA, IDH1/2, KIT, RAS, RUNX1, TP53, JAK2); (3) gene overexpression (WT1, EVI1, ERG).8,9
Altogether, in about 50% of patients, PCR assessment of MRD is, in principle, possible.
Apart from t(15;17), RUNX1-RUNX1T1, and CBFB-MYH11, currently, NPM1 is the best-validated molecular marker for MRD assessment.10 Overexpression of WT1 has been proven to be informative to predict outcome in selected cases.11 This assay will not be generally applicable because of limited sensitivity, lack of specificity, and better alternatives.
Real-time qPCR (RT-qPCR) has an advantage over RT-PCR for MRD detection because it is more reliable and can be readily standardized. Moreover, quantitative assays with the capacity to measure the absolute level of leukemic transcripts allow for the assessment of whether these levels are rising or falling, which is extremely important to inform therapy. This methodology has been standardized for several molecular markers for clinical implementation in the Europe Against Cancer (EAC) program.12 It is most likely that appropriate cutoff levels will differ between the various genomic aberrations.
A major drawback of RT-qPCR is that it can only be applied in around 50% of AML cases. MPFC, on the other hand, has been shown by many authors to enable MRD assessment in the vast majority of the AML patient population.
MRD detection by MPFC
The use of LAIP.
The basic principle is to identify, at diagnosis, immunophenotypically aberrant populations (leukemia-associated immunophenotypes [LAIPs]) that differ from the majority of normal hematopoietic cells, and to use these to trace residual leukemia after treatment. It is important to stress that these LAIPs consist of normally occurring markers/marker combinations that are present in low or very low frequencies in normal and regenerating BM. Different types of aberrancies can be distinguished; the major ones are cross-lineage expression of antigens, antigen overexpression, lack of antigen expression, and asynchronous expression of antigens. LAIPs generally consist of the pan-leukocyte marker CD45, a primitive marker (CD34, CD117, CD133), and a myeloid antigen (CD33, CD13), together defining normally occurring populations, plus 1 or more markers/marker combinations defined as aberrantly expressed, which may include the above-mentioned markers too. Although low, the background levels of LAIPs in normal and regenerating BM prevent specific detection of aberrancies with sensitivities higher than 1:10 000. Covering all possible aberrancies requires, for each new AML patient, the use of an extensive panel of monoclonal antibodies. The number of different LAIPs in AML may amount to about 100, as reported in a recent study.16
Identifying “different from normal” patterns.
Essentially, aberrant patterns of differentiation are recognized and translated into quantifiable aberrant cell populations (in fact LAIPs). It uses a standard fixed antibody panel to recognize leukemic cells based on their difference with normal hematopoietic cells at all stages of disease/treatment. The advantage is that it does not a priori restrict MRD assessment based on LAIPs defined at diagnosis only, thereby taking into account possible immunophenotype “shifts”.
Newer technologies
Digital PCR
Digital PCR (dPCR) is a refinement of conventional PCR methods that can be used to directly quantify and clonally amplify nucleic acids. The key difference between dPCR and traditional PCR lies in the method of quantifying nucleic acids. Conventional PCR carries out 1 reaction per sample. dPCR also carries out a single reaction within a sample; however, the sample is separated into a large number of partitions and the reaction is carried out in each partition individually. This separation allows a more reliable and sensitive measurement of nucleic acid amounts and does not need a standard curve. It has been shown to be a reliable tool for MRD assessment in lymphoid malignancies with greater applicability and reduced labor intensiveness than RT-qPCR.17 Also in NPM1+ AML, dPCR has been shown to be applicable for a large variety of NPM1 mutation subtypes without the need for plasmid standards.18 It is also reliable for identifying patients at risk during follow-up.18
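The standard-curve-free quantification described above rests on Poisson statistics over the partitions: if targets distribute randomly, the fraction of negative partitions equals exp(−λ), where λ is the mean number of copies per partition. The following Python sketch is illustrative only; the function names and the partition-volume parameter are assumptions introduced here, not taken from the cited studies.

```python
import math


def copies_per_partition(positive: int, total: int) -> float:
    """Estimate the mean number of target copies per partition (lambda).

    Assumes targets are distributed randomly over partitions (Poisson),
    so the fraction of negative partitions equals exp(-lambda).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= positive < total:
        # If every partition is positive the sample is saturated and
        # lambda cannot be estimated from the negative fraction.
        raise ValueError("positive must satisfy 0 <= positive < total")
    negative_fraction = (total - positive) / total
    return -math.log(negative_fraction)


def copies_per_microliter(positive: int, total: int,
                          partition_volume_ul: float) -> float:
    """Convert copies per partition to copies per microliter of reaction.

    partition_volume_ul is instrument specific and must be supplied by
    the caller (hypothetical parameter, not given in the text).
    """
    if partition_volume_ul <= 0:
        raise ValueError("partition_volume_ul must be positive")
    return copies_per_partition(positive, total) / partition_volume_ul
```

For example, when half of all partitions are positive, the estimate is ln 2, or about 0.69 copies per partition; no calibration against a dilution series is involved.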
Next-generation sequencing
In a French study assessing MRD in the NPM1-mutated genetic subgroup, IDH1/2 and DNMT3A mutations were quantified by NGS.19 IDH1/2 mutations were reliable markers for prediction of relapse in 100% of cases, whereas DNMT3A mutations were not correlated with relapse after long follow-up. Persistence of DNMT3A mutations in 40% of the patients on the one hand fits the concept of clonal hematopoiesis but, on the other hand, makes DNMT3A less suitable for specific MRD detection.
RUNX1 is a frequently mutated gene in AML. Because it lacks a mutational hotspot, MRD detection requires patient-specific RT-qPCR, which, due to labor intensiveness and costs, is not attractive for MRD application. In a recent series, RUNX1 mutations were detected by deep sequencing in 25.9% of 814 patients.20 Separating patients according to the median residual mutation burden revealed significant differences in outcome.
Heuser and colleagues used NGS to detect NPM1 and FLT3-ITD mutations at follow-up in remission. Parallel assessment of MRD by NGS and RT-qPCR in NPM1-mutated patients was concordant in 95% of analyzed samples.21 Many other mutations (eg, TET2, ASXL1) detected by NGS can probably be used for MRD assessment; however, clinical data are currently lacking. A possible risk is that these mutations can be present in a substantial number of cases in the premalignant lesion, implying that one has to be assured that MRD positivity represents recurrence of the disease. NGS is a promising tool to assess MRD, although it is currently applicable for only a restricted number of mutations. Various hurdles have to be taken, implying that introduction for MRD assessment in clinical practice will not be realized in the very short term. NGS will probably refine our MRD measurements further; however, standardization is currently lacking and the sensitivity level is about 1% (dependent on depth of sequencing), which cannot compete with the other MRD measurement techniques discussed in the previous sections.
False-positivity and false-negativity in MRD
Careful examination of the studies published thus far reveals that, in most cases, a single cutoff level that defines “MRD+” and “MRD−” patient groups results in a small proportion of patients who, while classified as MRD+, remain in CR. This may, at least in part, be related to persistence of preleukemic clones, for example, for a molecular aberration like t(8;21). Clinically, false-positivity may lead to overtreatment, which is especially important for the decision whether or not to treat with allogeneic transplantation. Increasing the threshold level that defines positivity usually results in better identification of patients at risk, but at the same time increases the number of MRD− patients with poor prognosis. However, in most studies, with cutoff levels in the order of 0.1%, already 20% to 40% of the patients classify as MRD false-negative. Further increase of these percentages would unacceptably decrease the value of MRD-based risk stratification. From this, one might argue that MRD-based risk stratification may ultimately benefit from the introduction of 2 instead of 1 cutoff level, thereby defining 3 patient groups in which different prognostic impact could then be combined with the other well-known prognostic parameters. Given that, it is clearly of importance to better identify the possible underlying causes of false MRD negativity. There are several: (1) sensitivity of the assay, (2) phenotypic shifts, (3) differences in time of onset of outgrowth to relapse, (4) other biologically important factors such as frequencies of residual LSCs, and (5) quality of the aspirate. These causes are explored in further detail in the following sections.
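The suggestion of 2 cutoff levels defining 3 patient groups can be sketched as follows. This is a minimal Python illustration under stated assumptions: the lower cutoff of 0.01% and the group labels are hypothetical placeholders, and only the 0.1% level reflects the order of magnitude mentioned above. Actual cutoffs would have to be established for each assay and time point.

```python
def classify_mrd(mrd_percent: float,
                 low_cutoff: float = 0.01,
                 high_cutoff: float = 0.1) -> str:
    """Assign an MRD value (percentage of WBCs) to 1 of 3 groups.

    low_cutoff is a hypothetical value chosen for illustration;
    high_cutoff mirrors the 0.1% order of magnitude used in most studies.
    """
    if mrd_percent < 0:
        raise ValueError("mrd_percent must be non-negative")
    if low_cutoff >= high_cutoff:
        raise ValueError("low_cutoff must be below high_cutoff")
    if mrd_percent < low_cutoff:
        return "MRD-low"
    if mrd_percent < high_cutoff:
        return "MRD-intermediate"
    return "MRD-high"
```

The resulting group could then be combined with other prognostic parameters, as proposed in the text, rather than being used alone.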
Sensitivity of the assay
As referred to previously, assay sensitivity is restricted by its relatively low specificity.
Phenotypic shifts
Initially present aberrancies used for MRD detection can change or disappear while the leukemia-initiating clone/cell population remains (reviewed in Zeijlemaker et al22).
Differences in time of onset of outgrowth to relapse
Such differences are not adequately covered by assessing MRD at 1 particular time point after therapy.
Other biologically important factors such as frequencies of residual LSCs
LSCs are thought to be chemotherapy resistant and thereby to be at the basis of outgrowth of MRD cells to overt relapse. Different cellular compartments may contain LSCs. Based on relative therapy resistance, ability to grow out in different mouse models, and correlation with leukemic engraftment,23-25 as well as prognostic strength at diagnosis,26 CD34+CD38− LSCs qualify best for assessment under MRD conditions, at least in CD34+ AML cases. Identification of LSCs is challenging because of the very low frequency, down to 1 in 5 × 10^6 WBCs. CD34+CD38− LSCs often aberrantly express a plethora of cell surface markers in highly heterogeneous patterns with, similar to MRD, immunophenotype changes between diagnosis and relapse. For these reasons, similar to MRD, extensive antibody panels are required. Thanks to the high specificity of LSC detection,27 it was possible to replace a multitube 8-color antibody panel by a single-tube 8-color LSC assay (Table 1). This enables CD34+CD38− LSC detection in a broadly applicable, less expensive, and more efficient manner than current detection strategies. A further advantage is that a more objective analysis is possible with much less requirement of extensive experience of normal and leukemic BM differentiation patterns as is necessary for MPFC MRD. It remains to be established whether both MRD and LSC assays are necessary for optimal prognostication.
Quality of the aspirate
MRD levels in peripheral blood (PB) are lower than in BM, so contamination with blood, when not sending in the first tap for examination but continuing to aspirate from the same site, will result in hemodilution and cause underrepresentation of the number of leukemic cells. This has been clearly shown in childhood acute lymphoblastic leukemia (ALL). For an optimal MRD assessment, it is required that aspirates are taken from separate puncture sites. Unequal infiltration of the BM space, BM fibrosis, and adhesive properties of blasts can all cause false-negative MRD results. Also, a heterogeneous response to therapy in the BM compartment may cause false MRD negativity. Positron emission tomography, using the proliferation marker 3′-deoxy-3′-[18F]fluoro-l-thymidine, for early assessment of treatment response in patients with AML, showed a marked heterogeneity in response, especially in patients with refractory disease.28 An aspirate for assessment of response is typically a single point measurement. Consequently, this might sample a region negative for residual disease despite the presence of residual disease in other parts of BM. In 1987, Martens and Hagenbeek showed in a rat model that distribution of leukemic cells after chemotherapy was heterogeneous, which contrasted with the homogeneous distribution at diagnosis.29 Theoretically, the number of MRD cells in peripheral blood may better represent the overall leukemic burden at different BM sites.
PB MRD vs BM MRD
PB would thus be an attractive and reliable alternative source for MRD detection that also allows frequent sampling, facilitating sequential MRD monitoring. Sensitivity of MPFC in PB is lower,30,31 whereas specificity may be higher as compared with the BM MRD assay.30 The latter is likely caused by a much lower frequency in PB compared with BM of “background” normal progenitor populations. Studies in ALL have already proven that assessment of BM MRD can be replaced by PB MRD.
In addition, molecular MRD (in patients with NPM1 mutant, core-binding factor [CBF], or with overexpressed WT1) on PB paralleled BM assessment. For MPFC, Maurillo et al found promising results of PB MRD in a series of samples from 50 patients.31 Recently, in a larger study (378 paired samples in 114 patients), MPFC-based MRD significantly correlated between PB and BM (r = 0.67, P < .001).30 In 78 patients, it was shown that the cumulative incidence of relapse (CIR) 1 year after induction therapy was 29% for PB MRD− patients and 89% for MRD+ patients (P < .001). Although less sensitive, the high specificity of PB MRD may thus outweigh this disadvantage, offering a promise for a prominent role in future clinical treatment decisions.
In conclusion, it is clear that various methods to assess MRD are available. Some are more standardized than others, and a few are still in the development phase. Concerning the molecular and flow cytometric MRD, a broad combined US and European initiative to generate technical as well as clinical recommendations for MRD has been undertaken under the umbrella of European LeukemiaNet (ELN) and will be finalized soon.
Clinical aspects
Although the concept of MRD negativity as an indicator for the quality of treatment response is the same in AML and other hematological diseases such as chronic myeloid leukemia, multiple myeloma, and ALL, application of MRD assessment in AML has lagged behind.
Standard therapy produces CRs even in patients whose cytogenetic/molecular features suggest a poor outcome. However, the quality of the remission measured by MRD affects its duration. Incorporation of on-treatment data is becoming increasingly important. It is now generally accepted that levels of MRD assessed at particular time points during treatment offer an independent biomarker to predict outcome. Most of the available data have been derived from clinical studies with intensive AML treatment. The predictive value of MRD assessment in less-intensive treatment schedules is not studied very extensively. At least the timing of assessment will be different. From a cost-effectiveness perspective, we would recommend performing the assessment after 2 cycles of intensive chemotherapy to predict outcome.
Selected data from clinical studies using molecular MRD
Acute promyelocytic leukemia
MRD assessment has been introduced as a component of the standard response criteria in acute promyelocytic leukemia (APL). Grimwade et al showed in a large Medical Research Council (MRC) study with 406 APL patients that MRD assessed using the standardized EAC PML-RARA RT-qPCR proved to be a more powerful independent predictor of relapse than the WBC count, which usually dictates treatment.32 Given the high survival rates in low-risk APL, sequential monitoring is most useful in high-risk APL and relapsed APL. BM is the recommended source for monitoring because PB has a 1.5 log lower sensitivity.
CBF leukemias
RUNX1-RUNX1T1.
MRD assessment by RT-qPCR in AML with the RUNX1-RUNX1T1 transcript has been shown to be important. Molecular remission or a significant reduction in RUNX1-RUNX1T1 transcripts is strongly associated with a favorable outcome, as shown in several studies. In a study of 96 t(8;21) AML patients, high WBC count, the presence of KIT mutations and of FLT3-ITD and FLT3-TKD, as well as a <3-log MRD reduction were associated in univariate analysis with a higher hazard rate for relapse.33 However, in multivariate analysis, MRD reduction remained the only prognostic factor. At 3 years, the CIR was 22% vs 54% in patients who achieved ≥3-log MRD reduction vs the others. These data were confirmed by Yin et al who, in a series of 163 patients taken from the MRC AML-15 trial, showed that >3 log reduction in RUNX1-RUNX1T1 transcripts in BM after 1 cycle of induction therapy was the strongest prognostic variable for relapse in multivariate analysis: relapse rate was only 4%, compared with >30% for those who did not reach this threshold.34 The fact that this did not translate into a better overall survival (OS) can be explained by excellent rescue, mostly by allogeneic stem cell transplantation (alloSCT), after relapse. In all studies, rising MRD levels, assessed by serial monitoring, predicted relapse.
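The log-reduction criterion used in these studies is simple arithmetic on transcript levels measured at diagnosis and at follow-up. The Python sketch below is illustrative only; it assumes both values are normalized transcript levels expressed on the same scale, and the function names are hypothetical.

```python
import math


def log_reduction(baseline: float, follow_up: float) -> float:
    """Return the log10 reduction from baseline to follow-up.

    Both inputs must be positive. An undetectable follow-up result has
    to be handled by the caller (for example by substituting the assay's
    detection limit), because log10 of zero is undefined.
    """
    if baseline <= 0 or follow_up <= 0:
        raise ValueError("transcript levels must be positive")
    return math.log10(baseline / follow_up)


def reaches_log_reduction(baseline: float, follow_up: float,
                          threshold: float = 3.0) -> bool:
    """Check whether the reduction meets a threshold (default: 3 log)."""
    return log_reduction(baseline, follow_up) >= threshold
```

A fall from 100 000 to 100 normalized copies corresponds to a 3-log reduction, whereas a fall from 100 000 to 1000 is only a 2-log reduction and would not meet the ≥3-log criterion.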
CBFB-MYH11.
Similar to t(8;21) AML, assessment of CBFB-MYH11 transcripts has been shown in 2 studies to be very valuable for predicting outcome. In 98 patients with either inv(16)(p13.1;q22) or t(16;16)(p13.1;q22), Jourdan et al showed that MRD level after the second consolidation cycle of chemotherapy was the only significant prognostic parameter in multivariate analysis for relapse-free survival (RFS).33 Similarly, for inv(16) AML as well, MRD proved to be a very good predictor of relapse: Yin et al identified cutoff MRD thresholds in BM (>50 copies) and PB (>10 copies) with a 100% relapse rate.34 In this study, it was also shown that there was a significant correlation between BM and PB levels in both groups of CBF AMLs during follow-up. Based on their data, serial monitoring should be performed at 3-month intervals during follow-up, creating a 3- to 4-month window for preemptive treatment.
Importantly, in t(8;21) patients, MRD negativity is not a condition sine qua non in contrast to inv(16) patients.
NPM1-mutated AML
Mutations of the NPM1 gene are among the most common molecular aberrancies in AML, present in roughly 30% of all AML and approaching 60% in normal karyotype AML. The prognosis is dependent on the presence or absence of additional mutations (FLT3-ITD and DNMT3A): patients with FLT3-ITD or mutated DNMT3A have a poorer outcome as compared with NPM1 single-mutated as well as non-FLT3-ITD cases. The NPM1 mutations are heterogeneous (>50 reported) but provide an ideal leukemia-specific target for MRD detection by RT-qPCR because 3 types of mutations (A, B, D) account for 90% of NPM1-mutated cases. This contrasts with RUNX1 mutations as discussed previously. Krönke et al showed in 245 patients that, after double induction, RT-qPCR negativity for NPM1 had a CIR of 6.5% at 4 years vs 53% for NPM1+ patients.10 This was confirmed in retrospective studies.9,35 Also, rising levels of NPM1 after chemotherapy or alloSCT were predictive for relapse.
Recently, Ivey et al showed the complexity of NPM1-mutated AML. By molecular profiling, >150 subgroups could be distinguished, of which, in univariate analysis, only the presence of FLT3-ITD and DNMT3A mutations was associated with outcome.36 In multivariate analysis, only MRD status (measured by RT-qPCR of NPM1) remained as a significant prognostic factor. The median sensitivity of the assay was 1 × 10^-5. Risk of relapse at 3 years was 30% in case of an absence of NPM1 transcripts vs 82% in those with detectable transcripts. On sequential monitoring, relapse could be predicted by rising levels of NPM1 transcripts.
Other targets
Other targets include (1) Wilms tumor 1 (WT1) gene expression, (2) MLL-MLLT3, and (3) FLT3-ITD, explored in further detail in the following sections.
WT1 gene expression.
WT1 is not leukemia-specific, limiting the capacity to distinguish low-level MRD from expression in normal PB and BM cells. However, in an ELN study, it was shown to be possible to assess at least a 2-log reduction in transcript level in about 45% of the cases, which provided independent prognostic information. Because of lack of specificity and limited sensitivity, and because more suitable tools are available for MRD detection, WT1 is not a likely candidate for routine MRD assessment.11
MLL-MLLT3.
MLL-MLLT3 is very rare in adults (2% of AMLs) in contrast to childhood AML where frequencies of over 10% have been reported. A sensitive RT-qPCR for different fusion products has been developed and found useful for MRD assessment to predict outcome. MRD negativity by this assay showed a very low relapse rate of 11% and an OS of 70% at 48 months while all MRD+ patients relapsed and died.37
FLT3-ITD.
FLT3-ITD mutations are present in 25% to 30% of AML patients and are correlated with poor outcome. Because of technical limitations (necessity for clone-specific primers/probes for patient-specific RT-qPCR) and instability of the marker, the clinical applicability for MRD assessment is currently limited.
In conclusion, at present for routine clinical application of molecular MRD assessment in adult AML, it is only the CBF and NPM1+ AML cases that are good candidates.
Selected data from clinical studies using MPFC
Many studies in children as well as adults already demonstrated that MRD detection by MPFC provides strong prognostic information in AML after both induction and consolidation therapy (summarized in Table 2). Importantly, one has to realize that most of these studies were retrospective and in a single-institute setting, thus introducing potential bias. Marker panels and instrumentation varied. By identifying cutoff values in the order of 10^-3 to 10^-4, it has been possible in most of these studies to identify 2 patient groups with either relatively poor or relatively good prognosis. Determining the optimal cutoff is mostly done by empirically setting a threshold below or above which patient outcome is statistically significantly different. In part of these cases, the thresholds found were validated in a prospective setting. Various statistical methods have been used, such as maximally selected log-rank statistics or receiver operating characteristic analysis. A couple of studies have now also been performed in a multicenter and, in 1 instance, also a multinational setting. Two of the largest studies are summarized here.
The HOVON group
The HOVON group established the value of immunophenotypic-determined MRD in patients younger than 61 years.16 MRD was evaluated in BM in 389 patients available for analysis. After all courses of therapy, low MRD values distinguished patients with relatively favorable outcome from those with high relapse rate and adverse RFS and OS. In the whole patient group and in the subgroup with intermediate-risk cytogenetics, MRD was an independent prognostic factor. RFS at 4 years was 23% in MRD+ patients vs 52% in MRD− patients and the rate of relapse at 4 years was 72% vs 42%, respectively. Multivariate analysis after cycle 2, when decisions about consolidation treatment have to be made, confirmed that high MRD values (>0.1% of WBCs) were associated with a higher risk of relapse after adjustment for consolidation treatment as a time-dependent covariate, risk score, and early or later CR.
The UK MRC group
The UK MRC group assessed MRD in 427 patients older than 60 years. MRD negativity conferred significantly better 3-year survival from CR (42% vs 26% in MRD+ patients after cycle I, and 38% vs 18%, respectively, after cycle II).38 MRD negativity was also associated with reduced relapse, and MRD+ patients had a higher risk of early relapse (median time to relapse, 8.5 months vs 17.1 months in MRD− patients). In multivariate analysis, MRD status was an independent prognostic factor, identifying a subgroup of intermediate-risk patients with poor outcome.
Collectively, these studies showed that low levels of MRD were associated with improved survival and lower risk of relapse. Many studies included a multivariate analysis showing that MRD was an independent predictor of RFS, OS, event-free survival (EFS), or combinations, superior to other well-defined prognostic factors such as AML type, age, WBC count at diagnosis, and classification of cytogenetic risk.
Pretransplant MRD
Evidence is accumulating that presence of MRD pretransplant is a powerful predictor of outcome after alloSCT in AML. Most of the available data comes from the Walter group at Fred Hutchinson Cancer Research Center (FHCRC; Seattle, WA). They showed that in 253 patients transplanted after myeloablative therapy in CR1, the 3-year OS was 73% for pretransplant MRD− patients vs 32% for pretransplant MRD+ patients and 73% vs 44%, respectively, for patients in CR2. Relapse rates were 21% for MRD− patients vs 58% for MRD+ CR1 patients and 19% vs 68% for CR2 patients. In a recent update, Araki et al60 showed that in 359 adults, the 3-year relapse rates were 67% in MRD+ as compared with 22% in MRD− patients, resulting in an OS of 26% vs 73%, respectively.39 These authors also showed that MRD status has the same predictive value in the nonmyeloablative transplant setting. Although this conclusion was based on retrospective data, this study does not support the preferential use of myeloablative conditioning in case of pretransplant MRD positivity. The quality of the response preceding transplantation seems to be the most important predictor of transplant outcome. The same group presented data that MRD positivity before myeloablative cord blood transplantation in patients has no impact on relapse rate and survival, indicating that the greater graft-versus-leukemia effect of this procedure can overcome the persistence of residual disease present before transplantation.
The recent finding that conversion of MRD positivity pretransplant to MRD negativity after myeloablative conditioning does not improve relapse rate or OS is disappointing.
Obviously, additive peritransplant strategies, like posttransplant epigenetic or immunotherapeutic approaches, are necessary to improve outcome.
LSCs as predictive marker for outcome
Treatment failure in AML is most probably caused by the presence at diagnosis of leukemia-initiating cells, also referred to as LSCs, and their persistence after therapy. We developed flow cytometric methods using LSC-associated markers and newly defined aberrancies to identify LSCs and to distinguish these from normal hematopoietic stem cells (HSCs).
At diagnosis, the frequency of the thus-defined neoplastic part of the CD34+CD38− putative stem cell compartment had a strong prognostic impact.26 Likewise, after different courses of therapy, higher percentages of neoplastic CD34+CD38− cells in CR strongly correlated with shorter patient survival (Figure 1). Discrimination between putative LSCs and HSCs in this study thus allowed for demonstration of the clinical importance of putative CD34+CD38− LSCs in AML. Moreover, combining neoplastic CD34+CD38− frequencies with frequencies of MRD cells, which reflect the total neoplastic burden, revealed 4 patient groups with different survival. As discussed before, a single-cell tube has been constructed that is now being prospectively validated in a large HOVON study.30
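The combination of the 2 measurements into 4 patient groups can be read as a simple two-by-two stratification. The Python sketch below is illustrative only: the function name, the boolean inputs, and the group labels are assumptions made for this example, and the cutoffs that define positivity for either marker are deliberately left out because they are not given here.

```python
from itertools import product


def combined_group(lsc_positive: bool, mrd_positive: bool) -> str:
    """Return a hypothetical label for one of the 4 LSC/MRD combinations.

    Both flags are assumed to have been derived elsewhere from
    study-specific cutoffs, which this sketch does not define.
    """
    lsc = "LSC+" if lsc_positive else "LSC-"
    mrd = "MRD+" if mrd_positive else "MRD-"
    return f"{lsc}/{mrd}"


# Enumerate every combination of the two binary markers.
groups = [combined_group(lsc, mrd) for lsc, mrd in product([False, True], repeat=2)]
print(groups)
```

The sketch only enumerates the strata; it says nothing about the survival observed in each group.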
Current status of clinical application of MRD
MRD negativity is becoming accepted as a potential surrogate for clinical benefit in some hematological malignancies like chronic lymphocytic leukemia, ALL, and APL. For AML, this is still doubtful because of the heterogeneity of the disease and the complexity of the MRD assessment.
Two nonrandomized studies point clearly in the direction that treatment intensification guided by MRD measurement improves the outcome; they are explored in further detail in the following sections.
Risk-directed study
Rubnitz et al applied risk-directed therapy in childhood AML.40 Risk was determined by the risk profile at diagnosis and the level of MRD after the first cycle of chemotherapy. MRD was measured by flow cytometric determination of LAIP. Levels of MRD were used to guide treatment intensification (addition of Mylotarg) and the timing of the second induction cycle. The outcome was superior to that of other comparable trials reported in childhood AML, suggesting that this success was due to the risk-stratification strategy based on MRD findings.
Risk-adapted study
Zhu et al performed a risk-adapted nonrandomized study in CBF AML.41 Patients were assigned to alloSCT after the second consolidation treatment if MRD+. In case of MRD negativity, the treatment consisted of chemotherapy/autologous SCT. Some patients decided not to follow the assigned treatment based on the MRD status. It was shown that the patients receiving treatment other than that assigned by the risk status did worse than those who completed their assigned treatment. Although the authors showed that both groups were balanced, a bias cannot be excluded in this retrospective study design.
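The assignment rule described above can be written out as a small decision function. This is a minimal sketch, not the trial protocol: the function names, the string labels, and the boolean input are assumptions made for illustration, and the definition of MRD positivity used in the study is not reproduced.

```python
def assigned_treatment(mrd_positive: bool) -> str:
    """Hypothetical encoding of the MRD-based assignment.

    mrd_positive is assumed to reflect MRD status after the second
    consolidation treatment.
    """
    return "alloSCT" if mrd_positive else "chemotherapy/autologous SCT"


def followed_assignment(mrd_positive: bool, received: str) -> bool:
    """Check whether the treatment received matches the assigned one."""
    return received == assigned_treatment(mrd_positive)


# Example: an MRD+ patient who received chemotherapy/autologous SCT
# did not follow the assigned treatment.
print(followed_assignment(True, "chemotherapy/autologous SCT"))
```

Separating the assignment from the treatment actually received makes the comparison in the study explicit: outcome was compared between patients who followed the assignment and those who did not.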
Randomized studies that unambiguously show the clinical benefit achievable with risk-adapted MRD-based treatment are clearly lacking. The UK MRC AML18 trial, at least, is trying to answer this question by randomizing for MRD assessment.
However, although perhaps premature, many trial groups and individual centers have started to guide treatment based on MRD status. The decision to proceed with an alloSCT can be adapted by taking the MRD status into account. Posttransplantation monitoring of MRD could also guide the prescription of immune-suppressive drugs (see Figure 2).
Whether MRD can be used as a surrogate end point for survival is still debatable. However, once established, it would be very helpful, not only to redefine risk groups, but also in the evaluation of new drugs by offering the possibility of faster drug approval or, in contrast, stopping development of drugs and treatment strategies that are ineffective and suboptimal. Currently, 2 studies may serve as examples of how MRD can be used as a surrogate for survival end points; they are explored in the following sections.
Daunorubicin study
Prebet et al showed that, in CBF-AML, a higher dosage of daunorubicin (90 mg/m², 3 days) resulted in a better clinical outcome than the standard dosage (60 mg/m², 3 days), and that the higher dosage was associated with a significantly lower level of MRD.42
Gemtuzumab ozogamicin study
In the ALFA-0701 trial, the addition of Mylotarg (gemtuzumab ozogamicin [GO]) to standard-induction therapy did not result in a higher CR rate but showed an improvement in 2-year OS (53% vs 42%) and EFS (41% vs 17%). Interestingly, MRD negativity (assessed by NPM1 RT-qPCR) was significantly more frequent in the GO arm as compared with the control arm: 39% vs 7% after induction and 91% vs 61% at the end of treatment.43
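To make the size of these differences explicit, the percentages quoted above can be subtracted directly. The short Python sketch below does only that arithmetic: the variable and function names are assumptions for this example, and the differences are plain percentage-point gaps between arms, not a statistical analysis.

```python
# Percentages quoted in the text, as (GO arm, control arm).
reported = {
    "2-year OS": (53, 42),
    "2-year EFS": (41, 17),
    "MRD negativity after induction": (39, 7),
    "MRD negativity at end of treatment": (91, 61),
}


def percentage_point_difference(go_arm: int, control_arm: int) -> int:
    """Absolute difference between arms, in percentage points."""
    return go_arm - control_arm


differences = {
    end_point: percentage_point_difference(go, control)
    for end_point, (go, control) in reported.items()
}

for end_point, diff in differences.items():
    print(f"{end_point}: +{diff} percentage points in the GO arm")
```

Read this way, the gap in MRD negativity between the arms (32 and 30 percentage points) is at least as large as the gaps in EFS (24) and OS (11), which is the pattern expected if MRD is to act as an early surrogate for the survival end points.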
Conclusion
A definition of disease burden better than morphologic CR is now emerging. MRD assessed by flow or molecular techniques should become standard in every clinical trial in AML. Harmonization of antibody panels, introduction of single-cell tube systems (LSCs) and standardized analytical programs will pave the way for MRD-based individual risk assessment and for MRD to become a surrogate end point for survival in studies investigating new drugs, thereby hopefully resulting in faster drug approval in AML.
Correspondence
Gert Ossenkoppele, VU University Medical Center Amsterdam, Amsterdam, The Netherlands; e-mail: g.ossenkoppele@vumc.nl.
Competing Interests
Conflict-of-interest disclosure: G.O. has received research funding from Johnson & Johnson (J&J), Celgene, Karyopharm, and Novartis, has consulted for J&J and Karyopharm, and has received honoraria from J&J, Celgene, and Roche. G.J.S. declares no competing financial interests.
Author notes
Off-label drug use: None disclosed.