Key Points
The predictive model stratified CML patients into 3 risk subgroups with significantly different cumulative incidences of TKI-therapy failure.
Although tyrosine kinase inhibitor (TKI) therapy has markedly improved the survival of people with chronic-phase chronic myeloid leukemia (CML), 20% to 30% of people still experience therapy failure. Data from 1955 consecutive patients with chronic-phase CML, diagnosed according to the European LeukemiaNet recommendations and receiving initial imatinib or second-generation (2G) TKI therapy at 1 center, were interrogated to develop a clinical prediction model for TKI-therapy failure. This model was subsequently validated in 3454 patients from 76 other centers. Using the clinical covariates associated with TKI-therapy failure, we developed a model that stratified patients into low-, intermediate-, and high-risk subgroups with significantly different cumulative incidences of therapy failure (P < .001). Discrimination and calibration were good in the external validation data set, and performance was consistent with that in the training data set. Our model had better predictive discrimination than the Sokal and European Treatment and Outcome Study long-term survival scores, with greater time-dependent area under the receiver-operator characteristic curve values and a better ability to redefine the risk of therapy failure. Our model could help physicians estimate the likelihood of initial imatinib or 2G TKI–therapy failure in people with chronic-phase CML.
Introduction
Tyrosine kinase inhibitor (TKI) therapy has markedly improved the survival of persons with chronic-phase chronic myeloid leukemia (CML), with 85% to 90% of persons surviving for >10 years.1-6 However, 20% to 30% of people still experienced therapy failure.6-13 Patients who met therapy-failure milestones had a high risk of disease progression and even death during TKI therapy.14-19 Guidance from the World Health Organization, the European LeukemiaNet (ELN), and the National Comprehensive Cancer Network suggests that switching TKI therapy should be considered for patients who failed to meet certain response milestones.20-25 Consequently, it is important to accurately predict the likelihood of therapy failure in persons with chronic-phase CML when choosing which TKI to begin with.
Recently, we developed a predictive model of therapy failure in persons with chronic-phase CML initially treated with imatinib.11,13,26 However, with the increasing number of persons receiving initial second-generation (2G) TKI therapy, existing predictive scoring systems seem inadequate to meet the clinical need for identifying patients at high risk of TKI-therapy failure. To address this need, we interrogated data from 1955 consecutive patients with chronic-phase CML, diagnosed according to the ELN recommendations at 1 center, to develop the predictive model and then externally validated it in 3454 patients from 76 other centers.20-23 The model stratifies patients into low-, intermediate-, and high-risk subgroups with significantly different cumulative incidences of therapy failure. Our model could help physicians estimate the likelihood of initial imatinib or 2G-TKI therapy failure in people with chronic-phase CML.
Methods
Patients
Data from consecutive patients aged ≥18 years with newly diagnosed chronic-phase CML with the e14a2 and/or e13a2 BCR::ABL transcripts receiving initial imatinib or a 2G-TKI including nilotinib, dasatinib, or flumatinib therapy at Peking University People’s Hospital from January 2006 to September 2023 were interrogated to develop the model (training data set). Data from patients from January 2006 to January 2023 with the same inclusion criteria from 76 other centers were used to validate the model (validation data set). The patients included in the study had regular follow-up during TKI therapy. Demographic and clinical covariates including complete blood count, percentages of blood blasts, basophils and eosinophils, spleen size below the costal margin, comorbidities, and initial TKI therapy, and results of hematological, cytogenetic, and molecular analyses were extracted from the medical records. Patient data were obtained before therapy. Starting doses were imatinib, 400 mg per day; nilotinib, 300 mg twice daily; dasatinib, 100 mg per day; and flumatinib, 600 mg per day. These doses were subsequently adjusted according to responses and/or adverse events on the basis of ELN recommendations.20-23 The study was approved by the ethics committee of Peking University People’s Hospital (no. 2022PHB103-001), and all patients provided written informed consent, which was consistent with the precepts of the Declaration of Helsinki.
Diagnosis, monitoring, and definitions
Diagnosis and monitoring were performed according to the ELN recommendations.20-23 Sokal and European Treatment and Outcome Study long-term survival (ELTS) scores at diagnosis were calculated as previously described.4,27 Bone marrow cytogenetic analyses were performed using the G-banding method. Additional cytogenetic abnormalities (ACAs) in Philadelphia chromosome–positive (Ph+) cells were identified as described previously,23,28 and high-risk ACAs were defined by the 2020 ELN recommendations.23 All ACAs were independently reviewed by 2 cytogeneticists. When the interpretations were inconsistent, they were referred to a senior cytogeneticist for reevaluation. Blood samples were used to analyze BCR::ABL transcripts at diagnosis. During therapy, BCR::ABL transcript levels were analyzed by quantitative real-time polymerase chain reaction with an ABL1 control and converted to the international scale (IS; BCR::ABLIS) using laboratory-specific conversion factors validated at the Institute of Medical and Veterinary Science International Reference Laboratory when the value (IS) was <10%.29
Hematologic response was monitored every 1 to 2 weeks until a complete hematologic response (CHR) was achieved, and every 3 to 6 months thereafter. Cytogenetic response was assessed at baseline and every 3 to 6 months thereafter until a complete cytogenetic response (CCyR) was achieved, and repeated when patients failed treatment. Quantitative real-time polymerase chain reaction monitoring was performed at diagnosis and every 3 months thereafter until a major molecular response (MMR) was achieved, and every 3 to 6 months thereafter. ABL1 mutation screening was performed in patients with a suboptimal response, a warning, or failure on TKI therapy according to the ELN recommendations.20-23
Response definitions were as follows: (1) CHR: white blood cell count <10 × 10^9/L, platelets <450 × 10^9/L, no blood blasts or promyelocytes, <5% blood myelocytes and metamyelocytes, <5% blood basophils and no extramedullary leukemia, and duration of ≥4 weeks; (2) CCyR: no Ph+ cells in ≥20 bone marrow metaphases; (3) MMR: BCR::ABLIS ≤0.1%; (4) molecular response 4 (MR4): BCR::ABLIS ≤0.01%; and (5) molecular response 4.5 (MR4.5): BCR::ABLIS ≤0.0032%.20-23 TKI-therapy failure was defined as meeting “failure” milestones in the 2020 ELN recommendations23; loss of responses including CHR, CCyR, or MMR; or transformation to advanced phase as defined by the ELN recommendations.20-23 Failure-free survival (FFS) was calculated from the start of TKI therapy to the first therapy failure or censored at a transplant, death, or the last follow-up. We analyzed only the first episode in patients with >1 therapy failure. Transformation was defined as blood or bone marrow blasts of ≥15%, increased lymphoblasts, or extramedullary leukemia during TKI therapy.24,30 Transformation-free survival (TFS) was calculated from the start of TKI therapy to transformation or censored at a transplant, death, or the last follow-up. Survival was calculated from the start of TKI therapy to death or censored at transplant or the last follow-up. The last follow-up was 31 January 2024.
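As an illustration only, the molecular response thresholds listed above can be written as a small Python function. The function name and the "no MMR" label are hypothetical and are not part of the ELN recommendations; the input is assumed to be a BCR::ABL transcript level on the IS, expressed in percent.

```python
def molecular_response(bcr_abl_is_percent: float) -> str:
    """Return the deepest molecular response reached for one IS value (%).

    Thresholds follow the definitions in the text:
    MMR <= 0.1%, MR4 <= 0.01%, MR4.5 <= 0.0032%.
    """
    if bcr_abl_is_percent < 0:
        raise ValueError("transcript level cannot be negative")
    if bcr_abl_is_percent <= 0.0032:
        return "MR4.5"
    if bcr_abl_is_percent <= 0.01:
        return "MR4"
    if bcr_abl_is_percent <= 0.1:
        return "MMR"
    return "no MMR"  # hypothetical label for values above 0.1%


if __name__ == "__main__":
    for level in (1.0, 0.05, 0.01, 0.003):
        print(level, molecular_response(level))
```

Checking the thresholds from deepest to shallowest means each value is assigned to the deepest category it satisfies.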
Statistical analyses
Descriptive statistics were used to summarize covariates. Categorical covariates were reported as percentages and counts. Continuous variables were reported as medians and ranges or interquartile ranges (IQRs). The Pearson χ2 test was used to analyze categorical covariates. The Student t test (normal distribution) or the Mann-Whitney U test (nonnormal distribution) was used to analyze continuous covariates. Cumulative incidences of therapy failure were calculated using the competing risk model and compared by the Fine-Gray test considering competing events, which were defined as transplant or death. FFS was calculated by the Kaplan-Meier method and compared by the log-rank test.
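To show how a competing-risk cumulative incidence differs from the complement of a Kaplan-Meier estimate, the following is a minimal Python sketch of the nonparametric (Aalen-Johansen type) cumulative incidence estimator. It is not the code used in this study; the function name, the event coding (0 = censored, 1 = therapy failure, 2 = transplant or death), and the toy data are assumptions made for illustration.

```python
def cumulative_incidence(times, events, cause=1):
    """Nonparametric cumulative incidence of `cause` with competing events.

    times  : follow-up time for each patient
    events : 0 = censored, 1 = event of interest, 2 = competing event
             (this coding is an assumption of the sketch)
    Returns a list of (time, cumulative incidence) at each event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    event_free = 1.0  # probability of no event of any type before time t
    cif = 0.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Collect everyone whose follow-up ends at time t (handles ties).
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
            if data[i][1] == cause:
                d_cause += 1
            i += 1
        if d_any > 0:
            cif += event_free * d_cause / n_at_risk
            event_free *= 1 - d_any / n_at_risk
            curve.append((t, cif))
        n_at_risk -= n_here
    return curve


if __name__ == "__main__":
    # Toy data: failure at 1, competing event at 2, censored at 3, failure at 4.
    print(cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1]))
```

In the toy data the cumulative incidence of failure ends at 0.75, whereas treating the competing event as censoring in a Kaplan-Meier analysis would give 1.0. This is the overestimation that the competing-risk approach avoids.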
Cox and Fine-Gray regression models were used for univariable and multivariable analyses to identify covariates associated with TKI-therapy failure. Candidate covariates included sex, age, spleen size below the costal margin, white blood cell, hemoglobin and platelet concentrations, percentages of blood blasts, basophils and eosinophils, comorbidity(ies), and Ph+ ACAs at diagnosis. Fractional polynomial transformations were used for the candidate continuous prognostic covariates in Cox regression models.31,32 The most statistically suitable polynomial transformation for each covariate was identified when modeling its influence on TKI-therapy failure. These transformations were also used for the Fine-Gray models. Interactions among candidate covariates were tested in regression models. Akaike information criterion was used to select prognostic covariates and build the best model.33,34 Significant prognostic covariates from multivariable analyses of the training data set were used to develop the predictive model. We used the final best multivariable regression model to develop a predictive model for therapy failure. A total of 1000 bootstrap samples in the training data set were used to identify candidate cutoffs.35,36 The Fine-Gray test, the minimal P value approach, and Bonferroni correction were used to select optimal cutoffs for significantly different cumulative incidences of therapy failure.37,38 Kernel density estimator was used to fit the smooth function to visualize the distribution of cutoffs.39 Subsequently, patients were classified into different therapy-failure risk subgroups based on the predictive model.
The time-dependent area under the receiver-operator characteristic curve (AUROC) was used to estimate the accuracy of the predictive model, and calibration plots were generated to determine how closely the predicted and observed cumulative incidences of therapy failure were concordant.40,41 Decision curve analysis (DCA) was used to calculate the net benefit using the model.42
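The net benefit reported by a DCA can be illustrated with the standard formula, net benefit = TP/n − (FP/n) × pt/(1 − pt), where pt is the threshold probability. The Python sketch below applies it to a binary outcome without censoring, which is a simplification: the analysis in this study concerns time-to-event data with competing risks and would need a time-dependent version. The function name and the toy data are hypothetical.

```python
def net_benefit(predicted_risk, outcome, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold.

    predicted_risk : predicted probabilities of therapy failure (0 to 1)
    outcome        : 1 if failure occurred, 0 otherwise (no censoring assumed)
    threshold      : threshold probability pt, strictly between 0 and 1
    """
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    n = len(outcome)
    true_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for p, y in zip(predicted_risk, outcome)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)


if __name__ == "__main__":
    risks = [0.1, 0.4, 0.6, 0.9]
    failed = [0, 0, 1, 1]
    for pt in (0.2, 0.5):
        print(pt, net_benefit(risks, failed, pt))
```

A decision curve plots this quantity over a range of thresholds and compares it with the "treat all" and "treat none" strategies.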
Propensity score matching (PSM) was used to adjust for differences in baseline covariates between patients receiving imatinib or a 2G-TKI as initial therapy, and balance was evaluated using the standardized absolute mean difference for which a score of <0.02 was considered balanced.43,44
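A balance check of this kind can be sketched in Python for a single continuous covariate. The formula used here, the absolute difference in means divided by the square root of the average of the 2 group variances, is one common definition of the standardized mean difference and is an assumption; the text does not state which variant was used. The function name and the toy ages are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(group_a, group_b):
    """Absolute standardized mean difference for a continuous covariate."""
    pooled_sd = sqrt((variance(group_a) + variance(group_b)) / 2)
    difference = abs(mean(group_a) - mean(group_b))
    if pooled_sd == 0:
        # No spread in either group: balanced only if the means coincide.
        return 0.0 if difference == 0 else float("inf")
    return difference / pooled_sd


if __name__ == "__main__":
    imatinib_age = [40, 50, 60]   # toy values, not study data
    tki_2g_age = [42, 52, 62]     # toy values, not study data
    smd = standardized_mean_difference(imatinib_age, tki_2g_age)
    # The text treats a value < 0.02 as balanced.
    print(smd, smd < 0.02)
```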
A 2-sided P < .05 was considered significant. SPSS 22.0 (SPSS, Chicago, IL), R version 4.0.2 (R Core Team, Vienna, Austria), and GraphPad Prism 8 (GraphPad Software Inc, La Jolla, CA) were used for analyses and graphing.
Results
Training data set
Data from 2587 consecutive patients receiving initial imatinib or a 2G-TKI therapy in the training data set are displayed in Figure 1. A total of 632 patients were excluded because of the interval from diagnosis to starting the TKI therapy of ≥6 months (n = 41), advanced phase at diagnosis (n = 236), missing important baseline covariates (n = 194), irregular response monitoring and/or loss to follow-up (n = 112), and non-e14a2 and/or e13a2 transcripts (n = 49) that could not be evaluated for molecular responses based on the IS using ELN recommendations.20-23 The remaining 1955 patients received initial imatinib (n = 1539, 79%) or a 2G-TKI (n = 416, 21%) including nilotinib (n = 280, 14%), dasatinib (n = 72, 4%), or flumatinib (n = 64, 3%).
Patient covariates are displayed in Table 1; 1195 (61%) patients were male; median age was 40 years (IQR, 30-52); and 68 (3%) patients had ACAs in Ph+ cells at diagnosis, 41 (2%) of whom had high-risk ACAs (supplemental Table 1, available on the Blood website). Median follow-up was 56 months (IQR, 30-91). Overall, 1584 (81%) patients continued receiving their initial TKI therapy including imatinib (n = 1216, 79%), nilotinib (n = 249, 89%), dasatinib (n = 61, 85%), or flumatinib (n = 58, 91%). A total of 495 patients (25%) failed initial TKI therapy at a median of 9 months (IQR, 4-14), because of not meeting the ELN therapy milestones at 3 (n = 112), 6 (n = 88), or 12 (n = 82) months; loss of responses (n = 50), emergence of TKI-resistant ABL1 mutation(s) (n = 56) or high-risk ACAs in Ph+ cells (n = 4); and transformation to advanced phase (n = 72) alone or combined (n = 31). Only 1 patient died of hepatocellular carcinoma before therapy failure. With a median follow-up of 65 months (IQR, 38-98) in the imatinib cohort, 8-year probabilities of FFS, TFS, and survival were 71% (95% confidence interval [CI], 68-74), 93% (95% CI, 91-95), and 96% (95% CI, 95-97). With a median follow-up of 36 months (IQR, 19-69) in the 2G-TKI cohort, 6-year probabilities of FFS, TFS, and survival were 71% (95% CI, 66-76), 90% (95% CI, 87-93), and 92% (95% CI, 88-96). Although the follow-up duration in the 2G-TKI cohort was significantly shorter than that in the imatinib cohort (P < .001), patients receiving initial 2G-TKI therapy had comparable outcomes including FFS (P = .69), TFS (P = .17), and survival (P = .39) compared with those receiving imatinib therapy. PSM analyses further confirmed these results (supplemental Table 2; supplemental Figure 1).
Covariate | Training data set (n = 1955) | Validation data set (n = 3454) | P value |
---|---|---|---|
Age, years, median (IQR) | 40 (30-52) | 42 (32-54) | <.001 |
Male, n (%) | 1195 (61) | 2075 (60) | .45 |
Spleen size, cm below costal margin, median (IQR) | 3 (0-10) | 2 (0-7) | <.001 |
WBC, ×10^9/L, median (IQR) | 122 (47-235) | 123 (51-225) | .59 |
Hemoglobin, g/L, median (IQR) | 115 (97-132) | 111 (94-127) | <.001 |
Platelets, ×10^9/L, median (IQR) | 410 (270-635) | 412 (263-625) | .99 |
Blood blasts, %, median (IQR) | 1 (0-3) | 1 (0-2) | <.001 |
Blood basophils, %, median (IQR) | 5 (2-8) | 4 (2-7) | <.001 |
Blood eosinophils, %, median (IQR) | 2 (1-4) | 2 (1-4) | .73 |
Sokal risk, n (%) | | | <.001 |
Low | 819 (42) | 1526 (44) | |
Intermediate | 555 (28) | 1157 (33) | |
High | 394 (20) | 543 (16) | |
Unknown | 187 (10) | 228 (7) | |
ELTS risk, n (%) | | | <.001 |
Low | 1115 (57) | 2207 (64) | |
Intermediate | 471 (24) | 795 (23) | |
High | 182 (9) | 224 (6) | |
Unknown | 187 (10) | 228 (7) | |
Ph+ ACAs, n (%) | 68 (3) | 118 (3) | .90 |
High-risk ACAs, n (%) | 41 (2) | 51 (1) | .09 |
Comorbidity(ies), n (%) | 700 (36) | 694 (20) | <.001 |
Initial TKI therapy, n (%) | | | <.001 |
Imatinib | 1539 (79) | 2386 (69) | |
2G TKI | 416 (21) | 1068 (31) | |
Nilotinib | 280 (14) | 620 (18) | |
Dasatinib | 72 (4) | 154 (4) | |
Flumatinib | 64 (3) | 294 (9) | |
Follow-up∗, mo, median (IQR) | 56 (30-91) | 45 (26-76) | <.001 |
WBC, white blood cell.
∗Censored at a transplant, death, or the last follow-up.
Validation data set
For the validation data set, we interrogated data from 4309 patients with chronic-phase CML from 76 other centers. A total of 855 patients were excluded because of an interval from diagnosis to starting TKI therapy of ≥6 months (n = 66), advanced phase at diagnosis (n = 125), missing several important baseline covariates (n = 227), irregular response monitoring and/or loss to follow-up (n = 276), non-e14a2 and/or e13a2 transcripts (n = 23), and non-IS monitoring (n = 138). The remaining 3454 patients received initial imatinib (n = 2386, 69%) or a 2G-TKI (n = 1068, 31%) including nilotinib (n = 620, 18%), dasatinib (n = 154, 4%), or flumatinib (n = 294, 9%). In the validation data set, 665 patients (19%) failed initial TKI therapy at a median of 9 months (IQR, 5-17) because of meeting therapy-failure criteria at 3 (n = 149), 6 (n = 107), or 12 (n = 87) months; loss of responses (n = 74); emergence of TKI-resistant ABL1 mutation(s) (n = 12) or high-risk ACAs in Ph+ cells (n = 23); and transformation to advanced phase (n = 109) alone or combined (n = 104). A total of 60 (2%) patients died during TKI therapy, including 13 who died before therapy failure and 47 who died from disease transformation. With a median follow-up of 45 months (IQR, 26-76), 6-year probabilities of FFS, TFS, and survival were 75% (95% CI, 73-77), 96% (95% CI, 95-97), and 97% (95% CI, 95-99).
Comparison of the training and validation data sets
The distribution of baseline covariates in the validation data set differed from that in the training data set (Table 1). Patients in the validation data set were older (P < .001) and had a smaller spleen size below the costal margin (P < .001), lower percentages of blood blasts and basophils (P < .001), lower proportions of Sokal and ELTS high-risk patients (P < .001), and fewer comorbidities (P < .001) compared with patients in the training data set. Additionally, the median follow-up of 45 months (IQR, 26-76 months) in the validation data set was significantly shorter than that in the training data set (median [IQR], 56 [30-91] months; P < .001). Probability of FFS in the validation data set was significantly higher than that in the training data set (P = .01). TFS and survival were similar (P = .25 and .54; supplemental Figure 2).
Predictive model
Results of univariable analyses of predictive covariates in the training data set are displayed in supplemental Table 3. Multivariable Cox and Fine-Gray analyses of the 1701 patients with complete data for the 12 candidate covariates from the univariable analyses indicated that male sex, increasing age, lower hemoglobin concentration, higher percentage of blood blasts, larger spleen size below costal margin, and high-risk ACAs in Ph+ cells at diagnosis were significantly associated with TKI-therapy failure with no significant interactions (supplemental Table 4).
After polynomial transformation, hemoglobin concentration was transformed to “(hemoglobin/100)^−2” in the regression model. Using the 6 covariates correlated with TKI-therapy failure in the 1762 evaluable patients with complete data, we repeated these analyses using Cox and Fine-Gray regression models with similar results (Table 2). Although the results of the Cox and Fine-Gray models had high concordance for therapy failure when there were competing risks, we used the Fine-Gray model, which is more parsimonious and precise.45-47 The Fine-Gray model, which included the 6 covariates, was used to develop the predictive model: initial TKI-therapy failure risk score = 0.1919 × sex (male = 1, female = 0) + 1.6160 × (age/100) + 0.3105 × (hemoglobin concentration/100)^−2 + 0.1087 × blood blasts + 0.0671 × spleen size below costal margin + 0.5461 × high-risk ACA in Ph+ cells (Y = 1, N = 0).
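The risk score above can be written as a short Python function. This is a sketch for illustration, not software from the study: the function and parameter names are hypothetical, and the units are assumed from Table 2 and the Methods (age in years, hemoglobin in g/L, blood blasts in percent, spleen size in cm below the costal margin).

```python
def tki_failure_risk_score(male: bool, age_years: float,
                           hemoglobin_g_per_l: float,
                           blood_blasts_pct: float,
                           spleen_cm_below_costal_margin: float,
                           high_risk_aca: bool) -> float:
    """Initial TKI-therapy failure risk score from the Fine-Gray coefficients."""
    if hemoglobin_g_per_l <= 0:
        raise ValueError("hemoglobin must be positive")
    score = (0.1919 * (1 if male else 0)
             + 1.6160 * (age_years / 100)
             + 0.3105 * (hemoglobin_g_per_l / 100) ** -2
             + 0.1087 * blood_blasts_pct
             + 0.0671 * spleen_cm_below_costal_margin
             + 0.5461 * (1 if high_risk_aca else 0))
    # The text reports scores rounded to 4 decimal places.
    return round(score, 4)


if __name__ == "__main__":
    # Hypothetical patient: man aged 40 years, hemoglobin 115 g/L,
    # 1% blood blasts, spleen 3 cm below the costal margin, no high-risk ACA.
    print(tki_failure_risk_score(True, 40, 115, 1, 3, False))
```

Because hemoglobin enters with a power of −2, lower hemoglobin concentrations raise the score, which matches the direction of the association reported in the multivariable analyses.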
Covariates | Regression coefficient | HR (95% CI) | P value |
---|---|---|---|
Multivariable Cox analyses | | | |
Male | 0.2026 | 1.2 (1.0-1.5) | .02 |
Age/100, y | 1.5668 | 4.8 (2.3-9.8) | <.001 |
(Hemoglobin/100)^−2, g/L | 0.3108 | 1.4 (1.3-1.5) | <.001 |
Blood blasts, % | 0.1085 | 1.1 (1.1-1.2) | <.001 |
Spleen size, cm below costal margin | 0.0675 | 1.1 (1.0-1.1) | <.001 |
High-risk ACAs in Ph+ cells∗ | 0.5458 | 1.7 (1.1-2.8) | .03 |
Multivariable Fine-Gray analyses | | | |
Male | 0.1919 | 1.2 (1.0-1.5) | .01 |
Age/100, y | 1.6160 | 5.0 (2.5-10.3) | <.001 |
(Hemoglobin/100)^−2, g/L | 0.3105 | 1.4 (1.2-1.5) | <.001 |
Blood blasts, % | 0.1087 | 1.1 (1.1-1.2) | <.001 |
Spleen size, cm below costal margin | 0.0671 | 1.1 (1.0-1.1) | <.001 |
High-risk ACAs in Ph+ cells∗ | 0.5461 | 1.7 (1.1-2.8) | .02 |
HR, hazard ratio; N, no; Y, yes.
∗High-risk ACAs in Ph+ cells were scored as categorical covariates (“Y/N”) in the regression models.
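The hazard ratios in Table 2 are the exponentials of the regression coefficients, which can be checked with a few lines of Python. The dictionary keys are shorthand labels chosen for this sketch; for the Fine-Gray model the resulting ratios are subdistribution hazard ratios.

```python
import math

# Fine-Gray regression coefficients as listed in Table 2.
fine_gray_coefficients = {
    "male": 0.1919,
    "age/100": 1.6160,
    "(hemoglobin/100)^-2": 0.3105,
    "blood blasts": 0.1087,
    "spleen size": 0.0671,
    "high-risk ACAs": 0.5461,
}

# HR = exp(coefficient), rounded to 1 decimal place as in the table.
hazard_ratios = {name: round(math.exp(beta), 1)
                 for name, beta in fine_gray_coefficients.items()}

if __name__ == "__main__":
    for name, hr in hazard_ratios.items():
        print(f"{name}: HR {hr}")
```

Note that the large hazard ratio for age reflects the scaling of the covariate: it corresponds to a 100-year difference, so a 10-year difference corresponds to exp(0.1616), or about 1.18.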
Identifying cutoffs for the predictive model
The TKI-therapy failure risk score was rounded to 4 decimal places. One-thousand bootstrap resamplings were performed for the 1762 patients with complete data. After confirmation of cutoffs with significance and data rationality by the Fine-Gray test, the minimal P value approach, and Bonferroni correction, 892 cutoffs remained. Smoothing functions of these cutoffs were determined by the kernel density plot (supplemental Figure 3). The kernel density plot for the cutoffs at the 2 highest peaks was then used to classify the patients into low- (score ≤ 1.3115; n = 716; 41%), intermediate- (1.3115 < score ≤ 2.4266; n = 812; 46%), and high-risk (score > 2.4266; n = 234; 13%) subgroups.
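Applying the 2 cutoffs is a simple threshold rule, sketched below in Python. The constant and function names are hypothetical; the cutoff values and the handling of the boundaries (≤ for the upper end of each lower subgroup) are taken from the text.

```python
LOW_RISK_MAX = 1.3115           # scores <= this value are low risk
INTERMEDIATE_RISK_MAX = 2.4266  # scores <= this value (and > 1.3115) are intermediate


def risk_subgroup(score: float) -> str:
    """Classify a TKI-therapy failure risk score into a risk subgroup."""
    score = round(score, 4)  # scores were rounded to 4 decimal places
    if score <= LOW_RISK_MAX:
        return "low"
    if score <= INTERMEDIATE_RISK_MAX:
        return "intermediate"
    return "high"


if __name__ == "__main__":
    for value in (1.1185, 1.3831, 2.4266, 2.9):
        print(value, risk_subgroup(value))
```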
Cumulative incidence of therapy failure using the predictive model
In the training data set using the predictive model, 8-year cumulative incidences of therapy failure for the low-, intermediate-, and high-risk subgroups were 10% (95% CI, 5-15), 34% (95% CI, 29-39), and 69% (95% CI, 63-75; P for trend < .001; Figure 2A), respectively. Eight-year probabilities of FFS were 90% (95% CI, 87-93), 66% (95% CI, 62-70), and 32% (95% CI, 23-41; P for trend < .001; Figure 2B), respectively. With the low-risk subgroup as reference, hazard ratios for the intermediate- and high-risk subgroups were 3.8 (95% CI, 2.9-5.0; P < .001) and 10.4 (95% CI, 7.7-14.0; P < .001), respectively.
In the validation data set, 3218 patients with complete data for the 6 prognostic covariates in the model were included; 1179 (37%), 1765 (55%), and 274 (8%) patients were classified as low-, intermediate-, and high-risk, respectively, using the predictive model. Six-year cumulative incidences of therapy failure for the 3 risk subgroups were 7% (95% CI, 2-12), 28% (95% CI, 24-32), and 54% (95% CI, 48-60; P for trend < .001; Figure 2C), respectively. Six-year probabilities of FFS were 93% (95% CI, 91-95), 70% (95% CI, 67-73), and 42% (95% CI, 35-49; P for trend < .001; Figure 2D), respectively. With the low-risk subgroup as reference, hazard ratios for the intermediate- and high-risk subgroups were 5.4 (95% CI, 4.1-7.2; P < .001) and 12.1 (95% CI, 8.8-16.6; P < .001), respectively.
Predictive model performance
To evaluate model accuracy, we plotted time-dependent AUROCs for therapy failure at 1, 3, and 5 years in the training and validation data sets (supplemental Figure 4A,D). In the training data set, we found good prediction sensitivity and specificity with 1-, 3- and 5-year AUROCs of 0.83 (95% CI, 0.81-0.85), 0.84 (95% CI, 0.82-0.86), and 0.84 (95% CI, 0.82-0.87), respectively. Comparable AUROC values in the validation data set were 0.77 (95% CI, 0.74-0.79), 0.79 (95% CI, 0.77-0.82), and 0.80 (95% CI, 0.78-0.83), respectively. Calibration plots for the 1-, 3- and 5-year cumulative incidences of therapy failure indicated good concordance between the predicted and observed cumulative outcome incidences (supplemental Figure 4B,E). DCA curves indicated a net benefit from using the model (supplemental Figure 4C,F).
Predictive model application in the imatinib or 2G-TKI cohort
Patients with complete data from the training (n = 1762) and validation (n = 3218) data sets were integrated into the entire data set (n = 4980), in which 3621 patients receiving initial imatinib therapy were identified as the low- (n = 1389, 38%), intermediate- (n = 1907, 53%) and high-risk (n = 325, 9%) subgroups using the predictive model; 1359 patients receiving 2G-TKI therapy were identified as low- (n = 506, 37%), intermediate- (n = 670, 49%), and high-risk (n = 183, 14%) subgroups. With a median follow-up of 59 months (IQR, 37-92) in the imatinib cohort, 8-year cumulative incidences of therapy failure in the 3 risk subgroups were 11% (95% CI, 8-14%), 36% (95% CI, 32-40%), and 71% (95% CI, 67-75%; P < .001; Figure 3A), respectively. With a median follow-up of 28 months (IQR, 19-52) in the 2G-TKI cohort, 4-year cumulative incidences of therapy failure in the 3 risk subgroups were 5% (95% CI, 1-9%), 26% (95% CI, 20-32%), and 53% (95% CI, 46-60%; P < .001; Figure 3B), respectively.
Predictive model performance by age
We evaluated the performance of the predictive model by age: <40 years (n = 2268, 46%), 40-60 years (n = 2022, 41%), and >60 years (n = 690, 13%). In every age subgroup, the predictive model demonstrated excellent ability to predict TKI-therapy failure (supplemental Figure 5).
Predictive model for molecular responses, TFS, and CML-related survival
In the entire data set, 1895 (38%), 2577 (52%), and 508 (10%) patients were classified as low-, intermediate-, and high-risk, respectively, by the predictive model. There were significant differences in the cumulative incidences of MMR (P < .001), MR4 (P < .001), and MR4.5 (P = .001), as well as the probabilities of TFS (P < .001) and CML-related survival (P = .01) among the 3 subgroups (supplemental Figure 6).
Comparison of the predictive model with the Sokal and ELTS scores
A total of 4975 patients were simultaneously classified by the Sokal score, the ELTS score, and the predictive model (Figure 4A-B). Although both the Sokal and ELTS scores could predict TKI-therapy failure (supplemental Figure 7), our model had better discriminatory ability than the Sokal and ELTS scores, with significantly greater AUROC values (predictive model: 0.71-0.77; Sokal: 0.63-0.68; and ELTS: 0.65-0.70; supplemental Figure 8A). This was especially the case in the 2G-TKI cohort (supplemental Figure 8C). DCA also indicated that the net benefit of using the predictive model was greater than that of using the Sokal or ELTS scores (supplemental Figure 9). Most importantly, 2340 and 3316 patients in the Sokal and ELTS low-risk subgroups, respectively, were reclassified into the low-, intermediate-, or high-risk subgroups using our model, with significantly different cumulative incidences of therapy failure (all P values < .001; Figure 4C,F). Results were similar in the Sokal or ELTS intermediate- and high-risk subgroups, indicating better predictive ability of the TKI-therapy failure model than the Sokal and ELTS scores (Figure 4D-H).
Intermediate- and high-risk patients identified by the predictive model receiving initial 2G-TKI therapy had lower therapy-failure rates compared with those receiving imatinib therapy
We compared the cumulative incidences of TKI-therapy failure in patients receiving initial imatinib or 2G-TKI therapy within each risk subgroup defined by the predictive model, using PSM analyses to adjust for differences in baseline covariates (supplemental Tables 5-7). Intermediate- or high-risk patients identified by the predictive model receiving initial 2G-TKI therapy had significantly lower cumulative incidences of therapy failure than patients receiving initial imatinib therapy (all P values <.001; supplemental Figure 10). Low-risk patients treated with imatinib or a 2G-TKI had comparable therapy-failure rates (P = .79; supplemental Figure 10).
Discussion
Although >80% of patients with CML have long-term survival on TKI therapy, 20% to 30% of patients still experience therapy failure. Patients with therapy failure generally have a poor prognosis and are often switched to an alternative TKI or even transplantation as salvage therapy.14-19 Therefore, accurate prediction at diagnosis of which persons with chronic-phase CML will fail initial TKI therapy is important. We used data from 1955 patients to develop a TKI-therapy failure predictive model. The model identified 3 risk subgroups and had good predictive accuracy with high time-dependent AUROC values. We validated our model in an independent data set of 3454 patients with similarly high time-dependent AUROC values.
Currently, the Sokal and ELTS scores, which were originally introduced to predict the survival or CML-related survival in the context of chemotherapy or TKI therapy, are the most widely used to guide initial TKI therapy in chronic-phase CML.4,6,23,24,27 Recent studies reported that they can also be expanded to predict therapy failure and disease progression.6,48,49 Consequently, we compared our model with these scores and unsurprisingly found that our model has better prediction accuracy. This finding does not imply that the predictive model can replace the Sokal and ELTS scoring systems. Survival remains the most crucial outcome for patients with CML, and the Sokal and ELTS scores were specifically developed for predicting survival. The more appropriate use of the predictive model is to further stratify persons who are identified as low or intermediate risk by the Sokal or ELTS score, making the risk assessment more precise. Importantly, regardless of which model identified a “high-risk” person, more attention should be given to therapeutic strategies and disease monitoring.
Despite the relatively low incidence of ACAs in Ph+ cells at diagnosis in our study, which is consistent with previous studies,28,50-52 high-risk ACAs rather than non–high-risk ACAs were strongly associated with TKI-therapy failure.
We found that patients in the 2G-TKI cohort had fewer late failure events than did those in the imatinib cohort. Possible reasons include the following: (1) 2G-TKIs are more effective, resulting in a reduced progression risk, especially in intermediate- and high-risk patients as reported in previous studies3,53,54 (we also found that intermediate- or high-risk patients identified by our model receiving 2G-TKI therapy had a lower therapy-failure rate than those receiving imatinib therapy); and (2) the 2G-TKI cohort had shorter follow-up and fewer patients than the imatinib cohort.
Cumulative incidence of therapy failure in the validation data set was lower than that in the training data set. However, patients in the validation data set were older, and had smaller spleen size below costal margin, lower percentages of blood blasts and basophils, and lower proportions of Sokal and ELTS high-risk patients.6,13,55 Additionally, follow-up of the validation data set was shorter than that of the training data set. Regardless, our model had good prediction accuracy in both data sets.
Our study has important limitations. First, it was retrospective. However, the number of patients in the data sets was large. Second, our patients were relatively young compared with those with CML of predominantly European descent; therefore, external validation in populations of different races and countries is needed. Third, we were not able to strictly monitor therapy compliance, especially in patients in the validation data set, because of the large number of contributing centers. However, this heterogeneity increases the generalizability of our conclusions. Finally, PSM analyses can only adjust for known prognostic covariates and, although useful, are not a substitute for randomized controlled trials.
In conclusion, we developed and externally validated a predictive model for TKI-therapy failure in persons with chronic-phase CML receiving initial TKI therapy, with high accuracy and precision. Intermediate- and high-risk patients identified by our model who received 2G-TKI therapy had a lower therapy-failure rate than those who received imatinib. This conclusion needs further confirmation in a randomized controlled trial. Our model could help physicians estimate the likelihood of failure of initial TKI therapy and choose the best initial TKI.
Acknowledgments
The authors thank the following medical staff who provided patient data: Ru Feng, Junxia Meng, Xiaonan Zhang, Jijun Wang, Lan Ma, Rongxia Wei, Zhiqiang Sun, Yun Zeng, Xiaoyi Lv, Zhen Xiao, Xingxia Zhang, Yanping Ma, Xiaoyan Ge, Jianmin Luo, Yujuan Yang, Congmeng Lin, Pengliang Xin, Hai Yi, Yilan Liu, Liling Zheng, Hebing Zhou, Luoming Hua, Wangxiang Huang, Yonghuai Feng, Yu Jing, Lijun Wang, Linhua Yang, Hongguo Zhao, Hairong Fei, Qin Li, Jianhui Qiao, Hua Fan, Shasha Zhao, Ronghua Hu, Jianli Wang, Yin Sai, Wenqian Li, Lianyu Shen, Li Ding, Fang Ye, Min Ouyang, Shunjie Wu, and Liru Wang.
R.P.G. acknowledges support from the UK National Institute for Health Research Biomedical Research Centre. Q.J. acknowledges support from the National Natural Science Foundation of China (grants 81970140 and 82370161). This study was funded by the National Natural Science Foundation of China (grants 81970140 and 82370161).
Authorship
Contribution: Q.J. and X.H. designed the study; Q.J., X.Z., B.L., J. Huang, Y. Zhang, N.X., and R.P.G. analyzed the data and prepared the manuscript; Q.J., W.L., X.L., Y.Y., H.L., B.L., X.D., R.L., C.C., J. Huang, H.Z., L. Pan, X.W., G.L., Zhuogang Liu, Y.Z., Zhenfang Liu, J. Hu, C.L., F.L., W.Y., L.M., Y.H., L.L., Z. Zhao, C.T., C.Z., Y.B., Z. Zhou, S.C., H.Q., L.Y., X. Sun, H.S., L.Z., Z.L., D.W., J.G., L. Pang, Q.Z., X. Suo, W.Z., and Y. Zheng provided most of the patient data in this study and revised the manuscript; and all the authors approved the final manuscript, were responsible for the content, and agreed to its submission for publication.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Qian Jiang, Peking University People’s Hospital, Peking University Institute of Hematology, National Clinical Research Center for Hematologic Disease, Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, No. 11 Xizhimen South St, Beijing 100044, People’s Republic of China; email: jiangqian@medmail.com.cn; and Xiaojun Huang, Peking University People’s Hospital, Peking University Institute of Hematology, National Clinical Research Center for Hematologic Disease, Beijing Key Laboratory of Hematopoietic Stem Cell Transplantation, Peking-Tsinghua Center for Life Sciences, Academy for Advanced Interdisciplinary Studies, Peking University, State Key Laboratory of Natural and Biomimetic Drugs, Peking University, No. 11 Xizhimen South St, Beijing 100044, People’s Republic of China; email: huangxiaojun@bjmu.edu.cn.
References
Author notes
X.Z., B.L., J. Huang, Y. Zhang, and N.X. contributed equally to this study.
For original data, please contact corresponding author Qian Jiang (jiangqian@medmail.com.cn).
The online version of this article contains a data supplement.
There is a Blood Commentary on this article in this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Comments
Change of failure definition
An explanation for the new data is that the great majority of imatinib-treated patients failing at 3 months respond by month 24 (late responders)5. Patients failing at months 12 and 24 who survive 10 years point to a beneficial TKI-effect regardless of reaching failure milestones.
Compared with the endpoint survival, therapy failure is a moving target requiring adaptation to new data from time to time. I wonder what adaptations of the predictive model are foreseen by the authors.
1 Hehlmann R et al (2017) Leukemia, 31, 2398
2 Guilhot F et al (2021) Leukemia, 35, 2332
3 Bidikian A et al (2023) American Journal of Hematology, 98, 639
4 Lauseker M et al (2023) Leukemia, 37, 2231
5 Hehlmann R & Lauseker M (2024) Leukemia, 38, 465