TO THE EDITOR:
Devillier et al1 present results from an observational cohort study on the efficacy of allogeneic hematopoietic stem cell transplantation (allo-HSCT), as compared with nonintensive consolidation chemotherapy, in 507 older patients (60-70 years) with acute myeloid leukemia (AML) of intermediate or unfavorable risk according to the European LeukemiaNet classification. The end points were relapse-free survival (RFS) and overall survival (OS). The authors performed various statistical analyses and reported consistent results in favor of allo-HSCT, that is, hazard ratio for RFS (HR_RFS) = 0.47 and HR for OS (HR_OS) = 0.56 from the multivariate time-dependent Cox model and HR_RFS = 0.47 and HR_OS = 0.54 from the super landmark model. Therefore, they concluded that allo-HSCT significantly improves outcomes among these patients with AML. From a statistical perspective, however, we have concerns regarding the results in this study.
In randomized AML trials, in which both treatment options are applicable to 2 groups of comparable patients, the HR and its confidence interval indicate whether, on average, switching from standard chemotherapy to allo-HSCT improves RFS and OS in the respective population.
Due to the observational nature of this study, treatment allocation in the first complete remission (CR1) was not randomized and may have depended on several factors, such as the physical condition of the patient, donor availability, waiting time for the transplant, and intervening events such as relapse or death. To the extent that these clinical characteristics guide treatment decisions and are also related to prognosis, the 2 study groups may differ in ways other than treatment allocation alone. If not all relevant confounders were included in the analyses described by Devillier et al,1 the effect estimates may be biased. Further, our own experience with other AML data from the German-Austrian Acute Myeloid Leukemia Study Group2 shows that the overlap between 2 treatment groups is usually very small, which suggests that the groups are not comparable. Such incomparability, the so-called "apples and oranges" problem, is unlikely to be resolved simply by applying multivariate analyses. The authors should therefore have clarified how many of the 304 patients who did not receive allo-HSCT were considered ineligible for allo-HSCT by their physician and how many of the 203 patients who received allo-HSCT were in fact unfit for standard chemotherapy. In other words, the cohort may have been a mixture of patients for whom there was genuine therapeutic uncertainty and patients who were suitable for only 1 of the 2 treatment options. Methods such as propensity scores3 can be used to assess the overlap between the 2 treatment groups, information that Devillier et al1 could have provided.
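To make the overlap idea concrete, the following is a minimal Python sketch, not the analysis of Devillier et al1 or of our group. It assumes a single hypothetical covariate (a standardized "fitness" score), estimates each patient's propensity of receiving allo-HSCT by logistic regression fitted with plain gradient ascent, and reports the share of patients whose propensity lies in [0.1, 0.9], a common rule-of-thumb region of adequate overlap. The function names, the data-generating model, and the bounds are illustrative assumptions.

```python
import math
import random


def sigmoid(eta):
    """Numerically stable logistic function."""
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    e = math.exp(eta)
    return e / (1.0 + e)


def fit_propensity(x, z, lr=1.0, n_iter=500):
    """Estimate P(allo-HSCT | x) by logistic regression on one covariate,
    fitted with batch gradient ascent on the mean log-likelihood."""
    b0 = b1 = 0.0
    n = len(x)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for xi, zi in zip(x, z):
            resid = zi - sigmoid(b0 + b1 * xi)  # score contribution
            g0 += resid
            g1 += resid * xi
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return [sigmoid(b0 + b1 * xi) for xi in x]


def overlap_share(ps, lower=0.1, upper=0.9):
    """Share of patients whose estimated propensity lies in [lower, upper]."""
    return sum(lower <= p <= upper for p in ps) / len(ps)


def simulate(strength, n=400, seed=1):
    """Hypothetical cohort: x is a standardized fitness score and the log-odds
    of receiving allo-HSCT is strength * x (strength = 0 mimics randomization)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = [1 if rng.random() < sigmoid(strength * xi) else 0 for xi in x]
    return x, z


shares = {}
for strength in (0.0, 5.0):
    x, z = simulate(strength)
    shares[strength] = overlap_share(fit_propensity(x, z))
    print(f"allocation strength {strength}: {shares[strength]:.2f} in overlap region")
```

When allocation is random (strength 0), essentially all patients fall in the overlap region; when fitness strongly drives allocation, a large share falls outside it, and comparisons for those patients rest on model extrapolation rather than on comparable patients. In practice, one would include all candidate confounders and use a standard logistic regression routine.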
The next relevant issue is guarantee-time bias (GTB), also referred to as immortal time bias.4,5 Because allo-HSCT cannot be performed immediately after a patient has reached CR1 (an appropriate donor must first be found), the waiting time from CR1 to allo-HSCT tends to select patients with a better prognosis: a patient who receives allo-HSCT is implicitly guaranteed to have survived free of relapse and death until transplantation. As a result, the estimated effect of allo-HSCT might have been illusory; the same apparent benefit would have been observed even if allo-HSCT were not efficacious.
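The mechanism can be reproduced in a few lines. The sketch below is a simplified illustration that assumes exponential event times with a hazard of 0.3 per year (not the Weibull settings used in our simulation study) and a planned transplant time drawn uniformly between 0 and 5 years. Allo-HSCT has no effect by construction, and a patient counts as transplanted only if still event-free at the planned time. All names are hypothetical.

```python
import random


def simulate_cohort(n=2000, wait_max=5.0, rate=0.3, seed=7):
    """Hypothetical cohort in which allo-HSCT has NO effect: each patient's
    time to relapse or death ~ Exponential(rate) regardless of treatment,
    and the planned time to allo-HSCT ~ Uniform(0, wait_max)."""
    rng = random.Random(seed)
    cohort = []
    for _ in range(n):
        event_time = rng.expovariate(rate)
        planned_hsct = rng.uniform(0.0, wait_max)
        # Transplanted only if still event-free at the planned time.
        transplanted = planned_hsct < event_time
        cohort.append((event_time, transplanted))
    return cohort


def mean_event_free_time(cohort, transplanted):
    """Mean time from CR1 to relapse or death within one naive group."""
    times = [t for t, tx in cohort if tx == transplanted]
    return sum(times) / len(times)


cohort = simulate_cohort()
mean_tx = mean_event_free_time(cohort, True)
mean_no_tx = mean_event_free_time(cohort, False)
print(f"mean event-free time, allo-HSCT group:    {mean_tx:.2f} years")
print(f"mean event-free time, no-allo-HSCT group: {mean_no_tx:.2f} years")
```

The transplanted group shows a much longer mean event-free time even though treatment does nothing; the difference arises entirely from the requirement to remain event-free until transplantation.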
Devillier et al1 stated in their article, "[t]o investigate the impact of allo-HSCT on outcome after CR1, three statistical methods were used to deal with the guarantee-time issue." To this end, the time-dependent treatment covariate was switched from 0 to 1 at the time of allo-HSCT, in both the multivariate time-dependent Cox model and the super landmark model.6 This technique extends the naïve Cox model and has been recommended to correct for GTB.5 However, we believe it is still not sufficient, especially in a complex setting with a multistate disease course. The underlying assumption of these models is that, apart from GTB, whether or not a patient receives allo-HSCT is independent of the disease prognosis and the patient's individual traits. This assumption may hold when relapse is the outcome (relapse is then the event occurring during follow-up that precludes allo-HSCT, provided that no other prognostic factors are associated with treatment allocation), but not for OS: relapse is also associated with a greater likelihood of death and therefore acts as a selector that assigns more good-risk patients to the allo-HSCT group.
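The covariate coding described above corresponds to the counting-process layout, in which each patient's follow-up is split at the time of allo-HSCT. A minimal sketch, with hypothetical names, assuming one outcome and at most one transplant per patient:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Interval:
    patient_id: int
    start: float  # years since CR1
    stop: float
    event: int    # 1 if the outcome (e.g., relapse or death) occurs at `stop`
    hsct: int     # time-dependent allo-HSCT indicator on (start, stop]


def split_at_transplant(patient_id: int, follow_up: float, event: int,
                        hsct_time: Optional[float]) -> List[Interval]:
    """Counting-process (start, stop] records for one patient: the allo-HSCT
    indicator is 0 from CR1 until transplantation and 1 afterwards."""
    if hsct_time is None or hsct_time >= follow_up:
        # Not transplanted during follow-up: one untreated interval.
        return [Interval(patient_id, 0.0, follow_up, event, 0)]
    if hsct_time <= 0.0:
        # Transplanted at CR1: treated for the whole follow-up.
        return [Interval(patient_id, 0.0, follow_up, event, 1)]
    return [
        # Waiting time before allo-HSCT: untreated and, by construction, event-free.
        Interval(patient_id, 0.0, hsct_time, 0, 0),
        # Time after allo-HSCT: treated; the outcome, if any, is counted here.
        Interval(patient_id, hsct_time, follow_up, event, 1),
    ]


for row in split_at_transplant(patient_id=1, follow_up=3.2, event=1, hsct_time=0.8):
    print(row)
```

Records in this (start, stop, event) layout are the input expected by time-varying Cox implementations, for example `Surv(start, stop, event)` in R's survival package or `CoxTimeVaryingFitter` in Python's lifelines.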
To illustrate, we present a simulation study that follows the analytical strategies of Devillier et al.1 In the example below, each patient can hypothetically receive allo-HSCT at any time between 0 and 5 years after reaching CR1 and does eventually receive it if neither relapse nor death has occurred by then. We assume that the times to relapse and death follow Weibull distributions and that allo-HSCT has no effect on either. Table 1 reflects a scenario in which most deaths occur after relapse, whereas Table 2 reflects a scenario with more early deaths without relapse, due to the toxicity of early chemotherapy. In both scenarios, relapse can cause deterioration of a patient's physical condition and accelerate death. There are 2 additional prognostic factors (X1 and X2) associated with the outcomes. For explanatory purposes, we simplify the scenarios by removing any association of allo-HSCT with other prognostic factors.
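The data-generating process can be sketched as follows. This is a simplified reconstruction, not the script in our public repository. It uses the Table 1 parameters and a Weibull proportional-hazards parameterization with cumulative hazard λ·exp(lp)·t^γ. The distributions of X1 and X2, their action on all 3 hazards, the restart of the clock at relapse, and the absence of censoring are assumptions made for illustration.

```python
import math
import random


def weibull_time(rng, lam, gamma, lin_pred=0.0):
    """Inverse-transform draw from a Weibull proportional-hazards model with
    cumulative hazard H(t) = lam * exp(lin_pred) * t**gamma."""
    u = 1.0 - rng.random()  # uniform on (0, 1], avoids log(0)
    return (-math.log(u) / (lam * math.exp(lin_pred))) ** (1.0 / gamma)


def simulate_patient(rng, lam_relapse=0.15, lam_nrm=0.05, lam_post=0.3,
                     gamma=1.5, beta1=0.5, beta2=0.75, wait_max=5.0):
    """One hypothetical patient under the Table 1 parameters; allo-HSCT has no effect."""
    x1 = rng.gauss(0.0, 1.0)                 # assumed X1 ~ N(0, 1)
    x2 = 1 if rng.random() < 0.5 else 0      # assumed X2 ~ Bernoulli(0.5)
    lp = beta1 * x1 + beta2 * x2             # assumed to act on all three hazards
    t_relapse = weibull_time(rng, lam_relapse, gamma, lp)
    t_nrm = weibull_time(rng, lam_nrm, gamma, lp)
    t_rfs = min(t_relapse, t_nrm)
    relapse = int(t_relapse < t_nrm)
    if relapse:
        # Assumption: the post-relapse death clock restarts at relapse.
        t_os = t_relapse + weibull_time(rng, lam_post, gamma, lp)
    else:
        t_os = t_nrm
    planned = rng.uniform(0.0, wait_max)     # planned time from CR1 to allo-HSCT
    # Transplanted only if still in CR1 and alive at the planned time.
    hsct_time = planned if planned < t_rfs else None
    return {"x1": x1, "x2": x2, "t_rfs": t_rfs, "relapse": relapse,
            "t_os": t_os, "hsct_time": hsct_time}


rng = random.Random(2024)
cohort = [simulate_patient(rng) for _ in range(600)]
share = sum(p["hsct_time"] is not None for p in cohort) / len(cohort)
print(f"share of patients transplanted: {share:.2f}")
```

For the Table 2 scenario, the relapse and nonrelapse-death rates are swapped (lam_relapse=0.05, lam_nrm=0.15).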
Table 1. Simulation scenario in which most deaths occur after relapse

Setting: relapse ∼ Weibull(λ = 0.15, γ = 1.5); nonrelapse death ∼ Weibull(λ = 0.05, γ = 1.5); postrelapse death ∼ Weibull(λ = 0.3, γ = 1.5); β(X1) = 0.5; β(X2) = 0.75; time from CR1 to allo-HSCT ∼ U(0, 5).

| Model | HR_RFS | FPR_RFS | HR_OS | FPR_OS |
|---|---|---|---|---|
| Naïve Cox | 0.293 | 1 | 0.323 | 1 |
| Time-dependent Cox | 1.001 | 0.057 | 0.852 | 0.344 |
| Super landmark | 1.002 | 0.055 | 0.857 | 0.292 |
Number of simulation replicates = 1000; sample size = 600. HR, hazard ratio, averaged over the 1000 replicates; FPR, false-positive rate, the proportion of the 1000 replicates in which the allo-HSCT effect estimate is statistically significant although no true treatment effect exists; RFS, relapse-free survival; OS, overall survival.
Allo-HSCT is fixed in the naïve Cox model and is time-dependent in the multivariate time-dependent Cox model and in the super landmark model; relapse is included as a covariate (fixed or time-dependent, as appropriate) in all 3 models for OS; the 2 prognostic factors (X1 and X2) are also included.
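Because each FPR in the tables is a proportion over 1000 replicates, it carries Monte Carlo error. The sketch below assumes a hypothetical list of per-replicate p-values and the usual 0.05 threshold; it computes an FPR and a normal-approximation binomial interval around the nominal level, which helps judge which deviations exceed simulation noise. Function names are illustrative.

```python
import math


def false_positive_rate(p_values, alpha=0.05):
    """Share of replicates in which the (truly null) effect is declared significant."""
    return sum(p < alpha for p in p_values) / len(p_values)


def monte_carlo_interval(rate, n_replicates, z=1.96):
    """Approximate 95% Monte Carlo interval for a simulated rejection rate,
    using the binomial standard error sqrt(rate * (1 - rate) / n_replicates)."""
    se = math.sqrt(rate * (1.0 - rate) / n_replicates)
    return rate - z * se, rate + z * se


lo, hi = monte_carlo_interval(0.05, 1000)
print(f"expected FPR range at nominal 0.05 with 1000 replicates: [{lo:.3f}, {hi:.3f}]")
```

With 1000 replicates, the interval around 0.05 is roughly 0.036 to 0.064; FPRs such as 0.344 and 0.292 lie far outside it.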
Table 2. Simulation scenario with more early deaths without relapse

Setting: relapse ∼ Weibull(λ = 0.05, γ = 1.5); nonrelapse death ∼ Weibull(λ = 0.15, γ = 1.5); postrelapse death ∼ Weibull(λ = 0.3, γ = 1.5); β(X1) = 0.5; β(X2) = 0.75; time from CR1 to allo-HSCT ∼ U(0, 5).

| Model | HR_RFS | FPR_RFS | HR_OS | FPR_OS |
|---|---|---|---|---|
| Naïve Cox | 0.309 | 1 | 0.313 | 1 |
| Time-dependent Cox | 1.007 | 0.049 | 0.962 | 0.053 |
| Super landmark | 1.007 | 0.061 | 0.956 | 0.067 |
Number of simulation replicates = 1000; sample size = 600. HR, hazard ratio, averaged over the 1000 replicates; FPR, false-positive rate, the proportion of the 1000 replicates in which the allo-HSCT effect estimate is statistically significant although no true treatment effect exists; RFS, relapse-free survival; OS, overall survival.
Allo-HSCT is fixed in the naïve Cox model and is time-dependent in the multivariate time-dependent Cox model and in the super landmark model; relapse is included as a covariate (fixed or time-dependent, as appropriate) in all 3 models for OS; the 2 prognostic factors (X1 and X2) are also included.
Tables 1 and 2 show the results from the naïve Cox model and the 2 time-dependent models. As expected, the naïve Cox model always yields significant and biased estimates of HR_RFS and HR_OS (FPR of 100%). This is because treatment allocation is determined after baseline but is treated as if it were known at CR1, so the event-free waiting time before allo-HSCT is credited to the allo-HSCT group; hence, the proportional hazards assumption no longer holds.
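The difference between the naïve and the time-dependent coding can be seen without fitting a Cox model. The sketch below is a simplification that assumes a constant hazard (exponential event times, not the Weibull settings of Tables 1 and 2) and no censoring; under a constant hazard, the crude ratio of events per person-time is the maximum likelihood estimate of the HR in an exponential model, so it can stand in for the Cox estimate here. Allo-HSCT has no effect by construction, and all names are hypothetical.

```python
import random


def rate_ratio(records):
    """Crude event-rate ratio, allo-HSCT vs no allo-HSCT, from
    (person_time, events, treated) records."""
    person_time = {0: 0.0, 1: 0.0}
    events = {0: 0, 1: 0}
    for pt, ev, treated in records:
        person_time[treated] += pt
        events[treated] += ev
    return (events[1] / person_time[1]) / (events[0] / person_time[0])


rng = random.Random(11)
naive, time_dependent = [], []
for _ in range(5000):
    t_event = rng.expovariate(0.3)    # constant hazard, identical with or without allo-HSCT
    planned = rng.uniform(0.0, 5.0)   # planned time from CR1 to allo-HSCT
    if planned < t_event:             # still event-free at the planned time: transplanted
        naive.append((t_event, 1, 1))                      # all follow-up credited to allo-HSCT
        time_dependent.append((planned, 0, 0))             # waiting time: untreated, event-free
        time_dependent.append((t_event - planned, 1, 1))   # post-transplant time
    else:
        naive.append((t_event, 1, 0))
        time_dependent.append((t_event, 1, 0))

print(f"naive rate ratio:          {rate_ratio(naive):.2f}")
print(f"time-dependent rate ratio: {rate_ratio(time_dependent):.2f}")
```

The naive coding produces a strongly "protective" ratio, whereas reallocating the waiting time to the untreated state brings the ratio back to about 1, mirroring the HR_RFS results in the tables.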
The 2 time-dependent models correct the effect of allo-HSCT on relapse and nonrelapse death: HR_RFS is approximately 1, that is, unbiased, in both Table 1 and Table 2. In our simulation setting, the allocation of allo-HSCT is an external factor, independent of the disease prognosis and the patients' individual traits. Therefore, RFS, which comprises relapse and nonrelapse death, is insensitive to the selection bias introduced by the occurrence of relapse.
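For the super landmark model, the second of the 2 time-dependent approaches, the analysis data set is built by stacking landmark data sets. The following minimal sketch assumes a hypothetical landmark grid and prediction horizon (not those of Devillier et al1), treatment status frozen at each landmark, and administrative censoring at the end of each window; the function name and record layout are illustrative.

```python
def landmark_stack(patients, landmarks, horizon):
    """Stack landmark data sets for a super landmark analysis.

    patients: list of (follow_up, event, hsct_time) tuples, with hsct_time None
    if never transplanted. For each landmark s, keep patients still at risk at s,
    freeze their allo-HSCT status as of s, and censor follow-up at s + horizon.
    Returns rows (landmark, patient_id, start, stop, event, hsct).
    """
    rows = []
    for s in landmarks:
        for pid, (follow_up, event, hsct_time) in enumerate(patients):
            if follow_up <= s:
                continue  # event or censoring before the landmark: not at risk
            stop = min(follow_up, s + horizon)
            status = event if follow_up <= s + horizon else 0  # administrative censoring
            hsct = int(hsct_time is not None and hsct_time <= s)  # status known at s only
            rows.append((s, pid, s, stop, status, hsct))
    return rows


patients = [(3.0, 1, 0.5), (0.8, 1, None), (6.0, 0, None)]
for row in landmark_stack(patients, landmarks=[0.0, 1.0], horizon=5.0):
    print(row)
```

The stacked rows would then typically be analysed with a Cox model stratified by landmark (or with smooth functions of the landmark time) and robust standard errors clustered by patient, because each patient can contribute to several landmarks.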
However, because of the susceptibility to selection bias discussed above, HR_OS in Table 1 remains biased (0.852 and 0.857) even after applying both time-dependent models, suggesting a beneficial effect of allo-HSCT on OS that does not in fact exist. The false-positive rate is also inflated (0.344 and 0.292, instead of the nominal 0.05). In Table 2, on the other hand, many early deaths occur before relapse, where the selection bias plays no part; hence, HR_OS from the 2 models is essentially unbiased (0.962 and 0.956), and the false-positive rate returns to approximately the nominal 0.05. Of particular note, HR_RFS and HR_OS, which are unbiased in this simulation study, may well be biased in practice, because treatment allocation is often not an external factor but is associated with many determinants, such as patients' functional status and comorbidities. Our simulation script is publicly available (https://github.com/YX-IBE/AMLTimeDependentSimulation), so the results can be reproduced and the simulation settings modified.
Contrary to the statement by Devillier et al1 that "[o]ne limitation of time-dependent analyses is that […] they provide unbiased HR (with respect to the immortality bias)," our simulation example shows that even sophisticated time-dependent methods may not fully eliminate bias in the effect estimates. Although the results of the different methods appear consistent, they rest on the same model assumption and could therefore all have overestimated the benefit of allo-HSCT.
Contribution: U.M. and T.H. provided critical content expertise; and U.M. and Y.X. conceptualized the research, performed the simulation study, and wrote the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Yujun Xu, Institute for Medical Information Processing, Biometry and Epidemiology, Ludwig Maximilian University of Munich, Marchioninistr. 15, 81377, Munich, Germany; e-mail: yujunxu@ibe.med.uni-muenchen.de.
References
Author notes
The analysis code for the simulation study is publicly available in the online GitHub repository (https://github.com/YX-IBE/AMLTimeDependentSimulation), licensed under the GNU General Public License version 3.0.
Data are available on request from the corresponding author, Yujun Xu (yujunxu@ibe.med.uni-muenchen.de).