Use of surrogates as primary end points is commonplace in hematology/oncology clinical trials. As opposed to prognostic markers, surrogates are end points that can be measured early and yet can still capture the full effect of treatment, as it would be captured by the true outcome (eg, overall survival). We discuss the level of evidence of the most commonly used end points in hematology and share recommendations on how to apply and evaluate surrogate end points in research and clinical practice. Based on the statistical literature, this clinician-friendly review aims to build a bridge between clinicians and surrogacy specialists.
Introduction
In the context of fatal diseases such as hematologic malignancies, overall survival (OS) is considered a gold standard end point in cancer trials. When the time and resources needed to demonstrate an OS benefit of new medical interventions are too substantial, surrogate end points may be used.1 As opposed to prognostic markers, surrogates are end points that can be measured earlier, more easily, or at lower cost, and yet can still capture the full effect of treatment, as it would be captured by the true outcome (eg, OS). In other words, the magnitude of change in the surrogate between arms has to predict the magnitude of change in the true end point. However, 2 key concepts are generally ignored by clinicians and trialists: (1) not all surrogacy analyses answer the same question; and (2) when properly validated, surrogates are only valid for a specific disease setting and class of interventions. This can lead to the misuse of nonvalidated surrogates as primary end points in clinical trials, which may subsequently mislead clinicians and regulatory agencies in the evaluation of anticancer agents.
Regulatory agencies face several competing challenges when considering the design and evaluation of clinical trials. On the one hand, they must assess both efficacy and safety of every agent before a potential approval, and thus would prefer the most robust scientific studies possible (ie, large randomized trials with OS as primary end point or, potentially, progression-free survival [PFS] as a primary end point in indolent malignancies). On the other hand, these types of studies may take an extended period of time to measure a drug’s clinical benefit, and there is clinical and societal urgency to provide patients with access to breakthrough therapies as soon as possible. To walk this tightrope, agencies may allow alternative trial designs using surrogate end points as evidence for consideration. Thus, complete response (CR) rates and duration of response frequently substantiate accelerated approvals despite lack of validation as surrogate end points in many indications. Another option for agencies is to assume that a validated surrogate in 1 indication might also be valid in a similar indication. For example, the confirmatory trial NCT04212013 evaluating the effectiveness of ibrutinib-rituximab in first-line marginal zone lymphoma (MZL) was designed, alongside the US Food and Drug Administration (FDA), using CR rate at 30 months (CR30) as the primary end point, which had been validated as a surrogate of PFS in follicular lymphoma (FL).
For >30 years, statisticians have been developing sophisticated methods to formally validate surrogate end points in specific settings. Articles written for specialized statistics journals are typically not readily understandable for clinicians, who nevertheless design clinical trials.2-4 Therefore, we intended to provide a clinician-friendly review of the statistical aspects regarding the clinically meaningful end points, their surrogates and how they were validated. We also discuss the balance that exists between speed and accuracy of scientific evidence and the issue of the evidence transportability of surrogates to newer settings, and provide recommendations that may help clinicians in analyzing end points and their surrogates.
Clinically meaningful end points
Clinically meaningful end points refer to patient-centered end points that reliably assess a drug’s efficacy in a particular indication, and ideally its safety. Key safety outcomes include all-cause death, adverse events (AEs), serious AEs, and withdrawals because of AEs.
In aggressive malignancies, OS is considered as the gold standard clinical efficacy measure because it also encompasses nonrelapse mortality, which can sometimes reflect the treatment’s toxicity. Indeed, nonrelapse mortality is known to hamper survival outcomes in allogeneic stem cell transplantation5,6 (acute graft-versus-host disease and severe infection) but also in autologous stem cell transplantation (secondary malignancies).7
The quality of life (QOL) is also a patient-centered end point that includes safety outcomes because AE and serious AE are believed to negatively affect QOL scores. However, because QOL is only assessable in living patients, it can only be considered as a conditional clinical efficacy measure.8 In other words, QOL is worth comparing between arms when the experimental treatment offers at least the same OS rate as the standard therapy. For example, in relapsed/refractory (R/R) classical Hodgkin lymphoma, patient-reported outcomes from the KEYNOTE-204 trial were published after initial results showed a PFS advantage of pembrolizumab over brentuximab vedotin9; similarly, in R/R large B-cell lymphoma (LBCL), patient-reported outcomes from the ZUMA-7 trial were published after efficacy results showed an event-free survival (EFS) advantage of axicabtagene ciloleucel over standard of care.10
Evaluation of OS becomes challenging in indolent malignancies with a prolonged survival and low rates of mortality from progressive disease over the first decade after diagnosis. In a setting like this, such as extranodal MZL, non-lymphoma–related deaths are more frequent than lymphoma-related deaths (at 10 years: 10.6% [95% confidence interval (CI), 7.6-14.9] vs 3.7% [95% CI, 2.22-6.4]).11 Therefore, PFS as a primary end point becomes very relevant, although it is tumor centered and not patient centered. A caveat of using PFS as a primary end point is that it does not reflect the drug’s toxicity at all. An example in R/R multiple myeloma is the BELLINI trial that compared venetoclax with placebo, each in combination with bortezomib and dexamethasone: despite an advantage of the experimental arm in PFS (hazard ratio [HR], 0.63; 95% CI, 0.44-0.90; P = .010), it was detrimental in OS (HR, 2.03; 95% CI, 1.04-3.95) because of numerous emergent fatal infections reported in the venetoclax group.12 Another relevant example is the substantial late toxicity shown by phosphoinositide 3-kinase inhibitors that led to limitations of use or even voluntary withdrawals in several indications (indolent non-Hodgkin lymphomas and chronic lymphocytic leukemia [CLL]), despite improvements in PFS.13
Surrogates or correlates?
Depending on the indication, true clinical efficacy end points may require a long follow-up period that may not be feasible to assess in a reasonable timeframe. In this case only, earlier end points should be used. Earlier end points are necessarily indirect measures of clinical efficacy, tumor- or cell-centered end points that generally elude the toxicity assessment (Table 1). Typical surrogates in cancers are PFS, response rates, or evaluation of disease rates at a precise time point (eg, CR30 or progression of disease at 24 months [POD24]), but their assessment may vary widely according to the disease (computed tomography [CT] scan, positron emission tomography [PET] and CT, or minimal residual disease [MRD]/circulating tumor DNA). Imaging techniques are tumor centered but are not specific to the disease and have their own challenges with interpretation. With a CT scan, shrinkage of the tumor is never immediate and a residual mass can be assessed either as residual disease or as an unconfirmed CR.14 PET/CT-based criteria eliminate unconfirmed CR but interpretation of a Deauville score depends on the timing of assessment, the clinical context, and the treatment.15 In contrast, MRD or circulating tumor DNA are cell centered and ignore the multidimensionality of the causal mechanisms of the disease.
Such indirect measures may fail to provide reliable evidence about the benefit-to-risk profile of interventions. In 2012, Fleming and Powers proposed a categorization of outcome measures, according to level of evidence regarding efficacy.16 Level 1 corresponds to the true clinical efficacy measures, such as OS, as described above. Level 2 corresponds to validated surrogates for a specific disease setting and class of interventions (eg, CR30 in first-line FL in patients requiring systemic therapy,17 and CR24 in first-line extranodal MZL in patients requiring systemic therapy18). Level 3 refers to nonvalidated surrogates yet established to be “reasonably likely to predict clinical benefit” for a specific disease setting and class of interventions. Level 4 refers to markers that correlate with the true outcome but are not established at a higher level.
Correlates are established with a prognostic analysis and can be predictive in a particular setting (eg, undetectable MRD in first-line acute myeloblastic leukemia [AML]) but they do not necessarily qualify as surrogates. For example, if they do not lie on the causal pathway of the disease process, or if they capture only a small fraction of the treatment effect on the final outcome, they are likely to provide misleading information about clinical efficacy. When there is no validated surrogate in a given indication, regulatory agencies may use level 3 end points to grant accelerated (FDA) or conditional (European Medicines Agency) approvals.
Currently, nonvalidated surrogates are used to speed up drug access when an “unmet medical need” is identified. However, of >237 cancer indications described as “unmet medical need” that were analyzed in a study, 55 (23%) referred to indications with an annual incidence of >1000 cases, with ≥5 National Comprehensive Cancer Network–recommended regimens and a ≥50% 5-year survival.19 Over the past decade (2013-2023) in the field of lymphomas, all 23 FDA-accelerated approvals were based on trials using response rates (with/without combination with duration of response in the subset of responsive patients; supplemental Table 2 in the supplemental Appendix, available on the Blood website), as an end point that was “reasonably likely to predict clinical benefit.” By using what are believed to be level 3 markers, the FDA grants accelerated approvals on the basis of strong correlates that are not necessarily validated surrogates. Accelerated approval does require confirmatory trials to verify the clinical benefit or demonstrate an effect on irreversible morbidity or mortality, and approvals may be rescinded if this benefit is not confirmed, as in the case of phosphoinositide 3-kinase inhibitors.13 Of the 15 surrogate analyses conducted by the FDA itself in oncology between 2005 and 2022, only 1 demonstrated a strong correlation between a surrogate outcome and OS,20 which raises concerns about the evidence upon which drug approvals are granted and highlights the challenges of identifying true surrogate end points in this space.
Key concepts to validate surrogate end points
As previously mentioned, an ideal surrogate lies on the causal pathway of the disease process, can be measured easily and in a timely manner, is significantly changed under the experimental treatment, and captures the magnitude of treatment effect that will be observed once the true clinical outcome becomes measurable. Statistical demonstration of causality between the assigned treatment, the surrogate, and the final outcome relies on either the causal effect or the causal association paradigm.4
The causal effect paradigm focuses on single-trial approaches assessing the ability of the surrogate end point to lie in the causal pathway from the treatment to the true outcome.21 The historical approaches estimate the proportion of treatment effect that is explained by the surrogate, whereas the modern mediation techniques use counterfactuals to decompose the total effect of a treatment on the true outcome into the direct (ie, independent of the surrogate) and indirect (ie, through the surrogate) effects of the treatment.22,23 Further details on counterfactuals and the causal framework are provided in the supplemental Appendix.
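As a rough illustration of the 2 single-trial approaches just described, the quantities involved can be written as follows. The notation is an assumption introduced here for clarity and is not taken from the cited works: S(z) is the surrogate value that would be observed under arm z (1 = experimental, 0 = control), Y(z, s) is the true outcome that would be observed under arm z with the surrogate set to s, β is the treatment effect on the true outcome, and β_S is the same effect after adjustment for the surrogate.

```latex
\begin{align*}
% Historical approach: proportion of treatment effect explained (PTE)
\mathrm{PTE} &= 1 - \frac{\beta_S}{\beta} \\[4pt]
% Mediation approach: counterfactual decomposition of the total effect (TE)
\mathrm{TE}  &= E[Y(1, S(1))] - E[Y(0, S(0))] \\
\mathrm{NDE} &= E[Y(1, S(0))] - E[Y(0, S(0))] && \text{(direct: independent of the surrogate)} \\
\mathrm{NIE} &= E[Y(1, S(1))] - E[Y(1, S(0))] && \text{(indirect: through the surrogate)} \\
\mathrm{TE}  &= \mathrm{NDE} + \mathrm{NIE}, \qquad
\text{proportion mediated} = \frac{\mathrm{NIE}}{\mathrm{TE}}
\end{align*}
```

Under this sketch, a proportion mediated close to 1 would indicate that almost all of the treatment effect on the true outcome passes through the surrogate.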
In contrast, the causal association paradigm investigates the relationship between the treatment effects on the surrogate end point and the true outcome, which does not imply that the surrogate lies on the causal pathway. The most common method is trial-level surrogate validation, which measures the strength of correlation between the change in the surrogate and the change in the final outcome across randomized trials.24,25 Another method is principal stratification, which estimates the contrast of treatment effect in all patients (using counterfactuals) according to whether or not they had met the surrogate end point.26,27
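The trial-level idea can be sketched in a few lines of Python. This is a deliberately simplified illustration and not the method used by the consortia cited in this review: it regresses the estimated treatment effect on the true end point against the estimated effect on the surrogate, with 1 point per trial weighted by trial size, and reports the weighted R2. The function name, the choice of weights, and all numbers are hypothetical, and the sketch ignores the estimation error in each trial's effects.

```python
def weighted_trial_level_r2(effects_surrogate, effects_true, weights):
    """Weighted least-squares line and R^2 across trials.

    effects_surrogate / effects_true: one treatment-effect estimate per trial
    (for example, log hazard ratios); weights: one positive weight per trial.
    """
    total_w = sum(weights)
    mean_s = sum(w * s for w, s in zip(weights, effects_surrogate)) / total_w
    mean_t = sum(w * t for w, t in zip(weights, effects_true)) / total_w

    # Weighted sums of squares and cross-products around the weighted means
    cov_st = sum(w * (s - mean_s) * (t - mean_t)
                 for w, s, t in zip(weights, effects_surrogate, effects_true))
    var_s = sum(w * (s - mean_s) ** 2 for w, s in zip(weights, effects_surrogate))
    var_t = sum(w * (t - mean_t) ** 2 for w, t in zip(weights, effects_true))

    slope = cov_st / var_s
    intercept = mean_t - slope * mean_s
    r2 = cov_st ** 2 / (var_s * var_t)  # squared weighted correlation
    return slope, intercept, r2


# Hypothetical per-trial log hazard ratios (not real data)
log_hr_surrogate = [-0.10, -0.25, -0.40, -0.55, -0.70]
log_hr_true = [-0.05, -0.18, -0.22, -0.41, -0.45]
trial_sizes = [300, 450, 500, 800, 650]

slope, intercept, r2 = weighted_trial_level_r2(
    log_hr_surrogate, log_hr_true, trial_sizes
)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, R2_trial={r2:.2f}")
```

An R2 close to 1 in such a plot would mean that the effect observed on the surrogate predicts the effect on the true end point well across the included trials.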
Although very popular, the trial-level surrogate validation relies on the use of individual patients’ data (IPD). IPD require extensive time and resources to overcome legal and data management issues, and when they are eventually gathered, they generally represent only a fraction of the published data in the studied indication.28 In that context, simplified models have been proposed for the estimation of the trial-level association when only aggregated data are available, using a meta-regression, but this method cannot be considered as reliable as the IPD-based trial-level surrogate validation (Table 2).
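When only aggregated data are available, each trial typically contributes a published hazard ratio and its confidence interval rather than patient-level records. A minimal sketch of how such a result might be turned into an input for a meta-regression follows, assuming the published interval is a symmetric Wald interval on the log scale. The function name is hypothetical, and inverse-variance weighting is 1 common choice rather than the approach of any specific analysis cited in this review.

```python
from math import log
from statistics import NormalDist


def log_hr_with_weight(hr, ci_lower, ci_upper, level=0.95):
    """Convert a published HR and CI into (log HR, standard error, weight)."""
    # Normal quantile for the two-sided interval (about 1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + level / 2)
    log_hr = log(hr)
    # The CI spans 2 * z standard errors on the log scale
    se = (log(ci_upper) - log(ci_lower)) / (2 * z)
    weight = 1 / se ** 2  # inverse-variance weight (an assumed choice)
    return log_hr, se, weight


# PFS result quoted earlier for the BELLINI trial: HR 0.63 (95% CI, 0.44-0.90)
log_hr, se, weight = log_hr_with_weight(0.63, 0.44, 0.90)
print(f"log HR={log_hr:.3f}, SE={se:.3f}, weight={weight:.1f}")
```

Pairs of such log hazard ratios (1 for the surrogate, 1 for the true end point) could then be regressed against each other across trials, bearing in mind that this uses far less information than an IPD-based analysis.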
It is crucial to remember that, regardless of the method, a validated surrogate is only valid for a specific disease setting and class of interventions. Even with the trial-level validation method, because it relies on the “no between-trials confounders” assumption, it must include only similar trials (same disease, same line, and same backbone of control arms). In other words, using multiple randomized trials with various settings or classes of interventions may not make the results more transportable. Given that the challenge of surrogate transportability in future trials (newer drugs and different settings) is theoretically unsolvable, the scientific risks taken by regulatory agencies become reasonable or, at least, understandable. When nonvalidated surrogates are leveraged to grant accelerated approvals, confirmatory trials or postmarketing analyses are then of paramount importance.
Surrogates in hematologic malignancies
Validated surrogates (level 2)
In diffuse large B-cell lymphoma (DLBCL), the SEAL consortium29 used a multiple-trial R2 approach, leveraging the IPD from 13 randomized trials (accrual start ranged from 1998-2007). Trial-level surrogacy for PFS was strong with an R2trial > 0.80 and PFS was therefore considered a good surrogate of OS in first-line DLBCL. Yet, in 2017, the REMARC trial30 demonstrated that lenalidomide maintenance in patients aged 60 to 80 years who achieved at least a partial response to R-CHOP (rituximab, cyclophosphamide, doxorubicin, Oncovin [vincristine], and prednisone) induction was associated with an improved PFS over placebo; however, OS was similar between arms. In 2022, the POLARIX trial31 demonstrated that polatuzumab vedotin-R-CHP (rituximab, cyclophosphamide, doxorubicin, and prednisone) was associated with an improved PFS over the standard R-CHOP, but 2-year OS did not differ significantly between arms. Multiple explanations can be set forth to explain these discrepancies (limited effect size in PFS, immature OS data, advent of new effective treatments in relapsed patients, and nonsimilarity of trial patients vs SEAL patients) but none has yet been proven. Mindful of the fact that new therapeutic and response assessment eras may affect the surrogate transportability, PFS remains the only and strongest surrogate of OS in DLBCL.
In FL, the FLASH consortium17 used a multiple-trial R2 approach, leveraging the IPD from 13 randomized trials (accrual start ranged from 1980-2004). Trial-level surrogacy for CR30 was strong with an R2trial > 0.80, and CR30 was therefore considered a good surrogate of PFS in first-line FL. To date, no clinical trial published on PubMed has used CR30 as a primary end point. According to ClinicalTrials.gov, of the 24 phase 3 trials in FL that began enrolling after publication of FLASH results, none used CR30 as the primary end point. Notably, only 1 approval (obinutuzumab) was granted in first-line FL in the past 10 years and it was based on an improvement in PFS.
In MZL, sidestepping the issue of randomized trials scarcity, we, and others, performed a single-trial approach based on the IELSG19 trial that demonstrated that CR24 (mediated effect = 0.90) and time to CR censored at 24 months (captured effect = 0.95) were valid surrogates of 8-year PFS.18
Nonvalidated surrogates, yet reasonably likely to predict clinical benefit (level 3)
EFS
When an innovative therapy is evaluated in phase 3 trials, investigators seek to assess its efficacy in a timely manner. In this context, EFS is useful because its definition is trial dependent and may include additional events such as nonresponse, discontinuation of the experimental treatment, or initiation of a new treatment. However, interpretation of EFS results is limited by its trial-dependent definition.
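The trial dependence of EFS can be made concrete with a small sketch. The patient record, the event names, and the 2 definitions below are hypothetical and are not taken from any trial discussed here; the point is only that the same patient can have very different EFS times depending on which events a protocol counts.

```python
from typing import Dict, Iterable, Optional, Tuple


def event_free_survival(
    event_times: Dict[str, Optional[float]],
    follow_up: float,
    qualifying_events: Iterable[str],
) -> Tuple[float, bool]:
    """Return (time in months, event observed) under one EFS definition.

    event_times maps an event name to the month it occurred (None if never).
    Only events listed in qualifying_events count toward EFS.
    """
    observed = [
        t
        for t in (event_times.get(name) for name in qualifying_events)
        if t is not None and t <= follow_up
    ]
    if observed:
        return min(observed), True  # first qualifying event
    return follow_up, False  # censored at last follow-up


# Hypothetical patient followed for 24 months
patient = {
    "nonresponse": 3.0,
    "new_treatment": 4.5,
    "progression": 14.0,
    "death": None,
}

narrow_definition = ["progression", "death"]
broad_definition = ["nonresponse", "new_treatment", "progression", "death"]

print(event_free_survival(patient, 24.0, narrow_definition))  # (14.0, True)
print(event_free_survival(patient, 24.0, broad_definition))   # (3.0, True)
```

With the narrow definition this patient has an event at 14 months, whereas the broad definition records an event at 3 months, which is why EFS estimates are difficult to compare across trials unless definitions are harmonized.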
In AML, EFS served as the basis for FDA approval of gemtuzumab ozogamicin32,33 in first-line CD33+ AML. In this very aggressive disease, timing of CR assessment as an EFS event may considerably affect EFS estimates.34 Recently, a simplified approach (with aggregated data, instead of IPD) showed, at the trial level, a strong correlation between HR EFS and HR OS (R2 = 0.87 [95% CI, 0.47-0.98]), but this correlation was weakened when EFS definitions were not harmonized across trials.35 This analysis encouraged the FDA to consider that an EFS benefit would be likely to translate into an OS effect.
In DLBCL, EFS served as the basis for FDA approval of axicabtagene ciloleucel (the ZUMA-7 trial36) and lisocabtagene maraleucel (the TRANSFORM trial37) in second-line DLBCL. It is noteworthy that EFS was never validated as a surrogate in a relapse setting. Moreover, EFS definition varied significantly between the ZUMA-7, TRANSFORM, and BELINDA trials: stable disease was considered an event at different time points across trials and a second salvage regimen was allowed in the BELINDA trial.38
Durable OR and CR
According to the FDA’s surrogate end point table,39 durable overall response (OR)/CR are potential surrogates that may be considered for drug approvals, because they are considered likely to predict clinical benefit; however, they do not capture toxicity.
In lymphoid neoplasms, over the past 10 years, all FDA accelerated approvals were based on durable OR rates (usual primary end point in phase 2 trials). Durable OR/CR rates were also used as surrogates to substantiate traditional approvals (eg, ibrutinib and zanubrutinib in Waldenström macroglobulinemia, crizotinib in R/R anaplastic lymphoma kinase–positive anaplastic large T-cell lymphoma, lenalidomide in R/R indolent non-Hodgkin lymphomas, axicabtagene ciloleucel and lisocabtagene maraleucel in R/R DLBCL, and tisagenlecleucel in R/R FL). Over the period from 2005 to 2022, none of the 15 surrogate analyses conducted by the FDA investigated the reliability of durable OR/CR in lymphoid neoplasms.20 Moreover, durable OR/CR actually embeds 2 different end points (best OR/CR in all patients, and duration of response in responders), which complicates its interpretation.
In newly diagnosed AML, the FDA conducted a trial-level simplified analysis assessing CR as surrogate of OS, but the results did not show a strong correlation between these end points.35 More stringent responses are now required to be considered as reasonable surrogate according to the FDA’s surrogate end point table: major hematologic response or cytogenetic response in acute lymphoblastic leukemia (ALL), AML, myeloproliferative diseases, and chronic myeloid leukemia (CML). This leads us to the next topic: MRD.
MRD
MRD is so far only recognized as a potential surrogate by the FDA in patients with B-cell ALL in first or second CR. It served, in combination with relapse-free survival, as the basis for FDA approval of blinatumomab in 2018 (BLAST40). Although it is an independent predictor of risk of relapse and long-term OS in adults and children with B-cell ALL,41 it was never validated as a surrogate of survival in this disease. In 2020, the FDA issued a guidance, asking researchers and sponsors to conduct a trial-level surrogacy analysis to validate MRD as a surrogate end point.42
In AML, a prognostic meta-analysis of patients receiving first-line therapy showed a significant association between the levels of MRD and OS, irrespective of age, AML subtype, sample type, time of MRD assessment, and MRD detection method.43 In intermediate-risk AML, an MRD-driven treatment strategy has demonstrated its superiority in OS.44 Likewise, MRD is now a tool to optimally determine the conditioning intensity before allogeneic transplant.45 Nevertheless, no surrogacy analysis was adequately performed that would make MRD an actual surrogate of OS in AML. That does not discourage investigators from using it as a coprimary end point in ongoing phase 3 trials (eg, NCT03665480, NCT04093505, NCT04168502, and NCT01828489), which might be motivated by the high lethality of the disease and the hope that MRD will soon be considered by the FDA as a reasonable surrogate in AML. In this regard, the Foundation for the National Institutes of Health’s Biomarkers Consortium is currently collecting data to support MRD as a validated surrogate in AML.
In first-line multiple myeloma, a trial-level simplified analysis based on aggregated data from 6 trials claimed that MRD was a level 3 surrogate of PFS (R2trial = 0.97).46 These results were recently challenged by 2 other analyses based on aggregated data (R2trial = 0.62 and R2trial = 0.70).47,48 Besides, several studies have shown the strong prognostic role of MRD in predicting both PFS and OS,49-51 which led to the development of MRD-guided strategies (MASTER52). However, as in AML, MRD technique and threshold may vary across trials and centers, which affects the prediction of survival outcomes; and its consistency of measurement can be limited by a patchy infiltration of the bone marrow, hemodilution during aspiration, or even less aggressive clones being incidentally picked up.53 This potentially renders a multiple-trial surrogacy analysis more complicated because of the data alignment that becomes necessary (the i2TEAMM is currently harmonizing data to overcome this issue).54
In CLL, several analyses strongly associated MRD negativity with both PFS and OS in patients receiving fixed-duration immunochemotherapy55-59 or venetoclax plus anti-CD20 combinations.58,60 Yet, this prognostic significance was not observed in patients with unmutated immunoglobulin heavy-chain variable region gene treated with immunochemotherapy.61 Moreover, in patients receiving fixed-duration ibrutinib-venetoclax, a detectable MRD at end of treatment was not associated with a shorter survival (CAPTIVATE62,63 and GLOW64). Nonetheless, MRD-guided strategies demonstrate outstanding results both in terms of PFS and OS (FLAIR65). Therefore, MRD status is very promising in CLL, but no consensus yet exists on when it should be assessed as a surrogate measurement for PFS or OS.
In CML, dasatinib and nilotinib were approved based on rates of complete cytogenetic response and major molecular response. Upon publication of the DASISION66 and ENESTnd67 trials, only a few studies had demonstrated the prognostic impact of these markers on OS.68-71 Nonetheless, in an allegedly 1-hit tumor such as CML, treatment-free remission (ie, the combination of achieving a major molecular response and maintaining that response after therapy discontinuation) is legitimately seen as a gold standard end point. Finally, CML has become a chronic disease in which OS, for most patients, is determined by their preexisting comorbidities. Therefore, development of new efficacy surrogates is a lower priority than preventing drug-related AEs.72
PET/CT radiomics
In diseases with measurable [18F]fluorodeoxyglucose-avid lesions, PET imaging provides investigators with various ways to capture treatment effect (eg, ΔSUVmax [maximum standardized uptake value] and change in total metabolic tumor volume), all the more so now that fully automatic segmentation is at hand using artificial intelligence.78 Change in total metabolic tumor volume and 5-point Deauville score on interim PET enable individualized estimation of survival outcomes in patients with lymphoma, and they are already used for risk-adapted treatment approaches in clinical trials. In [18F]fluorodeoxyglucose-avid lymphomas, because PET is the main response assessment tool, the evaluation of PET-CR as a surrogate is encompassed by the evaluation of CR as a surrogate.
In multiple myeloma, the ability of PET to distinguish between active and inactive sites makes it an excellent tool to monitor response to therapy.79 Achievement of PET CR before maintenance therapy or at 6 months after treatment start is a positive predictor for long-term PFS and OS,80,81 and ΔSUVmax of ≥25% on interim PET is an independent prognostic factor for PFS.82 These findings incited the International Myeloma Working Group to list PET radiomics in the MRD evaluation criteria.83
Correlates (level 4)
POD24
The POD24 marker (progression of disease within 2 years) was first developed in first-line FL, identifying a high-risk population that may be preferentially selected for investigation of novel therapies in the second-line setting.84,85 However, many patients experiencing POD24 do so in the context of transformation to DLBCL,86 showing that POD24 does not lie on the causal pathway between the first-line therapy and OS. Recently, POD24 was even excluded from the potential surrogates in FL.87 In a similar fashion, POD24 was correlated with PFS and OS in MZL and mantle-cell lymphoma (MCL),88-91 but as in FL, the correlation of POD24 with survival is biased by the transformation of MZL to DLBCL92,93 or of classic MCL to blastoid variant MCL. Therefore, POD24 cannot yet be used as a surrogate in these diseases.
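For readers who want to derive a POD24 flag from follow-up data, a minimal sketch is given here. Several choices in it are assumptions made for illustration and are not specified in this review: the time origin (taken here as the start of first-line therapy), the inclusive 24-month boundary, and the handling of patients censored before 24 months, who are returned as indeterminate rather than counted as non-POD24. The function name is hypothetical.

```python
from typing import Optional


def pod24_status(
    months_to_progression: Optional[float],
    months_of_follow_up: float,
    window: float = 24.0,
) -> Optional[bool]:
    """Classify a patient as POD24 (True), non-POD24 (False), or unknown (None).

    months_to_progression: months from the assumed time origin to progression,
    or None if no progression has been observed.
    """
    if months_to_progression is not None and months_to_progression <= window:
        return True  # progressed within the window
    if months_of_follow_up >= window:
        return False  # followed through the window without early progression
    return None  # censored before the window closed: status cannot be determined


print(pod24_status(10.0, 36.0))   # True
print(pod24_status(None, 30.0))   # False
print(pod24_status(None, 12.0))   # None
```

The indeterminate category matters in practice: counting early-censored patients as non-POD24 would overstate the size of the low-risk group.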
TTNT
Time to next treatment (TTNT) has the advantage (and the weakness) of combining disease progression, symptom control, and treatment tolerability into 1 end point. It can be clinically meaningful in 3 situations: when studying indolent diseases that may not be actively treated as soon as progression is noted; in a retrospective setting for rare diseases in which randomized trials are not feasible; and in a prospective setting when trials with crossover designs are conducted.94,95 That makes TTNT a meaningful end point for real-world studies and in diseases such as cutaneous lymphoma, in which pinpointing the exact date of progression is challenging. Nevertheless, TTNT has 2 major caveats: it does not assess efficacy alone, which makes it less objective than PFS; and it cannot be assessed in a timely manner. PFS remains, therefore, the relied-upon end point according to the FDA in cutaneous lymphoma.94,96
Discussion
The goal of surrogates is to provide an accurate end point to shorten trial duration and allow more timely access to innovative therapies for patients facing unmet needs. In a retrospective study of FDA oncology approvals from 2006 to 2017 and their registration trials (107 oncology drugs with 188 indications), the use of PFS rather than OS was associated with a savings of 11 months, and that of OR/CR with a savings of 19 months.97 When validated surrogates (level 2) are used in registration trials, this represents a precious time saving for sponsors, clinicians, and patients. When nonvalidated surrogates (level 3) are used in registration trials, this time saving must be weighed against the increased uncertainty of clinical benefit that they entail. Over the past decades, a major shift was observed in the use of surrogate end points, which was concomitant with the increase in the proportion of industry-funded randomized trials.98 The contrast between the limited number of available validated surrogates and the increasing use of nonvalidated surrogates should be a matter of concern for the academic community. As a matter of fact, there are many counter-examples of drugs that were approved on the basis of a surrogate but proved to be ineffective after assessment of clinically meaningful end points.99 This is why the epithet “surrogate” should be cautiously used, and surrogacy analyses should be encouraged by clinicians and sponsors. In the surrogate validation process, clinicians are consulted for multiple reasons: determining the most used primary end points in phase 3 trials100-102 and the end points’ definitions across trials (which can widely vary103-105); selection of candidate surrogates; and dissemination of validated surrogates by using them as primary end points in randomized trials (Figure 1).
In our classification of surrogates in hematology, we labeled “level 2” the surrogates that had been academically validated. In contrast, “level 3” surrogates are already relied upon for FDA approvals without previous academic validation. Nonvalidated surrogates are particularly used to assess drugs that go through the fast-track process, either because a “breakthrough therapy” is expected to be a game changer in a specific indication (eg, EFS-based approval for chimeric antigen receptor T-cells in relapsed LBCL), or because of the unmet need of the indication in which the drug is given (eg, OR/CR-based approval of crizotinib in R/R anaplastic lymphoma kinase–positive anaplastic large cell lymphoma). Adaptive design trials are now conducted using level 3 surrogates as decisional biomarkers (eg, MRD), which personalizes treatment in population subgroups but further complicates the ability to validate these end points at the trial level in the future.
The inadequate use of nonvalidated surrogates may simply harm patients. Regulatory agencies should promote level 2 end points, foster research on potential surrogates (level 3), and help in the simplification of data sharing with academics.
Conclusion
Surrogate end points represent a critical area of statistical application and innovation in hematology. Settings in which OS is not a feasible primary end point are pervasive, and specific surrogacy analyses may be necessary in numerous clinical settings and diseases to make informed decisions about clinical trial design and interpretation. To this end, clinicians, statisticians, and trial sponsors, from both industry and academia, have a critical role to play in facilitating surrogacy studies and working with regulatory agencies to ensure clinical trials are timely, efficient, and accurate.
Authorship
Contribution: All authors were involved at each step of the review.
Conflict-of-interest disclosure: C.B. was supported by INSERM/AvieSan-ITMO Cancer through a doctoral grant and was awarded the Bertrand Coiffier Prize by LYSA/ELI. C.B. also received funding as mobility support from the Philippe Foundation and Institut Servier (CT0101951). M.J.M. declares research funding from Roche/Genentech, Bristol Myers Squibb, and GenMab; and consultancy for, and advisory board membership with, Bristol Myers Squibb, and AstraZeneca. J.L. declares no competing financial interests.
Correspondence: Côme Bommier, Service d’Hémato-Oncologie, Hôpital St Louis, 1 Ave Claude Vellefaux, 75010 Paris, France; email: come.bommier@aphp.fr.
References
Author notes
The online version of this article contains a data supplement.