In this issue of Blood, Ruppert and colleagues compare 3 common prognostic indices for diffuse large B-cell lymphoma (DLBCL) across a large assemblage of patients in clinical trials and argue that the National Comprehensive Cancer Network (NCCN) International Prognostic Index (IPI) performs best.1
Optimal navigation of the heterogeneity in a disease such as DLBCL relies on informative prognostic tools. Prognostic indices aid the clinician when explaining risks to patients with DLBCL, and the indices are useful when comparing patient cohorts in clinical research. The IPI has a longer history in the management of DLBCL than treatment with rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP) and is seemingly just as hard to vanquish as the tool of choice.2 After the initial IPI in 1993 effectively categorized populations of patients with DLBCL into 4 groups with distinct risks of premature death, modifications to that IPI were proposed with the goal of improving discrimination in newer patient cohorts. The revised IPI (R-IPI)3 and NCCN-IPI4 are 2 such modifications that have become popular among loyal groups of users, to the detriment of the lymphoma research community; use of 3 prognostic indices is worse than use of only 1 with user consensus. Until now, these indices have not been rigorously compared with one another. Ruppert et al show that when the 3 indices are compared, the NCCN-IPI best discriminates patients with varying risk for death within 5 years, but not by much (see figure).
The IPI, which incorporates 5 easily collected clinical variables, has been trusted for more than a quarter century as effective and simple, but it is flawed. The IPI was modeled on a cohort of patients who were treated before monoclonal antibodies revolutionized the diagnosis and treatment of large cell lymphoma. With modern therapies, the index extremes fail to identify patients who are either reliably cured or who are more likely than not to die of their disease. Most disappointingly, with a few exceptions of low-risk, limited-stage patients,5 the risk categories within the IPI do not dictate decisions in treatment strategies. The R-IPI (2007) has different risk-group clustering based upon the same 5 clinical variables, and it was evaluated retrospectively in a cohort of patients who received R-CHOP therapy. Sehn and colleagues at the time suggested that the R-IPI was better because it retained 3 distinct prognostic categories, whereas the IPI applied to the same cohort functionally identified only 2 prognostic categories.
The NCCN-IPI uses the same 5 clinical variables with further leverage given to the broad range of age and lactate dehydrogenase levels, while also modifying extranodal criteria to any involvement of specified sites. In their 2014 report, the authors suggested that the NCCN-IPI was better than the IPI because of its enhanced capacity to discriminate risk groups. They expressed particular enthusiasm for its potential to identify high-risk candidates for novel approaches in clinical trials, perhaps not fully appreciating that the increased risk for a patient was frequently determined on the basis of extra points for advanced age or central nervous system disease, factors that, ironically, often preclude participation in clinical trials.
Good models, like good therapies, are best compared head-to-head in novel patient populations. Ruppert et al did exactly that by assembling 7 recent DLBCL populations treated somewhat consistently in well-respected clinical trials and applying all 3 models to the combined cohort. In this setting, the NCCN-IPI does the best job of identifying patients at highest risk of early death, and it does as well as the other indices at identifying those at lowest risk. Importantly, NCCN-IPI also had the best discrimination for risk of progression at 24 months and progression-free survival. Although the differences are mathematically rigorous, they are clinically modest, which one might expect, given that each model relies on the same 5 clinical variables. The 3 indices also share a potentially crippling weakness because of newer DLBCL diagnostic categories. As noted by the authors, even these recent clinical trials in patients with DLBCL are populated by patients whose disease today would be classified formally as high-grade lymphoma with MYC and BCL2 and/or BCL6 rearrangements or recognized as DLBCL with abundant expression of those same proteins. Each of these would be recognized as having an adverse prognosis independent of prognostic index score, and their exclusion in future DLBCL cohorts will likely further weaken the ability of the above indices to identify a population with very high risk.
Hippocrates tells us that “(s)he will manage the cure best who has foreseen what is to happen from the present state of matters.”6 We must still use the best available indices to compare clinical trial populations across studies, and Ruppert et al are justified in suggesting that NCCN-IPI should be the index of choice for this exercise, especially since the comparison was made in large clinical trial cohorts. Any of the choices is still probably fine for clinicians when discussing prognosis with their patients. Disappointingly, none are yet useful in making therapy choices. The next new index should be held to that standard.
Conflict-of-interest disclosure: B.K.L. has received compensation as a consultant for Genentech and Karyopharm and as a data safety monitoring board member for AbbVie and Gilead.