Key Points
Clinical trials of different CAR-T products for patients with r/r DLBCL are not aligned on CRS grading scales and management algorithms.
Regrading of CRS from the JULIET trial using the Penn, Lee, and ASTCT systems highlights the need for standardized CRS grading practices.
Abstract
Chimeric antigen receptor T-cell (CAR-T) therapy yields durable responses in patients with relapsed/refractory diffuse large B-cell lymphoma (r/r DLBCL). Cytokine release syndrome (CRS) is a CAR-T therapy–related adverse event. To date, clinical trials of different CAR-T products have not been aligned on CRS grading scales and management algorithms. We assessed concordance between the Penn, Lee, and American Society for Transplantation and Cellular Therapy (ASTCT) grading systems by retrospectively regrading CRS events in the JULIET (A Phase 2, Single Arm, Multicenter Trial to Determine the Efficacy and Safety of CTL019 in Adult Patients With Relapsed or Refractory DLBCL) trial. Four medical experts with experience treating patients with 3 different CAR-T products independently regraded individual patient-level CRS events from the phase 2, global, pivotal JULIET trial (#NCT02445248). As of 8 December 2017, a total of 111 patients with r/r DLBCL underwent infusion with tisagenlecleucel. Sixty-four patients had CRS events graded per the Penn scale; on retrospective review, 63 and 61 patients had CRS events regraded per the Lee and ASTCT criteria, respectively. The Lee scale yielded concordance for 39, lower grade for 20, and higher grade for 5 events compared with the Penn scale. The ASTCT criteria provided concordance for 37, lower grade for 23, and higher grade for 4 events compared with the Penn scale. Sixteen (14%) of 111 patients in the JULIET trial received tocilizumab, all for severe events (Penn grade 3/4 CRS). This study is the first to assess concordance between 3 CRS grading scales using the same patient data set and to compare tocilizumab use according to the Lee scale in the JULIET trial and the ZUMA-1 (Long-Term Safety and Activity of Axicabtagene Ciloleucel in Refractory Large B-Cell Lymphoma) trial. This analysis describes key differences between grading scales and may inform CRS management practices.
Introduction
Autologous, CD19-targeted chimeric antigen receptor T-cell (CAR-T) therapy has greatly improved outcomes for patients with relapsed/refractory (r/r) hematologic malignancies. Two currently commercially available CD19-directed CAR-T therapy products, tisagenlecleucel and axicabtagene ciloleucel, have shown durable responses and improved overall survival compared with historical controls.1 These findings were reported in the JULIET (A Phase 2, Single Arm, Multicenter Trial to Determine the Efficacy and Safety of CTL019 in Adult Patients With Relapsed or Refractory DLBCL) trial, enrolling patients with r/r diffuse large B-cell lymphoma (DLBCL) and transformed follicular lymphoma (tFL), and the ZUMA-1 (Long-Term Safety and Activity of Axicabtagene Ciloleucel in Refractory Large B-Cell Lymphoma) trial, enrolling patients with r/r DLBCL, tFL, and primary mediastinal B-cell lymphoma.2-5 A third CD19-directed CAR-T therapy, JCAR017 (lisocabtagene maraleucel), is currently under investigation in B-cell non-Hodgkin lymphomas (#NCT02631044).6 Several unique and commonly observed adverse events (AEs) are associated with CAR-T therapy across hematologic malignancies and require specialized management; these AEs include cytokine release syndrome (CRS) and neurologic toxicity.3,5,7-10 Any-grade CRS occurs in many patients receiving CAR-T therapies,11-13 including tisagenlecleucel (58%) or axicabtagene ciloleucel (93%), although cross-trial comparisons are difficult to interpret due to diverse trial designs and the differences in CRS reporting scales used.3,5 Sixteen of 111 patients in the JULIET trial (14%) and 49 of 108 patients in the ZUMA-1 trial (45%) received tocilizumab for CRS management, per different CRS management algorithms.5,7,14-16
The National Institutes of Health (NIH) Common Terminology Criteria for Adverse Events version 4.03 were not adequately designed for determining CAR-T therapy–associated CRS onset and severity. Instead, new systems have been developed to define and grade CRS events and are used in different CAR-T therapy clinical trials.10,13,17-20 The Penn scale was developed first, at the University of Pennsylvania, and was used in the JULIET trial of tisagenlecleucel for patients with r/r DLBCL.13 Around the same time, investigators at the NIH introduced the NIH consensus criteria (commonly referred to as the Lee scale) for use in NIH trials and subsequently in the ZUMA-1 trial of axicabtagene ciloleucel, as well as other CAR-T therapy trials.18 More recently, the American Society for Transplantation and Cellular Therapy (ASTCT; formerly known as the American Society for Blood and Marrow Transplantation) formulated CRS toxicity criteria based on a consensus conference among 49 academic, industry, and payor experts, as well as representatives of several federal organizations in the United States, including the US Food and Drug Administration. The ASTCT criteria address both CAR-T therapy–associated CRS and neurologic toxicity.2
The key differences between the 3 grading scales are shown in Table 1.2,13,18 Briefly, the Penn scale defines CRS as a single event and retrospectively determines the time to onset based on when fever and/or myalgias first became evident in a patient who develops CRS after infusion of CAR T cells. This scale frames CRS severity in terms of hypotension and fluid and/or pressor requirements used for CRS management.13 The Lee scale provides definitions of mild, moderate, and severe CRS with additional attention to single-organ toxicities. The Lee scale also outlines detailed CRS management recommendations, including use of tocilizumab and anticytokine therapy.18 Finally, the ASTCT’s definition of CRS simplifies the grading scale, focusing on fever, hypotension and hypoxia, and symptoms that occur primarily within 14 days of CAR-T infusion. Because these symptoms can be observed individually and can arise from causes other than the CAR-T–induced CRS, cautious evaluation of any CRS-like events occurring beyond 14 days is recommended by the ASTCT.2
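As a rough illustration of the fever-anchored ASTCT approach described above, the following Python sketch assigns a grade from fever, hypotension, and oxygen requirement. It is a simplified reading of the published ASTCT consensus criteria, not a reproduction of Table 1. The class name, field names, and oxygen categories are hypothetical, and details such as the exact fever threshold, the handling of vasopressin, and fever masked by antipyretics or anticytokine therapy are deliberately omitted. It is not intended for clinical use.

```python
from dataclasses import dataclass


@dataclass
class CrsObservation:
    """Hypothetical, simplified summary of one patient's worst CRS findings."""
    fever: bool          # fever documented at onset
    hypotension: bool    # any hypotension attributed to CRS
    vasopressors: int    # number of vasopressors used (vasopressin not counted)
    oxygen: str          # "none", "low_flow", "high_flow", or "positive_pressure"


def astct_grade_sketch(obs: CrsObservation) -> int:
    """Return 0 (no CRS) to 4 under a simplified ASTCT-style rule set."""
    # Without documented fever, the event is not graded as CRS at all.
    if not obs.fever:
        return 0
    # The grade is driven by the more severe of hypotension and hypoxia.
    if obs.vasopressors >= 2 or obs.oxygen == "positive_pressure":
        return 4
    if obs.vasopressors == 1 or obs.oxygen == "high_flow":
        return 3
    if obs.hypotension or obs.oxygen == "low_flow":
        return 2
    return 1


# Hypothetical patient (not trial data): hypotension and oxygen need, no fever.
no_fever = CrsObservation(fever=False, hypotension=True, vasopressors=0, oxygen="low_flow")
print(astct_grade_sketch(no_fever))
```

Under this sketch, the hypothetical patient is assigned no CRS despite hypotension and an oxygen requirement, which is the kind of fever-driven discordance the regrading exercise was designed to surface.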
As noted above, comparisons of safety results from phase 2 clinical trials of tisagenlecleucel, axicabtagene ciloleucel, and lisocabtagene maraleucel are difficult because of different trial designs, enrolled patient populations, recommended management algorithms, and differences with respect to the CAR construct, modes of transfection, manufacturing, and T-cell composition.5-7,17 Comparing the existing 3 CRS grading systems when applied to a single patient data set will better inform clinicians treating patients with CAR-T therapy about CRS identification, severity assessment, and management, as well as facilitate comparison of trials. Thus, we present an analysis of the concordance between the Penn, Lee, and ASTCT scales by using Lee and ASTCT criteria to retrospectively regrade observed CRS events, previously graded per protocol using the Penn scale, in tisagenlecleucel-treated patients with r/r DLBCL or tFL in the JULIET trial. The data presented here facilitate interpretation of the existing literature and endorse the need for a consensus approach to CRS grading and treatment; the findings will in turn facilitate the development and distribution of best practices between institutions, physician groups, and CAR-T products, especially as new countries approve and implement CAR-T therapy.
Methods
JULIET trial
The JULIET trial (#NCT02445248) is a global, phase 2, single-arm, pivotal trial of centrally manufactured tisagenlecleucel for adult patients with r/r DLBCL or tFL who were ineligible for or had relapsed after autologous hematopoietic stem cell transplantation. Eligible patients were ≥18 years old and received at least 2 previous lines of therapy, including an anthracycline and rituximab. Patients who had received previous anti-CD19 therapy, had primary mediastinal B-cell lymphoma, received a previous allogeneic stem cell transplantation, or had active central nervous system involvement were not eligible for enrollment. After leukapheresis, tisagenlecleucel was manufactured at centralized facilities in Morris Plains, New Jersey, and in Leipzig, Germany. Bridging chemotherapy was permitted during the manufacturing interval. Lymphodepleting chemotherapy was omitted in a minority of patients with white blood cell counts <1000 cells/mm³ 1 week before tisagenlecleucel infusion.
The primary end point of the JULIET trial was overall response rate (partial responses plus complete responses) according to the Lugano classification21 per independent review committee assessment. Key secondary end points included duration of response, overall survival, safety, and cellular kinetics. Key exploratory end points included biomarkers.5
Data source
Patient-level data were obtained from case report forms from the JULIET trial. The extracted data included CRS-related AEs (eg, fever, hypotension, organ dysfunction, hypoxia), AE grade, time to onset of each event from time of infusion, and duration of each event. In addition, details on the management of CRS were extracted, including intensive care unit (ICU) admission, interval from infusion to ICU admission, duration of ICU admission, use of supplemental oxygen (and maximum level), hypotension interventions (eg, fluid resuscitation, dose/combination of vasopressors), ventilator support, systemic anticytokine therapy, and other major organ toxicities (eg, bleeding, concurrent infections, disseminated intravascular coagulation, pulmonary abnormalities).
Adjudication of CRS regrading
Four medical experts with experience treating patients with different CAR-T therapy protocols and products independently reviewed the extracted data and regraded CRS events using the Lee scale and ASTCT criteria (key definitions outlined in Table 1)2,13,18 while blinded to the other experts’ regrading assessment. As expected, especially when introducing new grading methods, some variance was observed among the experts’ independent and blinded grading. Thus, as occurs in real-world practice, complex patient cases went through an adjudication discussion by the 4 experts, similar to a clinical tumor board, with preestablished agreement to accept the highest individual grade as the final reported AE from the clinical data set. After discussion, final expert agreement was achieved in all cases.
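The adjudication rule described above can be sketched in a few lines of Python. This is a minimal illustration, assuming each expert's regrade is recorded as an integer (0 for no CRS); the function name, patient identifiers, and grades are hypothetical and are not trial data. The sketch only encodes the preestablished fallback of accepting the highest individual grade and flags cases that would go to discussion; it does not model the discussion itself.

```python
from typing import Dict, List


def adjudicate(expert_grades: Dict[str, List[int]]) -> Dict[str, dict]:
    """Summarize independent expert grades for each patient.

    'unanimous' marks cases with no disagreement; 'final_grade' applies the
    preestablished rule of accepting the highest individual grade.
    """
    results = {}
    for patient_id, grades in expert_grades.items():
        results[patient_id] = {
            "unanimous": len(set(grades)) == 1,
            "final_grade": max(grades),
        }
    return results


# Hypothetical grades from 4 experts (0 = no CRS); not trial data.
example = {"P01": [2, 2, 2, 2], "P02": [2, 3, 2, 2]}
adjudicated = adjudicate(example)

# Cases without unanimous agreement would be taken to the adjudication discussion.
needs_discussion = [pid for pid, r in adjudicated.items() if not r["unanimous"]]
print(adjudicated, needs_discussion)
```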
Results
As of 8 December 2017, a total of 111 adult patients with r/r DLBCL or tFL underwent infusion with tisagenlecleucel. Median follow-up from time of infusion was 14 months, and 93 patients had ≥3 months of follow-up. Detailed patient characteristics have been described previously.5 Eighty-nine percent and 63% of patients experienced an AE suspected to be related to tisagenlecleucel and a grade 3 or 4 AE suspected to be related to tisagenlecleucel, respectively. No grade 5 CRS events occurred.
Sixty-four patients (58%) had CRS events per the Penn scale and were candidates for regrading using the Lee scale and ASTCT criteria. Sixty-three patients (57%) were considered to have any-grade CRS according to the Lee scale, and 61 patients (55%) were considered to have CRS according to the ASTCT consensus criteria (Figure 1A). One patient with grade 1 CRS per the Penn scale did not have CRS according to the Lee scale (and ASTCT criteria) due to absence of documented fever or symptoms requiring intervention. Finally, 24 patients (22%) per the Penn scale, 19 patients (17%) per the Lee scale, and 15 patients (14%) according to ASTCT consensus criteria were considered to have grade 3 or 4 CRS events. Regrading CRS events by using the Lee scale provided concordance for 39 patients (61%), a lower grade (including downgrading to no CRS) for 20 patients (31%), and a higher grade for 5 patients (8%) compared with the Penn scale (Figure 1B). Two patients with Penn grade 1 CRS and 1 patient with Penn grade 4 CRS were regraded as having no CRS per the ASTCT criteria because fever >38°C was not documented; these 3 patients were counted as having lower grades. Regrading of CRS events using the ASTCT criteria provided concordance for 37 patients (58%), a lower grade for 23 patients (36%), and a higher grade for 4 patients (6%) compared with the Penn scale. In contrast, compared with the Lee scale, regrading of CRS events using ASTCT criteria provided concordance for 55 patients (86%), a lower grade for 6 patients (9%), and a higher grade for 3 patients (5%). Agreement of grading among the 4 experts was 57 (89%) for the Lee scale and 58 (91%) for the ASTCT criteria before the live adjudication discussion. After this meeting, expert agreement was 100% for both scales.
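The concordance tallies above follow from a simple per-patient comparison of the regrade against the reference grade. The Python sketch below shows one way such a tally could be computed; the function names are hypothetical, and the grade pairs are synthetic placeholders chosen only to reproduce the reported Penn-versus-Lee totals (39 concordant, 20 lower, 5 higher). They are not actual patient-level grades.

```python
from collections import Counter
from typing import Dict, List, Tuple


def classify(reference_grade: int, regrade: int) -> str:
    """Label a regrade relative to the reference grade (0 = no CRS)."""
    if regrade == reference_grade:
        return "concordant"
    return "lower" if regrade < reference_grade else "higher"


def concordance_summary(pairs: List[Tuple[int, int]]) -> Dict[str, Tuple[int, int]]:
    """Return (count, rounded percentage) for each concordance category."""
    counts = Counter(classify(ref, new) for ref, new in pairs)
    total = len(pairs)
    return {
        label: (counts[label], round(100 * counts[label] / total))
        for label in ("concordant", "lower", "higher")
    }


# Synthetic (Penn grade, Lee grade) placeholders matching the reported totals only.
penn_lee_pairs = [(2, 2)] * 39 + [(3, 2)] * 20 + [(1, 2)] * 5
penn_vs_lee = concordance_summary(penn_lee_pairs)
print(penn_vs_lee)
```

With 64 regraded patients as the denominator, the rounded percentages match those reported in the text (61%, 31%, and 8%).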
The most common discordance between CRS event grades was a higher Penn scale grade than either Lee or ASTCT grades. For example, 1 patient was evaluated as having grade 4 CRS according to both the Penn and Lee scales but no CRS according to the ASTCT criteria. This patient was admitted to an ICU 5 days postinfusion for a 12-day length of stay, required supplemental oxygen, and had hypotension treated with fluid resuscitation. Furthermore, this patient received tocilizumab therapy because the event was attributed to CRS. However, due to the absence of high fever, the experts unanimously agreed that CRS per the ASTCT criteria was not present. Otherwise, CRS grades were often redefined based on the presence of hypotension, levels of administered supplemental oxygen, or the presence or absence of unique organ toxicity such as acute kidney injury, which would be scored according to the Lee criteria but not by the Penn or ASTCT criteria. For example, a patient had grade 3 CRS per the Penn scale due to a fever lasting 4 days and was admitted to an ICU for 4 days. The patient was not intubated but required oxygen supplementation at a maximum rate of 2 L/min and fluid resuscitation for blood pressure support. This patient did not receive any anticytokine therapy or corticosteroids. However, it was noted that the patient had a grade 3 (according to Common Terminology Criteria for Adverse Events version 4.03) infection concurrent with CRS. When going through the available records during regrading, it was impossible to differentiate CRS from infection. To be on the conservative side, all experts regraded this CRS event as grade 2 according to both the Lee and ASTCT criteria.
Characteristics of CRS events and management approaches such as ICU admission, anticytokine therapy, and vasopressor use are shown in Table 2 according to the Penn, Lee, and ASTCT grades. Notably, of the 111 patients in the JULIET trial, 64 (58%) had CRS and 17 (15%) received an anticytokine therapy. Of the 111 patients in the JULIET trial, 16 patients (14.4%), 14 patients (12.6%), and 12 patients (10.8%) received tocilizumab for grade 3 or 4 CRS per the Penn scale, Lee scale, and ASTCT criteria, respectively. No patients, 2 patients (1.8%), and 3 patients (2.7%) received tocilizumab for grade 1 or 2 CRS per the Penn scale, Lee scale, and ASTCT criteria, respectively. In addition, 1 patient who received tocilizumab had been evaluated as having grade 4 CRS by the Penn and Lee scales but no CRS according to the ASTCT criteria.
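The tocilizumab counts reported above can be cross-checked: under each scale, the grade-specific counts should account for the same 16 recipients. The short Python sketch below performs that check and recomputes the percentages with 111 infused patients as the denominator. The dictionary layout and variable names are illustrative assumptions, and the zero entries for "no CRS" under the Penn and Lee scales are inferred from the reported totals rather than stated explicitly.

```python
TOTAL_INFUSED = 111

# Tocilizumab recipients by CRS grade category under each scale, as reported
# in the text (zeros for Penn/Lee "no CRS" are inferred from the totals).
tocilizumab_by_scale = {
    "Penn": {"grade 3/4": 16, "grade 1/2": 0, "no CRS": 0},
    "Lee": {"grade 3/4": 14, "grade 1/2": 2, "no CRS": 0},
    "ASTCT": {"grade 3/4": 12, "grade 1/2": 3, "no CRS": 1},
}

# Each scale should account for the same set of recipients.
totals = {scale: sum(counts.values()) for scale, counts in tocilizumab_by_scale.items()}

# Percentages of all infused patients, to 1 decimal place.
percentages = {
    scale: {category: round(100 * n / TOTAL_INFUSED, 1) for category, n in counts.items()}
    for scale, counts in tocilizumab_by_scale.items()
}

print(totals)
print(percentages)
```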
Tocilizumab treatment in JULIET by a retrospective Lee regrade was graphed alongside tocilizumab use in ZUMA-1 according to a prospective Lee grade (Figure 2).4,5,7,22 Although there are known major limitations to comparing these 2 trials (discussed later), this descriptive visual representation confirms that CRS management with tocilizumab in the JULIET trial was restrictive, with tocilizumab primarily administered for grade 3 to 4 CRS.
Discussion
The CD19-directed CAR-T therapy trials have thus far used 2 different, individually developed grading scales to evaluate CRS. Per protocol, CRS events in the JULIET trial were graded according to the Penn scale. The other CD19-directed CAR-T therapy clinical trials (ZUMA-1, ROCKET [Study Evaluating the Efficacy and Safety of JCAR015 in Adult B-cell Acute Lymphoblastic Leukemia], TRANSCEND [Study Evaluating the Safety and Pharmacokinetics of JCAR017 in B-cell Non-Hodgkin Lymphoma]) used the Lee scale.7,20 A consensus scale was proposed by the ASTCT with the intent to separate CRS into distinct actionable grades and standardize CRS grade definitions across clinical trials.2,13,18 Given the evolution of identification and grading of CRS, we retrospectively analyzed patient-level CRS data (per protocol assessed by using the Penn scale) from the JULIET trial and used expert adjudication to regrade all CRS events according to the Lee scale and ASTCT criteria; the goal was to identify concordance between scales when applied to a single CAR-T therapy product in the same clinical trial data set.
Compared with the Penn scale, reassessment according to the Lee scale showed more events being categorized as grade 1 CRS, fewer patients having grades 2 and 3 CRS, and the same number of patients having grade 4 CRS. Compared with the Penn scale, ASTCT criteria resulted in more patients being categorized as having grade 1, the same number of patients as having grade 2, and fewer patients as having grades 3 and 4 CRS. Overall, the Lee scale and ASTCT criteria result in lower grades of CRS than the Penn scale used in the JULIET trial. Individual examples show how key differences between the definitions of CRS grades across the 3 scales translate into different CRS grades. Thus, our analyses highlight the challenges of having multiple CRS grading and management algorithms in the real world. Discordance in CRS grades across the 3 scales occurred due to the weight given to fever by the ASTCT consensus criteria and the different cutoffs for oxygen supplementation between the 3 scales, as well as the inclusion of distinct organ toxicity grades to define CRS grade by the Penn and Lee systems and the elimination of specific organ toxicity grades by the ASTCT criteria.
Because CRS management algorithms are based on CRS grade, different grading systems could result in patients receiving more aggressive or less aggressive treatment of the same CRS event depending on the criteria used at each institution, unless treating physicians use a CAR-T product–specific scale, which is impracticable. An important benefit of standardizing CAR-T therapy–associated CRS grading criteria will be standardization of CRS management algorithms. In the JULIET trial, tocilizumab was recommended for CRS with worsening symptoms, with respiratory distress requiring high-flow oxygen and/or mechanical ventilation, or with hypotension requiring moderate- to high-dose vasopressor intervention (ie, symptoms equivalent to Penn grades 3 and 4 CRS, respectively). Thus, only 14% of patients received tocilizumab in the JULIET trial. Conversely, the modified CRS treatment guidance cited in ZUMA-1 suggested considering tocilizumab treatment of CRS starting at Lee grade 2, resulting in 45% of enrolled patients receiving at least 1 tocilizumab dose (supplemental Data).5,7,14 It is difficult to directly compare 2 separate, single-arm, phase 2 trials with distinct patient populations. Furthermore, some patients in the JULIET trial were treated as outpatients, and CRS according to Lee grade was determined retrospectively. By comparison, the ZUMA-1 trial protocol required 7 days of hospitalization and close patient monitoring, and it may have captured additional AE data not recorded in JULIET patients in the outpatient setting. Nevertheless, Figure 2 highlights the evolution of tocilizumab use for CRS treatment4,5,7,22; indeed, tocilizumab treatment of CRS at lower grades has become more prevalent in real-world practice and may lower the likelihood of subsequent evolution to grade 3 or 4 CRS events,23-25 obscuring the assessment of CAR-T product–related risk.
More recent CAR-T trials are also much more permissive in their algorithms for tocilizumab treatment of CRS. Thus, the JULIET trial is likely to remain the most conservative trial with regard to tocilizumab use, reserving treatment only for Penn scale grade 3 or 4 CRS events, which influenced decisions regarding utilization. Timing of tocilizumab treatment is also complicated by variations in definitions of CRS grade used by the Penn, Lee, and ASTCT scales. Therefore, achieving widespread use of a single set of consensus criteria, as proposed by the ASTCT, will allow a more uniform approach to managing CRS events in the real world. Furthermore, as CAR-T therapies become available in additional countries, it will be important that nurses and physicians with little experience managing CAR-T therapy–related AEs have clear toxicity definitions with aligned management recommendations to facilitate identification and treatment of emerging/progressing CRS. In the absence of any other global unifying criteria, we endorse the ASTCT consensus criteria as the current optimal grading scale for clinical studies and real-world use. Of note, prospective validation of ASTCT criteria is still needed, and ongoing clinical trials should continue to respond to the requests of regulatory bodies for any additional or alternate criteria by collecting the relevant data simultaneously with ASTCT grading data.
A key limitation of this analysis is that only retrospective patient-level data from the JULIET trial were available. Therefore, CRS grade was not prospectively assessed by the Lee and ASTCT criteria in parallel with the Penn scale, as the former were both introduced chronologically later than the Penn scale. In addition to grading criteria and CRS management practices, patient monitoring procedures have also evolved since the initial CAR-T treatment protocols. Nevertheless, using the JULIET trial data set as the basis for this analysis offered several advantages over using newer real-world data. Specifically, a global, registrational clinical trial offers a much more stringently controlled setting and extensive, standardized data collection/reporting procedures, capturing a more detailed and complete raw data set than can be obtained from the real-world setting. As more prospective and observational trials report results, CRS grading scales and treatment algorithms (with leeway for case-by-case treatment decisions) may become more standardized, leading to improved CRS management and optimization of anticytokine/tocilizumab and vasopressor use. We therefore expect that when large CRS data sets become available for study, we will be able to more fully characterize the clinical boundaries of CRS and also better distinguish CRS when overlap with other coincident pathologies, such as infection, is present. For now, the ASTCT criteria offer a comprehensive “how to” guide on identifying and grading CRS events.2
In conclusion, we present a retrospective assessment of adult patients with r/r DLBCL or tFL, previously identified and graded according to the Penn scale criteria as having CAR-T–related CRS events, that redefined and graded CRS events per the Lee scale and ASTCT consensus criteria. We describe the concordance between all 3 grading scales as applied to the same data set from the global, registrational JULIET study, indicating a tendency for the Lee and ASTCT scales to “downgrade” severe CRS events as defined by using the Penn criteria. Finally, we describe tocilizumab use for CRS when graded by using the Lee system in 2 registrational CAR-T therapy trials for lymphoma (JULIET and ZUMA-1) to show how CRS management with tocilizumab has evolved with different trials and grading scales. The trend of administering tocilizumab at earlier grades of CRS can now also be seen in real-world clinical practice.23-25 Despite the limitations of a retrospective regrade of CRS events, these trials are the first informative studies of CAR-T therapy–associated CRS and its management with tocilizumab. This analysis highlights the need for standardizing grading scales and CRS management algorithms for different CAR-T therapies across trials.
Novartis is committed to sharing with qualified external researchers access to patient-level data and supporting clinical documents from eligible studies. These requests are reviewed and approved by an independent review panel on the basis of scientific merit. All data provided are anonymized to respect the privacy of patients who have participated in the trial in line with applicable laws and regulations. The data availability of these trials is according to the criteria and process described on www.clinicalstudydatarequest.com.
Acknowledgments
The investigators thank the patients and their families, and the clinical trial teams who participated in the JULIET trial.
Medical writing support was provided by Ina Nikolaeva (Healthcare Consultancy Group) and was funded by Novartis Pharmaceuticals Corporation. Editorial assistance was provided by Marie Louise Edwards, Lei Yin, and Yichen Lu from Analysis Group, Inc., and was supported by Novartis Pharmaceuticals Corporation. The trial was sponsored by Novartis Pharmaceuticals Corporation.
Authorship
Contribution: S.J.S., R.T.M., D.G.M., F.L.L., and V.V.R. were responsible for conception and design; S.J.S. and R.T.M. provided trial materials or patients; V.V.R. collected and assembled data; S.J.S., R.T.M., D.G.M., J.E.S., J.L., F.L.L., and V.V.R. analyzed and interpreted the data; and all authors were involved in writing the manuscript, approved the final manuscript as submitted, and agree to be accountable for all aspects of the work.
Conflict-of-interest disclosure: S.J.S. reports consultancy, honoraria, membership on an entity’s Board of Directors or advisory committees, and research funding from Celgene; consultancy and honoraria from Dava Oncology; honoraria and research funding from Genentech; membership on an entity’s Board of Directors or advisory committees for Gilead and Pfizer; consultancy, honoraria, and research funding from Merck; honoraria, membership on an entity’s Board of Directors or advisory committees, and research funding from Novartis; consultancy, honoraria, and membership on an entity’s Board of Directors or advisory committees from Nordic Nanovector; and honoraria from OncLive and Physicians’ Education Source, LLC. R.T.M. reports honoraria, membership on an entity’s Board of Directors or advisory committees, and research funding from Novartis; consultancy and honoraria from CRSPR Therapeutics, Incyte, and Juno Therapeutics; honoraria from Kite Therapeutics; patents and royalties from Athersys, Inc.; and employment by Oregon Health & Science University. R.T.M. also provided consultant services to and received payment from Novartis; this potential conflict of interest has been reviewed and managed by Oregon Health & Science University. E.S.R. is employed by Novartis. J.L. and J.E.S. are employed by the Analysis Group, Inc., which received funding from Novartis. V.V.R. is employed by Novartis. F.L.L. is a scientific advisor for Kite Pharma and Novartis; and reports consultancy for the Cellular BioMedicine Group Inc. D.G.M. receives research funding from Kite Pharma, a Gilead Company, and Celgene; receives research funding from and has patents licensed or pending with Juno Therapeutics, a Celgene/Bristol-Myers Squibb company; has participated in advisory board and/or protocol-specific data monitoring committee meetings for BioLine RX, Kite Pharma, Gilead, Genentech, Novartis, Juno Therapeutics, and Celgene and received honoraria; and is a member of the A2 Biotherapeutics Scientific Advisory Board and has stock options.
Correspondence: Stephen J. Schuster, Perelman Center for Advanced Medicine, University of Pennsylvania, 3400 Civic Center Blvd, Philadelphia, PA 19104; e-mail: schustes@pennmedicine.upenn.edu.
References
Author notes
The full-text version of this article contains a data supplement.