Abstract
Chronic graft-versus-host disease (cGVHD) is the leading cause of late treatment-related deaths among recipients of allogeneic bone marrow and blood transplants. However, cGVHD is also associated with fewer relapses. We sought to determine whether severity of cGVHD predicts the magnitude of these effects. One impediment to such an analysis is the current limited/extensive grading system for cGVHD, which was designed to identify patients likely to benefit from systemic immune suppression and does not capture the severity of multiorgan involvement. We therefore first developed a grading system predictive for survival by using data from 1827 HLA-matched sibling allotransplant recipients reported to the International Bone Marrow Transplant Registry (IBMTR). We found Karnofsky performance score, diarrhea, weight loss, and cutaneous and oral involvement to be independent prognostic variables, from which we generated a grading scheme. We tested this scheme, the limited/extensive classification system, and a classification based on clinical impression of overall cGVHD severity (mild/moderate/severe) in parallel analyses of 1092 HLA-matched sibling transplant recipients from the IBMTR and 553 recipients of unrelated donor marrow from the National Marrow Donor Program. The presence of cGVHD was associated with fewer relapses (relative risk [RR], 0.5-0.6) but more treatment-related mortality (RR, 1.8-2.8) in the 3 analyses. No grading scheme correlated cGVHD severity with relapse rates, but all schemes predicted treatment-related mortality. Survival and disease-free survival of the most favorable cGVHD group in each scheme were similar to, or better than, those of patients without cGVHD; these patients may not need aggressive or extended immune suppression.
Introduction
Chronic graft-versus-host disease (cGVHD) is an important complication of allogeneic stem cell transplantation. From 30% to 70% of allograft recipients develop cGVHD,1,2 which is associated with decreased quality of life,3 impaired functional status,4,5 the need for extended immune suppression, and impaired survival.6-10 Several trends in allogeneic transplantation, including transplantation in older patients, use of peripheral blood cell grafts,11-17 and use of unrelated or HLA-mismatched donors, may increase the incidence of cGVHD.
Despite its adverse effects, cGVHD is associated with fewer leukemia relapses. This effect is thought to reflect a graft-versus-leukemia effect comparable to or greater than that ascribed to acute GVHD.18-23 Consequently, the influence of cGVHD on survival reflects the balance of its negative (increased treatment-related mortality) and positive (fewer relapses) effects. Describing this relationship requires an accurate measure of cGVHD severity.
The current grading scheme for cGVHD severity was proposed in 1980 on the basis of data from 20 subjects. Chronic GVHD was classified as limited (involving only localized skin and/or liver) or extensive (generalized skin involvement, or limited disease plus involvement of other organs). This system was developed primarily to distinguish patients requiring systemic immune suppression from those for whom local care might suffice. The heterogeneity of organ involvement, clinical severity, and prognosis, especially within the extensive category, was acknowledged but purposefully not incorporated into the grading scale.24 Both the International Bone Marrow Transplant Registry (IBMTR) and the National Marrow Donor Program (NMDP) use this grading system to report cGVHD severity.
Several published grading schemes predict survival of patients with cGVHD.6,8,25,26 In these schemes, poor prognostic variables include lichenoid skin changes, extensive skin involvement (> 50% body surface area), elevated bilirubin, progressive onset, thrombocytopenia, and prior steroid refractory/dependent acute GVHD. We could not calculate severity scores with these schemes in our study because several of the required variables (lichenoid skin changes, extent of skin involvement, prior steroid refractory/dependent acute GVHD) are unavailable in the IBMTR and NMDP databases.
We analyzed data from the IBMTR and the NMDP to develop, and attempt to validate, a new cGVHD severity scoring system based on patterns of organ involvement. We then used this system, together with the existing limited/extensive classification and the clinical grading of cGVHD (mild/moderate/severe), to study the correlation between cGVHD severity and treatment-related mortality, relapse, disease-free survival, and survival.
Patients, materials, and methods
Patients
Three independent data sets were used (IBMTR-1, IBMTR-2, and NMDP). The data sets represented nonoverlapping populations, and their data were obtained on different versions of the data collection forms. IBMTR-1 served as the training set for the new cGVHD severity score, whereas IBMTR-2 and NMDP were used to test its validity. All subjects met the following eligibility criteria: (1) acute myelogenous leukemia, acute lymphoblastic leukemia, or chronic myelogenous leukemia (CML); (2) non–T-cell–depleted transplant from an HLA-matched sibling or unrelated donor (matched at HLA-A, -B, -DR by serologic or molecular testing); (3) age of 16 years or older; (4) transplantation after 1990; (5) acute GVHD prophylaxis with cyclosporine and methotrexate; and (6) disease-free survival of 100 days or more.
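The six eligibility criteria can be read as a single conjunctive filter over transplant records. The sketch below is illustrative only: the `Candidate` record, its field names, and the `"CsA+MTX"` prophylaxis code are hypothetical and do not correspond to actual IBMTR or NMDP variables, and reading "after 1990" as "transplanted in 1991 or later" is an assumption.

```python
from dataclasses import dataclass
from datetime import date

ELIGIBLE_DIAGNOSES = {"AML", "ALL", "CML"}


@dataclass
class Candidate:
    """Hypothetical transplant record; field names are illustrative only."""
    diagnosis: str             # e.g. "AML", "ALL", "CML"
    t_cell_depleted: bool
    donor: str                 # "sibling" or "unrelated"
    hla_a_b_dr_matched: bool   # matched at HLA-A, -B, -DR
    age_years: float
    transplant_date: date
    gvhd_prophylaxis: str      # hypothetical code; "CsA+MTX" = cyclosporine + methotrexate
    disease_free_days: int     # disease-free survival after transplantation


def is_eligible(c: Candidate) -> bool:
    """Return True only if all six eligibility criteria hold."""
    return (
        c.diagnosis in ELIGIBLE_DIAGNOSES                  # (1) AML, ALL, or CML
        and not c.t_cell_depleted                          # (2) non-T-cell-depleted graft
        and c.donor in {"sibling", "unrelated"}            # (2) sibling or unrelated donor
        and c.hla_a_b_dr_matched                           # (2) HLA-A, -B, -DR matched
        and c.age_years >= 16                              # (3) 16 years or older
        and c.transplant_date > date(1990, 12, 31)         # (4) assumption: 1991 onward
        and c.gvhd_prophylaxis == "CsA+MTX"                # (5) cyclosporine + methotrexate
        and c.disease_free_days >= 100                     # (6) disease-free for >= 100 days
    )
```

Writing the criteria as one boolean expression keeps each clause traceable to its numbered criterion.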
Data collection instrument and methods
Subject-, disease-, and transplant-related variables and outcomes were collected on standardized forms of the IBMTR and NMDP. The IBMTR is a voluntary working group of more than 350 transplant teams worldwide that contribute detailed data on their allogeneic transplants to the Statistical Center at the Medical College of Wisconsin. Participants are required to register all consecutive transplants. The IBMTR database includes information on 40% to 45% of all allogeneic transplant recipients since 1970. All patients are followed longitudinally for survival and relapse. However, the IBMTR uses 2 follow-up forms, the abbreviated and the comprehensive versions; only the comprehensive form collects cGVHD data. To ensure that late onset cGVHD was not missed and to maintain comparability of follow-up between patients who did and did not develop cGVHD, all patients were censored at the time of last comprehensive follow-up form completion even if more recent survival and relapse data were available from abbreviated forms. Computerized error checks, physician review of submitted data, and on-site audits of participating centers ensure data quality.
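The censoring rule for IBMTR patients (stop follow-up at the last comprehensive form, even when later abbreviated forms exist) can be sketched as follows. This is a minimal illustration assuming a hypothetical per-patient list of forms, each tagged with the day it was completed, whether it was a comprehensive form, and whether it reported death; it is not the IBMTR's actual processing code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class FollowUpForm:
    """Hypothetical follow-up form record."""
    day: int             # days after transplantation when the form was completed
    comprehensive: bool  # True = comprehensive form (the only one collecting cGVHD data)
    dead: bool           # True if this form reports the patient's death


def analysis_endpoint(forms: List[FollowUpForm]) -> Optional[Tuple[int, bool]]:
    """Return (follow-up time, died), truncated at the last comprehensive form.

    Survival information that appears only on later abbreviated forms is
    deliberately ignored, so patients with and without cGVHD have comparable
    follow-up. Returns None if no comprehensive form was ever completed.
    """
    comprehensive = [f for f in forms if f.comprehensive]
    if not comprehensive:
        return None
    last = max(comprehensive, key=lambda f: f.day)
    return last.day, last.dead
```

In this sketch a death reported only on an abbreviated form is treated as censored at the last comprehensive form, mirroring the trade-off described above: some outcome information is lost in exchange for not missing late-onset cGVHD.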
The NMDP, established in 1986, maintains a registry of more than 4 million volunteer stem cell donors, facilitates unrelated donor stem cell transplants, and performs research in the area of unrelated donor stem cell transplantation. Through its network of participating centers, the NMDP has facilitated more than 12 000 transplantations. The 128 participating transplant centers prospectively report recipient baseline and follow-up data to the NMDP Coordinating Center in Minneapolis. Baseline and follow-up data on almost 85% of the NMDP-facilitated transplants have been reported to the NMDP database. Data quality is maintained through on-line computer validation and on-site data audits.
Relapse was defined by hematologic criteria for all diseases. Molecular and cytogenetic relapses for chronic myelogenous leukemia were not included because they are not 100% predictive for subsequent relapse and because frequency of monitoring likely varies by center. Chronic GVHD information was collected on 3 different versions of one data form and varied in the degree of detail requested. This variation necessitated combining or substituting some variables to apply the new severity score to the validation sets. These conversions are outlined in “” and were established before analysis. Clinical assessment of cGVHD severity (mild, moderate, or severe) was collected on IBMTR patients. However, no guidelines were provided to standardize this judgment.
The NMDP collects information at 100 days, 6 months, 1 year, and then annually after transplantation. Median time of completion of the first 3 follow-up forms is 100 days, 6 months, and 1 year, respectively. In contrast, the first IBMTR form is completed a median of 7.5 months after transplantation, and the second is completed approximately 19 months after transplantation. Any manifestations reported on the first 3 NMDP forms (comprising follow-up to 1 year) were considered “initial” manifestations in the calculation of the new severity score to be comparable with the IBMTR training set in which the score was developed.
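The rule that manifestations reported on the first 3 NMDP forms count as "initial" amounts to taking the union of what those forms report. A minimal sketch, assuming each form is represented as a hypothetical `(day_after_transplant, set_of_manifestations)` pair:

```python
from typing import Iterable, Set, Tuple


def initial_manifestations(forms: Iterable[Tuple[int, Set[str]]],
                           n_forms: int = 3) -> Set[str]:
    """Union of cGVHD manifestations reported on the first `n_forms` forms.

    With the NMDP schedule (100 days, 6 months, 1 year), the first 3 forms
    cover follow-up to about 1 year after transplantation.
    """
    ordered = sorted(forms, key=lambda form: form[0])  # chronological order
    initial: Set[str] = set()
    for _, manifestations in ordered[:n_forms]:
        initial |= manifestations
    return initial
```

For example, a patient reporting skin disease at 100 days, oral lesions at 6 months, eye involvement at 1 year, and lung involvement at 2 years would have skin, mouth, and eye counted as initial manifestations, but not lung.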
Missing data are due to absent questions on the study form or incomplete reporting by centers. For example, bilirubin data were not collected for the IBMTR-1 group, and weight loss was not collected for the IBMTR-2 cohort. The NMDP did not request Karnofsky performance score (KPS) before 1993 and did not collect a clinical impression of cGVHD severity (mild/moderate/severe).
Biostatistical analysis
Descriptive statistics are reported for the 3 cohorts. The IBMTR-1 (training) group was used to develop the severity score that was tested in the IBMTR-2 and NMDP cohorts (validation sets).
Classifications of limited or extensive cGVHD were compared with reported organ involvement to assess whether transplant centers applied this grading system correctly. We assumed that involvement of only skin and/or liver was consistent with limited disease. We further considered chronic diarrhea or eye, mouth, esophageal, joint, lung, genitourinary, or muscular involvement consistent with extensive disease, whether or not skin and/or liver were also involved. This algorithm misclassifies some patients with extensive cGVHD as having limited disease, because patients with generalized skin involvement or liver cirrhosis actually have extensive disease, and the registry data do not record the extent of skin involvement.
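The consistency check described above can be sketched as a function that flags organs contradicting a center's "limited" classification, plus a tally of which organs drive the misclassification. The organ labels are hypothetical strings chosen for illustration. As in the algorithm itself, generalized skin disease and cirrhosis cannot be detected from these fields, so the sketch cannot flag those cases.

```python
from collections import Counter
from typing import Iterable, Set, Tuple

# Organs/symptoms treated as indicating extensive disease, whether or not
# skin and/or liver are also involved (labels are hypothetical).
EXTENSIVE_MARKERS = {
    "diarrhea", "eye", "mouth", "esophagus", "joint", "lung",
    "genitourinary", "muscle",
}


def contradicting_organs(reported: str, organs: Set[str]) -> Set[str]:
    """Organs that contradict a center's 'limited' classification.

    Returns an empty set when the center reported 'extensive' or when only
    skin and/or liver are involved (consistent with limited disease).
    """
    if reported != "limited":
        return set()
    return organs & EXTENSIVE_MARKERS


def tally_misclassified(cases: Iterable[Tuple[str, Set[str]]]) -> Tuple[int, Counter]:
    """Count misclassified 'limited' cases and how often each organ was responsible."""
    n_misclassified = 0
    organ_counts: Counter = Counter()
    for reported, organs in cases:
        extra = contradicting_organs(reported, organs)
        if extra:
            n_misclassified += 1
            organ_counts.update(extra)
    return n_misclassified, organ_counts
```

The organ tally is the kind of summary behind statements such as "oral involvement accounted for most misclassified cases."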
Development and validation of the new severity score
Two proportional hazards models were created by using the training set. The first was a delayed-entry model that predicted overall survival from the time of transplantation, with subjects entering the risk set when they developed cGVHD. The second model predicted survival time from the date of cGVHD onset. Candidate variables included (1) type of onset (progressive, interrupted, de novo); (2) platelet count at onset (< 100 × 10⁹/L versus ≥ 100 × 10⁹/L); (3) KPS (< 80% versus ≥ 80%); (4) organ-specific involvement: skin, eye, mouth, esophagus, liver, joint, and lung; and (5) symptoms: diarrhea and weight loss. (Note: Bilirubin level was not collected on IBMTR-1 patients and, therefore, was not a potential predictor.) Time-varying covariates were used to represent organ involvement and symptoms. Of the baseline covariates (year of transplantation, disease type, disease stage [early, intermediate, advanced], donor-recipient sex-matching, age, cytomegalovirus serologic status, use of total body irradiation for conditioning, and whether or not acute GVHD occurred), only disease stage was found to predict survival in patients with cGVHD, and it was included in all models. Patients were censored at the time of second transplantation or donor lymphocyte infusion, or at last follow-up if alive.
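Delayed entry and time-varying covariates are usually handled by splitting each patient's follow-up into (start, stop] intervals. This is the layout that Cox regression software accepting interval input expects, for example the `Surv(start, stop, event)` form in R's survival package. The sketch below builds such rows for one patient. It assumes a hypothetical `changes` mapping from day to covariate updates (for example, diarrhea first reported on a later form) and is not the authors' code.

```python
from typing import Dict, List, Tuple

# (start, stop, event, covariates in force during (start, stop])
Row = Tuple[float, float, bool, Dict[str, int]]


def counting_process_rows(onset_day: float, exit_day: float, died: bool,
                          changes: Dict[float, Dict[str, int]]) -> List[Row]:
    """Split one patient's follow-up into (start, stop] rows.

    The patient enters the risk set at cGVHD onset (delayed entry), so no row
    covers time before onset_day. A new row starts whenever a covariate
    changes; only the final row carries the death/censoring indicator.
    """
    if exit_day <= onset_day:
        raise ValueError("exit must follow cGVHD onset")
    # Covariate values in force at entry: apply every update up to onset.
    current: Dict[str, int] = {}
    for day in sorted(d for d in changes if d <= onset_day):
        current.update(changes[day])
    rows: List[Row] = []
    start = onset_day
    # Updates strictly inside follow-up create new intervals; later ones are ignored.
    for day in sorted(d for d in changes if onset_day < d < exit_day):
        rows.append((start, day, False, dict(current)))
        current.update(changes[day])
        start = day
    rows.append((start, exit_day, died, dict(current)))
    return rows
```

For the second model, which measures time from cGVHD onset, the same rows can be reused after subtracting `onset_day` from each start and stop time.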
Proportional hazards models were fitted for different combinations of organ involvement, and combinations present in at least 5 subjects were retained for analysis. Combinations whose overall survival curves differed statistically from baseline (P < .05) were identified and grouped.
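The first step above, keeping only organ-involvement combinations seen in at least 5 subjects, is a frequency filter. A minimal sketch, assuming each patient's involvement is given as a set of hypothetical organ labels; the subsequent model fitting and grouping are not shown.

```python
from collections import Counter
from typing import Dict, FrozenSet, Iterable, Set


def frequent_combinations(patients: Iterable[Set[str]],
                          min_count: int = 5) -> Dict[FrozenSet[str], int]:
    """Return organ-involvement combinations present in >= min_count subjects.

    frozenset makes each combination hashable and order-independent, so
    {"skin", "mouth"} and {"mouth", "skin"} are counted together.
    """
    counts = Counter(frozenset(p) for p in patients)
    return {combo: n for combo, n in counts.items() if n >= min_count}
```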
The predictive ability of the new severity score was compared with that of the limited/extensive and mild/moderate/severe grading scales, as reported by the transplant centers, by using the Akaike information criterion (AIC).27,28 When models are fitted to identical subject populations, a lower AIC indicates a better fit after penalizing for the number of model parameters, and hence higher predictive ability.
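AIC is defined as AIC = 2k − 2 ln L, where k is the number of estimated parameters and L is the maximized (for Cox models, partial) likelihood. The sketch below computes it and ranks competing models. The model names and log-likelihood values in any usage would be hypothetical; the ranking is only meaningful when every model was fitted to the same subjects, as the text notes.

```python
from typing import Dict, List, Tuple


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion: AIC = 2k - 2 ln L (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(models: Dict[str, Tuple[float, int]]) -> List[str]:
    """Order models (name -> (log-likelihood, k)) from lowest to highest AIC.

    Valid only if all models were fitted to the identical set of subjects,
    because likelihoods from different populations are not comparable.
    """
    return sorted(models, key=lambda name: aic(*models[name]))
```

The penalty term explains why a slightly higher log-likelihood does not guarantee a lower AIC: a 3-category score uses more coefficients than a 2-category one, and must improve ln L by more than 1 per extra coefficient to rank higher.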
Comparison of cGVHD severity groups with patients who did not develop cGVHD
Severity groups were compared with each other and with subjects without cGVHD by using time-varying covariates for the cGVHD severity score (patients without cGVHD formed the baseline group). Analyses were adjusted for disease type and disease stage, age, donor-recipient sex-matching, and whether or not prior acute GVHD occurred since these baseline covariates were associated with survival in the entire population. Possible interactions between cGVHD severity and the baseline covariates were evaluated by using interaction terms.
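Comparing severity groups against patients without cGVHD, with interaction terms, comes down to indicator (dummy) coding with "no cGVHD" as the reference level. The sketch below builds one covariate row. The level labels and the three-level stage coding (early as reference) are assumptions for illustration, and in the actual analysis severity is time-varying: a patient contributes "none" rows before cGVHD onset.

```python
from typing import Dict, Sequence

SEVERITY_LEVELS = ("none", "low", "intermediate", "high")  # "none" = reference
STAGE_LEVELS = ("early", "intermediate", "advanced")       # "early" = reference


def indicators(value: str, levels: Sequence[str], prefix: str) -> Dict[str, int]:
    """0/1 indicators for every non-reference level (levels[0] is the reference)."""
    if value not in levels:
        raise ValueError(f"unknown level {value!r} for {prefix}")
    return {f"{prefix}_{lvl}": int(value == lvl) for lvl in levels[1:]}


def design_row(severity: str, stage: str) -> Dict[str, int]:
    """Main-effect indicators plus severity-by-stage interaction terms."""
    sev = indicators(severity, SEVERITY_LEVELS, "sev")
    stg = indicators(stage, STAGE_LEVELS, "stage")
    row = {**sev, **stg}
    for s_name, s_val in sev.items():
        for t_name, t_val in stg.items():
            row[f"{s_name}:{t_name}"] = s_val * t_val  # interaction = product
    return row
```

With this coding, the exponentiated coefficient of `sev_high` is the relative risk of high-risk cGVHD versus no cGVHD, and nonsignificant interaction coefficients are consistent with the severity effect not differing by stage.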
Although the cGVHD severity score was developed by using a survival end point, its predictive value for leukemia-free survival, treatment-related mortality, and relapse was also tested. Survival curves were presented as left-truncated Kaplan-Meier plots, and relapse and treatment-related mortality were presented as cumulative incidences.29
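Relapse and treatment-related mortality are competing risks, so their cumulative incidence is not simply one minus a Kaplan-Meier curve. The sketch below implements the standard nonparametric (Aalen-Johansen type) estimator with optional left truncation (delayed entry). It assumes the hypothetical coding 0 = censored and 1, 2, ... = cause of failure, and it is offered as an illustration of the technique rather than the exact software used in the study.

```python
from typing import List, Optional, Sequence, Tuple


def cumulative_incidence(times: Sequence[float], causes: Sequence[int], cause: int,
                         entries: Optional[Sequence[float]] = None
                         ) -> List[Tuple[float, float]]:
    """Cumulative incidence of `cause` in the presence of competing risks.

    times:   exit time (failure or censoring) for each subject
    causes:  0 = censored, otherwise an integer cause code
    entries: optional delayed-entry times (left truncation); default 0
    Returns (t, CIF(t)) at each distinct failure time of any cause.
    """
    n = len(times)
    if entries is None:
        entries = [0.0] * n
    if not (len(causes) == n == len(entries)):
        raise ValueError("times, causes and entries must have equal length")
    if any(e >= x for e, x in zip(entries, times)):
        raise ValueError("each entry time must precede its exit time")

    failure_times = sorted({t for t, c in zip(times, causes) if c != 0})
    surv = 1.0  # all-cause Kaplan-Meier survival just before the current time
    cif = 0.0
    out: List[Tuple[float, float]] = []
    for t in failure_times:
        # Risk set under left truncation: entered before t and still followed at t.
        at_risk = sum(1 for e, x in zip(entries, times) if e < t <= x)
        d_all = sum(1 for x, c in zip(times, causes) if x == t and c != 0)
        d_k = sum(1 for x, c in zip(times, causes) if x == t and c == cause)
        cif += surv * d_k / at_risk      # probability mass assigned to this cause
        surv *= 1.0 - d_all / at_risk    # update all-cause survival
        out.append((t, cif))
    return out
```

Because each increment is weighted by the all-cause survival just before t, the cumulative incidences of all causes add up to one minus the all-cause (left-truncated) Kaplan-Meier estimate.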
Results
Characteristics of the training and validation sets
Table 1 shows the characteristics of the cohorts studied. Baseline subject- and transplant-related variables were similar among the 3 data sets, although there were some differences in disease distribution, proportion of recipient/donor cytomegalovirus-negative subjects, use of total body irradiation, and year of transplantation. Acute GVHD grades were higher among unrelated transplant recipients than among sibling donor transplant recipients.
Because the IBMTR does not collect cGVHD data on its abbreviated follow-up forms, surviving recipients of sibling transplants were censored at the date of last comprehensive reporting to ensure that cases of late-onset cGVHD were not missed. Consequently, follow-up of IBMTR survivors was shorter (median, 1.4 and 1.2 years for the 2 groups) than that of NMDP survivors (median, 6.0 years; P < .0001). We were not able to classify 190 (50%) of NMDP patients by the new score because their forms were completed before a question on KPS was added. At last follow-up, 243 (13%) IBMTR-1 patients, 132 (12%) IBMTR-2 patients, and 64 (12%) NMDP patients had relapsed. Treatment-related deaths had occurred in 221 (12%) IBMTR-1 patients, 130 (12%) IBMTR-2 patients, and 256 (46%) NMDP patients.
Description of cGVHD
The cumulative incidence of cGVHD was greater (63% versus 42% at 1 year, P < .0001), and its manifestations were more severe, in recipients of unrelated than of sibling transplants. Characteristics of cGVHD are shown in Table 2 and Figure 1. The greatest differences were a higher proportion of extensive disease (85% in recipients of unrelated transplants compared with 46%-52% in recipients of sibling transplants, P < .0001), lower KPS in recipients of unrelated transplants (median, 70% versus 80%, P < .0001), and more frequent cutaneous involvement and diarrhea, although less frequent hepatic involvement, in recipients of unrelated transplants. However, median time to onset was similar in the 3 data sets (122-140 days, P = not significant). Skin, mouth, and liver involvement occurred in more than one half of subjects with cGVHD, whereas eye involvement, diarrhea, and weight loss each occurred in about one quarter (Figure 1). More than one half of subjects had 3 or more organs involved.
Incorrect designation of limited and extensive disease was common among recipients of sibling transplants: 65% to 67% of subjects scored as “limited” reported organ involvement other than skin and liver. In these misclassified cases, the most commonly reported other organs were the mouth (72%-83% of misclassified cases) and the eye (25%-41% of cases). Forty-three percent of the “limited” cGVHD in recipients of unrelated donor transplants was similarly misclassified, primarily because of oral involvement (83% of cases). However, all subsequent analyses used the original classification reported by the transplant centers (ie, patients were not reclassified), because database limitations prevented us from applying the limited/extensive criteria with certainty.
cGVHD severity score based on registry data
The proportional hazards assumption was satisfied in the training set. Both modeling approaches identified similar clinical manifestations as important and generated identical prognostic groups. KPS, mouth and skin involvement, diarrhea, and weight loss were important: high KPS and mouth involvement were favorable prognostic signs, whereas skin involvement, diarrhea, and weight loss were unfavorable (Figure 1). Adding time to onset of cGVHD, platelet count, type of onset, or eye, esophageal, liver, joint, or lung involvement at diagnosis did not improve prognostic ability.
Three prognostic groups with statistically different survival were devised. Low-risk patients had high KPS (≥ 80%) and neither diarrhea nor weight loss. High-risk patients had either high KPS with both diarrhea and weight loss, or low KPS (< 80%) with either no oral involvement or all 3 poor prognostic signs (diarrhea, weight loss, and cutaneous involvement). Intermediate-risk patients, who could have any KPS, had all other combinations of organ involvement. The training population contained 50% low-, 32% intermediate-, and 18% high-risk patients (Figure 2).
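The three risk groups form a small decision rule. The sketch below encodes it as we read the text: KPS is in percent, and the symptom and organ arguments are booleans for the presence of each finding at cGVHD diagnosis. The function and argument names are hypothetical, and the sketch is not a substitute for the published criteria.

```python
def risk_group(kps: float, diarrhea: bool, weight_loss: bool,
               skin: bool, oral: bool) -> str:
    """Assign 'low', 'intermediate', or 'high' cGVHD risk (as read from the text)."""
    if kps >= 80:  # high KPS
        if not diarrhea and not weight_loss:
            return "low"
        if diarrhea and weight_loss:
            return "high"
        return "intermediate"  # exactly one of diarrhea / weight loss
    # Low KPS (< 80%): high risk without oral involvement, or with all 3 poor signs.
    if not oral or (diarrhea and weight_loss and skin):
        return "high"
    return "intermediate"
```

Note how oral involvement acts as a favorable sign only among low-KPS patients, which matches its role as a favorable prognostic factor in the models.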
No statistically significant interaction (P < .05) was found between severity of cGVHD and any of the variables associated with survival (disease type and disease stage, age, donor-recipient sex-matching, and prior acute GVHD).
Predictive ability of the cGVHD severity score in the validation sets
The cGVHD severity score was applied to the validation sets after conversion of variables according to rules established before the analysis (). All analyses were adjusted for disease type and disease stage, recipient age, donor-recipient sex-matching, and prior acute GVHD. Survival as predicted by different scoring systems is shown in Figure 3 and summarized in Table 3. Relative risks (RRs) represent the effect of cGVHD severity on survival compared with the baseline group of patients without cGVHD. When the new severity score was applied to the IBMTR-2 validation set, low-risk patients had better survival than the intermediate- and high-risk groups (RR, 0.8 versus 2.0 versus 2.7; P < .0001); survival in the intermediate- and high-risk groups was similar (P = .29). In the NMDP validation set, patients with low-, intermediate-, and high-risk cGVHD had statistically different survival times (RR, 0.6 versus 1.1 versus 2.0; P = .0006).
Comparison of grading schemes
A comprehensive comparison of the low-/intermediate-/high-risk scheme, the limited/extensive score, and the IBMTR mild/moderate/severe clinical grading was hampered by centers' incorrect application of the limited/extensive criteria and by the absence of a clinical impression for NMDP subjects. Nevertheless, as judged by AIC, the new severity score was equivalent to or better than the limited/extensive classification, but not as good as the mild/moderate/severe classification, in predicting survival (Figure 3). All grading schemes performed best in the IBMTR-1 cohort. The limited/extensive analysis was based on data reported directly by the centers; no patients were reclassified. When patients were reclassified by applying our algorithm to the reported organ involvement, predictive ability worsened in all 3 cohorts.
Effects of cGVHD severity on treatment-related mortality, relapse, and disease-free survival
The effects of cGVHD severity on treatment-related mortality, relapse, and disease-free survival are shown in Tables 4, 5, and 6 and graphed for the IBMTR-1 cohort in Figure 4. In all 3 cohorts, treatment-related mortality increased with increasing severity of cGVHD under all 3 grading systems. For example, in the IBMTR-1 cohort, intermediate-risk (RR, 2.2; 95% confidence interval [CI], 1.5-3.2) and high-risk (RR, 4.2; 95% CI, 2.9-6.0) cGVHD were associated with more treatment-related mortality than low-risk or no cGVHD. A similar pattern was seen in the IBMTR-2 and NMDP groups.
Patients with cGVHD had a lower relapse rate than patients without cGVHD, but we found no evidence that relapse rate was associated with cGVHD severity. Relapses were uncommon after the development of cGVHD (8%-9%), and more severe cGVHD was not associated with a longer time to relapse.
Discussion
Prior reports have noted that patients with cGVHD have fewer leukemia relapses30-32 but more treatment-related mortality when cGVHD is considered as a binary end point.9,10 We confirm and extend these findings by considering cGVHD severity. Using 3 grading systems (limited/extensive; a clinical impression scale of mild, moderate, and severe; and a new severity score developed and validated specifically for this study), we found that cGVHD was associated with fewer relapses (RR, 0.5-0.6) but found no evidence that relapse risk depended on cGVHD severity, whereas more severe cGVHD was associated with more treatment-related mortality. The net effect of these 2 influences determines the effect of cGVHD on disease-free and overall survival: more severe cGVHD is associated with worse survival because it carries greater treatment-related mortality without the benefit of fewer relapses.
We based our severity score on data reported retrospectively to the IBMTR and NMDP. This score assigns subjects to low-, intermediate-, and high-risk categories based on KPS at diagnosis and on the presence of diarrhea, weight loss, cutaneous manifestations, and/or oral involvement. Patients in the low-risk group, who had high KPS (≥ 80%) and neither diarrhea nor weight loss, comprised 50% of the IBMTR-1 cGVHD population, 61% of the IBMTR-2 group, and 34% of the NMDP patients. The new score performed better than the limited/extensive grading system in predicting overall and disease-free survival but not as well as the clinical impression of mild/moderate/severe disease. However, within each of the 3 grading schemes we could identify a favorable subgroup of patients (low risk, limited involvement, mild severity) whose survival was as good as, or better than, that of patients without cGVHD because of low treatment-related mortality and a strong antileukemic effect.
Others have reported several factors associated with poor prognosis in patients with cGVHD, including lichenoid skin changes (RR, 2.2 and 2.5), skin involvement of more than 50% of body surface area (RR, 7.0), elevated total bilirubin (RR, 2.1), progressive onset (RR, 1.7 and 4.1), low platelet count (RR, 2.0 and 3.6), and cGVHD developing after corticosteroid refractory/dependent acute GVHD.6,8,25,26 However, many of these variables either are not available in our databases (lichenoid skin involvement, extent of skin involvement, bilirubin levels, prior corticosteroid treatment) or were not found to have prognostic significance in these analyses (progressive onset, low platelet count).
An ideal grading scheme has the following properties: it (1) predicts survival or disease-free survival in a clinically meaningful way even as transplantation procedures evolve; (2) is applicable in diverse populations, including related and unrelated transplants, HLA-matched and mismatched recipients, and both children and adults; (3) is reproducible over time and at different sites; and, finally, (4) is easy to apply. Each grading scheme we used in this analysis had limitations in one or more of these areas.
The mild/moderate/severe clinical impression appeared to be the most predictive but had 2 major limitations. First, centers classified their patients on the basis of clinical impression without specific guidance as to these definitions. In the absence of objective criteria, reproducibility, reliability, and validity of individual assessments are questionable. Second, these classifications may reflect a lengthy period of observation. The designation of mild/moderate/severe cGVHD may be highly predictive because it incorporates response to treatment, evolution of new organ involvement, and even knowledge about morbidity and death as a result of cGVHD. Only with real-time data defining the worst manifestations of cGVHD at a given time point can these ambiguities be eliminated.
The limited/extensive grading scheme appears relatively straightforward, but our results show that it can be difficult to apply in practice. Although the criteria are printed on the data forms, many clinical situations fit neither the limited nor the extensive category, and, judging by inconsistencies between the assigned category and the organs reported, many centers appear to misclassify patients. When a patient's clinical or histologic findings do not match the criteria under either heading, physicians may interpret the scale as an indicator of severity. If so, the limited/extensive scale becomes a subjective assessment similar to the mild/moderate/severe system.
Our severity score performed well in the training set, but it is difficult to use and performed less well in the validation sets. This finding demonstrates the need to validate proposed scales either prospectively or in independent data sets with a fuller representation of all relevant data elements. Although registry data can indicate potentially important components of a severity scale, we suggest that any further refinement in cGVHD assessment is best accomplished by a prospective effort to collect detailed information on characteristics at cGVHD onset, treatment response, and course of organ involvement in a large group of patients by using defined criteria.
Despite these caveats, our analysis shows that patients identified by centers as having extensive cGVHD, severe involvement, or high-risk disease are likely to have significantly worse survival than patients without cGVHD or with less severe cGVHD. These patients are appropriate candidates for clinical trials aimed at improving management of cGVHD. Patients with limited cGVHD, mild involvement, or low-risk as defined by our criteria (KPS ≥ 80%, no diarrhea, no weight loss) have survival equal to or better than patients without cGVHD and may not need aggressive treatment or an extended duration of immune suppression.
The contents of this article are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute.
Supported in part by National Institutes of Health Grant no. CA75267-03 and the Amy Strelzer-Manasevit Scholars Program. National Marrow Donor Program, the Health Resources and Services Administration no. 240-97-0036. Additional support provided by Public Health Service Grants P01-CA-40053 and U24-76518 from the National Cancer Institute, the National Institute of Allergy and Infectious Diseases, and the National Heart, Lung, and Blood Institute of the U.S. Department of Health and Human Services; and grants from Alpha Therapeutic Corporation; Amgen, Inc; Anonymous; Baxter Fenwal; Berlex Laboratories; BioWhitakker, Inc; Blue Cross and Blue Shield Association; Lynde and Harry Bradley Foundation; Bristol-Myers Squibb Company; Cell Therapeutics, Inc; Centeon; Center for Advanced Studies in Leukemia; Chimeric Therapies; Chiron Therapeutics; Charles E. Culpeper Foundation; Eleanor Naylor Dana Charitable Trust; Eppley Foundation for Research; Fromstein Foundation; Genentech, Inc; Human Genome Sciences; Immunex Corporation; Kettering Family Foundation; Kirin Brewery Company; Robert J. Kleberg Jr and Helen C. Kleberg Foundation; Herbert H. Kohl Charities, Inc; Nada and Herbert P. Mahler Charities; Milstein Family Foundation; Milwaukee Foundation/Elsa Schoeneich Research Fund; NeXstar Pharmaceuticals, Inc; Samuel Roberts Noble Foundation; Novartis Pharmaceuticals; Orphan Medical; Ortho Biotech, Inc; John Oster Family Foundation; Jane and Lloyd Pettit Foundation; Alirio Pfiffer Bone Marrow Transplant Support Association; Pfizer, Inc; RGK Foundation; Rockwell Automation Allen Bradley Company; Roche Laboratories; SangStat Medical Corporation; Schering AG; Schering-Plough Oncology; Searle; SEQUUS Pharmaceuticals; SmithKline Beecham Pharmaceutical; Stackner Family Foundation; Starr Foundation; Joan and Jack Stein Foundation; SyStemix; United Resource Networks; and Wyeth-Ayerst Laboratories.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 U.S.C. section 1734.
References
Author notes
Stephanie Lee, Dana-Farber Cancer Institute, 44 Binney Street, Boston, MA 02115; e-mail: stephanie_lee@dfci.harvard.edu.