Key Points
A risk model using donor and recipient cytokine gene polymorphisms and clinical variables significantly improves GVHD risk stratification.
The model is useful in identifying patients at low risk of developing severe GVHD, but results must be confirmed in prospective studies.
Abstract
Despite considerable advances in our understanding of the pathophysiology of graft-versus-host disease (GVHD), its prediction remains unresolved and depends mainly on clinical data. The aim of this study was to build a predictive model based on clinical variables and cytokine gene polymorphisms for predicting acute GVHD (aGVHD) and chronic GVHD (cGVHD) from the analysis of a large cohort of HLA-identical sibling donor allogeneic stem cell transplant (allo-SCT) patients. A total of 25 SNPs in 12 cytokine genes were evaluated in 509 patients. Data were analyzed using a logistic regression model and the least absolute shrinkage and selection operator (LASSO). The statistical model was constructed by randomly selecting 85% of cases (training set), and its predictive ability was confirmed on the remaining 15% of cases (test set). Models including clinical and genetic variables (CG-M) predicted severe aGVHD significantly better than models including only clinical variables (C-M) or only genetic variables (G-M). For grades 3-4 aGVHD, the correct classification rates (CCR1) were 100% for CG-M, 88% for G-M, and 50% for C-M. For extensive cGVHD, CG-M and G-M performed better than C-M (CCR1: 80% for both vs. 66.7%). A risk score was calculated based on LASSO multivariate analyses. It correctly stratified patients who developed grades 3-4 aGVHD (P < .001) and extensive cGVHD (P < .001). The novel predictive models proposed here improve the prediction of severe GVHD after allo-SCT. This approach could facilitate personalized, risk-adapted clinical management of patients undergoing allo-SCT.
Introduction
Allogeneic hematopoietic stem-cell transplantation (allo-SCT) is a curative therapeutic approach for patients with hematologic malignancies. Patients undergoing allo-SCT receive a donor graft containing hematopoietic stem cells, as well as various other cell types, including alloreactive T cells. T cells promote hematopoietic engraftment and T-cell immune reconstitution and mediate the graft-versus-leukemia effect, which may prevent tumor relapse. However, donor T cells may also cause graft-versus-host disease (GVHD), which is the main complication after allo-SCT and the most important cause of nonrelapse morbidity and nonrelapse mortality (NRM).1
There are 2 forms of GVHD: acute GVHD (aGVHD) and chronic GVHD (cGVHD). aGVHD is a complex process that takes place in 3 phases.2 In the first phase, the conditioning regimen damages host tissues and raises levels of proinflammatory cytokines such as interleukin-1 (IL-1), IL-6, tumor necrosis factor α (TNFα), and interferon-γ (IFN-γ), thus activating host antigen-presenting cells, which stimulate donor T cells. In the second phase, this interaction induces proliferation and differentiation of donor T cells, which in turn triggers rapid intracellular biochemical cascades that induce transcription of genes for many proteins (including the cytokines TNFα, IFN-γ, and IL-2) and promote cellular activity. The third, effector phase is a complex cascade of both cellular mediators and soluble inflammatory mediators such as TNFα, IFN-γ, IL-1, and nitric oxide, resulting in tissue injury. Although the pathophysiology of cGVHD is less well understood, significant advances have been made in recent years, and it is now evident that its clinical manifestations result from a complex immune disease involving both donor B cells and T cells.3 The long-standing hypothesis is that cGVHD resembles an autoimmune disorder.4 It is well established that the most important risk factor for the development of GVHD is the degree of HLA matching between the recipient and the donor,5,6 although a significant proportion of patients undergoing transplantation with HLA-identical grafts develop aGVHD1 and/or cGVHD.7 Consequently, other non-HLA factors contribute to the development of this complication. Major clinical factors associated with GVHD include patient age, donor/recipient sex,8 stem-cell source,9 GVHD prophylaxis, underlying disease, conditioning regimen,10 and, for cGVHD, a history of aGVHD.
Genetic differences in non-HLA genes between recipients and donors are also important,2 and the role of polymorphisms in human minor histocompatibility antigens,11,12 innate immunity genes,13-15 genes involved in drug metabolism, and proinflammatory cytokines must be taken into account.16,17 During the past decade, single-nucleotide polymorphisms (SNPs) have been identified in genes involved in innate and adaptive immune responses, such as those encoding cytokines and their receptors, which play a role in the classic cytokine storm of GVHD.18-21 However, information regarding the diagnostic, prognostic, and predictive significance of these molecules in GVHD is limited. Although clinically useful biomarkers are available, no single biomarker is generally satisfactory in terms of sensitivity or specificity for the diagnosis or prediction of a disease. Therefore, it is important to build biomarker panels and risk models for GVHD. In recent years, many groups have been working in this field. Kim et al22,23 built a risk model incorporating SNPs and clinical markers to stratify patients and more accurately predict the risk of GVHD in specific organs, and Paczesny et al24 developed protein panels that help confirm the diagnosis of GVHD at the onset of clinical symptoms and provide useful prognostic data.25
After genotyping a panel of polymorphisms in cytokine genes that had previously been associated with aGVHD or cGVHD,7,18,26 we applied a penalized estimation method, the least absolute shrinkage and selection operator (LASSO) procedure,27 which is able to select optimal predictors from a large set of potential clinical and genetic predictor variables, improving their clinical utility.
Methods
Study design
This retrospective study included 509 patients with hematological malignancies from the Spanish Group for Hematopoietic Transplantation (GETH) who underwent conventional HLA-identical sibling-donor allo-SCT between 1997 and 2010 at 11 Spanish institutions (the mean number of patients from each center was 46.3 [range, 25-134]; supplemental Table 1). The median follow-up for living patients was 14.7 months (range, 2-105.4 months).
Only patients for whom all clinical and genetic data were available (a prerequisite of the LASSO procedure) were finally included in the analysis (n = 359; Table 1). Patients who died before day +100 without aGVHD (n = 96) or day +200 without cGVHD (n = 154) were excluded from the LASSO multivariate analyses, which therefore included 263 patients for aGVHD and 207 patients for cGVHD modeling.
The study was approved by the ethics committee of Hospital General Universitario Gregorio Marañón, and all recipients and donors provided written informed consent according to the Declaration of Helsinki.
Polymorphism genotyping
DNA was obtained from EDTA-anticoagulated peripheral blood samples collected at the pretransplantation evaluation and stored in the GETH DNA bank.
A total of 25 SNPs in 12 cytokine genes (supplemental Table 2) were selected for the potential role, reported in previous studies, in the pathogenesis of GVHD or of autoimmune diseases.7,15,20 SNPs were genotyped using the MALDI-TOF MassARRAY iPLEX Gold platform (Sequenom, San Diego, CA) at CeGen (National Genotyping Centre, Santiago de Compostela, Spain).
Clinical and genetic variables
Three predictive models were constructed for each of the outcomes considered (grade 2-4 aGVHD, grade 3-4 aGVHD, cGVHD, extensive cGVHD, and NRM): 1 with clinical variables alone (C-M), 1 with genetic variables alone (G-M), and 1 with both clinical and genetic variables (CG-M).
Clinical variables included donor and recipient sex, recipient age, female donor/male recipient, stem-cell source, conditioning regimen, total body irradiation (TBI)–containing regimen, and disease. Previous grade 2 to 4 aGVHD was included in the analysis of cGVHD. Genetic variables (supplemental Table 1) were assessed in both donors and recipients, and 4 different models of transmission were considered (recessive, dominant, codominant, and additive). Therefore, 8 genetic variables (4 models × 2 individuals) were built for each SNP.
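A minimal sketch of how one SNP expands into the 8 variables, assuming genotypes are recorded as the number of minor alleles (0, 1, or 2) and that the minor allele is the effect allele. The variable names and level labels are hypothetical; in a regression, the 3-level codominant category would typically be entered as a factor, that is, as 2 indicator variables.

```python
# Hypothetical sketch: the SNP identifier and the level labels are illustrative.
# Genotypes are encoded as the number of minor alleles carried (0, 1 or 2).

def encode_snp(snp_id: str, donor_minor: int, recipient_minor: int) -> dict:
    """Return the 8 genetic variables for one SNP (4 models x donor/recipient)."""
    features = {}
    for role, count in (("donor", donor_minor), ("recipient", recipient_minor)):
        if count not in (0, 1, 2):
            raise ValueError(f"{role} genotype must carry 0, 1 or 2 minor alleles")
        prefix = f"{role}_{snp_id}"
        # Codominant: each genotype is its own category (3 levels).
        features[f"{prefix}_codominant"] = ("hom_major", "het", "hom_minor")[count]
        # Dominant: at least one minor allele vs none.
        features[f"{prefix}_dominant"] = int(count >= 1)
        # Recessive: two minor alleles vs fewer.
        features[f"{prefix}_recessive"] = int(count == 2)
        # Additive: the effect grows linearly with the number of minor alleles.
        features[f"{prefix}_additive"] = count
    return features


if __name__ == "__main__":
    for name, value in encode_snp("rs1800896", donor_minor=1, recipient_minor=2).items():
        print(name, value)
```

Keeping donor and recipient codings as separate variables lets the model estimate distinct effects for the same SNP in each individual.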
Limitations
The present study uses extensive cGVHD as an end point because it includes patients who underwent transplantation from 1997 onward, although this end point is no longer used in clinical practice.
Statistical analysis
The descriptive analysis of the SNPs was performed using the SNPassoc R package (version 1.5-8). Univariate analysis was performed using logistic regression, with the SNPassoc R package for SNPs and with IBM SPSS Statistics for Windows (version 21.0; IBM Corp, Armonk, NY) for clinical variables. P < .05 was considered significant.
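When a SNP is coded under the dominant or recessive model, it is a single binary predictor, and univariate logistic regression then has a closed form: the maximum-likelihood odds ratio equals the cross-product ratio of the 2 × 2 table, and the Wald standard error of its logarithm is sqrt(1/a + 1/b + 1/c + 1/d). The sketch below relies on that identity. It is not the code of SNPassoc or SPSS (whose reported P values may come from likelihood-ratio rather than Wald tests), and the cell labels are hypothetical.

```python
import math


def univariate_binary_logistic(a, b, c, d):
    """Univariate logistic regression on one binary predictor, via its closed form.

    a: carriers with the outcome, b: carriers without it,
    c: non-carriers with the outcome, d: non-carriers without it.
    Returns (odds ratio, 95% Wald CI, two-sided Wald P value).
    """
    if min(a, b, c, d) == 0:
        raise ValueError("the closed form needs all four cells > 0")
    log_or = math.log((a * d) / (b * c))           # maximum-likelihood slope
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # Wald SE of the slope
    z = log_or / se
    p_value = math.erfc(abs(z) / math.sqrt(2))     # two-sided normal tail
    ci = (math.exp(log_or - 1.96 * se), math.exp(log_or + 1.96 * se))
    return math.exp(log_or), ci, p_value


if __name__ == "__main__":
    print(univariate_binary_logistic(20, 10, 10, 20))
```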
Multivariate regression analysis was performed using the LASSO procedure, which is being increasingly applied to overcome the challenges posed by high-dimensional data.
LASSO is a shrinkage and selection method for linear regression models introduced in 1996 by Tibshirani,27 which is able to select a set of optimal predictors from a large set of potential predictor variables. The method constrains the sum of the absolute values of the regression coefficients by means of a smoothing parameter (λ), shrinking the estimated coefficients toward 0 and setting some of them exactly to 0. For this reason, it is considered a powerful method for variable selection that yields more interpretable models. The idea of LASSO is quite general and can be applied to other statistical models, such as generalized linear models.
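For reference, the textbook form of the LASSO is written below. The first line is Tibshirani's constrained least-squares problem for a linear model (t ≥ 0 bounds the sum of absolute coefficients; a smaller t corresponds to a larger λ). The remaining lines give the equivalent penalized form for the logistic (logit) model described in the next paragraph, with an unpenalized intercept β0, so that λ = 0 recovers the ordinary maximum-likelihood fit, and the odds ratio OR_j = exp(β_j) reported in the Results. The 1/n scaling of the log-likelihood follows a common software convention and is an assumption; the article does not state the exact objective that was minimized.

```latex
\begin{align*}
\hat{\beta} &= \arg\min_{\beta_0,\beta} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j\Big)^2
  \quad \text{subject to} \quad \sum_{j=1}^{p} \lvert\beta_j\rvert \le t, \\
\eta_i &= \beta_0 + \sum_{j=1}^{p} x_{ij}\beta_j, \qquad
  \Pr(Y_i = 1 \mid x_i) = \frac{1}{1 + e^{-\eta_i}}, \\
\hat{\beta}(\lambda) &= \arg\min_{\beta_0,\beta} \Big\{ -\frac{1}{n} \sum_{i=1}^{n} \big[\, y_i \eta_i - \log\big(1 + e^{\eta_i}\big) \big]
  + \lambda \sum_{j=1}^{p} \lvert\beta_j\rvert \Big\}, \qquad \mathrm{OR}_j = e^{\hat{\beta}_j}.
\end{align*}
```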
In this study, the response variable Y is binary and denotes whether the patient developed GVHD/NRM (Y = 1) or not (Y = 0). Accordingly, LASSO was used as a variable selection method within the estimation of a logistic (logit) regression model, which is a particular type of generalized linear model. In this model, the strength of the penalty term is controlled by the smoothing parameter λ: the larger λ is, the more parsimonious the model (if λ = 0, all predictors are retained in the final model). It is therefore important to find the optimal λ, which provides the best predictive model to anticipate GVHD and NRM. To this end, λ was chosen by adhering to the principle of parsimony and maximizing the area under the receiver operating characteristic curve (AUC) and the correct classification rates (CCRs: global CCR; CCR0, for patients who do not develop GVHD; and CCR1, for patients who develop GVHD) of the models fitted over a grid of 100 values of λ (supplemental Figure 1). The AUC ranges from 0 to 1, although the practical lower bound is 0.5, which corresponds to random classification; a perfect classifier has an AUC of 1. All clinical and genetic variables were included in the LASSO multivariate analysis regardless of their univariate P value.
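The role of λ can be illustrated with a small, self-contained sketch. It minimizes the penalized logistic objective by proximal gradient descent (ISTA), a standard solver chosen here for brevity; the article does not state which software or algorithm was used, and the toy data are made up. The hypothetical `lambda_grid` helper builds 100 log-spaced values from λ_max, the smallest λ at which every coefficient is 0, down to 1% of it, which mirrors common practice (for example, in glmnet) rather than a documented choice of the authors.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(value, threshold):
    # Proximal step for the L1 penalty: shrinks toward 0 and sets small values exactly to 0.
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_logistic(X, y, lam, step=0.5, n_iter=2000):
    """Minimise mean logistic loss + lam * sum|beta_j| by proximal gradient (ISTA).

    X is a list of patient rows of 0/1-coded predictors; the intercept is not penalised.
    For 0/1 predictors, step <= 4 / (p + 1) guarantees convergence.
    """
    n, p = len(X), len(X[0])
    b0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residual = predicted probability - observed outcome, per patient.
        resid = [sigmoid(b0 + sum(b * x for b, x in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad0 = sum(resid) / n
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        b0 -= step * grad0
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return b0, beta


def lambda_grid(X, y, n_values=100, ratio=0.01):
    """Log-spaced grid from lambda_max (all coefficients 0) down to ratio * lambda_max."""
    n, ybar = len(y), sum(y) / len(y)
    lam_max = max(abs(sum((ybar - yi) * row[j] for row, yi in zip(X, y))) / n
                  for j in range(len(X[0])))
    return [lam_max * ratio ** (k / (n_values - 1)) for k in range(n_values)]


# Made-up toy data: column 0 is informative, columns 1 and 2 are noise.
TOY_X = [[1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0], [1, 1, 1], [1, 0, 1],
         [0, 1, 0], [0, 0, 0], [0, 1, 1], [0, 0, 1], [0, 1, 0], [0, 0, 0]]
TOY_Y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

if __name__ == "__main__":
    grid = lambda_grid(TOY_X, TOY_Y)
    for lam in (1.0, grid[-1]):
        _, beta = lasso_logistic(TOY_X, TOY_Y, lam)
        print(f"lambda={lam:.4f}", [round(b, 3) for b in beta])
```

On the toy data, a large λ keeps every coefficient at exactly 0, whereas the smallest λ on the grid lets the informative first predictor enter with a positive coefficient; in the study, the choice along such a grid was guided by AUC, CCRs, and parsimony.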
The statistical model was fitted (goodness-of-fit assessment) on a randomly selected 85% of the data (training set: 85% of cases and 85% of controls, so that both sets were representative of the initial sample), and its predictive ability was computed on the remaining 15% (test set). To evaluate the performance and predictive ability of each model, training and test samples were randomly selected 100 times. The distributions of the CCRs and the AUC over the 100 iterations are shown as box plots together with a statistical summary of the results. Results of the LASSO multivariate regression analysis are reported as odds ratios, computed as the exponential of the LASSO β coefficients (odds ratio = exp[β coefficient]).
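The validation scheme described above can be sketched in plain Python, assuming outcomes coded 0/1 and predictions already dichotomized at the risk cutoff. The function names are hypothetical, `round` is used to size the test set within each stratum, and the AUC is computed through its rank (Mann-Whitney) interpretation, which gives the same value as the area under the empirical ROC curve. The negative predictive value (NPV) is included because the Results report it.

```python
import math
import random


def stratified_split(y, test_frac=0.15, rng=None):
    """Split indices so cases (y=1) and controls (y=0) keep their proportions in both sets."""
    rng = rng or random.Random()
    train, test = [], []
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        n_test = round(test_frac * len(idx))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def repeated_splits(y, n_repeats=100, test_frac=0.15, seed=0):
    """Yield n_repeats independent stratified 85/15 train/test splits."""
    rng = random.Random(seed)
    for _ in range(n_repeats):
        yield stratified_split(y, test_frac, rng)


def classification_metrics(y_true, y_pred):
    """Global CCR, CCR0 (no GVHD), CCR1 (GVHD) and negative predictive value (NPV)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n1 = sum(1 for t in y_true if t == 1)
    n0 = len(y_true) - n1
    nan = float("nan")
    return {
        "CCR": (tp + tn) / len(pairs),
        "CCR0": tn / n0 if n0 else nan,  # specificity
        "CCR1": tp / n1 if n1 else nan,  # sensitivity
        "NPV": tn / (tn + fn) if tn + fn else nan,
    }


def auc(y_true, scores):
    """AUC = probability that a random case scores above a random control (ties count 1/2)."""
    cases = [s for s, t in zip(scores, y_true) if t == 1]
    controls = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum(1.0 if c > d else 0.5 if c == d else 0.0 for c in cases for d in controls)
    return wins / (len(cases) * len(controls))


def odds_ratios(betas):
    """Odds ratio for each LASSO coefficient: OR = exp(beta)."""
    return [math.exp(b) for b in betas]
```

In use, each of the 100 splits would fit the model on the training indices, score the test indices, and store the resulting CCRs and AUC for the box plots.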
Finally, for prediction purposes, the cutoff point between low and high risk was based on the proportions of patients who developed (Y = 1) and did not develop (Y = 0) GVHD or NRM.30 The proportions of patients with Y = 1 were 0.28 for grade 2 to 4 aGVHD, 0.11 for grade 3 to 4 aGVHD, 0.53 for cGVHD, 0.30 for extensive cGVHD, and 0.24 for NRM.
Predictive models
On the basis of the LASSO multivariate analyses, a risk score was calculated for grade 2 to 4 and grade 3 to 4 aGVHD, for cGVHD and extensive cGVHD, and for NRM. To build each predictive model, every selected variable was weighted by its LASSO β coefficient, and these weighted terms were combined with the constant (intercept) estimated by the LASSO procedure in a risk score equation.31 These risk scores were then used to classify each patient as low risk (risk score below the cutoff point) or high risk (risk score above the cutoff point).
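A sketch of the risk-score step, assuming the score is the logistic transform of the LASSO linear predictor so that it lies on the same probability scale as the proportion-based cutoffs listed in Statistical analysis. The intercept, coefficients, and variable names below are invented for illustration and are not the published equations; only the 0.11 cutoff for grade 3 to 4 aGVHD comes from the text. Scores exactly equal to the cutoff are assigned to the high-risk group, which is also an assumption.

```python
import math


def risk_probability(intercept, coefficients, patient):
    """Risk score as the logistic transform of intercept + sum(beta_j * x_j).

    coefficients: variable name -> LASSO beta; variables missing from `patient` count as 0.
    """
    linear_predictor = intercept + sum(beta * patient.get(name, 0)
                                       for name, beta in coefficients.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def risk_group(score, cutoff):
    """Low risk below the cutoff, high risk otherwise (ties -> high risk, an assumption)."""
    return "low risk" if score < cutoff else "high risk"


# Hypothetical coefficients for illustration only.
EXAMPLE_INTERCEPT = -2.5
EXAMPLE_COEFFICIENTS = {"tbi_regimen": 0.4, "recipient_snp_dominant": 0.7}
CUTOFF_GRADE_3_4_AGVHD = 0.11  # proportion of grade 3 to 4 aGVHD given in the text

if __name__ == "__main__":
    for patient in ({"tbi_regimen": 1, "recipient_snp_dominant": 1}, {}):
        score = risk_probability(EXAMPLE_INTERCEPT, EXAMPLE_COEFFICIENTS, patient)
        print(patient, round(score, 3), risk_group(score, CUTOFF_GRADE_3_4_AGVHD))
```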
Results
Descriptive analysis
Results of the descriptive analysis including genotype frequencies, Hardy-Weinberg equilibrium, and minor allele frequency of the 25 SNPs in donors and recipients are summarized in supplemental Table 3. Genotype frequencies were similar to those of the 1000 Genomes Project for the Spanish population and were in accordance with Hardy-Weinberg equilibrium, except for the frequencies of the IL-10 SNPs (rs1800871, rs1800872, and rs1800896), which were in linkage disequilibrium.
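The Hardy-Weinberg check can be illustrated with the classical 1-degree-of-freedom chi-square goodness-of-fit test, using only the standard library (for 1 degree of freedom, the chi-square tail probability equals erfc(sqrt(x/2))). This is a simplified stand-in: SNPassoc may apply a different test (for example, an exact test), and the function name and genotype counts used with it are hypothetical.

```python
import math


def hwe_chi_square(n_hom_major, n_het, n_hom_minor):
    """Chi-square (1 df) test of genotype counts against Hardy-Weinberg proportions.

    Returns (minor allele frequency, chi-square statistic, P value).
    """
    n = n_hom_major + n_het + n_hom_minor
    q = (n_het + 2 * n_hom_minor) / (2 * n)  # minor allele frequency
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom_major, n_het, n_hom_minor)
    # Skip empty expected classes (monomorphic SNP) to avoid division by zero.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return q, chi2, p_value


if __name__ == "__main__":
    print(hwe_chi_square(49, 42, 9))   # counts in exact HWE proportions
    print(hwe_chi_square(60, 20, 20))  # heterozygote deficit
```

One degree of freedom remains because the 3 genotype classes lose 1 degree to the fixed total and 1 to the estimated allele frequency.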
Univariate and multivariate analyses
The association between clinical and genetic variables in donors and in recipients with the development of aGVHD, cGVHD, and NRM was investigated using univariate analysis (summarized in Table 2; detailed in supplemental Table 4) and LASSO multivariate analysis (Tables 3-6).
Univariate analysis allowed us to identify clinical variables and cytokine gene polymorphisms significantly associated (P < .05) with the development of aGVHD, cGVHD, and NRM. Clinical variables seemed to have a limited influence on the development of aGVHD. In contrast, IL-1B and IL-17A were the most important cytokines for the development of grade 2 to 4 aGVHD, and IL-6 was the most important for grade 3 to 4 aGVHD. Clinical variables (age, conditioning regimen, stem-cell source, and previous development of aGVHD) seemed to have a greater influence on the development of cGVHD. The most important cytokines were IL-1A, IL-1B, IL-23R, and IFN-γ for cGVHD and IL-2, IL-17A, IL-23R, and TGFβ for extensive cGVHD. Only sex mismatch and IL-17A were associated with the occurrence of NRM (Table 2; supplemental Table 4).
The LASSO approach (supplemental Figure 1) allowed us to obtain the best models for predicting GVHD (aGVHD and cGVHD) and NRM. For anticipating grade 2 to 4 aGVHD, cGVHD, and NRM, none of the models was good enough for stratifying patients (Figure 1). For grade 3 to 4 aGVHD, the best C-M included 3 variables: conditioning, TBI, and disease (Table 3; Figure 1), which rendered a CCR1 of 50% and AUC of 0.6. The negative predictive value (NPV) of this model was 91.8%.
The best G-M included 10 cytokines (IL-1B, IL-2, IL-6, IL-7R, IL-10, IL-17A, IL-23R, IFN-γ, TGFβ, and TNFα). This model obtained a CCR1 of 88%, with an AUC of 0.8 and NPV of 96% (Table 4; Figure 1). In contrast, the model that included both clinical and genetic variables retained the same clinical variables as C-M and added 11 cytokines (IL-1A, IL-1B, IL-2, IL-6, IL-7R, IL-10, IL-17A, IL-23R, IFN-γ, TGFβ, and TNFα). This model obtained a CCR1 of 100%, AUC of 0.9, and NPV of 98.6% (Tables 5 and 6; Figure 1). Interestingly, 9 SNPs were selected by LASSO in both models (grade 2 to 4 and grade 3 to 4), in genes that could be relevant to aGVHD pathophysiology. In addition, 13 SNPs were selected only in the grade 3 to 4 model, in genes that may be related to the severity of the complication.
The best clinical model for predicting extensive cGVHD included age, sex, stem-cell source, and previous aGVHD (Table 3; Figure 1), with a CCR1 of 66.7%, AUC of 0.7, and NPV of 82.9%. The best genetic model included 10 cytokines (IL-1B, IL-2, IL-6, IL-7R, IL-10, IL-17A, IL-23R, IFN-γ, TGFβ, and TNFα). This model obtained a CCR1 of 80%, AUC of 0.8, and NPV of 81% (Table 4; Figure 1).
When both genetic and clinical variables were included, the same clinical variables persisted, and 8 cytokines were added (IL-1B, IL-2, IL-7R, IL-10, IL-17A, IL-23R, IFN-γ, and TGFβ), improving the results of C-M, with a CCR1 of 80%, AUC of 0.8, and NPV of 85.1% (Tables 5 and 6; Figure 1).
A detailed explanation of the SNPs selected in CG-M in light of previously reported results is included in Table 6 and in supplemental material.
On the basis of the β results from LASSO, risk scores were calculated for aGVHD and cGVHD as well as for NRM. Patients were categorized into 2 groups: low risk (below the cutoff value) and high risk (above the cutoff). Final risk scores with C-M, G-M, and CG-M are summarized in supplemental Tables 5-7, respectively.
Overall, prediction of grade 3 to 4 aGVHD was significantly better using CG-M (P < .001) than using C-M or G-M (Figures 1 and 2). For extensive cGVHD, CG-M and G-M yielded similar results, and both performed better than C-M (Figures 1 and 3). For NRM, both C-M and CG-M performed better than G-M (supplemental Figure 5).
We also calculated GVHD risk scores for the patients who most recently underwent transplantation (2005-2012) to test the usefulness of the models in a subset of patients treated according to current practice. The results obtained in this subgroup were similar to those reported for the whole cohort (supplemental Figures 3 and 4).
Finally, we calculated the incidence of aGVHD including censored patients as well (total, n = 359), which yielded results similar to those of the previous analysis (supplemental Figure 4). The incidence of cGVHD could not be calculated because information on the time of cGVHD onset was lacking.
Discussion
Despite advances in the knowledge of the pathophysiology of GVHD during the last 2 decades, 30% to 50% of patients undergoing allo-SCT develop this complication,32 which leads to high morbidity, reduces quality of life, and is associated with a significantly higher risk of treatment-related mortality and poorer overall survival.3 Therefore, it is essential to identify biomarkers that can help to estimate the risk of GVHD. Biomarkers may also identify patients who will not respond to traditional treatments,33 making it possible to implement more stringent monitoring and specific preventive care or modify treatment. Moreover, the ability to anticipate the risk of subsequent morbidity and mortality could facilitate personalized treatment plans, including additional immunosuppressive therapies introduced early for high-risk patients or reduced-intensity approaches in low-risk patients.
Classically, the risk of GVHD has been assessed almost entirely on the basis of clinical symptoms; however, over the last 15 years, several groups,2,11,13,15,16 including ours,12,14,34 have shown that non-HLA SNPs can be used as biomarkers to anticipate GVHD. Although some of these reports identified individual SNPs, in most cases no single SNP is sufficient for prognosis. Thus, the simultaneous use of several SNPs may increase specificity and predictive ability. In any case, there are currently no validated laboratory tests to predict the risk of GVHD or patient survival.
Because this study was performed in a large and homogeneous cohort, our results suggest that SNPs in cytokine genes, in combination with clinical factors, could predict severe GVHD (grade 3-4 aGVHD and extensive cGVHD). One limitation of our retrospective study is that extensive cGVHD, an end point no longer used in clinical practice, was considered because patients from 1997 were included. Therefore, the current clinical applicability of these results is limited, and they must be validated in an independent study using current end points (mild, moderate, and severe cGVHD).
As described above, the LASSO procedure independently selected clinical variables that were previously known to influence GVHD and NRM development, such as older patient age, peripheral blood as stem-cell source, female donor/male recipient, and previous aGVHD,8-10 supporting the robustness of the approach.
Other characteristics (reduced-intensity conditioning and not having received TBI), for which reported data are more controversial, were also selected by the LASSO procedure as associated with GVHD,10,32 probably because less intense regimens tend to be offered to older and more heavily treated patients. Gene variants have been shown to alter the expression or function of the proteins responsible for the immune response,4 and there is growing evidence supporting the importance of genetic variability (gene polymorphisms) for predicting the risk of GVHD in individual recipients. In the present study, the LASSO procedure selected polymorphisms in known cytokine genes such as IL1B,35,36 IL6,26 IL10,37 IL17A,38-40 IL23R,41 IFNγ,42 TGFβ,43 and TNFα,44,45 which were confirmed to correlate with the risk of severe aGVHD, and polymorphisms in IL1B,36 IL2, IL7R, IL17A, IL23R, IFNγ,42 and TGFβ were found to play an important role in the risk of extensive cGVHD. Furthermore, LASSO identified an association between GVHD and genes whose role has remained controversial in the literature (IL246,47 and IL7R48). Of note, the approach generates complex models to predict severe GVHD that include a large number of genetic variables, probably because GVHD is a complex entity involving different phases and cell types.
The main strength of our study is the development of a predictive model that combines clinical and genetic variables in a large cohort of homogeneous patients undergoing the same type of transplantation (HLA-identical sibling donor). These findings may not apply to patients undergoing transplantation from unrelated or non–HLA-identical sibling donors. Including SNPs as markers, together with clinical variables, in the risk model significantly improves the CCR of patients with severe aGVHD in comparison with models based only on clinical or genetic data. In fact, the best model for anticipating severe aGVHD was CG-M, with a CCR1 of 100%, whereas for extensive cGVHD, CG-M and G-M performed similarly, with a CCR1 of 80%. These results support the clinical usefulness of including genetic variables, in addition to the available clinical variables, in predictive models. Such models could be clinically useful because they identify patients who will develop severe GVHD. However, it is also important to identify patients who will not develop severe GVHD. Notably, CG-M provided an NPV of 98.6% for severe aGVHD and 85.1% for extensive cGVHD, whereas the NPVs of the other models were slightly lower (severe aGVHD: C-M, 91% and G-M, 96%; extensive cGVHD: C-M, 82.6% and G-M, 81%).
The models proposed here can be applied by other centers using the mathematical formulas shown in supplemental Tables 5-7.
In light of these results, it could be argued that patients who are classified as high risk for the development of severe GVHD, mainly aGVHD, would still receive standard-of-care immunosuppression. However, patients classified as low risk according to the model, and who therefore will most probably not develop severe GVHD, could benefit from modification of immunosuppressive therapy, thus preserving the graft-versus-leukemia effect. This would be of special relevance in those patients with persistent minimal residual disease before transplantation.49 Of course, before making any recommendations, these findings must be validated in prospective studies with large cohorts to demonstrate their clinical utility.
As previously mentioned, no single SNP is sufficient for prognosis, but the simultaneous use of several SNPs may increase specificity and predictive ability. Accordingly, other authors have also developed GVHD predictive models. Kim et al22 proposed SNP-based risk models, also including clinical and genetic variables, associated with transplantation outcomes, which allowed stratification of patients in terms of overall survival, relapse-free survival, NRM, and aGVHD, but not cGVHD. Hartwell et al50 recently described an early-biomarker algorithm that predicted lethal GVHD and survival by measuring 4 biomarkers (ST2, REG3α, TNFR1, and IL-2Rα) in plasma samples on day +7 after SCT in 1287 patients. That study included transplantations performed with various types of donors (unrelated or related), whereas ours included only HLA-identical sibling transplantations. As in our study, that model was able to predict the risk of severe GVHD after SCT before the onset of GVHD symptoms. Unlike our proposal, that algorithm included only the 4 biomarkers and did not consider clinical data. In future studies, that approach could be combined with ours to further improve predictive models. Moreover, including genetic markers that help predict response to drugs could guide therapeutic interventions in the management of GVHD.33
Our main goal for the future is to improve the model so that it is useful for all patients. To this end, we are currently using next-generation sequencing to search for new polymorphisms in immune response–related genes, minor histocompatibility antigen genes, drug metabolism genes, and innate immunity genes. In conclusion, although prospective validation studies are needed to confirm these results, the present study proposes a risk model using donor and recipient SNP markers and clinical variables that improves GVHD risk stratification and could allow optimized clinical management of patients undergoing transplantation.
The full-text version of this article contains a data supplement.
Acknowledgments
The authors would like to thank the Centro Nacional de Genotipado for help with genotyping. They would also like to acknowledge the patients who participated in this study, as well as the staff of the Hematology Department, Hospital General Universitario Gregorio Marañón (Madrid, Spain), who made the study possible.
This study was partially supported by Ministry of Economy and Competitiveness ISCIII-FIS Grants PI08/1463, PI11/00708, PI14/01731, PI17/01880, and RD12/0036/0061 and cofinanced by the European Regional Development Fund from the European Commission, the “A way of making Europe” initiative, and grants from Fundación LAIR and Asociación Madrileña de Hematología y Hemoterapia.
Authorship
Contribution: C.M.-L., E.B., M.C.A.-M., J.L.D.-M., J.R., and I.B. were responsible for conception and design; N.S., B.M.-A., V.G., J.B.N., M.G., R.d.l.C., S.B., A.J.-V., I.E., C.V., A.S., D.S., M.K., J.G., Á.U.-I., C.S., D.G., and J.L.D.-M. provided patients and samples; C.M.-L., E.B., M.C.A.-M., A.P., R.L., and I.B. collected and assembled data; C.M.-L., E.B., M.C.A.-M., A.P., M.G.-R., R.L., J.M.B., P.B., J.L.D.-M., J.R., and I.B. were responsible for data analysis and interpretation; C.M.-L., E.B., M.C.A.-M., and I.B. wrote the manuscript; and all authors gave final approval of the manuscript and are accountable for all aspects of the work.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
A complete list of the members of the GVHD/Immunotherapy Committee of the Spanish Group for Hematopoietic Transplantation appears in “Appendix.”
Correspondence: Carolina Martínez-Laperche, Laboratorio Genética Hematólogica, Edif Oncología Pl−1, Servicio de Hematología, Hospital General Universitario Gregorio Marañón, C/Doctor Esquerdo 46, 28007 Madrid, Spain; e-mail: cmlaperchehgugm@gmail.com.
Appendix: study group members
The members of the GVHD/Immunotherapy Committee of the Spanish Group for Hematopoietic Transplantation are: C.M.-L., B.M.-A., J.B.N., M.G., R.d.l.C., S.B., I.E., C.V., A.S., D.S., M.K., J.G., P.B., Á.U.-I., C.S., D.G., J.L.D.-M., and I.B.
References
Author notes
C.M.-L., E.B., and M.C.A.-M. contributed equally to this work.