Abstract
The severity of symptoms at the onset of graft-versus-host disease (GVHD) does not accurately define risk, and thus most patients (pts) are treated alike with high-dose systemic steroids. We hypothesized that concentrations of one or more plasma biomarkers at the time of GVHD diagnosis could define distinct non-relapse mortality (NRM) risk grades that could guide treatment in a multicenter setting. We first analyzed plasma that was prospectively collected at acute GVHD onset from 492 hematopoietic cell transplantation (HCT) pts from 2 centers, whom we randomly divided into training (n=328) and validation (n=164) sets; 300 HCT pts who enrolled in multicenter BMT CTN primary GVHD therapy clinical trials provided a second validation set. We measured the concentrations of 3 prognostic biomarkers (TNFR1, REG3α, and ST2) and used competing risks regression to create an algorithm from the training set to compute a predicted probability (p) of 6-month NRM from GVHD diagnosis, where log[-log(1-p)] = -9.169 + 0.598(log2 TNFR1) - 0.028(log2 REG3α) + 0.189(log2 ST2). We then rank-ordered p from lowest to highest and identified thresholds that met predetermined criteria for 3 GVHD grades so that NRM would increase 15% on average with each grade. A range of thresholds in the training set met these criteria, and we chose one near each median to demarcate each grade.
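The algorithm above can be illustrated with a minimal Python sketch. It inverts the complementary log-log expression to obtain p and then maps p to one of three grades. Several points are assumptions rather than statements from the abstract: the outer logarithms are taken as natural logs, the concentration units are whatever units the model was fit in (not stated here), the function and parameter names are hypothetical, and the two grade thresholds are caller-supplied because their values are not reported in this abstract. The boundary convention (a value equal to a threshold falls in the higher grade) is likewise an illustrative choice.

```python
import math

# Coefficients as printed in the abstract (fit on the training set).
INTERCEPT = -9.169
COEF_TNFR1 = 0.598
COEF_REG3A = -0.028
COEF_ST2 = 0.189


def predicted_nrm_probability(tnfr1, reg3a, st2):
    """Return the predicted probability p of 6-month NRM.

    Assumptions: concentrations are positive and expressed in the units
    used to fit the model (not given in the abstract); the outer logs in
    log[-log(1 - p)] are natural logarithms.
    """
    if min(tnfr1, reg3a, st2) <= 0:
        raise ValueError("biomarker concentrations must be positive")
    linear_predictor = (
        INTERCEPT
        + COEF_TNFR1 * math.log2(tnfr1)
        + COEF_REG3A * math.log2(reg3a)
        + COEF_ST2 * math.log2(st2)
    )
    # Invert log[-log(1 - p)] = linear_predictor to solve for p.
    return 1.0 - math.exp(-math.exp(linear_predictor))


def assign_grade(p, low_cut, high_cut):
    """Map p to grade 1, 2, or 3 using two caller-supplied thresholds.

    The thresholds are hypothetical inputs; the abstract does not report
    the values chosen. A p equal to a threshold is placed in the higher
    grade, which is an arbitrary convention for this sketch.
    """
    if not 0.0 <= low_cut < high_cut <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= low_cut < high_cut <= 1")
    if p < low_cut:
        return 1
    if p < high_cut:
        return 2
    return 3
```

With the printed coefficients, a higher TNFR1 or ST2 concentration raises p, while REG3α has a small negative coefficient in this particular model.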
In the resulting grades, risk of NRM significantly increased with each grade after the onset of GVHD in both the training and validation sets (FIG 1A,B). Most (80%) NRM was due to steroid-refractory GI GVHD, even though, surprisingly, only half of these pts presented with GI symptoms. We next applied the biomarker algorithm and thresholds to the second multicenter validation set (n=300) and observed similarly significant differences in NRM (FIG 1C). Relapse, which was treated as a competing risk for NRM, did not differ among the three GVHD grades (FIG 1D-F). The differences in NRM thus translated into significantly different overall survival for each GVHD grade (FIG 1G-I). These differences in survival are explained by primary therapy response at day 28, which differed significantly among the Ann Arbor grades (grade 1, 81%; grade 2, 68%; grade 3, 46%; p<0.001 for all comparisons).
We performed additional analyses on the multicenter validation set of pts who developed GVHD after treatment with a wide spectrum of supportive care, conditioning, and GVHD prophylaxis practices. As expected, the Glucksberg grade at GVHD onset did not correlate with NRM (data not shown). Despite small sample sizes, the same biomarker algorithm and thresholds defined three distinct risk strata for NRM within each Glucksberg grade (FIG 2A-C). Pts with higher Ann Arbor grades were usually less likely to respond to treatment. Unexpectedly, approximately the same proportion of pts was assigned to each Ann Arbor grade (~25% grade 1, ~55% grade 2, ~20% grade 3) regardless of the Glucksberg grade (FIG 2D-F).
Several clinical risk factors, such as donor type, age, conditioning, and HLA match, can predict treatment response and survival in patients with GVHD. Using Ann Arbor grade 2 as a reference, we found that Ann Arbor grade 1 predicted a lower risk of NRM (hazard ratio range 0.16-0.32) and grade 3 a higher risk of NRM (hazard ratio range 1.4-2.9), whether or not any of these clinical risk factors were present.
To directly compare Ann Arbor grades to Glucksberg grades, we fit a multivariate model with simultaneous adjustment for both grades. FIG 3 shows that Ann Arbor grade 3 pts had significantly higher risk for NRM (p=0.005) and Ann Arbor grade 1 pts had significantly lower risk for NRM (p=0.002) than pts with Ann Arbor grade 2. By contrast, the confidence intervals for the hazard ratios (HRs) of the Glucksberg grades encompassed 1.0, indicating no statistically significant difference in NRM risk between Glucksberg grades.
In conclusion, we have developed and validated an algorithm of plasma biomarkers that defines three grades of GVHD with distinct risks of NRM and treatment failure despite differences in clinical severity at presentation. The biomarkers at GVHD onset appear to reflect GI tract disease activity that does not correlate with GI symptom severity at the time. This algorithm may be useful in clinical trial design. For example, it can exclude pts who are likely to respond to standard therapy despite severe clinical presentations, thus limiting the exposure of low-risk pts to investigational agents while also identifying the high-risk pts most likely to benefit from investigational approaches.
Levine: University of Michigan: GVHD biomarker patent, Patents & Royalties. Braun: University of Michigan: GVHD biomarker patent, Patents & Royalties. Ferrara: University of Michigan: GVHD biomarker patent, Patents & Royalties.