• PROM total score, PROM total score change, and NIH 0 to 3 skin change are associated with clinician-reported response in cutaneous sclerosis.

  • Human activity profile AAS, SF36 vitality change, LSS skin, and LSS skin change are associated with patient-reported response.

Abstract

Cutaneous sclerosis, a highly morbid subtype of chronic graft-versus-host disease (GVHD), demonstrates limited treatment response under current National Institutes of Health (NIH) response measures. We explored novel sclerosis-specific response measures using Chronic GVHD Consortium data. A training cohort included patients with cutaneous sclerosis from a randomized trial of imatinib vs rituximab and a consortium observational study. The validation cohort was a different consortium observational study. Clinician-reported measures (baseline and baseline to 6-month change) were examined for association with 6-month clinician-reported response. Patient-reported measures (baseline and baseline to 6-month change) were studied for association with 6-month patient-reported response. A total of 347 patients were included (training 183 and validation 164). Although multiple skin and joint measures were associated with clinician-reported response on univariate analysis, patient range of motion (PROM) total score, PROM total score change, and NIH 0 to 3 skin change were retained in the final multivariate model (area under the receiver operating characteristic curve [AUC], 0.83 training and 0.75 validation). Similarly, many patient-reported measures were associated, but final multivariate analysis retained the human activity profile adjusted activity score (AAS), 36-item Short Form Health Survey (SF36) vitality change, Lee symptom scale (LSS) skin, and LSS skin change in the model (AUC, 0.86 training and 0.75 validation). We identified which sclerosis measures have the greatest association with 6-month clinician- and patient-reported treatment responses, a previously unstudied area. However, given the observed performance in the validation cohort, we conclude that further work is needed. Novel response measures may be needed to optimally assess treatment response in cutaneous sclerosis.

Chronic graft-versus-host disease (GVHD) is a serious multisystem immune-mediated disorder after allogeneic hematopoietic cell transplantation (HCT). It is a leading source of late post-HCT death, impaired quality of life and function, increased symptom burden, and prolonged duration of immune suppressive therapy.1-6 Cutaneous sclerosis is a relatively common manifestation of chronic GVHD, characterized by fibrotic change in skin, subcutaneous tissue, and joint/fascia.7 Affected patients may suffer nonmutually exclusive combinations of thickened and/or tight skin epidermis, deep tissue thickening, impairments in joint mobility/function, and thus significant disability. Because responses are limited with available immune suppressive therapies, many will require numerous agents and prolonged overall duration of therapy. Highlighting the pressing need for advances in this area, the 2020 National Institutes of Health (NIH) Chronic GVHD Consensus (working group 4) called for research innovation in this and other highly morbid forms of chronic GVHD.8 

Clinical trials in chronic GVHD (such as those that have led to 3 current US Food and Drug Administration–approved agents)9-11 have typically enrolled patients with a diverse range of organ-site manifestations, with a relative underrepresentation of cutaneous sclerosis. In contrast, a few trials have specifically enrolled patients with cutaneous sclerosis: 1 randomized phase 2 Chronic GVHD Consortium trial tested imatinib (n = 35) vs rituximab (n = 37) and demonstrated a 26% to 27% significant clinical response rate at 6 months of therapy, with no significant difference in this outcome per treatment arm.12 This primary outcome was defined by improvements in a sclerotic skin assessment tool (Vienna Skin Scale [VSS]) and range of motion in affected joints (patient range of motion [PROM] scale). Another more recent, multicenter, phase 2, single-arm trial (N = 49) tested ruxolitinib in cutaneous sclerosis.13 Using NIH-defined response in skin or joints at 6 months, 49% achieved a partial response, whereas others had stable or progressive disease. Most of the responses were due to improvements in joint range of motion rather than improvements in sclerotic skin.

Low response rates in cutaneous sclerosis may be driven by limited responsiveness of the disease to currently available therapies, the length of time required for reversal of sclerosis, and the limited sensitivity of NIH response measures that were designed for chronic GVHD overall. Under this response assessment tool, resolution of deep sclerotic changes or major functional improvements in affected joints may be needed to document clinical benefit.14 Additionally, other research has suggested that PROM–based joint/fascia responses may be overly sensitive to both response and progression and affected by interobserver variability.15 There is general recognition that better response measures are needed in cutaneous sclerosis. Research in an allied disorder (the autoimmune condition systemic sclerosis) produced a composite response index (American College of Rheumatology Composite Response Index in Systemic Sclerosis [ACR-CRISS]) that has now been implemented in relevant clinical trials.16 This CRISS model uses 5 core items, including the modified Rodnan Skin Score that has been used in cutaneous sclerosis previously.

The main objective of our current analysis was to define which existing measures of skin and joint disease activity in cutaneous sclerosis were associated with clinician- and patient-reported benefit at 6 months. Two Chronic GVHD Consortium observational studies and 1 interventional trial were leveraged to address this question, including separate training and independent validation.

Parent study populations

Population 1: prospective clinical trial testing imatinib vs rituximab

This multicenter phase 2 trial enrolled patients with cutaneous sclerosis and randomly assigned therapy with either imatinib or rituximab.12 The trial was conducted from 2011 to 2014 and treated a total of 72 patients from 11 participating centers. The core objective was to assess and compare the efficacy of each agent in cutaneous sclerosis. The primary end point of significant clinical response at 6 months was defined by VSS improvement (≥2 points) and PROM improvement (≥2/7 scale or ≥1/4 scale). Beyond this, extensive data were collected on baseline features, objective measures of sclerotic involvement and changes (NIH chronic GVHD scores, VSS, PROM, and goniometer measures), patient-reported outcome measures, as well as skin and blood biomarkers.

Population 2: 2192 observational cohort study

This multicenter observational study enrolled patients with chronic GVHD, which could be either incident (enrollment within 3 months of chronic GVHD diagnosis) or prevalent (over 3 months from diagnosis and within 3 years of HCT). Recruitment occurred from 2007 to 2012 at 9 HCT centers in the United States, and a total of 601 were enrolled. Data collection occurred at enrollment, 3 months (incident cases only), and every 6 months through 5 years. Clinician- and patient-reported data, treatment information, samples, and functional assessments were serially obtained with the overall objective of validating proposed NIH Consensus measures.

Population 3: 2710 observational cohort study

This multicenter observational study enrolled chronic GVHD–affected patients starting a new systemic therapy for chronic GVHD.17 Recruitment occurred from 2013 to 2019 at 12 US HCT centers, with 383 total enrolled. Data collection occurred at enrollment, 3, 6, and 18 months and at the time of systemic treatment change. With the overall objective of testing the NIH response criteria, the study gathered comprehensive clinician- and patient-reported data, treatment information, samples, and functional assessments.

Institutional review board approval was granted for the original 3 studies (1 trial and 2 observational cohort studies listed in the manuscript) including long-term follow-up analyses (including this report).

Current study population and analysis plan

The following were included in this analysis (supplemental Figure 1). First, all participants from the imatinib vs rituximab clinical trial were included; the original trial eligibility included a diagnosis of cutaneous sclerosis (within 12 months from sclerosis diagnosis) with either sclerotic skin, morphea, myofascial involvement, or joint contractures with a score of ≥2 in any area on the Vienna skin scale,18 or a score of ≤5 at the shoulder, elbow, or wrist or a score of ≤3 at the ankle on the PROM scale.19 Second, we included patients from the 2192 and 2710 observational studies based on reported sclerotic skin or joint/fascial involvement at cohort entry. From the parent studies, the following variables were considered for the purpose of defining cutaneous sclerosis: NIH 0 to 3 sclerotic skin scores (inclusive of score 2-3 indicating superficial or deep sclerosis), indication of sclerotic features present, fascial involvement, or VSS grade 3 or 4 for any body region at cohort entry. The imatinib vs rituximab clinical trial and the 2192 study were combined to form the training cohort of this analysis, and the 2710 study formed the independent validation cohort.

The analysis was organized to address 2 parallel questions. First, clinician-assessed sclerosis variables (both baseline and baseline to 6 month change values) were considered for association with 6-month (skin-specific) clinician-reported treatment response. The variables considered included the following: PROM scores (separately at shoulder, elbow, wrist, and ankle, as well as total PROM score),19 total body surface area (BSA) involved (separately for movable and nonmovable sclerotic skin changes), NIH 0 to 3 skin scores, Hopkins skin score (normal, thickened with pockets of normal skin, thickened over majority of skin, thickened and unable to move, and hidebound unable to pinch) and fascia scale (normal, tight with normal areas, tight, and tight and unable to move),20 and total BSA involved with superficial or deep sclerosis per the Vienna skin scale (total VSS for all BSA, not per individual anatomical sites).18 Clinician-assessed response at 6 months was originally captured according to an 8-point scale with categories of “completely gone,” “very much better,” “moderately better,” “a little better,” “about the same,” “a little worse,” “moderately worse,” or “a lot worse.”14 These response categories were collapsed into “improved” (completely gone, very much better, moderately better, and a little better) vs “not improved” (about the same, a little worse, moderately worse, and a lot worse).
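
The collapsing of the 8-point response scale into a binary outcome can be sketched as follows. This is a minimal illustration, not the study's actual data-processing code; the function and constant names are hypothetical, and the label "very much worse" is accepted as an alias only because the results text uses it in place of "a lot worse".

```python
# Hypothetical helper illustrating the "improved" vs "not improved" collapse.
IMPROVED = {
    "completely gone",
    "very much better",
    "moderately better",
    "a little better",
}
NOT_IMPROVED = {
    "about the same",
    "a little worse",
    "moderately worse",
    "a lot worse",
    "very much worse",  # alias: label used in the results text
}


def collapse_response(category: str) -> str:
    """Map one 8-point response category to the binary summary response."""
    key = category.strip().lower()
    if key in IMPROVED:
        return "improved"
    if key in NOT_IMPROVED:
        return "not improved"
    raise ValueError(f"unknown response category: {category!r}")


print(collapse_response("A little better"))
print(collapse_response("about the same"))
```

Raising an error on an unrecognized label, rather than silently assigning a default, keeps miscoded responses from being counted in either summary category.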

Second, patient-reported outcome (PRO) measures (both baseline and baseline to 6-month change values) were considered for association with 6-month (skin-specific) patient-reported treatment response. Patient-reported outcome measures considered included the following: modified scleroderma health assessment questionnaire standard and alternative disability index,21 modified human activity profile (HAP) adjusted activity score,22 SF-36 domain (physical functioning, role physical, bodily pain, general health, vitality, social functioning, role emotional, and mental health) and summary (mental and physical component scores) scores,23 Lee symptom scale (LSS) domain (skin, energy, lung, eye, nutrition, psychological, and mouth), individual question (joint and muscle aches and limited joint movement) and summary (overall summary) scores,24 and Functional Assessment of Cancer Therapy–Bone Marrow Transplant (FACT-BMT) domain (physical, social/family, emotional, and functional well-being; BMT subscale) and summary (Functional Assessment of Cancer Therapy - General [FACT-G], FACT trial outcome index, and FACT-BMT total) scores.25 Patient-response categories were as described above (per clinician-reported response) and similarly collapsed into improved vs not improved summary response categories. These measures are briefly summarized in supplemental Table 3.

Statistical methods

Patient characteristics were summarized and compared across the training and validation cohorts using the χ2 test and Fisher exact test for categorical variables and the Wilcoxon rank sum test for continuous variables. Univariate logistic regression analyses were conducted to examine the association between individual (clinician- and patient-reported variables separately) variables and the 6-month response outcome (clinician- and patient-reported 6 month response, respectively). Both baseline and baseline to 6 month change values were considered for each variable. Multivariate analyses were performed (for the 2 response outcomes separately) using stepwise regression with criteria for entry and stay in the model of P value <.05. Because combined models (incorporating both clinician- and patient-reported variables to model each of our response outcomes) did not improve model performance, we examined only clinical variables in the clinician response model and patient-reported variables in the patient-reported response model. The final performance of each model was reported as an area under the receiver operating characteristic (ROC) curve (AUC), and cut-points were determined (using Youden J = sensitivity + specificity – 1, and minimizing the distance from the ROC curve to the point [0, 1]) to calculate sensitivity, specificity, and positive and negative predictive values.
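
The two cut-point criteria described above can be illustrated with a short sketch. It assumes ROC coordinates of (1 – specificity, sensitivity), so that the ideal corner is (0, 1); the function names and the toy data are hypothetical and are not taken from the study.

```python
from math import hypot


def classification_metrics(y_true, probs, cut):
    """Sensitivity, specificity, PPV, and NPV when prob >= cut is called a responder.

    Assumes y_true contains both responders (1) and nonresponders (0).
    """
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= cut)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < cut)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= cut)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < cut)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp) if tp + fp else float("nan"),
        "npv": tn / (tn + fn) if tn + fn else float("nan"),
    }


def select_cutpoints(y_true, probs):
    """Return (cut maximizing Youden J, cut closest to the ideal ROC corner)."""
    candidates = sorted(set(probs))

    def youden_j(cut):
        m = classification_metrics(y_true, probs, cut)
        return m["sensitivity"] + m["specificity"] - 1

    def distance_to_corner(cut):
        m = classification_metrics(y_true, probs, cut)
        # Distance from (1 - specificity, sensitivity) to (0, 1).
        return hypot(1 - m["specificity"], 1 - m["sensitivity"])

    return max(candidates, key=youden_j), min(candidates, key=distance_to_corner)


# Toy data: 1 = improved, 0 = not improved; probs are model-predicted probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]

youden_cut, distance_cut = select_cutpoints(y_true, probs)
print(youden_cut, distance_cut)
print(classification_metrics(y_true, probs, distance_cut))
```

The two criteria need not select the same threshold, which is why a model can be reported with a separate cut-point for each.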

Considering eligible participants from the 3 parent studies, a total of 347 patients were included in this analysis (flow diagram, supplemental Figure 1). These were divided into a training set (n = 183) and validation set (n = 164). Baseline characteristics of the included patients are listed in supplemental Table 1. A full description of clinician-reported sclerosis variables and PRO measures is presented in Tables 1 and 2, whereas those that were retained in the final multivariate analysis are presented in Table 3. Actual systemic immune suppressive therapies given for cutaneous sclerosis are detailed in supplemental Table 2.

Table 1.

Cutaneous sclerosis features

| Characteristic | Total, mean (range) | Training, mean (range) | Validation, mean (range) |
|---|---|---|---|
| ROM-shoulder | 6.3 (2.0-7.0); n = 295 | 6.3 (3.0-7.0); n = 138 | 6.2 (2.0-7.0); n = 157 |
| ROM-elbow | 6.3 (1.0-7.0); n = 294 | 6.3 (1.0-7.0); n = 139 | 6.4 (3.0-7.0); n = 155 |
| ROM-wrist | 5.7 (1.0-7.0); n = 295 | 5.7 (1.0-7.0); n = 138 | 5.7 (1.0-7.0); n = 157 |
| ROM-ankle | 3.4 (1.0-4.0); n = 289 | 3.3 (1.0-4.0); n = 137 | 3.5 (1.0-4.0); n = 152 |
| BSA movable | 11.7 (0.0-77.8); n = 182 | 11.7 (0.0-77.8); n = 182 | |
| BSA nonmovable | 8.0 (0.0-70.0); n = 182 | 8.0 (0.0-70.0); n = 182 | |
| Hopkins skin score | 1.8 (0.0-4.0); n = 182 | 1.8 (0.0-4.0); n = 182 | |
| Fascia/joints | 1.1 (0.0-3.0); n = 346 | 1.2 (0.0-3.0); n = 182 | 1.0 (0.0-3.0); n = 164 |
| TSS, % grade 0 | 68.5 (0.0-100.0) | 68.5 (0.0-100.0) | |
| TSS, % grade 1 | 10.3 (0.0-90.1) | 10.3 (0.0-90.1) | |
| TSS, % grade 2 | 5.9 (0.0-63.1) | 5.9 (0.0-63.1) | |
| TSS, % grade 3 | 10.1 (0.0-65.0) | 10.1 (0.0-65.0) | |
| TSS, % grade 4 | 5.2 (0.0-54.0) | 5.2 (0.0-54.0) | |
| TSS, total n | 183 | 183 | |
| VSS | 6.3 (0.0-26.5); n = 183 | 6.3 (0.0-26.5); n = 183 | |

ROM, range of motion; TSS, total skin score.

Table 2.

Patient-reported outcome measures

| Characteristic | Total, mean (range) | Training, mean (range) | Validation, mean (range) |
|---|---|---|---|
| SF36 physical component summary score | 37.5 (11.0-60.7); n = 293 | 37.5 (13.7-60.7); n = 156 | 37.4 (11.0-57.9); n = 137 |
| SF36 mental component summary score | 48.5 (7.1-68.4); n = 293 | 47.2 (17.7-68.4); n = 156 | 50.0 (7.1-68.2); n = 137 |
| SF36 physical function score | 39.8 (14.9-57.0); n = 300 | 40.0 (14.9-57.0); n = 162 | 39.7 (14.9-57.0); n = 138 |
| SF36 role-physical score | 37.1 (17.7-56.9); n = 298 | 37.4 (17.7-56.9); n = 161 | 36.6 (17.7-56.9); n = 137 |
| SF36 bodily pain score | 43.0 (19.9-62.1); n = 301 | 41.5 (19.9-62.1); n = 163 | 44.7 (19.9-62.1); n = 138 |
| SF36 general health score | 39.3 (16.2-63.9); n = 298 | 39.1 (16.2-63.9); n = 160 | 39.6 (16.2-62.5); n = 138 |
| SF36 social functioning score | 41.2 (13.2-56.8); n = 301 | 40.4 (13.2-56.8); n = 163 | 42.1 (13.2-56.8); n = 138 |
| SF36 role-emotional score | 44.7 (9.2-55.9); n = 298 | 43.9 (9.2-55.9); n = 160 | 45.2 (9.2-55.9); n = 138 |
| SF36 mental health score | 48.9 (10.6-64.1); n = 301 | 47.5 (16.2-64.1); n = 163 | 50.5 (10.6-64.1); n = 138 |
| LSS energy score | 41.5 (0.0-96.4); n = 304 | 43.4 (0.0-96.4); n = 166 | 39.2 (0.0-92.9); n = 138 |
| LSS lung score | 7.3 (0.0-60.0); n = 303 | 8.4 (0.0-55.0); n = 165 | 6.0 (0.0-60.0); n = 138 |
| LSS eye score | 39.8 (0.0-100.0); n = 302 | 37.2 (0.0-100.0); n = 164 | 42.8 (0.0-100.0); n = 138 |
| LSS nutrition score | 7.5 (0.0-60.0); n = 304 | 7.6 (0.0-60.0); n = 166 | 7.4 (0.0-45.0); n = 138 |
| LSS psychological score | 27.7 (0.0-100.0); n = 303 | 30.9 (0.0-100.0); n = 166 | 23.7 (0.0-100.0); n = 137 |
| LSS mouth score | 19.0 (0.0-100.0); n = 303 | 16.9 (0.0-100.0); n = 165 | 21.5 (0.0-100.0); n = 138 |
| LSS summary score | 25.3 (4.1-73.3); n = 303 | 25.9 (4.1-73.3); n = 165 | 24.5 (4.4-65.9); n = 138 |
| FACT physical well-being | 20.1 (0.0-28.0); n = 299 | 19.4 (0.0-28.0); n = 161 | 21.0 (3.0-28.0); n = 138 |
| FACT social/family well-being | 22.0 (2.0-28.0); n = 299 | 21.7 (2.0-28.0); n = 162 | 22.3 (7.0-28.0); n = 137 |
| FACT emotional well-being | 18.4 (2.4-24.0); n = 299 | 17.9 (4.0-24.0); n = 161 | 18.9 (2.4-24.0); n = 138 |
| FACT functional well-being | 16.3 (1.2-28.0); n = 300 | 15.7 (1.2-28.0); n = 162 | 17.0 (2.0-28.0); n = 138 |
| FACT-G | 76.9 (23.0-108.0); n = 297 | 74.8 (23.0-108.0); n = 160 | 79.3 (36.0-107.0); n = 137 |
| FACT-BMT subscale | 26.4 (11.0-38.0); n = 162 | 26.4 (11.0-38.0); n = 162 | |
| FACT trial outcome index | 61.4 (20.0-94.0); n = 161 | 61.4 (20.0-94.0); n = 161 | |
| FACT-BMT total | 101.3 (36.0-146.0); n = 160 | 101.3 (36.0-146.0); n = 160 | |
| Joint/muscle aches | 1.9 (0.0-4.0); n = 303 | 2.0 (0.0-4.0); n = 165 | 1.7 (0.0-4.0); n = 138 |
| Limited joint movement | 1.8 (0.0-4.0); n = 302 | 1.8 (0.0-4.0); n = 166 | 1.7 (0.0-4.0); n = 136 |
Table 3.

Clinician and patient-reported measures retained in final multivariate models for 6-month treatment response

| Characteristic | Total (N = 347) | Training (n = 183) | Validation (n = 164) | P value |
|---|---|---|---|---|
| Clinician-reported measures | | | | |
| PROM at baseline | 21.7 (12.0-25.0); n = 285 | 21.6 (13.0-25.0); n = 136 | 21.9 (12.0-25.0); n = 149 | .44 |
| PROM change | 0.3 (−12.0 to 8.0); n = 265 | 0.2 (−12.0 to 8.0); n = 129 | 0.3 (−11.0 to 7.0); n = 136 | .79 |
| NIH skin score change | −0.2 (−3.0 to 3.0); n = 340 | −0.3 (−3.0 to 2.0); n = 180 | −0.2 (−3.0 to 3.0); n = 160 | .56 |
| Patient-reported measures | | | | |
| Modified HAP at baseline | 64.5 (9.0-94.0); n = 299 | 64.9 (9.0-94.0); n = 161 | 64.0 (24.0-94.0); n = 138 | .63 |
| SF36 vitality change | 1.1 (−21.9 to 31.2); n = 265 | 1.5 (−21.9 to 31.2); n = 140 | 0.7 (−21.9 to 25.0); n = 125 | .48 |
| LSS skin score at baseline | 34.1 (0.0-100.0); n = 303 | 36.7 (0.0-100.0); n = 165 | 30.9 (0.0-100.0); n = 138 | .03 |
| LSS skin score change | −9.1 (−75.0 to 60.0); n = 263 | −9.4 (−75.0 to 35.0); n = 141 | −8.6 (−70.0 to 60.0); n = 122 | .76 |

Based on the t test.

By 6 months, the training and validation set patients (presented as percentage of training set patients/percentage of validation set patients for each, respectively) had improvement (52%/53%) per clinician assessment, comprising completely gone (7%/7%), very much better (12%/10%), moderately better (17%/18%), and a little better (17%/18%). The nonimprovement categories included about the same (23%/23%), a little worse (14%/12%), moderately worse (10%/12%), and very much worse (1%/0%). On univariate analysis, many individual measures were associated with clinician-reported treatment response (supplemental Table 4). On final multivariate analysis, baseline PROM summary score, PROM change from baseline to 6 months, and NIH 0 to 3 skin score change from baseline to 6 months were significantly associated with clinician-reported treatment response (Table 4).

Table 4.

Final multivariate analysis results for clinician- and patient-reported 6-month treatment response

| Variable | OR (95% CI) | P value |
|---|---|---|
| Clinician model | | |
| PROM at baseline | 1.3 (1.1-1.6) | .002 |
| PROM change | 1.5 (1.2-2.0) | .002 |
| NIH skin score change | 0.3 (0.2-0.6) | <.001 |
| Patient model | | |
| Modified HAP (per 10) | 1.5 (1.1-2.1) | .02 |
| SF36 vitality scale change (per 10) | 3.9 (2.0-7.6) | <.001 |
| LSS skin scale (per 10) | 0.6 (0.5-0.8) | <.001 |
| LSS skin scale change (per 10) | 0.4 (0.2-0.6) | <.001 |

CI, confidence interval; OR, odds ratio.

Change from baseline to 6 months
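
Because several odds ratios in Table 4 are reported per 10-point increment, it can help to rescale them to other increments. The sketch below assumes the usual logistic-regression property that the log odds are linear in the predictor, uses the point estimates only (it ignores the confidence intervals), and the function name is hypothetical.

```python
import math


def rescale_odds_ratio(or_per_step: float, step: float, delta: float) -> float:
    """Odds ratio implied for a change of `delta` units, given an OR per `step` units."""
    beta_per_unit = math.log(or_per_step) / step  # log-odds change per 1 unit
    return math.exp(beta_per_unit * delta)


# SF36 vitality change: OR 3.9 per 10 points, rescaled to a 1-point gain.
print(round(rescale_odds_ratio(3.9, step=10, delta=1), 3))

# LSS skin change: OR 0.4 per 10 points, rescaled to a 20-point decrease.
print(round(rescale_odds_ratio(0.4, step=10, delta=-20), 2))
```

Under these assumptions, an odds ratio below 1 for the LSS skin change means that a decrease in the score (fewer skin symptoms) raises the odds of patient-reported improvement.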

By 6 months, the training and validation set patients (presented as percentage of training set patients/percentage of validation set patients for each, respectively) had patient-reported improvement (59%/64%), comprising completely gone (6%/8%), very much better (17%/18%), moderately better (18%/20%), and a little better (18%/18%). For those without improvement, categories were about the same (21%/20%), a little worse (11%/9%), moderately worse (7%/6%), and very much worse (2%/1%). On univariate analysis, multiple individual PRO measures (inclusive of domain and summary scores) were associated with patient-reported response at 6 months (supplemental Table 5). Final multivariate analysis confirmed that baseline HAP adjusted activity score, SF-36 vitality score change from baseline to 6 months, and LSS skin (both baseline and change value from baseline to 6 months) were significantly associated with patient-reported treatment response (Table 4). This model only considered subscales for SF-36, Lee symptom score, and FACT (did not incorporate both domain scores and summary total scores for each PRO measure). Alternative models that incorporated domain and summary scores did not provide more optimal AUC (data not shown) and were not pursued further.

We used ROC plots for the training and validation cohorts to characterize the performance of the final models for the clinician- and patient-reported responses, respectively. The ROC plots for the training and validation cohorts are presented in supplemental Figures 2 and 3, and final model thresholds and performance (sensitivity, specificity, positive predictive values, and negative predictive values) are presented in Table 5.
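
As a reminder of what the AUC summarizes, the sketch below computes it as the proportion of responder/nonresponder pairs in which the responder receives the higher predicted probability, with ties counted as one half. This rank-based formulation is equivalent to the area under the empirical ROC curve; the function name and toy data are hypothetical and are not study data.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a responder outranks a nonresponder."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one responder and one nonresponder")
    # Score each pair: 1 if the responder ranks higher, 0.5 for a tie, else 0.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = improved, 0 = not improved.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.10, 0.30, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90]
print(roc_auc(y_true, scores))
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of responders from nonresponders.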

Table 5.

Final model performance for clinician- and patient-reported 6-month treatment response

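| Model and cohort | AUC | Cut-point | Sensitivity | Specificity | PPV | NPV |
|---|---|---|---|---|---|---|
| Clinician model | | | | | | |
| Training | 0.83 | 0.47 | 0.77 | 0.79 | 0.75 | 0.80 |
| Validation | 0.75 | 0.47 | 0.60 | 0.79 | 0.77 | 0.63 |
| Patient model, cut-point based on Youden J | | | | | | |
| Training | 0.86 | 0.63 | 0.75 | 0.86 | 0.86 | 0.74 |
| Validation | 0.75 | 0.63 | 0.52 | 0.84 | 0.85 | 0.50 |
| Patient model, cut-point based on distance to (1, 1) | | | | | | |
| Training | 0.86 | 0.54 | 0.78 | 0.82 | 0.84 | 0.76 |
| Validation | 0.75 | 0.54 | 0.60 | 0.78 | 0.83 | 0.53 |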

AUC, area under curve; NPV, negative predictive value; PPV, positive predictive value.

Based on Youden J and distance to (1, 1)

Major innovation is needed in the treatment and response assessment of cutaneous sclerosis to advance clinical care, conduct of clinical trials, and ultimately improve patient outcomes. Among existing limitations, NIH response criteria applied to this chronic GVHD subgroup failed to capture certain improvements that are recognized by clinicians and patients. Additionally, numerous clinician- and patient-reported measures of sclerosis burden and associated symptoms and impairments have been routinely captured in prior studies, yet the optimal measure or combination of measures to robustly assess treatment response is not known. To address these gap areas, we leveraged 2 major national Chronic GVHD Consortium observational studies and a prior cutaneous sclerosis–specific national clinical trial to test which sclerosis variables were associated with clinician- and patient-reported response, both of which have been previously demonstrated to have association with long-term treatment success.26,27 

In the training set, we found a strong association between routinely captured skin and joint/fascia measures and 6-month clinician-assessed treatment response. Notably, the PROM baseline score and change value, as well as the change in NIH 0 to 3 skin scores, were retained in the final model. This speaks to the feasibility of testing this model further in other existing observational data sets or in recent large clinical trials, given that these measures are routinely captured in baseline and response provider surveys in these settings. In contrast, some measures no longer routinely used (eg, Hopkins scale and Vienna skin scale) were not retained in the final model. The patient-response analysis clarifies which (among many possible) PRO measures have the strongest association with patient-reported response and supports the use of a combination of the HAP, SF36, and LSS for this purpose. These data suggest that both a quality of life (QOL) and a symptom-based PRO are needed to adequately capture sclerosis response.

However, although both models had AUC values generally considered to reflect excellent discrimination in the training set, AUC values in the validation set could only be considered acceptable at best, with significant risks for response misclassification. Accordingly, the models in their current state require further refinement and validation and cannot yet be applied to clinical trials or routine practice. The inferior results in the validation set may be due to several known or unknown factors. One of the largest issues is likely the inherent diversity within and between the populations we examined in this analysis. The included patients uniformly had cutaneous sclerosis, yet had diversity in type, anatomical site, and severity of sclerotic features, varied functional impairments, and varied duration of prior sclerosis before enrollment in the parent studies included here. Another major potential contributor is interobserver variability (both in terms of clinicians and patients) in rating treatment response, as well as differential weighting of improvements in reporting overall response. Additionally, there was marked heterogeneity in the therapeutic agents used across the 3 included studies (with potential variation in treatment efficacy), an inherent challenge given the diversity in patients enrolled in these studies as well as the range of available therapeutic agents.

In total, our results demonstrate that further research is needed. One avenue for additional progress would be a similar exercise in training/validation of a response model using larger and potentially more uniform patient populations; however, it is not possible in the near term based on limited availability of such resources. We also note, for example, the completion of several large chronic GVHD trials (eg, those testing ruxolitinib, belumosudil, or axatilimab)10,11,28 in the recent past, and these study populations could in future work be examined using the methods we have used here in our study population. Separately, novel measures (eg, skin thickness measures,29,30 novel imaging modalities,31-33 tissue and/or blood biomarkers) may ultimately provide new insight and a path forward to optimal response assessment in cutaneous sclerosis, potentially including a composite model incorporating both clinical measures and novel tools. As well, other future directions include development of sclerosis-specific tools, including a sclerosis–specific PRO measure.

The authors acknowledge grant funding support CA163438 and CA118953.

Contribution: J.A.P., L.O., and S.J.L. designed the study, conducted the analysis, and wrote the manuscript; and E.B., P.A.C., C.C., S.A., C.L.K., and G.L.C. provided significant input on the study analysis and writing of the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Joseph A. Pidala, Blood and Marrow Transplantation and Cellular Immunotherapy, H. Lee Moffitt Cancer Center and Research Institute, 12902 Magnolia Drive, Tampa, FL 33612; email: joseph.pidala@moffitt.org.

1. Arora M, Klein JP, Weisdorf DJ, et al. Chronic GVHD risk score: a Center for International Blood and Marrow Transplant Research analysis. Blood. 2011;117(24):6714-6720.
2. Pidala J, Kurland B, Chai X, et al. Patient-reported quality of life is associated with severity of chronic graft-versus-host disease as measured by NIH criteria: report on baseline data from the Chronic GVHD Consortium. Blood. 2011;117(17):4651-4657.
3. Arai S, Jagasia M, Storer B, et al. Global and organ-specific chronic graft-versus-host disease severity according to the 2005 NIH consensus criteria. Blood. 2011;118(15):4242-4249.
4. Lee SJ, Nguyen TD, Onstad L, et al. Success of immunosuppressive treatments in patients with chronic graft-versus-host disease. Biol Blood Marrow Transplant. 2018;24(3):555-562.
5. Pidala J, Anasetti C, Jim H. Quality of life after allogeneic hematopoietic cell transplantation. Blood. 2009;114(1):7-19.
6. Pidala J, Martens M, Anasetti C, et al. Factors associated with successful discontinuation of immune suppression after allogeneic hematopoietic cell transplantation. JAMA Oncol. 2020;6(1):e192974.
7. Arora M, Cutler CS, Jagasia MH, et al. Late acute and chronic graft-versus-host disease after allogeneic hematopoietic cell transplantation. Biol Blood Marrow Transplant. 2016;22(3):449-455.
8. Wolff D, Radojcic V, Lafyatis R, et al. National Institutes of Health consensus development project on criteria for clinical trials in chronic graft-versus-host disease: IV. The 2020 highly morbid forms report. Transplant Cell Ther. 2021;27(10):817-835.
9. Miklos D, Cutler CS, Arora M, et al. Ibrutinib for chronic graft-versus-host disease after failure of prior therapy. Blood. 2017;130(21):2243-2250.
10. Zeiser R, Polverelli N, Ram R, et al. Ruxolitinib for glucocorticoid-refractory chronic graft-versus-host disease. N Engl J Med. 2021;385(3):228-238.
11. Cutler CS, Lee SJ, Arai S, et al. Belumosudil for chronic graft-versus-host disease (cGVHD) after 2 or more prior lines of therapy: the ROCKstar study. Blood. 2021;138(22):2278-2289.
12. Arai S, Pidala J, Pusic I, et al. A randomized phase II crossover study of imatinib or rituximab for cutaneous sclerosis after hematopoietic cell transplantation. Clin Cancer Res. 2016;22(2):319-327.
13. Bhatt VR, Shostrom VK, Saad A, et al. Ruxolitinib for treatment of steroid refractory sclerotic chronic graft-versus-host disease (cGVHD): results of a multicenter phase II trial. Blood. 2022;140(suppl 1):1379-1380.
14. Lee SJ, Wolff D, Kitko C, et al. Measuring therapeutic response in chronic graft-versus-host disease. National Institutes of Health consensus development project on criteria for clinical trials in chronic graft-versus-host disease: IV. The 2014 Response Criteria Working Group report. Biol Blood Marrow Transplant. 2015;21(6):984-999.
15. Inamoto Y, Lee SJ, Onstad LE, et al. Refined National Institutes of Health response algorithm for chronic graft-versus-host disease in joints and fascia. Blood Adv. 2020;4(1):40-46.
16. Khanna D, Berrocal VJ, Giannini EH, et al. The American College of Rheumatology provisional composite response index for clinical trials in early diffuse cutaneous systemic sclerosis. Arthritis Rheumatol. 2016;68(2):299-311.
17. Chronic GC. Design and patient characteristics of the chronic graft-versus-host disease response measures validation study. Biol Blood Marrow Transplant. 2018;24(8):1727-1732.
18. Greinix HT, Pohlreich D, Maalouf J, et al. A single-center pilot validation study of a new chronic GVHD skin scoring system. Biol Blood Marrow Transplant. 2007;13(6):715-723.
19. Jagasia MH, Greinix HT, Arora M, et al. National Institutes of Health consensus development project on criteria for clinical trials in chronic graft-versus-host disease: I. The 2014 Diagnosis and Staging Working Group report. Biol Blood Marrow Transplant. 2015;21(3):389-401.e1.
20. Jacobsohn DA, Rademaker A, Kaup M, Vogelsang GB. Skin response using NIH consensus criteria vs Hopkins scale in a phase II study for steroid-refractory chronic GVHD. Bone Marrow Transplant. 2009;44(12):813-819.
21. Poole JL, Steen VD. The use of the health assessment questionnaire (HAQ) to determine physical disability in systemic sclerosis. Arthritis Care Res. 1991;4(1):27-31.
22. Herzberg PY, Heussner P, Mumm FH, et al. Validation of the human activity profile questionnaire in patients after allogeneic hematopoietic stem cell transplantation. Biol Blood Marrow Transplant. 2010;16(12):1707-1717.
23. Ware JE, Gandek B. Overview of the SF-36 health survey and the International Quality of Life Assessment (IQOLA) project. J Clin Epidemiol. 1998;51(11):903-912.
24. Lee S, Cook EF, Soiffer R, Antin JH. Development and validation of a scale to measure symptoms of chronic graft-versus-host disease. Biol Blood Marrow Transplant. 2002;8(8):444-452.
25. McQuellon RP, Russell GB, Cella DF, et al. Quality of life measurement in bone marrow transplantation: development of the functional assessment of cancer therapy-bone marrow transplant (FACT-BMT) scale. Bone Marrow Transplant. 1997;19(4):357-368.
26. Palmer J, Chai X, Pidala J, et al. Predictors of survival, nonrelapse mortality, and failure-free survival in patients treated for chronic graft-versus-host disease. Blood. 2016;127(1):160-166.
27. Im A, Pusic I, Onstad L, et al. Patient-reported treatment response in chronic graft-versus-host disease. Haematologica. 2023;109(1):143-150.
28. Wolff D, Cutler C, Lee SJ, et al. Safety and efficacy of axatilimab at 3 different doses in patients with chronic graft-versus-host disease (AGAVE-201). Blood. 2023;142(1):1.
29. Ghosh S, Baker L, Chen F, et al. Interrater reproducibility of the Myoton and durometer devices to quantify sclerotic chronic graft-versus-host disease. Arch Dermatol Res. 2023;315(9):2545-2554.
30. Muller B, Ruby L, Jordan S, Rominger MB, Mazza E, Distler O. Validation of the suction device Nimble for the assessment of skin fibrosis in systemic sclerosis. Arthritis Res Ther. 2020;22(1):128.
31. Abignano G, Aydin SZ, Castillo-Gallego C, et al. Virtual skin biopsy by optical coherence tomography: the first quantitative imaging biomarker for scleroderma. Ann Rheum Dis. 2013;72(11):1845-1851.
32. Su P, Cao T, Tang MB, Tey HL. In vivo high-definition optical coherence tomography: a bedside diagnostic aid for morphea. JAMA Dermatol. 2015;151(2):234-235.
33. Chen GL, Jeon M, Ross M, et al. Optical coherence tomography for quantifying human cutaneous chronic graft-versus-host disease. Transplant Cell Ther. 2021;27(3):271.e1-271.e8.

Author notes

For potential data sharing, please contact the Chronic Graft-versus-Host Disease Consortium, Stephanie J. Lee (sjlee@fredhutch.org).

The full-text version of this article contains a data supplement.
