Abstract
In 2005, an NIH consensus conference was held to better define methods for research in chronic GVHD (cGVHD). Provisional definitions of response categories were proposed for individual organs and for overall cGVHD disease activity: complete response (CR), partial response (PR), stable disease (SD) and progressive disease (PD). These response criteria were designed to improve consistency in documenting disease activity across centers and, in clinical trials, to allow less biased response assessment by comparing enrollment and follow-up measures rather than relying on clinician perceptions of change. In this study, we compared the proposed response criteria with clinician-reported changes in organ-specific and overall responses. Good agreement would suggest that the proposed response criteria mirror clinician judgments of whether patients are responding to treatment.

Methods: Patients ≥ 2 years of age diagnosed with cGVHD requiring systemic treatment ≤ 3 years after transplantation were eligible and were assessed every 3–6 months. At each visit, clinicians reported the following: organ-specific measures (used to calculate the NIH organ response for skin, mouth, eye and overall), perception of change in organ and overall involvement (completely gone = CR; very much or moderately improved = PR; a little better, stable, or a little worse = SD; moderately or very much worse = PD), and overall aggregate response (CR, PR, SD, PD). Kappa statistics were used to assess agreement between these measures, with 0.21–0.4 considered fair agreement.

Results: As of September 2010, 290 patients who had at least one follow-up visit 3 or 6 months beyond enrollment were included, with a median age of 51 years (range, 2–79). Based on NIH overall response criteria, 24 (8%) had CR, 83 (29%) had PR, 25 (9%) had SD, and 158 (54%) had PD, for an overall CR+PR rate of 37%. In contrast, clinicians reported that 31 (11%) had CR, 171 (59%) had PR, 30 (10%) had SD and 56 (19%) had PD, for an overall CR+PR rate of 70%. For organ-specific comparisons, agreement between the NIH proposed response measures and clinician-reported changes in skin, mouth and eye was fair. For overall response, agreement between the calculated NIH response and both clinician-reported overall change and clinician-reported response status was also fair. (Table)

Conclusions: For both organ-specific and overall comparisons, the proposed NIH response criteria do not agree well with responses determined by clinicians. These data suggest that conclusions from prior literature reporting high overall CR+PR rates based on clinician judgment would not be supported if the current NIH response criteria had been used to measure response. Additional studies are needed to validate candidate response criteria through correlation with a robust, objective and informative gold standard.
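The mapping from clinician-perceived change to response category, and the overall CR+PR rates quoted in the Results, can be sketched as follows. This is a minimal illustration rather than the study's analysis code: the dictionary keys paraphrase the wording in the Methods, the name `overall_response_rate` is hypothetical, and the counts are those reported in the Results.

```python
# Hypothetical labels paraphrasing the clinician perception scale in Methods.
PERCEPTION_TO_RESPONSE = {
    "completely gone": "CR",
    "very much improved": "PR",
    "moderately improved": "PR",
    "a little better": "SD",
    "stable": "SD",
    "a little worse": "SD",
    "moderately worse": "PD",
    "very much worse": "PD",
}


def overall_response_rate(counts):
    """Return CR+PR as a fraction of all categorized patients."""
    total = sum(counts.values())
    return (counts["CR"] + counts["PR"]) / total


# Counts as reported in the Results section.
nih_counts = {"CR": 24, "PR": 83, "SD": 25, "PD": 158}
clinician_counts = {"CR": 31, "PR": 171, "SD": 30, "PD": 56}

print(f"NIH CR+PR: {overall_response_rate(nih_counts):.0%}")
print(f"Clinician CR+PR: {overall_response_rate(clinician_counts):.0%}")
```

Using the sum of the reported counts as the denominator is an assumption of this sketch; it reproduces the rounded 37% and 70% figures given in the text.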
Organ | Response measure | N | NI | CR | PR | SD | PD | Kappa with NIH response |
---|---|---|---|---|---|---|---|---|
Skin | Calculated NIH skin response | 286 | 35% | 22% | 7% | 15% | 21% | |
Skin | Clinician-reported skin change | 286 | 29% | 17% | 17% | 32% | 5% | 0.39*/0.43** |
Mouth | Calculated NIH mouth response | 287 | 20% | 15% | 7% | 45% | 13% | |
Mouth | Clinician-reported mouth change | 287 | 20% | 15% | 29% | 33% | 4% | 0.28*/0.35** |
Eye | Calculated NIH eye response | 168 | 40% | 10% | 4% | 26% | 19% | |
Eye | Clinician-reported eye change | 168 | 44% | 2% | 10% | 39% | 5% | 0.29*/0.26** |
Overall | Calculated NIH overall response | 288 | — | 8% | 29% | 9% | 54% | |
Overall | Clinician-reported overall change | 285 | — | 7% | 41% | 45% | 8% | 0.24** |
Overall | Clinician-reported response status | 288 | — | 11% | 59% | 10% | 19% | 0.20** |
NI, not involved; CR, complete response/completely resolved; PR, partial response/moderately better, very much better; SD, stable disease/a little better/stable/a little worse; PD, progressive disease/moderately worse, very much worse
*, simple kappa, including all patients
**, weighted kappa, limited to patients with involvement by both measures at enrollment
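For readers unfamiliar with the two statistics in the last column, the following is a minimal sketch of simple and weighted Cohen's kappa for two sets of ordered response ratings. It is not the authors' code: the abstract does not state which weighting scheme was used, so linear weights are assumed here, the function name `cohen_kappa` is hypothetical, and the example ratings are invented toy values, not study data.

```python
CATEGORIES = ["CR", "PR", "SD", "PD"]  # ordered from best to worst response


def cohen_kappa(ratings_a, ratings_b, categories=CATEGORIES, weighted=False):
    """Cohen's kappa for two raters; linear disagreement weights if weighted.

    Assumes the ratings are not all in a single category (otherwise the
    expected disagreement is zero and kappa is undefined).
    """
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}
    n = len(ratings_a)

    # Observed joint proportions of (rater A category, rater B category).
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(ratings_a, ratings_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions for each rater.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)  # assumed linear weights
        return 0.0 if i == j else 1.0    # simple kappa: all-or-nothing

    obs = sum(disagreement(i, j) * observed[i][j]
              for i in range(k) for j in range(k))
    exp = sum(disagreement(i, j) * row[i] * col[j]
              for i in range(k) for j in range(k))
    return 1.0 - obs / exp


# Invented toy ratings for illustration only (not study data).
nih = ["CR", "PR", "SD", "PD"]
clinician = ["PR", "PR", "SD", "PD"]
simple = cohen_kappa(nih, clinician)
linear = cohen_kappa(nih, clinician, weighted=True)
print(f"simple kappa = {simple:.2f}, weighted kappa = {linear:.2f}")
```

Weighted kappa gives partial credit when the two ratings fall in adjacent categories (for example CR versus PR), which is why it can differ from simple kappa computed on the same ratings.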
No relevant conflicts of interest to declare.