In this issue of Blood, Barrington et al present the analyses of centrally reviewed positron emission tomography-computed tomography (PET-CT) used for staging and response monitoring after 2 cycles of doxorubicin, bleomycin, vinblastine, dacarbazine (ABVD) to guide treatment modification in a large prospective clinical trial in Hodgkin lymphoma (HL) (Response-Adapted Therapy in Advanced Hodgkin Lymphoma [RATHL]).1
HL is known as the most curable subtype of malignant lymphoma and as a disease that often affects young adults. Personalized PET-guided treatment is currently the focus of many large international trials, mainly to diminish early and late toxicity of treatment while maintaining the relatively good outcome. Accurate staging is of utmost importance for selecting the appropriate treatment regimen, which nowadays consists of 2 to 4 cycles of ABVD and radiotherapy for early-stage HL and, for advanced-stage HL, mainly chemotherapy (6-8 cycles of ABVD or 6 cycles of escalated bleomycin, etoposide, doxorubicin, cyclophosphamide, vincristine, procarbazine, and prednisone [BEACOPP]). With the introduction of PET using 2-deoxy-2-[18F]fluoro-d-glucose (FDG) in the mid-1990s, and of PET-CT (with low-dose unenhanced CT) in the last decade, it became clear that metabolic imaging often detects additional lesions, especially in extranodal sites such as bone marrow and spleen.2 Although this extension will not change the Ann Arbor stage for patients already diagnosed with advanced-stage disease on the basis of CT, for an accurate evaluation after treatment the baseline and posttreatment scans should ideally be compared using the same modality. The revised Cheson criteria, published in 2007, advised FDG-PET as optional for staging and as mandatory for evaluation of treatment.3 However, the recently published Lugano criteria describe PET-CT as the modality of choice for staging and for evaluation during and after treatment.4 For the visual assessment of PET-CT, a 5-point scale (also called the Deauville score [DS]), which grades FDG uptake relative to physiological uptake in the mediastinum and liver, was introduced as a new scoring system.5 No FDG uptake and minor uptake less intense than that of the mediastinum are graded as DS1 and DS2, respectively. Lesions with FDG uptake between that of the mediastinum and that of the liver are assessed as DS3. Uptake more intense than physiological uptake in the liver is scored as DS4 (see figure), and DS5 is scored for very intense uptake or the appearance of new Hodgkin-related lesions.
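The grading rules just described can be summarized as a small decision function. This is only an illustrative sketch, not the method used by the RATHL core laboratories: the DS is a visual score, whereas the function below assumes numeric uptake values for the lesion, mediastinum, and liver. The function name, the parameter names, the handling of uptake exactly equal to a reference region, and in particular the `very_intense_factor` cutoff for DS5 are all assumptions introduced for illustration; the text above does not define "very intense" numerically.

```python
from typing import Optional


def deauville_score(
    lesion: Optional[float],
    mediastinum: float,
    liver: float,
    new_lesion: bool = False,
    very_intense_factor: float = 3.0,  # ASSUMPTION: illustrative cutoff for "very intense"
) -> int:
    """Map uptake values to a Deauville score (1-5), following the prose rules.

    `lesion` is None when no residual FDG uptake is seen (hypothetical convention).
    """
    if new_lesion:
        return 5  # appearance of new Hodgkin-related lesions
    if lesion is None:
        return 1  # no FDG uptake
    if lesion > very_intense_factor * liver:
        return 5  # very intense uptake (threshold is an assumption)
    if lesion > liver:
        return 4  # more intense than physiological liver uptake
    if lesion > mediastinum:
        return 3  # between mediastinum and liver
    return 2      # minor uptake, not exceeding the mediastinum


if __name__ == "__main__":
    # Toy values only; not patient data.
    print(deauville_score(lesion=2.0, mediastinum=1.5, liver=2.5))
```

Using the reference regions as explicit arguments mirrors the idea behind the scale: each lesion is judged against the patient's own physiological uptake rather than against a fixed absolute value.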
In this issue, Barrington et al report on the results of staging and early response monitoring by PET-CT in a large, prospectively designed, interim PET-guided clinical trial in advanced-stage HL.1 Patients with advanced-stage HL received 2 cycles of ABVD, followed by an interim PET-CT. Patients with a negative PET were randomized between further treatment with ABVD and treatment with doxorubicin, vinblastine, and dacarbazine (AVD). Patients with a positive PET all switched to a more intensive BEACOPP regimen. When PET-CT is used to distinguish good from poor responders, and thereby to de-escalate or intensify chemotherapy, it is critical that the assessment be reliable and reproducible. Within this large clinical trial, including >1100 patients, all baseline and interim PET-CT scans were centrally reviewed by 5 core laboratories using the DS. Importantly, these interim PET-CT scans were reviewed within 72 hours to guide further treatment, either by continuing with ABVD or switching to escalated BEACOPP. It must be noted that designing the infrastructure to transfer, assess, and report these baseline and interim PET scans in a large international trial is itself a real challenge, for which the authors are to be applauded.
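The response-adapted design described above amounts to a simple branching rule after the interim scan. The sketch below only restates that rule in code; it is not the trial's randomization procedure. The function name, the arm labels, and the simple 1:1 random choice between ABVD and AVD are assumptions, since the text does not describe the allocation ratio or any stratification.

```python
import random


def allocate_after_interim_pet(pet_positive: bool, rng: random.Random) -> str:
    """Return the treatment arm after 2 cycles of ABVD and an interim PET-CT."""
    if pet_positive:
        # All interim PET-positive patients switch to the more intensive regimen.
        return "BEACOPP"
    # Interim PET-negative patients are randomized between continuing ABVD
    # and omitting bleomycin (AVD). ASSUMPTION: unstratified 1:1 allocation.
    return rng.choice(["ABVD", "AVD"])


if __name__ == "__main__":
    rng = random.Random(2016)  # fixed seed so the toy example is repeatable
    for positive in (True, False, False):
        print(positive, allocate_after_interim_pet(positive, rng))
```

Because the whole branch hinges on a single positive-or-negative call, any misclassification of the interim scan translates directly into a different treatment, which is why the reliability of the reading matters so much.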
The RATHL staging (based on clinical assessment, contrast-enhanced CT, and bone marrow biopsy) was discordant with the staging based on PET-CT in 20% of patients, mainly due to detection of extranodal HL localization or nodal HL below the diaphragm. As it was impossible to verify these localizations by histology, their disappearance during therapy was used as circumstantial evidence. Because the interim PET after 2 cycles of ABVD was pivotal in guiding treatment, an accurate and unambiguous assessment of response is essential. The core laboratories used the 5-point Deauville system to adjudicate these PET scans, dichotomizing the scores as negative (good responders; DS1-3) or positive (poor responders; DS4-5). However, especially in HL, interpretation of interim and end-of-treatment PET scans can be difficult: FDG uptake in mediastinal masses might be difficult to distinguish from thymic uptake, and sarcoid-like reactions might mimic uptake in hilar pulmonary lymph nodes.6 Obviously, expert readers in core laboratories, who review a large number of interim PET scans as in this clinical trial, are better placed to distinguish FDG positivity due to HL activity from positivity due to inflammatory reactions. The reported interobserver agreement between the experts in the core laboratories, analyzed in a random subset of scans, was very good (κ 0.84). The agreement between the local readers and the core laboratory could only be assessed in 33% of patients and showed a κ of 0.77, indicating good agreement. Unfortunately, for reasons not reported, the local readers did not score the interim PET scans in 67% of patients, which introduces the possibility of bias in this comparison and of an overestimation of the κ.
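To make the κ values above more concrete, the following sketch computes an agreement statistic for two readers after dichotomizing their Deauville scores as negative (DS1-3) or positive (DS4-5). It assumes Cohen's unweighted κ on the dichotomized result, which the text does not specify, and the function names and the ten paired readings are invented toy values, not data from the trial.

```python
from collections import Counter
from typing import Hashable, Sequence


def is_pet_positive(ds: int) -> bool:
    """Dichotomize a Deauville score: DS1-3 negative, DS4-5 positive."""
    if ds not in (1, 2, 3, 4, 5):
        raise ValueError(f"invalid Deauville score: {ds}")
    return ds >= 4


def cohens_kappa(reader_a: Sequence[Hashable], reader_b: Sequence[Hashable]) -> float:
    """Cohen's unweighted kappa for two readers rating the same scans."""
    if len(reader_a) != len(reader_b) or len(reader_a) == 0:
        raise ValueError("readers must rate the same non-empty set of scans")
    n = len(reader_a)
    # Observed proportion of scans on which the two readers agree.
    observed = sum(a == b for a, b in zip(reader_a, reader_b)) / n
    # Agreement expected by chance, from each reader's marginal frequencies.
    count_a, count_b = Counter(reader_a), Counter(reader_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical Deauville scores for 10 scans (toy data only).
    local_ds = [2, 3, 4, 5, 1, 3, 4, 2, 3, 5]
    core_ds = [2, 3, 4, 4, 1, 4, 4, 2, 3, 5]
    local = [is_pet_positive(d) for d in local_ds]
    core = [is_pet_positive(d) for d in core_ds]
    print(round(cohens_kappa(local, core), 2))  # prints 0.8
```

In this toy example the readers disagree on a single scan (DS3 versus DS4), the one boundary that changes the treatment decision. The statistic also only describes the scans that both parties scored, so if the unscored scans differ systematically from the scored ones, κ computed on the scored subset may not reflect agreement overall.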
At baseline, for proper measurement of lymph nodes and for radiotherapy planning, a contrast-enhanced CT (ceCT) added to PET-CT is essential.7 Whether the additional information from ceCT outweighs the radiation exposure of about 16 mSv and the extra costs is still a matter of debate.
The article by Barrington et al highlights the importance of staging with PET-CT in a large prospective study. The authors clearly demonstrate that accurate staging with PET-CT is the modern standard. For interim PET assessment, the DS is a reliable scoring system with good concordance between local and central review. PET-CT thus appears to be a robust and reliable cornerstone for staging patients with HL and, with visual assessment using the Deauville 5-point score, for evaluation during treatment. Whether quantitative PET metrics (semiquantitative values of FDG uptake and/or metabolic tumor volume measurement) add value to visual interpretation, allowing a more precise distinction between good and poor responders, is currently being addressed in several clinical trials.8
The question of whether patients with advanced-stage HL and interim PET positivity will benefit from a switch from ABVD to escalated BEACOPP will be answered in this important RATHL trial. Barrington et al have demonstrated that they have met one of the most essential prerequisites for such trials: appropriate and reliable assessment of imaging. The results of such studies should be evaluated before PET-guided treatment modifications for patients with advanced-stage HL are implemented in daily clinical practice.
Conflict-of-interest disclosure: The author declares no competing financial interests.