Abstract
Positive interim positron emission tomography (PET) scans are thought to be associated with inferior outcomes in diffuse large B-cell lymphoma. In the E3404 diffuse large B-cell lymphoma study, PET scans at baseline and after 3 cycles of rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone were centrally reviewed by a single reader. To determine the reproducibility of interim PET interpretation, an expert panel of 3 external nuclear medicine physicians, blinded to clinical information, independently scored baseline and interim PET scans visually. The binary Eastern Cooperative Oncology Group (ECOG) study criteria were based on modifications of the Harmonization Criteria; the London criteria were also applied. Of 38 interim scans, agreement was complete in 68% by ECOG criteria and in 71% by London criteria. The proportion of interim scans scored PET+ ranged from 16% to 34% by reviewer (P = not significant). Agreement among reviewers was only moderate: κ = 0.445 using ECOG criteria and κ = 0.502 using London criteria. These data, showing only moderate reproducibility among nuclear medicine experts, indicate the need to standardize PET interpretation in research and practice. This trial was registered at www.clinicaltrials.gov as #NCT00274924.
This activity has been planned and implemented in accordance with the Essential Areas and policies of the Accreditation Council for Continuing Medical Education (ACCME) through the joint sponsorship of Medscape, LLC and the American Society of Hematology. Medscape, LLC is accredited by the ACCME to provide continuing medical education for physicians. Medscape, LLC designates this educational activity for a maximum of 0.25 AMA PRA Category 1 credits™. Physicians should only claim credit commensurate with the extent of their participation in the activity. All other clinicians completing this activity will be issued a certificate of participation. To participate in this journal CME activity: (1) review the learning objectives and author disclosures; (2) study the education content; (3) take the post-test and/or complete the evaluation at http://cme.medscape.com/cme/blood; and (4) view/print certificate. For CME questions, see page 918.
The authors, the Associate Editor Martin S. Tallman, and the CME questions author Charles P. Vega, University of California, Irvine, CA, declare no competing interests.
Upon completion of this activity, participants should be able to:
Identify study procedures in the current research
Specify the interobserver agreement in regard to interim positron emission tomographic (PET) scans in the current study
Describe the current therapeutic approach to diffuse large B-cell lymphoma
List common anatomic sites of disagreement between nuclear medicine specialists in the current study
Introduction
Remarkable predictive accuracy with midtreatment ¹⁸F-fluorodeoxyglucose (FDG) positron emission tomography (PET) scans has been reported in diffuse large B-cell lymphoma (DLBCL), based on the concept that tumor burden above or below the threshold of detection after 1 to 3 chemotherapy cycles results in treatment failure or success, respectively.1 Although guidelines for PET interpretation in clinical trials have been issued, their reproducibility has not been carefully studied.2 During the conduct of the DLBCL E3404 study, the rate of PET+ interim scans was lower than projected; we therefore convened an expert panel to review, blinded, the baseline and interim PET scans from approximately the first one-third of participants to assess reproducibility.
Methods
After a baseline PET scan, patients with bulky or advanced DLBCL received 3 cycles of rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone (R-CHOP), followed by a PET scan 14 to 20 days later. While this interim scan underwent central review by a single reader, a fourth cycle of R-CHOP was given; patients then continued R-CHOP if PET− or switched to rituximab, ifosfamide, carboplatin, and etoposide if PET+. Scans were obtained on dedicated high-resolution PET or PET/computed tomography (CT) scanners according to protocol and quality control standards at participating Eastern Cooperative Oncology Group (ECOG) sites. Baseline and interim DICOM images were submitted for central review by file transfer or on compact disc. The protocol specified a binary visual interpretation, with each scan read as “positive” or “negative.” The central reviewer based this interpretation on modifications of the International Harmonization Project criteria, customized for E3404 interim scans and designated the “ECOG criteria”: (1) only sites of abnormality at baseline are evaluated; (2) abnormal activity requires both a focal appearance and intensity greater than average liver; (3) all positive nodal sites must have an anatomic correlate; (4) activity in bone marrow and spleen is considered abnormal only if focal and clearly discernible; (5) symmetric abnormal foci in the mediastinum and hilum are considered abnormal only if the remainder of the scan is positive; and (6) new foci are considered positive only if the remainder of the scan is positive or the new lesion is focal, very intense, and associated with a lesion on CT.2
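The way the 6 ECOG rules combine into a single binary case call can be sketched in code. This is a hypothetical illustration, not the protocol's implementation: the `LesionRead` record, its field names, and the site labels are invented here, and each boolean stands in for a visual judgment the reader still has to make. "Remainder of the scan" is assumed to mean the baseline sites other than symmetric mediastinal or hilar foci.

```python
from dataclasses import dataclass
from typing import List

SYMMETRIC = "mediastinal_hilar_symmetric"


# Hypothetical per-site reading record; names are illustrative, not from E3404 forms.
@dataclass
class LesionRead:
    site_type: str            # "nodal", "extranodal", "marrow", "spleen", or SYMMETRIC
    at_baseline: bool         # site was abnormal on the baseline scan
    focal: bool               # focal (not diffuse) appearance on the interim scan
    above_liver: bool         # intensity greater than average liver
    ct_correlate: bool        # anatomic lesion visible on CT
    clearly_discernible: bool = False  # only consulted for marrow/spleen (rule 4)
    very_intense: bool = False         # only consulted for new foci (rule 6)


def baseline_site_abnormal(lesion: LesionRead) -> bool:
    """Rules 2-4 applied to one baseline site."""
    if not (lesion.focal and lesion.above_liver):                # rule 2
        return False
    if lesion.site_type == "nodal" and not lesion.ct_correlate:  # rule 3
        return False
    if lesion.site_type in ("marrow", "spleen") and not lesion.clearly_discernible:  # rule 4
        return False
    return True


def ecog_case_positive(lesions: List[LesionRead]) -> bool:
    """Binary case call; rule 1 restricts the core evaluation to baseline sites."""
    remainder_positive = any(
        baseline_site_abnormal(l)
        for l in lesions
        if l.at_baseline and l.site_type != SYMMETRIC
    )
    if remainder_positive:
        # Rules 5 and 6 only add sites to an already positive scan,
        # so they cannot change the binary call in this case.
        return True
    # Rule 6 with a negative remainder: a new focus counts only if it is
    # focal, very intense, and associated with a lesion on CT.
    # Symmetric mediastinal/hilar foci (rule 5) never count here.
    return any(
        l.focal and l.very_intense and l.ct_correlate
        for l in lesions
        if not l.at_baseline and l.site_type != SYMMETRIC
    )
```

In this sketch, rule 5 can never turn a scan positive on its own, and most of the decision still rests on the visual inputs (focal, above liver, clearly discernible), which is where independent readers could plausibly diverge.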
Without dedicated training, 3 external nuclear medicine experts independently applied both the ECOG study criteria and the London criteria to visually score every baseline lesion on the midtreatment scan for the first 38 cases (76 scans) of the E3404 study. Neither the central reviewer nor the experts had access to any clinical information. The London criteria score uptake of 0 to 3 (less than liver) as “negative” and 4 or 5 (moderately or markedly increased relative to liver) as “positive.”3 The 245 individual baseline lesions were identified by anatomic site and listed on a worksheet for the external experts, who applied the ECOG and London criteria to each lesion on the interim PET scan. Each case was scored as negative or positive. Agreement among the external experts was analyzed with the Fleiss κ test, which was also used to determine a P value for differences in the proportion of positive scans; the κ statistic corrects the observed agreement for the agreement expected by chance.
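The lesion-by-lesion London scoring described above can be sketched as follows. The worksheet layout (anatomic site mapped to a 0-5 score) and the site names in the example are hypothetical, and the rule that a case is positive if any evaluated site scores 4 or 5 is an assumption; the text states only that each case was scored as negative or positive.

```python
from typing import Dict


def london_lesion_positive(score: int) -> bool:
    """Scores 0-3 are negative; 4-5 (moderately or markedly above liver) are positive."""
    if not 0 <= score <= 5:
        raise ValueError(f"London score must be 0-5, got {score}")
    return score >= 4


def london_case_positive(worksheet: Dict[str, int]) -> bool:
    # Assumption: a case is positive if any evaluated baseline site scores 4 or 5.
    return any(london_lesion_positive(score) for score in worksheet.values())


def reader_calls(worksheets_by_reader: Dict[str, Dict[str, int]]) -> Dict[str, bool]:
    """Binary call for one case from each reader's hypothetical worksheet."""
    return {reader: london_case_positive(ws) for reader, ws in worksheets_by_reader.items()}


# Invented example: three readers disagree about a single para-aortic site.
case = {
    "reader 1": {"para-aortic": 3, "spleen": 2},
    "reader 2": {"para-aortic": 4, "spleen": 2},
    "reader 3": {"para-aortic": 3, "spleen": 3},
}
print(reader_calls(case))
```

In this invented case, a one-point difference at a single site is enough to flip one reader's call, which mirrors the single-site disagreements reported in the Results.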
Results and discussion
The proportions of positive interim scans by reader were 16%, 34%, and 26% (P = .206) for ECOG criteria and 16%, 34%, and 29% (P = .263) for London criteria; only reader 3 scored differently between criteria (Figure 1). With 3 experts scoring 38 interim scans (representing 1-25 baseline lesions per case), complete agreement was 68% for ECOG and 71% for London criteria. The κ statistic was 0.445 using ECOG criteria, indicating only moderate agreement per case (κ values of 0.4-0.6 are conventionally considered moderate), and 0.502 using London criteria, also in the moderate range.
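The agreement percentages and κ values above can be checked with a short sketch of the Fleiss κ for 3 readers making a binary call. The per-case counts are reconstructed here from the reported aggregates and are not trial data. For the ECOG criteria, 6, 13, and 10 positive reads (16%, 34%, and 26% of 38 cases), 26 unanimous cases, 5 cases with 2 positive reads, and 7 cases with 1 positive read imply 4 unanimous-positive and 22 unanimous-negative cases. For the London criteria only the 27 unanimous cases (71%) and 30 positive reads (6 + 13 + 11) are recoverable, so the split of the 11 discordant cases is a hypothetical choice; with 3 readers and 2 categories every non-unanimous case contributes the same observed agreement (1/3), so that split does not change κ.

```python
from typing import List, Sequence


def fleiss_kappa(counts: Sequence[Sequence[int]]) -> float:
    """Fleiss kappa; counts[i][j] = number of raters placing case i in category j."""
    n_cases = len(counts)
    n_raters = sum(counts[0])
    if any(sum(row) != n_raters for row in counts):
        raise ValueError("every case must be rated by the same number of raters")
    # Observed agreement per case: fraction of rater pairs that agree.
    p_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1)) for row in counts]
    p_bar = sum(p_i) / n_cases
    # Chance agreement from the pooled category proportions.
    total = n_cases * n_raters
    p_j = [sum(row[j] for row in counts) / total for j in range(len(counts[0]))]
    p_e = sum(p * p for p in p_j)
    return (p_bar - p_e) / (1 - p_e)


def complete_agreement(counts: Sequence[Sequence[int]]) -> float:
    """Share of cases on which all raters gave the same call."""
    return sum(1 for row in counts if max(row) == sum(row)) / len(counts)


def binary_table(all_neg: int, all_pos: int, two_pos: int, one_pos: int) -> List[List[int]]:
    """Rows of [negative reads, positive reads] for 3 readers."""
    return ([[3, 0] for _ in range(all_neg)] + [[0, 3] for _ in range(all_pos)]
            + [[1, 2] for _ in range(two_pos)] + [[2, 1] for _ in range(one_pos)])


# ECOG criteria: counts reconstructed from the reported figures.
ecog = binary_table(all_neg=22, all_pos=4, two_pos=5, one_pos=7)
# London criteria: the 7/4 split of the 11 discordant cases is hypothetical.
london = binary_table(all_neg=23, all_pos=4, two_pos=7, one_pos=4)

print(f"ECOG:   agreement {complete_agreement(ecog):.0%}, kappa {fleiss_kappa(ecog):.3f}")  # 68%, 0.445
print(f"London: agreement {complete_agreement(london):.0%}, kappa {fleiss_kappa(london):.3f}")  # 71%, 0.502
```

That the reconstructed ECOG table reproduces κ = 0.445 exactly suggests the reported figures are internally consistent.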
Table 1 details the discordance among experts in 12 cases. Reader 2 was the most likely to interpret interim PET scans as positive, reader 1 the least likely, and reader 3 in between. In 5 cases, 2 readers considered the interim scan positive; in 7 cases, a single reader did. Using ECOG criteria, there were 26 cases with complete agreement among experts, and all of these cases also agreed with the central review. Each expert considered a single case of residual bone disease positive using London criteria but negative by ECOG criteria.
Among the 12 discordant cases, the number of baseline nodal sites ranged from 0 to 16 (median, 5), and 5 cases had extranodal sites at baseline. Disagreement was confined to a single site in each case: para-aortic nodes (n = 5), bone (n = 3), spleen (n = 2), and 1 each of iliac/inguinal and supraclavicular nodes. A definitive CT correlate was present in 1 case, absent in 8 cases, and debatable in 2 cases; CT was not available in 1 case. After independent review, the experts reviewed the 12 cases together to determine whether consensus could be achieved. Consensus was reached in 3 cases: 2 were judged negative and 1 positive (Table 1).
That agreement on midtreatment PET among expert nuclear medicine physicians using standardized criteria was only moderate on a per-case basis has important implications, because interim PET is being used to judge treatment efficacy in practice as well as in clinical trials. Some investigators have recently raised concern about the false-positive rate of interim PET with modern DLBCL treatment, which includes rituximab, with its long half-life and unique mechanisms of cytotoxicity; dose-dense chemotherapy, with scans obtained within 2 weeks of treatment; and granulocyte colony-stimulating factor.4-6 Indeed, the positive predictive value of interim PET scans appears to be lower in the current therapeutic era (approximately 60%) than in the era of chemotherapy alone, when a positive interim scan carried an approximately 80% likelihood of treatment failure.7 The predictive value of interim PET+ scans has been positively correlated with the international prognostic index and with the international working classification response criteria.8,9 Interim PET reports dictated as equivocal or indeterminate, which pose challenges for clinicians, appear to predict treatment success rather than failure.6,8 The literature is inconsistent with regard to the predictive value of PET scans at the conclusion of R-CHOP, suggesting real differences in interpretation.5,6 Lin et al have proposed that changes in standardized uptake value (SUV) may improve the predictive accuracy of interim FDG-PET.10 In sum, the broader use of interim PET scans in the modern therapeutic era has not reproduced the dichotomous results previously reported, although progression-free survival is generally consistently inferior for interim PET+ patients.11,12
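Two quantities in this paragraph can be made explicit. The positive predictive value (PPV) is the fraction of PET+ interim scans (true positives, TP, plus false positives, FP) that are followed by treatment failure, so a fall from approximately 80% to approximately 60% means the share of positive scans not followed by failure roughly doubles, from about 1 in 5 to about 2 in 5. A change in SUV is commonly expressed as the relative reduction in maximum SUV between the baseline and interim scans; that form is assumed here for illustration only, and the exact definition and any cutoff used by Lin et al are not given in this report.

$$
\mathrm{PPV} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}}, \qquad 1 - \mathrm{PPV} = \frac{\mathrm{FP}}{\mathrm{TP} + \mathrm{FP}}: \quad 1 - 0.8 = 0.2 \;\longrightarrow\; 1 - 0.6 = 0.4
$$

$$
\Delta\mathrm{SUV}_{\max} = \frac{\mathrm{SUV}_{\max}^{\text{baseline}} - \mathrm{SUV}_{\max}^{\text{interim}}}{\mathrm{SUV}_{\max}^{\text{baseline}}} \times 100\%
$$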
Using our study criteria, the proportion of PET+ scans was relatively low, and the current report relates solely to the reproducibility of interpretation using standardized criteria. Agreement among external experts would probably have been higher if the study had been preceded by a training exercise using the 2 study criteria, neither of which is well validated (no validated criteria exist for interim PET scans). Notably, agreement was essentially the same with the ECOG criteria and the London criteria; the latter are being applied in a phase 3 Hodgkin lymphoma trial.3 Our results indicate that, among multiple sites involved at diagnosis, single sites, particularly para-aortic nodes, spleen, and bone, were the source of disagreement on interim PET, and CT correlates of residual positive sites were frequently absent or debatable. We conclude that greater harmonization of PET interpretation is indicated for research and practice, and this will require training of nuclear medicine physicians using consistent, validated interpretive criteria and standardized reporting.
An Inside Blood analysis of this article appears at the front of this issue.
Presented in part at the 50th annual meeting of the American Society of Hematology, San Francisco, CA, December 8, 2008.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
The authors thank John Allen and Patrick Pringle (supported by Stanford University) for technical and data management support.
This work was supported by the Eastern Cooperative Oncology Group Research and Foundation. The E3404 clinical trial study was conducted by the Eastern Cooperative Oncology Group (Dr Robert L. Comis, Chair) and supported in part by the National Cancer Institute, National Institutes of Health and the Department of Health and Human Services (Public Health Service grants CA21115, CA23318, CA66636, CA13650, and CA16116). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the National Cancer Institute.
Authorship
Contribution: S.J.H. designed research; collected, analyzed, and interpreted data; and wrote the manuscript; M.E.J., H.S., and G.W. performed research and participated in analysis and interpretation of data and manuscript preparation; A.M. conducted statistical analysis and participated in interpretation of data and manuscript preparation; L.J.S. was the principal investigator of the clinical trial, facilitated central review of PET scans, and participated in manuscript review; R.A. was the coprincipal investigator of the clinical trial and participated in manuscript review; R.G. reviewed diagnostic pathology for the clinical trial and participated in manuscript review; and A.Q. designed and performed research, provided central PET review for the clinical trial, and participated in analysis and interpretation of data and manuscript preparation.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Sandra J. Horning, 875 Blake Wilbur Dr, Suite 2338, Stanford, CA 94304; e-mail: sandra.horning@stanford.edu.