Abstract
Background: Cutaneous T-cell lymphomas (CTCLs) and cutaneous lymphoproliferative disorders (CLPDs) are diagnostically challenging clonal lymphoproliferations that clinically and pathologically masquerade as inflammatory dermatoses such as eczema, psoriasis, and drug reactions. Accurate diagnosis requires expert dermatopathologists to integrate clinical, histopathologic, immunophenotypic, and molecular features. However, clinical and ancillary pathologic data are often unavailable, and exposure to CTCL/CLPD during training is limited, contributing to diagnostic uncertainty and inconsistent patient management, particularly in early-stage disease.
Methods: We developed a weakly supervised, end-to-end AI system for the classification of CTCL/CLPD using routinely stained hematoxylin and eosin (H&E) slides. The system combines a pretrained pathology foundation model with gated attention-based multiple instance learning to analyze whole-slide images and identify regions most predictive of a CTCL/CLPD diagnosis.
To generate large-scale training data, we applied a large language model (LLM; GPT-4o) to 2,803 pathology reports from Memorial Sloan Kettering Cancer Center, which were preselected based on the presence of relevant keywords. The LLM parsed free-text diagnoses, extracted slide-level associations, and assigned case-level labels as positive or negative for CTCL/CLPD. This process identified 1,011 positive and 1,482 negative cases, corresponding to 2,493 whole-slide images used for model training (47% female; mean patient age: 61 ± 15 years).
The UNI pathology foundation encoder extracted 1,024-dimensional embeddings from 256x256-pixel patches at 20x magnification. These embeddings were aggregated via gated attention multiple instance learning (MIL) for binary slide-level classification. For evaluation, a balanced, held-out test set of 50 slides (25 positive, 25 negative) was randomly selected from the LLM-labeled dataset. A dermatopathologist independently reviewed these cases to confirm label fidelity and provide a pathologist-verified benchmark. Following this review, one case labeled as CTCL by the LLM was excluded from the test set because the diagnosis in its report was inconclusive.
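The aggregation step can be illustrated with a minimal sketch of gated attention pooling in its commonly used form: each patch embedding receives a score from a tanh branch multiplied elementwise by a sigmoid gate, the scores are softmax-normalized across patches, and the slide representation is the attention-weighted sum of the embeddings. This is not the authors' implementation. The weight names (`V`, `U`, `w`, `c`), the toy dimensions, and the random values are all assumptions for illustration; the abstract's embeddings are 1,024-dimensional and its weights would be learned.

```python
import math
import random


def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_attention_pool(H, V, U, w):
    """Pool patch embeddings H (one row per patch) into one slide vector.

    score_k = w . (tanh(V h_k) * sigmoid(U h_k)); attention = softmax(score).
    """
    scores = []
    for h in H:
        t = [math.tanh(x) for x in matvec(V, h)]   # content branch
        g = [sigmoid(x) for x in matvec(U, h)]     # gating branch
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))
    top = max(scores)                              # stabilise the softmax
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(H[0])
    z = [sum(a * h[d] for a, h in zip(attn, H)) for d in range(dim)]
    return z, attn


def slide_probability(H, V, U, w, c, b):
    """Linear classifier plus sigmoid on the pooled slide vector."""
    z, attn = gated_attention_pool(H, V, U, w)
    logit = sum(ci * zi for ci, zi in zip(c, z)) + b
    return sigmoid(logit), attn


# Toy sizes (hypothetical): 5 patches, 8-dim embeddings, 4 hidden units.
rng = random.Random(0)
K, D, L = 5, 8, 4


def rand_matrix(rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


H = rand_matrix(K, D)            # stand-in for per-patch encoder embeddings
V, U = rand_matrix(L, D), rand_matrix(L, D)
w = [rng.uniform(-0.5, 0.5) for _ in range(L)]
c = [rng.uniform(-0.5, 0.5) for _ in range(D)]

prob, attn = slide_probability(H, V, U, w, c, b=0.0)
print(f"slide probability: {prob:.3f}")
print("attention per patch:", [round(a, 3) for a in attn])
```

The per-patch attention weights returned here are the quantities that, mapped back to patch coordinates, would produce attention maps over the slide.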
Results: On the held-out test set, our model distinguished cutaneous lymphoproliferative disorders from reactive mimics using H&E morphology alone. It achieved an area under the ROC curve (AUROC) of 0.96, accuracy of 84%, sensitivity of 66.7%, specificity of 100%, and precision (positive predictive value) of 100%. This profile suggests the model is highly reliable for confirming disease when it calls a slide positive, although its moderate sensitivity means that some positive cases are missed. The attention maps localized strongly to perivascular and lichenoid infiltrates, intraepidermal/epidermotropic regions, and adnexal structures, highlighting clinically relevant features learned without explicit supervision.
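As a worked check on how the threshold-based metrics relate, the sketch below computes them from a confusion matrix. The counts are not reported in the abstract. They are one set that is consistent with the reported figures, assuming the test set held 24 positive and 25 negative slides after the single exclusion. AUROC cannot be recovered from these counts because it depends on the ranking of scores rather than on one threshold.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard metrics from confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),          # recall on positives
        "specificity": tn / (tn + fp),          # recall on negatives
        "precision": tp / (tp + fp),            # positive predictive value
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts inferred from the reported metrics, not taken from the source:
# 24 positives (16 detected, 8 missed) and 25 negatives (all correctly rejected).
metrics = binary_metrics(tp=16, fn=8, tn=25, fp=0)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under these assumed counts, every error is a missed positive, which is why specificity and precision are both perfect while accuracy sits below them.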
Conclusion: This study demonstrates that large language models for automated cohort curation can be combined with weakly supervised computer vision to train high-performing models for challenging histopathologic diagnoses. Using this approach, we developed a model that identified cutaneous lymphoproliferative disorders from H&E morphology alone with high specificity and an AUROC of 0.96. Our AI-based approach to CTCL shows promise for reducing diagnostic variability, improving triage, and guiding ancillary testing. The model's interpretable attention maps support integration into dermatopathology workflows, offering decision support in an area marked by high clinical ambiguity.