Introduction Cutaneous complications are common among patients undergoing allogeneic hematopoietic stem cell transplantation (allo-HSCT) for hematologic malignancies. They arise from a complex interplay of immune dysregulation, drug reactions, engraftment syndrome, graft-versus-host disease (GVHD), infections, and, less commonly, malignant infiltration. Whether benign or life-threatening, these dermatologic manifestations substantially affect patient morbidity and quality of life. Accurate diagnosis is critical but is often hindered by limited access to dermatologists, particularly in underserved and low-resource settings. Artificial intelligence (AI) tools have achieved near-dermatologist accuracy in diagnosing primary skin cancers (Esteva et al., 2017), but their application to hematology-specific dermatologic disorders, especially in diverse populations, remains underexplored. To give transplant patients faster access to dermatologic assessment, we developed a novel AI tool, DermPathGPT01, designed to recognize cutaneous findings in patients after allo-HSCT.

Methods We conducted a retrospective analysis of 95 patients with hematologic malignancies who underwent allo-HSCT at Montefiore Einstein from January 2022 to December 2024. Of these, 60 (63.2%) developed skin reactions attributed to disease or treatment. After excluding 9 patients without clinical photographs, 51 patients were included. Five patients experienced 2 independent skin reactions at different time points post-transplant; these were considered distinct clinical events and analyzed as separate cases, yielding a sample of n = 56 cases. A novel AI model, DermPathGPT01 (built on the GPT-4 architecture), was developed to analyze de-identified clinical photographs together with brief patient context (age, sex, transplant timing, and medical history) and to generate a unique, ranked top-5 differential diagnosis list for each case. DermPathGPT01 outputs were benchmarked against a gold standard: dermatopathology results when a biopsy was available, or the board-certified dermatologist's diagnosis when it was not. When no single diagnosis was favored, any diagnosis within the top 3 of the reference differential was accepted as the gold standard. Dermatopathology reports that listed no differential diagnosis were interpreted in the context of the dermatologist's impression. The primary endpoint was top-5 diagnostic accuracy. Secondary endpoints were top-1 accuracy, inter-rater agreement (Cohen's kappa), and comparative performance versus dermatologists (McNemar's test).
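The scoring rule described above (a case counts as a hit when any of the first k ranked diagnoses matches an accepted gold-standard label, with several labels accepted when no single diagnosis was favored) can be sketched in Python as follows. This is a minimal illustration, not the study's analysis code: the function names, the case-insensitive string matching, and the toy diagnoses are hypothetical assumptions.

```python
from typing import Sequence, Set


def _norm(label: str) -> str:
    # Hypothetical normalization: case-insensitive, whitespace-trimmed matching.
    return label.strip().lower()


def top_k_hit(ranked: Sequence[str], gold: Set[str], k: int) -> bool:
    """True if any of the first k ranked diagnoses is an accepted gold-standard label."""
    accepted = {_norm(g) for g in gold}
    return any(_norm(dx) in accepted for dx in ranked[:k])


def top_k_accuracy(predictions: Sequence[Sequence[str]],
                   golds: Sequence[Set[str]], k: int) -> float:
    """Fraction of cases whose top-k list contains an accepted diagnosis."""
    if len(predictions) != len(golds) or not predictions:
        raise ValueError("need equal-length, non-empty predictions and golds")
    hits = sum(top_k_hit(p, g, k) for p, g in zip(predictions, golds))
    return hits / len(predictions)


# Toy, invented cases (not study data). Case 2 has two acceptable diagnoses,
# mimicking a reference differential in which no single diagnosis was favored.
predictions = [
    ["acute GVHD", "drug eruption", "engraftment syndrome", "viral exanthem", "psoriasis"],
    ["drug eruption", "engraftment syndrome", "acute GVHD", "folliculitis", "eczema"],
    ["eczema", "tinea corporis", "psoriasis", "contact dermatitis", "scabies"],
]
golds = [
    {"acute GVHD"},
    {"engraftment syndrome", "acute GVHD"},
    {"chronic GVHD"},
]

print(top_k_accuracy(predictions, golds, k=1))  # 1 of 3 cases
print(top_k_accuracy(predictions, golds, k=5))  # 2 of 3 cases
```

The same function yields both the primary (k = 5) and secondary (k = 1) accuracy endpoints, so the two figures are computed under an identical matching rule.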

Results DermPathGPT01 identified the correct diagnosis within its top-5 predictions in 60.7% of cases (34/56). Top-1 diagnostic accuracy was 41.1% (23/56). Cohen's kappa indicated no substantial agreement between AI and dermatologist diagnoses. McNemar's test showed that dermatologists significantly outperformed this first version of DermPathGPT01 in top-1 accuracy (p < 0.05).
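Both secondary statistics can be computed from per-case labels. The sketch below is a hypothetical illustration, not the authors' analysis: it assumes Cohen's kappa is computed on the AI's top-1 label versus the dermatologist's label, and it uses the exact (binomial) form of McNemar's test on paired correct/incorrect outcomes, because the abstract does not state which variant was used. All inputs are invented and do not reproduce study data.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters' categorical labels on the same cases."""
    if len(a) != len(b) or not a:
        raise ValueError("need equal-length, non-empty label sequences")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    count_a, count_b = Counter(a), Counter(b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)  # chance agreement
    if p_e == 1.0:
        return 1.0  # convention here: both raters used one identical category
    return (p_o - p_e) / (1 - p_e)


def mcnemar_exact(ai_correct: Sequence[bool], derm_correct: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value from paired correct/incorrect outcomes."""
    if len(ai_correct) != len(derm_correct):
        raise ValueError("paired sequences must have equal length")
    pairs = list(zip(ai_correct, derm_correct))
    b = sum(1 for ai, derm in pairs if ai and not derm)  # only AI correct
    c = sum(1 for ai, derm in pairs if derm and not ai)  # only dermatologist correct
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    # Under H0, b ~ Binomial(n, 0.5); double the smaller tail, capped at 1.
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Toy, invented inputs (not study data).
ai_top1 = ["aGVHD", "aGVHD", "drug", "drug"]
derm_dx = ["aGVHD", "drug", "drug", "aGVHD"]
print(cohens_kappa(ai_top1, derm_dx))  # 0.0: agreement no better than chance

ai_correct = [True] * 4 + [False] * 6 + [False] * 2
derm_correct = [True] * 4 + [True] * 6 + [False] * 2
print(mcnemar_exact(ai_correct, derm_correct))  # 0.03125: 6 discordant pairs, all favoring the dermatologist
```

Only discordant pairs (one rater correct, the other wrong) enter McNemar's test, which is what makes it suited to paired comparisons on the same cases. Library implementations such as `sklearn.metrics.cohen_kappa_score` and `statsmodels.stats.contingency_tables.mcnemar` can be used instead of these hand-written versions.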

Conclusions DermPathGPT01 demonstrated a strong ability to generate comprehensive differential diagnoses, but this first version remains inferior to dermatologists in selecting the single most accurate diagnosis. These findings highlight the model's potential as a triage or decision-support tool, particularly in settings with limited dermatology access, while underscoring the continued need for human expertise at this time. Further refinement and validation of AI models across diverse skin types and hematologic settings are warranted to ensure equity in clinical practice. Stem cell and cellular therapies are increasingly moving from academic centers into the community, where access to dermatologists is markedly limited. Future iterations of DermPathGPT01 are therefore expected to provide a rapid triage tool and, importantly, to identify skin manifestations of life-threatening disorders that warrant rapid referral to tertiary care centers.
