Key Points
Exact quantitation of RBC dysmorphologies in peripheral blood smears can be accurately performed using a computer vision system.
This quantitation allowed for improved diagnostic and prognostic evaluations of multiple hematologic disease states.
Abstract
Examination of red blood cell (RBC) morphology in peripheral blood smears can help diagnose hematologic diseases, even in resource-limited settings, but this analysis remains subjective and semiquantitative with low throughput. Prior attempts to develop automated tools have been hampered by their poor reproducibility and limited clinical validation. Here, we present a novel, open-source machine-learning approach (denoted as RBC-diff) to quantify abnormal RBCs in peripheral smear images and generate an RBC morphology differential. RBC-diff cell counts showed high accuracy for single-cell classification (mean AUC, 0.93) and quantitation across smears (mean R², 0.76 compared with experts; interexpert R², 0.75). RBC-diff counts were concordant with the clinical morphology grading for 300 000+ images and recovered the expected pathophysiologic signals in diverse clinical cohorts. Criteria using RBC-diff counts distinguished thrombotic thrombocytopenic purpura and hemolytic uremic syndrome from other thrombotic microangiopathies, providing greater specificity than clinical morphology grading (72% vs 41%; P < .001) while maintaining high sensitivity (94% to 100%). Elevated RBC-diff schistocyte counts were associated with increased 6-month all-cause mortality in a cohort of 58 950 inpatients (9.5% mortality for schist. >1% vs 4.7% for schist. <0.5%; P < .001) after controlling for comorbidities, demographics, clinical morphology grading, and blood count indices. RBC-diff also enabled the estimation of single-cell volume-morphology distributions, providing insight into the influence of morphology on routine blood count measures. Our codebase and expert-annotated images are included here to spur further advancement. These results illustrate that computer vision can enable rapid and accurate quantitation of RBC morphology, which may provide value in both clinical and research contexts.
Introduction
Quantitation and differential profiling of blood cells are the cornerstone of modern clinical diagnosis.1,2 For example, the white blood cell (WBC) differential, a quantitative profile of WBC subtypes, can flag infections or malignancies.3,4 Unlike WBCs, there are no functionally distinct normal RBC subtypes. However, morphologic RBC subtypes are associated with pathology and can be pathognomonic5 – eg, sickle cells in sickle cell disease, spiculated cells in liver disease, or teardrop cells in bone marrow disorders. Although clinical laboratory technologies to analyze WBCs have advanced,6-8 RBC profiling technologies have not and are still primarily limited to evaluating changes in RBC size and hemoglobin content, with only a limited analysis of morphology.7,9 An objective and quantitative differential of RBC subtypes could provide valuable clinical insights as a scalable, standardized, and automated summary of morphology. However, RBC shape cannot be accurately detected by standard automated hematology analyzers that rely on optical scatter or electrical impedance,10 making alternative approaches necessary.
To be the most clinically useful, RBC morphology classification must be fast and accurate. For example, identification of schistocytes is a linchpin in the diagnosis of immune thrombotic thrombocytopenic purpura (iTTP), a life-threatening medical emergency that can be treated with immediate therapeutic plasma exchange.11 Yet, assessment of schistocytes is primarily performed through manual examination of a peripheral smear, a slow and subjective process often involving initial evaluation by a laboratory technologist and subsequent review by a hematologist or hematopathologist.7 These review processes typically generate semiquantitative flags (No flag, 1+, 2+, and 3+) that categorize smears in terms of frequency but are based on criteria that can vary substantially across hospitals.9 The result may be delayed or inaccurate diagnoses12 that do not fully use the information present in the smear. The lack of methods for rapid and objective RBC evaluation is also a key obstacle in clinical research to investigate the novel diagnostic information contained in peripheral blood smears. In particular, the lack of automated tools means that RBC morphology quantitation is not regularly recorded in electronic medical records and is subsequently unavailable for large-scale retrospective studies of hematologic diseases. More broadly, smears contain rich information on the RBC shape, size, and hemoglobin content at the single-cell level, which are rarely captured or used. This is in stark contrast to the increasing role of single-cell data in understanding human physiology in other settings.
The automated capture of peripheral blood smear images and artificial intelligence have the potential to address many of these limitations. Digital peripheral smear images are already automatically captured in many hospitals and are regularly used to conduct remote manual reviews. The current state of the art for automated RBC morphology analysis includes CellaVision analyzers, which provide some preclassification of RBCs to assist with manual grading of smears.7,13,14 These systems assist in creating semiquantitative grading but are not sufficiently calibrated for important RBC subtypes, such as schistocytes,7,14 and the corresponding hardware may not be available in resource-limited settings. Research and development of additional tools have been hampered by poor reproducibility,15 inadequately small image data sets,16 limited clinical testing,17,18 or a narrow focus on a few morphologies.19 Although some recent approaches have shown promise,17,20,21 they have not been validated at the cell population level, at which clinical assessments are made, nor have they been shown to add value in clinical diagnosis.
Here, we address these limitations by presenting a novel, open-source machine-learning pipeline (denoted as RBC-diff) for the calculation of an RBC morphology differential from peripheral blood smear images. We validated the RBC-diff performance at single-cell and cell population levels using a clinical grading from a multicenter database of 338 577 smears. We then retrospectively applied the RBC-diff in multiple clinical contexts, demonstrating its value in differential diagnosis and prognosis. Finally, we illustrated the utility of RBC-diff in a research setting by showing how this tool can derive novel single-cell data that can help improve the understanding of how morphology contributes to routine blood count indices.
Materials and methods
Peripheral smear collection
Images were collected for all peripheral blood smears at Massachusetts General Hospital (MGH) between 4 November 2015 and 15 November 2021 (n = 281 745 images and 49 056 patients), and at Brigham and Women’s Hospital (BWH) between 1 January 2021 and 20 December 2021 (n = 56 832 images and 9894 patients). Smear slides were created as part of standard clinical care (further details are given in supplemental Methods) and imaged using CellaVision (DM96 or DI60) with an image resolution of ∼0.2 μm per pixel. The CellaVision system automatically identifies and captures an appropriately dispersed area of the smear adequate for clinical evaluation,14 typically between 500 and 600 μm in width and height, containing ∼1000 to 3000 RBCs. Morphology grading flags (generated by the clinical hematology laboratory) were recorded as either present, 1+, 2+, or 3+ per the local clinical laboratory guidelines (supplemental Methods). The characteristics of the MGH and BWH cohorts are given in supplemental Table 1.
RBC-diff algorithm
The RBC-diff was designed to calculate the relative abundance of 9 types of RBC morphology (normal RBCs, elliptocytes, microcytes, macrocytes, schistocytes, sickle cells, spiculated cells, teardrop cells, and other abnormal RBCs). The algorithm takes a smear image, binarizes it, and uses black-white boundary detection to identify all potential cells. Ten geometric features were used to classify each potential RBC using a support vector machine classifier. See supplemental Methods for details on (1) feature calculation, (2) algorithm training, (3) effects of sample preparation delay, (4) intrasample variability, (5) robustness against data set shift, (6) performance with manually collected images, and (7) approximate normal reference ranges.
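The detection steps described above (binarization, boundary-based identification of candidate cells, and calculation of geometric features) can be illustrated with a minimal sketch. This is not the RBC-diff implementation: the threshold, the assumption that cells are darker than the background, the use of 4-connected pixel grouping in place of boundary tracing, and the 4 features shown are all hypothetical choices made for illustration. The 10 features actually used and the support vector machine step are described in the supplemental Methods and are not reproduced here.

```python
from collections import deque
import math


def binarize(image, threshold):
    """Mark pixels darker than `threshold` as foreground (1), others as background (0)."""
    return [[1 if px < threshold else 0 for px in row] for row in image]


def find_cells(mask):
    """Group 4-connected foreground pixels into candidate cells (lists of (row, col))."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    cells = []
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, pixels = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    pixels.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                cells.append(pixels)
    return cells


def shape_features(pixels):
    """Return a few illustrative geometric features for one candidate cell."""
    pixel_set = set(pixels)
    area = len(pixels)
    # Perimeter: count pixel edges that face background or the image border.
    perimeter = sum(
        (ny, nx) not in pixel_set
        for y, x in pixels
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1))
    )
    ys = [y for y, _ in pixels]
    xs = [x for _, x in pixels]
    height = max(ys) - min(ys) + 1
    width = max(xs) - min(xs) + 1
    return {
        "area": area,
        "perimeter": perimeter,
        "aspect_ratio": max(height, width) / min(height, width),
        "circularity": 4 * math.pi * area / perimeter ** 2,
    }


# Hypothetical 4 x 7 grayscale image containing two dark objects.
image = [
    [255, 255, 255, 255, 255, 255, 255],
    [255, 60, 60, 255, 255, 255, 255],
    [255, 60, 60, 255, 70, 70, 70],
    [255, 255, 255, 255, 255, 255, 255],
]
cells = find_cells(binarize(image, threshold=128))
features = [shape_features(cell) for cell in cells]
```

In a full pipeline, feature vectors of this kind would be passed to a trained classifier; the compact square object above has an aspect ratio of 1, whereas the elongated object has an aspect ratio of 3.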
Expert estimates
To provide a reference for RBC-diff performance, 5 experts (board-certified hematopathologists or hematologists) were asked to estimate the prevalence (%) of specified cell types in 5 sets of 10 smears (1 set each for elliptocytes, schistocytes, sickle cells, spiculated cells, and teardrop cells), with each set containing 5 smears with a 1+ flag for the given cell type and 5 with no flag. The experts were blinded to the clinical details and morphology grading flags. To simulate standard high-power microscopic fields, each smear was presented as a series of 16 smaller images, each containing ∼100 to 200 RBCs. To best reflect clinical practice, the experts were not given specific instructions on how to perform the task.
Clinical cohort studies
For further validation, we tested the discriminatory capacity of RBC-diff counts across 5 clinical cohorts with clear pathophysiologic signals: elliptocytosis vs spherocytosis, before and after liver transplantation, before and after RBC exchange in patients with sickle cell disease, before and after iron supplementation in patients with iron deficiency, and before and after splenectomy. The cohort inclusion criteria are given in the supplemental Methods.
Thrombotic microangiopathy (TMA) cohort
Patients with TMA were drawn from the Harvard TMA Research Collaborative data set.22 Two TMA cohorts were collated: a derivation and a validation cohort. The derivation cohort consisted of patients presenting at the MGH between 31 March 2017 and 30 November 2020, and the validation cohort consisted of patients presenting at the MGH and BWH between 1 January 2021 and 19 December 2021. Patient details were gathered through a detailed chart review by members of the study team. Immune thrombotic thrombocytopenic purpura (iTTP) was defined as an ADAMTS13 enzyme activity level ≤10% (normal reference range, activity >66% for assay at Blood Center of Wisconsin) or ADAMTS13 enzyme activity ≤ 25% with an inhibitor of >1.0 inhibitor units (normal reference range, <0.5 inhibitor units). Outpatient cases of Upshaw-Schulman syndrome were not defined as iTTP cases. See the supplemental Methods for additional details.
Matched cohort mortality analysis
Relationships between RBC-diff counts and all-cause mortality were estimated using a matched cohort analysis. Using each patient’s first available smear, for a given abnormal RBC type, each patient with a corresponding count < 0.5% was matched to a patient of the same sex, race, comorbidity profile, age (<5-year gap), hematocrit (<10% absolute gap), and morphology grades, with the given cell type count between 0.5% and 1% or >1%. Mortality differences were analyzed using Kaplan-Meier curves and log-rank test. See the supplemental Methods for additional details.
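The matching rule described above can be sketched as follows. The field names, the patient records, and the greedy one-to-one pairing strategy are hypothetical and serve only to illustrate the stated constraints (exact agreement on sex, race, comorbidity profile, and morphology grades; age gap <5 years; absolute hematocrit gap <10%). The procedure actually used is described in the supplemental Methods.

```python
def is_match(control, candidate, max_age_gap=5, max_hct_gap=10):
    """True if the candidate agrees on the exact-match fields and both gap limits."""
    exact_fields = ("sex", "race", "comorbidities", "morphology_grades")
    return (
        all(control[f] == candidate[f] for f in exact_fields)
        and abs(control["age"] - candidate["age"]) < max_age_gap
        and abs(control["hematocrit"] - candidate["hematocrit"]) < max_hct_gap
    )


def match_cohorts(low_count, elevated_count):
    """Greedily pair each low-count patient with one unused elevated-count patient."""
    used = set()
    pairs = []
    for control in low_count:
        for i, candidate in enumerate(elevated_count):
            if i not in used and is_match(control, candidate):
                used.add(i)
                pairs.append((control["id"], candidate["id"]))
                break
    return pairs


# Hypothetical records: L* have a count <0.5%, E* have an elevated count.
low = [
    {"id": "L1", "sex": "F", "race": "A", "comorbidities": ("ckd",),
     "morphology_grades": (), "age": 60, "hematocrit": 35.0},
    {"id": "L2", "sex": "M", "race": "B", "comorbidities": (),
     "morphology_grades": (), "age": 40, "hematocrit": 40.0},
]
elevated = [
    {"id": "E1", "sex": "F", "race": "A", "comorbidities": ("ckd",),
     "morphology_grades": (), "age": 63, "hematocrit": 30.0},
    {"id": "E2", "sex": "M", "race": "B", "comorbidities": (),
     "morphology_grades": (), "age": 50, "hematocrit": 41.0},
]
pairs = match_cohorts(low, elevated)
```

Here L2 remains unmatched because the only candidate of the same sex and race differs in age by 10 years.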
Generation of single-cell volume-morphology distributions
To estimate individual cell volumes, the mean pixel area of each detected RBC was converted to μm² (based on image resolution) and multiplied by 2.5 μm (approximate average vertical height of an RBC23). Smear-derived estimated volumes were then compared with blood count-derived mean corpuscular volume (MCV) and RBC distribution width (RDW), measured as part of standard clinical care on the Sysmex and Advia instruments (supplemental Figure 1). These estimated volumes are based on 2D information and are therefore expected to be less accurate than approaches that include 3D information.24
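The conversion described above amounts to simple arithmetic, sketched here under the stated values of ∼0.2 μm per pixel and a 2.5 μm cell height. The function name and the example pixel areas are hypothetical. Because 1 μm³ equals 1 fL, the result can be read directly in the units used for MCV.

```python
UM_PER_PIXEL = 0.2   # approximate image resolution reported for the smear images
RBC_HEIGHT_UM = 2.5  # approximate average vertical height of an RBC


def estimate_volume_fl(pixel_area, um_per_pixel=UM_PER_PIXEL, height_um=RBC_HEIGHT_UM):
    """Convert a cell's area in pixels to an approximate volume in femtoliters."""
    area_um2 = pixel_area * um_per_pixel ** 2  # pixels -> square micrometers
    return area_um2 * height_um                # 1 cubic micrometer equals 1 fL


# Hypothetical pixel areas for three detected cells.
pixel_areas = [900, 1000, 1100]
volumes = [estimate_volume_fl(a) for a in pixel_areas]
estimated_mcv = sum(volumes) / len(volumes)
```

For example, a cell covering 900 pixels corresponds to 900 × 0.04 μm² = 36 μm², and therefore to an estimated volume of approximately 90 fL.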
Statistical analysis
All statistical analyses were performed in MATLAB and R. For continuous variables, unless otherwise noted, we reported means (standard deviation) and used 2-sided t tests (for 2 variables) or analysis of variance (for 3+ variables) for population comparisons. For categorical variables, we reported percentages and used a χ² test for population comparisons. Differences between model sensitivities and specificities were calculated using χ² tests based on true positive and false negative rates (for sensitivity) and true negative and false positive rates (for specificity). The threshold for statistical significance in the hypothesis tests was set at P = .05. For event rates, confidence intervals were calculated assuming binomial distributions.
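As an illustration of the categorical tests and binomial confidence intervals described above, the following sketch computes a Pearson χ² statistic for a 2 × 2 table, the corresponding P value for 1 degree of freedom, and a normal-approximation (Wald) interval for an event rate. The analyses in this study were performed in MATLAB and R; the Python functions, the absence of a continuity correction, the choice of the Wald interval, and the example counts are assumptions made for illustration only.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def chi2_p_value_df1(chi2):
    """Upper-tail P value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))


def binomial_ci(events, n, z=1.96):
    """Normal-approximation (Wald) 95% confidence interval for an event rate."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)


# Hypothetical 2 x 2 table (rows: groups; columns: event, no event).
statistic = chi2_2x2(10, 20, 30, 40)

# P value for the statistic reported in the Results (chi-square = 4.9, df = 1).
p_value = chi2_p_value_df1(4.9)

# Hypothetical event rate of 50 of 100.
lower, upper = binomial_ci(50, 100)
```

The P value obtained for χ² = 4.9 with 1 degree of freedom is approximately .027, which agrees with the value reported for the 7-day mortality comparison in the Results.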
Ethics
The study protocol was approved by the local institutional review board (IRB) of the MGH.
Results
The RBC-diff provides rapid and accurate morphologic assessments
Across a set of 5000 manually labeled single-cell images (2/3 used for training and 1/3 for testing), RBC-diff accurately classified each major morphologic class (mean test set area under the receiver operating characteristic curve [AUC], 0.93; minimum AUC, 0.85; Figure 1A). Across cell population smear images (typically containing 1000-3000 RBCs), RBC-diff counts were concordant with expert estimates (R² = 0.61, 0.71, 0.98, 0.75, and 0.75 for elliptocytes, schistocytes, sickle cells, spiculated cells, and teardrop cells, respectively; Figure 1B; supplemental Figure 2). The mean algorithm-expert correlation (R², 0.76) was comparable with the interexpert concordance (R², 0.75), suggesting that the algorithm performance is limited by the lack of an objective gold standard definition for each class. Interexpert comparisons were concordant but often only weakly calibrated, with the average estimated cell prevalence varying up to fourfold (supplemental Figure 2), highlighting the potential value of a more objective and consistent approach to quantitation. Across 8459 cases for which 2 smears were generated from 1 blood sample, RBC-diff counts showed low intrablood sample variability (Figure 1C). Across 281 745 smears from MGH, the RBC-diff counts aligned with the morphology grades assigned by the hematology laboratory, with higher grade flags (1+, 2+, 3+) associated with increased cell counts of the given type (Figure 1D). This consistency was also observed across 56 832 smears from BWH (supplemental Figure 3), despite significant interhospital differences in smear grading protocols (see supplemental Methods).
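The smear-level concordance values above can be related to one another with a short calculation. The sketch below assumes that R² denotes the squared Pearson correlation between paired estimates (the definition is not stated in the text) and confirms that the 5 reported per-class values average to the reported mean of 0.76. The function name and the paired example values are hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy ** 2 / (sxx * syy)


# Per-class algorithm-expert R² values as reported in the text.
reported = {
    "elliptocytes": 0.61,
    "schistocytes": 0.71,
    "sickle cells": 0.98,
    "spiculated cells": 0.75,
    "teardrop cells": 0.75,
}
mean_r2 = mean(list(reported.values()))

# Hypothetical paired estimates (%) that are perfectly linearly related.
perfect = r_squared([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

Note that a high R² indicates agreement in ranking and linear trend only; two raters can be highly correlated while differing severalfold in absolute prevalence, which is the calibration issue noted for the interexpert comparisons.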
As a final validation, we tested whether RBC-diff counts would detect expected qualitative morphology perturbations across 5 clinical cohorts (Figure 2). Elliptocyte elevations were observed in patients with hereditary elliptocytosis but not in those with hereditary spherocytosis (Figure 2A). RBC-diff counts also accurately tracked expected changes after clinical intervention: spiculated cells decreased after liver transplantation25 (Figure 2B); sickle cells decreased after RBC exchange26 (Figure 2C); microcytes decreased after IV iron supplementation in iron-deficient anemia27 (Figure 2D), and schistocytes increased after splenectomy28 (Figure 2E). These changes typically occurred with stable profiles for the other morphologies (supplemental Figure 4) and, often, in settings in which grading by the clinical laboratory did not change. For example, 20 of 46 (44%) patients who underwent liver transplantation showed no change in spiculated smear grades, as assessed by the clinical laboratory, from pre to posttransplantation, whereas RBC-diff detected a decrease in spiculated cells in 17 of 20 (85%) patients (mean absolute decrease, 7.3%).
In addition to its accuracy, RBC-diff was also (1) fast (<1 second image processing time), (2) accurate with manually photographed smear images (supplemental Figure 5), and (3) insensitive to changes in image hue, as is often observed between medical centers29 (supplemental Figure 6).
RBC-diff facilitates the speed and specificity of iTTP and HUS diagnosis
To evaluate the diagnostic utility of the RBC-diff, we considered a cohort of patients with TMA22 with concern for iTTP, a medical emergency involving a severe acquired deficiency in the von Willebrand factor-cleaving protease ADAMTS13.11 The definitive diagnostic test for iTTP is an ADAMTS13 activity assay, which is typically performed in a reference laboratory, limiting availability in emergency settings. Patients with thrombocytopenia suspected of having iTTP were evaluated manually and subjectively for the presence of schistocytes in the peripheral smear. Therefore, we sought to test whether RBC-diff counts could facilitate objective and rapid iTTP diagnosis before the ADAMTS13 activity results were known. We constructed 2 independent cohorts of 106 (derivation cohort) and 90 (validation cohort) TMA cases, with etiology determined by physician review of clinical charts (Figure 3A, see Methods for further details). iTTP and hemolytic uremic syndrome (HUS) showed higher schistocyte counts than all other TMA etiologies (Figure 3B), although the relapsed iTTP cases had lower schistocyte counts than the initial episodes (Figure 3B). Considering the full differential, iTTP and HUS cases exhibited a unique fingerprint with schistocyte elevations being predominant (schistocyte levels being higher than other morphologies; Figure 3C,D). Elevated schistocytes (with or without predominance) provided high specificity and sensitivity for the diagnosis of iTTP or HUS compared with other TMAs, outperforming hematology laboratory grading (Figure 3E). From the derivation cohort, the optimal diagnostic criteria were identified as (1) schistocytes >4% or (2) schistocytes >2% and predominant (supplemental Figure 7). In the validation cohort, these joint criteria produced significantly higher specificity (72% vs 42%; P < 1e-5) and positive predictive value (41% vs 25%; P < 1e-5) than the hematology laboratory grades, while providing 100% sensitivity (Figure 3F). 
Schistocyte counts provided a diagnostic signature that was not captured via routine blood count measures (Figure 3G).
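The joint criteria identified in the derivation cohort can be expressed as a simple rule, sketched below together with the calculation of sensitivity and specificity. Predominance is taken here to mean that the schistocyte percentage exceeds that of every other abnormal morphology supplied, which is an interpretation of the description above rather than the exact definition used in the study. The function names, morphology keys, and example smears are hypothetical.

```python
def is_predominant(counts):
    """True if the schistocyte percentage exceeds every other supplied morphology."""
    others = [v for k, v in counts.items() if k != "schistocytes"]
    return all(counts["schistocytes"] > v for v in others)


def meets_criteria(counts):
    """Joint rule: schistocytes >4%, or schistocytes >2% and predominant."""
    s = counts["schistocytes"]
    return s > 4.0 or (s > 2.0 and is_predominant(counts))


def sensitivity_specificity(flags, truths):
    """Sensitivity and specificity of boolean predictions against boolean labels."""
    tp = sum(f and t for f, t in zip(flags, truths))
    fn = sum((not f) and t for f, t in zip(flags, truths))
    tn = sum((not f) and (not t) for f, t in zip(flags, truths))
    fp = sum(f and (not t) for f, t in zip(flags, truths))
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical abnormal-morphology percentages for four smears.
smears = [
    {"schistocytes": 5.1, "spiculated": 6.0, "teardrop": 0.4},  # >4%
    {"schistocytes": 2.5, "spiculated": 1.0, "teardrop": 0.3},  # >2% and predominant
    {"schistocytes": 2.5, "spiculated": 3.2, "teardrop": 0.3},  # >2% but not predominant
    {"schistocytes": 0.4, "spiculated": 0.2, "teardrop": 0.1},  # low
]
flags = [meets_criteria(s) for s in smears]

# Hypothetical reference labels (True = iTTP or HUS).
truths = [True, True, True, False]
sensitivity, specificity = sensitivity_specificity(flags, truths)
```

The third smear shows why predominance matters: its schistocyte count exceeds 2%, but the higher spiculated cell count means that the second criterion is not met.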
RBC-diff counts are associated with prognosis in multiple populations
While reviewing the clinical charts, we noted a high mortality rate in the TMA derivation cohort, particularly among patients who were ultimately not diagnosed with iTTP or HUS. This led us to investigate whether high schistocyte counts were associated with mortality. In the TMA derivation cohort (excluding iTTP and HUS cases), elevated schistocytes at the time of ADAMTS13 testing were associated with a nearly fivefold increase in 7-day mortality (3.7% to 17.7%; P = .027; χ² = 4.9, df = 1) (Figure 4A). Similar schistocyte-mortality associations were observed in the earliest available blood smears from 49 056 patients at MGH. In this cohort, elevated levels of schistocytes (>1%) were associated with increased 6-month all-cause mortality compared with low levels of schistocytes (<0.5%; 11.3% mortality vs 6.4%; P < .001), after matching cohorts for demographics, comorbidities, hematology laboratory grading, hematocrit, and other RBC-diff counts (Figure 4B; supplemental Methods). This signal was validated in an independent cohort of 9894 patients at BWH and was maintained after excluding patients with a cancer diagnosis before or within 30 days of the blood smear (supplemental Figure 8). A chart review of 100 randomly selected deceased patients with high or low levels of schistocytes found no significant differences in the primary cause of death (supplemental Figure 8; χ² test; P = .56; χ² = 4.9, df = 6), suggesting that this schistocyte signal may be a complementary predictor of mortality risk and is not specific to 1 pathologic process. This signal was also maintained after controlling for RDW, which is a well-known nonspecific risk factor for morbidity and mortality30,31 (supplemental Figure 9). A weaker mortality association was observed for elevated spiculated cells (Figure 4B). No mortality association was observed for other RBC morphologies (supplemental Figure 8).
The RBC-diff provides single-cell insights into routine blood count measures
Using the pixel dimensions of each identified cell, the RBC-diff can provide an estimate of individual RBC volumes (see Methods). Although less accurate than 3D approaches,24 these estimates are concordant with the routine complete blood count (CBC) indices MCV and RDW (supplemental Figure 1), and the RBC-diff can therefore be used to investigate how RBC morphology affects CBC indices by analyzing approximate volume-morphology distributions (Figure 5A). Using this method, across the preoperative liver transplantation cohort (Figure 2B), spiculated cells were, on average, 14% smaller than other RBCs but did not significantly decrease MCV (Figure 5B). In the iron-deficiency cohort (Figure 2D), the response to iron therapy involved an increase in the size of all RBCs and not just a reduction in microcytes (Figure 5C-D). In the derivation iTTP cohort (Figure 3), schistocytes were, on average, 30% smaller than other cells but only drove a 2 fL mean decrease in MCV (from 90.5 to 88.4 fL) (Figure 5E-F). Conversely, schistocytes drove an average absolute RDW increase of 1.9% (from 18.4% to 20.3%) (Figure 5G). These 2 results suggest that previously reported MCV decreases in iTTP32 may be driven mostly by increased microcytosis rather than by schistocytosis and that a sudden increase in RDW in inpatient settings may be an early signal of emergent schistocytosis. Single-cell analysis of iTTP cases also revealed a significant inverse correlation between average schistocyte size (as a percentage of average cell size) and schistocyte count, suggesting that higher schistocyte counts may involve harsher or repeat shearing of cells (Figure 5H).
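One way to attribute a change in blood count indices to a single morphologic class is to recompute the indices with that class excluded, as in the sketch below. This is an assumed approach for illustration, not a description of the analysis behind Figure 5. RDW is approximated here as the coefficient of variation of the estimated volumes, which differs from how hematology analyzers derive it, and the labeled volumes are hypothetical.

```python
from statistics import mean, pstdev


def mcv_and_rdw(volumes_fl):
    """Return (mean volume in fL, coefficient of variation in %) for a list of volumes."""
    mcv = mean(volumes_fl)
    return mcv, 100 * pstdev(volumes_fl) / mcv


def class_effect(cells, excluded_class):
    """Change in estimated MCV and RDW attributable to one morphology class."""
    all_volumes = [v for _, v in cells]
    kept_volumes = [v for label, v in cells if label != excluded_class]
    mcv_all, rdw_all = mcv_and_rdw(all_volumes)
    mcv_kept, rdw_kept = mcv_and_rdw(kept_volumes)
    return mcv_all - mcv_kept, rdw_all - rdw_kept


# Hypothetical (label, estimated volume in fL) pairs for one smear.
cells = [
    ("normal", 90.0),
    ("normal", 92.0),
    ("normal", 88.0),
    ("schistocyte", 60.0),
]
delta_mcv, delta_rdw = class_effect(cells, "schistocyte")
```

In this toy example the single small schistocyte lowers the estimated MCV and raises the estimated RDW; with realistic proportions of schistocytes, the effect on the mean is much smaller than the effect on the spread, consistent with the pattern described above.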
Discussion
Here, we present a novel machine-learning algorithm for the quantification of RBC morphologies in peripheral blood smear images. We validated this method at the single-cell and cell population levels, including comparison with morphology grading flags. We demonstrate how this method can aid in the differential diagnosis and evaluation of patient prognosis in multiple clinical settings. Finally, we illustrate how this method may help elucidate the effects of RBC morphology on routine CBC indices and help understand the pathophysiology of disease progression and treatment response.
Some previously developed machine-learning methods for the classification of RBC abnormalities have been limited by small or poor-quality data sets,16 choice of nonstandard classification categories,33 and limited clinical correlation.17,18 Other approaches have shown good performance in larger or well-defined data sets17,20 but have often focused on individual cell classification without validation at the smear level or in the context of clinical care. Because human assessment of an individual cell's morphology will be informed by morphologic heterogeneity across the entire smear, the clinical application of blood smear analysis involves consideration of the overall RBC population. Our approach overcomes these limitations using a robust and multipronged validation approach to demonstrate the accuracy of the method and its potential for diagnostic and prognostic applications (Figures 1-4). RBC-diff classifications were also insensitive to changes in image hue, and the method performed well at a separate medical center and on manually collected images (supplemental Figures 3, 5, and 6). Because it is possible that alternative or complementary approaches to classification, such as using neural networks or automating feature selection,17,20,21 could enhance performance, we provide single-cell and cell population images (and associated expert labels) as a public resource (supplemental Data 2).
One significant challenge in automating the detection of RBC morphology is the lack of clear definitions of the specific morphologies. Unlike WBCs, RBC types do not have distinct mechanistic functions that help inform cellular structure, and morphologic classes tend to arise subjectively. Although researchers such as Bessis et al have elucidated and described RBC morphology in detail in experimental settings,34,35 the definition of morphology in clinical settings remains subjective, as demonstrated by the modest interexpert agreement levels we found (supplemental Figure 2). The type of objective definitions of morphologic class provided by the RBC-diff would improve the reproducibility of smear analysis, interpretation, and clinical utilization.
RBC-diff demonstrates the potential benefits of more precise and objective quantitation of RBC morphology. Compared with morphology grading flags, RBC-diff counts improved the sensitivity and specificity of the differential diagnosis of iTTP (Figure 3). Schistocyte levels are known to be of importance in ADAMTS13 deficiencies,32,36 but manual differentiation between different levels (1+, 2+, etc) of schistocytes is challenging, with expert assessments often differing substantially.13 The RBC-diff provides an objective and reproducible definition of significant schistocyte elevation, including determination of predominance, a recommendation in clinical guidelines.37 Different TMA etiologies had distinct RBC-diff count fingerprints (Figure 3), suggesting that this tool could play a role in the initial evaluation of patients with TMA, complementing scoring systems such as the PLASMIC score32,36 to assess the risk of severe ADAMTS13 deficiency.
Figure 4 shows that RBC-diff counts may in some scenarios be predictive of patient outcomes or track with patient prognosis. The surprising associations with mortality in Figure 4B persisted after adjusting for multiple factors, including morphology grading flags, comorbidities, and RDW, suggesting the presence of valuable and underutilized clinical information in blood smears. We note that the population of patients with blood smears at MGH is not representative of the general patient population, and further study of this signal in healthy cohorts is required.
RBC-diff can also help provide single-cell insights into the influence of morphology on CBC indices (Figure 5) via the estimation of single-cell volumes. Estimation of blood count parameters from imaging data has previously been shown to be promising, with prototype approaches showing similar accuracy to flow-cytometry approaches.38,39 By connecting estimated CBC indices to morphology, RBC-diff can generate morphology-corrected CBC indices that may provide improved discrimination of pathologic states or response to treatment. It has been shown that hemoglobin levels can be estimated from blood smear images,39 suggesting that RBC-diff could potentially be extended to other blood count measures such as hemoglobin and mean corpuscular hemoglobin.
This study focuses primarily on the quantitation of schistocytes because they are commonly elevated in important acute care settings40-44 and existing automated systems show limited specificity of detection.14,45 Schistocyte counts can be approximated via the fragmented red cell count (FRC), which can be calculated via flow-cytometry.46,47 FRC counts are sensitively but nonspecifically associated with imaging-derived schistocyte counts47 and may have diagnostic value for TMAs.48 However, FRC counts do not provide information on other RBC morphologic classes, a key feature for differential diagnosis in our study (Figure 3) and a part of the current recommendations for TMA diagnosis.41 Although FRC is a valuable correlate of schistocyte levels, it is typically used to highlight the need for manual smear review,46 and thus may provide value in tandem with the RBC-diff. A robust comparison of the RBC-diff schistocyte counts and FRC was not possible in this study because the primary clinical hematology analyzers at the MGH and BWH do not routinely record FRC values.
The RBC-diff is not intended to replace manual smear review but rather to provide technical assistance to improve speed and objectivity. The CBC and WBC differential currently provide an objective and quantitative foundation that informs manual smear review, and the RBC-diff could bolster this foundation. Given its accuracy with manually collected images (supplemental Figure 5), this potential application may be of particular benefit in resource-limited settings in which automated imaging systems are unavailable. However, it should be noted that RBC-diff was designed to quantify only 5 major morphologic classes and does not currently assess hypochromia, pallor, polychromasia, or RBC inclusions, and thus does not yet detect target cells, spherocytes, or RBC parasites. Similarly, the algorithm does not use advanced techniques17 to account for cell adhesion or crowding and may be less accurate in settings of extreme agglutination or poor smear quality. The expansion of cell classes and adjustments for smear quality are exciting avenues for future work.
Our application of RBC-diff primarily focused on the clinical setting of TMAs, where red cell dysfunction is commonplace. However, we speculate that RBC-diff may be valuable in many other clinical settings, such as: (1) schistocyte quantitation in disseminated intravascular coagulation,40 sepsis, or pregnancy-related conditions such as HELLP49; (2) sickle cell quantitation for sickle cell disease50; and (3) spiculated cell quantitation in severe liver disease. These reflect avenues for future research with a significant potential to improve clinical outcomes. More broadly, given its speed, accuracy, and robustness, we hope that RBC-diff may provide a powerful new lens to study red blood cell morphology in disease.
Acknowledgments
The authors thank the Mass General Brigham Research Patient Data Registry and Electronic Data Warehouse groups for facilitating the use of their databases, Chris Lofgren for assistance with MGH database access and management, Olga Pozdnyakova for assistance in accessing blood smears from BWH, Rahul Deo for valuable conversations about the analysis, and the CellaVision team and Hans-Inge Bengtsson for their help in archiving blood smear data.
This work is supported by the Vickery-Colvin Pathology Research Grant (J.A.S., J.M.H., and R.S.M.), the One Brave Idea Initiative (J.M.H.), and the Evelyn and Robert Luick Endowed Fund for the Blood Transfusion Service at MGH (R.S.M.). H.A.-S. is the recipient of the American Society of Hematology Scholar Award.
Authorship
Contribution: J.A.S., B.H.F., J.M.H., and R.S.M. conceived the project and its design; B.H.F. wrote the code for the RBC-diff with input from other authors; and all authors conducted analyses and contributed to writing the manuscript.
Conflict-of-interest disclosure: R.S.M. and P.K.B. have both worked as consultants for Alexion on a project to validate use of the PLASMIC score for diagnosis of atypical HUS. H.A.-S. lists universal disclosures from research funding Agios, Amgen, Dova/Sobi, and consultancy for Agios, Dova/Sobi, Novartis, Rigel, argenx, Moderna, and Forma (all unrelated to this article). The remaining authors declare no competing financial interests.
Correspondence: John M. Higgins, 185 Cambridge St, Boston, MA 02114; e-mail: higgins.john@mgh.harvard.edu; and Robert S. Makar, Massachusetts General Hospital, GRJ2-233, 55 Fruit St, Boston, MA 02114; e-mail: rmakar@mgh.harvard.edu.
References
Author notes
∗B.H.F. and J.A.S. are joint first authors and contributed equally to this study.
The code to run RBC-diff, including example images and README, is included in supplemental Data 3.
The raw data for the figures and tables in the manuscript are included in supplemental Data 1. Because of IRB restrictions on the sharing of protected health information, raw data for certain figures were not included or have been limited to ensure anonymization.
A labeled set of 5000 single RBC images used to train the algorithm and a set of 50 cell population level smears with expert estimates of cell density are provided (supplemental Data 2).
An additional set of 5000 manually labeled single-cell images, not used in model training, is also provided (see supplemental Methods for further details).
Other details about the code are available on request from the corresponding author, Brody H. Foy (bfoy1@mgh.harvard.edu).
The full-text version of this article contains a data supplement.