Abstract
Introduction: COVID-19 is likely to continue affecting populations across the world, and with novel virulent strains regularly emerging, it is critical to determine which patients are at risk for its severe manifestations. Earlier scoring tools for prognostication had limitations, including high risk of bias and lack of generalizability. (1) The development of newer COVID-19 scoring tools should take into account novel biomarkers of disease such as immature platelet fraction (IPF%). IPF% has been shown to be a predictor of clinical outcomes in COVID-19. (2, 3) The aims of this study are, first, to identify prognostic markers for disease severity in COVID-19 and, second, to incorporate these markers into a COVID-19 scoring tool to help clinicians identify patients at risk for disease progression, morbidity, and mortality.
Study Population: This study was a retrospective cohort analysis of 1,795 patients above the age of 18 hospitalized due to COVID-19 infection at the University of Texas Southwestern University Hospital.
Methods: We analyzed the population to predict two events: 30-day mortality (30DM) and mechanical ventilation (MV) during the hospital stay. 30DM included all patients who died within 30 days of hospital admission. Each subject was characterized by 120 variables collected within 6 hours of hospital admission. Variables missing in more than 60% of patients across the dataset were removed. Missing data were imputed with the K-Nearest Neighbors method (N = 10, 50) from the scikit-learn software library. For each prediction model, the study population was divided into two subgroups (positive and negative for the event). The positive groups comprised 156 and 235 subjects for 30DM and MV, respectively. Random forest classifiers were trained with various parameters to avoid overfitting and underfitting. Variable selection was performed with the random forest algorithm's feature importance functions. Out of 120 variables, we identified the 25 most significant to feed to the classifier (Table 1).
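The preprocessing and variable-selection steps described above can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's actual code: the function name `select_top_features`, its parameter names, the random seed, and the use of impurity-based `feature_importances_` are all assumptions, and the tree count and neighbor count are simply taken from the values quoted in the abstract.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import KNNImputer


def select_top_features(X, y, max_missing=0.60, n_neighbors=10,
                        n_trees=192, top_k=25, seed=0):
    """Hypothetical helper: drop sparse columns, impute, rank by importance."""
    X = np.asarray(X, dtype=float)
    # Keep only variables missing in at most `max_missing` of patients.
    kept = np.flatnonzero(np.isnan(X).mean(axis=0) <= max_missing)
    # Fill remaining gaps from the `n_neighbors` most similar patients.
    X_imputed = KNNImputer(n_neighbors=n_neighbors).fit_transform(X[:, kept])
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X_imputed, y)
    # Rank variables by impurity-based importance, highest first.
    order = np.argsort(forest.feature_importances_)[::-1][:top_k]
    # Return original column indices and the reduced, imputed matrix.
    return kept[order], X_imputed[:, order]


# Synthetic stand-in for the cohort: 300 "patients", 120 variables.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(300, 120))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
X_demo[rng.random(X_demo.shape) < 0.10] = np.nan  # scattered missing values
X_demo[30:, 5] = np.nan                           # column 5 is ~90% missing

selected, X_selected = select_top_features(X_demo, y_demo)
print(selected)
```

Here `selected` holds the original column indices of the retained variables, so they can be mapped back to variable names in a real dataset.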
Results: Each prediction model was validated with 10-fold cross-validation. The random forest classifier for the 30DM model had 192 sub-trees and obtained 0.72 sensitivity, 0.78 specificity, and 0.82 AUC. The model used to predict MV had 64 sub-trees and returned 0.75 sensitivity, 0.75 specificity, and 0.81 AUC.
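One way to obtain sensitivity, specificity, and AUC from 10-fold cross-validation is sketched below on synthetic data. The abstract does not state how fold results were aggregated or which decision threshold was used, so pooling out-of-fold predictions, the 0.5 threshold, the stratified shuffled folds, and the helper name `cross_validated_metrics` are assumptions made for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_predict


def cross_validated_metrics(X, y, n_trees=192, n_splits=10,
                            threshold=0.5, seed=0):
    """Hypothetical helper: pooled out-of-fold metrics for a random forest."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    # Each subject is scored by a model that never saw it during training.
    proba = cross_val_predict(forest, X, y, cv=folds,
                              method="predict_proba")[:, 1]
    predicted = (proba >= threshold).astype(int)
    tn, fp, fn, tp = confusion_matrix(y, predicted, labels=[0, 1]).ravel()
    return {
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "auc": roc_auc_score(y, proba),
    }


# Synthetic, imbalanced stand-in for a 25-variable dataset.
X_cv, y_cv = make_classification(n_samples=600, n_features=25,
                                 n_informative=8, weights=[0.8, 0.2],
                                 random_state=0)
metrics = cross_validated_metrics(X_cv, y_cv)
print(metrics)
```

With an imbalanced outcome, the chosen threshold trades sensitivity against specificity, while the AUC does not depend on it.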
Conclusion: Out of 120 variables collected, we identified the 25 most significant variables (Table 1). These variables were then used to generate a scoring model. Based on the results, our scoring model achieved sensitivity and specificity between 0.72 and 0.78 and an AUC above 0.80 for both 30DM and MV. To develop more accurate predictive models, we intend to expand our dataset with more subjects and to validate this prognostic model at other hospitals.
References: 1. Petersen E, Ntoumi F, Hui DS, et al. Emergence of new SARS-CoV-2 Variant of Concern Omicron (B.1.1.529) - highlights Africa's research capabilities, but exposes major knowledge gaps, inequities of vaccine distribution, inadequacies in global COVID-19 response and control efforts. Int J Infect Dis. 2022;114:268-272. doi:10.1016/j.ijid.2021.11.040
2. Ouyang SM, Zhu HQ, Xie YN, et al. Temporal changes in laboratory markers of survivors and non-survivors of adult inpatients with COVID-19. BMC Infect Dis. 2020;20(1):952. doi:10.1186/s12879-020-05678-0
3. Welder D, Jeon-Slaughter H, Ashraf B, et al. Immature platelets as a biomarker for disease severity and mortality in COVID-19 patients. Br J Haematol. 2021;194(3):530-536. doi:10.1111/bjh.17656
Disclosures
No relevant conflicts of interest to declare.
Author notes
Asterisk with author names denotes non-ASH members.