Figure 1.
Overall workflow of data analysis. Baseline variables and summary statistics of longitudinal measurements collected from 30 days before allo-HCT to 30 days afterwards in the St Jude data set were included in the analysis. A total of 10 replicates of multiple imputation were performed for the missing baseline variables. Each imputed data set was randomly split into 70% training data and 30% validation data for model construction and validation. Establishing the ML model involved 2 steps: dimension reduction and model construction. Dimension reduction was performed with univariate logistic regressions, and the top 50 variables with the smallest P values were carried forward into model construction. The ML algorithm used for classification was a naïve Bayes classifier, which classifies subjects as predicted deceased or alive on the basis of conditional probabilities. The constructed ML model was assessed on the validation data set with different evaluation metrics (eg, Kaplan-Meier plots).