Figure 2.
Validation of TALLSorts and example TALLSorts output figures. (A-B) Predicted probabilities for each sample (dot) from each subtype classifier (x-axis) in the holdout test set (A) and the St Jude/Ma-Spore independent test set (B). A red dot indicates a sample whose true label is the subtype being tested; a blue dot indicates a sample with a different label. The black line marks the 50% probability threshold; all samples above this line are classified as positive for the relevant subtype. These probability plots are generated as output by the TALLSorts package, although without the true positive/negative coloring for data sets that lack known truth values. (C-D) Waterfall plots for the holdout test set (C) and independent test set (D). Each vertical bar is a sample, colored by its most likely subtype, with height proportional to the probability assigned by the classifier. Samples that do not exceed the 50% probability threshold for any subtype are labeled as “none/other.” These waterfall plots are also generated as output by the TALLSorts package. (E-F) Confusion matrices for the holdout test set (E) and independent test set (F). The y-axes correspond to the ground-truth subtype, and the x-axes to the subtype predicted by TALLSorts. A sample predicted positive for multiple subtypes is considered correctly labeled if its true label is among the predictions.
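The calling rules described in this caption can be illustrated with a minimal Python sketch. It is not the TALLSorts implementation or API: the function names, the subtype names, and the probability values are hypothetical, and it assumes only what the caption states, namely that each subtype has its own probability, that a sample is positive for every subtype above the 50% threshold, and that a multi-subtype call counts as correct when the true label is among the calls.

```python
THRESHOLD = 0.5  # the 50% probability threshold described in the caption


def call_subtypes(probs, threshold=THRESHOLD):
    """Return every subtype whose probability exceeds the threshold (panels A-B)."""
    return sorted(name for name, p in probs.items() if p > threshold)


def waterfall_label(probs, threshold=THRESHOLD):
    """Return the most likely subtype, or "none/other" if nothing exceeds the threshold (panels C-D)."""
    top = max(probs, key=probs.get)
    return top if probs[top] > threshold else "none/other"


def is_correct(true_label, probs, threshold=THRESHOLD):
    """Count a sample as correct if its true label is among the positive calls (panels E-F)."""
    return true_label in call_subtypes(probs, threshold)


# Hypothetical per-subtype probabilities for one sample. Because each subtype
# has its own classifier, the values need not sum to 1.
sample = {"subtype_A": 0.81, "subtype_B": 0.62, "subtype_C": 0.07}

print(call_subtypes(sample))            # subtypes called positive
print(waterfall_label(sample))          # label used to color the waterfall bar
print(is_correct("subtype_B", sample))  # scored under the multi-call rule
```

In this sketch a sample positive for two subtypes is colored by the more probable one in the waterfall plot but is still scored as correct if either call matches the true label.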