Random forest analysis of gene expression identifies genes predictive of important clinical features in WM patients. Gene expression data from the first 37 WM samples were analyzed with a random forest for their utility in predicting serum IgM (A), BM disease involvement (B), and hemoglobin (C) levels. Of the statistically significant genes, the top 9 genes from each group deemed the most biologically relevant are shown as single-variable correlates and were incorporated into a final linear model using the full data set to demonstrate their predictive utility. The final 20 samples, withheld for validation, are shown in red, and the root mean squared error (RMSE) of the final model is shown for both the training and validation subsets. P values were adjusted for multiple hypothesis testing using the Benjamini-Hochberg FDR procedure.
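
For readers unfamiliar with the two summary statistics named above, the following is a minimal Python sketch of how Benjamini-Hochberg adjusted P values and an RMSE can be computed. It is illustrative only: the software used for the original analysis is not stated here, and the function names `bh_adjust` and `rmse`, as well as the example numbers, are hypothetical and not taken from the study.

```python
import math


def bh_adjust(pvals):
    """Return Benjamini-Hochberg adjusted P values in the input order.

    Hypothetical helper for illustration; not the authors' code.
    """
    m = len(pvals)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def rmse(observed, predicted):
    """Root mean squared error between observed and predicted values."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up numbers, for demonstration only.
example_pvals = [0.01, 0.04, 0.03, 0.005]
example_adjusted = bh_adjust(example_pvals)
example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Computing `rmse` separately on the training samples and on the withheld samples gives the two values reported per panel; a validation RMSE far above the training RMSE would suggest overfitting.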