Abstract
Hematopoietic cell types achieve distinct functions despite sharing the same genome, in part through the expression of alternative RNA isoforms. During hematopoiesis, RNA isoform switching occurs at specific stages of differentiation resulting in functional changes at a protein level (PMID:27151974). Isoform switching is likely widespread, with ~95% of multi-exon genes having alternative isoforms (PMID:18978789). However, we lack a comprehensive understanding of which genes undergo isoform switching versus those that maintain co-expression of multiple isoforms across hematopoiesis. Understanding isoform regulation is essential, as mutations in splicing factors are common in hematologic malignancies such as MDS and AML.
Here, we aimed to generate a comprehensive map of isoform expression across the spectrum of healthy hematopoiesis using long-read single-cell RNA-seq with PacBio's Kinnex protocol. Bone marrow aspirates were collected from the iliac crest of 4 healthy donors (ages 37–49) and cryopreserved; viable cells were then purified into progenitor and whole marrow mononuclear cell fractions. cDNA was generated using 10x GEMx targeting 10,000 cells per sample. Full-length libraries were constructed with Kinnex and sequenced on one PacBio Revio SPRQ cell per sample, achieving 95%+ Q30 base quality, median isoform lengths of ~1 kb, and approximately 1e8 transcript reads per sample.
Because long-read scRNA-seq is a new technology, the computational tools for pre-processing and analysis remain a work in progress. Our goal was to build a computational pipeline that addresses key limitations of existing workflows. Specifically, we wanted the pipeline to collapse novel isoforms in a unified way across samples and to identify a manageable number of isoforms per gene, ideally around 1–5 on average. We created a bespoke pipeline that combines and customizes the available tools Isoseq and Isoquant to robustly pre-process long-read single-cell RNA-seq data. Our approach collapses isoforms consistently across all samples and yields fewer spurious novel isoforms.
After pre-processing and standard QC filtering, this atlas contains over 65K cells and 50K robustly expressed isoforms, with a median of 5.9K molecules per cell. The cells span the stages of hematopoietic differentiation, split into roughly 30K progenitor and 30K mature cells. Reassuringly, our long-read isoform dataset recapitulated a cell-state structure similar to that of other bone marrow datasets generated using short-read scRNA-seq. Moreover, our long-read data integrated seamlessly with public short-read scRNA-seq data, indicating good data quality in our atlas.
In some cases, the long read isoforms provided an improved resolution of known cell states compared to gene expression alone. Isoform-level embeddings clearly separated naïve and memory CD4 T cells, evidenced by the well-characterized CD45RA-to-CD45RO isoform switch. We also observed isoform-level events undetectable by short-read methods. For example, gene-level analysis of BCL2L1 showed high expression in pro-B and pre-B cells, consistent with the known pro-survival function of the long isoform Bcl-xL during V(D)J recombination (PMID:9697834). Isoform-level resolution, however, revealed a pronounced shift from Bcl-xL to the pro-apoptotic short isoform Bcl-xS at the pro-B to pre-B transition, a previously unreported event.
Our ultimate goal is to use this rich dataset to understand isoform dynamics on a broad level to answer questions such as: what types of genes tend to co-express versus mutually exclude isoforms? Using a preliminary statistical model, we have found that some genes express multiple isoforms at relatively equal proportions. Other genes such as CHMP7, which is involved in nuclear envelope reformation, express isoforms in a cell-type specific manner. Using this model we can explore the spectrum of isoform co-occurrence to see whether functionally related gene networks utilize different styles of isoform regulation throughout hematopoiesis.
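The distinction between co-expressed and cell-type-specific isoforms can be illustrated with a minimal sketch. This is not the preliminary statistical model mentioned above, whose details are not given here; it is a simple stand-in that assumes isoform molecule counts have already been summed per cell type, and it uses hypothetical isoform names and toy counts. It reports, for one gene, the dominant isoform in each cell type and a normalized Shannon entropy of isoform usage (near 1 for even co-expression, near 0 when one isoform dominates).

```python
import math


def isoform_fractions(counts):
    """Convert isoform molecule counts into usage fractions."""
    total = sum(counts.values())
    if total == 0:
        return {iso: 0.0 for iso in counts}
    return {iso: n / total for iso, n in counts.items()}


def normalized_entropy(fractions):
    """Shannon entropy of isoform usage, scaled to the range [0, 1]."""
    k = len(fractions)
    if k < 2:
        return 0.0
    h = -sum(p * math.log(p) for p in fractions.values() if p > 0)
    return h / math.log(k)


def summarize_gene(counts_by_cell_type):
    """Report the dominant isoform and usage evenness for each cell type."""
    summary = {}
    for cell_type, counts in counts_by_cell_type.items():
        fractions = isoform_fractions(counts)
        summary[cell_type] = {
            "dominant": max(fractions, key=fractions.get),
            "evenness": normalized_entropy(fractions),
        }
    return summary


# Hypothetical toy counts for one gene (not data from this atlas).
toy = {
    "cell_type_A": {"iso_long": 90, "iso_short": 10},
    "cell_type_B": {"iso_long": 20, "iso_short": 80},
}
result = summarize_gene(toy)

# A change in the dominant isoform between cell types suggests a switch.
switching = len({entry["dominant"] for entry in result.values()}) > 1
```

In this toy case the dominant isoform differs between the two cell types, so the gene would be flagged as a candidate switch; a gene with high evenness in every cell type would instead look like stable co-expression. A real analysis would also need to account for sampling noise at low counts, which this sketch ignores.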
Together, this work demonstrates the feasibility and utility of long-read scRNA-seq for understanding isoform expression in human bone marrow. We developed a robust computational pipeline for pre-processing, validated the integration of long-read data with existing short-read datasets, and uncovered novel isoform-level insights into hematopoietic cell states. This atlas provides a foundational resource for studying how isoform diversity contributes to hematopoiesis.