Abstract
Abstract 481
Most successful DNA-based genome-wide association studies identify genomic regions rather than genes themselves, and the findings often lack context or mechanism. To identify the genetic basis of disease and disease traits, it is imperative to characterize the quantity and forms of the genes that are expressed in the tissue of interest. It is not feasible to use primary megakaryocytes to profile mRNA from large numbers of subjects, but platelet RNA is easy to obtain. We and others have previously surveyed genome-wide platelet RNA expression using microarrays, an approach that has had a major impact on systems biology. However, microarrays have a number of limitations, including probes that target only known transcripts, a limited dynamic range for quantifying very low and very high transcript levels, high background from cross-hybridization, and complicated normalization schemes for comparing expression levels across experiments. Novel high-throughput sequencing approaches that overcome these limitations have recently become available. RNA sequencing (RNAseq) has a remarkable ability to quantify mRNAs and provide information about transcript sequence variations, including single nucleotide changes and alternatively spliced exons. The goal of these studies was to apply RNAseq to capture platelet transcriptome complexity. Total RNA was prepared from leukocyte-depleted platelets (LDP; less than 1 WBC per 5 million platelets) from 4 donors; 2 were studied twice each. Analysis of this material (Agilent 2100) showed that, compared to nucleated cells (HeLa, Meg-01), platelets had 50%-90% less ribosomal RNA and high levels of messenger and small RNAs. The major reduction in platelet rRNA was confirmed by RNA gel analysis. The platelet whole transcriptomes were analyzed using the Applied Biosystems (AB) SOLiD 3Plus next-generation sequencing protocols and platform. A typical sequencing run generated ∼250 million reads of 50 bp each.
We observed more than 30,000 independent platelet mRNA-coding transcripts from about 10,000 genes, demonstrating substantial numbers of variant isoforms. The increased sensitivity of RNAseq for low-copy-number transcripts is clear from these results, because prior platelet transcriptome studies using microarrays identified only 1500–6000 expressed genes. As an example, the platelet-specific transcript ITGA2B showed very high copy number in platelets, no expression in HeLa cells, and modest expression in the megakaryocyte cell line Meg-01. As expected for RNAseq data, the density of mapped reads varies by exon and local sequence. We also provide examples of newly discovered SNPs that encode non-conservative amino acid changes (AKT2 1209A/T; PIK3CB 837C/G) or alter consensus exon/intron splice junction sites (P2RY12 nt 65 G/A). We also identified a major difference in the ratio of two splice variants of the FcRg chain: 4:1 in one human platelet donor and 49:1 in another. In summary, we have demonstrated that RNAseq can accurately and sensitively determine the quantity and quality of variations in individual platelet transcriptomes. It appears that the platelet transcriptome is approximately 10 times more complex than previously thought. The major relative reduction in platelet rRNA may be an advantage for characterizing functional platelet transcripts. RNAseq should permit better understanding of the molecular mechanisms regulating platelet physiology and identify novel genetic variants that contribute to disorders of thrombosis and hemostasis.
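The headline figures quoted above can be checked with simple arithmetic. The following is a minimal sketch in Python that uses only numbers stated in the abstract; it is not the authors' analysis pipeline. The constant names, the helper `minor_isoform_fraction`, and the "donor A" / "donor B" labels are hypothetical and introduced purely for illustration. The transcript count is treated as a lower bound ("more than 30,000") and the gene count as approximate.

```python
# Back-of-the-envelope figures taken from the abstract text.
# All names below are illustrative assumptions, not the authors' code.

READS_PER_RUN = 250_000_000   # "~250 million reads" per typical run
READ_LENGTH_BP = 50           # "50 bp each"
TRANSCRIPTS = 30_000          # "more than 30,000" transcripts (lower bound)
GENES = 10_000                # "about 10,000 genes" (approximate)


def minor_isoform_fraction(major: int, minor: int) -> float:
    """Fraction of all transcripts contributed by the minor splice variant,
    given a major:minor ratio."""
    return minor / (major + minor)


# Raw sequence output of one run, in bases.
bases_per_run = READS_PER_RUN * READ_LENGTH_BP

# Mean number of observed transcripts per expressed gene.
isoforms_per_gene = TRANSCRIPTS / GENES

print(f"Raw bases per run: {bases_per_run / 1e9:.1f} Gb")
print(f"Mean transcripts per gene: {isoforms_per_gene:.1f}")
print(f"Minor FcRg variant, donor A (4:1): {minor_isoform_fraction(4, 1):.0%}")
print(f"Minor FcRg variant, donor B (49:1): {minor_isoform_fraction(49, 1):.0%}")
```

Under these assumptions a typical run yields about 12.5 Gb of raw sequence, each expressed gene averages at least 3 observed transcripts, and the minor FcRg splice variant accounts for roughly 20% of FcRg transcripts in one donor but only about 2% in the other.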
No relevant conflicts of interest to declare.