Abstract
Blood is a complex fluid that samples all tissues in the human body. Despite complete sequence determination of the human genome, defining genes and gene products remains a challenge. Here, we apply tandem mass spectrometry as a new source of unbiased data to interrogate the genomic sequence and identify novel protein-coding sequences. A six-frame translation of the human genome was used as the query database to search for novel blood proteins in the data from the HUPO Plasma Proteome Project (PPP). Significance is assessed using a Poisson statistical model incorporating the length of the matching sequence and the frequency of spectrum matches observed in searching the database [Nat Biotech 2006 24(3):333–8]. Matches are binned by X!Tandem hyperscore, and statistics for each score class are considered independently. The overall probability that the matches to an open reading frame (ORF) occurred at random is calculated as the product of the probabilities that the matches in each score category occurred at random. The expected number of random matches, E, is calculated as the probability that an ORF match occurred at random multiplied by the number of ORFs searched. The confidence in an ORF identification is 1/(1+E). An ORF is considered significant if its confidence is greater than 95%. Expanding recently published work [
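The significance calculation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the exact Poisson parameterization, so the sketch assumes that the expected number of random matches in a score bin is a per-residue match rate multiplied by the ORF length, and that the per-bin probability is the Poisson upper tail P(X >= k). The names `poisson_tail`, `orf_confidence`, `is_significant`, and the `(matches, rate_per_residue)` bin layout are hypothetical.

```python
import math


def poisson_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), summed directly for small tails."""
    if k <= 0:
        return 1.0
    if lam <= 0:
        return 0.0
    # Start at the probability mass of exactly k events, computed in log space.
    term = math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))
    total = 0.0
    i = k
    # Add successive terms until they no longer change the running sum.
    while term > 0.0 and (total == 0.0 or term > total * 1e-16):
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def orf_confidence(bins, orf_length, n_orfs):
    """Confidence 1/(1+E) for one ORF.

    bins: list of (matches, rate_per_residue) pairs, one per hyperscore bin.
          ASSUMPTION: rate_per_residue * orf_length is the expected number of
          random matches in that bin; the abstract does not state this form.
    """
    p_random = 1.0
    for matches, rate_per_residue in bins:
        # Score bins are treated independently, so probabilities multiply.
        p_random *= poisson_tail(matches, rate_per_residue * orf_length)
    expected_random = p_random * n_orfs  # E in the text
    return 1.0 / (1.0 + expected_random)


def is_significant(confidence, threshold=0.95):
    """Apply the 95% confidence cutoff stated in the text."""
    return confidence > threshold
```

As a usage example, `orf_confidence([(3, 1e-6)], orf_length=200, n_orfs=10_000_000)` describes an ORF of 200 residues with three matches in a single score bin, searched against ten million ORFs; all of these input values are invented for illustration and do not come from the study.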
[This work was supported in part by grants R01LM008106, U54DA021519, P41RR018627, and MTTC6887.]
Disclosure: No relevant conflicts of interest to declare.