Abstract
Protein C is an important endogenous anticoagulant in hemostasis. Deficiencies of protein C due to genetic mutations or a low level of circulating protein C increase the risk of venous thromboembolism. We performed a genome-wide association scan for plasma protein C antigen concentration with approximately 2.5 million single-nucleotide polymorphisms in 8048 individuals of European ancestry and a replication analysis in a separate sample of 1376 individuals in the Atherosclerosis Risk in Communities Study. Four independent loci from 3 regions were identified with genome-wide significance: 2p23 (GCKR, best SNP rs1260326, P = 2.04 × 10−17), 2q13-q14 (PROC, rs1158867, P = 3.77 × 10−36), 20q11 (near and within PROCR, rs8119351, P = 2.68 × 10−203), and 20q11.22 (EDEM2, rs6120849, P = 7.19 × 10−37 and 5.23 × 10−17 before and after conditional analysis, respectively). All 4 loci replicated in the independent sample. Furthermore, pooling the discovery and replication sets yielded an additional locus at chromosome 7q11.23 (BAZ1B, rs17145713, P = 2.83 × 10−8). The regions marked by GCKR, EDEM2, and BAZ1B are novel loci that have not been previously reported for association with protein C concentration. In summary, this first genome-wide scan for circulating protein C concentration identified both new and known loci in the general population. These findings may improve the understanding of physiologic mechanisms in protein C regulation.
Introduction
Protein C, a vitamin K–dependent plasma glycoprotein synthesized in the liver, is one of the most important endogenous anticoagulants.1 Upon activation by the thrombin-thrombomodulin complex, it inactivates factors Va and VIIIa, thereby dampening the coagulation reaction and, consequently, the formation of thrombi. Hereditary protein C deficiencies, characterized by reduction of protein C antigen/activity due to rare genetic mutations, contribute to familial venous thrombosis.2-5 In the general population, a low level of circulating protein C as well as common variants in the protein C gene are associated with increased risk of venous thromboembolism.6-9 Activated protein C also exerts other physiologic effects including anti-inflammatory and antiapoptotic properties and endothelial barrier stabilization.10 Treatment with activated protein C is effective for patients with severe sepsis and acute organ dysfunction.10 Plasma levels of protein C are influenced by genetic factors, with a heritability of 0.36 and 0.50 in Spanish and Mexican-American families, respectively.11,12 To date, only a few candidate gene studies of protein C, each focusing on a small number of variants, have been reported.9,13-15 A comprehensive investigation of genomic variants influencing protein C is not available in the literature. We performed a genome-wide association (GWA) scan for plasma protein C concentration with approximately 2.5 million single-nucleotide polymorphisms (SNPs), based on the data from a large population of individuals of European ancestry in the Atherosclerosis Risk in Communities (ARIC) study.
Methods
Study population and phenotype measurement
The ARIC study includes a longitudinal epidemiologic cohort of 15 792 African American and European American adults aged 45 to 64 years, recruited by probability sampling in 1987 through 1989 from Forsyth County, NC; Jackson, MS; suburbs of Minneapolis, MN; and Washington County, MD.16 Participants of European ancestry, by self-report, were recruited from the 3 field centers not including Jackson. The Jackson center recruited only African Americans. Three follow-up exams and hospital and death surveillance were conducted to ascertain the development of cardiovascular diseases. The ARIC study was approved by the institutional review board of each field center institution, and participants gave informed consent in accordance with the Declaration of Helsinki.
Baseline measures of demographic and clinical characteristics, including anthropometry, lifestyle variables, medical history, and medication use, were collected by standardized protocols during a home interview and clinical examination in which fasting blood was drawn. Aliquots of citrated plasma were obtained by centrifugation at 4°C and stored at −70°C for protein C measurement within a few weeks. Protein C antigen was measured by commercial enzyme-linked immunosorbent assay (ELISA) kits (Asserachrom Protein C, Diagnostica Stago) at a central laboratory. The coefficient of variation was 12%; the reliability coefficient (between-subject variance divided by total variance) obtained from repeated testing of individuals over several weeks was 0.56.17 DNA samples were extracted from blood samples and consent was obtained for genetic testing.
Genotyping and imputation
Details on genotyping, quality control, and imputation have been described elsewhere.18 In brief, genome-wide SNPs were measured using Affymetrix SNP array 6.0 in an initial set of 8861 participants of European ancestry. Individuals were excluded based on the following criteria: (1) self-reported sex mismatched with genotypic sex; (2) substantial genotype discordance with a previous reference panel; (3) all but one in each set of suspected first-degree relatives based on genome-wide genotype data; or (4) genetic outliers using a principal components approach as calculated by EIGENSTRAT,19 resulting in a sample of 8127 genotyped individuals of European ancestry. Of these, 8052 had protein C measures available and were not using a coumarin-based anticoagulant at the time of protein C measurement (ie, baseline), constituting the GWA scan discovery set in this study. SNPs were screened for call rates < 90%, minor allele frequencies (MAF) ≤ 1%, or Hardy-Weinberg (HW) equilibrium P < 10−6, resulting in 602 642 variants for inclusion in the imputation. Imputation was performed with the use of the phased data from the haplotype map for Centre d'Etude du Polymorphisme Humain samples of Utah residents with ancestry from Northern and Western Europe (HapMap-CEU) human genome release 21 (build 35) and the program MACH Version 1.00.16 (http://www.sph.umich.edu/csg/abecasis/MACH/download/).20 Imputation quality for each SNP was reflected by the ratio of empirically observed variance to the expected binomial variance of the allele dosage at HW equilibrium.21 In addition to the above quality control screens, we excluded from the analysis SNPs with imputation quality score < 0.3 or MAF ≤ 1%, resulting in a total of 2 461 269 SNPs in the analysis for protein C concentration. Physical positions for SNPs were mapped to the HapMap build 36.
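The imputation quality ratio and the post-imputation filters described above can be illustrated with a minimal sketch. It assumes the ratio is the population variance of the per-individual allele dosages divided by 2p(1 − p), with p estimated from the mean dosage; the function names are hypothetical and this is not the MACH implementation itself.

```python
from statistics import mean, pvariance


def imputation_quality(dosages):
    """Observed dosage variance divided by the binomial variance
    2p(1-p) expected at Hardy-Weinberg equilibrium (assumed definition)."""
    p = mean(dosages) / 2.0            # estimated frequency of the counted allele
    expected = 2.0 * p * (1.0 - p)
    if expected == 0.0:
        return 0.0                     # monomorphic SNP carries no information
    return pvariance(dosages) / expected


def passes_filters(dosages, min_quality=0.3, min_maf=0.01):
    """Keep a SNP unless its quality score is < 0.3 or its MAF is <= 1%,
    mirroring the exclusion thresholds stated in the text."""
    p = mean(dosages) / 2.0
    maf = min(p, 1.0 - p)
    return maf > min_maf and imputation_quality(dosages) >= min_quality


# Perfectly called genotypes in HW proportions (p = 0.5) give a ratio of 1.
well_imputed = [0.0] * 25 + [1.0] * 50 + [2.0] * 25
# Dosages shrunk toward the mean signal an uncertain imputation.
poorly_imputed = [0.9, 1.1] * 50

print(imputation_quality(well_imputed), passes_filters(well_imputed))
print(imputation_quality(poorly_imputed), passes_filters(poorly_imputed))
```

A ratio near 1 indicates dosages that vary as much as true genotypes would, whereas a ratio near 0 indicates dosages that have collapsed toward the population mean.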
In silico replication was conducted in an additional sample of 1376 ARIC participants of European ancestry who were genotyped with the same Affymetrix array in a second set at a later time and not on anticoagulant treatment at baseline. This set finished the genotyping task of the whole ARIC population including any reruns that were necessary. The 2 sets were not selected based on any phenotype characteristics, but rather, on convenience related to DNA readiness. Quality control screens and imputation were conducted with similar procedures in both sets.
Statistical analysis
Untransformed protein C values were analyzed. Four participants with values > 5.5 standard deviations from the mean were excluded, resulting in 8048 participants in the GWA scan discovery set. The distribution of protein C was approximately normal (skewness = 0.61, kurtosis = 0.89). The genetic association analysis was conducted in a linear regression model with ProbABEL v.0.1-0,22 which uses “allele dosage” for each SNP as a predictor assuming an additive genetic effect (http://mga.bionet.nsc.ru/∼yurii/ABEL/). The analysis was adjusted for age, gender and field center to reduce nongenetic variation in the distribution of protein C levels. A linear relationship was assumed between age and protein C and this assumption held for the ARIC data. Age and gender were significantly associated with protein C (P < .0001) and explained approximately 4% of its variation. The a priori threshold of P < 5.0 × 10−8 was used to judge genome-wide significance for SNP associations. When more than 1 SNP clustered at a region, we conducted conditional analyses to additionally adjust for the top SNP from that region; if there remained significant signals after the conditional analysis, the top SNP after the first adjustment was added to the model until there were no significant signals. In addition, linkage disequilibrium (LD) between SNPs, represented by r2, was used to evaluate the independence of associations from a region. Independent SNPs identified from the GWA scan discovery set were tested for replication in the additional sample using the same analytic approach. Finally, the program FASTSNP (http://fastsnp.ibms.sinica.edu.tw/pages/input_CandidateGeneSearch.jsp) was used to predict the impacts of the identified variants on the structure and function of proteins encoded by the corresponding genes.23
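The additive dosage model and the conditional analysis can be sketched on simulated data. This is an illustration only: it uses plain ordinary least squares rather than ProbABEL, a normal approximation for the P value, invented effect sizes, and hypothetical names (`ols`, `two_sided_p`, `top`, `tag`). Field center would enter as additional dummy-coded columns and is omitted here for brevity.

```python
import math
import random


def invert(m):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    k = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(m)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(aug[r][c]))
        aug[c], aug[pivot] = aug[pivot], aug[c]
        pv = aug[c][c]
        aug[c] = [v / pv for v in aug[c]]
        for r in range(k):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    return [row[k:] for row in aug]


def ols(y, predictors):
    """Least squares with an intercept. Returns (betas, standard errors);
    index 0 is the intercept, index 1 the first predictor, and so on."""
    n = len(y)
    X = [[1.0] + [col[i] for col in predictors] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(k)]
    inv = invert(xtx)
    betas = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [y[i] - sum(X[i][a] * betas[a] for a in range(k)) for i in range(n)]
    sigma2 = sum(r * r for r in resid) / (n - k)
    return betas, [math.sqrt(sigma2 * inv[a][a]) for a in range(k)]


def two_sided_p(beta, se):
    """Two-sided P value from beta/SE using a normal approximation."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))


random.seed(7)
n = 1000
age = [random.uniform(45, 64) for _ in range(n)]
female = [float(random.random() < 0.53) for _ in range(n)]
# Simulated causal SNP (minor allele frequency 0.10) and a second SNP in high LD.
top = [float(sum(random.random() < 0.10 for _ in range(2))) for _ in range(n)]
tag = [g if random.random() < 0.9 else float(random.randint(0, 2)) for g in top]
y = [3.2 + 0.01 * (a - 54) + 0.1 * f + 0.47 * g + random.gauss(0, 0.6)
     for a, f, g in zip(age, female, top)]

b, se = ols(y, [tag, age, female])             # marginal test of the tag SNP
bc, sec = ols(y, [tag, top, age, female])      # same test, conditional on the top SNP
print("marginal    P =", two_sided_p(b[1], se[1]))
print("conditional P =", two_sided_p(bc[1], sec[1]))
```

In this setup the tag SNP is expected to look strongly associated on its own and to lose most of its signal once the top SNP is in the model, which is the pattern used in the text to decide whether a region holds one independent signal or several.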
Results
Table 1 presents selected demographic and lifestyle characteristics for the GWA scan discovery and replication samples. Manhattan and quantile-quantile (Q-Q) plots of P value distribution from the GWA scan are shown in Figure 1 and supplemental Figure 1 (available on the Blood Web site; see the Supplemental Materials link at the top of the online article), respectively. The genomic inflation lambda coefficient was 1.04, suggesting negligible test statistic inflation by potential population stratification or other technical factors. A total of 504 SNPs from multiple genes exceeded the genome-wide significance threshold of 5 × 10−8 and marked 3 regions: chromosomes 2p23 (spanning 204 000 bp), 2q13-q14 (spanning 448 000 bp), and 20q11 (spanning 3.3 million bp). Detailed association results for the 504 SNPs are presented in supplemental Table 1. Details of the top SNP associations at the 3 regions are presented in Table 2.
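The genomic inflation coefficient quoted above is conventionally the median of the observed 1-degree-of-freedom chi-square statistics divided by the median expected under the null (about 0.455). The sketch below assumes that standard definition, which the text does not spell out; the function names are hypothetical.

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the square of the 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def lambda_from_z(z_scores):
    """Genomic inflation factor from per-SNP test statistics (beta / SE)."""
    return median([z * z for z in z_scores]) / CHI2_1DF_MEDIAN


def lambda_from_p(p_values):
    """Same quantity from two-sided P values; each P must satisfy 0 < P <= 1."""
    nd = NormalDist()
    chi2 = [nd.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# A scan whose median P value is 0.5 shows no inflation (lambda = 1).
print(lambda_from_p([0.1, 0.5, 0.9]))
```

Values close to 1, such as the 1.04 reported here, indicate that the bulk of test statistics follows the null distribution.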
Characteristic | GWA scan discovery sample | Replication sample |
---|---|---|
N | 8048 | 1376 |
Age, years | 54.3 ± 5.7 | 54.3 ± 5.7 |
Female, % | 52.9 | 53.2 |
Body mass index, kg/m2 | 27.0 | 26.7 |
Prevalent coronary heart disease, % | 5.0 | 4.3 |
Hypertension*, % | 27.0 | 24.9 |
Diabetes†, % | 8.5 | 9.5 |
Current smoker, % | 25.1 | 21.8 |
Current alcohol drinker, % | 66.3 | 61.7 |
Total cholesterol, mg/dL | 214.7 ± 40.7 | 215.7 ± 41.6 |
Triglycerides, mg/dL | 137.1 ± 91.6 | 137.8 ± 94.9 |
Protein C, μg/mL | 3.2 ± 0.6 | 3.2 ± 0.6 |
Median protein C (IQR), μg/mL | 3.1 (2.8-3.5) | 3.1 (2.8-3.6) |
Data are stated as mean ± SD or percentage, unless otherwise stated.
*Hypertension was defined based on systolic blood pressure ≥ 140 mm Hg, diastolic blood pressure ≥ 90 mm Hg, or treatment for hypertension.
†Diabetes was defined as fasting glucose ≥ 126 mg/dL, nonfasting glucose ≥ 200 mg/dL, self-reported physician diagnosis of diabetes, or treatment for diabetes.
SNP | Position | Region | Gene | Function | A1/A2 | AFA1 | Discovery β/SE | Discovery P | Discovery Var% | Replication β/SE | Replication P | Imput |
---|---|---|---|---|---|---|---|---|---|---|---|---|
rs1260326 | 27584432 | 2p23 | GCKR | cns | C/T | 0.59 | 0.082/0.010 | 2.04 × 10−17 | 0.85 | 0.059/0.023 | .010 | 0.98 |
rs1158867 | 127893824 | 2q13-q14 | PROC | intron | T/C | 0.58 | −0.123/0.010 | 3.77 × 10−36 | 1.94 | −0.154/0.023 | 7.83 × 10−11 | 0.94 |
rs1799810 | 127892480 | 2q13-q14 | PROC | utr | A/T | 0.58 | −0.123/0.010 | 4.35 × 10−36 | 1.93 | −0.154/0.023 | 7.83 × 10−11 | 0.94 |
rs8119351 | 33218064 | 20q11 | Interg | – | G/A | 0.90 | 0.480/0.015 | 2.68 × 10−203* | 10.9 | 0.492/0.035 | 3.42 × 10−41 | 0.99 |
rs867186 | 33228208 | 20q11.2 | PROCR | cns | T/C | 0.90 | 0.468/0.015 | 2.00 × 10−200* | 10.4 | 0.491/0.035 | 3.02 × 10−41 | – |
rs6120849 | 33194048 | 20q11.22 | EDEM2 | intron | C/T | 0.77 | −0.141/0.011 | 7.19 × 10−37† | 1.85 | −0.121/0.027 | 6.70 × 10−6 | – |
rs17145713 | 72542746 | 7q11.23 | BAZ1B | intron | C/T | 0.80 | −0.062/0.012 | 2.50 × 10−7‡ | 0.33 | −0.079/0.029 | .007 | 0.99 |
A1 indicates allele 1 (major allele); A2, allele 2 (minor allele); AFA1, allele frequency for A1; β, change in protein C level per 1-allele increase in the minor allele for both GWA scan and replication analyses; SE, standard error; Var%, percentage of variance explained by the SNP; imput, ratio of observed to expected variance as a measure of imputation quality (– for genotyped SNPs); cns, coding-nonsynonymous; utr, within an exon but not translated; and interg, intergenic.
*r2 = 1.0 between rs8119351 and rs867186 in HapMap-CEU.
†P = 5.23 × 10−17 after adjustment for rs867186; r2 = 0.022 for both rs6120849-rs8119351 and rs6120849-rs867186 in HapMap-CEU.
‡P = 2.83 × 10−8 in the pooled GWA scan of discovery and replication sets.
Twenty-eight SNPs at the 2p23 region, covering 5 genes (supplemental Figure 2), were associated with plasma protein C levels at P < 5 × 10−8. The strongest signal was observed for rs1260326, a coding-nonsynonymous SNP in exon 1 of the glucokinase (hexokinase 4) regulatory protein (GCKR or GKRP) gene encoding a proline to leucine substitution (P446L). Each copy of the minor T allele was associated with a 0.082 μg/mL greater plasma protein C concentration (P = 2.04 × 10−17, 0.85% variance explained; Table 2). Adjustment for rs1260326 abolished the associations for the remaining 27 SNPs (smallest P > .05). The signal for rs1260326 was replicated in the replication sample (P = .010; Table 2).
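As a rough consistency check on the variance-explained figure, one common approximation for an additive SNP is 2p(1 − p)β² divided by the trait variance. The text does not state how Var% was computed, so the sketch below is an assumption rather than the authors' method; it plugs in the minor allele frequency (1 − 0.59) and β from Table 2 and the unadjusted protein C SD of 0.6 μg/mL from Table 1.

```python
def variance_explained(beta, allele_freq, trait_sd):
    """Approximate fraction of trait variance explained by an additive SNP:
    beta^2 * 2p(1-p) / Var(trait). Assumes Hardy-Weinberg proportions."""
    genotype_variance = 2.0 * allele_freq * (1.0 - allele_freq)
    return beta ** 2 * genotype_variance / trait_sd ** 2


# rs1260326: beta = 0.082 ug/mL per T allele, minor allele frequency 0.41.
approx = variance_explained(beta=0.082, allele_freq=0.41, trait_sd=0.6)
print(f"{100 * approx:.2f}% of variance")
```

The approximation gives about 0.9%, in the same range as the reported 0.85%; exact agreement is not expected because the reported value comes from the covariate-adjusted model and the inputs here are rounded.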
At the 2q13-q14 region, 112 SNPs reached genome-wide significance and covered 6 genes (supplemental Figure 3). Of the 112 SNPs, 2 are coding-synonymous and none are nonsynonymous. The strongest association was observed for a locus marked by SNP rs1158867 (Table 2), which is intronic to the protein C structural gene (PROC). Each copy of the minor C allele was associated with a 0.123 μg/mL lower plasma protein C concentration (P = 3.77 × 10−36, 1.94% variance explained; Table 2). Another SNP, rs1799810, located within an exon of PROC but not translated, showed a signal similar to that of rs1158867 (β = −0.123, P = 4.35 × 10−36, 1.93% variance explained). This SNP is in high LD with rs1158867 (r2 = 0.85 and 0.99 in HapMap-CEU and ARIC, respectively). After adjusting for rs1158867, none of the remaining 111 SNPs was significant at the genome-wide level (smallest adjusted P = .03). The associations for both rs1158867 and rs1799810 were strongly replicated in the additional sample (Table 2).
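The pairwise LD values quoted for the study sample can be approximated from genotype data as the squared Pearson correlation between the allele dosages of two SNPs. This genotype-level r2 is an assumption made for illustration: HapMap values are based on phased haplotypes, and the text does not say how the ARIC values were derived. The function name is hypothetical.

```python
def dosage_r2(a, b):
    """Squared Pearson correlation between two equal-length dosage vectors."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0.0 or var_b == 0.0:
        return 0.0          # a monomorphic SNP is uninformative about LD
    return cov * cov / (var_a * var_b)


# Two SNPs that always travel together versus two that vary independently.
print(dosage_r2([0, 1, 2, 0, 1, 2], [2, 1, 0, 2, 1, 0]))
print(dosage_r2([0, 0, 2, 2], [0, 2, 0, 2]))
```

Because r2 is unsigned, the choice of which allele is counted at each SNP does not affect the result.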
At the 20q11 region, 364 SNPs covering 40 genes exceeded the genome-wide significance threshold of P < 5 × 10−8 (supplemental Figure 4). Of the 364 SNPs, the top 4 are located within a 0.8 kb window and showed similar signals: rs8119351 (intergenic, P = 2.68 × 10−203; Table 2), rs2069940 (near 5′ of protein C receptor (PROCR) or endothelial protein C receptor (EPCR) gene, P = 1.24 × 10−201), rs867186 (coding-nonsynonymous in PROCR, S219G substitution, P = 2.00 × 10−200; Table 2), and rs11167260 (intergenic, P = 6.78 × 10−202). The missense variant rs867186 is in high LD with the other 3 SNPs (r2 = 1.0 and 0.95 in HapMap-CEU and ARIC, respectively). Therefore, the signals represented by the 4 SNPs may be attributable to the single signal from rs867186. This SNP was associated with a 0.468 μg/mL higher plasma protein C level per minor C allele and explained 10.4% of its variance (Table 2). In conditional analysis adjusting for rs867186, 37 SNPs (covering 5 genes) remained statistically significant at P < 5 × 10−8 (supplemental Table 2, supplemental Figure 5). Thirty-four of these SNPs were not in LD with rs867186 (r2 < 0.05 in HapMap-CEU); the other 3 were in low LD (r2: 0.08-0.32 in HapMap-CEU). Of the 37 SNPs, the strongest signal in the conditional analysis was observed for rs6120849, which was associated with a 0.141 μg/mL (P = 7.19 × 10−37, 1.85% variance explained) and 0.089 μg/mL (P = 5.23 × 10−17, 0.74% variance explained) lower protein C level per minor T allele before and after the adjustment, respectively. This SNP is intronic to the endoplasmic reticulum (ER) degradation enhancer, mannosidase alpha-like 2 (EDEM2) gene and not linked with rs867186 (r2 = 0.022 and 0.029 in HapMap-CEU and ARIC, respectively). Notably, rs6120849 is in moderate LD (r2 = 0.54) with a missense mutation in EDEM2: rs3746429 (T456A substitution). Rs3746429 was also significantly associated with protein C concentration (P = 1.25 × 10−27 and 3.481 × 10−13 before and after adjusting for rs867186).
Including both rs867186 and rs6120849 as covariates in the analyses yielded no further signals at the genome-wide significance level (smallest adjusted P = .000012). Replacing rs867186 with rs8119351 in the above conditional analysis yielded similar results with a minor change in the SNP ranking: the second top SNP rs6060266, another intronic variant in EDEM2, became the top one and rs6120849 moved to 8th. Rs6120849, rs6060266, rs3746429, and the top 4 SNPs within or near the PROCR gene were strongly replicated in the additional sample (P < 10−5; Table 2 shown for rs8119351, rs867186, and rs6120849). The signals for the 3 SNPs in EDEM2 remained significant in the replication analysis after additional adjustment for rs867186 (P = .04, .04, and .005 for rs6120849, rs6060266, and rs3746429, respectively).
Furthermore, a GWA scan based on the pooled discovery and replication sets yielded an additional locus at chromosome 7q11.23. Seven SNPs from this region reached genome-wide significance (supplemental Figure 6, supplemental Table 1). Of the 7, 5 are in the bromodomain adjacent to zinc finger domain 1B (BAZ1B) gene and 2 are intergenic. The top 2 SNPs, rs17145713 and rs1178977, are intronic variants in BAZ1B: β ± standard error = −0.063 ± 0.011, P = 2.83 × 10−8, 0.33% variance explained for both SNPs. All 7 SNPs showed suggestive signals in the GWA scan of the discovery set (P < 2 × 10−6) and replicated in the additional sample (P < .007, rs17145713 shown in Table 2).
Discussion
To the best of our knowledge, this is the first report of a GWA scan for plasma protein C levels in European Americans, based on a GWA scan discovery set of 8048 subjects and an independent replication sample of 1376 subjects. We identified genome-wide significant signals from novel loci (GCKR, EDEM2, and BAZ1B) as well as from candidate genes known to play a role in protein C regulation (PROC and PROCR). Of the 5 independent associations, 4 were replicated in the 1376 ARIC participants not included in the GWA scan discovery set.
The first novel locus is the region marked by the variants from the GCKR gene. GCKR has not been previously implicated in the regulation of protein C, nor have its genetic variants been associated with plasma protein C levels. Interestingly, rs1260326 (P446L), the top SNP in the GCKR gene for protein C levels, has previously been associated with circulating levels of C-reactive protein (CRP),24 triglycerides,25 fasting glucose,25 and factor VII (FVII) antigen/activity26 from other GWA scan reports. The minor allele T, associated with a higher plasma protein C level, was associated with higher levels of CRP,24 triglycerides,25 FVII antigen/activity,26 and lower fasting glucose.25 FASTSNP predicted the rs1260326 variant to break the exonic splicing site with moderate to high risk. A study of 21 Spanish extended families reported significant genetic correlation (0.42) between plasma levels of protein C and FVII.27 Therefore, the common associations of rs1260326 with FVII and protein C could be due to pleiotropic effects of the GCKR gene. The protein encoded by GCKR inhibits glucokinase (hexokinase 4) in liver and pancreatic islet cells. It may also serve as an anchor to sequester glucokinase in the hepatocyte nucleus under fasting conditions, which provides a protective mechanism for glucokinase degradation.28 In a GCKR knock-out mouse, there was a loss of both glucokinase protein and activity in the hepatocytes of the mutant mouse, possibly due to the disruption of nucleus sequestration.28 Glucokinase catalyzes the initial step in utilization of glucose by the pancreatic β cell and liver, providing glucose-6-phosphate for the synthesis of glycogen. 
Both protein C and FVII are vitamin K-dependent glycoproteins synthesized in the liver, and bear substantial sequence and structural homology.1,27,29 One of the key posttranslational modifications for protein C and FVII is glycosylation at several residues, which requires glucose.30,31 We speculate that GCKR may exert its pleiotropic influence on protein C and FVII by modulating the use of glucose by the liver during the glycosylation process.
Variants from EDEM2 marked the second novel locus for plasma protein C levels. This locus emerged from conditional analysis after adjusting for rs867186, a missense variant in PROCR and one of the top SNPs at this region. Rs867186 was directly genotyped in ARIC, is in HW equilibrium, and had similar MAF as in HapMap-CEU and other populations of European ancestry.15,26 Therefore, residual signal due to genotyping error in rs867186 is an unlikely explanation for the remaining associations at this region. Moreover, adjustment for rs8119351, which was tightly linked with rs867186 and excellently imputed (imputation quality score = 0.99), yielded similar results. Search in FASTSNP for the top SNP in EDEM2 (rs6120849) returned with “no known function” while rs3746429, the T456A substitution that was in moderate LD with rs6120849, was predicted as a conservative missense variant involved in splicing regulation. The EDEM2 gene has not been reported previously as a candidate gene for protein C level. The protein product encoded by EDEM2 is a member of the EDEM family involved in ER-associated degradation (ERAD) of glycoproteins in which misfolded glycoproteins are retrotranslocated from ER to the cytosol and degraded by the proteasome.32,33 Up-regulation of EDEM2 accelerates the ERAD of terminally misfolded glycoproteins.33 In Chinese hamster ovary cells transfected with protein C mutants, cotransfection of EDEM accelerated the degradation of glycosylated protein C.29 Therefore, it is possible that EDEM2 may influence protein C levels by modulating its degradation.
Variants from BAZ1B marked the third novel locus, which reached genome-wide significance in the combined analysis of the discovery and replication sets. The gene product BAZ1B is an enzyme that plays a central role in chromatin remodeling and is also involved in the modulation of transcription. This enzyme has not previously been related to protein C, and it might influence protein C levels by regulating its transcription. Interestingly, the top SNP in this gene (rs17145713) has previously been associated with triglyceride levels,34 suggesting the possibility of a pleiotropic effect. Nevertheless, the signals detected at the BAZ1B region need to be replicated in independent populations.
The top PROC SNPs identified from our study, rs1158867 and rs1799810, have not been reported previously in other genetic studies of plasma protein C levels. However, another 2 SNPs, rs1799808 and rs1799809, which are 5′ near the PROC gene, were previously associated with plasma protein C levels.9 In our study, rs1799808 was associated with protein C to a lesser extent (β = −0.085, P = 6.03 × 10−17) than rs1158867; the signal for this SNP was no longer significant after adjustment for rs1158867 (adjusted P = .51). There was modest LD between rs1158867 and rs1799808 (r2 = 0.26 in HapMap-CEU). The other reported SNP rs1799809 was not included in our GWA scan dataset, but has been found in high LD with the top 2 SNPs of our study (r2 = 0.95 with both rs1799810 and rs1158867 in HapMap-CEU). It is unknown whether rs1799809 or the top SNPs identified in our study was responsible for the observed associations at the PROC region. In FASTSNP, the 2 SNPs (rs1799808 and rs1799809) reported by other studies were predicted to have “no known function” while rs1158867 and rs1799810 identified in our study were predicted to break a consensus splicing site sequence with moderate risk.
Of the top 4 SNPs that showed similar signals at the 20q11 region, rs867186 is the only functional variant, resulting in a serine to glycine substitution at position 219 of the PROCR (ie, EPCR) protein (ie, S219G). This SNP explained 10.4% of variation in plasma protein C level. Because this variant is tightly linked with the other 3 top SNPs, it is possible that the signals shown by the first independent locus at this region are mainly driven by a single signal from rs867186. FASTSNP predicted the S219G change to be missense conservative with similar protein structure characteristics, or a splicing regulation with low to moderate risk. The association for rs867186 agreed with a previous report in which rs867186 was significantly associated with plasma protein C in 336 European-Americans, explaining 13% of its phenotypic variation.15 Interestingly, in that study the same allele that increased plasma protein C level was also strongly and positively associated with plasma levels of soluble EPCR, explaining 75% of its phenotypic variation.15 EPCR serves as a receptor for activated protein C and further enhances its activation. It was speculated that the PROCR S219G associates with plasma protein C level because soluble EPCR might be able to stabilize circulating protein C by binding to it15; another possibility is that increased shedding of the EPCR from the endothelial surface due to the influence of this variant results in less cell-bound EPCR to bind protein C, leading to higher levels of protein C in the circulation. More interestingly, rs867186 was also associated with plasma FVII antigen and activity in 2 other studies.26,35
In conclusion, we report the first GWA study for plasma protein C level in a large sample of European Americans. We identified 5 independent loci associated with plasma levels of protein C, marked by GCKR, EDEM2, BAZ1B, PROC, and PROCR. Variants in GCKR, EDEM2, and BAZ1B are newly identified loci that have not been reported previously for association with protein C. Moreover, the top SNPs in GCKR and PROCR were also reported for FVII antigen/activity in other studies, suggesting pleiotropic effects. These findings provide a greater understanding of physiologic mechanisms in protein C regulation, potentially improving the prevention and treatment of disorders in which protein C deficiency is implicated.
The online version of this article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
We thank the University of Minnesota Supercomputing Institute for use of the blade supercomputers. The authors thank the staff and participants of the ARIC study for their important contributions.
This work was supported by National Heart, Lung, and Blood Institute contracts N01-HC-55015, N01-HC-55016, N01-HC-55018, N01-HC-55019, N01-HC-55020, N01-HC-55021, and N01-HC-55022, and grants R01-HL-087641, R01-HL-59367, and R01-HL-086694; National Human Genome Research Institute contract U01-HG-004402; and the National Institutes of Health (NIH) contract HHSN268200625226C. The infrastructure was partly supported by grant UL1-RR-025005, a component of the NIH and NIH Roadmap for Medical Research. The Longitudinal Investigation of Thromboembolism Etiology was funded by grant R01-HL59367. Part of the work was supported by grant R01-HL095603.
Authorship
Contribution: W.T., M.C., E.B., and A.R.F. designed the research; N.A., E.B., and A.R.F. collected the data; W.T., S.B., X.K., J.S.P., and A.T. analyzed and interpreted the data; W.T. wrote the manuscript; and all authors edited the manuscript for scientific content.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Weihong Tang, Division of Epidemiology and Community Health, School of Public Health, University of Minnesota, 1300 South Second St, WBOB 300, Minneapolis, MN 55454; e-mail: tang0097@umn.edu.