Key Points
Genome-wide association analyses revealed common DNA variants in PLG, LPA, and near SIGLEC14 that contribute to plasma plasminogen level variation.
Tobacco smoking and female sex were associated with higher levels of plasminogen.
Abstract
Plasminogen is the precursor of the serine protease plasmin, a central enzyme of the fibrinolytic system. Plasma levels of plasminogen vary by almost 2-fold among healthy individuals, yet little is known about its heritability or genetic determinants in the general population. In order to identify genetic factors affecting the natural variation of plasminogen levels, we performed a genome-wide association study and linkage analysis in a sample of 3456 young healthy individuals who participated in the Genes and Blood Clotting Study (GABC) or the Trinity Student Study (TSS). Heritability of plasminogen levels was 48.1% to 60.0%. Tobacco smoking and female sex were associated with higher levels of plasminogen. In the meta-analysis, 11 single-nucleotide polymorphisms (SNPs) in 2 regions reached genome-wide significance (P < 5.0E-8). Of these, 9 SNPs were near the PLG or LPA genes on Chr6q26, whereas 2 were on Chr19q13 and 5′ upstream of SIGLEC14. These 11 SNPs represented 4 independent signals and collectively explained 6.8% of plasminogen level variation in the study populations. The strongest association was observed for a nonsynonymous SNP in the PLG gene (R523W). Individuals bearing an additional copy of this allele had an average decrease of 13.4% in plasma plasminogen level.
Introduction
In circulating blood, the fibrinolytic system limits the extent of blood coagulation through the regulated conversion of plasminogen (PLG) to plasmin on the surface of the blood clot. PLG binds to exposed lysine residues on fibrin(ogen) through its kringle domains and undergoes activation to plasmin by tissue plasminogen activator (tPA) or urokinase. Plasmin then degrades fibrin into soluble forms, including the D-dimer, a clinically relevant marker of fibrinolytic activity. Plasminogen activator inhibitor-1 (PAI-1) binds to tPA and inhibits tPA’s ability to activate PLG. The predominant form of PLG in circulation has a glutamic acid at its amino terminus and has a plasma half-life of 2.2 days.1 Homozygous deficiency of PLG is associated with ligneous conjunctivitis,2 though venous thromboembolic disease is surprisingly not a uniform feature of this disorder.3 Other studies have suggested that elevated PLG levels are associated with coronary heart disease,4 possibly through the promotion of foam cell formation.5 In the extravascular space, PLG has many other functions, including roles in angiogenesis, inflammation, and tissue remodeling.6,7
PLG levels vary approximately 2-fold among healthy individuals,8 and this variation is influenced by inherited factors. A genome-wide linkage study of PLG levels in 629 individuals in 26 Mexican American families estimated the heritability of PLG to be 43%9 and identified a region on chromosome 12 (Chr12) with suggestive linkage (LOD = 2.73). Other components of the fibrinolytic system are also affected by genetic factors. Heritability of PAI-1 and tPA levels in several twin studies was estimated to be 42% to 71% for PAI-1 and 43% to 62% for tPA.10 A genome-wide meta-analysis of plasma PAI-1 levels in ∼30 000 individuals11 identified previously described variants in SERPINE1 (the PAI-1 gene) as well as novel variants in ARNTL (aryl hydrocarbon receptor nuclear translocator-like) and PPARG (peroxisome proliferative activated receptor) that together explained up to 3.7% of the variation in PAI-1 levels. A subsequent meta-analysis of tPA levels identified variants in STXBP5 (syntaxin binding protein 5), STX2 (syntaxin 2), and PLAT (tPA) that accounted for 0.75% of the variance in tPA levels.12 The PLG gene (PLG) is highly homologous to an adjacent gene encoding apolipoprotein(a) (LPA) on 6q25.3-q26. A genome-wide association study (GWAS) of plasma lipoprotein(a) [Lp(a)], which is composed in part of apolipoprotein(a), identified several common variants on 6q25.3-q26, including single-nucleotide polymorphisms (SNPs) in LPA, PLG, and PARK2.13 However, no GWAS for plasma PLG levels has been reported.
Because of PLG’s important role in a variety of cellular functions and fibrinolysis, we analyzed samples from the Genes and Blood Clotting Study (GABC; N = 1152)12 and the Trinity Student Study (TSS; N = 2304)13 in order to identify genetic variants contributing to plasma PLG variation. These cohorts had a narrow age range (14-35 years) and were generally healthy, which should have minimized the confounding effects of atherosclerosis, acute inflammation, or treatment on PLG levels. They were also characterized for many potential covariates such as body weight, oral contraceptive use, and tobacco smoking. Additionally, the GABC was a sibling cohort, which, along with the smaller number of siblings in the TSS, allowed for the use of both genome-wide association and linkage studies.
Materials and methods
TSS
A cohort of 2507 healthy and ethnically Irish individuals between 18 and 28 years old attending University of Dublin, Trinity College was recruited over the 2003-2004 academic year.14,15 Ethical approval was obtained from the Dublin Federated Hospitals Research Ethics Committee affiliated with the University of Dublin, Trinity College. The study was reviewed by the Office for Human Research Protections at the United States National Institutes of Health. Written informed consent was obtained from participants upon enrollment, in accordance with the Declaration of Helsinki.
GABC
A cohort of 1189 healthy individuals representing 507 sibships between 14 and 35 years old was collected between June 26, 2006 and January 30, 2009 at the University of Michigan, Ann Arbor. Subjects with acute or chronic disease or those who were pregnant were excluded. Study participants agreed to an online informed consent.16
Genotyping, phenotyping, and data processing
DNA samples from both the GABC and TSS cohorts were genotyped using the Illumina HumanOmni1-Quad_v1-0_B array. All samples were anonymized prior to genetic analyses. The final data included 755 451 SNPs (call rate >97%, per-SNP call rate >97%, minor allele frequency [MAF] ≥ 0.01 and P ≥ 1E-6 in test for deviation from Hardy-Weinberg equilibrium) for 2304 TSS subjects and 783 836 SNPs for 940 GABC European subjects. PLG levels were determined by AlphaLISA (Perkin-Elmer, Waltham, MA) based on the manufacturer’s guidelines from platelet-poor plasma. Further details about the genotyping, data cleaning, PLG antigen measurement, and phenotype data processing are provided in supplemental Methods (available at the Blood Web site).
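The SNP-level filters quoted above can be written as a simple predicate. The following is a minimal sketch for illustration only: the function and argument names are hypothetical, the thresholds are the ones stated in the text, and the actual cleaning pipeline is the one described in the supplemental Methods.

```python
def passes_snp_qc(call_rate: float, maf: float, hwe_p: float) -> bool:
    """Return True if a SNP meets the thresholds quoted in the text.

    Hypothetical helper for illustration; not the authors' pipeline.
    """
    return call_rate > 0.97 and maf >= 0.01 and hwe_p >= 1e-6


# Toy inputs (made-up values, not study data)
print(passes_snp_qc(call_rate=0.99, maf=0.05, hwe_p=0.3))   # passes all filters
print(passes_snp_qc(call_rate=0.99, maf=0.005, hwe_p=0.3))  # fails the MAF filter
```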
Heritability estimation using SNP genotypes and pedigree data
For all 2304 TSS and 1152 GABC individuals, the proportions of variance in PLG levels explained by all genotyped SNPs, the top associated SNPs, or selected chromosomes were estimated using SNP-derived genetic relationships and the restricted maximum likelihood method implemented in Genome-wide Complex Trait Analysis (GCTA) version 1.20.17 In addition, for the 557 sibships (138 TSS and 1139 GABC sibs), 2 pedigree-based methods were applied to estimate the narrow-sense heritability: intraclass correlation for sibpairs using the irr package in R and pedigree-wide regression analysis using MERLIN-REGRESS version 1.1.2.18
Association analyses
Single-SNP quantitative trait association analysis for the adjusted PLG level was performed in TSS and the European subset of GABC using PLINK version 1.07,19 assuming an additive mode of allelic effect while treating all samples as unrelated. Then, to assess the impact of sibling structure, 2 approaches that consider subject relatedness in association tests were applied. The first approach used a variance component model implemented in Efficient Mixed-Model Association eXpedited (EMMAX).20 The second approach employed a linear mixed-effects model implemented in R package Genome-wide Association analysis with Family data (GWAF) version 2.1.21 The genome-wide significance level was set at P = 5.0E-8 based on Bonferroni correction for 1 million independent tests.
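The genome-wide threshold follows directly from the Bonferroni rule. The short sketch below reproduces the arithmetic, assuming a family-wise error rate of 0.05 (the conventional value; the function name is hypothetical).

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests


# Assumed family-wise alpha of 0.05 spread over 1 million independent tests
threshold = bonferroni_threshold(0.05, 1_000_000)
print(f"{threshold:.1e}")
```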
Meta-analysis of TSS and GABC
Meta-analysis was carried out using a fixed effect, sample-size–weighted approach implemented in METAL22 using EMMAX association results for TSS and the European subset of GABC from a common set of 741 807 SNPs. The genomic control factors23 were corrected to 1.000 in meta-analysis by METAL. Regional plots of top associated SNPs were generated by LocusZoom.24
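A sample-size–weighted meta-analysis of this kind can be sketched as follows. This is a minimal illustration of the Z-score scheme as we understand it (each study's two-sided P is converted to a signed Z, weighted by the square root of its sample size), not a reimplementation of METAL; the function name is hypothetical, and genomic control correction is ignored here.

```python
import math
from statistics import NormalDist


def sample_size_weighted_meta(studies):
    """Combine studies with a sample-size-weighted Z-score scheme.

    `studies` is a list of (p_value, direction, n) tuples, where direction
    is +1 or -1 for the sign of the tested allele's effect.
    Returns (combined_z, two_sided_p).
    """
    num = 0.0
    den = 0.0
    for p, direction, n in studies:
        z = -NormalDist().inv_cdf(p / 2.0) * direction  # signed Z from two-sided P
        w = math.sqrt(n)                                # weight = sqrt(sample size)
        num += w * z
        den += w * w
    z_meta = num / math.sqrt(den)
    p_meta = math.erfc(abs(z_meta) / math.sqrt(2.0))    # two-sided normal P
    return z_meta, p_meta


# Per-cohort P values and sample sizes for rs10412972 (TSS, then GABC)
z, p = sample_size_weighted_meta([(8.1e-6, +1, 2304), (2.7e-4, +1, 940)])
print(f"Z = {z:.2f}, P = {p:.1e}")
```

Because the inputs are rounded P values, the combined P should only be expected to agree with a reported meta-analysis P to within rounding.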
Results
Study cohorts, plasminogen levels, and smoking status
Both the TSS cohort and GABC cohort consisted of healthy young adults. The characteristics of the TSS and GABC cohorts are summarized in Table 1.25 The median PLG levels were 101.7 IU/dL and 104.7 IU/dL for GABC and TSS, respectively. The 5th and 95th percentiles of raw PLG levels spanned a 1.9-fold range (77.2-144.7 IU/dL) for GABC and a 1.7-fold range (82.7-139.7 IU/dL) for TSS. The distributions of raw PLG levels were significantly different between GABC and TSS (Kolmogorov-Smirnov test, P = 3.7E-5; Mann-Whitney U test, P = 5.1E-4; supplemental Figure 1A-B) but more similar between GABC and TSS nonsmokers (supplemental Figure 1C), suggesting that the PLG level difference was mostly due to different proportions of smokers (4.6% in GABC and 31.4% in TSS). We accounted for relatedness in the GABC cohort by randomly selecting PLG levels for 1 individual from each family of GABC and comparing these to the PLG levels in TSS. After repeating this process for 500 iterations, the log-transformed, age-, sex-, and smoking-status–adjusted, outlier-removed PLG levels were not significantly different between TSS and GABC (Kolmogorov-Smirnov test, mean P = .07; Mann-Whitney U test, mean P = .53).
Cohort | TSS | GABC |
---|---|---|
Subject counts, N | 2304 | 1152* |
Age (Q1, Q3) | 22 (21, 24) | 21 (19, 23)* |
Female | 1,353 (58.7%) | 721 (62.6%) |
Weight, lb (Q1, Q3) | 148 (133, 167) | 145 (125, 165)* |
Height, inch (Q1, Q3) | 68 (65, 71) | 67 (64, 70) |
BMI, kg/m2 (Q1, Q3) | 22.6 (21.0, 24.5) | 22.5 (20.7, 25.0)* |
Current smoker, n/N | 723/2299 (31.4%) | 53/1151 (4.6%) |
Sibship size (n sibships) | 2 (66); 3 (2) | 2 (366); 3 (94); 4 (22); 5 (5); 6 (2)* |
Plasminogen, IU/dL (Q1, Q3) | 104.7 (94.6, 115.6) | 101.7 (91.5, 117.2) |
Median values for age, weight, height, body mass index, and raw plasminogen levels are reported.
BMI, body mass index.
*Previously published data, Desch et al.25
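The resampling step used to handle relatedness in GABC (1 individual drawn per family, repeated 500 times) can be sketched as below. The helper names, the seed, and the toy values are hypothetical; in the analysis described above, each subsample would then be compared with TSS using a two-sample test and the resulting P values averaged, which this sketch does not implement.

```python
import random


def sample_one_per_family(families, rng):
    """Pick one PLG value at random from each family (a list of lists)."""
    return [rng.choice(members) for members in families]


def repeated_unrelated_samples(families, n_iter=500, seed=0):
    """Yield n_iter subsamples, each containing one individual per family."""
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    for _ in range(n_iter):
        yield sample_one_per_family(families, rng)


# Toy sibships with made-up PLG values (IU/dL), not study data
toy = [[98.0, 105.0], [110.0, 101.0, 95.0], [120.0, 99.0]]
subsamples = list(repeated_unrelated_samples(toy, n_iter=500))
print(len(subsamples), len(subsamples[0]))
```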
Heritability
The narrow-sense heritability (h2) of the adjusted PLG levels was 60.0% using intraclass correlation of TSS and GABC siblings and 59.3% by MERLIN-REGRESS. These pedigree-based values were similar to the estimate of 48.1% based on SNP genotyping data for all TSS and GABC individuals using GCTA (supplemental Table 1).
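For orientation, a sib-pair intraclass correlation (ICC) and its conversion to a heritability estimate can be sketched as follows. This is a sketch under stated assumptions, not the authors' procedure: it uses a one-way ANOVA ICC for pairs only, and it assumes the common full-sib approximation h2 ≈ 2 × ICC (siblings share half of the additive variance, with no shared-environment or dominance contribution). The text does not state which ICC variant or conversion was used, and the function names are hypothetical.

```python
def sibpair_icc(pairs):
    """One-way ANOVA intraclass correlation for sib pairs [(x1, x2), ...]."""
    n = len(pairs)
    grand = sum(a + b for a, b in pairs) / (2 * n)
    # Between-pair mean square (k = 2 sibs per pair, n - 1 degrees of freedom)
    msb = sum(2 * ((a + b) / 2 - grand) ** 2 for a, b in pairs) / (n - 1)
    # Within-pair mean square (n degrees of freedom for n pairs)
    msw = sum((a - b) ** 2 / 2 for a, b in pairs) / n
    return (msb - msw) / (msb + msw)


def h2_from_sib_icc(icc: float) -> float:
    """Assumed full-sib approximation: h2 is about twice the sib correlation."""
    return 2.0 * icc


# Toy adjusted trait values for four sib pairs (made up, not study data)
toy_pairs = [(1.0, 1.2), (2.0, 1.7), (3.1, 2.9), (0.5, 0.9)]
print(round(sibpair_icc(toy_pairs), 3))
print(h2_from_sib_icc(0.30))  # an ICC of 0.30 would map to h2 = 0.60
```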
Genetic association and meta-analysis in TSS and GABC
GWASs were performed in TSS and GABC separately to identify single-SNP associations with adjusted PLG levels. The genomic inflation factor was 1.020 and 1.238 for TSS and GABC, respectively, using the standard single-marker test in PLINK. GWAS in TSS revealed 10 significantly associated SNPs (P < 5.0E-8) in an additive model (supplemental Figure 3A). All 10 SNPs reside in an 850-kb region on Chr6 containing the PLG, LPA, SLC22A3, and AGPAT4 genes (supplemental Table 2 and supplemental Figure 3B). The Q-Q plot of the observed versus expected −log10(P) demonstrated a deviation from expectation, almost entirely due to the significant signals on Chr6 (supplemental Figure 3C). The T allele of the top SNP, rs4252129, was associated with PLG with a β coefficient of −0.15 ± 0.014 (P = 5.0E-27), equivalent to a 14.6% decrease in PLG level per allele. When we applied EMMAX or GWAF to take family relatedness into account, the results showed strong consistency with those not considering relatedness (supplemental Figure 3D-E). The genomic inflation factor was 0.993 and 1.006 for TSS and GABC, respectively, for EMMAX-based association results.
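The genomic inflation factor reported above is conventionally computed as the median observed 1-df chi-square statistic divided by its expected median under the null (about 0.455). The sketch below shows that calculation from a list of P values; the function name is hypothetical and the input is a toy grid of uniform P values, not study data.

```python
from statistics import NormalDist, median

# Median of a chi-square(1) variable: the square of the 75th normal percentile
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def genomic_inflation(p_values):
    """Lambda = median observed chi-square / median expected chi-square (1 df)."""
    chi2 = [NormalDist().inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN


# Uniformly spaced P values mimic a null distribution, so lambda is close to 1
null_like = [(i + 0.5) / 1000 for i in range(1000)]
print(round(genomic_inflation(null_like), 3))
```

Values well above 1 (such as the 1.238 seen for GABC when siblings are treated as unrelated) indicate inflation of test statistics, which relatedness-aware models are designed to remove.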
Similar analysis of the European subset of GABC (n = 940) revealed no significantly associated SNP for PLG (supplemental Figure 4). However, the top 10 SNPs discovered in TSS showed similar allelic effect sizes and directions as in GABC (supplemental Figure 5), suggesting that the lower significance in GABC was mainly due to its smaller sample size than TSS.
To increase statistical power, a meta-analysis of the TSS and GABC cohorts using EMMAX association results for a common set of 741 807 SNPs was performed, revealing 11 SNPs significantly associated with PLG levels (P < 5.0E-8) (Figure 1A and Table 2). These SNPs collectively explained 6.8% of PLG level variation in the combined TSS and GABC cohorts. Of these, 9 were close to the PLG, LPA, and SLC22A3 genes on Chr6q26 (Figure 1B), whereas 2 were 5′ upstream of the SIGLEC14 gene on Chr19q13 (Figure 1C). Therefore, apart from the signals close to the structural PLG gene on Chr6, our meta-analysis identified a second signal of genome-wide significance on Chr19 that was only moderately significant in TSS or GABC alone (P < 5E-4 in both; Table 2). Meta-analysis using PLINK demonstrated comparable results (supplemental Figure 6). Meta-analysis of TSS and GABC using a common subset of 4.5 million imputed SNPs (supplemental Methods) revealed 44 significant SNPs, 38 on Chr6 surrounding PLG and LPA and 5 on Chr19 5′ upstream of SIGLEC14 (supplemental Figure 7 and supplemental Table 3), with no other region showing significant association. This confirmed the meta-analysis signals on PLG, LPA, and SIGLEC14 using the genotyped data above. The top SNP in the analysis using imputed data, rs4252129, was the same as the top SNP in the meta-analysis using genotyped data only.
Meta-analysis of TSS (N = 2304) and GABC (n = 940) across 741 807 SNPs; column prefixes indicate whether a statistic comes from the meta-analysis (Meta), TSS, or GABC.
SNP | Position* | Region | Gene | Annotation | Meta P | Meta Dir | A1 | A2 | TSS MAF | TSS β (SE) | TSS P | GABC MAF | GABC β (SE) | GABC P |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
rs4252129 | 161072895 | 6q26 | PLG | Coding-nonsyn | 1.9E-26 | − | T | C | 0.02 | −0.159 (0.014) | 2.2E-27 | 0.014 | −0.100 (0.035) | 5.1E-03 |
rs1084651 | 161009807 | 6q26 | LPA | Intron | 1.6E-15 | − | A | G | 0.19 | −0.038 (0.005) | 2.5E-12 | 0.16 | −0.044 (0.011) | 1.2E-04 |
rs783149 | 161008908 | 6q26 | LPA | 5′ upstream | 1.6E-14 | − | A | C | 0.19 | −0.037 (0.005) | 1.6E-11 | 0.17 | −0.042 (0.011) | 1.9E-04 |
rs11751605 | 160883220 | 6q25.3 | LPA | Intron | 2.9E-12 | − | C | T | 0.17 | −0.042 (0.006) | 1.8E-13 | 0.13 | −0.018 (0.013) | .15 |
rs1247513 | 161200369 | 6q26 | PLG | 3′ downstream | 2.8E-11 | − | T | C | 0.17 | −0.040 (0.006) | 5.8E-13 | 0.17 | −0.012 (0.011) | .27 |
rs3120137 | 160691182 | 6q25.3 | SLC22A3 | Intron | 3.6E-10 | − | T | C | 0.17 | −0.036 (0.006) | 3.2E-10 | 0.13 | −0.022 (0.012) | .07 |
rs4252137 | 161074440 | 6q26 | PLG | Intron | 1.9E-09 | − | A | G | 0.04 | −0.071 (0.011) | 4.8E-11 | 0.03 | −0.020 (0.024) | .39 |
rs10412972 | 56846717 | 19q13.33 | SIGLEC14 | 5′ upstream | 1.1E-08 | ++ | A | G | 0.16 | 0.026 (0.006) | 8.1E-06 | 0.17 | 0.041 (0.011) | 2.7E-04 |
rs11084102 | 56854528 | 19q13.33 | SIGLEC14 | 5′ upstream | 2.9E-08 | ++ | T | C | 0.14 | 0.026 (0.006) | 4.0E-05 | 0.15 | 0.047 (0.012) | 1.0E-04 |
rs3127573 | 160601383 | 6q25.3 | SLC22A2 | 5′ upstream | 4.1E-08 | − | C | T | 0.13 | −0.034 (0.006) | 9.4E-08 | 0.11 | −0.025 (0.013) | 6.6E-02 |
rs3127572 | 160602208 | 6q25.3 | SLC22A2 | 5′ upstream | 4.8E-08 | − | C | T | 0.13 | −0.033 (0.006) | 1.5E-07 | 0.12 | −0.025 (0.013) | 5.5E-02 |
A1, allele 1 (minor allele and tested allele); A2, allele 2 (major allele); Dir, direction of effect for the tested allele; SE, standard error.
*Position uses NCBI Build 36 coordinates (UCSC hg18), which is used throughout the study.
Conditional analyses
We performed a conditional analysis to screen for potential secondary signals masked by the top SNPs, to identify potential gene-to-gene interactions, and to clarify the number of independent association signals. To perform these conditional studies, top SNPs from the initial meta-analysis were introduced as covariates in a second round of analyses. First, when rs4252129 was included as a covariate, 2 Chr6 SNPs, rs1084651 and rs783149, which are in linkage disequilibrium (LD) with each other (r2 = 0.98), remained significant (Table 3), suggesting that they represent an independent signal from rs4252129. This is supported by the local LD patterns for rs4252129 and rs1084651, which are not in LD (r2 = 0.005) and are separated by a recombination hotspot (Figure 1B). Figure 2A displays the PLG distribution of subjects with different genotype combinations formed by the top 2 independent SNPs, rs4252129 and rs1084651, ordered by their allelic effect. The allelic effect of rs1084651 is observed within each stratum of rs4252129 genotypes (P = 3.5E-38), demonstrating that their effects were additive and independent. Additionally, the effect size and P value of the 2 Chr19 SNPs, rs10412972 and rs11084102, remained nearly identical (Table 3) when rs4252129 was included as a covariate, suggesting no interaction between Chr6 and Chr19 signals. The next round of conditional analyses used rs1084651 as an additional covariate and uncovered 2 new SNPs in Chr6: rs41272114 in a splice site of LPA, and rs783176 in PLG (Table 3). Figure 2B-C displays the PLG distribution in all genotype combinations formed by these 3 SNPs and similarly demonstrates that the allelic effect of rs41272114 or rs783176 is observed in most strata formed by the rs4252129-rs1084651 genotype combinations (P = 8.0E-40 and 2.8E-42, respectively). When rs41272114 was used in a final round of conditional analysis, no signal on Chr6 remained.
These results demonstrated 3 independent signals at 6q25.3-q26 that are associated with PLG levels.
SNP | Chr | Position | Gene | A1 | MAF | β | P0 | P1 | P2 |
---|---|---|---|---|---|---|---|---|---|
First round covariate: rs4252129 | |||||||||
rs1084651 | 6 | 161009807 | LPA | A | 0.18 | −0.038 | 1.1E-13 | 1.8E-15 | — |
rs783149 | 6 | 161008908 | LPA | A | 0.18 | −0.037 | 6.2E-13 | 1.1E-14 | — |
rs10412972 | 19 | 56846717 | SIGLEC14 | A | 0.16 | 0.030 | 3.0E-09 | 1.8E-08 | — |
rs11084102 | 19 | 56854528 | SIGLEC14 | T | 0.14 | 0.031 | 9.0E-09 | 4.1E-08 | — |
Second round covariates: rs4252129 + rs1084651 | |||||||||
rs10412972 | 19 | 56846717 | SIGLEC14 | A | 0.16 | 0.029 | 3.0E-09 | 1.8E-08 | 2.5E-08 |
rs41272114 | 6 | 160926067 | LPA | A | 0.05 | 0.056 | 0.087 | 0.15 | 2.5E-08 |
rs11084102 | 19 | 56854528 | SIGLEC14 | T | 0.14 | 0.031 | 9.0E-09 | 4.1E-08 | 3.8E-08 |
rs783176 | 6 | 161090825 | PLG | C | 0.18 | 0.028 | 0.080 | 4.3E-07 | 4.0E-08 |
P0, P in original association analysis; P1, P in the first conditional analysis with rs4252129 as a covariate; P2, P in the second conditional analysis with rs4252129 and rs1084651 as covariates.
The meta-analysis also revealed 2 significant SNPs on Chr19, which are in LD with each other (r2 = 0.84) and in the same LD block marked by recombination hotspots (Figure 1C). When the top Chr19 SNP, rs10412972, was used as a covariate in meta-analysis, no significant signal remained on Chr19. The allelic effect of rs10412972 is independently observed within most strata of the rs4252129-rs1084651 genotype combinations (Figure 2D), supporting the independence of the Chr19 and Chr6 signals (P = 6.6E-43). Taken together, these results identify 4 independent loci associated with PLG levels based on conditional analyses, LD patterns, and the PLG distribution across genotype combinations: 3 adjacent regions on Chr6 (PLG and LPA), and 1 on Chr19 (near SIGLEC14).
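The covariate-adjustment idea behind these conditional analyses can be illustrated with a small regression sketch. It is not the software used in the study: it relies on the standard result that the coefficient of a SNP in a regression that also includes a covariate SNP equals the slope obtained after residualizing both the phenotype and the SNP on that covariate. The function names and toy data are hypothetical, and no P values are computed.

```python
from statistics import mean


def residualize(y, x):
    """Residuals of y after simple linear regression on x (with intercept)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def conditional_beta(pheno, snp, covariate_snp):
    """Effect of `snp` on `pheno` after adjusting both for `covariate_snp`."""
    ry = residualize(pheno, covariate_snp)
    rx = residualize(snp, covariate_snp)
    return sum(a * b for a, b in zip(rx, ry)) / sum(a * a for a in rx)


# Toy genotypes (0/1/2 minor-allele counts) and a noise-free toy phenotype
top_snp = [0, 1, 2, 0, 1, 2]
second_snp = [0, 0, 1, 1, 2, 2]
pheno = [1.0 - 0.15 * g1 - 0.04 * g2 for g1, g2 in zip(top_snp, second_snp)]
print(round(conditional_beta(pheno, second_snp, top_snp), 3))
```

In this toy setting the second SNP keeps its own effect after conditioning on the first, which is the pattern expected for an independent signal; a SNP that merely tags the covariate SNP would instead see its effect shrink toward zero.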
Environmental factors and PLG
Female sex was associated with a 9.0% increase of log-transformed PLG levels in TSS (β = 0.090 ± 0.0064, P = 1.5E-40) and a 15.1% increase of PLG in GABC (β = 0.15 ± 0.012, P = 9.3E-30) (supplemental Figure 8A-B). The relationship between age and sex-adjusted PLG levels in TSS or GABC consisted of both a linear term and an age-squared term. Height and weight were not significantly associated with age- and sex-adjusted PLG levels in TSS or GABC.
Smoking status was associated with a 2.2% increase of PLG levels in the TSS cohort, and the effect was significant (β = 0.022 ± 0.0065, P = 8.4E-4; Figure 3A and supplemental Figure 8C). We examined whether the effects of the top associated SNPs and smoking status were independent in the TSS. Figure 3B shows the age- and sex-adjusted PLG levels for the 3 genotypes of the top SNP, rs4252129, and for smoking status, ordered by the direction of effect. The effect of smoking was observed in each stratum of rs4252129 genotypes, demonstrating that their effects are additive and independent. Figure 3C displays the genotype combinations of the top 2 independent SNPs, rs4252129 and rs1084651, as well as smoking status, and similarly establishes that the effect of smoking on PLG levels was independent of the top 2 associated SNPs.
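Because the phenotype is log-transformed, a regression coefficient can be translated into a percent change on the original scale. The sketch below assumes a natural-log transformation (the base is not stated in the text) and uses a hypothetical function name; for small coefficients the result is approximately 100 × β.

```python
import math


def percent_change_from_log_beta(beta: float) -> float:
    """Percent change in the raw trait implied by beta on a natural-log scale."""
    return (math.exp(beta) - 1.0) * 100.0


# Smoking coefficient reported for TSS
print(round(percent_change_from_log_beta(0.022), 1))
```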
Functional annotation of the 6q25.3-q26–associated regions
The associated regions on Chr6 contain the PLG gene, encoding the PLG protein. The top SNP from meta-analysis, rs4252129, codes for an amino acid substitution (R523W) in PLG. This missense variation is predicted to be benign/tolerated according to PolyPhen2 or SIFT. All other significant SNPs in the meta-analysis were noncoding (Table 2). None of the significant SNPs from either cohort or meta-analysis matched with any known eQTL in multiple tissues.26 Because it is possible that multiple rare variants could underlie the observed association to genotyped common variants, we reviewed the discovered variants in the Exome Sequencing Project database. Twenty-eight rare variants (MAF ≤ 1%) in PLG were predicted to be probably or possibly damaging with a cumulative MAF of 2.53% in European Americans (supplemental Table 4). Even in the unlikely event that every rare variant was in LD with the same surrogate common SNP, the power of detecting the association with such a SNP would be low given our sample size. Additionally, damaging variants that alter protein structure and/or function may not be associated with altered plasma levels of protein.
SIGLEC14 deletion polymorphism
Our results include 2 associated SNPs 5′ upstream of SIGLEC14. Previous studies of SIGLEC14 and the highly homologous SIGLEC5 have described a common gene fusion between SIGLEC14 and SIGLEC5 that results in a null allele of SIGLEC14 and an altered expression pattern of Siglec-5.27 In order to determine if the SNPs identified at SIGLEC14 were in LD with the deletion polymorphism at SIGLEC14 and to test the association of the deletion polymorphism with PLG levels, we performed polymerase chain reaction–based genotyping in the European subset of the GABC cohort. Out of 874 individuals that were genotyped, 292 (33%) were heterozygous and 24 (2.7%) were homozygous for the deletion. This deletion polymorphism was in Hardy-Weinberg equilibrium (P = .069) but was not associated with PLG levels (β = −0.0032, P = .76) (supplemental Figure 9) and was not in LD with any of the 27 genotyped SNPs near SIGLEC5 or SIGLEC14.
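A Hardy-Weinberg check on genotype counts such as these can be sketched with a Pearson chi-square test. The text does not state which test produced the reported P value, so this is an assumption for illustration; an exact test would give a somewhat different number, and the output here need not match the reported value. The function name is hypothetical.

```python
import math


def hwe_chi_square(n_ref_hom: int, n_het: int, n_alt_hom: int):
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium."""
    n = n_ref_hom + n_het + n_alt_hom
    q = (2 * n_alt_hom + n_het) / (2 * n)  # frequency of the alternate allele
    p = 1.0 - q
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_ref_hom, n_het, n_alt_hom)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    p_value = math.erfc(math.sqrt(chi2 / 2.0))  # upper tail of chi-square(1)
    return chi2, p_value


# Counts from the text: 874 genotyped, 292 heterozygous, 24 homozygous deletion
chi2, p_value = hwe_chi_square(874 - 292 - 24, 292, 24)
print(f"chi2 = {chi2:.2f}, P = {p_value:.3f}")
```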
Linkage analysis for PLG in 1139 GABC and 138 TSS sibs
The linkage analysis for the 557 sibships using 35 356 LD clusters (supplemental Methods) revealed the strongest independent region of linkage in 17q22-q25.3 (LOD = 2.2, P value = 8.0E-4) (supplemental Table 5 and Figure 4). However, when we evaluated the genome-wide significance of the linkage results, none of the top 10 independent regions of linkage had higher LOD scores than the 95th percentile of their respective equal-ranked LOD score null distributions among the 1000 simulations of randomized phenotypes as described previously28,29 (supplemental Figure 10B). Analysis by SOLAR (Sequential Oligogenic Linkage Analysis Routines)30 revealed that the power to detect a QTL having per-locus heritability of 10% with a LOD score of 2 was only 3% for the sample size of our studies (supplemental Figure 11), suggesting that we were under-powered to detect all but the strongest linkage signals for PLG in the TSS and GABC siblings. The highest LOD score for the significant regions identified in meta-analysis was 0.72 on the 19q13.3 region (P = .03).
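The relationship between a LOD score and its pointwise P value can be sketched with the conventional asymptotic conversion, in which 2·ln(10)·LOD is treated as a 1-df chi-square statistic and the P value is halved because the test is one-sided. This is an assumption about the conversion, not a statement of how the linkage software computed its values; the function name is hypothetical.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """One-sided pointwise P for a LOD score, using chi-square(1) = 2 ln(10) LOD."""
    chi2 = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi2 / 2.0))


# LOD scores quoted in the text
for lod in (2.2, 0.72):
    print(lod, f"{lod_to_pointwise_p(lod):.1e}")
```

Pointwise values like these do not account for genome-wide multiple testing, which is why the simulation-based null distributions described above are needed to judge genome-wide significance.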
Discussion
Previous GWASs have been conducted for other fibrinolytic factors, including plasma levels of PAI-1 and tPA, but have not studied PLG.11,12 In this report, the estimated heritability of PLG was 48.1% based on genotyped data for all individuals, and this value was comparable to the h2 estimate of 43% reported in a cohort of Mexican Americans.9 The estimates of h2 in 2 other studies involving individuals with thrombosis or stroke were much lower, at 23.6% and 18.2%, respectively.31,32 Our heritability estimate may have been higher than the latter reports due to the healthy and young nature of our subjects, which could have decreased the amount of variance in PLG levels due to unknown environmental influences. For example, PLG is a known acute-phase reactant, so we would expect the plasma concentration to be elevated in individuals undergoing stress or illness.33
Our results demonstrated that smoking exposure was associated with a 2.2% increase in PLG levels in the TSS cohort, where cigarette use was common (31.4%). This finding is consistent with a previous study that observed an average increase in PLG of 3.6% in smokers compared with nonsmokers.34 The effect of smoking on PLG levels was independent of the genetic effects of DNA variants at PLG and LPA. Cigarette smoking is known to increase the expression of a variety of proinflammatory factors in epithelial cells35 and is a known risk factor for acute coronary thrombosis and other vascular diseases where thrombosis plays a major pathophysiologic role.36 Indeed, previous studies have shown that the PLG gene contains acute-phase responsive elements, which may provide a mechanistic explanation of increased PLG levels in smokers.37 Consistent with our findings, smoking appears to be associated with an increase in C-reactive protein, a classic acute-phase reactant, in a study of adolescents.38
The strongest genetic association for PLG levels was with a nonsynonymous SNP, rs4252129, in the PLG gene itself, with a MAF of 1.4% in GABC and 2.2% in TSS. The minor allele T (R523W) was associated with a 13.4% decrease in mean plasma PLG levels in the combined TSS and GABC cohorts. Although this variant is predicted to be benign by PolyPhen2 and SIFT, these tools are intended to predict the impact on protein function rather than on plasma protein levels.39,40 Although we were unable to find other variants in LD with this SNP, we cannot rule out the possibility that rs4252129 tags an undiscovered but functional variant. However, it is reasonable to speculate that rs4252129 (T) is a functional mutation leading to decreased PLG levels through altered rates of synthesis, secretion, or plasma clearance.
The second strongest signal, rs1084651, was in an intron of the LPA gene, a paralog of PLG located 35 kb upstream of the PLG gene. This SNP has been previously associated with levels of total cholesterol and high-density lipoprotein.41 Elevated levels of Lp(a) are an independent risk factor for cardiovascular disease.42-44 The LPA gene is highly homologous (80%) to PLG45 and encodes apolipoprotein(a) that complexes with other lipoproteins to form Lp(a). Apolipoprotein(a) has no protease function but contains a variable number of kringle IV domain repeats and exhibits a wide range of plasma levels among individuals.46 The majority of the variation in plasma Lp(a) was associated with SNPs in the LPA locus that affect the number of kringle IV domain repeats expressed in apolipoprotein(a).47 In the circulation, Lp(a) competes with PLG, PAI-1, and tPA for fibrin binding and therefore may have an antifibrinolytic effect. Interestingly, the other top SNPs in our study, including rs4252129, rs783149, and rs783176, have been associated with Lp(a) levels in patients with carotid artery disease, type 2 diabetes, or coronary artery disease.44,48,49 Although the top associated SNP for Lp(a) levels in these studies, rs10455872, was not significantly associated with PLG levels in the healthy subjects of our study (P = .79), 3 SNPs reported here (rs4252129, rs783149, and rs783176) had the same direction of allelic effect on both PLG in our study and Lp(a) levels in the studies mentioned above. For example, the minor allele T of rs4252129 is associated with decreased PLG levels (β = −0.14) in the combined TSS and GABC cohorts and associated with decreased Lp(a) levels (β = −0.4) in patients with carotid artery disease.48 Taken together, these results suggest that the levels of PLG and Lp(a) share some measure of genetic control by common variants at the 6q25.3-q26 locus.
We also compared our PLG association results to other large meta-analyses of the related fibrinolytic proteins PAI-1 (reference 30) and tPA (reference 12) but did not identify any signals in common.
The third-strongest signal for PLG was in SNPs 5′ upstream of the SIGLEC14 and SIGLEC5 genes. Like PLG and LPA, these genes are adjacent paralogs with extensive sequence similarities.50 Siglec (sialic acid-binding Ig-like lectin) proteins have diverse biologic functions and serve as membrane-bound receptors for a large variety of sialylated glycoproteins. Siglec-5 may function as a clearance receptor for coagulation factor VIII and von Willebrand factor,51 although no variant has been identified near SIGLEC5 in large GWAS or linkage studies for factor VIII and von Willebrand factor levels.29,52 Siglec-14 is expressed on granulocytes and monocytes, whereas Siglec-5 is expressed on granulocytes and B cells.27 Loss of Siglec-14 expression due to a common gene fusion event has been associated with protection from chronic obstructive pulmonary disease in a Japanese population where the polymorphism is most common.53 We detected no significant association between PLG and the deletion polymorphism or any evidence of LD between the genotyped SNPs near SIGLEC14 and the deletion polymorphism. This is consistent with the ancient nature of this polymorphism, as it is present in all human populations,27 and strongly suggests that the significant SIGLEC14 SNPs in the meta-analysis tag haplotypes that are independent of the deletion polymorphism.
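LD between a SNP and a biallelic deletion polymorphism can be approximated from unphased genotypes as the squared Pearson correlation of allele dosages. The sketch below is illustrative only: the function name and the toy genotype vectors are hypothetical, they are not data from this study, and this is not necessarily the method used for the analysis reported here.

```python
def dosage_r2(x, y):
    """Squared Pearson correlation between two vectors of allele dosages (0, 1, 2).

    For unphased genotypes this is a common approximation to the LD measure r^2.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("a monomorphic marker has undefined r^2")
    return (sxy * sxy) / (sxx * syy)


# Hypothetical toy genotypes: minor-allele counts at a SNP and deletion-allele
# counts at a second locus, one entry per individual.
snp      = [0, 1, 2, 0, 1, 2, 0, 1, 2]
deletion = [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(dosage_r2(snp, deletion))  # uncorrelated by construction
print(dosage_r2(snp, snp))       # a marker is in complete LD with itself
```

A value near 0, as in the first toy comparison, corresponds to the situation described above, in which the associated SNPs tag haplotypes independent of the deletion.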
To further explore the potential functional link between PLG and the observed association in the Chr19 region, we examined the eQTLs, coexpression patterns, gene expression, and LD patterns for genes in the region of interest. The SIGLEC14 SNPs were not known cis-eQTLs for SIGLEC14. In microarray data sets, PLG expression showed a weak negative correlation (r = −0.22) with SIGLEC14 expression. Overall, the LD patterns still strongly favor SIGLEC14, which is 5 kb away from the significant SNPs, as the gene locus linked to the association with PLG levels. Further details of these database investigations are available in supplemental Results.
To discover additional genetic signals marked by allele-sharing patterns in siblings but undetected by GWAS, we performed linkage analyses. These studies did not detect significant signals (Figure 4 and supplemental Table 5). Based on the linkage power analysis, we predicted that only loci accounting for greater than 35% of the trait heritability would be detected. Therefore, we have not ruled out the existence of other loci that contribute to PLG level variance.
In summary, this report details the results from genome-wide association and linkage studies of PLG levels in 2 healthy young cohorts. We identify 4 independent signals in the LPA, PLG, and SIGLEC14 loci that collectively explain 6.43% (genotyped SNPs) to 10.2% (imputed SNPs) of the variance in PLG levels. Taken together, these findings suggest that common variants in PLG and LPA are the major common genetic determinants of plasma PLG levels, whereas SNPs 5′ upstream of SIGLEC14, which were significant in the meta-analysis only, await further replication and functional verification.
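As a rough illustration of how several independent signals can add up to a combined proportion of variance explained, the sketch below sums the per-SNP contribution 2p(1 − p)β² across SNPs. It assumes Hardy-Weinberg equilibrium, purely additive effects, no LD between the SNPs, and effect sizes β expressed in units of the trait's standard deviation. The function name and the input values are hypothetical; they are not the estimates from this study, and this is not necessarily how the percentages above were obtained.

```python
def variance_explained(snps):
    """Approximate fraction of trait variance explained by independent SNPs.

    snps: iterable of (allele_frequency, beta) pairs, with beta in units of the
        trait's standard deviation (ASSUMPTION: standardized trait).
    Each SNP contributes 2 * p * (1 - p) * beta**2 under Hardy-Weinberg
    equilibrium and an additive model; contributions are summed, which is
    valid only if the SNPs are not in LD with one another.
    """
    total = 0.0
    for freq, beta in snps:
        if not 0.0 <= freq <= 1.0:
            raise ValueError("allele frequency must be between 0 and 1")
        total += 2.0 * freq * (1.0 - freq) * beta ** 2
    return total


# Hypothetical inputs: one common SNP with a small effect and one rare SNP
# with a large effect.
example = [(0.5, 0.2), (0.02, -1.0)]
print(f"{variance_explained(example):.4f}")
```

The example shows why a rare variant with a large per-allele effect can account for a share of variance comparable to that of a common variant with a small effect.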
Presented in abstract form at the 55th annual meeting of the American Society of Hematology, New Orleans, LA, December 8, 2013.
The online version of the article contains a data supplement.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.
Acknowledgments
The authors recognize the contributions of the participants of the Genes and Blood Clotting Studies and the Trinity Student Studies to these analyses.
This work was supported by the Intramural Research Programs of the National Human Genome Research Institute and the Eunice Kennedy Shriver National Institute of Child Health and Human Development. This work was also supported by National Institutes of Health National Heart, Lung, and Blood Institute grants R37HL039693 (K.C.D. and D.G.) and R01HL112642 (D.G., J.Z.L., A.B.O., Q.M., and K.C.D.). Additionally, D.G. is a Howard Hughes Medical Institute Investigator.
Authorship
Contribution: Q.M., D.G., R.K., J.L.M., L.B., A.M., J.Z.L., and K.C.D. designed the research; B.M., R.K., and K.C.D. performed the experiments; Q.M., A.B.O., S.R., J.Z.L., and K.C.D. analyzed results; H.L. and Y.G. performed bioinformatic analysis of the Chr19-associated region; Q.M. made the figures; and Q.M., D.G., J.Z.L., and K.C.D. wrote the paper.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Karl C. Desch, Department of Pediatrics and Communicable Disease, University of Michigan, Ann Arbor, MI 48109; e-mail: kdesch@med.umich.edu.