Human-specific lncRNA GATA2AS is a regulator of erythroid differentiation.
GATA2AS is a novel GATA2 and HBG activator through distinct mechanisms.
Visual Abstract
Long noncoding RNAs (lncRNAs) are extensively expressed in eukaryotic cells and have been revealed to be important for regulating cell differentiation. Many lncRNAs have been found to regulate erythroid differentiation in the mouse. However, given the low sequence conservation of lncRNAs between mouse and human, our understanding of lncRNAs in human erythroid differentiation remains incomplete. lncRNAs are often transcribed opposite to protein coding genes and regulate their expression. Here, we characterized a human erythrocyte-expressed lncRNA, GATA2AS, which is transcribed opposite to erythroid transcription regulator GATA2. GATA2AS is a 2080-bp long, primarily nucleus-localized noncoding RNA that is expressed in erythroid progenitor cells and decreases during differentiation. Knockout of GATA2AS in human HUDEP2 erythroid progenitor cells using CRISPR-Cas9 genome editing to remove the transcription start site accelerated erythroid differentiation and dysregulated erythroblast gene expression. We identified GATA2AS as a novel GATA2 and HBG activator. Chromatin isolation by RNA purification showed that GATA2AS binds to thousands of genomic sites and colocalizes at a subset of sites with erythroid transcription factors including LRF and KLF1. RNA pulldown and RNA immunoprecipitation confirmed interaction between GATA2AS and LRF and KLF1. Chromatin immunoprecipitation sequencing (ChIP-seq) showed that knockout of GATA2AS reduces binding of these transcription factors genome wide. Assay for transposase-accessible chromatin sequencing (ATAC-seq) and H3K27ac ChIP-seq showed that GATA2AS is essential to maintain the chromatin regulatory landscape during erythroid differentiation. Knockdown of GATA2AS in human primary CD34+ cells mimicked results in HUDEP2 cells. Overall, our results implicate human-specific lncRNA GATA2AS as a regulator of erythroid differentiation by influencing erythroid transcription factor binding and the chromatin regulatory landscape.
Introduction
About 75% of the human genome is transcribed,1 but only 2% of transcripts represent protein coding genes and the rest represent noncoding genes, including microRNAs (miRNAs), transfer RNAs (tRNAs), and long noncoding RNAs (lncRNAs). lncRNAs are transcripts longer than 200 base pairs (bp) without protein coding potential; they number in the tens of thousands and are poorly conserved compared with protein coding genes.2,3 lncRNAs can be categorized as divergent lncRNAs transcribed opposite to messenger RNAs (mRNAs), protein coding gene body-associated lncRNAs, intergenic lncRNAs, and enhancer RNAs (eRNAs), which originate from enhancers. Divergent lncRNAs represent a large fraction of all lncRNAs.2,4 lncRNAs can regulate almost all aspects of gene expression, from transcription to translation, by diverse mechanisms.5,6 In the nucleus, lncRNAs can regulate transcription of target genes by recruiting protein complexes and sequestering transcription factors.7 The expression of lncRNAs is more cell type specific than that of protein coding genes,8 supporting a function in cell differentiation. Indeed, many lncRNAs have been reported to be essential for cell lineage choice.9
Hematopoiesis is the formation of diverse blood cell lineages from hematopoietic stem cells (HSCs), which is continuous during the lifetime of an organism. Hematopoiesis results in production of diverse cell types from HSCs, including erythroid cells, macrophages, and lymphocytes in a controlled fashion by well-defined transcription factors.10 For example, RUNX1, TAL1, LMO2, and GATA2 are required for maintenance of HSCs. In addition to transcription factors, lncRNAs have also been found to regulate hematopoiesis.11 However, many studies used the mouse as a model system to investigate lncRNA function in erythropoiesis, which may not fully reflect the situation in humans, given that >80% of mouse erythroid lncRNAs are not detected in human cells.12,13
GATA2AS is a lncRNA transcribed opposite to GATA2 that was reported in transcriptomic data of human gastric cancer cells.14 In non–small cell lung cancer cells, GATA2AS interacts with GATA1 to inhibit GATA2 transcription directly and represses cell proliferation.15 By contrast, in colorectal cancer cells, GATA2AS recruits DDX3X to stabilize GATA2 mRNA and, in turn, GATA2 binds to the GATA2AS promoter to increase GATA2AS expression.16 This feedback loop between GATA2AS and GATA2 promotes colorectal cancer cell proliferation. Other work reported a patient with a heterozygous duplication of the GATA2/GATA2AS locus who had increased GATA2AS expression and reduced GATA2 expression from all 3 copies of GATA2.17 The patient had a GATA2-deficient phenotype including mononuclear cell deficiency and multilineage dysplasia. Whether GATA2AS regulates GATA2 expression in erythroid cells or affects erythroid differentiation is unknown.
Here, we used immortalized human HUDEP2 erythroid progenitor cells and human cord blood CD34+ progenitor cells to study GATA2AS function in erythropoiesis.18 Knockout (KO) of GATA2AS revealed that GATA2AS regulates proliferation and differentiation of erythroid progenitor cells. GATA2AS is targeted to thousands of genomic sites in HUDEP2 cells and colocalizes at a subset of sites with erythroid transcription factors including LRF and KLF1.19,20 GATA2AS functions to maintain the regulatory landscape of erythropoiesis including chromatin accessibility and enhancers. In conclusion, our study demonstrates GATA2AS is essential for erythropoiesis.
Methods
RT-qPCR and RNA-seq
RNA was extracted with a PureLink RNA Mini Kit (Invitrogen; catalog no. 12183020), followed by DNase digestion (Invitrogen; catalog no. 12185-010). Total RNA (2 μg) was reverse transcribed to cDNA by the SuperScript III First-Strand Synthesis System (Invitrogen; catalog no. 18080051). RNA sequencing libraries were constructed from 500 ng total RNA with the Truseq Stranded mRNA sample preparation kit (Illumina; catalog no. 20020493). Library quality was validated with an Agilent Bioanalyzer 2100. Fifty bp single-end sequencing was performed with HiSeq-3000 (Illumina). For probes see supplemental Table 1, available on the Blood website.
RNA pulldown
RNA pulldown was performed as described.21 Briefly, full length GATA2AS was cloned into pcDNA3.1 vector. In vitro transcription and biotin labeling were performed with a MEGAscript T7 Kit (Invitrogen; catalog no. AM1333) according to manufacturer’s protocol. GATA2AS RNA (2 μg) was incubated with 2 mg precleared protein extracted from K562 cells for 2 hours at 4°C; 100 μL washed M-280 streptavidin Dynabeads (Invitrogen; catalog no. 11205D) was added to incubate for another 2 hours. Proteins were eluted and subjected to western blotting.
RIP
RNA immunoprecipitation (RIP) was performed as described22 with modifications. Briefly, HUDEP2 cells were fixed with 1% formaldehyde (Thermo; catalog no. 28908) for 10 minutes at room temperature and quenched with 0.125 M glycine. Nuclei were sonicated with a Bioruptor (Diagenode). Lysate from ∼7.5 × 10⁶ cells was aliquoted for each IP and antibodies were added. RNA was extracted with a miRNeasy Mini Kit (Qiagen; catalog no. 217004). cDNA was reverse transcribed from RNA with the SuperScript III First-Strand Synthesis System (Invitrogen; catalog no. 18080051). qPCR for GATA2AS and GAPDH was performed. For antibodies, see supplemental Table 1.
Chromatin isolation by RNA purification sequencing (ChIRP-seq)
GATA2AS probes were designed using Stellaris Probe Designer 4.2 with 3’ Bio-TEG modification. LacZ probes were from Millipore (CS216572). Chromatin isolation by RNA purification sequencing (ChIRP-seq) was performed as described.23 Briefly, 4 × 10⁷ cells were fixed with 1% glutaraldehyde (Sigma-Aldrich; catalog no. G5882), sonicated with a Bioruptor (Diagenode), and aliquoted to input, odd, and even samples. After probe incubation, RNA was extracted with a miRNeasy Mini Kit (Qiagen; catalog no. 217004), and a SuperScript III First-Strand Synthesis System (Invitrogen; catalog no. 18080051) was used to reverse transcribe cDNA. A NEBNext Ultra II DNA Library Prep Kit for Illumina (NEB; catalog no. E7645S) was used for library construction. Library quality was validated with an Agilent Bioanalyzer 2100. Single-end sequencing of 50 bp was performed with HiSeq-3000 (Illumina). For probes, see supplemental Table 1.
Chromatin immunoprecipitation sequencing (ChIP-seq)
ChIPmentation was performed as described.24 Briefly, HUDEP2 cells were fixed with 1% formaldehyde (Thermo; catalog no. 28908) and sonicated with a Bioruptor (Diagenode) to 300 to 500 bp. A total of 5 × 10⁶ cells were used for each IP. After antibody incubation, tagmentation was performed on Dynabeads A or G (Invitrogen; catalog no. 10002D and 10007D) with TDE1 Tagment DNA Enzyme (Illumina; catalog no. 20034210). Libraries were amplified with KAPA HiFi HotStart ReadyMix (KAPA Biosystems, KK2602). Library quality was validated with an Agilent Bioanalyzer 2100. Single-end sequencing of 50 bp was performed with HiSeq-3000 (Illumina). For antibodies, see supplemental Table 1.
Assay for transposase-accessible chromatin sequencing (ATAC-seq)
ATAC-seq was performed as described.25 HUDEP2 cells (50 000) were used for each library. A total of 3.5 μL TDE1 Tagment DNA Enzyme (Illumina; catalog no. 20034210) was added to 50 μL transposition mixture and incubated for 1 hour at 37°C with shaking at 1000 rpm. Reactions were purified with a MinElute PCR Purification Kit (Qiagen; catalog no. 28004). Libraries were amplified with NEBNext High Fidelity 2× Master Mix (NEB, M0541L) for 6 cycles, purified with 1.8× volume SPRIselect beads (Beckman Coulter, B23318), and size-selected with 0.65×/1.15× SPRIselect beads. Library quality was validated with an Agilent Bioanalyzer 2100. Paired-end sequencing of 50 bp was performed with HiSeq-3000 (Illumina).
Quantification and statistical analysis
Two biological replicates were performed for sequencing. For other results, 3 biological replicates were performed. All data were expressed as mean ± standard deviation. GraphPad Prism 8.0 (GraphPad Software) was used to perform statistical analyses. For complete data analysis, see supplemental Methods.
Additional experimental details appear in supplemental Methods.
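The summary statistic described under Quantification and statistical analysis (mean ± standard deviation over 3 biological replicates) can be illustrated with a minimal Python sketch. The replicate values below are invented for illustration, and the use of the sample standard deviation (n − 1 denominator) is an assumption; the source does not state which form was used.

```python
from statistics import mean, stdev

def summarize(replicates):
    """Return (mean, sample standard deviation) for replicate measurements."""
    if len(replicates) < 2:
        raise ValueError("need at least two replicates to estimate a standard deviation")
    # stdev() uses the n - 1 denominator (sample SD); this choice is an assumption.
    return mean(replicates), stdev(replicates)

# Hypothetical relative-expression values from 3 biological replicates.
values = [1.0, 1.2, 0.8]
m, sd = summarize(values)
print(f"{m:.2f} ± {sd:.2f}")
```

With only 3 replicates the standard deviation is itself a rough estimate, so it is best read as a description of spread rather than a precise measure.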
Results
GATA2AS is a GATA2-divergent lncRNA transcribed in human erythroid progenitor cells
We mapped the 5′ and 3′ ends of GATA2AS in erythroid K562 cells by rapid amplification of cDNA ends (RACE), detecting a 2080 bp transcript with 3 exons (Figure 1A-B). The coding-potential assessment tool (CPAT) supported that GATA2AS is a noncoding RNA (coding probability = 0.019; coding label = no).26 GATA2AS is transcribed opposite to GATA2 and partially overlaps 1 long GATA2 isoform. The exon sequence of GATA2AS is not conserved between human and mouse, but the known GATA2 regulatory elements at –1.8 kb and –2.8 kb from the transcription start site (TSS) are shared (Figure 1B).27,28 RNA fluorescence in situ hybridization (FISH) and RT-qPCR localized GATA2AS primarily in the nucleus in K562 cells (Figure 1C-D). RT-qPCR showed that GATA2AS expression decreased during differentiation of human HUDEP2 erythroid progenitor cells; GATA2 gene transcription also decreased, and β-globin gene transcription was induced, as expected (Figure 1E).
Identification and KO of GATA2AS in human erythroid cells. (A) RACE was used to identify the 5’ and 3’ ends of full-length GATA2AS, and full-length GATA2AS was cloned for sequencing. (B) Browser shot of human and mouse GATA2 gene loci. The known and novel GATA2AS transcripts in human erythroid cells are shown in green and purple. The 2 promoters of GATA2 are highlighted in blue. The basewise conservation track at the bottom shows conservation of the 2 GATA2 regulatory elements (highlighted in yellow) and GATA2 exons. (C) GATA2AS distribution in cells was detected by RNA FISH in K562 cells. Alexa Fluor 594 (red) labeled the antisense GATA2AS transcript. (D) GATA2AS distribution in K562 cells was detected by RT-qPCR. Cytoplasm-located ACTB and nuclear-located XIST transcripts were used as controls. (E) Expression pattern of GATA2AS during HUDEP2 differentiation was detected by RT-qPCR for undifferentiated and differentiated (day 7) cells. GATA2 and β-globin genes were used as markers of erythroid differentiation. (F) GATA2AS expression was detected in GATA2AS KO and control HUDEP2 cells by RT-qPCR. (G) Proliferation curves of GATA2AS KO HUDEP2 cells during differentiation. (H) Differentiation method (left schematic) of HUDEP2 cells and representative flow cytometry results for differentiation markers CD235a and CD71 in control and GATA2AS KO cells on days 0, 4, 7, and 9. (I) Cell surface expression of CD235a and CD71 during erythroid differentiation, presented as median fluorescence intensity (MFI).
KO of GATA2AS affects erythroid cell proliferation and differentiation
To investigate GATA2AS function, we knocked out GATA2AS in HUDEP2 cells by deleting the TSS using CRISPR-Cas9 genome editing (supplemental Figure 1A). Two GATA2AS KO clones were obtained and deletion verified by Sanger sequencing (supplemental Figure 1B-C). GATA2AS expression was almost undetectable in KO cells by RT-qPCR and immunofluorescent staining (Figure 1F; supplemental Figure 1D). GATA2AS KO cells displayed decreased proliferation after differentiation compared with controls (Figure 1G). During differentiation of HUDEP2 cells, CD235a gradually increases, whereas CD71 decreases.18 Flow cytometry for these markers revealed that KO of GATA2AS accelerated gain of CD235a during differentiation and accelerated loss of CD71 (Figure 1H-I). Thus, loss of GATA2AS promotes differentiation of HUDEP2 cells.
KO of GATA2AS dysregulates expression of genes enriched in erythroid differentiation pathways
To ask how KO of GATA2AS affects gene expression, we performed RNA-seq for control and GATA2AS KO HUDEP2 cells. There were thousands of differentially expressed genes (DEGs) at different times during differentiation (Figure 2A-B; supplemental Figure 2A-C). A subset of 365 genes was dysregulated at all stages, including heme synthetic enzymes FECH, HMBS, and UROS and erythrocyte membrane protein EPB42 (supplemental Table 2). The largest effect of GATA2AS loss on the transcriptome (>5000 DEGs) was observed for the most mature cells on day 9 of differentiation. Genes associated with the HSC differentiation pathway were upregulated at all stages of differentiation after GATA2AS KO (Figure 2C; supplemental Figure 2A-C).
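The idea of a gene set that is dysregulated at every stage can be sketched as a set intersection over per-time-point DEG calls. The following Python sketch is illustrative only: the twofold threshold follows the Figure 2A legend, whereas the adjusted P value cutoff, the time point labels, the placeholder gene `GENE_X`, and all numeric values are assumptions invented for the example.

```python
def call_degs(results, lfc_cutoff=1.0, q_cutoff=0.05):
    """Return genes with |log2 fold change| >= lfc_cutoff and adjusted P < q_cutoff.

    results maps gene name -> (log2_fold_change, adjusted_p).
    lfc_cutoff=1.0 corresponds to a twofold change; q_cutoff is an assumed value.
    """
    return {
        gene
        for gene, (lfc, q) in results.items()
        if abs(lfc) >= lfc_cutoff and q < q_cutoff
    }

def shared_degs(per_timepoint):
    """Return genes called as DEGs at every time point."""
    sets = [call_degs(results) for results in per_timepoint.values()]
    return set.intersection(*sets) if sets else set()

# Toy input with invented numbers; only the gene symbols FECH and HMBS come from the text.
per_timepoint = {
    "early": {"FECH": (-1.5, 0.001), "HMBS": (-1.2, 0.01), "GENE_X": (0.4, 0.2)},
    "late": {"FECH": (-2.0, 0.0001), "HMBS": (-0.5, 0.03), "GENE_X": (1.8, 0.001)},
}
print(sorted(shared_degs(per_timepoint)))  # prints ['FECH']
```

In this toy case HMBS passes at the early time point but falls below the twofold threshold later, so only FECH is shared.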
Gene expression changes in HUDEP2 GATA2AS KO cells are consistent with phenotypic changes. (A) RNA-seq of GATA2AS KO and control HUDEP2 cells on day 0 of differentiation. MA plots of DEGs, the number of upregulated and downregulated genes, representative erythroid genes, and a twofold change threshold are indicated. (B) Venn diagram shows overlapping DEGs between different time points during differentiation for control and GATA2AS KO HUDEP2 cells. The number of overlapped DEGs is indicated. (C) Gene ontology (GO) analysis of differentially expressed genes and pathways associated with cell proliferation and erythroid differentiation are shown. (D) Differential gene expression compared with control is shown for select DEGs in HUDEP2 cells after loss of GATA2AS, including α-globins HBA1 and HBA2, adult HBB and HBD, fetal HBG1 and HBG2, and GATA1/2. (E) Heat map shows log2 fold change for DEGs in panel D at different days of differentiation of GATA2AS KO and control cells with P values superimposed. ∗P < .05; ∗∗P < .01; ∗∗∗P < .001; ∗∗∗∗P < .0001.
GATA2AS KO decreased GATA2 in undifferentiated cells and GATA1 in differentiated cells, when each was maximally expressed (Figure 2D). HBB, HBD, and α-globin genes HBA1 and HBA2 were upregulated earlier than in control cells after KO of GATA2AS, consistent with accelerated differentiation (Figure 2D). HBG1 and HBG2, encoding the fetal hemoglobin γ-globin subunits, were reduced across differentiation (Figure 2D). Reduced HBG1/2 transcription after GATA2AS KO was reflected in an 85% drop in fetal hemoglobin (HbF)-positive cells (supplemental Figure 2D). Western blotting showed decreased HbF after GATA2AS KO, and high-performance liquid chromatography analysis indicated a decrease of (γ1 + γ2)/total β-like globin chains from 2.45% to 0.81% (supplemental Figure 2E-F). Failure to upregulate HBG during differentiation could not be attributed to dysregulation of repressors BCL11A, LRF (ZBTB7A), SOX6, or ETO2 (CBFA2T3).29-32 Although SOX6 was elevated in D0 cells, as differentiation progressed, all the repressors were either unaffected by GATA2AS loss or had decreased expression (supplemental Figure 2G). Overall, the loss of GATA2AS has a large effect on the transcriptome of erythroid progenitor cells during differentiation.
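For readers comparing the HPLC result with the percentage drop in HbF-positive cells, the change from 2.45% to 0.81% can be restated as a relative change. The short Python sketch below only does this arithmetic on the two reported values; the function name is hypothetical and the derived percentage is a restatement, not an additional measurement.

```python
def relative_change(before, after):
    """Return the fractional change from before to after (negative means a decrease)."""
    return (after - before) / before

# Reported (γ1 + γ2)/total β-like globin fractions, in percent.
change = relative_change(2.45, 0.81)
print(f"{change:.1%}")  # prints -66.9%
```

That is, the γ-globin fraction fell by about two-thirds relative to control.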
A subset of GATA2AS sites colocalize with erythroid transcription factors
We performed ChIRP-seq to find direct targets of GATA2AS (supplemental Figure 3A-B). There were 6793 overlapped GATA2AS peaks between ChIRP odd and even probes (q-value <0.05) across the genome in HUDEP2 cells, including prominent peaks at the GATA2AS exons. A small subset of ∼10% of GATA2AS peaks was at gene promoters, whereas most were intronic and intergenic, predominantly 5 to 500 kb upstream or downstream of gene promoters (Figure 3A-B). Among GATA2AS KO DEGs, 516 named genes have GATA2AS peaks at their promoters, including GATA2, BCL11A, and SOX6 (supplemental Table 3). HBG1/2 do not have GATA2AS peaks at their promoters. Thus, GATA2AS directly regulates a subset of genes through their promoters, but many DEGs are likely regulated by GATA2AS-occupied enhancers, consistent with GATA2AS genomic location, or indirectly.
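Requiring a peak to be present in both the odd and even probe pools is what guards against probe-specific artifacts in ChIRP-seq. The Python sketch below shows one way such an overlap could be computed. It is not the pipeline used here (see supplemental Methods), the coordinates are invented, and it assumes half-open intervals and that peaks within each pool do not overlap one another.

```python
from collections import defaultdict

def overlapping_peaks(odd, even):
    """Return the intersected regions of odd-pool and even-pool peaks.

    Peaks are (chrom, start, end) tuples with half-open coordinates.
    Assumes peaks within each pool are non-overlapping.
    """
    by_chrom = defaultdict(lambda: ([], []))
    for chrom, start, end in odd:
        by_chrom[chrom][0].append((start, end))
    for chrom, start, end in even:
        by_chrom[chrom][1].append((start, end))

    shared = []
    for chrom in sorted(by_chrom):
        a, b = (sorted(peaks) for peaks in by_chrom[chrom])
        i = j = 0
        while i < len(a) and j < len(b):
            start = max(a[i][0], b[j][0])
            end = min(a[i][1], b[j][1])
            if start < end:  # the two peaks share at least 1 bp
                shared.append((chrom, start, end))
            # Advance whichever peak ends first.
            if a[i][1] < b[j][1]:
                i += 1
            else:
                j += 1
    return shared

# Hypothetical peak coordinates for illustration only.
odd_peaks = [("chr3", 100, 200), ("chr3", 500, 600), ("chr11", 50, 80)]
even_peaks = [("chr3", 150, 250), ("chr3", 700, 800), ("chr11", 10, 60)]
print(overlapping_peaks(odd_peaks, even_peaks))
```

Peaks found in only one pool, such as the chr3 peak at 500 to 600 in this toy input, are discarded.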
GATA2AS is localized to genomic sites enriched for erythroid transcription factors. (A) Genomic distribution of GATA2AS binding sites in uninduced HUDEP2 cells (day 0). GATA2AS peaks were separately called from odd and even probes, and 6793 overlapped peaks were used. (B) Distances to a TSS of GATA2AS binding sites were plotted by GREAT. (C) Known motifs enriched in GATA2AS binding sites were analyzed with HOMER and representative motifs of erythroid transcription factors are shown. q-values for all motifs are <0.0001. (D) Biotin-labeled GATA2AS RNA was used to pulldown GATA2AS-interacting proteins. Western blotting was used to detect LRF, KLF1, GATA1, and TAL1. (E) Interactions between GATA2AS and LRF or KLF1 were detected by RIP. RNAs interacting with LRF or KLF1 were detected by RT-qPCR. Normal IgG antibody and GAPDH RNA were used as controls. (F) Heat maps showing GATA2AS peaks grouped into 2 sets depending on overlap with LRF peaks. Published ChIP-seq data of LRF, GATA1, KLF1, TAL1, BCL11A, and LDB1 in HUDEP2 cells33,34 were plotted on the 2 sets of GATA2AS peaks. (G) Venn diagram showing GATA2AS-associated regions by GREAT and their overlap with GATA2AS DEGs on days 0-9 of differentiation. (H) GO term analysis of 1551 of GATA2AS associated genes by GREAT overlapped with GATA2AS KO DEGs (supplemental Figure 3C). IgG, immunoglobulin G.
Known transcription factor motifs enriched at GATA2AS peaks were determined using HOMER, revealing strong enrichment of the motif for LRF and more modest enrichment for the motifs of BCL11A and master erythroid regulators KLF1 and TAL1 (Figure 3C). We performed RNA pulldown to detect whether GATA2AS can interact with LRF, KLF1, TAL1, or GATA1 as reported in cancer cells.15 GATA2AS interacted with LRF and KLF1, but we did not detect interaction with TAL1 or GATA1 in HUDEP2 cells under the conditions we used (Figure 3D). RIP confirmed the interaction of LRF and KLF1 with GATA2AS (Figure 3E). Heat maps of GATA2AS peaks aligned with published ChIP-seq peaks for HUDEP2 cells illustrate 620 GATA2AS and LRF co-occupied peaks clustering with GATA1, KLF1, TAL1, and more weakly, BCL11A and LDB1 peaks (Figure 3F). Thus, a subset of GATA2AS peaks, about 10%, clusters with erythroid regulators some of which interact with GATA2AS, whereas the majority of GATA2AS peaks genome wide appear devoid of the erythroid factors we interrogated.
To explore the large number of GATA2AS peaks that did not cluster with erythroid factors noted in panel F, we used GREAT35 to predict genomic regions functionally associated with all GATA2AS peaks (Figure 3G). Overlap of these regions with GATA2AS DEGs yielded 1551 additional targets of GATA2AS. Figure 3H shows that pathways enriched in these GATA2AS DEGs included those related to developmental process, cell cycle, cell differentiation, transcription, and translation, all of which may relate to the phenotype (Figure 1) and gene dysregulation (Figure 2) observed in GATA2AS KO cells.
KO of GATA2AS decreases KLF1 and LRF binding genome wide
We performed ChIP-seq and DiffBind analysis (see supplemental Methods) for KLF1 and LRF, revealing that 1026 of 1027 KLF1 peaks showed decreased binding in GATA2AS KO cells compared with controls (Figure 4A). More than 90% of differentially occupied KLF1 peaks (diffbinds) were in promoters (Figure 4B), consistent with a previous report.36 KLF1 diffbinds showed enrichment of motifs for transcription factors involved in erythroid gene regulation including SP1, which has been reported to form a complex with KLF1 and GATA1 that binds promoters and enhancers of erythroid genes (supplemental Figure 4A).37 All 1925 LRF diffbinds upon GATA2AS loss showed reduced binding of LRF compared with controls (Figure 4C) and almost 90% were in promoters (Figure 4D). Motif analysis of LRF diffbinds also showed enrichment of many erythroid regulatory pathways (supplemental Figure 4A). KLF1 and LRF ChIP-seq signal density was significantly decreased genome wide (Figure 4E) and on GATA2AS cobound sites upon GATA2AS KO (supplemental Figure 4C). However, KLF1 and LRF transcription and protein levels were unaltered (supplemental Figure 5A-B).
KO of GATA2AS in HUDEP2 cells decreased KLF1 and LRF binding genome wide. (A,C) MA plot of differential binding of KLF1 (A) or LRF (C) in undifferentiated GATA2AS KO HUDEP2 cells compared with controls. (B,D) Genomic distribution of differential KLF1 (B) or LRF (D) binding sites. (E) Read density plot of KLF1 (left) and LRF (right) on binding sites after GATA2AS KO compared with control cells. (F) Venn diagram showing KLF1 or LRF diffbinds that are also GATA2AS DEGs at day 0 and their overlap. (G) Top panels show gene ontology (GO) terms enriched in KLF1 or LRF diffbinds that are also GATA2AS DEGs at day 0. Lower panels show GO terms for erythroid fingerprint genes in these groups. (H-I) Genome browser views of representative upregulated (HSPA9) and downregulated (EIF2A) genes after GATA2AS KO in HUDEP2 cells. GATA2AS ChIRP-seq, RNA-seq, H3K27ac ChIP-seq, and LRF or KLF1 ChIP-seq signals in KO control (KO_ctrl) and GATA2AS KO (KO_GATA2AS) cells are indicated.
Most KLF1 (67%) or LRF diffbinds (57%) were in promoters of GATA2AS KO DEGs (supplemental Figure 4B). To explore the coregulation of genes by GATA2AS and KLF1 or LRF, we focused on diffbind promoters occupied by KLF1 (132) or LRF (187) that were also GATA2AS DEGs at D0 (Figure 4F). KLF1 and LRF ChIP-seq signal density was significantly decreased on these sites after GATA2AS loss (supplemental Figure 4C). Pathways involved in translation were enriched among KLF1 diffbinds/GATA2AS KO DEGs at D0, whereas heme biosynthetic process pathway genes were enriched among LRF diffbinds (Figure 4G). These proclivities were reinforced after GO term analysis was limited to erythroid fingerprint genes (brackets in Figure 4F-G, lower panels).38-40
Of these diffbind/DEGs, 54 (KLF1) and 82 (LRF) were co-occupied by GATA2AS at day 0 according to ChIRP. Translation initiation factor EIF2A is a downregulated GATA2AS KO DEG that loses KLF1 binding and active histone mark H3K27ac (Figure 4H and see below). HSPA9, a regulator of heme synthetic pathway gene ALAS,41 is an upregulated GATA2AS KO DEG that loses LRF binding and gains H3K27ac (Figure 4I). These examples reflect the activating function of KLF1 and the repressive function of LRF. However, losses or gains in KLF1 or LRF binding were observed for both upregulated and downregulated DEGs after GATA2AS KO. Overall, GATA2AS loss can directly affect the expression of a subset of genes, including erythroid fingerprint genes, in progenitor cells, possibly by destabilizing interactions of erythroid regulators at target genes.
GATA2AS dysregulation of genes through enhancers and influence on the H3K27ac landscape in erythroid progenitor cells
Almost all KLF1 and LRF diffbinds are in promoters, whereas only 10% of GATA2AS peaks are so localized. To capture the function of GATA2AS nonpromoter peaks, we first queried GREAT for predicted cis-regulatory regions with GATA2AS peaks that were not promoters and that were associated with KLF1 or LRF diffbind peaks. We identified 359 KLF1 diffbind promoters and 563 LRF diffbind promoters interacting with a nonpromoter GATA2AS peak, potentially an enhancer (Figure 5A). The largest fraction of associations (KLF1, 20%; LRF, 30%) was between a GATA2AS occupied distal potential enhancer and a KLF1 or LRF diffbind gene not co-occupied by GATA2AS (Figure 5A, coupled). Examples include SERINC5, EEF1A1, and STAT6 (supplemental Figure 6A-C). In very few cases, the KLF1 or LRF diffbind promoter was co-occupied by GATA2AS and associated with a GATA2AS occupied potential enhancer or a potential enhancer that was not occupied by GATA2AS (Figure 5A, coupled/cobound or cobound). When we queried GREAT for cis-regulatory elements associated with KLF1 or LRF diffbinds that were not at promoters, we identified an additional subset of promoters bound by GATA2AS with either KLF1 or LRF at potential enhancers, of which MYB is an example (supplemental Figure 7A-B).
KO of GATA2AS increases H3K27ac modification on GATA2AS/LRF sites. (A) Percent of KLF1 or LRF diffbind genes associated with a putative cis-regulatory element bound by GATA2AS (coupled), cobound with GATA2AS, or their overlap (coupled/cobound) predicted by GREAT (left). Schematic diagram for the model of coupled, cobound, and coupled/cobound (right). (B) MA plot showing differential H3K27ac sites upon KO of GATA2AS in HUDEP2 cells. (C) Distribution of H3K27ac diffbinds in the genome. (D) Heat map of GATA2AS peaks and H3K27ac ChIP-seq signal on the 2 sets (with or without LRF binding) of GATA2AS binding sites for control and GATA2AS KO cells. (E) Read density plot of H3K27ac ChIP-seq signal on GATA2AS/LRF binding peaks. (F) Motifs enriched in H3K27ac diffbinds were analyzed with HOMER. q-value for all motifs <0.05. (G) Heat maps showing ChIP-seq signal of YY1 from ENCODE and published LDB1 data33 on upregulated and downregulated H3K27ac sites.
Published H3K27ac HiChIP data,42 which detect interacting active enhancer-promoter pairs, documented that ∼8% of KLF1 or LRF diffbind genes looped to a GATA2AS occupied enhancer (supplemental Figure 7C). We conclude that although GATA2AS can bind to GATA2AS DEGs together with KLF1 or LRF, it more frequently binds to potential enhancers of GATA2AS DEGs whose promoters are occupied by either KLF1 or LRF alone. It is important to note that only a subset of GATA2AS sites function together with KLF1 or LRF in any orientation, and the majority of sites appear to function through other mechanisms that remain to be investigated.
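The coupled, cobound, and coupled/cobound categories used above can be illustrated with a minimal sketch. It assumes three gene sets are already available: genes with a KLF1 or LRF diffbind at the promoter, genes associated with a nonpromoter GATA2AS peak, and genes whose promoter carries a GATA2AS peak. The function names and the toy gene identifiers (g1 to g5) are hypothetical and are not taken from the study's pipeline or data.

```python
from collections import Counter


def classify_diffbind_genes(diffbind_genes, enhancer_linked, promoter_bound):
    """Assign each diffbind gene to one category.

    diffbind_genes  : genes whose promoter has a KLF1 or LRF diffbind
    enhancer_linked : genes associated with a nonpromoter GATA2AS peak
    promoter_bound  : genes whose promoter carries a GATA2AS peak
    """
    labels = {}
    for gene in diffbind_genes:
        coupled = gene in enhancer_linked
        cobound = gene in promoter_bound
        if coupled and cobound:
            labels[gene] = "coupled/cobound"
        elif coupled:
            labels[gene] = "coupled"
        elif cobound:
            labels[gene] = "cobound"
        else:
            labels[gene] = "neither"
    return labels


def category_percentages(labels):
    """Percent of diffbind genes falling in each category."""
    counts = Counter(labels.values())
    total = len(labels)
    return {category: 100.0 * n / total for category, n in counts.items()}


# Toy input (hypothetical identifiers, not real results)
diffbind = {"g1", "g2", "g3", "g4", "g5"}
enhancer_linked = {"g1", "g2", "g3"}
promoter_bound = {"g3", "g4"}

labels = classify_diffbind_genes(diffbind, enhancer_linked, promoter_bound)
percentages = category_percentages(labels)
```

In this toy input, two of the five genes are coupled only, one is cobound only, one is both, and one is in neither category, which mirrors the observation that many diffbind genes have no GATA2AS association at all.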
ChIP-seq and DiffBind analysis for H3K27ac revealed that most differentially modified peaks had increased H3K27ac enrichment in GATA2AS KO cells compared with controls (Figure 5B). Seventy percent of these differentially acetylated regions were at promoters and ∼30% (∼500) were intronic or intergenic (Figure 5C). Heat maps of H3K27ac signal showed enrichment at GATA2AS/LRF jointly occupied peaks compared with GATA2AS-only peaks (Figure 5D). Global H3K27ac read density on GATA2AS/LRF peaks was increased in GATA2AS KO cells compared with controls, supporting the idea that release of GATA2AS from chromatin has an activating effect on promoters and enhancers genome wide (Figure 5E).
Interestingly, motifs for chromatin looping factors YY1 and the LDB1/TAL1/LMO2 complex were modestly enriched at differentially acetylated regions after GATA2AS loss (Figure 5F).43,44 Published ChIP-seq data for YY1 in K562 cells and for LDB1 in HUDEP2 cells33 indicated clustering of these factors with the peaks that were differentially acetylated upon GATA2AS KO (Figure 5G). Stronger YY1 and LDB1 signal was observed on increased H3K27ac peaks than on decreased H3K27ac peaks. This result supports an association of these chromatin looping proteins with activated enhancers and promoters after GATA2AS loss.
Differentially accessible sites after GATA2AS loss are GATA factor switch sites
We used ATAC-seq and DiffBind software to determine chromatin accessibility or open chromatin, revealing that 2506 peaks gained and 2217 lost accessibility after GATA2AS KO compared with controls (Figure 6A). Twenty percent to 30% of the differentially accessible regions were at promoters, whereas 65% to 70% were intronic or intergenic (Figure 6B). Heat maps of ATAC-seq signals showed clustering near GATA2AS/LRF sites and very low signal at GATA2AS-only peaks (Figure 6C). This suggests that at many sites, GATA2AS does not directly affect chromatin accessibility.
KO of GATA2AS affects chromatin accessibility on GATA2 and GATA1 switching sites. (A) MA plot showing differential ATAC-seq accessibility sites upon KO of GATA2AS in undifferentiated HUDEP2 cells. (B) Distribution of differentially accessible sites losing accessibility (left, down) or gaining accessibility (right, up) in the genome upon GATA2AS loss. (C) Heat maps of ATAC-seq signal on the 2 sets (with or without LRF binding) of GATA2AS binding peaks. (D) Motifs enriched in up and down differentially accessible sites were analyzed with HOMER. GATA1 and GATA2 motifs were enriched in both up and down differentially accessible sites. q-value for all motifs is <0.0001. (E) Heat maps showing GATA2 and GATA1 ChIP-seq signal of erythroid progenitor (Eprog) and precursor (Eprec) cells on sites that were up and down differentially accessible sites in GATA2AS KO cells. GATA1 binding on upregulated differentially accessible sites (blue trace) or downregulated sites (green trace) in upper graphs.
Motif analysis to predict potential transcription factors that might affect chromatin accessibility at GATA2AS differentially accessible sites revealed significant enrichment of GATA1 and GATA2 motifs (Figure 6D). During erythroid differentiation, GATA1 replaces GATA2 to activate terminal erythroid gene expression, and GATA factors are considered key erythroid regulators and chromatin organizers.45 Published ChIP-seq data for GATA1/2 in human erythroid progenitors and more mature precursors during differentiation46 indicated GATA2 peaks clustered with GATA2AS differentially accessible peaks in progenitors but not in later precursor cells. GATA1 peaks clustered with GATA2AS differentially accessible peaks at both stages of differentiation, with enrichment at sites whose accessibility increased upon GATA2AS KO rather than decreased (Figure 6E). These results show a switch between GATA2 and GATA1 on GATA2AS KO differentially accessible sites during differentiation. Thus, GATA1 and GATA2 may be among factors involved in changes in chromatin accessibility after GATA2AS loss.
GATA2AS influences differentiation of primary CD34+ cells
We extended our study of GATA2AS function to primary human CD34+ HSCs using a 2-step erythroid differentiation protocol (Figure 7A).47 Two shRNAs against GATA2AS were transduced at day 5, but the efficiency of the knockdown (KD) was poor, with at least 50% of GATA2AS remaining in cells (Figure 7B). A limited effect on proliferation was observed in shRNA-treated CD34+ cells, but GATA2 transcription and the HBG/(HBG + HBB) ratio were reduced, consistent with HBG reduction in HUDEP2 GATA2AS KO cells (supplemental Figure 8B-C). Flow cytometry revealed that KD of GATA2AS in CD34+ cells accelerated gain of CD235a and accelerated loss of CD71 (Figure 7C-D). Thus, both in HUDEP2 cells and in primary erythroid progenitors, loss of GATA2AS has an accelerating effect on erythroid differentiation.
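The HBG/(HBG + HBB) ratio mentioned above is a simple fraction of γ-globin signal over the combined γ- and β-globin signal. A minimal sketch of the arithmetic follows, assuming relative expression values for HBG and HBB measured in the same units; the function name and the input numbers are hypothetical and do not reproduce values from the study.

```python
def hbg_fraction(hbg, hbb):
    """Return HBG / (HBG + HBB) for two expression values in the same units."""
    total = hbg + hbb
    if total <= 0:
        raise ValueError("HBG + HBB must be positive")
    return hbg / total


# Hypothetical relative expression values, for illustration only
control_ratio = hbg_fraction(hbg=20.0, hbb=80.0)
knockdown_ratio = hbg_fraction(hbg=10.0, hbb=90.0)
ratio_reduced = knockdown_ratio < control_ratio
```

Expressing HBG as a fraction of total HBG plus HBB normalizes for overall globin output, so a lower value reflects a shift in the balance toward HBB rather than a global change in globin expression.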
KD of GATA2AS increased human primary CD34+ cell differentiation. (A) Schematic of human CD34+ cell differentiation. Cells were infected with lentivirus containing shRNA on day 5 and transduction efficiency was 21%. (B) KD of GATA2AS by shRNA-1 and -2 in human CD34+ cells detected on days 6 and 10 by RT-qPCR. (C) Flow cytometry analysis of CD235a and CD71 expression of GATA2AS KD and control human CD34+ cells during differentiation. Percentage of cells for each quadrant is indicated. (D) MFI of CD235a and CD71 during erythroid differentiation. (E) Heat map of 5 groups of expression patterns for subsets of genes in GATA2AS KO and KD. (F) Expression curves for the 5 groups of genes shown in panel E. (G) GO term analysis of shared DEGs between GATA2AS KD CD34+ cells and KO HUDEP2 cells. (H) Examples of shared DEGs between GATA2AS KD CD34+ cells and KO HUDEP2 cells.
We performed RNA-seq for GATA2AS CD34+ KD cells and controls and observed sets of genes dysregulated at different time points of differentiation, with 6129 DEGs in total (supplemental Figure 8D). GO term analysis of DEGs revealed enrichment of genes in pathways related to erythrocyte homeostasis/differentiation and heme signaling, similar to HUDEP2 cells (supplemental Figure 8E). Importantly, thousands of DEGs were shared between GATA2AS KD CD34+ cells and KO HUDEP2 cells, of which 562 were erythroid fingerprint genes (supplemental Figure 8F). Clustering revealed groups of genes whose response to GATA2AS loss was the same in the 2 cell types (Figure 7E-F; supplemental Table 4). GO terms enriched for the shared GATA2AS targets in the 2 cell types were mostly erythroid pathways. Examples include many erythroid differentiation genes (Figure 7G-H). Together, the data from HUDEP2 cells and CD34+ progenitor cells support a regulatory role for GATA2AS in erythropoiesis.
Discussion
GATA2AS is a human-specific lncRNA that is transcribed divergently to GATA2. We found that KO of GATA2AS in HUDEP2 cells promoted differentiation with reduced proliferation. GATA2AS KO altered the transcriptome of differentiating cells and downregulated GATA2 as well as HBG1 and HBG2. ChIRP revealed that GATA2AS targets thousands of genomic sites and colocalizes with erythroid transcription factors including LRF and KLF1 at a subset of promoter sites. Most GATA2AS sites correspond to potential enhancers that, in some cases, function together through long range interactions with KLF1 and LRF at promoters. An shRNA partial KD of GATA2AS in CD34+ progenitor cells revealed thousands of dysregulated genes shared with HUDEP2 GATA2AS KO cells, including >500 erythroid fingerprint genes. Thus, GATA2AS is essential for the process of differentiation in erythroid cells.
GATA2AS KO in HUDEP2 cells and KD in CD34+ cells had a negative effect on GATA2 transcription before differentiation. Upregulation of GATA2 by GATA2AS is consistent with published data reporting positive feedback between the two.16 ChIRP showed that some of the strongest GATA2AS sites in HUDEP2 cells were within GATA2AS coding exons, but the GATA2 promoter also had a peak. However, GATA2 was not among the LRF or KLF1 diffbinds that were also GATA2AS DEGs in our data. Therefore, further work will be required to explore the mechanism by which GATA2AS may drive expression of GATA2. Because GATA2AS interacts with LRF, it is appealing to speculate that GATA2AS may function as a novel direct regulator of GATA2 by sequestering repressor LRF.7 We found that in GATA2AS KO HUDEP2 cells, HBG1 and HBG2 were downregulated, and HbF was reduced. Interestingly, the HBG1/2 promoters do not have GATA2AS peaks nor do HS1-4 of the β-globin LCR, suggesting an indirect effect of GATA2AS. None of the known HbF repressors increased upon GATA2AS KO, supporting the possibility of additional repressor mechanisms.
To investigate potential coregulation by KLF1 or LRF with GATA2AS, we compared DEGs in GATA2AS KO HUDEP2 cells with DEGs in published KLF1 KO HSPCs (see supplemental Methods) or LRF KO HUDEP2 cells.30 Thirty-four percent of GATA2AS KO DEGs (936) were shared with KLF1 KO cells, but only 3% (71 DEGs) were shared with LRF KO cells (supplemental Figure 9A-B). Log2-fold change plots for these data showed that GATA2 was among those upregulated upon LRF loss and downregulated upon GATA2AS loss, consistent with the speculation above. Gene set enrichment analysis using these data showed a strong correlation between KLF1 KO and GATA2AS KO upregulated/downregulated DEGs, suggesting KLF1 and GATA2AS have similar effects on gene expression (supplemental Figure 9E). There was a much weaker correlation between LRF KO and GATA2AS KO upregulated DEGs, suggesting that GATA2AS is infrequently a repressor, but a stronger correlation between LRF and GATA2AS KO downregulated DEGs. Overall, there is a considerable overlap of targets between GATA2AS and these factors, particularly KLF1.
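The shared-DEG percentages reported above are overlaps expressed relative to the GATA2AS KO DEG list. A minimal sketch of that calculation follows, assuming each DEG list is available as a set of gene identifiers; the function name and the toy identifiers are hypothetical and are not the study's data.

```python
def shared_deg_summary(reference_degs, other_degs):
    """Return the shared genes and their percentage of the reference DEG set."""
    shared = reference_degs & other_degs
    if not reference_degs:
        return shared, 0.0
    percent = 100.0 * len(shared) / len(reference_degs)
    return shared, percent


# Toy DEG sets (hypothetical identifiers, for illustration only)
gata2as_ko_degs = {"g1", "g2", "g3", "g4", "g5", "g6", "g7", "g8", "g9", "g10"}
klf1_ko_degs = {"g1", "g2", "g3", "g11"}
lrf_ko_degs = {"g4", "g12", "g13"}

klf1_shared, klf1_percent = shared_deg_summary(gata2as_ko_degs, klf1_ko_degs)
lrf_shared, lrf_percent = shared_deg_summary(gata2as_ko_degs, lrf_ko_degs)
```

Because the denominator is the GATA2AS KO DEG list in both comparisons, the two percentages are directly comparable, although they do not account for differences in the size of the KLF1 KO and LRF KO DEG lists or in the cell types used.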
Using the GREAT algorithm to predict associated regions of chromatin, we found that GATA2AS at putative enhancers can regulate target promoters occupied by KLF1 or LRF alone or together with GATA2AS. However, the vast majority of GATA2AS peaks were neither coupled nor cobound with peaks for KLF1 or LRF. Thus, GATA2AS can regulate erythroid gene expression in multiple orientations, and our data indicate that it may cooperate with additional, yet to be explored, transcription factors or RNA binding proteins. Taken together, our study provides new insight into the role of the human-specific lncRNA GATA2AS in erythropoiesis and erythroid gene expression.
Acknowledgments
HUDEP2 cells were a kind gift from R. Kurita and Y. Nakamura (RIKEN BioResource Center). The authors thank Rebecca Chu and John Tisdale for HPLC analysis. The authors acknowledge the National Institutes of Health (NIH), National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Genomics Core for sequencing support and NIH High-Performance Computing (Biowulf) for computational support.
This work was supported by the Intramural Research Program of the NIH/NIDDK (Z1A DK755015; A.D.).
Authorship
Contribution: G.L. and A.D. designed the study; G.L., J.K., and N.N. carried out experiments; L.Z. performed bioinformatic analysis; G.L., J.K., and A.D. analyzed data and wrote the manuscript; and all authors edited the manuscript.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: Ann Dean, Laboratory of Cellular and Developmental Biology, National Institute of Diabetes and Digestive and Kidney Diseases, National Institutes of Health, Building 50, Room 3154, 50 South Dr, MSC 8028, Bethesda, MD 20892; email: ann.dean@nih.gov.
References
Author notes
G.L. and J.K. are joint first authors.
The data reported in this article have been deposited in the Gene Expression Omnibus database (accession number GSE213779).
Data are available on request from the corresponding author, Ann Dean (ann.dean@nih.gov).
The online version of this article contains a data supplement.
There is a Blood Commentary on this article in this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.