Key Points
Heptad TF occupancy is highly dynamic across HSPC subsets and associated with cell-type–specific gene expression.
Enhancers with cell-type–specific heptad occupancy share a common grammar with respect to TF binding motifs.
Abstract
Hematopoietic stem and progenitor cells (HSPCs) rely on a complex interplay among transcription factors (TFs) to regulate differentiation into mature blood cells. A heptad of TFs (FLI1, ERG, GATA2, RUNX1, TAL1, LYL1, LMO2) bind regulatory elements in bulk CD34+ HSPCs. However, whether specific heptad-TF combinations have distinct roles in regulating hematopoietic differentiation remains unknown. We mapped genome-wide chromatin contacts (HiC, H3K27ac, HiChIP), chromatin modifications (H3K4me3, H3K27ac, H3K27me3) and 10 TF binding profiles (heptad, PU.1, CTCF, STAG2) in HSPC subsets (stem/multipotent progenitors plus common myeloid, granulocyte macrophage, and megakaryocyte erythrocyte progenitors) and found TF occupancy and enhancer-promoter interactions varied significantly across cell types and were associated with cell-type–specific gene expression. Distinct regulatory elements were enriched with specific heptad-TF combinations, including stem-cell–specific elements with ERG, and myeloid- and erythroid-specific elements with combinations of FLI1, RUNX1, GATA2, TAL1, LYL1, and LMO2. Furthermore, heptad-occupied regions in HSPCs were subsequently bound by lineage-defining TFs, including PU.1 and GATA1, suggesting that heptad factors may prime regulatory elements for use in mature cell types. We also found that enhancers with cell-type–specific heptad occupancy shared a common grammar with respect to TF binding motifs, suggesting that combinatorial binding of TF complexes was at least partially regulated by features encoded in DNA sequence motifs. Taken together, this study comprehensively characterizes the gene regulatory landscape in rare subpopulations of human HSPCs. The accompanying data sets should serve as a valuable resource for understanding adult hematopoiesis and a framework for analyzing aberrant regulatory networks in leukemic cells.
Introduction
Hematopoietic stem cells (HSCs) maintain production of circulating blood cells via their capacity to either self-renew or differentiate into mature cell types.1 The most primitive HSCs have multilineage potential but give rise to progenitor cells with increasing lineage restriction. Although single-cell analyses have suggested that differentiation occurs over a continuum rather than in discrete leaps,2-4 relatively pure populations which correspond to intermediate progenitor stages can be prospectively isolated based on cell-surface markers.4
Changes in cell identity across differentiation trajectories are directly related to altered transcriptional programs5 which are controlled by lineage-specific gene regulatory networks (GRNs).6 At the simplest level, GRNs comprise genes, their associated regulatory elements (promoters and cis-regulatory elements [CREs], such as enhancers), and transcriptional regulators, including transcription factors (TF), which bind these elements.7,8 Accessibility of regulatory elements is controlled by various chromatin modifications, and the DNA sequence of such elements at least partially determines which TFs can bind.9,10 Further control is imposed by chromatin organization into topologically associated domains.11 Interactions between promoters and their CREs, mediated by chromatin loops and complexes of transcriptional regulators, modulate GRNs and therefore cell identity.12
We previously showed that 7 TFs (heptad: FLI1, ERG, GATA2, RUNX1, TAL1, LYL1, and LMO2), all known regulators of healthy and leukemic haematopoiesis,13-20 bind combinatorically in bulk CD34+ hematopoietic stem and progenitor cells (HSPCs)21 and leukemias.22-24 In healthy HSPCs, heptad combinatorial binding occurs at regulatory regions associated with genes involved in stem-cell maintenance and function and also at heptad CREs such that heptad genes form a highly interconnected regulatory circuit.21,25 However, study of GRNs in HSPCs is hindered by heterogeneity within the CD34+ population and lack of experimental evidence linking promoters to distal regulatory elements.
To address these issues and further our understanding of heptad centered GRNs in blood development, we sorted CD34+ HSPCs into HSC/multipotent progenitor (MPP, collectively HSC-MPP), common myeloid progenitor (CMP), granulocyte macrophage progenitor (GMP), and megakaryocyte erythrocyte progenitor (MEP) cell types. We then used chromatin immunoprecipitation followed by sequencing (ChIP-seq) targeting 10 TFs (heptad, PU.1, CTCF, and STAG2) and 3 histone modifications (H3K27ac, H3K4me3, and H3K27me3), Hi-C, and H3K27ac-HiChIP in each of these cell types to chart the regulatory landscape of human HSPC differentiation. Combinatorial binding of heptad TFs was observed in all sorted populations, although specific patterns of chromatin occupancy differed among cell types. Heptad promoter looping to putative enhancers was variable across cell types, and in many cases combinatorial binding was observed at CREs in immature cells before formation of loops in more mature progenitors. Genome-wide occupancy of heptad TFs was also variable across cell types, with distinct sets of CREs enriched for heptad binding in MEP compared with that in GMP. This variation was at least partially due to sequence motifs in the CREs, with motif composition sufficient to predict cell type with high sensitivity and specificity.
Methods
Mobilized peripheral blood was collected with patient consent in accordance with the Declaration of Helsinki and used with institutional ethics approval ref:08/190 from the South-Eastern Sydney Local Health District.
Cell sorting, preparation of nuclei, and ChIP were performed essentially as described4,22,26 (detailed in supplemental Methods, available on the Blood website). HiC and HiChIP libraries were generated using the Arima Genomics HiC+ kit (Arima catalog no. A101020).
Bioinformatic analysis used standard pipelines for ChIP21 and HiC/HiChIP data.27 Machine learning models were trained using XGBoost.28 A UCSC browser session is provided at http://genome.ucsc.edu/s/PimandaLab/Heptad_Regulome. We also provide a tool for data exploration (http://unsw-data-analytics.shinyapps.io/CD34_Heptad_Regulome).
Details of standard experimental and analysis techniques are provided in the supplemental Methods.
Results
Genome-wide heptad factor binding in HSPCs
Primary mobilized human CD34+ HSPCs were sorted (Figure 1A; supplemental Figure 1A), purity checked with colony assays (supplemental Figure 1B), and fixed for downstream assays (Figure 1B; supplemental Table 1). Cell populations were chosen to span the trajectory from early multipotent stem cells (HSC-MPP) to progenitors committed to the myeloid (GMP) or erythroid (MEP) lineages.
High-quality ChIP data were obtained for all heptad factors (Figure 1C-D; supplemental Table 2; http://genome.ucsc.edu/s/PimandaLab/Heptad_Regulome). Total peak numbers were highly variable between heptad factors and across cell types (Figure 1D). We observed cell-type–specific trends consistent with known expression patterns and biology. For example, ERG peaks were most abundant in HSC-MPPs, consistent with its role in maintaining the stem cell state,29,30 whereas TAL1 peaks were most abundant in MEP, consistent with its role in erythroid development (Figure 1D). Distributions of TFs across genome features were generally conserved across cell types but differed among heptad factors. For example, FLI1, ERG, and RUNX1 peaks were located at both promoter and nonpromoter regions (Figure 1D), whereas GATA2, TAL1, LYL1, and LMO2 peaks were predominantly located at nonpromoter regions. TAL1 peak distribution in MEP was unique, with many peaks found at intergenic regions. Motif enrichment analysis showed ETS (GGAA) and E-Box (CANNTG) motifs in TF-occupied regions from all factors. FLI1, ERG, and RUNX1 peaks were highly enriched for the ETS motif, whereas GATA2, TAL1, LYL1, and LMO2 peaks showed additional enrichment for the GATA motif (GATA), particularly for GATA2 and TAL1 in MEP. Consistent with observations in bulks HSPCs21 enrichment of RUNX1 motifs was minimal. Overall, we observed not only conserved patterns of heptad binding, but also distinct differences between factors and across cell types, consistent with dynamic remodeling of the heptad network across the HSPC differentiation trajectory.
Combinatorial binding of heptad TFs is cell-type–specific
We previously described combinatorial binding of heptad factors in bulk human HSPCs21 and now extend these observations to specific cell populations. We quantified ChIP peaks containing all possible combinations of ≥2 heptad factors (Figure 2Ai-ii), and evaluated the probability of these occurring by random chance in each cell type (Figure 2Aiii). As in bulk HSPCs,21 combinatorial binding of all 7 factors was the most significant event in any of the 4 cell types (z scores for all 7 factors; HSC-MPP: 8199.94, CMP:6314.92, GMP:3543.93, and MEP:1877.48). Pairwise combinations had low significance scores, with exceptions such as ERG/RUNX in HSC-MPPs. Higher order complexes generally had higher significance scores, with 5 of 7 possible 6-factor complexes showing highly significant binding in at least 1 cell type. Specific combinations broadly matching known TF function were also observed. For example, 5- or 6-factor combinations lacking GATA2, TAL1, or both, had low significance scores in MEPs (Figure 2A, stars). We also asked whether PU.1 showed combinatorial binding with heptad factors. As previously observed,25 co-binding of single heptad factors with PU.1 was minimal (supplemental Figure 2A), whereas addition of PU.1 to 6 of 7 TF combinations modestly accentuated existing patterns (supplemental Figure 2B). Overall, we find that combinatorial binding of heptad TFs is a general feature of stem and progenitor cells, with some combinations restricted to specific cell types.
Given the cell-type specificity of some TF combinations we predicted that dynamic formation of TF complexes might play a role in priming CREs for subsequent activation. We explored heptad factor binding at promoter regions of 2 lineage-specific genes: the erythroid regulator GATA1 and the monocyte gene MPO, neither of which showed heptad binding in HSC-MPPs. At the GATA1 promoter (Figure 2B; yellow region), GATA2, RUNX1, TAL1, LYL1, and LMO2 binding was observed in CMP. As cells transitioned to MEP, binding peaks became more prominent and now included FLI1 and ERG. However, there was essentially no heptad binding at the GATA1 promoter in GMP. At the MPO promoter (Figure 2C; yellow region), FLI1, ERG, TAL1, LYL1, and LMO2 binding was observed in CMP. Binding peaks became more prominent as cells transitioned to GMP, with RUNX1 also bound, but no TF binding was observed at the MPO promoter in MEP. Taken together our data suggest that distinct patterns of heptad TF binding may prime the genomic landscape of human blood stem cells toward either an erythroid or myeloid fate.
Heptad regulatory circuits are remodeled during myeloid progenitor development
Genes encoding heptad TFs form a densely interconnected regulatory circuit in bulk HSPCs,21 and chromatin accessibility at heptad gene CREs is sufficient to predict blood-cell identity.23 To better understand genome organization and connectivity at heptad loci during HSPC development we performed HiC and H3K27ac HiChIP experiments on HSC-MPP, CMP, GMP, and MEP. Although the majority of genome compartments remained stable across all cell types, some compartments underwent B to A or A to B switching upon transition from CMP to lineage-committed progenitor (supplemental Figure 3Ai). Notably, some regions underwent compartment switching only in GMP, with corresponding changes in H3K27ac signal (supplemental Figure 3Aii-iii). Topologically associated domain boundaries were highly conserved between cell types (supplemental Figure 3B), and HiC contact matrixes around heptad gene loci were also highly similar (supplemental Figure 4). Global CTCF and STAG2 binding was also conserved across cell types (supplemental Figure 5). Overall, consistent with previous reports,31 we observed minimal variation in high-level genome organization among HSPC subpopulations.
H3K27ac HiChIP experiments generated thousands of significant interactions with false discovery rate ≤0.01 (HSC-MPP: 26210, CMP: 8170, GMP: 43448, and MEP: 32773; supplemental Table 3). We focused on loops in which at least 1 interacting region was annotated as a promoter (P). Because this experiment enriched fragments with the H3K27ac active enhancer mark, we consider looped CREs as putative enhancers (E). We filtered promoter-enhancer (P-E) loops based on presence of a transposase-accessible chromatin with sequencing (ATACseq) peak at the distal enhancer and integrated these with ChIP-seq data to create regulatory network maps at each heptad gene locus for each cell type.
Promoter-enhancer loops corresponding to the heptad spanned wide genomic regions; ∼500 kb for FLI1, GATA2, and TAL1 and from 1 to 2 Mb for ERG, RUNX1, LYL1, and LMO2 (Figure 3Ai-ii; supplemental Figure 6). Overall, we detected multiple putative enhancers for the heptad genes ERG (−610/−410/−230/+85/+88/+191/+1200), FLI1 (+27/+32/+64), GATA2 (−123/−92/+4), RUNX1 (−880/+22/+100/+110/+141/+161), TAL1 (−101/−82/−25/+0.5/+14/+45), LYL1 (−744/−50/+165/+310), and LMO2 (−570/−100/−67/−61/−51/−23/−22/−15/−12) (Figure 3Ai-ii; supplemental Figure 6; supplemental Table 4). Some regions have been described in humans and/or in mice 17,21,32-45 whereas others were novel (supplemental Table 4). Two known heptad enhancers were not directly looped to their corresponding promoter (FLI1-15, GATA2-117; supplemental Figure 6Ai,Bi),21,44 although we did observe looping between GATA2-117 and other putative enhancers in MEPs (supplemental Figure 6Bi). Furthermore, looping at heptad genes generally increased in complexity in GMP and MEP compared to HSC-MPP, with the sparsest looping observed in CMP (Figure 3Ai; supplemental Figure 6). This additional complexity often included extensive looping to CREs not directly connected to promoters, suggesting that heptad gene expression is fine-tuned by highly interconnected cell-specific enhancer communities (eg, TAL1 locus in MEP [supplemental Figure 6Di], LYL1 locus in GMP and MEP [supplemental Figure 6Ei]).
Most directly looped elements showed heptad factor binding in at least 1 cell type (Figure 3Aiii; supplemental Figure 6). Some core CREs showed extensive heptad binding in all cell types, including known and functionally validated enhancers, such as ERG+85, GATA2+4, RUNX1+22, and TAL+45 (Figure 3Aiii; supplemental Figures 6Biii,Ciii,Diii; supplemental Table 4), plus novel CREs, such as ERG-410 and LMO2-570 that can now be linked to heptad genes (Figure 3Aiii; supplemental Figure 6Fiii). Integration of HiChIP and ChIP data at heptad gene loci showed diverse patterns of looping and TF binding across the 4 cell types (Figure 3Bi-ii). At the ERG locus, the ERG+85 enhancer was linked to the proximal promoter in HSC-MPP, GMP, and MEP, whereas other ERG elements showed promoter looping in GMP and/or MEP. (Figure 3Bii). Furthermore, heptad TF binding often occurred in HSC-MPPs with subsequent promoter looping of that element in more differentiated cell types (eg, GATA2−123, TAL1+45, and LMO2−570; Figure 3Bii). However, these epigenetic changes did not directly correspond to the steady state transcriptional output of heptad genes, which was relatively stable across the 4 cell types (Figure 3Biii).
Analysis of bulk HSPCs found that heptad genes regulate themselves and each other via a densely interconnected autoregulatory circuit.21 We used our expanded set of heptad CREs to construct network connectivity maps for each cell type (supplemental Figure 7). These maps show that although the transcriptional output of the heptad genes was relatively stable across HSC-MPP, CMP, GMP, and MEP, there were cell-type–specific differences in cis- and trans-regulatory mechanisms by which stable expression was maintained. Overall, our data set allowed us to observe extensive remodeling of the regulatory connections within and among individual heptad genes during hematopoiesis and increases our understanding of the complex network regulating heptad genes during hematopoiesis.
Heptad TFs and regulation of lineage-specific gene expression
We next asked whether heptad factor chromatin occupancy was associated with cell-type–specific transcriptional output. We identified genomic regions with cell-type–enriched binding of at least 2 heptad TFs (differentially enriched for heptad [DEH]) and linked these to genes using HiChIP data (genes associated with DEH [DEHG]) (Figure 4A; supplemental Table 5A). Approximately half of the regions were promoter-like (up to 10 kb upstream of a transcription start site [TSS]). To characterize candidate regulatory elements and their associated genes, we conducted gene set enrichment analysis, ingenuity pathway analysis, and single-cell analysis. DEHG in HSC-MPP/GMP/MEP had greater expression in their respective cell types compared to other cell types (Figure 4B) including previously reported lineage-specific genes such as SLAMF1 and MPO in GMPs46,47 and GATA1 and KLF1 in MEPs.48-50 Furthermore, genes differentially bound in GMP were enriched for pathways linked to myelopoiesis and granulopoiesis (supplemental Figure 8A) whereas genes linked to differentially bound regions in MEP were enriched for pathways linked to erythropoiesis (supplemental Figure 8B). Only 16 HSC-MPP–specific genes were identified, precluding pathway analysis. However, these genes included stem-cell regulators,51 such as HOXB1, HOXB2, and HOXB4 (supplemental Table 5A).
We also explored expression of DEHG in single cells across hematopoietic differentiation (Figure 4Ci).3 HSC-MPP–associated DEHG (DEHGHSC) were enriched in cells annotated as HSC clusters (Figure 4Cii), whereas DEHGGMP were enriched in monocyte clusters (Figure 4Ciii) and DEHGMEP in erythroid lineage cells (Figure 4Civ). Together these data support our hypothesis that heptad occupancy at regulatory elements can be linked to cell-specific transcriptional output.
Heptad TF binding at regulators of genes crucial for myeloid and erythroid cell development
We next asked whether we could detect specific patterns of heptad occupancy at known lineage-specific genes with roles in mature monocyte and granulocyte maintenance (myeloid cell development), at genes linked to erythroid cell development and heme metabolism (erythroid cell development), or at genes linked to stem cell function (supplemental Table 5B). From these lists we focused on genes whose promoters were looped to a putative enhancer in any of our HiChIP data sets. We identified 40 P-E pairs from the myeloid genes, 91 from the erythroid genes, and 81 from the stem cell genes (supplemental Table 5B) and used k-means clustering of TF binding signals to compare heptad occupancy patterns at each P-E pair in each cell type. We observed variable TF binding across associated promoter and enhancer regions. Genes in cluster 1 (C1) showed TF enrichment at promoters, genes in cluster 2 (C2) showed TF enrichment at enhancers, and genes in clusters 3 and 4 (C3 and C4) had TF enrichment at both promoters and enhancers (Figure 5A-B; supplemental Figure 9A). Furthermore, several TF-specific observations were evident. First, ERG occupancy at stem cell, myeloid, and erythroid genes was generally highest in HSC-MPPs and to a lesser extent GMPs, consistent with its role in maintaining stem cells (supplemental Figure 9B; supplemental Table 6).23,29,30 Second, FLI1 and RUNX1 were bound across both myeloid and erythroid gene–associated regions across all differentiation stages (supplemental Figure 9C). Third, GATA2, LYL1, and LMO2 have increased occupancy at myeloid- and erythroid-specific regulatory regions during lineage commitment (supplemental Figure 9D; supplemental Table 6). Fourth, high TAL1 occupancy was observed in MEPs and their precursor CMPs, and this occurred at regions linked to erythroid genes and regions linked to myeloid genes (supplemental Figure 9E; supplemental Table 6). Binding in CMPs may indicate a role for TAL1 in priming regulatory regions and recruiting activators or repressors in downstream cell types. Finally, there was a general pattern of heptad TF occupancy at promoters and enhancers of lineage-specific genes even in the earliest stem cells, suggesting that heptad factors bind lineage-specific regulatory regions before lineage commitment and subsequent differentiation.
To further explore whether heptad binding in differentiated cell types is initiated in early stem cells, we identified ChIP-seq peaks bound by a single heptad factor in HSC-MPPs and compared TF binding density at these regions across cell types. Regions with peaks in HSC-MPPs had additional heptad factors bound in committed progenitors. For example, peaks bound by ERG in HSC-MPPs (supplemental Figure 10A) showed prominent binding of FLI1 and RUNX1 in GMPs, and binding of all heptad factors, except for ERG, in MEPs (supplemental Figure 10B-C). Furthermore, regions bound by single heptad TFs in HSC-MPPs were subsequently bound by PU.1 in dendritic cells or GATA1 in proerythroblasts (supplemental Figure 10D). Overall, these data support dynamic assembly of heptad factor complexes at sites subsequently occupied by lineage-determining TFs. Further experimentation will resolve whether heptad factors actively prime CREs for use in more mature cell types.
Regulatory regions with cell-type–specific heptad occupancy have distinct epigenetic features
To better understand the underlying mechanisms regulating dynamic heptad occupancy during blood formation, we used our ChIP data sets to annotate and cluster ∼85 000 regions previously shown to be accessible in any of our 4 cell types.4 The resulting Uniform Manifold Approximation and Projection (UMAP) of the accessibility landscape could be segmented into 13 clusters (Figure 6A). Clusters 1 to 3 had characteristics of promoters, including high H3K4me3 signal (Figure 6B). We could further classify cluster 1 as active promoters (H3K4me3 and H3K27ac), cluster 2 as bivalent promoters (H3K4me3 and H3K27me3 enriched), and cluster 3 as active promoters which were also bound by CTCF. These classifications aligned with ChromHMM annotation of the same regions (Figure 6C).52 Cluster 4 was enriched for CTCF alone (Figure 6B) and may contain regions involved in 3-dimensional genome organization. Clusters 5 to 11 could be broadly characterized as nonpromoter regulatory regions which again aligned with their ChromHMM annotations (Figure 6C). Most of the variations among these regions mapped to variable TF occupancy in specific cell types (supplemental Figure 11A), which was particularly pronounced in clusters 8 to 11 which also showed cell-type–specific changes in chromatin accessibility (Figure 6B). For example, cluster 9 and 10 showed high accessibility in MEP compared to GMP, accompanied by high TAL1 and LYL1 occupancy (supplemental Figure 11A), suggesting that regions in these clusters may function in the erythroid lineage.
Visualizing TF signal across the accessibility landscape revealed variable occupancy patterns across cell types (supplemental Figure 11B). FLI1, RUNX1, and ERG had similar distributions across all cell types, with highest FLI1 and RUNX1 occupancy in CMP and GMP, and ERG occupancy reduced in MEP. LYL1 and LMO2 had similar distributions with enrichment in clusters 5, 6, or 7 in GMP and clusters 9 or 10 in MEP (supplemental Figure 11B). TAL1 was also enriched in clusters 9 or 10 in MEP with lower binding in other cell types, whereas PU.1 was enriched in cluster 5, 6, or 7 in CMP and GMP (supplemental Figure 11B). GATA2 had a unique occupancy pattern with enrichment in cluster 5 or 6 in HSC-MPP and GMP and additional occupancy in cluster 9 in CMP and MEP (supplemental Figure 11B) which may reflect distinct roles of GATA2 in early hematopoiesis and subsequent erythroid specification.53
We then made pairwise comparisons of TF binding signals in GMP vs MEP, HSC-MPP vs GMP, and HSC-MPP vs MEP (Figure 6D; supplemental Figure 11C). Comparing GMP (Figure 6D, green) to MEP (Figure 6D, orange), there were distinct zones of TF enrichment in each cell type (Figure 6D, gray dotted line [GMP], black dotted line [MEP]) except for TAL1 which showed no region of enrichment in GMP. To confirm that regions enriched for heptad TF binding in GMP and MEP represent lineage-specific regulators, we mapped ChIP-seq signal from lineage-defining TFs in 2 mature cell types (Figure 6E and F). Regions with high heptad occupancy in GMP showed similar occupancy of PU.1 in dendritic cells (Figure 6E),54 whereas regions with high heptad occupancy in MEP showed similar occupancy of GATA1 in proerythroblasts (Figure 6F).6 Taken together these data show that regulatory regions with heptad occupancy in progenitor populations are regions occupied by lineage-specific TFs in more mature cells.
Cell-type specificity of regulatory elements is encoded in the underlying motif composition
TFs bind DNA through consensus binding motifs whose sequence and relative locations in each regulator determine which TF complexes can potentially bind. To better understand enhancer features underpinning lineage-specific TF occupancy, we selected regions with differential accessibility and TF occupancy in HSC-MPP (Figure 7Ai, 3992 regions), GMP (Figure 7Bi, 4395 regions), and MEP (Figure 7Ci, 3469 regions) and developed machine learning models to predict cell-type associations based on DNA sequence motifs. All models showed high sensitivity and specificity to associate regions with cell types (Figure 7Aii,Bii,Cii; supplemental Table 7). To understand how specific motifs contribute to assigning regulatory elements to cell types, we calculated SHapley Additive exPlanations (SHAP) values for each cell type (Figure 7Aiii,Biii,Ciii; supplemental Table 8). Distinct combinations of motifs were found to contribute to each model, many of which fit the expected profile for cell-specific regulatory regions. For example, ETS motifs had positive SHAP values in the GMP model but negative values in the MEP model, consistent with known roles for ETS factors, such as PU.1, in driving myeloid differentiation, whereas GATA motifs had positive SHAP values in HSC-MPP and MEP models but negative values in the GMP model, consistent with GATA factors, such as GATA1, driving the erythroid lineage.55,56 Motifs not corresponding to heptad factors had high SHAP scores in all models, consistent with heptad factors binding at lineage-specific enhancers in the context of larger regulatory protein complexes. We explored expression of nonheptad TFs corresponding to motifs predicted to positively affect SHAP scores for each model (supplemental Table 8). GATA3 (HSC-MPP model), PU.1, and SPI-B (GMP model), and GATA1 (MEP model) had lineage-specific expression patterns consistent with known expression and activity of these factors49,53,56,57,58,59,60 (supplemental Figure 12A-C). All other motif-associated TFs had minimal expression variation across cell types.
Finally, we used our models to classify functional HSC regulatory elements identified through analysis of γ-retroviral integration sites (γRV-IS) in patients undergoing gene therapy.61 γRV-IS had similar features to HSC-MPP–specific regions (supplemental Figures 7A and 12D). Importantly, our models scored γRV-IS as most likely to be HSC-MPP–specific, confirming the validity of our analyses (supplemental Figure 12E).
Discussion
In this study we explored genome-wide dynamics of chromatin occupancy and structure in 4 cell types along the HSC to myeloid/erythroid differentiation axis. Analysis of putative CREs with direct looping to heptad promoter regions revealed greater regulatory complexity than previously appreciated. Previous work identified a set of 9 regions with combinatorial heptad binding in human HSPCs whose relative accessibility is sufficient to predict cell identity.23 This study identified >30 putative CREs directly looped to heptad promoters in at least 1 cell type (Figure 3B). Although most previously described heptad regulatory elements have been tested in functional assays (supplemental Table 4), further validation is required to precisely understand individual and cooperative roles of specific CREs for each heptad gene.62,63 Surprisingly, some previously identified enhancers were not directly looped to heptad promoters in this analysis. For example GATA2−117, an enhancer dysregulated in inv(3) acute myeloid leukemia,64,65 was only indirectly looped to the GATA2 promoter, and therefore excluded from our network model (supplemental Figure 6B). However, lack of direct looping does not preclude a role for this enhancer in regulating GATA2. Indeed, previous studies have found that enhancer-enhancer interactions can stabilize and amplify TF binding, and the resulting enhancer communities can drive and coordinate cell-type–specific gene expression.66,67 Consistent with this model, we observed a trend of increasingly complex chromatin looping interactions involving heptad promoters and putative enhancers as cells committed to either erythroid or myeloid lineages (Figure 3B). Increased looping may restrict committed progenitors from accessing alternate fates or participate in shutting off stem cell programs. Intriguingly, the stem cell gene ERG23,29,30 acquired loops to a +1200 kb region in MEPs (Figure 3Ai), whereas increased long range looping and heptad binding occurred at the +45 enhancer of the erythroid gene TAL114,68 (supplemental Figure 6Di). Thus, increased looping can be associated with genes subsequently down or upregulated. Further experimentation is needed to fully understand rapidly increased chromatin looping in committed progenitors.
Motif-based analysis showed that heptad-occupied CREs contain information encoding cell-type specificity. Among motifs that contributed to cell-type specificity, we found motifs for additional factors including STAT, interferon-regulatory factor, and homeodomain TFs (Figure 7; supplemental Table 8). Although our focus was on heptad factors, it is clear that additional regulators co-occupy CREs and contribute to enhancer specificity.69 To date these studies have focused on transcriptional cofactors with broad function, such as P53 and MED14,70 and further work is required to understand how heptad factors, general cofactors, and intrinsic features of enhancers and promoters cooperate to regulate transcriptional programs throughout hematopoiesis.
Numerous bone-marrow failure syndromes have an underlying germ line genetic cause and may be amenable to gene therapy approaches targeting primitive HSCs to enable long-term benefits. However, ectopic gene expression may lead to lineage skewing and other unwanted outcomes. Cell-type–specific enhancers are conserved across evolution and can potentially be used to restrict transgene expression.6,71,72 Indeed, erythroid-specific GATA1 enhancers can drive restricted expression of GATA1 to rescue erythroid differentiation in Diamond-Blackfan anemia.73 However, identifying enhancers with appropriate spatiotemporal activity is not trivial. Although experimental techniques for massively parallel interrogation of enhancer activity and specificity have been developed,63,74 our data provide a catalog of putative enhancers that could fast track functional studies.
GRNs are significantly perturbed in multiple acute myeloid leukemia subtypes,8,75 often as a result of specific translocation events, including RUNX1-RUNX1T176,77 and inv(3).64,65 Recent sequencing studies found that small structural variants (SVs) can mediate enhancer hijacking and subsequent oncogenic transformation.63,78 However, understanding the effects of SVs requires significant knowledge of cell-specific enhancers; our data set provides extensive annotation of regulatory regions that could be used to guide functional experiments exploring the effects of specific SVs.
Although extensive, our data set is limited to myeloid cell types. Sorted HSPCs, particularly CMPs, are also inherently heterogenous79-81 which may limit our ability to resolve differences among adjacent cell types. Furthermore, although capable of reconstituting patients who underwent transplantation, mobilized cells may have subtle epigenomic differences to their unstimulated counterparts. Finally, even using low cell input techniques, our data sets represent averaged binding and looping probabilities captured across a shifting epigenetic landscape, rather than truly capturing GRNs, which operate at the level of single cells.7 Nonetheless, our data set represents the most extensive epigenetic characterization of primary human blood progenitors to date and provides important insights for future studies.
Acknowledgments
The authors acknowledge members of the Pimanda group for assistance in thawing apheresis bags. The authors thank the South Australian Cancer Research Biobank and all patients who donated samples.
J.A.I.T. was supported by the Anthony Rothe Memorial Trust. B.G. acknowledges funding from Wellcome (206328/Z/17/Z). J.W.H.W was supported by the Research Grants Council, HK (17100920). J.E.P. was supported by grants from the National Health and Medical Research Council of Australia (GNT1139787, GNT2011627, MRF1200271), a translational program grant from the Leukemia and Lymphoma Society-Snowdome Foundation-Leukaemia Foundation (6620-21), and the Anthony Rothe Memorial Trust.
Authorship
Contribution: S. Subramanian, J.A.I.T., J.W.H.W., and J.E.P. conceptualized the study; S. Subramanian, J.A.I.T., Y.H., S. Jacquelin, S. Shen, E.S., C.B., P.S.W., D.C-F., D.B., A.T., and S.W.L. developed the methodology; S. Subramanian, J.A.I.T., Y.H., S. Shen, E.S., S. Joshi, D.J.C., K.Y., V.A., T.O., J.A.P., I.D.L., S.M.P., and M.K.G. carried out investigation; S. Subramanian, J.A.I.T., P.C-P., F.C.K., F.V., E.S.W., B.G., H.A.R., J.W.H.W., and J.E.P. did formal analysis; J.A.I.T., J.W.H.W., and J.E.P. supervised the study; S. Subramanian, J.A.I.T., and J.E.P. contributed to writing original draft; J.A.I.T. and J.E.P. reviewed and edited the manuscript; and J.E.P. contributed in funding acquisition.
Conflict-of-interest disclosure: The authors declare no competing financial interests.
Correspondence: John E. Pimanda, Level 2, Lowy Cancer Research Centre, University of New South Wales, Sydney, NSW 2052, Australia; e-mail: jpimanda@unsw.edu.au; Jason Wong, L1-51, Laboratory Block, 21 Sassoon Road, Pokfulam, Hong Kong SAR, China; e-mail: jwhwong@hku.hk; and Julie Thoms, Level 2, Lowy Cancer Research Centre, University of New South Wales, Sydney, NSW 2052, Australia; e-mail: j.thoms@unsw.edu.au.
References
Author notes
∗S. Subramanian and J.A.I.T. contributed equally to this study.
Sequencing data have been uploaded to Gene Expression Omnibus with accession number GSE231486.
Any original data not included herewith will be shared upon reasonable request from the corresponding author, John E. Pimanda (jpimanda@unsw.edu.au).
The online version of this article contains a data supplement.
There is a Blood Commentary on this article in this issue.
The publication costs of this article were defrayed in part by page charge payment. Therefore, and solely to indicate this fact, this article is hereby marked “advertisement” in accordance with 18 USC section 1734.