Abstract
Hypersensitive site 3 (HS3) of the β-like globin locus control region has been implicated as an important regulator of the β-like globin genes, but the trans factors that bind HS3 have only been partially characterized. Using a five-species alignment (human, galago, rabbit, goat, and mouse) that represents 370 million years of evolution, we have identified 24 phylogenetic footprints in the HS3 core and surrounding regions. Probes corresponding to the human sequence at each footprint have been used in binding studies to identify the nuclear factors that bind within and near these conserved sequence elements. Among the high-affinity interactions observed were several binding sites for proteins with repressor activity, including YY1, CCAAT displacement protein, and G1/G2 complexes (uncharacterized putative repressors) and several binding sites for the stage selector protein. To complement this analysis, orthologous galago sequences were also used to derive probes and the pattern of proteins binding to human and galago probes was compared. Binding interactions differing between these two species could be responsible for the different expression patterns shown by the two γ genes (galago γ is embryonic; human γ is fetal). Alternatively, binding interactions that are conserved in the two species may be important in the regulation of common expression patterns (eg, repression of γ in adult life).
EACH OF THE FIVE active human β-like globin genes is expressed in an erythroid-specific and developmental stage-specific fashion. Interestingly, the arrangement of these genes on chromosome 11 parallels their temporal order of developmental expression: 5′ ε (embryonic), γ1 (fetal), γ2 (fetal), δ (postnatal), β (postnatal) 3′. Studies in several laboratories have shown that within the 60-kb region of human chromosome 11 that contains the five active genes proper, regulatory elements exist that control the characteristic expression pattern of each gene.1-3 Thus, both the γ and the β genes show regulated developmental expression (albeit at low levels) in erythroid cells when human constructs containing only these genes and a few hundred base pairs of 5′ flanking sequence are tested in transgenic mice. However, high-level expression of each of the globin genes requires sequences located several kilobases upstream from ε.4 These upstream sequences were first identified by their altered chromatin structure5,6 and later shown to harbor domain opening and enhancing functions.7-11 The critical sequences span five super hypersensitive DNase I sites (HS1-HS5). Because these sequences are dominant regulators of the entire gene cluster, they are collectively referred to as the locus control region or LCR. The fact that deletion of the region encompassed by HS2-HS5 causes a severe γδβ0 thalassemia12 attests to the importance of the LCR in globin gene regulation. 
The sequences spanning the hypersensitive cores of HS1-HS4 have been extensively probed in both binding and functional assays7,10,11,13-24; site 5 is less well studied.25,26 In the course of these studies in different laboratories, different landmarks have been used to define the functional cores; effects of these cores have been studied on different genes (γ or β or γβ or γγδβ constructs), and the analyses have been performed at different times in development in transgenic mice or in different systems altogether (transient v stable assays; MEL v K562 cells). Although much meaningful information has nevertheless emerged, several important questions remain. One critical question regards the location and characterization of the key regulators that govern LCR function.
One approach to the identification of these cis regulators is the use of evolution. All eutherian mammals possess β-like globin clusters that were derived from the same ancestral five member cluster (5′-ε-γ-η-δ-β-3′).27 Within this cluster, similar developmental switches in gene activity occur in all mammalian species, indicating that the regulatory machinery that controls developmental switching and erythroid-specific expression is conserved. The fact that the human globin genes undergo a developmental switching program in the transgenic mouse confirms that many of the cis signals encoded by human globin genes are properly recognized in the murine background. Moreover, experiments in the chicken suggest that the evolution of the switching machinery predated the separation of aves and mammals.28 To detect conserved sequence elements that could be involved in globin gene regulation, we have used alignments in which sequences from multiple mammalian species are represented. In such alignments, constrained (and possibly functionally relevant) motifs appear as invariable regions (phylogenetic footprints).29-32
In this study, we have applied phylogenetic footprinting to the region surrounding hypersensitive site 3 of the LCR. The region surveyed includes sequences spanning the hypersensitive core and surrounding regions, and extends 430 bp further 3′ from the HindIII site that has marked the 3′ end of the HS3 fragment (1.9 kb HindIII) used in the majority of functional studies. This region shows a great deal of sequence conservation but has not been previously tested in binding or expression studies. A total of 24 phylogenetic footprints were analyzed in the region surveyed; oligonucleotide probes corresponding to all of these conserved sequences were found to contain binding sites for nuclear factors. To further determine which of these interactions are actually functionally conserved, and which might represent newly gained motifs in the human genome, galago probes corresponding to all of these human sequence oligonucleotides were also tested in binding assays. Finally, to facilitate the general use of the large amount of binding data generated in this study, and to begin to correlate this binding information with binding and functional studies already in the literature, we are working to integrate these data into an existing Web Server called the Globin Gene Server. Presently, this Server site allows the user to observe multispecies alignments for any region of the globin cluster and to query for information on DNA transfer experiments, DNA-protein binding results, and location of DNaseI hypersensitive regions. Software that will allow interactive electronic exploration of sequence alignments in direct register with all existing functional data for a given region is being developed and refined.
MATERIALS AND METHODS
Alignment. The alignment described here covered bases 4205 to 5606 of the human β cluster (GenBank HUMHBB). All sequences used are available through GenBank; the alignment itself can be accessed on the Globin Server (http://globin.cse.psu.edu/). A schematic view of the alignment is presented in Fig 1. Within the aligned sequences, 24 phylogenetic footprints were analyzed (FP1-FP24). For each phylogenetic footprint tested, two double-stranded oligonucleotide probes were synthesized, one corresponding to the human sequence and one corresponding to the galago sequence (Table 1). The sequences corresponding to the phylogenetic footprints are underlined in Table 1. Oligonucleotides were synthesized on an Applied Biosystems Synthesizer, Model 380B (Applied Biosystems, Foster City, CA). The 5′ end of the antisense strand of all probes contained a GATC overhang which facilitated Klenow labeling of the annealed probes.
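The probe layout described above can be illustrated with a minimal sketch. It assumes that the antisense strand is simply the reverse complement of the sense strand with GATC added at its 5′ end, and that Klenow fill-in extends the 3′ end of the sense strand across that overhang. The function names and the example sequence are hypothetical and are not taken from Table 1.

```python
# Sketch only: strand layout for a double-stranded probe whose antisense
# strand carries a 5' GATC overhang. Names and example are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def design_probe(sense, overhang="GATC"):
    """Return (sense, antisense), both written 5'->3'.

    The antisense strand is the reverse complement of the sense strand
    with the overhang prepended to its 5' end.
    """
    sense = sense.upper()
    return sense, overhang + reverse_complement(sense)


def klenow_filled_sense(sense, overhang="GATC"):
    """Sense strand after fill-in: its 3' end is extended by copying
    the overhang of the antisense strand."""
    return sense.upper() + reverse_complement(overhang)


sense, antisense = design_probe("AGATAG")   # hypothetical 6-bp footprint
print(sense, antisense)                      # AGATAG GATCCTATCT
print(klenow_filled_sense(sense))            # AGATAGGATC
```

Because GATC is its own reverse complement, the filled-in sense strand ends in GATC and the labeled probe is fully double stranded.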
Gel shift assays, competition assays, antibody supershifts, and limited protease digestion. Labeling of probes, preparation of nuclear extracts, and performance of gel-shift assays were as previously described.33 Binding studies were performed using erythroid (K562, HEL, MEL, U937) as well as nonerythroid (HeLa, CaCo2, HepG2, Jurkat, and B cells) nuclear extracts. Competition assays were performed by preincubation of nuclear extract (5 minutes, room temperature) with a significant molar excess of unlabeled competitor oligonucleotide before addition of labeled probe. Identification of complexes containing CCAAT displacement protein (CDP) was accomplished by preincubation of nuclear extract with CDP antibody (kindly provided by Dr Ellis Neufeld, Children's Hospital, Boston, MA) in the absence of poly(dIdC). After 5 minutes at room temperature, poly(dIdC) and probe were added. These binding reactions were loaded onto 3.5% polyacrylamide gels and the presence of CDP was detected by loss of complex formation in the presence of antibody. For upstream transcription factor (USF) supershifts, antibody (a generous gift of Dr Emory Bresnick, University of Wisconsin, Madison) was added after incubation of probe, nonspecific competitor, and nuclear extract. Under these conditions, complexes containing USF remained at the top of the well.
For confirmation of YY1 binding, semi-purified preparations of YY1 were used. These were prepared as described earlier34 and contain significant amounts of YY1 binding activity, but no binding activity for any other transcription factor tested in this study.
RESULTS
Twenty-four phylogenetic footprints are detected in 1.4 kb of hypersensitive site 3. The region shown schematically in Fig 1 covers 1,401 bp of the hypersensitive site 3 region (positions 4205 to 5606 of the human sequence, GenBank HUMHBB). Thus, the alignment includes the previously described functional core of HS322,23 and extends an additional 342 bp 5′ and 834 bp 3′ from this core. The relationship of the region analyzed here to the areas previously assessed in functional studies is illustrated in Fig 1A.
Over the majority of the region studied, sequences from five mammalian species are available: human, galago, goat, rabbit, and mouse. Because evolutionary time is additive within such an alignment, this alignment spans roughly 370 million years (MY) of evolution. In previous studies, we defined a phylogenetic footprint as six contiguous base pairs showing 100% conservation.31,32 However, given the relative flexibility in binding exhibited by most transcription factors, the 6-bp criterion may be overly strict. For example, GATA-1 has been shown to bind with high affinity to AGATAG, TGATTG, AGATTG, GGATAA, TGATTA, GGTTAG, GGATAG, CGATAC, CGATTA, CGATAG, and AGATAA.35 Since the alignment used here covers a large amount of evolutionary time over which small fluctuations in sequence could be tolerated by GATA-1 and other transcription factors, a more relaxed phylogenetic footprinting criterion was used: the 6-bp window was preserved, but one base pair mismatch relative to the human sequence was allowed per species. Some of the motifs selected by this criterion are not highly conserved in sequence, but may be conserved in terms of binding function. A total of 24 probes were synthesized spanning these modified footprints (FP1-FP24, Fig 1A); some covered more than one phylogenetic footprint. Because some phylogenetic footprints were clustered, overlapping oligonucleotides were synthesized to ensure that conserved motifs were centered within the probe. These overlaps are shown schematically in Fig 1B.
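The relaxed criterion can be sketched as a sliding-window scan over a gapped multiple alignment. This is an illustration under stated assumptions, not the procedure actually used to build Fig 1: the function name, the treatment of gaps (windows with a gap in the human row are skipped), and the toy three-species alignment are all hypothetical.

```python
# Sketch only: scan an alignment for 6-bp windows in which every species
# differs from the human sequence at no more than `max_mismatch` positions.
# The alignment below is a toy example, not real HS3 sequence.

def relaxed_footprints(human, others, window=6, max_mismatch=1):
    """Return 0-based start columns of windows passing the criterion.

    human  : aligned human sequence (gaps written as '-')
    others : dict mapping species name -> aligned sequence, same length
    """
    hits = []
    for start in range(len(human) - window + 1):
        ref = human[start:start + window]
        if "-" in ref:
            continue  # assumption: ignore windows with gaps in human
        if all(
            sum(a != b for a, b in zip(ref, seq[start:start + window]))
            <= max_mismatch
            for seq in others.values()
        ):
            hits.append(start)
    return hits


human = "TTAGATAGCC"
others = {"galago": "TTAGATAACC", "mouse": "CAAGATTGCA"}

print(relaxed_footprints(human, others))                  # [2, 3]
print(relaxed_footprints(human, others, max_mismatch=0))  # []
```

Setting `max_mismatch=0` reproduces the earlier strict definition (six contiguous, fully conserved base pairs). In the toy alignment no window passes the strict test, whereas two overlapping windows pass the relaxed one.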
All 24 human probes detected nuclear proteins. Figure 2 displays the gel shift patterns for each of these probes; the side-by-side run in three gels facilitates comparison of band migration. However, because it is difficult to obtain optimum binding of all factors using the same conditions, an idealized version of the binding results is displayed in Fig 3. This schematic compilation will be placed on the Globin Server. Band intensity in Fig 2 reflects not only binding affinity, but also specific factor concentration within the extract and the presence or absence of overlapping binding sites on the probe. To more directly compare affinity of binding among all probes and to establish a relative hierarchy of binding affinity, competition experiments were done. For these experiments, oligonucleotides containing well-characterized binding sites were labeled and used as probes. Each of the 24 HS3 competitor DNAs was tested at several concentrations for its ability to compete for binding of specific complexes. The results of these experiments are tabulated in Table 2. In Fig 3, band thicknesses reflect these affinity data. Although most of the binding factors could be identified, six could not. The complexes representing these unidentified factors could not be competed by oligonucleotides containing known binding sites for Sp1, Oct-1, GATA-1, CP1, CDP, YY1, SSP,36 CSBP-2,32 G1/G2,37 NF-E2, AP1, EKLF, USF, or TEF-2.
Binding interactions in the core of HS3. The hypersensitive core of HS3 has been finely mapped in MEL cells by Philipsen et al.22 The two areas of hypersensitivity detected in that study (corresponding to HUMHBB 4240-4375 and 4550-4780) are indicated by the stippled bars in Fig 1. Nine phylogenetic footprints (FP1, FP2, and FP5-11) are contained within these hypersensitive regions; seven of these (FP5-11) lie within the 225-bp core (HphI to Fnu4HI) that harbors most of the enhancing activity of the HS3 region.22,23
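As a consistency check on the coordinates quoted so far, the short sketch below adds the 342-bp 5′ extension, the 225-bp core, and the 834-bp 3′ extension and compares the total with the span of the alignment (4205 to 5606). The core boundaries it prints are inferred from this arithmetic alone, on the assumption that the 5′ and 3′ extensions are measured from the HphI-Fnu4HI core; they are not coordinates stated in the text.

```python
# Sketch only: arithmetic check of the region sizes quoted in the text.
# The inferred core coordinates are an assumption, not a reported value.

ALIGN_START, ALIGN_END = 4205, 5606   # HUMHBB positions of the alignment
FLANK_5, CORE_LEN, FLANK_3 = 342, 225, 834


def inferred_core(align_start=ALIGN_START, flank_5=FLANK_5, core_len=CORE_LEN):
    """Return (start, end) of the core implied by the quoted lengths."""
    start = align_start + flank_5
    return start, start + core_len


span = ALIGN_END - ALIGN_START
print(span, FLANK_5 + CORE_LEN + FLANK_3)   # 1401 1401
print(inferred_core())                       # (4547, 4772)
```

The inferred interval falls close to the 3′ area of hypersensitivity (4550-4780), which is what would be expected if that area corresponds to the HphI-Fnu4HI core.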
In their analysis of binding interactions within the core of HS3, Philipsen et al22 observed six DNaseI footprints in the HphI-Fnu4HI fragment. A subset of the specific proteins binding to these footprints were also identified using gel shift studies.18,23 A number of the probes used in the present analysis overlap substantially with those used in the Philipsen study (Table 3) and have allowed further characterization of several previously undefined binding interactions. Of particular interest is the fact that factor “X,” a ubiquitously distributed protein that binds to footprint 1 of Philipsen et al,23 has now been identified as CSBP-2 (Conserved Sequence Binding Protein-2)32 in the probe FP6. Figure 4A establishes that the FP6 oligonucleotide is one of four (FP18, FP8, FP6, and FP2) that compete well for the binding of CSBP-2 to a previously characterized CSBP-2 binding site located at −835 bp upstream from the ε gene. Six additional probes showed weaker competition (data not shown; tabulated in Table 2). In an earlier study, a common 11-bp motif was noted in several strong CSBP-2 binding sites in the ε upstream region.32 As shown in Fig 4C, nine nucleotides of this motif are also well conserved in two of the four strong CSBP-2 binding oligonucleotides of HS3 (FP6 and FP18). FP8 contains a related motif that matches six of the nine positions, while FP2 contains four different motifs that each match at six of nine positions: AAGAGTCAA (sense strand); CTCAGTCTT (antisense strand); ACAAGACTG (sense strand) and CTGAGCTCA (part of a dyad, and found on both strands). To further confirm that this motif represents the core binding motif for CSBP-2, an oligonucleotide probe was synthesized that was identical to −698ε, except that four bases at the center of the putative core motif were altered (AGAC to CTCG, Fig 4C). This mutation severely curtailed CSBP-2 binding (Fig 4B). 
In competition studies, this mutant oligonucleotide failed to compete for CSBP-2 binding to FP18, FP8, FP6, and FP2 (not shown). Other mutations that were earlier shown to affect the binding of this complex are summarized in Fig 4C.23 All of these data support the sequence SHBAGAYAS as the recognition core motif for the CSBP-2 complex.
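The proposed recognition motif is written in IUPAC nucleotide codes (S = C or G; H = A, C, or T; B = C, G, or T; Y = C or T). The sketch below scores a 9-bp site against SHBAGAYAS position by position. It assumes that the nine-position motif discussed above is equivalent to this consensus; the function name is hypothetical, and the "ideal" site and its AGAC to CTCG mutant are constructed for illustration and are not the −698ε sequence.

```python
# Sketch only: count positions at which a 9-bp site agrees with the
# IUPAC consensus SHBAGAYAS proposed for CSBP-2.

IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "S": "CG", "H": "ACT", "B": "CGT", "Y": "CT",
}

CONSENSUS = "SHBAGAYAS"


def consensus_matches(site, consensus=CONSENSUS):
    """Number of positions where `site` is allowed by `consensus`."""
    return sum(base in IUPAC[code] for code, base in zip(consensus, site.upper()))


# The four FP2 motifs listed in the text (as written on their own strand).
for motif in ["AAGAGTCAA", "CTCAGTCTT", "ACAAGACTG", "CTGAGCTCA"]:
    print(motif, consensus_matches(motif))   # each prints 6

# Hypothetical full-match site and its central AGAC -> CTCG mutant.
print(consensus_matches("CTCAGACAG"))   # 9
print(consensus_matches("CTCCTCGAG"))   # 5
```

Scored this way, each of the four FP2 motifs agrees with the consensus at six of nine positions, and the central four-base substitution removes all four matches in the AGAY portion of the motif.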
To further characterize the CSBP-2 protein, binding was performed in the presence of the metal chelators EDTA or orthophenanthroline. Most zinc finger proteins show reduced binding in the presence of these compounds, as illustrated by the response of YY1, a well-characterized zinc finger protein (Fig 4D, lanes 3 and 4). In contrast, CSBP-2 binding is not reduced in the presence of chelators (Fig 4E, lanes 2 through 5). Thus, CSBP-2 is not likely to be a member of the zinc finger class of transcription factors. Insensitivity of this complex to chelators was confirmed using all four probes with high affinity CSBP-2 sites (FP18, FP8, FP6, and FP2).
Other previously undescribed binding interactions of note within the HphI-Fnu4HI core include two strong sites for SSP (stage selector protein)36 in FP5 and FP8 (Figs 4A and 5). The high-affinity binding site for G1/G2, a complex of putative repressor proteins, within FP8 was described in an earlier study37 and is further discussed below.
Binding to conserved sequences 5′ to the HS3 core. The DNaseI hypersensitive region 5′ to the functional core contains two phylogenetic footprints. One of these, FP1, binds both YY1 and GATA-1 with high affinity. Simultaneous binding of these two factors has also been observed in the ε promoter, and it has been postulated that the differential binding of these two factors during development could be involved in the switching mechanism.38,39 Simultaneous strong binding of YY1 and GATA-1 also occurs at FP11. Several other probes are bound by both YY1 and GATA-1, but in these cases one of the two factors binds strongly and the other weakly (FP3, FP6, FP7, FP12, FP13, and FP16; see Fig 2 and Table 2). FP2, which also falls within the 5′-most region of hypersensitivity, is bound with high affinity by SSP (Figs 4A and 5), CSBP-2 (Fig 4A), and an unidentified protein (last column of Table 2, factor 1) which appears to be enriched in erythroid cells, but not restricted to this cell type (data not shown).
The region between the two DNaseI hypersensitive regions contains two conserved sequence motifs. FP3 binds YY1 with high affinity, and CSBP-2 and GATA-1 weakly. FP4 contains a highly conserved binding site for NF-E2 of moderate affinity which had been previously noted.18,22,23 In functional assays in MEL cells, this site caused a moderate increase in expression level when linked to fragments from the HS3 core and the human β-globin promoter.18 Furthermore, restriction fragments containing this site can enhance expression of an ε globin-luciferase construct in transiently transfected K562 cells,40 and in vivo footprinting studies show contacts at this site in U11 cells.41
Binding to conserved sequence elements 3′ to the hypersensitive core region. Probes FP12 through FP15 lie outside of the HS3 core but within the 1.9-kb HindIII fragment that has been tested in functional studies (Fig 1). The FP12 probe is bisected by the Fnu4HI site that defines the 3′ end of the most active core fragment and the FP16 probe is bisected by the 3′ HindIII site at 5172. Several binding interactions in this region are of particular note. FP13 contains a high-affinity binding site for the G1/G2 complex of proteins (Fig 6), proposed repressors of embryonic globin genes in fetal cells.37 Thus, two sites for the G1/G2 complexes exist in HS3 (FP8 and FP13). Figures 6A and B illustrate the cross competition between G1/G2 complexes binding to FP8, FP13, and the galago proximal CCAAT box probe (G), the first binding site described for these complexes.37 In Fig 6C, the cell-type specificity of the G1/G2 complex is confirmed using the FP13 probe; the complexes are present in K562 and absent in Jurkat cells. This had been shown earlier for the FP8 probe and the GALCAT probe.37
FP14 contains binding sites for CSBP-2, SSP, and YY1 (Figs 1 and 5). The binding of all three of these proteins to the same probe was previously noted at three sites (−1095, −835, and −49) upstream from the ε gene.32 FP15 contains a strong Sp1 binding site (Table 1; Fig 5). It has been suggested that Sp1 is capable of protein-protein interactions with itself,42 with GATA-1,43 and with YY1,44,45 and it has been speculated that such protein-protein interaction could be important in generating contacts between the LCR and the individual promoters.43 Thus, HS3 contains two high-affinity Sp1 sites located within or near conserved regions (FP9 and FP15) and two additional avid Sp1 sites, neither of which is conserved (Table 3).22,23
FP16 is bound by CDP (CCAAT displacement protein) as shown by competition assays (Fig 7A) and antibody-mediated binding inhibition (Fig 7B). This is one of four CDP binding sites within HS3 (FP1, FP16, FP19, FP20).
Finally, numerous YY1 binding sites were found in this region (Fig 1, Fig 2, Fig 4A, Table 1, Table 2). Interestingly, eight of the highest affinity sites in HS3 are located within a region that begins at the 3′ end of the HS3 core and stretches over 650 bp: FP9/10 (the same YY1 site is probably detected by overlapping FP9 and FP10 probes), FP11, FP12, FP13, FP14, FP15, FP16, FP17. For all of these probes, the identity of the YY1 complex was confirmed by performing binding assays with highly purified YY1 protein prepared as described previously34 (data not shown).
Binding interactions in functionally untested areas 3′ to HS3. Probes FP17 through FP24 lie 3′ to the HindIII site at 5172. Only one study using transgenic mice has included this region in an HS3-containing fragment47; however, the function of the region 3′ to the HindIII site was not specifically tested in that study. FP17 contains a very high-affinity site for YY1 and little additional binding activity. A strong binding site for USF was identified in FP18 (Fig 8A) and confirmed by supershift analysis with an anti-USF antibody (Fig 8B). The FP18 probe is also bound with high affinity by CSBP-2 (Fig 4A) and with low affinity by SSP and YY1 (Figs 4A and 5A). Binding sites for CDP are present in probes FP19 and FP20 (Fig 7).
Moderate to strong binding sites for the erythroid-specific GATA-1 protein are detected with FP19, FP20, FP21, and FP24 and a weaker GATA-1 interaction is seen with FP23 (Figs 2 and 3). FP20 and FP21 probes overlap and most likely detect the same GATA-1 interaction. The detection of four GATA-1 binding sites in this region is of interest considering the central role played by this factor in erythroid development48,49 and the possibility that GATA-1 could mediate LCR-promoter contact via protein-protein interactions.43 As has been noted in earlier studies, Oct-1 binding often accompanies GATA-1 binding,31,32 and this is the case with each of these probes. Oct-1 binding is weak on FP19 and FP20 and strong on FP21, FP23, and FP24. Competition studies suggest that the binding of CDP to FP19 and FP20 interferes with the binding of Oct-1 to these probes (data not shown). Finally, a strong binding site for SSP is detectable with the FP23 probe (Figs 4A and 5A).
Patterns of factor binding. When all data are accumulated (Fig 2), three overall patterns are apparent. First, a striking set of strong YY1 interactions is visible in the center of the region surveyed. While FP9 and FP10 likely share the same YY1 binding site, none of the other eight probes (FP11 through FP18) that exhibit YY1 binding are overlapping. Additional weak sites (FP6, FP7, FP8, FP18, see Table 2) extend this YY1-rich region. A second pattern is shown by the distribution of Oct-1 sites, which appear to be concentrated in the 3′ end of the region surveyed. In this region, except for FP22, Oct-1 sites are always accompanied by GATA-1 sites. In the 5′ end of the region, in contrast, strong GATA-1 sites are seen in the absence of Oct-1 binding. A final pattern concerns the distribution of factor 5, for which eight sites were detected. Five of these were located in the 3′ end of the region tested.
Conservation of binding interactions. To further pinpoint the subset of interactions that are most highly conserved, all 24 probes were resynthesized using the galago sequence at each site. Human and galago last shared a common ancestor approximately 55 MY ago.29 Although the galago is a primate, its globin genes (particularly the γ and β globin genes) are expressed in a pattern that is more similar to mouse and rabbit than to human. Binding interactions that are conserved between human and galago may be significant for common aspects of globin gene regulation that are shared by both species (eg, expression of ε in embryonic life and its repression in the fetal liver) while binding interactions that differ could be important for the evolution of these new patterns of expression (eg, expression of the γ gene as an embryonic gene in galago and a fetal gene in human). For each set of probes, binding patterns were compared directly; six such comparisons are shown in Fig 9. In addition, the ability of the galago oligonucleotide (100 ng) to compete for binding of identified human complexes was tested. Proteins bound by galago and not human probes were not further identified. Binding interactions were designated “conserved” if evidence for binding to both probes could be shown by these two assays (regardless of differences in affinity of binding). In Table 2, all interactions that were found to be conserved between human and galago are shown in parentheses.
DISCUSSION
The HS3 region has been implicated in stage specificity,13 position independence,10 chromatin alteration,50,51 and transcriptional regulation.18,22,23,47 These diverse functions are more than likely the result of the very complex set of transcription factors that bind to this region of the LCR. Through the use of phylogenetic footprinting, we have identified a large number of evolutionarily conserved and potentially important cis elements within HS3 and have identified the trans factors binding at or near these elements. In addition, the resulting “maps” of binding factors reveal interesting patterns within the binding interactions that may be of functional importance. Finally, analysis of the sequences bound adds further information about the binding requirements of uncharacterized transcription factors such as CSBP-2. Several of the specific binding interactions identified in this study are of particular interest.
First, eight moderate- to high-affinity binding sites for SSP were observed (FP2, FP5, FP8, FP13, FP14, FP18, FP20, and FP23). Binding of SSP to the −50 region of the γ promoter appears to be responsible for the competitive advantage, in cis, of the γ promoter over the β promoter in transient assays in K562 cells.36 Those transient assays were performed using a construct containing HS2. We have detected several SSP sites in HS2 (Zhu et al, in preparation), but most of these HS2 sites are lower in affinity than those in HS3. It is not known whether the cis competition effects seen in the K562 assay require the SSP sites in HS2 nor whether HS3 would also mediate such an activity. It is also noteworthy that five strong binding sites for SSP were identified upstream from the ε gene in an earlier study.32 Thus, in addition to its involvement in the γ to β switch, SSP may play a role in ε gene transcription and/or LCR:ε interaction.
Several strong binding sites for the CSBP-2 protein were detected in this study. An earlier phylogenetic footprinting analysis of the ε upstream region showed seven binding sites for this protein; in contrast, no CSBP-2 binding sites were found in phylogenetic footprints upstream from the γ gene. One of the strongest CSBP-2 sites in HS3 was recognized in FP6. In an earlier analysis, Philipsen et al23 noted this same complex (called “Factor X” in that study) within the region known as DNase footprint 1. Deletion of footprint 1 resulted in loss of 70% of the activity of the entire core fragment in MEL assays.18 The investigators concluded that the transcriptional activity of footprint 1 is due to the presence of a strong binding site for GATA-1, because mutation of the factor X site in the presence of an intact GATA-1 site appeared to have no effect in transgenic mice.23 However, that analysis was carried out only on 13.5-day founders and the test construct contained only the β globin gene. Given the number of CSBP-2 binding sites upstream from ε, and the apparent importance of the HS3 region in embryonic expression,13,52 it would be of interest to test the role of this factor in ε gene expression. Derivation of a consensus binding sequence, SHBAGAYAS, will facilitate the identification of additional CSBP-2 sites in the globin cluster.
A striking arrangement of 13 clustered YY1 sites was detected in the center of the area surveyed (FP6-FP18, Table 2). Eight of these sites are of high affinity (Figs 2 and 3) and are located within a 650-bp region beginning at the end of the HS3 core and stretching 3′. Most of the high-affinity sites (6/8) are conserved in the galago (Fig 9 and data not shown). The location of these clustered YY1 sites is interesting in light of two recent studies in which deletions of different regions of HS3 were shown to produce dramatically different phenotypes in mice.52,53 In one study, deletion of the region that includes FP1-FP16 (and removes all but one of the clustered YY1 sites) had only mild effects on globin gene transcription.53 In a second study,52 deletion of the HS3 core (the region spanning FP5-FP11) seriously reduced the expression of the ε, γ, and β genes. Among the possible explanations for the dramatic effect of this smaller deletion, the investigators suggested a dominant negative effect mediated by sequences surrounding the deleted core. In fact, this deletion removes only two (FP9/10, FP11) of the nine consecutive YY1 binding sites immediately downstream from the core (Figs 2 and 3), and juxtaposes the strong YY1 binding sites at FP1 and FP3 to the remaining YY1 sites downstream from the core. Strong silencing effects mediated through these multiple juxtaposed YY1 binding sites and/or other nearby repressor sites (such as the three CDP binding sites in FP16, FP19, and FP20 or the strong G1/G2 site in FP13) could potentially lead to a dominant negative phenotype in the absence of the core region.
In fact, an enhancer silencing property of YY1 was recently shown in studies of the embryonically expressed chicken ρ-globin gene in K562 cells.54 The identification of this string of YY1 binding sites provides an opportunity to design experiments to test whether or not they can mediate a dominant negative silencing effect and to investigate the mechanism of such an effect.
Besides its repressor activities, YY1 has additional interesting properties. It is known to interact with other proteins such as Sp1.44,45 Because Sp1 binds to conserved CACCC motifs in the promoter regions of all β-like globin genes,55 these clustered YY1 sites could stabilize important enhancer/promoter interactions. In fact, YY1 is also known to interact with TAF55, a coactivator that is part of the basal transcription machinery.56 Moreover, several systems have been described in which YY1 mediates a switch from activation to repression,57-60 and all of these switching systems rely on the ability of YY1 to interact with other proteins. YY1 can also bend DNA58 and could thus be involved in the generation of the three-dimensional structure of the LCR holocomplex. Finally, YY1 has recently been shown to preferentially associate with the nuclear matrix.61 All of these attributes make further functional analysis of the multiple high-affinity binding sites for YY1 found in HS3 a compelling goal.
Downstream from the string of YY1 binding sites are three binding sites for CDP, another protein which has been implicated as a repressor.46 In Drosophila, the CDP homologue, cut, is involved in fate determination.62,63 The combination of CDP sites, YY1 sites, and a strong binding site for the G1/G2 complex of proteins, also putative repressors,37 make this region of HS3 (FP9 through FP20) remarkably rich in repressor binding proteins.
The tabular compilation of the binding results obtained in this study suggests that the functional properties of HS3 are carried out by multiple redundant sites for a limited number of transcription factors (Table 2). Conservation of many of the high-affinity YY1, GATA-1, Oct-1, G1/G2, SSP, CSBP-2, and Sp1 sites suggests that these factors could be important in the control of regulatory properties necessary for overall LCR function that are shared between human and galago globin genes. In contrast, several binding interactions are not conserved, including the strong USF site in FP18, all four CDP sites, the strong YY1 site in FP3, and avid SSP sites in FP5, FP18, FP20, and FP23. The evolutionary gain of some of these binding sites in the human LCR could have facilitated the gain of new LCR functions that affect the pattern of globin gene expression (eg, the recruitment of the previously embryonic γ globin gene to a fetal expression pattern).
Integration of the data in this report with the large number of other published binding and functional studies of the globin gene locus will be greatly aided by electronic collation. The Globin Gene Server64 provides an electronic database with the capacity to respond to user queries; software available at this web site should greatly aid in the integration of this information and the generation of new hypotheses regarding globin gene regulation.
ACKNOWLEDGMENT
The authors thank Dr Ellis Neufeld and Dr Emory Bresnick for generously providing antibodies for the supershift experiments performed with CDP and USF, respectively.
Support from National Institutes of Health Grants No. HL-48802 (D.L.G.), HL-33940 (M.G.), LM-05773 (R.H.), LM-05110 (R.H.), and DK-27635 (R.H.) is gratefully acknowledged. Supported in part by the Michigan General Clinical Research Center Grant MO1-RR00042, funded by the National Center for Research Resources (NIH-USPHS).
Address reprint requests to Deborah L. Gumucio, PhD, University of Michigan Medical School, Department of Anatomy and Cell Biology, 5793A Medical Science II, Ann Arbor, MI 48109-0616.