The major histocompatibility complex (MHC) is an extended cluster of genes that are remarkable for the number and importance of the immunological functions they encode. Historically, interest in the MHC emanated from tissue transplantation experiments, hence the reference to histocompatibility. Now we know that this highly conserved region encodes genes that have many important and varied functions, both immune and nonimmune in nature. One family of MHC genes, the human leukocyte antigen (HLA) genes, is the most polymorphic yet discovered, with hundreds of alleles described so far.1,10 Serotyping of these polymorphisms permitted genetic mapping with these loci in early disease-association studies. In addition, these polymorphisms have permitted extensive population studies revealing a number of interesting MHC associations and phenomena. Recent cDNA selections,2 genomic sequencing, and cosmid screenings have shown the existence of a large number of genes in the MHC, including a number that are expressed selectively in cells of the immune system.2-8 In this review we discuss recent data that considerably expand the number of immune-related genes encoded in the MHC and review factors that may underlie its large-scale organization.
The MHC includes coding regions for the well-known and eponymous polymorphic surface antigens of the HLA class I and class II type that, respectively, present antigen to CD8 and CD4 T cells. Intervening between the class II and the class I region of the human MHC is a class III region that includes genes for several of the components of the complement system.9,10 Genes in the class II region have been found to encode components of the cytoplasmic proteasome that degrades proteins to peptides11 as well as peptide transporters for loading peptides onto class I proteins.12 The class II region also encodes DM, a gene that facilitates loading and assembly of class II proteins.13,14 Just centromeric to the class I region is a cluster of genes encoding three inflammation related proteins: tumor necrosis factor-α (TNF-α), lymphotoxin α (LTA), and lymphotoxin β (LTB).15-18 Remarkably, the list of immune system–related genes and of chromosome segments potentially derived as duplications of parts of the MHC has grown substantially within the last year.
The MHC is of major biomedical interest because of its contribution to transplant rejection and to variation between individuals in susceptibility to a variety of autoimmune disorders.19-23 In a number of cases it is clear that this susceptibility is determined by variation in the protein sequence of the class II molecules themselves. However, as will be discussed below, the possibility must still be entertained that polymorphism in other immune system–related genes of the MHC also contributes to these diseases.
In addition to its role in influencing the propensity for known autoimmune diseases such as insulin-dependent diabetes mellitus, multiple sclerosis, systemic lupus erythematosus, myasthenia gravis, and rheumatoid arthritis (reviewed by Thomson, 1995,23), the MHC contains genes contributing to several other hereditary disorders that are either not autoimmune in nature or where the role of autoimmunity is uncertain. These include ankylosing spondyloarthropathies, where there is a clear association with the class I allele HLA-B27, and narcolepsy, equally clearly associated with certain class II alleles. The MHC also includes the genes for steroid 21-hydroxylase and hemochromatosis. Deficiency of 21-hydroxylase causes the adrenogenital syndrome, and hemochromatosis is one of the most common simple Mendelian disorders of man. Recently, a major quantitative trait locus for dyslexia was mapped to the distal MHC, and this locus is being actively pursued by several groups.24,25
The MHC has been extensively studied and a large number of excellent reviews exist on a range of related topics.1,10,12,19,20,26-50 The present review will largely be restricted to the newest developments in recognition of the complexity of the role of the human MHC in the immuno-hematologic systems and the relationship among segments of the MHC.
A SUBREGION OF THE MHC CONTAINS AN UNUSUAL CONCENTRATION OF GENES RELATED TO INFLAMMATION
By convention, the MHC is divided into three contiguous regions that approximate the location of genes with shared characteristics (Fig 1). Most centromeric on chromosome 6p is the class II region, which contains the 17 known HLA class II genes and pseudogenes.5 Contiguous to that is the class III region, which encodes several of the components of the complement system. Telomeric to the class III region is the class I region, which encodes more than 18 HLA class I–related genes and pseudogenes. Recently a number of genes putatively involved in inflammation have been identified in the central MHC, at the telomeric end of the class III region. As presented below, this group of genes may have roles in various aspects of stress, inflammation, or infection. We suggest that this concentration of genes is sufficiently distinct to be designated as the class IV region.
TNF family cluster. A cluster of genes for three related cytokines, TNF, LTA, and LTB, lies at the telomeric end of the class III region,42 shortly before the most centromeric class I–related genes. TNF has been very extensively studied47 and plays an important role in inflammation, bacterial51 and viral52 infection, tumor cachexia, and the immune response. It is produced by a variety of cells, most prominently monocytes, macrophages, and some T-cell subsets. In mice, variants in the TNF gene that affect the levels of TNF production are associated with variations in the level of susceptibility to renal disease in a lupus-like syndrome.53 Polymorphisms in the human TNF gene are associated with increased susceptibility to fatal cerebral malaria,54 rheumatoid arthritis, inflammatory bowel disease,55 and perhaps other autoimmune disorders.56-59
LTA (also called TNF-β) has actions that are very similar to those of TNF on a cellular basis but its pattern of expression is considerably more limited. Remarkably, deletion of the LTA gene leads to a specific absence of lymph nodes, Peyer's patches, and splenic germinal centers in mice,60,61 whereas mice lacking TNF develop lymph nodes and Peyer's patches but lack splenic primary B-cell follicles.62 Deletion of the p55 TNF receptor results in normal lymph node development and architecture, normal migration of lymphocytes into the lamina propria and epithelium of the small intestine, but no organized Peyer's patches and absence of germinal centers.63 Based on these observations it is speculated that the effect of LTA on the development of lymphoid organs may be mediated by distinct receptors, some functioning in an organ-specific context.
LTB (also called TNF-C) is a membrane-bound molecule that forms a heterotrimer with LTA.18 This LTA-LTB complex can then induce activation of NF-κB in certain cell lines by binding to the LTB receptor, a member of the TNF receptor family.64,65 NF-κB is a pleiotropic transcription factor capable of activating the expression of a great variety of genes critical for the immunoinflammatory response.65 The LTA-LTB complex is also weakly cytotoxic to some tumor cells but does not appear to induce apoptosis.
B144 (D6S49E). Within 15 kb upstream of the human TNF cluster is a gene designated B144 or LST-1 (leukocyte-specific transcript),66 a homologue of a partial cDNA first identified in the mouse.67 The transcript of this gene is expressed exclusively in monocytes, macrophages, and some T cells. Expression of the gene at the RNA level is increased by various activating stimuli, and in particular by γ-interferon (γ-IFN). The most probable gene product is encoded in an open reading frame whose cognate amino acid sequence is over 50% identical between humans and mouse. The amino acid sequence contains a long hydrophobic region near the N terminus but otherwise shows no apparent relationship to proteins of known function. The interpretation of the open reading frame and the regulation of this gene are complicated by the remarkable degree of alternative splicing that is seen in the mRNA products.
The B144 RNA is actually an ensemble of related sequences derived from four alternative 5′ exons and 3 internal exons as well as several forms of the 3′ exon. Exons 5 and 6 may be associated with any of the 5′ exons and may be separately or jointly omitted from the final mRNA product. The result is that 24 or more forms of mRNA may be seen in U937 monocyte cells. Some of these forms could encode altered forms of the putative protein while others affect only the 5′ untranslated region. The gene is also peculiar in that one spliced form apparently has a GA dinucleotide as the first bases at the 5′ end of an intron. The first two bases of introns are generally GT with an occasional GC, so that the B144 splice site is highly unusual and perhaps unique. The mouse gene also seems to have multiple alternative splice forms in cultured cells. The amount of B144 mRNA in some cells is quite substantial and it would be surprising if the protein were not formed in significant amounts, although there are presently no publications characterizing the putative protein.
1C7 (D6S2570E). A second gene, 1C7, that is preferentially expressed in monocytes and certain other hematopoietic cell lines, lies immediately adjacent to B144,68 such that the 3′ ends of the two mRNA templates come within a few bases of overlapping. The gene encodes at least three differentially spliced mRNAs. Each mRNA encodes a potential leader sequence and an internal membrane anchor segment, but the three forms differ in the length and sequence of the carboxy-terminal intracytoplasmic domain. The encoded product has weak similarity to surface antigens such as CTLA4.
AIF-1. Recently another MHC-encoded monocyte and lymphocyte-specific cDNA, called AIF-1, was identified because of its elevated level of expression in experimental cardiac transplants.69-72 Separately the gene had been characterized and termed G1.73 The gene for this transcript lies shortly centromeric of B144. It is highly conserved between rodents and humans. The mRNA is upregulated by γ-interferon and is expressed by a subset of macrophages infiltrating allografts. The putative amino acid translation lacks a hydrophobic leader sequence. However, it has strong homology to the EF hand protein motif found in calcium binding proteins involved in neutrophil activation74 and also in a myeloid chemoattractant protein.75 Not all cytokine genes encode leader sequences and it remains possible that the AIF-1 protein may be secreted or released from injured cells. The function of this gene remains unknown.
IκB-like (NFKBIL1). In the same region of the MHC just telomeric to the TNF cluster is a gene for a protein that contains certain motifs similar to those seen in IκB, called IκB-like (IkBL).76 The role of IκB in negative regulation of NF-κB and hence the inflammatory response is a well-known paradigm in transcriptional control, involving regulation both by phosphorylation and by ubiquitination and protein degradation. By analogy, the MHC-encoded IkBL molecule is a candidate for a regulator of inflammatory responses, although it is described as being widely expressed. Rather surprisingly, there are no functional data available for this molecule.
SKI2W. Centromeric of the Hsp70 genes43 lies SKI2W,77-79 a gene encoding a protein with substantial similarity to the yeast Ski2 (superkiller 2) protein,80 including its RNA helicase motifs. Ski2 makes yeast cells resistant to killer toxin RNA viruses by virtue of its ability to block the translation of nonpolyadenylated or noncapped viral mRNAs targeted for degradation.81,82 This gene plays a partially redundant but essential role in normal yeast growth83 and has an antiviral effect as well. Because of the homologies with the yeast gene it remains possible that one role for the human gene is as a mediator of antiviral effects, particularly against RNA viruses that use nonpolyadenylated RNA during their life cycle.
MIC family. The MIC (MHC class I related) family84,85 is a group of homologous genes and pseudogenes interspersed in the distal class III and class I region of the MHC. The structure of the genes and of their predicted coding products resembles that of class I genes, but they are clearly a divergent family. They are also characterized by very large first introns. The only members of the group known to be expressed as mRNA are MIC-A and MIC-B, whose genes lie telomeric to the IkBL gene and centromeric to HLA-B. The MIC-A protein is expressed and properly folded independently of β2-microglobulin and is preferentially expressed in the intestinal mucosa.86 Curiously, the transcription of MIC-A is upregulated by heat shock. Thus, the gene shares properties with the class I genes per se but has at least one functional feature that would relate it to the heat shock genes located more centromerically.
Overall then there are, in the central MHC between the complement region (class III) and the class I region, at least seven genes implicated to some degree in inflammatory responses. There is also an eighth gene, 1C7, that has not yet been studied functionally but whose pattern of expression is consistent with a similar role. In addition, the IkBL molecule near one end of this cluster is at least a potential candidate for a role in inflammation. The BAT1 gene (D6S81E), also in this region, encodes an RNA helicase that conceivably could play a role in the complex splicing of B144 as well as other genes.87 At the centromeric end of this region are a number of genes of unknown function and broad expression, and also the genes for three heat shock proteins88 whose expression may be elevated in inflammation. Heat shock proteins have also been implicated in mediating antiviral effects of prostaglandins.89 Overall we suggest that this segment of the MHC be given a separate designation as the class IV region with recognition of its multiple potential roles in inflammation and stress responses.
ADDITIONAL IMMUNE-RELATED GENES ARE DISTRIBUTED THROUGH THE ENTIRE MHC
The genes mentioned in the above paragraphs and the cited reviews encode a substantial number of structurally remotely related or totally unrelated proteins that are expressed specifically within or function by interacting with cells of the immune system. Gene linkage as a consequence of duplication and divergence does not seem an adequate explanation for this clustering. Ascertainment bias is a conceivable explanation because of the intensity with which the immune system and this gene cluster have been investigated. This explanation seems progressively less likely as more immune-system genes of divergent structure are discovered within the cluster. The recent finding in the distal MHC of an additional three or four genes of different families that are expressed selectively in immunocytes seems to make the coincidence argument almost untenable. Further, these genes are of unknown function and, as of yet, of an unknown degree of polymorphism, so their potential contribution to association of MHC haplotypes with propensity for autoimmune disorders is of some interest. The recently detected immune system genes of the distal MHC include the following.
A ubiquitin-like gene. Ubiquitin is a 76-amino acid peptide whose sequence is remarkably conserved among eukaryotes. It is expressed in almost all cells of the body and its generic function is to become attached to other proteins and target their degradation by the proteasome system. Ubiquitin plays a role in a wide variety of cellular processes including chromatin structure, cell-cycle control, modification of cell receptors, DNA repair processes, gene silencing, and the activation and degradation of NF-κB.90 However, it is curious that ubiquitin was initially studied as a peptide that might have transcellular effects in modulating the activity of immunocytes.
Several genes encoding ubiquitin homologues have been discovered including at least two non-MHC genes that are implicated in the function of the immune system. One of these, ISG15,91-94 encodes a protein homologous to a ubiquitin dimer that is not expressed in unstimulated cells, but is induced by α- or β-IFN in a variety of cell types. After interferon treatment newly synthesized ISG15 is found intracellularly. Pulse-chase experiments show that after 24 hours, more than 50% of labeled ISG15 is found extracellularly. The question of whether this protein has a transcellular effect in vivo remains unaddressed.95
The primary product of the ISG15 gene contains an extension of eight amino acids beyond the C-terminal diglycine that is characteristic of ubiquitin.96 This C-terminal extension is rapidly removed intracellularly. The primary translation product has no demonstrated effect on cells. In contrast, the processed product is active at the low nanogram per milliliter level in inducing γ-IFN production by T cells.97 This peptide also induces a remarkable proliferation and activation of natural killer (NK) cells in B-cell–depleted mononuclear cell populations.98 The effect on NK cells requires the presence of T cells but cannot be accounted for by γ-IFN itself.98
Nakamura et al99 have been investigating a ubiquitin homologue that is synthesized by T cells and acts as a non-antigen-specific suppressor factor of B cells. Isolation and sequencing of the protein, known as MNSF β (monoclonal nonspecific suppressor factor), followed by cloning of its cDNA showed that this protein was a ubiquitin homologue despite its apparently larger molecular weight when isolated from cells.100
The distal MHC encodes a molecule, known as FAT10, that is homologous to ubiquitin but as divergent from ISG15 or MNSF β as either of the two are from ubiquitin. The homology to ubiquitin extends to preservation of the C terminal diglycine that appears necessary for transfer of ubiquitin to other molecules as well as the internal lysines and flanking sequence on which ubiquitinylation occurs. The molecule is strongly expressed at the RNA level by some B-lymphoblastoid lines and γ-IFN inducible in others. In vivo it is found in thymus and to a lesser degree in spleen. It is not expressed in Jurkat T cells or a variety of other cell lines. A mouse homologue exists with over 50% conservation of the amino acid sequence. Experimental investigations of the function of FAT10 are just beginning (unpublished data).
A transcription factor–like gene family. The distal MHC encodes a family of at least five genes that share a C-terminal homology domain with butyrophilin,6,101 dubbed the “CTB box,”102 that extends about 100 amino acids in length.103 Butyrophilin is a protein that coats fat globules in milk and is itself encoded in the distal MHC. The function of this conserved CTB box domain is unknown, although the degree of conservation suggests that it may be a motif for interaction with a second protein or family of proteins. The proteins of this family that are encoded by genes strictly within the MHC have somewhat divergent N-terminal structures. One generally expressed gene, ZNF173 or the “Acid Finger Protein,”102 has several earmarks of a transcription factor including paired zinc fingers and an acidic domain. Two other members of this family, ZNF178 and 960-205, encode proteins resembling ZNF173 but lacking the acidic domain that could potentially act as a transcriptional activator. One of these, 960-205, has a limited pattern of expression, including undefined cell types within the spleen (unpublished data, August 1995).
TSSP. As discussed below, the telomeric boundary of the MHC has recently been extended to include the region around the hemochromatosis locus. Between the olfactory receptor cluster and the hemochromatosis locus (Fig 1) is the gene for yet another “orphan” protein of the immune system whose mRNA so far has been found only in the thymus (unpublished data, July 1996). It is aptly called thymus specific serine protease (TSSP). The predicted protein has a hydrophobic leader sequence, suggesting that it is either membrane bound or secreted. Because it lacks a clear internal membrane anchor sequence the latter is more likely, and it will be of some interest to immunologists to learn more of the function of this gene.
FAT 9. Recently an RNA transcript encoded by the FAT 9 gene104 and containing sequences of one of the human MHC olfactory receptor–like genes was detected in spleen mRNA. Closer examination showed that the olfactory receptor antisense sequence was included in the probable 3′ untranslated region of the novel FAT 9 transcript (unpublished data, June 1995). This transcript putatively encodes a protein of about 130 amino acids. On Northern blots it is found predominantly in lymph nodes as compared with spleen, thymus, bone marrow, or a variety of other tissues and organs. The encoded protein does not have obvious homologies to any proteins in the database. Therefore, FAT 9 seems to represent yet another example of an MHC gene that functions specifically in the immune system.
Other genes within the MHC may be involved specifically in immune processes. The homology between the murine class II region and the human class II region is striking, except that there is an insertion of about 60 kb in the murine class II region, containing the H2K genes.46 The murine KE5 gene lies centromeric to H2K and has been reported to be expressed only in lymphatic tissues.105 However, because KE5 lies within a collagen gene, its functional significance is unclear. The recently described HCGVIII transcript4 is reported to be expressed in intestine and in lymphoid tissues, so that it is yet another candidate for a gene with specific functions in the immune system.
In addition to these several genes whose expression is limited to immunocytes, a number of other more broadly expressed genes of the MHC have the potential to have more or less specific roles in the immune system. One example is PBX2 (also known as G17), located approximately 250 kb telomeric to HLA-DRa.88,106 Although there are no functional data available for PBX2, a non-MHC PBX gene has been implicated in human leukemia presumably because of formation of an oncogenic fusion product via a chromosome 1:19 translocation.107 Further experiments suggest that PBX genes can modify the specificity of clustered homeobox genes by forming heterodimers with them that directly bind DNA. The clustered homeobox genes are expressed in different patterns in various stages and lineages of the hematopoietic system, and may modulate the expansion of various hematopoietic precursors.108
In view of the concentration of immune-related genes it now appears most unlikely that the clustering of immune functions within the MHC is a coincidence. As partly discussed above, there is a suggestion that there may be regional subspecialization of immune functions within the MHC beyond that simply explained by local gene duplication. This description is strikingly apt for the well-studied antigen presentation functions in the proximal MHC, but could also apply to the cluster of complement components, and the cluster of inflammation and stress-response genes.
A large fraction of the immune system–related genes of the MHC are γ-IFN responsive. Although any proposals concerning reasons for this clustering are quite speculative, the question arises as to whether clustering may be promoted by long-range gene control, perhaps at the level of chromatin organization and DNA looping.
An alternative proposal for the cause of the clustering of these diverse genes of the immune system is that particular allelic forms at one locus create proteins that interact preferentially with the products of one or another allelic form at a second locus.109 The two alleles would therefore be preferentially co-inherited. This hypothesis in its simplest form has the difficulty that there is no evidence for protein-protein interactions between the products of several of the relevant immune-system genes.
Consideration of the population genetics of the MHC suggests an alternative explanation for the clustering of these genes. Class I and II alleles in general show a high rate of accumulation of various forms of mutation, at least in the peptide-binding regions.45 As has been extensively discussed by others, class I37,41 and class II alleles are also subject to gene conversion–like events110-112 as well as recombination and point mutations. This high degree of polymorphism is believed to be driven by selection for particular alleles in response to rapidly evolving parasites, or by kinship recognition (see below). Certain of these allelic variants influence the predisposition toward various diseases, including common autoimmune disorders. It is possible that, as a particular haplotype expanded in a population, other genes were also selected whose alleles act either to protect against the environmental stress or to modify some of the potential adverse effects of the particular class I and class II alleles. For example, it is conceivable that a class II allele associated with early onset of a severe autoimmune disease might become associated with alleles of linked genes that themselves have a protective effect against that particular autoimmune disorder. The consequences could even include a paradoxical effect in which alleles protective against the development of a disease would be found at a higher incidence in affected individuals than in controls.
SEX AND THE SINGLE MHC
Mating preferences and the MHC. The large number of alleles for the peptide-binding regions of classical MHC surface antigen genes is commonly attributed to selection for immune resistance to parasites.1 However, kin recognition43 with potential effects on mating preferences, relative fertility, and frequency of abortions could also favor a high level of heterozygosity and the expansion of rare alleles in the odorant-determining genes of the MHC. If the peptide-binding regions of the class I and class II antigens themselves are, by any mechanism, the odorant-determining molecules, then the above effects could contribute to the selection for diversity of these regions. In this regard, some portions of the olfactory receptor genes show a relatively high level of nonsynonymous substitutions compared with synonymous nucleotide substitutions,113,114 reminiscent of the similar phenomenon in the coding regions for the peptide-binding segments of the MHC class I and class II genes, Ig-related genes,115 and in flowering plant self-incompatibility loci.116,117
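The comparison of nonsynonymous with synonymous substitutions can be illustrated with a minimal sketch. The function below simply classifies each differing codon between two aligned coding sequences by whether the encoded amino acid changes, using the standard genetic code. The function name and the example sequences are hypothetical, and this raw count is not the normalized per-site ratio used in the cited studies, which also requires counting the available synonymous and nonsynonymous sites.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_differences(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) counts of differing codons.

    Both inputs are assumed to be aligned, in-frame, ungapped DNA
    coding sequences of equal length (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be equal length and a multiple of 3")
    synonymous = 0
    nonsynonymous = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue  # identical codons carry no substitution
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            synonymous += 1  # nucleotide change, same amino acid
        else:
            nonsynonymous += 1  # nucleotide change alters the amino acid
    return synonymous, nonsynonymous


# Hypothetical example: GCT->GCC keeps alanine; AAA->AGA changes Lys to Arg.
print(classify_codon_differences("ATGGCTAAA", "ATGGCCAGA"))
```

An excess of amino acid-altering changes over silent ones in a region, after proper normalization, is the pattern usually read as evidence of selection for diversity.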
Shortly distal to the HLA-F gene the human MHC contains a group of genes that encode 7-transmembrane segment receptors of the olfactory receptor family104 (Fig 1). A role for these receptors in olfaction or their specific expression in olfactory receptor epithelium has not yet been directly shown, although their similarity to genes for receptors known to be expressed in olfactory tissue makes this a plausible suggestion. Physiologically these genes are of some curiosity because of reports that urine odorants determine several types of reproductive behavior including fetal retention and mating type preference in the mouse and that the differences in odor are determined by genes linked to the MHC.
Studies with mice that are congenic except for the MHC suggest that 50% of the total difference in odorant stimulus between strains of mice may be caused by genes in the MHC, and that mice can distinguish the odor of urine from two strains that differ only in a mutation in a single class I gene.118 Humans can also distinguish urine odors from mice of different MHC haplotypes.119 Cross-breeding studies in mice suggest that the preference of mice for certain MHC-determined odors may be learned rather than irreversibly genetically determined.120 It is not yet known if the mouse MHC also has olfactory receptor genes. It has been claimed that in humans there is evidence for preference for particular MHC-determined variations in sweat odors121 and that this preference may change with the hormonal status of the individual. However, there are no clear data in the literature showing an MHC-determined mating type preference in humans,122 and some caution has been raised about the precise interpretation of some of the mouse experiments.123 It is also curious that a subset of olfactory receptor–like genes is expressed in sperm of a number of species.124 The occurrence of olfactory receptors within the MHC may favor the selection of certain receptor alleles matched to a subset of MHC-determined odorants, such that those haplotypes that carry certain matched alleles would have a selective advantage in the population.
If odorant discrimination is learned, this could have implications for the population genetics of the MHC in species where odorant attraction or repulsion is important. Depending on the manner of psychological imprinting and on the social and population structure of a species, it is imaginable that learned haplotype discrimination could be a stronger driving force for heterozygosity than genetically determined haplotype discrimination. This could be so if an individual might learn to discriminate not only against self-haplotypes but also against haplotypes present in its early environment. Also, selection by kin recognition and selection by evolving parasites are not at all mutually exclusive mechanisms for diversification of the MHC alleles.
Infertility and fetal loss. Recurrent spontaneous abortions and infertility have been reported to be associated with the sharing of HLA antigens between husband and wife, although there are also a number of negative studies125-128 and the subject is somewhat controversial. These studies are difficult because of the outbred nature of the human population and the polymorphism of the MHC. In particular, any possible synergistic or additive effect by more than one locus within the MHC would be difficult to detect in most populations. Recent investigators have suggested that different regions of the MHC may affect different aspects of fertility,125 including the suggestion that homozygosity at DR may affect fertilization and implantation while homozygosity at HLA-B may affect fetal loss.128
A scarcity of homozygotes. A higher-than-expected ratio of heterozygosity to homozygosity at MHC loci has been seen in several different populations129-131 and has been suggested to be a consequence of loss of MHC homozygotes due to their increased susceptibility to infections. Alternatively it may be due to some undefined disadvantage during pregnancy accruing to fetuses that are histocompatible with their mother. However, the reduction in homozygotic individuals has been seen even in developed countries127,128 where the current generation would not be expected to show any significant reduction of homozygotes due to fetal wastage or postnatal morbidity due to infection.
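As a minimal sketch of what a "higher-than-expected" ratio means, the expected proportion of homozygotes at a locus under Hardy-Weinberg assumptions (random mating, no selection) is the sum of the squared allele frequencies, and an observed count can be compared with that expectation. The function names and all numbers below are hypothetical illustrations and are not taken from the cited population studies.

```python
def expected_homozygosity(allele_freqs):
    """Expected homozygote proportion under Hardy-Weinberg: sum of p_i squared."""
    total = sum(allele_freqs)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return sum(p * p for p in allele_freqs)


def observed_to_expected_ratio(observed_homozygotes, sample_size, allele_freqs):
    """Ratio of the observed homozygote count to the Hardy-Weinberg expectation."""
    expected_count = expected_homozygosity(allele_freqs) * sample_size
    return observed_homozygotes / expected_count


# Hypothetical locus with ten equally frequent alleles or haplotypes.
freqs = [0.1] * 10
print(expected_homozygosity(freqs))               # about 0.1
# Hypothetical sample: 300 individuals, 10 of them homozygous.
print(observed_to_expected_ratio(10, 300, freqs))  # about one third
```

A ratio well below 1 in such a comparison is what is meant by a deficit of homozygotes; deciding whether the deficit is statistically meaningful requires a formal test that this sketch does not attempt.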
Studies of both reduced reproductive effectiveness in couples sharing HLA alleles, and studies of homozygosity at single MHC loci would not fully reflect the effects that might be seen if comparisons were made with entire haplotypes. The studies of the South Dakota Hutterite population are valuable in this regard. These individuals are a reproductively isolated group derived from a relatively small number of individuals of North European origin, apparently fewer than 70 founders.132 They also traditionally have large families. In this group the reduction in the number of individuals sharing MHC haplotypes, to about one third of the statistically expected level, is quite remarkable.127 132 This suggests that diversity of MHC haplotypes may be favored for reasons other than parasite resistance, perhaps as a mechanism for avoiding inbred populations.
Studies with a limited population of Amerindians suggest that production of viable offspring discriminates against MHC homozygosity rather than concordance of maternal and fetal haplotypes.133 In one of these studies parents sharing an MHC haplotype produced an excess of offspring that were heterozygous but haploidentical to the mother, with fewer than the expected number of MHC homozygous offspring. This could occur if zygotes homozygous for the MHC region are markedly less viable, for example if there were recessive lethal alleles in the MHC. This effect would extend across a number of haplotypes, requiring the unlikely occurrence of mutually complementing sets of such recessive lethals in each haplotype.133 In view of the absence of any comments to the contrary, those individuals that are born with homozygous MHC haplotypes apparently show no congenital abnormalities to suggest developmental difficulties. In addition, Hedrick134 has presented quantitative arguments against the likelihood of recessive lethal alleles explaining population deficits in MHC homozygotes.
The preferential occurrence of heterozygosity would be compatible, in principle, with kin recognition effects that avoid inbreeding. This could be operative at the level of mating type preference, preferential fertilization of the ovum by sperm of a different MHC haplotype, or decreased survival of a zygote or embryo as a consequence of homozygosity at MHC loci. A speculative explanation consistent with both the Hutterite and Amerindian data is that there might be selective failure of fertilization of ova by haplotype-identical sperm. Presumably, this would have to occur as a consequence of gamete marking accomplished by postmeiosis I gene expression.
Immune attack against fetal cells could be consistent with the mechanisms for self-recognition used by NK cells. Killing by NK cells is inhibited by the presence of a complete set of self class I antigens. Therefore, homozygous fetal cells would be susceptible to killing because they have an incomplete set of maternal self antigens. However, because fetuses that are not histocompatible with mother do well, this cell killing would also have to be inhibited by recognition of foreign antigens on fetal cells. Presumably this recognition would be mediated through the T-cell receptor.
THE HEMOCHROMATOSIS GENE: AN EXAMPLE OF RECOMBINATION SUPPRESSION?
Hereditary hemochromatosis was recognized some 20 years ago to be a recessive disorder linked to the MHC class I region (reviewed in Chu et al135). With a prevalence rate of 3 to 5/1,000 in whites, hemochromatosis is counted among the most common of autosomal-recessive diseases. The underlying pathophysiology was unknown to Von Recklinghausen172 and others who, over a century ago, originally described the triad of diabetes, hepatomegaly, and skin hyperpigmentation. Although we now know that the clinical manifestations are caused by iron overload resulting from malregulated dietary iron absorption, the precise molecular pathway remains unknown. The recent identification of an HLA-like gene and mutation that is frequently associated with the disease,136 as well as the observation that β2-microglobulin–deficient mice develop a similar iron overload syndrome,137 suggests that the immune system is functionally linked to iron metabolism.
Early genetic mapping studies reproducibly and independently showed genetic localization of the hemochromatosis gene to the vicinity of the distal MHC. Studies within pedigrees (linkage) showed that the phenotype segregated with the HLA-A gene.138,139 Studies of large populations (linkage disequilibrium) showed a strong association with the HLA-A3 allele.140-143 Additional studies showed that HLA-A3 is just a single marker of an extended haplotype that spans the distance from HLA-B, through and beyond HLA-A,144 that is almost always associated with the phenotype. In other mapping studies of Mendelian traits such as diastrophic dysplasia, linkage disequilibrium studies were able to pinpoint the gene location to less than 60 kb.145 It was surprising then, given the linkage disequilibrium with HLA-A3, that Feder et al136 found a hemochromatosis gene* (HFE) candidate and mutation 4 megabases (Mb) telomeric of HLA-A, a considerable distance in view of the genetic data.
The HFE gene shares approximately 58% homology with HLA-A2 on an amino acid level, and less than 45% similarity on the nucleotide level. It retains many of the structural hallmarks of MHC class I molecules: signal sequence, α 1 and 2 domains for peptide binding, the Ig-like α 3 domain, a transmembrane region, and a cytoplasmic region. A single mutation is described, Cys282Tyr, that disrupts the disulfide bridge in the α 3 domain, through which class I molecules are posited to associate with β2-microglobulin. Eighty-three percent of unrelated American patients were found to be homozygous for this mutation in the Feder report, with a carrier frequency of 6.4% in normal controls. Presumably, the Cys282Tyr mutation interferes with the association of HFE with β2-microglobulin, which in other class I molecules has been shown to be important for intracellular transport of the protein from the endoplasmic reticulum to the cell surface. The involvement of a class I gene in iron transport is supported by studies showing iron overload in mice that are homozygous for a knockout of β2-microglobulin.137 However, the mechanistic connection between intestinal iron absorption and what is known of the function of the class I genes is obscure. Although extensively searched for, autoimmunity has not been found to be a part of the pathophysiology of hemochromatosis.
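As a rough consistency check on the frequencies quoted above, the carrier frequency can be converted to an allele frequency and an expected homozygote frequency under Hardy-Weinberg proportions. The sketch below is illustrative only: it assumes that the 6.4% carrier frequency refers to heterozygotes, that mating is random, and that there is no selection. The function name is ours, and penetrance and sampling error are not modeled.

```python
import math


def allele_freq_from_carrier_freq(carrier_freq: float) -> float:
    """Solve 2q(1 - q) = carrier_freq for the smaller root q.

    Assumes Hardy-Weinberg proportions (an assumption, not a result
    taken from the studies cited in the text).
    """
    return (1.0 - math.sqrt(1.0 - 2.0 * carrier_freq)) / 2.0


# Carrier (heterozygote) frequency in normal controls, as quoted in the text.
carrier_freq = 0.064

q = allele_freq_from_carrier_freq(carrier_freq)
homozygote_freq = q ** 2  # expected Cys282Tyr homozygotes under Hardy-Weinberg

print(f"allele frequency q = {q:.4f}")
print(f"expected homozygotes per 1,000 = {1000 * homozygote_freq:.2f}")
```

Under these assumptions the expected homozygote frequency is on the order of 1 per 1,000. This is a back-of-the-envelope figure and should not be read as an estimate of disease prevalence.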
Northern blot analysis shows that HFE is a 4-kb mRNA that is weakly expressed in many tissues. The published cDNA is smaller than the mRNA, 2.7 kb, and presumably missing 1.3 kb of the 5′ untranslated region. The Cys282Tyr mutation is the single described mutation consistently associated with disease bearing chromosomes. A second much less common mutation is described as well, which by itself does not appear to increase the risk of disease. Presumably, there are other mutations of HFE that are yet to be discovered, but analysis of this region has so far proven elusive.
Interestingly, Feder et al did not find linkage disequilibrium with 6p21.3 markers in those hemochromatosis chromosomes lacking the Cys282Tyr mutation. The lack of linkage disequilibrium was interpreted as evidence of an additional nonlinked locus causing the same hemochromatosis phenotype. In their follow-up studies from Australia, Jazwinska et al146 recently reported that the severest phenotype was found in patients with the ancestral haplotype, spanning the region from HLA-B through and beyond HLA-F (Fig 1). Because all of their patients were homozygous for the Cys282Tyr mutation, they suggested that there was a second modifier gene close to HFE that varied with the haplotype. In contrast, Feder et al found that in the United States hemochromatosis phenotypes associated with homozygous Cys282Tyr, heterozygous Cys282Tyr, or homozygous non-Cys282Tyr mutations could not be differentiated from each other on clinical grounds. Although the association of the Cys282Tyr mutation with hemochromatosis is a compelling argument for causation, an explanation for the non-Cys282Tyr patients is lacking. It is possible that some cases of non-Cys282Tyr hemochromatosis are caused by alternative mutations of HFE. However, proof of this awaits the cloning of the remaining 1.3 kb of the HFE cDNA and identification of alternative mutations. Curiously, a new syndrome of liver iron overload with normal transferrin saturation was recently described by Pippard147 and Moirand et al.148 This discovery adds further weight to the speculation by Feder et al that mutations of other genes may cause a clinical picture indistinguishable from hemochromatosis.
The hemochromatosis phenotype, Cys282Tyr mutation, and the most common MHC haplotype, A3/B7, are widespread in distribution among white populations, including Great Britain, France, Italy, Germany, Sweden, Norway, Australia, and the United States.135 The high prevalence and distribution strongly suggest that the mutation arose on an ancestral chromosome as an ancient event, perhaps in a Celtic or Scandinavian population. Based on the estimated incidence of hemochromatosis, the high carrier frequency, and the frequency of the Cys282Tyr mutation, it can be estimated that the Cys282Tyr mutation may be present in at least 50 to 100 million individuals. Given a plausible value for the coefficient of selection, this mutation must have occurred much more than 100 generations ago.
The physical distance between HLA-A and the hemochromatosis gene is over 4 megabases. If the rate of recombination in this region were comparable to the average rate over the genome, one would expect that 4% of meioses would result in separation of HLA-A and HFE markers. At this recombination frequency linkage disequilibrium should have dissipated over a relatively short period of time, yet it has not. Possible explanations for preservation of linkage disequilibrium through this 4-Mb region include multiple identical HFE mutations on the same haplotypic background, recent origin for the HFE mutation, recombination suppression, or perhaps simultaneous selection of HLA-A alleles such as HLA-A3 and the hemochromatosis mutation. Of these, the first two explanations are less likely. The possibility that the identical HFE mutation could occur multiple times on the same haplotypic background suggests that there might be an inherently unstable sequence in the HFE gene, such as a trinucleotide repeat that is known to expand over successive generations or a CpG dinucleotide. This is not the case. The HFE mutation that is associated with the most common haplotype, A3/B7, is the G to A transition at nucleotide 845 of the open reading frame that causes the Cys282Tyr missense mutation. Furthermore, our own work and that of others have shown that the mutation is very stable in families, and as yet no examples of spontaneous mutations of this type have been reported.
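The expectation that linkage disequilibrium should dissipate quickly can be made concrete with the standard two-locus result that, in the absence of selection, disequilibrium decays by a factor of (1 - r) per generation, where r is the recombination fraction. The following is a minimal sketch under that textbook model, using r = 0.04 from the paragraph above. The function names are ours, and random mating, a large population, and no selection are assumed.

```python
import math


def ld_fraction_remaining(r: float, generations: int) -> float:
    """Fraction of the initial disequilibrium D left after `generations`.

    Standard neutral two-locus decay: D_t = D_0 * (1 - r) ** t.
    """
    return (1.0 - r) ** generations


def ld_half_life(r: float) -> float:
    """Number of generations for D to fall to half its initial value."""
    return math.log(0.5) / math.log(1.0 - r)


# 4% recombination between HLA-A and HFE, i.e. about 4 Mb at the
# genome-average rate assumed in the text.
r = 0.04

for t in (10, 50, 100, 200):
    print(f"after {t:3d} generations: {ld_fraction_remaining(r, t):.4f} of D remains")

print(f"half-life of D: about {ld_half_life(r):.0f} generations")
```

With these assumptions less than 2% of the initial disequilibrium would survive 100 generations, which is why persistence of the HLA-A3 association calls for one of the explanations listed above.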
Simultaneous selection of alleles at multiple loci and recombination suppression are more difficult explanations to dismiss. It is possible that a certain genetic advantage is offered by a combination of alleles from genes distributed through this region including HLA genes and even perhaps extending to the HFE gene. However, if simultaneous coselection of alleles at the ends of the region were responsible for the maintenance of linkage disequilibrium, then intervening alleles might not have been consistently retained in the haplotype. This is not what is observed among hemochromatosis chromosomes, nor have genes been found in the distal region that might interact with proximal HLA genes, and thereby promote selection.
Linkage disequilibrium for portions of the MHC, particularly the region between HLA-A and HLA-B, suggests that either recombination suppression or coselection of alleles at several loci may be operative for some combinations of alleles.130 The A1/B8 haplotype, “long renowned for its high disequilibrium,”149 occurs in 7% of French families, illustrating the high association that alleles at these two loci may exhibit. The investigators studying this have interpreted these data as suggestive of selection, although the distinction has not been made between coselection operating on several loci and recombination suppression in the region. In contrast, A2 does not show linkage disequilibrium with HLA-B alleles, suggesting that the selection phenomenon or low recombination rate does not apply to all possible haplotypes. Studies of the syntenic region in the mouse suggest that there is significant recombination suppression, and that it is haplotype specific.150
The linkage data distal to HLA-A are somewhat more limited. The Genethon sex-averaged linkage map151 shows no recombinants in the 186 meioses derived from the eight CEPH families that were studied, with markers adjacent to or interspersed between HLA-A and HFE. The major hemochromatosis mutation is in strong linkage disequilibrium with markers near HLA-A. The A3/B7 hemochromatosis haplotype comprises up to 40% of hemochromatosis-bearing chromosomes.141 Compared with genome-wide averages, recombination in this region on chromosomes bearing the hemochromatosis mutation appears to be suppressed. A common extended haplotype has been described in the normal European control populations associated with A1/B8, that extends at least 4 Mb from HLA-B. Recombination suppression or coselection of favorable alleles are both plausible explanations for persistence of linkage disequilibrium in these haplotypes. The three independent histone clusters mapping to this region136 (Fig 1, unpublished data, July 1996) may also be relevant to recombination suppression.
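To gauge how informative the absence of recombinants in 186 meioses is, one can ask how likely that outcome would be at a given recombination fraction. The sketch below treats each meiosis as an independent trial and, as a simplifying assumption that the map data do not guarantee, treats the markers as spanning the full HLA-A to HFE interval. The function names are ours.

```python
def prob_no_recombinants(r: float, meioses: int) -> float:
    """Probability of observing zero recombinants in `meioses` independent meioses."""
    return (1.0 - r) ** meioses


def upper_bound_r(meioses: int, alpha: float = 0.05) -> float:
    """Largest r at which zero recombinants would still occur with probability alpha."""
    return 1.0 - alpha ** (1.0 / meioses)


n_meioses = 186  # from the Genethon/CEPH data cited in the text

# 4% is the recombination fraction expected for ~4 Mb at the genome-average rate.
p_zero = prob_no_recombinants(0.04, n_meioses)
r_max = upper_bound_r(n_meioses)

print(f"P(zero recombinants | r = 0.04) = {p_zero:.5f}")
print(f"one-sided 95% upper bound on r  = {r_max:.4f}")
```

Under these assumptions, zero recombinants would be expected well under 1% of the time if r were 4%, and the data are compatible with a recombination fraction no higher than roughly 1.6%.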
If there is haplotype-specific heterogeneity in the rate of recombination, then it is tempting to consider that it may be modulated by haplotype-specific local sequences. Preferred sites for recombination have been identified in the class II and class III regions, in both the mouse150 and human genome,152 and seem to be a general feature throughout the MHC.153 In some cases the rate of recombination within a relatively short region of less than 2,000 bases may be elevated several orders of magnitude above the average rate for the genome as a whole. Therefore, all the MHC recombinants obtained in laboratory crosses of a particular pair of strains may occur within the “hotspot.”150 The hotspot is thought to be the resolution point where joint molecules between two DNA duplexes become separated. The phenomenon is complex, in that a particular hotspot is operative only in certain haplotypes and genetic backgrounds, and may be sex specific.150 Recombination may152 or may not require perfect homology across the hotspot,154 but differences in hotspot activity may be due to variation in the number of copies of relatively simple repetitive sequences present in tandem arrays at the recombination site.155 Recombination only occurs when the number of copies of at least one set of repeats is the same in both chromosomes. The activity of the hotspot seems to require “instigator” sequences present in some but not all haplotypes and located some distance from the region where the recombination actually occurs.150
Recent work on yeast mating type loci has also shown that chromosomal regions may exhibit regulated activation of recombination, and that the regulation may involve short regions of cis-active DNA sequence together with trans-acting proteins. The area over which recombination is activated may be several orders of magnitude larger than the cis-active DNA sequences that interact with the relevant protein(s). In particular, a 700-bp sequence has been identified that activates recombination along a 90,000-bp region of a yeast chromosome.156 This occurs in cells of only one of the two mating types, perhaps as a consequence of expression of a mating type specific DNA binding protein. The region where the recombination actually occurs is 17,000 bases removed from the 700-bp instigator sequence. Phenomenologically this is very similar to what has been deduced for the MHC, but the detailed mechanisms in both cases remain to be worked out.
ANCIENT DUPLICATIONS
The class II and class III regions of the human and mouse are roughly colinear with a good correspondence between the presence and linear order of specific genes in the two species. Persuasive evidence exists that chromosomes 1q22-23 and 9q34.3 contain multigene segments of DNA that are paralogous to the class II, III, and perhaps IV regions of the MHC on chromosome 6.157 For example, a gene for one of the three members of the RXR steroid receptor family is located just centromeric to the class II region on chromosome 6, and on chromosomes 1 and 9, presumably in paralogous positions. A similar situation occurs with respect to the PBX genes, and to Notch homologue genes, and several other homologues that are shared by at least two of the three chromosomal regions. The chromosome 1 duplication is particularly striking as it falls in the same region where the class I–like CD1 genes lie. Also, OTF3 maps in the MHC between HLA-A and E, and OTF2 maps at an unknown site on chromosome 1. Most recently, a new class I gene called MR1 has been identified on the long arm of chromosome 1,158 although its reported location is somewhat more telomeric than the remainder of the MHC-like genes referred to above. It is not yet known whether MR1 is polymorphic or is expressed at the protein level. Unfortunately, the regions on chromosomes 1 and 9 have not been mapped with sufficient resolution to see how closely the organization of the paralogous regions matches that of either the present MHC or some plausible model for a precursor MHC. Among other things, these observations raise the interesting possibility that additional immune system genes will be located in the paralogous regions on chromosomes 1 and 9, and perhaps 19 where the class I–like zinc-α2-glycoprotein gene is located.
As discussed above, both class II and class I genes show extensive allelic variability due to point mutations, intra-locus recombinations, and intra-locus gene conversion-like events. In addition, the relationship between class I regions of various mammals is complex and multiple cycles of expansion and contraction must have occurred to account for the genetic structure of the region and its inter- and intra-species differences.36,159,160 The human MHC overall contains at least 28 class I–related genes or gene fragments including the 18 to 19 analyzed by Geraghty et al,161 the 5 MIC genes or pseudogenes,84 at least 2 class I segments near MIC-A and B, respectively, a class I–related sequence in the class II region,162 and the hemochromatosis gene. HFE is the only one of these genes that is confidently the ortholog of a specific mouse gene, a reflection of the massive changes created by cycles of expansion and contraction since the divergence of the progenitors of these two mammals.
One striking feature of the divergence between the murine and human class I regions is that it has resulted in the presence of functional subfamilies of class I genes in each species that are missing from the other. Thus, the mouse contains a family of TL genes of largely unknown function29 and also has H2-M genes that have the interesting property of binding specifically to formyl-methionine–containing peptides such as occur at the amino terminus of bacterial and mitochondrial proteins.163 Neither of these types of class I genes has been detected in humans, and both are probably completely missing. Conversely, humans have nonclassical class I heavy-chain genes, including an HLA-G gene164 that is expressed in specific locations such as placental and fetal cells165 where it may serve to prevent NK cell rejection of fetuses166 and also in very early stages of embryogenesis.167 Very recently mouse class I genes have been found that may be preferentially expressed in early development, but it is quite unclear whether they are homologues in either structure, evolutionary relationship, or even function, to the human HLA-G gene. As described above, humans also have a family of five genes or pseudogenes termed MIC85 or PERB168 genes. These encode at least one expressed product, MIC-A protein, which is polymorphic, induced by heat shock, and expressed preferentially in intestinal mucosa, but its precise function is not yet known. Comparable products were undetectable in the mouse.
The MIC genes are clearly structurally related to class I genes both in terms of the structure of the proteins they encode and in the position of seven introns located at homologous positions in the coding sequences. The structure of the MIC genes differs from those of other class I genes in that they contain a long (about 10 kb) first intron. This intron contains the first exon of a transcript derived from the complementary strand. A spliced form of the transcript from one of the MIC pseudogenes has been detected and contains additional exons from DNA upstream of the MIC transcription initiation site.168 This complex of two interspersed genes or pseudogenes occurs twice centromeric of HLA-B and three more times between HLA-E and HLA-F in the distal MHC, accounting at least in part for the observation of repeated blocks of sequence in these regions of the MHC.
The mouse MHC apparently lacks any genes of the MIC series, and the human lacks the extensive series of TL genes of the mouse. The location of genes B30.2 and MOG suggests that either the entire region of the human MHC from HLA-A to HLA-F is compressed to less than 150 kb in the mouse and contains a single H2-M class I gene,169 or that an inversion has occurred with one end point lying between B30.2 and MOG. This is further evidence of the remarkably extensive rearrangements, deletions, and expansions that have occurred in the vertebrate MHC class I region.
The hemochromatosis gene containing region of chromosome 6p is located about 4 Mb telomeric to HLA-F. The hemochromatosis region also appears to have originated by a large duplication, involving histone genes136 and perhaps a RET finger-like gene170 in addition to the class I gene. The exact duplication region has not been defined yet, but it apparently lacks the olfactory receptor genes that are located between the nearest class I gene and the proximal histone gene cluster.
DISCUSSION AND SPECULATION
As indicated in the above paragraphs, the MHC exhibits several singular characteristics, including a clustering of genes of disparate structure that function within the immuno-hematopoietic system, evidence of rapid evolution relative to most of the nonrepetitive sequence portions of the genome, and evidence for variation in the rates of intra-MHC recombination. We speculate that these phenomena may be related in the following way. Class I and class II genes of the MHC, in particular, appear to be subject to selective pressures for diversification that operate on a short time scale relative to most evolutionary events. Environmental stresses may also put selective pressure on alleles of other immune system genes. In addition, the resulting rapid shifts in allele abundance and composition of the class I and class II genes, whether due to parasite-driven selection or to effects of the MHC on procreation, may result in the emergence of autoimmune phenomena putting secondary selective pressure on other genes of the immune system to diminish the incidence or consequences of autoimmunity. Also, the extreme divergence of class I regions of mammals compared with even the class II region raises the question of whether there are other evolutionary pressures on class I genes beyond those arising from the presentation of peptides to the T-cell receptor.
The presence of other immune system genes in the same chromosomal region and therefore in linkage disequilibrium with the classical MHC genes provides the opportunity for simultaneous selection of the optimal alleles of these other genes from among the variety of chromosomes carrying the newly selected class I or II allele. This diminishes the rate of disruption of the favored combinations of alleles that would result from recombination or reassortment of chromosomes in meiosis. If recombination rates were the same in all MHCs then there would be an optimal low rate of recombination sufficient to generate favorable combinations of alleles but low enough to limit the genetic burden of recombinant nonoptimal chromosomal segments.171 However, different MHC regions exhibit heterogeneous rates of recombination, controlled at least in part by sequences within the MHC itself. This permits recombination of alleles so as to generate, among others, favorable combinations in chromosomal segments that may then be relatively locked in by recombination suppression. These chromosomal segments would be advantageous and thus expanded, generating extended haplotypes that show linkage disequilibrium in the population. The same considerations could apply to the occurrence of olfactory receptor genes linked to the MHC. If class I genes in particular are determinants of odor, as reviewed above, then there might be coselection of receptors that are responsive (or nonresponsive) to the class I–determined odors.
SUMMARY
The MHC has long been known to play a major role in the determination of genetic susceptibility to autoimmune disorders, and a large part of this is due to polymorphisms in the class II genes. However, there are at least some cases where the evidence is strong that non-class II genes of the MHC also play a role in the predisposition to autoimmunity. The identification of additional genes in this region that modify the propensity for autoimmune diseases could have important diagnostic and therapeutic implications. In recent years analyses of MHC-encoded transcripts have shown that there are within the MHC more than 10 different structurally diverse families of genes whose patterns of expression indicate a specific role in the immune system. The number of different MHC genes functioning in the immune response and the extensive regions of linkage disequilibrium seen in some instances in the MHC make it particularly difficult to use positional genetic approaches to assign disease contributions to particular gene products. Short of understanding the regulation and function of each protein at a level such that the effect of amino acid or nucleotide sequence variations could be predicted, perhaps the best genetic approach is to look for sense changes in codons or variation in known promoter elements for each gene, and then test in large populations to see if there is any correlation with disease manifestations independent of their disequilibrium with the class II genes. This could be very difficult for a number of reasons, not least of which is the extensive degree of linkage disequilibrium and the consequent occurrence of extended haplotypes in this region.
NOTE ADDED IN PROOF
An increasing amount of genomic sequence data from the MHC is emerging and is available directly from the Sanger Centre (ftp://ftp.sanger.ac.uk/pub/human/chr6/). Tapasin, a transmembrane glycoprotein that mediates the interaction of calreticulin-bound MHC class I molecules with TAP, was recently mapped close to the MHC.173
ACKNOWLEDGMENT
The authors thank Drs Frank Black, Chris Bowlus, Nancy Ruddle, Judith Kidd, Gordon Shepherd, and Ruma Raha-Chowdhury for their helpful comments and corrections of the manuscript. We also thank our colleagues for their helpful discussions and for materials provided: Drs Duncan Campbell, Peter Lengyel, and John Trowsdale.
J.R.G. is supported by the March of Dimes Birth Defects Research Foundation, Clinical Research Grant No. 6-FY96-0272, and by National Institutes of Health (NIH) Grant No. R29 DK45819-05. S.M.W. is supported by NIH Outstanding Investigator Award CA42556-11.
Address reprint requests to Jeffrey R. Gruen, MD, Yale Child Health Research Center, Department of Pediatrics, Yale University School of Medicine, 464 Congress Ave, New Haven, CT 06520-8081.
*The name given by Feder et al was HLA-H. This nomenclature is confusing because the hemochromatosis gene had been dubbed “HFE” years earlier in anticipation of its discovery, and “HLA-H” had already been used several years earlier to designate an MHC class I gene that mapped between HLA-A and HLA-G. We refer to the new gene simply as HFE for the remainder of this review.