• Black South African population groups were found to be a homogeneous group, with matching probabilities similar to those of White patients.

  • The data indicate that Black South African patients will benefit considerably from intrapopulation donor recruitment.

Abstract

More than 41 million potential unrelated donors are currently listed in registries worldwide. However, these donors are not evenly distributed. In particular, individuals from sub-Saharan Africa are underrepresented, complicating unrelated donor searches for patients from this region. Here, we analyzed HLA allele and haplotype frequencies as well as resulting matching probabilities of the 4 South African population groups (Black African, Coloured, Indian/Asian, and White) based on a data set (n = 56 961) of potential stem cell donors registered with DKMS. Our large data set of well-typed and -characterized individuals allowed for unprecedentedly accurate results for these population groups. One major finding was that the haplotypic diversity of Black South African and White individuals was almost identical. The cumulative frequencies of the 50 most frequent 5-locus (HLA-A, -B, -C, -DRB1, and -DQB1) haplotypes were 38.9% for the Black African population and 38.5% for the White population. Indian/Asian (27.5%) and especially Coloured populations (20.8%) had lower cumulative frequencies, indicating higher intrapopulation diversity. Consequently, when donors and patients from the same population group were considered, matching probabilities of Black African and White populations were approximately the same and significantly greater than those of Indian/Asian and particularly of Coloured populations. With 1 000 000 registered Black African donors, the probability for Black African patients to find at least 1 fully matched donor would be 80%. Donor recruitment in the Black African population is therefore very promising. For Indian/Asian and Coloured populations, there is at least a good chance of finding a suitable mismatched donor; for example, 51% (92%) for 9/10 (8/10) matched donors at a registry size of 100 000 for the Coloured population.

Allogeneic hematopoietic stem cell transplantation is the only available cure for many hematological diseases. Although many patients have suitable donors in the family, either HLA-identical or haploidentical, an unrelated donor is nevertheless often the donor of choice.1 According to the World Marrow Donor Association, >41 million potential unrelated donors are currently listed in registries in 57 different countries worldwide.2 However, these donors are not evenly distributed around the world; in fact, there is a disproportionally high number of registered donors of European descent.3 

The HLA complex is located on the short arm of chromosome 6, in the most polymorphic region of the human genome.4,5 More than 41 000 HLA alleles are described in the Immuno Polymorphism Database-international ImMunoGeneTics project/Human Leukocyte Antigen (IPD-IMGT/HLA) database (version 3.59.0).6 HLA genes are inherited as haplotypes with closely linked alleles.7 Because HLA allele frequencies (AFs) and haplotype frequencies (HFs) are highly population specific,8 patients have the best chance of finding a matching donor within their own population group.

DKMS is a large stem cell donor registry that currently administers 12.7 million potential donors in 7 countries including South Africa. Donors registered with DKMS have donated stem cells >123 000 times so far.9 

People from Africa are underrepresented in registries worldwide.10,11 Furthermore, it has been indicated that the Black African population shows less single-allele dominance and higher genetic diversity than individuals of European descent.12-15 As a result, the chances of finding a suitable unrelated stem cell donor for patients of African origin are generally regarded to be relatively poor. For African American individuals, this has been confirmed in comparison to other population groups.16 To increase the representation of people from Africa in the global stem cell donor pool, DKMS has started donor recruitment activities in South Africa. By January 2025, more than 45 000 Black African individuals had been registered.

South Africa is of particular interest in the context of hominization, because several fossil finds date back up to 4 million years.17 Approximately 20 000 years ago, the San lived in South Africa as hunters and gatherers.18 Approximately 2500 years ago, some San became pastoralists and separated from their hunter-gatherer ancestors, becoming the Khoikhoi.19,20 These stock farmers eventually inhabited most of the fertile land, with the San being more and more pushed back into drier areas or the mountain range. Bantu tribes started migrating from Central Africa to the South ∼2000 years ago, settling in the eastern part of South Africa, leaving the drier western part to the Khoikhoi and San.21 The first European colony was founded in 1652 by the Dutch.19,22 They soon started their own farmlands and brought enslaved people from other colonies, mainly India and Indonesia, to South Africa.19 The first British settlement was founded in 1820. Due to the British rule over the Cape, the more independently living Dutch settlers abandoned their farms to seek new lands further north.23,24 In the 1860s, the British started to bring Indian workers, mainly from Tamil Nadu, to South Africa to work on sugar cane plantations.19,25 In the 1870s, traders from India, especially Gujarat, began following the indentured workers to South Africa seeking new business opportunities.26,27 Furthermore, in the early 20th century, Chinese workers were recruited to work in gold mines.19 During the apartheid era (1948-1990), people were systematically segregated according to their appearance, and existing discrimination against non-White populations was intensified.28

Today, South Africa is a diverse country, mainly inhabited by 4 population groups: Black African (81.4%), Coloured (8.2%), Indian/Asian (2.7%), and White (7.3%).29,30 The Coloured and Indian/Asian populations represent a heterogeneous group that, for decades, was considered separate from other South African population groups in terms of their “nonbelonging,” not only terminologically. This, in turn, created a separate identity and sense of belonging within these groups, as a result of which the designations have persisted.30 Genetic analyses of the South African Coloured population revealed that their ancestors were predominantly Khoikhoi and San (Khoisan), Bantu-speaking Africans, Europeans, and a small Asian proportion, demonstrating the mixed heritage of this population group.31

In this study, we analyzed AFs and HFs of registered stem cell donors of these 4 population groups. Apart from providing information on population-specific HLA frequency distributions in South Africa with unprecedented accuracy, the study aimed to assess the potential impact of stem cell donor recruitment efforts in South Africa.

Cohort

During registration as a potential stem cell donor, volunteers were asked to provide information about their own population group, ZIP code, and native language, as well as their parents’ population group and region of origin. Donors provided informed consent for their data to be used anonymously for research purposes. In total, 87 062 donors registered between March 2019 and March 2024. In this study, we considered all donors who were HLA typed at DKMS Life Science Lab (Dresden, Germany), had no new alleles, provided complete information including a South African ZIP code, and had parents who originated from South Africa and belonged to the donor’s population group. Based on these criteria, 56 961 individuals (21 168 Black African [15 482 female and 5686 male; median age, 23 years], 5196 Coloured [3880 female and 1316 male; median age, 31 years], 5244 Indian/Asian [3516 female and 1728 male; median age, 35 years], and 25 353 White individuals [20 146 female and 5207 male; median age, 36 years]) were included in the analysis (supplemental Figure 1; refer to supplemental Figure 2 for a map of South Africa showing the distribution and ethnic composition across the individual provinces). Furthermore, we defined subpopulations based on population group, native language, and province. Fifteen subpopulations with >1000 individuals were identified (Table 1).

Table 1.

Subpopulations with 1000 individuals or more

| Population group | Language | Province | Sample size |
| --- | --- | --- | --- |
| Black African | isiXhosa | Western Cape | 3348 |
| Black African | isiZulu | Gauteng | 1910 |
| Black African | isiZulu | KwaZulu Natal | 4554 |
| Black African | Sepedi | Gauteng | 1001 |
| Coloured | Afrikaans | Western Cape | 1005 |
| Coloured | English | Western Cape | 2697 |
| Indian/Asian | English | Gauteng | 1411 |
| Indian/Asian | English | KwaZulu Natal | 3441 |
| White | Afrikaans | Gauteng | 5577 |
| White | Afrikaans | Mpumalanga | 1534 |
| White | Afrikaans | Western Cape | 2920 |
| White | English | Gauteng | 5026 |
| White | English | Eastern Cape | 1005 |
| White | English | KwaZulu Natal | 2127 |
| White | English | Western Cape | 3153 |

HLA typing

HLA typing at registration was performed at DKMS Life Science Lab, as described previously.32,33 In short, standardized next-generation sequencing was performed on Illumina devices using primers targeting exons 2 and 3 of HLA-A, -B, -C, -DRB1, -DQB1, and -DPB1. We used the “g” grouping of alleles, which combines alleles with identical DNA sequences over exons 2 and 3 (including null alleles) with alleles that show synonymous mutations in this region.34 

AF and HF analysis

Five-locus (HLA-A, -B, -C, -DRB1, and -DQB1) and 6-locus (HLA-DPB1 added) HFs were calculated using the Hapl-o-Mat open-source software,35,36 which applies an expectation-maximization algorithm.37 AFs were derived from the HFs, because direct allele counting was not possible due to some typing ambiguities (in 904/341 766 of the typed loci [0.26%]).
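Deriving AFs from estimated HFs amounts to summing, for each locus, the frequencies of all haplotypes that carry a given allele. The following minimal Python sketch illustrates that marginalization only; it is not the Hapl-o-Mat implementation, and the function name `allele_frequencies` and the toy frequencies are illustrative assumptions rather than study data.

```python
from collections import defaultdict


def allele_frequencies(haplotype_freqs, loci):
    """Marginalize haplotype frequencies to per-locus allele frequencies.

    haplotype_freqs: dict mapping a tuple of alleles (one per locus, in the
    order given by `loci`) to its estimated haplotype frequency.
    Returns a dict: locus -> {allele: frequency}.
    """
    afs = {locus: defaultdict(float) for locus in loci}
    for haplotype, freq in haplotype_freqs.items():
        for locus, allele in zip(loci, haplotype):
            # Every haplotype contributes its full frequency to each allele it carries.
            afs[locus][allele] += freq
    return {locus: dict(table) for locus, table in afs.items()}


# Toy 2-locus example with invented frequencies (hypothetical, not study data).
toy_hfs = {
    ("B*07:02g", "C*07:02g"): 0.5,
    ("B*08:01g", "C*07:01g"): 0.3,
    ("B*08:01g", "C*07:02g"): 0.2,
}
toy_afs = allele_frequencies(toy_hfs, ["B", "C"])
```

Because every haplotype carries exactly 1 allele per locus, the derived AFs of each locus sum to the same total as the HFs.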

LD and HWE

The linkage disequilibrium (LD) coefficient D’ was calculated for each 2-locus allele combination based on the estimated 6-locus HFs.38,39 We used Fisher exact test with Holm-Bonferroni correction for multiple testing to analyze significance and determine P values. We identified all frequent allele pairs (HF ≥ 0.01) with significant LD (P ≤ .05, after Holm-Bonferroni correction) and D’ ≥ 0.9, indicating a strong LD.
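As an illustration of the LD coefficient, the sketch below computes the normalized coefficient D’ for a single allele pair from its 2-locus HF and the 2 AFs. It assumes the commonly used Lewontin normalization; the exact formulation of the cited references may differ in detail, and the function name `d_prime` is hypothetical. The example input combines rounded values reported for the White population (B∗07:02g, 14.5%; C∗07:02g, 15.0%; pair frequency, 0.14), so the result is only indicative.

```python
def d_prime(p_ab, p_a, p_b):
    """Normalized linkage disequilibrium D' for one allele pair.

    p_ab: 2-locus haplotype frequency of the allele pair
    p_a, p_b: frequencies of the 2 alleles at their respective loci
    """
    d = p_ab - p_a * p_b  # raw disequilibrium D
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    # D' is undefined if an allele is fixed or absent; return 0.0 in that case.
    return d / d_max if d_max > 0 else 0.0


# Rounded published values for B*07:02g~C*07:02g in the White population.
example = d_prime(0.14, 0.145, 0.150)
```

With these rounded inputs, D’ lies above the 0.9 threshold used here to indicate strong LD.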

Hardy-Weinberg equilibrium (HWE) analysis was performed on g-level resolution with Arlequin (v 3.5)40 for all 6 loci. Deviations from HWE were evaluated by comparing the observed and expected heterozygosity and by using the effect size statistic Wn to exclude statistically significant deviations from HWE without practical relevance.41 HWE deviations with Wn < 0.1 were considered not relevant.
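The comparison of observed and expected heterozygosity can be sketched as follows. This is a simplified illustration that assumes unambiguous genotypes and the plain estimator 1 − Σp² for the expected heterozygosity; Arlequin may apply a sample size correction, and the effect size statistic Wn is not reproduced here. The function name `heterozygosity` and the toy genotypes are hypothetical.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    Expected heterozygosity under HWE is taken as 1 - sum(p_i ** 2).
    """
    n = len(genotypes)
    observed = sum(1 for a, b in genotypes if a != b) / n
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = 2 * n  # each individual carries 2 alleles
    expected = 1.0 - sum((c / total) ** 2 for c in counts.values())
    return observed, expected


# Toy genotypes (hypothetical, not study data).
toy_genotypes = [
    ("01:01g", "02:01g"),
    ("02:01g", "02:01g"),
    ("01:01g", "03:01g"),
    ("02:01g", "03:01g"),
]
h_obs, h_exp = heterozygosity(toy_genotypes)
```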

MPs

Matching probabilities (MPs) were calculated using 5-locus HFs (excluding HLA-DPB1), as previously described.42 Haplotypes (sorted by descending frequency) were considered up to a cumulated frequency of 99.5% and normalized to 1. MP describes the probability that a patient from a given population will find at least 1 donor of a defined match grade (full match or up to number of mismatches) in a registry of donors of the same (intrapopulation matching) or a defined other population. For mismatch calculations, we also refer to Beatty et al.43 In addition to the 4 South African population groups, we included a sample of African American donors from DKMS USA, as well as 2 samples from donors from DKMS Foundation India from Gujarat and Tamil Nadu. To mitigate sample size effects, random samples of 5000 individuals per population group were selected for MP analyses. To estimate the effect of the random reduction of sample sizes, we took 10 different samples of Black South African donors and determined the corresponding intrapopulation MPs. These showed only little variation; for example, for a registry size of 1 000 000, MP values ranged from 0.795 to 0.804. Therefore, in the following, we conducted analyses for each population only for a reduced sample of size 5000. To make the results more comparable with those of other studies,44 we also calculated MPs for Black African and White populations based on a sample of 20 000 each.
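The truncation step and the intrapopulation MP for a full match can be illustrated with the following sketch. It rests on simplifying assumptions that are not taken from the cited method: HWE is assumed, each unordered haplotype pair is treated as its own genotype (so different haplotype pairs yielding the same unphased genotype are not merged), the haplotype that crosses the 99.5% threshold is retained, and mismatches are not modeled. The function names are hypothetical.

```python
def truncate_and_normalize(hfs, cutoff=0.995):
    """Keep the most frequent haplotypes up to a cumulated frequency of
    `cutoff`, then renormalize the retained frequencies to sum to 1."""
    kept, cumulated = [], 0.0
    for f in sorted(hfs, reverse=True):
        if cumulated >= cutoff:
            break
        kept.append(f)
        cumulated += f
    return [f / cumulated for f in kept]


def matching_probability(hfs, registry_size):
    """Probability that a random patient finds at least 1 fully matched
    donor among `registry_size` donors from the same population."""
    mp = 0.0
    for i, fi in enumerate(hfs):
        for j in range(i, len(hfs)):
            # Genotype frequency under HWE: f_i^2 (homozygous) or 2 f_i f_j.
            g = fi * fi if i == j else 2.0 * fi * hfs[j]
            # The patient has genotype g with probability g; none of the
            # n donors matches with probability (1 - g) ** n.
            mp += g * (1.0 - (1.0 - g) ** registry_size)
    return mp


# Toy haplotype frequencies (hypothetical, not study data).
toy_hfs = truncate_and_normalize([0.5, 0.3, 0.2])
mp_small = matching_probability(toy_hfs, 1)
mp_large = matching_probability(toy_hfs, 1000)
```

The double loop scales quadratically with the number of haplotypes, so this form is suited only to small illustrations, not to samples with many thousands of haplotypes.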

GDs

Locus-specific genetic distances (GDs) between the various populations were calculated as Cavalli-Sforza and Edwards chord distances.45 The overall distance for each population pair was calculated as the Euclidean distance of the locus-specific distances. GDs were visualized by multidimensional scaling (MDS) using R 4.4.0 (“Puppy Cup”).46 For GD calculations, the same samples were used as for the MP calculations. Additionally, we included samples from individuals of Chinese, Dutch, and Indonesian descent from DKMS Germany and British individuals from DKMS UK. Reference samples had a size of 5000, with the exception of donors of Chinese (n = 3578) and Indonesian origin (n = 1021). AFs for GD determination were calculated from the 5-locus HFs (up to a cumulated frequency of 0.995 and then normalized to 1) of the randomly reduced samples.
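The distance calculation can be sketched as below. The chord distance is written in one commonly used form, D = (2/π)·√(2(1 − Σ√(pᵢqᵢ))); other scalings of the Cavalli-Sforza and Edwards distance exist, so the constant factor is an assumption. The function names `chord_distance` and `overall_distance` are hypothetical, and the MDS step is not shown.

```python
import math


def chord_distance(p, q):
    """Cavalli-Sforza and Edwards chord distance for one locus.

    p, q: dicts mapping allele -> frequency in populations 1 and 2.
    Alleles missing from a population are treated as frequency 0.
    """
    alleles = set(p) | set(q)
    cos_theta = sum(math.sqrt(p.get(a, 0.0) * q.get(a, 0.0)) for a in alleles)
    cos_theta = min(cos_theta, 1.0)  # guard against rounding slightly above 1
    return (2.0 / math.pi) * math.sqrt(2.0 * (1.0 - cos_theta))


def overall_distance(locus_distances):
    """Euclidean combination of the locus-specific distances."""
    return math.sqrt(sum(d * d for d in locus_distances))


# Toy allele frequencies (hypothetical, not study data).
pop_1 = {"a": 0.25, "b": 0.75}
pop_2 = {"c": 1.0}
same = chord_distance(pop_1, pop_1)      # identical populations
disjoint = chord_distance(pop_1, pop_2)  # no shared alleles
```

Under this scaling, identical frequency distributions give a distance of 0, and populations without shared alleles give the maximum of (2/π)·√2, approximately 0.90.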

AFs

We determined HLA AFs by population group (Table 2; see supplemental Tables 1-6 for complete allele lists). The allele with the highest frequency in any of the population groups was HLA-DPB1∗04:01g, with a frequency of 42.8% in the White population. When assessing allelic diversity by the cumulative frequency of the 10 most frequent alleles (high cumulative frequencies indicate low diversity), DQB1 was the least diverse locus in all population groups, with cumulative frequencies between 92.3% (Coloured population) and 99.6% (Black African population). HLA-B was the locus with the highest allelic diversity in all population groups except the Black African population, in which HLA-A was more diverse. The cumulative frequency of the 10 most common HLA-B alleles was between 44.4% (Coloured population) and 74.2% (Black African population). The Coloured population showed the greatest allelic diversity for 5 of the 6 loci studied (except HLA-DPB1, for which the Indian/Asian population displayed greater diversity). Black African individuals showed the lowest allelic diversity for HLA-B, -C, -DQB1, and -DPB1, the Indian/Asian population for HLA-DRB1, and the White population for HLA-A. In all loci except HLA-A, Black African donors showed lower allelic diversity than the White population. AFs of the Coloured population indicated the mixed nature of this population group. For example, HLA-B∗58:02g was moderately common in the Coloured population, with a frequency of 5.1%, whereas it was the most frequent HLA-B allele in the Black African population (12.3%) and only rarely occurred in Indian/Asian and White populations (0.2% each).
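The diversity measure used here, the cumulated frequency of the 10 most frequent alleles, is a running sum over the ranked AFs. The short sketch below recomputes it for the HLA-B locus of the Black African population from the rounded frequencies reported in Table 2; the function name `cumulated_frequencies` is hypothetical. Because the published cumulated values were presumably computed from unrounded frequencies, recomputation from rounded values may deviate by 0.1 percentage points for other loci.

```python
from itertools import accumulate


def cumulated_frequencies(freqs_percent, top=10):
    """Running sum (in %) over the `top` most frequent alleles."""
    ranked = sorted(freqs_percent, reverse=True)[:top]
    # Round to 1 decimal to suppress floating-point noise in the sums.
    return [round(c, 1) for c in accumulate(ranked)]


# Rounded HLA-B allele frequencies (%) of the Black African population, Table 2.
hla_b_black_african = [12.3, 9.8, 8.6, 8.4, 8.3, 6.5, 5.7, 5.3, 5.0, 4.3]
cumulated = cumulated_frequencies(hla_b_black_african)
```

The last element of `cumulated` is 74.2, matching the value cited in the text; a higher value indicates lower allelic diversity.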

Table 2.

Frequencies and cumulated frequencies of the 10 most frequent alleles by HLA locus and population group

| Locus | Black African: allele | Black African: frequency, % | Black African: cumulated frequency, % | Coloured: allele | Coloured: frequency, % | Coloured: cumulated frequency, % | Indian/Asian: allele | Indian/Asian: frequency, % | Indian/Asian: cumulated frequency, % | White: allele | White: frequency, % | White: cumulated frequency, % |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| HLA-A | 30:01g | 10.1 | 10.1 | 02:01g | 11.5 | 11.5 | 11:01g | 15.6 | 15.6 | 02:01g | 26.3 | 26.3 |
| | 23:01g | 9.5 | 19.6 | 24:02g | 8.9 | 20.4 | 24:02g | 14.6 | 30.2 | 03:01g | 15.6 | 41.9 |
| | 30:02g | 8.4 | 28.0 | 01:01g | 8.3 | 28.6 | 01:01g | 14.1 | 44.3 | 01:01g | 15.1 | 57.0 |
| | 68:02g | 8.0 | 36.0 | 03:01g | 7.9 | 36.6 | 33:03g | 9.9 | 54.1 | 24:02g | 8.9 | 65.9 |
| | 29:02g | 6.8 | 42.8 | 11:01g | 7.0 | 43.5 | 02:11g | 8.3 | 62.4 | 11:01g | 5.9 | 71.8 |
| | 03:01g | 5.8 | 48.6 | 23:01g | 4.9 | 48.5 | 03:01g | 6.5 | 68.9 | 29:02g | 3.3 | 75.1 |
| | 02:01g | 5.7 | 54.3 | 30:01g | 4.0 | 52.4 | 68:01g | 6.1 | 74.9 | 32:01g | 3.3 | 78.4 |
| | 74:01g | 5.6 | 59.9 | 32:01g | 3.4 | 55.9 | 02:01g | 4.2 | 79.1 | 68:01g | 2.8 | 81.2 |
| | 02:05g | 5.2 | 65.0 | 33:03g | 3.4 | 59.2 | 26:01g | 3.8 | 82.9 | 31:01g | 2.6 | 83.8 |
| | 34:02g | 4.2 | 69.2 | 68:02g | 3.2 | 62.4 | 31:01g | 2.7 | 85.6 | 26:01g | 2.3 | 86.0 |
| HLA-B | 58:02g | 12.3 | 12.3 | 07:02g | 7.0 | 7.0 | 40:06g | 12.5 | 12.5 | 07:02g | 14.5 | 14.5 |
| | 42:01g | 9.8 | 22.1 | 44:03g | 6.4 | 13.4 | 52:01g | 9.0 | 21.5 | 08:01g | 10.7 | 25.2 |
| | 44:03g | 8.6 | 30.7 | 58:02g | 5.1 | 18.5 | 57:01g | 7.3 | 28.8 | 44:02g | 8.5 | 33.7 |
| | 15:03g | 8.4 | 39.1 | 08:01g | 5.1 | 23.6 | 44:03g | 7.1 | 35.9 | 15:01g | 7.6 | 41.3 |
| | 15:10g | 8.3 | 47.4 | 18:01g | 4.5 | 28.1 | 51:01g | 7.0 | 42.9 | 35:01g | 5.9 | 47.3 |
| | 08:01g | 6.5 | 53.9 | 58:01g | 3.8 | 31.8 | 35:03g | 6.2 | 49.1 | 51:01g | 5.0 | 52.2 |
| | 58:01g | 5.7 | 59.6 | 15:03g | 3.5 | 35.3 | 35:01g | 4.4 | 53.5 | 44:03g | 4.6 | 56.8 |
| | 45:01g | 5.3 | 64.9 | 15:10g | 3.1 | 38.4 | 58:01g | 4.1 | 57.6 | 40:01g | 4.5 | 61.2 |
| | 07:02g | 5.0 | 69.9 | 35:01g | 3.1 | 41.5 | 07:02g | 3.3 | 60.9 | 18:01g | 3.7 | 65.0 |
| | 81:01g | 4.3 | 74.2 | 51:01g | 3.0 | 44.4 | 08:01g | 2.9 | 63.8 | 14:02g | 3.7 | 68.7 |
| HLA-C | 06:02g | 16.7 | 16.7 | 06:02g | 13.4 | 13.4 | 06:02g | 12.4 | 12.4 | 07:02g | 15.0 | 15.0 |
| | 07:01g | 13.5 | 30.2 | 04:01g | 12.5 | 25.9 | 15:02g | 11.1 | 23.5 | 07:01g | 14.9 | 29.9 |
| | 17:01g | 12.8 | 43.0 | 07:01g | 12.0 | 37.9 | 04:01g | 10.8 | 34.4 | 04:01g | 10.3 | 40.2 |
| | 04:01g | 11.9 | 54.9 | 07:02g | 9.5 | 47.4 | 07:02g | 10.8 | 45.2 | 06:02g | 8.8 | 49.0 |
| | 02:02g | 10.0 | 65.0 | 02:02g | 6.7 | 54.2 | 07:01g | 10.4 | 55.6 | 05:01g | 7.7 | 56.7 |
| | 16:01g | 6.2 | 71.2 | 03:04g | 4.9 | 59.1 | 12:02g | 10.1 | 65.7 | 03:04g | 7.2 | 63.9 |
| | 07:02g | 6.1 | 77.3 | 17:01g | 4.5 | 63.5 | 14:02g | 4.8 | 70.6 | 03:03g | 6.6 | 70.4 |
| | 03:04g | 5.0 | 82.4 | 07:04g | 3.3 | 66.9 | 03:02g | 4.2 | 74.7 | 02:02g | 5.6 | 76.1 |
| | 18:01g | 4.9 | 87.3 | 16:01g | 3.2 | 70.1 | 12:03g | 3.4 | 78.1 | 12:03g | 4.2 | 80.2 |
| | 08:04 | 3.2 | 90.5 | 15:02g | 2.8 | 72.9 | 01:02g | 3.3 | 81.4 | 08:02g | 4.1 | 84.3 |
| HLA-DRB1 | 11:01g | 12.8 | 12.8 | 07:01g | 11.9 | 11.9 | 07:01g | 18.2 | 18.2 | 15:01g | 14.5 | 14.5 |
| | 03:02g | 12.7 | 25.5 | 15:01g | 8.7 | 20.6 | 15:01g | 13.0 | 31.2 | 03:01g | 11.4 | 25.9 |
| | 13:01g | 12.3 | 37.7 | 03:01g | 8.3 | 28.9 | 15:02g | 11.4 | 42.5 | 07:01g | 10.7 | 36.6 |
| | 15:03g | 9.5 | 47.3 | 13:01g | 6.5 | 35.4 | 14:04g | 8.9 | 51.4 | 04:01g | 9.0 | 45.7 |
| | 03:01g | 7.7 | 55.0 | 11:01g | 6.4 | 41.8 | 13:01g | 6.6 | 58.1 | 01:01g | 8.7 | 54.4 |
| | 07:01g | 7.7 | 62.7 | 04:01g | 6.4 | 48.2 | 03:01g | 6.4 | 64.5 | 13:01g | 7.6 | 61.9 |
| | 13:02g | 6.1 | 68.8 | 15:03g | 5.5 | 53.7 | 04:03g | 6.3 | 70.8 | 11:01g | 5.3 | 67.2 |
| | 01:02g | 4.5 | 73.3 | 12:02g | 5.4 | 59.0 | 11:01g | 5.4 | 76.2 | 13:02g | 5.1 | 72.3 |
| | 11:02g | 4.2 | 77.4 | 13:02g | 4.9 | 63.9 | 12:02g | 2.8 | 79.0 | 08:01g | 3.3 | 75.6 |
| | 12:01g | 4.1 | 81.5 | 15:02g | 4.2 | 68.1 | 13:02g | 2.7 | 81.6 | 14:01g | 2.8 | 78.4 |
| HLA-DQB1 | 06:02g | 22.0 | 22.0 | 02:01g | 17.4 | 17.4 | 06:01g | 20.8 | 20.8 | 03:01g | 18.6 | 18.6 |
| | 02:01g | 16.4 | 38.4 | 03:01g | 17.0 | 34.5 | 02:01g | 16.9 | 37.7 | 02:01g | 18.5 | 37.2 |
| | 04:02g | 15.5 | 53.8 | 06:02g | 15.2 | 49.6 | 03:01g | 12.9 | 50.6 | 06:02g | 14.6 | 51.8 |
| | 03:01g | 15.3 | 69.1 | 05:01g | 10.4 | 60.0 | 05:03g | 11.9 | 62.5 | 05:01g | 10.7 | 62.5 |
| | 05:01g | 11.1 | 80.2 | 03:02g | 8.6 | 68.6 | 03:02g | 8.8 | 71.2 | 03:02g | 9.4 | 71.8 |
| | 06:03g | 7.6 | 87.8 | 04:02g | 5.5 | 74.1 | 03:03g | 8.3 | 79.5 | 06:03g | 7.7 | 79.5 |
| | 06:09g | 5.0 | 92.8 | 06:01g | 5.1 | 79.2 | 06:03g | 6.7 | 86.2 | 03:03g | 5.3 | 84.9 |
| | 03:02g | 4.3 | 97.2 | 06:03g | 4.8 | 84.0 | 05:01g | 4.6 | 90.8 | 06:04g | 3.6 | 88.5 |
| | 06:04g | 1.9 | 99.1 | 05:03g | 4.2 | 88.2 | 05:02g | 2.7 | 93.4 | 04:02g | 3.2 | 91.7 |
| | 03:03g | 0.5 | 99.6 | 03:03g | 4.0 | 92.3 | 06:02g | 1.9 | 95.4 | 05:03g | 3.2 | 94.9 |
| HLA-DPB1 | 04:01g | 34.1 | 34.1 | 01:01g | 33.8 | 33.8 | 04:01g | 21.5 | 21.5 | 04:01g | 42.8 | 42.8 |
| | 02:01g | 23.1 | 57.2 | 04:02g | 21.1 | 54.9 | 04:02g | 15.3 | 36.9 | 02:01g | 13.0 | 55.8 |
| | 13:01g | 7.0 | 64.2 | 02:01g | 11.4 | 66.4 | 01:01g | 15.0 | 51.9 | 04:02g | 11.5 | 67.3 |
| | 04:02g | 6.1 | 70.4 | 13:01g | 6.2 | 72.5 | 02:01g | 13.3 | 65.2 | 03:01g | 9.5 | 76.8 |
| | 14:01g | 5.7 | 76.1 | 18:01g | 5.4 | 77.9 | 03:01g | 6.6 | 71.8 | 01:01g | 5.2 | 82.0 |
| | 03:01g | 5.0 | 81.1 | 04:01g | 4.8 | 82.7 | 13:01g | 6.3 | 78.1 | 10:01g | 2.1 | 84.2 |
| | 26:01g | 5.0 | 86.0 | 03:01g | 4.6 | 87.4 | 05:01g | 3.1 | 81.2 | 05:01g | 2.1 | 86.3 |
| | 09:01g | 4.1 | 90.2 | 11:01g | 2.9 | 90.2 | 14:01g | 2.1 | 83.3 | 06:01g | 1.9 | 88.2 |
| | 01:01g | 3.6 | 93.7 | 17:01g | 1.9 | 92.1 | 18:01g | 2.0 | 85.3 | 11:01g | 1.9 | 90.1 |
| | 17:01g | 1.1 | 94.8 | 34:01g | 1.8 | 93.9 | 11:01g | 1.9 | 87.2 | 13:01g | 1.9 | 92.0 |

HFs

The most common 5-locus haplotype was A∗30:01g∼B∗42:01g∼C∗17:01g∼DRB1∗03:02g∼DQB1∗04:02g (f = 3.7%) for the Black African population, A∗01:01g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g (f = 2.0%) for the Coloured population, A∗01:01g∼B∗57:01g∼C∗06:02g∼DRB1∗07:01g∼DQB1∗03:03g (f = 3.5%) for the Indian/Asian population, and A∗01:01g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g (f = 6.5%) for the White population. The cumulative frequencies of the 50, 100, and 250 most frequent haplotypes were 38.9%, 50.8%, and 66.6% for the Black African population, 20.8%, 29.0%, and 42.4% for the Coloured population, 27.5%, 35.8%, and 49.4% for the Indian/Asian population, and 38.5%, 49.9%, and 66.1% for the White population, respectively (Table 3; see supplemental Table 7 for complete haplotype lists). Because the frequency of the most common haplotype was by far the highest in the White population, the corresponding cumulative HF curve started above the curves for the other population groups (Figure 1). Subsequently, however, the curve for the Black African population group had a steeper slope, corresponding to a slower decline in individual HFs, so that the curve for the Black African population ran above the curve for the White population from haplotype number 44 onward. From there on, the curves followed a similar course, crossing twice, with the difference between the cumulative frequencies of the 2 population groups never exceeding 0.02 when haplotype ranks up to 5000 were considered.

Table 3.

Frequencies and cumulated frequencies of the 10 most frequent haplotypes by population group

| Black African: haplotype | Frequency, % | Cumulated frequency, % | Coloured: haplotype | Frequency, % | Cumulated frequency, % |
| --- | --- | --- | --- | --- | --- |
| A∗30:01g∼B∗42:01g∼C∗17:01g∼DRB1∗03:02g∼DQB1∗04:02g | 3.67 | 3.67 | A∗01:01g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g | 1.96 | 1.96 |
| A∗02:05g∼B∗58:01g∼C∗07:01g∼DRB1∗11:02g∼DQB1∗03:01g | 1.99 | 5.66 | A∗03:01g∼B∗07:02g∼C∗07:02g∼DRB1∗15:01g∼DQB1∗06:02g | 1.29 | 3.26 |
| A∗29:02g∼B∗44:03g∼C∗07:01g∼DRB1∗11:01g∼DQB1∗06:02g | 1.57 | 7.23 | A∗01:01g∼B∗57:01g∼C∗06:02g∼DRB1∗07:01g∼DQB1∗03:03g | 0.99 | 4.25 |
| A∗34:02g∼B∗44:03g∼C∗04:01g∼DRB1∗13:01g∼DQB1∗06:03g | 1.46 | 8.69 | A∗24:07g∼B∗35:05g∼C∗04:01g∼DRB1∗12:02g∼DQB1∗03:01g | 0.98 | 5.23 |
| A∗02:01g∼B∗45:01g∼C∗16:01g∼DRB1∗13:01g∼DQB1∗06:03g | 1.40 | 10.09 | A∗33:03g∼B∗44:03g∼C∗07:01g∼DRB1∗07:01g∼DQB1∗02:01g | 0.93 | 6.16 |
| A∗01:01g∼B∗81:01g∼C∗18:01g∼DRB1∗11:01g∼DQB1∗03:01g | 1.35 | 11.44 | A∗30:01g∼B∗42:01g∼C∗17:01g∼DRB1∗03:02g∼DQB1∗04:02g | 0.75 | 6.91 |
| A∗68:02g∼B∗15:10g∼C∗03:04g∼DRB1∗03:01g∼DQB1∗02:01g | 1.29 | 12.73 | A∗02:01g∼B∗07:02g∼C∗07:02g∼DRB1∗15:01g∼DQB1∗06:02g | 0.71 | 7.62 |
| A∗29:02g∼B∗42:01g∼C∗17:01g∼DRB1∗03:02g∼DQB1∗04:02g | 1.21 | 13.94 | A∗43:01g∼B∗15:10g∼C∗04:01g∼DRB1∗04:01g∼DQB1∗03:02g | 0.64 | 8.26 |
| A∗30:02g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g | 1.17 | 15.11 | A∗02:05g∼B∗58:01g∼C∗07:01g∼DRB1∗11:02g∼DQB1∗03:01g | 0.55 | 8.81 |
| A∗24:02g∼B∗07:02g∼C∗07:02g∼DRB1∗15:03g∼DQB1∗06:02g | 1.14 | 16.25 | A∗03:01g∼B∗47:01g∼C∗06:02g∼DRB1∗03:01g∼DQB1∗02:01g | 0.51 | 9.32 |
| Indian/Asian: haplotype | Frequency, % | Cumulated frequency, % | White: haplotype | Frequency, % | Cumulated frequency, % |
| --- | --- | --- | --- | --- | --- |
| A∗01:01g∼B∗57:01g∼C∗06:02g∼DRB1∗07:01g∼DQB1∗03:03g | 3.53 | 3.53 | A∗01:01g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g | 6.46 | 6.46 |
| A∗33:03g∼B∗44:03g∼C∗07:01g∼DRB1∗07:01g∼DQB1∗02:01g | 3.15 | 6.68 | A∗03:01g∼B∗07:02g∼C∗07:02g∼DRB1∗15:01g∼DQB1∗06:02g | 4.04 | 10.50 |
| A∗02:11g∼B∗40:06g∼C∗15:02g∼DRB1∗15:01g∼DQB1∗06:01g | 1.74 | 8.42 | A∗02:01g∼B∗07:02g∼C∗07:02g∼DRB1∗15:01g∼DQB1∗06:02g | 2.04 | 12.55 |
| A∗33:03g∼B∗58:01g∼C∗03:02g∼DRB1∗03:01g∼DQB1∗02:01g | 1.57 | 9.99 | A∗03:01g∼B∗35:01g∼C∗04:01g∼DRB1∗01:01g∼DQB1∗05:01g | 1.51 | 14.05 |
| A∗26:01g∼B∗08:01g∼C∗07:02g∼DRB1∗03:01g∼DQB1∗02:01g | 0.93 | 10.92 | A∗68:02g∼B∗14:02g∼C∗08:02g∼DRB1∗13:03g∼DQB1∗03:01g | 1.12 | 15.17 |
| A∗11:01g∼B∗52:01g∼C∗12:02g∼DRB1∗15:02g∼DQB1∗06:01g | 0.80 | 11.72 | A∗02:01g∼B∗15:01g∼C∗03:04g∼DRB1∗04:01g∼DQB1∗03:02g | 1.11 | 16.28 |
| A∗01:01g∼B∗15:17g∼C∗07:01g∼DRB1∗13:02g∼DQB1∗06:04g | 0.76 | 12.48 | A∗02:01g∼B∗44:02g∼C∗05:01g∼DRB1∗04:01g∼DQB1∗03:01g | 1.09 | 17.37 |
| A∗11:01g∼B∗52:01g∼C∗12:02g∼DRB1∗04:03g∼DQB1∗03:02g | 0.64 | 13.12 | A∗02:01g∼B∗15:01g∼C∗03:03g∼DRB1∗13:01g∼DQB1∗06:03g | 1.07 | 18.44 |
| A∗30:01g∼B∗13:02g∼C∗06:02g∼DRB1∗07:01g∼DQB1∗02:01g | 0.61 | 13.73 | A∗29:02g∼B∗44:03g∼C∗16:01g∼DRB1∗07:01g∼DQB1∗02:01g | 1.06 | 19.49 |
| A∗24:02g∼B∗40:06g∼C∗15:02g∼DRB1∗15:01g∼DQB1∗06:01g | 0.60 | 14.32 | A∗03:01g∼B∗07:02g∼C∗07:02g∼DRB1∗01:01g∼DQB1∗05:01g | 0.99 | 20.49 |

Figure 1.

Cumulated 5-locus HFs of the 250 most frequent haplotypes for 4 South African and African American population groups. Purple represents Black African; green, Coloured; blue, Indian/Asian; orange, White; and yellow, African American populations.


Subpopulation analysis showed that the frequency of the most common haplotype was highest among English-speaking White individuals from KwaZulu Natal, with a frequency of 8.2% (A∗01:01g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g), whereas Afrikaans-speaking Coloured individuals from the Western Cape province had the lowest corresponding frequency at 1.6% (A∗01:01g∼B∗08:01g∼C∗07:01g∼DRB1∗03:01g∼DQB1∗02:01g). Overall, the 2 least diverse subpopulations were Afrikaans-speaking White individuals from Mpumalanga and the isiXhosa-speaking Black African population from the Western Cape province (supplemental Figure 3). Among the White population, the 4 English-speaking subpopulations were more diverse than the 3 Afrikaans-speaking ones (see supplemental Table 8 for HFs of subpopulations).

LD and HWE

For the 4 population groups, LD was analyzed for all 2-locus pairs of the 6-locus haplotypes. In Black African individuals (n = 33 529 allele pairs), we identified 2783 pairs with significant LD (P ≤ .05 after Holm-Bonferroni correction). Of these, 19 pairs showed a 2-locus HF ≥0.01 and an LD coefficient D′ ≥0.9, indicating strong LD. The corresponding values for the Coloured (n = 59 929 allele pairs), Indian/Asian (n = 49 633 allele pairs), and White populations (n = 78 452 allele pairs) were 14 of 743, 21 of 847, and 23 of 2788, respectively. All 77 allele pairs with significant LD, D′ ≥0.9, and HF ≥0.01 were identified in either the B∼C or the DRB1∼DQB1 locus combination. The highest frequencies of these allele pairs, B∗07:02g∼C∗07:02g (f = 0.14) and DQB1∗06:02g∼DRB1∗15:01g (f = 0.14), were identified in the White population, followed by 2 pairs in the Black African population, namely B∗58:02g∼C∗06:02g (f = 0.12) and DQB1∗04:02g∼DRB1∗03:02g (f = 0.13; supplemental Table 9).
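The strong-LD criterion used above (2-locus HF ≥0.01 and D′ ≥0.9) can be illustrated with a minimal Python sketch. It assumes the standard normalized coefficient of Lewontin for a single allele pair; the function names and the input frequencies are hypothetical and are not taken from the study data or from the software the authors used.

```python
def lewontin_d_prime(hap_freq, freq_a, freq_b):
    """Normalized LD coefficient D' for one allele pair.

    hap_freq: 2-locus haplotype frequency of the pair (allele a with allele b)
    freq_a, freq_b: allele frequencies at the two loci
    """
    d = hap_freq - freq_a * freq_b  # raw disequilibrium D
    if d >= 0:
        d_max = min(freq_a * (1 - freq_b), (1 - freq_a) * freq_b)
    else:
        d_max = min(freq_a * freq_b, (1 - freq_a) * (1 - freq_b))
    return d / d_max if d_max > 0 else 0.0


def is_strong_ld(hap_freq, d_prime, min_hf=0.01, min_d_prime=0.9):
    """Apply the two thresholds quoted in the text (significance is tested separately)."""
    return hap_freq >= min_hf and d_prime >= min_d_prime


# Hypothetical frequencies for illustration only.
example_d_prime = lewontin_d_prime(hap_freq=0.14, freq_a=0.15, freq_b=0.16)
example_is_strong = is_strong_ld(0.14, example_d_prime)
```

The sketch covers only the effect-size filter. The significance test and the Holm-Bonferroni correction described in the text are separate steps and are not reproduced here.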

We identified 9 significant deviations from HWE (3 in Black African, 4 in Indian/Asian, and 1 each in Coloured and White individuals; Table 4). However, all deviations showed Wn values clearly below 0.1, indicating HWE deviations without practical relevance. The highest Wn value was 0.013 for HLA-B in the Indian/Asian population.
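As a minimal sketch of the two heterozygosity columns reported for each locus, the following Python snippet computes the observed heterozygosity (share of heterozygous individuals) and the expected heterozygosity under HWE (1 minus the sum of squared allele frequencies). The function name and the toy genotypes are hypothetical. The significance test and the effect-size statistic Wn used in the study are not reproduced here.

```python
from collections import Counter


def heterozygosity(genotypes):
    """Return (observed, expected) heterozygosity for one locus.

    genotypes: list of (allele_1, allele_2) tuples, one per individual.
    """
    n = len(genotypes)
    observed = sum(a != b for a, b in genotypes) / n
    allele_counts = Counter(allele for pair in genotypes for allele in pair)
    total_alleles = 2 * n
    # Expected heterozygosity under HWE: 1 - sum of squared allele frequencies.
    expected = 1.0 - sum((c / total_alleles) ** 2 for c in allele_counts.values())
    return observed, expected


# Toy genotypes for illustration only (not study data).
toy_genotypes = [
    ("A*01:01", "A*02:01"),
    ("A*01:01", "A*01:01"),
    ("A*02:01", "A*03:01"),
    ("A*02:01", "A*02:01"),
]
toy_observed, toy_expected = heterozygosity(toy_genotypes)
```

An observed value below the expected one, as in all rows of Table 4 with a visible difference, corresponds to a heterozygote deficit.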

Table 4.

Statistically significant deviations from HWE

| Population | Locus | Observed heterozygosity | Expected heterozygosity | P value | Wn |
| --- | --- | --- | --- | --- | --- |
| Black African | HLA-A | 0.935 | 0.942 | .002 | 0.004 |
| Black African | HLA-C | 0.892 | 0.898 | .000 | 0.002 |
| Black African | HLA-DQB1 | 0.855 | 0.855 | .050 | 0.000 |
| Coloured | HLA-B | 0.964 | 0.970 | .021 | 0.012 |
| Indian/Asian | HLA-A | 0.889 | 0.905 | .012 | 0.010 |
| Indian/Asian | HLA-B | 0.931 | 0.946 | .000 | 0.013 |
| Indian/Asian | HLA-DRB1 | 0.900 | 0.911 | .003 | 0.009 |
| Indian/Asian | HLA-DPB1 | 0.797 | 0.810 | .003 | 0.007 |
| White | HLA-DRB1 | 0.919 | 0.923 | .042 | 0.001 |

MPs

For sample sizes of 5000, we calculated MPs by donor registry size for both identical (Figure 2) and different (Figure 3) donor and patient populations. In the model with identical donor and patient populations, MP curves generally reflect the cumulated population-specific HFs (Figure 1). Consistently, Black African and White individuals had the highest and very similar MPs. For a registry size of 10 000, for example, we obtained P(Black African) = .21 and P(White) = .25 (Δ = 0.04; supplemental Table 10). With increasing donor registry size, the curve for the Black African population grew more steeply, approaching that for White individuals. At a registry size of 1 000 000, the MP difference was only 0.01 (P[Black African] = .80; P[White] = .81). In line with the cumulative HFs, Indian/Asian, Coloured, and African American individuals had significantly lower MPs than Black African and White individuals.
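How an MP depends on registry size can be sketched in a few lines of Python. The snippet assumes a commonly used model: a patient with genotype g (population frequency f_g) finds at least 1 fully matched donor among N registered donors with probability 1 − (1 − h_g)^N, where h_g is the frequency of g in the donor population, and the MP is the f_g-weighted sum over all genotypes. This formula is an assumption for illustration, because the section does not restate the computation. The function name and all frequencies are hypothetical.

```python
def matching_probability(patient_freqs, donor_freqs, registry_size):
    """Probability that a random patient finds at least 1 fully matched donor.

    patient_freqs, donor_freqs: dicts mapping a genotype label to its
    frequency in the patient and donor population, respectively.
    registry_size: number of registered donors (N).
    """
    return sum(
        f_patient * (1.0 - (1.0 - donor_freqs.get(genotype, 0.0)) ** registry_size)
        for genotype, f_patient in patient_freqs.items()
    )


# Toy genotype frequencies for illustration only (not study data).
toy_population = {"g1": 0.5, "g2": 0.3, "g3": 0.2}
other_population = {"g1": 0.1, "g4": 0.9}

same_pop_n1 = matching_probability(toy_population, toy_population, 1)
same_pop_n100 = matching_probability(toy_population, toy_population, 100)
cross_pop_n100 = matching_probability(toy_population, other_population, 100)
```

In this toy setting, the cross-population MP is capped by the share of patient genotypes that occur in the donor population at all. This mirrors the generally low MPs reported for different donor and patient populations.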

Figure 2.

Full match (10/10) MPs for identical donor and patient populations. Purple represents Black African; green, Coloured; blue, Indian/Asian; orange, White; and yellow, African American populations.

Figure 3.

Full match (10/10) MPs for various combinations of donor and patient populations. (A) Black African patients. (B) Coloured patients. (C) Indian/Asian patients. (D) White patients. (E) African American patients. Purple represents Black African donors; green, Coloured donors; blue, Indian/Asian donors; orange, White donors; and yellow, African American donors.


For different donor and patient populations, MPs were generally low (Figure 3). The largest MP we observed at a registry size of 100 000 in this scenario was 0.19 for Coloured donors and White patients, and the smallest was 0.0002 for Black African donors and Indian/Asian patients. When the Black African population from South Africa and the African American population from the United States were considered, the corresponding MPs were 0.06 for donors from the United States and patients from South Africa and 0.03 vice versa. Because the GD analysis showed close relatedness between the South African Indian/Asian population and the populations of Gujarat and Tamil Nadu (see “GDs”), we determined MPs for donors from these 2 Indian states and South African Indian/Asian patients (supplemental Figure 4). However, the relatively low MP values obtained (P = .16 for Gujarat; P = .20 for Tamil Nadu; both values at registry size n = 100 000; supplemental Table 11) indicated that donor recruitment in India is of limited use for Indian/Asian patients in South Africa. Taken together, these results emphasize the importance of same-population donor recruitment.

The calculated MP increased considerably when matching requirements were relaxed, that is, when mismatches were allowed (Figure 4; supplemental Table 12). At a registry size of 100 000 and identical donor and patient populations, the smallest MPs were .49 for African American patients and .51 for Coloured individuals in the case of 1 permitted mismatch (Figure 4A) and .92 for the same 2 population groups in the case of 2 permitted mismatches (Figure 4B). With 3 or 4 permitted mismatches, all MPs approached 1 at very small registry sizes (Figure 4C-D), thereby indicating easy donor searches for all patients.

Figure 4.

Probabilities for partial matches for identical donor and patient populations. (A) MPs of ≥9/10. (B) MPs of ≥8/10. (C) MPs of ≥7/10. (D) MPs of ≥6/10. Purple represents Black African; green, Coloured; blue, Indian/Asian; orange, White; and yellow, African American populations.


GDs

GD analysis revealed close relatedness for several pairs of South African and reference samples. The Indian/Asian population had small GDs to the 2 Indian populations from Tamil Nadu (d = 0.236) and Gujarat (d = 0.307). Its distances to individuals of Indonesian (d = 0.779) and Chinese origin (d = 0.815) were clearly larger, and even larger than its distance to Coloured individuals from South Africa (d = 0.746). For Black African individuals, the smallest GD was to African American individuals (d = 0.535). White individuals were closely related to individuals from The Netherlands (d = 0.216, the smallest distance obtained in the analysis) and the United Kingdom (d = 0.265; supplemental Table 13). As a result, we obtained 3 clusters in the visualization via MDS (Figure 5): an African cluster (Black South African and African American individuals), an Asian cluster (Indian/Asian individuals from South Africa; reference populations from India [Gujarat and Tamil Nadu], China, and Indonesia), and a European cluster (White individuals from South Africa; reference populations from The Netherlands and the United Kingdom). Coloured individuals were located between these clusters, with similar distances to African American (d = 0.594) and White individuals (d = 0.613). Coloured individuals were more closely related to White individuals than to the other South African population groups (Black African, d = 0.722; Indian/Asian, d = 0.746). The largest GD was observed between Black African individuals and individuals of Chinese origin (d = 1.327). The goodness-of-fit value for MDS was 0.7, indicating that most, but not all, genetic variation is captured in a 2-dimensional representation.
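The GDs above are Cavalli-Sforza and Edwards chord distances. The following Python sketch shows one common single-locus form, the chord length sqrt(2(1 − Σ sqrt(p_i q_i))) between 2 allele frequency distributions. Scaling conventions and the way loci are combined differ between implementations, and the exact variant used in the study is not stated in this section, so the snippet is an illustration under that assumption. The function name and the frequencies are hypothetical.

```python
import math


def chord_distance(p, q):
    """Chord distance between two allele frequency distributions at one locus.

    p, q: dicts mapping allele name to frequency (each summing to 1).
    Returns a value between 0 (identical) and sqrt(2) (no shared alleles).
    """
    alleles = set(p) | set(q)
    cos_theta = sum(math.sqrt(p.get(a, 0.0) * q.get(a, 0.0)) for a in alleles)
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(0.0, 2.0 * (1.0 - cos_theta)))


# Toy allele frequencies for illustration only (not study data).
pop_a = {"a1": 0.5, "a2": 0.3, "a3": 0.2}
pop_b = {"a1": 0.2, "a2": 0.3, "a4": 0.5}
pop_c = {"a5": 1.0}

dist_identical = chord_distance(pop_a, pop_a)
dist_related = chord_distance(pop_a, pop_b)
dist_disjoint = chord_distance(pop_a, pop_c)
```

A matrix of such pairwise distances is the input for MDS, which places the populations in 2 dimensions so that plotted distances approximate the GDs.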

Figure 5.

GDs (Cavalli-Sforza and Edwards chord distances) between 4 South African population groups and 7 reference populations. PC1, principal component 1; PC2, principal component 2.


Discussion

In this study, we analyzed a large HLA data set of 56 961 registered stem cell donors from South Africa. In accordance with the South African census of 2022,29 four population groups were distinguished by self-assessment: Black African, Coloured, Indian/Asian, and White. Using these population groups, we determined AFs, HFs, MPs, and GDs. Previous studies analyzing South African HLA frequencies were limited by small sample sizes, lack of differentiation between population groups, or incomplete data sets.14,47-49 In contrast, our large data set of well-typed and -characterized individuals allowed for comprehensive analyses on the level of population groups and even subgroups defined by language and place of residence.

Generally, our data are in good accordance with previously published results. This is demonstrated, for example, by the most common HLA-A alleles in Black African (HLA-A∗30:01, f = 10.1% [based on n = 21 168 in this study]; f = 10.1% [n = 200]14; f = 13.1% [n = 641]50) and White individuals (HLA-A∗02:01, f = 26.3% [based on n = 25 353 in this study]; f = 26% [n = 102]14; f = 22.4% [n = 380]50). Nevertheless, the nature of our data allowed us to obtain deeper insights than previously possible. Notably, our analyses show that the haplotypic diversity of Black South African and White individuals is approximately the same. To our knowledge, this is a new observation. Focusing only on the few very common haplotypes leads to an incorrect conclusion, namely a supposedly greater diversity among the Black South African population group, as suggested by the course of the cumulated HF curves of Black African and White individuals for very low haplotype ranks (Figure 1). A previous study indicated that African populations have low diversity within themselves, whereas diversity between populations was higher than, for example, between European populations. Thus, the HLA diversity of Black African individuals of a cohort without a defined regional reference will presumably be rather high.51 The haplotypic diversity of a population group is of great practical relevance for stem cell donor registry planning because it determines intrapopulation MPs. Consistently, the intrapopulation MP curves of Black South African and White individuals also show very similar MP values, which are higher than those of the other population groups (Figure 2). Comparing MPs across different studies is challenging because the calculated MP values depend on the underlying sample sizes (larger sample sizes lead to lower MPs42). In a brief MP analysis with sample sizes of 20 000, we previously showed that Black African individuals from South Africa have lower intrapopulation MPs than British people but higher ones than Germans and Poles.9 The data presented here are consistent with these findings. Overall, this is a very encouraging result regarding domestic donor recruitment efforts among the Black South African population, because it will be feasible to find fully matched donors for many Black African patients with realistic numbers of to-be-registered donors.

Black South African individuals and African American individuals from the United States are not very closely related, as shown by GD analysis (d = 0.535, which is only slightly smaller than the GD between South African Coloured and White individuals [d = 0.613]) and MP results (P = .06 at a registry size of 100 000 for African American donors and Black South African patients, and P = .03 for the opposite scenario). From January 2023 to August 2024, a total of 38 stem cell donations were made by DKMS donors from South Africa, 4 of which went to the United States. Of these 4 donations, only 2 were from Black African donors. Given these figures, it is unlikely that donor recruitment in South Africa will make a substantial contribution to the provision of fully matched unrelated stem cell donors for African American patients in the United States.

For Indian/Asian and especially Coloured patients, intrapopulation MPs were considerably lower than for Black African and White individuals. Although MPs with different donor and patient populations were generally low (Figure 3), the situation of Coloured patients was particularly unfavorable. For patients from the other 3 South African population groups, Coloured donors provided the second-highest MPs after their own group (P = .35 for Black African; P = .15 for Indian/Asian; and P = .37 for White patients; all values at registry size, n = 1 000 000) and at a considerable distance from the other population groups in each case (Figure 3A,C-D). However, corresponding MPs of donors from the other population groups for Coloured patients were much lower (P = .08 for Black African; P = .05 for Indian/Asian; and P = .09 for White donors; Figure 3B). These differences result from the particularly high intrapopulation diversity and mixed nature of the Coloured population group.

Our results indicate that it will be challenging to achieve satisfactory MPs for Coloured and Indian/Asian patients in South Africa. Transplants with mismatched unrelated donors may therefore play an important role for these population groups. Recent publications have indicated that in unrelated hematopoietic stem cell transplantation with posttransplantation cyclophosphamide (PTCy)-based graft-versus-host disease prophylaxis, the impact of 1 mismatch may be negligible.52-54 Notably, even >1 HLA mismatch appears acceptable in the context of PTCy-based graft-versus-host disease prophylaxis.55,56 Against this background, Coloured and Indian/Asian patients have a good chance of finding a suitable unrelated donor with the current recruitment strategy.57 Our results show, in good agreement with other data,57 that even with small donor numbers, donors with (multiple) mismatches can easily be found for South African patients of any background/descent (Figure 4).

Our study is subject to several limitations that are typical for analyses based on stem cell donor registry HLA data and have been described previously.7 First, registered stem cell donors do not constitute a random sample of the underlying population with respect to, for example, age, sex, or social status. This is not a major issue for this work, because our conclusions essentially focus on stem cell donation, but must be considered when the AFs and HFs provided here are used for other purposes. Second, the assignment of donors to different population groups is based solely on self-assessment, which may be incorrect in some cases. Third, 176 donors were excluded due to the occurrence of new alleles (supplemental Figure 1), resulting in a minor bias (underestimation of HLA diversity and overestimation of MPs). Fourth, some of the control groups (China, Indonesia, and The Netherlands) consist exclusively of migrants to Germany and their descendants. The resulting bias is difficult to estimate, but we have used this approach before with plausible results.58 Fifth, estimated MP values generally depend on underlying sample sizes. We have addressed this comprehensively in the “Results” and are confident that the results of the work are not substantially affected by this or the other limitations.

In summary, we have estimated HLA AFs and HFs, as well as GDs and MPs, of the 4 South African population groups on the basis of an unprecedentedly large and well-defined data set. Our main finding was that the Black African population from South Africa has a relatively low HLA diversity, similar to that of European populations. Thus, donor recruitment efforts in the Black South African population group are very promising, whereas Coloured and Indian/Asian individuals will often have to rely on mismatched unrelated donors or haploidentical donors.

The authors thank all registered donors.

Contribution: J. Sauter performed data analysis and contributed to manuscript drafting; S.N.B. and A.H.S. contributed to data analysis and drafted the manuscript; X.H., P.M., and K.M. recruited donors; V.L. was responsible for HLA typing; and E.W., T.G., E.N., M.F.-V., and J. Schetelig provided advice and input for the manuscript.

Conflict-of-interest disclosure: The authors declare no competing financial interests.

Correspondence: Stefanie N. Bernas, DKMS Group, Kressbach 1, 72072 Tübingen, Germany; email: bernas@dkms.de.

1. Passweg JR, Baldomero H, Chabannon C, et al. Hematopoietic cell transplantation and cellular therapy survey of the EBMT: monitoring of activities and trends over 30 years. Bone Marrow Transpl. 2021;56(7):1651-1664.
2. WMDA. Total Number of Donors and Cord Blood Units. Accessed 3 February 2025. https://statistics.wmda.info/.
3. Schmidt AH. Unrelated hematopoietic stem cell donor registries: present reality and future prospects. Curr Opin Hematol. 2024;31(6):251-260.
4. Norman PJ, Norberg SJ, Guethlein LA, et al. Sequences of 95 human MHC haplotypes reveal extreme coding variation in genes other than highly polymorphic HLA class I and II. Genome Res. 2017;27(5):813-823.
5. Shiina T, Hosomichi K, Inoko H, Kulski JK. The HLA genomic loci map: expression, interaction, diversity and disease. J Hum Genet. 2009;54(1):15-39.
6. Barker DJ, Maccari G, Georgiou X, et al. The IPD-IMGT/HLA database. Nucleic Acids Res. 2023;51(D1):D1053-D1060.
7. Solloch UV, Giani AS, Pattillo Garnham MI, et al. HLA allele and haplotype frequencies of registered stem cell donors in Chile. Front Immunol. 2023;14:1175135.
8. Solberg OD, Mack SJ, Lancaster AK, et al. Balancing selection and heterogeneity across the classical human leukocyte antigen loci: a meta-analytic review of 497 population studies. Hum Immunol. 2008;69(7):443-464.
9. Schmidt AH, Sauter J, Schetelig J, Neujahr E, Pingel J. Providing hematopoietic stem cell products from unrelated donors to the world: DKMS donor centers and DKMS Registry. Best Pract Res Clin Haematol. 2024;37(1):101541.
10. Barker JN, Boughan K, Dahi PB, et al. Racial disparities in access to HLA-matched unrelated donor transplants: a prospective 1312-patient analysis. Blood Adv. 2019;3(7):939-944.
11. Tozatto-Maio K, Torres MA, Degaide NHS, et al. HLA-matched unrelated donors for patients with sickle cell disease: results of international donor searches. Biol Blood Marrow Transpl. 2020;26(11):2034-2039.
12. Disotell TR. Archaic human genomics. Am J Phys Anthropol. 2012;149(suppl 55):24-39.
13. Nunes K, Aguiar VRC, Silva M, et al. How ancestry influences the chances of finding unrelated donors: an investigation in admixed Brazilians. Front Immunol. 2020;11:584950.
14. Paximadis M, Mathebula TY, Gentle NL, et al. Human leukocyte antigen class I (A, B, C) and II (DRB1) diversity in the Black and Caucasian South African population. Hum Immunol. 2012;73(1):80-92.
15. Prugnolle F, Manica A, Charpentier M, Guégan JF, Guernier V, Balloux F. Pathogen-driven selection and worldwide HLA class I diversity. Curr Biol. 2005;15(11):1022-1027.
16. Gragert L, Eapen M, Williams E, et al. HLA match likelihoods for hematopoietic stem-cell grafts in the U.S. registry. N Engl J Med. 2014;371(4):339-348.
17. Clarke RJ, Kuman K. The skull of StW 573, a 3.67 Ma Australopithecus prometheus skeleton from Sterkfontein Caves, South Africa. J Hum Evol. 2019;134:102634.
18. d'Errico F, Backwell L, Villa P, et al. Early evidence of San material culture represented by organic artifacts from Border Cave, South Africa. Proc Natl Acad Sci U S A. 2012;109(33):13214-13219.
19. Byrnes RM. South Africa: A Country Study. 3rd ed. Federal Research Division, Library of Congress: For sale by the Supt. of Docs., U.S. G.P.O.; 1997.
20. Hausman AJ. The biocultural evolution of Khoisan populations of Southern Africa. Am J Phys Anthropol. 1982;58(3):315-330.
21. de Filippo C, Bostoen K, Stoneking M, Pakendorf B. Bringing together linguistic and genetic evidence to test the Bantu expansion. Proc Biol Sci. 2012;279(1741):3256-3263.
22. Meeuwis M. A Brief History of Dutch in Africa. EuropeNow; 2018. Accessed 24 February 2025. https://www.europenowjournal.org/2018/02/28/a-brief-history-of-dutch-in-africa/.
23. South African History Online. Great Trek 1835-1846. Accessed 21 January 2025. https://www.sahistory.org.za/article/great-trek-1835-1846.
24. Davenport TRH, Saunders C. South Africa: A Modern History. 5th ed. Macmillan Press LTD; 2000.
25. Guilmoto CZ. The Tamil Migration Cycle, 1830-1950. Econ Polit Wkly. 1993;28(3/4):111-120.
26. Vahed G. An ‘Imagined Community’ in diaspora: Gujaratis in South Africa. South Asian Hist Cult. 2010;1(4):615-629.
27. Landy F, Maharaj B, Mainet-Valleix H. Are people of Indian origin (PIO) “Indian”? A case study of South Africa. Geoforum. 2004;35(2):203-215.
28. Beinart W, Dubow S. Introduction: the historiography of segregation and apartheid. Segregation and apartheid in twentieth century South Africa. Routledge; 1995:1-24.
29. Statistics South Africa. Census 2022 - Statistical Release. Accessed 3 February 2025. https://census.statssa.gov.za/assets/documents/2022/P03014_Census_2022_Statistical_Release.pdf.
30. Christopher AJ. ‘To define the indefinable’: population classification and the census in South Africa. Area. 2003;34(4):401-408.
31. de Wit E, Delport W, Rugamika CE, et al. Genome-wide analysis of the structure of the South African Coloured Population in the Western Cape. Hum Genet. 2010;128(2):145-153.
32. Lange V, Böhme I, Hofmann J, et al. Cost-efficient high-throughput HLA typing by MiSeq amplicon sequencing. BMC Genomics. 2014;15:63.
33. Schöfl G, Lang K, Quenzel P, et al. 2.7 million samples genotyped for HLA by next generation sequencing: lessons learned. BMC Genomics. 2017;18(1):161.
34. Schmidt AH, Baier D, Solloch UV, et al. Estimation of high-resolution HLA-A, -B, -C, -DRB1 allele and haplotype frequencies based on 8862 German stem cell donors and implications for strategic donor registry planning. Hum Immunol. 2009;70(11):895-902.
35. Schäfer C, Schmidt AH, Sauter J. Hapl-o-Mat: open-source software for HLA haplotype frequency estimation from ambiguous and heterogeneous data. BMC Bioinformatics. 2017;18(1):284.
36. Solloch UV, Schmidt AH, Sauter J. Graphical user interface for the haplotype frequency estimation software Hapl-o-Mat. Hum Immunol. 2022;83(2):107-112.
37. Excoffier L, Slatkin M. Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population. Mol Biol Evol. 1995;12(5):921-927.
38. Lewontin RC. The interaction of selection and linkage. I. General considerations; heterotic models. Genetics. 1964;49(1):49-67.
39. Nothnagel M, Fürst R, Rohde K. Entropy as a measure for linkage disequilibrium over multilocus haplotype blocks. Hum Hered. 2002;54(4):186-198.
40. Excoffier L, Lischer HE. Arlequin suite ver 3.5: a new series of programs to perform population genetics analyses under Linux and Windows. Mol Ecol Resour. 2010;10(3):564-567.
41. Klitz W, Stephens JC, Grote M, Carrington M. Discordant patterns of linkage disequilibrium of the peptide-transporter loci within the HLA class II region. Am J Hum Genet. 1995;57(6):1436-1444.
42. Schmidt AH, Sauter J, Pingel J, Ehninger G. Toward an optimal global stem cell donor recruitment strategy. PLoS One. 2014;9(1):e86605.
43. Beatty PG, Boucher KM, Mori M, Milford EL. Probability of finding HLA-mismatched related or unrelated marrow or cord blood donors. Hum Immunol. 2000;61(8):834-840.
44. Schmidt AH, Sauter J, Baier DM, et al. Immunogenetics in stem cell donor registry work: the DKMS example (part 1). Int J Immunogenet. 2020;47(1):13-23.
45. Cavalli-Sforza LL, Edwards AW. Phylogenetic analysis. Models and estimation procedures. Am J Hum Genet. 1967;19(3 pt 1):233-257.
46. R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing; 2013. Accessed 5 September 2025. http://www.R-project.org/.
47. Grifoni A, Sidney J, Carpenter C, et al. Sequence-based HLA-A, B, C, DP, DQ, and DR typing of 159 individuals from the Worcester region of the Western Cape province of South Africa. Hum Immunol. 2018;79(3):143-144.
48. Tshabalala M, Ingram C, Schlaphoff T, Borrill V, Christoffels A, Pepper MS. Human leukocyte antigen-A, B, C, DRB1, and DQB1 allele and haplotype frequencies in a subset of 237 donors in the South African Bone Marrow Registry. J Immunol Res. 2018;2018:2031571.
49. Tshabalala M, Mellet J, Vather K, et al. High resolution HLA ∼A, ∼B, ∼C, ∼DRB1, ∼DQA1, and ∼DQB1 diversity in South African populations. Front Genet. 2022;13:711944.
50. Janse van Rensburg WJ, de Kock A, Bester C, Kloppers JF. HLA major allele group frequencies in a diverse population of the Free State Province, South Africa. Heliyon. 2021;7(4):e06850.
51. Cao K, Moormann AM, Lyke KE, et al. Differentiation between African populations is evidenced by the diversity of alleles and haplotypes of HLA class I loci. Tissue Antigens. 2004;63(4):293-325.
52. Arrieta-Bolaños E, Bonneville EF, Crivello P, et al. Human leukocyte antigen mismatching and survival in contemporary hematopoietic cell transplantation for hematologic malignancies. J Clin Oncol. 2024;42(28):3287-3299.
53. Sanz J, Labopin M, Choi G, et al. Younger unrelated donors may be preferable over HLA match in the PTCy era: a study from the ALWP of the EBMT. Blood. 2024;143(24):2534-2543.
54. Shaffer BC, Gooptu M, DeFor TE, et al. Post-transplant cyclophosphamide-based graft-versus-host disease prophylaxis attenuates disparity in outcomes between use of matched or mismatched unrelated donors. J Clin Oncol. 2024;42(28):3277-3286.
55. Al Malki MM, Tsai NC, Palmer J, et al. Posttransplant cyclophosphamide as GVHD prophylaxis for peripheral blood stem cell HLA-mismatched unrelated donor transplant. Blood Adv. 2021;5(12):2650-2659.
56. Shaw BE, Jimenez-Jimenez AM, Burns LJ, et al. Three-year outcomes in recipients of mismatched unrelated bone marrow donor transplants using post-transplantation cyclophosphamide: follow-up from a National Marrow Donor Program-Sponsored Prospective Clinical Trial. Transpl Cell Ther. 2023;29(3):208.e1-208.e6.
57. Chowdhury AS, Maiers M, Spellman SR, Deshpande T, Bolon YT, Devine SM. Existence of HLA-mismatched unrelated donors closes the gap in donor availability regardless of recipient ancestry. Transpl Cell Ther. 2023;29(11):686.e1-686.e8.
58. Sauter J, Putke K, Schefzyk D, et al. HLA-E typing of more than 2.5 million potential hematopoietic stem cell donors: methods and population-specific allele frequencies. Hum Immunol. 2021;82(7):541-547.

Author notes

J. Sauter and S.N.B. contributed equally to this work.

The aggregated and anonymized data underlying the findings described and used to reach the conclusions of the manuscript are provided in this article and the supplemental Material. Further inquiries can be directed to the corresponding author, Stefanie N. Bernas (bernas@dkms.de). Raw data cannot be made publicly available for data protection reasons.

The full-text version of this article contains a data supplement.

Supplemental data