
A Linkage Disequilibrium Map of the Human Major Histocompatibility Complex in Singapore Chinese: Conserved Extended Haplotypes and Ancestral Blocks











CHAPTER 3: RESULTS
3.1 First Generation Linkage Disequilibrium and Haplotype Map of Chromosome 6p and the Major Histocompatibility Complex

To characterize the genetic variation and patterns of linkage disequilibrium (LD) across human chromosome 6p and the MHC, 2 separate genotyping projects were initiated, and from these, 3 distinct SNP maps of increasing marker density were constructed. SNP genotyping for all datasets was performed on the Illumina GoldenGate platform. Each SNP map is summarized in Table 3.1.

The first map surveys the entire chromosome 6p arm at a density of approximately 1 SNP every 100kb; the second places a higher density of SNPs (approximately 1 SNP every 20kb) across a contiguous 7Mb stretch that contains the MHC. Together, these 2 maps provide an overview of the LD distribution across the MHC and allow comparisons with the rest of the chromosome arm. The third SNP map focuses solely on the MHC, at a much greater density (1 SNP every 2.6kb). HLA haplotype and homozygosity data obtained from the first 2 maps guided sample selection for the third map, allowing a detailed analysis of the common and conserved MHC haplotypes present in the Singapore Chinese population.







3.1.1 SNP Set of the First Generation Map
A set of 1152 SNPs was selected and genotyped in 198 Singaporean Chinese individuals. These individuals were randomly selected, unrelated, healthy blood donors from whom appropriate informed consent was obtained. The SNPs were chosen based on genotype data deposited in dbSNP (build 121, Smigielski et al. 2002). SNPs were initially selected to achieve a target density of 1 SNP per 100kb across the entire chromosome 6p arm, and a higher density of 1 SNP per 10kb across the MHC and peri-MHC (physical coordinates 30.0Mb to 37.0Mb).

Table 3.1: Summary of SNP Sets Used in this Study

                          1st Generation Map    2nd Generation Map
Assayed                   1152 SNPs             2360 SNPs
                          198 individuals       282 individuals
Successfully genotyped    1099 SNPs             2290 SNPs
                          192 individuals       276 individuals
Informative SNPs          909                   1877

SNP set                                        Informative SNPs   Chr. 6 coordinates (bp)     SNP density
1. Chr 6p Map (Section 3.1.2)                  615                175,572 to 58,750,370       1 per 95.2kb
2. MHC SNP Map (Section 3.1.3)                 345                30,133,482 to 37,000,199    1 per 19.9kb
3. High-Resolution MHC SNP Map (Section 3.2)   1877               28,970,148 to 33,882,048    1 per 2.6kb

A first generation, low-density SNP map was used to analyse the LD structure of chromosome 6p and the MHC. A high-resolution SNP map was then constructed to analyse the LD of the MHC in greater detail. The corresponding section of this Results chapter is indicated for each SNP map.

To reduce the chance of genotyping uninformative markers, priority was given to SNPs that had been shown to occur at high frequency in East Asian populations.

Of the 1152 SNP genotyping assays attempted, 1099 were successful, a marker success rate of 95.4%. Of the 198 samples genotyped, 6 failed to produce results that passed Illumina's stringent quality checks, possibly because of inadequate DNA quality or quantity, giving a sample success rate of 97.0%. The overall genotype call rate and reproducibility were both greater than 99.9%.

Mis-genotyped and incorrectly mapped SNPs may produce spurious linkage disequilibrium and introduce false haplotype variants that do not exist in nature (Gabriel et al. 2002, Hosking et al. 2004). The 1099 successfully genotyped SNPs were therefore passed through a series of quality control filters to remove possible genotyping errors. First, the flanking sequences of the assay probes were re-aligned to the human genome (NCBI build 36) to confirm that the SNP locations were as planned; 8 loci could not be mapped to chromosome 6p and were eliminated at this stage. Next, to remove non-informative markers and possible genotyping errors, SNPs with a minor allele frequency (MAF) below 5% in the population, or deviating from Hardy-Weinberg equilibrium at a 0.1% significance level, were removed. The 909 SNPs that passed these filters were used to construct the SNP maps.
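The two filters above (MAF below 5%, Hardy-Weinberg deviation at the 0.1% level) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study: it assumes genotypes have already been tallied per SNP into counts of the two homozygotes and the heterozygote, and because the test used for Hardy-Weinberg equilibrium is not stated here, a 1-degree-of-freedom chi-square goodness-of-fit test is assumed (an exact test is a common alternative). The function names and example counts are hypothetical.

```python
import math


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    return min(p, 1.0 - p)


def hwe_chisq_pvalue(n_aa: int, n_ab: int, n_bb: int) -> float:
    """P-value of a 1-df chi-square test for Hardy-Weinberg equilibrium (assumed test)."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(counts, maf_min=0.05, hwe_alpha=0.001) -> bool:
    """Keep a SNP only if MAF >= 5% and HWE is not rejected at the 0.1% level."""
    return (minor_allele_frequency(*counts) >= maf_min
            and hwe_chisq_pvalue(*counts) >= hwe_alpha)


# Hypothetical genotype counts (AA, AB, BB) in 192 individuals
snps = {
    "snp_common": (50, 96, 46),    # near HWE, common: kept
    "snp_rare": (186, 6, 0),       # MAF about 1.6%: removed
    "snp_hwe_fail": (90, 10, 92),  # strong heterozygote deficit: removed
}
kept = [name for name, counts in snps.items() if passes_qc(counts)]
print(kept)
```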

Allele frequencies for the SNPs assayed in these samples were compared with those reported for various populations in dbSNP, the data on which the initial SNP selection relied. As expected, the frequencies from this study had a much higher coefficient of determination with the aggregated East Asian population data (R² = 0.86) than with data from non-Asian populations (R² = 0.35), providing a gauge of the reliability of the genotyping data (Figure 3.1). This also indicates that, in the absence of other information, the allele frequencies reported in dbSNP are sufficient to guide informative marker selection in genotyping studies.


3.1.2 Chromosome 6p SNP LD Map
By design, these 909 SNPs were not uniformly distributed across chromosome 6p; the MHC region and its surroundings were deliberately surveyed more densely. To construct an unbiased linkage disequilibrium map across the chromosome arm, a "picket-fence" approach was used to select genotyped SNPs from the denser 30.0Mb to 37.0Mb segment, giving an approximate density of 1 SNP per 100kb, consistent with the overall SNP density. In all, 615 SNPs were used to construct a linkage disequilibrium map across the entire chromosome 6p, with the first marker at position 175,572bp and the last at position 58,750,370bp of the physical map. The average SNP density is 1 SNP per 95.2kb, with a median distance of 80.0kb between consecutive SNPs (range 27kb to 485kb); 414 consecutive pairs are less than 100kb apart. The average minor allele frequency of the data set is 28%, with an average heterozygosity of 0.37.

Figure 3.1 Comparing Allele Frequencies Between Singapore Chinese and dbSNP
Allele frequencies in the local Chinese population correlate highly with frequencies reported in dbSNP for aggregated East Asian populations (panel A, R² = 0.86), in contrast with non-Asian populations (panel B, R² = 0.35).
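The picket-fence thinning described in Section 3.1.2 is not specified in detail, so the sketch below is one plausible reading under stated assumptions: grid points are laid every `spacing` bp across the dense segment, the genotyped SNP nearest each grid point is kept, and SNPs outside the segment are left untouched. The helper `spacing_summary` reports the kind of spacing statistics quoted above (mean, median, minimum and maximum gap). Both function names and the toy positions are hypothetical.

```python
from statistics import median


def picket_fence(positions, start, end, spacing):
    """Thin SNPs in [start, end] to about one per `spacing` bp by keeping, for each
    grid point start, start + spacing, ..., the nearest SNP inside the segment.
    SNPs outside the segment are kept unchanged (assumed behaviour)."""
    inside = sorted(p for p in positions if start <= p <= end)
    outside = [p for p in positions if p < start or p > end]
    chosen = set()
    grid = start
    while grid <= end and inside:
        chosen.add(min(inside, key=lambda p: abs(p - grid)))
        grid += spacing
    return sorted(outside + list(chosen))


def spacing_summary(positions):
    """Summary of distances between consecutive SNPs (positions must be sorted)."""
    gaps = [b - a for a, b in zip(positions, positions[1:])]
    return {"mean_gap": sum(gaps) / len(gaps), "median_gap": median(gaps),
            "min_gap": min(gaps), "max_gap": max(gaps)}


# Toy map: 1 SNP per 100kb outside a dense 1Mb segment sampled every 10kb
sparse_left = list(range(0, 1_000_000, 100_000))
dense = list(range(1_000_000, 2_000_001, 10_000))
sparse_right = list(range(2_100_000, 3_000_001, 100_000))
thinned = picket_fence(sparse_left + dense + sparse_right, 1_000_000, 2_000_000, 100_000)
print(len(thinned), spacing_summary(thinned))
```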

To evaluate the LD structure across the chromosome arm, the 2 most commonly used measures of LD, r² and D′, were calculated between all SNP pairs separated by less than 5Mb. D′ and r² are both based on the disequilibrium parameter D (Ott 1999), the difference between the observed and expected frequencies of 2-locus haplotypes, but they differ in interpretation: D′ is strictly an indicator of the absence of recombination in the history of the studied population sample, whereas a high r² value additionally requires the allele frequencies at the two loci to be similar (Devlin and Risch 1995, Ardlie et al. 2002). The distribution of LD between SNP pairs on this map is shown as a heatmap in Figure 3.2 and varies greatly along the chromosome arm. Stronger LD between consecutive marker pairs is seen towards the centre of chromosome 6p, especially in the region telomeric to the MHC. The punctate nature of LD is evident, with several islands of SNPs in high pairwise LD standing out against the relatively uniform equilibrium in the rest of the chromosome arm. At an average marker density of 1 SNP every 95kb, strong LD between consecutive SNPs is not expected (Gabriel et al. 2002, International HapMap Consortium 2005), and these islands of high LD are the exceptions rather than the rule.
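For reference, the standard textbook definitions of these measures for two biallelic loci with alleles A/a and B/b are given below; the formulas are not reproduced in this chapter, so this is the usual formulation rather than a quotation of Ott (1999). Here p_A and p_B are allele frequencies and p_AB is the frequency of the A-B haplotype.

$$
\begin{aligned}
D &= p_{AB} - p_A\,p_B \\
D' &= \frac{|D|}{D_{\max}}, \qquad
D_{\max} = \begin{cases}
\min\{p_A(1-p_B),\ (1-p_A)\,p_B\} & \text{if } D > 0 \\
\min\{p_A\,p_B,\ (1-p_A)(1-p_B)\} & \text{if } D < 0
\end{cases} \\
r^2 &= \frac{D^2}{p_A(1-p_A)\,p_B(1-p_B)}
\end{aligned}
$$

D′ reaches 1 whenever one of the 4 possible haplotypes is absent, whereas r² reaches 1 only when just 2 haplotypes are present, which requires p_A = p_B (or p_A = 1 - p_B after relabelling alleles). This is why sparse maps can show moderate D′ alongside very low r².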


Figure 3.2 Chromosome 6p SNP LD Map
Gene density across chromosome 6p (blue) is plotted as sliding-window gene counts per 100kb. Locations of highlighted genes are represented by green glyphs. All gene annotations were taken from the Vertebrate Genome Annotation Database (Vega) (Wilming et al. 2008). The SNPs used in the map are drawn as vertical grey lines. Linkage disequilibrium between pairs of SNPs is depicted as a heatmap produced by Haploview (Barrett et al. 2005), with darker shades of red representing pairs of SNPs with higher D′.

To smooth the pairwise LD values, they were averaged across 2Mb sliding windows, and the resulting distribution of LD across the chromosome arm was plotted as a function of physical distance in Figure 3.3.
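The window averaging can be sketched as below. The window size (2Mb) comes from the text, but the step between windows and the rule for assigning a SNP pair to a window are not stated, so both are assumptions here: the step is a parameter, and a pair contributes to a window only if both SNPs fall inside it. Windows with no pairs are skipped, as is any final partial window. The function name and toy data are hypothetical.

```python
def sliding_window_ld(pairs, window, step, chrom_end):
    """Average pairwise LD in sliding windows.

    pairs: iterable of (pos1, pos2, ld_value) for SNP pairs.
    A pair is assigned to a window only if both SNPs fall inside it (assumption).
    Returns a list of (window_centre, mean_ld); empty windows are skipped.
    """
    pairs = list(pairs)
    results = []
    start = 0
    while start + window <= chrom_end:
        end = start + window
        values = [v for p1, p2, v in pairs
                  if start <= p1 < end and start <= p2 < end]
        if values:
            results.append((start + window / 2, sum(values) / len(values)))
        start += step
    return results


# Toy pairwise D' values; for the chromosome arm one would use window=2_000_000
toy_pairs = [(100_000, 300_000, 0.8), (500_000, 900_000, 0.4),
             (2_500_000, 2_700_000, 0.1)]
res = sliding_window_ld(toy_pairs, window=1_000_000, step=1_000_000,
                        chrom_end=3_000_000)
print(res)
```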

Given that markers in LD do not necessarily show strong allelic correlation (Ott 1999), averaged r² values are much lower than D′, but the 2 parameters track each other closely across the chromosome arm. Given the wide marker spacing and relative sparseness of this SNP map, high r² values are not expected, and the mean r² between pairs of markers less than 5Mb apart is only 0.03 (yellow dotted line). The average D′ is 0.16 (blue dotted line). LD is noticeably elevated above the chromosomal average at several locations, the most prominent being an 8Mb segment at the centre of the chromosome arm, with elevated LD in both D′ and r². This strong-LD segment lies between positions 25Mb and 33Mb, with the peak being a 2Mb window centred at position 28.9Mb. This peak, between positions 27.9Mb and 29.9Mb of chromosome 6p, contains 21 informative SNPs with pairwise D′ averaging 0.54 and pairwise r² averaging 0.15. This segment of elevated LD is also marked by fewer recombination hotspots and a lower recombination rate (0.46cM/Mb, compared with the chromosome average of 1.27cM/Mb) in the genetic map reported by the International HapMap Project (International HapMap Consortium 2005).

Figure 3.3 Distribution of LD Across Chromosome 6p
Averaged pairwise LD between SNPs within 2Mb windows was calculated and plotted against physical distance. LD was calculated using both the D′ coefficient (blue shaded area) and r² (red shaded area). The averaged pairwise LD across the whole chromosome arm is indicated by the blue (D′) and yellow (r²) horizontal dotted lines. The HapMap genetic map (release 22) in centiMorgans is plotted as the green line.

The centromeric half of this high-LD segment contains the classical MHC loci (positions 30.0Mb to 33.4Mb), while the telomeric half is marked by the largest histone cluster in the human genome (over 40 loci coding for histone genes between 26Mb and 28Mb) as well as an 8-zinc finger cluster (between 27.5Mb and 28.7Mb). At the centre and peak of this high-LD segment is a large olfactory receptor cluster, with 13 olfactory receptor genes between 29.1Mb and 29.6Mb. A gene map of the clusters in this region is shown in Figure 3.4.








Figure 3.4: Gene Clusters Telomeric to the MHC
The list of genes in this region was obtained from the VEGA project (Wilming et al. 2008). The large gene clusters in the region between 25Mb and 30Mb are clearly visible. The centre of the high-LD segment (28.9Mb) lies close to a large olfactory receptor cluster, outlined in red.

3.1.2.1 Haplotype Blocks Across the Chromosome 6p
Haplotype blocks are defined as segments of DNA along chromosomes that exhibit low diversity and low recombination rates (Daly et al. 2001, Patil et al. 2001). The chromosome 6p SNP linkage disequilibrium map in this study was constructed at a modest resolution of 1 SNP per 95.2kb; consequently, this sparse map does not allow an exhaustive description of the haplotype blocks on the chromosome arm. However, haplotype blocks identified at this resolution highlight segments of the chromosome arm with notably low diversity and high linkage disequilibrium.

Using a conservative definition of haplotype blocks (Gabriel et al. 2002), 3 such blocks can be identified in this SNP map (Figure 3.5). Unsurprisingly, 2 of these blocks, both over 150kb in length, fall within the high-LD MHC-telomeric region described earlier. The third is a clearly discernible 535kb haplotype block that coincides with the large SUPT3H gene locus and overlaps the RUNX2 locus. This long haplotype block has remarkably low diversity: 3 of a possible 256 haplotypes account for more than 97% of the variation in the local Chinese population. SUPT3H encodes a transcription initiation factor associated with the RNA polymerase II complex and is highly conserved across many organisms. Functional constraints may have limited diversity and suppressed recombination across this locus, maintaining a strong haplotype block structure across a long stretch of DNA.
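The block definition of Gabriel et al. (2002) classifies each SNP pair from confidence bounds on D′ and then requires a block to be dominated by "strong LD" pairs. The sketch below assumes the thresholds commonly used as Haploview's defaults (strong LD: upper bound at least 0.98 and lower bound at least 0.7; strong evidence of recombination: upper bound below 0.9; at least 95% of informative pairs in strong LD) and takes the confidence bounds as precomputed input, since estimating them is outside its scope. It also omits the special handling of very short blocks. The function names are hypothetical.

```python
def classify_pair(dprime_lower, dprime_upper):
    """Classify a SNP pair from confidence bounds on |D'| (assumed thresholds)."""
    if dprime_upper >= 0.98 and dprime_lower >= 0.7:
        return "strong_ld"
    if dprime_upper < 0.9:
        return "recombination"
    return "uninformative"


def is_block(ci, i, j, min_fraction=0.95):
    """Test whether SNPs i..j (inclusive) form a haplotype block.

    ci: dict mapping (a, b) with a < b to (lower, upper) bounds on |D'|.
    The outermost pair must be in strong LD, and at least `min_fraction`
    of the informative pairs in the region must be in strong LD.
    """
    if classify_pair(*ci[(i, j)]) != "strong_ld":
        return False
    strong = recomb = 0
    for a in range(i, j + 1):
        for b in range(a + 1, j + 1):
            kind = classify_pair(*ci[(a, b)])
            strong += int(kind == "strong_ld")
            recomb += int(kind == "recombination")
    # strong >= 1 here because the outermost pair is in strong LD
    return strong / (strong + recomb) >= min_fraction


# Hypothetical confidence bounds for 3 SNPs
ci = {(0, 1): (0.80, 1.00), (0, 2): (0.75, 0.99), (1, 2): (0.90, 1.00)}
print(is_block(ci, 0, 2))
```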




3.1.3 An Integrated SNP-HLA Map of the MHC
To generate an MHC SNP map of higher density, the same 192 Singaporean Chinese samples were genotyped at a denser resolution of 1 SNP per 20kb across a 7Mb segment, from 30.0Mb to 37.0Mb, that includes the MHC. The HLA genotypes of these 192 individuals were also determined by sequence-based typing, and an integrated SNP-HLA haplotype map was constructed. Using this integrated map, the extended and conserved MHC haplotypes present in the local Chinese population are described.
Figure 3.5 Large Haplotype Blocks in Chromosome 6p
Using the definition of haplotype blocks established by Gabriel et al. 2002, 3 large haplotype blocks can be identified across the chromosome arm.

Panel A:
The left side of the panel shows an LD heatmap of the segment of the chromosome between 27.4Mb and 28.6Mb, drawn using Haploview (Barrett et al. 2005). Two haplotype blocks over 150kb in length fall within this region and are outlined in black on the heatmap. The positions of the SNPs in the blocks are highlighted in blue. These 2 blocks lie within the high-LD peak described earlier and overlap with the zinc finger and histone clusters.

Panel B:
The third haplotype block is over 500kb long and spans the large SUPT3H gene locus.

3.1.3.1 LD Structure of the MHC and peri-MHC
The same quality control criteria described in the previous section were applied, removing uninformative markers with MAF below 5% as well as those failing the Hardy-Weinberg equilibrium test. In all, 81 SNPs were filtered out, and the remaining 345 SNPs were used to describe LD across the MHC. This SNP map has an average interval of 19.9kb between consecutive markers (range 0.37kb to 140kb; median 12.6kb), an average minor allele frequency of 30% and an average heterozygosity of 0.39.

As before, pairwise r² and D′ values were calculated between all SNP pairs less than 500kb apart. The distribution of pairwise LD is shown as a heatmap in Figure 3.6. Regions of stronger LD are seen outside the core MHC (30.0Mb to 33.4Mb), and at this resolution no strong LD block is seen across the class I loci HLA-A, -B and -C, nor across the highly polymorphic class II locus HLA-DRB1. The less polymorphic HLA-DRA, -DMA and -DMB loci, however, lie amid SNP markers with higher LD.

The block-like structure of LD is more evident at this resolution, and the criterion of Gabriel et al. 2002 was again used to define haplotype block boundaries. In total, 61 blocks were identified, ranging from 780bp to 280kb in length, with an average length of 35.5kb (Table 3.2). At this SNP density, 31.5% of the region covered by this map falls within haplotype blocks. The majority of the longer haplotype blocks lie outside the traditionally defined MHC, notably within a 1Mb section from 34.5Mb to 35.5Mb, where 760kb (76.5%) of the segment is covered by haplotype blocks. This region contains 11 open reading frames, including loci coding for a linked pair of genes, TCP11 (T-complex homologue) and ZN76 (a zinc finger), that are expressed in tandem (Ragoussis et al. 1992). Within the MHC proper, 2 large haplotype blocks lie within the class III region: a 132kb block containing CLIC1, VARS2, DDAH1 and several heat shock protein genes, and a 65kb block containing C6orf10.
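Coverage figures such as the share of a region falling within haplotype blocks can be computed by clipping each block to the region and summing the length of their union. The sketch below assumes blocks are given as (start, end) intervals in a common unit; the example intervals are hypothetical and are not taken from Table 3.2. Overlapping blocks are merged so no stretch is counted twice.

```python
def covered_fraction(blocks, seg_start, seg_end):
    """Fraction of [seg_start, seg_end] covered by the union of (start, end) blocks."""
    clipped = sorted((max(s, seg_start), min(e, seg_end))
                     for s, e in blocks if e > seg_start and s < seg_end)
    covered = 0.0
    cur_s = cur_e = None
    for s, e in clipped:
        if cur_e is None or s > cur_e:
            # Disjoint from the current merged interval: close it and start anew
            if cur_e is not None:
                covered += cur_e - cur_s
            cur_s, cur_e = s, e
        else:
            cur_e = max(cur_e, e)  # overlapping block: extend the merged interval
    if cur_e is not None:
        covered += cur_e - cur_s
    return covered / (seg_end - seg_start)


# Hypothetical blocks in kb; the last one extends past the segment end
blocks_kb = [(0, 100), (50, 150), (300, 400), (950, 1200)]
print(covered_fraction(blocks_kb, 0, 1000))
```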
Figure 3.6 LD Map of the MHC and the peri-MHC
The pairwise LD (calculated as D′) of the 426 SNPs successfully genotyped between 30Mb and 37Mb is shown as a heatmap drawn using Haploview (Barrett et al. 2005). Shades of red indicate the strength of LD between SNP pairs. Known genes are marked in green. Stronger pairwise LD can be seen in the centromeric region from 34.0Mb to 37.0Mb, outside the boundary of the traditional MHC.

Table 3.2 List of Haplotype Blocks between 30.0Mb and 37.0Mb of Chromosome 6p
Haplotype block boundaries were defined using the criterion of Gabriel et al. (2002). The genes found within each block are also listed; "-" marks blocks with no listed genes.

Block  SNPs  Start (Mb)  End (Mb)  Length (kb)  Genes in block
1      4     30.620      30.659    39.12        GNL1 DDR1 PRR3 ABCF1
2      4     30.946      30.992    46.90        DDR1 GTF2H4
3      4     31.159      31.187    27.45        -
4      2     31.198      31.208    9.98         PSORS1C1
5      3     31.230      31.252    21.71        CCHCR1 TCF19 POU5F1P1 Q6H1K9_HUMAN
6      2     31.501      31.515    14.73        -
7      2     31.633      31.648    14.75        NFKBIL1 LTA
8      2     31.692      31.701    9.65         AIF1 BAT2 SNORA38
9      4     31.720      31.754    34.56        BAT3 APOM C6orf47 BAT4 Y LY6G5B NP_079538.2
10     2     31.773      31.786    12.58        BAT5_HUMAN LY6G6D
11     7     31.794      31.927    132.88       LY6G6C C6orf25 CLIC1 MSH5 G7C_HUMAN VARS Y LSM2 HSPA1L HSPA1A HSPA1B C6orf48 SNORD48 SNORD52
12     2     32.155      32.166    10.19        TNXB
13     4     32.327      32.392    65.00        NM_001013681 C6orf10
14     4     32.414      32.443    29.02        C6orf10
15     3     32.472      32.485    12.70        BTNL2_HUMAN
16     3     32.541      32.557    16.30        -
17     3     32.802      32.832    29.94        HLA-DQA2 HLA-DQB2
18     2     32.933      32.941    7.29         PSMB9
19     5     33.031      33.071    40.11        BRD2
20     2     33.185      33.194    9.18         Q8WM95_HUMAN
21     4     33.340      33.366    26.39        VPS52 RPS18 B3GALT4 WDR46 PFDN6
22     2     33.376      33.388    12.64        TAPBP
23     2     33.409      33.419    9.75         -
24     7     33.467      33.546    79.50        KIFC1 NP_001074327.1 PHF1 CUTA SYNGAP1
25     3     33.683      33.696    13.51        -
26     2     33.718      33.728    10.04        ITPR3
27     2     33.739      33.752    13.02        ITPR3
28     2     33.769      33.788    18.56        ITPR3 Q6ZSG0_HUMAN C6orf125
29     2     33.848      33.858    9.45         LEMD2
30     2     33.864      33.880    15.78        LEMD2 MLN
31     2     33.952      33.966    13.77        -
32     2     34.026      34.034    7.68         -
33     2     34.065      34.068    2.86         -
34     2     34.160      34.169    9.73         GRM4
35     2     34.230      34.238    7.70         GRM4
36     7     34.363      34.460    96.29        NUDT3
37     2     34.544      34.545    0.78         PACSIN1
38     17    34.659      34.940    280.53       C6orf106 SRP_euk_arch SNRPC C6orf107
39     8     34.948      35.069    120.89       C6orf107 TAF11 ANKS1A
40     4     35.117      35.187    69.47        ANKS1A SRP_euk_arch
41     3     35.240      35.255    15.35        -
42     4     35.277      35.298    21.51        SCUBE3
43     7     35.324      35.435    111.06       SCUBE3 ZNF76 DEF6 PPARD
44     5     35.486      35.531    44.63        PPARD FANCE
45     2     35.587      35.593    6.18         TULP1
46     3     35.637      35.663    26.61        FKBP5
47     7     35.687      35.779    92.16        FKBP5 SNORA40
48     2     35.800      35.809    9.50         FKBP5
49     9     35.911      36.014    103.11       SRPK1
50     3     36.071      36.081    9.79         SLC26A8
51     7     36.096      36.186    90.54        SLC26A8 MAPK14 MAPK13
52     2     36.209      36.220    11.69        MAPK13
53     2     36.286      36.298    12.30        BRPF3
54     2     36.318      36.326    8.60         -
55     2     36.466      36.483    17.02        PXT1
56     10    36.490      36.592    102.21       PXT1 KCTD20 STK38
57     3     36.607      36.633    25.44        STK38
58     3     36.660      36.678    18.10        SFRS3
59     2     36.750      36.760    10.59        CDKN1A
60     2     36.770      36.778    8.23         NP_997381.1
61     3     36.805      36.821    16.06        NP_997381.1 CPNE5








3.1.3.2 HLA Allele and Haplotype Frequencies in the Singaporean Chinese Population
Using sequence-based typing, each of the 192 individuals was additionally genotyped at 4 classical HLA loci (HLA-A, -B, -C and -DRB1), with allele-level genotypes determined at each locus. Several HLA alleles stand out as very common in the population; for example, HLA-A*1101 is present at a frequency of 30%. The most common allele at HLA-B is B*4001 (16%), at HLA-C it is C*0102 (22%), and at HLA-DRB1 it is DRB1*0901 (17%). The complete list of HLA allele frequencies is given in Table 3.3.
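The frequencies in Table 3.3 are allele counts divided by the number of chromosomes (2 × 192 = 384); for example, A*1101 is seen on 116 chromosomes, giving 116/384 = 30.21%. A minimal sketch of that calculation, assuming each individual's genotype is stored as a pair of allele names (the function name and example genotypes are hypothetical):

```python
from collections import Counter


def allele_frequencies(genotypes):
    """Allele counts and frequencies from diploid genotypes.

    genotypes: list of (allele1, allele2) tuples, one per individual,
    so the number of chromosomes is 2 * len(genotypes).
    """
    counts = Counter(allele for pair in genotypes for allele in pair)
    n_chrom = 2 * len(genotypes)
    return {allele: (n, n / n_chrom) for allele, n in counts.most_common()}


example = [("A*1101", "A*2402"), ("A*1101", "A*1101")]
for allele, (n, f) in allele_frequencies(example).items():
    print(f"{allele}: {f:.2%} ({n})")  # same layout as Table 3.3
```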


Table 3.3 Distribution of HLA Allele Frequencies
The complete list of alleles at the 4 classical HLA loci (HLA-A, -B, -C and -DRB1) seen in 192 individuals (384 chromosomes) from the Singaporean Chinese population. Absolute counts are given in parentheses after each frequency. Common alleles are those present at a frequency of at least 5% in the population.

HLA-A
Allele    Frequency (count)
A*1101    30.21% (116)
A*2402    13.28% (51)
A*0207    12.76% (49)
A*3303    10.42% (40)
A*0201    8.59% (33)
A*0203    8.07% (31)
A*1102    4.43% (17)
A*0206    3.13% (12)
A*3101    2.08% (8)
A*2601    1.56% (6)
A*3001    1.56% (6)
A*0101    0.78% (3)
A*0301    0.78% (3)
A*2901    0.78% (3)
A*6801    0.52% (2)
A*2410    0.26% (1)
A*2420    0.26% (1)
A*3201    0.26% (1)
A*3401    0.26% (1)

HLA-B
Allele    Frequency (count)
B*4001    15.63% (60)
B*4601    15.1% (58)
B*5801    8.85% (34)
B*1301    6.77% (26)
B*3802    5.47% (21)
B*1502    5.21% (20)
B*5502    4.95% (19)
B*5101    3.91% (15)
B*2704    3.39% (13)
B*5401    2.86% (11)
B*1501    2.6% (10)
B*3901    2.6% (10)
B*5102    2.08% (8)
B*4002    2.08% (8)
B*3501    1.82% (7)
B*4801    1.82% (7)
B*1302    1.56% (6)
B*0705    1.3% (5)
B*4006    1.04% (4)
B*4403    0.78% (3)
B*5601    0.78% (3)
B*0801    0.52% (2)
B*1511    0.52% (2)
B*1518    0.52% (2)
B*1527    0.52% (2)
B*1801    0.52% (2)
B*3503    0.52% (2)
B*3505    0.52% (2)
B*5201    0.52% (2)
B*5501    0.52% (2)
B*5504    0.52% (2)
B*5604    0.52% (2)
B*5701    0.52% (2)
B*1401    0.26% (1)
B*1512    0.26% (1)
B*1521    0.26% (1)
B*1525    0.26% (1)
B*1558    0.26% (1)
B*1802    0.26% (1)
B*3701    0.26% (1)
B*3801    0.26% (1)
B*3909    0.26% (1)
B*5107    0.26% (1)
B*5512    0.26% (1)
B*5603    0.26% (1)

HLA-C
Allele    Frequency (count)
C*0102    22.14% (85)
C*0702    20.83% (80)
C*0304    13.28% (51)
C*0302    8.85% (34)
C*0801    8.59% (33)
C*0401    4.95% (19)
C*1202    3.65% (14)
C*0303    3.39% (13)
C*1402    3.39% (13)
C*1502    3.13% (12)
C*0602    2.34% (9)
C*1203    1.56% (6)
C*0403    1.04% (4)
C*1505    0.78% (3)
C*0701    0.52% (2)
C*0704    0.52% (2)
C*0103    0.26% (1)
C*0586    0.26% (1)
C*0802    0.26% (1)
C*1403    0.26% (1)

HLA-DRB1
Allele       Frequency (count)
DRB1*0901    16.93% (65)
DRB1*1202    9.38% (36)
DRB1*1501    8.59% (33)
DRB1*0803    7.81% (30)
DRB1*1101    7.03% (29)
DRB1*1602    7.55% (27)
DRB1*0405    6.51% (25)
DRB1*0301    6.51% (25)
DRB1*1201    4.69% (18)
DRB1*0403    3.39% (13)
DRB1*0406    2.86% (11)
DRB1*0701    2.86% (11)
DRB1*1401    2.86% (11)
DRB1*0404    1.3% (5)
DRB1*0802    1.3% (5)
DRB1*1301    1.3% (5)
DRB1*1405    1.3% (5)
DRB1*0809    1.04% (4)
DRB1*1302    1.04% (4)
DRB1*1001    0.78% (3)
DRB1*1439    0.78% (3)
DRB1*1502    0.78% (3)
DRB1*0101    0.52% (2)
DRB1*1312    0.52% (2)
DRB1*0307    0.26% (1)
DRB1*0327    0.26% (1)
DRB1*0801    0.26% (1)
DRB1*1104    0.26% (1)
DRB1*1106    0.26% (1)
DRB1*1303    0.26% (1)
DRB1*1403    0.26% (1)
DRB1*1404    0.26% (1)
DRB1*1504    0.26% (1)

The HLA haplotypes of each individual were reconstructed using PHASE, a Bayesian algorithm that has been shown to be accurate for both biallelic and multi-allelic haplotype reconstruction (Stephens and Scheet 2005, Marchini et al. 2006). The common 2- and 3-locus HLA haplotypes seen on more than 5 chromosomes are listed in Table 3.4. Two long-range 3-locus haplotypes occur at remarkably high frequency in the population: A*0207-B*4601-DRB1*0901 and A*3303-B*5801-DRB1*0301 are each present at a frequency of almost 5%. The HLA allele and haplotype frequencies presented here agree very well with a recent report based on 536 Singaporean Chinese samples (Tang et al. 2007).

3.1.3.3 Linkage Between HLA Alleles
The HLA genes are the most polymorphic in the human genome; in this sample of 384 chromosomes from the Singaporean Chinese population, a total of 19 HLA-A, 20 HLA-C, 45 HLA-B and 33 HLA-DRB1 alleles were observed. However, because of allelic association between HLA alleles, the diversity of HLA haplotypes falls short of that expected under linkage equilibrium, with some haplotypes overrepresented. For example, given the individual frequencies of HLA-A*0207 and HLA-B*4601 (12.8% and 15.1% respectively, from Table 3.3), if the alleles were truly independent, only about 2% (12.8% × 15.1% = 1.93%) of chromosomes would be expected to carry the A*0207-B*4601 haplotype. Yet this haplotype is present on over 9% of the chromosomes (Table 3.4). In contrast, the frequency of the A*1101-B*4001 haplotype (5.47%) does not deviate significantly from the expected 4.72%, suggesting that allelic association is not strong between all HLA allele pairs. This allelic association is also reflected in the p-values for the difference between the expected and observed haplotype frequencies (Table 3.4).
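The expected haplotype frequency and its test can be sketched as follows. The Table 3.4 caption states that 2-locus p-values come from 2×2 contingency tables and Fisher's test, but the layout of the table is not described; this sketch assumes the 384 chromosomes are cross-classified by whether they carry allele A (yes/no) and allele B (yes/no), and implements the two-sided exact test directly from hypergeometric probabilities. Its p-values are therefore not guaranteed to match the published ones. The function names are hypothetical; the counts come from Tables 3.3 and 3.4.

```python
from math import comb


def expected_haplotype_freq(count_a, count_b, n_chrom):
    """Haplotype frequency expected if the two alleles are independent: p_A * p_B."""
    return (count_a / n_chrom) * (count_b / n_chrom)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability of x in the top-left cell
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables no more likely than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def haplotype_test(count_a, count_b, count_ab, n_chrom):
    """Cross-classify chromosomes by carrying allele A and allele B (assumed layout)."""
    a = count_ab                              # A and B
    b = count_a - count_ab                    # A, not B
    c = count_b - count_ab                    # B, not A
    d = n_chrom - count_a - count_b + count_ab  # neither
    return (expected_haplotype_freq(count_a, count_b, n_chrom),
            fisher_exact_two_sided(a, b, c, d))


# A*0207 (49 chromosomes), B*4601 (58) and A*0207-B*4601 (36) among 384 chromosomes
exp_f, p = haplotype_test(49, 58, 36, 384)
print(f"expected {exp_f:.2%}, observed {36 / 384:.2%}, p = {p:.2g}")
```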
Table 3.4 Common 2- and 3-locus HLA Haplotypes in the Singaporean Chinese Population
HLA haplotypes were reconstructed using the program PHASE and are listed with their observed frequencies. Only haplotypes seen on more than 5 chromosomes are listed. Expected frequencies are calculated from individual allele frequencies. P-values for 2-locus haplotypes were calculated from 2×2 contingency tables using Fisher's test, while p-values for 3-locus haplotypes were calculated using the 1-sample z-test.

HLA-A, -B haplotype    Count  Obs. freq.  Exp. freq.  P-value
A*0207-B*4601          36     9.38%       1.93%       0.0001
A*3303-B*5801          29     7.55%       0.92%       0.0001
A*1101-B*4001          21     5.47%       4.72%       0.4410
A*1101-B*1301          16     4.17%       2.05%       0.0007
A*0203-B*3802          15     3.91%       0.44%       0.0001
A*1101-B*1502          12     3.13%       0.16%       0.0050
A*2402-B*4001          12     3.13%       2.08%       0.1003
A*0201-B*4001          10     2.60%       1.34%       0.0228
A*1101-B*4601          9      2.34%       4.56%       0.0080
A*1101-B*5502          9      2.34%       1.57%       0.1225
A*1102-B*2704          9      2.34%       0.15%       0.0001
A*1101-B*5101          8      2.08%       1.18%       0.0801
A*1101-B*1501          6      1.56%       0.79%       0.0727
A*1101-B*5401          6      1.56%       0.87%       0.0956
A*2402-B*4601          6      1.56%       2.01%       0.6740

HLA-B, -DRB1 haplotype    Count  Obs. freq.  Exp. freq.  P-value
B*4601-DRB1*0901          27     7.03%       2.56%       0.0001
B*5801-DRB1*0301          22     5.73%       0.58%       0.0001
B*1502-DRB1*1202          12     3.13%       0.49%       0.0001
B*4001-DRB1*0901          11     2.86%       2.64%       0.7112
B*4001-DRB1*1602          10     2.60%       1.18%       0.0074
B*4601-DRB1*1101          7      1.82%       1.06%       0.1566
B*4001-DRB1*1501          7      1.82%       1.10%       0.3250
B*1301-DRB1*1501          7      1.82%       0.58%       0.0037
B*3802-DRB1*1602          7      1.82%       0.41%       0.0004
B*5401-DRB1*0405          6      1.56%       0.19%       0.0001
B*4601-DRB1*0803          6      1.56%       1.18%       0.4276
B*4001-DRB1*1101          6      1.56%       1.10%       0.4057
B*1301-DRB1*1602          6      1.56%       0.51%       0.0087

HLA-C, -B haplotype    Count  Obs. freq.  Exp. freq.  P-value
C*0102-B*4601          54     14.06%      3.34%       0.0001
C*0302-B*5801          34     8.85%       0.78%       0.0001
C*0702-B*4001          32     8.33%       3.26%       0.0001
C*0304-B*1301          24     6.25%       0.90%       0.0001
C*0801-B*1502          20     5.21%       0.45%       0.0001
C*0702-B*3802          19     4.95%       1.14%       0.0001
C*0304-B*4001          17     4.43%       2.08%       0.0006
C*1202-B*2704          12     3.13%       0.12%       0.0001
C*0102-B*5401          11     2.86%       0.63%       0.0001
C*0102-B*5502          10     2.60%       1.10%       0.0028
C*0702-B*3901          10     2.60%       0.54%       0.0001
C*1402-B*5101          10     2.60%       0.13%       0.0001
C*0401-B*1501          7      1.82%       0.13%       0.0001
C*0602-B*1302          6      1.56%       0.04%       0.0001
C*0801-B*4801          6      1.56%       0.16%       0.0001

HLA-A, -B, -DRB1 haplotype    Count  Obs. freq.  Exp. freq.  P-value
A*0207-B*4601-DRB1*0901       19     4.95%       0.33%       <0.0001
A*3303-B*5801-DRB1*0301       18     4.69%       0.06%       <0.0001
A*1101-B*4001-DRB1*1501       7      1.82%       0.41%       0.0379
A*1101-B*1502-DRB1*1202       7      1.82%       0.15%       0.0141
A*0203-B*3802-DRB1*1602       6      1.56%       0.03%       0.0157

The frequencies with which the common HLA-A, -C and -DRB1 alleles occur on the same chromosome as each HLA-B allele are plotted as a series of bar charts in Figure 3.7. HLA-B was chosen as the focal locus for comparison because of its central position within the MHC, but the pattern of allelic association is similar regardless of which HLA locus is used as the focal point. The corresponding r² and D′ coefficients were also calculated to quantify the strength of linkage disequilibrium between allele pairs.

At each locus, some alleles occur on the same chromosome at similar frequencies with multiple HLA-B partners; for example, HLA-A*1101 is not found exclusively with any single HLA-B allele. In contrast, several alleles are predominantly associated with a single HLA-B partner (e.g. HLA-A*3303, HLA-C*0302 and HLA-DRB1*0301). Overall, linkage is tighter between HLA-C and HLA-B alleles than between HLA-B and HLA-A or HLA-DRB1 alleles. Each HLA-C allele has a dominant HLA-B partner with an r² value of at least 0.1, a reflection of the close proximity between the loci: HLA-B (at position 31.43Mb on the physical map) and HLA-C (31.34Mb) are separated by less than 100kb and are only 0.15cM apart on the genetic map (International HapMap Consortium 2005), so fewer historical recombination events have occurred between them. Still, the strength of association varies between HLA-B and -C pairs. C*0302 is perfectly correlated with B*5801, while C*0702 has no consistent HLA-B partner: 40% of C*0702 chromosomes carry B*4001, 24% carry B*3802 and 13% carry B*3901.

Figure 3.7 Linkage Between HLA Allele Pairs
(Panel A)
The distribution of the top 5 HLA-B partners for each common HLA-A (red charts), -C (blue charts) and -DRB1 (green charts, Panel B) allele is shown in these bar charts. The blue and red numbers above each bar indicate the D′ and r² coefficients, respectively, between the corresponding HLA pairs. Some HLA allele pairs are found exclusively with each other (e.g. HLA-C*0302 and HLA-B*5801), indicating an absence of recombination on chromosomes carrying these pairs; in contrast, other HLA alleles (e.g. HLA-A*1101) are found with multiple HLA-B partners.

Figure 3.7 Linkage Between HLA Allele Pairs
(Panel B)
The distribution of the top 5 HLA-B partners for each common DRB1 allele is shown in these bar charts. The blue and red numbers above each bar indicate the D′ and r² coefficients, respectively, between the corresponding HLA pairs.

Highly correlated HLA-A, -B allele pairs include A*0203-B*3802 (r² = 0.31), A*0207-B*4601 (r² = 0.39) and A*3303-B*5801 (r² = 0.58). For HLA-C, -B pairs, besides the perfect correlation between C*0302 and B*5801 (r² = 1.00), there is high correlation between C*0102-B*4601 (r² = 0.52), C*0304-B*1301 (r² = 0.39) and C*0801-B*1502 (r² = 0.58). Among HLA-B, -DRB1 pairs, only B*5801-DRB1*0301 (r² = 0.54) is strongly associated.

3.1.3.4 Homozygosity of HLA Haplotypes
It has been suggested that the majority of MHC haplotypes in major ethnic groups exist as conserved extended haplotypes (CEHs) that stretch over 1Mb in length (Awdeh et al. 1983, Degli-Esposti et al. 1992b, Alper et al. 2006). Although SNP LD and haplotype maps are useful for depicting historical recombination hotspots, indicating regions of limited diversity and identifying tagging SNPs, their fragmented block structure renders them unable to detect CEHs in a population sample (Alper et al. 2006). The need to better understand the structure of these extended haplotypes is underscored by their high frequencies and their implicated roles in MHC-associated diseases (Dawkins et al. 1999, Yunis et al. 2006). Counting HLA allele and haplotype frequencies establishes which allelic combinations are in linkage disequilibrium; if these high-LD HLA haplotypes belong to conserved extended haplotypes, there should also be a high level of sequence similarity in the intervening DNA between the HLA loci. To test this, SNP-HLA haplotypes were constructed, and the homozygosity at each SNP location on individual HLA haplotypes was calculated and plotted.

We define single-marker homozygosity as the probability of randomly selecting 2 chromosomes from a population and finding them identical at that locus (see Methods). Similarly, haplotype homozygosity (HH) is the probability that 2 randomly selected chromosomes are identical across a defined stretch of polymorphic markers. Under this definition, HLA haplotypes with high SNP homozygosity are highly similar throughout their length, possibly identical by descent, with little or no recombination having occurred since they diverged from a common ancestor. Such haplotypes exhibit 'genetic fixity' (Yunis et al. 2003) and are representative of the conserved extended haplotypes in the population.

To determine the homozygosity of different HLA haplotypes, the SNP and HLA genotypes of the 192 individuals were first combined into a single dataset. SNP-HLA haplotypes were then reconstructed using the program PHASE. The phased chromosomes were grouped by HLA haplotype, and the homozygosity at each SNP within each haplotype group was calculated.

Single-SNP homozygosity plots were constructed for all common (n ≥ 5) 2-locus (HLA-A, -B and HLA-B, -DRB1) and 3-locus (HLA-A, -B, -DRB1) haplotypes (Figures 3.8, 3.9 and 3.10), and the haplotype homozygosity of each HLA haplotype was also determined. For example, the HH of haplotype A*0201-B*4001 is 0.19 (top left-hand corner, Figure 3.8): if 2 chromosomes are drawn at random from the population of A*0201-B*4001 chromosomes, there is a 19% chance that they are identical at all the assayed SNPs between HLA-A and HLA-B.
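The two homozygosity measures can be sketched as below. The Methods section is not part of this excerpt, so the sampling scheme is an assumption: pairs of chromosomes are drawn without replacement, giving the sum of C(k, 2) over identical classes divided by C(n, 2). This choice is consistent with the HH of 0 reported for A*1101-B*4001-DRB1*1501, since with replacement HH could never reach 0. SNP haplotypes are represented as equal-length strings of alleles, and the function names and example haplotypes are hypothetical.

```python
from collections import Counter
from math import comb


def homozygosity(items):
    """Probability that 2 chromosomes drawn without replacement carry identical items."""
    counts = Counter(items)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least 2 chromosomes")
    return sum(comb(k, 2) for k in counts.values()) / comb(n, 2)


def single_snp_homozygosity(haplotypes):
    """Homozygosity at each SNP position for equal-length SNP haplotype strings."""
    return [homozygosity(h[i] for h in haplotypes) for i in range(len(haplotypes[0]))]


def haplotype_homozygosity(haplotypes):
    """HH: probability that 2 chromosomes are identical across all assayed SNPs."""
    return homozygosity(haplotypes)


# Hypothetical SNP haplotypes between 2 HLA loci on 4 chromosomes sharing an HLA haplotype
haps = ["AAB", "AAB", "ABB", "ABB"]
print(single_snp_homozygosity(haps), haplotype_homozygosity(haps))
```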

The homozygosity plots show clearly that some haplotypes have consistently high single-SNP homozygosity throughout the length of the haplotype segment, whereas others show very little similarity between the HLA loci. Among the HLA-A, -B haplotypes, those with consistently high single-SNP homozygosity (≥0.7) across the segment are A*1101-B*1502 (HH=0.73), A*1102-B*2704 (HH=0.58), A*0203-B*3802 (HH=0.74), A*0207-B*4601 (HH=0.84) and A*3303-B*5801 (HH=1.00). Among the HLA-B, -DRB1 haplotypes, strong fixity is seen in B*1301-DRB1*1501 (HH=0.71), B*1502-DRB1*1202 (HH=1.00), B*3802-DRB1*1602 (HH=1.00), B*4601-DRB1*0901 (HH=0.92), B*5401-DRB1*0405 (HH=1.00) and B*5801-DRB1*0301 (HH=0.91).

Only 5 three-locus HLA-A, -B, -DRB1 haplotypes are represented more than 5 times in the sample, and of these, A*0203-B*3802-DRB1*1602 (HH=1.00), A*0207-B*4601-DRB1*0901 (HH=0.72), A*1101-B*1502-DRB1*1202 (HH=1.00) and A*3303-B*5801-DRB1*0301 (HH=1.00) are highly conserved. The genetic fixity of these haplotypes across unrelated chromosomes is remarkable. For instance, 18 chromosomes carry the A*3303-B*5801-DRB1*0301 haplotype, and all appear identical across the entire 2.6Mb segment. In contrast, A*1101-B*4001-DRB1*1501 shows very little identity across the segment; its HH of 0 indicates that no 2 chromosomes in the sample carrying these 3 alleles are identical.

The plots also show that HLA allele frequency is not an indicator of homozygosity. HLA alleles common in the local Chinese population, such as A*0201, A*2402, B*1301, B*4001, DRB1*1101 and DRB1*1501, are not associated with highly homozygous haplotypes. Additionally, although the HLA-A*1101 allele (found in