Association study of ABCA1 polymorphisms in Singapore populations

Chapter 5 ABCA1 SNP Survey

5 ABCA1 SNP Survey
5.1 Introduction
Mutations in the ABCA1 gene are responsible for the familial high density lipoprotein
(HDL) deficiency disorders Tangier disease and familial hypoalphalipoproteinemia
(Bodzioch et al., 1999; Brooks-Wilson et al., 1999; Marcil et al., 1999; Remaley et al.,
1999; Rust et al., 1999). Together with the long-established epidemiological finding
that HDL levels are inversely related to coronary artery disease (CAD; Wang and Briggs,
2004), this suggests that common ABCA1 genetic variations may explain phenotypic
variation in HDL levels and CAD susceptibility in the general population. Association
studies are one approach to investigating this notion, but to facilitate such studies,
single-nucleotide polymorphisms (SNPs) in ABCA1 must first be identified.
When this research was initiated in early 2000, few ABCA1 SNPs had been reported
in the literature. Pullinger et al. (2000) discovered -278G>C and -14C>T in the
proximal promoter as well as 237indelG and 296C>G in the 5’ untranslated region (UTR)
while mapping the transcriptional initiation site of the ABCA1 gene. In another report,
Wang et al. (2000) found four missense (R216K, V825I, M883I and R1587K), and five
silent coding SNPs (cSNPs; P312, G316, I680, L960 and T1427) during their cDNA-
based resequencing efforts.
Here, we surveyed the sequence variation in the ABCA1 gene among Singapore
Chinese, Malays and Indians using experimental and in silico strategies. A segment of
the ABCA1 proximal promoter was amplified and resequenced. Two strategies were used
to examine variation in the exonic regions. Individual exons were amplified and subjected
to heteroduplex detection using denaturing high performance liquid chromatography
(DHPLC) under partially denaturing conditions. In addition, expressed sequence tags
(ESTs) and full-length mRNAs from the ABCA1 UniGene cluster were multiply aligned
and candidate SNPs identified from regions of sequence overlaps. The results of the SNP
discovery efforts are presented in chronological order.
5.2 Results
5.2.1 First SNPFINDER Analysis of ABCA1 UniGene ESTs (Early 2000)
In silico SNP discovery essentially mines SNPs from pre-existing DNA sequences. Public
domain sequences such as ESTs provide a rich resource for SNP discovery (Buetow et
al., 1999; Garg et al., 1999; Marth et al., 1999; Picoult-Newberg et al., 1999; Irizarry et al.,
2000; Cox et al., 2001). SNPFINDER (Buetow et al., 1999) was chosen to perform in
silico SNP mining in the ABCA1 gene because of its integration with the UniGene
database which consists of gene-specific ESTs and full-length mRNAs, and public DNA
electrophoretogram archives. The key steps in the SNPFINDER pipeline include
basecalling of DNA traces using PHRED which assigns a quality (Q) value to each base,
assembly and alignment of multiple sequences using PHRAP, and finally, identification of
candidate variants from regions of sequence overlaps using a statistical analysis that
takes into account sequence quality. By examining sequencing traces, bona fide allelic
variants can be discerned from sequencing errors with greater confidence as opposed to
mere comparison of text-based sequences, for instance, using BLAST (Cox et al., 2001).
Candidate variants flagged by SNPFINDER are assigned a score which directly reflects
the probability that a position within a given assembly has heterogeneity in nucleotide
composition (Buetow et al., 1999).
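The idea of weighing sequence quality when calling candidate variants can be illustrated with a deliberately simplified sketch. It is not SNPFINDER's statistical model: it merely assumes the convention used in the alignment figures of this chapter (uppercase letters for high quality basecalls, lowercase for low quality ones), and the function names and the `min_minor_count` threshold are hypothetical.

```python
from collections import Counter


def high_quality_alleles(column: str) -> Counter:
    """Count the bases in one alignment column, keeping only
    uppercase (high quality) A/C/G/T calls."""
    return Counter(base for base in column if base in "ACGT")


def is_candidate_snp(column: str, min_minor_count: int = 2) -> bool:
    """Flag a column when the second most frequent high quality allele
    is seen at least `min_minor_count` times (an illustrative rule,
    not the SNPFINDER score)."""
    counts = high_quality_alleles(column).most_common()
    if len(counts) < 2:
        return False  # only one allele survives the quality filter
    return counts[1][1] >= min_minor_count


if __name__ == "__main__":
    # Hypothetical column with seven A and two G high quality calls
    print(is_candidate_snp("AAAAAAAGG"))
    # A lone low quality alternative base is ignored
    print(is_candidate_snp("AAAAAAAAg"))
```

With this rule a singleton, or a variant supported only by low quality bases, is not flagged, which mirrors the reasoning applied to the low confidence candidates in this section.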
When this survey was first conducted in early 2000, the ABCA1 UniGene cluster
comprised two mRNAs and 60 non-redundant ESTs (Table 5.1). Most of the ESTs
originated from normal tissues. DNA traces were available for approximately half (31/60)
of the ESTs, and these together with the mRNAs, were analyzed by SNPFINDER. Six
candidate sequence variants with scores ranging from 0.05 to 0.96 were predicted
(Figures 5.1 and 5.2). All candidate variants are located in the 3’ portion of the ABCA1
gene, which probably reflects the fact that cDNA synthesis is generally primed by oligo-dT
primers as well as the fairly long 3’ portion of the ABCA1 mRNA (Santamarina-Fojo et
al., 2000). Since SNPFINDER was primarily developed for large-scale identification of
SNPs, the original authors used a high arbitrary cutoff score of 0.99 in order to have a
higher confirmation rate at the validation stage. Applying such a stringent cutoff in our
case would have yielded no hits. Nevertheless, closer inspection of the multiple alignment
revealed a highly probable A>G polymorphism (SNPFINDER score 0.96) in the position
corresponding to nucleotide 8995 in the 3’UTR of the ABCA1 mRNA (numbering in
mRNA with respect to reference sequence NM_005502). As illustrated in Figure 5.1, two
out of nine ESTs originating from distinct tissue sources harbour the minor G variant.
Moreover, the candidate variant is flanked by high quality bases (denoted by uppercase
letters in the multiple alignment in Figure 5.1), further indicating the high likelihood of the
SNP’s existence.
In contrast, the candidate variants at positions 9091, 9410, 10027, 10029 and
10032 were less likely to be true SNPs because they occurred as singletons,
possessed low scores or were flanked by low quality bases (Figure 5.2). Nevertheless,
some of them might represent true but rare variants whose detection would require more
members in the contig. The variant at nucleotide 9410 has since been
confirmed in a recent Japanese population survey (Iida et al., 2001).
To experimentally verify the existence of the 8995A>G candidate SNP, a short
109 bp segment containing the candidate SNP was amplified and subjected to single-
strand conformation polymorphism (SSCP) analysis and sequencing. Figure 5.3 shows
the three distinct band migration patterns on SSCP gels which match the three expected
genotypes identified by sequencing analysis. The 8995A>G SNP is located ~1900 bp
downstream of the stop codon, and ~1400 bp upstream of the first polyadenylation motif,
AAUAAA (Santamarina-Fojo et al., 2000).
To assess the potential biological significance of 8995A>G, we searched for
conserved sequences in the ABCA1 3’UTR. No sequence conservation was evident among
the 3’UTRs of the ABCA1 mRNAs from human, mouse, rat and chicken.
This is not unexpected, since the 3’ ends of genes are generally more
heterogeneous among species than protein coding sequences (Makalowski et
al., 1996). The less conserved nature of 3’UTRs possibly confers flexibility in spatial and
temporal aspects of gene regulation in a manner specific to the organism (Conne et al.,
2000). Furthermore, a search against the 3’UTR database (Pesole et al., 2002)
revealed no other genes carrying sequence motifs similar to the ABCA1 3’UTR.
Allelic variants can create different structural folds in mRNA, leading to different
phenotypic consequences (Shen et al., 1999). The MFOLD program (Zuker, 2003)
was used to investigate whether the A and G allelic variants at 8995 would alter the
folding of the ABCA1 3’UTR. The RNA secondary structures predicted for the two
variants appear similar to one another (Figure 5.4).

Table 5.1 List of mRNAs and ESTs in the human ABCA1 UniGene cluster
Hs.211562 from a release in early 2000. “*” indicates ESTs with DNA
traces (31 in total) that are available from public FTP archives (e.g.
Washington University Genome Sequencing Centre) and which could be
automatically retrieved by SNPFINDER for SNP detection.
mRNA
NM_005502
AJ012376

EST        cDNA source               EST        cDNA source
AI802228*  Pancreas                  AL048638   Unknown
AA627178*  Thyroid                   AA434152*  Ovary
AI807534   Unknown                   AA302670   Adipose
AI356194*  Brain                     AL038231   Unknown
AI628099*  Kidney                    AL048434   Unknown
R01050*    Unknown                   AA618276*  Thyroid
R01051*    Unknown                   AA883989*  Unknown
AI359714*  Brain                     AA521292*  Tonsil
AA902925*  Smooth muscle             AA328447   Whole embryo
AA493786*  Thyroid                   N94914     CNS
AI399824*  Unknown                   AW364342   Denis drash
AA731742*  Tonsil                    AA748860*  Tonsil
AI819656   Lung                      AW044702   Pooled
R31961*    Placenta                  AW364344   Denis drash
AW051752   Kidney                    AW364424   Denis drash
AA814091*  Tonsil                    N36906*    Foreskin
AI695068*  Lung                      AI707785*  Aorta
AA625082*  Unknown                   AW364428   Denis drash
AI344681*  Kidney                    C01846     Unknown
AA292158*  Ovary                     AA367573   Placenta
AA669024*  Lung                      AW006879   Kidney
AW130712   Stomach                   N46182*    Foreskin
N63586*    Central nervous system    AW362709   Colon
AW190098   Pancreas                  AW380897   Head neck
AW019981   Ear                       AA442439*  Whole embryo
AA704305*  Unknown                   AA357618   Prostate
D79969     Aorta                     AL048433   Uterus
AA618309   Thyroid                   AW364331   Denis drash
AI241822*  Brain                     AA826281*  Tonsil
AW019972   Ear                       AA302777   Adipose

Figure 5.1 A high confidence candidate 8995A>G SNP identified from the ABCA1
UniGene cluster Hs.211562 (early 2000 release). Only ESTs with DNA traces were
analyzed with the SNPFINDER program (indicated by an asterisk in Table 5.1).
These sequences were basecalled by PHRED, assembled by PHRAP and candidate
variants identified from regions of overlaps by DEMIGLACE (Buetow et al., 1999). (A)
Position of the candidate SNP in the context of the multiple alignment. Two out of
nine ESTs harbour the G variant. Upper and lower case letters in the alignment
denote bases of high or low sequence quality respectively. (B) Representative DNA
traces with the corresponding PHRED Q values at the variant position. Q value is a
measure of the quality of each basecall and is related to the error probability p by
$Q = -10 \log_{10} p$ (Ewing and Green, 1998).
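
The relation between the PHRED Q value and the basecall error probability given in the caption above can be checked numerically with a short sketch. The function names are illustrative and are not part of PHRED or SNPFINDER.

```python
import math


def phred_to_error_probability(q: float) -> float:
    """Invert Q = -10 * log10(p) to recover the error probability p."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p: float) -> float:
    """Compute Q = -10 * log10(p) for an error probability 0 < p <= 1."""
    if not 0 < p <= 1:
        raise ValueError("error probability must lie in (0, 1]")
    return -10 * math.log10(p)


if __name__ == "__main__":
    for q in (10, 20, 30, 40):
        print(q, phred_to_error_probability(q))
```

On this scale every increase of 10 in Q corresponds to a tenfold drop in the probability that the basecall is wrong.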

Figure 5.2 SNPFINDER multiple alignment showing the low confidence candidate
variants identified from members of the ABCA1 UniGene cluster Hs.211562.
Candidate variants are highlighted in blue or green columns. (A) 9091G>A. (B) from
left to right, 10027T>G, 10029G>A and 10032A>T. (C) 9410A>G. Upper and lower
case letters in alignment denote high and low base quality calls respectively.
SNPFINDER scores ranged between 0.05 and 0.66. This analysis was conducted in
early 2000.



Figure 5.3 Experimental confirmation of 8995A>G, a novel SNP in the 3’UTR of
the ABCA1 gene. (A) SSCP analysis of a short 109 bp amplicon flanking the SNP
reveals three reproducible and distinctive band migration patterns. (B) DNA traces
showing the three representative genotypes.

Figure 5.4A Lowest energy RNA structure predicted by MFOLD (Zuker, 2003)
for the 8995A allele. Due to size limits imposed by the program, only the
3’UTR (nucleotides 144716-148034 in Genbank entry AF275948) was folded.
The position of the 8995A allele is indicated by an arrow.

Figure 5.4B Lowest energy RNA structure predicted by MFOLD (Zuker, 2003)
for the 8995G variant.

5.2.2 Second SNPFINDER Analysis of ABCA1 UniGene ESTs (Late 2002)

To gauge the efficiency of the initial SNPFINDER analysis as well as to potentially
uncover more ABCA1 SNPs, a second analysis was conducted on a later release of the
ABCA1 UniGene cluster (build 156, release 28 Sep 2002). By this time, the number of
sequences in the UniGene cluster had increased from 62 to 87 (Table 5.2). We also noted
an increase in the number of ESTs derived from tumour sources, attributable to
initiatives to catalogue genes expressed in cancers (Strausberg et al., 2003).
Seven high confidence candidate variants with scores of at least 0.96 were
identified (Figure 5.5). These include 8995A>G which had been confirmed earlier.
Singletons were recorded at positions 8375 (Figure 5.5B) and 8517 (Figure 5.5C)
whereas multiple occurrences of the alternative variants were recorded at positions 8705
(four sequences with G vs 11 sequences with T, low quality sequences disregarded,
Figure 5.5D) and 8720 (three sequences with G vs 12 sequences with T, Figure 5.5D).
However, we also noted that these high scoring candidate variants resided exclusively in
ESTs obtained from a single cDNA library, NIH MGC69 (sequences 601490437FL,
601491738FL and 601492503FL in Figure 5.5 correspond to ESTs BE880894,
BE879545 and BE878485 respectively in Table 5.2), which originated from an
undifferentiated large cell carcinoma (library information obtained from the IMAGE
consortium). None of these putative variants has been confirmed in
dbSNP (build 123). It is plausible that they were acquired during propagation of the tissue
in culture or represent mutations with a specific role in tumour development. Two
candidate variants with high scores of 0.96, 8539C>T (Figure 5.5C) and 9410A>G
(Figure 5.5F), were found in two ESTs, one of which was derived from a large cell
carcinoma. In the first SNPFINDER analysis conducted in early 2000, we also detected
9410A>G (see Figure 5.2C) with a score of 0.23. But because low scoring variants have
a lower confirmation rate (Cox et al., 2001), we had not considered validating the
9410A>G variant then. Both 8539C>T (rs4149339) and 9410A>G (rs4149341) were first
verified in the Japanese population (Iida et al., 2001), and multiple dbSNP (build 123)
submissions have also been noted to date.
The remaining five candidate variants from the second SNPFINDER analysis
registered lower scores ranging between 0.02 and 0.79. Among these was 10029G>A
(Figure 5.5H) which was previously encountered in the first SNPFINDER analysis (Figure
5.2) whereas the other four candidate variants at positions 8570 (Figure 5.5C), 8673
(Figure 5.5D), 9097 (Figure 5.5E) and 9696 (Figure 5.5G) represented novel predictions
from the second analysis. None of the low scoring putative variants has been
documented in dbSNP (build 123). Two variants at positions 10027 and 10032 documented
from the first SNPFINDER analysis (Figure 5.2) were not found in the second analysis
because the original EST sequences in which these candidate sequence variants were
initially identified had been withdrawn from the UniGene set.
Table 5.3 summarizes the comparison of the two SNPFINDER analyses
conducted separately in 2000 and 2002. We combined information from dbSNP (build
123) as well as published reports (Iida et al., 2001) to verify the number of in silico
variants that are likely to be true positives or negatives. With more sequences in the
alignment, the sensitivity of the in silico mining was higher, since no false
negatives were encountered in the second SNPFINDER analysis. On the other hand,
using a cutoff score of 0.96, two false negatives from the first SNPFINDER analysis were
documented: 9410A>G, which showed a low score of 0.23 (Figure 5.2C), and 8539C>T,
which was missed completely due to a lack of ESTs in this region. Although the expanded
set of UniGene sequences detected more true variants, it also generated more false
positives, probably owing largely to the source of the ESTs in which these
variants were found.
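
The trade-off described above can be made concrete with a small sketch that computes the confirmation rate (validated variants divided by predicted variants) at the 0.96 cutoff. The counts are taken from Table 5.3; the function and variable names are illustrative only.

```python
def confirmation_rate(validated: int, predicted: int) -> float:
    """Fraction of predicted variants that were subsequently validated."""
    if predicted <= 0:
        raise ValueError("at least one predicted variant is required")
    return validated / predicted


# (validated, predicted) counts at score >= 0.96, as listed in Table 5.3
ANALYSES = {
    "SNPFINDER I": (1, 1),
    "SNPFINDER II": (3, 7),
}

if __name__ == "__main__":
    for name, (validated, predicted) in ANALYSES.items():
        rate = confirmation_rate(validated, predicted)
        false_positives = predicted - validated
        print(f"{name}: confirmation rate {rate:.2f}, "
              f"unconfirmed predictions {false_positives}")
```

On these counts the second analysis recovers more validated variants (three versus one) but leaves four high scoring predictions unconfirmed.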

Table 5.2 List of mRNAs and ESTs in the human ABCA1 UniGene cluster Hs.211562
from a later release (release 156, date accessed 28 Sep 2002).

mRNA
AJ012376    AK024328
NM_005502   AB055982
AF165281    AB037924
AF285167    AF258627

EST       cDNA source                         EST       cDNA source             EST       cDNA source
AI802228  Adenocarcinoma                      BG573350  Placenta                BE222116  Carcinoid
AA627178  Thyroid                             BG567595  Liver                   BE177793  Head neck
AI807534  Pooled                              BG567118  Liver                   AW889550  Nervous tumour
AI356194  Glioblastoma (pooled)               N46182    Melanocyte              AW845151  Colon
AI628099  Kidney                              BG482804  Lung                    AW380897  Head neck
R01050    Liver and spleen                    BU198380  -                       AW372918  Breast
R01051    Liver and spleen                    BF574391  Muscle (skeletal)       AW364428  Denis drash
AA527406  Colon                               AA826281  Germinal center B       AW364424  Denis drash
AI359714  Glioblastoma (pooled)               AI241822  Tumour, 5 pooled        AW364344  Denis drash
AA902925  Leiomyosarcoma                      AA748860  Germinal center B       AW364342  Denis drash
AA493786  Thyroid                             AA704305  Liver and spleen        AW364331  Denis drash
AI399824  Pooled human melanocyte,            AI707785  Aorta                   AW362709  Colon
          fetal heart, uterus
AA731742  Germinal center B cell              AL048638  Brain                   AW190098  Adenocarcinoma
AI819656  Squamous cell carcinoma             AA669024  Lung                    AW130712  Adenocarcinoma
R31961    Placenta                            AL048433  Uterus                  Z44377    Total brain
AW051752  Kidney                              AA618309  Thyroid                 AW019981  Cochlea
AA814091  Germinal center B cell              AA618276  Thyroid                 AW044702  Pooled
AI695068  Carcinoid                           AL048434  Uterus                  BE929960  Placenta normal
AA883989  Pooled                              AA521292  Germinal center B       BG400012  Kidney
AA625082  Pooled human melanocyte,            AW006879  Kidney                  BG436050  Placenta
          fetal heart, uterus
AI344681  2 pooled tumours (clear cell type)  AA442439  -                       N36906    Melanocyte
AA292158  Ovarian tumour                      AA434152  Ovarian tumour          BG149600  Carcinoid
BM768930  Stomach                             AA367573  Placenta                BF879888  Lung tumour
BM769397  Stomach                             AA357618  Prostate                BF892148  Marrow
BM823180  Stomach                             AA328447                          BF886004  Testis normal
BM830709  Stomach                             AA302670  Adipose tissue, white   BF951740  Nervous normal
BM978608  Primary lung epithelial cells       AA302777  Adipose tissue, white   BF988872  Placenta normal
AL698654  Human skeletal muscle               C01846    -                       AW019972  Cochlea
AL701341  Human skeletal muscle               BU198400  -                       BF439764  Carcinoid
BQ025022  Placenta                            BQ940486  Sciatic nerve           BF433708  Carcinoid
BQ026286  Placenta                            D79969    Aorta                   BF431704  Pooled
BF928185  Nervous tumour                      BE857175  Glioblastoma            AU156154  Placenta
BM153383  Leukopheresis                       BE880894  Large cell carcinoma    AW601575  Breast
N94914    Multiple sclerosis lesions          BE879545  Large cell carcinoma    BF855659  Prostate normal
BM728651  Ocular tissues                      BE878485  Large cell carcinoma    BF671104  Muscle (skeletal)
BI754756  Brain                               BE816862  Breast normal           AA737119  Ewing's sarcoma
BI494651  Cochlea                             BE715104  Head neck               BF379205  Uterus tumour
BI494650  Cochlea                             AV661400  Liver tissue            BF348792  Denis drash
BI063291  Uterus tumour                       AV656040  Liver tissue            BF216316  Glioblastoma
N63586    Multiple sclerosis lesions          AV647223  Liver tissue            AU135588  Placenta
BG678861  Squamous cell carcinoma             BF094524  Uterus tumour           BF116114  Fibrotheoma
BE971402  Muscle (skeletal)                   AU121731  Mammary gland

Figure 5.5 Twelve putative SNPs predicted by SNPFINDER conducted on an
expanded and later release of ABCA1 UniGene cluster (build 156, release
Sep 2002). Sequence variants are located in blue or green columns. (A)
8995A>G, score 0.99. (B) 8375C>T, score 0.96. (C) From left to right:
8517G>C, 8539C>T and 8570T>A with scores of 0.96, 0.96 and 0.54,
respectively. Numbering is with respect to ABCA1 mRNA reference sequence
NM_005502.

Figure 5.5 Continued from previous page. (D) from left to right: 8673T>G,
8705T>G and 8720T>G with scores of 0.16, 0.99 and 0.99 respectively.
(E) 9097G>A, 0.79. (F) 9410A>G, 0.99.

Figure 5.5 Continued from previous page. (G) 9696C>T, 0.02. (H)
10029G>A, 0.38.

Table 5.3. Summary of two SNPFINDER analyses conducted separately in
2000 and 2002.

                                                          SNPFINDER I   SNPFINDER II
Number of sequences in ABCA1 UniGene cluster              62            87
Total number of predicted variants identified in silico   6             12
Number of predicted variants with score ≥0.96             1             7
Number of validated (d) variants with score ≥0.96         1 (a)         3 (b)
Number of validated (d) variants with score <0.96         1 (c)         0

(a) 8995A>G (rs363717).
(b) 8539C>T (rs4149339), 8995A>G and 9410A>G (rs4149341).
(c) 9410A>G.
(d) Confirmation from laboratory, publications or with multiple submissions to dbSNP (build 123).



5.2.3 SNP Survey in the ABCA1 Proximal Promoter
Resequencing of a 600 bp segment of the ABCA1 proximal promoter revealed seven
sequence variants: -14C>T, -99G>C, -278C>G, -302C>T, -407C>G, -463C>T and
-564T>C. Representative DNA traces for all of these SNPs except -14C>T are
shown in Figure 5.6. Because only a limited number of individuals (n=16 per ethnicity)
were sequenced, genotype and allele frequencies were not determined, as they are
unlikely to be representative of the true population frequencies. All variants except
-463C>T were detected in multiple individuals in each local population sample. -463C>T
appeared as a heterozygous variant in one Indian individual and thus could represent a
rare or population-specific variant, or a PCR-induced mutation. Sequencing a second,
freshly amplified fragment confirmed that the singleton variant was genuine.
-14C>T, -99G>C, -278C>G, -302C>T, -407C>G and -564T>C have been
documented in numerous ABCA1 promoter surveys involving Caucasian (Pullinger et al.,
2000; Zwarts et al., 2002; Probst et al., 2004; Tregouet et al., 2004) and Japanese
individuals (Iida et al., 2002; Shioji et al., 2004; Yamakawa-Kobayashi et al., 2004).
To assess the potential biological significance of the ABCA1 promoter SNPs, we
determined whether they affect known or putative consensus transcription factor binding
sites. Previous work has identified a cholesterol-responsive element in the ABCA1
promoter, the DR4 element. It consists of two half sites of an imperfect direct repeat of
TGACCT separated by 4 bp (TGACCGatagTAACCT) and is located in the region -70 to
-55 bp. The DR4 element is critical for oxysterol activation of the ABCA1 gene by the
nuclear receptor heterodimers LXR-RXR (Costet et al., 2000; Schwartz et al., 2000).
Other experimentally mapped elements include the E-box centered at position -147
(Yang et al., 2002) and the GnT repeats (recognition motif for the zinc finger transcription
factor, Znf202) between -229 and -210 (Porsch-Ozcurumez et al., 2001) which repress
ABCA1 gene transcription. None of the seven promoter SNPs identified in this study
disrupts any of these experimentally verified transcriptional regulatory elements.
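
The overlap check described above can be sketched as a simple interval test. The DR4 and GnT coordinates and the SNP positions are those quoted in this section; the E-box window is an assumption (a 6 bp span around the reported centre at -147), and all names in the code are illustrative.

```python
# Promoter elements as (start, end) coordinates relative to the
# transcriptional start site, inclusive.
ELEMENTS = {
    "DR4": (-70, -55),
    "GnT repeats": (-229, -210),
    "E-box": (-150, -145),  # ASSUMPTION: 6 bp window around position -147
}

# Positions of the seven promoter SNPs reported in this survey
SNP_POSITIONS = [-14, -99, -278, -302, -407, -463, -564]


def overlapping_elements(position: int, elements=ELEMENTS) -> list:
    """Return the names of all elements whose span contains `position`."""
    return [name for name, (start, end) in elements.items()
            if start <= position <= end]


# The DR4 motif: two 6 bp half sites separated by a 4 bp spacer
DR4 = "TGACCGatagTAACCT"
half_site_1, spacer, half_site_2 = DR4[:6], DR4[6:10], DR4[10:]

if __name__ == "__main__":
    for pos in SNP_POSITIONS:
        print(pos, overlapping_elements(pos) or "no mapped element")
    print(half_site_1, spacer, half_site_2)
```

The 16 bp DR4 motif length matches the span from -70 to -55, and none of the seven SNP positions falls inside any of the three windows.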
Putative transcription factor binding sites in the ABCA1 proximal promoter were
identified by searching against the TRANSFAC database (Wingender et al., 1996) using
MATCH (Kel et al., 2003). -407C>G lies in a segment that matches the consensus
binding sites for hepatic nuclear factor 4 (HNF4) and c-REL with core and matrix similarity
scores of 0.88 and 0.78, and 1.00 and 0.87 respectively (Figure 5.7). HNF4, a liver-
enriched transcription factor, controls a variety of genes involved in lipid and glucose
metabolism such as the apolipoproteins AI, AII, B and CIII (Sladek and Seidel, 2001) and
mutations of the HNF4α isoform underlie human diseases such as maturity-onset
diabetes of the young (Yamagata et al., 1996) and non-insulin dependent diabetes
mellitus (Nakajima et al., 1996). c-REL is a member of the NF-κB family of transcription
factors and contains a potent transactivation domain (Chen and Green, 2004). The
presence of this putative NF-κB recognition element in the ABCA1 proximal promoter is
consistent with the role of inflammation in atherosclerosis.
To determine whether the proximal promoter SNPs affected evolutionarily
conserved bases, we performed a comparative analysis of the promoter segments of
ABCA1 orthologues. Figure 5.8 shows the multiple alignment of human, chimpanzee,
dog, mouse and rat ABCA1 promoters. The human variant -99C>G targets an extremely
conserved base although no putative (Figure 5.7) or known transcription factor binding
site is located here. In contrast, the sites corresponding to the human -14C>T, -463C>T and
-564T>C variants have apparently diverged over evolutionary time; for instance, three
different nucleotides are seen across species at -14C>T and -564T>C. The
regions flanking the human variable sites -278G>C, -302C>T, -463C>T and -564T>C are
less evolutionarily conserved than those flanking -14C>T, -99C>G and -407C>G. The highly
conserved segment containing the -14C>T and -99C>G variants is attributed to the
presence of the DR4, Ebox, GnT and core promoter elements (Costet et al., 2000;
Schwartz et al., 2000; Porsch-Ozcurumez et al., 2001; Langmann et al., 2002; Yang et
al., 2002). We had earlier noted that -407C>G resides in a putative transcription factor
binding site (Figure 5.7).

Figure 5.6 ABCA1 proximal promoter SNPs identified by direct sequencing of PCR
products. Except for -407C>G, the common homozygote and heterozygote forms of the
variants are shown. -14C>T is not shown here.

< HNF3-betaB(0.91)

< FOXD3(0.91)
-564T>C < CAAT box(0.93)
acaaaagcagcccattacccagaggactgtcCgccttcccctcaccccagcctaggcctttgaaaggaaacaaaagacaagacaaaatgattggcgtcctgagggagattcagcctagag -487


>C/EBP_C(0.91) < c-REL_01(0.87)
< FOXD3(0.91) >Oct1(0.82)
< FOXD3(0.89)
< HNF3-beta(0.90)
< FOXD3(0.92)
>HNF4(0.80)

ctctctctccccCaatccctccctccggctgaggaaactaacaaaggaaaaaaaaattgcggaaagcaCgatttagaggaagcaaattccactggtgcccttggctgccgggaacgtgga -357


< Elk1(0.92) < E47(0.98)

ctagagagtctgcggcgcagccccgagcccagcgcttcccgcgcgtcttaggcCggcgggcccgggcgggggaagggGacgcagaccgcggaccctaagacacctgctgtaccctccacc -237


>NF-E2(0.90)
< ARNT(0.98)
>RREB1(0.91) < HFH3(0.96)
>RREB1(0.90) < HFH8(0.99)
< c-MycMax(0.96)
>c-MycMax(0.94)
>USF (0.93) FREAC7(0.98)
>USF(1.00) FOXJ2(0.96)
< HNF4(0.81)
< USF(0.99)

>AP1(0.98)
GnT/Znf202 E-box
cccaccccaccccacccacctccccccaactccctagatgtgtcgtgggcggctgaacgtcgcccgtttaaggggcgggccccggctccacgtgctttctgctgagtgactgaactacat -117


< HNF4(0.83) < CDPCR1(0.85)
>CDPCR1(0.87) >TATA(1.00)
HFH3(0.96)
HFH8(0.99)
>FREAC7(0.98)
>FOXJ2(0.96)
DR4/LXR +1
aaacagaggccgggaaGggggcggggaggagggagagcacaggctttgaccgatagtaacctctgcgctcggtgcagccgaatctataaaaggaactagtcCcggcaaaaaccccG 1




Figure 5.7 Putative transcription factor binding sites in the ABCA1 gene proximal promoter identified using MATCH (Kel et
al., 2003). Experimentally mapped regulatory motifs are underlined. SNPs are highlighted. Only motifs for human
transcription factors are depicted. The matrix similarity scores are indicated in brackets. +1 refers to the transcriptional start
site (Santamarina-Fojo et al., 2000).
-564T>C
-463C>T -407C>G
-302C>T -278C>G
-99G>C -14C>T





human CAAAGTCCAGGTTTGTGGGGGGAAAACAAAAGCAGCCCATTACCCAGAGGACTGTCCGCC 120
chimp CAAAGTCCAGGTTTGTGGGGGGAAAACAAAAGCAGCCCATTACCCAGAGGACTGTCCGCC 120
dog
mouse GGCTG-AGCAAACTAACAAAAGGAG AGGGGGGAGAGTGGG 39
rat CCAAGCTGGGGGCTG-AGTAAACTAACAAAAGGAG AGAGGG-AGAGAGGG 98


human TTCCCCTCACCCCAGCCTAGGCCTTTGAAAGGAAACAAAAGACAAGACAAAATGATTGGC 180
chimp TTCCCCTCACCCCAGCCTAGGCCTTTGAAAGGAAACAAAAGACAAGACAAAATGATTGGC 180
dog
mouse AGTA AGGGAGAGCG GGAGGGAGAGAGGAAG AGGGC ATACAC 80
rat AGTACGGGGGCGGGGGGGGGGAG GGAGAGAAAGAGGAAGGTAGGGAGAAAAAAACAC 155


human GTCCTGAGGGAGATTCAGCCTAGAGCTCTCTCTCCCCCAATCCCTCCCTCCGGCTGAGGA 240
chimp GTCCTGAGGGAGATTCAGCCTCGAGCTCTCTCTCCCCCAATCCCTCCCTCCGGCTGAGGA 240
dog
mouse ACACAAA-CAAAA CAAA-CAAA ACTCAAAAAGC AA 113
rat ACACAAAACAAAA CAAAACAAAACACACAAACTCAAAATTC AA 198


human AACTAACAAAGGAAAAAAAAATTGCGGAAAGCAGGATTTAGAGGAAGCAAATTCCACTGG 300
chimp AACTAACAAAGGAAAAAAAAATTGCGGAAAGCACGATTTAGAGGAAGCAAATTCCACTGG 300
dog
mouse CACCCACAAAACCCCAAACAATTGCAGAAAGAGGAGTTTAGAGAACGAGCTTTTCCCCTT 173
rat CACCCACAAACCCCCAAACAATTGCAGAAAGAAGAGTTTAGAGGAAGCGTTTTTCCCTTT 258

human TGCCCTTGGCTGCCGGGAACGTGGACTAGAGAGTCTGCGGCGCAGCCCCGAGCCCAGCGC 360
chimp TGCCCTTGGCTGCCGGGAACGTGGACTAGAGAGTCTGCGGCGCAGCCCCGAGCTCAGCGC 360

dog
mouse TCCTCCT CTGCCGGGAATGTGGA GTCCCTGGCTCAGCGC-AAGTCCGGAGT 223
rat CCCACTT CTGCCGGGAATGCGGA GTCCCTGGCTCAGCTC-AAGTCCGGAGT 308


human TTCCCGCGCGTCTTAGGCCGGCGGGCCCGGGCGGGGGAAGGGGACGCAGACCGCGGACCC 420
chimp TTCCCGTGCGTCTTAGGCCGGCGGGCCCGGGCGGGGGAAGGGGACGCAGACCGCGGACCG 420
dog
mouse TTCCCGTTTCCCGAAGGCTAGCAGGTCAGGGCCAGGGC TACAGAAAGCGGGCCC 277
rat TTCCCGTTACCCGAAG-CTAGCAGGTCAGGGCCGGGGC TAC-GAAAGCGGACCC 360

human TAAGACACCTGCTGTACCCTCCACCCCCACCCCACCC ACCTCCCCCCAACTCCCT 475
chimp TAAGACACCTGCTGTACCCTCCACCCCCACCCCACCCCACCCACCTCCCCCCAACCCCCT 480
dog
mouse CACAAAGCTCTC ACCATGCGCCCCCAGTGC CCGCTGC 314
rat CACGAAGCTTGC ACCATCCTCCCCCAGTGT CAGCCGC 397

human AGATGTGTCGTGGGCGGCTGAACGTCGCCCGTTTAAGGGGCGGGCCCCGGCTCCACGTGC 535
chimp AGATGTGTCGTGGGCGGCTGAACGTCGCCCGTTTAAGGGGCGGGCCCCGGCTCCACGTGC 540
dog ACGTCGCCCGTTTAAGGGGCGGGGCGCGGCTCCACGTGC 39
mouse GGCGGCGCACTGTCGCCGGTTTAAGGGGCGGGCCATGTCTCCACGTGC 362
rat GGCG CACTGTCGCCGGTTTAAGGGGCGGGCCATAGCTCCACGTGC 442
****** ************** * **********

human TTTCTGCTGAGTGACTGAACTACATAAACAGAGGCCGGGAACGGGGCGGGGAGGA GGG 593
chimp TTTCTGCTGAGTGACTGAACTACATAAACAGAGGCCGGGAAGGGGGCGGGGAGGA GGG 598
dog TTTCTGCTGAGTGACTGAACTACATAAACAGAGGCCGGGGAGGGGGCGGGGAGGA GGG 97
mouse TTTCTGCTGAGTGACTGAACTACATAAACAGAGGCCGGGAAGGGGGCGGGGGAAAGAGGG 422
rat TTTCTGCTGAGTGACTGAACTACATAAACAGAGGCCGGGAAGGAGGCGGGGAAAA-AGGG 501
*************************************** * * ******* * ***


-564T>C
-463C>T
-407C>G
-302C>T -278G>C
-99C>G
Figure 5.8 Sequence variants in the human ABCA1 gene proximal promoter
shown in relation to the chimpanzee, dog, mouse and rat sequences. The
sequences were aligned using ClustalW. Human sequence variants are
highlighted. “*” denotes a base that is conserved across all five species.
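
The conservation line under each alignment block can be reproduced with a few lines of code. This is a minimal sketch of the marking rule stated in the caption (an asterisk where all species share the same base), not ClustalW itself; the example uses the first ten columns of the human, chimpanzee, dog, mouse and rat rows from the continuation of Figure 5.8, and the function name is illustrative.

```python
from typing import Sequence


def conservation_line(aligned: Sequence[str]) -> str:
    """Return '*' for columns where every sequence has the same base
    (gaps never count as conserved) and ' ' otherwise."""
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must have equal length")
    return "".join(
        "*" if len(set(column)) == 1 and column[0] != "-" else " "
        for column in zip(*aligned)
    )


if __name__ == "__main__":
    block = [
        "AGAGCACAGG",  # human
        "AGAGCACAGG",  # chimp
        "AGCGAGCGGG",  # dog
        "AGAGAACAGC",  # mouse
        "AGAGAACAGC",  # rat
    ]
    for row in block:
        print(row)
    print(conservation_line(block))
```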



human AGAGCACAGGCTTTGACCGATAGTAACCTCTGCGCTCGGTGCAGCCGAATCTATAAAAGG 653
chimp AGAGCACAGGCTTTGACCGATAGTAACCTCTGCGCTCGGTGCAGCCGAATCTATAAAAGG 658
dog AGCGAGCGGGCTTTGACCGGTAGTAACCCCGGCGCCCGGCGCAGCCGAATCTATAAAAGG 157
mouse AGAGAACAGCGTTTGACCGGTAGTAACCCCGGCGCTCGGCACAGCCGAATCTATAAAAGG 482
rat AGAGAACAGCGTTTGACCGGTAGTAACCCCGGCGCTCGGCACAGCCGAATCTATAAAAGG 561
** * * * ******** ******** * **** *** *******************




human AACTAGTCCCGGCAAAAACCCCGTAATTGCGAGCGAGAGTGAGTGGGGCCGGGACCCGCA 713
chimp AACTAGTCTCGGCAAAAACCCCGTAATTGCGAGCGAGAGTGAGTGGGGCCGGGACCCGCA 718
dog AACTAGTCGCGGCGAAACCCC-G 179
mouse AACTAGTCGCGGCAAAAACCA-GTAATTCCGAGGGCGAGCGAGCGGG-CCGGGACCGGCA 540
rat AACTAGTCGCGGCAAAAACCA-GTAATTCCGAGAGCGAGCGAGCGGG-CCGGGACCGGCA 619
******** **** *** ** *


human GAGCCGAGCCGACCCTTCTCTCCCGGGCTGCGGCAGGGCAGGGCGGGGAGCTCCGCGCAC 773
chimp GAGCCGAGCCGACCCTTCTCTCCCGGGCTGCGGCAGGGCAGGGCGGGGAGCTCCGCGCAC 778
dog
mouse GAGCCCACTTCTCTCC 556
rat GAGCC 624




-14C>T
Figure 5.8 Continued from previous page. Sequence variants in the human
ABCA1 gene proximal promoter shown in relation to the chimpanzee, dog,
mouse and rat sequences.


5.2.4 DHPLC Screening of ABCA1 Exons, Intron-Exon Junctions and Distal
Promoter
The large ABCA1 gene comprises 50 exons (Santamarina-Fojo et al., 2000); hence a SNP
discovery strategy based entirely on resequencing would be time-consuming and costly.
Therefore, for efficient SNP discovery in the ABCA1 exons, we used a screening
technique, DHPLC, which is based on heteroduplex detection of sequence variants under
partially denaturing conditions. Primers were designed to amplify most of the 50 exons
individually including the entire 5’ UTR and the protein coding portion of exon 50. In
addition, a fragment containing the newly identified exon1A and distal promoter (Cavelier
et al., 2001) was also analyzed. Collectively, 19,273 bp including 7,162 bp of exonic
sequences over 49 fragments were subjected to DHPLC analysis. Sixteen DNA samples
were screened per fragment in each ethnic group. This sample size is estimated to have
>99% power to detect SNPs with a minor allele frequency of at least 10% (Kruglyak and
Nickerson, 2001). Samples displaying differential DHPLC profiles were re-amplified from
genomic DNA and sequenced in order to identify the nature and location of the sequence
variant.
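
A rough feel for the detection power quoted above can be obtained from a simplified sampling model. The sketch below assumes that each of the 2n chromosomes carries the minor allele independently with probability equal to the minor allele frequency, and that any sampled copy is detected; it ignores the sensitivity of DHPLC itself and is not the calculation of Kruglyak and Nickerson (2001). The function name is illustrative.

```python
def detection_power(maf: float, n_individuals: int) -> float:
    """Probability that at least one copy of the minor allele is present
    among 2 * n_individuals independently sampled chromosomes."""
    if not 0 <= maf <= 1:
        raise ValueError("minor allele frequency must lie in [0, 1]")
    chromosomes = 2 * n_individuals
    return 1.0 - (1.0 - maf) ** chromosomes


if __name__ == "__main__":
    # 16 individuals per ethnic group, 48 across the three groups
    for n in (16, 48):
        print(n, round(detection_power(0.10, n), 4))
```

Under this simplified model, 16 individuals give a power of about 0.97 for a 10% minor allele frequency, while the pooled 48 individuals exceed 0.99; the published estimate cited above may rest on different assumptions.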
Six cSNPs, comprising five missense SNPs and one silent SNP, as well as two
5’UTR SNPs were identified. Figure 5.9 shows the DHPLC elution profiles of
representative samples harbouring putative variants. The 5’UTR SNPs, 237indelG and
296C>G, were initially identified from the same PCR fragment during DHPLC analysis.
None of these exonic SNPs are considered novel as they had been documented prior to
or during the course of the DHPLC analysis (Pullinger et al., 2000; Wang et al., 2000;
Clee et al., 2001). All exonic SNPs were detected in at least one representative DNA from
each of the local ethnic groups.
We predicted the potential functional significance of the missense cSNPs using
various criteria. Based on Grantham and BLOSUM62 scores, all the missense SNPs,
