REVIEW ARTICLE
Modified PCR methods for 3′ end amplification from serial
analysis of gene expression (SAGE) tags
Wang-Jie Xu1, Zhao-Xia Wang1 and Zhong-Dong Qiao1,2
1 College of Life Science and Technology, Bio-X Research Center, Key Laboratory of Developmental Genetics and Neuropsychiatric
Diseases (Ministry of Education), Shanghai Jiao Tong University, China
2 Shanghai Institute of Medical Genetics, Shanghai Jiao Tong University, China
Keywords
3′ longer fragment cDNA; generation of longer cDNA fragments from SAGE tags for gene identification (GLGI); high-throughput; methods; mRNA; rapid RT-PCR analysis of unknown SAGE tags (RAST-PCR); reverse SAGE (rSAGE); serial analysis of gene expression (SAGE); tag; two-step analysis of unknown SAGE tags (TSAT-PCR)
Correspondence
Z. Qiao, College of Life Science and Technology, Shanghai Jiao Tong University, 800 Dongchuan Road, 200240 Shanghai, China
Fax: +86 21 5474 7330
Tel: +86 21 3420 4925
E-mail:
(Received 11 January 2009, revised 11 February 2009, accepted 24 February 2009)
doi:10.1111/j.1742-4658.2009.06981.x
Serial analysis of gene expression (SAGE) is a powerful technique to study gene expression at the genome level. However, the shortness of SAGE tags hampers further study of SAGE library data, thus limiting extensive application of the SAGE method in gene expression studies. This problem can be solved by extension of the SAGE tags to 3′ cDNAs. Several PCR-based methods have therefore been developed to generate a longer 3′ cDNA fragment corresponding to a SAGE tag. The list of modified methods is extensive, and includes rapid RT-PCR analysis of unknown SAGE tags (RAST-PCR), generation of longer cDNA fragments from SAGE tags for gene identification (GLGI), a high-throughput GLGI procedure, reverse SAGE (rSAGE), two-step analysis of unknown SAGE tags (TSAT-PCR), etc. These procedures are constantly being updated because they have characteristics and advantages that can be shared. Development of these methods has promoted the widespread use of the SAGE technique, and has accelerated the speed of studies of large-scale gene expression.
Abbreviations
GLGI, generation of longer cDNA fragments from SAGE tags for gene identification; Ptag, tag-specific primer; RAST-PCR, rapid RT-PCR analysis of unknown SAGE tags; rSAGE, reverse SAGE; SAGE, serial analysis of gene expression; TSAT-PCR, two-step analysis of unknown SAGE tags.
FEBS Journal 276 (2009) 2657–2668 © 2009 The Authors Journal compilation © 2009 FEBS
Introduction
The challenge of the so-called ‘post-genomic’ era will be extracting biological information on a large scale from the available sequence data [1]. Such studies will include annotation from genes to genome [2] and large-scale gene expression screens [3]. A set of high-throughput techniques is required to complete the work. Ever since it was proposed by Velculescu et al. [4], the SAGE method has been used for transcription analysis in many species, with collection of a large number of 14-base SAGE tags [5–12]. By analyzing a short sequence tag representing a transcript, SAGE significantly decreases the overall scale of the sequencing analysis, and makes it possible to analyze nearly all of the expressed transcripts from the genome, a capability that is unmatched by any other currently available method [13]. Application of the SAGE technique has provided valuable information in various biological systems [14–17] and in transcriptome characterization and genome annotation [18–21].
Although SAGE is doubtless a useful and high-throughput technique for transcriptomics, it nevertheless has a disadvantage. The size of the SAGE tag, 14 bases, is frequently too short to unequivocally identify the gene of origin [22]. Numerous improvements to the original technology have been proposed [20,23–27], including the production of longer tags, which have improved the specificity of tag-to-gene mapping [7,24], and modifications designed to facilitate library construction from nanogram quantities of total RNA [23,28]. The improved techniques can extend the length of tags to 21 bases and 26 bases [1,22], but the tag size still causes problems during gene identification from SAGE tags. When entered as queries in a BLAST search, many SAGE tags [14,29] and LongSAGE tags do not match any known expressed sequences in databases [30,31]. Conversely, the same tag frequently matches two or more gene sequences, which confounds further analysis [32]. If SAGE is applied to organisms for which no DNA database is available, it is necessary to recover a longer DNA sequence adjacent to the tag by experiment, and to annotate this longer fragment by BLAST search; i.e. it is necessary to isolate a longer 3′ fragment or full-length cDNA from the SAGE tag for gene identification.
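The ambiguity of short tags can be made concrete by counting how many distinct sequences each tag length allows. The sketch below is only an upper-bound estimate under two assumptions of ours: every tag starts with the fixed 4-base CATG site (as in the tag-specific primer format CATG(X)10 given in the Appendix), and each remaining position can be any of the four bases. The function name `distinct_tags` is illustrative.

```python
ANCHOR = "CATG"  # fixed site assumed at the start of every tag (see Appendix)


def distinct_tags(tag_length: int, anchor: str = ANCHOR) -> int:
    """Upper bound on the number of distinct tags of a given total length.

    Assumes the first len(anchor) bases are fixed and every other position
    is free to be A, C, G or T (real transcriptomes are far less uniform).
    """
    variable_positions = tag_length - len(anchor)
    if variable_positions < 0:
        raise ValueError("tag is shorter than the anchoring site")
    return 4 ** variable_positions


if __name__ == "__main__":
    for length in (14, 21, 26):  # tag lengths mentioned in the text
        print(f"{length}-base tag: {distinct_tags(length):,} possible sequences")
```

Under these assumptions a 14-base tag has only 4^10 (about one million) possible sequences, which helps explain why one tag can match several genes, whereas the 21-base and 26-base tags have 4^17 and 4^22.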
The most difficult and key step in isolation of the full-length cDNA is generation of the longer 3′ cDNA fragment from the SAGE tag. Several strategies have therefore been developed: rapid RT-PCR analysis of unknown SAGE tags (RAST-PCR) [33], generation of longer cDNA fragments from SAGE tags for gene identification (GLGI) [34], a high-throughput GLGI procedure [32], two-step analysis of unknown SAGE tags (TSAT-PCR) [39], and modified reverse SAGE (rSAGE) [35]. These methods have differing characteristics and advantages.
Rapid RT-PCR analysis of unknown
SAGE tags
Rapid RT-PCR analysis of unknown SAGE tags (RAST-PCR) was developed in 1999 by van den Berg et al. [33]. In brief, an oligo(dT)24 primer with a 5′ M13 tail [5′-CTAGTTGTAAAACGACGGCCAG(T)24-3′] replaced the conventional oligo(dT) primer for synthesis of the first-strand cDNA that serves as the PCR template. For PCR, the sense primer was a tag-specific primer consisting of the 14-base tag preceded by 5′ inosine nucleotides, which increase the annealing temperature of the primer (5′-IIIII-CATG-tag sequence-3′). A 20-base primer (20bM13) corresponding to the 5′ tail of the oligo(dT) primer (5′-AGTTGTAAAACGACGGCCAG-3′) was used as the antisense primer.
However, using the M13 sequence connected to the oligo(dT) as the antisense primer for PCR generates multiple fragments of different sizes, or a smear, because the products include poly(dA)/(dT) sequences of various lengths [34]. The reason is that oligo(dT) primers anneal randomly along the poly(A) sequences of the mRNA templates during cDNA synthesis.
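The three RAST-PCR primers described above can be assembled from their parts as a quick consistency check. This is a minimal sketch, not the authors' software: the function names are ours, the example tag is hypothetical, and we assume the 14-base tag is CATG followed by 10 variable bases, with inosine written as the letter I.

```python
M13_TAIL = "CTAGTTGTAAAACGACGGCCAG"      # 5' tail of the cDNA synthesis primer
PRIMER_20BM13 = "AGTTGTAAAACGACGGCCAG"   # antisense PCR primer (20bM13)


def cdna_synthesis_primer(n_t: int = 24) -> str:
    """M13-tailed oligo(dT) primer used for first-strand cDNA synthesis."""
    return M13_TAIL + "T" * n_t


def tag_specific_primer(tag10: str, n_inosine: int = 5) -> str:
    """Sense primer 5'-IIIII-CATG-tag-3' for a 10-base variable tag part."""
    tag10 = tag10.upper()
    if len(tag10) != 10 or set(tag10) - set("ACGT"):
        raise ValueError("expected 10 bases of A, C, G or T")
    return "I" * n_inosine + "CATG" + tag10


if __name__ == "__main__":
    print(cdna_synthesis_primer())
    print(tag_specific_primer("ACGTTGCAGG"))  # hypothetical tag
    # The 20bM13 primer is the 3' portion of the M13 tail.
    print(M13_TAIL.endswith(PRIMER_20BM13))
```

The last check shows that 20bM13 is contained in the tail of the cDNA synthesis primer, which is why it can prime on every cDNA regardless of where the oligo(dT) part annealed.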
Generation of longer cDNA fragments
from SAGE tags for gene identification
(GLGI)
To avoid this disadvantage of the RAST-PCR method, Chen et al. proposed a new method called generation of longer cDNA fragments from SAGE tags for gene identification (GLGI) [34]. The improvement in GLGI is that an anchored oligo(dT) primer is used as the antisense primer in PCR, replacing the M13 sequence used in the RAST-PCR method. Figure 1 shows the general strategy. Briefly, in the first PCR cycle, sense primers containing the SAGE tags anneal to specific cDNA templates and extension proceeds from these primers. In the second cycle, extension occurs only from the anchored oligo(dT) primer that has annealed at the 5′ end of the poly(dA) sequence, where its anchor nucleotide pairs with the template; anchored primers annealed elsewhere along the poly(dA) sequence are not extended because their anchor nucleotide is not paired with the template. In the following cycles, only the cDNA templates containing the SAGE tag sequence undergo exponential amplification. Thus, only copies of the same size are generated. The GLGI method has been widely applied to exploit many kinds of SAGE data for characterization of the eukaryotic transcriptome [7,12,18,36,37], and has even been used to analyze data obtained by massively parallel signature sequencing [38].
As the antisense primer in GLGI is an oligo(dT), rigorous PCR conditions, including the use of Pfu DNA polymerase, the Mg2+ concentration, the number of PCR cycles and the annealing temperature, must be optimized for each SAGE tag [39]. In a typical SAGE project, hundreds or even thousands of SAGE tags need to be analyzed. Analyzing all SAGE tags in this way is therefore very labor-intensive, and the method does not lend itself to large-scale analyses.
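The difference between a plain and an anchored oligo(dT) primer can be illustrated with a toy base-pairing model. The sketch below is a simplification of ours, not the published procedure: it treats annealing as exact complementarity, requires the primer's 3′ base to be paired for extension, and uses a hypothetical sense-strand sequence. It counts the sites from which each primer could be extended.

```python
from typing import List, Optional

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def extendable_sites(sense: str, n_t: int, anchor: Optional[str] = None) -> List[int]:
    """Positions on the sense strand paired with the primer's 3' end.

    The primer is (T)n_t, optionally followed by one 3' anchor base.
    A site counts only if every primer base, including the 3' end, is paired.
    """
    run = "A" * n_t
    sites = []
    if anchor is None:
        # Plain oligo(dT): any window of n_t A's along the poly(A) will do.
        for p in range(len(sense) - n_t + 1):
            if sense[p:p + n_t] == run:
                sites.append(p)
    else:
        # Anchored primer: the T's pair with A's and the anchor must pair
        # with the base immediately upstream of that window.
        for p in range(len(sense) - n_t):
            if sense[p + 1:p + 1 + n_t] == run and COMPLEMENT[anchor] == sense[p]:
                sites.append(p)
    return sites


if __name__ == "__main__":
    body = "CATGACGTTGCAGGCTAGTC"      # hypothetical 3' end of a transcript
    sense = body + "A" * 30            # followed by a 30-base poly(A)
    print(len(extendable_sites(sense, 11)))       # plain (dT)11: many sites
    for base in "ACG":                            # the three anchored primers
        print(base, extendable_sites(sense, 11, anchor=base))
```

In this model the plain primer can start from many positions along the poly(A), giving products of many sizes, while only one of the three anchored primers can be extended, and only from the junction between the transcript body and the poly(A).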
High-throughput GLGI
The original GLGI technique was subsequently developed into a high-throughput procedure for simultaneous conversion of a large number of SAGE tags into the corresponding 3′ cDNAs. Figure 2 shows a schematic of the high-throughput GLGI procedure (see the protocol in Scheme 1 in the Appendix for more details). Compared with the original GLGI technique, the high-throughput GLGI procedure has several new features [32]: (a) 3′ cDNAs after the last CATG, rather than full-length cDNAs, are used as the templates for GLGI amplification, decreasing the complexity of the templates and thus reducing possible nonspecific amplification; (b) the 3′ cDNAs are enriched by PCR to provide sufficient templates for high-throughput GLGI analysis; (c) a single antisense primer (5′-ACTATCTAGAGCGGCCGCTT-3′) is used for the GLGI reaction, instead of the three anchored oligo(dT) primers (dT11A, dT11G, dT11C) used in the original GLGI technique; (d) platinum DNA polymerase rather than Pfu DNA polymerase is used for GLGI amplification; and (e) direct precipitation of GLGI-amplified products and cloning into vectors without gel purification prevents the potential loss of amplified products.
These changes enhance the capabilities of the revised GLGI procedure, and the modified process is highly specific, low-cost and highly efficient [32]. However, the high-throughput GLGI procedure still involves many steps, and is thus very time-consuming. Moreover, clones may contaminate each other through careless manipulation when all experiments are performed simultaneously in a 96-well plate.
Fig. 1. Schematic for GLGI. (A) In this process, first-strand cDNA synthesized using oligo(dT) is used for PCR. In the first cycle, the template with the SAGE tag-binding site is annealed to the sense primer and extended to the end of the template. In the second cycle, extension occurs only from the anchored oligo(dT) primer that has annealed and paired correctly at the start of the poly(dA) sequence. Exponential amplification occurs only for the template with the SAGE tag-binding site. (B) The result of GLGI is conversion of the 14 nucleotides of a SAGE tag into a 3′ cDNA fragment of a hundred bases. This figure is reproduced from [34].
Fig. 2. Schematic of the high-throughput GLGI procedure. This figure is reproduced from [32] with permission.
Fig. 3. Detailed mechanism of amplification of whole cDNAs and the TSAT-PCR technique. (A) Amplification of full-length cDNAs. (B) Procedure for the nested PCR reactions. This figure is reproduced from [39]. [Diagram labels: mRNA; modified oligo(dT); 5′-cap oligo; anneal first-strand primer to mRNA; first-strand cDNA synthesis; cDNA synthesis; PLF; PLR; cDNA library; tag-specific primer; UP-I; UP-II; the 1st PCR; the 2nd PCR.]
Fig. 4. Schematic of the rSAGE method. Some steps are the same as those in the SAGE protocol. This figure is reproduced from [35] with permission.
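The primers of the high-throughput GLGI procedure, listed in Scheme 1 of the Appendix, can be examined with a few lines of code. The sketch below expands the degenerate 5′ biotinylated, 3′ anchored oligo(dT) primer (tail-T16-VN), builds a tag-specific primer in the stated format (5′-GGATCC-CATG-X10-3′), and measures how much of the universal antisense primer matches the start of the cDNA synthesis primer. The function names and the example tag are ours, and the interpretation of the overlap is our reading of the listed sequences rather than a statement from [32].

```python
from itertools import product
from typing import List

CDNA_PRIMER_TAIL = "ATCTAGAGCGGCCGC"          # 5' tail of the anchored oligo(dT) primer
UNIVERSAL_ANTISENSE = "ACTATCTAGAGCGGCCGCTT"  # single GLGI antisense primer
BAMHI_SITE = "GGATCC"
IUPAC = {"V": "ACG", "N": "ACGT"}             # degenerate codes used in the Appendix


def anchored_primers(n_t: int = 16, anchor: str = "VN") -> List[str]:
    """Expand tail-(T)n-VN into every explicit primer sequence."""
    choices = [IUPAC[code] for code in anchor]
    return [CDNA_PRIMER_TAIL + "T" * n_t + "".join(c) for c in product(*choices)]


def shared_with_universal(cdna_primer: str) -> int:
    """Longest 3' part of the universal primer that starts the cDNA primer."""
    for k in range(len(UNIVERSAL_ANTISENSE), 0, -1):
        if cdna_primer.startswith(UNIVERSAL_ANTISENSE[-k:]):
            return k
    return 0


def glgi_tag_primer(tag10: str) -> str:
    """Tag-specific primer: BamHI site + CATG + 10 variable tag bases."""
    tag10 = tag10.upper()
    if len(tag10) != 10 or set(tag10) - set("ACGT"):
        raise ValueError("expected 10 bases of A, C, G or T")
    return BAMHI_SITE + "CATG" + tag10


if __name__ == "__main__":
    primers = anchored_primers()
    print(len(primers), "anchored primer variants")
    print(shared_with_universal(primers[0]), "bases shared with the universal primer")
    print(glgi_tag_primer("ACGTTGCAGG"))  # hypothetical tag
```

The universal primer shares its 3′ bases with the fixed tail plus the first two T residues of the cDNA synthesis primer, which is consistent with one antisense primer serving all templates.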
Two-step analysis of unknown SAGE
tags (TSAT-PCR)
Given the disadvantages of the above techniques, Xu et al. proposed two-step analysis of unknown SAGE tags (TSAT-PCR), based on nested PCR and RACE technology [39]. The TSAT-PCR process can be divided into two steps. First, full-length cDNAs are amplified by PCR using RACE technology [40–42]. Then the amplified cDNAs are used as templates for TSAT-PCR to obtain the 3′ cDNA fragments by two-step nested PCR.
As shown in Fig. 3, TSAT-PCR has the following key features. First, it uses a modified lock-docking oligo(dT) primer, with two degenerate nucleotide positions at the 3′ end, as a reverse primer to synthesize the first-strand cDNA; this eliminates the 3′ heterogeneity inherent in conventional oligo(dT) priming [14], thereby increasing the specificity of the subsequent PCR. Second, amplification of full-length cDNAs, especially low-abundance cDNAs, provides sufficient templates not only for the subsequent PCR but also for 5′-RACE, 3′-RACE, northern blotting, etc. Finally, the two-step nested PCR can easily produce tag-specific 3′-end cDNA fragments from the SAGE tags (see the protocol in Scheme 2 in the Appendix for more details).
The greatest advantage of TSAT-PCR is that it can easily obtain specific PCR products covering the sequences of SAGE tags, especially from low-abundance transcripts. However, like the other methods, it cannot completely avoid nonspecific amplification, because short SAGE tags lower the annealing temperature of the PCR.
Modified reverse SAGE (rSAGE)
The rSAGE method was first developed in the Kinzler/Vogelstein laboratories [43,44], using the SAGE protocol as a reference, as a PCR-based method to isolate cDNA fragments corresponding to novel SAGE tags that do not match EST databases. Subsequently, Richards et al. amended the method and proposed a modified reverse SAGE [35]. Many steps of rSAGE are similar, or even identical, to those of SAGE, and many reagents are shared by the two protocols. Figure 4 shows the detailed process (see the protocol in Scheme 3 in the Appendix for more details).
As shown in Fig. 4, steps 1–5 are shared with the SAGE protocol, while the two PCRs that follow are the essence of rSAGE. The first PCR, using primers 2/M13F, enriches the 3′-end cDNA fragments that serve as templates for the second PCR. Specific products are then amplified using primers Ptag/M13F. Primer Ptag has the 14-base tag sequence plus an additional 7–10 bases at the 5′ end of linker 2A, resulting in a primer with a high melting temperature (> 60 °C). Unlike the methods described above, this primer (Ptag) eliminates most nonspecific fragments resulting from use of a short SAGE tag. Nevertheless, the rSAGE method also has its deficiencies: step 4 (linker ligation) of the rSAGE protocol cannot avoid self-ligation of the cDNA, which leads to smearing in the subsequent PCR amplification. In addition, the rSAGE method requires more initial total RNA and poly(A)+ RNA, because RNA is lost at each step [36].
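The effect of lengthening a tag-specific primer on its melting temperature can be approximated with the Wallace rule (2 °C per A or T plus 4 °C per G or C), a rule of thumb intended for short oligonucleotides. In the sketch below, both the tag and the 8-base 5′ extension are hypothetical placeholders (the linker 2A sequence is not given here), so the numbers illustrate the trend only and are not the Tm values of the rSAGE primers.

```python
def wallace_tm(primer: str) -> int:
    """Rule-of-thumb melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    primer = primer.upper()
    if set(primer) - set("ACGT"):
        raise ValueError("expected only A, C, G or T")
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


if __name__ == "__main__":
    tag = "CATGACGTTGCAGG"   # hypothetical 14-base SAGE tag
    extension = "GGATCCGA"   # hypothetical 5' extension, not the real linker bases
    print("tag alone:        ", wallace_tm(tag))
    print("tag plus extension:", wallace_tm(extension + tag))
```

With these placeholder sequences the estimate rises from below 60 °C for the tag alone to above 60 °C for the extended primer, in line with the reasoning that a longer Ptag tolerates a more stringent annealing temperature.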
Conclusions
In summary, despite many shortcomings, the methods discussed above have various advantages. As a result of their development, longer 3′ or 5′ cDNA fragments can be extended from SAGE tags. Thus, more cDNA sequence information about novel SAGE tags, e.g. transcriptional start sites [45–47], polyadenylation sites [20], homology analysis, etc., may be obtained, and full-length gene sequences can be isolated more easily from novel SAGE tags. Development of these methods has extended the use of high-throughput SAGE techniques, and hence accelerated annotation from tags to genes and from genes to genome.
Acknowledgements
This work was supported by the Shanghai Leading
Academic Discipline Project (B205) and the National
Key Basic Research Program (2009CB941704).
References
1 Wahl MB, Heinzmann U & Imai K (2005) LongSAGE analysis significantly improves genome annotation: identifications of novel genes and alternative transcripts in the mouse. Bioinformatics 21, 1393–1400.
2 Stein L (2001) Genome annotation: from sequence to biology. Nat Rev Genet 2, 493–503.
3 Kanehisa M & Bork P (2003) Bioinformatics in the post-sequence era. Nat Genet 33, S305–S310.
4 Velculescu VE, Zhang L, Vogelstein B & Kinzler KW (1995) Serial analysis of gene expression. Science 270, 484–487.
5 Velculescu VE, Zhang L, Zhou W, Vogelstein J, Basrai MA, Bassett DE Jr, Hieter P, Vogelstein B & Kinzler KW (1997) Characterization of the yeast transcriptome. Cell 88, 243–251.
6 Yamashita T, Hashimoto S, Kaneko S, Nagai S, Toyoda N, Suzuki T, Kobayashi K & Matsushima K (2000) Comprehensive gene expression profile of a normal human liver. Biochem Biophys Res Commun 269, 110–116.
7 Matsumura H, Reich S, Ito A, Saitoh H, Kamoun S, Winter P, Kahl G, Reuter M, Kruger DH & Terauchi R (2003) Gene expression analysis of plant host–pathogen interactions by SuperSAGE. Proc Natl Acad Sci USA 100, 15718–15723.
8 Boon WM, Beissbarth T, Hyde L, Smyth G, Gunnersen J, Denton DA, Scott H & Tan SS (2004) A comparative analysis of transcribed genes in the mouse hypothalamus and neocortex reveals chromosomal clustering. Proc Natl Acad Sci USA 101, 14942–14977.
9 Halaschek-Wiener J, Khattra JS, McKay S, Pouzyrev A, Stott JM, Yang GS, Holt RA, Jones SJ, Marra MA, Brooks-Wilson AR et al. (2005) Analysis of long-lived C. elegans daf-2 mutants using serial analysis of gene expression. Genome Res 15, 603–615.
10 Ryu EJ, Angelastro JM & Greene LA (2005) Analysis of gene expression changes in a cellular model of Parkinson disease. Neurobiol Dis 18, 54–75.
11 Zhao YX, Li QL, Yao CJ, Wang ZX, Zhou Y, Wang YJ, Liu LM, Wang YF, Wang LY & Qiao ZD (2006) Characterization and quantification of mRNA transcripts in ejaculated spermatozoa of fertile men by serial analysis of gene expression. Human Reprod 21, 1583–1590.
12 Li QL, Zhao YX, Ni B, Yao CJ, Zhou Y, Xu WJ, Wang ZX & Qiao ZD (2008) Comparison of the expression profiles of promastigotes and axenic amastigotes in Leishmania donovani using serial analysis of gene expression. Parasitol Res 103, 821–828.
13 Velculescu VE, Madden SL, Zhang L, Lash AE, Yu J, Rago C, Lal A, Wang CJ, Beaudry GA, Ciriello KM et al. (1999) Analysis of human transcriptomes. Nat Genet 23, 387–388.
14 Zhang L, Zhou W, Velculescu VE, Kern SE, Hruban RH, Hamilton SR, Vogelstein B & Kinzler KW (1997) Gene expression profiles in normal and cancer cells. Science 276, 1268–1272.
15 Madden SL, Galella EA, Zhu J, Bertelsen AH & Beaudry GA (1997) SAGE transcript profiles for p53-dependent growth regulation. Oncogene 15, 1079–1085.
16 Hibi K, Liu Q, Beaudry GA, Madden SL, Westra WH, Wehage SL, Yang SC, Heitmiller RF, Bertelsen AH, Sidransky D et al. (1998) Serial analysis of gene expression in non-small cell lung cancer. Cancer Res 58, 5690–5694.
17 Hashimoto S, Suzuki T, Dong HY, Nagai S, Yamazaki N & Matsushima K (1999) Serial analysis of gene expression in human monocyte-derived dendritic cells. Blood 94, 845–852.
18 Peters BA, Croix BS, Sjöblom T, Cummins JM, Silliman N, Ptak J, Saha S, Kinzler KW, Hatzis C & Velculescu VE (2007) Large-scale identification of novel transcripts in the human genome. Genome Res 17, 287–292.
19 Rivals E, Boureux A, Lejeune M, Ottones F, Pérez OP, Tarhio J, Pierrat F, Ruffle F, Commes T & Marti J (2007) Transcriptome annotation using tandem SAGE tags. Nucleic Acids Res 35, e108, doi:10.1093/nar/gkm495.
20 Wei CL, Ng P, Chiu KP, Wong CH, Ang CC, Lipovich L, Liu ET & Ruan Y (2004) 5′ Long serial analysis of gene expression (LongSAGE) and 3′ LongSAGE for transcriptome characterization and genome annotation. Proc Natl Acad Sci USA 101, 11701–11706.
21 Khattra J, Delaney AD, Zhao Y, Siddiqui A, Asano J, McDonald H, Pandoh P, Dhalla N, Prabhu AL, Ma K et al. (2007) Large-scale production of SAGE libraries from microdissected tissues, flow-sorted cells, and cell lines. Genome Res 17, 108–116.
22 Matsumura H, Reuter M, Krüger DH, Winter P, Kahl G & Terauchi R (2008) SuperSAGE. Methods Mol Biol 387, 55–70.
23 Peters DG, Kassam AB, Yonas H, O’Hare EH, Ferrell RE & Brufsky AM (1999) Comprehensive transcript analysis in small quantities of mRNA by SAGE-lite. Nucleic Acids Res 27, e39.
24 Saha S, Sparks AB, Rago C, Akmaev V, Wang CJ, Vogelstein B, Kinzler KW & Velculescu VE (2002) Using the transcriptome to annotate the genome. Nat Biotechnol 20, 508–512.
25 Gowda M, Jantasuriyarat C, Dean RA & Wang GL (2004) Robust-LongSAGE (RL-SAGE): a substantially improved LongSAGE method for gene discovery and transcriptome analysis. Plant Physiol 134, 890–897.
26 Heidenblut AM, Lüttges J, Buchholz M, Heinitz C, Emmersen J, Nielsen KL, Schreiter P, Souquet M, Nowacki S, Herbrand U et al. (2004) aRNA-longSAGE: a new approach to generate SAGE libraries from microdissected cells. Nucleic Acids Res 32, e131.
27 Kodzius R, Kojima M, Nishiyori H, Nakamura M, Fukuda S, Tagami M, Sasaki D, Imamura K, Kai C, Harbers M et al. (2006) CAGE: cap analysis of gene expression. Nat Methods 3, 211–222.
28 Neilson L, Andalibi A, Kang D, Coutifaris C, Strauss JF III, Stanton JA & Green DP (2000) Molecular phenotype of the human oocyte by PCR-SAGE. Genomics 63, 13–24.
29 Lee S, Chen J, Zhou G & Wang SM (2001) Generation of high quality and quantity of tag/ditag for SAGE analysis. BioTechniques 31, 348–354.
30 Wahl M, Shukunami C, Heinzmann U, Hamajima K, Hiraki Y & Imai K (2004) Transcriptome analysis of early chondrogenesis in ATDC5 cells induced by bone morphogenetic protein 4. Genomics 83, 45–58.
31 Siddiqui AS, Khattra J, Delaney AD, Zhao Y, Astell C, Asano J, Babakaiff R, Barber S, Beland J, Bohacec S et al. (2005) Large-scale digital gene-expression profiles from precisely defined developing C57BL/6J mouse tissues and cells. Proc Natl Acad Sci USA 102, 18485–18490.
32 Chen J, Lee S, Zhou G & Wang SM (2002) High-throughput GLGI procedure for converting a large number of serial analysis of gene expression tag sequences into 3′ complementary DNAs. Genes Chromosomes Cancer 33, 252–261.
33 van den Berg A, van der Leij J & Poppema S (1999) Serial analysis of gene expression: rapid RT-PCR analysis of unknown SAGE tags. Nucleic Acids Res 27, e17.
34 Chen JJ, Rowley JD & Wang SM (2000) Generation of longer cDNA fragments from serial analysis of gene expression tags for gene identification. Proc Natl Acad Sci USA 97, 349–353.
35 Richards M, Tan SP, Chan WK & Bongso A (2006) Reverse serial analysis of gene expression (SAGE) characterization of orphan SAGE tags from human embryonic stem cells identifies the presence of novel transcripts and antisense transcription of key pluripotency genes. Stem Cells 24, 1162–1173.
36 Vanpoucke G, Orr B, Grace OC, Chan R, Ashley GR, Williams K, Franco OE, Hayward SW & Thomson AA (2007) Transcriptional profiling of inductive mesenchyme to identify molecules involved in prostate development and disease. Genome Biol 8, R213.
37 Wu SM, Baxendale V, Chen Y, Pang AL, Stitely T, Munson PJ, Leung MY, Ravindranath N, Dym M, Rennert OM et al. (2004) Analysis of mouse germ-cell transcriptome at different stages of spermatogenesis by SAGE: biological significance. Genomics 84, 971–981.
38 Silva AP, Chen J, Carraro DM, Wang SM & Camargo AA (2004) Generation of longer 3′ cDNA fragments from massively parallel signature sequencing tags. Nucleic Acids Res 32, e94, doi:10.1093/nar/gnh095.
39 Xu WJ, Li QL, Yao CJ, Wang ZX, Zhao YX & Qiao ZD (2008) Semi-nested PCR analysis of unknown tags on serial analysis of gene expression. FEBS J 275, 5422–5428.
40 Schaefer BC (1995) Revolutions in rapid amplification of cDNA ends: new strategies for polymerase chain reaction cloning of full-length cDNA ends. Anal Biochem 227, 255–273.
41 Frohman MA (1993) Rapid amplification of complementary DNA ends for generation of full-length complementary DNAs: thermal RACE. Meth Enzymol 218, 340–356.
42 Das M, Harvey I, Chu LL, Sinha M & Pelletier J (2001) Full-length cDNAs: more than just reaching the ends. Physiol Genomics 6, 57–80.
43 Polyak K, Xia Y, Zweier JL, Kinzler KW & Vogelstein B (1997) A model for p53-induced apoptosis. Nature 389, 300–305.
44 Yu J, Zhang L, Hwang PM, Rago C, Kinzler KW & Vogelstein B (1999) Identification and classification of p53-regulated genes. Proc Natl Acad Sci USA 96, 14517–14522.
45 Hashimoto S, Suzuki Y, Kasai Y, Morohoshi K, Yamada T, Sese J, Morishita S, Sugano S & Matsushima K (2004) 5′-End SAGE for the analysis of transcriptional start sites. Nat Biotechnol 22, 1146–1149.
46 Kasai Y, Hashimoto S, Yamada T, Sese J, Sugano S, Matsushima K & Morishita S (2005) 5′SAGE: 5′-end serial analysis of gene expression database. Nucleic Acids Res 33, D550–D552.
47 Harbers M & Carninci P (2005) Tag-based approaches for transcriptome research and genome annotation. Nat Methods 2, 495–502.
Appendix
Scheme 1 – high-throughput GLGI procedure
Step 1. cDNA synthesis with 5′ biotinylated, 3′ anchored oligo(dT) primers
Prepare total RNA using Trizol RNA isolation reagent (Invitrogen, Carlsbad, CA, USA), and isolate mRNA from total RNA using oligo(dT)25 Dynabeads (Dynal, Oslo, Norway). Synthesize poly(dA/dT) cDNAs using 5′ biotinylated, 3′ anchored oligo(dT) primers.
Step 2. NlaIII digestion
Digest double-stranded cDNAs with NlaIII [see SAGE protocol (www.sagenet.org/protocol) for more details].
Step 3. Binding of biotinylated cDNA to magnetic beads
Isolate 3′ cDNAs using streptavidin beads (Dynal) (see SAGE protocol).
Step 4. Linker ligation
Ligate SAGE linker A or B to the 3′ cDNAs bound to the streptavidin beads (see SAGE protocol). After ligation, wash the beads four times with 2× B+W [10 mM Tris-HCl (pH 7.5), 1 mM EDTA, 2 M NaCl; store at room temperature], and transfer the last wash mixture to clean microfuge tubes. Proceed immediately to the next step.
Step 5. 3′ cDNA amplification
Make several dilutions of the ligation product. Usually, 1 µL of a 1/50 or 1/300 dilution is recommended as template for PCR. Perform PCR using the SAGE sense primer and the universal antisense primer with platinum Taq polymerase (Life Technologies, Gaithersburg, MD) and an annealing temperature of 55 °C. Purify the amplified templates and resuspend them in Tris-EDTA (TE) buffer, pH 8.0, for GLGI amplification.
Step 6. Prepare GLGI master mixture and all tag-specific primers
Prepare the GLGI master mixture (described below), containing the GLGI antisense primer, the amplified cDNA templates (step 5 above) and DNA polymerase. Synthesize tag-specific primers in a 96-well plate (Integrated DNA Technologies, Coralville, IA), with each tag-specific primer in a single well. Adjust the concentration of the primers to 50 ng·µL⁻¹ using TE.
Step 7. GLGI amplification
Transfer the GLGI master mixture in aliquots into a 96-well PCR plate (Applied Biosystems, Foster City, CA), and then add 1.4 µL of each tag-specific primer (70 ng) to its own well. Precipitate the amplified products directly in the PCR plate by adding 100 µL of precipitation mixture I (described below) to each well. Seal the plate, vortex, and keep it at room temperature for 15 min. Centrifuge at 4000 g for 35 min at 4 °C, then remove the supernatants. Add 150 µL of 70% ethanol per well, and centrifuge again at 4000 g for 15 min to wash the DNA pellets. Remove the supernatants. Dry the pellets in air, and dissolve in 5 µL of H2O.
Step 8. Cloning GLGI products
Clone the GLGI-amplified cDNAs in a 96-well plate into a vector for sequencing.
Step 9. Amplifying GLGI clones
Use direct-colony PCR to amplify GLGI clones, and screen four colonies if there are ≥ 50 copies of the tag, or six colonies if there are < 50 copies of the tag. Transfer the PCR master mixture in aliquots into 96-well PCR plates at 25 µL per well. Move the colonies directly from plates to wells using sterile pipette tips. Precipitate the PCR products by adding 75 µL of precipitation mixture II (described below) per well. Seal, vortex, and keep the plate at room temperature for 5 min. Centrifuge the plate at 4000 g for 35 min at 4 °C, and remove the supernatants. Add 150 µL of 70% ethanol per well to wash the DNA pellets. Centrifuge the plate at 4000 g for 25 min at 4 °C, and remove the supernatants. Dry the pellets in air and dissolve in 10 µL of H2O.
Step 10. Sequencing GLGI clones
Perform the sequencing reaction in a total volume of 7 µL [including 4 µL of master sequencing mixture (described below) and 3 µL of DNA template] for each reaction. Precipitate the sequencing products by adding 75 µL of precipitation mixture III (described below) per well. Seal, vortex, and keep the plate at room temperature for 2–5 min. Centrifuge the plate at 4000 g for 35 min at 4 °C, and remove the supernatants. Add 150 µL of 70% ethanol per well to wash the DNA pellets. Centrifuge the plate at 4000 g for 15 min at 4 °C, and remove the supernatants. Dry the pellets in air and dissolve in 3 µL of sequencing loading dye (Applied Biosystems). Load the sequencing products (0.7–1 µL) onto a 5–6% sequencing gel with 96 lanes, and determine the sequences using an ABI377 sequencer (Applied Biosystems).
Step 11. Verify the tag sequence
Once the complete sequence of the PCR product has been obtained, check, if possible, whether the tag is located immediately after the last CATG site before the polyA signal in the PCR product before further analysis. Match the sequences to the GenBank database using BLAST to determine whether the sequences generated from SAGE tags match known sequences, mismatch, or match unknown sequences in GenBank.
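The position check in step 11 can be automated once a clone has been sequenced. The function below is a minimal sketch under assumptions of ours: the read is given as the sense strand (5′ to 3′), the tag is the 14-base sequence starting with CATG, and the 3′ boundary is approximated by trimming a terminal run of at least eight A residues. The name `tag_at_last_catg`, the threshold and the example sequences are illustrative.

```python
import re


def tag_at_last_catg(sequence: str, tag: str, min_poly_a: int = 8) -> bool:
    """True if the tag starts at the last CATG site upstream of the poly(A)."""
    seq = sequence.upper()
    tag = tag.upper()
    # Trim a terminal poly(A) run, if present, so it cannot hide the 3' end.
    tail = re.search(r"A{%d,}$" % min_poly_a, seq)
    body = seq[:tail.start()] if tail else seq
    last_site = body.rfind("CATG")
    if last_site == -1:
        return False  # no CATG site at all
    return body[last_site:last_site + len(tag)] == tag


if __name__ == "__main__":
    # Hypothetical GLGI product: two CATG sites followed by a poly(A) tail.
    product = ("GGCATGAAACCCGGGTTTT" + "CATGACGTTGCAGG"
               + "TTCCAATAAAGCTC" + "A" * 20)
    print(tag_at_last_catg(product, "CATGACGTTGCAGG"))  # tag at the last site
    print(tag_at_last_catg(product, "CATGAAACCCGGGT"))  # tag at an upstream site
```

A tag that appears only at an upstream CATG site fails the check, which flags products that probably arose from mispriming rather than from the transcript that produced the tag.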
Reagents and primers
GLGI master mixture (28.6 lL per reaction): 3 lL10·PCR buffer, 1.2 lL MgCl
2
(50 mm), 0.6 lL dNTPs
(10 mm), 1.4 lL universal antisense primer (50 ngÆlL
)1
), 3 lL (0.5–5 ng) amplified cDNA templates (step 5
above), 0.3 lL platinum Taq DNA polymerase (5 UÆ lL

)1
), and ddH
2
O up to a total volume of 28.6 lL.
Precipitation mixture I: 1 lL of glycogen (20 mgÆmL
)1
; Roche, Indianapolis, IN), 15 lL of 7.5 m NH
4
Ac, and
84 lL of 100% ethanol.
Precipitation mixture II: 22 lLofH
2
O, 15 lLof2m NaClO
4
, and 38 lL of 2-propanol.
Precipitation mixture III: 64 lL of a 100% ethanol ⁄ 3 m NaAc mixture (25 : 1), 1 lL of glycogen (20 mgÆmL
)1
;
Roche), and 10 lLofH
2
O.
Master sequencing mixtures: 0.7 lL of Big-Dye premixture (Applied Biosystems), 1.5 lL of dilution buffer
(400 mm Tris ⁄ HCl at pH 9.0 and 10 mm MgCl
2
), 0.3 lL of 100 ngÆlL
)1
sequencing primer (M13 forward
primer or M13 reverse primer), and 1.5 lLofH
2
O.

5¢ Biotinylated, 3¢anchored oligo(dT) primers:
5¢biotin-ATCTAGAGCGGCCGC-T
16
VN-3¢, where N = A, C, G or
T, and V = A, G or C.
Linker A: 5¢-TTTGGATTTGCTGGTGCAGTACAACTAGGCTTAATAGGGACATG-3¢ and 5¢-pTCCCT ATTAA
GCCTAGTTGTACTGCACCAGCAAATCC-amino modified C
7
-3¢.
Linker B: 5¢-TTTCTGCTCGAATTCAAGCTTCTAACGATGTACGGGGACATG-3¢ and 5¢-pTCCCCGTACATCGT
TAGAAGCTTGAATTCGAGCAG-amino modified C
7
-3¢
SAGE sense primer: 5′-GGATTTGCTGGTGCAGTACA-3′ for linker A or 5′-CTGCTCGAATTCAAGCTTCT-3′ for
linker B.
Universal antisense primer: 5′-ACTATCTAGAGCGGCCGCTT-3′.
Tag-specific primer: 5′-GGATCCCATGXXXXXXXXXX-3′, where GGATCC is the BamHI site, and CATG(X)₁₀ is
the tag.
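The tag-specific primer template above can be filled in programmatically for a list of tags. The following is a minimal sketch; the function name `tag_specific_primer` is hypothetical, and the input is assumed to be the 10 variable bases of the tag, without the leading CATG.

```python
BAMHI_SITE = "GGATCC"  # 5' BamHI site of the tag-specific primer
ANCHOR = "CATG"        # NlaIII site shared by all SAGE tags


def tag_specific_primer(tag10):
    """Build 5'-GGATCC-CATG-(X)10-3' from the 10 variable tag bases."""
    tag10 = tag10.upper()
    if len(tag10) != 10 or set(tag10) - set("ACGT"):
        raise ValueError("expected exactly 10 bases of A, C, G or T")
    return BAMHI_SITE + ANCHOR + tag10
```

For example, the arbitrary tag `TTGACCTAGG` gives the 20-base primer `GGATCCCATGTTGACCTAGG`.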
W.-J. Xu et al. Methods for 3′ end amplification from SAGE tags
FEBS Journal 276 (2009) 2657–2668 © 2009 The Authors Journal compilation © 2009 FEBS
Scheme 2 – protocol for TSAT-PCR
Step 1. Full-length double-stranded cDNA synthesis with modified lock-docking oligo(dT) and 5′-cap
oligonucleotide primer
Prepare total RNA using Trizol RNA isolation reagent (Invitrogen). Synthesize double-stranded cDNAs using the
modified oligo(dT) primer and the 5¢-cap oligonucleotide primer and PrimeScript reverse transcriptase (TaKaRa, Dalian,
China) according to the manufacturers’ instructions.
Step 2. Amplification of the full-length cDNAs
Use 1 µL of the RT-PCR reaction mixture as template to enrich the full-length cDNAs by PCR using primers PLF and
PLR and TaKaRa Ex Taq Hot Start.
Step 3. PCR amplification with tag-specific primer (Ptag) and UP-I
Make several dilutions of the amplified full-length cDNAs above: 1 µL of a 1/1000 dilution is recommended for PCR
with tag-specific primer (Ptag) and UP-I, but this may differ by a factor of 10 in different experiments.
Step 4. PCR amplification with tag-specific primer (Ptag) and UP-II
Dilute the resulting PCR product (step 3) to 1/500–1/2500 using sterile H₂O, and use a 1 µL aliquot of a 1/1000 dilution
as template for the subsequent PCR with tag-specific primer (Ptag) and UP-II.
Step 5. PCR product isolation and sequencing
After visualizing the specific PCR product on an agarose gel, excise and purify individual bands, and clone them into T-vectors.
Step 6. Verification of tag sequence
Once the complete sequence of the PCR product has been obtained, check the tag to see whether its location is right after
the last CATG site before the polyA signal in the PCR product, if possible, before further analysis.
Primers
Modified oligo(dT) primer: 5′-CCAGACACTATGCTCATACGACGCAG(T)₁₆VN-3′, where N = A, C, G or T, and
V = A, G or C.
5′-cap oligonucleotide primer: 5′-AAGCAGTGGTATCAACGCAGAGTACGCGGG-3′
PLF: 5′-AAGCAGTGGTATCAACGCAGAGT-3′
PLR: 5′-CCAGACACTATGCTCATACGACG-3′
Tag-specific primer: 5′-GGATCCCATGXXXXXXXXXX-3′, where GGATCC is the BamHI site, and CATG(X)₁₀ is
the tag.
UP-I: 5′-CCAGACACTATGCTCATA-3′
UP-II: 5′-CACTATGCTCATACGACGCAGT-3′
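The primers PLR, UP-I and UP-II listed above all lie within the 5′ adaptor portion of the modified oligo(dT) primer. A short sketch can make that layout explicit by locating each primer on the adaptor followed by the (T)₁₆ tract. The helper name `primer_span` and the 0-based half-open coordinates are illustrative choices, not part of the protocol.

```python
# Adaptor portion of the modified oligo(dT) primer, followed by (T)16.
ADAPTOR = "CCAGACACTATGCTCATACGACGCAG"
TEMPLATE = ADAPTOR + "T" * 16

PRIMERS = {
    "PLR": "CCAGACACTATGCTCATACGACG",
    "UP-I": "CCAGACACTATGCTCATA",
    "UP-II": "CACTATGCTCATACGACGCAGT",
}


def primer_span(template, primer):
    """Return the (start, end) span of `primer` on `template`
    (0-based, end exclusive), or None if it does not occur."""
    start = template.find(primer)
    return None if start == -1 else (start, start + len(primer))


for name, seq in PRIMERS.items():
    print(name, primer_span(TEMPLATE, seq))
# PLR (0, 23)
# UP-I (0, 18)
# UP-II (5, 27)
```

UP-II starts 5 nt inside the adaptor and its 3′ end reaches one base into the T tract, which is consistent with its use after UP-I in the second round of amplification (steps 3 and 4).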
Scheme 3 – modified reverse SAGE (rSAGE) method

Step 1. cDNA synthesis with modified oligo(dT) primer
Prepare total RNA and polyA+ RNA (approximately 2–5 µg is needed for good representation) as described in the SAGE
protocol (www.sagenet.org/protocol). Synthesize 1st strand cDNA using the Superscript Choice System for cDNA synthesis
kit (Gibco BRL catalog number 18090019), primed with gel-purified RT primer oligo, and save one-tenth of the volume for
electrophoresis. Follow the SAGE protocol (www.sagenet.org/protocol) instructions for 2nd strand synthesis. Precipitate the
resulting cDNA, resuspend it in 22 µL LoTE [3 mM Tris-HCl (pH 7.5), 0.2 mM EDTA (pH 7.5) in dH₂O; store at 4 °C]
and save 2 µL for electrophoresis according to the SAGE protocol (www.sagenet.org/protocol) instructions. Analyze the
saved 1st and 2nd strand cDNA products together with a 1 kb ladder on a 1.2% agarose gel stained with ethidium
bromide for 2 h. Proceed if the expected pattern is observed.
Step 2. NlaIII digestion
See SAGE protocol.
Step 3. Binding of biotinylated cDNA to magnetic beads
See SAGE protocol.
Step 4. Linker ligation
See SAGE protocol. Only linkers A1 and A2 are used in rSAGE, at one-fifth of the amount used in
the SAGE protocol (www.sagenet.org/protocol). The same linkers prepared for SAGE can be used if they are less than
3 months old. Otherwise, linkers A1 and A2 should be treated with T4 ligase and tested for self-ligation as described in
the SAGE protocol. Divide the beads into clean microfuge tubes and proceed. After ligation, wash the beads four times with
2× B+W, transferring the last wash mixture into clean microfuge tubes. Proceed immediately to the next step.
Step 5. AscI digestion to release the 3′ cDNA fragments from the beads
Resuspend the beads and add the AscI digestion components to the tubes. After digestion, collect the supernatant carefully
using a magnet. Extract once with PC8 [480 mL phenol (heat to 65 °C), 320 mL 0.5 M Tris-HCl (pH 8.0), 640 mL chloroform;
add in sequence, shake, and place at 4 °C; after 2–3 h shake again; after another 2–3 h, aspirate aqueous layer; aliquot and
store at −20 °C], perform a high-concentration ethanol precipitation (see SAGE protocol), and resuspend the DNA in 25 µL
LoTE. This ligation product is the primary rSAGE library.
Step 6. Generation of amplified rSAGE library by PCR using primers rSAGEF1 and rSAGER1

Make several dilutions of the ligation product: 1 µL of 1/50 and 1/300 dilutions is recommended for PCR, but this can differ
by a factor of 10 due to variations in yield. Analyze 10 µL of the PCR product on a 4–20% Novex gel (Invitrogen,
Carlsbad, CA, USA) along with a 1 kb ladder. A strong smear (predominantly in the 100–700 bp range) should be visible
after staining with ethidium bromide. The PCR products of these dilutions are the amplified rSAGE library.
Step 7. SAGE tag-specific primer
Design a gene-specific forward primer that includes the tag bases (14 bases, or 15 bases if an additional base can
be determined by SAGE 2000 version 4.5 software) plus an additional 7–10 bases 5′
of linker 2A. Hence, the resulting primer has a melting temperature of approximately 60 °C [calculated as 4 × the number
of G/C bases + 2 × the number of A/T bases], i.e. 5′-AAGCAGTGGTATCAACGCAGAGTCATGXXXXXXXXXXX-3′,
where GGATCC is the BamHI site, and CATG(X)₁₀ is the tag.
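The melting temperature rule quoted in this step (4 °C per G/C base plus 2 °C per A/T base) is easy to apply when comparing candidate primers of different lengths. The function below is a minimal sketch of that rule only. The name `wallace_tm` is an illustrative choice, and primers containing placeholder or ambiguous bases such as X or N are rejected rather than guessed at.

```python
def wallace_tm(primer):
    """Estimate Tm in °C as 4 x (G + C) + 2 x (A + T)."""
    seq = primer.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at != len(seq):
        raise ValueError("primer contains bases other than A, C, G, T")
    return 4 * gc + 2 * at
```

For an arbitrary 14-base tag such as `CATGTTGACCTAGG` (7 G/C and 7 A/T bases) the estimate is 42 °C, which illustrates why additional linker bases are needed to approach 60 °C.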
Step 8. PCR amplification with tag-specific primer (TSP)
At this stage, try separate amplifications using various TSP/rSAGER1 primer pairs to obtain specific PCR products
for each of the novel tags, using 1 µL of 1/50 and 1/300 dilutions of the amplified rSAGE library as template. Hot-start and
touchdown PCR are highly recommended. PCR conditions may require further optimization in terms of cycle number,
template dilutions (amplified rSAGE) and temperature.
Step 9. PCR product isolation and sequencing
After visualizing specific PCR product on an agarose gel, excise and purify individual bands. Sequencing of purified PCR
product can be performed by the standard dideoxynucleotide termination method either manually or by an automatic
sequencer.
Step 10. Verify the tag sequence
Once the complete sequence of the PCR product has been obtained, check the tag to see whether its location is right after the
last CATG site before the polyA signal in the PCR product, if possible, before further analysis. If the presence of the tag
is confirmed, the additional nucleotide sequence obtained, usually hundreds of base pairs long, can assist BLAST searches.
If it is still not possible to match it to known sequences or ESTs, a novel gene might have been obtained. The purified
PCR product can be used as a probe for northern blot analysis or library screening. The additional sequence can also help to
design primers for 5′ RACE to isolate full-length cDNA if the mRNA is small.
Primers and linkers
RT primer: 5′-(biotin)ATTGGCGCGCCGCGAGCACTGAGTCAATACGA(T)₃₀VN-3′
rSAGER1: 5′-GCGAGCACTGAGTCAATACGA-3′
rSAGEF1: 5′-AAGCAGTGGTATCAACGCAGAGT-3′
TSP (tag-specific primer): See step 7.
Linker A1: 5′-AAGCAGTGGTATCAACGCAGAGTCATG-3′
Linker A2: 5′-ACTCTGCGTTGATACCACTGCTT-amino-modified C7-3′
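Before ordering linkers, it can be worth confirming that the two strands anneal as intended and leave the 3′ CATG overhang that pairs with NlaIII-cut cDNA ends. The sketch below reverse-complements the lower strand, locates it on the upper strand, and reports the unpaired bases on either side. The function names are hypothetical, and the 5′ phosphate and amino modifications are left out of the sequences.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def annealed_overhangs(top, bottom):
    """Return (5' overhang, 3' overhang) of `top` when `bottom` is
    annealed to it, or None if the strands are not complementary."""
    paired = revcomp(bottom)
    start = top.upper().find(paired)
    if start == -1:
        return None
    return top[:start], top[start + len(paired):]


LINKER_A1 = "AAGCAGTGGTATCAACGCAGAGTCATG"
LINKER_A2 = "ACTCTGCGTTGATACCACTGCTT"
print(annealed_overhangs(LINKER_A1, LINKER_A2))  # ('', 'CATG')
```

The same check applied to linkers A and B of Scheme 1 returns `('TTT', 'CATG')` in both cases, that is, a three-base 5′ extension on the upper strand and the same CATG overhang.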