BioMed Central
Page 1 of 7
(page number not for citation purposes)
Virology Journal
Open Access
Methodology
Universal primers for HBV genome DNA amplification across
subtypes: a case study for designing more effective viral primers
Qingrun Zhang
†1
, Guanghua Wu
†1
, Elliott Richards
2
, Shan'gang Jia
1
and
Changqing Zeng*
1
Address:
1
Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 101300, China and
2
Department of Biology, College of Biology and
Agriculture, Brigham Young University, Provo UT 84602, USA
Email: Qingrun Zhang - ; Guanghua Wu - ; Elliott Richards - ;
Shan'gang Jia - ; Changqing Zeng* -
* Corresponding author †Equal contributors
Abstract
Background: The highly heterogenic characteristic of viruses is the major obstacle to efficient
DNA amplification. Taking advantage of the large number of virus DNA sequences in public
databases to select conserved sites for primer design is an optimal way to tackle the difficulties in
virus genome amplification.
Results: Here we use hepatitis B virus as an example to introduce a simple and efficient way for
virus primer design. Based on the alignment of HBV sequences in public databases and a program
BxB in Perl script, our method selected several optimal sites for HBV primer design. Polymerase
chain reaction showed that compared with the success rate of the most popular primers for whole
genome amplification of HBV, one set of primers for full length genome amplification and four sets
of walking primers showed significant improvement. These newly designed primers are suitable for
most subtypes of HBV.
Conclusion: Researchers can extend the method described here to design universal or subtype
specific primers for various types of viruses. The BxB program based on multiple sequence
alignment not only can be used as a separate tool but also can be integrated in any open source
primer design software to select conserved regions for primer design.
Background
Chronic hepatitis B virus (HBV) infection is a major
health problem worldwide, affecting approximately 350
million people, and 500,000 to 1.2 million deaths world-
wide per year are attributed to HBV infection. Currently
there are eight accepted genotypes (A to H) for HBV, based
on one of the following criteria: an inter-group divergence
of 8% or greater in the complete genome nucleotide
sequence or a 4 ± 1% divergence of the surface antigen
gene [1,2]. It has been widely reported that it is possible
to have two HBV genotypes or recombinant types in one
infected individual [3-5].
The HBV reverse transcriptase (RT) is an error-prone
enzyme as a result of lacking 3'-5'-exonuclease proofread-
ing capacity. HBV, like other viruses such as HIV, HCV and
poliovirus, has a high mutation rate of 2 × 10-5/site/year
[6,7]and quasispecies distribution in infected individuals
Published: 24 September 2007
Virology Journal 2007, 4:92 doi:10.1186/1743-422X-4-92
Received: 28 June 2007
Accepted: 24 September 2007
This article is available from: />© 2007 Zhang et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( />),
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Virology Journal 2007, 4:92 />Page 2 of 7
(page number not for citation purposes)
[8]. This means that HBV circulates as a complex mixture
of genetically distinct but closely related variants that are
in equilibrium at a certain time point of infection in a
given circumstance. A mixture of HBV quasispecies is in
fact a mixture of HBV haplotypes, which is a more impor-
tant concept to researchers, such as in drug resistant
mutant studies-different haplotypes of HBV may repre-
sent different types of drug resistance [9-12].
Because of the existence of quasispecies, the only way to
obtain HBV haplotype sequences is through full length
genome amplification and clone-sequencing instead of
assembling the PCR sequences of several amplified frag-
ments of the genome. However, the partially double
stranded characteristic of HBV DNA structure causes the
instability of exposed HBV DNA and the low efficiency of
whole genome amplification. Günther et al. developed a
set of primers for full length HBV genome amplification,
with a restriction enzyme site for further cloning and func-
tion study [13,14]. The success rate reported in this paper
is one out of eight genomes (12%) amplified with Taq
polymerase, and seven out of seventeen genomes ampli-
fied with Taq-Pwo polymerases (41%). Further studies
showed similar success rate (40%). In our laboratory, 141
of 420 genomes amplified with Takara-LA polymerases
(34%) using this method. Tellier et al. developed two
pairs of primers for nested PCR. Those primers can
amplify nearly full length of HBV (3.12 kb), yet the whole
process is complicated, time consuming and may intro-
duce risk of cross contamination [15,16]. So it is not
widely used and no success rate has been reported.
The considerable number of HBV isolates with rather
divergent nucleotide sequences and the partially double-
stranded characteristic of HBV impose the need for
extreme care in the choice of primers for both full length
and fragment amplification.
In order to identify optimal sites for primer design, we uti-
lized 1020 whole genome sequences in public databases
(NCBI, EMBL and DDBJ) and 103 sequences in our labo-
ratory, and developed a program BxB to select conserved
regions as candidates for primer design. We testified those
primer designs in silico by e-PCR and real polymerase
chain reactions. One set of primers for nearly full length
HBV genome amplification (3181 bp, 40 bp shorter than
full length) and four sets of walking primers for fragment
amplification were finally obtained. These primer
sequences are within areas that are highly conserved
across all genome sequences available in public databases,
therefore the use of such primers makes it unlikely that
HBV strains are missed due to sequence variations and
allows further search for quasispecies as well as unknown
HBV genotypes and other subtypes.
Results
Identification of candidate regions for primer design by
BxB
We analyzed 1123 sequences, 1020 from public databases
(Additional file 1) and103 sequences identified in our
laboratory, with the BxB program. 10 regions were
selected according to BxB analysis (Table 1). Candidate
regions were defined as sites within the desired locations
that had 17+ bases from the 3' end and with a frequency
of 0.90+ in the BxB. The output of BxB analysis was desig-
nated as a FASTA format, which could be illuminated in
sequence analysis software interface such as ClustalX soft-
ware to facilitate primer selection (Figure 1).
Primer selection
One set of full length genome primers, four sets of walk-
ing primers were designed with the aid of Primer 3 (Figure
2, Table 2). Degenerate sites were also considered when
there were sites yielding low BxB frequency in selected
primers. All these primers gave negative result when they
were tested in UCSC in silico PCR to see whether primers
would amplify human DNA.
Table 1: Candidate regions selected by BxB for primer design
Candidate Regions* Sequence ORF located in
8~26 ACCTCTGCCTAATCATCTC X/preC
40~68 ACTGTTCAAGCCTCCAAGCTGTGCCTTGG preC
591~616 GCCGCGTCGCAGAAGATCTCAATCTC Terminal Protein
993~1018 GGGTCACCATATTCTTGGGAACAAGA Terminal Protein
1450~1469 CCTGCTGGTGGCTCCAGTTC pre-S2
1571~1592 TCCTAGGACCCCTGCTCGTGTT S
1657~1679 ACTTCTCTCAATTTTCTAGGGGG S
2131~2159 TATATGGATGATGTGGTATTGGGGGCCAA S
2491~2516 TTCTCGCCAACTTACAAGGCCTTTCT RT
3199~3221 CACCAGCACCATGCAACTTTTT X/preC
*Here we select "CTTTTTC" of X ORF as the start point
Virology Journal 2007, 4:92 />Page 3 of 7
(page number not for citation purposes)
Experiment verification
All primers, including one set of primers for full length
(3181 bp) amplification and four sets of primers for frag-
ment amplification, demonstrated a good efficiency in
real polymerase chain reactions (Figure 3).
Discussion
Using an alignment of 1123 complete genomes from pub-
lic databases and our laboratory, we selected primers from
several highly conserved regions of HBV genomes. These
primers are situated in the sequences encoding X, preC,
Table 2: Primers for full length genome amplification and fragment amplification
Primers Sequence Location* Length GC % Tm(°C) Amplicon Size (bp)
WA-L ACTGTTCAAGCCTCCAAGCTGTGC 40 24 54.2 60.6 3181
WA-R AGCAAAAAGTTGCATGGTGCTGGT 3221 24 45.8 60.7
FA1-L TTTCACCTCTGCCTAATCATCTC 4 23 43.5 52.0 1014
FA1-L' TTT ACCTCTGCCTAATCATCTC 4 22 40.9 47.5
FA1-R TCTTGTTCCCAAGAATATGGTG 1018 22 40.9 51.0
FA2-L GCGTCGCAGAAGATCTCAAT 593 20 50.0 51.9 1074
FA2-R TTGAGAGAAGTCCACCACGAG 1667 21 52.4 51.7
FA3-L CTGCTGGTGGCTCCAGTT 1451 18 61.1 50.6 1059
FA3-R GCCTTGTAAGTTGGCGAGAA 2510 20 50.0 52.2
FA4-L GTATTGGGGGCCAAGTCTGT 2416 20 55.0 52.8 1072
FA4-L' GTATTGGGGGCCAAATCTGT 2416 20 50.0 52.8
FA4-R AAAAAGTTGCATGGTGCTG 3218 19 42.1 48.5
*Here we select "CTTTTTC" of X ORF as the start point. FA1-L' and FA4-L' are degenerate primers
Output of BxB illiminated in ClustalX softwareFigure 1
Output of BxB illiminated in ClustalX software. When the ratio of the most frequently presented nucleotide is larger
than current cutoff value, the program outputs this nucleotide, otherwise outputs a '-'. The cutoff was set to (0.05, 1), and the
step length is 0.05. The frequency is listed in the left box and the nucleotides are in the right box.
Virology Journal 2007, 4:92 />Page 4 of 7
(page number not for citation purposes)
Diagram of HBV ORFs and designed primersFigure 2
Diagram of HBV ORFs and designed primers. WA-L and WA-R in blue arrows represents the primers for full
length genomic DNA amplification. FA1-L/FA1-L' and FA1-R (amplicon size: 1014 bp), FA2-L and FA2-R(amplicon size:
1074 bp), FA3-L and FA3-R (amplicon size: 1059 bp), FA4-L/FA4-L' and FA4-R (amplicon size: 1072 bp) in red arrows represent
the four sets of walking primers for fragment amplification. FA2-L and FA2-R Here we select "CTTTTTC" of X ORF as the
start point. FA1-L' and FA4-L' are degenerate primers.
Agarose gel analysis of HBV genomes amplified by the newly designed primersFigure 3
Agarose gel analysis of HBV genomes amplified by the newly designed primers. Sample 1 and sample 2 are for frag-
ment amplification primers testing. 1, 2, 3, 4 in the figure represent: FA3-L and FA3-R (amplicon size: 1059 bp), FA1-L/FA1-L'
and FA1-R (amplicon size: 1014 bp), FA4-L/FA4-L' and FA4-R (amplicon size: 1072 bp), FA2-L and FA2-R (amplicon size: 1074
bp) primer pairs respectively. Sample 3~7 are for full length genome amplification primers (WA-L and WA-R) testing (amplicon
size: 3181 bp).
Virology Journal 2007, 4:92 />Page 5 of 7
(page number not for citation purposes)
terminal protein, pre-S2, S and reverse transcriptase
regions. Sequences of the primers are sufficiently con-
served in all HBV genotypes and are believed to be con-
served in quasispecies. All these primers were shown to be
very efficient in real polymerase reaction. The advantage
of such approach is that it utilized all HBV sequences
available and a simple Perl program to precisely select
optimal regions in HBV genome for amplification. Such
approach is unlikely to produce significant bias towards
any one genotype when there is no bias in the multiple
sequences alignment which the approach was based on.
These primer designs make it possible to efficiently
amplify quasispecies and allow further search for
unknown HBV genotypes/subtypes.
We used two methods to estimate HBV genotype distribu-
tion in the public databases were: counting the genotyped
sequences with a simple Perl program and calculating the
percentages; or using BLAST[17] to align eight HBV geno-
type reference sequences from NCBI with those from pub-
lic databases to get all sequences of different HBV
genotypes and then to calculate the percentage. Both
methods yielded similar results: Genotype C counts most
(about 1/3 in the databases); Genotype B, A, D in
descending order. These four genotypes represent about
80~90% in the databases and the rest are E, F, G and H.
Besides, there are also a few CD and GC recombinants.
The primer design described in this study is based on
sequences from the public databases and our laboratory
which are genotype B and C. Therefore, it would only give
bias when the genotype distribution in the databases does
not reflect the actual HBV genotype distribution in reality.
Since this method is based on multiple sequences, it can
be much more reliable when target amplifications are
within one genotype or within a certain group. In such
occasions, sequences of one genotype or a given group
should be used and analyzed with the BxB program to
obtain genotype-specific or group-specific candidate
regions and primer sequences. Recently, based on full
length sequences in our laboratory most of which are
from Beijing, we successfully selected optimal primers for
HBV in Beijing regions using this approach. Further
research of this approach should be done on other geno-
types like A, D, E, F etc. to testify its specificity, either
through sequences of one genotype or sequences of mix-
tures.
The amplified full length genome with our method is
3181 bp which is only 40 bp shorter than the full length
of HBV genome. It is not applicable in functional study
but much valuable in genomic study. The set of primers
were proved to have a good PCR efficiency. The four sets
of fragment primers are also based on the most conserved
regions from public sequences. These primers are walking
primers covering the whole HBV genome. They should be
very useful in amplifying certain regions of the genome. In
future research on this method, both full length amplifi-
cation primers and fragment amplification ones should
be testified in samples with different viral titers to check
its sensitivity.
The BxB program we utilized in this study was a simple
Perl script, which can be easily integrated in any primer
design software and online tools. What BxB demands is
just a multiple sequences alignment of the target
sequences FASTA format, and outputs a description of
conserved sites for primer design in FASTA format. It not
only can be used as a separate tool but also can be inte-
grated in any open source primer design software[18] to
select conserved sites based on the alignments.
The highly heterogenic characteristic of viruses is the
major obstacle to efficient DNA amplification. Taking
advantage of the large number of virus DNA sequences in
public databases to select conserved sites for primer
designing should be an optimal way to tackle the difficul-
ties in virus genome amplification. DNA sequences in
public databases are on the increase. Take HIV and Hepa-
titis viruses for example, up to March 2007, the number of
full length genome DNA sequences in public databases
(Additional file 1) are ranges from about 40 to more than
2000: HIV is 2005; HCV is 183; HBV is 1020; HEV is 78;
HAV is 35 and HDV is 83. This amount of data makes it
possible to easily select conserved sites for primer design
in different scale, genome regions, subtypes and groups.
Conclusion
Utilizing the HBV sequence in public databases and our
laboratory, and a Perl program, we selected optimal
regions for primer design. Those primers designed were
verified in silico by e-PCR and polymerase chain reactions.
One set of primers for full length HBV genome amplifica-
tion and four sets of walking primers for fragment ampli-
fication were proved to be efficient. The use of such
primers makes it unlikely that HBV strains are missed due
to sequence variations and allows furthermore search for
quiasispecies as well as genotype-unknown HBV strains.
Our approach of primer design is simple, efficient and is
totally applicable to other viruses, such as HIV, HCV etc.
when multiple sequences alignments are available and
efficient amplification in a heterogeneous mixture is
needed.
Methods
HBV sequence data
Initially in the study all complete genome sequences of
HBV available in March 2007 from GenBank, EMBL, and
DDBJ were downloaded. 1020 public sequences together
with 103 sequences from our laboratory were aligned in
Virology Journal 2007, 4:92 />Page 6 of 7
(page number not for citation purposes)
ClustalW. The alignment was manually corrected by shift-
ing sequences in places, for some sequences possessed
large spans of unique deletions or insertions which threw
off the alignment algorithm. Finally, as the start point of
the sequences in databases were different, most of which
were the EcoR I restriction enzyme cutting site, a unani-
mous start point was selected and the alignment was cor-
rected to begin at the same location. Here we select
"CTTTTTC" of X ORF as the start point.
Selection of highly conserved genome regions for primer
design
The term "conserved genome regions" used here is
defined as genome regions that have most frequently pre-
sented nucleotide sequences. To identify the highly con-
served regions for primer design in HBV genome, Perl
script[19] was used to write a program BxB (Base by Base)
to scan through the alignment of the 1123 sequences base
by base. BxB demands a multiple sequences alignment of
the target sequences in FASTA[20] format. It is to detect
the most frequently presented base in the same coordinate
for all sequences of the alignment. Different cutoff values
were tested to identify a best one for the alignment scan.
If the ratio of the most frequently presented nucleotide is
larger than current cutoff value, the program outputs this
nucleotide, otherwise outputs a '-'. Finally, the cutoff was
set to (0.05, 1), and the step length is 0.05. The output is
a FASTA file which could be easily illuminated in
sequence analysis software such as ClustalX[21], with
which conserved region selection and primer design could
be much facilitated in a user friendly interface.
Primer design
With the aid of the BxB, candidate regions were selected
for primer design. Candidate regions were defined as sites
within the desired locations that had 17+ bases from the
3' end and with a frequency of 0.90+ in the BxB. Using
Primer 3 [22], primers were selected within the candidate
regions, taking target regions, primer length and
sequence, GC content and Tm etc. into consideration.
In silico primer testing
All primers were tested in University of California Santa
Cruz (UCSC) in silico PCR to see whether primers would
amplify human DNA.
Experiment verification
Clinical material
Serum samples were collected from seven patients with
hepatitis B surface antigen (HBsAg)-positive chronic hep-
atitis (serum HBsAg positive for at least 6 months). Five of
them were genotype C and two were genotype B. All
patients were seronegative for hepatitis C virus. The serum
samples were stored at -20°C until analysis.
Extraction of serum HBV DNA
Serum viral DNA was extracted by using commercially
available kits (QIAamp DNA Blood Mini Kit, QIAGEN,
Inc., Valencia, CA).
Polymerase chain reaction
Full length amplification
The PCR was performed in a 96-well cycler (GeneAmp
PCR System 9700; Applied Biosystems) and in a 10 μl
reaction volume containing 0.5 U LA Taq (TaKaRa). The
primers were WA-R and WA-L (Table 2). The cycling con-
ditions were initial denaturation at 95°C for 2 min 30 s,
followed by 35 cycles of denaturation at 94°C for 1 min,
annealing at 58°C for 1 min 30 s and extension at 72°C
for 3 min, finally extension at 72°C for 10 min. Ampli-
cons (1 μl) were analyzed by electrophoresis on 1.5% aga-
rose gel, stained with ethidium bromide and observed
under UV light.
Fragment amplification
For fragment amplification, the primers were FA1-R, FA1-
L/FA1-L', FA2-R, FA2-L, FA3-R, FA3-L, FA4-R, FA4-L/FA4-
L' (Table 2). The cycling conditions were initial denatura-
tion at 95°C for 2 min 30 s, followed by 35 cycles of dena-
turation at 94°C for 1 min, annealing at 55°C for 1 min
and extension at 72°C for 2 min, finally extension at
72°C for 10 min. Amplicons (1 μl) were analyzed by elec-
trophoresis on 1.5% agarose gel, stained with ethidium
bromide and observed under UV light.
Competing interests
The author(s) declare that they have no competing inter-
ests.
Authors' contributions
QRZ conceived of the study, designed the experiment,
performed the multiple sequence alignment and selected
the primers. GHW did the experiments and wrote the
manuscript. ER corrected the multiple sequence align-
ment, wrote the Perl scripts and performed in silico testing
SGJ did some of the experiments. CZ supervised the whole
research.
Additional material
Additional file 1
Sequence IDs of HBV full length genome entries in GenBank, EMBL and
DDBJ (March 2007). The data provided represent all available HBV
genome sequences when the study was conducted.
Click here for file
[ />422X-4-92-S1.doc]
Publish with BioMed Central and every
scientist can read your work free of charge
"BioMed Central will be the most significant development for
disseminating the results of biomedical research in our lifetime."
Sir Paul Nurse, Cancer Research UK
Your research papers will be:
available free of charge to the entire biomedical community
peer reviewed and published immediately upon acceptance
cited in PubMed and archived on PubMed Central
yours — you keep the copyright
Submit your manuscript here:
/>BioMedcentral
Virology Journal 2007, 4:92 />Page 7 of 7
(page number not for citation purposes)
Acknowledgements
This study was supported by CAS National Knowledge Innovation Program
(KIP) (KSCX2-SW-207), Beijing Municipal Education Commission Funds
Program (KM20070025024) and Beijing Integrated Traditional and West-
ern Medicine Key Disciplines.
References
1. Norder H, Courouce AM, Magnius LO: Complete genomes, phy-
logenetic relatedness, and structural proteins of six strains of
the hepatitis B virus, four of which represent two new geno-
types. Virology 1994, 198:489-503.
2. Okamoto H, Tsuda F, Sakugawa H, Sastrosoewignjo RI, Imai M, Miya-
kawa Y, Mayumi M: Typing hepatitis B virus by homology in
nucleotide sequence: comparison of surface antigen sub-
types. J Gen Virol 1988, 69(Pt 10):2575-2583.
3. Chen BF, Kao JH, Liu CJ, Chen DS, Chen PJ: Genotypic dominance
and novel recombinations in HBV genotype B and C co-
infected intravenous drug users. J Med Virol 2004, 73:13-22.
4. Chen BF, Liu CJ, Jow GM, Chen PJ, Kao JH, Chen DS: Evolution of
Hepatitis B virus in an acute hepatitis B patient co-infected
with genotypes B and C. J Gen Virol 2006, 87:39-49.
5. Wang Z, Liu Z, Zeng G, Wen S, Qi Y, Ma S, Naoumov NV, Hou J: A
new intertype recombinant between genotypes C and D of
hepatitis B virus identified in China. J Gen Virol 2005,
86:985-990.
6. Okamoto H, Imai M, Kametani M, Nakamura T, Mayumi M:
Genomic heterogeneity of hepatitis B virus in a 54-year-old
woman who contracted the infection through materno-fetal
transmission. Jpn J Exp Med 1987, 57:231-236.
7. Orito E, Mizokami M, Ina Y, Moriyama EN, Kameshima N, Yamamoto
M, Gojobori T: Host-independent evolution and a genetic clas-
sification of the hepadnavirus family based on nucleotide
sequences. Proc Natl Acad Sci USA 1989, 86:7059-7062.
8. Blum HE: Hepatitis B virus: significance of naturally occurring
mutants. Intervirology 1993, 35:40-50.
9. Alexopoulou A, Dourakis SP: Genetic heterogeneity of hepatitis
viruses and its clinical significance. Curr Drug Targets Inflamm
Allergy 2005, 4:47-55.
10. Ngui SL, Teo CG: Hepatitis B virus genomic heterogeneity:
variation between quasispecies may confound molecular epi-
demiological analyses of transmission incidents. J Viral Hepat
1997,
4:309-315.
11. Ohishi W, Chayama K: Rare quasispecies in the YMDD motif of
hepatitis B virus detected by polymerase chain reaction with
peptide nucleic acid clamping. Intervirology 2003, 46:355-361.
12. Yim HJ, Hussain M, Liu Y, Wong SN, Fung SK, Lok AS: Evolution of
multi-drug resistant hepatitis B virus during sequential ther-
apy. Hepatology 2006, 44:703-712.
13. Gunther S, Li BC, Miska S, Kruger DH, Meisel H, Will H: A novel
method for efficient amplification of whole hepatitis B virus
genomes permits rapid functional analysis and reveals dele-
tion mutants in immunosuppressed patients. J Virol 1995,
69:5437-5444.
14. Gunther S, Sommer G, Von Breunig F, Iwanska A, Kalinina T, Sterneck
M, Will H: Amplification of full-length hepatitis B virus
genomes from samples from patients with low levels of
viremia: frequency and functional consequences of PCR-
introduced mutations. J Clin Microbiol 1998, 36:531-538.
15. Tellier R, Bukh J, Emerson SU, Miller RH, Purcell RH: Long PCR and
its application to hepatitis viruses: amplification of hepatitis
A, hepatitis B, and hepatitis C virus genomes. J Clin Microbiol
1996, 34:3085-3091.
16. Tellier R, Bukh J, Emerson SU, Purcell RH: Amplification of the
full-length hepatitis A virus genome by long reverse tran-
scription-PCR and transcription of infectious RNA directly
from the amplicon. Proc Natl Acad Sci USA 1996, 93:4370-4373.
17. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local
alignment search tool. J Mol Biol 1990, 215:403-410.
18. [ />].
19. [ />].
20. [ />].
21. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG: The
CLUSTAL_X windows interface: flexible strategies for mul-
tiple sequence alignment aided by quality analysis tools.
Nucleic Acids Res 1997, 25:4876-4882.
22. Rozen S, Skaletsky H: Primer3 on the WWW for general users
and for biologist programmers. Methods Mol Biol 2000,
132:365-386.