Characterizing evolutionarily conserved influenza a virus sequences as vaccine targets

CHARACTERIZING EVOLUTIONARILY CONSERVED
INFLUENZA A VIRUS SEQUENCES
AS VACCINE TARGETS

HEINY
B.Sc. (Hons), NUS

A THESIS SUBMITTED
FOR THE DEGREE OF MASTER OF SCIENCE
DEPARTMENT OF BIOCHEMISTRY
NATIONAL UNIVERSITY OF SINGAPORE
2008


Parts of this dissertation work have contributed to the following publications:
1. Heiny AT, Miotto O, Srinivasan KN, Khan AM, Zhang GL, Brusic V, Tan TW,
August JT. (2007). Conserved protein sequences of all influenza A viruses as
vaccine targets. PLoS ONE 2(11): e1190. (Citation: 1)
Contribution: Conceived, designed and performed the experiments; analyzed the
data and drafted the manuscript.
2. Miotto O, Heiny AT, Tan TW, August JT, Brusic V. (2008). Identification of
human-to-human transmissibility factors in PB2 proteins of influenza A by
large-scale mutual information analysis. BMC Bioinformatics 9 Suppl. 1: S18.
(Citation: 6; Impact Factor: 3.49)
Contribution: Collected and annotated the sequence data; analyzed the data.
3. Khan AM, Heiny AT, Lee KX, Srinivasan KN, Tan TW, August JT, Brusic V.
(2006). Large-scale analysis of antigenic diversity of T-cell epitopes in dengue
virus. BMC Bioinformatics 7 Suppl. 5: S4. (Citation: 7; Impact Factor: 3.49)
Contribution: Participated in the design of the study.
4. Khan AM, Miotto O, Heiny AT, Salmon J, Srinivasan KN, Nascimento EJ,
Marques ET Jr, Brusic V, Tan TW, August JT. (2007). A systematic
bioinformatics approach for selection of epitope-based vaccine targets. Cell
Immunol. 244(2): 141-7. (Citation: 6; Impact Factor: 1.81)
Contribution: Participated in the development of the methodology platform.
5. Zhang GL, Khan AM, Srinivasan KN, Heiny AT, Lee KX, Kwoh CK, August JT
and Brusic V. (2008). Hotspot Hunter: a computational system for large-scale
screening and selection of candidate immunological hotspots in pathogen
proteomes. BMC Bioinformatics 9 Suppl. 1: S19. (Citation: 1; Impact Factor:
3.49)
Contribution: Participated in the design of the study and pilot testing of the
system.
6. Khan AM, Miotto O, Nascimento EJM, Srinivasan KN, Heiny AT, Zhang GL,
Marques ET, Tan TW, Brusic V, Salmon J, August JT. (2008). Conservation and
Variability of Dengue virus Proteins: Implications for Vaccine Design. PLoS
Neglected Tropical Diseases 2(8): e272.
Contribution: Participated in the design of experiments, data analysis, and
methods development.


ACKNOWLEDGEMENT

I would like to express my gratitude to my supervisors, Assoc. Prof. Tan Tin Wee
(Department of Biochemistry, National University of Singapore), Dr. Vladimir Brusic
(Dana-Farber Cancer Institute, Harvard Medical School), and Prof. J Thomas August
(Johns Hopkins University), for the opportunity given to me and for their guidance,
advice, continuous support and encouragement. I would also like to thank Mr. Asif
Khan (PhD candidate), Mr. Olivo Miotto (PhD candidate), Ms. Hu Yong Li (graduate
student) and Dr. KN Srinivasan (previously a postdoctoral fellow at Prof. August’s
laboratory) for their help and suggestions; Dr. Paul August, Dr. Lin Hong Huang, and
Ms. Zhang Guang Lan for their expert knowledge in programming and technical help;
Dr. Paul Tan Thiam Joo for his critical review of the thesis; and Dr. Songsak Tongchusak,
Mr. Mark de Silva and Mr. Lim Kuan Siong for their support.



TABLE OF CONTENTS

1. Introduction
1.1. The genome, structure and life cycle of influenza A virus
1.2. Influenza virus sequences and classification
1.3. Current vaccines against influenza and their limitations
1.4. Antigenic variation of influenza A viruses
1.5. Immune responses to influenza
1.6. The approach, aims and contribution of this work
2. Materials and Methods
2.1. General thesis overview
2.2. Sequence data collection and processing
2.3. Measuring the diversity of human and avian influenza A viruses
2.4. Selection of highly conserved human and avian influenza A virus sequences
2.5. Determining the immune-relevance of the highly conserved sequences
2.5.1. HLA supertype-restricted T-cell epitopes
2.5.2. Experimentally identified T-cell epitopes
2.6. Retrospective study on the stability of highly conserved sequences
2.7. Functional sites of influenza A virus proteins
3. Results and Discussion
3.1. Summary of results
3.2. Avian and human influenza A virus sequences
3.3. The diversity of human and avian influenza A virus proteins
3.4. Highly conserved sequences of influenza A virus were present mostly in the internal proteins
3.5. Highly conserved sequences contain numerous antigenic determinants
3.5.1. Prediction data
3.5.2. Experimental data
3.6. Stability of highly conserved sequences through the viral evolutionary history
3.7. Functional sites of influenza A virus proteins
4. Conclusions and Discussion
5. Future Work
REFERENCES
APPENDIX A  WHO recommended seasonal vaccine composition
APPENDIX B  List of identified highly conserved sequences
APPENDIX C  Predicted epitopes in the highly conserved sequences
APPENDIX D  Classification of HLA I and II molecules into supertypes
APPENDIX E  Functional sites of influenza A virus proteins
APPENDIX F  Permission to reproduce copyrighted materials
APPENDIX G  Poster and oral presentations
APPENDIX H  Publications




LIST OF FIGURES

Figure 1. Influenza A virus genome and structure.
Figure 2. Influenza A virus replication cycle.
Figure 3. Outline of thesis overview.
Figure 4. Nonamer entropy plot for avian influenza A viruses for the periods 1977-1986, 1987-1996, and 1997-2006.
Figure 5. Nonamer entropy plot for avian and human H5N1 influenza A virus sequences.
Figure 6. Nonamer entropy plot of human H1N1, H3N2, and H1N2 influenza A virus sequences.
Figure 7. Entropy-sequence conservation relationship plot from data in this study.
Figure 8. Highly conserved sequences of influenza A viruses in human H1N1, H3N2, H1N2, H5N1, avian H5N1 and other avian subtypes circulating between 1997 and 2006.
Figure 9. Binding sequences present in the highly conserved sequences.
Figure 10. Functional sites of the influenza A virus PB2, PB1, PA, NP and M1 proteins.
Figure 11. Functional sites of the influenza A virus HA, NA, NS1, NS2, M2 and PB1-F2 proteins.



LIST OF TABLES

Table 1. Influenza A virus proteins and their sequence lengths.
Table 2. Public influenza sequence databases.
Table 3. A summary of sialic acid linkages and binding site preferences.
Table 4. HLA I and II molecules that were included in this thesis and their corresponding supertypes.
Table 5. Number of amino acid sequences of influenza A virus proteins from the past decade (1997-2006).
Table 6. Number of influenza A virus protein sequences prior to 1997.
Table 7. A summary of the identified highly conserved sequences in the avian and human influenza A virus sequences from 1997 to 2006.
Table 8. List of predicted epitopes within the highly conserved sequences.
Table 9. Summary of the number of predicted epitopes within the highly conserved sequences.
Table 10. Summary of the number of highly conserved sequences with predicted epitopes per HLA supertype.
Table 11. Highly conserved sequences that remain conserved in the retrospective study using data from prior to 1997.
Table 12. List of highly conserved sequences that were less than 80% conserved in sequences prior to 1997.



LIST OF ABBREVIATIONS

ABK         Aggregator of Biological Knowledge
ANN         Artificial Neural Network
APC         Antigen Presenting Cell
ARB         Average Relative Binding
AVANA       Antigenic Variability Analyser tool
BIG         Beijing Institute of Genomics
CDC         Centers for Disease Control and Prevention
cRNA        complementary RiboNucleic Acid
DC          Dendritic Cell
DMID        Division of Microbiology and Infectious Diseases
ELISA       Enzyme-Linked ImmunoSorbent Assay
ELISPOT     Enzyme-Linked Immunosorbent Spot
ER          Endoplasmic Reticulum
FDA         Food and Drug Administration
MHC         Major Histocompatibility Complex
HA or H     Hemagglutinin
HIV         Human Immunodeficiency Virus
HLA         Human Leukocyte Antigen
HMM         Hidden Markov Model
HPAI        Highly Pathogenic Avian Influenza
IC50        Median Inhibition Concentration
ICTVdb      The Universal Virus Database by International Committee on Taxonomy of Viruses
IEDB        Immune Epitope Database
IFN         Interferon
M1          Matrix protein 1
M2          Matrix protein 2
NA or N     Neuraminidase
NCBI        National Center for Biotechnology Information
NIAID       National Institute of Allergy and Infectious Diseases
nM          Nanomolar
NP          Nucleoprotein
NS1         Non-Structural protein 1
NS2         Non-Structural protein 2
PA          Polymerase Acidic protein
PB1         Polymerase Basic protein 1
PB2         Polymerase Basic protein 2
RNA         Ribonucleic Acid
RNP         Ribonucleoprotein particle
SN          Sensitivity
ssRNA       single-stranded Ribonucleic Acid
SP          Specificity
TAP         Transporter of Antigen Processing
TLR         Toll-Like Receptor
vRNA        viral Ribonucleic Acid
WHO         World Health Organization



SUMMARY
Influenza A viruses generate extreme genetic diversity through point mutation and
gene segment exchange, resulting in many new strains and variants that emerge from
the avian reservoirs, among them the highly pathogenic H5N1
virus. Given the looming threat of an influenza pandemic, a vaccine that
provides broad-spectrum coverage of influenza A subtypes and strains is a critical
need. One feasible approach is a vaccine containing conserved immunogenic protein
sequences that represent the genotypic diversity of the currently known and newly
emerged avian and human influenza viruses, as an alternative to current vaccines that
address only the known circulating virus strains. This thesis focuses on bioinformatics
approaches to characterizing the proteomes of known influenza A viruses for the
identification of highly conserved sequences that are potentially immunogenic in a
broad spectrum of the human population. Tools and methodologies were developed for
automated aggregation and annotation of influenza A virus sequences from public
databases and for identification of highly conserved sequences. A total of 36,343
sequences of various influenza subtypes from avian and human hosts were collected
and classified into six major subgroups (human H1N1, human H1N2, human H3N2,
human H5N1, avian H5N1 and other avian subtypes) for analysis. Fifty-five (55)
highly conserved sequences that were present in at least 80% of all sequences from
the past decade (1997-2006) in each virus subgroup were defined; they were present
mostly in the internal proteins PB2, PB1, PA, NP and M1. Forty-nine (49) of the 55
conserved sequences contained clusters of potential and reported T-cell epitopes.
Forty-nine (49) conserved sequences remained unchanged in at least 80% of human
and avian sequences prior to 1997. Many of the highly conserved sequences are
located in functionally important domains or sites of influenza A proteins, suggesting
their relative importance to virus survival. The identified sequences that are both
highly conserved and immune-relevant are suitable candidates for a T-cell
epitope-based vaccine and can be designed to provide a continuum of immune responses to
influenza A infection.

(Word Count: 330)




1. Introduction

One of the most important threats to human health is infection by influenza A viruses,
which have their natural reservoir in aquatic birds (de Jong et al., 2000; Treanor, 2004;
Killbourne, 2006). While global influenza pandemics have occurred only three times
in the past century, the H1N1 pandemic of 1918-1919 caused an estimated 20-50
million deaths, making it one of the most serious disease outbreaks in recorded
history. The recent evolution of the highly pathogenic avian influenza (HPAI) virus of
the H5N1 subtype, while not human-to-human transmissible, has emphasized the
continuous threat of influenza viruses on a global scale (Peiris et al., 2007). The initial
outbreak of the HPAI virus in poultry (Hong Kong, 1997) was soon followed by
reports of human infections by avian influenza H5N1 with a high fatality rate. Since
then, the World Health Organization (WHO) has reported a total of 382 human cases
in 14 countries with 241 deaths (as of 30 April 2008). Outbreaks of HPAI virus of the H7
subtype in poultry have also been reported, with several confirmed cases of human
infection (Belser et al., 2008). It is widely predicted that, because of increased
human population density and increased travel, a new pandemic on the scale of the
1918-1919 H1N1 pandemic would have a devastating effect globally.
Besides the threat of a pandemic, influenza A virus also poses challenges in the
form of recurrent flu epidemics. Influenza epidemics occur periodically, affecting the
majority of world populations and all age groups. In the United States, 5-20% of the
population is affected, resulting in some 200,000 hospitalizations and 30,000 deaths
each year (Centers for Disease Control and Prevention). Singapore is also affected by
annual influenza: 20% of the population is estimated to have
flu with clinical symptoms, with a mortality rate of approximately 14.8 per 100,000
person-years (Lee et al., 2007). The socioeconomic burden caused by influenza from
hospitalization costs, loss of work productivity, and death is huge (Szucs, 1999).
Molinari et al. (2007) conducted a study to estimate the economic impact of influenza
epidemics in the United States based on the 2003 population and concluded that the total
economic burden of seasonal influenza, including both direct and indirect costs, was
more than $87 billion per year.
Vaccination has been the main strategy for prevention and control of the disease
(Nichol, 2008). However, despite the availability of seasonal influenza vaccines, their
efficacy varies, depending on the match between the vaccine strains and the circulating
virus strains (Carrat and Flahault, 2007). This variation is mainly caused by the ability of
influenza A virus to undergo rapid genetic changes known as antigenic drift, and
occasionally major changes in the form of antigenic shift. As a result of such
variability, vaccination can target only a small number of selected circulating
strains. In addition, the situation is further complicated by the lack of a proportional
relationship between genetic change and antigenic change (Smith et al., 2004), making it
difficult to anticipate the effectiveness of a preemptive vaccine formulation.
The issue of matching the vaccine formulation to viruses in circulation is
complicated by the lengthy process of vaccine production. The manufacturing process
of the seasonal flu vaccine, which is based on technology that is more than 60 years
old, takes about six to eight months. It includes growing the selected strains of viruses
in eggs for distribution to vaccine manufacturers, expansion of virus seed pools in
embryonated chicken eggs, harvesting of virions from the allantoic fluids and
inactivation, purification of hemagglutinin (HA) and neuraminidase (NA) subunits, and
the final blending and packaging (Treanor, 2004). Because of the lengthy vaccine
production process, manufacturing has to begin well in advance of a flu
season. The choice of strains for inclusion in the yearly formulation relies on the
epidemiological data of the previous season. These data may not necessarily be
predictive of the upcoming epidemic strains because of the rapidly changing characteristics of
the surface glycoproteins of influenza A viruses. The surface glycoproteins are the
main antibody targets that mediate protection against seasonal influenza.
In addition, the responses to vaccination also vary in the general population
because of genetic differences in the immune system components (Ovsyanikova et al.,
2004; Poland et al., 2008). One example is the highly polymorphic human
leukocyte antigen (HLA) molecules, key components of the immune response. HLA
diversity was identified to be responsible for the spectrum of responses
observed following influenza vaccination, including failure to mount an immune
response to vaccination (Gelder et al., 2002; Lambkin et al., 2004).
Alternative vaccine strategies that overcome the problem of rapid viral
mutation, are applicable to global populations, and allow easy production are crucial
in the face of pandemics. The rapid mutation of the virus glycoproteins (the HA and NA
proteins) facilitates the selective replication of new virus strains not subject to
immunity based on previous vaccination and is a serious obstacle to the effectiveness
of these vaccines (Killbourne et al., 2002; Ghedin et al., 2005). Many studies have
reported efforts to develop new vaccines that provide heterologous protection against
multiple virus strains and long-term immunity by focusing on conserved viral features (reviewed in
Ben-Yedidia and Arnon, 2007 and McMurry et al., 2008). Short
conserved virus sequences generally trigger the cell-mediated arm of the
immune system, which plays a role in virus clearance during infection.
Cellular immune responses are recognized to play a role in influenza
immunity (Townsend, 1987; Lamb et al., 1987; Askonas et al., 1988; McMichael and
Gotch, 1989; Gianfrani et al., 2000; Rimmelzwaan et al., 2007) and the application of
T-cell epitopes has been extensively studied as an alternative to vaccines designed for
humoral immunity (Ulmer et al., 1993 and 1998; Fomsgaard et al., 1999; Fu et al.,
1999; Swain et al., 2006; Thomas et al., 2006). Cell-mediated immunity is based upon
the binding of short sequences of antigen proteins, termed T-cell epitopes, to
specialized cellular proteins, known as human leukocyte antigens (HLAs). There are
two major HLA classes: class I (HLA I) and class II (HLA II). HLA molecules
facilitate the presentation of epitopes to T-cells of the immune system (Germain,
1994; Cresswell et al., 2005; Trombetta and Mellman, 2005). The chemical and
structural determinants of HLA-peptide binding have been defined for a number of
HLA alleles (Falk et al., 1991; van Bleek and Nathenson, 1991; Hammer et al., 1993;
Engelhard, 1994; Rammensee et al., 1995). Of particular relevance for vaccine design
are supertype groupings of similar HLA alleles that display overlapping
peptide-binding capacities (Sidney et al., 1995; Southwood et al., 1998; Sette and Sidney,
1999; Lund et al., 2004; Doytchinova and Flower, 2005; Reche and Reinherz, 2007;
Sidney et al., 2008). The supertypes cover a large fraction of the HLA diversity in the
human population and antigen epitopes that bind to the supertypes are considered
prime candidates for vaccine formulations (Bian and Hammer, 2004; Khan et al.,
2006; Ben-Yedidia and Arnon, 2007; Zhang et al., 2008). Supertype-binding motifs
and quantitative matrices have been incorporated into several computational
prediction algorithms and it is now possible to identify, in silico, candidate
HLA-restricted T-cell epitopes of protein sequences, allowing large-scale analysis of
potential vaccine targets (Bian and Hammer, 2004; Bui et al., 2005; Larsen et al.,
2005; Zhang et al., 2005). Moreover, increasing attention is being given to
T-cell-based vaccines because they can be designed as genetic formulations to include
selected regions of the viral antigens (Wilson et al., 2003; Sette and Fikes, 2003;
Ben-Yedidia and Arnon, 2007; Brusic and August, 2004; Fischer et al., 2007).
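To make the idea of a quantitative matrix concrete, the following minimal Python sketch scores every overlapping nonamer of a protein against a position-specific matrix and reports those above a cutoff. It is an illustration only: the matrix values, the cutoff, the function names and the test sequence are all hypothetical, and the sketch does not reproduce any of the published prediction systems cited above.

```python
# Hypothetical position-specific scores (NOT real binding data).
# Keys are 0-based positions within a 9-mer; residues not listed score 0.
TOY_MATRIX = {
    1: {"L": 2.0, "M": 1.5, "I": 1.0},  # second residue, a typical anchor position
    8: {"V": 2.0, "L": 1.5, "I": 1.0},  # ninth (C-terminal) residue
}


def score_nonamer(peptide, matrix=TOY_MATRIX):
    """Sum the per-position scores of a 9-residue peptide."""
    if len(peptide) != 9:
        raise ValueError("expected a peptide of exactly 9 residues")
    return sum(matrix.get(i, {}).get(aa, 0.0) for i, aa in enumerate(peptide))


def predict_binders(protein, matrix=TOY_MATRIX, cutoff=3.0):
    """Slide a 9-residue window along the protein and keep high-scoring peptides.

    Returns a list of (start, peptide, score) tuples with 0-based start positions.
    """
    hits = []
    for start in range(len(protein) - 8):
        peptide = protein[start:start + 9]
        score = score_nonamer(peptide, matrix)
        if score >= cutoff:
            hits.append((start, peptide, score))
    return hits


# Made-up 11-residue sequence used only to exercise the functions.
hits = predict_binders("GALAAAAAAVG")
```

Real matrices are derived from experimental binding data and usually assign a score to every residue at every position; the sparse two-anchor matrix here is chosen only to keep the example short.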
Many prediction algorithms, developed using experimental data, are available
for the steps of cell-mediated immunity, such as peptide binding to
human leukocyte antigen (HLA) molecules, transport by the transporter of antigen
processing (TAP), and proteasomal cleavage (Peters et al., 2006; Lin et al., 2008; Wang et al., 2008). The
availability of effective models in the public domain facilitates the assessment of the
immune-relevance of the highly conserved sequences. However, modeling of
conformational B-cell epitopes is much more complex; the currently available models
are of limited applicability (Lundegaard et al., 2007) and therefore analysis of B-cell
epitopes is not considered in this thesis.
This thesis focuses on bioinformatics approaches in characterizing the
proteomes of known influenza A viruses for identification of highly conserved
sequences that are potentially immunogenic in a broad spectrum of the human
population. Information entropy and consensus sequence methodologies were
combined to identify sequences of nine amino acids or longer with a history of
complete conservation in 80% or more of both avian and human virus strains. These
conserved sequences were further analyzed to identify targets for candidate
epitope-based T-cell vaccine formulations against all current and possibly future influenza A
pathogens of avian or human origin. In addition, the highly conserved sequences were
mapped to a set of known functional sites of influenza A proteins.
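As an illustration only, the combination of nonamer entropy and consensus frequency described above might be sketched in Python as follows. The function names, the toy alignment, the skipping of gapped windows and the use of gap-free windows as the denominator are assumptions made for this sketch; they are not the tools developed in this thesis.

```python
import math
from collections import Counter


def nonamer_counts(aligned_seqs, start, k=9):
    """Count the distinct k-mers that start at alignment column `start`."""
    windows = (seq[start:start + k] for seq in aligned_seqs)
    # Assumption: windows that contain a gap or run past the end are ignored.
    return Counter(w for w in windows if len(w) == k and "-" not in w)


def shannon_entropy(counts):
    """Shannon entropy, in bits, of the k-mer distribution at one position."""
    total = sum(counts.values())
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conserved_positions(aligned_seqs, k=9, threshold=0.80):
    """Yield (start, consensus k-mer, frequency, entropy) for each position
    whose most common k-mer reaches the frequency threshold."""
    length = min(len(seq) for seq in aligned_seqs)
    for start in range(length - k + 1):
        counts = nonamer_counts(aligned_seqs, start, k)
        if not counts:
            continue
        kmer, n = counts.most_common(1)[0]
        frequency = n / sum(counts.values())
        if frequency >= threshold:
            yield start, kmer, frequency, shannon_entropy(counts)


# Toy alignment (made-up sequences): one of five differs at a single column.
toy_alignment = ["MKTIIALSYIF"] * 4 + ["MKTIIALSYVF"]
results = list(conserved_positions(toy_alignment))
```

Overlapping conserved nonamers at consecutive positions could then be merged to obtain conserved sequences longer than nine residues; that merging step is omitted here.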



This chapter provides the background to the work conducted in this thesis,
encompassing the biology of influenza A virus and the immune response to influenza
infection. The challenges posed by the rapid evolution of influenza A viruses and the
complexity of the human immune system are discussed. The limitations of the existing
vaccine formulations and what is needed to move the field forward are highlighted.
This chapter ends with the aims and contribution of this thesis in addressing the
current limitations in the selection of influenza vaccine targets.

1.1. The genome, structure and life cycle of influenza A virus

Influenza A viruses belong to the family Orthomyxoviridae. This family of RNA
viruses also includes influenza virus B, influenza virus C, Thogotovirus and Isavirus
(Krossøy et al., 1999; ICTVdb, The Universal Virus Database). The three types of
influenza viruses
(A, B and C) are differentiated based on the antigenic property of nucleoprotein (NP)
and matrix proteins (Cox et al., 2004). Influenza A virus has wide host range
specificity and is the causative agent of all flu pandemics. On the other hand, influenza
B and C viruses have more limited host range and do not evolve as rapidly. Epidemics
in humans are generally caused by influenza A and B viruses. This thesis focuses on
influenza A viruses. The following sections describe the biology of influenza A
viruses.




Influenza A virus genome and structure
Influenza viruses are negative-strand, enveloped ribonucleic acid (RNA) viruses. The
genome of influenza A viruses consists of eight single-stranded RNA segments that
encode at least eleven proteins (Table 1). Each segment of the genome is
independently encapsidated by the viral nucleoprotein (NP) and associated with the
polymerase complex (Ruigrok, 1998). The complex of viral RNA, NP and polymerase is
called a ribonucleoprotein (RNP) particle.
Table 1. Influenza A virus proteins and their sequence lengths.
RNA Segment   Protein Product                     Protein Length*   GenPept Accession
1             Polymerase basic protein 2 (PB2)    759 aa            NP_040987
2             Polymerase basic protein 1 (PB1)    757 aa            NP_040985
2a            PB1-F2 protein (PB1-F2)             87 aa             YP_418248
3             Polymerase acidic protein (PA)      716 aa            NP_040986
4             Hemagglutinin (HA)                  566 aa            NP_040980
5             Nucleoprotein (NP)                  498 aa            NP_040982
6             Neuraminidase (NA)                  454 aa            NP_040981
7a            Matrix protein 1 (M1)               252 aa            NP_040978
7b            Matrix protein 2 (M2)               96 aa             NP_040979
8a            Nonstructural protein 1 (NS1)       230 aa            NP_040984
8b            Nonstructural protein 2 (NS2)       121 aa            NP_040983
* The protein length is based on the reference strain A/PR/8/34 (H1N1)
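Because the analysis in this thesis is based on nonamers, it can be useful to see how many overlapping nine-residue windows each protein in Table 1 contributes. The short Python sketch below computes this from the reference lengths in Table 1, using the fact that a protein of length L contains L - 9 + 1 overlapping nonamers. The variable and function names are illustrative assumptions; actual sequences in the databases may differ in length from the A/PR/8/34 reference.

```python
# Protein lengths (amino acids) copied from Table 1, reference strain A/PR/8/34 (H1N1).
PROTEIN_LENGTHS = {
    "PB2": 759, "PB1": 757, "PB1-F2": 87, "PA": 716, "HA": 566, "NP": 498,
    "NA": 454, "M1": 252, "M2": 96, "NS1": 230, "NS2": 121,
}


def window_count(length, k=9):
    """Number of overlapping k-residue windows in a sequence of the given length."""
    return max(length - k + 1, 0)


# Nonamer windows contributed by each protein.
nonamer_windows = {name: window_count(n) for name, n in PROTEIN_LENGTHS.items()}

total_residues = sum(PROTEIN_LENGTHS.values())
total_windows = sum(nonamer_windows.values())

for name, count in nonamer_windows.items():
    print(f"{name:7s} {PROTEIN_LENGTHS[name]:4d} aa  {count:4d} nonamer positions")
print(f"Total   {total_residues:4d} aa  {total_windows:4d} nonamer positions")
```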


The viral genome is located inside a shell of multiple copies of M1 protein.
M1 protein lines the viral lipid membrane, which is derived from the plasma membrane of the
infected cell during the budding process. The lipid membrane is embedded with
hemagglutinin (HA), neuraminidase (NA), and the membrane-channel protein, M2.
Diagrammatic representations of the influenza A virus RNP particle and viral particle
structure are shown in Figure 1.



Figure 1. Influenza A virus genome and structure.
(Left) The ribonucleoprotein (RNP) particle consists of viral RNA, nucleoprotein
(NP) and polymerase complex (PA, PB1, PB2). (Right) Influenza A virus particle
with eight RNP particles inside a shell of lipid envelope; the matrix 1 protein (M1)
lines the envelope, with hemagglutinin (HA), neuraminidase (NA), and matrix protein
2 (M2) embedded in the lipid layer. [Reproduced with permission from: Cox et al.,
2004]
HA and NA are the two major targets of antibody responses to influenza A
viruses. The HA and NA spikes on the surface of the virus particle occur in a ratio of
approximately 8:1 (de Jong et al., 2000). Hemagglutinin is the viral attachment
protein that facilitates the entry of influenza virus into the host cells, the first step in
viral infection. HA binds the sialic acid receptors on the host cell surface and
mediates fusion between the viral and endosomal membranes of the host cell prior to the
transfer of nucleocapsids into the host cytoplasm (Steinhauer and Wharton, 1998).
The HA protein is synthesized as a single polypeptide (HA0) and subsequently
cleaved into two segments, HA1 and HA2, which produces the active form of the viral
particle (de Jong et al., 2000). NA is an enzyme that helps the release of newly
formed viral particles from the infected cells (Nicholson et al., 2003), facilitating the
cell-to-cell spread of virus. NA also functions as a signal for virus transport to the cell
membrane (Webster et al., 1992).


Polymerase basic protein 1 (PB1), polymerase basic protein 2 (PB2), and
polymerase acidic protein (PA) are components of the RNA polymerase that play a
role in transcription and replication of the influenza A viruses (Hay, 1998;
Kolpashchikov et al., 2004). PB1-F2 is a protein product of an alternative reading
frame of the PB1 gene, reported by Chen et al. (2001). It was proposed that PB1-F2 plays
a role in the death of host immune cells triggered by influenza virus infection.
However, PB1-F2 protein is not always encoded by influenza A viruses (Shaw et al.,
2008). Nucleoprotein (NP) binds and coats the viral RNA together with the RNA
polymerases to form the RNP particle for RNA transcription, replication and packaging
of the virus (Webster et al., 1992; Portela and Digard, 2002).
Matrix protein M1 is involved in the nuclear export process of the RNP
complexes (Arzt et al., 2004). Matrix protein M2 serves as an ion channel that
regulates the internal pH of influenza A virus (Nicholson et al., 2003). Along with
NA, M2 protein also serves as a signal for virus transport to the cell membrane
(Webster et al., 1992).
Nonstructural protein 1 (NS1) prevents the activation of interferon (IFN) that
inhibits viral replication in the host cells (Yewdell and García-Sastre, 2002) among
many other functions in disrupting the host innate and adaptive immune responses
against influenza infection (Fernandez-Sesma, 2007). Nonstructural protein 2 (NS2), a
nuclear export protein, contains the signal that mediates the transport of RNP
complexes out of the nucleus for assembly into mature virus particles (Yewdell and
García-Sastre, 2002). The biology of influenza A viruses and the functions of viral
proteins are reviewed in Cheung and Poon, 2007.




These eleven proteins encoded by the influenza A virus genome were included in
the analysis of conserved sequences. Specific sites in the influenza A proteins that are
known to be critical to virus activities (functional sites) were identified in this thesis
through literature review, to assess the relative functional importance of the highly
conserved sequences.

The life cycle of influenza A virus
The transcription and replication of influenza viruses take place in the cell nucleus
(Herz et al., 1981; Jackson et al., 1982). The genome of influenza viruses, which
consists of negative-strand RNAs, acts as the template for the synthesis of messenger
RNA (mRNA) and complementary RNA (cRNA). The replication cycle starts when virus
particles bind to host cells through the interaction between sialic acid receptors on
the cell surface and the cleavage-activated HA on the virus surface.
Figure 2 illustrates the replication cycle of an influenza A virus particle. The
uptake of the attached virus particle into the host cell is mediated by endocytosis.
The acidic environment in the endocytic vesicle triggers HA conformational changes that
facilitate fusion of the viral and endosomal membranes, which releases the viral
genome, consisting of eight vRNPs, into the cytoplasm.
Subsequently, the vRNPs are transported to the nucleus for transcription and
replication. The vRNAs of the virus progeny are synthesized from cRNAs. The NP protein
acts as a scaffold to encapsidate the cRNAs and vRNAs. New vRNP complexes are
formed and exported from the nucleus into the cytoplasm. Meanwhile, the
structural proteins HA, NA, and M2 of the new virus progeny are produced in the
endoplasmic reticulum (ER) and post-translationally modified in the Golgi apparatus.
At the apical surface of polarized epithelial cells, the newly synthesized vRNP
complexes and the structural proteins assemble to form new influenza virions, which
eventually bud off from the host cell membrane. In addition to the virus-encoded
proteins, many host cellular proteins are incorporated into the virus particle
during the budding process. Shaw et al. (2008) identified as many as 36 cellular
proteins in purified influenza virus particles using mass spectrometry
and immunoblot assays, including cytoskeletal proteins, annexins, and glycolytic
enzymes. For reviews of the influenza A virus life cycle, see Mikulásová et al. (2000)
and Cheung and Poon (2007).
Viral proteins, both those that enter the host as part of the virus particle and those
synthesized de novo in the host cells, can be processed and presented by
antigen-presenting cells (APCs) (Hackett and Eisenlohr, 1990). APCs present
antigens to T cells and subsequently activate adaptive immune responses that play a
role in clearing virus infection (Section 1.5).

Figure 2. Influenza A virus replication cycle.
Reproduced with permission from: McSwiggen and Seth (2008)

1.2. Influenza virus sequences and classification

Influenza A virus sequences in public databases
In recent years, especially since the outbreak of highly pathogenic avian influenza
H5N1 virus in poultry in 1997, tremendous efforts have been put into the large-scale
collection, sequencing, and analysis of influenza viruses (Ghedin et al., 2005;
Obenauer et al., 2006), as well as into storing the resulting sequences and information
in the public domain (Chang et al., 2007; Bao et al., 2008; Squires et al., 2008). The
publicly accessible influenza sequence databases include the Influenza Virus Resource
by the National Center for Biotechnology Information (NCBI), BioHealthBase for
Influenza Virus by the National Institute of Allergy and Infectious Diseases (NIAID),
and the Influenza Virus Database by the Beijing Institute of Genomics (Table 2). For
the purpose of this thesis, the influenza A virus sequences were primarily obtained
from the sequence database at NCBI. At the time of sequence data collection, the
Influenza Virus Resource was not available. Hence, incomplete record annotations
(isolate name, country and year of isolation, host organism, subtype and protein name)
were completed by manual curation (Section 2.2) based on information from the
corresponding literature and mirror entries in the UniProtKB/Swiss-Prot database
(Boutet et al., 2007).
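
The annotation fields listed above can be viewed as a fixed record in which any
empty field marks an entry that still needs manual curation. The Python sketch below
is a hypothetical illustration only and does not reproduce the curation procedure of
Section 2.2; the class `SequenceRecord`, the function `missing_annotations`, and the
`accession` field are assumed names introduced for this example.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SequenceRecord:
    """One sequence entry with the annotation fields named in the text."""
    accession: str                       # hypothetical record identifier
    isolate_name: Optional[str] = None
    country: Optional[str] = None
    year: Optional[str] = None
    host: Optional[str] = None
    subtype: Optional[str] = None
    protein_name: Optional[str] = None


def missing_annotations(record: SequenceRecord) -> List[str]:
    """Return the names of annotation fields that are empty or absent."""
    return [
        f.name
        for f in fields(record)
        if f.name != "accession" and getattr(record, f.name) in (None, "")
    ]


# Placeholder record: the accession is made up; the other values come from the
# "A/Singapore/1/57 (H2N2)" example used in this chapter.
example = SequenceRecord(
    accession="EXAMPLE_0001",
    isolate_name="A/Singapore/1/57",
    country="Singapore",
    year="1957",
    subtype="H2N2",
)
to_curate = missing_annotations(example)  # fields still to be filled by hand
```

Records for which the returned list is non-empty would be the ones flagged for
manual lookup in the literature or in mirror database entries.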

Table 2. Public influenza sequence databases.

Database | Source of Data
Influenza Virus Resource, by National Center for Biotechnology Information (NCBI) | Annotated sequences from the NIAID Influenza Genome Sequencing Project and GenBank (Bao et al., 2008)
BioHealthBase for Influenza Virus, by The National Institute of Allergy and Infectious Diseases (NIAID), Division of Microbiology and Infectious Diseases (DMID) | Curated sequences from NCBI, UniProt, and Immune Epitope Database (Squires et al., 2008)
Influenza Virus Database, by Beijing Institute of Genomics (BIG), Chinese Academy of Sciences | Curated sequences of isolates from different parts of China (Chang et al., 2007)

The availability of a huge number of sequences is useful for understanding the
biology of influenza viruses and for finding preventive measures or cures for influenza
infection. At the same time, the large amount of data poses challenges in terms of
the technologies available for analysis and the interpretation of the results.
Advances in bioinformatics help to address this problem and can be exploited
for systematic, large-scale identification of suitable vaccine targets. Firstly,
bioinformatics analysis of virus sequences can help narrow down the selection of
targets for experimental validation, which saves cost and time. Secondly,
bioinformatics tools can be employed for the analysis and interpretation of
experimental results. The output would be selected vaccine targets for further
development. This thesis focuses on the first step of this process, that is, building
the methodological workflow for the selection of suitable vaccine target candidates
for influenza A viruses for further experimental validation. The methodology
developed in this thesis can also be applied to other small viruses. The author of
this thesis helped to develop and implement methods for the study of dengue virus
(Khan et al., 2006 and 2008), West Nile virus (Koo et al., manuscript in preparation),
human immunodeficiency viruses (HIV) (Hu et al., work in progress) and rabies virus
(Heiny, work in progress).

Classification of influenza A viruses

The conventional classification of influenza A viruses is based on the antigenic
differences of the HA and NA surface glycoproteins. There are 16 HA subtypes and 9
NA subtypes classified to date. All HA and NA subtypes have been isolated from the
aquatic bird reservoir (Fouchier et al., 2005), while only the H1N1, H2N2, H3N2,
H5N1, H7N7 and H9N2 subtypes have been identified in human isolates (reviewed in
Cheung and Poon, 2007). The system used for classification of influenza viruses
underwent changes in 1953, 1959, 1971, 1979, and 1980 (WHO, 1953, 1959, 1971,
1979 and 1980). The revised system of nomenclature for influenza A viruses consists
of a type and strain designation, and a description of the subtype of the HA and NA.
The strain designation contains information on the antigenic type of the virus (A for
influenza A viruses), the host of origin (indicated only for strains isolated from
non-human sources), the geographical origin (where the virus was isolated), the strain
number by order of identification, and the year of isolation. The strain designation
is followed by the antigenic description of the HA and NA subtypes in parentheses. For
example, “A/duck/Alberta/35/76 (H1N1)” corresponds to influenza A virus strain
number 35 of subtype H1N1, isolated from a duck in Alberta in 1976, while
“A/Singapore/1/57 (H2N2)” corresponds to influenza A virus strain number 1 of
subtype H2N2, isolated from a human (the human source is not indicated in the
name) in Singapore in 1957.
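
Because the strain designation has a regular structure, it can be split into its
fields mechanically. The Python sketch below is illustrative only and is not part of
the thesis workflow; the names `StrainName` and `parse_strain_name` are hypothetical,
and the pattern assumes the simple forms shown in the two examples above (it does not
handle recombinant designations or irregular names).

```python
import re
from typing import NamedTuple, Optional


class StrainName(NamedTuple):
    virus_type: str     # antigenic type, e.g. "A"
    host: str           # host of origin; "human" when omitted from the name
    location: str       # geographical origin
    strain_number: str  # strain number by order of identification
    year: str           # year of isolation exactly as written, e.g. "76"
    subtype: str        # HA and NA subtype, e.g. "H1N1"


# Assumed pattern: type/[host/]location/number/year (HxNy).
_STRAIN_PATTERN = re.compile(
    r"^(?P<type>[A-C])/"
    r"(?:(?P<host>[^/]+)/)?"          # host is present only for non-human sources
    r"(?P<location>[^/]+)/"
    r"(?P<number>[^/]+)/"
    r"(?P<year>\d{4}|\d{2})\s*"
    r"\((?P<subtype>H\d+N\d+)\)$"
)


def parse_strain_name(name: str) -> Optional[StrainName]:
    """Parse a strain designation; return None if it does not fit the pattern."""
    match = _STRAIN_PATTERN.match(name.strip())
    if match is None:
        return None
    return StrainName(
        virus_type=match.group("type"),
        host=match.group("host") or "human",
        location=match.group("location"),
        strain_number=match.group("number"),
        year=match.group("year"),
        subtype=match.group("subtype"),
    )


duck = parse_strain_name("A/duck/Alberta/35/76 (H1N1)")
human = parse_strain_name("A/Singapore/1/57 (H2N2)")
```

The year is kept as written rather than expanded to four digits, since a two-digit
year is ambiguous without additional information about the isolate.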
A system for the designation of recombinant viruses was also recommended,
particularly for laboratory-derived strains. A final “R” is added after the strains of
origin designation for recombinant strains that do not involve HA and NA genes. For
hybrid recombinants that involve HA and NA genes, the strains from which the H and
N antigens originate are indicated in the nomenclature, in addition to “R”. For example,
“A/BEL/42(H1)—Singapore/1/57(N2)R” corresponds to hybrid recombinant of
A/BEL/42 and A/Singapore/1/57, where the HA gene of H1 subtype was contributed by
the former virus strain and the NA gene of N2 subtype was contributed by the latter. In
this thesis, recombinant viruses were excluded, given that they are likely to
represent laboratory-derived strains.

1.3. Current vaccines against influenza and their limitations

Eradication of influenza infection is practically impossible because wild aquatic
birds act as the reservoir for these viruses. Therefore, good preventive and
therapeutic measures are necessary to control the infection and its clinical and
socio-economic impact. The existing seasonal influenza vaccines have
been in use since 1945, with proven protection against the virus when there is a good
match between the vaccine strains and the circulating strains. However, historical
records show that at times the vaccine strains poorly matched the epidemic strains
(Kilbourne et al., 2002). The limitation lies in the difficulty of predicting the
antigenicity of the viruses from their genetic sequences. Smith et al. (2004) showed
that the relationship between genetic and antigenic changes is not linear and that
small genetic changes could have a disproportionately large effect on antigenicity.
In addition to the antigenic variability of the viruses, the response to
influenza vaccination is further complicated by the varying degree of protection in
different individuals due to the genetic polymorphisms of the HLA system (Lambkin et
al., 2004; Ovsyannikova et al., 2004). This thesis aims to improve the background
knowledge for the current vaccine formulation strategy. Specifically, the author of
this thesis has analyzed the virus diversity in the context of immune system
variability. The author focused on the identification of highly conserved regions in
the virus and the analysis of their antigenic potential. The author has identified
conserved regions of the influenza A proteome that are likely to be immunogenic in
the majority of the human population.


Seasonal vaccines
The seasonal influenza vaccines consist of three components: two strains of the
predominantly circulating influenza A viruses, and one strain of the predominantly
circulating influenza B virus. Appendix A lists the seasonal vaccine compositions
recommended by the World Health Organization in the past decade for both the northern
and southern hemispheres. There are mainly two types of influenza vaccines on the
market: (1) trivalent inactivated influenza vaccines and (2) live-attenuated influenza
vaccines (reviewed in Tosh and Poland, 2008). Examples of commercial vaccines
are Fluarix® split inactivated virus vaccine by GlaxoSmithKline, FluMist® live
attenuated virus vaccine by MedImmune Vaccines, Fluvirin® split inactivated virus
vaccine by Chiron Corporation and Fluzone® split inactivated virus vaccine by Aventis
Pasteur. Typically, the manufacturing and production process for seasonal influenza
vaccines takes approximately 7 months, not including the time for purification,
testing, packaging and shipping (Treanor et al., 2004).

Pandemic vaccines
In preparation for a possible flu pandemic caused by H5N1, the U.S. Food and
Drug Administration (FDA) approved the use of a human vaccine against avian
influenza virus H5N1 in April 2007 (US FDA, 2007). The European Union authorized
the use of a human prepandemic vaccine against H5N1 in May 2008 (Scully, 2008).

In addition, numerous clinical trials for candidate vaccines against H5 viruses
have been initiated (Dennis, 2006; Keitel and Atmar, 2007). Most of the efforts in
developing new vaccines against avian influenza focus on neutralizing antibody
responses induced by the hemagglutinin protein (Hobson et al., 1972). This type of
protection is highly specific: it is type- and subtype-specific, and often also
strain-specific. Therefore, it is a challenge to keep up with the changes in the virus
surface glycoproteins.
The analysis of isolates from other species, such as swine and horse, would
also be of interest because of the close physical proximity of these two species to
humans. However, as human and bird isolates represent the large majority of known
influenza A sequences, they provide the most representative data sets. This thesis,
therefore, focuses on the analysis of human and bird influenza A viruses.

Limitations in the existing vaccine design
The main limitations of the existing influenza vaccine strategies are: (1) the
requirement for annual changes of vaccine composition to follow the virus changes and
(2) the variability in the response to vaccination, and therefore in vaccine
effectiveness, across the human population.

Strain-specific vaccines require frequent update
Existing vaccines against influenza are focused on the neutralizing antibody responses
against the virus glycoproteins. Due to the extreme diversity of the virus
glycoproteins, the protection conferred by the vaccines is subtype-specific, and at
times strain-specific. These vaccines suffer from limited effectiveness (depending on
the antigenicity match between the vaccine and the circulating strain) and limited
