Principles of Molecular Epidemiology
Lecture 1: Molecular Epidemiology: overview and
definitions
National Institute of Infectious Diseases
Tokyo, Japan
January 16-20, 2017
Lee W. Riley, MD
School of Public Health, University of California, Berkeley
1
What the course will cover
Basic epidemiology concepts
Laboratory methods used to conduct epidemiologic investigations
Principles of molecular epidemiology
Principles of evolutionary biology applied to infectious diseases
Practices of molecular epidemiology: examples
Modular exercises illustrating practical approaches to molecular
epidemiologic investigations
Next generation molecular epidemiology
Paper discussion
2
What is molecular epidemiology?
First paper to use the term “Molecular Epidemiology”:
Huang, E. S., C. A. Alford, D. W. Reynolds, S. Stagno, and R. F. Pass. 1980.
“Molecular epidemiology of cytomegalovirus infections in women and their
infants.” New England Journal of Medicine.
Abstract: We studied cytomegaloviruses (CMV's) isolated from mothers and their
children to determine whether recurrent infections and transmission to the fetus in
immune women are due to reinfection or reactivation of endogenous virus. …
Endogenous CMV appears to be most frequent source of recurrent infection and
intrauterine transmission in immune women; reinfection also occurs, but less commonly.
3
What is molecular epidemiology?
Definitions:
Epidemiology: Study of the distribution and determinants of
distribution of diseases in human (and non-human animal) populations
Molecular epidemiology of infectious diseases
Study of the distribution and determinants of distribution of infectious
diseases using molecular techniques
Study of the genetics of pathogens that determine disease transmission
Next generation molecular epidemiology:
Study of the distribution and determinants of distribution of infectious
diseases using next-generation sequencing methods
Study of complex microbial niches that determine infectious and noninfectious disease occurrence.
4
Is this epidemiology?
“Molecular epidemiology of the sil streptococcal invasive locus
in group A streptococci causing invasive infections in French
children”
We found 31 different emm-toxin genotypes among 74 group A
streptococcal isolates causing invasive infections in French children. The
predominant emm types were emm1 (25%), emm3 (8%), emm4 (8%),
emm6 (7%), and emm89 (9%). Sixteen percent of isolates harbored the
streptococcal invasive locus, half of them belonging to emm4.
5
Is this epidemiology?
“Molecular epidemiology of Mycobacterium tuberculosis in an urban
area in Japan, 2002-2006”
SETTING: Shinjuku City, Tokyo, Japan. OBJECTIVE: To evaluate the status of transmission of Mycobacterium
tuberculosis in Shinjuku City to allocate resources efficiently and effectively for a successful tuberculosis (TB)
control programme.
DESIGN: Observational descriptive study combining the genotype data of M. tuberculosis with TB patient profiles.
RESULTS: The genotype clustering rate was significantly higher in males (adjusted odds ratio [aOR] 1.94, 95%CI
1.04-3.65, P = 0.038), patients aged <40 years (aOR 2.09, 95%CI 1.17-3.71, P = 0.012) and the homeless
(aOR 2.72, 95%CI 1.42-5.20, P = 0.002), and was lower for the foreign-born (aOR 0.21, 95%CI 0.06-0.76, P
= 0.017). Among 45 genotype clusters containing 152 TB patients, 26 clusters containing 102 patients (67.1%)
were composed of a mix of homeless and non-homeless patients.
CONCLUSION: The study revealed that M. tuberculosis transmission occurred more frequently among the
homeless than in non-homeless persons. However, transmission by casual contact between the homeless and the
general population was also shown to occur.
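The clustering figures in this abstract can be illustrated with a short sketch. It assumes the common working definition that a genotype cluster is two or more patients whose isolates share the same genotype; the abstract does not state the study's exact criteria. The patient records, genotype labels, and function names are hypothetical and are not data from the study.

```python
from collections import defaultdict

def genotype_clusters(patients):
    """Group patient IDs by genotype; keep genotypes shared by >= 2 patients.

    Assumption: a 'cluster' means two or more patients with the same genotype.
    """
    by_genotype = defaultdict(list)
    for patient_id, genotype, _homeless in patients:
        by_genotype[genotype].append(patient_id)
    return {g: ids for g, ids in by_genotype.items() if len(ids) >= 2}

def clustering_rate(patients):
    """Fraction of all patients who belong to any genotype cluster."""
    clusters = genotype_clusters(patients)
    clustered = sum(len(ids) for ids in clusters.values())
    return clustered / len(patients)

def mixed_cluster_share(patients):
    """Fraction of clustered patients in clusters mixing homeless and non-homeless."""
    status = {pid: homeless for pid, _genotype, homeless in patients}
    clusters = genotype_clusters(patients)
    clustered = sum(len(ids) for ids in clusters.values())
    mixed = sum(
        len(ids)
        for ids in clusters.values()
        if len({status[p] for p in ids}) == 2  # both statuses present
    )
    return mixed / clustered if clustered else 0.0

# Hypothetical records: (patient ID, genotype label, homeless?)
patients = [
    ("p1", "A", True), ("p2", "A", False), ("p3", "A", False),
    ("p4", "B", True), ("p5", "B", True),
    ("p6", "C", False), ("p7", "D", False), ("p8", "E", True),
]

rate = clustering_rate(patients)       # 5 of 8 patients are in clusters A or B
share = mixed_cluster_share(patients)  # 3 of 5 clustered patients are in mixed cluster A

# The abstract's own figure uses the same kind of ratio: 102 of 152 clustered patients.
reported_share = round(100 * 102 / 152, 1)
```

In this toy data only cluster A mixes homeless and non-homeless patients, which is the kind of pattern the abstract interprets as transmission between the two groups.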
6
Definitions: cont.
Phylogenetics: study of lines of descent or evolutionary
development of an organism
Taxonomy: the science of classification of organisms into
natural, related groups based on a factor common to each
Molecular evolution: phylogenetics based on analyses of
nucleic acid sequences to infer evolutionary relationships of
organisms
7
Definitions: cont.
Taxonomy/phylogenetics/molecular evolution:
studies relationship of organisms to each other
Molecular epidemiology: studies relationship of
organisms to each other and to their hosts within an
environmental context
8
Components of epidemiology of infectious diseases
People and nonhuman animals
Pathogen
Environment
Hypothesis generation about risks and causes
Identification of risks
Suggestions for approaches to identify causes
Devise appropriate intervention
9
Components of molecular epidemiology of infectious diseases
People and nonhuman animals
Pathogen characterized genetically
Environment
Hypothesis generation about risks and causes
Identification of risks
Suggestions for approaches to identify causes
Devise appropriate intervention
10
Epidemiology vs phylogeny/taxonomy/molecular evolution
Epidemiology
hypotheses can be generated and tested empirically
provides opportunity for intervention
not technique-dependent
Phylogenetics/taxonomy/molecular evolution
descriptive
information is inferred
intervention not implied
technique-dependent
11
Taxonomy example: Changes in the classification of Salmonella:
Before 1960s: >1000 “species”, based on O, H, and Vi antigens
(Kauffmann-White scheme)
1960s-early 80s: 3 species (S. typhi, S. choleraesuis, S. enteritidis), based
on biochemical reactions (Ewing’s classification)
Current: 2 species (S. enterica, S. bongori), based on rRNA sequence
S. enterica
6 subspecies (I, II, IIIa, IIIb, IV, VI)
2501 serotypes
S. bongori (formerly subspecies V)
12
What is “species”?
(Janda & Abbott; J Clin Microbiol. 2007)
Number of bacteria ranked at the level of species:
1980: 1,791
2012: 9,620
13
“Species” (Janda & Abbott; J Clin Microbiol. 2007)
DNA-DNA hybridization (“gold standard”):
Species definition:
>70% DNA-DNA relatedness and
5°C or less ΔTm for the stability of heteroduplex molecules
14
“Species” (Janda & Abbott; J Clin Microbiol, 2007)
16S rRNA sequences:
Species definition: strains with <97% similarity score
belong to new species
Similarity score >97%: unclear; no general agreement
September 30, 2016: 3,356,809 rRNA sequences catalogued
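A minimal sketch of how the 97% rule on this slide might be applied. It assumes the two 16S rRNA sequences are already aligned to the same length with `-` for gaps, and it scores similarity as simple percent identity over aligned columns; real pipelines use dedicated alignment tools and may score similarity differently. The function names and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b):
    """Percent of aligned columns with identical characters.

    Assumption: inputs are pre-aligned, equal-length strings with '-' as gap.
    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared

def interpret_16s(identity, threshold=97.0):
    """Apply the slide's rule: below the threshold suggests a different species.

    At or above the threshold the slide reports no general agreement, so the
    result is left unresolved rather than called 'same species'.
    """
    if identity < threshold:
        return "different species"
    return "unresolved"

# Toy aligned fragments (hypothetical, far shorter than a real 16S gene)
identity = percent_identity("ACGTACGTAC", "ACGTACGTAA")
call = interpret_16s(identity)
```

Returning "unresolved" for high similarity mirrors the point of the slide: a high 16S score alone does not establish that two strains belong to the same species.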
15
Bacteria that cannot be classified accurately by 16S rRNA sequencing
(Janda & Abbott, JCM, 2007)
Genus: Species
Aeromonas: A. veronii
Bacillus: B. anthracis, B. cereus, B. globisporus, B. psychrophilus
Bordetella: B. bronchiseptica, B. parapertussis, B. pertussis
Burkholderia: B. cocovenenans, B. gladioli, B. pseudomallei, B. thailandensis
Campylobacter: Non-jejuni-coli group
Edwardsiella: E. tarda, E. hoshinae, E. ictaluri
Enterobacter: E. cloacae
Neisseria: N. cinerea, N. meningitidis
Pseudomonas: P. fluorescens, P. jessenii
Streptococcus: S. mitis, S. oralis, S. pneumoniae
16
Pan-genome
Set of all the genes within a species
Core genome: genes found in all strains in a species
Dispensable genome: genes found in 2 or more strains of a species, but not in all
Unique genes: genes specific to one strain
[Figure: pan-genome diagram with regions labeled unique, core, and dispensable]
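The three categories can be sketched as a partition of a gene presence/absence table. This example reads "dispensable" as genes present in two or more strains but not in all of them, which is an assumption made so that the three sets do not overlap. Strain names, gene assignments, and the function name are hypothetical.

```python
def partition_pangenome(genomes):
    """Split the pan-genome into (core, dispensable, unique) gene sets.

    genomes: dict mapping strain name -> set of gene names found in that strain.
    core:        genes present in every strain
    unique:      genes present in exactly one strain
    dispensable: everything else (assumed: >= 2 strains, but not all)
    """
    n = len(genomes)
    if n < 2:
        raise ValueError("need at least two strains to separate core from unique")
    counts = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    core = {g for g, c in counts.items() if c == n}
    unique = {g for g, c in counts.items() if c == 1}
    dispensable = set(counts) - core - unique
    return core, dispensable, unique

# Hypothetical gene content for three strains
genomes = {
    "strain1": {"gyrB", "recA", "fimH", "stx2"},
    "strain2": {"gyrB", "recA", "fimH"},
    "strain3": {"gyrB", "recA", "papG"},
}

core, dispensable, unique = partition_pangenome(genomes)
pan_genome = core | dispensable | unique  # set of all genes seen in any strain
```

Here `gyrB` and `recA` fall in the core, `fimH` is dispensable, and `stx2` and `papG` are unique to one strain each.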
17
E. coli pan-genome (Kaas RS et al, BMC Genomics, 2012)
• 186 E. coli isolates
• 945,211 genes
• 16,373 gene clusters
• 3051 “soft core” genes
• 1702 “strict core” genes
“soft core”: found in at least 95% of the genomes
“strict core”: found in 100% of the genomes
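The difference between the two core definitions comes down to a presence threshold, sketched below. The sketch assumes "soft core" means present in at least 95% of genomes and "strict core" means present in all of them; the strain and gene names and the function name are hypothetical. The last two lines express the slide's core counts as a share of the 16,373 gene clusters, treating the gene clusters as the pan-genome size, which is also an assumption.

```python
def core_genes(genomes, min_percent):
    """Genes present in at least min_percent (0-100) of the genomes.

    genomes: dict mapping strain name -> set of gene names.
    Integer arithmetic avoids floating-point issues at the threshold.
    """
    n = len(genomes)
    counts = {}
    for genes in genomes.values():
        for gene in genes:
            counts[gene] = counts.get(gene, 0) + 1
    return {g for g, c in counts.items() if c * 100 >= min_percent * n}

# Hypothetical data: 20 strains; geneA in all, geneB in 19, geneC in 10
genomes = {f"strain{i}": {"geneA"} for i in range(20)}
for i in range(19):
    genomes[f"strain{i}"].add("geneB")
for i in range(10):
    genomes[f"strain{i}"].add("geneC")

strict_core = core_genes(genomes, 100)  # present in every genome
soft_core = core_genes(genomes, 95)     # present in >= 95% of genomes

# Counts from the slide, as a share of all gene clusters (assumed pan-genome size)
strict_share = round(100 * 1702 / 16373, 1)
soft_share = round(100 * 3051 / 16373, 1)
```

The soft-core threshold tolerates a gene being missed in a few genomes, for example because of incomplete assemblies, which is why the soft core is always at least as large as the strict core.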
18
Phylogenetic tree of E. coli O157:H7 strains based on their core genes
(Kaas RS et al, 2012)
19
Phylogenetic tree based on 1278 core genes of 186 E. coli strains
(Kaas et al, 2012)
20
Core/pangenome ratio (Rouli et al, New Microbes and New Infections, 2015)
21
Scope of investigation covered by epidemiology
Identifying…
disease occurrence and distribution in time and place
reservoir of infectious agents
modes and pattern of disease transmission
setting of disease transmission
pathogen-related biologic factors that influence transmission
host-related (demographic, behavioral, clinical, genetic) factors that influence
transmission
environmental factors (socioeconomic, anthropologic, ecologic) that influence
transmission
etiologic role of a microbe for a newly-recognized disease or a disease previously not
recognized to be associated with an infectious agent
22
Scope of investigations covered by next generation molecular epidemiology
Identifying …
risk factors that could not be identified by conventional or early-generation
molecular biology laboratory methods
new or hidden transmission pathways
direction of transmission of an infectious agent
endogenous reactivation vs exogenous reinfection
ecological niche from which clonal pathogenic strains are selected and
disseminate
pathogen microbial population structures associated with a syndrome
host commensal microbial population structures that determine noncommunicable disease outcomes
23
Infectious disease epidemiological problems addressed by
molecular biology techniques (2009)
Tracking strains across time and geography
Distinguishing endemic from epidemic disease occurrence
Stratification of data to refine study designs
Distinguishing pathovars vs commensal flora or saprophytes
Studying microorganisms associated with hospital or institutional infections
Identifying genetic basis for disease transmission
24
Infectious disease epidemiological problems addressed by
molecular biology techniques (2016)
Tracking strains across time and geography
Distinguishing endemic from epidemic disease occurrence
Stratification of data to refine study designs
Distinguishing pathovars vs commensal flora or saprophytes
Identifying new modes of transmission
Studying microorganisms associated with healthcare or institutional infections
Surveillance and monitoring response to intervention
Characterizing population distribution and determinants of distribution of parasitic organisms
Identifying genetic basis for disease transmission
Validating microdiversity genotyping methods applied to epidemiology
Virus quasispecies population structure analysis
Using next-generation sequencing (NGS):
Identifying direction and chain of transmission
Identifying hidden social networks and transmission links
Analyzing microbiomes to study non-infectious disease epidemiology
25