Identifying the effect of exposure to dioxin and furans on human health leading to diffuse large b lymphoma through gene network construction

THAI NGUYEN UNIVERSITY
UNIVERSITY OF AGRICULTURE AND FORESTRY

NGUYEN THI QUYNH LAM

IDENTIFYING THE EFFECT OF EXPOSURE TO DIOXIN AND FURAN
ON HUMAN HEALTH LEADING TO DIFFUSE LARGE B LYMPHOMA
THROUGH GENE-NETWORK CONSTRUCTION

BACHELOR THESIS

Study Mode: Full-time
Major: Environmental Science and Management
Faculty: International Programs Office
Batch: 2013 - 2017

Thai Nguyen, December 2017


DOCUMENTATION PAGE WITH ABSTRACT

Thai Nguyen University of Agriculture and Forestry
Degree Program: Bachelor of Environmental Science and Management
Student name: Nguyen Thi Quynh Lam
Student ID: DTN1353110372
Thesis Title: Identifying the effect of exposure to dioxin and furans on human health
leading to diffuse large B lymphoma through gene-network construction
Supervisor(s): Prof. Chun-Yu Chuang, Assoc. Prof. Tran Thi Thu Ha

Abstract:
Many studies have indicated that long-term exposure to dioxins and dioxin-like compounds,
e.g., 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) and polychlorinated dibenzofurans
(furans), can induce several adverse outcomes in humans and animals. One of them is
diffuse large B-cell lymphoma (DLBCL), which is considered the most common type of
lymphoma. To identify the gene expression changes induced by TCDD and furans that
potentially underlie DLBCL development, a bioinformatics meta-analysis was applied in
this study. Ten datasets containing gene expression information for DLBCL, TCDD and
furans were obtained from the Gene Expression Omnibus (GEO) and Array Express websites
and further analyzed using Cytoscape software and its plugins, ClueGO and CluePedia.
As a result, the most differentially expressed genes were used to construct gene
networks of DLBCL, TCDD and furans, and hence a potential pathway showing how dioxins
could drive the progression of lymphoma. In addition, the analysis suggested that TCDD
and furans may activate the receptor AhR, which promotes the expression of the protein
TWIST1 and enhances the progression of DLBCL. The results of this study contribute to
further research on dioxins and can be considered an initial step in future work on
DLBCL diagnosis and treatment.
Keywords: TCDD, Furans, DLBCL, bioinformatics, GEO, Array Express
Number of pages: 87
Date of Submission: 20/09/2017
Supervisor’s signature


ACKNOWLEDGEMENT

First of all, I would like to take this opportunity to express my deepest gratitude
and special thanks to Prof. Chun-Yu Chuang for her patience in guiding me, keeping me
on the correct path and showing me many wonderful things during my internship at the
Department of Biomedical Engineering and Environmental Science at National Tsing Hua
University.
I would like to express my deep thanks to Assoc. Prof. Tran Thi Thu Ha for giving me
the advice and guidance necessary to complete my thesis.
My sincere thanks also go to all the members of the Department of Biomedical
Engineering and Environmental Science for providing me with all the materials and
necessities for conducting the experiments for my research.
Finally, I would like to thank my family and my friends for encouraging and advising
me during the completion of this thesis.
Thai Nguyen, October 2017

Nguyen Thi Quynh Lam


TABLE OF CONTENTS
ACKNOWLEDGEMENT
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS
PART I. INTRODUCTION
1.1. Research rationale
1.2. Research objectives
PART II. LITERATURE REVIEW
2.1. Persistent Organic Compounds (POPs)
2.2. Dioxins and dioxin-like compounds
2.3. Lymphoma and non-Hodgkin lymphoma
2.3.1. Diffuse large B lymphoma
2.3.2. SNPs of Diffuse Large B lymphoma
2.4. Gene-network components
2.4.1. Microarray data
2.4.2. Gene network database: Array Express and GEO
2.4.3. Statistical analysis
2.4.4. Hub-proteins
2.4.5. GO term
2.5. Gene Network construction tools
2.5.1. Network Analyst website
2.5.2. Cytoscape software and plugins: ClueGO and CluePedia Apps
PART III. METHODOLOGY
3.1. Data collection
3.2. Data processing
3.3. Network construction
PART IV. RESULTS AND DISCUSSION
4.1. Results
4.1.1. Genetic datasets
4.1.2. Differentially genes expression
4.1.3. Gene-network construction of DLBCL, TCDD and Furans
4.1.4. Protein-protein interaction network of DLBCL, TCDD and Furans
4.1.5. Potential pathway showing the relation between TCDD and Furans and Diffuse Large B lymphoma
4.2. Discussion
4.2.1. AhR-mediated key factor of dioxin-like compounds
4.2.2. Key factors of hypoxia response and the risk of MYC-TP53 interaction
4.2.3. Inhibition of cancer cell apoptosis and tumorigenesis factor in DLBCL
PART V. CONCLUSION
REFERENCES
APPENDICES
Appendix 1. Differentially expressed genes of DLBCL versus normal cell
Appendix 2. Differentially expressed genes of exposure to TCDD group versus control group
Appendix 3. Differentially expressed genes of exposure to FURANS group versus control group
Appendix 4. Hub proteins of DLBCL network
Appendix 5. Hub proteins of TCDD network
Appendix 6. Hub proteins of FURANS network


LIST OF FIGURES

Figure 2.1: General molecular structure of polychlorinated dibenzo-p-dioxins (PCDD) and dibenzofurans (Source: Pereira, 2004)
Figure 2.2: Representative structure of 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) (Pereira, 2004)
Figure 2.3: A schematic representation of signal transduction after TCDD/AHR interaction (Fracchiolla et al., 2016)
Figure 3.1: The flowchart of methodology
Figure 4.1: Gene Ontology network showing the relationship of DLBCL, TCDD and Furans
Figure 4.2: Protein-protein interaction network constructed by CluePedia plugin in Cytoscape
Figure 4.3: Flowchart of proteins interaction and involved biological processes



LIST OF TABLES

Table 4.1: Database of DLBCL
Table 4.2: Database of TCDD and Furans
Table 4.3: Differentially expressed genes, including up- and down-regulated genes in Diffuse Large B lymphoma compared to normal cells
Table 4.4: Differentially expressed genes, including up- and down-regulated genes activated by TCDD compared to control group
Table 4.5: Differentially expressed genes, including up- and down-regulated genes activated by Furans compared to control group
Table 4.6: Lists of hub proteins containing in DLBCL, TCDD and Furans networks


LIST OF ABBREVIATIONS

ABC DLBCL: Activated B-cell like DLBCL
AhR: Aryl hydrocarbon receptor
ARNT: Aryl hydrocarbon receptor nuclear translocator
B-NHL: B cell non-Hodgkin lymphoma
CRE: cAMP response element
DEG: Differentially expressed genes
DLBCL: Diffuse large B cell lymphoma
DNA: Deoxyribonucleic acid
DNMT1: DNA methyltransferase 1
EGFR: Epidermal growth factor receptor
FDR: False discovery rate
GCB DLBCL: Germinal center B-cell like DLBCL
GEO: Gene Expression Omnibus
GO: Gene ontology
HAHs: Halogenated aromatic hydrocarbons
MAGE-ML: Microarray and Gene Expression Markup Language
MAGE-TAB: Microarray Gene Expression - Tabular format
MIAME: Minimum information about microarray experiment
MM: Mismatch
OC: Organochlorine
PCBs: Polychlorinated biphenyls
PCDD/Fs: Polychlorinated dibenzo-p-dioxins/furans
PCDDs: Polychlorinated dibenzo-p-dioxins
PM: Perfect match
POPs: Persistent organic compounds
ROS: Reactive oxygen species
SNPs: Single nucleotide polymorphisms
TCDD: 2,3,7,8-tetrachlorodibenzo-p-dioxin
TNF: Tumor necrosis factor
XRE: Xenobiotic response element



PART I. INTRODUCTION
1.1. Research rationale
Dioxins and dioxin-like compounds are of great concern these days because of their
lasting, long-term impacts on humans and animals. TCDD and furans are representative
dioxins and dioxin-like compounds, which can harm human health even in small amounts
through biomagnification along the food chain. The most significant impact of these
chemicals is genetic alteration through activation of the aryl hydrocarbon receptor
(AhR), which passes into the cell nucleus and can thereby induce genetic disease and
carcinogenesis.
Diffuse large B-cell lymphoma (DLBCL) is the most prevalent B-cell non-Hodgkin
lymphoma, accounting for 40% of lymphoma diagnoses. The exact cause of DLBCL is
unknown; however, many proto-oncogenes and abnormal genes causing lymphoma have been
found in previous studies. It is essential to identify the biological mechanisms that
activate those genes and to determine whether they are related to the impact of dioxins
and dioxin-like compounds. Bioinformatics, which includes sequence analysis, gene and
protein expression analysis, cellular organization analysis, structural bioinformatics,
network and systems biology and others, contributes to various fields on a global scale.
The application of bioinformatics in biomedicine has received much attention in many
developed countries; by contrast, it is still uncommon in Vietnam. Specifically, many
studies have indicated that high-throughput sequencing and DNA microarray technology
play a significant role in identifying the genetic/transcriptomic alterations causing
DLBCL and prognostic biomarkers for lymphoma treatment. Therefore, the activation of
those abnormal genes and the influence of dioxins can be clarified by applying
bioinformatics. To support the diagnosis of lymphoma, the study “Identifying the effect
of exposure to TCDD and Furans on human health leading to diffuse large B lymphoma
through gene-network construction” was conducted with the support of the Department of
Biomedical Engineering and Environmental Science of National Tsing Hua University in
Taiwan.
1.2. Research objectives
The objectives of this research are:
- To investigate the differentially expressed genes in diffuse large B-cell lymphoma
(DLBCL) tissues and in human cell lines exposed to dioxin, respectively;
- To construct the gene network to explore whether exposure to dioxin can induce
DLBCL;
- To identify the potential pathway linking dioxin exposure to DLBCL.


PART II. LITERATURE REVIEW
2.1. Persistent Organic Compounds (POPs)
Persistent organic compounds include a variety of lipophilic compounds that are
associated with environmental degradation. Among the various kinds of POPs, for example
organochlorine (OC) pesticides and industrial chemicals or by-products, the
chlorine-containing category is able to cause the most deleterious effects;
consequently, these compounds have been banned or strictly regulated in many countries.
Despite that regulation, exposure to POPs persists in the general population through
the consumption of fats derived from animals. POP concentrations tend to increase with
each level of the food web (biomagnification); as a result, the concentration of POPs
accumulated in human bodies may be higher than that in the external environment
(Fisher et al., 1999). In addition, POPs accumulated in adipose tissue over a lifetime
are considered one route of chronic exposure, since they are continuously released from
adipose tissue into the circulation and lipid-containing vital organs (La Merrill et
al., 2013).

POPs have the following main properties. First, they are lipophilic compounds that
accumulate mainly in lipid-containing tissues such as adipose tissue and move within
the body bound to lipids (Lewis et al., 2002). In addition, POPs are always present as
chemical mixtures because of mixing in the environment and the food web and long-term
retention in fat tissues (Kortenkamp et al., 2008). Therefore, the distinct groups of
OC pesticides, polychlorinated biphenyls (PCBs) and dioxins each refer to the chemical
mixtures of a POP subclass.


2.2. Dioxins and dioxin-like compounds
Polychlorinated dibenzo-p-dioxins/furans (PCDD/Fs) are classified as ubiquitous POPs.
PCDDs and PCDFs are two of the three subclasses of the halogenated aromatic
hydrocarbons and are referred to as dioxins and dioxin-like compounds, respectively
(see Figure 2.1).

Figure 2.1: General molecular structure of polychlorinated dibenzo-p-dioxins
(PCDD) and dibenzofurans (PCDF)

(Source: Pereira, 2004).
They are widespread in almost every part of the environment, and even remote areas are
no exception. Dioxins and dioxin-like compounds tend to be persistent and lipophilic in
the external environment, so they can bioaccumulate through food chains and potentially
cause adverse effects on biota and humans. As halogenated aromatic hydrocarbons (HAHs),
PCDD/Fs are characterized by the basic aromatic structure of the benzene ring, a
hexagonal carbon structure with conjugated double bonds between the carbon atoms.
Dioxins and furans differ in the number of oxygen atoms in the central ring of their
structure: two and one, respectively. The biological impacts of PCDD/Fs arise from a
similar spectrum of toxic effects produced through the binding of dioxins and
dioxin-like compounds to a receptor protein, the aryl hydrocarbon receptor (AhR). The
planar molecular shape facilitates binding to the receptor, and the relative potency of
a compound depends to a large degree on its persistence and how well it fits the
receptor. PCDD/Fs, and in particular one PCDD, tetrachlorodibenzo-p-dioxin (TCDD), have
a high affinity for AhR and fit the receptor very well. PCDD/Fs are derived from
several main sources, including (1) combustion, (2) metal smelting, refining and
processing, and (3) biological and photochemical processes (US National Research
Council, 2006). PCDD/Fs have the potential to cause cancer, birth defects, reproductive
disorders, immunotoxicity, and other toxic end points, including liver disease, thyroid
dysfunction, lipid disorders, neurotoxicity, cardiovascular disease, and metabolic
disorders such as diabetes (US National Research Council, 2006).
* 2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD):
According to Pereira (2004), 2,3,7,8-tetrachlorodibenzo-p-dioxin (TCDD) has the
structure shown in Figure 2.2.

Figure 2.2: Representative structure of 2,3,7,8-tetrachlorodibenzo-p-dioxin
(TCDD)

(Source: Pereira, 2004).


2,3,7,8-Tetrachlorodibenzo-p-dioxin (TCDD) is one of the most toxic members of the
family of polychlorinated dibenzodioxins (PCDDs) and is a nearly ubiquitous
environmental contaminant (Pesatori et al., 1993, 2009). TCDD is a byproduct of the
manufacture of chlorophenols and chlorophenoxy herbicides (Saracci et al., 1991). It
can be formed in combustion processes along with other polychlorinated dibenzodioxins
and dibenzofurans. In addition, it can be derived from waste incineration, metal
production, and fossil fuel or wood combustion (Deziel et al., 2012). Dioxins are
likely to bioaccumulate in the food chain because of their long biological half-life
and low water solubility; even a small amount of dioxins can lead to a significant
dioxin concentration in the food chain (Paustenbach et al., 1992). TCDD has been shown
to exert its effects by binding to the dioxin receptor AhR, which has an affinity for
TCDD in many mammalian species.
AhR is a basic helix-loop-helix/PAS transcription factor located in the cytoplasm,
where it forms a complex with various proteins and lipophilic compounds (Agostinis et
al., 2007). In the cytoplasm, it is associated with pp60, which can bind to epidermal
growth factor receptor (EGFR) and induce mitogen-activated protein signaling. In the
nucleus, AhR forms a heterodimer with the intranuclear aryl hydrocarbon receptor
nuclear translocator (ARNT). The resulting AhR-ARNT complex promotes transcription from
xenobiotic response elements (XRE) and interacts with several important pathways, for
example Wnt/beta-catenin, estrogen receptors, retinoblastoma protein, retinoic acid,
NF-kB and the circadian rhythm regulators (Sorg, 2013). AhR has been shown to be
involved in multiple physiological regulations and effects, for example altered cell
cycle regulation and proliferation. Studies of workers in Sweden and the US reported
similar observations of a relationship between phenoxy herbicide exposure and cancer;
in particular, prolonged TCDD exposure is related to an increased relative risk of
non-Hodgkin lymphoma (Hardell et al., 1996). Besides, 45 million liters of
TCDD-contaminated Agent Orange were sprayed over South Vietnam and Cambodia to destroy
vegetation from 1962 to 1971, and the resulting cancer incidence still persists
(Stellman et al., 2003).
Therefore, this study focuses on the potential gene network and pathway to investigate
how TCDD, the most toxic PCDD, and furans, a group of dioxin-like compounds, can induce
one of the common non-Hodgkin lymphomas, namely diffuse large B-cell lymphoma
(Figure 2.3).

Figure 2.3: A schematic representation of signal transduction after TCDD/AHR
interaction
(Source: Fracchiolla et al., 2016)

2.3. Lymphoma and non-Hodgkin lymphoma
Lymphoma is the well-known name for neoplasms of lymphoid precursor cells. It was first
reported in 1832 by Thomas Hodgkin, and the disease was hence named Hodgkin’s lymphoma.
Since then, several kinds of lymphoma have been discovered; however, the disease is
divided mainly into two subclasses: Hodgkin lymphoma and non-Hodgkin lymphoma. The
majority of non-Hodgkin lymphomas are B-cell lymphomas, the remainder being T-cell and
NK-cell lymphomas. Lymphoid neoplasms are a highly diverse group of diseases and
reflect the diversity of the immune system (Hussain and Harris, 1998). In Vietnam, the
incidence of non-Hodgkin lymphoma has increased during the last ten years, with 2700
cases recorded each year (Nguyen, 2015).

2.3.1. Diffuse large B lymphoma
Diffuse large B-cell lymphoma (DLBCL) is considered the most prevalent B-cell
non-Hodgkin lymphoma (B-NHL) in adulthood, accounting for 40% of diagnoses. There are
three major subclasses of DLBCL, characterized on the basis of the molecular
heterogeneity of the disease: germinal center B-cell like DLBCL (GCB DLBCL), activated
B-cell like DLBCL (ABC DLBCL) and primary mediastinal B-cell lymphoma. GCB DLBCL is
derived from germinal center B cells and expresses genes characteristic of germinal
center B lymphocytes, while ABC DLBCL expresses genes characteristic of plasma cells
and is thought to arise from B cells activated for differentiation into plasma cells.
Primary mediastinal B-cell lymphoma is thought to arise from rare B-cell populations
that reside in the thymus and has a gene expression profile distinct from those of GCB
and ABC DLBCL (Rosenwald, 2003).
2.3.2. SNPs of Diffuse Large B lymphoma
Gene expression profiling and genome sequencing are applied to improve our
understanding of DLBCL subclasses and the molecular basis of chemotherapy resistance,
and to support the identification of novel molecular DLBCL subsets and targets for drug
intervention, and hence the prevention and treatment of DLBCL (Lossos et al., 2006).
The majority of DLBCL cases arise from normal antigen-exposed B cells that are at
different stages of differentiation and undergo clonal expansion in the germinal
centers (GCs) of peripheral lymphoid organs (Martelli et al., 2013). Besides, DLBCL can
evolve and progress through a range of multistep transformation processes.
Specifically, DLBCL can progress slowly or rapidly depending on the stage, through
clonal evolution or simultaneous and extensive DNA rearrangements in subclones. Several
diverse genetic abnormalities have been observed, reflecting the clinical and genetic
(clonal) heterogeneity of the disease, including aberrant somatic hypermutation,
nonrandom chromosomal deletions, balanced reciprocal translocations deregulating the
expression of proto-oncogene products such as BCL6, BCL2, REL or c-MYC, and
dysregulated apoptosis or defective DNA repair (Morin et al., 2013).
Gene mutations causing DLBCL have been identified in several studies. For example, the
primary or early oncogenic events are chromosomal translocations involving oncogenes
such as BCL6, BCL2, REL or c-MYC, whereas a group of genes including BCL2, PRDM1,
CARD11, MyD88, TNFAIP3, CREBBP, TP53, EZH2, MLL2, MYOM2, PIM1, LYN, CD36, B2M, CD79B,
MEF2B, ANKLE2, KDM2B, HNF1B, NOTCH1/2, DTX1 and MYCCD58 tends to appear in the
secondary or late oncogenic events of clonally represented recurrent mutations or gene
alterations (Morin et al., 2013). In addition, alterations of DNA repair and DNA
signaling genes that affect the DNA repair pathway have been identified in DLBCL
tumors, and they tend to form intermediate cancer driver events in lymphomagenesis.
Moreover, mutation or translocation of BCL6, BCL2, REL or c-MYC can induce
overexpression of proto-oncogene products, whereas genetic lesions and mutations in
TNFAIP3, CARD11, CD79A/B, MYD88 or TRAF2 can activate canonical and non-canonical NF-kB
pathways (Zhang et al., 2015). Furthermore, the most frequent cancer driver events in
DLBCL involve epigenetic reprogramming triggered by mutations in genes such as TET1,
MLL2, EZH2, MEF2B, EP300 and CREBBP (Zhang et al., 2013). Therefore, alterations in the
expression of proto-oncogene products and tumor suppressors, acting through
constitutive survival and proliferative signals, provide tumor cells with gene
expression plasticity, escape from apoptosis and enhanced growth.
2.4. Gene-network components
2.4.1. Microarray data
DNA microarrays are used to determine the expression level of a large number of genes.
Microarray platforms for gene expression include single-color and two-color systems.
Affymetrix GeneChip arrays are a widely used single-color platform for microarray
analysis; they consist of probes complementary to a region of each mRNA transcript,
usually at the 3’ end of the transcript. Each probe set consists of 11 to 20 perfect
match (PM) probes, which are typically 25 nucleotides long, together with an equal
number of mismatch (MM) probes, which are identical to the PM probes except for a
single nucleotide substitution in the center of the probe.
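The PM/MM relationship described above can be sketched in a few lines of Python. This is only an illustration: the probe sequence is hypothetical, the function name `mismatch_probe` is made up for this example, and the choice of the complementary base for the central substitution is an assumption (it is how Affymetrix MM probes are commonly described), since the text above only states that a single central nucleotide differs.

```python
# Illustrative sketch: derive a mismatch (MM) probe from a perfect match (PM) probe.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_probe(pm_probe: str) -> str:
    """Return the PM probe with its central base substituted.

    Assumption: the central base is replaced by its complement.
    """
    pm_probe = pm_probe.upper()
    if len(pm_probe) % 2 == 0:
        raise ValueError("probe length must be odd so that a central base exists")
    center = len(pm_probe) // 2  # index 12 for a 25-nucleotide probe
    return pm_probe[:center] + COMPLEMENT[pm_probe[center]] + pm_probe[center + 1:]


pm = "ACGTACGTACGTACGTACGTACGTA"  # hypothetical 25-nucleotide PM sequence
mm = mismatch_probe(pm)
```

Comparing the PM and MM signals of a probe set is what allows non-specific hybridization to be estimated for each transcript.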
DNA microarray techniques have been applied to predict DLBCL treatment success and to
explain disease heterogeneity with respect to five clinical features (age, tumor stage,
serum lactate dehydrogenase concentration, performance status, number of extranodal
disease sites) (Gohlmann and Talloen, 2009). In fact, this technique is most widely
used to profile the gene expression of an organism on a whole-genome scale, and it has
spawned a series of microarray-based expression studies of DLBCL that refine prognosis
using molecular-level information (Segal, 2005). Besides, DNA microarrays have also
been used to analyze the changes in human B-cell gene expression induced by dioxins
(Kovalova et al., 2017).
In this study, gene expression profiles for DLBCL and dioxins (TCDD and furans)
generated by DNA microarray techniques were used for further analytical steps. The gene
expression datasets were collected from two main databases, Gene Expression Omnibus
(GEO) and Array Express, which are discussed in more detail in the following part.
2.4.2. Gene network database: Array Express and GEO
All of the datasets in this study were derived from the Array Express database and the
Gene Expression Omnibus (GEO) database. Array Express is a public database for
high-throughput functional genomics data, which consists of two distinct parts: the
Array Express Repository and the Array Express Data Warehouse. The Array Express
Repository is a MIAME-supportive public archive of microarray data, whereas the Array
Express Data Warehouse is a database of gene expression profiles selected from the
repository and consistently re-annotated. The required samples or experiments can be
found by experiment attributes, for example keywords, species, array platforms,
authors, journals or accession numbers. Gene names, gene properties or gene ontology
terms can be used to visualize gene expression profiles. The Array Express database is
growing rapidly and includes data from more than 50,000 hybridizations and 1,500,000
individual expression profiles. Array Express supports community standards such as
MIAME (Minimum Information About Microarray Experiment), Microarray and Gene Expression
Markup Language (MAGE-ML) and Microarray Gene Expression - Tabular format (MAGE-TAB)
(Parkinson et al., 2007).
The GEO database, hosted by the National Center for Biotechnology Information (NCBI),
is a rich repository of gene expression data generated by DNA microarray technology.
The database is designed to hold both unprocessed and processed data in a
MIAME-supportive manner. GEO contains about a billion quantitative gene expression
measurements covering a large number of biological phenomena, derived from over 100
organisms and 1500 laboratories. Several user-friendly web applications have been
developed to increase the utility of these data and to support effective exploration,
query and visualization at the level of both individual and entire studies (Barrett,
2004).


2.4.3. Statistical analysis
2.4.3.1. Meta-analysis
Meta-analysis is a set of statistical techniques for combining results from several
studies using various kinds of statistics, for example Fisher’s statistic and the
minimum and maximum statistics. This technique has been applied to microarray analysis
in particular, in order to combine different studies for the detection of
differentially expressed genes (DEGs) and to boost the reliability of results from
individual studies (Shen and Tseng, 2010). A microarray meta-analysis is carried out in
seven steps: (1) identify suitable microarray studies, (2) extract the data from the
studies, (3) prepare the individual datasets, (4) annotate the individual datasets,
(5) resolve the relationship between probes and genes, (6) combine the study-specific
estimates and (7) analyze, present and interpret the results (Ramasamy et al., 2008).
Meta-analysis is beneficial for this study in identifying the DEGs of DLBCL tissues and
dioxin-exposed groups compared with normal tissues and control groups, respectively,
which are the main concern of the next part.
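As an illustration of step (6), the sketch below combines the p-values that one gene obtained in several independent studies using Fisher’s statistic, X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis for k studies. It is a minimal sketch, not the procedure used in this study: the function name `fisher_combined_p` and the input p-values are hypothetical, and the studies are assumed to be independent.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent p-values for one gene with Fisher's statistic."""
    k = len(p_values)
    if k == 0 or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one p-value, each in (0, 1]")
    half_x = -sum(math.log(p) for p in p_values)  # X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k:
    # P(X >= x) = exp(-x/2) * sum_{j=0}^{k-1} (x/2)^j / j!
    term, total = 1.0, 1.0
    for j in range(1, k):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values for one gene from three studies
combined = fisher_combined_p([0.04, 0.10, 0.03])
```

A gene with moderately small p-values in every study ends up with a smaller combined p-value than in any single study, which is how combining studies can boost the reliability of the individual results.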
2.4.3.2. False Discovery Rate (FDR)
The false discovery rate (FDR) is the expected fraction of false rejections among the
hypotheses rejected. It is used in microarray analysis to estimate the proportion of
false positive findings among the genes selected as differentially expressed (Gohlmann
and Talloen, 2009). Although various procedures have been developed to control the FDR,
the method of Benjamini and Hochberg is considered the most popular and was used in
this study. The Benjamini and Hochberg adjustment is calculated by the formula below:
$$p_i^{\mathrm{adj}} = p_i \times \frac{m}{\mathrm{order}(p_i)}, \quad i = 1, 2, 3, 4, \ldots, m$$

Where:
$p_i^{\mathrm{adj}}$ is the adjusted p-value by the Benjamini-Hochberg method
$p_i$ is the p-value of gene $i$
$m$ is the total number of genes in the dataset
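The adjustment can be sketched in Python as follows, taking order(p_i) to be the rank of p_i when the p-values are sorted in ascending order. Two details go beyond the formula as written and are assumptions borrowed from common implementations of the procedure: the adjusted values are made non-decreasing with rank by a running minimum, and they are capped at 1. The function name `bh_adjust` and the input values are hypothetical.

```python
def bh_adjust(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order; rank = position + 1
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the adjusted values
    # never decrease with rank and never exceed 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four genes
raw = [0.01, 0.04, 0.03, 0.005]
adj = bh_adjust(raw)
significant = [p <= 0.05 for p in adj]
```

Genes whose adjusted p-value falls below the chosen FDR threshold (0.05 in this sketch) would be reported as differentially expressed.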

2.4.3.3. Differential Expression Analysis
Differential gene expression analysis is applied in microarray analysis in order to
find the genes that are differentially expressed. Mutation in a gene or a set of genes
is the main factor that induces abnormal or failed gene expression; for example, a
mutated p53 tumor suppressor gene, once transcribed, can cause cancer. Therefore,
microarray experiments are useful for identifying which genes are differentially
expressed in diseased cells versus normal cells. The comparison between various kinds
of “disease” and “normal” cells provides an opportunity to find multiple target genes
whose up- and down-regulation can be the result of the disease. After that, drugs
targeting specific mutated genes can be developed in order to reduce their undesirable
effects. In addition, differential gene expression has a significant relationship with
gene function and can provide comprehensive information about gene and protein
interactions. Therefore, differentially expressed genes are used in the reconstruction
of gene networks, metabolic pathways and gene annotation (Zhang,
