DSpace at VNU: A maximum likelihood method for detecting bad samples from Illumina BeadChips data

A maximum likelihood method for detecting
bad samples from Illumina BeadChips data
Nguyễn Hà Anh Tuấn

University of Engineering and Technology
Master's thesis, major: Computer Science; Code: 60 48 01
Supervisor: Dr. Lê Sỹ Vinh
Year of defense: 2012

Keywords. Information technology; Data
Content
Table of Contents

Overview

1 Introduction
1.1 Biological background
1.2 Some common types of mutation
1.3 SNP and SNP genotype
1.4 Microarray technology and Illumina BeadChips
1.5 Genotype callers
1.6 Quality control and quality assurance
1.6.1 Identify samples with discordant sex information
1.6.2 Identify samples that have high missing and heterozygosity rate
1.6.3 Identify duplicated or related samples
1.6.4 Identify samples that have different ancestries

2 Genotype callers
2.1 Illuminus
2.2 GenoSNP
2.3 GenCall
2.4 Comparing three callers

3 Maximum likelihood method for detecting bad samples
3.1 Create potential bad sample list
3.2 Estimate the fitness of data
3.3 Remove bad samples

4 Experimental result
4.1 Input file format
4.2 Experiment 1
4.3 Experiment 2

Conclusion

Publications
