
INDEXING PROTEIN STRUCTURE DATABASES
Phan Mạnh Thường¹, Lâm Thị Hoà Bình¹, Đặng Như Toàn¹, Đoàn Thiện Minh¹,
Trần Văn Lăng²

¹ Faculty of Information Technology, Lac Hong University
10 Huỳnh Văn Nghệ, Biên Hòa, Đồng Nai
{thuong,binh,dangnhutoan,dtminh}@lhu.edu.vn

² Vietnam Academy of Science and Technology
Mạc Đĩnh Chi, Ho Chi Minh City


Abstract. Searching large protein structure databases for proteins with similar tertiary structure is a complex and time-consuming problem. The number of known protein structures is growing rapidly, and indexing the proteins in a structure database makes structural search and comparison faster and more efficient. This paper presents an indexing method for protein structure databases. The structures are analysed to extract feature vectors, and a tree built over these feature vectors serves as the index. With an indexed database, searching for a protein structure, or for a substructure within a protein, becomes faster and more accurate.
Keywords: protein tertiary structure, protein database indexing.

1. Problem statement
A protein is a polypeptide chain formed from amino acids. The study of proteins is important because proteins take part in virtually every biological process. These processes include:
- enzymatic catalysis (all chemical reactions in living cells are catalysed by protein enzymes);
- the transport of various substances, such as oxygen and ions;
- signalling.

To understand the relationship between protein structure and function, researchers need to retrieve structures from protein structure databases and classify them into protein families. Grouping proteins by structural similarity serves the following goals:
- Detecting evolutionary relationships
- Identifying motifs (recurring segments), i.e. structures formed by the arrangement of amino acids in three-dimensional space
- Discovering the relationship between protein structure and function
- Supporting drug design
- Detecting sequences associated with cancer and other diseases.

Methods for determining protein structure, such as X-ray crystallography and NMR spectroscopy, have improved and spread rapidly. As a result, the three-dimensional structures of a large number of new protein molecules have been determined. These structures are stored in many databases on the Internet and are freely available to researchers, including:
- The Protein Data Bank (PDB) [1] of the RCSB (Research Collaboratory for Structural Bioinformatics): 73153 structures
- SCOP, Structural Classification of Proteins [2]: 38221 structures
- CATH Protein Structure Classification [3]: 104238 structures
- ModBase, Database of Comparative Protein Structure Models (Sali Lab, UCSF): 41140 structures

Protein structure databases keep growing. Searching them for structures with tertiary-structure similarity to a given protein, or to a substructure of any protein, is therefore a difficult and time-consuming task. Biologists need a tool that searches protein structure databases as quickly and efficiently as BLAST [5] searches sequence databases. Protein search and classification usually proceed in two stages:
- extracting features that describe each protein;
- measuring the similarity between the features of different proteins in order to classify them.

Many algorithms have been proposed for extracting features from protein structures.

CTSS [6] approximates the Cα backbone of a protein by a smooth spline with minimal curvature. It then stores the curvature, torsion angle and secondary structure of each Cα atom in a hash-based index.

ProGreSS [5] is a newer method that extracts features from the structure combined with the sequence, using a sliding window over the protein backbone. Its structural features resemble those of CTSS (curvature, torsion angle and secondary-structure information). Its sequence features are computed with a scoring matrix such as PAM or BLOSUM. As with CTSS, the features extracted by ProGreSS are not local features.
The PSIST algorithm [7] is among the more effective algorithms because of its relatively high accuracy. PSIST converts the local structural information of a protein into a "sequence". It then builds a suffix tree over these "sequences" to support searching.

PSIST extracts features over a sliding window. This is preferable to extracting local features from a single amino acid, because each feature vector inherently accounts for both translation and rotation. After the feature vectors are normalized, the protein structure is converted into a sequence of discretized symbols, called a feature-structure sequence.

However, searching a suffix tree is still not fast enough in practice. PSISA [8] extracts feature vectors in the same way as PSIST. Instead of a suffix tree, it indexes them with a suffix array in order to speed up searching. The experiments reported for PSISA show that suffix-array indexing does speed up search. However, it also increases memory usage by a considerable factor compared with the suffix tree used in PSIST.
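The following minimal Python sketch illustrates exact lookup in a suffix array, the structure PSISA uses in place of PSIST's suffix tree. The sketch makes several simplifying assumptions:
- A single protein is already encoded as a string with one character per normalized feature vector.
- The array is built naively by sorting the suffixes.
- The function names are hypothetical.

It does not reproduce PSISA itself: there is no approximate matching and no multi-protein index.

```python
def build_suffix_array(seq):
    """Return the start offsets of all suffixes of seq in lexicographic order.

    Naive construction (sorting explicit suffix copies): fine for a demo,
    far too slow and memory-hungry for a real database.
    """
    return sorted(range(len(seq)), key=lambda i: seq[i:])


def find_occurrences(seq, sa, pattern):
    """Return, in increasing order, every offset where pattern occurs exactly in seq."""
    m = len(pattern)

    def prefix(k):
        # First m symbols of the k-th smallest suffix.
        return seq[sa[k]:sa[k] + m]

    # Sorted suffixes have sorted length-m prefixes, so all matches form one
    # contiguous block; find its lower and upper bounds by binary search.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    start, hi = lo, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])
```

For example, `find_occurrences("abcabc", build_suffix_array("abcabc"), "bc")` returns `[1, 4]`. Because the suffixes are kept in sorted order, all occurrences of a pattern occupy one contiguous run of the array, so two binary searches are enough.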
This paper presents an indexing method for protein structure databases that builds on PSIST to extract feature vectors. From the resulting set of feature vectors, the paper proposes an index tree built by merging the branches of the feature-vector sequences. This tree limits memory usage. It also allows searching across all structures belonging to different protein families. As a result, searching for a protein structure, or a substructure within a protein, becomes faster and more accurate.

The rest of the paper is organized as follows:
- Section 2 presents the method for indexing protein structure data: extracting and normalizing feature vectors, and building the index tree.
- Section 3 reports experiments on protein structure data sources and queries over these data.
- The final section gives an evaluation and conclusions.
2. Indexing protein structure data
a) Extracting feature vectors
Each protein is an ordered chain of amino acids (residues) linked to one another by peptide bonds. Each residue consists of a Cα atom plus further N and C atoms. Bond lengths, bond angles and torsion angles completely determine the conformation and geometry of the protein.
The bond length is the distance between two bonded atoms, measured in ångströms (Å). The bond angle is the angle between two covalent bonds of the same atom. For example, the N-C bond length is 1.33 Å, and the bond angle between Cα-N and N-C is 122°.

Figure 1. Bond lengths and bond angles between atoms
Torsion angles describe rotations about bonds. Suppose four atoms are connected by three bonds $B_{i-1}$, $B_i$ and $B_{i+1}$. The torsion angle of bond $B_i$ is then defined as the smallest angle between the projections of $B_{i-1}$ and $B_{i+1}$ onto the plane perpendicular to $B_i$.
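The torsion-angle definition above translates directly into vector arithmetic. This Python sketch projects $B_{i-1}$ and $B_{i+1}$ onto the plane perpendicular to $B_i$ and measures the angle between the two projections. Some points to note:
- The function and helper names are illustrative, not from the paper.
- Atoms are assumed to be (x, y, z) tuples.
- The result is a signed angle in degrees. Its absolute value is the unsigned angle described in the text.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def torsion_angle(p0, p1, p2, p3):
    """Signed torsion angle (degrees) of bond p1-p2 in the chain p0-p1-p2-p3."""
    b_prev = _sub(p0, p1)   # B_{i-1}, pointing from p1 back to p0
    b_mid = _sub(p2, p1)    # B_i, the central bond
    b_next = _sub(p3, p2)   # B_{i+1}
    length = math.sqrt(_dot(b_mid, b_mid))
    axis = (b_mid[0] / length, b_mid[1] / length, b_mid[2] / length)
    # Project the outer bonds onto the plane perpendicular to B_i.
    k = _dot(b_prev, axis)
    v = _sub(b_prev, (k * axis[0], k * axis[1], k * axis[2]))
    k = _dot(b_next, axis)
    w = _sub(b_next, (k * axis[0], k * axis[1], k * axis[2]))
    # atan2 of the sine and cosine components gives a signed angle
    # between -180 and 180 degrees.
    x = _dot(v, w)
    y = _dot(_cross(axis, v), w)
    return math.degrees(math.atan2(y, x))
```

For four atoms in a planar "cis" arrangement the function returns 0; in a "trans" arrangement it returns 180.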

Figure 2. The torsion angles φ, ψ and ω between atoms
To capture local features more accurately, features must be extracted from a set of neighbouring residues. To build a local feature vector, each residue is first described on its own. Then the relationships within a pair of residues and among a set of residues are determined.

In every residue:
- the Cα-N bond length is 1.46 Å;
- the Cα-C bond length is 1.51 Å;
- the angle between Cα-N and Cα-C is 116°.

Hence the triangles formed by the N, Cα and C atoms of all residues are congruent, and each residue can be represented by a triangle.
The distance d between a pair of residues is the Euclidean distance between their Cα atoms, computed with formula (1):

$$d(p_i,p_j)=\sqrt{(x_i-x_j)^2+(y_i-y_j)^2+(z_i-z_j)^2} \tag{1}$$

where $(x_i, y_i, z_i)$ are the coordinates of the Cα atom of residue $p_i$.
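Formula (1) in code: a minimal Python sketch. It assumes each Cα position is an (x, y, z) tuple in Å, for example as read from PDB coordinate records (parsing is not shown). The function name is hypothetical.

```python
import math


def residue_distance(ca_i, ca_j):
    """Formula (1): Euclidean distance, in Å, between two Cα coordinate triples."""
    dx = ca_i[0] - ca_j[0]
    dy = ca_i[1] - ca_j[1]
    dz = ca_i[2] - ca_j[2]
    return math.sqrt(dx * dx + dy * dy + dz * dz)
```

Python 3.8 and later also provide `math.dist`, which computes the same quantity.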
The angle θ between a pair of residues is the angle between the two planes, each formed by the N, Cα and C atoms of one residue.

Figure 3. Distance and angle between two residues

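Distances and angles are invariant under translation and rotation of the protein. The Euclidean distance between two Cα atoms is computed directly from their three-dimensional coordinates. The angle between the two planes formed by the N, Cα and C atoms is computed from the angle between their normal vectors, each rooted at the Cα atom of its plane. The normal vector is computed with formula (2):

$$\mathbf{n}=\overrightarrow{C_\alpha N}\times\overrightarrow{C_\alpha C} \tag{2}$$

The angle between the two normal vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ is computed with formula (3):

$$\cos\theta=\frac{\mathbf{n}_1\cdot\mathbf{n}_2}{\lVert\mathbf{n}_1\rVert\,\lVert\mathbf{n}_2\rVert} \tag{3}$$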
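A sketch of formulas (2) and (3) in Python, assuming each residue is given as a tuple (N, Cα, C) of coordinate triples. The function names are hypothetical.

The cross-product order (N - Cα) × (C - Cα) is also an assumption. Any fixed order works as long as it is used for every residue: reversing it flips the sign of both normals and leaves cos θ unchanged.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_normal(n_atom, ca_atom, c_atom):
    """Formula (2): normal of the N-CA-C plane, rooted at the CA atom."""
    return _cross(_sub(n_atom, ca_atom), _sub(c_atom, ca_atom))


def cos_between_residues(res_i, res_j):
    """Formula (3): cosine of the angle between the planes of two residues.

    Each residue is an (N, CA, C) tuple of (x, y, z) coordinates.
    """
    n1 = plane_normal(*res_i)
    n2 = plane_normal(*res_j)
    return _dot(n1, n2) / math.sqrt(_dot(n1, n1) * _dot(n2, n2))
```

Two residues whose triangles differ only by a translation give cos θ = 1, which illustrates the invariance stated above.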
To describe local features over a set of residues, the authors slide a window of size w along the Cα backbone of the protein. Within each window, the distance and angle between the first residue and each of the remaining residues are computed and added to a feature vector. Each window therefore yields one feature vector.

Let $P = \{p_1, p_2, \dots, p_n\}$ represent a protein, where $p_i$ is the i-th residue of its backbone. The feature vectors of the protein are defined as $P_v = \{pv_1, pv_2, \dots, pv_{n-w+1}\}$, where w is the width of the sliding window and each feature vector $pv_i$ is

$$pv_i=\big(d(p_i,p_{i+1}),\ \cos\theta(p_i,p_{i+1}),\ \dots,\ d(p_i,p_{i+w-1}),\ \cos\theta(p_i,p_{i+w-1})\big)$$

Here $d(p_i,p_j)$ is the distance between residues i and j, and $\cos\theta(p_i,p_j)$ is the cosine of the angle between the two residues. With a window of size w, each feature vector $pv_i$ has dimension 2(w-1).
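Putting formulas (1) to (3) together, the following Python sketch builds the vectors $pv_1, \dots, pv_{n-w+1}$ for a backbone given as a list of (N, Cα, C) coordinate triples. The helper names are hypothetical. Each vector interleaves distance and cosine as in the definition of $pv_i$, giving 2(w-1) components.

```python
import math


def _sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def _dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def _normal(residue):
    # Normal of the N-CA-C plane, (N - CA) x (C - CA), as in formula (2).
    n_atom, ca, c_atom = residue
    u, v = _sub(n_atom, ca), _sub(c_atom, ca)
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def feature_vectors(residues, w):
    """Return [pv_1, ..., pv_{n-w+1}] for a backbone of (N, CA, C) triples.

    pv_i = (d(p_i, p_{i+1}), cos(p_i, p_{i+1}), ..., d(p_i, p_{i+w-1}), cos(p_i, p_{i+w-1})),
    so each vector has 2(w - 1) components.
    """
    if w < 2 or len(residues) < w:
        return []
    normals = [_normal(r) for r in residues]
    vectors = []
    for i in range(len(residues) - w + 1):
        ca_i, n_i = residues[i][1], normals[i]
        pv = []
        for j in range(i + 1, i + w):
            n_j = normals[j]
            distance = math.dist(ca_i, residues[j][1])  # formula (1)
            cos_theta = _dot(n_i, n_j) / math.sqrt(_dot(n_i, n_i) * _dot(n_j, n_j))  # formula (3)
            pv.extend((distance, cos_theta))
        vectors.append(tuple(pv))
    return vectors
```

For instance, five identically oriented residues spaced 3.8 Å apart with w = 3 give three vectors of the form (3.8, 1.0, 7.6, 1.0).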
b) Normalizing feature vectors
The feature vectors hold distances and bond angles measured in different units, so they must be normalized. Normalization also bounds the range of values of the vector components.

The angle θ lies in $[0, \pi]$, so $\cos\theta \in [-1, 1]$. To normalize distances, an upper bound is needed on the distance between residue i and residue (i+w-1) of the protein.

All distances and angles are normalized and mapped to an integer in $[0, b-1]$, where b is a given parameter.

Each distance d in a feature vector is normalized by formula (4):

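$$\hat{d}=\left\lfloor \frac{d\cdot b}{4.025\,(w-1)} \right\rfloor \tag{4}$$

In formula (4), the constant 4.025 is the average distance between two Cα atoms, and w is the width of the sliding window. The product 4.025(w-1) therefore serves as the upper bound on the distance between residue i and residue (i+w-1).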
The angles in a feature vector are normalized by formula (5):

$$\widehat{\cos\theta}=\left\lfloor \frac{(\cos\theta+1)\cdot b}{2} \right\rfloor \tag{5}$$

After normalization, a protein structure is represented as a "sequence" of discrete values given by its feature vectors. The i-th vector describes the features of the i-th residue in the protein backbone.
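The normalization step can be sketched in Python as follows. Two parts are added assumptions:
- Clamping to b - 1. Formulas (4) and (5) alone yield b when cos θ = 1 or when d reaches 4.025(w - 1).
- The mapping from each discretized vector to a symbol id. It uses a dictionary that assigns ids in order of first appearance.

All function names are hypothetical.

```python
import math

DISTANCE_CONSTANT = 4.025  # constant of formula (4)


def normalize_distance(d, w, b):
    """Formula (4), clamped to b - 1 (the clamp is an added assumption)."""
    return min(b - 1, math.floor(d * b / (DISTANCE_CONSTANT * (w - 1))))


def normalize_cos(cos_theta, b):
    """Formula (5), clamped to b - 1 (cos_theta = 1 would otherwise give b)."""
    return min(b - 1, math.floor((cos_theta + 1) * b / 2))


def discretize(pv, w, b):
    """Normalize one feature vector laid out as (d, cos, d, cos, ...)."""
    return tuple(
        normalize_distance(x, w, b) if k % 2 == 0 else normalize_cos(x, b)
        for k, x in enumerate(pv)
    )


def to_symbol_sequence(vectors, w, b, alphabet):
    """Map each discretized vector to a symbol id.

    alphabet is a dict shared across proteins (hypothetical scheme): a vector
    seen for the first time gets the next free id.
    """
    sequence = []
    for pv in vectors:
        key = discretize(pv, w, b)
        sequence.append(alphabet.setdefault(key, len(alphabet)))
    return sequence
```

With w = 3 and b = 10, for example, the vector (3.8, 1.0, 7.6, 1.0) becomes (4, 9, 9, 9).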
c) Building the index tree
To index a set of protein structures, the paper proposes building a multi-branch tree using the algorithm shown in Figure 4.

The algorithm proceeds as follows:
- It reads the structure of each protein in the database.
- It extracts features using the method described above. This turns the three-dimensional structure of each protein into a "sequence" of feature vectors along its backbone.
- After the feature vectors are normalized, it inserts each structural "sequence" into the index tree for later lookup.

Figure 4. Algorithm for building the index tree from the structural features of proteins.

Example: building an index tree from a set of six protein structures that have already been converted into sequences. Each protein sequence is represented by a sequence of symbols, and each symbol corresponds to a normalized feature vector:
P1={a,b,d,f,a,h}; P2={b,a,d,b,d}; P3={a,b,c,b,d,s,f};
P4={c,a,b,a,b,c}; P5={c,a,b,c,c,b}; P6={a,c,b,a,d};
The resulting tree is shown in Figure 5.

Figure 5. Index tree built from the structural features of the proteins.
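Figure 5 is not reproduced here. The sketch below assumes the index tree is a plain prefix tree (trie), in which sequences sharing a prefix share a branch. This is one reading consistent with the branch merging described in Section 1, and under it the example sequences can be indexed with the Python code below. The class and function names are hypothetical.

```python
class IndexNode:
    """A node of a prefix tree over symbol sequences (hypothetical layout)."""

    def __init__(self):
        self.children = {}     # symbol -> IndexNode
        self.proteins = set()  # ids of the proteins whose sequence passes through here


def build_index(sequences):
    """Insert every protein's symbol sequence, sharing nodes for common prefixes."""
    root = IndexNode()
    for pid, seq in sequences.items():
        node = root
        for symbol in seq:
            child = node.children.get(symbol)
            if child is None:
                child = node.children[symbol] = IndexNode()
            child.proteins.add(pid)
            node = child
    return root


def count_nodes(node):
    """Number of nodes below node (the node itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def tree_height(node):
    """Length of the longest path from node down to a leaf."""
    return max((1 + tree_height(c) for c in node.children.values()), default=0)
```

For the six example sequences this gives 29 nodes below the root and a height of 7. The shared prefixes a, ab, c, ca and cab save six of the 35 nodes that separate chains for each protein would need.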
d) Querying the index tree
Given a query Q, the feature vectors of structure Q are first extracted and converted into a "sequence", as described in Sections 2a and 2b. Lookup then proceeds in three stages:
- Search: collect the structures in the database that match Q within a distance threshold H between vectors.
- Ranking: rank all proteins containing the matching subsequences found.
- Optimal selection: apply the Smith-Waterman algorithm [9] to find the best local structural alignment between the query Q and the selected proteins.
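The final stage relies on Smith-Waterman local alignment. Below is a minimal, score-only Python sketch over symbol sequences. The scoring values (match +2, mismatch -1, linear gap -1) are assumptions, since the paper does not state them. A full implementation would also trace back the aligned region.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score of symbol sequences a and b (linear gap penalty)."""
    # h[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    h = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best = 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            diagonal = h[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Scores are floored at 0 so that an alignment can start anywhere.
            h[i][j] = max(0, diagonal, h[i - 1][j] + gap, h[i][j - 1] + gap)
            best = max(best, h[i][j])
    return best
```

For example, the query "bcdb" aligns exactly inside "abcdbsf" and scores 4 × 2 = 8 under these assumed parameters.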
The algorithm for searching the index tree for a query pattern Q is as follows:

Input: a protein structure fragment Q; a minimum matching threshold H
Output: the set of protein structures that satisfy the search condition, sorted in decreasing order of the number of matched residues

Function Search(tree Root, level i, query sequence Q, threshold H) {
    While (i < tree height - length of Q)
    {
        - Group the branches at level i
        - For each node at level i
            o If node N[j] matches Q[0]
                ▪ For each child branch of N[j]: if its match with the rest of Q satisfies threshold H, then:
                    • Add the branch to the result set
                    • Remove the branch from the tree
                Return Search(Root, i + 1, Q[0], H);
            o Otherwise
                ▪ Return Search(N[j], i + 1, Q[i+1], H);
    } end while
} end function
Function Query(tree Root, query pattern Q, number of best patterns to select k, threshold H) {
    - Initialize an empty result set
    - Extract the features of Q and build its structural sequence
    - Build the index tree
    - Search(Root, i = 0, Q, H);
    - Sort the result set in decreasing order of the number of matches m;
    - Select the k best patterns in the result set and apply the Smith-Waterman algorithm to find the best local structural alignment
} end function
Example: search for the query pattern Q = {b,c,d,b} with threshold H = 3 in the index tree built from the following sequence-encoded protein structures:
P1={a,b,d,f,a,h}; P2={b,a,d,b,c}; P3={a,b,c,d,b,s,f};
P4={c,b,c,a,b,c}; P5={c,b,c,c,d,b}; P6={a,c,b,a,d}

- Query at the root (level 0) → result set {P2 (number of matches m = 3)}
- Query at level 1 → result set {P4 (m = 3), P3 (m = 4)}
- Query at level 2 → result set {P5 (m = 3)}
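The worked example can be reproduced with the simplified Python sketch below. It scans the sequences directly instead of walking the tree. At level i, it compares Q position by position with symbols i to i + |Q| - 1 of every protein not yet reported. When the number of matches m reaches H, it reports the protein and removes it from later levels.

Design choices in the sketch:
- The loop bound follows the pseudocode (i < tree height - |Q|).
- The pseudocode's test that node N[j] matches Q[0] is deliberately not applied. P5 is reported at level 2 even though its symbol at that level is c rather than b.
- The function name is hypothetical.

```python
def level_search(sequences, query, threshold):
    """Level-by-level search of the worked example (simplified, tree-free model).

    sequences: dict protein id -> symbol sequence; query: symbol sequence Q;
    threshold: minimum number of matching positions H.
    Returns (level, protein id, m) tuples sorted by m in decreasing order.
    """
    height = max(len(s) for s in sequences.values())  # height of the index tree
    remaining = dict(sequences)
    results = []
    level = 0
    while level < height - len(query):
        for pid, seq in list(remaining.items()):
            window = seq[level:level + len(query)]
            if len(window) < len(query):
                continue  # sequence too short to contain Q at this level
            m = sum(1 for x, y in zip(window, query) if x == y)
            if m >= threshold:
                results.append((level, pid, m))
                del remaining[pid]  # "remove the branch from the tree"
        level += 1
    # Ranking stage: more matched positions first (ties keep discovery order).
    results.sort(key=lambda r: r[2], reverse=True)
    return results
```

With the example data (Q = "bcdb", H = 3), the sketch reports the same proteins at the same levels as the example: P2 at level 0, P3 and P4 at level 1, and P5 at level 2. After ranking, P3 comes first.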

3. Experimental results
a) Protein structure data sources
Most protein tertiary structures are stored in the Protein Data Bank (PDB) [1], the main repository of experimentally determined protein tertiary structures. The PDB was established in 1971 at Brookhaven National Laboratory (BNL) in the United States. Its structures are determined using crystallographic methods. The PDB currently holds more than 73153 protein structures, and new entries are deposited every year.

The SCOP database [2] is maintained at the Medical Research Council (MRC) Laboratory of Molecular Biology in Cambridge, UK. It describes the structural and evolutionary relationships among known protein structures. SCOP is widely accepted as the most suitable and most reliable classification dataset, because its classification decisions are based on visual inspection of the structural features of proteins by experts. Proteins are classified hierarchically to reflect their structural and evolutionary relationships. The main levels of the hierarchy are the family, based on evolutionary relationships
