Mar. Biotechnol. 1, 465–476, 1999

© 1999 Springer-Verlag New York Inc.

Tissue-Specific Expressed Sequence Tags from the Black
Tiger Shrimp Penaeus monodon
Sigrid A. Lehnert,1,* Kate J. Wilson,2 Keren Byrne,1 and Stephen S. Moore1
1 CSIRO Tropical Agriculture, Molecular Animal Genetics Centre, Level 3, Gehrmann Laboratories, University of Queensland, St. Lucia, Qld 4072, Australia
2 Australian Institute of Marine Science, PMB No. 3, Townsville Mail Centre, Townsville, Qld 4810, Australia

Abstract: Expressed sequence tag data were generated from complementary DNA libraries created from cephalothorax, eyestalk, and pleopod tissue of the black tiger shrimp (Penaeus monodon). Significant database
matches were found for 48 of 83 nuclear genes sequenced from the cephalothorax library, 22 of 55 nuclear genes
from the eyestalk library, and 6 of 13 nuclear genes from the pleopod library. The putative identities of these
genes reflected the expected tissue specificity. For example, genes for digestive enzymes were identified from the
cephalothorax library and genes involved in the visual and neuroendocrine system from the eyestalk library. A
few sequences matched anonymous EST or genomic sequences, and others contained mini-satellite or microsatellite repeat sequences. The remainder, 31 from the cephalothorax library, 25 from the eyestalk library, and
5 from the pleopod library, were sequences of high nucleotide complexity with no matches in any database
searched and thus may represent novel genes.
Key words: Penaeus monodon, shrimp, cDNA libraries, ESTs, gene expression, genome

INTRODUCTION
Penaeus monodon, the black tiger shrimp, is the most important shrimp aquaculture species in the Southern Indo-Pacific region. In 1997, approximately 400,000 tonnes were
produced worldwide with a value of more than U.S. $3
billion (Rosenberry, 1997). This industry is now threatened
by problems of disease and depletion of the wild broodstock
that are used to stock commercial hatcheries (Browdy,
1998). Research on this species is therefore focusing on
domestication to overcome broodstock shortages and to
allow genetic selection for improved strains, and on identification of the causative agents of major prawn diseases.
Molecular tools are being applied to both problems (Benzie,
1998).

Received February 12, 1999; accepted April 13, 1999
*Corresponding author; telephone +61-7-3214-2445; fax +61-7-3214-2480; e-mail

At present, there is little baseline information on the
molecular biology of any crustacean species. The GenBank
release of July 1, 1998, contained 846 crustacean sequences.
Excluding sequences for ribosomal RNA genes, highly repetitive DNA, and mitochondrial genes, there were only 254
entries, including many redundancies. Of these, only 133
were from decapod crustaceans, the group to which most
commercially important species belong.
To address this constraint for the commercially important penaeid species, we have initiated a project to characterize expressed sequence tags (ESTs) in P. monodon. ESTs
are generated by single-pass DNA sequencing of clones obtained from complementary DNA (cDNA) libraries and are a
powerful tool in the genetic characterization of organisms,
owing in large part to the speed and affordability of generating these sequences. Comparison of sequences obtained
with those available in public sequence databases allows
putative identification of many genes (Marra et al., 1998).
ESTs can be used to characterize patterns of gene expression
as they represent genes that are actively expressed in the
tissues from which cDNA libraries are prepared (Fields,
1994). Expressed sequence data are also an important component of gene mapping studies, as anchor loci to allow the
transfer of gene map information between species (Khan et
al., 1992; Ruyter-Spira et al., 1996).
As the first step in this project we have sequenced a
limited number of clones from three different shrimp
cDNA libraries to obtain information about representation,
sequence homologies to other species, and gene expression
that will inform further work on shrimp ESTs. Shrimp ESTs
will act as a valuable resource for the study of crustacean
physiology, genetics, and molecular evolution.

MATERIALS AND METHODS

Whole Cephalothorax Library
For the “whole cephalothorax” cDNA library, mRNA was
prepared from five adult P. monodon obtained from a North
Queensland prawn farm. The cephalothorax was severed
from the abdomen, the carapace was carefully removed, and
the legs and other appendages were cut off. Eyes and eyestalks were included in the preparation. Total RNA was
extracted from tissues snap frozen in liquid nitrogen, disrupted with a hammer, and homogenized in Trizol LS
(Gibco) using a glass homogenizer. Messenger RNA was
prepared using oligo(dT)-cellulose spin columns (Pharmacia mRNA purification kit), and cDNA was prepared and
directionally cloned into the UniZAP vector (Stratagene)
using the ZAP-cDNA synthesis kit (Stratagene). The resulting library contained 1.6 × 10^6 independent phages and was
amplified to 5 × 10^10 pfu/ml. For dot blot analysis, 500 ng
of plasmid DNA was spotted onto Hybond N+ membrane
(Amersham) and hybridized to a 32P-labeled probe using
standard protocols (Sambrook et al., 1989).

Eyestalk and Pleopod cDNA Libraries
RNA was prepared from four eyestalks, one each from four
separate broodstock females provided by a Queensland
hatchery, and from 10 pleopods, 5 each from two tank-reared males, using the “one-step” total RNA preparation
method (Chomczynski and Sacchi, 1987). Messenger RNA
was prepared using oligo(dT)-cellulose spin columns (Promega mRNA purification kit), and cDNA was prepared and
directionally cloned into the ZAP Express vector (Stratagene) using the ZAP Express cDNA synthesis kit (Stratagene). The resulting libraries contained 5 × 10^5 independent
phage and were amplified to 10^8 pfu/ml.

EST Sequencing and Analysis
In vivo excision using ExAssist helper phage was performed
on a small aliquot of each library. Plasmid DNA from the
resulting pBluescript SK phagemid clones (cephalothorax
library) or pBK-CMV phagemid clones (eyestalk and pleopod libraries) was prepared using spin column miniprep
kits. Automated cycle sequencing was performed on 300 ng
of plasmid DNA using the T3 sequencing primer and ABI
reagents and equipment. Sequence analysis and BLASTN/
BLASTX homology searches were carried out using
MacVector 6.0 (Oxford Molecular Group) software and the
Australian National Genome Information Service (ANGIS).
BLASTX searches were performed for the top strand only
on the ANGIS nonredundant protein database using the
PAM120 matrix and with prefiltering of sequences using the
programs SEG followed by XNU to eliminate regions of low
complexity.
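
As a minimal sketch of what a top-strand BLASTX query involves, the Python fragment below produces the three forward-frame conceptual translations of an EST using the standard genetic code. The function names are illustrative assumptions and are not part of BLASTX, MacVector, or ANGIS; an actual search would additionally apply the PAM120 scoring and the SEG/XNU prefiltering described above.

```python
from itertools import product

# Standard genetic code, listed in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; unrecognized codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def forward_frames(dna: str) -> list[str]:
    """Conceptual translations of the three top-strand reading frames."""
    return [translate(dna[offset:]) for offset in range(3)]


# Hypothetical 9-base fragment used only to exercise the functions.
frames = forward_frames("ATGAATAAT")
```

Searching only the top strand is reasonable for directionally cloned libraries sequenced from the 5′ end, because the coding strand orientation is then known in advance.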


RESULTS AND DISCUSSION

Complementary DNA Libraries
A key prerequisite for an EST project is the availability of
high-quality cDNA libraries. Ideally, a library should be
unidirectionally cloned and comprise a high proportion of
long or full-length clones, representative of the entire
mRNA population from the tissue used to generate the
library and free of contaminating mitochondrial, ribosomal
RNA, and genomic sequences (Adams et al., 1995). These
parameters can be evaluated by initial sequencing of a small
number of randomly selected clones from each library.
In the present study, unidirectional cDNA libraries
were constructed from mRNA isolated from the cephalothorax, pleopods (swimming legs), or eyestalks of the black
tiger shrimp P. monodon. The average insert size of the
whole cephalothorax library was 2 kb, of the pleopod library
1.6 kb, and of the eyestalk library 1.2 kb, which exceeds the
minimum size of 1 kb recommended by Adams et al.
(1995).
To assess the sequence content, phagemid inserts were
excised in vivo from small aliquots of the libraries, and a
number of randomly selected clones were sequenced from
the 5′ region of the clones. GenBank and EMBL databases
(without EST sequences) were searched for homologies
with shrimp ESTs using the BLASTN program (Altschul et
al., 1990). Protein databases were also searched with conceptual translations of the EST sequences using the program
BLASTX (Gish and States, 1993). Probabilities <10^−2 were
considered significant (Adams et al., 1991; Anderson and
Brass, 1998). Sequence matches due to regions of low-complexity nucleotide or amino acid sequence were not
included in the final table of putative homologies (see further discussion below).
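
The selection rule described above (keep only hits with a probability below 10^−2, and report the highest-scoring hit per clone) can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' procedure: the `Hit` record and function name are hypothetical, and two of the example hits are invented to show the filtering.

```python
from dataclasses import dataclass

SIGNIFICANCE_THRESHOLD = 1e-2  # probabilities below this were considered significant


@dataclass(frozen=True)
class Hit:
    clone: str
    subject: str
    probability: float
    score: int


def best_significant_hits(hits, threshold=SIGNIFICANCE_THRESHOLD):
    """Return the highest-scoring significant hit for each clone."""
    best = {}
    for hit in hits:
        if hit.probability >= threshold:
            continue  # fails the significance cutoff
        current = best.get(hit.clone)
        if current is None or hit.score > current.score:
            best[hit.clone] = hit
    return best


hits = [
    Hit("SAL046", "hemocyanin", 2.4e-117, 1148),
    Hit("SAL046", "hypothetical weaker hit", 1e-20, 300),  # invented for illustration
    Hit("SAL083", "low density lipoprotein receptor", 1.1e-3, 170),
    Hit("cloneX", "hypothetical marginal hit", 0.05, 60),  # invented; fails the cutoff
]
best = best_significant_hits(hits)
```

With these inputs only SAL046 and SAL083 are retained, and SAL046 is reported with its hemocyanin match.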
The libraries all showed some contamination with mitochondrial sequences. This was most pronounced in the
pleopod library, for which 14 of 32 clones sequenced were
the mitochondrial large subunit rRNA and a further 5 were
other mitochondrial genes. This library was therefore
deemed unsuitable for future large-scale EST analysis. The
cephalothorax library had less than 10% mitochondrial
genes, and only one such sequence was detected among 56
analyzed sequences from the eyestalk library. No matches to
nuclear ribosomal RNA genes were found in any of the
libraries, and hence both the cephalothorax and the eyestalk
library satisfy the criterion that there should be less than
20% of clones with no insert, or mitochondrial or rRNA
genes (Adams et al., 1995) (Table 1).
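
The 20% criterion can be checked directly from the clone counts in Table 1. The sketch below is a worked calculation only; the function name is an assumption, and because the number of clones without an insert is not reported separately it is taken to be zero here.

```python
# Clone counts from Table 1: (total clones, mt rRNA, other mt genes, nuclear rRNA)
LIBRARIES = {
    "cephalothorax": (112, 9, 1, 0),
    "eyestalk": (56, 1, 0, 0),
    "pleopod": (32, 14, 5, 0),
}
MAX_CONTAMINATION = 0.20  # criterion attributed to Adams et al. (1995)


def contamination_fraction(total, mt_rrna, other_mt, nuclear_rrna, no_insert=0):
    """Fraction of clones that are mitochondrial, rRNA, or (assumed zero) empty."""
    return (mt_rrna + other_mt + nuclear_rrna + no_insert) / total


for name, counts in LIBRARIES.items():
    fraction = contamination_fraction(*counts)
    verdict = "passes" if fraction < MAX_CONTAMINATION else "fails"
    print(f"{name}: {fraction:.1%} contaminating clones ({verdict})")
```

Under these assumptions the cephalothorax (about 8.9%) and eyestalk (about 1.8%) libraries pass, whereas the pleopod library (about 59.4%) does not.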
The presence of mitochondrial sequences in these libraries may result from the high A+T content of the P.
monodon mitochondrial genome (>70% A+T; K. Wilson,
unpublished data), which may lead to mitochondrial RNA
molecules binding to the oligo(dT) columns used to isolate
poly(A)-containing mRNA. The differing prevalence of
contaminating mitochondrial genes in the three libraries
probably reflects the abundance of mitochondria in the different tissues, as the eyestalk and pleopod libraries were
made in parallel using exactly the same techniques. Moreover, the cephalothorax library, although made separately,
used very similar protocols. Considerable variation in the
prevalence of mitochondrial genes in different cDNA libraries was also observed by Adams et al. (1995).
Another useful criterion for evaluating cDNA libraries
is “distinctness,” or the proportion of sequences that are
clearly different from each other (Adams et al., 1995). Of
the nuclear genes, 59.8% were distinct in the cephalothorax
library, and 81.8% in the eyestalk library (Table 1). The
figure for the cephalothorax library is heavily affected by the
extreme abundance of hemocyanin clones. In general, such
highly abundant clones can be avoided by prescreening the
library prior to further sequencing (Adams et al., 1995). In
the present study, after the first five hemocyanin and five
mitochondrial clones had been identified by sequencing, the
cephalothorax library was prescreened with a hemocyanin
probe and a P. monodon 10-kb mitochondrial probe containing both the large and small subunit mitochondrial
rRNAs as well as a number of other mitochondrial genes (K.
Wilson, unpublished data). This identified a further 19 hemocyanin clones and 5 mitochondrial clones. Following
prescreening, there were 77.2% distinct nuclear clones in
the cephalothorax library (Table 1).
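
The effect of prescreening on distinctness for the cephalothorax library can be reproduced from the two ratios given in Table 1. The helper below is a hypothetical name used only for this worked example.

```python
def distinctness(distinct_clones: int, nuclear_clones: int) -> float:
    """Fraction of nuclear clones that are clearly different from each other."""
    return distinct_clones / nuclear_clones


before = distinctness(61, 102)  # all identified nuclear clones (Table 1)
after = distinctness(61, 79)    # denominator after prescreening, as given in Table 1
print(f"{before:.1%} -> {after:.1%}")  # prints "59.8% -> 77.2%"
```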

Identification of Shrimp Homologues for
Well-Characterized Sequences
On the basis of homology searches, 48 of 83 nuclear sequences analyzed from the cephalothorax cDNA library, 22
of 55 from the eyestalk library, and 6 of 13 from the pleopod library could be assigned a possible identity (Tables 1
and 2).
Most of the identified sequences were represented only
once. The most prominent exception was the hemocyanin
gene, as discussed above. In addition, homologues of arrestin, GTP binding proteins, members of the thrombospondin family, as well as a number of muscle-related RNAs,
arginine kinase, myosin light and heavy chains, tropomyosin and actin, were all found up to four times (Table 2).


Unidentified Sequences
The sequences that did not show significant homologies in
the initial BLAST searches were compared with the GenBank EST databases. This search found another four significant homologies, three from the cephalothorax library
and one from the eyestalk library (Table 2). In addition, one
of the eyestalk sequences showed homology to a hypothetical protein of unknown function identified from the Caenorhabditis elegans (nematode) genomic sequencing project.
The remaining “unidentified” EST sequences were
aligned with each other at the nucleic acid level using
MacVector 6.0 to look for families of related sequences.
Only scores higher than 1000 were considered significant.
The shrimp cephalothorax cDNA library contained a family
of 11 highly related sequences that did not match any sequences in the published databases or the ESTs from the
eyestalk or pleopod cDNA libraries. One further pair of
unidentified related sequences was also discovered in the
cephalothorax library.
The eyestalk library contained one family of four highly
related unknown sequences that did not overlap with sequences from the other libraries. One of the “unknown”
sequences from the pleopod library was related to one from
the eyestalk library.

Table 1. Complementary DNA Library Statistics

| | Cephalothorax | Eyestalk | Pleopod |
|---|---|---|---|
| Composition of cDNA library* | | | |
| Nuclear genes | 102/112 (91.0%) | 55/56 (98.2%) | 13/32 (40.6%) |
| Nuclear rRNA | 0 | 0 | 0 |
| mt rRNA | 9/112 (8.0%) | 1/56 (1.8%) | 14/32 (43.8%) |
| Other mt genes | 1/112 (0.9%) | 0 | 5/32 (15.6%) |
| Results of database searches† | | | |
| Nuclear genes with matches to database sequences | 48/83 (57.8%) | 22/55 (40.0%) | 6/13 (46.2%) |
| Nuclear genes with matches to anonymous database sequences | 3/83 (3.6%) | 2/55 (3.6%) | 0.0% |
| Nuclear genes of high sequence complexity with no database matches | 31/83 (37.3%) | 25/55 (45.5%) | 5/13 (38.5%) |
| cDNAs with microsatellite sequences | 1/83 (1.2%) | 6/55 (10.9%) | 2/13 (15.4%) |
| Distinct nuclear genes‡ | 61/102 (59.8%) or 61/79 (77.2%) | 48/55 (81.8%) | 12/13 (91.7%) |

*Data for the cephalothorax library are based on a combination of sequence and hybridization data: 88 clones were sequenced, and a further 19 hemocyanin clones and 5 mitochondrial clones were identified by dot blot hybridization.
†Percentages for the cephalothorax library are based on the total number of sequenced nuclear clones, not the total number of nuclear clones.
‡Two figures are given for the cephalothorax library. The first is based on the total number of identified nuclear clones, and the second is the figure that would result if all hemocyanin clones were eliminated by prescreening.

Table 2. Homologies to Shrimp ESTs Identified in Database Searches

| Clone ID | Sequence length (bp) | Library | Redundancy | Database | Gene | Database accession | Species with closest homology | Probability | Score |
|---|---|---|---|---|---|---|---|---|---|
| Protein synthesis and processing | | | | | | | | | |
| SAL109 | 812 | C | | GB | Acidic ribosomal phosphoprotein | M17885 | Human | 1.00E-39 | 394 |
| AIMS-P.mon66 | 290 | P | | GB | Elongation factor 1α | X77689 | Zebra fish | 3.60E-78 | 1036 |
| SAL078 | 767 | C | | GB | Elongation factor 2 | M86959 | Nematode | 5.10E-84 | 449 |
| SAL010 | 295 | C | | GB | Heat shock protein hsp 60 | X99341 | Fruit fly | 1.40E-13 | 283 |
| AIMS-P.mon19 | 302 | E | | GB | Ribosomal protein L18 | L04128 | Mouse | 1.80E-24 | 412 |
| SAL012 | 538 | C | | GB | Ribosomal protein L27a | U66358 | Fruit fly | 9.10E-58 | 782 |
| SAL111 | 857 | C | | GB | Ribosomal protein L7a | X62640 | Chicken | 1.30E-71 | 597 |
| SAL039 | 505 | C | | GB | Ribosomal protein S1a | X57322 | Frog | 1.10E-65 | 913 |
| SAL084 | 857 | C | | GB | Ribosomal protein S2 | U01334 | Fruit fly | 2.50E-120 | 1386 |
| SAL063 | 767 | C | | GB | Ribosomal protein S7 | L20096 | Tobacco hornworm | 9.80E-33 | 442 |
| AIMS-P.mon30 | 379 | E | | GB | Ribosomal small subunit protein (40S) | S57432 | Frog | 1.10E-47 | 474 |
| AIMS-P.mon52 | 432 | E | | GB | Protein disulfide isomerase | U18973 | Fruit fly | 5.60E-41 | 604 |
| AIMS-P.mon26 | 569 | E | | GB | Ubiquitin | X69422 | Wild oat | 2.00E-152 | 1957 |
| Muscle-related proteins | | | | | | | | | |
| AIMS-P.mon44 | 389 | E | | GB | Arginine kinase | X68703 | Lobster: H. vulgaris | 5.70E-109 | 1404 |
| SAL048 | 677 | C | | GB | Arginine kinase | X68703 | Lobster: H. vulgaris | 1.10E-36 | 513 |
| SAL104 | 836 | C | | GB | Actin (muscle) | D87740 | Fish | 4.90E-96 | 748 |
| SAL053 | 817 | C | | GB | Muscle LIM protein | X81192 | Fruit fly | 1.90E-41 | 624 |
| AIMS-P.mon21 | 296 | E | | GB | Muscle LIM protein 84B | X91245 | Fruit fly | 4.00E-16 | 316 |
| AIMS-P.mon4 | 413 | E | | GB | Myosin alkali light chain | L08052 | Fruit fly | 4.20E-54 | 756 |
| AIMS-P.mon58 | 203 | P | | SP | Myosin alkali light chain | Q24755 | Fruit fly | 1.60E-10 | 83 |
| AIMS-P.mon1 | 271 | E | | GB | Myosin (fast) heavy chain | U03091 | Lobster: H. americanus | 4.00E-43 | 637 |
| SAL033 | 684 | C | | GB | Myosin heavy chain | M61229 | Fruit fly | 3.00E-108 | 1427 |
| SAL096 | 867 | C | | GB | Myosin light chain | L08051 | Fruit fly | 8.30E-51 | 737 |
| SAL088 | 832 | C | | GB | Sarco/endoplasmic reticulum Ca2+ ATPase | AF025848 | Crayfish: P. clarkii | 2.20E-107 | 1417 |
| SAL054 | 977 | C | | SP | Sarcoplasmic calcium binding protein | SCPB_PENSP | Shrimp: Penaeus sp. | 9.10E-56 | 184 |
| SAL101 | 854 | C | | GB | Sarcoplasmic calcium binding protein | AF014951 | Fruit fly | 5.60E-09 | 234 |
| SAL097 | 881 | C | | GB | Tropomyosin | AF034954 | Lobster: H. americanus | 1.40E-117 | 1540 |
| Energy generation and metabolism | | | | | | | | | |
| SAL042 | 632 | C | | GB | ATP synthetase alpha subunit | Y07894 | Fruit fly | 3.20E-34 | 535 |
| AIMS-P.mon68 | 365 | P | | SP | ATP synthase E chain | Q06185 | Mouse | 8.10E-07 | 100 |
| AIMS-P.mon57 | 378 | P | | GB | Glyceraldehyde-3-phosphate dehydrogenase | M11259 | Fruit fly | 4.7E-67 | 903 |
| SAL106 | 737 | C | | SP | NADH ubiquinone oxidoreductase 13 kDa subunit | NUFM_BOVIN | Cow | 1.10E-42 | 235 |
| AIMS-P.mon8 | 395 | E | | GB | NADH:ubiquinone oxidoreductase flavoprotein 1 subunit | S67973 | Human | 3.80E-84 | 1102 |
| AIMS-P.mon59 | 210 | P | | SP | NADH-ubiquinone oxidoreductase MLRQ subunit | Q62425 | Mouse | 1.90E-16 | 145 |
| Digestive enzymes | | | | | | | | | |
| SAL099 | 741 | C | | GB | Cathepsin L-like cysteine proteinase | X85127 | Shrimp: P. vannamei | 5.50E-163 | 1274 |
| SAL028 | 707 | C | | GB | Pyruvate kinase | X15800 | Rat | 1.10E-11 | 266 |
| SAL107 | 809 | C | | GB | Trypsin | X86369 | Shrimp: P. vannamei | 2.80E-122 | 875 |
| Neurosensory/endocrine system | | | | | | | | | |
| AIMS-P.mon10 | 486 | E | | GB | Beta-arrestin | M33601 | Cow | 1.20E-40 | 451 |
| AIMS-P.mon31 | 599 | E | | GB | Arrestin | M30140 | Fruit fly | 9.30E-46 | 675 |
| SAL040 | 509 | C | | GB | Assembly protein 180 | X68878 | Rat | 9.30E-16 | 313 |
| SAL068 | 674 | C | | GB | Cysteine string protein | S81917 | Rat | 7.80E-42 | 296 |
| SAL117 | 772 | C | | GB | GTP binding protein | M33141 | Cow | 8.60E-86 | 1106 |
| AIMS-P.mon3 | 409 | E | | SP | GTP binding protein | GBGB_HUMAN | Human | 2.00E-08 | 113 |
| SAL026 | 679 | C | | GB | Opsin | X71665 | Mantis | 1.90E-54 | 780 |
| AIMS-P.mon17 | 400 | E | | GB | Phenylalanine/tryptophan hydroxylase | M32802 | Fruit fly | 2.90E-46 | 672 |
| AIMS-P.mon7 | 448 | E | | GB | Phospholipase C | J03138 | Fruit fly | 2.50E-84 | 1111 |
| SAL087 | 857 | C | | GB | Ubiquitin carboxyl-terminal hydrolase | M30496 | Human | 4.10E-15 | 132 |
| Other | | | | | | | | | |
| SAL006 | 300 | C | | GB | Actin (cytoplasmic) | U09635 | Sea urchin | 3.00E-70 | 775 |
| SAL114 | 734 | C | | GB | Aldose reductase | M59754 | Cow | 5.00E-55 | 787 |
| SAL027 | 519 | C | | GB | Cartilage oligomeric matrix protein | X74326 | Cow | 1.90E-05 | 144 |
| AIMS-P.mon39 | 402 | E | | GB | Clathrin heavy chain 2 | U60803 | Human | 1.20E-89 | 1175 |
| AIMS-P.mon13 | 403 | E | | SP | Dynactin | DYNC_HUMAN | Human | 6.40E-22 | 154 |
| AIMS-P.mon49 | 357 | E | | PIR | Exoskeletal protein | S77934 | Lobster: H. americanus | 3.4E-18 | 169 |
| SAL046 | 628 | C | 24 | GB | Hemocyanin | X82502 | Shrimp: P. vannamei | 2.40E-117 | 1148 |
| AIMS-P.mon20 | 411 | E | | SP | Hemocyte transglutaminase | Q05187 | Horseshoe crab: T. tridentatus | 5.6E-32 | 242 |
| AIMS-P.mon25 | 566 | E | | GB | Histone H1 | D87065 | Wheat | 1.70E-17 | 335 |
| SAL083 | 810 | C | | GB | Low density lipoprotein receptor | M11501 | Rabbit | 1.10E-03 | 170 |
| SAL060 | 467 | C | | SP | Metallothionein-1 | MT1_HOMAM | Lobster: H. americanus | 3.00E-36 | 299 |
| SAL043 | 671 | C | | GB | Putative transcriptional regulator (CON7) | AF015771 | Fungus | 3.10E-08 | 224 |
| SAL005 | 484 | C | | GB | Receptor for activated kinase (RACK1) | AF025331 | Fish | 9.70E-25 | 194 |
| SAL072 | 791 | C | | GB | Thrombospondin-1 | U76994 | Chicken | 1.40E-13 | 289 |
| SAL082 | 754 | C | | GB | Thrombospondin-4 | Z19585 | Human | 1.50E-37 | 369 |
| SAL008 | 311 | C | | GB | U2 snRNP-specific A′ protein | X13482 | Human | 8.80E-36 | 539 |
| AIMS-P.mon63 | 249 | P | | GB | Voltage-dependent anion-selective channel | U70314 | Fruit fly | 7.9E-27 | 434 |
| Matches to anonymous database sequences | | | | | | | | | |
| SAL065 | 665 | C | | GB (EST) | EST-mouse skin | AA512039 | Mouse | 1.30E-10 | 252 |
| SAL075 | 807 | C | | GB (EST) | EST-fruit fly head | AA699176 | Fruit fly | 5.00E-05 | 185 |
| SAL100 | 738 | C | | GB (EST) | EST-fruit fly head | AI062868 | Fruit fly | 1.70E-06 | 203 |
| AIMS-P.mon2 | 427 | E | | GB (EST) | EST-human fetal heart | AA009415 | Human | 5.70E-14 | 289 |
| AIMS-P.mon16 | 310 | E | | TREMBL | Genomic sequence-nematode | E276071 | Nematode | 86 | 86 |

*When multiple significant similarities were found for a single cDNA, only the highest scoring hit is included in the table. When more than one clone from a library matched the same gene, only the highest scoring clone from each library is listed and the total number of “hits” (i.e., including the listed one) is indicated in the “redundancy” column.
†Abbreviations for cDNA libraries: C indicates cephalothorax; E, eyestalk; P, pleopod.
‡All shrimp EST sequences described in this article, including those without database matches, have been submitted to GenBank (ESTs). The accession numbers for AIMS-P.mon1–AIMS-P.mon55 (eyestalk library) are AI253798–AI253852, for AIMS-P.mon56–AIMS-P.mon68 (pleopod library) are AI253853–AI253865, and for SAL001–SAL118 (cephalothorax library) are AI254886–253953. Databases: GB indicates GenBank (no ESTs); GB(EST), GenBank dbEST; PIR, Protein Information Resource; SP, SwissPROT; TREMBL, translated from EMBL.
Redundancy values of 2, 2, and 2 (entries from acidic ribosomal phosphoprotein through AIMS-P.mon4), 3 and 2 (entries from AIMS-P.mon58 through trypsin), and 2, 2, and 2 (neurosensory/endocrine and other entries) could not be assigned to individual rows in this copy of the table; only the hemocyanin value (24) is unambiguous.

Table 3. Microsatellite Sequences from Shrimp EST Sequences

| Clone | Type of microsatellite | Nucleotide repeat unit | Microsatellite sequence |
|---|---|---|---|
| SAL001* | Perfect | Dinucleotide | (GA)56 |
| SAL015 | Compound imperfect | Di- and tetranucleotide | (GA)25GT(GA)2GTGAAA(GA)2CA(GA)2GTGA(GAAA)4AAACA(GA)4GAAA(GA)3 |
| AIMS-P.mon12 | Compound perfect and compound imperfect | Di- and tetranucleotide | (ATAG)5(AG)17 and (ATGT)5ACGT(ATGT)2, (AGAT)4(TGAT)(AGAT)(AT)4(AG)4 |
| AIMS-P.mon27 | Imperfect | Trinucleotide | (AAT)9AGT(AAT)5 |
| AIMS-P.mon35 | Compound perfect | Pentanucleotide/mononucleotide | (TTTTC)5(T)11 |
| AIMS-P.mon45 | Imperfect | Trinucleotide | (AAT)7AGT(AAT)13 |
| AIMS-P.mon47 | Perfect | Dinucleotide | (AG)8 |
| AIMS-P.mon56 | Imperfect | Trinucleotide | (ATG)4ACG(ATG)6 and (ATG)6(ACG)2(ATG)4 |
| AIMS-P.mon60 | Imperfect/perfect | Trinucleotide | (CCA)3TCA(CCA)3 and (GCC)6 |

*This clone appears to be a chimeric clone, containing mitochondrial 16S rRNA sequences associated with microsatellite sequence at the extreme 5′ end of the clone. Sequencing the complete mitochondrial genome of P. monodon failed to detect this microsatellite (K. Wilson, unpublished data). The 16S rRNA–like sequence in SAL001 is clearly that of a decapod crustacean, based on BLASTN analysis.


Complementary DNAs Containing Sequences of
Low Complexity
A number of sequenced cDNA clones contained microsatellite sequences (Table 3). Microsatellites are commonly believed to occur primarily in noncoding DNA. However,
surveys of other cDNA libraries have indicated that up to
8% of clones may contain microsatellites (Khan et al., 1992;
Depeiges et al., 1995; Ruyter-Spira et al., 1996).
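
One simple way to screen EST sequences for perfect microsatellites is a regular-expression search for short tandem repeats. The sketch below illustrates the idea and is not the method used in this study; the thresholds (`min_repeats=4`, `max_unit=6`) and the function name are assumptions. Imperfect or compound repeats are reported as separate perfect runs, and a repeat unit may be reported in rotated form (for example TAA rather than AAT).

```python
import re


def find_perfect_microsatellites(seq: str, min_repeats: int = 4, max_unit: int = 6):
    """Yield (start, unit, repeat_count) for perfect tandem repeats in seq."""
    # Lazy unit match so that the shortest repeating unit is preferred.
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_unit, min_repeats - 1))
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        yield match.start(), unit, len(match.group(0)) // len(unit)


# (AG)8 with hypothetical flanking bases, and the interrupted repeat (AAT)7AGT(AAT)13.
simple = list(find_perfect_microsatellites("CCT" + "AG" * 8 + "TTC"))
interrupted = list(find_perfect_microsatellites("AAT" * 7 + "AGT" + "AAT" * 13))
```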
A high proportion of the P. monodon cDNA microsatellites identified in this study were imperfect and compound
repeats, an observation consistent with the results of other
groups studying penaeid microsatellites (Tassanakajon et
al., 1998). It is also noteworthy that three of the nine clones
with microsatellites (AIMS-P.mon12, AIMS-P.mon56, and
AIMS-P.mon60) contained two independent microsatellites, further emphasizing the complexity of microsatellite
structure in decapod crustaceans. When subjected to
BLASTX searches without prefiltering to remove sequences
of low complexity, almost all these sequences produced statistically significant database matches. For example, AIMS-P.mon45 showed homology to phosphatidylinositol 3-kinase from the slime mold Dictyostelium discoideum with a
score of 101 and p value of 1.50E-14. However, in each case
these matches are almost entirely due to long runs of a
single amino acid (asparagine in the example of AIMS-P.mon45), and the significance is therefore difficult to
evaluate.
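
The link between a trinucleotide microsatellite and a run of a single amino acid can be illustrated by translating the repeat reported for AIMS-P.mon45 in Table 3. In the sketch below, the dominant-residue fraction is a crude stand-in for a low-complexity measure; it is an assumption made for illustration and is not the SEG or XNU algorithm.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a DNA string codon by codon; unrecognized codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def dominant_residue_fraction(protein: str) -> float:
    """Fraction of the sequence made up of its single most common residue."""
    if not protein:
        return 0.0
    return Counter(protein).most_common(1)[0][1] / len(protein)


# (AAT)7AGT(AAT)13, read in the frame in which AAT encodes asparagine (N).
repeat_protein = translate("AAT" * 7 + "AGT" + "AAT" * 13)
fraction = dominant_residue_fraction(repeat_protein)
```

In this frame the repeat translates to 20 asparagine residues interrupted by a single serine, which is the kind of sequence that produces spurious matches when low-complexity filtering is switched off.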
Two further eyestalk sequences (AIMS-P.mon32 and
AIMS-P.mon43) and one cephalothorax sequence (SAL079)
showed long stretches consisting only of A and G residues,
but without the regular short repeat unit that would allow
them to be classed as microsatellites. In AIMS-P.mon32,
three out of the four blocks of A and G residues contain a
repeat unit of 31 residues with only a single nucleotide
difference between them, which perhaps could be classed as
a minisatellite. Five of the cephalothorax sequences
(SAL024, 071, 091, 095, and 106) showed stretches of
around 15 A or T residues, which had to be filtered out in
order to perform database searches. Only one of the ESTs
containing poly(T) or poly(A) (SAL106) matched a database entry.
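
The text does not state how the poly(A) and poly(T) stretches were filtered out. One plausible approach, sketched here purely as an assumption, is to mask long homopolymer runs with N before submitting the sequence to a database search; the function name and the default run length of 15 are illustrative.

```python
import re


def mask_homopolymer_runs(seq: str, min_length: int = 15) -> str:
    """Replace runs of A or T at least min_length long with N of equal length."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_length, min_length))
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq.upper())


# Hypothetical sequence with a 16-base poly(A) stretch.
masked = mask_homopolymer_runs("GCC" + "A" * 16 + "GGT")
```

Masking rather than deleting keeps the coordinates of the remaining sequence unchanged, so any database match can still be mapped back to the original clone.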
A further group of sequences did not contain obvious
nucleotide repeat units, but conceptual translation of the
sequence revealed significant repetition of amino acid residues, leading to apparent database matches when the
BLASTX searches were run without prefiltering. There are
three striking examples: AIMS-P.mon9 gives rise to a proline-rich sequence with homology to proteins such as a
putative proline-rich protein from the nematode Caenorhabditis elegans (TREMBL Q20001); AIMS-P.mon11 potentially encodes an arginine-rich protein with homology to
a human brain protein (TREMBL D1026392) and a mouse
protein kinase homologue (GenPept AF033663); and
AIMS-P.mon36 may encode a glycine-rich protein with homology to the glycine-rich cell wall structural proteins of
plants (e.g., SwissPROT GRP1_PETHY). Similarly, SAL007
may encode a proline- or glycine-rich protein, and SAL041
could give rise to a very arginine-rich sequence with homologies to sperm protamines (e.g., SwissPROT
HSP1_TACAC).

Tissue-Specific Expression of Shrimp Sequences
Shrimp ESTs that were assigned putative identities based on
homology searches are listed in Table 2. The ESTs are
broadly grouped according to functional category, to highlight the distribution of these initial ESTs between the different cDNA libraries and, hence, tissue types.
The shrimp cephalothorax contains all the major organ
systems (digestive system, nervous system, hepatopancreas,
gills, gonads, endocrine system, and heart); therefore, the
cephalothorax library would be expected to contain a large
diversity of transcripts. The most abundant transcript identified from the cephalothorax was hemocyanin (24 of 112
clones when assayed by dot blot hybridization). As the
hepatopancreas is the main site of synthesis of hemocyanin
(Rainer and Brouwer, 1993), and as hemocyanin is a highly
abundant protein, the prevalence of transcripts of this gene
would be predicted. The protein encoded by the metallothionein homologue may be linked with hemocyanin activity as it is probably involved in reactivation of copper-depleted hemocyanin (Brouwer et al., 1989).
A number of sequences showed homology to genes
involved in general cellular maintenance, such as ribosomal
proteins and polypeptide elongation factors. A preponderance of transcripts from muscle-specific genes, such as actin, myosin, and the sarcoplasmic calcium-binding protein,
and of digestive enzymes such as trypsin, is to be expected
in the cephalothorax library. Penaeus monodon homologues
of the thrombospondin family of genes were identified three
times during the BLAST analysis (SAL027, SAL072, and
SAL082). Thrombospondins and the related cartilage oligomeric matrix protein constitute a family of glycoproteins
that appear to be involved in cell-to-cell and cell-to-matrix
adhesion. Alignment of SAL027, SAL072, and SAL082 with
thrombospondin peptide sequences showed that the ESTs
aligned with the COOH-terminal region and the type 3
calcium binding repeats of the thrombospondin molecule.
The three ESTs therefore appear to code for truncated crustacean homologues of the thrombospondin family of molecules.
Other transcripts potentially encode proteins related to
activity of the shrimp nervous system; e.g., homologues of
the cysteine string protein have been associated with neural
activity in Drosophila, the marine ray, and rats (Braun and
Scheller, 1995). Expression of the clathrin assembly protein
180 is restricted to neuronal tissue in mammals (Morris et
al., 1993). A transcript encoding a homologue of the retinal
pigment opsin was detected most likely because eyestalks
were included in the cephalothorax tissue used to make this
library.
The eyestalk of P. monodon contains both optical and
endocrine organs as well as associated tissues such as muscle
and the vascular system. At the terminus of the eyestalk lies
the compound eye, consisting of photoreceptor and pigment cells, from which the optic ganglion runs through the
eyestalk to the brain. At the base of the eyestalk lies the
medulla terminalis X-organ, the site of synthesis of a number of hormones, many of which are stored in the sinus
gland, a neurohemal organ located around the midpoint of
the eyestalk. The eyestalk also contains the organ of Bellonci, which is believed to be sensory or secretory in nature
(Fingerman, 1992).
In the eyestalk, homologues of a number of genes involved in general cellular metabolism were identified (e.g.,
ubiquitin, histone, and ribosomal proteins), as well as a
group involved in muscle action (e.g., arginine kinase and
the myosin molecules). The remaining genes appeared to be
related to specific cell types within the eyestalk.
The most common sequence homology identified
among the eyestalk ESTs was to arrestin (AIMS-P.mon10,
AIMS-P.mon18, AIMS-P.mon31, AIMS-P.mon37). Arrestins are a family of proteins involved in desensitization of
various G-protein-coupled receptors. In vertebrates, there
are two broad classes: the visual arrestins, which inactivate
rhodopsin, and the β-arrestins, which inactivate the β-adrenergic receptor. Two different arrestin molecules have
been identified in the Drosophila visual system (Matsumoto
and Yamada, 1991), and two different classes have also been
identified in insect antennae (Raming et al., 1993). Three of
the four shrimp clones appear to contain the 5′ end of the
gene and hence can be aligned with each other. This indicates clear differences between the sequence encoded by
AIMS-P.mon10 and that encoded by AIMS-P.mon18 and
AIMS-P.mon38, suggesting that two different types of arrestins may also have been identified in the shrimp. However, the functional significance of these different arrestin
sequences in invertebrates remains unclear.
Another gene that may be involved in vision is the
phospholipase C (PLC) homologue. In Drosophila, the PLC
gene is specifically expressed in the retina, and mutations in
this gene render the flies blind (Bloomquist et al., 1988).
Three of the other eyestalk ESTs with database identifications encode homologues of genes that could play a role
in the neuroendocrine system. Clathrin is required for receptor-mediated endocytosis via coated pits, and molecules
taken up by this pathway in mammals include growth factors such as epidermal growth factor. Protein disulfide
isomerase is generally required to assist in formation of
disulfide bonds during passage of secreted proteins through
the endoplasmic reticulum, and hence might be expected to
be relatively abundant in neurosecretory tissue. Phenylalanine/tryptophan hydroxylase activities are encoded by the
same locus in Drosophila, with the different activities
thought to be regulated by different posttranslational modifications. Expression of this gene is seen in Drosophila neural tissue, most likely because tryptophan and phenylalanine
hydroxylase activities are required for biosynthesis of the biogenic amines serotonin and dopamine, respectively (Neckameyer and White, 1992).
Only very limited EST analysis was undertaken of the
pleopod library, owing to the extremely high number of
contaminating mitochondrial sequences. However, of the
six sequences that were identified, all are homologous to
sequences that would be expected to be abundant in muscle
tissue: four are involved in energy generation and mitochondrial function, one is a muscle structural protein, and
one is involved in protein synthesis.
It is likely that the sequences for which no matches
were found in the databases also included some genes with
tissue specificity. Several of these unidentified ESTs were
highly abundant in the libraries from which they were isolated (i.e., appeared more than once in a small sample of
random sequences) and yet were not found in the other
libraries.
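The abundance criterion used above (an unidentified EST seen more than once in one library and never in the others) can be illustrated with a minimal Python sketch. The cluster labels and counts below are hypothetical and invented purely for illustration; they are not data from this study, and the function name and threshold parameter are assumptions.

```python
from collections import Counter


def library_specific_abundant(libraries, min_count=2):
    """For each library, list clusters seen at least `min_count` times
    in that library and never in any other library."""
    counts = {name: Counter(ids) for name, ids in libraries.items()}
    result = {}
    for name, counter in counts.items():
        # Collect every cluster observed in the other libraries.
        seen_elsewhere = set()
        for other_name, other_counter in counts.items():
            if other_name != name:
                seen_elsewhere.update(other_counter)
        result[name] = sorted(
            cluster
            for cluster, n in counter.items()
            if n >= min_count and cluster not in seen_elsewhere
        )
    return result


# Hypothetical cluster assignments for unidentified ESTs (not real data).
example = {
    "cephalothorax": ["unk_A", "unk_A", "unk_B", "unk_C"],
    "eyestalk": ["unk_D", "unk_D", "unk_D", "unk_B"],
    "pleopod": ["unk_E"],
}
specific = library_specific_abundant(example)
```

With so few random clones per library, absence from another library is only suggestive of tissue specificity, which is why such candidates would still need expression analysis.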
The only genes found in more than one library were the
muscle-specific transcripts (arginine kinase, myosin light and heavy chains, and one of the
muscle LIM genes) and a GTP-binding protein (Table 2). In
addition, separate nuclear-encoded subunits of the multimeric
mitochondrial enzymes ATP synthase and NADH
dehydrogenase were found in different libraries, and genes
involved in protein synthesis, namely elongation factors and
ribosomal proteins, were found in all libraries.

CONCLUSIONS
The sequences presented in this article represent tagging of
at least 60 new genes with putative database matches from
P. monodon, 49 of which have not previously been identified
in crustaceans. In addition, at least 42 distinct sequences
with no database matches were detected, representing either
completely novel functions, or genes with sequences that
are too diverged from genes of known sequence and similar
function in other organisms to enable database matching.
Hence, the cephalothorax and eyestalk cDNA libraries
proved suitable for EST analysis.
The domestication of P. monodon has progressed to a
point where it is now possible to construct gene maps with
a view to mapping traits of economic importance. Shrimp
ESTs will contribute type I anchor loci to the current gene-mapping projects and will assist in the construction of
cross-species genetic maps. Options for mapping ESTs include intron-length polymorphisms (Palumbi and Baker,
1994), single-strand conformation polymorphisms (SSCPs)
(Brady et al., 1997), and those microsatellite repeats within
cDNAs that are suitable for development into polymorphic
markers (Khan et al., 1992). Mapping of ESTs will enable
cross-species comparison of shrimp genome maps, as different species of penaeid prawns (P. vannamei, P. japonicus,
and P. monodon) have proved too divergent to allow transfer of type II markers such as microsatellite loci (Moore et
al., 1999).
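As an illustration of how microsatellite repeats within cDNA sequences might be flagged as candidate markers, the following Python sketch scans a sequence for perfect tandem repeats. It is not the procedure used in this study; the unit-length range, the minimum repeat count, the function name, and the example sequence are all assumptions chosen for illustration.

```python
import re


def find_microsatellites(seq, min_unit=2, max_unit=4, min_repeats=5):
    """Return (start, unit, repeat_count) for perfect tandem repeats.

    The shortest repeating unit is preferred (lazy quantifier), and
    single-base runs such as poly(A) tails are ignored.
    """
    seq = seq.upper()
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        unit = match.group(1)
        if len(set(unit)) == 1:
            # A unit like "AA" is a homopolymer run, not a microsatellite.
            continue
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Invented example sequence (not a real clone): a (CA)6 and a (GATA)5 repeat.
example_cdna = "GG" + "CA" * 6 + "TTG" + "GATA" * 5 + "C"
hits = find_microsatellites(example_cdna)
```

A repeat found this way is only a candidate: whether it is polymorphic, and therefore usable as a mapping marker, has to be tested in a mapping population.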
To further study the physiological and developmental
significance of the sequences that have no known function
in shrimp, we are also in the process of characterizing the
spatial and temporal expression patterns of these sequences.

ACKNOWLEDGMENTS
We thank Zahra Fayazi for assistance with producing some
of the sequences presented in this article. This is contribution number 957 from the Australian Institute of Marine
Science.

REFERENCES
Adams, M.D., Kelley, J.M., Gocayne, J.D., Dubnick, M., Polymeropoulos, M.H., Xiao, H., Merril, C.R., Wu, A., and Venter, J.C.
(1991). Complementary DNA sequencing: expressed sequence tags
and the human genome project. Science 252:1651–1656.
Adams, M.D., Kerlavage, A.R., Fleischmann, R.D., Fuldner, R.A., et
al. (1995). Initial assessment of human gene diversity and expression patterns based upon 83 million nucleotides of cDNA sequence. Nature 377(Suppl):3–174.
Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J.
(1990). Basic local alignment search tool. J Mol Biol 215:403–410.
Anderson, I., and Brass, A. (1998). Searching DNA databases for
similarities to DNA sequences: when is a match significant? Bioinformatics 14:349–356.
Benzie, J.A.H. (1998). Penaeid genetics and biotechnology. Aquaculture 164:23–47.
Bloomquist, B.T., Shortridge, R.D., Schneuwly, S., Perdew, M.,
Montell, C., Steller, H., Rubin, G., and Pak, W.L. (1988). Isolation
of a putative phospholipase C gene of Drosophila, norpA, and its
role in phototransduction. Cell 54:723–733.
Brady, K.P., Rowe, L.B., Her, H., Stevens, T.J., Eppig, J., Sussman,
D.J., Sikela, J., and Beier, D.R. (1997). Genetic mapping of 262 loci
derived from expressed sequences in a murine interspecific cross
using single-strand conformational polymorphism analysis. Genome Res 7:1085–1093.
Braun, J.E., and Scheller, R.H. (1995). Cysteine string protein, a
DnaJ family member, is present on diverse secretory vesicles. Neuropharmacology 34:1361–1369.
Brouwer, M., Winge, D.R., and Gray, W.R. (1989). Structural and
functional diversity of copper-metallothioneins from the American
lobster Homarus americanus. J Inorg Biochem 35:289–303.
Browdy, C. (1998). Recent developments in penaeid broodstock
and seed production technologies: improving the outlook for superior captive stocks. Aquaculture 164:3–21.
Chomczynski, P., and Sacchi, N. (1987). Single-step method of
RNA isolation by acid guanidinium thiocyanate-phenol-chloroform
extraction. Anal Biochem 162:156–159.
Depeiges, A., Goubely, C., Lenoir, A., Cocherel, S., Picard, G.,
Raynal, M., Grellet, F., and Delseny, M. (1995). Identification of
the most represented repeated motifs in Arabidopsis thaliana microsatellite loci. Theor Appl Genet 91:160–168.
Fields, C. (1994). Analysis of gene expression by tissue and developmental stage. Curr Opin Biotechnol 5:595–598.
Fingerman, M. (1992). Glands and secretion. In: Harrison, F.W.,
and Humes, A.G. (eds.). Microscopic Anatomy of Invertebrates: Volume 10, Decapod Crustacea, New York: Wiley-Liss, 345–394.
Gish, W., and States, D.J. (1993). Identification of protein coding
regions by database similarity search. Nature Genet 3:266–272.
Khan, A.S., Wilcox, A.S., Polymeropoulos, M.H., Hopkins, J.A.,
Stevens, T.J., Robinson, M., Orpana, A.K., and Sikela, J.M. (1992).
Single pass sequencing and physical and genetic mapping of human brain cDNAs. Nature Genet 2:180–185.
Marra, M.A., Hillier, L., and Waterston, R.H. (1998). Expressed
sequence tags—ESTablishing bridges between genomes. Trends
Genet 14:4–7.
Matsumoto, H., and Yamada, T. (1991). Phosrestins I and II:
arrestin homologs which undergo differential light-induced phosphorylation in the Drosophila photoreceptor in vivo. Biochem Biophys Res Commun 177:1306–1312.
Moore, S.S., Whan, V., Davis, G.P., Byrne, K., Hetzel, D.J.S., and
Preston, N.P. (1999). The development and application of genetic
markers for the Kuruma prawn Penaeus japonicus. Aquaculture
173:19–32.
Morris, S.A., Schroder, S., Plessman, U., Weber, K., and
Ungewickell, E. (1993). Clathrin assembly protein AP180: primary
structure, domain organization and identification of a clathrin
binding site. EMBO J 12:667–675.
Neckameyer, W.S., and White, K. (1992). A single locus encodes
both phenylalanine hydroxylase and tryptophan hydroxylase activities in Drosophila. J Biol Chem 267:4199–4206.
Palumbi, S.R., and Baker, C.S. (1994). Contrasting population
structure from nuclear intron sequences and mtDNA of humpback whales. Mol Biol Evol 11:426–435.
Rainer, J., and Brouwer, M. (1993). Hemocyanin synthesis in the
blue-crab Callinectes sapidus. Comp Biochem Physiol B 104:69–73.
Raming, K., Freitag, J., Krieger, J., and Breer, H. (1993). Arrestin-subtypes in insect antennae. Cell Signal 5:69–80.
Rosenberry, B. (1997). World Shrimp Farming 1997. San Diego,
Calif.: Shrimp News International.
Ruyter-Spira, C.P., Crooijmans, R.A., Dijkhof, R.M., Van Oers,
P.M., Strijk, J.A., Van der Poel, J., and Groenen, M.M. (1996).
Development and mapping of polymorphic microsatellite markers
derived from a chicken brain cDNA library. Anim Genet 27:229–234.
Sambrook, J., Fritsch, E.F., and Maniatis, T. (1989). Molecular
Cloning: A Laboratory Manual, 2nd ed. New York: Cold Spring
Harbor Laboratory Press.
Tassanakajon, A., Tiptawonnukul, A., Supungul, P., Rimphanitchayakit, V., Cook, D., Jarayabhand, P., Klinbunga, S., and Boonsaeng, V. (1998). Isolation and characterization of microsatellite
markers in the black tiger prawn Penaeus monodon. Mol Mar Biol
Biotechnol 7:55–61.


