Cloning, chromosomal localization and characterization of the murine mucin gene orthologous to human MUC4
Jean-Luc Desseyn, Isabelle Clavereau and Anne Laine
Unité 560 INSERM, Place de Verdun, Lille, France
We report here the full coding sequence of a novel putative membrane-associated mouse mucin containing three extracellular EGF-like motifs and a mucin-like domain consisting of at least 20 tandem repeats of 124–126 amino acids. Screening a cosmid library and a BAC library allowed us to isolate several genomic clones. Comparison of genomic and cDNA sequences showed that the gene consists of 25 exons and 24 introns covering a genomic region of ~52 kb. The first intron is ~16 kb in length and is followed by an unusually large exon (~9.5 kb) encoding Ser/Thr-rich tandemly repeated sequences. Radiation hybrid mapping localized this new gene to a region of mouse chromosome 16 that is orthologous to human chromosome 3q29, the region encompassing the gene for the large membrane-anchored mucin MUC4. Contig analysis of the Human Genome Project data did not reveal any other mucin gene on chromosome 3q29. Interestingly, our analysis allowed us to determine the genomic organization of human MUC4 and showed that its exon/intron structure is identical to that of the mouse gene we cloned. Furthermore, human MUC4 shares considerable homology with the mouse gene. Based on these data, we conclude that we have isolated the mouse ortholog of MUC4, which we propose to name Muc4. Expression studies showed that, like SMC and MUC4, Muc4 is ubiquitous, with the highest levels of expression in the trachea and intestinal tract.
Keywords: MUC4; SMC; expression; large exon; tandem repeat.
Epithelial mucins are high-molecular-mass glycoproteins synthesized by secretory epithelia. All mucins have a large domain composed of tandemly repeated sequences rich in serine and threonine residues that carry O-linked oligosaccharides. Epithelial mucins are usually subdivided into secretory and membrane-associated classes [1]; in humans, the latter family contains at least five members. Four of them, MUC3A, MUC3B, MUC11 and MUC12, are organized in a gene cluster on chromosome 7q22 [2–4], while MUC4 and MUC1 have been mapped to chromosomes 3 and 1, respectively (reviewed in [5]). Except for the small mucin MUC1, all membrane-associated mucins are very large and seem to share four common domains: a short cytoplasmic domain, a transmembrane domain, EGF-like domains and a large O-glycosylated region whose amino-acid sequence differs from one mucin to another and is not conserved during evolution.
To date, MUC1 and the rat tumor sialomucin complex (SMC) are the two best-characterized membrane-bound mucins. Both are expressed on the cell surface as a complex of two subunits derived from the same polypeptide precursor [6,7]. Cell transfection and coimmunoprecipitation experiments have shown that SMC can act as a ligand for the tyrosine kinase p185neu (homolog of ErbB2) [8], suggesting that SMC may play a role in malignancy. Furthermore, SMC seems to be implicated in metastasis and in the resistance of SMC-expressing cells to natural killer cells [9,10].
More recently, cloning and sequencing human MUC4
cDNAs showed similarities between MUC4 and SMC at
the N- and C-terminal portions of the molecules [11,12].
Although less is known about other large membrane-bound
mucins, it has been suggested that MUC4 is a homolog of
SMC.
Cloning a complete mucin cDNA and/or a complete mucin gene is not an easy task for several reasons: (a) the messenger RNA is very large (>10 kb) and often expressed at low abundance in normal tissues; (b) the highly repetitive sequence of the central portion makes it difficult to map, subclone and sequence; (c) the 5′ and 3′ ends are usually very similar between genes of the same class; and (d) clones may be unstable, presumably because of the repetitive structure of the sequences, which may explain why BAC and YAC clones covering mucin gene clusters are still lacking in the Human Genome Project.
In an effort to determine, by genetic strategies, the specific functions of large membrane-associated mucins, we cloned and determined the complete cDNA and genomic sequences of a new large putative membrane-bound mouse mucin. This gene was assigned to mouse chromosome 16. RT-PCR experiments showed high expression of Muc4 in trachea, duodenum and intestine, in contrast with lower expression in stomach, salivary glands, liver and gallbladder, and kidney. Chromosomal localization, sequence and expression analyses provide strong evidence that human MUC4 and rat SMC are both orthologs of the new mouse gene we characterized, which we propose to name Muc4. We analyzed human contigs of the evolving human draft sequence, which allowed us to determine the complete genomic organization of human MUC4: the gene spans at least 70 kb, and its organization is very close to that of the mouse Muc4 described in this paper. Furthermore, domain analyses of Muc4, SMC and human MUC4 revealed that these three large molecules have a Nido domain followed by a von Willebrand-D domain (vWD) and three EGF-like motifs.

Correspondence to A. Laine, Unité 560 INSERM, Place de Verdun, 59045 Lille Cedex, France.
Fax: + 33 320538 562, Tel.: + 33 320298 850,
E-mail:
Abbreviations: SMC, sialomucin complex; vWD, von Willebrand-D domain; vWF, von Willebrand Factor; RH, radiation hybrid.
Note: the nucleotide sequences reported in this paper have been submitted to GenBank with accession numbers AF441785, AF441786 and AF441787.
(Received 5 February 2002, revised 1 May 2002, accepted 2 May 2002)
Eur. J. Biochem. 269, 3150–3159 (2002) © FEBS 2002 doi:10.1046/j.1432-1033.2002.02988.x
EXPERIMENTAL PROCEDURES
Isolation of total RNAs
Adult tissues from mice were obtained fresh, rapidly frozen and stored at −80 °C before use. Total RNA was extracted from parotid, submaxillary gland, salivary glands, trachea, stomach, liver and gallbladder, duodenum and intestine, and kidney using guanidine hydrochloride as previously described [13].
RT-PCR amplifications
Single-stranded cDNA was generated from 1 µg of total RNA using random hexamers or oligo(dT) primer. Several oligonucleotides (Fig. 1) were designed by comparing similar regions at the 3′ ends of the genes encoding rat SMC and human MUC4. The two sense oligonucleotides NAU728 (5′-TCCACTATCTGAACAACCAACT-3′) and NAU727 (5′-ATGCTGATTTCTCTAGCTCCA-3′) and the two antisense oligonucleotides NAU726 (5′-AACTTGTTCATGGAGCAGCCGC-3′) and NAU729 (5′-AGTTGGTTGTTCAGATAGTGGA-3′) were designed from the SMC sequence (GenBank accession number M91662). The sense oligonucleotide NAU576 (5′-CCCCACATCACCACCTTGGAT-3′) and the antisense oligonucleotide NAU484 (5′-AGAGAAACAGGGCATAGGACC-3′) were chosen from the human MUC4 cDNA sequence (GenBank accession number AJ000281). The antisense oligonucleotides NAU750 (5′-CTACATTTCTTGGAGAGGCTGAGT-3′) and NAU762 (5′-TGGAGCTAGAGAAATCAGCAT-3′) were designed from our mouse cDNA sequences.
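The primer sequences listed above can be sanity-checked programmatically. The sketch below is not part of the original methods and its helper names are hypothetical; it computes reverse complements and GC content. Applied to the listed sequences, it confirms that NAU729 is the exact reverse complement of NAU728 and that NAU762 is the exact reverse complement of NAU727, i.e. each sense/antisense pair targets the same site on opposite strands.

```python
# Minimal sketch: reverse-complement and GC-content checks on primer sequences.
# Sequences are copied from the text; function names are illustrative only.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA sequence containing only A/C/G/T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Fraction of G + C bases in the sequence."""
    s = seq.upper()
    return (s.count("G") + s.count("C")) / len(s)


PRIMERS = {
    "NAU728": "TCCACTATCTGAACAACCAACT",
    "NAU729": "AGTTGGTTGTTCAGATAGTGGA",
    "NAU727": "ATGCTGATTTCTCTAGCTCCA",
    "NAU762": "TGGAGCTAGAGAAATCAGCAT",
}

if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(f"{name}: {len(seq)} nt, GC = {gc_content(seq):.0%}")
    # Each pair below binds the same site on opposite strands.
    print(reverse_complement(PRIMERS["NAU728"]) == PRIMERS["NAU729"])
    print(reverse_complement(PRIMERS["NAU727"]) == PRIMERS["NAU762"])
```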
NAU762 was used in RT-PCR with the sense oligonucleotide NAU941, located within the repeat region (see below), to obtain the cDNA corresponding to the end of the repeat region. The sense oligonucleotide NAU972 (5′-GAGCTGCCTGTGTTCTTGCCTCCT-3′) was designed from the sequence encoding the signal peptide of SMC (GenBank accession number U06746) and used in RT-PCR experiments with the antisense oligonucleotide NAU966, located within the repeat region (see below), to obtain the 5′ part of cDNA 2308. PCR amplification was carried out in 50-µL reaction volumes containing 5 µL of first-strand cDNA, 0.3 mM dNTPs, 15 pmol of each primer, 2.5 U of Taq DNA polymerase (Roche) and PCR buffer (final concentrations 10 mM Tris/HCl, 1.5 mM MgCl₂, 50 mM KCl). PCR parameters were 94 °C for 2 min, followed by 30 cycles of 94 °C for 45 s, 55 °C for 1 min and 72 °C for 2 min, and a final extension at 72 °C for 10 min.
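The reaction set-up and cycling program above imply a few simple quantities, worked out in the sketch below. The helper names and the `Step` representation are hypothetical, and ramp times between temperatures are ignored (an assumption). Since 1 pmol/µL equals 1 µM, 15 pmol of primer in 50 µL gives a 0.3 µM final concentration. The 30-cycle program holds for 120 + 30 × 225 + 600 = 7470 s (124.5 min) in total.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One temperature hold in a thermocycler program (hypothetical representation)."""
    temp_c: float
    seconds: int


def final_conc_um(pmol: float, volume_ul: float) -> float:
    """pmol per µL equals µmol per L, i.e. µM."""
    return pmol / volume_ul


def hold_time_s(initial: list[Step], cycle: list[Step], n_cycles: int, final: list[Step]) -> int:
    """Total hold time in seconds; ramp times between temperatures are ignored."""
    return (sum(s.seconds for s in initial)
            + n_cycles * sum(s.seconds for s in cycle)
            + sum(s.seconds for s in final))


# Program from the text: 94 °C 2 min; 30 x (94 °C 45 s, 55 °C 1 min, 72 °C 2 min); 72 °C 10 min.
RT_PCR_PROGRAM = dict(
    initial=[Step(94, 120)],
    cycle=[Step(94, 45), Step(55, 60), Step(72, 120)],
    n_cycles=30,
    final=[Step(72, 600)],
)

if __name__ == "__main__":
    print(f"primer: {final_conc_um(15, 50):.2f} µM")
    total = hold_time_s(**RT_PCR_PROGRAM)
    print(f"hold time: {total} s ({total / 60:.1f} min)")
```

The same helper applies to the radiation hybrid PCR program described later in this section (38 cycles, 7-min final extension).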
5′ RACE
The three antisense primers used in this experiment (NAU1102, NAU1126 and NAU1123) were designed from the 5′ part of cDNA 2308. First-strand cDNA was synthesized using the 5′-AmpliFINDER RACE kit (Roche) and RNA (1 µg) from trachea or from duodenum and intestine, with the antisense oligonucleotide NAU1102 (5′-TGGAACTTGGAGTATCCCTTG-3′). The cDNA was then tailed, and PCR was performed using the nested antisense primer NAU1126 (5′-ATGTTGATGAGGTCGATGCTT-3′) and the oligo(dT)-anchor primer. A second, nested round of amplification was carried out with the antisense oligonucleotide NAU1123 (5′-CTGCTGGAAAGGGACATGGGT-3′) and the anchor primer, using 1 µL of the first-round reaction mixture as template. A major band of 230 bp was amplified, cloned and sequenced.
Screening of genomic libraries
The mouse cDNA probe 1719 was generated by RT-PCR using the two oligonucleotides NAU728 and NAU726 and was used to screen a mouse pWE15 cosmid library (Stratagene). Four clones were obtained and studied, but their analysis showed that they did not contain the central region or the 5′ end of the gene. The cosmid clone containing the longest part of the new gene was named CAR1 and studied further. The 1719 probe and a cDNA probe (1820), obtained by RT-PCR using the two oligonucleotides NAU727 and NAU484, were then used to screen a mouse BAC (bacterial artificial chromosome) library (Incyte Genomics, Inc.). Filters were prehybridized, hybridized and washed according to the manufacturer's instructions, and one positive clone was obtained and named BAC4.
Fig. 1. Muc4 cDNA (A) and genomic (B) cloning strategy, and protein domain organization (C). (A,B) Several cDNA clones and DNA fragments are indicated with their numbers. Some primers and their directions are indicated (not to scale) by horizontal arrows and their NAU (N) numbers. Restriction enzymes: N, NdeI; B, BamHI; K, KpnI. The hatched part corresponds to the repeat region. (C) Box with dashes represents the signal peptide; dense dots, Ser/Thr-rich non-repetitive sequence domain; diagonal lines, repetitive domain; square blocks, first Cys-rich region; wavy lines, domain rich in both Ser/Thr and N-glycosylation sites; black boxes, EGF-like domains; white box, unique sequence; grey box, transmembrane domain; horizontal lines, cytoplasmic domain. The star indicates the GDPH sequence. The thin vertical lines locate the 24 introns. Some exon numbers are indicated below.
Restriction mapping of the BAC clone and Southern blot analyses
The BAC was digested to completion with the restriction enzyme NotI. For each of the other restriction enzymes used, one part of this NotI-digested BAC was digested to completion and a second part was partially digested with the enzyme, in order to generate a set of fragments that begin at the T7 or SP6 promoter and end at a cleavage site of the chosen enzyme. These digestion products were fractionated on an agarose gel (0.6%) and blotted overnight to Hybond™-N+ membrane (Amersham Corp.). The fragments were then mapped relative to the T7 or SP6 promoter by hybridizing the membrane with end-labeled oligonucleotide sequencing primers specific for these promoters. To identify the fragments of interest, we used various end-labeled oligonucleotides designed from the cDNA sequences.
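The logic of this end-labeling strategy is that every labeled partial-digest fragment runs from the labeled vector end (T7 or SP6) to one cleavage site, so the labeled fragment sizes read off directly as site positions. The minimal sketch below illustrates this. The fragment sizes, insert length and function names are hypothetical; real gel estimates would carry measurement error.

```python
def sites_from_end_labeled(sizes_kb, total_kb, labeled_end="T7"):
    """Convert end-labeled partial-digest fragment sizes into cleavage-site positions.

    Positions are in kb from the T7 end of the NotI insert. A fragment as long as
    the whole insert corresponds to uncut DNA and is ignored.
    """
    cut = {s for s in sizes_kb if s < total_kb}
    if labeled_end == "T7":
        positions = cut
    elif labeled_end == "SP6":
        positions = {total_kb - s for s in cut}  # measured from the opposite end
    else:
        raise ValueError("labeled_end must be 'T7' or 'SP6'")
    return sorted(round(p, 3) for p in positions)


def complete_digest_fragments(positions_kb, total_kb):
    """Fragment sizes expected from a complete digest with the same enzyme."""
    bounds = [0.0] + list(positions_kb) + [total_kb]
    return [round(b - a, 3) for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Hypothetical 30-kb insert probed from each end.
    from_t7 = sites_from_end_labeled([4.5, 12.0, 20.5, 30.0], 30.0, "T7")
    from_sp6 = sites_from_end_labeled([9.5, 18.0, 25.5], 30.0, "SP6")
    print(from_t7, from_sp6)                       # both ends agree on the site map
    print(complete_digest_fragments(from_t7, 30.0))
```

Probing from both ends, as in the T7/SP6 design, gives two independent readings of the same map that can be cross-checked.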
Sequence determination and analyses
Fragments of the cosmid clone CAR1 were obtained by restriction enzyme digestion and subcloned into the pBluescriptII KS(+) vector (Stratagene). Genomic fragments of interest, obtained by PCR on cosmid CAR1 DNA and on BAC4 DNA using oligonucleotides designed from the cDNA sequences, were cloned into the pCR2.1 vector. Large fragments of interest were cut from BAC4, electrophoresed on a 0.8% agarose gel, electroeluted and cloned for sequencing, then further digested and subcloned into the pBluescriptII KS(+) vector (Stratagene). To determine the sequences of the 5′ and 3′ ends of intron 1, we performed PCR using BAC4 DNA as template, the sense oligonucleotide NAU972 (located in exon 1) or the antisense oligonucleotide NAU1126 (located in exon 2), respectively, and a mixture of hexamers as the second primer. PCR products were subcloned into the pCR4-TOPO vector (Invitrogen Ltd).
Plasmid inserts were sequenced several times on both strands on a LI-COR 4000 sequencer, and computer analyses were performed using PC/GENE software. The mouse cDNA and genomic sequences reported in this paper have been deposited in GenBank with accession numbers AF441785 and AF441786, respectively. BLAST searches of the human draft sequence using the MUC4 cDNA (GenBank accession numbers AJ000281 and AJ010901) and the Ensembl Genome Server (embl.org/) revealed two clones spanning MUC4: RP11-423B7 and RP11-171N2, respectively. These two sequences were aligned with the human cDNA and with the mouse cDNA and genomic sequences we determined. The full genomic organization of human MUC4 has been deposited in GenBank with accession number AF441787.
Zoo blot
An interspecies Zoo blot (Clontech) containing EcoRI-digested DNA from human, monkey, rat, mouse (Balb/c), dog, cow, rabbit, chicken and yeast was prehybridized, hybridized and washed according to the manufacturer's instructions with probe 2155 (594 bp), which contains one and a half repeats.
Expression of the mouse gene
To determine the expression of the new gene, RNAs isolated from various tissues were reverse-transcribed using random hexamers as primers. The cDNAs were subjected to PCR amplification using the oligonucleotides NAU764 and NAU726, located within the 3′ end and designed to amplify a 324-bp fragment. After electrophoresis on a 1% agarose gel (FMC, Rockland, ME, USA), the amplified products were stained with ethidium bromide and transferred for Southern blot analysis using a specific internal antisense oligonucleotide, NAU1432 (5′-GCATTGGGGCCCATCTGGCAGG-3′), as a probe. The efficiency of cDNA synthesis was estimated by PCR using two mouse β-actin-specific primers, sense 5′-GTGGGCCGCTCTAGGCACCA-3′ and antisense 5′-TGGCCTTAGGGTTCAGGGGG-3′, for an expected band of 240 bp.

Radiation hybrid (RH) mapping
The chromosomal localization of the new gene was determined by PCR analysis of the T31 mouse/hamster RH panel (Research Genetics) [14]. Primers NAU726 (antisense, see above) and NAU764 (sense, 5′-AAGTATGCTGGAGGAGTACTT-3′), located within the 3′ end of the mouse gene, were tested on mouse and hamster DNA. These oligonucleotides amplify a 1182-bp fragment from mouse DNA without amplifying hamster genomic template. The PCR reaction (50 µL) contained 25 ng of DNA template, 15 pmol of each primer, 1.5 mM MgCl₂, 25 µM of each dNTP and 2 U of Taq DNA polymerase (Roche). The cycling conditions were 2 min at 94 °C; 38 cycles of 15 s at 95 °C, 40 s at 56 °C and 90 s at 72 °C; and a final extension at 72 °C for 7 min. PCR products were electrophoresed through a 1% agarose gel, transferred overnight to membranes and hybridized with an internal ³²P-labeled oligonucleotide, NAU914 (5′-GCTGCCTAAGAATGGATACCCT-3′). Logarithm of odds (lod) scores were analyzed using the Jackson Laboratory mouse RH database ().
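For readers unfamiliar with RH lod scores, the sketch below shows a standard two-point RH likelihood. It assumes equal retention probability r for both markers (estimated from the data), error-free typing, and a grid search over the breakage probability θ. This is not necessarily the algorithm used by the Jackson Laboratory database, and the retention vectors in the example are hypothetical. Without a break (probability 1 − θ) the two markers are co-retained; after a break (probability θ) they are retained independently.

```python
import math


def rh_two_point_lod(a: list[int], b: list[int], grid_steps: int = 1000) -> tuple[float, float]:
    """Return (theta_hat, max_lod) for two RH markers.

    a, b: retention vectors across hybrids (1 = marker retained, 0 = absent).
    Assumes the same retention probability r for both markers and error-free typing.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    n = len(a)
    both = sum(1 for x, y in zip(a, b) if x and y)
    neither = sum(1 for x, y in zip(a, b) if not x and not y)
    discordant = n - both - neither
    r = (sum(a) + sum(b)) / (2 * n)  # pooled retention frequency

    def log10_likelihood(theta: float) -> float:
        p_both = r * (1 - theta) + r * r * theta
        p_neither = (1 - r) * (1 - theta) + (1 - r) ** 2 * theta
        p_one_only = r * (1 - r) * theta  # probability of each discordant pattern
        total = 0.0
        for count, p in ((both, p_both), (neither, p_neither), (discordant, p_one_only)):
            if count:
                total += count * math.log10(p)
        return total

    null = log10_likelihood(1.0)  # theta = 1: markers unlinked
    best_theta, best_lod = 1.0, 0.0
    for i in range(1, grid_steps + 1):
        theta = i / grid_steps
        lod = log10_likelihood(theta) - null
        if lod > best_lod:
            best_theta, best_lod = theta, lod
    return best_theta, best_lod


if __name__ == "__main__":
    marker = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]  # hypothetical typing of 10 hybrids
    print(rh_two_point_lod(marker, marker))
```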
RESULTS
Isolation and sequencing of cDNAs

Oligonucleotides and cDNA fragments are shown in Fig. 1A. Using the sense oligonucleotide NAU728 and the antisense oligonucleotide NAU726, a first cDNA fragment of 481 bp (named 1719) was obtained by RT-PCR on mouse trachea RNA. Using the sense oligonucleotide NAU727 with either the antisense NAU484 or the antisense NAU750, which lie upstream of NAU728 and NAU726 in the aligned sequences of human MUC4 and rat SMC, we obtained by RT-PCR two fragments of 258 bp (named 1820) and 1755 bp (named 1751), respectively. Using the sense NAU576 and the antisense NAU729, we obtained a cDNA fragment of 1615 bp (named 1729) that overlaps fragment 1751 by 900 bp. All the fragments were subcloned and sequenced. The compiled cDNA sequence of the 3′ region is ~3 kb. Using NAU762 together with NAU941, which was designed from the repeat sequence we found within a fragment of the BAC4 clone (see the section Characterization of the BAC genomic clone and sequencing strategy), we obtained two fragments of 1.6 and 2 kb, named 2350 and 2355, respectively. Both overlap the 3′ end of the repeat region.
To sequence the 5′ region, we performed RT-PCR using NAU972 (designed from SMC) and the antisense oligonucleotide NAU966, chosen within the repeat sequence. We obtained a cDNA of 523 bp (named 2308). Using an internal sense oligonucleotide (NAU1109, 5′-CAAGTAAAACAGAACAAACAT-3′) and NAU966, we obtained the cDNA named 2367 (1474 bp), which contains two repeats of 363 and 366 bp flanking a unique sequence. 5′ RACE PCR was then performed, and we cloned and sequenced a major 230-bp band (named 2359). Within this sequence is an ATG in a Kozak consensus context [15], suggesting that it is the translation initiation codon and therefore encodes the N-terminus of the protein. This fragment contains 105 bp of 5′ untranslated sequence upstream of the putative ATG start codon.
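The Kozak argument rests mainly on two positions around the ATG (A = +1): a purine at −3 and a G at +4, the most influential positions of the GCCRCCATGG consensus. The simplified scan below checks only those two positions. It is a sketch with a hypothetical function name, and because the 2359 sequence is not reproduced in the text, the example input is invented.

```python
def kozak_context(seq: str) -> list[tuple[int, str, str, bool]]:
    """Report the -3 and +4 bases around each ATG (A of ATG = +1).

    Returns tuples (0-based index of the A, base at -3, base at +4, strong), where
    "strong" means a purine (A/G) at -3 and G at +4. ATGs too close to either end
    of the sequence to have both flanking positions are skipped.
    """
    s = seq.upper()
    hits = []
    i = s.find("ATG")
    while i != -1:
        if i >= 3 and i + 3 < len(s):
            minus3, plus4 = s[i - 3], s[i + 3]
            hits.append((i, minus3, plus4, minus3 in "AG" and plus4 == "G"))
        i = s.find("ATG", i + 1)
    return hits


if __name__ == "__main__":
    print(kozak_context("CCACCATGG"))  # hypothetical sequence with a strong context
```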
Characterization of the cosmid clones and sequencing strategy
Four positive cosmid clones were obtained by screening a mouse cosmid library with the cDNA probe 1719. The clone containing the longest part of the new gene, CAR1, was studied. Two adjacent KpnI–KpnI fragments of 5 and 4 kb (named 1941 and 1956, respectively; Fig. 1B) overlapping the 3′ end of the cDNA were subcloned. We also performed PCR using CAR1 DNA as template and the oligonucleotides previously used to clone the cDNAs. All PCR products were subcloned into the pCR2.1 vector and sequenced on both strands. We thus obtained the complete genomic sequence of the 3′ part of the gene, encompassing 17 kb. Three poly(A) signals were found after the stop codon. PCR analysis of the four cosmid clones revealed that none of them contains any sequence upstream of the oligonucleotide NAU484.
Characterization of the BAC genomic clone and sequencing strategy
We screened a BAC library using the cDNA probe 1820, and one clone, BAC4, was obtained and studied. PCR amplifications and hybridization experiments revealed that this clone included the entire gene (data not shown). A PCR amplification product (1743) of 2320 bp was obtained using the two oligonucleotides NAU727 and NAU484 with BAC4 DNA as template. This fragment was completely sequenced and shown to contain one NdeI restriction site. BAC4 DNA digested with various restriction enzymes was blotted and hybridized with the end-labeled oligonucleotide NAU856, designed from the sequence of fragment 1743. One 7-kb NdeI–NdeI genomic fragment was identified, purified and subcloned (named 1850). At its 5′ end, it contains two repeats of 372 bp followed by one repeat of 378 bp. We then synthesized two oligonucleotides from the repeat sequence: the sense oligonucleotide NAU941 (5′-GAGACAGAAACAAGTTCCCAA-3′) and the antisense oligonucleotide NAU966 (5′-CTGGGATGAAGGTGTCAATGA-3′). RT-PCR experiments on trachea RNA using this primer pair amplified several fragments containing different numbers of repeats. PCR performed on BAC4 DNA allowed determination of the exon–intron junctions.
Genomic organization and BLAST searches
Exon–intron boundaries were defined by aligning the cDNA and genomic sequences, which revealed a total of 25 exons (Fig. 1C) ranging in size from 71 bp to ~9.5 kb (Table 1). The size of the largest exon (exon 2), which contains the repeat sequence, was estimated from restriction mapping experiments. The last exon is composed of a 193-bp coding sequence and an untranslated region of at least 131 bp. The 24 introns range in size from 79 bp to about 16 kb; the size of the largest intron was determined by restriction mapping. Each intron begins with GT and ends with AG. The gene spans ~52 kb from the ATG initiation codon to the stop codon. Three poly(A) signals are located 122, 191 and 361 bp downstream of the stop codon.
BLAST searches showed homology with SMC and the human mucin MUC4. Sequence similarity searches using the MUC4 cDNA identified two clones on human chromosome 3 in the evolving working draft sequence. Clone RP11-423B7 is ~173 kb and consists of 16 ordered pieces, and clone RP11-171N2 is ~164 kb and consists of 29 unordered pieces. Multiple alignments of these genomic sequence pieces with the MUC4 cDNA allowed, for the first time, determination of the complete exon–intron structure of this large membrane-bound mucin gene. Because several differences exist between the human cDNA and the two human genomic sequences, the exon sizes given in Table 1 for the human MUC4 gene are those deduced from the cDNA sequence. All introns have the same classes and positions in MUC4 and in the mouse gene described here.
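An intron's class (phase) is the position within a codon at which it interrupts the reading frame: 0 between codons, 1 after the first base, 2 after the second. It therefore follows from the coding lengths of the preceding exons: each exon shifts the phase by its length mod 3. The sketch below (hypothetical helper names) starts from intron 2 (class 1) and uses the mouse exon sizes 3–24 from Table 1; it reproduces every mouse intron class listed there. Exon 1 and exon 2 are skipped because their coding lengths are not given precisely. A second helper checks the GT...AG rule on junctions written in the Table 1 style (exon uppercase, intron lowercase).

```python
def intron_phases(start_phase: int, exon_lengths: list[int]) -> list[int]:
    """Phase of the intron following each listed exon.

    start_phase: phase of the intron preceding the first listed exon.
    exon_lengths: coding lengths (bp) of consecutive internal exons.
    """
    phases = []
    phase = start_phase
    for length in exon_lengths:
        phase = (phase + length) % 3
        phases.append(phase)
    return phases


def is_gt_ag(donor_junction: str, acceptor_junction: str) -> bool:
    """GT...AG check on Table 1-style junctions (exon uppercase, intron lowercase)."""
    intron_start = donor_junction.lstrip("ACGT")   # drop exon bases of the upstream exon
    intron_end = acceptor_junction.rstrip("ACGT")  # drop exon bases of the downstream exon
    return intron_start.startswith("gt") and intron_end.endswith("ag")


# Mouse Muc4 exon sizes (exons 3-24) and intron classes (introns 3-24), from Table 1.
MOUSE_EXONS_3_TO_24 = [129, 134, 165, 156, 131, 89, 180, 117, 120, 212, 85,
                       168, 102, 234, 138, 182, 160, 174, 74, 71, 236, 163]
MOUSE_INTRON_CLASSES_3_TO_24 = [1, 0, 0, 0, 2, 1, 1, 1, 1, 0, 1,
                                1, 1, 1, 1, 0, 1, 1, 0, 2, 1, 2]

if __name__ == "__main__":
    print(intron_phases(1, MOUSE_EXONS_3_TO_24) == MOUSE_INTRON_CLASSES_3_TO_24)
    print(is_gt_ag("ACCTGgtaagacaag", "tttcaagaagATGCT"))  # mouse intron 1
```

As a further consistency check, intron 24 is class 2 and the last exon carries 193 coding bases, so (2 + 193) mod 3 = 0: the reading frame ends exactly at the stop codon.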
Chromosomal localization
Using the T31 mouse/hamster RH panel, which consists of 100 hybrid cell lines [14], the new gene was mapped by PCR screening to chromosome 16, with the highest lod score (13.7) for linkage to the marker D16Mit60. This region exhibits synteny with human chromosome 3q, where the human MUC4 gene is located [16]. This result, together with the sequence similarities, suggests that the new gene we cloned is the mouse ortholog of both MUC4 and SMC; we have named this gene Muc4.
Analysis of the nucleotide and deduced amino-acid sequences and domain organization
A schematic representation of the deduced amino-acid sequence of Muc4 is depicted in Fig. 1C. The first 29 amino acids, predicted to be the signal sequence, are followed by a Ser/Thr-rich region encoded by exons 2–8, a cysteine-rich region of 139 amino acids (Cys = 7.2%) encoded by exons 9–11, a second Ser/Thr-rich region of 374 amino acids encoded by exons 12–18 and a C-terminal cysteine-rich region of 357 amino acids (Cys = 7.6%) encoded by the last seven exons. The first Ser/Thr-rich region is made of a 63-amino-acid peptide followed by 20 or 21 tandem repeats of 124–126 amino acids, and this region ends with a 422-amino-acid peptide that is Ser/Thr-rich
Table 1. Comparison of nucleotide sequences of intron–exon junctions between the mouse Muc4 and human MUC4 genes, showing that exon sizes and positions are conserved between the two genes. The mouse exons (except exon 2) and introns (except intron 1) have been entirely sequenced. Intron positions were determined by alignment of genomic and cDNA sequences. Uppercase and lowercase letters denote exon and intron sequences, respectively.

                   Exons                                                 Introns
Species  Exon no.  Size (bp)    5′ end           3′ end           Intron no.  Size (bp)    Class
Mouse    1         >178 (a)     -                ACCTGgtaagacaag  1           ~16000 (b)   1
Human    1         >110 (a)     -                CCCAGgtaagtgatg  1           >2008        1
Mouse    2         ~9500 (b)    tttcaagaagATGCT  CTCAGgtgagtcagc  2           223          1
Human    2         >15 432 (c)  ttcactccagGAACC  ATCAGgtagctgcca  2           334          1
Mouse    3         129          ccatgtccagGATTG  TCAGGgtaagtgata  3           1669         1
Human    3         153          acgtgtccagGAATG  GAGAGgtgaggccat  3           >2579 (c)    1
Mouse    4         134          ccttgtctagGCATT  TCTACgtgagtctct  4           761          0
Human    4         134          cctggcccagGAGTT  TCTACgtgagtccgg  4           >2147 (c)    0
Mouse    5         165          ttgtgctcagGTTAC  ACCAGgtgagtcatt  5           945          0
Human    5         165          atgtgctcagTTCAC  ATCAGgtgagccttt  5           1629         0
Mouse    6         156          cctttcctagGAATA  TTGGGgtgagtggat  6           1102         0
Human    6         156          cctttcctagGAATA  TCGGGgtgagtagac  6           1063         0
Mouse    7         131          caacttccagACCAA  TCCAGgtaagatcgg  7           1437         2
Human    7         131          ccacccccagAGCAA  TCTAGgtaggatggg  7           >1759 (c)    2
Mouse    8         89           ttgcctgcagTGGAG  ATTAGgtaaaagtgc  8           1680         1
Human    8         90/89 (d)    tttcctgcagTGGAG  CTCAGgtaaaagtgc  8           1213         1
Mouse    9         180          tgttcctcagGCATC  CCCAGgtgatacctc  9           202          1
Human    9         180          ccgacctcagGCCTC  CATAGgtgacacctc  9           145          1
Mouse    10        117          ctctttgcagGTTGG  GTTTGgtaagtatct  10          681          1
Human    10        114/126 (d)  cttgtttcagGTCGC  GTTGGgtgatctcaa  10          801          1
Mouse    11        120          ttcttcacagATGAG  GCCCGgtgagcatca  11          303          1
Human    11        120          ttctccgcagCCCAG  GCCCGgtgagcgaca  11          396          1
Mouse    12        212          tttcttttagCTTGG  TCACGgtaagtgagg  12          1174         0
Human    12        209          ttccttccagCCTGG  TCACGgtgagtgagg  12          487          0
Mouse    13        85           tctttcccagGTTCA  TGAAGgtaggctccg  13          363          1
Human    13        91           ctccttccagGTCCA  CGGAGgtaggttggg  13          820          1
Mouse    14        168          tgtcttccagTCTTA  TCTGGgtaagatgca  14          445          1
Human    14        168          gatgctccagGCCAG  CCTGGgtgagggcgg  14          501          1
Mouse    15        102          tgtctcacagGAGTG  GACATgtgagtctgg  15          351          1
Human    15        102          tgtccctcagGGGTC  GACCTgtgagtctgg  15          366          1
Mouse    16        234          tgtgttacagGGCAC  CCTCAgtaagtgaca  16          1133         1
Human    16        234          tgtgttacagGGCAG  CCTCAgtaagtggcc  16          2059         1
Mouse    17        138          tctgtttcagATCAG  CTTTGgtatgaatct  17          3067         1
Human    17        138          tgtgtttcagATCAG  CTTTGgtaggactat  17          1806         1
Mouse    18        182          ctctggacagAAAAC  TTGAGgtgagtagtg  18          838          0
Human    18        182          cccggggcagAGAAT  TGGAGgtgagtgttg  18          >2683 (c)    0
Mouse    19        160          tgtcattcagGTGAC  TGCAGgtgagtgtgg  19          862          1
Human    19        160          cctcctccagGTGGC  TGCGGgtgagccggg  19          982          1
Mouse    20        174          ccctttacagCTCTG  TACGGgtatggctaa  20          695          1
Human    20        180          cactctgcagCTCTG  CCTAGgtaccgccag  20          1659         1
Mouse    21        74           ttatctccagAGCTT  CCTCGgtcagtgctg  21          1020         0
Human    21        74           ccatctccagAACTT  CCTCGgtcagtgctg  21          1101         0
Mouse    22        71           tccattgtagGTGGC  CGAACgtaagtagag  22          79           2
Human    22        65           tctaacctagGTGGC  CGAATgtaagtggga  22          94           2
Mouse    23        236          ttccatacagCTCTC  GGCTCgtgagtcact  23          1613         1
Human    23        224          tcccacacagCGATT  AGCCCgtgagtccgt  23          1819         1
Mouse    24        163          ctgcctacagTGAAC  TGCAGgtgggtaggg  24          858          2
Human    24        163          ttcccgacagTGAAC  TGCAGgtgcataggg  24          1522         2
Mouse    25        >411 (a)     ctgtcttcagCTGCG
Human    25        >323 (a)     tggtcaccagCTGTG

(a) The precise sizes of the UTRs have not been determined.
(b) Size estimated by restriction mapping.
(c) Size estimated from the analysis of the two contigs (see text).
(d) Two different sizes, depending on the contig considered.
(S + T = 28.7%). According to the restriction mapping results, the whole repeat region encompasses ~8.3 kb, of which more than 4 kb have been sequenced. The amino-acid repeats are aligned in Fig. 2A, and the 126-amino-acid consensus sequence is compared with the consensus sequence of the SMC repeats in Fig. 2B. A unique sequence of 119 amino acids is inserted between the first two repeats, a1 and a2. A ³²P-labelled oligonucleotide designed from this unique sequence was hybridized to a Southern blot of BAC4 DNA digested with various restriction enzymes and revealed a single 1.5-kb HpaI–HpaI band, suggesting that this sequence is present as a single copy (data not shown). The order of repeats b1–b2, c1–c2 and d1 is unknown. There is at least one potential N-glycosylation site (Asn-X-Ser/Thr, where X is any amino acid except Pro) in each repeat. The tandem repeat array ends with repeats e1–e8. The second Ser/Thr-rich region starts at an AWTFGDPH peptide and consists of 374 amino acids (S + T = 21.1%); it contains 14 potential N-glycosylation sites. The GDPH sequence is conserved in human MUC4 [11] and in SMC [17], and SMC has previously been shown to be cleaved at this site early in its transit to the cell surface [7].
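The sequon rule used above (Asn-X-Ser/Thr, X ≠ Pro) and the composition figures (Cys %, S + T %) are simple sequence scans, sketched below. The function names are hypothetical and the example peptides are toy inputs, not Muc4 sequence. The regular-expression lookahead lets overlapping sequons such as "NNST" both be reported.

```python
import re

# Asn-X-Ser/Thr with X != Pro; a zero-width lookahead so overlapping sequons are all found.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list[int]:
    """1-based positions of the Asn in each potential N-glycosylation sequon."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def percent_of(protein: str, residues: str) -> float:
    """Percentage of positions occupied by any of the given one-letter residues."""
    p = protein.upper()
    return 100.0 * sum(p.count(r) for r in residues) / len(p)


if __name__ == "__main__":
    print(n_glycosylation_sites("ANASTNPT"))  # NAS counts; NPT is excluded (X = Pro)
    print(percent_of("STAA", "ST"))           # S + T content
    print(percent_of("CCAA", "C"))            # Cys content
```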
Comparison with SMC and MUC4 and domain analysis using SMART
The N-terminal parts of Muc4, SMC and MUC4 were aligned (Fig. 3), showing that the signal sequences of the three molecules are very similar, although the mouse and rat sequences are closest to each other. The Ser/Thr-rich region upstream of the repeat area (amino acids 30–98 in Muc4) shows strong similarity between mouse and rat, whereas this region differs markedly in the human sequence. This region is 63 amino acids long in rodents and much longer in human (951 amino acids; shortened in Fig. 3). Comparison of the predicted amino-acid sequences encoded by exons 3–23 with the corresponding MUC4 and SMC peptides (Fig. 4) shows that the three peptides are very similar, although in SMC the areas of high sequence homology are interspersed with four sections of low homology (sequences in italics in Fig. 4). Nevertheless, the nucleotide sequence of Muc4 shares high homology with the SMC cDNA sequence except for a few nucleotide insertions/deletions (data not shown). Notably, 12 potential N-glycosylation sites are perfectly conserved in Muc4, SMC and MUC4. Amino-acid sequence analysis using the SMART program [18] shows the presence of a Nido domain followed by a von Willebrand-D domain (Fig. 1C) and three EGF-like motifs (Figs 1C and 4). The vWD domain sequence (residues 397–578, Fig. 4) most closely resembles the vWF-D2 domain (residues 378–540, GenBank accession number P04275). A putative 23-amino-acid transmembrane motif in the C-terminal portion of the molecule is followed by a short cytoplasmic tail (18 amino acids).
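The transmembrane motif was predicted by SMART; an independent and much cruder cross-check is a Kyte-Doolittle hydropathy scan. This is a swapped-in technique, not the authors' method. The sketch below uses the standard Kyte-Doolittle scale with the commonly quoted rule-of-thumb settings (19-residue window, mean hydropathy > 1.6). These settings, the function names and the toy input are assumptions; the Muc4 sequence itself is not reproduced here.

```python
# Kyte-Doolittle hydropathy scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_hydropathy(protein: str, window: int = 19) -> list[float]:
    """Mean Kyte-Doolittle value for each full window along the sequence."""
    p = protein.upper()
    return [sum(KD[aa] for aa in p[i:i + window]) / window
            for i in range(len(p) - window + 1)]


def candidate_tm_starts(protein: str, window: int = 19, threshold: float = 1.6) -> list[int]:
    """1-based start positions of windows whose mean hydropathy exceeds the threshold."""
    return [i + 1 for i, score in enumerate(window_hydropathy(protein, window))
            if score > threshold]


if __name__ == "__main__":
    toy = "K" * 10 + "L" * 23 + "K" * 10  # toy protein with a 23-residue hydrophobic stretch
    print(candidate_tm_starts(toy))
```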
Conservation of Muc4 tandem repeats

A comparison of the consensus sequence of the Muc4 tandem repeats with the previously published consensus sequence of the SMC tandem repeats [19] showed a good degree of sequence identity (67%) between the two
Fig. 2. Alignment of the amino-acid sequences of the Muc4 repeats (A) and comparison of the repeat consensus sequences of Muc4 and SMC (B). (A) Under the consensus sequence, the repeats are numbered in order of appearance from N- to C-terminus. The order of repeats b–d is unknown. Repeats e1–e8 are the last eight repeats. Dots indicate exact matches with the consensus sequence and dashes indicate gaps in the sequence. A unique sequence is inserted between repeats a1 and a2. The potential N-glycosylation sites are underlined. (B) Conserved amino acids are shaded. Dashes indicate gaps.
Fig. 3. Alignment of the amino-terminal
sequences of mouse Muc4, rat SMC and human
MUC4. The dashes indicate gaps in the
sequence. Identical sequences are shaded.
rodent species (Fig. 2B). We then hybridized an interspecies Zoo blot containing EcoRI-digested DNA from human, monkey, rat, mouse (Balb/c), dog, cow, rabbit, chicken and yeast with probe 2155, which corresponds to one and a half repeats. This revealed a faint 20-kb band with rat DNA and a very strong 18-kb signal with mouse DNA, supporting the conclusion that Muc4 and SMC are orthologs. No signal was observed for the other species (Fig. 5).
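A percent-identity figure such as the 67% quoted for the Muc4 and SMC repeat consensus sequences depends on how the aligned columns are counted. The sketch below shows one common convention: identical residues divided by alignment columns, excluding columns that are gaps in both sequences. This convention is an assumption; the published value may use another denominator (e.g. the shorter sequence length). The function name and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two gapped sequences of equal aligned length ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b)
               if not (x == "-" and y == "-")]
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment: 4 identical columns out of 6 (one mismatch, one gap).
    print(round(percent_identity("ACDE-G", "ACDFHG"), 1))
```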
Tissue-specific expression of Muc4
The tissue distribution of Muc4 was determined by RT-PCR using total RNA from various tissues. PCR products were analyzed by ethidium bromide staining (Fig. 6) and by Southern blotting using an internal primer as a probe. Strong expression of Muc4 was observed in trachea, duodenum and intestine, whereas expression in stomach was much weaker. In each case, the quality of the cDNA was verified by amplification of β-actin cDNA, which was equally expressed in all samples. Southern blotting revealed even weaker expression in submaxillary glands, salivary glands, liver and gallbladder, and kidney (not shown).
Fig. 4. Comparison of Muc4 C-terminal protein with SMC and MUC4. Dashes indicate gaps introduced in the sequence for alignment purposes.
Amino acids are numbered at the right. Conserved amino acids are shaded. Amino acids in bold italic correspond to regions with a frameshift in
SMC. The GDPH sequence is underlined with triangles. The potential N-glycosylation sites conserved in the three molecules are indicated with
hashes. The three EGF-like domains are underlined. The transmembrane domain is underlined with stars.
Fig. 5. Zoo blot hybridized with repeated sequence (probe 2155) showing
cross-hybridization between rat and mouse DNA. The Zoo-blot contains
EcoRI-digested DNA from human, monkey, rat, mouse (Balb/c), dog,
cow, rabbit, chicken and yeast. The size of the band is estimated to be
20 kb for rat and 18 kb for mouse.
Fig. 6. Expression of Muc4 by RT-PCR. Total RNA from parotid
(lane 1), submaxillary glands (lane 2), salivary glands (lane 3), trachea
(lane 4), stomach (lane 5), liver and gallbladder (lane 6), duodenum
and intestine (lane 7) and kidney (lane 8) was used. (A) A single
324-bp band is obtained with the primer pair NAU764/NAU726.
(B) The efficiency of cDNA synthesis was estimated by PCR using
two mouse β-actin-specific primers, which produce a single 240-bp band.
L, ladder; C, control (H₂O).
3156 J.-L. Desseyn et al. (Eur. J. Biochem. 269) © FEBS 2002
DISCUSSION
SMC is a large heterodimeric glycoprotein that protects
tumors from the immune system and may influence signaling
pathways via its transmembrane subunit [8–10]. Overexpression
of SMC is believed to mask antigens at the tumor
cell surface. The precise function of the SMC transmembrane
subunit is poorly understood because of the large size of
the molecule and the presence of several domains. To
further investigate the biological roles of large membrane-bound
mucins, we cloned a new mouse gene using
primers designed from the cDNA sequences of SMC and
human MUC4. Cloning and sequencing of human MUC4
had shown substantial similarities between MUC4 and
SMC at the N- and C-terminal portions of the molecules
[11,12], but such similarities may also exist with rat and human
mucin cDNAs that remain to be cloned. The work
presented here provides strong evidence that the
mouse gene we cloned, characterized and propose to name
Muc4 is the ortholog of SMC and MUC4: (a) the three
molecules show high sequence similarity; (b) the region of
mouse chromosome 16 to which we mapped the gene
is syntenic with human chromosome 3q, where the
human MUC4 gene has been mapped; (c) Muc4 and MUC4
have similar expression patterns; (d) the tandem repeat
sequences of Muc4 share homology with those of SMC; and
(e) BLAST searches and multiple alignments allowed us to
determine the complete genomic organization of MUC4,
which clearly shows that the two genes are very similar.
Based on sequence similarities, we believe that we cloned
the complete coding sequence of Muc4, from the ATG initiator
to the stop codon, which is followed by three poly(A)
signals. Furthermore, the ATG appears to be embedded in a
Kozak consensus sequence [15]. BLAST searches revealed
one sequence deposited in GenBank (accession number
AF296636) that encompasses the first exon of Muc4 and in
which the suggested initiator methionine is the same as ours.
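To show what "embedded in a Kozak consensus sequence" means in practice, the sketch below inspects the two positions usually treated as most important in Kozak's vertebrate consensus (GCCRCCATGG [15]): a purine at −3 and a G at +4. Positions are numbered so that the A of ATG is +1. This is a minimal sketch under that common simplification. The function name `kozak_context`, the strong/adequate/weak labels as assigned here, and the example sequences are illustrative assumptions. They are not the Muc4 5′ sequence and not the authors' procedure.

```python
def kozak_context(seq: str, atg_index: int) -> dict:
    """Classify the context of an ATG starting at 0-based position atg_index.

    Hypothetical helper. Kozak numbering: the A of ATG is +1, so -3 lies three
    bases upstream of the A and +4 is the base immediately after the ATG.
    """
    s = seq.upper().replace("U", "T")  # accept RNA or DNA input
    if s[atg_index:atg_index + 3] != "ATG":
        raise ValueError("no ATG at the given position")
    if atg_index < 3 or atg_index + 3 >= len(s):
        raise ValueError("not enough flanking sequence to inspect -3 and +4")
    minus3 = s[atg_index - 3]
    plus4 = s[atg_index + 3]
    purine_at_minus3 = minus3 in "AG"
    g_at_plus4 = plus4 == "G"
    # Simplified rule: both key positions match -> strong; one -> adequate.
    if purine_at_minus3 and g_at_plus4:
        strength = "strong"
    elif purine_at_minus3 or g_at_plus4:
        strength = "adequate"
    else:
        strength = "weak"
    return {"-3": minus3, "+4": plus4, "strength": strength}


if __name__ == "__main__":
    # Toy sequence matching the full consensus GCCGCCACCATGG (ATG at index 9).
    print(kozak_context("GCCGCCACCATGG", 9))
```

A real analysis would also weigh the other consensus positions. Only −3 and +4 are checked here because they are the ones most often cited as decisive.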
Our restriction mapping and sequencing results show that
mouse Muc4 spans ~52 kb and encodes a transcript with a
predicted size of ~13 kb. Human MUC4 and mouse Muc4
are virtually identical in terms of intron class, exon number
and exon size. The 24 mouse introns range from 79 to 3067 bp,
except for the first intron, which is unusually large (~16 kb).
The genomic organization of MUC4, which we determined by
comparing the published cDNA with the evolving human draft
sequence [20], is very close to that of mouse Muc4 (Table 1).
The size of intron 1 of human MUC4 has been estimated at
~15 kb by restriction mapping [12] and at least 20 kb by our
analysis of the human draft sequence. Intronic sequences are
not conserved between the species, but sequences surrounding
splice junctions are highly similar (Table 1). Furthermore,
the introns of the human gene are notably longer than those of
the mouse gene. Every intron in both genes begins with GT and
ends with AG, strictly obeying the GT/AG rule for
splice-junction sequences [21].
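The GT/AG check described above reduces to inspecting the first and last two bases of each intron once exon boundaries are known. This is a minimal sketch. It assumes exon coordinates are given as 0-based, half-open intervals on the genomic sequence. The helper names and the toy sequence are hypothetical and do not reproduce Table 1. A strict test like this would also flag genuine minor-class introns (e.g. GC-AG), so a flagged intron calls for inspection and is not necessarily an error.

```python
def introns_from_exons(genomic: str, exons: list[tuple[int, int]]) -> list[str]:
    """Return intron sequences lying between consecutive exons.

    Hypothetical helper: exons are 0-based, half-open (start, end) intervals.
    """
    ordered = sorted(exons)
    return [genomic[end:next_start]
            for (_, end), (next_start, _) in zip(ordered, ordered[1:])]


def obeys_gt_ag(intron: str) -> bool:
    """True if the intron (sense strand) starts with GT and ends with AG."""
    s = intron.upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def gt_ag_violations(genomic: str, exons: list[tuple[int, int]]) -> list[int]:
    """Return 1-based numbers of introns that break the GT/AG rule."""
    introns = introns_from_exons(genomic, exons)
    return [i for i, seq in enumerate(introns, start=1) if not obeys_gt_ag(seq)]


if __name__ == "__main__":
    # Toy gene: exon AAAA, intron GTCCAG, exon TTTT, intron GCAAAG, exon CCC.
    toy = "AAAA" + "GTCCAG" + "TTTT" + "GCAAAG" + "CCC"
    print(gt_ag_violations(toy, [(0, 4), (10, 14), (20, 23)]))
```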
Twenty-two of the 23 internal exons are 71 to 236 bp long
(Table 1), in good agreement with the mean length of exons
[20,22]. Because of the repetitive structure, we did not succeed
in cloning and sequencing the full repetitive region, but
restriction mapping experiments indicate that exon 2 is ~9.5 kb
in length and encodes two Ser/Thr-rich regions flanking 20 or
21 imperfect mucin-type repeats of 124 or 126 amino acids. An
unusually large exon encoding Ser/Thr-rich tandem repeats is a
common feature of mucins [23–25], suggesting that the tandem
repeat array arose through internal duplications rather than
through exon shuffling. Alignment of the repeat sequences
(Fig. 2A) shows insertion of a unique 119-amino-acid peptide
between repeats a1 and a2. This sequence is unrelated to the
three insertion sequences of 33, 43 and 94 amino acids
previously described between the tandem repeats of SMC [19].
Interestingly, every Muc4 repeat contains at least one potential
N-glycosylation site. This feature is uncommon in mucins and
has been found only in the mouse submandibular small mucin [26].
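Potential N-glycosylation sites like those counted in the Muc4 repeats are conventionally predicted from the sequon Asn-X-Ser/Thr, where X is any residue except proline. The sketch below scans for that motif. It ignores structural context and the fact that many sequons are never glycosylated, so it only identifies *potential* sites. The function names and peptides are hypothetical examples, not Muc4 repeat sequences.

```python
import re

# Asn, then any residue except Pro, then Ser or Thr. The zero-width lookahead
# lets overlapping sequons (e.g. "NNTT") each be reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glyc_sites(protein: str) -> list[int]:
    """Return 1-based positions of Asn residues in N-X-S/T sequons (X != Pro)."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


def repeats_with_site(repeats: list[str]) -> list[bool]:
    """For each tandem repeat, report whether it carries at least one sequon."""
    return [bool(n_glyc_sites(r)) for r in repeats]


if __name__ == "__main__":
    # Hypothetical peptide: N2 (N-S-T) qualifies, N5 is blocked by Pro, N9 (N-G-T) qualifies.
    print(n_glyc_sites("ANSTNPTQNGT"))
```

Applied to each aligned repeat, `repeats_with_site` gives the per-repeat yes/no that the observation above summarizes.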
The consensus sequence of the Muc4 tandem repeat is
very close to that of SMC. Nevertheless, the tandem repeat
domain of Muc4 does not show significant identity with that of
human MUC4, except that both are rich in serine and
threonine. Tandem repeats are known to differ in sequence
and size between species and between different mucins [5].
Previous work on rat Muc5ac [27] and mouse Muc3 [28],
together with this work, clearly shows that rat and mouse
tandem repeats are conserved, suggesting strong selection
pressure on the tandem repeat sequences of rodents.
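Two repeat domains can both be Ser/Thr-rich yet share essentially no sequence identity, as described above for Muc4 and MUC4. Composition and identity are independent measures, and computing them separately makes the distinction concrete. This is a minimal sketch. The two peptides are hypothetical. The ungapped position-by-position identity is a deliberately crude stand-in for a proper alignment (e.g. BLAST) and is not how the comparison in this study was made.

```python
def ser_thr_fraction(peptide: str) -> float:
    """Fraction of residues that are Ser (S) or Thr (T)."""
    p = peptide.upper()
    if not p:
        raise ValueError("empty peptide")
    return sum(aa in "ST" for aa in p) / len(p)


def ungapped_identity(a: str, b: str) -> float:
    """Fraction of identical residues over the shorter length, without gaps.

    A crude illustration only; real comparisons require gapped alignment.
    """
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("empty sequence")
    return sum(x == y for x, y in zip(a.upper(), b.upper())) / n


if __name__ == "__main__":
    # Hypothetical repeats: both 75% Ser/Thr, yet identical at no position.
    a, b = "STSTAPST", "TSTSPATS"
    print(ser_thr_fraction(a), ser_thr_fraction(b), ungapped_identity(a, b))
```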
The deduced amino acid sequence was used to search a
collection of gapped domain alignments with the SMART
program [18]. As predicted, this analysis revealed, in the
C-terminal portion of the molecule, the two EGF-like
motifs and the transmembrane helix previously described for
SMC and found in MUC4 [11,17], suggesting that Muc4
is a member of the family of large membrane-bound mucins.
MUC4, SMC and Muc4 have a shorter cytoplasmic tail
than the other members of the transmembrane epithelial
mucin family [4]. Interestingly, SMART also revealed a
third EGF-like motif, a vWF-D domain encoded by exons
11–15 and a Nido domain encoded by exons 5–10 (Figs 1B
and 4). A third EGF-like motif has previously been
identified in human MUC4 [29]; it is encoded by a single
exon (exon 19). SMART analysis shows that these three
domains are also conserved in MUC4 and SMC. The vWF-D
domain is a feature of the large secreted gel-forming
mucins. This cysteine-rich domain, found four times in
von Willebrand factor (vWF), may participate in
intermolecular disulfide bonding (reviewed in [30]).
However, the cysteine residues conserved between vWF and
the large secreted mucins are not conserved in Muc4, which
may reflect a loss of function of this domain during
evolution. The Nido domain has been found in various
proteins. It is part of the nidogens, glycoproteins expressed
by mesenchymal and epithelial cells. Nidogens have a high
affinity for laminin-binding protein and are believed to be
important for epithelial morphogenesis [31]. The significance
of this domain in membrane-bound mucins is unclear.
EGF-like motifs are found in numerous growth factors and
in extracellular proteins involved in extracellular matrix
formation, cell adhesion, chemotaxis and wound healing [4].
These motifs may expose ligand-binding sites outside the
cell. To date, no ligand has been identified for MUC3, but
SMC has been shown to bind the ErbB2 receptor tyrosine
kinase through one of its EGF-like domains, and this
interaction modulates the kinase activity of the receptor [8].
According to these authors, SMC interacts directly with the
ErbB2 extracellular domain through its EGF1 domain and
thereby potentiates signaling through the ErbB receptor
network.
Expression of epithelial mucins is tissue-specific, and
several mucins may be expressed in each tissue [5]. MUC4 is
expressed in numerous normal tissues, including ocular tissues,
salivary glands, trachea, lung, stomach, colon, ovary, uterus,
prostate and endocervix [32–34]. There is no detectable
expression in normal pancreas, whereas MUC4 is abnormally
expressed in pancreatic tumors [35,36]. Mouse Muc4 appears
to have an expression pattern similar to that of MUC4. It is
also expressed in numerous epithelial tissues, with the highest
expression in trachea, duodenum and intestine and lower
expression in submaxillary glands, salivary glands, liver,
gallbladder and kidney.
Abnormal expression of MUC4 has been reported in
several human epithelial cancers [35,37,38], and SMC has
been shown to contribute to tumor progression [8]. The
molecular cloning and characterization of the mouse
ortholog of MUC4 will allow us to use gene-targeting
technology to investigate the functions of large
membrane-bound mucins and the precise role of Muc4 in cancer.
ACKNOWLEDGEMENTS
This work was supported by l’Association de Recherche contre le
Cancer (no. 4458) and l’Institut National de la Santé et de la Recherche
Médicale. The authors thank Dominique Demeyer for nucleotide
sequencing, Marie-Paule Delescaut for RNA extraction, Viviane
Mortelec for preparing media and buffers, and Séverine Louvel and
Marie Le Masson for help with PCR and cloning.
REFERENCES
1. Lagow, E., DeSouza, M.M. & Carson, D.D. (1999) Mammalian reproductive tract mucins. Hum. Reprod. Update 5, 280–292.
2. Kyo, K., Muto, T., Nagawa, H., Lathrop, G.M. & Nakamura, Y. (2001) Associations of distinct variants of the intestinal mucin gene MUC3A with ulcerative colitis and Crohn’s disease. J. Hum. Genet. 46, 5–20.
3. Pratt, W.S., Crawley, S., Hicks, J., Ho, J., Nash, M., Kim, Y.S., Gum, J.R. & Swallow, D.M. (2000) Multiple transcripts of MUC3: evidence for two genes, MUC3A and MUC3B. Biochem. Biophys. Res. Commun. 275, 916–923.
4. Williams, S.J., McGuckin, M.A., Gotley, D.C., Eyre, H.J., Sutherland, G.R. & Antalis, T.M. (1999) Two novel mucin genes down-regulated in colorectal cancer identified by differential display. Cancer Res. 59, 4083–4089.
5. Gendler, S.J. & Spicer, A.P. (1995) Epithelial mucin genes. Annu. Rev. Physiol. 57, 607–634.
6. Ligtenberg, M.J., Kruijshaar, L., Buijs, F., van Meijer, M., Litvinov, S.V. & Hilkens, J. (1992) Cell-associated episialin is a complex containing two proteins derived from a common precursor. J. Biol. Chem. 267, 6171–6177.
7. Sheng, Z.Q., Hull, S.R. & Carraway, K.L. (1990) Biosynthesis of the cell surface sialomucin complex of ascites 13762 rat mammary adenocarcinoma cells from a high molecular weight precursor. J. Biol. Chem. 265, 8505–8510.
8. Carraway, K.L., Rossi, E.A., Komatsu, M., Price-Schiavi, S.A., Huang, D., Guy, P.M., Carvajal, M.E., Fregien, N. & Carraway, C.A. (1999) An intramembrane modulator of the ErbB2 receptor tyrosine kinase that potentiates neuregulin signaling. J. Biol. Chem. 274, 5263–5266.
9. Komatsu, M., Tatum, L., Altman, N.H., Carothers, C.C. & Carraway, K.L. (2000) Potentiation of metastasis by cell surface sialomucin complex (rat MUC4), a multifunctional anti-adhesive glycoprotein. Int. J. Cancer 87, 480–486.
10. Moriarty, J., Skelly, C.M., Bharathan, S., Moody, C.E. & Sherblom, A.P. (1990) Sialomucin and lytic susceptibility of rat mammary tumor ascites cells. Cancer Res. 50, 6800–6805.
11. Moniaux, N., Nollet, S., Porchet, N., Degand, P., Laine, A. & Aubert, J.P. (1999) Complete sequence of the human mucin MUC4: a putative cell membrane-associated mucin. Biochem. J. 338, 325–333.
12. Nollet, S., Moniaux, N., Maury, J., Petitprez, D., Degand, P., Laine, A., Porchet, N. & Aubert, J.P. (1998) Human mucin gene MUC4: organization of its 5′-region and polymorphism of its central tandem repeat array. Biochem. J. 332, 739–748.
13. Crepin, M., Porchet, N., Aubert, J.P. & Degand, P. (1990) Diversity of the peptide moiety of human airway mucins. Biorheology 27, 471–484.
14. McCarthy, L.C., Terrett, J., Davis, M.E., Knights, C.J., Smith, A.L., Critcher, R., Schmitt, K., Hudson, J., Spurr, N.K. & Goodfellow, P.N. (1997) A first-generation whole genome-radiation hybrid map spanning the mouse genome. Genome Res. 7, 1153–1161.
15. Kozak, M. (1987) An analysis of 5′-noncoding sequences from 699 vertebrate messenger RNAs. Nucleic Acids Res. 15, 8125–8148.
16. Gross, M.S., Guyonnet-Duperat, V., Porchet, N., Bernheim, A., Aubert, J.P. & Nguyen, V.C. (1992) Mucin 4 (MUC4) gene: regional assignment (3q29) and RFLP analysis. Ann. Genet. 35, 21–26.
17. Sheng, Z., Wu, K., Carraway, K.L. & Fregien, N. (1992) Molecular cloning of the transmembrane component of the 13762 mammary adenocarcinoma sialomucin complex. A new member of the epidermal growth factor superfamily. J. Biol. Chem. 267, 16341–16346.
18. Schultz, J., Milpetz, F., Bork, P. & Ponting, C.P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proc. Natl Acad. Sci. USA 95, 5857–5864.
19. Wu, K., Fregien, N. & Carraway, K.L. (1994) Molecular cloning and sequencing of the mucin subunit of a heterodimeric, bifunctional cell surface glycoprotein complex of ascites rat mammary adenocarcinoma cells. J. Biol. Chem. 269, 11950–11955.
20. Lander, E.S., Linton, L.M., Birren, B., Nusbaum, C., Zody, M.C., Baldwin, J., Devon, K., Dewar, K., Doyle, M. & FitzHugh, W. (2001) Initial sequencing and analysis of the human genome. Nature 409, 860–921.
21. Mount, S.M. (1982) A catalogue of splice junction sequences. Nucleic Acids Res. 10, 459–472.
22. Hawkins, J.D. (1988) A survey on intron and exon lengths. Nucleic Acids Res. 16, 9893–9908.
23. Bobek, L.A., Liu, J., Sait, S.N., Shows, T.B., Bobek, Y.A. & Levine, M.J. (1996) Structure and chromosomal localization of the human salivary mucin gene, MUC7. Genomics 31, 277–282.
24. Desseyn, J.L., Guyonnet-Duperat, V., Porchet, N., Aubert, J.P. & Laine, A. (1997) Human mucin gene MUC5B, the 10.7-kb large central exon encodes various alternate subdomains resulting in a super-repeat. Structural evidence for a 11p15.5 gene family. J. Biol. Chem. 272, 3168–3178.
25. Lancaster, C.A., Peat, N., Duhig, T., Wilson, D., Taylor-Papadimitriou, J. & Gendler, S.J. (1990) Structure and expression of the human polymorphic epithelial mucin gene: an expressed VNTR unit. Biochem. Biophys. Res. Commun. 173, 1019–1029.
26. Denny, P.C., Mirels, L. & Denny, P.A. (1996) Mouse submandibular gland salivary apomucin contains repeated N-glycosylation sites. Glycobiology 6, 43–50.
27. Inatomi, T., Tisdale, A.S., Zhan, Q., Spurr-Michaud, S. & Gipson, I.K. (1997) Cloning of rat Muc5AC mucin gene: comparison of its structure and tissue distribution to that of human and mouse homologues. Biochem. Biophys. Res. Commun. 236, 789–797.
28. Shekels, L.L., Hunninghake, D.A., Tisdale, A.S., Gipson, I.K., Kieliszewski, M., Kozak, C.A. & Ho, S.B. (1998) Cloning and characterization of mouse intestinal MUC3 mucin: 3′ sequence contains epidermal-growth-factor-like domains. Biochem. J. 330, 1301–1308.
29. Choudhury, A., Moniaux, N., Winpenny, J.P., Hollingsworth, M.A., Aubert, J.P. & Batra, S.K. (2000) Human MUC4 mucin cDNA and its variants in pancreatic carcinoma. J. Biochem. (Tokyo) 128, 233–243.
30. Perez-Vilar, J. & Hill, R.L. (1999) The structure and assembly of secreted mucins. J. Biol. Chem. 274, 31751–31754.
31. Ekblom, M., Falk, M., Salmivirta, K., Durbeej, M. & Ekblom, P. (1998) Laminin isoforms and epithelial development. Ann. NY Acad. Sci. 857, 194–211.
32. Audie, J.P., Janin, A., Porchet, N., Copin, M.C., Gosselin, B. & Aubert, J.P. (1993) Expression of human mucin genes in respiratory, digestive, and reproductive tracts ascertained by in situ hybridization. J. Histochem. Cytochem. 41, 1479–1485.
33. Gipson, I.K., Ho, S.B., Spurr-Michaud, S.J., Tisdale, A.S., Zhan, Q., Torlakovic, E., Pudney, J., Anderson, D.J., Toribara, N.W. & Hill, J.A. (1997) Mucin genes expressed by human female reproductive tract epithelia. Biol. Reprod. 56, 999–1011.
34. Inatomi, T., Spurr-Michaud, S., Tisdale, A.S., Zhan, Q., Feldman, S.T. & Gipson, I.K. (1996) Expression of secretory mucin genes by human conjunctival epithelia. Invest. Ophthalmol. Vis. Sci. 37, 1684–1692.
35. Balague, C., Audie, J.P., Porchet, N. & Real, F.X. (1995) In situ hybridization shows distinct patterns of mucin gene expression in normal, benign, and malignant pancreas tissues. Gastroenterology 109, 953–964.
36. Hollingsworth, M.A., Strawhecker, J.M., Caffrey, T.C. & Mack, D.R. (1994) Expression of MUC1, MUC2, MUC3 and MUC4 mucin mRNAs in human pancreatic and intestinal tumor cell lines. Int. J. Cancer 57, 198–203.
37. Lesuffleur, T., Zweibaum, A. & Real, F.X. (1994) Mucins in normal and neoplastic human gastrointestinal tissues. Crit. Rev. Oncol. Hematol. 17, 153–180.
38. Nguyen, P.L., Niehans, G.A., Cherwitz, D.L., Kim, Y.S. & Ho, S.B. (1996) Membrane-bound (MUC1) and secretory (MUC2, MUC3, and MUC4) mucin gene expression in human lung cancer. Tumour Biol. 17, 176–192.