
REVIEW ARTICLE
Structure and function of KH domains
Roberto Valverde¹, Laura Edwards² and Lynne Regan¹,³
1 Department of Molecular Biophysics & Biochemistry, Yale University, New Haven, CT, USA
2 Department of Molecular and Cellular Developmental Biology, Yale University, New Haven, CT, USA
3 Department of Chemistry, Yale University, New Haven, CT, USA
Keywords
fragile X mental retardation; interaction motif; KH domains; K homology domain; noncrystallographic symmetry; protein motif; RNA-binding; RNA-binding protein; RNA-recognition; solvent accessibility

Correspondence
L. Regan, Yale University, 266 Whitney Avenue, New Haven, CT 06520, USA
Fax: +1 203 432 3104
Tel: +1 203 432 9843
E-mail:

(Received 3 January 2008, revised 18 February 2008, accepted 14 March 2008)

doi:10.1111/j.1742-4658.2008.06411.x

The hnRNP K homology (KH) domain was first identified 14 years ago in human heterogeneous nuclear ribonucleoprotein K (hnRNP K). Since then, KH domains have been identified as nucleic acid recognition motifs in proteins that perform a wide range of cellular functions. KH domains bind RNA or ssDNA. They are found in proteins associated with transcriptional and translational regulation, as well as other cellular processes. Several diseases, e.g. fragile X mental retardation syndrome and paraneoplastic disease, are associated with the loss of function of a particular KH domain. Here we discuss the progress made towards understanding both general and specific features of the molecular recognition of nucleic acids by KH domains. The typical binding surface of KH domains is a cleft that is versatile but can typically accommodate only four unpaired bases. Van der Waals forces and hydrophobic interactions and, to a lesser extent, electrostatic interactions contribute to the nucleic acid-binding affinity. Proteins use two strategies to achieve greater affinity and specificity of nucleic acid binding: ‘augmented’ KH domains and multiple copies of KH domains. Isolated KH domains have been seen to crystallize as monomers, dimers and tetramers, but no published data support the formation of noncovalent higher-order oligomers by KH domains in solution. Much attention has been given in the literature to a conserved hydrophobic residue (typically Ile or Leu) that is present in most KH domains. The interest derives from an individual in whom this Ile, in the KH2 domain of fragile X mental retardation protein, is mutated to Asn; this individual exhibits a particularly severe form of the syndrome. The structural effects of this mutation in the fragile X mental retardation protein KH2 domain have recently been reported. We discuss the use of analogous point mutations at this position in other KH domains to dissect both structure and function.

Abbreviations
BPS, branchpoint sequence; dFXRP, Drosophila fragile X-related protein; FBP, FUSE-binding protein; FMRP, fragile X mental retardation protein; FUSE, ssDNA far-upstream element; FXRP, fragile X-related protein; hFMRP, human fragile X mental retardation protein; hnRNP K, human heterogeneous nuclear ribonucleoprotein K; KH, hnRNP K homology; KSRP, K homology splicing regulator protein; NCS, noncrystallographic symmetry; PCBP, poly(C)-binding protein; PSI, P-element somatic inhibitor protein; SF1, splicing factor 1; Y2H, yeast two-hybrid.

FEBS Journal 275 (2008) 2712–2726 © 2008 The Authors Journal compilation © 2008 FEBS

Introduction
The hnRNP K homology (KH) domain was named for the human heterogeneous nuclear ribonucleoprotein K (hnRNP K), the first protein in which the motif was identified [1]. The KH motif consists of approximately 70 amino acids. It is found in a diverse variety of proteins in archaea, bacteria and eukaryotes [1,2]. Typically, KH domains occur in multiple copies: two in fragile X mental retardation protein (FMRP) [3–5], three in hnRNP K [1,6], and 14 in vigilin [7,8]. There are, however, a few examples of proteins with a single KH motif; Mer1p [1,9] and Sam68 [10] each have just one. Whether present in single or multiple copies, KH domains typically function in RNA or ssDNA recognition. When present in multiple copies, KH domains can function independently or cooperatively. In ssDNA far-upstream element (FUSE)-binding protein (FBP), for example, the KH3 and KH4 domains are separated by a flexible Gly-rich linker and make no interdomain contacts [11]. Each of these KH domains binds its own segment of ssDNA. The two segments are separated by a stretch of ssDNA that the protein does not contact [12]. By contrast, the two KH domains of NusA have an extensive interdomain contact area. Together they bind an extended segment of RNA that runs across both domains [13–15]. KH modules are found in many different proteins, which are involved in a myriad of biological processes, including splicing, transcriptional regulation and translational control.
Two folds, one motif
Grishin pointed out that there are actually two different versions of the KH motif, which he named the type I and type II KH folds (Fig. 1) [2]. The type I fold is typically found in eukaryotic proteins, whereas the type II fold is typically found in prokaryotic proteins. Both folds share a ‘minimal KH motif’ in the linear sequence, but the three-dimensional arrangement of their secondary structural elements differs. In the type I fold, a β-sheet composed of three antiparallel β-strands is abutted by three α-helices (α1, α2 and α′). The β-strands of the type I sheet lie in the order β1, β′, β2. The β′-strand lies between β1 and β2 and is antiparallel to both, so β1 and β2 are parallel to each other (Fig. 1). Because β1 and β2 are not neighbours, every pair of adjacent strands is antiparallel. This all-antiparallel arrangement distinguishes the type I KH fold from the type II KH fold. In the type II fold, the β1-strand and β2-strand are adjacent and parallel to each other, and the β′-strand is adjacent and antiparallel to the β1-strand (Fig. 1). The length and sequence of the variable loop differ among KH domains, whether type I or type II (the variable loop is shown as a dotted line in Fig. 1); variable loops of three to over 60 residues are known. All typical KH domains have a GXXG loop (shown in white in Fig. 1) [2], although this loop is sometimes altered or interrupted in divergent KH domains [16].
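The strand-order argument above can be checked mechanically. The sketch below is a toy. It encodes each sheet as an ordered list of (strand, direction) pairs taken from the description of Fig. 1; this encoding and the helper names are hypothetical. It ignores coordinates and only asks whether every pair of neighbouring strands is antiparallel.

```python
"""Toy check of KH sheet topology from strand order and direction.

Hypothetical encoding: strands are listed in their spatial order across the
sheet, with direction +1 or -1 for the N->C running direction. Only the
relative signs matter.
"""

# Type I: order beta1, beta', beta2; beta' is antiparallel to both neighbours.
TYPE_I_SHEET = [("beta1", +1), ("beta_prime", -1), ("beta2", +1)]
# Type II: beta' next to beta1 (antiparallel), beta1 next to beta2 (parallel).
TYPE_II_SHEET = [("beta_prime", -1), ("beta1", +1), ("beta2", +1)]


def adjacent_pairings(sheet):
    """Return (strand, neighbour, 'parallel'/'antiparallel') for each neighbouring pair."""
    return [
        (name1, name2, "parallel" if dir1 == dir2 else "antiparallel")
        for (name1, dir1), (name2, dir2) in zip(sheet, sheet[1:])
    ]


def classify_sheet(sheet):
    """'all-antiparallel' if every neighbouring pair is antiparallel, else 'mixed'."""
    kinds = {kind for _, _, kind in adjacent_pairings(sheet)}
    return "all-antiparallel" if kinds == {"antiparallel"} else "mixed"


print(classify_sheet(TYPE_I_SHEET))   # all-antiparallel
print(classify_sheet(TYPE_II_SHEET))  # mixed
```

In the type I encoding, β1 and β2 share a direction but are never neighbours. That is why the sheet still counts as all-antiparallel.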
Not only is the order of secondary structural elements in individual eukaryotic type I KH domains different from that in prokaryotic type II KH domains; the relative orientation of tandem type I versus tandem type II KH domains is also quite different. The comparison is limited, however, because the structure of only one tandem KH domain of each type has been published. Here we compare two tandem KH1–KH2 structures: NusA (Protein Data Bank entry 2ASB) [14,15], representing tandem prokaryotic (type II) KH domains, and human FMRP (hFMRP) (Protein Data Bank entry 2QND) [17], representing tandem eukaryotic (type I) KH domains. In NusA, an unstructured six-amino-acid linker connects KH1 to KH2. An area of 1380 Å² is buried at the interface between the β-sheet of KH1 and the α-helices (α′ and α2) of KH2 (Fig. 2B). By contrast, in hFMRP(KH1–KH2D), the α′-helix of KH1 is linked to the β1-strand of KH2 by a single residue, Glu280. This residue adopts non-β, non-α phi/psi angles to accomplish the tight connection. The arrangement involves only minimal interdomain contacts, between aliphatic residues from the α1-helix of KH1 and the β-sheet of KH2 [17,18].

Fig. 1. Type I and type II KH domain folds. Stylized representations of (A) the type I KH domain (eukaryotic) and (B) the type II KH domain (prokaryotic). The labeling of secondary structure elements follows standard KH nomenclature [2]. The dotted line connecting the β2-strand and β′-strand represents the variable loop. The white line connecting the α1-helix and the α2-helix represents the GXXG loop.
Evolutionary relationships between KH domains
Type I KH domains are found in multiple copies in eukaryotic proteins, whereas type II KH domains are typically found as single copies in prokaryotic proteins. Here, therefore, we discuss eukaryotic proteins. Consider a family of proteins with multiple KH domains (i.e. type I KH domains). Within such a family, the KH1 domain is always more similar to the KH1 domains of other family members than to the KH2 and KH3 domains in the same protein. Similar relationships are seen for KH2 and KH3 domains: each is more similar to other KH2 or KH3 domains, respectively, than to the other domains of its own protein or to KH1 domains (Fig. 3). This relationship holds true in all families and between species. It applies both to families in which like pairs of domains share very high identity (95.2–95.5% for the KH1 pairs in the Nova and poly(C)-binding protein families) and to families in which like pairs share much lower identity (roughly 43–55% between some members of the FXR family; Fig. 3).
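Percent identity values such as those in Fig. 3 depend on an alignment and on a counting convention, and the text specifies neither. The sketch below assumes pre-aligned sequences and counts identities only over columns in which neither sequence has a gap. This is one common convention among several. The toy sequences and the protein names X and Y are invented for illustration and are not real KH domains.

```python
def percent_identity(seq_a, seq_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Assumed convention: identical residues divided by the number of columns
    in which neither sequence has a gap ('-').
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


def closest_match(query, candidates):
    """Name of the candidate sequence with the highest identity to `query`."""
    return max(candidates, key=lambda name: percent_identity(query, candidates[name]))


# Invented toy alignment (NOT real KH sequences): X1 and X2 are two domains of a
# hypothetical protein X; Y1 sits at the same position as X1 in a paralog Y.
x1 = "IRLLVPSKEAGAIIGKGG"
x2 = "VQLKIPNSLVGRLIGRGG"
y1 = "IRLLVPSKQAGSIIGKGG"

print(round(percent_identity(x1, y1), 1))  # 88.9
print(round(percent_identity(x1, x2), 1))  # 38.9
print(closest_match(x1, {"X2": x2, "Y1": y1}))  # Y1
```

In this toy setup the like-position pair (X1, Y1) scores highest, mirroring the pattern described for Fig. 3. Real values would come from a proper multiple sequence alignment of the actual domains.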
This observation suggests several hypotheses about the origin and evolution of KH domains. If multiple KH domains arose through a gene duplication event, the results cited above suggest that duplication occurred before the members of each protein family diverged. Alternatively, the interdomain identities could have arisen by convergent evolution of different domains in a parent protein, before subsequent divergence produced the different members of the family.
Nucleic acid binding by KH domains – general features
Most structures of KH domains in complex with their cognate nucleic acid ligands are of type I domains. These come from eukaryotic proteins that function in transcriptional and translational regulation. The only structures of type II KH domains in complex with a nucleic acid ligand are of the bacterial protein NusA [15] (Protein Data Bank entries 2ATW and 2ASB).

Only a few structures of KH domains bound to cognate nucleic acid ligands have been deposited in the Protein Data Bank. Even so, some common features of nucleic acid recognition emerge. The RNA or DNA is bound in an extended, single-stranded conformation across one face of the KH domain. On the ‘left’ it is flanked by the α1-helix, the α2-helix and the GXXG loop; on the ‘right’, by the β2-strand and the variable loop (Fig. 4A). Together, these secondary structural elements form a binding cleft that accommodates four bases. The elements that shape the binding cleft partly make up the core motif shared by type I and type II domains. In type II KH domains, however, the variable loop sits at the bottom of the binding cleft (Fig. 4A). The center of the binding pocket tends to be hydrophobic, and a variety of additional specific interactions stabilize the complex. Stacking between nucleic acid bases and aromatic protein side-chains is prevalent in other single-stranded nucleic acid-binding motifs [19,20] but is notably absent from KH domain nucleic acid recognition. In some complexes the bases of the bound ssDNA or RNA stack with each other (Fig. 4B); in others there is no base stacking.
Some KH domain–nucleic acid structures feature an adenine–backbone interaction (Fig. 4C). Examples are A42–G43–A44–A45 in NusA KH1, C48–A49–A50–U51 in NusA KH2 [15], U12–C13–A14–C15 in Nova-2 KH3 [21], and U6–A7–A8–C9 in splicing factor 1 (SF1) [22]. In these complexes an adenine base hydrogen bonds to the protein backbone, mimicking a Watson–Crick base-pairing pattern. Superimposing the NusA KH1 domain and ribonucleotides 42–46 on the NusA KH2 domain and ribonucleotides 48–53 shows that the adenine bases of A44 and A50 make exactly equivalent hydrogen bonds to the protein backbone [15].
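Superpositions like the NusA KH1/KH2 comparison are usually done by least-squares fitting of matched atoms. The sketch below uses the Kabsch algorithm with NumPy. It assumes the two coordinate sets are already paired atom for atom (for example, matched backbone atoms). It is not necessarily the procedure used in [15], and the coordinates in the usage example are synthetic.

```python
import numpy as np


def kabsch_superpose(mobile, target):
    """Least-squares rotation of `mobile` onto `target` (Kabsch algorithm).

    Both inputs are (N, 3) arrays of paired atom coordinates.
    Returns (rotation matrix, translation, RMSD after fitting).
    """
    P = np.asarray(mobile, dtype=float)
    Q = np.asarray(target, dtype=float)
    p_mean, q_mean = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_mean, Q - q_mean
    H = Pc.T @ Qc                            # 3x3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against improper rotations
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = q_mean - R @ p_mean
    fitted = P @ R.T + t
    rmsd = float(np.sqrt(((fitted - Q) ** 2).sum(axis=1).mean()))
    return R, t, rmsd


# Synthetic check: a rotated and translated copy should superpose with RMSD ~ 0.
theta = np.radians(30.0)
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
points = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0],
                   [0.0, 1.5, 2.0], [0.7, 0.3, 1.1]])
moved = points @ Rz.T + np.array([1.0, 2.0, 3.0])
R, t, rmsd = kabsch_superpose(points, moved)
print(round(rmsd, 6))  # 0.0
```

Once two domains are superposed this way, equivalent contacts (such as the A44 and A50 backbone hydrogen bonds) can be compared directly in a shared frame.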
Fig. 2. The orientation of individual KH domains in tandem type I and type II arrays. Schematics are based on the crystal structures of the KH1–KH2 domains of NusA (type II) (Protein Data Bank entry 1KOR) and fragile X mental retardation protein [type I (B)] (Protein Data Bank entry 2QND). Each domain is represented as an oval with the β-sheet side colored solid black and the abutting α-helices striped.
KH domains bind ssDNA and RNA with low micromolar affinity. For example, the Kd values of the SF1 KH domain–DNA complex and the hnRNP K KH3 domain–DNA complex are 3 µM and 1 µM, respectively [22,23]. Clustering of KH domains increases the affinity and specificity of nucleic acid recognition [24]. The four tandem KH domains of P-element somatic inhibitor protein (PSI), for example, bind ligand cooperatively [25]. The KH1–KH2 domains of NusA (Protein Data Bank entries 2ATW and 2ASB) form an uninterrupted recognition surface that binds RNA with nanomolar affinity [15]. Together, the third and fourth KH domains of the K homology splicing regulator protein (KSRP) bind RNA ligand more tightly than either does separately [26].
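These affinities can be put on an energy scale with the standard relation ΔG° = RT ln(Kd/1 M). The sketch below assumes 25 °C and a 1 M standard state. The 10 nM entry is a hypothetical illustration of ‘nanomolar’ binding, not a measured NusA value.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal mol^-1 K^-1


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol): dG = RT ln(Kd / 1 M)."""
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)


for label, kd in [("SF1 KH (3 uM)", 3e-6), ("hnRNP K KH3 (1 uM)", 1e-6),
                  ("hypothetical 10 nM", 10e-9)]:
    print(f"{label}: {binding_free_energy(kd):.2f} kcal/mol")
```

At 25 °C, each 10-fold decrease in Kd corresponds to about 1.36 kcal/mol (RT ln 10). Moving from low-micromolar to nanomolar binding therefore requires only a few kcal/mol of additional interaction energy.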
Finally, for KH domains whose structures have been determined both free and in complex with nucleic acid, ligand binding produces little or no structural change in the protein. This is shown by our own analysis [27] and was also concluded in [15,21,28].
FMRP family
               FMRP   FMRP   FXR1   FXR1   FXR2   FXR2 dmFMR1 dmFMR1
                KH1    KH2    KH1    KH2    KH1    KH2    KH1    KH2
FMRP KH1      100.0
FMRP KH2       21.7  100.0
FXR1 KH1       82.0   23.0  100.0
FXR1 KH2       20.3   55.2   17.7  100.0
FXR2 KH1       64.3   20.0   68.6   19.0  100.0
FXR2 KH2       21.7   53.7   22.9   82.0   22.8  100.0
dmFMR1 KH1     54.4   22.1   58.8   23.1   58.8   25.6  100.0
dmFMR1 KH2     22.5   43.1   24.0   65.3   26.7   62.5   22.6  100.0

Nova family
             NOVA-1 NOVA-1 NOVA-1 NOVA-2 NOVA-2 NOVA-2
                KH1    KH2    KH3    KH1    KH2    KH3
NOVA-1 KH1    100.0
NOVA-1 KH2     35.3  100.0
NOVA-1 KH3     40.3   37.3  100.0
NOVA-2 KH1     95.5   36.8   36.8  100.0
NOVA-2 KH2     32.4   86.3   34.3   34.3  100.0
NOVA-2 KH3     38.8   35.8   90.9   37.3   35.4  100.0

PCBP family
               PCB1   PCB1   PCB1   PCB2   PCB2   PCB2   PCB3   PCB3   PCB3   PCB4   PCB4   PCB4
                KH1    KH2    KH3    KH1    KH2    KH3    KH1    KH2    KH3    KH1    KH2    KH3
PCB1 KH1      100.0
PCB1 KH2       33.8  100.0
PCB1 KH3       35.4   33.8  100.0
PCB2 KH1       95.2   33.8   32.3  100.0
PCB2 KH2       35.4   93.8   31.0   35.4  100.0
PCB2 KH3       33.8   35.4   92.1   30.8   32.4  100.0
PCB3 KH1       88.7   32.3   36.9   90.3   33.8   35.4  100.0
PCB3 KH2       35.4   84.6   35.4   33.8   89.2   36.9   33.4  100.0
PCB3 KH3       36.4   38.5   84.1   33.3   40.0   84.1   36.9   40.0  100.0
PCB4 KH1       74.2   35.4   35.4   69.4   33.8   35.4   75.8   35.4   36.9  100.0
PCB4 KH2       35.4   76.9   36.9   33.8   80.0   38.5   36.9   86.2   41.5   33.8  100.0
PCB4 KH3       33.8   33.8   66.7   30.8   35.4   71.4   35.4   36.9   68.3   33.8   33.8  100.0

Fig. 3. Table showing sequence identities (%) of KH domains within protein families. Data for the FMRP, Nova and PCBP families are shown. For each family, the sequence of each KH domain was aligned with KH domains at other positions in the same protein and with KH domains at the same position in other proteins. The highest percentage identities were consistently those between KH domains at the same position in different members of a protein family (highlighted in purple).

Nucleic acid recognition by KH domains – specific examples

NMR structure of the KH3 domain of hnRNP K with ssDNA bound
The type I KH3 domain of the transcriptional regulator hnRNP K binds to a 10mer ssDNA, specifically recognizing the tetrad 5′-dTCCC (Fig. 5) [23] (Protein Data Bank entry 1J5K). The authors propose that the complex is stabilized by methyl-to-oxygen hydrogen bonds between three Ile side-chains and the O2 and N3 atoms of the two central cytosine bases. Methyl-to-oxygen hydrogen bonds are uncommon and weak, but not without precedent [29,30]. Other interactions also stabilize the complex. Backbone and side-chain groups of the protein hydrogen bond to the bases, and positively charged side-chains make electrostatic interactions with the phosphate backbone of the nucleic acid.
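Proposed methyl-to-oxygen contacts like these are usually first spotted as short C···O (or C···N) distances between a methyl carbon and an acceptor atom on a base. The sketch below only screens distances. The 4.0 Å cutoff, the atom labels and all coordinates are hypothetical placeholders, not values from Protein Data Bank entry 1J5K. A real assignment would also check hydrogen positions and angles.

```python
import math
from itertools import product

# Hypothetical coordinates in angstroms; NOT taken from any PDB entry.
methyl_carbons = {
    "Ile_a CD1": (0.0, 0.0, 0.0),
    "Ile_b CD1": (5.0, 5.0, 5.0),
}
base_acceptors = {
    "Cyt_x O2": (3.4, 0.0, 0.0),
    "Cyt_y N3": (0.0, 3.6, 0.0),
    "Cyt_y O2": (9.0, 9.0, 9.0),
}


def candidate_ch_contacts(donors, acceptors, cutoff=4.0):
    """List (methyl carbon, acceptor, distance) pairs within `cutoff` angstroms.

    Distance screening only: geometry (angles, hydrogen positions) is not checked.
    """
    hits = []
    for (d_name, d_xyz), (a_name, a_xyz) in product(donors.items(), acceptors.items()):
        dist = math.dist(d_xyz, a_xyz)
        if dist <= cutoff:
            hits.append((d_name, a_name, round(dist, 2)))
    return sorted(hits, key=lambda hit: hit[2])


for hit in candidate_ch_contacts(methyl_carbons, base_acceptors):
    print(hit)
```

A distance screen like this is cheap and deliberately permissive. It yields candidate contacts for closer geometric inspection, not evidence that a hydrogen bond exists.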
Poly(C)-binding proteins
Poly(C)-binding proteins (PCBPs) contain three type I KH domains arranged as KH1–(16-amino-acid spacer)–KH2–(67 to > 100-amino-acid spacer)–KH3. Because the linkers are long, the domains appear to function independently. PCBPs bind to poly(C)-rich DNA and RNA sequences and function in a diverse range of cellular processes, including mRNA stabilization, translational activation and translational silencing [31,32].

Crystal structures have been solved of PCBP2 KH1 in complex with a 12-nucleotide ssDNA and with its RNA equivalent (Protein Data Bank entries 2PQU and 2PQY, respectively) [33]. In both complexes, the 12 nucleotides correspond to two repeats of the C-rich strand of human telomeric DNA, 5′-AACCCTAACCCT-3′. In both the ssDNA and RNA crystals, the asymmetric unit contains two KH1 molecules tethered by one oligonucleotide ligand. The ssDNA and RNA complexes of PCBP2 KH1 are structurally similar. Nothing indicates that the 2′-hydroxyl groups of the RNA interact with the protein (Fig. 6A). The CCCT/U tetranucleotide motif constitutes the core recognition sequence.
Interestingly, however, PCBP2 KH1 was also crystallized with a seven-nucleotide, single-repeat ssDNA ligand, 5′-AACCCTA-3′. In that complex, a different ‘register’ of the nucleic acid–protein complex was observed [28] (Protein Data Bank entry 2AXY; Fig. 6B). In all structures the nucleic acid occupied the ‘typical’ cleft. In the seven-nucleotide structure, however, its position relative to the protein was shifted by one base in the 5′-direction (core recognition sequence ACCC versus CCCT; Fig. 6A–C). The first position of the core recognition motif sits on top of the α1-helix. The phosphate backbone of the next two nucleotides then interacts with the α1-helix and the GXXG motif on the left, and with the β2-strand and the variable loop on the right. Base stacking is observed between the third and fourth nucleotides of the core recognition sequence.

A high-resolution structure of the third KH domain of PCBP2 bound to ssDNA 5′-dAACCCTA-3′ has recently been solved [34] (Protein Data Bank entry 2P2R). It resembles the earlier structures of the first KH domain of PCBP2. However, because the crystals diffracted to ultra-high resolution, this structure resolved hydrogen bonds and water molecules mediating protein–DNA contacts that earlier crystal structures could not. Specifically, the tetrad 5′-CCCT-3′ occupies the binding cleft, and water-mediated contacts stabilize its last two bases. The protein also contacts two additional bases beyond the binding cleft. Also of interest, the KH domains of PCBP2 were monomeric in some crystal forms and formed a crystal-contact-mediated dimer in others (see the section on KH dimers).

Fig. 4. Common features of KH domain–nucleic acid interactions. (A) Type I KH domain; the binding cleft comprises the secondary structural elements α1-helix, GXXG loop, α2-helix, β2-strand and variable loop (colored green), and recognizes four nucleotides (cyan sticks). The green dotted line represents the location of the variable loop in type II KH domains. (B) Nucleic acid bases of the ligand stacking with each other. Coordinates from Protein Data Bank entry 1J5K were used in (A) and (B), and coordinates from Protein Data Bank entry 2ASB were used in (C).
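The change of ‘register’ can be stated as an offset along the ligand. The sketch below uses only the ligand sequences and tetrads quoted above and finds where each tetrad starts in the ligand, read 5′→3′. It says nothing about binding energetics, and the helper name is ours.

```python
def tetrad_offsets(ligand, tetrad):
    """All 0-based start positions (read 5'->3') at which `tetrad` occurs in `ligand`."""
    return [i for i in range(len(ligand) - len(tetrad) + 1)
            if ligand[i:i + len(tetrad)] == tetrad]


twelve_mer = "AACCCTAACCCT"  # 12-nt ligand in 2PQU (two telomeric repeats)
seven_mer = "AACCCTA"        # 7-nt ligand in 2AXY
core_2pqu = "CCCT"           # tetrad in the cleft with the 12-nt ligand
core_2axy = "ACCC"           # tetrad in the cleft with the 7-nt ligand

print(tetrad_offsets(twelve_mer, core_2pqu))  # [2, 8]
shift = tetrad_offsets(seven_mer, core_2pqu)[0] - tetrad_offsets(seven_mer, core_2axy)[0]
print(shift)  # 1: the 2AXY tetrad starts one nucleotide further toward the 5'-end
```

The 12-mer contains two copies of CCCT, consistent with its two repeats. In the 7-mer, the bound tetrad ACCC begins one nucleotide 5′ of CCCT.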
RNA recognition by a single KH domain in cooperation with a QUA2 domain
SF1 specifically recognizes the intron branchpoint sequence (BPS) UACUAAC in pre-mRNA transcripts [35]. Binding by its KH domain is augmented by additional interactions with a helix C-terminal to the KH domain, known as the QUA2 domain (labeled in Fig. 7) [36]. The RNA adopts an extended, single-stranded conformation. It is bound in a hydrophobic groove between QUA2, the GXXG loop and the variable loop of the KH domain [22] (Fig. 7; Protein Data Bank entry 1K1G). The QUA2 region recognizes the 5′-nucleotides of the BPS (ACU). The α1-helix, α2-helix and β2-strand of the KH domain interact with the following nucleotides in ‘typical’ fashion. A large surface area of predominantly aliphatic hydrophobic residues is buried at the protein–RNA interface. In addition, positively charged side-chains make electrostatic interactions with the solvent-exposed phosphate backbone. The variable loop and the β2-strand provide protein contacts to the 3′-end of the RNA.
Binding of the seven-nucleotide RNA BPS requires both the QUA2 and KH regions. Another example of an augmented KH domain is the fourth KH domain of KSRP [26] (Protein Data Bank entry 2HH2). It contains a novel fourth β-strand, which lies adjacent to the β1-strand at an angle and contributes to the stability of the protein. It is not yet known whether this fourth β-strand contacts RNA [26].
X-ray structure of Nova-2 KH3 plus SELEX RNA
The X-ray structure of the KH3 domain of Nova-2 bound to an in vitro selected stem–loop RNA containing the 5′-UCAC-3′ core recognition sequence has been solved [21] (Fig. 8). This structure is something of an ‘outlier’, because the nucleic acid has a double-stranded hairpin stretch (not shown in Fig. 8). The hairpin may be a consequence of stability requirements for selection in vitro [37].

The stem of the hairpin adopts the A-form double-helical conformation, with four Watson–Crick base pairs (G1–C20, A2–U19, G3–C18 and G4–C17) and a single hydrogen bond between A5 and C16 (N1–O2 = 2.4 Å).

The extended target RNA (A11, U12, C13, A14, C15) lies on a hydrophobic platform formed by the α1-helix and the edge of the β2-strand. There it contacts both the invariant GXXG motif and the variable loop.
Nucleic acid binding by tandem but independent KH domains – NMR structure of the KH3 and KH4 domains of FBP in complex with FUSE ssDNA
FUSE-binding protein has four KH domains, separated by linkers of varying lengths [11]. FBP regulates c-myc expression by binding to FUSE [38]. The NMR structure of a complex between the KH3 and KH4 domains of FBP and a 29-base ssDNA fragment from FUSE [12] shows that each KH domain binds to a separate 9- to 10-base segment of ssDNA (Fig. 9). The KH domains are connected by a flexible Gly-rich linker and behave independently. The two ssDNA segments that they bind are themselves separated by a five-base ssDNA linker. The two KH domains do not contact each other, and the protein does not contact the linker DNA.

Fig. 5. Solution structure of the KH3 domain of hnRNP K bound to ssDNA. The third KH domain of hnRNP K (Protein Data Bank entry 1J5K) recognizes a tetrad of sequence 5′-dTCCC (purple sticks). Regions of the protein that contact the nucleic acid ligand are colored green (hydrophobic) and cyan (polar). The sugar–phosphate backbone curves around the α1-helix near the GXXG loop before proceeding parallel to the α2-helix. The first base sits on top of the α1-helix. The three dC bases of the tetrad fill the interior of the predominantly hydrophobic cleft and stack with each other (see Fig. 4B). Positively charged residues line the ridge of the cleft on the GXXG loop and α2-helix; their electrostatic interactions stabilize both ends of the ssDNA sugar–phosphate backbone.
In both KH domains, the ssDNA is bound in the typical extended orientation, in the groove between two sets of elements. On one side are the α1-helix, α2-helix and GXXG loop; on the other, the β2-strand and the variable loop. The center of the groove is hydrophobic and its edges are hydrophilic and charged. The narrow binding site (10 Å) favors pyrimidines over purines.
NusA – crystal structure of tandem type II KH domains
NusA regulates transcriptional elongation, pausing, termination and antitermination in prokaryotes [39–41]. The protein contains two tandem type II KH domains connected by a short six-residue linker [14,15]. This short linker, combined with a tight turn between the domains, brings the two KH domains into contact. Together they form an extended, continuous surface for RNA binding. NusA binds with high affinity and specificity to BoxB–BoxA–BoxC antitermination sequences within the leader region of the rRNA operon [15]. Ligand binding changes neither the structure nor the relative orientation of the KH domains (Protein Data Bank entries 1KOR and 2ASB) [27]. The ssRNA is bound in an extended conformation and contacts large areas of both KH domains (Fig. 10).
Despite having type II connectivity, each KH domain of NusA contains a ‘typical’ binding cleft. The variable loop, however, hangs at the bottom of the cleft (Fig. 4A); in type I KH domains it sits up and across from the GXXG loop. The 5′-end of the RNA (bases A42 through A45) is buried in and across the groove between the α1-helix, α2-helix and β2-strand of KH1. Intimate contacts between protein and RNA continue across the cusp of the KH1 and KH2 domains. C46 binds to the loop connecting the β′-strand and α′-helix of KH2. U47 and C48 contact the α1-helix and the GXXG loop of KH2. Finally, the nucleotides at the 3′-end of the RNA (A49–A52) pack against the groove formed by the α1-helix, α2-helix and β2-strand of KH2. Several interactions stabilize the protein–RNA complex. These are hydrogen bonds to amino acid side-chains and to the protein backbone, electrostatic and polar interactions, and, to a lesser extent, hydrophobic interactions between bases and nonaromatic side-chains.

Fig. 6. Crystal structures of the first KH domain of PCBP2 in complex with ssDNA. The first KH domain of PCBP2 recognizes the tetrad sequence 5′-dCCCT [(A) Protein Data Bank entry 2PQU] and 5′-dACCC [(B) Protein Data Bank entry 2AXY]. Polar and hydrophobic residues that contact the nucleic acid (purple sticks) are colored cyan and green, respectively. Waters (gray spheres) that bridge protein–ssDNA contacts were unambiguously resolved in the high-resolution structure in (B). Each structure shows a representative molecule from the asymmetric unit. In (C), the tetrad sequence (purple letters) of each structure is aligned with respect to the seven-nucleotide single-repeat ssDNA ligand. The register of the sequence is shifted in the 5′-direction in (A). In both structures, the nucleotide at the 5′-end of the ssDNA strand sits on top of the α1-helix. There it is stabilized by contacts that can recognize either an adenine or a cytosine nucleotide. The central cytosine bases of the tetrad sequence occupy the hydrophobic interior of the binding cleft. The last nucleotide at the 3′-end of the ssDNA strand (dC in 2AXY; dT in 2PQU) base stacks with the preceding cytosine.
The NusA tandem KH domains interact with RNA quite differently from the double KH domain of FBP bound to FUSE ssDNA. FBP is the only other structure of a double KH domain bound to a nucleic acid target. In FBP, a flexible 30-residue Gly-rich linker connects the two KH domains, which behave like beads on a string [12]. In the protein–DNA complex, each KH domain interacts with its own ssDNA recognition sequence. A five-nucleotide spacer that does not interact with the protein separates the two bound recognition sequences.

In both examples, coupling two nucleic acid-binding domains is expected to increase the specificity and affinity of the interaction. However, the two binding modes have very different consequences for the type and length of nucleic acid that can be bound.
KH crystal dimers – a tenuous relationship
Crystallographic data
Different KH domains crystallize as monomers, dimers or tetramers. This and other observations have led to the proposal that the functional form of certain KH domains may involve noncovalent dimers or higher-order oligomers. Here we review the data.

Fig. 7. Solution structure of the QUA2 and KH domains of SF1 in complex with RNA. Together, the QUA2 and KH domains of SF1 recognize the RNA BPS 5′-UACUAAC (blue sticks; Protein Data Bank entry 1K1G). Protein side-chains making polar and hydrophobic contacts with RNA are colored cyan and green, respectively. The QUA2 domain (labeled) abuts the α2-helix of the KH domain, which expands the contact surface with the RNA. The five nucleotides at the 5′-end of the RNA contact the QUA2 domain exclusively. The base of Ura6 is buried between the α1-helix and the QUA2 helix. The RNA then continues in a single-stranded, extended conformation into the ‘typical’ KH groove. Finally, the RNA loops over to the right and contacts the β2-strand. Note also the very long variable loop (24 amino acids), which loops back over the RNA from the right.

Fig. 8. Crystal structure of Nova-2 KH3 bound to SELEX RNA. The third KH domain of Nova-2 binds to the tetranucleotide sequence 5′-UCAC (blue sticks; Protein Data Bank entry 1EC6), which is part of the larger SELEX RNA. Protein side-chains making polar and hydrophobic contacts with RNA are shown in cyan and green, respectively. U12–C13–A14 rests on a hydrophobic platform formed by the α1-helix and the β2-strand. Electrostatic interactions between protein side-chains, nucleic acid bases and the sugar–phosphate backbone further stabilize the complex. Bases A14 and C15 stack with each other. The 2′-hydroxyl groups of the tetrad hydrogen bond with the protein or with other bases, which makes it unlikely that ssDNA could bind tightly to this KH domain.

Crystals of the isolated KH3 domain of Nova-2 contain four KH molecules per asymmetric unit (Protein Data Bank entry 1DTJ). The four molecules are related by pseudo-222 noncrystallographic symmetry (NCS; Fig. 11A). Two different surfaces on each KH domain mediate protein–protein contacts (Fig. 11B,C). One interface is formed primarily by the β1-strands of two KH domains related by two-fold NCS. This arrangement creates an augmented antiparallel β-sheet, stabilized by cross-strand side-chain interactions [42], and buries a surface area of 890 Å² [18,43] (reported as 950 Å² in [44]) (Fig. 12A). The other interface, between another pair of NCS-related KH domains, is formed by their α′-helices. These pack at an angle of approximately 50° [45], and the interface buries 1000 Å² (reported as 1250 Å² in [44]; Fig. 11C).
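A helix packing angle such as the ~50° above describes how two helix axes cross, viewed along the line of their closest approach. The sketch below computes that signed angle from two points on each axis. Fitting the axes from atomic coordinates is assumed to have been done elsewhere. The sign convention here is our own choice and may differ from that in [45]. The usage coordinates are synthetic.

```python
import math


def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def packing_angle(p_start, p_end, q_start, q_end):
    """Signed angle (degrees) between helix axes P and Q, measured about their
    common perpendicular oriented from axis P toward axis Q.

    Each axis is given by two points (start, end). If the axes intersect,
    the sign is arbitrary.
    """
    u = _sub(p_end, p_start)
    v = _sub(q_end, q_start)
    n = _cross(u, v)
    n_len = math.sqrt(_dot(n, n))
    if n_len < 1e-9:  # parallel or antiparallel axes: no unique perpendicular
        return 0.0 if _dot(u, v) > 0 else 180.0
    n_hat = tuple(c / n_len for c in n)
    # Orient the common perpendicular so that it points from axis P toward axis Q.
    if _dot(_sub(q_start, p_start), n_hat) < 0:
        n_hat = tuple(-c for c in n_hat)
    return math.degrees(math.atan2(_dot(_cross(u, v), n_hat), _dot(u, v)))


# Synthetic example: axis Q lies 10 Å above axis P and is rotated by 50° about z.
a = math.radians(50.0)
print(round(packing_angle((0, 0, 0), (10, 0, 0),
                          (0, 0, 10), (10 * math.cos(a), 10 * math.sin(a), 10)), 1))  # 50.0
```

Both axes are perpendicular to their common perpendicular. The dihedral about that line therefore reduces to the angle between the axis directions, with the sign set by which side axis Q lies on.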
Interestingly, the same KH domain in complex with a SELEX RNA crystallizes with only two KH molecules in the asymmetric unit, related by NCS [21]. The two molecules interact through NCS-related α′-helices and bury 1000 Å² (Fig. 12B). This arrangement is identical to the α′-helix interface observed in crystals of apo-Nova (Fig. 11C).
Crystals of the first KH domain of PCBP2 in complex with ssDNA contain two identical dimer complexes per asymmetric unit, related by two-fold NCS [28] (Protein Data Bank entry 2AXY; Fig. 13A). The dimer buries 1890 Å². As in the protein–protein interface depicted in Fig. 12A, symmetry-related β1-strands form an augmented antiparallel β-sheet, which is further stabilized by interactions between α′-helices (Fig. 13B). This dimeric arrangement is reproduced in crystals of two PCBP2 KH1 molecules tethered by one ssDNA or RNA ligand [33] (Protein Data Bank entries 2PQU and 2PYQ). In the cocrystal structure of the third KH domain of human PCBP2 with DNA [34], however, no protein–protein contacts were observed in the crystal. Instead, crystal contacts were formed solely by base-stacking interactions between DNA molecules from adjacent asymmetric units: A1 of the heptanucleotide stacks on C3 of a symmetry-related DNA, and vice versa.

No published solution data, for either the apo or the nucleic acid-bound forms of these KH domains, support the idea that they exist as dimers or higher-order oligomers in solution [17,44]. Nor have such dimers or oligomers been shown to be functionally significant in vivo.
Fig. 9. Solution structure of the FBP KH3–KH4 domains bound to ssDNA. The third and fourth KH domains of FBP recognize ssDNA 5′-dTTTT (A) and 5′-ATTC (B), respectively. In both domains, the binding cleft makes hydrophobic contacts with the ssDNA bases, and polar residues lining the edge of the cleft contact the sugar–phosphate backbone. The bases of the DNA ligand stack with each other, with the methyl groups of thymine pointing away from the binding cleft. The two domains behave independently. Although both KH domains and both DNA-binding sites were present in a single construct, neither the Gly-rich protein linker nor the uncontacted ssDNA was resolved.
In crystals of the tandem KH domains from human FMRP, there are also two molecules in the asymmetric unit related by NCS [17] (Protein Data Bank entry 2QND). Contacts between NCS-related β2-strands and, to a lesser extent, α1-helices bury 2100 Å² (Fig. 14A). This β-sheet augmentation is similar to that seen with apo-Nova-2 KH3 and PCBP2 KH1, but its interface consists primarily of β2–β2 rather than β1–β1 interactions (compare Figs 12 and 13 with Fig. 14B).

When the C2 operation is applied to the asymmetric unit, another interface forms between neighboring KH domains. This interface is mediated by symmetry-related α′-helices, as seen in crystals of RNA-bound Nova-2 KH3 (Figs 11C and 12B). It buries 1200 Å², significantly less than the interface observed in the asymmetric unit.
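Crystallographic and noncrystallographic two-folds are both 180° rotations about an axis. An exact two-fold applied twice returns every atom to its starting position, whereas a pseudo-two-fold such as the pseudo-222 NCS in 1DTJ superimposes the copies only approximately. The sketch below applies an ideal two-fold in Python. The function name and the axis, centre and atom coordinates are hypothetical. Real operators would be taken from the deposited coordinates, and crystallographic operators may also include a translation, which is omitted here.

```python
import math


def twofold(point, axis, center=(0.0, 0.0, 0.0)):
    """Apply a 180-degree rotation about the line through `center` along `axis`.

    For a unit axis u the rotation matrix is R = 2*u*u^T - I, so with
    d = x - c the image is x' = c + 2*(u . d)*u - d.
    """
    norm = math.sqrt(sum(a * a for a in axis))
    if norm == 0.0:
        raise ValueError("axis must be a non-zero vector")
    u = [a / norm for a in axis]
    d = [p - c for p, c in zip(point, center)]
    proj = sum(ui * di for ui, di in zip(u, d))
    return tuple(c + 2.0 * proj * ui - di for c, ui, di in zip(center, u, d))


# Hypothetical atom position and axis (not taken from 1DTJ or 2QND).
atom = (12.0, -3.5, 7.25)
axis, centre = (0.0, 1.0, 1.0), (5.0, 5.0, 5.0)
image = twofold(atom, axis, centre)
restored = twofold(image, axis, centre)
# A true two-fold is its own inverse: applying it twice returns the start point.
print(all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(atom, restored)))  # True
```

Measuring how far an NCS-related copy deviates from the ideal image is one way to see that a "pseudo" symmetry is only approximate.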
In summary, two interfaces are commonly observed in the crystals: (a) helix–helix packing between symmetry-related α′-helices with a ~50° packing angle, as seen in the Nova-2 KH3–RNA structure; and (b) β-sheet augmentation achieved by contacts between symmetry-related β1 or β2 strands, as seen in Nova-2 KH3, hFMRP (KH1–KH2D), and PCBP2 KH1.
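Interhelical packing angles like the one quoted in (a) follow Chothia et al. [45], who measure the dihedral angle between two helix axes about their line of closest approach. The sketch below computes that angle from axis points and directions that are assumed to have been fitted beforehand, for example from Cα coordinates (the fitting step is not shown). The function name is hypothetical. Its sign convention is also an assumption and may not match [45], so only the magnitude should be compared with the ~50° value.

```python
import math


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def packing_angle(p1, u, p2, v):
    """Dihedral angle (degrees) between helix axes p1 + s*u and p2 + t*v,
    viewed along their common perpendicular.

    The sign is taken from whether u x v points from axis 1 towards axis 2
    (an assumed convention).
    """
    n = _cross(u, v)
    cos_omega = _dot(u, v) / math.sqrt(_dot(u, u) * _dot(v, v))
    if _dot(n, n) < 1e-12:                 # parallel or antiparallel axes
        return 0.0 if cos_omega > 0 else 180.0
    omega = math.degrees(math.acos(max(-1.0, min(1.0, cos_omega))))
    separation = tuple(b - a for a, b in zip(p1, p2))
    # n is perpendicular to both axes, so n . (p2 - p1) has the same sign
    # whichever points on the two axes are chosen.
    return omega if _dot(n, separation) >= 0 else -omega


# Hypothetical axes: helix 1 along x, helix 2 rotated by 50 degrees, 10 Å above it.
theta = math.radians(50.0)
print(round(packing_angle((0, 0, 0), (1, 0, 0),
                          (0, 0, 10), (math.cos(theta), math.sin(theta), 0)), 1))  # 50.0
```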
Caution is advised in extrapolating from crystal structures to predict the solution oligomeric state of KH domains. Although several KH domains form dimers in the crystal, different crystallization constructs and conditions have in several cases given rise to different crystal forms. In these, the KH domain is either monomeric or makes different packing contacts. The crystal structure of the tandem KH1–KH2 domains from hFMRP contains a crystallographic dimer that buries more surface area than any previously described KH crystal dimer. Analytical ultracentrifugation measurements, however, clearly showed that the protein is monomeric in solution [17].

Fig. 10. Crystal structure of tandem type II KH domains of NusA in complex with RNA. The tandem KH1–KH2 domains of NusA recognize the RNA ligand 5′-GAACUCAAUAG. (A) The KH1–KH2 domains of NusA bound to cognate RNA ligand (Protein Data Bank entry 2ASB). The RNA–protein contact surface spans both domains; in particular, A45 makes contacts with residues in both KH1 and KH2. Additional polar contacts with 2′-hydroxyls specify RNA recognition. The KH1 and KH2 domains are shown separately in (B) and (C), respectively. Type II KH domains are connected differently. For example, the variable loop lies below and to the left of the binding cleft. Despite this different connectivity, the structural elements that form the binding cleft are the same as in type I domains, and the cleft likewise accommodates four nucleotides.

Fig. 11. Protein–protein interfaces in Nova-2 KH3 crystals. This figure is an adaptation of Figs 6 and 7 from Lewis et al. [44], using Protein Data Bank coordinates 1DTJ. (A) Contents of the asymmetric unit, with the two-fold NCS axis labeled. The tetrameric arrangement of molecules produces two protein–protein interfaces. (B) One protein–protein interface, generated by two-fold NCS. (C) The other protein–protein interfaces, also generated by two-fold NCS.
Biochemical studies
Git and Standart [46] investigated whether the four KH domains of the protein Vg1RBP can interact with each other noncovalently. The results were somewhat ambiguous. With dimethyl suberimidate as a crosslinking agent, dimers and higher-order oligomers formed in solution. Without the crosslinker, however, association of the KH domains was observed only in the presence of RNA.
Chen et al. [47] investigated the possibility of self-association of the protein Sam68, which contains a single KH domain. They showed that Sam68 self-associated in vivo [a c-myc-tagged Sam68 could coimmunoprecipitate an untagged Sam68, and Sam68–Sam68 gave a positive signal in a yeast two-hybrid (Y2H) assay]. However, the KH domain alone neither self-associated nor bound RNA.
Ramos et al. [48] investigated the potential for the KH3 domain of Nova-2 to self-associate in vitro by performing limited equilibrium ultracentrifugation experiments. From these they estimated that 10–20% dimer may be present, which would correspond to a dissociation constant of about 300 µM. The authors also reported a concentration-dependent increase in rotational correlation times, but these data were not analyzed quantitatively with respect to either size or dissociation constant.
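The estimate above converts a dimer fraction into a dissociation constant. For a monomer–dimer equilibrium 2M ⇌ D, K_d = [M]²/[D]. If a fraction f of the protomers is in dimers at total protomer concentration C_T, then [M] = (1 − f)C_T, [D] = fC_T/2 and K_d = 2(1 − f)²C_T/f. The text does not give the concentration at which the 10–20% figure applies. Reading "% dimer" as the fraction of protomers is also an assumption. The sketch below, with hypothetical function names and concentrations, simply makes the dependence on concentration explicit.

```python
import math


def kd_from_dimer_fraction(f_dimer: float, total_um: float) -> float:
    """Kd (uM) for 2 M <=> D from the fraction of protomers found in dimers.

    [M] = (1 - f) * C_T and [D] = f * C_T / 2, so Kd = [M]**2 / [D].
    """
    if not 0.0 < f_dimer < 1.0:
        raise ValueError("dimer fraction must lie strictly between 0 and 1")
    monomer = (1.0 - f_dimer) * total_um
    dimer = f_dimer * total_um / 2.0
    return monomer ** 2 / dimer


def dimer_fraction_from_kd(kd_um: float, total_um: float) -> float:
    """Fraction of protomers in dimers at total protomer concentration C_T.

    Solves C_T = [M] + 2 [M]**2 / Kd for the positive root [M].
    """
    monomer = (-kd_um + math.sqrt(kd_um ** 2 + 8.0 * kd_um * total_um)) / 4.0
    return 1.0 - monomer / total_um


# Hypothetical total concentrations (uM); the text does not state the one used.
for c_t in (30.0, 100.0, 1000.0):
    print(c_t, round(dimer_fraction_from_kd(300.0, c_t), 3))
```

For a fixed K_d, the dimer fraction rises with total concentration. A dimer percentage is therefore meaningful only together with the concentration at which it was measured.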
Kim et al. [49] investigated possible homoprotein and heteroprotein associations between hnRNPs. They used the Y2H assay to show that the full-length proteins formed specific homocomplexes and heterocomplexes. They then used Y2H assays to map which parts of these large proteins were involved in the associations. For hnRNP K, the N-terminal two-thirds of the protein, spanning KH1, KH2 and the junction between KH2 and KH3, was required for interactions with hnRNP E2, I, K and L. Deleting the junction sequence including the Pro-rich regions (but not the KH domain) abolished the protein–protein interactions. Conversely, the region spanning the junction sequence and the KH3 domain was not sufficient for the interaction. In other words, these results do not map the interacting region to the KH domains.

Fig. 13. Schematic representation of protein–protein interfaces in the structure of PCBP2. (A) was generated using Protein Data Bank coordinates 2AXY and is oriented looking down the NCS axis that generates the dimeric arrangement of molecules. The schematic in (B) shows the crystal contacts stabilizing the protein–protein interaction.

Fig. 12. Schematic representation of protein–protein surfaces in crystals of free and RNA-bound Nova-2. The schematics in (A) and (B) are based on the protein–protein interactions shown in Fig. 11B and C, respectively. Salient secondary structure elements are labeled. Cross-strand side-chain interactions are shown as open and closed circles.
Again, we caution against any model in which iso-
lated KH domains are proposed to form stable dimers
in solution. There are no data to support this hypothe-
sis. One cannot, however, exclude the possibility that a
full-length protein containing a KH domain(s) may
have a dimerization interface that includes KH-medi-
ated contacts.
Fragile X mental retardation syndrome –
devastating effects of a single point mutation
within a KH domain
Fragile X mental retardation syndrome is the most
common form of inherited mental impairment in
humans. For all fragile X individuals, the underlying
cause of the syndrome is lack of functional FMRP. In
the majority of cases, FMRP is not made because a CGG repeat expansion in the 5′-UTR of the gene encoding it is hypermethylated, causing both chromosomal fragility and transcriptional silencing [50].
FMRP is a putative RNA-binding protein with two tandem KH domains [3–5]. A particularly pernicious case of fragile X syndrome was identified in a boy who did not have the CGG expansion and who produced normal levels of FMRP. Instead, he carried a single mutation within the KH2 domain of FMRP: Ile304 was mutated to Asn (Ile304→Asn) [51,52].
Since the clinical consequences of the Ile304→Asn mutation were described, various efforts have been made to determine its effects on FMRP structure and function. Until recently, however, all have been inconclusive and even contradictory. The mutation has been proposed both to abrogate RNA binding and to have no effect on it, and both to unfold the KH domain completely and to leave protein structure unchanged. Ile304 itself has been proposed to be buried in the hydrophobic core, to be solvent-exposed, and to interact directly with RNA [21,53,54]. The lack of consensus can be attributed, at least in part, to extrapolating data from other KH domains to the KH1–KH2 domains of FMRP.
The structure of the tandem KH1–KH2 domains of hFMRP provided the first crystallographic description of the structural environment of the Ile304 residue [17]. It revealed that Ile304 lies in the main hydrophobic core of the KH2 domain. This core consists of buried hydrophobic residues from the hydrophobic faces of the β1- and β2-strands and, in part, of the α1- and α2-helices. Ile304 is completely solvent-inaccessible except for a single atom, Ile304-Cγ2, whose solvent accessibility is less than one-third that of an Ile-Cγ2 atom in an extended Gly-Ile-Gly chain [17]. Ile304 could make significant contacts with RNA only if substantial structural rearrangements occurred upon binding, which is not typical of KH domains (see Fig. 5A in [17]).
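The Ile304 analysis compares an atom's accessibility in the folded protein with its accessibility in an extended Gly-Ile-Gly reference. Below is a minimal sketch of that bookkeeping, assuming per-atom SASA values have already been computed with an external tool. The function names, the threshold parameter and all numbers are hypothetical and are not the values reported in [17].

```python
def relative_accessibility(atom_sasa, reference_sasa):
    """Per-atom accessibility in the folded protein divided by the accessibility
    of the same atom in an extended Gly-X-Gly reference peptide."""
    result = {}
    for atom, area in atom_sasa.items():
        ref = reference_sasa[atom]
        if ref <= 0.0:
            raise ValueError(f"reference accessibility for {atom} must be positive")
        result[atom] = area / ref
    return result


def exposed_atoms(relative, threshold=0.0):
    """Names of atoms whose relative accessibility exceeds `threshold`."""
    return sorted(atom for atom, rel in relative.items() if rel > threshold)


# Hypothetical per-atom SASA values (Å^2) for an Ile side chain.
folded = {"CB": 0.0, "CG1": 0.0, "CG2": 3.0, "CD1": 0.0}
gly_ile_gly = {"CB": 10.0, "CG1": 20.0, "CG2": 30.0, "CD1": 40.0}
rel = relative_accessibility(folded, gly_ile_gly)
print(exposed_atoms(rel))          # ['CG2']
print(rel["CG2"] < 1.0 / 3.0)      # True
```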
If a polar Asn residue were substituted for Ile at position 304, one would expect the integrity of the hydrophobic core to be perturbed. Such a structural perturbation is indeed observed: the KH1–KH2 domains of hFMRP carrying the Ile304→Asn mutation show substantial structural changes and decreased stability.
Even within the same KH domain family, however, the Ile304→Asn-equivalent mutation can have different effects on protein structure, ranging from modest to significant perturbation, despite the predicted similar local environment of Ile304. For example, higher vertebrates have two autosomal paralogs, fragile X-related protein (FXRP) 1 and FXRP2, whose expression patterns and domain organization are similar to those of FMRP [55]. The main hydrophobic core of the KH2 domain of human FMRP consists of residues, including Ile304, that are identical in FXRP1 and FXRP2 [17]. It follows that introducing the equivalent Ile304→Asn mutation into FXRP1 KH1–KH2 should cause loss of secondary structure. CD spectroscopy confirms that it does (data not shown). Amphibians and arthropods have a single FMRP-like ortholog, of which Drosophila FXRP (dFXRP) has been most studied [56,57]. dFXRP is 46.9% identical to hFMRP in the KH1–KH2 region, yet the equivalent Ile307→Asn mutation has relatively modest effects on protein secondary structure [58].

Fig. 14. Dimeric arrangement of hFMRP (KH1–KH2D) molecules. The crystal of hFMRP (KH1–KH2D) contains two copies in the asymmetric unit, related by two-fold NCS (Protein Data Bank coordinates 2QND). The strands of one chain are represented as open arrows, and the symmetry-related strands are shaded. Hydrophobic and polar side-chains are shown as closed and open circles, respectively. This orientation creates an augmented β-sheet composed of six antiparallel strands. This arrangement of KH molecules buries a total surface area of 2100 Å², with cross-strand side-chain interactions.
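Sequence-identity figures such as the 46.9% quoted for dFXRP and hFMRP depend on the alignment and on the chosen denominator: aligned positions, alignment length, or the length of one sequence. The text does not state which convention was used. The sketch below counts identities over ungapped aligned columns, which is one common convention, for two sequences assumed to be pre-aligned. The function name and example strings are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Only columns in which neither sequence has a gap are counted.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b)
               if a != gap and b != gap]
    if not columns:
        raise ValueError("no ungapped columns to compare")
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


print(percent_identity("ACDEFG", "ACDQF-"))  # 80.0
```

Dividing by the full alignment length (six columns here) would instead give about 66.7% for the same pair, which shows why the convention matters when comparing reported values.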
The Ile304→Asn mutation has also been studied in the context of other proteins containing tandem KH domains. Drosophila PSI has four tandem KH domains, PSI KH1–KH4, that bind pre-mRNA cooperatively. As with dFXRP, introducing the Ile304→Asn-equivalent mutation into each KH domain has relatively subtle effects on secondary structure [25].
Leu28 in the KH3 domain of the protein Nova-2 is structurally equivalent to Ile304 in FMRP. Lewis et al. [44] found that the Ile304→Asn-equivalent mutation perturbs the structure of Nova-2 KH3, and that the Ile304→Asn mutation would destabilize the hydrophobic core of the KH2 domain of FMRP. This group subsequently reported that introducing an isostructural Asn in place of Leu28 would alter the electrostatic properties of a hydrophobic platform that stabilizes the RNA ligand on the protein, without changing the hydrophobic interior of the domain [21]. The Cα backbones of the free and RNA-bound Nova-2 KH3 structures are essentially the same (compare Protein Data Bank entries 1DTJ and 1EC6) [27]. This indicates that the protein backbone does not move appreciably in the presence of the RNA ligand. The main hydrophobic core of Nova-2 KH3 consists of residues that are similar, but not identical, to those in the main core of the fragile X KH domains. Analysis of the RNA-bound structure of Nova-2 KH3 shows that all atoms of Leu28 are buried except Cβ, Cγ and Cδ1. Their combined change in solvent accessibility upon RNA binding is < 1% of the total surface area buried when RNA binds. A Leu28→Asn mutation would therefore be more likely to affect the hydrophobic core of the protein.
We caution that the effects of mutating Ile304 differ between contexts. For example, the Ile304→Asn-equivalent mutation unfolds the first KH domain of FMRP but has smaller structural effects on the Drosophila proteins FXRP [58] and PSI [25].
Conclusions

The nucleic acid-binding activity of KH domains is central to many cellular processes. Nucleic acid recognition by KH domains is distinctive. RNA recognition motifs recognize RNAs of diverse lengths, whereas the KH binding cleft, though versatile, accommodates only four nucleic acid bases.
When more specificity is required, beyond that pos-
sible with a single KH domain, an augmented recogni-
tion surface may be achieved either by multiple
tandem KH domains or by including neighboring
structural motifs. KH domains are well-tuned motifs
that balance functional diversity and specificity, and
are thus widely utilized in biology.
References
1 Siomi H, Matunis MJ, Michael WM & Dreyfuss G
(1993) The pre-mRNA binding K protein contains a
novel evolutionarily conserved motif. Nucleic Acids Res
21, 1193–1198.
2 Grishin NV (2001) KH domain: one motif, two folds.
Nucleic Acids Res 29, 638–643.
3 Siomi H, Siomi MC, Nussbaum RL & Dreyfuss G
(1993) The protein product of the fragile X gene,
FMR1, has characteristics of an RNA-binding protein.
Cell 74, 291–298.
4 Ashley CT, Wilkinson KD, Reines D & Warren ST
(1993) FMR1 protein: conserved RNP family domains
and selective RNA binding. Science 262, 563–566.
5 O’Donnell WT & Warren ST (2002) A decade of molec-
ular studies of fragile X syndrome. Annu Rev Neurosci
25, 315–328.

6 Ostareck LA & Ostareck DH (2004) Control of mRNA
translation and stability in haematopoietic cells: the
function of hnRNPs K and E1 ⁄ E2. Biol Cell 96, 407–
411.
7 McKnight GL, Reasoner J, Gilbert T, Sundquist KO,
Hokland B, McKernan PA, Champagne J, Johnson CJ,
Bailey MC, Holly R et al. (1992) Cloning and expres-
sion of a cellular high density lipoprotein-binding pro-
tein that is up-regulated by cholesterol loading of cells.
J Biol Chem 267, 12131–12141.
8 Currie JR & Brown T (1999) KH domain-containing
proteins of yeast: absence of a fragile X gene homo-
logue. Am J Med Genet 84, 272–276.
9 Spingola M, Armisen J & Ares MJ (2004) Mer1p is a
modular splicing factor whose function depends on the
conserved U2 snRNP Snu17p. Nucleic Acids Res 32,
1242–1250.
10 Lukong KE & Richard S (2003) Sam68, the KH
domain-containing superSTAR. Biochim Biophys Acta
1653, 73–86.
11 Duncan R, Bazar L, Michelotti G, Tomonaga T, Kru-
tzsch H, Avigan M & Levens D (1994) A sequence-spe-
cific, single-stranded binding protein activates the far
upstream element of c-myc and defines a new DNA-
binding motif. Genes Dev 8, 465–480.
12 Braddock DT, Louis JM, Baber JL, Levens D & Clore GM (2002) Structure and dynamics of KH domains from FBP bound to single-stranded DNA. Nature 415, 1051–1056.
13 Gibson TJ, Thompson JD & Heringa J (1993) The KH domain occurs in a diverse set of RNA-binding proteins that include the antiterminator NusA and is probably involved in binding nucleic acid. FEBS Lett 324, 361–366.
14 Gopal B, Haire LF, Gamblin SJ, Dodson EJ, Lane AN, Papavinasasundaram KG, Colston MJ & Dodson G (2001) Crystal structure of the transcription elongation/anti-termination factor NusA from Mycobacterium tuberculosis at 1.7 Å resolution. J Mol Biol 314, 1087–1095.
15 Beuth B, Pennell S, Arnvig KB, Martin SR & Taylor
IA (2005) Structure of a Mycobacterium tuberculosis
NusA-RNA complex. EMBO J 24, 3576–3587.
16 Brykailo MA, Corbett AH & Fridovich-Keil JL (2007) Functional overlap between conserved and diverged KH domains in Saccharomyces cerevisiae SCP160. Nucleic Acids Res 35, 1108–1118.
17 Valverde R, Pozdnyakova I, Kajander T & Regan L
(2007) Fragile X mental retardation: the structure of the
KH1–KH2 domains of fragile X mental retardation
protein. Structure 15, 1090–1098.
18 Collaborative Computational Project Number 4 (1994)
The CCP4 Suite: programs for protein crystallography.
Acta Crystallogr D 50, 760–763.
19 Nagai K (1996) Protein–RNA complexes. Curr Opin
Struct Biol 6, 53–61.
20 Stefl R, Skrisovska L & Allain FH (2005) RNA sequence- and shape-dependent recognition by proteins in the ribonucleoprotein particle. EMBO Rep 6, 33–38.
21 Lewis HA, Musunuru K, Jensen KB, Edo C, Chen H,
Darnell RB & Burley SK (2000) Sequence-specific RNA
binding by a Nova KH domain: implications for para-
neoplastic disease and the fragile X syndrome. Cell 100,
323–332.
22 Liu Z, Luyten I, Bottomley MJ, Messias AC, Houngninou-Molango S, Sprangers R, Zanier K, Krämer A & Sattler M (2001) Structural basis of recognition of the intron branch site RNA by splicing factor 1. Science 294, 1098–1101.
23 Braddock DT, Baber JL, Levens D & Clore GM (2002) Molecular basis of sequence-specific single-stranded DNA recognition by KH domains: solution structure of a complex between hnRNP K KH3 and single-stranded DNA. EMBO J 21, 3476–3485.
24 Lunde BM, Moore C & Varani G (2007) RNA-binding
proteins: modular design for efficient function. Nat Rev
Mol Cell Biol 8, 479–490.
25 Chmiel NH, Rio DC & Doudna JA (2006) Distinct
contributions of KH domains to substrate binding affin-
ity of Drosophila P-element somatic inhibitor protein.
RNA 12, 283–291.
26 Garcia-Mayoral MF, Hollingworth D, Masino L, Diaz-Moreno I, Kelly G, Gherzi R, Chou CF, Chen CY & Ramos A (2007) The structure of the C-terminal KH domains of KSRP reveals a noncanonical motif important for mRNA degradation. Structure 15, 485–498.
27 Jones TA, Zou JY, Cowan SW & Kjeldgaard M (1991) Improved methods for building protein models in electron density maps and the location of errors in these models. Acta Crystallogr A 47, 110–119.
28 Du Z, Lee JK, Tjhen R, Li S, Pan H, Stroud RM & James TL (2005) Crystal structure of the first KH domain of human poly(C)-binding protein-2 in complex with a C-rich strand of human telomeric DNA at 1.7 Å. J Biol Chem 280, 38823–38829.
29 Senes A, Ubarretxena-Belandia I & Engelman DM
(2001) The Calpha–H…O hydrogen bond: a determi-
nant of stability and specificity in transmembrane helix
interactions. Proc Natl Acad Sci 98, 9056–9061.
30 Vargas R, Garza J, Dixon DA & Hay BP (2000) How strong is the Cα–H···O=C hydrogen bond? J Am Chem Soc 122, 4750–4755.
31 Makeyev AV & Liebhaber SA (2002) The poly(C)-bind-
ing proteins: a multiplicity of functions and search for
mechanisms. RNA 8, 265–278.
32 Gamarnik AV & Andino R (2000) Interactions of viral protein 3CD and poly(rC) binding protein with the 5′ untranslated region of the poliovirus genome. J Virol 74, 2219–2226.
33 Du Z, Lee JK, Fenn S, Tjhen R, Stroud RM & James
TL (2007) X-ray crystallographic and NMR studies of
protein–protein and protein–nucleic acid interactions
involving the KH domains from human poly(C)-binding
protein-2. RNA 13, 1043–1051.
34 Fenn S, Du Z, Lee JK, Tjhen R, Stroud RM & James TL (2007) Crystal structure of the third KH domain of human poly(C)-binding protein-2 in complex with a C-rich strand of human telomeric DNA at 1.6 Å resolution. Nucleic Acids Res 35, 2651–2660.
35 Berglund JA, Chua K, Abovich N, Reed R & Rosbash
M (1997) The splicing factor BBP interacts specifically
with the pre-mRNA branchpoint sequence UACUAAC.
Cell 89, 781–787.
36 Vernet C & Artzt K (1997) STAR, a gene family involved in signal transduction and activation of RNA. Trends Genet 13, 479–484.
37 Tuerk C & Gold L (1990) Systematic evolution of
ligands by exponential enrichment: RNA ligands to bac-
teriophage T4 DNA polymerase. Science 249, 505–510.
38 Michelotti GA, Michelotti EF, Pullner A, Duncan RC,
Eick D & Levens D (1996) Multiple single-stranded cis
elements are associated with activated chromatin of
human c-myc gene in vivo. Mol Cell Biol 16, 2656–
2669.
39 Gibson TJ, Thompson JD & Heringa J (1993) The KH domain occurs in a diverse set of RNA-binding proteins that include the antiterminator NusA and is probably involved in binding to nucleic acid. FEBS Lett 324, 361–366.
40 Linn T & Greenblatt J (1992) The NusA and NusG proteins of Escherichia coli increase the in vitro readthrough frequency of a transcriptional attenuator preceding the gene for the beta subunit of RNA polymerase. J Biol Chem 267, 1449–1454.
41 Borukhov S, Lee J & Laptenko O (2005) Bacterial tran-
scription elongation factors: new insights into molecular
mechanism of action. Mol Microbiol 55, 1315–1324.
42 Merkel JS, Sturtevant JM & Regan L (1999) Sidechain
interactions in parallel beta sheets: the energetics of
cross-strand pairings. Structure 7, 1333–1341.
43 Lee B & Richards FM (1971) The interpretation of pro-
tein structures: estimation of static accessibility. J Mol
Biol 55, 379–400.
44 Lewis HA, Chen H, Edo C, Buckanovich RJ, Yang YY,
Musunuru K, Zhong R, Darnell RB & Burley SK (1999)
Crystal structures of Nova-1 and Nova-2 K-homology
RNA-binding domains. Structure 7, 191–203.
45 Chothia C, Levitt M & Richardson D (1981) Helix to helix packing in proteins. J Mol Biol 145, 215–250.
46 Git A & Standart N (2002) The KH domains of Xeno-
pus Vg1RBP mediate RNA binding and self associa-
tion. RNA 8, 1319–1333.
47 Chen T, Damaj BB, Herrera C, Lasko P & Richard S
(1997) Self-association of the single-KH-domain family
members Sam68, GRP33, GLD-1, and Qk1: role of the
KH domain. Mol Cell Biol 17, 5707–5718.
48 Ramos A, Hollingworth D, Major SA, Adinolfi S,
Kelly G, Muskett FW & Pastore A (2002) Role of
dimerization in KH ⁄ RNA complexes: the example of
Nova KH3. Biochemistry 41, 4193–4203.
49 Kim JH, Hahm B, Kim YK, Choi M & Jang SK (2000) Protein–protein interaction among hnRNPs shuttling between nucleus and cytoplasm. J Mol Biol 298, 395–405.
50 Jin P & Warren ST (2000) Understanding the molecular basis of fragile X syndrome. Hum Mol Genet 9, 901–908.
51 De Boulle K, Verkerk AJ, Reyniers E, Vits L, Hendrickx J, Van Roy B, van den Bos F, de Graaff E, Oostra BA & Willems PJ (1993) A point mutation in the FMR-1 gene associated with fragile X mental retardation. Nat Genet 3, 31–35.
52 Feng Y, Absher D, Eberhart D, Brown V, Malter H & Warren S (1997) FMRP associates with polyribosomes as an mRNP, and the I304N mutation of severe fragile X syndrome abolishes this association. Mol Cell 1, 109–118.
53 Musco G, Kharrat A, Stier G, Fraternali F, Gibson TJ,
Nilges M & Pastore A (1997) The solution structure of
the first KH domain of FMR1, the protein responsible
for the fragile X syndrome. Nat Struct Biol 4, 712–716.
54 Ramos A, Hollingworth D & Pastore A (2003) The role
of a clinically important mutation in the fold and
RNA-binding properties of KH motifs. RNA 9, 293–
298.
55 Tamanini F, Willemsen R, van Unen L, Bontekoe C,
Galjaard H, Oostra BA & Hoogeveen AT (1997) Differ-
ential expression of FMR1, FXR1 and FXR2 proteins
in human brain and testis. Hum Mol Genet 6, 1315–
1322.

56 Wan L, Dockendorff TC, Jongens TA & Dreyfuss G
(2000) Characterization of dFMR1, a Drosophila mela-
nogaster homolog of the fragile X mental retardation
protein. Mol Cell Biol 20, 8536–8547.
57 Zarnescu DC, Shan G, Warren ST & Jin P (2005)
Come FLY with us: toward understanding fragile X
syndrome. Genes Brain Behav 4, 385–392.
58 Pozdnyakova I & Regan L (2005) New insights into fragile X syndrome. Relating genotype to phenotype at the molecular level. FEBS J 272, 872–878.