Classification of the short-chain dehydrogenase ⁄ reductase superfamily using hidden Markov models
Yvonne Kallberg¹,², Udo Oppermann³ and Bengt Persson¹,²,⁴
1 IFM Bioinformatics, Linköping University, Sweden
2 Department of Cell and Molecular Biology (CMB), Karolinska Institutet, Stockholm, Sweden
3 Structural Genomics Consortium, The Botnar Research Centre, NIHR Biomedical Research Unit, University of Oxford, UK
4 National Supercomputer Centre (NSC) and Swedish E-science Research Centre (SERC), Linköping University, Sweden
Introduction
The short-chain dehydrogenase ⁄ reductase (SDR)
superfamily, recently reviewed in [1], consists of
NAD(P)(H)-dependent oxidoreductases that are distinct
from the medium-chain dehydrogenase and aldo–keto
reductase (AKR) superfamilies. The term SDR was
coined in 1991 [2], and the enzyme family has been
shown to be present in all domains of life, from primi-
tive bacteria to higher eukaryotes. Interestingly, about
25% of all identified dehydrogenases belong to the
SDR superfamily [3]. Furthermore, in the ocean
sequence sampling by Venter et al. [4], this superfamily
was found to be the largest, with over 60 000 non-redundant
sequences (over 30 000 of the ‘classical’ type
and close to 30 000 of the ‘extended’ type).
The SDR superfamily currently has more than
47 000 primary structures available in sequence data-
bases and over 300 crystal structures deposited in the
Protein Data Bank. They show early divergence, the
majority of family members having only low pairwise
sequence identity (typically 20–30%), but have several
properties in common, described in [1,2]. The three-
dimensional structures are clearly homologous with a
single-domain globular Rossmann-related fold consisting
of a β-sheet sandwiched between three α-helices on
each side. The active site is formed by a triad ⁄ tetrad
with highly conserved Tyr, Lys, Ser (and Asn) residues
[1,5]. Substrate binding occurs in a cleft close to the
coenzyme-binding site. This cleft shows considerable
variation between the individual SDR forms, explaining
the wide substrate spectrum of this enzyme superfamily.

Keywords
bioinformatics; classification; genomes; hidden Markov model; short-chain dehydrogenases ⁄ reductase
Correspondence
B. Persson, IFM Bioinformatics, Linköping University, S-581 83 Linköping, Sweden
Fax: +46 13 137 568
Tel. +46 13 282 983
E-mail:

(Received 23 August 2009, revised 12 February 2010, accepted 16 March 2010)
doi:10.1111/j.1742-4658.2010.07656.x
The short-chain dehydrogenase ⁄ reductase (SDR) superfamily now has over
47 000 members, most of which are distantly related, with typically 20–30%
residue identity in pairwise comparisons, making it difficult to obtain
an overview of this superfamily. We have therefore developed a family
classification system, based upon hidden Markov models (HMMs). To this
end, we have identified 314 SDR families, encompassing about 31 900
members. In addition, about 9700 SDR forms belong to families with too
few members at present to establish valid HMMs. In the human genome,
we find 47 SDR families, corresponding to 82 genes. Thirteen families are
present in all three domains (Eukaryota, Bacteria, and Archaea), and are
hence expected to catalyze fundamental metabolic processes. The majority
of these enzymes are of the ‘extended’ type, in agreement with earlier
findings. About half of the SDR families are only found among bacteria, where
the ‘classical’ SDR type is most prominent. The HMM-based classification
is used as a basis for a sustainable and expandable nomenclature system.
Abbreviations
AKR, aldo–keto reductase; HMM, hidden Markov model; SDR, short-chain dehydrogenase ⁄ reductase; 17β-HSD, 17β-hydroxysteroid dehydrogenase.
FEBS Journal 277 (2010) 2375–2386 © 2010 The Authors Journal compilation © 2010 FEBS

In humans, there are 202 SDR forms, corresponding
to at least 82 SDR genes; they have important func-
tions in steroid hormone, prostaglandin and retinoid
metabolism, and hence signaling. They also play crucial
roles in the metabolism of xenobiotics, including drugs and
carcinogens. A growing number of single-nucleotide
polymorphisms have been assigned to SDR
genes. Of the 77 human SDRs that are listed in the
well-annotated database Swiss-Prot, 24 enzymes are
associated with diseases in the OMIM database
(Online Mendelian Inheritance in Man). Thus, many
(or even all) of these enzymes are medically important.
However, the functions of about half of the human
SDR enzymes are unknown.
The SDR superfamily has grown by several orders
of magnitude, from 20-odd members in 1991 [2] to
over 47 000 today, and thanks to the fast progress in
genome and environmental sequencing projects, the
number of known SDR forms can be expected to
increase even more in the future. This substantiates the
need for a subdivision into families to achieve a sys-
tematic overview and allow for annotations and for
functional conclusions. In this article, we apply hidden
Markov models (HMMs) to obtain a sequence-based
subdivision of the SDR superfamily that allows for
automatic classification of novel sequence data and
provides the basis for a nomenclature system.
Results and Discussion
The large SDR superfamily is of ancient origin, with
most members being equidistantly related at the 20–
30% residue identity level. Consequently, there are no
natural hierarchical relationships to rely on for the
functional assignments. HMMs have successfully been
used in protein family characterization [6], and their
use is now standard when new sequences are being
annotated. They are used in our functional categorization
of all SDR members, where each SDR family corresponds
to an HMM, and this set of resulting HMMs
forms the basis for a sustainable nomenclature scheme
for the whole SDR superfamily.
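
As a rough illustration of how a one-HMM-per-family scheme can be used to classify a novel sequence, the sketch below picks the family whose model gives the highest score, provided that score reaches a cutoff. The function name `assign_family`, the cutoff value and the example scores are assumptions made for illustration only; the text does not specify how scores are thresholded, and in practice the scores would come from an HMM search tool.

```python
from typing import Dict, Optional


def assign_family(scores: Dict[str, float], min_score: float) -> Optional[str]:
    """Return the family whose HMM scores highest, or None if no score
    reaches the cutoff (hypothetical helper, not the authors' procedure)."""
    if not scores:
        return None
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_score else None


# Made-up scores of one query sequence against three family HMMs.
example_scores = {"SDR1E": 212.4, "SDR3E": 35.0, "SDR9C": 12.1}
print(assign_family(example_scores, min_score=50.0))
```

A sequence that scores below the cutoff against every family model would stay unclassified, which mirrors the situation of the SDRs that are not yet covered by any family.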
So far, 314 families have been defined, covering
about 31 900 of the ~47 000 retrieved SDRs. Approximately
9700 SDRs form clusters that are too small
(fewer than 20 members with at most 80% pairwise sequence
identity) for them to be reliably identified with an
HMM, but these clusters will hopefully grow as new
genomes are sequenced. The remaining SDRs
(~6800) await further investigation.
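
The size criterion (at least 20 members that are at most 80% identical to each other) can be sketched as a greedy redundancy filter followed by a count. This is a minimal sketch under stated assumptions: the greedy selection order, the helper names and the toy identity function (which expects equal-length, pre-aligned sequences) are illustrative and are not taken from the paper.

```python
from typing import Callable, List, Sequence


def nonredundant(members: Sequence[str],
                 identity: Callable[[str, str], float],
                 max_identity: float = 0.80) -> List[str]:
    """Greedily keep members that are at most max_identity identical
    to every member kept so far."""
    kept: List[str] = []
    for seq in members:
        if all(identity(seq, other) <= max_identity for other in kept):
            kept.append(seq)
    return kept


def large_enough(members: Sequence[str],
                 identity: Callable[[str, str], float],
                 min_members: int = 20,
                 max_identity: float = 0.80) -> bool:
    """True if the cluster has enough sufficiently distinct members."""
    return len(nonredundant(members, identity, max_identity)) >= min_members


def toy_identity(a: str, b: str) -> float:
    """Fraction of identical positions; assumes equal-length, aligned input."""
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

With real data, the identity function would be replaced by a proper pairwise alignment, and the result of a greedy filter can depend on the order in which members are visited.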
Family size
The numbers of members in the different families
identified vary considerably, but the majority of the
families are quite small. Over half of the SDR fami-
lies have fewer than 60 members, and 77% of the fam-
ilies have fewer than 100 members (Fig. 1). Large
families are rare; there are only 16 families with 400 or
more members (Table 1). They are primarily of the
‘extended’ SDR type (nine families), and several of
these members metabolize carbohydrate derivatives, a
basic function common to most life forms. Two such
families are the GDP-mannose-4,6-dehydratases
(SDR3E in the nomenclature system) and the GDP-L-fucose
synthetases (SDR4E), which are involved
in the two-step conversion of GDP-mannose to
GDP-L-fucose. The latter is a substrate for several
fucosyltransferases, which in turn are involved in the
expression of many glycoconjugates [7,8]. Another example
is provided by the UDP-glucose-4-epimerases
(SDR1E), which catalyze the third and final step in the
Leloir pathway of galactose metabolism, interconvert-
ing UDP-galactose and UDP-glucose. Impairment of
this enzyme reaction leads to epimerase-deficiency
galactosemia [9], which can lead to, for example,
mental retardation or cataract in humans.
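
The family designations used in this article (for example SDR1E, SDR3E and SDR4E, and member-level names such as SDR41C1) appear to follow the pattern "SDR", a family number, a type letter and an optional member number. The parser below is a sketch based only on the designations that occur in the text; the regular expression and the names `Designation` and `parse_designation` are assumptions, and only the type letters explained in the text (C, E and I) are mapped to names.

```python
import re
from typing import NamedTuple, Optional

# Only the type letters explained in the text are given names here.
TYPE_NAMES = {"C": "classical", "E": "extended", "I": "intermediate"}


class Designation(NamedTuple):
    family_number: int
    type_letter: str
    member: Optional[int]


_PATTERN = re.compile(r"^SDR(\d+)([A-Z])(\d+)?$")


def parse_designation(text: str) -> Designation:
    """Split a designation such as 'SDR109I' or 'SDR41C1' into its parts."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not an SDR designation: {text!r}")
    number, letter, member = match.groups()
    return Designation(int(number), letter, int(member) if member else None)


parsed = parse_designation("SDR3E")
print(parsed, TYPE_NAMES.get(parsed.type_letter, "other"))
```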
In 16th place among the largest families are the
insect alcohol dehydrogenases (SDR109I). These form
a very specialized group of enzymes, also called the
‘intermediate’ SDR type [10]. This seems to be unique
to insects, and the size of this family is due to the very
well-studied genomes of fruit flies. The 78 different
Drosophila genomes alone account for more than half
of the members (228 of 404).

Fig. 1. SDR family sizes. The bars represent the number of SDR
families of defined family sizes (x-axis: members per family, in bins
20–39, 40–59, 60–99, 100–199, 200–299, 300–499 and 500 or more;
y-axis: number of families, 0–120). The most common family size is
between 20 and 39 members.
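
The family-size bins shown in Fig. 1 can be reproduced from a list of family sizes with a few lines of code. The sketch below is illustrative: the bin boundaries are those of the figure axis, while the function name and the ASCII bin labels are assumptions.

```python
from bisect import bisect_right
from collections import Counter
from typing import Iterable, List, Tuple

# Lower bounds and labels of the size bins used on the Fig. 1 axis.
BIN_STARTS = [20, 40, 60, 100, 200, 300, 500]
BIN_LABELS = ["20-39", "40-59", "60-99", "100-199", "200-299", "300-499", "500+"]


def size_histogram(sizes: Iterable[int]) -> List[Tuple[str, int]]:
    """Count how many families fall into each size bin."""
    counts: Counter = Counter()
    for size in sizes:
        index = bisect_right(BIN_STARTS, size) - 1
        if index < 0:
            raise ValueError(f"family size {size} is below the smallest bin")
        counts[BIN_LABELS[index]] += 1
    return [(label, counts[label]) for label in BIN_LABELS]


# Four family sizes taken from the tables in this article.
for label, count in size_histogram([1444, 404, 95, 38]):
    print(label, count)
```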
Human SDR members
Of the 314 families identified, 37 have human mem-
bers. Ten additional human SDR families have been
identified, but these have too few members (< 20
with < 80% pairwise residue identity) to be suitable
for HMM analysis at present. In total, the 47 fami-
lies represent 82 SDR genes (Table 2). In Table 3,
the numbers of genes for each SDR family in
human, mouse and rat are compared. For most fam-
ilies, the numbers are identical for the three species.
Of the 13 families with two or more genes in
humans, nine have at least two genes also in mouse
and ⁄ or rat.
Regarding family size in relation to number of
human members, one would imagine a linear correla-
tion: the larger the size, the more human members.
However, typically, the SDR families with more than
400 members have only a single human representative
and, in total, as many as 34 of the 47 families have
only one human member. There are four families, with
retinol and steroid dehydrogenases, that stand out:
SDR7C, SDR11E and SDR16C, with their six human
members each, and SDR9C, with as many as eight
human members. This observation emphasizes the
critical importance of enzymatic control of retinoid
and steroid metabolism in development as well as
metabolic and homeostatic signaling [11]. In this
context, control of ligand access, by oxidoreductases
such as the above-mentioned SDRs, to nuclear hor-
mone receptors such as retinoid or steroid receptors
appears to constitute an important determinant of ste-
roid, retinoid and lipid signaling, and seems to necessi-
tate the existence of such diversified enzyme forms to
maintain and execute proper functions in multicellular
eukaryotes and mammals.
Species distribution
A closer look at the distribution among the classified
SDRs in the domains Eukaryota, Bacteria and Archaea
(Fig. 2) reveals that more than half of the families are
unique to bacteria (178 of 314). The two largest of
these, SDR56X with 557 members and SDR61X with
245 members, have multidomain enzymes in the form
of polyketide synthases. They typically have two
NADP(H)-binding domains, and the two SDR families
cover one domain each.
One-fifth (63) of the families have members among
both bacteria and eukaryotes but not among archaea.
Archaeal SDRs are quite rare, only 32 families
having any such member (Fig. 2). There are three families
with much higher proportions of archaeal members
than are generally found. Two of them comprise
extended SDRs (SDR136E UDP-glucose-4-epimerase
and SDR144E UDP-glucose homolog), and the third
family contains a classical SDR (SDR146C 3-oxoacyl
reductase). No SDR family is unique to archaea.
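
The domain distribution described in this section amounts to counting, for each family, the combination of domains in which it has members. The sketch below shows one way to do this and checks that the counts reported in the text and in Fig. 2 add up to the 314 families. The function name and the `hypothetical_family` entry are assumptions added for illustration.

```python
from collections import Counter
from typing import Dict, Iterable


def domain_combinations(family_domains: Dict[str, Iterable[str]]) -> Counter:
    """Count families per combination of domains (E, B, A).

    Keys are the sorted domain letters, e.g. 'B', 'BE' or 'ABE'."""
    counts: Counter = Counter()
    for domains in family_domains.values():
        counts["".join(sorted(set(domains)))] += 1
    return counts


toy = {
    "SDR1E": "EBA",              # present in all three domains
    "SDR56X": "B",               # bacteria only
    "SDR109I": "E",              # eukaryotes (insects) only
    "hypothetical_family": "BA",  # invented entry for illustration
}
print(domain_combinations(toy))

# Counts reported in the text and in Fig. 2.
reported = {"B": 178, "E": 41, "BE": 63, "AB": 18, "ABE": 14}
print(sum(reported.values()))
```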
Table 1. The 16 largest SDR families. The largest families, with currently more than 400 members, are listed. In the domain columns, the
letters E, B and A denote eukaryotic, bacterial and archaeal genomes, respectively.
Family name  Family designation  Family size  Domain (E B A)
Acetoacetyl-CoA reductase  SDR152C  1444  x x x
UDP-glucose-4-epimerase  SDR1E  1273  x x x
dTDP-D-glucose-4,6-dehydratase  SDR2E  1109  x x x
Enoyl-(acyl-carrier-protein) reductase  SDR54D  691  x x
dTDP-4-rhamnose reductase  SDR134E  656  x x
GDP-mannose-4,6 dehydratase  SDR3E  642  x x x
Capsular polysaccharide biosynthesis protein  SDR55E  558  x x x
Modular polyketide synthase, KR domain  SDR56X  557  x
Gluconate-5-dehydrogenase  SDR49C  509  x x x
Glucose and ribitol dehydrogenase  SDR57C  492  x x
GDP-L-fucose synthetase  SDR4E  469  x x x
Dihydroflavonol-4-reductase  SDR108E  456  x
UDP-glucuronic acid decarboxylase  SDR6E  424  x x x
UDP-glucuronate-4-epimerase 4  SDR50E  423  x x x
3-Hydroxyacyl-CoA dehydrogenase type II  SDR5C  411  x x
Drosophila alcohol dehydrogenase  SDR109I  404  x
Table 2. Human SDR members. The 47 SDR families with human representatives are listed, together with data about family size, domain occurrence, human entries in UniProt, and EC
number. The domain designations are: A, archaeal; B, bacterial; E, eukaryotic. The eukaryotic subdivisions are: M, mammal; I, insect; P, plant; O, other. Accession number, identifier and
description are extracted from UniProtKB.
Family designation  Family size  Domain (A, B, E: M, I, P, O)  Accession number  Identifier  Description  EC number
SDR1E  1273  X X X X X X  Q14376  GALE_HUMAN  UDP-Glucose-4-epimerase  5.1.3.2
SDR2E  1109  X X X X X  O95455  TGDS_HUMAN  dTDP-D-glucose-4,6-dehydratase  4.2.1.46
SDR3E  642  X X X X X X  O60547  GMDS_HUMAN  GDP-mannose-4,6-dehydratase  4.2.1.47
SDR4E  469  X X X X X X  Q13630  FCL_HUMAN  GDP-L-fucose synthetase  1.1.1.271
SDR5C  411  X X X X  Q99714  HCD2_HUMAN  3-Hydroxyacyl-CoA DH type 2  1.1.1.35
SDR6E  424  X X X X X X  Q8NBZ7  UXS1_HUMAN  UDP-glucuronic acid decarboxylase 1  4.1.1.35
SDR7C  262  X X X X X  Q8TC12  RDH11_HUMAN  Retinol dehydrogenase 11  1.1.1
    Q96NR8  RDH12_HUMAN  Retinol dehydrogenase 12  1.1.1
    Q8NBN7  RDH13_HUMAN  Retinol dehydrogenase 13  1.1.1
    Q9HBH5  RDH14_HUMAN  Retinol dehydrogenase 14  1.1.1
    Q6UX07  DHR13_HUMAN  SDR family member 13  1.1
    Q8N5I4  DHRSX_HUMAN  SDR family member on chromosome X  1.1
SDR8C  360  X X X X  P51659  DHB4_HUMAN  Peroxisomal multifunctional enzyme type 2  4.2.1.107, 1.1.1.35
SDR9C  158  X X X  Q02338  BDH_HUMAN  D-β-Hydroxybutyrate dehydrogenase, mitochondrial  1.1.1.30
    P37059  DHB2_HUMAN  Estradiol-17-β-dehydrogenase 2  1.1.1.62
    P80365  DHI2_HUMAN  Corticosteroid-11-β-DH isozyme 2  1.1.1
    Q9BPW9  DHRS9_HUMAN  SDR family member 9  1.1
    Q92781  RDH1_HUMAN  11-cis-Retinol dehydrogenase  1.1.1.105
    O14756  H17B6_HUMAN  Hydroxysteroid-17-β-dehydrogenase 6  1.1.1.62, 1.1.1.63, 1.1.1.105
    Q8NEX9  DR9C7_HUMAN  SDR family 9C member 7  1.1.1
    O75452  RDH16_HUMAN  Retinol dehydrogenase 16  1.1
SDR10E  257  X X X  Q8WVX9  FACR1_HUMAN  Fatty acyl-CoA reductase 1  1.2.1
    Q96K12  FACR2_HUMAN  Fatty acyl-CoA reductase 2  1.2.1
SDR11E  129  X X  P14060  3BHS1_HUMAN  3-β-Hydroxysteroid dehydrogenase type 1  1.1.1.145, 5.3.3.1
    P26439  3BHS2_HUMAN  3-β-Hydroxysteroid dehydrogenase type 2  1.1.1.145, 5.3.3.1
    Q9H2F3  3BHS7_HUMAN  3-β-Hydroxysteroid dehydrogenase type 7  1.1.1
    Q9UD07  Q9UD07_HUMAN  3-β-Hydroxysteroid dehydrogenase
    Q6I955  Q6I955_HUMAN  3-β-Hydroxysteroid dehydrogenase w1 protein
    Q9UDK8  Q9UDK8_HUMAN  3-β-Hydroxysteroid dehydrogenase
SDR12C  164  X X X X  Q53GQ0  DHB12_HUMAN  Estradiol-17-β-dehydrogenase 12  1.1.1.62
    P37058  DHB3_HUMAN  Testosterone-17-β-dehydrogenase 3  1.1.1.62
    Q3SXM5  HSDL1_HUMAN  Hydroxysteroid dehydrogenase-like protein 1
SDR13C  157  X X X X  Q6YN16  HSDL2_HUMAN  Hydroxysteroid dehydrogenase-like protein 2  1
SDR14E  135  X X X X X  Q8IZJ6  TDH_HUMAN  Inactive L-threonine-3-dehydrogenase
SDR15C  143  X X X  Q9BUT1  BDH2_HUMAN  3-Hydroxybutyrate dehydrogenase type 2  1.1.1.30
SDR16C  114  X X X X  O75911  DHRS3_HUMAN  Short-chain dehydrogenase ⁄ reductase 3  1.1
    Q8NBQ5  DHB11_HUMAN  Estradiol-17-β-dehydrogenase 11  1.1.1
    Q7Z5P4  DHB13_HUMAN  17-β-Hydroxysteroid dehydrogenase 13  1.1
    Q8IZV5  RDH10_HUMAN  Retinol dehydrogenase 10  1.1.1
    Q8N3Y7  RDHE2_HUMAN  Epidermal retinal dehydrogenase 2  1.1.1
    XP_498284
SDR17C  114  X X X X  Q9NUI1  DECR2_HUMAN  Peroxisomal 2,4-dienoyl-CoA reductase  1.3.1.34
SDR18C  95  X X X X  Q16698  DECR_HUMAN  2,4-Dienoyl-CoA reductase, mitochondrial  1.3.1.34
SDR19C  63  X X X  Q96LJ7  DHRS1_HUMAN  SDR family member 1  1.1
SDR20C  82  X X X X  Q7Z4W1  DCXR_HUMAN  L-Xylulose reductase  1.1.1.10
SDR21C  52  X X  P16152  CBR1_HUMAN  Carbonyl reductase (NADPH) 1  1.1.1.184
    O75828  CBR3_HUMAN  Carbonyl reductase (NADPH) 3  1.1.1.184
SDR22E  84  X X X X  Q16795  NDUA9_HUMAN  NADH (ubiquinone) 1α subcomplex subunit 9
SDR23E  45  X X X  Q9NZL9  MAT2B_HUMAN  Methionine adenosyltransferase 2 subunit β
SDR24C  149  X X X  Q6UWP2  DHR11_HUMAN  Dehydrogenase ⁄ reductase SDR family member 11  1
SDR25C  187  X X X X X  Q13268  DHRS2_HUMAN  SDR family member 2  1.1
    Q9BTZ2  DHRS4_HUMAN  SDR family member 4  1.1.1.184
    Q6PKH6  DR4L2_HUMAN  SDR family member 4-like 2  1.1
    NP_001075957
SDR26C  41  X X  P28845  DHI1_HUMAN  Corticosteroid-11-β-dehydrogenase isozyme 1  1.1.1.146
    Q7Z5J1  DHI1L_HUMAN  Hydroxysteroid-11-β-dehydrogenase 1-like protein  1.1.1
SDR27X  74  X X X  P49327  FAS_HUMAN  Fatty acid synthase  2.3.1.85
SDR28C  47  X X  P14061  DHB1_HUMAN  Estradiol-17-β-dehydrogenase 1  1.1.1.62
    Q9NYR8  RDH8_HUMAN  Retinol dehydrogenase 8  1.1.1
    Q13034  Q13034_HUMAN  17-β-Hydroxysteroid dehydrogenase
SDR29C  66  X X X  Q9BY49  PECR_HUMAN  Peroxisomal trans-2-enoyl-CoA reductase  1.3.1.38
SDR30C  47  X X X  Q92506  DHB8_HUMAN  Estradiol-17β-dehydrogenase 8  1.1.1.62
SDR31E  32  X X X  Q15738  NSDHL_HUMAN  Sterol-4-α-carboxylate 3-dehydrogenase  1.1.1.170
SDR32C  33  X X X  Q6IAN0  DRS7B_HUMAN  SDR family member 7B  1.1
    A6NNS2  DRS7C_HUMAN  SDR family member 7C  1.1
SDR33C  43  X X X  P09417  DHPR_HUMAN  Dihydropteridine reductase  1.5.1.34
SDR34C  42  X X X X X  Q9Y394  DHRS7_HUMAN  SDR family member 7  1.1
SDR35C  45  X X X X X  Q06136  KDSR_HUMAN  3-Ketodihydrosphingosine reductase  1.1.1.102
SDR36Cᵃ  23  X X  P15428  PGDH_HUMAN  15-Hydroxyprostaglandin dehydrogenase  1.1.1.141
There are few families with representatives from all
three domains, only 14 of 314 (Table 4). Seven of them
have mammal (and human) representatives. Nine
families are of the extended type, which is more than
expected by chance, as the classical type is most common
(~69%). Therefore, it seems that families of the
extended SDR type are represented in more species
than the classical type, in agreement with early genome
investigations [12]. Interestingly, there is one family
(SDR53C, related to glucose dehydrogenases) with
only 38 members that still has members from all
domains, in spite of its small size. Typically, the bacterial
members form the vast majority (80% or more) in
most SDR families identified, in agreement with the
fact that 79% of the SDRs are from that domain.
However, in two of these 14 families, the eukaryotic
members are in the majority (SDR51C, L-xylulose
reductases; and SDR53C, glucose-1-dehydrogenase-
related proteins).
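
The statement that nine extended-type families among the 14 present in all domains is more than expected by chance can be made concrete with a hypergeometric tail probability. This is only a sketch: the text does not say which test, if any, was used, and the calculation assumes that the 14 families are a random draw from the 314 families, of which 52 are judged to be extended (the figure given in the section on SDR types).

```python
from math import comb


def hypergeom_tail(population: int, successes: int, draws: int, observed: int) -> float:
    """P(X >= observed) when drawing `draws` items without replacement from a
    population that contains `successes` items of the type of interest."""
    total = comb(population, draws)
    upper = min(draws, successes)
    favourable = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return favourable / total


# 314 families, 52 of them extended; 14 all-domain families, 9 of them extended.
expected = 14 * 52 / 314
p_value = hypergeom_tail(314, 52, 14, 9)
print(f"expected extended families: {expected:.2f}")
print(f"P(at least 9 extended): {p_value:.2e}")
```

Under these assumptions only two or three extended families would be expected among the 14, so the observed nine is a clear enrichment.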
There are 41 families with only eukaryotic members
(Table 5). Around half of them are unique to one
group of species; insect alcohol dehydrogenases consti-
tute one such family, there are seven families unique to
plants, and as many as 15 families are unique to fungi.
Sixteen of the remaining families, with members from
multiple groups of species, have mammalian (and
human) representatives. These families include several
of the steroid dehydrogenases ⁄ reductases and carbonyl
and fatty acyl reductases.
SDR types
SDRs are divided into the types ‘classical’ and
‘extended’ [10], and it was previously noted that classi-
cal SDRs are more common; however, among SDRs
that are present in all eukaryotes, the extended type is
equally common [13]. Now we are able to make a
large-scale comparison, including not only eukaryotes
but also prokaryotes. Of the 314 families identified,
there are 218 families judged to be classical and 52
extended (68% and 17%, respectively). In total, these
cover about 27 900 proteins, and surprisingly, given
that the majority of the families are classical, 36% of
the proteins are of the extended type. This means that
many of the largest families are of the extended type.
Classical SDRs are in the vast majority in families with
members from only one domain (Eukaryota or Bacte-
ria) and also in families with both eukaryotic and bac-
terial members. When archaeal members are involved,
however, the pattern changes considerably. Among the
14 families with members from all three domains, only
five are classical; that is, the extended type is in the
majority. One reason for this could be that the
extended SDRs are typically involved in basic metabolic
functions, and thus have a lower tendency to
vary. Classical SDRs, on the other hand, are involved
in many different types of enzyme reactions, and are
thus more diverse [1].

Table 2. (Continued).
Family designation  Family size  Domain (A, B, E: M, I, P, O)  Accession number  Identifier  Description  EC number
SDR37C  24  X X  P56937  DHB7_HUMAN  3-Keto-steroid reductase  1.1.1.270
    A6NH47  A6NH47_HUMAN  Putative uncharacterized protein HSD17B7P2
SDR38Cᵃ  13  X X  P35270  SPRE_HUMAN  Sepiapterin reductase  1.1.1.153
SDR39U  23  X X X  Q9NRG7  D39U1_HUMAN  Epimerase family protein SDR39U1
SDR40Cᵃ  17  X X X  A0PJE2  DHR12_HUMAN  SDR family member 12  1.1
SDR41Cᵃ  36  X X X X  Q9NZC7  WWOX_HUMAN  WW domain-containing oxidoreductase  1.1.1
SDR42Eᵃ  18  X X X  Q8WUS8  D42E1_HUMAN  SDR family 42E member 1  1.1.1
    A6NKP2  D42E2_HUMAN  Putative SDR family 42E member 2  1.1.1
SDR43Uᵃ  11  X X  P30043  BLVRB_HUMAN  Flavin reductase  1.5.1.30
SDR44Uᵃ  17  X X  Q9BUP3  HTAI2_HUMAN  Oxidoreductase HTATIP2  1.1.1
SDR45Cᵃ  8  X X  Q8N4T8  CBR4_HUMAN  Carbonyl reductase 4  1.1.1
SDR47Cᵃ  7  X X  Q9BPX1  DHB14_HUMAN  17-β-Hydroxysteroid dehydrogenase 14  1.1
SDR48Aᵃ  7  X X  Q9HBL8  NMRL1_HUMAN  NmrA-like family domain-containing protein 1
ᵃ This family has too few members (fewer than 20 with maximally 80% pairwise sequence identity) to allow for calculation of an HMM.
Family correlation with functional annotations

Among the identified families, only one-third of the
proteins have informative annotations; the other two-thirds
have terms such as putative, hypothetical, or
possible ⁄ probable, or are only identified as SDR proteins.
One advantage of the present family grouping is
that functional annotations can be inferred from
other family members, as many families have at least a
few members with annotations describing their functions.
In order to investigate whether family functions
could be derived, annotations for each member were
compared within the families. In families where the
described functions are quite general, we find no
inconsistencies; that is, the annotations (if present) are
the same for every member in a family, thus supporting
the present classification.

Table 3. SDR members in human, mouse, and rat.
Family  Family name  Genes per family: Human  Mouse  Rat
SDR1E  UDP-glucose-4-epimerase  1  1  1
SDR2E  dTDP-D-glucose-4,6-dehydratase  1  1  1
SDR3E  GDP-mannose-4,6-dehydratase  1  1  1
SDR4E  GDP-L-fucose synthetase  1  1  1
SDR5C  3-Hydroxyacyl-CoA dehydrogenase type II  1  1  1
SDR6E  UDP-glucuronic acid decarboxylase  1  1  1
SDR7C  NADPH-dependent retinal reductase  6  7  1
SDR8C  Peroxisomal multifunctional enzyme  1  1  1
SDR9C  Multisubstrate SDR9C with preference for NAD(H)  8  14  8
SDR10E  Fatty acyl-CoA reductase  2  2  1
SDR11E  3β-Hydroxysteroid dehydrogenase  6  7  5
SDR12C  NADP(H)-dependent 17β-hydroxysteroid dehydrogenase (SDR12C)  3  3  3
SDR13C  SDR13C with unknown substrate specificity  1  1  1
SDR14E  L-Threonine dehydrogenase  1  1  0
SDR15C  3-Hydroxybutyrate dehydrogenase  1  1  0
SDR16C  Multisubstrate NADP(H)-dependent SDR16C  6  6  4
SDR17C  Peroxisomal 2,4-dienoyl-CoA reductase  1  1  1
SDR18C  Microsomal 2,4-dienoyl-CoA reductase  1  1  1
SDR19C  SDR19C with unknown substrate specificity  1  1  1
SDR20C  L-Xylulose reductase  1  2  1
SDR21C  NADPH-dependent carbonyl reductases 1 and 3  2  2  2
SDR22E  NADH dehydrogenase (ubiquinone) 1α subcomplex 9  1  1  1
SDR23E  Methionine adenosyltransferase 2 subunit β  1  1  1
SDR24C  SDR24C with unknown substrate specificity  1  1  1
SDR25C  SDR25C with unknown substrate specificity  4  2  1
SDR26C  11β-Hydroxysteroid dehydrogenases 1 and 3  2  1  1
SDR27X  Fatty acid synthase  1  1  1
SDR28C  Multisubstrate SDR28C  3  1  1
SDR29C  Peroxisomal trans-2-enoyl-CoA reductase  1  1  1
SDR30C  NAD(H)-dependent 17β-hydroxysteroid dehydrogenase (SDR30C)  1  1  1
SDR31E  Sterol-4-α-carboxylate-3-dehydrogenase  1  1  1
SDR32C  SDR32C with unknown substrate specificity  2  2  1
SDR33C  Dihydropteridine reductase  1  1  1
SDR34C  SDR34C with unknown substrate specificity  1  1  1
SDR35C  3-Ketodihydrosphingosine reductase  1  1  0
SDR36C  15-Hydroxyprostaglandin dehydrogenase  1  1  1
SDR37C  17β-Hydroxysteroid dehydrogenase (SDR37C)  2  1  1
SDR38C  Sepiapterin reductase  1  1  1
SDR39U  SDR39U with unknown substrate specificity  1  1  0
SDR40C  SDR40C with unknown substrate specificity  1  0  0
SDR41C  SDR41C with unknown substrate specificity  1  1  0
SDR42E  SDR42E with unknown substrate specificity  2  1  0
SDR43U  Flavin reductase  1  1  0
SDR44U  HIV-1 TAT-interactive protein 2  1  1  1
SDR45C  NADH-dependent carbonyl reductase  1  1  1
SDR47C  NAD(H)-dependent 17β-hydroxysteroid dehydrogenase (SDR47C)  1  1  0
SDR48A  SDR48A with unknown function (NmrA-like)  1  1  0
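
The within-family annotation comparison described in this section can be sketched as follows: uninformative descriptions are set aside, and a family is treated as consistent when the remaining descriptions agree. The list of uninformative terms is taken from the text; the exact-match comparison after lower-casing and the function names are simplifying assumptions, and real annotations would need more careful normalisation.

```python
from typing import Iterable, Set

# Terms the text lists as uninformative; matching by substring is an assumption.
UNINFORMATIVE = ("putative", "hypothetical", "possible", "probable")


def informative(annotation: str) -> bool:
    """True if the description is non-empty and carries none of the vague terms."""
    text = annotation.strip().lower()
    return bool(text) and not any(term in text for term in UNINFORMATIVE)


def family_functions(annotations: Iterable[str]) -> Set[str]:
    """Distinct informative descriptions within one family."""
    return {a.strip().lower() for a in annotations if informative(a)}


def is_consistent(annotations: Iterable[str]) -> bool:
    """A family is consistent if at most one distinct function is annotated."""
    return len(family_functions(annotations)) <= 1


family = ["UDP-glucose 4-epimerase", "Putative epimerase", "udp-glucose 4-epimerase"]
print(family_functions(family), is_consistent(family))
```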
We also find a good correlation between the present
family classification and the function(s) among families
for which a more detailed functional role is known,
predominantly families with human and mammalian
members. In families with a single human member (34
families), there are no contradictory functions anno-
tated; that is, for all members with known function,
the annotation is the same. There are some members
that might have another function, according to the
protein description, but the function seems to be
derived rather than actually determined. For instance,
in family SDR6E, there are two members described as
GDP-mannose-4,6-dehydratase and 3β-hydroxysteroid
dehydrogenase ⁄ isomerase (accession numbers Q00VJ3
and A0ZGH3), respectively, suggesting that they
belong to families SDR3E and SDR11E instead, but
there are no experimental data, and pairwise sequence
comparisons clearly identify the human representative
for SDR6E (UXS1_HUMAN) as the closest human
ortholog.
The 13 families with multiple human members typically
contain one or several 17β-hydroxysteroid
dehydrogenases (17β-HSDs) (see, for example, [14,15] for
overviews of functions). These types of enzyme have
mixed origins; one of them (type 5) is not even an
SDR, but belongs to the AKR family, and phylogenetic
studies have shown that 17β-HSD activity has
evolved from different ancestors, e.g. in types 1, 2 and
3 (corresponding to families SDR28C, SDR9C and
SDR12C, respectively; see [16] and references therein).
These studies also provide support for the inclusion
of retinol dehydrogenases, an 11β-hydroxysteroid
dehydrogenase and 17β-HSDs in the SDR9C family,
as they most probably have a common ancestor among
the invertebrates. Thus, the family classification seems
to be valid also for these families.

Fig. 2. Number of SDR families with members representing one,
two or three of the domains of life. The numbers represent the
numbers of families with members in different combinations of the
three domains Eukaryota (E), Bacteria (B) and Archaea (A): E only, 41;
B only, 178; A only, 0; E and B, 63; E and A, 0; B and A, 18; all three, 14.

Table 4. The 14 SDR families present in all domains. The average ratio column shows the average number of members per species. The
letters E, B and A denote eukaryotic, bacterial and archaeal genomes, respectively. The numbers represent percentage of members from
each domain. Families with human occurrences are indicated by bold type in the eukaryotic column.
Family name  Family designation  Family size  Number of species  Average ratio  Percentage in domain: E  B  A
UDP-glucose-4-epimerase  SDR1E  1273  705  1.8  13.8  85.9  0.2
dTDP-D-glucose-4,6-dehydratase  SDR2E  1109  652  1.7  4.1  92.8  2.9
GDP-mannose-4,6-dehydratase  SDR3E  642  395  1.6  11.2  86.0  1.9
GDP-L-fucose synthetase  SDR4E  469  299  1.6  14.5  83.2  1.1
UDP-glucuronic acid decarboxylase  SDR6E  424  261  1.6  28.3  69.3  2.4
L-Threonine dehydrogenase  SDR14E  135  103  1.3  25.9  71.9  2.2
Microsomal 2,4-dienoyl-CoA reductase  SDR18C  95  66  1.4  22.1  75.8  2.1
Gluconate-5-dehydrogenase  SDR49C  509  250  2.0  5.5  93.7  0.6
UDP-glucuronate-4-epimerase 4  SDR50E  423  281  1.5  10.9  87.9  1.2
L-Xylulose reductase  SDR51C  184  99  1.9  53.3  46.2  0.5
Sulfolipid biosynthesis protein  SDR52E  105  89  1.2  10.5  80.0  9.5
Glucose-1-dehydrogenase-related protein  SDR53C  38  33  1.2  57.9  36.8  5.3
Capsular polysaccharide biosynthesis  SDR55E  558  372  1.5  0.2  99.6  0.2
Acetoacetyl-CoA reductase  SDR152C  1444  747  1.9  2.2  97.2  0.6

Table 5. SDR families unique to eukaryotes. The average ratio column shows the average number of members per species. M, Fi, I, P, Fu
and O denote mammals, fish, insects, plants, fungi and other eukaryotes, respectively.
Family name  Family designation  Family size  Number of species  Average ratio  Species group (M Fi I P Fu O)
Dihydroflavonol-4-reductase  SDR108E  456  120  3.8  x
Drosophila alcohol dehydrogenase  SDR109I  404  91  4.4  x
Fatty acyl-CoA reductase  SDR10E  257  31  8.3  x x x x
Sex determination protein tasselseed-2  SDR110C  243  47  5.2  x
NADP(H)-dependent 17β-hydroxysteroid dehydrogenase (SDR12C)  SDR12C  164  52  3.2  x x x x x
Aflatoxin biosynthesis, versicolorin reductase  SDR111C  162  29  5.6  x
Multisubstrate SDR9C with preference for NAD(H)  SDR9C  158  29  5.4  x x x x
SDR24C with unknown substrate specificity  SDR24C  149  23  6.5  x x x x
3β-Hydroxysteroid dehydrogenase  SDR11E  129  49  2.6  x x x
Hypothetical protein SDR112C  SDR112C  128  27  4.7  x x x
NADPH-dependent methylglyoxal reductase GRE2  SDR113E  112  37  3.0  x
NADH dehydrogenase (ubiquinone) 1α subcomplex 9  SDR22E  84  70  1.2  x x x x x x
Fatty acid synthase  SDR27X  74  29  2.6  x x x x
Menthol dehydrogenase  SDR114C  70  14  5.0  x
Fatty acid synthase α subunit FasA, 3-oxoacyl-(acyl-carrier protein) domain  SDR116U  69  44  1.6  x x
NADPH-dependent HC-toxin reductase  SDR115E  66  9  7.3  x
NADPH-dependent carbonyl reductases 1 and 3  SDR21C  52  24  2.2  x x x
Hypothetical protein SDR118C  SDR118C  51  19  2.7  x
Male sterility 2-like protein  SDR117E  49  14  3.5  x
Multisubstrate SDR28C  SDR28C  47  22  2.1  x x x
NAD(H)-dependent 17β-hydroxysteroid dehydrogenase (SDR30C)  SDR30C  47  32  1.5  x x x x
Short-chain dehydrogenase ⁄ reductase SDR120C, putative  SDR120C  44  18  2.4  x
Dihydropteridine reductase  SDR33C  43  32  1.3  x x x x
Aminoadipate-semialdehyde dehydrogenase  SDR121E  43  38  1.1  x
NAD-dependent epimerase ⁄ dehydratase  SDR122U  43  41  1.0  x x x
11-β-Hydroxysteroid dehydrogenase-like protein  SDR119C  42  12  3.5  x
Hypothetical protein SDR123C  SDR123C  42  28  1.5  x
11β-Hydroxysteroid dehydrogenases 1 and 3  SDR26C  41  22  1.9  x x x
Hypothetical protein SDR124C  SDR124C  37  36  1.0  x
Short-chain dehydrogenase ⁄ reductase family protein  SDR125C  34  18  1.9  x
SDR32C with unknown substrate specificity  SDR32C  33  24  1.4  x x x x
SDR128C oxidoreductase  SDR128C  33  31  1.1  x
Sterol-4α-carboxylate-3-dehydrogenase  SDR31E  32  19  1.7  x x x x
D-Arabinitol-2-dehydrogenase  SDR126C  31  27  1.1  x
Hypothetical protein SDR127C  SDR127C  30  19  1.6  x
3-Oxoacyl-(acyl-carrier protein) reductase  SDR129C  27  17  1.6  x
Short-chain dehydrogenase ⁄ reductase family SDR130C  SDR130C  25  17  1.5  x
17β-Hydroxysteroid dehydrogenase (SDR37C)  SDR37C  24  11  2.2  x x x
SDR39U with unknown substrate specificity  SDR39U  23  20  1.2  x x x
Putative short-chain type alcohol dehydrogenase SDR132C  SDR132C  23  8  2.9  x
C-3 sterol dehydrogenase ⁄ C-4 decarboxylase family  SDR133E  22  17  1.3  x

For every member in the first 47 families, we also
made sequence comparisons with all identified human
SDRs. In every family except two, all of the members
have the largest number of identities with the human
representatives in their own family. The first exception
is the retinol dehydrogenase family SDR7C, where we
find a total of 41 members (of 262) scoring higher
towards human SDR41C1 (WWOX_HUMAN) than
any of its own human representatives. During HMM
training, some overlaps were found between these clus-
ters, but as we were unable to create a single HMM
that captured every member in the two clusters, it was
decided to keep them separate for now. It is possible
that these two families have the same ancient origin,
and hence should be one family instead. The other
exception is SDR8C, comprising peroxisomal multi-
functional enzymes, where 25 members (of 360) pre-
ferred SDR30C1 (DHB8_HUMAN). This is in spite of
the fact that HMM iteration with these proteins as
seed led to the inclusion of human SDR8C1 and not
SDR30C1. The SDR8C family is primarily involved in
fatty acid metabolism, and has a multidomain struc-
ture, with an N-terminal SDR domain followed by a
hydratase domain, and finally a sterol carrier protein 2
domain. Members of the SDR30C family consist of a
single SDR domain; the exact function has yet to be
discovered, but both fatty acid and steroid metabolism
have been suggested. Thus, with the knowledge avail-
able today, it is not possible to evaluate these families
further.
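The cross-family check described above can be sketched as follows. This is only a minimal illustration, assuming that the pairwise identity counts against the human SDRs have already been computed; the function name, the data layout and the toy identifiers and counts are hypothetical and not taken from the analysis itself.

```python
def foreign_best_hits(identities, member_family, human_family):
    """Group members whose best-matching human SDR belongs to another family.

    identities:    {member_id: {human_id: identical_residue_count}}
    member_family: {member_id: family_name}
    human_family:  {human_id: family_name}
    """
    foreign = {}
    for member, counts in identities.items():
        if not counts:
            continue  # no human comparison available for this member
        best_human = max(counts, key=counts.get)
        if human_family[best_human] != member_family[member]:
            foreign.setdefault(member_family[member], []).append(member)
    return foreign


# Toy data (hypothetical identifiers and counts, for illustration only).
identities = {
    "member_a": {"SDR7C_human": 120, "SDR41C1": 90},
    "member_b": {"SDR7C_human": 80, "SDR41C1": 95},
}
member_family = {"member_a": "SDR7C", "member_b": "SDR7C"}
human_family = {"SDR7C_human": "SDR7C", "SDR41C1": "SDR41C"}

print(foreign_best_hits(identities, member_family, human_family))
```

For scale, the two exceptions reported above correspond to 41/262 (about 16%) of the SDR7C members and 25/360 (about 7%) of the SDR8C members.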
Other SDR classifications
As mentioned in Experimental procedures, Pfam [6]
identifies SDRs through the three profiles Adh_short
(PF00106), Epimerase (PF01370) and 3Beta_HSD
(PF01073), thus classifying these proteins at a much
more general level, which does not allow the
fine-grained analysis that the presently identified
families make possible. Identifying members of the SDR superfamily
is, of course, a necessary step in order to be able to
cluster and divide them further, but does not give us
insights into the function of a specific protein, owing
to the large variation in functionality among SDRs.
Also, the general HMMs might not correctly identify
sequence fragments, which more specialized HMMs
can. About 1600 SDRs were identified in this way, i.e.
not found by the general SDR HMMs but by the
family HMMs only.
Another approach uses evolutionary trees [17] to
achieve subfamily classification. In comparison with
the method presented herein, this approach arrives at
much more fine-grained families; for example, our 3β-
hydroxysteroid dehydrogenase family is split into eight
subfamilies, and our family with retinol dehydrogenas-
es is split into as many as 19 subfamilies. A classifica-
tion system that is too specific would be impractical,
as it would not provide a correct overview of the
divergent SDR superfamily. Furthermore, functional
conclusions drawn from family members would be of
less practical value with smaller families, owing to limi-
tations in annotations.
Our HMM system as basis for nomenclature
The presently characterized SDR families form a natu-
ral foundation for a nomenclature system. We have
therefore, together with a number of researchers in the
SDR field, created such a nomenclature system [18],
which is already in use [19]. This nomenclature will
help us to keep track of the different SDR families,
and facilitate collection of knowledge on the structural
and functional properties of one of the largest protein
families known to date.
Experimental procedures
A number of HMMs were developed in order to arrive at
a subclassification of the SDR superfamily. There are
already HMMs (three Pfam HMMs and an HMM previ-
ously developed by us) for the identification of new SDR
members in general. The purpose of the HMMs now
developed is to divide the SDRs into more manageable
subfamilies with a more specialized function in common
than general dehydrogenase ⁄ reductase activity. The new
HMMs were developed using an iterative approach to
arrive at stable HMMs that correctly identify their own
members and disregard members of other SDR families
(see below).
SDR proteins were extracted from the UniProt database
[20], human RefSeq [21] and human Ensembl [22] as of 1
October, 2008, using a previously developed HMM [10]
and the Pfam [6] profiles PF00106, PF01073 and PF01370.
This dataset consisted of 47 011 proteins (7905 only by our
own method, and 6254 only by Pfam). In addition, 1581
proteins have so far been identified by the HMMs now
developed.
In order to identify clusters of SDR families, each of the
candidate sequences was compared with all of the other
candidates using fasta [23]. We tested clustering at various
levels, and found that an initial clustering at the 40% level
and an opt-score better than 700 were most appropriate, as
judged from test cases with SDR enzymes of known func-
tion. Furthermore, the 40% level has also been shown to
be suitable for other classification (nomenclature) systems,
such as those of AKRs [6] and cytochrome P450 [7]. How-
ever, for the SDR classification, we strive to avoid a strict
percentage residue identity threshold, as this might cause
strange effects for members with residue identities close to
the threshold, and would not correctly reflect the enzymatic
properties of the family. Instead, an iterative ‘fishing’
approach is taken, where the initial fasta clusters consti-
tute a starting point only.
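The initial grouping step can be sketched as follows. This is a minimal illustration under stated assumptions: the pairwise comparison results are taken to be already parsed into tuples, sequences are linked when the identity is at least 40% and the opt-score exceeds 700, and linked sequences are merged by single linkage. The linkage rule, the function name and the toy values are assumptions made for this sketch; the text above does not specify them.

```python
def initial_clusters(hits, min_identity=40.0, min_opt=700):
    """Group sequences into starting clusters from pairwise comparison hits.

    hits: iterable of (seq_a, seq_b, percent_identity, opt_score).
    Two sequences are linked when both criteria are met; linked sequences
    end up in the same cluster (single linkage, an assumption of this sketch).
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, identity, opt in hits:
        root_a, root_b = find(a), find(b)  # registers both ids even if unlinked
        if identity >= min_identity and opt > min_opt:
            parent[root_a] = root_b

    clusters = {}
    for seq in parent:
        clusters.setdefault(find(seq), set()).add(seq)
    return sorted(sorted(c) for c in clusters.values())


# Toy hits (hypothetical values, for illustration only).
hits = [
    ("A", "B", 55.0, 900),
    ("B", "C", 42.0, 750),
    ("C", "D", 38.0, 800),  # identity below 40%: not linked
    ("D", "E", 60.0, 650),  # opt-score not better than 700: not linked
]
print(initial_clusters(hits))
```

Because these clusters serve only as seeds for the subsequent HMM iterations, sequences lying close to either cutoff can still be recruited later.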
The members of each SDR cluster were aligned using
clustalw [24]. The alignments were made nonredundant,
so that no pair of sequences had more than 80% sequence
identity, in order to avoid bias. After this redundancy
reduction, the alignment was transformed into an HMM
using hmmer [25,26]. Subsequently, a database search
against UniProt was performed, hits with a score better
than 400 were added to the cluster, and the HMM was
retrained. This process of aligning, HMM training and
database search was repeated until no further cluster mem-
bers could be added.
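The iterative 'fishing' loop can be outlined as below. This sketch deliberately leaves alignment, HMM training and database searching as caller-supplied functions (`build_model` and `search_database` are hypothetical placeholders, not calls into clustalw or hmmer), and it omits the 80% redundancy reduction. Only the score cutoff of 400 and the stopping rule come from the description above; the iteration cap is an added safeguard.

```python
def fish_cluster(seed_members, build_model, search_database,
                 score_cutoff=400, max_iterations=50):
    """Repeat train -> search -> extend until no new members are found.

    build_model(members) -> model          (placeholder for align + HMM training)
    search_database(model) -> {id: score}  (placeholder for the database search)
    Returns the final member set and the number of iterations performed.
    """
    members = set(seed_members)
    for iteration in range(1, max_iterations + 1):
        model = build_model(members)
        hits = search_database(model)
        new = {sid for sid, score in hits.items() if score > score_cutoff} - members
        if not new:
            return members, iteration  # converged: nothing further to add
        members |= new
    raise RuntimeError("cluster did not converge within max_iterations")


# Toy stand-ins: a sequence scores 500 when it neighbours a current member.
toy_neighbours = {"s1": ["s2"], "s2": ["s3"], "s3": [], "s4": []}

def toy_build(members):
    return frozenset(members)

def toy_search(model):
    scores = {sid: 100 for sid in toy_neighbours}
    for member in model:
        for neighbour in toy_neighbours[member]:
            scores[neighbour] = 500
    return scores

print(fish_cluster({"s1"}, toy_build, toy_search))
```

In the toy run the cluster grows from s1 to s1, s2 and s3 and stops at the third iteration, when the search returns no new sequence above the cutoff.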
If a cluster had too few members, its HMM could be
overtrained and only able to identify the training
sequences. Hence, minimum thresholds were set at three
points in the clustering procedure, and clusters failing the
threshold were set aside while the detection of further
members was awaited. The first point was before calcula-
tion of the initial alignment, if the fasta result list con-
tained fewer than five hits with 40–80% residue identity.
The second threshold point was after the first HMM itera-
tion, if fewer than 10 members were identified in the data-
base search. The third and last point was set after the
iteration process had ended, if the cluster contained fewer
than 20 members.
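The three size checkpoints can be summarized in a few lines. The numeric limits are those given above; the function name, the stage labels and the idea of passing all three counts at once are choices made for this illustration only.

```python
MIN_FASTA_HITS = 5            # hits with 40-80% residue identity
MIN_FIRST_ITERATION = 10      # members found in the first database search
MIN_FINAL_MEMBERS = 20        # members when the iteration has ended

def first_failed_checkpoint(n_fasta_hits, n_first_iteration, n_final):
    """Return the label of the first checkpoint a cluster fails, or None."""
    if n_fasta_hits < MIN_FASTA_HITS:
        return "initial fasta list"
    if n_first_iteration < MIN_FIRST_ITERATION:
        return "first HMM iteration"
    if n_final < MIN_FINAL_MEMBERS:
        return "final cluster"
    return None


print(first_failed_checkpoint(4, 50, 50))
print(first_failed_checkpoint(5, 10, 20))
```

A cluster for which a label is returned would be set aside rather than discarded, pending the detection of further members.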
Subsequently, all HMMs were tested to ensure that each
SDR member was detected by only one HMM, in order to
achieve the desired specificity. If two clusters were equal,
one of the HMMs was selected, and if one cluster was a
subset of another cluster, the HMM of the larger cluster
was selected. If there was only a partial overlap between
the clusters, i.e. if both sets had members that were not in
the other set, neither HMM was selected. Instead, an
attempt was made to solve the overlap by excluding the
overlapping members from both clusters and retraining the
HMMs. If this approach still resulted in overlapping clus-
ters, the two sets were combined into one cluster, and an
attempt was made to achieve a stable HMM for the whole
set. If this was still unsuccessful, no HMM was created for
either of the two initial clusters.
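The decision rules for overlapping clusters map directly onto set comparisons. The following sketch only classifies a pair of member sets; the function name and the returned labels are hypothetical, and the retraining and merging steps themselves are not implemented here.

```python
def compare_clusters(cluster_a, cluster_b):
    """Classify the relationship between two clusters of sequence ids."""
    a, b = set(cluster_a), set(cluster_b)
    if a.isdisjoint(b):
        return "disjoint: keep both HMMs"
    if a == b:
        return "equal: keep one HMM"
    if a < b or b < a:
        return "subset: keep the HMM of the larger cluster"
    # Both sets have members that are missing from the other set.
    return "partial overlap: retrain without shared members, then try merging"


print(compare_clusters({"p1", "p2"}, {"p1", "p2", "p3"}))
print(compare_clusters({"p1", "p2"}, {"p2", "p3"}))
```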
Some SDRs form part of multidomain proteins, e.g.
FOX2_CANTR, a multifunctional b-oxidation protein with
two SDR domains, and FAS_HUMAN, the cytosolic 2511
amino acid residue fatty acid synthase, which contains one
SDR domain. In order to cluster these multidomain pro-
teins, the programs and scripts developed were adjusted so
that only the SDR parts of the protein sequences were
included in the HMM building procedure.
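Restricting a multidomain protein to its SDR parts amounts to slicing the sequence at the domain boundaries. The sketch below assumes that the boundaries are already known and given as 1-based, inclusive coordinates; how the boundaries were determined is not described above, and the sequence and coordinates shown are invented for illustration.

```python
def extract_regions(sequence, regions):
    """Return the subsequences for 1-based, inclusive (start, end) regions."""
    segments = []
    for start, end in regions:
        if not 1 <= start <= end <= len(sequence):
            raise ValueError(f"region {start}-{end} lies outside the sequence")
        segments.append(sequence[start - 1:end])
    return segments


# Hypothetical protein with two annotated SDR regions.
toy_sequence = "ABCDEFGHIJ"
print(extract_regions(toy_sequence, [(1, 3), (6, 8)]))
```

A protein with two SDR domains, such as FOX2_CANTR, would in this way contribute two separate segments to the HMM building procedure.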
A quality test was performed on the resulting collection
of unique HMMs, using jack-knifing. Each HMM was
retrained with one of its members removed from the train-
ing set. This was repeated until each member had been
removed once, and the ability of the retrained HMMs to
correctly identify the member left out was tested. The test
consisted of two parts: the member left out needed to have
a score above the threshold (400), and all nonmembers (i.e.
members of other clusters) needed to have a score below
the threshold.
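The jack-knife test can be outlined as a leave-one-out loop. In this sketch, `train` and `score` are caller-supplied placeholders for HMM training and scoring (they are not hmmer calls), and the toy model, which simply counts shared letters, is invented for the example. Only the two pass criteria and the cutoff of 400 follow the description above.

```python
def jackknife(members, non_members, train, score, cutoff=400):
    """Leave each member out once and test the retrained model.

    Returns a list of (left_out_member, reason) for every failed criterion:
    the left-out member must score above the cutoff, and every non-member
    must score below it.
    """
    failures = []
    for left_out in members:
        model = train([m for m in members if m != left_out])
        if score(model, left_out) <= cutoff:
            failures.append((left_out, "left-out member not above cutoff"))
        for other in non_members:
            if score(model, other) >= cutoff:
                failures.append((left_out, f"non-member {other} not below cutoff"))
    return failures


# Toy stand-ins: the "model" is the set of letters seen in training,
# and the score is 120 points per distinct shared letter.
def toy_train(sequences):
    return set("".join(sequences))

def toy_score(model, sequence):
    return 120 * len(set(sequence) & model)

print(jackknife(["ABCDE", "ABCDF", "ABCDG"], ["XYZAB"], toy_train, toy_score))
print(jackknife(["ABCDE", "ABCDF", "QRSTU"], [], toy_train, toy_score))
```

The first toy cluster passes with no failures, whereas in the second the outlier QRSTU is not recovered once it is removed from the training set.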
The iterative clustering process was automated using a
series of shell scripts and programs developed in C. Typi-
cally, each SDR cluster needed eight iterations, and each
iteration took approximately 4 h on a Linux workstation
equipped with an Intel 2.5 GHz processor. Hence, going
through the whole dataset would have been a tedious and
time-consuming process. Parts of the large-scale runs were
therefore carried out on the 805-node Hewlett-Packard
DL140 cluster Neolith at the National Supercomputer
Centre (Linköping, Sweden).
Acknowledgements
Financial support from Linköping University and the
Karolinska Institutet is gratefully acknowledged. The
Structural Genomics Consortium is a registered
charity (number 1097737) that receives funds from
the Canadian Institutes for Health Research, the
Canadian Foundation for Innovation, Genome
Canada, through the Ontario Genomics Institute,
GlaxoSmithKline, the Karolinska Institutet, the Knut
and Alice Wallenberg Foundation, the Ontario Inno-
vation Trust, the Ontario Ministry for Research and
Innovation, Merck & Co., Inc., the Novartis
Research Foundation, the Swedish Agency for
Innovation Systems, the Swedish Foundation for
Strategic Research, and the Wellcome Trust. U.
Oppermann is supported by the NIHR Oxford Bio-
medical Research Unit. Computational resources were
provided via the allocation committee of the Swedish
National Infrastructure for Computing (SNIC). We
also thank J O. Järrhed and the National Supercom-
puter Centre (NSC), Linköping, Sweden, for computer
support.
References
1 Kavanagh KL, Jörnvall H, Persson B & Oppermann U
(2008) Functional and structural diversity within the
short-chain dehydrogenase ⁄ reductase (SDR) superfam-
ily. Cell Mol Life Sci 65, 3895–3906.
2 Persson B, Krook M & Jörnvall H (1991) Characteris-
tics of short-chain alcohol dehydrogenases and related
enzymes. Eur J Biochem 200, 537–543.
3 Kallberg Y & Persson B (2006) Prediction of coenzyme
specificity in dehydrogenases ⁄ reductases: a hidden
Markov model-based method and its application on
complete genomes. FEBS J 273, 1177–1184.
4 Yooseph S, Sutton G, Rusch DB, Halpern AL, Wil-
liamson SJ, Remington K, Eisen JA, Heidelberg KB,
Manning G, Li W et al. (2007) The Sorcerer II Global
Ocean Sampling expedition: expanding the universe of
protein families. PLoS Biol 5, e16.
5 Filling C, Berndt KD, Benach J, Knapp S, Prozorovski
T, Nordling E, Ladenstein R, Jörnvall H & Oppermann
U (2002) Critical residues for structure and catalysis in
short-chain dehydrogenases ⁄ reductases. J Biol Chem
277, 25677–25684.
6 Bateman A, Coin L, Durbin R, Finn RD, Hollich V,
Griffiths-Jones S, Khanna A, Marshall M, Moxon S,
Sonnhammer EL et al. (2004) The Pfam protein families
database. Nucleic Acids Res 32 (Database issue) D138–
D141.
7 Sullivan FX, Kumar R, Kriz R, Stahl M, Xu GY,
Rouse J, Chang XJ, Boodhoo A, Potvin B & Cumming
DA (1998) Molecular cloning of human GDP-mannose
4,6-dehydratase and reconstitution of GDP-fucose bio-
synthesis in vitro. J Biol Chem 273, 8193–8202.
8 Tonetti M, Sturla L, Bisso A, Benatti U & De Flora A
(1996) Synthesis of GDP-l-fucose by the human FX
protein. J Biol Chem 271, 27274–27279.
9 Thoden JB, Wohlers TM, Fridovich-Keil JL & Holden
HM (2001) Molecular basis for severe epimerase defi-
ciency galactosemia. X-ray structure of the human
V94m-substituted UDP-galactose 4-epimerase. J Biol
Chem 276, 20617–20623.
10 Kallberg Y, Oppermann U, Jörnvall H & Persson B
(2002) Short-chain dehydrogenases ⁄ reductases (SDRs):
coenzyme-based functional assignments in completed
genomes. Eur J Biochem 269, 4409–4417.
11 Nobel S, Abrahmsen L & Oppermann U (2001) Meta-
bolic conversion as a pre-receptor control mechanism
for lipophilic hormones. Eur J Biochem 268, 4113–4125.
12 Jörnvall H, Höög J-O & Persson B (1999) SDR and
MDR: completed genome sequences show these protein
families to be large, of old origin, and of complex
nature. FEBS Lett 445, 261–264.
13 Kallberg Y, Oppermann U, Jörnvall H & Persson B
(2002) Short-chain dehydrogenase ⁄ reductase (SDR)
relationships: a large family with eight clusters common
to human, animal, and plant genomes. Protein Sci 11,
636–641.
14 Lukacik P, Kavanagh KL & Oppermann U (2006)
Structure and function of human 17beta-hydroxysteroid
dehydrogenases. Mol Cell Endocrinol 248, 61–71.
15 Moeller G & Adamski J (2006) Multifunctionality of
human 17beta-hydroxysteroid dehydrogenases. Mol Cell
Endocrinol 248, 47–55.
16 Baker ME (2001) Evolution of 17beta-hydroxysteroid
dehydrogenases and their role in androgen, estrogen
and retinoid action. Mol Cell Endocrinol 171, 211–215.
17 Krishnamurthy N & Sjölander K (2005) Phylogenomic
inference of protein molecular function. Curr Protoc
Bioinformatics. Chapter 6, Unit 6.9.
18 Persson B, Kallberg Y, Bray JE, Bruford E, Dellaporta
SL, Favia AD, Duarte RG, Jörnvall H, Kavanagh KL,
Kedishvili N et al. (2009) The SDR (short-chain dehy-
drogenase ⁄ reductase and related enzymes) nomenclature
initiative. Chem Biol Interact 178, 94–98.
19 Kowalik D, Haller F, Adamski J & Moeller G (2009)
In search for function of two human orphan SDR
enzymes: hydroxysteroid dehydrogenase like 2
(HSDL2) and short-chain dehydrogenase ⁄ reductase-
orphan (SDR-O). J Steroid Biochem Mol Biol 117,
117–124.
20 Bairoch A, Apweiler R, Wu CH, Barker WC, Boeck-
mann B, Ferro S, Gasteiger E, Huang H, Lopez R,
Magrane M et al. (2005) The universal protein resource
(UniProt). Nucleic Acids Res 33, D154–D159.
21 Pruitt KD, Tatusova T & Maglott DR (2007) NCBI
reference sequences (RefSeq): a curated non-redundant
sequence database of genomes, transcripts and proteins.
Nucleic Acids Res 35, D61–D65.
22 Hubbard TJ, Aken BL, Ayling S, Ballester B, Beal K,
Bragin E, Brent S, Chen Y, Clapham P, Clarke L et al.
(2009) Ensembl 2009. Nucleic Acids Res 37, D690–
D697.

23 Pearson WR & Lipman DJ (1988) Improved tools for
biological sequence comparison. Proc Natl Acad Sci
USA 85, 2444–2448.
24 Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ,
Higgins DG & Thompson JD (2003) Multiple sequence
alignment with the Clustal series of programs. Nucleic
Acids Res 31, 3497–3500.
25 Durbin R, Eddy S, Krogh A & Mitchison G (1998)
Biological sequence analysis: probabilistic models of
proteins and nucleic acids. Cambridge University Press,
Cambridge.
26 Eddy SR (1998) Profile hidden Markov models.
Bioinformatics 14, 755–763.