PREDICTIVE TOXICOLOGY - CHAPTER 9

9
Applications of Substructure-Based
SAR in Toxicology
HERBERT S. ROSENKRANZ
Department of Biomedical Sciences,
Florida Atlantic University,
Boca Raton, Florida, U.S.A.
BHAVANI P. THAMPATTY
Department of Environmental and
Occupational Health, Graduate
School of Public Health,
University of Pittsburgh, Pittsburgh,
Pennsylvania, U.S.A.
1. INTRODUCTION
The increased acceptance of SAR techniques in the regulatory
arena to predict health and ecological hazards (1–6) has
resulted in the development and marketing of a number of
SAR programs (7). The approaches are of optimal usefulness
when they are employed as adjuncts to the appropriate
human expertise. In addition to predicting specific toxicological
endpoints, these methodologies, in the hands of an expert,
can also be used to gain insight into the mechanistic basis of
the action of toxicants and thereby allow a more refined
health or ecological risk assessment (8,9).
The authors have no commercial interest in any of the technologies
described in this review.
© 2005 by Taylor & Francis Group, LLC
This review deals with aspects of SAR methodologies
that are based upon substructural analyses that are driven
primarily by statistical constraints [e.g., MULTICASE (10–12)]
as opposed to satisfying predetermined rules [e.g.,
DEREK (13–15), ONCOLOGIC (16,17), “Structural Alerts”
(18)]. It must, however, be made clear that human expertise
is very much involved in most aspects of these knowledge-based
substructural methods (8,9). Thus, the inclusion of
experimental data into the “learning set” that forms the basis
of any SAR model must adhere to previously agreed upon
protocols and data handling procedures (Fig. 1). Moreover,
prior to SAR modeling, the context in which the resulting
model will be used has to be defined, as it will affect the
manner in which the biological/toxicological activities are
encoded and the derived SAR model interpreted.
Thus, it is commonly recognized (7,19) that the induction
of cancers in rodents is one of the most challenging phenomena
to model by SAR techniques. Yet, bearing in mind the
complexity of the phenomenon and the regulatory context in
which SAR predictions were to be used, Matthews and Contrera
(20) of the U.S. Food and Drug Administration, by
encoding the spectrum of activities (i.e., carcinogenicity in
male and/or female rats and/or mice) and devising rules on
how the predictions were to be used, were able to develop a
highly predictive MULTICASE SAR model of rodent carcinogenicity.
It needs to be stressed that the success in developing
the model was primarily the result of the human insight
brought by the investigators (20).
Figure 1 Outline of the SAR approach indicating the interactions
with the human expert.
2. THE ROLE OF HUMAN EXPERTISE

Substructure-based SAR approaches can handle databases in
which activities are expressed categorically, i.e., active, mar-
ginally active, inactive, or in a continuous scale. However, it
is not always a matter of simply inserting data into the model.
Thus, the database for the induction of unscheduled DNA
synthesis is indeed categorical (21) and allows the derivation
of a coherent SAR model (22). On the other hand, the Salmo-
nella mutagenicity database generated under the aegis of the
U.S. National Toxicology Program (23) requires insight into
how to express activities with respect to SAR modeling.
Essentially, in that data set, each chemical is reported with
respect to its ability to induce mutations in five Salmonella
typhimurium tester strains in the presence or in the absence
of several postmitochondrial activation mixtures (S9) derived
from rats, mice, and hamsters induced or uninduced with the
polychlorinated biphenyl mixture Aroclor 1254 (24). Each of
the tester strains has a different specificity with respect to
its response to mutagens. Moreover, the exogenous S9 mix-
tures may contain different levels of cytochrome P450 activat-
ing and deactivating enzymes which may act on the test
chemical and/or its metabolites. If the purpose for deriving
a SAR model is to understand the basis of the mutagenicity
of a class of chemicals, then the Salmonella strain that
is the most responsive to that chemical class should be
used [e.g., the mutagenicity of nitrated polycyclic aromatic
hydrocarbons should be studied in Salmonella typhimurium
TA98 in the absence of S9 (25–27)]. Similarly, if the aim is to
understand the differences in mutagenicity between tester strains
that respond to base substitution mutations and those that
respond to frameshift mutations as a result of covalent binding
to a DNA base, then one might model separately and then compare,
for example, the responses of aromatic amines in Salmonella
strains TA98 and TA100 in the presence of S9 (28). In
such instances, for SAR modeling, the human expert would
select the specific mutagenic potency (e.g., revertants/nmole/plate)
reported for each chemical for the specific strain with or
without S9. Moreover, based upon personal knowledge of the
system and the specific class of chemicals, the expert would
then have to select a cut-off value between mutagens and marginal
mutagens, and between marginal mutagens and non-mutagens.
The expert would then be able to derive an equation
relating mutagenic potency to an SAR unit scale compatible
with the SAR program being used (see below).
If, on the other hand, the purpose of deriving a SAR
model is to identify potential “genotoxic” (i.e., mutagenic) carcinogens,
which is the class of agents associated with risk to
humans (29–33), then one might consider deriving a dozen
or more separate SAR models (e.g., TA100 −S9, TA100 +S9,
TA98, TA1537, etc.) and then devise an algorithm to combine
the results of the different models into a single prediction [see
Refs. (34) and (35)]. This, however, is a tedious and time-consuming
process. Moreover, “genotoxic” carcinogenicity has
not been associated with either a response in a specific tester
strain or with the mutagenic potency in that strain. Rather,
the association is a qualitative one between mutagenicity
in any of the strains and carcinogenicity
in rodents (36). Accordingly, consideration can then be given
to the paradigm that a response in any one of the tester
strains in the absence or the presence of a single S9 preparation
will be sufficient to identify a carcinogenic hazard. Moreover,
since different tester strains may respond differently,
qualitatively as well as quantitatively, to individual chemicals,
the indications of potencies that are used cannot be continuous.
In fact, they must be categorical and the expert may
designate specific criteria for defining a mutagen, e.g., twice
the spontaneous frequency of mutations and a linear dose–response
(37,38).
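The categorical, any-strain paradigm can be sketched as follows. This is a minimal illustration and not the procedure of Refs. (37,38): the record layout, the function names, and the way a "linear dose–response" is tested (a positive least-squares slope with r² above a hypothetical threshold of 0.9) are assumptions made for the example, and the data are invented.

```python
from dataclasses import dataclass


@dataclass
class AmesResult:
    """One strain/S9 condition for one chemical (hypothetical record layout)."""
    strain: str              # e.g. "TA98"
    with_s9: bool            # True if tested in the presence of S9
    spontaneous: float       # revertants/plate in the solvent control
    doses: list[float]       # doses tested, in ascending order
    revertants: list[float]  # revertants/plate observed at each dose


def r_squared_and_slope(x, y):
    """Least-squares r^2 and slope of y on x, using plain arithmetic."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0, 0.0
    return sxy * sxy / (sxx * syy), sxy / sxx


def condition_is_positive(r, fold=2.0, min_r2=0.9):
    """Assumed criteria: at least `fold` x spontaneous and a roughly linear rise."""
    r2, slope = r_squared_and_slope(r.doses, r.revertants)
    return max(r.revertants) >= fold * r.spontaneous and slope > 0 and r2 >= min_r2


def is_mutagen(results):
    """Categorical call: active if ANY strain/S9 condition is positive."""
    return any(condition_is_positive(r) for r in results)


# Invented data for illustration only.
example = [
    AmesResult("TA100", False, 120.0, [0, 10, 30, 100], [118, 125, 121, 130]),
    AmesResult("TA98", True, 30.0, [0, 10, 30, 100], [31, 48, 85, 210]),
]
print(is_mutagen(example))
```

The call is deliberately qualitative: a single positive condition is enough, and the potency in that condition plays no further role.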
Depending upon an understanding of the mechanistic/biological
basis of activity, there have been variations on
the potency metrics. Thus, the Carcinogen Potency Data Base
(CPDB) (39) reports results as TD50 values, i.e., the daily dose
that in a lifetime study will permit 50% of the treated animals
to remain tumor-free. The TD50 value is reported as
mg/kg/day (39–41). However, given the wide range
in molecular weights of the chemicals in a data set (e.g.,
dimethylnitrosamine and benzo(a)pyrene, molecular weights
74 and 252 Da, respectively), for SAR studies that measure
needs to be transformed into mmol/kg/day in order to yield
a meaningful SAR model and the associated generation of
“modulators” (see below) that affect the potency of the SAR
projection.
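A short sketch of the unit transformation, using the two molecular weights quoted in the text. The dose of 1 mg/kg/day is a hypothetical value chosen only to show the size of the effect; it is not a CPDB entry, and the function name is illustrative.

```python
def td50_mg_to_mmol(td50_mg_per_kg_day: float, mol_weight: float) -> float:
    """mg/kg/day divided by g/mol (numerically mg/mmol) gives mmol/kg/day."""
    return td50_mg_per_kg_day / mol_weight


# Molecular weights (Da) as given in the text.
MOL_WEIGHT = {"dimethylnitrosamine": 74.0, "benzo(a)pyrene": 252.0}

# The same mass dose corresponds to very different molar doses.
for name, mw in MOL_WEIGHT.items():
    print(name, round(td50_mg_to_mmol(1.0, mw), 4), "mmol/kg/day")
```

For these two chemicals the same mass dose differs by a factor of about 3.4 on a molar basis, which is why potencies are compared in mmol/kg/day.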
The human expert has to make a further decision: the
definition of a “marginal carcinogen” and a “non-carcinogen.”
Should only chemicals inducing no cancers even at the maximum
tolerated dose (42–44) be considered non-carcinogens, or
should there be a cut-off dose above which tumors, even if
induced, would not be considered biologically or toxicologically
significant given the high dose needed? This would
reflect Paracelsus’ dictum that “it is the dose that makes
the toxin” (45).
For the purpose of SAR modeling of CPDB, we chose cut-off
values of 8 and 28 mmol/kg/day between carcinogens and
marginal carcinogens, and between marginal carcinogens and
non-carcinogens, respectively. Based upon the characteristics
of the MULTICASE SAR methodology, wherein SAR units
≤19 indicate non-carcinogenicity; 20–29 marginal carcinogenicity;
and ≥30 carcinogenicity, this led to the relationship
$$\text{SAR activity} = 18.328 \,\log\left(1/\mathrm{TD}_{50}\right) + 46.55 \qquad (1)$$
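Eq. (1) can be checked numerically against the cut-off values. The sketch below assumes a base-10 logarithm (an inference from the worked values quoted in the chapter, e.g., 50.3 SAR units for 0.62 mmol/kg/day) and assumes that activities are rounded to whole SAR units before a category is assigned; the function names are illustrative.

```python
import math

SLOPE, INTERCEPT = 18.328, 46.55  # constants of Eq. (1)


def sar_units(td50_mmol_per_kg_day: float) -> float:
    """SAR activity from a TD50 in mmol/kg/day (log taken as base 10)."""
    return SLOPE * math.log10(1.0 / td50_mmol_per_kg_day) + INTERCEPT


def td50_from_units(units: float) -> float:
    """Inverse of Eq. (1): TD50 in mmol/kg/day from SAR units."""
    return 10 ** (-(units - INTERCEPT) / SLOPE)


def category(units: float) -> str:
    """Assumption: round to whole SAR units, then apply the scale in the text."""
    u = round(units)
    if u >= 30:
        return "carcinogen"
    if u >= 20:
        return "marginal carcinogen"
    return "non-carcinogen"


for td50 in (0.62, 8.0, 28.0):
    u = sar_units(td50)
    print(td50, round(u, 1), category(u))
```

The two cut-off doses land almost exactly on 30 and 20 SAR units, which is consistent with the equation having been fitted to those boundaries.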
On the other hand, the rodent carcinogenicity database
generated under the auspices of the NTP has been classified
according to its spectrum of activities (29). The reason for that
classification is derived from the realization that agents that
are carcinogenic at multiple sites in both genders of rats and
mice are generally found to be “genotoxic” (i.e., possess mutagenicity
and/or structural alerts for DNA reactivity) (29,30).
These characteristics are associated with a greater carcinogenic
risk to humans than chemicals that are restricted to
inducing cancers in a single tissue of a single gender of a
single species (29,33).
That spectrum of carcinogenicity can be captured by having
the scale of carcinogenic activities (i.e., SAR units) reflect
it, i.e., 10 SAR units for non-carcinogens; 20 for “equivocal”
carcinogens; 30 for chemicals carcinogenic at only a single site
in a single sex of a single species; 40 for chemicals carcinogenic
at a single site in both sexes of one species; 50 for
chemicals carcinogenic to a single species but at two or more
sites; and 60 for chemicals carcinogenic to mice and rats at
one or more sites (46).
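The spectrum scale can be written down as a small encoding function. This is a sketch under stated assumptions: the input format (a set of species/sex/site tuples for clear tumor findings plus an "equivocal" flag) and the function name are hypothetical, and only rats and mice are assumed to be present.

```python
def spectrum_units(findings, equivocal=False):
    """Map a spectrum of rodent tumor findings to the 10-60 SAR unit scale.

    findings: set of (species, sex, site) tuples with clear tumor induction.
    equivocal: True if the only evidence was judged equivocal.
    """
    if not findings:
        return 20 if equivocal else 10
    species = {f[0] for f in findings}
    sexes = {f[1] for f in findings}
    sites = {f[2] for f in findings}
    if len(species) >= 2:   # carcinogenic to mice and rats
        return 60
    if len(sites) >= 2:     # one species, two or more sites
        return 50
    if len(sexes) >= 2:     # one species, one site, both sexes
        return 40
    return 30               # single site, single sex, single species


print(spectrum_units({("rat", "male", "liver")}))
print(spectrum_units({("rat", "male", "liver"), ("mouse", "female", "lung")}))
```

The checks run from the broadest spectrum downward, so a chemical is always assigned the highest category it qualifies for.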
Because the spectrum of activities as well as the poten-
cies reflect different aspects of the carcinogenic phenomenon,
algorithms were developed to combine the results of the
different SAR models of rodent carcinogenicity into a single
prediction model (34,35). Although the approach used
heretofore is a Bayesian one (47), there is no reason to
suppose that other approaches (neural networks, genetic
algorithm, rule learners) are not equally effective (e.g., see
Refs. 48,49).
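To make the idea of a Bayesian combination concrete, the sketch below updates a prior probability of carcinogenicity with the outcome of each model, using that model's sensitivity and specificity as likelihoods. It is not the algorithm of Refs. (34,35,47): it assumes the models are conditionally independent (a naive Bayes simplification), and the prior and performance figures are hypothetical.

```python
def combine_predictions(prior, models):
    """Posterior probability of activity after a series of model outcomes.

    prior:  prior probability that the chemical is active (0 < prior < 1).
    models: iterable of (predicted_active, sensitivity, specificity).
    """
    odds = prior / (1.0 - prior)
    for predicted_active, sens, spec in models:
        if predicted_active:
            odds *= sens / (1.0 - spec)      # likelihood ratio of a positive call
        else:
            odds *= (1.0 - sens) / spec      # likelihood ratio of a negative call
    return odds / (1.0 + odds)


# Hypothetical battery: two positive calls and one negative call.
battery = [(True, 0.8, 0.7), (True, 0.6, 0.9), (False, 0.7, 0.6)]
print(round(combine_predictions(0.5, battery), 3))
```

A highly specific model that returns a positive call moves the estimate far more than a permissive one, which is the practical reason for knowing each model's performance before combining them.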
Obviously, this integrative approach is not restricted only
to SAR models of rodent carcinogenicity. It could include
projections obtained with other SAR models related to
mechanisms of carcinogenicity, i.e., SAR projections of carcinogenicity
combined with the prediction of the in vivo induction
of micronuclei (50) and of inhibition of gap junctional
intercellular communication (51). The same approach
can also be explored to combine SAR projections with the experimental
results of surrogate tests for carcinogenicity (e.g.,
induction of chromosomal aberrations and of mutations at the
tk+/− locus of mouse lymphoma cells). Finally, combining the
results from different SAR approaches, e.g., knowledge-based
(e.g., MULTICASE) with rule-based [e.g., DEREK
(13–15) or ONCOLOGIC (16,17)], is a promising avenue that
is worthy of further investigation.
The point of the above examples is that human familiarity
with and expertise in the biological phenomenon under
investigation is essential for the maximal utilization of SAR
techniques.
Another instance in which human expertise was
essential for the development of a coherent SAR model
involves allergic contact dermatitis (ACD) in humans. In that
endeavor, initial human insight was needed at several crucial
steps:
1. The recognition that, in spite of common practice and
assumption, human and guinea pig ACD data were
not equivalent and could not be pooled to develop a
coherent SAR model (52).
2. That the inclusion of “case reports” among experimentally
determined human ACD data degraded
the performance of the SAR model unless the
number of independent “case reports” was greater
than 7 (53).
3. That an ACD response calibration based upon the
challenge dose, the extent of the response, and the
proportion of responders among challenged humans
had to be developed to provide a potency scale (54).

Once these pre-SAR data handling issues
were resolved, a coherent and highly predictive
SAR model of human ACD was developed (54). But
again, it required the participation and collaboration of
experimental immunologists and SAR experts.
The same considerations entered in developing other
models, e.g., human developmental toxicity which depended
upon: (1) the acceptance of the results of an expert consensus
panel, and (2) the rejection of results of borderline signifi-
cance (55). Of course, it was also the reason for the success
of the development of the aforementioned highly predictive
SAR model of rodent carcinogenicity by Matthews and
Contrera (20).
3. MODEL VALIDATION: CHARACTERIZATION
AND INTERPRETATION
Irrespective of the SAR paradigm employed, knowledge and
understanding of the performance of the resulting SAR model
is crucial to its deployment. This is especially so as no SAR
model is perfectly predictive. Yet, understanding a model’s
limitations is needed if it is to be used and interpreted. The
most widely accepted measure of a model’s performance is
the concordance between experimentally determined results
and SAR-derived predictions of chemicals external to
the model. This parameter, in turn, is a function of a model’s
sensitivity (correctly predicted actives/total actives) and
specificity (correctly predicted inactives/total inactives).
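The three performance measures can be computed directly from paired experimental and predicted calls. A minimal sketch, assuming binary (active/inactive) calls and at least one active and one inactive chemical in the tester set; the function name and the example calls are illustrative.

```python
def predictivity(experimental, predicted):
    """Sensitivity, specificity, and concordance from paired boolean calls."""
    pairs = list(zip(experimental, predicted))
    true_pos = sum(1 for e, p in pairs if e and p)
    true_neg = sum(1 for e, p in pairs if not e and not p)
    actives = sum(1 for e, _ in pairs if e)
    inactives = len(pairs) - actives
    return {
        "sensitivity": true_pos / actives,        # correctly predicted actives / total actives
        "specificity": true_neg / inactives,      # correctly predicted inactives / total inactives
        "concordance": (true_pos + true_neg) / len(pairs),
    }


# Invented tester set: four actives followed by four inactives.
experimental = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, False, False, True, True]
print(predictivity(experimental, predicted))
```

Concordance is the prevalence-weighted average of sensitivity and specificity, so it should always be reported together with the proportion of actives in the tester set.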
The most direct and preferable approach to determine
these parameters is to randomly remove from the learning
set a number of chemicals to be used as the “tester set.” The
remaining chemicals can be used to develop the SAR model.
The resulting model’s predictivity parameters and their statistical
significance can then be determined by challenge with
this external “tester set.”
However, most frequently that approach cannot be taken
with respect to SAR models describing toxicological phenomena.
This derives from the fact that the performance of a
SAR model depends upon its size (i.e., the number of chemicals
in the database) (10,56–58). For most databases of toxicological
phenomena, there is a paucity of experimental results
for chemicals. Accordingly, the predictive performance of the
model will be negatively affected by removal of chemicals to
be used as the external “tester set.” Because of this consideration,
cross-validation and “leave-one-out” approaches have
been used (59). Thus, it has been demonstrated that iteratively
removing a random subset of chemicals (e.g., 5% of the total),
using the remaining ones (i.e., 95%) as the learning set,
repeating the process (e.g., 20 times for a 5% removal), and
determining the cumulative predictivity parameters is an
acceptable approach (59).
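The iterative removal procedure can be sketched as below. Two assumptions are made that the text does not spell out: the 20 removed subsets are taken to be disjoint (so every chemical is predicted exactly once by a model that never saw it), and a trivial majority-class "model" stands in for the SAR program so that the example runs. All names and data are hypothetical.

```python
import random


def iterative_removal(chemicals, activities, fit, predict, n_rounds=20, seed=0):
    """Hold out about 1/n_rounds of the chemicals per round; pool the outcomes."""
    order = list(range(len(chemicals)))
    random.Random(seed).shuffle(order)
    # Assumption: disjoint held-out subsets (5% each when n_rounds is 20).
    folds = [order[i::n_rounds] for i in range(n_rounds)]
    counts = {"tp": 0, "fn": 0, "tn": 0, "fp": 0}
    for held_out in folds:
        removed = set(held_out)
        learning = [i for i in order if i not in removed]
        model = fit([chemicals[i] for i in learning],
                    [activities[i] for i in learning])
        for i in held_out:
            predicted = predict(model, chemicals[i])
            if activities[i]:
                counts["tp" if predicted else "fn"] += 1
            else:
                counts["fp" if predicted else "tn"] += 1
    return counts


# Stand-in learner: always predicts the majority class of the learning set.
def fit_majority(chems, acts):
    return 2 * sum(acts) >= len(acts)


def predict_majority(model, chem):
    return model


chemicals = [f"chem_{i}" for i in range(100)]  # hypothetical identifiers
activities = [i < 60 for i in range(100)]      # 60 actives, 40 inactives
counts = iterative_removal(chemicals, activities, fit_majority, predict_majority)
total = sum(counts.values())
print(counts, "concordance =", (counts["tp"] + counts["tn"]) / total)
```

The pooled counts are what the text calls the cumulative predictivity parameters; sensitivity, specificity, and concordance follow from them in the usual way.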
In most substructure-based SAR approaches, the significant
structural determinant (e.g., biophores and toxicophores)
identified will be a substructure enriched among active chemicals.
Accordingly, the presence of the toxicophore is associated
with a probability of activity and a baseline potency (Table 1;
Fig. 2).
While biophores/toxicophores are the significant as well
as the principal determinants of biological and toxicological
activity, toxicologists as well as health risk assessors are well
aware that not all chemicals in a certain chemical class are
toxicants even though the majority may be. Thus, only
83.3% of nitroarenes tested are Salmonella mutagens and
only 74.4% of chloroarenes tested are reported to be rodent
carcinogens (60). This situation is reflected in the fact that
only 74% of the chemicals containing the toxicophore NH2–c=cH–
(Fig. 2) are rodent carcinogens. The question then
arises whether SAR approaches can be used to explain this
dichotomy as well as to provide a basis for the difference in
projected potencies. In MULTICASE SAR, this discrimination
is provided by modulators (10–12). Thus each biophore/toxicophore
is associated with a probability of activity and a
basal potency. For the illustration in Fig. 2, the presence of
the toxicophore is associated with a 75% probability of
carcinogenicity and a potency of 50.3 SAR units. Based
upon Eq. (1), 50.3 SAR units correspond to a TD50 value
of 0.62 mmol/kg/day. In MULTICASE, each biophore/toxicophore
may be associated with a group of modulators
(Table 2) which determine whether the potential for activity
is realized and, if so, to what extent. Modulators are primarily
structural elements that can either increase (Fig. 3), decrease
(Fig. 4), or abolish (Fig. 5) the potential potency associated
with a toxicophore. Additionally, the potential of a toxicophore
can be negated by the presence in the molecule of deactivating
moieties that are derived from inactive molecules in
the data set. The latter are not associated with chemicals that
are at the origin of the toxicophore (e.g., Fig. 6).
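The arithmetic behind Figs. 3 and 4 can be reproduced from the basal potency and the modulator values in Table 2. The sketch treats modulators as additive terms and assigns their signs by inference from the figure legends (an augmenting modulator of +17.7 for benzidine, a decreasing modulator of −23.2 for 2,6-dichloro-p-phenylenediamine); both the additivity and the signs are assumptions, and the inverse of Eq. (1) is taken with a base-10 logarithm.

```python
BASAL_POTENCY = 50.3  # SAR units for the toxicophore NH2-c=cH- (Fig. 2)


def projected_potency(basal, modulators):
    """Assumed additive model: basal potency plus signed modulator terms."""
    return basal + sum(modulators)


def td50_from_units(units, slope=18.328, intercept=46.55):
    """Inverse of Eq. (1), TD50 in mmol/kg/day (base-10 logarithm assumed)."""
    return 10 ** (-(units - intercept) / slope)


benzidine = projected_potency(BASAL_POTENCY, [+17.7])  # modulator no. 11
dichloro = projected_potency(BASAL_POTENCY, [-23.2])   # modulator no. 9

for name, units in (("benzidine", benzidine), ("2,6-dichloro-p-PDA", dichloro)):
    print(name, round(units, 1), "SAR units,",
          round(td50_from_units(units), 2), "mmol/kg/day")
```

Under these assumptions the results come out at about 68.0 and 27.1 SAR units, close to the 67.9 and 27.1 units quoted in the legends to Figs. 3 and 4. Modulators that abolish activity outright (Fig. 5) are not covered by this simple sum.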
In addition to being substructural elements, modulators
may also be physical chemical or quantum chemical in nature.
Thus, the rat-specific carcinogenic toxicophore associated
with the activity of the chloroaniline derivative shown in
Fig. 7, which defines a non-genotoxic rat carcinogenic species,
is modulated by −9 × (water solubility of the chemical). In
effect, this can be interpreted (see legend to Fig. 7) to mean that the
greater the lipophilicity (i.e., the lower the water solubility)
of a chemical containing that toxicophore, the greater its
carcinogenic potency. Mechanistically, this may reflect that
lipophilicity increases residence time in body tissues (e.g., storage
in adipose tissues) and thus augments the effective dose.
Table 1 Some of the Major Toxicophores Associated with Rodent Carcinogenicity: Non-congeneric Data Base
Toxicophore                         Number of
1–2–3–4–5–6–7–8–9–10                Fragments  Inactives  Marginals  Actives  Number
NH2–c=cH–                                  65         15          3       47       1
NH–C=N–                                     9          1          0        8       2
[Cl–] ⟨–4.0Å–⟩ [Cl–]                       21          2          0       19       3
CH2–N–CH2–                                 29          7          0       22       4
O–CH=                                       7          0          0        7       5
N–C=                                        5          0          0        5       6
O–C=                                       14          1          0       13       7
O^–CH2–                                     6          0          0        6       8
Br–CH2–                                     5          0          0        5       9
cH=cH–c=cH–cH= ⟨3–Cl⟩                      14          3          0       11      10
PO–O                                       11          1          0       10      11
CH3–N–c=cH– ⟨2–CH3⟩                         6          1          0        5      12
cH=c–cH=cH–c <= ⟨2–NH⟩                      6          1          1        4      13
Cl–CH2–                                    26          4          1       21      14
c.″–CO–c.=                                  7          0          0        7      15
NO2–C=CH–                                  14          0          0       14      16
cH=c–cH=cH–c= ⟨2–CH3⟩                       5          0          0        5      17
CH3–C=cH–cH=cH–                             7          0          1        6      18
Toxicophore no. 1 is shown in Figs. 1–6, 18, and 19; no. 17 in Fig. 18.
“c” and “C” refer to aromatic and acyclic atoms, respectively; c. indicates a carbon atom shared by two rings; O^
indicates an epoxide; c″ indicates a carbon atom connected by a double bond to another atom. ⟨3–Cl⟩ indicates a
chlorine atom substituted on the third non-hydrogen atom from the left. 4.0Å indicates a 2-D 4 Ångström distance descriptor.
In toxicophore no. 18, the second carbon from the left is shown as unsubstituted. This means that it can be substituted with any atom
except hydrogen. On the other hand, for this toxicophore, the last carbon on the right is shown with an attached hydrogen. This means it
cannot be substituted by any other atom but hydrogen. Finally, in toxicophore no. 10, the third non-hydrogen atom from the left is shown
as unsubstituted. It can only be substituted by a chlorine atom.
An understanding of the nature of the toxicophores and
associated modulators can provide insight regarding the
mechanistic basis of the toxicity (see below). This knowledge
can also be used to modify the chemical’s structure in order
to decrease or abolish the unwanted toxic effects inherent in
a beneficial molecule without affecting its beneficial properties (also see
below).
Figure 2 Prediction of the carcinogenicity in rodents of m-cresidine.
The presence of toxicophore A is associated with a 75% probability
of carcinogenicity and a basal potency of 50.3 SAR units,
which corresponds to a TD50 value of 0.62 mmol/kg/day [see Eq. (1)].
In addition to identifying toxicophores, MULTICASE
also has the capability of identifying substructures that,
although not statistically significant, may be indicative of biological
or toxicological activity (Fig. 8). Such structures should
be scrutinized by the human expert to determine whether
they are relevant to a carcinogenic potential. Such an examination
should include a search of databases to determine
whether other chemicals containing that substructure are
endowed with that or related potentials. An in-depth study
of these “unique” structures is especially appropriate if they are
derived from chemicals possessing great potency, e.g.,
tetrafluoro-m-phenylenediamine with a TD50 value of
0.50 mmol/kg/day (Fig. 8).
Table 2 Modulators Associated with the Toxicophore NH2–c=cH–
1–2–3–4–5–6–7–8–9–10                  QSAR  Number
2D [N–] ⟨–2.6Å–⟩ [N=]                 29.1       1
CO–NH2                                28.6       2
N=CH–C=                               18.9       3
NH–C=CH–                              15.4       4
n=c–cH= ⟨2–NH2⟩                       19.0       5
c=cH–c=c–                             23.8       6
cH=c.–N=C–                            32.3       7
OH–CO–c=c <–                          20.1       8
Cl–c=cH–c <=                          23.2       9
cH=cH–c=c <– ⟨3–CH3⟩                  12.1      10
cH=c–cH=cH–c <=                       17.7      11
CH3–O–c=cH–c <= ⟨3–c=⟩                 0.7      12
CH3–O–c=cH–c <=cH–                     0.7      13
NH2–c=cH–cH=c–NH–                     20.1      14
NH2–c=cH–cH=c–NH2                     25.5      15
NH2–c=cH–cH=c–cH=                     25.1      16
NH2–c=cH–cH=cH–c=                     35.3      17
OH–CO–c=c <– ⟨3–cH=⟩                   5.0      18
OH–CO–c=c–cH=                          5.0      19
OH–CO–c=c <–cH=                        5.0      20
OH–CO–c=c <–cH= ⟨3–CH=⟩                5.0      21
These modulators are associated with toxicophore no. 1 of Table 1 (i.e., non-congeneric
database). Modulator no. 11 is shown in Fig. 3; no. 9 in Figs. 4 and 5; no. 6 in Fig. 5;
no. 11 in Fig. 18.
For an explanation of the significance of the structural moieties, see the legend to
Table 1.
One of the characteristics that differentiates SAR methods
used in drug discovery from those used in toxicology
derives from the fact that the former deal primarily with congeneric
chemicals while the latter are concerned with non-congeneric
ones. This is reflected by the fact that in medicinal
chemistry one is most frequently dealing with a specific receptor
or the active site of an enzyme (9). On the other hand, with
respect to toxicological phenomena, the same endpoint can
arise as a result of a multitude of pathways and can be caused
by many different classes of chemicals (e.g., carcinogenesis,
developmental toxicity). Given that SAR methods used in toxicology
must be able to handle many different chemical classes
within a single database, it is essential that the method
also be able to identify chemical structures that do not fall
within the domain shared by chemicals that give rise to a
common toxicophore. MULTICASE accomplishes this in two
ways: (a) by identifying differences in the molecular environment,
and (b) by recognizing (“unknown”) structures that are
not present in the learning set under investigation.
Figure 3 Prediction of the carcinogenicity in rodents of benzidine.
The basal potency associated with toxicophore A (i.e., 50.3 units) is
augmented by the presence of modulator B. The projected potency of
67.9 SAR units corresponds to a TD50 value of 0.07 mmol/kg/day.
Figure 4 The projected marginal potency of 2,6-dichloro-p-phenylenediamine.
The carcinogenic potency inherent in toxicophore A is
greatly decreased by modulator B. A carcinogenic potency of 27.1
SAR units corresponds to a TD50 value of 11.5 mmol/kg/day. That
potency is defined as “marginal” (see text).
Figure 5 The prediction of the lack of carcinogenicity of
2,2′,5,5′-tetrachlorobenzidine. Although the presence of toxicophore A
endows the molecule with carcinogenic potential, the presence of
the inactivating modulators B and C abolishes it.
The presence of “unknown” moieties may be recognized
in molecules that contain recognized toxicophores. In that
situation, they have the potential of being modulators which
either augment or decrease the potential toxicity. Hence, the
presence of such a moiety might introduce an element of
uncertainty in the prediction. However, overall, that type of
uncertainty is taken into consideration when determining
the predictive performance of the model, especially when a
cross-validation approach is used.
Figure 6 The projected lack of carcinogenicity of anthranilic acid.
The carcinogenic potential associated with toxicophore A is negated
by a deactivating moiety D derived from non-carcinogens external
to the molecules associated with the toxicophore.
Figure 7 Predicted carcinogenicity in rats of 3-(1,1,1-trichloro)propyl-p-chloroaniline.
The prediction is based on the toxicophore
shown in bold. The potency is modulated by −9 × [water solubility].
The potency of 63.1 units corresponds to a TD50 value of
0.12 mmol/kg/day. The analogous 3-propyl-p-chloroaniline has a
water solubility of 4.18 (i.e., it is less lipophilic) and this results in
a contribution of −37.4 for a projected potency of 49.5 SAR units
or a TD50 value of 0.54 mmol/kg/day, i.e., the decreased lipophilicity
results in decreased carcinogenic potency.
On the other hand, chemicals may be devoid of identifiable
toxicophores and still possess an “unknown” moiety
(Fig. 9). In that situation, the unknown could possibly be a
toxicophore that might endow the molecule with toxicological
potential. When faced with such a situation, it is advisable to
conduct a search for molecules external to the data set that
contain such a moiety and are also devoid of toxicophores, to
determine whether they have been tested in the same or a
related assay system. Thus, for example, the chemical may
not have been tested for mutagenicity in Salmonella, but it
might have been tested for its ability to induce mutation in
E. coli WP2 uvrA or error-prone DNA repair (37,38,61). Methods
for determining the relatedness of such assays have been
described (47,62). With respect to the molecule shown in
Fig. 9, it has been reported that carcinogenic arylamine derivatives,
when substituted with sulfonates, show decreased
intestinal absorption and hence abolished carcinogenicity
(63–66), thus decreasing the level of concern that the
substance in Fig. 9 is a carcinogen.
Figure 8 The identification of a moiety in 2,4-difluoro-N-methylaniline
that is present once in the data set. However, the molecule
containing it (tetrafluoro-m-phenylenediamine) is a carcinogen with
a TD50 value of 0.50 mmol/kg/day. Accordingly, this N-methylaniline
derivative must be examined further.
The identification of differences in the molecular environment
is a more subtle exercise. It might derive from the
presence of a toxicophore and a warning by the program that
in the test substance it exists in a different milieu (Fig. 10). To
ascertain the appropriateness of that determination requires
the SAR system to be able to provide documentation, i.e., the
nature of the chemicals that give rise to the toxicophore. SAR
systems that cannot provide that information are at a disadvantage.
Thus “human” examination of the difference in
environments between the test chemical described in Fig. 10
and the chemicals that gave rise to that toxicophore indicates
that indeed the environments are very different (Fig. 11) and
the program’s determination (Fig. 10) is warranted.
Figure 9 Prediction of the lack of carcinogenicity of 1,5-naphthalenedisulfonic
acid. However, the prediction has an element of
uncertainty because of the presence of the moieties “unknown” to
the model. It is known, however, that in other instances the
sulfonate moiety facilitates excretion and thereby inhibits carcinogenicity.
(From Refs. 63 to 66.)
On the other hand, the determination of differences in
environment reported in Fig. 12 may not be justified, as the
test chemical, 18-crown-6, can be biotransformed to
an acyclic structure that bears similarities to the structures
that gave rise to the toxicophore (Fig. 13). Thus, in this
instance, the “human” expert overrules the SAR program’s
analysis and confirms the mutagenic potential of that
chemical.
Figure 10 The prediction of the inability of 18-crown-6 to
induce sister chromatid exchanges in vitro. The structure of
18-crown-6 is shown in Fig. 11.
Figure 11 Structures which are the origin of the toxicophore
associated with the induction of sister chromatid exchanges (see
Fig. 10). The four structures are clearly different from 18-crown-6.
Finally, even when the SAR program does not recognize
differences in environments, the “human” expert may do so.
Thus curcumin is predicted to induce α2μ-globulin-associated
nephropathy (67) by virtue of the presence of a toxicophore
(Fig. 14), which is present in six molecules of the data set,
all of which are inducers of α2μ-globulin-associated nephropathy.
The SAR program does not detect a difference in environment
(Fig. 14). Yet, a comparison of the molecules in the
learning set with curcumin indicates that the molecular
environments are quite dissimilar (Fig. 15). In the absence
of experimental data regarding the induction of this nephropathy
by curcumin or structurally related molecules, the
prediction (Fig. 14) is overruled. This illustrates the need to
examine the basis of all SAR predictions.
Figure 12 The prediction of the potential of 18-crown-6 to
induce mutations at the tk+/− locus of mouse lymphoma cells. The
structure of 18-crown-6 as well as of the seven molecules that
gave rise to the toxicophore are shown in Fig. 13. For an explanation
of the structure of the toxicophore, see the legend to Table 1.
As an additional example, we might examine the predicted
carcinogenicity in mice of epitholone A (Fig. 16).
Epitholone A, an inhibitor of tubulin polymerization, is a promising
cancer chemotherapeutic adjunct that may have the
potential to replace Taxol® in situations where tumors have
become resistant to Taxol (68,69).
However, examination of the basis of the prediction of
carcinogenicity (Fig. 16) indicates that the molecules in the
learning set containing that toxicophore all contain other moieties
(amino, hydrazine, nitro) (Fig. 17). Each of these has been
associated with carcinogenicity. Epitholone A does not contain
any of them. Thus, in this instance, the toxicophore, albeit
statistically significant, is in fact an artifact. Based upon these
analyses, the “human expert” would agree with the SAR
model-generated prediction, which is accompanied by a warning
regarding the “environment.” Obviously, in the above
examples, the human expertise can only be maximally effective
if the SAR method provides the necessary documentation.
Figure 13 Structures of molecules that are at the origin of the
toxicophore associated with the potential to induce mutations at
the tk+/− locus of mouse lymphoma cells.
As mentioned previously, the predictive performance of
an SAR model is dependent upon the size and chemical diversity
of the learning set (56–58). It follows
that the number of predictions accompanied by “warnings”
of the presence of “unknown” moieties will be a function of
the size of the learning set (57,58). This relationship can be
expressed as the informational content of an SAR model. It
is defined as 100 − (percent of predictions accompanied by
“warnings”). In practice, this value is determined by challenging
a SAR model with 10,000 chemicals representing the
“universe of chemicals” and determining the number of predictions
accompanied by such warnings (58). This also identifies
the prevalence in the “universe” of moieties absent from the
model and suggests that experimental data on such chemicals
be identified and the data included in a future model.
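As a worked illustration, the informational content can be computed from the outcome of such a challenge. The sketch takes the definition to be 100 minus the percentage of predictions accompanied by warnings; the count of 1,500 warnings is a hypothetical figure, and the function name is illustrative.

```python
def informational_content(n_predictions: int, n_with_warnings: int) -> float:
    """100 minus the percentage of predictions flagged with a warning."""
    return 100.0 - 100.0 * n_with_warnings / n_predictions


# Hypothetical challenge: 10,000 chemicals, 1,500 predictions with warnings.
print(informational_content(10_000, 1_500))
```

A model whose learning set covers more of the chemical "universe" raises fewer warnings and therefore scores closer to 100.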
Since SAR programs in use in toxicology may consist of
prepackaged programs and include specific SAR models,
there is a tendency among some users not to evaluate
further either the SAR paradigm resident therein or the
predictive performance of the resultant SAR model. This
may negate the usefulness of the methodology, its applicability
to a specific situation, and its regulatory acceptance
(6). Thus, the predictive performance of a
model [i.e., concordance between experimental
and predicted results; sensitivity and specificity (determined
as previously described)] must be known not only in order to
make individual predictions, but also when applying the projections
to hazard identification or when devising
rational combinations of SAR models, or of a SAR model
coupled with certain experimental assays, so as to make
the exercise meaningful.
Figure 14 An example of a prediction subsequently overruled.
The SAR model predicts that curcumin induces α2μ-globulin-associated
nephropathy in male rats. However, a comparison of the
structure of curcumin with the structures of the six chemicals at
the origin of the toxicophore (see Fig. 15) indicates that they differ
significantly. In this instance, the human expert overruled the
model’s prediction.
Figure 15 Comparison of curcumin with the structures of chemicals
that contain the same toxicophore (see Fig. 14). The toxicophore
is shown in bold. A: curcumin; B: 3,5,5-trimethylhexanoic acid
(TMHA); C: γ-lactone of TMHA; D: 3,5,5-trimethylcyclohexanone;
E: methylisobutylketone; F: isophorone; G: isobutyl ketone. Chemicals
B–G have been determined experimentally to induce
α2μ-globulin-mediated nephropathy.
Moreover, in order to allow for maximal human input in
the analyses, it is not sufficient to receive a message that the
test molecule’s structure or domain is not fully covered by the
model. Even if the program indicates that the test molecule
falls within the domain, this may need verification. Accordingly,
the human expert must know the nature of the chemicals in
the learning set, as illustrated, for example, in Figs. 10–17.
Figure 16 Prediction of the carcinogenicity in mice of epitholone
A. The structure of epitholone A (toxicophore shown in bold) is given
in Fig. 17.