Journal of Chromatography A 1660 (2021) 462666


Prediction of the chromatographic hydrophobicity index with
immobilized artificial membrane chromatography using simple
molecular descriptors and artificial neural networks
Krzesimir Ciura a,b,∗, Strahinja Kovačević c, Monika Pastewska a, Hanna Kapica a, Martyna Kornela a, Wiesław Sawicki a
a Department of Physical Chemistry, Medical University of Gdańsk, Aleja Gen. Hallera 107, Gdańsk 80-416, Poland
b QSAR Lab Ltd., Trzy Lipy 3 St., Gdańsk 80-172, Poland
c Department of Applied and Engineering Chemistry, Faculty of Technology Novi Sad, University of Novi Sad, Bulevar cara Lazara 1, Novi Sad 21000, Serbia

Article info

Article history:
Received 22 September 2021
Revised 27 October 2021
Accepted 28 October 2021
Available online 5 November 2021

Keywords:
Quantitative structure–retention relationships
Chemometrics
IAM-HPLC
Artificial neural networks

Abstract
Screening of physicochemical properties should be considered one of the essential steps in the drug discovery pipeline. Among the available methods, biomimetic chromatography with an immobilized artificial
membrane is a powerful tool for simulating interactions between a molecule and a biological membrane.
This study developed a quantitative structure–retention relationships model that would predict the chromatographically determined affinity of xenobiotics to phospholipids, expressed as a chromatographic hydrophobicity index determined using immobilized artificial membrane chromatography. A heterogeneous
set of 261 molecules, mostly showing pharmacological activity or toxicity, was analyzed chromatographically to realize this goal. The chromatographic analysis was performed using the fast gradient protocol
proposed by Valko, where acetonitrile was applied as an organic modifier. Next, quantitative structure–
retention relationships modeling was performed using multiple linear regression (MLR) methods and artificial neural networks (ANNs) coupled with genetic algorithm (GA)-inspired selection. Subsequently, the
selection of the best ANN was supported by statistical parameters, the sum of ranking differences approach with the comparison of rank by random numbers and hierarchical cluster analysis.
© 2021 The Author(s). Published by Elsevier B.V.
This is an open access article under the CC BY license.

1. Introduction
In the early stages of drug discovery, each drug candidate's physicochemical properties should be determined in addition to the biological activity screening [1]. Biomimetic chromatography with an immobilized artificial membrane (IAM) can be used to assess affinity for phospholipids because phosphatidylcholine head groups are present on the surface of the stationary phase [2,3]. Therefore, IAM–high-performance liquid chromatography (HPLC) can mimic the lipid membrane monolayer. The first HPLC columns with an IAM were introduced by Pidgeon et al. in 1989 [4]. Currently, only one type of IAM column, IAM.PC.DD2, is available; it is supplied by Regis Technologies. IAM-HPLC has been successfully applied in phospholipid affinity studies of various drug classes, including beta blockers [37], calcium channel blockers [38], local anaesthetics [39], biogenic amines [40] and sets of structurally non-related basic, acidic and neutral drugs [5,31]. IAM-HPLC has also been applied to the prediction of complex biological properties, such as blood–brain barrier permeability [5], oral absorption [6], volume of distribution [7], skin permeation [8], and cardiotoxicity [9]. Furthermore, IAM-HPLC plays an essential role in toxicity and ecotoxicity studies [10]. It is worth emphasizing that, among non-cell-based methods, chromatographic approaches are particularly well suited to high-throughput screening because modern HPLC systems are highly automated and widely available in academia and pharmaceutical companies [11,12].
This study proposes quantitative structure–retention relationships (QSRR) models, which allow the prediction of the chromatographically determined affinity of xenobiotics to phospholipids. A chromatographic hydrophobicity index with an immobilized artificial membrane (CHIIAM) was used as retention data to develop models whose output allows a quick and easily interpretable comparison of phospholipid affinities with those of commercially available drugs. A heterogeneous set of 261 molecules, mostly showing pharmacological activity or toxicity, was analyzed under IAM-HPLC conditions. QSRR models were constructed using multiple linear regression (MLR) and artificial neural networks (ANNs) coupled with genetic algorithm (GA)-inspired selection.

∗ Corresponding author at: Department of Physical Chemistry, Medical University of Gdańsk, Aleja Gen. Hallera 107, Gdańsk 80-416, Poland.
E-mail address: (K. Ciura).

ANN regression modeling was carried out in Statistica version 12 using an automated network search (ANS) approach. Feedforward multilayer perceptron (MLP) and radial basis function (RBF) networks were obtained. The training algorithm was Broyden–Fletcher–Goldfarb–Shanno (BFGS). The number of hidden units was allowed to range from 3 to 30. The inputs were the same as the independent variables in the MLR model, and the data were split into training and test sets in the same way as for the MLR model. Identity, logistic, tanh, exponential and sine functions were used during network training. The number of training cycles varied with the network configuration. In total, 1000 networks were trained. ANN modeling was performed on the raw data because modeling of the normalized data did not yield any acceptable ANN model. In addition to the training set and the external test set, ANN modeling required the training set to be split further into an additional test set and a validation set. This division was carried out randomly by the software, with 70% of the samples assigned to the training set, 15% to the test set, and 15% to the validation set. The generalization error was determined using the test set, whereas the validation set was used to find the best ANN configuration and training parameters by comparing the validation set error with the training set error during training [17]. A global sensitivity analysis (GSA) was carried out to determine the significance of each input variable in each ANN model based on the value of the GSA index. Generally, if the GSA index is greater than 1, the variable should be kept in the model [18].
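The 70/15/15 random partition described above can be sketched in a few lines of Python. This is an illustration only: Statistica performs the split internally, and the function name `split_indices`, the seed and the rounding rule are assumptions made here, not details taken from the study.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly partition sample indices into training, test and validation parts.

    The rounding rule (round the first two parts, give the remainder to the
    third) is an assumption; the software used in the study may differ.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_test = round(fractions[1] * n_samples)
    train = indices[:n_train]
    test = indices[n_train:n_train + n_test]
    validation = indices[n_train + n_test:]
    return train, test, validation


# Example with 197 samples, the training-set size reported in Table 1.
train_idx, test_idx, validation_idx = split_indices(197)
print(len(train_idx), len(test_idx), len(validation_idx))
```

Fixing the seed makes the partition reproducible, which matters when several candidate networks are compared on the same subsets.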
The rigorous internal and external validation of the ANN models was done by calculating the following statistical parameters: determination coefficient (R2), adjusted determination coefficient (R2adj), cross-validation determination coefficient (R2cv), F-test, RMSE, and standard deviation of the cross-validation (SDPRESS). For the external validation, the following additional parameters were calculated: determination coefficient of the distribution of the residuals (R2res), predictive squared correlation coefficients (Q2F1 and Q2F2), concordance correlation coefficient (CCC), RMSEP, average absolute error (AAE), and standard deviation (SD). The external validation parameters were calculated using the XternalValidationPlus_1.2 program.
The networks' similarities were examined using hierarchical cluster analysis (HCA) based on Ward's algorithm and Euclidean distances. The HCA was carried out on the whole dataset containing raw data predicted by each ANN model.
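For orientation, the external validation metrics named above (Q2F1, Q2F2, CCC, RMSEP and AAE) can be written out using their usual textbook definitions. The sketch below is not the XternalValidationPlus implementation; the function name and the returned dictionary layout are assumptions for illustration.

```python
import math


def external_metrics(y_ext, y_pred, y_train_mean):
    """Common external-validation metrics for a regression model.

    y_ext: observed values of the external set
    y_pred: model predictions for the external set
    y_train_mean: mean of the observed training-set values (needed for Q2F1)
    """
    n = len(y_ext)
    mean_ext = sum(y_ext) / n
    mean_pred = sum(y_pred) / n
    press = sum((o - p) ** 2 for o, p in zip(y_ext, y_pred))
    ss_ext = sum((o - mean_ext) ** 2 for o in y_ext)
    ss_pred = sum((p - mean_pred) ** 2 for p in y_pred)
    covariance_sum = sum((o - mean_ext) * (p - mean_pred) for o, p in zip(y_ext, y_pred))
    return {
        # Q2F1 references the training-set mean, Q2F2 the external-set mean.
        "Q2F1": 1 - press / sum((o - y_train_mean) ** 2 for o in y_ext),
        "Q2F2": 1 - press / ss_ext,
        # Lin's concordance correlation coefficient.
        "CCC": 2 * covariance_sum / (ss_ext + ss_pred + n * (mean_ext - mean_pred) ** 2),
        "RMSEP": math.sqrt(press / n),
        "AAE": sum(abs(o - p) for o, p in zip(y_ext, y_pred)) / n,
    }


# Made-up numbers, only to show the call.
print(external_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0], y_train_mean=2.0))
```

Q2F1 and Q2F2 differ only in the mean used in the denominator, so they coincide when the training and external sets have the same mean response.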
To rank, group and select the established ANN models, the sum of ranking differences (SRD) approach with the comparison of ranks by random numbers (CRRN) and a 7-fold cross-validation procedure was applied [19]. The SRD analysis of the ANNs was based on the row average (reference ranking, i.e. consensus) of the values predicted by the ANN models. The experimental values were also included in the analysis to determine which ANN models can be considered acceptable: models with smaller SRD values than the experimental values are acceptable. Including the experimental values in the ranking provides a simple way to separate "good" from "bad" models. "Bad" models rationalize the information in the data worse than the experimental values do, so their existence and application are not justified [19–21]. A detailed explanation of the SRD methodology can be found elsewhere [19–21].
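The core of the SRD calculation described above can be sketched as follows. This is a simplified reading of the method, not the software used in the study: it assumes that the reference is the plain row mean over all supplied columns and that ties receive average ranks; the function names are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def srd(column, reference):
    """Sum of absolute differences between the ranks of a column and of the reference."""
    return sum(abs(a - b) for a, b in zip(average_ranks(column), average_ranks(reference)))


def srd_table(columns):
    """SRD of every column (model or experiment) against the row-average reference."""
    n_rows = len(next(iter(columns.values())))
    reference = [sum(col[i] for col in columns.values()) / len(columns) for i in range(n_rows)]
    return {name: srd(col, reference) for name, col in columns.items()}


# Made-up predictions for four compounds, only to show the call.
print(srd_table({"ANN_a": [10.0, 20.0, 30.0, 40.0], "ANN_b": [12.0, 18.0, 33.0, 29.0]}))
```

A column that orders the compounds exactly as the reference does gets SRD = 0; larger values mean the ordering departs further from the consensus.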

2. Materials and methods
2.1. Reagents
Analytical reagents were used without further purification. Ammonium acetate and acetonitrile (suitable for HPLC, gradient grade, ≥ 99.9%) were obtained from Sigma Aldrich (Steinheim, Germany). Ultrapure water (18.2 MΩ·cm) used to prepare the mobile phase was purified and deionised in our laboratory with a Millipore Direct-Q 3 UV Water Purification System (Millipore Corporation, Bedford, MA, USA). In our study, 261 compounds (listed in Table S1) were selected as a model set of solutes. All substances were of appropriate purity; information about the suppliers is given in the supplemental datasheet file. All investigated substances were dissolved in dimethyl sulfoxide (DMSO), water or hexane at a concentration of 1 mg/mL and stored at 2–8 °C between analyses.
2.2. IAM chromatography
All IAM-HPLC experiments were performed using a Prominence-i LC-2030C 3D HPLC system (Shimadzu, Japan) equipped with a diode-array detector (DAD) and controlled by the LabSolution system (version 5.90, Shimadzu, Japan). The stock solutions of solutes were diluted to a concentration of 100 μg/mL, and the injection volume was 5 μL. The chromatographic analyses were carried out on an IAM.PC.DD2 column (10 × 4.6 mm; particle size 10.0 μm; with an IAM guard column; Regis Technologies, USA) according to the procedures proposed by Valko and co-workers [11,13,14]. Briefly, a linear gradient of 0–85% phase B (where phase A was a 50 mM ammonium acetate buffer adjusted to pH 7.4 and phase B was acetonitrile) was used at a flow rate of 1.5 mL/min. The column temperature was kept constant at 30.0 °C, and the analysis time was 6.5 min. The CHIIAM indices of the target solutes were obtained using a calibration set of reference substances. Each IAM-HPLC analysis was run in triplicate. All collected data are presented in the supplemental datasheet file.
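The calibration step mentioned above, in which retention times are converted to CHIIAM values via a set of reference substances, amounts to fitting a straight line and applying it to the target solutes. The sketch below shows that idea only; the retention times and index values are invented placeholders, not the published calibration set, and the function names are hypothetical.

```python
def fit_line(x, y):
    """Least-squares slope and intercept of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return slope, mean_y - slope * mean_x


def chi_from_retention(retention_time, slope, intercept):
    """Convert a gradient retention time (min) into an index value on the calibrated scale."""
    return slope * retention_time + intercept


# Placeholder calibration data (NOT real reference values).
calibration_times = [1.0, 2.0, 3.0, 4.0]       # retention times of reference compounds, min
calibration_index = [10.0, 30.0, 50.0, 70.0]   # assumed index values of the same compounds

slope, intercept = fit_line(calibration_times, calibration_index)
print(chi_from_retention(2.5, slope, intercept))
```

Running the reference substances alongside the samples ties the index scale to the state of the column on the day of the measurement.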
2.3. Advanced chemistry development labs descriptors
The Advanced Chemistry Development (ACD)/Labs software from the Percepta Platform (PhysChem Module) was used to characterize the molecular structures of the investigated substances. Considering the character of the predicted endpoint, CHIIAM, the following ACD/Labs descriptors were selected: polar surface area (PSA), molar volume (MV), hydrogen bond donors (HDo), hydrogen bond acceptors (HAc), and polarisability. The ACD/Labs software was accessed via the chemspider.com website (accessed 20.05.2021). The calculated molecular descriptors of the target solutes are listed in the supplemental datasheet file.
2.4. Chemometrics analysis
GA-MLR modeling was done using QSARINS version 2.2.4 software [15,16]. Before QSRR analysis, the analyzed solutes were divided into two groups: a training group (70%) and a testing group (30%). The assignment of each solute to the training or testing subset is given in the supplemental datasheet file. The models were assessed for fit, robustness and predictive ability using R2, Q2 and the root mean square error (RMSE) of cross-validation (RMSECV), obtained by the leave-one-out cross-validation technique, as well as the RMSE in prediction (RMSEP) derived from external validation.
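The leave-one-out statistics mentioned above (Q2 and RMSECV) can be illustrated with a small, dependency-free least-squares fit. This is a sketch under assumptions, not the QSARINS procedure: the solver, the function names and the toy data are all made up for illustration, and the genetic-algorithm descriptor selection is not shown.

```python
import math


def fit_ols(X, y):
    """Ordinary least squares with intercept, solved via the normal equations."""
    rows = [[1.0] + list(x) for x in X]
    p = len(rows[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in rows) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(p)]
    for c in range(p):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(c, p), key=lambda r: abs(M[r][c]))
        M[c], M[pivot] = M[pivot], M[c]
        pivot_value = M[c][c]  # zero here would mean collinear descriptors
        M[c] = [v / pivot_value for v in M[c]]
        for r in range(p):
            if r != c:
                factor = M[r][c]
                M[r] = [a - factor * b for a, b in zip(M[r], M[c])]
    return [M[i][p] for i in range(p)]


def predict(coefficients, x):
    return coefficients[0] + sum(c * v for c, v in zip(coefficients[1:], x))


def leave_one_out(X, y):
    """Return (Q2_loo, RMSECV): refit without each sample, then predict it."""
    n = len(y)
    predictions = [predict(fit_ols(X[:i] + X[i + 1:], y[:i] + y[i + 1:]), X[i])
                   for i in range(n)]
    press = sum((o - p) ** 2 for o, p in zip(y, predictions))
    mean_y = sum(y) / n
    total_ss = sum((o - mean_y) ** 2 for o in y)
    return 1 - press / total_ss, math.sqrt(press / n)


# Toy data with two hypothetical descriptors.
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1], [1, 2]]
y_demo = [2.1, 4.9, 1.2, 3.8, 7.1, 3.0]
print(leave_one_out(X_demo, y_demo))
```

Because each sample is predicted by a model that never saw it, Q2 is normally lower than the fitted R2, and the size of the gap gives a rough indication of robustness.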

3. Results and discussion
The relationship between retention and the chemical structure of analytes has attracted attention from the beginning of chromatographic research. Kaliszan introduced a particular type of QSPR analysis, namely the QSRR approach, in 1977 [22]. Since that time, QSRR has proven to be a powerful tool in chromatographic research [23,24]. Nevertheless, a very recent paper published by Wiczling pointed out the main limitations of QSRR studies [25]. Usually, relatively small datasets of compounds (frequently congeneric molecules) are used for QSRR modeling. Furthermore, the data matrix often includes a large collection of calculated theoretical descriptors. These descriptors are often difficult or impossible to interpret, particularly for analytically oriented researchers. Moreover, since the molecules in congeneric groups are highly similar, the relative differences in the theoretical molecular descriptors among the congeners are generally small. Consequently, descriptors should be calculated with great care because the calculation error must be noticeably lower than the real variance of that descriptor among the congeners. Considering all these aspects, we revised our approach to building QSRR models. First, we used only descriptors provided by the ACD/Labs Percepta Platform (PhysChem Module) that could be easily calculated and interpreted. According to a study presented by Kubik and Wiczling, ACD-based descriptors showed similar precision and applicability in QSRR modeling to quantum chemistry–based descriptors [4]. Nevertheless, it should be emphasized that ACD-based descriptors are significantly more user friendly than the quantum chemistry approach. Moreover, the developer regularly updates the ACD/Labs software to improve its accuracy and precision. Analysis of previously published QSRR models for IAM chromatography suggested that the ACD-based descriptors should cover the molecule–stationary phase interaction, which is mainly governed by the lipophilic character of the solutes [5,26–29]. Still, several QSRR models pointed out that H-bond descriptors and other descriptors related to PSA and molecular volume affected IAM partitioning [30–32].
The retention of analytes in IAM-HPLC can be measured in various ways, depending on the elution method. Two of the most commonly used approaches are the extrapolated logkwIAM parameter in the case of isocratic elution and the IAM chromatographic hydrophobicity index (CHIIAM) determined in gradient elution. CHI parameters were introduced by Valko et al. for lipophilicity assessment and later adapted to IAM-HPLC conditions. Briefly, CHI/CHIIAM values determined using a fast organic-phase gradient are derived from the assumption that analytes do not move in the chromatographic system until a suitable organic phase concentration reaches the column, at which point the analytes elute practically within the dead time. CHIIAM depends linearly on the retention time and ranges from 0 to 100, corresponding to the acetonitrile concentration in the mobile phase. One of the essential advantages of the fast gradient protocol over the isocratic approach is the rapid determination of (phospho)lipophilicity because it avoids multiple isocratic measurements and extrapolation procedures [33]. Another study published by Valko and co-workers indicated that CHIIAM procedures showed excellent batch-to-batch repeatability [34]. For these reasons, we determined the affinities to phospholipids of pharmaceutically and toxicologically relevant compounds using the CHIIAM approach and applied this parameter in QSRR modeling.
The first step of QSRR model construction was the selection of ACD descriptors using the GA-MLR approach. This procedure aims to select the molecular descriptors that exert the strongest influence on retention behavior in IAM-HPLC. The model set was randomly split into training and examination groups. The best model had four theoretical descriptors (HDo, LogD7.4, PSA and MV) and the following statistics:
R2 = 0.550; Q2loo = 0.513; RMSECV = 11.648; RMSEext = 10.277
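The four-descriptor MLR equation reported for this model (coefficients for HDo, LogD7.4, PSA and MV plus an intercept) is simple enough to evaluate directly. The sketch below only transcribes the published point estimates into a function; the function name and the example descriptor values are hypothetical, and the reported coefficient uncertainties are not propagated.

```python
# Point estimates of the MLR coefficients as reported in the article.
MLR_COEFFICIENTS = {"HDo": 3.781, "LogD7.4": 3.079, "PSA": -0.206, "MV": 0.097}
MLR_INTERCEPT = 12.021


def predict_chi_iam_mlr(hdo, logd74, psa, mv):
    """Evaluate the linear model CHIIAM = sum(coefficient * descriptor) + intercept."""
    return (MLR_COEFFICIENTS["HDo"] * hdo
            + MLR_COEFFICIENTS["LogD7.4"] * logd74
            + MLR_COEFFICIENTS["PSA"] * psa
            + MLR_COEFFICIENTS["MV"] * mv
            + MLR_INTERCEPT)


# Hypothetical descriptor values, chosen only to show the call.
print(predict_chi_iam_mlr(hdo=1, logd74=2.0, psa=50.0, mv=200.0))
```

The signs can be read off directly: more H-bond donors, a higher LogD7.4 and a larger molar volume raise the predicted index, while a larger polar surface area lowers it.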

The ACD descriptors included in this model have a clear physicochemical meaning. The most crucial parameter is the lipophilicity-related descriptor, logD at pH 7.4 (LogD7.4). This descriptor includes information about the influence of ionization on lipophilic character under physiological conditions, which match the experimental conditions. The number of H-bond donors can be related to the interaction between analytes and the phosphate groups present in the phosphatidylcholine structure. Similarly, PSA and molecular volume descriptors are frequently applied in modeling the molecular mechanism of retention in IAM chromatography. Although the MLR model met the Tropsha criteria in terms of the Q2 value, it did not reach the required threshold for R2. Nevertheless, both RMSECV and RMSEext have acceptable values of 11.648 and 10.277, respectively.
To obtain QSRR models with improved prediction ability, a non-linear approach was applied. The same input data that were used in the MLR modeling served as the input variables in the ANN modeling. The only difference concerned the examination set, which in the case of the ANN was divided into two subsets: the test and validation sets.
3.1. Non-linear QSRR modeling: the ANN approach
The ANN modeling resulted in 1000 networks; among these, 11 networks were distinguished, comprising six MLP networks and five RBF networks. The most reliable networks were selected based on the statistical parameters calculated by Statistica version 12, and they were submitted for further statistical validation. The statistical parameters, network architectures, algorithms and activation functions of the distinguished ANNs are presented in Table 1.
The networks differ in the number of hidden neurons, whereas all have the same number of neurons in the input layer (4 independent variables: HDo, LogD7.4, PSA and MV) and the output layer (1 dependent variable: CHIIAM). To evaluate the importance of the input variables, the GSA indices were calculated. As can be seen from Fig. 1, all the input variables have a GSA index greater than 1, meaning that all are justifiably included in all the ANN models. In most of the ANNs, the PSA descriptor has the highest GSA index, meaning that in those networks it has the strongest influence on the network's parameters. Another descriptor with a notably high GSA index is MV. Therefore, comparing the average values of the GSA indices (Fig. 1), it can be said that, in the set of established ANNs, the PSA and MV descriptors have a dominant influence on the parameters of the network.
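The "greater than 1" rule for the GSA index can be made concrete with one common definition of a sensitivity ratio: the model error obtained when an input is replaced by its mean, divided by the error with the real inputs. Whether Statistica computes its index in exactly this way is an assumption here; the function names and the toy model are hypothetical.

```python
def sum_squared_error(model, X, y):
    return sum((model(x) - t) ** 2 for x, t in zip(X, y))


def sensitivity_ratios(model, X, y):
    """Error ratio per input: (error with input j fixed at its mean) / (baseline error).

    A ratio above 1 means the model gets worse without the variable's information.
    The baseline error must be non-zero for the ratio to be defined.
    """
    baseline = sum_squared_error(model, X, y)
    n_inputs = len(X[0])
    means = [sum(row[j] for row in X) / len(X) for j in range(n_inputs)]
    ratios = []
    for j in range(n_inputs):
        X_fixed = [row[:j] + [means[j]] + row[j + 1:] for row in X]
        ratios.append(sum_squared_error(model, X_fixed, y) / baseline)
    return ratios


# Toy stand-in for a trained network: uses input 0, ignores input 1.
def toy_model(x):
    return 2.0 * x[0] + 0.0 * x[1]


X_demo = [[1.0, 5.0], [2.0, 3.0], [3.0, 8.0], [4.0, 1.0]]
y_demo = [2.1, 3.9, 6.1, 7.9]
print(sensitivity_ratios(toy_model, X_demo, y_demo))
```

In the toy case the ignored input gives a ratio of 1 and the informative input a ratio far above 1, which is the pattern the threshold is meant to detect.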
The statistical parameters given in Table 1 indicate that all the ANNs have quite good statistical performance. Considering the statistical parameters of the training set, calculated by the NCSS 2007 program, all the networks have considerably high R2, R2adj and R2cv coefficients and satisfactorily low values of the RMSE and SDPRESS parameters. The F-test values indicate that all the ANNs achieved a very good fit between the experimental and predicted data. Considering all parameters, ANN11 can be selected as the network that fits the data best. However, the external validation parameters give a somewhat different picture. Some of the models failed to fulfill some criteria, such as CCC > 0.8, Q2F1, Q2F2 > 0.6 and R2cv > 0.6 [35]. The error metrics (RMSEP, SD, RMSE, SDPRESS) for the external validation set are higher than those for the training set. The fit between the experimental and predicted data for the external dataset is slightly worse than for the training dataset; however, it is in the acceptable range. Considering all the prediction statistics and
based on consensus, ANN4 can be considered the model that allowed the best fit of the data from the external dataset, generating the lowest prediction error (RMSEP).

MLR model equation (GA-MLR, four descriptors):
$$\mathrm{CHI}_{\mathrm{IAM}} = 3.781(\pm 1.567)\,\mathrm{HDo} + 3.079(\pm 0.947)\,\mathrm{LogD}_{7.4} - 0.206(\pm 0.063)\,\mathrm{PSA} + 0.097(\pm 0.026)\,\mathrm{MV} + 12.021(\pm 5.482)$$

Table 1
Statistical parameters of the obtained ANNs.

Training set (n = 197)
Parameter     ANN1     ANN2     ANN3     ANN4     ANN5     ANN6     ANN7     ANN8     ANN9     ANN10    ANN11
Architecture  4–29–1   4–8–1    4–10–1   4–9–1    4–23–1   4–25–1   4–29–1   4–6–1    4–29–1   4–8–1    4–8–1
Algorithm     RBF      MLP      MLP      MLP      RBF      RBF      RBF      MLP      RBF      MLP      MLP
Cycles No.    –        105      83       77       –        –        –        113      –        53       54
Hidden a.f.   Gauss    Log      Tanh     Log      Gauss    Gauss    Gauss    Tanh     Gauss    Tanh     Tanh
Output a.f.   Ident.   Exp      Log      Exp      Ident.   Ident.   Ident.   Log      Ident.   Exp      Exp
R2            0.7401   0.7682   0.7612   0.7462   0.7065   0.7102   0.7468   0.7604   0.7405   0.7456   0.7790
R2adj         0.7387   0.7670   0.7600   0.7449   0.7050   0.7087   0.7455   0.7591   0.7391   0.7443   0.7697
R2cv          0.7350   0.7634   0.7567   0.7413   0.7004   0.7038   0.7416   0.7559   0.7350   0.7409   0.7663
F-test        555.2    646.1    621.6    573.2    469.4    477.7    575.1    618.8    556.3    571.6    656.0
RMSE          7.44     7.60     7.33     7.74     7.63     7.75     7.41     7.56     7.56     7.76     7.38
SDPRESS       7.47     7.64     7.36     7.76     7.67     7.79     7.44     7.59     7.60     7.79     7.41

External test set (n = 64)
Parameter     ANN1     ANN2     ANN3     ANN4     ANN5     ANN6     ANN7     ANN8     ANN9     ANN10    ANN11
R2            0.6247   0.5231   0.5850   0.6769   0.6468   0.6164   0.5986   0.6411   0.6087   0.6604   0.6545
R2adj         0.6186   0.5153   0.5782   0.6714   0.6415   0.6100   0.5922   0.6346   0.6024   0.6547   0.6490
R2cv          0.5961   0.4899   0.5581   0.6562   0.6216   0.5890   0.5732   0.6178   0.5820   0.6380   0.6317
R2res         0.2556   0.1689   0.2756   0.1192   0.3039   0.2579   0.2594   0.1959   0.2736   0.1324   0.2169
Q2F1          0.9097   0.8765   0.9019   0.9206   0.9177   0.9085   0.9048   0.9143   0.9074   0.9168   0.9184
Q2F2          0.6076   0.4633   0.5737   0.6548   0.6424   0.6045   0.5863   0.6275   0.5974   0.6381   0.6451
CCC           0.7780   0.7226   0.7554   0.8220   0.7898   0.7744   0.7650   0.7968   0.7690   0.8119   0.8030
RMSEP         9.18     10.74    9.57     8.61     8.77     9.24     9.43     8.95     9.30     8.82     8.73
SD            5.85     7.19     6.21     6.02     5.52     5.90     6.02     6.02     6.11     6.11     5.24
AAE           7.12     8.02     7.33     6.20     6.84     7.15     7.30     6.66     7.05     6.40     7.02
F-test        103.2    68.0     87.3     129.7    113.7    99.6     92.5     110.4    96.4     120.5    117.5
RMSE          7.93     9.95     8.25     8.21     7.39     8.02     8.22     8.14     8.01     8.34     7.83
SDPRESS       8.11     10.13    8.38     8.33     7.53     8.17     8.34     8.27     8.15     8.48     7.95

Fig. 1. GSA indices of the input variables for each ANN model and average values of GSA indices of each input variable.

The comparison between the experimental data and the data predicted by the ANN4 and ANN11 models, as well as the distribution of the residuals for these models, is presented in Fig. 2. The graphs for the rest of the models are given in the supplementary data as Fig. 1S. The external test set follows the same trend as the training set. The amplitude of the residuals is in the acceptable range, and their random distribution around the zero axis implies that the prediction errors are not systematic. This is also confirmed by the quite low R2res values for each ANN model except ANN5, where R2res is considerably higher (R2res > 0.3).


Fig. 2. Comparison between the experimental CHIIAM parameters and CHIIAM parameters predicted by ANN4 and ANN11 and the distribution of the residuals for each model (•, training set , external test set).

3.2. Network similarity and ranking
The comparison of the ANNs, their ranking, and their selection is a challenging but not impossible task. To preliminarily compare the models (predicted CHIIAM values) together with the experimental (EXP) values, a box-whisker plot was generated (Fig. 3). Based on the presented plot, it is quite difficult to estimate similarities and dissimilarities among the models because they seem to have the same distribution, and no ANN can be considered significantly better or worse than the other networks. In addition, it is worth stressing that no outliers or extremes were detected on the box-whisker plot.

Fig. 3. Box-whisker plot of the experimental CHIIAM values and CHIIAM values predicted by the ANNs.

In the next step of analyzing the networks' similarity, HCA was conducted. The results are presented in the form of a dendrogram in Fig. 4. The first thing to notice on the dendrogram is that the experimental values (EXP) are outside of any cluster and can be considered outliers. Therefore, there is a considerable difference between the experimental CHIIAM values and the CHIIAM values predicted by the ANNs. The dendrogram shows two main clusters. The first cluster contains ANN11, ANN3, ANN8, ANN10, ANN4 and ANN2, whereas the second cluster comprises ANN6, ANN9, ANN7, ANN5 and ANN1. This separation is particularly interesting because the first cluster contains only MLP networks and the second only RBF networks. It may indicate crucial differences in the prediction ability of these two types of networks. Indeed, MLP networks can use any non-linear function as an activation function, while in RBF networks the activation function is a function of the Euclidean distance between the inputs and the weights, usually a Gaussian function; in addition, MLP networks can have more than one hidden layer, whereas RBF networks have only one hidden and one output layer [17,35].

The main advantage of RBF networks is that they make more robust predictions than MLP networks do; however, their range of applications is more limited. In contrast, MLP networks are more vulnerable to adversarial noise and can sometimes make badly wrong predictions, unlike RBF networks [35]. Considering the number of hidden neurons, the architecture of the RBF networks in the present study is more complex than that of the MLP networks.


Fig. 4. HCA of the experimental CHIIAM values and CHIIAM values predicted by the ANNs.

To rank and group the ANNs, as well as to choose the most reliable ones, the SRD method was applied in the following step. The reference ranking was the row average, which represents a consensus. The average provides the most probable ranking; however, it is not necessarily a bias-free solution [19]. Rather, it is a solution that has less bias than a ranking based on any other reference [19,36].
The ranking of the ANNs was based on the matrix that contained the CHIIAM values predicted by each ANN, arranged in columns, with the row average as the reference ranking. The experimental CHIIAM values were also included so that acceptable models could be determined. The networks that can be considered acceptable have smaller SRD values than the experimental values do. The results of the ranking are presented in Table 2 and Fig. 5.

Table 2
The SRD ranking of the ANNs based on row average and p% intervals.
SRD-CRRN results

Networks   SRD       p% (x < SRD < y)
                     x           y
ANN4       2918      2.82E-09    2.82E-09
ANN11      2944      2.84E-09    2.84E-09
ANN8       2962      2.86E-09    2.86E-09
ANN3       3104      3.00E-09    3.00E-09
ANN2       3172      3.06E-09    3.06E-09
ANN10      3272      3.16E-09    3.16E-09
ANN1       3688      3.56E-09    3.56E-09
ANN7       3742      3.61E-09    3.61E-09
ANN5       3858      3.72E-09    3.73E-09
ANN9       3968      3.83E-09    3.83E-09
ANN6       4958      4.79E-09    4.79E-09
EXP        6436      6.21E-09    6.22E-09
XX1        21,138    4.96        5.03
Q1         22,042    24.82       25.06
Med        22,652    49.76       50.06
Q3         23,190    74.99       75.23
XX19       24,094    94.97       95.05

The data in Table 2 indicate that ANN4 has the smallest SRD value and is therefore the closest to the reference ranking, whereas ANN6 has the highest SRD value among the networks and is placed the furthest from the reference. All the networks can be considered acceptable because they have SRD values smaller than the SRD value of the experimental data. In addition, the probability that the models are of a random character is negligible (p% intervals are between 2.82E-09 and 4.79E-09). The separation of the MLP and RBF networks seen in the HCA dendrogram is also observable in the graph in Fig. 5. Here, the MLP networks (ANN11, ANN4, ANN8, ANN3, ANN2, ANN10) are placed closer to the reference ranking than the RBF networks (ANN1, ANN7, ANN5, ANN9). ANN6, an RBF network, is clearly separated from the others and can be considered an outlier. All the networks are clearly distinguished from the experimental data on the SRD graph. Although the HCA and SRD methodologies rest on very different principles, the results are very similar.
Considering the statistical parameters of the training and external test sets of the established networks (Table 1), the networks ANN4 and ANN11 were previously suggested as the most suitable for predicting CHIIAM parameters in the analyzed set of compounds under the applied chromatographic conditions. The results of the SRD analysis showed that these two networks are closest to the reference ranking and confirmed their selection as the best ones.
To estimate the uncertainties of the SRD values of the ANNs, 7-fold cross-validation was applied: one-seventh of the objects were left out, and the ranking was carried out on the remaining six-sevenths of the objects. The results of the cross-validation of the SRD procedure are given in the form of a box-whisker plot in Fig. 6. In the presented plot, the same separation of the networks is observable as in Fig. 5. The MLP networks are closer to the reference ranking than the RBF networks are; the two groups are separated by a vertical dashed line. The ANN4 and ANN11 networks possess the lowest medians and are the best choice for predicting CHIIAM parameters. Considering its ranking value, the application of ANN6 should definitely be avoided. The cross-validation confirmed the reliability of the conducted SRD procedure.
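The XX1, Q1, Med, Q3 and XX19 rows of Table 2 describe the SRD distribution expected for random rankings, which is the yardstick used by the CRRN test. A minimal Monte Carlo sketch of that idea is given below. It assumes 261 ranked objects (the size of the compound set), uses 0-based ranks without ties, and takes simple empirical percentiles rather than the Gaussian fit mentioned in the caption of Fig. 5; the function names, trial count and seed are arbitrary choices.

```python
import random


def srd_of_permutation(ranks):
    """SRD of a ranking against the identity ranking 0, 1, ..., n-1."""
    return sum(abs(rank - position) for position, rank in enumerate(ranks))


def simulate_random_srd(n_objects, n_trials=2000, seed=0):
    """Sorted SRD values of n_trials random rankings of n_objects objects."""
    rng = random.Random(seed)
    ranks = list(range(n_objects))
    values = []
    for _ in range(n_trials):
        rng.shuffle(ranks)
        values.append(srd_of_permutation(ranks))
    return sorted(values)


def percentile(sorted_values, fraction):
    """Empirical percentile by index into an already sorted list."""
    return sorted_values[int(fraction * (len(sorted_values) - 1))]


distribution = simulate_random_srd(261)
for label, fraction in [("XX1", 0.05), ("Q1", 0.25), ("Med", 0.50), ("Q3", 0.75), ("XX19", 0.95)]:
    print(label, percentile(distribution, fraction))
```

A model whose SRD lies far below the 5% point of this distribution, as all eleven networks in Table 2 do, ranks the compounds far more consistently with the reference than chance would.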

Fig. 5. Ranking of ANNs by SRD and comparison of ranks by random numbers with row average as a reference ranking. The statistical characteristics of the Gaussian fit are as follows: first icosaile (5%), XX1 = 21,138; first quartile, Q1 = 22,042; median, Med = 22,652; last quartile, Q3 = 23,190; last icosaile (95%), XX19 = 24,094.

Fig. 6. Box-whisker plot of the seven-fold cross-validation of SRD procedure using row average as a reference ranking (consensus ranking). The Y-axis represents the SRD
values with uncertainties.

4. Conclusion

Although several free and commercial programs can be applied for lipophilicity prediction, the estimation of affinity to phospholipids remains a serious gap. The proposed models are intended for the early steps of the drug discovery pipeline, when high throughput matters more than accuracy. Non-linear modeling is better suited to predicting phospholipid affinity. Non-linear models can therefore be used for fast screening of phospholipid affinity, which represents a significant gap in the current modeling of the physicochemical properties of drug candidates. Furthermore, extensive databases of CHIIAM values determined for active pharmaceutical ingredients are available in the literature [7,9,33], so the CHIIAM value calculated for a designed or newly synthesized molecule can be compared with those of well-known drugs targeting the same therapeutic goals. The proposed ANN4 and ANN11 networks allow for a better selection of drug candidates, which can reduce the costs associated with late-stage attrition. They are also the first step in creating a tool for assessing affinity to phospholipids as a more biomimetic feature than classical lipophilicity.

Declaration of Competing Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

CRediT authorship contribution statement
Krzesimir Ciura: Conceptualization, Writing – original draft, Methodology, Supervision, Project administration, Formal analysis, Investigation. Strahinja Kovačević: Visualization, Software, Writing – original draft, Formal analysis, Investigation, Methodology. Monika Pastewska: Investigation. Hanna Kapica: Investigation. Martyna Kornela: Investigation. Wiesław Sawicki: Funding acquisition.

Acknowledgements
This research was funded by the Ministry of Science and Higher Education by means of ST3 02–0003/07/518 statutory funds. We also thank Prof. Paola Gramatica for free academic licences for the use of the QSARINS software.

Supplementary materials

Supplementary material associated with this article can be found, in the online version, at doi:10.1016/j.chroma.2021.462666.

References

[1] F. Tsopelas, C. Giaginis, A. Tsantili-Kakoulidou, Lipophilicity and biomimetic properties to support drug discovery, Expert Opin. Drug Discov. 12 (2017) 885–896, doi:10.1080/17460441.2017.1344210.
[2] A. Tsantili-Kakoulidou, How can we better realize the potential of immobilized artificial membrane chromatography in drug discovery and development? Expert Opin. Drug Discov. 15 (2020) 273–276, doi:10.1080/17460441.2020.1718101.
[3] F. Tsopelas, T. Vallianatou, A. Tsantili-Kakoulidou, Advances in immobilized artificial membrane (IAM) chromatography for novel drug discovery, Expert Opin. Drug Discov. 11 (2016) 473–488, doi:10.1517/17460441.2016.1160886.
[4] C. Pidgeon, U.V. Venkataram, Immobilized artificial membrane chromatography: supports composed of membrane lipids, Anal. Biochem. 176 (1989) 36–47, doi:10.1016/0003-2697(89)90269-8.
[5] L. Grumetto, C. Carpentiero, P. Di Vaio, F. Frecentese, F. Barbato, Lipophilic and polar interaction forces between acidic drugs and membrane phospholipids encoded in IAM-HPLC indexes: their role in membrane partition and relationships with BBB permeation data, J. Pharm. Biomed. Anal. 75 (2013) 165–172, doi:10.1016/j.jpba.2012.11.034.
[6] L. Grumetto, G. Russo, F. Barbato, Relationships between human intestinal absorption and polar interactions drug/phospholipids estimated by IAM-HPLC, Int. J. Pharm. 489 (2015) 186–194, doi:10.1016/j.ijpharm.2015.04.062.
[7] S. Teague, K. Valko, How to identify and eliminate compounds with a risk of high clinical dose during the early phase of lead optimization in drug discovery, Eur. J. Pharm. Sci. 110 (2017) 37–50, doi:10.1016/j.ejps.2017.02.017.
[8] M. Hidalgo-Rodríguez, S. Soriano-Meseguer, E. Fuguet, C. Ràfols, M. Rosés, Evaluation of the suitability of chromatographic systems to predict human skin permeation of neutral compounds, Eur. J. Pharm. Sci. 50 (2013) 557–568, doi:10.1016/j.ejps.2013.04.005.
[9] C. Stergiopoulos, F. Tsopelas, K. Valko, Prediction of hERG inhibition of drug discovery compounds using biomimetic HPLC measurements, ADMET DMPK 9 (2021), doi:10.5599/admet.995.
[10] F. Tsopelas, C. Stergiopoulos, L.A. Tsakanika, M. Ochsenkühn-Petropoulou, A. Tsantili-Kakoulidou, The use of immobilized artificial membrane chromatography to predict bioconcentration of pharmaceutical compounds, Ecotoxicol. Environ. Saf. 139 (2017) 150–157, doi:10.1016/j.ecoenv.2017.01.028.
[11] K.L. Valko, Application of biomimetic HPLC to estimate in vivo behavior of early drug discovery compounds, Futur. Drug Discov. 1 (2019) FDD11, doi:10.4155/fdd-2019-0004.
[12] K.L. Valkó, Lipophilicity and biomimetic properties measured by HPLC to support drug discovery, J. Pharm. Biomed. Anal. 130 (2016) 35–54, doi:10.1016/j.jpba.2016.04.009.
[13] K. Valkó, Chromatographic hydrophobicity index by fast-gradient RP-HPLC: a high-throughput alternative to log P/log D, Anal. Chem. 69 (1997) 2022–2029, doi:10.1021/ac961242d.
[14] K.L. Valkó, Biomimetic chromatography to accelerate drug discovery: part I, J LC-GC N. Am. 36 (2018) 397–405.
[15] P. Gramatica, N. Chirico, E. Papa, S. Cassani, S. Kovarich, QSARINS: a new software for the development, analysis, and validation of QSAR MLR models, J. Comput. Chem. 34 (2013) 2121–2132, doi:10.1002/jcc.23361.
[16] P. Gramatica, S. Cassani, N. Chirico, QSARINS-chem: insubria datasets and new QSAR/QSPR models for environmental pollutants in QSARINS, J. Comput. Chem. 35 (2014) 1036–1044, doi:10.1002/jcc.23576.
[17] K.L. Priddy, P.E. Keller, Artificial Neural Networks: An Introduction, SPIE, 2009, doi:10.1117/3.633187.
[18] StatSoft, Inc., Electronic Statistics Textbook, StatSoft, Tulsa, OK, 2011, http://www.statsoft.com/textbook/.
[19] K. Héberger, Sum of ranking differences compares methods or models fairly, TrAC 29 (2010) 101–109, doi:10.1016/j.trac.2009.09.009.
[20] K. Kollár-Hunek, K. Héberger, Method and model comparison by sum of ranking differences in cases of repeated observations (ties), Chemom. Intell. Lab. Syst. 127 (2013) 139–146, doi:10.1016/j.chemolab.2013.06.007.
[21] K. Héberger, K. Kollár-Hunek, Sum of ranking differences for method discrimination and its validation: comparison of ranks with random numbers, J. Chemom. 25 (2011) 151–158, doi:10.1002/cem.1320.
[22] R. Kaliszan, Correlation between the retention indices and the connectivity indices of alcohols and methyl esters with complex cyclic structure, Chromatographia 10 (1977) 529–531, doi:10.1007/BF02262911.
[23] P. Žuvela, M. Skoczylas, J. Jay Liu, T. Bączek, R. Kaliszan, M.W. Wong, B. Buszewski, Column characterization and selection systems in reversed-phase high-performance liquid chromatography, Chem. Rev. 119 (2019) 3674–3729, doi:10.1021/acs.chemrev.8b00246.
[24] R. Kaliszan, QSRR: quantitative structure-(chromatographic) retention relationships, Chem. Rev. 107 (2007) 3212–3246, doi:10.1021/cr068412z.
[25] P. Wiczling, A. Kamedulska, Ł. Kubik, Application of Bayesian multilevel modeling in the quantitative structure–retention relationship studies of heterogeneous compounds, Anal. Chem. 93 (2021) 6961–6971, doi:10.1021/acs.analchem.0c05227.
[26] L. Grumetto, C. Carpentiero, P. Di Vaio, F. Frecentese, F. Barbato, Lipophilic and polar interaction forces between acidic drugs and membrane phospholipids encoded in IAM-HPLC indexes: their role in membrane partition and relationships with BBB permeation data, J. Pharm. Biomed. Anal. 75 (2013) 165–172, doi:10.1016/j.jpba.2012.11.034.
[27] A. Taillardat-Bertschinger, C.A.M. Martinet, P.A. Carrupt, M. Reist, G. Caron, R. Fruttero, B. Testa, Molecular factors influencing retention on immobilized artificial membranes (IAM) compared to partitioning in liposomes and n-octanol, Pharm. Res. 19 (2002) 729–737, doi:10.1023/a:1016156927420.
[28] L. Grumetto, C. Carpentiero, F. Barbato, Lipophilic and electrostatic forces encoded in IAM-HPLC indexes of basic drugs: their role in membrane partition and their relationships with BBB passage data, Eur. J. Pharm. Sci. 45 (2012) 685–692, doi:10.1016/j.ejps.2012.01.008.
[29] G. Russo, L. Grumetto, F. Barbato, G. Vistoli, A. Pedretti, Prediction and mechanism elucidation of analyte retention on phospholipid stationary phases (IAM-HPLC) by in silico calculated physico-chemical descriptors, Eur. J. Pharm. Sci. 99 (2017) 173–184, doi:10.1016/j.ejps.2016.11.026.
[30] L. Grumetto, C. Carpentiero, F. Barbato, Lipophilic and electrostatic forces encoded in IAM-HPLC indexes of basic drugs: their role in membrane partition and their relationships with BBB passage data, Eur. J. Pharm. Sci. 45 (2012) 685–692, doi:10.1016/j.ejps.2012.01.008.
[31] G. Russo, L. Grumetto, F. Barbato, G. Vistoli, A. Pedretti, Prediction and mechanism elucidation of analyte retention on phospholipid stationary phases (IAM-HPLC) by in silico calculated physico-chemical descriptors, Eur. J. Pharm. Sci. 99 (2017) 173–184, doi:10.1016/j.ejps.2016.11.026.
[32] C. Giaginis, A. Tsantili-Kakoulidou, Alternative measures of lipophilicity: from octanol-water partitioning to IAM retention, J. Pharm. Sci. (2008), doi:10.1002/jps.21244.
[33] K. Valko, S. Nunhuck, C. Bevan, M.H. Abraham, D.P. Reynolds, Fast gradient HPLC method to determine compounds binding to human serum albumin. Relationships with octanol/water and immobilized artificial membrane lipophilicity, J. Pharm. Sci. 92 (2003) 2236–2248, doi:10.1002/jps.10494.
[34] K.L. Valko, S. Rava, S. Bunally, S. Anderson, Revisiting the application of immobilized artificial membrane (IAM) chromatography to estimate in vivo distribution properties of drug discovery compounds based on the model of marketed drugs, ADMET DMPK 8 (2020) 78–97, doi:10.5599/admet.757.
[35] V. Goncalves, K. Maria, A.B.F. da Silva, Applications of artificial neural networks in chemical problems, Artificial Neural Network Architecture Applied, InTech, 2013, doi:10.5772/51275.
[36] S.Z. Kovačević, S.O. Podunavac-Kuzmanović, L.R. Jevrić, E.A. Djurendić, J.J. Ajduković, S.B. Gadžurić, M.B. Vraneš, How to rank and discriminate artificial neural networks? Case study: prediction of anticancer activity of 17-picolyl and 17-picolinylidene androstane derivatives, J. Iran. Chem. Soc. 13 (2016) 499–507, doi:10.1007/s13738-015-0759-9.
[37] Masucci, Caldwell, Foley, Comparison of the retention behavior of β-blockers using immobilized artificial membrane chromatography and lysophospholipid micellar electrokinetic chromatography, J. Chromatogr. A (1998), doi:10.1016/S0021-9673(98)00219-2.
[38] Barbato, La Rotonda, Quaglia, Chromatographic indices determined on an immobilized artificial membrane (IAM) column as descriptors of lipophilic and polar interactions of 4-phenyldihydropyridine calcium-channel blockers with biomembranes, Eur. J. Med. Chem. (1996), doi:10.1016/s0014-827x(98)00082-2.
[39] Demare, Roy, Legendre, Factors governing the retention of solutes on chromatographic immobilized artificial membranes: Application to anti-inflammatory and analgesic drugs, J. Liq. Chromatogr. Relat. Technol. (1999), doi:10.1081/JLC-100102051.
[40] Amato, Barbato, Morrica, Quaglia, Rotonda, Interactions between amines and phospholipids: a chromatographic study on immobilized artificial membrane (IAM) stationary phases at various pH values, Helv. Chim. Acta (2000), doi:10.1002/1522-2675(20001004)83:10%3C2836::AID-HLCA2836%3E3.0.CO;2-G.

