
Machine Learning in Medicine



Ton J. Cleophas • Aeilko H. Zwinderman

Machine Learning
in Medicine
by
TON J. CLEOPHAS, MD, PhD, Professor,
Past-President American College of Angiology,
Co-Chair Module Statistics Applied to Clinical Trials,
European Interuniversity College of Pharmaceutical Medicine, Lyon, France,
Department Medicine, Albert Schweitzer Hospital, Dordrecht, Netherlands,
AEILKO H. ZWINDERMAN, MathD, PhD, Professor,
President International Society of Biostatistics,
Co-Chair Module Statistics Applied to Clinical Trials,
European Interuniversity College of Pharmaceutical Medicine, Lyon, France,
Department Biostatistics and Epidemiology, Academic Medical Center, Amsterdam,
Netherlands
With the help of
EUGENE P. CLEOPHAS, MSc, BEng,
HENNY I. CLEOPHAS-ALLERS.


Ton J. Cleophas
European College Pharmaceutical Medicine
Lyon, France

Aeilko H. Zwinderman
Department of Epidemiology and Biostatistics
Academic Medical Center
Amsterdam
Netherlands

Please note that additional material for this book can be downloaded from extras.springer.com.
ISBN 978-94-007-5823-0
ISBN 978-94-007-5824-7 (eBook)
DOI 10.1007/978-94-007-5824-7
Springer Dordrecht Heidelberg New York London
Library of Congress Control Number: 2012954054
© Springer Science+Business Media Dordrecht 2013
This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of
the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilms or in any other physical way, and transmission or information
storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology
now known or hereafter developed. Exempted from this legal reservation are brief excerpts in connection
with reviews or scholarly analysis or material supplied specifically for the purpose of being entered and
executed on a computer system, for exclusive use by the purchaser of the work. Duplication of this
publication or parts thereof is permitted only under the provisions of the Copyright Law of the Publisher’s
location, in its current version, and permission for use must always be obtained from Springer. Permissions
for use may be obtained through RightsLink at the Copyright Clearance Center. Violations are liable to
prosecution under the respective Copyright Law.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
While the advice and information in this book are believed to be true and accurate at the date of
publication, neither the authors nor the editors nor the publisher can accept any legal responsibility for
any errors or omissions that may be made. The publisher makes no warranty, express or implied, with respect to the material contained herein.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)


Preface

Machine learning is a novel discipline concerned with the analysis of large, multivariable data. It involves computationally intensive methods such as factor analysis, cluster analysis, and discriminant analysis. It is currently mainly the domain of computer scientists, and is already commonly used in the social sciences, marketing research, operational research, and the applied sciences, but it is virtually unused in clinical research. This is probably due to the traditional belief of clinicians in clinical trials, where multiple variables are equally balanced by the randomization process and are not further taken into account. In contrast, modern computer data files often involve hundreds of variables, such as genes and other laboratory values, and computationally intensive methods are required to analyze them. This book was written as a step-by-step presentation accessible to clinicians, and as a must-read publication for those new to these methods.
Some 20 years ago serious statistical analyses were conducted by specialist statisticians. Nowadays professionals without a mathematical background have ready access to statistical computing on personal computers or laptops. At this time we witness a second revolution in data analysis: computationally intensive methods have been made available in user-friendly statistical software such as SPSS (cluster and discriminant analysis since 2000, partial correlations analysis since 2005, neural network algorithms since 2006). Large, multivariable data, although so far mainly the domain of computer scientists, are thus increasingly accessible to professionals without a mathematical background. It is the authors' experience as master-class professors that students are eager to gain adequate command of statistical software. For their benefit, most of the chapters include all of the steps of the novel methods, from logging in to the final results, using SPSS statistical software. Also for their benefit, the SPSS data files of the examples used in the various chapters are available at extras.springer.com.

The authors have made a special effort to give every chapter its own introduction, discussion, and references section. The chapters can therefore be studied separately, without the need to read the previous chapters first. In addition to the analysis steps of the novel methods, explained from data examples, background information and information on the clinical relevance of the novel methods is given, in an explanatory rather than mathematical manner.



We should add that the authors are well qualified in their field. Professor Zwinderman is president of the International Society of Biostatistics (2012–2015) and Professor Cleophas is past-president of the American College of Angiology (2000–2012). From their expertise they should be able to make adequate selections of modern methods for clinical data analysis for the benefit of physicians, students, and investigators. The authors have been working and publishing together for over 10 years, and their research of statistical methodology can be characterized as a continued effort to demonstrate that statistics is not mathematics but rather a discipline at the interface of biology and mathematics. The authors are not aware of any other work published so far that is comparable with the current work, and therefore believe that it fills a need.


Contents

1  Introduction to Machine Learning
   1 Summary
     1.1 Background
     1.2 Objective and Methods
     1.3 Results and Conclusions
   2 Introduction
   3 Machine Learning Terminology
     3.1 Artificial Intelligence
     3.2 Bootstraps
     3.3 Canonical Regression
     3.4 Components
     3.5 Cronbach's Alpha
     3.6 Cross-Validation
     3.7 Data Dimension Reduction
     3.8 Data Mining
     3.9 Discretization
     3.10 Discriminant Analysis
     3.11 Eigenvectors
     3.12 Elastic Net Regression
     3.13 Factor Analysis
     3.14 Factor Analysis Theory
     3.15 Factor Loadings
     3.16 Fuzzy Memberships
     3.17 Fuzzy Modeling
     3.18 Fuzzy Plots
     3.19 Generalization
     3.20 Hierarchical Cluster Analysis
     3.21 Internal Consistency Between the Original Variables Contributing to a Factor in Factor Analysis
     3.22 Iterations
     3.23 Lasso Regression
     3.24 Latent Factors
     3.25 Learning
     3.26 Learning Sample
     3.27 Linguistic Membership Names
     3.28 Linguistic Rules
     3.29 Logistic Regression
     3.30 Machine Learning
     3.31 Monte Carlo Methods
     3.32 Multicollinearity or Collinearity
     3.33 Multidimensional Modeling
     3.34 Multilayer Perceptron Model
     3.35 Multivariate Machine Learning Methods
     3.36 Multivariate Method
     3.37 Network
     3.38 Neural Network
     3.39 Optimal Scaling
     3.40 Overdispersion, Otherwise Called Overfitting
     3.41 Partial Correlation Analysis
     3.42 Partial Least Squares
     3.43 Pearson's Correlation Coefficient (R)
     3.44 Principal Components Analysis
     3.45 Radial Basis Functions
     3.46 Radial Basis Function Network
     3.47 Regularization
     3.48 Ridge Regression
     3.49 Splines
     3.50 Supervised Learning
     3.51 Training Data
     3.52 Triangular Fuzzy Sets
     3.53 Universal Space
     3.54 Unsupervised Learning
     3.55 Varimax Rotation
     3.56 Weights
   4 Discussion
   5 Conclusions
   Reference

2  Logistic Regression for Health Profiling
   1 Summary
     1.1 Background
     1.2 Methods and Results
     1.3 Conclusions
   2 Introduction
   3 Real Data Example
   4 Discussion
   5 Conclusions
   References

3  Optimal Scaling: Discretization
   1 Summary
     1.1 Background
     1.2 Objective and Methods
     1.3 Results
     1.4 Conclusions
   2 Introduction
   3 Some Terminology
     3.1 Cross-Validation
     3.2 Discretization
     3.3 Elastic Net Regression
     3.4 Lasso Regression
     3.5 Overdispersion, Otherwise Called Overfitting
     3.6 Monte Carlo Methods
     3.7 Regularization
     3.8 Ridge Regression
     3.9 Splines
   4 Some Theory
   5 Example
   6 Discussion
   7 Conclusion
   8 Appendix: Datafile of 250 Subjects Used as Example
   References

4  Optimal Scaling: Regularization Including Ridge, Lasso, and Elastic Net Regression
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods
     1.4 Results
     1.5 Conclusions
   2 Introduction
   3 Some Terminology
     3.1 Discretization
     3.2 Splines
     3.3 Overdispersion, Otherwise Called Overfitting
     3.4 Regularization
     3.5 Ridge Regression
     3.6 Monte Carlo Methods
     3.7 Cross-Validation
     3.8 Lasso Regression
     3.9 Elastic Net Regression
   4 Some Theory
   5 Example
   6 Discussion
   7 Conclusions
   8 Appendix: Datafile of 250 Subjects Used as Example
   References

5  Partial Correlations
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods
     1.4 Results
     1.5 Conclusions
   2 Introduction
   3 Some Theory
   4 Case-Study Analysis
   5 Discussion
   6 Conclusions
   References

6  Mixed Linear Models
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods and Results
     1.4 Conclusions
   2 Introduction
   3 A Placebo-Controlled Parallel Group Study of Cholesterol Treatment
   4 A Three Treatment Crossover Study of the Effect of Sleeping Pills on Hours of Sleep
   5 Discussion
   6 Conclusion
   References

7  Binary Partitioning
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods and Results
     1.4 Conclusions
   2 Introduction
   3 Example
   4 ROC (Receiver Operating Characteristic) Method for Finding the Best Cut-off Level
   5 Entropy Method for Finding the Best Cut-off Level
   6 Discussion
   7 Conclusions
   References

8  Item Response Modeling
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods
     1.4 Results
     1.5 Conclusions
   2 Introduction
   3 Item Response Modeling, Principles
   4 Quality of Life Assessment
   5 Clinical and Laboratory Diagnostic-Testing
   6 Discussion
   7 Conclusions
     7.1 We Conclude
   References

9  Time-Dependent Predictor Modeling
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods and Results
     1.4 Conclusions
   2 Introduction
   3 Cox Regression Without Time-Dependent Predictors
   4 Cox Regression with a Time-Dependent Predictor
   5 Cox Regression with a Segmented Time-Dependent Predictor
   6 Multiple Cox Regression with a Time-Dependent Predictor
   7 Discussion
   8 Conclusions
     8.1 We Conclude
   References

10 Seasonality Assessments
   1 Summary
     1.1 Background
     1.2 Objective and Methods
     1.3 Results
     1.4 Conclusions
   2 Introduction
   3 Autocorrelations
   4 Examples
   5 Discussion
   6 Conclusions
   References

11 Non-Linear Modeling
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Results and Conclusions
   2 Introduction
   3 Testing for Linearity
   4 Logit and Probit Transformations
   5 "Trial and Error" Method, Box Cox Transformation, Ace/Avas Packages
   6 Sinusoidal Data
   7 Exponential Modeling
   8 Spline Modeling
   9 Loess Modeling
   10 Discussion
   11 Conclusions
   12 Appendix
   References

12 Artificial Intelligence, Multilayer Perceptron Modeling
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods and Results
     1.4 Conclusions
   2 Introduction
   3 Historical Background
   4 The Back Propagation (BP) Neural Network: The Computer Teaches Itself to Make Predictions
   5 A Real Data Example
   6 Discussion
   7 Conclusions
   References

13 Artificial Intelligence, Radial Basis Functions
   1 Summary
     1.1 Background
     1.2 Objective
     1.3 Methods
     1.4 Results
     1.5 Conclusions
   2 Introduction
   3 Example
   4 Radial Basis Function Analysis
   5 Discussion
   6 Conclusions
   References

14 Factor Analysis
   1 Summary
     1.1 Background
     1.2 Objective and Methods
     1.3 Results
     1.4 Conclusions
   2 Introduction
   3 Some Terminology
     3.1 Internal Consistency Between the Original Variables Contributing to a Factor
     3.2 Cronbach's Alpha
     3.3 Multicollinearity or Collinearity
     3.4 Pearson's Correlation Coefficient (R)
     3.5 Factor Analysis Theory
     3.6 Magnitude of Factor Value for Individual Patients
     3.7 Factor Loadings
     3.8 Varimax Rotation
     3.9 Eigenvectors
     3.10 Iterations
     3.11 Components
     3.12 Latent Factors
     3.13 Multidimensional Modeling
   4 Example
   5 Making Predictions for Individual Patients, Health Risk Profiling
6 Discussion ...........................................................................................
7 Conclusions .........................................................................................
8 Appendix: Data File of 200 Patients Admitted Because of Sepsis .....
References .................................................................................................

167
167
167
167
167
168
168
168

Hierarchical Cluster Analysis for Unsupervised Data ........................

1 Summary .............................................................................................
1.1 Background ...............................................................................
1.2 Objective ...................................................................................
1.3 Methods.....................................................................................
1.4 Results .......................................................................................
1.5 Conclusions ...............................................................................
2 Introduction to Hierarchical Cluster Analysis ....................................
2.1 A Novel Approach to Drug Efficacy Testing ............................
2.2 A Novel Statistical Methodology Suitable for the Purpose ......
2.3 Publications So Far ...................................................................
2.4 Objective of the Current Chapter ..............................................

168
169
169
169
169
170
170
171
171
171
172
172
172
172
175
176
176
177

181
183
183
183
183
183
183
184
184
184
185
185
185


xiv

Contents

3

Case-Study ..........................................................................................
3.1 Example......................................................................................
3.2 Data Analysis .............................................................................
4 Discussion ...........................................................................................
4.1 Multifactorial Nature of Drug Efficacy ......................................
4.2 Theoretical Background of Novel Methodology .......................
4.3 Flexibility of Hierarchical Cluster Analysis ..............................
4.4 Comparison with Other Machine Learning Methods.................
5 Conclusions .........................................................................................

References .................................................................................................

185
185
186
189
189
192
192
193
193
195

16

Partial Least Squares ..............................................................................
1 Summary .............................................................................................
1.1 Background ................................................................................
1.2 Objective ....................................................................................
1.3 Methods ......................................................................................
1.4 Results ........................................................................................
1.5 Conclusions ................................................................................
2 Introduction .........................................................................................
3 Some Theory .......................................................................................
3.1 Principal Components Analysis .................................................
3.2 Partial Least Squares Analysis ...................................................
4 Example ..............................................................................................
4.1 Principal Components Analysis .................................................
4.2 Partial Least Squares Analysis ...................................................
4.3 Comparison of the Two Models .................................................

5 Discussion ...........................................................................................
6 Conclusions .........................................................................................
7 Appendix: Datafile of the Example Used in the Present Chapter .......
References .................................................................................................

197
197
197
197
197
198
198
198
199
200
200
201
201
203
204
205
206
207
212

17

Discriminant Analysis for Supervised Data..........................................
1 Summary .............................................................................................
1.1 Background ................................................................................

1.2 Objective ....................................................................................
1.3 Methods ......................................................................................
1.4 Results ........................................................................................
1.5 Conclusions ................................................................................
2 Introduction .........................................................................................
3 Some Theory .......................................................................................
4 Case-Study ..........................................................................................
5 Discussion ...........................................................................................
6 Conclusions .........................................................................................
7 Appendix: Data File ............................................................................
References .................................................................................................

215
215
215
215
215
215
216
216
217
218
221
222
223
224


Contents


xv

18

Canonical Regression..............................................................................
1 Summary .............................................................................................
1.1 Background ................................................................................
1.2 Objective ....................................................................................
1.3 Methods ......................................................................................
1.4 Result..........................................................................................
1.5 Conclusions ................................................................................
2 Introduction .........................................................................................
3 Some Theory .......................................................................................
4 Example ..............................................................................................
5 Discussion ...........................................................................................
6 Conclusions .........................................................................................
7 Appendix: Datafile of the Example Used in the Present Chapter .......
References .................................................................................................

225
225
225
225
225
225
226
226
227
227
232

233
234
239

19

Fuzzy Modeling .......................................................................................
1 Summary .............................................................................................
1.1 Background ................................................................................
1.2 Objective ....................................................................................
1.3 Methods and Results ..................................................................
1.4 Conclusions ................................................................................
2 Introduction .........................................................................................
3 Some Fuzzy Terminology ...................................................................
3.1 Fuzzy Memberships ...................................................................
3.2 Fuzzy Plots .................................................................................
3.3 Linguistic Membership Names ..................................................
3.4 Linguistic Rules .........................................................................
3.5 Triangular Fuzzy Sets.................................................................
3.6 Universal Space ..........................................................................
4 First Example, Dose-Response Effects of Thiopental
on Numbers of Responders .................................................................
5 Second Example, Time-Response Effect of Propranolol
on Peripheral Arterial Flow.................................................................
6 Discussion ...........................................................................................
7 Conclusions .........................................................................................
References .................................................................................................

241
241

241
241
241
242
242
243
243
243
243
243
243
243

247
251
252
252

Conclusions ..............................................................................................
1 Introduction .........................................................................................
2 Limitations of Machine Learning .......................................................
3 Serendipities and Machine Learning ..................................................
4 Machine Learning in Other Disciplines and in Medicine ...................
5 Conclusions .........................................................................................
References .................................................................................................

255
255
255
256

256
256
257

20

243

Index ................................................................................................................. 259


Chapter 1

Introduction to Machine Learning

1  Summary
1.1  Background
Traditional statistical tests are unable to handle large numbers of variables. The
simplest method to reduce large numbers of variables is the use of add-up scores.
But add-up scores do not account for the relative importance of the separate variables,
their interactions, and differences in units. Machine learning can be defined as
knowledge for making predictions as obtained from processing training data through
a computer. If data sets involve multiple variables, data analyses will be complex,
and modern computationally intensive methods will have to be applied for
analysis.

1.2  Objective and Methods
The current book, using real data examples as well as simulated data, reviews
important methods relevant for health care and research, although little used in the
field so far.


1.3  Results and Conclusions
1. One of the first machine learning methods used in health research is logistic
regression for health profiling where single combinations of x-variables are used
to predict the risk of a medical event in single persons (Chap. 2).
2. A wonderful method for analyzing imperfect data with multiple variables is optimal
scaling (Chaps. 3 and 4).

3. Partial correlations analysis is the best method for removing interaction effects
from large clinical data sets (Chap. 5).
4. Mixed linear modeling (1), binary partitioning (2), item response modeling
(3), time dependent predictor analysis (4) and autocorrelation (5) are linear or
loglinear regression methods suitable for assessing data with respectively
repeated measures (1), binary decision trees (2), exponential exposure-response
relationships (3), different values at different periods (4) and those with seasonal
differences (5), (Chaps. 6, 7, 8, 9, and 10).
5. Clinical data sets with non-linear relationships between exposure and outcome
variables require special analysis methods, and can usually also be adequately
analyzed with neural networks methods like multi layer perceptron networks,
and radial basis functions networks (Chaps. 11, 12, and 13).
6. Clinical data with multiple exposure variables are usually analyzed using analysis of (co-)variance (AN(C)OVA), but this method does not adequately account for
the relative importance of the variables and their interactions. Factor analysis
and hierarchical cluster analysis account for all of these limitations (Chaps. 14
and 15).
7.Data with multiple outcome variables are usually analyzed with multivariate
analysis of (co-) variance (MAN(C)OVA). However, this has the same limitations
as ANOVA. Partial least squares analysis, discriminant analysis, and canonical
regression account for all of these limitations (Chaps. 16, 17, and 18).
8. Fuzzy modeling is a method suitable for modeling soft data, like data that
are partially true or response patterns that are different at different times
(Chap. 19).

2  Introduction
Traditional statistical tests are unable to handle large numbers of variables. The
simplest method to reduce large numbers of variables is the use of add-up scores.
But add-up scores do not account for the relative importance of the separate variables,
their interactions, and differences in units.
Principal components analysis and partial least square analysis, hierarchical
cluster analysis, optimal scaling and canonical regression are modern computationally
intensive methods, currently often listed as machine learning methods. This is
because the computations they make are far too complex to perform without the
help of a computer, and because they turn input information into knowledge,
which is in human terms a kind of learning process.
An additional advantage is that the novel methods are able to account for all of
the limitations of the traditional methods. Although widely used in the fields of
behavioral sciences, social sciences, marketing, operational research and applied
sciences, they are virtually unused in medicine. This is a pity given the omnipresence
of large numbers of variables in this field of research. However, this is probably just
a matter of time, now that the methods are increasingly available in SPSS statistical
software and many other packages.
We will start with logistic regression for health profiling where single combinations of x-variables are used to predict the risk of a medical event in single persons
(Chap. 2). Then, in Chaps. 3 and 4, optimal scaling, a wonderful method for
analyzing imperfect data, and in Chap. 5 partial correlations, the best method for
removing interaction effects, will be reviewed. Mixed linear modeling (1), binary
partitioning (2), item response modeling (3), time dependent predictor analysis
(4) and autocorrelation (5) are linear or loglinear regression methods suitable
for assessing data with respectively repeated measures (1), binary decision trees
(2), exponential exposure-response relationships (3), different values at different
periods (4), and those with seasonal differences (5) (Chaps. 6, 7, 8, 9, and 10).
Methods for data with non-linear relationships between exposure and outcome
variables will be reviewed in Chaps. 11, 12, and 13, entitled respectively non-linear relationships, artificial intelligence using multilayer perceptron, and artificial
intelligence using radial basis functions. Data with multiple exposure variables are
usually analyzed using analysis of (co-) variance (AN(C)OVA), but this method
does not adequately account for the relative importance of the variables and their interactions. Also, it rapidly loses power with large numbers of variables relative to the
numbers of observations and with strong correlations between the dependent variables.
And with large numbers of variables the design matrix may cause integer overflow,
too many levels of components, numerical problems with higher order interactions,
and commands may not be executed. Factor analysis (Chap. 14) and hierarchical
cluster analysis (Chap. 15) account for all of these limitations. Data with multiple
outcome variables are usually analyzed with multivariate analysis of (co-) variance
(MAN(C)OVA). However, this has the same limitations as ANOVA, and, in addition,
with a positive correlation between the outcome variables it is often powerless. The
Chaps. 16, 17, and 18 review methods that account for these limitations. These
methods are respectively partial least squares analysis, discriminant analysis, and
canonical regression. Finally Chap. 19 explains fuzzy modeling as a method for

modeling soft data, like data that are partially true or response patterns that are
different at different times.
A nice thing about the novel methodologies, thus, is that, unlike the traditional
methods like ANOVA and MANOVA, they can not only handle large data files
with numerous exposure and outcome variables, but also can do it in a relatively
unbiased way.
The current book serves as an introduction to machine learning methods in clinical
research, and was written as a hand-hold presentation accessible to clinicians, and
as a must-read publication for those new to the methods. It is the authors’ experience,
as master class professors, that students are eager to master adequate command of
statistical software. For their benefit all of the steps of the novel methods from
logging in to the final result using SPSS statistical software will be given in most of
the chapters. We will end this initial chapter with some machine learning
terminology.



3  Machine Learning Terminology
3.1  Artificial Intelligence
Engineering method that simulates the structures and operating principles of the
human brain.

3.2  Bootstraps
Machine learning methods are computationally intensive. Computers make use of
bootstrapping, otherwise called "random sampling from the data with replacement",
in order to facilitate the calculations. Bootstrapping is a Monte Carlo method.
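For example, a percentile bootstrap for the mean of a small sample can be sketched in a few lines of Python; the cholesterol values and the function name are hypothetical, for illustration only.

```python
import random
import statistics

def bootstrap_ci(data, stat=statistics.mean, n_boot=2000, alpha=0.05, seed=1):
    """Percentile bootstrap: resample the data with replacement,
    recompute the statistic each time, and take the empirical
    alpha/2 and 1 - alpha/2 quantiles as the confidence interval."""
    rng = random.Random(seed)
    estimates = sorted(
        stat([rng.choice(data) for _ in range(len(data))])
        for _ in range(n_boot)
    )
    lo = estimates[int(alpha / 2 * n_boot)]
    hi = estimates[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

cholesterol = [4.2, 5.1, 5.8, 6.0, 4.9, 5.5, 6.3, 5.0, 4.7, 5.9]
lo, hi = bootstrap_ci(cholesterol)
print(lo, hi)
```

With 2,000 resamples, the 2.5th and 97.5th percentiles of the resampled means form an approximate 95% confidence interval for the mean.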


3.3  Canonical Regression
Multivariate method. ANOVA / ANCOVA (analysis of (co)variance) and MANOVA /
MANCOVA (multivariate analysis of (co)variance) are the standard methods for the
analysis of data with respectively multiple independent and dependent variables.
A problem with these methods is, that they rapidly lose statistical power with
increasing numbers of variables, and that computer commands may not be executed
due to numerical problems with higher order calculations among components. Also,
clinically, we are often more interested in the combined effects of the clusters of
variables than in the separate effects of the different variables. As a simple solution
composite variables can be used as add-up sums of separate variables, but add-up
sums do not account for the relative importance of the separate variables, their interactions, and differences in units. Canonical analysis can account for all of that, and, unlike
MANCOVA, gives, in addition to test statistics of the separate variables, overall test
statistics of entire sets of variables.

3.4  Components
The term components is often used to indicate the factors in a factor analysis, e.g.,
in the rotated component matrix and in principal component analysis.

3.5  Cronbach's Alpha

\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)

k = number of original variables
s_i^2 = variance of the i-th original variable
s_T^2 = variance of the total score of the factor obtained by summing up all of the original variables
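The formula above can be sketched in pure Python; the questionnaire scores below are hypothetical.

```python
from statistics import variance  # sample variance (n - 1 denominator)

def cronbach_alpha(items):
    """Cronbach's alpha for a list of original variables (one list of
    scores per variable, same patients in the same order)."""
    k = len(items)
    total = [sum(scores) for scores in zip(*items)]  # summed factor score per patient
    return (k / (k - 1)) * (1 - sum(variance(it) for it in items) / variance(total))

# three perfectly consistent items give alpha = 1
print(cronbach_alpha([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # 1.0
# two partly consistent items give a lower alpha
print(round(cronbach_alpha([[1, 2, 3, 4], [1, 3, 2, 4]]), 3))  # 0.889
```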

3.6  Cross-Validation
Splitting the data into k subsets ("folds"), fitting the model on k − 1 folds, and
testing it on the remaining fold. Used for assessment of the test-retest reliability
of the factors in factor analysis (see internal consistency between the original
variables contributing to a factor in factor analysis).
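As an illustration, the k train/test splits of k-fold cross-validation can be generated as follows (a hypothetical pure-Python helper):

```python
def k_fold_splits(n, k):
    """Yield (train, test) index lists: each of the k folds serves once
    as the test set, the remaining k - 1 folds as the training set."""
    folds = [list(range(i, n, k)) for i in range(k)]  # interleaved folds
    for held_out in range(k):
        test = folds[held_out]
        train = [i for f, fold in enumerate(folds) if f != held_out for i in fold]
        yield train, test

splits = list(k_fold_splits(n=10, k=5))
print(len(splits), splits[0])
```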

3.7  Data Dimension Reduction
Factor analysis term used to describe what it does with the data.

3.8  Data Mining
A field at the intersection of computer science and statistics. It attempts to discover
patterns in large data sets.

3.9  Discretization
Converting continuous variables into discretized values in a regression model.
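A minimal sketch; the blood pressure cut points are hypothetical.

```python
def discretize(value, cut_points):
    """Convert a continuous value into a category index: 0 below the
    first cut point, rising by 1 at each cut point crossed."""
    return sum(value >= c for c in cut_points)

# hypothetical systolic blood pressure classes (mmHg): <120, 120-139, 140-159, >=160
cuts = [120, 140, 160]
print([discretize(bp, cuts) for bp in [110, 125, 150, 180]])  # [0, 1, 2, 3]
```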


3.10  Discriminant Analysis
Multivariate method. It is largely identical to factor analysis but goes one step further.
It includes a grouping predictor variable in the statistical model, e.g. treatment
modality. The scientific question “is the treatment modality a significant predictor
of a clinical improvement” is, subsequently, assessed by the question “is the outcome clinical improvement a significant predictor of the odds of having had a
particular treatment.” This reasoning may seem incorrect, using an outcome for
making predictions, but, mathematically, it is no problem. It is just a matter of linear
cause-effect relationships, but just the other way around, and it works very conveniently with "messy" outcome variables like in the example given.



3.11  Eigenvectors
Eigenvectors is a term often used with factor analysis. The R-values of the original
variables versus novel factors are the eigenvalues of the original variables, their
place in a graph the eigenvectors. The scree plot compares the relative importance
of the novel factors, and that of the original variables using their eigenvector
values.
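As a sketch of how software finds the largest eigenvalue and its eigenvector numerically, power iteration on a hypothetical 2 × 2 correlation matrix:

```python
def dominant_eigen(matrix, n_iter=100):
    """Power iteration: repeated multiplication by the matrix pulls any
    start vector toward the eigenvector with the largest eigenvalue."""
    v = [1.0] * len(matrix)
    for _ in range(n_iter):
        w = [sum(r * x for r, x in zip(row, v)) for row in matrix]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    mv = [sum(r * x for r, x in zip(row, v)) for row in matrix]
    eigenvalue = sum(a * b for a, b in zip(mv, v))  # Rayleigh quotient
    return eigenvalue, v

corr = [[1.0, 0.8], [0.8, 1.0]]  # two strongly correlated variables
lam, vec = dominant_eigen(corr)
print(round(lam, 3), [round(x, 3) for x in vec])  # 1.8 [0.707, 0.707]
```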

3.12  Elastic Net Regression
Shrinking procedure similar to lasso, but made suitable for larger numbers of
predictors.

3.13  Factor Analysis
Two or three unmeasured factors are identified to explain a much larger number of
measured variables.

3.14  Factor Analysis Theory

ALAT (alanine aminotransferase), ASAT (aspartate aminotransferase) and gamma-GT
(gamma-glutamyl transferase) are a cluster of variables telling us something about a
patient's liver function, while urea, creatinine and creatinine clearance tell us
something about the same patient's renal function. In order to make morbidity/
mortality predictions from such variables, often, multiple regression is used.
Multiple regression is appropriate, only, if the variables do not correlate too much
with one another, and a very strong correlation, otherwise called collinearity or
multicollinearity, will be observed within the above two clusters of variables.
This means, that the variables cannot be used simultaneously in a regression model,
and, that an alternative method has to be used. With factor analysis all of the
variables are replaced with a limited number of novel variables, that have the largest
possible correlation coefficients with all of the original variables. It is a multivariate
technique, somewhat similar to MANOVA (multivariate analysis of variance), with
the novel variables, otherwise called the factors, as outcomes, and the original variables, as predictors. However, it is less affected by collinearity, because the y- and
x-axes are used to present the novel factors in an orthogonal way, and it can be
shown that with an orthogonal relationship between two variables the magnitude of
their covariance is zero and has thus not to be taken into account. Factor analysis
constructs latent predictor variables from manifest predictor variables, and in this
way it can be considered a univariate method, but mathematically it is a multivariate
method, because multiple rather than single latent variables are constructed from
the predictor data available.

3.15  Factor Loadings

The factor loadings are the correlation coefficients between the original variables
and the estimated novel variable, the latent factor, adjusted for all of the original
variables, and adjusted for eventual differences in units.

3.16  Fuzzy Memberships
The universal spaces are divided into equally sized parts called membership
functions.
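A triangular membership function (see also Chap. 19) can be sketched as follows; the dose values are hypothetical.

```python
def triangular_membership(x, left, peak, right):
    """Membership grade of x in a triangular fuzzy set: 0 outside
    [left, right], rising linearly to 1 at the peak, then falling."""
    if x <= left or x >= right:
        return 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# hypothetical membership "medium dose" on a universal space of 0-6 mg
print([triangular_membership(d, 1, 3, 5) for d in [0, 2, 3, 4, 6]])  # [0.0, 0.5, 1.0, 0.5, 0.0]
```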

3.17  Fuzzy Modeling
A method for modeling soft data, like data that are partially true or response patterns
that are different at different times.

3.18  Fuzzy Plots
Graphs summarizing the fuzzy memberships of (for example) the input values.

3.19  Generalization
Ability of a machine learning algorithm to perform accurately on future data.

3.20  Hierarchical Cluster Analysis
It is based on the concept that cases (patients) with closely connected characteristics
might also be more related in other fields like drug efficacy. With large data it is a
computationally intensive method, and today commonly classified as one of the
methods of explorative data mining. It may be more appropriate for drug efficacy
analyses than other machine learning methods, like factor analysis, because the
patients themselves rather than some new characteristics are used as dependent
variables.
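A minimal single-linkage sketch in pure Python; the per-patient response data are hypothetical.

```python
def single_linkage(points, n_clusters):
    """Agglomerative hierarchical clustering: start with each patient as
    its own cluster and repeatedly merge the two clusters whose closest
    members are nearest (single linkage), until n_clusters remain."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda p: min(
            dist(a, b) for a in clusters[p[0]] for b in clusters[p[1]]))
        clusters[i] += clusters.pop(j)
    return clusters

# hypothetical per-patient responses: (LDL decrease, systolic decrease)
patients = [(1.0, 1.2), (1.1, 1.0), (4.0, 3.9), (4.2, 4.1)]
print(single_linkage(patients, n_clusters=2))
```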

3.21  I nternal Consistency Between the Original Variables
Contributing to a Factor in Factor Analysis
A strong correlation between the answers given to questions within one factor is
required: all of the questions should, approximately, predict one and the same thing.
The level of correlation is expressed as Cronbach’s alpha: 0 means poor, 1 perfect
relationship. The test-retest reliability of the original variables should be assessed
with one variable missing: all of the data files with one missing variable should
produce, for at least 80%, the same result as that of the non-missing data file
(alphas > 80%).

3.22  Iterations
Complex mathematical models are often laborious, so that even modern computers
have difficulty processing them. Software packages currently make use of a technique called iterations: five or more calculations are estimated, and the one with the
best fit is chosen.

3.23  Lasso Regression
Shrinking procedure slightly different from ridge regression, because it shrinks the
smallest b-values to 0.
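The shrinking step can be sketched with the soft-thresholding operator; the coefficients are hypothetical.

```python
import math

def lasso_shrink(b_ols, lam):
    """Soft-thresholding, the core of lasso: shrink a least-squares
    coefficient toward zero by lam, setting it to exactly zero when
    its absolute value falls below lam."""
    shrunk = max(abs(b_ols) - lam, 0.0)
    return math.copysign(shrunk, b_ols) if shrunk else 0.0

print([lasso_shrink(b, 0.5) for b in [2.0, -1.2, 0.3, -0.4]])  # [1.5, -0.7, 0.0, 0.0]
```

Large coefficients are shrunk toward zero; the smallest ones are set to exactly zero, removing their predictors from the model.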

3.24  Latent Factors
The term latent factors is often used to indicate the factors in a factor analysis. They
are called latent, because they are not directly measured but rather derived from the
original variables.

3.25  Learning
This term would largely fit the term “fitting” in statistics.




3.26  Learning Sample
Previously observed outcome data which are used by a neural network to learn to
predict future outcome data as close to the observed values as possible.

3.27  Linguistic Membership Names
Each fuzzy membership is given a name, otherwise called linguistic term.

3.28  Linguistic Rules
The relationships between the fuzzy memberships of the input data and those of the
output data.

3.29  Logistic Regression
Very similar to linear regression. However, unlike linear regression, where both the
dependent and independent variables are continuous, logistic regression has a binary
dependent variable (being a responder or non-responder), which is measured as the
log odds of being a responder.
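For example, with a hypothetical fitted model the risk for a single person follows from the log odds (coefficients and predictors below are illustrative only):

```python
import math

def predicted_risk(intercept, coefficients, x):
    """The linear predictor is the log odds of being a responder; the
    logistic function converts it back to a risk between 0 and 1."""
    log_odds = intercept + sum(b * xi for b, xi in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-log_odds))

# hypothetical fitted model: age (years) and smoking (0/1) predicting an event
b0, b = -7.0, [0.1, 0.8]
print(round(predicted_risk(b0, b, [65, 1]), 3))  # 0.574
```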

3.30  Machine Learning
Knowledge for making predictions, obtained from processing training data through
a computer. Particularly modern computationally intensive methods are increasingly
used for the purpose.

3.31  Monte Carlo Methods
Iterative testing in order to find the best fit solution for a statistical problem.


3.32  Multicollinearity or Collinearity
There should not be a strong correlation between different original variable values
in a conventional linear regression. Correlation coefficient (R) > 0.80 means the
presence of multicollinearity and, thus, of a flawed multiple regression analysis.
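Pearson's R between two hypothetical, strongly correlated variables can be computed as follows:

```python
def pearson_r(x, y):
    """Pearson's correlation coefficient between two variables."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

alat = [25, 40, 55, 70, 90]  # hypothetical liver enzyme values
asat = [20, 35, 50, 66, 84]
r = pearson_r(alat, asat)
print(round(r, 3), r > 0.80)  # strongly collinear: do not enter both in one model
```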



3.33  Multidimensional Modeling
An y- and x-axis are used to represent two factors. If a third factor existed within a
data file, it could be represented by a third axis, a z-axis, creating a 3-d graph.
Additional factors can also be added to the model; they cannot be presented in a 2- or
3-d drawing, but, just like with multiple regression modeling, the software programs
have no problem with multidimensional calculations similar to the above 2-d
calculations.

3.34  Multilayer Perceptron Model
Neural network consisting of multiple layers of artificial neurons, each of which, after
having received a signal beyond some threshold, propagates it forward to the next layer.
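A minimal forward pass through such a network, with hand-picked (hypothetical) weights:

```python
import math

def layer(inputs, weights, biases):
    """One layer of artificial neurons: each neuron takes a weighted sum
    of its inputs plus a bias and passes it through a logistic activation
    before propagating the signal to the next layer."""
    return [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(ws, inputs)) + b)))
        for ws, b in zip(weights, biases)
    ]

# hypothetical two-layer perceptron with hand-picked weights
hidden = layer([0.5, 1.0], weights=[[2.0, -1.0], [-1.5, 0.5]], biases=[0.0, 0.2])
output = layer(hidden, weights=[[1.0, 1.0]], biases=[-1.0])
print(round(output[0], 3))
```

In practice the weights are not hand-picked but learned from a training sample by back-propagation.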

3.35  Multivariate Machine Learning Methods
The methods that always include multiple outcome variables. They include discriminant analysis, canonical regression, and partial least squares.

3.36  Multivariate Method
Statistical analysis method for data with multiple outcome variables.

3.37  Network
This term would largely fit the term “model” in statistics.


3.38  Neural Network
Distribution-free method for data modeling based on layers of artificial neurons that
transmit input information.

3.39  Optimal Scaling
The problem with linear regression is that consecutive levels of the variables are
assumed to be equal, but in practice this is virtually never true. Optimal scaling is a

