Data Mining and Knowledge Discovery Handbook, 2nd Edition

200 Paola Sebastiani, Maria M. Abad, and Marco F. Ramoni
Variable     Description           Type     State description
Region       Hoh birth region      Nominal  England, Scotland and Wales
Ad fems      No of adult females   Ordinal  0, 1, ≥2
Ad males     No of adult males     Ordinal  0, 1, ≥2
Children     No of children        Ordinal  0, 1, 2, 3, ≥4
Hoh age      Age of Hoh            Numeric  17-36; 36-50; 50-66; 66-98
Hoh gend     Gender of Hoh         Nominal  M, F
Accomod      Accommodation         Nominal  Room, Flat, House, Other
Bedrms       No of bedrooms        Ordinal  1, 2, 3, ≥4
Ncars        No of cars            Ordinal  1, 2, 3, ≥4
Tenure       House status          Nominal  Rent, Owned, Soc-Sector
Hoh reslen   Length of residence   Numeric  0-3; 3-9; 9-19; ≥19 (months)
Hoh origin   Hoh ethnicity         Nominal  Caucas., Black, Chin., Indian, Other
Hoh status   Status of Hoh         Nominal  Active, Inactive, Retired
Table 10.2. Description of the variables used in the analysis. Hoh denotes the Head of the
Household. Numbers of adult males, females and children refer to the household.
of the household increases. The dependency of the gender of the household head on
the ethnic group shows that Blacks have the smallest probability of having a male
head of the household (64%), while Indians have the largest probability (89%). Other
interesting discoveries are that the age of the head of the household depends directly
on the number of adult males and females: households with no females and two or
more males are more likely to be headed by a young male, whereas households with
no males and two or more females are headed by a middle-aged female. There appear
to be more single households headed by an elderly female than by an elderly male.
The composition of the household also changes across the ethnic groups: Indians
have the smallest probability of living in a household with no adult males (10%),
while Blacks have the largest probability (32%).
By propagating the network, one may investigate other undirected associations
and discover that, for example, the typical Caucasian mid family with two children
has a 77% chance of being headed by a male who, with probability .57, is aged between
36 and 50 years. The probability that the head of the household is active is
.84, and the probability that the household is in an owned house is .66. Results of
these queries are displayed in Figure 10.11. These figures are slightly different if the
head of the household is, for example, Black: the probability that the head of the
household is male (given that there are two children in the household) is only .62,
and the probability that he is active is .79. If the head of the household is Indian, then
the probability that he is male is .90, and the probability that he is active is .88. On
average, the ethnic group changes only slightly the probability of the household being in
an accommodation provided by the social service (26% for Blacks, 23% for Chinese,
20% for Indians and 24% for Caucasians). Similarly, Black household heads are more likely
to be inactive than household heads from different ethnic groups (16% Blacks, 10%
Indians, 14% Caucasians and Chinese) and to be living in a less wealthy household,
as shown by the larger probability of living in accommodations with a smaller number
of bedrooms and of having a smaller number of cars. The overall picture is that of
households headed by a Black being less wealthy than others, and this is the
conclusion one would reach if the gender of the head of the household were not taken
into account. However, the dependency structure discovered shows that the gender of the
head of the household and the number of adult females make all the other variables
independent of the ethnic group. Thus, the extracted model supports the hypothesis
that differences in the household wealth are more likely explained by the different
household composition, and in particular by the gender of the head of the household,
rather than racial factors.
10 Bayesian Networks 201
Fig. 10.11. An example of probabilistic reasoning using the Bayesian network induced from
the 13 variables extracted from the 1996 General Household Survey.
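The conditional-independence argument made above can be illustrated with a toy calculation. The sketch below is not the network induced from the survey: it assumes a three-node chain (ethnic group, then gender of the head, then house ownership), reuses only the two probabilities of a male head quoted in the text (64% and 89%), and invents the remaining numbers (`P_GROUP`, `P_OWNED`) purely for illustration.

```python
from itertools import product

GROUPS = ("Black", "Indian")
GENDERS = ("M", "F")

P_GROUP = {"Black": 0.5, "Indian": 0.5}    # HYPOTHETICAL prior over groups
P_MALE = {"Black": 0.64, "Indian": 0.89}   # P(male head | group), quoted in the text
P_OWNED = {"M": 0.70, "F": 0.50}           # HYPOTHETICAL P(owned house | gender of head)


def joint(group, gender, owned):
    """Joint probability under the assumed chain group -> gender -> owned."""
    p_gender = P_MALE[group] if gender == "M" else 1.0 - P_MALE[group]
    p_owned = P_OWNED[gender] if owned else 1.0 - P_OWNED[gender]
    return P_GROUP[group] * p_gender * p_owned


def prob_owned(**evidence):
    """P(owned = True | evidence), by summing the joint over consistent states."""
    numerator = denominator = 0.0
    for group, gender, owned in product(GROUPS, GENDERS, (True, False)):
        state = {"group": group, "gender": gender}
        if any(state[name] != value for name, value in evidence.items()):
            continue  # state contradicts the evidence
        p = joint(group, gender, owned)
        denominator += p
        if owned:
            numerator += p
    return numerator / denominator


if __name__ == "__main__":
    for group in GROUPS:
        print(group, "marginal:", round(prob_owned(group=group), 3))
        print(group, "given female head:", round(prob_owned(group=group, gender="F"), 3))
```

Under these assumed numbers the ownership probability differs between the two groups when gender is ignored, but the difference disappears once `gender` is added to the evidence, which is the pattern the paragraph describes.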
10.6.2 Customer Profiling
A typical problem of direct mail fund raising campaigns is the low response rate.
Recent studies have shown that adding incentives or gifts to the mailing can increase
the response rate. This is the strategy implemented by an American Charity in the
June ’97 renewal campaign. The mailing included a gift of personalized name and
address labels plus an assortment of 10 note cards and envelopes. Each mailing cost the
charity 0.68 dollars and resulted in a response rate of about 5% in the group of
so-called lapsed donors, that is, individuals who made their last donation more than a
year before the ’97 renewal mail. Since the donations received from the respondents
ranged between 2 and 200 dollars, and the median donation was 13 dollars, the fund
raiser needed to decide when it was worth sending the renewal mail to a donor, on
the basis of the information available about him from the in-house database.
Furthermore, the charity was interested in strategies to recapture lapsed donors and,
therefore, in building a profile from which to understand the motivations behind their
lack of response.
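The question of when a mailing is worth sending can be framed as a simple expected-value calculation. The sketch below is a back-of-the-envelope illustration, not the model of (Sebastiani et al., 2000): it uses the 0.68 dollar cost and the 13 dollar median donation quoted above, and it assumes, as a simplification, that the median donation stands in for the expected gift of a respondent. The function names are illustrative.

```python
COST_PER_MAILING = 0.68   # dollars per mailing, from the text
MEDIAN_DONATION = 13.0    # dollars, from the text; ASSUMED stand-in for the expected gift


def expected_net_return(p_response, donation=MEDIAN_DONATION, cost=COST_PER_MAILING):
    """Expected dollars gained per mailing sent to a donor who responds with p_response."""
    return p_response * donation - cost


def break_even_probability(donation=MEDIAN_DONATION, cost=COST_PER_MAILING):
    """Smallest response probability at which a mailing pays for itself on average."""
    return cost / donation


if __name__ == "__main__":
    print(f"break-even response probability: {break_even_probability():.4f}")
    for p in (0.038, 0.05, 0.06):
        print(f"p = {p:.3f}: expected net return = {expected_net_return(p):+.3f} dollars")
```

Under this assumption the break-even response probability is 0.68/13, about 5.2%, so a mailing sent to a donor with the average 5% response probability loses money in expectation. This is why separating likely from unlikely respondents matters.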
We addressed these issues in (Sebastiani et al., 2000) by building two causal
models. The first model captured the dependency of the probability of response to
the mailing campaign on the independent variables in the database. The second one
modeled the dependency of the dollar amount of the gift, and it was built by using
only the 5% of donors who responded to the ’97 mailing campaign. We focus here on the
first model, depicted in Figure 10.12, which shows that the probability of a donation
(variable Target-B in the top-left corner) is directly affected by the wealth rating
(variable Wealth1) and the donor’s neighborhood (variable Domain1). The network
shows that, marginally, only 5% of those who received the renewal mail are
likely to respond. Persons living in suburbs, cities or towns have about 5% probability
of responding, while donors living in rural or urban neighborhoods respond with
probability 5%. The wealth rating of the donor neighborhood has a positive effect
on the response rate of donors living in urban, suburban or city areas, with donors
living in wealthier neighborhoods being more likely to respond than donors living
in poorer neighborhoods. The probability of responding rises to about 6% for
donors living in wealthy city neighborhoods. The variable Domain1 is closely related
to the variable Domain2, which represents an indicator of the socio-economic status of
the donor neighborhood, and it shows that donors living in suburbs or cities are more
likely to live in neighborhoods having a highly rated socio-economic status. Therefore,
they may be more sensitive to political and social issues. The model also shows
that donors living in neighborhoods with a high presence of males active in the
Military (Malemili) are more likely to respond. Again, since the charity collects funds
for military veterans, this fact supports the hypothesis that sensitivity to the problem
for which funds are collected has a large effect on the probability of response. On
the other hand, the wealth rating of donors living in rural neighborhoods has the
opposite effect: the higher the wealth rating, the smaller the probability that the donor
responds, and the least likely to respond (3.8%) are donors living in wealthy rural
areas. A curiosity is that persons living in rural and poor neighborhoods are more
likely to respond positively to mail including a gift than donors living in wealthy city
neighborhoods.
By querying the network, we can profile respondents: they are more likely to
live in a wealthy neighborhood located in a suburb, and they are less likely
to have made a donation in the last 6 months than those who do not respond. One
feature that discriminates respondents from nonrespondents is the household income:
respondents are 1.20 times more likely than nonrespondents to be living in wealthy
neighborhoods and to have a higher income.
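Profiling of this kind amounts to reversing the direction of a conditional probability with Bayes' theorem. The sketch below does this for a single binary feature. The function name and all inputs in the usage example are hypothetical: the 6% echoes the response probability quoted earlier for wealthy city neighborhoods, while the other two values are invented, so the resulting ratio is not the 1.20 reported in the text.

```python
def profile_ratio(p_feature, p_resp_given_feature, p_resp_given_other):
    """Return P(feature | respondent) / P(feature | nonrespondent)."""
    # Marginal probability of responding (law of total probability).
    p_resp = (p_resp_given_feature * p_feature
              + p_resp_given_other * (1.0 - p_feature))
    # Bayes' theorem, applied once to respondents and once to nonrespondents.
    p_feature_given_resp = p_resp_given_feature * p_feature / p_resp
    p_feature_given_nonresp = (1.0 - p_resp_given_feature) * p_feature / (1.0 - p_resp)
    return p_feature_given_resp / p_feature_given_nonresp


if __name__ == "__main__":
    # HYPOTHETICAL inputs: 30% of donors live in wealthy neighborhoods; they
    # respond with probability 0.06, all other donors with probability 0.045.
    print(round(profile_ratio(0.30, 0.06, 0.045), 3))
```

A ratio above 1 means the feature is over-represented among respondents; a feature that does not change the response probability gives a ratio of exactly 1.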

Fig. 10.12. The Bayesian network induced from the data warehouse to profile likely
respondents to mail solicitations.
10.7 Conclusions and Future Research Directions
Bayesian networks are a representation formalism born at the intersection of statistics
and Artificial Intelligence. Thanks to their solid statistical foundations, they have
been successfully turned into a powerful Data Mining and knowledge discovery tool
able to uncover complex models of interactions from large databases. Their highly
symbolic nature makes them easily understandable to human operators. Contrary
to standard classification methods, Bayesian networks do not require the preliminary
identification of an outcome variable of interest but they are able to draw probabilistic
inferences on any variable in the database.
Notwithstanding these attractive properties, there are still several theoretical issues
that limit the range of applicability of Bayesian networks to the practice of
science and engineering. This chapter has described methods to learn Bayesian networks
from databases with either discrete or continuous variables. How to induce
Bayesian networks from databases containing both types of variables is still very
much an open research issue. Imposing the assumption that discrete variables can
only be parent nodes in the network, and cannot be children of any continuous
Gaussian node, leads to a closed form solution for the computation of the marginal
likelihood (Lauritzen, 1992). This property has been applied, for example, to model-based
clustering by (Ramoni et al., 2002), and it is commonly used in classification
problems (Cheeseman and Stutz, 1996). However, this restriction can quickly become
unrealistic and greatly limit the set of models to explore. As a consequence, common
practice is still to discretize continuous variables, with possible loss of information,
particularly when the continuous variables are highly skewed.
Another challenging research issue is how to learn Bayesian networks from incomplete
data. The received view of the effect of missing data on statistical inference
is based on the approach described by Rubin in (Rubin, 1987). This approach classifies
the missing data mechanism as ignorable or not, according to whether the data
are missing completely at random (MCAR), missing at random (MAR), or informatively
missing (IM). According to this approach, data are MCAR if the probability
that an entry is missing is independent of both observed and unobserved values.
They are MAR if this probability is at most a function of the observed values in the
database and, in all other cases, data are IM. The received view is that, when data
are either MCAR or MAR, the missing data mechanism is ignorable for parameter
estimation, but it is not when data are IM. An important but overlooked issue
is whether the missing data mechanism generating data that are MAR is ignorable
for model selection (Rubin, 1996, Sebastiani and Ramoni, 2001A). We have shown
that this is not the case for regression-type graphical models, and we have introduced
two approaches to model selection with partially ignorable missing data
mechanisms: ignorable imputation and model folding. Contrary to standard imputation
schemes (Geiger et al., 1995, Little and Rubin, 1987, Schafer, 1997, Tanner,
1996, Thibaudeau and Winkler, 2002), ignorable imputation accounts for the missing-data
mechanism and produces, asymptotically, a proper imputation model as defined
by Rubin (Rubin, 1987, Rubin et al., 1995). However, the computational effort can
be very demanding. Model folding, by contrast, is a deterministic method to approximate
the exact marginal likelihood that reaches high accuracy at a low computational cost,
because the complexity of the model search is not affected by the presence of incomplete
cases. Both ignorable imputation and model folding reconstruct a completion
of the incomplete data by taking into account the variables responsible for the missing
data. This property is in agreement with the suggestion put forward in (Heitjan
and Rubin, 1991, Little and Rubin, 1987, Rubin, 1976) that the variables responsible
for the missing data should be kept in the model. However, our approach allows
us to also evaluate the likelihoods of models that do not depend explicitly on these
variables.
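The three missing-data mechanisms defined above can be made concrete with a small simulation. The sketch below is an illustration only and does not implement ignorable imputation or model folding: it generates a fully observed binary covariate `x` and a response `y`, deletes values of `y` under each mechanism using deletion probabilities that are arbitrary choices, and compares two estimates of the mean of `y`, whose true value is 1 by construction. All names are hypothetical.

```python
import random


def simulate(n=20000, seed=0):
    """Complete data: x is Bernoulli(0.5); y is Normal(2, 1) if x else Normal(0, 1)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.random() < 0.5
        rows.append((x, rng.gauss(2.0 if x else 0.0, 1.0)))
    return rows


# Probability that y is deleted, as a function of (x, y). ASSUMED values.
MECHANISMS = {
    "MCAR": lambda x, y: 0.3,                    # depends on nothing
    "MAR": lambda x, y: 0.6 if x else 0.1,       # depends only on the observed x
    "IM": lambda x, y: 0.8 if y > 1.0 else 0.1,  # depends on the unobserved y itself
}


def delete_values(rows, p_missing, seed=1):
    """Replace y by None with probability p_missing(x, y); x is always observed."""
    rng = random.Random(seed)
    return [(x, None if rng.random() < p_missing(x, y) else y) for x, y in rows]


def complete_case_mean(rows):
    """Mean of y over the cases in which y is observed."""
    observed = [y for _, y in rows if y is not None]
    return sum(observed) / len(observed)


def stratified_mean(rows):
    """Mean of y within each level of x, weighted by the full-sample share of that level."""
    total = 0.0
    for level in (False, True):
        stratum = [y for x, y in rows if x == level]
        observed = [y for y in stratum if y is not None]
        total += (len(stratum) / len(rows)) * (sum(observed) / len(observed))
    return total


if __name__ == "__main__":
    data = simulate()
    for name, rule in MECHANISMS.items():
        incomplete = delete_values(data, rule)
        print(name, round(complete_case_mean(incomplete), 3),
              round(stratified_mean(incomplete), 3))
```

In this setup the complete-case mean stays close to 1 only under MCAR. Under MAR it is biased, but conditioning on the observed `x` recovers the true mean, which is the sense in which the mechanism is ignorable for estimation. Under IM both estimates are biased.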
Although this work provides the analytical foundations for a proper treatment
of missing data when the inference task is model selection, it is limited to the very
special situation in which only one variable is partially observed, data are supposed
to be only MCAR or MAR, and the set of Bayesian networks is limited to those
in which the partially observed variable is a child of the other variables. Research
is needed to extend these results to more general graphical structures, in which
several variables can be partially observed and data can be MCAR, MAR or IM.
These two issues, learning mixed-variable networks and handling incomplete
databases, are still unsolved and they offer challenging research opportunities.
Acknowledgments
This work was supported in part by the National Science Foundation (ECS-0120309),
the Spanish State Office of Education and Universities, the European Social Fund and
the Fulbright Program of the US State Department.
References
S. G. Bottcher and C. Dethlefsen. Deal: A package for learning Bayesian networks. Available
from 2003.
U. M. Braga-Neto and E. R. Dougherty. Is cross-validation valid for small-sample microarray
classification? Bioinformatics, 20:374–380, 2004.
E. Castillo, J. M. Gutierrez, and A. S. Hadi. Expert Systems and Probabilistic Network
Models. Springer, New York, NY, 1997.
E. Charniak. Belief networks without tears. AI Magazine, pages 50–62, 1991.
P. Cheeseman and J. Stutz. Bayesian classification (AutoClass): Theory and results. In
U. M. Fayyad, G. Piatetsky-Shapiro, P. Smyth, and R. Uthurusamy, editors, Advances in
Knowledge Discovery and Data Mining, pages 153–180. MIT Press, Cambridge, MA,
1996.
J. Cheng and M. Druzdzel. AIS-BN: An adaptive importance sampling algorithm for
evidential reasoning in large Bayesian networks. J Artif Intell Res, 13:155–188, 2000.
D. M. Chickering. Learning equivalence classes of Bayesian-network structures. J Mach
Learn Res, 2:445–498, February 2002.
G. F. Cooper. The computational complexity of probabilistic inference using Bayesian belief
networks. Artif Intell, 42:297–346, 1990.
G. F. Cooper and E. Herskovits. A Bayesian method for the induction of probabilistic
networks from data. Mach Learn, 9:309–347, 1992.
R. G. Cowell, A. P. Dawid, S. L. Lauritzen, and D. J. Spiegelhalter. Probabilistic Networks
and Expert Systems. Springer, New York, NY, 1999.
A. P. Dawid and S. L. Lauritzen. Hyper Markov laws in the statistical analysis of
decomposable graphical models. Ann Stat, 21:1272–1317, 1993. Correction ibidem, (1995), 23,
1864.
R. O. Duda and P. E. Hart. Pattern Classification and Scene Analysis. Wiley, New York, NY,
1973.
N. Friedman. Inferring cellular networks using probabilistic graphical models. Science,
303:799–805, 2004.
N. Friedman, D. Geiger, and M. Goldszmidt. Bayesian network classifiers. Mach Learn,
29:131–163, 1997.
N. Friedman and D. Koller. Being Bayesian about network structure: A
Bayesian approach to structure discovery in Bayesian networks. Mach Learn,
50:95–125, 2003.
N. Friedman, K. Murphy, and S. Russell. Learning the structure of dynamic probabilistic
networks. In Proceedings of the 14th Annual Conference on Uncertainty in Artificial
Intelligence (UAI-98), pages 139–147, San Francisco, CA, 1998. Morgan Kaufmann
Publishers.
D. Geiger and D. Heckerman. Learning Gaussian networks. In Proceedings of the Tenth
Annual Conference on Uncertainty in Artificial Intelligence (UAI-94), San Francisco,
1994. Morgan Kaufmann.
D. Geiger and D. Heckerman. A characterization of Dirichlet distributions through local and
global independence. Ann Stat, 25:1344–1368, 1997.
A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis. Chapman
and Hall, London, UK, 1995.
S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesian
restoration of images. IEEE T Pattern Anal, 6:721–741, 1984.
W. R. Gilks and G. O. Roberts. Strategies for improving MCMC. In W. R. Gilks,
S. Richardson, and D. J. Spiegelhalter, editors, Markov Chain Monte Carlo in Practice,
pages 89–114. Chapman and Hall, London, UK, 1996.
C. Glymour, R. Scheines, P. Spirtes, and K. Kelly. Discovering Causal Structure:
Artificial Intelligence, Philosophy of Science, and Statistical Modeling. Academic Press, San
Diego, CA, 1987.
I. J. Good. Rational decisions. J Roy Stat Soc B, 14:107–114, 1952.
I. J. Good. The Estimation of Probability: An Essay on Modern Bayesian Methods. MIT
Press, Cambridge, MA, 1968.
D. J. Hand. Construction and Assessment of Classification Rules. Wiley, New York, NY,
1997.
D. J. Hand, N. M. Adams, and R. J. Bolton. Pattern Detection and Discovery. Springer, New
York, 2002.
D. J. Hand, H. Mannila, and P. Smyth. Principles of Data Mining. MIT Press, Cambridge,
2001.
T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning. Springer-
Verlag, New York, 2001.
D. Heckerman. Bayesian networks for Data Mining. Data Min Knowl Disc, 1:79–119, 1997.
D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The
combinations of knowledge and statistical data. Mach Learn, 20:197–243, 1995.
D. F. Heitjan and D. B. Rubin. Ignorability and coarse data. Ann Stat, 19:2244–2253, 1991.
R. E. Kass and A. Raftery. Bayes factors. J Am Stat Assoc, 90:773–795, 1995.
P. Langley, W. Iba, and K. Thompson. An analysis of Bayesian classifiers. In Proceedings
of the Tenth National Conference on Artificial Intelligence, pages 223–228, Menlo Park,
CA, 1992. AAAI Press.
P. Larranaga, C. Kuijpers, R. Murga, and Y. Yurramendi. Learning Bayesian network
structures by searching for the best ordering with genetic algorithms. IEEE T Pattern Anal,
26:487–493, 1996.
S. L. Lauritzen. Propagation of probabilities, means and variances in mixed graphical
association models. J Am Stat Assoc, 87(420):1098–108, 1992.
S. L. Lauritzen. Graphical Models. Oxford University Press, Oxford, UK, 1996.

S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical
structures and their application to expert systems (with discussion). J Roy Stat Soc B,
50:157–224, 1988.
R. J. A. Little and D. B. Rubin. Statistical Analysis with Missing Data. Wiley, New York,
NY, 1987.
D. Madigan and A. E. Raftery. Model selection and accounting for model uncertainty in
graphical models using Occam’s window. J Am Stat Assoc, 89:1535–1546, 1994.
D. Madigan and G. Ridgeway. Bayesian data analysis for Data Mining. In Handbook of
Data Mining, pages 103–132. MIT Press, 2003.
D. Madigan and J. York. Bayesian graphical models for discrete data. Int Stat Rev, pages
215–232, 1995.
P. McCullagh and J. A. Nelder. Generalized Linear Models. Chapman and Hall, London,
2nd edition, 1989.
A. O’Hagan. Bayesian Inference. Kendall’s Advanced Theory of Statistics. Arnold, London,
UK, 1994.
J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of plausible inference.
Morgan Kaufmann, San Francisco, CA, 1988.
M. Ramoni, A. Riva, M. Stefanelli, and V. Patel. An ignorant belief network to forecast
glucose concentration from clinical databases. Artif Intell Med, 7:541–559, 1995.
M. Ramoni and P. Sebastiani. Bayesian methods. In Intelligent Data Analysis. An
Introduction, pages 131–168. Springer, New York, NY, 2nd edition, 2003.
M. Ramoni, P. Sebastiani, and I. S. Kohane. Cluster analysis of gene expression dynamics.
Proc Natl Acad Sci USA, 99(14):9121–6, 2002.
L. Rokach, M. Averbuch, and O. Maimon. Information retrieval system for medical
narrative reports. Lecture Notes in Artificial Intelligence, 3055, pages 217–228.
Springer-Verlag, 2004.
D. B. Rubin. Inference and missing data. Biometrika, 63:581–592, 1976.
D. B. Rubin. Multiple Imputation for Nonresponse in Surveys. Wiley, New York, NY, 1987.
D. B. Rubin. Multiple imputation after 18 years. J Am Stat Assoc, 91:473–489, 1996.

D. B. Rubin, H. S. Stern, and V. Vehovar. Handling “don’t know” survey responses: the case
of the Slovenian plebiscite. J Am Stat Assoc, 90:822–828, 1995.
M. Sahami. Learning limited dependence Bayesian classifiers. In Proceedings of the 2nd Int.
Conf. on Knowledge Discovery & Data Mining, 1996.
J. L. Schafer. Analysis of Incomplete Multivariate Data. Chapman and Hall, London, UK,
1997.
P. Sebastiani, M. Abad, and M. F. Ramoni. Bayesian networks for genomic analysis. In
E. R. Dougherty, I. Shmulevich, J. Chen, and Z. J. Wang, editors, Genomic Signal Processing
and Statistics, Series on Signal Processing and Communications. EURASIP, 2004.
P. Sebastiani and M. Ramoni. Analysis of survey data with Bayesian networks. Technical
Report, Knowledge Media Institute, The Open University, Walton Hall, Milton Keynes
MK7 6AA, 2000. Available from authors.
P. Sebastiani and M. Ramoni. Bayesian selection of decomposable models with incomplete
data. J Am Stat Assoc, 96(456):1375–1386, 2001A.
P. Sebastiani and M. Ramoni. Common trends in European school populations. Res. Offic.
Statist., 4(1):169–183, 2001B.
P. Sebastiani and M. F. Ramoni. On the use of Bayesian networks to analyze survey data.
Res. Offic. Statist., 4:54–64, 2001C.
P. Sebastiani and M. Ramoni. Generalized gamma networks. Technical report, University of
Massachusetts, Department of Mathematics and Statistics, 2003.
P. Sebastiani, M. Ramoni, and A. Crea. Profiling customers from in-house data. ACM
SIGKDD Explorations, 1:91–96, 2000.
P. Sebastiani, M. Ramoni, and I. Kohane. BADGE: Technical notes. Technical report,
Department of Mathematics and Statistics, University of Massachusetts at Amherst, 2003.
P. Sebastiani, M. F. Ramoni, V. Nolan, C. Baldwin, and M. H. Steinberg. Discovery of
complex traits associated with overt stroke in patients with sickle cell anemia by Bayesian
network modeling. In 27th Annual Meeting of the National Sickle Cell Disease Program,
2004. To appear.
P. Sebastiani, Y. H. Yu, and M. F. Ramoni. Bayesian machine learning and its potential
applications to the genomic study of oral oncology. Adv Dent Res, 17:104–108, 2003.
R. D. Shachter. Evaluating influence diagrams. Operations Research, 34:871–882, 1986.
M. Singh and M. Valtorta. Construction of Bayesian network structures from data: A brief
survey and an efficient algorithm. Int J Approx Reason, 12:111–131, 1995.
D. J. Spiegelhalter and S. L. Lauritzen. Sequential updating of conditional probabilities on
directed graphical structures. Networks, 20:157–224, 1990.
P. Spirtes, C. Glymour, and R. Scheines. Causation, prediction and search. Springer, New
York, 1993.
M. A. Tanner. Tools for Statistical Inference. Springer, New York, NY, third edition, 1996.
Y. Thibaudeau and W. E. Winkler. Bayesian networks representations, generalized imputation,
and synthetic microdata satisfying analytic restraints. Technical report, Statistical
Research Division report RR 2002/09, 2002.
A. Thomas, D. J. Spiegelhalter, and W. R. Gilks. Bugs: A program to perform Bayesian
inference using Gibbs Sampling. In J. Bernardo, J. Berger, A. P. Dawid, and A. F. M.
Smith, editors, Bayesian Statistics 4, pages 837–42. Oxford University Press, Oxford,
UK, 1992.
J. Whittaker. Graphical Models in Applied Multivariate Statistics. Wiley, New York, NY,
1990.
S. Wright. The theory of path coefficients: a reply to Niles’ criticism. Genetics, 8:239–255,
1923.
S. Wright. The method of path coefficients. Annals of Mathematical Statistics, 5:161–215,
1934.
J. Yu, V. Smith, P. Wang, A. Hartemink, and E. Jarvis. Using Bayesian network inference
algorithms to recover molecular genetic regulatory networks. In International Conference
on Systems Biology 2002 (ICSB02), 2002.
H. Zhou and S. Sakane. Sensor planning for mobile robot localization using Bayesian
network inference. J. of Advanced Robotics, 16, 2002. To appear.
11
Data Mining within a Regression Framework
Richard A. Berk
Department of Statistics

UCLA

Summary. Regression analysis can imply a far wider range of statistical procedures than
often appreciated. In this chapter, a number of common Data Mining procedures are discussed
within a regression framework. These include non-parametric smoothers, classification and
regression trees, bagging, and random forests. In each case, the goal is to characterize one or
more of the distributional features of a response conditional on a set of predictors.
Key words: regression, smoothers, splines, CART, bagging, random forests
11.1 Introduction
Regression analysis can imply a broader range of techniques than ordinarily appreciated.
Statisticians commonly define regression so that the goal is to understand
“as far as possible with the available data how the conditional distribution of
some response y varies across subpopulations determined by the possible values of
the predictor or predictors” (Cook and Weisberg, 1999). For example, if there is a
single categorical predictor such as male or female, a legitimate regression analysis
has been undertaken if one compares two income histograms, one for men and one
for women. Or, one might compare summary statistics from the two income distributions:
the mean incomes, the median incomes, the two standard deviations of income,
and so on. One might also compare the shapes of the two distributions with a Q-Q
plot.
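Under this definition, a regression analysis with one categorical predictor can be as simple as summarizing the response within each group. The sketch below does so with Python's standard `statistics` module. The income figures are invented for illustration, and pairing the sorted values is only a crude stand-in for a Q-Q plot that works here because both groups have the same size.

```python
import statistics

# HYPOTHETICAL incomes (in thousands of dollars), grouped by the predictor.
incomes = {
    "men": [30, 42, 55, 61, 48],
    "women": [28, 39, 51, 47, 60],
}


def conditional_summary(values):
    """Summary statistics of the response within one level of the predictor."""
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


# One summary per subpopulation: the response distribution conditional on the predictor.
summaries = {group: conditional_summary(values) for group, values in incomes.items()}

# Matching order statistics of the two groups, as one would plot in a Q-Q plot.
qq_pairs = list(zip(sorted(incomes["men"]), sorted(incomes["women"])))

if __name__ == "__main__":
    for group, summary in summaries.items():
        print(group, summary)
    print(qq_pairs)
```

No model, causal claim or test is involved: the comparison of the two conditional summaries is already a regression analysis in the descriptive sense used in this chapter.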
There is no requirement in regression analysis for there to be a “model” by which
the data were supposed to be generated. There is no need to address cause and effect.
And there is no need to undertake statistical tests or construct confidence intervals.
The definition of a regression analysis can be met by pure description alone.
Construction of a “model,” often coupled with causal and statistical inference, is a
supplement to a regression analysis, not a necessary component (Berk, 2003).
Given such a definition of regression analysis, a wide variety of techniques and
approaches can be applied. In this chapter, I will consider a range of procedures
O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed.,
DOI 10.1007/978-0-387-09823-4_11, © Springer Science+Business Media, LLC 2010
