
one), the results are similar. Some parameterizations of rank swapping (Rank with parameter
p in the Table) and microaggregation (Micmul with parameter k in the Table) rank among the
best algorithms both in (Domingo-Ferrer and Torra, 2001b) and here.
The comparison can be extended by evaluating new masking methods and comparing them
against the existing scores. For example, the results from (Jimenez and Torra, 2009) would make
it possible to include in this table (with a score lower than 40) some parameterizations of lossy
compression using JPEG 2000.
35.6.2 R-U Maps
(Duncan et al., 2001, Duncan et al., 2004) propose R-U maps (Risk-Utility maps), a graphical
representation of the two measures: R for risk and U for utility.
Figure 35.2 represents an R-U map for the methods listed in the previous section, each with
several parameterizations. Namely, RankXXX corresponds to Rank Swapping, MicXXX to
variations of Microaggregation, JPEGXXX to Lossy Compression using JPEG, and RemuestX
to resampling (not described in this chapter). In the figure, DR corresponds to the Disclosure
Risk (R in the standard jargon of R-U maps), and IL to the information loss (in our case
computed as aPIL). Formally, IL and utility U are related as follows: 1 − U = IL.
Note that, in addition to the protection procedures represented in Table 35.1, the figure
includes all the other methods analyzed in (Domingo-Ferrer and Torra, 2001b), but with the
new measures DR and aPIL described above. In this figure, the lines represent scores of 50,
40, 30, and 20. Naturally, the nearer a method is to (0,0), the better.
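To make the construction of such a map concrete, the sketch below plots a few illustrative (DR, IL) pairs together with iso-score lines. The values and method names are invented for illustration (they are not taken from Figure 35.2), and the score is assumed to be the average of IL and DR, so that a score of s corresponds to the straight line DR + IL = 2s.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical (DR, IL) values for a few masked data sets; in practice DR would
# come from record-linkage experiments and IL from aPIL (IL = 1 - U).
methods = {
    "Rank12":   (12.0, 28.0),
    "Micmul05": (22.0, 18.0),
    "JPEG040":  (35.0, 30.0),
    "Adit0.1":  (55.0, 25.0),
}

fig, ax = plt.subplots()
for name, (dr, il) in methods.items():
    ax.plot(dr, il, "+", color="black")
    ax.annotate(name, (dr, il), fontsize=7)

# Iso-score lines, assuming score = (IL + DR) / 2; a score of s is the line DR + IL = 2s.
for s in (20, 30, 40, 50):
    x = np.linspace(0, 2 * s, 50)
    ax.plot(x, 2 * s - x, linestyle="dotted", linewidth=0.7)

ax.set_xlim(0, 100)
ax.set_ylim(0, 100)
ax.set_xlabel("DR")
ax.set_ylabel("IL")
ax.set_title("Risk/Utility Map")
plt.show()
```

A method close to the origin combines low disclosure risk with low information loss, which is why proximity to (0,0) is the informal criterion of quality on such maps.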
35.7 Conclusions
In this chapter we have reviewed the major topics concerning privacy in data mining. We
have described the major protection methods, and discussed how to measure disclosure risk and
information loss. Finally, some tools for visualizing such measures and for comparing the
methods have been described.
Acknowledgements
Part of the research described in this chapter is supported by the Spanish MEC (projects ARES
– CONSOLIDER INGENIO 2010 CSD2007-00004 – and eAEGIS – TSI2007-65406-C03-
02).
References


Adam, N. R., Wortmann, J. C. (1989) Security-control methods for statistical databases: a comparative
study, ACM Computing Surveys 21:4 515-556.
Aggarwal, C. (2005) On k-anonymity and the curse of dimensionality, Proceedings of the
31st International Conference on Very Large Databases, pages 901-909.
Aggarwal, C. C., Yu, P. S. (2008) Privacy-Preserving Data Mining: Models and Algorithms,
Springer.
[Figure 35.2 appears here: a plot titled "Risk/Utility Map" with DR on the horizontal axis and IL on the vertical axis (both 0-100), showing one + mark per parameterization of the Distr, Remuest, JPEG, Adit, Mic*, and Rank methods.]
Fig. 35.2. R-U Maps for some protection methods. IL computed with PIL.
Agrawal, R., Srikant, R. (2000) Privacy Preserving Data Mining, Proc. of the ACM SIGMOD
Conference on Management of Data, 439-450.
Atallah, M., Bertino, E., Elmagarmid, A., Ibrahim, M., Verykios, V. (1999) Disclosure lim-
itation of sensitive rules, Proc. of IEEE Knowledge and Data Engineering Exchange
Workshop (KDEX).
Atzori, M., Bonchi, F., Giannotti, F., Pedreschi, D. (2008) Anonymity preserving pattern
discovery, The VLDB Journal 17 703-727.
Bacher, J., Brand, R., Bender, S. (2002) Re-identifying register data by survey data using
cluster analysis: an empirical study, Int. J. of Unc., Fuzz. and Knowledge Based Systems

10:5 589-607.
Bertino, E., Lin, D., Jiang, W. (2008) A survey of quantification of privacy preserving data
mining algorithms, in C. C. Aggarwal, P. S. Yu (eds.) Privacy-Preserving Data Mining:
Models and Algorithms, Springer, 183-205.
Brand, R. (2002) Microdata protection through noise addition, in J. Domingo-Ferrer (ed.)
Inference Control in Statistical Databases, Lecture Notes in Computer Science 2316 97-
116.
Bunn, P., Ostrovsky, R. (2007) Secure two-party k-means clustering, Proc. of CCS’07, ACM
Press, 486-497.
Burridge, J. (2003) Information preserving statistical obfuscation, Statistics and Computing,
13:321–327.
Carlson, M., Salabasis, M. (2002) A data swapping technique using ranks: a method for
disclosure control, Research on Official Statistics 5:2 35-64.
Dalenius, T. (1977) Towards a methodology for statistical disclosure control, Statistisk Tid-
skrift 5 429-444.
Dalenius, T. (1986) Finding a needle in a haystack - or identifying anonymous census
records, Journal of Official Statistics 2:3 329-336.
Defays, D., Nanopoulos, P. (1993) Panels of enterprises and confidentiality: the small aggre-
gates method, Proc. of 92 Symposium on Design and Analysis of Longitudinal Surveys,
Statistics Canada, 195-204.
Dempster, A. P., Laird, N. M., Rubin, D. B. (1977) Maximum Likelihood From Incomplete
Data Via the EM Algorithm, Journal of the Royal Statistical Society 39 1-38.
Domingo-Ferrer, J., Mateo-Sanz, J. M. (2002) Practical data-oriented microaggregation for
statistical disclosure control, IEEE Trans. on Knowledge and Data Engineering 14:1
189-201.
Domingo-Ferrer, J., Mateo-Sanz, J. M., Torra, V. (2001) Comparing SDC methods for mi-
crodata on the basis of information loss and disclosure risk, Pre-proceedings of ETK-
NTTS’2001, (Eurostat, ISBN 92-894-1176-5), Vol. 2, 807-826, Creta, Greece.
Domingo-Ferrer, J., Sebe, F., Castella-Roca, J. (2004) On the security of noise addition for

privacy in statistical databases, PSD 2004, Lecture Notes in Computer Science 3050
149-161.
Domingo-Ferrer, J., Torra, V. (2001) Disclosure Control Methods and Information Loss for
Microdata, in P. Doyle, J. I. Lane, J. J. M. Theeuwes, L. Zayatz (eds.) Confidentiality,
Disclosure, and Data Access: Theory and Practical Applications for Statistical Agencies,
Elsevier Science, 91-110.
Domingo-Ferrer, J., Torra, V. (2001) A quantitative comparison of disclosure control meth-
ods for microdata, in P. Doyle, J. I. Lane, J. J. M. Theeuwes, L. Zayatz (eds.) Confi-
dentiality, Disclosure and Data Access: Theory and Practical Applications for Statistical
Agencies, North-Holland, 111-134.
Domingo-Ferrer, J., Torra, V. (2003) Disclosure Risk Assessment in Statistical Microdata
Protection via advanced record linkage, Statistics and Computing, 13 343-354.
Domingo-Ferrer, J., Torra, V. (2005) Ordinal, Continuous and Heterogeneous k-Anonymity
Through Microaggregation, Data Mining and Knowledge Discovery 11:2 195-212.
Duncan, G. T., Keller-McNulty, S. A., Stokes, S. L. (2001) Disclosure risk vs. data utility:
The R-U confidentiality map, Technical Report 121, National Institute of Statistical Sci-
ences.
Duncan, G. T., Keller-McNulty, S. A., Stokes, S. L. (2004) Database security and confiden-
tiality: examining disclosure risk vs. data utility through the R-U confidentiality map,
Technical Report 142, National Institute of Statistical Sciences.
Duncan, G. T., Lambert, D. (1986) Disclosure-limited data dissemination, Journal of the
American Statistical Association, 81 10-18.
Duncan, G. T., Lambert, D. (1989) The risk of disclosure for microdata, Journal of Business
and Economic Statistics 7 207-217.
Elamir, E. A. H. (2004) Analysis of re-identification risk based on log-linear models, PSD
2004, Lecture Notes in Computer Science 3050 273-281.
Elliot, M. (2002) Integrating file and record level disclosure risk assessment, in J. Domingo-
Ferrer, Inference Control in Statistical Databases, Lecture Notes in Computer Science
2316 126-134.

Elliot, M. J. Skinner, C. J., Dale, A. (1998) Special Uniqueness, Random Uniques and Sticky
Populations: Some Counterintuitive Effects of Geographical Detail on Disclosure Risk,
Research in Official Statistics 1:2 53-67.
Fellegi, I. P., Sunter, A. B. (1969) A theory for record linkage, Journal of the American
Statistical Association 64:328 1183-1210.
Felsö, F., Theeuwes, J., Wagner, G. (2001) Disclosure Limitation in Use: Results of a Survey,
in P. Doyle, J. I. Lane, J. J. M. Theeuwes, L. Zayatz (eds.) Confidentiality, Disclosure,
and Data Access: Theory and Practical Applications for Statistical Agencies, Elsevier
Science, 17-42.
Franconi, L., Polettini, S. (2004) Individual risk estimation in μ-Argus: a review, PSD 2004,
Lecture Notes in Computer Science 3050 262-272.
Gouweleeuw, J. M., Kooiman, P., Willenborg, L. C. R. J., de Wolf, P.-P. (1998) Post Ran-
domisation for Statistical Disclosure Control: Theory and Implementation, Journal of
Official Statistics 14:4 463-478. Also as Research Paper No. 9731, Voorburg: Statistics
Netherlands (1997).
Gross, B., Guiblin, P., Merrett, K. (2004) Implementing the Post Randomisation method
to the individual sample of anonymised records (SAR) from the 2001 Census, paper
presented at “The Samples of Anonymised Records, An Open Meeting on the Samples of
Anonymised Records from the 2001 Census”. />09-30/gross.pdf
Hansen, S., Mukherjee, S. (2003) A Polynomial Algorithm for Optimal Univariate Microag-
gregation, IEEE Trans. on Knowledge and Data Engineering 15:4 1043-1044.
Haritsa, J. R. (2008) Mining association rules under privacy constraints, in C. C. Aggarwal,
P. S. Yu (eds.) Privacy-Preserving Data Mining: Models and Algorithms, Springer, 239-
266.
Hundepool, A., van de Wetering, A., Ramaswamy, R., Franconi, L., Capobianchi, C., de
Wolf, P.-P., Domingo-Ferrer, J., Torra, V., Brand, R., Giessing, S. (2003) μ-ARGUS
version 3.2 Software and User's Manual, Voorburg NL, Statistics Netherlands, February
2003; version 4.0 published in May 2005.
Jaro, M. A. (1989) Advances in record-linkage methodology as applied to matching the 1985
Census of Tampa, Florida, Journal of the American Statistical Association 84:406 414-420.
Jiménez, J., Torra, V. (2009) Utility and risk of JPEG-based continuous microdata protection
methods, Proc. Int. Conf. on Availability, Reliability and Security (ARES 2009), 929-
934.
Kantarcioglu, M. (2008) A survey of privacy-preserving methods across horizontally parti-
tioned data, in C. C. Aggarwal, P. S. Yu (eds.) Privacy-Preserving Data Mining: Models
and Algorithms, Springer, 313-335.
Kim, J., Winkler, W. (2003) Multiplicative noise for masking continuous data, Research
Report Series (Statistics 2003-01), U. S. Bureau of the Census.
Kisilevich, S., Rokach, L., Elovici, Y., Shapira, B. (2010) Efficient Multidimensional Suppression
for K-Anonymity, IEEE Transactions on Knowledge and Data Engineering 22:3 334-347.
Ladra, S., Torra, V. (2008) On the comparison of generic information loss measures and
cluster-specific ones, Intl. J. of Unc., Fuzz. and Knowledge-Based Systems, 16:1 107-
120.
Lambert, D. (1993) Measures of Disclosure Risk and Harm, Journal of Official Statistics 9
313-331.
LeFevre, K., DeWitt, D. J., Ramakrishnan, R. (2005) Multidimensional k-anonymity, Tech-
nical Report 1521, University of Wisconsin.
LeFevre, K., DeWitt, D. J., Ramakrishnan, R. (2005) Incognito: Efficient Full-Domain K-
Anonymity, SIGMOD 2005.
Li, N., Li, T., Venkatasubramanian, S. (2007) t-closeness: privacy beyond k-anonymity and
l-diversity, Proc. of the IEEE ICDE 2007.
Liew, C. K., Choi, U. J., Liew, C. J. (1985) A data distortion by probability distribution,
ACM Transactions on Database Systems 10 395-411.
Lindell, Y., Pinkas, B. (2002) Privacy Preserving Data Mining, Journal of Cryptology, 15:3.
Lindell, Y., Pinkas, B. (2000) Privacy Preserving Data Mining, Crypto’00, Lecture Notes in
Computer Science 1880 20-24.
Liu, K., Kargupta, H., Ryan, J. (2006) Random projection based multiplicative data pertur-
bation for privacy preserving data mining, IEEE Trans. on Knowledge and Data Engi-
neering 18:1 92-106.
Machanavajjhala, A., Gehrke, J., Kifer, D., Venkitasubramaniam, M. (2006) l-diversity:
privacy beyond k-anonymity, Proc. of the IEEE ICDE.
Mateo-Sanz, J. M., Domingo-Ferrer, J., Sebé, F. (2005) Probabilistic information loss mea-
sures in confidentiality protection of continuous microdata, Data Mining and Knowledge
Discovery, 11:2 181-193.
Moore, R. (1996) Controlled data swapping techniques for masking public use microdata
sets, U. S. Bureau of the Census (unpublished manuscript).
Muralidhar, K., Sarathy, R. (2008) Generating Sufficiency-based Non-Synthetic Perturbed
Data, Transactions on Data Privacy 1:1 17-33.
Nin, J., Herranz, J., Torra, V. (2007) Rethinking Rank Swapping to Decrease Disclosure
Risk, Data and Knowledge Engineering, 64:1 346-364.
Nin, J., Herranz, J., Torra, V. (2008) How to Group Attributes in Multivariate Microaggrega-
tion, Intl. J. of Unc., Fuzz. and Knowledge-Based Systems, 16:1 121-138.
Nin, J., Herranz, J., Torra, V. (2008) On the Disclosure Risk of Multivariate Microaggrega-
tion, Data and Knowledge Engineering, 67:3 399-412.
Nin, J., Herranz, J., Torra, V. (2008) Towards a More Realistic Disclosure Risk Assessment,
Lecture Notes in Computer Science, 5262 152-165.
Nin, J., Torra, V. (2006) Extending microaggregation procedures for time series protection,
Lecture Notes in Artificial Intelligence, 4259 899-908.

Nin, J., Torra, V. (2009) Analysis of the Univariate Microaggregation Disclosure Risk, New
Generation Computing, 27 177-194.
Oganian, A., Domingo-Ferrer, J. (2000) On the Complexity of Optimal Microaggregation
for Statistical Disclosure Control, Statistical J. United Nations Economic Commission
for Europe, 18, 4, 345-354.
Paass, G. (1985) Disclosure risk and disclosure avoidance for microdata, Journal of Business
and Economic Statistics 6 487-500.
Paass, G., Wauschkuhn, U. (1985) Datenzugang, Datenschutz und Anonymisierung - Anal-
ysepotential und Identifizierbarkeit von Anonymisierten Individualdaten, Oldenbourg
Verlag.
Pagliuca, D., Seri, G. (1999) Some results of individual ranking method on the system of
enterprise accounts annual survey, Esprit SDC Project, Deliverable MI-3/D2.
Pinkas, B. (2002) Cryptographic techniques for privacy-preserving data mining, ACM
SIGKDD Explorations 4:2.
Ravikumar, P., Cohen, W. W. (2004) A hierarchical graphical model for record linkage, Proc.
of UAI 2004.
Rokach, L. (2008) Genetic algorithm-based feature set partitioning for classification prob-
lems, Pattern Recognition 41:5 1676-1700.
Rokach, L., Maimon, O., Lavi, I. (2003) Space Decomposition In Data Mining: A Clustering Ap-
proach, Proceedings of the 14th International Symposium On Methodologies For Intel-
ligent Systems, Maebashi, Japan, Lecture Notes in Computer Science, Springer-Verlag,
24-31.
Samarati, P. (2001) Protecting Respondents’ Identities in Microdata Release, IEEE Trans. on
Knowledge and Data Engineering, 13:6 1010-1027.
Samarati, P., Sweeney, L. (1998) Protecting privacy when disclosing information: k-
anonymity and its enforcement through generalization and suppression, SRI Intl. Tech.
Rep.
Spruill, N. L. (1983) The confidentiality and analytic usefulness of masked business mi-
crodata, Proc. of the Section on Survey Research Methods 1983, American Statistical

Association, 602-610.
Sweeney, L. (2002) Achieving k-anonymity privacy protection using generalization and sup-
pression, Int. J. of Unc., Fuzz. and Knowledge Based Systems 10:5 571-588.
Sweeney, L. (2002) k-anonymity: a model for protecting privacy, Int. J. of Unc., Fuzz. and
Knowledge Based Systems 10:5 557-570.
Takemura, A. (2002) Local recoding and record swapping by maximum weight matching for
disclosure control of microdata sets, Journal of Official Statistics 18 275-289. Preprint
(1999) Local recoding by maximum weight matching for disclosure control of microdata
sets.
Templ, M. (2008) Statistical Disclosure Control for Microdata Using the R-Package sdcMi-
cro, Transactions on Data Privacy 1 67-85.
Torra, V. (2004) Microaggregation for categorical variables: a median based approach, Proc.
Privacy in Statistical Databases (PSD 2004), Lecture Notes in Computer Science 3050
162-174.
Torra, V. (2004) OWA operators in data modeling and reidentification, IEEE Trans. on Fuzzy
Systems 12:5 652-660.
Torra, V. (2008) Constrained Microaggregation: Adding Constraints for Data Editing, Trans-
actions on Data Privacy 1:2 86-104.
Torra, V., Abowd, J. M., Domingo-Ferrer, J. (2006) Using Mahalanobis Distance-Based
Record Linkage for Disclosure Risk Assessment, Lecture Notes in Computer Science
4302 233-242.
Torra, V., Domingo-Ferrer, J. (2003) Record linkage methods for multidatabase data mining,
in V. Torra (ed.) Information Fusion in Data Mining, Springer, 101-132.
Torra, V., Miyamoto, S. (2004) Evaluating fuzzy clustering algorithms for microdata protec-
tion, PSD 2004, Lecture Notes in Computer Science 3050 175-186.
Trottini, M. (2003) Decision models for data disclosure limitation, PhD Dissertation,
Carnegie Mellon University.
Truta, T. M., Vinay, B. (2006) Privacy protection: p-sensitive k-anonymity property, Proc.
2nd Int. Workshop on Privacy Data Management (PDM 2006), p. 94.
Willenborg, L., de Waal, T. (2001) Elements of Statistical Disclosure Control, Lecture Notes

in Statistics, Springer-Verlag.
Winkler, W. E. (1993) Matching and record linkage, Statistical Research Division, U. S.
Bureau of the Census (USA), RR93/08.
Winkler, W. E. (2004) Re-identification methods for masked microdata, PSD 2004, Lecture
Notes in Computer Science 3050 216-230.
Yancey, W. E., Winkler, W. E., Creecy, R. H. (2002) Disclosure risk assessment in pertur-
bative microdata protection, in J. Domingo-Ferrer (ed.) Inference Control in Statistical
Databases, Lecture Notes in Computer Science 2316 135-152.
Yao, A. C. (1982) Protocols for Secure Computations, Proc. of 23rd IEEE Symposium on
Foundations of Computer Science, Chicago, Illinois, 160-164.

36
Meta-Learning - Concepts and Techniques
Ricardo Vilalta¹, Christophe Giraud-Carrier², and Pavel Brazdil³
¹ University of Houston
² Brigham Young University
³ University of Porto
Summary. The field of meta-learning has as one of its primary goals the understanding of the
interaction between the mechanism of learning and the concrete contexts in which that mech-
anism is applicable. The field has seen a continuous growth in the past years with interesting
new developments in the construction of practical model-selection assistants, task-adaptive
learners, and a solid conceptual framework. In this chapter we give an overview of different

techniques necessary to build meta-learning systems. We begin by describing an idealized
meta-learning architecture comprising a variety of relevant component techniques. We then
look at how each technique has been studied and implemented by previous research. In ad-
dition we show how meta-learning has already been identified as an important component in
real-world applications.
Key words: Meta-learning
36.1 Introduction
We are used to thinking of a learning system as a rational agent capable of adapting to a specific
environment by exploiting knowledge gained through experience; encountering multiple and
diverse scenarios sharpens the ability of the learning system to predict the effect produced
from selecting a particular course of action. In this case, learning is made manifest because
the quality of the predictions normally improves with an increasing number of scenarios or
examples. Nevertheless, if the predictive mechanism were to start afresh on different tasks,
the learning system would find itself at a considerable disadvantage; learning systems capable
of modifying their own predictive mechanism would soon outperform our base learner by
being able to change their learning strategy according to the characteristics of the task under
analysis.
Meta-learning differs from base-learning in the scope of the level of adaptation; whereas
learning at the base-level is based on accumulating experience on a specific learning task (e.g.,
credit rating, medical diagnosis, mine-rock discrimination, fraud detection, etc.), learning at
the meta-level is based on accumulating experience on the performance of multiple applica-
tions of a learning system. If a base-learner fails to perform efficiently, one would expect the
learning mechanism itself to adapt in case the same task is presented again. Meta-learning is
then important in understanding the interaction between the mechanism of learning and the
concrete contexts in which that mechanism is applicable. Briefly stated, the field of meta-
learning is focused on the relation between tasks or domains and learning strategies. In that
sense, by learning or explaining what causes a learning system to be successful or not on a

particular task or domain, we go beyond the goal of producing more accurate learners to the
additional goal of understanding the conditions (e.g., types of example distributions) under
which a learning strategy is most appropriate.
From a practical stance, meta-learning can solve important problems in the application of
machine learning and Data Mining tools, particularly in the area of classification and regres-
sion. First, the successful use of these tools outside the boundaries of research (e.g., industry,
commerce, government) is conditioned on the appropriate selection of a suitable predictive
model (or combinations of models) according to the domain of application. Without any kind
of assistance, model selection and combination can turn into stumbling blocks to the end-user
who wishes to access the technology more directly and cost-effectively. End-users often lack
not only the expertise necessary to select a suitable model, but also the availability of many
models to proceed on a trial-and-error basis (e.g., by measuring accuracy via some re-sampling
technique such as n-fold cross-validation). A solution to this problem is attainable through the
construction of meta-learning systems. These systems can provide automatic and systematic
user guidance by mapping a particular task to a suitable model (or combination of models).
Second, a problem commonly observed in the practical use of ML and DM tools is how
to profit from the repetitive use of a predictive model over similar tasks. The successful ap-
plication of models in real-world scenarios requires a continuous adaptation to new needs.
Rather than starting afresh on new tasks, we expect the learning mechanism itself to re-learn,
taking into account previous experience (Thrun, 1998,Pratt et al., 1991,Caruana, 1997,Vilalta
and Drissi, 2002). Again, meta-learning systems can help control the process of exploiting
cumulative expertise by searching for patterns across tasks.
Our goal in this chapter is to give an overview of different techniques necessary to build
meta-learning systems. To impose some structure, we begin by describing an idealized meta-
learning architecture comprising a variety of relevant component techniques. We then look at
how each technique has been studied and implemented by previous research. We hope that by
proceeding in this way the reader can not only learn from past work, but in addition gain some
insight on how to construct meta-learning systems.
We also hope to show how recent advances in meta-learning are increasingly filling the
gaps in the construction of practical model-selection assistants and task-adaptive learners,

as well as in the development of a solid conceptual framework (Baxter, 1998, Baxter, 2000,
Giraud-Carrier et al., 2004).
This chapter is organized as follows. In the next section we illustrate an idealized meta-
learning architecture and detail its constituent parts. In Section 36.3 we describe previous
research in meta-learning and its relation to our architecture. Section 36.4 describes a meta-
learning tool that has been instrumental as a decision support tool in real applications. Lastly,
Section 36.5 discusses future directions and provides our conclusions.
36.2 A Meta-Learning Architecture
In this section we provide a general view of a software architecture that will be used as a
reference to describe many of the principles and current techniques in meta-learning. Though
not every technique in meta-learning fits into this architecture, such a general view helps us
understand the challenges we need to overcome before we can turn the technology into a set
of useful and practical tools.
36.2.1 Knowledge-Acquisition Mode
To begin, we propose a meta-learning system that divides into two modes of operation. During
the first mode, also known as the knowledge-acquisition mode, the main goal is to learn about
the learning process itself. Figure 36.1 illustrates this mode of operation. We assume the input
to the system is made of more than one dataset of examples (e.g., more than one set of pairs
of feature vectors and classes; Figure 36.1A). Upon arrival of each dataset, the meta-learning
system invokes a component responsible for extracting dataset characteristics or meta-features
(Figure 36.1B). The goal of this component is to gather information that transcends the par-
ticular domain of application. We look for information that can be used to generalize to other
example distributions. Section 36.3.1 details current research pointing in this direction.
During the knowledge acquisition mode, the learning technique (Figure 36.1C) does not
exploit knowledge across different datasets or tasks. Each dataset is considered independently
of the rest; the output to the system is a learning strategy (e.g., a classifier or combination of
classifiers, Figure 36.1D). Statistics derived from the output model or its performance (Figure
36.1E) may also serve as a form of characterizing the task under analysis (Sections 36.3.1 and
36.3.1).

Information derived from the meta-feature generator and the performance evaluation mod-
ule can be combined into a meta-knowledge base (Figure 36.1F). This knowledge base is the
main result of the knowledge–acquisition phase; it reflects experience accumulated across
different tasks. Meta-learning is tightly linked to the process of acquiring and exploiting meta-
knowledge. One can even say that advances in the field of meta-learning hinge around one
specific question: how can we acquire and exploit knowledge about learning systems (i.e.,
meta-knowledge) to understand and improve their performance? As we describe current re-
search in meta-learning we will point out different forms of meta-knowledge.
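As a minimal sketch of this mode of operation (the particular meta-features and candidate learners below are arbitrary illustrative choices, not ones prescribed by the architecture), the knowledge-acquisition phase can be seen as a loop that characterizes each incoming dataset, evaluates every available learning strategy on it, and stores both results in the meta-knowledge base:

```python
import numpy as np
from scipy.stats import entropy
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

def meta_features(X, y):
    """A few simple dataset characteristics (Figure 36.1B); y holds integer class labels."""
    counts = np.bincount(y)
    return {
        "n_examples": X.shape[0],
        "n_attributes": X.shape[1],
        "n_classes": int((counts > 0).sum()),
        "class_entropy": entropy(counts / counts.sum(), base=2),
    }

CANDIDATE_LEARNERS = {          # the available learning strategies (Figure 36.1C)
    "tree": DecisionTreeClassifier(),
    "naive_bayes": GaussianNB(),
    "knn": KNeighborsClassifier(),
}

meta_knowledge_base = []        # experience accumulated across tasks (Figure 36.1F)

def knowledge_acquisition(datasets):
    for X, y in datasets:       # each dataset is processed independently of the rest
        mf = meta_features(X, y)
        # performance evaluation of every strategy on this task (Figure 36.1E)
        perf = {name: cross_val_score(clf, X, y, cv=5).mean()
                for name, clf in CANDIDATE_LEARNERS.items()}
        meta_knowledge_base.append({"meta_features": mf, "performance": perf})
```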
36.2.2 Advisory Mode
The efficiency of the meta-learner increases as it accumulates meta-knowledge. We assume
the lack of experience at the beginning of the learner’s life compels the meta-learner to use
one or more learning strategies without a clear preference for one of them; experimenting with
many different strategies becomes time consuming. However, as more training sets have been
examined, we expect the expertise of the meta-learner to dominate in deciding which learning
strategy best suits the characteristics of the training set.
In the advisory mode, meta-knowledge acquired in the exploratory mode is used to con-
figure the learning system in a manner that exploits the characteristics of the new data distri-
bution. Meta-features extracted from the dataset (Figure 36.2B) are matched with the meta-
knowledge base (Figure 36.2F) to produce a recommendation regarding the best available
learning strategy. At this point we move away from the use of static base learners to the ability
to do model selection or combining base learners (Figure 36.2C).
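Continuing the sketch above (again purely illustrative; an alternative, discussed in the next paragraph, is to learn the mapping itself with a meta-level learner), the matching step of the advisory mode could be as simple as a nearest-neighbour look-up in meta-feature space:

```python
def advise(X_new, y_new):
    """Recommend a learning strategy for a new task (Figures 36.2B and 36.2C)."""
    mf = meta_features(X_new, y_new)
    keys = sorted(mf)
    query = np.array([mf[k] for k in keys], dtype=float)

    def distance(entry):
        # Euclidean distance in meta-feature space; in practice the meta-features
        # would be normalised before comparison.
        stored = np.array([entry["meta_features"][k] for k in keys], dtype=float)
        return np.linalg.norm(query - stored)

    # match the new task against the meta-knowledge base (Figure 36.2F)
    nearest = min(meta_knowledge_base, key=distance)
    # recommend the strategy that performed best on the most similar past task
    return max(nearest["performance"], key=nearest["performance"].get)
```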
Two observations are worth considering at this point. First, the nature of the match be-
tween the set of meta-features and the meta-knowledge base can have several interpretations.
The traditional view poses this problem as a learning problem itself where a meta-learner
is invoked to output an approximating function mapping meta-features to learning strategies
