
setting. Such well-defined, strong processes include, for instance, clear model evaluation procedures (Blockeel and Moyle, 2002).
Different perspectives exist on what collaborative Data Mining is (this is discussed further
in section 54.5). Three interpretations are: 1) multiple software agents applying Data Mining
algorithms to solve the same problem; 2) humans using modern collaboration techniques to
apply Data Mining to a single, defined problem; 3) Data Mining the artifacts of human collab-
oration. This chapter will focus solely on the second item – that of humans using collaboration
techniques to apply Data Mining to a single task. With sufficient definition of a particular Data
Mining problem, this is similar to a multiple software agent Data Mining framework (the first
item), although this is not the aim of the chapter. Many of the difficulties encountered in human
collaboration will also be encountered in designing a system for software agent collaboration.
Collaborative Data Mining aims to combine the results generated by isolated experts,
by enabling the collaboration of geographically dispersed laboratories and companies. For
each Data Mining problem, a virtual team of experts is selected on the basis of adequacy and
availability. Experts apply their methods to solving the problem – but also communicate with
each other to share their growing understanding of the problem. It is here that collaboration is
key.
The process of analyzing data through models has many similarities to experimental re-
search. Like the process of scientific discovery, Data Mining can benefit from different tech-
niques used by multiple researchers who collaborate, compete, and compare results to improve
their combined understanding. The rest of this chapter is organized as follows. The potential
difficulties in (remote) collaboration and a framework for analyzing such difficulties are out-
lined. A standard Data Mining process is reviewed, and studied for the likely contributions that
can be achieved collaboratively. A collaboration process for Data Mining is presented, with
clear guidelines for the practitioner so that they may avoid the potential pitfalls related to col-
laborative Data Mining. A brief summary of real examples of the application of collaborative
Data Mining is presented. The chapter concludes with a discussion.
54.2 Remote Collaboration
This section considers the motivations behind (remote) collaboration (the qualifier "remote" is dropped in the sequel) and the types of collaboration it enables. It then reviews the framework proposed by McKenzie and van Winkelen (McKenzie and van Winkelen, 2001) for working within e-Collaboration Space. The term e-Collaboration will be used as shorthand for remote collaboration, but many of the principles can also be applied to local collaboration.
54.2.1 E-Collaboration: Motivations and Forms
The main motivation for collaboration (Moyle et al., 2003) is to harness dispersed exper-
tise and to enable knowledge sharing and learning in a manner that builds intellectual capi-
tal (Edvinsson and Malone, 1997). This offers tantalizing potential rewards including boost-
ing innovation, flexible resource management, and reduced risk (Amara, 1990, Mowshowitz,
1997, Nohria and Eccles, 1993, Snow et al., 1996), but these rewards are offset by numerous
difficulties mainly due to the increased complexity of a virtual environment.
McKenzie and van Winkelen (2001) identify seven distinct forms of e-collaborating organizations, distinguished either by their structure or by the intent behind their formation. These are: 1) virtual/smart organizations; 2) a community of interest and practice;
3) a virtual enterprise; 4) virtual teams; 5) a community of creation; 6) collaborative product
commerce or customer communities; and 7) virtual sourcing and resource coalitions. For collaborative Data Mining, forms 4 and 5 are the most relevant. These forms are summarized below.
• Virtual Teams are temporary, culturally diverse, geographically dispersed work groups that communicate electronically. These can be smaller entities within virtual enterprises, or within a transnational organization. They are characterized by changing membership and multiple organizational contexts.
• A Community of Creation revolves around a central firm and shares its knowledge for the purpose of innovation. This structure consists of individuals and organizations with ever-changing boundaries.
Recognizing the form of collaboration makes it possible to analyze the difficulties that might be encountered. Such an analysis can be performed with respect to the e-collaboration space model described in the next section.
54.2.2 E-Collaboration Space
Each type of e-collaboration form can be usefully analyzed with respect to McKenzie and van Winkelen's e-Collaboration Space model (McKenzie and van Winkelen, 2001). This model casts each form into the space by locating it along three dimensions: number of boundaries crossed, task, and relationships.
• Boundaries crossed: The more boundaries that are crossed in e-collaboration, the more barriers to a successful outcome are present. All communication takes place across some boundary (Wilson, 2002). Fewer boundaries between agents lead to a lower risk of misunderstanding. In e-collaboration the number of boundaries is automatically increased. Influential boundaries to successful e-collaboration are: technological, temporal, organizational, and cultural.
• Task: The nature of the tasks involved in the collaborative project is influenced by the complexity of the processes, the uncertainty of the available information and outcomes, and the interdependence of the various stages of the task. Complexity can be broadly classified as linear (step-by-step processes) or non-linear. The interdependence of a task relates to whether it can be decomposed into subtasks that can be worked on independently by different participants.
• Relationships: Relationships are key to any successful collaboration. When electronic communication is the only mode of interaction it is harder for relationships to form, because the instinctive reading of signals that establish trust and mutual understanding is less accessible to participants.
For the remainder of the chapter only the dimension of task will be highlighted within the
e-collaboration space model. As will be described in the next sub-section, task complexity
makes collaborative Data Mining risk prone.
54.2.3 Collaborative Data Mining in E-Collaboration Space
Different forms of e-collaboration – as measured relative to the dimensions of task, boundaries, and relationships – can be viewed as locations in a three-dimensional e-collaboration space. The location of a collaborative Data Mining project depends on the actual setting of such a project. The dimension most clearly defined with respect to the Data Mining process (see section 54.3) is that of task.
The task complexity of Data Mining is high. Not only is a high level of expertise involved in a Data Mining project, but there is also the risk that, in reaching the final solution(s), much effort will appear – in hindsight – to have been wasted. Data miners have long understood the need for a methodology to support the Data Mining process (Adriaans and Zantinge, 1996, Fayyad et al., 1996, Chapman et al., 2000). All these methodologies are explicit that the Data Mining process is non-linear, and warn that information uncovered in later phases can invalidate assumptions made in earlier phases. As a result the earlier phases may need to be revisited. To exacerbate the situation, Data Mining is by its very nature a speculative process – there may be no valuable information contained in the data sources at all, or the techniques being used may not have sufficient power to uncover it. A typical Data Mining project at the start of the collaboration is summarized with respect to the e-collaboration model in Table 54.1.
Table 54.1. The position of a dispersed collaborative Data Mining project in e-collaboration space (* potential boundary depending on situation).

Task: High – complex non-linear interdependencies; uncertainty.
Boundaries crossed: High – medium technological; temporal*; geographical; large organizational*; cultural*.
Relationships: Medium-High – medium commonality of view; medium duration of existing relationship; medium duration of collaboration.
54.3 The Data Mining Process
Data Mining processes broadly consist of a number of phases. These phases, however, are interrelated and are not necessarily executed in a linear manner. For example, the results of one phase may uncover more detail relating to an earlier phase and may force more effort to be expended on a phase previously thought complete. The CRISP-DM methodology – the CRoss Industry Standard Process for Data Mining (Chapman et al., 2000) – is an attempt to standardise the process of Data Mining. In CRISP-DM, six interrelated phases are used to describe the Data Mining process: business understanding, data understanding, data preparation, modelling, evaluation, and deployment (Figure 54.1). The main outputs of the business understanding phase are the definition of business and Data Mining objectives as well as business and Data Mining evaluation criteria. In this phase an assessment of resource requirements and an estimation of risk are performed. In the data understanding phase data is collected and characterized. Data quality is also assessed.
During data preparation, tables, records and attributes are selected and transformed for
modelling. Modelling is the process of extracting input/output patterns from given data and
deriving models – typically mathematical or logical models. In the modelling phase, various techniques (e.g. association rules, decision trees, logistic regression, k-means clustering) are selected and applied, and their parameters are calibrated – or tuned – to optimal values. Different models are compared, and possibly combined.

Fig. 54.1. The CRISP-DM cycle
In the evaluation phase models are selected and reviewed according to the business cri-
teria. The whole Data Mining process is reviewed and a list of possible actions is elaborated.
In the last phase, deployment is planned, implemented, and monitored. The entire project is
typically documented and summarized in a report.
The CRISP-DM handbook (Chapman et al., 2000) describes in detail how each of the
main phases is subdivided into specific tasks, with clearly defined predecessors/successors,
and inputs/outputs.
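The interrelated, non-linear nature of these phases can be made concrete as a small graph. The sketch below (in Python; the phase names follow CRISP-DM, but the encoding itself is only an illustration, not part of the methodology) records the forward transitions together with the back-edges that model revisiting earlier phases:

```python
# A hypothetical encoding of the CRISP-DM cycle as a directed graph.
# Back-edges (e.g. evaluation -> business understanding) capture the
# revisiting of earlier phases described above.
CRISP_DM_FLOW = {
    "business understanding": ["data understanding"],
    "data understanding": ["business understanding", "data preparation"],
    "data preparation": ["modelling"],
    "modelling": ["data preparation", "evaluation"],
    "evaluation": ["business understanding", "deployment"],
    "deployment": [],
}

def can_follow(current, proposed):
    """True if the proposed phase is a legal next step from the
    current phase under this encoding of the cycle."""
    return proposed in CRISP_DM_FLOW[current]
```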
54.4 Collaborative Data Mining Guidelines

The CRISP-DM Data Mining process described in the preceding section can be adopted by Data Mining agents collaborating remotely on a particular Data Mining project (SolEuNet, 2002, Flach et al., 2003). Not all of the CRISP-DM methodology can be performed entirely in a collaborative setting. Business understanding, for instance, requires intense close contact with the business environment for which the Data Mining is being performed. The phases that can most easily be performed in a remote-collaborative fashion are data preparation and modelling. The other phases can nevertheless benefit from a collaborative approach. Although
many of the specific tasks can be carried out independently, care must be taken by the par-
ticipants to ensure that efforts are not wasted. Principles to guide the process of collaboration
should be established in advance of a collaborative Data Mining project. For instance, indi-
vidual agents must communicate or share any intermediate results – or improvements in the
current best understanding of the Data Mining problem – so that all agents have the new
knowledge. Providing a catalogue of up-to-date knowledge about the problem assists new
agents entering the Data Mining project. Furthermore, procedures are required for how results from different agents are compared, and ultimately combined, so that the value of the combined effort is greater than the sum of the individual contributions.
54.4.1 Collaboration Principles
Moyle et al. (2003) present a framework for collaborative Data Mining, involving both principles and technological support. Collaborative groupware technology, with specific functionality to support Data Mining, is described in (Voß et al., 2001). Principles for collaborative Data Mining are outlined as follows (Moyle et al., 2003).
1. Requisite management. Sufficient management processes should be established. In par-
ticular the definition and objectives of the Data Mining problem should be clear from the
start of the project to all participants. An infrastructure ensuring information flows within
the network of agents should be provided.
2. Problem Solving Freedom. Agents should use their expertise and tools to execute Data Mining tasks and solve the problem in the manner they find best.
3. Start any time. All the necessary information about the Data Mining problem should be captured and made available to participants at all times. This includes the problem definition, data, evaluation criteria, and any knowledge produced.
4. Stop any time. Participants should work on their solutions so that a working solution – however crude – is available whenever a stop signal is issued. These solutions will typically be Data Mining models. One approach is to try simpler modeling techniques first (Holte, 1993); a minimal baseline sketch follows this list.
5. Online knowledge sharing. The knowledge about the Data Mining problem gained by
each participant at each phase should be shared with all participants in a timely manner.
6. Security. Data and information about the Data Mining problem may contain sensitive information and must not be revealed outside the project. Access to information must be controlled.
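As an illustration of the "stop any time" principle from item 4 above, the following minimal sketch (Python; the example labels are invented) keeps a majority-class baseline available from the moment the labelled data arrives, in the spirit of Holte's (1993) observation that very simple classifiers often perform surprisingly well:

```python
from collections import Counter

def majority_class_baseline(train_labels):
    """Return a classifier that always predicts the most frequent
    training label: a crude but always-available 'stop any time'
    solution that better models must beat."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return lambda example: majority

# Ready as soon as labels are known, long before any tuned model.
predict = majority_class_baseline(["churn", "stay", "stay", "stay"])
print(predict({"customer_id": 42}))  # -> "stay"
```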
Having established a collaborative Data Mining project with appropriate principles and support, how can the results of the Data Mining efforts be compared and combined so that their value is maximized? This is the question the next section deals with.
54.4.2 Data Mining Model Evaluation and Combination
One of the main outputs from the Data Mining process (Chapman et al., 2000) is the set of Data Mining models. These may take many forms, including decision trees, rules, artificial neural networks, and regression equations (see (Mitchell, 1997) for an introduction to machine learning, and (Hair et al., 1998) for an introductory statistics text). Different agents may produce models in different forms, which requires methods for both evaluating and combining them.
When multiple agents produce multiple models as the result of the Data Mining effort, a process for evaluating their relative merits must be established. Such processes are well defined in Data Mining challenge problems (e.g. (Srinivasan et al., 1999, Page and Hatzis, 2001)). For example, a challenge recipe for the production of classificatory models can be found in (Moyle and Srinivasan, 2001). To ensure accurate comparisons, models built by different agents must be evaluated in exactly the same way, on the same data. This sounds like an obvious statement, but agents can easily make adjustments to their copy of the data to suit their particular approaches, without making the changes available to the other agents. This makes any model evaluation and comparison extremely difficult.
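One lightweight safeguard, suggested here rather than taken from the chapter, is for the project to freeze the evaluation data and publish its checksum, so that any agent can detect that its local copy has silently diverged. A sketch in Python (the file name and published value are hypothetical):

```python
import hashlib

def dataset_fingerprint(path, chunk_size=8192):
    """SHA-256 digest of the frozen evaluation data set."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Each agent verifies its copy before reporting any evaluation result:
# assert dataset_fingerprint("eval_set.csv") == PUBLISHED_FINGERPRINT
```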
Furthermore, the evaluation criterion or criteria (there may be several) deemed most appropriate may change during the knowledge discovery process. For instance, at some point one may wish to redefine the data set on which models are evaluated (e.g. because it is found to contain outliers that make the evaluation procedure inaccurate) and re-evaluate previously built models. In (Blockeel and Moyle, 2002) it is discussed how this evaluation and re-evaluation leads to significant extra effort for the different agents and consequently is a barrier to the knowledge discovery process, unless adequate software support is provided.
One approach to controlling model evaluation is to centralize the process. Consider an abstracted Data Mining process where agents first tune their modeling algorithm (which outputs the algorithm and its parameter settings, I), before building a final model (which is output as M). The agent then uses the model to predict the labels on a test set (producing predictions, P), from which an overall evaluation of the model (resulting in a score S) is determined. The point at which these outputs are published for all agents to access depends on the architecture of the evaluation system, as shown in Figure 54.2. A single evaluation agent provides the evaluation procedures; different agents submit information on their models to this agent, which stores this information and automatically evaluates it according to all relevant criteria. If criteria change, the evaluation agent automatically re-evaluates previously submitted models.
In such a framework, information about produced models can be submitted at several levels, as illustrated in Figure 54.2. Agents can run their own models on a test set and send only predictions to the evaluation agent (assuming evaluation is based on predictions only); they can submit descriptions of the models themselves; or they can send a complete description of the model-producing algorithm and the parameters used to an evaluation agent that has been augmented with modeling algorithms. These respective options offer increased centralization and increasingly flexible evaluation possibilities, but also involve increasingly sophisticated software support (Blockeel and Moyle, 2002).
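The three submission levels can be sketched in code. The following Python fragment is a hypothetical, minimal rendering of the centralized architecture (names such as EvaluationAgent are invented for illustration): predictions (P), models (M), or induction recipes (I) are all reduced to stored predictions, so that a change of criterion triggers automatic re-evaluation, as in (Blockeel and Moyle, 2002):

```python
class EvaluationAgent:
    """Hypothetical central evaluation agent. Submissions arrive at
    level P (predictions), M (a callable model), or I (an induction
    algorithm plus parameters) and are scored against one shared
    test set under the current criterion."""

    def __init__(self, test_inputs, test_labels, criterion):
        self.test_inputs = test_inputs
        self.test_labels = test_labels
        self.criterion = criterion      # e.g. the accuracy function below
        self.predictions = {}           # agent id -> stored predictions

    def submit_predictions(self, agent_id, preds):                   # level P
        self.predictions[agent_id] = list(preds)

    def submit_model(self, agent_id, model):                         # level M
        self.predictions[agent_id] = [model(x) for x in self.test_inputs]

    def submit_recipe(self, agent_id, induce, params, train_data):   # level I
        self.submit_model(agent_id, induce(train_data, **params))

    def scores(self):
        return {aid: self.criterion(p, self.test_labels)
                for aid, p in self.predictions.items()}

    def change_criterion(self, criterion):
        # Changed criteria automatically re-score all submitted work.
        self.criterion = criterion
        return self.scores()

def accuracy(predicted, actual):
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)
```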
Communicating Data Mining models to the evaluation agent can be performed using a standard format. For instance, in (Flach et al., 2003) models from multiple agents were submitted in a standard XML-style format (using the Predictive Model Markup Language (PMML) (The Data Mining Group, 2003)). Such a procedure has been adopted for a real-world collaborative Data Mining project (Flach et al., 2003).
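For concreteness, the fragment below (Python standard library) writes a deliberately simplified, PMML-like XML skeleton for a submission. It omits the version, namespace, and model elements that a schema-valid PMML document requires, so it should be read as an illustration of the idea of an XML model-exchange format rather than as real PMML:

```python
import xml.etree.ElementTree as ET

def write_model_skeleton(fields, path):
    """Write a simplified, PMML-like XML skeleton (illustrative only;
    real PMML additionally needs version/namespace attributes and a
    concrete model element such as TreeModel)."""
    root = ET.Element("PMML")
    ET.SubElement(root, "Header", description="collaborative submission")
    data_dict = ET.SubElement(root, "DataDictionary")
    for name, data_type in fields:
        ET.SubElement(data_dict, "DataField", name=name, dataType=data_type)
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)

write_model_skeleton([("age", "double"), ("class", "string")], "model.xml")
```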

Model combination is not always possible. However, when restricted to binary-classificatory models it is possible to utilize Receiver Operating Characteristic (ROC) curves (Provost and Fawcett, 2001) to assist both model comparison and model combination. ROC analysis plots different binary-classification models in a two-dimensional space with respect to the types of errors the models make – false positive errors and false negative errors (strictly, the axes of an ROC curve are the true positive rate versus the false positive rate). The actual performance of a model at run-time depends on the costs of errors at run-time, and the distribution of the classes at run-time. The values of these run-time parameters – or operating characteristics – determine the optimal model(s) for use in prediction. ROC analysis enables models to be compared, with the result that some models are never optimal under any operating conditions and can be discarded. The remaining models are those located on the ROC convex hull (ROCCH).
As well as identifying non-optimal models, ROC analysis can be used to combine models. One method is to use two adjacent models on the ROCCH that are located on either side of the operating condition in combination to make run-time predictions. Another approach to using the ROCCH is to modify a single model into multiple models, which can then be plotted in ROC space (Flach et al., 2001), resulting in models that fit a broader range of operating conditions. Wettschereck et al. (2003) describe a support system that performs model evaluation, model visualization, and model comparison, which has been applied in a collaborative Data Mining setting (Flach et al., 2003).
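A sketch of both uses of ROC analysis follows (Python; it assumes each model has already been reduced to a single (false positive rate, true positive rate) point, and the helper names are invented). The first function keeps only the models on the convex hull; the second realises an intermediate operating point by randomizing between two adjacent hull models:

```python
def roc_convex_hull(points):
    """Upper convex hull of (fpr, tpr) points, including the trivial
    'always negative' (0,0) and 'always positive' (1,1) classifiers.
    Models below the hull are never optimal under any costs or class
    distribution and can be discarded."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for x, y in pts:
        # Pop the last point while it lies on or below the new segment.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1) < 0:
                break
            hull.pop()
        hull.append((x, y))
    return hull

def tpr_at(hull, target_fpr):
    """Best achievable true positive rate at a required false positive
    rate: interpolate between the two adjacent hull models, i.e. use
    them in randomized combination at run-time."""
    for (x1, y1), (x2, y2) in zip(hull, hull[1:]):
        if x1 <= target_fpr <= x2:
            w = 0.0 if x2 == x1 else (target_fpr - x1) / (x2 - x1)
            return y1 + w * (y2 - y1)
    raise ValueError("target_fpr must lie in [0, 1]")

# Example: the (0.5, 0.5) model is dominated and drops off the hull.
hull = roc_convex_hull([(0.2, 0.6), (0.5, 0.5), (0.5, 0.9)])
print(hull)                # [(0.0, 0.0), (0.2, 0.6), (0.5, 0.9), (1.0, 1.0)]
print(tpr_at(hull, 0.35))  # interpolated between (0.2,0.6) and (0.5,0.9)
```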
Fig. 54.2. Two different architectures for model evaluation. The path finishing in dashed arrows depicts agents in charge of building and evaluating their own models before publishing their results centrally. The path of solid arrows depicts Data Mining agents submitting their models to a centralized evaluation agent, which provides the services of executing submitted models on a test set, evaluating the predictions to produce scores, and then publishing the results. The information submitted to the central evaluation agent is: I = algorithm and parameter settings to produce models; M = models; P = predictions made by the models on a test set; S = scores of the value of the models.
54.5 Discussion
References containing the keywords "collaborative Data Mining" and "collaboration" partition naturally into the following categories.
• Multiple software agents applying Data Mining algorithms to solve the same problem:
(e.g. (Ramakrishnan, 2001)) this presupposes that the Data Mining task and its associated
data are well defined a priori.
• Humans using modern collaboration techniques to apply Data Mining to a single, defined
problem (e.g. (Mladenic et al., 2003)).
• Data Mining the artifacts of human collaboration: (e.g. (Biuk-Aghai and Simoff, 2001))
these artifacts are typically the conversations and associated documents collected via some
electronic based discussion forum.
• The collaboration process itself resulting in increased knowledge: a form of knowledge
growth by collection within a context.
• Grid style computing facilities collaborating to provide resources for Data Mining (e.g. (Singh et al., 2003)): these resources typically provide either federated data or distributed computing power.
• Integrating Data Mining techniques into business process software (e.g. (McDougall, 2001)): for example Enterprise Resource Planning systems and groupware. Note that this, too, implies a priori knowledge of which Data Mining problems are to be solved.
This chapter focused mainly on the second item – that of humans using collaboration tech-
niques to apply Data Mining to a single task. With sufficient definition of a particular Data
Mining problem, this can lead to a multiple software agent Data Mining framework (the first
item), although this is not the aim of this chapter.
Many Data Mining challenges have been issued, which by their nature always result in "winners" and "losers". However, in collaborative approaches, much can be learned from the losers as the Data Mining projects proceed. Much initial effort is required to establish a Data Mining challenge (e.g. problem specification, data collection and preprocessing, specification of evaluation criteria) – even before the participants register. This effort also needs to be expended in a collaborative setting so that the objectives of the Data Mining project are clearly articulated in advance.
The collaborative methodology and techniques described here have been applied with mixed success in several Data Mining projects (Flach et al., 2003, Štěpánková et al., 2003, Jorge et al., 2003). Further development of collaborative Data Mining processes, supporting tools, and communication environments is likely to improve the results of harnessing dispersed Data Mining expertise.
54.6 Conclusions
Collaborative Data Mining is more difficult than Data Mining in a single-team setting. Data Mining benefits from adhering to established processes. One key notion in Data Mining methodologies is that of understanding (e.g. CRISP-DM contains the phases business understanding and data understanding). How are such understandings produced, articulated, maintained, and communicated to all collaborating agents? What happens when understandings change – how much of the Data Mining process will need re-work? How does one agent's understanding differ from another's, simply due to communication, language, and cultural differences?
Practitioners embarking on collaborative Data Mining might wish to heed some of the
lessons learned from other collaborative Data Mining projects:
• Analyze the form of collaboration proposed and understand how difficult it is likely to be.
• Establish a methodology that all participants can utilize along with support tools and tech-
nologies.
• Ensure that all results – intermediate or otherwise – are recorded, and shared in a timely
manner.
• Encourage competition among participants.
• Define metrics for success at all stages.
• Define model evaluation and combination procedures.
References
Adriaans, P., and Zantinge, D., Data Mining. Addison-Wesley, New York, 1996.
Amara, R., New directions for innovations. Futures 22(2): 142-152, 1990.
Bacon, F., Novum Organum, eds. P. Urbach and J. Gibson. Open Court Publishing Company, 1994.
Biuk-Aghai, R.P. and S.J. Simoff. An integrative framework for knowledge extraction in collaborative virtual environments. In The 2001 International ACM SIGGROUP Conference on Supporting Group Work. Boulder, Colorado, USA, 2001.
Blockeel, H. and S.A. Moyle. Collaborative Data Mining needs centralised model evaluation. In Proceedings of the ICML-2002 Workshop on Data Mining Lessons Learned. The University of New South Wales, Sydney, 2002.
Chapman, P., Clinton, J., Kerber, R., Khabaza, T., Reinartz, T., Shearer, C., and Wirth, R.
CRISP-DM 1.0: Step-by-step data mining guide. The CRISP-DM consortium, 2000.
Edvinsson, L. and Malone, M.S. Intellectual Capital: Realizing Your Company’s True Value
by Finding Its Hidden Brainpower. HarperBusiness, New York, USA, 1997.
Fayyad, U., et al., eds. Advances in Knowledge Discovery and Data Mining. MIT Press,
1996.
Flach, P.A., et al., Decision support for Data Mining: introduction to ROC analysis and its
application. In Data Mining and Decision Support: Integration and Collaboration, D.
Mladenic, et al., editors. Kluwer Academic Publishers, 2003.
Flach, P., Blockeel, H., Gaertner, T., Grobelnik, M., Kavsek, B., Kejkula, M., Krzywania, D., Lavrac, N., Mladenic, D., Moyle, S., Raeymaekers, S., Rauch, J., Ribeiro, R., Sclep, G., Struyf, J., Todorovski, L., Torgo, L., Wettschereck, D., and Wu, S. On the road to knowledge: mining 21 years of UK traffic accident reports. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors. Kluwer Academic Publishers, 2003.
Hair, J.F., Anderson, R.E., Tatham, R.L., and Black, W.C. Multivariate Data Analysis. Pren-
tice Hall, 1998.
Holte, R.C., Very Simple Classification Rules Perform Well on Most Commonly Used Datasets. Machine Learning 11: 63-91, 1993.
Jorge, J., Alves, M.A., Grobelnik, M., Mladenic, D., and Petrak, J. Web site access analysis for a national statistical agency. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors, p. 157-166. Kluwer Academic Publishers, 2003.
Kuhn, T.S., The Structure of Scientific Revolutions, 2nd enlarged ed. (first published 1962). University of Chicago Press, Chicago, 1970.
McDougall, P., Companies that dare to share information are cashing in on new opportuni-
ties. InformationWeek, May 7, 2001.
McKenzie, J. and C. van Winkelen. Exploring E-collaboration Space. In Proceedings of the First Annual Knowledge Management Forum Conference. Henley Management College, 2001.
Mitchell, T. Machine Learning. McGraw-Hill, 1997.
Mladenic, D., Lavrac, N., Bohanec, M., and Moyle, S. editors. Data Mining and Decision
Support: Integration and Collaboration. Kluwer Academic Publishers, 2003.
Mowshowitz, A., Virtual Organization. Communications of the ACM 40(9): 30-37, 1997.
Moyle, S.A. and Srinivasan, A., Classificatory challenge-Data Mining: a recipe. Informatica 25(3): 343-347, 2001.
Moyle, S., J. McKenzie, and A. Jorge, Collaboration in a Data Mining virtual organization.
In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et
al., editors. Kluwer Academic Publishers, 2003.
Nohria, N. and R.G. Eccles, eds. Networks and Organizations: Structure, Form, and Action. Harvard Business School Press, Boston, 1993.
Page, C.D. and C. Hatzis, KDD Cup 2001. University of Wisconsin, 2001.
Popper, K. The Logic of Scientific Discovery. Routledge, 1977.
Provost, F. and T. Fawcett. Robust Classification for Imprecise Environments. Machine Learning 42: 203-231, 2001.
Ramakrishnan, R. Mass Collaboration and Data Mining (keynote address). In The Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD-2001). San Francisco, California, 2001.

Singh, R., Leigh, J., DeFanti, T.A., and Karayannis, F. TeraVision: a High Resolution Graphics Streaming Device for Amplified Collaboration Environments. Journal of Future Generation Computer Systems (FGCS) 19(6): 957-972, 2003.
Snow, C.C., S.A. Snell, and S.C. Davison. Using transnational teams to globalize your company. Organizational Dynamics 24(4): 50-67, 1996.
SolEuNet. The Solomon European Network – Data Mining and Decision Support for Business Competitiveness: A European Virtual Enterprise. 2002.
Soukhanov, A., ed. Microsoft Encarta College Dictionary: The First Dictionary for the In-
ternet Age. St. Martin’s Press, 2001.
Srinivasan, A., King, R.D., and Bristol, D.W. An assessment of submissions made to the Predictive Toxicology Evaluation Challenge. In Proceedings of the Sixteenth International Joint Conference on Artificial Intelligence (IJCAI-99). Morgan Kaufmann, Los Angeles, CA, 1999.
Štěpánková, O., Kléma, J., and Mikšovský, P. Collaborative Data Mining with RAMSYS and Sumatra TT: Prediction of resources for a health farm. In Data Mining and Decision Support: Integration and Collaboration, D. Mladenic, et al., editors, p. 215-227. Kluwer Academic Publishers, 2003.
The Data Mining Group, The Predictive Model Markup Language (PMML). 2003.
Voß, A., Richter, G., Moyle, S., and Jorge, A. Collaboration support for virtual data mining enterprises. In 3rd International Workshop on Learning Software Organizations (LSO'01). Springer-Verlag, 2001.
Wettschereck, D., A. Jorge, and S. Moyle. Visualisation and Evaluation Support of Knowledge Discovery through the Predictive Model Markup Language. In 7th International Conference on Knowledge-Based Intelligent Information and Engineering Systems (KES 2003), Oxford. Springer-Verlag, 2003.
Wilson, T.D. The nonsense of knowledge management. Information Research 8(1), 2002.
Witten, I.H. and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques
with Java Implementations. Morgan Kaufmann, San Francisco, 2000.
