Data Mining and Knowledge Discovery Handbook, 2nd Edition (part 38)


350 Jean-Francois Boulicaut and Baptiste Jeudy
describe a mining algorithm but rather a pruning technique for non anti-monotonic
and non monotonic constraints. Considering a sub-lattice $\mathring{A}$ of $2^{I}$, the problem is to
decide whether this sub-lattice can be pruned. A sub-lattice is characterized by its
maximal element $M$ and its minimal element $m$, i.e., the sub-lattice is the collection
of all itemsets $S$ such that $m \subseteq S \subseteq M$. To prune this sub-lattice, one must prove that
none of its elements can satisfy the constraint $C$. To check this, the authors introduce
the concept of negative witness: a negative witness for $C$ in the sub-lattice $\mathring{A}$ is an
itemset $W$ such that $\neg C(W) \Rightarrow \forall X \in \mathring{A}, \neg C(X)$. Therefore, if the constraint is not
satisfied by the negative witness, then the whole sub-lattice can be pruned. Finding
witnesses for anti-monotonic or monotonic constraints is easy: $m$ is the witness for
all anti-monotonic constraints and $M$ for all monotonic ones. The authors then show
how to efficiently compute witnesses for various tough constraints. For instance, for
$AVG(S) > \sigma$, a witness is the set $m \cup \{ i \in M \mid i.v > \sigma \}$. The authors also give an
algorithm (linear in the size of $I$) to compute a witness for the difficult constraint
$VAR(S) > \sigma$, where VAR denotes the variance.
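The witness test for the average constraint can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names, the `value` dictionary (mapping each item i to i.v), and the convention that the empty itemset never satisfies the constraint are assumptions made for the example.

```python
def satisfies_avg(itemset, value, sigma):
    """Constraint AVG(S) > sigma; the empty itemset is assumed not to satisfy it."""
    if not itemset:
        return False
    return sum(value[i] for i in itemset) / len(itemset) > sigma


def negative_witness_avg(m, M, value, sigma):
    """Witness for AVG(S) > sigma in the sub-lattice m <= S <= M:
    the minimal element m plus every item of M whose value exceeds sigma."""
    return frozenset(m) | {i for i in M if value[i] > sigma}


def can_prune(m, M, value, sigma):
    """If the witness violates the constraint, no itemset of the sub-lattice
    can satisfy it, so the whole sub-lattice can be discarded."""
    witness = negative_witness_avg(m, M, value, sigma)
    return not satisfies_avg(witness, value, sigma)


# Hypothetical item values, used only for illustration.
value = {"a": 1, "b": 2, "c": 9, "d": 3}
m, M = {"a", "b"}, {"a", "b", "c", "d"}
print(sorted(negative_witness_avg(m, M, value, 5)))
print(can_prune(m, M, value, 5))
```

With these values and a threshold of 5, the witness is {a, b, c}, whose average is 4, so the four itemsets of the sub-lattice can be pruned after a single constraint check.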
17.4.3 Ad-hoc Strategies
Apart from generic algorithms, many algorithms have been designed to cope with
specific classes of constraints. We select only two examples.
The FIC algorithm (Pei et al., 2001) does a depth-first exploration of the itemset
lattice. It is very efficient due to its clever data structure, a prefix-tree used to
store the database. This algorithm can compute the extended theory for a conjunction
$C_{am} \wedge C_m \wedge C$ where $C$ is convertible anti-monotonic or monotonic. A constraint
$C$ is convertible anti-monotonic if there exists an order on the items such that, if
itemsets are written using this order, every prefix of an itemset satisfying $C$ satisfies
$C$. For instance, $AVG(S) > \sigma$ is convertible anti-monotonic if the items $i$ are ordered
by decreasing value $i.v$. The main problem with convertible constraints is that
a conjunction of convertible constraints is generally not convertible.
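The prefix property behind convertible anti-monotonicity can be checked by brute force on a small example. The sketch below is not the FIC algorithm; it only verifies, for a hypothetical `value` dictionary and threshold, that every prefix of a satisfying itemset also satisfies AVG(S) > σ when items are written by decreasing value. All function names are illustrative.

```python
from itertools import combinations


def avg_exceeds(items, value, sigma):
    """AVG(S) > sigma; the empty itemset is assumed not to satisfy it."""
    return bool(items) and sum(value[i] for i in items) / len(items) > sigma


def ordered(itemset, value):
    """Write an itemset with its items sorted by decreasing value i.v."""
    return sorted(itemset, key=lambda i: value[i], reverse=True)


def all_prefixes_satisfy(itemset, value, sigma):
    """True if every non-empty prefix of the ordered itemset satisfies the constraint."""
    seq = ordered(itemset, value)
    return all(avg_exceeds(seq[:k], value, sigma) for k in range(1, len(seq) + 1))


def check_convertible(value, sigma):
    """Brute-force check over all non-empty itemsets of a small item universe."""
    items = list(value)
    for r in range(1, len(items) + 1):
        for s in combinations(items, r):
            if avg_exceeds(s, value, sigma) and not all_prefixes_satisfy(s, value, sigma):
                return False
    return True


value = {"a": 1, "b": 2, "c": 9, "d": 3}  # hypothetical item values
print(check_convertible(value, sigma=3))
```

The order matters: {a, c} has average 5 and satisfies the constraint for σ = 3, but written by increasing value its prefix {a} has average 1 and does not.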

Another example of an ad-hoc strategy is used in the c-Spade algorithm (Zaki,
2000). This algorithm is used to extract constrained sequences where each event
in the sequences is dated. One of the constraints, the max-gap constraint, states
that two consecutive events occurring in a pattern must not be further apart than a
given maximum gap. This constraint is neither anti-monotonic nor monotonic and a
specific algorithm has been designed for it.
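To see why the max-gap constraint is not anti-monotonic, consider the following sketch. It is not c-Spade; it is a simple backtracking check, under the assumption that a data sequence is a list of `(timestamp, event)` pairs and a pattern is a list of events. The function name and the data are hypothetical.

```python
def occurs_with_max_gap(pattern, sequence, max_gap):
    """True if `pattern` occurs in `sequence` such that consecutive matched
    events are strictly ordered in time and at most `max_gap` apart."""
    if not pattern:
        return True

    def extend(k, last_time):
        # Try to match pattern[k:] given that pattern[k-1] matched at last_time.
        if k == len(pattern):
            return True
        for t, event in sequence:
            if event == pattern[k] and 0 < t - last_time <= max_gap:
                if extend(k + 1, t):
                    return True
        return False

    return any(event == pattern[0] and extend(1, t) for t, event in sequence)


sequence = [(1, "A"), (2, "B"), (3, "C")]  # hypothetical dated events
print(occurs_with_max_gap(["A", "B", "C"], sequence, max_gap=1))
print(occurs_with_max_gap(["A", "C"], sequence, max_gap=1))
```

With a maximum gap of 1, the pattern A, B, C occurs (both gaps equal 1), but its sub-pattern A, C does not (the gap is 2). A sub-pattern of a valid pattern can therefore be invalid, which is what rules out ordinary anti-monotonic pruning.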
17.4.4 Other Directions of Research
Among others, let us introduce here three important directions of research.
Adaptive Pruning Strategies
We mentioned the trade-off between anti-monotonic pruning, which is known to be
quite efficient, and pruning based on non anti-monotonic constraints. Since the
selectivity of the various constraints is generally unknown, a quite exciting challenge
is to look for adaptive strategies which can decide on the pruning strategy
dynamically. (Bonchi et al., 2003A, Bonchi et al., 2003B) propose algorithms for
frequent itemsets under syntactical monotonic constraints. (Albert-Lorincz and
Boulicaut, 2003) consider frequent sequence mining under regular expression constraints.
These are promising approaches to widen the applicability of constraint-based
mining techniques in real contexts.
17 Constraint-based Data Mining 351
Combining Constraints and Condensed Representations
A few papers, e.g., (Boulicaut and Jeudy, 2000, Bonchi and Lucchese, 2004), deal
with the problem of extracting constrained condensed representations. In these works,
the aim is to compute a condensed representation of the extended theory
$Th_x(D, 2^{I}, C_{am} \wedge C_m, freq)$. In (Boulicaut and Jeudy, 2000), the authors use free
itemsets, i.e., their algorithm computes the extended theory
$Th_x(D, 2^{I}, C_{am} \wedge C_m \wedge C_{free}, freq)$. In (Bonchi and Lucchese, 2004), the authors use closed itemsets, i.e.,
their algorithm computes the extended theory $Th_x(D, 2^{I}, C_{am} \wedge C_m \wedge C_{clos}, freq)$.
However, in these two works, the definitions of free sets and closed sets have been
modified to be able to regenerate the extended theory $Th_x(D, 2^{I}, C_{am} \wedge C_m, freq)$
from the extracted theories. This kind of research combines the advantages of both
condensed representations and constrained mining, which results in very efficient
algorithms.
Constraint-based Mining of more Complex Pattern Domains
Most of the recent results have concerned simple local pattern discovery tasks like
the ones based on itemsets or sequences. We believe that inductive querying is much
more general. Many open problems, however, remain to be addressed. For instance, even
constraint-based mining of association rules is already much harder than
constraint-based mining of itemsets (Lakshmanan et al., 1999, Jeudy and Boulicaut, 2002). The
recent work on the MINE RULE query language (Meo et al., 1998) is also typical
of the difficulty of optimizing constraint-based association rule mining (Meo, 2003).
When considering model mining under constraints (e.g., classifier design or
clustering), only very preliminary approaches are available (see, e.g., (Garofalakis and
Rastogi, 2000)). We think that this will be a major issue for research in the next few
years. For instance, for clustering, it seems important to go further than the classical
similarity optimization constraints and make it possible to specify other constraints on clusters
(e.g., enforcing that some objects are or are not within the same clusters).
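As a small illustration of such cluster-level constraints, the sketch below checks a clustering against pairs of objects that must or must not share a cluster. The names `must_link` and `cannot_link`, the function name, and the data are assumptions made for this example; the chapter does not prescribe a particular formulation.

```python
def violated_constraints(assignment, must_link, cannot_link):
    """Return the pairwise constraints that a clustering violates.

    assignment  -- dict mapping each object to its cluster label
    must_link   -- pairs of objects required to be in the same cluster
    cannot_link -- pairs of objects required to be in different clusters
    """
    violations = []
    for a, b in must_link:
        if assignment[a] != assignment[b]:
            violations.append(("must-link", a, b))
    for a, b in cannot_link:
        if assignment[a] == assignment[b]:
            violations.append(("cannot-link", a, b))
    return violations


# Hypothetical clustering of four objects into two clusters.
assignment = {"o1": 0, "o2": 0, "o3": 1, "o4": 1}
print(violated_constraints(assignment, [("o1", "o2"), ("o2", "o3")], [("o3", "o4")]))
```

A constrained clustering algorithm would use such a check inside its search, rejecting or penalizing assignments that violate the analyst's constraints.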
17.5 Conclusion
In this chapter, we have considered constraint-based mining approaches, i.e., the core
techniques for inductive querying.
This domain has been studied a lot for simple pattern domains like itemsets or
sequences. Rather general forms of inductive queries on these domains (e.g.,
arbitrary boolean expressions over monotonic and anti-monotonic constraints) have
been considered. Besides the many ad-hoc algorithms, an interesting effort has
concerned generic algorithms. Many open problems remain: how to solve tough
constraints? How to design relevant approximation or relaxation schemes? How to
combine constraint-based mining with condensed representations, not only for
simple pattern domains but also for more complex ones?
Moreover, within the inductive database framework, the problem is to optimize
sequences of queries and typically sequences of correlated inductive queries. It is
crucial to consider that the optimization of a query and thus constraint-based mining
must also take into account the previously solved queries. Looking for the formal
properties between inductive queries, especially containment, is thus a major priority.
Here again, we believe that condensed representations might play a major role.
Last but not least, a quite challenging problem is to consider where
the constraints come from. The analysts can think in terms of constraints or declarative
specifications which are not supported by the available solvers: an obvious example
could be unexpectedness or novelty w.r.t. some explicit background knowledge.
Deriving appropriate inductive queries, based on a limited number of primitives
(and some associated solvers), from the constraints expressed by the analysts is
challenging.
References
R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of
association rules. In Advances in Knowledge Discovery and Data Mining, pages 307–
328. AAAI Press, 1996.
H. Albert-Lorincz and J.-F. Boulicaut. Mining frequent sequential patterns under regular
expressions: a highly adaptative strategy for pushing constraints. In Proc. SIAM DM’03,
pages 316–320, 2003.
Y. Bastide, N. Pasquier, R. Taouil, G. Stumme, and L. Lakhal. Mining minimal non-
redundant association rules using frequent closed itemsets. In Proc. CL 2000, volume
1861 of LNCS, pages 972–986. Springer-Verlag, 2000.
Y. Bastide, R. Taouil, N. Pasquier, G. Stumme, and L. Lakhal. Mining frequent patterns with
counting inference. SIGKDD Explorations, 2(2):66–75, 2000.
R. J. Bayardo. Efficiently mining long patterns from databases. In Proc. ACM SIGMOD’98,
pages 85–93, 1998.
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Adaptive constraint pushing in fre-
quent pattern mining. In Proc. PKDD’03, volume 2838 of LNAI, pages 47–58. Springer-
Verlag, 2003A.
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Examiner: Optimized
level-wise frequent pattern mining with monotone constraints. In Proc.
IEEE ICDM’03, pages 11–18, 2003B.
F. Bonchi, F. Giannotti, A. Mazzanti, and D. Pedreschi. Exante: Anticipated data reduction
in constrained pattern mining. In Proc. PKDD’03, volume 2838 of LNAI, pages 59–70.
Springer-Verlag, 2003C.
F. Bonchi and C. Lucchese. On closed constrained frequent pattern mining. In Proc. IEEE
ICDM’04 (In Press), 2004.
J.-F. Boulicaut. Inductive databases and multiple uses of frequent itemsets: the cInQ
approach. In Database Technologies for Data Mining - Discovering Knowledge with
Inductive Queries, volume 2682 of LNCS, pages 1–23. Springer-Verlag, 2004.
J.-F. Boulicaut and A. Bykowski. Frequent closures as a concise representation for binary
Data Mining. In Proc. PAKDD’00, volume 1805 of LNAI, pages 62–73. Springer-Verlag,
2000.
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Approximation of frequency queries by mean
of free-sets. In Proc. PKDD’00, volume 1910 of LNAI, pages 75–85. Springer-Verlag,
2000.
J.-F. Boulicaut, A. Bykowski, and C. Rigotti. Free-sets: a condensed representation of
boolean data for the approximation of frequency queries. Data Mining and Knowledge
Discovery, 7(1):5–22, 2003.
J.-F. Boulicaut and B. Jeudy. Using constraint for itemset mining: should we prune or not?
In Proc. BDA’00, pages 221–237, 2000.
J.-F. Boulicaut and B. Jeudy. Mining free-sets under constraints. In Proc. IEEE IDEAS’01,
pages 322–329, 2001.
C. Bucila, J. E. Gehrke, D. Kifer, and W. White. Dualminer: A dual-pruning algorithm for
itemsets with constraints. Data Mining and Knowledge Discovery, 7(4):241–272, 2003.
D. Burdick, M. Calimlim, and J. Gehrke. MAFIA: A maximal frequent itemset algorithm
for transactional databases. In Proc. IEEE ICDE’01, pages 443–452, 2001.
A. Bykowski and C. Rigotti. DBC: a condensed representation of frequent patterns for
efficient mining. Information Systems, 28(8):949–977, 2003.
T. Calders and B. Goethals. Mining all non-derivable frequent itemsets. In Proc. PKDD’02,
volume 2431 of LNAI, pages 74–85. Springer-Verlag, 2002.
B. Crémilleux and J.-F. Boulicaut. Simplest rules characterizing classes generated by
delta-free sets. In Proc. ES 2002, pages 33–46. Springer-Verlag, 2002.
L. De Raedt. A perspective on inductive databases. SIGKDD Explorations, 4(2):69–77,
2003.
L. De Raedt, M. Jaeger, S. Lee, and H. Mannila. A theory of inductive query answering. In
Proc. IEEE ICDM’02, pages 123–130, 2002.
L. De Raedt and S. Kramer. The levelwise version space algorithm and its application to
molecular fragment finding. In Proc. IJCAI’01, pages 853–862, 2001.
M. M. Garofalakis and R. Rastogi. Scalable Data Mining with model constraints. SIGKDD
Explorations, 2(2):39–48, 2000.
M. M. Garofalakis, R. Rastogi, and K. Shim. SPIRIT: Sequential pattern mining with regular
expression constraints. In Proc. VLDB’99, pages 223–234, 1999.
B. Goethals and M. J. Zaki, editors. Proc. of the IEEE ICDM 2003 Workshop on Frequent
Itemset Mining Implementations, volume 90 of CEUR Workshop Proceedings, 2003.
D. Gunopulos, R. Khardon, H. Mannila, S. Saluja, H. Toivonen, and R. S.
Sharma. Discovering all most specific sentences. ACM Transactions on
Database Systems, 28(2):140–174, 2003.
T. Imielinski and H. Mannila. A database perspective on knowledge discovery. Communi-
cations of the ACM, 39(11):58–64, 1996.
B. Jeudy and J.-F. Boulicaut. Optimization of association rule mining queries. Intelligent
Data Analysis, 6(4):341–357, 2002.
D. Kifer, J. E. Gehrke, C. Bucila, and W. White. How to quickly find a witness. In Proc.
ACM PODS’03, pages 272–283, 2003.
S. Kramer, L. De Raedt, and C. Helma. Molecular feature mining in HIV data. In Proc.
ACM SIGKDD’01, pages 136–143, 2001.
L. V. Lakshmanan, R. Ng, J. Han, and A. Pang. Optimization of constrained frequent set
queries with 2-variable constraints. In Proc. ACM SIGMOD’99, pages 157–168, 1999.
D.-I. Lin and Z. M. Kedem. Pincer search: An efficient algorithm for discovering the
maximum frequent sets. IEEE Transactions on Knowledge and Data Engineering, 14(3):553–566,
2002.
H. Mannila and H. Toivonen. Multiple uses of frequent sets and condensed representations.
In Proc. KDD’96, pages 189–194. AAAI Press, 1996.
H. Mannila and H. Toivonen. Levelwise search and borders of theories in knowledge discov-
ery. Data Mining and Knowledge Discovery, 1(3):241–258, 1997.
C. Mellish. The description identification problem. Artificial Intelligence,
52(2):151–168, 1992.
R. Meo. Optimization of a language for Data Mining. In Proc. ACM SAC’03 - Data Mining
Track, pages 437–444, 2003.
R. Meo, G. Psaila, and S. Ceri. An extension to SQL for mining association rules. Data
Mining and Knowledge Discovery, 2(2):195–224, 1998.
T. Mitchell. Generalization as search. Artificial Intelligence, 18(2):203–226, 1980.
R. Ng, L. V. Lakshmanan, J. Han, and A. Pang. Exploratory mining and
pruning optimizations of constrained associations rules. In Proc. ACM
SIGMOD’98, pages 13–24, 1998.
N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Efficient mining of association rules using
closed itemset lattices. Information Systems, 24(1):25–46, 1999.
J. Pei, G. Dong, W. Zou, and J. Han. On computing condensed frequent pattern bases. In
Proc. IEEE ICDM’02, pages 378–385, 2002.
J. Pei, J. Han, and L. V. S. Lakshmanan. Mining frequent itemsets with convertible
constraints. In Proc. IEEE ICDE’01, pages 433–442, 2001.
R. Srikant, Q. Vu, and R. Agrawal. Mining association rules with item constraints. In Proc.
ACM SIGKDD’97, pages 67–73, 1997.
M. J. Zaki. Sequence mining in categorical domains: incorporating constraints. In Proc.
ACM CIKM’00, pages 422–429, 2000.
18
Link Analysis
Steve Donoho
Mantas, Inc.
Summary. Link analysis is a collection of techniques that operate on data that can be
represented as nodes and links. This chapter surveys a variety of techniques including subgraph
matching, finding cliques and K-plexes, maximizing spread of influence, visualization,
finding hubs and authorities, and combining with traditional techniques (classification, clustering,
etc.). It also surveys applications including social network analysis, viral marketing, Internet
search, fraud detection, and crime prevention.
Key words: Link analysis, Social network analysis, Graph theory
18.1 Introduction
The term "link analysis" does not refer to one specific technique or algorithm. Rather,
it refers to a collection of techniques that are bound together by the type of data they
operate on. Link analysis techniques are applied to data that can be represented as
nodes and links as in Figure 18.1.
A node represents an entity such as a person, a document, or a bank account.
Nodes are sometimes referred to as "vertices." A link represents a relationship
between two entities such as a parent/child relationship between two people, a reference
relationship between two documents, or a transaction between two bank accounts.
Links are sometimes referred to as "edges." Because links show relationships among
entities, this type of data is often referred to as relational data.
This contrasts with the attribute-vector data used by many other unsupervised and
supervised Data Mining techniques. In most standard Data Mining techniques, data
is represented as a set of tuples (each a vector of attribute values). Each tuple represents
an entity, but there is no explicit data about relationships among entities. In link
analysis, information exists about the relationships among entities, and analysis of
these relationships is the focus of the field.
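The difference between the two kinds of data can be made concrete with a small sketch. The account records, link types, and function name below are hypothetical and serve only to contrast a set of attribute tuples with a list of links between entities.

```python
from collections import defaultdict

# Attribute-vector data: one tuple per entity, with no relationships recorded.
accounts = [
    ("acct-1", "checking", 1200.0),
    ("acct-2", "savings", 560.0),
    ("acct-3", "checking", 75.5),
]

# Relational data: each link names two entities and the kind of relationship.
links = [
    ("acct-1", "acct-2", "transaction"),
    ("acct-2", "acct-3", "transaction"),
]


def adjacency(links):
    """Build an undirected adjacency structure: node -> set of linked nodes."""
    adj = defaultdict(set)
    for source, target, _kind in links:
        adj[source].add(target)
        adj[target].add(source)
    return dict(adj)


print(adjacency(links))
```

The tuples alone say nothing about how the accounts relate to one another; the adjacency structure built from the links is what link analysis techniques operate on.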
The roots of link analysis predate the use of modern computers. Law enforcement
officials have carried out manual link analysis for many years. When a crime
is investigated, a network such as in Figure 18.1 is drawn where the nodes represent
O. Maimon, L. Rokach (eds.), Data Mining and Knowledge Discovery Handbook, 2nd ed.,
DOI 10.1007/978-0-387-09823-4_18, © Springer Science+Business Media, LLC 2010
356 Steve Donoho
Fig. 18.1. Node and Link Data Used by Link Analysis Techniques.
people, weapons, crime scenes, etc. One person may be linked to another if they
are family, friends, roommates, or business partners. A person may be linked to a
weapon if it is registered in his name or if it was found at his home. Once a network
of relationships is drawn out, the bigger picture of a crime emerges from the details.
Holes in the network become apparent, and they are areas for further investigation.
Hypotheses can be formed and tested.
Sociologists also performed manual link analysis long before there were computers.
The structure of a clan or tribe would be mapped out with nodes representing
people and links representing family, work, or social relationships. From this a
sociologist could deduce who held powerful positions within the clan, who might
influence whom, how information might spread within the clan, and what factions
might arise.
The advent of computers allowed these techniques to become much more wide-
spread and to be applied on a much larger scale. All 10 million of a bank’s customers
can be analysed for money laundering relationships. The hundreds of millions of
documents on the Internet can be analysed to determine which are most respected
and reliable. Large communities can be analysed to determine how information and
opinions spread and who are the most inﬂuential individuals.
This chapter surveys the techniques that fall under the umbrella of link analysis
and how these techniques are being applied. Section 18.2 presents some key
concepts from the field of Social Network Analysis. Section 18.3 examines how link
analysis techniques are used to improve search engine results. Section 18.4 looks at
recent link analysis ideas emerging from the ﬁeld of viral marketing. Section 18.5
shows how fraud detection and law enforcement have presented unique challenges
and opportunities for link analysis. Finally, Section 18.6 surveys recent combinations
of link analysis with traditional Data Mining techniques.
18 Link Analysis 357
18.2 Social Network Analysis
The field of social network analysis (Wasserman, 1994, Hanneman, 2001) has
developed over many years as sociologists developed formal methods of studying groups
of people and their relationships. When studying a social network, there are many
questions sociologists are interested in answering:
1. Which people are powerful?
2. Which people inﬂuence other people?
3. How does information spread within the network?
4. Who is relatively isolated, and who is well connected?
5. In a disagreement, who is likely to side with whom?
6. What roles do people play in an organization, and who has similar roles?
While concepts such as powerful, influential, isolated, and connected are somewhat
subjective, social network analysis methods give us a baseline for measuring and
making comparisons.
Fig. 18.2. Three Networks to Illustrate an Individual’s Power within a Network.
Many things can make a person powerful within a group. Consider the shaded
nodes in the three networks shown in Figure 18.2 (Hanneman, 2001). The person at
the center of the star intuitively seems more powerful than the one in the circle or the
one at the end of the line. If the people in the star want to communicate with each
other, they have to go through the center person, and that person has the power to
either facilitate or hamper communication. If the people in the star want to engage
in business, they have to go through the person in the center, and that person has the
power to charge a fee as the middleman. In contrast, the shaded node in the circular
network is the most convenient path of communication or trade for some nodes, but
he is not the only path. Intuitively, he has less power than the center of the star. The
shaded node at the end of the line is dependent on others for communication and
trade but has no one who is dependent on him. Intuitively, he has little or no power.
The networks in Figure 18.2 illustrate how "centrality" is one measure of power.
The node at the center of the star derives its power from being in the center of its
network. The shaded nodes in the circle and line are less central to their networks
and are therefore less powerful. Some quantitative methods of measuring centrality
are:
1. Degree. The shaded node in the star network is linked to six other nodes and thus
has a degree of six. All the other nodes in the star have a degree of one and are
comparatively less central. All the nodes in the circle have the same degree: two.
The shaded node in the line has a degree of one and is thus slightly less central
than other nodes in the line with degree two.
2. Closeness. The average distance from the shaded node in the star to all other
nodes is 1.0. This node has very direct access to everyone else. Other nodes
in the star have an average distance of 1.8. All the nodes in the circle have an
average distance of 2.0. The node at the end of the line has an average distance
of 3.5 whereas the node in the center of the line has an average distance of 2.0.
3. Betweenness. The shaded node in the star is between all 15 pairs of other nodes.
In the circle there are two paths between each pair of nodes. The shaded node in
the circle is on a path between all 15 other pairs, but since there is an alternative
path between each pair, the shaded node is on 50% of the paths between pairs.
The node at the end of the line is between no pairs. The node one from the end
of the line is on paths between 5 pairs (33% of 15 paths). The node at the center
of the line is on paths between 9 pairs (60% of 15 paths).
4. Cutpoints. Related to betweenness, cutpoints are nodes that, if removed, divide the
network into unconnected systems. These nodes hold particular power because
they are the only point of contact between otherwise disconnected networks. If
the center of the star is removed, six disconnected systems result. If a node in the
circle is removed, the network is still connected. If a non-end node is removed
from the line, two disconnected systems result.
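The four measures above can be reproduced with a short breadth-first-search sketch. The seven-node star, circle, and line are an assumption inferred from the quoted figures (a degree of six, 15 pairs of other nodes); the function names are illustrative. `pairs_between` counts pairs for which the node lies on a shortest path, which matches the star and line figures but is stricter than the "any path" count used for the circle.

```python
from collections import deque


def build(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def distances(adj, source, removed=None):
    """BFS hop distances from source, optionally ignoring one removed node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v != removed and v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def degree(adj, node):
    return len(adj[node])


def average_distance(adj, node):
    """Closeness expressed as the mean distance to all other nodes."""
    others = [d for n, d in distances(adj, node).items() if n != node]
    return sum(others) / len(others)


def pairs_between(adj, node):
    """Number of pairs of other nodes with `node` on a shortest path between them."""
    d_node = distances(adj, node)
    others = sorted(n for n in adj if n != node)
    count = 0
    for i, s in enumerate(others):
        d_s = distances(adj, s)
        for t in others[i + 1:]:
            if d_s[node] + d_node[t] == d_s[t]:
                count += 1
    return count


def components_after_removal(adj, node):
    """Number of disconnected systems left when `node` is removed."""
    seen, count = set(), 0
    for start in adj:
        if start != node and start not in seen:
            seen.update(distances(adj, start, removed=node))
            count += 1
    return count


star = build([(0, leaf) for leaf in range(1, 7)])     # node 0 is the center
circle = build([(i, (i + 1) % 7) for i in range(7)])
line = build([(i, i + 1) for i in range(6)])          # node 0 is an end, node 3 the center

print(degree(star, 0), average_distance(line, 0), pairs_between(line, 3))
print(components_after_removal(star, 0), components_after_removal(circle, 0))
```

On these graphs the sketch gives a degree of 6 for the star's center, an average distance of 3.5 for the end of the line, 9 pairs for the center of the line, six systems after removing the star's center, and a single connected system after removing any node of the circle.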
A clique is a small, highly-interconnected group within a larger network. Cliques
are of interest for several reasons. Ideas or information may spread extremely quickly
within a clique because of the high connectivity. Members of a clique often act and
behave as a cohesive unit. Disputes may form between cliques ("factions"). A person
can be described with respect to the clique(s) they belong to. A person who is only
connected to people in his clique is called a "local" and is strongly influenced by the
clique. A person who belongs to many cliques is called a "cosmopolitan" and serves
to bring outside ideas and information into a clique.
The strictest definition of a clique is a complete subgraph (all nodes in the
clique must be linked to all other nodes). Two more relaxed definitions are:
1. K-plexes. A group of N nodes is a K-plex if each of the nodes is connected to at
least N-K other nodes in the group. Intuitively, if K=2 then every member of the
clique has to be connected to at least N-2 of the N-1 other members, i.e., to all but at most one of them.
2. K-cores. The deﬁnition of a K-core is slightly more relaxed than that of a K-plex.
A K-core is a maximal group of nodes all of which are connected to at least K
other nodes in the group. For example, if K=4 then every member of the clique
is connected to at least 4 other clique members.
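Both relaxed definitions translate directly into code. The sketch below assumes an undirected network stored as adjacency sets; the function names and the five-node example are hypothetical. `k_core` uses the usual peeling procedure (repeatedly dropping nodes with fewer than K remaining neighbours) and returns the union of all K-cores, which may consist of several disconnected groups.

```python
def build(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_k_plex(adj, group, k):
    """True if every node of `group` is linked to at least N-K other group members."""
    group = set(group)
    return all(len(adj[u] & group) >= len(group) - k for u in group)


def k_core(adj, k):
    """Nodes that survive repeated removal of nodes with fewer than k neighbours."""
    alive = set(adj)
    changed = True
    while changed:
        changed = False
        for u in list(alive):
            if len(adj[u] & alive) < k:
                alive.remove(u)
                changed = True
    return alive


# Hypothetical network: a four-node group missing the a-b link, plus a pendant node e.
adj = build([("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"), ("a", "e")])
group = {"a", "b", "c", "d"}
print(is_k_plex(adj, group, 1), is_k_plex(adj, group, 2))
print(sorted(k_core(adj, 2)), sorted(k_core(adj, 3)))
```

Here the group {a, b, c, d} is not a complete subgraph (a and b are not linked), so it fails the K=1 test, but it is a 2-plex and also forms the 2-core once the pendant node e is peeled away. No node survives for K=3.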
The concept of "equivalence" is very important within social networks. It makes it
possible to determine if a person is playing a particular role within a network. This
allows both intra-network comparisons (one node has the same role as another node
within one network) and inter-network comparison (two nodes in different networks
are playing the same role). Two measures of equivalence are:
1. Structural Equivalence. This is a strict measure of equivalence between two
nodes. Two nodes are exactly structurally equivalent if they are linked to exactly
the same other nodes. If not exactly equivalent, the degree of partial structural
equivalence can be measured using the degree of overlap in nodes they are linked
to.
2. Regular equivalence. Regular equivalence is a less strict deﬁnition than structural
equivalence. Two nodes have regular equivalence if the nodes they are linked to
are regular equivalents. For example, Fred Flintstone is the regular equivalent of
Barney Rubble because Fred is the husband of Wilma, and Barney is the husband
of Betty, and Wilma and Betty are regular equivalents.
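Structural equivalence and its partial form can be sketched as follows. The text does not fix a particular overlap measure, so the Jaccard ratio used here (shared neighbours divided by all neighbours of either node) is an assumption, as are the function names and the small office network.

```python
def build(edges):
    """Undirected adjacency sets from a list of (u, v) links."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def structurally_equivalent(adj, a, b):
    """Exact structural equivalence: a and b are linked to exactly the same other nodes."""
    return adj[a] - {b} == adj[b] - {a}


def structural_overlap(adj, a, b):
    """Partial structural equivalence as a Jaccard ratio of the two neighbour sets."""
    na, nb = adj[a] - {b}, adj[b] - {a}
    union = na | nb
    return len(na & nb) / len(union) if union else 1.0


family = build([("Fred", "Wilma"), ("Barney", "Betty")])
office = build([
    ("clerk1", "manager"), ("clerk1", "customer"),
    ("clerk2", "manager"), ("clerk2", "customer"),
    ("clerk3", "manager"),
])

print(structural_overlap(family, "Fred", "Barney"))
print(structurally_equivalent(office, "clerk1", "clerk2"))
print(structural_overlap(office, "clerk1", "clerk3"))
```

Fred and Barney have no neighbours in common, so their structural overlap is 0 even though they are regular equivalents; this is the sense in which regular equivalence is the less strict notion. In the hypothetical office network, clerk1 and clerk2 are exactly structurally equivalent, while clerk1 and clerk3 overlap by one half.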
On a broader scale, equivalence of nodes lays the groundwork for measuring the
similarity of one whole social network to another whole social network. This is useful
for matching a network against a known template in order to identify the nature of
the network, as will be seen in Section 18.5 on Fraud Detection and Law Enforcement.
Many groups such as academic circles, fraud rings, business circles, shoppers
with common interests, and professional societies can be represented as social net-
works. Because of this, Social Network Analysis lays the groundwork for many im-
portant real-world applications.
18.3 Search Engines
The Internet is rich in relational data by the simple fact that web pages are linked to
other web pages. While traditional search techniques such as keyword searches
focus exclusively on the content of a single page, newer techniques (Page et al., 1998,
Kleinberg, 1999) exploit relationships among pages. A user performing a search
wants to find results that are not only relevant but are also authoritative and
reliable. A keyword search on "stock market" will not only return authoritative sites
such as the NASDAQ and NYSE pages, it will also return pages from thousands of
self-proclaimed gurus selling books, software, and advice. The truly reliable sources
of information are likely to be lost among the self-proclaimed gurus. Is there a way
to separate the wheat from the chaff? This is where relational information contained
in links comes into play.
An authoritative site such as the NASDAQ is likely to be recognized as authoritative
by many people; therefore, many other sites are likely to point to the NASDAQ
site. But a self-proclaimed stock market guru is less likely to have many other sites
pointing to his site unless there truly is some merit to what he has to say. When one
site references another site, it is in fact declaring that that site has some merit; it
is casting a vote for the value and importance of the other site. Conceptually, this
