
Girvan-Newman Algorithm


1. How to compute betweenness?
2. How to select the number of clusters?

J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,





Want to compute the betweenness of paths starting at a given root node



Breadth-first search starting from the root node (levels 0 to 4 in the figure)





Count the number of shortest paths from
the root node to all other nodes of the network:





Compute betweenness by working up the
tree: if there are multiple paths, count them
fractionally

The algorithm:
• Add edge flows:
– node flow = 1 + ∑ (flows on child edges)
– split the node flow among the edges to its parents, in proportion to each parent's shortest-path count
• Repeat the BFS procedure for each starting node

Figure annotations:
1+1 paths to H: split evenly
1+0.5 paths to J: split 1:2
1 path to K: split evenly
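
The three stages above (BFS, shortest-path counting, working up the tree) can be sketched in plain Python. This is a minimal illustration, not code from the slides: the function name `single_source_edge_credit`, the adjacency-dict input and the four-node "diamond" graph are all assumptions made for the example.

```python
from collections import deque, defaultdict

def single_source_edge_credit(adj, root):
    """Edge flows for shortest paths that start at `root` (hypothetical helper)."""
    # Stage 1 and 2: BFS from the root, recording each node's level and
    # the number of shortest paths (sigma) that reach it from the root.
    level = {root: 0}
    sigma = defaultdict(float)
    sigma[root] = 1.0
    parents = defaultdict(list)   # neighbours one level closer to the root
    order = []                    # nodes in BFS (non-decreasing level) order
    queue = deque([root])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in adj[u]:
            if v not in level:
                level[v] = level[u] + 1
                queue.append(v)
            if level[v] == level[u] + 1:
                sigma[v] += sigma[u]
                parents[v].append(u)

    # Stage 3: work up the tree. Node flow = 1 + sum of child edge flows;
    # it is split among parent edges in proportion to the parents' sigma.
    node_flow = {v: 1.0 for v in order}
    edge_credit = {}
    for v in reversed(order):
        for p in parents[v]:
            share = node_flow[v] * sigma[p] / sigma[v]
            edge_credit[frozenset((p, v))] = share
            node_flow[p] += share
    return edge_credit

# Assumed toy graph: two shortest paths lead from A to D.
diamond = {"A": ["B", "C"], "B": ["A", "D"], "C": ["A", "D"], "D": ["B", "C"]}
credit = single_source_edge_credit(diamond, "A")
for edge, value in sorted(credit.items(), key=lambda kv: sorted(kv[0])):
    print(sorted(edge), value)
```

In the toy graph, D is reached by two shortest paths, so its single unit of flow is split evenly between the edges B-D and C-D, which mirrors the "split evenly" annotation in the figure.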


Graph Partitioning
• Methods to break a network into sets of
connected components called regions
• Many general approaches
– Divisive methods: Repeatedly identify and
remove edges connecting densely connected
regions
– Agglomerative methods: Repeatedly identify and
merge nodes that likely belong in the same region


[Girvan-Newman ‘02]



Divisive hierarchical clustering based on the
notion of edge betweenness:
Number of shortest paths passing through the edge




Girvan-Newman Algorithm:
• Undirected unweighted networks
• Repeat until no edges are left:
– Calculate betweenness of edges
– Remove edges with highest betweenness
• Connected components are communities
• Gives a hierarchical decomposition of the network


Girvan-Newman Algorithm
• Divisive method proposed by Girvan and
Newman in 2002
• Uses edge betweenness to identify edges to
remove
• Edge betweenness: total amount of “flow” an
edge carries between all pairs of nodes, where a
single unit of flow between two nodes divides
itself evenly among all shortest paths between the
nodes (1/k units flow along each of the k shortest
paths)
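
As a rough check of this definition, the sketch below sums the per-root flows over every starting node and halves the result, since each unordered pair of nodes is otherwise counted once from each end. It is written in plain Python rather than the R/igraph used in the deck's examples; the function name `edge_betweenness` and the edge-list input are assumptions. The graph is the six-node network of Example1, with the edge list taken from the edges named in that example's deletion steps.

```python
from collections import deque

def edge_betweenness(edges):
    """Edge betweenness of an undirected, unweighted graph (illustrative sketch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    total = {tuple(sorted(e)): 0.0 for e in edges}
    for root in adj:
        # BFS from this root: levels and shortest-path counts (sigma).
        level, sigma, order = {root: 0}, {root: 1.0}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in level:
                    level[v] = level[u] + 1
                    sigma[v] = 0.0
                    queue.append(v)
                if level[v] == level[u] + 1:
                    sigma[v] += sigma[u]
        # Work up the tree, splitting each node's flow among its parents.
        flow = {v: 1.0 for v in order}
        for v in reversed(order):
            for p in adj[v]:
                if level[p] == level[v] - 1:
                    share = flow[v] * sigma[p] / sigma[v]
                    total[tuple(sorted((p, v)))] += share
                    flow[p] += share
    # Each unordered node pair was counted from both endpoints.
    return {e: b / 2 for e, b in total.items()}

# Edges of the Example1 network, as named in its deletion steps.
example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
for edge, value in sorted(edge_betweenness(example1).items()):
    print(edge, value)
```

With the edges sorted this way, the printed values should line up with the first `edge.betweenness(mynetwork)` output shown in Example1 (2.0 3.0 3.5 2.5 3.5 5.5 5.0), assuming igraph lists the edges in the same order.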


Girvan-Newman Algorithm
1. Calculate betweenness of all edges

2. Remove the edge(s) with highest betweenness
3. Repeat steps 1 and 2 until graph is partitioned
into as many regions as desired


Girvan-Newman Algorithm
• The idea is that edges connecting separate
modules are likely to have high edge betweenness,
because all the shortest paths from one module to
another must traverse them.
• So if we gradually remove the edge with the
highest edge betweenness score, we get a
hierarchical map of the graph: a rooted tree
called a dendrogram.
– The leaves of the tree are the individual vertices
– The root of the tree represents the whole graph


Girvan-Newman Algorithm
• Step 1. Find the edge of highest betweenness (or
multiple edges of highest betweenness, if there is a tie) and remove it from the graph.
– This may cause the graph to separate into multiple
components. If so, this is the first level of regions in
the partitioning of the graph.
• Step 2. Recalculate all betweennesses, and again remove
the edge or edges of highest betweenness.
– This may break some of the existing components into
smaller components; if so, these are regions nested
within the larger regions.
• Step 3. Proceed in this way as long as edges remain in the
graph, in each step recalculating all betweennesses and
removing the edge or edges of highest betweenness.
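
Steps 1 to 3 can be sketched as a loop in plain Python. This is an illustrative sketch under stated assumptions, not the igraph implementation: the names `edge_betweenness`, `components` and `girvan_newman` are hypothetical, ties are detected with a small numerical tolerance, and the input is the edge list of the Example1 network as named in its deletion steps.

```python
from collections import deque

def edge_betweenness(edges):
    """Edge betweenness via BFS from every node (hypothetical helper)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    total = {tuple(sorted(e)): 0.0 for e in edges}
    for root in adj:
        level, sigma, order = {root: 0}, {root: 1.0}, []
        queue = deque([root])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in level:
                    level[v], sigma[v] = level[u] + 1, 0.0
                    queue.append(v)
                if level[v] == level[u] + 1:
                    sigma[v] += sigma[u]
        flow = {v: 1.0 for v in order}
        for v in reversed(order):
            for p in adj[v]:
                if level[p] == level[v] - 1:
                    share = flow[v] * sigma[p] / sigma[v]
                    total[tuple(sorted((p, v)))] += share
                    flow[p] += share
    return {e: b / 2 for e, b in total.items()}

def components(nodes, edges):
    """Connected components as sorted lists of nodes."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for n in sorted(nodes):
        if n in seen:
            continue
        comp, stack = set(), [n]
        while stack:
            x = stack.pop()
            if x not in comp:
                comp.add(x)
                stack.extend(adj[x] - comp)
        seen |= comp
        comps.append(sorted(comp))
    return comps

def girvan_newman(edges):
    """Return [(removed_edges, components_after_removal), ...] per step."""
    nodes = {n for e in edges for n in e}
    remaining = [tuple(sorted(e)) for e in edges]
    history = []
    while remaining:
        bet = edge_betweenness(remaining)            # recalculate every step
        top = max(bet.values())
        cut = sorted(e for e, b in bet.items() if abs(b - top) < 1e-9)
        remaining = [e for e in remaining if e not in cut]
        history.append((cut, components(nodes, remaining)))
    return history

example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
for removed, regions in girvan_newman(example1):
    print("removed", removed, "-> regions", regions)
```

The returned history is one way to record the nested regions; on the Example1 edge list the removals should follow the same order as the example's deletion steps.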


Girvan-Newman Algorithm
• The method gives us only a succession of splits of
the network into smaller and smaller
communities, but it gives no indication of which
splits are best.
• One way to find the best split is via the
modularity concept: “The spectral modularity
maximization community detection algorithm”
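
To make the modularity idea concrete, the sketch below scores a given split with Newman's standard modularity, Q = Σ over communities of (internal edges / m) − (total degree / 2m)², where m is the number of edges. It only evaluates a split; it is not the spectral maximization algorithm named above. The function name `modularity` and its inputs are assumptions, and the graph is the Example1 network with its edges as named in that example's deletion steps.

```python
def modularity(edges, communities):
    """Newman modularity of a partition of an undirected, unweighted graph."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    q = 0.0
    for group in communities:
        members = set(group)
        # Fraction of edges that fall inside this community ...
        inside = sum(1 for u, v in edges if u in members and v in members)
        # ... minus the fraction expected if edges were placed at random
        # while keeping every node's degree.
        degree_sum = sum(degree.get(n, 0) for n in members)
        q += inside / m - (degree_sum / (2 * m)) ** 2
    return q

example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
print(round(modularity(example1, [[1, 2, 5], [3, 4, 6]]), 4))
print(round(modularity(example1, [[1, 2, 3, 4, 5, 6]]), 4))
```

Scoring every level of the Girvan-Newman hierarchy this way and keeping the level with the largest Q is one common way to pick the number of clusters. For the split {1, 2, 5} / {3, 4, 6}, the value is about 0.204, which is consistent with the `mod: 0.2` reported by igraph in Example1.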


Girvan-Newman: Example1


> edge.betweenness(mynetwork)
[1] 2.0 3.0 3.5 2.5 3.5 5.5 5.0


Example1: Delete edge (4,5)


> edge.betweenness(mynetwork)
[1] 4 1 9 4 8 5


Example1: Delete edge (2,3)

> edge.betweenness(mynetwork)
[1] 1 1 1 2 2


Example1: Delete edges (3,4) and (4,6)

> edge.betweenness(mynetwork)
[1] 1 1 1


Example1: Delete edges (2,1), (5,1) and (5,2)

> edge.betweenness(mynetwork)
numeric(0)


Example1: An example via package igraph: the
edge.betweenness.community() function
> edge.betweenness.community(mynetwork)
IGRAPH clustering edge betweenness, groups: 2, mod: 0.2
+ groups:
$`1`
[1] "1" "2" "5"
$`2`
[1] "3" "4" "6"


Example1: Plot dendrogram for hierarchical
methods via igraph function plot_dendrogram()


[Figure: example network with labels 1, 12, 33, 49]

Need to re-compute betweenness at every step



[Figure: the network after Step 1, Step 2 and Step 3, and the resulting hierarchical network decomposition]
