1. How to compute betweenness?
2. How to select the number of clusters?
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets,
Want to compute betweenness of paths starting at a root node.
Breadth-first search (BFS) starting from the root:
[Figure: BFS tree from the root node, with nodes arranged in levels 0, 1, 2, 3 and 4]
Count the number of shortest paths from the root to all other nodes of the network:
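A minimal Python sketch of these two BFS steps, assuming the graph is stored as an undirected, unweighted adjacency dict; the function name `bfs_path_counts` and the square graph are illustrative only. Each node's path count is the sum of the counts of its parents one level up.

```python
from collections import deque

def bfs_path_counts(adj, root):
    """BFS from `root` over an undirected, unweighted graph {node: [neighbors]}.

    Returns (level, sigma): level[v] is v's BFS level (hop distance from root),
    sigma[v] is the number of shortest paths from root to v.
    """
    level = {root: 0}
    sigma = {root: 1}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in level:              # first visit: v sits one level below u
                level[v] = level[u] + 1
                sigma[v] = 0
                queue.append(v)
            if level[v] == level[u] + 1:    # u is a parent of v in the BFS tree
                sigma[v] += sigma[u]        # v inherits all of u's shortest paths
    return level, sigma

# Hypothetical 4-node square: a-b, a-c, b-d, c-d
square = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
level, sigma = bfs_path_counts(square, "a")
print(level)  # {'a': 0, 'b': 1, 'c': 1, 'd': 2}
print(sigma)  # {'a': 1, 'b': 1, 'c': 1, 'd': 2}
```

Node d is reached from both b and c, so it has two shortest paths from a; that count is what the next step uses to split flow.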
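Compute betweenness by working up the tree: if there are multiple shortest paths, count them fractionally.
The algorithm:
• Add edge flows:
– node flow = 1 + ∑ (flows on child edges)
– split the node flow among its parent edges in proportion to each parent's number of shortest paths from the root
• Repeat the BFS procedure for each starting node
• Sum the edge flows over all starting nodes and divide by 2, since every shortest path is found once from each of its two endpoints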
[Figure annotations: 1+1 paths to H, split evenly; 1+0.5 paths to J, split 1:2; 1 path to K, split evenly]
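The whole procedure can be sketched in Python as follows, assuming an adjacency dict for an undirected, unweighted graph; `edge_betweenness` is a hypothetical name, not igraph's function. The usage runs it on the Example1 network from later in this section. The slides never list that network's edges, so the edge list here is an assumption, chosen because it reproduces the first `edge.betweenness(mynetwork)` output.

```python
from collections import deque, defaultdict

def edge_betweenness(adj):
    """Edge betweenness of an undirected, unweighted graph {node: [neighbors]}.

    BFS from every node, then push flow up the BFS tree: each node holds
    1 + the flow on its child edges and splits it among its parent edges in
    proportion to the parents' shortest-path counts.
    """
    btw = defaultdict(float)
    for root in adj:
        # BFS: levels, shortest-path counts, and nodes in visiting order.
        level, sigma, order = {root: 0}, {root: 1}, [root]
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in level:
                    level[v], sigma[v] = level[u] + 1, 0
                    order.append(v)
                    queue.append(v)
                if level[v] == level[u] + 1:
                    sigma[v] += sigma[u]
        # Work up the tree, deepest nodes first.
        flow = dict.fromkeys(order, 1.0)  # every node starts with 1 unit
        for v in reversed(order):
            for p in adj[v]:
                if level[p] == level[v] - 1:  # p is a parent of v
                    credit = flow[v] * sigma[p] / sigma[v]
                    btw[frozenset((p, v))] += credit
                    flow[p] += credit         # parent's flow = 1 + child-edge flows
    # Each shortest path was counted once from each of its two endpoints.
    return {e: b / 2 for e, b in btw.items()}

# Assumed edge list for the Example1 network (not given on the slides).
example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
adj = defaultdict(list)
for a, b in example1:
    adj[a].append(b)
    adj[b].append(a)
btw = edge_betweenness(dict(adj))
print([btw[frozenset(e)] for e in example1])
# [2.0, 3.0, 3.5, 2.5, 3.5, 5.5, 5.0]
```

Processing nodes in reverse BFS order guarantees that all of a node's child edges have delivered their flow before the node splits its own flow upward.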
Graph Partitioning
• Methods to break a network into sets of
connected components called regions
• Many general approaches
– Divisive methods: Repeatedly identify and
remove edges connecting densely connected
regions
– Agglomerative methods: Repeatedly identify and
merge nodes that likely belong in the same region
[Girvan-Newman ’02]
Divisive hierarchical clustering based on the
notion of edge betweenness:
Number of shortest paths passing through the edge
Girvan-Newman Algorithm:
§ Undirected unweighted networks
§ Repeat until no edges are left:
   § Calculate betweenness of edges
   § Remove edges with highest betweenness
§ Connected components are communities
§ Gives a hierarchical decomposition of the network
Girvan-Newman Algorithm
• Divisive method proposed by Girvan and Newman in 2002
• Uses edge betweenness to identify edges to
remove
• Edge betweenness: total amount of “flow” an edge carries between all pairs of nodes, where a single unit of flow between two nodes divides itself evenly among all shortest paths between them (1/k units flow along each of k shortest paths)
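The flow definition can be checked by brute force on small graphs: enumerate every shortest path of each pair and give each of its k paths 1/k of a unit. The Python sketch below is illustrative (helper names are ours) and can take exponential time on graphs with many shortest paths, so it only serves as a cross-check. On a 4-cycle every edge carries 2 units: 1 for its own endpoints plus 1/2 from each of the two diagonal pairs.

```python
from collections import deque
from itertools import combinations

def all_shortest_paths(adj, s, t):
    """List every shortest s-t path (as node lists) in an unweighted graph."""
    dist, preds = {s: 0}, {s: []}
    queue = deque([s])
    while queue:  # BFS from s, remembering all shortest-path predecessors
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], preds[v] = dist[u] + 1, []
                queue.append(v)
            if dist[v] == dist[u] + 1:
                preds[v].append(u)
    if t not in dist:
        return []  # t is unreachable from s

    def back(v):  # expand predecessor lists from v back to s
        if v == s:
            return [[s]]
        return [path + [v] for u in preds[v] for path in back(u)]

    return back(t)

def flow_betweenness(adj):
    """Edge betweenness from the definition: each pair sends one unit of
    flow, split as 1/k along each of its k shortest paths."""
    btw = {}
    for s, t in combinations(list(adj), 2):
        paths = all_shortest_paths(adj, s, t)
        for path in paths:
            for u, v in zip(path, path[1:]):
                e = frozenset((u, v))
                btw[e] = btw.get(e, 0.0) + 1 / len(paths)
    return btw

# Hypothetical 4-cycle a-b-c-d-a
cycle = {"a": ["b", "d"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "a"]}
print(sorted(flow_betweenness(cycle).values()))  # [2.0, 2.0, 2.0, 2.0]
```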
Girvan-Newman Algorithm
1. Calculate betweenness of all edges
2. Remove the edge(s) with highest betweenness
3. Repeat steps 1 and 2 until the graph is partitioned into as many regions as desired
Girvan-Newman Algorithm
• The idea: edges that connect separate modules are likely to have high edge betweenness, because all shortest paths from one module to another must pass through them.
• So if we repeatedly remove the edge with the highest edge betweenness, we obtain a hierarchical map of the graph, a rooted tree called a dendrogram:
§ The leaves of the tree are the individual vertices
§ The root of the tree represents the whole graph
Girvan-Newman Algorithm
• Step 1. Find the edge of highest betweenness (or all edges tied for the highest betweenness) and remove these edges from the graph.
§ This may cause the graph to separate into multiple components. If so, this is the first level of regions in the partitioning of the graph.
• Step 2. Recalculate all betweennesses, and again remove the edge or edges of highest betweenness.
§ This may break some of the existing components into smaller components; if so, these are regions nested within the larger regions.
• Step 3. Proceed in this way as long as edges remain in the graph, in each step recalculating all betweennesses and removing the edge or edges of highest betweenness.
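Steps 1 to 3 can be sketched in Python as below, assuming an undirected, unweighted edge list; `girvan_newman`, `components` and `edge_betweenness` are illustrative names, not igraph's API. Ties are detected with a small tolerance `tol` (our addition, since scores are floats), and a new level is recorded only when a removal actually splits a region. The edge list used for Example1 is an assumption, since the slides do not list it.

```python
from collections import deque, defaultdict

def edge_betweenness(adj):
    """Edge betweenness {frozenset({u, v}): score} of an undirected,
    unweighted graph {node: set(neighbors)}: BFS from every node, then
    push flow up each BFS tree, splitting it by shortest-path counts."""
    btw = defaultdict(float)
    for root in adj:
        level, sigma, order = {root: 0}, {root: 1}, [root]
        queue = deque([root])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in level:
                    level[v], sigma[v] = level[u] + 1, 0
                    order.append(v)
                    queue.append(v)
                if level[v] == level[u] + 1:
                    sigma[v] += sigma[u]
        flow = dict.fromkeys(order, 1.0)
        for v in reversed(order):
            for p in adj[v]:
                if level[p] == level[v] - 1:
                    credit = flow[v] * sigma[p] / sigma[v]
                    btw[frozenset((p, v))] += credit
                    flow[p] += credit
    return {e: b / 2 for e, b in btw.items()}

def components(adj):
    """Connected components as a set of frozensets of nodes."""
    seen, comps = set(), set()
    for start in adj:
        if start not in seen:
            comp, stack = {start}, [start]
            while stack:
                for v in adj[stack.pop()]:
                    if v not in comp:
                        comp.add(v)
                        stack.append(v)
            seen |= comp
            comps.add(frozenset(comp))
    return comps

def girvan_newman(edges, tol=1e-9):
    """Return the successive partitions of the graph, one per split."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    adj = dict(adj)
    levels = [components(adj)]
    while any(adj.values()):          # Step 3: continue while edges remain
        btw = edge_betweenness(adj)   # Steps 1-2: (re)compute all betweennesses
        top = max(btw.values())
        for e, score in btw.items():
            if score >= top - tol:    # remove every edge tied for the highest score
                u, v = tuple(e)
                adj[u].discard(v)
                adj[v].discard(u)
        parts = components(adj)
        if parts != levels[-1]:       # a region split: record a new level
            levels.append(parts)
    return levels

# Assumed Example1 edge list (the slides do not list it)
example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
for parts in girvan_newman(example1):
    print(sorted(sorted(c) for c in parts))
# [[1, 2, 3, 4, 5, 6]]
# [[1, 2, 5], [3, 4, 6]]
# [[1, 2, 5], [3], [4], [6]]
# [[1], [2], [3], [4], [5], [6]]
```

To stop once the graph is partitioned into k regions instead of running to the end, one could take `next(p for p in girvan_newman(edges) if len(p) >= k)`.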
Girvan-Newman Algorithm
• The method gives only a succession of splits of the network into smaller and smaller communities; it gives no indication of which splits are best.
• One way to find the best split is via modularity; see “The spectral modularity maximization community detection algorithm”.
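For reference, the standard Newman-Girvan modularity of a partition (not spelled out on these slides) is given below; m is the number of edges, A the adjacency matrix, k_i the degree of node i, c_i its community, δ the Kronecker delta, L_c the number of edges inside community c and d_c the total degree of its nodes. Q compares the fraction of edges inside communities with the fraction expected if edges were placed at random with the same degrees.

```latex
Q = \frac{1}{2m}\sum_{i,j}\Bigl[A_{ij} - \frac{k_i k_j}{2m}\Bigr]\,\delta(c_i, c_j)
  = \sum_{c}\Bigl[\frac{L_c}{m} - \Bigl(\frac{d_c}{2m}\Bigr)^{2}\Bigr]
```

A simple way to use it with Girvan-Newman is to score every split in the hierarchy and keep the one with the largest Q; the spectral algorithm named above instead searches for a high-Q partition directly.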
Girvan-Newman: Example1
> edge.betweenness(mynetwork)
[1] 2.0 3.0 3.5 2.5 3.5 5.5 5.0
Example1: Delete edge (4,5)
> edge.betweenness(mynetwork)
[1] 4 1 9 4 8 5
Example1: Delete edge (2,3)
> edge.betweenness(mynetwork)
[1] 1 1 1 2 2
Example1: Delete edges (3,4) and (4,6)
> edge.betweenness(mynetwork)
[1] 1 1 1
Example1: Delete edges (2,1), (5,1) and (5,2)
> edge.betweenness(mynetwork)
numeric(0)
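The four outputs above can be reproduced with a short Python sketch. The edge list of `mynetwork` is an assumption (the slides do not show it), and we assume scores are reported in edge-creation order, as in the printed vectors. This version uses a pair formula rather than BFS trees: for a pair s, t, the shortest s-t paths that use edge (u, v) number σ_s(u)·σ_t(v) whenever d(s,u) + 1 + d(v,t) = d(s,t), and each counts 1/σ_s(t). Function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Hop distances and shortest-path counts from s."""
    dist, sigma = {s: 0}, {s: 1}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v], sigma[v] = dist[u] + 1, 0
                queue.append(v)
            if dist[v] == dist[u] + 1:
                sigma[v] += sigma[u]
    return dist, sigma

def edge_scores(edges):
    """Betweenness of each edge, reported in the order of `edges`."""
    adj = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    info = {v: bfs_counts(adj, v) for v in adj}
    scores = []
    for a, b in edges:
        total = 0.0
        for s, t in combinations(adj, 2):
            (ds, ss), (dt, st) = info[s], info[t]
            if t not in ds:
                continue  # s and t lie in different components
            for u, v in ((a, b), (b, a)):  # the edge may be used in either direction
                # shortest s-t paths through u->v: sigma_s(u) * sigma_t(v)
                if u in ds and v in dt and ds[u] + 1 + dt[v] == ds[t]:
                    total += ss[u] * st[v] / ss[t]
        scores.append(total)
    return scores

def betweenness_rounds(edges, tol=1e-9):
    """Scores before each round, deleting all top-scoring edges each time."""
    edges, rounds = list(edges), []
    while edges:
        scores = edge_scores(edges)
        rounds.append(scores)
        top = max(scores)
        edges = [e for e, x in zip(edges, scores) if x < top - tol]
    return rounds

example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]  # assumed
for scores in betweenness_rounds(example1):
    print(scores)
# [2.0, 3.0, 3.5, 2.5, 3.5, 5.5, 5.0]
# [4.0, 1.0, 9.0, 4.0, 8.0, 5.0]
# [1.0, 1.0, 1.0, 2.0, 2.0]
# [1.0, 1.0, 1.0]
```

The jump of edge (2,3) from 3.5 to 9 after (4,5) is removed shows why every betweenness must be recomputed after each deletion.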
Example1: An example via package igraph: the
edge.betweenness.community() function
> edge.betweenness.community(mynetwork)
IGRAPH clustering edge betweenness, groups: 2, mod: 0.2
+ groups:
  $`1`
  [1] "1" "2" "5"
  $`2`
  [1] "3" "4" "6"
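Why two groups? A Python sketch can score each level of the Example1 hierarchy with modularity Q = Σ_c [L_c/m - (d_c/2m)^2] and keep the best. Both the edge list and the levels below are assumptions reconstructed from the deletion steps shown earlier; `modularity` is our own helper, not igraph's. Under these assumptions the two-group split has the highest score, Q ≈ 0.204, which agrees with the `mod: 0.2` that igraph prints (igraph reports the maximum-modularity cut).

```python
def modularity(edges, communities):
    """Newman-Girvan modularity Q = sum over communities c of
    L_c/m - (d_c / 2m)^2, for an undirected, unweighted edge list."""
    m = len(edges)
    label = {v: i for i, comm in enumerate(communities) for v in comm}
    inside = [0] * len(communities)  # L_c: edges with both endpoints in c
    degree = [0] * len(communities)  # d_c: sum of degrees of the nodes in c
    for a, b in edges:
        degree[label[a]] += 1
        degree[label[b]] += 1
        if label[a] == label[b]:
            inside[label[a]] += 1
    return sum(l / m - (d / (2 * m)) ** 2 for l, d in zip(inside, degree))

# Assumed Example1 edges, and the regions after each deletion step
example1 = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]
levels = [
    [{1, 2, 3, 4, 5, 6}],            # before any split
    [{1, 2, 5}, {3, 4, 6}],          # after deleting (4,5) and (2,3)
    [{1, 2, 5}, {3}, {4}, {6}],      # after deleting (3,4) and (4,6)
    [{1}, {2}, {3}, {4}, {5}, {6}],  # after deleting the last edges
]
for parts in levels:
    print(len(parts), round(modularity(example1, parts), 3))
# 1 0.0
# 2 0.204
# 4 0.031
# 6 -0.184
best = max(levels, key=lambda parts: modularity(example1, parts))
```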
Example1: Plot dendrogram for hierarchical
methods via igraph function plot_dendrogram()
[Figure: example network annotated with the values 1, 12, 33 and 49]
Need to re-compute
betweenness at
every step
Hierarchical network decomposition:
[Figure: panels Step 1, Step 2 and Step 3]