CS224W: Analysis of Networks
Jure Leskovec, Stanford University
Network Metrics
¡ Many metrics at the node level:
§ E.g., node degree, PageRank score, node clustering
¡ Many metrics at the whole-network level:
§ E.g., diameter, clustering, size of giant component
¡ What about something in-between?
§ A mesoscale characterization of networks
Macroscopic: Whole network > Mesoscopic: ? > Microscopic: Single node
Credit: Pedro Ribeiro
10/9/18
Jure Leskovec, Stanford CS224W: Analysis of Networks
Building Blocks of Networks
¡ Subnetworks, or subgraphs, are the building blocks of networks:
§ They have the power to characterize and discriminate networks
Credit: Pedro Ribeiro
Subgraph decomposition of an electronic circuit
Image: Oxford Protein Informatics Group
Example Application
¡ Let’s consider all possible (non-isomorphic) directed subgraphs of size 3
Credit: Pedro Ribeiro
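The enumeration above can be sketched by brute force. This is a minimal illustration, assuming that "size 3" means three nodes, that self-loops are not allowed, and that only weakly connected subgraphs are of interest; the function names are made up for this example.

```python
from itertools import permutations, product

NODES = (0, 1, 2)
# The 6 possible directed edges between 3 distinct nodes (no self-loops).
PAIRS = [(u, v) for u in NODES for v in NODES if u != v]


def canonical(edges):
    """Smallest relabelled edge tuple over all 6 node permutations.

    Two edge sets are isomorphic exactly when their canonical forms match.
    """
    return min(
        tuple(sorted((p[u], p[v]) for u, v in edges))
        for p in permutations(NODES)
    )


def weakly_connected(edges):
    """True if all 3 nodes are reachable when edge direction is ignored."""
    seen, stack = {0}, [0]
    while stack:
        x = stack.pop()
        for u, v in edges:
            for a, b in ((u, v), (v, u)):
                if a == x and b not in seen:
                    seen.add(b)
                    stack.append(b)
    return len(seen) == 3


def size3_directed_subgraphs():
    """One representative edge tuple per isomorphism class."""
    classes = set()
    for mask in product((0, 1), repeat=len(PAIRS)):
        edges = [e for e, keep in zip(PAIRS, mask) if keep]
        if weakly_connected(edges):
            classes.add(canonical(edges))
    return sorted(classes)


print(len(size3_directed_subgraphs()))  # 13
```

Under these assumptions the sketch yields 13 subgraph types, so a significance profile over them is a 13-dimensional vector.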
¡ For each subgraph:
§ Imagine you have a metric capable of classifying the subgraph “significance” [more on that later]
§ Negative values indicate under-representation
§ Positive values indicate over-representation
¡ We create a network significance profile:
§ A feature vector with values for all subgraph types
¡ Next: Compare profiles of different networks:
§ Regulatory network (gene regulation)
§ Neuronal network (synaptic connections)
§ World Wide Web (hyperlinks between pages)
§ Social network (friendships)
§ Language networks (word adjacency)
Example Application
Network significance profile (figure panels: gene regulation networks; neurons; Web and social; language networks)
Different networks have similar significance profiles
Milo et al., Science 2004
Credit: Pedro Ribeiro
Example Application
Network significance profile similarity
(Figure: correlation in significance profile of the English and French language networks; clustering of networks based on their significance profiles)
Closely related networks have more similar significance profiles
Milo et al., Science 2004
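One way to quantify profile similarity is a correlation coefficient between two profile vectors. The sketch below assumes Pearson correlation and profiles given as equal-length lists with one entry per subgraph type; `profile_correlation` is a hypothetical helper name, not something defined in the slides.

```python
from math import sqrt


def profile_correlation(sp_a, sp_b):
    """Pearson correlation between two significance profiles."""
    if len(sp_a) != len(sp_b):
        raise ValueError("profiles must cover the same subgraph types")
    n = len(sp_a)
    mean_a = sum(sp_a) / n
    mean_b = sum(sp_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(sp_a, sp_b))
    var_a = sum((a - mean_a) ** 2 for a in sp_a)
    var_b = sum((b - mean_b) ** 2 for b in sp_b)
    # Undefined (division by zero) if either profile is constant.
    return cov / sqrt(var_a * var_b)
```

A pairwise matrix of such correlations is one possible input to a clustering of networks.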
Example Application – Science
Co-authorship networks in different scientific areas
(Figure: network significance profile; the x-axis lists the subgraph types)
Choobdar et al., ASONAM 2012
Credit: Pedro Ribeiro
¡ Network motifs: “recurring, significant patterns of interconnections”
¡ How to define a network motif:
§ Pattern: induced/non-induced subgraph
§ Recurring: found many times, i.e., with high frequency
§ Significant: more frequent than expected, i.e., than in randomly generated networks
§ E.g., Erdős-Rényi random graphs, scale-free networks
¡ Motifs:
§ Help us understand how networks work
§ Help us predict operation and reaction of the network in a given situation
¡ Examples:
§ Feed-forward loops: found in networks of neurons, where they neutralize “biological noise”
§ Parallel loops: found in food webs
§ Single-input modules: found in gene control networks
(Figures: feed-forward loop; parallel loop; single-input module)
Induced subgraph of interest (aka motif):
(Figure: two candidate placements in a network, one labeled “No match!” and one labeled “Match!”)
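The difference between a match and a non-match for an induced subgraph can be sketched as follows. This is an illustrative check for small directed motifs, assuming graphs are given as sets of `(source, target)` edges; the function names and the toy edge sets are invented for this example and are not the ones in the slide figure.

```python
from itertools import permutations


def induced_edges(graph_edges, nodes):
    """All edges of the graph whose endpoints both lie in `nodes`."""
    nodes = set(nodes)
    return {(u, v) for u, v in graph_edges if u in nodes and v in nodes}


def is_induced_match(graph_edges, nodes, motif_edges, motif_nodes):
    """True if the subgraph induced by `nodes` is isomorphic to the motif.

    Induced matching is strict: any extra edge among `nodes` breaks the match.
    """
    nodes = list(nodes)
    if len(nodes) != len(motif_nodes):
        return False
    sub = induced_edges(graph_edges, nodes)
    for perm in permutations(nodes):
        mapping = dict(zip(motif_nodes, perm))
        if {(mapping[u], mapping[v]) for u, v in motif_edges} == sub:
            return True
    return False


# Motif: feed-forward loop a->b, b->c, a->c.
FFL = {("a", "b"), ("b", "c"), ("a", "c")}
exact = {(1, 2), (2, 3), (1, 3)}
with_extra_edge = exact | {(3, 1)}

print(is_induced_match(exact, [1, 2, 3], FFL, ["a", "b", "c"]))            # True
print(is_induced_match(with_extra_edge, [1, 2, 3], FFL, ["a", "b", "c"]))  # False
```

A non-induced count would accept the second case as well, because the three motif edges are still present.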
Subgraph concepts - Frequency
¡ How to count? Allow overlapping of motifs
¡ Network on the right has 4 occurrences of the motif:
§ {1,2,3,4,5}
§ {1,2,3,4,6}
§ {1,2,3,4,7}
§ {1,2,3,4,8}
Example borrowed from Pedro Ribeiro
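Counting with overlap can be sketched on a stand-in graph. The slide figure is not recoverable here, so the graph below is an assumption: a path 1-2-3-4 with leaves 5 to 8 attached to node 4, and the motif is taken to be an induced 5-node path. With these choices the four occurrences share nodes 1 to 4, which mirrors the node sets listed above.

```python
from itertools import combinations

# Hypothetical undirected graph: path 1-2-3-4 plus four leaves on node 4.
EDGES = [(1, 2), (2, 3), (3, 4), (4, 5), (4, 6), (4, 7), (4, 8)]


def is_induced_path(edges, nodes):
    """True if the subgraph induced by `nodes` is a simple path."""
    nodes = set(nodes)
    sub = [(u, v) for u, v in edges if u in nodes and v in nodes]
    if len(sub) != len(nodes) - 1:  # a path on k nodes has k-1 edges
        return False
    adj = {n: set() for n in nodes}
    for u, v in sub:
        adj[u].add(v)
        adj[v].add(u)
    if any(len(nb) > 2 for nb in adj.values()):
        return False
    # Connected with k-1 edges and max degree 2 means a path.
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return seen == nodes


def occurrences(edges, k=5):
    """All k-node sets inducing a path; overlapping sets are all kept."""
    all_nodes = sorted({n for e in edges for n in e})
    return [set(c) for c in combinations(all_nodes, k) if is_induced_path(edges, c)]


print(occurrences(EDGES))
# [{1, 2, 3, 4, 5}, {1, 2, 3, 4, 6}, {1, 2, 3, 4, 7}, {1, 2, 3, 4, 8}]
```

If overlap were disallowed, at most one of these four node sets could be counted, which is why the frequency concept matters.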
¡ Key idea: Subgraphs that occur in a real network much more often than in a random network have functional significance
Milo et al., Science 2002
¡ Motifs are overrepresented in a network when compared to randomized networks:
§ $Z_i$ captures statistical significance of motif $i$:
$Z_i = \left(N_i^{\mathrm{real}} - \bar{N}_i^{\mathrm{rand}}\right) / \mathrm{std}\left(N_i^{\mathrm{rand}}\right)$
§ $N_i^{\mathrm{real}}$ is #(subgraphs of type $i$) in network $G^{\mathrm{real}}$
§ $N_i^{\mathrm{rand}}$ is #(subgraphs of type $i$) in randomized network $G^{\mathrm{rand}}$
¡ Network significance profile:
$SP_i = Z_i / \sqrt{\sum_j Z_j^2}$
§ $SP$ is a vector of normalized Z-scores
§ $SP$ emphasizes relative significance of subgraphs:
§ Important for comparison of networks of different sizes
§ Generally, larger networks display higher Z-scores
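The two formulas can be sketched directly in code. This is a minimal illustration that assumes the subgraph counts are already available and that `std` is the population standard deviation over the randomized networks (the slides do not say which variant is meant); the counts in the usage lines are made-up toy numbers.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(real_counts, rand_counts):
    """Z_i = (N_i^real - mean(N_i^rand)) / std(N_i^rand) per subgraph type.

    real_counts[i] is a single count; rand_counts[i] is a list of counts,
    one per randomized network. Fails if a list has zero spread.
    """
    return [
        (real - mean(rand)) / pstdev(rand)
        for real, rand in zip(real_counts, rand_counts)
    ]


def significance_profile(z):
    """SP_i = Z_i / sqrt(sum_j Z_j^2), i.e. Z rescaled to unit length."""
    norm = sqrt(sum(v * v for v in z))
    return [v / norm for v in z]


# Toy numbers, not measurements: two subgraph types, two randomized networks.
z = z_scores([10, 2], [[4, 6], [4, 6]])
print(z)                        # [5.0, -3.0]
print(significance_profile(z))  # unit-length vector with the same signs
```

Because the profile has unit length, two networks of very different sizes can be compared even when their raw Z-scores differ by orders of magnitude.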
¡ Goal: Generate a random graph with a given degree sequence k1, k2, … kN
¡ Useful as a “null” model of networks:
§ We can compare the real network G and a “random” G’ which has the same degree sequence as G
¡ Configuration model:
(Figure, three panels on nodes A, B, C, D: nodes with spokes; randomly pair up “mini”-nodes; resulting graph)
We ignore double edges and self-loops when creating the final graph
10/12/18
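The spoke-pairing procedure can be sketched as stub matching. This is a minimal undirected version, assuming nodes are numbered 0..N-1 and that double edges and self-loops are simply dropped as described above; `configuration_model` here is a local helper written for this example.

```python
import random


def configuration_model(degrees, seed=None):
    """Randomly pair up stubs ("mini"-nodes) for a target degree sequence.

    Returns a set of undirected edges (u, v) with u < v. Self-loops and
    repeated edges are dropped, so realized degrees can fall slightly
    below the requested ones.
    """
    if sum(degrees) % 2:
        raise ValueError("degree sequence must have an even sum")
    rng = random.Random(seed)
    # Node i contributes degrees[i] stubs.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = set()
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:  # ignore self-loops
            edges.add((min(u, v), max(u, v)))  # the set ignores double edges
    return edges


print(configuration_model([3, 2, 2, 1], seed=0))
```

For large sparse graphs the dropped edges are usually a small fraction, which is why the shortcut is commonly accepted for a null model.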
¡ Start from a given graph $G$
¡ Repeat the switching step $Q \cdot |E|$ times:
§ Select a pair of edges A→B, C→D at random
§ Exchange the endpoints to give A→D, C→B
§ Exchange edges only if no multiple edges or self-edges are generated
¡ Result: A randomly rewired graph:
(Figure: nodes A, B, C, D before and after the switch)
§ Same node degrees, randomly rewired edges
¡ $Q$ is chosen large enough (e.g., $Q = 100$) for the process to converge
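The switching steps above can be sketched as follows. This is an illustrative version for directed graphs without self-loops, assuming edges are `(source, target)` pairs; `switch_edges` and its parameters are names chosen for this example.

```python
import random


def switch_edges(edges, q=100, seed=None):
    """Attempt q * |E| edge switches; in- and out-degrees stay fixed."""
    rng = random.Random(seed)
    edge_list = list(dict.fromkeys(edges))  # drop duplicates, keep order
    edge_set = set(edge_list)
    m = len(edge_list)
    for _ in range(q * m):
        i, j = rng.randrange(m), rng.randrange(m)
        if i == j:
            continue
        (a, b), (c, d) = edge_list[i], edge_list[j]
        new1, new2 = (a, d), (c, b)
        # Reject switches that would create self-edges or multiple edges.
        if a == d or c == b or new1 in edge_set or new2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new1, new2}
        edge_list[i], edge_list[j] = new1, new2
    return edge_list


original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
print(switch_edges(original, q=10, seed=1))
```

Rejected attempts still count toward the $Q \cdot |E|$ budget in this sketch, which is one common convention; counting only successful switches is another.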
Example Application
Network significance profile
$Z_i = \left(N_i^{\mathrm{real}} - \bar{N}_i^{\mathrm{rand}}\right) / \mathrm{std}\left(N_i^{\mathrm{rand}}\right)$
Different networks have similar fingerprints!
Milo et al., Science 2004
Credit: Pedro Ribeiro
¡ Count subgraphs $i$ in $G^{\mathrm{real}}$
¡ Count subgraphs $i$ in random networks $G^{\mathrm{rand}}$:
§ Configuration model: Each $G^{\mathrm{rand}}$ has the same #(nodes), #(edges) and degree distribution as $G^{\mathrm{real}}$
¡ Assign Z-score to $i$:
§ $Z_i = \left(N_i^{\mathrm{real}} - \bar{N}_i^{\mathrm{rand}}\right) / \mathrm{std}\left(N_i^{\mathrm{rand}}\right)$
§ High Z-score: Subgraph $i$ is a network motif of $G$
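The three steps can be tied together for a single motif. The sketch below is only an illustration: it counts feed-forward loops as non-induced occurrences, uses a simple directed stub shuffle as the randomization (which drops self-loops and repeated edges, so the edge count can shrink slightly), and takes `std` as the population standard deviation. All function names are invented for this example.

```python
import random
from statistics import mean, pstdev


def count_ffl(edges):
    """Count feed-forward loops a->b, b->c, a->c (non-induced)."""
    out = {}
    for u, v in edges:
        out.setdefault(u, set()).add(v)
    return sum(
        1
        for a, succ in out.items()
        for b in succ
        for c in out.get(b, ())
        if c != a and c in succ
    )


def random_directed_like(edges, rng):
    """Shuffle edge targets: keeps out-degrees, approximately keeps in-degrees."""
    sources = [u for u, _ in edges]
    targets = [v for _, v in edges]
    rng.shuffle(targets)
    return {(u, v) for u, v in zip(sources, targets) if u != v}


def motif_z_score(edges, n_rand=100, seed=0):
    """Return (real count, list of randomized counts, Z-score)."""
    rng = random.Random(seed)
    real = count_ffl(edges)
    rand = [count_ffl(random_directed_like(edges, rng)) for _ in range(n_rand)]
    sd = pstdev(rand)
    z = (real - mean(rand)) / sd if sd > 0 else float("nan")
    return real, rand, z
```

A call such as `motif_z_score(edge_list, n_rand=1000)` returns the raw ingredients together with the score, so the spread of the randomized counts can be inspected before trusting the Z-score.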
Network Motifs Applicability
¡ Canonical definition:
§ Directed and undirected
§ Colored and uncolored
§ Temporal and static motifs
¡ Variations on the concept:
§ Different frequency concepts
§ Different significance metrics
§ Under-representation (anti-motifs)
§ Different constraints for null model
§ Weighted networks
Credit: Pedro Ribeiro
Z-scores of individual motifs for different networks
¡ Networks of neurons and gene networks contain similar motifs:
§ Feed-forward loops and bi-fan structures
§ Both are information processing networks with sensory and acting components
¡ Food webs have parallel loops:
§ Prey of a particular predator share prey
¡ WWW network has bidirectional links:
§ Design that allows the shortest path among sets of related pages
¡ Graphlets: connected non-isomorphic subgraphs
§ Induced subgraphs of any frequency
For $n = 3, 4, 5, \dots, 10$ there are $2, 6, 21, \dots, 11716571$ graphlets!
Przulj et al., Bioinformatics 2004
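The first few graphlet counts can be checked by brute force. This sketch assumes graphlets are counted as connected, undirected, non-isomorphic graphs on $n$ nodes, and it is only practical for very small $n$ because it tries every edge subset and every node permutation; `count_graphlets` is a name made up for this example.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0 over undirected edges."""
    adj = {i: set() for i in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for nb in adj[stack.pop()]:
            if nb not in seen:
                seen.add(nb)
                stack.append(nb)
    return len(seen) == n


def count_graphlets(n):
    """Number of connected non-isomorphic undirected graphs on n nodes."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    classes = set()
    for mask in range(1 << len(pairs)):
        edges = [pairs[i] for i in range(len(pairs)) if mask >> i & 1]
        if not is_connected(n, edges):
            continue
        # Canonical form: smallest relabelled edge list over all permutations.
        classes.add(min(
            tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
            for p in perms
        ))
    return len(classes)


print([count_graphlets(n) for n in (3, 4, 5)])  # [2, 6, 21]
```

The rapid growth of these counts is the reason practical graphlet analyses usually stop at a small number of nodes.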