05 motifs and graphlets


CS224W: Analysis of Networks
Jure Leskovec, Stanford University

Network Metrics

¡ There are many available metrics at the node level:
§ E.g., node degree, PageRank score, node clustering, betweenness, closeness

¡ There are also many metrics at the whole-network (global) level:
§ E.g., diameter, average distance, density, clustering coefficient, size of giant component

¡ What about something in-between?
§ A mesoscale characterization of networks

[Figure: Macroscopic (whole network) > Mesoscopic (???) > Microscopic (single node). Credit: Pedro Ribeiro]

10/9/18
Jure Leskovec, Stanford CS224W: Analysis of Networks

Building Blocks of Networks

¡ Subnetworks, or subgraphs, are the building blocks of networks
¡ They have the power to characterize and discriminate networks

Credit: Pedro Ribeiro

[Figure: Subgraph decomposition of an electronic circuit. Credit: Oxford Protein Informatics Group]

Example Application

¡ Let's consider all possible (non-isomorphic) directed subgraphs of size 3

Credit: Pedro Ribeiro
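The set of size-3 directed subgraphs can be enumerated by brute force. The sketch below is only an illustration, under two assumptions that the slide does not state explicitly: subgraphs are required to be (weakly) connected, and self-loops are excluded. The function names are hypothetical.

```python
from itertools import combinations, permutations

NODES = (0, 1, 2)
# All 6 possible directed edges between 3 distinct nodes (no self-loops).
ALL_EDGES = [(u, v) for u in NODES for v in NODES if u != v]


def canonical_form(edges):
    """Smallest relabeled edge tuple over all 3! node permutations."""
    return min(
        tuple(sorted((p[u], p[v]) for u, v in edges))
        for p in permutations(NODES)
    )


def is_weakly_connected(edges):
    """True if all 3 nodes are joined when edge direction is ignored."""
    reached = {0}
    changed = True
    while changed:
        changed = False
        for u, v in edges:
            if (u in reached) != (v in reached):
                reached.update((u, v))
                changed = True
    return len(reached) == len(NODES)


def directed_triads():
    """Return one representative edge tuple per isomorphism class."""
    classes = set()
    for k in range(len(ALL_EDGES) + 1):
        for edges in combinations(ALL_EDGES, k):
            if is_weakly_connected(edges):
                classes.add(canonical_form(edges))
    return sorted(classes, key=lambda e: (len(e), e))


triads = directed_triads()
print(len(triads))  # 13
```

Two edge sets are isomorphic exactly when some relabeling of the nodes maps one onto the other, so taking the minimum over all relabelings gives each class a single representative.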

¡ For each subgraph:
§ Imagine you have a metric capable of classifying the subgraph “significance” [more on that later]
§ Negative values indicate under-representation
§ Positive values indicate over-representation

¡ We create a network significance profile:
§ A feature vector with values for all subgraph types

¡ Next: Compare profiles of different networks:
§ Regulatory network (gene regulation)
§ Neuronal network (synaptic connections)
§ World Wide Web (hyperlinks between pages)
§ Social network (friendships)
§ Language networks (word adjacency)

Example Application

[Figure: Network significance profiles in four groups: gene regulation networks; neurons; Web and social; language networks]

Different networks have similar significance profiles ("fingerprints")

Milo et al., Science 2004
Credit: Pedro Ribeiro

Example Application

Network significance profile similarity

[Figure: Clustering of networks based on the correlation between their significance profiles. Inset: correlation in significance profile of the English and French language networks]

Closely related networks have more similar significance profiles

Milo et al., Science 2004
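One way to compare two significance profiles is their Pearson correlation. The sketch below assumes each profile is a plain list with one value per subgraph type. The three profiles are made-up numbers for illustration, not data from Milo et al.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (>= 2)")
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant profile")
    return cov / sqrt(var_x * var_y)


# Hypothetical profiles (illustration only).
profile_a = [0.5, -0.3, 0.1, 0.6, -0.5]
profile_b = [0.4, -0.2, 0.2, 0.7, -0.4]
profile_c = [-0.5, 0.3, -0.1, -0.6, 0.5]  # exact negation of profile_a

print(round(pearson(profile_a, profile_b), 3))  # close to +1: similar profiles
print(round(pearson(profile_a, profile_c), 3))  # -1.0: opposite profiles
```

A matrix of such pairwise correlations could then be fed to any standard clustering routine to group networks.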

Example Application – Science

[Figure: Network significance profile. X-axis: subgraph types]

Co-authorship network in different scientific areas

Choobdar et al., ASONAM 2012
Credit: Pedro Ribeiro

¡ Network motifs: “recurring, significant patterns of interconnections”

¡ How to define a network motif:
§ Pattern: induced/non-induced subgraph
§ Recurring: found many times, i.e., with high frequency
§ Significant: more frequent than expected, i.e., than in randomly generated networks
§ Random baselines: Erdős-Rényi random graphs, scale-free networks

¡ Motifs:
§ Help us understand how networks work
§ Help us predict operation and reaction of the network in a given situation

¡ Examples:
§ Feed-forward loops: found in networks of neurons, where they neutralize “biological noise”
§ Parallel loops: found in food webs
§ Single-input modules: found in gene control networks

[Figure: Feed-forward loop, Parallel loop, Single-input module]
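As a concrete case, a feed-forward loop on nodes (a, b, c) consists of the edges a→b, b→c and a→c. The following sketch counts such triples in a directed edge list. It is a brute-force illustration, not an efficient counter, and the function name and the `induced` flag are hypothetical.

```python
from itertools import permutations


def count_feed_forward_loops(edges, induced=True):
    """Count node triples (a, b, c) with edges a->b, b->c and a->c.

    With induced=True the triple must have no other edges among its
    three nodes, so each node set is counted at most once. With
    induced=False a node set with extra edges may be counted several
    times, once per valid role assignment.
    """
    edge_set = set(edges)
    nodes = {n for e in edge_set for n in e}
    count = 0
    for a, b, c in permutations(nodes, 3):
        if not {(a, b), (b, c), (a, c)} <= edge_set:
            continue
        if induced and edge_set & {(b, a), (c, b), (c, a)}:
            continue
        count += 1
    return count


edges = [("X", "Y"), ("Y", "Z"), ("X", "Z"), ("Z", "W")]
print(count_feed_forward_loops(edges))  # 1

# Adding Z->Y turns {X, Y, Z} into a different induced subgraph.
extra = edges + [("Z", "Y")]
print(count_feed_forward_loops(extra))                 # 0
print(count_feed_forward_loops(extra, induced=False))  # 2
```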

Induced subgraph of interest (aka motif):

[Figure: the motif next to a network with two candidate subgraphs, one labeled “No match!” and one labeled “Match!”]
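The difference between a match and a non-match comes down to the induced condition: every edge of the network among the chosen nodes must also appear in the motif. A minimal sketch for undirected graphs follows, assuming the motif is connected and given as an edge list. The helper names and the example graph are hypothetical, not the ones drawn on the slide.

```python
from itertools import permutations


def induced_edges(graph_edges, node_set):
    """Edges of the graph with both endpoints inside node_set."""
    inside = set(node_set)
    return {frozenset(e) for e in graph_edges if set(e) <= inside}


def matches_induced(graph_edges, node_set, motif_edges):
    """True if node_set induces a subgraph isomorphic to the motif."""
    motif_nodes = sorted({n for e in motif_edges for n in e})
    node_list = sorted(node_set)
    if len(node_list) != len(motif_nodes):
        return False
    target = induced_edges(graph_edges, node_list)
    # Try every way of mapping motif nodes onto the chosen nodes.
    for perm in permutations(node_list):
        mapping = dict(zip(motif_nodes, perm))
        mapped = {frozenset((mapping[u], mapping[v])) for u, v in motif_edges}
        if mapped == target:
            return True
    return False


# Hypothetical example: a triangle 1-2-3 with a tail 3-4.
graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
path = [("a", "b"), ("b", "c")]  # 3-node path motif

print(matches_induced(graph, {1, 2, 3}, path))  # False: extra edge 1-3
print(matches_induced(graph, {2, 3, 4}, path))  # True
```

Nodes 1, 2, 3 contain a path as a non-induced subgraph, but the extra edge between the endpoints means the induced subgraph is a triangle, so it does not count.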

Subgraph concepts - Frequency

Motif of interest: [Figure]

How to count?

¡ Allow overlapping of motifs
¡ Network on the right has 4 occurrences of the motif:
§ {1,2,3,4,5}
§ {1,2,3,4,6}
§ {1,2,3,4,7}
§ {1,2,3,4,8}

Example borrowed from Pedro Ribeiro
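Counting with overlap means that every node set inducing the motif is counted, even when node sets share nodes and edges. The sketch below is a brute-force counter for small undirected graphs. The network and motif in it are hypothetical stand-ins, chosen so that the occurrences come out as the four node sets listed above. They are not necessarily the graphs drawn on the slide.

```python
from itertools import combinations, permutations


def find_occurrences(graph_edges, motif_edges):
    """List node sets whose induced subgraph is isomorphic to the motif."""
    graph = {frozenset(e) for e in graph_edges}
    motif_nodes = sorted({n for e in motif_edges for n in e})
    graph_nodes = sorted({n for e in graph for n in e})
    found = []
    for subset in combinations(graph_nodes, len(motif_nodes)):
        induced = {e for e in graph if e <= set(subset)}
        for perm in permutations(subset):
            m = dict(zip(motif_nodes, perm))
            if {frozenset((m[u], m[v])) for u, v in motif_edges} == induced:
                found.append(set(subset))
                break  # count each node set once
    return found


# Hypothetical network: a 4-cycle 1-2-3-4 with leaves 5..8 attached to node 4.
network = [(1, 2), (2, 3), (3, 4), (4, 1), (4, 5), (4, 6), (4, 7), (4, 8)]
# Hypothetical motif: a 4-cycle with one pendant node.
motif = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a"), ("d", "e")]

for occurrence in find_occurrences(network, motif):
    print(sorted(occurrence))
```

All four occurrences share the nodes 1, 2, 3, 4 and differ only in the pendant node, which is exactly the situation that a no-overlap counting rule would treat differently.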

¡ Key idea: Subgraphs that occur in a real network much more often than in a random network have functional significance

Milo et al., Science 2002

¡ Motifs are overrepresented in a network when compared to randomized networks:
§ $Z_i$ captures statistical significance of motif $i$:
$Z_i = (N_i^{\text{real}} - \bar{N}_i^{\text{rand}}) / \text{std}(N_i^{\text{rand}})$
§ $N_i^{\text{real}}$ is #(subgraphs of type $i$) in network $G^{\text{real}}$
§ $N_i^{\text{rand}}$ is #(subgraphs of type $i$) in randomized network $G^{\text{rand}}$

¡ Network significance profile:
$SP_i = Z_i / \sqrt{\sum_j Z_j^2}$
§ $SP$ is a vector of normalized Z-scores
§ $SP$ emphasizes relative significance of subgraphs:
§ Important for comparison of networks of different sizes
§ Generally, larger networks display higher Z-scores
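Given subgraph counts for the real network and for a set of randomized networks, the Z-scores and the normalized significance profile can be computed directly. This is a minimal sketch: the slide does not say whether std is the population or sample standard deviation, so the population version (`statistics.pstdev`) is an assumption here, and the counts are made up for illustration.

```python
from math import sqrt
from statistics import mean, pstdev


def z_scores(real_counts, random_counts):
    """Z_i = (N_i^real - mean(N_i^rand)) / std(N_i^rand) per subgraph type.

    real_counts: one count per subgraph type.
    random_counts: one list of counts per randomized network.
    """
    scores = []
    for i, n_real in enumerate(real_counts):
        samples = [counts[i] for counts in random_counts]
        sd = pstdev(samples)  # assumption: population standard deviation
        if sd == 0:
            raise ValueError(f"zero variance for subgraph type {i}")
        scores.append((n_real - mean(samples)) / sd)
    return scores


def significance_profile(z):
    """SP_i = Z_i / sqrt(sum_j Z_j^2): Z-scores scaled to unit length."""
    norm = sqrt(sum(v * v for v in z))
    if norm == 0:
        raise ValueError("all Z-scores are zero")
    return [v / norm for v in z]


# Hypothetical counts for two subgraph types and three randomized networks.
real = [10, 2]
rand = [[4, 5], [6, 7], [5, 6]]

z = z_scores(real, rand)
sp = significance_profile(z)
print([round(v, 3) for v in z])
print([round(v, 3) for v in sp])
```

Because the profile is scaled to unit length, two networks of very different sizes can be compared even when their raw Z-scores differ by orders of magnitude.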

¡ Goal: Generate a random graph with a given degree sequence $k_1, k_2, \dots, k_N$

¡ Useful as a “null” model of networks:
§ We can compare the real network G and a “random” G’ which has the same degree sequence as G

¡ Configuration model:

[Figure, three steps on nodes A, B, C, D: (1) nodes with spokes; (2) randomly pair up “mini”-nodes; (3) resulting graph]

We ignore double edges and self-loops when creating the final graph
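The stub-matching procedure can be sketched in a few lines. This is an illustration under stated assumptions: nodes are labeled 0..N-1, the graph is undirected, and double edges and self-loops are simply dropped, which means some nodes can end up with a lower degree than requested. The function name and the example degree sequence are hypothetical.

```python
import random


def configuration_model(degrees, seed=None):
    """Pair up half-edges ("spokes") at random; drop self-loops and repeats."""
    if sum(degrees) % 2 != 0:
        raise ValueError("degree sum must be even")
    rng = random.Random(seed)
    # One stub per unit of degree: node i appears degrees[i] times.
    stubs = [node for node, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    edges = set()
    # After shuffling, consecutive stubs form a uniformly random pairing.
    for u, v in zip(stubs[::2], stubs[1::2]):
        if u != v:  # ignore self-loops
            edges.add((min(u, v), max(u, v)))  # the set ignores double edges
    return edges


# Hypothetical degree sequence for four nodes.
print(sorted(configuration_model([3, 2, 2, 1], seed=0)))
```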
10/12/18


¡ Start from a given graph $G$
¡ Repeat the switching step $Q \cdot |E|$ times:
§ Select a pair of edges A→B, C→D at random
§ Exchange the endpoints to give A→D, C→B
§ Exchange edges only if no multiple edges or self-edges are generated

¡ Result: A randomly rewired graph:
§ Same node degrees, randomly rewired edges

[Figure: edges A→B, C→D before the switch and A→D, C→B after it]

¡ $Q$ is chosen large enough (e.g., $Q = 100$) for the process to converge
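The switching procedure translates almost line by line into code. The sketch below assumes a directed graph given as a list of (source, target) pairs without self-loops. The function name and parameters are hypothetical, and rejected swaps still count toward the Q·|E| attempts.

```python
import random


def rewire(edges, q=100, seed=None):
    """Degree-preserving randomization by repeated edge switching."""
    rng = random.Random(seed)
    edge_list = list(dict.fromkeys(edges))  # drop exact duplicates, keep order
    edge_set = set(edge_list)
    if len(edge_list) < 2:
        return edge_list
    for _ in range(q * len(edge_list)):
        i, j = rng.sample(range(len(edge_list)), 2)
        (a, b), (c, d) = edge_list[i], edge_list[j]
        # Proposed switch: A->B, C->D becomes A->D, C->B.
        if a == d or c == b:
            continue  # would create a self-edge
        if (a, d) in edge_set or (c, b) in edge_set:
            continue  # would create a multiple edge
        edge_set -= {(a, b), (c, d)}
        edge_set |= {(a, d), (c, b)}
        edge_list[i], edge_list[j] = (a, d), (c, b)
    return edge_list


original = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2), (1, 3)]
print(sorted(rewire(original, q=100, seed=1)))
```

Each accepted switch keeps every node's in-degree and out-degree unchanged, because A and C keep one outgoing edge each and B and D keep one incoming edge each.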

Example Application

[Figure: Network significance profile, annotated with $Z_i = (N_i^{\text{real}} - \bar{N}_i^{\text{rand}}) / \text{std}(N_i^{\text{rand}})$]

Different networks have similar fingerprints!

Milo et al., Science 2004
Credit: Pedro Ribeiro

¡ Count subgraphs $i$ in $G^{\text{real}}$
¡ Count subgraphs $i$ in random networks $G^{\text{rand}}$:
§ Configuration model: Each $G^{\text{rand}}$ has the same #(nodes), #(edges) and degree distribution as $G^{\text{real}}$

¡ Assign Z-score to $i$:
§ $Z_i = (N_i^{\text{real}} - \bar{N}_i^{\text{rand}}) / \text{std}(N_i^{\text{rand}})$
§ High Z-score: Subgraph $i$ is a network motif of $G$

Network Motifs Applicability

¡ Canonical definition:
§ Directed and undirected
§ Colored and uncolored
§ Temporal and static motifs

¡ Variations on the concept:
§ Different frequency concepts
§ Different significance metrics
§ Under-representation (anti-motifs)
§ Different constraints for null model
§ Weighted networks

Credit: Pedro Ribeiro

[Figure: Z-scores of individual motifs for different networks]

¡ Network of neurons and a gene network contain similar motifs:
§ Feed-forward loops and bi-fan structures
§ Both are information processing networks with sensory and acting components

¡ Food webs have parallel loops:
§ Prey of a particular predator share prey

¡ WWW network has bidirectional links:
§ Design that allows the shortest path among sets of related pages

¡ Graphlets: connected non-isomorphic subgraphs
§ Induced subgraphs of any frequency

For $n = 3, 4, 5, \dots, 10$ there are $2, 6, 21, \dots, 11716571$ graphlets!

Przulj et al., Bioinformatics 2004
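The first few graphlet counts can be checked by brute force: enumerate every undirected graph on n labeled nodes, keep the connected ones, and collapse isomorphic copies. The sketch below is only practical for small n (it takes on the order of a second for n = 5), and the function names are hypothetical.

```python
from itertools import combinations, permutations


def is_connected(n, edges):
    """Depth-first search from node 0 over an undirected edge list."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, stack = {0}, [0]
    while stack:
        for w in adj[stack.pop()]:
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) == n


def count_connected_graphs(n):
    """Count connected, non-isomorphic undirected graphs on n nodes."""
    pairs = list(combinations(range(n), 2))
    perms = list(permutations(range(n)))
    classes = set()
    for mask in range(1 << len(pairs)):
        edges = [pairs[k] for k in range(len(pairs)) if mask >> k & 1]
        if len(edges) < n - 1 or not is_connected(n, edges):
            continue
        # Canonical form: smallest relabeled edge list over all permutations.
        canon = min(
            tuple(sorted(tuple(sorted((p[u], p[v]))) for u, v in edges))
            for p in perms
        )
        classes.add(canon)
    return len(classes)


print([count_connected_graphs(n) for n in (3, 4, 5)])  # [2, 6, 21]
```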
