CS224W: Analysis of Networks
Jure Leskovec and Marinka Zitnik, Stanford University
Intuition: Map nodes to d-dimensional embeddings such that similar nodes in the graph are embedded close together.

[Figure: a mapping f from the input graph to 2D node embeddings]

How do we learn the mapping function f?
Goal: Map nodes so that similarity in the embedding space (e.g., dot product) approximates similarity (e.g., proximity) in the network.

[Figure: input network mapped into a d-dimensional embedding space]
Goal: $\text{similarity}(u, v) \approx \mathbf{z}_v^\top \mathbf{z}_u$ (the similarity function on the left-hand side still needs to be defined).

[Figure: input network mapped into a d-dimensional embedding space]
Encoder: Map a node to a low-dimensional vector:

$\mathrm{enc}(v) = \mathbf{z}_v$, where $v$ is a node in the input graph and $\mathbf{z}_v$ is its d-dimensional embedding.
Similarity function: defines how relationships in the input network map to relationships in the embedding space:

$\underbrace{\text{similarity}(u, v)}_{\text{similarity of } u \text{ and } v \text{ in the network}} \approx \underbrace{\mathbf{z}_v^\top \mathbf{z}_u}_{\text{dot product between node embeddings}}$
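To make the decoder side concrete, here is a minimal sketch (not from the slides; the toy vectors are made up) of computing embedding-space similarity as a dot product:

```python
import numpy as np

# Toy d = 4 embedding vectors for two nodes u and v (illustrative values only).
z_u = np.array([0.1, 0.9, -0.3, 0.5])
z_v = np.array([0.2, 0.8, -0.1, 0.4])

# Similarity in the embedding space: the dot product z_v^T z_u.
embedding_similarity = z_v @ z_u
print(embedding_similarity)  # larger value -> u and v are considered more similar
```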
So far we have focused on "shallow" encoders, i.e., embedding lookups:

[Figure: the embedding matrix $\mathbf{Z}$ has one column per node; each column is the embedding vector of a specific node, and the number of rows is the dimension/size of the embeddings]
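As a minimal sketch of such a lookup encoder (sizes and names are illustrative, not from the slides): the parameters are exactly the entries of Z, and encoding a node is just selecting its column.

```python
import numpy as np

num_nodes, d = 5, 8                      # |V| nodes, d-dimensional embeddings
rng = np.random.default_rng(0)

# Embedding matrix Z: d rows, one column per node. These entries ARE the model parameters.
Z = rng.normal(size=(d, num_nodes))

def enc(v: int) -> np.ndarray:
    """Shallow encoder: return node v's column of Z (a pure lookup)."""
    return Z[:, v]

z_3 = enc(3)                             # embedding vector for node 3
```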
Shallow encoders:
§ One layer of data transformation
§ A single hidden layer maps node $u$ to its embedding $\mathbf{z}_u$ via a function $f$, e.g., $\mathbf{z}_u = f(\mathbf{z}_v, v \in N(u))$
Limitations of shallow embedding methods:
§ O(|V|) parameters are needed:
§ No sharing of parameters between nodes
§ Every node has its own unique embedding
§ Inherently “transductive”:
§ Cannot generate embeddings for nodes that are not seen
during training
§ Do not incorporate node features:
§ Many graphs have features that we can and should
leverage
Today: We will now discuss deep methods based on graph neural networks:

$\mathrm{enc}(v) =$ multiple layers of non-linear transformation of graph structure

Note: All these deep encoders can be combined with the node similarity functions defined in CS224W lecture 9.
Output: Node embeddings. We can also embed larger network structures: subgraphs and entire graphs.
CNN on an image:
Goal is to generalize convolutions beyond simple lattices
Leverage node features/attributes (e.g., text, images)
Convolutional neural networks (on grids)

[Figure: a single CNN layer with a 3x3 filter, shown on an image vs. on a graph (animation by Vincent Dumoulin)]
Transform information at the neighbors and combine it:
§ Transform "messages" $h_i$ from neighbors: $W_i h_i$
§ Add them up: $\sum_i W_i h_i$
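A minimal sketch of this transform-and-sum step (a single shared weight matrix W is assumed here, which is what most GNN layers do in practice; sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_out = 4, 3
W = rng.normal(size=(d_out, d_in))       # shared linear transform for all messages

# Messages h_i coming from three neighbors of some node (toy vectors).
neighbor_messages = [rng.normal(size=d_in) for _ in range(3)]

# Transform each message W h_i, then add them up: sum_i W h_i.
aggregated = sum(W @ h for h in neighbor_messages)
```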
But what if our data is graph-structured and looks like this, or this, or this?

[Figure: examples of graph-structured data; a GCN-style pipeline with an input graph, hidden layers with ReLU activations, and an output]
Examples:
Biological networks, Medical networks, Social
networks, Information networks, Knowledge graphs,
Communication networks, Web graph, …
A naïve approach:
§ Take the adjacency matrix $A$ and the feature matrix $X$
§ Concatenate them: $[A, X]$
§ Feed them into a deep (fully connected) neural net
§ Done?
[Figure: example graph on nodes A–E]

     A  B  C  D  E | Feat
A    0  1  1  1  0 | 1  0
B    1  0  0  1  1 | 0  0
C    1  0  0  1  0 | 0  1
D    1  1  1  0  1 | 1  1
E    0  1  0  1  0 | 1  0
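A minimal sketch of this naïve approach on the 5-node example above (the single random-weight layer stands in for a deep fully connected net; it is not a trained model):

```python
import numpy as np

# Adjacency matrix A and feature matrix X for nodes A..E, taken from the table above.
A = np.array([[0, 1, 1, 1, 0],
              [1, 0, 0, 1, 1],
              [1, 0, 0, 1, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
X = np.array([[1, 0],
              [0, 0],
              [0, 1],
              [1, 1],
              [1, 0]], dtype=float)

# Concatenate [A, X]: each node's input is |V| adjacency entries plus its features.
AX = np.concatenate([A, X], axis=1)          # shape (5, 7)

# One fully connected layer with ReLU (random placeholder weights).
rng = np.random.default_rng(0)
W = rng.normal(size=(AX.shape[1], 4))        # note: W's size depends on |V|
H = np.maximum(AX @ W, 0)
```

Because the input width contains |V|, the weight matrix grows with the graph and the network is tied to one particular node ordering, which is exactly what the issues below point out.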
Issues with this idea:
§ $O(|V|)$ parameters (a huge number of parameters; no inductive learning possible)
§ Not applicable to graphs of different sizes
§ Not invariant to node ordering
1. Basics of deep learning for graphs
2. Graph Convolutional Networks
3. Graph Attention Networks (GAT)
4. Practical tips and demos
Local network neighborhoods:
§ Describe aggregation strategies
§ Define computation graphs

Stacking multiple layers:
§ Describe the model, parameters, training
§ How to fit the model?
§ Simple example for unsupervised and supervised training
Assume we have a graph $G$:
§ $V$ is the vertex set
§ $A$ is the adjacency matrix (assume binary)
§ $X \in \mathbb{R}^{m \times |V|}$ is a matrix of node features
§ Biologically meaningful node features:
§ E.g., immunological signatures, gene expression profiles, gene functional information
§ No features:
§ Indicator vectors (one-hot encoding of a node)
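For concreteness, here is one way to assemble A and X for a small example (the edge list is illustrative; with no real features, X falls back to one-hot indicator vectors as the slide suggests):

```python
import numpy as np

num_nodes = 5
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (1, 4), (2, 3), (3, 4)]  # example undirected edges

# Binary adjacency matrix A.
A = np.zeros((num_nodes, num_nodes))
for i, j in edges:
    A[i, j] = A[j, i] = 1

# Node features: with no features available, use indicator (one-hot) vectors,
# i.e., X is the identity matrix (each column is one node's one-hot encoding).
X = np.eye(num_nodes)
```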
[Scarselli et al., IEEE TNN 2005, Niepert et al., ICML 2016]
[Figure from "Learning Convolutional Neural Networks for Graphs" (PATCHY-SAN): node sequence selection, neighborhood graph construction, graph normalization, and a convolutional architecture. Caption: a node sequence is selected from a graph via a graph labeling procedure; for each node in the sequence, a local neighborhood graph is assembled and normalized; the normalized neighborhoods are used as receptive fields and combined with existing CNN components.]
[Kipf and Welling, ICLR 2017]
Idea: A node's neighborhood defines a computation graph:
§ Determine the node's computation graph
§ Propagate and transform information

Learn how to propagate information across the graph to compute node features.
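A small sketch of the "determine node computation graph" step, assuming an adjacency-list representation (function and variable names are mine): it simply collects which nodes' messages reach the target after k propagation steps.

```python
def computation_graph_layers(adj, target, depth=2):
    """For k = 0..depth, return the set of nodes whose messages reach `target`
    after k more propagation steps (a BFS-style expansion of the neighborhood)."""
    layers = [{target}]
    for _ in range(depth):
        frontier = set()
        for node in layers[-1]:
            frontier.update(adj[node])       # every neighbor sends a message inward
        layers.append(frontier)
    return layers

# Adjacency lists for the 5-node example graph used earlier (0..4 correspond to A..E).
adj = {0: [1, 2, 3], 1: [0, 3, 4], 2: [0, 3], 3: [0, 1, 2, 4], 4: [1, 3]}
print(computation_graph_layers(adj, target=0, depth=2))
```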
Key idea: Generate node embeddings based on local network neighborhoods.

[Figure: a target node A in the input graph and the tree of neighbors that feed into its embedding]
Intuition: Nodes aggregate information from their neighbors using neural networks.

[Figure: the same target node A; each aggregation step in its neighborhood tree is a small neural network]
Intuition: The network neighborhood defines a computation graph. Every node defines its own computation graph based on its neighborhood!
Model can be of arbitrary depth:
§ Nodes have embeddings at each layer
§ The layer-0 embedding of node $u$ is its input feature, i.e., $x_u$

[Figure: target node A's computation graph unrolled into Layer-0 (input features $x_A, x_B, x_C, x_E, x_F$), Layer-1, and Layer-2]
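A sketch of this layering, assuming some aggregation function is supplied (what goes inside that aggregation step is exactly the question raised next): layer-0 embeddings are the raw features, and each layer recomputes every node's embedding from its neighbors' previous-layer embeddings.

```python
def embed(adj, X, num_layers, aggregate):
    """adj: dict node -> list of neighbors; X: dict node -> feature vector.
    aggregate(h_self, neighbor_hs) is the per-layer update ('the box')."""
    H = {v: X[v] for v in adj}                        # layer-0: h_v^0 = x_v
    for _ in range(num_layers):                       # layer k uses layer k-1 embeddings
        H = {v: aggregate(H[v], [H[u] for u in adj[v]]) for v in adj}
    return H                                          # final-layer node embeddings
```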
Neighborhood aggregation: Key distinctions are in how different approaches aggregate information across the layers.

[Figure: target node A's computation graph with each aggregation step drawn as a box marked "?": what is in the box?]
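As a hedged preview of one possible answer (my assumption of a simple choice, not the lecture's definition yet): average the neighbors' messages, apply a linear transform plus a self term, and a nonlinearity. This plugs into the `embed` sketch above as the `aggregate` argument.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4
W = rng.normal(size=(d, d))      # transform for the averaged neighbor embeddings
B = rng.normal(size=(d, d))      # transform for the node's own previous embedding

def mean_aggregate(h_self, neighbor_hs):
    """One candidate 'box': mean of neighbor embeddings, linear transforms, ReLU."""
    neighbor_avg = np.mean(neighbor_hs, axis=0)
    return np.maximum(W @ neighbor_avg + B @ h_self, 0)
```

For example, `embed(adj, X, num_layers=2, aggregate=mean_aggregate)` with d-dimensional feature vectors would produce 2-layer embeddings for every node.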