CS224W: Analysis of Networks
Jure Leskovec, Stanford University
(Figure: a network with some nodes unlabeled, marked “?” — Machine Learning is used for node classification.)

(Figure: a feature vector x is the input to Machine Learning.)

(Supervised) Machine Learning Lifecycle: requires feature engineering every single time!

  Raw Data → Structured Data → Learning Algorithm → Model → Downstream task

Feature engineering produces the structured data; the goal here is to automatically learn the features instead.

Goal: Efficient task-independent feature learning for machine learning in networks!

  f: u → ℝ^d

Map each node u to a d-dimensional vector in ℝ^d — its feature representation, or embedding.

What is network embedding?
• Task: map each node in a network into a low-dimensional space
  – Distributed representation for nodes
  – Similarity of embeddings between nodes indicates their network similarity
  – Encode network information and generate node representation

Example: 2D embedding of the nodes of Zachary’s Karate Club network.

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

• The modern deep learning toolbox is designed for simple sequences or grids:
  – CNNs for fixed-size images/grids
  – RNNs or word2vec for text/sequences

• But networks are far more complex!
  – Complex topological structure (i.e., no spatial locality like grids)
  – No fixed node ordering or reference point (i.e., the isomorphism problem)
  – Often dynamic and with multimodal features

• Assume we have a graph G:
  – V is the vertex set.
  – A is the adjacency matrix (assume binary).
  – No node features or extra information is used!

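To make this setup concrete, here is a minimal sketch (a hypothetical 4-node toy graph built with numpy; not from the slides) of a binary adjacency matrix:

```python
import numpy as np

# Hypothetical toy graph: V = {0, 1, 2, 3}, undirected edges (0,1), (1,2), (2,3), (0,2).
# A is the binary adjacency matrix: A[i, j] = 1 iff there is an edge between i and j.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[i, j] = A[j, i] = 1  # undirected graph, so A is symmetric

print(A)
```
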
• Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

Goal: similarity(u, v) ≈ z_v^⊤ z_u

where similarity(u, v) (which we need to define!) measures similarity in the original network, and z_v^⊤ z_u measures similarity of the embeddings.

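For example, with hypothetical 2-dimensional embeddings z_u = (0.9, 0.1) and z_v = (0.8, 0.2), the embedding similarity would be z_v^⊤ z_u = 0.9·0.8 + 0.1·0.2 = 0.74.
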
1. Define an encoder (i.e., a mapping from nodes to embeddings).
2. Define a node similarity function (i.e., a measure of similarity in the original network).
3. Optimize the parameters of the encoder so that:

  similarity(u, v) ≈ z_v^⊤ z_u

  (similarity in the original network ≈ similarity of the embeddings)

• Encoder: maps each node to a low-dimensional vector:

  ENC(v) = z_v

  where v is a node in the input graph and z_v is its d-dimensional embedding.

• Similarity function: specifies how relationships in vector space map to relationships in the original network:

  similarity(u, v) ≈ z_v^⊤ z_u

  (similarity of u and v in the original network ≈ dot product between node embeddings)

• Simplest encoding approach: encoder is just an embedding-lookup

  ENC(v) = Z v

  where Z ∈ ℝ^{d×|V|} is the matrix whose columns are the node embeddings (this is what we learn!), and v ∈ 𝕀^{|V|} is an indicator vector, all zeroes except a one in the position indicating node v.

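As a minimal sketch (numpy, with hypothetical sizes d = 8 and |V| = 100; names like enc are illustrative), the lookup is just a matrix-vector product with an indicator vector, which selects one column of Z:

```python
import numpy as np

d, num_nodes = 8, 100             # hypothetical embedding size d and number of nodes |V|
Z = np.random.rand(d, num_nodes)  # embedding matrix, one column per node (learned in practice)

def enc(v: int) -> np.ndarray:
    """ENC(v) = Z @ indicator(v), i.e., column v of Z."""
    indicator = np.zeros(num_nodes)  # all zeroes...
    indicator[v] = 1.0               # ...except a one in the position of node v
    return Z @ indicator

# The matrix-vector product is equivalent to a direct column lookup:
assert np.allclose(enc(3), Z[:, 3])
```
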
• In matrix form, Z is the embedding matrix: one column per node, each column being the embedding vector for a specific node; the number of rows is the dimension/size of the embeddings.

• Simplest encoding approach: encoder is just an embedding-lookup. Each node is assigned a unique embedding vector.
• Many methods: node2vec, DeepWalk, LINE.

• Key choice of methods is how they define node similarity.
• E.g., should two nodes have similar embeddings if they…
  – are connected?
  – share neighbors?
  – have similar “structural roles”?
  – …?

Material based on:
• Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.
z_u^⊤ z_v ≈ probability that u and v co-occur on a random walk over the network

1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R.
2. Optimize embeddings to encode these random walk statistics: similarity in the embedding space (here: the dot product, which for unit-norm embeddings equals cos(θ)) encodes random walk “similarity”.

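A minimal sketch of step 1, assuming the simplest strategy R — uniform fixed-length random walks — on the toy adjacency matrix from the earlier sketch (all names and parameter values here are illustrative):

```python
import numpy as np

# Toy adjacency matrix from the earlier sketch.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[i, j] = A[j, i] = 1

def random_walk(A, start, walk_length, rng):
    """Uniform random walk: at each step, move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = np.flatnonzero(A[walk[-1]])  # indices of adjacent nodes
        walk.append(int(rng.choice(neighbors)))
    return walk

# Monte Carlo estimate of the probability that v is visited on a walk from u.
rng = np.random.default_rng(0)
u, v, num_walks = 0, 3, 10_000
visits = sum(v in random_walk(A, u, walk_length=5, rng=rng) for _ in range(num_walks))
print(visits / num_walks)
```
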
1. Expressivity: flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information.
2. Efficiency: no need to consider all node pairs when training; only pairs that co-occur on random walks.

• Intuition: find an embedding of nodes in d dimensions that preserves similarity.
• Idea: learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space.
• Given a node u, how do we define nearby nodes?
  – N_R(u) … neighbourhood of u obtained by some strategy R

• Given G = (V, E), our goal is to learn a mapping z: u → ℝ^d.
• Log-likelihood objective:

  max_z Σ_{u ∈ V} log P(N_R(u) | z_u)

  – where N_R(u) is the neighborhood of node u obtained by strategy R
• Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).

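A minimal sketch of evaluating this objective, under the additional assumptions (standard for these methods, though not stated on this slide) that P(N_R(u) | z_u) factorizes over the neighbors and that each P(v | z_u) is a softmax over dot products; the embeddings and neighborhoods below are placeholders:

```python
import numpy as np

def log_likelihood(Z, neighborhoods):
    """Sum over u in V of log P(N_R(u) | z_u), assuming P factorizes over
    neighbors and P(v | z_u) is a softmax of z_u^T z_w over all nodes w."""
    total = 0.0
    for u, nbrs in neighborhoods.items():
        scores = Z[:, u] @ Z                               # z_u^T z_w for every node w
        log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax over all nodes
        total += sum(log_probs[v] for v in nbrs)
    return total

# Placeholder example: random embeddings (d = 2, |V| = 4), neighborhoods from some strategy R.
Z = np.random.rand(2, 4)
N_R = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2]}
print(log_likelihood(Z, N_R))
```

Maximizing this quantity over Z (e.g., by stochastic gradient ascent) yields the node embeddings.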