
09 node2vec (graph representation learning)


CS224W: Analysis of Networks
Jure Leskovec, Stanford University




[Figure: a network with several unlabeled nodes ("?") fed into Machine Learning for node classification]

10/23/18, Jure Leskovec, Stanford CS224W: Analysis of Networks



• The (supervised) Machine Learning lifecycle requires feature engineering every single time!

  Raw Data → Structured Data → Feature Engineering → Learning Algorithm → Model → Downstream task

• Instead of hand-engineering features, we want to automatically learn the features.


Goal: Efficient task-independent feature learning for machine learning in networks!

[Figure: a node u in the network is mapped to a vector in ℝᵈ]

f : u → ℝᵈ   (the feature representation, or embedding, of node u)



What is network embedding?

• Task: We map each node in a network into a low-dimensional space.
  – Distributed representation for nodes
  – Similarity of embeddings between nodes indicates their network similarity (e.g., link strength)
  – Encode network information and generate node representations


Example

• 2D embedding of the nodes of Zachary's Karate Club network.

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.


• Modern deep learning toolbox is designed for simple sequences or grids:
  – CNNs for fixed-size images/grids
  – RNNs or word2vec for text/sequences


• But networks are far more complex!
  – Complex topological structure (i.e., no spatial locality like grids)
  – No fixed node ordering or reference point (i.e., the isomorphism problem)
  – Often dynamic and have multimodal features



• Assume we have a graph G:
  – V is the vertex set.
  – A is the adjacency matrix (assume binary).
  – No node features or extra information is used!
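As a concrete sketch of this setup (the graph and sizes below are made up for illustration), the binary adjacency matrix A can be built from an edge list:

```python
import numpy as np

# Hypothetical toy graph: 4 nodes, undirected edges
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
n = 4

# Binary adjacency matrix A; symmetric because the graph is undirected
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = A[j, i] = 1
```

Only A (and the node set it indexes) is available to the methods below; there are no node features.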


• Goal is to encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.


Goal: similarity(u, v) ≈ z_v^T z_u

where similarity(u, v) is the similarity in the original network (which we still need to define!) and z_v^T z_u is the similarity of the embeddings.


1. Define an encoder (i.e., a mapping from nodes to embeddings).
2. Define a node similarity function (i.e., a measure of similarity in the original network).
3. Optimize the parameters of the encoder so that:

   similarity(u, v) ≈ z_v^T z_u

   where the left-hand side is the similarity in the original network and the right-hand side is the similarity of the embeddings.


• Encoder maps each node to a low-dimensional vector:

  enc(v) = z_v   (the d-dimensional embedding of node v in the input graph)

• Similarity function specifies how relationships in vector space map to relationships in the original network:

  similarity(u, v) ≈ z_v^T z_u

  i.e., the similarity of u and v in the original network should match the dot product between their node embeddings.
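A minimal sketch of the similarity score, using random stand-in values (z_u, z_v, and d below are illustrative, not learned embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8                      # embedding dimension (illustrative choice)

# Hypothetical embeddings z_u and z_v for two nodes u and v
z_u = rng.normal(size=d)
z_v = rng.normal(size=d)

# Similarity in the embedding space: the dot product z_v^T z_u
score = z_v @ z_u

# Training would push this score toward similarity(u, v)
# as measured in the original network.
```

Note the dot product is symmetric, so this particular similarity cannot distinguish the direction of a relationship.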


• Simplest encoding approach: the encoder is just an embedding-lookup

  enc(v) = Z v

  Z ∈ ℝ^{d×|V|} … matrix with one node embedding per column [what we learn!]
  v ∈ 𝕀^{|V|} … indicator vector, all zeroes except a one in the column indicating node v
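The embedding-lookup can be sketched directly, assuming hypothetical sizes d and |V| and a randomly initialized Z:

```python
import numpy as np

rng = np.random.default_rng(42)
d, num_nodes = 4, 6        # illustrative sizes for d and |V|

# Z in R^{d x |V|}: one embedding per column -- the parameters we learn
Z = rng.normal(size=(d, num_nodes))

def encode(v: int) -> np.ndarray:
    """Shallow encoder enc(v) = Z v, where v is an indicator vector."""
    indicator = np.zeros(num_nodes)
    indicator[v] = 1.0     # one in the column indicating node v
    return Z @ indicator

# The matrix-vector product is just a column lookup:
assert np.allclose(encode(3), Z[:, 3])
```

In practice libraries implement this as a direct column (or row) lookup rather than a matrix product.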


• Simplest encoding approach: the encoder is just an embedding-lookup

[Figure: the embedding matrix Z, with one column per node; each column is the embedding vector for a specific node, and the number of rows is the dimension/size of the embeddings]


Simplest encoding approach: the encoder is just an embedding-lookup.
Each node is assigned a unique embedding vector.
Many methods: node2vec, DeepWalk, LINE.


• The key choice that distinguishes methods is how they define node similarity.

• E.g., should two nodes have similar embeddings if they…
  – are connected?
  – share neighbors?
  – have similar "structural roles"?
  – …?


Material based on:
• Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.


z_u^T z_v ≈ probability that u and v co-occur on a random walk over the network


1. Estimate the probability of visiting node v on a random walk starting from node u, using some random-walk strategy R.

2. Optimize embeddings to encode these random-walk statistics: similarity (here the dot product, which equals cos(θ) for unit-norm embeddings) encodes random-walk "similarity".
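Step 1 can be sketched with the simplest strategy, an unbiased walk over a made-up adjacency list; node2vec's actual strategy R additionally biases these neighbor choices with its return and in-out parameters p and q:

```python
import random

# Hypothetical toy graph as an adjacency list
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(start: int, length: int, rng: random.Random) -> list:
    """Unbiased walk: step to a uniformly random neighbor each time."""
    walk = [start]
    for _ in range(length - 1):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

walk = random_walk(0, 5, random.Random(0))
```

Co-occurrence counts over many such walks estimate the visit probabilities that the embeddings are then trained to encode.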


1. Expressivity: flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information.

2. Efficiency: no need to consider all node pairs when training; only pairs that co-occur on random walks.


• Intuition: Find an embedding of nodes into d dimensions that preserves similarity.

• Idea: Learn node embeddings such that nodes that are nearby in the network are close together in the embedding space.

• Given a node u, how do we define nearby nodes?
  – N_R(u) … neighbourhood of u obtained by some strategy R


• Given G = (V, E), our goal is to learn a mapping z : u → ℝᵈ.

• Log-likelihood objective:

  max_z Σ_{u∈V} log P(N_R(u) | z_u)

  – where N_R(u) is the neighborhood of node u

• Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).
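A sketch of evaluating this objective for one node, assuming the softmax parameterization of P(v | z_u) used by DeepWalk/node2vec and conditional independence of neighbors (the embeddings and neighborhood below are made up):

```python
import numpy as np

rng = np.random.default_rng(1)
d, num_nodes = 8, 10
Z = rng.normal(size=(d, num_nodes))   # hypothetical embeddings, one column per node

def log_likelihood(u: int, neighborhood: list) -> float:
    """log P(N_R(u) | z_u) = sum over v in N_R(u) of log softmax(z_v^T z_u)."""
    scores = Z.T @ Z[:, u]                             # z_v^T z_u for every node v
    log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax over all nodes
    return float(sum(log_probs[v] for v in neighborhood))

ll = log_likelihood(0, [1, 2, 3])
```

The normalization over all |V| nodes is what makes this expensive; negative sampling (as in word2vec) approximates it cheaply.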

