CS224W: Analysis of Networks
Jure Leskovec, Stanford University
(Figure: a network with some nodes unlabeled, marked “?” — Machine Learning is used for node classification.)

(Figure: a feature vector x is the input to Machine Learning.)

(Supervised) Machine Learning Lifecycle: requires feature engineering every single time!

  Raw Data → Structured Data → Learning Algorithm → Model → Downstream task

Feature engineering produces the structured data; the goal here is to automatically learn the features instead.

Goal: Efficient task-independent feature learning for machine learning in networks!

  f: u → ℝ^d

Map each node u to a d-dimensional vector in ℝ^d — its feature representation, or embedding.

What is network embedding?
• Task: map each node in a network into a low-dimensional space
  – Distributed representation for nodes
  – Similarity of embeddings between nodes indicates their network similarity
  – Encode network information and generate node representation

Example: 2D embedding of the nodes of Zachary’s Karate Club network.

Image from: Perozzi et al. DeepWalk: Online Learning of Social Representations. KDD 2014.

• The modern deep learning toolbox is designed for simple sequences or grids:
  – CNNs for fixed-size images/grids
  – RNNs or word2vec for text/sequences

• But networks are far more complex!
  – Complex topological structure (i.e., no spatial locality like grids)
  – No fixed node ordering or reference point (i.e., the isomorphism problem)
  – Often dynamic and with multimodal features

• Assume we have a graph G:
  – V is the vertex set.
  – A is the adjacency matrix (assume binary).
  – No node features or extra information is used!

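To make this setup concrete, here is a minimal sketch (a hypothetical 4-node toy graph built with numpy; not from the slides) of a binary adjacency matrix:

```python
import numpy as np

# Hypothetical toy graph: V = {0, 1, 2, 3}, undirected edges (0,1), (1,2), (2,3), (0,2).
# A is the binary adjacency matrix: A[i, j] = 1 iff there is an edge between i and j.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[i, j] = A[j, i] = 1  # undirected graph, so A is symmetric

print(A)
```
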
• Goal: encode nodes so that similarity in the embedding space (e.g., dot product) approximates similarity in the original network.

Goal: similarity(u, v) ≈ z_v^⊤ z_u

where similarity(u, v) (which we need to define!) measures similarity in the original network, and z_v^⊤ z_u measures similarity of the embeddings.

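For example, with hypothetical 2-dimensional embeddings z_u = (0.9, 0.1) and z_v = (0.8, 0.2), the embedding similarity would be z_v^⊤ z_u = 0.9·0.8 + 0.1·0.2 = 0.74.
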
1. Define an encoder (i.e., a mapping from nodes to embeddings).
2. Define a node similarity function (i.e., a measure of similarity in the original network).
3. Optimize the parameters of the encoder so that:

  similarity(u, v) ≈ z_v^⊤ z_u

  (similarity in the original network ≈ similarity of the embeddings)

• Encoder: maps each node to a low-dimensional vector:

  ENC(v) = z_v

  where v is a node in the input graph and z_v is its d-dimensional embedding.

• Similarity function: specifies how relationships in vector space map to relationships in the original network:

  similarity(u, v) ≈ z_v^⊤ z_u

  (similarity of u and v in the original network ≈ dot product between node embeddings)

• Simplest encoding approach: encoder is just an embedding-lookup

  ENC(v) = Z v

  where Z ∈ ℝ^{d×|V|} is the matrix whose columns are the node embeddings (this is what we learn!), and v ∈ 𝕀^{|V|} is an indicator vector, all zeroes except a one in the position indicating node v.

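As a minimal sketch (numpy, with hypothetical sizes d = 8 and |V| = 100; names like enc are illustrative), the lookup is just a matrix-vector product with an indicator vector, which selects one column of Z:

```python
import numpy as np

d, num_nodes = 8, 100             # hypothetical embedding size d and number of nodes |V|
Z = np.random.rand(d, num_nodes)  # embedding matrix, one column per node (learned in practice)

def enc(v: int) -> np.ndarray:
    """ENC(v) = Z @ indicator(v), i.e., column v of Z."""
    indicator = np.zeros(num_nodes)  # all zeroes...
    indicator[v] = 1.0               # ...except a one in the position of node v
    return Z @ indicator

# The matrix-vector product is equivalent to a direct column lookup:
assert np.allclose(enc(3), Z[:, 3])
```
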
• In matrix form, Z is the embedding matrix: one column per node, each column being the embedding vector for a specific node; the number of rows is the dimension/size of the embeddings.

• Simplest encoding approach: encoder is just an embedding-lookup. Each node is assigned a unique embedding vector.
• Many methods: node2vec, DeepWalk, LINE.

• Key choice of methods is how they define node similarity.
• E.g., should two nodes have similar embeddings if they…
  – are connected?
  – share neighbors?
  – have similar “structural roles”?
  – …?

Material based on:
• Perozzi et al. 2014. DeepWalk: Online Learning of Social Representations. KDD.
• Grover et al. 2016. node2vec: Scalable Feature Learning for Networks. KDD.
z_u^⊤ z_v ≈ probability that u and v co-occur on a random walk over the network

1. Estimate the probability of visiting node v on a random walk starting from node u, using some random walk strategy R.
2. Optimize embeddings to encode these random walk statistics: similarity in the embedding space (here: the dot product, which for unit-norm embeddings equals cos(θ)) encodes random walk “similarity”.

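A minimal sketch of step 1, assuming the simplest strategy R — uniform fixed-length random walks — on the toy adjacency matrix from the earlier sketch (all names and parameter values here are illustrative):

```python
import numpy as np

# Toy adjacency matrix from the earlier sketch.
A = np.zeros((4, 4), dtype=int)
for i, j in [(0, 1), (1, 2), (2, 3), (0, 2)]:
    A[i, j] = A[j, i] = 1

def random_walk(A, start, walk_length, rng):
    """Uniform random walk: at each step, move to a uniformly random neighbor."""
    walk = [start]
    for _ in range(walk_length):
        neighbors = np.flatnonzero(A[walk[-1]])  # indices of adjacent nodes
        walk.append(int(rng.choice(neighbors)))
    return walk

# Monte Carlo estimate of the probability that v is visited on a walk from u.
rng = np.random.default_rng(0)
u, v, num_walks = 0, 3, 10_000
visits = sum(v in random_walk(A, u, walk_length=5, rng=rng) for _ in range(num_walks))
print(visits / num_walks)
```
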
1. Expressivity: flexible stochastic definition of node similarity that incorporates both local and higher-order neighborhood information.
2. Efficiency: no need to consider all node pairs when training; only pairs that co-occur on random walks.

• Intuition: find an embedding of nodes in d dimensions that preserves similarity.
• Idea: learn node embeddings such that nodes that are nearby in the network end up close together in the embedding space.
• Given a node u, how do we define nearby nodes?
  – N_R(u) … neighbourhood of u obtained by some strategy R

• Given G = (V, E), our goal is to learn a mapping z: u → ℝ^d.
• Log-likelihood objective:

  max_z Σ_{u ∈ V} log P(N_R(u) | z_u)

  – where N_R(u) is the neighborhood of node u obtained by strategy R
• Given node u, we want to learn feature representations that are predictive of the nodes in its neighborhood N_R(u).

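A minimal sketch of evaluating this objective, under the additional assumptions (standard for these methods, though not stated on this slide) that P(N_R(u) | z_u) factorizes over the neighbors and that each P(v | z_u) is a softmax over dot products; the embeddings and neighborhoods below are placeholders:

```python
import numpy as np

def log_likelihood(Z, neighborhoods):
    """Sum over u in V of log P(N_R(u) | z_u), assuming P factorizes over
    neighbors and P(v | z_u) is a softmax of z_u^T z_w over all nodes w."""
    total = 0.0
    for u, nbrs in neighborhoods.items():
        scores = Z[:, u] @ Z                               # z_u^T z_w for every node w
        log_probs = scores - np.log(np.exp(scores).sum())  # log-softmax over all nodes
        total += sum(log_probs[v] for v in nbrs)
    return total

# Placeholder example: random embeddings (d = 2, |V| = 4), neighborhoods from some strategy R.
Z = np.random.rand(2, 4)
N_R = {0: [1, 2], 1: [0, 2], 2: [1, 3], 3: [2]}
print(log_likelihood(Z, N_R))
```

Maximizing this quantity over Z (e.g., by stochastic gradient ascent) yields the node embeddings.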