12 influence maximization in networks


Announcements:
1) Project milestones due in one week
2) Honor code and code submission

CS224W: Analysis of Networks
Jure Leskovec, Stanford University


We are more influenced by our friends than strangers
§ 68% of consumers consult friends and family before purchasing home electronics
§ 50% do research online before purchasing electronics

11/1/18

Jure Leskovec, Stanford CS224W: Analysis of Networks,


¡ Identify influential customers
¡ Convince them to adopt the product: offer a discount or free samples
¡ These customers endorse the product among their friends

Kate Middleton effect
“The trend effect that Kate, Duchess of Cambridge has on others, from cosmetic surgery for brides, to sales of coral-colored jeans.”

¡ According to Newsweek, "The Kate Effect may be worth £1 billion to the UK fashion industry."
¡ Tony DiMasso, L. K. Bennett’s US president, stated in 2012, "...when she does wear something, it always seems to go on a waiting list."

¡ Influential persons often have many friends
¡ Kate is one of the people who have many friends in this social network
¡ Finding more Kates is not as easy as you might think!

¡ Given a directed graph and k > 0,
¡ Find k seeds (Kates) to maximize the number of influenced people (possibly in many steps)

¡ Linear Threshold Model
¡ Independent Cascade Model

¡ A node v has random threshold $\theta_v \sim U[0,1]$
¡ A node v is influenced by each neighbor w according to a weight $b_{v,w}$ such that
$\sum_{w \text{ neighbor of } v} b_{v,w} \le 1$
¡ A node v becomes active when at least a (weighted) $\theta_v$ fraction of its neighbors are active:
$\sum_{w \text{ active neighbor of } v} b_{v,w} \ge \theta_v$
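The activation rule above can be sketched in Python. This is a minimal illustration, not code from the lecture: the function name `linear_threshold`, the layout `weights[v][w]` for $b_{v,w}$, and the toy graph are assumptions made for the example. Thresholds are passed in explicitly so the run is reproducible; in the model they would be drawn from U[0,1].

```python
def linear_threshold(weights, thresholds, seeds):
    """Run the Linear Threshold process until no new node activates.

    weights[v][w] is b_{v,w}, the influence of neighbor w on node v
    (assumed to sum to at most 1 over the neighbors of v).
    thresholds[v] is theta_v. Returns the final set of active nodes.
    """
    active = set(seeds)
    changed = True
    while changed:
        changed = False
        for v, incoming in weights.items():
            if v in active:
                continue
            # Total weight of the currently active neighbors of v.
            total = sum(b for w, b in incoming.items() if w in active)
            if total >= thresholds[v]:
                active.add(v)
                changed = True
    return active


# Hypothetical toy graph: weights[v][w] = influence of w on v.
weights = {
    "a": {},
    "b": {"a": 0.6},
    "c": {"a": 0.3, "b": 0.3},
    "d": {"c": 0.2},
}
# Fixed thresholds for a reproducible run (the model draws them from U[0, 1]).
thresholds = {"a": 0.5, "b": 0.5, "c": 0.5, "d": 0.5}

final_active = linear_threshold(weights, thresholds, seeds={"a"})
print(sorted(final_active))  # ['a', 'b', 'c']
```

Here node d stays inactive because its only neighbor contributes weight 0.2, which is below its threshold of 0.5.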

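[Figure: Linear Threshold example. Legend: inactive node, active node, threshold, active neighbors. Nodes carry thresholds and edges carry weights; a node activates once the total weight of its active neighbors reaches its threshold, and the process stops ("Stop!") when no further node can activate.]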

¡ Independent Cascade Model
§ Directed finite graph $G = (V, E)$
§ Set $S$ starts out with new behavior
§ Say nodes with this behavior are “active”
§ Each edge $(v, w)$ has a probability $p_{vw}$
§ If node $v$ is active, it gets one chance to make $w$ active, with probability $p_{vw}$
§ Each edge fires at most once
¡ Does scheduling matter? No
§ If $u, v$ are both active at the same time, it doesn’t matter which tries to activate $w$ first
§ But the time moves in discrete steps

¡ Initially some nodes S are active
¡ Each edge $(u, v)$ has probability (weight) $p_{uv}$
[Figure: example directed graph on nodes a to i with edge probabilities between 0.2 and 0.4]
¡ When node u becomes active:
§ It activates each out-neighbor $v$ with prob. $p_{uv}$
¡ Activations spread through the network
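One run of this process can be sketched as follows. The function name `independent_cascade`, the layout `prob[u][v]` for $p_{uv}$, and the toy graph are assumptions for illustration only, not the lecture's code.

```python
import random


def independent_cascade(prob, seeds, rng):
    """Simulate one run of the Independent Cascade model.

    prob[u][v] is p_uv for the directed edge (u, v).
    Returns the set of nodes active when the cascade stops.
    """
    active = set(seeds)
    frontier = list(seeds)  # nodes activated in the previous step
    while frontier:
        next_frontier = []
        for u in frontier:
            # u gets exactly one chance to activate each out-neighbor v.
            for v, p_uv in prob.get(u, {}).items():
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical toy graph with edge probabilities p_uv.
prob = {
    "a": {"b": 0.4, "c": 0.3},
    "b": {"d": 0.2},
    "c": {"d": 0.3, "e": 0.4},
    "d": {"e": 0.2},
}

rng = random.Random(0)  # fixed seed so the run is repeatable
cascade = independent_cascade(prob, seeds={"a"}, rng=rng)
print(sorted(cascade))
```

Because a node enters the frontier only once, each edge is tried at most once, which matches the rule that each edge fires at most once.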

Problem: (k is a user-specified parameter)
¡ Most influential set of size k: set S of k nodes producing largest expected cascade size f(S) if activated [Domingos-Richardson ‘01]
¡ Optimization problem: $\max_{S \text{ of size } k} f(S)$
[Figure: example graph on nodes a to i with edge probabilities, highlighting the influence set $X_a$ of node a and the influence set $X_d$ of node d]
Why “expected cascade size”? $X_a$ is a result of a random process. So in practice we would want to compute $X_a$ for many random realizations and then maximize the “average” value f(S):
$f(S) = \frac{1}{|I|} \sum_{i \in I} f_i(S)$, where $I$ is the set of random realizations $i$.
For now let’s ignore this nuisance and simply assume that each node u influences a set of nodes $X_u$.
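The averaging over random realizations can be sketched as a Monte Carlo estimate. The names `cascade_size` and `estimate_f`, the default of 1000 realizations, and the toy graph are assumptions for this sketch, which uses the Independent Cascade model as the random process.

```python
import random


def cascade_size(prob, seeds, rng):
    """Size of one random Independent Cascade realization started from seeds."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v, p_uv in prob.get(u, {}).items():
                if v not in active and rng.random() < p_uv:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)


def estimate_f(prob, seeds, num_realizations=1000, seed=0):
    """Average cascade size over random realizations: (1/|I|) * sum_i f_i(S)."""
    rng = random.Random(seed)
    total = sum(cascade_size(prob, seeds, rng) for _ in range(num_realizations))
    return total / num_realizations


# Hypothetical toy graph with edge probabilities p_uv.
prob = {"a": {"b": 0.4, "c": 0.3}, "b": {"d": 0.2}, "c": {"d": 0.3}}
print(estimate_f(prob, {"a"}))
```

More realizations give a less noisy estimate at a proportionally higher cost.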

¡ S: is initial active set
¡ f(S): The expected size of final active set
§ f(S) is the size of the union of $X_u$: $f(S) = |\cup_{u \in S} X_u|$
[Figure: graph G with nodes a, b, c, d, each drawn with its influence set $X_u$]
¡ Set S is more influential if f(S) is larger
$f(\{a, b\}) < f(\{a, c\}) < f(\{a, d\})$
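With fixed influence sets, f(S) is just the size of a set union. A minimal sketch, in which the influence sets `X` are made-up values chosen so that the ordering above holds:

```python
def f(S, X):
    """f(S) = |union of X_u for u in S|, with X[u] the influence set of u."""
    covered = set()
    for u in S:
        covered |= X[u]
    return len(covered)


# Hypothetical influence sets; each node is taken to influence itself.
X = {
    "a": {"a", "1", "2"},
    "b": {"b", "2"},
    "c": {"c", "2", "3", "4"},
    "d": {"d", "5", "6", "7", "8"},
}

print(f({"a", "b"}, X), f({"a", "c"}, X), f({"a", "d"}, X))  # 4 6 8
```

The pair {a, b} scores lowest because b's influence set overlaps with a's, so the overlap is counted only once.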

¡ Problem: Most influential set of k nodes: set S of k nodes producing largest expected cascade size f(S) if activated
¡ The optimization problem: $\max_{S \text{ of size } k} f(S)$
¡ How hard is this problem?
§ NP-COMPLETE!
§ Show that finding the most influential set is at least as hard as a set cover problem

NP: Decision problems with the property: If the answer is YES, then this fact can be checked in polynomial time.
§ NP is the set of decision problems whose solutions can be determined by a Non-deterministic Turing machine in Polynomial time

NP-COMPLETE:
§ NP-Complete is a complexity class which represents the set of all problems X in NP for which it is possible to reduce any other NP problem Y to X in polynomial time
§ E.g., every NP problem can be reduced to 3-SAT

Problem        Verifiable in P time                  Solvable in P time
P              Yes                                   Yes
NP             Yes                                   Unknown (Yes, if in P)
NP-COMPLETE    Yes                                   Unknown
NP-HARD        Not necessarily (Yes, if in NP-C)     Unknown

¡ Set cover problem (a known NP-complete problem):
§ Given universe of elements $U = \{u_1, \dots, u_n\}$ and sets $X_1, \dots, X_m \subseteq U$
[Figure: universe U drawn as a region covered by overlapping sets X1, X2, X3, X4]
§ Q: Are there k sets among X1,…, Xm such that their union is U?
¡ Goal: Encode set cover as an instance of $\max_{S \text{ of size } k} f(S)$

¡ Given a set cover instance with sets X1,…, Xm
¡ Construction: Build a bipartite “X-to-U” graph:
• Create edge $(X_i, u)$ $\forall X_i$, $\forall u \in X_i$: a directed edge from sets to their elements
• Put weight 1 on each edge (the activation is deterministic)
[Figure: bipartite graph with set nodes X1, X2, X3, …, Xm on the left and element nodes u1, u2, u3, …, un on the right, all edges with weight 1; e.g.: X1 = {u1, u2, u3}]
¡ Set Cover as Influence Maximization in X-to-U graph: There exists a set S of size k with f(S)=k+n iff there exists a size k set cover

Note: The optimal solution can always be taken to be a set of nodes Xi (there is no need to pick element nodes “u” as seeds)

This problem is hard in general, but there could be special cases that are easier.
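The construction can be checked on a small instance. In this sketch the helper names, the set cover instance, and the brute-force search over seed sets are all assumptions for illustration; the search is exponential in general and only meant to make the "f(S) = k + n iff a size-k cover exists" statement concrete.

```python
from itertools import combinations


def build_x_to_u_graph(sets):
    """Directed bipartite graph: edge (X_i, u) with weight 1 for every u in X_i."""
    return {name: set(elements) for name, elements in sets.items()}


def f(S, graph):
    """Deterministic cascade size: the seeds plus every element they point to."""
    reached = set(S)
    for x in S:
        reached |= graph[x]
    return len(reached)


def has_seed_set_reaching_all(graph, universe, k):
    """Brute force: is there a seed set S of k set-nodes with f(S) = k + n?"""
    n = len(universe)
    return any(f(S, graph) == k + n for S in combinations(graph, k))


# Hypothetical set cover instance.
universe = {"u1", "u2", "u3", "u4"}
sets = {
    "X1": {"u1", "u2", "u3"},
    "X2": {"u2", "u4"},
    "X3": {"u3"},
}
graph = build_x_to_u_graph(sets)

print(has_seed_set_reaching_all(graph, universe, 1))  # False
print(has_seed_set_reaching_all(graph, universe, 2))  # True
```

No single set covers all four elements, but X1 and X2 together do, so the seed set {X1, X2} reaches 2 + 4 = 6 nodes.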

¡ Extremely bad news:
§ Influence maximization is NP-complete
¡ Next, good news:
§ There exists an approximation algorithm!
§ For some inputs the algorithm won’t find the globally optimal solution/set OPT
§ But we will also prove that the algorithm will never do too badly either. More precisely, the algorithm will find a set S where $f(S) \ge 0.63 \cdot f(OPT)$, where OPT is the globally optimal set.

¡ Consider a Greedy Hill Climbing algorithm to find S:
§ Input: Influence set $X_u$ of each node $u$: $X_u = \{v_1, v_2, \dots\}$
§ That is, if we activate $u$, nodes $\{v_1, v_2, \dots\}$ will eventually get active
§ Algorithm: At each iteration $i$ activate the node $u$ that gives largest marginal gain: $\max_u f(S_{i-1} \cup \{u\})$
$S_0$ … Initially active set
$f(S_i)$ … Size of the union of $X_u$, $u \in S_i$

Algorithm:
¡ Start with $S_0 = \{\}$
¡ For $i = 1 \dots k$
§ Activate node $u$ that maximizes $f(S_{i-1} \cup \{u\})$
§ Let $S_i = S_{i-1} \cup \{u\}$
¡ Example:
§ Eval. $f(\{a\}), \dots, f(\{e\})$, pick argmax of them
§ Eval. $f(\{d, a\}), \dots, f(\{d, e\})$, pick argmax
§ Eval. $f(\{d, b, a\}), \dots, f(\{d, b, e\})$, pick argmax
[Figure: nodes a, b, c, d, e with their influence sets, next to a bar chart of $f(S_{i-1} \cup \{u\})$ for each candidate u]
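The loop above can be sketched in a few lines of Python. The influence sets in `X` are hypothetical and do not reproduce the lecture's figure; the tie-breaking rule (first candidate in dictionary order) is also an arbitrary choice made for this sketch.

```python
def f(S, X):
    """Size of the union of the influence sets X[u] for u in S."""
    covered = set()
    for u in S:
        covered |= X[u]
    return len(covered)


def greedy_hill_climbing(X, k):
    """Pick k seeds, each time adding the node u that maximizes f(S ∪ {u})."""
    S = []  # S_0 = {} (kept as a list to record the order of selection)
    for _ in range(k):
        candidates = [u for u in X if u not in S]
        if not candidates:
            break
        # Ties are broken by dictionary order, an arbitrary choice.
        best = max(candidates, key=lambda u: f(S + [u], X))
        S.append(best)
    return S


# Hypothetical influence sets for nodes a..e.
X = {
    "a": {1, 2, 3},
    "b": {3, 4, 5, 6},
    "c": {1, 2},
    "d": {5, 6, 7, 8, 9},
    "e": {9, 10, 11},
}

seeds = greedy_hill_climbing(X, k=3)
print(seeds, f(seeds, X))  # ['d', 'a', 'e'] 10
```

Each iteration evaluates f once per remaining candidate, so selecting k seeds from n nodes costs on the order of n·k evaluations of f.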

¡ Claim: Hill climbing produces a solution S where: $f(S) \ge (1 - 1/e) \cdot f(OPT)$, i.e. $f(S) \ge 0.63 \cdot f(OPT)$
[Nemhauser, Fisher, Wolsey ’78, Kempe, Kleinberg, Tardos ‘03]
¡ Claim holds for functions f(·) with 2 properties:
§ f is monotone: (activating more nodes doesn’t hurt) if $S \subseteq T$ then $f(S) \le f(T)$, and $f(\{\}) = 0$
§ f is submodular: (activating each additional node helps less) adding an element to a set gives no more improvement than adding it to one of its subsets: $\forall S \subseteq T$
$f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$
The left side is the gain of adding a node to a small set; the right side is the gain of adding a node to a large set.
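For a small instance, both properties can be verified by brute force. This sketch assumes f is the union-size function over fixed influence sets, with hypothetical sets `X`; it enumerates every pair S ⊆ T, so it is only practical for a handful of nodes.

```python
from itertools import combinations


def f(S, X):
    """Coverage function: size of the union of X[u] for u in S."""
    covered = set()
    for u in S:
        covered |= X[u]
    return len(covered)


def subsets(nodes):
    """All subsets of nodes, as frozensets."""
    nodes = list(nodes)
    return [frozenset(c) for r in range(len(nodes) + 1)
            for c in combinations(nodes, r)]


def is_monotone_and_submodular(X):
    """Brute-force check of both properties over every pair S ⊆ T."""
    all_sets = subsets(X)
    for T in all_sets:
        for S in all_sets:
            if not S <= T:
                continue
            if f(S, X) > f(T, X):  # monotone requires f(S) <= f(T)
                return False
            for u in X:
                gain_small = f(S | {u}, X) - f(S, X)
                gain_large = f(T | {u}, X) - f(T, X)
                if gain_small < gain_large:  # submodular requires the opposite
                    return False
    return True


# Hypothetical influence sets.
X = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5, 6}, "d": {1, 6}}
print(is_monotone_and_submodular(X))  # True
```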

Diminishing returns:
¡ $\forall S \subseteq T$: $f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T)$
The left side is the gain of adding a node to a small set; the right side is the gain of adding a node to a large set.
[Figure: plot of f(·) against set size |S|, |T|, marking f(S), f(S ∪ {u}), f(T) and f(T ∪ {u})]
Adding u to T helps less than adding it to S!
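As a small worked instance of the inequality (the influence sets here are made up for illustration and do not come from the lecture), take $X_a = \{1, 2, 3\}$, $X_b = \{3, 4\}$, $S = \{\}$, $T = \{a\}$ and $u = b$, with $f$ the size of the union of influence sets:

```latex
\begin{align*}
f(S \cup \{b\}) - f(S) &= |\{3, 4\}| - 0 = 2 \\
f(T \cup \{b\}) - f(T) &= |\{1, 2, 3, 4\}| - |\{1, 2, 3\}| = 4 - 3 = 1
\end{align*}
```

Adding b to the smaller set S gains 2, while adding it to the larger set T gains only 1, because element 3 is already covered by a.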

Also see the handout posted on the course website.
