infocom16 slide viral marketing


IEEE International Conference on Computer Communications
10-15 April 2016 San Francisco, CA, USA

Targeted Viral Marketing in
Billion-scale Networks
Hung T. Nguyen¹, My T. Thai² and Thang N. Dinh¹
¹ CS Dept., Virginia Commonwealth University, Richmond, VA 23284
² CISE Dept., University of Florida, Gainesville, FL 32611

Thang N. Dinh


I. Introduction: Viral Marketing
• Marketing via the “word-of-mouth” effect

• Influence Maximization: find a small set of users (seeds) that influences most of the network.

Intro.: Viral Marketing Examples
• ALS Ice Bucket Challenge
o 2.4 M videos uploaded on Facebook
o $98.2 M donated to the ALS Association

• Toys“R”Us #PlayItForward
o $35.5 donation

• Always #LikeAGirl (YouTube)
o ~60 million views

Intro.: Targeted Viral Marketing
What’s wrong with choosing Mr. President to advertise shampoo?


Intro.: Targeted Viral Marketing
• Targeted marketing: focus on customers with certain traits, e.g.:
o Age: 18-30, Like: Music
o Tech hobbyists, Age: 25-50

• Targeted viral marketing: seeding strategies to influence customers with certain traits.


Targeted Viral Marketing Problem
• Real-world data: social networks such as Twitter, Stack Exchange, etc.
o User relationships: who follows whom?
o User attributes: geo-location, etc.
o User-generated content: tweets, posts, etc.

• Targeted viral marketing:
o A company has a budget B to incentivize users
o It hopes to trigger a large cascade of adoptions
o Whom to target for “3d printing”, “android”, etc.?


Targeted Viral Marketing (TVM)
• Input: a graph 𝐺 = (𝑉, 𝐸, 𝑤), a budget B, and a propagation model.
• Each node 𝑢 has a cost 𝑐(𝑢) and a relevance score 𝑏(𝑢).
• Output: a seed set of total cost at most B that maximizes the expected total relevance of the influenced users (influence spread).
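Written as an optimization problem, TVM reads as follows. The notation 𝔹(S) for the expected benefit and I(S) for the random set of users influenced by seed set S is assumed here for illustration; c(u), b(u) and B are as defined on this slide.

```latex
\max_{S \subseteq V} \; \mathbb{B}(S) = \mathbb{E}\Big[\sum_{v \in I(S)} b(v)\Big]
\quad \text{subject to} \quad c(S) = \sum_{u \in S} c(u) \le B
```

With b(v) = 1 and c(u) = 1 for every node and B = k, this reduces to classic influence maximization with k seeds.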


Related Work: Influence Maximization
All methods give a $(1 - 1/e - \epsilon)$-approximation with probability $1 - n^{-1}$.

| Method | Time complexity | Note |
|---|---|---|
| Greedy (KDD’03) | $O(kmn\epsilon^{-3})$ | Original greedy |
| CELF (KDD’07) | $O(kmn\epsilon^{-3})$ | Lazy-forward; up to 700 times faster than Greedy |
| TIM/TIM+ (SIGMOD’14) | $O\big((m+n)(\ln\binom{n}{k} + \ln 2n)\,\epsilon^{-2}\big)$ | Up to 1000 times faster than CELF |
| IMM (SIGMOD’15) | $O\big((m+n)(\ln\binom{n}{k} + \ln 2n)\,\epsilon^{-2}\big)$ | Up to 100 times faster than TIM/TIM+ |
| SSA/D-SSA (to appear, ACM SIGMOD’16) | Near-linear time; sub-linear time for dense graphs | Up to 1000 times faster than IMM for InfMax; guarantees the minimum number of samples |


Related Work
• Nguyen et al. JSAC’13: budgeted influence maximization
o Not scalable; does not consider users’ relevance

• Topic-aware influence: no theoretical guarantees on solution quality (Barbieri et al. KAIS 2013, Barbieri et al. EDBT 2014, Chen et al. VLDB 2015)


Cascading Models
• Describe how cascades propagate through the network
• Popular models:
o Linear Threshold
o Independent Cascade (or Bayesian Network)
o SI, SIS, SIR, SIRS, SEIRS, …
o Load shedding, DC/AC power flow models


Independent Cascade Model

• When node v becomes active, it has a single chance of activating each currently inactive neighbor w.
• The activation attempt succeeds with probability p_vw.
[Figure: example network with nodes U, w, X and edge activation probabilities]
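To make the model concrete, here is a minimal Python sketch of one Independent Cascade run, plus a Monte Carlo estimate of the expected spread. The adjacency format (`graph` maps each node to a list of `(neighbor, p_vw)` out-edges) and the names `simulate_ic` and `estimate_spread` are illustrative assumptions, not code from the paper.

```python
import random
from collections import deque


def simulate_ic(graph, seeds, rng):
    """One Independent Cascade run.

    graph: dict node -> list of (neighbor w, p_vw) out-edges (assumed format).
    Returns the set of nodes that are active when the cascade stops.
    """
    active = set(seeds)
    frontier = deque(seeds)
    while frontier:
        v = frontier.popleft()  # v was just activated
        for w, p_vw in graph.get(v, []):
            # v gets exactly one attempt on each still-inactive neighbor w.
            if w not in active and rng.random() < p_vw:
                active.add(w)
                frontier.append(w)
    return active


def estimate_spread(graph, seeds, runs=10_000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(simulate_ic(graph, seeds, rng)) for _ in range(runs)) / runs
```

Take a hypothetical graph with edges a→b (0.6), a→c (0.3) and b→c (0.2), seeded at {a}. The exact expected spread is 1 + 0.6 + (1 − 0.7 × 0.88) = 1.984, and the estimate approaches it as `runs` grows. Note that each run costs time proportional to the edges it explores, and that the number of possible worlds is exponential in |E|. This is why the sampling-based methods later in the deck avoid naive simulation.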

Example
[Figure: step-by-step Independent Cascade run on the example network (nodes U, v, w, X); legend: inactive node, active node, newly active node, successful attempt, unsuccessful attempt]


Challenges: Targeted VM
• Exponential number of possible worlds
• Scale of the social networks
o Facebook: ≥ 1.5 billion nodes, 100 billion edges, 111 PB adjacency matrix, 2.92 TB adjacency list
o Twitter: ≥ 5 billion edges, 3 billion tweets/month

• Heterogeneous node costs and relevance (benefit)
• Difficult to estimate the number of samples needed


General Framework
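• Original problem: $\max_{S \in \Omega} f(S)$
• RIS sampling: a sample generator $\mathcal{T}$ of size $T = \theta(\epsilon, \delta)$ gives an estimate $\hat{f}_T(S) \approx f(S)$ w.h.p.
• Surrogate problem: $\max_{S \in \Omega} \hat{f}_T(S)$, solved by an $\alpha$-approximation algorithm $\mathcal{A}$ (greedy Max-Coverage, $\alpha = 1 - 1/e$), which returns $S_{\mathcal{A}} \in \Omega$ with $\hat{f}_T(S_{\mathcal{A}}) \ge \alpha \cdot OPT_{\hat{f}_T}$
• Bounding techniques: with probability $1 - \delta$, $S_{\mathcal{A}}$ is an $(\alpha - \epsilon)$-approximate solution, i.e., $f(S_{\mathcal{A}}) \ge (\alpha - \epsilon)\, OPT_f$
• Challenges:
o It is difficult to get an $(\alpha - \epsilon)\,OPT$ multiplicative error.
o How many samples are needed? $\theta(\epsilon, \delta) = ?$
o How can we achieve the minimum number of samples?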
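For influence maximization, the estimator $\hat{f}_T$ is usually the RIS coverage estimator of Borgs et al. Here is a minimal statement, assuming $T$ hyperedges $\mathcal{E}_1, \dots, \mathcal{E}_T$, each generated from a uniformly random root, and writing $\sigma(S)$ for the expected spread of seed set $S$:

```latex
\hat{f}_T(S) = \frac{n}{T} \sum_{j=1}^{T} \mathbf{1}\left[S \cap \mathcal{E}_j \neq \emptyset\right],
\qquad
\Pr\left[S \cap \mathcal{E}_j \neq \emptyset\right] = \frac{\sigma(S)}{n}
\;\Longrightarrow\;
\mathbb{E}\big[\hat{f}_T(S)\big] = \sigma(S).
```

The middle identity holds because $S$ hits $\mathcal{E}_j$ exactly when some seed reaches the random root in the sampled graph, that is, when the root is influenced by $S$. The estimate is therefore unbiased, and the open question is only how large $T$ must be.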

RIS Sampling (Borgs et al. ’14)
• Generate a hypergraph ℋ with hyperedges:
o Select a random node 𝑢 ∈ 𝑉 and a random graph sample 𝑔
o Hyperedge ℰ = {nodes that can reach 𝑢 in 𝑔}
  - Note: instead of generating 𝑔 explicitly, we can run a reverse BFS from 𝑢.

[Figure: three-node graph on a, b, c with edge probabilities 0.6, 0.2, 0.3]

Example (assuming the Independent Cascade model), with roots u = a, u = b, u = c:
ℰ1 = {𝑎, 𝑏}
ℰ2 = {𝑏, 𝑎, 𝑐}
ℰ3 = {𝑐, 𝑎}

ℋ = (𝑉, {ℰ1, ℰ2, ℰ3})
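The reverse-BFS shortcut can be sketched in Python as follows. The sketch assumes the IC model and the same hypothetical `graph` format (out-edge lists with probabilities), and the names `reverse_graph` and `random_hyperedge` are illustrative. Each in-edge is sampled lazily, only when the BFS reaches it, so the full random graph 𝑔 is never built.

```python
import random
from collections import deque


def reverse_graph(graph):
    """Turn out-edge lists {v: [(w, p_vw), ...]} into in-edge lists {w: [(v, p_vw), ...]}."""
    rev = {}
    for v, edges in graph.items():
        for w, p in edges:
            rev.setdefault(w, []).append((v, p))
    return rev


def random_hyperedge(nodes, rev, rng: random.Random):
    """One RIS sample under IC: pick a uniform random root u, then run a reverse
    BFS that keeps each in-edge (v -> x) independently with probability p_vx.
    The visited set is exactly the set of nodes that can reach u in the sample g."""
    u = rng.choice(nodes)  # nodes must be a sequence (e.g. a list)
    hyperedge = {u}
    queue = deque([u])
    while queue:
        x = queue.popleft()
        for v, p in rev.get(x, []):
            # Flip the coin for edge (v -> x) only if v is not already known to reach u.
            if v not in hyperedge and rng.random() < p:
                hyperedge.add(v)
                queue.append(v)
    return hyperedge
```

Skipping the coin flip when v is already in the hyperedge does not change the distribution, because that edge can no longer affect which nodes reach u.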


RIS Sampling (cont.)
[Figure: the same three-node example, with ℰ1 = {𝑎, 𝑏}, ℰ2 = {𝑏, 𝑎, 𝑐}, ℰ3 = {𝑐, 𝑎}]

• Observation:
o Influential nodes appear more often in the hyperedges
o An influential seed set is one that covers the most hyperedges

RIS framework (Borgs et al., Tang et al. 2014)
1. Generate multiple hyperedges
2. Find a seed set that covers the most hyperedges, using the greedy algorithm for Max-Coverage.
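Step 2 can be sketched as a plain greedy Max-Coverage in Python. This is a minimal illustration with hypothetical names, and it recomputes marginal gains naively rather than lazily. The spread estimate is then n times the fraction of covered hyperedges.

```python
def greedy_max_coverage(hyperedges, k):
    """Greedy Max-Coverage: k times, add the node that covers the most
    not-yet-covered hyperedges. Returns (seeds, number of covered hyperedges)."""
    # Inverted index: node -> ids of the hyperedges that contain it.
    covers = {}
    for j, h in enumerate(hyperedges):
        for v in h:
            covers.setdefault(v, set()).add(j)
    seeds, covered = [], set()
    for _ in range(k):
        best, best_gain = None, 0
        for v, ids in covers.items():
            gain = len(ids - covered)  # marginal coverage of v
            if gain > best_gain:
                best, best_gain = v, gain
        if best is None:  # every hyperedge is already covered
            break
        seeds.append(best)
        covered |= covers[best]
    return seeds, len(covered)


def ris_spread_estimate(n, num_covered, num_hyperedges):
    """Expected-spread estimate: n times the fraction of covered hyperedges."""
    return n * num_covered / num_hyperedges
```

On the three hyperedges of the example, `greedy_max_coverage(H, 1)` picks 𝑎, which covers all three, matching the observation that influential nodes appear most often. Each round scans the whole inverted index, so this naive version costs O(k · total hyperedge size); lazy evaluation in the style of CELF is the usual speed-up.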

Number of Samples (Threshold)
• Expected time complexity = #hyperedges [$m_{\mathcal{H}}$] × time to generate one hyperedge [EPT]
o The number of hyperedges decides the running time.

• A. How many hyperedges are sufficient? (Tang et al. ’14)
o $\theta \ge \frac{(8+\epsilon)\, n \left(\ln\binom{n}{k} + \ln 2n\right)}{OPT_k\, \epsilon^2}$ hyperedges give a $(1 - 1/e - \epsilon)$-approximation with probability $1 - n^{-1}$.
o $OPT_k$ is unknown in advance.

• B. Can we generate just a little more than θ hyperedges?
o TIM: lower-bound OPT by KPT ≤ OPT
o TIM+: use a tighter lower bound KPT+ ∈ [KPT, OPT]
• Highly sophisticated estimation
• No guarantees on the number of samples
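The threshold in part A is easy to evaluate once a value or lower bound for OPT_k is plugged in. Below is a small Python sketch with hypothetical function names; it computes ln C(n, k) through the log-gamma function so that it stays cheap even for billion-node graphs.

```python
import math


def log_binom(n, k):
    """ln C(n, k) via log-gamma, avoiding huge integers."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def tim_theta(n, k, eps, opt_k):
    """Hyperedge count from the bound of Tang et al. '14:
    theta = (8 + eps) * n * (ln C(n, k) + ln 2n) / (OPT_k * eps^2).
    OPT_k is unknown in practice; plugging in a lower bound such as KPT
    still satisfies the bound but yields a larger theta."""
    return math.ceil((8 + eps) * n * (log_binom(n, k) + math.log(2 * n)) / (opt_k * eps ** 2))
```

Because θ scales as 1/OPT_k, a lower bound that is half of OPT_k roughly doubles the number of hyperedges. This is why TIM+ tries to tighten KPT toward OPT.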

BCT Algorithm


BCT Algorithm
• Effective stopping conditions to generate “just enough” samples

• Importance sampling to guarantee an almost linear number of samples

• Provably bounded errors with high confidence
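The slides do not spell out BCT's procedure. As a rough illustration of how benefits and costs can enter RIS, the hypothetical sketch below has two parts.

- **Benefit-weighted roots.** It draws each root u with probability proportional to b(u). Then Γ × (fraction of covered hyperedges), with Γ = Σ_v b(v), estimates the expected benefit of a seed set.
- **Budgeted seed selection.** It picks seeds with a cost-aware greedy for budgeted Max-Coverage, keeping the better of the greedy set and the best single affordable node.

This is a generic budgeted-RIS sketch under the IC model, not the authors' BCT algorithm, and it omits BCT's stopping conditions and guarantees. All names (`weighted_hyperedge`, `budgeted_greedy`, `rev`, `benefit`, `cost`) are assumptions, and costs are assumed positive.

```python
import random
from collections import deque


def weighted_hyperedge(nodes, benefit, rev, rng: random.Random):
    """Benefit-weighted RIS sample (IC model assumed): root u is drawn with
    probability b(u) / sum_v b(v); then a reverse BFS over in-edges {x: [(v, p_vx)]}."""
    u = rng.choices(nodes, weights=[benefit[v] for v in nodes], k=1)[0]
    hyperedge, queue = {u}, deque([u])
    while queue:
        x = queue.popleft()
        for v, p in rev.get(x, []):
            if v not in hyperedge and rng.random() < p:
                hyperedge.add(v)
                queue.append(v)
    return hyperedge


def budgeted_greedy(hyperedges, cost, budget):
    """Cost-aware greedy for budgeted Max-Coverage (costs must be positive).
    Returns (seeds, number of covered hyperedges)."""
    covers = {}
    for j, h in enumerate(hyperedges):
        for v in h:
            covers.setdefault(v, set()).add(j)

    # Ratio greedy: add the affordable node with the most new coverage per unit cost.
    seeds, covered, spent = [], set(), 0.0
    while True:
        best, best_ratio = None, 0.0
        for v, ids in covers.items():
            if v in seeds or spent + cost[v] > budget:
                continue
            ratio = len(ids - covered) / cost[v]
            if ratio > best_ratio:
                best, best_ratio = v, ratio
        if best is None:
            break
        seeds.append(best)
        covered |= covers[best]
        spent += cost[best]

    # Compare with the best single affordable node: the usual guard against the
    # ratio greedy spending the budget on cheap, low-coverage nodes.
    singles = [v for v in covers if cost[v] <= budget]
    if singles:
        top = max(singles, key=lambda v: len(covers[v]))
        if len(covers[top]) > len(covered):
            return [top], len(covers[top])
    return seeds, len(covered)
```

Nodes with zero benefit are never drawn as roots, so sampling effort concentrates on the relevant users. That concentration is the intuition behind using importance sampling for targeted marketing.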


Provable Guarantees


Experiments
• Datasets


Results: Benefit comparison

BCT achieves the best benefit for the same budget!

Results: Quality & Running time


Results: Running time on Twitter


Seeding Quality
• Twitter: 40 million nodes, 1.5 billion edges, 106 million tweets

