13 outbreak detection in networks


CS224W: Analysis of Networks
Jure Leskovec, Stanford University

¡ (1) New problem: Outbreak detection
¡ (2) Develop an approximation algorithm
  § It is a submodular optimization problem!
¡ (3) Speed up greedy hill-climbing
  § Valid for optimizing general submodular functions (i.e., also works for influence maximization)
¡ (4) Prove a new "data-dependent" bound on the solution quality
  § Valid for optimizing any submodular function (i.e., also works for influence maximization)

11/7/18

Jure Leskovec, Stanford CS224W: Analysis of Networks,


¡ Given a real city water distribution network
¡ And data on how contaminants spread in the network
¡ Detect the contaminant as quickly as possible
¡ Problem posed by the US Environmental Protection Agency

[Figure: information cascades among users/blogs; posts are linked by time-ordered hyperlinks]

Which users/news sites should one follow to detect cascades as effectively as possible?

Want to read things before others do.

[Figure: two choices of sites to follow]
§ Detect blue & yellow stories soon but miss the red story.
§ Detect all stories but late.

¡ Both of these are instances of the same underlying problem!
¡ Given a dynamic process spreading over a network, we want to select a set of nodes to detect the process effectively
¡ Many other applications:
  § Epidemics
  § Influence propagation
  § Network security

¡ Utility of placing sensors:
  § Water flow dynamics, demands of households, …
¡ For each subset $S \subseteq V$ compute utility $f(S)$
  § $V$ is the set of all network junctions
  § A sensor reduces impact through early detection!

[Figure: two placements of sensors S1–S4 relative to high-, medium- and low-impact outbreaks. One placement has high sensing "quality" (e.g., f(S) = 0.9), the other low sensing "quality" (e.g., f(S) = 0.01).]

Given:
¡ Graph $G(V, E)$
¡ Data about how outbreaks spread over $G$:
  § For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$

Water distribution network (physical pipes and junctions):
§ Simulator of water consumption & flow (built by Mech. Eng. people)
§ We simulate the contamination spread for every possible location.

Given:
¡ Graph $G(V, E)$
¡ Data about how outbreaks spread over $G$:
  § For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$

The network of news media:
§ Trace the information flow and identify influence sets
§ Collect lots of articles and trace them to obtain data about information flow from a given news site.

[Figure: information cascades over news sites a, b, c]

Given:
¡ Graph $G(V, E)$
¡ Data on how outbreaks spread over $G$:
  § For each outbreak $i$ we know the time $T(u, i)$ when outbreak $i$ contaminates node $u$
¡ Goal: Select a subset of nodes $S$ that maximizes the expected reward:
$$\max_{S \subseteq V} f(S) = \sum_i P(i)\, f_i(S)$$
subject to: cost(S) < B

Each term $P(i)\, f_i(S)$ is the expected reward for detecting outbreak $i$.
$P(i)$ … probability of outbreak $i$ occurring.
$f_i(S)$ … reward for detecting outbreak $i$ using sensors $S$.
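The objective above can be sketched on a toy instance. Everything in this example is an assumption made for illustration: the detection times in `T`, the probabilities in `P`, and the choice of a simple "detected or not" reward for $f_i(S)$.

```python
import math

# Hypothetical toy data: T[i][u] is the time at which outbreak i reaches node u
# (math.inf means the outbreak never reaches that node).
T = {
    "i1": {"a": 1, "b": 4, "c": math.inf},
    "i2": {"a": math.inf, "b": 2, "c": 3},
}
P = {"i1": 0.75, "i2": 0.25}  # assumed outbreak probabilities

def reward(i, S):
    """Assumed per-outbreak reward f_i(S): 1 if some sensor in S ever sees outbreak i."""
    return 1.0 if any(T[i][u] < math.inf for u in S) else 0.0

def expected_reward(S):
    """f(S) = sum_i P(i) * f_i(S)."""
    return sum(P[i] * reward(i, S) for i in T)

f_a = expected_reward({"a"})        # only outbreak i1 reaches node a
f_ab = expected_reward({"a", "b"})  # node b also sees outbreak i2
```

With these numbers a sensor at `a` alone collects the probability mass of `i1`, and adding `b` picks up `i2` as well.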

¡ Reward (one of the following three):
  § (1) Minimize time to detection
  § (2) Maximize number of detected propagations
  § (3) Minimize number of infected people
¡ Cost (context dependent):
  § Reading big blogs is more time consuming
  § Placing a sensor in a remote location is expensive

[Figure: outbreak i spreading over a network with numbered nodes, with the resulting reward f(S). Monitoring the blue node saves more people than monitoring the green node.]
¡ Objective functions:
  § (1) Time to detection (DT)
    § How long does it take to detect a contamination?
    § Penalty for detecting at time $t$: $\pi_i(t) = t$
  § (2) Detection likelihood (DL)
    § How many contaminations do we detect?
    § Penalty for detecting at time $t$: $\pi_i(t) = 0$, $\pi_i(\infty) = 1$
    § Note: this is a binary outcome; we either detect or not
  § (3) Population affected (PA)
    § How many people drank contaminated water?
    § Penalty for detecting at time $t$: $\pi_i(t)$ = {# of infected nodes in outbreak $i$ by time $t$}
¡ Observation: In all cases detecting sooner does not hurt!
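The three penalties can be written as small functions of the detection time. This is a minimal sketch: the infection times are made up, and the finite `HORIZON` used to cap the time-to-detection penalty of an undetected outbreak is an assumption that the slides do not specify.

```python
import math

# Hypothetical outbreak: infection time of each node (assumed toy data).
infection_time = {"a": 0, "b": 2, "c": 5, "d": 9}
HORIZON = 10  # assumed cap so that "never detected" has a finite DT penalty

def penalty_dt(t):
    """Time to detection: pi_i(t) = t, capped at the horizon when undetected."""
    return min(t, HORIZON)

def penalty_dl(t):
    """Detection likelihood: 0 if detected at any finite time, 1 otherwise."""
    return 0 if t < math.inf else 1

def penalty_pa(t):
    """Population affected: number of nodes infected by time t."""
    return sum(1 for ti in infection_time.values() if ti <= t)
```

All three functions are non-decreasing in `t`, which is the "detecting sooner does not hurt" observation.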

We define $f_i(S)$ as penalty reduction:
$$f_i(S) = \pi_i(\emptyset) - \pi_i(T(S, i))$$
Here $T(S, i) = \min_{u \in S} T(u, i)$ is the time at which the earliest sensor in $S$ detects outbreak $i$, and $\pi_i(\emptyset) = \pi_i(\infty)$ is the penalty when no sensor detects it.

¡ Observation: Diminishing returns
  § Placement S = {x1, x2}: adding the new sensor s' helps a lot
  § Placement S' = {x1, x2, x3, x4}: adding s' helps very little
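A small numeric sketch of the diminishing-returns observation, using the penalty-reduction definition with a time-to-detection penalty. The detection times, the sensor name `s_new`, and the finite horizon of 10 are illustrative assumptions.

```python
import math

# Hypothetical detection times T(u, i) of one outbreak i at five candidate sensors.
T = {"x1": 6, "x2": 5, "x3": 3, "x4": 2, "s_new": 1}

def detection_time(S):
    """T(S, i): time of the earliest detection among sensors in S (inf if S is empty)."""
    return min((T[u] for u in S), default=math.inf)

def penalty(t, horizon=10):
    """Assumed time-to-detection penalty, capped at a finite horizon."""
    return min(t, horizon)

def f_i(S):
    """Penalty reduction f_i(S) = pi_i(inf) - pi_i(T(S, i))."""
    return penalty(math.inf) - penalty(detection_time(S))

small = {"x1", "x2"}
large = {"x1", "x2", "x3", "x4"}
gain_small = f_i(small | {"s_new"}) - f_i(small)  # 9 - 5
gain_large = f_i(large | {"s_new"}) - f_i(large)  # 9 - 8
```

The same new sensor is worth 4 time units when added to the small placement but only 1 when added to the large one.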

Claim: For all $A \subseteq B \subseteq V$ and sensors $s \in V \setminus B$:
$$f(A \cup \{s\}) - f(A) \ge f(B \cup \{s\}) - f(B)$$
¡ Proof: All our objectives are submodular
  § Fix cascade/outbreak $i$
  § Show $f_i(A) = \pi_i(\infty) - \pi_i(T(A, i))$ is submodular
  § Consider $A \subseteq B \subseteq V$ and sensor $s \in V \setminus B$
  § When does node $s$ detect cascade $i$?
    § We analyze 3 cases based on when $s$ detects outbreak $i$
    § (1) $T(B, i) \le T(A, i) < T(s, i)$: $s$ detects late, nobody benefits: $f_i(A \cup \{s\}) = f_i(A)$, also $f_i(B \cup \{s\}) = f_i(B)$, and so
    $$f_i(A \cup \{s\}) - f_i(A) = 0 = f_i(B \cup \{s\}) - f_i(B)$$

¡ Proof (contd.): Remember $A \subseteq B$
    § (2) $T(B, i) \le T(s, i) \le T(A, i)$: $s$ detects after $B$ but before $A$. $s$ detects sooner than any node in $A$ but no sooner than the earliest node in $B$. So $s$ only helps improve the solution $A$ (but not $B$):
    $$f_i(A \cup \{s\}) - f_i(A) \ge 0 = f_i(B \cup \{s\}) - f_i(B)$$
    § (3) $T(s, i) < T(B, i) \le T(A, i)$: $s$ detects early:
    $$f_i(A \cup \{s\}) - f_i(A) = [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(A) \ge [\pi_i(\infty) - \pi_i(T(s, i))] - f_i(B) = f_i(B \cup \{s\}) - f_i(B)$$
    § The inequality is due to the non-decreasingness of $f_i(\cdot)$, i.e., $f_i(A) \le f_i(B)$
  § So, $f_i(\cdot)$ is submodular!
¡ So, $f(\cdot)$ is also submodular, since a non-negative weighted sum of submodular functions is submodular:
$$f(S) = \sum_i P(i)\, f_i(S)$$
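The claim can also be checked by brute force on a small instance. This sketch enumerates every $A \subseteq B \subseteq V$ and every $s \in V \setminus B$; the detection times, probabilities and the capped time-to-detection penalty (`HORIZON`) are assumptions chosen for illustration, and a passing check on one instance is of course not a proof.

```python
import math
from itertools import combinations

# Hypothetical detection times T(u, i) for two outbreaks over four nodes.
T = {
    "i1": {"a": 1, "b": 3, "c": 7, "d": math.inf},
    "i2": {"a": math.inf, "b": 6, "c": 2, "d": 4},
}
P = {"i1": 0.5, "i2": 0.5}
V = ["a", "b", "c", "d"]
HORIZON = 10  # assumed finite penalty for an undetected outbreak

def f(S):
    """f(S) = sum_i P(i) * (pi_i(inf) - pi_i(T(S, i))) with pi_i(t) = min(t, HORIZON)."""
    total = 0.0
    for i, times in T.items():
        t = min((times[u] for u in S), default=math.inf)
        total += P[i] * (HORIZON - min(t, HORIZON))
    return total

def subsets(items):
    """Yield every subset of items as a set."""
    for k in range(len(items) + 1):
        for combo in combinations(items, k):
            yield set(combo)

def is_submodular():
    """Check f(A+s) - f(A) >= f(B+s) - f(B) for all A <= B <= V and s not in B."""
    for B in subsets(V):
        for A in subsets(sorted(B)):
            for s in set(V) - B:
                gain_a = f(A | {s}) - f(A)
                gain_b = f(B | {s}) - f(B)
                if gain_a < gain_b - 1e-12:
                    return False
    return True

submodular = is_submodular()
```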

¡ What do we know about optimizing submodular functions?
  § Hill-climbing (i.e., greedy) is near optimal: it achieves at least $(1 - 1/e) \cdot OPT$

[Figure: hill-climbing reward of candidate sensors a–e; add the sensor with the highest marginal gain]

¡ But:
  § (1) This only works for the unit-cost case! (each sensor costs the same)
    § For us each sensor $s$ has cost $c(s)$
  § (2) Hill-climbing algorithm is slow
    § At each iteration we need to re-evaluate marginal gains of all nodes
    § Runtime $O(|V| \cdot K)$ for placing $K$ sensors
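A minimal sketch of unit-cost hill-climbing, to make the cost per round concrete. The `detects` table and the coverage-style objective are hypothetical stand-ins for a real reward $f(S)$.

```python
def greedy(candidates, f, k):
    """Pick k elements, each time adding the one with the highest marginal gain."""
    chosen = []
    for _ in range(k):
        remaining = [u for u in candidates if u not in chosen]
        if not remaining:
            break
        # Re-evaluates every remaining candidate: O(|V|) evaluations per round.
        best = max(remaining, key=lambda u: f(chosen + [u]) - f(chosen))
        chosen.append(best)
    return chosen

# Hypothetical coverage-style objective: which outbreaks each sensor would detect.
detects = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}

def coverage(S):
    """Number of distinct outbreaks detected by the sensors in S (submodular)."""
    return len(set().union(*(detects[u] for u in S)))

picked = greedy(list(detects), coverage, k=2)
```

Here the first round picks `a` (gain 3) and the second picks `c` (gain 2), since `b` would add only one new outbreak once `a` is in place.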

¡ Consider the following algorithm to solve the outbreak detection problem: hill-climbing that ignores cost
  § Ignore sensor cost $c(s)$
  § Repeatedly select sensor with highest marginal gain
  § Do this until the budget is exhausted
¡ Q: How well does this work?
¡ A: It can fail arbitrarily badly!
  § There exists a problem setting where the hill-climbing solution is arbitrarily far from OPT
  § Next we come up with an example

¡ Bad example when we ignore cost:
  § $n$ sensors, budget $B$
  § $s_1$: reward $r$, cost $B$
  § $s_2 \dots s_n$: reward $r - \epsilon$, cost $1$
  § Hill-climbing always prefers the more expensive sensor $s_1$ with reward $r$ (and exhausts the budget). It never selects the cheaper sensors with reward $r - \epsilon$
  → For variable cost it can fail arbitrarily badly!
¡ Idea: What if we optimize benefit-cost ratio?
$$s_i = \arg\max_{s \in V} \frac{f(A_{i-1} \cup \{s\}) - f(A_{i-1})}{c(s)}$$
Greedily pick the sensor $s_i$ that maximizes the benefit-to-cost ratio.
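The bad example can be run with concrete numbers. This sketch makes several assumptions that the slide leaves open: rewards are additive (so a sensor's marginal gain is just its reward), a sensor is affordable when the total cost stays at or below the budget, and $n = 6$, $B = 5$, $r = 10$, $\epsilon = 1$.

```python
def greedy_ignoring_cost(reward, cost, budget):
    """Pick the affordable sensor with the highest reward until nothing fits.

    Rewards are assumed additive here, so the marginal gain of a sensor is its reward.
    """
    chosen, spent = [], 0
    while True:
        affordable = [s for s in reward if s not in chosen and spent + cost[s] <= budget]
        if not affordable:
            return chosen
        best = max(affordable, key=lambda s: reward[s])
        chosen.append(best)
        spent += cost[best]

# Instance from the slide with assumed numbers: n = 6, B = 5, r = 10, eps = 1.
B, r, eps = 5, 10, 1
reward = {"s1": r, **{f"s{j}": r - eps for j in range(2, 7)}}
cost = {"s1": B, **{f"s{j}": 1 for j in range(2, 7)}}

picked = greedy_ignoring_cost(reward, cost, B)
greedy_value = sum(reward[s] for s in picked)
cheap_value = sum(reward[f"s{j}"] for j in range(2, 7))  # five cheap sensors fit in B
```

Under these assumptions the greedy pick is `s1` alone with reward 10, while the five cheap sensors together would have given 45 for the same budget.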

¡ Benefit-cost ratio can also fail arbitrarily badly!
¡ Consider: budget $B$:
  § 2 sensors $s_1$ and $s_2$:
    § Costs: $c(s_1) = \epsilon$, $c(s_2) = B$
    § Benefit (only 1 cascade): $f(s_1) = 2\epsilon$, $f(s_2) = B$
  § Then benefit-cost ratio is:
    § $f(s_1)/c(s_1) = 2$ and $f(s_2)/c(s_2) = 1$
  § So, we first select $s_1$ and then cannot afford $s_2$
  → We get reward $2\epsilon$ instead of $B$! Now send $\epsilon \to 0$ and we get an arbitrarily bad solution!

This algorithm incentivizes choosing nodes with very low cost, even when slightly more expensive ones can lead to much better global results.
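The two-sensor example can be reproduced directly. In this sketch the values $B = 1$ and $\epsilon = 0.01$ are assumed, benefits are treated as additive, and a sensor counts as affordable when the total cost stays at or below the budget.

```python
def benefit_cost_greedy(benefit, cost, budget):
    """Repeatedly add the affordable sensor with the best benefit/cost ratio.

    Benefits are assumed additive, so a sensor's marginal gain is its benefit.
    """
    chosen, spent = [], 0.0
    while True:
        affordable = [s for s in benefit if s not in chosen and spent + cost[s] <= budget]
        if not affordable:
            return chosen
        best = max(affordable, key=lambda s: benefit[s] / cost[s])
        chosen.append(best)
        spent += cost[best]

B, eps = 1.0, 0.01  # assumed numbers for the slide's example
cost = {"s1": eps, "s2": B}
benefit = {"s1": 2 * eps, "s2": B}

picked = benefit_cost_greedy(benefit, cost, B)
value = sum(benefit[s] for s in picked)
```

The ratio rule takes `s1` first, after which `s2` no longer fits in the budget, so the collected benefit is about $2\epsilon$ rather than $B$.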

¡ CELF (Cost-Effective Lazy Forward-selection): a two-pass greedy algorithm:
  § Set (solution) $S'$: use benefit-cost greedy
  § Set (solution) $S''$: use unit-cost greedy (i.e., hill-climbing that ignores cost)
  § Final solution: $S = \arg\max(f(S'), f(S''))$
¡ How far is CELF from the (unknown) optimal solution?
¡ Theorem: CELF is near optimal [Krause&Guestrin, '05]
  § CELF achieves a ½(1 − 1/e) factor approximation!

This is surprising: we have two clearly suboptimal solutions, but taking the best of the two is guaranteed to give a near-optimal solution.
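The two-pass idea can be sketched as follows. This is only the "best of two greedy runs" skeleton, without the lazy evaluation that gives CELF its name; the function names, the additive toy objective and the at-or-below-budget affordability rule are assumptions.

```python
def budgeted_greedy(items, f, cost, budget, use_ratio):
    """Greedy under a budget; ranks by gain/cost if use_ratio, else by gain alone."""
    chosen, spent = [], 0.0
    while True:
        affordable = [s for s in items if s not in chosen and spent + cost[s] <= budget]
        if not affordable:
            return chosen

        def score(s):
            gain = f(chosen + [s]) - f(chosen)
            return gain / cost[s] if use_ratio else gain

        best = max(affordable, key=score)
        chosen.append(best)
        spent += cost[best]

def two_pass_greedy(items, f, cost, budget):
    """Return the better of the benefit-cost and the cost-ignoring greedy solutions."""
    by_ratio = budgeted_greedy(items, f, cost, budget, use_ratio=True)
    by_gain = budgeted_greedy(items, f, cost, budget, use_ratio=False)
    return max(by_ratio, by_gain, key=f)

# Assumed toy instance: additive benefits, budget 1.
cost = {"s1": 0.01, "s2": 1.0}
benefit = {"s1": 0.02, "s2": 1.0}

def f(S):
    """Additive toy objective."""
    return sum(benefit[s] for s in S)

solution = two_pass_greedy(list(cost), f, cost, budget=1.0)
```

On this instance the ratio pass gets stuck with `s1`, the cost-ignoring pass picks `s2`, and taking the better of the two recovers the full benefit.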

¡ What do we know about optimizing submodular functions?
  § Hill-climbing (i.e., greedy) is near optimal (that is, $(1 - 1/e) \cdot OPT$)

[Figure: hill-climbing reward of candidate sensors a–e; add the sensor with the highest marginal gain]

¡ But:
  § (2) Hill-climbing algorithm is slow!
    § At each iteration we need to re-evaluate marginal gains of all nodes
    § Runtime $O(|V| \cdot K)$ for placing $K$ sensors

¡ In round $i + 1$: So far we picked $S_i = \{s_1, \dots, s_i\}$
  § Now pick $s_{i+1} = \arg\max_u \; f(S_i \cup \{u\}) - f(S_i)$
  § This is our old friend, the greedy hill-climbing algorithm. It maximizes the "marginal gain" $\delta_i(u) = f(S_i \cup \{u\}) - f(S_i)$
¡ By submodularity property:
$$f(S_i \cup \{u\}) - f(S_i) \ge f(S_j \cup \{u\}) - f(S_j) \quad \text{for } i < j$$
¡ Observation: By submodularity, for every $u$:
$$\delta_i(u) \ge \delta_j(u) \quad \text{for } i < j, \text{ since } S_i \subset S_j$$
Marginal benefits $\delta_i(u)$ only shrink (as $i$ grows)!
Activating node $u$ in step $i$ helps at least as much as activating it at step $j$ ($j > i$).

¡ Idea:
  § Use $\delta_i$ as an upper bound on $\delta_j$ ($j > i$)
¡ Lazy hill-climbing:
  § Keep an ordered list of marginal benefits $\delta_i$ from the previous iteration
  § Re-evaluate $\delta_i$ only for the top node
  § Re-order and prune

$$f(S \cup \{u\}) - f(S) \ge f(T \cup \{u\}) - f(T) \quad \text{for } S \subseteq T$$

[Figure: marginal gains of nodes a–e in sorted order; after the first pick, S1 = {a}]
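The lazy re-evaluation scheme can be sketched with a priority queue. This is one possible implementation under assumptions, not the course's reference code: it uses Python's `heapq` with negated gains, tags each entry with the round in which its gain was computed, and reuses a hypothetical coverage objective.

```python
import heapq

def lazy_greedy(candidates, f, k):
    """Greedy selection that re-evaluates a marginal gain only when it reaches the top.

    Relies on submodularity: a gain computed in an earlier round is an upper bound
    on the current gain, so a freshly evaluated top entry is the true maximum.
    """
    chosen, evaluations = [], 0
    base = f(chosen)
    # Max-heap via negated gains; each entry remembers the round it was computed in.
    heap = []
    for u in candidates:
        heap.append((-(f([u]) - base), u, 0))
        evaluations += 1
    heapq.heapify(heap)
    while heap and len(chosen) < k:
        neg_gain, u, round_computed = heapq.heappop(heap)
        if round_computed == len(chosen):
            chosen.append(u)  # gain is up to date, so u is the best choice
            base = f(chosen)
        else:
            gain = f(chosen + [u]) - base  # stale bound: refresh and re-insert
            evaluations += 1
            heapq.heappush(heap, (-gain, u, len(chosen)))
    return chosen, evaluations

# Hypothetical coverage objective: which outbreaks each sensor would detect.
detects = {"a": {1, 2, 3}, "b": {3, 4}, "c": {4, 5}, "d": {1}}

def coverage(S):
    """Number of distinct outbreaks detected by the sensors in S."""
    return len(set().union(*(detects[u] for u in S)))

picked, evaluations = lazy_greedy(list(detects), coverage, k=2)
```

On this toy input the lazy version never re-evaluates `d` in the second round, because its stale gain is already below the refreshed gain of `c`.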
