Linear Regression
Regression
Given:
– Data X = [x^(1), ..., x^(n)] where x^(i) ∈ R^d
– Corresponding labels y = [y^(1), ..., y^(n)] where y^(i) ∈ R

[Figure: yearly data from 1975–2015 with linear regression and quadratic regression fits; x-axis: Year]
2
Prostate Cancer Dataset
• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features):
  – 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable:
  – lpsa: log(prostate specific antigen level)
Based on slide by Jeff Howbert
Linear Regression
• Hypothesis:
  y = θ_0 + θ_1 x_1 + θ_2 x_2 + ... + θ_d x_d = Σ_{j=0}^{d} θ_j x_j
  Assume x_0 = 1
• Fit model by minimizing sum of squared errors

[Figure: data points with a fitted line]
Figures are courtesy of Greg Shakhnarovich
5
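The hypothesis above can be sketched in a few lines of NumPy; the function name `h` and the numeric values are illustrative, not from the slides.

```python
import numpy as np

def h(theta, x):
    """Hypothesis h_theta(x) = sum_{j=0}^{d} theta_j * x_j, with x_0 = 1 prepended."""
    x_aug = np.concatenate(([1.0], x))  # assume x_0 = 1
    return theta @ x_aug

theta = np.array([1.0, 2.0, 3.0])  # [theta_0, theta_1, theta_2] (made-up values)
x = np.array([0.5, -1.0])          # one example with d = 2 features
print(h(theta, x))                 # 1 + 2*0.5 + 3*(-1) = -1.0
```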
Least Squares Linear Regression
• Cost Function
  J(θ) = (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
• Fit by solving min_θ J(θ)
6
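The cost function translates directly into NumPy; the helper name `cost` and the toy data are illustrative assumptions.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1/(2n)) * sum_i (h_theta(x^(i)) - y^(i))^2, with x_0 = 1 prepended."""
    n = len(y)
    X_aug = np.column_stack([np.ones(n), X])  # prepend the column x_0 = 1
    r = X_aug @ theta - y                     # residuals h_theta(x^(i)) - y^(i)
    return r @ r / (2 * n)

# Made-up toy data lying exactly on the line y = x
X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 1.0]), X, y))  # 0.0 for the perfect fit
```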
Intuition Behind Cost Function
J(θ) = (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
For insight on J(), let's assume x ∈ R so θ = [θ_0, θ_1]
Based on example by Andrew Ng
7
Intuition Behind Cost Function
J(θ) = (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
For insight on J(), let's assume x ∈ R so θ = [θ_0, θ_1]

[Figure: left, h_θ(x) vs. x (for fixed θ, this is a function of x); right, J vs. θ_1 (function of the parameter θ_1)]
Based on example by Andrew Ng
8
Intuition Behind Cost Function
J(θ) = (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
For insight on J(), let's assume x ∈ R so θ = [θ_0, θ_1]

[Figure: left, the training points with the line h_θ(x) = 0.5x (for fixed θ, this is a function of x); right, J vs. θ_1 (function of the parameter θ_1)]
J([0, 0.5]) = (1/(2·3)) [ (0.5 − 1)² + (1 − 2)² + (1.5 − 3)² ] ≈ 0.58
Based on example by Andrew Ng
9
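The arithmetic on this slide can be checked directly; the three training points (1,1), (2,2), (3,3) are the ones implied by the residuals 0.5−1, 1−2, and 1.5−3.

```python
import numpy as np

# Training points implied by the slide's arithmetic: (1,1), (2,2), (3,3)
x = np.array([1.0, 2.0, 3.0])
y = np.array([1.0, 2.0, 3.0])

def J(theta0, theta1):
    """Cost for the simplified hypothesis h(x) = theta0 + theta1 * x."""
    n = len(x)
    return np.sum((theta0 + theta1 * x - y) ** 2) / (2 * n)

print(round(J(0.0, 0.5), 2))  # 0.58, as on the slide
```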
Intuition Behind Cost Function
J(θ) = (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
For insight on J(), let's assume x ∈ R so θ = [θ_0, θ_1]

[Figure: left, the training points with the line h_θ(x) = 0 (for fixed θ, this is a function of x); right, J vs. θ_1 (function of the parameter θ_1)]
J([0, 0]) ≈ 2.333
J() is convex
Based on example by Andrew Ng
10
Intuition Behind Cost Function
[Figure: J(θ_0, θ_1) plotted over both parameters]
Slide by Andrew Ng
11
Intuition Behind Cost Function
[Figures, one per slide: left, h_θ(x) for a fixed θ (for fixed θ, this is a function of x); right, J (function of the parameters θ_0, θ_1), shown for different choices of θ]
Slide by Andrew Ng
12–15
Basic Search Procedure
• Choose initial value for θ
• Until we reach a minimum:
  – Choose a new value for θ to reduce J(θ)

[Figure: surface of J(θ_0, θ_1) over the (θ_0, θ_1) plane]
Figure by Andrew Ng
16
Basic Search Procedure
• Choose initial value for θ
• Until we reach a minimum:
  – Choose a new value for θ to reduce J(θ)

[Figure: surface of J(θ_0, θ_1) over the (θ_0, θ_1) plane]
Figure by Andrew Ng
17
Basic Search Procedure
• Choose initial value for θ
• Until we reach a minimum:
  – Choose a new value for θ to reduce J(θ)

Since the least squares objective function is convex, we don't need to worry about local minima

[Figure: surface of J(θ_0, θ_1) over the (θ_0, θ_1) plane]
Figure by Andrew Ng
18
Gradient Descent
• Initialize θ
• Repeat until convergence:
  θ_j ← θ_j − α ∂/∂θ_j J(θ)    (simultaneous update for j = 0 ... d)
  α is the learning rate (small), e.g., α = 0.05

[Figure: J(θ) plotted against θ]
19
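As a minimal sketch of the update rule on a one-parameter cost (the quadratic J(θ) = (θ − 2)² and the starting point are made up for illustration; α = 0.05 follows the slide):

```python
alpha = 0.05   # learning rate, as suggested on the slide
theta = 0.0    # arbitrary initial value
for _ in range(200):
    grad = 2 * (theta - 2)        # dJ/dtheta for J(theta) = (theta - 2)^2
    theta = theta - alpha * grad  # gradient descent update
print(round(theta, 3))  # converges toward the minimizer theta = 2
```

Each step moves θ opposite the sign of the derivative, so the iterates approach the minimum without ever needing to solve for it directly.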
Gradient Descent
• Initialize θ
• Repeat until convergence:
  θ_j ← θ_j − α ∂/∂θ_j J(θ)    (simultaneous update for j = 0 ... d)

For Linear Regression:
  ∂/∂θ_j J(θ) = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
20
Gradient Descent
• Initialize θ
• Repeat until convergence:
  θ_j ← θ_j − α ∂/∂θ_j J(θ)    (simultaneous update for j = 0 ... d)

For Linear Regression:
  ∂/∂θ_j J(θ) = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
              = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) )²
21
Gradient Descent
• Initialize θ
• Repeat until convergence:
  θ_j ← θ_j − α ∂/∂θ_j J(θ)    (simultaneous update for j = 0 ... d)

For Linear Regression:
  ∂/∂θ_j J(θ) = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
              = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) )²
              = (1/n) Σ_{i=1}^{n} ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) ) · ∂/∂θ_j ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) )
22
Gradient Descent
• Initialize θ
• Repeat until convergence:
  θ_j ← θ_j − α ∂/∂θ_j J(θ)    (simultaneous update for j = 0 ... d)

For Linear Regression:
  ∂/∂θ_j J(θ) = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) )²
              = ∂/∂θ_j (1/(2n)) Σ_{i=1}^{n} ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) )²
              = (1/n) Σ_{i=1}^{n} ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) ) · ∂/∂θ_j ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) )
              = (1/n) Σ_{i=1}^{n} ( Σ_{k=0}^{d} θ_k x_k^(i) − y^(i) ) x_j^(i)
23
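The final expression vectorizes naturally over j; the function name `gradient` and the toy data (reused from the earlier worked example) are illustrative.

```python
import numpy as np

def gradient(theta, X, y):
    """dJ/dtheta_j = (1/n) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), all j at once."""
    n = len(y)
    X_aug = np.column_stack([np.ones(n), X])   # prepend x_0 = 1
    return X_aug.T @ (X_aug @ theta - y) / n   # matrix form of the sum over i

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
print(gradient(np.array([0.0, 0.5]), X, y))  # [-1, -7/3]: both partials are negative,
                                             # so the update increases theta_0 and theta_1
```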
Gradient Descent for Linear Regression
• Initialize θ
• Repeat until convergence:
  θ_j ← θ_j − α (1/n) Σ_{i=1}^{n} ( h_θ(x^(i)) − y^(i) ) x_j^(i)    (simultaneous update for j = 0 ... d)
• To achieve simultaneous update:
  • At the start of each GD iteration, compute h_θ(x^(i))
  • Use this stored value in the update step loop
• Assume convergence when ‖θ_new − θ_old‖₂ < ε
  L2 norm: ‖v‖₂ = √(Σ_i v_i²) = √(v_1² + v_2² + ... + v_|v|²)
24
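Putting the pieces together (the function name, data, and hyperparameter values are illustrative assumptions; the simultaneous update and L2-norm stopping rule follow the slide):

```python
import numpy as np

def fit_linear_regression(X, y, alpha=0.05, eps=1e-8, max_iters=100_000):
    n = len(y)
    X_aug = np.column_stack([np.ones(n), X])  # x_0 = 1
    theta = np.zeros(X_aug.shape[1])          # initialize theta
    for _ in range(max_iters):
        h = X_aug @ theta                     # compute h_theta once per iteration...
        theta_new = theta - alpha * X_aug.T @ (h - y) / n  # ...then update all theta_j at once
        if np.linalg.norm(theta_new - theta) < eps:        # ||theta_new - theta_old||_2 < eps
            return theta_new
        theta = theta_new
    return theta

X = np.array([[1.0], [2.0], [3.0]])
y = np.array([1.0, 2.0, 3.0])
theta = fit_linear_regression(X, y)  # theta ends up close to [0, 1], i.e., the line y = x
```

Updating every θ_j from the same stored `h` is exactly the simultaneous update the slide calls for; computing `h` inside the per-j loop would mix old and new parameters.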
Gradient Descent
[Figure: left, the hypothesis h(x) = −900 − 0.1x plotted over the data (for fixed θ, this is a function of x); right, J (function of the parameters θ_0, θ_1)]
Slide by Andrew Ng
25
Gradient Descent
[Figure: left, the hypothesis (for fixed θ, this is a function of x); right, J (function of the parameters θ_0, θ_1)]
Slide by Andrew Ng
26