
Lecture 2 Slides: Linear Regression


Linear Regression


Regression

Given:

– Data $X = \left[ x^{(1)}, \dots, x^{(n)} \right]$, where each $x^{(i)} \in \mathbb{R}^d$
– Corresponding labels $y = \left[ y^{(1)}, \dots, y^{(n)} \right]$, where each $y^{(i)} \in \mathbb{R}$

[Figure: data plotted against Year (1975 to 2015), with a Linear Regression fit and a Quadratic Regression fit overlaid]
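To make the setup concrete, here is a minimal NumPy sketch; the shapes mirror the definitions above, but the specific year and label values are made up for illustration.

```python
import numpy as np

# n = 5 samples, d = 1 feature (the year); the values themselves are invented
X = np.array([[1975.0], [1985.0], [1995.0], [2005.0], [2015.0]])  # one row per x^(i), shape (n, d)
y = np.array([1.2, 2.9, 4.4, 6.1, 8.0])                           # one real-valued label y^(i) per sample, shape (n,)

print(X.shape, y.shape)  # (5, 1) (5,)
```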


Prostate Cancer Dataset

• 97 samples, partitioned into 67 train / 30 test
• Eight predictors (features): 6 continuous (4 log transforms), 1 binary, 1 ordinal
• Continuous outcome variable: lpsa = log(prostate specific antigen level)

Based on slide by Jeff Howbert


Linear Regression

• Hypothesis:

  $y = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \dots + \theta_d x_d = \sum_{j=0}^{d} \theta_j x_j$

  Assume $x_0 = 1$

• Fit model by minimizing sum of squared errors

Figures are courtesy of Greg Shakhnarovich
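A minimal sketch of this hypothesis in NumPy, assuming the slide's convention of prepending the constant feature $x_0 = 1$; the numbers in the example are made up.

```python
import numpy as np

def h(theta, x):
    """Linear hypothesis h_theta(x) = sum_{j=0}^{d} theta_j * x_j, with x_0 = 1 prepended."""
    x_aug = np.concatenate(([1.0], x))  # x_0 = 1 pairs with the intercept theta_0
    return float(np.dot(theta, x_aug))

# Example: theta = [theta_0, theta_1, theta_2] and x in R^2 (values invented)
theta = np.array([0.5, 2.0, -1.0])
x = np.array([3.0, 4.0])
print(h(theta, x))  # 0.5 + 2.0 * 3.0 + (-1.0) * 4.0 = 2.5
```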


Least Squares Linear Regression

• Cost Function:

  $J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

• Fit by solving $\min_\theta J(\theta)$
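A possible vectorized implementation of this cost function, assuming the design matrix already includes the constant column $x_0 = 1$; the toy data is made up.

```python
import numpy as np

def cost(theta, X, y):
    """J(theta) = (1 / (2n)) * sum_i (h_theta(x^(i)) - y^(i))^2."""
    n = len(y)
    residuals = X @ theta - y            # h_theta(x^(i)) - y^(i) for all i at once
    return float((residuals ** 2).sum() / (2 * n))

# Toy data: 3 samples, a constant column x_0 = 1 plus one feature
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(cost(np.array([0.0, 1.0]), X, y))  # a perfect fit gives J = 0.0
```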


Intuition Behind Cost Function

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

Based on example by Andrew Ng


Intuition Behind Cost Function

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

[Left: $h_\theta(x)$ plotted against $x$ (for fixed $\theta$, this is a function of $x$). Right: $J(\theta)$ plotted against $\theta_1$ (a function of the parameter $\theta_1$).]

Based on example by Andrew Ng


Intuition Behind Cost Function

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

[Left: $h_\theta(x)$ plotted against $x$ (for fixed $\theta$, this is a function of $x$). Right: $J(\theta)$ plotted against $\theta_1$ (a function of the parameter $\theta_1$).]

$J([0, 0.5]) = \frac{1}{2 \cdot 3} \left[ (0.5 - 1)^2 + (1 - 2)^2 + (1.5 - 3)^2 \right] \approx 0.58$

Based on example by Andrew Ng
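The arithmetic on this slide is easy to check numerically; the sketch below assumes the three training points implied by it ($x$ = 1, 2, 3 with $y$ = 1, 2, 3).

```python
import numpy as np

# Three training points implied by the slide's arithmetic: x = 1, 2, 3 with y = 1, 2, 3
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # first column is the constant x_0 = 1
y = np.array([1.0, 2.0, 3.0])

def cost(theta, X, y):
    n = len(y)
    return float(((X @ theta - y) ** 2).sum() / (2 * n))

print(round(cost(np.array([0.0, 0.5]), X, y), 3))  # 0.583, matching J([0, 0.5]) ≈ 0.58
print(round(cost(np.array([0.0, 0.0]), X, y), 3))  # 2.333, matching J([0, 0]) on the next slide
```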


Intuition Behind Cost Function

$J(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

For insight on $J(\theta)$, let's assume $x \in \mathbb{R}$, so $\theta = [\theta_0, \theta_1]$

[Left: $h_\theta(x)$ plotted against $x$ (for fixed $\theta$, this is a function of $x$). Right: $J(\theta)$ plotted against $\theta_1$ (a function of the parameter $\theta_1$).]

$J([0, 0]) \approx 2.333$

$J(\theta)$ is convex

Based on example by Andrew Ng


Intuition Behind Cost Function

[Left: $h_\theta(x)$ for fixed $\theta$ (a function of $x$). Right: $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$).]

Slide by Andrew Ng




Basic Search Procedure

• Choose initial value for $\theta$
• Until we reach a minimum:
  – Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$]

Figure by Andrew Ng




Basic Search Procedure

• Choose initial value for $\theta$
• Until we reach a minimum:
  – Choose a new value for $\theta$ to reduce $J(\theta)$

[Figure: surface plot of $J(\theta_0, \theta_1)$]

• Since the least squares objective function is convex, we don't need to worry about local minima

Figure by Andrew Ng


Gradient Descent

• Initialize $\theta$
• Repeat until convergence:

  $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$   (simultaneous update for j = 0 ... d)

  $\alpha$ is the learning rate (small), e.g., $\alpha = 0.05$

[Figure: $J(\theta)$ plotted against $\theta_1$]
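As a sketch of the update rule on its own (before specializing it to linear regression), here is a generic gradient descent loop; the one-dimensional objective used to exercise it is made up.

```python
import numpy as np

def gradient_descent(grad_J, theta0, alpha=0.05, num_iters=1000):
    """theta_j <- theta_j - alpha * dJ/dtheta_j; the whole vector is updated at once (simultaneous update)."""
    theta = np.array(theta0, dtype=float)
    for _ in range(num_iters):
        theta = theta - alpha * grad_J(theta)  # alpha is the (small) learning rate
    return theta

# Toy objective: J(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3)
print(gradient_descent(lambda t: 2.0 * (t - 3.0), theta0=[0.0]))  # approaches [3.0]
```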


Gradient Descent

• Initialize $\theta$
• Repeat until convergence:

  $\theta_j \leftarrow \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$   (simultaneous update for j = 0 ... d)

• For Linear Regression:

  $\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

  $= \frac{\partial}{\partial \theta_j} \frac{1}{2n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)^2$

  $= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) \frac{\partial}{\partial \theta_j} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right)$

  $= \frac{1}{n} \sum_{i=1}^{n} \left( \sum_{k=0}^{d} \theta_k x_k^{(i)} - y^{(i)} \right) x_j^{(i)}$
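The last expression has a compact vectorized form; a possible NumPy sketch, reusing the toy data from the earlier worked example.

```python
import numpy as np

def gradient(theta, X, y):
    """Partial derivatives dJ/dtheta_j = (1/n) * sum_i (h_theta(x^(i)) - y^(i)) * x_j^(i), for all j at once."""
    n = len(y)
    errors = X @ theta - y    # shape (n,): h_theta(x^(i)) - y^(i)
    return X.T @ errors / n   # shape (d + 1,): one partial derivative per theta_j

# Toy data from the earlier worked example: x = 1, 2, 3 with y = 1, 2, 3
X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
y = np.array([1.0, 2.0, 3.0])
print(gradient(np.array([0.0, 0.5]), X, y))  # [-1.0, -2.333...]; both partials are negative, so GD increases theta
```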


Gradient Descent for Linear Regression

• Initialize $\theta$
• Repeat until convergence:

  $\theta_j \leftarrow \theta_j - \alpha \frac{1}{n} \sum_{i=1}^{n} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$   (simultaneous update for j = 0 ... d)

• To achieve simultaneous update:
  – At the start of each GD iteration, compute $h_\theta(x^{(i)})$
  – Use this stored value in the update step loop

• Assume convergence when $\lVert \theta_{\text{new}} - \theta_{\text{old}} \rVert_2 < \epsilon$

  L2 norm: $\lVert v \rVert_2 = \sqrt{\sum_i v_i^2} = \sqrt{v_1^2 + v_2^2 + \dots + v_{|v|}^2}$
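Putting the update rule and the convergence test together, a possible end-to-end sketch; the synthetic data, random seed, and hyperparameter values ($\alpha$, $\epsilon$) are chosen for illustration only.

```python
import numpy as np

def gd_linear_regression(X, y, alpha=0.05, eps=1e-6, max_iters=100_000):
    """Gradient descent for linear regression, stopping when ||theta_new - theta_old||_2 < eps.
    X is assumed to already contain the constant column x_0 = 1."""
    n, d_plus_1 = X.shape
    theta = np.zeros(d_plus_1)
    for _ in range(max_iters):
        h = X @ theta                                     # compute h_theta(x^(i)) once per iteration...
        theta_new = theta - alpha * (X.T @ (h - y)) / n   # ...then update every theta_j from the stored values
        if np.linalg.norm(theta_new - theta) < eps:       # L2-norm convergence test from the slide
            return theta_new
        theta = theta_new
    return theta

# Synthetic data drawn (noisily) from y = 1 + 2x; the fit should land near theta = [1, 2]
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=50)
X = np.column_stack([np.ones_like(x), x])
y = 1.0 + 2.0 * x + 0.01 * rng.standard_normal(50)
print(gd_linear_regression(X, y))  # approximately [1.0, 2.0]
```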


Gradient Descent

[Left: $h_\theta(x)$ for fixed $\theta$ (a function of $x$), with $h(x) = -900 - 0.1x$. Right: $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$).]

Slide by Andrew Ng


Gradient Descent

[Left: $h_\theta(x)$ for fixed $\theta$ (a function of $x$). Right: $J(\theta_0, \theta_1)$ (a function of the parameters $\theta_0, \theta_1$).]

Slide by Andrew Ng

