
Chapter 4
Linear Regression and Correlation Analysis


1. Introduction to regression analysis

Regression analysis:
- Describes a relationship between two variables in mathematical terms.
- Predicts the value of a dependent variable based on the value of at least one independent variable.
- Explains the impact of changes in an independent variable on the dependent variable.


1. Introduction to regression analysis

- Dependent variable: the variable we wish to explain.
- Independent variable: the variable used to explain the dependent variable.
Names for y and x in the regression model

Names for y          | Names for x
---------------------|------------------------
Dependent variable   | Independent variables
Regressand           | Regressors
Effect variable      | Causal variables
Explained variable   | Explanatory variables


Simple Linear Regression Model

- Only one independent variable, x.
- The relationship between x and y is described by a linear function.
- Changes in y are assumed to be caused by changes in x.

Types of Regression Models

- Positive linear relationship
- Negative linear relationship
- Non-linear relationship
- No relationship


Population Linear Regression

The population regression model:

    y = β0 + β1x + ε

where y is the dependent variable, x is the independent variable, β0 is the population y-intercept, β1 is the population slope coefficient, and ε is the random error term (residual). β0 + β1x is the linear component of the model; ε is the random error component.
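To make the model concrete, here is a minimal Python sketch that simulates data from this population model; the parameter values and sample size are hypothetical, chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

beta0, beta1 = 100.0, 0.1             # hypothetical population intercept and slope
n = 50                                # hypothetical sample size

x = rng.uniform(1000, 2500, size=n)   # independent variable
eps = rng.normal(0.0, 25.0, size=n)   # random error term: mean 0, constant variance
y = beta0 + beta1 * x + eps           # population regression model: y = β0 + β1x + ε
```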


Linear Regression Assumptions

- Error values (ε) are statistically independent.
- Error values are normally distributed for any given value of x.
- The probability distribution of the errors has constant variance.
- The underlying relationship between the x variable and the y variable is linear.


Population Linear Regression

[Figure: the population regression line y = β0 + β1x + ε plotted against x, with intercept β0 and slope β1. For a given value xi, the observed value of y differs from the predicted value on the line by the random error εi for that x value.]


Estimated Regression Model

The sample regression line provides an estimate of the population regression line:

    ŷi = b0 + b1xi

where ŷi is the estimated (or predicted) y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable. The individual random error terms ei have a mean of zero.


Least Squares Criterion

b0 and b1 are obtained by finding the values of b0 and b1 that minimize the sum of the squared residuals:

    Σe² = Σ(y − ŷ)² = Σ(y − (b0 + b1x))²
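In code, the criterion is just the sum of squared residuals viewed as a function of the candidate coefficients; a minimal sketch (the function name is our own):

```python
def sse(b0, b1, x, y):
    """Sum of squared residuals e = y - (b0 + b1*x) for candidate coefficients."""
    return sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
```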


The Least Squares Equations

The formulas for b1 and b0 are:

    b1 = [Σxy − (Σx)(Σy)/n] / [Σx² − (Σx)²/n]

or, equivalently, in terms of sample means,

    b1 = ((xy)‾ − x̄·ȳ) / σx²

where (xy)‾ = (Σxy)/n is the sample mean of the products and σx² = (Σx²)/n − x̄², and

    b0 = ȳ − b1x̄
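A direct translation of these formulas into Python (a minimal sketch using plain lists; the helper name least_squares is our own):

```python
def least_squares(x, y):
    """Return (b0, b1) minimizing the sum of squared residuals."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)
    # b1 = [Σxy - (Σx)(Σy)/n] / [Σx² - (Σx)²/n]
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # b0 = ȳ - b1·x̄
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1
```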


Interpretation of the Slope and the Intercept

- b0 is the estimated average value of y when the value of x is zero.
- b1 is the estimated change in the average value of y as a result of a one-unit change in x.


Example

A real estate agent wishes to examine the relationship between the selling price of a home and its size (measured in square feet). A random sample of 10 houses is selected.

- Dependent variable (y): house price, in $1000s
- Independent variable (x): square feet


Sample Data for House Price Model

House Price in $1000s (y) | Square Feet (x)
245  | 1400
312  | 1600
279  | 1700
308  | 1875
199  | 1100
219  | 1550
405  | 2350
324  | 2450
319  | 1425
255  | 1700
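Running the hypothetical least_squares helper sketched earlier on this sample gives the fitted line used in the computations that follow (values shown to a few decimals):

```python
square_feet = [1400, 1600, 1700, 1875, 1100, 1550, 2350, 2450, 1425, 1700]
price = [245, 312, 279, 308, 199, 219, 405, 324, 319, 255]  # in $1000s

b0, b1 = least_squares(square_feet, price)
print(f"house price = {b0:.4f} + {b1:.4f} * square feet")
# Prints approximately: house price = 98.2483 + 0.1098 * square feet
```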


Least Squares Regression Properties

- The sum of the residuals from the least squares regression line is 0: Σ(y − ŷ) = 0.
- The sum of the squared residuals, Σ(y − ŷ)², is a minimum (minimized).
- The simple regression line always passes through the mean of the y variable and the mean of the x variable: ȳ = b0 + b1x̄.
- The least squares coefficients are unbiased estimates of β0 and β1.
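The first and third properties can be checked numerically; a small sketch, reusing the hypothetical least_squares helper and the house data from above:

```python
y_hat = [b0 + b1 * xi for xi in square_feet]
residuals = [yi - yh for yi, yh in zip(price, y_hat)]

# Property: the residuals sum to zero (up to floating-point error)
assert abs(sum(residuals)) < 1e-8

# Property: the fitted line passes through (x̄, ȳ)
x_bar = sum(square_feet) / len(square_feet)
y_bar = sum(price) / len(price)
assert abs((b0 + b1 * x_bar) - y_bar) < 1e-8
```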



Coefficient of Determination, R²

The coefficient of determination is the portion of the total variation in the dependent variable that is explained by variation in the independent variable. It is also called R-squared and is denoted R², where

    R² = RSS / TSS,   0 ≤ R² ≤ 1

Coefficient of Determination, R²

    R² = RSS / TSS = (sum of squares explained by regression) / (total sum of squares)
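A minimal sketch of the computation; note that in this chapter RSS denotes the regression (explained) sum of squares Σ(ŷ − ȳ)², not the residual sum of squares (the function name is our own):

```python
def r_squared(x, y, b0, b1):
    """R² = RSS / TSS for a fitted line ŷ = b0 + b1*x."""
    y_bar = sum(y) / len(y)
    y_hat = [b0 + b1 * xi for xi in x]
    rss = sum((yh - y_bar) ** 2 for yh in y_hat)  # variation explained by the regression
    tss = sum((yi - y_bar) ** 2 for yi in y)      # total variation in y
    return rss / tss
```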



Examples of Approximate R² Values

R² = 1: perfect linear relationship between x and y; 100% of the variation in y is explained by variation in x.


0 < R² < 1: weaker linear relationship between x and y; some but not all of the variation in y is explained by variation in x.


R² = 0: no linear relationship between x and y; the value of y does not depend on x (none of the variation in y is explained by variation in x).


Computation table for the house price example:

y        | x         | ŷ        | (ŷ − ȳ)²       | (y − ȳ)²
245      | 1400      | 251.8283 | 1202.127       | 1722.25
312      | 1600      | 273.7683 | 162.0962       | 650.25
279      | 1700      | 284.7383 | 3.103587       | 56.25
308      | 1875      | 303.9358 | 304.0071       | 462.25
199      | 1100      | 218.9183 | 4567.286       | 7656.25
219      | 1550      | 268.2833 | 331.8482       | 4556.25
⋮        | ⋮         | ⋮        | ⋮              | ⋮
Σ = 2865 | Σ = 17150 | 2863.838 | RSS = 18911.71 | TSS = 32600.5


Coefficient of determination:

    R² = RSS / TSS = 18911.71 / 32600.5 ≈ 0.58

About 58% of the variation in house price is explained by variation in square footage.

2. Correlation analysis

Correlation is a technique used to measure the strength of the relationship between two variables. The stronger the correlation, the better the regression line fits the data, and vice versa.
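For simple linear regression, R² is simply the square of the sample correlation coefficient r, so correlation strength and goodness of fit are two views of the same quantity. A minimal sketch of computing r directly (the helper name is our own):

```python
import math

def correlation(x, y):
    """Sample (Pearson) correlation coefficient r, with -1 <= r <= 1."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# For the house data above, correlation(square_feet, price) ≈ 0.762,
# and 0.762² ≈ 0.58, matching the R² computed earlier.
```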


Scatter Plot Examples

[Figure: scatter plots contrasting a low degree of correlation with a high degree of correlation.]

