Class Notes in Statistics and Econometrics
Hans G. Ehrbar
Economics Department, University of Utah, 1645 Campus Center
Drive, Salt Lake City UT 84112-9300, U.S.A.
URL: www.econ.utah.edu/ehrbar/ecmet.pdf
E-mail address:
Abstract. This is an attempt to make a carefully argued set of class notes
freely available. The source code for these notes can be downloaded from
www.econ.utah.edu/ehrbar/ecmet-sources.zip. Copyright Hans G. Ehrbar
under the GNU Public License.
Contents
Chapter 1. Preface xxiii
Chapter 2. Probability Fields 1
2.1. The Concept of Probability 1
2.2. Events as Sets 12
2.3. The Axioms of Probability 20
2.4. Objective and Subjective Interpretation of Probability 26
2.5. Counting Rules 28
2.6. Relationships Involving Binomial Coefficients 32
2.7. Conditional Probability 34
2.8. Ratio of Probabilities as Strength of Evidence 45
2.9. Bayes Theorem 48
2.10. Independence of Events 50
2.11. How to Plot Frequency Vectors and Probability Vectors 57
Chapter 3. Random Variables 63
3.1. Notation 63
3.2. Digression about Infinitesimals 64
3.3. Definition of a Random Variable 68
3.4. Characterization of Random Variables 70


3.5. Discrete and Absolutely Continuous Probability Measures 77
3.6. Transformation of a Scalar Density Function 79
3.7. Example: Binomial Variable 82
3.8. Pitfalls of Data Reduction: The Ecological Fallacy 85
3.9. Independence of Random Variables 87
3.10. Location Parameters and Dispersion Parameters of a Random Variable 89
3.11. Entropy 100
Chapter 4. Random Number Generation and Encryption 121
4.1. Alternatives to the Linear Congruential Random Generator 125
4.2. How to test random generators 126
4.3. The Wichmann Hill generator 128
4.4. Public Key Cryptology 133
Chapter 5. Specific Random Variables 139
5.1. Binomial 139
5.2. The Hypergeometric Probability Distribution 146
5.3. The Poisson Distribution 148
5.4. The Exponential Distribution 154
5.5. The Gamma Distribution 158
5.6. The Uniform Distribution 164
5.7. The Beta Distribution 165
5.8. The Normal Distribution 166
5.9. The Chi-Square Distribution 172
5.10. The Lognormal Distribution 174
5.11. The Cauchy Distribution 174
Chapter 6. Sufficient Statistics and their Distributions 179
6.1. Factorization Theorem for Sufficient Statistics 179
6.2. The Exponential Family of Probability Distributions 182
Chapter 7. Chebyshev Inequality, Weak Law of Large Numbers, and Central
Limit Theorem 189

7.1. Chebyshev Inequality 189
7.2. The Probability Limit and the Law of Large Numbers 192
7.3. Central Limit Theorem 195
Chapter 8. Vector Random Variables 199
8.1. Expected Value, Variances, Covariances 203
8.2. Marginal Probability Laws 210
8.3. Conditional Probability Distribution and Conditional Mean 212
8.4. The Multinomial Distribution 216
8.5. Independent Random Vectors 218
8.6. Conditional Expectation and Variance 221
8.7. Expected Values as Predictors 226
8.8. Transformation of Vector Random Variables 235
Chapter 9. Random Matrices 245
9.1. Linearity of Expected Values 245
9.2. Means and Variances of Quadratic Forms in Random Matrices 249
Chapter 10. The Multivariate Normal Probability Distribution 261
10.1. More About the Univariate Case 261
10.2. Definition of Multivariate Normal 264
10.3. Special Case: Bivariate Normal 265
10.4. Multivariate Standard Normal in Higher Dimensions 284
10.5. Higher Moments of the Multivariate Standard Normal 290
10.6. The General Multivariate Normal 299
Chapter 11. The Regression Fallacy 309
Chapter 12. A Simple Example of Estimation 327
12.1. Sample Mean as Estimator of the Location Parameter 327
12.2. Intuition of the Maximum Likelihood Estimator 330
12.3. Variance Estimation and Degrees of Freedom 335
Chapter 13. Estimation Principles and Classification of Estimators 355

13.1. Asymptotic or Large-Sample Properties of Estimators 355
13.2. Small Sample Properties 359
13.3. Comparison Unbiasedness Consistency 362
13.4. The Cramer-Rao Lower Bound 369
13.5. Best Linear Unbiased Without Distribution Assumptions 386
13.6. Maximum Likelihood Estimation 390
13.7. Method of Moments Estimators 396
13.8. M-Estimators 396
13.9. Sufficient Statistics and Estimation 397
13.10. The Likelihood Principle 405
13.11. Bayesian Inference 406
Chapter 14. Interval Estimation 411
Chapter 15. Hypothesis Testing 425
15.1. Duality between Significance Tests and Confidence Regions 433
15.2. The Neyman Pearson Lemma and Likelihood Ratio Tests 434
15.3. The Runs Test 440
15.4. Pearson’s Goodness of Fit Test. 447
15.5. Permutation Tests 453
15.6. The Wald, Likelihood Ratio, and Lagrange Multiplier Tests 465
Chapter 16. General Principles of Econometric Modelling 469
Chapter 17. Causality and Inference 473
Chapter 18. Mean-Variance Analysis in the Linear Model 481
18.1. Three Versions of the Linear Model 481
18.2. Ordinary Least Squares 484
18.3. The Coefficient of Determination 499
18.4. The Adjusted R-Square 509
Chapter 19. Digression about Correlation Coefficients 513
19.1. A Unified Definition of Correlation Coefficients 513

19.2. Correlation Coefficients and the Associated Least Squares Problem 519
19.3. Canonical Correlations 521
19.4. Some Remarks about the Sample Partial Correlation Coefficients 524
Chapter 20. Numerical Methods for computing OLS Estimates 527
20.1. QR Decomposition 527
20.2. The LINPACK Implementation of the QR Decomposition 530
Chapter 21. About Computers 535
21.1. General Strategy 535
21.2. The Emacs Editor 542
21.3. How to Enter and Exit SAS 544
21.4. How to Transfer SAS Data Sets Between Computers 545
21.5. Instructions for Statistics 5969, Hans Ehrbar’s Section 547
21.6. The Data Step in SAS 557
Chapter 22. Specific Datasets 563
22.1. Cobb Douglas Aggregate Production Function 563
22.2. Houthakker’s Data 580
22.3. Long Term Data about US Economy 592
22.4. Dougherty Data 594
22.5. Wage Data 595
Chapter 23. The Mean Squared Error as an Initial Criterion of Precision 629
23.1. Comparison of Two Vector Estimators 630
Chapter 24. Sampling Properties of the Least Squares Estimator 637
24.1. The Gauss Markov Theorem 639
24.2. Digression about Minimax Estimators 643
24.3. Miscellaneous Properties of the BLUE 645
24.4. Estimation of the Variance 666
24.5. Mallow’s Cp-Statistic as Estimator of the Mean Squared Error 668
24.6. Optimality of Variance Estimators 670
Chapter 25. Variance Estimation: Should One Require Unbiasedness? 675

25.1. Setting the Framework Straight 678
25.2. Derivation of the Best Bounded MSE Quadratic Estimator of the
Variance 682
25.3. Unbiasedness Revisited 688
25.4. Summary 692
Chapter 26. Nonspherical Positive Definite Covariance Matrix 695
Chapter 27. Best Linear Prediction 703
27.1. Minimum Mean Squared Error, Unbiasedness Not Required 704
27.2. The Associated Least Squares Problem 717
27.3. Prediction of Future Observations in the Regression Model 720
Chapter 28. Updating of Estimates When More Observations become Available 731
Chapter 29. Constrained Least Squares 737
29.1. Building the Constraint into the Model 738
29.2. Conversion of an Arbitrary Constraint into a Zero Constraint 740
29.3. Lagrange Approach to Constrained Least Squares 742
29.4. Constrained Least Squares as the Nesting of Two Simpler Models 748
29.5. Solution by Quadratic Decomposition 750
29.6. Sampling Properties of Constrained Least Squares 752
29.7. Estimation of the Variance in Constrained OLS 755
29.8. Inequality Restrictions 763
29.9. Application: Biased Estimators and Pre-Test Estimators 764
Chapter 30. Additional Regressors 765
30.1. Selection of Regressors 789
Chapter 31. Residuals: Standardized, Predictive, “Studentized” 795
31.1. Three Decisions about Plotting Residuals 795
31.2. Relationship between Ordinary and Predictive Residuals 800
31.3. Standardization 806
Chapter 32. Regression Diagnostics 813

32.1. Missing Observations 814
32.2. Grouped Data 815
32.3. Influential Observations and Outliers 815
32.4. Sensitivity of Estimates to Omission of One Observation 820
Chapter 33. Regression Graphics 833
33.1. Scatterplot Matrices 834
33.2. Conditional Plots 838
33.3. Spinning 839
33.4. Sufficient Plots 841
Chapter 34. Asymptotic Properties of the OLS Estimator 847
34.1. Consistency of the OLS estimator 850
34.2. Asymptotic Normality of the Least Squares Estimator 852
Chapter 35. Least Squares as the Normal Maximum Likelihood Estimate 855
Chapter 36. Bayesian Estimation in the Linear Model 867
Chapter 37. OLS With Random Constraint 877
Chapter 38. Stein Rule Estimators 883
Chapter 39. Random Regressors 891
39.1. Strongest Assumption: Error Term Well Behaved Conditionally on
Explanatory Variables 892
39.2. Contemporaneously Uncorrelated Disturbances 895
39.3. Disturbances Correlated with Regressors in Same Observation 896
Chapter 40. The Mahalanobis Distance 897
40.1. Definition of the Mahalanobis Distance 898
40.2. The Conditional Mahalanobis Distance 903
40.3. First Scenario: Minimizing relative increase in Mahalanobis distance
if distribution is known 904
40.4. Second Scenario: One Additional IID Observation 906
40.5. Third Scenario: one additional observation in a Regression Model 909

Chapter 41. Interval Estimation 921
41.1. A Basic Construction Principle for Confidence Regions 921
41.2. Coverage Probability of the Confidence Regions 929
41.3. Conventional Formulas for the Test Statistics 931
41.4. Interpretation in terms of Studentized Mahalanobis Distance 932
Chapter 42. Three Principles for Testing a Linear Constraint 941
42.1. Mathematical Detail of the Three Approaches 943
42.2. Examples of Tests of Linear Hypotheses 950
42.3. The F-Test Statistic is a Function of the Likelihood Ratio 966
42.4. Tests of Nonlinear Hypotheses 968
42.5. Choosing Between Nonnested Models 968
Chapter 43. Multiple Comparisons in the Linear Model 971
43.1. Rectangular Confidence Regions 971
43.2. Relation between F-test and t-tests. 978
43.3. Large-Sample Simultaneous Confidence Regions 983
Chapter 44. Sample SAS Regression Output 989
Chapter 45. Flexible Functional Form 997
45.1. Categorical Variables: Regression with Dummies and Factors 998
45.2. Flexible Functional Form for Numerical Variables 1002
45.3. More than One Explanatory Variable: Backfitting 1014
Chapter 46. Transformation of the Response Variable 1019
46.1. Alternating Least Squares and Alternating Conditional Expectations 1020
46.2. Additivity and Variance Stabilizing Transformations (avas) 1027
46.3. Comparing ace and avas 1029
Chapter 47. Density Estimation 1031
47.1. How to Measure the Precision of a Density Estimator 1031
47.2. The Histogram 1032
47.3. The Frequency Polygon 1034
47.4. Kernel Densities 1034

47.5. Transformational Kernel Density Estimators 1036
47.6. Confidence Bands 1036
47.7. Other Approaches to Density Estimation 1036
47.8. Two- and Three-Dimensional Densities 1037
47.9. Other Characterizations of Distributions 1038
47.10. Quantile-Quantile Plots 1038
47.11. Testing for Normality 1042
Chapter 48. Measuring Economic Inequality 1043
48.1. Web Resources about Income Inequality 1043
48.2. Graphical Representations of Inequality 1044
48.3. Quantitative Measures of Income Inequality 1045
48.4. Properties of Inequality Measures 1050
Chapter 49. Distributed Lags 1051
49.1. Geometric lag 1062
49.2. Autoregressive Distributed Lag Models 1063
Chapter 50. Investment Models 1073
50.1. Accelerator Models 1073
50.2. Jorgenson’s Model 1076
50.3. Investment Function Project 1081
Chapter 51. Distinguishing Random Variables from Variables Created by a
Deterministic Chaotic Process 1083
51.1. Empirical Methods: Grassberger-Procaccia Plots. 1086
Chapter 52. Instrumental Variables 1089
Chapter 53. Errors in Variables 1099
53.1. The Simplest Errors-in-Variables Model 1099
53.2. General Definition of the EV Model 1108
53.3. Particular Forms of EV Models 1111
53.4. The Identification Problem 1116

53.5. Properties of Ordinary Least Squares in the EV model 1126
53.6. Kalman’s Critique of Malinvaud 1132
53.7. Estimation if the EV Model is Identified 1146
53.8. P-Estimation 1152
53.9. Estimation When the Error Covariance Matrix is Exactly Known 1165
Chapter 54. Dynamic Linear Models 1169
54.1. Specification and Recursive Solution 1169
54.2. Locally Constant Model 1175
54.3. The Reference Model 1181
54.4. Exchange Rate Forecasts 1186
54.5. Company Market Share 1194
54.6. Productivity in Milk Production 1200
Chapter 55. Numerical Minimization 1207
Chapter 56. Nonlinear Least Squares 1215
56.1. The J Test 1227
56.2. Nonlinear instrumental variables estimation 1230
Chapter 57. Applications of GLS with Nonspherical Covariance Matrix 1233
57.1. Cases when OLS and GLS are identical 1234
57.2. Heteroskedastic Disturbances 1235
57.3. Equicorrelated Covariance Matrix 1238
Chapter 58. Unknown Parameters in the Covariance Matrix 1245
58.1. Heteroskedasticity 1246
58.2. Autocorrelation 1255
58.3. Autoregressive Conditional Heteroskedasticity (ARCH) 1281
Chapter 59. Generalized Method of Moments Estimators 1287
Chapter 60. Bootstrap Estimators 1299
Chapter 61. Random Coefficients 1303
Chapter 62. Multivariate Regression 1313

62.1. Multivariate Econometric Models: A Classification 1313
62.2. Multivariate Regression with Equal Regressors 1315
62.3. Growth Curve Models 1329
Chapter 63. Independent Observations from the Same Multivariate Population 1333
63.1. Notation and Basic Statistics 1333
63.2. Two Geometries 1337
63.3. Assumption of Normality 1339
63.4. EM-Algorithm for Missing Observations 1341
63.5. Wishart Distribution 1347
63.6. Sample Correlation Coefficients 1349
Chapter 64. Pooling of Cross Section and Time Series Data 1353
64.1. OLS Model 1354
64.2. The Between-Estimator 1356
64.3. Dummy Variable Model (Fixed Effects) 1357
64.4. Relation between the three Models so far: 1365
64.5. Variance Components Model (Random Effects) 1365
Chapter 65. Disturbance Related (Seemingly Unrelated) Regressions 1375
65.1. The Supermatrix Representation 1376
65.2. The Likelihood Function 1380
65.3. Concentrating out the Covariance Matrix (Incomplete) 1386
65.4. Situations in which OLS is Best 1389
65.5. Unknown Covariance Matrix 1395
Chapter 66. Simultaneous Equations Systems 1397
66.1. Examples 1397
66.2. General Mathematical Form 1405
66.3. Indirect Least Squares 1414
66.4. Instrumental Variables (2SLS) 1416
66.5. Identification 1419
66.6. Other Estimation Methods 1424

Chapter 67. Timeseries Analysis 1435
67.1. Covariance Stationary Timeseries 1435
67.2. Vector Autoregressive Processes 1450
67.3. Nonstationary Processes 1460
67.4. Cointegration 1464
Chapter 68. Seasonal Adjustment 1467
68.1. Methods of Seasonal Adjustment 1472
68.2. Seasonal Dummies in a Regression 1474
Chapter 69. Binary Choice Models 1487
69.1. Fisher’s Scoring and Iteratively Reweighted Least Squares 1487
69.2. Binary Dependent Variable 1489
69.3. The Generalized Linear Model 1495
Chapter 70. Multiple Choice Models 1499
Appendix A. Matrix Formulas 1501
A.1. A Fundamental Matrix Decomposition 1501
A.2. The Spectral Norm of a Matrix 1503
A.3. Inverses and g-Inverses of Matrices 1504
A.4. Deficiency Matrices 1506
A.5. Nonnegative Definite Symmetric Matrices 1514
A.6. Projection Matrices 1523
A.7. Determinants 1528
A.8. More About Inverses 1530
A.9. Eigenvalues and Singular Value Decomposition 1537
Appendix B. Arrays of Higher Rank 1543
B.1. Informal Survey of the Notation 1544
B.2. Axiomatic Development of Array Operations 1549
B.3. An Additional Notational Detail 1559
B.4. Equality of Arrays and Extended Substitution 1561

B.5. Vectorization and Kronecker Product 1562
Appendix C. Matrix Differentiation 1583
C.1. First Derivatives 1583
Appendix. Bibliography 1597
CHAPTER 1
Preface
These are class notes from several different graduate econometrics and statistics
classes. In Spring 2000 they were used for Statistics 6869, syllabus on p. ??, and
in Fall 2000 for Economics 7800, syllabus on p. ??. The notes give a careful and
complete mathematical treatment intended to be accessible also to a reader
inexperienced in math. There are 618 exercise questions, almost all with answers. The
R-package ecmet has many of the datasets and R-functions needed in the examples.
Page 547 gives instructions on how to download it.
Here are some features by which these notes may differ from other teaching
material available:
• A typographical distinction is made between random variables and the
values taken by them (page 63).
• Best linear prediction of jointly distributed random variables is given as a
second basic building block next to the least squares model (chapter 27).
• Appendix A gives a collection of general matrix formulas in which the
g-inverse is used extensively.
• The “deficiency matrix,” which gives an algebraic representation of the null
space of a matrix, is defined and discussed in Appendix A.4.
• A molecule-like notation for concatenation of higher-dimensional arrays is
introduced in Appendix B and used occasionally, see (10.5.7), (64.3.2),
(65.0.18).
Other unusual treatments can be found in chapters/sections 3.11, 18.3, 25, 29, 40, 36,
41–42, and 64. There are a number of plots of density functions, confidence ellipses,
and other graphs which use the full precision of TeX, and more will be added in the
future. Some chapters are carefully elaborated, while others are still under
construction. In some topics covered in these notes I am an expert, in others I am
still a beginner.
This edition also includes a number of comments from a critical realist
perspective, inspired by [Bha78] and [Bha93]; see also [Law89]. There are many
situations in the teaching of probability theory and statistics where the concepts of
totality, transfactual efficacy, etc., can and should be used. These comments are still
at an experimental stage, and the students are not required to know them for the
exams. In the on-line version of the notes they are printed in a different color.
After some more cleaning up of the code, I am planning to make the AMS-LaTeX
source files for these notes publicly available under the GNU public license, and
upload them to the TeX archive network CTAN. Since I am using Debian GNU/Linux,
the materials will also be available as a deb archive.
The most up-to-date version will always be posted at the web site of the Economics
Department of the University of Utah, www.econ.utah.edu/ehrbar/ecmet.pdf.
You can contact me by email at
Hans Ehrbar
