Christian Kleiber and Achim Zeileis (2008), Applied Econometrics with R, Springer-Verlag, New York.
http://cran.r-project.org/web/packages/AER/AER.pdf
Example 1: Cigarette Consumption
This example is taken from Baltagi (Section 3.10, Empirical Example).
Let us install the AER package. This can be done from the package-installation menu of your R front end,
or using the following command:
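The command itself did not survive in this copy of the document. A minimal sketch of the usual approach is given here, assuming the package is installed from CRAN; the `requireNamespace()` guard is an addition for convenience and is not part of the original text.

```r
# Install AER from CRAN only if it is not already available.
# (The guard is an assumption; a plain install.packages("AER") also works.)
if (!requireNamespace("AER", quietly = TRUE)) {
  install.packages("AER")
}
```

Installation is needed only once per R library; loading the package is a separate step in each session.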
Now let us load the “CigarettesB” data from the AER package:
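The code that produced the output that follows is missing here. A sketch that should reproduce it, assuming the standard `library()` and `data()` workflow, is:

```r
library(AER)            # attach AER (and the packages it depends on)
data("CigarettesB")     # load the data frame into the workspace

names(CigarettesB)      # column names of the data set
CigarettesB             # print every row (one per state)
```

The first call lists the three variables and the second prints the full table, which matches the two blocks of output in the document.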
## [1] "packs" "price" "income"
## packs price income
## AL 4.96213 0.20487 4.64039
## AZ 4.66312 0.16640 4.68389
## AR 5.10709 0.23406 4.59435
## CA 4.50449 0.36399 4.88147
## CT 4.66983 0.32149 5.09472
## DE 5.04705 0.21929 4.87087
## DC 4.65637 0.28946 5.05960
## FL 4.80081 0.28733 4.81155
## GA 4.97974 0.12826 4.73299
## ID 4.74902 0.17541 4.64307
## IL 4.81445 0.24806 4.90387
## IN 5.11129 0.08992 4.72916
## IA 4.80857 0.24081 4.74211
## KS 4.79263 0.21642 4.79613
## KY 5.37906 -0.03260 4.64937
## LA 4.98602 0.23856 4.61461
## ME 4.98722 0.29106 4.75501
## MD 4.77751 0.12575 4.94692
## MA 4.73877 0.22613 4.99998
## MI 4.94744 0.23067 4.80620
## MN 4.69589 0.34297 4.81207
## MS 4.93990 0.13638 4.52938
## MO 5.06430 0.08731 4.78189
## MT 4.73313 0.15303 4.70417
## NE 4.77558 0.18907 4.79671
## NV 4.96642 0.32304 4.83816
## NH 5.10990 0.15852 5.00319
## NJ 4.70633 0.30901 5.10268
## NM 4.58107 0.16458 4.58202
## NY 4.66496 0.34701 4.96075
## ND 4.58237 0.18197 4.69163
## OH 4.97952 0.12889 4.75875
## OK 4.72720 0.19554 4.62730
## PA 4.80363 0.22784 4.83516
## RI 4.84693 0.30324 4.84670
## SC 5.07801 0.07944 4.62549
## SD 4.81545 0.13139 4.67747
## TN 5.04939 0.15547 4.72525
## TX 4.65398 0.28196 4.73437
## UT 4.40859 0.19260 4.55586
## VT 5.08799 0.18018 4.77578
## VA 4.93065 0.11818 4.85490
## WA 4.66134 0.35053 4.85645
## WV 4.82454 0.12008 4.56859
## WI 4.83026 0.22954 4.75826
## WY 5.00087 0.10029 4.71169
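We regress consumption (packs) on price using OLS. In CigarettesB both variables are recorded in logarithms, so the slope can be read as a price elasticity: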
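The fitting code is not shown in this copy. The call can be reconstructed from the `Call:` line in the output and from the object name `cig_lm` used later in the document, so the following sketch should be close to the original:

```r
library(AER)
data("CigarettesB")

# Simple linear regression of packs on price, estimated by OLS
cig_lm <- lm(packs ~ price, data = CigarettesB)

summary(cig_lm)   # coefficient table, R-squared, F-statistic
coef(cig_lm)      # just the two estimated coefficients
```

`summary()` yields the full regression table, while `coef()` returns only the named vector of estimates that appears after it.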
##
## Call:
## lm(formula = packs ~ price, data = CigarettesB)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.45472 -0.09968 0.00612 0.11553 0.29346
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.0941 0.0627 81.247 < 2e-16 ***
## price -1.1983 0.2818 -4.253 0.000108 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.163 on 44 degrees of freedom
## Multiple R-squared: 0.2913, Adjusted R-squared: 0.2752
## F-statistic: 18.08 on 1 and 44 DF, p-value: 0.0001085
## (Intercept) price
## 5.094108 -1.198316
cig_lm.sum <- summary(cig_lm)
# The summary of our linear model holds a lot of information;
# we extract the pieces we need from the stored object.
r2  <- cig_lm.sum$r.squared      # R-squared
r2a <- cig_lm.sum$adj.r.squared  # adjusted R-squared
sig <- cig_lm.sum$sigma          # residual standard error
The R-squared value for our linear model is \(R^2 = 0.2912836\).
The adjusted R-squared is \(\bar{R}^2 = 0.2751764\).
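As a quick consistency check, the adjusted value can be recomputed from \(R^2\) with the usual formula \(\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}\). The sketch below uses only numbers printed in the summary; the names `n`, `k` and `r2a_check` are introduced here for illustration.

```r
r2 <- 0.2912836   # R-squared reported in the summary
n  <- 46          # observations: 44 residual df + 2 estimated coefficients
k  <- 1           # number of regressors, excluding the intercept

# Adjusted R-squared penalises R-squared for the degrees of freedom used
r2a_check <- 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(r2a_check)
```

The result agrees with the adjusted R-squared in the summary up to rounding of the inputs.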
From the coefficient estimates printed above, the estimate of the intercept is \(\widehat \beta_0= 5.0941081\), and the estimate of the slope is \(\widehat \beta_1= -1.1983162\).
Consequently, the line of best fit is \[ \widehat {Y_{t}} = 5.0941081 -1.1983162 X_{t}. \] This reports the estimates to more decimal places than is useful; rounding to three decimal places gives \[ \widehat {Y_{t}} = 5.094 -1.198 X_{t}. \]
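The rounding step, and the use of the fitted line for a prediction, can be sketched as follows. The coefficients are copied from the text; the price value `x0` is a hypothetical input chosen for illustration and is not taken from the source.

```r
# Estimates reported above (with the model object one would use coef(cig_lm))
b  <- c(intercept = 5.0941081, slope = -1.1983162)
b3 <- round(b, 3)          # keep three decimal places
print(b3)

# Fitted value at a hypothetical price of x0 = 0.2
x0    <- 0.2
y_hat <- b3[["intercept"]] + b3[["slope"]] * x0
print(y_hat)
```

When the fitted model is available, `round(coef(cig_lm), 3)` gives the same rounded coefficients directly.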
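x <- CigarettesB$price
y <- CigarettesB$packs
# Scatter plot of consumption against price
plot(x, y, pch = 19, cex = 0.6, xlab = 'Price', ylab = 'Consumption (Packs)')
# Overlay the fitted OLS line (intercept and slope taken from the model)
abline(coef(cig_lm), col = 'red')
title('Line of Best Fit for Cigarette Data')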
## 2.5 % 97.5 %
## (Intercept) 4.967747 5.2204696
## price -1.766224 -0.6304087
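The table above has the layout of `confint()` output, that is, 95% confidence intervals for the two coefficients, although the call that produced it is not shown. As a sketch of where those numbers come from, each interval is the estimate plus or minus the \(t\) critical value times the standard error. The inputs below are copied from the regression summary; the standard errors are the rounded printed values, so the result matches only to about three decimal places.

```r
# Estimates and (rounded) standard errors from the summary table
est <- c("(Intercept)" = 5.0941081, price = -1.1983162)
se  <- c("(Intercept)" = 0.0627,    price = 0.2818)
df  <- 44                          # residual degrees of freedom

t_crit <- qt(0.975, df)            # two-sided 95% critical value

# Lower and upper limits: estimate -/+ t_crit * standard error
ci <- cbind("2.5 %"  = est - t_crit * se,
            "97.5 %" = est + t_crit * se)
print(round(ci, 3))
```

With the fitted model in the workspace, `confint(cig_lm)` should return the same intervals at full precision.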