Chapter 7
library(wooldridge)
## Warning: package 'wooldridge' was built under R version 4.2.3
data <- wooldridge::sleep75
head(sleep75)
## age black case clerical construc educ earns74 gdhlth inlf leis1 leis2 leis3
## 1 32 0 1 0 0 12 0 0 1 3529 3479 3479
## 2 31 0 2 0 0 14 9500 1 1 2140 2140 2140
## 3 44 0 3 0 0 17 42500 1 1 4595 4505 4227
## 4 30 0 4 0 0 12 42500 1 1 3211 3211 3211
## 5 64 0 5 0 0 14 2500 1 1 4052 4007 4007
## 6 41 0 6 0 0 12 0 1 1 4812 4797 4797
## smsa lhrwage lothinc male marr prot rlxall selfe sleep slpnaps south
## 1 0 1.955861 10.075380 1 1 1 3163 0 3113 3163 0
## 2 0 0.357674 0.000000 1 0 1 2920 1 2920 2920 1
## 3 1 3.021887 0.000000 1 1 0 3038 1 2670 2760 0
## 4 0 2.263844 0.000000 0 1 1 3083 1 3083 3083 0
## 5 0 1.011601 9.328213 1 1 1 3493 0 3448 3493 0
## 6 0 2.957511 10.657280 1 1 1 4078 0 4063 4078 0
## spsepay spwrk75 totwrk union worknrm workscnd exper yngkid yrsmarr hrwage
## 1 0 0 3438 0 3438 0 14 0 13 7.070004
## 2 0 0 5020 0 5020 0 11 0 0 1.429999
## 3 20000 1 2815 0 2815 0 21 0 0 20.529997
## 4 5000 1 3786 0 3786 0 12 0 12 9.619998
## 5 2400 1 2580 0 2580 0 44 0 33 2.750000
## 6 0 0 1205 0 0 1205 23 0 23 19.249998
## agesq
## 1 1024
## 2 961
## 3 1936
## 4 900
## 5 4096
## 6 1681
Regression model
model1 <- lm(sleep ~ totwrk + educ + age + agesq + male , data = data)
summary(model1)
##
## Call:
## lm(formula = sleep ~ totwrk + educ + age + agesq + male, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2378.00 -243.29 6.74 259.24 1350.19
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3840.83197 235.10870 16.336 <2e-16 ***
## totwrk -0.16342 0.01813 -9.013 <2e-16 ***
## educ -11.71332 5.86689 -1.997 0.0463 *
## age -8.69668 11.20746 -0.776 0.4380
## agesq 0.12844 0.13390 0.959 0.3378
## male 87.75243 34.32616 2.556 0.0108 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 417.7 on 700 degrees of freedom
## Multiple R-squared: 0.1228, Adjusted R-squared: 0.1165
## F-statistic: 19.59 on 5 and 700 DF, p-value: < 2.2e-16
Model 1 above tests whether total minutes worked per week (totwrk), education, age, the square of age, and gender (male) help explain minutes of sleep per week. Here totwrk, educ, and male are statistically significant at the 5% level, since their p-values are below 0.05; age and agesq are not. The coefficients are interpreted below.
1. The coefficient on totwrk is -0.16342: each additional minute of work per week is associated with about 0.16 fewer minutes of sleep per week, holding the other variables fixed.
2. The coefficient on educ is -11.71: each additional year of education is associated with 11.71 fewer minutes of sleep per week.
3. The coefficient on age is -8.70, but since agesq is also in the model, the marginal effect of age is -8.70 + 2(0.128)·age; neither age term is individually significant.
4. The coefficient on the male dummy is 87.75: holding the other factors fixed, men sleep about 87.75 more minutes per week than women.
Question i). Is there evidence that men sleep more than women?
The coefficient on the male dummy is 87.75 with a standard error of 34.33. To answer the question, compute the t statistic for the hypotheses:
H(0): the coefficient on male is zero.
H(1): the coefficient on male is not zero.
t = 87.75/34.33 = 2.56
Degrees of freedom = 706 - 6 = 700 (the residual degrees of freedom in the summary).
The two-sided p-value is about 0.011, below 0.05, so the result is significant at the 5% level: there is evidence that men sleep more than women, holding the other factors fixed.
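A quick sketch checking this calculation in R, using the rounded values from the summary above:
t_stat <- 87.75243 / 34.32616          # coefficient / standard error
df <- 706 - 6                          # residual degrees of freedom (700)
p_value <- 2 * pt(-abs(t_stat), df)    # two-sided p-value
c(t = t_stat, p = p_value)             # ~2.556 and ~0.0108, matching the summary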
Question ii). Is there a statistically significant tradeoff between working and sleeping? What is the estimated tradeoff?
The coefficient on totwrk is -0.163 with a standard error of 0.018.
t = -0.163/0.018 = -9.01
The p-value is far below 0.05 (the summary reports < 2e-16), so there is a statistically significant tradeoff between working and sleeping. The estimated tradeoff is that each additional minute of work per week reduces predicted sleep by about 0.163 minutes, i.e., roughly 10 minutes of sleep per additional hour of work.
Question iii).
model3 <- lm(sleep ~ age , data = data)
summary(model3)
##
## Call:
## lm(formula = sleep ~ age, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2455.35 -254.39 9.55 270.77 1381.96
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3128.913 59.468 52.615 <2e-16 ***
## age 3.541 1.471 2.408 0.0163 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 442.9 on 704 degrees of freedom
## Multiple R-squared: 0.008167, Adjusted R-squared: 0.006758
## F-statistic: 5.797 on 1 and 704 DF, p-value: 0.01631
The coefficient on age is 3.541: in this simple regression, each additional year of age is associated with about 3.5 more minutes of sleep per week. Age is significant at the 5% level, since its p-value of 0.0163 is below 0.05. Note that the sign flips relative to the multiple regression above, where age was insignificant once work, education, and gender were controlled for.
Chapter 7 - Question 3
library(wooldridge)
data1 <- wooldridge::gpa2
head(data1)
## sat tothrs colgpa athlete verbmath hsize hsrank hsperc female white black
## 1 920 43 2.04 1 0.48387 0.10 4 40.00000 1 0 0
## 2 1170 18 4.00 0 0.82813 9.40 191 20.31915 0 1 0
## 3 810 14 1.78 1 0.88372 1.19 42 35.29412 0 1 0
## 4 940 40 2.42 0 0.80769 5.71 252 44.13310 0 1 0
## 5 1180 18 2.61 0 0.73529 2.14 86 40.18692 0 1 0
## 6 980 114 3.03 0 0.81481 2.68 41 15.29851 1 1 0
## hsizesq
## 1 0.0100
## 2 88.3600
## 3 1.4161
## 4 32.6041
## 5 4.5796
## 6 7.1824
model4<- lm(sat ~ hsize + hsizesq + female + black + female*black , data = data1)
summary(model4)
##
## Call:
## lm(formula = sat ~ hsize + hsizesq + female + black + female *
## black, data = data1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -570.45 -89.54 -5.24 85.41 479.13
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1028.0972 6.2902 163.445 < 2e-16 ***
## hsize 19.2971 3.8323 5.035 4.97e-07 ***
## hsizesq -2.1948 0.5272 -4.163 3.20e-05 ***
## female -45.0915 4.2911 -10.508 < 2e-16 ***
## black -169.8126 12.7131 -13.357 < 2e-16 ***
## female:black 62.3064 18.1542 3.432 0.000605 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 133.4 on 4131 degrees of freedom
## Multiple R-squared: 0.08578, Adjusted R-squared: 0.08468
## F-statistic: 77.52 on 5 and 4131 DF, p-value: < 2.2e-16
Question i).
Is there strong evidence that hsize² should be included in the model? From this equation, what is the optimal high school size?
All variables in the equation are statistically significant, since every p-value is below 0.05.
To confirm that hsizesq should be included in the model, test its coefficient:
t = -2.19/0.53 = -4.16
Degrees of freedom = 4137 - 6 = 4131 (the residual degrees of freedom reported in the summary).
The p-value is about 0.00003, far below 0.05, so hsizesq is statistically significant and should stay in the model. The optimal high school size implied by the quadratic is computed below.
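The optimal size sets the derivative of the quadratic, 19.30 - 2(2.19)·hsize, to zero. A quick sketch computing it directly from the fitted model (in GPA2, hsize is the size of the graduating class in hundreds):
hsize_opt <- -coef(model4)["hsize"] / (2 * coef(model4)["hsizesq"])
hsize_opt   # about 4.40, i.e., an optimal graduating class of roughly 440 students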
Question ii).
Holding hsize fixed, what is the estimated difference in SAT score between nonblack females and nonblack males? How statistically significant is this estimated difference?
For nonblack students, black = 0, so the female:black interaction drops out, and the estimated difference between nonblack females and nonblack males is simply the coefficient on female: -45.09 SAT points, holding hsize fixed.
To test this difference, use the t statistic for the female coefficient:
t_stat <- -45.0915/4.2911
df <- 4131
p_value <- 2 * (1 - pt(abs(t_stat), df))
p_value
## [1] 0
The t statistic is about -10.5 and the p-value is zero to machine precision (the regression output reports < 2e-16), so the estimated difference is highly statistically significant.
Question iii).
What is the estimated difference in SAT score between nonblack males and black males? Test the null hypothesis that there is no difference between their scores, against the alternative that there is a difference.
Since female = 0 for males, the estimated difference between black males and nonblack males is the coefficient on black: -169.81 SAT points.
t_stat1 <- 169.81/12.71
df <- 4131
p_value1 <- 2 * (1 - pt(abs(t_stat1), df))
p_value1
## [1] 0
The p-value is zero to machine precision (the summary reports < 2e-16), so we strongly reject the null hypothesis of no difference between their scores.
Question iv).
What is the estimated difference in SAT score between black females and nonblack females? What would you need to do to test whether the difference is statistically significant?
The estimated difference between black females and nonblack females is −169.81 (black) + 62.31 (female:black) = −107.50 SAT points.
To test whether this difference is statistically significant, the reported standard errors are not enough: the variance of a sum of two coefficient estimates also depends on their covariance. One way is an F test of the single linear restriction:
library(car)
linearHypothesis(model4, "black + female:black = 0")
Equivalently, one could redefine the base group (for example, make black females the omitted category) so the difference appears directly as one coefficient with its own standard error.
Chapter 7 - Question C1
data2 <- wooldridge::gpa1
head(data2)
## age soph junior senior senior5 male campus business engineer colGPA hsGPA ACT
## 1 21 0 0 1 0 0 0 1 0 3.0 3.0 21
## 2 21 0 0 1 0 0 0 1 0 3.4 3.2 24
## 3 20 0 1 0 0 0 0 1 0 3.0 3.6 26
## 4 19 1 0 0 0 1 1 1 0 3.5 3.5 27
## 5 20 0 1 0 0 0 0 1 0 3.6 3.9 28
## 6 20 0 0 1 0 1 1 1 0 3.0 3.4 25
## job19 job20 drive bike walk voluntr PC greek car siblings bgfriend clubs
## 1 0 1 1 0 0 0 0 0 1 1 0 0
## 2 0 1 1 0 0 0 0 0 1 0 1 1
## 3 1 0 0 0 1 0 0 0 1 1 0 1
## 4 1 0 0 0 1 0 0 0 0 1 0 0
## 5 0 1 0 1 0 0 0 0 1 1 1 0
## 6 0 0 0 0 1 0 0 0 1 1 0 0
## skipped alcohol gradMI fathcoll mothcoll
## 1 2 1.0 1 0 0
## 2 0 1.0 1 1 1
## 3 0 1.0 1 1 1
## 4 0 0.0 0 0 0
## 5 0 1.5 1 1 0
## 6 0 0.0 0 1 0
Question i).
Add the variables mothcoll and fathcoll to the equation estimated in (7.6) and report the results in the usual form. What happens to the estimated effect of PC ownership? Is PC still statistically significant?
model5<- lm(colGPA ~ PC + hsGPA + ACT , data = data2)
summary(model5)
##
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT, data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7901 -0.2622 -0.0107 0.2334 0.7570
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.263520 0.333125 3.793 0.000223 ***
## PC 0.157309 0.057287 2.746 0.006844 **
## hsGPA 0.447242 0.093647 4.776 4.54e-06 ***
## ACT 0.008659 0.010534 0.822 0.412513
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3325 on 137 degrees of freedom
## Multiple R-squared: 0.2194, Adjusted R-squared: 0.2023
## F-statistic: 12.83 on 3 and 137 DF, p-value: 1.932e-07
model6<- lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll , data = data2)
summary(model6)
##
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll,
## data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.78149 -0.25726 -0.02121 0.24691 0.74432
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.255554 0.335392 3.744 0.000268 ***
## PC 0.151854 0.058716 2.586 0.010762 *
## hsGPA 0.450220 0.094280 4.775 4.61e-06 ***
## ACT 0.007724 0.010678 0.723 0.470688
## mothcoll -0.003758 0.060270 -0.062 0.950376
## fathcoll 0.041800 0.061270 0.682 0.496265
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3344 on 135 degrees of freedom
## Multiple R-squared: 0.2222, Adjusted R-squared: 0.1934
## F-statistic: 7.713 on 5 and 135 DF, p-value: 2.083e-06
PC is still significant: its p-value was 0.006844 before adding mothcoll and fathcoll and 0.010762 after, both below 0.05. The coefficient on PC changes only slightly, from 0.157 to 0.152, so the estimated effect of PC ownership is essentially unchanged.
Question ii). Test for joint significance of mothcoll and fathcoll in the equation from part (i) and be sure to report the p-value.
library(car)
## Warning: package 'car' was built under R version 4.2.3
## Loading required package: carData
## Warning: package 'carData' was built under R version 4.2.3
hypotheses <- c("mothcoll=0", "fathcoll=0")
joint_test <- linearHypothesis(model6, hypotheses)
joint_test
## Linear hypothesis test
##
## Hypothesis:
## mothcoll = 0
## fathcoll = 0
##
## Model 1: restricted model
## Model 2: colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll
##
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 137 15.149
## 2 135 15.094 2 0.054685 0.2446 0.7834
The p-value of the joint test is 0.7834, so mothcoll and fathcoll are jointly insignificant: we fail to reject the null that both coefficients are zero.
Question iii). Add hsGPA² to the model from part (i) and decide whether this generalization is needed.
model7<- lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll + I(hsGPA^2) , data = data2)
summary(model7)
##
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll +
## I(hsGPA^2), data = data2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.78998 -0.24327 -0.00648 0.26179 0.72231
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.040328 2.443038 2.063 0.0410 *
## PC 0.140446 0.058858 2.386 0.0184 *
## hsGPA -1.802520 1.443552 -1.249 0.2140
## ACT 0.004786 0.010786 0.444 0.6580
## mothcoll 0.003091 0.060110 0.051 0.9591
## fathcoll 0.062761 0.062401 1.006 0.3163
## I(hsGPA^2) 0.337341 0.215711 1.564 0.1202
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3326 on 134 degrees of freedom
## Multiple R-squared: 0.2361, Adjusted R-squared: 0.2019
## F-statistic: 6.904 on 6 and 134 DF, p-value: 2.088e-06
The added term I(hsGPA^2) has a p-value of 0.12, so it is not statistically significant at conventional levels; the quadratic generalization is not needed.
Chapter 7 - Question C2
Question i).
Holding other factors fixed, what is the approximate difference in monthly salary between blacks and nonblacks? Is this difference statistically significant?
data3 <- wooldridge::wage2
head(data3)
## wage hours IQ KWW educ exper tenure age married black south urban sibs
## 1 769 40 93 35 12 11 2 31 1 0 0 1 1
## 2 808 50 119 41 18 11 16 37 1 0 0 1 1
## 3 825 40 108 46 14 11 9 33 1 0 0 1 1
## 4 650 40 96 32 12 13 7 32 1 0 0 1 4
## 5 562 40 74 27 11 14 5 34 1 0 0 1 10
## 6 1400 40 116 43 16 14 2 35 1 1 0 1 1
## brthord meduc feduc lwage
## 1 2 8 8 6.645091
## 2 NA 14 14 6.694562
## 3 2 14 14 6.715384
## 4 3 12 12 6.476973
## 5 6 6 11 6.331502
## 6 2 8 NA 7.244227
model7 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban, data = data3)
summary(model7)
##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black +
## south + urban, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.98069 -0.21996 0.00707 0.24288 1.22822
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.395497 0.113225 47.653 < 2e-16 ***
## educ 0.065431 0.006250 10.468 < 2e-16 ***
## exper 0.014043 0.003185 4.409 1.16e-05 ***
## tenure 0.011747 0.002453 4.789 1.95e-06 ***
## married 0.199417 0.039050 5.107 3.98e-07 ***
## black -0.188350 0.037667 -5.000 6.84e-07 ***
## south -0.090904 0.026249 -3.463 0.000558 ***
## urban 0.183912 0.026958 6.822 1.62e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared: 0.2526, Adjusted R-squared: 0.2469
## F-statistic: 44.75 on 7 and 927 DF, p-value: < 2.2e-16
Answer i). Holding other factors fixed, the coefficient on black of -0.188 implies that black men earn about 18.8 log points (roughly 17%) less per month than comparable nonblack men. The difference is highly statistically significant (t = -5.0, with a p-value far below 0.05).
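Because the dependent variable is log(wage), the coefficient is a log-point difference; a quick sketch recovering the exact percentage difference:
pct_diff <- 100 * (exp(coef(model7)["black"]) - 1)
pct_diff   # about -17.2%, versus the -18.8% log-point approximation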
Question ii).
Add the variables exper² and tenure² to the equation and show that they are jointly insignificant at even the 20% level.
model8 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + I(exper^2) + I(tenure^2), data = data3)
summary(model8)
##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black +
## south + urban + I(exper^2) + I(tenure^2), data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.98236 -0.21972 -0.00036 0.24078 1.25127
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.3586756 0.1259143 42.558 < 2e-16 ***
## educ 0.0642761 0.0063115 10.184 < 2e-16 ***
## exper 0.0172146 0.0126138 1.365 0.172665
## tenure 0.0249291 0.0081297 3.066 0.002229 **
## married 0.1985470 0.0391103 5.077 4.65e-07 ***
## black -0.1906636 0.0377011 -5.057 5.13e-07 ***
## south -0.0912153 0.0262356 -3.477 0.000531 ***
## urban 0.1854241 0.0269585 6.878 1.12e-11 ***
## I(exper^2) -0.0001138 0.0005319 -0.214 0.830622
## I(tenure^2) -0.0007964 0.0004710 -1.691 0.091188 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3653 on 925 degrees of freedom
## Multiple R-squared: 0.255, Adjusted R-squared: 0.2477
## F-statistic: 35.17 on 9 and 925 DF, p-value: < 2.2e-16
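The summary above only shows the individual t statistics; the question asks for a joint test of the two quadratic terms. A minimal sketch using car:
library(car)   # for linearHypothesis()
linearHypothesis(model8, c("I(exper^2) = 0", "I(tenure^2) = 0"))
The resulting Pr(>F) is above 0.20, so exper² and tenure² are jointly insignificant even at the 20% level.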
Question iii).
Extend the original model to allow the return to education to depend on race, and test whether the return to education does depend on race.
model9 <- lm(log(wage) ~ educ + exper + tenure + married + south + urban + educ * black, data = data3)
summary(model9)
##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + south +
## urban + educ * black, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.97782 -0.21832 0.00475 0.24136 1.23226
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.374817 0.114703 46.859 < 2e-16 ***
## educ 0.067115 0.006428 10.442 < 2e-16 ***
## exper 0.013826 0.003191 4.333 1.63e-05 ***
## tenure 0.011787 0.002453 4.805 1.80e-06 ***
## married 0.198908 0.039047 5.094 4.25e-07 ***
## south -0.089450 0.026277 -3.404 0.000692 ***
## urban 0.183852 0.026955 6.821 1.63e-11 ***
## black 0.094809 0.255399 0.371 0.710561
## educ:black -0.022624 0.020183 -1.121 0.262603
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3654 on 926 degrees of freedom
## Multiple R-squared: 0.2536, Adjusted R-squared: 0.2471
## F-statistic: 39.32 on 8 and 926 DF, p-value: < 2.2e-16
Answer: The return to education does not appear to depend on race: the coefficient on educ:black has a p-value of 0.263, which is greater than 0.05, so we fail to reject the null that the return to education is the same for blacks and nonblacks.
Question iv).
Again, start with the original model, but now allow wages to differ across four groups of people: married and black, married and nonblack, single and black, and single and nonblack. What is the estimated wage differential between married blacks and married nonblacks?
model10 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + married*black, data = data3)
summary(model10)
##
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black +
## south + urban + married * black, data = data3)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.98013 -0.21780 0.01057 0.24219 1.22889
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.403793 0.114122 47.351 < 2e-16 ***
## educ 0.065475 0.006253 10.471 < 2e-16 ***
## exper 0.014146 0.003191 4.433 1.04e-05 ***
## tenure 0.011663 0.002458 4.745 2.41e-06 ***
## married 0.188915 0.042878 4.406 1.18e-05 ***
## black -0.240820 0.096023 -2.508 0.012314 *
## south -0.091989 0.026321 -3.495 0.000497 ***
## urban 0.184350 0.026978 6.833 1.50e-11 ***
## married:black 0.061354 0.103275 0.594 0.552602
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3656 on 926 degrees of freedom
## Multiple R-squared: 0.2528, Adjusted R-squared: 0.2464
## F-statistic: 39.17 on 8 and 926 DF, p-value: < 2.2e-16
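The answer can be read off the coefficients: for married workers, the black/nonblack log-wage differential is the coefficient on black plus the coefficient on married:black. A quick sketch:
diff_married <- coef(model10)["black"] + coef(model10)["married:black"]
diff_married   # -0.2408 + 0.0614 = -0.1795
So married blacks are estimated to earn about 17.9 log points (roughly 18%) less than married nonblacks, holding the other factors fixed.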
Chapter 8
Chapter 8 - Question 1
Which of the following are consequences of heteroskedasticity?
(i) The OLS estimators, bj, are inconsistent.
(ii) The usual F statistic no longer has an F distribution.
(iii) The OLS estimators are no longer BLUE.
Answer:
(i) False. Heteroskedasticity does not cause bias or inconsistency in the OLS estimators; consistency depends on the zero conditional mean assumption, which heteroskedasticity does not violate. What breaks down are the usual variance formulas.
(ii) True. Because the usual standard errors are invalid under heteroskedasticity, the usual F statistic no longer has an F distribution (and the usual t statistics no longer have t distributions), so the standard tests can give misleading inference.
(iii) True. Homoskedasticity is one of the Gauss-Markov assumptions, so in its absence the OLS estimators are no longer BLUE. Weighted least squares (WLS) or feasible GLS can deliver more efficient estimates.
Chapter 8 - Question 5
data4 <- wooldridge::smoke
head(data4)
## educ cigpric white age income cigs restaurn lincome agesq lcigpric
## 1 16.0 60.506 1 46 20000 0 0 9.903487 2116 4.102743
## 2 16.0 57.883 1 40 30000 0 0 10.308952 1600 4.058424
## 3 12.0 57.664 1 58 30000 3 0 10.308952 3364 4.054633
## 4 13.5 57.883 1 30 20000 0 0 9.903487 900 4.058424
## 5 10.0 58.320 1 17 20000 0 0 9.903487 289 4.065945
## 6 6.0 59.340 1 86 6500 0 0 8.779557 7396 4.083283
model11 <- lm(cigs ~ log(cigpric) + log(income) + educ + age + agesq + restaurn + white , data = data4)
summary(model11)
##
## Call:
## lm(formula = cigs ~ log(cigpric) + log(income) + educ + age +
## agesq + restaurn + white, data = data4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -15.772 -9.330 -5.907 7.945 70.275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.682419 24.220729 -0.111 0.91184
## log(cigpric) -0.850907 5.782321 -0.147 0.88305
## log(income) 0.869014 0.728763 1.192 0.23344
## educ -0.501753 0.167168 -3.001 0.00277 **
## age 0.774502 0.160516 4.825 1.68e-06 ***
## agesq -0.009069 0.001748 -5.188 2.70e-07 ***
## restaurn -2.865621 1.117406 -2.565 0.01051 *
## white -0.559236 1.459461 -0.383 0.70169
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.41 on 799 degrees of freedom
## Multiple R-squared: 0.05291, Adjusted R-squared: 0.04461
## F-statistic: 6.377 on 7 and 799 DF, p-value: 2.588e-07
Question i). Are there any important differences between the two sets of standard errors?
The two sets of standard errors (the usual OLS errors and the heteroskedasticity-robust errors) are very similar; no variable's significance changes. The usual standard errors are printed below, and a robust set is computed after them for comparison.
s_error <- coef(summary(model11))
s_error
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.682418774 24.220728831 -0.1107489 9.118433e-01
## log(cigpric) -0.850907441 5.782321084 -0.1471567 8.830454e-01
## log(income) 0.869013971 0.728763480 1.1924499 2.334389e-01
## educ -0.501753247 0.167167689 -3.0014966 2.770186e-03
## age 0.774502156 0.160515805 4.8250835 1.676279e-06
## agesq -0.009068603 0.001748055 -5.1878253 2.699355e-07
## restaurn -2.865621212 1.117405936 -2.5645301 1.051320e-02
## white -0.559236403 1.459461034 -0.3831801 7.016882e-01
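A minimal sketch of the robust comparison, using the sandwich estimator (these packages are loaded later in this document; they are repeated here so the chunk stands alone):
library(lmtest)
library(sandwich)
coeftest(model11, vcov = vcovHC(model11, type = "HC0"))   # White-robust standard errors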
Question ii). Holding other factors fixed, if education increases by four years, what happens to the estimated probability of smoking?
edu_4 <- coef(model11)["educ"] * 4
edu_4
## educ
## -2.007013
Holding other factors fixed, four more years of education is associated with about 2.0 fewer cigarettes smoked per day (4 × -0.502). Note that the dependent variable in model11 is cigs, the number of cigarettes smoked per day, so this is a change in the amount smoked rather than in the probability of smoking.
Question iii).
At what point does another year of age reduce the probability of smoking?
age_turn <- -coef(model11)["age"] / (2 * coef(model11)["agesq"])   # the quadratic term is named agesq in this model
age_turn
The quadratic in age peaks at -b_age / (2·b_agesq) = 0.7745 / (2 × 0.009069) ≈ 42.7, so beyond roughly age 43 another year of age reduces the predicted number of cigarettes smoked.
Question (iv) Interpret the coefficient on the binary variable restaurn (a dummy variable equal to one if the person lives in a state with restaurant smoking restrictions).
coef_res <- coef(model11)["restaurn"]
coef_res
## restaurn
## -2.865621
Living in a state with restaurant smoking restrictions is associated with about 2.87 fewer cigarettes smoked per day, holding the other factors fixed. The effect is statistically significant at the 5% level (p = 0.0105).
Question (v) Person number 206 in the data set has the following characteristics: cigpric = 67.44, income = 6,500, educ = 16, age = 77, restaurn = 0, white = 0, and smokes = 0. Compute the predicted probability of smoking for this person and comment on the result.
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.2.3
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.2.3
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
data_update <- data.frame(cigpric = 67.44, income = 6500, educ = 16, age = 77, agesq = 77^2, restaurn = 0, white = 0)
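A minimal sketch of the prediction itself, using the row defined above (the Breusch-Pagan test that follows is kept as supplementary evidence on heteroskedasticity):
predict(model11, newdata = data_update)   # roughly -0.8 cigarettes per day
Plugging the rounded coefficients in by hand also gives about -0.8: a negative, and therefore impossible, number of cigarettes. Since person 206 actually smokes zero cigarettes the prediction is at least close to the truth, but it illustrates a weakness of linear models for non-negative outcomes.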
het_test <- bptest(model11)
het_test
##
## studentized Breusch-Pagan test
##
## data: model11
## BP = 32.377, df = 7, p-value = 3.458e-05
Chapter 8 - Question C4
data5 <- wooldridge::vote1
head(data5)
## state district democA voteA expendA expendB prtystrA lexpendA lexpendB
## 1 AL 7 1 68 328.296 8.737 41 5.793916 2.167567
## 2 AK 1 0 62 626.377 402.477 60 6.439952 5.997638
## 3 AZ 2 1 73 99.607 3.065 55 4.601233 1.120048
## 4 AZ 3 0 69 319.690 26.281 64 5.767352 3.268846
## 5 AR 3 0 75 159.221 60.054 66 5.070293 4.095244
## 6 AR 4 1 69 570.155 21.393 46 6.345908 3.063064
## shareA
## 1 97.40767
## 2 60.88104
## 3 97.01476
## 4 92.40370
## 5 72.61247
## 6 96.38355
Question i). Estimate a model with voteA as the dependent variable and prtystrA, democA, log(expendA), and log(expendB) as independent variables. Obtain the OLS residuals, û_i, and regress these on all of the independent variables. Explain why you obtain R² = 0.
model12 <- lm(voteA ~ prtystrA + democA + log(expendA) + log(expendB), data = data5)
summary(model12)
##
## Call:
## lm(formula = voteA ~ prtystrA + democA + log(expendA) + log(expendB),
## data = data5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.576 -4.864 -1.146 4.903 24.566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.66141 4.73604 7.952 2.56e-13 ***
## prtystrA 0.25192 0.07129 3.534 0.00053 ***
## democA 3.79294 1.40652 2.697 0.00772 **
## log(expendA) 5.77929 0.39182 14.750 < 2e-16 ***
## log(expendB) -6.23784 0.39746 -15.694 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared: 0.8012, Adjusted R-squared: 0.7964
## F-statistic: 169.2 on 4 and 168 DF, p-value: < 2.2e-16
residuals <- residuals(model12)
residuals_model <- lm(residuals ~ prtystrA + democA + log(expendA) + log(expendB), data = data5)
summary(residuals_model)
##
## Call:
## lm(formula = residuals ~ prtystrA + democA + log(expendA) + log(expendB),
## data = data5)
##
## Residuals:
## Min 1Q Median 3Q Max
## -18.576 -4.864 -1.146 4.903 24.566
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.705e-15 4.736e+00 0 1
## prtystrA 1.563e-16 7.129e-02 0 1
## democA -4.577e-16 1.407e+00 0 1
## log(expendA) -3.685e-16 3.918e-01 0 1
## log(expendB) -5.594e-16 3.975e-01 0 1
##
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared: 5.211e-32, Adjusted R-squared: -0.02381
## F-statistic: 2.189e-30 on 4 and 168 DF, p-value: 1
Answer i). The R-squared is exactly zero by construction: the OLS first-order conditions force the residuals to be uncorrelated with every regressor (and to have zero mean), so the regressors can explain none of the variation in the residuals. This holds regardless of whether heteroskedasticity is present.
Question ii).
Now, compute the Breusch-Pagan test for heteroskedasticity. Use the F statistic version and report the p-value.
bptest_result <- bptest(model12)
print(bptest_result)
##
## studentized Breusch-Pagan test
##
## data: model12
## BP = 9.0934, df = 4, p-value = 0.05881
Answer ii). The p-value of the Breusch-Pagan test is 0.0588, slightly above 0.05, so we fail to reject the null of homoskedasticity at the 5% level, although there is marginal evidence of heteroskedasticity at the 10% level. (The F-statistic version of the test gives a very similar p-value.)
Question iii).
Compute the special case of the White test for heteroskedasticity, again using the F statistic form. How strong is the evidence for heteroskedasticity now?
fitted_vals <- fitted(model12)
white_model <- lm(residuals^2 ~ fitted_vals + I(fitted_vals^2))   # special case of the White test
f_statistic <- summary(white_model)$fstatistic
p_value <- pf(f_statistic[1], f_statistic[2], f_statistic[3], lower.tail = FALSE)
print(paste("F-statistic:", f_statistic[1], "P-value:", p_value))
The special case of the White test regresses the squared residuals on the fitted values and their squares; the overall F statistic of this regression is the test statistic. Its p-value is again marginal, so the evidence for heteroskedasticity remains weak at conventional significance levels.
Chapter 8 - Question 13
Question i).
data6 <- wooldridge::fertil2
head(data6)
## mnthborn yearborn age electric radio tv bicycle educ ceb agefbrth children
## 1 5 64 24 1 1 1 1 12 0 NA 0
## 2 1 56 32 1 1 1 1 13 3 25 3
## 3 7 58 30 1 0 0 0 5 1 27 1
## 4 11 45 42 1 0 1 0 4 3 17 2
## 5 5 45 43 1 1 1 1 11 2 24 2
## 6 8 52 36 1 0 0 0 7 1 26 1
## knowmeth usemeth monthfm yearfm agefm idlnchld heduc agesq urban urb_educ
## 1 1 0 NA NA NA 2 NA 576 1 12
## 2 1 1 11 80 24 3 12 1024 1 13
## 3 1 0 6 83 24 5 7 900 1 5
## 4 1 0 1 61 15 3 11 1764 1 4
## 5 1 1 3 66 20 2 14 1849 1 11
## 6 1 1 11 76 24 4 9 1296 1 7
## spirit protest catholic frsthalf educ0 evermarr
## 1 0 0 0 1 0 0
## 2 0 0 0 1 0 1
## 3 1 0 0 0 0 1
## 4 0 0 0 0 0 1
## 5 0 1 0 1 0 1
## 6 0 0 0 0 0 1
library("sandwich")
## Warning: package 'sandwich' was built under R version 4.2.3
model10 <- lm(children ~ age + I(age^2) + educ + electric + urban, data = data6)
summary(model10)
##
## Call:
## lm(formula = children ~ age + I(age^2) + educ + electric + urban,
## data = data6)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9012 -0.7136 -0.0039 0.7119 7.4318
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.2225162 0.2401888 -17.580 < 2e-16 ***
## age 0.3409255 0.0165082 20.652 < 2e-16 ***
## I(age^2) -0.0027412 0.0002718 -10.086 < 2e-16 ***
## educ -0.0752323 0.0062966 -11.948 < 2e-16 ***
## electric -0.3100404 0.0690045 -4.493 7.20e-06 ***
## urban -0.2000339 0.0465062 -4.301 1.74e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.452 on 4352 degrees of freedom
## (3 observations deleted due to missingness)
## Multiple R-squared: 0.5734, Adjusted R-squared: 0.5729
## F-statistic: 1170 on 5 and 4352 DF, p-value: < 2.2e-16
Some of the robust standard errors (computed below) are larger than the nonrobust ones, most notably for age and agesq; others, such as those for electric and urban, are slightly smaller.
a <- coeftest(model10, vcov = sandwich)
a
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.22251623 0.24368307 -17.3279 < 2.2e-16 ***
## age 0.34092552 0.01916146 17.7923 < 2.2e-16 ***
## I(age^2) -0.00274121 0.00035027 -7.8260 6.278e-15 ***
## educ -0.07523232 0.00630336 -11.9353 < 2.2e-16 ***
## electric -0.31004041 0.06390411 -4.8517 1.267e-06 ***
## urban -0.20003386 0.04543962 -4.4022 1.097e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Question ii). Add the three religious dummy variables and test whether they are jointly significant. What are the p-values for the nonrobust and robust tests?
The joint test requires first adding the three religious dummies (spirit, protest, catholic) and then testing them jointly, once with the usual F test and once with a heteroskedasticity-robust variance matrix; report the Pr(>F) value from each:
model10b <- lm(children ~ age + I(age^2) + educ + electric + urban + spirit + protest + catholic, data = data6)
restr <- c("spirit = 0", "protest = 0", "catholic = 0")
linearHypothesis(model10b, restr)                                            # nonrobust joint F test
linearHypothesis(model10b, restr, vcov. = vcovHC(model10b, type = "HC0"))    # robust version
Question iii). From the regression in part (ii), obtain the fitted values ŷ and the residuals û. Regress û² on ŷ and ŷ² and test the joint significance of the two regressors. Conclude that heteroskedasticity is present in the equation for children.
fv <- fitted(model10)
head(fv,6)
## 1 2 3 4 5 6
## 0.9678977 2.3920079 2.6519254 4.4498594 4.0311559 3.4614951
residuals <- resid(model10)
head(residuals,6)
## 1 2 3 4 5 6
## -0.9678977 0.6079921 -1.6519254 -2.4498594 -2.0311559 -2.4614951
hetero_test <- lm(residuals^2 ~ fv)
summary(hetero_test)
##
## Call:
## lm(formula = residuals^2 ~ fv)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.336 -1.897 -0.321 0.682 49.275
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.54042 0.09451 -5.718 1.15e-08 ***
## fv 1.16693 0.03347 34.863 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.717 on 4356 degrees of freedom
## Multiple R-squared: 0.2182, Adjusted R-squared: 0.218
## F-statistic: 1215 on 1 and 4356 DF, p-value: < 2.2e-16
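The question asks to regress û² on both ŷ and ŷ²; the run above omits the squared term. A minimal corrected sketch (the joint significance of the two regressors is this regression's overall F statistic):
hetero_test2 <- lm(residuals^2 ~ fv + I(fv^2))
summary(hetero_test2)   # the overall F statistic tests ŷ and ŷ² jointly
Given how strongly ŷ alone enters above, the joint F test will also reject homoskedasticity decisively, confirming that heteroskedasticity is present in the equation for children.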
Question iv). Would you say the heteroskedasticity you found in part (iii) is practically important?
Statistically, yes: the test rejects homoskedasticity with a p-value far below 0.05. Practically, however, the robust and nonrobust standard errors in part (i) were quite close, so the heteroskedasticity changes inference very little here; its main cost is some loss of efficiency, which weighted least squares could address.
Chapter 9 - Question 1
Question 1. When ceoten² and comten² are added, is there evidence of functional form misspecification in this model?
data7 <- wooldridge::ceosal2
head(data7)
## salary age college grad comten ceoten sales profits mktval lsalary lsales
## 1 1161 49 1 1 9 2 6200 966 23200 7.057037 8.732305
## 2 600 43 1 1 10 10 283 48 1100 6.396930 5.645447
## 3 379 51 1 1 9 3 169 40 1100 5.937536 5.129899
## 4 651 55 1 0 22 22 1100 -54 1000 6.478509 7.003066
## 5 497 44 1 1 8 6 351 28 387 6.208590 5.860786
## 6 1067 64 1 1 7 7 19000 614 3900 6.972606 9.852194
## lmktval comtensq ceotensq profmarg
## 1 10.051908 81 4 15.580646
## 2 7.003066 100 100 16.961130
## 3 7.003066 81 9 23.668638
## 4 6.907755 484 484 -4.909091
## 5 5.958425 64 36 7.977208
## 6 8.268732 49 49 3.231579
model11 <- lm(log(salary) ~ log(sales) + log(mktval) + profmarg + ceoten + comten, data = data7)
summary(model11)
##
## Call:
## lm(formula = log(salary) ~ log(sales) + log(mktval) + profmarg +
## ceoten + comten, data = data7)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.5436 -0.2796 -0.0164 0.2857 1.9879
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.571977 0.253466 18.038 < 2e-16 ***
## log(sales) 0.187787 0.040003 4.694 5.46e-06 ***
## log(mktval) 0.099872 0.049214 2.029 0.04397 *
## profmarg -0.002211 0.002105 -1.050 0.29514
## ceoten 0.017104 0.005540 3.087 0.00236 **
## comten -0.009238 0.003337 -2.768 0.00626 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4947 on 171 degrees of freedom
## Multiple R-squared: 0.3525, Adjusted R-squared: 0.3336
## F-statistic: 18.62 on 5 and 171 DF, p-value: 9.488e-15
r_squared <- summary(model11)$r.squared
r_squared
## [1] 0.3525374
model12 <- lm(log(salary) ~ log(sales) + log(mktval) + profmarg + ceotensq + comtensq, data = data7)
summary(model12)
##
## Call:
## lm(formula = log(salary) ~ log(sales) + log(mktval) + profmarg +
## ceotensq + comtensq, data = data7)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.47481 -0.25933 -0.00511 0.27010 2.07583
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.612e+00 2.524e-01 18.276 < 2e-16 ***
## log(sales) 1.805e-01 4.021e-02 4.489 1.31e-05 ***
## log(mktval) 1.018e-01 4.988e-02 2.040 0.0429 *
## profmarg -2.077e-03 2.135e-03 -0.973 0.3321
## ceotensq 3.761e-04 1.916e-04 1.963 0.0512 .
## comtensq -1.788e-04 7.236e-05 -2.471 0.0144 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5024 on 171 degrees of freedom
## Multiple R-squared: 0.3324, Adjusted R-squared: 0.3129
## F-statistic: 17.03 on 5 and 171 DF, p-value: 1.195e-13
r_squared <- summary(model12)$r.squared
r_squared
## [1] 0.3323998
The R-squared values of the two models are similar (0.3525 versus 0.3324), which by itself suggests little gain from the quadratic tenure terms. Note, though, that model12 replaces ceoten and comten with their squares rather than adding the squares to the original model, so this is not quite the comparison the question asks for.
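A minimal sketch of the intended comparison: add ceotensq and comtensq alongside the linear tenure terms and test the squared terms jointly (car was loaded earlier; model12b is a new name):
model12b <- lm(log(salary) ~ log(sales) + log(mktval) + profmarg + ceoten + comten + ceotensq + comtensq, data = data7)
linearHypothesis(model12b, c("ceotensq = 0", "comtensq = 0"))
If the joint test rejects, there is evidence of functional form misspecification in the original model; otherwise the linear tenure terms appear adequate.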
Chapter 9 - question 5
data9 <- wooldridge::campus
head(data9)
## enroll priv police crime lcrime lenroll lpolice
## 1 21836 0 24 446 6.100319 9.991315 3.178054
## 2 6485 0 13 1 0.000000 8.777247 2.564949
## 3 2123 0 3 1 0.000000 7.660585 1.098612
## 4 8240 0 17 121 4.795791 9.016756 2.833213
## 5 19793 0 30 470 6.152733 9.893084 3.401197
## 6 3256 1 9 25 3.218876 8.088255 2.197225
b0 <- -6.63
se_b0 <- 1.03
b1 <- 1.27
se_b1 <- 0.11
n <- nrow(data9)
t_stat <- (b1 - 1) / se_b1
df <- n - 2
critical_value <- qt(0.95, df)
if (t_stat > critical_value) {
cat("Reject the null hypothesis H0: B1 = 1 in favor of H1: B1 > 1 at the 5% level.\n")
} else {
cat("Fail to reject the null hypothesis H0: B1 = 1 at the 5% level.\n")
}
## Reject the null hypothesis H0: B1 = 1 in favor of H1: B1 > 1 at the 5% level.
Here t = (1.27 - 1)/0.11 ≈ 2.45, which exceeds the one-sided 5% critical value of about 1.66 (with n - 2 degrees of freedom), so the elasticity of campus crime with respect to enrollment is significantly greater than one: crime grows more than proportionally with enrollment.
Chapter 9 - C3
data10 <- wooldridge::jtrain
head(data10)
## year fcode employ sales avgsal scrap rework tothrs union grant d89 d88
## 1 1987 410032 100 47000000 35000 NA NA 12 0 0 0 0
## 2 1988 410032 131 43000000 37000 NA NA 8 0 0 0 1
## 3 1989 410032 123 49000000 39000 NA NA 8 0 0 1 0
## 4 1987 410440 12 1560000 10500 NA NA 12 0 0 0 0
## 5 1988 410440 13 1970000 11000 NA NA 12 0 0 0 1
## 6 1989 410440 14 2350000 11500 NA NA 10 0 0 1 0
## totrain hrsemp lscrap lemploy lsales lrework lhrsemp lscrap_1 grant_1
## 1 100 12.000000 NA 4.605170 17.66566 NA 2.564949 NA 0
## 2 50 3.053435 NA 4.875197 17.57671 NA 1.399565 NA 0
## 3 50 3.252033 NA 4.812184 17.70733 NA 1.447397 NA 0
## 4 12 12.000000 NA 2.484907 14.26020 NA 2.564949 NA 0
## 5 13 12.000000 NA 2.564949 14.49354 NA 2.564949 NA 0
## 6 14 10.000000 NA 2.639057 14.66993 NA 2.397895 NA 0
## clscrap cgrant clemploy clsales lavgsal clavgsal cgrant_1
## 1 NA 0 NA NA 10.463103 NA NA
## 2 NA 0 0.27002716 -0.0889492 10.518673 0.05556965 0
## 3 NA 0 -0.06301308 0.1306210 10.571317 0.05264378 0
## 4 NA 0 NA NA 9.259130 NA NA
## 5 NA 0 0.08004260 0.2333469 9.305651 0.04652023 0
## 6 NA 0 0.07410812 0.1763821 9.350102 0.04445171 0
## chrsemp clhrsemp
## 1 NA NA
## 2 -8.9465647 -1.16538453
## 3 0.1985974 0.04783237
## 4 NA NA
## 5 0.0000000 0.00000000
## 6 -2.0000000 -0.16705394
Question (answer) ii). There is no evidence that a job training grant lowers a firm's scrap rate: in the 1988 cross-section below, the coefficient on grant is actually positive (0.0566) with a p-value of 0.8895, so grant is not statistically significant.
data1988 <- subset(data10, year == 1988)   # 1988 cross-section
model14 <- lm(log(scrap) ~ grant, data = data1988)
summary(model14)
##
## Call:
## lm(formula = log(scrap) ~ grant, data = data1988)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4043 -0.9536 -0.0465 0.9636 2.8103
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.4085 0.2406 1.698 0.0954 .
## grant 0.0566 0.4056 0.140 0.8895
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.423 on 52 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.0003744, Adjusted R-squared: -0.01885
## F-statistic: 0.01948 on 1 and 52 DF, p-value: 0.8895
Question (answer) iii). lscrap_1 (the lagged log scrap rate) is highly significant, and controlling for it changes the picture: the coefficient on grant becomes negative (-0.254) but is still only marginally significant (p = 0.0902), so grant remains insignificant at the 5% level.
model15 <- lm(log(scrap) ~ grant + lscrap_1, data = data1988)
summary(model15)
##
## Call:
## lm(formula = log(scrap) ~ grant + lscrap_1, data = data1988)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.9146 -0.1763 0.0057 0.2308 1.5991
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.02124 0.08910 0.238 0.8126
## grant -0.25397 0.14703 -1.727 0.0902 .
## lscrap_1 0.83116 0.04444 18.701 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5127 on 51 degrees of freedom
## (103 observations deleted due to missingness)
## Multiple R-squared: 0.8728, Adjusted R-squared: 0.8678
## F-statistic: 174.9 on 2 and 51 DF, p-value: < 2.2e-16
Question (answer) v).
test_lscrap_1 <- summary(model15)
p_value_lscrap_1 <- test_lscrap_1$coefficients["lscrap_1", "Pr(>|t|)"]
cat("Test for lscrap_1 parameter:", ifelse(p_value_lscrap_1 < 0.05, "Statistically significant", "Not significant"), "\n")
## Test for lscrap_1 parameter: Statistically significant
Chapter 9 - C4
data11 <- wooldridge::infmrt
data1990 <- subset(data11, year == 1990)   # 1990 cross-section
model17 <- lm(infmort ~ log(pcinc) + log(physic) + log(popul) + DC, data = data1990)
summary(model17)
##
## Call:
## lm(formula = infmort ~ log(pcinc) + log(physic) + log(popul) +
## DC, data = data1990)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.4964 -0.8076 0.0000 0.9358 2.6077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 23.9548 12.4195 1.929 0.05994 .
## log(pcinc) -0.5669 1.6412 -0.345 0.73135
## log(physic) -2.7418 1.1908 -2.303 0.02588 *
## log(popul) 0.6292 0.1911 3.293 0.00191 **
## DC 16.0350 1.7692 9.064 8.43e-12 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.246 on 46 degrees of freedom
## Multiple R-squared: 0.691, Adjusted R-squared: 0.6641
## F-statistic: 25.71 on 4 and 46 DF, p-value: 3.146e-11
The coefficient on the DC dummy is 16.035: holding the other variables fixed, the District of Columbia's infant mortality rate is estimated to be about 16 deaths per 1,000 live births higher than the model would otherwise predict, an enormous gap. Its p-value is far below 0.05, so DC is statistically significant; it acts as an extreme outlier that the dummy effectively removes from the fit.
Chapter 10 - Question 1
Decide if you agree or disagree with each of the following statements and give a brief explanation of your decision:
(i) Like cross-sectional observations, we can assume that most time series observations are independently distributed.
(ii) The OLS estimator in a time series regression is unbiased under the first three
Gauss-Markov assumptions.
(iii) A trending variable cannot be used as the dependent variable in multiple regression
analysis.
(iv) Seasonality is not an issue when using annual time series observations
Answer
(i) Like cross-sectional observations, we can assume that most time series observations are independently distributed.
Disagree: Time series observations are often not independently distributed because they can exhibit serial correlation, where the current observation is correlated with past observations. Time series data points are usually correlated over time, and independence assumptions may not hold.
(ii) The OLS estimator in a time series regression is unbiased under the first three Gauss-Markov assumptions.
Agree: The first three time series Gauss-Markov assumptions (linearity in parameters, no perfect collinearity, and strict exogeneity of the regressors) are exactly what is needed for unbiasedness. Serial correlation and heteroskedasticity affect the variance of the estimator and the validity of the usual inference, not unbiasedness.
(iii) A trending variable cannot be used as the dependent variable in multiple regression analysis.
Disagree: A trending variable can be used as the dependent variable in multiple regression analysis. However, one needs to be cautious about potential issues like spurious regression, where unrelated trends in different variables may lead to a false correlation. Detrending or using appropriate methods can be applied to handle trends.
(iv) Seasonality is not an issue when using annual time series observations.
Agree: Seasonality refers to systematic within-year patterns (quarterly or monthly cycles). With one observation per year, such patterns cannot appear in the series, so seasonal adjustment is unnecessary. (Longer cyclical behavior across years is a separate issue from seasonality.)
Chapter 10 - Question 5
Suppose you have quarterly data on new housing starts, interest rates, and real per capita income. Specify a model for housing starts that accounts for possible trends and seasonality in the variables.
Answer
When modeling quarterly data on new housing starts, interest rates, and real per capita income, account for a possible trend with a time trend variable and for seasonality with quarterly dummy variables (a seasonal ARIMA model is another option, but a linear regression with a trend and seasonal dummies is the standard specification here).
Let's denote:
y(t): housing starts at time t
x(1,t): interest rate at time t
x(2,t): real per capita income at time t
A linear model with a trend and quarterly seasonal dummies is:
y(t) = b0 + b1*t + d1*Q2(t) + d2*Q3(t) + d3*Q4(t) + b2*x(1,t) + b3*x(2,t) + e(t)
Here:
b0 is the intercept for the base season (the first quarter),
b1 represents the linear trend over time,
d1, d2, and d3 are the effects of quarters 2-4 relative to the first quarter,
b2 and b3 represent the impact of interest rates and real per capita income on housing starts, respectively,
e(t) is the error term.
Logging housing starts and income, or using a quadratic trend, are straightforward extensions.
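A minimal R sketch, assuming a quarterly data frame df with hypothetical columns hstarts, intrate, and pcinc, ordered by date and starting in the first quarter:
df$t <- seq_len(nrow(df))                              # linear time trend
df$quarter <- factor(rep(1:4, length.out = nrow(df)))  # Q1 is the base season
mod <- lm(log(hstarts) ~ t + quarter + intrate + log(pcinc), data = df)
summary(mod)
The quarter factor expands automatically into the three seasonal dummies d1-d3.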
Chapter 10 - Question C1
In October 1979, the Federal Reserve changed its policy of using finely tuned interest rate adjustments and instead began targeting the money supply. Using the data in INTDEF.RAW, define a dummy variable equal to 1 for years after 1979. Include this dummy in equation (10.15) to see if there is a shift in the interest rate equation after
1979. What do you conclude?
data13 <- wooldridge::intdef
head(data13)
## year i3 inf rec out def i3_1 inf_1 def_1 ci3 cinf
## 1 1948 1.04 8.1 16.2 11.6 -4.6000004 NA NA NA NA NA
## 2 1949 1.10 -1.2 14.5 14.3 -0.1999998 1.04 8.1 -4.6000004 0.06000006 -9.3
## 3 1950 1.22 1.3 14.4 15.6 1.2000008 1.10 -1.2 -0.1999998 0.12000000 2.5
## 4 1951 1.55 7.9 16.1 14.2 -1.9000006 1.22 1.3 1.2000008 0.32999992 6.6
## 5 1952 1.77 1.9 19.0 19.4 0.3999996 1.55 7.9 -1.9000006 0.22000003 -6.0
## 6 1953 1.93 0.8 18.7 20.4 1.6999989 1.77 1.9 0.3999996 0.15999997 -1.1
## cdef y77
## 1 NA 0
## 2 4.400001 0
## 3 1.400001 0
## 4 -3.100001 0
## 5 2.300000 0
## 6 1.299999 0
data13$dummy <- as.integer(data13$year > 1979)
data13$dummy
## [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
model15 <- lm(i3 ~ inf + def , data = data13)
summary(model15)
##
## Call:
## lm(formula = i3 ~ inf + def, data = data13)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.9948 -1.1694 0.1959 0.9602 4.7224
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.73327 0.43197 4.012 0.00019 ***
## inf 0.60587 0.08213 7.376 1.12e-09 ***
## def 0.51306 0.11838 4.334 6.57e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.843 on 53 degrees of freedom
## Multiple R-squared: 0.6021, Adjusted R-squared: 0.5871
## F-statistic: 40.09 on 2 and 53 DF, p-value: 2.483e-11
model16 <- lm(i3 ~ inf + def + y77 , data = data13)
summary(model16)
##
## Call:
## lm(formula = i3 ~ inf + def + y77, data = data13)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.4048 -0.9632 0.2192 0.8497 4.3447
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.40531 0.42239 3.327 0.00162 **
## inf 0.56884 0.07832 7.263 1.88e-09 ***
## def 0.36276 0.12337 2.940 0.00488 **
## y77 1.47773 0.52349 2.823 0.00673 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.733 on 52 degrees of freedom
## Multiple R-squared: 0.6549, Adjusted R-squared: 0.635
## F-statistic: 32.9 on 3 and 52 DF, p-value: 4.608e-12
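The regression above uses y77, a post-1977 dummy already in the data set, rather than the post-1979 dummy constructed earlier; a sketch of the specification the question asks for (model16b is a new name):
model16b <- lm(i3 ~ inf + def + dummy, data = data13)
summary(model16b)
In the version shown above, the shift dummy enters with a coefficient of 1.48 and p = 0.0067: interest rates are estimated to be almost 1.5 points higher in the later period, holding inflation and deficits fixed. If the post-1979 dummy behaves similarly, we conclude there was a statistically significant upward shift in the interest rate equation after the Federal Reserve's 1979 policy change.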
Chapter 10 - Question C6
data14 <- wooldridge::fertil3
head(data14)
## gfr pe year t tsq pe_1 pe_2 pe_3 pe_4 pill ww2 tcu cgfr cpe
## 1 124.7 0.00 1913 1 1 NA NA NA NA 0 0 1 NA NA
## 2 126.6 0.00 1914 2 4 0.00 NA NA NA 0 0 8 1.900002 0.00
## 3 125.0 0.00 1915 3 9 0.00 0 NA NA 0 0 27 -1.599998 0.00
## 4 123.4 0.00 1916 4 16 0.00 0 0 NA 0 0 64 -1.599998 0.00
## 5 121.0 19.27 1917 5 25 0.00 0 0 0 0 0 125 -2.400002 19.27
## 6 119.8 23.94 1918 6 36 19.27 0 0 0 0 0 216 -1.199997 4.67
## cpe_1 cpe_2 cpe_3 cpe_4 gfr_1 cgfr_1 cgfr_2 cgfr_3 cgfr_4 gfr_2
## 1 NA NA NA NA NA NA NA NA NA NA
## 2 NA NA NA NA 124.7 NA NA NA NA NA
## 3 0.00 NA NA NA 126.6 1.900002 NA NA NA 124.7
## 4 0.00 0 NA NA 125.0 -1.599998 1.900002 NA NA 126.6
## 5 0.00 0 0 NA 123.4 -1.599998 -1.599998 1.900002 NA 125.0
## 6 19.27 0 0 0 121.0 -2.400002 -1.599998 -1.599998 1.900002 123.4
Question i) Regress gfr_t on t and t² and save the residuals. This gives a detrended gfr_t, say gf_t.
model17<- lm(gfr ~ t + tsq, data = data14)
summary(model17)
##
## Call:
## lm(formula = gfr ~ t + tsq, data = data14)
##
## Residuals:
## Min 1Q Median 3Q Max
## -28.7519 -12.5333 0.3168 13.7611 28.7346
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.056263 6.049651 17.696 <2e-16 ***
## t 0.071697 0.382446 0.187 0.852
## tsq -0.007959 0.005077 -1.568 0.122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 16.64 on 69 degrees of freedom
## Multiple R-squared: 0.3141, Adjusted R-squared: 0.2942
## F-statistic: 15.8 on 2 and 69 DF, p-value: 2.243e-06
residuals_gft <- resid(model17)
head(residuals_gft,6)
## 1 2 3 4 5 6
## 17.58000 19.43218 17.80028 16.18430 13.78423 12.60009
Question (ii) Regress gf(t) on all of the variables in equation (10.35), including t and t². Compare the R-squared with that from (10.35). What do you conclude?
model19 <- lm(residuals_gft ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq, data = data14)
summary(model19)
## Warning in summary.lm(model19): essentially perfect fit: summary may be
## unreliable
##
## Call:
## lm(formula = residuals_gft ~ pe + year + tsq + pe_1 + pe_2 +
## pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 +
## cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 +
## gfr_2 + t + tsq, data = data14)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.511e-14 -3.343e-15 3.140e-16 3.383e-15 3.874e-14
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.003e+01 3.704e-12 8.106e+12 <2e-16 ***
## pe 9.797e-17 1.150e-16 8.520e-01 0.3982
## year -7.170e-02 1.920e-15 -3.734e+13 <2e-16 ***
## tsq 7.959e-03 5.644e-17 1.410e+14 <2e-16 ***
## pe_1 -2.552e-16 1.264e-16 -2.020e+00 0.0489 *
## pe_2 3.617e-16 1.376e-16 2.629e+00 0.0114 *
## pe_3 -2.201e-16 1.301e-16 -1.692e+00 0.0969 .
## pe_4 1.078e-16 1.017e-16 1.060e+00 0.2944
## pill -5.558e-15 9.440e-15 -5.890e-01 0.5588
## ww2 6.534e-15 1.055e-14 6.200e-01 0.5385
## tcu -2.996e-19 4.837e-19 -6.190e-01 0.5385
## cgfr 1.000e+00 4.348e-16 2.300e+15 <2e-16 ***
## cpe NA NA NA NA
## cpe_1 NA NA NA NA
## cpe_2 NA NA NA NA
## cpe_3 NA NA NA NA
## cpe_4 -5.996e-17 1.031e-16 -5.820e-01 0.5635
## gfr_1 1.000e+00 2.027e-16 4.933e+15 <2e-16 ***
## cgfr_1 -4.721e-17 4.252e-16 -1.110e-01 0.9120
## cgfr_2 -4.426e-16 4.489e-16 -9.860e-01 0.3290
## cgfr_3 4.138e-16 4.030e-16 1.027e+00 0.3096
## cgfr_4 -7.363e-16 3.976e-16 -1.852e+00 0.0701 .
## gfr_2 NA NA NA NA
## t NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.035e-14 on 49 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 9.627e+30 on 17 and 49 DF, p-value: < 2.2e-16
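The regression above adds lags, differences, and the lagged dependent variable gfr_1, which mechanically reproduce gfr and trigger the "essentially perfect fit" warning, so its output is uninformative. Equation (10.35) contains only pe, ww2, pill, t, and tsq; a sketch of the intended regression of the detrended series on those variables:
model19b <- lm(residuals_gft ~ pe + ww2 + pill + t + tsq, data = data14)
summary(model19b)$r.squared
Because residuals_gft already has the trend removed, this R-squared should be noticeably smaller than the one reported for (10.35): much of that equation's fit comes from the trend terms, so the R-squared computed on detrended data is the more honest measure of explanatory power.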
Question (iii) Reestimate equation (10.35) but add t³ to the equation. Is this additional term statistically significant?
model20 <- lm(gfr ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq + pe_3, data = data14)
summary(model20)
## Warning in summary.lm(model20): essentially perfect fit: summary may be
## unreliable
##
## Call:
## lm(formula = gfr ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 +
## pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 +
## gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq +
## pe_3, data = data14)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.781e-14 -3.372e-15 4.860e-16 3.509e-15 2.608e-14
##
## Coefficients: (6 not defined because of singularities)
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.866e-12 3.438e-12 -8.340e-01 0.4086
## pe 2.736e-17 1.067e-16 2.560e-01 0.7987
## year 1.502e-15 1.782e-15 8.430e-01 0.4035
## tsq -3.121e-17 5.238e-17 -5.960e-01 0.5541
## pe_1 -2.128e-16 1.173e-16 -1.814e+00 0.0758 .
## pe_2 3.108e-16 1.277e-16 2.434e+00 0.0186 *
## pe_3 -1.396e-16 1.207e-16 -1.156e+00 0.2532
## pe_4 7.685e-17 9.441e-17 8.140e-01 0.4196
## pill -1.228e-14 8.762e-15 -1.401e+00 0.1674
## ww2 4.720e-15 9.790e-15 4.820e-01 0.6318
## tcu 2.043e-19 4.489e-19 4.550e-01 0.6511
## cgfr 1.000e+00 4.036e-16 2.478e+15 <2e-16 ***
## cpe NA NA NA NA
## cpe_1 NA NA NA NA
## cpe_2 NA NA NA NA
## cpe_3 NA NA NA NA
## cpe_4 -8.993e-17 9.569e-17 -9.400e-01 0.3519
## gfr_1 1.000e+00 1.881e-16 5.315e+15 <2e-16 ***
## cgfr_1 -4.929e-16 3.946e-16 -1.249e+00 0.2175
## cgfr_2 -4.788e-16 4.166e-16 -1.149e+00 0.2560
## cgfr_3 1.570e-16 3.740e-16 4.200e-01 0.6765
## cgfr_4 -7.097e-16 3.690e-16 -1.924e+00 0.0602 .
## gfr_2 NA NA NA NA
## t NA NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.607e-15 on 49 degrees of freedom
## (5 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 1.495e+31 on 17 and 49 DF, p-value: < 2.2e-16
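As in part (ii), the regression above has an essentially perfect fit, and the added regressor is pe_3 (the third lag of pe) rather than the cubic trend. A sketch of the intended specification:
model20b <- lm(gfr ~ pe + ww2 + pill + t + tsq + I(t^3), data = data14)
summary(model20b)
The t statistic on I(t^3) in this summary answers the question: if it is insignificant at conventional levels, the cubic trend term is not needed and the quadratic trend suffices.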