Chapter 7
library(wooldridge)
data <- wooldridge::sleep75
head(sleep75)
##   age black case clerical construc educ earns74 gdhlth inlf leis1 leis2 leis3
## 1  32     0    1        0        0   12       0      0    1  3529  3479  3479
## 2  31     0    2        0        0   14    9500      1    1  2140  2140  2140
## 3  44     0    3        0        0   17   42500      1    1  4595  4505  4227
## 4  30     0    4        0        0   12   42500      1    1  3211  3211  3211
## 5  64     0    5        0        0   14    2500      1    1  4052  4007  4007
## 6  41     0    6        0        0   12       0      1    1  4812  4797  4797
##   smsa  lhrwage   lothinc male marr prot rlxall selfe sleep slpnaps south
## 1    0 1.955861 10.075380    1    1    1   3163     0  3113    3163     0
## 2    0 0.357674  0.000000    1    0    1   2920     1  2920    2920     1
## 3    1 3.021887  0.000000    1    1    0   3038     1  2670    2760     0
## 4    0 2.263844  0.000000    0    1    1   3083     1  3083    3083     0
## 5    0 1.011601  9.328213    1    1    1   3493     0  3448    3493     0
## 6    0 2.957511 10.657280    1    1    1   4078     0  4063    4078     0
##   spsepay spwrk75 totwrk union worknrm workscnd exper yngkid yrsmarr    hrwage
## 1       0       0   3438     0    3438        0    14      0      13  7.070004
## 2       0       0   5020     0    5020        0    11      0       0  1.429999
## 3   20000       1   2815     0    2815        0    21      0       0 20.529997
## 4    5000       1   3786     0    3786        0    12      0      12  9.619998
## 5    2400       1   2580     0    2580        0    44      0      33  2.750000
## 6       0       0   1205     0       0     1205    23      0      23 19.249998
##   agesq
## 1  1024
## 2   961
## 3  1936
## 4   900
## 5  4096
## 6  1681
Regression model
model1 <- lm(sleep ~ totwrk + educ + age + agesq + male , data = data)
summary(model1)
## 
## Call:
## lm(formula = sleep ~ totwrk + educ + age + agesq + male, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2378.00  -243.29     6.74   259.24  1350.19 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3840.83197  235.10870  16.336   <2e-16 ***
## totwrk        -0.16342    0.01813  -9.013   <2e-16 ***
## educ         -11.71332    5.86689  -1.997   0.0463 *  
## age           -8.69668   11.20746  -0.776   0.4380    
## agesq          0.12844    0.13390   0.959   0.3378    
## male          87.75243   34.32616   2.556   0.0108 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 417.7 on 700 degrees of freedom
## Multiple R-squared:  0.1228, Adjusted R-squared:  0.1165 
## F-statistic: 19.59 on 5 and 700 DF,  p-value: < 2.2e-16
Model1 above tests whether total minutes worked per week, education, age, the square of age, and gender (male) help explain minutes of sleep per week. Total work, education, and gender are statistically significant at the 5% level, because their p-values are below 0.05; age and its square are not. The coefficients are interpreted below.
1. The coefficient on totwrk is -0.16342: each additional minute of weekly work is associated with about 0.16 fewer minutes of sleep per week.
2. The coefficient on educ is -11.71: each additional year of education is associated with about 11.71 fewer minutes of sleep per week.
3. The coefficient on age is -8.70, but age enters as a quadratic (together with agesq), so the two terms must be interpreted jointly; neither is individually significant here.
4. The coefficient on the dummy variable male is 87.75: men are estimated to sleep about 87.75 more minutes per week than women, other factors fixed.
Question i). Is there evidence that men sleep more than women?
The coefficient on the dummy variable male is 87.75, with a standard error of 34.33. To answer this question, compute the t statistic for the hypotheses:
H(0): the coefficient on male is zero.
H(1): the coefficient on male is greater than zero (men sleep more).
t = 87.75/34.33 = 2.56
The degrees of freedom = 706 - 6 = 700, the residual degrees of freedom reported in the summary.
From the t distribution, the two-sided p-value is about 0.011 (so the one-sided p-value is about 0.005). The result is significant at the 5% level: there is evidence that men sleep more than women, by almost an hour and a half per week.
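A quick check of this p-value in R, a sketch using the rounded numbers read off the summary above:
t_stat <- 87.75 / 34.33
df <- 700
2 * pt(-abs(t_stat), df)   # two-sided p-value, about 0.011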
Question ii). Is there a statistically significant tradeoff between working and sleeping? What is the estimated tradeoff?
The coefficient on totwrk is -0.163, and its standard error is 0.018.
t = -0.163/0.018 = -9.01
The summary reports a p-value below 2e-16, far smaller than 0.05, so there is a statistically significant tradeoff between working and sleeping. The estimated tradeoff is about 0.163 minutes less sleep for each additional minute of work, or roughly 10 minutes less sleep per additional hour worked.
Question iii).
model3 <- lm(sleep ~ age , data = data)
summary(model3)
## 
## Call:
## lm(formula = sleep ~ age, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2455.35  -254.39     9.55   270.77  1381.96 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3128.913     59.468  52.615   <2e-16 ***
## age            3.541      1.471   2.408   0.0163 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 442.9 on 704 degrees of freedom
## Multiple R-squared:  0.008167,   Adjusted R-squared:  0.006758 
## F-statistic: 5.797 on 1 and 704 DF,  p-value: 0.01631
The coefficient on age is 3.541: in this simple regression, each additional year of age is associated with about 3.5 more minutes of sleep per week. Age is statistically significant here, since its p-value (0.0163) is below 0.05.
Chapter 7 - Question 3
library(wooldridge)
data1 <- wooldridge::gpa2
head(data1)
##    sat tothrs colgpa athlete verbmath hsize hsrank   hsperc female white black
## 1  920     43   2.04       1  0.48387  0.10      4 40.00000      1     0     0
## 2 1170     18   4.00       0  0.82813  9.40    191 20.31915      0     1     0
## 3  810     14   1.78       1  0.88372  1.19     42 35.29412      0     1     0
## 4  940     40   2.42       0  0.80769  5.71    252 44.13310      0     1     0
## 5 1180     18   2.61       0  0.73529  2.14     86 40.18692      0     1     0
## 6  980    114   3.03       0  0.81481  2.68     41 15.29851      1     1     0
##   hsizesq
## 1  0.0100
## 2 88.3600
## 3  1.4161
## 4 32.6041
## 5  4.5796
## 6  7.1824
model4<- lm(sat ~ hsize + hsizesq + female + black + female*black , data = data1)
summary(model4)
## 
## Call:
## lm(formula = sat ~ hsize + hsizesq + female + black + female * 
##     black, data = data1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -570.45  -89.54   -5.24   85.41  479.13 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1028.0972     6.2902 163.445  < 2e-16 ***
## hsize          19.2971     3.8323   5.035 4.97e-07 ***
## hsizesq        -2.1948     0.5272  -4.163 3.20e-05 ***
## female        -45.0915     4.2911 -10.508  < 2e-16 ***
## black        -169.8126    12.7131 -13.357  < 2e-16 ***
## female:black   62.3064    18.1542   3.432 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.4 on 4131 degrees of freedom
## Multiple R-squared:  0.08578,    Adjusted R-squared:  0.08468 
## F-statistic: 77.52 on 5 and 4131 DF,  p-value: < 2.2e-16
Question i).
Is there strong evidence that hsize2 should be included in the model? From this equation, what is the optimal high school size?
All of the variables included in the equation are statistically significant, because every p-value is below 0.05.

To confirm that hsizesq should be included in the model, look at its t statistic:

t = -2.1948/0.5272 = -4.16
The residual degrees of freedom are 4131, as reported in the summary.
The corresponding p-value is 3.2e-05, well below 0.05, so hsizesq is statistically significant and there is strong evidence it belongs in the model. Because the coefficient on hsize is positive and the coefficient on hsizesq is negative, the fitted relationship has a maximum; the optimal high school size is computed below.
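Setting the derivative of the quadratic in hsize to zero gives the optimum, hsize* = 19.2971/(2 × 2.1948). A quick computation (hsize is measured in hundreds of students):
opt_size <- -coef(model4)["hsize"] / (2 * coef(model4)["hsizesq"])
opt_size   # about 4.40, i.e. a graduating class of roughly 440 students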
Question ii).
Holding hsize fixed, what is the estimated difference in SAT score between nonblack females and nonblack males? How statistically significant is this estimated difference?
Because black = 0 for both groups, the interaction term drops out, and the estimated difference is simply the coefficient on female: nonblack females are estimated to score about 45.09 points lower on the SAT than nonblack males, holding hsize fixed.
To check how statistically significant this difference is, test the coefficient on female itself:
t_stat <- -45.0915 / 4.2911
df <- 4131
p_value <- 2 * pt(-abs(t_stat), df)
The t statistic is about -10.5; the regression summary reports a p-value below 2e-16, so the estimated difference is highly statistically significant.
Question iii).
What is the estimated difference in SAT score between nonblack males and black males? Test the null hypothesis that there is no difference between their scores, against the alternative that there is a difference.
The estimated difference is the coefficient on black, -169.81: black males are estimated to score about 169.8 points lower than nonblack males, holding hsize fixed.
t_stat1 <- 169.81/12.71
df <- 4131
p_value1 <- 2 * (1 - pt(abs(t_stat1), df))
p_value1
## [1] 0
The computed p-value is numerically zero (below machine precision), so we reject the null hypothesis of no difference: the gap between black males and nonblack males is highly statistically significant.
Question iv).
What is the estimated difference in SAT score between black females and nonblack females? What would you need to do to test whether the difference is statistically significant?
The estimated difference is −169.81 (black) + 62.31 (female:black) = −107.50: black females are estimated to score about 107.5 points lower than nonblack females, holding hsize fixed.
Testing whether this difference is statistically significant requires the standard error of a sum of two coefficients, which depends on their estimated covariance, so it cannot be computed from the reported standard errors alone. One way is to build it from the coefficient covariance matrix:
b <- coef(model4)
V <- vcov(model4)
est <- b["black"] + b["female:black"]
se <- sqrt(V["black", "black"] + V["female:black", "female:black"] + 2 * V["black", "female:black"])
est / se   # t statistic for the combined difference
(Equivalently, car::linearHypothesis(model4, "black + female:black = 0") gives the corresponding F test.)
Chapter 7 - Question C1
data2 <- wooldridge::gpa1
head(data2)
##   age soph junior senior senior5 male campus business engineer colGPA hsGPA ACT
## 1  21    0      0      1       0    0      0        1        0    3.0   3.0  21
## 2  21    0      0      1       0    0      0        1        0    3.4   3.2  24
## 3  20    0      1      0       0    0      0        1        0    3.0   3.6  26
## 4  19    1      0      0       0    1      1        1        0    3.5   3.5  27
## 5  20    0      1      0       0    0      0        1        0    3.6   3.9  28
## 6  20    0      0      1       0    1      1        1        0    3.0   3.4  25
##   job19 job20 drive bike walk voluntr PC greek car siblings bgfriend clubs
## 1     0     1     1    0    0       0  0     0   1        1        0     0
## 2     0     1     1    0    0       0  0     0   1        0        1     1
## 3     1     0     0    0    1       0  0     0   1        1        0     1
## 4     1     0     0    0    1       0  0     0   0        1        0     0
## 5     0     1     0    1    0       0  0     0   1        1        1     0
## 6     0     0     0    0    1       0  0     0   1        1        0     0
##   skipped alcohol gradMI fathcoll mothcoll
## 1       2     1.0      1        0        0
## 2       0     1.0      1        1        1
## 3       0     1.0      1        1        1
## 4       0     0.0      0        0        0
## 5       0     1.5      1        1        0
## 6       0     0.0      0        1        0
Question i).
Add the variables mothcoll and fathcoll to the equation estimated in (7.6) and report the results in the usual form. What happens to the estimated effect of PC ownership? Is PC still statistically significant?
model5<- lm(colGPA ~ PC + hsGPA + ACT , data = data2)
summary(model5)
## 
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT, data = data2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.7901 -0.2622 -0.0107  0.2334  0.7570 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.263520   0.333125   3.793 0.000223 ***
## PC          0.157309   0.057287   2.746 0.006844 ** 
## hsGPA       0.447242   0.093647   4.776 4.54e-06 ***
## ACT         0.008659   0.010534   0.822 0.412513    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3325 on 137 degrees of freedom
## Multiple R-squared:  0.2194, Adjusted R-squared:  0.2023 
## F-statistic: 12.83 on 3 and 137 DF,  p-value: 1.932e-07
model6<- lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll , data = data2)
summary(model6)
## 
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll, 
##     data = data2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78149 -0.25726 -0.02121  0.24691  0.74432 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.255554   0.335392   3.744 0.000268 ***
## PC           0.151854   0.058716   2.586 0.010762 *  
## hsGPA        0.450220   0.094280   4.775 4.61e-06 ***
## ACT          0.007724   0.010678   0.723 0.470688    
## mothcoll    -0.003758   0.060270  -0.062 0.950376    
## fathcoll     0.041800   0.061270   0.682 0.496265    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3344 on 135 degrees of freedom
## Multiple R-squared:  0.2222, Adjusted R-squared:  0.1934 
## F-statistic: 7.713 on 5 and 135 DF,  p-value: 2.083e-06
PC is still statistically significant. The p-value on PC was 0.006844 before mothcoll and fathcoll were added, and 0.010762 afterwards, which is still below 0.05.

The coefficient on PC barely changes, from 0.157309 to 0.151854, so the estimated effect of PC ownership (about 0.15 GPA points) is essentially unaffected.
Question ii). Test for joint significance of mothcoll and fathcoll in the equation from part (i) and be sure to report the p-value.
library(car)
hypotheses <- c("mothcoll=0", "fathcoll=0")
joint_test <- linearHypothesis(model6, hypotheses)
joint_test
## Linear hypothesis test
## 
## Hypothesis:
## mothcoll = 0
## fathcoll = 0
## 
## Model 1: restricted model
## Model 2: colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll
## 
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1    137 15.149                           
## 2    135 15.094  2  0.054685 0.2446 0.7834
The p-value of the joint test is 0.7834, so mothcoll and fathcoll are jointly insignificant: we fail to reject that both coefficients are zero.
Question iii). Add hsGPA^2 to the model from part (i) and decide whether this generalization is needed.
model7<- lm(colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll + I(hsGPA^2) , data = data2)
summary(model7)
## 
## Call:
## lm(formula = colGPA ~ PC + hsGPA + ACT + mothcoll + fathcoll + 
##     I(hsGPA^2), data = data2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78998 -0.24327 -0.00648  0.26179  0.72231 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)  
## (Intercept)  5.040328   2.443038   2.063   0.0410 *
## PC           0.140446   0.058858   2.386   0.0184 *
## hsGPA       -1.802520   1.443552  -1.249   0.2140  
## ACT          0.004786   0.010786   0.444   0.6580  
## mothcoll     0.003091   0.060110   0.051   0.9591  
## fathcoll     0.062761   0.062401   1.006   0.3163  
## I(hsGPA^2)   0.337341   0.215711   1.564   0.1202  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3326 on 134 degrees of freedom
## Multiple R-squared:  0.2361, Adjusted R-squared:  0.2019 
## F-statistic: 6.904 on 6 and 134 DF,  p-value: 2.088e-06
The generalization is not needed: the coefficient on hsGPA^2 has a p-value of 0.12, so it is not statistically significant at conventional levels, and adding it makes hsGPA itself insignificant because hsGPA and its square are highly collinear.
Chapter 7 - Question C2
Question i).
Holding other factors fixed, what is the approximate difference in monthly salary between blacks and nonblacks? Is this difference statistically significant?
data3 <- wooldridge::wage2
head(data3)
##   wage hours  IQ KWW educ exper tenure age married black south urban sibs
## 1  769    40  93  35   12    11      2  31       1     0     0     1    1
## 2  808    50 119  41   18    11     16  37       1     0     0     1    1
## 3  825    40 108  46   14    11      9  33       1     0     0     1    1
## 4  650    40  96  32   12    13      7  32       1     0     0     1    4
## 5  562    40  74  27   11    14      5  34       1     0     0     1   10
## 6 1400    40 116  43   16    14      2  35       1     1     0     1    1
##   brthord meduc feduc    lwage
## 1       2     8     8 6.645091
## 2      NA    14    14 6.694562
## 3       2    14    14 6.715384
## 4       3    12    12 6.476973
## 5       6     6    11 6.331502
## 6       2     8    NA 7.244227
model7 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban, data = data3)
summary(model7)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban, data = data3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98069 -0.21996  0.00707  0.24288  1.22822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
## educ         0.065431   0.006250  10.468  < 2e-16 ***
## exper        0.014043   0.003185   4.409 1.16e-05 ***
## tenure       0.011747   0.002453   4.789 1.95e-06 ***
## married      0.199417   0.039050   5.107 3.98e-07 ***
## black       -0.188350   0.037667  -5.000 6.84e-07 ***
## south       -0.090904   0.026249  -3.463 0.000558 ***
## urban        0.183912   0.026958   6.822 1.62e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared:  0.2526, Adjusted R-squared:  0.2469 
## F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16
Answer i). Holding other factors fixed, blacks are estimated to earn approximately 18.8% less per month (the coefficient on black is -0.18835 in a log-wage equation). The difference is statistically significant, because the p-value is far below 0.05.
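Because the dependent variable is log(wage), the coefficient is only an approximate percentage difference. A quick sketch of the exact percentage:
100 * (exp(coef(model7)["black"]) - 1)   # exact difference, about -17.2 percent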
Question ii).
Add the variables exper^2 and tenure^2 to the equation and show that they are jointly insignificant at even the 20% level.
model8 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + I(exper^2) + I(tenure^2), data = data3)
summary(model8)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban + I(exper^2) + I(tenure^2), data = data3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98236 -0.21972 -0.00036  0.24078  1.25127 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.3586756  0.1259143  42.558  < 2e-16 ***
## educ         0.0642761  0.0063115  10.184  < 2e-16 ***
## exper        0.0172146  0.0126138   1.365 0.172665    
## tenure       0.0249291  0.0081297   3.066 0.002229 ** 
## married      0.1985470  0.0391103   5.077 4.65e-07 ***
## black       -0.1906636  0.0377011  -5.057 5.13e-07 ***
## south       -0.0912153  0.0262356  -3.477 0.000531 ***
## urban        0.1854241  0.0269585   6.878 1.12e-11 ***
## I(exper^2)  -0.0001138  0.0005319  -0.214 0.830622    
## I(tenure^2) -0.0007964  0.0004710  -1.691 0.091188 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3653 on 925 degrees of freedom
## Multiple R-squared:  0.255,  Adjusted R-squared:  0.2477 
## F-statistic: 35.17 on 9 and 925 DF,  p-value: < 2.2e-16
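The summary alone does not deliver the joint test the question asks for. A minimal sketch of the joint F test, comparing the restricted model (model7, estimated above) with model8:
anova(model7, model8)   # joint F test of I(exper^2) and I(tenure^2)
Per the exercise, the p-value from this test exceeds 0.20, so exper^2 and tenure^2 are jointly insignificant even at the 20% level.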
Question iii).
Extend the original model to allow the return to education to depend on race and test whether the return to education does depend on race.
model9 <- lm(log(wage) ~ educ + exper + tenure + married + south + urban + educ * black, data = data3)
summary(model9)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + south + 
##     urban + educ * black, data = data3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97782 -0.21832  0.00475  0.24136  1.23226 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.374817   0.114703  46.859  < 2e-16 ***
## educ         0.067115   0.006428  10.442  < 2e-16 ***
## exper        0.013826   0.003191   4.333 1.63e-05 ***
## tenure       0.011787   0.002453   4.805 1.80e-06 ***
## married      0.198908   0.039047   5.094 4.25e-07 ***
## south       -0.089450   0.026277  -3.404 0.000692 ***
## urban        0.183852   0.026955   6.821 1.63e-11 ***
## black        0.094809   0.255399   0.371 0.710561    
## educ:black  -0.022624   0.020183  -1.121 0.262603    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3654 on 926 degrees of freedom
## Multiple R-squared:  0.2536, Adjusted R-squared:  0.2471 
## F-statistic: 39.32 on 8 and 926 DF,  p-value: < 2.2e-16
Answer: The return to education does not appear to depend on race: the p-value on the interaction educ:black is 0.2626, well above 0.05, so we cannot reject that the return to education is the same for blacks and nonblacks.
Question iv).
Again, start with the original model, but now allow wages to differ across four groups of people: married and black, married and nonblack, single and black, and single and nonblack. What is the estimated wage differential between married blacks and married nonblacks?
model10 <- lm(log(wage) ~ educ + exper + tenure + married + black + south + urban + married*black, data = data3)
summary(model10)
## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban + married * black, data = data3)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98013 -0.21780  0.01057  0.24219  1.22889 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)    5.403793   0.114122  47.351  < 2e-16 ***
## educ           0.065475   0.006253  10.471  < 2e-16 ***
## exper          0.014146   0.003191   4.433 1.04e-05 ***
## tenure         0.011663   0.002458   4.745 2.41e-06 ***
## married        0.188915   0.042878   4.406 1.18e-05 ***
## black         -0.240820   0.096023  -2.508 0.012314 *  
## south         -0.091989   0.026321  -3.495 0.000497 ***
## urban          0.184350   0.026978   6.833 1.50e-11 ***
## married:black  0.061354   0.103275   0.594 0.552602    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3656 on 926 degrees of freedom
## Multiple R-squared:  0.2528, Adjusted R-squared:  0.2464 
## F-statistic: 39.17 on 8 and 926 DF,  p-value: < 2.2e-16
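The estimated wage differential between married blacks and married nonblacks is the sum of the black and married:black coefficients. A quick computation from the fitted model:
coef(model10)["black"] + coef(model10)["married:black"]   # about -0.179
So married blacks are estimated to earn roughly 17.9% less per month than married nonblacks (log approximation), holding the other factors fixed.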
Chapter 8
Chapter 8 - Question 1
Which of the following are consequences of heteroskedasticity?
(i) The OLS estimators, bj, are inconsistent.
(ii) The usual F statistic no longer has an F distribution.
(iii) The OLS estimators are no longer BLUE.
Answer:
(i) The OLS estimators, b_j, are inconsistent: False. Heteroskedasticity causes no bias or inconsistency in the OLS estimators; it affects their efficiency and the validity of the usual standard errors, not consistency.

(ii) The usual F statistic no longer has an F distribution: True. Heteroskedasticity invalidates the usual (nonrobust) standard errors and test statistics, so the usual F statistic no longer has an exact F distribution and inference based on it is unreliable.

(iii) The OLS estimators are no longer BLUE (Best Linear Unbiased Estimators): True. Heteroskedasticity violates one of the Gauss-Markov assumptions, leading to the OLS estimators no longer being BLUE. In the presence of heteroskedasticity, generalized least squares (GLS) or weighted least squares (WLS) may be more appropriate for obtaining efficient and unbiased estimates.
Chapter 8 - Question 5
data4 <- wooldridge::smoke
head(data4)
##   educ cigpric white age income cigs restaurn   lincome agesq lcigpric
## 1 16.0  60.506     1  46  20000    0        0  9.903487  2116 4.102743
## 2 16.0  57.883     1  40  30000    0        0 10.308952  1600 4.058424
## 3 12.0  57.664     1  58  30000    3        0 10.308952  3364 4.054633
## 4 13.5  57.883     1  30  20000    0        0  9.903487   900 4.058424
## 5 10.0  58.320     1  17  20000    0        0  9.903487   289 4.065945
## 6  6.0  59.340     1  86   6500    0        0  8.779557  7396 4.083283
model11 <- lm(cigs ~ log(cigpric) + log(income) + educ + age + agesq + restaurn + white , data = data4)
summary(model11)
## 
## Call:
## lm(formula = cigs ~ log(cigpric) + log(income) + educ + age + 
##     agesq + restaurn + white, data = data4)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -15.772  -9.330  -5.907   7.945  70.275 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.682419  24.220729  -0.111  0.91184    
## log(cigpric) -0.850907   5.782321  -0.147  0.88305    
## log(income)   0.869014   0.728763   1.192  0.23344    
## educ         -0.501753   0.167168  -3.001  0.00277 ** 
## age           0.774502   0.160516   4.825 1.68e-06 ***
## agesq        -0.009069   0.001748  -5.188 2.70e-07 ***
## restaurn     -2.865621   1.117406  -2.565  0.01051 *  
## white        -0.559236   1.459461  -0.383  0.70169    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 13.41 on 799 degrees of freedom
## Multiple R-squared:  0.05291,    Adjusted R-squared:  0.04461 
## F-statistic: 6.377 on 7 and 799 DF,  p-value: 2.588e-07
Question i). Are there any important differences between the two sets of standard errors?
The usual standard errors (below) and the heteroskedasticity-robust ones (see the sketch after the table) are very similar for most coefficients.
s_error <- coef(summary(model11))
s_error
##                  Estimate   Std. Error    t value     Pr(>|t|)
## (Intercept)  -2.682418774 24.220728831 -0.1107489 9.118433e-01
## log(cigpric) -0.850907441  5.782321084 -0.1471567 8.830454e-01
## log(income)   0.869013971  0.728763480  1.1924499 2.334389e-01
## educ         -0.501753247  0.167167689 -3.0014966 2.770186e-03
## age           0.774502156  0.160515805  4.8250835 1.676279e-06
## agesq        -0.009068603  0.001748055 -5.1878253 2.699355e-07
## restaurn     -2.865621212  1.117405936 -2.5645301 1.051320e-02
## white        -0.559236403  1.459461034 -0.3831801 7.016882e-01
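For the robust side of the comparison, a sketch of the heteroskedasticity-robust standard errors (lmtest and sandwich are loaded further below in this document):
library(lmtest)
library(sandwich)
coeftest(model11, vcov = vcovHC(model11, type = "HC0"))   # robust SEs to set against the usual ones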
Question ii). Holding other factors fixed, if education increases by four years, what happens to the estimated probability of smoking?
edu_4 <- coef(model11)["educ"] * 4
edu_4
##      educ 
## -2.007013
The dependent variable here is cigs (cigarettes smoked per day), so four more years of education predict about 2 fewer cigarettes smoked per day, holding the other factors fixed. (Answering the question literally, in terms of the probability of smoking, would require a linear probability model with a binary smokes indicator as the dependent variable.)
Question iii).
At what point does another year of age reduce the probability of smoking?
# model11 was fit with the variable agesq, so extract the coefficient under that name
age_turn <- -coef(model11)["age"] / (2 * coef(model11)["agesq"])
age_turn
The turning point is 0.7745/(2 × 0.009069) ≈ 42.7, so beyond roughly age 43 an additional year of age predicts fewer cigarettes smoked per day.
Question (iv) Interpret the coefficient on the binary variable restaurn (a dummy variable equal to one if the person lives in a state with restaurant smoking restrictions).
coef_res <- coef(model11)["restaurn"]
coef_res
##  restaurn 
## -2.865621
People living in states with restaurant smoking restrictions smoke about 2.87 fewer cigarettes per day, on average, holding the other factors fixed.
Question (v) Person number 206 in the data set has the following characteristics: cigpric = 67.44, income = 6,500, educ = 16, age = 77, restaurn = 0, white = 0, and smokes = 0. Compute the predicted probability of smoking for this person and comment on the result.
library(lmtest)
data_update <- data.frame(cigpric = 67.44, income = 6500, educ = 16, age = 77,
                          agesq = 77^2, restaurn = 0, white = 0)
predict(model11, newdata = data_update)
Because agesq is one of the regressors, it must be supplied as well. The prediction works out to roughly -0.8 cigarettes per day, a negative and therefore impossible value: the linear model can produce nonsensical predictions for extreme characteristics such as a 77-year-old. As an additional check, the Breusch-Pagan test below looks for heteroskedasticity in the model:
het_test <- bptest(model11)
het_test
## 
##  studentized Breusch-Pagan test
## 
## data:  model11
## BP = 32.377, df = 7, p-value = 3.458e-05
The very small p-value is strong evidence of heteroskedasticity in the cigs equation, which is why the robust standard errors in part (i) are worth reporting.
Chapter 8 - Question C4
data5 <- wooldridge::vote1
head(data5)
##   state district democA voteA expendA expendB prtystrA lexpendA lexpendB
## 1    AL        7      1    68 328.296   8.737       41 5.793916 2.167567
## 2    AK        1      0    62 626.377 402.477       60 6.439952 5.997638
## 3    AZ        2      1    73  99.607   3.065       55 4.601233 1.120048
## 4    AZ        3      0    69 319.690  26.281       64 5.767352 3.268846
## 5    AR        3      0    75 159.221  60.054       66 5.070293 4.095244
## 6    AR        4      1    69 570.155  21.393       46 6.345908 3.063064
##     shareA
## 1 97.40767
## 2 60.88104
## 3 97.01476
## 4 92.40370
## 5 72.61247
## 6 96.38355
Question i). Estimate a model with voteA as the dependent variable and prtystrA, democA, log(expendA), and log(expendB) as independent variables. Obtain the OLS residuals, û_i, and regress these on all of the independent variables. Explain why you obtain R^2 = 0.
model12 <- lm(voteA ~ prtystrA + democA + log(expendA) + log(expendB), data = data5)
summary(model12)
## 
## Call:
## lm(formula = voteA ~ prtystrA + democA + log(expendA) + log(expendB), 
##     data = data5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.576  -4.864  -1.146   4.903  24.566 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.66141    4.73604   7.952 2.56e-13 ***
## prtystrA      0.25192    0.07129   3.534  0.00053 ***
## democA        3.79294    1.40652   2.697  0.00772 ** 
## log(expendA)  5.77929    0.39182  14.750  < 2e-16 ***
## log(expendB) -6.23784    0.39746 -15.694  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared:  0.8012, Adjusted R-squared:  0.7964 
## F-statistic: 169.2 on 4 and 168 DF,  p-value: < 2.2e-16
residuals <- residuals(model12)
residuals_model <- lm(residuals ~ prtystrA + democA + log(expendA) + log(expendB), data = data5)
summary(residuals_model)
## 
## Call:
## lm(formula = residuals ~ prtystrA + democA + log(expendA) + log(expendB), 
##     data = data5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.576  -4.864  -1.146   4.903  24.566 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -2.705e-15  4.736e+00       0        1
## prtystrA      1.563e-16  7.129e-02       0        1
## democA       -4.577e-16  1.407e+00       0        1
## log(expendA) -3.685e-16  3.918e-01       0        1
## log(expendB) -5.594e-16  3.975e-01       0        1
## 
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared:  5.211e-32,  Adjusted R-squared:  -0.02381 
## F-statistic: 2.189e-30 on 4 and 168 DF,  p-value: 1
Answer i). The R-squared in the residual regression is (numerically) zero because OLS residuals are, by construction, uncorrelated with every regressor: the least squares first-order conditions force the residuals to be orthogonal to the independent variables, so those variables can explain none of the variation in the residuals. This is a mechanical property of OLS and says nothing about heteroskedasticity.
Question ii).
Now, compute the Breusch-Pagan test for heteroskedasticity. Use the F statistic version and report the p-value.
bptest_result <- bptest(model12)
print(bptest_result)
## 
##  studentized Breusch-Pagan test
## 
## data:  model12
## BP = 9.0934, df = 4, p-value = 0.05881
Answer ii). The p-value is 0.0588, just above 0.05, so we fail to reject the null of homoskedasticity at the 5% level, though there is weak evidence of heteroskedasticity at the 10% level.
Question iii).
Compute the special case of the White test for heteroskedasticity, again using the F statistic form. How strong is the evidence for heteroskedasticity now?
The special case of the White test regresses the squared residuals on the fitted values and their squares, not on the original regressors (which would just repeat the Breusch-Pagan test):
fitted_vals <- fitted(model12)
white_model <- lm(residuals^2 ~ fitted_vals + I(fitted_vals^2))
f_statistic <- summary(white_model)$fstatistic
p_value <- pf(f_statistic[1], f_statistic[2], f_statistic[3], lower.tail = FALSE)
print(paste("F-statistic:", f_statistic[1], "P-value:", p_value))
The p-value from this F statistic measures the strength of the evidence for heteroskedasticity under the special-case White test.
Chapter 8 - Question C13
Question i).
data6 <- wooldridge::fertil2
head(data6)
##   mnthborn yearborn age electric radio tv bicycle educ ceb agefbrth children
## 1        5       64  24        1     1  1       1   12   0       NA        0
## 2        1       56  32        1     1  1       1   13   3       25        3
## 3        7       58  30        1     0  0       0    5   1       27        1
## 4       11       45  42        1     0  1       0    4   3       17        2
## 5        5       45  43        1     1  1       1   11   2       24        2
## 6        8       52  36        1     0  0       0    7   1       26        1
##   knowmeth usemeth monthfm yearfm agefm idlnchld heduc agesq urban urb_educ
## 1        1       0      NA     NA    NA        2    NA   576     1       12
## 2        1       1      11     80    24        3    12  1024     1       13
## 3        1       0       6     83    24        5     7   900     1        5
## 4        1       0       1     61    15        3    11  1764     1        4
## 5        1       1       3     66    20        2    14  1849     1       11
## 6        1       1      11     76    24        4     9  1296     1        7
##   spirit protest catholic frsthalf educ0 evermarr
## 1      0       0        0        1     0        0
## 2      0       0        0        1     0        1
## 3      1       0        0        0     0        1
## 4      0       0        0        0     0        1
## 5      0       1        0        1     0        1
## 6      0       0        0        0     0        1
library("sandwich")
model10 <- lm(children ~ age + I(age^2) + educ + electric + urban, data = data6)
summary(model10)
## 
## Call:
## lm(formula = children ~ age + I(age^2) + educ + electric + urban, 
##     data = data6)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9012 -0.7136 -0.0039  0.7119  7.4318 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.2225162  0.2401888 -17.580  < 2e-16 ***
## age          0.3409255  0.0165082  20.652  < 2e-16 ***
## I(age^2)    -0.0027412  0.0002718 -10.086  < 2e-16 ***
## educ        -0.0752323  0.0062966 -11.948  < 2e-16 ***
## electric    -0.3100404  0.0690045  -4.493 7.20e-06 ***
## urban       -0.2000339  0.0465062  -4.301 1.74e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.452 on 4352 degrees of freedom
##   (3 observations deleted due to missingness)
## Multiple R-squared:  0.5734, Adjusted R-squared:  0.5729 
## F-statistic:  1170 on 5 and 4352 DF,  p-value: < 2.2e-16
Some of the robust standard errors (below) are larger than the nonrobust ones, most notably for age and its square, while others (electric, urban) are slightly smaller; none of the conclusions change.
a <- coeftest(model10, vcov = sandwich)
a
## 
## t test of coefficients:
## 
##                Estimate  Std. Error  t value  Pr(>|t|)    
## (Intercept) -4.22251623  0.24368307 -17.3279 < 2.2e-16 ***
## age          0.34092552  0.01916146  17.7923 < 2.2e-16 ***
## I(age^2)    -0.00274121  0.00035027  -7.8260 6.278e-15 ***
## educ        -0.07523232  0.00630336 -11.9353 < 2.2e-16 ***
## electric    -0.31004041  0.06390411  -4.8517 1.267e-06 ***
## urban       -0.20003386  0.04543962  -4.4022 1.097e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Question ii). Add the three religious dummy variables and test whether they are jointly significant. What are the p-values for the nonrobust and robust tests?
A sketch of the test the question asks for: add the three religious dummies (spirit, protest, catholic), then test them jointly, both with the usual variance matrix and with a heteroskedasticity-robust one (car::linearHypothesis is loaded earlier):
model10b <- lm(children ~ age + I(age^2) + educ + electric + urban + spirit + protest + catholic, data = data6)
hyp <- c("spirit = 0", "protest = 0", "catholic = 0")
linearHypothesis(model10b, hyp)                       # usual (nonrobust) joint F test and p-value
linearHypothesis(model10b, hyp, white.adjust = "hc0") # robust joint test and p-value
Question iii). From the regression in part (ii), obtain the fitted values ŷ and the residuals û. Regress û^2 on ŷ and ŷ^2 and test the joint significance of the two regressors. Conclude that heteroskedasticity is present in the equation for children.
fv <- fitted(model10)
head(fv,6)
##         1         2         3         4         5         6 
## 0.9678977 2.3920079 2.6519254 4.4498594 4.0311559 3.4614951
residuals <- resid(model10)
head(residuals,6)
##          1          2          3          4          5          6 
## -0.9678977  0.6079921 -1.6519254 -2.4498594 -2.0311559 -2.4614951
hetero_test <- lm(residuals^2 ~ fv)
summary(hetero_test)
## 
## Call:
## lm(formula = residuals^2 ~ fv)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.336 -1.897 -0.321  0.682 49.275 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.54042    0.09451  -5.718 1.15e-08 ***
## fv           1.16693    0.03347  34.863  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.717 on 4356 degrees of freedom
## Multiple R-squared:  0.2182, Adjusted R-squared:  0.218 
## F-statistic:  1215 on 1 and 4356 DF,  p-value: < 2.2e-16
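The regression above uses only the fitted values; the exercise asks for both ŷ and ŷ^2. A sketch of the fuller test, in which the regression's overall F statistic is the joint test of the two regressors:
hetero_test2 <- lm(residuals^2 ~ fv + I(fv^2))
summary(hetero_test2)   # the overall F statistic jointly tests fv and fv^2
A strongly significant joint test confirms that heteroskedasticity is present in the equation for children.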
Question iv). Would you say the heteroskedasticity you found in part (iii) is practically important?
The heteroskedasticity is statistically significant, since the p-value is far below 0.05. Whether it is practically important is better judged from part (i): the robust and nonrobust standard errors differ only modestly there, so the practical consequences for inference are limited.
Chapter 9 - Question 1
Question 1. When ceoten^2 and comten^2 are added, is there evidence of functional form misspecification in this model?
data7 <- wooldridge::ceosal2
head(data7)
##   salary age college grad comten ceoten sales profits mktval  lsalary   lsales
## 1   1161  49       1    1      9      2  6200     966  23200 7.057037 8.732305
## 2    600  43       1    1     10     10   283      48   1100 6.396930 5.645447
## 3    379  51       1    1      9      3   169      40   1100 5.937536 5.129899
## 4    651  55       1    0     22     22  1100     -54   1000 6.478509 7.003066
## 5    497  44       1    1      8      6   351      28    387 6.208590 5.860786
## 6   1067  64       1    1      7      7 19000     614   3900 6.972606 9.852194
##     lmktval comtensq ceotensq  profmarg
## 1 10.051908       81        4 15.580646
## 2  7.003066      100      100 16.961130
## 3  7.003066       81        9 23.668638
## 4  6.907755      484      484 -4.909091
## 5  5.958425       64       36  7.977208
## 6  8.268732       49       49  3.231579
model11 <- lm(log(salary) ~ log(sales) + log(mktval) + profmarg + ceoten + comten, data = data7)
summary(model11)
## 
## Call:
## lm(formula = log(salary) ~ log(sales) + log(mktval) + profmarg + 
##     ceoten + comten, data = data7)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.5436 -0.2796 -0.0164  0.2857  1.9879 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.571977   0.253466  18.038  < 2e-16 ***
## log(sales)   0.187787   0.040003   4.694 5.46e-06 ***
## log(mktval)  0.099872   0.049214   2.029  0.04397 *  
## profmarg    -0.002211   0.002105  -1.050  0.29514    
## ceoten       0.017104   0.005540   3.087  0.00236 ** 
## comten      -0.009238   0.003337  -2.768  0.00626 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4947 on 171 degrees of freedom
## Multiple R-squared:  0.3525, Adjusted R-squared:  0.3336 
## F-statistic: 18.62 on 5 and 171 DF,  p-value: 9.488e-15
r_squared <- summary(model11)$r.squared
r_squared
## [1] 0.3525374
model12 <- lm(log(salary) ~ log(sales) + log(mktval) + profmarg + ceotensq + comtensq, data = data7)
summary(model12)
## 
## Call:
## lm(formula = log(salary) ~ log(sales) + log(mktval) + profmarg + 
##     ceotensq + comtensq, data = data7)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.47481 -0.25933 -0.00511  0.27010  2.07583 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  4.612e+00  2.524e-01  18.276  < 2e-16 ***
## log(sales)   1.805e-01  4.021e-02   4.489 1.31e-05 ***
## log(mktval)  1.018e-01  4.988e-02   2.040   0.0429 *  
## profmarg    -2.077e-03  2.135e-03  -0.973   0.3321    
## ceotensq     3.761e-04  1.916e-04   1.963   0.0512 .  
## comtensq    -1.788e-04  7.236e-05  -2.471   0.0144 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5024 on 171 degrees of freedom
## Multiple R-squared:  0.3324, Adjusted R-squared:  0.3129 
## F-statistic: 17.03 on 5 and 171 DF,  p-value: 1.195e-13
r_squared <- summary(model12)$r.squared
r_squared
## [1] 0.3323998
The R-squared values of the two models are similar (0.3525 vs 0.3324). Note, however, that model12 replaces ceoten and comten with their squares rather than adding the squared terms to the original model; a sketch of the test the question actually describes follows below.
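A minimal sketch, assuming ceotensq and comtensq should be added to the original specification and tested jointly (the F test compares the restricted model11 with the augmented one):
model_aug <- lm(log(salary) ~ log(sales) + log(mktval) + profmarg + ceoten + comten + ceotensq + comtensq, data = data7)
anova(model11, model_aug)   # joint F test of the two squared terms
If the squared tenure terms are jointly insignificant, there is little evidence of functional form misspecification.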
Chapter 9 - Question 5
data9 <- wooldridge::campus
head(data9)
##   enroll priv police crime   lcrime  lenroll  lpolice
## 1  21836    0     24   446 6.100319 9.991315 3.178054
## 2   6485    0     13     1 0.000000 8.777247 2.564949
## 3   2123    0      3     1 0.000000 7.660585 1.098612
## 4   8240    0     17   121 4.795791 9.016756 2.833213
## 5  19793    0     30   470 6.152733 9.893084 3.401197
## 6   3256    1      9    25 3.218876 8.088255 2.197225
b0 <- -6.63
se_b0 <- 1.03
b1 <- 1.27
se_b1 <- 0.11
n <- nrow(data9)
t_stat <- (b1 - 1) / se_b1      # (1.27 - 1)/0.11 = 2.45
df <- n - 2
critical_value <- qt(0.95, df)  # one-sided 5% critical value, about 1.66
if (t_stat > critical_value) {
  cat("Reject the null hypothesis H0: B1 = 1 in favor of H1: B1 > 1 at the 5% level.\n")
} else {
  cat("Fail to reject the null hypothesis H0: B1 = 1 at the 5% level.\n")
}
## Reject the null hypothesis H0: B1 = 1 in favor of H1: B1 > 1 at the 5% level.
Chapter 9 - C3
data10 <- wooldridge::jtrain
head(data10)
##   year  fcode employ    sales avgsal scrap rework tothrs union grant d89 d88
## 1 1987 410032    100 47000000  35000    NA     NA     12     0     0   0   0
## 2 1988 410032    131 43000000  37000    NA     NA      8     0     0   0   1
## 3 1989 410032    123 49000000  39000    NA     NA      8     0     0   1   0
## 4 1987 410440     12  1560000  10500    NA     NA     12     0     0   0   0
## 5 1988 410440     13  1970000  11000    NA     NA     12     0     0   0   1
## 6 1989 410440     14  2350000  11500    NA     NA     10     0     0   1   0
##   totrain    hrsemp lscrap  lemploy   lsales lrework  lhrsemp lscrap_1 grant_1
## 1     100 12.000000     NA 4.605170 17.66566      NA 2.564949       NA       0
## 2      50  3.053435     NA 4.875197 17.57671      NA 1.399565       NA       0
## 3      50  3.252033     NA 4.812184 17.70733      NA 1.447397       NA       0
## 4      12 12.000000     NA 2.484907 14.26020      NA 2.564949       NA       0
## 5      13 12.000000     NA 2.564949 14.49354      NA 2.564949       NA       0
## 6      14 10.000000     NA 2.639057 14.66993      NA 2.397895       NA       0
##   clscrap cgrant    clemploy    clsales   lavgsal   clavgsal cgrant_1
## 1      NA      0          NA         NA 10.463103         NA       NA
## 2      NA      0  0.27002716 -0.0889492 10.518673 0.05556965        0
## 3      NA      0 -0.06301308  0.1306210 10.571317 0.05264378        0
## 4      NA      0          NA         NA  9.259130         NA       NA
## 5      NA      0  0.08004260  0.2333469  9.305651 0.04652023        0
## 6      NA      0  0.07410812  0.1763821  9.350102 0.04445171        0
##      chrsemp    clhrsemp
## 1         NA          NA
## 2 -8.9465647 -1.16538453
## 3  0.1985974  0.04783237
## 4         NA          NA
## 5  0.0000000  0.00000000
## 6 -2.0000000 -0.16705394
Question (answer) ii). There is no evidence that a job training grant lowers a firm's scrap rate: the coefficient on grant is slightly positive and its p-value is 0.8895, so grant is not statistically significant.
model13 <- subset(data10, year == 1988)
model14 <- lm(log(scrap) ~ grant, data = model13)
summary(model14)
## 
## Call:
## lm(formula = log(scrap) ~ grant, data = model13)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4043 -0.9536 -0.0465  0.9636  2.8103 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4085     0.2406   1.698   0.0954 .
## grant         0.0566     0.4056   0.140   0.8895  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.423 on 52 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.0003744,  Adjusted R-squared:  -0.01885 
## F-statistic: 0.01948 on 1 and 52 DF,  p-value: 0.8895
Question (answer) iii). lscrap_1 (the lagged log scrap rate) is highly significant, but grant is still not significant at the 5% level (p = 0.0902), although its coefficient is now negative (-0.254).
model15 <- lm(log(scrap) ~ grant + lscrap_1, data = model13)
summary(model15)
## 
## Call:
## lm(formula = log(scrap) ~ grant + lscrap_1, data = model13)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9146 -0.1763  0.0057  0.2308  1.5991 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.02124    0.08910   0.238   0.8126    
## grant       -0.25397    0.14703  -1.727   0.0902 .  
## lscrap_1     0.83116    0.04444  18.701   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5127 on 51 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.8728, Adjusted R-squared:  0.8678 
## F-statistic: 174.9 on 2 and 51 DF,  p-value: < 2.2e-16
Question (answer) v)
test_lscrap_1 <- summary(model15)
p_value_lscrap_1 <- test_lscrap_1$coefficients["lscrap_1", "Pr(>|t|)"]
cat("Test for lscrap_1 parameter:", ifelse(p_value_lscrap_1 < 0.05, "Statistically significant", "Not significant"), "\n")
## Test for lscrap_1 parameter: Statistically significant
Chapter 9 - C4
data11 <- wooldridge::infmrt
model16 <- subset(data11, year == 1990)
model17 <- lm(infmort ~ log(pcinc) + log(physic) + log(popul) + DC, data = model16)
summary(model17)
## 
## Call:
## lm(formula = infmort ~ log(pcinc) + log(physic) + log(popul) + 
##     DC, data = model16)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4964 -0.8076  0.0000  0.9358  2.6077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  23.9548    12.4195   1.929  0.05994 .  
## log(pcinc)   -0.5669     1.6412  -0.345  0.73135    
## log(physic)  -2.7418     1.1908  -2.303  0.02588 *  
## log(popul)    0.6292     0.1911   3.293  0.00191 ** 
## DC           16.0350     1.7692   9.064 8.43e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.246 on 46 degrees of freedom
## Multiple R-squared:  0.691,  Adjusted R-squared:  0.6641 
## F-statistic: 25.71 on 4 and 46 DF,  p-value: 3.146e-11
The coefficient on DC is 16.035: since DC is a dummy variable for the District of Columbia, it means that, holding all the other variables fixed, DC's infant mortality rate is about 16 deaths per 1,000 live births higher than the model otherwise predicts. The p-value on DC is far below 0.05, so the DC effect is statistically significant.
Chapter 10 - Question 1
Decide if you agree or disagree with each of the following statements and give a brief explanation of your decision:
(i) Like cross-sectional observations, we can assume that most time series observations are independently distributed.
(ii) The OLS estimator in a time series regression is unbiased under the first three 
Gauss-Markov assumptions.
(iii) A trending variable cannot be used as the dependent variable in multiple regression 
analysis.
(iv) Seasonality is not an issue when using annual time series observations
Answer
(i) Like cross-sectional observations, we can assume that most time series observations are independently distributed.
Disagree: Time series observations are often not independently distributed because they can exhibit serial correlation, where the current observation is correlated with past observations. Time series data points are usually correlated over time, and independence assumptions may not hold.

(ii) The OLS estimator in a time series regression is unbiased under the first three Gauss-Markov assumptions.
Agree: The first three time series Gauss-Markov assumptions are linearity in parameters, no perfect collinearity, and the zero conditional mean (strict exogeneity) assumption. Under these three, the OLS estimators are unbiased. Serial correlation or heteroskedasticity affect efficiency and standard errors, not unbiasedness.

(iii) A trending variable cannot be used as the dependent variable in multiple regression analysis.
Disagree: A trending variable can be used as the dependent variable in multiple regression analysis. However, one needs to be cautious about potential issues like spurious regression, where unrelated trends in different variables may lead to a false correlation. Detrending or using appropriate methods can be applied to handle trends.

(iv) Seasonality is not an issue when using annual time series observations.
Disagree: Seasonality can still be an issue in annual time series observations. Even with annual data, there might be patterns or cycles within each year that need to be considered. Ignoring seasonality can lead to misspecification in the model.
Chapter 10 - Question 5
Suppose you have quarterly data on new housing starts, interest rates, and real per capita income. Specify a model for housing starts that accounts for possible trends and seasonality in the variables.
Answer
When modeling quarterly data on new housing starts, interest rates, and real per capita income, the specification should allow for both a time trend and quarterly seasonality. The standard approach is a linear regression that includes a trend term and seasonal dummy variables (a seasonal ARIMA model is another possibility, but is not needed just to handle trend and seasonality).

Let's denote:
y_t: housing starts at time t
x_1t: interest rates at time t
x_2t: real per capita income at time t

A model with a linear trend, quarterly seasonal dummies, and the two explanatory variables is:

y_t = b0 + b1*t + d1*Q2_t + d2*Q3_t + d3*Q4_t + b2*x_1t + b3*x_2t + e_t

Here:
b0 is the intercept (for the first quarter, the base season),
b1 captures the linear time trend,
d1, d2, and d3 allow the second, third, and fourth quarters to differ seasonally from the first,
b2 and b3 represent the impact of interest rates and real per capita income on housing starts, respectively,
e_t is the error term. Since interest rates and income may themselves be trending, the trend term also guards against a spurious relationship.
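A minimal R sketch of this specification, assuming a hypothetical data frame housing with columns hstarts, intrate, pcinc, a time trend t, and a quarter indicator (none of these names come from the exercise):
# hypothetical data: hstarts, intrate, pcinc, trend t, quarter in {1, 2, 3, 4}
hs_model <- lm(hstarts ~ t + factor(quarter) + intrate + pcinc, data = housing)
summary(hs_model)   # d1-d3 appear as the factor(quarter) coefficients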
Chapter 10 - Question C1
In October 1979, the Federal Reserve changed its policy of using finely tuned interest rate adjustments and instead began targeting the money supply. Using the data in INTDEF.RAW, define a dummy variable equal to 1 for years after 1979. Include this dummy in equation (10.15) to see if there is a shift in the interest rate equation after 
1979. What do you conclude?
data13 <- wooldridge::intdef
head(data13)
##   year   i3  inf  rec  out        def i3_1 inf_1      def_1        ci3 cinf
## 1 1948 1.04  8.1 16.2 11.6 -4.6000004   NA    NA         NA         NA   NA
## 2 1949 1.10 -1.2 14.5 14.3 -0.1999998 1.04   8.1 -4.6000004 0.06000006 -9.3
## 3 1950 1.22  1.3 14.4 15.6  1.2000008 1.10  -1.2 -0.1999998 0.12000000  2.5
## 4 1951 1.55  7.9 16.1 14.2 -1.9000006 1.22   1.3  1.2000008 0.32999992  6.6
## 5 1952 1.77  1.9 19.0 19.4  0.3999996 1.55   7.9 -1.9000006 0.22000003 -6.0
## 6 1953 1.93  0.8 18.7 20.4  1.6999989 1.77   1.9  0.3999996 0.15999997 -1.1
##        cdef y77
## 1        NA   0
## 2  4.400001   0
## 3  1.400001   0
## 4 -3.100001   0
## 5  2.300000   0
## 6  1.299999   0
data13$dummy <- as.integer(data13$year > 1979)
data13$dummy
##  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1
## [39] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
model15 <- lm(i3 ~ inf + def , data = data13)
summary(model15)
## 
## Call:
## lm(formula = i3 ~ inf + def, data = data13)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.9948 -1.1694  0.1959  0.9602  4.7224 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.73327    0.43197   4.012  0.00019 ***
## inf          0.60587    0.08213   7.376 1.12e-09 ***
## def          0.51306    0.11838   4.334 6.57e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.843 on 53 degrees of freedom
## Multiple R-squared:  0.6021, Adjusted R-squared:  0.5871 
## F-statistic: 40.09 on 2 and 53 DF,  p-value: 2.483e-11
model16 <- lm(i3 ~ inf + def + y77 , data = data13)
summary(model16)
## 
## Call:
## lm(formula = i3 ~ inf + def + y77, data = data13)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.4048 -0.9632  0.2192  0.8497  4.3447 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1.40531    0.42239   3.327  0.00162 ** 
## inf          0.56884    0.07832   7.263 1.88e-09 ***
## def          0.36276    0.12337   2.940  0.00488 ** 
## y77          1.47773    0.52349   2.823  0.00673 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.733 on 52 degrees of freedom
## Multiple R-squared:  0.6549, Adjusted R-squared:  0.635 
## F-statistic:  32.9 on 3 and 52 DF,  p-value: 4.608e-12
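Note that model16 uses the dataset's y77 dummy rather than the post-1979 dummy constructed above. A sketch of the regression the question asks for:
model16b <- lm(i3 ~ inf + def + dummy, data = data13)
summary(model16b)
If the coefficient on dummy is positive and statistically significant, we conclude that the interest rate equation shifted upward after the Federal Reserve's October 1979 policy change, holding inflation and the deficit fixed.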
Chapter 10 - Question C6
data14 <- wooldridge::fertil3
head(data14)
##     gfr    pe year t tsq  pe_1 pe_2 pe_3 pe_4 pill ww2 tcu      cgfr   cpe
## 1 124.7  0.00 1913 1   1    NA   NA   NA   NA    0   0   1        NA    NA
## 2 126.6  0.00 1914 2   4  0.00   NA   NA   NA    0   0   8  1.900002  0.00
## 3 125.0  0.00 1915 3   9  0.00    0   NA   NA    0   0  27 -1.599998  0.00
## 4 123.4  0.00 1916 4  16  0.00    0    0   NA    0   0  64 -1.599998  0.00
## 5 121.0 19.27 1917 5  25  0.00    0    0    0    0   0 125 -2.400002 19.27
## 6 119.8 23.94 1918 6  36 19.27    0    0    0    0   0 216 -1.199997  4.67
##   cpe_1 cpe_2 cpe_3 cpe_4 gfr_1    cgfr_1    cgfr_2    cgfr_3   cgfr_4 gfr_2
## 1    NA    NA    NA    NA    NA        NA        NA        NA       NA    NA
## 2    NA    NA    NA    NA 124.7        NA        NA        NA       NA    NA
## 3  0.00    NA    NA    NA 126.6  1.900002        NA        NA       NA 124.7
## 4  0.00     0    NA    NA 125.0 -1.599998  1.900002        NA       NA 126.6
## 5  0.00     0     0    NA 123.4 -1.599998 -1.599998  1.900002       NA 125.0
## 6 19.27     0     0     0 121.0 -2.400002 -1.599998 -1.599998 1.900002 123.4
Question i) Regress gfr_t on t and t^2 and save the residuals. This gives a detrended gfr_t, say, gf_t.
model17<- lm(gfr ~ t + tsq, data = data14)
summary(model17)
## 
## Call:
## lm(formula = gfr ~ t + tsq, data = data14)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -28.7519 -12.5333   0.3168  13.7611  28.7346 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 107.056263   6.049651  17.696   <2e-16 ***
## t             0.071697   0.382446   0.187    0.852    
## tsq          -0.007959   0.005077  -1.568    0.122    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.64 on 69 degrees of freedom
## Multiple R-squared:  0.3141, Adjusted R-squared:  0.2942 
## F-statistic:  15.8 on 2 and 69 DF,  p-value: 2.243e-06
residuals_gft <- resid(model17)
head(residuals_gft,6)
##        1        2        3        4        5        6 
## 17.58000 19.43218 17.80028 16.18430 13.78423 12.60009
Question (ii) Regress gf_t on all of the variables in equation (10.35), including t and t^2. Compare the R-squared with that from (10.35). What do you conclude?
model19 <- lm(residuals_gft ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq, data = data14)
summary(model19)
## Warning in summary.lm(model19): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = residuals_gft ~ pe + year + tsq + pe_1 + pe_2 + 
##     pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + 
##     cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + 
##     gfr_2 + t + tsq, data = data14)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.511e-14 -3.343e-15  3.140e-16  3.383e-15  3.874e-14 
## 
## Coefficients: (6 not defined because of singularities)
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  3.003e+01  3.704e-12  8.106e+12   <2e-16 ***
## pe           9.797e-17  1.150e-16  8.520e-01   0.3982    
## year        -7.170e-02  1.920e-15 -3.734e+13   <2e-16 ***
## tsq          7.959e-03  5.644e-17  1.410e+14   <2e-16 ***
## pe_1        -2.552e-16  1.264e-16 -2.020e+00   0.0489 *  
## pe_2         3.617e-16  1.376e-16  2.629e+00   0.0114 *  
## pe_3        -2.201e-16  1.301e-16 -1.692e+00   0.0969 .  
## pe_4         1.078e-16  1.017e-16  1.060e+00   0.2944    
## pill        -5.558e-15  9.440e-15 -5.890e-01   0.5588    
## ww2          6.534e-15  1.055e-14  6.200e-01   0.5385    
## tcu         -2.996e-19  4.837e-19 -6.190e-01   0.5385    
## cgfr         1.000e+00  4.348e-16  2.300e+15   <2e-16 ***
## cpe                 NA         NA         NA       NA    
## cpe_1               NA         NA         NA       NA    
## cpe_2               NA         NA         NA       NA    
## cpe_3               NA         NA         NA       NA    
## cpe_4       -5.996e-17  1.031e-16 -5.820e-01   0.5635    
## gfr_1        1.000e+00  2.027e-16  4.933e+15   <2e-16 ***
## cgfr_1      -4.721e-17  4.252e-16 -1.110e-01   0.9120    
## cgfr_2      -4.426e-16  4.489e-16 -9.860e-01   0.3290    
## cgfr_3       4.138e-16  4.030e-16  1.027e+00   0.3096    
## cgfr_4      -7.363e-16  3.976e-16 -1.852e+00   0.0701 .  
## gfr_2               NA         NA         NA       NA    
## t                   NA         NA         NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.035e-14 on 49 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 9.627e+30 on 17 and 49 DF,  p-value: < 2.2e-16
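Equation (10.35) contains pe, ww2, pill, and the quadratic trend, so the detrended regression should use only those variables; including lags and changes of gfr itself is what produces the "essentially perfect fit" warning above. A sketch of the intended regression (variable list assumed from the exercise):
model19b <- lm(residuals_gft ~ pe + ww2 + pill + t + tsq, data = data14)
summary(model19b)$r.squared   # compare with the R-squared from (10.35)
Because gf_t is already detrended, the trend terms explain nothing here, and this R-squared should come out well below the one from (10.35): much of the explanatory power in (10.35) comes from the trend itself.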
Question (iii) Reestimate equation (10.35) but add t^3 to the equation. Is this additional term statistically significant?
model20 <- lm(gfr ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq + pe_3, data = data14)
summary(model20)
## Warning in summary.lm(model20): essentially perfect fit: summary may be
## unreliable
## 
## Call:
## lm(formula = gfr ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 + 
##     pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 + 
##     gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq + 
##     pe_3, data = data14)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -4.781e-14 -3.372e-15  4.860e-16  3.509e-15  2.608e-14 
## 
## Coefficients: (6 not defined because of singularities)
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept) -2.866e-12  3.438e-12 -8.340e-01   0.4086    
## pe           2.736e-17  1.067e-16  2.560e-01   0.7987    
## year         1.502e-15  1.782e-15  8.430e-01   0.4035    
## tsq         -3.121e-17  5.238e-17 -5.960e-01   0.5541    
## pe_1        -2.128e-16  1.173e-16 -1.814e+00   0.0758 .  
## pe_2         3.108e-16  1.277e-16  2.434e+00   0.0186 *  
## pe_3        -1.396e-16  1.207e-16 -1.156e+00   0.2532    
## pe_4         7.685e-17  9.441e-17  8.140e-01   0.4196    
## pill        -1.228e-14  8.762e-15 -1.401e+00   0.1674    
## ww2          4.720e-15  9.790e-15  4.820e-01   0.6318    
## tcu          2.043e-19  4.489e-19  4.550e-01   0.6511    
## cgfr         1.000e+00  4.036e-16  2.478e+15   <2e-16 ***
## cpe                 NA         NA         NA       NA    
## cpe_1               NA         NA         NA       NA    
## cpe_2               NA         NA         NA       NA    
## cpe_3               NA         NA         NA       NA    
## cpe_4       -8.993e-17  9.569e-17 -9.400e-01   0.3519    
## gfr_1        1.000e+00  1.881e-16  5.315e+15   <2e-16 ***
## cgfr_1      -4.929e-16  3.946e-16 -1.249e+00   0.2175    
## cgfr_2      -4.788e-16  4.166e-16 -1.149e+00   0.2560    
## cgfr_3       1.570e-16  3.740e-16  4.200e-01   0.6765    
## cgfr_4      -7.097e-16  3.690e-16 -1.924e+00   0.0602 .  
## gfr_2               NA         NA         NA       NA    
## t                   NA         NA         NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.607e-15 on 49 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 1.495e+31 on 17 and 49 DF,  p-value: < 2.2e-16
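As in part (ii), the regression above does not add a cubic trend (pe_3, listed twice, is a lag of pe, not t^3) and again mixes in lagged gfr variables. A sketch of re-estimating (10.35) with t^3 added:
model20b <- lm(gfr ~ pe + ww2 + pill + t + tsq + I(t^3), data = data14)
summary(model20b)   # check the t statistic and p-value on I(t^3)
The t statistic on I(t^3) then tells us whether the cubic trend term is statistically significant.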