Chapter 7 (1)

## 
## Call:
## lm(formula = sleep ~ totwrk + educ + age + agesq + male, data = sleep75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2378.00  -243.29     6.74   259.24  1350.19 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3840.83197  235.10870  16.336   <2e-16 ***
## totwrk        -0.16342    0.01813  -9.013   <2e-16 ***
## educ         -11.71332    5.86689  -1.997   0.0463 *  
## age           -8.69668   11.20746  -0.776   0.4380    
## agesq          0.12844    0.13390   0.959   0.3378    
## male          87.75243   34.32616   2.556   0.0108 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 417.7 on 700 degrees of freedom
## Multiple R-squared:  0.1228, Adjusted R-squared:  0.1165 
## F-statistic: 19.59 on 5 and 700 DF,  p-value: < 2.2e-16

(i) All other factors being equal, is there evidence that men sleep more than women? How strong is the evidence?

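The t statistic on male is 87.75/34.33 ≈ 2.56, and the reported two-sided p-value is 0.0108. The coefficient is therefore statistically significant at the 5% level, though not quite at the 1% level, which is fairly strong evidence. Holding the other factors fixed, men are estimated to sleep about 87.75 minutes more per week than women.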
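
As a check on the arithmetic, the t statistic and its two-sided p-value can be recomputed from the reported estimate and standard error. This is a minimal sketch in base R; the variable names are illustrative, and the 700 residual degrees of freedom are taken from the summary above.

```r
# Values copied from the regression summary for sleep75
b_male  <- 87.75243   # coefficient on male
se_male <- 34.32616   # its standard error
df_res  <- 700        # residual degrees of freedom

t_male <- b_male / se_male               # t statistic
p_male <- 2 * pt(-abs(t_male), df_res)   # two-sided p-value

round(c(t = t_male, p = p_male), 4)
```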

(ii) Is there a statistically significant tradeoff between working and sleeping? What is the estimated tradeoff?

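The t statistic on totwrk is −0.163/0.018 ≈ −9.0 (−9.01 in the output), so the tradeoff is statistically significant. Each additional minute of work per week is associated with 0.163 fewer minutes of sleep, so one more hour of work (60 minutes) lowers sleep by about 0.163 × 60 ≈ 9.8 minutes.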

(iii) What other regression do you need to run to test the null hypothesis that, holding other factors fixed, age has no effect on sleeping?

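Because age enters through both age and agesq, the null hypothesis is joint: both coefficients are zero. We need to estimate the restricted model that drops age and agesq and then compare it with the unrestricted model using an F test.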

## 
## Call:
## lm(formula = sleep ~ totwrk + educ + male, data = sleep75)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2380.27  -239.15     6.74   257.31  1370.63 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 3747.51727   81.00609  46.262  < 2e-16 ***
## totwrk        -0.16734    0.01794  -9.329  < 2e-16 ***
## educ         -13.88479    5.65757  -2.454  0.01436 *  
## male          90.96919   34.27441   2.654  0.00813 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 418 on 702 degrees of freedom
## Multiple R-squared:  0.1193, Adjusted R-squared:  0.1155 
## F-statistic: 31.69 on 3 and 702 DF,  p-value: < 2.2e-16

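R-squared barely changes (0.1228 versus 0.1193). The R-squared form of the F statistic is [(0.1228 − 0.1193)/2] / [(1 − 0.1228)/700] ≈ 1.40, well below the 5% critical value of about 3.0 for 2 and 700 degrees of freedom, so we fail to reject the null hypothesis that age has no effect on sleep.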
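
The joint test for dropping age and agesq can be computed from the two R-squared values reported in the summaries above. This is a sketch in base R with illustrative variable names; because the inputs are rounded to four decimals, the result is approximate.

```r
# R-squared form of the F test, using values from the two summaries
r2_ur <- 0.1228   # unrestricted model (with age and agesq)
r2_r  <- 0.1193   # restricted model (without age and agesq)
q     <- 2        # number of restrictions
df_ur <- 700      # residual degrees of freedom, unrestricted model

f_stat <- ((r2_ur - r2_r) / q) / ((1 - r2_ur) / df_ur)
p_val  <- pf(f_stat, q, df_ur, lower.tail = FALSE)

round(c(F = f_stat, p = p_val), 3)
```

A p-value well above 0.05 means the two age terms are jointly insignificant.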

Chapter 7 (3)

## 
## Call:
## lm(formula = sat ~ hsize + hsizesq + female + black + I(female * 
##     black), data = gpa2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -570.45  -89.54   -5.24   85.41  479.13 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       1028.0972     6.2902 163.445  < 2e-16 ***
## hsize               19.2971     3.8323   5.035 4.97e-07 ***
## hsizesq             -2.1948     0.5272  -4.163 3.20e-05 ***
## female             -45.0915     4.2911 -10.508  < 2e-16 ***
## black             -169.8126    12.7131 -13.357  < 2e-16 ***
## I(female * black)   62.3064    18.1542   3.432 0.000605 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 133.4 on 4131 degrees of freedom
## Multiple R-squared:  0.08578,    Adjusted R-squared:  0.08468 
## F-statistic: 77.52 on 5 and 4131 DF,  p-value: < 2.2e-16

(i) Is there strong evidence that hsize^2 should be included in the model? From this equation, what is the optimal high school size?

Yes. The coefficient on hsizesq has a t statistic of −4.163 (p-value 3.20e-05), so there is strong evidence that the quadratic term belongs in the model, even though the overall R-squared is small (0.086). To find the optimal size, set the derivative of 19.30 hsize − 2.19 hsize^2 with respect to hsize to zero: hsize* = 19.30/(2 × 2.19) ≈ 4.4. Because hsize is measured in hundreds, the optimal graduating class is about 440 students.
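
The turning point of the quadratic in hsize can be checked numerically from the reported coefficients. This is a small base R sketch; the names are illustrative.

```r
# Coefficients copied from the gpa2 regression summary
b_hsize   <- 19.2971
b_hsizesq <- -2.1948

# Maximum of b_hsize * h + b_hsizesq * h^2 occurs where the derivative is zero
hsize_star <- -b_hsize / (2 * b_hsizesq)
round(hsize_star, 2)
```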

(ii) Holding hsize fixed, what is the estimated difference in SAT score between nonblack females and nonblack males?

With black = 0 the interaction term drops out, so the difference is just the coefficient on female: nonblack females score about 45.09 points lower than nonblack males (t ≈ −10.51, highly significant).

(iii) What is the estimated difference in SAT score between non-black males and black males?

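With female = 0 the interaction term drops out, so the difference is the coefficient on black: black males score about 169.81 points lower than nonblack males (t ≈ −13.36).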

(iv) What is the estimated difference in SAT score between black females and non-black females?

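With female = 1 the difference is the coefficient on black plus the interaction: −169.81 + 62.31 = −107.50, so black females score about 107.5 points lower than nonblack females.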
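
The three group comparisons in (ii) through (iv) follow from the same coefficients. The sketch below evaluates them in base R; `sat_shift` is a hypothetical helper written for this illustration, not part of any package.

```r
# Dummy and interaction coefficients from the gpa2 regression summary
b <- c(female = -45.0915, black = -169.8126, female_black = 62.3064)

# Shift in predicted SAT relative to nonblack males, holding hsize fixed
sat_shift <- function(female, black) {
  unname(b["female"] * female + b["black"] * black +
           b["female_black"] * female * black)
}

diff_ii  <- sat_shift(1, 0) - sat_shift(0, 0)  # nonblack females vs nonblack males
diff_iii <- sat_shift(0, 1) - sat_shift(0, 0)  # black males vs nonblack males
diff_iv  <- sat_shift(1, 1) - sat_shift(1, 0)  # black females vs nonblack females

round(c(ii = diff_ii, iii = diff_iii, iv = diff_iv), 2)
```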

Chapter 7 (C1)

##  [1] "age"      "soph"     "junior"   "senior"   "senior5"  "male"    
##  [7] "campus"   "business" "engineer" "colGPA"   "hsGPA"    "ACT"     
## [13] "job19"    "job20"    "drive"    "bike"     "walk"     "voluntr" 
## [19] "PC"       "greek"    "car"      "siblings" "bgfriend" "clubs"   
## [25] "skipped"  "alcohol"  "gradMI"   "fathcoll" "mothcoll"

(i) Add variables mothcoll and fathcoll to the equation

## 
## Call:
## lm(formula = hsGPA ~ mothcoll + fathcoll, data = gpa1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.99342 -0.20982  0.00926  0.20926  0.60658 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.393421   0.046719  72.635   <2e-16 ***
## mothcoll     0.019080   0.057555   0.332    0.741    
## fathcoll    -0.002679   0.058303  -0.046    0.963    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3221 on 138 degrees of freedom
## Multiple R-squared:  0.0008268,  Adjusted R-squared:  -0.01365 
## F-statistic: 0.0571 on 2 and 138 DF,  p-value: 0.9445

(ii) Test for joint significance of mothcoll and fathcoll

## 
## t test of coefficients:
## 
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  3.3934207  0.0542083 62.5996   <2e-16 ***
## mothcoll     0.0190800  0.0571079  0.3341   0.7388    
## fathcoll    -0.0026795  0.0600842 -0.0446   0.9645    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(iii) Add hsGPA to the model

## Warning in anova.lmlist(object, ...): models with response '"PC"' removed
## because response differs from model 1

## Analysis of Variance Table
## 
## Response: hsGPA
##            Df  Sum Sq  Mean Sq F value Pr(>F)
## mothcoll    1  0.0116 0.011629  0.1121 0.7383
## fathcoll    1  0.0002 0.000219  0.0021 0.9634
## Residuals 138 14.3175 0.103750
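
A joint F statistic for mothcoll and fathcoll can be recovered from the analysis of variance table above, since these two regressors are the only ones in that model and their sequential sums of squares add up to the explained sum of squares. This is a sketch in base R with illustrative names; the Mean Sq column is used for the sums of squares because each term has one degree of freedom.

```r
# Values copied from the ANOVA table (Mean Sq equals Sum Sq when Df = 1)
ss_mothcoll <- 0.011629
ss_fathcoll <- 0.000219
ms_resid    <- 0.103750
df_resid    <- 138

f_joint <- ((ss_mothcoll + ss_fathcoll) / 2) / ms_resid
p_joint <- pf(f_joint, 2, df_resid, lower.tail = FALSE)

round(c(F = f_joint, p = p_joint), 4)
```

The result agrees with the overall F statistic reported in the regression summary for hsGPA, so the two variables are jointly insignificant in that model.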

Chapter 7 (C2)

(i) Estimate the model

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98069 -0.21996  0.00707  0.24288  1.22822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
## educ         0.065431   0.006250  10.468  < 2e-16 ***
## exper        0.014043   0.003185   4.409 1.16e-05 ***
## tenure       0.011747   0.002453   4.789 1.95e-06 ***
## married      0.199417   0.039050   5.107 3.98e-07 ***
## black       -0.188350   0.037667  -5.000 6.84e-07 ***
## south       -0.090904   0.026249  -3.463 0.000558 ***
## urban        0.183912   0.026958   6.822 1.62e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared:  0.2526, Adjusted R-squared:  0.2469 
## F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16

(ii) Add the variables exper^2 and tenure^2 to the equation and show that they are jointly insignificant at even the 20% level.

## Analysis of Variance Table
## 
## Model 1: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban
## Model 2: log(wage) ~ educ + exper + tenure + married + black + south + 
##     urban + exper + tenure
##   Res.Df    RSS Df Sum of Sq F Pr(>F)
## 1    927 123.82                      
## 2    927 123.82  0         0
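
The table above shows 0 degrees of freedom because Model 2 lists exper and tenure a second time instead of adding their squares. The sketch below shows one way the comparison could be set up, with the squares entered through `I()`. It uses simulated data (the data frame `sim` and all of its values are made up for illustration), so the numbers it produces say nothing about wage2.

```r
# Simulated stand-in for wage2; every value here is hypothetical
set.seed(1)
n <- 200
sim <- data.frame(
  educ   = rnorm(n, 13, 2),
  exper  = runif(n, 0, 20),
  tenure = runif(n, 0, 15)
)
sim$lwage <- 5 + 0.06 * sim$educ + 0.01 * sim$exper +
  0.01 * sim$tenure + rnorm(n, sd = 0.4)

# Restricted model, then the unrestricted model with both squared terms
restricted   <- lm(lwage ~ educ + exper + tenure, data = sim)
unrestricted <- update(restricted, . ~ . + I(exper^2) + I(tenure^2))

# F test of the two added terms; the Df column should now read 2
joint_test <- anova(restricted, unrestricted)
print(joint_test)
```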

(iii) Extend the original model to allow the return to education to depend on race and test whether the return to education does depend on race.

## 
## Call:
## lm(formula = log(wage) ~ educ * black + exper + tenure + married + 
##     south + urban, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.97782 -0.21832  0.00475  0.24136  1.23226 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.374817   0.114703  46.859  < 2e-16 ***
## educ         0.067115   0.006428  10.442  < 2e-16 ***
## black        0.094809   0.255399   0.371 0.710561    
## exper        0.013826   0.003191   4.333 1.63e-05 ***
## tenure       0.011787   0.002453   4.805 1.80e-06 ***
## married      0.198908   0.039047   5.094 4.25e-07 ***
## south       -0.089450   0.026277  -3.404 0.000692 ***
## urban        0.183852   0.026955   6.821 1.63e-11 ***
## educ:black  -0.022624   0.020183  -1.121 0.262603    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3654 on 926 degrees of freedom
## Multiple R-squared:  0.2536, Adjusted R-squared:  0.2471 
## F-statistic: 39.32 on 8 and 926 DF,  p-value: < 2.2e-16
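
The estimated return to education for each group, and the test of whether it differs by race, can be read off the interaction model above. This is a short base R sketch using the reported coefficients; the names are illustrative.

```r
# Coefficients copied from the educ * black regression summary
b_educ       <- 0.067115
b_educ_black <- -0.022624
se_interact  <- 0.020183
df_res       <- 926

return_nonblack <- b_educ                 # return to a year of education, nonblack
return_black    <- b_educ + b_educ_black  # return to a year of education, black

# t test of H0: the return to education does not depend on race
t_interact <- b_educ_black / se_interact
p_interact <- 2 * pt(-abs(t_interact), df_res)

round(c(nonblack = return_nonblack, black = return_black,
        t = t_interact, p = p_interact), 4)
```

With a p-value of about 0.26, the difference in returns is not statistically significant at conventional levels.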

(iv) What is the estimated wage differential between married blacks and married nonblacks?

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98069 -0.21996  0.00707  0.24288  1.22822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
## educ         0.065431   0.006250  10.468  < 2e-16 ***
## exper        0.014043   0.003185   4.409 1.16e-05 ***
## tenure       0.011747   0.002453   4.789 1.95e-06 ***
## married      0.199417   0.039050   5.107 3.98e-07 ***
## black       -0.188350   0.037667  -5.000 6.84e-07 ***
## south       -0.090904   0.026249  -3.463 0.000558 ***
## urban        0.183912   0.026958   6.822 1.62e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared:  0.2526, Adjusted R-squared:  0.2469 
## F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16

## <NA> 
##   NA
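
The summary above is the additive model, without a married-by-black interaction, so within that model the differential between married blacks and married nonblacks is simply the coefficient on black. The sketch below converts that log-wage coefficient into a percentage difference; it assumes the additive specification is the one of interest.

```r
# Coefficient on black from the additive log(wage) model
b_black <- -0.188350

approx_pct <- 100 * b_black              # log approximation
exact_pct  <- 100 * (exp(b_black) - 1)   # exact percentage difference

round(c(approx = approx_pct, exact = exact_pct), 2)
```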

Chapter 8 (1)

Which of the following are consequences of heteroskedasticity?

(i) The OLS estimators, $\hat{\beta}_j$, are inconsistent.

False. Under heteroskedasticity the OLS estimators $\hat{\beta}_j$ remain unbiased and consistent. What they lose is efficiency: they are no longer BLUE (Best Linear Unbiased Estimators).

(ii) The usual F statistic no longer has an F distribution.

True. The usual F statistic for testing joint hypotheses relies on homoskedasticity, so under heteroskedasticity it no longer has an F distribution and inference based on it can be misleading.

(iii) The OLS estimators are no longer BLUE.

True. As noted in (i), the BLUE property relies on the assumption of homoskedasticity, so the OLS estimators are no longer BLUE when heteroskedasticity is present. It is important to detect and address heteroskedasticity to obtain valid and efficient inference in regression analysis. Common remedies include using heteroskedasticity-robust standard errors or transforming the data to stabilize the variance.

The correct statements are (ii) and (iii).
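
A small simulation can illustrate why (i) is not a consequence: OLS stays centered on the true slope even when the error variance depends on x. This is a sketch with made-up parameters (true slope 2, error standard deviation equal to x), not an analysis of any dataset in this report.

```r
set.seed(42)
n_reps <- 2000   # number of simulated samples
n_obs  <- 100    # observations per sample
beta1  <- 2      # true slope

slopes <- replicate(n_reps, {
  x <- runif(n_obs, 1, 5)
  u <- rnorm(n_obs, sd = x)     # heteroskedastic errors: sd grows with x
  y <- 1 + beta1 * x + u
  cov(x, y) / var(x)            # OLS slope in a simple regression
})

mean_slope <- mean(slopes)
round(mean_slope, 3)
```

The average of the estimated slopes should sit close to 2, while the usual standard errors and F statistics from each sample would not be reliable.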

Chapter 8 (5)

## 'data.frame':    807 obs. of  10 variables:
##  $ educ    : num  16 16 12 13.5 10 6 12 15 12 12 ...
##  $ cigpric : num  60.5 57.9 57.7 57.9 58.3 ...
##  $ white   : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ age     : int  46 40 58 30 17 86 35 48 48 31 ...
##  $ income  : int  20000 30000 30000 20000 20000 6500 20000 30000 20000 20000 ...
##  $ cigs    : int  0 0 3 0 0 0 0 0 0 0 ...
##  $ restaurn: int  0 0 0 0 0 0 0 0 0 0 ...
##  $ lincome : num  9.9 10.3 10.3 9.9 9.9 ...
##  $ agesq   : int  2116 1600 3364 900 289 7396 1225 2304 2304 961 ...
##  $ lcigpric: num  4.1 4.06 4.05 4.06 4.07 ...
##  - attr(*, "time.stamp")= chr "25 Jun 2011 23:03"

##  [1] "educ"     "cigpric"  "white"    "age"      "income"   "cigs"    
##  [7] "restaurn" "lincome"  "agesq"    "lcigpric"

(i) Differences between Standard Errors

## (Intercept)        educ       exper      tenure     married       black 
## 0.113225045 0.006250395 0.003185185 0.002452973 0.039050151 0.037666636 
##       south       urban 
## 0.026248508 0.026958329

(ii) Effect of Education on Smoking Probability

##      educ 
## 0.2617229

(iii) Age Effect on Smoking Probability

## <NA> 
##   NA

## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 23.753, df = 7, p-value = 0.001259

(iv) Interpretation of Coefficient on ‘restaurn’

Holding other factors in the equation fixed, a person in a state with restaurant smoking restrictions has a 0.101 lower probability of smoking, that is, about 10.1 percentage points.

coef_restaurant <- coef(model)["restaurn"]
coef_restaurant

## <NA> 
##   NA

(v) Predicted Probability for Person 206

The predicted probability of smoking for person number 206 is p = 0.656 − 0.069·log(67.44) + 0.012·log(6500) − 0.029(16) + 0.020(77) − 0.00026(77^2) − 0.101(0) − 0.026(0) ≈ 0.0052, or about 0.52%.
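
The arithmetic for person 206 can be verified directly in base R. The sketch simply evaluates the expression written above; `p_206` is an illustrative name.

```r
# Fitted linear probability for person 206, term by term as in the text
p_206 <- 0.656 - 0.069 * log(67.44) + 0.012 * log(6500) - 0.029 * 16 +
  0.020 * 77 - 0.00026 * 77^2 - 0.101 * 0 - 0.026 * 0

round(p_206, 4)
```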

Chapter 8 (C4)

##   state district democA voteA expendA expendB prtystrA lexpendA lexpendB
## 1    AL        7      1    68 328.296   8.737       41 5.793916 2.167567
## 2    AK        1      0    62 626.377 402.477       60 6.439952 5.997638
## 3    AZ        2      1    73  99.607   3.065       55 4.601233 1.120048
## 4    AZ        3      0    69 319.690  26.281       64 5.767352 3.268846
## 5    AR        3      0    75 159.221  60.054       66 5.070293 4.095244
## 6    AR        4      1    69 570.155  21.393       46 6.345908 3.063064
##     shareA
## 1 97.40767
## 2 60.88104
## 3 97.01476
## 4 92.40370
## 5 72.61247
## 6 96.38355

(i) Estimate the model and obtain OLS residuals

## 
## Call:
## lm(formula = voteA ~ prtystrA + democA + log(expendA) + log(expendB), 
##     data = data5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.576  -4.864  -1.146   4.903  24.566 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.66141    4.73604   7.952 2.56e-13 ***
## prtystrA      0.25192    0.07129   3.534  0.00053 ***
## democA        3.79294    1.40652   2.697  0.00772 ** 
## log(expendA)  5.77929    0.39182  14.750  < 2e-16 ***
## log(expendB) -6.23784    0.39746 -15.694  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared:  0.8012, Adjusted R-squared:  0.7964 
## F-statistic: 169.2 on 4 and 168 DF,  p-value: < 2.2e-16

## 
## Call:
## lm(formula = residuals ~ prtystrA + democA + log(expendA) + log(expendB), 
##     data = data5)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -18.576  -4.864  -1.146   4.903  24.566 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)
## (Intercept)  -1.183e-14  4.736e+00       0        1
## prtystrA      1.493e-16  7.129e-02       0        1
## democA        1.843e-15  1.407e+00       0        1
## log(expendA) -3.811e-16  3.918e-01       0        1
## log(expendB)  1.119e-15  3.975e-01       0        1
## 
## Residual standard error: 7.573 on 168 degrees of freedom
## Multiple R-squared:  5.525e-32,  Adjusted R-squared:  -0.02381 
## F-statistic: 2.32e-30 on 4 and 168 DF,  p-value: 1

(ii) Breusch-Pagan test for heteroskedasticity

## 
##  studentized Breusch-Pagan test
## 
## data:  model9
## BP = 9.0934, df = 4, p-value = 0.05881
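
The p-value of the Breusch-Pagan statistic comes from a chi-square distribution with as many degrees of freedom as there are regressors. The base R sketch below reproduces it from the reported statistic; names are illustrative.

```r
# Statistic and degrees of freedom from the Breusch-Pagan output
bp_stat <- 9.0934
bp_df   <- 4

bp_p <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
round(bp_p, 5)
```

Because the p-value is slightly above 0.05, homoskedasticity is rejected at the 10% level but not at the 5% level.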

(iii) White test for heteroskedasticity using F-statistic

## [1] "F-statistic: 2.33011268371627 P-value: 0.0580575140885532"

Chapter 8 (C13)

(i) Estimate the model with robust standard errors

library(lmtest)    # provides coeftest()
library(sandwich)  # provides vcovHC()

model <- lm(children ~ age + I(age^2) + educ + electric + urban, data = data)

robust_model <- coeftest(model, vcov = vcovHC(model, type = "HC1"))

## 
## Call:
## lm(formula = children ~ age + I(age^2) + educ + electric + urban, 
##     data = data_8C13)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.9012 -0.7136 -0.0039  0.7119  7.4318 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -4.2225162  0.2401888 -17.580  < 2e-16 ***
## age          0.3409255  0.0165082  20.652  < 2e-16 ***
## I(age^2)    -0.0027412  0.0002718 -10.086  < 2e-16 ***
## educ        -0.0752323  0.0062966 -11.948  < 2e-16 ***
## electric    -0.3100404  0.0690045  -4.493 7.20e-06 ***
## urban       -0.2000339  0.0465062  -4.301 1.74e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.452 on 4352 degrees of freedom
## Multiple R-squared:  0.5734, Adjusted R-squared:  0.5729 
## F-statistic:  1170 on 5 and 4352 DF,  p-value: < 2.2e-16

(ii) Add religious dummy variables and test joint significance

## 
## Call:
## lm(formula = log(wage) ~ educ + exper + tenure + married + black + 
##     south + urban, data = wage2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.98069 -0.21996  0.00707  0.24288  1.22822 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.395497   0.113225  47.653  < 2e-16 ***
## educ         0.065431   0.006250  10.468  < 2e-16 ***
## exper        0.014043   0.003185   4.409 1.16e-05 ***
## tenure       0.011747   0.002453   4.789 1.95e-06 ***
## married      0.199417   0.039050   5.107 3.98e-07 ***
## black       -0.188350   0.037667  -5.000 6.84e-07 ***
## south       -0.090904   0.026249  -3.463 0.000558 ***
## urban        0.183912   0.026958   6.822 1.62e-11 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3655 on 927 degrees of freedom
## Multiple R-squared:  0.2526, Adjusted R-squared:  0.2469 
## F-statistic: 44.75 on 7 and 927 DF,  p-value: < 2.2e-16

##                           Robust SE
## (Intercept)  5.39549702 0.113796565
## educ         0.06543073 0.006445195
## exper        0.01404301 0.003261112
## tenure       0.01174728 0.002553205
## married      0.19941705 0.040126899
## black       -0.18834991 0.037030313
## south       -0.09090366 0.027505137
## urban        0.18391207 0.027262390

(iii) Obtain fitted values and residuals, regress $\hat{u}_i^2$ on $\hat{y}_i$, $\hat{y}_i^2$, and test joint significance

##   (Intercept)          educ         exper        tenure       married 
## 4.716024e-250  4.939470e-23  1.837738e-05  4.789239e-06  7.986371e-07 
##         black         south         urban 
##  4.416935e-07  9.862904e-04  2.672972e-11
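
The test described in (iii), regressing the squared residuals on the fitted values and their squares and then checking the joint F statistic, can be sketched as follows. The data are simulated with error variance that grows with x (all values are hypothetical), so the sketch shows the mechanics only.

```r
set.seed(123)
n <- 500
x <- runif(n, 1, 5)
y <- 1 + 2 * x + rnorm(n, sd = x)   # error sd grows with x

fit  <- lm(y ~ x)
u2   <- resid(fit)^2                # squared OLS residuals
yhat <- fitted(fit)                 # fitted values

# Auxiliary regression: u^2 on yhat and yhat^2
aux <- lm(u2 ~ yhat + I(yhat^2))
f   <- summary(aux)$fstatistic      # named vector: value, numdf, dendf

white_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))
white_p
```

A small p-value for the joint F test indicates heteroskedasticity.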

(iv) Assess the practical importance of heteroskedasticity

## 
## Call:
## lm(formula = residuals^2 ~ fitted_values)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.1616 -0.1178 -0.0770  0.0238  3.7877 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)  
## (Intercept)    0.50924    0.25575   1.991   0.0468 *
## fitted_values -0.05559    0.03771  -1.474   0.1408  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2439 on 933 degrees of freedom
## Multiple R-squared:  0.002324,   Adjusted R-squared:  0.001254 
## F-statistic: 2.173 on 1 and 933 DF,  p-value: 0.1408

Chapter 9 (1)

Before adding ceoten and comten: initial R-squared (R²) = 0.353 (n = 177)
After adding ceoten and comten: updated R-squared (R²) = 0.375

## 
##  RESET test
## 
## data:  model_before
## RESET = 3.5014, df1 = 1, df2 = 170, p-value = 0.06303

## 
##  RESET test
## 
## data:  model_after
## RESET = 3.5014, df1 = 1, df2 = 170, p-value = 0.06303

Chapter 9 (5)

Exogenous Sample Selection in Example 4.4

In Example 4.4, we estimated a model relating the number of campus crimes to student enrollment for a sample of colleges. It’s important to note that the sample used in this analysis was not a random sample of all colleges in the United States in 1992. This is because many schools did not report campus crimes.

Exogenous Sample Selection

Exogenous sample selection occurs when selection into the sample depends only on the explanatory variables, or is otherwise unrelated to the unobserved factors in the error term of the equation of interest.

Conditions for Exogeneity

Independence from unobservables: For the failure to report crimes to be exogenous, it should be unrelated to the unobserved factors that affect campus crime once enrollment is controlled for. In other words, colleges that do not report should not systematically have more or fewer crimes than reporting colleges of the same size.
Randomness in non-reporting: If a college's decision to report is effectively random with respect to its true crime situation, the missing data amount to exogenous sample selection.

Considerations

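Whether these conditions hold in Example 4.4 needs to be assessed critically. If, for instance, colleges with unusually high crime for their size were less likely to report, selection would depend on the error term and the OLS estimates would be biased.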

## Reject the null hypothesis H0: B1 = 1 in favor of H1: B1 > 1 at the 5% level.

Chapter 9 (C3)

(i) Simple regression model

Unobserved factors in u may be positively correlated with grant for a variety of reasons, including the company's resources (which are positively associated with receiving a grant), the industry, the location, and trainee performance.

## Test for lscrap_1 parameter: Statistically significant

(ii) Estimate the simple regression model using the data for 1988

The p-value of 0.8895 suggests that the coefficient of grant, which is 0.0566, is not statistically different from 0. Therefore, we cannot say that a company’s scrap rate is reduced by receiving a job training grant.

## [1] 54

## 
## Call:
## lm(formula = log(scrap) ~ grant, data = data_9C3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.4043 -0.9536 -0.0465  0.9636  2.8103 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)  
## (Intercept)   0.4085     0.2406   1.698   0.0954 .
## grant         0.0566     0.4056   0.140   0.8895  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.423 on 52 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.0003744,  Adjusted R-squared:  -0.01885 
## F-statistic: 0.01948 on 1 and 52 DF,  p-value: 0.8895

(iii) Add explanatory variable

## 
## Call:
## lm(formula = log(scrap1988) ~ grant1988 + log(scrap1987), data = data_9C3_3_tran)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.9146 -0.1763  0.0057  0.2308  1.5991 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     0.02124    0.08910   0.238   0.8126    
## grant1988      -0.25397    0.14703  -1.727   0.0902 .  
## log(scrap1987)  0.83116    0.04444  18.701   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5127 on 51 degrees of freedom
##   (103 observations deleted due to missingness)
## Multiple R-squared:  0.8728, Adjusted R-squared:  0.8678 
## F-statistic: 174.9 on 2 and 51 DF,  p-value: < 2.2e-16

(iv) P-value test

## [1] "One-sided p-value:"

## [1] 0.04508135

(v) Notable differences

After log(scrap1987) is added as an explanatory variable, the estimated coefficient on grant1988 turns negative and becomes more significant. The coefficient is −0.254, meaning that, holding the 1987 scrap rate fixed, companies with grants have roughly a 25.4% lower scrap rate than companies without grants (the exact percentage change is 100·[exp(−0.254) − 1] ≈ −22.4%). Against the one-sided alternative H1: β_grant < 0, the result is statistically significant at the 5% level because the one-sided p-value is 0.045 < 0.05.
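
The one-sided p-value and the exact percentage effect can be recomputed from the reported coefficient and standard error. This is a base R sketch with illustrative names; the 51 residual degrees of freedom are taken from the summary in (iii).

```r
# Values copied from the regression with log(scrap1987) added
b_grant  <- -0.25397
se_grant <- 0.14703
df_res   <- 51

t_grant     <- b_grant / se_grant
p_one_sided <- pt(t_grant, df_res)          # lower tail, for H1: coefficient < 0
exact_pct   <- 100 * (exp(b_grant) - 1)     # exact percentage change in scrap

round(c(t = t_grant, p = p_one_sided, pct = exact_pct), 4)
```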

Chapter 9 (C4)

(i) Estimate the Model and Obtain OLS Residuals

## 
## Call:
## lm(formula = infmort ~ log(pcinc) + log(physic) + log(popul) + 
##     DC, data = infmrt_1990)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4964 -0.8076  0.0000  0.9358  2.6077 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  23.9548    12.4195   1.929  0.05994 .  
## log(pcinc)   -0.5669     1.6412  -0.345  0.73135    
## log(physic)  -2.7418     1.1908  -2.303  0.02588 *  
## log(popul)    0.6292     0.1911   3.293  0.00191 ** 
## DC           16.0350     1.7692   9.064 8.43e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.246 on 46 degrees of freedom
## Multiple R-squared:  0.691,  Adjusted R-squared:  0.6641 
## F-statistic: 25.71 on 4 and 46 DF,  p-value: 3.146e-11

(ii) Breusch-Pagan Test for Heteroskedasticity

## 
##  studentized Breusch-Pagan test
## 
## data:  model
## BP = 23.753, df = 7, p-value = 0.001259

Chapter 10 (1)

(i) Independently distributed time series observations

independence_statement <- "Disagree"
independence_explanation <- "Time series data often exhibit autocorrelation, violating independence assumptions."

(ii) Unbiased OLS estimator in time series regression

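ols_statement <- "Agree"
ols_explanation <- "Under the first three Gauss-Markov assumptions for time series (linearity in parameters, no perfect collinearity, zero conditional mean), OLS is unbiased; serial correlation affects inference, not unbiasedness."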

(iv) Seasonality in annual time series observations

seasonality_statement <- "Agree"
seasonality_explanation <- "Seasonality is a within-year pattern, so it is not an issue with annual observations."

Chapter 10 (5). Model for Housing Starts

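Assuming the data frame ‘data’ contains the columns housing_starts, interest_rates, real_income, and date, with trend_variable and seasonality_variable constructed from date before the model is fit: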

model_housing_starts <- lm(housing_starts ~ trend_variable + seasonality_variable + interest_rates + real_income, data = data)
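
One way the trend and seasonal terms could be built from a date column is sketched below. It assumes quarterly observations and uses simulated data; the data frame `sim`, its values, and the column names `trend` and `quarter` are hypothetical stand-ins for the variables named above.

```r
set.seed(10)
n_q <- 80   # 20 years of quarterly observations (assumed frequency)

sim <- data.frame(
  date           = seq(as.Date("2000-01-01"), by = "quarter", length.out = n_q),
  interest_rates = runif(n_q, 2, 8),
  real_income    = 30 + 0.1 * seq_len(n_q) + rnorm(n_q)
)

sim$trend   <- seq_len(n_q)                # linear time trend
sim$quarter <- factor(quarters(sim$date))  # "Q1".."Q4"; lm() creates the dummies

# Simulated outcome with a trend and a second-quarter seasonal bump
sim$housing_starts <- 100 + 0.5 * sim$trend - 3 * sim$interest_rates +
  2 * sim$real_income + 5 * (sim$quarter == "Q2") + rnorm(n_q, sd = 4)

fit <- lm(housing_starts ~ trend + quarter + interest_rates + real_income,
          data = sim)
names(coef(fit))
```

Entering the quarter as a factor gives three seasonal dummies with the first quarter as the base period, which avoids the dummy variable trap.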

Chapter 10 (C1). Using a Dummy Variable for Federal Reserve Policy Change

Create a dummy variable for the policy change after 1979
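
A minimal sketch of how such a dummy could be constructed is shown below. The vector `year` is a hypothetical stand-in for the year column of the data set, and its range is an assumption made for illustration.

```r
# Hypothetical vector of years standing in for the data set's year column
year <- 1948:2003

# Equals 1 for years after 1979 and 0 otherwise
dummy <- as.integer(year > 1979)

table(dummy)
```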

## 
## Call:
## lm(formula = inf ~ dummy + ci3 + cdef + cinf, data = intdef)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.1867 -1.8047 -0.8382  0.9943  6.7831 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   3.3937     0.5193   6.535 3.21e-08 ***
## dummy         0.9400     0.7977   1.178  0.24423    
## ci3           0.4391     0.3172   1.385  0.17233    
## cdef          0.4382     0.3370   1.300  0.19954    
## cinf          0.5707     0.2103   2.714  0.00909 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.799 on 50 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.2012, Adjusted R-squared:  0.1373 
## F-statistic: 3.148 on 4 and 50 DF,  p-value: 0.02196

## Conclusion regarding the model:

## From the regression results:

## - The policy change after 1979 represented by the 'dummy' variable doesn't appear to have a statistically significant impact on CPI inflation rates (p-value = 0.24423).

## - Among additional variables, only 'cinf' (the change in the inflation rate) shows a statistically significant relationship with CPI inflation rates (p-value = 0.00909).

## - The overall model explains a small proportion of the variance in CPI inflation rates (Adjusted R-squared = 0.1373).

## Therefore, while 'cinf' seems to be related to CPI inflation rates, the policy change after 1979, as represented by the 'dummy' variable, does not show a significant impact in this model.

Chapter 10 (C6)

(i): Regress gfr on t and tsq to obtain the residuals (gft)

model_t_tsq <- lm(gfr ~ t + tsq, data = fertil3)
residuals_gft <- resid(model_t_tsq)
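
The detrending step can be illustrated on simulated data. The sketch below removes a quadratic trend from an outcome and shows that regressing the detrended series on a regressor plus the trend terms gives the same coefficient on that regressor as the regression using the original series. All variables here (`x`, `y`, `trend`, `trend_sq`) are made up for illustration and are not the fertil3 series.

```r
set.seed(7)
n        <- 72
trend    <- seq_len(n)
trend_sq <- trend^2
x <- 0.05 * trend + rnorm(n)                         # a trending regressor
y <- 90 + 0.8 * trend - 0.01 * trend_sq + 1.5 * x + rnorm(n)

# Step 1: detrend y with a quadratic time trend
y_detr <- resid(lm(y ~ trend + trend_sq))

# Step 2: same regressors, original versus detrended dependent variable
full <- lm(y ~ x + trend + trend_sq)
detr <- lm(y_detr ~ x + trend + trend_sq)

c(full = unname(coef(full)["x"]), detrended = unname(coef(detr)["x"]))
```

The coefficient on x is identical in both regressions; what changes is the R-squared, because the detrended regression no longer gets credit for explaining the trend.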

(ii): Regress gft on all variables from equation (10.35), including t and tsq

## Warning in summary.lm(model_10_35): essentially perfect fit: summary may be
## unreliable

## 
## Call:
## lm(formula = residuals_gft ~ pe + year + tsq + pe_1 + pe_2 + 
##     pe_3 + pe_4 + pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + 
##     cpe_3 + cpe_4 + gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + 
##     gfr_2 + t + tsq, data = fertil3)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -5.495e-14 -2.374e-15 -2.200e-17  2.638e-15  3.812e-14 
## 
## Coefficients: (6 not defined because of singularities)
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept)  3.003e+01  4.452e-12  6.746e+12   <2e-16 ***
## pe          -2.624e-16  1.381e-16 -1.900e+00   0.0634 .  
## year        -7.170e-02  2.307e-15 -3.107e+13   <2e-16 ***
## tsq          7.959e-03  6.783e-17  1.173e+14   <2e-16 ***
## pe_1         3.803e-16  1.519e-16  2.504e+00   0.0157 *  
## pe_2         2.118e-17  1.653e-16  1.280e-01   0.8986    
## pe_3        -2.758e-16  1.563e-16 -1.764e+00   0.0839 .  
## pe_4         1.363e-17  1.222e-16  1.120e-01   0.9117    
## pill         5.956e-15  1.134e-14  5.250e-01   0.6020    
## ww2         -3.761e-15  1.268e-14 -2.970e-01   0.7679    
## tcu          1.130e-18  5.813e-19  1.945e+00   0.0576 .  
## cgfr         1.000e+00  5.225e-16  1.914e+15   <2e-16 ***
## cpe                 NA         NA         NA       NA    
## cpe_1               NA         NA         NA       NA    
## cpe_2               NA         NA         NA       NA    
## cpe_3               NA         NA         NA       NA    
## cpe_4       -2.998e-17  1.239e-16 -2.420e-01   0.8098    
## gfr_1        1.000e+00  2.436e-16  4.105e+15   <2e-16 ***
## cgfr_1      -2.649e-16  5.109e-16 -5.190e-01   0.6064    
## cgfr_2       4.872e-16  5.394e-16  9.030e-01   0.3708    
## cgfr_3      -7.235e-16  4.843e-16 -1.494e+00   0.1416    
## cgfr_4       8.808e-19  4.778e-16  2.000e-03   0.9985    
## gfr_2               NA         NA         NA       NA    
## t                   NA         NA         NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.244e-14 on 49 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 6.667e+30 on 17 and 49 DF,  p-value: < 2.2e-16

(iii): Re-estimate equation (10.35) but add the ‘pe_3’ as an additional variable to check stat. significance.

## Warning in summary.lm(model_with_pe_3): essentially perfect fit: summary may be
## unreliable

## 
## Call:
## lm(formula = gfr ~ pe + year + tsq + pe_1 + pe_2 + pe_3 + pe_4 + 
##     pill + ww2 + tcu + cgfr + cpe + cpe_1 + cpe_2 + cpe_3 + cpe_4 + 
##     gfr_1 + cgfr_1 + cgfr_2 + cgfr_3 + cgfr_4 + gfr_2 + t + tsq + 
##     pe_3, data = fertil3)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -6.294e-14 -3.614e-15  3.870e-16  3.960e-15  5.021e-14 
## 
## Coefficients: (6 not defined because of singularities)
##               Estimate Std. Error    t value Pr(>|t|)    
## (Intercept) -5.849e-12  4.744e-12 -1.233e+00    0.224    
## pe          -6.052e-17  1.472e-16 -4.110e-01    0.683    
## year         3.028e-15  2.459e-15  1.231e+00    0.224    
## tsq         -7.141e-17  7.229e-17 -9.880e-01    0.328    
## pe_1         1.081e-16  1.618e-16  6.680e-01    0.507    
## pe_2         3.984e-18  1.762e-16  2.300e-02    0.982    
## pe_3        -2.328e-17  1.666e-16 -1.400e-01    0.889    
## pe_4        -5.775e-17  1.303e-16 -4.430e-01    0.660    
## pill        -1.025e-14  1.209e-14 -8.480e-01    0.401    
## ww2          1.107e-14  1.351e-14  8.190e-01    0.417    
## tcu          5.447e-19  6.195e-19  8.790e-01    0.384    
## cgfr         1.000e+00  5.569e-16  1.796e+15   <2e-16 ***
## cpe                 NA         NA         NA       NA    
## cpe_1               NA         NA         NA       NA    
## cpe_2               NA         NA         NA       NA    
## cpe_3               NA         NA         NA       NA    
## cpe_4        5.996e-17  1.320e-16  4.540e-01    0.652    
## gfr_1        1.000e+00  2.596e-16  3.852e+15   <2e-16 ***
## cgfr_1      -7.823e-16  5.445e-16 -1.437e+00    0.157    
## cgfr_2       5.662e-17  5.749e-16  9.900e-02    0.922    
## cgfr_3      -5.129e-16  5.162e-16 -9.940e-01    0.325    
## cgfr_4      -1.866e-16  5.092e-16 -3.660e-01    0.716    
## gfr_2               NA         NA         NA       NA    
## t                   NA         NA         NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.326e-14 on 49 degrees of freedom
##   (5 observations deleted due to missingness)
## Multiple R-squared:      1,  Adjusted R-squared:      1 
## F-statistic: 7.853e+30 on 17 and 49 DF,  p-value: < 2.2e-16

Final_Exam

Maral Delger

2024-01-05