rm(list=ls())                   
library(wooldridge)       
library(lmtest)                 # load package; to conduct hypothesis test using robust SE
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(plm) 

pnt <- data.frame(pntsprd)

W17/C1

Use the data in pntsprd.wf14 for this exercise.

(i) The variable favwin is a binary variable equal to one if the team favored by the Las Vegas point spread wins. A linear probability model to estimate the probability that the favored team wins is P(favwin = 1 | spread) = β0 + β1 spread. Explain why, if the spread incorporates all relevant information, we expect β0 = 0.5.

When spread = 0, neither team is favored, so if the spread captures all relevant information each team should win with probability 0.5. In the model, P(favwin = 1 | spread = 0) = β0, so we expect β0 = 0.5.

(ii) Estimate the model from part (i) by OLS. Test H0: β0 = 0.5 against a two-sided alternative. Use both the usual and heteroskedasticity-robust standard errors.

fit <- lm(favwin~spread, pnt)
summary(fit) # usual OLS standard errors; note that https://search.r-project.org/CRAN/refmans/fixest/html/summary.fixest.html documents summary() for fixest models, not for lm
## 
## Call:
## lm(formula = favwin ~ spread, data = pnt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9836 -0.1192  0.1519  0.3069  0.4037 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.576949   0.028235  20.434  < 2e-16 ***
## spread      0.019366   0.002339   8.281 9.32e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4017 on 551 degrees of freedom
## Multiple R-squared:  0.1107, Adjusted R-squared:  0.1091 
## F-statistic: 68.57 on 1 and 551 DF,  p-value: 9.324e-16
summary(fit, vcov = "HC2") # same output as summary(fit): summary.lm() has no vcov argument, so it is silently ignored
## 
## Call:
## lm(formula = favwin ~ spread, data = pnt)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.9836 -0.1192  0.1519  0.3069  0.4037 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 0.576949   0.028235  20.434  < 2e-16 ***
## spread      0.019366   0.002339   8.281 9.32e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.4017 on 551 degrees of freedom
## Multiple R-squared:  0.1107, Adjusted R-squared:  0.1091 
## F-statistic: 68.57 on 1 and 551 DF,  p-value: 9.324e-16
coeftest(fit, vcov=vcovHC, type="HC1")  #use heteroskedasticity robust SE
## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) 0.5769492  0.0316568  18.225 < 2.2e-16 ***
## spread      0.0193655  0.0019218  10.077 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

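Usual SE: (Intercept) 0.576949 (SE 0.028235); spread 0.019366 (SE 0.002339).

Robust (HC1) SE: (Intercept) 0.5769492 (SE 0.0316568); spread 0.0193655 (SE 0.0019218).

The t statistics printed in the tables test β0 = 0, not β0 = 0.5. For H0: β0 = 0.5 the statistic is t = (0.576949 − 0.5) / SE, which is about 2.73 with the usual SE and about 2.43 with the robust SE. Both exceed the 5% two-sided critical value of about 1.96, so H0 is rejected at the 5% level in both cases.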
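The test of H0: β0 = 0.5 can be reproduced by hand from the printed estimates. The following is a minimal sketch that assumes the numbers copied from the two coefficient tables are the relevant ones; the variable names are my own.

```r
# Estimates copied from the two coefficient tables above
b0     <- 0.5769492   # OLS intercept
se_ols <- 0.028235    # usual standard error of the intercept
se_hc1 <- 0.0316568   # HC1 robust standard error of the intercept
df     <- 551         # residual degrees of freedom

# t statistics for H0: beta0 = 0.5 (not the default H0: beta0 = 0)
t_ols <- (b0 - 0.5) / se_ols
t_hc1 <- (b0 - 0.5) / se_hc1

# Two-sided p-values from the t distribution
p_ols <- 2 * pt(abs(t_ols), df, lower.tail = FALSE)
p_hc1 <- 2 * pt(abs(t_hc1), df, lower.tail = FALSE)

print(round(c(t_ols = t_ols, p_ols = p_ols, t_hc1 = t_hc1, p_hc1 = p_hc1), 4))
```

Working from the printed numbers keeps the check independent of which model the object `fit` currently holds, since `fit` is reused for later models.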

(iii) Is spread statistically significant? What is the estimated probability that the favored team wins when spread = 10?

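Yes, spread is statistically significant with both the usual (t = 8.28) and the robust (t = 10.08) standard errors. The estimated probability at spread = 10 is 0.5769492 + 0.0193655 × 10 = 0.7706042.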
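The LPM prediction is just a point on a straight line, which a few lines of R make explicit. This sketch uses the coefficients copied from the OLS output; `lpm_prob` and `spread_at_one` are illustrative names, and whether the sample contains spreads that large is not checked here.

```r
# LPM coefficients copied from the OLS output above
b0 <- 0.5769492
b1 <- 0.0193655

# Fitted LPM probability is linear in spread
lpm_prob <- function(spread) b0 + b1 * spread

p10 <- lpm_prob(10)
print(p10)

# Spread at which the fitted line reaches a "probability" of 1
spread_at_one <- (1 - b0) / b1
print(spread_at_one)
```

Because the fitted line is unbounded, predictions above 1 are possible for spreads beyond `spread_at_one`, which is one motivation for the probit model in part (iv).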

(iv) Now, estimate a probit model for P(favwin = 1 | spread). Interpret and test the null hypothesis that the intercept is zero. [Hint: Remember that Φ(0) = 0.5.]

fit <- glm(favwin~spread, pnt, family = binomial(link = "probit"))
summary(fit, vcov = "HC1") # usual (non-robust) SEs: summary.glm() has no vcov argument, so it is silently ignored
## 
## Call:
## glm(formula = favwin ~ spread, family = binomial(link = "probit"), 
##     data = pnt)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.69141   0.09976   0.45805   0.83300   1.12244  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.01059    0.10349  -0.102    0.918    
## spread       0.09246    0.01212   7.631 2.32e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 605.50  on 552  degrees of freedom
## Residual deviance: 527.12  on 551  degrees of freedom
## AIC: 531.12
## 
## Number of Fisher Scoring iterations: 5
coeftest(fit, vcov=vcovHC, type="HC1")  #use heteroskedasticity robust SE
## 
## z test of coefficients:
## 
##              Estimate Std. Error z value  Pr(>|z|)    
## (Intercept) -0.010592   0.101788 -0.1041    0.9171    
## spread       0.092463   0.011612  7.9626 1.685e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

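The intercept is statistically insignificant (z ≈ −0.10, p ≈ 0.92 with either set of standard errors), so we fail to reject H0: β0 = 0. Since Φ(0) = 0.5, this is consistent with the favored team winning with probability 0.5 when spread = 0, in contrast to the LPM result in part (ii).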
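To connect the probit intercept to the hint Φ(0) = 0.5, the z test and the implied probability at spread = 0 can be computed from the printed robust estimates. This is a small sketch with values copied from the coeftest output; the variable names are my own.

```r
# Probit intercept and its HC1 robust SE, copied from the coeftest output above
b0_probit <- -0.010592
se_hc1    <- 0.101788

# z statistic and two-sided p-value for H0: beta0 = 0
z       <- b0_probit / se_hc1
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)

# Implied probability that the favored team wins when spread = 0
p_at_zero <- pnorm(b0_probit)

print(round(c(z = z, p_value = p_value, p_at_zero = p_at_zero), 4))
```

An implied probability very close to 0.5 at spread = 0 is what the hypothesis in part (i) predicts.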

(v) Use the probit model to estimate the probability that the favored team wins when spread=10. Compare this with the LPM estimate from part (iii).

The probit index is −0.010592 + 0.092463 × 10 = 0.914038, and Φ(0.914038) = 0.8196516. This is about 0.05 higher than the LPM estimate of 0.7706 from part (iii).

pnorm(0.914038)
## [1] 0.8196516
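The gap between the two models is not constant in spread, which a side-by-side table shows. This sketch uses the coefficients printed earlier; the grid of spread values is an arbitrary choice of mine, not taken from the data.

```r
# Coefficients copied from the OLS and probit outputs above
lpm    <- c(b0 = 0.5769492, b1 = 0.0193655)
probit <- c(b0 = -0.010592, b1 = 0.092463)

# Illustrative grid of point spreads (an assumption, not from the data)
spread <- c(0, 5, 10, 15, 20)

# LPM: linear in spread; probit: standard normal CDF of the linear index
p_lpm    <- lpm[["b0"]] + lpm[["b1"]] * spread
p_probit <- pnorm(probit[["b0"]] + probit[["b1"]] * spread)

comparison <- data.frame(spread, p_lpm, p_probit, diff = p_probit - p_lpm)
print(round(comparison, 4))
```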

(vi) Add the variables favhome, fav25, and und25 to the probit model and test joint significance of these variables using the likelihood ratio test. (How many df are in the chi-square distribution?) Interpret this result, focusing on the question of whether the spread incorporates all observable information prior to a game.

fit <- glm(favwin~spread+favhome+fav25+und25, pnt, family = binomial(link = "probit"))
summary(fit, vcov = "HC1") # usual (non-robust) SEs: summary.glm() has no vcov argument, so it is silently ignored
## 
## Call:
## glm(formula = favwin ~ spread + favhome + fav25 + und25, family = binomial(link = "probit"), 
##     data = pnt)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.7008   0.1028   0.4598   0.8220   1.2130  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -0.055180   0.129239  -0.427    0.669    
## spread       0.087884   0.012779   6.877 6.11e-12 ***
## favhome      0.148575   0.136871   1.086    0.278    
## fav25        0.003068   0.158790   0.019    0.985    
## und25       -0.219809   0.251272  -0.875    0.382    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 605.50  on 552  degrees of freedom
## Residual deviance: 525.28  on 548  degrees of freedom
## AIC: 535.28
## 
## Number of Fisher Scoring iterations: 6
#coeftest(fit, vcov=vcovHC, type="HC1")  #use heteroskedasticity robust SE

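The chi-square distribution has 3 df, one for each added variable (favhome, fav25, und25). The LR statistic is the drop in residual deviance, 527.12 − 525.28 = 1.84, well below the 5% critical value of about 7.81, so the three variables are jointly insignificant. Each is also individually insignificant, while spread remains significant with a coefficient close to the one in part (iv). This is consistent with the spread incorporating all observable information prior to a game.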
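The likelihood ratio test can be computed directly from the two residual deviances reported in the summaries. This is a minimal sketch using the printed deviances and residual degrees of freedom; the variable names are my own.

```r
# Residual deviances and residual df copied from the two probit summaries above
dev_restricted   <- 527.12  # probit with spread only (551 residual df)
dev_unrestricted <- 525.28  # probit with favhome, fav25, und25 added (548 residual df)

# One degree of freedom per added regressor
df_lr <- 551 - 548

# For a binary-response model, LR = 2 * (logLik_ur - logLik_r) = drop in deviance
lr_stat <- dev_restricted - dev_unrestricted
p_value <- pchisq(lr_stat, df = df_lr, lower.tail = FALSE)
crit_5  <- qchisq(0.95, df = df_lr)

print(round(c(lr_stat = lr_stat, df = df_lr, p_value = p_value, crit_5 = crit_5), 4))
```

If the two models were stored under separate names instead of both being assigned to `fit` (say the hypothetical `fit_r` and `fit_u`), `anova(fit_r, fit_u, test = "LRT")` should give the same statistic.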