Applied logistic regression - Week Two Homework

Exercise 1

Use the Myopia Study (MYOPIA.dta)

Using the results of the output from Stata, assess the significance of the slope coefficient for SPHEQ using the likelihood ratio test and the Wald test. What assumptions are needed for the p-values computed for each of these tests to be valid? Are the results of these tests consistent with one another? What is the value of the deviance for the fitted model?

MYOPIA<-read.csv(file = "https://learn.canvas.net/courses/1179/files/461762/download?download_frd=1")#read the file from adress in the assignment

#run the regression

fit_hw2_ex1<-glm(MYOPIC~SPHEQ, data=MYOPIA, family = binomial())
stargazer::stargazer(fit_hw2_ex1,type = 'html')


	Dependent variable:

	MYOPIC

SPHEQ	-3.833^***
	(0.418)

Constant	0.054
	(0.207)


Observations	618
Log Likelihood	-168.672
Akaike Inf. Crit.	341.345

Note:	p<0.1; p<0.05; p<0.01

Firstly, lets explore a litle bit the fit_hw2_ex1 object:

str(fit_hw2_ex1)

List of 30
 $ coefficients     : Named num [1:2] 0.054 -3.833
  ..- attr(*, "names")= chr [1:2] "(Intercept)" "SPHEQ"
 $ residuals        : Named num [1:618] 1.78 -1.1 -1.01 8.09 -1.07 ...
  ..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
 $ fitted.values    : Named num [1:618] 0.563 0.0931 0.0114 0.1236 0.068 ...
  ..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
 $ effects          : Named num [1:618] 9.481 9.162 -0.107 2.633 -0.286 ...
  ..- attr(*, "names")= chr [1:618] "(Intercept)" "SPHEQ" "" "" ...
 $ R                : num [1:2, 1:2] -7.11 0 -2.57 -2.39
  ..- attr(*, "dimnames")=List of 2
  .. ..$ : chr [1:2] "(Intercept)" "SPHEQ"
  .. ..$ : chr [1:2] "(Intercept)" "SPHEQ"
 $ rank             : int 2
 $ qr               :List of 5
  ..$ qr   : num [1:618, 1:2] -7.1073 0.0409 0.0149 0.0463 0.0354 ...
  .. ..- attr(*, "dimnames")=List of 2
  .. .. ..$ : chr [1:618] "1" "2" "3" "4" ...
  .. .. ..$ : chr [1:2] "(Intercept)" "SPHEQ"
  ..$ rank : int 2
  ..$ qraux: num [1:2] 1.07 1.03
  ..$ pivot: int [1:2] 1 2
  ..$ tol  : num 1e-11
  ..- attr(*, "class")= chr "qr"
 $ family           :List of 12
  ..$ family    : chr "binomial"
  ..$ link      : chr "logit"
  ..$ linkfun   :function (mu)  
  ..$ linkinv   :function (eta)  
  ..$ variance  :function (mu)  
  ..$ dev.resids:function (y, mu, wt)  
  ..$ aic       :function (y, n, mu, wt, dev)  
  ..$ mu.eta    :function (eta)  
  ..$ initialize:  expression({     if (NCOL(y) == 1) {         if (is.factor(y))              y <- y != levels(y)[1L]         n <- rep.int(1, nobs)         y[weights == 0] <- 0         if (any(y < 0 | y > 1))              stop("y values must be 0 <= y <= 1")         mustart <- (weights * y + 0.5)/(weights + 1)         m <- weights * y         if (any(abs(m - round(m)) > 0.001))              warning("non-integer #successes in a binomial glm!")     }     else if (NCOL(y) == 2) {         if (any(abs(y - round(y)) > 0.001))              warning("non-integer counts in a binomial glm!")         n <- y[, 1] + y[, 2]         y <- ifelse(n == 0, 0, y[, 1]/n)         weights <- weights * n         mustart <- (n * y + 0.5)/(n + 1)     }     else stop("for the 'binomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
  ..$ validmu   :function (mu)  
  ..$ valideta  :function (eta)  
  ..$ simulate  :function (object, nsim)  
  ..- attr(*, "class")= chr "family"
 $ linear.predictors: Named num [1:618] 0.253 -2.277 -4.465 -1.958 -2.618 ...
  ..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
 $ deviance         : num 337
 $ aic              : num 341
 $ null.deviance    : num 480
 $ iter             : int 6
 $ weights          : Named num [1:618] 0.246 0.0844 0.0112 0.1084 0.0634 ...
  ..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
 $ prior.weights    : Named num [1:618] 1 1 1 1 1 1 1 1 1 1 ...
  ..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
 $ df.residual      : int 616
 $ df.null          : int 617
 $ y                : Named num [1:618] 1 0 0 1 0 0 0 0 0 0 ...
  ..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
 $ converged        : logi TRUE
 $ boundary         : logi FALSE
 $ model            :'data.frame':  618 obs. of  2 variables:
  ..$ MYOPIC: int [1:618] 1 0 0 1 0 0 0 0 0 0 ...
  ..$ SPHEQ : num [1:618] -0.052 0.608 1.179 0.525 0.697 ...
  ..- attr(*, "terms")=Classes 'terms', 'formula' length 3 MYOPIC ~ SPHEQ
  .. .. ..- attr(*, "variables")= language list(MYOPIC, SPHEQ)
  .. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. .. ..$ : chr [1:2] "MYOPIC" "SPHEQ"
  .. .. .. .. ..$ : chr "SPHEQ"
  .. .. ..- attr(*, "term.labels")= chr "SPHEQ"
  .. .. ..- attr(*, "order")= int 1
  .. .. ..- attr(*, "intercept")= int 1
  .. .. ..- attr(*, "response")= int 1
  .. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. .. ..- attr(*, "predvars")= language list(MYOPIC, SPHEQ)
  .. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. .. ..- attr(*, "names")= chr [1:2] "MYOPIC" "SPHEQ"
 $ call             : language glm(formula = MYOPIC ~ SPHEQ, family = binomial(), data = MYOPIA)
 $ formula          :Class 'formula' length 3 MYOPIC ~ SPHEQ
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
 $ terms            :Classes 'terms', 'formula' length 3 MYOPIC ~ SPHEQ
  .. ..- attr(*, "variables")= language list(MYOPIC, SPHEQ)
  .. ..- attr(*, "factors")= int [1:2, 1] 0 1
  .. .. ..- attr(*, "dimnames")=List of 2
  .. .. .. ..$ : chr [1:2] "MYOPIC" "SPHEQ"
  .. .. .. ..$ : chr "SPHEQ"
  .. ..- attr(*, "term.labels")= chr "SPHEQ"
  .. ..- attr(*, "order")= int 1
  .. ..- attr(*, "intercept")= int 1
  .. ..- attr(*, "response")= int 1
  .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv> 
  .. ..- attr(*, "predvars")= language list(MYOPIC, SPHEQ)
  .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
  .. .. ..- attr(*, "names")= chr [1:2] "MYOPIC" "SPHEQ"
 $ data             :'data.frame':  618 obs. of  18 variables:
  ..$ ID       : int [1:618] 1 2 3 4 5 6 7 8 9 10 ...
  ..$ STUDYYEAR: int [1:618] 1992 1995 1991 1990 1995 1995 1993 1991 1991 1991 ...
  ..$ MYOPIC   : int [1:618] 1 0 0 1 0 0 0 0 0 0 ...
  ..$ AGE      : int [1:618] 6 6 6 6 5 6 6 6 7 6 ...
  ..$ GENDER   : int [1:618] 1 1 1 1 0 0 1 1 0 1 ...
  ..$ SPHEQ    : num [1:618] -0.052 0.608 1.179 0.525 0.697 ...
  ..$ AL       : num [1:618] 21.9 22.4 22.5 22.2 23.3 ...
  ..$ ACD      : num [1:618] 3.69 3.7 3.46 3.86 3.68 ...
  ..$ LT       : num [1:618] 3.5 3.39 3.51 3.61 3.45 ...
  ..$ VCD      : num [1:618] 14.7 15.3 15.5 14.7 16.2 ...
  ..$ SPORTHR  : int [1:618] 45 4 14 18 14 10 12 12 4 30 ...
  ..$ READHR   : int [1:618] 8 0 0 11 0 6 7 0 0 5 ...
  ..$ COMPHR   : int [1:618] 0 1 2 0 0 2 2 0 3 1 ...
  ..$ STUDYHR  : int [1:618] 0 1 0 0 0 1 1 0 1 0 ...
  ..$ TVHR     : int [1:618] 10 7 10 4 4 19 8 8 3 10 ...
  ..$ DIOPTERHR: int [1:618] 34 12 14 37 4 44 36 8 12 27 ...
  ..$ MOMMY    : int [1:618] 1 1 0 0 1 0 0 0 0 0 ...
  ..$ DADMY    : int [1:618] 1 1 0 1 0 1 1 0 0 0 ...
 $ offset           : NULL
 $ control          :List of 3
  ..$ epsilon: num 1e-08
  ..$ maxit  : num 25
  ..$ trace  : logi FALSE
 $ method           : chr "glm.fit"
 $ contrasts        : NULL
 $ xlevels          : Named list()
 - attr(*, "class")= chr [1:2] "glm" "lm"

Thirty is quite a lot but if we endure we can find:

coefficients
deviance
null.deviance
...

Likelihood ratio test

\[ H_{0}:\beta_{1}=0 \] \[ H_{1}:\beta_{1}\neq 0 \] \[ statistic = -2\ln{\frac{likelihood~without~SPEQ}{likelihood~ with~SPEQ}}\\ ~~= null~deviance-deviance\\ \sim\chi^{2} \] $statistic=$142.7321372

Note that we used inline code combined with $\LaTeX$ above, the code was:fit_hw2_ex1$null.deviance-fit_hw2_ex1$deviance

And, now we can calculate chi square p-value:

p_value=1-pchisq(df = 1,statistic)#alternatively use pchisq(df = 1,statistic,lower.tail=FALSE) 
p_value

## [1] 0

So we can say that SPEQ variable is significant and that we should keep it in the model, that is we reject the $H_0$ hypotesis.

Wald test

\[H_0:\beta_1=0 \] \[H_1:\beta_1\neq0\] \[Wald~statistic=\frac{\hat{\beta_1}}{\hat{SE(\beta_1)}} \sim N(0,1)\] $Wald~statistic=$-9.1619878

Once again we used $\LaTeX$ inline with inline code coef(summary(fit_hw2_ex1))[2,3]. Here we used summary function since is much easier than looking thru fit_hw2_ex1, see the code below

knitr::kable(coef(summary(fit_hw2_ex1)))

	Estimate	Std. Error	z value	Pr(>\|z\|)
(Intercept)	0.0539731	0.2067483	0.2610572	0.7940484
SPHEQ	-3.8330976	0.4183696	-9.1619878	0.0000000

Also from this table we can extract p value by subseting coef(summary(fit_hw2_ex1))[2,4] which is an alternative to usage of pnorm function. $p=$0

Finali we reject the $H_0$ hypotesis that is, we conclude that SPEQ is highly significant.

Applied logistic regression - Week Two Homework

Milos Cipovic

September 3, 2016

Exercise 1

Likelihood ratio test

Wald test