Use the Myopia Study (MYOPIA.dta)
Using the results of the output from Stata, assess the significance of the slope coefficient for SPHEQ using the likelihood ratio test and the Wald test. What assumptions are needed for the p-values computed for each of these tests to be valid? Are the results of these tests consistent with one another? What is the value of the deviance for the fitted model?
MYOPIA<-read.csv(file = "https://learn.canvas.net/courses/1179/files/461762/download?download_frd=1")#read the file from adress in the assignment
#run the regression
fit_hw2_ex1<-glm(MYOPIC~SPHEQ, data=MYOPIA, family = binomial())
stargazer::stargazer(fit_hw2_ex1,type = 'html')
| Dependent variable: | |
| MYOPIC | |
| SPHEQ | -3.833*** |
| (0.418) | |
| Constant | 0.054 |
| (0.207) | |
| Observations | 618 |
| Log Likelihood | -168.672 |
| Akaike Inf. Crit. | 341.345 |
| Note: | p<0.1; p<0.05; p<0.01 |
Firstly, lets explore a litle bit the fit_hw2_ex1 object:
str(fit_hw2_ex1)
List of 30
$ coefficients : Named num [1:2] 0.054 -3.833
..- attr(*, "names")= chr [1:2] "(Intercept)" "SPHEQ"
$ residuals : Named num [1:618] 1.78 -1.1 -1.01 8.09 -1.07 ...
..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
$ fitted.values : Named num [1:618] 0.563 0.0931 0.0114 0.1236 0.068 ...
..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
$ effects : Named num [1:618] 9.481 9.162 -0.107 2.633 -0.286 ...
..- attr(*, "names")= chr [1:618] "(Intercept)" "SPHEQ" "" "" ...
$ R : num [1:2, 1:2] -7.11 0 -2.57 -2.39
..- attr(*, "dimnames")=List of 2
.. ..$ : chr [1:2] "(Intercept)" "SPHEQ"
.. ..$ : chr [1:2] "(Intercept)" "SPHEQ"
$ rank : int 2
$ qr :List of 5
..$ qr : num [1:618, 1:2] -7.1073 0.0409 0.0149 0.0463 0.0354 ...
.. ..- attr(*, "dimnames")=List of 2
.. .. ..$ : chr [1:618] "1" "2" "3" "4" ...
.. .. ..$ : chr [1:2] "(Intercept)" "SPHEQ"
..$ rank : int 2
..$ qraux: num [1:2] 1.07 1.03
..$ pivot: int [1:2] 1 2
..$ tol : num 1e-11
..- attr(*, "class")= chr "qr"
$ family :List of 12
..$ family : chr "binomial"
..$ link : chr "logit"
..$ linkfun :function (mu)
..$ linkinv :function (eta)
..$ variance :function (mu)
..$ dev.resids:function (y, mu, wt)
..$ aic :function (y, n, mu, wt, dev)
..$ mu.eta :function (eta)
..$ initialize: expression({ if (NCOL(y) == 1) { if (is.factor(y)) y <- y != levels(y)[1L] n <- rep.int(1, nobs) y[weights == 0] <- 0 if (any(y < 0 | y > 1)) stop("y values must be 0 <= y <= 1") mustart <- (weights * y + 0.5)/(weights + 1) m <- weights * y if (any(abs(m - round(m)) > 0.001)) warning("non-integer #successes in a binomial glm!") } else if (NCOL(y) == 2) { if (any(abs(y - round(y)) > 0.001)) warning("non-integer counts in a binomial glm!") n <- y[, 1] + y[, 2] y <- ifelse(n == 0, 0, y[, 1]/n) weights <- weights * n mustart <- (n * y + 0.5)/(n + 1) } else stop("for the 'binomial' family, y must be a vector of 0 and 1's\nor a 2 column matrix where col 1 is no. successes and col 2 is no. failures") })
..$ validmu :function (mu)
..$ valideta :function (eta)
..$ simulate :function (object, nsim)
..- attr(*, "class")= chr "family"
$ linear.predictors: Named num [1:618] 0.253 -2.277 -4.465 -1.958 -2.618 ...
..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
$ deviance : num 337
$ aic : num 341
$ null.deviance : num 480
$ iter : int 6
$ weights : Named num [1:618] 0.246 0.0844 0.0112 0.1084 0.0634 ...
..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
$ prior.weights : Named num [1:618] 1 1 1 1 1 1 1 1 1 1 ...
..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
$ df.residual : int 616
$ df.null : int 617
$ y : Named num [1:618] 1 0 0 1 0 0 0 0 0 0 ...
..- attr(*, "names")= chr [1:618] "1" "2" "3" "4" ...
$ converged : logi TRUE
$ boundary : logi FALSE
$ model :'data.frame': 618 obs. of 2 variables:
..$ MYOPIC: int [1:618] 1 0 0 1 0 0 0 0 0 0 ...
..$ SPHEQ : num [1:618] -0.052 0.608 1.179 0.525 0.697 ...
..- attr(*, "terms")=Classes 'terms', 'formula' length 3 MYOPIC ~ SPHEQ
.. .. ..- attr(*, "variables")= language list(MYOPIC, SPHEQ)
.. .. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. .. ..- attr(*, "dimnames")=List of 2
.. .. .. .. ..$ : chr [1:2] "MYOPIC" "SPHEQ"
.. .. .. .. ..$ : chr "SPHEQ"
.. .. ..- attr(*, "term.labels")= chr "SPHEQ"
.. .. ..- attr(*, "order")= int 1
.. .. ..- attr(*, "intercept")= int 1
.. .. ..- attr(*, "response")= int 1
.. .. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. .. ..- attr(*, "predvars")= language list(MYOPIC, SPHEQ)
.. .. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. .. ..- attr(*, "names")= chr [1:2] "MYOPIC" "SPHEQ"
$ call : language glm(formula = MYOPIC ~ SPHEQ, family = binomial(), data = MYOPIA)
$ formula :Class 'formula' length 3 MYOPIC ~ SPHEQ
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
$ terms :Classes 'terms', 'formula' length 3 MYOPIC ~ SPHEQ
.. ..- attr(*, "variables")= language list(MYOPIC, SPHEQ)
.. ..- attr(*, "factors")= int [1:2, 1] 0 1
.. .. ..- attr(*, "dimnames")=List of 2
.. .. .. ..$ : chr [1:2] "MYOPIC" "SPHEQ"
.. .. .. ..$ : chr "SPHEQ"
.. ..- attr(*, "term.labels")= chr "SPHEQ"
.. ..- attr(*, "order")= int 1
.. ..- attr(*, "intercept")= int 1
.. ..- attr(*, "response")= int 1
.. ..- attr(*, ".Environment")=<environment: R_GlobalEnv>
.. ..- attr(*, "predvars")= language list(MYOPIC, SPHEQ)
.. ..- attr(*, "dataClasses")= Named chr [1:2] "numeric" "numeric"
.. .. ..- attr(*, "names")= chr [1:2] "MYOPIC" "SPHEQ"
$ data :'data.frame': 618 obs. of 18 variables:
..$ ID : int [1:618] 1 2 3 4 5 6 7 8 9 10 ...
..$ STUDYYEAR: int [1:618] 1992 1995 1991 1990 1995 1995 1993 1991 1991 1991 ...
..$ MYOPIC : int [1:618] 1 0 0 1 0 0 0 0 0 0 ...
..$ AGE : int [1:618] 6 6 6 6 5 6 6 6 7 6 ...
..$ GENDER : int [1:618] 1 1 1 1 0 0 1 1 0 1 ...
..$ SPHEQ : num [1:618] -0.052 0.608 1.179 0.525 0.697 ...
..$ AL : num [1:618] 21.9 22.4 22.5 22.2 23.3 ...
..$ ACD : num [1:618] 3.69 3.7 3.46 3.86 3.68 ...
..$ LT : num [1:618] 3.5 3.39 3.51 3.61 3.45 ...
..$ VCD : num [1:618] 14.7 15.3 15.5 14.7 16.2 ...
..$ SPORTHR : int [1:618] 45 4 14 18 14 10 12 12 4 30 ...
..$ READHR : int [1:618] 8 0 0 11 0 6 7 0 0 5 ...
..$ COMPHR : int [1:618] 0 1 2 0 0 2 2 0 3 1 ...
..$ STUDYHR : int [1:618] 0 1 0 0 0 1 1 0 1 0 ...
..$ TVHR : int [1:618] 10 7 10 4 4 19 8 8 3 10 ...
..$ DIOPTERHR: int [1:618] 34 12 14 37 4 44 36 8 12 27 ...
..$ MOMMY : int [1:618] 1 1 0 0 1 0 0 0 0 0 ...
..$ DADMY : int [1:618] 1 1 0 1 0 1 1 0 0 0 ...
$ offset : NULL
$ control :List of 3
..$ epsilon: num 1e-08
..$ maxit : num 25
..$ trace : logi FALSE
$ method : chr "glm.fit"
$ contrasts : NULL
$ xlevels : Named list()
- attr(*, "class")= chr [1:2] "glm" "lm"
Thirty is quite a lot but if we endure we can find:
coefficientsdeviancenull.deviance...\[ H_{0}:\beta_{1}=0 \] \[ H_{1}:\beta_{1}\neq 0 \] \[ statistic = -2\ln{\frac{likelihood~without~SPEQ}{likelihood~ with~SPEQ}}\\
~~= null~deviance-deviance\\
\sim\chi^{2} \] \(statistic=\)142.7321372
Note that we used inline code combined with \(\LaTeX\) above, the code was:fit_hw2_ex1$null.deviance-fit_hw2_ex1$deviance
And, now we can calculate chi square p-value:
p_value=1-pchisq(df = 1,statistic)#alternatively use pchisq(df = 1,statistic,lower.tail=FALSE)
p_value
## [1] 0
So we can say that SPEQ variable is significant and that we should keep it in the model, that is we reject the \(H_0\) hypotesis.
\[H_0:\beta_1=0 \] \[H_1:\beta_1\neq0\] \[Wald~statistic=\frac{\hat{\beta_1}}{\hat{SE(\beta_1)}}
\sim N(0,1)\] \(Wald~statistic=\)-9.1619878
Once again we used \(\LaTeX\) inline with inline code coef(summary(fit_hw2_ex1))[2,3]. Here we used summary function since is much easier than looking thru fit_hw2_ex1, see the code below
knitr::kable(coef(summary(fit_hw2_ex1)))
| Estimate | Std. Error | z value | Pr(>|z|) | |
|---|---|---|---|---|
| (Intercept) | 0.0539731 | 0.2067483 | 0.2610572 | 0.7940484 |
| SPHEQ | -3.8330976 | 0.4183696 | -9.1619878 | 0.0000000 |
Also from this table we can extract p value by subseting coef(summary(fit_hw2_ex1))[2,4] which is an alternative to usage of pnorm function. \(p=\)0
Finali we reject the \(H_0\) hypotesis that is, we conclude that SPEQ is highly significant.