Use the Myopia Study (MYOPIA.dta)
One variable that is clearly important is the initial value of spherical equivalent refraction (SPHEQ).
Write down the equation for the logistic regression model of SPHEQ on MYOPIA. Write down the equation for the logit transformation of this logistic regression model. What characteristic of the outcome variable, MYOPIA, leads us to consider the logistic regression model as opposed to the usual linear regression model to describe the relationship between MYOPIA and SPHEQ?
Form a scatterplot of MYOPIA vs. SPHEQ.
Write down an expression for the likelihood and log likelihood for the logistic regression model in part (a) using the ungrouped, , data. Obtain expressions for the two likelihood equations.
Using Stata, obtain the maximum likelihood estimates of the parameters of the logistic regression model in part (a). These estimates should be based on the ungrouped, , data. Using these estimates, write down the equation for the fitted values, that is, the estimated logistic probabilities. Plot the equation for the fitted values on the axes used in the scatterplots in parts (b) and (c).
Firstly let’s read in the data for the assignment.
MYOPIA<-read.csv(file = "https://learn.canvas.net/courses/1179/files/461762/download?download_frd=1")#read the file from adress in the assignment
knitr::kable(head(MYOPIA))#print first 6 rows with kable function from knitr
| ID | STUDYYEAR | MYOPIC | AGE | GENDER | SPHEQ | AL | ACD | LT | VCD | SPORTHR | READHR | COMPHR | STUDYHR | TVHR | DIOPTERHR | MOMMY | DADMY |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1992 | 1 | 6 | 1 | -0.052 | 21.89 | 3.690 | 3.498 | 14.70 | 45 | 8 | 0 | 0 | 10 | 34 | 1 | 1 |
| 2 | 1995 | 0 | 6 | 1 | 0.608 | 22.38 | 3.702 | 3.392 | 15.29 | 4 | 0 | 1 | 1 | 7 | 12 | 1 | 1 |
| 3 | 1991 | 0 | 6 | 1 | 1.179 | 22.49 | 3.462 | 3.514 | 15.52 | 14 | 0 | 2 | 0 | 10 | 14 | 0 | 0 |
| 4 | 1990 | 1 | 6 | 1 | 0.525 | 22.20 | 3.862 | 3.612 | 14.73 | 18 | 11 | 0 | 0 | 4 | 37 | 0 | 1 |
| 5 | 1995 | 0 | 5 | 0 | 0.697 | 23.29 | 3.676 | 3.454 | 16.16 | 14 | 0 | 0 | 0 | 4 | 4 | 1 | 0 |
| 6 | 1995 | 0 | 6 | 0 | 1.744 | 22.14 | 3.224 | 3.556 | 15.36 | 10 | 6 | 2 | 1 | 19 | 44 | 0 | 1 |
OK then, we see that the variables that we’ll been working with are MYOPIC and SPEQ.Let’s get started.
MYOPIC is the variable that takes value 1 if the subject doesn’t have myopia (contrary to my intuition) and 0 if the subject has myopia.
MYOPIC is a binary response variable, using it in simple linear regression would break some of the assumption of linear regression (linearity, normal distribution, homoscedasticity..).library(ggplot2)
ggplot(MYOPIA, aes(x = SPHEQ, y = MYOPIC)) + geom_jitter(shape = "O", position = position_jitter(height = 0)) +theme_bw()
And now, lets write equations for likelihood functions:
We are going to fit logistic regresion to MYOPIA dataset by regressing SPHEQ variable on MIOPIC by using glm function for generalized linear models and choosing binomial family.
library(stargazer)
##
## Please cite as:
## Hlavac, Marek (2015). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2. http://CRAN.R-project.org/package=stargazer
#here we are using option result=asis in chunc because of the function stargazer which prints out a nice table summary
fit_hw1_ex1<-glm(MYOPIC~SPHEQ, data=MYOPIA, family = binomial())
stargazer(fit_hw1_ex1,type = "html")
| Dependent variable: | |
| MYOPIC | |
| SPHEQ | -3.833*** |
| (0.418) | |
| Constant | 0.054 |
| (0.207) | |
| Observations | 618 |
| Log Likelihood | -168.672 |
| Akaike Inf. Crit. | 341.345 |
| Note: | p<0.1; p<0.05; p<0.01 |
…Continuing…
Equation:
\[ \hat{\pi}(x)=\frac{ e^{\hspace{1mm} 0.054-3.833 \cdot X}}{1+e^{\hspace{1mm}0.054-3.833\cdot x}} \]
And finally a plot of fitted values:
ggplot(MYOPIA, aes(x = SPHEQ, y = MYOPIC)) + geom_jitter(shape = "O", position = position_jitter(height = 0))+ stat_smooth(method="glm", se= FALSE,method.args = list(family = "binomial"))+xlim(-1,4.5)
Use the ICU study (icu.dta)
The ICU data set consists of a sample of 200 subjects who were part of a much larger study on survival of patients following admission to an adult intensive care unit (ICU). The major goal of this study was to develop a logistic regression model to predict the probability of survival to hospital discharge of these patients. A number of publications have appeared which have focused on various facets of this problem.
Write down the equation for the logistic regression model of STA on AGE. Write down the equation for the logit transformation of this logistic regression model. What characteristic of the outcome variable, STA, leads us to consider the logistic regression model as opposed to the usual linear regression model to describe the relationship between STA and AGE?
Write down an expression for the likelihood and log likelihood for the logistic regression model in part (a) using the ungrouped, data. Obtain expressions for the two likelihood equations.
Using Stata, obtain the maximum likelihood estimates of the parameters of the logistic regression model in part (a). These estimates should be based on the ungrouped, data. Using these estimates, write down the equation for the fitted values, that is, the estimated logistic probabilities. Plot the equation for the fitted values on the axes used in the scatterplots in part (b).
Summarize (describe in words) the results presented in the plot obtained from parts (b) and (d)
Reading in the data from course site.
ICU<-read.csv(file = "https://learn.canvas.net/courses/1179/files/461760/download?download_frd=1")#read the file from adress in the assignment
knitr::kable(head(ICU))#print first 6 rows with kable function from knitr
| ID | STA | AGE | SEX | RACE | SER | CAN | CRN | INF | CPR | SYS | HRA | PRE | TYP | FRA | PO2 | PH | PCO | BIC | CRE | LOC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 552 | 0 | 16 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 100 | 140 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 102 | 0 | 16 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 104 | 111 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 837 | 0 | 17 | 1 | 3 | 0 | 0 | 0 | 0 | 0 | 130 | 140 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 863 | 0 | 17 | 0 | 3 | 1 | 0 | 0 | 0 | 0 | 136 | 78 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 829 | 0 | 17 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 140 | 78 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| 379 | 0 | 18 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 112 | 76 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
Since questions a - c and e are literraly the same as in exercise one we’re going to skip them and jump right to model estimation and ploting.
ggplot(aes(x=AGE,y=STA),data = ICU)+geom_point(alpha=0.4,colour="red")+theme_bw()
Fitting the data:
fit_hw1_ex2<-glm(STA~AGE, data=ICU, family = binomial())
stargazer(fit_hw1_ex1,type = "html")
| Dependent variable: | |
| MYOPIC | |
| SPHEQ | -3.833*** |
| (0.418) | |
| Constant | 0.054 |
| (0.207) | |
| Observations | 618 |
| Log Likelihood | -168.672 |
| Akaike Inf. Crit. | 341.345 |
| Note: | p<0.1; p<0.05; p<0.01 |
Logistic regression model:
\[ \hat{\pi}(x)=\frac{ e^{\hspace{1mm} -3.06-0.027 \cdot X}}{1+e^{\hspace{1mm}0.054-3.833\cdot x}} \]
Plot of fitted values:
ggplot(data = ICU,aes(x=AGE,y=STA))+geom_point(alpha=0.4,colour="red")+theme_bw()+stat_smooth(method="glm", se= FALSE,method.args = list(family = "binomial"))
It doesn’t look really s shaped but as proffesor said, it doesn’t allways have to be.