"Quiz 2"
## [1] "Quiz 2"
library(faraway)
data(esdcomp)
attach(esdcomp)
incre <- seq(0,11, by = 1)
hist(complaints, breaks = incre)

"As you can see the majority of the our complaints fall between 0 and 1. More than 1 complaint is more seldom."
## [1] "As you can see the majority of the our complaints fall between 0 and 1. More than 1 complaint is more seldom."
"The datset esdcomp looks into the factors that lead to complaints recieved in an emergency room. It would be ideal to use Poisson Regression in this scenario because our target variable (complaints recieved) is a count over some period of time. Linear Regression would not be effective because there is no way the number of complaints could be negative. Poisson Regression allows us to get rid of negative values."
## [1] "The datset esdcomp looks into the factors that lead to complaints recieved in an emergency room. It would be ideal to use Poisson Regression in this scenario because our target variable (complaints recieved) is a count over some period of time. Linear Regression would not be effective because there is no way the number of complaints could be negative. Poisson Regression allows us to get rid of negative values."
begmod <- glm(complaints~1, family = "poisson")
pmod1 <- glm(complaints~visits, family = "poisson")
pmod2 <- glm(complaints~residency, family = "poisson")
pmod3 <- glm(complaints~gender, family = "poisson")
pmod4 <- glm(complaints~revenue, family = "poisson")
pmod5 <- glm(complaints~hours, family = "poisson")
AIC(begmod)
## [1] 214.2269
AICtest <- c(AIC(pmod1),AIC(pmod2),AIC(pmod3),AIC(pmod4),AIC(pmod5))
AICtest
## [1] 181.0258 210.0828 213.2371 213.7092 191.4638
"The model with the lowest AIC, and therefore our best predictor for complaints, is the number of patient visits the hospital has."
## [1] "The model with the lowest AIC, and therefore our best predictor for complaints, is the number of patient visits the hospital has."
"We want to make sure that our new model is significantly better than our grand mean model. We can do this using a drop in deviance test"
## [1] "We want to make sure that our new model is significantly better than our grand mean model. We can do this using a drop in deviance test"
anova(pmod1,begmod, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits
## Model 2: complaints ~ 1
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 42 54.246
## 2 43 89.447 -1 -35.201 2.974e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
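# The same drop-in-deviance test done by hand, as a sanity check: the drop in
# residual deviance is compared to a chi-squared distribution with 1 df.
drop_dev <- deviance(begmod) - deviance(pmod1)   # 89.447 - 54.246 = 35.201
pchisq(drop_dev, df = 1, lower.tail = FALSE)     # matches the p-value above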
"Because this is our best predictor, pmod1, now becomes the currentmodel we will use going forward"
## [1] "Because this is our best predictor, pmod1, now becomes the currentmodel we will use going forward"
currentmod <- pmod1
amod <- glm(complaints~visits + residency, family = "poisson")
bmod <- glm(complaints~visits + gender, family = "poisson")
cmod <- glm(complaints~visits + revenue, family = "poisson")
dmod <- glm(complaints~visits + hours, family = "poisson")
"One way we can see whether our 2 predictor model is significantly better than our model with just oligarchy is to use a drop in deviance test"
## [1] "One way we can see whether our 2 predictor model is significantly better than our model with just oligarchy is to use a drop in deviance test"
anova(amod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + residency
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 50.882
## 2 42 54.246 -1 -3.3639 0.06664 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
anova(bmod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + gender
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 53.898
## 2 42 54.246 -1 -0.34833 0.5551
anova(cmod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + revenue
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 52.109
## 2 42 54.246 -1 -2.137 0.1438
anova(dmod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + hours
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 53.566
## 2 42 54.246 -1 -0.67983 0.4096
"Using a drop in deviance test with an alpha of .05, no 2 predictor model was significantly better than our model with just visits. However, a few came close such as residency and revenue."
## [1] "Using a drop in deviance test with an alpha of .05, no 2 predictor model was significantly better than our model with just visits. However, a few came close such as residency and revenue."
"We can try to add a few more complex models by squaring and square rooting some predictors"
## [1] "We can try to add a few more complex models by squaring and square rooting some predictors"
SqrtRev <- sqrt(revenue)
Revsquared <- revenue^2
SqrtHour <- sqrt(hours)
HourSq <- hours^2
Wmod <- glm(complaints~visits + HourSq, family = "poisson")
Xmod <- glm(complaints~visits + Revsquared, family = "poisson")
Ymod <- glm(complaints~visits + SqrtRev, family = "poisson")
Zmod <- glm(complaints~visits + SqrtHour, family = "poisson")
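# Note: the same transformed-predictor models can be written inline with I(),
# avoiding the helper variables above; for example, this is equivalent to Wmod:
Wmod_inline <- glm(complaints ~ visits + I(hours^2), family = "poisson")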
anova(Wmod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + HourSq
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 53.566
## 2 42 54.246 -1 -0.68029 0.4095
anova(Xmod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + Revsquared
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 51.776
## 2 42 54.246 -1 -2.4702 0.116
anova(Ymod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + SqrtRev
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 52.274
## 2 42 54.246 -1 -1.9723 0.1602
anova(Zmod,pmod1, test = "Chisq")
## Analysis of Deviance Table
##
## Model 1: complaints ~ visits + SqrtHour
## Model 2: complaints ~ visits
## Resid. Df Resid. Dev Df Deviance Pr(>Chi)
## 1 41 53.599
## 2 42 54.246 -1 -0.64671 0.4213
"Again we have no other predictors that are significant. So, we can conclude that our predictor with visits is our best model"
## [1] "Again we have no other predictors that are significant. So, we can conclude that our predictor with visits is our best model"
summary(pmod1)
##
## Call:
## glm(formula = complaints ~ visits, family = "poisson")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8125 -0.9655 -0.2594 0.9004 1.9214
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8493259 0.3807341 -2.231 0.0257 *
## visits 0.0008101 0.0001384 5.852 4.84e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1)
##
## Null deviance: 89.447 on 43 degrees of freedom
## Residual deviance: 54.246 on 42 degrees of freedom
## AIC: 181.03
##
## Number of Fisher Scoring iterations: 5
"Let log(λ) be equal to the mean number of complaints at the hospital"
## [1] "Let log(λ) be equal to the mean number of complaints at the hospital"
"log(λ) = β0 + .0008101(visits)"
## [1] "log(λ) = β0 + .0008101(visits)"
"We can interpret our model: An increase of 1 patient visits is associated with a multiplicative change of 1.00081 in the mean number of complaints recieved"
## [1] "We can interpret our model: An increase of 1 patient visits is associated with a multiplicative change of 1.00081 in the mean number of complaints recieved"
"Because our model experiences some dispersion which is not accounted for, we should calculate the dispersion"
## [1] "Because our model experiences some dispersion which is not accounted for, we should calculate the dispersion"
# Estimate dispersion: Pearson chi-squared statistic divided by residual df
dp <- sum(residuals(pmod1, type = "pearson")^2) / pmod1$df.residual
dp
## [1] 1.272378
# We have a bit of overdispersion, with the variance being about 1.27 times the mean. This is not severe, which likely explains the reasonably good fit of the model.
summary(pmod1, dispersion = dp)
##
## Call:
## glm(formula = complaints ~ visits, family = "poisson")
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.8125 -0.9655 -0.2594 0.9004 1.9214
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.8493259 0.4294671 -1.978 0.048 *
## visits 0.0008101 0.0001561 5.188 2.12e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for poisson family taken to be 1.272378)
##
## Null deviance: 89.447 on 43 degrees of freedom
## Residual deviance: 54.246 on 42 degrees of freedom
## AIC: 181.03
##
## Number of Fisher Scoring iterations: 5
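# An alternative to supplying the dispersion by hand is to refit with the
# quasipoisson family, which estimates the same Pearson-based dispersion and
# inflates the standard errors accordingly (a sketch; output not shown):
qmod <- glm(complaints ~ visits, family = "quasipoisson")
summary(qmod)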
"We should calculate a confidence interval for our model found through forward selection. We will use a 95% confidence interval"
## [1] "We should calculate a confidence interval for our model found through forward selection. We will use a 95% confidence interval"
myz <- qnorm(.975)                          # two-sided 95% critical value
myci <- .0008101 + c(-1,1)*myz*.0001561     # estimate +/- z * dispersion-adjusted SE
myci
## [1] 0.0005041496 0.0011160504
"We can interpret this confidence interval by: An increase of 1 patient visitor, we can be 95% sure, will result in a multiplicative increase between 1.000504 and 1.001117"
## [1] "We can interpret this confidence interval by: An increase of 1 patient visitor, we can be 95% sure, will result in a multiplicative increase between 1.000504 and 1.001117"