GLM: Annual Incidence
lmFit1 <- glm(count ~ year, data = annualIncidence)
summary(lmFit1)
##
## Call:
## glm(formula = count ~ year, data = annualIncidence)
##
## Deviance Residuals:
## 1 2 3 4
## -234.5 365.5 -27.5 -103.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2545754.5 285352.6 -8.921 0.0123 *
## year 1292.0 141.4 9.135 0.0118 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 100024.5)
##
## Null deviance: 8546369 on 3 degrees of freedom
## Residual deviance: 200049 on 2 degrees of freedom
## AIC: 60.632
##
## Number of Fisher Scoring iterations: 2
Count is the response variable and year is the predictor variable. We
are 95% confident that the actual slope lies within (1016, 1568). Annual
incidence is increasing at a rate of about 1,292 surgeries per year.
Formula: y = 1292x - 2545754
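The interval quoted above is close to what a normal (z) approximation gives from the printed estimate and standard error. The sketch below recomputes it and, for comparison, a t-based interval on the 2 residual degrees of freedom, which is noticeably wider with only four data points. The numbers are copied from the summary output; the variable names are illustrative only.

```r
# Slope estimate, standard error and residual df, copied from summary(lmFit1)
est      <- 1292.0
se       <- 141.4
df_resid <- 2

# Normal-approximation (Wald) 95% interval
wald_z <- est + c(-1, 1) * qnorm(0.975) * se

# t-based 95% interval, which accounts for the small residual df
wald_t <- est + c(-1, 1) * qt(0.975, df_resid) * se

round(rbind(z = wald_z, t = wald_t), 1)
```

With so few observations, the t-based interval is probably the more cautious one to report.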
summary(annualIncidence$count - lmFit1$fitted.values)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -234.50 -136.25 -65.50 0.00 70.75 365.50
Distribution is slightly right-skewed: the median residual (-65.5) is
below the mean (0) and the largest residual is positive (365.5).
GLM: Mean annual charge
lmFit2 <- glm(avgCharge ~ year, data = annualCharge)
summary(lmFit2)
##
## Call:
## glm(formula = avgCharge ~ year, data = annualCharge)
##
## Deviance Residuals:
## 1 2 3 4
## 319.3 93.1 -1144.1 731.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.352e+07 8.920e+05 -15.16 0.00432 **
## year 6.765e+03 4.422e+02 15.30 0.00424 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 977484.9)
##
## Null deviance: 230794625 on 3 degrees of freedom
## Residual deviance: 1954970 on 2 degrees of freedom
## AIC: 69.75
##
## Number of Fisher Scoring iterations: 2
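Average charge is the response variable and year is the predictor
variable. The fitted slope is 6.765e+03, so mean annual charge is
increasing at a rate of about $6,765 per year. A normal-approximation
95% interval from the printed standard error, 6765 ± 1.96 × 442.2, is
roughly ($5,898, $7,632). Formula: y = 6765x - 1.352e+07 (coefficients
as rounded in the output above)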
summary(annualCharge$avgCharge - lmFit2$fitted.values)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1144.1 -216.2 206.2 0.0 422.4 731.7
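Distribution is slightly left-skewed: the median residual (206.2) is
above the mean (0) and the largest residual in magnitude is negative
(-1144.1).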
GLM: Patient Survival
What predictor variables are statistically significant when
evaluating a patient’s chance of survival?
dieGLM <- glm(died ~ year + female + race + age + i10_ndx + i10_npr + hcup_division + zipinc_qrtl, data = allYears, family = binomial)
summary(dieGLM)
##
## Call:
## glm(formula = died ~ year + female + race + age + i10_ndx + i10_npr +
## hcup_division + zipinc_qrtl, family = binomial, data = allYears)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.5662 -0.1792 -0.1146 -0.0716 3.9819
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 318.581210 31.456657 10.128 < 2e-16 ***
## year -0.162416 0.015597 -10.413 < 2e-16 ***
## female 0.079968 0.033817 2.365 0.018042 *
## race 0.054079 0.014871 3.637 0.000276 ***
## age 0.035191 0.001187 29.638 < 2e-16 ***
## i10_ndx 0.100403 0.002547 39.416 < 2e-16 ***
## i10_npr 0.118859 0.002646 44.916 < 2e-16 ***
## hcup_division 0.010918 0.007062 1.546 0.122101
## zipinc_qrtl -0.062091 0.015344 -4.047 5.2e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 39437 on 230459 degrees of freedom
## Residual deviance: 32261 on 230451 degrees of freedom
## (12962 observations deleted due to missingness)
## AIC: 32279
##
## Number of Fisher Scoring iterations: 8
Died is the response variable, and year, sex, race, age, number of
diagnoses, number of procedures, hospital division and household income
are the predictor variables. At an alpha level of 0.05, hospital
division is not statistically significant (p = 0.122), and sex is
significant at 0.05 but not at 0.01 (p = 0.018). All other variables
are statistically significant.
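Because this is a logistic model of died, each coefficient is a change in the log-odds of death, and exponentiating it gives an odds ratio. The sketch below does that for the printed estimates and adds normal-approximation 95% limits. The values are copied from the summary(dieGLM) output; the object names are illustrative only.

```r
# Estimates and standard errors copied from summary(dieGLM)
coefs <- c(year = -0.162416, female = 0.079968, race = 0.054079,
           age = 0.035191, i10_ndx = 0.100403, i10_npr = 0.118859,
           hcup_division = 0.010918, zipinc_qrtl = -0.062091)
ses   <- c(year = 0.015597, female = 0.033817, race = 0.014871,
           age = 0.001187, i10_ndx = 0.002547, i10_npr = 0.002646,
           hcup_division = 0.007062, zipinc_qrtl = 0.015344)

# Odds ratio per one-unit increase in each predictor
odds_ratios <- exp(coefs)

# Wald 95% limits, computed on the log-odds scale and then exponentiated
z        <- qnorm(0.975)
ci_lower <- exp(coefs - z * ses)
ci_upper <- exp(coefs + z * ses)

round(cbind(OR = odds_ratios, lower = ci_lower, upper = ci_upper), 3)
```

For example, the age coefficient corresponds to roughly 3.6% higher odds of death per additional year of age, holding the other predictors fixed. The hospital division interval spans 1, which matches its non-significant p-value.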
What is the best model for patient survival?
Sex?
die1 <- glm(died ~ female, data = allYears, family = binomial)
summary(die1)
##
## Call:
## glm(formula = died ~ female, family = binomial, data = allYears)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1862 -0.1862 -0.1840 -0.1840 2.8592
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.07062 0.02135 -190.649 <2e-16 ***
## female 0.02452 0.03152 0.778 0.437
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41801 on 243074 degrees of freedom
## Residual deviance: 41800 on 243073 degrees of freedom
## (347 observations deleted due to missingness)
## AIC: 41804
##
## Number of Fisher Scoring iterations: 7
AIC: 41804
Race?
die2 <- glm(died ~ race, data = allYears, family = binomial)
summary(die2)
##
## Call:
## glm(formula = died ~ race, family = binomial, data = allYears)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1853 -0.1843 -0.1841 -0.1841 2.8588
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -4.071921 0.028164 -144.580 <2e-16 ***
## race 0.002614 0.013979 0.187 0.852
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 40253 on 235584 degrees of freedom
## Residual deviance: 40253 on 235583 degrees of freedom
## (7837 observations deleted due to missingness)
## AIC: 40257
##
## Number of Fisher Scoring iterations: 7
AIC: 40257
Age?
die3 <- glm(died ~ age, data = allYears, family = binomial)
summary(die3)
##
## Call:
## glm(formula = died ~ age, family = binomial, data = allYears)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.3090 -0.2100 -0.1729 -0.1325 3.5306
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -6.230444 0.067723 -92.00 <2e-16 ***
## age 0.035692 0.001005 35.53 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41808 on 243273 degrees of freedom
## Residual deviance: 40381 on 243272 degrees of freedom
## (148 observations deleted due to missingness)
## AIC: 40385
##
## Number of Fisher Scoring iterations: 7
AIC: 40385
Hospital Division?
die4 <- glm(died ~ hcup_division, data = allYears, family = binomial)
summary(die4)
##
## Call:
## glm(formula = died ~ hcup_division, family = binomial, data = allYears)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1896 -0.1873 -0.1850 -0.1827 2.8727
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.997194 0.036134 -110.621 <2e-16 ***
## hcup_division -0.012515 0.006498 -1.926 0.0541 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 41808 on 243283 degrees of freedom
## Residual deviance: 41804 on 243282 degrees of freedom
## (138 observations deleted due to missingness)
## AIC: 41808
##
## Number of Fisher Scoring iterations: 7
AIC: 41808
Household Income?
die5 <- glm(died ~ zipinc_qrtl, data = allYears, family = binomial)
summary(die5)
##
## Call:
## glm(formula = died ~ zipinc_qrtl, family = binomial, data = allYears)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -0.1881 -0.1881 -0.1855 -0.1829 2.8729
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -3.99761 0.03589 -111.394 <2e-16 ***
## zipinc_qrtl -0.02820 0.01451 -1.944 0.0519 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 40919 on 238169 degrees of freedom
## Residual deviance: 40915 on 238168 degrees of freedom
## (5252 observations deleted due to missingness)
## AIC: 40919
##
## Number of Fisher Scoring iterations: 7
AIC: 40919
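Of the 5 models, race has the lowest AIC (40257) and hospital division
has the highest AIC (41808). These values are not directly comparable,
however, because each model was fitted to a different number of
observations: the race model dropped 7837 rows for missingness while
the hospital division model dropped only 138, and fewer observations
mean a smaller deviance and AIC regardless of fit. Race is not a
significant predictor on its own (p = 0.852) and leaves the deviance
essentially unchanged (40253 to 40253). Age is the only one of the five
that is statistically significant (p < 2e-16), and it lowers the
deviance substantially (41808 to 40381), so it is the strongest single
predictor of a patient's likelihood of survival here. Sex, race,
hospital division and household income add little on their own.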
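AIC is only meaningful across models fitted to the same rows. One way to put single-predictor models on an equal footing is to restrict the data to cases that are complete on every candidate predictor before fitting. The sketch below illustrates the idea on simulated data; `sim` is a hypothetical stand-in for allYears (only the column names mirror the document), and the effect sizes are invented for illustration.

```r
set.seed(1)
n <- 2000

# Hypothetical stand-in for allYears: simulated values, not real records
sim <- data.frame(
  age  = round(runif(n, 40, 90)),
  race = sample(1:6, n, replace = TRUE)
)
# In this simulation death depends on age only; race is pure noise
sim$died <- rbinom(n, 1, plogis(-6 + 0.06 * sim$age))

# Give the two predictors different amounts of missingness
sim$race[sample(n, 300)] <- NA
sim$age[sample(n, 20)]   <- NA

# Keep only rows that are complete on the response and all candidates
cc <- sim[complete.cases(sim[, c("died", "age", "race")]), ]

# Fit every candidate model on the same rows
fits <- list(
  race = glm(died ~ race, data = cc, family = binomial),
  age  = glm(died ~ age,  data = cc, family = binomial)
)

n_used    <- sapply(fits, nobs)  # identical across models by construction
aic_table <- sapply(fits, AIC)   # now comparable

n_used
aic_table
```

Applied to the real data, the same pattern would use the five predictors from the models above in the `complete.cases()` call. Categorical codes such as race or hospital division may also be better wrapped in `factor()` than treated as numeric, though that is a separate modelling choice.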