Week 8 Discussion

First, I fit a multiple linear regression model for Fertility, using all of the other variables in the swiss data set as predictors:

summary(lm(Fertility ~ . , data = swiss))

## 
## Call:
## lm(formula = Fertility ~ ., data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -15.2743  -5.2617   0.5032   4.1198  15.3213 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      66.91518   10.70604   6.250 1.91e-07 ***
## Agriculture      -0.17211    0.07030  -2.448  0.01873 *  
## Examination      -0.25801    0.25388  -1.016  0.31546    
## Education        -0.87094    0.18303  -4.758 2.43e-05 ***
## Catholic          0.10412    0.03526   2.953  0.00519 ** 
## Infant.Mortality  1.07705    0.38172   2.822  0.00734 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.165 on 41 degrees of freedom
## Multiple R-squared:  0.7067, Adjusted R-squared:  0.671 
## F-statistic: 19.76 on 5 and 41 DF,  p-value: 5.594e-10

Based on that model, I decided to remove the Examination variable because it had the highest p-value (0.315) and was the only predictor not significant at the 0.05 level. I then ran the multiple regression again without Examination:

summary(lm(Fertility ~ Agriculture + Education + Catholic + Infant.Mortality, data = swiss))

## 
## Call:
## lm(formula = Fertility ~ Agriculture + Education + Catholic + 
##     Infant.Mortality, data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.6765  -6.0522   0.7514   3.1664  16.1422 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      62.10131    9.60489   6.466 8.49e-08 ***
## Agriculture      -0.15462    0.06819  -2.267  0.02857 *  
## Education        -0.98026    0.14814  -6.617 5.14e-08 ***
## Catholic          0.12467    0.02889   4.315 9.50e-05 ***
## Infant.Mortality  1.07844    0.38187   2.824  0.00722 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.168 on 42 degrees of freedom
## Multiple R-squared:  0.6993, Adjusted R-squared:  0.6707 
## F-statistic: 24.42 on 4 and 42 DF,  p-value: 1.717e-10
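The elimination step described above (drop the predictor with the largest p-value, then refit) can also be done programmatically. The sketch below is one way to do it in base R. The object names (fit_full, pvals, worst, fit_reduced, cmp) are illustrative. It also adds a partial F-test with anova(), which the original analysis did not use. Because it reads the built-in datasets::swiss directly, it is unaffected by the Fertility recoding later in this post.

```r
# Sketch of one backward-elimination step: find the predictor with the
# largest p-value, refit without it, and compare the two nested models.
fit_full <- lm(Fertility ~ ., data = datasets::swiss)

pvals <- coef(summary(fit_full))[-1, "Pr(>|t|)"]   # skip the intercept row
worst <- names(which.max(pvals))                   # "Examination" for this data
worst

# Refit without that term; reformulate() builds the reduced formula.
keep        <- setdiff(names(pvals), worst)
fit_reduced <- lm(reformulate(keep, response = "Fertility"), data = datasets::swiss)

# Partial F-test for the dropped term (reduced model vs. full model).
cmp <- anova(fit_reduced, fit_full)
cmp
```

With a single term dropped, the F statistic in cmp should equal the square of Examination's t value in the full model, and its p-value should match the 0.31546 reported in the first summary.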

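I was surprised to see that the adjusted R-squared got slightly worse when the Examination variable was removed (from 0.671 to 0.6707), even though the residual standard error barely changed (7.165 vs. 7.168). This turns out to be expected: dropping a single predictor lowers the adjusted R-squared whenever the absolute value of its t statistic is greater than 1, and Examination's was 1.016. Because the difference is so small, the two models fit almost equally well, but on the adjusted R-squared criterion I conclude that the model with all variables is the best model.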
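Adjusted R-squared penalizes R-squared for the number of predictors: adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors (excluding the intercept). As a check on the comparison above, this sketch recomputes it by hand for both models. adj_r2 is a hypothetical helper name, not a base R function.

```r
# Hand-computed adjusted R-squared for a fitted lm model.
adj_r2 <- function(fit) {
  r2 <- summary(fit)$r.squared
  n  <- nobs(fit)                       # number of observations
  p  <- length(coef(fit)) - 1           # predictors, excluding the intercept
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

fit_full    <- lm(Fertility ~ ., data = datasets::swiss)
fit_reduced <- lm(Fertility ~ Agriculture + Education + Catholic + Infant.Mortality,
                  data = datasets::swiss)

c(full = adj_r2(fit_full), reduced = adj_r2(fit_reduced))

# Dropping one term lowers adjusted R-squared exactly when that term's |t| > 1.
abs(coef(summary(fit_full))["Examination", "t value"]) > 1
```

The last line should be TRUE, which is consistent with the full model having the (marginally) higher adjusted R-squared.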

To run a logistic regression model for Fertility, the Fertility variable first needs to be recoded as a binary 0/1 outcome. Since we are interested in whether Fertility > 70.0, every Fertility value greater than 70 is coded as 1 and every value less than or equal to 70 is coded as 0.

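# Recode Fertility as binary: 1 if Fertility > 70, otherwise 0.
# (This overwrites the numeric Fertility column in the session's copy of swiss.)
swiss$Fertility <- as.integer(swiss$Fertility > 70)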
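A quick way to confirm that the > 70 cutoff behaves as intended, including the boundary case of exactly 70, is to apply the same comparison to a few made-up values. The sketch also shows an alternative that stores the 0/1 outcome in a new column (HighFertility, a hypothetical name) on a fresh copy of the data, so the numeric Fertility values stay available.

```r
# Made-up values (not from swiss) to check the cutoff; 70 itself maps to 0.
x <- c(64.5, 70, 70.1, 92.5)
as.integer(x > 70)   # 0 0 1 1

# Non-destructive alternative: keep Fertility and add a separate 0/1 column.
sw <- datasets::swiss
sw$HighFertility <- as.integer(sw$Fertility > 70)
table(sw$HighFertility)
```

If you use the separate column, the logistic formula has to leave the numeric Fertility out, for example glm(HighFertility ~ . - Fertility, data = sw, family = binomial); otherwise the outcome would be perfectly predicted by one of the predictors.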

Now that the Fertility variable has been recoded, I can run my logistic model:

summary(glm(Fertility ~ . , data = swiss, family = binomial))

## 
## Call:
## glm(formula = Fertility ~ ., family = binomial, data = swiss)
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -1.85403  -0.45960   0.03648   0.55548   2.32911  
## 
## Coefficients:
##                  Estimate Std. Error z value Pr(>|z|)  
## (Intercept)       4.82826    5.25607   0.919   0.3583  
## Agriculture      -0.09615    0.04011  -2.397   0.0165 *
## Examination      -0.32116    0.13844  -2.320   0.0203 *
## Education        -0.12078    0.08610  -1.403   0.1607  
## Catholic          0.02078    0.01376   1.509   0.1312  
## Infant.Mortality  0.29078    0.21051   1.381   0.1672  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 65.135  on 46  degrees of freedom
## Residual deviance: 32.887  on 41  degrees of freedom
## AIC: 44.887
## 
## Number of Fisher Scoring iterations: 6
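Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios. The gap between the null and residual deviance also gives an overall likelihood-ratio test of the model against the intercept-only model. The sketch below refits the model on a fresh copy of swiss so it runs on its own; fit_logit, lr_stat and lr_df are illustrative names.

```r
# Refit the logistic model with the same > 70 recoding on a fresh copy of swiss.
sw <- datasets::swiss
sw$Fertility <- as.integer(sw$Fertility > 70)
fit_logit <- glm(Fertility ~ ., data = sw, family = binomial)

# Odds ratios: multiplicative change in the odds of Fertility > 70
# for a one-unit increase in each predictor, holding the others fixed.
exp(coef(fit_logit))

# Likelihood-ratio (deviance) test against the intercept-only model:
# the drop in deviance is approximately chi-squared with df = number of predictors.
lr_stat <- fit_logit$null.deviance - fit_logit$deviance
lr_df   <- fit_logit$df.null - fit_logit$df.residual
pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
```

For example, exp(-0.09615) is roughly 0.91, so each additional percentage point of Agriculture multiplies the estimated odds of Fertility > 70 by about 0.91, holding the other variables fixed.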

Greg Adelsberger

3/2/2018