Discussion 8

Using the “swiss” dataset, build the best multiple regression model you can for the variable Fertility. Then build a logistic regression model for predicting Fertility>70.0. Post your solutions / interpretation / code for your peers to see.

library(psych)            # provides describe()
describe(swiss$Fertility) # summary statistics for Fertility

##    vars  n  mean    sd median trimmed   mad min  max range  skew kurtosis
## X1    1 47 70.14 12.49   70.4   70.66 10.23  35 92.5  57.5 -0.46     0.26
##      se
## X1 1.82

Next, we look at the correlations between the variables to establish which one is most strongly related to Fertility.

cor(swiss)

##                   Fertility Agriculture Examination   Education   Catholic
## Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886  0.4636847
## Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252  0.4010951
## Examination      -0.6458827 -0.68654221   1.0000000  0.69841530 -0.5727418
## Education        -0.6637889 -0.63952252   0.6984153  1.00000000 -0.1538589
## Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892  1.0000000
## Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185  0.1754959
##                  Infant.Mortality
## Fertility              0.41655603
## Agriculture           -0.06085861
## Examination           -0.11402160
## Education             -0.09932185
## Catholic               0.17549591
## Infant.Mortality       1.00000000
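The matrix above is easier to read if only the Fertility column is kept and ranked by absolute value, since a strong negative correlation matters as much as a strong positive one. A small sketch using only base R (swiss ships with the datasets package):

```r
data(swiss)

# Correlation of every variable with Fertility, dropping Fertility itself
fert_cor <- cor(swiss)[, "Fertility"]
fert_cor <- fert_cor[names(fert_cor) != "Fertility"]

# Rank predictors by the strength of the relationship, ignoring the sign
fert_cor[order(abs(fert_cor), decreasing = TRUE)]
```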

From a quick look, Education has the strongest correlation with Fertility (r = -0.66), closely followed by Examination (r = -0.65). A scatterplot shows whether the relationship between Fertility and Education is indeed roughly linear.

plot(Fertility ~ Education, data = swiss, xlab = 'Education', ylab = 'Fertility')
abline(lm(Fertility ~ Education, data = swiss))  # add the fitted regression line

sample1<-lm(Fertility~Education, data=swiss)
summary(sample1)

## 
## Call:
## lm(formula = Fertility ~ Education, data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.036  -6.711  -1.011   9.526  19.689 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  79.6101     2.1041  37.836  < 2e-16 ***
## Education    -0.8624     0.1448  -5.954 3.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.446 on 45 degrees of freedom
## Multiple R-squared:  0.4406, Adjusted R-squared:  0.4282 
## F-statistic: 35.45 on 1 and 45 DF,  p-value: 3.659e-07
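To make the "Multiple R-squared" line concrete: it is 1 - SS_res/SS_tot, and with a single predictor it equals the squared correlation between Fertility and Education. The sketch below refits sample1 so it runs on its own and recomputes both R-squared values from their definitions:

```r
data(swiss)
sample1 <- lm(Fertility ~ Education, data = swiss)

# R^2 = 1 - SS_residual / SS_total
ss_res <- sum(residuals(sample1)^2)
ss_tot <- sum((swiss$Fertility - mean(swiss$Fertility))^2)
r2 <- 1 - ss_res / ss_tot
r2

# With one predictor, R^2 is the squared Pearson correlation
r2_from_cor <- cor(swiss$Fertility, swiss$Education)^2
r2_from_cor

# Adjusted R^2 penalises the number of predictors p (here p = 1)
n <- nrow(swiss)
p <- 1
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
adj_r2
```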

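Although the p-value is well below 0.05, Education alone explains only about 44% of the variability in Fertility (Multiple R-squared = 0.4406). This suggests that other variables also contribute; a model that adds Examination, the next most strongly correlated variable, might explain more. Note, however, that Examination and Education are themselves strongly correlated (r = 0.70), so Examination may add less than its own correlation with Fertility suggests.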
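To follow up on adding predictors, a minimal sketch: it tests Education + Examination against Education alone with an F-test (anova()), then fits all five predictors and lets backward selection by AIC with step() drop terms. Choosing AIC is an assumption on my part; what counts as "best" depends on the criterion (adjusted R-squared, AIC, BIC, cross-validation), and the selected model should still be checked with residual plots.

```r
data(swiss)

sample1 <- lm(Fertility ~ Education, data = swiss)

# The idea from the text: add Examination to Education
sample2 <- lm(Fertility ~ Education + Examination, data = swiss)

# F-test: does Examination explain variability beyond Education?
anova(sample1, sample2)

# Start from all five predictors; backward selection by AIC drops terms
full <- lm(Fertility ~ ., data = swiss)
best <- step(full, direction = "backward", trace = 0)

summary(best)
summary(best)$adj.r.squared
```

Comparing the adjusted R-squared of `best` with the 0.4282 of the Education-only model shows how much the extra predictors help once the penalty for model size is taken into account.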

Moving on to the logistic model for Fertility > 70, we first selected the provinces with Fertility greater than 70:

greaterthan <- subset(swiss, Fertility > 70)  # keep only provinces with Fertility above 70
table(greaterthan$Fertility > 70)

## 
## TRUE 
##   24
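This is why the subset is a dead end: after filtering, every row satisfies Fertility > 70, so the outcome has no variation and a logistic model has nothing to distinguish. The binary outcome needs to be built on the full data set, keeping both groups, as in this small sketch:

```r
data(swiss)

# Binary outcome on all 47 provinces: TRUE when Fertility exceeds 70
high_fert <- swiss$Fertility > 70

# Logistic regression needs both classes to be present
table(high_fert)
```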

I was not sure how to continue with this model. Any suggestions are greatly appreciated.
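One possible way to continue, as a sketch: fit a logistic regression with glm(family = binomial) on all 47 provinces, using a 0/1 indicator for Fertility > 70. The predictor set (Education, Catholic, Infant.Mortality) is my own assumption for illustration, picked from the correlation table while leaving out Examination because of its overlap with Education; step() can be applied to a glm the same way as to an lm. Fertility itself is removed from the predictors because it defines the outcome.

```r
data(swiss)

# 1 = Fertility above 70, 0 otherwise
swiss_bin <- swiss
swiss_bin$HighFert <- as.integer(swiss_bin$Fertility > 70)
swiss_bin$Fertility <- NULL  # drop it so it cannot predict its own threshold

# Logistic regression; the predictor choice here is illustrative only
logit1 <- glm(HighFert ~ Education + Catholic + Infant.Mortality,
              family = binomial, data = swiss_bin)
summary(logit1)

# exp(coefficient) = multiplicative change in the odds per one-unit increase
exp(coef(logit1))

# Classify with a 0.5 cut-off and compare with the observed classes
pred_prob <- predict(logit1, type = "response")
pred_class <- as.integer(pred_prob > 0.5)
table(observed = swiss_bin$HighFert, predicted = pred_class)
mean(pred_class == swiss_bin$HighFert)  # in-sample accuracy
```

A positive coefficient means higher values of that predictor raise the odds of Fertility > 70, and exp(coef()) expresses each coefficient as an odds ratio. The accuracy from the 0.5 cut-off is optimistic, since it is measured on the same data used to fit the model.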


Carola Rojas

3/09/2018