We have used the swiss data before, but I couldn’t remember all of its variables, so the first thing I did was review its structure:
str(swiss)
## 'data.frame': 47 obs. of 6 variables:
## $ Fertility : num 80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
## $ Agriculture : num 17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
## $ Examination : int 15 6 5 12 17 9 16 14 12 16 ...
## $ Education : int 12 9 5 7 15 7 7 8 7 13 ...
## $ Catholic : num 9.96 84.84 93.4 33.77 5.16 ...
## $ Infant.Mortality: num 22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
Then I looked at the correlations between all pairs of variables to get a sense of which two would be best to analyze:
cor(swiss)
##                   Fertility Agriculture Examination   Education   Catholic
## Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886  0.4636847
## Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252  0.4010951
## Examination      -0.6458827 -0.68654221   1.0000000  0.69841530 -0.5727418
## Education        -0.6637889 -0.63952252   0.6984153  1.00000000 -0.1538589
## Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892  1.0000000
## Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185  0.1754959
##                  Infant.Mortality
## Fertility              0.41655603
## Agriculture           -0.06085861
## Examination           -0.11402160
## Education             -0.09932185
## Catholic               0.17549591
## Infant.Mortality       1.00000000
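Since Fertility is the variable of interest, its row of the matrix is the part that matters most. As a small sketch (the object names `fert_cor` and `ranked` are my own, not part of the original analysis), that row can be pulled out and sorted by absolute value to make the comparison easier to read:

```r
# Correlation of every variable with Fertility (one row of the matrix)
fert_cor <- cor(swiss)["Fertility", ]

# Drop Fertility's correlation with itself
fert_cor <- fert_cor[names(fert_cor) != "Fertility"]

# Sort by strength of association, ignoring sign
ranked <- fert_cor[order(abs(fert_cor), decreasing = TRUE)]
round(ranked, 3)
```

Sorting on the absolute value keeps strong negative correlations at the top of the list alongside strong positive ones.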
The description of the data set tells us that Switzerland, in 1888, was entering a period known as the demographic transition; i.e., its fertility was beginning to fall from the high level typical of underdeveloped countries. Fertility is therefore the natural response variable, so I focused on the variables most strongly correlated with it. The two strongest correlations are with Education (r = -0.664) and Examination (r = -0.646), both negative. My next step was to see whether either relationship appeared to be linear.
plot(swiss$Fertility~swiss$Examination)
abline(lm(swiss$Fertility~swiss$Examination))

plot(swiss$Fertility~swiss$Education)
abline(lm(swiss$Fertility~swiss$Education))
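Judging linearity from a scatterplot alone can be hard, so one possible complement is a residuals-versus-fitted plot for each candidate model. This is only a sketch and was not part of my original steps; the names `fit_exam`, `fit_educ`, and `resid_plot` are hypothetical. A roughly flat smooth line around zero suggests that a straight line is a reasonable description.

```r
# Fit both candidate simple regressions
fit_exam <- lm(Fertility ~ Examination, data = swiss)
fit_educ <- lm(Fertility ~ Education, data = swiss)

# Helper (hypothetical name): residuals vs. fitted values with a lowess smooth
resid_plot <- function(fit, title) {
  plot(fitted(fit), resid(fit),
       xlab = "Fitted values", ylab = "Residuals", main = title)
  abline(h = 0, lty = 2)                      # reference line at zero
  lines(lowess(fitted(fit), resid(fit)))      # smooth trend in the residuals
}

# Show the two plots side by side, then restore the plotting settings
op <- par(mfrow = c(1, 2))
resid_plot(fit_exam, "Fertility ~ Examination")
resid_plot(fit_educ, "Fertility ~ Education")
par(op)
```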

Although Education has the slightly stronger correlation with Fertility, the Fertility and Examination plot appears to be much more linear than the Fertility and Education plot, so I will use Fertility and Examination in my regression analysis.
Discussion7<-(lm(swiss$Fertility~swiss$Examination))
summary(Discussion7)
##
## Call:
## lm(formula = swiss$Fertility ~ swiss$Examination)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -25.9375  -6.0044  -0.3393   7.9239  19.7399
##
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)
## (Intercept)        86.8185     3.2576  26.651  < 2e-16 ***
## swiss$Examination  -1.0113     0.1782  -5.675 9.45e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.642 on 45 degrees of freedom
## Multiple R-squared: 0.4172, Adjusted R-squared: 0.4042
## F-statistic: 32.21 on 1 and 45 DF, p-value: 9.45e-07
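The p-value for the Examination slope (9.45e-07) is far below 0.05, so the linear relationship between the two variables is statistically significant. The estimated slope of -1.0113 means that each one-point increase in Examination is associated with a decrease of about 1.01 in Fertility, on average. However, the R-squared is only 0.4172, which means that only ~42% of the variability in Fertility is explained by Examination.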
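In a regression with a single predictor, R-squared is simply the square of the correlation between the two variables, which ties this result back to the correlation matrix. The sketch below checks that and also computes a 95% confidence interval for the slope. The object names `fit` and `r` are my own, and the model is refit with `data = swiss` only so that the coefficient is named `Examination`; it is the same model as Discussion7.

```r
# Same model as Discussion7, written with a data argument
fit <- lm(Fertility ~ Examination, data = swiss)

# Correlation between the response and the predictor
r <- cor(swiss$Fertility, swiss$Examination)

# With one predictor these two values agree
r^2
summary(fit)$r.squared

# 95% confidence interval for the Examination slope
confint(fit, "Examination", level = 0.95)
```

The interval gives a sense of how precisely the slope is estimated, which the p-value alone does not show.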