Using the “swiss” dataset, conduct a regression of two variables of interest. Interpret the assumptions and results. Post your solutions and R code here.
library(psych)
We should first look at the correlations between the variables to establish which one is best to analyze in relation to Fertility.
cor(swiss)
##                   Fertility Agriculture Examination   Education   Catholic
## Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886  0.4636847
## Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252  0.4010951
## Examination      -0.6458827 -0.68654221   1.0000000  0.69841530 -0.5727418
## Education        -0.6637889 -0.63952252   0.6984153  1.00000000 -0.1538589
## Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892  1.0000000
## Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185  0.1754959
##                  Infant.Mortality
## Fertility              0.41655603
## Agriculture           -0.06085861
## Examination           -0.11402160
## Education             -0.09932185
## Catholic               0.17549591
## Infant.Mortality       1.00000000
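The call kdepairs(swiss) would not plot, so I could not inspect the relationships graphically at this stage. The most likely reason is that kdepairs() is not part of psych; it comes from the ResourceSelection package, which was not loaded.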
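A scatterplot matrix can still be drawn without kdepairs(). The following is only a sketch of two alternatives: base R's pairs(), which needs no extra package, and psych's pairs.panels(), which is run only if psych is installed. The plot title is my own choice.

```r
# Base R scatterplot matrix with a smoothed trend line in each panel.
pairs(swiss, panel = panel.smooth, main = "swiss: pairwise scatterplots")

# Optional: psych's version adds histograms and correlation coefficients.
# Guarded so the script still runs when psych is not installed.
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(swiss)
}
```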
Reading the Fertility column of the matrix, Education has the strongest correlation with Fertility (r = -0.66), with Examination close behind (r = -0.65); both relationships are negative. Plotting each variable against Fertility shows whether the relationship is roughly linear.
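Rather than reading the ranking off the matrix by eye, it can be checked numerically. This is a small sketch, assuming Fertility is the first column of swiss (as it is in the output above); the names r, r_ranked and ct are mine.

```r
# Correlation of every other variable with Fertility (drop Fertility itself).
r <- cor(swiss)["Fertility", -1]

# Rank by absolute size so strong negative correlations are not overlooked.
r_ranked <- r[order(abs(r), decreasing = TRUE)]
print(round(r_ranked, 3))

# Test whether the Fertility-Education correlation differs from zero.
ct <- cor.test(swiss$Fertility, swiss$Education)
print(ct)
```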
plot(swiss$Fertility~swiss$Education, xlab='Education', ylab='Fertility')
abline(lm(swiss$Fertility~swiss$Education))
plot(swiss$Fertility~swiss$Examination, xlab='Examination', ylab='Fertility')
abline(lm(swiss$Fertility~swiss$Examination))
sample1<-lm(Fertility~Education, data=swiss)
summary(sample1)
##
## Call:
## lm(formula = Fertility ~ Education, data = swiss)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -17.036  -6.711  -1.011   9.526  19.689
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  79.6101     2.1041  37.836  < 2e-16 ***
## Education    -0.8624     0.1448  -5.954 3.66e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.446 on 45 degrees of freedom
## Multiple R-squared: 0.4406, Adjusted R-squared: 0.4282
## F-statistic: 35.45 on 1 and 45 DF, p-value: 3.659e-07
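Reading this output: the Education slope of -0.8624 means that a one-unit increase in Education is associated with a drop of about 0.86 in Fertility, and the R-squared of 0.4406 means Education alone accounts for roughly 44% of the variation in Fertility. A minimal sketch for pulling these numbers out of the model together with 95% confidence intervals; the name fit_edu and the Education value of 10 are only illustrative.

```r
# Same model as sample1, refitted so this snippet runs on its own.
fit_edu <- lm(Fertility ~ Education, data = swiss)

# Point estimates and 95% confidence intervals for intercept and slope.
print(coef(fit_edu))
print(confint(fit_edu, level = 0.95))

# Expected Fertility for a hypothetical province with Education = 10,
# with a confidence interval for the mean response.
new_prov <- data.frame(Education = 10)
print(predict(fit_edu, newdata = new_prov, interval = "confidence"))
```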
sample2<-lm(Fertility~Examination, data=swiss)
summary(sample2)
##
## Call:
## lm(formula = Fertility ~ Examination, data = swiss)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -25.9375  -6.0044  -0.3393   7.9239  19.7399
##
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)  86.8185     3.2576  26.651  < 2e-16 ***
## Examination  -1.0113     0.1782  -5.675 9.45e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.642 on 45 degrees of freedom
## Multiple R-squared: 0.4172, Adjusted R-squared: 0.4042
## F-statistic: 32.21 on 1 and 45 DF, p-value: 9.45e-07
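The two summaries can be put side by side. Both models are significant, but the Education model has the slightly higher R-squared (0.4406 against 0.4172) and the smaller residual standard error (9.446 against 9.642). One way to collect these figures in a single table is sketched below; the object names and the choice to include AIC are my own.

```r
# Refit both single-predictor models so this snippet runs on its own.
fit_edu  <- lm(Fertility ~ Education,   data = swiss)
fit_exam <- lm(Fertility ~ Examination, data = swiss)

# One row per model: share of variance explained, residual SE, and AIC.
comparison <- data.frame(
  model     = c("Education", "Examination"),
  r_squared = c(summary(fit_edu)$r.squared, summary(fit_exam)$r.squared),
  sigma     = c(sigma(fit_edu), sigma(fit_exam)),
  AIC       = c(AIC(fit_edu), AIC(fit_exam))
)
print(comparison)
```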
plot(sample1)
plot(sample2)
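Calling plot() on an lm object draws four diagnostic plots (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage), which is where the assumptions of linearity, normal residuals and constant variance get checked. A sketch for showing all four on one page and adding a formal normality test; using shapiro.test() here is my suggestion, not something the assignment asks for.

```r
# Same model as sample1, refitted so this snippet runs on its own.
fit_edu <- lm(Fertility ~ Education, data = swiss)

# Arrange the four diagnostic plots in a 2 x 2 grid, then restore settings.
old_par <- par(mfrow = c(2, 2))
plot(fit_edu)
par(old_par)

# Shapiro-Wilk test on the residuals; a small p-value would suggest
# the residuals are not normally distributed.
sw <- shapiro.test(residuals(fit_edu))
print(sw)
```

The same steps apply to sample2 by swapping Education for Examination in the formula.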
I would have liked to fit a model with both variables in it, but I am not sure how to.
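For what it is worth, a model with both predictors can be fitted by joining them with + in the formula. A minimal sketch, with the caveat that Education and Examination are themselves correlated (r = 0.70 in the matrix above), so their individual coefficients may be harder to interpret; the names fit_edu and fit_both are mine.

```r
# Single-predictor model, same as sample1.
fit_edu <- lm(Fertility ~ Education, data = swiss)

# Multiple regression: both predictors enter the model additively.
fit_both <- lm(Fertility ~ Education + Examination, data = swiss)
print(summary(fit_both))

# F-test for whether adding Examination improves on Education alone.
print(anova(fit_edu, fit_both))
```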