Using the “swiss” dataset, conduct a regression of two variables of interest. Interpret the assumptions and results. Post your solutions and R code here.

library(psych)

Next, we look at the correlations among the variables to decide which one is the best candidate to analyze against Fertility.

cor(swiss)
##                   Fertility Agriculture Examination   Education   Catholic
## Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886  0.4636847
## Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252  0.4010951
## Examination      -0.6458827 -0.68654221   1.0000000  0.69841530 -0.5727418
## Education        -0.6637889 -0.63952252   0.6984153  1.00000000 -0.1538589
## Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892  1.0000000
## Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185  0.1754959
##                  Infant.Mortality
## Fertility              0.41655603
## Agriculture           -0.06085861
## Examination           -0.11402160
## Education             -0.09932185
## Catholic               0.17549591
## Infant.Mortality       1.00000000

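For some reason kdepairs(swiss) would not plot, so I cannot compare the variables graphically here. As far as I can tell, kdepairs() is not part of psych, the only package loaded above; I believe it comes from the ResourceSelection package, which would explain the failure.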
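As a stand-in for the plot that would not render, base R's pairs() draws a scatterplot matrix without any extra package. This is only a sketch of an alternative view, not the kernel-density version; the variable name vars and the plot title are my own choices.

```r
# Base-R scatterplot matrix of all six swiss variables.
# panel.smooth (from the graphics package) overlays a lowess curve on each panel.
vars <- names(swiss)
pairs(swiss[, vars], panel = panel.smooth, main = "swiss: pairwise scatterplots")
```

The Fertility row of the matrix is the one to read for this exercise: each panel in it plots Fertility against one candidate predictor.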

From a quick look at the matrix, Education is the variable most strongly correlated with Fertility (r = -0.66), with Examination close behind (r = -0.65). Plotting each against Fertility shows whether the relationship is roughly linear.
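The ranking can also be read off programmatically instead of by eye. The sketch below sorts the Fertility row of the correlation matrix by absolute size and then runs cor.test() on the strongest pair; the names fert_cor and ranked are hypothetical helpers, and the significance test is an addition that was not part of the original analysis.

```r
# Correlation of every other variable with Fertility, ranked by absolute size.
fert_cor <- cor(swiss)["Fertility", -1]   # drop Fertility's correlation with itself
ranked   <- fert_cor[order(abs(fert_cor), decreasing = TRUE)]
round(ranked, 3)

# Test whether the strongest correlation is distinguishable from zero.
cor.test(swiss$Fertility, swiss$Education)
```

Ranking by absolute value matters here because the two strongest relationships are negative and would otherwise sort to the bottom.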

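# Fertility against Education, with the least-squares line overlaid
plot(Fertility ~ Education, data = swiss, xlab = 'Education', ylab = 'Fertility')
abline(lm(Fertility ~ Education, data = swiss))

# Fertility against Examination, with the least-squares line overlaid
plot(Fertility ~ Examination, data = swiss, xlab = 'Examination', ylab = 'Fertility')
abline(lm(Fertility ~ Examination, data = swiss))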

# Simple linear regression of Fertility on Education
sample1 <- lm(Fertility ~ Education, data = swiss)
summary(sample1)
## 
## Call:
## lm(formula = Fertility ~ Education, data = swiss)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.036  -6.711  -1.011   9.526  19.689 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  79.6101     2.1041  37.836  < 2e-16 ***
## Education    -0.8624     0.1448  -5.954 3.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.446 on 45 degrees of freedom
## Multiple R-squared:  0.4406, Adjusted R-squared:  0.4282 
## F-statistic: 35.45 on 1 and 45 DF,  p-value: 3.659e-07
# Simple linear regression of Fertility on Examination
sample2 <- lm(Fertility ~ Examination, data = swiss)
summary(sample2)
## 
## Call:
## lm(formula = Fertility ~ Examination, data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -25.9375  -6.0044  -0.3393   7.9239  19.7399 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  86.8185     3.2576  26.651  < 2e-16 ***
## Examination  -1.0113     0.1782  -5.675 9.45e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.642 on 45 degrees of freedom
## Multiple R-squared:  0.4172, Adjusted R-squared:  0.4042 
## F-statistic: 32.21 on 1 and 45 DF,  p-value: 9.45e-07
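
Reading the two summaries side by side: in sample1 the Education slope of -0.8624 means a one-unit increase in Education goes with about 0.86 lower Fertility, and the model accounts for about 44% of the variance (R-squared 0.4406). In sample2 the Examination slope is -1.0113 with R-squared 0.4172, so Education fits slightly better. The sketch below pulls those quantities, plus 95% confidence intervals, out of the fitted objects. The models are re-fitted under the hypothetical names fit_edu and fit_exam only so that the block runs by itself.

```r
fit_edu  <- lm(Fertility ~ Education,   data = swiss)
fit_exam <- lm(Fertility ~ Examination, data = swiss)

# Slopes with 95% confidence intervals
confint(fit_edu)["Education", ]
confint(fit_exam)["Examination", ]

# Share of variance in Fertility explained by each single-predictor model
r2 <- c(Education   = summary(fit_edu)$r.squared,
        Examination = summary(fit_exam)$r.squared)
round(r2, 4)
```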
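# Diagnostic plots for the Education model
# (residuals vs fitted, normal Q-Q, scale-location, residuals vs leverage)
plot(sample1)

# The same diagnostic plots for the Examination model
plot(sample2)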
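
Calling plot() on an lm object produces the standard diagnostic plots, which are what the linearity, normality, and constant-variance assumptions are judged from. A minimal sketch for viewing all four on one page follows, with a Shapiro-Wilk test on the residuals as a numeric companion to the Q-Q plot. The Shapiro-Wilk test is my addition rather than part of the original analysis, and fit_edu is a hypothetical name for a re-fit of sample1.

```r
fit_edu <- lm(Fertility ~ Education, data = swiss)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid so all four diagnostic plots share one page
plot(fit_edu)
par(old_par)                      # restore the previous layout

# Shapiro-Wilk test: a small p-value would suggest non-normal residuals
sw <- shapiro.test(residuals(fit_edu))
sw
```

What to look for: no curve or funnel shape in the residuals-vs-fitted panel, points close to the line in the Q-Q panel, and no points beyond the Cook's distance contours in the leverage panel.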

I would have liked to fit a model with both variables in it, but I am not sure how to.
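
One way this could be done is to list both predictors on the right-hand side of the formula, separated by +. The sketch below fits that model and compares it with the Education-only model using a nested-model F test; the names fit_edu and fit_both are hypothetical. Since Education and Examination are themselves correlated at about 0.70 in the matrix above, the individual coefficients may be estimated less precisely than in the single-predictor models.

```r
fit_edu  <- lm(Fertility ~ Education, data = swiss)
fit_both <- lm(Fertility ~ Education + Examination, data = swiss)
summary(fit_both)

# Nested-model F test: does adding Examination improve on Education alone?
anova(fit_edu, fit_both)
```

Each coefficient in the two-predictor summary is read as the change in Fertility for a one-unit change in that variable with the other held fixed.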