Using the “swiss” dataset, conduct a regression of two variables of interest. Interpret the assumptions and results. Post your solutions and R code here.

library(stats)
library(datasets)
library(psych)
library(kableExtra)
require(graphics)

##Swiss Data
swiss
##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
## Broye             83.8        70.2          16         7    92.85
## Glane             92.4        67.8          14         8    97.16
## Gruyere           82.4        53.3          12         7    97.67
## Sarine            82.9        45.2          16        13    91.38
## Veveyse           87.1        64.5          14         6    98.61
## Aigle             64.1        62.0          21        12     8.52
## Aubonne           66.9        67.5          14         7     2.27
## Avenches          68.9        60.7          19        12     4.43
## Cossonay          61.7        69.3          22         5     2.82
## Echallens         68.3        72.6          18         2    24.20
## Grandson          71.7        34.0          17         8     3.30
## Lausanne          55.7        19.4          26        28    12.11
## La Vallee         54.3        15.2          31        20     2.15
## Lavaux            65.1        73.0          19         9     2.84
## Morges            65.5        59.8          22        10     5.23
## Moudon            65.0        55.1          14         3     4.52
## Nyone             56.6        50.9          22        12    15.14
## Orbe              57.4        54.1          20         6     4.20
## Oron              72.5        71.2          12         1     2.40
## Payerne           74.2        58.1          14         8     5.23
## Paysd'enhaut      72.0        63.5           6         3     2.56
## Rolle             60.5        60.8          16        10     7.72
## Vevey             58.3        26.8          25        19    18.46
## Yverdon           65.4        49.5          15         8     6.10
## Conthey           75.5        85.9           3         2    99.71
## Entremont         69.3        84.9           7         6    99.68
## Herens            77.3        89.7           5         2   100.00
## Martigwy          70.5        78.2          12         6    98.96
## Monthey           79.4        64.9           7         3    98.22
## St Maurice        65.0        75.9           9         9    99.06
## Sierre            92.2        84.6           3         3    99.46
## Sion              79.3        63.1          13        13    96.83
## Boudry            70.4        38.4          26        12     5.62
## La Chauxdfnd      65.7         7.7          29        11    13.79
## Le Locle          72.7        16.7          22        13    11.22
## Neuchatel         64.4        17.6          35        32    16.92
## Val de Ruz        77.6        37.6          15         7     4.97
## ValdeTravers      67.6        18.7          25         7     8.65
## V. De Geneve      35.0         1.2          37        53    42.34
## Rive Droite       44.7        46.6          16        29    50.43
## Rive Gauche       42.8        27.7          22        29    58.33
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6
## Broye                    23.6
## Glane                    24.9
## Gruyere                  21.0
## Sarine                   24.4
## Veveyse                  24.5
## Aigle                    16.5
## Aubonne                  19.1
## Avenches                 22.7
## Cossonay                 18.7
## Echallens                21.2
## Grandson                 20.0
## Lausanne                 20.2
## La Vallee                10.8
## Lavaux                   20.0
## Morges                   18.0
## Moudon                   22.4
## Nyone                    16.7
## Orbe                     15.3
## Oron                     21.0
## Payerne                  23.8
## Paysd'enhaut             18.0
## Rolle                    16.3
## Vevey                    20.9
## Yverdon                  22.5
## Conthey                  15.1
## Entremont                19.8
## Herens                   18.3
## Martigwy                 19.4
## Monthey                  20.2
## St Maurice               17.8
## Sierre                   16.3
## Sion                     18.1
## Boudry                   20.3
## La Chauxdfnd             20.5
## Le Locle                 18.9
## Neuchatel                23.0
## Val de Ruz               20.0
## ValdeTravers             19.5
## V. De Geneve             18.0
## Rive Droite              18.2
## Rive Gauche              19.3
str(swiss)
## 'data.frame':    47 obs. of  6 variables:
##  $ Fertility       : num  80.2 83.1 92.5 85.8 76.9 76.1 83.8 92.4 82.4 82.9 ...
##  $ Agriculture     : num  17 45.1 39.7 36.5 43.5 35.3 70.2 67.8 53.3 45.2 ...
##  $ Examination     : int  15 6 5 12 17 9 16 14 12 16 ...
##  $ Education       : int  12 9 5 7 15 7 7 8 7 13 ...
##  $ Catholic        : num  9.96 84.84 93.4 33.77 5.16 ...
##  $ Infant.Mortality: num  22.2 22.2 20.2 20.3 20.6 26.6 23.6 24.9 21 24.4 ...
mydescribe <- round(describe(swiss), 3)
mydescribe %>% kbl() %>% kable_classic(html_font = "Courier New")
                 vars  n   mean     sd median trimmed    mad   min   max range   skew kurtosis    se
Fertility           1 47 70.143 12.492  70.40  70.659 10.230 35.00  92.5 57.50 -0.456    0.260 1.822
Agriculture         2 47 50.660 22.711  54.10  51.156 23.870  1.20  89.7 88.50 -0.320   -0.886 3.313
Examination         3 47 16.489  7.978  16.00  16.077  7.413  3.00  37.0 34.00  0.446   -0.137 1.164
Education           4 47 10.979  9.615   8.00   9.385  5.930  1.00  53.0 52.00  2.268    6.140 1.403
Catholic            5 47 41.144 41.705  15.14  39.116 18.651  2.15 100.0 97.85  0.479   -1.665 6.083
Infant.Mortality    6 47 19.943  2.913  20.00  19.985  2.817 10.80  26.6 15.80 -0.331    0.777 0.425
##Correlation
cor(swiss)
##                   Fertility Agriculture Examination   Education   Catholic
## Fertility         1.0000000  0.35307918  -0.6458827 -0.66378886  0.4636847
## Agriculture       0.3530792  1.00000000  -0.6865422 -0.63952252  0.4010951
## Examination      -0.6458827 -0.68654221   1.0000000  0.69841530 -0.5727418
## Education        -0.6637889 -0.63952252   0.6984153  1.00000000 -0.1538589
## Catholic          0.4636847  0.40109505  -0.5727418 -0.15385892  1.0000000
## Infant.Mortality  0.4165560 -0.06085861  -0.1140216 -0.09932185  0.1754959
##                  Infant.Mortality
## Fertility              0.41655603
## Agriculture           -0.06085861
## Examination           -0.11402160
## Education             -0.09932185
## Catholic               0.17549591
## Infant.Mortality       1.00000000
##Picking Fertility and Education
cor(swiss$Fertility, swiss$Education)
## [1] -0.6637889
#With r = -0.66, there is a fairly strong negative linear correlation between the two variables: provinces with higher Education tend to have lower Fertility.
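#A correlation coefficient alone does not say whether the association could be due to chance. The sketch below uses cor.test() from the stats package (Pearson by default) to get a confidence interval and p-value. The object name ct is only illustrative.

```r
# Sketch: test whether the Fertility-Education correlation differs from zero.
# swiss ships with the datasets package, which R attaches by default.
ct <- cor.test(swiss$Fertility, swiss$Education)  # Pearson correlation test

ct$estimate   # sample correlation, the same value cor() returns
ct$conf.int   # 95% confidence interval for the true correlation
ct$p.value    # p-value for H0: the true correlation is zero
```

#In a simple regression with one predictor, this p-value matches the one reported for the slope.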
hist(swiss$Fertility)

#Approximately normally distributed, with a slight left skew (skew = -0.456).
hist(swiss$Education)

#Strongly skewed to the right (skew = 2.268).

##Regression
myreg <- lm(swiss$Fertility~swiss$Education)
summary(myreg)
## 
## Call:
## lm(formula = swiss$Fertility ~ swiss$Education)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -17.036  -6.711  -1.011   9.526  19.689 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      79.6101     2.1041  37.836  < 2e-16 ***
## swiss$Education  -0.8624     0.1448  -5.954 3.66e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.446 on 45 degrees of freedom
## Multiple R-squared:  0.4406, Adjusted R-squared:  0.4282 
## F-statistic: 35.45 on 1 and 45 DF,  p-value: 3.659e-07
#The intercept is 79.6101: when the percentage of draftees with education beyond primary school is zero, the model predicts a Fertility value of 79.6101. The Education coefficient, -0.8624, is the slope of the regression line: each one-percentage-point increase in Education is associated with an average decrease of 0.8624 in Fertility. Both coefficients are statistically significant (p < 0.001). The R-squared of 0.4406 means that Education explains 44.06% of the variation in Fertility. This is an association and does not by itself show that education causes lower fertility.
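#To make the slope interpretation concrete, here is a minimal sketch that predicts Fertility for a hypothetical province where Education is 10. By hand this is 79.6101 - 0.8624 * 10 = 70.9861. The model is refit with a formula and data = swiss so that predict() can accept new data; fit and new_prov are illustrative names, not part of the analysis above.

```r
# Sketch: same model as above, written with data = swiss so predict() works
# with a newdata argument.
fit <- lm(Fertility ~ Education, data = swiss)

# 95% confidence intervals for the intercept and slope
confint(fit)

# Hypothetical province with 10% of draftees educated beyond primary school
new_prov <- data.frame(Education = 10)

# Predicted mean Fertility with a 95% confidence interval
pred <- predict(fit, newdata = new_prov, interval = "confidence")
pred
```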
##ANOVA check
#Store the table under a name that does not mask the stats::aov() function.
myaov <- anova(myreg)
myaov
## Analysis of Variance Table
## 
## Response: swiss$Fertility
##                 Df Sum Sq Mean Sq F value    Pr(>F)    
## swiss$Education  1 3162.7  3162.7  35.446 3.659e-07 ***
## Residuals       45 4015.2    89.2                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
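#The square root of the residual mean square should equal the residual standard error from summary(myreg), 9.446. The small difference below comes from rounding the mean square to 89.2.
sqrt(89.2)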
## [1] 9.444575
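#The ANOVA table also reproduces R-squared: 3162.7 / (3162.7 + 4015.2) = 0.4406. A short sketch of both checks without rounding, pulling the values directly from the table. The names fit, tab, ss, r2 and rse are illustrative.

```r
# Sketch: recover R-squared and the residual standard error from the
# ANOVA table instead of typing rounded numbers by hand.
fit <- lm(Fertility ~ Education, data = swiss)
tab <- anova(fit)

ss  <- tab[["Sum Sq"]]            # c(regression SS, residual SS)
r2  <- ss[1] / sum(ss)            # share of total variation explained
rse <- sqrt(tab[["Mean Sq"]][2])  # square root of the residual mean square

r2
rse
```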
##Plots
#Residuals
plot(myreg)

hist(myreg$residuals)

mean(myreg$residuals)
## [1] -7.176588e-16
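#The mean of the residuals is effectively zero (-7.2e-16). This is expected, because least-squares residuals always average to zero when the model includes an intercept, so it does not by itself validate the model. The diagnostic plots seem to show an outlier that may influence the results somewhat. The histogram suggests the residuals are roughly, though not perfectly, normally distributed.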
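#One way to follow up on the possible outlier is to look at Cook's distance and leverage, both available in base R. This is only a sketch: the 4/n cutoff is a common rule of thumb and an assumption here, not a formal test, and the object names are illustrative.

```r
# Sketch: flag provinces that may have an outsized influence on the fit.
fit <- lm(Fertility ~ Education, data = swiss)

cd  <- cooks.distance(fit)  # influence of each province on the fitted line
lev <- hatvalues(fit)       # leverage: how unusual each Education value is

# The three provinces with the largest Cook's distance
head(sort(cd, decreasing = TRUE), 3)

# Rule-of-thumb cutoff (assumption): 4 / number of observations
cutoff <- 4 / nrow(swiss)
names(cd)[cd > cutoff]
```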
shapiro.test(myreg$residuals)
## 
##  Shapiro-Wilk normality test
## 
## data:  myreg$residuals
## W = 0.95346, p-value = 0.05922
#With p = 0.05922 > 0.05, we cannot reject the null hypothesis that the residuals are normally distributed at the 5% level, although the result is borderline. Failing to reject does not prove normality.
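#Because the Shapiro-Wilk result is borderline, a normal Q-Q plot is a useful visual companion. A minimal sketch using base graphics, with illustrative object names; points that stray from the reference line, especially in the tails, indicate departures from normality.

```r
# Sketch: visual check of residual normality alongside the Shapiro-Wilk test.
fit <- lm(Fertility ~ Education, data = swiss)
res <- residuals(fit)

qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)  # reference line through the first and third quartiles

sw <- shapiro.test(res)
sw$statistic
sw$p.value
```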
#Model
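#Education (the predictor) goes on the x-axis and Fertility (the response) on the y-axis so that abline(myreg) lines up with the points.
plot(swiss$Education, swiss$Fertility)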
abline(myreg)

#The plot shows the negative relationship between the two variables: Fertility tends to fall as Education rises.