First, a quick look at the data to see how it is structured.
head(swiss)
I copied the variable descriptions below from the help page (?swiss) in order to fully understand what each variable represents.
#Fertility Ig, ‘common standardized fertility measure’
#Agriculture % of males involved in agriculture as occupation
#Examination % draftees receiving highest mark on army examination
#Education % education beyond primary school for draftees.
#Catholic % ‘catholic’ (as opposed to ‘protestant’).
#Infant.Mortality live births who live less than 1 year.
plot(Examination~Agriculture,swiss)
cor(swiss$Agriculture,swiss$Examination)
## [1] -0.6865422
The correlation of -0.6865422 indicates a relatively strong negative linear relationship between Agriculture and Examination.
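As a rough check on whether a correlation of this size could plausibly be due to chance, base R's cor.test() can be used. This is only a sketch and was not part of the original analysis; the variable name ct is purely illustrative.

```r
# swiss ships with base R (datasets package), so nothing needs to be loaded.
# cor.test() performs a Pearson correlation test by default.
ct <- cor.test(swiss$Agriculture, swiss$Examination)

ct$estimate  # sample correlation, the same value cor() returns
ct$p.value   # p-value for the null hypothesis of zero correlation
ct$conf.int  # 95% confidence interval for the correlation
```

With a single predictor, this p-value matches the one reported for the slope in the regression summary.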
model = lm(Examination~Agriculture,swiss)
summary(model)
##
## Call:
## lm(formula = Examination ~ Agriculture, data = swiss)
##
## Residuals:
## Min 1Q Median 3Q Max
## -14.1324 -3.0931 -0.0264 4.2817 10.5378
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 28.70668 2.11001 13.605 < 2e-16 ***
## Agriculture -0.24117 0.03807 -6.334 9.95e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.865 on 45 degrees of freedom
## Multiple R-squared: 0.4713, Adjusted R-squared: 0.4596
## F-statistic: 40.12 on 1 and 45 DF, p-value: 9.952e-08
The intercept is 28.70668, which means that when the percent of males involved in agriculture as an occupation is 0, the model predicts that 28.70668% of draftees receive the highest mark on the army examination.
The slope of -0.24117 indicates that for each 1 percentage point increase in the percent of males involved in agriculture, the predicted percent of draftees receiving the highest mark on the army examination decreases by 0.24117 percentage points.
The R-squared tells us that 47.13% of the variation in examination can be explained by agriculture.
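The interpretations above can be backed up with two quick checks, sketched here under the assumption that the same simple regression is refit (the names fit, r, and r_squared are illustrative and not from the original analysis). confint() gives a range of plausible values for the intercept and slope, and in a one-predictor model the R-squared should equal the squared correlation.

```r
# Refit the same model so this block runs on its own.
fit <- lm(Examination ~ Agriculture, data = swiss)

# 95% confidence intervals for the intercept and the slope.
confint(fit, level = 0.95)

# In simple linear regression, R-squared is the squared Pearson correlation.
r <- cor(swiss$Agriculture, swiss$Examination)
r_squared <- summary(fit)$r.squared
all.equal(r^2, r_squared)
```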
plot(model)
mean(model$residuals)
## [1] -3.871707e-16
hist(model$residuals)
The residual plot shows the residuals are centered at 0, although there is some curvature that indicates the residuals might not be homoscedastic.
The mean of the residuals is nearly 0.
The histogram of the residuals shows a nearly normal distribution.
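The visual impression of normality can be supplemented with a normal Q-Q plot and a Shapiro-Wilk test. This is a minimal sketch using only base R functions; the names fit, res, and sw are illustrative, and the test is an addition rather than something the original analysis ran.

```r
# Refit the same model so this block runs on its own.
fit <- lm(Examination ~ Agriculture, data = swiss)
res <- resid(fit)

# Normal Q-Q plot: points close to the line are consistent with normal residuals.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal.
sw <- shapiro.test(res)
sw$p.value
```

A large p-value would be consistent with the histogram, while a small one would suggest the residuals depart from normality.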
Agriculture1 = coef(model)[1] + coef(model)[2]*10
Agriculture1
## (Intercept)
## 26.29503
When the percent of males involved in agriculture is 10, the model predicts that 26.29503% of draftees receive the highest mark on the army examination.
Agriculture2 = coef(model)[1] + coef(model)[2]*80
Agriculture2
## (Intercept)
## 9.413474
When the percent of males involved in agriculture is 80, the model predicts that 9.413474% of draftees receive the highest mark on the army examination.
So as the percent of men working in agriculture increases, the predicted percent of draftees receiving the highest mark decreases.
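The same two predictions can also be obtained with predict(), which avoids writing out the coefficients by hand and can return confidence intervals for the fitted values. This is a sketch, assuming the same model is refit; the names fit, new_provinces, and pred are illustrative.

```r
# Refit the same model so this block runs on its own.
fit <- lm(Examination ~ Agriculture, data = swiss)

# Hypothetical provinces with 10% and 80% of males working in agriculture.
new_provinces <- data.frame(Agriculture = c(10, 80))

# Returns a matrix with columns fit (prediction), lwr and upr (95% interval).
pred <- predict(fit, newdata = new_provinces, interval = "confidence")
pred

# Check that both values lie inside the observed range of Agriculture.
range(swiss$Agriculture)
```

The fit column should reproduce the values computed manually above.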
plot(Examination~Agriculture,swiss)
abline(model)