The variables I used in my regression were Agriculture (the predictor) and Examination (the response).

This is just to get a first look at the data.

head(swiss)

I copied this after typing in ?swiss, in order to fully understand what each variable represents.

#Fertility           Ig, ‘common standardized fertility measure’
#Agriculture            % of males involved in agriculture as occupation
#Examination            % draftees receiving highest mark on army examination
#Education            % education beyond primary school for draftees.
#Catholic            % ‘catholic’ (as opposed to ‘protestant’).
#Infant.Mortality     live births who live less than 1 year.

Graph of Examination and Agriculture and correlation coefficient between the two variables

plot(Examination~Agriculture,swiss)

cor(swiss$Agriculture,swiss$Examination)
## [1] -0.6865422

A correlation of -0.6865422 indicates a relatively strong negative linear relationship between the two variables.
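To back up the reading of the correlation, here is a minimal sketch that uses base R's cor.test to check whether the correlation differs from zero. The object name ct is just my own choice for illustration.

```r
# Test whether the Pearson correlation between the two variables is zero
ct <- cor.test(swiss$Agriculture, swiss$Examination)

ct$estimate   # the sample correlation, same value as cor() above
ct$p.value    # p-value for the null hypothesis of zero correlation
ct$conf.int   # 95% confidence interval for the correlation
```

A small p-value here would support the claim that the negative relationship is not just noise.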

Regression of Examination on Agriculture

model = lm(Examination~Agriculture,swiss)
summary(model)
## 
## Call:
## lm(formula = Examination ~ Agriculture, data = swiss)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -14.1324  -3.0931  -0.0264   4.2817  10.5378 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 28.70668    2.11001  13.605  < 2e-16 ***
## Agriculture -0.24117    0.03807  -6.334 9.95e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 5.865 on 45 degrees of freedom
## Multiple R-squared:  0.4713, Adjusted R-squared:  0.4596 
## F-statistic: 40.12 on 1 and 45 DF,  p-value: 9.952e-08

The intercept is 28.70668, which means that when the percent of males involved in agriculture as an occupation is 0, the predicted percent of draftees receiving the highest mark on the army examination is 28.70668.

The slope of -0.24117 indicates that as the percent of males involved in agriculture increases by 1 percentage point, the percent of draftees receiving the highest mark on the army examination decreases by 0.24117 percentage points on average.
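The slope is only an estimate, so it can help to look at a confidence interval for it. This is a small sketch using base R's confint; the model is refit here so the snippet runs on its own, and the name ci is my own.

```r
# Refit the same model so this snippet is self-contained
model <- lm(Examination ~ Agriculture, data = swiss)

# 95% confidence interval for the Agriculture slope
ci <- confint(model, "Agriculture", level = 0.95)
ci
```

If the whole interval lies below 0, that agrees with the negative slope reported in the summary.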

The R-squared tells us that 47.13% of the variation in examination can be explained by agriculture.
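In a regression with a single predictor, the R-squared is the square of the correlation coefficient, so the two numbers reported above can be checked against each other. A quick sketch (the names r and model are refit or recomputed here so the snippet stands alone):

```r
# Correlation between the two variables
r <- cor(swiss$Agriculture, swiss$Examination)

# Same simple regression as above
model <- lm(Examination ~ Agriculture, data = swiss)

r^2                                       # squared correlation
summary(model)$r.squared                  # R-squared from the model
all.equal(r^2, summary(model)$r.squared)  # the two should agree
```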

Plot of the residuals

plot(model)

mean(model$residuals) 
## [1] -3.871707e-16
hist(model$residuals)

The residuals vs. fitted plot shows the residuals centered at 0, although there is some curvature that indicates the residuals might not be homoscedastic.
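As a rough numerical companion to the plot, one option is to regress the squared residuals on Agriculture and see whether the spread changes with the predictor. This is only an informal, Breusch-Pagan-style sketch in base R, not the formal test; the names sq_resid and aux are my own.

```r
# Refit the same model so this snippet is self-contained
model <- lm(Examination ~ Agriculture, data = swiss)

# Squared residuals act as a stand-in for the error variance
sq_resid <- residuals(model)^2

# Auxiliary regression: does the squared residual depend on Agriculture?
aux <- lm(sq_resid ~ Agriculture, data = swiss)
summary(aux)$coefficients
```

A clearly significant Agriculture coefficient in the auxiliary regression would point toward non-constant variance.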

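The mean of the residuals is nearly 0, which is expected because least squares with an intercept forces the residuals to average to zero.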

The histogram of the residuals shows a roughly normal distribution.
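The histogram is a visual check only. A possible numerical check is the Shapiro-Wilk test from base R, sketched below; the name sw is my own, and the model is refit so the snippet runs by itself.

```r
# Refit the same model so this snippet is self-contained
model <- lm(Examination ~ Agriculture, data = swiss)

# Shapiro-Wilk test: the null hypothesis is that the residuals are normal
sw <- shapiro.test(residuals(model))
sw
```

A large p-value would be consistent with the roughly normal shape seen in the histogram.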

Plugging in some values for Agriculture

Agriculture1 = coef(model)[1] + coef(model)[2]*10
Agriculture1
## (Intercept) 
##    26.29503

When the percent of males involved in agriculture is 10, the model predicts that 26.29503 percent of draftees receive the highest mark on the army examination.

Agriculture2 = coef(model)[1] + coef(model)[2]*80
Agriculture2
## (Intercept) 
##    9.413474

When the percent of males involved in agriculture is 80, the model predicts that 9.413474 percent of draftees receive the highest mark on the army examination.

So as the percent of men working in agriculture increases, the predicted percent of draftees receiving the highest mark decreases.
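The same two predictions can also be obtained with predict instead of typing the coefficients by hand. A minimal sketch, where new_data and preds are names I made up for illustration:

```r
# Refit the same model so this snippet is self-contained
model <- lm(Examination ~ Agriculture, data = swiss)

# Values of Agriculture to plug in (same ones used above)
new_data <- data.frame(Agriculture = c(10, 80))

# Predicted Examination for each value of Agriculture
preds <- predict(model, newdata = new_data)
preds
```

This should give the same values as the manual calculations, and it avoids the leftover "(Intercept)" label on the output.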

Graph of model with regression line

plot(Examination~Agriculture,swiss)
abline(model)