Regression - Discussion 11

Swiss Education vs Agriculture Regression

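I am using R's built-in swiss dataset, which records a standardized fertility measure and socioeconomic indicators for 47 French-speaking provinces of Switzerland at about 1888. I will focus on the relationship between education and agriculture. The first six rows are shown below.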
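As a minimal sketch, the preview can be reproduced as follows. It assumes only base R, since swiss ships with the datasets package that is loaded by default.

```r
# Load the built-in dataset (no extra packages required)
data(swiss)

# Show the first six provinces and all six variables
head(swiss)

# Check the size of the data: provinces in rows, indicators in columns
dim(swiss)
```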

##              Fertility Agriculture Examination Education Catholic
## Courtelary        80.2        17.0          15        12     9.96
## Delemont          83.1        45.1           6         9    84.84
## Franches-Mnt      92.5        39.7           5         5    93.40
## Moutier           85.8        36.5          12         7    33.77
## Neuveville        76.9        43.5          17        15     5.16
## Porrentruy        76.1        35.3           9         7    90.57
##              Infant.Mortality
## Courtelary               22.2
## Delemont                 22.2
## Franches-Mnt             20.2
## Moutier                  20.3
## Neuveville               20.6
## Porrentruy               26.6

Visualization

A scatter plot shows agriculture as a function of education. We take the log of education so that a straight line is a more reasonable fit for the linear regression.
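The plot itself is not reproduced here, so the following is only a sketch of how such a figure might be drawn with base graphics. The axis labels and the side-by-side layout are my own choices, not necessarily those of the original figure.

```r
data(swiss)

# Two panels: raw education on the left, log education on the right
par(mfrow = c(1, 2))

plot(swiss$Education, swiss$Agriculture,
     xlab = "Education", ylab = "Agriculture",
     main = "Raw scale")

plot(log(swiss$Education), swiss$Agriculture,
     xlab = "log(Education)", ylab = "Agriculture",
     main = "Log scale")

# Overlay the least-squares line on the log-scale panel
abline(lm(Agriculture ~ log(Education), data = swiss))
```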

Conditions

  • The Swiss provinces which we used to fit the line are independent.

  • The relationship between the log education and agriculture is linear.

  • The residuals from the regression line are nearly normal.

  • The variability of the agriculture observations around the regression line is roughly constant.
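The linearity, normality, and constant-variability conditions above are usually checked visually from the residuals. A minimal sketch, assuming the model is fit with data = swiss (the original call may have relied on attach() instead):

```r
data(swiss)
fit <- lm(Agriculture ~ log(Education), data = swiss)

par(mfrow = c(1, 2))

# Residuals vs fitted values: look for curvature (linearity)
# and for a fan shape (non-constant variability)
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normal Q-Q plot: points close to the line suggest
# nearly normal residuals
qqnorm(resid(fit))
qqline(resid(fit))
```

Independence cannot be read off these plots; it has to be argued from how the provinces were sampled.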

Model Quality

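  • This model has an intercept of 91.277 and a slope of -19.348, so the fitted line is Agriculture = 91.277 - 19.348 × log(Education). The negative slope means agriculture tends to decrease as education increases.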

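  • To determine the model quality, we look at the Multiple R-squared value. R-squared values closer to one indicate a better fit. Here R^2 = 0.4569, so log education explains about 46% of the variability in agriculture. The relationship is statistically significant, but the fit is moderate rather than strong.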
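The figures quoted above can be pulled straight from the fitted model rather than read off the printout. This is a sketch that assumes the model is fit with data = swiss; the summary output below shows the call without a data argument, so the original may have used attach().

```r
data(swiss)
fit <- lm(Agriculture ~ log(Education), data = swiss)

# Intercept and slope of the fitted line
coef(fit)

# Multiple and adjusted R-squared
summary(fit)$r.squared
summary(fit)$adj.r.squared

# Full regression summary
summary(fit)
```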

## 
## Call:
## lm(formula = Agriculture ~ log(Education))
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -37.181 -14.089  -0.326  13.974  28.291 
## 
## Coefficients:
##                Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      91.277      7.048  12.950  < 2e-16 ***
## log(Education)  -19.348      3.145  -6.152 1.85e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 16.92 on 45 degrees of freedom
## Multiple R-squared:  0.4569, Adjusted R-squared:  0.4448 
## F-statistic: 37.85 on 1 and 45 DF,  p-value: 1.854e-07
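
To see what the fitted line and the residual standard error of 16.92 mean in practice, one option is to predict agriculture for a province with a given education level. The value Education = 10 below is a hypothetical input chosen only for illustration, and new_province is a name I made up.

```r
data(swiss)
fit <- lm(Agriculture ~ log(Education), data = swiss)

# Hypothetical province with an Education value of 10
new_province <- data.frame(Education = 10)

# Point prediction with a 95% prediction interval
pred <- predict(fit, newdata = new_province, interval = "prediction")
pred

# The point prediction matches the fitted equation by hand
unname(coef(fit)[1] + coef(fit)[2] * log(10))
```

The width of the interval reflects how much individual provinces scatter around the line, which is consistent with the moderate R-squared.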