Example of Linear Regression and Residual Analysis, using the Boston data frame from the MASS package.
https://cran.r-project.org/web/packages/MASS/MASS.pdf
# Load the MASS package.
library(MASS)
# Use the Boston dataframe which contains housing data for the suburbs of Boston.
class(Boston)
## [1] "data.frame"
names(Boston)
## [1] "crim" "zn" "indus" "chas" "nox" "rm" "age"
## [8] "dis" "rad" "tax" "ptratio" "black" "lstat" "medv"
head(Boston)
## crim zn indus chas nox rm age dis rad tax ptratio black
## 1 0.00632 18 2.31 0 0.538 6.575 65.2 4.0900 1 296 15.3 396.90
## 2 0.02731 0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90
## 3 0.02729 0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83
## 4 0.03237 0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63
## 5 0.06905 0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90
## 6 0.02985 0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12
## lstat medv
## 1 4.98 24.0
## 2 9.14 21.6
## 3 4.03 34.7
## 4 2.94 33.4
## 5 5.33 36.2
## 6 5.21 28.7
# Is there a linear relationship between median value and crime rate per capita?
m1 <- lm(medv ~ crim, data = Boston)
summary(m1)
##
## Call:
## lm(formula = medv ~ crim, data = Boston)
##
## Residuals:
## Min 1Q Median 3Q Max
## -16.957 -5.449 -2.007 2.512 29.800
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 24.03311 0.40914 58.74 <2e-16 ***
## crim -0.41519 0.04389 -9.46 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.484 on 504 degrees of freedom
## Multiple R-squared: 0.1508, Adjusted R-squared: 0.1491
## F-statistic: 89.49 on 1 and 504 DF, p-value: < 2.2e-16
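From the above we see that the p-values for both coefficients are very low (< 2e-16), and each standard error is small compared to its estimate, so the negative association between crime rate and median value is statistically significant. The R-squared of 0.1508, however, shows that crim alone explains only about 15% of the variance in medv.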
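Low p-values alone do not say how well the line fits, so it can help to pull the fit statistics out of the model object directly. The following is a minimal sketch that extracts the R-squared and a 95% confidence interval for the coefficients; the names `fit`, `r2` and `ci` are introduced here for illustration and are not part of the original analysis.

```r
library(MASS)

# Refit the same model so this block runs on its own.
fit <- lm(medv ~ crim, data = Boston)

# Proportion of the variance in medv explained by crim.
r2 <- summary(fit)$r.squared

# 95% confidence intervals for the intercept and the slope.
ci <- confint(fit, level = 0.95)

print(round(r2, 4))
print(ci)
```

The R-squared printed here should match the "Multiple R-squared" value in the summary output above. If the interval for `crim` excludes zero, that agrees with the small p-value reported for the slope.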
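Plot the residuals to check whether they are approximately normally distributed.
# Density plot of residuals
plot(density(resid(m1)))
# Normal Q-Q plot of residuals, with a reference line
qqnorm(resid(m1))
qqline(resid(m1))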
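The above suggests that the linear model is not really appropriate, because the residuals do not follow a normal distribution. The residual summary already hints at this: the median is -2.007 while the maximum is 29.800 and the minimum is -16.957, which points to a right-skewed distribution with a long upper tail. Note that a Q-Q plot shows departures in the tails of the residuals; it does not by itself show which values of the predictor variable (Boston$crim, crime rate per capita) produce them.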
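To see where along the fitted line the large residuals occur, and to back the visual impression with a formal check, one option is a residuals-versus-fitted plot together with a Shapiro-Wilk test. The following is a sketch using base R's `shapiro.test`; the names `fit`, `res` and `sw` are illustrative and were not used in the original analysis.

```r
library(MASS)

# Refit the same model so this block runs on its own.
fit <- lm(medv ~ crim, data = Boston)
res <- resid(fit)

# Residuals against fitted values: for an adequate linear model the points
# should scatter without pattern around the horizontal line at zero.
plot(fitted(fit), res, xlab = "Fitted medv", ylab = "Residual")
abline(h = 0, lty = 2)

# Shapiro-Wilk test of normality: a small p-value is evidence
# against normally distributed residuals.
sw <- shapiro.test(res)
print(sw)
```

A small p-value from the test would support the conclusion drawn from the density and Q-Q plots. Because the slope for `crim` is negative, high crime rates correspond to low fitted values, so the left side of the residuals-versus-fitted plot is where to look for any pattern tied to large values of the predictor.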