In this notebook, we fit a simple linear regression model in R and
check its standard assumptions: linearity, normality of residuals,
homoscedasticity, and independence. We use the built-in mtcars dataset.
# Load necessary libraries and data
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.1
data(mtcars)
# Show the first few rows of the dataset
head(mtcars)
We will model mpg (miles per gallon) as a function of
wt (weight of the car).
# Fit the model
model <- lm(mpg ~ wt, data = mtcars)
# Summary of the model
summary(model)
##
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5432 -2.3647 -0.1252 1.4096 6.8727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 37.2851 1.8776 19.858 < 2e-16 ***
## wt -5.3445 0.5591 -9.559 1.29e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared: 0.7528, Adjusted R-squared: 0.7446
## F-statistic: 91.38 on 1 and 30 DF, p-value: 1.294e-10
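The estimates in the summary can also be pulled out of the model object instead of being read off the printed table. The following is a minimal sketch, assuming a hypothetical car with `wt = 3` (3,000 lb, since `wt` is recorded in units of 1000 lbs); the names `new_car` and `pred` are illustrative and not part of the original notebook.

```r
# Refit the model so this block runs on its own
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Point estimates and 95% confidence intervals for intercept and slope
coef(model)
confint(model, level = 0.95)

# Predicted mean mpg for a hypothetical car with wt = 3,
# together with a 95% confidence interval for that mean
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car, interval = "confidence")
pred
```

With the coefficients reported above, the point prediction is 37.2851 - 5.3445 × 3 ≈ 21.25 mpg.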
We check the linearity assumption by plotting the residuals vs. the fitted values.
# Plot residuals vs. fitted values
ggplot(data = model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "loess", col = "red") +
  labs(title = "Residuals vs Fitted", x = "Fitted values", y = "Residuals")
## `geom_smooth()` using formula = 'y ~ x'
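One way to put a number on any curvature seen in this plot is to add a squared term and compare the two nested models with an F-test. This is a sketch of one common diagnostic, not a step from the original workflow; `model_quad` and `cmp` are illustrative names.

```r
# Refit the model so this block runs on its own
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Add a quadratic term in wt; I() keeps ^ as arithmetic inside the formula
model_quad <- lm(mpg ~ wt + I(wt^2), data = mtcars)

# F-test of the straight-line model against the quadratic model
cmp <- anova(model, model_quad)
cmp
```

A small p-value in the second row of the table would suggest that a straight line misses some curvature in the relationship.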
We can check the normality of residuals using a Q-Q plot and a Shapiro-Wilk test.
# Q-Q plot
qqnorm(residuals(model))
qqline(residuals(model), col = "red")
# Shapiro-Wilk test
shapiro.test(residuals(model))
##
## Shapiro-Wilk normality test
##
## data: residuals(model)
## W = 0.94508, p-value = 0.1044
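The test result can also be used programmatically rather than read from the printout. A minimal sketch, assuming a significance level of 0.05 (the notebook does not state one); `sw`, `alpha`, and `normality_not_rejected` are illustrative names.

```r
# Refit the model so this block runs on its own
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# shapiro.test() returns a list with the W statistic and the p-value
sw <- shapiro.test(residuals(model))
alpha <- 0.05  # assumed significance level

# TRUE means the test does not reject normality at level alpha
normality_not_rejected <- sw$p.value > alpha
normality_not_rejected
```

With the p-value of 0.1044 shown above, the comparison is `TRUE`: the test gives no evidence against normality at the 5% level, which is not the same as proving the residuals are normal.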
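Homoscedasticity means that the residuals have constant variance. We check this with a scale-location plot, which shows the square root of the absolute standardized residuals against the fitted values; a roughly flat trend is consistent with constant variance.
# Scale-Location plot: sqrt(|standardized residuals|) vs fitted values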
plot(model, which = 3)
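The visual check can be complemented with a formal test. The sketch below hand-computes a studentized (Koenker) Breusch-Pagan statistic in base R: regress the squared residuals on the predictor and compare n × R² with a chi-squared distribution with one degree of freedom. This test is a swapped-in addition, not part of the original notebook, and the names `e2`, `aux`, `bp_stat`, and `bp_p` are illustrative. The `bptest()` function in the lmtest package provides a packaged version of this test.

```r
# Refit the model so this block runs on its own
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Auxiliary regression: squared residuals on the predictor
e2 <- residuals(model)^2
aux <- lm(e2 ~ wt, data = mtcars)

# Test statistic: n * R^2 of the auxiliary regression,
# compared with chi-squared on 1 df (one predictor)
n <- nobs(model)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

A small p-value would indicate that the residual variance changes with `wt`.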
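We can check for independence of residuals using the Durbin-Watson test, which looks for first-order autocorrelation between consecutive residuals, taken in the row order of the data.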
# Install the package if necessary
# install.packages("lmtest")
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.4.1
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.4.1
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
# Perform Durbin-Watson test
dwtest(model)
##
## Durbin-Watson test
##
## data: model
## DW = 1.2517, p-value = 0.0102
## alternative hypothesis: true autocorrelation is greater than 0
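The statistic itself is simple to compute by hand, which also makes its dependence on row order visible. A minimal sketch in base R; `dw`, `sorted`, and `dw_sorted` are illustrative names, and sorting by `wt` is only an arbitrary alternative ordering chosen for the demonstration.

```r
# Refit the model so this block runs on its own
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Durbin-Watson statistic: squared successive differences of the
# residuals divided by the residual sum of squares
e <- residuals(model)
dw <- sum(diff(e)^2) / sum(e^2)
dw

# Same model, same data, but with the rows sorted by weight
sorted <- mtcars[order(mtcars$wt), ]
e_sorted <- residuals(lm(mpg ~ wt, data = sorted))
dw_sorted <- sum(diff(e_sorted)^2) / sum(e_sorted^2)
dw_sorted
```

The first value should match the DW = 1.2517 reported by `dwtest()`. The second will generally differ even though the fitted coefficients are identical, because the statistic depends on which residuals are neighbours.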
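We fitted a simple linear regression of mpg on wt and ran basic assumption checks. The Shapiro-Wilk test (p = 0.1044) does not reject normality of the residuals at the 5% level, but the Durbin-Watson test (DW = 1.2517, p = 0.0102) does reject the hypothesis of no positive autocorrelation, so the checks do not by themselves establish that the model is valid. Because the Durbin-Watson statistic depends on the order of the rows, and the rows of mtcars are individual car models rather than a time series, that result should be interpreted with care. The workflow itself is easy to reproduce and adapt to other datasets.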