Markdown Syntax

Introduction

In this notebook, we demonstrate how to fit a simple linear regression model in R and perform the necessary assumption checks. We will use the built-in mtcars dataset.

# Load necessary libraries and data
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.4.1
data(mtcars)

# Show the first few rows of the dataset
head(mtcars)

Fitting the linear regression model

We will model mpg (miles per gallon) as a function of wt (weight of the car).

# Fit the model
model <- lm(mpg ~ wt, data = mtcars)

# Summary of the model
summary(model)
## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
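
To work with the fitted coefficients programmatically rather than reading them off the printed summary, a minimal sketch along the following lines may help. It uses only base R functions (coef(), confint(), predict()). The weight value of 3 (3,000 lbs, since wt is recorded in units of 1,000 lbs) is an arbitrary illustrative choice and is not part of the original analysis.

```r
# Refit the model so this block is self-contained
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Point estimates for the intercept and the slope
coefs <- coef(model)
print(coefs)

# 95% confidence intervals for both coefficients
ci <- confint(model, level = 0.95)
print(ci)

# Predicted mpg for a hypothetical car with wt = 3 (illustrative value only),
# together with a confidence interval for the mean response
new_car <- data.frame(wt = 3)
pred <- predict(model, newdata = new_car, interval = "confidence")
print(pred)
```

Read against the summary above, the slope of about -5.34 says that each additional 1,000 lbs of weight is associated with roughly 5.3 fewer miles per gallon on average.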

Assumption Checks

Linearity

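We check the linearity assumption by plotting the residuals vs. the fitted values. If the relationship is linear, the residuals should scatter randomly around zero, and the smoothed trend line should stay roughly flat with no systematic curve.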

# Plot residuals vs. fitted values
ggplot(data = model, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_smooth(method = "loess", col = "red") +
  labs(title = "Residuals vs Fitted", x = "Fitted values", y = "Residuals")
## `geom_smooth()` using formula = 'y ~ x'
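
The ggplot() call above passes the lm object directly as data and relies on ggplot2 converting it into a data frame with .fitted and .resid columns. As a sketch of an alternative that does not depend on that conversion, the diagnostic data frame can be built explicitly. The names diag_df, fitted, and resid are illustrative choices for this example, and the dashed zero line is an addition not present in the original plot.

```r
library(ggplot2)

# Refit the model so this block is self-contained
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Build the diagnostic data explicitly from the fitted model
diag_df <- data.frame(
  fitted = fitted(model),
  resid  = residuals(model)
)

# Residuals vs. fitted values with a loess trend and a zero reference line
p <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  geom_smooth(method = "loess", formula = y ~ x, colour = "red") +
  labs(title = "Residuals vs Fitted", x = "Fitted values", y = "Residuals")

print(p)
```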

Normality of Residuals

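We can check the normality of the residuals using a Q-Q plot and a Shapiro-Wilk test. Points that fall close to the reference line in the Q-Q plot, and a Shapiro-Wilk p-value above the chosen significance level, are both consistent with normally distributed residuals.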

# Q-Q plot
qqnorm(residuals(model))
qqline(residuals(model), col = "red")

# Shapiro-Wilk test
shapiro.test(residuals(model))
## 
##  Shapiro-Wilk normality test
## 
## data:  residuals(model)
## W = 0.94508, p-value = 0.1044

Homoscedasticity

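Homoscedasticity means that the residuals have constant variance across the range of fitted values. We check this with a Scale-Location plot, which shows the square root of the absolute standardized residuals against the fitted values. A roughly flat trend line with evenly spread points suggests constant variance.

# Scale-Location plot: sqrt(|standardized residuals|) vs. fitted values
plot(model, which = 3)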
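
The plot above is a visual check only. One common complement, which the original notebook does not use, is the Breusch-Pagan test available as bptest() in the lmtest package. The sketch below assumes lmtest is installed; its null hypothesis is constant residual variance, so a small p-value would point toward heteroscedasticity.

```r
# Assumes the lmtest package is installed
library(lmtest)

# Refit the model so this block is self-contained
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Breusch-Pagan test: null hypothesis is homoscedastic (constant-variance) errors
bp <- bptest(model)
print(bp)

# The p-value can also be pulled out for use in later code
bp_p_value <- bp$p.value
```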

Independence of Residuals

We can check for independence of residuals using the Durbin-Watson test, which tests for first-order autocorrelation between consecutive residuals taken in row order.

# Install the package if necessary
# install.packages("lmtest")
library(lmtest)
## Warning: package 'lmtest' was built under R version 4.4.1
## Loading required package: zoo
## Warning: package 'zoo' was built under R version 4.4.1
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
# Perform Durbin-Watson test
dwtest(model)
## 
##  Durbin-Watson test
## 
## data:  model
## DW = 1.2517, p-value = 0.0102
## alternative hypothesis: true autocorrelation is greater than 0
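
The DW statistic reported above is the sum of squared differences between consecutive residuals divided by the sum of squared residuals. Values near 2 indicate no first-order autocorrelation, and values below 2 point toward positive autocorrelation. As a minimal sketch, it can be recomputed by hand in base R; the variable names e and dw_manual are illustrative.

```r
# Refit the model so this block is self-contained
data(mtcars)
model <- lm(mpg ~ wt, data = mtcars)

# Residuals in the row order of the data
e <- residuals(model)

# Durbin-Watson statistic: sum of squared successive differences
# divided by the sum of squared residuals
dw_manual <- sum(diff(e)^2) / sum(e^2)
print(dw_manual)
```

Because the statistic is built from consecutive residuals, its value depends on the order of the rows in the data.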

Conclusion

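We have fitted a simple linear regression of mpg on wt and performed basic assumption checks. The Shapiro-Wilk test (p = 0.1044) gives no evidence against normally distributed residuals at the 5% level. The Durbin-Watson test (DW = 1.2517, p = 0.0102), however, indicates positive autocorrelation between consecutive residuals, so not every check passes cleanly. Because the rows of mtcars are individual car models rather than a time series, that result depends on the row order and should be interpreted with caution. This workflow can be easily reproduced and adapted to other datasets.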