Overview

In this tutorial, we’ll explore the basics of linear modeling using R. We’ll walk through the steps to perform a simple linear regression analysis, visualize the data, and interpret the results.

Setting up the Environment

Before we start, make sure you have R and RStudio installed. R is distributed through CRAN (the Comprehensive R Archive Network), and RStudio Desktop is available from Posit.

What is Linear Modeling?

Linear modeling is a statistical technique used to model the relationship between a dependent variable (response) and one or more independent variables (predictors) by fitting a linear equation.

Simple Linear Regression

Simple linear regression models the relationship between a single predictor and the response. The model equation is:

Y = β0 + β1X + ε


Where:

  • Y is the response variable

  • X is the predictor variable

  • β0 is the intercept

  • β1 is the slope

  • ε is the error term
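For a single predictor, the least-squares estimates of β0 and β1 have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the intercept follows from the two means. The sketch below is a minimal illustration using mpg and hp from the built-in mtcars dataset (the variables used later in this tutorial); the names x, y, b0 and b1 are ours, chosen for illustration.

```r
# Closed-form least-squares estimates for simple linear regression
x <- mtcars$hp   # predictor
y <- mtcars$mpg  # response

b1 <- cov(x, y) / var(x)       # slope estimate
b0 <- mean(y) - b1 * mean(x)   # intercept estimate

c(intercept = b0, slope = b1)

# lm() should return the same two values
coef(lm(mpg ~ hp, data = mtcars))
```

Computing the estimates by hand is rarely needed in practice, but it shows what lm() does for this model.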

Data Preparation

Let’s start by loading a sample dataset. We’ll use the mtcars dataset, which ships with R and contains information about various car models.

# Load the mtcars dataset
data(mtcars)

To take a peek at the first few rows of the dataset, use head():

head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

The dataset contains 32 observations on 11 (numeric) variables:

mpg    Miles/(US) gallon
cyl    Number of cylinders
disp   Displacement (cu.in.)
hp     Gross horsepower
drat   Rear axle ratio
wt     Weight (1000 lbs)
qsec   1/4 mile time
vs     Engine (0 = V-shaped, 1 = straight)
am     Transmission (0 = automatic, 1 = manual)
gear   Number of forward gears
carb   Number of carburetors

We’ll use mpg as the response variable and hp as the predictor for our simple linear regression.
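Before fitting anything, it can help to look at just these two variables. The sketch below is one possible way to do that; the object names vars and r are illustrative and not part of the original tutorial code.

```r
# Keep only the two variables used in the model
vars <- mtcars[, c("mpg", "hp")]

str(vars)              # both columns are numeric
summary(vars)          # minimum, quartiles, mean and maximum per column
colSums(is.na(vars))   # number of missing values per column

# Pearson correlation between the response and the predictor
r <- cor(vars$mpg, vars$hp)
r
```

A correlation well below zero suggests that a line with a negative slope is a reasonable first model for these data.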

Simple Linear Regression Analysis

Now, let’s perform a simple linear regression analysis to model the relationship between mpg and hp.

# Fit a simple linear regression model
model <- lm(mpg ~ hp, data = mtcars)

# Display the summary of the model
summary(model)
## 
## Call:
## lm(formula = mpg ~ hp, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.7121 -2.1122 -0.8854  1.5819  8.2360 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 30.09886    1.63392  18.421  < 2e-16 ***
## hp          -0.06823    0.01012  -6.742 1.79e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.863 on 30 degrees of freedom
## Multiple R-squared:  0.6024, Adjusted R-squared:  0.5892 
## F-statistic: 45.46 on 1 and 30 DF,  p-value: 1.788e-07

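The lm() function fits the model by ordinary least squares, and summary(model) reports the coefficient estimates with their standard errors and p-values, along with overall fit statistics such as R-squared.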
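The printed summary is convenient to read, but the individual numbers are often needed in later code. The sketch below shows one way to pull them out of the fitted object with standard accessor functions; the variable s is an illustrative name.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ hp, data = mtcars)

coef(model)                    # point estimates for intercept and slope
confint(model, level = 0.95)   # 95% confidence intervals for both

s <- summary(model)
s$coefficients                 # estimates, standard errors, t values, p-values
s$r.squared                    # multiple R-squared
s$sigma                        # residual standard error
```

If the confidence interval for hp excludes zero, that agrees with the small p-value shown in the summary output.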

Visualization

To visualize the relationship, we’ll create a scatter plot with the regression line, using ggplot2.

# Install ggplot2 once if needed, then load it
# install.packages("ggplot2")
library(ggplot2)

# Create a scatter plot with the fitted regression line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(title = "Scatter Plot of hp vs. mpg",
       x = "Horsepower",
       y = "Miles per Gallon")
## `geom_smooth()` using formula = 'y ~ x'

This scatter plot shows the relationship between hp and mpg, along with the fitted regression line. By default, geom_smooth() also draws a shaded 95% confidence band around the line.

The plot shows a clear negative relationship between the two variables: cars with more horsepower tend to have lower fuel efficiency.

Model Evaluation

We can evaluate how well the model fits by calculating the mean squared error (MSE) on the same data used to fit it.

# Calculate Mean Squared Error (MSE)
predicted <- predict(model)
actual <- mtcars$mpg
mse <- mean((actual - predicted)^2)
mse
## [1] 13.98982

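The lower the MSE, the better the model fits the data. Because MSE is in squared units of mpg, its square root is easier to read: here the root mean squared error (RMSE) is about 3.74, so a typical prediction misses by roughly 3.7 mpg.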
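The MSE is closely related to the residual standard error reported by summary(model). The sketch below computes both from the residuals, under the assumption that the model is the same mpg ~ hp fit as above; the names res, n, p, rmse and rse are illustrative.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ hp, data = mtcars)

res <- residuals(model)    # actual minus fitted values
n   <- length(res)         # number of observations
p   <- length(coef(model)) # number of estimated coefficients

mse  <- mean(res^2)                 # divides by n
rmse <- sqrt(mse)                   # back in units of mpg
rse  <- sqrt(sum(res^2) / (n - p))  # divides by the residual degrees of freedom

c(mse = mse, rmse = rmse, rse = rse, sigma = sigma(model))
```

The residual standard error is slightly larger than the RMSE because it divides by n - p rather than n, which accounts for the two estimated coefficients.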

Interpreting Results

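The summary of the regression model reports the estimated intercept and slope. The intercept (30.10) is the predicted mpg for a car with zero horsepower, which lies outside the observed data and has little practical meaning on its own. The slope (-0.068) means that each additional unit of horsepower is associated with a decrease of about 0.068 mpg, or roughly 6.8 mpg per 100 hp. The p-value for hp (1.79e-07) indicates that this association is statistically significant, and the multiple R-squared of 0.60 means that horsepower accounts for about 60% of the variation in mpg. These are associations in this sample, not evidence of a causal effect.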
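One practical way to read the coefficients is to turn them into predictions. The sketch below predicts mpg for three horsepower values that we chose purely for illustration (new_cars is a hypothetical data frame, not part of the original analysis) and checks the result against the model equation.

```r
# Refit the model so this block runs on its own
model <- lm(mpg ~ hp, data = mtcars)

# Hypothetical horsepower values, chosen for illustration
new_cars <- data.frame(hp = c(100, 150, 200))

# Predicted mpg with 95% confidence intervals for the mean response
pred <- predict(model, newdata = new_cars,
                interval = "confidence", level = 0.95)
cbind(new_cars, pred)

# The same point predictions from the equation: intercept + slope * hp
manual <- coef(model)[["(Intercept)"]] + coef(model)[["hp"]] * new_cars$hp
manual
```

The horsepower values here fall inside the range observed in mtcars; predictions far outside that range would be extrapolation and should be treated with caution.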

Conclusion

In this tutorial, we introduced linear modeling in R and performed a simple linear regression analysis using the mtcars dataset that comes with R. You can apply the same steps to model relationships between variables in other datasets.