Simple linear regression is a statistical method used to model the relationship between two quantitative variables.
The goal is to predict one variable using another.
2026-06-08
Regression helps us answer questions like:
Suppose we want to investigate whether the number of hours a student studies is related to their exam score.
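Our dataset contains observations on two variables for each student: the number of hours studied (the explanatory variable) and the exam score (the response variable).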
We will use simple linear regression to determine whether a linear relationship exists between these variables.
The scatterplot below shows the positive linear relationship between hours studied and exam score.
Simple linear regression models the relationship between two variables using a straight line.
The general form of the regression equation is
\[ \hat{y}=b_0+b_1x \]
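where \(\hat{y}\) is the predicted value of the response variable, \(b_0\) is the intercept, \(b_1\) is the slope, and \(x\) is the value of the explanatory variable.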
The regression line summarizes the linear relationship between hours studied and exam score. The positive slope indicates that exam scores tend to increase as study time increases.
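The presentation does not state how \(b_0\) and \(b_1\) are chosen. Assuming the usual least-squares criterion, which is what R's lm() uses, the estimates are computed from the sample means \(\bar{x}\) and \(\bar{y}\) of the \(n\) observations \((x_i, y_i)\):

\[ b_1 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i-\bar{x})^2}, \qquad b_0 = \bar{y} - b_1\bar{x} \]

The symbols \(n\), \(x_i\), \(y_i\), \(\bar{x}\), and \(\bar{y}\) are introduced here for illustration. The second formula shows that a least-squares line always passes through the point \((\bar{x}, \bar{y})\).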
Using our sample data, the fitted regression equation is:
\[ \widehat{\text{Score}} = 50.30 + 4.91(\text{Hours}) \]
This equation can be used to predict a student’s exam score based on the number of hours studied.
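As a worked example, take a hypothetical student who studies 6 hours (a value chosen for illustration, not taken from the dataset). Substituting into the fitted equation gives:

\[ \widehat{\text{Score}} = 50.30 + 4.91(6) = 50.30 + 29.46 = 79.76 \]

So the model predicts an exam score of about 79.8 for that student. Predictions of this kind are most trustworthy for study times inside the range observed in the sample.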
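Interpretation: the slope of 4.91 means that each additional hour of study is associated with a predicted increase of about 4.91 points in exam score. The intercept of 50.30 is the predicted score for a student who studies zero hours.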
A residual measures the difference between an observed value and its predicted value.
\[ e_i = y_i - \hat{y}_i \]
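where \(y_i\) is the observed value for observation \(i\) and \(\hat{y}_i\) is the value predicted for that observation by the regression line.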
Residuals help us evaluate how well the regression model fits the data.
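The residual formula can be checked directly in R. The sketch below uses a small made-up data frame (the values are hypothetical and are not the presentation's dataset, so the fitted coefficients will differ from 50.30 and 4.91) and compares residuals computed by hand with those returned by residuals().

```r
# Hypothetical data for illustration only (not the presentation's dataset)
study_data <- data.frame(
  Hours = c(1, 2, 3, 4, 5, 6, 7, 8),
  Score = c(55, 61, 64, 70, 76, 79, 85, 90)
)

model <- lm(Score ~ Hours, data = study_data)

# Predicted values (y-hat) from the fitted line
predicted <- fitted(model)

# Residuals by hand: observed minus predicted
manual_residuals <- study_data$Score - predicted

# residuals() returns the same values directly from the model
builtin_residuals <- residuals(model)

print(round(manual_residuals, 2))
```

Because the model includes an intercept, the least-squares residuals sum to zero (up to rounding error), which is a quick sanity check on the calculation.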
A residual plot helps determine whether a linear model is appropriate for the data. Residuals scattered randomly around zero, with no curve or funnel shape, suggest that it is; a clear pattern suggests that a straight line is missing part of the relationship.
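A residual plot of this kind could be drawn with ggplot2 roughly as follows. This is a minimal sketch using hypothetical data (not the presentation's dataset); the names resid_data and residual_plot are chosen here for illustration.

```r
library(ggplot2)

# Hypothetical data for illustration only (not the presentation's dataset)
study_data <- data.frame(
  Hours = c(1, 2, 3, 4, 5, 6, 7, 8),
  Score = c(55, 61, 64, 70, 76, 79, 85, 90)
)

model <- lm(Score ~ Hours, data = study_data)

# Pair each fitted value with its residual
resid_data <- data.frame(
  Fitted = fitted(model),
  Residual = residuals(model)
)

# Residuals against fitted values, with a dashed reference line at zero
residual_plot <- ggplot(resid_data, aes(x = Fitted, y = Residual)) +
  geom_point(size = 3) +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(
    title = "Residual Plot",
    x = "Fitted Exam Score",
    y = "Residual"
  )

residual_plot
```

Plotting residuals against fitted values rather than against Hours is a common convention; with a single predictor the two plots show the same pattern.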
Plotly allows users to interact with the graph by zooming, panning, and hovering over points. Interactive visualizations can make statistical relationships easier to explore and understand.
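One common way to get such an interactive graph in R is to build a ggplot2 figure and pass it to ggplotly() from the plotly package. The presentation may have built its plot differently; the sketch below is an assumption, again using hypothetical data rather than the presentation's dataset.

```r
library(ggplot2)
library(plotly)

# Hypothetical data for illustration only (not the presentation's dataset)
study_data <- data.frame(
  Hours = c(1, 2, 3, 4, 5, 6, 7, 8),
  Score = c(55, 61, 64, 70, 76, 79, 85, 90)
)

# Static scatterplot with the fitted regression line
p <- ggplot(study_data, aes(x = Hours, y = Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Hours Studied vs. Exam Score",
    x = "Hours Studied",
    y = "Exam Score"
  )

# Convert the static plot into an interactive plotly widget
interactive_plot <- ggplotly(p)

interactive_plot
```

Hovering over a point in the resulting widget shows its Hours and Score values, and the toolbar provides zoom and pan controls.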
The following R code fits a simple linear regression model and creates a regression plot. The lm() function fits the regression model, while ggplot2 is used to visualize the relationship between the variables.
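library(ggplot2)

# Fit the simple linear regression of Score on Hours
# (study_data is the data frame of hours studied and exam scores)
model <- lm(Score ~ Hours, data = study_data)

# Scatterplot with the fitted least-squares line overlaid
ggplot(study_data, aes(x = Hours, y = Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(
    title = "Regression Line",
    x = "Hours Studied",
    y = "Exam Score"
  )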
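Once a model has been fitted with lm(), its coefficients and predictions can be read off with coef() and predict(). The sketch below assumes a small hypothetical data frame (not the presentation's dataset, so the estimates will not equal 50.30 and 4.91) and a hypothetical new student who studies 6 hours.

```r
# Hypothetical data for illustration only (not the presentation's dataset)
study_data <- data.frame(
  Hours = c(1, 2, 3, 4, 5, 6, 7, 8),
  Score = c(55, 61, 64, 70, 76, 79, 85, 90)
)

model <- lm(Score ~ Hours, data = study_data)

# Estimated intercept (b0) and slope (b1)
coefs <- coef(model)
print(coefs)

# Predicted score for a hypothetical student who studies 6 hours
new_student <- data.frame(Hours = 6)
predicted_score <- predict(model, newdata = new_student)

# The same prediction computed directly from the equation b0 + b1 * x
by_hand <- coefs[["(Intercept)"]] + coefs[["Hours"]] * 6

print(predicted_score)
print(by_hand)
```

The two printed predictions agree, which confirms that predict() is simply evaluating the fitted regression equation at the new value of Hours.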
In this presentation, we explored simple linear regression and its applications.
Key takeaways: