2026-06-08

What is Simple Linear Regression?

Simple linear regression is a statistical method used to model the relationship between two quantitative variables.

The goal is to use one variable, the explanatory variable \(x\), to predict another, the response variable \(y\).

Why Use Regression?

Regression helps us answer questions like:

  • Do students who study more tend to earn higher exam scores?
  • How strong is the relationship between two variables?
  • Can we use one variable to predict another?

Example Data

Suppose we want to investigate whether the number of hours a student studies is related to their exam score.

Our dataset contains observations on:

  • Hours Studied
  • Exam Score

We will use simple linear regression to assess whether a linear relationship exists between hours studied and exam score.

Scatterplot of the Data

The scatterplot below shows the positive linear relationship between hours studied and exam score.

Regression Equation

Simple linear regression models the relationship between two variables using a straight line.

The general form of the regression equation is

\[ \hat{y}=b_0+b_1x \]

where:

  • \(\hat{y}\) = predicted value of the response variable
  • \(b_0\) = y-intercept
  • \(b_1\) = slope of the regression line
  • \(x\) = explanatory variable
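The equation does not say how \(b_0\) and \(b_1\) are chosen. R's lm() estimates them by ordinary least squares, which minimizes the sum of squared differences between observed and predicted values. The closed-form estimates are \(b_1 = \dfrac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\) and \(b_0 = \bar{y} - b_1\bar{x}\).

The sketch below computes these estimates by hand and checks them against lm(). It uses a small made-up dataset: the hours and score vectors are hypothetical and are not the presentation's data.

```r
# Hypothetical data for illustration only (not the presentation's dataset)
hours <- c(1, 2, 3, 4, 5)
score <- c(55, 61, 64, 70, 74)

# Least-squares slope: co-variation of x and y divided by the variation in x
b1 <- sum((hours - mean(hours)) * (score - mean(score))) /
  sum((hours - mean(hours))^2)

# Least-squares intercept: places the line through (mean(x), mean(y))
b0 <- mean(score) - b1 * mean(hours)

# For these made-up values, b1 = 4.7 and b0 = 50.7
print(c(b0 = b0, b1 = b1))

# lm() should give the same intercept and slope
fit <- lm(score ~ hours)
print(coef(fit))
```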

Regression Line

The regression line summarizes the linear relationship between hours studied and exam score. The positive slope indicates that exam scores tend to increase as study time increases.

Fitted Regression Model

Using our sample data, the fitted regression equation is:

\[ \widehat{\text{Score}} = 50.30 + 4.91(\text{Hours}) \]

This equation can be used to predict a student’s exam score based on the number of hours studied.
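As a quick check on the arithmetic, the function below evaluates the fitted equation for a few study times. The name predict_score is a hypothetical helper, not part of any package. The coefficients are the rounded values 50.30 and 4.91, so the predictions are only as precise as that rounding.

```r
# Hypothetical helper: evaluates Score-hat = 50.30 + 4.91 * Hours
predict_score <- function(hours, b0 = 50.30, b1 = 4.91) {
  b0 + b1 * hours
}

# Predicted exam scores for students who study 2, 5, and 8 hours
# (e.g. 5 hours: 50.30 + 4.91 * 5 = 74.85)
predict_score(c(2, 5, 8))
```

Predictions are most trustworthy for study times within the range of hours observed in the data. Plugging in values far outside that range is extrapolation.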

Interpretation:

  • The intercept (50.30) represents the predicted exam score when a student studies 0 hours. This interpretation is meaningful only if 0 hours lies within the range of the observed data.
  • The slope (4.91) indicates that each additional hour studied is associated with an increase of about 4.91 points in the predicted exam score.
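The slope interpretation follows directly from the fitted equation. When you compare the predictions for \(x + 1\) and \(x\) hours, the intercept cancels and only the slope remains:

\[ \widehat{\text{Score}}(x+1) - \widehat{\text{Score}}(x) = [50.30 + 4.91(x+1)] - [50.30 + 4.91x] = 4.91 \]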

Residuals

A residual measures the difference between an observed value and its predicted value.

\[ e_i = y_i - \hat{y}_i \]

where:

  • \(e_i\) = residual
  • \(y_i\) = observed value
  • \(\hat{y}_i\) = predicted value

Residuals help us evaluate how well the regression model fits the data.
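The residual formula translates directly to code. The sketch below computes \(e_i = y_i - \hat{y}_i\) by hand and checks the result against R's residuals(). It uses hypothetical hours and score vectors, not the presentation's data.

The sketch also shows a property of least-squares fits that include an intercept: the residuals sum to zero, up to floating-point rounding.

```r
# Hypothetical data for illustration only (not the presentation's dataset)
hours <- c(1, 2, 3, 4, 5)
score <- c(55, 61, 64, 70, 74)

fit <- lm(score ~ hours)

# Residual = observed value minus predicted (fitted) value
y_hat <- fitted(fit)
e <- score - y_hat

# Should match the residuals stored in the lm object
all.equal(unname(e), unname(residuals(fit)))

# With an intercept in the model, least-squares residuals sum to (about) zero
sum(e)
```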

Residual Plot

A residual plot helps determine whether a linear model is appropriate for the data. If the residuals are scattered randomly around zero, with no curved pattern and a roughly constant spread, a linear model is likely appropriate.
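A residual plot places the fitted values on the horizontal axis and the residuals on the vertical axis, with a reference line at zero. The sketch below draws one with base R graphics, so it runs without extra packages. The hours and score vectors are hypothetical, and five points are far too few to judge a pattern; the code only illustrates the mechanics.

```r
# Hypothetical data for illustration only (not the presentation's dataset)
hours <- c(1, 2, 3, 4, 5)
score <- c(55, 61, 64, 70, 74)

fit <- lm(score ~ hours)

# Residuals against fitted values, with a dashed reference line at zero
plot(fitted(fit), residuals(fit),
     xlab = "Fitted Exam Score", ylab = "Residual",
     main = "Residual Plot", pch = 19)
abline(h = 0, lty = 2)
```

R also provides a built-in version of this plot through plot(fit, which = 1), which draws residuals against fitted values for an lm object.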

Interactive Regression Plot

Plotly allows users to interact with the graph by zooming, panning, and hovering over points. Interactive visualizations can make statistical relationships easier to explore and understand.

R Code Example

The following R code fits a simple linear regression model and creates a regression plot. The lm() function fits the regression model, while ggplot2 is used to visualize the relationship between the variables.

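# Load ggplot2 for plotting
library(ggplot2)

# Fit the regression of exam score on hours studied
# (study_data is assumed to be a data frame with Hours and Score columns)
model <- lm(Score ~ Hours, data = study_data)
summary(model)  # coefficients, R-squared, and p-values

# Scatterplot with the least-squares regression line overlaid
ggplot(study_data, aes(x = Hours, y = Score)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(
    title = "Regression Line",
    x = "Hours Studied",
    y = "Exam Score"
  )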
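Once model is fitted, you can extract its coefficients and use it for predictions. The study_data dataset is not included here, so the sketch builds a small hypothetical stand-in with the same column names (Hours, Score). Its values are made up, so the coefficients will not match 50.30 and 4.91.

```r
# Hypothetical stand-in for study_data (made-up values, same column names)
study_data <- data.frame(
  Hours = c(1, 2, 3, 4, 5),
  Score = c(55, 61, 64, 70, 74)
)

model <- lm(Score ~ Hours, data = study_data)

# Estimated intercept (b0) and slope (b1)
coef(model)

# Predicted scores for new students; newdata must use the same column name
new_students <- data.frame(Hours = c(2.5, 4.5))
predict(model, newdata = new_students)

# Proportion of the variation in Score explained by Hours
summary(model)$r.squared
```

The new study times are kept within the observed range of 1 to 5 hours to avoid extrapolating.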

Conclusion

In this presentation, we explored simple linear regression and its applications.

Key takeaways:

  • Simple linear regression models the relationship between two quantitative variables.
  • Scatterplots help visualize relationships in the data.
  • The fitted regression line can be used to make predictions.
  • Residuals help evaluate how well the model fits the data.
  • Regression is a powerful tool for understanding and predicting real-world outcomes.