This presentation explores Simple Linear Regression, a foundational tool in statistics used to examine the relationship between two variables. We will learn what it is, how it works, and see some visual and practical examples.
2025-06-08
Simple linear regression is a statistical method that allows us to model and analyze the relationship between two continuous variables.
We try to fit a straight line that best describes how one variable (the dependent variable) changes in response to another (the independent variable).
The general form of the simple linear regression model is:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
Where:
\(y\) is the dependent variable (what we want to predict),
\(x\) is the independent variable (the predictor),
\(\beta_0\) is the intercept,
\(\beta_1\) is the slope,
\(\varepsilon\) is the error term (the difference between the actual value and the value given by the line).
In simple linear regression, the model is:
\[ y = \beta_0 + \beta_1 x + \varepsilon \]
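Each component of the equation has a specific meaning: the intercept \(\beta_0\) is the expected value of \(y\) when \(x = 0\), the slope \(\beta_1\) is the expected change in \(y\) for a one-unit increase in \(x\), and the error term \(\varepsilon\) captures the variation in \(y\) that the line does not explain.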
This equation assumes a linear relationship between the variables, where the slope and intercept describe the line of best fit through the data.
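To make the roles of the terms concrete, the following sketch simulates data from the model and then fits a line to it. The parameter values (intercept 50, slope 2, error standard deviation 5), the sample size, and all variable names are arbitrary assumptions chosen for illustration; they do not come from this presentation.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon.
# All numeric values below are illustrative assumptions.
set.seed(42)                             # make the simulation reproducible

n     <- 100                             # number of observations
beta0 <- 50                              # true intercept
beta1 <- 2                               # true slope

x       <- runif(n, min = 0, max = 10)   # independent variable
epsilon <- rnorm(n, mean = 0, sd = 5)    # random error term
y       <- beta0 + beta1 * x + epsilon   # dependent variable

# Fitting a line to the simulated data should give estimates close to
# beta0 and beta1, but not identical, because of the random error.
sim_fit <- lm(y ~ x)
print(coef(sim_fit))
```

Increasing `sd` in the sketch makes the points scatter more widely around the line, which is one way to see what the error term contributes.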
We begin by visualizing the relationship between two variables using a scatter plot.
Below is a scatter plot of Height vs. Weight for a small sample of individuals.
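A scatter plot of this kind could be produced along the following lines. The presentation's own `sample_data` is not shown, so this sketch assumes R's built-in `women` data set (height in inches and weight in pounds for 15 women) as a stand-in; the plot title is also an assumption.

```r
library(ggplot2)

# Assumption: the built-in `women` data set stands in for the
# presentation's sample_data, which is not shown in the slides.
sample_data <- women

# One point per individual: height on the x-axis, weight on the y-axis
scatter <- ggplot(sample_data, aes(x = height, y = weight)) +
  geom_point(color = "steelblue", size = 3) +
  labs(x = "Height (inches)", y = "Weight (pounds)",
       title = "Height vs. Weight") +
  theme_minimal(base_size = 10)

print(scatter)
```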
Now we fit a simple linear regression model to our data.
The goal is to find the line that best describes the relationship between height and weight.
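As a minimal sketch of the fitting step, the model can be fitted with `lm()` and its estimates inspected. The data here is an assumption: R's built-in `women` data set is used in place of the presentation's `sample_data`, which is not shown.

```r
# Assumption: the built-in `women` data set stands in for sample_data.
sample_data <- women

# Fit weight as a linear function of height
model <- lm(weight ~ height, data = sample_data)

# Estimated intercept and slope of the fitted line
estimates <- coef(model)
print(estimates)

# Fitted values lie on the line; residuals are the vertical distances
# from each observed weight to the line.
fitted_weights  <- fitted(model)
model_residuals <- residuals(model)

# Fitted value plus residual reproduces each observed weight
print(all.equal(unname(fitted_weights + model_residuals),
                sample_data$weight))
```

For this stand-in data set the estimated slope is 3.45, meaning each additional inch of height is associated with about 3.45 more pounds of weight.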
Here’s a 3D plot showing how weight varies with height and age. This gives us an idea of how multiple predictors can influence an outcome.
Below is the R code used to create the linear regression plot showing how weight changes with height.
# Fit the linear model
model <- lm(weight ~ height, data = sample_data)

# Create the plot with regression line
library(ggplot2)
ggplot(sample_data, aes(x = height, y = weight)) +
  geom_point(color = "steelblue", size = 3) +
  # linewidth replaces the deprecated size argument for lines (ggplot2 >= 3.4.0)
  geom_smooth(method = "lm", se = FALSE, color = "darkred", linewidth = 1.2) +
  labs(x = "Height (inches)", y = "Weight (pounds)", title = "Fitted Linear Regression Line") +
  theme_minimal(base_size = 10)
To find the best-fitting regression line, we estimate the coefficients \(\beta_0\) and \(\beta_1\) by minimizing the sum of squared errors (SSE).
This means we minimize the following:
\[ SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} (y_i - (\beta_0 + \beta_1 x_i))^2 \]
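The SSE can be computed directly for any candidate line, which makes the idea of "minimizing" tangible. The sketch below assumes the built-in `women` data set as a stand-in for `sample_data`; the helper `sse()` and the candidate values `-100` and `3.6` are hypothetical, chosen only for comparison.

```r
# Assumption: the built-in `women` data set stands in for sample_data.
sample_data <- women
x <- sample_data$height
y <- sample_data$weight

# Hypothetical helper: SSE for a candidate intercept b0 and slope b1
sse <- function(b0, b1, x, y) {
  y_hat <- b0 + b1 * x      # predicted values for this candidate line
  sum((y - y_hat)^2)        # sum of squared differences
}

# SSE of an arbitrary candidate line
sse_guess <- sse(b0 = -100, b1 = 3.6, x = x, y = y)

# SSE of the least-squares line found by lm()
fit    <- lm(weight ~ height, data = sample_data)
sse_ls <- sse(coef(fit)[["(Intercept)"]], coef(fit)[["height"]], x, y)

# No other line can have a smaller SSE than the least-squares line
print(c(guess = sse_guess, least_squares = sse_ls))
```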
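Setting the partial derivatives of SSE with respect to \(\beta_0\) and \(\beta_1\) to zero and solving gives the least-squares estimates:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}, \quad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
These are the estimated slope and intercept of the regression line, where \(\bar{x}\) and \(\bar{y}\) are the sample means of \(x\) and \(y\).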
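The formulas above can be checked by computing them by hand and comparing the result with `lm()`. This sketch again assumes the built-in `women` data set in place of `sample_data`; the variable names are illustrative.

```r
# Assumption: the built-in `women` data set stands in for sample_data.
sample_data <- women
x <- sample_data$height
y <- sample_data$weight

# Sample means
x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations over sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the line through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

# lm() should agree with the hand-computed values
fit <- lm(weight ~ height, data = sample_data)
print(c(intercept = beta0_hat, slope = beta1_hat))
print(all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat)))
```

The intercept formula also shows that the fitted line always passes through the point of means \((\bar{x}, \bar{y})\).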
We also saw how to fit the model in R using lm() and to visualize it with ggplot2 or plotly.