October 18, 2024
Simple linear regression is a statistical method that allows us to model the relationship between two variables by fitting a linear equation to observed data.
Real-world applications:
- Predicting fuel efficiency based on horsepower.
- Estimating house prices based on square footage.
The goal is to find a linear equation that best predicts \(y\) based on \(x\).
The simple linear regression equation is:
\[ y = \beta_0 + \beta_1x + \epsilon \]
Where:
- \(y\) is the dependent variable (response variable).
- \(x\) is the independent variable (predictor variable).
- \(\beta_0\) is the intercept (the expected value of \(y\) when \(x = 0\)).
- \(\beta_1\) is the slope of the line (the expected change in \(y\) for a one-unit increase in \(x\)).
- \(\epsilon\) is the error term, representing the part of \(y\) that the line does not explain (the difference between an observed value and the value given by the line).
Our goal is to estimate \(\beta_0\) and \(\beta_1\) using data.
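To make the roles of \(\beta_0\), \(\beta_1\), and \(\epsilon\) concrete, here is a minimal sketch that simulates data from the model and then estimates the parameters back. The true values (2 and 0.5), the sample size, the noise level, and the seed are all made-up assumptions chosen for illustration.

```r
# Simulate data from y = beta_0 + beta_1 * x + epsilon with known (made-up) parameters
set.seed(42)                            # arbitrary seed for reproducibility
n      <- 200                           # hypothetical sample size
beta_0 <- 2                             # hypothetical true intercept
beta_1 <- 0.5                           # hypothetical true slope

x       <- runif(n, min = 0, max = 10)  # predictor values
epsilon <- rnorm(n, mean = 0, sd = 1)   # random error term
y       <- beta_0 + beta_1 * x + epsilon

# Estimate the parameters from the simulated data
sim_fit <- lm(y ~ x)
coef(sim_fit)                           # estimates should land near 2 and 0.5
```

Because the data contain random error, the estimates will be close to, but not exactly equal to, the true values.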
For this simple linear regression analysis, we use the built-in R dataset mtcars.
- mpg (Miles per Gallon) – The fuel efficiency of a car.
- hp (Horsepower) – The power output of the car.
This dataset contains observations from 32 car models, with variables such as:
- Miles per Gallon (mpg)
- Horsepower (hp)
- Number of cylinders (cyl)
- Weight of the car (wt)
Our goal is to use horsepower to predict miles per gallon using a simple linear regression model.
library(ggplot2)

# Plot a scatterplot of mpg against hp and overlay the fitted regression line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = "blue") +
  geom_smooth(method = "lm", color = "red") +
  labs(title = "Simple Linear Regression: MPG vs Horsepower",
       x = "Horsepower (hp)",
       y = "Miles per Gallon (mpg)")
The regression model quantifies the relationship between horsepower (hp) and miles per gallon (mpg).
From our regression model:
- Slope: The slope is negative (about -0.068), so each additional unit of horsepower is associated with a decrease of roughly 0.07 mpg.
- \(R^2\): The value of \(R^2\) (about 0.60) is the proportion of the variance in mpg that horsepower explains.
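The slope and \(R^2\) discussed here can be read directly from a fitted model. The sketch below uses base R's `lm()`; the object names `fit` and `new_car`, and the 150 hp car used for prediction, are illustrative assumptions.

```r
# Fit mpg as a linear function of hp using the built-in mtcars data
fit <- lm(mpg ~ hp, data = mtcars)

coef(fit)                # named vector: "(Intercept)" and "hp" (the slope)
summary(fit)$r.squared   # proportion of variance in mpg explained by hp

# Predict mpg for a hypothetical car with 150 horsepower
new_car <- data.frame(hp = 150)
predict(fit, newdata = new_car)
```

Calling `summary(fit)` prints the full coefficient table, including standard errors and p-values.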
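A residual plot helps evaluate the fit of the regression model by plotting the residuals (the differences between the observed and predicted values) against the fitted values. Residuals scattered randomly around zero, with no clear pattern, suggest that a straight line is a reasonable description of the data.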
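One way to draw such a residual plot is sketched below, using the same ggplot2 style as the scatterplot. The names `diagnostics` and `residual_plot` are illustrative assumptions, and the model is refit here so the block runs on its own.

```r
library(ggplot2)

fit <- lm(mpg ~ hp, data = mtcars)

# Collect fitted values and residuals (observed minus fitted) in one data frame
diagnostics <- data.frame(
  fitted   = fitted(fit),
  residual = resid(fit)
)

# Residuals vs fitted values, with a dashed reference line at zero
residual_plot <- ggplot(diagnostics, aes(x = fitted, y = residual)) +
  geom_point(color = "blue") +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  labs(title = "Residuals vs Fitted Values",
       x = "Fitted mpg",
       y = "Residual")
residual_plot
```

A curved or funnel-shaped pattern in this plot would hint that the linear model is missing some structure in the data.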
The least-squares formulas for estimating the slope (\(\beta_1\)) and intercept (\(\beta_0\)) in simple linear regression are:
\[ \beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ \beta_0 = \bar{y} - \beta_1 \bar{x} \]
Where:
- \(x_i\) and \(y_i\) are the individual data points.
- \(\bar{x}\) and \(\bar{y}\) are the means of the independent and dependent variables, respectively.
- \(\beta_1\) represents the slope of the regression line, indicating the change in \(y\) for a one-unit change in \(x\).
- \(\beta_0\) represents the intercept, which is the expected value of \(y\) when \(x = 0\).
These formulas show how the slope and intercept are derived from the data: the slope is the co-variation of \(x\) and \(y\) scaled by the variation in \(x\), and the intercept is chosen so that the fitted line passes through the point \((\bar{x}, \bar{y})\).
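As a check on the formulas above, the sketch below applies them by hand to the mtcars data and compares the result with R's built-in `lm()`. The variable names are illustrative assumptions.

```r
x <- mtcars$hp
y <- mtcars$mpg

# Slope: sum of cross-deviations divided by sum of squared deviations in x
beta_1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: chosen so the line passes through (mean(x), mean(y))
beta_0 <- mean(y) - beta_1 * mean(x)

c(intercept = beta_0, slope = beta_1)

# Cross-check against R's built-in least-squares fit
lm_coefs <- unname(coef(lm(mpg ~ hp, data = mtcars)))
all.equal(lm_coefs, c(beta_0, beta_1))
```

The two approaches should agree up to floating-point rounding, since `lm()` solves the same least-squares problem.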
In this presentation, we explored the basics of simple linear regression, a powerful tool for understanding relationships between variables.
Simple linear regression is widely used in various fields, from predicting fuel efficiency to estimating real estate prices. Understanding the basics of this method can help in building more complex models and interpreting real-world data.