2024-10-19

Simple Linear Regression

Understanding the Basics and Applications

Simple linear regression is a fundamental statistical technique for examining the relationship between two variables. It assumes that the relationship between the dependent variable (the response) and the independent variable (the predictor) can be described by a straight line, up to random error.

Introduction to Simple Linear Regression

Simple linear regression is a statistical method used to model the relationship between a dependent variable and one independent variable.

The fitted model shows how the dependent variable changes, on average, as the independent variable changes.

The Regression Equation

The simple linear regression model is defined by the equation:

\[ Y = \beta_0 + \beta_1X + \epsilon \]

where:

  • \(Y\) is the dependent variable,
  • \(\beta_0\) is the y-intercept,
  • \(\beta_1\) is the slope of the line,
  • \(X\) is the independent variable, and
  • \(\epsilon\) is the error term.
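As a minimal sketch of how the coefficients in this equation are estimated, the code below computes the ordinary least-squares estimates, \(\hat\beta_1 = \sum (x_i - \bar{x})(y_i - \bar{y}) / \sum (x_i - \bar{x})^2\) and \(\hat\beta_0 = \bar{y} - \hat\beta_1 \bar{x}\), by hand and compares them with `lm()`. The data are simulated with a true intercept of 3 and a true slope of 2; the names `b0`, `b1`, and `fit` are chosen here for illustration only.

```r
# Simulated data: true intercept 3, true slope 2, standard normal noise
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 2 * x + rnorm(100)

# Closed-form least-squares estimates of the slope and intercept
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# The same estimates from R's built-in linear model function
fit <- lm(y ~ x)

print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both approaches should give the same numbers up to floating-point error, and the estimates should land near, but not exactly on, the true values of 3 and 2.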

Assumptions of Linear Regression

  • Linearity: The relationship between \(X\) and \(Y\) is linear, as shown by the model:

\[ Y = \beta_0 + \beta_1X + \epsilon \]

  • Independence: The errors \(\epsilon\) are independent of one another.
  • Homoscedasticity: The errors have constant variance, \(\text{Var}(\epsilon) = \sigma^2\).
  • Normality: The errors are normally distributed, \(\epsilon \sim \mathcal{N}(0, \sigma^2)\).
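In practice these assumptions are usually checked on the residuals of a fitted model, since the error term itself is not observed. The sketch below shows one common set of checks on simulated data, using only base R; it is one possible diagnostic routine rather than a complete one, and it does not test independence, which depends on how the data were collected.

```r
# Simulated data and fitted model (values chosen for illustration)
set.seed(42)
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 2 * x + rnorm(100)
fit <- lm(y ~ x)
res <- resid(fit)

# Linearity and homoscedasticity: residuals against fitted values should
# scatter evenly around zero, with no curve or funnel shape
plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points on the Q-Q plot should fall close to the reference line
qqnorm(res)
qqline(res)

# Shapiro-Wilk test; a small p-value suggests departure from normality
print(shapiro.test(res))
```

Graphical checks are generally more informative than the formal test alone, because with large samples the test can flag departures that are too small to matter.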

Example of Simple Linear Regression

A common example is predicting house prices from square footage.

As the area increases, the price is expected to increase as well. This can be modeled using simple linear regression.
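The house-price example can be sketched as follows. The data are simulated, not real: the variable names `sqft` and `price`, the base price of 50,000, the 150 per square foot, and the noise level are all invented purely for illustration.

```r
# Hypothetical, simulated housing data (not real market figures)
set.seed(1)
sqft  <- runif(50, min = 800, max = 3500)
price <- 50000 + 150 * sqft + rnorm(50, sd = 20000)
houses <- data.frame(sqft = sqft, price = price)

# Fit price as a linear function of square footage
fit <- lm(price ~ sqft, data = houses)
print(coef(fit))  # estimated intercept and price per extra square foot

# Predicted price for a 2000 square foot house
pred <- predict(fit, newdata = data.frame(sqft = 2000))
print(pred)
```

The slope estimate is read as the expected change in price for each additional square foot, and `predict()` evaluates the fitted line at a new value of the predictor.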

Plotting the Regression Line

[Figure: Scatterplot with Regression Line]

R Code for Scatterplot with Regression Line

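library(ggplot2)

# Simulate 100 observations from y = 3 + 2x + noise
set.seed(42)  # fix the seed so the plot is reproducible
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 2 * x + rnorm(100)
data <- data.frame(x = x, y = y)

# Scatterplot with the fitted least-squares line and its confidence band
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", color = "blue") +
  ggtitle("Scatterplot with Regression Line")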
## `geom_smooth()` using formula = 'y ~ x'

Plotly 3D Scatter Plot

Conclusion

Simple linear regression is a simple, interpretable tool for quantifying the linear relationship between a response and a single predictor.

It finds applications in various fields such as economics, biology, and engineering.