Introduction to Simple Linear Regression

  • Definition: A method to model the relationship between a dependent variable and one independent variable.
  • Purpose: To predict the value of the dependent variable based on the independent variable.

Key Concepts

  • Dependent Variable (Y): The outcome variable.
  • Independent Variable (X): The predictor variable.
  • Slope (β₁): The expected change in Y for a one-unit increase in X.
  • Intercept (β₀): The expected value of Y when X is zero.

Mathematical Formulation

The simple linear regression model is given by:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

Where:

  • \(Y\) is the dependent variable.
  • \(X\) is the independent variable.
  • \(\beta_0\) is the intercept.
  • \(\beta_1\) is the slope.
  • \(\epsilon\) is the error term.
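To make the roles of \(\beta_0\), \(\beta_1\), and \(\epsilon\) concrete, the sketch below estimates them by hand. It assumes ordinary least squares, which the slides do not name explicitly, and uses the built-in mtcars data with hp as X and mpg as Y. The variable names are illustrative only.

```r
# Ordinary least squares estimates computed by hand (assumed estimation method).
x <- mtcars$hp   # independent variable X
y <- mtcars$mpg  # dependent variable Y

# Slope: sample covariance of X and Y divided by the sample variance of X.
beta1_hat <- cov(x, y) / var(x)

# Intercept: chosen so the fitted line passes through the point of means.
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Residuals: the sample counterpart of the error term epsilon.
residuals_hat <- y - (beta0_hat + beta1_hat * x)

c(intercept = beta0_hat, slope = beta1_hat)
```

These values should agree with what lm(mpg ~ hp, data = mtcars) reports, and the residuals should sum to zero up to rounding error.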

Assumptions of Simple Linear Regression

  1. Linearity: The relationship between X and Y is linear.
  2. Independence: Observations are independent of each other.
  3. Homoscedasticity: Constant variance of errors.
  4. Normality: Errors are normally distributed.
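A few of these assumptions can be examined from the residuals of a fitted model. The sketch below is one possible set of checks in base R, assuming the mpg ~ hp model on mtcars. Independence usually depends on how the data were collected and is not tested here.

```r
# Fit the model whose assumptions we want to inspect.
fit <- lm(mpg ~ hp, data = mtcars)

# Linearity and homoscedasticity: residuals against fitted values should show
# no systematic curve and a roughly constant vertical spread.
plot(fitted(fit), resid(fit),
     xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)

# Normality: points on a normal Q-Q plot should lie close to the reference line.
qqnorm(resid(fit))
qqline(resid(fit))

# Shapiro-Wilk test as a formal normality check on the residuals.
normality_check <- shapiro.test(resid(fit))
normality_check
```

Plots are usually more informative than a single test, especially with a sample as small as this one.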

Example Dataset

We’ll use the built-in mtcars dataset to explore the relationship between Horsepower (hp) and Miles Per Gallon (mpg).
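Before modelling, it can help to look at the two columns directly. The following is a minimal sketch; the object name cars_sub is an illustrative choice, not part of the original slides.

```r
# mtcars ships with base R (datasets package), so no download is needed.
data(mtcars)

# Keep only the two variables of interest.
cars_sub <- mtcars[, c("mpg", "hp")]

head(cars_sub)     # first few rows
summary(cars_sub)  # ranges and quartiles of each variable

# Sample correlation: the sign indicates the direction of the linear association.
hp_mpg_cor <- cor(cars_sub$hp, cars_sub$mpg)
hp_mpg_cor
```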

Data Visualization with ggplot2
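A scatter plot of the raw data is a natural first view of the relationship. The sketch below is one way to draw it with ggplot2; the title and styling are assumptions and may differ from the original figure.

```r
library(ggplot2)

# Scatter plot of mpg against hp, without any fitted line yet.
scatter <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "MPG vs. Horsepower",
       x = "Horsepower (hp)",
       y = "Miles Per Gallon (mpg)") +
  theme_minimal()

print(scatter)
```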

Fitting the Regression Model

We fit a simple linear regression model to predict MPG based on HP.
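A fit of this kind can be obtained with base R's lm(). The sketch below assumes the formula mpg ~ hp on mtcars; the object names fit and coef_table are illustrative.

```r
# Fit mpg as a linear function of hp by ordinary least squares.
fit <- lm(mpg ~ hp, data = mtcars)

# Coefficient table: estimate, standard error, t value, and p-value per term.
coef_table <- summary(fit)$coefficients
coef_table
```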

Regression Model Coefficients
              Estimate     Std. Error   t value     Pr(>|t|)
(Intercept)   30.0988605   1.6339210    18.421246   0e+00
hp            -0.0682283   0.0101193    -6.742388   2e-07

The p-value shown as 0e+00 is not exactly zero; it is smaller than the display precision.

Regression Line with ggplot2

Interactive 3D Plot with Plotly

Mathematical Interpretation

The estimated regression equation from our model is:

\[ \hat{Y} = 30.09886 - 0.06823 \times \text{HP} \]

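  • Interpretation: For each additional unit of horsepower, the predicted MPG decreases by approximately 0.068 on average. The intercept, about 30.1, is the predicted MPG at zero horsepower, which lies outside the observed data and serves only to position the line.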
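The estimated equation can be used to predict MPG for a given horsepower. The sketch below compares predict() with the equation applied by hand; the horsepower values 100, 150, and 200 are arbitrary examples chosen for illustration.

```r
# Refit the model so this example is self-contained.
fit <- lm(mpg ~ hp, data = mtcars)

# Hypothetical horsepower values to predict for.
new_cars <- data.frame(hp = c(100, 150, 200))

# Predictions from the fitted model object.
predicted_mpg <- predict(fit, newdata = new_cars)

# The same predictions from the regression equation: intercept + slope * hp.
manual_mpg <- coef(fit)[["(Intercept)"]] + coef(fit)[["hp"]] * new_cars$hp

# Both approaches should agree up to floating-point error.
all.equal(unname(predicted_mpg), manual_mpg)
```

Predictions are most trustworthy inside the range of horsepower values observed in the data.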

R Code Example

Below is the R code used to create the regression line plot with ggplot2.

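library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

# Scatter plot of mpg against hp with the fitted least-squares line
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point(color = 'darkgreen') +
  # method = 'lm' draws the regression line; se = TRUE adds a 95% confidence band
  geom_smooth(method = 'lm', formula = y ~ x, se = TRUE, color = 'red') +
  labs(title = "Linear Regression of MPG on HP",
       x = "Horsepower (hp)",
       y = "Miles Per Gallon (mpg)") +
  theme_minimal()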

Conclusion

  • Simple Linear Regression is a foundational statistical tool for understanding relationships between variables.
  • Always check the underlying assumptions to ensure the validity of the model.
  • Visualization plays a crucial role in interpreting and communicating the results.

References

  • Books:
    • “An Introduction to Statistical Learning” by James, Witten, Hastie, and Tibshirani.
    • “Applied Linear Regression” by Sanford Weisberg.
  • Online Resources: