2024-11-01

Simple Linear Regression

What is Linear Regression?

  • Linear regression is a statistical method for modeling the relationship between a dependent variable (Y) and an independent variable (X).
  • Simple Linear Regression: 1 independent variable.
  • Multiple Linear Regression: multiple independent variables.
  • The ultimate goal is to predict the variable \(Y\).

When is Simple Linear Regression Applicable?

  1. You need to understand the linear relationship between two continuous variables.
  2. The relationship appears linear.
  3. There is one predictor \(X\).

Assumptions of Linear Regression:

  1. The relationship must be linear.
  2. Observations must be independent.
  3. Homoscedasticity: the errors have a similar (constant) variance across all values of \(X\).
  4. Residuals are normally distributed.
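
A minimal sketch of how these assumptions might be checked in R. The data are simulated purely for illustration (the column names `Advertising` and `Sales` and all numbers are assumptions, not values from the example in this deck). Independence usually has to be judged from how the data were collected, so only the other three assumptions are examined here.

```r
# Hypothetical data: simulated for illustration only.
set.seed(42)
advertising <- runif(50, min = 20, max = 200)
sales <- -10 + 0.5 * advertising + rnorm(50, sd = 8)
data <- data.frame(Advertising = advertising, Sales = sales)

model <- lm(Sales ~ Advertising, data = data)
res <- residuals(model)

# Linearity and homoscedasticity: residuals vs. fitted values should show
# no curve and no funnel shape, just an even band around zero.
plot(fitted(model), res,
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs. Fitted")
abline(h = 0, lty = 2)

# Normality of residuals: points should follow the reference line.
qqnorm(res)
qqline(res)

# Shapiro-Wilk test: a small p-value suggests non-normal residuals.
shapiro_result <- shapiro.test(res)
print(shapiro_result$p.value)
```

The plots are usually more informative than the test alone, since with large samples the Shapiro-Wilk test can flag deviations that are too small to matter in practice.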

Examples:

  • Predicting muscle growth based on protein intake.
  • Estimating calorie intake based on body weight.
  • Predicting exam grades based on hours of study.

Formula for Simple Linear Regression

\(Y = \beta_0 + \beta_1 X + \epsilon\)

  • \(\beta_0\): Intercept (the value of \(Y\) when \(X = 0\))
  • \(\beta_1\): Slope (the change in \(Y\) for a one-unit change in \(X\))
  • \(\epsilon\): Error term (accounts for the variability not explained by \(X\))
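
To make the formula concrete, the sketch below estimates \(\beta_0\) and \(\beta_1\) with the standard closed-form least-squares expressions and compares them with `lm()`, which fits by ordinary least squares. The five data points are made up for illustration and are not from this deck.

```r
# Hypothetical data, made up for illustration.
x <- c(10, 20, 30, 40, 50)
y <- c(-4, 1, 4, 11, 14)

# Closed-form least-squares estimates:
#   slope     = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
#   intercept = mean(y) - slope * mean(x)
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b0 <- mean(y) - b1 * mean(x)

# lm() should return the same two coefficients.
fit <- lm(y ~ x)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

For these numbers the slope works out to 0.46 and the intercept to -8.6, and `coef(fit)` agrees up to floating-point rounding.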

Example Scenario

We want to predict sales based on spending on advertising.

Figure: Sales vs. Advertising Spend

Figure: Residual Plot

Interpreting the Results

  • Intercept \(\beta_0 = -10\): The expected value of \(Y\) (sales) when \(X\) (advertising spend) is 0.
  • Slope \(\beta_1 = 0.5\): The expected change in \(Y\) for each one-unit increase in \(X\).
    • In this example, the slope is positive, indicating that higher advertising spending is associated with higher sales.
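
As a worked illustration of these two coefficients, take a hypothetical advertising spend of \(X = 100\) (this value is an assumption chosen for the arithmetic, in whatever units the data use). Plugging it into the fitted equation gives:

\(\hat{Y} = \beta_0 + \beta_1 X = -10 + 0.5 \times 100 = 40\)

Raising the spend by one unit to \(X = 101\) gives \(\hat{Y} = -10 + 0.5 \times 101 = 40.5\), an increase of exactly \(\beta_1 = 0.5\).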

Evaluating model fit

  • R-squared \(R^2\): The proportion of the variance in \(Y\) that is explained by \(X\). Values range from 0 to 1; higher values indicate a better fit.
  • Residuals: The differences between the observed and predicted values of \(Y\).

Calculate R-squared:

  • The R-squared value is 0.802
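
The sketch below shows one way R-squared can be computed from the residuals, as \(R^2 = 1 - SS_{res} / SS_{tot}\), and checked against the value reported by `summary()`. The data frame is hypothetical, so it will not reproduce the 0.802 above.

```r
# Hypothetical data, made up for illustration.
data <- data.frame(
  Advertising = c(20, 40, 60, 80, 100, 120),
  Sales       = c(2, 8, 22, 28, 42, 48)
)
model <- lm(Sales ~ Advertising, data = data)

# Residuals: observed minus predicted values of Y.
res <- data$Sales - fitted(model)

ss_res <- sum(res^2)                                # unexplained variation
ss_tot <- sum((data$Sales - mean(data$Sales))^2)    # total variation
r2 <- 1 - ss_res / ss_tot

print(r2)
print(summary(model)$r.squared)  # should match the manual calculation
```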

Since the R-squared value is about 0.8, the linear model explains roughly 80% of the variance in sales, which indicates a good fit. We can therefore use our regression equation to predict future sales for new advertising spend values.
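
A minimal sketch of predicting from a fitted model with `predict()`. The data frame and the new spend values (50 and 90) are hypothetical and stand in for the real sales data.

```r
# Hypothetical data, made up for illustration.
data <- data.frame(
  Advertising = c(20, 40, 60, 80, 100, 120),
  Sales       = c(1, 9, 21, 29, 42, 49)
)
model <- lm(Sales ~ Advertising, data = data)

# New advertising spend values; the column name must match the predictor.
new_spend <- data.frame(Advertising = c(50, 90))

# Point predictions from the regression equation.
predicted <- predict(model, newdata = new_spend)
print(predicted)

# 95% prediction intervals (columns: fit, lwr, upr) convey the uncertainty.
intervals <- predict(model, newdata = new_spend,
                     interval = "prediction", level = 0.95)
print(intervals)
```

Both new values lie inside the observed range of spend (20 to 120). Predictions far outside that range rely on the linear pattern continuing, which the data cannot confirm.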

Limitations of SLR:

  1. The model captures only linear relationships; other relationships are not modeled well.
  2. It is sensitive to outliers, which can skew the results.
  3. Its assumptions must hold for the results to be valid.
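
The sensitivity to outliers can be illustrated with a small sketch: fit the same model twice, once on clean data and once after replacing a single observation with an extreme value. All numbers are hypothetical.

```r
# Hypothetical data lying close to a line with slope 0.5.
x <- c(10, 20, 30, 40, 50, 60)
y_clean <- c(-5.5, 0.4, 4.8, 10.3, 14.6, 20.2)

# Same data, but the last observation is replaced by an outlier.
y_outlier <- y_clean
y_outlier[6] <- 60

slope_clean   <- unname(coef(lm(y_clean ~ x))[2])
slope_outlier <- unname(coef(lm(y_outlier ~ x))[2])

# A single extreme point at the edge of the X range pulls the slope upward.
print(c(clean = slope_clean, with_outlier = slope_outlier))
```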

Conclusion

Simple Linear Regression is a powerful tool for examining the relationship between two variables.

  • Strengths: Easy to use and interpret, making it accessible for a wide range of applications.
  • Weaknesses: Sensitive to outliers and relies on assumptions like linearity and homoscedasticity, which may limit its effectiveness in some cases.

Residual Plot

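# Assumes `data` is a data frame with numeric columns Sales and Advertising.
library(ggplot2)

# Fit the simple linear regression and store its residuals.
model <- lm(Sales ~ Advertising, data = data)
data$residuals <- residuals(model)
# Plot residuals against the predictor; they should scatter evenly around 0.
ggplot(data, aes(x = Advertising, y = residuals)) +
  geom_point(color = "purple") +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(title = "Residuals of Linear Model", x = "Advertising Spend", y = "Residuals")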

Thank you very much!