2024-09-22

Introduction

  • Simple Linear Regression is a statistical method used to model the relationship between two continuous variables.
  • It assumes a linear relationship between the independent variable \(X\) and the dependent variable \(Y\).

The Model

The simple linear regression model is given by:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • \(\beta_0\): Intercept
  • \(\beta_1\): Slope
  • \(\epsilon\): Error term
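
For reference, the coefficients are usually estimated by ordinary least squares. Assuming \(n\) observations \((x_i, y_i)\) with sample means \(\bar{x}\) and \(\bar{y}\) (notation introduced here for illustration, not taken from the slides), the standard estimates are:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]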

Assumptions

  1. Linearity: The mean of \(Y\) is a linear function of \(X\).
  2. Independence: Observations are independent of one another.
  3. Homoscedasticity: The errors have constant variance across all values of \(X\).
  4. Normality: The errors are normally distributed with mean zero.
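
Some of these assumptions can be checked informally once a model is fitted. The sketch below is one possible approach, not a complete diagnostic workflow: it assumes the `mpg ~ hp` model on the built-in `mtcars` data used in these slides, and the names `res`, `sw`, and `spread` are illustrative.

```r
# Fit the regression of mpg on hp (mtcars ships with base R).
model <- lm(mpg ~ hp, data = mtcars)
res <- resid(model)

# Normality: Shapiro-Wilk test on the residuals.
# A small p-value suggests the residuals are not normally distributed.
sw <- shapiro.test(res)
print(sw)

# Homoscedasticity (rough check only): correlate the absolute residuals
# with the fitted values. A value far from 0 hints at non-constant variance.
spread <- cor(abs(res), fitted(model))
print(spread)
```

Linearity and independence are harder to test numerically; a residuals-versus-fitted plot and knowledge of how the data were collected are the usual guides.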

library(ggplot2)

# Scatter plot of fuel efficiency against horsepower (built-in mtcars data)
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(
    title = "Scatter Plot of MPG vs Horsepower",
    x = "Horsepower",
    y = "Miles Per Gallon"
  )

# Same scatter plot with the fitted least-squares line and its confidence band
ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, colour = "red") +
  labs(
    title = "Regression Line of MPG on Horsepower",
    x = "Horsepower",
    y = "Miles Per Gallon"
  )

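library(plotly)

# Fit the simple linear regression of mpg on hp
model <- lm(mpg ~ hp, data = mtcars)

# Extract residuals and fitted values; unname() drops the car-name labels
# so plotly receives plain numeric vectors
model_residuals <- unname(resid(model))
model_fitted <- unname(fitted(model))

# Residuals should scatter around zero with no clear pattern
plot_ly(
  x = model_fitted,
  y = model_residuals,
  type = "scatter",
  mode = "markers"
) %>%
  layout(
    title = "Residuals vs Fitted Values",
    xaxis = list(title = "Fitted Values"),
    yaxis = list(title = "Residuals")
  )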

Equation of the Line

From the model, the estimated regression equation is:

\[ \hat{Y} = \hat{\beta}_0 + \hat{\beta}_1 X \]

  • \(\hat{\beta}_0 = 30.1\)
  • \(\hat{\beta}_1 = -0.07\)

\[ \hat{Y} = 30.1 - 0.07 X \]
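
A minimal sketch of how these estimates could be obtained and used for prediction, assuming the model is `mpg ~ hp` fitted to `mtcars`. The 150 hp car is a hypothetical value chosen for illustration. With the rounded coefficients, the equation gives 30.1 - 0.07 × 150 = 19.6 mpg; the unrounded fit will differ slightly.

```r
# Fit mpg on hp using the built-in mtcars data.
model <- lm(mpg ~ hp, data = mtcars)

# Estimated intercept and slope, rounded to two decimals for display.
estimates <- coef(model)
print(round(estimates, 2))

# Predicted mpg for a hypothetical 150 hp car (illustrative value).
new_car <- data.frame(hp = 150)
predicted_mpg <- predict(model, newdata = new_car)
print(predicted_mpg)
```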

Conclusion

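  • In the mtcars data, there is a negative linear relationship between horsepower and MPG.
  • Higher horsepower is associated with lower fuel efficiency: each additional unit of horsepower corresponds to roughly 0.07 fewer miles per gallon on average.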
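
One way to gauge how strong this relationship is, sketched under the assumption that the same `mpg ~ hp` model is used: a confidence interval for the slope and the model's R-squared. The names `slope_ci` and `r_squared` are illustrative.

```r
# Refit the model so this example runs on its own.
model <- lm(mpg ~ hp, data = mtcars)

# 95% confidence interval for the slope; an interval entirely below zero
# is consistent with a negative relationship.
slope_ci <- confint(model, "hp", level = 0.95)
print(slope_ci)

# Proportion of the variance in mpg explained by horsepower.
r_squared <- summary(model)$r.squared
print(r_squared)
```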