2025-03-17

What is Simple Linear Regression?

Simple linear regression is a statistical method that models a continuous dependent variable as a linear function of a single independent variable, so that the dependent variable can be predicted from it.

Regression Equation (LaTeX)

The simple linear regression model is represented as:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

  • \(y\): Dependent variable
  • \(x\): Independent variable
  • \(\beta_0\): Intercept
  • \(\beta_1\): Slope (Regression coefficient)
  • \(\varepsilon\): Error term
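
For a single predictor, the least-squares estimates have a closed form: the slope is the sample covariance of \(x\) and \(y\) divided by the sample variance of \(x\), and the intercept makes the line pass through \((\bar{x}, \bar{y})\). The sketch below checks this against `lm()` on the mtcars data used later in these slides. The variable names (`b0`, `b1`, `manual`, `from_lm`) are illustrative, not part of the original material.

```r
# Closed-form least-squares estimates for mpg ~ wt, using base R only.
x <- mtcars$wt
y <- mtcars$mpg

# Slope: sample covariance of x and y divided by the sample variance of x.
b1 <- cov(x, y) / var(x)

# Intercept: chosen so the fitted line passes through (mean(x), mean(y)).
b0 <- mean(y) - b1 * mean(x)

manual  <- c("(Intercept)" = b0, wt = b1)
from_lm <- coef(lm(mpg ~ wt, data = mtcars))

print(manual)

# Compare the hand-computed values with lm() up to floating-point tolerance.
print(all.equal(manual, from_lm))
```

Computing the estimates by hand is only for illustration; in practice `lm()` is the usual route.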

Model Evaluation Metric (LaTeX)

The coefficient of determination \(R^2\) is calculated using the formula:

\[ R^2 = 1 - \frac{SSR}{SST} \]

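  • \(SSR\) (residual sum of squares): \(\sum_i (y_i - \hat{y}_i)^2\), the variation the model leaves unexplained. Here \(SSR\) denotes the residual sum of squares, not the regression sum of squares.
  • \(SST\) (total sum of squares): \(\sum_i (y_i - \bar{y})^2\), the total variability in \(y\)
  • An \(R^2\) closer to 1 means the model explains a larger share of the variability in \(y\), that is, a better fit to the data.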
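
As a rough check of the formula, \(R^2\) can be computed directly from the two sums of squares and compared with the value reported by `summary()`. This is a minimal sketch on the mtcars data; the names `fit`, `ssr`, `sst`, and `r2` are illustrative.

```r
# Compute R^2 = 1 - SSR / SST by hand for the model mpg ~ wt.
fit <- lm(mpg ~ wt, data = mtcars)

# Residual sum of squares: squared gaps between observed and fitted mpg.
ssr <- sum(residuals(fit)^2)

# Total sum of squares: squared gaps between observed mpg and its mean.
sst <- sum((mtcars$mpg - mean(mtcars$mpg))^2)

r2 <- 1 - ssr / sst
print(r2)

# Compare with the R-squared value that summary() reports.
print(all.equal(r2, summary(fit)$r.squared))
```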

Data Overview

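We will use the mtcars dataset, which ships with R and covers 32 cars. Two of its variables are used here:

  • mpg (Miles Per Gallon): Fuel efficiency, in miles per US gallon
  • wt (Weight): Vehicle weight, in units of 1,000 lbs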
head(mtcars)
##                    mpg cyl disp  hp drat    wt  qsec vs am gear carb
## Mazda RX4         21.0   6  160 110 3.90 2.620 16.46  0  1    4    4
## Mazda RX4 Wag     21.0   6  160 110 3.90 2.875 17.02  0  1    4    4
## Datsun 710        22.8   4  108  93 3.85 2.320 18.61  1  1    4    1
## Hornet 4 Drive    21.4   6  258 110 3.08 3.215 19.44  1  0    3    1
## Hornet Sportabout 18.7   8  360 175 3.15 3.440 17.02  0  0    3    2
## Valiant           18.1   6  225 105 2.76 3.460 20.22  1  0    3    1

Scatter Plot with ggplot2

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "darkblue") +
  labs(title = "Relationship between Vehicle Weight and MPG",
       x = "Weight (wt)",
       y = "Miles Per Gallon (mpg)") +
  theme_minimal()

Estimating Simple Linear Model

model <- lm(mpg ~ wt, data = mtcars)
summary(model)
  • In the output, check the regression coefficients, their p-values, and the R-squared value.

## 
## Call:
## lm(formula = mpg ~ wt, data = mtcars)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
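
Beyond reading the printed summary, the fitted object can be queried directly. The sketch below pulls out the coefficients, their 95% confidence intervals, and a prediction for a hypothetical car weighing 3,000 lbs (`wt = 3`). The `new_car` data frame is an assumption made for illustration, not part of the original analysis.

```r
# Refit the model so this example runs on its own.
model <- lm(mpg ~ wt, data = mtcars)

# Point estimates of the intercept and slope.
print(coef(model))

# 95% confidence intervals for both coefficients.
print(confint(model, level = 0.95))

# Hypothetical new observation: a car weighing 3,000 lbs (wt is in 1,000 lbs).
new_car <- data.frame(wt = 3)

# Predicted mpg with a 95% prediction interval (columns: fit, lwr, upr).
pred <- predict(model, newdata = new_car, interval = "prediction")
print(pred)
```

With the coefficients shown above, the point prediction is 37.2851 - 5.3445 × 3, or roughly 21.25 mpg.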

Regression Line with ggplot2

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(color = "darkblue") +
  geom_smooth(method = "lm", se = TRUE, color = "red") +
  labs(title = "Regression Line",
       x = "Weight (wt)",
       y = "Miles Per Gallon (mpg)")

## `geom_smooth()` using formula = 'y ~ x'

3D Plot Using Plotly

library(plotly)

# Pass the data frame directly so the formulas refer to its columns
plot_ly(mtcars, x = ~wt, y = ~hp, z = ~mpg,
        type = "scatter3d", mode = "markers") %>%
  layout(scene = list(xaxis = list(title = 'Weight (wt)'),
                      yaxis = list(title = 'Horsepower (hp)'),
                      zaxis = list(title = 'Miles Per Gallon (mpg)')))
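  • Visualize the relationship among weight, horsepower, and fuel efficiency in 3D. Horsepower (hp) is shown for exploration only; it is not a predictor in the simple regression model.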

Conclusion

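  • As vehicle weight increases, fuel efficiency (mpg) decreases: each additional 1,000 lbs is associated with about 5.3 fewer miles per gallon.
  • This relationship is statistically significant (p = 1.29e-10), and weight alone explains about 75% of the variation in mpg (\(R^2 = 0.7528\)).
  • The fitted model is useful for estimating how a change in weight relates to fuel efficiency when looking to improve vehicle performance.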