2026-04-10

Introduction

  • Simple linear regression models a linear relationship between:
    • Independent variable (x)
    • Dependent variable (y)
  • Used for prediction and trend analysis

Why It Matters

  • Predicts outcomes such as sales or prices
  • Quantifies how strongly two variables are related
  • Widely used in data science, business, and engineering

Regression Model (Math)

\[ y = \beta_0 + \beta_1 x + \epsilon \]

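  • \(\beta_0\) = intercept (expected value of y when x = 0)
  • \(\beta_1\) = slope (expected change in y for a one-unit increase in x)
  • \(\epsilon\) = random error term, assumed to have mean 0 and constant variance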
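The slides never show how `data` was created. As a hypothetical sketch, the model above can be simulated directly: draw x, draw a random error, and build y from the equation. The values of `beta0`, `beta1`, `sigma` and `n` below are assumptions chosen for illustration, not the values behind the model summary later in the deck.

```r
# Hypothetical simulation of y = beta0 + beta1 * x + epsilon.
# beta0, beta1, sigma and n are assumed values for illustration only.
set.seed(42)
n     <- 20
beta0 <- 3
beta1 <- 2
sigma <- 3

x   <- runif(n, min = 0, max = 20)       # independent variable
eps <- rnorm(n, mean = 0, sd = sigma)    # random error term with mean 0
y   <- beta0 + beta1 * x + eps           # dependent variable from the model

data <- data.frame(x = x, y = y)
head(data)
```

With a data frame like this in place, the `ggplot()` calls in the following slides can be run as written.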

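Estimation Formulas (Math)

Ordinary least squares (OLS) chooses \(\hat{\beta}_0\) and \(\hat{\beta}_1\) to minimize the sum of squared residuals \(\sum (y_i - \beta_0 - \beta_1 x_i)^2\), which gives:
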
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
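The two formulas can be checked numerically. This sketch uses made-up data (x = 1, ..., 20 with assumed true coefficients 3 and 2); the names `beta0_hat` and `beta1_hat` are illustrative. It computes the estimates by hand and compares them with `coef(lm(y ~ x))`.

```r
# Hypothetical data (assumption): the deck's own data set is not shown.
set.seed(1)
x <- 1:20
y <- 3 + 2 * x + rnorm(20, sd = 3)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
# Intercept: the fitted line passes through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

c(intercept = beta0_hat, slope = beta1_hat)
coef(lm(y ~ x))  # agrees with the hand computation up to rounding
```

A side effect of the intercept formula is that the residuals of the fitted line always sum to zero.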

Scatter Plot

library(ggplot2)
# `data` is assumed to be a data frame with numeric columns x and y
ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  ggtitle("Scatter Plot of x vs y")
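The strength of the linear pattern in a scatter plot can be summarized by Pearson's correlation r. In simple linear regression the OLS slope equals \(r \cdot s_y / s_x\), so the two are directly linked. A minimal sketch on hypothetical data (the simulated x and y are assumptions, not the deck's data):

```r
# Hypothetical data (assumption): stand-in for the deck's x and y.
set.seed(99)
x <- runif(20, min = 0, max = 20)
y <- 3 + 2 * x + rnorm(20, sd = 3)

r <- cor(x, y)                    # Pearson correlation, between -1 and 1
slope_from_r <- r * sd(y) / sd(x) # slope rebuilt from r and the two SDs

r
slope_from_r
coef(lm(y ~ x))[["x"]]            # same value as slope_from_r
```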

Regression Line

ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Linear Regression Fit")
## `geom_smooth()` using formula = 'y ~ x'
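The line drawn by `geom_smooth(method = "lm")` is the fitted line \(\hat{y} = \hat{\beta}_0 + \hat{\beta}_1 x\), so prediction amounts to plugging new x values into it. A minimal sketch with base R's `predict()` on hypothetical simulated data (the `data` frame and the chosen `new_x` values are assumptions); the new x values stay inside the observed range to avoid extrapolating.

```r
# Hypothetical data (assumption): stand-in for the deck's `data`.
set.seed(7)
data <- data.frame(x = 1:20)
data$y <- 3 + 2 * data$x + rnorm(20, sd = 3)

model <- lm(y ~ x, data = data)

# Predict y at new x values; interval = "prediction" adds a 95% prediction interval
new_x <- data.frame(x = c(5, 10, 15))
pred  <- predict(model, newdata = new_x, interval = "prediction", level = 0.95)
pred

# The point predictions are just beta0_hat + beta1_hat * x
b <- coef(model)
b[1] + b[2] * new_x$x
```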

3D Plot

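library(plotly)
# Assumption: z = ~x only repeated x, so the lm(y ~ x) residuals are used as the third axis
data$resid <- residuals(lm(y ~ x, data = data))
plot_ly(data, x = ~x, y = ~y, z = ~resid, type = "scatter3d", mode = "markers")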

Model Summary

model <- lm(y ~ x, data = data)
summary(model)
## 
## Call:
## lm(formula = y ~ x, data = data)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -7.4423 -1.5505  0.5624  1.4499  4.6351 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.8917     1.2947   2.233   0.0385 *  
## x             2.0647     0.1081  19.103 2.12e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.787 on 18 degrees of freedom
## Multiple R-squared:  0.953,  Adjusted R-squared:  0.9504 
## F-statistic: 364.9 on 1 and 18 DF,  p-value: 2.123e-13
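The main numbers in this summary follow from a few sums of squares. The sketch below recomputes R², the residual standard error, the slope's standard error and t value, and the F-statistic by hand on hypothetical simulated data (an assumption, since the deck's data is not shown) and checks them against `summary()`.

```r
# Hypothetical data (assumption): stand-in for the deck's `data`.
set.seed(123)
data <- data.frame(x = 1:20)
data$y <- 3 + 2 * data$x + rnorm(20, sd = 3)

model <- lm(y ~ x, data = data)
s     <- summary(model)

n   <- nrow(data)
sse <- sum(residuals(model)^2)            # residual sum of squares
sst <- sum((data$y - mean(data$y))^2)     # total sum of squares

r2       <- 1 - sse / sst                 # Multiple R-squared
rse      <- sqrt(sse / (n - 2))           # Residual standard error on n - 2 df
se_slope <- rse / sqrt(sum((data$x - mean(data$x))^2))  # Std. Error of slope
t_slope  <- coef(model)[["x"]] / se_slope               # t value of slope
f_stat   <- (sst - sse) / (sse / (n - 2))               # F-statistic on 1 and n - 2 df

c(R2 = r2, RSE = rse, SE_slope = se_slope, t = t_slope, F = f_stat)
```

In simple regression the F-statistic is the square of the slope's t value, and R² is the square of the correlation between x and y. The output above is consistent with this: 2.0647 / 0.1081 ≈ 19.1 and 19.103² ≈ 364.9.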

Conclusion

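  • Simple linear regression models y as a straight-line function of x plus random error
  • In this example the fitted line is \(\hat{y} = 2.89 + 2.06x\), and x explains about 95% of the variation in y (\(R^2 = 0.953\))
  • The fitted line can predict y for new values of x within the observed range
  • Scatter plots and fitted-line plots help check whether a straight line is a reasonable fit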