2025-03-16

Introduction to Linear Regression

  • Linear Regression is a widely used statistical technique for modeling relationships between a dependent variable and one or more independent variables.
  • It is extensively applied in finance, healthcare, engineering, and machine learning.
  • It predicts a continuous outcome from one or more input features.

Mathematical Foundation

  • The equation for Simple Linear Regression: \[ Y = \beta_0 + \beta_1 X + \epsilon \]

    Where:

    • \(Y\) = Dependent Variable
    • \(X\) = Independent Variable
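    • \(\beta_0\) = Intercept (expected value of \(Y\) when \(X = 0\))
    • \(\beta_1\) = Slope (expected change in \(Y\) for a one-unit increase in \(X\))
    • \(\epsilon\) = Error Term (variation in \(Y\) not explained by \(X\))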
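The equation above does not say how \(\beta_0\) and \(\beta_1\) are estimated. Under ordinary least squares (OLS) they minimize the sum of squared residuals, which gives the closed form \(\hat\beta_1 = \sum (x_i - \bar x)(y_i - \bar y) / \sum (x_i - \bar x)^2\) and \(\hat\beta_0 = \bar y - \hat\beta_1 \bar x\). A minimal sketch on simulated data (the seed, sample size, and true values 3 and 2 are illustrative assumptions) that computes these by hand and checks them against lm():

```r
# Simulated data with a known line: true beta0 = 3, true beta1 = 2
set.seed(1)
x <- rnorm(100, mean = 5, sd = 2)
y <- 3 + 2 * x + rnorm(100, sd = 1)

# OLS closed-form estimates for simple linear regression
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# lm() solves the same least-squares problem (via a QR decomposition),
# so both sets of estimates should agree up to floating-point error
fit <- lm(y ~ x)
c(manual_intercept = beta0_hat, manual_slope = beta1_hat)
coef(fit)
```

The hand-computed formula is only practical for a single predictor; with several predictors lm() solves the general least-squares problem.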

Generating Sample Data

set.seed(123)  # make the simulated data reproducible
n <- 100
X <- rnorm(n, mean = 5, sd = 2)
data <- data.frame(
  X = X,
  Y = 3 + 2 * X + rnorm(n, sd = 1)  # Y depends on X: true intercept 3, true slope 2
)
head(data)

Scatter Plot with Regression Line

[Figure: scatter plot of Y against X with a fitted regression line from geom_smooth()]

Model Fitting in R

model <- lm(Y ~ X, data = data)
summary(model)$coefficients

  • The estimates should be close to the true values \(\beta_0 = 3\) and \(\beta_1 = 2\), with a small p-value for the slope.
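Once the model is fitted, base R can report interval estimates for the coefficients and predict \(Y\) at new values of \(X\). A minimal sketch using confint() and predict(); the simulated data mirror the example above, and the name new_points and the chosen X values are hypothetical:

```r
# Simulated data with a known relationship (true intercept 3, slope 2)
set.seed(123)
X <- rnorm(100, mean = 5, sd = 2)
data <- data.frame(X = X, Y = 3 + 2 * X + rnorm(100, sd = 1))
model <- lm(Y ~ X, data = data)

# 95% confidence intervals for beta_0 and beta_1
ci <- confint(model, level = 0.95)
ci

# Predicted Y at new X values, with 95% prediction intervals
# (prediction intervals include the noise term, so they are wider
#  than confidence intervals for the mean response)
new_points <- data.frame(X = c(2, 5, 8))  # hypothetical inputs
pred <- predict(model, newdata = new_points, interval = "prediction", level = 0.95)
pred
```

Using interval = "confidence" instead gives the narrower interval for the average \(Y\) at each \(X\) rather than for a single new observation.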

3D Visualization with Multiple Variables
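With two predictors the model becomes \(Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \epsilon\), and the fitted surface is a plane in three dimensions. A minimal sketch on simulated data (the predictor names X1, X2 and the true coefficients are hypothetical) that fits the plane with lm() and evaluates it on a grid, the usual input for a 3D surface plot:

```r
# Hypothetical true model: Y = 1 + 2*X1 - 0.5*X2 + noise
set.seed(7)
n  <- 200
X1 <- rnorm(n, mean = 5, sd = 2)
X2 <- runif(n, min = 0, max = 10)
df3 <- data.frame(X1 = X1, X2 = X2,
                  Y = 1 + 2 * X1 - 0.5 * X2 + rnorm(n, sd = 1))

# Multiple linear regression: the fit is a plane Y_hat = b0 + b1*X1 + b2*X2
model3 <- lm(Y ~ X1 + X2, data = df3)
coef(model3)

# Evaluate the fitted plane on a 20 x 20 grid for plotting
grid <- expand.grid(X1 = seq(min(X1), max(X1), length.out = 20),
                    X2 = seq(0, 10, length.out = 20))
grid$Y_hat <- predict(model3, newdata = grid)
head(grid)
```

Each slope is interpreted holding the other predictor fixed, which is what distinguishes it from the slope of a simple regression on that predictor alone.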

Model Evaluation Metrics

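  • R-squared: The proportion of the variance in \(Y\) explained by the model (between 0 and 1 for a least-squares fit with an intercept; higher means a closer fit).
  • Mean Squared Error (MSE): The average squared difference between predicted and actual values.
  • Residual Analysis: Plotting residuals against fitted values reveals patterns (curvature, non-constant variance, outliers) that violate the model's assumptions.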
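The three metrics above can be computed directly from a fitted lm() object. A minimal sketch on simulated data (seed and true values are illustrative assumptions) that computes R-squared from its definition, checks it against summary(), computes the in-sample MSE, and draws a residuals-versus-fitted plot:

```r
# Simulated data and fitted model (true intercept 3, slope 2)
set.seed(123)
X <- rnorm(100, mean = 5, sd = 2)
data <- data.frame(X = X, Y = 3 + 2 * X + rnorm(100, sd = 1))
model <- lm(Y ~ X, data = data)

res <- residuals(model)  # actual minus fitted values

# R-squared = 1 - SS_residual / SS_total
ss_res <- sum(res^2)
ss_tot <- sum((data$Y - mean(data$Y))^2)
r2 <- 1 - ss_res / ss_tot
all.equal(r2, summary(model)$r.squared)  # TRUE

# In-sample mean squared error
mse <- mean(res^2)
mse

# Residual analysis: points should scatter randomly around zero
plot(fitted(model), res, xlab = "Fitted values", ylab = "Residuals")
abline(h = 0, lty = 2)
```

Here mse divides by \(n\); the residual variance estimate reported by summary() (sigma squared) divides by \(n - 2\), so the two differ slightly. MSE on held-out data is a better guide to predictive accuracy than the in-sample value.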

Conclusion

  • Linear Regression is a fundamental technique for predictive modeling.
  • Common applications include:
    • Stock Market Predictions (Finance)
    • Disease Risk Assessment (Healthcare)
    • Quality Control (Engineering)
  • Possible extensions include:
    • Polynomial Regression for non-linear relationships
    • Ridge & Lasso Regression for regularization
    • Machine Learning methods for more flexible prediction models
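Polynomial regression keeps the least-squares machinery: adding powers of \(X\) as extra columns leaves the model linear in the \(\beta\) coefficients, so lm() still fits it. A minimal sketch on simulated data with a hypothetical quadratic trend, comparing a straight-line fit to a quadratic one:

```r
# Hypothetical curved relationship: Y = 1 + 0.5*X + 0.3*X^2 + noise
set.seed(99)
X <- runif(150, min = -3, max = 3)
df_poly <- data.frame(X = X, Y = 1 + 0.5 * X + 0.3 * X^2 + rnorm(150, sd = 0.5))

linear_fit <- lm(Y ~ X, data = df_poly)
# raw = TRUE uses plain X and X^2 columns, so coefficients match the formula above
quad_fit <- lm(Y ~ poly(X, 2, raw = TRUE), data = df_poly)

# In-sample R-squared of each model
c(linear = summary(linear_fit)$r.squared,
  quadratic = summary(quad_fit)$r.squared)

# F-test: does the X^2 term significantly reduce the residual sum of squares?
anova(linear_fit, quad_fit)
```

Higher-degree terms always raise in-sample R-squared, so the degree is usually chosen with a test like the one above or with held-out data rather than by R-squared alone.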