March 25, 2025


Linear Regression

  • What is Linear Regression?
    • Finds a linear relationship between variables and uses it to estimate values that were not observed (within the observed range, this is called interpolation).
  • Why is it useful?
    • Helps predict outcomes based on past data.
    • Identifies trends and relationships between variables.
    • Provides insights for decision-making in various fields.
  • Applications in finance, insurance, and healthcare.
    • Finance: Predicting stock prices, risk assessment.
    • Insurance: Estimating claim amounts, determining premiums.
    • Healthcare: Diagnosing diseases, predicting patient outcomes.

Regression Formula

A simple linear regression model:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • \(Y\): Dependent variable
  • \(X\): Independent variable
  • \(\beta_0\): Intercept
  • \(\beta_1\): Slope
  • \(\epsilon\): Error term
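
The model above does not say how the intercept and slope are estimated from data. Assuming ordinary least squares (the method behind R's `lm()`), the estimates minimize the sum of squared errors and have a closed form. The hat and bar notation below is introduced here for illustration and is not used elsewhere in the slides:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means of \(X\) and \(Y\), and \(n\) is the number of observations.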

Data

##     x          y
## 1   1 -0.6047565
## 2   2  5.6982251
## 3   3 26.5870831
## 4   4 14.7050839
## 5   5 18.2928774
## 6   6 37.1506499
## 7   7 27.6091621
## 8   8 13.3493877
## 9   9 22.1314715
## 10 10 27.5433803
## 11 11 47.2408180
## 12 12 41.5981383
## 13 13 45.0077145
## 14 14 45.1068272
## 15 15 41.4415887
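
The plotting code in these slides refers to an object named `data`, but the slides do not show how it was created. A minimal sketch that rebuilds it from the printed values (the original may well have been simulated rather than typed in, so treat this as a reconstruction):

```r
# Rebuild the 15-row data frame from the values printed in the table.
# Column names x and y match the printed header.
data <- data.frame(
  x = 1:15,
  y = c(-0.6047565,  5.6982251, 26.5870831, 14.7050839, 18.2928774,
        37.1506499, 27.6091621, 13.3493877, 22.1314715, 27.5433803,
        47.2408180, 41.5981383, 45.0077145, 45.1068272, 41.4415887)
)

# Quick structural check: 15 observations of 2 variables.
str(data)
```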

Code for the Linear Regression

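library(ggplot2)  # provides ggplot(), geom_point(), geom_smooth()

# Scatter plot of the data with the fitted least-squares line and its confidence band
ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "#8C1D40") +
  geom_smooth(method = "lm", color = "steelblue") +
  labs(title = "Linear Regression Example", x = "X", y = "Y") +
  theme(plot.title = element_text(color = "#8C1D40"))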

Linear Regression Model of Data

## `geom_smooth()` using formula = 'y ~ x'

Model Summary

\[ y=4.8780+2.8307x \]

##             Estimate Std. Error  t value     Pr(>|t|)
## (Intercept) 4.878044  4.7474585 1.027506 0.3229217707
## x           2.830725  0.5221508 5.421278 0.0001168109
## The correlation coefficient is: 0.8326619
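
The slides show the coefficient table but not the call that produced it. One plausible way to obtain the same numbers, assuming the model was fitted with base R's `lm()` on the data shown earlier (the names `fit`, `coef_table`, and `r` are illustrative):

```r
# Data as printed in the slides.
data <- data.frame(
  x = 1:15,
  y = c(-0.6047565,  5.6982251, 26.5870831, 14.7050839, 18.2928774,
        37.1506499, 27.6091621, 13.3493877, 22.1314715, 27.5433803,
        47.2408180, 41.5981383, 45.0077145, 45.1068272, 41.4415887)
)

# Fit y = beta_0 + beta_1 * x by ordinary least squares.
fit <- lm(y ~ x, data = data)

# Coefficient table: estimate, standard error, t value, p-value.
coef_table <- summary(fit)$coefficients
print(coef_table)

# Pearson correlation between x and y.
r <- cor(data$x, data$y)
cat("The correlation coefficient is:", r, "\n")
```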

Regression Analysis

  • Equation: \(y = 4.8780 + 2.8307x\)
    • Intercept: 4.8780 (not statistically significant, p ≈ 0.32)
    • Slope: 2.8307 (statistically significant, p ≈ 0.0001)
  • Correlation: r = 0.83
    • Strong positive linear relationship between \(x\) and \(y\): as \(x\) increases, \(y\) tends to increase. About 69% of the variance in \(y\) is explained by \(x\) (\(r^2 \approx 0.69\)).
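
The significance statements can be checked against the coefficient table: each t value is the estimate divided by its standard error. The degrees of freedom (\(n - 2 = 13\)) and the two-sided 5% critical value of about 2.16 are not stated in the slides; they are standard values filled in here for the check:

\[ t_{\beta_1} = \frac{2.830725}{0.5221508} \approx 5.42, \qquad t_{\beta_0} = \frac{4.878044}{4.7474585} \approx 1.03 \]

Only the slope's t value exceeds 2.16, which agrees with the slope being significant and the intercept not.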

Residuals

  • What are Residuals?
    • Differences between observed and predicted values.
    • Formula: \(\text{Residual} = Y - \hat{Y}\), where \(Y\) is observed and \(\hat{Y}\) is predicted.
  • Why are Residuals Important?
    • Model Assessment: Help evaluate how well the model fits the data.
    • Detecting Patterns: Systematic patterns (for example, a curve or a funnel shape) suggest the model is missing structure in the data.
    • Outliers: Large residuals may indicate outliers affecting accuracy.
  • What do Residuals Tell Us?
    • Residuals scattered randomly around zero suggest that the linear regression assumptions are reasonable.
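
The residual formula can be applied directly to the fitted model. A short sketch, assuming the data printed earlier and a fit from `lm()` (the column names `fitted` and `residual` are illustrative):

```r
# Data as printed in the slides.
data <- data.frame(
  x = 1:15,
  y = c(-0.6047565,  5.6982251, 26.5870831, 14.7050839, 18.2928774,
        37.1506499, 27.6091621, 13.3493877, 22.1314715, 27.5433803,
        47.2408180, 41.5981383, 45.0077145, 45.1068272, 41.4415887)
)

fit <- lm(y ~ x, data = data)

# Predicted values (Y-hat) and residuals (Y minus Y-hat).
data$fitted   <- unname(fitted(fit))
data$residual <- data$y - data$fitted  # equivalent to residuals(fit)

# Show the first few rows.
print(head(data, 3))
```

For the first observation, the fitted value is roughly 4.8780 + 2.8307 × 1 ≈ 7.71, so the residual is about −0.60 − 7.71 ≈ −8.31.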

3D Plotly Visualization

This plot lets you zoom in and hover over the residuals to view their values.

Residual Plots

  • What is the Residual Plot?
    • Shows the difference between observed and predicted values (residuals).
    • Residuals are on the vertical axis, and \(X\) is on the horizontal axis.
  • What does it tell us?
    • Randomly scattered residuals suggest a good linear model.
    • Patterns in residuals indicate a possible need for a non-linear model.
    • In this case, there is no obvious pattern, which suggests the linear model fits reasonably well.

Residual Plot of Data

Since the residual plot shows no clear pattern, a linear model appears to be a good fit for this dataset.
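
The code behind the residual plot is not included in the slides. A plot of this kind could be produced along the following lines, assuming ggplot2 and reusing the colors from the earlier scatter plot (the names `resid_df` and `p` are illustrative):

```r
library(ggplot2)

# Data as printed in the slides.
data <- data.frame(
  x = 1:15,
  y = c(-0.6047565,  5.6982251, 26.5870831, 14.7050839, 18.2928774,
        37.1506499, 27.6091621, 13.3493877, 22.1314715, 27.5433803,
        47.2408180, 41.5981383, 45.0077145, 45.1068272, 41.4415887)
)

fit <- lm(y ~ x, data = data)

# Residuals on the vertical axis, x on the horizontal axis.
resid_df <- data.frame(x = data$x, residual = unname(residuals(fit)))

p <- ggplot(resid_df, aes(x = x, y = residual)) +
  geom_point(color = "#8C1D40") +
  # Reference line at zero: points should scatter evenly around it.
  geom_hline(yintercept = 0, linetype = "dashed", color = "steelblue") +
  labs(title = "Residual Plot", x = "X", y = "Residual")

print(p)
```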

Conclusion

  • Linear Regression finds a straight-line relationship between variables.
  • Benefits:
    • Simple and useful in finance, insurance, and healthcare.
  • Limitations:
    • Assumes a linear relationship, which may not always be true.
    • Sensitive to outliers, which can affect results.
  • Residuals help evaluate the model’s fit and identify problems, such as patterns that suggest a non-linear relationship or outliers.
    • Residual plots help evaluate the fit of a regression model by showing whether the residuals are randomly scattered or whether patterns suggest the need for a different model.