2025-03-25

Introduction

  • Simple linear regression models the relationship between two variables with a straight line.
  • It predicts the dependent variable (Y) from a single independent variable (X).
  • Example: predicting tree volume from girth, using R's built-in trees dataset.

Mathematical Formula

We model the relationship using the equation:

\[ Y = \beta_0 + \beta_1 X + \epsilon \]

  • \(Y\) = Dependent Variable (Response)
  • \(X\) = Independent Variable (Predictor)
  • \(\beta_0\) = Intercept (expected value of Y when X = 0)
  • \(\beta_1\) = Slope (expected change in Y per one-unit increase in X)
  • \(\epsilon\) = Error term (variation in Y not explained by X)
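
To make the roles of the three terms concrete, the following is a minimal sketch that simulates data from the model and then recovers the coefficients. The parameter values (beta0 = 1.5, beta1 = 2, sigma = 1) and the sample size are hypothetical choices for illustration only; they are not taken from the trees example.

```r
# Simulate Y = beta0 + beta1 * X + epsilon with hypothetical parameters
set.seed(42)                    # make the simulation reproducible
beta0 <- 1.5                    # assumed intercept
beta1 <- 2.0                    # assumed slope
sigma <- 1.0                    # assumed standard deviation of the error term
n     <- 200                    # assumed sample size

x       <- runif(n, min = 0, max = 10)      # predictor values
epsilon <- rnorm(n, mean = 0, sd = sigma)   # random error
y       <- beta0 + beta1 * x + epsilon      # response generated by the model

# Fitting a line to the simulated data should give estimates near beta0 and beta1
sim_fit <- lm(y ~ x)
coef(sim_fit)
```

The estimates will not match the assumed values exactly, because the error term adds noise to every observation.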

Computing the Coefficients

The coefficients are estimated from the data by ordinary least squares:

\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]

\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]

Here \(\bar{x}\) and \(\bar{y}\) are the sample means. These estimates define the line that minimizes the sum of squared residuals.
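
As a check on the formulas, the slope and intercept can be computed by hand in R. This is a minimal sketch using the built-in trees dataset with Girth as X and Volume as Y; the variable names b0 and b1 are introduced here for illustration.

```r
# Compute the least-squares estimates directly from the formulas
x <- trees$Girth
y <- trees$Volume

b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)  # slope
b0 <- mean(y) - b1 * mean(x)                                     # intercept

c(intercept = b0, slope = b1)

# Cross-check: lm() should return the same two numbers
all.equal(unname(coef(lm(Volume ~ Girth, data = trees))), c(b0, b1))
```

The hand-computed values should agree with the Estimate column of the model summary (about -36.94 for the intercept and 5.07 for the slope).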

Dataset: trees

First 6 Rows of the trees Dataset
Girth (in)  Height (ft)  Volume (cu ft)
8.3         70           10.3
8.6         65           10.3
8.8         63           10.2
10.5        72           16.4
10.7        81           18.8
10.8        83           19.7

Scatter Plot using ggplot2
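
A scatter plot of this kind might be drawn as follows. This is a sketch rather than the original plotting code: it assumes the ggplot2 package is installed, and the object name p_scatter, the title, and the axis labels are illustrative choices.

```r
library(ggplot2)

# Each point is one tree: girth on the x-axis, volume on the y-axis
p_scatter <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  labs(
    title = "Tree volume vs. girth",
    x = "Girth (in)",
    y = "Volume (cu ft)"
  )

p_scatter
```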

Regression Line using ggplot2

## `geom_smooth()` using formula = 'y ~ x'
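
The message above is what ggplot2 prints when geom_smooth() is called without an explicit formula. A plot with a fitted regression line could be produced along these lines; this is a sketch assuming ggplot2 is installed, and the object name p_line and the labels are illustrative.

```r
library(ggplot2)

# method = "lm" overlays the least-squares line with a confidence band;
# leaving out the formula argument triggers the 'y ~ x' message
p_line <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm") +
  labs(x = "Girth (in)", y = "Volume (cu ft)")

p_line
```

Passing formula = y ~ x to geom_smooth() draws the same line and suppresses the message.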

Interactive Plot using plotly
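
One way to build an interactive version is with plot_ly(). This is a sketch assuming the plotly package is installed; the data frame trees_fit, its Fitted column, and the trace names are introduced here for illustration and may differ from the original plot.

```r
library(plotly)

# Add the fitted values as a column so both traces share one data frame
trees_fit <- transform(trees, Fitted = fitted(lm(Volume ~ Girth, data = trees)))

fig <- plot_ly(trees_fit, x = ~Girth)
fig <- add_markers(fig, y = ~Volume, name = "Observed")    # raw data points
fig <- add_lines(fig, y = ~Fitted, name = "Fitted line")   # regression line
fig <- plotly::layout(
  fig,
  xaxis = list(title = "Girth (in)"),
  yaxis = list(title = "Volume (cu ft)")
)

fig
```

Hovering over a point in the rendered widget shows its girth and volume.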

R Code for Regression

# Fit a linear model
lm_model <- lm(Volume ~ Girth, data = trees)

# Model summary
summary(lm_model)

Model Summary Output

## 
## Call:
## lm(formula = Volume ~ Girth, data = trees)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -8.065 -3.107  0.152  3.495  9.587 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
## Girth         5.0659     0.2474   20.48  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.252 on 29 degrees of freedom
## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
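
The headline numbers in this output can be reproduced from the residuals. The sketch below recomputes R-squared and the residual standard error by hand; the names ss_res, ss_tot, r_squared, and rse are introduced here for illustration.

```r
lm_model <- lm(Volume ~ Girth, data = trees)

res    <- residuals(lm_model)
ss_res <- sum(res^2)                                   # unexplained variation
ss_tot <- sum((trees$Volume - mean(trees$Volume))^2)   # total variation

r_squared <- 1 - ss_res / ss_tot                 # share of variance explained
rse <- sqrt(ss_res / df.residual(lm_model))      # residual standard error (29 df)

c(r_squared = r_squared, rse = rse)

# 95% confidence intervals for the intercept and slope
confint(lm_model)
```

The two computed values should match the Multiple R-squared (0.9353) and Residual standard error (4.252) lines above.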

Model Predictions

##   Girth Predicted_Volume
## 1    10         13.71511
## 2    12         23.84682
## 3    15         39.04439
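
A table like this can be generated with predict(). This is a minimal sketch; the data frame name new_girth is an assumption, while the column names follow the output above.

```r
lm_model <- lm(Volume ~ Girth, data = trees)

# Girth values (in inches) at which to predict volume
new_girth <- data.frame(Girth = c(10, 12, 15))

# predict() evaluates the fitted line at each new girth
new_girth$Predicted_Volume <- predict(lm_model, newdata = new_girth)

new_girth
```

Each prediction is just the fitted line evaluated at the given girth, for example -36.9435 + 5.0659 × 10 ≈ 13.72 for the first row.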

Conclusion

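  • The model shows a strong positive linear relationship between girth and tree volume: each additional inch of girth is associated with about 5.07 more cubic feet of volume.
  • R² ≈ 0.94 indicates a good fit: girth alone accounts for about 94% of the variance in volume.
  • The model predicts volume well within the observed range of girth; the negative intercept (-36.94) shows it should not be extrapolated to very small trees.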