2025-03-16

Linear Regression

Linear regression is a method for modeling a linear relationship between a dependent variable and one or more independent (predictor) variables.

  • Simple Linear Regression (SLR): One dependent (\(y\)) and one independent (\(x\)) variable.

  • Multiple Linear Regression (MLR): One dependent (\(y\)) and more than one independent (\(x_1, x_2, ..., x_n\)) variables.

In this presentation we fit and plot SLR models for Volume vs. Girth and Volume vs. Height, and an MLR model for Volume vs. Girth and Height, using R's built-in trees data set.

Data set

The summary of the data set trees is as follows:

     Girth           Height       Volume     
 Min.   : 8.30   Min.   :63   Min.   :10.20  
 1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
 Median :12.90   Median :76   Median :24.20  
 Mean   :13.25   Mean   :76   Mean   :30.17  
 3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
 Max.   :20.60   Max.   :87   Max.   :77.00  

The data set trees has three numeric variables: Girth, Height, and Volume.
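The summary above can be reproduced with a few lines of R. This is a minimal sketch that assumes only base R; per R's documentation for the data set, trees records 31 felled black cherry trees, with Girth (a diameter measurement) in inches, Height in feet, and Volume in cubic feet.

```r
# trees ships with base R (package "datasets"), so nothing needs downloading
data(trees)

str(trees)      # structure: number of observations and the three numeric columns
summary(trees)  # per-column minimum, quartiles, mean, and maximum
```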

SLR 1: Volume vs. Girth

Summary of the linear model for Volume vs. Girth

slr1 <- lm(data=trees, Volume~Girth)
summary(slr1)
Call:
lm(formula = Volume ~ Girth, data = trees)

Residuals:
   Min     1Q Median     3Q    Max 
-8.065 -3.107  0.152  3.495  9.587 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
Girth         5.0659     0.2474   20.48  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.252 on 29 degrees of freedom
Multiple R-squared:  0.9353,    Adjusted R-squared:  0.9331 
F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
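The printed summary reports two different quantities, Multiple R-squared and Adjusted R-squared. As a small sketch, both can be pulled out of the summary object by name, which makes it easier to see which one is being quoted; the variable name fit_summary is only illustrative.

```r
# Refit the Volume ~ Girth model and keep its summary object
slr1 <- lm(Volume ~ Girth, data = trees)
fit_summary <- summary(slr1)

coef(slr1)                 # intercept and slope estimates
fit_summary$r.squared      # multiple R-squared
fit_summary$adj.r.squared  # adjusted R-squared (penalizes extra predictors)
```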

SLR 1 Plot

Model: \(\text{Volume} = -36.9435 + 5.0659 \times \text{Girth}\), adjusted \(R^2 = 0.9331\)

  • Plot: [Figure: Volume vs. Girth with the fitted line drawn by geom_smooth(), formula y ~ x]
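The plotting code for this slide is not shown in the source. A plot of this kind could be produced roughly as follows; this is a sketch that assumes ggplot2 is installed and that the line was drawn with method = "lm", which the source does not confirm. The object name slr1_plot is hypothetical.

```r
library(ggplot2)

# Scatter plot of the observations with a least-squares line overlaid.
# Passing formula = y ~ x explicitly avoids the informational message
# that geom_smooth() prints when the formula is left to its default.
slr1_plot <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +  # assumed method, see lead-in
  labs(title = "Volume vs. Girth", x = "Girth", y = "Volume")

slr1_plot
```

Swapping Girth for Height in aes() and the labels would give the corresponding plot for the second model.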

SLR 2: Volume vs. Height

Summary of the linear model for Volume vs. Height

slr2 <- lm(data=trees, Volume~Height)
summary(slr2)
Call:
lm(formula = Volume ~ Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-21.274  -9.894  -2.894  12.068  29.852 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -87.1236    29.2731  -2.976 0.005835 ** 
Height        1.5433     0.3839   4.021 0.000378 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 13.4 on 29 degrees of freedom
Multiple R-squared:  0.3579,    Adjusted R-squared:  0.3358 
F-statistic: 16.16 on 1 and 29 DF,  p-value: 0.0003784

SLR 2 Plot

Model: \(\text{Volume} = -87.1236 + 1.5433 \times \text{Height}\), adjusted \(R^2 = 0.3358\)

  • Plot: [Figure: Volume vs. Height with the fitted line drawn by geom_smooth(), formula y ~ x]

MLR: Volume vs. Girth, Height

Summary of the linear model for Volume vs. Girth, Height

mlr <- lm(data=trees, Volume~Girth+Height)
summary(mlr)
Call:
lm(formula = Volume ~ Girth + Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.4065 -2.6493 -0.2876  2.2003  8.4847 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -57.9877     8.6382  -6.713 2.75e-07 ***
Girth         4.7082     0.2643  17.816  < 2e-16 ***
Height        0.3393     0.1302   2.607   0.0145 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.882 on 28 degrees of freedom
Multiple R-squared:  0.948, Adjusted R-squared:  0.9442 
F-statistic:   255 on 2 and 28 DF,  p-value: < 2.2e-16
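To make the coefficients concrete, the sketch below predicts the volume of one tree two ways: by plugging the rounded estimates into the model equation, and with predict(). The tree (Girth 12, Height 75) is a made-up example, not an observation discussed in the source.

```r
mlr <- lm(Volume ~ Girth + Height, data = trees)

# Hypothetical tree, chosen only for illustration
new_tree <- data.frame(Girth = 12, Height = 75)

# By hand, using the rounded estimates from the summary table
by_hand <- -57.9877 + 4.7082 * 12 + 0.3393 * 75

# Using the fitted model object (full-precision coefficients)
by_predict <- predict(mlr, newdata = new_tree)

c(by_hand = by_hand, by_predict = unname(by_predict))
```

The two values should agree up to the rounding of the printed coefficients.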

MLR Plot Code

Model: \(\text{Volume} = -57.9877 + 4.7082 \times \text{Girth} + 0.3393 \times \text{Height}\), adjusted \(R^2 = 0.9442\)

library(plotly)  # provides plot_ly(), layout(), and the %>% pipe

# Axis titles for the 3-D scene
xax <- list(title = "Girth")
yax <- list(title = "Height")
zax <- list(title = "Volume")

# 3-D scatter plot of the observations, with markers colored by Volume
mlrplot <- plot_ly(data = trees, x = ~Girth, y = ~Height, z = ~Volume,
                   type = "scatter3d", mode = "markers",
                   color = ~Volume) %>%
  layout(title = "Volume vs. Girth, Height",
         scene = list(xaxis = xax, yaxis = yax, zaxis = zax))

MLR Plot

3-D plotly plot of Volume vs. Girth, Height:

mlrplot

Conclusion

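  • The Multiple Linear Regression model fits best: its adjusted \(R^2\) (0.9442) is higher than that of Volume vs. Girth (0.9331) or Volume vs. Height (0.3358). Adjusted \(R^2\) is the appropriate comparison here because plain \(R^2\) never decreases when a predictor is added.
  • Girth and Height together predict tree Volume better than either does alone, although Girth by itself already accounts for most of the variation.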
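The comparison behind these conclusions can be checked directly. The sketch below refits the three models, collects their adjusted \(R^2\) values, and runs a nested-model F-test with anova() to ask whether adding Height improves on Girth alone. The F-test is an addition for illustration; the source compares the models only by \(R^2\).

```r
slr1 <- lm(Volume ~ Girth, data = trees)
slr2 <- lm(Volume ~ Height, data = trees)
mlr  <- lm(Volume ~ Girth + Height, data = trees)

# Adjusted R-squared for each model, side by side
adj_r2 <- sapply(list(slr1 = slr1, slr2 = slr2, mlr = mlr),
                 function(m) summary(m)$adj.r.squared)
round(adj_r2, 4)

# Nested-model F-test: Girth only vs. Girth + Height
height_test <- anova(slr1, mlr)
height_test
```

With a single added predictor, the p-value of this F-test matches the p-value of the Height coefficient in the MLR summary.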