2026-02-08

Intro

Using R's built-in trees data set: 31 black cherry trees with Girth (inches), Height (feet), and Volume (cubic feet).

library(dplyr)  # provides glimpse()
data(trees)
glimpse(trees)
Rows: 31
Columns: 3
$ Girth  <dbl> 8.3, 8.6, 8.8, 10.5, 10.7, 10.8, 11.0, 11.0, 11.1, 11.2, 11.3, …
$ Height <dbl> 70, 65, 63, 72, 81, 83, 66, 75, 80, 75, 79, 76, 76, 69, 75, 74,…
$ Volume <dbl> 10.3, 10.3, 10.2, 16.4, 18.8, 19.7, 15.6, 18.2, 22.6, 19.9, 24.…
summary(trees)
     Girth           Height       Volume     
 Min.   : 8.30   Min.   :63   Min.   :10.20  
 1st Qu.:11.05   1st Qu.:72   1st Qu.:19.40  
 Median :12.90   Median :76   Median :24.20  
 Mean   :13.25   Mean   :76   Mean   :30.17  
 3rd Qu.:15.25   3rd Qu.:80   3rd Qu.:37.30  
 Max.   :20.60   Max.   :87   Max.   :77.00  

LaTeX

Testing whether Girth is an adequate predictor of tree Volume using simple linear regression: \[ Y_i=\beta_0+\beta_1 X_i + \varepsilon_i \] where \(Y_i\) is the Volume and \(X_i\) the Girth of tree \(i\). In words: “Volume = intercept + slope * Girth + error”

Volume vs Girth

[1] "With fitted regression line"
`geom_smooth()` using formula = 'y ~ x'
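
The plot itself is not reproduced here. A minimal sketch of how such a scatter plot with a fitted line could be drawn, assuming ggplot2 and geom_smooth(method = "lm"). The method, the axis labels, and the object name p_scatter are assumptions; only the 'y ~ x' formula message above comes from the original output.

```r
library(ggplot2)

# Scatter plot of Volume against Girth with a least-squares line.
# method = "lm" is an assumption; the original chunk is not shown.
p_scatter <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", se = TRUE) +
  labs(x = "Girth (in)", y = "Volume (cubic ft)",
       title = "Volume vs Girth")

print(p_scatter)
```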

Fitting model

Call:
lm(formula = Volume ~ Girth, data = trees)

Residuals:
   Min     1Q Median     3Q    Max 
-8.065 -3.107  0.152  3.495  9.587 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
Girth         5.0659     0.2474   20.48  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.252 on 29 degrees of freedom
Multiple R-squared:  0.9353,    Adjusted R-squared:  0.9331 
F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16
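
The summary above can be reproduced and used for prediction roughly as follows. This is a sketch: the object names fit and new_tree, and the 12-inch example tree, are illustrative assumptions, not part of the original analysis.

```r
# Fit the simple linear regression shown in the Call line above.
fit <- lm(Volume ~ Girth, data = trees)

# Estimated intercept and slope.
coef(fit)

# Predicted Volume for a hypothetical tree with a 12-inch Girth.
new_tree <- data.frame(Girth = 12)
pred <- predict(fit, newdata = new_tree)
pred
```

By hand, from the rounded coefficients: -36.9435 + 5.0659 * 12 is about 23.85 cubic feet. The slope says each extra inch of Girth is associated with roughly 5.07 more cubic feet of Volume.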

ggplot 2 residuals
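
The residual plot is not reproduced here. One way it might be drawn, assuming a residuals-versus-fitted layout in ggplot2 (the layout and the names diag_df and p_resid are assumptions):

```r
library(ggplot2)

fit <- lm(Volume ~ Girth, data = trees)

# Collect fitted values and residuals in one data frame.
diag_df <- data.frame(fitted = fitted(fit), resid = resid(fit))

# Residuals vs fitted values; a flat, patternless band around zero
# supports the linearity and constant-variance assumptions.
p_resid <- ggplot(diag_df, aes(x = fitted, y = resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed") +
  labs(x = "Fitted Volume", y = "Residual")

print(p_resid)
```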

3d plotly Girth, Height, Volume
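
The interactive figure does not survive in this rendering. A minimal sketch of a 3D scatter of the three variables, assuming the plotly package (the marker mode and the name p_3d are assumptions):

```r
library(plotly)

# Interactive 3D scatter of the three measured variables.
p_3d <- plot_ly(trees, x = ~Girth, y = ~Height, z = ~Volume,
                type = "scatter3d", mode = "markers")

p_3d
```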

LaTeX 2 Slope test

Hypothesis test: \[ H_0: \beta_1 = 0 \quad \text{vs} \quad H_a: \beta_1 \neq 0 \] Test statistic: \[ t = \frac{\hat{\beta}_1}{SE(\hat{\beta}_1)} \] p-value: \[ p < 2 \times 10^{-16} \] “Because the p-value is very small, we reject the null hypothesis and conclude that Girth is a significant linear predictor of Volume.”
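
The slope test can be checked directly from the fitted model. A sketch, with the names coefs, t_stat, resid_df, and p_val chosen here for illustration:

```r
fit <- lm(Volume ~ Girth, data = trees)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|).
coefs <- coef(summary(fit))

# t statistic for the slope: estimate divided by its standard error.
t_stat <- coefs["Girth", "Estimate"] / coefs["Girth", "Std. Error"]

# Residual degrees of freedom: n - 2 = 29.
resid_df <- df.residual(fit)

# Two-sided p-value from the t distribution.
p_val <- 2 * pt(abs(t_stat), df = resid_df, lower.tail = FALSE)

c(t = t_stat, df = resid_df, p = p_val)
```

By hand, from the summary table: 5.0659 / 0.2474 is about 20.48 on 29 degrees of freedom, matching the t value reported for Girth.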