- Simple Linear Regression is used to model the relationship between two variables.
- It helps predict the dependent variable (Y) based on the independent variable (X).
- Example: Predicting Tree Volume using Girth.
2025-03-25
We model the relationship using the equation:
\[ Y = \beta_0 + \beta_1 X + \epsilon \]
The least-squares estimates of the regression coefficients are:
\[ \hat{\beta}_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} \]
\[ \hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x} \]
These estimates define the line that minimizes the sum of squared vertical distances (residuals) between the observed and fitted values of Y.
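
The two formulas can be evaluated directly, which is a useful check on what `lm()` does internally. The following is a minimal sketch, assuming the built-in `trees` data frame that the rest of this document uses; the names `x`, `y`, `b0`, and `b1` are illustrative and not from the original.

```r
# Built-in dataset: Girth (inches), Height (feet), Volume (cubic feet)
x <- trees$Girth
y <- trees$Volume

# Slope: sum of cross-deviations divided by sum of squared deviations in x
b1 <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: makes the fitted line pass through (mean(x), mean(y))
b0 <- mean(y) - b1 * mean(x)

c(intercept = b0, slope = b1)
```

The two values should agree with the `(Intercept)` and `Girth` estimates reported by `lm(Volume ~ Girth, data = trees)`.
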
| Girth | Height | Volume |
|---|---|---|
| 8.3 | 70 | 10.3 |
| 8.6 | 65 | 10.3 |
| 8.8 | 63 | 10.2 |
| 10.5 | 72 | 16.4 |
| 10.7 | 81 | 18.8 |
| 10.8 | 83 | 19.7 |
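
The rows above match the first six observations of R's built-in `trees` data frame, in which Girth is recorded in inches, Height in feet, and Volume in cubic feet. A minimal sketch for inspecting the data, assuming nothing beyond base R:

```r
# trees ships with base R (datasets package), so no download is needed
data(trees)

head(trees)  # first six rows: Girth, Height, Volume
str(trees)   # structure: 31 observations of 3 numeric variables
```
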
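
Before fitting, it helps to look at Volume against Girth with the candidate line drawn on top. The following is a base-graphics sketch, not necessarily how the original figure was produced; a similar picture can be drawn with ggplot2 using `geom_point()` and `geom_smooth(method = "lm")`. The name `fit` is illustrative.

```r
# Scatter plot of the response against the predictor
plot(Volume ~ Girth, data = trees,
     xlab = "Girth (inches)", ylab = "Volume (cubic feet)",
     main = "Tree volume vs. girth")

# Fit the simple linear regression and overlay the fitted line
fit <- lm(Volume ~ Girth, data = trees)
abline(fit, col = "blue", lwd = 2)
```
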

    # Fit a linear model
    lm_model <- lm(Volume ~ Girth, data = trees)

    # Model summary
    summary(lm_model)


    ## 
    ## Call:
    ## lm(formula = Volume ~ Girth, data = trees)
    ## 
    ## Residuals:
    ##    Min     1Q Median     3Q    Max 
    ## -8.065 -3.107  0.152  3.495  9.587 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept) -36.9435     3.3651  -10.98 7.62e-12 ***
    ## Girth         5.0659     0.2474   20.48  < 2e-16 ***
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 4.252 on 29 degrees of freedom
    ## Multiple R-squared:  0.9353, Adjusted R-squared:  0.9331 
    ## F-statistic: 419.4 on 1 and 29 DF,  p-value: < 2.2e-16

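
In the summary, the `Girth` estimate of about 5.07 means that each additional inch of girth is associated with roughly 5.07 more cubic feet of volume, and the R-squared of 0.9353 means that girth accounts for about 94% of the variation in volume in this sample. The individual quantities can also be pulled out of the fitted object; the sketch below uses standard base R accessors, and the names `estimates`, `r2`, and `ci` are illustrative.

```r
lm_model <- lm(Volume ~ Girth, data = trees)

# Point estimates of the intercept and slope
estimates <- coef(lm_model)

# Proportion of variance in Volume explained by Girth
r2 <- summary(lm_model)$r.squared

# 95% confidence intervals for both coefficients
ci <- confint(lm_model, level = 0.95)

estimates
r2
ci
```

The negative intercept is an extrapolation to a girth of zero, far outside the observed data, so it should not be read as a physical volume.
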

    ##   Girth Predicted_Volume
    ## 1    10         13.71511
    ## 2    12         23.84682
    ## 3    15         39.04439

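
The code that produced the predictions above is not shown in the source. One way to obtain a table of this shape is `predict()` with a new data frame, sketched below; the name `new_data` is an assumption, while the column name `Predicted_Volume` is taken from the output.

```r
lm_model <- lm(Volume ~ Girth, data = trees)

# Girth values (in inches) at which to predict volume
new_data <- data.frame(Girth = c(10, 12, 15))

# predict() evaluates the fitted line at each new Girth value
new_data$Predicted_Volume <- predict(lm_model, newdata = new_data)

new_data
```

Each prediction is the fitted line evaluated at the given girth; for example, at a girth of 10 the result is approximately -36.9435 + 5.0659 × 10 ≈ 13.72.
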