Linear Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The goal of linear regression is to find the best-fitting linear equation that describes how the dependent variable changes as the independent variables vary.
Simple Linear Regression is the special case of linear regression that involves one dependent variable and exactly one independent variable. As noted above, there is also Multiple Linear Regression, which deals with more than one independent variable.
The following set of slides will explain the process of simple linear regression in this order:
trees
data(trees)
head(trees)
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7
The formula for simple linear regression is: \(y = \beta_0 + \beta_1 x + \epsilon\)
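Where \(y\) is the dependent variable, \(x\) is the independent variable, \(\beta_0\) is the intercept, \(\beta_1\) is the slope, and \(\epsilon\) is the error term.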
To estimate the intercept \(\beta_0\) and slope \(\beta_1\) in simple linear regression, we use the following formulas:
Formula for \(\beta_1\) (Slope):
\(\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\)
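Where \(x_i\) and \(y_i\) are the individual observations of the independent and dependent variables, and \(\bar{x}\) and \(\bar{y}\) are their sample means.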
Formula for \(\beta_0\) (Intercept):
\(\beta_0 = \bar{y} - \beta_1 \bar{x}\)
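Where \(\bar{y}\) and \(\bar{x}\) are the sample means of the dependent and independent variables, and \(\beta_1\) is the slope estimated with the previous formula.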
These formulas give the line that best fits the data in the least-squares sense: they minimize the sum of squared differences between the observed and predicted values.
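As a minimal sketch, the two formulas can be applied by hand in R. The sketch assumes Girth as the dependent variable and Height as the independent variable (the pairing used in the model fitted in these slides); the names `x`, `y`, `x_bar`, `y_bar`, `b0`, and `b1` are illustrative, not part of the original material.

```r
# Estimate slope and intercept directly from the formulas.
# Assumption: Girth is the response (y) and Height is the predictor (x).
data(trees)
x <- trees$Height
y <- trees$Girth

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by the sum of squared deviations of x
b1 <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: chosen so the line passes through the point (x_bar, y_bar)
b0 <- y_bar - b1 * x_bar

# Compare the hand-computed values with the estimates from lm()
fit <- lm(Girth ~ Height, data = trees)
print(c(intercept = b0, slope = b1))
print(coef(fit))
```

Both printed vectors should agree up to floating-point error, since `lm()` solves the same least-squares problem.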
model <- lm(Girth ~ Height, data = trees)
summary(model)
Call:
lm(formula = Girth ~ Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max
-4.2386 -1.9205 -0.0714  2.7450  4.5384

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -6.18839    5.96020  -1.038  0.30772
Height       0.25575    0.07816   3.272  0.00276 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.728 on 29 degrees of freedom
Multiple R-squared:  0.2697,	Adjusted R-squared:  0.2445
F-statistic: 10.71 on 1 and 29 DF,  p-value: 0.002758
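The quantities reported in this summary can also be pulled out of the fitted object programmatically. The following is a sketch rather than part of the original slides; the 75-unit height and the names `r2`, `new_tree`, and `pred` are hypothetical choices for illustration.

```r
# Refit the same model so this block runs on its own
data(trees)
model <- lm(Girth ~ Height, data = trees)

# Estimated intercept and slope (the "Estimate" column of the summary)
print(coef(model))

# Multiple R-squared; with a single predictor it equals the squared
# correlation between the two variables
r2 <- summary(model)$r.squared
print(all.equal(r2, cor(trees$Girth, trees$Height)^2))

# Predicted girth for a hypothetical tree with Height = 75
new_tree <- data.frame(Height = 75)
pred <- predict(model, newdata = new_tree)
print(pred)
```

With the coefficients shown above, the prediction works out to roughly -6.19 + 0.256 × 75, or about 13. An R-squared of about 0.27 means Height alone explains only around a quarter of the variation in Girth, even though the slope is statistically significant.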
The following plot demonstrates that simple linear regression can be performed on any two variables in the dataset, not just the two shown previously.
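A minimal sketch of how such a plot could be produced in base R. The pairing of Volume (dependent) and Girth (independent) is an assumption made for illustration, as are the name `model2` and the plot labels; any other pair of columns from `trees` would work the same way.

```r
data(trees)

# Assumed pairing for illustration: Volume as response, Girth as predictor
model2 <- lm(Volume ~ Girth, data = trees)

# Scatter plot of the raw data
plot(trees$Girth, trees$Volume,
     xlab = "Girth", ylab = "Volume",
     main = "Volume vs. Girth with fitted regression line")

# Overlay the fitted least-squares line
abline(model2, col = "red", lwd = 2)
```

`abline()` accepts the fitted `lm` object directly and draws the line defined by its intercept and slope.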