What is Linear Regression?

Linear Regression is a statistical method used to model the relationship between a dependent variable and one or more independent variables. The goal of linear regression is to find the best-fitting linear equation that describes how the dependent variable changes as the independent variables vary.

Simple Linear Regression is the special case of linear regression that involves one dependent variable and exactly one independent variable. Multiple Linear Regression, by contrast, deals with more than one independent variable.

Presentation Overview

The following set of slides will explain the process of simple linear regression in this order:

  1. Data Summary
  2. Data Plot
  3. Simple Linear Regression Formula
  4. Simple Linear Regression model
  5. Regression Line Plot (will display an additional plot)

Summary of Dataset trees

data(trees)   # load the built-in trees dataset
head(trees)   # show the first six rows
  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7
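
The first six rows give only a glimpse of the data. As a minimal sketch using base R (not part of the original slides), the dimensions and per-column summary statistics can be inspected as follows:

```r
data(trees)

# Number of observations (rows) and variables (columns)
dim(trees)

# Minimum, quartiles, mean and maximum of each column
summary(trees)
```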

Dataset Scatterplot
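
A scatterplot of this kind can be drawn with base graphics. This is a minimal sketch, assuming the plot shows Girth against Height (the pair used in the model later in the slides); the title, axis labels and point style are illustrative choices.

```r
data(trees)

# Assumed pairing: Height on the x-axis, Girth on the y-axis
plot(trees$Height, trees$Girth,
     xlab = "Height", ylab = "Girth",
     main = "Girth vs. Height (trees)",
     pch = 19)
```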

Simple Linear Regression Formula

The formula for simple linear regression is: \(y = \beta_0 + \beta_1 x + \epsilon\)

Where:

  1. \(y\) is the dependent variable
  2. \(x\) is the independent variable
  3. \(\beta_0\) is the intercept
  4. \(\beta_1\) is the slope
  5. \(\epsilon\) is the error term
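
To make the role of each term concrete, the sketch below simulates data from this formula. The values of `beta0`, `beta1`, the noise standard deviation and the sample size are hypothetical, chosen only for illustration.

```r
set.seed(1)                       # make the random noise reproducible

beta0 <- 2                        # hypothetical intercept
beta1 <- 0.5                      # hypothetical slope
n     <- 50                       # hypothetical sample size

x       <- seq(1, 10, length.out = n)    # independent variable
epsilon <- rnorm(n, mean = 0, sd = 1)    # error term
y       <- beta0 + beta1 * x + epsilon   # dependent variable

# Fitting a line to the simulated data should give estimates
# close to (but not exactly equal to) beta0 and beta1
coef(lm(y ~ x))
```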

Simple Linear Regression Formula

To estimate the intercept \(\beta_0\) and slope \(\beta_1\) in simple linear regression, we use the following formulas:

Formula for \(\beta_1\) (Slope):

\(\beta_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}\)

Where:

  • \(x_i\) and \(y_i\) are the observed values of the independent and dependent variables.
  • \(\bar{x}\) and \(\bar{y}\) are the means of the independent and dependent variables.

Formula for \(\beta_0\) (Intercept):

\(\beta_0 = \bar{y} - \beta_1 \bar{x}\)

Where:

  • \(\beta_1\) is the slope, \(\bar{x}\) is the mean of \(x\), and \(\bar{y}\) is the mean of \(y\).

These formulas give the least-squares line: the line that best fits the data by minimizing the sum of squared differences between the observed and predicted values.
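
The two formulas can be applied directly in R. The sketch below assumes Height as the independent variable and Girth as the dependent variable from the `trees` dataset; the names `beta0_hat` and `beta1_hat` are introduced here for illustration. The result is compared with the coefficients returned by `lm()`.

```r
data(trees)

x <- trees$Height   # independent variable (assumed choice)
y <- trees$Girth    # dependent variable (assumed choice)

x_bar <- mean(x)
y_bar <- mean(y)

# Slope: sum of cross-deviations divided by sum of squared x-deviations
beta1_hat <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)

# Intercept: forces the line through the point (x_bar, y_bar)
beta0_hat <- y_bar - beta1_hat * x_bar

c(intercept = beta0_hat, slope = beta1_hat)

# The built-in fit should agree with the hand computation
coef(lm(Girth ~ Height, data = trees))
```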

Simple Linear Regression model

model <- lm(Girth ~ Height, data = trees)   # regress Girth on Height
summary(model)                              # coefficients, residuals and fit statistics
Call:
lm(formula = Girth ~ Height, data = trees)

Residuals:
    Min      1Q  Median      3Q     Max 
-4.2386 -1.9205 -0.0714  2.7450  4.5384 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)   
(Intercept) -6.18839    5.96020  -1.038  0.30772   
Height       0.25575    0.07816   3.272  0.00276 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.728 on 29 degrees of freedom
Multiple R-squared:  0.2697,    Adjusted R-squared:  0.2445 
F-statistic: 10.71 on 1 and 29 DF,  p-value: 0.002758
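
The fitted model can also be queried programmatically. This is a minimal sketch; the height of 75 is a hypothetical input chosen for illustration. It extracts the coefficients, predicts Girth for that height, and checks that, in simple linear regression, R-squared equals the squared correlation between the two variables.

```r
data(trees)
model <- lm(Girth ~ Height, data = trees)

# Estimated intercept and slope
coef(model)

# Predicted Girth for a hypothetical tree with Height = 75
pred <- predict(model, newdata = data.frame(Height = 75))
pred

# With one predictor, R-squared is the squared Pearson correlation
r2_model <- summary(model)$r.squared
r2_cor   <- cor(trees$Girth, trees$Height)^2
c(r2_model, r2_cor)
```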

Simple Linear Regression Plot (Height vs. Girth)

The following plot demonstrates that simple linear regression can be performed on any two variables within the dataset, not just the pairing shown previously.
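
A plot of this kind could be produced along the following lines. This is a sketch, not the original slide code: `plot_simple_lm` is a hypothetical helper introduced here, and the choice of Height on the y-axis and Girth on the x-axis is an assumption based on the slide title.

```r
data(trees)

# Hypothetical helper: fit y ~ x for two named columns,
# draw the scatterplot, and overlay the fitted regression line
plot_simple_lm <- function(data, x, y) {
  fit <- lm(reformulate(x, response = y), data = data)
  plot(data[[x]], data[[y]],
       xlab = x, ylab = y,
       main = paste("Simple Linear Regression:", y, "vs.", x))
  abline(fit, col = "red", lwd = 2)   # line from the fitted intercept and slope
  invisible(fit)                      # return the model without printing it
}

# Assumed pairing for this slide: Height as a function of Girth
fit_hg <- plot_simple_lm(trees, x = "Girth", y = "Height")
```

Passing other column names, for example `x = "Girth", y = "Volume"`, fits and plots a different pair with the same helper.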