2024-04-08

Simple Linear Regression

  • Simple linear regression is a statistical method for estimating the relationship between two quantitative variables:

    • One variable is regarded as the predictor, explanatory, or independent variable.

    • The other is regarded as the response, outcome, or dependent variable.

  • The goal of simple linear regression is to predict the value of the dependent variable from the value of the independent variable.

The simple linear regression formula

The simple linear regression model can be described by the equation:

\[ y = \beta_0 + \beta_1 x + \varepsilon \]

Where:

  • y is the value of the dependent variable for a given value of the independent variable (x).
  • \(\beta_0\) is the intercept, the expected value of y when x is 0.
  • \(\beta_1\) is the regression coefficient (slope), the expected change in y for each one-unit increase in x.
  • x is the independent variable.
  • \(\varepsilon\) is the error term, the variation in y that the linear relationship with x does not explain.
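
To make the roles of the terms concrete, the following minimal sketch simulates data from the model and then fits it with `lm()`. The sample size, the "true" parameter values, and the error standard deviation are all illustrative assumptions, not values from these notes.

```r
# Simulate data from y = beta0 + beta1 * x + epsilon and recover the parameters.
# Every numeric value below is an illustrative assumption.
set.seed(42)                              # make the random draws reproducible
n     <- 200                              # number of observations
beta0 <- 2                                # assumed true intercept
beta1 <- 0.5                              # assumed true slope

x       <- runif(n, min = 0, max = 10)    # independent variable
epsilon <- rnorm(n, mean = 0, sd = 1)     # random error term
y       <- beta0 + beta1 * x + epsilon    # dependent variable

sim_fit <- lm(y ~ x)                      # least-squares fit
coef(sim_fit)                             # estimates should lie near beta0 and beta1
```

Because of the error term, the estimates will be close to, but not exactly equal to, the assumed parameters.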

Interpreting the Regression Line

  • The fitted simple linear regression line is \( \hat{y} = a + bx \), where a and b are the estimates of \(\beta_0\) and \(\beta_1\).
  • \(\hat{y}\) is the predicted value of y.
  • a is the intercept, the value of \(\hat{y}\) where the regression line crosses the y-axis (x = 0).
  • b is the slope, the predicted change in y for every one-unit change in x.
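
As a rough illustration of how a and b arise, the sketch below computes the least-squares estimates by hand for R's built-in `trees` data and compares them with `lm()`. Treating `Girth` as x and `Volume` as y is an assumption made for this example, and `predict_y` is a hypothetical helper name.

```r
# Least-squares estimates computed by hand for the built-in trees data.
# Assumption: Girth is the predictor (x) and Volume is the response (y).
x <- trees$Girth
y <- trees$Volume

b <- cov(x, y) / var(x)        # slope: change in y-hat per one-unit change in x
a <- mean(y) - b * mean(x)     # intercept: y-hat at x = 0

fit <- lm(Volume ~ Girth, data = trees)
coef(fit)                      # should agree with a and b
c(intercept = a, slope = b)

# Hypothetical helper: prediction from the fitted line.
predict_y <- function(x_new) a + b * x_new
predict_y(11) - predict_y(10)  # a one-unit increase in x changes y-hat by b
```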

Dataset trees

  Girth Height Volume
1   8.3     70   10.3
2   8.6     65   10.3
3   8.8     63   10.2
4  10.5     72   16.4
5  10.7     81   18.8
6  10.8     83   19.7
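
The listing above looks like the output of `head()` on R's built-in `trees` data set (31 felled black cherry trees, with girth in inches, height in feet, and volume in cubic feet). Assuming that is the source, it could be reproduced as follows:

```r
# trees ships with base R (package datasets), so no download is needed.
data(trees)
head(trees)   # first six rows: Girth, Height, Volume
str(trees)    # structure: 31 observations of 3 numeric variables
```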

Example with Scatter Plot
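
The plot itself is not preserved in these notes. A minimal sketch of such a scatter plot with ggplot2 might look like this; the choice of `Girth` for the x-axis and `Volume` for the y-axis is an assumption.

```r
library(ggplot2)

# Assumption: Girth on the x-axis and Volume on the y-axis.
p_scatter <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +                                   # one point per tree
  labs(x = "Girth (in)", y = "Volume (cubic ft)")
p_scatter
```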

Example with Scatter Plot with Fitted Line
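
One plausible way to add the fitted line, again assuming `Girth` and `Volume` are the plotted variables, is `geom_smooth()` with `method = "lm"`:

```r
library(ggplot2)

# Scatter plot of the trees data with the least-squares line drawn on top.
p_fit <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)  # se = FALSE hides the confidence band
p_fit
```

Passing `formula = y ~ x` explicitly suppresses the informational message that ggplot2 otherwise prints.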

Scatter Plot with Correlation Coefficient

`geom_smooth()` using formula = 'y ~ x'
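
The original plotting code is not shown. As a sketch under the assumption that the plot uses `Girth` and `Volume` from `trees`, the Pearson correlation coefficient can be computed with `cor()` and written onto the plot with `annotate()`; the source may have used a different helper for the label.

```r
library(ggplot2)

r       <- cor(trees$Girth, trees$Volume)   # Pearson correlation coefficient
r_label <- sprintf("r = %.2f", r)           # text shown on the plot

p_cor <- ggplot(trees, aes(x = Girth, y = Volume)) +
  geom_point() +
  geom_smooth(method = "lm") +              # no formula given, so ggplot2 reports 'y ~ x'
  annotate("text", x = min(trees$Girth), y = max(trees$Volume),
           label = r_label, hjust = 0)      # label in the top-left corner
p_cor
```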

Plot a Linear Regression Line in ggplot2

Call:
lm(formula = y ~ x, data = data)

Residuals:
    Min      1Q  Median      3Q     Max 
-1.4444 -0.8013 -0.2426  0.5978  2.2363 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  4.20041    0.56730   7.404 5.16e-06 ***
x            1.84036    0.07857  23.423 5.13e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.091 on 13 degrees of freedom
Multiple R-squared:  0.9769,    Adjusted R-squared:  0.9751 
F-statistic: 548.7 on 1 and 13 DF,  p-value: 5.13e-12
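
The numbers in this summary are linked by simple arithmetic. The sketch below copies the estimates and standard errors from the output above and recomputes the t values and the F-statistic; the input `x = 10` and the helper name `predict_y` are illustrative assumptions.

```r
# Values copied from the summary output above.
b0 <- 4.20041; se_b0 <- 0.56730   # intercept and its standard error
b1 <- 1.84036; se_b1 <- 0.07857   # slope and its standard error

t_b0   <- b0 / se_b0              # t value = estimate / standard error
t_b1   <- b1 / se_b1
f_stat <- t_b1^2                  # with one predictor, F equals the slope's t value squared

# Hypothetical helper: prediction from the fitted line y-hat = b0 + b1 * x.
predict_y <- function(x) b0 + b1 * x

round(c(t_b0 = t_b0, t_b1 = t_b1, F = f_stat, y_hat_10 = predict_y(10)), 3)
```

Small differences from the printed t values and F-statistic come from the rounding of the displayed estimates and standard errors.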

Visualize the fitted linear regression model:

`geom_smooth()` using formula = 'y ~ x'
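
The data frame behind the `lm(formula = y ~ x, data = data)` call is not included in these notes. The sketch below therefore builds a hypothetical 15-row stand-in (matching only the 13 residual degrees of freedom reported above) and draws the fitted line from the model's own coefficients with `geom_abline()`.

```r
library(ggplot2)

# Hypothetical stand-in for the source's `data` (15 rows, columns x and y).
set.seed(1)
data   <- data.frame(x = 1:15)
data$y <- 4 + 2 * data$x + rnorm(15)     # assumed relationship plus noise

model <- lm(y ~ x, data = data)          # same call as in the summary above

# Draw the line directly from the fitted intercept and slope.
p_model <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_abline(intercept = coef(model)[["(Intercept)"]],
              slope     = coef(model)[["x"]])
p_model
```

Using the model's coefficients keeps the plotted line tied to the same fit that `summary(model)` reports.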

Create a regression plot with a customized style

`geom_smooth()` using formula = 'y ~ x'
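
The original styling is not recoverable from these notes. As one possible sketch, the colours, point size, title, and theme below are arbitrary illustrative choices, and `data` is again a hypothetical stand-in for the source's data frame.

```r
library(ggplot2)

# Hypothetical stand-in data (15 rows, columns x and y).
set.seed(1)
data   <- data.frame(x = 1:15)
data$y <- 4 + 2 * data$x + rnorm(15)

p_styled <- ggplot(data, aes(x = x, y = y)) +
  geom_point(color = "steelblue", size = 3) +            # larger, coloured points
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE,
              color = "firebrick", fill = "grey80") +    # line colour and band fill
  labs(title = "Simple linear regression", x = "x", y = "y") +
  theme_minimal()                                        # lighter background and grid
p_styled
```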

Reference