2024-03-17

Introduction

  • Simple linear regression is a statistical method used to model the relationship between two variables: an independent variable (X) and a variable (Y) that may depend on it.

  • It assumes that there is a linear relationship between X and Y, which can be represented by a straight line.

  • The strength of the linear relationship between the two variables is reflected in how closely the fitted regression line matches the actual data points.

Model Representation

The simple linear regression model can be represented as:

\[ y = \alpha + \beta x \]

Where:
- \(y\) = Dependent variable
- \(x\) = Independent variable
- \(\alpha\) = Intercept
- \(\beta\) = Slope
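
As a brief aside, the intercept and slope are commonly estimated by ordinary least squares. The text above does not name an estimation method, so this is an assumption, although it is the standard choice. Writing \(\bar{x}\) and \(\bar{y}\) for the sample means of the \(n\) observations \((x_i, y_i)\) (notation introduced here for illustration), the estimates are:

\[ \hat{\beta} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}, \qquad \hat{\alpha} = \bar{y} - \hat{\beta}\,\bar{x} \]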

Coefficient of Determination

  • The coefficient of determination (\(R^2\)) measures the proportion of the variance in the dependent variable (\(Y\)) that can be explained by the independent variable (\(X\)).

  • It indicates how well the linear regression model represents the actual data.

  • It ranges from 0 to 1: the closer \(R^2\) is to 0, the less of the variation the model explains, and the closer it is to 1, the more. \(R^2 = 1\) would indicate that all of the variation in the dependent variable can be explained by the independent variable.

\[ R^2 = \frac{{\text{Explained Variance}}}{{\text{Total Variance}}} \]
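
The ratio above can be sketched in R on a small toy data set. The values of `x` and `y` below are hypothetical and chosen only to keep the arithmetic simple. The sketch uses sums of squares rather than variances, which gives the same ratio because the common divisor cancels.

```r
# Hypothetical toy data, chosen only so the arithmetic is easy to follow
x <- c(1, 2, 3, 4, 5)
y <- c(2, 4, 5, 4, 5)

# Fit y = alpha + beta * x by least squares
fit <- lm(y ~ x)

# Total variation of y around its mean
ss_total <- sum((y - mean(y))^2)

# Variation captured by the fitted line
ss_explained <- sum((fitted(fit) - mean(y))^2)

# R^2 = explained variation / total variation
r_squared <- ss_explained / ss_total

# Value reported by R for the same fit, for comparison
r_squared_lm <- summary(fit)$r.squared

print(c(manual = r_squared, lm = r_squared_lm))
```

For this toy data both values come out at 0.6, meaning the fitted line accounts for 60% of the variation in `y`.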

Linear Regression Example

Let us create a data frame to illustrate the concept of linear regression. In this case, we use sample height (x) and weight (y) values.

# Sample data
x <- c(60,60,61,63,65,65,65,67,68,69,70,70,71,72)
y <- c(120,110,120,125,135,175,160,115,150,155,200,175,180,220)

# Create data frame
data <- data.frame(x = x, y = y)

The aim of this example is to show a typical linear regression with a moderate \(R^2\) of around 0.66.
Afterwards, we will manipulate the data for a new example to demonstrate a poor fit, with \(R^2 = 0.16\).
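
A minimal sketch of how the fit and its \(R^2\) might be obtained for the sample data, assuming base R's `lm()` is used (the original fitting code is not shown). The names `model`, `coefs`, and `r2` are chosen here for illustration.

```r
# Sample data from the example above
x <- c(60,60,61,63,65,65,65,67,68,69,70,70,71,72)
y <- c(120,110,120,125,135,175,160,115,150,155,200,175,180,220)
data <- data.frame(x = x, y = y)

# Fit the simple linear regression y = alpha + beta * x
model <- lm(y ~ x, data = data)

# Estimated intercept (alpha) and slope (beta)
coefs <- coef(model)
print(coefs)

# Coefficient of determination for this fit
r2 <- summary(model)$r.squared
print(r2)
```

The printed \(R^2\) should land close to the moderate value quoted for this example.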

Linear Regression Example

[Figure: plot of the sample data with a line fitted by `geom_smooth()` using formula y ~ x]
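
A minimal sketch of how such a plot might be produced with ggplot2. The original plotting code is not shown, so the use of `geom_point()`, the arguments `method = "lm"` and `se = FALSE`, and the title are all assumptions.

```r
library(ggplot2)

# Sample data from the example above
x <- c(60,60,61,63,65,65,65,67,68,69,70,70,71,72)
y <- c(120,110,120,125,135,175,160,115,150,155,200,175,180,220)
data <- data.frame(x = x, y = y)

# Scatter plot with a least-squares line.
# method = "lm" and se = FALSE are assumptions, not taken from the source.
p <- ggplot(data, aes(x = x, y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Linear Regression Example")

print(p)
```

Leaving the `formula` argument unset lets `geom_smooth()` fall back to its default of `y ~ x`.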

Linear Regression with Poor Fit

[Figure: plot of the modified data with a line fitted by `geom_smooth()` using formula y ~ x]

Linear Regression with Perfect Fit

Linear regression performed on data for which the coefficient of determination is \(R^2 = 1\).
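
The data behind this example are not shown, so the sketch below builds a hypothetical response `y_perfect` that lies exactly on a line (the intercept -300 and slope 7 are illustrative values only). It computes \(R^2\) as one minus the ratio of residual to total variation, which for a least-squares fit with an intercept equals the explained-to-total ratio. Doing the calculation by hand also avoids the warning that `summary()` may raise for an essentially perfect fit.

```r
# x values reused from the sample data above
x <- c(60,60,61,63,65,65,65,67,68,69,70,70,71,72)

# Hypothetical responses lying exactly on a line (illustrative values only)
y_perfect <- -300 + 7 * x

# Fit y = alpha + beta * x by least squares
fit <- lm(y_perfect ~ x)

# Variation left unexplained by the fitted line (essentially zero here)
ss_residual <- sum(residuals(fit)^2)

# Total variation of the response around its mean
ss_total <- sum((y_perfect - mean(y_perfect))^2)

# R^2 = 1 - residual variation / total variation
r2_perfect <- 1 - ss_residual / ss_total
print(r2_perfect)
```

Because every point sits on the fitted line, the residuals vanish up to floating-point error and \(R^2\) equals 1.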