2022-11-04

Simple Linear Regression

This is an R Markdown presentation on simple linear regression

Simple Linear Regression Defined

  • Linear regression models the relationship between variables and lets us estimate its direction and strength
  • The relationship can involve two or more variables
  • When there is just one independent and one dependent variable, it is known as “simple” linear regression

The Line of Best Fit

  • A common tool of linear regression is the “line of best fit”
  • The line of best fit is the straight line that minimizes the sum of squared vertical distances between itself and the data points
  • Its formula is the familiar equation of a straight line, \(y = mx + b\), where \(m\) is the slope and \(b\) is the y-intercept

Linear Regression Formulas

  • \(\bar{x}\) is the mean of the x-values
  • \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\), where \(n\) is the number of data points
  • \(\bar{y}\) is the mean of the y-values. It is calculated the same way as \(\bar{x}\)

Linear Regression Formulas cont.

  • The formula for the slope of the line of best fit is given below:
  • \(m = \frac{ \sum_{i=1}^{n} (x_i-\bar{x})(y_i-\bar{y}) }{ \sum_{i=1}^{n} (x_i-\bar{x})^2 }\)
  • The y-intercept of the line is found with the formula: \(b = \bar{y}-m\bar{x}\)
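
The slope and intercept formulas above translate directly into a few lines of R. The following is a minimal sketch: the function name `fit_line` and the toy vectors are hypothetical, chosen only for illustration.

```r
# Hypothetical helper: least-squares slope and intercept from the formulas above
fit_line <- function(x, y) {
  x_bar <- mean(x)  # (1/n) * sum of the x-values
  y_bar <- mean(y)  # (1/n) * sum of the y-values
  m <- sum((x - x_bar) * (y - y_bar)) / sum((x - x_bar)^2)
  b <- y_bar - m * x_bar
  c(slope = m, intercept = b)
}

# Toy data lying exactly on y = 2x + 1 (illustrative values only)
x <- c(1, 2, 3, 4)
y <- c(3, 5, 7, 9)
fit <- fit_line(x, y)
print(fit)
```

Because the toy points fall exactly on a line, the function should recover that line's slope and intercept; with real, noisy data it returns the least-squares estimates instead.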

Creating an Example Dataframe

  • The example below applies simple linear regression to a dataset of people’s heights and weights
##    height weight
## 1      66    143
## 2      68    170
## 3      68    162
## 4      71    171
## 5      73    173
## 6      62    119
## 7      63    127
## 8      64    141
## 9      72    191
## 10     68    184
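
The printed table can be re-created with `data.frame()`, and R's built-in `lm()` fits the line of best fit to it. This is a sketch of one way to do it, not necessarily how the slides built the data; the variable names `fit`, `m`, and `b` are assumptions.

```r
# Recreate the "people" data frame from the values printed above
people <- data.frame(
  height = c(66, 68, 68, 71, 73, 62, 63, 64, 72, 68),
  weight = c(143, 170, 162, 171, 173, 119, 127, 141, 191, 184)
)

# Fit weight as a linear function of height
fit <- lm(weight ~ height, data = people)
coef(fit)  # named vector: "(Intercept)" and "height" (the slope)

# Cross-check against the hand formulas for the slope and intercept
m <- with(people, sum((height - mean(height)) * (weight - mean(weight))) /
                  sum((height - mean(height))^2))
b <- mean(people$weight) - m * mean(people$height)
all.equal(unname(coef(fit)), c(b, m))
```

The final line compares the coefficients from `lm()` with the values computed from the summation formulas, which is a quick way to confirm that both routes describe the same line.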

Simple Plot of “people”

  • Below is a simple plot of our “people” dataset
  • Height is the independent variable and weight is the dependent variable
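
A scatter plot like this can be drawn with base R graphics. The sketch below is an assumption about how such a plot might be produced, not the slides' original code; the data frame is re-created so the snippet runs on its own, and the axis labels are illustrative.

```r
# "people" as printed earlier in the slides
people <- data.frame(
  height = c(66, 68, 68, 71, 73, 62, 63, 64, 72, 68),
  weight = c(143, 170, 162, 171, 173, 119, 127, 141, 191, 184)
)

# Scatter plot: independent variable on the x-axis, dependent on the y-axis
plot(people$height, people$weight, xlab = "Height", ylab = "Weight")

# Overlay the least-squares line fitted by lm()
abline(lm(weight ~ height, data = people))
```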

Plotting “people” with ggplot

library(ggplot2)
# Scatter plot with a least-squares line; se=FALSE hides the confidence band
ggplot(people, aes(height, weight)) + geom_point() + 
  geom_smooth(method="lm", se=FALSE)
## `geom_smooth()` using formula = 'y ~ x'

ggplot with Labels, Colors and Error Band

ggplot(people, aes(height, weight)) + geom_point() + 
  # se defaults to TRUE, so the confidence band around the line is drawn
  geom_smooth(method="lm", color="red") + 
  labs(x="Height", y="Weight", title="Simple Linear Regression") + 
  theme(plot.title = element_text(colour="red", face="bold"))
## `geom_smooth()` using formula = 'y ~ x'