2023-03-14

What is Simple Linear Regression?

Simple Linear Regression is a method for estimating the linear relationship between two quantitative variables, usually shown as a straight line on a graph.

Example:

\(y = \beta_1X + \beta_0\)
\(y = 5.06X + 14.8\)
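
To make the example line concrete, it can be evaluated at a few values of X. This is only a small sketch: the helper name `predict_y` and the X values are made up for illustration and are not part of the original example.

```r
# Example line from the text: y = 5.06X + 14.8
beta_1 <- 5.06   # slope
beta_0 <- 14.8   # intercept

# Hypothetical helper: returns y for a given X on this line
predict_y <- function(X) beta_1 * X + beta_0

# Hypothetical X values, chosen only for illustration
X <- c(0, 1, 10)
y <- predict_y(X)
print(data.frame(X = X, y = y))
```

At X = 0 the result is the intercept, 14.8, and every one-unit increase in X adds 5.06 to y.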

Equation

The formula usually used for Simple Linear Regression is: \(y = \beta_0 + \beta_1X + \epsilon\)

Meaning Of Each Variable:

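\(y\): Dependent Variable, the value being predicted for any individual X
\(\beta_0\): Intercept, the predicted value of y when X = 0
\(\beta_1\): Regression Coefficient (slope), the change in the predicted y for a one-unit increase in X
\(X\): Independent Variable
\(\epsilon\): Error Term, the difference between the observed y and the value predicted by the line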
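
Each term of the formula can be read off a fitted model in R. The sketch below assumes the built-in `women` data set and the base R function `lm()`; the variable names `beta_0`, `beta_1`, and `errors` are chosen here only to mirror the symbols above.

```r
# Assumption: the built-in `women` data set (columns height and weight)
data(women)

# Fit weight = beta_0 + beta_1 * height + error
fit <- lm(weight ~ height, data = women)

beta_0 <- coef(fit)[["(Intercept)"]]  # estimated intercept
beta_1 <- coef(fit)[["height"]]       # estimated regression coefficient (slope)
errors <- residuals(fit)              # estimated error for each observation

# Each observed y equals the value on the fitted line plus its residual
rebuilt <- beta_0 + beta_1 * women$height + errors
print(all.equal(unname(rebuilt), women$weight))
```

The last line checks that adding the residuals back onto the fitted line recovers the observed weights, which is what the formula says.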

Why Do We Need It?

Simple Linear Regression is used to describe the relationship between two variables and how strong it is. It can also show how much the dependent variable changes as the independent variable changes.

Example:

Orange Tree Age Vs. Circumference
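
The strength of the relationship in the orange tree example can also be put into numbers. This is a minimal sketch, assuming the example is based on R's built-in `Orange` data set (columns `age` and `circumference`), which the text does not state explicitly.

```r
# Assumption: the example is based on R's built-in `Orange` data set
data(Orange)

# Pearson correlation between tree age and trunk circumference
r <- cor(Orange$age, Orange$circumference)

# R-squared of the simple linear regression;
# with a single predictor it equals the squared correlation
fit <- lm(circumference ~ age, data = Orange)
r_squared <- summary(fit)$r.squared

print(c(r = r, r_squared = r_squared))
```

Values of `r` close to 1 or -1 indicate a strong linear relationship, and values close to 0 indicate a weak one.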

The Details

When using ggplot2 in default settings…

  • Black points on the graph represent the individual data points from the data set
  • The blue line is the Simple Linear Regression Line, also known as the “Line of Best Fit”
  • The grey area around the line is the Standard Error band, by default a 95% confidence interval around the line. A narrower band means there is less uncertainty about where the line lies
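
The grey band can be approximated outside of ggplot2 with base R's `predict()`. The sketch below again assumes the built-in `Orange` data set; the ages in `new_ages` are hypothetical values picked for illustration, and the 95% level is written out to match the ggplot2 default.

```r
# Assumption: the built-in `Orange` data set (age in days, circumference in mm)
data(Orange)
fit <- lm(circumference ~ age, data = Orange)

# Hypothetical ages at which to evaluate the line and its band
new_ages <- data.frame(age = c(500, 1000, 1500))

# Columns: fit (the line), lwr and upr (lower and upper edge of the 95% band)
band <- predict(fit, newdata = new_ages, interval = "confidence", level = 0.95)
print(cbind(new_ages, band))
```

The gap between `lwr` and `upr` at each age corresponds to the height of the grey area at that point on the graph.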

How to Graph:

Women Height vs Weight

library(ggplot2)  # load the plotting package
data(women)       # load the built-in "women" data set
head(women)       # preview the first rows (columns: height, weight)

# Scatter plot of weight against height with a fitted regression line
ggplot(women, aes(height, weight)) +
  geom_point() +
  geom_smooth(method = 'lm', se = TRUE)

  • Here we graph the data using ggplot2, so we first load it with library()
  • Next, load the data set we are using, in this case the “women” data set
  • head() previews the first few rows so we can check the column names
  • Then we use ggplot() to set up the graph by specifying the data set and the column for each axis
  • geom_point() plots the points on the graph
  • geom_smooth() graphs the Simple Linear Regression line (method = 'lm') and the Standard Error band (se = TRUE)

Example:

In this graph, the grey area is very small and sits very close to the Simple Linear Regression Line, meaning that the two variables, Women’s Height and Weight, are closely related.

Example:

Number of Murders Vs. Urban Population

In this graph, the grey area is very large, meaning that the two variables, Number of Murders and Urban Population, aren’t closely related. We can also conclude this from the positions of the data points, which are spread out widely instead of following the line.
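
The visual difference between the two graphs can be checked with R-squared. This is a sketch under two assumptions: the first graph uses the built-in `women` data set, and the second uses the built-in `USArrests` data set (columns `Murder` and `UrbanPop`), which the text does not name.

```r
# Assumptions: the first graph uses `women`; the second uses `USArrests`
# (columns Murder and UrbanPop), which the text does not name explicitly
data(women)
data(USArrests)

fit_women  <- lm(weight ~ height, data = women)
fit_murder <- lm(Murder ~ UrbanPop, data = USArrests)

# R-squared: share of the variation in y explained by the line (0 to 1)
r2 <- c(
  women     = summary(fit_women)$r.squared,
  usarrests = summary(fit_murder)$r.squared
)
print(round(r2, 3))
```

An R-squared near 1 goes with points that hug the line and a narrow grey area, while an R-squared near 0 goes with scattered points and a wide grey area.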

Summary:

Overall, Simple Linear Regression is a powerful method for describing the relationship between two variables. Even with its many uses, it can also be misused, for example by presenting a correlation between two unrelated variables as if it were meaningful. Therefore, it’s important to understand the context of the graph as well as the scale that it is shown in.