Simple Linear Regression

Introduction

Simple linear regression is a technique for describing the relationship between two variables. It works by fitting a straight line to the data, so that the value of one variable can be predicted from the other. In this presentation, we will see how the method works and apply it to a real-world dataset.

Formula

The simple linear regression model is: \[ y = \beta_0 + \beta_1 x + \epsilon \] where:
- \(y\) is the dependent variable.
- \(x\) is the independent variable.
- \(\beta_0\) is the intercept.
- \(\beta_1\) is the slope.
- \(\epsilon\) is the error term.
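
As a rough sketch of how the intercept and slope in this equation can be estimated, the ordinary least squares formulas can be computed by hand and compared with R's built-in lm(). The vectors x and y below are made-up illustration values, and the names beta0_hat, beta1_hat, fit, and res are assumptions introduced for this example only.

```r
# Hypothetical toy data, for illustration only
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# Ordinary least squares estimates from the closed-form formulas
beta1_hat <- cov(x, y) / var(x)              # estimated slope
beta0_hat <- mean(y) - beta1_hat * mean(x)   # estimated intercept

# The same estimates from R's built-in linear model function
fit <- lm(y ~ x)

# Residuals estimate the error term: observed minus fitted values
res <- y - (beta0_hat + beta1_hat * x)

print(c(intercept = beta0_hat, slope = beta1_hat))
print(coef(fit))
```

Both printed results should agree, since lm() minimizes the same sum of squared residuals that the closed-form formulas solve.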

NHTEMP Dataset

The nhtemp dataset used in these examples is built into R and records the mean annual temperature, in degrees Fahrenheit, in New Haven, Connecticut, from 1912 to 1971. In this example, the year is the independent variable and the temperature is the dependent variable.
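
Before modeling, it can help to look at what the dataset actually contains. The following is a minimal sketch using only base R functions; nhtemp is stored as a yearly time series, so start() and end() report the years it covers.

```r
# nhtemp ships with base R in the datasets package
data("nhtemp")

print(class(nhtemp))    # a time series ("ts") object
print(start(nhtemp))    # first year in the series
print(end(nhtemp))      # last year in the series
print(length(nhtemp))   # number of yearly observations

# Spread of the recorded temperatures
print(summary(as.numeric(nhtemp)))
```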

Slope and Intercept

In this model:
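- The slope (\(\beta_1\)) tells us how much the temperature changes, on average, for each one-year increase.
- The intercept (\(\beta_0\)) represents the predicted temperature when the year is 0. This is an extrapolation far outside the range of the data, so it has no practical meaning on its own.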
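
The slope and intercept described above can be read off a fitted model. This is a minimal sketch, assuming the model is fitted with lm(); the names df, fit, fit_c, and year_c are hypothetical and introduced only for this example. It also shows one common way to make the intercept easier to interpret: measuring years from the first year in the data rather than from year 0.

```r
data("nhtemp")
df <- data.frame(year = as.numeric(time(nhtemp)),
                 temp = as.numeric(nhtemp))

# Fit temperature as a linear function of year
fit <- lm(temp ~ year, data = df)
print(coef(fit))   # "(Intercept)" is beta_0, "year" is beta_1

# Re-express year as "years since the first observation" (an assumption
# made here for interpretability). The slope is unchanged, and the
# intercept becomes the fitted temperature in the first year.
df$year_c <- df$year - min(df$year)
fit_c <- lm(temp ~ year_c, data = df)
print(coef(fit_c))

# Predicted temperature for a year inside the observed range
print(predict(fit, newdata = data.frame(year = 1950)))
```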

Scatter Plot with Linear Regression

R Code From Previous Plot

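{r, echo = TRUE, warning = FALSE}

library(ggplot2)

# Load the built-in nhtemp time series
data("nhtemp")

# Extract the years and temperatures as plain numeric vectors
years <- as.numeric(time(nhtemp))
temps <- as.numeric(nhtemp)

# Scatter plot with the fitted least-squares line in red (no confidence band)
ggplot(data = data.frame(year = years, temp = temps), aes(x = year, y = temp)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = FALSE)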

Line on Scatter Plot

3D Plot of Temperature Data

Conclusion

With this data, simple linear regression provided a straightforward way to predict one variable from another. In addition, we found that plotly and ggplot2 are useful for visualizing this relationship.