Simple linear regression is a technique for understanding the relationship between two variables. It works by fitting a straight line to the data, which can then be used to predict the value of one variable from the other. In this presentation, we will see how the method works and apply it to a real-world dataset.
The regression line equation is:
\[
y = \beta_0 + \beta_1 x + \epsilon
\]
where:
- \(y\) is the dependent variable.
- \(x\) is the independent variable.
- \(\beta_0\) is the intercept.
- \(\beta_1\) is the slope.
- \(\epsilon\) is the error term.
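The text does not say how \(\beta_0\) and \(\beta_1\) are estimated. A common choice, and the one R's `lm` uses, is ordinary least squares, which picks the line that minimizes the sum of squared errors. As a sketch under that assumption, with \(n\) observations \((x_i, y_i)\) and sample means \(\bar{x}\) and \(\bar{y}\) (notation introduced here, not taken from the original), the estimates are:
\[
\hat{\beta}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1 \bar{x}
\]
The second formula shows that the fitted line always passes through the point \((\bar{x}, \bar{y})\).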
The nhtemp dataset used in these examples contains the mean annual temperature, in degrees Fahrenheit, in New Haven, Connecticut, from 1912 to 1971. In this example, the year is the independent variable and the temperature is the dependent variable.
In this model:
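- The slope (\(\beta_1\)) tells us how much the temperature changes, on average, for each one-year increase.
- The intercept (\(\beta_0\)) represents the predicted temperature when the year is 0. That is far outside the observed years, so it has little practical meaning on its own.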
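To make the two coefficients concrete, the model can be fitted with base R's `lm` and the estimates read off with `coef`. This is a minimal sketch rather than the presentation's own code. The names `df`, `fit`, and `b` are chosen here for illustration.

```r
# Build a data frame from the nhtemp time series
data("nhtemp")
df <- data.frame(year = as.numeric(time(nhtemp)), temp = as.numeric(nhtemp))

# Fit temp = beta_0 + beta_1 * year by ordinary least squares
fit <- lm(temp ~ year, data = df)

b <- coef(fit)
b["(Intercept)"]  # estimate of beta_0: fitted temperature at year 0 (an extrapolation)
b["year"]         # estimate of beta_1: average change in temperature per one-year increase
```

Calling `summary(fit)` additionally reports standard errors and p-values for both coefficients.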
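```{r, echo = TRUE, warning = FALSE}
library(ggplot2)
# nhtemp is a time series: extract the years and the temperatures as numeric vectors
data("nhtemp")
years <- as.numeric(time(nhtemp))
temps <- as.numeric(nhtemp)
# Scatter plot of temperature against year with the fitted regression line in red
ggplot(data = data.frame(year = years, temp = temps), aes(x = year, y = temp)) +
  geom_point() +
  geom_smooth(method = "lm", color = "red", se = FALSE)
```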
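The red line drawn by `geom_smooth(method = "lm")` is only a picture of the fit. To get actual predictions, one option is to fit the same model with `lm` and call `predict`. The sketch below assumes that approach. The years in `new_years` are hypothetical values picked for illustration, all inside the range covered by the data.

```r
# Refit the model that the plot visualizes
data("nhtemp")
df <- data.frame(year = as.numeric(time(nhtemp)), temp = as.numeric(nhtemp))
fit <- lm(temp ~ year, data = df)

# Hypothetical years to predict for (illustrative only)
new_years <- data.frame(year = c(1920, 1945, 1970))
pred <- predict(fit, newdata = new_years)
cbind(new_years, predicted_temp = pred)

# Share of the variance in temperature that the line explains
summary(fit)$r.squared
```

The R-squared value gives a rough sense of how much to trust these predictions: the closer it is to 1, the more of the year-to-year variation the line accounts for.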
With this data, simple linear regression proved useful for describing how one variable changes with another and for predicting it. In addition, we found that plotly and ggplot2 are useful for visualizing this relationship.