2024-03-21

Introduction to Simple Linear Regression

Simple linear regression is a fundamental statistical method for describing and predicting the relationship between two continuous variables. One variable, the independent variable (or predictor), is used to predict the value of the other, the dependent variable (or response). The relationship is modeled with a linear equation that describes how the expected value of the dependent variable changes as the independent variable increases or decreases.

Simple linear regression provides a straightforward yet powerful framework for analyzing and interpreting the linear association between two variables.

The Linear Regression Model

The linear regression model is defined as:

\[ Y = \beta_0 + \beta_1X_1 + ... + \beta_kX_k + \epsilon \]

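Where \(Y\) is the dependent variable we are trying to predict, \(X_1,...,X_k\) are the independent variables, \(\beta_0\) is the intercept, \(\beta_1,...,\beta_k\) are the coefficients, and \(\epsilon\) is the error term. Simple linear regression is the special case \(k = 1\), with a single independent variable \(X_1\), so the model reduces to \(Y = \beta_0 + \beta_1X_1 + \epsilon\).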
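
To make the roles of the intercept, slope, and error term concrete, here is a minimal sketch that simulates data from the single-predictor model and then recovers the coefficients with `lm()`. The sample size, the "true" values of `beta0` and `beta1`, the noise level, and the seed are illustrative assumptions, not values from this analysis.

```r
# Simulate data from Y = beta0 + beta1 * X + epsilon (all values hypothetical)
set.seed(42)                           # arbitrary seed, for reproducibility only
n       <- 200                         # assumed sample size
beta0   <- 2.0                         # assumed true intercept
beta1   <- 0.5                         # assumed true slope
x       <- runif(n, min = 0, max = 10) # independent variable
epsilon <- rnorm(n, mean = 0, sd = 1)  # error term
y       <- beta0 + beta1 * x + epsilon # dependent variable

# Fit the simple linear regression and inspect the estimated coefficients
fit <- lm(y ~ x)
print(coef(fit))
```

The estimates should land close to the assumed `beta0` and `beta1`, but not exactly on them, because the error term adds noise to every observation.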

Ordinary Least Squares (OLS)

The Ordinary Least Squares method minimizes the sum of the squared differences between observed values and predicted values:

\[ \min_{\beta_0,...,\beta_k} \sum_{i=1}^{n} (Y_i - (\beta_0 + \beta_1X_{1i} + ... + \beta_kX_{ki}))^2 \]

Here, \(Y_i\) represents the observed values, and \(X_{1i},...,X_{ki}\) are the independent variable values for the \(i\)th observation.
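
With a single predictor, the minimization above has a well-known closed-form solution: the slope is the sum of cross-deviations divided by the sum of squared deviations of the predictor, and the intercept is the mean of the response minus the slope times the mean of the predictor. The sketch below computes both by hand on the airquality data and checks them against `lm()`. The variable names (`aq`, `beta0_hat`, `beta1_hat`) are illustrative.

```r
library(datasets)

# Drop rows with missing values, then pull out predictor and response
aq <- na.omit(airquality)
x  <- aq$Temp   # independent variable
y  <- aq$Ozone  # dependent variable

# Closed-form OLS estimates for the single-predictor case
beta1_hat <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
beta0_hat <- mean(y) - beta1_hat * mean(x)

# Compare with the coefficients reported by lm()
fit <- lm(Ozone ~ Temp, data = aq)
print(c(intercept = beta0_hat, slope = beta1_hat))
print(all.equal(unname(coef(fit)), c(beta0_hat, beta1_hat)))
```

The hand-computed values and the `lm()` coefficients should agree up to floating-point tolerance, since `lm()` solves the same least-squares problem.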

GGPlot2 Plot 1: Simple linear regression plot

GGPlot2 Plot 2: Simple linear regression plot

R Code for Plot 2

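library(ggplot2)
library(datasets)

# Load the built-in airquality data and drop rows with missing values
data("airquality")
airquality_clean <- na.omit(airquality)

# Scatter plot of ozone against temperature with a fitted least-squares line
ozone_plot <- ggplot(airquality_clean, aes(x = Temp, y = Ozone)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE, color = "maroon") +
  labs(title = "Effect of Temperature on Ozone Pollution",
       x = "Temperature (degrees F)",
       y = "Ozone (ppb)") +
  theme_minimal()
print(ozone_plot)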

Plotly Plot
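
A minimal sketch of how a comparable interactive plot could be built, assuming the plotly package is installed. The original Plotly code is not shown, so the trace names, the `fitted_ozone` column, and the layout choices below are illustrative assumptions rather than the author's method.

```r
library(plotly)
library(datasets)

# Clean the data and attach fitted values from a simple linear regression
aq <- na.omit(airquality)
fit <- lm(Ozone ~ Temp, data = aq)
aq$fitted_ozone <- fitted(fit)  # hypothetical helper column for the line trace

# Scatter trace for the observations
p <- plot_ly(aq, x = ~Temp, y = ~Ozone,
             type = "scatter", mode = "markers", name = "Observations")

# Line trace for the least-squares fit (add_lines orders points by x)
p <- add_lines(p, y = ~fitted_ozone, name = "OLS fit")

# Axis labels matching the ggplot2 version
p <- layout(p,
            title = "Effect of Temperature on Ozone Pollution",
            xaxis = list(title = "Temperature (degrees F)"),
            yaxis = list(title = "Ozone (ppb)"))
p
```

Building the plot step by step, rather than with a pipe, keeps the sketch independent of the R version and of which pipe operator is available.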

Conclusion

The simple linear regression analysis of the airquality dataset offers insight into the relationship between temperature and ozone pollution levels. The visualizations demonstrate a positive correlation: higher temperatures are associated with higher levels of ozone pollution. This analysis highlights the need to monitor and manage air quality, particularly in warmer months, when high temperatures may worsen pollution levels.