2025-03-25

Linear regression in air pollution

-Linear regression models the relationship between two continuous variables by fitting a straight line to the data.

-Helps understand how environmental factors influence pollution levels.

-Provides insights for developing strategies to reduce pollution.

Scatter plot for Ozone vs Temperature

library(ggplot2)
data(airquality) 
airquality_clean <- na.omit(airquality)
ggplot(airquality_clean, aes(x = Temp, y = Ozone)) + geom_point() 

Formula for the linear regression line

\[ \hat{y} = \beta_0 + \beta_1 x \]

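In this formula, \(\hat{y}\) is the predicted ozone level, \(x\) is the temperature, \(\beta_0\) is the intercept, and \(\beta_1\) is the slope of the fitted line.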
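As a minimal sketch, the intercept and slope can be estimated with base R's lm() on the same cleaned data used in the scatter plot. The object name `fit` and the example temperature of 80 are illustrative choices, not part of the original slides.

```r
# Fit Ozone as a linear function of Temp on the complete cases
data(airquality)
airquality_clean <- na.omit(airquality)
fit <- lm(Ozone ~ Temp, data = airquality_clean)

# coef() returns the estimated intercept (beta_0) and slope (beta_1)
coef(fit)

# Predicted ozone at Temp = 80 (an arbitrary example value, in degrees Fahrenheit)
predict(fit, newdata = data.frame(Temp = 80))
```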

Least squares formula

\[ \hat{\beta} = (X^T X)^{-1} X^T y \]
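The least squares estimate can be checked by hand with matrix operations. The sketch below assumes X is the design matrix with a column of ones for the intercept and a column of temperatures, and y is the vector of ozone readings. The names `X`, `y`, `beta_hat`, and `beta_lm` are illustrative.

```r
data(airquality)
airquality_clean <- na.omit(airquality)

# Design matrix: a column of ones (intercept) and the Temp column
X <- cbind(1, airquality_clean$Temp)
y <- airquality_clean$Ozone

# Solve (X^T X) beta = X^T y instead of inverting X^T X explicitly
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# Compare with lm(); the two should agree up to rounding error
beta_lm <- coef(lm(Ozone ~ Temp, data = airquality_clean))
all.equal(as.numeric(beta_hat), as.numeric(beta_lm))
```

Solving the linear system gives the same estimate as the formula and is usually more numerically stable than computing the inverse directly.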

Regression line added to previous graph

library(ggplot2)
data(airquality)
airquality_clean <- na.omit(airquality)
ggplot(airquality_clean, aes(x = Temp, y = Ozone)) + geom_point() + geom_smooth(method = "lm", se = FALSE)

3-D plot with Solar Radiation

library(plotly)
airquality_clean <- na.omit(airquality)
plot_ly(airquality_clean, x = ~Temp, y = ~Ozone, z = ~Solar.R, type = "scatter3d", mode = "markers")

Code for previous plot

plot_ly(airquality_clean, x = ~Temp, y = ~Ozone, z = ~Solar.R, type = "scatter3d", mode = "markers")

Conclusion

-Simple linear regression is a valuable tool for understanding environmental factors affecting pollution.

-The airquality dataset shows a positive correlation between temperature and ozone levels.

-Models can guide environmental policies and strategies for pollution control.
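
The positive correlation noted above can be checked directly. This is a minimal sketch using base R's cor() on the cleaned data. The names `r` and `fit` are illustrative.

```r
data(airquality)
airquality_clean <- na.omit(airquality)

# Pearson correlation between temperature and ozone;
# a positive value means the two tend to rise together
r <- cor(airquality_clean$Temp, airquality_clean$Ozone)
r

# In simple linear regression, R-squared equals the squared correlation
fit <- lm(Ozone ~ Temp, data = airquality_clean)
all.equal(summary(fit)$r.squared, r^2)
```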