load_packages <- function(pkg_list) {
for (pkg in pkg_list) {
if (!require(pkg, character.only = TRUE)) {
install.packages(pkg, dependencies = TRUE)
library(pkg, character.only = TRUE)
}
}
}
packages <- c("tidyverse", "readxl", "pander", "forecast")
load_packages(packages)
knitr::opts_chunk$set(echo = TRUE,
warning = FALSE,
message = FALSE,
results = TRUE,
comment = NA,
fig.align = "center"
)
url <- "https://raw.githubusercontent.com/ncbrechbill/STA321/refs/heads/main/STA321/uspollution.csv"
pollutionus <- read.csv(url)
The prevalence of air pollution and greenhouse gasses are commonly discussed in conversations about climate change. The several kinds of greenhouse gasses have been increasing and decreasing ever since the industrial revolution. This case study analyzes the family of Nitrogen Oxides (NOx’s). Nitrogen Oxides are released by vehicle emissions, power generation, and other combustion. These pollutants are responsible for a decrease in general air quality.
The 1990 Clean Air Act greatly reduced the emissions of several pollutants including Nitrogen Oxides. This greatly empowered the Environmental Protection Agency (EPA) to undertake several initiatives to reduce pollutant emissions.
Historical Emmissions of pollutants was sourced from Kaggle.com (https://www.kaggle.com/datasets/snehasubramanian/world-pollution-1750-2019?resource=download). The data ranges back to 1750 until 2019 for countries across the world. For this study we focus on the United State’s emissions. The data set records several pollutants, but for this study we will currently observe only Nitrogen Oxide.
The collected information was:
Year
Nitrogen.Oxide
Sulphur.Dioxide
Carbon.Monoxide
Organic.Carbon
NMVOCs
Black.Carbon
Ammonia
Total: Sum of all emissions content.
With a glance, we can observe the impact of the 1970 Clean Air Act Amendment. When observing data before and after this point, we can see a profound change in direction.
pollution <- read_excel("~/WCU/Schoolwork/STA321/Time Series/country wise 1750 - 2019.xlsx")
pollutionus150 <- pollution %>%
filter(Country == "United States") %>%
slice_tail(n = 150) %>%
select(-Country) %>%
mutate(total = `Nitrogen Oxide` + `Sulphur Dioxide` + `Carbon Monoxide` + `Organic Carbon` + NMVOCs + `Black Carbon` + `Ammonia`)
nox150 <- ts(pollutionus150[,3],
frequency = 1,
start = 1870,
end = 2019)
plot(nox150,
xlab = "Year",
ylab = "NOx Content",
main = "Nox Content by Year")
abline(v = 1970, col = "darkred", lwd=2, lty=2)
Before 1970 (indicated by the red dashed line), NOx content was increasing steadily. The trend changes direction after the passing of the Clean Air Act, supporting evidence for it’s value and environmental impact.
For analysis moving forward, we will look at data from 1970 onward. This will give us a sufficient number of observations to work with and captures the more relevant negative trend. Values before 1970 are not relevant to forecasting because they capture a trend that is no longer relevant under the Clean Air Act.
The collected NOx data from 1970-2019 are presented in the plot below. The trend in extracted with a moving average and displayed without imputation (the trend does not continue past the threshold).
nox <- ts(pollutionus[,3],
frequency = 1,
start = 1970,
end = 2019)
trend.nox <- ma(nox, order =10, centre = TRUE)
par(mar=c(2,2,2,2))
plot(nox, xlab="", ylab="", col="darkred", lwd =2)
title(main = "Extracted Trend from Nitrogen Oxide Content")
lines(trend.nox, lwd =2, col = "blue")
legend("topright", c("Original Series", "Trend Curve (Moving Average)"), lwd=rep(2,2),
col=c("darkred", "blue"), bty="n")
We can use four basic forecasting methods to predict future NOx emissions.
Mean: The mean of all recorded values is forecasted.
Naive: The last recorded value is forecasted.
Seasonal Naive: The last recorded seasonal fluctuation is forecasted.
Random Walk with Drift: A line is created connecting the first and last observations. The forecasted predictions are an extension of this line.
Each method’s predicted values are displayed in the table and the plot below.
pred.mv = meanf(nox, h=15)$mean
pred.naive = naive(nox, h=15)$mean
pred.snaive = snaive(nox, h=15)$mean
pred.rwf = rwf(nox, h=15, drift=TRUE)$mean
pred.table = cbind(pred.mv = pred.mv,
pred.naive = pred.naive,
pred.snaive = pred.snaive,
pred.rwf = pred.rwf)
pander(pred.table, caption = "Farecasting Table")
| pred.mv | pred.naive | pred.snaive | pred.rwf | |
|---|---|---|---|---|
| 2020 | 16728787 | 1921172 | 1921172 | 1375955 |
| 2021 | 16728787 | 1921172 | 1921172 | 830738 |
| 2022 | 16728787 | 1921172 | 1921172 | 285521 |
| 2023 | 16728787 | 1921172 | 1921172 | -259696 |
| 2024 | 16728787 | 1921172 | 1921172 | -804913 |
| 2025 | 16728787 | 1921172 | 1921172 | -1350130 |
| 2026 | 16728787 | 1921172 | 1921172 | -1895347 |
| 2027 | 16728787 | 1921172 | 1921172 | -2440564 |
| 2028 | 16728787 | 1921172 | 1921172 | -2985781 |
| 2029 | 16728787 | 1921172 | 1921172 | -3530998 |
| 2030 | 16728787 | 1921172 | 1921172 | -4076215 |
| 2031 | 16728787 | 1921172 | 1921172 | -4621432 |
| 2032 | 16728787 | 1921172 | 1921172 | -5166649 |
| 2033 | 16728787 | 1921172 | 1921172 | -5711866 |
| 2034 | 16728787 | 1921172 | 1921172 | -6257083 |
# Create a time index for forecasts
time_forecast <- time(nox)[length(nox)] + seq(1, length(pred.mv)) / frequency(nox)
# Plot original series
plot(nox, xlab = "Time", ylab = "NOx Content", col = "black", lwd = 2,
main = "NOx Content with Forecasts",
xlim = c(min(time(nox)), max(time_forecast)),
ylim = c(0, max(nox)))
# Add forecasts
lines(time_forecast, pred.mv, col = "red", lwd = 2, lty = 2)
lines(time_forecast, pred.naive, col = "blue", lwd = 2, lty = 3)
lines(time_forecast, pred.snaive, col = "green", lwd = 2, lty = 4)
lines(time_forecast, pred.rwf, col = "purple", lwd = 2, lty = 5)
# Add legend
legend("topright",
legend = c("Original", "Mean Forecast", "Naive", "Seasonal Naive", "Random Walk w/ Drift"),
col = c("black", "red", "blue", "green", "purple"),
lty = c(1, 2, 3, 4, 5), lwd = 2, bty = "n")
These forecast methods are simple, and as such may miss some of the nuance of the true trend. For example, the Random Walk with Drift predicts a negative value by 2023. Additionally, the three other methods predict a constant emission every year (although we understand that the error variation increases with each year.) While these forecasts are either impossible or highly unlikely to become true, they are a useful starting point in understanding the trend of the data. For example, we can see that there is no seasonality in this data, which was expected with the recordings being annual.
We can conclude that the Clean Air Act of 1970 had a profound impact on the emissions the United States produces. There is an undeniable negative trend on the Nitrogen Oxide content.