Instructions: Please ensure that your graphs and visuals have proper titles and axis labels, where necessary. Refer to the output, whenever appropriate, when discussing the results. Lastly, remember that creativity (coupled with relevance) will be rewarded.

Question 1: Forecasting Carbon Dioxide Emissions

Carbon emissions are an ever-evolving and exciting issue among environmental activists and academics alike.

  1. Use the quantmod package to pull the Transportation Carbon Dioxide Emissions, All Fuels for United States (EMISSCO2TOTVTCTOUSA) – measured in Million Metric Tons CO2– from FRED and declare as a ts object with the appropriate start date. Be sure to check that the dates are correct.
getSymbols("EMISSCO2TOTVTCTOUSA", src = "FRED")
## [1] "EMISSCO2TOTVTCTOUSA"
head(EMISSCO2TOTVTCTOUSA)
##            EMISSCO2TOTVTCTOUSA
## 1970-01-01            1131.668
## 1971-01-01            1176.331
## 1972-01-01            1246.494
## 1973-01-01            1312.130
## 1974-01-01            1278.184
## 1975-01-01            1288.565
# The head() output above shows annual observations beginning in 1970, so the
# series is declared as annual (frequency = 1) starting in 1970.
co2_ts <- ts(EMISSCO2TOTVTCTOUSA, start = 1970, frequency = 1)
  2. Provide a time-series plot of the data and comment on its stationarity.

The plot below suggests the series is not stationary: there is a clear trend, so the mean changes over time, and the amount of variation around that trend is not constant either. Differencing will be needed before an ARIMA model is fitted.

autoplot(co2_ts) +
  labs(title = "Transportation CO₂ Emissions (All Fuels, US)",
       subtitle = "Annual, in Million Metric Tons, from 1970",
       x = "Year",
       y = "CO₂ Emissions (Million Metric Tons)") +
  theme_minimal()

  3. Perform the Box-Jenkins procedure to determine the candidate model(s) for this data.

Both search strategies arrive at the same specification, so their AIC and BIC statistics are identical, and the automatically selected model (shown in the auto.arima() output below) is retained as the preferred candidate.

I used the Ljung-Box test on the residuals to confirm this choice. Its p-value of 0.9921 shows no significant autocorrelation in the residuals; they behave like white noise.

library(tseries)
# Augmented Dickey-Fuller test: the null hypothesis is a unit root (non-stationarity)
adf_test <- adf.test(co2_ts)
adf_test
## 
##  Augmented Dickey-Fuller Test
## 
## data:  co2_ts
## Dickey-Fuller = -1.1313, Lag order = 3, p-value = 0.9093
## alternative hypothesis: stationary
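As a complementary check, the KPSS test takes stationarity as its null hypothesis (the opposite of the ADF test), so the two tests together give a fuller picture. A minimal sketch, assuming the tseries package loaded above:

# KPSS test: null hypothesis is level stationarity, so a small p-value
# reinforces the conclusion that differencing is needed
kpss.test(co2_ts)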
library(forecast)
co2_diff <- diff(co2_ts, differences = 1)  # first difference to remove the trend
ggtsdisplay(co2_diff, main = "1st Difference of Transportation CO₂ Emissions")

fit1 <- auto.arima(co2_ts, seasonal = TRUE)  # stepwise search
fit2 <- auto.arima(co2_ts, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)  # exhaustive search
library(knitr)
model_compare <- data.frame(Model = c("Auto.ARIMA (Stepwise)", "Auto.ARIMA (Full Search)"),AIC = c(fit1$aic, fit2$aic),BIC = c(fit1$bic, fit2$bic))
kable(model_compare, caption = "Model Comparison Table (ARIMA Candidates)")
Model Comparison Table (ARIMA Candidates)

| Model                    |      AIC |     BIC |
|:-------------------------|---------:|--------:|
| Auto.ARIMA (Stepwise)    | 565.3253 | 569.189 |
| Auto.ARIMA (Full Search) | 565.3253 | 569.189 |
checkresiduals(fit2)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,0)(0,0,1)[12]
## Q* = 1.9593, df = 9, p-value = 0.9921
## 
## Model df: 1.   Total lags used: 10
auto.arima(co2_ts)
## Series: co2_ts 
## ARIMA(0,1,0)(0,0,1)[12] 
## 
## Coefficients:
##         sma1
##       0.4232
## s.e.  0.1814
## 
## sigma^2 = 3435:  log likelihood = -280.66
## AIC=565.33   AICc=565.58   BIC=569.19
model_compare <- data.frame(Model = c("ARIMA(1,1,1)(0,1,1)[12]", "ARIMA(2,1,0)(0,1,1)[12]"),AIC = c(fit1$aic, fit2$aic),BIC = c(fit1$bic,fit2$bic))
kable(model_compare, caption = "Model Comparison: AIC and BIC")
Model Comparison: AIC and BIC
Model AIC BIC
ARIMA(1,1,1)(0,1,1)[12] 565.3253 569.189
ARIMA(2,1,0)(0,1,1)[12] 565.3253 569.189
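As a robustness sketch (the specifications named here are illustrative choices, not results reported above), a couple of hand-specified candidates can be fitted with Arima() and compared to the automatic selection on AICc and BIC:

# Sketch: fit alternative candidates by hand and compare information criteria
cand <- list(
  auto = auto.arima(co2_ts),
  ar1  = Arima(co2_ts, order = c(1, 1, 0), include.drift = TRUE),
  ma1  = Arima(co2_ts, order = c(0, 1, 1), include.drift = TRUE)
)
sapply(cand, function(m) c(AICc = m$aicc, BIC = m$bic))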
  4. Using your preferred model, present an autoplot of the forecast for the next 5 years.
forecast_5yr <- forecast(fit2, h = 5)  # five annual steps ahead = next 5 years
autoplot(forecast_5yr) +
  labs(title = "Forecast: Transportation CO₂ Emissions (Next 5 Years)",
       x = "Year",
       y = "CO₂ Emissions (Million Metric Tons)",
       color = "Series",
       fill = "Prediction Interval") +
  theme_light()
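To make it easier to refer to specific numbers when discussing the results, a small sketch (assuming forecast_5yr from above) tabulates the point forecasts with their 80% and 95% prediction intervals:

# Sketch: point forecasts and prediction intervals as a table
kable(as.data.frame(forecast_5yr), digits = 1,
      caption = "CO₂ Emissions Forecast with 80% and 95% Prediction Intervals")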

Question 2: Google Price Data

Using the Box-Jenkins procedure, use an appropriate ARIMA model to forecast the next 2 weeks of closing prices for the goog series from the fpp2 package.

data("goog")
autoplot(goog) +
  labs(title = "GOOG Daily Closing Prices", x = "Time", y = "Price (USD)") +
  theme_minimal()

adf.test(goog)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  goog
## Dickey-Fuller = -2.5417, Lag order = 9, p-value = 0.349
## alternative hypothesis: stationary
goog_diff <- diff(goog)
autoplot(goog_diff) +
  labs(title = "Differenced GOOG Prices", x = "Time", y = "Daily Price Change (USD)") +
  theme_minimal()
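Since the question asks for the Box-Jenkins procedure, it helps to inspect the ACF and PACF of the differenced series before relying on auto.arima(). A minimal sketch, assuming goog_diff from above:

# Sketch: time plot, ACF, and PACF of the differenced prices in one display
ggtsdisplay(goog_diff, main = "Differenced GOOG Prices with ACF and PACF")

If neither correlogram shows significant spikes, that is consistent with the ARIMA(0,1,0) structure selected below.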

fit_goog <- auto.arima(goog)
summary(fit_goog)
## Series: goog 
## ARIMA(0,1,0) with drift 
## 
## Coefficients:
##        drift
##       0.4213
## s.e.  0.2760
## 
## sigma^2 = 76.19:  log likelihood = -3581.45
## AIC=7166.89   AICc=7166.9   BIC=7176.7
## 
## Training set error measures:
##                        ME     RMSE      MAE         MPE      MAPE     MASE
## Training set 0.0003924086 8.719767 5.831708 -0.01050304 0.9753774 1.000395
##                    ACF1
## Training set 0.03871058
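Because ARIMA(0,1,0) with drift is a random walk with drift, its point forecasts have a simple closed form: the last observed price plus the horizon times the estimated drift. A quick sketch, assuming fit_goog from above, verifies this against forecast():

# Sketch: manual point forecasts (last price + h * drift) vs. model-based forecasts
manual_fc <- as.numeric(tail(goog, 1)) + (1:10) * coef(fit_goog)["drift"]
cbind(manual = manual_fc,
      model  = as.numeric(forecast(fit_goog, h = 10)$mean))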
checkresiduals(fit_goog)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(0,1,0) with drift
## Q* = 13.134, df = 10, p-value = 0.2163
## 
## Model df: 0.   Total lags used: 10
forecast_goog <- forecast(fit_goog, h = 10)  # 10 trading days, roughly 2 calendar weeks
autoplot(forecast_goog) +
  labs(title = "Forecast: GOOG Closing Prices (Next 2 Weeks)",
       x = "Day",
       y = "Price (USD)",
       color = "Series",
       fill = "Prediction Interval") +
  theme_light()

Question 3: JohnBear Sales Forecasts

You have responsibility for JohnBear’s sales forecasting. The sales department forwarded data for 115 consecutive months, ending in December 2022. The data is presented in the JohnBear.csv file on GitHub. You can navigate to the Data folder using this link: https://github.com/Shamar-Stewart/MS_Forecasting/tree/main/Data

Your tasks are to:

  1. Read in the data and declare it as a ts() object with the appropriate frequency, end date, etc. You can read the file directly by clicking on the link above, clicking on the file, and then clicking the “Raw” button. You will then copy the URL and use the read.csv() function with the argument header = FALSE to read in the data. The URL should have the following format: https://raw.githubusercontent.com/....
url <- "https://raw.githubusercontent.com/Shamar-Stewart/MS_Forecasting/main/Data/JohnBear.csv"
johnbear_data <- read.csv(url, header = FALSE)
# 115 monthly observations ending December 2022, so the series begins in June 2013
johnbear_ts <- ts(johnbear_data$V1, end = c(2022, 12), frequency = 12)
autoplot(johnbear_ts) +
  labs(title = "JohnBear Monthly Sales (Jun 2013 – Dec 2022)", x = "Year", y = "Sales") +
  theme_minimal()
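A quick sanity check (a sketch, assuming the CSV contains the 115 monthly values described in the prompt) confirms the length and date range of the ts object:

# Sketch: verify the series has 115 observations and ends in December 2022
length(johnbear_ts)  # expected: 115
tsp(johnbear_ts)     # start, end, and frequency of the ts object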

  2. Produce a time plot of the series and comment on the patterns. Does there appear to be seasonality?

Yes. Although the data are monthly rather than quarterly, the plot shows a recurring within-year pattern that repeats roughly every 12 months, which indicates seasonality (see the seasonal-plot sketch after the time plot below).

How about a trend?

There appears to be an upward trend.

Does it appear to be nonstationary?

Yes. The upward trend means the mean is not constant over time, and the pronounced seasonal swings add further systematic variation, so the series does not appear stationary.

No formal testing is required here, just a discussion of how you came to this conclusion from the plot.

autoplot(johnbear_ts) +
  labs(title = "JohnBear Monthly Sales (Jun 2013 – Dec 2022)", x = "Year", y = "Sales") +
  theme_minimal()
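To make the within-year pattern easier to judge than it is from the time plot alone, a seasonal plot can be sketched as follows (assuming johnbear_ts as declared above):

# Sketch: overlay each year's monthly sales to reveal the seasonal pattern
ggseasonplot(johnbear_ts, year.labels = TRUE) +
  labs(title = "Seasonal Plot: JohnBear Monthly Sales", y = "Sales") +
  theme_minimal()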

  3. Develop a Seasonal ARIMA model for this data. Be sure to walk the Director of Sales through your model development process.

A review of the data showed strong seasonality, an upward trend, and nonstationarity, so a seasonal ARIMA model was developed and used to produce a 12-month sales forecast. The selected specification is ARIMA(1,0,1)(0,1,1)[12] with drift: the seasonal difference removes the annual pattern, the AR(1) coefficient of 0.8715 captures the persistence of past sales, the MA(1) coefficient of -0.7561 smooths short-term fluctuations, the seasonal MA(1) coefficient of -0.6399 adjusts for recurring annual effects, and the drift of 7.27 reflects steady upward momentum in month-to-month sales.

The Ljung-Box p-value is well above 0.05, indicating no significant autocorrelation in the residuals, and the training-set MAPE of 4.72% reflects good in-sample accuracy. Below, we present a 12-month forecast with 80% and 95% prediction intervals, which bracket plausible best-case and worst-case paths for the year ahead and should improve the company's ability to plan inventory, staffing, and marketing.
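For transparency with the Director of Sales, the identification steps described above can be sketched explicitly (assuming johnbear_ts as declared earlier); the final model below was then selected by an exhaustive auto.arima() search:

# Sketch: how many seasonal and non-seasonal differences look necessary,
# and what the ACF/PACF of the seasonally differenced series suggest
nsdiffs(johnbear_ts)                 # suggested number of seasonal differences
ndiffs(diff(johnbear_ts, lag = 12))  # further non-seasonal differencing needed?
ggtsdisplay(diff(johnbear_ts, lag = 12),
            main = "Seasonally Differenced JohnBear Sales with ACF and PACF")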

fit_johnbear <- auto.arima(johnbear_ts, seasonal = TRUE, stepwise = FALSE, approximation = FALSE)
summary(fit_johnbear)
## Series: johnbear_ts 
## ARIMA(1,0,1)(0,1,1)[12] with drift 
## 
## Coefficients:
##          ar1      ma1     sma1   drift
##       0.8715  -0.7561  -0.6399  7.2674
## s.e.  0.0989   0.1186   0.0950  0.7538
## 
## sigma^2 = 12344:  log likelihood = -632.4
## AIC=1274.8   AICc=1275.42   BIC=1287.97
## 
## Training set error measures:
##                    ME    RMSE      MAE        MPE     MAPE      MASE
## Training set 1.471479 103.087 77.63983 -0.8789694 4.720894 0.6131654
##                     ACF1
## Training set -0.08629146
checkresiduals(fit_johnbear)

## 
##  Ljung-Box test
## 
## data:  Residuals from ARIMA(1,0,1)(0,1,1)[12] with drift
## Q* = 12.949, df = 20, p-value = 0.8795
## 
## Model df: 3.   Total lags used: 23
  4. Produce a forecast of the Sales value for the next 12 months.
johnbear_forecast <- forecast(fit_johnbear, h = 12)  # next 12 months
autoplot(johnbear_forecast) +
  labs(title = "Forecast: JohnBear Sales for the Next 12 Months",
       x = "Year",
       y = "Sales",
       color = "Series",
       fill = "Prediction Interval") +
  theme_minimal()
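For reporting purposes, a sketch (assuming johnbear_forecast and johnbear_ts from above) pairs each forecast month with the same month's actual sales from the prior year, a comparison the sales team may find intuitive:

# Sketch: tabulate the 12-month forecast with prediction intervals and
# the corresponding actuals from one year earlier (Jan–Dec 2022)
fc_table <- as.data.frame(johnbear_forecast)
fc_table$LastYearActual <- as.numeric(tail(johnbear_ts, 12))
kable(fc_table, digits = 0,
      caption = "JohnBear 12-Month Forecast with Prior-Year Actuals")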