Overview

We forecasted the weekly price of gas using time series analysis techniques. We first came up with possible models to describe the change in the gas price. We then picked the best model by comparing the performance of each model.

We used weekly gas price in the U.S. from February 11th, 2013 to April 13th, 2020. We used such a wide range of times since the price tends to fluctuate much and wanted to see the big trend of the price change.

Libraries

library(dplyr)
library(tseries)
library(astsa)
library(knitr)

Importing Data

data = read.csv("gas price.csv", header = TRUE)

data = data %>%
  mutate(Date = as.Date(Date, '%B %d, %Y'))

Gas Price & Gas Price Difference Plot

par(mfrow=c(1,2))
plot(data$Date,data$price, type="o", ylab="Gas Price", xlab = "Date")
plot(data$Date[-1],diff(data$price), type="o", ylab="Gas Price Difference", xlab = "Date")

adf.test(data$price)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$price
## Dickey-Fuller = -2.3214, Lag order = 7, p-value = 0.4413
## alternative hypothesis: stationary
adf.test(diff(data$price))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(data$price)
## Dickey-Fuller = -4.9522, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

When we looked at the gas price graph, it did not look stationary so we ran the ADF test and the p-value was 0.4413, indicating that it is not a stationary process. For that, we took the difference in gas prices. When we ran the ADF test of difference in gas prices, the p-value was less than 0.05, indicating a stationary process with a possible linear trend. However, from the plot, it was clear that there is no linear trend.

Model Selction

par(mfrow=c(1,2))
acf(diff(data$price))
pacf(diff(data$price))

PACF plot suggested an autoregressive model, the AR(1) process, while the ACF plot suggested a moving-average model, MA(5) process. Since it is a forecast for the gas price, we believed that it is crucial to include past forecast to predict future values; hence we picked the AR(1) process.

AR(1)

sarima(data$price, 1,1,0)
## initial  value -3.060550 
## iter   2 value -3.259457
## iter   3 value -3.259530
## iter   4 value -3.259555
## iter   5 value -3.259565
## iter   6 value -3.259593
## iter   7 value -3.259597
## iter   8 value -3.259599
## iter   9 value -3.259600
## iter  10 value -3.259602
## iter  11 value -3.259603
## iter  12 value -3.259603
## iter  12 value -3.259603
## iter  12 value -3.259603
## final  value -3.259603 
## converged
## initial  value -3.248377 
## iter   2 value -3.248488
## iter   3 value -3.248521
## iter   4 value -3.248536
## iter   5 value -3.248573
## iter   6 value -3.248588
## iter   7 value -3.248592
## iter   8 value -3.248592
## iter   8 value -3.248592
## iter   8 value -3.248592
## final  value -3.248592 
## converged

## $fit
## 
## Call:
## stats::arima(x = xdata, order = c(p, d, q), seasonal = list(order = c(P, D, 
##     Q), period = S), xreg = constant, transform.pars = trans, fixed = fixed, 
##     optim.control = list(trace = trc, REPORT = 1, reltol = tol))
## 
## Coefficients:
##          ar1  constant
##       0.5797   -0.0044
## s.e.  0.0428    0.0048
## 
## sigma^2 estimated as 0.001506:  log likelihood = 684.29,  aic = -1362.58
## 
## $degrees_of_freedom
## [1] 372
## 
## $ttable
##          Estimate     SE t.value p.value
## ar1        0.5797 0.0428 13.5296  0.0000
## constant  -0.0044 0.0048 -0.9226  0.3568
## 
## $AIC
## [1] -3.643265
## 
## $AICc
## [1] -3.643178
## 
## $BIC
## [1] -3.611787

From the table, we noticed that the constant term is insignificant in this model so we removed the constant term and reran the sarima function.

sarima(data$price, 1,1,0, fixed = c(NA,0))

When we ran the sarima function without the constant term, all the other terms were significant so we selected ARIMA(1,1,0) without the constant term as our final model.

arima(1,1,0) Model Forecast Test

ts_price = ts(data$price)

current = window(ts_price,end=368)
future = window(ts_price,start=369)

arima_plot = sarima.for(current,n.ahead=7,1,1,0,fixed = c(NA,0))
points(x=(369:375),y=future,pch=3)

forecast=arima_plot$pred
sum((future-forecast)**2)
## [1] 1.229828

We used data without the last few terms and the selected model to predict the gas price. However, as the plot suggested, our forecast was way off from the actual gas price. Our model predicted an increase in gas prices, whereas the gas price dropped in reality.

Second Approach

time1 = time(data$Date)
time2 = time1*time1
time3 = time1*time2

par(mfrow=c(2,2))

plot(data$Date,data$price, type='l',ylab="Gas Price", xlab = "Date")
lm.linear = lm(data$price~time1)
r.linear =residuals(lm.linear)
points(data$Date, lm.linear$fit)
plot(data$Date,r.linear,type="o",ylab="Gas Price Residuals", xlab = "Date")

plot(data$Date,data$price, type='l',ylab="Gas Price", xlab = "Date")
lm.cubic = lm(data$price~time1+time2+time3)
r.cubic =residuals(lm.cubic)
points(data$Date, lm.cubic$fit)
plot(data$Date,r.cubic,type="o",ylab="Gas Price Residuals", xlab = "Date")

adf.test(r.cubic)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  r.cubic
## Dickey-Fuller = -3.0709, Lag order = 7, p-value = 0.125
## alternative hypothesis: stationary

The second approach was detrending the price data before we apply the time series analysis techniques. When we looked at the price graph, we noticed that there was a trend. We came up with two possibilities that there might be a linear or cubic trend. We concluded that the cubic trend is more fitting in this model. For that, we detrended the cubic trend and analyzed their residuals. When we ran the ADF test, the p-value of cubic detrended data came back 0.125, meaning that it is not a stationary process.

Gas Price Residuals Difference Plot

par(mfrow=c(1,1))
plot(data$Date[-1],diff(r.cubic), type="o", ylab="Gas Price Residuals", xlab = "Date")

adf.test(diff(r.cubic))
## 
##  Augmented Dickey-Fuller Test
## 
## data:  diff(r.cubic)
## Dickey-Fuller = -5.0616, Lag order = 7, p-value = 0.01
## alternative hypothesis: stationary

We took the difference in residual of detrended data. The ADF test came back less than 0.05 so we concluded that finding the difference once was sufficient to make the process stationary.

Model Selction

par(mfrow=c(1,2))
acf(diff(r.cubic))
pacf(diff(r.cubic))

PACF plot suggested the AR(1) process and the ACF plot suggested MA(5) process. Applying the same logic as the model selection process in the first approach, we picked the AR(1) process.

AR(1)

sarima(r.cubic, 1,1,0)
sarima(r.cubic, 1,1,0, fixed = c(NA,0))

Applying the same logic as the model selection process in the first approach, we selected ARIMA(1,1,0) without the constant term as our final model.

arima(1,1,0) Model Forecast Test

current = window(r.cubic,end=368)
future = window(r.cubic,start=369)

arima_plot = sarima.for(current,n.ahead=7,1,1,0,fixed = c(NA,0))
points(x=(369:375),y=future,pch=3)

forecast=arima_plot$pred
sum((future-forecast)**2)
## [1] 1.184792

We used the same method of testing the model and just like the first approach, our model predicted an increase in gas prices, whereas the gas price dropped in reality.

Model Comparison

Model AIC AICc BIC SSE
Differenced data ARIMA(1,1,0) -3.6464 -3.6464 -3.6254 1.229828
Cubic Detrended ARIMA(1,1,0) -3.6449 -3.6448 -3.6239 1.137243

The first approach model had lower AIC, AICc, and BIC, while the model from the second approach had lower SSE.

Differenced data ARIMA(1,1,0) follows the following equation:
\[X_{t} - X_{t-1} = 0.5837*(X_{t-1} - X_{t-2}) + e_{t}\] where \[e_{t} \sim WN(0,0.001509)\]

Cubic Detrended ARIMA(1,1,0) follows the following equation:
\[G_{t} = X_{t} + (4.225 - 0.0225t + 9.288*10^{-5}t^2 - 1.177*10^{-7}t^3)\] \[X_{t} - X_{t-1} = 0.5824*(X_{t-1} - X_{t-2}) + e_{t}\] where \[e_{t} \sim WN(0,0.001512)\]

Conclusion

In this project, we used time series analysis techniques to find the ’best’ model to forecast the price change of the gas. However, our models failed to predict a significant drop in the gas price; instead, it predicted that the prices would go up. The main reason for this miscalculation was since our models made predictions solely based on given price information. In the future, it would improve our forecast significantly if we could find ways to include other factors that may influence the gas price change.