Non-renewable sources of energy, although being depleted, are still an integral part of our lives. Crude oil, a fossil fuel, is used widely around the world. Petroleum has many uses, from fuelling planes to manufacturing clothes and insulation.
In this small project, we try to predict oil prices for the next 3 months on a monthly basis, using data ranging from *October 2012 to the day of this writing, July 2022*. The objective is to forecast the oil price 3 months ahead using time series models.
First, we import the data, which comes from Yahoo! Finance's crude oil prices. Let's look at the first few observations.
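# Load the packages used throughout (wrangling, padding, NA filling, forecasting, plotting, metrics)
library(dplyr); library(lubridate); library(padr); library(zoo)
library(forecast); library(ggplot2); library(gridExtra); library(MLmetrics)  # MLmetrics assumed as the source of RMSE()
# Read data
oilprice <- read.csv("oilprice.csv")
head(oilprice, 10)
As the output shows, we only need the Date and Close columns (the latter is read in as Close.), so we drop the other columns and convert Date from text to a Date object.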
# Keep only the Date and Close. columns and parse Date (year-month-day) into a Date object
oilprice <- oilprice %>%
  select(Date, Close.) %>%
  mutate(Date = ymd(Date))
head(oilprice, 10)
The first few observations show that some months are missing because they were not recorded. A time series object assumes equally spaced observations with no gaps, so we must fill in the missing months.
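Conceptually, padding a monthly series means building the full calendar of months and joining the observed prices onto it. A minimal base-R sketch of that idea, on toy data and using merge() rather than padr (an illustration of the concept, not how pad() works internally), assuming every date falls on the first day of its month:

```r
# Toy monthly closes with March and April 2013 missing (illustrative values only)
prices <- data.frame(
  Date  = as.Date(c("2013-01-01", "2013-02-01", "2013-05-01")),
  Close = c(10, 11, 14)
)

# Complete monthly calendar from the first to the last observed month
all_months <- data.frame(
  Date = seq(min(prices$Date), max(prices$Date), by = "month")
)

# Left join: every calendar month is kept; months without a price get NA
padded <- merge(all_months, prices, by = "Date", all.x = TRUE)
print(padded)
```

The two missing months now appear as rows with NA in Close, which is the state the padded data is in before the NAs are filled.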
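# Insert a row (with NA values) for every missing month between the first and last date
oilprice <- oilprice %>%
  pad('month',
      start_val = as.POSIXlt(min(oilprice$Date)),
      end_val = as.POSIXlt(max(oilprice$Date)))
head(oilprice, 20)
The padded months now contain NA. We fill each NA with the price from the following month: na.locf() with fromLast = TRUE carries the next observation backward, whereas the default (fromLast = FALSE) would carry the previous month's price forward.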
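The difference between the two fill directions is easy to see on a toy vector. The base-R functions below, locf() and nocb(), are hypothetical helpers written for this illustration (they are not part of zoo); they mimic zoo::na.locf() with fromLast = FALSE and fromLast = TRUE respectively, except that leading NAs are kept rather than dropped:

```r
# Toy series with gaps (illustrative values only)
x <- c(10, NA, 12, NA, NA, 15)

# Last observation carried forward: each NA takes the most recent earlier value
locf <- function(v) {
  idx <- cummax(ifelse(is.na(v), 0L, seq_along(v)))  # position of the latest non-NA so far
  idx[idx == 0L] <- NA                                # leading NAs have nothing to carry
  v[idx]
}

# Next observation carried backward: the same idea applied to the reversed vector
nocb <- function(v) rev(locf(rev(v)))

locf(x)  # 10 10 12 12 12 15
nocb(x)  # 10 12 12 15 15 15
```

With fromLast = TRUE, as in the code that follows, each missing month therefore takes the next month's price.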
# Fill NAs with the next available observation (next observation carried backward)
oilprice <- na.locf(oilprice, fromLast = TRUE)
head(oilprice, 20)
With the data prepared, we can create a time series object. The data is monthly, so we set frequency = 12.
# Monthly time series of closing prices, starting January 2013
oilprice_ts <- ts(data = oilprice$Close., start = c(2013, 1), frequency = 12)
oilprice_ts
#> Jan Feb Mar Apr May Jun Jul Aug Sep Oct
#> 2013 97.49 92.05 97.23 93.46 91.97 96.56 105.03 107.65 96.38 96.38
#> 2014 97.49 102.59 101.58 99.74 102.71 98.17 98.17 95.96 91.16 80.54
#> 2015 48.24 59.63 59.63 59.63 60.30 59.47 47.12 49.20 45.09 46.59
#> 2016 33.62 33.75 38.34 45.92 48.33 48.33 41.60 44.70 48.24 46.86
#> 2017 54.01 54.01 50.60 49.33 48.32 46.04 50.17 47.23 51.67 57.40
#> 2018 64.73 61.64 64.94 67.04 67.04 74.15 69.80 69.80 73.25 65.31
#> 2019 53.79 57.22 60.14 63.91 53.50 58.47 58.58 55.10 54.18 54.18
#> 2020 51.56 44.76 18.84 18.84 35.49 39.27 40.27 42.61 40.22 35.79
#> 2021 52.20 61.50 59.16 63.58 66.32 73.47 73.95 75.03 75.03 83.57
#> Nov Dec
#> 2013 92.72 97.49
#> 2014 66.15 53.27
#> 2015 37.04 37.04
#> 2016 49.44 53.72
#> 2017 57.40 60.42
#> 2018 50.93 45.41
#> 2019 55.17 51.56
#> 2020 48.52 48.52
#> 2021 66.18 75.21
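The start argument of ts() determines the month labels printed above, so it has to match the first row of the data. A small sketch, on toy data with illustrative variable names, of deriving start from the earliest date instead of hard-coding c(2013, 1):

```r
# Toy monthly data frame (illustrative values only), first month October 2012
df <- data.frame(
  Date  = seq(as.Date("2012-10-01"), by = "month", length.out = 6),
  Close = c(10, 11, 12, 13, 14, 15)
)

# Derive c(year, month) for ts() from the earliest date instead of hard-coding it
first    <- min(df$Date)
start_ym <- c(as.integer(format(first, "%Y")), as.integer(format(first, "%m")))

y <- ts(df$Close, start = start_ym, frequency = 12)
print(y)
```

If the derived start differs from the hard-coded one, every label in the printed series (and in the forecasts) shifts accordingly.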
With everything in place, we can plot the time series object.
oilprice_ts %>% autoplot()
Next, we separate the series into its trend, seasonal, and remainder components by decomposing it (decompose() uses an additive model by default).
oilprice_decomposed <- decompose(oilprice_ts)
oilprice_decomposed %>% autoplot()
The trend fluctuates over time and mostly decreases, but it turns upward again toward the end of the series.
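For intuition, the additive decomposition can be reproduced by hand. The sketch below, on a toy series rather than the oil data, estimates the trend with a centred 2x12 moving average and the seasonal figure by averaging the detrended values for each calendar month, then checks both against decompose(), whose default additive method follows these steps:

```r
# Toy series: linear trend plus a fixed monthly pattern that sums to zero (illustrative only)
pattern <- c(3, 1, -1, -3, -2, 0, 2, 4, 1, -1, -2, -2)
y <- ts(50 + 0.5 * seq_len(48) + rep(pattern, 4), start = c(2013, 1), frequency = 12)

# Trend: centred 2x12 moving average (weight 1/24 on the two ends, 1/12 inside).
# stats:: is explicit because dplyr also exports a filter() function.
w     <- c(0.5, rep(1, 11), 0.5) / 12
trend <- stats::filter(y, w, sides = 2)

# Seasonal figure: mean detrended value per calendar month, centred to sum to zero
detrended   <- as.numeric(y - trend)
month_means <- tapply(detrended, as.integer(cycle(y)), mean, na.rm = TRUE)
figure      <- as.numeric(month_means - mean(month_means))

# Remainder: what is left after removing trend and seasonality
remainder <- y - trend - rep(figure, 4)

# Compare with stats::decompose()
dec <- decompose(y)
all.equal(as.numeric(trend), as.numeric(dec$trend))
all.equal(figure, dec$figure)
```

Because the moving average needs six months on each side, the trend (and remainder) is NA for the first and last six months of the series.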
For the test set, we hold out the last 3 months, matching the 3-month forecast horizon; the remaining observations form the training set.
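An equivalent way to hold out the last months, sketched here on a toy series, is stats::window(), which selects by time and always returns ts objects (h, train and test are illustrative names):

```r
# Toy monthly series, Jan 2013 to Dec 2014 (illustrative values only)
y <- ts(50:73, start = c(2013, 1), frequency = 12)

h <- 3            # forecast horizon, also the size of the hold-out set
n <- length(y)

# window() subsets by time and keeps the ts attributes on both pieces
train <- window(y, end = time(y)[n - h])
test  <- window(y, start = time(y)[n - h + 1])

start(test)       # 2014 10
```

Keeping the ts attributes matters because stlm() needs the seasonal frequency of the training series.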
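# Test data: the last 3 months
oilprice_test <- tail(oilprice_ts, n = 3)
# Train data: everything except the last 3 months
oilprice_train <- head(oilprice_ts, n = -3)
The next step is to build the models. Because the data contains both trend and seasonality, we use stlm(), which removes the seasonal component with an STL decomposition, fits a non-seasonal model to the seasonally adjusted series, and adds the seasonality back when forecasting. With method = "ets" the non-seasonal model is ETS; for comparison, method = "arima" fits an ARIMA model instead, so we can check which one forecasts better.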
# STL + ETS (lambda = 0 is a Box-Cox transform with lambda 0, i.e. a log transform)
oilprice_ets <- stlm(oilprice_train, method = "ets", lambda = 0)
# STL + ARIMA
oilprice_sarima <- stlm(oilprice_train, method = "arima", lambda = 0)
# Forecast 3 months ahead with STL + ETS
oilprice_ets_forecast <- forecast(oilprice_ets, h = 3)
oilprice_ets_forecast
#> Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
#> Oct 2021 73.70409 62.58452 86.79932 57.39429 94.64867
#> Nov 2021 73.30258 58.16747 92.37582 51.46488 104.40650
#> Dec 2021 71.49526 53.86009 94.90464 46.36064 110.25673
# Forecast 3 months ahead with STL + ARIMA
oilprice_sarima_forecast <- forecast(oilprice_sarima, h = 3)
oilprice_sarima_forecast
#> Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
#> Oct 2021 73.34777 62.68822 85.81989 57.68760 93.25913
#> Nov 2021 72.83164 57.10803 92.88443 50.20926 105.64679
#> Dec 2021 71.10311 53.40682 94.66306 45.89871 110.14803
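Under the hood, stlm() follows an STL-then-model recipe: decompose, model the seasonally adjusted series with a non-seasonal method, then add back the seasonal component from the last observed year. The base-R sketch below illustrates that recipe on the built-in AirPassengers data instead of the oil series. The fixed ARIMA(1,1,0) order and s.window = "periodic" are assumptions for illustration only; stlm() selects its model automatically and uses its own STL settings, so its numbers will differ:

```r
# Stand-in data: the built-in AirPassengers series, not the oil prices
y <- log(AirPassengers)   # lambda = 0 corresponds to modelling on the log scale
h <- 3
m <- frequency(y)
n <- length(y)

# 1. STL decomposition ("periodic" seasonality is a simplifying assumption here)
fit_stl <- stl(y, s.window = "periodic")
seas    <- fit_stl$time.series[, "seasonal"]
adj     <- y - seas                                # seasonally adjusted series

# 2. Non-seasonal model on the adjusted series (order fixed by assumption)
fit    <- arima(adj, order = c(1, 1, 0))
fc_adj <- predict(fit, n.ahead = h)$pred

# 3. Re-seasonalise: forecast step j reuses the seasonal value from one year earlier
fc_log <- fc_adj + as.numeric(seas)[n - m + seq_len(h)]

# 4. Back-transform from the log scale to the original units
fc <- exp(fc_log)
print(fc)
```

Modelling on the log scale and back-transforming with exp() also keeps the point forecasts positive, which suits a price series.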
# Plot each model's forecast (with 80% and 95% intervals) against the actual series
a <- autoplot(oilprice_ets_forecast, series = "ETS", fcol = "red") +
  autolayer(oilprice_ts, series = "Actual", color = "black") +
  labs(subtitle = "Oil Prices, from Jan 2013 - Dec 2021",
       y = "Oil Price in USD") +
  theme_minimal()
b <- autoplot(oilprice_sarima_forecast, series = "ARIMA", fcol = "blue") +
  autolayer(oilprice_ts, series = "Actual", color = "black") +
  labs(subtitle = "Oil Prices, from Jan 2013 - Dec 2021",
       y = "Oil Price in USD") +
  theme_minimal()
# Stack the two plots vertically
grid.arrange(a, b)
# RMSE of each model's point forecasts on the 3-month test set (lower is better)
data.frame(ETS = RMSE(oilprice_ets_forecast$mean, oilprice_test),
           SARIMA = RMSE(oilprice_sarima_forecast$mean, oilprice_test))
Based on RMSE over the 3-month test set, the STL + ETS model forecasts slightly better (about 7.34) than the STL + ARIMA model (the SARIMA column, about 7.42). With only three test points, however, the margin between the two is small.
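RMSE squares each forecast error before averaging, so a single large miss weighs more than several small ones. A minimal sketch of the formula on toy numbers (rmse() is a hypothetical helper computing the same quantity as the RMSE() call above):

```r
# RMSE = sqrt(mean((forecast - actual)^2)); rmse() is a hypothetical helper
rmse <- function(pred, actual) sqrt(mean((as.numeric(pred) - as.numeric(actual))^2))

# Toy numbers only, not the oil forecasts
actual  <- c(80, 82, 78)
model_a <- c(75, 80, 79)   # errors  -5, -2, 1
model_b <- c(70, 85, 78)   # errors -10,  3, 0

rmse(model_a, actual)      # sqrt(30/3)  = 3.162...
rmse(model_b, actual)      # sqrt(109/3) = 6.028...
```

Here model_a has the lower RMSE even though it misses in all three months, because model_b's single 10-point miss dominates the squared errors.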
Normality: shapiro.test()
H0: the residuals are normally distributed
H1: the residuals are not normally distributed
# p-value < 0.05: reject H0 in favour of H1
shapiro.test(oilprice_ets_forecast$residuals)
#>
#> Shapiro-Wilk normality test
#>
#> data: oilprice_ets_forecast$residuals
#> W = 0.82461, p-value = 7.904e-10
hist(oilprice_ets_forecast$residuals, breaks = 12)
plot(oilprice_ets_forecast$residuals)
Since the p-value (7.904e-10) is far below 0.05, we reject H0: the residuals of the ETS model are not normally distributed.
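The decision rule can be illustrated with simulated residuals. In this sketch (simulated data and a hypothetical decide() helper), exponential draws are clearly non-normal and should be rejected, while the normal draws usually, though not always, pass at the 5% level:

```r
set.seed(42)
alpha <- 0.05

# Stand-in residuals (simulated, not the model's): one symmetric, one strongly right-skewed
normal_resid <- rnorm(100)
skewed_resid <- rexp(100)

p_normal <- shapiro.test(normal_resid)$p.value
p_skewed <- shapiro.test(skewed_resid)$p.value

# Decision rule used above: reject H0 (normality) when p < alpha
decide <- function(p) if (p < alpha) "reject H0: not normal" else "fail to reject H0"
decide(p_normal)
decide(p_skewed)
```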
Autocorrelation: Box.test() (Ljung-Box)
H0: there is no autocorrelation in the forecast errors
H1: there is autocorrelation in the forecast errors
# p-value > 0.05: not enough evidence to reject H0
Box.test(oilprice_ets_forecast$residuals, type = "Ljung-Box")
#>
#> Box-Ljung test
#>
#> data: oilprice_ets_forecast$residuals
#> X-squared = 2.2534, df = 1, p-value = 0.1333
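Since the p-value (0.1333) is above 0.05, we fail to reject H0: there is no significant autocorrelation in the residuals. Note that Box.test() checks only lag 1 by default, which is why the output reports df = 1.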
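The Ljung-Box statistic is simple enough to compute directly. The sketch below, on simulated white noise rather than the model residuals, reproduces Box.test() by hand. It uses lag = 12 (an assumption: one full seasonal cycle) to show how a longer window of lags is checked, whereas the test above used the default of a single lag:

```r
set.seed(1)
e <- rnorm(60)    # stand-in residuals (simulated white noise)

n   <- length(e)
lag <- 12         # one seasonal cycle; Box.test() defaults to lag = 1

# Sample autocorrelations at lags 1..lag (drop lag 0)
r <- acf(e, lag.max = lag, plot = FALSE)$acf[-1]

# Ljung-Box statistic: Q = n(n + 2) * sum_k r_k^2 / (n - k), chi-squared with `lag` df
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
p <- pchisq(Q, df = lag, lower.tail = FALSE)

bt <- Box.test(e, lag = lag, type = "Ljung-Box")
c(Q_manual = Q, Q_box_test = unname(bt$statistic))
```

When the series being tested is the residuals of a fitted ARIMA model, Box.test()'s fitdf argument can subtract the number of estimated parameters from the degrees of freedom.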
In a time series, such errors can arise from various unpredictable events and are largely unavoidable. One way to address them is to analyze which kinds of unpredictable events tend to occur, and how often. Part of this can be handled in time series analysis through seasonal adjustment.
Given real-world events, there is currently no single best way to predict oil prices. Oil companies and oil services companies have to adapt to world events that directly affect oil production, transport, refining, and so on, as well as to other variables that affect oil prices indirectly.