1 Introduction

Non-renewable sources of energy, although depleting, are still an integral part of our lives. Crude oil, a fossil fuel, is used widely around the world. Petroleum's uses range from fuel for planes to the manufacture of clothing, insulation, and more.

2 Objective

In this small project we try to predict the oil price for the next 3 months, based on daily prices ranging from October 2012 to the day of this writing, July 2022. The objective is to forecast the oil price 3 months ahead using time series methods.

3 Data Preparation

3.1 Importing Data

First we import the data. We use the Crude Oil data from Yahoo! Finance. Let’s look at the first few observations.

#Read data
oilprice <- read.csv("oilprice.csv")
head(oilprice, 10)

As shown above, we only need the Date and Close columns. We drop the other, unnecessary columns and convert the Date column to a date type.

# Remove unnecessary columns and parse Date
library(dplyr)      # %>%, select(), mutate()
library(lubridate)  # ymd()
oilprice <- oilprice %>% 
  select(Date, Close.) %>% 
  mutate(Date = ymd(Date))
head(oilprice, 10)

3.2 Data Wrangling

As we can see from the first few observations, some dates are missing because those days were not recorded. Time series analysis requires a regularly spaced series with no missing observations, so we must fill in the missing months.

library(padr)  # pad()
# Insert a row for every missing month between the first and last date
oilprice <- oilprice %>% 
  pad('month', 
      start_val = as.POSIXlt(min(oilprice$Date)), 
      end_val = as.POSIXlt(max(oilprice$Date)))
head(oilprice, 20)

As we can see above, the missing months are filled with NA. Let’s fill them with the next available price: na.locf() with fromLast = TRUE carries the following observation backward.

library(zoo)  # na.locf()
oilprice <- na.locf(oilprice, fromLast = TRUE)
head(oilprice, 20)
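
To make the two wrangling steps concrete, here is a minimal base R sketch on toy data. It is not the padr/zoo code used in this project: the values are illustrative, merge() stands in for pad(), and fill_backward() is a hypothetical helper intended to mirror na.locf(fromLast = TRUE) for gaps inside the series.

```r
# Toy data: March 2013 is missing (values are illustrative only)
obs <- data.frame(
  Date  = as.Date(c("2013-01-01", "2013-02-01", "2013-04-01")),
  Close = c(97.49, 92.05, 93.46)
)

# Step 1: build the complete monthly calendar and left-join the observations,
# so every missing month appears as a row with Close = NA
calendar <- data.frame(Date = seq(min(obs$Date), max(obs$Date), by = "month"))
padded <- merge(calendar, obs, by = "Date", all.x = TRUE)

# Step 2: carry the next observation backward.
# Hypothetical helper; a trailing NA would stay NA because nothing follows it.
fill_backward <- function(x) {
  for (i in rev(seq_along(x)[-length(x)])) {
    if (is.na(x[i])) x[i] <- x[i + 1]
  }
  x
}

padded$Close <- fill_backward(padded$Close)
print(padded)
```

In this sketch March receives the April price, which is the direction fromLast = TRUE fills in. Carrying the previous month forward instead would correspond to the default fromLast = FALSE.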

3.3 Time-Series Object

Now that the data is prepared, we can create a time series object. Since the data is monthly, we set the frequency to 12.

oilprice_ts <- ts(data = oilprice$Close., start = c(2013,1), frequency = 12 )
oilprice_ts
#>         Jan    Feb    Mar    Apr    May    Jun    Jul    Aug    Sep    Oct
#> 2013  97.49  92.05  97.23  93.46  91.97  96.56 105.03 107.65  96.38  96.38
#> 2014  97.49 102.59 101.58  99.74 102.71  98.17  98.17  95.96  91.16  80.54
#> 2015  48.24  59.63  59.63  59.63  60.30  59.47  47.12  49.20  45.09  46.59
#> 2016  33.62  33.75  38.34  45.92  48.33  48.33  41.60  44.70  48.24  46.86
#> 2017  54.01  54.01  50.60  49.33  48.32  46.04  50.17  47.23  51.67  57.40
#> 2018  64.73  61.64  64.94  67.04  67.04  74.15  69.80  69.80  73.25  65.31
#> 2019  53.79  57.22  60.14  63.91  53.50  58.47  58.58  55.10  54.18  54.18
#> 2020  51.56  44.76  18.84  18.84  35.49  39.27  40.27  42.61  40.22  35.79
#> 2021  52.20  61.50  59.16  63.58  66.32  73.47  73.95  75.03  75.03  83.57
#>         Nov    Dec
#> 2013  92.72  97.49
#> 2014  66.15  53.27
#> 2015  37.04  37.04
#> 2016  49.44  53.72
#> 2017  57.40  60.42
#> 2018  50.93  45.41
#> 2019  55.17  51.56
#> 2020  48.52  48.52
#> 2021  66.18  75.21
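
One point worth keeping in mind is that ts() does not read the Date column: the period labels come only from start and frequency. The short sketch below, on made-up values, shows how to check what a ts object covers and how to pull out a calendar range.

```r
# Illustrative values only: 24 monthly observations starting January 2013
x <- ts(seq(50, 73), start = c(2013, 1), frequency = 12)

start(x)      # first period as c(year, month)
end(x)        # last period as c(year, month)
frequency(x)  # observations per year

# window() extracts a sub-range by calendar position
x_2014 <- window(x, start = c(2014, 1), end = c(2014, 12))
print(x_2014)
```

Comparing start() and end() with the first and last dates in the data frame is a quick way to confirm that the labels line up with the actual observations.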

4 Exploratory Data Analysis

With everything set, we can plot the time series object.

library(forecast)  # autoplot() for ts objects, stlm(), forecast()
oilprice_ts %>% autoplot()

4.1 Decompose

Now that the plot is visible, we can split the series into its trend, seasonal, and remainder components by decomposing the time series object.

oilprice_decomposed <- decompose(oilprice_ts)
oilprice_decomposed %>% autoplot()

As we can see from the trend component, it fluctuates over time: it is mostly decreasing, then slowly rises again toward the end of the series.
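
As a small illustration of what decompose() returns, the sketch below runs it on a synthetic monthly series (a straight-line trend plus a fixed seasonal pattern, both invented for this example). It shows two properties of the default additive decomposition: the trend is undefined at the edges, and the three components add back up to the original series.

```r
# Synthetic monthly series: linear trend + fixed seasonal pattern (illustrative)
n <- 48
trend_part    <- seq(40, 87, length.out = n)
seasonal_part <- rep(c(-3, -2, 0, 2, 4, 5, 4, 2, 0, -2, -4, -6), times = n / 12)
y <- ts(trend_part + seasonal_part, start = c(2013, 1), frequency = 12)

parts <- decompose(y)  # additive by default

# The centered moving average cannot be computed at the edges:
# for monthly data, 6 values at the start and 6 at the end are NA
sum(is.na(parts$trend))

# Where all components exist, they add back up to the original series
rebuilt <- parts$trend + parts$seasonal + parts$random
ok <- !is.na(rebuilt)
all.equal(as.numeric(rebuilt[ok]), as.numeric(y[ok]))
```

Because of the missing edge values, the trend line in the decomposition plot stops about half a year before the last observation.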

4.2 Splitting Data

For the train and test data, we hold out the last 3 months of the series as test data, matching the 3-month forecast horizon, and use everything before that as training data.

#Test Data
oilprice_test <- tail(oilprice_ts, n = 3)

#Train Data
oilprice_train <- head(oilprice_ts, n = -3)
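
An alternative way to express the same split is window(), which selects by calendar position and keeps the time index on both pieces. This is a sketch on a placeholder series with the same span as the printed data (Jan 2013 to Dec 2021); the values themselves are made up.

```r
# Placeholder monthly series covering Jan 2013 to Dec 2021 (108 values)
x <- ts(seq_len(108), start = c(2013, 1), frequency = 12)

test  <- window(x, start = c(2021, 10))  # last 3 months: Oct-Dec 2021
train <- window(x, end = c(2021, 9))     # everything up to Sep 2021

length(train)
length(test)
start(test)  # first test period as c(year, month)
```

Spelling out the cut-off date makes it easy to see that the forecast horizon starts exactly where the training data ends.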

5 Modeling

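The next step is to build the models. For this project we use ETS Holt-Winters because the data contains both seasonality and trend. For comparison, we also use Seasonal ARIMA to check which model forecasts better. Both are fitted with stlm(), which handles seasonality through an STL decomposition: the chosen model (ETS or ARIMA) is fitted to the seasonally adjusted series and the seasonal component is added back at forecast time. Setting lambda = 0 applies a log transform.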

# ETS Holt-Winters
oilprice_ets <- stlm(oilprice_train, method = "ets", lambda = 0)
# Seasonal ARIMA
oilprice_sarima <- stlm(oilprice_train, method = "arima", lambda = 0)
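
To give a rough idea of the preprocessing that stlm() wraps, the base R sketch below applies a log transform (the Box-Cox transform with lambda = 0) and an STL decomposition to the built-in AirPassengers series. It is only an illustration under assumptions: the dataset is a stand-in for the oil prices, and s.window = "periodic" is chosen for simplicity rather than taken from stlm().

```r
y <- log(AirPassengers)               # Box-Cox with lambda = 0 is the log
fit <- stl(y, s.window = "periodic")  # s.window is an assumption of this sketch

seasonal <- fit$time.series[, "seasonal"]
adjusted <- y - seasonal              # seasonally adjusted series, log scale

# seasonal + trend + remainder reproduces the transformed series
all.equal(as.numeric(rowSums(fit$time.series)), as.numeric(y))

# A model fitted to `adjusted` would then be forecast, the seasonal pattern
# added back, and exp() applied to return to the original scale.
```

Seen this way, the ETS or ARIMA model only has to describe the seasonally adjusted series, while the seasonal pattern comes from the STL step.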

6 Forecast

# ETS Holt-Winters
oilprice_ets_forecast <- forecast(oilprice_ets, h = 3)
oilprice_ets_forecast
#>          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
#> Oct 2021       73.70409 62.58452 86.79932 57.39429  94.64867
#> Nov 2021       73.30258 58.16747 92.37582 51.46488 104.40650
#> Dec 2021       71.49526 53.86009 94.90464 46.36064 110.25673

# Seasonal ARIMA
oilprice_sarima_forecast <- forecast(oilprice_sarima, h = 3)
oilprice_sarima_forecast
#>          Point Forecast    Lo 80    Hi 80    Lo 95     Hi 95
#> Oct 2021       73.34777 62.68822 85.81989 57.68760  93.25913
#> Nov 2021       72.83164 57.10803 92.88443 50.20926 105.64679
#> Dec 2021       71.10311 53.40682 94.66306 45.89871 110.14803

library(ggplot2)    # labs(), theme_minimal()
library(gridExtra)  # grid.arrange()

# Forecast plots with the actual series overlaid in black
a <- autoplot(oilprice_ets_forecast, series = "ETS", fcol = "red") +
  autolayer(oilprice_ts, series = "Actual", color = "black") + 
  labs(subtitle = "Oil Prices, from Jan 2013 - Dec 2021",
       y = "Oil Price in USD") +
  theme_minimal()

b <- autoplot(oilprice_sarima_forecast, series = "ARIMA", fcol = "blue") +
  autolayer(oilprice_ts, series = "Actual", color = "black") +
  labs(subtitle = "Oil Prices, from Jan 2013 - Dec 2021",
       y = "Oil Price in USD") +
  theme_minimal()

grid.arrange(a,b)

6.1 Evaluation

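# RMSE() is not base R; it is assumed here to come from MLmetrics
# (caret exports a compatible RMSE() as well)
library(MLmetrics)
data.frame(ETS = RMSE(oilprice_ets_forecast$mean, oilprice_test), 
           SARIMA = RMSE(oilprice_sarima_forecast$mean, oilprice_test))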

From the result, we can conclude that ETS is the better of the two forecast models on this test set, with an RMSE of ~7.34 compared to ~7.42 for the Seasonal ARIMA. The difference is small.
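
For readers who want to check the metric by hand, here is a small base R sketch. The rmse() helper is defined only for this illustration; the numbers are the Oct-Dec 2021 values from the printed series and the printed point forecasts. The results should land close to the values quoted above, though they may differ slightly because the printed figures are rounded.

```r
# Root mean squared error: square the errors, average them, take the root
rmse <- function(predicted, actual) sqrt(mean((actual - predicted)^2))

actual <- c(83.57, 66.18, 75.21)           # Oct-Dec 2021 from the series
ets    <- c(73.70409, 73.30258, 71.49526)  # ETS point forecasts
sarima <- c(73.34777, 72.83164, 71.10311)  # ARIMA point forecasts

round(c(ETS = rmse(ets, actual), SARIMA = rmse(sarima, actual)), 2)
```

With only three test observations, a single month (here October, where both models miss by roughly 10 USD) dominates the score, so the ranking of the two models should be read with caution.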

7 Assumption

7.1 Normality

Normality: shapiro.test (Shapiro-Wilk)

H0 : residuals are normally distributed
H1 : residuals are not normally distributed

# p-value < 0.05; reject H0; accept H1
shapiro.test(oilprice_ets_forecast$residuals) 
#> 
#>  Shapiro-Wilk normality test
#> 
#> data:  oilprice_ets_forecast$residuals
#> W = 0.82461, p-value = 7.904e-10
hist(oilprice_ets_forecast$residuals, breaks = 12)

plot(oilprice_ets_forecast$residuals)

Based on the result, the residuals are not normally distributed.
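
To show how the decision rule works in practice, the sketch below applies shapiro.test() to two simulated samples, one drawn from a normal distribution and one from a strongly skewed distribution. The data and the seed are arbitrary and unrelated to the oil price residuals.

```r
set.seed(42)  # arbitrary seed so the sketch is reproducible
bell_shaped <- rnorm(100)  # simulated normal residuals
heavy_skew  <- rexp(100)   # simulated strongly skewed residuals

p_values <- c(
  normal = shapiro.test(bell_shaped)$p.value,
  skewed = shapiro.test(heavy_skew)$p.value
)

# TRUE means "reject H0 of normality" at the 5% level
p_values < 0.05
```

Non-normal residuals do not invalidate the point forecasts, but prediction intervals that rely on a normality assumption may be less reliable.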

7.2 Autocorrelation

Autocorrelation: Box.test - Ljung-Box

H0 : there is no autocorrelation in the forecast errors
H1 : there is autocorrelation in the forecast errors

# p-value > 0.05; not enough evidence to reject H0
Box.test(oilprice_ets_forecast$residuals, type = "Ljung-Box") 
#> 
#>  Box-Ljung test
#> 
#> data:  oilprice_ets_forecast$residuals
#> X-squared = 2.2534, df = 1, p-value = 0.1333

Based on the result, we find no evidence of autocorrelation in the residuals at lag 1.
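
Box.test() uses lag = 1 by default, which is why the output above reports df = 1. For monthly data it can be informative to test several lags jointly. The sketch below, on simulated white noise with an arbitrary seed and an illustrative choice of 12 lags, computes the Ljung-Box statistic by hand and compares it with Box.test().

```r
set.seed(1)  # arbitrary seed; the series is simulated white noise
e <- rnorm(105)
n <- length(e)
h <- 12      # number of lags to test jointly (illustrative choice)

# Ljung-Box statistic: Q = n (n + 2) * sum_{k = 1..h} r_k^2 / (n - k)
r <- acf(e, lag.max = h, plot = FALSE)$acf[-1]  # drop lag 0
Q_manual <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))

bt <- Box.test(e, lag = h, type = "Ljung-Box")
all.equal(Q_manual, unname(bt$statistic))
bt$p.value
```

The same call with the model residuals in place of e would check for autocorrelation up to a full year of lags rather than only lag 1.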

8 Conclusion

In a time series, forecast errors can arise from various unpredictable events and are largely unavoidable. One strategy for dealing with them is to analyze which kinds of unpredictable events occur, and how frequently. This can be done through time series analysis with seasonal adjustment.

Given real-world events, there is currently no single best way to predict oil prices. Oil companies and oil services companies will have to adapt to world events that affect oil production, transport, refining, and so on, which can impact oil prices directly, as well as to other variables that impact them indirectly.