Introduction:
About the company:
Subsequent to the dismantling of the Administered Pricing Mechanism (APM) in the petroleum sector, Oil Coordination Committee (OCC) was abolished and a new cell called Petroleum Planning & Analysis Cell (PPAC), attached to the Ministry of Petroleum and Natural Gas, was created effective 1st April 2002.
PPAC is to assist the Government in discharge of some of the functions earlier performed by OCC.
Shri Ram Naik, the then Hon’ble Minister of Petroleum & Natural Gas inaugurated Petroleum Planning & Analysis Cell at SCOPE Complex, Lodhi Road, New Delhi.
The expenditure of PPAC is borne by Oil Industry Development Board (OIDB).
The cell is located at New Delhi (India).
Here we collected data of net imports by india since Apr’2011 to Mar’2020.
Goal: To forecast the net crude oil import by india for next 2 years.
Data Collection:
Data collected from PPAC (Petroleum planning and analysis cell)
Data Exploration and Visualization:
tsdisplay(netimport.ts) # inspecting the ts object
autoplot(netimport.ts) # Plot function
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'
ggseasonplot(netimport.ts) # Plot the seasonality
## Warning in is.na(ylab): is.na() applied to non-(list or vector) of type
## 'NULL'
ggseasonplot(netimport.ts, polar = TRUE) # poltting seasonality with time as circular rather than linear
## Warning in is.na(ylab): is.na() applied to non-(list or vector) of type
## 'NULL'
ggsubseriesplot(netimport.ts) # Plotting subseries for each month (season)
ggAcf(netimport.ts) # Autocorrelation Plot
ggPacf(netimport.ts) # partial Autocorrelation Plot
Data pre partition and preprocessing:
# Data Partition
nValid <- 12
nTrain <- length(netimport.ts) - nValid
train.ts <- window(netimport.ts, start = c(2011, 4), end = c(2011, nTrain))
valid.ts <- window(netimport.ts, start = c(2011, nTrain + 4 ), end = c(2011, nTrain + nValid))
applying forecast methods
######################################################################################
# Naive forecasts
naive_fc <- naive(train.ts, h = 20)
summary(naive_fc)
##
## Forecast method: Naive method
##
## Model Information:
## Call: naive(y = train.ts, h = 20)
##
## Residual sd: 1777.2562
##
## Error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 50.14565 1768.282 1341.233 -0.6894039 10.62507 1.014018
## ACF1
## Training set -0.5272263
##
## Forecasts:
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## Jan 2019 17453.22 15187.076 19719.36 13987.451 20918.99
## Feb 2019 17453.22 14248.408 20658.03 12551.883 22354.56
## Mar 2019 17453.22 13528.143 21378.30 11450.332 23456.11
## Apr 2019 17453.22 12920.931 21985.51 10521.682 24384.76
## May 2019 17453.22 12385.967 22520.47 9703.525 25202.91
## Jun 2019 17453.22 11902.322 23004.12 8963.855 25942.59
## Jul 2019 17453.22 11457.565 23448.87 8283.657 26622.78
## Aug 2019 17453.22 11043.596 23862.84 7650.545 27255.89
## Sep 2019 17453.22 10654.787 24251.65 7055.913 27850.53
## Oct 2019 17453.22 10287.042 24619.40 6493.496 28412.94
## Nov 2019 17453.22 9937.269 24969.17 5958.565 28947.88
## Dec 2019 17453.22 9603.065 25303.37 5447.444 29459.00
## Jan 2020 17453.22 9282.520 25623.92 4957.213 29949.23
## Feb 2020 17453.22 8974.084 25932.36 4485.500 30420.94
## Mar 2020 17453.22 8676.480 26229.96 4030.355 30876.09
## Apr 2020 17453.22 8388.642 26517.80 3590.144 31316.30
## May 2020 17453.22 8109.667 26796.77 3163.489 31742.95
## Jun 2020 17453.22 7838.783 27067.66 2749.208 32157.23
## Jul 2020 17453.22 7575.325 27331.11 2346.284 32560.16
## Aug 2020 17453.22 7318.714 27587.73 1953.830 32952.61
# Use accuracy() to compute RMSE statistics
accuracy(naive_fc,netimport.ts)
## ME RMSE MAE MPE MAPE MASE
## Training set 50.14565 1768.282 1341.233 -0.6894039 10.625070 1.0140184
## Test set -362.49800 1380.497 1036.706 -2.7455271 6.236945 0.7837856
## ACF1 Theil's U
## Training set -0.52722625 NA
## Test set -0.07991316 0.7057737
######################################################################################
Decomposition
######################################################################################
# Decompose the given series into its various components: seasonal, trend, and irregular components (multiplicative decomposition).
import_decomposed <- decompose(netimport.ts, type = "multiplicative")
plot(import_decomposed) # Plot the decomposed series
#####################################################################################
# Seasonal Index from the decomposed series
seasonal_index <- data.frame(import_decomposed$figure)
seasonal_index <- cbind(seasonal_index, seq(1:12))
Linear trend model:
#####################################################################################
# Linear Trend Model
train.lm <- tslm(netimport.ts ~ trend)
summary(train.lm)
##
## Call:
## tslm(formula = netimport.ts ~ trend)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3269.2 -666.2 27.3 841.2 3244.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 9924.591 252.315 39.33 <2e-16 ***
## trend 72.445 4.019 18.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1302 on 106 degrees of freedom
## Multiple R-squared: 0.7541, Adjusted R-squared: 0.7517
## F-statistic: 325 on 1 and 106 DF, p-value: < 2.2e-16
train.lm.linear.trend.pred <- forecast(train.lm, h = nValid)
accuracy(train.lm.linear.trend.pred, nValid) # Model accuracy
## ME RMSE MAE MPE MAPE
## Training set 1.347056e-13 1289.863 996.1412 -9.942159e-01 7.609353e+00
## Test set -1.780908e+04 17809.085 17809.0847 -1.484090e+05 1.484090e+05
## MASE ACF1
## Training set 0.7203631 0.01945983
## Test set 12.8787034 NA
# Plotting the linear trend
plot(train.ts, xlab = "Time", ylab = "netimport", ylim = c(10000, 20000), bty = "l")
lines(train.lm$fitted, lwd = 2)
Exponential model
####################################################################################
# Exponential Trend Model
train.lm.expo.trend <- tslm(train.ts ~ trend, lambda = 0)
train.lm.expo.trend.pred <- forecast(train.lm.expo.trend, h = nValid)
accuracy(train.lm.expo.trend.pred, nValid) # Model accuracy
## ME RMSE MAE MPE MAPE
## Training set 64.21258 1274.451 979.9133 -5.237561e-01 7.73784e+00
## Test set -17149.08591 17149.086 17149.0859 -1.429090e+05 1.42909e+05
## MASE ACF1
## Training set 0.7306064 0.02246003
## Test set 12.7860623 NA
# Plots for fitted value for Exponential Trend
plot(train.lm.expo.trend.pred, ylim = c(10000, 25000), ylab = "netimports", xlab = "Time", bty = "l", xaxt = "n", xlim = c(2011,2022.25), main = "", flty = 2)
axis(1, at = seq(2011, 2022, 1), labels = format(seq(2011, 2022, 1)))
lines(train.lm.expo.trend.pred$fitted, lwd = 2, col = "blue") # Added in 6-5
lines(train.lm.linear.trend.pred$fitted, lwd = 2, col = "red", lty = 3)
lines(train.lm.linear.trend.pred$mean, lwd = 2, col = "black", lty = 3)
lines(valid.ts)
lines(c(2020.25 - 3, 2020.25 - 3), c(0, 35000))
lines(c(2020.25, 2020.25), c(0, 35000))
text(2011.25, 25000, "Training")
text(2018.25, 25000, "Validation")
text(2022.25, 25000, "Future")
arrows(2020 - 3, 24500, 2011.25, 24500, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5 - 3, 24500, 2020, 24500, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5, 24500, 2022, 24500, code = 3, length = 0.1, lwd = 1, angle = 30)
Model with seasonality
########################################################################################
# Modeling Seasonality
train.lm.season <- tslm(train.ts ~ season)
summary(train.lm.season)
##
## Call:
## tslm(formula = train.ts ~ season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4799.2 -1854.9 -200.5 2043.8 4394.9
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 14669.9 918.5 15.971 <2e-16 ***
## season2 -2279.8 1299.0 -1.755 0.0830 .
## season3 -1617.7 1299.0 -1.245 0.2166
## season4 -426.8 1257.8 -0.339 0.7353
## season5 -648.8 1257.8 -0.516 0.6073
## season6 -1437.5 1257.8 -1.143 0.2564
## season7 -1373.4 1257.8 -1.092 0.2781
## season8 -1075.0 1257.8 -0.855 0.3952
## season9 -2568.1 1257.8 -2.042 0.0444 *
## season10 -1597.0 1257.8 -1.270 0.2078
## season11 -1753.1 1257.8 -1.394 0.1672
## season12 -1009.8 1257.8 -0.803 0.4244
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2430 on 81 degrees of freedom
## Multiple R-squared: 0.08635, Adjusted R-squared: -0.03773
## F-statistic: 0.6959 on 11 and 81 DF, p-value: 0.7388
train.lm.season.pred <- forecast(train.lm.season, h = nValid)
accuracy(train.lm.season.pred , nValid)
## ME RMSE MAE MPE MAPE
## Training set -1.160911e-13 2268.006 2009.405 -3.108099e+00 15.84392
## Test set -1.465795e+04 14657.950 14657.950 -1.221496e+05 122149.58333
## MASE ACF1
## Training set 1.498178 0.7905511
## Test set 10.928714 NA
train.lm.season%>% forecast(h = 10) %>% autoplot()
# Plots of Fitted and Resdiuals
par(mfrow = c(2,1))
plot(train.lm.season.pred, ylim = c(13000, 26250), ylab = "netimports", xlab = "Time", bty = "l", xaxt = "n", xlim = c(2011,2022.25), main = "", flty = 2)
axis(1, at = seq(2011, 2022, 1), labels = format(seq(2011, 2022, 1)))
lines(train.lm.season.pred$fitted, lwd = 2, col = "blue")
lines(valid.ts)
lines(c(2020.25 - 3, 2020.25 - 3), c(0, 35000))
lines(c(2020.25, 2020.25), c(0, 35000))
text(2011.25, 26000, "Training")
text(2018.75, 26000, "Validation")
text(2022.25, 26000, "Future")
arrows(2020 - 3, 24500, 2011.25, 24500, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5 - 3, 24500, 2020, 24500, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5, 24500, 2022, 24500, code = 3, length = 0.1, lwd = 1, angle = 30)
# Residuals
plot(train.lm.season.pred$residuals, ylim = c(-4000, 5500), ylab = "Residuals", xlab = "Time", bty = "l", xaxt = "n", xlim = c(2011,2022.25), main = "")
axis(1, at = seq(2011, 2022, 1), labels = format(seq(2011, 2022, 1)))
lines(train.ts - train.lm.season.pred$fitted)
lines(valid.ts - train.lm.season.pred$mean)
lines(c(2020.25 - 3, 2020.25 - 3), c(-5000, 35000))
lines(c(2020.25, 2020.25), c(-5000, 35000))
text(2011.25, 5250, "Training")
text(2018.75, 5250, "Validation")
text(2022.25, 5250, "Future")
arrows(2020 - 3, 4250, 2011.25, 4250, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5 - 3, 4250, 2020, 4250, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5, 4250, 2022, 4250, code = 3, length = 0.1, lwd = 1, angle = 30)
########################################################################################
Quadratic trend + Seasonality
# Modleing Quadratic Trend + Seasonality
train.lm.trend.season <- tslm(train.ts ~ trend + I(trend^2) + season)
train.lm.trend.season.pred <- forecast(train.lm.trend.season, h = nValid, level = 0)
accuracy(train.lm.trend.season.pred , nValid)
## ME RMSE MAE MPE MAPE
## Training set 1.957783e-14 993.5972 799.11 -6.445713e-01 6.319136e+00
## Test set -1.857160e+04 18571.5952 18571.60 -1.547633e+05 1.547633e+05
## MASE ACF1
## Training set 0.5958026 -0.005597907
## Test set 13.8466606 NA
# Summary
summary(train.lm.trend.season)
##
## Call:
## tslm(formula = train.ts ~ trend + I(trend^2) + season)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2282.31 -639.18 81.64 679.96 1991.83
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 11427.1619 533.2136 21.431 < 2e-16 ***
## trend 61.1973 16.9490 3.611 0.000534 ***
## I(trend^2) 0.1589 0.1747 0.909 0.365983
## season2 -2355.7907 576.2563 -4.088 0.000104 ***
## season3 -1769.9995 576.3018 -3.071 0.002922 **
## season4 -229.3490 559.1232 -0.410 0.682774
## season5 -526.4552 558.9719 -0.942 0.349152
## season6 -1390.4495 558.8743 -2.488 0.014953 *
## season7 -1402.0141 558.8290 -2.509 0.014159 *
## season8 -1179.5847 558.8348 -2.111 0.037953 *
## season9 -2749.0115 558.8915 -4.919 4.65e-06 ***
## season10 -1854.4656 558.9993 -3.317 0.001375 **
## season11 -2087.5015 559.1594 -3.733 0.000355 ***
## season12 -1421.4534 559.3731 -2.541 0.013007 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1078 on 79 degrees of freedom
## Multiple R-squared: 0.8246, Adjusted R-squared: 0.7958
## F-statistic: 28.58 on 13 and 79 DF, p-value: < 2.2e-16
train.lm.trend.season %>% forecast(h = 20) %>% autoplot()
#Plotting the fitted and residuals
par(mfrow = c(2,1))
plot(train.lm.trend.season.pred, ylim = c(8000, 35250), ylab = "netimports", xlab = "Time", bty = "l", xaxt = "n", xlim = c(2011,2020.25), main = "", flty = 2)
axis(1, at = seq(2011, 2022, 4), labels = format(seq(2011, 2022, 4)))
lines(train.lm.trend.season.pred$fitted, lwd = 2, col = "green")
lines(valid.ts)
lines(c(2020.25 - 3, 2020.25 - 3), c(0, 35000))
lines(c(2020.25, 2020.25), c(0, 35000))
text(2011.25, 35000, "Training")
text(2018.25, 35000, "Validation")
text(2022.25, 35000, "Future")
arrows(2020 - 3, 35500, 2011.25, 35500, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5 - 3, 35500, 2020, 35500, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5, 35500, 2022, 35500, code = 3, length = 0.1, lwd = 1, angle = 30)
# Plotting with residuals
plot(train.lm.trend.season.pred$residuals, ylim = c(-4000, 5500), ylab = "Residuals", xlab = "Time", bty = "l", xaxt = "n", xlim = c(2011,2022.25), main = "")
axis(1, at = seq(2011, 2022, 1), labels = format(seq(2011, 2022, 1)))
lines(train.ts - train.lm.trend.season.pred$fitted)
lines(valid.ts - train.lm.trend.season.pred$mean)
lines(c(2020.25 - 3, 2020.25 - 3), c(-5000, 35000))
lines(c(2020.25, 2020.25), c(-5000, 35000))
text(2011.25, 5250, "Training")
text(2018.75, 5250, "Validation")
text(2022.25, 5250, "Future")
arrows(2020 - 3, 4250, 2011.25, 4250, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5 - 3, 4250, 2020, 4250, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5, 4250, 2022, 4250, code = 3, length = 0.1, lwd = 1, angle = 30)
# testing for white noise
train.lm.trend.season.resid <- train.lm.trend.season$residuals
Box.test(train.lm.trend.season.resid, lag = 24, fitdf = 0, type = "Ljung")
##
## Box-Ljung test
##
## data: train.lm.trend.season.resid
## X-squared = 52.952, df = 24, p-value = 0.000587
ggAcf(train.lm.trend.season.resid)
gglagplot(train.lm.trend.season.resid)
## Warning in is.na(xlab): is.na() applied to non-(list or vector) of type
## 'NULL'
## Warning in is.na(ylab): is.na() applied to non-(list or vector) of type
## 'NULL'
# There is strong correlation among neighboring values
# we need to take advantage of it in our forecast.
###########################################################################################
############################################################################################
# Fitting ARIMA directly on the main series
fit <- auto.arima(netimport.ts)
checkresiduals(fit)
##
## Ljung-Box test
##
## data: Residuals from ARIMA(1,1,1)(2,0,0)[12]
## Q* = 11.351, df = 18, p-value = 0.8789
##
## Model df: 4. Total lags used: 22
AIC(fit)
## [1] 1829.304
fit %>% forecast(h = 10) %>% autoplot()
# ARMA and ARIMA
ggAcf(netimport.ts)
gglagplot(netimport.ts)
## Warning in is.na(xlab): is.na() applied to non-(list or vector) of type
## 'NULL'
## Warning in is.na(ylab): is.na() applied to non-(list or vector) of type
## 'NULL'
# Fitting an ARIMA on residual series
train.lm.trend.season <- tslm(train.ts ~ trend + I(trend^2) + season)
train.res.arima <- Arima(train.lm.trend.season$residuals, order = c(1,0,0))
train.res.arima.pred <- forecast(train.res.arima, h = nValid)
# Plot
plot(train.lm.trend.season$residuals, ylim = c(-5000, 5000), ylab = "Residuals",
xlab = "Time", bty = "l", xaxt = "n", xlim = c(2011,2022.25), main = "")
axis(1, at = seq(2011, 2022, 1), labels = format(seq(2011, 2022, 1)))
lines(train.res.arima.pred$fitted, lwd = 2, col = "red")
lines(c(2020.25 - 3, 2020.25 - 3), c(-5000, 35000))
lines(c(2020.25, 2020.25), c(-5000, 35000))
text(2011.25, 2250, "Training")
text(2018.75, 2250, "Validation")
text(2022.25, 22500, "Future")
arrows(2011 - 3, 2000, 2011.25, 2000, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5 - 3, 2000, 2020, 2000, code = 3, length = 0.1, lwd = 1,angle = 30)
arrows(2020.5, 2000, 2022, 2000, code = 3, length = 0.1, lwd = 1, angle = 30)
summary(train.res.arima)
## Series: train.lm.trend.season$residuals
## ARIMA(1,0,0) with non-zero mean
##
## Coefficients:
## ar1 mean
## -0.0057 -0.1196
## s.e. 0.1047 102.4748
##
## sigma^2 estimated as 1008900: log likelihood=-773.78
## AIC=1553.57 AICc=1553.84 BIC=1561.17
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.09657344 993.5812 799.3317 99.44335 99.44335 0.7010363
## ACF1
## Training set 0.001783958
# ACF function
ggAcf(train.res.arima$residuals, lag.max = 12, main = "")
## Warning: Ignoring unknown parameters: main
# # White Noise
set.seed(3)
wn <- ts(rnorm(120))
autoplot(wn)
## Warning in is.na(main): is.na() applied to non-(list or vector) of type
## 'NULL'
ggAcf(wn) + ggtitle("Sample ACF for whitenoise")
# How to test if a series is whitenoise or not
# Ljung-Box test
Box.test(wn, lag = 24, fitdf = 0, type = "Ljung")
##
## Box-Ljung test
##
## data: wn
## X-squared = 10.853, df = 24, p-value = 0.99
Conclusion:
Since crude oil import not only depend on demand, there is many factors influencing those are like socio economic factors , cross border relations, global econimc policies, population growth , exchange rates etc.
The above forecasting models forecast of crude oil net import by india.
To conclude that quadratic trend model is the best fit for above models for this problem statement,However many other factors influencing crude oil imports hence we have to explore other models to get accurate forecasting of crude imports.