ContextBase - Time Series Modeling

Section 1 - The Problem

Crude oil is a fossil fuel that remains in wide use around the world. It is a source of fuel for automobiles and planes, and a raw material for plastic products.

However, do geopolitical events, such as political and security issues in the Middle East and North Africa, have a significant impact on crude oil price fluctuations? How much of the movement in oil prices does geopolitical risk explain? Does geopolitical risk in the Middle East and North Africa have a stronger relationship with oil prices than geopolitical risk in the rest of the world?

Section 1.1 - Our Solution

To address these questions, ContextBase studies the impact of Geopolitical Risk (GPR) in MENA (Middle East and North Africa) on the crude oil market from 1985 to 2019, and tests whether the relationship between geopolitical risk in MENA and oil prices is positive and significant.

All data and analysis will be based on monthly data. This research paper utilizes the West Texas Intermediate (WTI) crude oil price as the dependent variable and the Geopolitical risk (GPR) index as the independent variable. Several control variables are also included.

Data Science applies scientific methods and processes to the analysis of datasets held in large-scale data storage, often accessible via the internet. The objective in analyzing scientific datasets with Data Science methods is to extract knowledge and provide scientific insights efficiently. Internet datasets are usually available in formats that are readable by modern Data Science programming languages.

The R programming language is derived from the statistical programming language “S”, which was developed at Bell Laboratories. R has grown in popularity with the advent of Data Science, and is well suited to the processes that Data Science requires. It allows for convenient dataset access, efficient algorithmic manipulation of datasets (for example, the ability to apply functions across datasets without FOR loops), and efficient statistical processing of dataset records and observations. R has a vast collection of dataset processing packages that encompass a wide variety of modern statistical and scientific methods. R also provides for convenient graphical processing of data contained within internet datasets.
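As a small illustration of applying functions across a dataset without FOR loops, the following sketch uses a toy data frame. The values are rounded from the first rows of the sample records shown later in this paper, and the column names `oil_price` and `mena_gpr` are illustrative assumptions rather than the names used in the project data.

```r
# Hypothetical toy data frame; the column names are illustrative only.
toy <- data.frame(
  oil_price = c(64.12, 64.58, 68.09, 66.20, 66.45),
  mena_gpr  = c(64.46, 67.27, 75.92, 86.60, 53.94)
)

# Apply mean() to every column without writing an explicit for loop.
column_means <- sapply(toy, mean)

# Vectorized arithmetic: month-over-month percentage change of one column.
pct_change <- 100 * diff(toy$oil_price) / head(toy$oil_price, -1)

print(round(column_means, 2))
print(round(pct_change, 2))
```

Both `sapply()` and the arithmetic on whole vectors operate on all elements at once, which is the style the rest of this paper relies on.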

Section 2 - Data Import

The data imported for this project is a dataframe of time series data. The observations include time series data for crude oil prices, geopolitical risk in the Middle East and North Africa (MENA), and geopolitical risk in the world outside the MENA region.

Other time series data included in the project data’s dataframe are the USA’s Dollar Index, Treasury Spread, Purchasing Managers Index, Industrial Production Index, the EU’s Industry Production Index, Japan’s Industry Production Index, China’s Industry Production Index, and India’s Industry Production Index.

For efficient programming, variables are created for the time series’ dates, the Oil Price Index, the MENA index, and non-MENA index.

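library(readxl)  # provides read_excel()

# Load the project dataset from the Excel workbook
project_data <- read_excel("REG_001.xlsx")

# Extract the date column and the three main series as vectors
x <- project_data$Date
MENA_GPR <- project_data$`Average MENA GPR (Independent Variable)`
Non_MENA_GPR <- project_data$`Average GPR of Non-MENA Countries (Control Variable)`
Oil_Prices <- project_data$`Real Crude Oil Prices (Dependent Variable)`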

Section 2.1 - Sample of Dataset Records

The dataset records contain monthly time series data from 1985 to 2019. The observation categories in the data are labelled as dependent, independent, and control variables.

The West Texas Intermediate (WTI) crude oil prices are the dependent variable and the Geopolitical risk (GPR) index is the independent variable. There are also multiple control variables.

library(knitr)  # provides kable()
kable(project_data[1:5,], caption = "Sample of Dataset Records")

Sample of Dataset Records
Date	Real Crude Oil Prices (Dependent Variable)	Average MENA GPR (Independent Variable)	Average GPR of Non-MENA Countries (Control Variable)	US Dollar Index (Control Variable)	US Treasury Spread (Control Variable)	US Purchasing Managers Index (PMI) (Control Variable)	US Industrial Production Index (PPI) (Control Variable)	EU Industry Production Index (PPI) (Control Variable)	Japan Industry Production (PPI) (Control Variable)	China Industry Production (PPI) (Control Variable)	India Industry Production (PPI) Control Variable
1985-01-01	64.12	64.46255	80.85935	124.761	1.45	51.7	56.1398	68.41407	83.87658	NA	NA
1985-02-01	64.58	67.27189	93.03326	128.033	1.34	52.1	56.3323	68.15677	83.62420	NA	NA
1985-03-01	68.09	75.92460	106.55102	128.437	1.15	52.8	56.4232	68.69044	82.95116	NA	NA
1985-04-01	66.20	86.59889	95.03971	125.096	1.34	55.3	56.2693	67.96617	84.29723	NA	NA
1985-05-01	66.45	53.93602	99.16809	125.848	1.46	54.2	56.3488	68.87150	85.13852	NA	NA

Section 3 - Variation of WTI Oil Prices and GPR

Understanding the core problem is important for Data Science research into the effects of geopolitical risk on crude oil prices. How far did the prices of oil drop, or rise, within a one year period? Review of the time series data is needed to better understand the overall picture.
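One way to quantify how far prices moved within a one year period is sketched below. It uses a simulated monthly price series as a stand-in for the real WTI data, so the numbers it prints are not results of this study; the variable names are assumptions made for the illustration.

```r
# Synthetic monthly prices standing in for the real WTI series (assumption).
set.seed(1)
prices <- 60 + cumsum(rnorm(48, mean = 0, sd = 3))

# Year-over-year change: price in month t minus price in month t - 12.
yoy_change <- diff(prices, lag = 12)

# Rolling 12-month range: highest minus lowest price in each 12-month window.
win <- 12
starts <- seq_len(length(prices) - win + 1)
rolling_range <- sapply(starts, function(i) {
  diff(range(prices[i:(i + win - 1)]))
})

cat("Largest 12-month change: ", round(max(yoy_change), 2), "\n")
cat("Smallest 12-month change:", round(min(yoy_change), 2), "\n")
cat("Widest 12-month range:   ", round(max(rolling_range), 2), "\n")
```

Replacing `prices` with the `Oil_Prices` vector from Section 2 would give the corresponding figures for the project data.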

Section 4 - Time Series Trend Comparison

Initially the trends appear very similar. We can compare all three variables in the same graph using the base R plotting function.

plot(c(x,x,x), c(MENA_GPR,Non_MENA_GPR,Oil_Prices),
     type='n', xlab="Date", ylab=" ",las=1,
     main = "Figure 4 - Time Series Plot of GPR vs Oil Prices")
lines(x, MENA_GPR, type='l', lty=1, col='green')
lines(x, Non_MENA_GPR, type='l', lty=2, col='blue')
lines(x, Oil_Prices, type='l', lty=3, col='red')
legend('topright', legend=c('MENA GPR','Non-MENA GPR','Oil Prices'),
       lty=c(1,2,3), col=c('green','blue','red'))

Section 5 - Statistical Tests of The Data

The following section applies statistical analysis to the time series data to understand the behavior of WTI Crude Oil Prices and MENA GPR. The data is tested for heteroscedasticity, multicollinearity, autocorrelation, and stationarity, so that the final time series model can be checked for validity and accuracy.

Regression on non-stationary variables can produce spurious results, so the variables should be stationary before a regression model is fitted. The Augmented Dickey-Fuller (ADF) unit root test examines the stationarity of the three variables (MENA GPR, Non-MENA GPR, and Crude Oil Prices). If the three variables are stationary, the regression is performed. If a time series has significant autocorrelation, the regression residuals are likely to display autocorrelation as well, and possibly heteroscedasticity.
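The idea behind the unit root test can be sketched in base R. The test output reproduced later in this section resembles the format printed by `adf.test()` in the aTSA package listed in the appendix, but the original call is not shown, so the following is only an illustration of the simplest case (no drift, no trend, lag 0) on simulated data. The helper name `df_statistic` is hypothetical.

```r
# Hypothetical helper: Dickey-Fuller t-statistic for the simplest case
# (no drift, no trend, no lagged differences).
# It regresses the first difference of y on the lagged level of y.
df_statistic <- function(y) {
  dy   <- diff(y)            # y[t] - y[t-1]
  ylag <- head(y, -1)        # y[t-1]
  fit  <- lm(dy ~ 0 + ylag)  # no intercept: "no drift no trend"
  coef(summary(fit))["ylag", "t value"]
}

# Simulated data (assumption): a random walk is non-stationary by
# construction, while its first difference is white noise.
set.seed(42)
random_walk <- cumsum(rnorm(420))

stat_level <- df_statistic(random_walk)
stat_diff  <- df_statistic(diff(random_walk))

cat("DF statistic, levels:     ", round(stat_level, 2), "\n")
cat("DF statistic, differenced:", round(stat_diff, 2), "\n")
```

A strongly negative statistic is evidence against a unit root. The p-values must come from Dickey-Fuller critical values rather than the usual t distribution, which is one reason a dedicated test function is normally used instead of a hand-written regression.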

Figure 6 shows that heteroscedasticity exists: the residuals increase as the fitted Y values increase. Figures 7 and 8 show the ACF and PACF charts, which accompany the Augmented Dickey-Fuller test. Before 2002, the crude oil price shows significant variance. After 2002, the crude oil price rises sharply. There are clear indications of interventions between 2007 and 2009, and again beginning in 2014.
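The kind of diagnostics described here can be sketched with base R functions. The data below are simulated so that the noise grows with the predictor; they are not the project data, and the names `gpr_sim` and `price_sim` are assumptions made for the illustration.

```r
# Simulated stand-ins for the real series (assumption): the noise grows with
# the predictor, so the residuals are heteroscedastic by construction.
set.seed(7)
n <- 200
gpr_sim   <- seq(50, 150, length.out = n)
price_sim <- 10 + 0.5 * gpr_sim + rnorm(n, sd = 0.05 * gpr_sim)

sim_fit <- lm(price_sim ~ gpr_sim)

# Residuals vs fitted values: a widening funnel suggests heteroscedasticity.
plot(fitted(sim_fit), resid(sim_fit),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs Fitted (simulated data)")
abline(h = 0, lty = 2)

# Sample autocorrelation and partial autocorrelation of the response.
acf_values  <- acf(price_sim, lag.max = 24, main = "ACF (simulated data)")
pacf_values <- pacf(price_sim, lag.max = 24, main = "PACF (simulated data)")
```

In the residual plot, a spread that widens from left to right is the visual pattern the text above describes for Figure 6.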

## Figure 6: Linear Regression Plot of Oil Prices vs MENA GPR

## Stationarity Test of Crude Oil Prices

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -0.980   0.329
## [2,]   1 -1.188   0.254
## [3,]   2 -1.240   0.236
## [4,]   3 -1.200   0.250
## [5,]   4 -1.150   0.268
## [6,]   5 -0.995   0.323
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -2.08  0.2968
## [2,]   1 -2.57  0.1018
## [3,]   2 -2.64  0.0883
## [4,]   3 -2.61  0.0931
## [5,]   4 -2.50  0.1292
## [6,]   5 -2.18  0.2566
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -2.59  0.3255
## [2,]   1 -3.20  0.0872
## [3,]   2 -3.36  0.0605
## [4,]   3 -3.32  0.0674
## [5,]   4 -3.21  0.0863
## [6,]   5 -2.81  0.2364
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

## Stationarity Test of MENA GPR

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -2.108  0.0361
## [2,]   1 -1.518  0.1363
## [3,]   2 -1.167  0.2617
## [4,]   3 -1.041  0.3069
## [5,]   4 -0.776  0.4013
## [6,]   5 -0.756  0.4087
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -8.52    0.01
## [2,]   1 -6.66    0.01
## [3,]   2 -5.44    0.01
## [4,]   3 -4.95    0.01
## [5,]   4 -4.23    0.01
## [6,]   5 -4.14    0.01
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -9.16    0.01
## [2,]   1 -7.20    0.01
## [3,]   2 -5.91    0.01
## [4,]   3 -5.43    0.01
## [5,]   4 -4.56    0.01
## [6,]   5 -4.50    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

## Stationarity Test of Non-MENA GPR

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -1.351   0.196
## [2,]   1 -0.975   0.330
## [3,]   2 -0.751   0.410
## [4,]   3 -0.605   0.462
## [5,]   4 -0.562   0.478
## [6,]   5 -0.485   0.504
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -8.89    0.01
## [2,]   1 -6.88    0.01
## [3,]   2 -5.47    0.01
## [4,]   3 -4.80    0.01
## [5,]   4 -4.50    0.01
## [6,]   5 -4.40    0.01
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -8.88    0.01
## [2,]   1 -6.88    0.01
## [3,]   2 -5.47    0.01
## [4,]   3 -4.80    0.01
## [5,]   4 -4.50    0.01
## [6,]   5 -4.40    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

Section 5.1 - Differenced Series

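With a p-value of 0.329 (no drift, no trend, lag 0), and p-values above 0.05 at every reported lag and specification, the crude oil price series is not stationary. The two GPR series, by contrast, reject the unit root at every reported lag once a drift term is included (p-value ≤ 0.01). The ACF plot shows no wave-like pattern across the lags, so seasonality is not evident in the series. The high first lag in the PACF is further evidence of non-stationarity.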

To make the series stationary, ordinary differencing is applied. The differenced series are then plotted and re-tested for stationarity using the Augmented Dickey-Fuller unit root test.

The differenced series in Figures 9-11 show a more stable variance around the mean level, and the peaks reflect the interventions in the original series. The p-value of the Augmented Dickey-Fuller test is now at most 0.01 at every reported lag, so the unit root hypothesis is rejected and the differenced series are stationary.
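As a worked example of ordinary differencing, the sketch below applies `diff()` to the first five crude oil prices from the sample of dataset records. It assumes first-order differencing, which is what `diff()` does by default; the order used in the original analysis is not stated.

```r
# First five monthly oil prices from the sample of dataset records.
oil_sample <- c(64.12, 64.58, 68.09, 66.20, 66.45)

# First-order differencing: each value minus the previous one.
oil_diff <- diff(oil_sample)

print(round(oil_diff, 2))
# [1]  0.46  3.51 -1.89  0.25
```

The differenced series has one observation fewer than the original, because the first month has no predecessor to subtract.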

## Stationarity Test of Differenced Crude Oil Prices

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag    ADF p.value
## [1,]   0 -16.50    0.01
## [2,]   1 -12.53    0.01
## [3,]   2 -10.77    0.01
## [4,]   3  -9.98    0.01
## [5,]   4 -10.25    0.01
## [6,]   5  -9.63    0.01
## Type 2: with drift no trend 
##      lag    ADF p.value
## [1,]   0 -16.48    0.01
## [2,]   1 -12.51    0.01
## [3,]   2 -10.76    0.01
## [4,]   3  -9.96    0.01
## [5,]   4 -10.24    0.01
## [6,]   5  -9.61    0.01
## Type 3: with drift and trend 
##      lag    ADF p.value
## [1,]   0 -16.46    0.01
## [2,]   1 -12.50    0.01
## [3,]   2 -10.75    0.01
## [4,]   3  -9.95    0.01
## [5,]   4 -10.23    0.01
## [6,]   5  -9.60    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

## Stationarity Test of Differenced MENA GPR

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag   ADF p.value
## [1,]   0 -27.3    0.01
## [2,]   1 -20.6    0.01
## [3,]   2 -16.2    0.01
## [4,]   3 -15.4    0.01
## [5,]   4 -12.6    0.01
## [6,]   5 -10.7    0.01
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -27.3    0.01
## [2,]   1 -20.5    0.01
## [3,]   2 -16.2    0.01
## [4,]   3 -15.3    0.01
## [5,]   4 -12.6    0.01
## [6,]   5 -10.7    0.01
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -27.3    0.01
## [2,]   1 -20.5    0.01
## [3,]   2 -16.1    0.01
## [4,]   3 -15.3    0.01
## [5,]   4 -12.6    0.01
## [6,]   5 -10.7    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

## Stationarity Test of Differenced Non-MENA GPR

## Augmented Dickey-Fuller Test 
## alternative: stationary 
##  
## Type 1: no drift no trend 
##      lag   ADF p.value
## [1,]   0 -27.5    0.01
## [2,]   1 -21.1    0.01
## [3,]   2 -17.0    0.01
## [4,]   3 -14.1    0.01
## [5,]   4 -12.1    0.01
## [6,]   5 -11.1    0.01
## Type 2: with drift no trend 
##      lag   ADF p.value
## [1,]   0 -27.5    0.01
## [2,]   1 -21.1    0.01
## [3,]   2 -17.0    0.01
## [4,]   3 -14.1    0.01
## [5,]   4 -12.1    0.01
## [6,]   5 -11.1    0.01
## Type 3: with drift and trend 
##      lag   ADF p.value
## [1,]   0 -27.5    0.01
## [2,]   1 -21.1    0.01
## [3,]   2 -17.0    0.01
## [4,]   3 -14.1    0.01
## [5,]   4 -12.1    0.01
## [6,]   5 -11.1    0.01
## ---- 
## Note: in fact, p.value = 0.01 means p.value <= 0.01

Section 6 - Time Series Seasonal Decomposition

Because the ADF p-value for the differenced series is at most 0.01, the series is treated as stationary from here on.

We move on to decomposing the time series into seasonal, trend, and remainder components. We first run the seasonal decomposition. The trend deviates slightly from the original series, showing that it has an effect on the series. To understand the seasonal effect, we plot the seasonal factors by month.

From Figure 12, we can see that the trend follows the overall pattern of the original series, with an upward trend beginning around 2003-2004. The remainder shows the peaks that reflect the interventions in the series. We then use the month plot to examine how values vary within each calendar month.

Figure 13 shows that the mean for each month remains much the same, while prices vary widely, between roughly 10 and 100. This suggests that trend, more than seasonality, drives the original series. Via model fitting, we found this series could potentially act as a predictor series for forecasting future crude oil prices.

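# Seasonal decomposition
library(forecast)  # provides seasonplot()

# Assumption: crude_oil_prices is the monthly oil price series starting January 1985
crude_oil_prices <- ts(Oil_Prices, start = c(1985, 1), frequency = 12)

fit <- stl(crude_oil_prices, s.window="periodic")

plot(fit, main="Figure 12 - Seasonal Decomposition of Crude Oil Prices")

monthplot(crude_oil_prices, col="green", main="Figure 13 - Month Plot of Crude Oil Prices, Seasonal Adjustment")

seasonplot(crude_oil_prices, main="Figure 14 - Season Plot of Crude Oil Prices, Seasonal Adjustment")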
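To show what `stl()` returns, the following sketch decomposes a simulated monthly series built from a linear trend, a yearly cycle, and noise. The series is an assumption made for the illustration and is unrelated to the project data.

```r
# Simulated monthly series (assumption): linear trend + yearly cycle + noise.
set.seed(123)
months <- 1:120
sim <- ts(50 + 0.3 * months + 5 * sin(2 * pi * months / 12) + rnorm(120),
          start = c(1985, 1), frequency = 12)

sim_fit <- stl(sim, s.window = "periodic")

# The result holds one column per component.
components <- sim_fit$time.series
print(colnames(components))  # "seasonal" "trend" "remainder"

# The three components add back up to the original series.
rebuilt <- rowSums(components)
print(all.equal(as.numeric(sim), as.numeric(rebuilt)))
```

Because the components are additive, whatever the seasonal and trend columns do not capture ends up in the remainder, which is where isolated peaks such as interventions appear.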

Section 7 - Correlation Testing

This section contains a visualization, and a mathematical calculation, of the correlation of crude oil prices with MENA GPR. The most effective way to visualize a correlation between two variables is a scatter plot, easily accomplished with the R programming language and the ggplot2 library.

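The null hypothesis is that there is no correlation between the West Texas Intermediate oil price history and the geopolitical risk history of the Middle East and North Africa. Both correlation coefficients are positive but small (about 0.027 for MENA GPR and 0.012 for Non-MENA GPR), so the coefficients alone do not establish significance; a formal significance test is needed before the null hypothesis can be rejected. The coefficients do show that MENA GPR has a stronger correlation with oil prices than Non-MENA GPR.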

Correlation of Crude Oil Price and MENA GPR
	Crude Oil Price	MENA GPR
Crude Oil Price	1.000000	0.026559
MENA GPR	0.026559	1.000000

Correlation of Crude Oil Price and Non-MENA GPR
	Crude Oil Price	Non-MENA GPR
Crude Oil Price	1.0000000	0.0122368
Non-MENA GPR	0.0122368	1.0000000
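The source does not show how the correlation tables were computed. A minimal sketch of one way to obtain such a matrix, together with a significance test, uses the base R functions `cor()` and `cor.test()`. The two series below are simulated stand-ins, so the printed values are not results of this study.

```r
# Simulated stand-ins for two monthly series (assumption, not real data).
set.seed(2019)
n <- 420
oil_sim  <- rnorm(n)
mena_sim <- 0.03 * oil_sim + rnorm(n)

# Correlation matrix, in the same layout as the tables above.
cor_matrix <- cor(cbind(`Crude Oil Price` = oil_sim, `MENA GPR` = mena_sim))
print(round(cor_matrix, 4))

# Pearson test of the null hypothesis that the true correlation is zero.
test_result <- cor.test(oil_sim, mena_sim)
cat("estimate:", round(unname(test_result$estimate), 4),
    " p-value:", round(test_result$p.value, 4), "\n")
```

The p-value reported by `cor.test()` indicates whether a coefficient of a given size is distinguishable from zero for the number of observations available.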

Section 8 - Conclusion

The data exploration, statistical tests, and data modeling in this research paper examine the correlation among WTI Crude Oil Prices, Middle East and North Africa Geopolitical Risk, and Non-Middle East and North Africa Geopolitical Risk.

The crude oil price series was found to be non-stationary in levels, while the two GPR series rejected the unit root only when a drift term was included. After differencing, all three series tested as stationary.

MENA GPR was found to have a greater correlation with Crude Oil Prices (about 0.027) than Non-MENA GPR (about 0.012), although both coefficients are small.

Section 9 - Appendix

Section 9.1 - Required Packages

	List of Required Packages
Required Packages	‘tseries’ ‘aTSA’ ‘plyr’ ‘forecast’ ‘quantmod’ ‘PerformanceAnalytics’ ‘rugarch’ ‘nortest’ ‘readxl’ ‘ggplot2’ ‘knitr’

Section 9.2 - Session Information

	Session Information
R Version	R version 4.2.0 (2022-04-22 ucrt)
Platform	x86_64-w64-mingw32/x64 (64-bit)
Running	Windows 10 x64 (build 22631)
RStudio Citation	RStudio: Integrated Development Environment for R
RStudio Version	1.0.153

Section 9.3 - Glossary

GPR - Geopolitical Risk
WTI - West Texas Intermediate crude oil price index
MENA - Middle East and North Africa