Crude oil is a fossil fuel that remains in wide use around the world. Its uses include fuel for automobiles and planes, and raw material for plastic products.
However, do geopolitical events, such as political and security issues in the Middle East and North Africa, have a significant impact on crude oil price fluctuations? How much of the movement in oil prices does geopolitical risk explain? Does geopolitical risk in the Middle East and North Africa have a stronger relationship with oil prices than geopolitical risk in the rest of the world?
To address these questions, ContextBase studies the impact of Geopolitical Risk (GPR) in MENA (Middle East and North Africa) on the crude oil market from 1985 to 2019, and tests whether the relationship between geopolitical risk in MENA and oil prices is positive and significant.
All data and analysis will be based on monthly data. This research paper utilizes the West Texas Intermediate (WTI) crude oil price as the dependent variable and the Geopolitical risk (GPR) index as the independent variable. Several control variables are also included.
Data Science applies scientific methods and processes to the analysis of large datasets, many of which are accessible via the internet. The objective in analyzing scientific datasets with Data Science methods is to extract knowledge efficiently and provide scientific insights. Internet datasets are usually available in formats that are readable by modern Data Science programming languages.
The R programming language is an implementation of the statistical programming language “S”, which was developed at Bell Laboratories. R is widely used in Data Science and is well suited to the processes that Data Science requires. It allows for convenient dataset access, efficient algorithmic manipulation of datasets (for example, the ability to apply functions across datasets without FOR loops), and efficient statistical processing of dataset records and observations. R has a vast collection of dataset processing packages that encompass a wide variety of modern statistical and scientific methods. R also provides for convenient graphical processing of data contained within internet datasets.
The data imported for this project is a data frame of time series data. The observations include time series for crude oil prices, geopolitical risk in the Middle East and North Africa (MENA), and geopolitical risk in the world outside the MENA region.
Other time series data included in the project data’s dataframe are the USA’s Dollar Index, Treasury Spread, Purchasing Managers Index, Industrial Production Index, the EU’s Industry Production Index, Japan’s Industry Production Index, China’s Industry Production Index, and India’s Industry Production Index.
For efficient programming, variables are created for the time series’ dates, the Oil Price Index, the MENA index, and non-MENA index.
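library(readxl)  # provides read_excel()
# Import the project workbook as a data frame
project_data <- read_excel("REG_001.xlsx")
# Extract the dates and the three series used throughout the analysis
x <- project_data$Date
MENA_GPR <- project_data$`Average MENA GPR (Independent Variable)`
Non_MENA_GPR <- project_data$`Average GPR of Non-MENA Countries (Control Variable)`
Oil_Prices <- project_data$`Real Crude Oil Prices (Dependent Variable)`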
The dataset contains monthly time series data from 1985 to 2019. The variables in the data are labelled as dependent, independent, and control variables.
The West Texas Intermediate (WTI) crude oil prices are the dependent variable and the Geopolitical risk (GPR) index is the independent variable. There are also multiple control variables.
| Date | Real Crude Oil Prices (Dependent Variable) | Average MENA GPR (Independent Variable) | Average GPR of Non-MENA Countries (Control Variable) | US Dollar Index (Control Variable) | US Treasury Spread (Control Variable) | US Purchasing Managers Index (PMI) (Control Variable) | US Industrial Production Index (PPI) (Control Variable) | EU Industry Production Index (PPI) (Control Variable) | Japan Industry Production (PPI) (Control Variable) | China Industry Production (PPI) (Control Variable) | India Industry Production (PPI) Control Variable |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1985-01-01 | 64.12 | 64.46255 | 80.85935 | 124.761 | 1.45 | 51.7 | 56.1398 | 68.41407 | 83.87658 | NA | NA |
| 1985-02-01 | 64.58 | 67.27189 | 93.03326 | 128.033 | 1.34 | 52.1 | 56.3323 | 68.15677 | 83.62420 | NA | NA |
| 1985-03-01 | 68.09 | 75.92460 | 106.55102 | 128.437 | 1.15 | 52.8 | 56.4232 | 68.69044 | 82.95116 | NA | NA |
| 1985-04-01 | 66.20 | 86.59889 | 95.03971 | 125.096 | 1.34 | 55.3 | 56.2693 | 67.96617 | 84.29723 | NA | NA |
| 1985-05-01 | 66.45 | 53.93602 | 99.16809 | 125.848 | 1.46 | 54.2 | 56.3488 | 68.87150 | 85.13852 | NA | NA |
Understanding the core problem is important for Data Science research into the effects of geopolitical risk on crude oil prices. How far did the prices of oil drop, or rise, within a one year period? Review of the time series data is needed to better understand the overall picture.
Initially the trends appear very similar. We can compare all three variables in the same graph using the base R plotting function.
plot(c(x,x,x), c(MENA_GPR,Non_MENA_GPR,Oil_Prices),
type='n', xlab="x", ylab=" ",las=1,
main = "Figure 4 - Time Series Plot of GPR vs Oil Prices")
lines(x, MENA_GPR, type='l', lty=1, col='green')
lines(x, Non_MENA_GPR, type='l', lty=2, col='blue')
lines(x, Oil_Prices, type='l', lty=3, col='red')
legend('topright', legend=c('MENA GPR','Non-MENA GPR','Oil Prices'),
lty=c(1,2,3), col=c('green','blue','red'))
The following section applies statistical analysis to the time series data to understand the behavior of WTI Crude Oil Prices and MENA GPR. The data is tested for heteroscedasticity, multicollinearity, autocorrelation, and stationarity, so that the final time series model can be checked for validity and accuracy.
Regression models require stationary variables. If a time series is non-stationary, the regression can produce spurious results. The Augmented Dickey-Fuller (ADF) unit root test examines the stationarity of the three variables (MENA GPR, Non-MENA GPR, and Crude Oil Prices). If the three variables are stationary, the regression is performed. If a time series has significant autocorrelation, the regression residuals are likely to display significant autocorrelation as well, and probably heteroscedasticity.
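As a rough illustration of what the unit root test measures, the sketch below runs the simplest Dickey-Fuller regression (no drift, no trend, no lagged differences) in base R on simulated data. The helper name `df_stat` and both simulated series are assumptions made for illustration and are not part of the project data. The resulting statistic has to be compared against Dickey-Fuller critical values, not the usual t table, which is what a packaged ADF routine does.

```r
# Minimal sketch: Dickey-Fuller t-statistic with no drift, no trend, lag 0.
# df_stat() is a hypothetical helper, not a function from any package.
df_stat <- function(y) {
  dy   <- diff(y)               # first difference, y[t] - y[t-1]
  ylag <- y[-length(y)]         # lagged level, y[t-1]
  fit  <- lm(dy ~ ylag - 1)     # regress the change on the lagged level
  coef(summary(fit))["ylag", "t value"]
}

set.seed(1)
n     <- 420                    # 35 years of monthly observations
walk  <- cumsum(rnorm(n))       # simulated random walk (has a unit root)
noise <- rnorm(n)               # simulated white noise (stationary)

stat_walk  <- df_stat(walk)
stat_noise <- df_stat(noise)
cat("random walk:", stat_walk, " white noise:", stat_noise, "\n")
```

A strongly negative statistic, as expected for the white-noise series, points toward stationarity. A statistic closer to zero, as is typical for a random walk, does not.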
Figure 6 indicates heteroscedasticity: the residuals spread out as the
fitted Y values increase. Figures 7 and 8 show the ACF and PACF charts
that accompany the Augmented Dickey-Fuller test.
Before 2002, the crude oil price shows significant variance. After 2002,
the crude oil price rises sharply. There are clear indications of a few
interventions between 2007 and 2009, and again beginning in 2014.
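The diagnostics described above can be sketched in base R. The example uses simulated data in which the noise grows with the predictor, so the residual plot shows the funnel shape associated with heteroscedasticity. The names `x_sim`, `y_sim`, and `fit_sim` are hypothetical, and this is not the project's own regression.

```r
# Minimal sketch with simulated data (not the project data).
set.seed(2)
n     <- 420
x_sim <- runif(n, min = 1, max = 100)            # hypothetical predictor
y_sim <- 2 * x_sim + rnorm(n, sd = 0.5 * x_sim)  # noise grows with x_sim

fit_sim <- lm(y_sim ~ x_sim)

# Residuals vs fitted values: a funnel shape suggests heteroscedasticity.
plot(fitted(fit_sim), resid(fit_sim),
     xlab = "Fitted values", ylab = "Residuals",
     main = "Residuals vs fitted (simulated)")
abline(h = 0, lty = 2)

# Compare residual spread in the lower and upper halves of the fitted values.
upper    <- fitted(fit_sim) > median(fitted(fit_sim))
sd_lower <- sd(resid(fit_sim)[!upper])
sd_upper <- sd(resid(fit_sim)[upper])

# ACF and PACF of a simulated random walk: slow ACF decay and a large
# first PACF lag are typical of a non-stationary series.
walk <- cumsum(rnorm(n))
acf(walk,  main = "ACF (simulated random walk)")
pacf(walk, main = "PACF (simulated random walk)")
```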
## Figure 6: Linear Regression Plot of Oil Prices vs MENA GPR
## Stationary Test of Crude Oil Prices
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -0.980 0.329
## [2,] 1 -1.188 0.254
## [3,] 2 -1.240 0.236
## [4,] 3 -1.200 0.250
## [5,] 4 -1.150 0.268
## [6,] 5 -0.995 0.323
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -2.08 0.2968
## [2,] 1 -2.57 0.1018
## [3,] 2 -2.64 0.0883
## [4,] 3 -2.61 0.0931
## [5,] 4 -2.50 0.1292
## [6,] 5 -2.18 0.2566
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -2.59 0.3255
## [2,] 1 -3.20 0.0872
## [3,] 2 -3.36 0.0605
## [4,] 3 -3.32 0.0674
## [5,] 4 -3.21 0.0863
## [6,] 5 -2.81 0.2364
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
## Stationary Test of MENA GPR
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -2.108 0.0361
## [2,] 1 -1.518 0.1363
## [3,] 2 -1.167 0.2617
## [4,] 3 -1.041 0.3069
## [5,] 4 -0.776 0.4013
## [6,] 5 -0.756 0.4087
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -8.52 0.01
## [2,] 1 -6.66 0.01
## [3,] 2 -5.44 0.01
## [4,] 3 -4.95 0.01
## [5,] 4 -4.23 0.01
## [6,] 5 -4.14 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -9.16 0.01
## [2,] 1 -7.20 0.01
## [3,] 2 -5.91 0.01
## [4,] 3 -5.43 0.01
## [5,] 4 -4.56 0.01
## [6,] 5 -4.50 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
## Stationary Test of Non-MENA GPR
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -1.351 0.196
## [2,] 1 -0.975 0.330
## [3,] 2 -0.751 0.410
## [4,] 3 -0.605 0.462
## [5,] 4 -0.562 0.478
## [6,] 5 -0.485 0.504
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -8.89 0.01
## [2,] 1 -6.88 0.01
## [3,] 2 -5.47 0.01
## [4,] 3 -4.80 0.01
## [5,] 4 -4.50 0.01
## [6,] 5 -4.40 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -8.88 0.01
## [2,] 1 -6.88 0.01
## [3,] 2 -5.47 0.01
## [4,] 3 -4.80 0.01
## [5,] 4 -4.50 0.01
## [6,] 5 -4.40 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
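For crude oil prices, the ADF p-value at lag 0 with no drift and no trend is 0.329, so the unit root null hypothesis cannot be rejected and the series is treated as non-stationary. The ACF plot shows no recurring wave pattern across the lags, so seasonality is not evident in the series. The high first lag in the PACF is further evidence of non-stationarity in the series.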
To make the series stationary, ordinary differencing is applied. The differenced series are then plotted and re-tested for stationarity using the Augmented Dickey-Fuller unit root test.
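Ordinary differencing replaces each observation with its change from the previous one, which base R provides as `diff()`. The sketch below applies it to a simulated random walk. The names `steps`, `walk`, and `d_walk` are hypothetical and the data is not the project data.

```r
# Minimal sketch with simulated data (not the project data).
set.seed(3)
steps <- rnorm(420)        # hypothetical monthly shocks
walk  <- cumsum(steps)     # random walk: non-stationary in levels

d_walk <- diff(walk)       # ordinary differencing: walk[t] - walk[t-1]

# Differencing drops one observation and recovers the underlying shocks.
length(walk)               # 420
length(d_walk)             # 419
isTRUE(all.equal(d_walk, steps[-1]))

# Lag-1 autocorrelation before and after differencing.
acf_levels <- acf(walk,   plot = FALSE)$acf[2]
acf_diff   <- acf(d_walk, plot = FALSE)$acf[2]
cat("lag-1 ACF, levels:", acf_levels, " differenced:", acf_diff, "\n")
```

Because one observation is lost, a differenced series has to be paired with the dates from the second observation onward when it is plotted.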
The differenced series in Figures 9-11 vary more evenly around the
mean level, and the peaks are evidence of the interventions in the
original series. The p-value of the Augmented Dickey-Fuller test is now
reported as 0.01 (meaning at most 0.01), which is significant.
Therefore, the series are now stationary.
## Stationary Test of Differenced Crude Oil Prices
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -16.50 0.01
## [2,] 1 -12.53 0.01
## [3,] 2 -10.77 0.01
## [4,] 3 -9.98 0.01
## [5,] 4 -10.25 0.01
## [6,] 5 -9.63 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -16.48 0.01
## [2,] 1 -12.51 0.01
## [3,] 2 -10.76 0.01
## [4,] 3 -9.96 0.01
## [5,] 4 -10.24 0.01
## [6,] 5 -9.61 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -16.46 0.01
## [2,] 1 -12.50 0.01
## [3,] 2 -10.75 0.01
## [4,] 3 -9.95 0.01
## [5,] 4 -10.23 0.01
## [6,] 5 -9.60 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
## Stationary Test of Differenced MENA GPR
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -27.3 0.01
## [2,] 1 -20.6 0.01
## [3,] 2 -16.2 0.01
## [4,] 3 -15.4 0.01
## [5,] 4 -12.6 0.01
## [6,] 5 -10.7 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -27.3 0.01
## [2,] 1 -20.5 0.01
## [3,] 2 -16.2 0.01
## [4,] 3 -15.3 0.01
## [5,] 4 -12.6 0.01
## [6,] 5 -10.7 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -27.3 0.01
## [2,] 1 -20.5 0.01
## [3,] 2 -16.1 0.01
## [4,] 3 -15.3 0.01
## [5,] 4 -12.6 0.01
## [6,] 5 -10.7 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
## Stationary Test of Differenced Non-MENA GPR
## Augmented Dickey-Fuller Test
## alternative: stationary
##
## Type 1: no drift no trend
## lag ADF p.value
## [1,] 0 -27.5 0.01
## [2,] 1 -21.1 0.01
## [3,] 2 -17.0 0.01
## [4,] 3 -14.1 0.01
## [5,] 4 -12.1 0.01
## [6,] 5 -11.1 0.01
## Type 2: with drift no trend
## lag ADF p.value
## [1,] 0 -27.5 0.01
## [2,] 1 -21.1 0.01
## [3,] 2 -17.0 0.01
## [4,] 3 -14.1 0.01
## [5,] 4 -12.1 0.01
## [6,] 5 -11.1 0.01
## Type 3: with drift and trend
## lag ADF p.value
## [1,] 0 -27.5 0.01
## [2,] 1 -21.1 0.01
## [3,] 2 -17.0 0.01
## [4,] 3 -14.1 0.01
## [5,] 4 -12.1 0.01
## [6,] 5 -11.1 0.01
## ----
## Note: in fact, p.value = 0.01 means p.value <= 0.01
For all three differenced series, the ADF p-value is at most 0.01 under every specification and at every lag, so each differenced series is stationary.
We move on to decomposing the time series into seasonal, trend, and remainder components. We first run seasonal decomposition. The trend deviates slightly from the original series, showing that it has an effect on the series. To understand the seasonal effect of the series, we plot the seasonal factors by month.
From Figure 12, we can see that the trend follows the overall pattern of the original series, with an upward trend from 2003-2004. The remainder shows the peaks which reflect the interventions in the series. We break up the trend effect with the month plot to understand the trends within each month.
Figure 13 shows that the mean across the months remains much the same, with prices varying widely, between 10 and 100. This clearly shows the trend effect on the original series. Via model fitting, we found this series could potentially act as a predictor series for forecasting future crude oil prices.
# Seasonal decomposition of crude_oil_prices, which must be a monthly ts object
library(forecast)  # provides seasonplot()
fit <- stl(crude_oil_prices, s.window="periodic")
plot(fit, main="Figure 12 - Seasonal Decomposition of Crude Oil Prices")
monthplot(crude_oil_prices, col="green", main="Figure 13 - Month Plot of Crude Oil Prices, Seasonal Adjustment")
seasonplot(crude_oil_prices, main="Figure 14 - Season Plot of Crude Oil Prices, Seasonal Adjustment")
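The decomposition code expects `crude_oil_prices` to be a `ts` object with a monthly frequency, and its construction is not shown. The sketch below is one plausible way to build such an object and check what `stl()` returns. It uses simulated prices, and the start date of January 1985 is an assumption based on the date range of the dataset. The names `prices`, `prices_ts`, and `fit_sim` are hypothetical.

```r
# Minimal sketch with simulated data (not the project data).
set.seed(4)
n_months <- 420                                   # Jan 1985 to Dec 2019
season   <- rep(sin(2 * pi * (1:12) / 12), length.out = n_months)
prices   <- 50 + cumsum(rnorm(n_months)) + 3 * season  # hypothetical series

# stl() needs a ts object with a seasonal frequency; 12 means monthly data.
prices_ts <- ts(prices, start = c(1985, 1), frequency = 12)

fit_sim <- stl(prices_ts, s.window = "periodic")

# The three components add back up to the original series.
components <- fit_sim$time.series      # columns: seasonal, trend, remainder
rebuilt    <- rowSums(components)
isTRUE(all.equal(as.numeric(rebuilt), as.numeric(prices_ts)))
```

With the project data, the same `ts()` call would presumably take `Oil_Prices` in place of the simulated vector.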
This section visualizes and calculates the correlation of crude oil prices with MENA GPR. An effective way to visualize a correlation between two variables is a scatter plot, easily produced with the R programming language and the ggplot2 library.
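The null hypothesis is that there isn’t any correlation between the West
Texas Intermediate Oil Price history and the geopolitical risk history
of the Middle East and North Africa. The correlations are positive,
which is consistent with rejecting that hypothesis, although both
coefficients are small (below 0.03) and a formal significance test is
needed to confirm it. The coefficient for MENA GPR (0.0266) is also
larger than the coefficient for Non-MENA GPR (0.0122).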
| | Crude Oil Price | MENA GPR |
|---|---|---|
| Crude Oil Price | 1.000000 | 0.026559 |
| MENA GPR | 0.026559 | 1.000000 |

| | Crude Oil Price | Non-MENA GPR |
|---|---|---|
| Crude Oil Price | 1.0000000 | 0.0122368 |
| Non-MENA GPR | 0.0122368 | 1.0000000 |
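A correlation matrix reports the size of each coefficient but not whether it is distinguishable from zero. The sketch below shows how base R's `cor()` and `cor.test()` fit together, using simulated data. The names `risk_sim`, `oil_sim`, and the helper `cor_p_value` are hypothetical, and the numbers it produces say nothing about the project data.

```r
# Minimal sketch with simulated data (not the project data).
set.seed(5)
n        <- 420
risk_sim <- rnorm(n)                    # hypothetical risk index changes
oil_sim  <- 0.05 * risk_sim + rnorm(n)  # hypothetical, weakly related prices

# Correlation matrix, in the same layout as the tables above.
cor_mat <- cor(cbind(oil_sim, risk_sim))

# cor.test() adds a significance test of H0: the true correlation is zero.
test <- cor.test(oil_sim, risk_sim)

# The same p-value from the coefficient and the sample size alone.
# cor_p_value() is a hypothetical helper, not a package function.
cor_p_value <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}
p_manual <- cor_p_value(cor_mat["oil_sim", "risk_sim"], n)
cat("r =", cor_mat["oil_sim", "risk_sim"], " p =", test$p.value, "\n")
```

The helper makes it possible to attach a p-value to a published coefficient when only the coefficient and the number of observations are known.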
The data exploration, analysis, statistical tests, and data modeling in this research paper examine the correlation between WTI Crude Oil Prices, Middle East and North Africa Geopolitical Risk, and Non-Middle East and North Africa Geopolitical Risk.
The time series data of the above three variables was found to be non-stationary. Differencing the time series data achieved stationarity.
MENA GPR was found to have a greater correlation with Crude Oil Prices (0.0266) than Non-MENA GPR (0.0122).
| List of Required Packages | |
|---|---|
| Required Packages | ‘tseries’ ‘aTSA’ ‘plyr’ ‘forecast’ ‘quantmod’ ‘PerformanceAnalytics’ ‘rugarch’ ‘nortest’ ‘readxl’ ‘ggplot2’ ‘knitr’ |
| Session Information | |
|---|---|
| R Version | R version 4.2.0 (2022-04-22 ucrt) |
| Platform | x86_64-w64-mingw32/x64 (64-bit) |
| Running | Windows 10 x64 (build 22631) |
| RStudio Citation | RStudio: Integrated Development Environment for R |
| RStudio Version | 1.0.153 |
GPR - Geopolitical Risk
WTI - West Texas Intermediate crude oil price index
MENA - Middle East and North Africa