VEC model to analyze the relationship between GDP and energy consumption in Peru, 1971 - 2014

Application in Python

Author

Nelson Brayan Mamani Flores

Published

March 31, 2023

Abstract

This paper presents empirical evidence on the relationship between energy consumption and GDP in Perú during the period 1971 to 2014. The results show that the series are not stationary, i.e., are individually I(1); in addition, we find a long-term relationship between both variables. Through a VECM we estimate short- and long-term elasticities to analyze the dynamics of adjustment. The results show that in the short-term the conservation hypothesis holds (i.e., no evidence of a short-term relationship of energy consumption to GDP). In the long-term, however, we find evidence of a feedback mechanism between both variables. Yet, this paper provides evidence that policymakers can implement policies aimed at energy conservation without hurting economic growth.

Key words: Energy Consumption, Cointegration Relation, Vector Error Correction Model (VECM), Structural Break, Perú.

1. Introduction

The oil crisis of the 1970s motivated researchers to study numerous relationships between macroeconomic variables. As a result, researchers began to study the relationships between macroeconomic variables such as oil prices, inflation, economic growth, energy consumption, the exchange rate, and other factors that affect the economy in general. This led to a greater understanding of how these variables are interconnected and how they can affect the economy of a country and the world as a whole. The oil crisis led to a significant increase in energy prices, which in turn had a major impact on the global economy. Also, since energy is a key input for production and consumption, it has become a central research topic in economics.

Given the importance between GDP dynamics and electricity consumption, this article will address the relationship between these variables through an error correction model. For this reason, the specific objective of this work is to provide empirical evidence of the existence or not of a strong relationship (in the short and long term) between energy consumption and GDP in Peru, as well as to analyze that, For the above reason, it is possible to implement environmental policies that promote the efficient use and conservation of energy.

In this sense, this work is an empirical and applied contribution to the scarce literature on energy and its impact on production in Peru.

2. The relationship between energy consumption and GDP

The relationship between energy consumption and GDP is complex and multifaceted. Generally, there is a positive correlation between energy consumption and GDP, as countries with higher levels of economic development tend to consume more energy in order to power their industries, transportation systems, and households. Evidence also shows that it is possible to find a two-way causality between energy consumption and real GDP. In the study carried out by Al-Iriani (2006) for the six countries that make up the Gulf Cooperation Council (Kuwait, Oman, Saudi Arabia, Bahrain, the United Arab Emirates and Qatar), the results obtained indicate that there is unidirectional causality of GDP. . to energy consumption; Soytas, Sari and Ozdemir (2001) found evidence of a one-way causal relationship between energy consumption and GDP in Turkey from the cointegration method and error correction vector analysis.

A priori we could say that the relationship between energy consumption and GDP is complex and depends on a wide range of factors including the level of economic development of a country, its energy policies and the efficiency of its energy systems.

3. Econometric methodology and model

It is known that to avoid obtaining misleading relationships in econometric estimates it is necessary to apply unit root tests to the series in order to determine the stationarity or not of the series. Then, if the series are non-stationary, that is, they have a unit root, it must be proved that they are cointegrated and, thus, have a long-term relationship between them.

Firstly, in this work the ADF (Augmented Dickey-Fuller) test and the KPSS (Kwiatkowski, Phillips, Smichdt and Shin) test were applied to test the order of integration of the series. Additionally, the Zivot-Andrews test was used to determine if the series are stationary or non-stationary in the presence of a possible unit root with structural break.

As mentioned, if the series have a unit root (they are integrated of order one), according to Granger and Newbold (1974), the step to follow is to test cointegration between the series. In addition, the test proposed by Johansen based on the Lagrange Multiplier estimator (MLE) and the cointegration methodology proposed by Johansen and Juselius (1992) were used to prove the existence of a long-term relationship between the variables. The next step is to estimate the VAR Model (p) to determine the number of lags thereof based on the information criteria and thus determine the optimal lags of the VEC Model (p-1).

3.1. Data and Application

This is a time series study, which consists of annual time series of GDP and energy consumption, both in per capita terms. The series are obtained from the World Bank, the definition of energy consumption is: Electricity consumption measures the production of electricity generating plants and combined heat and electricity generation plants, minus transmission, distribution and transformation losses, and the own consumption of the plants. On the other hand, GDP per capita is measured at current prices.

# import libraries

import pandas as pd 
import requests

# Access the world bank database

from pandas_datareader import wb
KWH_PC= 'EG.USE.ELEC.KH.PC'     # Series Electrical energy consumption per capita (kWh)
PBI_PC= 'NY.GDP.PCAP.KN'        # GDP per capita series in USD (Constant prices)

# Download the database with the selected variables

df1 = wb.download(indicator=[KWH_PC, PBI_PC], country='PER', start=1971, end=2014).sort_index()
df1.head()

		EG.USE.ELEC.KH.PC	NY.GDP.PCAP.KN
country	year
Peru	1971	388.514407	8784.738435
	1972	406.578668	8848.246104
	1973	410.295619	9158.635033
	1974	426.906110	9763.916538
	1975	435.773811	9930.384340

# Rename variables and remove country indexed column

data = df1.reset_index(level=['country'])
data = data.drop('country', axis=1).rename(columns={'EG.USE.ELEC.KH.PC': 'CE',
                                                    'NY.GDP.PCAP.KN': 'PBI'})
data.to_excel('data.xlsx', index=True)
data.head()

	CE	PBI
year
1971	388.514407	8784.738435
1972	406.578668	8848.246104
1973	410.295619	9158.635033
1974	426.906110	9763.916538
1975	435.773811	9930.384340

# Generate the logarithm of the variables

import numpy as np
data['LPBI'] = np.log(data['PBI'])
data['LCE'] = np.log(data['CE'])
data.head()

	CE	PBI	LPBI	LCE
year
1971	388.514407	8784.738435	9.080771	5.962330
1972	406.578668	8848.246104	9.087975	6.007777
1973	410.295619	9158.635033	9.122452	6.016878
1974	426.906110	9763.916538	9.186449	6.056564
1975	435.773811	9930.384340	9.203354	6.077123

# Generate the graph in time series of the logarithms of the variables

import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
plt.rcParams['lines.linewidth']=1.5

fig, ax = plt.subplots(figsize=(10, 6))
data[['LPBI','LCE']].plot(ax=ax)
ax.set_title('Figure1. Energy Consumption Per Capita and Real GDP Per Capita in Peru, 1971-2014')
ax.set_xlabel('Mes')
ax.set_ylabel('Valor')
plt.show()

Graph 1 shows the series in logarithms for the study period and Graph 2 shows the first difference of the series in Graph 1. We can see that in graphs 1 and 2, the series are apparently correlated in some periods, but in others , power consumption increases.

# Generate the first differences of the variables

data['DLPBI'] = (data['LPBI']- data['LPBI'].shift(1))*100
data['DLCE'] = (data['LCE']- data['LCE'].shift(1))*100
data.head()

	CE	PBI	LPBI	LCE	DLPBI	DLCE
year
1971	388.514407	8784.738435	9.080771	5.962330	NaN	NaN
1972	406.578668	8848.246104	9.087975	6.007777	0.720331	4.544718
1973	410.295619	9158.635033	9.122452	6.016878	3.447789	0.910049
1974	426.906110	9763.916538	9.186449	6.056564	6.399645	3.968618
1975	435.773811	9930.384340	9.203354	6.077123	1.690558	2.055922

# Draw the first differences of the series

fig, ax = plt.subplots(figsize=(10, 6))
data[['DLPBI','DLCE']].plot(ax=ax)
ax.set_title('Figure2. First difference of energy consumption per capita and real GDP per capita in Perú, 1971-2014')
ax.set_xlabel('Año')
ax.set_ylabel('Valor')
plt.show()

For example, in at least 10 periods, the series appear to go in opposite directions. This result of the preliminary analysis provides a first approximation that gives strength to the hypothesis of this study, which suggests that there is neither a short nor a long-term relationship between energy consumption and GDP, in both directions, that is, from energy consumption to GDP and vice versa. However, in recent years it can be seen that more movements in GDP translate into movements in energy consumption.

In fact, if Graph 3 is observed, which presents a scatter graph with a linear trend, a first approximation can be seen, which indicates that the relationship between energy consumption and GDP is positive for the period of study in Peru.

# Plot the dispersion between variables

fig, ax = plt.subplots(figsize=(10, 6))
plt.scatter(data['LCE'], data['LPBI'])
fit = np.polyfit(data['LCE'], data['LPBI'], 1)
fit_fn = np.poly1d(fit)
plt.plot(data['LCE'], fit_fn(data['LCE']),'--r')
ax.set_title('Figure3. Dispersion between energy consumption per capita and real GDP per capita in Perú, 1971-2014')
ax.set_xlabel('LCE')
ax.set_ylabel('LPBI')
plt.show()

3.2. Unit Root Tests

The Augmented Dickey-Fuller (ADF) test and the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test are both statistical tests commonly used in time series analysis to test for stationarity. Additionally, the Zivot-Andrews test was used to test for a structural break or regime shift in the data. A structural break occurs when there is a significant change in the underlying data generation process of a time series. The null hypothesis of the Zivot-Andrews test is that the time series is stationary and does not have a structural break. The alternative hypothesis is that the time series has a structural break at an unknown point in time. The results are shown below, taking into account deterministic components.

# import libraries

import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

# Dickey & Fuller test with constant for the LCE series

X = data["LCE"].values
result_adf1 = adfuller(X, regression='c')
print('ADF statistic: %f' % result_adf1[0])
print('p-value: %f' % result_adf1[1])
print('Critical values:')
for key, value in result_adf1[4].items():
    print('\t%s: %.3f' % (key, value))

if result_adf1[0] < result_adf1[4]["5%"]:
    print ("Reject the Ho - The time series is stationary")
else:
    print ("Do not reject Ho - The time series is not stationary")

ADF statistic: 1.804410
p-value: 0.998358
Critical values:
    1%: -3.597
    5%: -2.933
    10%: -2.605
Do not reject Ho - The time series is not stationary

Results analysis:

The results indicate that we cannot reject the null hypothesis (Ho) that the time series is not stationary. This is because the ADF (Augmented Dickey-Fuller) value is greater than the critical values at the 1%, 5%, and 10% significance levels, and the p-value is greater than the 5% significance level. . Therefore, we do not have enough evidence to affirm that the time series is stationary. It is important to highlight that the non-stationarity of the time series can have important implications in its modeling and in making decisions based on it.

# Dickey & Fuller test with constant and trend for the LCE series

X = data["LCE"].values
result_adf2 = adfuller(X, regression='ct')
print('ADF statistic: %f' % result_adf2[0])
print('p-value: %f' % result_adf2[1])
print('Critical values:')
for key, value in result_adf2[4].items():
    print('\t%s: %.3f' % (key, value))

if result_adf2[0] < result_adf2[4]["5%"]:
    print ("Reject the Ho - The time series is stationary")
else:
    print ("Do not reject Ho - The time series is not stationary")

ADF statistic: -0.093686
p-value: 0.993150
Critical values:
    1%: -4.192
    5%: -3.521
    10%: -3.191
Do not reject Ho - The time series is not stationary

Results analysis:

The results indicate that we cannot reject the null hypothesis (Ho) that the LCE time series is not stationary. This is because the ADF (Augmented Dickey-Fuller) value is greater than the critical values at the 1%, 5%, and 10% significance levels, and the p-value is greater than the 5% significance level. . Therefore, we do not have enough evidence to affirm that the LCE time series is stationary. It is important to highlight that the non-stationarity of the time series can have important implications in its modeling and in making decisions based on it. In addition, when using the test with constant and trend, the possible presence of a trend in the time series is taken into account, which may affect its stationarity.

# Dickey & Fuller test with constant for the lPBI series

Y = data["LPBI"].values
result_adf3 = adfuller(Y, regression='c')
print('ADF statistic: %f' % result_adf3[0])
print('p-value: %f' % result_adf3[1])
print('Critical values:')
for key, value in result_adf3[4].items():
    print('\t%s: %.3f' % (key, value))

if result_adf3[0] < result_adf3[4]["5%"]:
    print ("Reject the Ho - The time series is stationary")
else:
    print ("Do not reject Ho - The time series is not stationary")

ADF statistic: -0.502716
p-value: 0.891485
Critical values:
    1%: -3.597
    5%: -2.933
    10%: -2.605
Do not reject Ho - The time series is not stationary

Results analysis:

The results indicate that we cannot reject the null hypothesis (Ho) that the lGDP time series is not stationary. This is because the ADF (Augmented Dickey-Fuller) value is greater than the critical values at the 1%, 5%, and 10% significance levels, and the p-value is greater than the 5% significance level. . Therefore, we do not have sufficient evidence to affirm that the lGDP time series is stationary. It is important to highlight that the non-stationarity of the time series can have important implications in its modeling and in making decisions based on it. Using the constant test assumes that the time series has no deterministic trend, but may have a constant.

# Dickey & Fuller test with constant and trend for the lPBI series

Y = data["LPBI"].values
result_adf4 = adfuller(Y, regression='ct')
print('ADF statistic: %f' % result_adf4[0])
print('p-value: %f' % result_adf4[1])
print('Critical values:')
for key, value in result_adf4[4].items():
    print('\t%s: %.3f' % (key, value))

if result_adf4[0] < result_adf4[4]["5%"]:
    print ("Reject the Ho - The time series is stationary")
else:
    print ("Do not reject Ho - The time series is not stationary")

ADF statistic: -0.994704
p-value: 0.944823
Critical values:
    1%: -4.192
    5%: -3.521
    10%: -3.191
Do not reject Ho - The time series is not stationary

Results analysis:

The results indicate that we cannot reject the null hypothesis (Ho) that the IPBI time series is not stationary. This is because the ADF (Augmented Dickey-Fuller) value is greater than the critical values at the 1%, 5%, and 10% significance levels, and the p-value is greater than the 5% significance level. . Therefore, we do not have enough evidence to affirm that the GDPl time series is stationary. When using the test with constant and trend, the possible presence of a trend and a constant in the time series is taken into account, which may affect its stationarity. It is important to highlight that the non-stationarity of the time series can have important implications in its modeling and in decision-making based on it.

# import libraries

from statsmodels.tsa.stattools import kpss

import warnings
warnings.filterwarnings("ignore")

# Kwiatkowski–Phillips–Schmidt–Shin test with constant for the lCE series

result_kpss1 = kpss(X, regression='c')
print('KPSS Statistic: %f' % result_kpss1[0])
print('p-value: %f' % result_kpss1[1])
print('Critical values:')
for key, value in result_kpss1[3].items():
    print('\t%s: %.4f' % (key, value))

if result_kpss1[0] < result_kpss1[3]["5%"]:
    print ("Reject the Ho - The time series is not stationary")
else:
    print ("Do not reject Ho - The time series is stationary")

KPSS Statistic: 0.876762
p-value: 0.010000
Critical values:
    10%: 0.3470
    5%: 0.4630
    2.5%: 0.5740
    1%: 0.7390
Do not reject Ho - The time series is stationary

Results analysis:

LCE series indicate that we cannot reject the null hypothesis (Ho) that the time series is not stationary. This is because the value of the KPSS statistic is greater than the critical values at all significance levels, and the p-value is less than the 1% significance level.

# Kwiatkowski–Phillips–Schmidt–Shin test with constant and trend for the lCE series

result_kpss2 = kpss(X, regression='ct')
print('KPSS Statistic: %f' % result_kpss2[0])
print('p-value: %f' % result_kpss2[1])
print('Critical values:')
for key, value in result_kpss2[3].items():
    print('\t%s: %.4f' % (key, value))

if result_kpss2[0] < result_kpss2[3]["5%"]:
    print ("Reject the Ho - The time series is not stationary")
else:
    print ("Do not reject the Ho - The time series is stationary")

KPSS Statistic: 0.206136
p-value: 0.013699
Critical values:
    10%: 0.1190
    5%: 0.1460
    2.5%: 0.1760
    1%: 0.2160
Do not reject the Ho - The time series is stationary

Results analysis:

When using the test with constant and trend, the possible presence of a trend and a constant in the time series is taken into account, which may affect its stationarity.

Therefore, the results suggest that the lCE series has a deterministic trend, which means that its mean is not constant over time. This can have important implications for time series modeling and decision making, as a trend can affect the prediction and interpretation of the results. It is important to take these results into account when performing analysis and modeling of the lCE time series.

# Kwiatkowski–Phillips–Schmidt–Shin test with constant for the lCE series

result_kpss3 = kpss(Y, regression='c')
print('KPSS Statistic: %f' % result_kpss3[0])
print('p-value: %f' % result_kpss3[1])
print('Critical values:')
for key, value in result_kpss3[3].items():
    print('\t%s: %.4f' % (key, value))

if result_kpss3[0] < result_kpss3[3]["5%"]:
    print ("Reject the Ho - The time series is not stationary")
else:
    print ("Do not reject the Ho - The time series is stationary")

KPSS Statistic: 0.384265
p-value: 0.083937
Critical values:
    10%: 0.3470
    5%: 0.4630
    2.5%: 0.5740
    1%: 0.7390
Reject the Ho - The time series is not stationary

Results analysis:

The results of the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test with constant for the LCE series indicate that we can reject the null hypothesis (Ho) that the time series is not stationary. This is because the value of the KPSS statistic is greater than the critical values at all significance levels, and the p-value is greater than the 5% significance level.

This suggests that the LCE series is stationary in the sense that its mean and variance are constant over time, and that it does not have a deterministic trend. This is important in time series modeling and analysis, as a stationary series may be easier to model and predict.

# Kwiatkowski–Phillips–Schmidt–Shin test with constant and trend for the lCE series

result_kpss4 = kpss(Y, regression='ct')
print('KPSS Statistic: %f' % result_kpss4[0])
print('p-value: %f' % result_kpss4[1])
print('Critical values:')
for key, value in result_kpss4[3].items():
    print('\t%s: %.4f' % (key, value))

if result_kpss4[0] < result_kpss4[3]["5%"]:
    print ("Reject the Ho - The time series is not stationary")
else:
    print ("Do not reject the Ho - The time series is stationary")

KPSS Statistic: 0.227229
p-value: 0.010000
Critical values:
    10%: 0.1190
    5%: 0.1460
    2.5%: 0.1760
    1%: 0.2160
Do not reject the Ho - The time series is stationary

Results analysis:

The results of the Kwiatkowski-Phillips-Schmidt-Shin (KPSS) test with constant and trend for the ICE series indicate that we cannot reject the null hypothesis (Ho) that the time series is not stationary. This is because the value of the KPSS statistic is less than the critical values at all significance levels, and the p-value is less than the 5% significance level. This suggests that the ICE series is not stationary in the sense that it has a deterministic trend. It is possible that the series is stationary in the sense that its mean and variance are constant over time, but this cannot be determined with the KPSS test with constant and trend.

The results of the unit root tests for both series, there it can be seen that the estimated statistics are less than the critical values at 5% significance. Therefore, it is concluded that the series are integrated of order one in levels and of order zero (stationary) in differences. Additionally, deterministic components such as intercept and trend are included in the tests.

# Graph both series separately

fig, ax = plt.subplots(figsize=(15,5), nrows=1, ncols=2)
ax[0].plot(data['LPBI'])
ax[1].plot(data['LCE'])
ax[0].set_title("LBI series chart")
ax[1].set_title("LCE series chart")
plt.setp(ax[0].xaxis.get_majorticklabels(), rotation=90)
plt.setp(ax[1].xaxis.get_majorticklabels(), rotation=90)
plt.show()

In this case, to carry out the Zivot Andrews test, the trend of our series was taken into account; since it captures individually and precisely the structural break of the series. However, using other tests, multiple structural breaks could be obtained, which would make the cointegration task more complex. That is why we only take the trend in our series to identify the breaks produced in the historical evolution of the series.

# import libraries

from statsmodels.tsa.stattools import zivot_andrews

# Zivot andrews test with trend for the lPBI series

result_za1 = zivot_andrews(data['LPBI'], trim = 0.15, maxlag=5, regression='t')
print('Statistician Zivot Andrews: %f' % result_za1[0])
print('P-value: %f' % result_za1[1])
print('Critical values:') 
for key, value in result_za1[2].items():
    print('\t%s: %.3f' % (key, value))
print('Baselag: %f' % result_za1[3])
year = data.iloc[result_za1[4]].name
print(f"Year corresponding to the Breakpoint index: {year}")

if result_za1[0] < result_za1[2]["5%"]:
    print ("Reject the Ho - The series is stationary in trend")
else:
    print ("Do not reject Ho - The series has a unit root with only one structural break")

Statistician Zivot Andrews: -4.282785
P-value: 0.070014
Critical values:
    1%: -5.034
    5%: -4.406
    10%: -4.137
Baselag: 1.000000
Year corresponding to the Breakpoint index: 1997
Do not reject Ho - The series has a unit root with only one structural break

Results analysis

The null hypothesis of this test is that there is a unit root in the time series with a structural break at some point. In this case, the test returned a statistical value of -4.282785 and a p-value of 0.070014.

Critical values are used to compare with the statistical value and determine whether or not to reject the null hypothesis. In this case, the statistical value is less than the corresponding critical value at the 5% level of significance but not at 1%, indicating that there is insufficient evidence to reject the null hypothesis at the 5% level of significance. However, if a 1% significance level is considered, the statistical value is greater than the corresponding critical value, indicating that there is sufficient evidence to reject the null hypothesis at the 1% significance level.

Therefore, it is concluded that the time series has a unit root with a single structural break in the year 1997 (index 26). It is important to note that the series does not have a stationary trend and that future values of the series may depend on past values. This should be considered when performing analyzes and projections based on the time series in question.

# Zivot andrews test with trend for the LCE series

result_za2 = zivot_andrews(data['LCE'], trim = 0.15 , maxlag=5, regression='t')
print('Zivot-Andrews statistic: %f' % result_za2[0])
print('P-value: %f' % result_za2[1])
print('Critical values:') 
for key, value in result_za2[2].items():
    print('\t%s: %.3f' % (key, value))
print('Baselag: %f' % result_za2[3])
year = data.iloc[result_za2[4]].name
print(f"Year corresponding to the Breakpoint index: {year}")

if result_za2[0] < result_za2[2]["5%"]:
    print ("Reject the Ho - The series is stationary in trend")
else:
    print ("Do not reject Ho - The series has a unit root with only one structural break")

Zivot-Andrews statistic: -3.112052
P-value: 0.587514
Critical values:
    1%: -5.034
    5%: -4.406
    10%: -4.137
Baselag: 1.000000
Year corresponding to the Breakpoint index: 1996
Do not reject Ho - The series has a unit root with only one structural break

Results analysis:

The null hypothesis of this test is that there is a unit root in the time series with a structural break at some point. In this case, the test returned a statistical value of -3.112052 and a p-value of 0.587514.

Critical values are used to compare with the statistical value and determine whether or not to reject the null hypothesis. In this case, the statistical value is not less than the critical values corresponding to the significance level of 1%, 5%, and 10%, indicating that there is insufficient evidence to reject the null hypothesis. Therefore, the time series is considered to have a unit root with a single structural break in the year 1996 (index 25).

In summary, it cannot be affirmed that the series has a stationary trend, which implies that the future values of the series can depend on the past values. It is important to keep this in mind when performing analyzes and projections based on the time series in question.

# Plot both series with the structural break

fig, ax = plt.subplots(figsize=(15,5), nrows=1, ncols=2)
ax[0].plot(data['LPBI'])
ax[1].plot(data['LCE'])
ax[0].set_title("LBI series chart with structural break")
ax[1].set_title("LCE series chart with structural break ")
ax[0].axvline(x='1997', color='g', linestyle='--')
ax[1].axvline(x='1996', color='r', linestyle='--')
plt.setp(ax[0].xaxis.get_majorticklabels(), rotation=90)
plt.setp(ax[1].xaxis.get_majorticklabels(), rotation=90)
plt.show()

These results indicate that the null hypothesis of unit root for the series in levels cannot be rejected at any conventional level. The test calculates the breaks that occurred in both series and both are consistent with historical events, such as the privatization process of the electricity sector, which began in 1994 with the sale of distribution companies in Lima, continuing with the sale of generating companies in 1995 and 1996. Likewise, with respect to GDP, the results coincide with the effects of the El Niño phenomenon of 1997-98, which reached great intensity in Peru. In this regard, Contreras et. al (2016), indicate that the El Niño phenomenon constitutes a risk due to a supply shock for the Peruvian economy. When El Niño reaches extraordinary magnitudes, it destroys part of the economy’s capital stock and affects the flow of production of goods and services, all of which generates impacts on potential GDP, amplifying business cycles.

In this sense, to capture the presence of these two structural breaks in our model, we will generate dummy variables for each year in question.

# Create dummy variables to identify structural breaks

data['fecha'] = pd.date_range(start='1971-01-01', end='2014-01-01', freq='AS')
data['dummy_LPBI'] = np.where(data['fecha'].dt.year >= 1997, 0, 1)
data['dummy_LCE'] = np.where(data['fecha'].dt.year >= 1996, 0, 1)
data.head(30)

	CE	PBI	LPBI	LCE	DLPBI	DLCE	fecha	dummy_LPBI	dummy_LCE
year
1971	388.514407	8784.738435	9.080771	5.962330	NaN	NaN	1971-01-01	1	1
1972	406.578668	8848.246104	9.087975	6.007777	0.720331	4.544718	1972-01-01	1	1
1973	410.295619	9158.635033	9.122452	6.016878	3.447789	0.910049	1973-01-01	1	1
1974	426.906110	9763.916538	9.186449	6.056564	6.399645	3.968618	1974-01-01	1	1
1975	435.773811	9930.384340	9.203354	6.077123	1.690558	2.055922	1975-01-01	1	1
1976	444.372471	9828.826562	9.193075	6.096663	-1.027963	1.953978	1976-01-01	1	1
1977	472.861818	9621.308071	9.171736	6.158803	-2.133932	6.214010	1977-01-01	1	1
1978	466.629163	9134.067535	9.119766	6.145535	-5.196912	-1.326835	1978-01-01	1	1
1979	476.986445	9273.244882	9.134889	6.167488	1.512225	2.195321	1979-01-01	1	1
1980	499.016545	9581.071923	9.167545	6.212639	3.265612	4.515118	1980-01-01	1	1
1981	525.252025	9864.912250	9.196740	6.263878	2.919477	5.123894	1981-01-01	1	1
1982	549.297685	9606.151437	9.170159	6.308641	-2.658058	4.476233	1982-01-01	1	1
1983	513.679730	8399.468293	9.035924	6.241600	-13.423526	-6.704055	1983-01-01	1	1
1984	549.061203	8495.541192	9.047297	6.308210	1.137305	6.660994	1984-01-01	1	1
1985	541.510669	8468.238346	9.044078	6.294363	-0.321896	-1.384715	1985-01-01	1	1
1986	563.625975	9054.683447	9.111037	6.334391	6.695963	4.002810	1986-01-01	1	1
1987	587.769745	9712.858757	9.181206	6.376335	7.016852	4.194442	1987-01-01	1	1
1988	550.421426	8599.306113	9.059437	6.310684	-12.176914	-6.565107	1988-01-01	1	1
1989	547.161942	7372.997919	8.905580	6.304745	-15.385712	-0.593940	1989-01-01	1	1
1990	539.370691	6852.020519	8.832299	6.290403	-7.328082	-1.434174	1990-01-01	1	1
1991	571.270273	6857.103080	8.833040	6.347862	0.074149	5.745936	1991-01-01	1	1
1992	469.513757	6682.663370	8.807272	6.151698	-2.576844	-19.616483	1992-01-01	1	1
1993	515.057446	6893.502315	8.838335	6.244278	3.106266	9.258084	1993-01-01	1	1
1994	525.366487	7590.461646	8.934648	6.264096	9.631314	1.981765	1994-01-01	1	1
1995	537.198677	7997.691526	8.986908	6.286368	5.226053	2.227191	1995-01-01	1	1
1996	577.702027	8070.283319	8.995944	6.359058	0.903565	7.269021	1996-01-01	1	0
1997	593.880180	8437.797871	9.040477	6.386678	4.453277	2.761937	1997-01-01	0	0
1998	626.766533	8257.345024	9.018858	6.440574	-2.161825	5.389653	1998-01-01	0	0
1999	637.888448	8242.230310	9.017026	6.458163	-0.183213	1.758931	1999-01-01	0	0
2000	661.315738	8336.585137	9.028409	6.494231	1.138270	3.606797	2000-01-01	0	0

# Import libraries

from statsmodels.tsa.vector_ar.var_model import VAR

data1 = data.iloc[:, [2,3]]

# Get the order of lags

model = VAR(data1)
for i in [1,2,3,4,5]:
    result = model.fit(i)
    print('Lag Order =', i)
    print('AIC : ', result.aic)
    print('BIC : ', result.bic)
    print('FPE : ', result.fpe)
    print('HQIC: ', result.hqic, '\n')

Lag Order = 1
AIC :  -27.668168238678618
BIC :  -27.422419385326027
FPE :  9.639720240944462e-13
HQIC:  -27.577543656488704 

Lag Order = 2
AIC :  -27.922031805644647
BIC :  -27.508300944148605
FPE :  7.492056344826517e-13
HQIC:  -27.77038308159449 

Lag Order = 3
AIC :  -27.783459448694792
BIC :  -27.198337279576247
FPE :  8.644306564017248e-13
HQIC:  -27.570390215296857 

Lag Order = 4
AIC :  -27.906094162559903
BIC :  -27.14609840820863
FPE :  7.71509063479803e-13
HQIC:  -27.63130369569299 

Lag Order = 5
AIC :  -28.03770016604337
BIC :  -27.099280775918952
FPE :  6.871194102520998e-13
HQIC:  -27.701003364249942

# Select the order of lags

x = model.select_order(maxlags=5)
x.summary()

VAR Order Selection (* highlights the minimums)
	AIC	BIC	FPE	HQIC
0	-22.09	-22.00	2.554e-10	-22.06
1	-27.61	-27.36	1.020e-12	-27.52
2	-27.87	-27.44*	7.926e-13	-27.71*
3	-27.74	-27.14	9.075e-13	-27.52
4	-27.85	-27.08	8.156e-13	-27.58
5	-28.04*	-27.10	6.871e-13*	-27.70

These results refer to the selection of the order of lags for a VAR (Vector Autoregression) model using four information criteria: AIC (Akaike Information Criterion), BIC (Bayesian Information Criterion), FPE (Final Prediction Error) and HQIC (Hannan- Quinn Information Criterion).

Several lag orders have been tested, from 0 to 5. The lowest values of the information criteria (AIC, BIC, FPE, HQIC) indicate that that lag order is the best for the VAR model.

In this case, the selected lag order is 2, since it has the lowest BIC value of all the lag orders tested. In addition, its AIC, FPE, and HQIC value are very close to the minimum value, so it can also be considered as a good option.

3.3. Cointegration Test

As mentioned above, the Johansen test was applied. Below are the results of the Maximum Likelihood Cointegration tests based on the range of the Π matrix, the Trace and Maximum Eigenvalue tests. The results of the Cointegration test between real GDP and energy consumption imply that the null hypothesis of no Cointegration in both tests, that is, H₀ is rejected at 5% significance, since the is statistic calculated for each is greater than the critical value.

# Import libraries

from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Method 1: Perform Johansen's cointegration test

data1 = data.iloc[:, [2, 3]]
array = np.array(data1.values)
result_cj = coint_johansen(array, det_order=-1, k_ar_diff=1)
print('Rank:', result_cj.ind[0])
print('Eigenvalue:', result_cj.eig[0])
print('trace statistic: ', result_cj.trace_stat[0])
print('Critical value at 5%: ', result_cj.trace_stat_crit_vals[0][1])
print('Max Eigenvalue: ', result_cj.max_eig_stat[0])
print('Max Critical value at 5%: ', result_cj.max_eig_stat_crit_vals[0][1])
#
#
print('Rank:', result_cj.ind[1])
print('Eigenvalue:', result_cj.eig[1])
print('trace statistic: ', result_cj.trace_stat[1])
print('Critical value at 5%: ', result_cj.trace_stat_crit_vals[1][1])
print('Max Eigenvalue: ', result_cj.max_eig_stat[1])
print('Max Critical value at 5%: ', result_cj.max_eig_stat_crit_vals[1][1])

Rank: 0
Eigenvalue: 0.3461271127934622
trace statistic:  19.63366014549301
Critical value at 5%:  12.3212
Max Eigenvalue:  17.843376956378705
Max Critical value at 5%:  11.2246
Rank: 1
Eigenvalue: 0.04173008304161474
trace statistic:  1.7902831891143038
Critical value at 5%:  4.1296
Max Eigenvalue:  1.7902831891143038
Max Critical value at 5%:  4.1296

Based on the result obtained with the Johansen Cointegration Test, the Error Correction Model is derived. In this part, we include in the Error Correction Model, two dummy intervention variables that collect the effects of structural breaks in the variables found with the test of Zivot-Andrews, to take them into account in the short- and long-term adjustment dynamics.

The Granger representation theorem allows us to model short-term and long-term dynamics through an error correction model. However, first the optimal number of lags must be determined, which is obtained through the VAR(p) model. In this case, the optimal number of lags is 2, that is, our VAR model is of order 2, VAR (2). Now, the error correction model (VEC(p-1)) will be a VEC of order 1, that is, VEC(1).

# Import libraries

from statsmodels.tsa.vector_ar import vecm

# Method 2: Perform Johansen's cointegration test

vec_rank=vecm.select_coint_rank(data1, det_order=-1, k_ar_diff=1, method='trace', signif=0.05)
print(vec_rank.summary())

Johansen cointegration test using trace test statistic with 5% significance level
=====================================
r_0 r_1 test statistic critical value
-------------------------------------
  0   2          19.63          12.32
  1   2          1.790          4.130
-------------------------------------

The result of the Johansen cointegration test indicates that for r = 0, the value of the trace test statistic is 19.63. The corresponding critical value for a significance level of 5% is 12.32. Since the value of the test statistic is greater than the critical value, one can reject the null hypothesis that there is no cointegration between the variables and conclude that there is at least one cointegration relationship between them.

# Get maximum eigenvalues

vec_rank1=vecm.select_coint_rank(data1, det_order=-1, k_ar_diff=2, method='maxeig', signif=0.05)
print(vec_rank1.summary())

Johansen cointegration test using maximum eigenvalue test statistic with 5% significance level
=====================================
r_0 r_1 test statistic critical value
-------------------------------------
  0   1          12.77          11.22
  1   2          3.527          4.130
-------------------------------------

dummies = data.iloc[:, [7, 8]]
array = np.array(dummies.values)

Based on the VEC model, after applying the Johansen Cointegration Test between energy consumption and GDP, the adjustment dynamics in the short and long term can be analyzed.

# Import Libraries

from statsmodels.tsa.vector_ar.vecm import VECM

# Apply the error correction model (VECM)

vecm = VECM(data1, exog=dummies, k_ar_diff=1, coint_rank=1, deterministic='n', dates=data['fecha'], )
vecm_fit = vecm.fit()
print(vecm_fit.summary())

Det. terms outside the coint. relation & lagged endog. parameters for equation LPBI
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
exog1         -0.0428      0.050     -0.852      0.394      -0.141       0.056
exog2          0.0088      0.049      0.179      0.858      -0.088       0.106
L1.LPBI        0.4775      0.163      2.924      0.003       0.157       0.797
L1.LCE        -0.1407      0.170     -0.828      0.408      -0.474       0.192
Det. terms outside the coint. relation & lagged endog. parameters for equation LCE
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
exog1         -0.0159      0.043     -0.369      0.712      -0.100       0.069
exog2         -0.0657      0.042     -1.548      0.122      -0.149       0.017
L1.LPBI        0.2175      0.140      1.555      0.120      -0.057       0.492
L1.LCE        -0.5215      0.146     -3.582      0.000      -0.807      -0.236
                Loading coefficients (alpha) for equation LPBI                
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ec1            0.0275      0.016      1.678      0.093      -0.005       0.060
                Loading coefficients (alpha) for equation LCE                 
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ec1            0.0812      0.014      5.791      0.000       0.054       0.109
          Cointegration relations for loading-coefficients-column 1           
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
beta.1         1.0000          0          0      0.000       1.000       1.000
beta.2        -1.2557      0.020    -61.722      0.000      -1.296      -1.216
==============================================================================

The results show that in the short term, the L1.LPBI coefficient is positive and significant at the 1% level, which indicates that in the short term, a variation in the GDP growth rate (LPBI) will translate into a increase in its own growth rate in the same period. On the other hand, the L1.LCE coefficient is negative but not statistically significant, which suggests that the growth rate of electricity consumption (LCE) does not significantly affect the GDP growth rate in the short term.

In this order of ideas, the energy conservation hypothesis is fulfilled for the Peruvian economy in the short term. Regarding the long-term relationships, the results of the cointegration relationships can be analyzed in the last table. Where we can see that the beta.2 coefficient is negative and significant at the 1% level, which indicates that in the long term there is a direct relationship between the growth rate of Peru’s GDP and the growth rate of energy consumption in Peru . In other words, in the long term, an increase in energy consumption of 1% generates an increase in GDP of 1,256%, therefore, in the long term, energy consumption does affect GDP.

Finally, the adjustment coefficient (speed of adjustment) to long-term imbalances of the model (∆LPIB) is 2.75%, while which for the model (∆LCE) is 8.12%. This implies a faster adjustment in the energy consumption error correction model.

# Make forecasts

vecm1 = VECM(data1, k_ar_diff=2, coint_rank=1, deterministic='n', dates=data['fecha'])
vecm_fit1 = vecm1.fit()
vecm_fit1.plot_forecast(5)

4. Conclusions

In this article, the short- and long-term relationship between energy consumption and real GDP for the Colombian economy during the period 1970-2009 was analyzed, using annual time series. First, the long-term elasticity of energy consumption to real GDP was estimated and then, under the VEC modeling, the short- and long-term dynamics were recognized.

The empirical results for the Peruvian case suggest the existence of a long-term bidirectional causal relationship between energy consumption and GDP. In other words, the fact that there is cointegration between the variables confirms the relationship between them, that is, it suggests that in the long term there is a feedback between energy consumption and GDP.

In summary, the results indicate that in the short term there is no relationship between energy consumption and GDP, but there is in the long term, so our results can be placed under the long-term energy conservation hypothesis. term and a long-term feedback effect.

References

Abosedra, A., et al. (1991). New evidence on the causal relationship between U.S. Energy consumption y Gross National product. Journal of Energy and Development.
Al-iriani, M. (2006). Energy GDP relationship revisted: an example from GCC countries using panel causality. Energy Policy.
Dickey, D. & Fuller, W. (1979). Distribution of the Estimators for Autoregressive Time Series with a Unit Root. Journal of the American Statistical Association.
Erol, et al. (2001). On the causal relationship between energy and income for industrialized countries. Journal of Energy Development.
Granger, C. & Newbold, P. (1974). Spurious Regressions in Econometrics. Journal of Econometrics, vol. 2: 111-120, 1974.
Hwang, D. & Gum, B. (1992). The causal relationship between energy and GNP: the case of Taiwan. Journal of Energy and Development.
Johansen, S. (1988). Statistical analysis of cointegration vectors. Journal of Economic Dynamics and Control, vol. 12.
Ozturk et at. (2010). The causal relationship between energy consumption and GDP in Albania, Bulgaria, Hungary and Romania: Evidence from ARDL bound testing approach. Applied Energy, Elsevier, vol. 87.
Sims, C. (1980). Macroeconomics and Reality. Econométrica, vol. 48(1): 1-48.
Soytas et al. (2001). Energy Consumption and GDP Relations in Turkey: A Cointegration and Vector Error Correction Analysis. Economics and Business in Transition: Facilitating Competitiveness and Change in the Global Environment Proceedings. Global Business and Technology Association.
Yu et al. (1984). The relationship between energy and GNP: further results. Energy Economics, vol. 6: 186-190, 1984.