Time series analysis studies how a selected variable behaves over time. The time variable makes it possible to see how other variables change over a period, and the analysis is usually carried out with the aim of forecasting, using time as the reference point.
“When organizations analyze data over consistent intervals, they can also use time series forecasting to predict the likelihood of future events. Time series forecasting is part of predictive analytics. It can show likely changes in the data, like seasonality or cyclic behavior, which provides a better understanding of data variables and helps forecast better”. (Tableau, 2023)
Mexico and its attractiveness for nearshoring is an interesting case study because of its proximity to the United States, one of the largest consumer markets in the world. In addition, the country has a free trade agreement with Canada and the United States, as well as a low-cost and competitive labor force. These are some of the reasons why Mexico has become a potential destination for investment by large companies. A significant example of the nearshoring trend in Mexico is the recent announcement of the construction of Tesla’s new factory in Nuevo León. (NuvoCargo, 2023)
With all the context provided previously, it is time to understand the problem situation. The attractiveness of Mexico for nearshoring is a case that could bring many benefits to the Mexican economy and the country’s growth prospects. As mentioned earlier, Mexico boasts numerous attributes that can attract significant investments from large companies, promising a host of advantages for the nation.
In summary, the problem situation involves analyzing the provided data to see how Foreign Direct Investment (FDI) flows behave over time, and to understand and assess the relationships between FDI flows and the other variables in the dataset (which will be presented and explained below).
To tackle this problem effectively, it is essential to gain insights through the testing of forecasting models. These models will guide the country’s efforts toward the targeted development of factors that can attract FDI-driven nearshoring.
It is important to note that the first part of this analysis uses only “IED_Flows” and “Period”, with the period measured in quarters. The second part uses more variables.
library(plotly)
library(xts)
library(dplyr)
library(zoo)
library(tseries)
library(stats)
library(forecast)
library(astsa)
library(corrplot)
library(AER)
library(vars)
library(dynlm)
library(TSstudio)
library(tidyverse)
library(sarima)
library(dygraphs)
#Import dataset
data <- read.csv("C:\\Users\\danyb\\OneDrive - Instituto Tecnologico y de Estudios Superiores de Monterrey\\Docs\\Documentos\\Business Intelligence\\Quinto Semestre\\Introduction to Econometrics\\TS_SP.csv")
str(data)
## 'data.frame': 96 obs. of 2 variables:
## $ Date : chr "01/01/1999" "04/01/1999" "07/01/1999" "10/01/1999" ...
## $ IED_Flows: num 3596 3396 3028 3940 4601 ...
Variable “Date” is not in the correct format.
data$Date <- as.Date(data$Date,"%m/%d/%Y")
str(data)
## 'data.frame': 96 obs. of 2 variables:
## $ Date : Date, format: "1999-01-01" "1999-04-01" ...
## $ IED_Flows: num 3596 3396 3028 3940 4601 ...
Now it is in date format.
#Checked for NA's
colSums(is.na(data))
## Date IED_Flows
## 0 0
There are no missing values in the dataset.
plot_ly(data = data, x = ~Date, y = ~IED_Flows, type = "scatter", mode = "lines") %>%
layout(title = "IED_Flows",
xaxis = list(title = "Date"),
yaxis = list(title = "IED_Flows")) %>%
add_trace(text = ~paste("IED Flows: $", IED_Flows, "Date:", Date),
hoverinfo = "text")
Three increases can be observed in the time series plot: the first in 2001, the second in 2013, and the last in 2022. It is not easy to identify a trend in this time series, which is why decomposing it could give a clearer view of this variable.
Flowd<-ts(data$IED_Flows,start=c(1999,1),end=c(2022,4),frequency=4) #96 quarterly observations: 1999 Q1 to 2022 Q4
flowdec<-decompose(Flowd)
plot(flowdec)
It is difficult to establish that this time series has a trend: although it has grown over time, there is a sharp drop in the later periods of 2022. It does show a seasonal factor, with a pattern of ups and downs that repeats. Additionally, the peaks mentioned earlier are present in the random component of the time series and align with those in the time series plot.
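As a minimal sketch of what the additive decomposition does, the example below builds a simulated quarterly series (illustrative values, not the FDI data) and checks that the trend, seasonal and random components returned by `decompose()` add back up to the original series. The variable names are assumptions made for this illustration.

```r
set.seed(1)

# Simulated quarterly series: linear trend + fixed quarterly pattern + noise
# (illustrative values only, not the FDI data)
n <- 40
trend <- seq(100, 139, length.out = n)
season <- rep(c(5, -3, -4, 2), times = n / 4)
x <- ts(trend + season + rnorm(n, sd = 1), start = c(2010, 1), frequency = 4)

dec <- decompose(x)  # additive decomposition by default

# The three components add back up to the series wherever the trend is defined
rebuilt <- dec$trend + dec$seasonal + dec$random
ok <- !is.na(rebuilt)
max_gap <- max(abs(rebuilt[ok] - x[ok]))

print(round(dec$figure, 2))  # one estimated seasonal effect per quarter
print(max_gap)               # numerically zero
```

The random component is simply what is left after removing the estimated trend and seasonal pattern, which is why isolated peaks in the raw series tend to show up there.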
adf.test(data$IED_Flows)
## Warning in adf.test(data$IED_Flows): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data$IED_Flows
## Dickey-Fuller = -4.1994, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
We got a p-value of 0.01 (below 0.05), so we reject H0 (the series has a unit root), which means the time series is stationary. It is important to mention that the data are in their original form.
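To illustrate the decision rule, here is a simplified sketch of the plain Dickey-Fuller regression on simulated data. It is not the same specification as `adf.test()` (which also includes a time trend and lagged differences); the function name `df_stat` and the critical value used (about -2.86, the asymptotic 5% value for the constant-only case) are assumptions for this illustration.

```r
set.seed(42)
n <- 200
stationary_y <- rnorm(n)          # white noise: no unit root
random_walk  <- cumsum(rnorm(n))  # unit root by construction

# Plain Dickey-Fuller regression: diff(y)_t = a + g * y_(t-1) + e_t
# H0: g = 0 (unit root). A strongly negative t statistic rejects H0.
df_stat <- function(y) {
  dy   <- diff(y)
  ylag <- y[-length(y)]
  fit  <- lm(dy ~ ylag)
  unname(coef(summary(fit))["ylag", "t value"])
}

crit_5pct <- -2.86  # assumed asymptotic 5% critical value, constant-only case
stat_wn <- df_stat(stationary_y)
stat_rw <- df_stat(random_walk)

cat("white noise:", round(stat_wn, 2), "reject unit root:", stat_wn < crit_5pct, "\n")
cat("random walk:", round(stat_rw, 2), "reject unit root:", stat_rw < crit_5pct, "\n")
```

A small p-value in the ADF output corresponds to a test statistic far below the critical value, which is the case where the unit-root hypothesis is rejected.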
acf(data$IED_Flows,main="Significant Autocorrelations")
Box.test(data$IED_Flows,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: data$IED_Flows
## X-squared = 0.028693, df = 1, p-value = 0.8655
With a p-value of 0.8655 (>0.05), we fail to reject H0, indicating that this variable does not show serial autocorrelation at lag 1. Looking at the graph, however, there could be serial autocorrelation at lag = 4.
Box.test(data$IED_Flows,lag=4,type="Ljung-Box")
##
## Box-Ljung test
##
## data: data$IED_Flows
## X-squared = 21.948, df = 4, p-value = 0.0002053
With lag = 4, the p-value (0.0002) is well below 0.05, so we reject H0: the variable shows serial autocorrelation within this number of lags.
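The Ljung-Box statistic pools the squared autocorrelations up to the chosen lag, which is why the result can change between lag = 1 and lag = 4. The sketch below recomputes it by hand on a simulated series and compares it with `Box.test()`; the helper name `ljung_box` is an assumption for this illustration.

```r
set.seed(7)
x <- as.numeric(arima.sim(model = list(ar = 0.3), n = 96))  # simulated, not the FDI data

# Q = n (n + 2) * sum_{k = 1..lag} r_k^2 / (n - k), compared with chi-square(lag)
ljung_box <- function(x, lag) {
  n <- length(x)
  r <- acf(x, lag.max = lag, plot = FALSE)$acf[-1]  # drop lag 0
  q <- n * (n + 2) * sum(r^2 / (n - seq_len(lag)))
  c(statistic = q, p.value = pchisq(q, df = lag, lower.tail = FALSE))
}

manual  <- ljung_box(x, lag = 4)
builtin <- Box.test(x, lag = 4, type = "Ljung-Box")

print(manual)
print(builtin$statistic)
```

Because each additional lag adds a non-negative term to Q, a series with weak correlation at lag 1 but a strong quarterly pattern can pass the test at lag = 1 and fail it at lag = 4.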
Model 1, ARMA
summary(IED_ARMA<-arma(log(data$IED_Flows),order=c(1,1)))
##
## Call:
## arma(x = log(data$IED_Flows))
##
## Model:
## ARMA(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41237 -0.35244 -0.00571 0.27709 1.50759
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## ar1 -0.2976 0.2271 -1.310 0.19003
## ma1 0.5173 0.1999 2.588 0.00967 **
## intercept 11.3149 1.9794 5.716 1.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Fit:
## sigma^2 estimated as 0.2688, Conditional Sum-of-Squares = 25.26, AIC = 152.3
IED_ARIMA <- Arima(log(data$IED_Flows),order=c(4,1,1))
summary(IED_ARIMA)
## Series: log(data$IED_Flows)
## ARIMA(4,1,1)
##
## Coefficients:
## ar1 ar2 ar3 ar4 ma1
## 0.0301 -0.2880 0.0024 0.2546 -0.9147
## s.e. 0.1263 0.1208 0.1224 0.1220 0.0705
##
## sigma^2 = 0.2389: log likelihood = -65.46
## AIC=142.91 AICc=143.87 BIC=158.24
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set 0.05648677 0.4732425 0.3487702 0.376108 4.031802 0.6410168
## ACF1
## Training set -0.01918978
IED_ARMA_R<-IED_ARMA$residuals
Box.test(IED_ARMA_R,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: IED_ARMA_R
## X-squared = 0.32592, df = 1, p-value = 0.5681
With a p-value of 0.57 (>0.05), we fail to reject H0, indicating that the residuals of this ARMA model do not show serial autocorrelation and behave independently.
#When estimating this ARMA model, first value is null, so we need to omit it.
IED_ARMA$fitted.values <- na.omit(IED_ARMA$fitted.values)
IED_ARMA$residuals <- na.omit(IED_ARMA$residuals)
#Testing residuals
adf.test(IED_ARMA$residuals)
##
## Augmented Dickey-Fuller Test
##
## data: IED_ARMA$residuals
## Dickey-Fuller = -3.644, Lag order = 4, p-value = 0.03336
## alternative hypothesis: stationary
For ADF residuals, p-value is lower than 0.05, so we reject H0, and it can be concluded that residuals are stationary.
hist(IED_ARMA$residuals)
It is slightly left-skewed, but close to a normal distribution.
#Reverting log transforming
IED_ARMA$fitted.values<-exp(IED_ARMA$fitted.values)
tsplot(IED_ARMA$fitted.values)
Diagnostic Tests for ARIMA Model
IED_ARIMA_R<-IED_ARIMA$residuals
Box.test(IED_ARIMA_R,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: IED_ARIMA_R
## X-squared = 0.036468, df = 1, p-value = 0.8486
With a p-value of 0.85 (>0.05), we fail to reject H0, indicating that the residuals of this ARIMA model do not show serial autocorrelation and behave independently.
#Testing residuals
adf.test(IED_ARIMA$residuals)
## Warning in adf.test(IED_ARIMA$residuals): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: IED_ARIMA$residuals
## Dickey-Fuller = -5.2941, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
For ADF residuals, p-value is lower than 0.05, so we reject H0, and it can be concluded that residuals are stationary.
hist(IED_ARIMA$residuals)
It is right-skewed, which suggests that the residuals of the ARIMA model may not follow a normal distribution.
IED_ARIMA$fitted<-exp(IED_ARIMA$fitted)
tsplot(IED_ARIMA$fitted)
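The “IED_ARIMA” model has a lower AIC (142.91 versus 152.3 for the ARMA model) and a higher Ljung-Box p-value for its residuals (0.85 versus 0.57), which is why we select it to forecast the next 5 periods of the time series. Because the two models were estimated with different functions and the ARIMA model is fitted to differenced data, the AIC values are only roughly comparable.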
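FDI_forecast<-forecast(IED_ARIMA$fitted,h=5) #h=5 for 5 periods. Note: the input is the back-transformed fitted series, so forecast() fits its own model to it rather than using the ARIMA(4,1,1) estimates.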
FDI_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 97 7856.942 5540.891 10172.99 4314.847 11399.04
## 98 7869.019 5549.355 10188.68 4321.400 11416.64
## 99 7880.854 5557.597 10204.11 4327.739 11433.97
## 100 7892.452 5565.603 10219.30 4333.843 11451.06
## 101 7903.819 5573.360 10234.28 4339.690 11467.95
Generating a forecast with this model gives an estimate of what Foreign Direct Investment could be over the next 5 periods. At a 95% confidence level, the intervals are the ones shown in the Lo 95 and Hi 95 columns above.
The point forecasts are the ones shown in the Point Forecast column above.
The graphs for this forecast are plotted below.
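For comparison, a forecast can also be produced directly from a model fitted on the log scale and then exponentiated. The sketch below uses base `stats::arima()` and `predict()` on a simulated positive series; the data, the model order and the variable names are assumptions for illustration and do not reproduce the results above.

```r
set.seed(123)

# Simulated positive series standing in for IED_Flows (not the real data)
y <- exp(8 + arima.sim(model = list(ar = 0.5), n = 96, sd = 0.3))

# Fit on the log scale, forecast 5 steps ahead
fit  <- arima(log(y), order = c(1, 0, 0))
pred <- predict(fit, n.ahead = 5)

# Back-transform point forecasts and approximate 95% limits to the original scale
fc <- data.frame(
  point = as.numeric(exp(pred$pred)),
  lower = as.numeric(exp(pred$pred - 1.96 * pred$se)),
  upper = as.numeric(exp(pred$pred + 1.96 * pred$se))
)
print(fc)
```

Exponentiating a log-scale forecast gives a median-type forecast on the original scale, and the resulting interval is asymmetric around it, which is expected for a strictly positive variable.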
#Importing dataset
data1 <- read.csv("C:\\Users\\danyb\\OneDrive - Instituto Tecnologico y de Estudios Superiores de Monterrey\\Docs\\Documentos\\Business Intelligence\\Quinto Semestre\\Introduction to Econometrics\\data_sp.csv")
str(data1)
## 'data.frame': 26 obs. of 16 variables:
## $ Period : int 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
## $ IED_Flows : num 12146 8374 13960 18249 30057 ...
## $ Exports : num 9088 9875 10990 12483 11300 ...
## $ Employment : num NA NA NA 97.8 97.4 ...
## $ Education : num 7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
## $ Daily_Salary : num 24.3 31.9 31.9 35.1 37.6 ...
## $ Innovation : num 11.3 11.4 12.5 13.2 13.5 ...
## $ Insecurity_Robbery : num 267 315 273 217 215 ...
## $ Insecurity_Homicide: num 14.6 14.3 12.6 10.9 10.2 ...
## $ Exchange_Rate : num 8.06 9.94 9.52 9.6 9.17 ...
## $ Road_Density : num 0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
## $ Population_Density : num 47.4 48.8 49.5 50.6 51.3 ...
## $ CO2_Emissions : num 3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
## $ GPD_PER_CAPITA : num 127570 126739 129165 130875 128083 ...
## $ CPI : num 33.3 39.5 44.3 48.3 50.4 ...
## $ Financial_Crisis : int 0 0 0 0 0 0 0 0 0 0 ...
The time variable is in a different format here (“Period” is an integer year), so it needs to be transformed.
colSums(is.na(data1))
## Period IED_Flows Exports Employment
## 0 0 0 3
## Education Daily_Salary Innovation Insecurity_Robbery
## 3 0 2 0
## Insecurity_Homicide Exchange_Rate Road_Density Population_Density
## 1 0 0 0
## CO2_Emissions GPD_PER_CAPITA CPI Financial_Crisis
## 3 0 0 0
There are some missing values, so we need to impute them.
#Replacing Missing Values with Median Values.
data1 <- data1 %>%
mutate_all(~ replace_na(., median(., na.rm = TRUE)))
#Checking the replacing
colSums(is.na(data1))
## Period IED_Flows Exports Employment
## 0 0 0 0
## Education Daily_Salary Innovation Insecurity_Robbery
## 0 0 0 0
## Insecurity_Homicide Exchange_Rate Road_Density Population_Density
## 0 0 0 0
## CO2_Emissions GPD_PER_CAPITA CPI Financial_Crisis
## 0 0 0 0
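The median replacement used above can be sketched in base R on a toy data frame. The column names, values and the helper `impute_median` are made up for this illustration.

```r
# Toy data frame with missing values (made-up numbers)
df <- data.frame(
  year = 2001:2006,
  employment = c(NA, 97.8, 97.4, NA, 96.9, 97.1),
  salary = c(24.3, 31.9, 31.9, 35.1, 37.6, 39.7)
)

# Replace NA in a numeric column with that column's median
impute_median <- function(v) {
  if (is.numeric(v)) v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

imputed <- as.data.frame(lapply(df, impute_median))

print(colSums(is.na(imputed)))
print(imputed$employment)
```

Median imputation ignores the time ordering of the observations, so for trending yearly series it is worth checking how far the imputed value sits from its neighbours.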
Foreign Direct Investment flows and Exports are given in millions of US dollars (USD), while the other monetary variables are in Mexican pesos (MXN).
#Convert FDI and Exports to real MXN: multiply by the exchange rate and deflate by the CPI
data1$IED_FlowsMXN <- ((data1$IED_Flows * data1$Exchange_Rate) / data1$CPI) * 100
data1$Exports_MXN <- ((data1$Exports * data1$Exchange_Rate) / data1$CPI) * 100
#Eliminate the original columns, they will not be necessary.
data1 <- subset(data1, select = -c(IED_Flows, Exports))
#Confirm changes
colnames(data1)
## [1] "Period" "Employment" "Education"
## [4] "Daily_Salary" "Innovation" "Insecurity_Robbery"
## [7] "Insecurity_Homicide" "Exchange_Rate" "Road_Density"
## [10] "Population_Density" "CO2_Emissions" "GPD_PER_CAPITA"
## [13] "CPI" "Financial_Crisis" "IED_FlowsMXN"
## [16] "Exports_MXN"
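The conversion applied to FDI and Exports is real value = USD value × exchange rate ÷ CPI × 100. A small sketch with made-up numbers follows; the function name `to_real_mxn` and the assumption that the CPI is expressed with base 100 are illustrative.

```r
# Convert a USD amount to real MXN: nominal MXN = USD * exchange rate,
# then deflate by the CPI (assumed to be an index with base 100)
to_real_mxn <- function(usd_millions, exchange_rate, cpi, base = 100) {
  usd_millions * exchange_rate / cpi * base
}

# Made-up example: 100 million USD at 20 MXN per USD
nominal   <- to_real_mxn(100, exchange_rate = 20, cpi = 100)  # CPI at base: real = nominal
real_half <- to_real_mxn(100, exchange_rate = 20, cpi = 50)   # lower price level

print(nominal)
print(real_half)
```

With the CPI at its base value the real and nominal amounts coincide; a CPI of 50 doubles the real value of the same nominal amount.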
Looking at these variables, the ones that could explain nearshoring in Mexico, and the ones selected for this analysis, are: Daily_Salary, Exports_MXN, Exchange_Rate, Education and Innovation.
It is expected that these variables have a positive relation with Foreign Direct Investment. As they grow, it is expected that investment will also increase.
#It is necessary to transform variable "Date" in order to plot a time series.
# Time series plot of the selected variables
par(mfrow=c(2,3))
plot(data1$Period,data1$Daily_Salary,type="l",col="blue",lwd=2,xlab="Date",ylab="Daily Salary",main="Daily Salary")
plot(data1$Period,data1$Exports_MXN,type="l",col="blue",lwd=2,xlab="Date",ylab="Exports",main="Exports")
plot(data1$Period,data1$Exchange_Rate,type="l",col="blue",lwd=2,xlab="Date",ylab="Exchange Rate",main="Exchange Rate")
plot(data1$Period,data1$Education,type="l",col="blue",lwd=2,xlab="Date",ylab="Education",main="Education")
plot(data1$Period,data1$Innovation,type="l",col="blue",lwd=2,xlab="Date",ylab="Innovation",main="Innovation")
plot(data1$Period,data1$IED_FlowsMXN,type="l",col="blue",lwd=2,xlab="Date",ylab="Foreign Direct Investment",main="IED")
It is worth noting that all the variables, except Innovation, appear to move in the same direction as the dependent variable “IED_FlowsMXN”.
When estimating a model, it is necessary to use stationary variables. If the time series are non-stationary, the modeling or forecasting results may be spurious or incorrect.
#Testing selected variables
adf.test(data1$IED_FlowsMXN)
##
## Augmented Dickey-Fuller Test
##
## data: data1$IED_FlowsMXN
## Dickey-Fuller = -2.0122, Lag order = 2, p-value = 0.5677
## alternative hypothesis: stationary
adf.test(data1$Exchange_Rate)
##
## Augmented Dickey-Fuller Test
##
## data: data1$Exchange_Rate
## Dickey-Fuller = -2.386, Lag order = 2, p-value = 0.4254
## alternative hypothesis: stationary
adf.test(data1$Innovation)
##
## Augmented Dickey-Fuller Test
##
## data: data1$Innovation
## Dickey-Fuller = -3.8976, Lag order = 2, p-value = 0.02874
## alternative hypothesis: stationary
adf.test(data1$Education)
## Warning in adf.test(data1$Education): p-value greater than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data1$Education
## Dickey-Fuller = 1.1162, Lag order = 2, p-value = 0.99
## alternative hypothesis: stationary
adf.test(data1$Daily_Salary)
## Warning in adf.test(data1$Daily_Salary): p-value greater than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: data1$Daily_Salary
## Dickey-Fuller = 6.0073, Lag order = 2, p-value = 0.99
## alternative hypothesis: stationary
adf.test(data1$Exports_MXN)
##
## Augmented Dickey-Fuller Test
##
## data: data1$Exports_MXN
## Dickey-Fuller = -2.2975, Lag order = 2, p-value = 0.4591
## alternative hypothesis: stationary
Once we obtain the Augmented Dickey-Fuller test results for the variables we intend to use in our model, in their original form, we can see that all of them except Innovation (p-value = 0.029) are non-stationary, with p-values greater than 0.05. That is why we transform them by taking logs and first differences.
#Testing selected variables
adf.test(diff(log(data1$IED_FlowsMXN)))
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(data1$IED_FlowsMXN))
## Dickey-Fuller = -2.9934, Lag order = 2, p-value = 0.1939
## alternative hypothesis: stationary
adf.test(diff(log(data1$Exchange_Rate)))
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(data1$Exchange_Rate))
## Dickey-Fuller = -3.5079, Lag order = 2, p-value = 0.0628
## alternative hypothesis: stationary
adf.test(diff(log(data1$Innovation)))
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(data1$Innovation))
## Dickey-Fuller = -3.5781, Lag order = 2, p-value = 0.05305
## alternative hypothesis: stationary
adf.test(diff(log(data1$Education)))
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(data1$Education))
## Dickey-Fuller = -0.94073, Lag order = 2, p-value = 0.9293
## alternative hypothesis: stationary
adf.test(diff(log(data1$Daily_Salary)))
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(data1$Daily_Salary))
## Dickey-Fuller = -0.7432, Lag order = 2, p-value = 0.9547
## alternative hypothesis: stationary
adf.test(diff(log(data1$Exports_MXN)))
##
## Augmented Dickey-Fuller Test
##
## data: diff(log(data1$Exports_MXN))
## Dickey-Fuller = -2.7797, Lag order = 2, p-value = 0.2754
## alternative hypothesis: stationary
Transforming the variables lowers the p-value for most of them (Innovation is the exception, rising from 0.029 to 0.053), but none of the transformed variables reaches the 0.05 threshold, so we do not obtain stationary variables. It is crucial to understand that a model estimated on these series may produce incorrect forecasts. We are going to estimate it anyway, but the forecast may not be accurate, since the time series variables are non-stationary.
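As a quick illustration of the transformation itself, `diff(log(x))` turns a series of levels into period-to-period log growth rates and shortens it by one observation. The numbers below are made up.

```r
# A series growing 10% per period (made-up values)
x <- c(100, 110, 121, 133.1)

growth <- diff(log(x))  # log(x_t) - log(x_(t-1)) = log(x_t / x_(t-1))

print(growth)                      # each element equals log(1.1)
print(length(x) - length(growth))  # one observation is lost to differencing
```

Each value is log(1.1), close to the 10% growth rate, and the first period has no value because there is no earlier observation to compare it with.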
#Converting to time series format
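ied<-ts(data1$IED_FlowsMXN,start=c(1996),end=c(2022),frequency=1) #start=1996 was used so that the differenced series starts in 1997. Caution: data1 has 26 rows (1997-2022), so asking for 27 years makes ts() recycle the first value as a 27th observation and labels every year one too early.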
er<-ts(data1$Exchange_Rate,start=c(1996),end=c(2022),frequency=1)
inn<-ts(data1$Innovation,start=c(1996),end=c(2022),frequency=1)
edu<-ts(data1$Education,start=c(1996),end=c(2022),frequency=1)
ds<-ts(data1$Daily_Salary,start=c(1996),end=c(2022),frequency=1)
exp<-ts(data1$Exports_MXN,start=c(1996),end=c(2022),frequency=1)
#Transforming into diff and log values.
died<-diff(log(ied))
der<-diff(log(er))
dinn<-diff(log(inn))
dedu<-diff(log(edu))
dds<-diff(log(ds))
dexp<-diff(log(exp))
data_1<-cbind(died, der, dinn,dedu,dds,dexp)
colnames(data_1)<-c("IED_FlowsMXN", "Exchange_Rate", "Innovation","Education","Daily_Salary","ExportsMXN")
data_1
## Time Series:
## Start = 1997
## End = 2022
## Frequency = 1
## IED_FlowsMXN Exchange_Rate Innovation Education Daily_Salary
## 1997 -0.3328258167 0.209653464 0.006175582 0.015162248 0.27244309
## 1998 0.3516285761 -0.043172172 0.091545206 0.016282585 0.00000000
## 1999 0.1904911522 0.008368250 0.053898245 0.017345331 0.09585133
## 2000 0.4102348012 -0.045825812 0.024043232 0.015748357 0.06743509
## 2001 -0.1544452519 0.122014951 -0.051019819 0.015504187 0.05615238
## 2002 -0.2390511148 0.077961541 -0.080498541 0.016529302 0.04405782
## 2003 0.2664664748 0.001784122 0.065543520 0.013776048 0.04173658
## 2004 -0.0485349505 -0.046520016 0.061510547 0.012361097 0.04382902
## 2005 -0.2186285687 0.015748357 0.059351715 0.014634408 0.03922921
## 2006 0.3872999150 0.001836548 0.055360906 0.012033840 0.03815745
## 2007 0.0769619346 0.233729524 -0.014735699 0.011890747 0.03931513
## 2008 -0.5919653541 -0.054470756 -0.163074772 0.011751017 0.04518696
## 2009 0.3257949987 -0.051939289 0.007911434 0.008144315 0.04736568
## 2010 0.0251260056 0.121545470 -0.047608829 0.013809195 0.04024087
## 2011 -0.2719216084 -0.073447906 0.074048939 0.011363759 0.04529012
## 2012 0.7652471159 0.006139697 0.014476443 0.011236073 0.03827059
## 2013 -0.3861679261 0.119566703 0.032008687 0.011111225 0.03823309
## 2014 0.3111388902 0.163129741 0.101617255 0.010989122 0.06665202
## 2015 0.0002678477 0.175183492 -0.048128570 0.010869672 0.04108444
## 2016 -0.0242509147 -0.045552430 -0.024605811 0.010752792 0.19041214
## 2017 -0.0181637830 -0.004060919 -0.058624843 0.010638398 0.00000000
## 2018 -0.0856381189 -0.041012755 -0.042395559 0.013662850 0.15019798
## 2019 -0.1795515069 0.055154405 -0.118570747 -0.124328418 0.18235402
## 2020 0.0698284662 0.028672256 0.148817334 0.000000000 0.13974077
## 2021 0.0069226501 -0.055611623 0.000000000 0.000000000 0.19882772
## 2022 -0.6362639138 -0.878874841 -0.147045854 -0.161268148 -1.96206352
## ExportsMXN
## 1997 0.12216954
## 1998 -0.05254521
## 1999 0.04999471
## 2000 -0.18829632
## 2001 0.12011329
## 2002 0.13736474
## 2003 -0.01765435
## 2004 0.11394502
## 2005 0.03615943
## 2006 0.05337562
## 2007 0.04932858
## 2008 0.06248208
## 2009 0.04540176
## 2010 0.15474024
## 2011 -0.03090354
## 2012 0.01868257
## 2013 0.21415235
## 2014 0.08629861
## 2015 0.18831039
## 2016 -0.05082482
## 2017 0.03822088
## 2018 -0.06961734
## 2019 0.14459429
## 2020 0.04698442
## 2021 -0.09667247
## 2022 -1.17580448
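The last row of the table is worth a second look, because `ts()` pads the data by recycling when `start` and `end` imply more periods than there are values. A minimal sketch with three made-up values shows the effect and an alternative alignment; the object names are assumptions for this illustration.

```r
values <- c(10, 20, 30)  # three yearly observations, say 1997 to 1999

# Asking for four years (1996-1999) recycles the first value into a fourth slot
shifted <- ts(values, start = 1996, end = 1999, frequency = 1)

# Starting at the true first year keeps the labels aligned with the data
aligned <- ts(values, start = 1997, frequency = 1)

print(shifted)
print(aligned)
print(time(diff(aligned)))  # differencing then starts one year later, in 1998
```

With the aligned version the differenced series simply begins one year after the first observation, and no artificial final value is created.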
lag_selection<-VARselect(data_1,lag.max=3,type="const") #Season is not required, we have information in years.
lag_selection$selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 3 3 3 3
lag_selection$criteria
## 1 2 3
## AIC(n) -2.599393e+01 -2.584597e+01 -1.058719e+02
## HQ(n) -2.547245e+01 -2.487750e+01 -1.044564e+02
## SC(n) -2.392042e+01 -2.199516e+01 -1.002438e+02
## FPE(n) 5.792504e-12 1.470289e-11 6.956325e-45
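All four criteria select lag = 3, which has the lowest AIC value. With 6 variables, however, 3 lags mean 19 coefficients per equation estimated from only 26 yearly observations, so this result should be read with caution.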
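A back-of-the-envelope count shows how quickly a VAR uses up a short sample. The sketch below counts coefficients per equation (one per variable per lag, plus a constant) and the residual degrees of freedom left; the helper `var_budget` is a hypothetical name for this illustration.

```r
# Coefficients per equation in a VAR with k variables, p lags and a constant,
# and the residual degrees of freedom left with n_obs observations
var_budget <- function(k, p, n_obs, const = TRUE) {
  params <- k * p + as.integer(const)
  usable <- n_obs - p  # p observations are lost to the lags
  c(lags = p, params_per_eq = params, usable_obs = usable, resid_df = usable - params)
}

budget <- t(sapply(1:3, function(p) var_budget(k = 6, p = p, n_obs = 26)))
print(budget)
```

With one lag there are 7 coefficients and 18 residual degrees of freedom per equation; with three lags there are 19 coefficients and only 4 degrees of freedom, so information criteria computed on such a fit are not very informative.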
# We estimate the VAR model. The p option refers to the number of lags used.
# It was not possible to estimate the model with p=3, so we use p=1 instead.
VAR_model<-VAR(data_1,p=1,type="const")
summary(VAR_model)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: IED_FlowsMXN, Exchange_Rate, Innovation, Education, Daily_Salary, ExportsMXN
## Deterministic variables: const
## Sample size: 25
## Log Likelihood: 156.131
## Roots of the characteristic polynomial:
## 1.513 0.5615 0.5615 0.372 0.372 0.08178
## Call:
## VAR(y = data_1, p = 1, type = "const")
##
##
## Estimation results for equation IED_FlowsMXN:
## =============================================
## IED_FlowsMXN = IED_FlowsMXN.l1 + Exchange_Rate.l1 + Innovation.l1 + Education.l1 + Daily_Salary.l1 + ExportsMXN.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_FlowsMXN.l1 -0.64081 0.25494 -2.514 0.0217 *
## Exchange_Rate.l1 -1.16071 0.91403 -1.270 0.2203
## Innovation.l1 1.96395 0.96744 2.030 0.0574 .
## Education.l1 -0.54674 2.58016 -0.212 0.8346
## Daily_Salary.l1 -0.54970 1.04913 -0.524 0.6067
## ExportsMXN.l1 0.83552 1.01774 0.821 0.4224
## const 0.06625 0.12354 0.536 0.5984
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.2869 on 18 degrees of freedom
## Multiple R-Squared: 0.4063, Adjusted R-squared: 0.2083
## F-statistic: 2.053 on 6 and 18 DF, p-value: 0.1108
##
##
## Estimation results for equation Exchange_Rate:
## ==============================================
## Exchange_Rate = IED_FlowsMXN.l1 + Exchange_Rate.l1 + Innovation.l1 + Education.l1 + Daily_Salary.l1 + ExportsMXN.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_FlowsMXN.l1 0.11614 0.17860 0.650 0.524
## Exchange_Rate.l1 0.04310 0.64032 0.067 0.947
## Innovation.l1 0.26654 0.67774 0.393 0.699
## Education.l1 -0.53078 1.80752 -0.294 0.772
## Daily_Salary.l1 -0.95833 0.73496 -1.304 0.209
## ExportsMXN.l1 0.59871 0.71298 0.840 0.412
## const 0.03605 0.08655 0.417 0.682
##
##
## Residual standard error: 0.201 on 18 degrees of freedom
## Multiple R-Squared: 0.2435, Adjusted R-squared: -0.00871
## F-statistic: 0.9655 on 6 and 18 DF, p-value: 0.4756
##
##
## Estimation results for equation Innovation:
## ===========================================
## Innovation = IED_FlowsMXN.l1 + Exchange_Rate.l1 + Innovation.l1 + Education.l1 + Daily_Salary.l1 + ExportsMXN.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_FlowsMXN.l1 -0.02114 0.05935 -0.356 0.7258
## Exchange_Rate.l1 -0.34412 0.21278 -1.617 0.1232
## Innovation.l1 0.32909 0.22522 1.461 0.1612
## Education.l1 -0.85487 0.60065 -1.423 0.1718
## Daily_Salary.l1 -0.02488 0.24423 -0.102 0.9200
## ExportsMXN.l1 0.60186 0.23693 2.540 0.0205 *
## const -0.01039 0.02876 -0.361 0.7222
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.06678 on 18 degrees of freedom
## Multiple R-Squared: 0.4568, Adjusted R-squared: 0.2757
## F-statistic: 2.523 on 6 and 18 DF, p-value: 0.05972
##
##
## Estimation results for equation Education:
## ==========================================
## Education = IED_FlowsMXN.l1 + Exchange_Rate.l1 + Innovation.l1 + Education.l1 + Daily_Salary.l1 + ExportsMXN.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_FlowsMXN.l1 0.00678 0.03548 0.191 0.8506
## Exchange_Rate.l1 0.05943 0.12721 0.467 0.6460
## Innovation.l1 0.09850 0.13464 0.732 0.4738
## Education.l1 -0.12172 0.35909 -0.339 0.7386
## Daily_Salary.l1 -0.26484 0.14601 -1.814 0.0864 .
## ExportsMXN.l1 0.14087 0.14164 0.995 0.3331
## const 0.01150 0.01719 0.669 0.5122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.03992 on 18 degrees of freedom
## Multiple R-Squared: 0.3632, Adjusted R-squared: 0.1509
## F-statistic: 1.711 on 6 and 18 DF, p-value: 0.1757
##
##
## Estimation results for equation Daily_Salary:
## =============================================
## Daily_Salary = IED_FlowsMXN.l1 + Exchange_Rate.l1 + Innovation.l1 + Education.l1 + Daily_Salary.l1 + ExportsMXN.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_FlowsMXN.l1 -0.001795 0.372434 -0.005 0.996
## Exchange_Rate.l1 0.438595 1.335269 0.328 0.746
## Innovation.l1 0.402214 1.413298 0.285 0.779
## Education.l1 -1.536922 3.769245 -0.408 0.688
## Daily_Salary.l1 -2.181544 1.532627 -1.423 0.172
## ExportsMXN.l1 0.907998 1.486777 0.611 0.549
## const 0.109786 0.180475 0.608 0.551
##
##
## Residual standard error: 0.4191 on 18 degrees of freedom
## Multiple R-Squared: 0.2179, Adjusted R-squared: -0.04281
## F-statistic: 0.8358 on 6 and 18 DF, p-value: 0.5582
##
##
## Estimation results for equation ExportsMXN:
## ===========================================
## ExportsMXN = IED_FlowsMXN.l1 + Exchange_Rate.l1 + Innovation.l1 + Education.l1 + Daily_Salary.l1 + ExportsMXN.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_FlowsMXN.l1 0.086316 0.237850 0.363 0.721
## Exchange_Rate.l1 0.399314 0.852754 0.468 0.645
## Innovation.l1 0.002461 0.902587 0.003 0.998
## Education.l1 -0.875993 2.407185 -0.364 0.720
## Daily_Salary.l1 -1.536950 0.978795 -1.570 0.134
## ExportsMXN.l1 0.247791 0.949513 0.261 0.797
## const 0.093484 0.115258 0.811 0.428
##
##
## Residual standard error: 0.2676 on 18 degrees of freedom
## Multiple R-Squared: 0.2125, Adjusted R-squared: -0.05006
## F-statistic: 0.8093 on 6 and 18 DF, p-value: 0.5761
##
##
##
## Covariance matrix of residuals:
## IED_FlowsMXN Exchange_Rate Innovation Education Daily_Salary
## IED_FlowsMXN 0.082293 0.030536 0.010239 0.006578 0.061469
## Exchange_Rate 0.030536 0.040387 0.003498 0.004637 0.073462
## Innovation 0.010239 0.003498 0.004460 0.001349 0.009137
## Education 0.006578 0.004637 0.001349 0.001594 0.010602
## Daily_Salary 0.061469 0.073462 0.009137 0.010602 0.175622
## ExportsMXN 0.033433 0.050709 0.005536 0.006268 0.101379
## ExportsMXN
## IED_FlowsMXN 0.033433
## Exchange_Rate 0.050709
## Innovation 0.005536
## Education 0.006268
## Daily_Salary 0.101379
## ExportsMXN 0.071629
##
## Correlation matrix of residuals:
## IED_FlowsMXN Exchange_Rate Innovation Education Daily_Salary
## IED_FlowsMXN 1.0000 0.5297 0.5344 0.5743 0.5113
## Exchange_Rate 0.5297 1.0000 0.2607 0.5780 0.8723
## Innovation 0.5344 0.2607 1.0000 0.5058 0.3265
## Education 0.5743 0.5780 0.5058 1.0000 0.6337
## Daily_Salary 0.5113 0.8723 0.3265 0.6337 1.0000
## ExportsMXN 0.4355 0.9428 0.3097 0.5866 0.9039
## ExportsMXN
## IED_FlowsMXN 0.4355
## Exchange_Rate 0.9428
## Innovation 0.3097
## Education 0.5866
## Daily_Salary 0.9039
## ExportsMXN 1.0000
VAR_model_residuals<-data.frame(residuals(VAR_model))
adf.test(VAR_model_residuals$IED_FlowsMXN)
##
## Augmented Dickey-Fuller Test
##
## data: VAR_model_residuals$IED_FlowsMXN
## Dickey-Fuller = -1.8808, Lag order = 2, p-value = 0.6178
## alternative hypothesis: stationary
The p-value is higher than 0.05, so we fail to reject H0, and we cannot conclude that the residuals are stationary.
Box.test(VAR_model_residuals$IED_FlowsMXN,lag=2,type="Ljung-Box")
##
## Box-Ljung test
##
## data: VAR_model_residuals$IED_FlowsMXN
## X-squared = 0.51198, df = 2, p-value = 0.7741
The p-value is greater than 0.05, so there is not enough evidence to conclude that there is autocorrelation in the residuals.
Important contribution
The assignment originally requested only one model. However, it’s important to mention that various tests were conducted outside of this document, and the results did not meet the expectations.
The variables used in the model appear to be non-stationary. In an effort to make them stationary and estimate a suitable VAR model, they were used in their original format, transformed using logarithms, and also with differences in combination with logarithms. None of these transformations brought every p-value below the 0.05 threshold. However, the last transformation gave the lowest p-values overall and brought two variables, Exchange_Rate and Innovation, close to the threshold (0.063 and 0.053). This is why we are opting to use the previously constructed model with this variable transformation.
It’s crucial to understand that the results and forecasts could be inaccurate or incorrect. Despite the transformations, the model indicates that the residuals are non-stationary, and some of the variables are also non-stationary, which may lead to unexpected results. The summary also reports a root of the characteristic polynomial of 1.513, above 1, which indicates that the estimated VAR is not stable. For the purposes of learning, the forecast will be made using this model.
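The "Roots of the characteristic polynomial" line in the VAR summary can be read as a stability check: for a VAR(1), the process is stable when all eigenvalues of the coefficient matrix have modulus below 1. The sketch below uses two made-up 2×2 coefficient matrices; the helper `is_stable` is a hypothetical name.

```r
# Stability check for a VAR(1): y_t = A y_(t-1) + e_t
is_stable <- function(A) {
  moduli <- Mod(eigen(A)$values)
  list(moduli = moduli, stable = all(moduli < 1))
}

A_stable    <- matrix(c(0.5, 0.1, 0.2, 0.3), nrow = 2)  # made-up coefficients
A_explosive <- matrix(c(1.5, 0, 0, 0.2), nrow = 2)      # one root above 1

res_stable    <- is_stable(A_stable)
res_explosive <- is_stable(A_explosive)

# Iterating the explosive system without shocks shows how forecasts blow up
y <- c(1, 1)
for (i in 1:10) y <- A_explosive %*% y

print(res_stable)
print(res_explosive)
print(as.numeric(y))
```

When a root exceeds 1, multi-step forecasts grow in magnitude with the horizon instead of settling toward a long-run mean.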
Equation-Specific Results: The output provides results for the equation of each endogenous variable in the VAR model. We are going to interpret the statistically significant variables for each equation.
IED_FlowsMXN Equation: (Main dependent Variable) In this equation, the dependent variable has a negative coefficient on its own lagged value (“IED_FlowsMXN.l1”). This suggests that past values of FDI negatively affect the current flow.
Independent variables that have a negative impact on IED_FlowsMXN are: Exchange_Rate, Education and Daily_Salary.
Independent variables that have a positive impact on IED_FlowsMXN are: Innovation and ExportsMXN.
The suffix “.l1” indicates the first lag of each variable.
For this equation, IED_FlowsMXN.l1 is significant at the 5% level (p = 0.0217) and Innovation.l1 only at the 10% level (p = 0.0574).
Exchange_Rate equation
There are no significant variables for this equation; only Education and Daily Salary have negative coefficients.
Innovation equation
Exports has a positive impact on Innovation, and it is significant (p = 0.0205).
Education equation
Daily Salary has a negative impact on Education, but it is significant only at the 10% level (p = 0.0864).
Daily Salary and ExportsMXN equation
Neither equation has significant variables. For Daily Salary, its own lagged value has a negative impact on the current one. For Exports, Education and Daily Salary have a negative impact.
granger_IED<-causality(VAR_model,cause="IED_FlowsMXN")
granger_IED
## $Granger
##
## Granger causality H0: IED_FlowsMXN do not Granger-cause Exchange_Rate
## Innovation Education Daily_Salary ExportsMXN
##
## data: VAR object VAR_model
## F-Test = 0.40091, df1 = 5, df2 = 108, p-value = 0.8473
##
##
## $Instant
##
## H0: No instantaneous causality between: IED_FlowsMXN and Exchange_Rate
## Innovation Education Daily_Salary ExportsMXN
##
## data: VAR object VAR_model
## Chi-squared = 9.1345, df = 5, p-value = 0.1038
With a p-value of 0.85, we fail to reject H0: there is not enough evidence to say that IED_FlowsMXN Granger-causes the other selected variables.
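The idea behind the Granger test can be sketched for two variables with base R: compare a regression of y on its own lag with one that also includes the lag of x, and test whether the extra term helps. This bivariate version on simulated data is a simplification; `causality()` above tests one variable against all the others jointly.

```r
set.seed(99)
n <- 200
x <- rnorm(n)
y <- numeric(n)
# Simulated data in which lagged x really does help predict y
for (t in 2:n) y[t] <- 0.8 * x[t - 1] + rnorm(1, sd = 0.5)

d <- data.frame(y = y[-1], y_lag = y[-n], x_lag = x[-n])

restricted   <- lm(y ~ y_lag, data = d)          # y explained by its own past only
unrestricted <- lm(y ~ y_lag + x_lag, data = d)  # adds the past of x

test <- anova(restricted, unrestricted)          # F test on the added term
p_value <- test[["Pr(>F)"]][2]

print(test)
```

A small p-value means the lag of x improves the prediction of y, which is what "x Granger-causes y" refers to; a large p-value, as in the VAR above, means no such improvement was detected.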
forecast <- predict(VAR_model, n.ahead = 5, ci = 0.95)
forecast
## $IED_FlowsMXN
## fcst lower upper CI
## [1,] 1.389598 0.8273475 1.951848 0.5622503
## [2,] -2.431468 -3.3326322 -1.530304 0.9011642
## [3,] 4.180559 2.7832799 5.577838 1.3972791
## [4,] -6.478082 -8.7133681 -4.242795 2.2352864
## [5,] 9.901374 6.3512699 13.451478 3.5501039
##
## $Exchange_Rate
## fcst lower upper CI
## [1,] 1.147011 0.753129 1.5408936 0.3938823
## [2,] -1.458362 -2.090430 -0.8262944 0.6320678
## [3,] 2.138041 1.236927 3.0391555 0.9011144
## [4,] -3.132163 -4.412720 -1.8516062 1.2805570
## [5,] 4.759891 2.899986 6.6197956 1.8599046
##
## $Innovation
## fcst lower upper CI
## [1,] -0.2638772 -0.3947669 -0.1329875 0.1308897
## [2,] 0.6687931 0.4616135 0.8759726 0.2071796
## [3,] -1.0198730 -1.3572786 -0.6824673 0.3374056
## [4,] 1.4743489 0.9241992 2.0244987 0.5501497
## [5,] -2.1896120 -3.0284841 -1.3507400 0.8388720
##
## $Education
## fcst lower upper CI
## [1,] 0.3140807 0.2358304 0.3923311 0.0782504
## [2,] -0.4436275 -0.6013553 -0.2858997 0.1577278
## [3,] 0.6830417 0.4343205 0.9317629 0.2487212
## [4,] -1.0153094 -1.3964988 -0.6341201 0.3811893
## [5,] 1.5475056 0.9682835 2.1267276 0.5792220
##
## $Daily_Salary
## fcst lower upper CI
## [1,] 3.126870 2.305501 3.948238 0.8213684
## [2,] -4.481986 -6.020517 -2.943455 1.5385312
## [3,] 6.772911 4.298986 9.246835 2.4739246
## [4,] -10.070586 -13.865310 -6.275862 3.7947240
## [5,] 15.330669 9.579694 21.081643 5.7509747
##
## $ExportsMXN
## fcst lower upper CI
## [1,] 2.552765 2.028208 3.077323 0.5245575
## [2,] -3.777625 -4.938087 -2.617163 1.1604624
## [3,] 5.644051 3.651657 7.636446 1.9923944
## [4,] -8.303844 -11.423026 -5.184662 3.1191823
## [5,] 12.597005 7.857236 17.336774 4.7397686
# Revert the log and diff transformations
# exp() is applied to the whole forecast matrix (fcst, lower, upper and CI columns)
forecast$fcst$IED_FlowsMXN <- exp(forecast$fcst$IED_FlowsMXN) # Reverting log
# Reverting differences for forecasts
# Indices 1 to 5 are the rows of the fcst column, so only the point forecasts
# are accumulated onto the last observed value of data1$IED_FlowsMXN
forecast$fcst$IED_FlowsMXN[1] <- forecast$fcst$IED_FlowsMXN[1] + tail(data1$IED_FlowsMXN, 1)
forecast$fcst$IED_FlowsMXN[2] <- forecast$fcst$IED_FlowsMXN[2] + forecast$fcst$IED_FlowsMXN[1]
forecast$fcst$IED_FlowsMXN[3] <- forecast$fcst$IED_FlowsMXN[3] + forecast$fcst$IED_FlowsMXN[2]
forecast$fcst$IED_FlowsMXN[4] <- forecast$fcst$IED_FlowsMXN[4] + forecast$fcst$IED_FlowsMXN[3]
forecast$fcst$IED_FlowsMXN[5] <- forecast$fcst$IED_FlowsMXN[5] + forecast$fcst$IED_FlowsMXN[4]
fanchart(forecast,names="IED_FlowsMXN",main="Foreign Direct Investment",xlab="Time Period",ylab="FDI")
tsplot(forecast)
forecast$fcst$IED_FlowsMXN
## fcst lower upper CI
## [1,] 555775.9 2.287244e+00 7.041690e+00 1.754616
## [2,] 555776.0 3.569901e-02 2.164699e-01 2.462468
## [3,] 555841.4 1.617198e+01 2.644992e+02 4.044181
## [4,] 555841.4 1.643737e-04 1.436737e-02 9.349159
## [5,] 575799.2 5.732202e+02 6.948684e+05 34.816936
Generating a forecast with this model gives an estimate of the FDI flows for the next 5 periods. At a 95% confidence level, the forecasted values are approximately as follows:
Period 1: FDI flow close to $555775.9
Period 2: FDI flow close to $555776.0
Period 3: FDI flow close to $555841.4
Period 4: FDI flow close to $555841.4
Period 5: FDI flow close to $575799.2
These results should be read together with the “Important contribution” section.
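For reference, when a series is modelled as the first difference of its logarithm, `diff(log(x))`, the usual way to return to original units is to accumulate the forecasts on the log scale first and exponentiate afterwards. The sketch below shows that order of operations. It assumes the `diff(log(x))` transformation, and `last_level` and `dlog_fcst` are hypothetical numbers chosen for illustration, not values from this report.

```r
# Hypothetical inputs (NOT the report's data)
last_level <- 100                   # last observed value, in original units
dlog_fcst  <- c(0.05, -0.02, 0.03)  # forecasts of diff(log(x)) for 3 periods

# Step 1: undo the difference on the log scale by accumulating the forecasts
log_path <- log(last_level) + cumsum(dlog_fcst)

# Step 2: undo the log
level_fcst <- exp(log_path)

# Equivalent compounding form: each forecast acts as a growth factor
level_alt <- last_level * cumprod(exp(dlog_fcst))

print(round(level_fcst, 2))
```

The same two steps can be applied to the lower and upper bounds so that the interval is expressed in the same units as the point forecast.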
#This part is to add the forecasted values into data, it is the only way to show them in a tsplot
tsplot<-read.csv("C:\\Users\\danyb\\OneDrive - Instituto Tecnologico y de Estudios Superiores de Monterrey\\Docs\\Documentos\\Business Intelligence\\Quinto Semestre\\Introduction to Econometrics\\data_sp1.csv")
tsplot$Period <- as.Date(tsplot$Period)
fixed<-ts(tsplot$IED_FlowsMXN,start=c(1996),end=c(2022),frequency=1)
tsplot(fixed)
plot_ly(data = tsplot, x = ~Period, y = ~IED_FlowsMXN, type = "scatter", mode = "lines") %>%
  layout(title = "IED_Flows",
         xaxis = list(title = "Period"),
         yaxis = list(title = "IED_Flows")) %>%
  add_trace(text = ~paste("IED Flows: $", IED_FlowsMXN, "Date:", Period),
            hoverinfo = "text")
This plot includes the forecasted values for 2023 through 2027.
Both the ARIMA and VAR forecasts point to a gradual increase in foreign direct investment flows over the coming years. It is also important to note that innovation, education, exports, and daily salary can influence one another, in addition to their impact on the dependent variable, Foreign Direct Investment (FDI). Analyzing these interactions plays a pivotal role in forecasting, because it shows how the independent variables relate to each other and whether their impact, on FDI and among themselves, is positive or negative. This understanding is essential for producing reliable forecasts and making informed decisions.
My recommendation is to gather more data for the variables used in the analysis before making decisions. These variables are currently annual, and it would be beneficial to have them on a monthly or at least quarterly basis. A larger dataset would support a more accurate analysis and better-informed decision-making.
If obtaining more frequent data is not feasible, special attention should go to the variables that show a significant impact on foreign direct investment flows. Analyzing their behavior makes it possible to measure this impact and to guide actions that raise or lower those variables, depending on whether their impact is positive or negative.
Principales Tendencias en el Comercio Fronterizo: El Auge del Nearshoring en México | Nuvocargo. (2023). Nuvocargo.com. https://www.nuvocargo.com/es/content/blog-posts/principales-tendencias-en-el-comercio-fronterizo-el-auge-del-nearshoring-en-mexico#:~:text=Un%20gran%20ejemplo%20de%20la,General%20Motors%2C%20Volkswagen%2C%20etc.
Time Series Analysis: Definition, Types, Techniques, and When It’s Used. (2023). Tableau. https://www.tableau.com/learn/articles/time-series-analysis#why-its-used