Time series analysis is a way analyze and interpret data collected over time b. It which evaluates the beahvior of one or more variables thoughout time in order to identify trends and patterns, to the later on make forecasts and predictions. All of this based on historical data.
Time series can be decomposed into four components, each expressing a particular aspect of the movement of the values of the time series.
(Springer Ebooks, 2008)
Nearshoring in Mexico has become a very popular practice, especially to companies in the USA and Canada. Mexico’s proximity to these countries, in addition to the T-MEC agreement and its competitive cost of skilled labor force, represent a big opportunity and attractive destination for nearshoring. Aside from the aspects already mentioned, the political and commercial conflicts between US and China have caused Mexico ́s US imports to rise and of course, American companies have migrated from offshoring in Asia, to nearshoring in Mexico.
This is why, according to Morgan Stanley, “Mexico Is Poised to Ride the Nearshoring Wave” (2023).
Maria, an econometrics analyst in a Mexican company, plans to use official data sources to create a database exploring conditions influencing nearshoring in Mexico, including socioeconomic, business, technological, environmental, and security factors. She aims to apply econometric models to predict the effects of nearshoring in Mexico, understand why Mexico is attractive for nearshoring, and identify opportunities for new business relocation in the country.
library(xts)
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(dplyr)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:xts':
##
## first, last
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(zoo)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(stats)
library(forecast)
library(astsa)
##
## Attaching package: 'astsa'
## The following object is masked from 'package:forecast':
##
## gas
library(corrplot)
## corrplot 0.92 loaded
library(AER)
## Loading required package: car
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## Loading required package: lmtest
## Loading required package: sandwich
## Loading required package: survival
library(vars)
## Loading required package: MASS
##
## Attaching package: 'MASS'
## The following object is masked from 'package:dplyr':
##
## select
## Loading required package: strucchange
## Loading required package: urca
library(dynlm)
library(vars)
library(TSstudio)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.3 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ stringr::boundary() masks strucchange::boundary()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::first() masks xts::first()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::last() masks xts::last()
## ✖ car::recode() masks dplyr::recode()
## ✖ MASS::select() masks dplyr::select()
## ✖ purrr::some() masks car::some()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(sarima)
## Loading required package: stats4
##
## Attaching package: 'sarima'
##
## The following object is masked from 'package:astsa':
##
## sarima
##
## The following object is masked from 'package:stats':
##
## spectrum
library(dygraphs)
library("readxl")
library(Hmisc) # Imputation
##
## Attaching package: 'Hmisc'
##
## The following objects are masked from 'package:dplyr':
##
## src, summarize
##
## The following objects are masked from 'package:base':
##
## format.pval, units
ts_data <- read_excel("C:\\Users\\maria\\OneDrive\\Desktop\\AD 2023\\ECONOMETRICS\\SP_DataMexicoAtractiveness_alumn-VF.xlsx", sheet = 'ts.data')
ts_data
## # A tibble: 96 × 3
## Year Trimestre IED_Flujos
## <dbl> <chr> <dbl>
## 1 1999 I 3596.
## 2 1999 II 3396.
## 3 1999 III 3028.
## 4 1999 IV 3940.
## 5 2000 I 4601.
## 6 2000 II 4857.
## 7 2000 III 3057.
## 8 2000 IV 5734.
## 9 2001 I 3599.
## 10 2001 II 5219.
## # ℹ 86 more rows
# Assuming your data frame is called 'df'
ts_data <- ts_data %>%
mutate(Month = case_when(
Trimestre == "I" ~ "01",
Trimestre == "II" ~ "04",
Trimestre == "III" ~ "07",
Trimestre == "IV" ~ "10",
TRUE ~ NA_character_
)) %>%
mutate(Date = as.Date(paste(Year, Month, "01", sep = "-"), format = "%Y-%m-%d"))
ts_data
## # A tibble: 96 × 5
## Year Trimestre IED_Flujos Month Date
## <dbl> <chr> <dbl> <chr> <date>
## 1 1999 I 3596. 01 1999-01-01
## 2 1999 II 3396. 04 1999-04-01
## 3 1999 III 3028. 07 1999-07-01
## 4 1999 IV 3940. 10 1999-10-01
## 5 2000 I 4601. 01 2000-01-01
## 6 2000 II 4857. 04 2000-04-01
## 7 2000 III 3057. 07 2000-07-01
## 8 2000 IV 5734. 10 2000-10-01
## 9 2001 I 3599. 01 2001-01-01
## 10 2001 II 5219. 04 2001-04-01
## # ℹ 86 more rows
# - Briefly describe the dataset’s selected variables
ts_data
## # A tibble: 96 × 5
## Year Trimestre IED_Flujos Month Date
## <dbl> <chr> <dbl> <chr> <date>
## 1 1999 I 3596. 01 1999-01-01
## 2 1999 II 3396. 04 1999-04-01
## 3 1999 III 3028. 07 1999-07-01
## 4 1999 IV 3940. 10 1999-10-01
## 5 2000 I 4601. 01 2000-01-01
## 6 2000 II 4857. 04 2000-04-01
## 7 2000 III 3057. 07 2000-07-01
## 8 2000 IV 5734. 10 2000-10-01
## 9 2001 I 3599. 01 2001-01-01
## 10 2001 II 5219. 04 2001-04-01
## # ℹ 86 more rows
# As we can observe, our relevant variable (also dependent variable) is IED Flujos. This is the Foreign Direct Investment in Mexico.
# We already fixed the date format in order for us to code easily, but we will be analyzing this variable behavior thoughout the last years.
# - Plot the variable IED_Flujos using a time series format
# time series plot 2
df<-xts(ts_data$IED_Flujos,order.by=ts_data$Date)
plot(df)
# time series plot 3
dygraph(df, main = "Foreign Direct Investment") %>%
dyOptions(colors = RColorBrewer::brewer.pal(4, "Dark2"))
#We can observe how volatile this behavior has been during the last 20 years. There can be so much change (positive or negative) from one quarter to another.
# - decompose the time series data into trend, seasonal, and random components. Briefly, describe the decomposition time series plot. Do the time series data show a trend? Do the time series data show seasonality?
df2<-ts(ts_data$IED_Flujos,frequency=4,start=c(1999,1))
data_ts_decompose<-decompose(df2)
plot(data_ts_decompose)
# After decomposing the graph we can interpret the following:
# - Observed component: We can observe a IED with low volatility during the first years, but still with some random peaks.Then in around 2013/2014 a sudden elevation of investment came to Mexico. From there the volatility continued but with more extreme values.
# - Trend component: In the long run, we can identify an increasing trend, slow but existing. As mentioned before, we can see the high level investment made in 2013/2014 which helped a lot to the positive trend.
# - Seasonal component: We can clearly observe the behavoir or the seasonal part. One year its at its peak, the next drops, and so on so forth.
# - Random component: We can identify a steady behavior with just a few fluncuations.
#- detect the presence of stationary
adf.test(ts_data$IED_Flujos)
## Warning in adf.test(ts_data$IED_Flujos): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ts_data$IED_Flujos
## Dickey-Fuller = -4.1994, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#H0: Non-stationary and HA: Stationary.
#p-values < 0.05 reject the H0.
#P-Value = 0.01. Reject the H0.
# Time series data IS stationary.
#-detect the presence of serial autocorrelation
acf(ts_data$IED_Flujos,main="Significant Autocorrelations")
# In this graph we can observe partial auto-correlation, since even though most of the values fit inside the blue dotted lines (showing no autocorrelation), there are some exceptions.
#ARMA
# ARMA
# time series modeling
summary(df_ARMA<-arma(log(ts_data$IED_Flujos),order=c(1,1)))
##
## Call:
## arma(x = log(ts_data$IED_Flujos), order = c(1, 1))
##
## Model:
## ARMA(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41234 -0.35244 -0.00571 0.27709 1.50760
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## ar1 -0.2977 0.2271 -1.311 0.18987
## ma1 0.5174 0.1999 2.588 0.00964 **
## intercept 11.3157 1.9793 5.717 1.08e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Fit:
## sigma^2 estimated as 0.2688, Conditional Sum-of-Squares = 25.26, AIC = 152.3
plot(df_ARMA)
estimated_IED<-exp(df_ARMA$fitted.values)
plot(estimated_IED)
# model evaluation
# Testing serial autocorrelation in regression residuals
# Ho: There is no serial autocorrelation
# Ha: There is serial autocorrelation
ARMA_residuals<-df_ARMA$residuals
Box.test(ARMA_residuals,lag=5,type="Ljung-Box")
##
## Box-Ljung test
##
## data: ARMA_residuals
## X-squared = 13.689, df = 5, p-value = 0.01771
# this test shows residual autocorrelation. with a p-value of 0.01771 we accept Ho.
# There is no serial autocorrelation
adf.test(na.omit(estimated_IED))
## Warning in adf.test(na.omit(estimated_IED)): p-value smaller than printed
## p-value
##
## Augmented Dickey-Fuller Test
##
## data: na.omit(estimated_IED)
## Dickey-Fuller = -4.1127, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Ho: non stationary
#H1: stationary
# With a p value of 0.01 we have strong evidence to reject the Ho, so we can affirm our data is stationary
#ARIMA
# ARIMA
df_ARIMA <- Arima(ts_data$IED_Flujos,order=c(1,1,1))
print(df_ARIMA)
## Series: ts_data$IED_Flujos
## ARIMA(1,1,1)
##
## Coefficients:
## ar1 ma1
## -0.0520 -0.9399
## s.e. 0.1072 0.0323
##
## sigma^2 = 15868763: log likelihood = -922.46
## AIC=1850.91 AICc=1851.17 BIC=1858.57
plot(df_ARIMA$residuals,main="ARIMA(1,1,1) - Foreign Direct Ivestment")
#Autocorrelation
acf(df_ARIMA$residuals,main="ACF - ARIMA (1,1,1)")
# ACF plot displays partial autocorrelation as well, however there is a slight more correlation comparing to ARMA model.
Box.test(df_ARIMA$residuals,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: df_ARIMA$residuals
## X-squared = 0.2818, df = 1, p-value = 0.5955
# Ho: There is no serial autocorrelation
# Ha: There is serial autocorrelation
# P-value is 0.5955 > 0.05 indicating that ARIMA model DOES show residual serial autocorrelation.
adf.test(df_ARIMA$residuals)
## Warning in adf.test(df_ARIMA$residuals): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: df_ARIMA$residuals
## Dickey-Fuller = -4.3162, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
#Ho: non stationary
#H1: stationary
# ADF test shows a p-value of 0.01, which suggest that ARIMA residuals are NOT stationary since p-value is < 0.05.
After modelling both ARMA and ARIMA, we can confidently say that the model that fits best our data is ARMA. - For starters, our data was initally stationary, which is recommended to use ARMA in these cases. - Second, ARMA showed no serial autocorrelation neither stationarity, which are desirable characteristics for the model. (We can see how the ARIMA model showed the oppposite of what we expected)
# - By using the selected model, make a forecast for the next 5 periods. In doing so, include a time series plot showing your forecast.
# forecast (ARIMA)
df_ARIMA_forecast<-forecast(df_ARIMA,h=5)
df_ARIMA_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 97 8227.637 3122.496 13332.78 419.99682 16035.28
## 98 7892.280 2786.970 12997.59 84.38254 15700.18
## 99 7909.715 2795.322 13024.11 87.92538 15731.50
## 100 7908.808 2786.143 13031.47 74.36801 15743.25
## 101 7908.856 2777.891 13039.82 61.72284 15755.99
plot(df_ARIMA_forecast)
autoplot(df_ARIMA_forecast)
#We can observe the plot which shows the predicted values for Foreign Investment for the next time periods.
data<-read_excel("C:\\Users\\maria\\OneDrive\\Desktop\\AD 2023\\ECONOMETRICS\\SP_DataMexicoAtractiveness_alumn-VF.xlsx", sheet = 'Datos')
head(data)
## # A tibble: 6 × 15
## Year IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1997 12146. 9088. NA 7.20 24.3 11.3
## 2 1998 8374. 9875. NA 7.31 31.9 11.4
## 3 1999 13960. 10990. NA 7.43 31.9 12.5
## 4 2000 18249. 12483. 97.8 7.56 35.1 13.1
## 5 2001 30057. 11300. 97.4 7.68 37.6 13.5
## 6 2002 24099. 11923. 97.7 7.80 39.7 12.8
## # ℹ 8 more variables: Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>,
## # Tipo_de_Cambio <dbl>, Densidad_Carretera <dbl>, Densidad_Poblacion <dbl>,
## # CO2_Emisiones <dbl>, PIB_Per_Capita <dbl>, INPC <dbl>
#VARts$Date <- as.Date(paste0(VARts$Year, "-12-31"))
#head(VARts)
# Search for N/A in data (df1)
sum(is.na(data))
## [1] 12
glimpse(data)
## Rows: 26
## Columns: 15
## $ Year <dbl> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, …
## $ IED_Flujos <dbl> 12145.60, 8373.50, 13960.32, 18248.69, 30057.18,…
## $ Exportaciones <dbl> 9087.616, 9875.065, 10990.013, 12482.963, 11300.…
## $ Empleo <dbl> NA, NA, NA, 97.83000, 97.36000, 97.66000, 97.060…
## $ Educacion <dbl> 7.197968, 7.311587, 7.427924, 7.560000, 7.678270…
## $ Salario_Diario <dbl> 24.30, 31.91, 31.91, 35.12, 37.57, 39.74, 41.53,…
## $ Innovacion <dbl> 11.30141, 11.37221, 12.45897, 13.14556, 13.46812…
## $ Inseguridad_Robo <dbl> 266.5065, 314.7762, 272.8936, 216.9808, 214.5269…
## $ Inseguridad_Homicidio <dbl> 14.554142, 14.319399, 12.641071, 10.857846, 10.2…
## $ Tipo_de_Cambio <dbl> 8.0640, 9.9395, 9.5222, 9.5997, 9.1692, 10.3613,…
## $ Densidad_Carretera <dbl> 0.05205217, 0.05295475, 0.05501698, 0.05522774, …
## $ Densidad_Poblacion <dbl> 47.43650, 48.76163, 49.48089, 50.57930, 51.27675…
## $ CO2_Emisiones <dbl> 3.675330, 3.853045, 3.686918, 3.874147, 3.811393…
## $ PIB_Per_Capita <dbl> 127570.1, 126738.8, 129164.7, 130874.9, 128083.4…
## $ INPC <dbl> 33.27987, 39.47297, 44.33552, 48.30767, 50.43490…
# Search for N/A in ts_data (df2)
sum(is.na(data))
## [1] 12
# Identify columns mith missing values
missing <- colSums(is.na(data)) > 0
missing_col <- names(data)[missing]
missing_col
## [1] "Empleo" "Educacion" "Innovacion"
## [4] "Inseguridad_Homicidio" "CO2_Emisiones"
#After examining our dataset, we noticed that there are several N/A, which for the sake of our model we need to imputate in order for it to be more precise. We will be using a Mchine Learning model to predict missing values.
# Llibrary(Hmisc) allows for multiple imputation with bootstrapping, additive regression, and predictive mean matching
# The named variables specifies the predictors for imputation
data_imp <- aregImpute(~ Year + IED_Flujos + Exportaciones + Empleo + Educacion + Salario_Diario + Innovacion + Inseguridad_Robo + Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC,
n.impute = 2, # specifies to generate one imputed dataset
type = 'pmm', # specifies imputation method (predictive mean matching)
data = data) # your data frame
## Iteration 1
Iteration 2
Iteration 3
Iteration 4
Iteration 5
# Display the information of the model we created
# We can see which columns have N/A
# The R-squared shows the quality of the impuitation models. The higher the value (closer to 1), the better prediction.
data_imp
##
## Multiple Imputation using Bootstrap and PMM
##
## aregImpute(formula = ~Year + IED_Flujos + Exportaciones + Empleo +
## Educacion + Salario_Diario + Innovacion + Inseguridad_Robo +
## Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera +
## Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC,
## data = data, n.impute = 2, type = "pmm")
##
## n: 26 p: 15 Imputations: 2 nk: 3
##
## Number of NAs:
## Year IED_Flujos Exportaciones
## 0 0 0
## Empleo Educacion Salario_Diario
## 3 3 0
## Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 2 0 1
## Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## 0 0 0
## CO2_Emisiones PIB_Per_Capita INPC
## 3 0 0
##
## type d.f.
## Year s 2
## IED_Flujos s 2
## Exportaciones s 2
## Empleo s 2
## Educacion s 2
## Salario_Diario s 2
## Innovacion s 2
## Inseguridad_Robo s 2
## Inseguridad_Homicidio s 2
## Tipo_de_Cambio s 2
## Densidad_Carretera s 2
## Densidad_Poblacion s 2
## CO2_Emisiones s 1
## PIB_Per_Capita s 2
## INPC s 2
##
## Transformation of Target Variables Forced to be Linear
##
## R-squares for Predicting Non-Missing Values for Each Variable
## Using Last Imputations of Predictors
## Empleo Educacion Innovacion
## 1 1 1
## Inseguridad_Homicidio CO2_Emisiones
## 1 1
# Replace de N/A of each column with the imputation model values
# The column Education was not applied, since the values the model gave us, dind math the tendency of the data.
# For education we used another type of impintation method
data$Empleo[is.na(data$Empleo)] <- data_imp$imputed$Empleo
## Warning in data$Empleo[is.na(data$Empleo)] <- data_imp$imputed$Empleo: number
## of items to replace is not a multiple of replacement length
data$Innovacion[is.na(data$Innovacion)] <- data_imp$imputed$Innovacion
## Warning in data$Innovacion[is.na(data$Innovacion)] <-
## data_imp$imputed$Innovacion: number of items to replace is not a multiple of
## replacement length
data$Inseguridad_Homicidio[is.na(data$Inseguridad_Homicidio)] <- data_imp$imputed$Inseguridad_Homicidio
## Warning in data$Inseguridad_Homicidio[is.na(data$Inseguridad_Homicidio)] <-
## data_imp$imputed$Inseguridad_Homicidio: number of items to replace is not a
## multiple of replacement length
data$CO2_Emisiones[is.na(data$CO2_Emisiones)] <- data_imp$imputed$CO2_Emisiones
## Warning in data$CO2_Emisiones[is.na(data$CO2_Emisiones)] <-
## data_imp$imputed$CO2_Emisiones: number of items to replace is not a multiple of
## replacement length
#verify the changes made
data
## # A tibble: 26 × 15
## Year IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1997 12146. 9088. 96.9 7.20 24.3 11.3
## 2 1998 8374. 9875. 96.9 7.31 31.9 11.4
## 3 1999 13960. 10990. 97.8 7.43 31.9 12.5
## 4 2000 18249. 12483. 97.8 7.56 35.1 13.1
## 5 2001 30057. 11300. 97.4 7.68 37.6 13.5
## 6 2002 24099. 11923. 97.7 7.80 39.7 12.8
## 7 2003 18250. 13156 97.1 7.93 41.5 11.8
## 8 2004 25016. 13573. 96.5 8.04 43.3 12.6
## 9 2005 25796. 16466. 97.2 8.14 45.2 13.4
## 10 2006 21233. 17486. 96.5 8.26 47.0 14.2
## # ℹ 16 more rows
## # ℹ 8 more variables: Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>,
## # Tipo_de_Cambio <dbl>, Densidad_Carretera <dbl>, Densidad_Poblacion <dbl>,
## # CO2_Emisiones <dbl>, PIB_Per_Capita <dbl>, INPC <dbl>
# For the imputation we used a linear regression prediction in order to follow the pattern of the data
imp_edu <- approx(seq_along(data$Educacion),data$Educacion, method = "linear", n = length(data$Educacion))$y
data$Educacion <- imp_edu
#We can see how we fixed our missing value issue, so we can proceed with our code.
data
## # A tibble: 26 × 15
## Year IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1997 12146. 9088. 96.9 7.20 24.3 11.3
## 2 1998 8374. 9875. 96.9 7.30 31.9 11.4
## 3 1999 13960. 10990. 97.8 7.40 31.9 12.5
## 4 2000 18249. 12483. 97.8 7.51 35.1 13.1
## 5 2001 30057. 11300. 97.4 7.62 37.6 13.5
## 6 2002 24099. 11923. 97.7 7.73 39.7 12.8
## 7 2003 18250. 13156 97.1 7.84 41.5 11.8
## 8 2004 25016. 13573. 96.5 7.94 43.3 12.6
## 9 2005 25796. 16466. 97.2 8.05 45.2 13.4
## 10 2006 21233. 17486. 96.5 8.13 47.0 14.2
## # ℹ 16 more rows
## # ℹ 8 more variables: Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>,
## # Tipo_de_Cambio <dbl>, Densidad_Carretera <dbl>, Densidad_Poblacion <dbl>,
## # CO2_Emisiones <dbl>, PIB_Per_Capita <dbl>, INPC <dbl>
#Verify there isnt any N/A
sum(is.na(data))
## [1] 0
#data$Date <- as.Date(paste0(data$Year, "-12-31"))
#head(data)
#library(lubridate)
# Assuming your date variable is named 'date' in the VARts data frame
#data$Year1 <- year(data$Date)
#head(data)
From the time series dataset, select the explanatory variables that might explain the Nearshoring in Mexico
We are going to evaluate the data set variables: -IED_Flujos -Exportaciones -Empleo -Innovacion -Inseguridad_Robo -PIB_Per_Capita -INPC
#- Describe the hypothetical relationship / impact between each selected factor and the dependent variable IED_Flujos. For example, how does the exchange rate increase / reduce the foreign direct investment flows in Mexico?
# - In describing the above relationships, please include a time series plot that displays the selected variables’ performance over the time period.
### converting to time series format
Exportaciones<-ts(data$Exportaciones,start=c(1997,1),end=c(2022,4),frequency=1)
Empleo<-ts(data$Empleo,start=c(1997,1),end=c(2022,4),frequency=1)
Innovacion<-ts(data$Innovacion,start=c(1997,1),end=c(2022,4),frequency=1)
Inseguridad_Robo<-ts(data$Inseguridad_Robo,start=c(1997,1),end=c(2022,4),frequency=1)
PIB_Per_Capita<-ts(data$PIB_Per_Capita,start=c(1997,1),end=c(2022,4),frequency=1)
IED_Flujos<-ts(data$IED_Flujos,start=c(1997,1),end=c(2022,4),frequency=1)
# plotting time series data
par(mfrow=c(2,3))
plot(data$Year,data$Exportaciones,type="l",col="blue",lwd=2,xlab="Date",ylab="Exportations",main="Exportations")
plot(data$Year,data$Empleo,type="l",col="blue",lwd=2,xlab="Date",ylab="Employment",main="Employment ")
plot(data$Year,data$Innovacion,type="l",col="blue",lwd=2,xlab="Date",ylab="Innovation",main="Innovation")
plot(data$Year,data$Inseguridad_Robo,type="l",col="blue",lwd=2,xlab="Date",ylab="Theft Insecurity",main="Theft Insecurity")
plot(data$Year,data$PIB_Per_Capita,type="l",col="blue",lwd=2,xlab="Date",ylab="GDP",main="GDP")
plot(data$Year,data$IED_Flujos,type="l",col="blue",lwd=2,xlab="Date",ylab="Foreign Investment",main="Foreign Investment")
#Exportations: With this variable we can observe clarly positive trend, however the slope is not e=close to the one in our dependent variable, Foreign Investments.
#Employment: Employment seems to be not som much correlated with our dependent variable. At first glance, we dont see a pattern.
#Innovation: With innvoation, we can observe how the peaks in thhis variable, as well as in our dependent variable match. So we can say that when there is a significant increase in Innovation, Foreign Investments also inrcease.
#Theft Insecurity: We can observe how this variable and our dependent variable have opposite slopes, howerver there isny enough evidence yet to get a relationship between these 2.
#GDP: With this last variable we can identify some type of correlation, both in 2010 started increasing and after FI reached its peak, the GDP also got there.
# it is important to assess whether the variables under study are stationary or not
adf.test(data$Exportaciones) # non-stationary (p-value > 0.05)
##
## Augmented Dickey-Fuller Test
##
## data: data$Exportaciones
## Dickey-Fuller = -0.88228, Lag order = 2, p-value = 0.9379
## alternative hypothesis: stationary
adf.test(data$Empleo) # non-stationary (p-value > 0.05)
##
## Augmented Dickey-Fuller Test
##
## data: data$Empleo
## Dickey-Fuller = -1.0105, Lag order = 2, p-value = 0.919
## alternative hypothesis: stationary
adf.test(data$Innovacion) # non-stationary (p-value > 0.05)
##
## Augmented Dickey-Fuller Test
##
## data: data$Innovacion
## Dickey-Fuller = -2.8406, Lag order = 2, p-value = 0.2521
## alternative hypothesis: stationary
adf.test(data$Inseguridad_Robo) # non-stationary (p-value > 0.05)
##
## Augmented Dickey-Fuller Test
##
## data: data$Inseguridad_Robo
## Dickey-Fuller = -3.3724, Lag order = 2, p-value = 0.08162
## alternative hypothesis: stationary
adf.test(data$IED_Flujos) # non-stationary (p-value > 0.05)
##
## Augmented Dickey-Fuller Test
##
## data: data$IED_Flujos
## Dickey-Fuller = -3.0832, Lag order = 2, p-value = 0.1597
## alternative hypothesis: stationary
adf.test(data$PIB_Per_Capita) # non-stationary (p-value > 0.05)
##
## Augmented Dickey-Fuller Test
##
## data: data$PIB_Per_Capita
## Dickey-Fuller = -2.5796, Lag order = 2, p-value = 0.3516
## alternative hypothesis: stationary
#Separate our usefull variables
VAR_ts<-cbind(IED_Flujos,PIB_Per_Capita,Innovacion, Empleo)
colnames(VAR_ts)<-cbind("IED_Flujos","PIB_Per_Capita","Innovacion", "Empleo")
VAR_ts
## Time Series:
## Start = 1997
## End = 2025
## Frequency = 1
## IED_Flujos PIB_Per_Capita Innovacion Empleo
## 1997 12145.60 127570.1 11.30141 96.85275
## 1998 8373.50 126738.8 11.37221 96.85275
## 1999 13960.32 129164.7 12.45897 97.83000
## 2000 18248.69 130874.9 13.14556 97.83000
## 2001 30057.18 128083.4 13.46812 97.36000
## 2002 24099.21 128205.9 12.79893 97.66000
## 2003 18249.97 128737.9 11.81075 97.06000
## 2004 25015.57 132563.5 12.60841 96.48000
## 2005 25795.82 132941.1 13.41442 97.17000
## 2006 21232.54 135894.9 14.23137 96.53000
## 2007 32393.33 137795.7 15.04295 96.60000
## 2008 29502.46 135176.0 14.81882 95.68000
## 2009 17849.95 131233.0 12.59250 95.20264
## 2010 27189.28 134991.7 12.69477 95.06220
## 2011 25632.52 138891.9 12.09530 95.49284
## 2012 21769.32 141530.2 13.02609 95.52654
## 2013 48354.42 144112.0 13.21576 95.74676
## 2014 30351.25 147277.4 13.65014 96.23509
## 2015 35943.75 149433.5 15.11449 96.04487
## 2016 31188.98 152275.4 14.39837 96.62297
## 2017 34017.05 153235.7 14.04673 96.85275
## 2018 34100.43 153133.8 13.24773 96.64163
## 2019 34577.16 150233.1 12.69960 97.09482
## 2020 28205.89 142609.3 11.27991 96.20963
## 2021 31553.52 142772.0 11.81075 96.48553
## 2022 36215.37 146826.7 11.30141 97.23665
## 2023 12145.60 127570.1 11.30141 96.85275
## 2024 8373.50 126738.8 11.37221 96.85275
## 2025 13960.32 129164.7 12.45897 97.83000
#Estimate a VAR_Model that includes at least 1 explanatory factor that might affect the dependent variable IED_Flujos.
# Generate the preferred lag order based on multiple iterations of the AIC.
lag_selection<-VARselect(VAR_ts,lag.max=5,type="const")
## Warning in log(sigma.det): NaNs produced
## Warning in log(sigma.det): NaNs produced
## Warning in log(sigma.det): NaNs produced
lag_selection$selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 4 4 1 5
lag_selection$criteria
## 1 2 3 4 5
## AIC(n) 3.311668e+01 3.328365e+01 3.334901e+01 3.224452e+01 NaN
## HQ(n) 3.337713e+01 3.375246e+01 3.402618e+01 3.313005e+01 NaN
## SC(n) 3.409840e+01 3.505073e+01 3.590146e+01 3.558234e+01 NaN
## FPE(n) 2.472558e+14 3.324397e+14 5.111942e+14 4.105436e+14 -0.07165178
# We estimate the VAR model. The p option refers to the number of lags used.
VAR_model1<-VAR(VAR_ts,p=1,type="const")
summary(VAR_model1)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: IED_Flujos, PIB_Per_Capita, Innovacion, Empleo
## Deterministic variables: const
## Sample size: 28
## Log Likelihood: -597.089
## Roots of the characteristic polynomial:
## 0.8579 0.8579 0.6042 0.02355
## Call:
## VAR(y = VAR_ts, p = 1, type = "const")
##
##
## Estimation results for equation IED_Flujos:
## ===========================================
## IED_Flujos = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -1.399e-01 2.448e-01 -0.571 0.5733
## PIB_Per_Capita.l1 5.741e-01 2.439e-01 2.354 0.0275 *
## Innovacion.l1 2.426e+03 1.365e+03 1.777 0.0889 .
## Empleo.l1 -2.416e+03 1.948e+03 -1.241 0.2273
## const 1.523e+05 1.951e+05 0.781 0.4430
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 7313 on 23 degrees of freedom
## Multiple R-Squared: 0.4855, Adjusted R-squared: 0.396
## F-statistic: 5.426 on 4 and 23 DF, p-value: 0.003157
##
##
## Estimation results for equation PIB_Per_Capita:
## ===============================================
## PIB_Per_Capita = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -1.598e-01 1.405e-01 -1.137 0.2672
## PIB_Per_Capita.l1 8.758e-01 1.400e-01 6.257 2.2e-06 ***
## Innovacion.l1 1.102e+03 7.836e+02 1.407 0.1729
## Empleo.l1 -2.659e+03 1.118e+03 -2.378 0.0261 *
## const 2.638e+05 1.120e+05 2.356 0.0274 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 4197 on 23 degrees of freedom
## Multiple R-Squared: 0.8137, Adjusted R-squared: 0.7812
## F-statistic: 25.11 on 4 and 23 DF, p-value: 4.206e-08
##
##
## Estimation results for equation Innovacion:
## ===========================================
## Innovacion = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -3.223e-05 2.717e-05 -1.186 0.248
## PIB_Per_Capita.l1 6.894e-06 2.707e-05 0.255 0.801
## Innovacion.l1 8.145e-01 1.515e-01 5.375 1.85e-05 ***
## Empleo.l1 -6.005e-02 2.162e-01 -0.278 0.784
## const 8.103e+00 2.166e+01 0.374 0.712
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.8118 on 23 degrees of freedom
## Multiple R-Squared: 0.5794, Adjusted R-squared: 0.5063
## F-statistic: 7.922 on 4 and 23 DF, p-value: 0.0003618
##
##
## Estimation results for equation Empleo:
## =======================================
## Empleo = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -1.144e-05 1.724e-05 -0.663 0.5137
## PIB_Per_Capita.l1 9.486e-06 1.718e-05 0.552 0.5861
## Innovacion.l1 -1.248e-01 9.615e-02 -1.298 0.2072
## Empleo.l1 7.587e-01 1.372e-01 5.531 1.26e-05 ***
## const 2.393e+01 1.374e+01 1.742 0.0949 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.515 on 23 degrees of freedom
## Multiple R-Squared: 0.632, Adjusted R-squared: 0.568
## F-statistic: 9.875 on 4 and 23 DF, p-value: 8.407e-05
##
##
##
## Covariance matrix of residuals:
## IED_Flujos PIB_Per_Capita Innovacion Empleo
## IED_Flujos 5.348e+07 1.559e+07 1.725e+03 463.88552
## PIB_Per_Capita 1.559e+07 1.762e+07 1.199e+03 887.16635
## Innovacion 1.725e+03 1.199e+03 6.590e-01 0.07798
## Empleo 4.639e+02 8.872e+02 7.798e-02 0.26525
##
## Correlation matrix of residuals:
## IED_Flujos PIB_Per_Capita Innovacion Empleo
## IED_Flujos 1.0000 0.5080 0.2906 0.1232
## PIB_Per_Capita 0.5080 1.0000 0.3517 0.4104
## Innovacion 0.2906 0.3517 1.0000 0.1865
## Empleo 0.1232 0.4104 0.1865 1.0000
# the lag on the dependent variable is negative, however it is not statistically significant
# GDP has a positive correlation with a significance of 0.01. It also has the biggest coefficient.
# Employment also has a significant relation with a value of 0.01, however it shows a negative correlation.
# -Detect if the estimated VAR_Model residuals are stationary.
VAR_model1_residuals<-data.frame(residuals(VAR_model1))
adf.test(VAR_model1_residuals$IED_Flujos)
##
## Augmented Dickey-Fuller Test
##
## data: VAR_model1_residuals$IED_Flujos
## Dickey-Fuller = -3.3617, Lag order = 3, p-value = 0.08227
## alternative hypothesis: stationary
#The test gives us a p-value of 0.093, which means that we reject Ho (data is non/stationary), which mean we accept H1, DATA IS STATIONARY .
#- Detect if the estimated VAR_Model residuals show serial autocorrelation.
Box.test(VAR_model1_residuals$IED_Flujos,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: VAR_model1_residuals$IED_Flujos
## X-squared = 0.19326, df = 1, p-value = 0.6602
# ho: no serial autocorrelacion
# with our p-value of 0.7185, we can see how we reject Ho, that means WE HAVE AUTOCORERELATION.
#- Based on the regression results and diagnostic tests, select the VAR_Model that you consider might generate the best forecast.
#- Briefly interpret the regression results. That is, is there a statistically significantrelationship between the explanatory variable(s) and themain dependent variable?
#VAR 2
# Transform non-stationary time series variables to stationary
diff_Exportaciones<-diff(log(Exportaciones))
diff_Innovacion<-diff(log(Innovacion))
diff_Inseguridad_Robo<-diff(log(Inseguridad_Robo))
diff_IED_Flujos<-diff(log(IED_Flujos))
diff_PIB_Per_Capita<-diff(log(PIB_Per_Capita))
# plotting differenced time series variables
par(mfrow=c(2,3))
plot(diff_Exportaciones)
plot(diff_Innovacion)
plot(diff_Inseguridad_Robo)
plot(diff_IED_Flujos)
plot(diff_PIB_Per_Capita)
# plotting differenced time series variables
par(mfrow=c(2,3))
ts_plot(diff_Exportaciones)
ts_plot(diff_Innovacion)
ts_plot(diff_Inseguridad_Robo)
ts_plot(diff_IED_Flujos)
ts_plot(diff_PIB_Per_Capita)
VAR_tsdiff<-cbind(diff_IED_Flujos,diff_PIB_Per_Capita,diff_Innovacion,diff_Exportaciones)
colnames(VAR_tsdiff)<-cbind("IED_Flujos","PIB_Per_Capita","diff_Innovacion", "diff_Exportaciones")
lag_diff_selection<-VARselect(VAR_tsdiff,lag.max=10,type="const")
lag_diff_selection$selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 5 5 5 5
lag_diff_selection$criteria
## 1 2 3 4 5 6 7
## AIC(n) -1.728972e+01 -1.783294e+01 -2.175012e+01 -1.217829e+02 -Inf -Inf -Inf
## HQ(n) -1.715331e+01 -1.758740e+01 -2.139545e+01 -1.213191e+02 -Inf -Inf -Inf
## SC(n) -1.630042e+01 -1.605219e+01 -1.917793e+01 -1.184193e+02 -Inf -Inf -Inf
## FPE(n) 3.290084e-08 2.670307e-08 1.638148e-09 1.012237e-50 0 0 0
## 8 9 10
## AIC(n) -Inf -Inf -Inf
## HQ(n) -Inf -Inf -Inf
## SC(n) -Inf -Inf -Inf
## FPE(n) 0 0 0
VAR_model2<-VAR(VAR_tsdiff,p=2,type="const",season=NULL,exog=NULL)
summary(VAR_model2)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: IED_Flujos, PIB_Per_Capita, diff_Innovacion, diff_Exportaciones
## Deterministic variables: const
## Sample size: 26
## Log Likelihood: 109.334
## Roots of the characteristic polynomial:
## 0.6543 0.6543 0.5453 0.4946 0.4946 0.4489 0.205 0.205
## Call:
## VAR(y = VAR_tsdiff, p = 2, type = "const", exogen = NULL)
##
##
## Estimation results for equation IED_Flujos:
## ===========================================
## IED_Flujos = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -7.418e-01 2.820e-01 -2.631 0.01753 *
## PIB_Per_Capita.l1 -5.574e+00 4.127e+00 -1.351 0.19453
## diff_Innovacion.l1 3.232e+00 1.328e+00 2.434 0.02625 *
## diff_Exportaciones.l1 1.285e+00 4.354e-01 2.950 0.00895 **
## IED_Flujos.l2 -3.353e-01 2.821e-01 -1.189 0.25091
## PIB_Per_Capita.l2 3.174e+00 4.442e+00 0.715 0.48456
## diff_Innovacion.l2 -8.270e-01 1.508e+00 -0.548 0.59065
## diff_Exportaciones.l2 -9.681e-02 4.767e-01 -0.203 0.84149
## const 4.355e-17 6.461e-02 0.000 1.00000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.3294 on 17 degrees of freedom
## Multiple R-Squared: 0.5156, Adjusted R-squared: 0.2876
## F-statistic: 2.262 on 8 and 17 DF, p-value: 0.07489
##
##
## Estimation results for equation PIB_Per_Capita:
## ===============================================
## PIB_Per_Capita = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -3.261e-02 3.227e-02 -1.011 0.326
## PIB_Per_Capita.l1 -4.338e-01 4.722e-01 -0.918 0.371
## diff_Innovacion.l1 2.161e-01 1.520e-01 1.422 0.173
## diff_Exportaciones.l1 7.124e-02 4.983e-02 1.430 0.171
## IED_Flujos.l2 -2.238e-02 3.227e-02 -0.693 0.497
## PIB_Per_Capita.l2 2.438e-01 5.083e-01 0.480 0.638
## diff_Innovacion.l2 -1.064e-01 1.726e-01 -0.616 0.546
## diff_Exportaciones.l2 -6.310e-03 5.455e-02 -0.116 0.909
## const 2.722e-18 7.393e-03 0.000 1.000
##
##
## Residual standard error: 0.0377 on 17 degrees of freedom
## Multiple R-Squared: 0.2167, Adjusted R-squared: -0.1519
## F-statistic: 0.5879 on 8 and 17 DF, p-value: 0.7746
##
##
## Estimation results for equation diff_Innovacion:
## ================================================
## diff_Innovacion = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -7.363e-02 6.060e-02 -1.215 0.241
## PIB_Per_Capita.l1 4.540e-01 8.869e-01 0.512 0.615
## diff_Innovacion.l1 1.980e-01 2.854e-01 0.694 0.497
## diff_Exportaciones.l1 3.165e-02 9.357e-02 0.338 0.739
## IED_Flujos.l2 -2.619e-02 6.061e-02 -0.432 0.671
## PIB_Per_Capita.l2 5.117e-01 9.546e-01 0.536 0.599
## diff_Innovacion.l2 2.461e-02 3.241e-01 0.076 0.940
## diff_Exportaciones.l2 -6.066e-02 1.024e-01 -0.592 0.562
## const 9.526e-18 1.388e-02 0.000 1.000
##
##
## Residual standard error: 0.07079 on 17 degrees of freedom
## Multiple R-Squared: 0.2121, Adjusted R-squared: -0.1587
## F-statistic: 0.5719 on 8 and 17 DF, p-value: 0.7868
##
##
## Estimation results for equation diff_Exportaciones:
## ===================================================
## diff_Exportaciones = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1 -3.944e-02 3.085e-01 -0.128 0.8998
## PIB_Per_Capita.l1 -8.311e+00 4.515e+00 -1.841 0.0832 .
## diff_Innovacion.l1 2.022e+00 1.453e+00 1.392 0.1819
## diff_Exportaciones.l1 7.907e-01 4.764e-01 1.660 0.1153
## IED_Flujos.l2 -3.666e-02 3.086e-01 -0.119 0.9068
## PIB_Per_Capita.l2 4.776e+00 4.860e+00 0.983 0.3396
## diff_Innovacion.l2 -1.669e+00 1.650e+00 -1.011 0.3261
## diff_Exportaciones.l2 -3.791e-01 5.216e-01 -0.727 0.4772
## const 5.443e-18 7.069e-02 0.000 1.0000
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 0.3604 on 17 degrees of freedom
## Multiple R-Squared: 0.2376, Adjusted R-squared: -0.1212
## F-statistic: 0.6621 on 8 and 17 DF, p-value: 0.7172
##
##
##
## Covariance matrix of residuals:
## IED_Flujos PIB_Per_Capita diff_Innovacion diff_Exportaciones
## IED_Flujos 0.108526 0.0061754 0.0064028 0.060377
## PIB_Per_Capita 0.006175 0.0014210 0.0007026 0.011203
## diff_Innovacion 0.006403 0.0007026 0.0050118 -0.001556
## diff_Exportaciones 0.060377 0.0112034 -0.0015560 0.129908
##
## Correlation matrix of residuals:
## IED_Flujos PIB_Per_Capita diff_Innovacion diff_Exportaciones
## IED_Flujos 1.0000 0.4973 0.27454 0.50849
## PIB_Per_Capita 0.4973 1.0000 0.26327 0.82459
## diff_Innovacion 0.2745 0.2633 1.00000 -0.06098
## diff_Exportaciones 0.5085 0.8246 -0.06098 1.00000
#VAR2 model is using 2 lags insead of 1, also its using one more explanatory variable
# In this new model, we can now see statistical significance between the lag of the dependent variables, which is negative once again but with the highest coefficient talking in absoluy=t value terms
# We can also observe that the difference of Exportaciones shows has a positive correlation with a significance of 0.01. It also has the biggest coefficient.
# Innovacion also has a pretty small but not insignificant relation with a value of 0.05.
# -Detect if the estimated VAR_Model residuals are stationary.
VAR_model2_residuals<-data.frame(residuals(VAR_model2))
adf.test(VAR_model2_residuals$IED_Flujos)
##
## Augmented Dickey-Fuller Test
##
## data: VAR_model2_residuals$IED_Flujos
## Dickey-Fuller = -2.8409, Lag order = 2, p-value = 0.252
## alternative hypothesis: stationary
#The test gives us a p-value of 0.2959, which means that we reject Ho (data is non-stationary), which mean we accept H1, DATA IS STATIONARY.
# We can note how this p-value is bigger than VAR1
#- Detect if the estimated VAR_Model residuals show serial autocorrelation.
Box.test(VAR_model2_residuals$IED_Flujos,lag=2,type="Ljung-Box")
##
## Box-Ljung test
##
## data: VAR_model2_residuals$IED_Flujos
## X-squared = 0.045666, df = 2, p-value = 0.9774
# ho: no serial autocorrelacion
# with our p-value of 0.8904, we can see how we reject Ho, that means WE HAVE AUTOCORERELATION.
After analizing both previous models, the wise decision is to select the second one due to these next points: -VAR 2 residuals are STATIONARY -VAR 2 has 3 significant explanatory variables instead of just 2 -Both VAR1 and VAR2 have autocorrelation.
#CAUSALITY
granger_diff_IED<-causality(VAR_model2,cause="IED_Flujos")
granger_diff_IED
## $Granger
##
## Granger causality H0: IED_Flujos do not Granger-cause PIB_Per_Capita
## diff_Innovacion diff_Exportaciones
##
## data: VAR object VAR_model2
## F-Test = 0.50402, df1 = 6, df2 = 68, p-value = 0.8032
##
##
## $Instant
##
## H0: No instantaneous causality between: IED_Flujos and PIB_Per_Capita
## diff_Innovacion diff_Exportaciones
##
## data: VAR object VAR_model2
## Chi-squared = 6.7943, df = 3, p-value = 0.07875
#Granger p-value: 0.7235
#Instant p-value: 0.04989
#We can observe how this model we fail to reject the Ho in Granger Causality. (p-value>0.05)
#This means that Direct Foreign Investment DO NOT have a predictive effect in the variables selected.
#On the other hand, we have Instant Causality.
#We can observe how this model we accepts the Ho in Instant Causality. (p-value<0.05)
#This means that there is evidence that Direct Foreign Investment DO INDEED has a immediate or simultaneous causal relationship between the variables selected.
#Just for comapring purposes, I tested VAR1 causality. We can observe how in that model there is not granger causality neither instant casuality.
granger2_diff_IED<-causality(VAR_model1,cause="IED_Flujos")
granger2_diff_IED
## $Granger
##
## Granger causality H0: IED_Flujos do not Granger-cause PIB_Per_Capita
## Innovacion Empleo
##
## data: VAR object VAR_model1
## F-Test = 0.67631, df1 = 3, df2 = 92, p-value = 0.5687
##
##
## $Instant
##
## H0: No instantaneous causality between: IED_Flujos and PIB_Per_Capita
## Innovacion Empleo
##
## data: VAR object VAR_model1
## Chi-squared = 6.1637, df = 3, p-value = 0.1039
#Granger p-value: 0.7235
#Instant p-value: 0.04989
#FORECAST
#Based on the selected VAR_Model, forecast the increasing / decreasing trend of FDI inflows in Mexico for the next 5 periods. Display the forecast in a time series plot.
forecast<-predict(VAR_model2,n.ahead=5,ci=0.95) ### forecast for the next four quarters
fanchart(forecast,names="IED_Flujos",main="Direct Foreign Investment",xlab="Time Period",ylab="Direct Foreign Investment")
#Though this graph we can observe how the next 5 predicted values for Direct Foreign Investment are going down, showing an incoming negative trend.
Briefly describe the main insights from previous sections.
Based on the selected results, please share at least 1 recommendation that address the problem situation.
Insights of each graph and code are written just below them.
The selected models were ARIMA and VAR2
Based on our results from both ARIMA and VAR2 models, we can observe how it is not likely for Direct Foreign Investment to increment drastically the next few years according to the data given. It is predicted a steady behavior or even negative trend. Knowing this, and taking into consideration the random outcome in decomposition, i would suggest to deepen our research of variables explaining DFI, since there is evidence suggesting that exist more important variables which are not being taking into consideration to further improve the models. -At the same time, I would recommend to keep an eye for these Foreign Investment, as the mexican government i would suggest improving our situation in these explanatory variables mentioned in the analysis to keep resulting an interesting and atr=tractive to foreign countries to invest.
Time Series. (2008). Springer EBooks, 536–539. https://doi.org/10.1007/978-0-387-32833-1_401
Time Series Analysis: Definition, Types, Techniques, and When It’s Used. (2023). Tableau. https://www.tableau.com/learn/articles/time-series-analysis#definition
Mexico Rides Nearshoring Wave | Morgan Stanley. (2023). Morgan Stanley; Morgan Stanley. https://www.morganstanley.com/ideas/mexico-nearshoring-gdp-growth