Introduction

Time Series Analysis

Time series analysis is a way analyze and interpret data collected over time b. It which evaluates the beahvior of one or more variables thoughout time in order to identify trends and patterns, to the later on make forecasts and predictions. All of this based on historical data.

Time series can be decomposed into four components, each expressing a particular aspect of the movement of the values of the time series.

Decomposition: observed + trend + seasonal + random
observed: data observations
trend: increasing / decreasing value of data observations
seasonality: repeating short-term cycle in time series
noise: random variation in time series

(Springer Ebooks, 2008)

Background

What are the latest trends of Nearshoring in Mexico?

Nearshoring in Mexico has become a very popular practice, especially to companies in the USA and Canada. Mexico’s proximity to these countries, in addition to the T-MEC agreement and its competitive cost of skilled labor force, represent a big opportunity and attractive destination for nearshoring. Aside from the aspects already mentioned, the political and commercial conflicts between US and China have caused Mexico ́s US imports to rise and of course, American companies have migrated from offshoring in Asia, to nearshoring in Mexico.

This is why, according to Morgan Stanley, “Mexico Is Poised to Ride the Nearshoring Wave” (2023).

Problem Situation

Maria, an econometrics analyst in a Mexican company, plans to use official data sources to create a database exploring conditions influencing nearshoring in Mexico, including socioeconomic, business, technological, environmental, and security factors. She aims to apply econometric models to predict the effects of nearshoring in Mexico, understand why Mexico is attractive for nearshoring, and identify opportunities for new business relocation in the country.

Data and Methodology

loading required libraries

library(xts)

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(dplyr)

## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:xts':
## 
##     first, last

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(zoo)
library(tseries)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(stats)
library(forecast)
library(astsa)

## 
## Attaching package: 'astsa'

## The following object is masked from 'package:forecast':
## 
##     gas

library(corrplot)

## corrplot 0.92 loaded

library(AER)

## Loading required package: car

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:dplyr':
## 
##     recode

## Loading required package: lmtest

## Loading required package: sandwich

## Loading required package: survival

library(vars)

## Loading required package: MASS

## 
## Attaching package: 'MASS'

## The following object is masked from 'package:dplyr':
## 
##     select

## Loading required package: strucchange

## Loading required package: urca

library(dynlm)
library(vars)
library(TSstudio)
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.3     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ stringr::boundary() masks strucchange::boundary()
## ✖ dplyr::filter()     masks stats::filter()
## ✖ dplyr::first()      masks xts::first()
## ✖ dplyr::lag()        masks stats::lag()
## ✖ dplyr::last()       masks xts::last()
## ✖ car::recode()       masks dplyr::recode()
## ✖ MASS::select()      masks dplyr::select()
## ✖ purrr::some()       masks car::some()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(sarima)

## Loading required package: stats4
## 
## Attaching package: 'sarima'
## 
## The following object is masked from 'package:astsa':
## 
##     sarima
## 
## The following object is masked from 'package:stats':
## 
##     spectrum

library(dygraphs)
library("readxl")    
library(Hmisc)        # Imputation

## 
## Attaching package: 'Hmisc'
## 
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## 
## The following objects are masked from 'package:base':
## 
##     format.pval, units

0. Import DataBase

ts_data <- read_excel("C:\\Users\\maria\\OneDrive\\Desktop\\AD 2023\\ECONOMETRICS\\SP_DataMexicoAtractiveness_alumn-VF.xlsx", sheet = 'ts.data')
ts_data

## # A tibble: 96 × 3
##     Year Trimestre IED_Flujos
##    <dbl> <chr>          <dbl>
##  1  1999 I              3596.
##  2  1999 II             3396.
##  3  1999 III            3028.
##  4  1999 IV             3940.
##  5  2000 I              4601.
##  6  2000 II             4857.
##  7  2000 III            3057.
##  8  2000 IV             5734.
##  9  2001 I              3599.
## 10  2001 II             5219.
## # ℹ 86 more rows

1. Set Date Format

# Assuming your data frame is called 'df'
ts_data <- ts_data %>%
  mutate(Month = case_when(
    Trimestre == "I" ~ "01",
    Trimestre == "II" ~ "04",
    Trimestre == "III" ~ "07",
    Trimestre == "IV" ~ "10",
    TRUE ~ NA_character_
  )) %>%
  mutate(Date = as.Date(paste(Year, Month, "01", sep = "-"), format = "%Y-%m-%d")) 

ts_data

## # A tibble: 96 × 5
##     Year Trimestre IED_Flujos Month Date      
##    <dbl> <chr>          <dbl> <chr> <date>    
##  1  1999 I              3596. 01    1999-01-01
##  2  1999 II             3396. 04    1999-04-01
##  3  1999 III            3028. 07    1999-07-01
##  4  1999 IV             3940. 10    1999-10-01
##  5  2000 I              4601. 01    2000-01-01
##  6  2000 II             4857. 04    2000-04-01
##  7  2000 III            3057. 07    2000-07-01
##  8  2000 IV             5734. 10    2000-10-01
##  9  2001 I              3599. 01    2001-01-01
## 10  2001 II             5219. 04    2001-04-01
## # ℹ 86 more rows

2. Describe variables

# - Briefly describe the dataset’s selected variables
ts_data

## # A tibble: 96 × 5
##     Year Trimestre IED_Flujos Month Date      
##    <dbl> <chr>          <dbl> <chr> <date>    
##  1  1999 I              3596. 01    1999-01-01
##  2  1999 II             3396. 04    1999-04-01
##  3  1999 III            3028. 07    1999-07-01
##  4  1999 IV             3940. 10    1999-10-01
##  5  2000 I              4601. 01    2000-01-01
##  6  2000 II             4857. 04    2000-04-01
##  7  2000 III            3057. 07    2000-07-01
##  8  2000 IV             5734. 10    2000-10-01
##  9  2001 I              3599. 01    2001-01-01
## 10  2001 II             5219. 04    2001-04-01
## # ℹ 86 more rows

# As we can observe, our relevant variable (also dependent variable) is IED Flujos. This is the Foreign Direct Investment in Mexico.
# We already fixed the date format in order for us to code easily, but we will be analyzing this variable behavior thoughout the last years.

# - Plot the variable IED_Flujos using a time series format
# time series plot 2
df<-xts(ts_data$IED_Flujos,order.by=ts_data$Date)
plot(df)

# time series plot 3
dygraph(df, main = "Foreign Direct Investment") %>% 
  dyOptions(colors = RColorBrewer::brewer.pal(4, "Dark2"))

#We can observe how volatile this behavior has been during the last 20 years. There can be so much change (positive or negative) from one quarter to another.

# - decompose the time series data into trend, seasonal, and random components. Briefly, describe the decomposition time series plot. Do the time series data show a trend? Do the time series data show seasonality?


df2<-ts(ts_data$IED_Flujos,frequency=4,start=c(1999,1))
data_ts_decompose<-decompose(df2)
plot(data_ts_decompose)

# After decomposing the graph we can interpret the following:
# - Observed component: We can observe a IED with low volatility during the first years, but still with some random peaks.Then in around 2013/2014 a sudden elevation of investment came to Mexico. From there the volatility continued  but with more extreme values.
# - Trend component: In the long run, we can identify an increasing trend, slow but existing. As mentioned before, we can see the high level investment made in 2013/2014 which helped a lot to the positive trend.
# - Seasonal component: We can clearly observe the behavoir or the seasonal part. One year its at its peak, the next drops, and so on so forth.
# - Random component: We can identify a steady behavior with just  a few fluncuations. 


#- detect the presence of stationary
adf.test(ts_data$IED_Flujos)

## Warning in adf.test(ts_data$IED_Flujos): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_data$IED_Flujos
## Dickey-Fuller = -4.1994, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

#H0: Non-stationary and HA: Stationary. 
#p-values < 0.05 reject the H0. 
#P-Value = 0.01. Reject the H0. 
# Time series data IS stationary.

#-detect the presence of serial autocorrelation
acf(ts_data$IED_Flujos,main="Significant Autocorrelations")

# In this graph we can observe partial auto-correlation, since even though most of the values fit inside the blue dotted lines (showing no autocorrelation), there are some exceptions.

Time Series Regression Analysis

3. Time Series Model 1

#ARMA

# ARMA
# time series modeling 
summary(df_ARMA<-arma(log(ts_data$IED_Flujos),order=c(1,1)))

## 
## Call:
## arma(x = log(ts_data$IED_Flujos), order = c(1, 1))
## 
## Model:
## ARMA(1,1)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.41234 -0.35244 -0.00571  0.27709  1.50760 
## 
## Coefficient(s):
##            Estimate  Std. Error  t value Pr(>|t|)    
## ar1         -0.2977      0.2271   -1.311  0.18987    
## ma1          0.5174      0.1999    2.588  0.00964 ** 
## intercept   11.3157      1.9793    5.717 1.08e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Fit:
## sigma^2 estimated as 0.2688,  Conditional Sum-of-Squares = 25.26,  AIC = 152.3

plot(df_ARMA)

estimated_IED<-exp(df_ARMA$fitted.values)
plot(estimated_IED)

# model evaluation 
# Testing serial autocorrelation in regression residuals 
# Ho: There is no serial autocorrelation 
# Ha: There is serial autocorrelation
ARMA_residuals<-df_ARMA$residuals
Box.test(ARMA_residuals,lag=5,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  ARMA_residuals
## X-squared = 13.689, df = 5, p-value = 0.01771

# this test shows residual autocorrelation. with a p-value of 0.01771 we accept Ho.
# There is no serial autocorrelation

adf.test(na.omit(estimated_IED))

## Warning in adf.test(na.omit(estimated_IED)): p-value smaller than printed
## p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  na.omit(estimated_IED)
## Dickey-Fuller = -4.1127, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

#Ho: non stationary
#H1: stationary
# With a p value of 0.01 we have strong evidence to reject the Ho, so we can affirm our data is stationary

#ARIMA

# ARIMA
df_ARIMA <- Arima(ts_data$IED_Flujos,order=c(1,1,1))
print(df_ARIMA)

## Series: ts_data$IED_Flujos 
## ARIMA(1,1,1) 
## 
## Coefficients:
##           ar1      ma1
##       -0.0520  -0.9399
## s.e.   0.1072   0.0323
## 
## sigma^2 = 15868763:  log likelihood = -922.46
## AIC=1850.91   AICc=1851.17   BIC=1858.57

plot(df_ARIMA$residuals,main="ARIMA(1,1,1) - Foreign Direct Ivestment")

#Autocorrelation
acf(df_ARIMA$residuals,main="ACF - ARIMA (1,1,1)")

# ACF plot displays partial autocorrelation as well, however there is a slight more correlation comparing to ARMA model. 

Box.test(df_ARIMA$residuals,lag=1,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  df_ARIMA$residuals
## X-squared = 0.2818, df = 1, p-value = 0.5955

# Ho: There is no serial autocorrelation 
# Ha: There is serial autocorrelation
# P-value is 0.5955 > 0.05 indicating that ARIMA model DOES show residual serial autocorrelation. 

adf.test(df_ARIMA$residuals)

## Warning in adf.test(df_ARIMA$residuals): p-value smaller than printed p-value

## 
##  Augmented Dickey-Fuller Test
## 
## data:  df_ARIMA$residuals
## Dickey-Fuller = -4.3162, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary

#Ho: non stationary
#H1: stationary
# ADF test shows a p-value of 0.01, which suggest that ARIMA residuals are NOT stationary since p-value is < 0.05.

Selected Model

Based on diagnostic tests, compare the 3 estimated time series regression models, and select the results that you consider might generate the best forecast.

After modelling both ARMA and ARIMA, we can confidently say that the model that fits best our data is ARMA. - For starters, our data was initally stationary, which is recommended to use ARMA in these cases. - Second, ARMA showed no serial autocorrelation neither stationarity, which are desirable characteristics for the model. (We can see how the ARIMA model showed the oppposite of what we expected)

# - By using the selected model, make a forecast for the next 5 periods. In doing so, include a time series plot showing your forecast.
# forecast (ARIMA)

df_ARIMA_forecast<-forecast(df_ARIMA,h=5)
df_ARIMA_forecast

##     Point Forecast    Lo 80    Hi 80     Lo 95    Hi 95
##  97       8227.637 3122.496 13332.78 419.99682 16035.28
##  98       7892.280 2786.970 12997.59  84.38254 15700.18
##  99       7909.715 2795.322 13024.11  87.92538 15731.50
## 100       7908.808 2786.143 13031.47  74.36801 15743.25
## 101       7908.856 2777.891 13039.82  61.72284 15755.99

plot(df_ARIMA_forecast)

autoplot(df_ARIMA_forecast)

#We can observe the plot which shows the predicted values for Foreign Investment for the next time periods.

4. Time series model 2

data<-read_excel("C:\\Users\\maria\\OneDrive\\Desktop\\AD 2023\\ECONOMETRICS\\SP_DataMexicoAtractiveness_alumn-VF.xlsx", sheet = 'Datos')
head(data)

## # A tibble: 6 × 15
##    Year IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
##   <dbl>      <dbl>         <dbl>  <dbl>     <dbl>          <dbl>      <dbl>
## 1  1997     12146.         9088.   NA        7.20           24.3       11.3
## 2  1998      8374.         9875.   NA        7.31           31.9       11.4
## 3  1999     13960.        10990.   NA        7.43           31.9       12.5
## 4  2000     18249.        12483.   97.8      7.56           35.1       13.1
## 5  2001     30057.        11300.   97.4      7.68           37.6       13.5
## 6  2002     24099.        11923.   97.7      7.80           39.7       12.8
## # ℹ 8 more variables: Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>,
## #   Tipo_de_Cambio <dbl>, Densidad_Carretera <dbl>, Densidad_Poblacion <dbl>,
## #   CO2_Emisiones <dbl>, PIB_Per_Capita <dbl>, INPC <dbl>

#VARts$Date <- as.Date(paste0(VARts$Year, "-12-31"))
#head(VARts)

1. Data imputation

# Search for N/A in data (df1)
sum(is.na(data))

## [1] 12

glimpse(data)

## Rows: 26
## Columns: 15
## $ Year                  <dbl> 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, …
## $ IED_Flujos            <dbl> 12145.60, 8373.50, 13960.32, 18248.69, 30057.18,…
## $ Exportaciones         <dbl> 9087.616, 9875.065, 10990.013, 12482.963, 11300.…
## $ Empleo                <dbl> NA, NA, NA, 97.83000, 97.36000, 97.66000, 97.060…
## $ Educacion             <dbl> 7.197968, 7.311587, 7.427924, 7.560000, 7.678270…
## $ Salario_Diario        <dbl> 24.30, 31.91, 31.91, 35.12, 37.57, 39.74, 41.53,…
## $ Innovacion            <dbl> 11.30141, 11.37221, 12.45897, 13.14556, 13.46812…
## $ Inseguridad_Robo      <dbl> 266.5065, 314.7762, 272.8936, 216.9808, 214.5269…
## $ Inseguridad_Homicidio <dbl> 14.554142, 14.319399, 12.641071, 10.857846, 10.2…
## $ Tipo_de_Cambio        <dbl> 8.0640, 9.9395, 9.5222, 9.5997, 9.1692, 10.3613,…
## $ Densidad_Carretera    <dbl> 0.05205217, 0.05295475, 0.05501698, 0.05522774, …
## $ Densidad_Poblacion    <dbl> 47.43650, 48.76163, 49.48089, 50.57930, 51.27675…
## $ CO2_Emisiones         <dbl> 3.675330, 3.853045, 3.686918, 3.874147, 3.811393…
## $ PIB_Per_Capita        <dbl> 127570.1, 126738.8, 129164.7, 130874.9, 128083.4…
## $ INPC                  <dbl> 33.27987, 39.47297, 44.33552, 48.30767, 50.43490…

# Search for N/A in ts_data (df2)
sum(is.na(data))

## [1] 12

# Identify columns mith missing values
missing <- colSums(is.na(data)) > 0
missing_col <- names(data)[missing]
missing_col

## [1] "Empleo"                "Educacion"             "Innovacion"           
## [4] "Inseguridad_Homicidio" "CO2_Emisiones"

#After examining our dataset, we noticed that there are  several N/A, which for the sake of our model we need to imputate in order for it to be more precise. We will be using a Mchine Learning model to predict missing values.

# Llibrary(Hmisc)  allows for multiple imputation with bootstrapping, additive regression, and predictive mean matching

# The named variables specifies the predictors for imputation
data_imp <- aregImpute(~ Year + IED_Flujos + Exportaciones + Empleo + Educacion + Salario_Diario + Innovacion +  Inseguridad_Robo + Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + Densidad_Poblacion + CO2_Emisiones +  PIB_Per_Capita + INPC, 
                         n.impute = 2,  # specifies to generate one imputed dataset
                         type = 'pmm', # specifies imputation method (predictive mean matching)
                         data = data) # your data frame

## Iteration 1 
Iteration 2 
Iteration 3 
Iteration 4 
Iteration 5

# Display the information of the model we created
# We can see which columns have N/A
# The R-squared shows the quality of the impuitation models. The higher the value (closer to 1), the better prediction.
data_imp

## 
## Multiple Imputation using Bootstrap and PMM
## 
## aregImpute(formula = ~Year + IED_Flujos + Exportaciones + Empleo + 
##     Educacion + Salario_Diario + Innovacion + Inseguridad_Robo + 
##     Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + 
##     Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC, 
##     data = data, n.impute = 2, type = "pmm")
## 
## n: 26    p: 15   Imputations: 2      nk: 3 
## 
## Number of NAs:
##                  Year            IED_Flujos         Exportaciones 
##                     0                     0                     0 
##                Empleo             Educacion        Salario_Diario 
##                     3                     3                     0 
##            Innovacion      Inseguridad_Robo Inseguridad_Homicidio 
##                     2                     0                     1 
##        Tipo_de_Cambio    Densidad_Carretera    Densidad_Poblacion 
##                     0                     0                     0 
##         CO2_Emisiones        PIB_Per_Capita                  INPC 
##                     3                     0                     0 
## 
##                       type d.f.
## Year                     s    2
## IED_Flujos               s    2
## Exportaciones            s    2
## Empleo                   s    2
## Educacion                s    2
## Salario_Diario           s    2
## Innovacion               s    2
## Inseguridad_Robo         s    2
## Inseguridad_Homicidio    s    2
## Tipo_de_Cambio           s    2
## Densidad_Carretera       s    2
## Densidad_Poblacion       s    2
## CO2_Emisiones            s    1
## PIB_Per_Capita           s    2
## INPC                     s    2
## 
## Transformation of Target Variables Forced to be Linear
## 
## R-squares for Predicting Non-Missing Values for Each Variable
## Using Last Imputations of Predictors
##                Empleo             Educacion            Innovacion 
##                     1                     1                     1 
## Inseguridad_Homicidio         CO2_Emisiones 
##                     1                     1

# Replace de N/A of each column with the imputation model values
# The column Education was not applied, since the values the model gave us, dind math the tendency of the data.
# For education we used another type of impintation method
data$Empleo[is.na(data$Empleo)] <- data_imp$imputed$Empleo

## Warning in data$Empleo[is.na(data$Empleo)] <- data_imp$imputed$Empleo: number
## of items to replace is not a multiple of replacement length

data$Innovacion[is.na(data$Innovacion)] <- data_imp$imputed$Innovacion

## Warning in data$Innovacion[is.na(data$Innovacion)] <-
## data_imp$imputed$Innovacion: number of items to replace is not a multiple of
## replacement length

data$Inseguridad_Homicidio[is.na(data$Inseguridad_Homicidio)] <- data_imp$imputed$Inseguridad_Homicidio

## Warning in data$Inseguridad_Homicidio[is.na(data$Inseguridad_Homicidio)] <-
## data_imp$imputed$Inseguridad_Homicidio: number of items to replace is not a
## multiple of replacement length

data$CO2_Emisiones[is.na(data$CO2_Emisiones)] <- data_imp$imputed$CO2_Emisiones

## Warning in data$CO2_Emisiones[is.na(data$CO2_Emisiones)] <-
## data_imp$imputed$CO2_Emisiones: number of items to replace is not a multiple of
## replacement length

#verify the changes made
data

## # A tibble: 26 × 15
##     Year IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
##    <dbl>      <dbl>         <dbl>  <dbl>     <dbl>          <dbl>      <dbl>
##  1  1997     12146.         9088.   96.9      7.20           24.3       11.3
##  2  1998      8374.         9875.   96.9      7.31           31.9       11.4
##  3  1999     13960.        10990.   97.8      7.43           31.9       12.5
##  4  2000     18249.        12483.   97.8      7.56           35.1       13.1
##  5  2001     30057.        11300.   97.4      7.68           37.6       13.5
##  6  2002     24099.        11923.   97.7      7.80           39.7       12.8
##  7  2003     18250.        13156    97.1      7.93           41.5       11.8
##  8  2004     25016.        13573.   96.5      8.04           43.3       12.6
##  9  2005     25796.        16466.   97.2      8.14           45.2       13.4
## 10  2006     21233.        17486.   96.5      8.26           47.0       14.2
## # ℹ 16 more rows
## # ℹ 8 more variables: Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>,
## #   Tipo_de_Cambio <dbl>, Densidad_Carretera <dbl>, Densidad_Poblacion <dbl>,
## #   CO2_Emisiones <dbl>, PIB_Per_Capita <dbl>, INPC <dbl>

# For the imputation we used a linear regression prediction in order to follow the pattern of the data
imp_edu <- approx(seq_along(data$Educacion),data$Educacion, method = "linear", n = length(data$Educacion))$y
data$Educacion <- imp_edu


#We can see how we fixed our missing value issue, so we can proceed with our code.
data

## # A tibble: 26 × 15
##     Year IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
##    <dbl>      <dbl>         <dbl>  <dbl>     <dbl>          <dbl>      <dbl>
##  1  1997     12146.         9088.   96.9      7.20           24.3       11.3
##  2  1998      8374.         9875.   96.9      7.30           31.9       11.4
##  3  1999     13960.        10990.   97.8      7.40           31.9       12.5
##  4  2000     18249.        12483.   97.8      7.51           35.1       13.1
##  5  2001     30057.        11300.   97.4      7.62           37.6       13.5
##  6  2002     24099.        11923.   97.7      7.73           39.7       12.8
##  7  2003     18250.        13156    97.1      7.84           41.5       11.8
##  8  2004     25016.        13573.   96.5      7.94           43.3       12.6
##  9  2005     25796.        16466.   97.2      8.05           45.2       13.4
## 10  2006     21233.        17486.   96.5      8.13           47.0       14.2
## # ℹ 16 more rows
## # ℹ 8 more variables: Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>,
## #   Tipo_de_Cambio <dbl>, Densidad_Carretera <dbl>, Densidad_Poblacion <dbl>,
## #   CO2_Emisiones <dbl>, PIB_Per_Capita <dbl>, INPC <dbl>

#Verify there isnt any N/A
sum(is.na(data))

## [1] 0

2. Date format

#data$Date <- as.Date(paste0(data$Year, "-12-31"))
#head(data)
#library(lubridate)

# Assuming your date variable is named 'date' in the VARts data frame
#data$Year1 <- year(data$Date)
#head(data)

4. Time series model 2

From the time series dataset, select the explanatory variables that might explain the Nearshoring in Mexico

We are going to evaluate the data set variables: -IED_Flujos -Exportaciones -Empleo -Innovacion -Inseguridad_Robo -PIB_Per_Capita -INPC

#- Describe the hypothetical relationship / impact between each selected factor and the dependent variable IED_Flujos. For example, how does the exchange rate increase / reduce the foreign direct investment flows in Mexico?

# - In describing the above relationships, please include a time series plot that displays the selected variables’ performance over the time period.

### converting to time series format
Exportaciones<-ts(data$Exportaciones,start=c(1997,1),end=c(2022,4),frequency=1)
Empleo<-ts(data$Empleo,start=c(1997,1),end=c(2022,4),frequency=1)
Innovacion<-ts(data$Innovacion,start=c(1997,1),end=c(2022,4),frequency=1)
Inseguridad_Robo<-ts(data$Inseguridad_Robo,start=c(1997,1),end=c(2022,4),frequency=1)
PIB_Per_Capita<-ts(data$PIB_Per_Capita,start=c(1997,1),end=c(2022,4),frequency=1)
IED_Flujos<-ts(data$IED_Flujos,start=c(1997,1),end=c(2022,4),frequency=1)

# plotting time series data
par(mfrow=c(2,3))
plot(data$Year,data$Exportaciones,type="l",col="blue",lwd=2,xlab="Date",ylab="Exportations",main="Exportations")
plot(data$Year,data$Empleo,type="l",col="blue",lwd=2,xlab="Date",ylab="Employment",main="Employment ")
plot(data$Year,data$Innovacion,type="l",col="blue",lwd=2,xlab="Date",ylab="Innovation",main="Innovation")
plot(data$Year,data$Inseguridad_Robo,type="l",col="blue",lwd=2,xlab="Date",ylab="Theft Insecurity",main="Theft Insecurity")
plot(data$Year,data$PIB_Per_Capita,type="l",col="blue",lwd=2,xlab="Date",ylab="GDP",main="GDP")
plot(data$Year,data$IED_Flujos,type="l",col="blue",lwd=2,xlab="Date",ylab="Foreign Investment",main="Foreign Investment")

#Exportations: With this variable we can observe clarly positive trend, however the slope is not e=close to the one in our dependent variable, Foreign Investments.
#Employment: Employment seems to be not som much correlated with our dependent variable. At first glance, we dont see a pattern.
#Innovation: With innvoation, we can observe how the peaks in thhis variable, as well as in our dependent variable match. So we can say that when there is a significant increase in Innovation, Foreign Investments also inrcease.
#Theft Insecurity: We can observe how this variable and our dependent variable have opposite slopes, howerver there isny enough evidence yet to get a relationship between these 2.
#GDP: With this last variable we can identify some type of correlation, both in 2010 started increasing and after FI reached its peak, the GDP also got there.


# it is important to assess whether the variables under study are stationary or not
adf.test(data$Exportaciones)                # non-stationary (p-value > 0.05)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$Exportaciones
## Dickey-Fuller = -0.88228, Lag order = 2, p-value = 0.9379
## alternative hypothesis: stationary

adf.test(data$Empleo)   # non-stationary (p-value > 0.05)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$Empleo
## Dickey-Fuller = -1.0105, Lag order = 2, p-value = 0.919
## alternative hypothesis: stationary

adf.test(data$Innovacion)       # non-stationary (p-value > 0.05)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$Innovacion
## Dickey-Fuller = -2.8406, Lag order = 2, p-value = 0.2521
## alternative hypothesis: stationary

adf.test(data$Inseguridad_Robo) # non-stationary (p-value > 0.05)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$Inseguridad_Robo
## Dickey-Fuller = -3.3724, Lag order = 2, p-value = 0.08162
## alternative hypothesis: stationary

adf.test(data$IED_Flujos)         # non-stationary (p-value > 0.05)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$IED_Flujos
## Dickey-Fuller = -3.0832, Lag order = 2, p-value = 0.1597
## alternative hypothesis: stationary

adf.test(data$PIB_Per_Capita)      # non-stationary (p-value > 0.05)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  data$PIB_Per_Capita
## Dickey-Fuller = -2.5796, Lag order = 2, p-value = 0.3516
## alternative hypothesis: stationary

#Separate our usefull variables
VAR_ts<-cbind(IED_Flujos,PIB_Per_Capita,Innovacion, Empleo)
colnames(VAR_ts)<-cbind("IED_Flujos","PIB_Per_Capita","Innovacion", "Empleo")
VAR_ts

## Time Series:
## Start = 1997 
## End = 2025 
## Frequency = 1 
##      IED_Flujos PIB_Per_Capita Innovacion   Empleo
## 1997   12145.60       127570.1   11.30141 96.85275
## 1998    8373.50       126738.8   11.37221 96.85275
## 1999   13960.32       129164.7   12.45897 97.83000
## 2000   18248.69       130874.9   13.14556 97.83000
## 2001   30057.18       128083.4   13.46812 97.36000
## 2002   24099.21       128205.9   12.79893 97.66000
## 2003   18249.97       128737.9   11.81075 97.06000
## 2004   25015.57       132563.5   12.60841 96.48000
## 2005   25795.82       132941.1   13.41442 97.17000
## 2006   21232.54       135894.9   14.23137 96.53000
## 2007   32393.33       137795.7   15.04295 96.60000
## 2008   29502.46       135176.0   14.81882 95.68000
## 2009   17849.95       131233.0   12.59250 95.20264
## 2010   27189.28       134991.7   12.69477 95.06220
## 2011   25632.52       138891.9   12.09530 95.49284
## 2012   21769.32       141530.2   13.02609 95.52654
## 2013   48354.42       144112.0   13.21576 95.74676
## 2014   30351.25       147277.4   13.65014 96.23509
## 2015   35943.75       149433.5   15.11449 96.04487
## 2016   31188.98       152275.4   14.39837 96.62297
## 2017   34017.05       153235.7   14.04673 96.85275
## 2018   34100.43       153133.8   13.24773 96.64163
## 2019   34577.16       150233.1   12.69960 97.09482
## 2020   28205.89       142609.3   11.27991 96.20963
## 2021   31553.52       142772.0   11.81075 96.48553
## 2022   36215.37       146826.7   11.30141 97.23665
## 2023   12145.60       127570.1   11.30141 96.85275
## 2024    8373.50       126738.8   11.37221 96.85275
## 2025   13960.32       129164.7   12.45897 97.83000

#Estimate a VAR_Model that includes at least 1 explanatory factor that might affect the dependent variable IED_Flujos.
# Generate the preferred lag order based on multiple iterations of the AIC. 
lag_selection<-VARselect(VAR_ts,lag.max=5,type="const")

## Warning in log(sigma.det): NaNs produced

## Warning in log(sigma.det): NaNs produced

## Warning in log(sigma.det): NaNs produced

lag_selection$selection

## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      4      4      1      5

lag_selection$criteria

##                   1            2            3            4           5
## AIC(n) 3.311668e+01 3.328365e+01 3.334901e+01 3.224452e+01         NaN
## HQ(n)  3.337713e+01 3.375246e+01 3.402618e+01 3.313005e+01         NaN
## SC(n)  3.409840e+01 3.505073e+01 3.590146e+01 3.558234e+01         NaN
## FPE(n) 2.472558e+14 3.324397e+14 5.111942e+14 4.105436e+14 -0.07165178

# We estimate the VAR model. The p option refers to the number of lags used.
VAR_model1<-VAR(VAR_ts,p=1,type="const") 
summary(VAR_model1)

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: IED_Flujos, PIB_Per_Capita, Innovacion, Empleo 
## Deterministic variables: const 
## Sample size: 28 
## Log Likelihood: -597.089 
## Roots of the characteristic polynomial:
## 0.8579 0.8579 0.6042 0.02355
## Call:
## VAR(y = VAR_ts, p = 1, type = "const")
## 
## 
## Estimation results for equation IED_Flujos: 
## =========================================== 
## IED_Flujos = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const 
## 
##                     Estimate Std. Error t value Pr(>|t|)  
## IED_Flujos.l1     -1.399e-01  2.448e-01  -0.571   0.5733  
## PIB_Per_Capita.l1  5.741e-01  2.439e-01   2.354   0.0275 *
## Innovacion.l1      2.426e+03  1.365e+03   1.777   0.0889 .
## Empleo.l1         -2.416e+03  1.948e+03  -1.241   0.2273  
## const              1.523e+05  1.951e+05   0.781   0.4430  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 7313 on 23 degrees of freedom
## Multiple R-Squared: 0.4855,  Adjusted R-squared: 0.396 
## F-statistic: 5.426 on 4 and 23 DF,  p-value: 0.003157 
## 
## 
## Estimation results for equation PIB_Per_Capita: 
## =============================================== 
## PIB_Per_Capita = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const 
## 
##                     Estimate Std. Error t value Pr(>|t|)    
## IED_Flujos.l1     -1.598e-01  1.405e-01  -1.137   0.2672    
## PIB_Per_Capita.l1  8.758e-01  1.400e-01   6.257  2.2e-06 ***
## Innovacion.l1      1.102e+03  7.836e+02   1.407   0.1729    
## Empleo.l1         -2.659e+03  1.118e+03  -2.378   0.0261 *  
## const              2.638e+05  1.120e+05   2.356   0.0274 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 4197 on 23 degrees of freedom
## Multiple R-Squared: 0.8137,  Adjusted R-squared: 0.7812 
## F-statistic: 25.11 on 4 and 23 DF,  p-value: 4.206e-08 
## 
## 
## Estimation results for equation Innovacion: 
## =========================================== 
## Innovacion = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const 
## 
##                     Estimate Std. Error t value Pr(>|t|)    
## IED_Flujos.l1     -3.223e-05  2.717e-05  -1.186    0.248    
## PIB_Per_Capita.l1  6.894e-06  2.707e-05   0.255    0.801    
## Innovacion.l1      8.145e-01  1.515e-01   5.375 1.85e-05 ***
## Empleo.l1         -6.005e-02  2.162e-01  -0.278    0.784    
## const              8.103e+00  2.166e+01   0.374    0.712    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.8118 on 23 degrees of freedom
## Multiple R-Squared: 0.5794,  Adjusted R-squared: 0.5063 
## F-statistic: 7.922 on 4 and 23 DF,  p-value: 0.0003618 
## 
## 
## Estimation results for equation Empleo: 
## ======================================= 
## Empleo = IED_Flujos.l1 + PIB_Per_Capita.l1 + Innovacion.l1 + Empleo.l1 + const 
## 
##                     Estimate Std. Error t value Pr(>|t|)    
## IED_Flujos.l1     -1.144e-05  1.724e-05  -0.663   0.5137    
## PIB_Per_Capita.l1  9.486e-06  1.718e-05   0.552   0.5861    
## Innovacion.l1     -1.248e-01  9.615e-02  -1.298   0.2072    
## Empleo.l1          7.587e-01  1.372e-01   5.531 1.26e-05 ***
## const              2.393e+01  1.374e+01   1.742   0.0949 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.515 on 23 degrees of freedom
## Multiple R-Squared: 0.632,   Adjusted R-squared: 0.568 
## F-statistic: 9.875 on 4 and 23 DF,  p-value: 8.407e-05 
## 
## 
## 
## Covariance matrix of residuals:
##                IED_Flujos PIB_Per_Capita Innovacion    Empleo
## IED_Flujos      5.348e+07      1.559e+07  1.725e+03 463.88552
## PIB_Per_Capita  1.559e+07      1.762e+07  1.199e+03 887.16635
## Innovacion      1.725e+03      1.199e+03  6.590e-01   0.07798
## Empleo          4.639e+02      8.872e+02  7.798e-02   0.26525
## 
## Correlation matrix of residuals:
##                IED_Flujos PIB_Per_Capita Innovacion Empleo
## IED_Flujos         1.0000         0.5080     0.2906 0.1232
## PIB_Per_Capita     0.5080         1.0000     0.3517 0.4104
## Innovacion         0.2906         0.3517     1.0000 0.1865
## Empleo             0.1232         0.4104     0.1865 1.0000

# the lag on the dependent variable is negative, however it is not statistically significant
# GDP has a positive correlation with a significance of 0.01. It also has the biggest coefficient.
# Employment also has a significant relation with a value of 0.01, however it shows a negative correlation.


# -Detect if the estimated VAR_Model residuals are stationary.
VAR_model1_residuals<-data.frame(residuals(VAR_model1))
adf.test(VAR_model1_residuals$IED_Flujos)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  VAR_model1_residuals$IED_Flujos
## Dickey-Fuller = -3.3617, Lag order = 3, p-value = 0.08227
## alternative hypothesis: stationary

#The test gives us a p-value of 0.093, which means that we reject Ho (data is non/stationary), which mean we accept H1, DATA IS STATIONARY .

#- Detect if the estimated VAR_Model residuals show serial autocorrelation.
Box.test(VAR_model1_residuals$IED_Flujos,lag=1,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  VAR_model1_residuals$IED_Flujos
## X-squared = 0.19326, df = 1, p-value = 0.6602

# ho: no serial autocorrelacion 
# with our p-value of 0.7185, we can see how we reject Ho, that means WE HAVE AUTOCORERELATION.


#- Based on the regression results and diagnostic tests, select the VAR_Model that you consider might generate the best forecast.
#- Briefly interpret the regression results. That is, is there a statistically significantrelationship between the explanatory variable(s) and themain dependent variable?

#VAR 2

# Transform non-stationary time series variables to stationary 
diff_Exportaciones<-diff(log(Exportaciones))
diff_Innovacion<-diff(log(Innovacion))
diff_Inseguridad_Robo<-diff(log(Inseguridad_Robo))
diff_IED_Flujos<-diff(log(IED_Flujos))
diff_PIB_Per_Capita<-diff(log(PIB_Per_Capita))

# plotting differenced time series variables 
par(mfrow=c(2,3))
plot(diff_Exportaciones)
plot(diff_Innovacion)
plot(diff_Inseguridad_Robo)
plot(diff_IED_Flujos)
plot(diff_PIB_Per_Capita)


# plotting differenced time series variables 
par(mfrow=c(2,3))

ts_plot(diff_Exportaciones)

ts_plot(diff_Innovacion)

ts_plot(diff_Inseguridad_Robo)

ts_plot(diff_IED_Flujos)

ts_plot(diff_PIB_Per_Capita)

VAR_tsdiff<-cbind(diff_IED_Flujos,diff_PIB_Per_Capita,diff_Innovacion,diff_Exportaciones)
colnames(VAR_tsdiff)<-cbind("IED_Flujos","PIB_Per_Capita","diff_Innovacion", "diff_Exportaciones")

lag_diff_selection<-VARselect(VAR_tsdiff,lag.max=10,type="const")
lag_diff_selection$selection

## AIC(n)  HQ(n)  SC(n) FPE(n) 
##      5      5      5      5

lag_diff_selection$criteria

##                    1             2             3             4    5    6    7
## AIC(n) -1.728972e+01 -1.783294e+01 -2.175012e+01 -1.217829e+02 -Inf -Inf -Inf
## HQ(n)  -1.715331e+01 -1.758740e+01 -2.139545e+01 -1.213191e+02 -Inf -Inf -Inf
## SC(n)  -1.630042e+01 -1.605219e+01 -1.917793e+01 -1.184193e+02 -Inf -Inf -Inf
## FPE(n)  3.290084e-08  2.670307e-08  1.638148e-09  1.012237e-50    0    0    0
##           8    9   10
## AIC(n) -Inf -Inf -Inf
## HQ(n)  -Inf -Inf -Inf
## SC(n)  -Inf -Inf -Inf
## FPE(n)    0    0    0

VAR_model2<-VAR(VAR_tsdiff,p=2,type="const",season=NULL,exog=NULL) 
summary(VAR_model2)

## 
## VAR Estimation Results:
## ========================= 
## Endogenous variables: IED_Flujos, PIB_Per_Capita, diff_Innovacion, diff_Exportaciones 
## Deterministic variables: const 
## Sample size: 26 
## Log Likelihood: 109.334 
## Roots of the characteristic polynomial:
## 0.6543 0.6543 0.5453 0.4946 0.4946 0.4489 0.205 0.205
## Call:
## VAR(y = VAR_tsdiff, p = 2, type = "const", exogen = NULL)
## 
## 
## Estimation results for equation IED_Flujos: 
## =========================================== 
## IED_Flujos = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)   
## IED_Flujos.l1         -7.418e-01  2.820e-01  -2.631  0.01753 * 
## PIB_Per_Capita.l1     -5.574e+00  4.127e+00  -1.351  0.19453   
## diff_Innovacion.l1     3.232e+00  1.328e+00   2.434  0.02625 * 
## diff_Exportaciones.l1  1.285e+00  4.354e-01   2.950  0.00895 **
## IED_Flujos.l2         -3.353e-01  2.821e-01  -1.189  0.25091   
## PIB_Per_Capita.l2      3.174e+00  4.442e+00   0.715  0.48456   
## diff_Innovacion.l2    -8.270e-01  1.508e+00  -0.548  0.59065   
## diff_Exportaciones.l2 -9.681e-02  4.767e-01  -0.203  0.84149   
## const                  4.355e-17  6.461e-02   0.000  1.00000   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.3294 on 17 degrees of freedom
## Multiple R-Squared: 0.5156,  Adjusted R-squared: 0.2876 
## F-statistic: 2.262 on 8 and 17 DF,  p-value: 0.07489 
## 
## 
## Estimation results for equation PIB_Per_Capita: 
## =============================================== 
## PIB_Per_Capita = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1         -3.261e-02  3.227e-02  -1.011    0.326
## PIB_Per_Capita.l1     -4.338e-01  4.722e-01  -0.918    0.371
## diff_Innovacion.l1     2.161e-01  1.520e-01   1.422    0.173
## diff_Exportaciones.l1  7.124e-02  4.983e-02   1.430    0.171
## IED_Flujos.l2         -2.238e-02  3.227e-02  -0.693    0.497
## PIB_Per_Capita.l2      2.438e-01  5.083e-01   0.480    0.638
## diff_Innovacion.l2    -1.064e-01  1.726e-01  -0.616    0.546
## diff_Exportaciones.l2 -6.310e-03  5.455e-02  -0.116    0.909
## const                  2.722e-18  7.393e-03   0.000    1.000
## 
## 
## Residual standard error: 0.0377 on 17 degrees of freedom
## Multiple R-Squared: 0.2167,  Adjusted R-squared: -0.1519 
## F-statistic: 0.5879 on 8 and 17 DF,  p-value: 0.7746 
## 
## 
## Estimation results for equation diff_Innovacion: 
## ================================================ 
## diff_Innovacion = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)
## IED_Flujos.l1         -7.363e-02  6.060e-02  -1.215    0.241
## PIB_Per_Capita.l1      4.540e-01  8.869e-01   0.512    0.615
## diff_Innovacion.l1     1.980e-01  2.854e-01   0.694    0.497
## diff_Exportaciones.l1  3.165e-02  9.357e-02   0.338    0.739
## IED_Flujos.l2         -2.619e-02  6.061e-02  -0.432    0.671
## PIB_Per_Capita.l2      5.117e-01  9.546e-01   0.536    0.599
## diff_Innovacion.l2     2.461e-02  3.241e-01   0.076    0.940
## diff_Exportaciones.l2 -6.066e-02  1.024e-01  -0.592    0.562
## const                  9.526e-18  1.388e-02   0.000    1.000
## 
## 
## Residual standard error: 0.07079 on 17 degrees of freedom
## Multiple R-Squared: 0.2121,  Adjusted R-squared: -0.1587 
## F-statistic: 0.5719 on 8 and 17 DF,  p-value: 0.7868 
## 
## 
## Estimation results for equation diff_Exportaciones: 
## =================================================== 
## diff_Exportaciones = IED_Flujos.l1 + PIB_Per_Capita.l1 + diff_Innovacion.l1 + diff_Exportaciones.l1 + IED_Flujos.l2 + PIB_Per_Capita.l2 + diff_Innovacion.l2 + diff_Exportaciones.l2 + const 
## 
##                         Estimate Std. Error t value Pr(>|t|)  
## IED_Flujos.l1         -3.944e-02  3.085e-01  -0.128   0.8998  
## PIB_Per_Capita.l1     -8.311e+00  4.515e+00  -1.841   0.0832 .
## diff_Innovacion.l1     2.022e+00  1.453e+00   1.392   0.1819  
## diff_Exportaciones.l1  7.907e-01  4.764e-01   1.660   0.1153  
## IED_Flujos.l2         -3.666e-02  3.086e-01  -0.119   0.9068  
## PIB_Per_Capita.l2      4.776e+00  4.860e+00   0.983   0.3396  
## diff_Innovacion.l2    -1.669e+00  1.650e+00  -1.011   0.3261  
## diff_Exportaciones.l2 -3.791e-01  5.216e-01  -0.727   0.4772  
## const                  5.443e-18  7.069e-02   0.000   1.0000  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## 
## Residual standard error: 0.3604 on 17 degrees of freedom
## Multiple R-Squared: 0.2376,  Adjusted R-squared: -0.1212 
## F-statistic: 0.6621 on 8 and 17 DF,  p-value: 0.7172 
## 
## 
## 
## Covariance matrix of residuals:
##                    IED_Flujos PIB_Per_Capita diff_Innovacion diff_Exportaciones
## IED_Flujos           0.108526      0.0061754       0.0064028           0.060377
## PIB_Per_Capita       0.006175      0.0014210       0.0007026           0.011203
## diff_Innovacion      0.006403      0.0007026       0.0050118          -0.001556
## diff_Exportaciones   0.060377      0.0112034      -0.0015560           0.129908
## 
## Correlation matrix of residuals:
##                    IED_Flujos PIB_Per_Capita diff_Innovacion diff_Exportaciones
## IED_Flujos             1.0000         0.4973         0.27454            0.50849
## PIB_Per_Capita         0.4973         1.0000         0.26327            0.82459
## diff_Innovacion        0.2745         0.2633         1.00000           -0.06098
## diff_Exportaciones     0.5085         0.8246        -0.06098            1.00000

#VAR2 model is using 2 lags insead of 1, also its using one more explanatory variable
# In this new model, we can now see statistical significance between the lag of the dependent variables, which is negative once again but with the highest coefficient talking in absoluy=t value terms
# We can also observe that the difference of Exportaciones shows has a positive correlation with a significance of 0.01. It also has the biggest coefficient.
# Innovacion also has a pretty small but not insignificant relation with a value of 0.05.

# -Detect if the estimated VAR_Model residuals are stationary.
VAR_model2_residuals<-data.frame(residuals(VAR_model2))
adf.test(VAR_model2_residuals$IED_Flujos)

## 
##  Augmented Dickey-Fuller Test
## 
## data:  VAR_model2_residuals$IED_Flujos
## Dickey-Fuller = -2.8409, Lag order = 2, p-value = 0.252
## alternative hypothesis: stationary

#The test gives us a p-value of 0.2959, which means that we reject Ho (data is non-stationary), which mean we accept H1, DATA IS STATIONARY.
# We can note how this p-value is bigger than VAR1

#- Detect if the estimated VAR_Model residuals show serial autocorrelation.
Box.test(VAR_model2_residuals$IED_Flujos,lag=2,type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  VAR_model2_residuals$IED_Flujos
## X-squared = 0.045666, df = 2, p-value = 0.9774

# ho: no serial autocorrelacion 
# with our p-value of 0.8904, we can see how we reject Ho, that means WE HAVE AUTOCORERELATION.

SELECTED VAR MODEL

After analizing both previous models, the wise decision is to select the second one due to these next points: -VAR 2 residuals are STATIONARY -VAR 2 has 3 significant explanatory variables instead of just 2 -Both VAR1 and VAR2 have autocorrelation.

#CAUSALITY

granger_diff_IED<-causality(VAR_model2,cause="IED_Flujos")
granger_diff_IED

## $Granger
## 
##  Granger causality H0: IED_Flujos do not Granger-cause PIB_Per_Capita
##  diff_Innovacion diff_Exportaciones
## 
## data:  VAR object VAR_model2
## F-Test = 0.50402, df1 = 6, df2 = 68, p-value = 0.8032
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: IED_Flujos and PIB_Per_Capita
##  diff_Innovacion diff_Exportaciones
## 
## data:  VAR object VAR_model2
## Chi-squared = 6.7943, df = 3, p-value = 0.07875

#Granger p-value: 0.7235
#Instant p-value: 0.04989
#We can observe how this model we fail to reject the Ho in Granger Causality. (p-value>0.05)
#This means that Direct Foreign Investment DO NOT have a predictive effect in the variables selected.

#On the other hand, we have Instant Causality.
#We can observe how this model we accepts the Ho in Instant Causality. (p-value<0.05)
#This means that there is evidence that Direct Foreign Investment DO INDEED has a immediate or simultaneous causal relationship between the variables selected.

#Just for comapring purposes, I tested VAR1 causality. We can observe how in that model there is not granger causality neither instant casuality.
granger2_diff_IED<-causality(VAR_model1,cause="IED_Flujos")
granger2_diff_IED

## $Granger
## 
##  Granger causality H0: IED_Flujos do not Granger-cause PIB_Per_Capita
##  Innovacion Empleo
## 
## data:  VAR object VAR_model1
## F-Test = 0.67631, df1 = 3, df2 = 92, p-value = 0.5687
## 
## 
## $Instant
## 
##  H0: No instantaneous causality between: IED_Flujos and PIB_Per_Capita
##  Innovacion Empleo
## 
## data:  VAR object VAR_model1
## Chi-squared = 6.1637, df = 3, p-value = 0.1039

#Granger p-value: 0.7235
#Instant p-value: 0.04989

#FORECAST

#Based on the selected VAR_Model, forecast the increasing / decreasing trend of FDI inflows in Mexico for the next 5 periods. Display the forecast in a time series plot.

forecast<-predict(VAR_model2,n.ahead=5,ci=0.95) ### forecast for the next four quarters
fanchart(forecast,names="IED_Flujos",main="Direct Foreign Investment",xlab="Time Period",ylab="Direct Foreign Investment")

#Though this graph we can observe how the next 5 predicted values for Direct Foreign Investment are going down, showing an incoming negative trend.

Conclusions and Recommendations

Briefly describe the main insights from previous sections.
Based on the selected results, please share at least 1 recommendation that address the problem situation.
Insights of each graph and code are written just below them.
The selected models were ARIMA and VAR2
Based on our results from both ARIMA and VAR2 models, we can observe how it is not likely for Direct Foreign Investment to increment drastically the next few years according to the data given. It is predicted a steady behavior or even negative trend. Knowing this, and taking into consideration the random outcome in decomposition, i would suggest to deepen our research of variables explaining DFI, since there is evidence suggesting that exist more important variables which are not being taking into consideration to further improve the models. -At the same time, I would recommend to keep an eye for these Foreign Investment, as the mexican government i would suggest improving our situation in these explanatory variables mentioned in the analysis to keep resulting an interesting and atr=tractive to foreign countries to invest.

Bibliography

Time Series. (2008). Springer EBooks, 536–539. https://doi.org/10.1007/978-0-387-32833-1_401

Time Series Analysis: Definition, Types, Techniques, and When It’s Used. (2023). Tableau. https://www.tableau.com/learn/articles/time-series-analysis#definition

Mexico Rides Nearshoring Wave | Morgan Stanley. (2023). Morgan Stanley; Morgan Stanley. https://www.morganstanley.com/ideas/mexico-nearshoring-gdp-growth

Evidence 2. Time Series Model Report

Mariana Garcia

2023-09-11

Introduction

Background

Problem Situation

Data and Methodology

loading required libraries

0. Import DataBase

1. Set Date Format

2. Describe variables

Time Series Regression Analysis

3. Time Series Model 1

Selected Model

4. Time series model 2

1. Data imputation

2. Date format

4. Time series model 2

SELECTED VAR MODEL

Conclusions and Recommendations

Bibliography