Time series analysis is a branch of statistics that studies data points ordered in time, usually recorded at regular intervals, and the trends they contain. Its main goals are to understand underlying patterns (such as trends or seasonal variations), to describe and model the time-dependent structure in the data, and to forecast future values. This type of analysis is important in finance, economics, ecology, and any discipline that needs to analyze how a phenomenon changes over time. (TIBCO 2023)
Time series analysis is also key to many predictive modeling tasks, because understanding the temporal dependencies between observations can lead to more accurate and insightful forecasts.
These terms are related to time series analysis:
- Stationarity is a crucial property of a time series. A time series is stationary when its statistical properties, such as the mean and the variance, do not change over time: the mean and variance are constant, and the covariance between two observations depends only on the lag between them, not on time.
- Autocorrelation is the similarity between observations as a function of the time lag between them. Plotting autocorrelated data can yield a graph that resembles a sinusoidal function. (TABLEAU 2022)
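The two terms above can be illustrated with a minimal sketch on simulated data (not the report's dataset). The series names and the seed are assumptions made only for this example: white noise stands in for a stationary series and a random walk for a non-stationary one.

```r
# Minimal sketch with simulated data (not the report's dataset).
set.seed(123)
n <- 400
white_noise <- rnorm(n)           # stationary: constant mean and variance
random_walk <- cumsum(rnorm(n))   # non-stationary: shocks accumulate over time

# Lag-1 autocorrelation: close to 0 for white noise, close to 1 for a random walk.
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]
wn_lag1 <- lag1(white_noise)
rw_lag1 <- lag1(random_walk)
print(round(c(white_noise = wn_lag1, random_walk = rw_lag1), 3))
```

A lag-1 autocorrelation near 1 is one informal sign that a series may need differencing before an ARMA-type model is fitted.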
According to NuvoCargo (2023), nearshoring is increasing trade between the U.S. and Mexico as more companies move their business processes to Mexico. This is creating job opportunities and boosting both the Mexican and American economies, particularly in the manufacturing, finance, and IT sectors.
One of the biggest winners from nearshoring to Mexico is the supply chain, because the trend helps shorten and streamline processes.
According to the U.S. Census Bureau, U.S.-Mexico trade reached $614.5 billion in 2020, with Mexico being the United States’ second-largest trading partner. As nearshoring continues to grow, this figure is expected to keep rising.
Both the United States and Mexico win from this arrangement: the U.S. gains cost savings and efficiency, while Mexico gains employment opportunities and an economic boost. Recent government budget projections suggest that Mexican GDP could grow by approximately 3.0% in 2023 and 2024, driven by manufacturing and nearshoring activities.
In addition, recent studies indicate that while a product manufactured in China contributes approximately 4% to the US economy, a product from Mexico contributes about 40% – a tenfold difference. Thus, for US companies eyeing Mexico, it’s not just about proximity, speed, or potentially lower labor costs; it also significantly benefits the American economy.
The purpose of this evidence is to help Maria, an analyst at a Mexican company, determine whether Mexico is attractive to companies from other countries that want to nearshore there. She has gathered data from INEGI, the Bank of Mexico, and the Ministry of Economy on variables such as GDP per capita, daily wage, exports in millions of dollars, exchange rate, and road information.
She wants to know which econometric model to use to predict the consequences of nearshoring in Mexico, why the country may be attractive for nearshoring, and what opportunities Mexico has for relocating businesses.
With this work we want to identify the explanatory variables that might explain nearshoring in Mexico, and to forecast the increasing or decreasing trend of FDI inflows in Mexico for the next 5 periods.
# Import BD
library(foreign)
bd<- read.csv("C:\\Users\\85171075\\Desktop\\Mariana\\TEC\\Econometrics\\sp_data.csv")
summary(bd)
## periodo IED_Flujos IED_M Exportaciones
## Min. :1997 Min. : 8374 Min. :210876 Min. : 9088
## 1st Qu.:2003 1st Qu.:21367 1st Qu.:368560 1st Qu.:13260
## Median :2010 Median :27698 Median :497054 Median :21188
## Mean :2010 Mean :26770 Mean :493596 Mean :23601
## 3rd Qu.:2016 3rd Qu.:32183 3rd Qu.:578606 3rd Qu.:31601
## Max. :2022 Max. :48354 Max. :754438 Max. :46478
##
## Exportaciones_m Empleo Educacion Salario_Diario
## Min. :205483 Min. :95.06 Min. :7.200 Min. : 24.30
## 1st Qu.:262337 1st Qu.:95.89 1st Qu.:7.865 1st Qu.: 41.97
## Median :366294 Median :96.53 Median :8.460 Median : 54.48
## Mean :433856 Mean :96.47 Mean :8.423 Mean : 65.16
## 3rd Qu.:632356 3rd Qu.:97.08 3rd Qu.:9.000 3rd Qu.: 72.31
## Max. :785655 Max. :97.83 Max. :9.580 Max. :172.87
## NA's :3 NA's :3
## Innovacion Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio
## Min. :11.28 Min. :120.5 Min. : 8.04 Min. : 8.06
## 1st Qu.:12.56 1st Qu.:148.3 1st Qu.:10.25 1st Qu.:10.75
## Median :13.09 Median :181.8 Median :16.93 Median :13.02
## Mean :13.11 Mean :185.4 Mean :17.29 Mean :13.91
## 3rd Qu.:13.75 3rd Qu.:209.9 3rd Qu.:22.43 3rd Qu.:18.49
## Max. :15.11 Max. :314.8 Max. :29.59 Max. :20.66
## NA's :2 NA's :1
## Densidad_Carretera Densidad_Poblacion CO2_Emisiones PIB_Per_Capita
## Min. :0.05000 Min. :47.44 Min. :3.590 Min. :126739
## 1st Qu.:0.06000 1st Qu.:52.77 1st Qu.:3.830 1st Qu.:130964
## Median :0.07000 Median :58.09 Median :3.930 Median :136845
## Mean :0.07115 Mean :57.33 Mean :3.945 Mean :138550
## 3rd Qu.:0.08000 3rd Qu.:61.39 3rd Qu.:4.105 3rd Qu.:146148
## Max. :0.09000 Max. :65.60 Max. :4.220 Max. :153236
## NA's :3
## INPC crisis_financiera
## Min. : 33.28 Min. :0.00000
## 1st Qu.: 56.15 1st Qu.:0.00000
## Median : 73.35 Median :0.00000
## Mean : 75.17 Mean :0.07692
## 3rd Qu.: 91.29 3rd Qu.:0.00000
## Max. :126.48 Max. :1.00000
##
library(readxl)
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.3 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(ggplot2)
library(corrplot)
## corrplot 0.92 loaded
library(gmodels)
library(effects)
## Loading required package: carData
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(stargazer)
##
## Please cite as:
##
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(olsrr)
##
## Attaching package: 'olsrr'
##
## The following object is masked from 'package:datasets':
##
## rivers
#library(kableExtra)
library(jtools)
library(fastmap)
#library(dlookr)
library(Hmisc)
##
## Attaching package: 'Hmisc'
##
## The following object is masked from 'package:jtools':
##
## %nin%
##
## The following objects are masked from 'package:dplyr':
##
## src, summarize
##
## The following objects are masked from 'package:base':
##
## format.pval, units
library(naniar)
library(glmnet)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Loaded glmnet 4.1-7
library(caret)
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(car)
##
## Attaching package: 'car'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:purrr':
##
## some
library(lmtest)
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(dplyr)
library(xts)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(zoo)
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
library(stats)
library(forecast)
library(astsa)
##
## Attaching package: 'astsa'
##
## The following object is masked from 'package:forecast':
##
## gas
library(corrplot)
library(AER)
## Loading required package: sandwich
## Loading required package: survival
##
## Attaching package: 'survival'
##
## The following object is masked from 'package:caret':
##
## cluster
library(dynlm)
library(vars)
## Loading required package: MASS
##
## Attaching package: 'MASS'
##
## The following object is masked from 'package:olsrr':
##
## cement
##
## The following object is masked from 'package:dplyr':
##
## select
##
## Loading required package: strucchange
##
## Attaching package: 'strucchange'
##
## The following object is masked from 'package:stringr':
##
## boundary
##
## Loading required package: urca
#library(mFilter)
library(TSstudio)
library(tidyverse)
library(sarima)
## Loading required package: stats4
##
## Attaching package: 'sarima'
##
## The following object is masked from 'package:astsa':
##
## sarima
##
## The following object is masked from 'package:stats':
##
## spectrum
library(dygraphs)
# Import BD
library(foreign)
bd1<- read.csv("C:\\Users\\85171075\\Desktop\\Mariana\\TEC\\Econometrics\\sp_data.csv")
bd2<- read.csv("C:\\Users\\85171075\\Desktop\\Mariana\\TEC\\Econometrics\\sp_series.csv")
summary(bd2)
## periodo trimestre IED_Flujos
## Min. :1999 Length:96 Min. : 1341
## 1st Qu.:2005 Class :character 1st Qu.: 4351
## Median :2010 Mode :character Median : 6238
## Mean :2011 Mean : 7036
## 3rd Qu.:2016 3rd Qu.: 8053
## Max. :2023 Max. :22794
# setting time series format
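# periodo holds only the year, so as.yearqtr() on it alone would place every row in Q1.
# Assumption: rows are consecutive quarters starting in 1999 Q1, as in the ts() call below.
bd2$periodo=as.yearqtr(1999 + (seq_len(nrow(bd2)) - 1) / 4)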
# Descriptive statistics of the dependent variable
summary(bd2$IED_Flujos)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1341 4351 6238 7036 8053 22794
# Visualizing time series data and plot
IEDxts<-xts(bd2$IED_Flujos,order.by=bd2$periodo)
dygraph(IEDxts, main = "IED") %>%
dyOptions(colors = RColorBrewer::brewer.pal(4, "Dark2")) %>%
dyShading(from = "2012/12/12",
to = "2022/01/12",
color = "#F81BD7")
Plot the variable IED_Flujos using a time series format: i) decompose the time series data into trend, seasonal, and random components.
Briefly, describe the decomposition time series plot. Do the time series data show a trend? Do the time series data show seasonality?
# Decompose a time series
# 1) observed: data observations
# 2) trend: increasing / decreasing value of data observations
# 3) seasonality: repeating short-term cycle in time series
# 4) noise: random variation in time series
IEDts<-ts(bd2$IED_Flujos,frequency=4,start=c(1999,1))
IED_decompose<-decompose(IEDts)
plot(IED_decompose)
# The trend component does not show a sustained increase or decrease; its level changes over time rather than staying constant.
# The seasonal component shows rises and falls that repeat every year, and some of the peaks recur in the same pattern, which indicates seasonality.
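To make the four components concrete, here is a minimal sketch on a simulated quarterly series (the trend slope, seasonal effects, and noise level are hypothetical values chosen for illustration). It checks that the classical additive decomposition returned by `decompose()` adds back up to the observed series.

```r
# Simulated quarterly series with a linear trend and a fixed seasonal pattern.
set.seed(1)
n_years <- 10
seasonal_pattern <- c(5, -2, -4, 1)   # hypothetical quarterly effects (sum to 0)
y <- ts(0.5 * seq_len(4 * n_years) +
          rep(seasonal_pattern, n_years) +
          rnorm(4 * n_years, sd = 0.5),
        frequency = 4, start = c(1999, 1))

parts <- decompose(y)   # classical additive decomposition

# The trend is a centred moving average, so the first and last two quarters are NA.
rebuilt <- parts$trend + parts$seasonal + parts$random
max_gap <- max(abs(rebuilt - y), na.rm = TRUE)
print(max_gap)                   # numerically zero: observed = trend + seasonal + random
print(round(parts$figure, 2))    # estimated seasonal effect for each quarter
```

If the estimated seasonal effects in `parts$figure` are small relative to the random component, the apparent seasonality in a plot may be weak.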
# Stationary Test
# H0: Non-stationary and HA: Stationary. p-values < 0.05 reject the H0.
adf.test(bd2$IED_Flujos)
## Warning in adf.test(bd2$IED_Flujos): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: bd2$IED_Flujos
## Dickey-Fuller = -4.1994, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
# As the p-value (0.01) is less than 0.05, we reject H0: the series is stationary.
# Serial Autocorrelation
acf(bd2$IED_Flujos,main="Significant Autocorrelations")
# There is little serial autocorrelation in this variable. Autocorrelation measures the linear relationship between a series and a lagged version of itself.
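As a hedged illustration of what `acf()` computes, the sketch below writes the sample autocorrelation out from its definition and compares it with the built-in function. The helper name `sample_acf` and the simulated AR(1) series are assumptions for this example only.

```r
# Sample autocorrelation at lag k, written out from its definition.
sample_acf <- function(x, k) {
  n <- length(x)
  x_centered <- x - mean(x)
  sum(x_centered[(k + 1):n] * x_centered[1:(n - k)]) / sum(x_centered^2)
}

set.seed(42)
x <- as.numeric(arima.sim(model = list(ar = 0.6), n = 200))  # simulated AR(1) series

manual <- sapply(1:4, function(k) sample_acf(x, k))
builtin <- acf(x, lag.max = 4, plot = FALSE)$acf[2:5]   # element 1 is lag 0
print(round(rbind(manual, builtin), 4))
```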
# Model 1 ARIMA 1
IED_ARIMA <- Arima(log(bd2$IED_Flujos), order = c(2, 1, 1))
print(IED_ARIMA)
## Series: log(bd2$IED_Flujos)
## ARIMA(2,1,1)
##
## Coefficients:
## ar1 ar2 ma1
## 0.0012 -0.415 -0.8601
## s.e. 0.1056 0.102 0.0708
##
## sigma^2 = 0.246: log likelihood = -67.79
## AIC=143.59 AICc=144.03 BIC=153.8
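Written out as an equation, the coefficients printed above correspond to the model below. This uses the sign convention of R's ARIMA functions, where moving-average terms enter with a plus sign. The symbols w_t (first difference of the logged series) and ε_t (innovation) are notation introduced here for illustration.

```latex
w_t = \log(\mathrm{IED}_t) - \log(\mathrm{IED}_{t-1}), \qquad
w_t = 0.0012\,w_{t-1} - 0.415\,w_{t-2} + \varepsilon_t - 0.8601\,\varepsilon_{t-1}
```

Relative to their standard errors, ar2 and ma1 are well away from zero, while ar1 (0.0012 with s.e. 0.1056) is not.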
# Plot ARIMA
plot(IED_ARIMA$residuals, main = "ARIMA(2,1,1) - IED")
acf(IED_ARIMA$residuals, main = "ACF - ARIMA (2,1,1)")
# this shows no autocorrelation
Box.test(IED_ARIMA$residuals, lag = 1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: IED_ARIMA$residuals
## X-squared = 0.14398, df = 1, p-value = 0.7044
adf.test(IED_ARIMA$residuals)
## Warning in adf.test(IED_ARIMA$residuals): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: IED_ARIMA$residuals
## Dickey-Fuller = -4.646, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
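# The Ljung-Box p-value (0.70) is greater than 0.05, so the residuals show no serial autocorrelation at lag 1. The ADF p-value (0.01) is less than 0.05, so the residuals are stationary.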
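The Ljung-Box statistic used above can be reproduced by hand, which may help when reading its output. The sketch below is a minimal version on simulated residuals; the helper name `ljung_box` and the stand-in data are assumptions, and it applies no degrees-of-freedom correction for fitted parameters (the same as `Box.test` with its default `fitdf = 0`).

```r
# Ljung-Box Q = n (n + 2) * sum_{k = 1..h} rho_k^2 / (n - k), compared with a chi-squared(h).
ljung_box <- function(res, h) {
  n <- length(res)
  rho <- acf(res, lag.max = h, plot = FALSE)$acf[-1]   # autocorrelations at lags 1..h
  q <- n * (n + 2) * sum(rho^2 / (n - seq_len(h)))
  c(statistic = q, p_value = pchisq(q, df = h, lower.tail = FALSE))
}

set.seed(7)
res <- rnorm(96)   # stand-in for model residuals
manual <- ljung_box(res, h = 5)
builtin <- Box.test(res, lag = 5, type = "Ljung-Box")
print(round(manual, 4))
```

A small p-value means the residuals are still autocorrelated, so the model has not captured all of the time dependence.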
# Model 2 ARIMA2
IED_ARIMA2 <- Arima(bd2$IED_Flujos, order = c(1, 1, 2))
print(IED_ARIMA2)
## Series: bd2$IED_Flujos
## ARIMA(1,1,2)
##
## Coefficients:
## ar1 ma1 ma2
## -0.4127 -0.5035 -0.4175
## s.e. 0.5588 0.5394 0.5076
##
## sigma^2 = 16072627: log likelihood = -922.53
## AIC=1853.05 AICc=1853.5 BIC=1863.27
plot(IED_ARIMA2$residuals, main = "ARIMA(1,1,2) - IED")
acf(IED_ARIMA2$residuals, main = "ACF - ARIMA (1,1,2)")
Box.test(IED_ARIMA2$residuals, lag = 1, type = "Ljung-Box")
##
## Box-Ljung test
##
## data: IED_ARIMA2$residuals
## X-squared = 1.1404, df = 1, p-value = 0.2856
adf.test(IED_ARIMA2$residuals)
## Warning in adf.test(IED_ARIMA2$residuals): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: IED_ARIMA2$residuals
## Dickey-Fuller = -4.2515, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
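# The Ljung-Box p-value (0.29) is greater than 0.05, so the residuals show no serial autocorrelation at lag 1. The ADF p-value (0.01) is less than 0.05, so the residuals are stationary.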
# Model 3 ARMA
summary(IED_ARMA<-arma(log(bd2$IED_Flujos),order=c(1,1)))
##
## Call:
## arma(x = log(bd2$IED_Flujos), order = c(1, 1))
##
## Model:
## ARMA(1,1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.41237 -0.35244 -0.00571 0.27709 1.50759
##
## Coefficient(s):
## Estimate Std. Error t value Pr(>|t|)
## ar1 -0.2976 0.2271 -1.310 0.19003
## ma1 0.5173 0.1999 2.588 0.00967 **
## intercept 11.3149 1.9794 5.716 1.09e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Fit:
## sigma^2 estimated as 0.2688, Conditional Sum-of-Squares = 25.26, AIC = 152.3
plot(IED_ARMA)
dest<-exp(IED_ARMA$fitted.values)
plot(dest)
IED_ARMA_residuals<-IED_ARMA$residuals
Box.test(IED_ARMA_residuals,lag=5,type="Ljung-Box")
##
## Box-Ljung test
##
## data: IED_ARMA_residuals
## X-squared = 13.689, df = 5, p-value = 0.01771
IED_ARMA$residuals <- na.omit(IED_ARMA$residuals)
adf.test(IED_ARMA$residuals)
##
## Augmented Dickey-Fuller Test
##
## data: IED_ARMA$residuals
## Dickey-Fuller = -3.644, Lag order = 4, p-value = 0.03336
## alternative hypothesis: stationary
# The Ljung-Box p-value (0.018) is less than 0.05, so the ARMA residuals still show serial autocorrelation up to lag 5.
# The ADF p-value (0.033) is less than 0.05, so the residuals are stationary.
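# Fit metrics ARIMA 1
AIC(IED_ARIMA)
## [1] 143.5873
fitted_values_arima1 <- fitted(IED_ARIMA)
# Note: the fitted values are on the log scale while IED_Flujos is in levels,
# so the value below is not a true RMSE; compare exp(fitted values) with
# IED_Flujos to get an RMSE in the original units.
r_arima1 <- sqrt(mean((fitted_values_arima1 - bd2$IED_Flujos)^2))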
print(r_arima1)
## [1] 8065.557
# Fit metrics ARIMA 2
AIC(IED_ARIMA2)
## [1] 1853.051
fitted_values_arima2 <- fitted(IED_ARIMA2)
r_arima2 <- sqrt(mean((fitted_values_arima2 - bd2$IED_Flujos)^2))
print(r_arima2)
## [1] 3924.657
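# Fit metrics ARMA
ARMA <- arima(log(bd2$IED_Flujos), order = c(1, 0, 1))
AICAA <- AIC(ARMA)
AICAA
## [1] 152.6035
residuals_arma <- ARMA$residuals
# Note: log(IED_Flujos) minus the residuals equals the fitted values, so the
# value below is the root mean square of the fitted log values, not an RMSE;
# sqrt(mean(residuals_arma^2)) gives the RMSE on the log scale.
r_arma <- sqrt(mean((log(bd2$IED_Flujos) - residuals_arma)^2))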
print(r_arma)
## [1] 8.721015
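For a model fitted to a logged series, an RMSE in the original units needs the fitted values back-transformed first. The sketch below shows one way to do this on simulated data. The series `y` is a hypothetical stand-in for IED_Flujos, and the plain `exp()` back-transform is a simplifying assumption (it applies no bias adjustment).

```r
set.seed(11)
# Hypothetical positive series standing in for IED_Flujos.
y <- exp(8.7 + as.numeric(arima.sim(model = list(ar = 0.3, ma = 0.4), n = 96, sd = 0.4)))

fit <- arima(log(y), order = c(1, 0, 1))   # model fitted on the log scale
fitted_log <- log(y) - fit$residuals       # fitted = observed - residual

rmse_log <- sqrt(mean(fit$residuals^2))               # RMSE in log units
rmse_level <- sqrt(mean((y - exp(fitted_log))^2))     # RMSE in the original units
print(c(rmse_log = rmse_log, rmse_level = rmse_level))
```

Only RMSE values computed on the same scale should be compared across models.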
AIC(IED_ARIMA)
## [1] 143.5873
AIC(IED_ARIMA2)
## [1] 1853.051
AICAA
## [1] 152.6035
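# The Akaike Information Criterion (AIC) is a widely used measure of a statistical model. It combines 1) the goodness of fit and 2) the simplicity/parsimony of the model into a single statistic. When comparing two models fitted to the same data, the one with the lower AIC is generally “better”.
# ARIMA(2,1,1) has the lowest AIC (143.59), followed by ARMA(1,1) (152.60). ARIMA(1,1,2) (1853.05) was fitted to the unlogged series, so its AIC is not on the same scale as the other two, and the differenced ARIMA(2,1,1) is fitted to one fewer observation, so it is not strictly comparable with ARMA(1,1) either. Because the ADF test indicates that the series is stationary and needs no differencing, the ARMA model, with an AIC of 152.60, is selected for forecasting.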
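As a worked check of the AIC values reported above, the sketch below recomputes them from the printed log likelihoods using AIC = 2k - 2 logLik. The parameter count k = 4 (three ARMA coefficients plus the innovation variance) is inferred from the printed output, and the helper name `aic` is hypothetical.

```r
# AIC = 2k - 2 * logLik, where k counts the estimated parameters.
aic <- function(log_lik, k) 2 * k - 2 * log_lik

aic_arima_211 <- aic(log_lik = -67.79, k = 4)    # reported above as 143.59
aic_arima_112 <- aic(log_lik = -922.53, k = 4)   # reported above as 1853.05
print(c(aic_arima_211, aic_arima_112))
```

The small differences from the reported values come from rounding in the printed log likelihoods. The very large gap between the two numbers mostly reflects the response scale (logs versus levels), not model quality.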
PIB_Per_Capita: An economic measure that gives an idea of the standard of living or economic well-being of the population of a country or region.
Exportaciones: Indicates Mexico’s capability in global trade, suggesting existing infrastructure and expertise in producing goods and services that meet international standards.
Educacion: Indicates the availability of skilled labor, which is crucial for sectors requiring technical expertise.
Tipo_de_Cambio: The exchange rate is a crucial variable in international economics and finance; it affects investment decisions, trade and, in general, the economic stability of countries.
In describing the above relationships, please include a time series plot that displays the selected variables’ performance over the time period.
ggplot(bd1, aes(x = periodo, y = Educacion)) +
geom_line(color = "pink") +
labs(title = "Time Series of Educacion",
x = "Date",
y = "Educacion") +
theme_minimal()
## Warning: Removed 3 rows containing missing values (`geom_line()`).
ggplot(bd1, aes(x = periodo, y = Inseguridad_Homicidio)) +
geom_line(color = "green") +
labs(title = "Time Series of Inseguridad_Homicidio",
x = "Date",
y = "Inseguridad_Homicidio") +
theme_minimal()
## Warning: Removed 1 row containing missing values (`geom_line()`).
ggplot(bd1, aes(x = periodo, y = Exportaciones)) +
geom_line(color = "purple") +
labs(title = "Time Series of Exportaciones",
x = "Date",
y = "Exportaciones") +
theme_minimal()
ggplot(bd1, aes(x = periodo, y = Tipo_de_Cambio)) +
geom_line(color = "red") +
labs(title = "Time Series of Tipo de cambio",
x = "Date",
y = "Tipo de cambio") +
theme_minimal()
ggplot(bd1, aes(x = periodo, y = Salario_Diario)) +
geom_line(color = "yellow") +
labs(title = "Time Series of Salario diario",
x = "Date",
y = "Salario Diario") +
theme_minimal()
Describe the hypothetical relationship / impact between each selected factor and the dependent variable IED_Flujos. For example, how does the exchange rate increase / reduce the foreign direct investment flows in Mexico?
The variable ‘Educacion’ enters the model in logs. For a 1% increase in ‘Educacion’, foreign direct investment is expected to increase by 2.9459 units; this is statistically significant at the 0.01 level. For a 1% increase in ‘Inseguridad_Homicidio’, foreign direct investment is expected to decrease by 0.2959 units; this is significant at the 0.05 level.
Estimate a VAR_Model that includes at least 1 explanatory factor that might affect the dependent variable IED_Flujos.
# Replace missing values in every numeric column with that column's median
for(column in names(bd1)) {
if(is.numeric(bd1[[column]])) {
bd1[[column]][is.na(bd1[[column]])] <- median(bd1[[column]], na.rm = TRUE)
}
}
# Check if variables are or not stationary
adf.test(bd1$IED_Flujos)
##
## Augmented Dickey-Fuller Test
##
## data: bd1$IED_Flujos
## Dickey-Fuller = -3.0832, Lag order = 2, p-value = 0.1597
## alternative hypothesis: stationary
VAR <- cbind(bd1$IED_Flujos, bd1$PIB_Per_Capita, bd1$Tipo_de_Cambio)
#colnames(VAR)<-cbind("bd2$IED_Flujos", "bd2$PIB_Per_Capita", "bd2$Tipo_de_Cambio")
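# Note: season=52 adds 51 seasonal dummies, more than the 26 annual observations,
# so the fit is exact and the criteria below degenerate to -Inf; annual data need no seasonal dummies.
lag_select<-VARselect(VAR,lag.max=5,type="const", season=52)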
lag_select$selection
## AIC(n) HQ(n) SC(n) FPE(n)
## 1 1 1 1
lag_select$criteria
## 1 2 3 4 5
## AIC(n) -Inf -Inf -Inf -Inf -Inf
## HQ(n) -Inf -Inf -Inf -Inf -Inf
## SC(n) -Inf -Inf -Inf -Inf -Inf
## FPE(n) 0 0 0 0 0
# Transform non-stationary time series variables to stationary
diff_IED<-diff(bd1$IED_Flujos)
diff_GDP<-diff(bd1$PIB_Per_Capita)
diff_Exchange<-diff(bd1$Tipo_de_Cambio)
VARld <- cbind(diff_IED, diff_GDP, diff_Exchange)
colnames(VARld)<-c("IED","GDP","Exchange rate")
VARm1<-VAR(VARld,p=1,type="const",season=NULL,exogen=NULL)
summary(VARm1)
##
## VAR Estimation Results:
## =========================
## Endogenous variables: IED, GDP, Exchange.rate
## Deterministic variables: const
## Sample size: 24
## Log Likelihood: -504.546
## Roots of the characteristic polynomial:
## 0.3893 0.3421 0.2301
## Call:
## VAR(y = VARld, p = 1, type = "const", exogen = NULL)
##
##
## Estimation results for equation IED:
## ====================================
## IED = IED.l1 + GDP.l1 + Exchange.rate.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED.l1 -0.7119 0.1733 -4.107 0.000548 ***
## GDP.l1 0.8930 0.5135 1.739 0.097380 .
## Exchange.rate.l1 -3124.8934 1202.0980 -2.600 0.017144 *
## const 2792.4588 1566.8882 1.782 0.089911 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## Residual standard error: 6844 on 20 degrees of freedom
## Multiple R-Squared: 0.489, Adjusted R-squared: 0.4124
## F-statistic: 6.38 on 3 and 20 DF, p-value: 0.003284
##
##
## Estimation results for equation GDP:
## ====================================
## GDP = IED.l1 + GDP.l1 + Exchange.rate.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED.l1 -0.04424 0.07501 -0.590 0.562
## GDP.l1 0.36042 0.22220 1.622 0.120
## Exchange.rate.l1 -5.24003 520.20423 -0.010 0.992
## const 647.20008 678.06610 0.954 0.351
##
##
## Residual standard error: 2962 on 20 degrees of freedom
## Multiple R-Squared: 0.1195, Adjusted R-squared: -0.01258
## F-statistic: 0.9047 on 3 and 20 DF, p-value: 0.4563
##
##
## Estimation results for equation Exchange.rate:
## ==============================================
## Exchange.rate = IED.l1 + GDP.l1 + Exchange.rate.l1 + const
##
## Estimate Std. Error t value Pr(>|t|)
## IED.l1 3.862e-05 3.286e-05 1.175 0.254
## GDP.l1 2.543e-05 9.734e-05 0.261 0.797
## Exchange.rate.l1 7.416e-02 2.279e-01 0.325 0.748
## const 3.087e-01 2.970e-01 1.039 0.311
##
##
## Residual standard error: 1.297 on 20 degrees of freedom
## Multiple R-Squared: 0.07951, Adjusted R-squared: -0.05856
## F-statistic: 0.5759 on 3 and 20 DF, p-value: 0.6375
##
##
##
## Covariance matrix of residuals:
## IED GDP Exchange.rate
## IED 46844023 4114725.2 -1934.792
## GDP 4114725 8772476.7 -72.103
## Exchange.rate -1935 -72.1 1.683
##
## Correlation matrix of residuals:
## IED GDP Exchange.rate
## IED 1.0000 0.20298 -0.21788
## GDP 0.2030 1.00000 -0.01876
## Exchange.rate -0.2179 -0.01876 1.00000
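For reference, the IED equation estimated above can be written out as follows. The Δ symbols denote first differences, TC stands for Tipo_de_Cambio, and u_t is the residual; this notation is introduced here for illustration.

```latex
\Delta \mathrm{IED}_t = 2792.46 - 0.7119\,\Delta \mathrm{IED}_{t-1}
  + 0.8930\,\Delta \mathrm{GDP}_{t-1} - 3124.89\,\Delta \mathrm{TC}_{t-1} + u_t
```

In this equation only the lagged change in IED (p = 0.0005) and the lagged change in the exchange rate (p = 0.017) are significant at the 5% level; the lagged change in GDP is significant only at the 10% level (p = 0.097).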
Detect if the estimated VAR_Model residuals are stationary.
# Detect if the estimated VAR_Model residuals are stationary.
VARm1_residuals<-data.frame(residuals(VARm1))
adf.test(VARm1_residuals$IED)
##
## Augmented Dickey-Fuller Test
##
## data: VARm1_residuals$IED
## Dickey-Fuller = -3.5146, Lag order = 2, p-value = 0.06187
## alternative hypothesis: stationary
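# The p-value is 0.06, slightly greater than 0.05, so we fail to reject non-stationarity at the 5% level; the residuals are stationary only at the 10% level.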
# Detect if the estimated VAR_Model residuals show serial autocorrelation.
Box.test(VARm1_residuals$IED,lag=1,type="Ljung-Box")
##
## Box-Ljung test
##
## data: VARm1_residuals$IED
## X-squared = 0.47979, df = 1, p-value = 0.4885
#P-value is greater than 0.05, which means no autocorrelation. Autocorrelation, also known as serial correlation, refers to the correlation of a time series with its own past and future values. It's a property of data where values from one time point are not independent from values at another time point. When there is "no autocorrelation," it means that the values in the time series (or the residuals from a regression model) are not related to their preceding (or subsequent) values.
Based on the regression results and diagnostic tests, select the VAR_Model that you consider might generate the best forecast.
Briefly interpret the regression results. That is, is there a statistically significant relationship between the explanatory variable(s) and the main dependent variable?
Is there an instantaneous causality between IED_Flujos and the selected explanatory variables? Estimate a Granger Causality Test to either reject or fail to reject the hypothesis of instantaneous causality.
granferdiff <- causality(VARm1,cause="IED")
granferdiff
## $Granger
##
## Granger causality H0: IED do not Granger-cause GDP Exchange.rate
##
## data: VAR object VARm1
## F-Test = 0.85197, df1 = 2, df2 = 60, p-value = 0.4317
##
##
## $Instant
##
## H0: No instantaneous causality between: IED and GDP Exchange.rate
##
## data: VAR object VARm1
## Chi-squared = 1.9218, df = 2, p-value = 0.3826
# As both p-values are greater than 0.05, we fail to reject either H0: IED does not Granger-cause GDP and the exchange rate (p = 0.43), and there is no instantaneous causality between IED and GDP and the exchange rate (p = 0.38).
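The idea behind a Granger causality test can be sketched with two nested regressions: if adding the lagged values of x significantly improves the prediction of y, then x is said to Granger-cause y. The example below is a simplified single-equation version on simulated data, not the multivariate test that `causality()` runs on the VAR object. All names and coefficients are hypothetical.

```r
set.seed(2024)
n <- 200
x <- numeric(n)
y <- numeric(n)
for (i in 2:n) {
  x[i] <- 0.5 * x[i - 1] + rnorm(1)
  y[i] <- 0.3 * y[i - 1] + 0.8 * x[i - 1] + rnorm(1)   # x leads y by construction
}
lagged <- data.frame(y = y[-1], x = x[-1], y_lag = y[-n], x_lag = x[-n])

# Does x Granger-cause y? Compare the model with and without lagged x.
p_x_to_y <- anova(lm(y ~ y_lag, data = lagged),
                  lm(y ~ y_lag + x_lag, data = lagged))[2, "Pr(>F)"]
# Does y Granger-cause x? Same comparison with the roles swapped.
p_y_to_x <- anova(lm(x ~ x_lag, data = lagged),
                  lm(x ~ x_lag + y_lag, data = lagged))[2, "Pr(>F)"]
print(c(x_to_y = p_x_to_y, y_to_x = p_y_to_x))
```

A small p-value in the first comparison rejects the hypothesis that x does not Granger-cause y.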
Based on the selected VAR_Model, forecast the increasing / decreasing trend of FDI inflows in Mexico for the next 5 periods. Display the forecast in a time series plot.
# VARm1 was fitted to first differences, so this forecasts the change in IED for the next 5 periods
forecast1 <- predict(VARm1,n.ahead=5,ci=0.95)
fanchart(forecast1,names="IED",main="IED_Flujos",xlab="Time Period",ylab="IED_Flujos")
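# Note: dest is a plain series of back-transformed fitted values, so forecast() fits an
# exponential smoothing (ETS) model to it (see the warning below) rather than using the ARMA fit.
Winning_model_forecast<-forecast(dest,h=5)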
## Warning in ets(object, lambda = lambda, biasadj = biasadj,
## allow.multiplicative.trend = allow.multiplicative.trend, : Missing values
## encountered. Using longest contiguous portion of time series
Winning_model_forecast
## Point Forecast Lo 80 Hi 80 Lo 95 Hi 95
## 97 6188.802 5222.402 7155.202 4710.821 7666.783
## 98 6188.802 5222.402 7155.202 4710.821 7666.783
## 99 6188.802 5222.402 7155.202 4710.821 7666.783
## 100 6188.802 5222.402 7155.202 4710.821 7666.783
## 101 6188.802 5222.402 7155.202 4710.821 7666.783
plot(Winning_model_forecast)
autoplot(Winning_model_forecast)
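The warning above shows that `forecast()` fell back to an ETS model, which is why every point forecast is identical. A forecast taken directly from an ARMA(1,1) fit could look like the sketch below. It uses `stats::arima` and `predict` on a simulated stand-in series, so the numbers are illustrative only. The approximate 95% bounds assume normally distributed errors on the log scale.

```r
set.seed(5)
# Hypothetical positive series standing in for IED_Flujos.
y <- exp(8.7 + as.numeric(arima.sim(model = list(ar = 0.3, ma = 0.4), n = 96, sd = 0.4)))

fit <- arima(log(y), order = c(1, 0, 1))   # ARMA(1,1) on the log scale
fc <- predict(fit, n.ahead = 5)            # forecasts and standard errors, log scale

# Back-transform the point forecasts and approximate 95% bounds to the original units.
forecast_table <- data.frame(
  step  = 1:5,
  point = as.numeric(exp(fc$pred)),
  lo95  = as.numeric(exp(fc$pred - 1.96 * fc$se)),
  hi95  = as.numeric(exp(fc$pred + 1.96 * fc$se))
)
print(round(forecast_table, 1))
```

For a stationary ARMA model the point forecasts move toward the series mean as the horizon grows, so a nearly flat path after a few steps is expected.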
Briefly describe the main insights from previous sections. (The interpretation of the models is below each section.)
My winning model is the ARMA model, chosen on the basis of the diagnostic tests and its AIC of 152.60. An ARMA (Autoregressive Moving Average) model forecasts a single variable over time. It combines autoregressive (AR) and moving average (MA) components to describe the autocorrelation in time series data. Here it is used to model the FDI flows in the nearshoring database for Mexico.
Because my winning model is ARMA, it is important to mention that it assumes the time series is stationary, meaning that its statistical properties do not change over time. For non-stationary data, I should use a model such as ARIMA, which includes an integrated (differencing) term. Once the model was fitted, it helped me forecast future values of foreign direct investment flows in Mexico.
Based on the selected results, please share at least 1 recommendation that address the problem situation.
First, I recommend gathering the data for the same periods; the mismatch caused many problems when analyzing the information.
For future investors, I would say that the variables Educacion and Inseguridad_Homicidio are important factors that affect FDI flows, so I would recommend checking these levels first.
A 1-unit increase in the logarithm of “Educacion” is associated with an estimated increase of 2.54806 in the dependent variable, holding other variables constant. This is statistically significant at the 5% level.
A 1-unit increase in “Inseguridad_Homicidio” is associated with a decrease of 0.34275 in the dependent variable, holding other variables constant. This is also statistically significant at the 5% level.
TIBCO (2023, March 24). Time Series Analysis. https://www.tibco.com/reference-center/what-is-time-series-analysis
Figueroa, A. (2023, August 9). The Rise of Nearshoring to Mexico. https://www.nuvocargo.com/en/content/blog-posts/key-cross-border-trends-the-rise-of-nearshoring-to-mexico