Context

What is “Nearshoring” and why might Mexico be attractive for it?
Nearshoring is a business strategy that has gained a lot of strength in recent months. It is based on shortening value chains by bringing production closer to the final market. Compared with its predecessor, “offshoring”, it does not only seek the services of a third country; it also seeks to reduce transportation and logistics costs and transit times, improving the efficiency of the chain and allowing a better connection between different areas of the company (Duran, 2023).

Mexico is considered one of the best alternatives for relocating value chains, mainly for the North American market, one of the largest markets in the world. Mexico has sufficiently developed infrastructure and high-level human capital to attract the attention of American companies. Another factor that makes Mexico a strong alternative for this market is the shared land border, which would greatly reduce transportation costs and times. Nearshoring can be very attractive for Mexico because it can strengthen sectors of the country's economy, and the T-MEC (the United States-Mexico-Canada Agreement) can be used to achieve greater benefits in sectors such as the automotive industry (Duran, 2023).

What is “Predictive Analytics”?
Predictive analytics is one of the three main types of data analysis, alongside descriptive and prescriptive analytics. It is used to estimate what will happen in the future: historical information is collected and used for statistical modeling that defines the likelihood of future outcomes, and machine learning techniques can be applied to improve the analysis. Good predictive analytics requires a good prior descriptive analysis, to understand the nature of the data and the database being used (University of Bath, 2021).

Regression analysis is the main tool of predictive analytics. It is a statistical process that looks for relationships between variables in order to predict future values of a dependent variable using at least one independent variable. Several types of regression models are used in predictive analytics; the main ones are linear regression (with all its variants), lasso regression and ridge regression. Lasso and ridge add a penalty that shrinks the coefficients; lasso can shrink some of them to exactly zero, which helps identify the variables that contribute most to predicting the dependent variable. For a good predictive analysis, it is recommended to fit several models and compare their fit and predictive capacity (Wohlwend, 2023).
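To illustrate the difference between the two penalties, the following R sketch uses a deliberately simplified setting: an orthonormal design (X'X = I), where both estimators have closed forms in terms of the OLS coefficients. The coefficient values and the penalty `lambda` are hypothetical, and real data such as this database are not orthonormal; in practice glmnet solves the general case numerically.

```r
# Sketch: ridge vs lasso shrinkage for an orthonormal design (X'X = I).
# Assumed penalized objectives:
#   ridge: 0.5 * ||y - X b||^2 + (lambda / 2) * ||b||^2
#   lasso: 0.5 * ||y - X b||^2 + lambda * ||b||_1
# Under X'X = I both have closed forms in terms of the OLS estimate b_ols.

ridge_shrink <- function(b_ols, lambda) {
  # Ridge scales every coefficient toward zero but never to exactly zero
  b_ols / (1 + lambda)
}

lasso_shrink <- function(b_ols, lambda) {
  # Lasso soft-thresholds: coefficients with |b| <= lambda become exactly zero
  sign(b_ols) * pmax(abs(b_ols) - lambda, 0)
}

b_ols <- c(3, -0.5, 1.2, 0.1)  # hypothetical OLS coefficients
lambda <- 1                    # hypothetical penalty

ridge_shrink(b_ols, lambda)    # all four coefficients shrunk, none zero
lasso_shrink(b_ols, lambda)    # the two small coefficients are set to zero
```

This is why lasso is the one used for variable selection: ridge keeps every variable in the model with a smaller weight, while lasso drops the weakest ones entirely.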

How can regression analysis help us predict the occurrence of “Nearshoring” in the Mexican case?
Mexico has a large number of economic and social variables that could plausibly explain the nearshoring phenomenon, and many of them have historical records available for analysis. A good database is essential for the regression analysis, which will relate these independent variables to the variable used to measure nearshoring, which in the case of Mexico is foreign direct investment (FDI).

Problem Situation
Starting with the COVID-19 pandemic, many sectors and countries were affected in various ways. One of the most affected countries was China, as it was the epicenter of the pandemic. Because the production of the largest economic markets was concentrated in China, the effects of the pandemic there had an international impact: supply chains were broken, affecting the global economy.
What happened in China made these markets reconsider concentrating their production in the country and look for alternatives. For the American market, the most attractive option seems to be nearshoring: transferring production processes to countries like Mexico that have large capacity, available labor and a proximity that would reduce costs and facilitate shipments to the American market.

# Import the database into the R Markdown document
bd = read.csv("C:\\Users\\Silva\\Documents\\Tec\\CSV\\Semestre5\\sp_data.csv")

Libraries

# Load the libraries used in the analysis
library(tidyverse)
library(ggplot2)
library(corrplot)
library(gmodels)
library(effects)
library(stargazer)
library(olsrr)        
library(kableExtra)
library(jtools)
library(fastmap)
library(dlookr)
library(Hmisc)
library(naniar)
library(glmnet)
library(caret)
library(car)
library(lmtest)
library(xts)
library(dygraphs)
library(tseries)

Exploratory Data Analysis (EDA)

Missing Values

# Identifying missing values 
missing_values = colSums(is.na(bd))
missing_values
##               periodo                 IED_M       Exportaciones_m 
##                     0                     0                     0 
##                Empleo             Educacion        Salario_Diario 
##                     3                     3                     0 
##            Innovacion      Inseguridad_Robo Inseguridad_Homicidio 
##                     2                     0                     1 
##        Tipo_de_Cambio    Densidad_Carretera    Densidad_Poblacion 
##                     0                     0                     0 
##         CO2_Emisiones        PIB_Per_Capita                  INPC 
##                     3                     0                     0 
##     crisis_financiera 
##                     0
## Missing values were found in five variables (Empleo, Educacion, Innovacion, Inseguridad_Homicidio and CO2_Emisiones). Since the database has few records, it is important to keep every row, so the missing values are imputed with the median of each variable to affect the analysis as little as possible.

## Impute missing values with the median of each variable
bd <- bd %>%
  mutate(across(everything(), ~ifelse(is.na(.), median(., na.rm = TRUE), .)))

Data Structure

## Structure of data
str(bd)
## 'data.frame':    26 obs. of  16 variables:
##  $ periodo              : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ IED_M                : num  294151 210876 299734 362632 546548 ...
##  $ Exportaciones_m      : num  220091 248691 235961 248057 205483 ...
##  $ Empleo               : num  96.5 96.5 96.5 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
##  $ crisis_financiera    : int  0 0 0 0 0 0 0 0 0 0 ...

Descriptive Statistics

summary(bd)
##     periodo         IED_M        Exportaciones_m      Empleo     
##  Min.   :1997   Min.   :210876   Min.   :205483   Min.   :95.06  
##  1st Qu.:2003   1st Qu.:368560   1st Qu.:262337   1st Qu.:96.08  
##  Median :2010   Median :497054   Median :366294   Median :96.53  
##  Mean   :2010   Mean   :493596   Mean   :433856   Mean   :96.48  
##  3rd Qu.:2016   3rd Qu.:578606   3rd Qu.:632356   3rd Qu.:97.01  
##  Max.   :2022   Max.   :754438   Max.   :785655   Max.   :97.83  
##    Educacion     Salario_Diario     Innovacion    Inseguridad_Robo
##  Min.   :7.200   Min.   : 24.30   Min.   :11.28   Min.   :120.5   
##  1st Qu.:7.957   1st Qu.: 41.97   1st Qu.:12.60   1st Qu.:148.3   
##  Median :8.460   Median : 54.48   Median :13.09   Median :181.8   
##  Mean   :8.428   Mean   : 65.16   Mean   :13.10   Mean   :185.4   
##  3rd Qu.:8.925   3rd Qu.: 72.31   3rd Qu.:13.61   3rd Qu.:209.9   
##  Max.   :9.580   Max.   :172.87   Max.   :15.11   Max.   :314.8   
##  Inseguridad_Homicidio Tipo_de_Cambio  Densidad_Carretera Densidad_Poblacion
##  Min.   : 8.04         Min.   : 8.06   Min.   :0.05000    Min.   :47.44     
##  1st Qu.:10.40         1st Qu.:10.75   1st Qu.:0.06000    1st Qu.:52.77     
##  Median :16.93         Median :13.02   Median :0.07000    Median :58.09     
##  Mean   :17.28         Mean   :13.91   Mean   :0.07115    Mean   :57.33     
##  3rd Qu.:22.34         3rd Qu.:18.49   3rd Qu.:0.08000    3rd Qu.:61.39     
##  Max.   :29.59         Max.   :20.66   Max.   :0.09000    Max.   :65.60     
##  CO2_Emisiones   PIB_Per_Capita        INPC        crisis_financiera
##  Min.   :3.590   Min.   :126739   Min.   : 33.28   Min.   :0.00000  
##  1st Qu.:3.842   1st Qu.:130964   1st Qu.: 56.15   1st Qu.:0.00000  
##  Median :3.930   Median :136845   Median : 73.35   Median :0.00000  
##  Mean   :3.943   Mean   :138550   Mean   : 75.17   Mean   :0.07692  
##  3rd Qu.:4.090   3rd Qu.:146148   3rd Qu.: 91.29   3rd Qu.:0.00000  
##  Max.   :4.220   Max.   :153236   Max.   :126.48   Max.   :1.00000

Dependent Variable Analysis

Statistics

describe(bd$IED_M)
## bd$IED_M 
##        n  missing distinct     Info     Mean      Gmd      .05      .10 
##       26        0       26        1   493596   167384   295547   312026 
##      .25      .50      .75      .90      .95 
##   368560   497054   578606   691611   700045 
## 
## lowest : 210876 294151 299734 324318 350979, highest: 671018 683318 699904 700092 754437

Normality

# Histogram of the dependent variable to check the normality of its distribution
hist(bd$IED_M)

# Histogram of the natural logarithm of the dependent variable to ...
hist(log(bd$IED_M))

FDI flows through the years

Foreign direct investment, the dependent variable, is highly volatile: it has had many ups and downs over the years but generally maintains a positive trend, with notable lows in 2009 and 2011.

Data Visualization

Employment rate

The employment rate shows a lot of variation relative to the dependent variable, and no clear positive or negative trend can be identified in their relationship; the relationship between the two may be non-linear.

Years of schooling

The average level of schooling in Mexico seems to have a positive relationship with foreign investment. This variable may be an indicator of labor capacity, which may attract the attention of foreign companies.

Homicide rate

The relationship between foreign direct investment and the homicide rate is also volatile. The graph suggests a positive relationship, which is counterintuitive and raises many doubts; the homicide rate may turn out to be irrelevant for explaining the dependent variable.

GDP per capita

GDP per capita may be one of the economic variables with the greatest impact on foreign direct investment. The graph shows a positive relationship between the two: the better Mexico does economically, the greater foreign investment tends to be.

Correlation

Most of the variables in the database have strong positive correlations with each other and with the dependent variable. Employment and theft stand out as the variables with negative correlations with the others. Having a financial crisis in the country seems to be unrelated to any other variable.

Estimation method
The estimation method will be Ordinary Least Squares (OLS), the most widely used method for estimating linear regression models, which will be used to analyze and predict the possible values of foreign direct investment. OLS chooses the coefficients that minimize the sum of the squared differences between the observed values and the values predicted by the model (XLSTAT, 2023).
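As a minimal sketch of what OLS computes, the R code below obtains the coefficients from the normal equations, beta = (X'X)^(-1) X'y, and checks them against lm(). It uses simulated data (the names x1, x2 and the true coefficients are illustrative), not the FDI database.

```r
# Sketch: OLS estimates from the normal equations, checked against lm().
set.seed(1)
n <- 30
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 2 + 1.5 * x1 - 0.7 * x2 + rnorm(n, sd = 0.5)   # simulated response

X <- cbind(1, x1, x2)                               # design matrix with an intercept column
beta_hat <- solve(crossprod(X), crossprod(X, y))    # solves (X'X) b = X'y

fit <- lm(y ~ x1 + x2)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))

# Sum of squared residuals at the OLS solution (the quantity OLS minimizes)
sum((y - X %*% beta_hat)^2)
```

Solving the linear system with solve(A, b) avoids explicitly inverting X'X, which is numerically safer; lm() itself uses a QR decomposition.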

Linear Regression Analysis

Hypotheses

Hypothesis 1
H0: The variable “PIB_Per_Capita” has no significant impact on the dependent variable.
H1: The variable “PIB_Per_Capita” has a positive and significant impact on the dependent variable.

Hypothesis 2
H0: The variable “Empleo” does not have a linear impact on the dependent variable.
H1: The variable “Empleo” has a linear impact on the dependent variable.

Hypothesis 3
H0: The variable “Inseguridad_Homicidio” does not have a positive impact on the dependent variable.
H1: The variable “Inseguridad_Homicidio” has a positive impact on the dependent variable.

Regression models

Model 1

modelo_2 = lm(IED_M ~ Tipo_de_Cambio + Empleo + Salario_Diario + PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo,data= bd)
summary(modelo_2)
## 
## Call:
## lm(formula = IED_M ~ Tipo_de_Cambio + Empleo + Salario_Diario + 
##     PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo, data = bd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -148196  -35045  -14655   27300  216923 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)      -3.474e+06  4.055e+06  -0.857   0.4022  
## Tipo_de_Cambio    6.856e+03  1.519e+04   0.451   0.6569  
## Empleo            2.553e+04  3.583e+04   0.712   0.4848  
## Salario_Diario   -1.285e+03  1.159e+03  -1.109   0.2814  
## PIB_Per_Capita    1.195e+01  4.824e+00   2.477   0.0228 *
## CO2_Emisiones    -6.299e+03  1.718e+05  -0.037   0.9711  
## Inseguridad_Robo -7.392e+02  6.260e+02  -1.181   0.2522  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 95320 on 19 degrees of freedom
## Multiple R-squared:  0.6663, Adjusted R-squared:  0.5609 
## F-statistic: 6.323 on 6 and 19 DF,  p-value: 0.0008732
# AIC of the model, used to compare the models and select one (lower is better)
AIC(modelo_2)
## [1] 677.8074
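The t value and p-value reported for PIB_Per_Capita in Model 1 can be reproduced from the printed estimate and standard error, which shows how each coefficient test in the summary works. Because Hypothesis 1 is directional (a positive impact), a one-sided p-value, half of the two-sided one when the estimate is positive, is arguably the more natural test. This sketch computes both from the rounded values in the stargazer table, so the last digits may differ slightly from summary().

```r
# Reproduce the t test for PIB_Per_Capita in Model 1 (modelo_2)
# from the printed estimate, standard error and residual degrees of freedom.
estimate <- 11.949   # coefficient of PIB_Per_Capita (stargazer output)
std_error <- 4.824   # its standard error
df_resid <- 19       # 26 observations - 7 estimated coefficients

t_value <- estimate / std_error
p_two_sided <- 2 * pt(-abs(t_value), df = df_resid)            # H1: coefficient != 0
p_one_sided <- pt(t_value, df = df_resid, lower.tail = FALSE)  # H1: coefficient > 0

round(c(t = t_value, p_two_sided = p_two_sided, p_one_sided = p_one_sided), 4)
```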

Model 2

modelo_5 = lm(log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo,data= bd)
summary(modelo_5)
## 
## Call:
## lm(formula = log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + 
##     PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31681 -0.07548 -0.03732  0.09449  0.40316 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       4.937e+00  8.801e+00   0.561   0.5814  
## Tipo_de_Cambio    1.159e-02  3.297e-02   0.352   0.7290  
## Empleo            5.597e-02  7.777e-02   0.720   0.4805  
## Salario_Diario   -2.671e-03  2.516e-03  -1.061   0.3018  
## PIB_Per_Capita    2.432e-05  1.047e-05   2.323   0.0314 *
## CO2_Emisiones    -3.903e-02  3.729e-01  -0.105   0.9177  
## Inseguridad_Robo -2.566e-03  1.359e-03  -1.888   0.0743 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2069 on 19 degrees of freedom
## Multiple R-squared:  0.6772, Adjusted R-squared:  0.5752 
## F-statistic: 6.643 on 6 and 19 DF,  p-value: 0.0006555
# AIC of the model, used to compare the models and select one (lower is better)
AIC(modelo_5)
## [1] -0.3009011

Model 3

modelo_6 = lm(log(IED_M) ~ log(Tipo_de_Cambio) + log(Empleo) + log(Salario_Diario) + log(PIB_Per_Capita) + log(CO2_Emisiones) + log(Inseguridad_Robo),data= bd)
summary(modelo_6)
## 
## Call:
## lm(formula = log(IED_M) ~ log(Tipo_de_Cambio) + log(Empleo) + 
##     log(Salario_Diario) + log(PIB_Per_Capita) + log(CO2_Emisiones) + 
##     log(Inseguridad_Robo), data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33129 -0.10150 -0.02987  0.07535  0.43537 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           -58.8950    41.9032  -1.405   0.1760  
## log(Tipo_de_Cambio)     0.1503     0.5673   0.265   0.7938  
## log(Empleo)             6.2551     8.0883   0.773   0.4488  
## log(Salario_Diario)    -0.1777     0.2911  -0.610   0.5489  
## log(PIB_Per_Capita)     3.8096     1.5341   2.483   0.0225 *
## log(CO2_Emisiones)      0.3394     1.5077   0.225   0.8243  
## log(Inseguridad_Robo)  -0.3563     0.3023  -1.178   0.2532  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2187 on 19 degrees of freedom
## Multiple R-squared:  0.6394, Adjusted R-squared:  0.5255 
## F-statistic: 5.615 on 6 and 19 DF,  p-value: 0.0017
# AIC of the model, used to compare the models and select one (lower is better)
AIC(modelo_6)
## [1] 2.575984

Comparison of models

stargazer(modelo_2,modelo_5,modelo_6,type="text",title="OLS Regression Results",single.row=TRUE,ci=FALSE,ci.level=0.9)
## 
## OLS Regression Results
## =================================================================================================
##                                                       Dependent variable:                        
##                               -------------------------------------------------------------------
##                                           IED_M                           log(IED_M)             
##                                            (1)                       (2)               (3)       
## -------------------------------------------------------------------------------------------------
## Tipo_de_Cambio                    6,855.639 (15,189.400)        0.012 (0.033)                    
## Empleo                           25,527.140 (35,829.660)        0.056 (0.078)                    
## Salario_Diario                    -1,285.295 (1,159.315)       -0.003 (0.003)                    
## PIB_Per_Capita                       11.949** (4.824)        0.00002** (0.00001)                 
## CO2_Emisiones                    -6,298.513 (171,784.000)      -0.039 (0.373)                    
## Inseguridad_Robo                    -739.248 (626.042)         -0.003* (0.001)                   
## log(Tipo_de_Cambio)                                                               0.150 (0.567)  
## log(Empleo)                                                                       6.255 (8.088)  
## log(Salario_Diario)                                                               -0.178 (0.291) 
## log(PIB_Per_Capita)                                                              3.810** (1.534) 
## log(CO2_Emisiones)                                                                0.339 (1.508)  
## log(Inseguridad_Robo)                                                             -0.356 (0.302) 
## Constant                      -3,474,435.000 (4,054,579.000)    4.937 (8.801)    -58.895 (41.903)
## -------------------------------------------------------------------------------------------------
## Observations                                26                       26                 26       
## R2                                        0.666                     0.677             0.639      
## Adjusted R2                               0.561                     0.575             0.526      
## Residual Std. Error (df = 19)           95,316.190                  0.207             0.219      
## F Statistic (df = 6; 19)                 6.323***                 6.643***           5.615***    
## =================================================================================================
## Note:                                                                 *p<0.1; **p<0.05; ***p<0.01

Diagnostic test

Model 2

# Multicollinearity: variance inflation factors (VIF)
vif(modelo_5)
##   Tipo_de_Cambio           Empleo   Salario_Diario   PIB_Per_Capita 
##        10.934894         1.835855         4.752951         5.027690 
##    CO2_Emisiones Inseguridad_Robo 
##         2.631591         2.450383
# Heteroscedasticity: studentized Breusch-Pagan test
bptest(modelo_5)
## 
##  studentized Breusch-Pagan test
## 
## data:  modelo_5
## BP = 5.7852, df = 6, p-value = 0.4477
# Normality of residuals: histogram of the residuals
hist(residuals(modelo_5))

In Model 2, most VIF values are below 5, but the exchange rate has a VIF of about 10.9, above the common rule-of-thumb threshold of 10, so multicollinearity cannot be completely ruled out for that variable and its standard error may be inflated. The studentized Breusch-Pagan test gives a p-value of 0.45, greater than 0.05, so we fail to reject H0 (homoscedastic residuals): there is no evidence of heteroscedasticity in the model.
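Both diagnostics can be computed by hand, which makes their interpretation clearer. The sketch below, on simulated data with a deliberately collinear regressor, computes VIF_j = 1 / (1 - R²_j) from the regression of each regressor on the others, and the studentized (Koenker) Breusch-Pagan statistic as n·R² from the regression of the squared residuals on the regressors, which to our understanding is what `lmtest::bptest()` reports by default. The helper names `vif_by_hand()` and `bp_by_hand()` are hypothetical.

```r
# Sketch: VIF and studentized Breusch-Pagan statistic for any lm fit.
vif_by_hand <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]      # regressors without the intercept
  sapply(colnames(X), function(j) {
    others <- X[, colnames(X) != j, drop = FALSE]
    r2 <- summary(lm(X[, j] ~ others))$r.squared  # R^2 of x_j on the other regressors
    1 / (1 - r2)                                  # VIF_j = 1 / (1 - R^2_j)
  })
}

bp_by_hand <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]
  e2 <- residuals(fit)^2
  aux <- lm(e2 ~ X)                               # auxiliary regression of squared residuals
  stat <- length(e2) * summary(aux)$r.squared     # studentized BP statistic: n * R^2
  c(BP = stat, df = ncol(X), p.value = pchisq(stat, df = ncol(X), lower.tail = FALSE))
}

set.seed(42)
d <- data.frame(x1 = rnorm(40), x2 = rnorm(40))
d$x3 <- d$x1 + rnorm(40, sd = 0.3)   # x3 is deliberately correlated with x1
d$y <- 1 + d$x1 - d$x2 + rnorm(40)
fit <- lm(y ~ x1 + x2 + x3, data = d)

round(vif_by_hand(fit), 2)   # x1 and x3 get large VIFs, x2 stays close to 1
round(bp_by_hand(fit), 4)
```

Applied to modelo_5, `vif_by_hand()` should agree with `car::vif()` for models without factor terms.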

Selected Model

Selection criteria

The model selection is based mainly on the AIC (Akaike Information Criterion), which balances goodness of fit against the number of parameters; among comparable models, the one with the lower AIC is preferred. AIC values are only directly comparable between models with the same dependent variable, so the AIC of Model 1 (IED_M in levels) cannot be compared directly with those of Models 2 and 3 (log(IED_M)). The R2 of each model is also taken into account, as it indicates the proportion of the variance of the dependent variable that the model explains. Finally, the selection is confirmed by checking the model for multicollinearity and heteroscedasticity.
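If a comparison across dependent-variable scales is wanted, the AIC of a log-response model can be moved to the scale of the original response by adding the Jacobian term 2·Σ log(y). The sketch below shows this on simulated data; applied to this document it would correspond to something like `AIC(modelo_5) + 2 * sum(log(bd$IED_M))` versus `AIC(modelo_2)`, which has not been computed here.

```r
# Sketch: put the AIC of a log-response model on the scale of the original
# response. If z = log(y), then logLik_y = logLik_z - sum(log(y)), so
#   AIC_y = AIC_z + 2 * sum(log(y)).
set.seed(7)
n <- 26
x <- rnorm(n)
y <- exp(12 + 0.3 * x + rnorm(n, sd = 0.2))   # simulated positive response

fit_level <- lm(y ~ x)
fit_log <- lm(log(y) ~ x)

aic_log_on_y_scale <- AIC(fit_log) + 2 * sum(log(y))

c(level = AIC(fit_level), log_raw = AIC(fit_log), log_adjusted = aic_log_on_y_scale)
```

Only the `level` and `log_adjusted` values are on the same scale; `log_raw` is the kind of number that `AIC(modelo_5)` reports.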

Model

Model 2 is selected: it has a lower AIC than Model 3 (the other model with log(IED_M) as the dependent variable), the highest R2 and adjusted R2 of the three models, and the Breusch-Pagan test shows no evidence of heteroscedasticity. Because this model uses the natural logarithm of the dependent variable, each coefficient multiplied by 100 is approximately the percentage change in FDI associated with a one-unit increase in the corresponding variable. The explanatory variables of this model are “Tipo_de_Cambio”, “Empleo”, “Salario_Diario”, “PIB_Per_Capita”, “CO2_Emisiones” and “Inseguridad_Robo”.
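A short sketch of the percentage interpretation, using the coefficients of PIB_Per_Capita and Inseguridad_Robo printed for Model 2. The step sizes (1,000 units of GDP per capita and 10 units of robbery) are illustrative choices, not part of the original analysis. The exact change implied by the log model is 100·(exp(b·Δx) - 1), which the 100·b·Δx shortcut approximates when b·Δx is small.

```r
# Sketch: convert log-level coefficients into percentage changes in FDI.
# Coefficients copied from the Model 2 (modelo_5) summary; the step sizes
# are illustrative, not part of the original analysis.
coefs <- c(PIB_Per_Capita = 2.432e-05, Inseguridad_Robo = -2.566e-03)
step  <- c(PIB_Per_Capita = 1000,      Inseguridad_Robo = 10)

approx_pct <- 100 * coefs * step              # quick approximation: 100 * b * delta
exact_pct  <- 100 * (exp(coefs * step) - 1)   # exact change implied by the log model

round(cbind(approx_pct, exact_pct), 2)
```

Under these assumptions, 1,000 more units of GDP per capita go with roughly 2.5% more FDI, and 10 more units of robbery with roughly 2.5% less, holding the other variables constant.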

## 
## Call:
## lm(formula = log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + 
##     PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31681 -0.07548 -0.03732  0.09449  0.40316 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       4.937e+00  8.801e+00   0.561   0.5814  
## Tipo_de_Cambio    1.159e-02  3.297e-02   0.352   0.7290  
## Empleo            5.597e-02  7.777e-02   0.720   0.4805  
## Salario_Diario   -2.671e-03  2.516e-03  -1.061   0.3018  
## PIB_Per_Capita    2.432e-05  1.047e-05   2.323   0.0314 *
## CO2_Emisiones    -3.903e-02  3.729e-01  -0.105   0.9177  
## Inseguridad_Robo -2.566e-03  1.359e-03  -1.888   0.0743 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2069 on 19 degrees of freedom
## Multiple R-squared:  0.6772, Adjusted R-squared:  0.5752 
## F-statistic: 6.643 on 6 and 19 DF,  p-value: 0.0006555

Interpretation

The selected model has an R2 of 0.68 (adjusted R2 of 0.58) and an AIC of -0.30. Among the independent variables, “PIB_Per_Capita” is the most significant predictor of foreign direct investment (p = 0.031) and has a positive impact; its coefficient is small only because GDP per capita takes large values (between about 127,000 and 153,000 in the database), so each one-unit increase produces a very small percentage change. “Inseguridad_Robo” is the second most significant variable (p = 0.074, significant only at the 10% level) and has a negative impact: the higher the theft, the lower the foreign direct investment. “Tipo_de_Cambio” and “Empleo” have positive coefficients, while “CO2_Emisiones” and “Salario_Diario” have negative coefficients; however, none of these four is statistically significant.

Glossary of variables
- “Tipo_de_Cambio” = Exchange rate
- “Empleo” = Employment rate
- “Salario_Diario” = Daily salary
- “PIB_Per_Capita” = GDP per capita
- “CO2_Emisiones” = CO2 emissions
- “Inseguridad_Robo” = Insecurity (robbery)

Added-variable plots

avPlots(modelo_5)
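Each panel of `avPlots()` is an added-variable (partial regression) plot. The sketch below, on simulated data with illustrative variable names, shows the idea behind it (the Frisch-Waugh-Lovell theorem): after removing the effect of the other regressors from both the dependent variable and x1, the slope of the residuals equals the coefficient of x1 in the full model.

```r
# Sketch: what an added-variable plot shows (Frisch-Waugh-Lovell theorem).
set.seed(3)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$x3 <- 0.5 * d$x1 + rnorm(n)
d$y <- 1 + 2 * d$x1 - d$x2 + 0.8 * d$x3 + rnorm(n)

full <- lm(y ~ x1 + x2 + x3, data = d)

# Partial out x2 and x3 from both y and x1
e_y  <- residuals(lm(y ~ x2 + x3, data = d))
e_x1 <- residuals(lm(x1 ~ x2 + x3, data = d))
av_slope <- coef(lm(e_y ~ e_x1))[["e_x1"]]

c(full_model = coef(full)[["x1"]], added_variable = av_slope)

# The same idea that car::avPlots() draws for each regressor
plot(e_x1, e_y, main = "Added-variable plot for x1")
abline(lm(e_y ~ e_x1))
```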

Insights

  • Good results in the country's economic indicators encourage foreign companies to invest in Mexico.
  • High daily wages may reduce foreign direct investment: even when companies seek proximity, they still look for cheap, good-quality labor. The wage coefficient, however, is not statistically significant in the selected model.
  • Poor environmental conditions may reduce companies' interest in investing in the country.
  • Social problems such as theft have a negative impact on investment; theft can be considered an indicator related to culture.
  • Nearshoring, measured as foreign direct investment, has increased substantially in the country over time. Starting in 2016 there was a small decline, but the long-run positive trend suggests that it may continue to increase in future years.

Appendix

Lasso Model

set.seed(123)                                
training.samples<-bd$IED_M %>%
  createDataPartition(p=0.75,list=FALSE)       

train.data<-bd[training.samples, ]   
test.data<-bd[-training.samples, ]

selected_model = lm(log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo, data=bd) 
summary(selected_model)
## 
## Call:
## lm(formula = log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + 
##     PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.31681 -0.07548 -0.03732  0.09449  0.40316 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       4.937e+00  8.801e+00   0.561   0.5814  
## Tipo_de_Cambio    1.159e-02  3.297e-02   0.352   0.7290  
## Empleo            5.597e-02  7.777e-02   0.720   0.4805  
## Salario_Diario   -2.671e-03  2.516e-03  -1.061   0.3018  
## PIB_Per_Capita    2.432e-05  1.047e-05   2.323   0.0314 *
## CO2_Emisiones    -3.903e-02  3.729e-01  -0.105   0.9177  
## Inseguridad_Robo -2.566e-03  1.359e-03  -1.888   0.0743 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2069 on 19 degrees of freedom
## Multiple R-squared:  0.6772, Adjusted R-squared:  0.5752 
## F-statistic: 6.643 on 6 and 19 DF,  p-value: 0.0006555
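# RMSE of the OLS model on the test set: predict log(IED_M) for the test rows and
# back-transform with exp() so the errors are in the same units as test.data$IED_M.
# selected_model was fitted on all of bd (including the test rows), so this is an
# in-sample error; refit it on train.data for a fair comparison with the lasso.
RMSE(exp(predict(selected_model, newdata = test.data)), test.data$IED_M)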
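# Predictor matrix (intercept column removed) and response for the lasso.
# y is IED_M in levels, so the lasso coefficients are on the original scale.
x = model.matrix(log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo, train.data)[,-1]
y = train.data$IED_M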

set.seed(123) 
cv.lasso<-cv.glmnet(x,y,alpha=1)

cv.lasso$lambda.min 
## [1] 2865.87
lassomodel<-glmnet(x,y,alpha=1,lambda=cv.lasso$lambda.min)

coef(lassomodel)
## 7 x 1 sparse Matrix of class "dgCMatrix"
##                             s0
## (Intercept)      -3.500878e+06
## Tipo_de_Cambio    1.397916e+04
## Empleo            2.945387e+04
## Salario_Diario   -1.154902e+03
## PIB_Per_Capita    8.214213e+00
## CO2_Emisiones     .           
## Inseguridad_Robo -6.260518e+02
x.test<-model.matrix(log(IED_M) ~ Tipo_de_Cambio + Empleo + Salario_Diario + PIB_Per_Capita + CO2_Emisiones + Inseguridad_Robo,test.data)[,-1]

lassopredictions <- lassomodel %>% predict(x.test) %>% as.vector()

data.frame(
  RMSE = RMSE(lassopredictions, test.data$IED_M),
  Rsquare = R2(lassopredictions, test.data$IED_M))
##     RMSE  Rsquare
## 1 133038 0.606517
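The two test-set metrics above can be written in a few lines of base R. This sketch assumes that caret's `RMSE()` is the root mean squared error and that its `R2()` default is the squared correlation between predictions and observations; the values are hypothetical. Both vectors must refer to the same test observations and be on the same scale, so predictions from a model of log(IED_M) need to be back-transformed with exp() before being compared with IED_M.

```r
# Sketch of the two test-set metrics in base R.
rmse <- function(pred, obs) {
  stopifnot(length(pred) == length(obs))   # both vectors must describe the same rows
  sqrt(mean((pred - obs)^2))
}
r2_corr <- function(pred, obs) cor(pred, obs)^2   # squared correlation

obs  <- c(500, 620, 580, 700, 450)   # hypothetical observed values
pred <- c(520, 600, 610, 680, 470)   # hypothetical predictions

c(RMSE = rmse(pred, obs), R2 = r2_corr(pred, obs))

# For a model fitted to log(IED_M), back-transform before comparing:
log_pred <- log(pred)                # pretend these came from a log-scale model
rmse(exp(log_pred), obs)             # now in the same units as obs
```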
# Lasso model graph
lbs_fun <- function(fit, offset_x=1, ...) {
  L <- length(fit$lambda)
  x <- log(fit$lambda[L])+ offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels=labs, ...)
}

lasso<-glmnet(scale(x),y,alpha=1)

plot(lasso,xvar="lambda",label=T)
lbs_fun(lasso)
abline(v=cv.lasso$lambda.min,col="red",lty=2)
abline(v=cv.lasso$lambda.1se,col="blue",lty=2)

Stationarity in FDI Flows series

bd$periodo<- as.Date(paste0(bd$periodo, "-01-01"))

bdxts<-xts(bd$IED_M,order.by=bd$periodo)

dygraph(bdxts, main = "Foreign Direct Investment Flows") %>% 
  dyOptions(colors = RColorBrewer::brewer.pal(4, "Dark2")) %>%
  dyShading(from = "2018-12-3",
            to = "2022-12-26", 
            color = "#FFE6E6")
# Stationary test
adf.test(bd$IED_M)
## 
##  Augmented Dickey-Fuller Test
## 
## data:  bd$IED_M
## Dickey-Fuller = -2.0122, Lag order = 2, p-value = 0.5677
## alternative hypothesis: stationary

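With a p-value of 0.57, we fail to reject H0 (the series has a unit root), so the FDI series appears to be non-stationary. With only 26 annual observations the test has limited power, so this result should be interpreted with caution.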
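To show what the test statistic is, the sketch below reproduces the Augmented Dickey-Fuller regression with a constant and a linear trend, using a lag order of trunc((n - 1)^(1/3)), which gives the lag order of 2 reported above for 26 observations. It is a sketch of the regression as we understand `tseries::adf.test()` sets it up, run on a simulated random walk, and the helper name `adf_stat()` is hypothetical. The statistic must be compared with Dickey-Fuller critical values rather than the Student t distribution, which is why `adf.test()` interpolates its p-value from tables.

```r
# Sketch: Augmented Dickey-Fuller regression with constant and trend,
#   diff(y)_t = a + b * t + g * y_{t-1} + sum_i d_i * diff(y)_{t-i} + e_t,
# where the test statistic is the t ratio of g (H0: g = 0, a unit root).
adf_stat <- function(y, k = trunc((length(y) - 1)^(1/3))) {
  dy <- diff(y)
  n <- length(dy)
  z <- embed(dy, k + 1)       # column 1: diff(y)_t, columns 2..k+1: its lags
  dy_t <- z[, 1]
  y_lag <- y[(k + 1):n]       # level y_{t-1}, aligned with dy_t
  trend <- (k + 1):n
  if (k > 0) {
    dy_lags <- z[, -1, drop = FALSE]
    fit <- lm(dy_t ~ y_lag + trend + dy_lags)
  } else {
    fit <- lm(dy_t ~ y_lag + trend)
  }
  list(statistic = summary(fit)$coefficients["y_lag", "t value"], lag_order = k)
}

set.seed(10)
random_walk <- cumsum(rnorm(26))   # simulated non-stationary series
adf_stat(random_walk)
```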

Serial Autocorrelation

acf(bd$IED_M,main="Significant Autocorrelations")

The dependent variable shows significant serial autocorrelation at the first lags (up to lag 2); at further lags, the autocorrelation cannot be considered significant.
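The bars and dashed lines of the acf() plot can be reproduced directly: the sample autocorrelation at lag h is the sum of products of centered values h periods apart divided by the total sum of squares, and the dashed lines mark the approximate 95% band ±1.96/√n. The sketch below uses a simulated series of 26 values and a hypothetical helper `sample_acf()`, not the FDI data.

```r
# Sketch: sample autocorrelation as computed by acf(), plus the approximate
# 95% significance band drawn as dashed lines in its plot.
sample_acf <- function(y, max_lag) {
  n <- length(y)
  yc <- y - mean(y)
  denom <- sum(yc^2)
  sapply(1:max_lag, function(h) sum(yc[(h + 1):n] * yc[1:(n - h)]) / denom)
}

set.seed(5)
y <- cumsum(rnorm(26))                  # simulated persistent series, 26 "years"
r <- sample_acf(y, 5)
band <- qnorm(0.975) / sqrt(length(y))  # about 0.38 for n = 26

data.frame(lag = 1:5, acf = round(r, 3), significant = abs(r) > band)
```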

References

Duran, R. (2023). Nearshoring: 10 preguntas y respuestas sobre el tema del que todos hablan. EGADE. https://egade.tec.mx/es/egade-ideas/investigacion/nearshoring-10-preguntas-y-respuestas-sobre-el-tema-del-que-todos-hablan

University of Bath. (2021). Descriptive, predictive and prescriptive: three types of business analytics. The University of Bath. https://online.bath.ac.uk/content/descriptive-predictive-and-prescriptive-three-types-business-analytics#:~:text=There%20are%20three%20types%20of,should%20happen%20in%20the%20future

Wohlwend, B. (2023). Three Regression Models for Data Science: Linear Regression, Lasso Regression, and Ridge Regression. Medium. https://medium.com/@brandon93.w/three-regression-models-for-data-science-linear-regression-lasso-regression-and-ridge-regression-6aac73c0d7a5#:~:text=Comparison%20of%20Linear%2C%20Lasso%2C%20and%20Ridge%20Regression&text=Model%20Complexity%20and%20Overfitting%3A%20All,to%20limit%20the%20model’s%20complexity

XLSTAT. (2023). Ordinary Least Squares regression (OLS). XLSTAT. https://www.xlstat.com/en/solutions/features/ordinary-least-squares-regression-ols

