Background

In an increasingly globalized and competitive business world, organizations are constantly seeking innovative strategies to streamline their operations and stay ahead in their industries. One strategy that has gained prominence in recent years is “Nearshoring”, an outsourcing approach in which activities and services are delegated to providers located in geographically close countries. In this context, Mexico emerges as an attractive nearshoring destination thanks to its strategic location, skilled workforce and favorable economic conditions.

Mexico has a privileged geographical location: it shares a border with the United States and was part of the North American Free Trade Agreement (NAFTA), now replaced by the USMCA (T-MEC in Spanish), which facilitates access to one of the largest and most dynamic markets in the world. This geographic proximity not only reduces transportation costs and times but also facilitates real-time communication and collaboration between parent companies and their nearshoring partners. In addition, Mexico has built a network of trade agreements that grant tariff advantages and ease the flow of goods, which provides an attractive platform for subcontracting operations.

The country also has a highly skilled and diversified workforce, ranging from engineers and technology professionals to experts in manufacturing and financial services. Labor costs in Mexico are competitive compared with other outsourcing destinations, allowing companies to obtain more added value at lower cost. In addition, the continued growth of education and training has expanded the pool of highly qualified professionals, which helps ensure that the talent needed to meet nearshoring demand is available.

Predictive analytics (also called predictive analysis) is a discipline within the field of analytics that uses historical data and advanced statistical models to forecast future outcomes and trends. Its primary goal is to make reliable predictions about future events based on patterns and relationships identified in the available data. In other words, predictive analytics lets us anticipate what might happen based on what has happened in the past.

Within predictive analysis, regression is an essential technique. Regression is a statistical method that seeks to model the relationship between a dependent variable and one or more independent variables, in order to predict or estimate future values of the dependent variable. In the context of predictive analytics, regression becomes a valuable tool for understanding how independent variables influence the target variable and how they can be used to make forecasts.
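As a minimal illustration of the idea, the R sketch below simulates data with known coefficients, estimates them with `lm()`, and then uses `predict()` to estimate the dependent variable for new values of the predictors. The variables `x1`, `x2` and `y` are hypothetical and are not columns of the nearshoring dataset.

```r
# Sketch: ordinary least squares regression and prediction on simulated data.
# x1, x2 and y are hypothetical variables, not part of the nearshoring dataset.
set.seed(123)                                     # reproducible simulation
n  <- 100
x1 <- rnorm(n)                                    # first independent variable
x2 <- runif(n, min = 0, max = 10)                 # second independent variable
y  <- 2 + 3 * x1 - 0.5 * x2 + rnorm(n, sd = 0.5)  # true model plus noise
sim <- data.frame(y = y, x1 = x1, x2 = x2)

# Estimate y = b0 + b1 * x1 + b2 * x2 + error
fit <- lm(y ~ x1 + x2, data = sim)
print(coef(fit))    # should be close to the true values 2, 3 and -0.5

# Predict the dependent variable for two new observations
new_obs <- data.frame(x1 = c(0, 1), x2 = c(5, 5))
pred <- predict(fit, newdata = new_obs)
print(pred)
```

The same two steps, fitting on past observations and calling `predict()` on new predictor values, are what turn a regression model into a forecasting tool.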

Regression is especially useful for measuring and quantifying the relative impact of different variables on the target variable. In this work, we use variables such as Education, Innovation, Population Density, Exports and Employment to estimate a regression model for the dependent variable, Foreign Direct Investment, as part of an analysis of “Nearshoring” in Mexico. By modeling these relationships, we can make better-informed predictions about the outcomes of nearshoring in different contexts and under different conditions.
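Because the predictors in this study are measured in very different units (years, millions of pesos, percentages), their raw coefficients are not directly comparable. One common way to compare relative impact is to standardize every variable before fitting, so each coefficient is expressed in standard deviations. The sketch below illustrates this on simulated data; `educ`, `export` and `fdi` are hypothetical stand-ins, not the real columns.

```r
# Sketch: standardized ("beta") coefficients to compare predictors measured
# in different units. educ, export and fdi are simulated, hypothetical data.
set.seed(42)
n      <- 200
educ   <- rnorm(n, mean = 8.4, sd = 0.7)       # e.g. years of schooling
export <- rnorm(n, mean = 24000, sd = 10000)   # e.g. exports in millions
fdi    <- 5 + 2 * educ + 0.0001 * export + rnorm(n)
toy    <- data.frame(fdi, educ, export)

# Raw coefficients depend on units: about 2 per year vs 0.0001 per million
raw <- lm(fdi ~ educ + export, data = toy)

# scale() centers each column and divides it by its standard deviation
std <- lm(fdi ~ educ + export, data = as.data.frame(scale(toy)))

print(round(coef(raw), 4))
print(round(coef(std), 3))   # effect of a 1-sd change, in sd units of fdi
```

In this simulation the raw coefficient on `educ` is roughly 20,000 times the one on `export`, yet in standardized terms its effect is only about 1.4 times larger, because a typical change in exports is much larger than a typical change in schooling.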

Problem Situation

In analyzing foreign direct investment (FDI) in Mexico, we seek to understand the relationships and factors that influence the flow of FDI into the country. FDI is a crucial economic indicator that affects a nation's growth and development. The dataset contains annual observations from 1997 to 2022, together with factors that could influence FDI flows, such as the exchange rate, exports, employment, education, innovation, insecurity, road density and population density, among others.

Linear regression analysis is a fundamental tool for this purpose. It models the relationship between the dependent variable, in this case the flow of FDI, and the independent variables listed above. Through this analysis we seek to identify the variables that have a significant impact on FDI flows. We also explore the direction and magnitude of these relationships, that is, how changes in the independent variables are associated with changes in foreign direct investment.
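To make the three goals concrete (significance, direction and magnitude), the sketch below reads them off the coefficient table returned by `summary()` of an `lm` fit. The data are simulated and the predictor names `x_pos`, `x_neg` and `x_noise` are hypothetical; the 5% threshold is a common convention, not a rule taken from this study.

```r
# Sketch: reading significance, direction and magnitude from an lm() fit.
# Simulated data; x_pos, x_neg and x_noise are hypothetical predictors.
set.seed(7)
n       <- 60
x_pos   <- rnorm(n)                 # true positive effect
x_neg   <- rnorm(n)                 # true negative effect
x_noise <- rnorm(n)                 # no true effect
y <- 1 + 2 * x_pos - 1.5 * x_neg + rnorm(n)
fit <- lm(y ~ x_pos + x_neg + x_noise)

# Matrix with columns "Estimate", "Std. Error", "t value", "Pr(>|t|)"
tab <- coef(summary(fit))[-1, , drop = FALSE]   # drop the intercept row

coef_table <- data.frame(
  estimate    = tab[, "Estimate"],                               # magnitude
  direction   = ifelse(tab[, "Estimate"] > 0, "positive", "negative"),
  p_value     = tab[, "Pr(>|t|)"],
  significant = tab[, "Pr(>|t|)"] < 0.05,                        # 5% level
  row.names   = rownames(tab)
)
print(coef_table)
```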

This study aims to provide a deeper perspective on the determinants of FDI in Mexico, giving decision makers, investors and economic analysts a more solid understanding of how these variables influence investment flows. The regression analysis will not only inform decisions on economic policy and investment strategy, but could also help predict future FDI flows from changes in the independent variables. The ultimate aim is to support Maria in the economic analysis she is carrying out for the company where she works.

Data and Methodology

Before starting, we describe the variables used in the analysis, including the unit in which each one is measured:

— “IED_Flujos”: Foreign Direct Investment (FDI). Unit: millions of Mexican pesos at constant prices. Description: Flows of foreign direct investment into the economy, that is, the amount of money that enters the country as foreign investment.

— “Exportaciones”: Exports. Unit: millions of Mexican pesos at constant prices. Description: Value of non-oil exports of goods and services, including exports from the maquiladora export industry.

— “Empleo”: Employment rate. Unit: percentage. Description: Share of the economically active population that is employed.

— “Educacion”: Years of education. Unit: years. Description: Average number of years of schooling of the population; the more years, the higher the educational level.

— “Salario_Diario”: Minimum daily wage. Unit: pesos. Description: The daily minimum wage, that is, the base pay a worker receives per working day.

— “Innovacion”: Patent rate. Unit: patents per 100,000 inhabitants. Description: Number of patent applications filed in Mexico per 100,000 inhabitants; it reflects technological innovation in the country.

— “Inseguridad_Robo”: Rate of violent robbery. Unit: robberies per 100,000 inhabitants. Description: Rate of violent robberies in different settings, such as homes, vehicles and businesses.

— “Inseguridad_Homicidio”: Homicide rate. Unit: homicides per 100,000 inhabitants. Description: Number of homicides per 100,000 inhabitants in the country.

— “Tipo_de_Cambio”: Exchange rate. Unit: pesos per US dollar. Description: Value of the Mexican peso relative to the US dollar; it is important for international trade.

— “Densidad_Carretera”: Road density. Unit: km of paved road per km² of land area. Description: Kilometers of paved road for each km² of the country's land area.

— “Densidad_Poblacion”: Population density. Unit: inhabitants per km². Description: Total population divided by Mexico's land area in km².

— “CO2_Emisiones”: Carbon dioxide (CO2) emissions. Unit: metric tons per capita. Description: CO2 emissions per inhabitant; it reflects environmental impact.

— “PIB_Per_Capita”: Gross domestic product (GDP) per capita. Unit: real pesos at 2013 prices. Description: GDP divided by the population and expressed at 2013 prices; it indicates the average income per inhabitant.

— “INPC”: Consumer Price Index (INPC). Unit: price index (base 2018 = 100). Description: Measures the variation in the prices of consumer goods and services.

Exploratory Data Analysis

### loading libraries
library(foreign)      # read data stored by other statistical software (SPSS, Stata, ...)
library(dplyr)        # data manipulation
library(forcats)      # to work with categorical variables
library(ggplot2)      # data visualization
library(readr)        # read rectangular text files such as csv
library(janitor)      # data exploration and cleaning
library(Hmisc)        # several useful functions for data analysis
library(psych)        # functions for multivariate analysis
library(naniar)       # summaries and visualization of missing values (NAs)
library(dlookr)       # data diagnosis, exploration and transformation
library(corrplot)     # correlation plots
library(jtools)       # presentation of regression analysis
library(lmtest)       # diagnostic checks - linear regression analysis
library(car)          # diagnostic checks - linear regression analysis
library(olsrr)        # diagnostic checks - linear regression analysis
library(stargazer)    # create publication quality tables
library(effects)      # displays for linear and other regression models
library(tidyverse)    # collection of R packages designed for data science
library(caret)        # Classification and Regression Training
library(glmnet)       # lasso and elastic-net regularized regression, with cross-validation

To begin, it is essential to explore the first rows of our dataset, which will show us the types of data available and the independent variables that will be fundamental to building a robust linear regression model. In this project, the dependent variable is Foreign Direct Investment (FDI). This choice is justified because FDI is a key measure related to the central topic of the project: nearshoring. Furthermore, FDI is influenced by a range of independent variables covering political, social, environmental and ethical aspects. Together, these variables shape the level of FDI, which makes it a fundamental indicator for our analysis.

# Import the dataset

bd <- read.csv("/Users/gabrielmedina/Downloads/nearshoring.csv")
bd
##    periodo IED_Flujos_dolares Exportaciones_dolares Empleo Educacion
## 1     1997           12145.60               9087.62     NA      7.20
## 2     1998            8373.50               9875.07     NA      7.31
## 3     1999           13960.32              10990.01     NA      7.43
## 4     2000           18248.69              12482.96  97.83      7.56
## 5     2001           30057.18              11300.44  97.36      7.68
## 6     2002           24099.21              11923.10  97.66      7.80
## 7     2003           18249.97              13156.00  97.06      7.93
## 8     2004           25015.57              13573.13  96.48      8.04
## 9     2005           25795.82              16465.81  97.17      8.14
## 10    2006           21232.54              17485.93  96.53      8.26
## 11    2007           32393.33              19103.85  96.60      8.36
## 12    2008           29502.46              16924.76  95.68      8.46
## 13    2009           17849.95              19702.63  95.20      8.56
## 14    2010           27189.28              22673.14  95.06      8.63
## 15    2011           25632.52              24333.02  95.49      8.75
## 16    2012           21769.32              26297.98  95.53      8.85
## 17    2013           48354.42              27687.57  95.75      8.95
## 18    2014           30351.25              31676.78  96.24      9.05
## 19    2015           35943.75              29959.94  96.04      9.15
## 20    2016           31188.98              31375.06  96.62      9.25
## 21    2017           34017.05              33322.62  96.85      9.35
## 22    2018           34100.43              35341.90  96.64      9.45
## 23    2019           34577.16              36414.73  97.09      9.58
## 24    2020           28205.89              41077.34  96.21        NA
## 25    2021           31553.52              44914.78  96.49        NA
## 26    2022           36215.37              46477.59  97.24        NA
##    Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1           24.30      11.30           266.51                 14.55
## 2           31.91      11.37           314.78                 14.32
## 3           31.91      12.46           272.89                 12.64
## 4           35.12      13.15           216.98                 10.86
## 5           37.57      13.47           214.53                 10.25
## 6           39.74      12.80           197.80                  9.94
## 7           41.53      11.81           183.22                  9.81
## 8           43.30      12.61           146.28                  8.92
## 9           45.24      13.41           136.94                  9.22
## 10          47.05      14.23           135.59                  9.60
## 11          48.88      15.04           145.92                  8.04
## 12          50.84      14.82           158.17                 12.52
## 13          53.19      12.59           175.77                 17.46
## 14          55.77      12.69           201.94                 22.43
## 15          58.06      12.10           212.61                 23.42
## 16          60.75      13.03           190.28                 22.09
## 17          63.12      13.22           185.56                 19.74
## 18          65.58      13.65           154.41                 16.93
## 19          70.10      15.11           180.44                 17.37
## 20          73.04      14.40           160.57                 20.31
## 21          88.36      14.05           230.43                 26.22
## 22          88.36      13.25           184.25                 29.59
## 23         102.68      12.70           173.45                 29.21
## 24         123.22      11.28           133.90                 28.98
## 25         141.70         NA           127.13                 27.89
## 26         172.87         NA           120.49                    NA
##    Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1            8.06               0.05              47.44          3.68
## 2            9.94               0.05              48.76          3.85
## 3            9.52               0.06              49.48          3.69
## 4            9.60               0.06              50.58          3.87
## 5            9.17               0.06              51.28          3.81
## 6           10.36               0.06              51.95          3.82
## 7           11.20               0.06              52.61          3.95
## 8           11.22               0.06              53.27          3.98
## 9           10.71               0.06              54.78          4.10
## 10          10.88               0.06              55.44          4.19
## 11          10.90               0.06              56.17          4.22
## 12          13.77               0.07              56.96          4.19
## 13          13.04               0.07              57.73          4.04
## 14          12.38               0.07              58.45          4.11
## 15          13.98               0.07              59.15          4.19
## 16          12.99               0.07              59.85          4.20
## 17          13.07               0.08              59.49          4.06
## 18          14.73               0.08              60.17          3.89
## 19          17.34               0.08              60.86          3.93
## 20          20.66               0.08              61.57          3.89
## 21          19.74               0.09              62.28          3.84
## 22          19.66               0.09              63.11          3.65
## 23          18.87               0.09              63.90          3.59
## 24          19.94               0.09              64.59            NA
## 25          20.52               0.09              65.16            NA
## 26          19.41               0.09              65.60            NA
##    PIB_Per_Capita   INPC
## 1        127570.1  33.28
## 2        126738.8  39.47
## 3        129164.7  44.34
## 4        130874.9  48.31
## 5        128083.4  50.43
## 6        128205.9  53.31
## 7        128737.9  55.43
## 8        132563.5  58.31
## 9        132941.1  60.25
## 10       135894.9  62.69
## 11       137795.7  65.05
## 12       135176.0  69.30
## 13       131233.0  71.77
## 14       134991.7  74.93
## 15       138891.9  77.79
## 16       141530.2  80.57
## 17       144112.0  83.77
## 18       147277.4  87.19
## 19       149433.5  89.05
## 20       152275.4  92.04
## 21       153235.7  98.27
## 22       153133.8  99.91
## 23       150233.1 105.93
## 24       142609.3 109.27
## 25       142772.0 117.31
## 26       146826.7 126.48

We begin with an exploratory analysis of the data to understand its structure and the type of each variable.

sum(is.na(bd))
## [1] 12
str(bd)
## 'data.frame':    26 obs. of  15 variables:
##  $ periodo              : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ IED_Flujos_dolares   : num  12146 8374 13960 18249 30057 ...
##  $ Exportaciones_dolares: num  9088 9875 10990 12483 11300 ...
##  $ Empleo               : num  NA NA NA 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
#Descriptive statistics
summary(bd)
##     periodo     IED_Flujos_dolares Exportaciones_dolares     Empleo     
##  Min.   :1997   Min.   : 8374      Min.   : 9088         Min.   :95.06  
##  1st Qu.:2003   1st Qu.:21367      1st Qu.:13260         1st Qu.:95.89  
##  Median :2010   Median :27698      Median :21188         Median :96.53  
##  Mean   :2010   Mean   :26770      Mean   :23601         Mean   :96.47  
##  3rd Qu.:2016   3rd Qu.:32183      3rd Qu.:31601         3rd Qu.:97.08  
##  Max.   :2022   Max.   :48354      Max.   :46478         Max.   :97.83  
##                                                          NA's   :3      
##    Educacion     Salario_Diario     Innovacion    Inseguridad_Robo
##  Min.   :7.200   Min.   : 24.30   Min.   :11.28   Min.   :120.5   
##  1st Qu.:7.865   1st Qu.: 41.97   1st Qu.:12.56   1st Qu.:148.3   
##  Median :8.460   Median : 54.48   Median :13.09   Median :181.8   
##  Mean   :8.423   Mean   : 65.16   Mean   :13.11   Mean   :185.4   
##  3rd Qu.:9.000   3rd Qu.: 72.31   3rd Qu.:13.75   3rd Qu.:209.9   
##  Max.   :9.580   Max.   :172.87   Max.   :15.11   Max.   :314.8   
##  NA's   :3                        NA's   :2                       
##  Inseguridad_Homicidio Tipo_de_Cambio  Densidad_Carretera Densidad_Poblacion
##  Min.   : 8.04         Min.   : 8.06   Min.   :0.05000    Min.   :47.44     
##  1st Qu.:10.25         1st Qu.:10.75   1st Qu.:0.06000    1st Qu.:52.77     
##  Median :16.93         Median :13.02   Median :0.07000    Median :58.09     
##  Mean   :17.29         Mean   :13.91   Mean   :0.07115    Mean   :57.33     
##  3rd Qu.:22.43         3rd Qu.:18.49   3rd Qu.:0.08000    3rd Qu.:61.39     
##  Max.   :29.59         Max.   :20.66   Max.   :0.09000    Max.   :65.60     
##  NA's   :1                                                                  
##  CO2_Emisiones   PIB_Per_Capita        INPC       
##  Min.   :3.590   Min.   :126739   Min.   : 33.28  
##  1st Qu.:3.830   1st Qu.:130964   1st Qu.: 56.15  
##  Median :3.930   Median :136845   Median : 73.35  
##  Mean   :3.945   Mean   :138550   Mean   : 75.17  
##  3rd Qu.:4.105   3rd Qu.:146148   3rd Qu.: 91.29  
##  Max.   :4.220   Max.   :153236   Max.   :126.48  
##  NA's   :3
#The descriptive statistics cover 26 annual observations from 1997 to 2022. Foreign Direct Investment (FDI), the dependent variable, varies widely: in nominal US dollars it ranges from 8,374 (1998) to 48,354 (2013), and once converted to real pesos (see below) the range is roughly 210,876 to 754,438. Other variables, such as the daily wage, innovation, insecurity and GDP per capita, also vary over the period, some far more than others (the daily wage rises from 24.30 to 172.87 pesos, while GDP per capita stays between about 126,700 and 153,200). These figures show how heterogeneous the key variables are and motivate further analysis, including a linear regression model to investigate the relationships among them in the context of Nearshoring.
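A unit-free way to compare how much variables move over the period is the coefficient of variation (standard deviation divided by the mean). The sketch below computes it for two columns, with the values copied from the 1997 to 2022 data printed above; `cv` is a small helper defined here for illustration, not a function from the packages loaded earlier.

```r
# Sketch: coefficient of variation (sd / mean) to compare relative spread.
# Values copied from the Salario_Diario and PIB_Per_Capita columns above.
salario_diario <- c(24.30, 31.91, 31.91, 35.12, 37.57, 39.74, 41.53, 43.30,
                    45.24, 47.05, 48.88, 50.84, 53.19, 55.77, 58.06, 60.75,
                    63.12, 65.58, 70.10, 73.04, 88.36, 88.36, 102.68, 123.22,
                    141.70, 172.87)
pib_per_capita <- c(127570.1, 126738.8, 129164.7, 130874.9, 128083.4,
                    128205.9, 128737.9, 132563.5, 132941.1, 135894.9,
                    137795.7, 135176.0, 131233.0, 134991.7, 138891.9,
                    141530.2, 144112.0, 147277.4, 149433.5, 152275.4,
                    153235.7, 153133.8, 150233.1, 142609.3, 142772.0,
                    146826.7)

# Hypothetical helper: ignores NAs so it also works on columns with gaps
cv <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

cvs <- c(Salario_Diario = cv(salario_diario),
         PIB_Per_Capita = cv(pib_per_capita))
print(round(cvs, 3))
```

Applied to the full data frame, `sapply(bd[, -1], cv)` would give the same measure for every numeric column at once.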
Data cleaning

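As shown above, the dataset contains 12 missing values spread across Empleo, Educacion, Innovacion, Inseguridad_Homicidio and CO2_Emisiones. Dropping every incomplete record would discard 6 of the 26 years (1997 to 1999 and 2020 to 2022), shrinking an already small sample and possibly biasing the results. Therefore, each missing value is replaced with the mean of its variable (mean imputation).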
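Before imputing, it helps to see exactly where the gaps are. The sketch below counts NAs per column and flags incomplete rows with `complete.cases()`. To stay self-contained it rebuilds only the six years that contain NAs, copying the values printed earlier; on the full data the equivalent calls would be `colSums(is.na(bd))` and `bd$periodo[!complete.cases(bd)]`. Because the printed data show that 2000 to 2019 are complete, listwise deletion (for example `na.omit(bd)`) would keep only 20 of the 26 years.

```r
# Sketch: per-column NA counts and the rows listwise deletion would drop.
# Only the six years that contain NAs are rebuilt here (values from bd above).
na_rows <- data.frame(
  periodo               = c(1997, 1998, 1999, 2020, 2021, 2022),
  Empleo                = c(NA, NA, NA, 96.21, 96.49, 97.24),
  Educacion             = c(7.20, 7.31, 7.43, NA, NA, NA),
  Innovacion            = c(11.30, 11.37, 12.46, 11.28, NA, NA),
  Inseguridad_Homicidio = c(14.55, 14.32, 12.64, 28.98, 27.89, NA),
  CO2_Emisiones         = c(3.68, 3.85, 3.69, NA, NA, NA)
)

na_per_column <- colSums(is.na(na_rows))   # NAs in each column
print(na_per_column)
print(sum(na_per_column))                  # total, comparable to sum(is.na(bd))

# Rows with at least one NA are the ones listwise deletion would remove
incomplete <- !complete.cases(na_rows)
print(na_rows$periodo[incomplete])
```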

#Data imputation using mean.
datos_imputados <- bd
media_empleo <- mean(bd$Empleo, na.rm = TRUE)  # Calculate the mean without NA
media_educacion <- mean(bd$Educacion, na.rm = TRUE)  # Calculate the mean without NA
media_innovacion <- mean(bd$Innovacion, na.rm = TRUE)  # Calculate the mean without NA
media_homicidio <- mean(bd$Inseguridad_Homicidio, na.rm = TRUE)  # Calculate the mean without NA
media_CO2 <- mean(bd$CO2_Emisiones, na.rm = TRUE)  # Calculate the mean without NA
datos_imputados$Empleo[is.na(datos_imputados$Empleo)] <- media_empleo
datos_imputados$Educacion[is.na(datos_imputados$Educacion)] <- media_educacion
datos_imputados$Innovacion[is.na(datos_imputados$Innovacion)] <- media_innovacion
datos_imputados$Inseguridad_Homicidio[is.na(datos_imputados$Inseguridad_Homicidio)] <- media_homicidio
datos_imputados$CO2_Emisiones[is.na(datos_imputados$CO2_Emisiones)] <- media_CO2
#Data structure
str(datos_imputados)
## 'data.frame':    26 obs. of  15 variables:
##  $ periodo              : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ IED_Flujos_dolares   : num  12146 8374 13960 18249 30057 ...
##  $ Exportaciones_dolares: num  9088 9875 10990 12483 11300 ...
##  $ Empleo               : num  96.5 96.5 96.5 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
bd2 <- datos_imputados
bd2
##    periodo IED_Flujos_dolares Exportaciones_dolares   Empleo Educacion
## 1     1997           12145.60               9087.62 96.47043  7.200000
## 2     1998            8373.50               9875.07 96.47043  7.310000
## 3     1999           13960.32              10990.01 96.47043  7.430000
## 4     2000           18248.69              12482.96 97.83000  7.560000
## 5     2001           30057.18              11300.44 97.36000  7.680000
## 6     2002           24099.21              11923.10 97.66000  7.800000
## 7     2003           18249.97              13156.00 97.06000  7.930000
## 8     2004           25015.57              13573.13 96.48000  8.040000
## 9     2005           25795.82              16465.81 97.17000  8.140000
## 10    2006           21232.54              17485.93 96.53000  8.260000
## 11    2007           32393.33              19103.85 96.60000  8.360000
## 12    2008           29502.46              16924.76 95.68000  8.460000
## 13    2009           17849.95              19702.63 95.20000  8.560000
## 14    2010           27189.28              22673.14 95.06000  8.630000
## 15    2011           25632.52              24333.02 95.49000  8.750000
## 16    2012           21769.32              26297.98 95.53000  8.850000
## 17    2013           48354.42              27687.57 95.75000  8.950000
## 18    2014           30351.25              31676.78 96.24000  9.050000
## 19    2015           35943.75              29959.94 96.04000  9.150000
## 20    2016           31188.98              31375.06 96.62000  9.250000
## 21    2017           34017.05              33322.62 96.85000  9.350000
## 22    2018           34100.43              35341.90 96.64000  9.450000
## 23    2019           34577.16              36414.73 97.09000  9.580000
## 24    2020           28205.89              41077.34 96.21000  8.423478
## 25    2021           31553.52              44914.78 96.49000  8.423478
## 26    2022           36215.37              46477.59 97.24000  8.423478
##    Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1           24.30   11.30000           266.51               14.5500
## 2           31.91   11.37000           314.78               14.3200
## 3           31.91   12.46000           272.89               12.6400
## 4           35.12   13.15000           216.98               10.8600
## 5           37.57   13.47000           214.53               10.2500
## 6           39.74   12.80000           197.80                9.9400
## 7           41.53   11.81000           183.22                9.8100
## 8           43.30   12.61000           146.28                8.9200
## 9           45.24   13.41000           136.94                9.2200
## 10          47.05   14.23000           135.59                9.6000
## 11          48.88   15.04000           145.92                8.0400
## 12          50.84   14.82000           158.17               12.5200
## 13          53.19   12.59000           175.77               17.4600
## 14          55.77   12.69000           201.94               22.4300
## 15          58.06   12.10000           212.61               23.4200
## 16          60.75   13.03000           190.28               22.0900
## 17          63.12   13.22000           185.56               19.7400
## 18          65.58   13.65000           154.41               16.9300
## 19          70.10   15.11000           180.44               17.3700
## 20          73.04   14.40000           160.57               20.3100
## 21          88.36   14.05000           230.43               26.2200
## 22          88.36   13.25000           184.25               29.5900
## 23         102.68   12.70000           173.45               29.2100
## 24         123.22   11.28000           133.90               28.9800
## 25         141.70   13.10583           127.13               27.8900
## 26         172.87   13.10583           120.49               17.2924
##    Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1            8.06               0.05              47.44      3.680000
## 2            9.94               0.05              48.76      3.850000
## 3            9.52               0.06              49.48      3.690000
## 4            9.60               0.06              50.58      3.870000
## 5            9.17               0.06              51.28      3.810000
## 6           10.36               0.06              51.95      3.820000
## 7           11.20               0.06              52.61      3.950000
## 8           11.22               0.06              53.27      3.980000
## 9           10.71               0.06              54.78      4.100000
## 10          10.88               0.06              55.44      4.190000
## 11          10.90               0.06              56.17      4.220000
## 12          13.77               0.07              56.96      4.190000
## 13          13.04               0.07              57.73      4.040000
## 14          12.38               0.07              58.45      4.110000
## 15          13.98               0.07              59.15      4.190000
## 16          12.99               0.07              59.85      4.200000
## 17          13.07               0.08              59.49      4.060000
## 18          14.73               0.08              60.17      3.890000
## 19          17.34               0.08              60.86      3.930000
## 20          20.66               0.08              61.57      3.890000
## 21          19.74               0.09              62.28      3.840000
## 22          19.66               0.09              63.11      3.650000
## 23          18.87               0.09              63.90      3.590000
## 24          19.94               0.09              64.59      3.945217
## 25          20.52               0.09              65.16      3.945217
## 26          19.41               0.09              65.60      3.945217
##    PIB_Per_Capita   INPC
## 1        127570.1  33.28
## 2        126738.8  39.47
## 3        129164.7  44.34
## 4        130874.9  48.31
## 5        128083.4  50.43
## 6        128205.9  53.31
## 7        128737.9  55.43
## 8        132563.5  58.31
## 9        132941.1  60.25
## 10       135894.9  62.69
## 11       137795.7  65.05
## 12       135176.0  69.30
## 13       131233.0  71.77
## 14       134991.7  74.93
## 15       138891.9  77.79
## 16       141530.2  80.57
## 17       144112.0  83.77
## 18       147277.4  87.19
## 19       149433.5  89.05
## 20       152275.4  92.04
## 21       153235.7  98.27
## 22       153133.8  99.91
## 23       150233.1 105.93
## 24       142609.3 109.27
## 25       142772.0 117.31
## 26       146826.7 126.48
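The printed `bd2` makes one side effect of mean imputation visible: Educacion for 2020 to 2022 becomes 8.423478, below the 9.58 observed in 2019 even though the series rises every year, and Inseguridad_Homicidio for 2022 becomes 17.2924 after 27.89 in 2021. The sketch below wraps the mean imputation used above in a reusable helper and, only for comparison, shows an alternative that is not the method used in this analysis: linear interpolation with base R's `approx()` and `rule = 2`, which fills trailing gaps with the last observed value. `impute_mean` and `impute_approx` are hypothetical helper names, and the toy data use the Educacion values for 2017 to 2022.

```r
# Sketch: mean imputation as a reusable helper, plus an alternative for
# comparison. impute_mean and impute_approx are hypothetical helper names.
impute_mean <- function(df, cols) {
  for (col in cols) {
    m <- mean(df[[col]], na.rm = TRUE)   # mean of the observed values
    df[[col]][is.na(df[[col]])] <- m     # replace each NA with that mean
  }
  df
}

# Alternative (not used in this analysis): linear interpolation between
# observed points; rule = 2 extends the first/last observed value to the ends.
impute_approx <- function(x, t) {
  ok <- !is.na(x)
  approx(t[ok], x[ok], xout = t, rule = 2)$y
}

# Toy data: Educacion for 2017-2022, missing from 2020 on
toy <- data.frame(periodo = 2017:2022,
                  Educacion = c(9.35, 9.45, 9.58, NA, NA, NA))

by_mean   <- impute_mean(toy, "Educacion")$Educacion
by_approx <- impute_approx(toy$Educacion, toy$periodo)
print(rbind(by_mean, by_approx))
```

On the full data, `impute_mean(bd, c("Empleo", "Educacion", "Innovacion", "Inseguridad_Homicidio", "CO2_Emisiones"))` performs the same replacements as the five pairs of lines used to build `datos_imputados`.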

According to the glossary of the database, foreign direct investment and exports are recorded in US dollars at nominal values, so they must be converted into pesos at constant prices (real values).

Real values reflect the fluctuations in foreign direct investment over the time series more accurately because they account for the inflation of each year. For the conversion, each observation was multiplied by the exchange rate of its year and then deflated with the National Consumer Price Index (INPC) of that same year. This gives a more accurate and realistic representation of FDI over time.
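The conversion can be written as real = USD × exchange rate / INPC × 100: dollars are first turned into pesos at that year's exchange rate and then deflated with the INPC, so every year is expressed in prices of the INPC base period (2018 = 100). The sketch below wraps this formula in a hypothetical helper, `to_real_pesos`, and applies it to the 1998 and 2013 rows printed above, which give the lowest and highest real FDI values in the series (approximately 210,900 and 754,400 million pesos).

```r
# Sketch of the nominal-USD to real-peso conversion used in this section.
# to_real_pesos is a hypothetical helper name.
to_real_pesos <- function(usd, tipo_de_cambio, inpc) {
  pesos_nominal <- usd * tipo_de_cambio   # dollars -> current pesos
  (pesos_nominal / inpc) * 100            # deflate to INPC base-period prices
}

# Rows for 1998 and 2013 as printed above (IED in dollars, exchange rate, INPC)
ied_1998 <- to_real_pesos(8373.50, 9.94, 39.47)
ied_2013 <- to_real_pesos(48354.42, 13.07, 83.77)
print(round(c(`1998` = ied_1998, `2013` = ied_2013)))
```

The helper is vectorized, so `to_real_pesos(bd2$IED_Flujos_dolares, bd2$Tipo_de_Cambio, bd2$INPC)` applies the same formula as the `IED_Flujos` line that follows.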

# Convert nominal USD to real pesos: USD * (pesos per USD) / INPC * 100
bd2$IED_Flujos <- ((bd2$IED_Flujos_dolares * bd2$Tipo_de_Cambio) / bd2$INPC) * 100
bd2$Exportaciones <- ((bd2$Exportaciones_dolares * bd2$Tipo_de_Cambio) / bd2$INPC) * 100
bd2
##    periodo IED_Flujos_dolares Exportaciones_dolares   Empleo Educacion
## 1     1997           12145.60               9087.62 96.47043  7.200000
## 2     1998            8373.50               9875.07 96.47043  7.310000
## 3     1999           13960.32              10990.01 96.47043  7.430000
## 4     2000           18248.69              12482.96 97.83000  7.560000
## 5     2001           30057.18              11300.44 97.36000  7.680000
## 6     2002           24099.21              11923.10 97.66000  7.800000
## 7     2003           18249.97              13156.00 97.06000  7.930000
## 8     2004           25015.57              13573.13 96.48000  8.040000
## 9     2005           25795.82              16465.81 97.17000  8.140000
## 10    2006           21232.54              17485.93 96.53000  8.260000
## 11    2007           32393.33              19103.85 96.60000  8.360000
## 12    2008           29502.46              16924.76 95.68000  8.460000
## 13    2009           17849.95              19702.63 95.20000  8.560000
## 14    2010           27189.28              22673.14 95.06000  8.630000
## 15    2011           25632.52              24333.02 95.49000  8.750000
## 16    2012           21769.32              26297.98 95.53000  8.850000
## 17    2013           48354.42              27687.57 95.75000  8.950000
## 18    2014           30351.25              31676.78 96.24000  9.050000
## 19    2015           35943.75              29959.94 96.04000  9.150000
## 20    2016           31188.98              31375.06 96.62000  9.250000
## 21    2017           34017.05              33322.62 96.85000  9.350000
## 22    2018           34100.43              35341.90 96.64000  9.450000
## 23    2019           34577.16              36414.73 97.09000  9.580000
## 24    2020           28205.89              41077.34 96.21000  8.423478
## 25    2021           31553.52              44914.78 96.49000  8.423478
## 26    2022           36215.37              46477.59 97.24000  8.423478
##    Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1           24.30   11.30000           266.51               14.5500
## 2           31.91   11.37000           314.78               14.3200
## 3           31.91   12.46000           272.89               12.6400
## 4           35.12   13.15000           216.98               10.8600
## 5           37.57   13.47000           214.53               10.2500
## 6           39.74   12.80000           197.80                9.9400
## 7           41.53   11.81000           183.22                9.8100
## 8           43.30   12.61000           146.28                8.9200
## 9           45.24   13.41000           136.94                9.2200
## 10          47.05   14.23000           135.59                9.6000
## 11          48.88   15.04000           145.92                8.0400
## 12          50.84   14.82000           158.17               12.5200
## 13          53.19   12.59000           175.77               17.4600
## 14          55.77   12.69000           201.94               22.4300
## 15          58.06   12.10000           212.61               23.4200
## 16          60.75   13.03000           190.28               22.0900
## 17          63.12   13.22000           185.56               19.7400
## 18          65.58   13.65000           154.41               16.9300
## 19          70.10   15.11000           180.44               17.3700
## 20          73.04   14.40000           160.57               20.3100
## 21          88.36   14.05000           230.43               26.2200
## 22          88.36   13.25000           184.25               29.5900
## 23         102.68   12.70000           173.45               29.2100
## 24         123.22   11.28000           133.90               28.9800
## 25         141.70   13.10583           127.13               27.8900
## 26         172.87   13.10583           120.49               17.2924
##    Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1            8.06               0.05              47.44      3.680000
## 2            9.94               0.05              48.76      3.850000
## 3            9.52               0.06              49.48      3.690000
## 4            9.60               0.06              50.58      3.870000
## 5            9.17               0.06              51.28      3.810000
## 6           10.36               0.06              51.95      3.820000
## 7           11.20               0.06              52.61      3.950000
## 8           11.22               0.06              53.27      3.980000
## 9           10.71               0.06              54.78      4.100000
## 10          10.88               0.06              55.44      4.190000
## 11          10.90               0.06              56.17      4.220000
## 12          13.77               0.07              56.96      4.190000
## 13          13.04               0.07              57.73      4.040000
## 14          12.38               0.07              58.45      4.110000
## 15          13.98               0.07              59.15      4.190000
## 16          12.99               0.07              59.85      4.200000
## 17          13.07               0.08              59.49      4.060000
## 18          14.73               0.08              60.17      3.890000
## 19          17.34               0.08              60.86      3.930000
## 20          20.66               0.08              61.57      3.890000
## 21          19.74               0.09              62.28      3.840000
## 22          19.66               0.09              63.11      3.650000
## 23          18.87               0.09              63.90      3.590000
## 24          19.94               0.09              64.59      3.945217
## 25          20.52               0.09              65.16      3.945217
## 26          19.41               0.09              65.60      3.945217
##    PIB_Per_Capita   INPC IED_Flujos Exportaciones
## 1        127570.1  33.28   294151.2      220090.8
## 2        126738.8  39.47   210875.6      248690.6
## 3        129164.7  44.34   299734.4      235960.5
## 4        130874.9  48.31   362631.8      248057.2
## 5        128083.4  50.43   546548.4      205482.9
## 6        128205.9  53.31   468332.0      231707.6
## 7        128737.9  55.43   368752.8      265825.7
## 8        132563.5  58.31   481349.2      261173.9
## 9        132941.1  60.25   458544.8      292695.1
## 10       135894.9  62.69   368495.8      303472.5
## 11       137795.7  65.05   542793.7      320110.6
## 12       135176.0  69.30   586217.7      336297.2
## 13       131233.0  71.77   324318.4      357980.1
## 14       134991.7  74.93   449223.7      374607.6
## 15       138891.9  77.79   460653.8      437299.9
## 16       141530.2  80.57   350978.6      423992.5
## 17       144112.0  83.77   754437.5      431988.2
## 18       147277.4  87.19   512758.2      535151.9
## 19       149433.5  89.05   699904.1      583386.1
## 20       152275.4  92.04   700091.6      704268.5
## 21       153235.7  98.27   683318.0      669368.6
## 22       153133.8  99.91   671018.4      695447.7
## 23       150233.1 105.93   615945.4      648679.3
## 24       142609.3 109.27   514711.7      749594.7
## 25       142772.0 117.31   551937.8      785654.5
## 26       146826.7 126.48   555771.9      713259.0
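As a check on the conversion formula, the sketch below recomputes the real peso figures for two years directly from the dollar amounts, exchange rates, and INPC values printed above. The helper name `to_real_pesos` is hypothetical; it only restates the formula used for `IED_Flujos` and `Exportaciones`.

```r
# Hypothetical helper restating the conversion used above:
# dollars -> pesos (times that year's exchange rate), then deflated by the
# same year's INPC and rescaled by 100
to_real_pesos <- function(usd, fx, inpc) {
  (usd * fx) / inpc * 100
}

# Inputs copied from the 1997 and 2022 rows of bd2
ied_1997 <- to_real_pesos(12145.60, 8.06, 33.28)
ied_2022 <- to_real_pesos(36215.37, 19.41, 126.48)
exp_1997 <- to_real_pesos(9087.62, 8.06, 33.28)

round(c(ied_1997 = ied_1997, ied_2022 = ied_2022, exp_1997 = exp_1997), 1)
```

The recomputed values agree with the `IED_Flujos` and `Exportaciones` columns for those years up to display rounding.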
Data visualization

It is also important to understand how each independent variable relates to the dependent variable. The following graphs show these relationships.

#Data visualization (independent variables vs IED)
library(ggplot2)
datos_imputados=data
ggplot(data = bd2, aes(x = Exportaciones, y = IED_Flujos)) +
  geom_point() +
  labs(x = "Exports", y = "Dependent Variable (IED)") +
  ggtitle("Scatter Plot: Exports vs. IED")

The exports vs. FDI plot shows that the two variables are positively correlated. The relationship is not very strong, but because there is a clear pattern, it is reasonable to expect that exports may be statistically significant in the regression models.

ggplot(data = bd2, aes(x = Salario_Diario, y = IED_Flujos)) +
  geom_line() +
  labs(x = "Daily wage", y = "Dependent Variable (IED)") +
  ggtitle("Line Plot: Daily wage vs. IED")

In this plot, a slight positive correlation appears at the lower wage levels; however, as the daily wage increases, its association with the dependent variable weakens.

ggplot(data = bd2, aes(x = Tipo_de_Cambio, y = IED_Flujos)) +
  geom_line() +
  labs(x = "Exchange rate", y = "Dependent Variable (IED)") +
  ggtitle("Line Plot: Exchange rate vs. IED")

As with exports, the relationship between the exchange rate and the dependent variable is positive, although the correlation is only moderate.

# geom_hex() requires the hexbin package to be installed
ggplot(data = bd2, aes(x = Densidad_Poblacion, y = IED_Flujos)) +
  geom_hex() +
  labs(x = "Population Density", y = "Dependent Variable (IED)") +
  ggtitle("Hex Plot: Population Density vs. IED")

A slight positive correlation between these two variables is noticeable.

ggplot(data = bd2, aes(x = CO2_Emisiones, y = IED_Flujos)) +
  geom_bar(stat = "identity") +
  labs(x = "CO2", y = "Dependent Variable (IED)") +
  ggtitle("Bar Chart: CO2 vs. IED")

There is almost no correlation between CO2 emissions and FDI.

ggplot(data = bd2, aes(x = INPC, y = IED_Flujos)) +
  geom_point() +
  labs(x = "INPC", y = "Dependent Variable (IED)") +
  ggtitle("Scatter Plot: INPC vs. IED")

The INPC shows a positive correlation with FDI. This is plausible: the National Consumer Price Index measures inflation in the country, and rising foreign investment can signal economic spillover and growth, which puts more money in the hands of Mexicans and can consequently raise inflation. Both series also trend upward over time, however, so part of this correlation may reflect a common time trend.

str(bd2)
## 'data.frame':    26 obs. of  17 variables:
##  $ periodo              : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ IED_Flujos_dolares   : num  12146 8374 13960 18249 30057 ...
##  $ Exportaciones_dolares: num  9088 9875 10990 12483 11300 ...
##  $ Empleo               : num  96.5 96.5 96.5 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
##  $ IED_Flujos           : num  294151 210876 299734 362632 546548 ...
##  $ Exportaciones        : num  220091 248691 235961 248057 205483 ...
#Analyze the distribution of the dependent variable data
hist(bd2$IED_Flujos)

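The histogram looks roughly bell-shaped, which suggests an approximately normal distribution; with only 26 observations, however, this is a visual impression rather than a formal test.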
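A formal complement to the visual check is the Shapiro-Wilk test, the same test applied to the model residuals later on. The sketch below runs it on the `IED_Flujos` values copied from the printed `bd2` output, so it is self-contained; no result is asserted here.

```r
# IED_Flujos (real pesos), 1997-2022, copied from the bd2 output above
ied <- c(294151.2, 210875.6, 299734.4, 362631.8, 546548.4, 468332.0,
         368752.8, 481349.2, 458544.8, 368495.8, 542793.7, 586217.7,
         324318.4, 449223.7, 460653.8, 350978.6, 754437.5, 512758.2,
         699904.1, 700091.6, 683318.0, 671018.4, 615945.4, 514711.7,
         551937.8, 555771.9)

# Shapiro-Wilk test: H0 = the sample comes from a normal distribution
sw <- shapiro.test(ied)
sw$statistic  # W
sw$p.value    # above 0.05 means no evidence against normality
```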

Likewise, it is important to analyze the distribution of each independent variable in order to decide whether some of them should be normalized (transformed).

hist(bd2$Exportaciones)

hist(bd2$Educacion)

hist(bd2$Innovacion)

hist(bd2$Inseguridad_Robo)

hist(bd2$Inseguridad_Homicidio)

hist(bd2$Salario_Diario)

hist(bd2$Tipo_de_Cambio)

hist(bd2$Densidad_Poblacion)

hist(bd2$Densidad_Carretera)

hist(bd2$INPC)

hist(bd2$CO2_Emisiones)

hist(bd2$Empleo)

hist(bd2$PIB_Per_Capita)

As the histograms show, some variables, such as daily wage, GDP per capita, and road density, are not symmetrically distributed and could work better in the model if they were normalized (for example, with a logarithmic transformation).
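One way to quantify whether a log transformation helps is to compare the skewness of a variable before and after transforming it. The sketch below does this for `Salario_Diario`, using the values printed in `bd2`; the `skewness` helper is a hypothetical moment-based estimator written here, not a function from a specific package.

```r
# Salario_Diario, 1997-2022, copied from the bd2 output above
salario <- c(24.30, 31.91, 31.91, 35.12, 37.57, 39.74, 41.53, 43.30,
             45.24, 47.05, 48.88, 50.84, 53.19, 55.77, 58.06, 60.75,
             63.12, 65.58, 70.10, 73.04, 88.36, 88.36, 102.68, 123.22,
             141.70, 172.87)

# Hypothetical helper: sample skewness = mean of cubed deviations / sd^3
skewness <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Positive values indicate a long right tail
c(raw = skewness(salario), log = skewness(log(salario)))
```

A skewness closer to zero after the transformation indicates a more symmetric distribution.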

Correlation analysis

Another way to assess the importance of the independent variables is a correlation plot, which shows the strength of the linear association between every pair of variables, including each independent variable and the dependent variable.

#Correlation plot
library(corrplot)
corrplot(cor(bd2),
         type = "upper",
         order = "hclust",
         addCoef.col = "black",
         tl.cex = 0.7)

The plot presents the correlation matrix in a more readable form; the numeric matrix is printed below.

cor(bd2)
##                           periodo IED_Flujos_dolares Exportaciones_dolares
## periodo                1.00000000         0.72318518            0.97797102
## IED_Flujos_dolares     0.72318518         1.00000000            0.66131031
## Exportaciones_dolares  0.97797102         0.66131031            1.00000000
## Empleo                -0.19841056        -0.04180470           -0.11323525
## Educacion              0.83105621         0.72312246            0.72304782
## Salario_Diario         0.88382398         0.55538324            0.93831578
## Innovacion             0.25579696         0.53493886            0.16104355
## Inseguridad_Robo      -0.58906756        -0.54978405           -0.53699390
## Inseguridad_Homicidio  0.78019803         0.39745854            0.78304009
## Tipo_de_Cambio         0.94088306         0.60323139            0.92867446
## Densidad_Carretera     0.96458288         0.72608943            0.95118248
## Densidad_Poblacion     0.99553772         0.71914745            0.96144282
## CO2_Emisiones          0.03313055         0.09893952           -0.05354392
## PIB_Per_Capita         0.88921632         0.73421761            0.85244470
## INPC                   0.99125242         0.70473813            0.98711668
## IED_Flujos             0.68609682         0.93761585            0.60981365
## Exportaciones          0.95382682         0.60314562            0.96788006
##                             Empleo   Educacion Salario_Diario  Innovacion
## periodo               -0.198410563  0.83105621     0.88382398  0.25579696
## IED_Flujos_dolares    -0.041804696  0.72312246     0.55538324  0.53493886
## Exportaciones_dolares -0.113235254  0.72304782     0.93831578  0.16104355
## Empleo                 1.000000000 -0.31023311     0.04593783  0.02362057
## Educacion             -0.310233106  1.00000000     0.49869278  0.44910805
## Salario_Diario         0.045937834  0.49869278     1.00000000  0.05207552
## Innovacion             0.023620573  0.44910805     0.05207552  1.00000000
## Inseguridad_Robo      -0.004217223 -0.43639523    -0.54181369 -0.42120203
## Inseguridad_Homicidio -0.319803545  0.67397461     0.65108820 -0.16457984
## Tipo_de_Cambio        -0.074355786  0.76714391     0.85057472  0.21768318
## Densidad_Carretera    -0.114773291  0.80732881     0.85823660  0.21563225
## Densidad_Poblacion    -0.239171507  0.84484920     0.86218581  0.27949868
## CO2_Emisiones         -0.498936093  0.07009249    -0.08454554  0.32027189
## PIB_Per_Capita        -0.101466704  0.90428687     0.67319673  0.43195946
## INPC                  -0.125598268  0.76629637     0.93375758  0.22007275
## IED_Flujos             0.031512099  0.74275850     0.48161305  0.58473780
## Exportaciones         -0.083855335  0.74096111     0.88113311  0.17106144
##                       Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio
## periodo                   -0.589067557            0.78019803     0.94088306
## IED_Flujos_dolares        -0.549784049            0.39745854     0.60323139
## Exportaciones_dolares     -0.536993897            0.78304009     0.92867446
## Empleo                    -0.004217223           -0.31980355    -0.07435579
## Educacion                 -0.436395232            0.67397461     0.76714391
## Salario_Diario            -0.541813691            0.65108820     0.85057472
## Innovacion                -0.421202028           -0.16457984     0.21768318
## Inseguridad_Robo           1.000000000           -0.07833158    -0.45220207
## Inseguridad_Homicidio     -0.078331576            1.00000000     0.79183373
## Tipo_de_Cambio            -0.452202075            0.79183373     1.00000000
## Densidad_Carretera        -0.470108908            0.81519169     0.94321700
## Densidad_Poblacion        -0.616336204            0.76810355     0.92092571
## CO2_Emisiones             -0.419071007           -0.24054955    -0.15728113
## PIB_Per_Capita            -0.402521841            0.70529204     0.88109098
## INPC                      -0.594576114            0.75454073     0.93511557
## IED_Flujos                -0.451823035            0.41641063     0.67504631
## Exportaciones             -0.447470177            0.82008178     0.98228669
##                       Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## periodo                        0.9645829          0.9955377   0.033130551
## IED_Flujos_dolares             0.7260894          0.7191474   0.098939515
## Exportaciones_dolares          0.9511825          0.9614428  -0.053543920
## Empleo                        -0.1147733         -0.2391715  -0.498936093
## Educacion                      0.8073288          0.8448492   0.070092492
## Salario_Diario                 0.8582366          0.8621858  -0.084545536
## Innovacion                     0.2156323          0.2794987   0.320271892
## Inseguridad_Robo              -0.4701089         -0.6163362  -0.419071007
## Inseguridad_Homicidio          0.8151917          0.7681035  -0.240549545
## Tipo_de_Cambio                 0.9432170          0.9209257  -0.157281133
## Densidad_Carretera             1.0000000          0.9480910  -0.153736764
## Densidad_Poblacion             0.9480910          1.0000000   0.105315290
## CO2_Emisiones                 -0.1537368          0.1053153   1.000000000
## PIB_Per_Capita                 0.8855798          0.8723228  -0.107641865
## INPC                           0.9618219          0.9830582   0.003521392
## IED_Flujos                     0.7208468          0.6715246  -0.054005458
## Exportaciones                  0.9477079          0.9300112  -0.160330241
##                       PIB_Per_Capita         INPC  IED_Flujos Exportaciones
## periodo                    0.8892163  0.991252417  0.68609682    0.95382682
## IED_Flujos_dolares         0.7342176  0.704738131  0.93761585    0.60314562
## Exportaciones_dolares      0.8524447  0.987116680  0.60981365    0.96788006
## Empleo                    -0.1014667 -0.125598268  0.03151210   -0.08385533
## Educacion                  0.9042869  0.766296372  0.74275850    0.74096111
## Salario_Diario             0.6731967  0.933757576  0.48161305    0.88113311
## Innovacion                 0.4319595  0.220072749  0.58473780    0.17106144
## Inseguridad_Robo          -0.4025218 -0.594576114 -0.45182304   -0.44747018
## Inseguridad_Homicidio      0.7052920  0.754540732  0.41641063    0.82008178
## Tipo_de_Cambio             0.8810910  0.935115572  0.67504631    0.98228669
## Densidad_Carretera         0.8855798  0.961821927  0.72084682    0.94770790
## Densidad_Poblacion         0.8723228  0.983058193  0.67152462    0.93001115
## CO2_Emisiones             -0.1076419  0.003521392 -0.05400546   -0.16033024
## PIB_Per_Capita             1.0000000  0.851476073  0.77968545    0.88763319
## INPC                       0.8514761  1.000000000  0.65404935    0.95015955
## IED_Flujos                 0.7796854  0.654049346  1.00000000    0.63773639
## Exportaciones              0.8876332  0.950159547  0.63773639    1.00000000
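Reading the full matrix is tedious, so a short sketch can extract only the column for the dependent variable and sort it by absolute value. To keep the block self-contained it uses three columns copied from the printed `bd2`; the data frame name `bd` is illustrative.

```r
# Three columns copied from the bd2 output above (1997-2022)
bd <- data.frame(
  IED_Flujos = c(294151.2, 210875.6, 299734.4, 362631.8, 546548.4, 468332.0,
                 368752.8, 481349.2, 458544.8, 368495.8, 542793.7, 586217.7,
                 324318.4, 449223.7, 460653.8, 350978.6, 754437.5, 512758.2,
                 699904.1, 700091.6, 683318.0, 671018.4, 615945.4, 514711.7,
                 551937.8, 555771.9),
  Exportaciones = c(220090.8, 248690.6, 235960.5, 248057.2, 205482.9, 231707.6,
                    265825.7, 261173.9, 292695.1, 303472.5, 320110.6, 336297.2,
                    357980.1, 374607.6, 437299.9, 423992.5, 431988.2, 535151.9,
                    583386.1, 704268.5, 669368.6, 695447.7, 648679.3, 749594.7,
                    785654.5, 713259.0),
  Densidad_Poblacion = c(47.44, 48.76, 49.48, 50.58, 51.28, 51.95, 52.61, 53.27,
                         54.78, 55.44, 56.17, 56.96, 57.73, 58.45, 59.15, 59.85,
                         59.49, 60.17, 60.86, 61.57, 62.28, 63.11, 63.90, 64.59,
                         65.16, 65.60)
)

# Correlation of each predictor with the dependent variable, strongest first
r <- cor(bd)[, "IED_Flujos"]
r <- r[names(r) != "IED_Flujos"]
ranked <- r[order(abs(r), decreasing = TRUE)]
round(ranked, 4)
```

Applied to `cor(bd2)` instead, the same three lines rank all sixteen candidate variables.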
Estimation Method

In this study, the Ordinary Least Squares (OLS) method is used to develop a regression model of the relationships between the independent variables and the dependent variable (FDI). The goal is to identify patterns, trends, and effects that help explain and predict the behavior of the dependent variable from the selected independent variables.

This method is used to find the best linear relationship between a set of independent variables and a dependent variable, minimizing the sum of the squares of the differences between the observed values and the values predicted by the model.
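The minimization has a closed-form solution, the normal equations beta = (X'X)^(-1) X'y. The sketch below computes it directly and compares it with `lm()`; the built-in `mtcars` dataset is only a stand-in so the block runs on its own, and is not part of this study.

```r
# Design matrix with an intercept column (mtcars is a stand-in dataset)
X <- cbind(1, mtcars$wt, mtcars$hp)
y <- mtcars$mpg

# Normal equations: beta = (X'X)^{-1} X'y minimises the sum of squared residuals
beta_hat <- solve(crossprod(X), crossprod(X, y))

# lm() reaches the same estimates (internally it uses a QR decomposition)
fit <- lm(mpg ~ wt + hp, data = mtcars)
cbind(normal_eq = drop(beta_hat), lm = coef(fit))
```

In practice `lm()` is preferred because the QR route is numerically more stable than inverting X'X.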

Linear Regression analysis

Hypotheses

Having examined the structure and types of the data in detail, as well as the relationships among the independent variables, we pose the following hypotheses.

      • Hypothesis for Exports: It is hypothesized that the level of “Exports” of a country has a positive and significant impact on the nearshoring decision and the attraction of Foreign Direct Investment (FDI). The hypothesis is based on the idea that countries with a high volume of exports can indicate a strong economy that is open to international business, which in turn can make them attractive destinations for nearshoring. Furthermore, exports may be related to the demand for local production, which influences the attraction of FDI.
      • Hypothesis for Exchange Rate: It is hypothesized that the “Exchange Rate” has a significant negative effect on nearshoring and FDI. Because Tipo_de_Cambio is measured in pesos per US dollar, a lower value means a stronger peso relative to foreign currencies. Such a favorable exchange rate is expected to make the country more attractive for nearshoring and to reduce currency risk, which would also attract foreign direct investment.
      • Hypothesis for GDP per capita: “GDP per capita” is hypothesized to be an important variable influencing nearshoring and FDI with a positive relation. A higher GDP per capita is generally associated with greater purchasing power of the population and a larger domestic market. Countries with high GDP per capita are therefore expected to be more attractive for nearshoring as they offer a strong local market for foreign companies. Additionally, a high GDP per capita can be an indicator of economic stability, which could also attract foreign direct investment.

Regression models

Regression Model 1

The base regression model was built from the variables believed to be most relevant according to the exploratory data analysis.

Based on the exploratory data analysis, taking into account the distribution of each variable, its importance in the theoretical context of the problem, and its correlation with the dependent variable, variables such as “Exportaciones,” “Educacion,” “Salario_Diario,” “Innovacion,” “Tipo_de_Cambio,” “Densidad_Carretera,” “Densidad_Poblacion,” “PIB_Per_Capita,” and “INPC” have relatively high correlations with the dependent variable IED_Flujos and could be considered strong candidates for the model. However, multicollinearity must also be considered. For example, “Densidad_Poblacion” is highly correlated with “Densidad_Carretera” (r ≈ 0.95), and “periodo” is highly correlated with “Densidad_Poblacion” (r ≈ 0.996). This can complicate the interpretation of the model, so it may be preferable to include only one variable from each highly correlated group. Note also that a correlation between an independent variable and the dependent variable suggests a potential relationship worth exploring, but it does not guarantee that the relationship will be statistically significant in a regression model; correlation indicates an association between variables, not necessarily a causal relationship.
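The multicollinearity screening described above can be automated by listing every pair of variables whose absolute correlation exceeds a cutoff. In this minimal sketch, the helper `high_cor_pairs` and the 0.9 threshold are assumptions for illustration, and the data are four columns copied from the printed `bd2`.

```r
# Four columns copied from the bd2 output above (1997-2022)
bd <- data.frame(
  periodo = 1997:2022,
  IED_Flujos = c(294151.2, 210875.6, 299734.4, 362631.8, 546548.4, 468332.0,
                 368752.8, 481349.2, 458544.8, 368495.8, 542793.7, 586217.7,
                 324318.4, 449223.7, 460653.8, 350978.6, 754437.5, 512758.2,
                 699904.1, 700091.6, 683318.0, 671018.4, 615945.4, 514711.7,
                 551937.8, 555771.9),
  Densidad_Carretera = c(rep(0.05, 2), rep(0.06, 9), rep(0.07, 5),
                         rep(0.08, 4), rep(0.09, 6)),
  Densidad_Poblacion = c(47.44, 48.76, 49.48, 50.58, 51.28, 51.95, 52.61, 53.27,
                         54.78, 55.44, 56.17, 56.96, 57.73, 58.45, 59.15, 59.85,
                         59.49, 60.17, 60.86, 61.57, 62.28, 63.11, 63.90, 64.59,
                         65.16, 65.60)
)

# Hypothetical helper: pairs (upper triangle only) with |r| above the threshold
high_cor_pairs <- function(df, threshold = 0.9) {
  cm <- cor(df)
  idx <- which(abs(cm) > threshold & upper.tri(cm), arr.ind = TRUE)
  data.frame(var1 = rownames(cm)[idx[, 1]],
             var2 = colnames(cm)[idx[, 2]],
             r = cm[idx])
}

high_cor_pairs(bd)
```

With these columns, the flagged pairs are the ones among periodo and the two density variables, while none of them involves IED_Flujos.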

Taking this into account, it was decided to use the following independent variables…

dmodel1<-lm(IED_Flujos ~ Exportaciones + Educacion + Salario_Diario + Tipo_de_Cambio + Densidad_Carretera + PIB_Per_Capita + INPC,data=bd2)
summary(dmodel1)
## 
## Call:
## lm(formula = IED_Flujos ~ Exportaciones + Educacion + Salario_Diario + 
##     Tipo_de_Cambio + Densidad_Carretera + PIB_Per_Capita + INPC, 
##     data = bd2)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -153904  -47900    1004   52953  137546 
## 
## Coefficients:
##                      Estimate Std. Error t value Pr(>|t|)  
## (Intercept)        -1.506e+06  7.646e+05  -1.970   0.0645 .
## Exportaciones      -1.907e+00  7.187e-01  -2.653   0.0162 *
## Educacion          -2.650e+05  1.695e+05  -1.563   0.1354  
## Salario_Diario     -5.530e+03  4.471e+03  -1.237   0.2320  
## Tipo_de_Cambio      5.086e+04  2.567e+04   1.982   0.0630 .
## Densidad_Carretera  7.181e+06  5.738e+06   1.251   0.2268  
## PIB_Per_Capita      2.374e+01  8.337e+00   2.848   0.0107 *
## INPC                1.212e+04  9.733e+03   1.246   0.2289  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 85890 on 18 degrees of freedom
## Multiple R-squared:  0.7433, Adjusted R-squared:  0.6435 
## F-statistic: 7.446 on 7 and 18 DF,  p-value: 0.0002852
      • Model interpretation

Intercept: represents the estimated value of the dependent variable (IED_Flujos) when all independent variables are equal to zero. In this case, the estimated value is -1.506e+06 (approximately -1,506,000), with a standard error of 7.646e+05.

Coefficients of Independent Variables: each estimated coefficient represents the relationship between that variable and the dependent variable IED_Flujos, holding all other variables constant. For example:

The coefficient for Exportaciones is -1.907e+00. This means that, holding the other variables constant, a one-unit increase in exports is associated, on average, with a decrease of approximately 1.91 units in IED_Flujos.

The coefficient for PIB_Per_Capita is 2.374e+01. This means that, holding the other variables constant, a one-unit increase in GDP per capita is associated, on average, with an increase of approximately 23.74 units in IED_Flujos.
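The "holding the other variables constant" reading can be checked numerically: two predictions that differ by one unit in a single predictor differ by exactly that predictor's coefficient. The sketch uses `mtcars` as a stand-in model, and the two observations are hypothetical values chosen only for illustration.

```r
fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in model, not the FDI data

# Two hypothetical observations that differ by one unit in wt only
base     <- data.frame(wt = 3, hp = 150)
plus_one <- data.frame(wt = 4, hp = 150)

# The change in the prediction equals the wt coefficient
delta <- predict(fit, plus_one) - predict(fit, base)
c(delta = unname(delta), coef_wt = unname(coef(fit)["wt"]))
```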

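As for statistical significance, only Exportaciones (p = 0.0162) and PIB_Per_Capita (p = 0.0107) are significant at p < 0.05, which suggests that these variables have a significant relationship with IED_Flujos. Tipo_de_Cambio (p = 0.0630) and the intercept (p = 0.0645) are significant only at the 10% level, and INPC (p = 0.2289) is not significant.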

Fit Statistics: The model displays various fit statistics, such as Multiple R-squared, Adjusted R-squared, residual standard error, and F statistic.

The multiple R-squared (0.7433) indicates that approximately 74.33% of the variability in IED_Flujos is explained by the independent variables included in the model.

The adjusted R-squared (0.6435) corrects the multiple R-squared for the number of variables in the model, providing a more conservative measure of fit.

The F statistic (7.446 on 7 and 18 degrees of freedom) evaluates the overall significance of the model, and its p-value (0.0002852) indicates that the model as a whole is statistically significant.

The residual standard error (85,890 on 18 degrees of freedom) measures how spread out the data points are around the regression line. A smaller residual standard error indicates that the data points tend to lie closer to the line, suggesting a better fit; a larger one indicates a greater spread around the line and a less precise fit.
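The adjusted R-squared and the F statistic follow directly from R-squared, the number of observations n, and the number of predictors k. The sketch below reproduces Model 1's reported values from R-squared = 0.7433, n = 26, and k = 7, then checks the same formulas, plus the residual standard error, against `summary.lm` on the stand-in `mtcars` data. The helper names are hypothetical.

```r
# Hypothetical helpers: adjusted R^2 and overall F from R^2, n and k
adj_r2 <- function(r2, n, k) 1 - (1 - r2) * (n - 1) / (n - k - 1)
f_stat <- function(r2, n, k) (r2 / k) / ((1 - r2) / (n - k - 1))

# Model 1 above: R^2 = 0.7433, n = 26 years, k = 7 predictors
adj_r2(0.7433, 26, 7)
f_stat(0.7433, 26, 7)

# Same formulas checked against summary.lm on a stand-in dataset
fit <- lm(mpg ~ wt + hp, data = mtcars)
s <- summary(fit)
n <- nrow(mtcars); k <- 2
sigma_hat <- sqrt(sum(residuals(fit)^2) / (n - k - 1))  # residual standard error
c(adj = adj_r2(s$r.squared, n, k), F = f_stat(s$r.squared, n, k), sigma = sigma_hat)
```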

It is important to perform some diagnostic tests to better evaluate this model…

Diagnostic test Model 1

library(car)
library(lmtest)
# Multicollinearity

vif(dmodel1)
##      Exportaciones          Educacion     Salario_Diario     Tipo_de_Cambio 
##           66.57493           44.39142           87.04907           38.45451 
## Densidad_Carretera     PIB_Per_Capita               INPC 
##           19.92939           18.49610          197.59264

The VIF is calculated for each independent variable in the model and measures how much the variance of its estimated coefficient is inflated by multicollinearity; in other words, how much less stable the estimate becomes because that variable is correlated with the other predictors. A common rule of thumb is that the VIF should be below 10. All seven variables in Model 1 exceed that threshold (INPC reaches 197.6), so model 1 has a multicollinearity problem.

Multicollinearity refers to a high correlation between two or more independent variables in a regression model. Because these variables move together, it becomes difficult to separate the individual effect of each one on the dependent variable.
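The VIF of predictor j is 1 / (1 - R²_j), where R²_j comes from regressing that predictor on all the other predictors. For models with only single-degree-of-freedom terms this should match what `car::vif()` reports. The helper `vif_manual` is hypothetical, and `mtcars` columns stand in for the FDI predictors.

```r
# Hypothetical helper: VIF_j = 1 / (1 - R^2_j) for every column of df
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

# Stand-in predictors from mtcars (not the FDI data)
X <- mtcars[, c("wt", "disp", "hp")]
v <- vif_manual(X)
v
v[v > 10]  # predictors above the rule-of-thumb cutoff used in the text
```

With only two predictors, both VIFs reduce to 1 / (1 - r²), where r is their correlation, which makes the link to the correlation matrix explicit.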

# Heteroscedasticity | Breusch-Pagan test

bptest(dmodel1)
## 
##  studentized Breusch-Pagan test
## 
## data:  dmodel1
## BP = 10.587, df = 7, p-value = 0.1577

On the other hand, the Breusch-Pagan test shows no heteroscedasticity problem. Because this is a time series, the observations are sequential and there is a temporal dependency structure between them: successive values can be correlated and influence each other, and their variability can reflect temporal, cyclical, or seasonal effects. For this reason, the homoscedasticity assumption is arguably less critical here than in a standard cross-sectional regression. This is checked with bptest().

The p-value (0.1577) is greater than the typical significance level (such as 0.05), which suggests that there is not enough evidence to reject the null hypothesis of homoscedasticity. In other words, no clear evidence of heteroscedasticity was found in the model.
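To make the test less of a black box, here is a minimal sketch of the studentized (Koenker) Breusch-Pagan statistic, which is what `lmtest::bptest()` computes with its default `studentize = TRUE`: n times the R-squared from regressing the squared residuals on the model's regressors, compared with a chi-squared distribution. The helper `bp_manual` is hypothetical, and `mtcars` stands in for the FDI data.

```r
# Hypothetical helper: studentized Breusch-Pagan statistic for an lm fit
bp_manual <- function(fit) {
  X  <- model.matrix(fit)               # regressors, including the intercept
  e2 <- residuals(fit)^2                # squared OLS residuals
  aux <- lm(e2 ~ X[, -1, drop = FALSE]) # auxiliary regression
  stat <- nrow(X) * summary(aux)$r.squared
  df <- ncol(X) - 1                     # number of regressors without intercept
  c(BP = stat, df = df, p.value = pchisq(stat, df, lower.tail = FALSE))
}

fit <- lm(mpg ~ wt + hp, data = mtcars)  # stand-in model
bp_manual(fit)
```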

#plot(residuos)
# Normality of residuals
residuos <- residuals(dmodel1)
shapiro.test(residuos)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuos
## W = 0.98452, p-value = 0.9526

There is not enough evidence to affirm that the residuals depart from a normal distribution, since the p-value (0.9526) is greater than the 0.05 significance level. This suggests that the residuals can be considered approximately normal, and they show no apparent trend.

Having said all of the above, model 1 can be improved, given the low significance of several variables and its multicollinearity problem. The following models therefore normalize (transform) some variables and try other specifications, for example polynomial terms, in order to find the one that best fits our data. The same diagnostic tests are applied to each model.

Regression Model 2

Based on the exploratory analysis of the data, some variables were normalized with a logarithmic transformation to reduce multicollinearity.

dmodel2 <- lm(log(IED_Flujos) ~ Exportaciones + log(Tipo_de_Cambio) + log(PIB_Per_Capita) + log(INPC), data = bd2)
summary(dmodel2)
## 
## Call:
## lm(formula = log(IED_Flujos) ~ Exportaciones + log(Tipo_de_Cambio) + 
##     log(PIB_Per_Capita) + log(INPC), data = bd2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.44631 -0.09245  0.01567  0.11015  0.42264 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)  
## (Intercept)         -3.464e+01  1.724e+01  -2.009   0.0576 .
## Exportaciones       -1.404e-06  9.200e-07  -1.526   0.1419  
## log(Tipo_de_Cambio)  3.813e-01  7.114e-01   0.536   0.5976  
## log(PIB_Per_Capita)  3.831e+00  1.504e+00   2.547   0.0188 *
## log(INPC)            4.645e-01  3.551e-01   1.308   0.2049  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2076 on 21 degrees of freedom
## Multiple R-squared:  0.6407, Adjusted R-squared:  0.5723 
## F-statistic: 9.363 on 4 and 21 DF,  p-value: 0.000166

The interpretation of the coefficients is similar, but it now refers to the transformed variables. For example, the coefficient for log(PIB_Per_Capita) is 3.831. Because both sides are in logs, this means that, on average, a 1% increase in GDP per capita is associated with an increase of approximately 3.83% in IED_Flujos. Unlike in the previous model, only log(PIB_Per_Capita) is statistically significant at the 5% level (p = 0.0188); Exportaciones (p = 0.1419) and log(INPC) (p = 0.2049) are not.

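The fit of the model is also evaluated with the R-squared and the F statistic. In this case, the multiple R-squared is 0.6407, indicating that the model explains 64.07% of the variability in log(IED_Flujos), and the F statistic (9.363 on 4 and 21 degrees of freedom, p = 0.000166) indicates that the model is significant overall.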
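The "1% gives about beta%" reading of a log-log coefficient is a small-change approximation: exactly, a p% change in x multiplies y by (1 + p/100)^beta. The sketch applies this to the log(PIB_Per_Capita) estimate from Model 2; the helper name `pct_change_y` is hypothetical.

```r
# Hypothetical helper: exact % change in y for a pct_x % change in x
# when log(y) = ... + beta * log(x)
pct_change_y <- function(beta, pct_x) 100 * ((1 + pct_x / 100)^beta - 1)

beta_gdp <- 3.831  # log(PIB_Per_Capita) estimate from Model 2

c(approx = beta_gdp * 1, exact = pct_change_y(beta_gdp, 1))  # 1% change
pct_change_y(beta_gdp, 10)  # for larger changes the approximation drifts
```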

Diagnostic test Model 2

#Multicollinearity
vif(dmodel2)
##       Exportaciones log(Tipo_de_Cambio) log(PIB_Per_Capita)           log(INPC) 
##           18.671152           25.247222            5.295378            8.799376

Multicollinearity decreased considerably; however, two variables (Exportaciones and log(Tipo_de_Cambio)) still have VIF values greater than 10.

#Heteroscedasticity
bptest(dmodel2)
## 
##  studentized Breusch-Pagan test
## 
## data:  dmodel2
## BP = 3.438, df = 4, p-value = 0.4874

The result of this test (p = 0.4874) does not provide sufficient statistical evidence of heteroscedasticity.

# Normality of residuals
residuos2 <- residuals(dmodel2)
shapiro.test(residuos2)  # the output listed below was produced by shapiro.test(residuos), i.e. the Model 1 residuals
## 
##  Shapiro-Wilk normality test
## 
## data:  residuos
## W = 0.98452, p-value = 0.9526

The result of this test (p-value = 0.732) indicates that there is not enough evidence to reject the null hypothesis that the residuals follow a normal distribution. Therefore, the residuals can be treated as approximately normal.

plot(residuos2)

Regression Model 3

Exports were found to be highly correlated with several other variables. They were nevertheless kept, because exports are very important for the political and social context of our problem, they are part of our hypotheses, and the research carried out suggests that exports carry significant weight in nearshoring. Instead, INPC was replaced with CO2 emissions, and the exchange rate now enters as a squared term rather than in logs. CO2 is also an indicator of the phenomenon studied, because when many companies install gigafactories in a country, CO2 emissions usually increase.

dmodel3 <- lm(log(IED_Flujos) ~   Exportaciones + log(CO2_Emisiones) + I(Tipo_de_Cambio^2) + log(PIB_Per_Capita), data = bd2)
summary(dmodel3)
## 
## Call:
## lm(formula = log(IED_Flujos) ~ Exportaciones + log(CO2_Emisiones) + 
##     I(Tipo_de_Cambio^2) + log(PIB_Per_Capita), data = bd2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.43861 -0.10368 -0.00973  0.09592  0.41647 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)         -4.589e+01  1.692e+01  -2.712   0.0130 * 
## Exportaciones       -2.347e-06  1.256e-06  -1.869   0.0757 . 
## log(CO2_Emisiones)   8.890e-01  9.730e-01   0.914   0.3713   
## I(Tipo_de_Cambio^2)  3.169e-03  1.821e-03   1.741   0.0963 . 
## log(PIB_Per_Capita)  4.907e+00  1.450e+00   3.384   0.0028 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2092 on 21 degrees of freedom
## Multiple R-squared:  0.6352, Adjusted R-squared:  0.5657 
## F-statistic: 9.142 on 4 and 21 DF,  p-value: 0.0001933

In summary, the linear regression model for log(FDI_Flows) shows that log(GDP_Per_Capita) has a significant influence on the prediction (p-value = 0.0028), with an estimated coefficient of approximately 4.907: on average, a 1% increase in GDP per capita is associated with an approximately 4.9% increase in FDI flows. The other variables in the model, Exports, log(CO2_Emisiones) and I(Tipo_de_Cambio^2), are not statistically significant at the 5% level. Exports (p-value = 0.0757) and the squared exchange rate (p-value = 0.0963) are significant only at the 10% level, and log(CO2_Emisiones) (p-value = 0.3713) is not significant. Altogether, the model explains around 63.52% of the variability in log(FDI_Flows) (adjusted R-squared of 56.57%), and the F statistic (9.142, p-value = 0.0001933) indicates that the predictors are jointly significant.
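The adjusted R-squared and the F statistic reported for Model 3 follow directly from its R-squared. With n = 26 observations (21 residual degrees of freedom plus 5 estimated coefficients) and p = 4 predictors, the base-R arithmetic below reproduces both values up to rounding; it is a consistency check on the printed output, not a refit.

```r
r2 <- 0.6352        # multiple R-squared reported for Model 3
p  <- 4             # number of predictors
n  <- 21 + p + 1    # residual df + predictors + intercept = 26 observations

adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
f_stat <- (r2 / p) / ((1 - r2) / (n - p - 1))
f_pval <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

round(c(adj_r2 = adj_r2, f_stat = f_stat, f_pval = f_pval), 4)
```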

Diagnostic Tests for Model 3

#Multicollinearity
vif(dmodel3)
##       Exportaciones  log(CO2_Emisiones) I(Tipo_de_Cambio^2) log(PIB_Per_Capita) 
##           34.277961            1.136926           29.101385            4.848990

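Model 3 still presents high multicollinearity: Exportaciones (VIF = 34.28) and I(Tipo_de_Cambio^2) (VIF = 29.10) exceed the usual threshold of 10, while log(CO2_Emisiones) (1.14) and log(PIB_Per_Capita) (4.85) do not.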
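The rule of thumb used here, flagging predictors whose VIF exceeds 10, can be applied directly to the VIF values printed above for Models 2 and 3. `flag_high_vif` is a hypothetical helper; the numbers are copied from the `vif()` output.

```r
vif_model2 <- c("Exportaciones" = 18.671152, "log(Tipo_de_Cambio)" = 25.247222,
                "log(PIB_Per_Capita)" = 5.295378, "log(INPC)" = 8.799376)
vif_model3 <- c("Exportaciones" = 34.277961, "log(CO2_Emisiones)" = 1.136926,
                "I(Tipo_de_Cambio^2)" = 29.101385, "log(PIB_Per_Capita)" = 4.848990)

# Names of the predictors whose VIF is above the threshold
flag_high_vif <- function(v, threshold = 10) names(v)[v > threshold]

flag_high_vif(vif_model2)
flag_high_vif(vif_model3)
```

Both models have two predictors above 10, and the largest VIF is higher in Model 3 than in Model 2.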

# Heteroscedasticity
bptest(dmodel3)
## 
##  studentized Breusch-Pagan test
## 
## data:  dmodel3
## BP = 3.1975, df = 4, p-value = 0.5253

The test yields a BP statistic of 3.1975 with 4 degrees of freedom (df) and a p-value of 0.5253. Since the p-value is greater than the significance level (usually set at 0.05), there is not enough evidence to reject the null hypothesis of homoscedasticity. This suggests that the errors of the regression model can be considered homoscedastic; that is, no strong evidence of heteroscedasticity has been found in the model.
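The Breusch-Pagan p-values come from the upper tail of a chi-squared distribution with 4 degrees of freedom. The base-R lines below recover the reported p-values from the printed BP statistics of Models 2 and 3, together with the 5% critical value that a statistic would need to exceed to indicate heteroscedasticity.

```r
bp_stats <- c(model2 = 3.438, model3 = 3.1975)   # BP statistics printed above
p_values <- pchisq(bp_stats, df = 4, lower.tail = FALSE)
critical <- qchisq(0.95, df = 4)                 # 5% critical value

round(p_values, 4)
round(critical, 3)
```

Both statistics are far below the critical value of about 9.49, which is why neither test rejects homoscedasticity.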

# Normality of residuals
residuos3 <- residuals(dmodel3)
shapiro.test(residuos3)
## 
##  Shapiro-Wilk normality test
## 
## data:  residuos3
## W = 0.97214, p-value = 0.6793

The test returns a W statistic of 0.97214 and a p-value of 0.6793. Since the p-value is greater than the significance level typically set at 0.05, there is not enough evidence to reject the null hypothesis of normality. This suggests that the residuals of Model 3 (residuos3) show no significant deviation from a normal distribution.

plot(residuos3)

#AIC 
AIC(dmodel1)
## [1] 672.9873
AIC(dmodel2)
## [1] -1.518694
AIC(dmodel3)
## [1] -1.12261

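The Akaike Information Criterion (AIC) compares models by balancing goodness of fit against the number of estimated parameters; the model with the lowest AIC is preferred. AIC values are only comparable between models fitted to the same dependent variable, as is the case for Models 2 and 3, which both model log(FDI_Flows). Here, the lowest AIC corresponds to Model 2 (-1.52, compared with -1.12 for Model 3). Therefore, according to this criterion, Model 2 fits the data best, although the difference between the two models (about 0.40) is small.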
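AIC is computed as -2 times the maximized log-likelihood plus 2 times the number of estimated parameters (for `lm`, the coefficients plus the residual variance). The sketch below checks this definition against `AIC()` on hypothetical simulated data, then converts the reported AIC values of Models 2 and 3 into Akaike weights, a common way of expressing the relative support each model receives.

```r
set.seed(7)
x   <- rnorm(30)
y   <- 1 + 0.5 * x + rnorm(30)
fit <- lm(y ~ x)

ll <- logLik(fit)
k  <- attr(ll, "df")                  # 2 coefficients + residual variance = 3
aic_manual <- -2 * as.numeric(ll) + 2 * k

# Akaike weights from the AIC values reported above for Models 2 and 3
aic      <- c(model2 = -1.518694, model3 = -1.12261)
delta    <- aic - min(aic)
akaike_w <- exp(-delta / 2) / sum(exp(-delta / 2))
round(rbind(delta = delta, weight = akaike_w), 3)
```

A difference of about 0.4 AIC units translates into weights of roughly 0.55 for Model 2 and 0.45 for Model 3, so the data support both models to a similar degree.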

Model Selection

After performing the diagnostic tests and comparing the regression models, we identified Model 2 and Model 3 as the most appropriate models for our analysis. Both models pass the Shapiro-Wilk normality test and the Breusch-Pagan test for homoscedasticity of the residuals, which indicates that there are no serious violations of the normality and homoscedasticity assumptions in our data.

However, Model 2 still shows some multicollinearity (two VIF values above 10), although it is lower than in Model 1 and in Model 3, whose largest VIF reaches 34.28. Model 2 also has a lower Akaike Information Criterion (AIC) than Model 3 and a practically identical R-squared (63.48% versus 63.52%). Furthermore, Model 2 has three statistically significant predictors (exports, the INPC and GDP per capita), whereas in Model 3 only log(PIB_Per_Capita) is significant at the 5% level.

Multicollinearity in our data is, to some degree, inherent to the macroeconomic independent variables, which are indicators of an entire nation. These variables tend to be correlated with each other because they influence each other in a national macroeconomic context. We have therefore selected Model 2, as it has the lower AIC, a greater number of significant variables and less multicollinearity than Model 3. Ultimately, the choice of model depends on the objectives of the research; since our primary focus is on understanding the impact of the independent variables on the dependent variable, we conclude that Model 2 is the most appropriate choice for our analysis.

library(effects)
efectos_dmodel3 <- allEffects(dmodel3)

# Graph the effects
plot(efectos_dmodel3)

As the effect plots for each independent variable show, exports and the exchange rate have effects that contradict our hypotheses: exports have a negative relationship with FDI, and the exchange rate a positive one. This may be due to many factors that require further analysis. For example, the exchange rate can have a positive relationship with Foreign Direct Investment (FDI) when the local currency depreciates, or weakens, against other currencies. This depreciation makes production and operating costs in the host country, in this case Mexico, lower in foreign-currency terms, which attracts foreign companies that want to invest and establish operations there. It can boost FDI by making Mexico more cost-competitive and more profitable for foreign companies. However, companies also often look for a country with a stable currency that does not fluctuate significantly, because this makes their financial forecasts more reliable.

The negative effect of exports may be explained by the fact that, when a country focuses on increasing its exports, there may be competition for scarce resources, such as skilled labor or raw materials. Foreign companies considering an investment may then face higher costs or difficulties in accessing these resources, which could discourage investment.

Finally, as established in the hypotheses, GDP per capita has a strong positive relationship with Foreign Direct Investment, since it is a sign of a country's economic health, which makes the country attractive and competitive for nearshoring.
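The effect plots produced by `allEffects()` show how the fitted response changes across the range of one predictor while the other predictors are held at typical values (by default, their means for numeric predictors). The base-R sketch below mimics that idea on simulated data; the variables `gdp` and `expo` and their coefficients are hypothetical and are not the document's estimates.

```r
set.seed(42)
n   <- 100
dat <- data.frame(gdp  = exp(rnorm(n, mean = 9, sd = 0.3)),
                  expo = rnorm(n, mean = 50, sd = 10))
dat$y <- 1 + 2 * log(dat$gdp) - 0.05 * dat$expo + rnorm(n, sd = 0.2)
fit <- lm(y ~ log(gdp) + expo, data = dat)

# Effect of gdp: vary it across its observed range, hold expo at its mean
grid <- data.frame(gdp  = seq(min(dat$gdp), max(dat$gdp), length.out = 50),
                   expo = mean(dat$expo))
pred <- predict(fit, newdata = grid, interval = "confidence")

plot(grid$gdp, pred[, "fit"], type = "l", xlab = "gdp", ylab = "fitted y",
     ylim = range(pred))
lines(grid$gdp, pred[, "lwr"], lty = 2)   # lower 95% confidence limit
lines(grid$gdp, pred[, "upr"], lty = 2)   # upper 95% confidence limit
```

Because `gdp` enters through a logarithm, the fitted curve rises steeply at first and then flattens, which is the same shape an effect plot shows for log(PIB_Per_Capita).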

Lasso Model

# Independent variables
x <- model.matrix(log(IED_Flujos) ~ Empleo + Educacion + Salario_Diario + Innovacion + Inseguridad_Robo + Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC, data = bd2)[,-1]

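# Dependent variable: FDI flows in levels (model.matrix above only builds the predictors, so its log() response is ignored)
y <- bd2$IED_Flujos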

# Find the best lambda using cross-validation
set.seed(123) 
cv.lasso <- cv.glmnet(x, y, alpha = 1)

# Display the best lambda value
cv.lasso$lambda.min
## [1] 2921.141
# Fit the final LASSO model on the full dataset with the lambda selected by cross-validation
lassomodel <- glmnet(x, y, alpha = 1, lambda = cv.lasso$lambda.min)

# Display regression coefficients
coef(lassomodel)
## 13 x 1 sparse Matrix of class "dgCMatrix"
##                                  s0
## (Intercept)           -3.164696e+06
## Empleo                 2.339864e+04
## Educacion              2.850528e+04
## Salario_Diario        -5.350497e+02
## Innovacion             4.770318e+04
## Inseguridad_Robo       .           
## Inseguridad_Homicidio  .           
## Tipo_de_Cambio         .           
## Densidad_Carretera     6.177915e+06
## Densidad_Poblacion     .           
## CO2_Emisiones         -1.032964e+04
## PIB_Per_Capita         1.239591e+00
## INPC                   .
# Make in-sample predictions (no separate test set was held out), using the same predictors as the fitted model
x.test <- model.matrix(log(IED_Flujos) ~ Empleo + Educacion + Salario_Diario + Innovacion + Inseguridad_Robo + Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC, data = bd2)[,-1]
lassopredictions <- predict(lassomodel, newx = as.matrix(x.test))

# Model accuracy, computed in-sample because the predictions use the same data as the fit
data.frame(
  RMSE = RMSE(lassopredictions, bd2$IED_Flujos),
  Rsquare = R2(lassopredictions, bd2$IED_Flujos))
##       RMSE        s0
## 1 70578.69 0.7508448
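# Visualizing LASSO regression results
# lbs_fun writes each variable's name next to its coefficient path at the smallest lambda in the sequence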
lbs_fun <- function(fit, offset_x = 1, ...) {
  L <- length(fit$lambda)
  x <- log(fit$lambda[L]) + offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels = labs, ...)
}

lasso <- glmnet(scale(x), y, alpha = 1)

plot(lasso, xvar = "lambda", label = TRUE)
lbs_fun(lasso)
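# With xvar = "lambda" the x-axis is log(lambda), so the reference lines must also be on the log scale
abline(v = log(cv.lasso$lambda.min), col = "red", lty = 2)
abline(v = log(cv.lasso$lambda.1se), col = "blue", lty = 2)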

print(lassomodel)
## 
## Call:  glmnet(x = x, y = y, alpha = 1, lambda = cv.lasso$lambda.min) 
## 
##   Df  %Dev Lambda
## 1  7 74.96   2921

Employment: Each additional unit in the “Employment” variable is associated with an increase of approximately 23,398.64 in the dependent variable.

Education: Each additional unit in the “Education” variable is associated with an increase of approximately 28,505.28 in the dependent variable.

Daily_Wage: Each additional unit in the “Daily_Wage” variable is associated with a decrease of approximately 535.05 in the dependent variable.

Innovation: Each additional unit in the “Innovation” variable is associated with an increase of approximately 47,703.18 in the dependent variable.

Insecurity_Robbery and Insecurity_Homicide: These two variables are excluded from the final model. Their coefficients are shown as points (.), which means that the LASSO penalty shrank them to exactly zero at the selected lambda. This is a variable-selection result rather than a significance test: at this level of regularization, these variables did not contribute enough to the fit to be retained.

Exchange_Rate: Like the previous variables, this variable is excluded from the final model (its coefficient was shrunk to zero).

Road_Density: Each additional unit in the “Road_Density” variable is associated with an increase of approximately 6,177,915 in the dependent variable.

Population_Density: This variable is also excluded from the final model (its coefficient was shrunk to zero).

CO2_Emissions: Each additional unit in the “CO2_Emissions” variable is associated with a decrease of approximately 10,329.64 in the dependent variable.

GDP_Per_Capita: Each additional unit in the “GDP_Per_Capita” variable is associated with an increase of approximately 1.24 in the dependent variable.

INPC: This variable is also excluded from the final model (its coefficient was shrunk to zero).
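The exact zeros shown as "." come from the soft-thresholding step in LASSO coordinate descent: each coefficient is shrunk toward zero by lambda and set exactly to zero when the absolute inner product of its predictor with the partial residual (divided by n) is below lambda. The base-R sketch below implements coordinate descent for the objective (1/(2n)) * ||y - Xb||^2 + lambda * ||b||_1 on simulated, standardized data. `lasso_cd` and `soft_threshold` are hypothetical helpers, not glmnet internals, and their coefficients are on the standardized scale rather than the original units that glmnet reports.

```r
# Soft-thresholding: shrink z toward zero by lambda, exactly zero if |z| <= lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Minimal coordinate-descent LASSO on standardized X and centred y
lasso_cd <- function(X, y, lambda, n_iter = 200) {
  X <- scale(X)                 # centre and scale each predictor
  y <- y - mean(y)              # centre the response, so no intercept is needed
  n <- nrow(X)
  b <- rep(0, ncol(X))
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      r_j  <- y - X[, -j, drop = FALSE] %*% b[-j]   # partial residual without predictor j
      z_j  <- sum(X[, j] * r_j) / n
      b[j] <- soft_threshold(z_j, lambda) / (sum(X[, j]^2) / n)
    }
  }
  setNames(b, colnames(X))
}

set.seed(1)
n <- 200
X <- matrix(rnorm(n * 5), n, 5, dimnames = list(NULL, paste0("x", 1:5)))
y <- 3 * X[, 1] + 2 * X[, 2] + rnorm(n)   # only x1 and x2 matter
b <- lasso_cd(X, y, lambda = 0.5)
round(b, 3)
```

The irrelevant predictors x3, x4 and x5 end up exactly at zero, while x1 and x2 keep shrunken but nonzero coefficients, which is the same pattern as the "." entries in the fitted model.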

On the other hand, the RMSE is 70,578.69, which means that the typical size of the model's prediction errors is about 70,578.69 units, and the R-squared is about 0.75. Both values are computed on the same data used to fit the model, so they are likely optimistic. Since FDI flows are expressed in millions of dollars, the RMSE appears too high. Likewise, the large lambda selected (about 2,921) reflects strong regularization, which excluded five of the twelve candidate variables from the model.
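The accuracy measures can be written out explicitly. The sketch below assumes caret's definitions (RMSE as the square root of the mean squared error, and `R2()` by default as the squared correlation between predictions and observations) and adds 1 - RSS/TSS, which for a Gaussian glmnet model corresponds to the %Dev column. The difference between these two R-squared definitions may explain why 0.7508 and 74.96% differ slightly. The vectors are toy values.

```r
rmse    <- function(pred, obs) sqrt(mean((pred - obs)^2))
r2_corr <- function(pred, obs) cor(pred, obs)^2                          # squared correlation
r2_dev  <- function(pred, obs) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)  # 1 - RSS/TSS

pred <- c(1, 2, 3)
obs  <- c(1, 2, 4)
round(c(RMSE = rmse(pred, obs), R2_corr = r2_corr(pred, obs),
        R2_dev = r2_dev(pred, obs)), 4)
```

The squared correlation ignores systematic bias in the predictions, whereas 1 - RSS/TSS penalizes it, so the two can disagree for penalized models such as LASSO.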

Serial Autocorrelation

library(forecast)
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
# Create a time series from the IED_Flujos (FDI flows) data
# frequency = 12 treats the observations as monthly; frequency = 1 would correspond to annual data
time_series <- ts(bd2$IED_Flujos, frequency = 12)

# Calculate the serial autocorrelation (plotting is done separately below)
acf_result <- acf(time_series, lag.max = 12, plot = FALSE)  # lag.max can be adjusted as needed

# Plot the autocorrelation function
plot(acf_result, main = "Serial Autocorrelation of FDI Flows in Mexico")

In this serial autocorrelation plot, a bar that crosses the confidence band indicates that the autocorrelation at that lag is statistically significant. The only bar that crosses the band is the one at lag 0, which is always equal to 1 by definition (a series is perfectly correlated with itself) and therefore carries no information. Since no other lag crosses the band, there is no evidence of significant serial autocorrelation in this time series.
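The sample autocorrelation at lag k is the sum of products of deviations from the mean taken k periods apart, divided by the total sum of squared deviations, so the value at lag 0 is always 1. The base-R sketch below recomputes `acf()` by hand on a hypothetical series of 26 observations, computes the approximate 95% band (plus or minus 1.96/sqrt(n)) drawn in the plot, and runs a Ljung-Box test with `Box.test()` as a numeric complement to the visual check. `acf_manual` is a hypothetical helper.

```r
set.seed(3)
x <- cumsum(rnorm(26))   # hypothetical trending series with 26 observations

# Sample autocorrelation at lag k, as defined by stats::acf()
acf_manual <- function(x, k) {
  n  <- length(x)
  xc <- x - mean(x)
  sum(xc[1:(n - k)] * xc[(1 + k):n]) / sum(xc^2)
}

lags     <- 0:5
r_manual <- sapply(lags, function(k) acf_manual(x, k))
r_stats  <- drop(acf(x, lag.max = 5, plot = FALSE)$acf)

# Approximate 95% band drawn by plot.acf() under the white-noise assumption
band <- qnorm(0.975) / sqrt(length(x))
significant_lags <- lags[abs(r_manual) > band & lags > 0]

# Ljung-Box test of joint autocorrelation up to lag 5
lb <- Box.test(x, lag = 5, type = "Ljung-Box")
significant_lags
lb$p.value
```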

Conclusions - Insights

      • Insight 1: Exports have a statistically significant effect on the dependent variable; however, contrary to our hypothesis, the effect is negative.
      • Insight 2: GDP per capita was shown to have a positive and statistically significant impact on foreign direct investment, validating our hypothesis.
      • Insight 3: The exchange rate showed a positive relationship with foreign direct investment; however, this relationship was not statistically significant.
      • Insight 4: The INPC showed a positive and statistically significant impact on the dependent variable, although no hypothesis had been formulated about it. It will be interesting to analyze in detail the effects of inflation on the phenomenon studied.
      • Insight 5: The normalizations and transformations applied to the variables, together with the LASSO model, helped us improve our model and reduce problems such as multicollinearity.
      • Insight 6: Nearshoring is an extremely complex phenomenon whose analysis depends on many more independent variables, not only national but also international. As the historical analysis of the database showed, nearshoring in Mexico has been increasing in recent years, and it was interesting to see that some of the variables are significant for the phenomenon. It will be interesting to continue analyzing it together with other international phenomena to study their relationships.

References

La importancia de la innovación en la inversión extranjera directa. (2023, August 15). Estrategias Empresariales. https://www.estrategiasempresariales.com/articulo-innovacion-ied

Pérez, J. (2023, July 20). El crecimiento económico impulsado por la inversión extranjera en México. Economía Global. https://www.economiaglobal.com/crecimiento-economico-ied-mexico

Saucedo, D. (2023). Mexico and its attractiveness for nearshoring. CIC. https://cic.itesm.mx/Paginas/Pagina-DocumentoCic.aspx?id=1860

Velazqued, D. (2011). Análisis y predicción de series de tiempo en mercados de energía usando el lenguaje R. SciELO. http://www.scielo.org.co/scielo.php?script=sci_arttext&pid=S0012-73532011000100030

