In an increasingly globalized and competitive business world, organizations are constantly seeking innovative strategies to streamline their operations and stay ahead in their respective industries. One of these strategies that has gained prominence in recent years is “Nearshoring”, an outsourcing approach that involves delegating activities and services to providers located in geographically close countries. In this context, Mexico emerges as an attractive destination for “Nearshoring”, thanks to its strategic location, trained human resources and favorable economic conditions.
Mexico has a privileged geographical location: it shares a border with the United States and is part of the North American Free Trade Agreement (NAFTA), now the USMCA (T-MEC in Spanish), which facilitates access to one of the largest and most dynamic markets in the world. This geographic proximity not only reduces transportation costs and times, but also facilitates real-time communication and collaboration between parent companies and their nearshoring partners. In addition, Mexico has established a network of trade agreements that grant tariff advantages and ease the flow of merchandise, which provides an attractive platform for subcontracting operations.
The country also has a highly skilled and diversified workforce, ranging from engineers and technology professionals to experts in manufacturing and financial services. Labor costs in Mexico are competitive compared to other outsourcing destinations, allowing companies to obtain higher added value at lower cost. In addition, continued growth in education and training has expanded the pool of highly qualified professionals, which helps ensure the availability of the talent needed to meet the demands of “Nearshoring”.
Predictive Analytics, or “Predictive Analysis”, is a discipline within the field of analytics that focuses on using historical data and advanced statistical models to forecast future results and trends. Its primary goal is to make reliable predictions about future events based on patterns and relationships identified in available data. In other words, predictive analytics allows you to anticipate what might happen based on what has happened in the past.
Within predictive analysis, regression is an essential technique. Regression is a statistical method that seeks to model the relationship between a dependent variable and one or more independent variables, in order to predict or estimate future values of the dependent variable. In the context of predictive analytics, regression becomes a valuable tool for understanding how independent variables influence the target variable and how they can be used to make forecasts.
Regression is especially useful for measuring and quantifying the relative impact of different variables on the target variable. In this work, we use variables such as education, innovation, population density, exports and employment to estimate a regression model of Foreign Direct Investment, the dependent variable, as part of the analysis of “Nearshoring” for the Mexican case. By modeling these relationships, more informed predictions can be made about the results of “Nearshoring” in different contexts and under different conditions.
In the context of the analysis of foreign direct investment (FDI) in Mexico, we seek to understand the relationships and factors that influence the flow of FDI into the country. FDI is a crucial economic indicator that impacts the growth and development of a nation. The set of variables available for this study includes data for the time period in question, as well as factors that could influence the flow of FDI, such as the exchange rate, exports, employment, education, innovation, insecurity, road density and population density, among others.
Linear regression analysis is presented as a fundamental tool for this purpose. Linear regression will allow modeling the relationship between the dependent variable, in this case the flow of FDI, and the aforementioned independent variables. Through this analysis we seek to identify those variables that have a significant impact on the flow of FDI. Additionally, the direction and magnitude of these relationships will be explored, providing an understanding of how changes in the independent variables are associated with changes in foreign direct investment.
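As a point of reference, the multiple linear regression described above can be written in the general form below. The notation is introduced here for illustration and is not taken from the original analysis: $\mathrm{FDI}_t$ is the FDI flow in year $t$, $x_{j,t}$ is the value of the $j$-th independent variable in that year, $\beta_j$ is its coefficient and $\varepsilon_t$ is the error term.

```latex
\mathrm{FDI}_t = \beta_0 + \beta_1 x_{1,t} + \beta_2 x_{2,t} + \dots + \beta_k x_{k,t} + \varepsilon_t
```

Under this form, the sign of each $\beta_j$ gives the direction of the association and its size gives the expected change in FDI for a one-unit change in $x_j$, holding the other variables fixed.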
This study focuses on providing a deeper perspective on the determinants of FDI in Mexico, allowing decision makers, investors and economic analysts to have a more solid understanding of how these variables influence the flow of investment. Regression analysis will not only provide valuable information for decision-making on economic policies and investment strategies, but could also help predict future FDI flows based on changes in the independent variables. All this in order to help Maria in her economic analysis for the company where she works.
The variables used in the analysis are listed below, together with the unit and a short description of each one.
— “IED_Flujos”: Foreign Direct Investment (FDI). Unit: millions of Mexican pesos. Description: Represents the flows of Foreign Direct Investment into the economy, that is, the amount of money that enters the country as foreign investment.
— “Exportaciones”: Exports. Unit: millions of Mexican pesos. Description: Corresponds to the value of exports of goods and services not related to oil. Includes exports from the Maquiladora Export Industry.
— “Empleo”: Employment Rate. Unit: percentage. Description: Indicates the percentage of the economically active population that is employed.
— “Educacion”: Years of Education. Unit: years. Description: Represents the average number of years of education of the population. The higher the value, the higher the educational level.
— “Salario_Diario”: Minimum Daily Wage. Unit: pesos. Description: Indicates the minimum wage in pesos per day, which is the base salary paid to workers per working day.
— “Innovacion”: Patent Rate. Unit: patents per 100,000 inhabitants. Description: Shows the number of patents requested in Mexico per 100,000 inhabitants. It reflects technological innovation in the country.
— “Inseguridad_Robo”: Rate of Robbery with Violence. Unit: robberies per 100,000 inhabitants. Description: Represents the rate of violent robberies in different contexts, such as homes, vehicles and businesses.
— “Inseguridad_Homicidio”: Homicide Rate. Unit: homicides per 100,000 inhabitants. Description: Indicates the homicide rate per 100,000 inhabitants in the country.
— “Tipo_de_Cambio”: Exchange Rate. Unit: pesos per dollar. Description: Reflects the value of the Mexican peso in relation to the US dollar. It is important in international trade.
— “Densidad_Carretera”: Road Density. Unit: km of road per km². Description: Measures the kilometers of paved road per km² of the country’s land area.
— “Densidad_Poblacion”: Population Density. Unit: inhabitants per km². Description: Indicates the population divided by the territorial area of Mexico in km².
— “CO2_Emisiones”: Carbon Dioxide (CO2) Emissions. Unit: metric tons per capita. Description: Represents carbon dioxide emissions per inhabitant. Reflects the environmental impact.
— “PIB_Per_Capita”: Gross Domestic Product (GDP) per Capita. Unit: real Mexican pesos (2013 prices). Description: The Gross Domestic Product divided by the population and adjusted to 2013 prices. It indicates the average income per inhabitant.
— “INPC”: Consumer Price Index (INPC). Unit: price index (base 2018 = 100). Description: Represents the consumer price index, which reflects the variation in the prices of consumer goods and services.
### loading libraries
library(foreign)
library(dplyr) # data manipulation
library(forcats) # to work with categorical variables
library(ggplot2) # data visualization
library(readr) # read specific csv files
library(janitor) # data exploration and cleaning
library(Hmisc) # several useful functions for data analysis
library(psych) # functions for multivariate analysis
library(naniar) # summaries and visualization of missing values NA's
library(dlookr) # data diagnosis, exploration and transformation
library(corrplot) # correlation plots
library(jtools) # presentation of regression analysis
library(lmtest) # diagnostic checks - linear regression analysis
library(car) # diagnostic checks - linear regression analysis
library(olsrr) # diagnostic checks - linear regression analysis
library(stargazer) # create publication quality tables
library(effects) # displays for linear and other regression models
library(tidyverse) # collection of R packages designed for data science
library(caret) # Classification and Regression Training
library(glmnet) # methods for prediction and plotting, and functions for cross-validation
To begin, it is essential to explore the first rows of our database, which will allow us to understand the types of data we have available and the independent variables that will be fundamental to building a robust linear regression model. In this project, the dependent variable will be Foreign Direct Investment (FDI). This choice is justified because FDI is a key measure related to the central topic of our project: Nearshoring. Furthermore, FDI is influenced by a series of independent variables covering political, social, environmental and ethical aspects. Together these variables shape the level of FDI, making it a fundamental indicator for our analysis.
# Importing the database
bd <- read.csv("/Users/gabrielmedina/Downloads/nearshoring.csv")
bd
## periodo IED_Flujos_dolares Exportaciones_dolares Empleo Educacion
## 1 1997 12145.60 9087.62 NA 7.20
## 2 1998 8373.50 9875.07 NA 7.31
## 3 1999 13960.32 10990.01 NA 7.43
## 4 2000 18248.69 12482.96 97.83 7.56
## 5 2001 30057.18 11300.44 97.36 7.68
## 6 2002 24099.21 11923.10 97.66 7.80
## 7 2003 18249.97 13156.00 97.06 7.93
## 8 2004 25015.57 13573.13 96.48 8.04
## 9 2005 25795.82 16465.81 97.17 8.14
## 10 2006 21232.54 17485.93 96.53 8.26
## 11 2007 32393.33 19103.85 96.60 8.36
## 12 2008 29502.46 16924.76 95.68 8.46
## 13 2009 17849.95 19702.63 95.20 8.56
## 14 2010 27189.28 22673.14 95.06 8.63
## 15 2011 25632.52 24333.02 95.49 8.75
## 16 2012 21769.32 26297.98 95.53 8.85
## 17 2013 48354.42 27687.57 95.75 8.95
## 18 2014 30351.25 31676.78 96.24 9.05
## 19 2015 35943.75 29959.94 96.04 9.15
## 20 2016 31188.98 31375.06 96.62 9.25
## 21 2017 34017.05 33322.62 96.85 9.35
## 22 2018 34100.43 35341.90 96.64 9.45
## 23 2019 34577.16 36414.73 97.09 9.58
## 24 2020 28205.89 41077.34 96.21 NA
## 25 2021 31553.52 44914.78 96.49 NA
## 26 2022 36215.37 46477.59 97.24 NA
## Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1 24.30 11.30 266.51 14.55
## 2 31.91 11.37 314.78 14.32
## 3 31.91 12.46 272.89 12.64
## 4 35.12 13.15 216.98 10.86
## 5 37.57 13.47 214.53 10.25
## 6 39.74 12.80 197.80 9.94
## 7 41.53 11.81 183.22 9.81
## 8 43.30 12.61 146.28 8.92
## 9 45.24 13.41 136.94 9.22
## 10 47.05 14.23 135.59 9.60
## 11 48.88 15.04 145.92 8.04
## 12 50.84 14.82 158.17 12.52
## 13 53.19 12.59 175.77 17.46
## 14 55.77 12.69 201.94 22.43
## 15 58.06 12.10 212.61 23.42
## 16 60.75 13.03 190.28 22.09
## 17 63.12 13.22 185.56 19.74
## 18 65.58 13.65 154.41 16.93
## 19 70.10 15.11 180.44 17.37
## 20 73.04 14.40 160.57 20.31
## 21 88.36 14.05 230.43 26.22
## 22 88.36 13.25 184.25 29.59
## 23 102.68 12.70 173.45 29.21
## 24 123.22 11.28 133.90 28.98
## 25 141.70 NA 127.13 27.89
## 26 172.87 NA 120.49 NA
## Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1 8.06 0.05 47.44 3.68
## 2 9.94 0.05 48.76 3.85
## 3 9.52 0.06 49.48 3.69
## 4 9.60 0.06 50.58 3.87
## 5 9.17 0.06 51.28 3.81
## 6 10.36 0.06 51.95 3.82
## 7 11.20 0.06 52.61 3.95
## 8 11.22 0.06 53.27 3.98
## 9 10.71 0.06 54.78 4.10
## 10 10.88 0.06 55.44 4.19
## 11 10.90 0.06 56.17 4.22
## 12 13.77 0.07 56.96 4.19
## 13 13.04 0.07 57.73 4.04
## 14 12.38 0.07 58.45 4.11
## 15 13.98 0.07 59.15 4.19
## 16 12.99 0.07 59.85 4.20
## 17 13.07 0.08 59.49 4.06
## 18 14.73 0.08 60.17 3.89
## 19 17.34 0.08 60.86 3.93
## 20 20.66 0.08 61.57 3.89
## 21 19.74 0.09 62.28 3.84
## 22 19.66 0.09 63.11 3.65
## 23 18.87 0.09 63.90 3.59
## 24 19.94 0.09 64.59 NA
## 25 20.52 0.09 65.16 NA
## 26 19.41 0.09 65.60 NA
## PIB_Per_Capita INPC
## 1 127570.1 33.28
## 2 126738.8 39.47
## 3 129164.7 44.34
## 4 130874.9 48.31
## 5 128083.4 50.43
## 6 128205.9 53.31
## 7 128737.9 55.43
## 8 132563.5 58.31
## 9 132941.1 60.25
## 10 135894.9 62.69
## 11 137795.7 65.05
## 12 135176.0 69.30
## 13 131233.0 71.77
## 14 134991.7 74.93
## 15 138891.9 77.79
## 16 141530.2 80.57
## 17 144112.0 83.77
## 18 147277.4 87.19
## 19 149433.5 89.05
## 20 152275.4 92.04
## 21 153235.7 98.27
## 22 153133.8 99.91
## 23 150233.1 105.93
## 24 142609.3 109.27
## 25 142772.0 117.31
## 26 146826.7 126.48
We begin with an exploratory analysis of the data to understand its structure and the types of data available.
sum(is.na(bd))
## [1] 12
str(bd)
## 'data.frame': 26 obs. of 15 variables:
## $ periodo : int 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
## $ IED_Flujos_dolares : num 12146 8374 13960 18249 30057 ...
## $ Exportaciones_dolares: num 9088 9875 10990 12483 11300 ...
## $ Empleo : num NA NA NA 97.8 97.4 ...
## $ Educacion : num 7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
## $ Salario_Diario : num 24.3 31.9 31.9 35.1 37.6 ...
## $ Innovacion : num 11.3 11.4 12.5 13.2 13.5 ...
## $ Inseguridad_Robo : num 267 315 273 217 215 ...
## $ Inseguridad_Homicidio: num 14.6 14.3 12.6 10.9 10.2 ...
## $ Tipo_de_Cambio : num 8.06 9.94 9.52 9.6 9.17 ...
## $ Densidad_Carretera : num 0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
## $ Densidad_Poblacion : num 47.4 48.8 49.5 50.6 51.3 ...
## $ CO2_Emisiones : num 3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
## $ PIB_Per_Capita : num 127570 126739 129165 130875 128083 ...
## $ INPC : num 33.3 39.5 44.3 48.3 50.4 ...
#Descriptive statistics
summary(bd)
## periodo IED_Flujos_dolares Exportaciones_dolares Empleo
## Min. :1997 Min. : 8374 Min. : 9088 Min. :95.06
## 1st Qu.:2003 1st Qu.:21367 1st Qu.:13260 1st Qu.:95.89
## Median :2010 Median :27698 Median :21188 Median :96.53
## Mean :2010 Mean :26770 Mean :23601 Mean :96.47
## 3rd Qu.:2016 3rd Qu.:32183 3rd Qu.:31601 3rd Qu.:97.08
## Max. :2022 Max. :48354 Max. :46478 Max. :97.83
## NA's :3
## Educacion Salario_Diario Innovacion Inseguridad_Robo
## Min. :7.200 Min. : 24.30 Min. :11.28 Min. :120.5
## 1st Qu.:7.865 1st Qu.: 41.97 1st Qu.:12.56 1st Qu.:148.3
## Median :8.460 Median : 54.48 Median :13.09 Median :181.8
## Mean :8.423 Mean : 65.16 Mean :13.11 Mean :185.4
## 3rd Qu.:9.000 3rd Qu.: 72.31 3rd Qu.:13.75 3rd Qu.:209.9
## Max. :9.580 Max. :172.87 Max. :15.11 Max. :314.8
## NA's :3 NA's :2
## Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## Min. : 8.04 Min. : 8.06 Min. :0.05000 Min. :47.44
## 1st Qu.:10.25 1st Qu.:10.75 1st Qu.:0.06000 1st Qu.:52.77
## Median :16.93 Median :13.02 Median :0.07000 Median :58.09
## Mean :17.29 Mean :13.91 Mean :0.07115 Mean :57.33
## 3rd Qu.:22.43 3rd Qu.:18.49 3rd Qu.:0.08000 3rd Qu.:61.39
## Max. :29.59 Max. :20.66 Max. :0.09000 Max. :65.60
## NA's :1
## CO2_Emisiones PIB_Per_Capita INPC
## Min. :3.590 Min. :126739 Min. : 33.28
## 1st Qu.:3.830 1st Qu.:130964 1st Qu.: 56.15
## Median :3.930 Median :136845 Median : 73.35
## Mean :3.945 Mean :138550 Mean : 75.17
## 3rd Qu.:4.105 3rd Qu.:146148 3rd Qu.: 91.29
## Max. :4.220 Max. :153236 Max. :126.48
## NA's :3
The descriptive statistics cover the period from 1997 to 2022. Foreign Direct Investment (FDI), the dependent variable, varies widely: from 8,374 to 48,354 in nominal dollars in the summary above, which corresponds to 210,876 to 754,438 once the series is converted to real pesos later in the analysis. Other variables, such as the daily wage, innovation, insecurity and GDP per capita, also show considerable variability. These data provide an overview of the diversity in the key variables, suggesting the need for further analysis and the possibility of building a linear regression model to investigate the relationships between them in the context of Nearshoring.
As seen above, several records contain missing values (12 in total). With only 26 observations, removing those rows would shrink the sample and could skew the data. Mean imputation is therefore used to replace the missing values.
# Data imputation using the mean
datos_imputados <- bd
# Column means computed without the NA values
media_empleo <- mean(bd$Empleo, na.rm = TRUE)
media_educacion <- mean(bd$Educacion, na.rm = TRUE)
media_innovacion <- mean(bd$Innovacion, na.rm = TRUE)
media_homicidio <- mean(bd$Inseguridad_Homicidio, na.rm = TRUE)
media_CO2 <- mean(bd$CO2_Emisiones, na.rm = TRUE)
# Replace each NA with the mean of its column
datos_imputados$Empleo[is.na(datos_imputados$Empleo)] <- media_empleo
datos_imputados$Educacion[is.na(datos_imputados$Educacion)] <- media_educacion
datos_imputados$Innovacion[is.na(datos_imputados$Innovacion)] <- media_innovacion
datos_imputados$Inseguridad_Homicidio[is.na(datos_imputados$Inseguridad_Homicidio)] <- media_homicidio
datos_imputados$CO2_Emisiones[is.na(datos_imputados$CO2_Emisiones)] <- media_CO2
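The column-by-column imputation above can also be expressed as a single helper. The following is only a sketch under stated assumptions: `impute_mean` and `demo` are hypothetical names that do not appear in the original analysis, and the example uses just the employment series copied from the data printed earlier.

```r
# Hypothetical helper (not part of the original analysis): replace NA values
# in every numeric column of a data frame with that column's mean.
impute_mean <- function(df) {
  for (col in names(df)) {
    if (is.numeric(df[[col]]) && anyNA(df[[col]])) {
      df[[col]][is.na(df[[col]])] <- mean(df[[col]], na.rm = TRUE)
    }
  }
  df
}

# Employment rate, 1997-2022, copied from the data printed earlier
empleo <- c(NA, NA, NA, 97.83, 97.36, 97.66, 97.06, 96.48, 97.17, 96.53,
            96.60, 95.68, 95.20, 95.06, 95.49, 95.53, 95.75, 96.24, 96.04,
            96.62, 96.85, 96.64, 97.09, 96.21, 96.49, 97.24)
demo <- data.frame(periodo = 1997:2022, Empleo = empleo)

demo_imputed <- impute_mean(demo)
sum(is.na(demo_imputed))          # count of missing values after imputation
round(demo_imputed$Empleo[1], 5)  # imputed employment rate for 1997
```

Note that mean imputation assigns the same value to every missing year of a series, which is worth keeping in mind when reading trends in the imputed years.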
#Data structure
str(datos_imputados)
## 'data.frame': 26 obs. of 15 variables:
## $ periodo : int 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
## $ IED_Flujos_dolares : num 12146 8374 13960 18249 30057 ...
## $ Exportaciones_dolares: num 9088 9875 10990 12483 11300 ...
## $ Empleo : num 96.5 96.5 96.5 97.8 97.4 ...
## $ Educacion : num 7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
## $ Salario_Diario : num 24.3 31.9 31.9 35.1 37.6 ...
## $ Innovacion : num 11.3 11.4 12.5 13.2 13.5 ...
## $ Inseguridad_Robo : num 267 315 273 217 215 ...
## $ Inseguridad_Homicidio: num 14.6 14.3 12.6 10.9 10.2 ...
## $ Tipo_de_Cambio : num 8.06 9.94 9.52 9.6 9.17 ...
## $ Densidad_Carretera : num 0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
## $ Densidad_Poblacion : num 47.4 48.8 49.5 50.6 51.3 ...
## $ CO2_Emisiones : num 3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
## $ PIB_Per_Capita : num 127570 126739 129165 130875 128083 ...
## $ INPC : num 33.3 39.5 44.3 48.3 50.4 ...
bd2 = datos_imputados
bd2
## periodo IED_Flujos_dolares Exportaciones_dolares Empleo Educacion
## 1 1997 12145.60 9087.62 96.47043 7.200000
## 2 1998 8373.50 9875.07 96.47043 7.310000
## 3 1999 13960.32 10990.01 96.47043 7.430000
## 4 2000 18248.69 12482.96 97.83000 7.560000
## 5 2001 30057.18 11300.44 97.36000 7.680000
## 6 2002 24099.21 11923.10 97.66000 7.800000
## 7 2003 18249.97 13156.00 97.06000 7.930000
## 8 2004 25015.57 13573.13 96.48000 8.040000
## 9 2005 25795.82 16465.81 97.17000 8.140000
## 10 2006 21232.54 17485.93 96.53000 8.260000
## 11 2007 32393.33 19103.85 96.60000 8.360000
## 12 2008 29502.46 16924.76 95.68000 8.460000
## 13 2009 17849.95 19702.63 95.20000 8.560000
## 14 2010 27189.28 22673.14 95.06000 8.630000
## 15 2011 25632.52 24333.02 95.49000 8.750000
## 16 2012 21769.32 26297.98 95.53000 8.850000
## 17 2013 48354.42 27687.57 95.75000 8.950000
## 18 2014 30351.25 31676.78 96.24000 9.050000
## 19 2015 35943.75 29959.94 96.04000 9.150000
## 20 2016 31188.98 31375.06 96.62000 9.250000
## 21 2017 34017.05 33322.62 96.85000 9.350000
## 22 2018 34100.43 35341.90 96.64000 9.450000
## 23 2019 34577.16 36414.73 97.09000 9.580000
## 24 2020 28205.89 41077.34 96.21000 8.423478
## 25 2021 31553.52 44914.78 96.49000 8.423478
## 26 2022 36215.37 46477.59 97.24000 8.423478
## Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1 24.30 11.30000 266.51 14.5500
## 2 31.91 11.37000 314.78 14.3200
## 3 31.91 12.46000 272.89 12.6400
## 4 35.12 13.15000 216.98 10.8600
## 5 37.57 13.47000 214.53 10.2500
## 6 39.74 12.80000 197.80 9.9400
## 7 41.53 11.81000 183.22 9.8100
## 8 43.30 12.61000 146.28 8.9200
## 9 45.24 13.41000 136.94 9.2200
## 10 47.05 14.23000 135.59 9.6000
## 11 48.88 15.04000 145.92 8.0400
## 12 50.84 14.82000 158.17 12.5200
## 13 53.19 12.59000 175.77 17.4600
## 14 55.77 12.69000 201.94 22.4300
## 15 58.06 12.10000 212.61 23.4200
## 16 60.75 13.03000 190.28 22.0900
## 17 63.12 13.22000 185.56 19.7400
## 18 65.58 13.65000 154.41 16.9300
## 19 70.10 15.11000 180.44 17.3700
## 20 73.04 14.40000 160.57 20.3100
## 21 88.36 14.05000 230.43 26.2200
## 22 88.36 13.25000 184.25 29.5900
## 23 102.68 12.70000 173.45 29.2100
## 24 123.22 11.28000 133.90 28.9800
## 25 141.70 13.10583 127.13 27.8900
## 26 172.87 13.10583 120.49 17.2924
## Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1 8.06 0.05 47.44 3.680000
## 2 9.94 0.05 48.76 3.850000
## 3 9.52 0.06 49.48 3.690000
## 4 9.60 0.06 50.58 3.870000
## 5 9.17 0.06 51.28 3.810000
## 6 10.36 0.06 51.95 3.820000
## 7 11.20 0.06 52.61 3.950000
## 8 11.22 0.06 53.27 3.980000
## 9 10.71 0.06 54.78 4.100000
## 10 10.88 0.06 55.44 4.190000
## 11 10.90 0.06 56.17 4.220000
## 12 13.77 0.07 56.96 4.190000
## 13 13.04 0.07 57.73 4.040000
## 14 12.38 0.07 58.45 4.110000
## 15 13.98 0.07 59.15 4.190000
## 16 12.99 0.07 59.85 4.200000
## 17 13.07 0.08 59.49 4.060000
## 18 14.73 0.08 60.17 3.890000
## 19 17.34 0.08 60.86 3.930000
## 20 20.66 0.08 61.57 3.890000
## 21 19.74 0.09 62.28 3.840000
## 22 19.66 0.09 63.11 3.650000
## 23 18.87 0.09 63.90 3.590000
## 24 19.94 0.09 64.59 3.945217
## 25 20.52 0.09 65.16 3.945217
## 26 19.41 0.09 65.60 3.945217
## PIB_Per_Capita INPC
## 1 127570.1 33.28
## 2 126738.8 39.47
## 3 129164.7 44.34
## 4 130874.9 48.31
## 5 128083.4 50.43
## 6 128205.9 53.31
## 7 128737.9 55.43
## 8 132563.5 58.31
## 9 132941.1 60.25
## 10 135894.9 62.69
## 11 137795.7 65.05
## 12 135176.0 69.30
## 13 131233.0 71.77
## 14 134991.7 74.93
## 15 138891.9 77.79
## 16 141530.2 80.57
## 17 144112.0 83.77
## 18 147277.4 87.19
## 19 149433.5 89.05
## 20 152275.4 92.04
## 21 153235.7 98.27
## 22 153133.8 99.91
## 23 150233.1 105.93
## 24 142609.3 109.27
## 25 142772.0 117.31
## 26 146826.7 126.48
According to the glossary of the database provided, foreign direct investment and exports are reported in dollars and in nominal values, so they need to be converted to pesos and expressed in real terms.
These real values more accurately reflect the fluctuations in foreign direct investment over the time series, taking into account the inflationary changes that occurred in those specific years. To achieve this conversion, the exchange rate corresponding to the year of each observation was applied and the National Consumer Price Index (INPC) of that same year was considered as an inflation adjustment. This approach allows for a more accurate and realistic representation of FDI over time.
# Convert nominal dollars to real pesos using the exchange rate and the INPC
bd2$IED_Flujos = ((bd2$IED_Flujos_dolares * bd2$Tipo_de_Cambio) / bd2$INPC) * 100
bd2$Exportaciones = ((bd2$Exportaciones_dolares * bd2$Tipo_de_Cambio) / bd2$INPC) * 100
bd2
## periodo IED_Flujos_dolares Exportaciones_dolares Empleo Educacion
## 1 1997 12145.60 9087.62 96.47043 7.200000
## 2 1998 8373.50 9875.07 96.47043 7.310000
## 3 1999 13960.32 10990.01 96.47043 7.430000
## 4 2000 18248.69 12482.96 97.83000 7.560000
## 5 2001 30057.18 11300.44 97.36000 7.680000
## 6 2002 24099.21 11923.10 97.66000 7.800000
## 7 2003 18249.97 13156.00 97.06000 7.930000
## 8 2004 25015.57 13573.13 96.48000 8.040000
## 9 2005 25795.82 16465.81 97.17000 8.140000
## 10 2006 21232.54 17485.93 96.53000 8.260000
## 11 2007 32393.33 19103.85 96.60000 8.360000
## 12 2008 29502.46 16924.76 95.68000 8.460000
## 13 2009 17849.95 19702.63 95.20000 8.560000
## 14 2010 27189.28 22673.14 95.06000 8.630000
## 15 2011 25632.52 24333.02 95.49000 8.750000
## 16 2012 21769.32 26297.98 95.53000 8.850000
## 17 2013 48354.42 27687.57 95.75000 8.950000
## 18 2014 30351.25 31676.78 96.24000 9.050000
## 19 2015 35943.75 29959.94 96.04000 9.150000
## 20 2016 31188.98 31375.06 96.62000 9.250000
## 21 2017 34017.05 33322.62 96.85000 9.350000
## 22 2018 34100.43 35341.90 96.64000 9.450000
## 23 2019 34577.16 36414.73 97.09000 9.580000
## 24 2020 28205.89 41077.34 96.21000 8.423478
## 25 2021 31553.52 44914.78 96.49000 8.423478
## 26 2022 36215.37 46477.59 97.24000 8.423478
## Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1 24.30 11.30000 266.51 14.5500
## 2 31.91 11.37000 314.78 14.3200
## 3 31.91 12.46000 272.89 12.6400
## 4 35.12 13.15000 216.98 10.8600
## 5 37.57 13.47000 214.53 10.2500
## 6 39.74 12.80000 197.80 9.9400
## 7 41.53 11.81000 183.22 9.8100
## 8 43.30 12.61000 146.28 8.9200
## 9 45.24 13.41000 136.94 9.2200
## 10 47.05 14.23000 135.59 9.6000
## 11 48.88 15.04000 145.92 8.0400
## 12 50.84 14.82000 158.17 12.5200
## 13 53.19 12.59000 175.77 17.4600
## 14 55.77 12.69000 201.94 22.4300
## 15 58.06 12.10000 212.61 23.4200
## 16 60.75 13.03000 190.28 22.0900
## 17 63.12 13.22000 185.56 19.7400
## 18 65.58 13.65000 154.41 16.9300
## 19 70.10 15.11000 180.44 17.3700
## 20 73.04 14.40000 160.57 20.3100
## 21 88.36 14.05000 230.43 26.2200
## 22 88.36 13.25000 184.25 29.5900
## 23 102.68 12.70000 173.45 29.2100
## 24 123.22 11.28000 133.90 28.9800
## 25 141.70 13.10583 127.13 27.8900
## 26 172.87 13.10583 120.49 17.2924
## Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1 8.06 0.05 47.44 3.680000
## 2 9.94 0.05 48.76 3.850000
## 3 9.52 0.06 49.48 3.690000
## 4 9.60 0.06 50.58 3.870000
## 5 9.17 0.06 51.28 3.810000
## 6 10.36 0.06 51.95 3.820000
## 7 11.20 0.06 52.61 3.950000
## 8 11.22 0.06 53.27 3.980000
## 9 10.71 0.06 54.78 4.100000
## 10 10.88 0.06 55.44 4.190000
## 11 10.90 0.06 56.17 4.220000
## 12 13.77 0.07 56.96 4.190000
## 13 13.04 0.07 57.73 4.040000
## 14 12.38 0.07 58.45 4.110000
## 15 13.98 0.07 59.15 4.190000
## 16 12.99 0.07 59.85 4.200000
## 17 13.07 0.08 59.49 4.060000
## 18 14.73 0.08 60.17 3.890000
## 19 17.34 0.08 60.86 3.930000
## 20 20.66 0.08 61.57 3.890000
## 21 19.74 0.09 62.28 3.840000
## 22 19.66 0.09 63.11 3.650000
## 23 18.87 0.09 63.90 3.590000
## 24 19.94 0.09 64.59 3.945217
## 25 20.52 0.09 65.16 3.945217
## 26 19.41 0.09 65.60 3.945217
## PIB_Per_Capita INPC IED_Flujos Exportaciones
## 1 127570.1 33.28 294151.2 220090.8
## 2 126738.8 39.47 210875.6 248690.6
## 3 129164.7 44.34 299734.4 235960.5
## 4 130874.9 48.31 362631.8 248057.2
## 5 128083.4 50.43 546548.4 205482.9
## 6 128205.9 53.31 468332.0 231707.6
## 7 128737.9 55.43 368752.8 265825.7
## 8 132563.5 58.31 481349.2 261173.9
## 9 132941.1 60.25 458544.8 292695.1
## 10 135894.9 62.69 368495.8 303472.5
## 11 137795.7 65.05 542793.7 320110.6
## 12 135176.0 69.30 586217.7 336297.2
## 13 131233.0 71.77 324318.4 357980.1
## 14 134991.7 74.93 449223.7 374607.6
## 15 138891.9 77.79 460653.8 437299.9
## 16 141530.2 80.57 350978.6 423992.5
## 17 144112.0 83.77 754437.5 431988.2
## 18 147277.4 87.19 512758.2 535151.9
## 19 149433.5 89.05 699904.1 583386.1
## 20 152275.4 92.04 700091.6 704268.5
## 21 153235.7 98.27 683318.0 669368.6
## 22 153133.8 99.91 671018.4 695447.7
## 23 150233.1 105.93 615945.4 648679.3
## 24 142609.3 109.27 514711.7 749594.7
## 25 142772.0 117.31 551937.8 785654.5
## 26 146826.7 126.48 555771.9 713259.0
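The conversion to real pesos can be checked by hand for a single year. The sketch below wraps the same formula used in the analysis in a small function; the name `to_real_pesos` is hypothetical and introduced only for this illustration, and the inputs are the 1997 values printed in the tables above.

```r
# Hypothetical helper wrapping the conversion used in the analysis:
# nominal dollars -> nominal pesos (exchange rate) -> real pesos (INPC deflator)
to_real_pesos <- function(usd, exchange_rate, inpc) {
  (usd * exchange_rate / inpc) * 100
}

# 1997 row: FDI in dollars, pesos per dollar, INPC (base 2018 = 100)
fdi_1997 <- to_real_pesos(usd = 12145.60, exchange_rate = 8.06, inpc = 33.28)
round(fdi_1997)  # about 294151, in line with the IED_Flujos value printed for 1997
```

Because the INPC base is 2018 = 100, dividing by the index and multiplying by 100 expresses every year in 2018 prices, which is what makes the yearly values comparable.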
It is also important to understand how each independent variable relates to the dependent variable. The plots that follow explore these relationships.
# Data visualization (independent variables vs. IED)
ggplot(data = bd2, aes(x = Exportaciones, y = IED_Flujos)) +
  geom_point() +
  labs(x = "Exports", y = "Dependent Variable (IED)") +
  ggtitle("Scatter Plot: Exports vs. IED")
The scatter plot of exports against FDI suggests a positive correlation, although not a very strong one. Since a pattern is visible, it is reasonable to expect that this variable may turn out to be statistically significant when the regression models are built.
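To put a number on the visual impression from the exports plot, a Pearson correlation test can be run on the two series. This is a minimal sketch and not part of the original analysis: the vectors are copied from the `IED_Flujos` and `Exportaciones` columns of the table printed earlier, and the names `fdi_real` and `exports_real` are introduced here for illustration.

```r
# Real FDI flows (IED_Flujos), 1997-2022, copied from the printed table
fdi_real <- c(294151.2, 210875.6, 299734.4, 362631.8, 546548.4, 468332.0,
              368752.8, 481349.2, 458544.8, 368495.8, 542793.7, 586217.7,
              324318.4, 449223.7, 460653.8, 350978.6, 754437.5, 512758.2,
              699904.1, 700091.6, 683318.0, 671018.4, 615945.4, 514711.7,
              551937.8, 555771.9)

# Real exports (Exportaciones), 1997-2022, copied from the printed table
exports_real <- c(220090.8, 248690.6, 235960.5, 248057.2, 205482.9, 231707.6,
                  265825.7, 261173.9, 292695.1, 303472.5, 320110.6, 336297.2,
                  357980.1, 374607.6, 437299.9, 423992.5, 431988.2, 535151.9,
                  583386.1, 704268.5, 669368.6, 695447.7, 648679.3, 749594.7,
                  785654.5, 713259.0)

# Pearson correlation with a significance test and confidence interval
ct <- cor.test(exports_real, fdi_real, method = "pearson")
ct$estimate  # correlation coefficient
ct$p.value   # p-value for the null hypothesis of zero correlation
ct$conf.int  # 95% confidence interval for the coefficient
```

A pairwise correlation does not control for the other variables, so it is only a first indication of what the regression may show.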
ggplot(data = bd2, aes(x = Salario_Diario, y = IED_Flujos)) +
  geom_line() +
  labs(x = "Daily wage", y = "Dependent Variable (IED)") +
  ggtitle("Line Plot: Salary vs. IED")
In this plot, a slight positive correlation appears at the lower end of the wage range. However, as the wage grows, its correlation with the dependent variable weakens.
ggplot(data = bd2, aes(x = Tipo_de_Cambio, y = IED_Flujos)) +
  geom_line() +
  labs(x = "Exchange rate", y = "Dependent Variable (IED)") +
  ggtitle("Line Plot: Exchange rate vs. IED")
As with exports, the relationship between the exchange rate and our dependent variable is not strong, but a positive correlation is noticeable.
# geom_hex() requires the hexbin package to be installed
ggplot(data = bd2, aes(x = Densidad_Poblacion, y = IED_Flujos)) +
  geom_hex() +
  labs(x = "Population Density", y = "Dependent Variable (IED)") +
  ggtitle("Hex Plot: Population Density vs. IED")
A slight positive correlation between these two variables is
noticeable.
ggplot(data = bd2, aes(x = CO2_Emisiones, y = IED_Flujos)) +
  geom_bar(stat = "identity") +
  labs(x = "CO2", y = "Dependent Variable (IED)") +
  ggtitle("Bar Plot: CO2 vs. IED")
There is almost no correlation between CO2 emissions and FDI.
ggplot(data = bd2, aes(x = INPC, y = IED_Flujos)) +
  geom_point() +
  labs(x = "INPC", y = "Dependent Variable (IED)") +
  ggtitle("Scatter Plot: INPC vs. IED")
The INPC shows a positive correlation with FDI. One plausible reading is that the National Consumer Price Index measures the inflation occurring in the country, and rising foreign investment signals economic spillover and growth, which puts more money in circulation in the hands of Mexicans and may in turn lead to inflation. Both series also rise over time, so part of the correlation may simply reflect that shared trend.
str(bd2)
## 'data.frame': 26 obs. of 17 variables:
## $ periodo : int 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
## $ IED_Flujos_dolares : num 12146 8374 13960 18249 30057 ...
## $ Exportaciones_dolares: num 9088 9875 10990 12483 11300 ...
## $ Empleo : num 96.5 96.5 96.5 97.8 97.4 ...
## $ Educacion : num 7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
## $ Salario_Diario : num 24.3 31.9 31.9 35.1 37.6 ...
## $ Innovacion : num 11.3 11.4 12.5 13.2 13.5 ...
## $ Inseguridad_Robo : num 267 315 273 217 215 ...
## $ Inseguridad_Homicidio: num 14.6 14.3 12.6 10.9 10.2 ...
## $ Tipo_de_Cambio : num 8.06 9.94 9.52 9.6 9.17 ...
## $ Densidad_Carretera : num 0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
## $ Densidad_Poblacion : num 47.4 48.8 49.5 50.6 51.3 ...
## $ CO2_Emisiones : num 3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
## $ PIB_Per_Capita : num 127570 126739 129165 130875 128083 ...
## $ INPC : num 33.3 39.5 44.3 48.3 50.4 ...
## $ IED_Flujos : num 294151 210876 299734 362632 546548 ...
## $ Exportaciones : num 220091 248691 235961 248057 205483 ...
#Analyze the distribution of the dependent variable data
hist(bd2$IED_Flujos)
The histogram suggests an approximately normal distribution.
Likewise, it is important to analyze the distribution of each of the independent variables in order to determine whether normalization methods should be applied to some of them.
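The visual reading of the FDI histogram can be complemented with a formal normality check. The sketch below applies the Shapiro-Wilk test from base R to the real FDI series; it is an addition for illustration rather than a step of the original analysis, and the vector `fdi_real` is a name introduced here, with values copied from the `IED_Flujos` column printed earlier.

```r
# Real FDI flows (IED_Flujos), 1997-2022, copied from the printed table
fdi_real <- c(294151.2, 210875.6, 299734.4, 362631.8, 546548.4, 468332.0,
              368752.8, 481349.2, 458544.8, 368495.8, 542793.7, 586217.7,
              324318.4, 449223.7, 460653.8, 350978.6, 754437.5, 512758.2,
              699904.1, 700091.6, 683318.0, 671018.4, 615945.4, 514711.7,
              551937.8, 555771.9)

# Shapiro-Wilk test: the null hypothesis is that the data are normally distributed
sw <- shapiro.test(fdi_real)
sw$statistic  # W statistic (values near 1 are consistent with normality)
sw$p.value    # a small p-value would be evidence against normality
```

With only 26 observations the test has limited power, so it works best alongside the histogram rather than as a replacement for it.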
hist(bd2$Exportaciones)
hist(bd2$Educacion)
hist(bd2$Innovacion)
hist(bd2$Inseguridad_Robo)
hist(bd2$Inseguridad_Homicidio)
hist(bd2$Salario_Diario)
hist(bd2$Tipo_de_Cambio)
hist(bd2$Densidad_Poblacion)
hist(bd2$Densidad_Carretera)
hist(bd2$INPC)
hist(bd2$CO2_Emisiones)
hist(bd2$Empleo)
hist(bd2$PIB_Per_Capita)
As the histograms show, some variables, such as daily wage, GDP per capita and road density, may fit the model better if they are normalized first.
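The text does not say which normalization method is intended, so the following sketch simply shows two common options on the daily wage series: a log transform, which compresses the long right tail, and a z-score, which centers the variable at 0 with standard deviation 1. Both choices are assumptions made for illustration, and the names `salario`, `salario_log` and `salario_z` are hypothetical.

```r
# Minimum daily wage (Salario_Diario), 1997-2022, copied from the printed data
salario <- c(24.30, 31.91, 31.91, 35.12, 37.57, 39.74, 41.53, 43.30, 45.24,
             47.05, 48.88, 50.84, 53.19, 55.77, 58.06, 60.75, 63.12, 65.58,
             70.10, 73.04, 88.36, 88.36, 102.68, 123.22, 141.70, 172.87)

# Option 1 (assumption): natural log transform to reduce right skew
salario_log <- log(salario)

# Option 2 (assumption): z-score standardization (mean 0, standard deviation 1)
salario_z <- as.numeric(scale(salario))

# Compare the spread of the original and transformed series
summary(salario)
summary(salario_log)
summary(salario_z)
```

The log transform changes how a coefficient is interpreted (effects become relative rather than absolute), while the z-score only rescales the variable, so the choice depends on how the regression results are meant to be read.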
Another way to understand the importance of the independent variables is through a correlation graph, which allows us to know the level of correlation that each independent variable has with the dependent variable.
# Correlation plot
library(corrplot)
corrplot(cor(bd2),
  type = "upper",
  order = "hclust",
  addCoef.col = "black",
  tl.cex = 0.7
)
The plot presents the correlation matrix in a more readable form; the same matrix is printed in full below.
cor(bd2)
## periodo IED_Flujos_dolares Exportaciones_dolares
## periodo 1.00000000 0.72318518 0.97797102
## IED_Flujos_dolares 0.72318518 1.00000000 0.66131031
## Exportaciones_dolares 0.97797102 0.66131031 1.00000000
## Empleo -0.19841056 -0.04180470 -0.11323525
## Educacion 0.83105621 0.72312246 0.72304782
## Salario_Diario 0.88382398 0.55538324 0.93831578
## Innovacion 0.25579696 0.53493886 0.16104355
## Inseguridad_Robo -0.58906756 -0.54978405 -0.53699390
## Inseguridad_Homicidio 0.78019803 0.39745854 0.78304009
## Tipo_de_Cambio 0.94088306 0.60323139 0.92867446
## Densidad_Carretera 0.96458288 0.72608943 0.95118248
## Densidad_Poblacion 0.99553772 0.71914745 0.96144282
## CO2_Emisiones 0.03313055 0.09893952 -0.05354392
## PIB_Per_Capita 0.88921632 0.73421761 0.85244470
## INPC 0.99125242 0.70473813 0.98711668
## IED_Flujos 0.68609682 0.93761585 0.60981365
## Exportaciones 0.95382682 0.60314562 0.96788006
## Empleo Educacion Salario_Diario Innovacion
## periodo -0.198410563 0.83105621 0.88382398 0.25579696
## IED_Flujos_dolares -0.041804696 0.72312246 0.55538324 0.53493886
## Exportaciones_dolares -0.113235254 0.72304782 0.93831578 0.16104355
## Empleo 1.000000000 -0.31023311 0.04593783 0.02362057
## Educacion -0.310233106 1.00000000 0.49869278 0.44910805
## Salario_Diario 0.045937834 0.49869278 1.00000000 0.05207552
## Innovacion 0.023620573 0.44910805 0.05207552 1.00000000
## Inseguridad_Robo -0.004217223 -0.43639523 -0.54181369 -0.42120203
## Inseguridad_Homicidio -0.319803545 0.67397461 0.65108820 -0.16457984
## Tipo_de_Cambio -0.074355786 0.76714391 0.85057472 0.21768318
## Densidad_Carretera -0.114773291 0.80732881 0.85823660 0.21563225
## Densidad_Poblacion -0.239171507 0.84484920 0.86218581 0.27949868
## CO2_Emisiones -0.498936093 0.07009249 -0.08454554 0.32027189
## PIB_Per_Capita -0.101466704 0.90428687 0.67319673 0.43195946
## INPC -0.125598268 0.76629637 0.93375758 0.22007275
## IED_Flujos 0.031512099 0.74275850 0.48161305 0.58473780
## Exportaciones -0.083855335 0.74096111 0.88113311 0.17106144
## Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio
## periodo -0.589067557 0.78019803 0.94088306
## IED_Flujos_dolares -0.549784049 0.39745854 0.60323139
## Exportaciones_dolares -0.536993897 0.78304009 0.92867446
## Empleo -0.004217223 -0.31980355 -0.07435579
## Educacion -0.436395232 0.67397461 0.76714391
## Salario_Diario -0.541813691 0.65108820 0.85057472
## Innovacion -0.421202028 -0.16457984 0.21768318
## Inseguridad_Robo 1.000000000 -0.07833158 -0.45220207
## Inseguridad_Homicidio -0.078331576 1.00000000 0.79183373
## Tipo_de_Cambio -0.452202075 0.79183373 1.00000000
## Densidad_Carretera -0.470108908 0.81519169 0.94321700
## Densidad_Poblacion -0.616336204 0.76810355 0.92092571
## CO2_Emisiones -0.419071007 -0.24054955 -0.15728113
## PIB_Per_Capita -0.402521841 0.70529204 0.88109098
## INPC -0.594576114 0.75454073 0.93511557
## IED_Flujos -0.451823035 0.41641063 0.67504631
## Exportaciones -0.447470177 0.82008178 0.98228669
## Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## periodo 0.9645829 0.9955377 0.033130551
## IED_Flujos_dolares 0.7260894 0.7191474 0.098939515
## Exportaciones_dolares 0.9511825 0.9614428 -0.053543920
## Empleo -0.1147733 -0.2391715 -0.498936093
## Educacion 0.8073288 0.8448492 0.070092492
## Salario_Diario 0.8582366 0.8621858 -0.084545536
## Innovacion 0.2156323 0.2794987 0.320271892
## Inseguridad_Robo -0.4701089 -0.6163362 -0.419071007
## Inseguridad_Homicidio 0.8151917 0.7681035 -0.240549545
## Tipo_de_Cambio 0.9432170 0.9209257 -0.157281133
## Densidad_Carretera 1.0000000 0.9480910 -0.153736764
## Densidad_Poblacion 0.9480910 1.0000000 0.105315290
## CO2_Emisiones -0.1537368 0.1053153 1.000000000
## PIB_Per_Capita 0.8855798 0.8723228 -0.107641865
## INPC 0.9618219 0.9830582 0.003521392
## IED_Flujos 0.7208468 0.6715246 -0.054005458
## Exportaciones 0.9477079 0.9300112 -0.160330241
## PIB_Per_Capita INPC IED_Flujos Exportaciones
## periodo 0.8892163 0.991252417 0.68609682 0.95382682
## IED_Flujos_dolares 0.7342176 0.704738131 0.93761585 0.60314562
## Exportaciones_dolares 0.8524447 0.987116680 0.60981365 0.96788006
## Empleo -0.1014667 -0.125598268 0.03151210 -0.08385533
## Educacion 0.9042869 0.766296372 0.74275850 0.74096111
## Salario_Diario 0.6731967 0.933757576 0.48161305 0.88113311
## Innovacion 0.4319595 0.220072749 0.58473780 0.17106144
## Inseguridad_Robo -0.4025218 -0.594576114 -0.45182304 -0.44747018
## Inseguridad_Homicidio 0.7052920 0.754540732 0.41641063 0.82008178
## Tipo_de_Cambio 0.8810910 0.935115572 0.67504631 0.98228669
## Densidad_Carretera 0.8855798 0.961821927 0.72084682 0.94770790
## Densidad_Poblacion 0.8723228 0.983058193 0.67152462 0.93001115
## CO2_Emisiones -0.1076419 0.003521392 -0.05400546 -0.16033024
## PIB_Per_Capita 1.0000000 0.851476073 0.77968545 0.88763319
## INPC 0.8514761 1.000000000 0.65404935 0.95015955
## IED_Flujos 0.7796854 0.654049346 1.00000000 0.63773639
## Exportaciones 0.8876332 0.950159547 0.63773639 1.00000000
In this study, the Ordinary Least Squares method will be used to develop a regression model that allows us to understand and analyze the relationships between the independent variables and a dependent variable (FDI). Through this approach, we will seek to identify patterns, trends and effects that can help explain and predict the behavior of the dependent variable based on the selected independent variables.
This method is used to find the best linear relationship between a set of independent variables and a dependent variable, minimizing the sum of the squares of the differences between the observed values and the values predicted by the model.
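As an illustration of this minimization, the following sketch computes the OLS coefficients from the normal equations and compares them with lm(). It uses synthetic data rather than bd2, and the names x1, x2, beta_hat and ssr are hypothetical, chosen only for this example.

```r
# Minimal OLS sketch on synthetic data (not the bd2 data set)
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 2 + 1.5 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)

# Design matrix with an intercept column
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Closed-form OLS estimate: solve (X'X) b = X'y
beta_hat <- drop(solve(crossprod(X), crossprod(X, y)))

# Sum of squared residuals, the quantity that OLS minimizes
ssr <- sum((y - X %*% beta_hat)^2)

# lm() fits the same model and should return the same coefficients
fit <- lm(y ~ x1 + x2)
rbind(normal_equations = beta_hat, lm = coef(fit))
```

In practice lm() is preferable because it uses a numerically more stable decomposition; the closed form is shown only to make the objective explicit.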
Since we understand the structure and type of data in detail, as well as the multiple relationships of the independent variables, the following hypotheses can be posed.
Based on this, a base regression model was built with the variables that the exploratory data analysis suggests are most relevant.
The selection considered the distribution of each variable, its importance in the theoretical context of the problem and its correlation with the dependent variable. Exportaciones, Educacion, Salario_Diario, Innovacion, Tipo_de_Cambio, Densidad_Carretera, Densidad_Poblacion, PIB_Per_Capita and INPC all have moderate to high correlations with the dependent variable IED_Flujos (between 0.48 and 0.78), which makes them candidates for the model. Multicollinearity must also be considered. For example, Densidad_Poblacion is highly correlated with Densidad_Carretera (0.95), and periodo is highly correlated with Densidad_Poblacion (0.996). Because this complicates the interpretation of the model, it can be preferable to include only one variable from each highly correlated pair. A correlation between an independent variable and the dependent variable suggests a relationship worth exploring, but it does not guarantee that the relationship will be statistically significant in a regression model, and it indicates association rather than causation.
Taking this into account, the following independent variables were used:
dmodel1 <- lm(IED_Flujos ~ Exportaciones + Educacion + Salario_Diario + Tipo_de_Cambio + Densidad_Carretera + PIB_Per_Capita + INPC, data = bd2)
summary(dmodel1)
##
## Call:
## lm(formula = IED_Flujos ~ Exportaciones + Educacion + Salario_Diario +
## Tipo_de_Cambio + Densidad_Carretera + PIB_Per_Capita + INPC,
## data = bd2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -153904 -47900 1004 52953 137546
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.506e+06 7.646e+05 -1.970 0.0645 .
## Exportaciones -1.907e+00 7.187e-01 -2.653 0.0162 *
## Educacion -2.650e+05 1.695e+05 -1.563 0.1354
## Salario_Diario -5.530e+03 4.471e+03 -1.237 0.2320
## Tipo_de_Cambio 5.086e+04 2.567e+04 1.982 0.0630 .
## Densidad_Carretera 7.181e+06 5.738e+06 1.251 0.2268
## PIB_Per_Capita 2.374e+01 8.337e+00 2.848 0.0107 *
## INPC 1.212e+04 9.733e+03 1.246 0.2289
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 85890 on 18 degrees of freedom
## Multiple R-squared: 0.7433, Adjusted R-squared: 0.6435
## F-statistic: 7.446 on 7 and 18 DF, p-value: 0.0002852
Intercept: the estimated value of the dependent variable (IED_Flujos) when all independent variables equal zero. Here the estimate is -1.506e+06 (approximately -1,506,000), with a standard error of 7.646e+05.
Coefficients of the independent variables: each coefficient is the estimated change in IED_Flujos associated with a one-unit increase in that variable, holding all other variables constant. For example:
The coefficient for Exportaciones is -1.907. Holding the other variables constant, a one-unit increase in Exportaciones is associated, on average, with a decrease of approximately 1.91 units in IED_Flujos.
The coefficient for PIB_Per_Capita is 2.374e+01. Holding the other variables constant, a one-unit increase in PIB_Per_Capita is associated, on average, with an increase of approximately 23.74 units in IED_Flujos.
Statistical significance: Exportaciones (p = 0.0162) and PIB_Per_Capita (p = 0.0107) are significant at the 0.05 level. Tipo_de_Cambio (p = 0.0630) is significant only at the 0.10 level, and Educacion, Salario_Diario, Densidad_Carretera and INPC are not significant.
Fit statistics: the summary reports the multiple R-squared, the adjusted R-squared, the residual standard error and the F statistic.
The multiple R-squared (0.7433) indicates that approximately 74.33% of the variability in IED_Flujos is explained by the independent variables included in the model.
The adjusted R-squared (0.6435) penalizes the multiple R-squared for the number of predictors, providing a more conservative measure of model fit.
The F statistic (7.446) evaluates the overall significance of the model, and its p-value (0.0002852) indicates that the model as a whole is statistically significant.
The residual standard error (85,890 on 18 degrees of freedom) measures how spread out the observations are around the fitted values. A smaller residual standard error means the observations lie closer to the fitted values and the fit is better; a larger one means greater spread and a less precise fit.
It is important to perform some diagnostic tests to better evaluate this model…
library(car)
library(lmtest)
# Multicollinearity
vif(dmodel1)
## Exportaciones Educacion Salario_Diario Tipo_de_Cambio
## 66.57493 44.39142 87.04907 38.45451
## Densidad_Carretera PIB_Per_Capita INPC
## 19.92939 18.49610 197.59264
The variance inflation factor (VIF) is calculated for each independent variable and measures how much the variance of its coefficient estimate is inflated by multicollinearity, that is, how much more unstable the estimate becomes. A common rule of thumb is that the VIF should stay below 10. Every variable in model 1 exceeds that threshold (from 18.50 for PIB_Per_Capita to 197.59 for INPC), so model 1 has a multicollinearity problem.
Multicollinearity refers to a high correlation between two or more independent variables in a regression model. It makes it difficult to attribute individual effects on the dependent variable to each variable, because the effects of correlated predictors cannot be cleanly separated.
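The definition above can be made concrete with a small sketch: the VIF of a predictor is 1 / (1 - R²), where R² comes from regressing that predictor on all the others. The example uses synthetic data (not bd2), and x1, x2, x3 and vif_manual are hypothetical names introduced only for illustration.

```r
# Manual VIF on synthetic data: x2 is built to be nearly collinear with x1
set.seed(2)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)   # almost a copy of x1
x3 <- rnorm(n)                   # unrelated predictor
X  <- data.frame(x1, x2, x3)

vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  # Regress predictor v on the remaining predictors
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
vif_manual
```

For a linear model with an intercept and purely numeric predictors, these values should agree with what car::vif() reports; x1 and x2 get large values, while x3 stays close to 1.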
# Heteroscedasticity | Breusch-Pagan test
bptest(dmodel1)
##
## studentized Breusch-Pagan test
##
## data: dmodel1
## BP = 10.587, df = 7, p-value = 0.1577
The Breusch-Pagan test (bptest) checks the assumption of homoscedasticity. The data form a time series, that is, sequential observations with a temporal dependency structure, so successive values can be correlated and the variability can be influenced by temporal, cyclical or seasonal effects. For that reason the homoscedasticity assumption was treated here as less critical than in a standard cross-sectional regression, but it was still checked.
The p-value (0.1577) is greater than the usual significance level of 0.05, so there is not enough evidence to reject the null hypothesis of homoscedasticity. In other words, no clear evidence of heteroscedasticity was found in the model.
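To show what the test measures, the sketch below reproduces the studentized (Koenker) form of the Breusch-Pagan statistic by hand: the squared residuals are regressed on the predictors and n times the R² of that auxiliary regression is compared with a chi-squared distribution. This is our understanding of the default behavior of bptest(), stated as an assumption; the data are synthetic and every variable name is hypothetical.

```r
# Hand-computed studentized Breusch-Pagan statistic on synthetic data
set.seed(3)
n  <- 80
x1 <- runif(n, 1, 5)
x2 <- rnorm(n)
# The error standard deviation grows with x1, so the data are heteroscedastic by construction
y  <- 1 + 2 * x1 + x2 + rnorm(n, sd = x1)

fit <- lm(y ~ x1 + x2)

# Auxiliary regression of the squared residuals on the same predictors
aux <- lm(residuals(fit)^2 ~ x1 + x2)

bp_stat <- n * summary(aux)$r.squared          # test statistic
bp_df   <- length(coef(aux)) - 1               # one degree of freedom per predictor
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)
c(BP = bp_stat, df = bp_df, p.value = bp_p)
```

A small p-value points to heteroscedasticity; a large one, as in the dmodel1 output, gives no evidence against constant variance.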
#plot(residuos)
# Normality of residuals
residuos <- residuals(dmodel1)
shapiro.test(residuos)
##
## Shapiro-Wilk normality test
##
## data: residuos
## W = 0.98452, p-value = 0.9526
There is not enough evidence to conclude that the residuals deviate from a normal distribution, since the p-value (0.9526) is greater than the significance level alpha (0.05). The residuals can therefore be considered approximately normal. Likewise, the residuals show no apparent trend.
Even so, model 1 can be improved, given the low significance of several variables and the multicollinearity problem. The model will therefore be adjusted by transforming some variables and by trying other specifications, for example polynomial terms, in order to find the one that best suits the data. The same diagnostic tests will be applied to each model.
Based on the exploratory data analysis, some variables were log-transformed and others were dropped in order to reduce multicollinearity.
dmodel2 <- lm(log(IED_Flujos) ~ Exportaciones + log(Tipo_de_Cambio) + log(PIB_Per_Capita) + log(INPC), data = bd2)
summary(dmodel2)
##
## Call:
## lm(formula = log(IED_Flujos) ~ Exportaciones + log(Tipo_de_Cambio) +
## log(PIB_Per_Capita) + log(INPC), data = bd2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.44631 -0.09245 0.01567 0.11015 0.42264
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -3.464e+01 1.724e+01 -2.009 0.0576 .
## Exportaciones -1.404e-06 9.200e-07 -1.526 0.1419
## log(Tipo_de_Cambio) 3.813e-01 7.114e-01 0.536 0.5976
## log(PIB_Per_Capita) 3.831e+00 1.504e+00 2.547 0.0188 *
## log(INPC) 4.645e-01 3.551e-01 1.308 0.2049
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2076 on 21 degrees of freedom
## Multiple R-squared: 0.6407, Adjusted R-squared: 0.5723
## F-statistic: 9.363 on 4 and 21 DF, p-value: 0.000166
The coefficients are interpreted as before, but now in terms of the transformed variables. For example, the coefficient for log(PIB_Per_Capita) is 3.831, which means that, on average and holding the other variables constant, a 1% increase in PIB_Per_Capita is associated with an increase of approximately 3.83% in IED_Flujos. In this model only log(PIB_Per_Capita) is statistically significant at the 0.05 level (p = 0.0188); Exportaciones, log(Tipo_de_Cambio) and log(INPC) are not.
The fit of the model is again evaluated with the R-squared and the F statistic. The R-squared is 0.6407, so the model explains 64.07% of the variability in log(IED_Flujos), and the F statistic (9.363, p-value = 0.000166) indicates that the model as a whole is significant.
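Reading a log-log coefficient as "percent for percent" is an approximation. The short calculation below checks the arithmetic with the coefficients printed in the dmodel2 summary; the change of 10,000 units in Exportaciones is a hypothetical value chosen only to illustrate how a predictor that enters in levels is read.

```r
# Coefficients copied from the dmodel2 summary output
beta_gdp <- 3.831       # log(PIB_Per_Capita), a log-log term
b_exp    <- -1.404e-06  # Exportaciones, a level term in a log-response model

# Log-log: percentage change in IED_Flujos for a 1% increase in PIB_Per_Capita
pct_fdi_gdp <- (1.01^beta_gdp - 1) * 100

# Log-level: percentage change in IED_Flujos for an increase of 10,000 units
# in Exportaciones (hypothetical increment)
pct_fdi_exports <- (exp(b_exp * 10000) - 1) * 100

round(c(gdp_1pct = pct_fdi_gdp, exports_10000 = pct_fdi_exports), 3)
```

The exact effect of a 1% increase is slightly larger than the coefficient itself, and the gap widens for larger changes, which is why the coefficient is best read as an elasticity for small changes.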
#Multicollinearity
vif(dmodel2)
## Exportaciones log(Tipo_de_Cambio) log(PIB_Per_Capita) log(INPC)
## 18.671152 25.247222 5.295378 8.799376
Multicollinearity decreased considerably, but two variables, Exportaciones (18.67) and log(Tipo_de_Cambio) (25.25), still have VIF values greater than 10.
#Heteroscedasticity
bptest(dmodel2)
##
## studentized Breusch-Pagan test
##
## data: dmodel2
## BP = 3.438, df = 4, p-value = 0.4874
The result of this test (p-value = 0.4874) does not provide sufficient statistical evidence of heteroscedasticity.
# Normality of residuals
residuos2 <- residuals(dmodel2)
# Test the residuals of model 2; the original call passed `residuos` (model 1) and therefore reprinted the model 1 result
shapiro.test(residuos2)
The reported result of this test (p-value = 0.732) does not provide enough evidence to reject the null hypothesis that the residuals of model 2 follow a normal distribution. The residuals can therefore be treated as approximately normal.
plot(residuos2)
In model 2, Exportaciones was found to be highly correlated with several other variables. Rather than drop it, INPC was replaced with carbon emissions (CO2_Emisiones): exports are very important in the political and social context of the problem, they are part of the hypotheses, and the research reviewed suggests that they weigh heavily in nearshoring. CO2 emissions are also an indicator of the phenomenon studied, because emissions usually rise when many companies install gigafactories in a country. In addition, the exchange rate now enters the model as a squared term.
dmodel3 <- lm(log(IED_Flujos) ~ Exportaciones + log(CO2_Emisiones) + I(Tipo_de_Cambio^2) + log(PIB_Per_Capita), data = bd2)
summary(dmodel3)
##
## Call:
## lm(formula = log(IED_Flujos) ~ Exportaciones + log(CO2_Emisiones) +
## I(Tipo_de_Cambio^2) + log(PIB_Per_Capita), data = bd2)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.43861 -0.10368 -0.00973 0.09592 0.41647
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.589e+01 1.692e+01 -2.712 0.0130 *
## Exportaciones -2.347e-06 1.256e-06 -1.869 0.0757 .
## log(CO2_Emisiones) 8.890e-01 9.730e-01 0.914 0.3713
## I(Tipo_de_Cambio^2) 3.169e-03 1.821e-03 1.741 0.0963 .
## log(PIB_Per_Capita) 4.907e+00 1.450e+00 3.384 0.0028 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2092 on 21 degrees of freedom
## Multiple R-squared: 0.6352, Adjusted R-squared: 0.5657
## F-statistic: 9.142 on 4 and 21 DF, p-value: 0.0001933
In summary, in the linear regression model for log(IED_Flujos), log(PIB_Per_Capita) has a significant influence, with an estimated coefficient of approximately 4.907 (p = 0.0028): an increase in GDP per capita is associated with an increase in log(IED_Flujos). The other variables, Exportaciones (p = 0.0757), log(CO2_Emisiones) (p = 0.3713) and I(Tipo_de_Cambio^2) (p = 0.0963), are not statistically significant at the 0.05 level, although Exportaciones and I(Tipo_de_Cambio^2) are significant at the 0.10 level. Altogether, the model explains around 63.52% of the variability in log(IED_Flujos) (adjusted R-squared of 0.5657).
#Multicollinearity
vif(dmodel3)
## Exportaciones log(CO2_Emisiones) I(Tipo_de_Cambio^2) log(PIB_Per_Capita)
## 34.277961 1.136926 29.101385 4.848990
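Model 3 still presents multicollinearity: Exportaciones (34.28) and I(Tipo_de_Cambio^2) (29.10) have VIF values well above 10, while log(CO2_Emisiones) (1.14) and log(PIB_Per_Capita) (4.85) are below that threshold.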
# Heteroscedasticity
bptest(dmodel3)
##
## studentized Breusch-Pagan test
##
## data: dmodel3
## BP = 3.1975, df = 4, p-value = 0.5253
The test yields a BP statistic of 3.1975 with 4 degrees of freedom (df) and a p-value of 0.5253. Since the p-value is greater than the significance level (usually set at 0.05), there is not enough evidence to reject the null hypothesis. This suggests that the errors of the regression model can be considered homoscedastic, that is, no strong evidence of heteroscedasticity was found.
# Normality of residuals
residuos3 <- residuals(dmodel3)
shapiro.test(residuos3)
##
## Shapiro-Wilk normality test
##
## data: residuos3
## W = 0.97214, p-value = 0.6793
The test returns a W value (test statistic) of 0.97214 and a p-value of 0.6793. Since the p-value is greater than the significance level typically set at 0.05, there is not enough evidence to reject the null hypothesis. The residuals of the model (residuos3) therefore show no significant evidence of deviating from a normal distribution and can be treated as approximately normal.
plot(residuos3)
#AIC
AIC(dmodel1)
## [1] 672.9873
AIC(dmodel2)
## [1] -1.518694
AIC(dmodel3)
## [1] -1.12261
The AIC (Akaike Information Criterion) is used to compare models by balancing goodness of fit against the number of parameters; the model with the lowest AIC is preferred. Model 1 is fitted to IED_Flujos while models 2 and 3 are fitted to log(IED_Flujos), so the AIC of model 1 (672.99) is on a different scale and is not directly comparable with the other two. Between models 2 and 3, the lowest AIC belongs to model 2 (-1.52, against -1.12 for model 3). By this criterion, model 2 fits the data best.
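To put a model for log(y) on the same AIC scale as a model for y, one common adjustment adds 2 * sum(log(y)) to the AIC of the log model, which accounts for the change of variable in the likelihood. The sketch below illustrates this on synthetic data; fit_level, fit_log and the simulated x and y are hypothetical and are not the models of this study.

```r
# AIC comparison across a level model and a log model, on synthetic data
set.seed(4)
n <- 60
x <- runif(n, 1, 10)
y <- exp(0.5 + 0.3 * x + rnorm(n, sd = 0.2))   # strictly positive response

fit_level <- lm(y ~ x)        # response in levels
fit_log   <- lm(log(y) ~ x)   # response in logs

aic_level        <- AIC(fit_level)
aic_log_raw      <- AIC(fit_log)                    # based on the likelihood of log(y)
aic_log_adjusted <- aic_log_raw + 2 * sum(log(y))   # based on the likelihood of y

c(level = aic_level, log_raw = aic_log_raw, log_adjusted = aic_log_adjusted)
```

Only aic_level and aic_log_adjusted refer to the same response and can be compared directly. Applied to this study, the same adjustment would use sum(log(bd2$IED_Flujos)).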
After performing the diagnostic tests and comparing the regression models, the most appropriate models for our analysis are Model 2 and Model 3. Neither the Shapiro-Wilk normality test nor the Breusch-Pagan test rejected its null hypothesis for these models, so there is no evidence of serious violations of the normality and homoscedasticity assumptions.
However, both models still show signs of multicollinearity, although the VIF values are considerably lower than in Model 1 (a maximum of 25.25 in Model 2 and 34.28 in Model 3, against 197.59 in Model 1). Model 2 has a slightly higher coefficient of determination (R-squared of 0.6407 against 0.6352) and a lower Akaike Information Criterion (AIC) value (-1.52 against -1.12) than Model 3. In both models, log(PIB_Per_Capita) is the only predictor that is statistically significant at the 0.05 level.
Multicollinearity in our data is, to some degree, inherent to the macroeconomic independent variables, which are indicators of an entire nation and tend to influence one another. We have therefore chosen Model 2, as it fits our data slightly better in terms of R-squared and AIC. The choice of model ultimately depends on the objectives of the research; since our primary focus is on understanding the impact of the independent variables on the dependent variable, we conclude that Model 2 is the most appropriate choice.
library(effects)
efectos_dmodel3 <- allEffects(dmodel3)
# Graph the effects
plot(efectos_dmodel3)
The effect plots show that, contrary to the hypotheses proposed, exports have a negative relationship with FDI and the exchange rate a positive one. Several explanations are possible and require further analysis.
The exchange rate can have a positive relationship with Foreign Direct Investment (FDI) when the local currency devalues against foreign currencies. A devaluation lowers production and operating costs in the host country, in this case Mexico, in foreign-currency terms, which makes Mexico more cost-competitive and more profitable for foreign companies that want to establish operations there. At the same time, companies often look for a country with a stable currency without significant fluctuations, since this makes financial forecasting more reliable.
The negative effect of exports could arise when a country's focus on increasing exports creates competition for scarce resources, such as skilled labor or raw materials. Foreign companies considering FDI may then face higher costs or difficulty accessing those resources, which could discourage investment.
Finally, as established in the hypotheses, GDP per capita has a strong positive relationship with FDI, since it is a sign of a country's economic health, which makes the country attractive and competitive for nearshoring.
library(glmnet)
# Independent variables (only the right-hand side of the formula ends up in the matrix; [,-1] drops the intercept column)
x <- model.matrix(log(IED_Flujos) ~ Empleo + Educacion + Salario_Diario + Innovacion + Inseguridad_Robo + Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC, data = bd2)[,-1]
# Dependent variable (in levels, not logs)
y <- bd2$IED_Flujos
# Find the best lambda using cross-validation
set.seed(123)
cv.lasso <- cv.glmnet(x, y, alpha = 1)
# Display the best lambda value
cv.lasso$lambda.min
## [1] 2921.141
# Fit the final LASSO model with the selected lambda
lassomodel <- glmnet(x, y, alpha = 1, lambda = cv.lasso$lambda.min)
# Display regression coefficients
coef(lassomodel)
## 13 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) -3.164696e+06
## Empleo 2.339864e+04
## Educacion 2.850528e+04
## Salario_Diario -5.350497e+02
## Innovacion 4.770318e+04
## Inseguridad_Robo .
## Inseguridad_Homicidio .
## Tipo_de_Cambio .
## Densidad_Carretera 6.177915e+06
## Densidad_Poblacion .
## CO2_Emisiones -1.032964e+04
## PIB_Per_Capita 1.239591e+00
## INPC .
library(caret)
# Make predictions on the same bd2 data used for fitting (in-sample predictions)
# The formula now matches the one used to build x; the original used log(Densidad_Poblacion) here
x.test <- model.matrix(log(IED_Flujos) ~ Empleo + Educacion + Salario_Diario + Innovacion + Inseguridad_Robo + Inseguridad_Homicidio + Tipo_de_Cambio + Densidad_Carretera + Densidad_Poblacion + CO2_Emisiones + PIB_Per_Capita + INPC, data = bd2)[,-1]
lassopredictions <- predict(lassomodel, newx = as.matrix(x.test))
# Model Accuracy
data.frame(
RMSE = RMSE(lassopredictions, bd2$IED_Flujos),
Rsquare = R2(lassopredictions, bd2$IED_Flujos))
## RMSE s0
## 1 70578.69 0.7508448
# Visualizing LASSO regression results
# Helper that writes each variable name next to the end of its coefficient path
lbs_fun <- function(fit, offset_x = 1, ...) {
  L <- length(fit$lambda)
  x <- log(fit$lambda[L]) + offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels = labs, ...)
}
lasso <- glmnet(scale(x), y, alpha = 1)
plot(lasso, xvar = "lambda", label = TRUE)
lbs_fun(lasso)
# The x-axis is assumed to be on the log(lambda) scale, so the reference lines are drawn at log(lambda)
abline(v = log(cv.lasso$lambda.min), col = "red", lty = 2)
abline(v = log(cv.lasso$lambda.1se), col = "blue", lty = 2)
print(lassomodel)
##
## Call: glmnet(x = x, y = y, alpha = 1, lambda = cv.lasso$lambda.min)
##
## Df %Dev Lambda
## 1 7 74.96 2921
Empleo (employment): each additional unit is associated with an increase of approximately 23,398.64 in the dependent variable.
Educacion (education): each additional unit is associated with an increase of approximately 28,505.28 in the dependent variable.
Salario_Diario (daily wage): each additional unit is associated with a decrease of approximately 535.05 in the dependent variable.
Innovacion (innovation): each additional unit is associated with an increase of approximately 47,703.18 in the dependent variable.
Inseguridad_Robo and Inseguridad_Homicidio (robbery and homicide rates): these two variables are excluded from the final model. Their coefficients are shown as points (.), which means that the LASSO penalty shrank them to exactly zero; this is variable selection, not a significance test.
Tipo_de_Cambio (exchange rate): likewise excluded from the final model.
Densidad_Carretera (road density): each additional unit is associated with an increase of approximately 6,177,915 in the dependent variable.
Densidad_Poblacion (population density): excluded from the final model.
CO2_Emisiones (CO2 emissions): each additional unit is associated with a decrease of approximately 10,329.64 in the dependent variable.
PIB_Per_Capita (GDP per capita): each additional unit is associated with an increase of approximately 1.24 in the dependent variable.
INPC (consumer price index): excluded from the final model.
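Why LASSO sets some coefficients to exactly zero can be seen in the soft-thresholding operator, which is the LASSO solution for a single standardized predictor (or an orthonormal design). The sketch below is illustrative only: soft_threshold is a hypothetical helper, not a glmnet function, and the input values are made up.

```r
# Soft-thresholding: shrink each value toward zero by lambda,
# and set it to exactly zero when its magnitude is below lambda
soft_threshold <- function(z, lambda) {
  sign(z) * pmax(abs(z) - lambda, 0)
}

# Hypothetical unpenalized coefficients
z <- c(-3, -0.5, 0.2, 2)

shrunk <- soft_threshold(z, lambda = 1)
shrunk
```

Coefficients whose unpenalized magnitude is smaller than lambda are dropped, and the remaining ones are pulled toward zero, which matches the pattern of points (.) and reduced coefficients in the output.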
The RMSE is 70,578.69, which means that the model predictions typically deviate from the observed values by about 70,578.69 units. Since the amounts are expressed in millions, this error seems too high. These metrics are also computed on the same data used to fit the model, so they measure in-sample fit. Likewise, the high value of lambda (2,921) indicates strong regularization, which explains the exclusion of several of the variables used in the model.
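For reference, both accuracy metrics can be computed in base R. The sketch assumes that caret's R2() uses the squared correlation between predictions and observations by default, which can differ from the traditional 1 - SSE/SST definition; obs and pred are hypothetical values, not results from this study.

```r
# Hypothetical observed and predicted values, for illustration only
obs  <- c(1, 2, 3, 4)
pred <- c(1.5, 2, 2.5, 4)

# Root mean squared error: square root of the average squared prediction error
rmse <- sqrt(mean((pred - obs)^2))

# R-squared as squared correlation (assumed default of caret::R2)
r2_corr <- cor(pred, obs)^2

# Traditional R-squared: 1 - SSE / SST
r2_trad <- 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

c(RMSE = rmse, R2_corr = r2_corr, R2_traditional = r2_trad)
```

Because RMSE is in the units of the response, it is easier to judge when divided by the mean or the range of the observed values.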
library(forecast)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
# Create a time series from the IED_Flujos data
time_series <- ts(bd2$IED_Flujos, frequency = 12)
# Calculate the serial autocorrelation and display the autocorrelation graph
acf_result <- acf(time_series, lag.max = 12) # lag.max can be adjusted as needed
# Plot the autocorrelation function
plot(acf_result, main = "Serial Autocorrelation of FDI Flows in Mexico") # Box tests
In the serial autocorrelation graph, a bar of the ACF that crosses the confidence band indicates that the correlation at that lag is statistically significant. Here the ACF crosses the band only at lag 0, where the autocorrelation is 1 by definition. Therefore, there do not appear to be significant autocorrelation patterns in this time series.
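The visual reading of the ACF can be complemented with a formal test. The sketch below applies the Ljung-Box test (Box.test from the stats package) to two synthetic series, one without autocorrelation and one with strong autocorrelation, and computes the approximate 95% confidence band drawn by acf(). The series, the choice of 10 lags and all variable names are assumptions made for illustration.

```r
# Ljung-Box test on synthetic series (not the IED_Flujos data)
set.seed(5)
n <- 200
white_noise <- rnorm(n)                                        # no autocorrelation
ar1 <- as.numeric(arima.sim(model = list(ar = 0.8), n = n))    # strong autocorrelation

# Approximate 95% band used in ACF plots: +/- qnorm(0.975) / sqrt(n)
band <- qnorm(0.975) / sqrt(n)

# Null hypothesis: no autocorrelation up to the chosen lag
lb_noise <- Box.test(white_noise, lag = 10, type = "Ljung-Box")
lb_ar1   <- Box.test(ar1, lag = 10, type = "Ljung-Box")

c(band = band, p_noise = lb_noise$p.value, p_ar1 = lb_ar1$p.value)
```

On the study data the equivalent call would be Box.test(time_series, type = "Ljung-Box") with a lag chosen for the length of the series; a p-value above 0.05 would support the conclusion that there is no significant autocorrelation.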