
Part 1

Background

What is nearshoring? “Nearshoring” refers to the practice of transferring a business process to companies in a nearby country, usually to reduce production or operating costs and to benefit from geographic proximity and similar time zones. It is similar to “offshoring”, which involves moving business or production processes to other countries; the key difference is geographical proximity.

With this strategy, a company seeks to move part of its production closer to its final market. For example, after the economic disruption caused by COVID-19, many companies are looking for shorter and more resilient supply chains that can keep operating through disruptions.

Nearshoring can reduce costs: one of the main reasons companies consider it is the potential to access skilled labor at a lower cost, since wages in nearshore locations are often significantly lower than in the home country. Nearshoring can also make travel easier: if employees need to travel between offices, geographic proximity makes those trips less expensive. Finally, companies that move to a nearby country generally face lower regulatory and political risks than those that move to more distant locations.

(Vargas C 2023)

Why might Mexico be attractive for nearshoring? Mexico has become an attractive nearshoring destination for many companies, especially those from nearby countries such as the United States and Canada.

One of the most important advantages that Mexico offers as a nearshoring destination is its geographical proximity to the United States and Canada. There is also some cultural proximity and, above all, a highly skilled workforce at competitive costs; although these costs are not as low as those in Asian regions, Mexico can offer other kinds of advantages. (Samuel Garcia Mexico’s Industry Supply Chain 2023)

In recent years, Mexico has faced challenges and limitations in its transport and communications infrastructure because of the low level of investment in these sectors. However, Mexico has a strong and growing business ecosystem, with domestic and foreign companies operating in various sectors, which can facilitate collaboration and the exchange of knowledge between companies. (EGADE 2023)

What is predictive analytics? According to IBM (2023), predictive analytics is a branch of advanced analytics that makes predictions about future outcomes using data combined with statistical modeling, data mining techniques and machine learning.

Predictive analytics involves the use of data, statistical algorithms, and machine learning techniques. Its main purpose is to learn from what has already happened in order to provide an accurate estimate of what will happen in the future. Essentially, companies use predictive analytics to find patterns in their data and to identify risks and new opportunities.

(IBM 2023)

What is the use of regression analysis in predictive analytics? Regression analysis is a foundational statistical tool used in predictive analytics to understand relationships between variables and predict future outcomes. At its core, it models the relationship between a dependent variable and one or more independent variables.

Predictive analytics is a tool used with machine learning and big data, and regression modeling is in turn a tool for predictive analytics. Regression analysis looks at a dependent variable (the outcome) and one or more independent variables (the actions or factors). It seeks to determine whether a relationship exists between the variables and how strong that relationship is. (GutCheck 2017)
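
As an illustration of the idea above, the following is a minimal sketch of a simple linear regression in R. The data are simulated for this example (the variables `x` and `y` and the true coefficients 2 and 3 are assumptions, not values from the nearshoring data set).

```r
# Minimal sketch on simulated data (not the nearshoring data set).
set.seed(123)                      # make the simulation reproducible
n <- 100
x <- runif(n, min = 0, max = 10)   # independent variable (hypothetical)
y <- 2 + 3 * x + rnorm(n, sd = 1)  # dependent variable: true intercept 2, true slope 3

fit <- lm(y ~ x)                   # estimate the linear relationship
coef(fit)                          # estimated intercept and slope
summary(fit)$r.squared             # share of the variance in y explained by x
```

The estimated slope answers both questions raised above: its sign and size describe the relationship, and the R-squared gives a rough measure of how strong that relationship is.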

How can regression analysis help us predict the occurrence of nearshoring in the Mexican case?

Regression analysis can be used to predict the occurrence of nearshoring to Mexico by analyzing the factors that may influence a company's decision to nearshore. Historical data on companies that have relocated to Mexico could include the cost savings they achieved, type of industry, company size, previous nearshore destinations, global economic indicators, trade policies and more. With these data we can identify which variables (factors) are statistically most relevant to the decision to move to Mexico.

Regression analysis shows how different variables influence the nearshoring decision. Finally, once a model has been selected, the probability that a company nearshores to Mexico can be predicted from the values of the independent variables, and the model helps quantify how much each factor contributes to the decision. (T. Osvarauld 2023)
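
One way to sketch the "probability of nearshoring" idea is a logistic regression. The example below is only an illustration under stated assumptions: the firm-level data are simulated, and the variable names (`cost_saving`, `firm_size`, `nearshore`) and the coefficients used to generate them are hypothetical. It is not the model estimated later in this report.

```r
# Hypothetical firm-level data, simulated for illustration only.
set.seed(42)
n <- 200
cost_saving <- runif(n, 0, 30)             # assumed: expected cost saving in percent
firm_size   <- rnorm(n, mean = 0, sd = 1)  # assumed: standardized firm size

# Assumed data-generating process: larger savings raise the odds of nearshoring.
p_true    <- plogis(-3 + 0.2 * cost_saving + 0.5 * firm_size)
nearshore <- rbinom(n, size = 1, prob = p_true)  # 1 = firm relocated, 0 = it did not

firms <- data.frame(nearshore, cost_saving, firm_size)
logit_fit <- glm(nearshore ~ cost_saving + firm_size, data = firms, family = binomial)

# Predicted probability for a new (hypothetical) firm.
new_firm <- data.frame(cost_saving = 20, firm_size = 0.5)
p_hat <- predict(logit_fit, newdata = new_firm, type = "response")
p_hat

# Odds ratios: multiplicative change in the odds per unit increase of each factor.
exp(coef(logit_fit))
```

The odds ratios are one way to quantify how much each factor contributes to the decision, in the sense described above.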

Part 2

Problem Situation According to the document “Mexico and Its Attractiveness for Nearshoring”, what is the problem situation? How can it be addressed?

The purpose of this evidence is to help Maria, an analyst at a Mexican company who wants to know whether Mexico can be attractive to companies from other countries that want to nearshore there. She has carried out research based on data from INEGI, the Bank of Mexico and the Ministry of Economy, covering variables such as GDP per capita, daily wage, exports in millions of dollars, exchange rate, road density, etc.

Basically, she wants to know which econometric model she should use to predict the consequences of nearshoring in Mexico, why the country may be attractive for nearshoring, and what opportunities Mexico has for business relocation.

With this work we want to identify the factors that attract nearshoring to Mexico and those that deter foreign investors.

Part 3

Data and Methodology. Exploratory Data Analysis EDA

# Import BD
library(foreign)
bd<- read.csv("C:\\Users\\85171075\\Desktop\\Mariana\\TEC\\Econometrics\\sp_data.csv")
# Load libraries
#library(psych)
library(readxl)
library(tidyverse)
library(ggplot2)
library(corrplot)
library(gmodels)
library(effects)
library(stargazer)
library(olsrr)        
library(kableExtra)
library(jtools)
library(fastmap)
library(dlookr)
library(Hmisc)
library(naniar)
library(glmnet)
library(caret)
library(car)
library(lmtest)
library(dplyr)
#Identify missing values
missing_values<-colSums(is.na(bd))
missing_values
##               periodo            IED_Flujos                 IED_M 
##                     0                     0                     0 
##         Exportaciones       Exportaciones_m                Empleo 
##                     0                     0                     3 
##             Educacion        Salario_Diario            Innovacion 
##                     3                     0                     2 
##      Inseguridad_Robo Inseguridad_Homicidio        Tipo_de_Cambio 
##                     0                     1                     0 
##    Densidad_Carretera    Densidad_Poblacion         CO2_Emisiones 
##                     0                     0                     3 
##        PIB_Per_Capita                  INPC     crisis_financiera 
##                     0                     0                     0
#Display data set structure
str(bd)
## 'data.frame':    26 obs. of  18 variables:
##  $ periodo              : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ IED_Flujos           : num  12146 8374 13960 18249 30057 ...
##  $ IED_M                : num  294151 210876 299734 362632 546548 ...
##  $ Exportaciones        : num  9088 9875 10990 12483 11300 ...
##  $ Exportaciones_m      : num  220091 248691 235961 248057 205483 ...
##  $ Empleo               : num  NA NA NA 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
##  $ crisis_financiera    : int  0 0 0 0 0 0 0 0 0 0 ...
# Include descriptive statistics (mean, median, standard deviation, minimum, maximum)
summary(bd)
##     periodo       IED_Flujos        IED_M        Exportaciones  
##  Min.   :1997   Min.   : 8374   Min.   :210876   Min.   : 9088  
##  1st Qu.:2003   1st Qu.:21367   1st Qu.:368560   1st Qu.:13260  
##  Median :2010   Median :27698   Median :497054   Median :21188  
##  Mean   :2010   Mean   :26770   Mean   :493596   Mean   :23601  
##  3rd Qu.:2016   3rd Qu.:32183   3rd Qu.:578606   3rd Qu.:31601  
##  Max.   :2022   Max.   :48354   Max.   :754438   Max.   :46478  
##                                                                 
##  Exportaciones_m      Empleo        Educacion     Salario_Diario  
##  Min.   :205483   Min.   :95.06   Min.   :7.200   Min.   : 24.30  
##  1st Qu.:262337   1st Qu.:95.89   1st Qu.:7.865   1st Qu.: 41.97  
##  Median :366294   Median :96.53   Median :8.460   Median : 54.48  
##  Mean   :433856   Mean   :96.47   Mean   :8.423   Mean   : 65.16  
##  3rd Qu.:632356   3rd Qu.:97.08   3rd Qu.:9.000   3rd Qu.: 72.31  
##  Max.   :785655   Max.   :97.83   Max.   :9.580   Max.   :172.87  
##                   NA's   :3       NA's   :3                       
##    Innovacion    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio 
##  Min.   :11.28   Min.   :120.5    Min.   : 8.04         Min.   : 8.06  
##  1st Qu.:12.56   1st Qu.:148.3    1st Qu.:10.25         1st Qu.:10.75  
##  Median :13.09   Median :181.8    Median :16.93         Median :13.02  
##  Mean   :13.11   Mean   :185.4    Mean   :17.29         Mean   :13.91  
##  3rd Qu.:13.75   3rd Qu.:209.9    3rd Qu.:22.43         3rd Qu.:18.49  
##  Max.   :15.11   Max.   :314.8    Max.   :29.59         Max.   :20.66  
##  NA's   :2                        NA's   :1                            
##  Densidad_Carretera Densidad_Poblacion CO2_Emisiones   PIB_Per_Capita  
##  Min.   :0.05000    Min.   :47.44      Min.   :3.590   Min.   :126739  
##  1st Qu.:0.06000    1st Qu.:52.77      1st Qu.:3.830   1st Qu.:130964  
##  Median :0.07000    Median :58.09      Median :3.930   Median :136845  
##  Mean   :0.07115    Mean   :57.33      Mean   :3.945   Mean   :138550  
##  3rd Qu.:0.08000    3rd Qu.:61.39      3rd Qu.:4.105   3rd Qu.:146148  
##  Max.   :0.09000    Max.   :65.60      Max.   :4.220   Max.   :153236  
##                                        NA's   :3                       
##       INPC        crisis_financiera
##  Min.   : 33.28   Min.   :0.00000  
##  1st Qu.: 56.15   1st Qu.:0.00000  
##  Median : 73.35   Median :0.00000  
##  Mean   : 75.17   Mean   :0.07692  
##  3rd Qu.: 91.29   3rd Qu.:0.00000  
##  Max.   :126.48   Max.   :1.00000  
## 
# For our dependent variable "IED_M", the minimum value is 210,876 and the maximum value is 754,438. Since the information is not broken down by region, it is not possible to know which areas of the country receive the largest or smallest investment flows. Regional data would help identify which areas of the country are more attractive for nearshoring. The same applies to daily wages: a company that wants to move operations to another country has to consider local wages to judge whether the location suits the company and whether the jobs will be attractive to the nearby population.

# The data base covers the period from 1997 to 2022. The summary function shows the minimum, quartiles, median, mean and maximum of each variable (column).

# Transform variables if required
#bd$Salario_Diario = as.factor(bd$Salario_Diario) 

# Which estimation method should be used to estimate the linear regression model?

# In linear regression analysis, the most common estimation method is Ordinary Least Squares (OLS). OLS fits a linear model by choosing the coefficients that minimize the sum of the squared differences ("errors") between the observed values of the dependent variable and the values predicted by the model. In R, the OLS fit can be visualized as the regression line that best fits the points in a scatter plot.
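
The OLS idea can be checked numerically. The sketch below uses simulated data (the variables and true coefficients are assumptions for illustration) and compares the closed-form least-squares solution with the coefficients returned by `lm()`.

```r
set.seed(1)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)   # simulated data, true intercept 1 and slope 2

X <- cbind(Intercept = 1, x = x)      # design matrix with a column of ones
beta_hat <- solve(t(X) %*% X, t(X) %*% y)   # closed-form OLS: solves (X'X) b = X'y

fit <- lm(y ~ x)
cbind(normal_equations = drop(beta_hat), lm = coef(fit))  # the two columns should agree

sum(residuals(fit)^2)                 # the minimized sum of squared errors
```

Any other choice of intercept and slope would give a larger sum of squared errors on these data than the value printed in the last line.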
# Replace missing values
bd <- bd %>%
  mutate(across(everything(), ~ifelse(is.na(.), median(., na.rm = TRUE), .)))
bd
##    periodo IED_Flujos    IED_M Exportaciones Exportaciones_m Empleo Educacion
## 1     1997   12145.60 294151.2       9087.62        220090.8  96.53      7.20
## 2     1998    8373.50 210875.6       9875.07        248690.6  96.53      7.31
## 3     1999   13960.32 299734.4      10990.01        235960.5  96.53      7.43
## 4     2000   18248.69 362631.8      12482.96        248057.2  97.83      7.56
## 5     2001   30057.18 546548.4      11300.44        205482.9  97.36      7.68
## 6     2002   24099.21 468332.0      11923.10        231707.6  97.66      7.80
## 7     2003   18249.97 368752.8      13156.00        265825.7  97.06      7.93
## 8     2004   25015.57 481349.2      13573.13        261173.9  96.48      8.04
## 9     2005   25795.82 458544.8      16465.81        292695.1  97.17      8.14
## 10    2006   21232.54 368495.8      17485.93        303472.5  96.53      8.26
## 11    2007   32393.33 542793.7      19103.85        320110.6  96.60      8.36
## 12    2008   29502.46 586217.7      16924.76        336297.2  95.68      8.46
## 13    2009   17849.95 324318.4      19702.63        357980.1  95.20      8.56
## 14    2010   27189.28 449223.7      22673.14        374607.6  95.06      8.63
## 15    2011   25632.52 460653.8      24333.02        437299.9  95.49      8.75
## 16    2012   21769.32 350978.6      26297.98        423992.5  95.53      8.85
## 17    2013   48354.42 754437.5      27687.57        431988.2  95.75      8.95
## 18    2014   30351.25 512758.2      31676.78        535151.9  96.24      9.05
## 19    2015   35943.75 699904.1      29959.94        583386.1  96.04      9.15
## 20    2016   31188.98 700091.6      31375.06        704268.5  96.62      9.25
## 21    2017   34017.05 683318.0      33322.62        669368.6  96.85      9.35
## 22    2018   34100.43 671018.4      35341.90        695447.7  96.64      9.45
## 23    2019   34577.16 615945.4      36414.73        648679.3  97.09      9.58
## 24    2020   28205.89 514711.7      41077.34        749594.7  96.21      8.46
## 25    2021   31553.52 551937.8      44914.78        785654.5  96.49      8.46
## 26    2022   36215.37 555771.9      46477.59        713259.0  97.24      8.46
##    Salario_Diario Innovacion Inseguridad_Robo Inseguridad_Homicidio
## 1           24.30      11.30           266.51                 14.55
## 2           31.91      11.37           314.78                 14.32
## 3           31.91      12.46           272.89                 12.64
## 4           35.12      13.15           216.98                 10.86
## 5           37.57      13.47           214.53                 10.25
## 6           39.74      12.80           197.80                  9.94
## 7           41.53      11.81           183.22                  9.81
## 8           43.30      12.61           146.28                  8.92
## 9           45.24      13.41           136.94                  9.22
## 10          47.05      14.23           135.59                  9.60
## 11          48.88      15.04           145.92                  8.04
## 12          50.84      14.82           158.17                 12.52
## 13          53.19      12.59           175.77                 17.46
## 14          55.77      12.69           201.94                 22.43
## 15          58.06      12.10           212.61                 23.42
## 16          60.75      13.03           190.28                 22.09
## 17          63.12      13.22           185.56                 19.74
## 18          65.58      13.65           154.41                 16.93
## 19          70.10      15.11           180.44                 17.37
## 20          73.04      14.40           160.57                 20.31
## 21          88.36      14.05           230.43                 26.22
## 22          88.36      13.25           184.25                 29.59
## 23         102.68      12.70           173.45                 29.21
## 24         123.22      11.28           133.90                 28.98
## 25         141.70      13.09           127.13                 27.89
## 26         172.87      13.09           120.49                 16.93
##    Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## 1            8.06               0.05              47.44          3.68
## 2            9.94               0.05              48.76          3.85
## 3            9.52               0.06              49.48          3.69
## 4            9.60               0.06              50.58          3.87
## 5            9.17               0.06              51.28          3.81
## 6           10.36               0.06              51.95          3.82
## 7           11.20               0.06              52.61          3.95
## 8           11.22               0.06              53.27          3.98
## 9           10.71               0.06              54.78          4.10
## 10          10.88               0.06              55.44          4.19
## 11          10.90               0.06              56.17          4.22
## 12          13.77               0.07              56.96          4.19
## 13          13.04               0.07              57.73          4.04
## 14          12.38               0.07              58.45          4.11
## 15          13.98               0.07              59.15          4.19
## 16          12.99               0.07              59.85          4.20
## 17          13.07               0.08              59.49          4.06
## 18          14.73               0.08              60.17          3.89
## 19          17.34               0.08              60.86          3.93
## 20          20.66               0.08              61.57          3.89
## 21          19.74               0.09              62.28          3.84
## 22          19.66               0.09              63.11          3.65
## 23          18.87               0.09              63.90          3.59
## 24          19.94               0.09              64.59          3.93
## 25          20.52               0.09              65.16          3.93
## 26          19.41               0.09              65.60          3.93
##    PIB_Per_Capita   INPC crisis_financiera
## 1        127570.1  33.28                 0
## 2        126738.8  39.47                 0
## 3        129164.7  44.34                 0
## 4        130874.9  48.31                 0
## 5        128083.4  50.43                 0
## 6        128205.9  53.31                 0
## 7        128737.9  55.43                 0
## 8        132563.5  58.31                 0
## 9        132941.1  60.25                 0
## 10       135894.9  62.69                 0
## 11       137795.7  65.05                 0
## 12       135176.0  69.30                 1
## 13       131233.0  71.77                 1
## 14       134991.7  74.93                 0
## 15       138891.9  77.79                 0
## 16       141530.2  80.57                 0
## 17       144112.0  83.77                 0
## 18       147277.4  87.19                 0
## 19       149433.5  89.05                 0
## 20       152275.4  92.04                 0
## 21       153235.7  98.27                 0
## 22       153133.8  99.91                 0
## 23       150233.1 105.93                 0
## 24       142609.3 109.27                 0
## 25       142772.0 117.31                 0
## 26       146826.7 126.48                 0
#Identify missing values
missing_values<-colSums(is.na(bd))
missing_values
##               periodo            IED_Flujos                 IED_M 
##                     0                     0                     0 
##         Exportaciones       Exportaciones_m                Empleo 
##                     0                     0                     0 
##             Educacion        Salario_Diario            Innovacion 
##                     0                     0                     0 
##      Inseguridad_Robo Inseguridad_Homicidio        Tipo_de_Cambio 
##                     0                     0                     0 
##    Densidad_Carretera    Densidad_Poblacion         CO2_Emisiones 
##                     0                     0                     0 
##        PIB_Per_Capita                  INPC     crisis_financiera 
##                     0                     0                     0
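
To make the imputation step above easier to follow, here is a minimal base-R sketch of the same median replacement on a toy data frame. The data frame `toy`, its values and the helper `impute_median()` are hypothetical and only serve as an illustration.

```r
# Toy data frame (hypothetical values) with missing entries.
toy <- data.frame(
  year  = 2018:2022,
  index = c(9.4, 9.6, NA, NA, NA),   # upward-trending series with a missing tail
  rate  = c(3.1, NA, 2.9, 3.0, 3.2)
)

# Replace each NA with the median of the observed values in the same column.
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}
toy_filled <- as.data.frame(lapply(toy, impute_median))

colSums(is.na(toy_filled))   # all zeros after imputation
toy_filled$index             # the missing tail is filled with median(9.4, 9.6) = 9.5
```

For a trending series the median can sit well away from the neighboring observations. In the table above, Educacion appears to be filled with 8.46 for 2020 to 2022 after 9.58 in 2019, so it may be worth checking how sensitive later results are to this choice.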

Data Visualization

# Y = IED_M

# Histogram Exportaciones

hist1 <- ggplot(bd, aes(x = Exportaciones_m)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "pink", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Exports", x = "Value", y = "Density") +
  theme(plot.title = element_text(hjust = 0.5))

print(hist1)

# Histogram Salario diario
hist2 <- ggplot(bd, aes(x = Salario_Diario)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "purple", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Daily Wage", x = "Value", y = "Density") +
  theme(plot.title = element_text(hjust = 0.5))

print(hist2)

# Histogram Educacion
hist3 <- ggplot(bd, aes(x = Educacion)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "yellow", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of Years of Education", x = "Value", y = "Density") +
  theme(plot.title = element_text(hjust = 0.5))

print(hist3)

# Histogram PIB_Per_Capita

hist4 <- ggplot(bd, aes(x = PIB_Per_Capita)) +
  geom_histogram(aes(y = after_stat(density)), bins = 10, fill = "blue", color = "black") +
  geom_density(color = "red") +
  labs(title = "Distribution of GDP per Capita", x = "Value", y = "Density") +
  theme(plot.title = element_text(hjust = 0.5))

print(hist4)

#Scatter plot 1
ggplot(data=bd, aes(x=PIB_Per_Capita, y=IED_M)) +
  geom_point() +        
  labs(title="Scatter Plot PIB vs IED ", x="PIB", y="IED") +
  theme_minimal()

#Scatter plot 2
ggplot(data=bd, aes(x=CO2_Emisiones, y=IED_M)) +
  geom_point() +        
  labs(title="Scatter CO2_Emisiones vs IED ", x="CO2_Emisiones", y="IED") +
  theme_minimal()

# These histograms show that some variables are slightly skewed: the tail of the distribution extends further to one side (right or left) than to the other, because a few very high or very low values pull the tail out. Skewness can influence the choice of tests or models. Many statistical techniques assume normally distributed (symmetric, bell-shaped) data; when the data are skewed those assumptions can be violated, and a transformation (for example, taking logs) may be needed.

# When the points in a scatter plot of two variables follow a general direction, either ascending or descending, the two variables may be correlated. Scatter plots also reveal general patterns in the data even when the variables are not clearly correlated.
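
The effect of a log transformation on skewness can be sketched with a small example. The `skewness()` helper below is a simple moment-based estimator written for this illustration, and the variable `wage` is simulated (hypothetical), not taken from the data set.

```r
# Moment-based sample skewness: 0 for a symmetric distribution, > 0 for a long right tail.
skewness <- function(v) {
  v <- v[!is.na(v)]
  m <- mean(v)
  mean((v - m)^3) / mean((v - m)^2)^1.5
}

set.seed(7)
wage <- rlnorm(500, meanlog = 4, sdlog = 0.6)  # simulated right-skewed variable (hypothetical)

skewness(wage)        # clearly positive: right-skewed
skewness(log(wage))   # close to zero: the log transform removes most of the skew
```

The same helper could be applied to the columns of `bd`, for example `sapply(bd, skewness)`, to see which variables might benefit from a transformation.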
# Display a histogram of the dependent variable (histogram() is from the lattice package)
hist_ied <- histogram(bd$IED_M)
hist_ied

# Display a histogram of the dependent variable in logs
ggplot(data = bd, aes(x = log(IED_M))) +
  geom_histogram(bins = 10, fill = "lightpink", color = "black", boundary = 15) +
  labs(title = "Frequency of log foreign direct investment flows", x = "log(IED_M)", y = "Frequency") +
  theme(plot.title = element_text(hjust = 0.5))

res <- cor(bd)
round(res, 2)
##                       periodo IED_Flujos IED_M Exportaciones Exportaciones_m
## periodo                  1.00       0.72  0.69          0.98            0.95
## IED_Flujos               0.72       1.00  0.94          0.66            0.60
## IED_M                    0.69       0.94  1.00          0.61            0.64
## Exportaciones            0.98       0.66  0.61          1.00            0.97
## Exportaciones_m          0.95       0.60  0.64          0.97            1.00
## Empleo                  -0.21      -0.06  0.02         -0.13           -0.09
## Educacion                0.84       0.73  0.74          0.73            0.75
## Salario_Diario           0.88       0.56  0.48          0.94            0.88
## Innovacion               0.25       0.53  0.58          0.16            0.17
## Inseguridad_Robo        -0.59      -0.55 -0.45         -0.54           -0.45
## Inseguridad_Homicidio    0.78       0.40  0.42          0.78            0.82
## Tipo_de_Cambio           0.94       0.60  0.68          0.93            0.98
## Densidad_Carretera       0.96       0.73  0.72          0.95            0.95
## Densidad_Poblacion       1.00       0.72  0.67          0.96            0.93
## CO2_Emisiones            0.02       0.09 -0.06         -0.07           -0.18
## PIB_Per_Capita           0.89       0.73  0.78          0.85            0.89
## INPC                     0.99       0.70  0.65          0.99            0.95
## crisis_financiera       -0.04      -0.10 -0.08         -0.14           -0.13
##                       Empleo Educacion Salario_Diario Innovacion
## periodo                -0.21      0.84           0.88       0.25
## IED_Flujos             -0.06      0.73           0.56       0.53
## IED_M                   0.02      0.74           0.48       0.58
## Exportaciones          -0.13      0.73           0.94       0.16
## Exportaciones_m        -0.09      0.75           0.88       0.17
## Empleo                  1.00     -0.32           0.04       0.01
## Educacion              -0.32      1.00           0.51       0.45
## Salario_Diario          0.04      0.51           1.00       0.05
## Innovacion              0.01      0.45           0.05       1.00
## Inseguridad_Robo        0.02     -0.44          -0.54      -0.42
## Inseguridad_Homicidio  -0.33      0.68           0.64      -0.17
## Tipo_de_Cambio         -0.09      0.78           0.85       0.22
## Densidad_Carretera     -0.13      0.82           0.86       0.21
## Densidad_Poblacion     -0.26      0.85           0.86       0.28
## CO2_Emisiones          -0.51      0.07          -0.11       0.33
## PIB_Per_Capita         -0.11      0.91           0.67       0.43
## INPC                   -0.14      0.78           0.93       0.22
## crisis_financiera      -0.42      0.04          -0.11       0.16
##                       Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio
## periodo                          -0.59                  0.78           0.94
## IED_Flujos                       -0.55                  0.40           0.60
## IED_M                            -0.45                  0.42           0.68
## Exportaciones                    -0.54                  0.78           0.93
## Exportaciones_m                  -0.45                  0.82           0.98
## Empleo                            0.02                 -0.33          -0.09
## Educacion                        -0.44                  0.68           0.78
## Salario_Diario                   -0.54                  0.64           0.85
## Innovacion                       -0.42                 -0.17           0.22
## Inseguridad_Robo                  1.00                 -0.08          -0.45
## Inseguridad_Homicidio            -0.08                  1.00           0.79
## Tipo_de_Cambio                   -0.45                  0.79           1.00
## Densidad_Carretera               -0.47                  0.81           0.94
## Densidad_Poblacion               -0.62                  0.76           0.92
## CO2_Emisiones                    -0.41                 -0.25          -0.17
## PIB_Per_Capita                   -0.40                  0.70           0.88
## INPC                             -0.59                  0.75           0.94
## crisis_financiera                -0.11                 -0.09          -0.04
##                       Densidad_Carretera Densidad_Poblacion CO2_Emisiones
## periodo                             0.96               1.00          0.02
## IED_Flujos                          0.73               0.72          0.09
## IED_M                               0.72               0.67         -0.06
## Exportaciones                       0.95               0.96         -0.07
## Exportaciones_m                     0.95               0.93         -0.18
## Empleo                             -0.13              -0.26         -0.51
## Educacion                           0.82               0.85          0.07
## Salario_Diario                      0.86               0.86         -0.11
## Innovacion                          0.21               0.28          0.33
## Inseguridad_Robo                   -0.47              -0.62         -0.41
## Inseguridad_Homicidio               0.81               0.76         -0.25
## Tipo_de_Cambio                      0.94               0.92         -0.17
## Densidad_Carretera                  1.00               0.95         -0.17
## Densidad_Poblacion                  0.95               1.00          0.09
## CO2_Emisiones                      -0.17               0.09          1.00
## PIB_Per_Capita                      0.89               0.87         -0.11
## INPC                                0.96               0.98         -0.01
## crisis_financiera                  -0.03               0.00          0.28
##                       PIB_Per_Capita  INPC crisis_financiera
## periodo                         0.89  0.99             -0.04
## IED_Flujos                      0.73  0.70             -0.10
## IED_M                           0.78  0.65             -0.08
## Exportaciones                   0.85  0.99             -0.14
## Exportaciones_m                 0.89  0.95             -0.13
## Empleo                         -0.11 -0.14             -0.42
## Educacion                       0.91  0.78              0.04
## Salario_Diario                  0.67  0.93             -0.11
## Innovacion                      0.43  0.22              0.16
## Inseguridad_Robo               -0.40 -0.59             -0.11
## Inseguridad_Homicidio           0.70  0.75             -0.09
## Tipo_de_Cambio                  0.88  0.94             -0.04
## Densidad_Carretera              0.89  0.96             -0.03
## Densidad_Poblacion              0.87  0.98              0.00
## CO2_Emisiones                  -0.11 -0.01              0.28
## PIB_Per_Capita                  1.00  0.85             -0.18
## INPC                            0.85  1.00             -0.06
## crisis_financiera              -0.18 -0.06              1.00
# Display a correlation plot 
cor_matrix <- cor(bd, use = "complete.obs")
corrplot(cor_matrix, method = "circle",type="upper")

# Correlation plot with coefficients shown, variables ordered by hierarchical clustering
corrplot(cor(bd),type='upper',order='hclust',addCoef.col='black')

Part 4

Linear Regression Analysis

1st hypothesis
H0: Years of education have no significant impact on the flow of foreign direct investment. H1: Years of education have a significant impact on the flow of foreign direct investment.

2nd hypothesis
H0: The financial crisis has no significant impact on the flow of foreign direct investment. H1: The financial crisis has a significant impact on the flow of foreign direct investment.

3rd hypothesis
H0: The minimum wage has no significant impact on foreign direct investment. H1: The minimum wage has a significant impact on foreign direct investment.
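Each of these hypotheses is evaluated with the t-test that lm() reports for a regression coefficient, whose null is that the coefficient equals zero. The sketch below shows how the p-value can be read out of a fitted model and compared with a 5% threshold. It uses simulated data; the names educ and fdi are hypothetical stand-ins and are not the report's variables.

```r
# Minimal sketch on simulated data (educ and fdi are hypothetical names)
set.seed(1)
n <- 26
educ <- seq(7, 10, length.out = n) + rnorm(n, sd = 0.1)
fdi <- exp(2 + 3 * log(educ) + rnorm(n, sd = 0.2))

fit <- lm(log(fdi) ~ log(educ))

# Coefficient table: estimate, standard error, t value and p-value
coefs <- summary(fit)$coefficients
p_value <- coefs["log(educ)", "Pr(>|t|)"]

# Reject the null of a zero coefficient when the p-value is below 0.05
reject_h0 <- p_value < 0.05
```

The same lookup applies to any row of the coefficient table, for example the rows for the crisis dummy or the wage variable in the models that follow.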

Models.

# Initial model 1
mod1 <- lm(IED_M ~Exportaciones_m+ Educacion+Inseguridad_Homicidio+Salario_Diario+Tipo_de_Cambio+crisis_financiera, data = bd)
summary(mod1)
## 
## Call:
## lm(formula = IED_M ~ Exportaciones_m + Educacion + Inseguridad_Homicidio + 
##     Salario_Diario + Tipo_de_Cambio + crisis_financiera, data = bd)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -140411  -35956   -6718   37603  230766 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           -8.172e+05  3.718e+05  -2.198   0.0405 *
## Exportaciones_m       -3.719e-01  7.759e-01  -0.479   0.6372  
## Educacion              1.358e+05  5.124e+04   2.650   0.0158 *
## Inseguridad_Homicidio -8.057e+03  4.996e+03  -1.612   0.1233  
## Salario_Diario         9.475e+00  1.342e+03   0.007   0.9944  
## Tipo_de_Cambio         3.403e+04  3.027e+04   1.124   0.2749  
## crisis_financiera     -8.978e+04  8.382e+04  -1.071   0.2975  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 95540 on 19 degrees of freedom
## Multiple R-squared:  0.6647, Adjusted R-squared:  0.5589 
## F-statistic: 6.279 on 6 and 19 DF,  p-value: 0.0009093
# Model 2
mod2 <- lm(log(IED_M) ~log(lag(IED_M)) + log(Exportaciones_m)+ log(Educacion)+log(Inseguridad_Homicidio)+log(Salario_Diario)+log(Tipo_de_Cambio)+crisis_financiera, data = bd)
summary(mod2)
## 
## Call:
## lm(formula = log(IED_M) ~ log(lag(IED_M)) + log(Exportaciones_m) + 
##     log(Educacion) + log(Inseguridad_Homicidio) + log(Salario_Diario) + 
##     log(Tipo_de_Cambio) + crisis_financiera, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26148 -0.15268  0.02053  0.09231  0.35313 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 18.2555     5.8840   3.103  0.00647 **
## log(lag(IED_M))             -0.3087     0.2307  -1.338  0.19858   
## log(Exportaciones_m)        -1.1575     0.6432  -1.799  0.08972 . 
## log(Educacion)               4.0847     1.1024   3.705  0.00176 **
## log(Inseguridad_Homicidio)  -0.3817     0.1815  -2.103  0.05069 . 
## log(Salario_Diario)          0.3481     0.2690   1.294  0.21291   
## log(Tipo_de_Cambio)          1.8107     0.8306   2.180  0.04361 * 
## crisis_financiera           -0.2093     0.1651  -1.268  0.22195   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1956 on 17 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7157, Adjusted R-squared:  0.5986 
## F-statistic: 6.113 on 7 and 17 DF,  p-value: 0.001103
# Model 3
mod3 <- lm(log(IED_M) ~  log(Educacion)+log(Inseguridad_Homicidio)+log(Salario_Diario)+crisis_financiera, data = bd)
summary(mod3)
## 
## Call:
## lm(formula = log(IED_M) ~ log(Educacion) + log(Inseguridad_Homicidio) + 
##     log(Salario_Diario) + crisis_financiera, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.37219 -0.09881  0.01057  0.11007  0.37397 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                  6.6697     1.2952   5.149 4.21e-05 ***
## log(Educacion)               2.9459     0.7722   3.815  0.00101 ** 
## log(Inseguridad_Homicidio)  -0.2959     0.1412  -2.096  0.04840 *  
## log(Salario_Diario)          0.2346     0.1350   1.738  0.09679 .  
## crisis_financiera           -0.1221     0.1525  -0.801  0.43231    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2046 on 21 degrees of freedom
## Multiple R-squared:  0.6512, Adjusted R-squared:  0.5847 
## F-statistic:   9.8 on 4 and 21 DF,  p-value: 0.0001235
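# Model 4: log(Educacion^2) equals 2*log(Educacion), so this is Model 2 with the education coefficient halved
mod4 <- lm(log(IED_M) ~log(lag(IED_M)) + log(Exportaciones_m)+ log(Educacion^2)+log(Inseguridad_Homicidio)+log(Salario_Diario)+log(Tipo_de_Cambio)+crisis_financiera, data = bd)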
summary(mod4)
## 
## Call:
## lm(formula = log(IED_M) ~ log(lag(IED_M)) + log(Exportaciones_m) + 
##     log(Educacion^2) + log(Inseguridad_Homicidio) + log(Salario_Diario) + 
##     log(Tipo_de_Cambio) + crisis_financiera, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26148 -0.15268  0.02053  0.09231  0.35313 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 18.2555     5.8840   3.103  0.00647 **
## log(lag(IED_M))             -0.3087     0.2307  -1.338  0.19858   
## log(Exportaciones_m)        -1.1575     0.6432  -1.799  0.08972 . 
## log(Educacion^2)             2.0423     0.5512   3.705  0.00176 **
## log(Inseguridad_Homicidio)  -0.3817     0.1815  -2.103  0.05069 . 
## log(Salario_Diario)          0.3481     0.2690   1.294  0.21291   
## log(Tipo_de_Cambio)          1.8107     0.8306   2.180  0.04361 * 
## crisis_financiera           -0.2093     0.1651  -1.268  0.22195   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1956 on 17 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7157, Adjusted R-squared:  0.5986 
## F-statistic: 6.113 on 7 and 17 DF,  p-value: 0.001103
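The output for Model 4 matches Model 2 in every statistic except the education coefficient, which is exactly half (2.0423 versus 4.0847). This follows from log(x^2) = 2*log(x): squaring inside the logarithm only rescales the regressor. The sketch below reproduces that behaviour on simulated data; x and y are hypothetical and unrelated to the report's data set.

```r
# Minimal sketch on simulated data: squaring inside log() only rescales the regressor
set.seed(42)
x <- runif(30, 5, 10)
y <- 1 + 3 * log(x) + rnorm(30, sd = 0.1)

fit_a <- lm(y ~ log(x))
fit_b <- lm(y ~ log(x^2))

slope_a <- unname(coef(fit_a)[2])
slope_b <- unname(coef(fit_b)[2])

# The slope on log(x^2) is half the slope on log(x); fitted values are identical
same_slope <- isTRUE(all.equal(slope_a, 2 * slope_b))
same_fit <- isTRUE(all.equal(fitted(fit_a), fitted(fit_b)))
```

To test for a genuinely nonlinear effect of education, a separate term such as I(log(Educacion)^2) would be needed in addition to log(Educacion); that is a different specification from the one estimated here.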

Compare models

stargazer(mod1,mod2,mod3,type="text",title="OLS Regression Results",single.row=TRUE,ci=FALSE,ci.level=0.9)
## 
## OLS Regression Results
## ===================================================================================================
##                                                      Dependent variable:                           
##                            ------------------------------------------------------------------------
##                                       IED_M                             log(IED_M)                 
##                                        (1)                       (2)                   (3)         
## ---------------------------------------------------------------------------------------------------
## Exportaciones_m                   -0.372 (0.776)                                                   
## Educacion                   135,770.100** (51,237.110)                                             
## Inseguridad_Homicidio         -8,056.819 (4,996.485)                                               
## Salario_Diario                  9.475 (1,342.384)                                                  
## Tipo_de_Cambio               34,033.200 (30,274.240)                                               
## log(lag(IED_M))                                            -0.309 (0.231)                          
## log(Exportaciones_m)                                       -1.157* (0.643)                         
## log(Educacion)                                            4.085*** (1.102)      2.946*** (0.772)   
## log(Inseguridad_Homicidio)                                 -0.382* (0.182)      -0.296** (0.141)   
## log(Salario_Diario)                                         0.348 (0.269)        0.235* (0.135)    
## log(Tipo_de_Cambio)                                        1.811** (0.831)                         
## crisis_financiera            -89,784.880 (83,821.190)      -0.209 (0.165)        -0.122 (0.152)    
## Constant                   -817,189.700** (371,769.400)   18.255*** (5.884)     6.670*** (1.295)   
## ---------------------------------------------------------------------------------------------------
## Observations                            26                       25                    26          
## R2                                    0.665                     0.716                 0.651        
## Adjusted R2                           0.559                     0.599                 0.585        
## Residual Std. Error            95,540.370 (df = 19)        0.196 (df = 17)       0.205 (df = 21)   
## F Statistic                   6.279*** (df = 6; 19)     6.113*** (df = 7; 17) 9.800*** (df = 4; 21)
## ===================================================================================================
## Note:                                                                   *p<0.1; **p<0.05; ***p<0.01
# Model graphs 
par(mfrow = c(1,2))
plot(x=predict(mod1),y=bd$IED_M,
     xlab='Predicted values',ylab='Observed values',
     main='Model 1')
abline(a=0,b=1,col="yellow")

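# Model 3 predicts log(IED_M), so the observed values are also plotted on the log scale
plot(x=predict(mod3),y=log(bd$IED_M),
     xlab='Predicted values (log scale)',ylab='Observed values (log scale)',
     main='Model 3')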
abline(a=0,b=1,col="yellow")

# Akaike information criterion (AIC) for each linear regression model; lower is better
# Model 1 (response in levels, so its AIC is not directly comparable with the log models)
AIC(mod1)
## [1] 677.9295
# Model 2
AIC(mod2)
## [1] -2.283171
# Model 3
AIC(mod3)
## [1] -2.285823
# Model 4 (same fit as Model 2)
AIC(mod4)
## [1] -2.283171
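The AIC of Model 1 is computed for IED_M in levels, while the other models use log(IED_M), so the raw values are on different likelihood scales. One common way to make them comparable is to add the Jacobian term 2*sum(log(y)) to the AIC of the log model, which expresses its likelihood on the scale of y. The sketch below illustrates this on simulated data; x and y are hypothetical, and this adjustment was not applied in the results above.

```r
# Minimal sketch on simulated data: comparing AIC across a levels and a log response
set.seed(7)
x <- runif(26, 1, 5)
y <- exp(1 + 0.5 * x + rnorm(26, sd = 0.2))

fit_level <- lm(y ~ x)
fit_log <- lm(log(y) ~ x)

aic_level <- AIC(fit_level)
aic_log_raw <- AIC(fit_log)

# Density of y equals density of log(y) divided by y, so
# logLik on the y scale = logLik on the log scale - sum(log(y))
aic_log_on_y_scale <- aic_log_raw + 2 * sum(log(y))

# aic_level and aic_log_on_y_scale can now be compared; the lower value is preferred
```

Models fitted to different numbers of observations (25 for Model 2, 26 for Model 3) are still not strictly comparable by AIC, so small differences should be read with caution.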

Model selection

Select the regression model that best fits the data. Please consider diagnostic tests in selecting the model.

#Diagnostic tests
# Model 1:
vif(mod1)
##       Exportaciones_m             Educacion Inseguridad_Homicidio 
##             62.709115              3.278016              3.462635 
##        Salario_Diario        Tipo_de_Cambio     crisis_financiera 
##              6.342684             43.235405              1.421025
bptest(mod1)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod1
## BP = 5.8766, df = 6, p-value = 0.4372
AIC(mod1)
## [1] 677.9295
histogram(mod1$residuals)

# Model 2:
vif(mod2)
##            log(lag(IED_M))       log(Exportaciones_m) 
##                   3.467282                  50.247929 
##             log(Educacion) log(Inseguridad_Homicidio) 
##                   4.416202                   3.834936 
##        log(Salario_Diario)        log(Tipo_de_Cambio) 
##                   9.422846                  34.018824 
##          crisis_financiera 
##                   1.311019
bptest(mod2)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod2
## BP = 8.1307, df = 7, p-value = 0.3212
AIC(mod2)
## [1] -2.283171
histogram(mod2$residuals)

# Model 3:
vif(mod3)
##             log(Educacion) log(Inseguridad_Homicidio) 
##                   2.333886                   2.124843 
##        log(Salario_Diario)          crisis_financiera 
##                   2.508881                   1.025598
bptest(mod3)
## 
##  studentized Breusch-Pagan test
## 
## data:  mod3
## BP = 3.1743, df = 4, p-value = 0.5291
AIC(mod3)
## [1] -2.285823
histogram(mod3$residuals)

# VIF
vif(mod1)
##       Exportaciones_m             Educacion Inseguridad_Homicidio 
##             62.709115              3.278016              3.462635 
##        Salario_Diario        Tipo_de_Cambio     crisis_financiera 
##              6.342684             43.235405              1.421025
vif(mod2)
##            log(lag(IED_M))       log(Exportaciones_m) 
##                   3.467282                  50.247929 
##             log(Educacion) log(Inseguridad_Homicidio) 
##                   4.416202                   3.834936 
##        log(Salario_Diario)        log(Tipo_de_Cambio) 
##                   9.422846                  34.018824 
##          crisis_financiera 
##                   1.311019
vif(mod3)
##             log(Educacion) log(Inseguridad_Homicidio) 
##                   2.333886                   2.124843 
##        log(Salario_Diario)          crisis_financiera 
##                   2.508881                   1.025598
vif(mod4)
##            log(lag(IED_M))       log(Exportaciones_m) 
##                   3.467282                  50.247929 
##           log(Educacion^2) log(Inseguridad_Homicidio) 
##                   4.416202                   3.834936 
##        log(Salario_Diario)        log(Tipo_de_Cambio) 
##                   9.422846                  34.018824 
##          crisis_financiera 
##                   1.311019
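The VIF values above can be reproduced by hand: for each predictor, regress it on the remaining predictors and compute 1 / (1 - R^2). The sketch below does this in base R on simulated data; x1, x2 and x3 are hypothetical, with x2 built to be strongly correlated with x1.

```r
# Minimal sketch on simulated data: VIF_j = 1 / (1 - R^2_j)
set.seed(3)
n <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.2)  # strongly correlated with x1
x3 <- rnorm(n)                 # unrelated to the others
X <- data.frame(x1, x2, x3)

vif_manual <- sapply(names(X), function(v) {
  # Regress predictor v on all the other predictors
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

vif_manual
```

In this construction x1 and x2 receive large VIFs while x3 stays near 1, which mirrors the pattern in Models 1 and 2, where the export and exchange rate variables inflate each other.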

Show the predicted values of the dependent variable (e.g., effects plot)

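# Model 3 predicts log(IED_M), so the observed values are also plotted on the log scale
plot(x=predict(mod3),y=log(bd$IED_M),
     xlab='Predicted values (log scale)',ylab='Observed values (log scale)',
     main='Model 3')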

Part 4.1

EXTRA LASSO

set.seed(123)                                
training.samples<-bd$IED_M %>%
  createDataPartition(p=0.75,list=FALSE)       

train.data<-bd[training.samples, ]   
test.data<-bd[-training.samples, ]

selected_model = lm(log(IED_M) ~ log(lag(IED_M)) + log(Exportaciones_m)+ log(Educacion^2)+log(Inseguridad_Homicidio)+log(Salario_Diario)+log(Tipo_de_Cambio)+crisis_financiera, data=bd) 
summary(selected_model)
## 
## Call:
## lm(formula = log(IED_M) ~ log(lag(IED_M)) + log(Exportaciones_m) + 
##     log(Educacion^2) + log(Inseguridad_Homicidio) + log(Salario_Diario) + 
##     log(Tipo_de_Cambio) + crisis_financiera, data = bd)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.26148 -0.15268  0.02053  0.09231  0.35313 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)   
## (Intercept)                 18.2555     5.8840   3.103  0.00647 **
## log(lag(IED_M))             -0.3087     0.2307  -1.338  0.19858   
## log(Exportaciones_m)        -1.1575     0.6432  -1.799  0.08972 . 
## log(Educacion^2)             2.0423     0.5512   3.705  0.00176 **
## log(Inseguridad_Homicidio)  -0.3817     0.1815  -2.103  0.05069 . 
## log(Salario_Diario)          0.3481     0.2690   1.294  0.21291   
## log(Tipo_de_Cambio)          1.8107     0.8306   2.180  0.04361 * 
## crisis_financiera           -0.2093     0.1651  -1.268  0.22195   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1956 on 17 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7157, Adjusted R-squared:  0.5986 
## F-statistic: 6.113 on 7 and 17 DF,  p-value: 0.001103
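# Note: the fitted values are on the log scale and cover the full sample, while test.data$IED_M is in
# levels and shorter, hence the warning; this figure is not a valid out-of-sample RMSE
RMSE(selected_model$fitted.values,test.data$IED_M)
## Warning in pred - obs: longer object length is not a multiple of shorter
## object length
## [1] 535321.2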
#mod3 <- lm(log(IED_M) ~log(lag(IED_M)) + log(Exportaciones_m)+ log(Educacion^2)+log(Inseguridad_Homicidio)+log(Salario_Diario)+log(Tipo_de_Cambio)+crisis_financiera, data = bd)
#summary(mod3)
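A holdout RMSE is only meaningful when the model is fitted on the training rows, predictions are made for the test rows, and both sides of the comparison are on the same scale. The sketch below shows one way to do this in base R on simulated data; the data frame d and its columns x and y are hypothetical, and sample() stands in for createDataPartition().

```r
# Minimal sketch on simulated data: out-of-sample RMSE for a log-log model
set.seed(123)
n <- 40
d <- data.frame(x = runif(n, 1, 10))
d$y <- exp(0.5 + 0.8 * log(d$x) + rnorm(n, sd = 0.2))

# 75% / 25% split
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- d[train_idx, ]
test <- d[-train_idx, ]

# Fit on the training rows only
fit <- lm(log(y) ~ log(x), data = train)

# Predict for the test rows (predictions are on the log scale)
pred_log <- predict(fit, newdata = test)

# RMSE on the log scale, and on the original scale after back-transforming
rmse_log <- sqrt(mean((pred_log - log(test$y))^2))
rmse_level <- sqrt(mean((exp(pred_log) - test$y)^2))
```

Back-transforming with exp() gives a prediction closer to the conditional median than the mean, so the level-scale RMSE is a rough guide rather than an exact measure.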

x = model.matrix(log(IED_M) ~ log(lag(IED_M)) + log(Exportaciones_m)+ log(Educacion^2)+log(Inseguridad_Homicidio)+log(Salario_Diario)+log(Tipo_de_Cambio)+crisis_financiera, train.data)[,-1]
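# Note: [-1] drops the first row, which model.matrix omits because lag() is NA there.
# y is in levels (IED_M) although the formula's response is log(IED_M), so the Lasso coefficients below are on the level scale
y = train.data$IED_M[-1]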


set.seed(123) 
cv.lasso<-cv.glmnet(x,y,alpha=1)
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
cv.lasso$lambda.min 
## [1] 1122.986
lassomodel<-glmnet(x,y,alpha=1,lambda=cv.lasso$lambda.min)

coef(lassomodel)
## 8 x 1 sparse Matrix of class "dgCMatrix"
##                                    s0
## (Intercept)                1653473.85
## log(lag(IED_M))             -60727.67
## log(Exportaciones_m)       -312043.98
## log(Educacion^2)            447115.13
## log(Inseguridad_Homicidio) -143672.53
## log(Salario_Diario)         -25170.11
## log(Tipo_de_Cambio)         864080.72
## crisis_financiera           -80593.82
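# Build the test matrix with the same predictors that were used to fit the Lasso
x.test<-model.matrix(log(IED_M) ~ log(lag(IED_M)) + log(Exportaciones_m)+ log(Educacion^2)+log(Inseguridad_Homicidio)+log(Salario_Diario)+log(Tipo_de_Cambio)+crisis_financiera,test.data)[,-1]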

#lassopredictions <- lassomodel %>% predict(x.test) %>% as.vector()

#data.frame(
 # RMSE = RMSE(lassopredictions, test.data$IED_M),
 # Rsquare = R2(lassopredictions, test.data$IED_M))

# Lasso model graph
lbs_fun <- function(fit, offset_x=1, ...) {
  L <- length(fit$lambda)
  x <- log(fit$lambda[L])+ offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels=labs, ...)
}

lasso<-glmnet(scale(x),y,alpha=1)

plot(lasso,xvar="lambda",label=T)
lbs_fun(lasso)
# The x-axis of the plot is log(lambda), so the reference lines are drawn on that scale
abline(v=log(cv.lasso$lambda.min),col="red",lty=2)
abline(v=log(cv.lasso$lambda.1se),col="blue",lty=2)

As we can see, the signs of the Lasso coefficients largely agree with those of my winning model, Model 3. The exception is log(Salario_Diario), which is negative here and positive in Model 3. This supports the choice of Model 3 as the most appropriate of my models.

A Lasso regression estimates a linear regression model with an L1 penalty on the coefficients, for one dependent variable and one or more independent variables. Implementations typically offer trace plots of the coefficients and select the penalty hyperparameter (lambda in glmnet) by cross-validation.

Lasso regression generates “sparse coefficients”: coefficient vectors in which many entries are exactly zero. This means that the model ignores some of the predictive features, which can be considered a type of automatic feature selection. A model with fewer features is simpler to interpret and can reveal the most important features of the dataset. When predictive features are correlated with each other, Lasso tends to keep one of them, somewhat arbitrarily, and shrink the others to zero.
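The way the L1 penalty produces exact zeros can be seen in the simplest case. Assuming orthonormal predictors and the objective (1/2)*RSS + lambda*sum(|beta|), the Lasso solution is the OLS coefficient pulled towards zero by lambda and set to zero once it crosses it (soft-thresholding). The sketch below applies this rule to a hypothetical coefficient vector; the numbers and names are illustrative only, and glmnet scales its objective differently.

```r
# Minimal sketch: soft-thresholding, the Lasso solution under orthonormal predictors
soft_threshold <- function(b, lambda) sign(b) * pmax(abs(b) - lambda, 0)

# Hypothetical OLS coefficients (illustrative values, not the report's estimates)
ols <- c(educ = 2.9, homicide = -0.3, wage = 0.2, crisis = -0.1)

# A small penalty shrinks every coefficient but keeps all of them
lasso_small <- soft_threshold(ols, lambda = 0.05)

# A larger penalty sets the weakest coefficients exactly to zero
lasso_large <- soft_threshold(ols, lambda = 0.25)

n_zero_small <- sum(lasso_small == 0)
n_zero_large <- sum(lasso_large == 0)
```

With the small penalty no coefficient is dropped, which is also what happens in the fit above at lambda.min: all seven predictors keep non-zero coefficients.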

Part 4.2

EXTRA DETECT AUTOCORRELATION (JUST A GRAPH)

ts_data <- ts(bd)
acf(ts_data)

Box.test(mod3$residuals,lag=5,type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  mod3$residuals
## X-squared = 4.3962, df = 5, p-value = 0.4939
Box.test(mod1$residuals,lag=5,type="Ljung-Box")
## 
##  Box-Ljung test
## 
## data:  mod1$residuals
## X-squared = 5.5708, df = 5, p-value = 0.3502
tsmodel1_res<-ts(mod3$residuals,start=1997,end=2020,frequency=1) 
tsmodel2_res<-ts(mod1$residuals,start=1997,end=2020,frequency=1)
### Detect serial autocorrelation in acf plots
acf(tsmodel1_res)

acf(tsmodel2_res)

#The Ljung-Box test checks for autocorrelation in the residuals of a time series model. The lag argument sets the number of lags considered; here the test checks for autocorrelation up to the 5th lag.

#Model 3
# The p-value of 0.4939 is greater than the conventional significance level of 0.05, so I fail to reject the null hypothesis. There is no evidence of significant autocorrelation in the residuals of the model.

#Model 1
# The p-value of 0.3502 is greater than 0.05, so I fail to reject the null hypothesis. There is no evidence of significant autocorrelation in the residuals of this model.

# Based on the Ljung-Box test, there is no significant evidence of autocorrelation in the residuals. This is a positive sign: it suggests that the models capture the underlying patterns in the data without leaving systematic structure (autocorrelation) in the residuals.
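The Ljung-Box statistic can be computed by hand as Q = n(n+2) * sum over k = 1..h of r_k^2 / (n - k), where r_k is the lag-k sample autocorrelation, and compared with a chi-squared distribution with h degrees of freedom. The sketch below checks this against Box.test() on simulated residuals; res is a hypothetical white-noise series, not the residuals of the models above.

```r
# Minimal sketch on simulated residuals: Ljung-Box statistic by hand
set.seed(10)
res <- rnorm(26)
n <- length(res)
h <- 5

# Sample autocorrelations at lags 1..h (the first element is lag 0, so drop it)
r <- acf(res, lag.max = h, plot = FALSE)$acf[-1]

# Q = n(n+2) * sum(r_k^2 / (n - k)), referred to a chi-squared with h df
Q <- n * (n + 2) * sum(r^2 / (n - seq_len(h)))
p_manual <- pchisq(Q, df = h, lower.tail = FALSE)

# Built-in test for comparison
bt <- Box.test(res, lag = h, type = "Ljung-Box")
```

A large p-value means the first h autocorrelations are jointly indistinguishable from zero, which is the reading applied to Models 1 and 3 above.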

Part 5

Conclusions. General interpretations (VIF, multicollinearity and insights). The winning model in my case was Model 3, for the following reasons. It has the smallest VIF (Variance Inflation Factor) values overall, all below 3, which indicates very little multicollinearity and therefore more stable coefficient estimates. By contrast, Models 1 and 2 show VIFs above 30 for the export and exchange rate variables.

As we have seen in class, multicollinearity exists when two or more independent variables in a multiple regression model are correlated with each other. To obtain the most reliable model, I eliminated variables until none had a VIF greater than 10, the threshold usually taken to indicate problematic multicollinearity. The variables with the most impact on our dependent variable are Educacion and Inseguridad_Homicidio.

Insights:

  • A 1-unit increase in the logarithm of “Educacion” is associated with an estimated increase of 2.9459 in log(IED_M), holding the other variables constant; equivalently, a 1% increase in Educacion is associated with an increase of roughly 3% in IED_M. This is statistically significant at the 1% level (p = 0.00101).

  • A 1-unit increase in the logarithm of “Inseguridad_Homicidio” is associated with a decrease of 0.2959 in log(IED_M), holding the other variables constant. This is also statistically significant at the 5% level (p = 0.0484).

  • Tipo_de_Cambio is not part of Model 3. In Model 1 it is not statistically significant (p = 0.2749). In Model 2 it shows a positive association that is significant at the 5% level (p = 0.0436), but its VIF of about 34 makes that estimate unreliable.

  • The adjusted R-squared of Model 3 is 0.5847, close to that of Model 2 (0.5986) and above that of Model 1 (0.5589), and it is achieved with fewer predictors and without multicollinearity. Educacion and Inseguridad_Homicidio are significant predictors in the model. Salario_Diario is significant only at the 10% level (p = 0.0968), and crisis_financiera is not statistically significant (p = 0.4323) for the dependent variable IED_M.
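Because Model 3 is a log-log specification, its coefficients can be read as elasticities, and the coefficient on the crisis dummy as an approximate percentage shift. The arithmetic is sketched below using the estimates printed in the Model 3 summary (2.9459 for log(Educacion) and -0.1221 for crisis_financiera); the helper names pct_change and dummy_effect are hypothetical.

```r
# Minimal sketch: converting log-model coefficients into percentage effects
beta_educ <- 2.9459     # log(Educacion) coefficient from the Model 3 summary
beta_crisis <- -0.1221  # crisis_financiera coefficient from the Model 3 summary

# Percentage change in the response for a pct% change in a logged predictor
pct_change <- function(beta, pct) ((1 + pct / 100)^beta - 1) * 100

# Percentage change in the response when a 0/1 dummy switches from 0 to 1
dummy_effect <- function(beta) (exp(beta) - 1) * 100

educ_1pct <- pct_change(beta_educ, 1)     # close to 3 percent
crisis_shift <- dummy_effect(beta_crisis) # about an 11 percent decrease
```

The crisis figure is shown only to illustrate the conversion; the coefficient is not statistically significant, so the shift should not be read as an established effect.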

References

Vargas C. (2023, March 29). Nearshoring, la nueva frontera de México. https://egade.tec.mx/es/egade-ideas/opinion/nearshoring-la-nueva-frontera-de-mexico

Garcia S. (2023, August 16). Invita Samuel García a capitalizar ventajas que ofrece “Near Nuevo León”. https://www.nl.gob.mx/boletines-comunicados-y-avisos/invita-samuel-garcia-capitalizar-ventajas-que-ofrece-el-near-nuevo

GutCheck. (2017, December 5). Predictive analytics and regression models explained. https://gutcheckit.com/blog/predictive-analytics-regression-models-explained/

IBM. (2022, May 20). What is predictive analytics? https://www.ibm.com/topics/predictive-analytics

Osvarauld T. (2023, February 10). Predictive analytics. https://www.cio.com/article/228901/what-is-predictive-analytics-transforming-data-into-future-insights.html

