Part 1 - Background

Briefly describe what is “Nearshoring”. In doing so, please also explain why Mexico might be attractive to “Nearshoring”.

Nearshoring is the relocation of business operations as part of a restructuring of the supply chain. The relocation shortens the supply chain by moving production centers closer to the target market, which reduces both production costs and risks for the company.

Mexico offers several advantages that might attract nearshoring investment projects. One of the greatest is its proximity to the United States market, since, as mentioned above, companies seek to shorten the distance between their production centers and their customers. Another is the difference in wages and labor costs between Mexico and other countries, which leads some United States corporations to relocate to Mexico to reduce their costs. According to a report by the Reuters agency, US manufacturers pay workers in Mexican factories between $8 and $10 USD per hour, whereas General Motors pays up to $68 USD per hour to workers in its Detroit factory (BBC World, 2017).

Briefly describe what is Predictive Analytics. In doing so, please also explain what the use is of regression analysis in predictive analytics.

Predictive Analytics is the practice of using historical data, statistical algorithms, and machine learning techniques to forecast future events or outcomes. It consists of analyzing past patterns in data in order to make informed predictions about future behaviors or events. This helps businesses and organizations make better-informed decisions.

A fundamental method within predictive analytics is regression analysis, a statistical method used to estimate the relationship between a dependent variable and one or more independent variables. In predictive analytics, it provides a mathematical relationship between the variables, which can then be used to predict the dependent variable from known values of the independent variables.
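As a minimal sketch of this idea, the following R code fits a simple linear regression and uses it to predict new values. The data are simulated and the names `toy`, `x`, and `y` are hypothetical; nothing here comes from the nearshoring dataset.

```r
# Simulated data only: x and y are hypothetical, not taken from the nearshoring dataset
set.seed(42)
x <- seq(1, 20)
y <- 3 + 2 * x + rnorm(20, sd = 1)
toy <- data.frame(x = x, y = y)

# Fit a simple linear regression y = b0 + b1 * x by ordinary least squares
fit <- lm(y ~ x, data = toy)
coef(fit)

# Use the fitted relationship to predict y for values of x not seen before
new_points <- data.frame(x = c(21, 22))
predicted <- predict(fit, newdata = new_points)
predicted
```

The estimated slope summarizes how the dependent variable changes with the independent variable, and `predict()` applies that estimated relationship to new observations.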

Briefly explain how regression analysis can help us to predict the occurrence of “Nearshoring” for the Mexican case.

As mentioned before, regression analysis is a statistical method within predictive analytics that estimates the relationship between the independent variables and the dependent variable. This makes it useful for predicting the occurrence of “Nearshoring” in the Mexican case. The method will help us estimate which of the given independent variables (exchange rate, insecurity due to homicide, education, minimum wage, etc.) has the greatest effect on the attraction of “Nearshoring” investors to Mexico, that is, which of the socioeconomic, technological, ecological or security variables will matter most to investors when they consider relocating their companies to Mexico in the coming years.

Part 2 - Problem Situation

According to the document “Mexico and Its Attractiveness for Nearshoring”, what is the problem situation? How can the problem situation be addressed?

According to the document “Mexico and Its Attractiveness for Nearshoring”, the problem situation is to discover which conditions Mexico offers to attract Nearshoring, that is, which of the socioeconomic, environmental, security or technological conditions foreign investors weigh most heavily when relocating their companies to Mexico. To address it, the data must be analyzed and methods such as regression analysis must be applied in order to detect which of the variables has the greatest effect on foreign investment in Mexico.

Part 3 – Data and Methodology

# Here we import the libraries:

library(foreign)      # read data files stored by other statistical packages
library(dplyr)        # data manipulation 
library(forcats)      # to work with categorical variables
library(ggplot2)      # data visualization 
library(readr)        # read specific csv files
library(janitor)      # data exploration and cleaning 
library(Hmisc)        # several useful functions for data analysis 
library(psych)        # functions for multivariate analysis 
library(naniar)       # summaries and visualization of missing values NA's
library(dlookr)       # data diagnosis, exploration, and descriptive statistics
library(corrplot)     # correlation plots
library(jtools)       # presentation of regression analysis 
library(lmtest)       # diagnostic checks - linear regression analysis 
library(car)          # diagnostic checks - linear regression analysis
library(olsrr)        # diagnostic checks - linear regression analysis 
library(stargazer)    # create publication quality tables
library(effects)      # displays for linear and other regression models
library(tidyverse)    # collection of R packages designed for data science
library(caret)        # Classification and Regression Training 
library(glmnet)       # methods for prediction and plotting, and functions for cross-validation
library(readxl)       # Read excel files

Exploratory Data Analysis – EDA

# Here we load the dataset from Excel and select only the first sheet.

dataset <- read_excel("/Users/yessicaacosta/Downloads/SP_DataMexicoAtractiveness_alumn-VF_corrected.xlsx",sheet=1,range="A6:R32",na="-")

dataset
## # A tibble: 26 × 18
##      Año IED_Flujos IED_Flujos_MXN Exportaciones Exportaciones_MXN Empleo
##    <dbl>      <dbl>          <dbl>         <dbl>             <dbl>  <dbl>
##  1  1997     12146.        294298.         9088.           220201.   NA  
##  2  1998      8374.        210849.         9875.           248659.   NA  
##  3  1999     13960.        299834.        10990.           236039.   NA  
##  4  2000     18249.        362638.        12483.           248061.   97.8
##  5  2001     30057.        546448.        11300.           205445.   97.4
##  6  2002     24099.        468391.        11923.           231737.   97.7
##  7  2003     18250.        368747.        13156            265822.   97.1
##  8  2004     25016.        481300.        13573.           261147.   96.5
##  9  2005     25796.        458581.        16466.           292718.   97.2
## 10  2006     21233.        368329.        17486.           303335.   96.5
## # ℹ 16 more rows
## # ℹ 12 more variables: Educación <dbl>, Salario_Diario <dbl>, Innovación <dbl>,
## #   Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>, Tipo_de_Cambio <dbl>,
## #   Densidad_Carretera <dbl>, Densidad_Población <dbl>, CO2_Emisiones <dbl>,
## #   PIB_Per_Cápita <dbl>, INPC <dbl>, Crisis_Financiera <dbl>
# Which are the variables we are working with?

variables <- colnames(dataset)
variables
##  [1] "Año"                   "IED_Flujos"            "IED_Flujos_MXN"       
##  [4] "Exportaciones"         "Exportaciones_MXN"     "Empleo"               
##  [7] "Educación"             "Salario_Diario"        "Innovación"           
## [10] "Inseguridad_Robo"      "Inseguridad_Homicidio" "Tipo_de_Cambio"       
## [13] "Densidad_Carretera"    "Densidad_Población"    "CO2_Emisiones"        
## [16] "PIB_Per_Cápita"        "INPC"                  "Crisis_Financiera"
# Getting to know the data type of our variables

str(dataset) ## all of our variables are numeric
## tibble [26 × 18] (S3: tbl_df/tbl/data.frame)
##  $ Año                  : num [1:26] 1997 1998 1999 2000 2001 ...
##  $ IED_Flujos           : num [1:26] 12146 8374 13960 18249 30057 ...
##  $ IED_Flujos_MXN       : num [1:26] 294298 210849 299834 362638 546448 ...
##  $ Exportaciones        : num [1:26] 9088 9875 10990 12483 11300 ...
##  $ Exportaciones_MXN    : num [1:26] 220201 248659 236039 248061 205445 ...
##  $ Empleo               : num [1:26] NA NA NA 97.8 97.4 ...
##  $ Educación            : num [1:26] 7.2 7.31 7.43 7.56 7.68 ...
##  $ Salario_Diario       : num [1:26] 24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovación           : num [1:26] 11.3 11.4 12.5 13.1 13.5 ...
##  $ Inseguridad_Robo     : num [1:26] 267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num [1:26] 14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num [1:26] 8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num [1:26] 0.0521 0.053 0.055 0.0552 0.0565 ...
##  $ Densidad_Población   : num [1:26] 47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num [1:26] 3.68 3.85 3.69 3.87 3.81 ...
##  $ PIB_Per_Cápita       : num [1:26] 127570 126739 129165 130875 128083 ...
##  $ INPC                 : num [1:26] 33.3 39.5 44.3 48.3 50.4 ...
##  $ Crisis_Financiera    : num [1:26] 0 0 0 0 0 0 0 0 0 0 ...
# Here we can see the first values of our dataset
head(dataset)
## # A tibble: 6 × 18
##     Año IED_Flujos IED_Flujos_MXN Exportaciones Exportaciones_MXN Empleo
##   <dbl>      <dbl>          <dbl>         <dbl>             <dbl>  <dbl>
## 1  1997     12146.        294298.         9088.           220201.   NA  
## 2  1998      8374.        210849.         9875.           248659.   NA  
## 3  1999     13960.        299834.        10990.           236039.   NA  
## 4  2000     18249.        362638.        12483.           248061.   97.8
## 5  2001     30057.        546448.        11300.           205445.   97.4
## 6  2002     24099.        468391.        11923.           231737.   97.7
## # ℹ 12 more variables: Educación <dbl>, Salario_Diario <dbl>, Innovación <dbl>,
## #   Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>, Tipo_de_Cambio <dbl>,
## #   Densidad_Carretera <dbl>, Densidad_Población <dbl>, CO2_Emisiones <dbl>,
## #   PIB_Per_Cápita <dbl>, INPC <dbl>, Crisis_Financiera <dbl>

Descriptive Statistics

# Calculate descriptive statistics
statistics_dt <- summary(dataset)
statistics_dt
##       Año         IED_Flujos    IED_Flujos_MXN   Exportaciones  
##  Min.   :1997   Min.   : 8374   Min.   :210849   Min.   : 9088  
##  1st Qu.:2003   1st Qu.:21367   1st Qu.:368434   1st Qu.:13260  
##  Median :2010   Median :27698   Median :497116   Median :21188  
##  Mean   :2010   Mean   :26770   Mean   :493603   Mean   :23601  
##  3rd Qu.:2016   3rd Qu.:32183   3rd Qu.:578789   3rd Qu.:31601  
##  Max.   :2022   Max.   :48354   Max.   :754160   Max.   :46478  
##                                                                 
##  Exportaciones_MXN     Empleo        Educación     Salario_Diario  
##  Min.   :205445    Min.   :95.06   Min.   :7.198   Min.   : 24.30  
##  1st Qu.:262316    1st Qu.:95.90   1st Qu.:7.864   1st Qu.: 41.97  
##  Median :366363    Median :96.53   Median :8.457   Median : 54.48  
##  Mean   :433858    Mean   :96.47   Mean   :8.424   Mean   : 65.16  
##  3rd Qu.:632411    3rd Qu.:97.08   3rd Qu.:9.004   3rd Qu.: 72.31  
##  Max.   :785503    Max.   :97.83   Max.   :9.579   Max.   :172.87  
##                    NA's   :3       NA's   :3                       
##    Innovación    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio  
##  Min.   :11.28   Min.   :120.5    Min.   : 8.037        Min.   : 8.064  
##  1st Qu.:12.56   1st Qu.:148.3    1st Qu.:10.250        1st Qu.:10.752  
##  Median :13.09   Median :181.8    Median :16.928        Median :13.016  
##  Mean   :13.11   Mean   :185.4    Mean   :17.292        Mean   :13.910  
##  3rd Qu.:13.75   3rd Qu.:209.9    3rd Qu.:22.433        3rd Qu.:18.489  
##  Max.   :15.11   Max.   :314.8    Max.   :29.592        Max.   :20.664  
##  NA's   :2                        NA's   :1                             
##  Densidad_Carretera Densidad_Población CO2_Emisiones   PIB_Per_Cápita  
##  Min.   :0.05205    Min.   :47.44      Min.   :3.592   Min.   :126739  
##  1st Qu.:0.05954    1st Qu.:52.78      1st Qu.:3.832   1st Qu.:130964  
##  Median :0.06989    Median :58.09      Median :3.925   Median :136845  
##  Mean   :0.07106    Mean   :57.33      Mean   :3.946   Mean   :138550  
##  3rd Qu.:0.08275    3rd Qu.:61.39      3rd Qu.:4.106   3rd Qu.:146148  
##  Max.   :0.09020    Max.   :65.60      Max.   :4.221   Max.   :153236  
##                                        NA's   :3                       
##       INPC        Crisis_Financiera
##  Min.   : 33.28   Min.   :0.00000  
##  1st Qu.: 56.15   1st Qu.:0.00000  
##  Median : 73.35   Median :0.00000  
##  Mean   : 75.17   Mean   :0.07692  
##  3rd Qu.: 91.29   3rd Qu.:0.00000  
##  Max.   :126.48   Max.   :1.00000  
## 
#Identify missing values in our dataset:
is.na(dataset)
##         Año IED_Flujos IED_Flujos_MXN Exportaciones Exportaciones_MXN Empleo
##  [1,] FALSE      FALSE          FALSE         FALSE             FALSE   TRUE
##  [2,] FALSE      FALSE          FALSE         FALSE             FALSE   TRUE
##  [3,] FALSE      FALSE          FALSE         FALSE             FALSE   TRUE
##  [4,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
##  [5,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
##  [6,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
##  [7,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
##  [8,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
##  [9,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [10,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [11,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [12,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [13,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [14,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [15,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [16,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [17,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [18,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [19,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [20,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [21,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [22,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [23,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [24,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [25,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
## [26,] FALSE      FALSE          FALSE         FALSE             FALSE  FALSE
##       Educación Salario_Diario Innovación Inseguridad_Robo
##  [1,]     FALSE          FALSE      FALSE            FALSE
##  [2,]     FALSE          FALSE      FALSE            FALSE
##  [3,]     FALSE          FALSE      FALSE            FALSE
##  [4,]     FALSE          FALSE      FALSE            FALSE
##  [5,]     FALSE          FALSE      FALSE            FALSE
##  [6,]     FALSE          FALSE      FALSE            FALSE
##  [7,]     FALSE          FALSE      FALSE            FALSE
##  [8,]     FALSE          FALSE      FALSE            FALSE
##  [9,]     FALSE          FALSE      FALSE            FALSE
## [10,]     FALSE          FALSE      FALSE            FALSE
## [11,]     FALSE          FALSE      FALSE            FALSE
## [12,]     FALSE          FALSE      FALSE            FALSE
## [13,]     FALSE          FALSE      FALSE            FALSE
## [14,]     FALSE          FALSE      FALSE            FALSE
## [15,]     FALSE          FALSE      FALSE            FALSE
## [16,]     FALSE          FALSE      FALSE            FALSE
## [17,]     FALSE          FALSE      FALSE            FALSE
## [18,]     FALSE          FALSE      FALSE            FALSE
## [19,]     FALSE          FALSE      FALSE            FALSE
## [20,]     FALSE          FALSE      FALSE            FALSE
## [21,]     FALSE          FALSE      FALSE            FALSE
## [22,]     FALSE          FALSE      FALSE            FALSE
## [23,]     FALSE          FALSE      FALSE            FALSE
## [24,]      TRUE          FALSE      FALSE            FALSE
## [25,]      TRUE          FALSE       TRUE            FALSE
## [26,]      TRUE          FALSE       TRUE            FALSE
##       Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera
##  [1,]                 FALSE          FALSE              FALSE
##  [2,]                 FALSE          FALSE              FALSE
##  [3,]                 FALSE          FALSE              FALSE
##  [4,]                 FALSE          FALSE              FALSE
##  [5,]                 FALSE          FALSE              FALSE
##  [6,]                 FALSE          FALSE              FALSE
##  [7,]                 FALSE          FALSE              FALSE
##  [8,]                 FALSE          FALSE              FALSE
##  [9,]                 FALSE          FALSE              FALSE
## [10,]                 FALSE          FALSE              FALSE
## [11,]                 FALSE          FALSE              FALSE
## [12,]                 FALSE          FALSE              FALSE
## [13,]                 FALSE          FALSE              FALSE
## [14,]                 FALSE          FALSE              FALSE
## [15,]                 FALSE          FALSE              FALSE
## [16,]                 FALSE          FALSE              FALSE
## [17,]                 FALSE          FALSE              FALSE
## [18,]                 FALSE          FALSE              FALSE
## [19,]                 FALSE          FALSE              FALSE
## [20,]                 FALSE          FALSE              FALSE
## [21,]                 FALSE          FALSE              FALSE
## [22,]                 FALSE          FALSE              FALSE
## [23,]                 FALSE          FALSE              FALSE
## [24,]                 FALSE          FALSE              FALSE
## [25,]                 FALSE          FALSE              FALSE
## [26,]                  TRUE          FALSE              FALSE
##       Densidad_Población CO2_Emisiones PIB_Per_Cápita  INPC Crisis_Financiera
##  [1,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [2,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [3,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [4,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [5,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [6,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [7,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [8,]              FALSE         FALSE          FALSE FALSE             FALSE
##  [9,]              FALSE         FALSE          FALSE FALSE             FALSE
## [10,]              FALSE         FALSE          FALSE FALSE             FALSE
## [11,]              FALSE         FALSE          FALSE FALSE             FALSE
## [12,]              FALSE         FALSE          FALSE FALSE             FALSE
## [13,]              FALSE         FALSE          FALSE FALSE             FALSE
## [14,]              FALSE         FALSE          FALSE FALSE             FALSE
## [15,]              FALSE         FALSE          FALSE FALSE             FALSE
## [16,]              FALSE         FALSE          FALSE FALSE             FALSE
## [17,]              FALSE         FALSE          FALSE FALSE             FALSE
## [18,]              FALSE         FALSE          FALSE FALSE             FALSE
## [19,]              FALSE         FALSE          FALSE FALSE             FALSE
## [20,]              FALSE         FALSE          FALSE FALSE             FALSE
## [21,]              FALSE         FALSE          FALSE FALSE             FALSE
## [22,]              FALSE         FALSE          FALSE FALSE             FALSE
## [23,]              FALSE         FALSE          FALSE FALSE             FALSE
## [24,]              FALSE          TRUE          FALSE FALSE             FALSE
## [25,]              FALSE          TRUE          FALSE FALSE             FALSE
## [26,]              FALSE          TRUE          FALSE FALSE             FALSE
# Counting the missing values in the dataset
sum(is.na(dataset))  ### 12 missing values are identified
## [1] 12
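The full logical matrix printed by `is.na()` is hard to scan. As a minimal sketch, the missing values can instead be counted per column with `colSums()`. The data frame `toy` and its values below are hypothetical; on the real data the same call would be `colSums(is.na(dataset))`, and its total should agree with the 12 missing values counted above.

```r
# Toy data frame (hypothetical values) with a few missing entries
toy <- data.frame(
  year   = 2018:2022,
  employ = c(NA, 96.5, 97.1, NA, 96.8),
  co2    = c(3.9, 4.0, NA, 4.1, 3.8)
)

# Number of missing values in each column
na_per_column <- colSums(is.na(toy))
na_per_column

# Total number of missing values, equivalent to sum(is.na(toy))
total_na <- sum(na_per_column)
total_na
```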
# Visual representation of missing values in the dataset
gg_miss_var(dataset) 

# Replacing missing values with the median of the corresponding variable
# Median imputation keeps all 26 observations, but it also reduces the variation of the imputed variables

dataset <- dataset %>%
  mutate_all(~ifelse(is.na(.),median(., na.rm = TRUE),.))
print(dataset)
## # A tibble: 26 × 18
##      Año IED_Flujos IED_Flujos_MXN Exportaciones Exportaciones_MXN Empleo
##    <dbl>      <dbl>          <dbl>         <dbl>             <dbl>  <dbl>
##  1  1997     12146.        294298.         9088.           220201.   96.5
##  2  1998      8374.        210849.         9875.           248659.   96.5
##  3  1999     13960.        299834.        10990.           236039.   96.5
##  4  2000     18249.        362638.        12483.           248061.   97.8
##  5  2001     30057.        546448.        11300.           205445.   97.4
##  6  2002     24099.        468391.        11923.           231737.   97.7
##  7  2003     18250.        368747.        13156            265822.   97.1
##  8  2004     25016.        481300.        13573.           261147.   96.5
##  9  2005     25796.        458581.        16466.           292718.   97.2
## 10  2006     21233.        368329.        17486.           303335.   96.5
## # ℹ 16 more rows
## # ℹ 12 more variables: Educación <dbl>, Salario_Diario <dbl>, Innovación <dbl>,
## #   Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>, Tipo_de_Cambio <dbl>,
## #   Densidad_Carretera <dbl>, Densidad_Población <dbl>, CO2_Emisiones <dbl>,
## #   PIB_Per_Cápita <dbl>, INPC <dbl>, Crisis_Financiera <dbl>
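The effect of median imputation on the spread of a variable can be illustrated with a small sketch in base R. The data frame `toy` and the helper `impute_median()` are hypothetical names introduced only for this example; the values are made up.

```r
# Toy data (hypothetical values) with missing entries
toy <- data.frame(
  employ = c(NA, 96.5, 97.1, NA, 96.8),
  co2    = c(3.9, 4.0, NA, 4.1, 3.8)
)

# Hypothetical helper: replace the NAs of a numeric vector with its median
impute_median <- function(v) {
  v[is.na(v)] <- median(v, na.rm = TRUE)
  v
}

# Apply the helper to every column
toy_imputed <- as.data.frame(lapply(toy, impute_median))
toy_imputed

# Standard deviation before (ignoring NAs) and after imputation
sd_before <- sapply(toy, sd, na.rm = TRUE)
sd_after  <- sapply(toy_imputed, sd)
rbind(sd_before, sd_after)
```

In this toy case the standard deviation of each column shrinks after imputation, because every imputed value sits at the center of the observed values.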
# Transformation of variables
# The descriptive statistics show that the minimum and maximum of "Exportaciones_MXN" are far apart, which skews its distribution (shown in the histograms below)
# Replacing the maximum and minimum values with the median pulls in both tails of the distribution
 
dataset$Exportaciones_MXN[dataset$Exportaciones_MXN == max(dataset$Exportaciones_MXN)] <- median(dataset$Exportaciones_MXN)
dataset$Exportaciones_MXN[dataset$Exportaciones_MXN == min(dataset$Exportaciones_MXN)] <- median(dataset$Exportaciones_MXN)

dataset
## # A tibble: 26 × 18
##      Año IED_Flujos IED_Flujos_MXN Exportaciones Exportaciones_MXN Empleo
##    <dbl>      <dbl>          <dbl>         <dbl>             <dbl>  <dbl>
##  1  1997     12146.        294298.         9088.           220201.   96.5
##  2  1998      8374.        210849.         9875.           248659.   96.5
##  3  1999     13960.        299834.        10990.           236039.   96.5
##  4  2000     18249.        362638.        12483.           248061.   97.8
##  5  2001     30057.        546448.        11300.           362218.   97.4
##  6  2002     24099.        468391.        11923.           231737.   97.7
##  7  2003     18250.        368747.        13156            265822.   97.1
##  8  2004     25016.        481300.        13573.           261147.   96.5
##  9  2005     25796.        458581.        16466.           292718.   97.2
## 10  2006     21233.        368329.        17486.           303335.   96.5
## # ℹ 16 more rows
## # ℹ 12 more variables: Educación <dbl>, Salario_Diario <dbl>, Innovación <dbl>,
## #   Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>, Tipo_de_Cambio <dbl>,
## #   Densidad_Carretera <dbl>, Densidad_Población <dbl>, CO2_Emisiones <dbl>,
## #   PIB_Per_Cápita <dbl>, INPC <dbl>, Crisis_Financiera <dbl>
# Showing statistics of our dataset with complete values

summary(dataset)
##       Año         IED_Flujos    IED_Flujos_MXN   Exportaciones  
##  Min.   :1997   Min.   : 8374   Min.   :210849   Min.   : 9088  
##  1st Qu.:2003   1st Qu.:21367   1st Qu.:368434   1st Qu.:13260  
##  Median :2010   Median :27698   Median :497116   Median :21188  
##  Mean   :2010   Mean   :26770   Mean   :493603   Mean   :23601  
##  3rd Qu.:2016   3rd Qu.:32183   3rd Qu.:578789   3rd Qu.:31601  
##  Max.   :2022   Max.   :48354   Max.   :754160   Max.   :46478  
##  Exportaciones_MXN     Empleo        Educación     Salario_Diario  
##  Min.   :220201    Min.   :95.06   Min.   :7.198   Min.   : 24.30  
##  1st Qu.:272546    1st Qu.:96.09   1st Qu.:7.955   1st Qu.: 41.97  
##  Median :364291    Median :96.53   Median :8.457   Median : 54.48  
##  Mean   :423767    Mean   :96.48   Mean   :8.428   Mean   : 65.16  
##  3rd Qu.:571383    3rd Qu.:97.01   3rd Qu.:8.929   3rd Qu.: 72.31  
##  Max.   :749408    Max.   :97.83   Max.   :9.579   Max.   :172.87  
##    Innovación    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio  
##  Min.   :11.28   Min.   :120.5    Min.   : 8.037        Min.   : 8.064  
##  1st Qu.:12.60   1st Qu.:148.3    1st Qu.:10.402        1st Qu.:10.752  
##  Median :13.09   Median :181.8    Median :16.928        Median :13.016  
##  Mean   :13.10   Mean   :185.4    Mean   :17.278        Mean   :13.910  
##  3rd Qu.:13.60   3rd Qu.:209.9    3rd Qu.:22.346        3rd Qu.:18.489  
##  Max.   :15.11   Max.   :314.8    Max.   :29.592        Max.   :20.664  
##  Densidad_Carretera Densidad_Población CO2_Emisiones   PIB_Per_Cápita  
##  Min.   :0.05205    Min.   :47.44      Min.   :3.592   Min.   :126739  
##  1st Qu.:0.05954    1st Qu.:52.78      1st Qu.:3.843   1st Qu.:130964  
##  Median :0.06989    Median :58.09      Median :3.925   Median :136845  
##  Mean   :0.07106    Mean   :57.33      Mean   :3.944   Mean   :138550  
##  3rd Qu.:0.08275    3rd Qu.:61.39      3rd Qu.:4.088   3rd Qu.:146148  
##  Max.   :0.09020    Max.   :65.60      Max.   :4.221   Max.   :153236  
##       INPC        Crisis_Financiera
##  Min.   : 33.28   Min.   :0.00000  
##  1st Qu.: 56.15   1st Qu.:0.00000  
##  Median : 73.35   Median :0.00000  
##  Mean   : 75.17   Mean   :0.07692  
##  3rd Qu.: 91.29   3rd Qu.:0.00000  
##  Max.   :126.48   Max.   :1.00000
# Measures of Dispersion

describe(dataset)
## # A tibble: 18 × 26
##    described_variables       n    na       mean      sd se_mean     IQR skewness
##    <chr>                 <int> <int>      <dbl>   <dbl>   <dbl>   <dbl>    <dbl>
##  1 Año                      26     0    2.01e+3 7.65e+0 1.5 e+0 1.25e+1   0     
##  2 IED_Flujos               26     0    2.68e+4 8.77e+3 1.72e+3 1.08e+4  -0.0209
##  3 IED_Flujos_MXN           26     0    4.94e+5 1.44e+5 2.82e+4 2.10e+5  -0.0125
##  4 Exportaciones            26     0    2.36e+4 1.13e+4 2.23e+3 1.83e+4   0.514 
##  5 Exportaciones_MXN        26     0    4.24e+5 1.77e+5 3.47e+4 2.99e+5   0.665 
##  6 Empleo                   26     0    9.65e+1 7.21e-1 1.41e-1 9.22e-1  -0.191 
##  7 Educación                26     0    8.43e+0 6.76e-1 1.33e-1 9.74e-1  -0.130 
##  8 Salario_Diario           26     0    6.52e+1 3.58e+1 7.03e+0 3.03e+1   1.62  
##  9 Innovación               26     0    1.31e+1 1.07e+0 2.10e-1 1.01e+0   0.146 
## 10 Inseguridad_Robo         26     0    1.85e+2 4.77e+1 9.35e+0 6.16e+1   1.00  
## 11 Inseguridad_Homicidio    26     0    1.73e+1 7.12e+0 1.40e+0 1.19e+1   0.430 
## 12 Tipo_de_Cambio           26     0    1.39e+1 4.15e+0 8.14e-1 7.74e+0   0.498 
## 13 Densidad_Carretera       26     0    7.11e-2 1.34e-2 2.62e-3 2.32e-2   0.201 
## 14 Densidad_Población       26     0    5.73e+1 5.41e+0 1.06e+0 8.61e+0  -0.210 
## 15 CO2_Emisiones            26     0    3.94e+0 1.81e-1 3.54e-2 2.46e-1  -0.121 
## 16 PIB_Per_Cápita           26     0    1.39e+5 8.86e+3 1.74e+3 1.52e+4   0.310 
## 17 INPC                     26     0    7.52e+1 2.48e+1 4.87e+0 3.51e+1   0.295 
## 18 Crisis_Financiera        26     0    7.69e-2 2.72e-1 5.33e-2 0         3.37  
## # ℹ 18 more variables: kurtosis <dbl>, p00 <dbl>, p01 <dbl>, p05 <dbl>,
## #   p10 <dbl>, p20 <dbl>, p25 <dbl>, p30 <dbl>, p40 <dbl>, p50 <dbl>,
## #   p60 <dbl>, p70 <dbl>, p75 <dbl>, p80 <dbl>, p90 <dbl>, p95 <dbl>,
## #   p99 <dbl>, p100 <dbl>

Data Visualization

# Boxplot of log(IED_Flujos_MXN) grouped by "Crisis_Financiera", where 0 stands for NO and 1 for YES.

boxplot(log(IED_Flujos_MXN)~Crisis_Financiera,data=dataset)

# Graph 1:
graph1 <- ggplot(data=dataset,aes(x=Año,y=IED_Flujos_MXN,fill=Tipo_de_Cambio)) +
  geom_bar(stat="identity") 
graph1

This graph shows how foreign investment in Mexico has changed over the years and how it relates to the exchange rate (Tipo_de_Cambio). It suggests that the exchange rate may explain part of the change in foreign investment.

ggplot(dataset, aes(x = Tipo_de_Cambio , y = IED_Flujos_MXN)) +
  geom_point(color = "pink", size = 4, shape = 18) +
  labs(title = "Tipo de cambio vs. Flujo de Inversión",
       x = "Tipo de Cambio",
       y = "Flujo de Inversión Extranjera")

We can observe that investment flows do not follow a constant pattern as the exchange rate changes.

set.seed(1)
x <- dataset$Innovación
y <- log(dataset$IED_Flujos_MXN)
df <- data.frame(x = x, y = y)


graph4 <- ggplot(df, aes(x = x, y = y)) +
  geom_line()

graph4 +
  labs(title = "Innovación vs. Flujo de Inversión",
       x = "Innovación",
       y = "Flujo de Inversión")

ggplot(dataset, aes(x=log(Exportaciones),y=IED_Flujos_MXN)) + 
  geom_point(aes(size=Tipo_de_Cambio))

The plot suggests a relationship between the exchange rate and exports that might help explain the behavior of the investment flows.

# A logarithmic transformation is applied to the variable IED_Flujos to reduce the range of values

ggplot(dataset, aes(x=Salario_Diario, y= log(IED_Flujos), col=Salario_Diario)) + geom_point() + 
  ggtitle("Flujos de Inversión & Salario") +
  xlab("Salario") + 
  ylab("IED_Flujos") + 
  scale_color_gradient(high="green",low="red", 
                       name="Salario")

This graph shows that the higher the daily wage, the larger the investment flow tends to be.

detach("package:dplyr", unload = TRUE)
## Warning: 'dplyr' namespace cannot be unloaded:
##   namespace 'dplyr' is imported by 'recipes', 'tidyr', 'dlookr' so cannot be unloaded
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:car':
## 
##     recode
## The following objects are masked from 'package:Hmisc':
## 
##     src, summarize
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
dataset_cor<-dataset %>% select(-Exportaciones, -IED_Flujos)

corrplot(cor(dataset_cor),type='upper',order='hclust',addCoef.col='black',
         tl.col = "black", tl.srt = 80, number.cex = 0.4)

“Inseguridad_Robo” has a negative correlation with the investment flow. “Tipo_de_Cambio”, “Densidad_Carretera”, “Densidad_Población”, and “Educación” seem to have the highest correlation values.
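Reading the strongest correlations off a plot can be error-prone. As a hedged sketch, the correlations with the dependent variable can be extracted from the correlation matrix and ranked by absolute value. The data below are simulated and the column names (`fdi`, `exchange_rate`, `robbery`, `noise`) are hypothetical stand-ins, not the real variables.

```r
# Simulated data (hypothetical): fdi plays the role of the dependent variable
set.seed(1)
n <- 26
toy <- data.frame(fdi = rnorm(n))
toy$exchange_rate <- 0.8 * toy$fdi + rnorm(n, sd = 0.5)
toy$robbery       <- -0.5 * toy$fdi + rnorm(n, sd = 1)
toy$noise         <- rnorm(n)

# Full correlation matrix, then the column for the dependent variable
cor_matrix   <- cor(toy)
cor_with_fdi <- cor_matrix[, "fdi"]

# Drop the trivial self-correlation and rank by absolute strength
cor_with_fdi <- cor_with_fdi[names(cor_with_fdi) != "fdi"]
ranked <- cor_with_fdi[order(abs(cor_with_fdi), decreasing = TRUE)]
ranked
```

Ranking by absolute value keeps strong negative correlations visible alongside strong positive ones.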

# Histogram of dependent variable

hist(dataset$IED_Flujos_MXN)

hist(dataset$Exportaciones_MXN)

hist(log(dataset$Exportaciones_MXN))

hist(dataset$Tipo_de_Cambio)

hist(log(dataset$Tipo_de_Cambio))

hist(dataset$Educación)

hist(log(dataset$Educación))

hist(dataset$Salario_Diario)

hist(log(dataset$Salario_Diario))

hist(dataset$Innovación)

hist(log(dataset$Innovación))

hist(dataset$Inseguridad_Robo)

hist(log(dataset$Inseguridad_Robo))

Which estimation method is used to estimate the linear regression model?

Ordinary Least Squares (OLS) estimation.
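As a minimal sketch of what OLS estimation computes, the coefficients that minimize the sum of squared residuals can be obtained from the normal equations, beta_hat = (X'X)^(-1) X'y, and compared with the output of `lm()`. The data are simulated and the names `x1`, `x2`, `y`, and `X` are hypothetical.

```r
# Simulated data (hypothetical values)
set.seed(123)
n  <- 26
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.3 * x2 + rnorm(n, sd = 0.2)

# Design matrix with a column of ones for the intercept
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# Solve the normal equations (X'X) beta = X'y
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# The same coefficients estimated by lm()
fit <- lm(y ~ x1 + x2)
cbind(normal_equations = as.numeric(beta_hat), lm = unname(coef(fit)))
```

Both columns should agree up to numerical precision, since `lm()` solves the same least squares problem with a more stable decomposition.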

Part 4 – Linear Regression Analysis

Based on EDA, formulate and describe 3 hypotheses

Hypothesis 1
H0: There is no significant association between innovation (Innovación) and the foreign investment flow (IED_Flujos_MXN).
H1: There is a significant association between innovation (Innovación) and the foreign investment flow (IED_Flujos_MXN).

Hypothesis 2
H0: The daily minimum wage in pesos (Salario_Diario) has a positive impact on the foreign investment flow (IED_Flujos).
H1: The daily minimum wage in pesos (Salario_Diario) does not have a positive impact on the foreign investment flow (IED_Flujos).

Hypothesis 3
H0: Growth in the average years of education (Educación) does not have a negative effect on the foreign investment flow (IED_Flujos_MXN).
H1: Growth in the average years of education (Educación) has a negative effect on the foreign investment flow (IED_Flujos_MXN).

### Estimate 3 different linear regression models.

# Model 1

model1<-lm(log(IED_Flujos_MXN) ~ log(Educación)+Salario_Diario+log(Tipo_de_Cambio)+Innovación+Inseguridad_Robo,data=dataset)
summary(model1)
## 
## Call:
## lm(formula = log(IED_Flujos_MXN) ~ log(Educación) + Salario_Diario + 
##     log(Tipo_de_Cambio) + Innovación + Inseguridad_Robo, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34682 -0.05638  0.01831  0.08234  0.37679 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          8.4477467  1.6353827   5.166  4.7e-05 ***
## log(Educación)       1.3618792  1.1281557   1.207   0.2414    
## Salario_Diario       0.0010122  0.0027752   0.365   0.7191    
## log(Tipo_de_Cambio)  0.1697680  0.4604660   0.369   0.7162    
## Innovación           0.1024386  0.0470436   2.178   0.0416 *  
## Inseguridad_Robo    -0.0007032  0.0011682  -0.602   0.5540    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1994 on 20 degrees of freedom
## Multiple R-squared:  0.6843, Adjusted R-squared:  0.6053 
## F-statistic: 8.669 on 5 and 20 DF,  p-value: 0.0001682
# DIAGNOSIS TESTS 1

vif(model1)
##      log(Educación)      Salario_Diario log(Tipo_de_Cambio)          Innovación 
##            5.259843            6.223420           11.462719            1.600779 
##    Inseguridad_Robo 
##            1.949690
bptest(model1)
## 
##  studentized Breusch-Pagan test
## 
## data:  model1
## BP = 5.6171, df = 5, p-value = 0.3453
histogram(model1$residuals)

AIC(model1)
## [1] -2.883926
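# RMSE (caution: the fitted values are on the log scale while IED_Flujos_MXN is in levels, so this figure mixes scales; exp(model1$fitted.values) would put both in levels)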

cat("RMSE:",RMSE(model1$fitted.values,dataset$IED_Flujos_MXN))
## RMSE: 513343

Model 1:
• Innovación is the only predictor that is statistically significant at the 5% level (p = 0.0416).
• The adjusted R^2 is 0.6053, which means that the model explains about 60.5% of the variance in log(IED_Flujos_MXN) after adjusting for the number of predictors.
• According to the VIF test, log(Tipo_de_Cambio) presents multicollinearity (VIF = 11.46 > 10).
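
To make the VIF criterion concrete, here is a small sketch that computes variance inflation factors by hand as 1 / (1 - R^2), where R^2 comes from regressing each predictor on the others. The data are simulated, and vif_manual is a hypothetical helper written for this illustration; it is not the function from the car package used above.

```r
# Simulated predictors (hypothetical; not the report's dataset)
set.seed(3)
n  <- 60
x1 <- rnorm(n)
x2 <- 0.9 * x1 + rnorm(n, sd = 0.3)   # built to be strongly correlated with x1
x3 <- rnorm(n)                         # independent of the others

# VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing x_j on the other predictors
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vifs <- c(
  x1 = vif_manual(x1, data.frame(x2, x3)),
  x2 = vif_manual(x2, data.frame(x1, x3)),
  x3 = vif_manual(x3, data.frame(x1, x2))
)
print(round(vifs, 2))
```

The correlated pair (x1, x2) gets much larger values than the independent x3, which is the pattern the VIF > 10 rule of thumb is meant to flag.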

# Model 2

model2<-lm(log(IED_Flujos_MXN) ~  I(Educación^2) + Salario_Diario + log(Innovación) +
             log(Inseguridad_Robo) + I(Tipo_de_Cambio^2) + Crisis_Financiera, data = dataset)
summary(model2)
## 
## Call:
## lm(formula = log(IED_Flujos_MXN) ~ I(Educación^2) + Salario_Diario + 
##     log(Innovación) + log(Inseguridad_Robo) + I(Tipo_de_Cambio^2) + 
##     Crisis_Financiera, data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34099 -0.05821  0.00056  0.06975  0.39241 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)    
## (Intercept)            9.5452750  2.3803948   4.010 0.000749 ***
## I(Educación^2)         0.0092675  0.0059392   1.560 0.135165    
## Salario_Diario        -0.0001260  0.0027265  -0.046 0.963632    
## log(Innovación)        1.4263237  0.6175989   2.309 0.032319 *  
## log(Inseguridad_Robo) -0.1800064  0.2342055  -0.769 0.451589    
## I(Tipo_de_Cambio^2)    0.0006970  0.0008515   0.819 0.423169    
## Crisis_Financiera     -0.1571136  0.1490812  -1.054 0.305156    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1966 on 19 degrees of freedom
## Multiple R-squared:  0.7085, Adjusted R-squared:  0.6164 
## F-statistic: 7.696 on 6 and 19 DF,  p-value: 0.0002691
#DIAGNOSIS TESTS 2

vif(model2)
##        I(Educación^2)        Salario_Diario       log(Innovación) 
##              2.942890              6.180613              1.656510 
## log(Inseguridad_Robo)   I(Tipo_de_Cambio^2)     Crisis_Financiera 
##              2.111939              7.207039              1.061785
bptest(model2)
## 
##  studentized Breusch-Pagan test
## 
## data:  model2
## BP = 5.2933, df = 6, p-value = 0.5068
histogram(model2$residuals)

AIC(model2)
## [1] -2.958056
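# RMSE (caution: the fitted values are on the log scale while IED_Flujos_MXN is in levels, so this figure mixes scales; exp(model2$fitted.values) would put both in levels)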

cat("RMSE:",RMSE(model2$fitted.values,dataset$IED_Flujos_MXN))
## RMSE: 513343

Model 2:
• log(Innovación) is the only predictor that is statistically significant at the 5% level (p = 0.0323).
• The adjusted R^2 is 0.6164, which means that the model explains 61.64% of the variance in log(IED_Flujos_MXN) after adjusting for the number of predictors.
• According to the VIF test, this model does not present multicollinearity (all VIF < 10).
• “Salario_Diario”, “Inseguridad_Robo”, and “Crisis_Financiera” have negative coefficients, although none of them is statistically significant.

# Model 3
model3<-lm(log(IED_Flujos_MXN) ~ log(Educación^2) + I(Tipo_de_Cambio^2) + Salario_Diario + log(Innovación) + Inseguridad_Robo + Crisis_Financiera, data = dataset)
summary(model3)
## 
## Call:
## lm(formula = log(IED_Flujos_MXN) ~ log(Educación^2) + I(Tipo_de_Cambio^2) + 
##     Salario_Diario + log(Innovación) + Inseguridad_Robo + Crisis_Financiera, 
##     data = dataset)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34802 -0.06981  0.00206  0.07454  0.39740 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)   
## (Intercept)          7.1827994  2.0827260   3.449  0.00269 **
## log(Educación^2)     0.5891649  0.4157956   1.417  0.17268   
## I(Tipo_de_Cambio^2)  0.0008428  0.0008241   1.023  0.31929   
## Salario_Diario      -0.0005940  0.0025985  -0.229  0.82163   
## log(Innovación)      1.3488115  0.6049355   2.230  0.03803 * 
## Inseguridad_Robo    -0.0011749  0.0011821  -0.994  0.33275   
## Crisis_Financiera   -0.1706405  0.1470412  -1.160  0.26022   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1936 on 19 degrees of freedom
## Multiple R-squared:  0.7174, Adjusted R-squared:  0.6281 
## F-statistic: 8.037 on 6 and 19 DF,  p-value: 0.0002049
plot(effect("I(Tipo_de_Cambio^2)", model3))

plot(effect("log(Educación^2)", model3))

# Diagnosis Tests 3

# Applying the VIF test we can see the variables don't present multicollinearity between them

vif(model3)
##    log(Educación^2) I(Tipo_de_Cambio^2)      Salario_Diario     log(Innovación) 
##            3.032954            6.962937            5.790208            1.639236 
##    Inseguridad_Robo   Crisis_Financiera 
##            2.118541            1.065396
bptest(model3)
## 
##  studentized Breusch-Pagan test
## 
## data:  model3
## BP = 5.6662, df = 6, p-value = 0.4616
histogram(model3$residuals)

AIC(model3)
## [1] -3.7628
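# RMSE (caution: the fitted values are on the log scale while IED_Flujos_MXN is in levels, so this figure mixes scales; exp(model3$fitted.values) would put both in levels)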

cat("RMSE:",RMSE(model3$fitted.values,dataset$IED_Flujos_MXN))
## RMSE: 513343
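
The three RMSE values reported so far are identical because log-scale fitted values (around 13) are compared with level-scale flows (hundreds of thousands), so the result is driven almost entirely by the size of the observations. The sketch below, on simulated data with a hypothetical rmse helper, shows how the figure changes when both sides are on the same scale. Note that exp() of a log-scale fit is a simple back-transformation and ignores retransformation bias.

```r
# Simulated positive response (hypothetical; not the report's dataset)
set.seed(2)
n <- 40
x <- runif(n, 1, 5)
y <- exp(10 + 0.4 * x + rnorm(n, sd = 0.2))

# Model fitted on the log scale, as in models 1 to 3
fit <- lm(log(y) ~ x)

# Plain RMSE helper (assumed here instead of caret::RMSE)
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

rmse_log   <- rmse(fitted(fit), log(y))   # both on the log scale
rmse_level <- rmse(exp(fitted(fit)), y)   # both in levels
rmse_mixed <- rmse(fitted(fit), y)        # mismatched scales

print(c(log = rmse_log, level = rmse_level, mixed = rmse_mixed))
```

Only rmse_log and rmse_level are meaningful; rmse_mixed barely depends on the model at all.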

Model 3:

• log(Innovación) is the only predictor that is statistically significant at the 5% level (p = 0.0380).
• The adjusted R^2 is 0.6281, which means that the model explains 62.81% of the variance in log(IED_Flujos_MXN) after adjusting for the number of predictors.
• According to the VIF test, this model does not present multicollinearity (all VIF < 10).
• “Salario_Diario”, “Inseguridad_Robo”, and “Crisis_Financiera” have negative coefficients, although none of them is statistically significant.
• This model has the lowest AIC value (-3.76), so it offers the best balance between fit and complexity among the three models.

plot_summs(model1,model2,model3,scale=TRUE)
## Registered S3 methods overwritten by 'broom':
##   method            from  
##   tidy.glht         jtools
##   tidy.summary.glht jtools
## Loading required namespace: broom.mixed

# SELECTING MODEL
# Model 3 has the lowest AIC and, unlike model 1, does not present multicollinearity among its variables (all VIF < 10); for this reason, the selected model is MODEL 3

AIC(model1,model2,model3)
##        df       AIC
## model1  7 -2.883926
## model2  8 -2.958056
## model3  8 -3.762800

The chosen model is model 3. It has the lowest AIC value, which means it offers the best trade-off between goodness of fit and model complexity among the three candidates.
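
For reference, the AIC reported by R for a linear model can be reproduced as -2 times the log-likelihood plus 2 times the number of estimated parameters (the coefficients plus the residual variance, which is the df column in the table above). A minimal sketch on simulated data, with illustrative names:

```r
# Simulated data (hypothetical; not the report's dataset)
set.seed(4)
n <- 30
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
fit <- lm(y ~ x)

# Number of estimated parameters: regression coefficients plus the residual variance
k <- length(coef(fit)) + 1

# AIC = -2 * log-likelihood + 2 * k
aic_manual <- -2 * as.numeric(logLik(fit)) + 2 * k

# Difference with the built-in AIC()
aic_gap <- abs(aic_manual - AIC(fit))
print(aic_gap)
```

Because models 1 to 3 all use log(IED_Flujos_MXN) as the response and the same 26 observations, their AIC values are directly comparable.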

# Presenting OLS Regression Results of our selected model
stargazer(model3,type="text",title="Regression Results",single.row=TRUE,ci=FALSE,ci.level=0.9) ### present OLS Regression results in text format 
## 
## Regression Results
## ===============================================
##                         Dependent variable:    
##                     ---------------------------
##                         log(IED_Flujos_MXN)    
## -----------------------------------------------
## log(Educación2)            0.589 (0.416)       
## I(Tipo_de_Cambio2)         0.001 (0.001)       
## Salario_Diario            -0.001 (0.003)       
## log(Innovación)           1.349** (0.605)      
## Inseguridad_Robo          -0.001 (0.001)       
## Crisis_Financiera         -0.171 (0.147)       
## Constant                 7.183*** (2.083)      
## -----------------------------------------------
## Observations                    26             
## R2                             0.717           
## Adjusted R2                    0.628           
## Residual Std. Error       0.194 (df = 19)      
## F Statistic            8.037*** (df = 6; 19)   
## ===============================================
## Note:               *p<0.1; **p<0.05; ***p<0.01
### Detecting Heteroscedasticity
### Display regression residuals vs fitted values to detect heteroscedasticity 

residual <- resid(model3)
valAdjusted <- fitted(model3)
plot(valAdjusted, residual,
     xlab = "Fitted Values",
     ylab = "Residuals",
     main = "Residuals vs. Fitted Plot")
abline(h = 0, col = "blue", lty = 2)
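
The residuals-versus-fitted plot is a visual check. As a complement, the sketch below reproduces by hand the idea behind the studentized Breusch-Pagan test: regress the squared residuals on the predictors and use n times the R^2 of that auxiliary regression as a chi-squared statistic. To the best of our understanding this matches the default (studentized) form of bptest(), but the code is an illustration on simulated data, not a replacement for the lmtest function.

```r
# Simulated homoscedastic data (hypothetical; not the report's dataset)
set.seed(5)
n <- 80
x <- runif(n, 1, 10)
y <- 3 + 0.5 * x + rnorm(n, sd = 1)
fit <- lm(y ~ x)

# Auxiliary regression of squared residuals on the predictor
e2  <- resid(fit)^2
aux <- lm(e2 ~ x)

# Test statistic: n * R^2, compared with a chi-squared with 1 df (one predictor)
bp_stat <- n * summary(aux)$r.squared
bp_pval <- pchisq(bp_stat, df = 1, lower.tail = FALSE)

print(c(statistic = bp_stat, p.value = bp_pval))
```

A large p-value, as in the bptest() outputs for models 1 to 3, gives no evidence against constant residual variance.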

Lasso Regression Analysis

### Split the Data in Training Data vs Test Data
# Randomly split the data into training and test sets
set.seed(123)                                  ### sets the random seed for reproducibility of results
training.samples<-dataset$IED_Flujos_MXN %>%
  createDataPartition(p=0.75,list=FALSE)       ### use 75% of the data to build the predictive model

train.data<-dataset[training.samples, ]  ### training data to fit the linear regression model 
test.data<-dataset[-training.samples, ]  ### testing data to test the linear regression model
# LASSO regression via the glmnet package only accepts numeric inputs, so the predictors are converted with model.matrix().
# Independent variables
x<-model.matrix(log(IED_Flujos_MXN) ~ log(Educación^2) + I(Tipo_de_Cambio^2) + Salario_Diario + log(Innovación) + Inseguridad_Robo + Crisis_Financiera,train.data)[,-1] ### same predictors as model 3; [,-1] drops the intercept column
y<-train.data$IED_Flujos_MXN ### dependent variable, in levels (not logged as in model 3)


# Cross-validation splits the training data into folds so that every observation is used both to fit the model and to validate it.
# Find the best lambda using cross-validation.
set.seed(123) 
cv.lasso<-cv.glmnet(x,y,alpha=1) # alpha = 1 for LASSO
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
# Display the best lambda value
cv.lasso$lambda.min
## [1] 12963.74
# Fit the final model on the training data
lassomodel<-glmnet(x,y,alpha=1,lambda=cv.lasso$lambda.min)

# Display regression coefficients
coef(lassomodel)
## 7 x 1 sparse Matrix of class "dgCMatrix"
##                                s0
## (Intercept)         -1642194.3084
## log(Educación^2)       72608.5823
## I(Tipo_de_Cambio^2)      615.3063
## Salario_Diario             .     
## log(Innovación)       655909.9264
## Inseguridad_Robo           .     
## Crisis_Financiera          .
# Make predictions on the test data
x.test<-model.matrix(log(IED_Flujos_MXN) ~ log(Educación^2) + I(Tipo_de_Cambio^2) + Salario_Diario + log(Innovación) + Inseguridad_Robo + Crisis_Financiera,test.data)[,-1] ### same specification as for the training data
lassopredictions <- lassomodel %>% predict(x.test) %>% as.vector()

# Model Accuracy
data.frame(
  RMSE = RMSE(lassopredictions, test.data$IED_Flujos_MXN),
  Rsquare = R2(lassopredictions, test.data$IED_Flujos_MXN))
##       RMSE   Rsquare
## 1 148288.6 0.4261465
### visualizing lasso regression results 
lbs_fun <- function(fit, offset_x=1, ...) {
  L <- length(fit$lambda)
  x <- log(fit$lambda[L])+ offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels=labs, ...)
}

lasso<-glmnet(scale(x),y,alpha=1)

plot(lasso,xvar="lambda",label=T)
lbs_fun(lasso)
abline(v=cv.lasso$lambda.min,col="red",lty=2)
abline(v=cv.lasso$lambda.1se,col="blue",lty=2)

Ridge Regression Analysis

# Find the best lambda using cross-validation
set.seed(123)                           # x: independent variables | y: dependent variable 
cv.ridge <- cv.glmnet(x,y,alpha=0.1)      # note: alpha = 0.1 is an elastic-net mix; pure RIDGE uses alpha = 0, as in the final fit below
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
# Display the best lambda value
cv.ridge$lambda.min                     # lambda: a numeric value defining the amount of shrinkage. Why min? The higher the value of lambda, the more penalization there is; lambda.min is the value with the lowest cross-validated error
## [1] 2162.483
# Fit the final model on the training data
ridgemodel<-glmnet(x,y,alpha=0,lambda=cv.ridge$lambda.min)

# Display regression coefficients
coef(ridgemodel)
## 7 x 1 sparse Matrix of class "dgCMatrix"
##                                s0
## (Intercept)         -1763479.0597
## log(Educación^2)       62817.7395
## I(Tipo_de_Cambio^2)      804.8380
## Salario_Diario          -587.0118
## log(Innovación)       739704.6514
## Inseguridad_Robo        -274.9666
## Crisis_Financiera     -51648.6369
# Make predictions on the test data
x.test<-model.matrix(log(IED_Flujos_MXN) ~ log(Educación^2) + I(Tipo_de_Cambio^2) + Salario_Diario + log(Innovación) + Inseguridad_Robo + Crisis_Financiera,test.data)[,-1]
ridgepredictions<-ridgemodel %>% predict(x.test) %>% as.vector()

# Model Accuracy
data.frame(
  RMSE = RMSE(ridgepredictions, test.data$IED_Flujos_MXN),
  Rsquare = R2(ridgepredictions, test.data$IED_Flujos_MXN)
)
##       RMSE   Rsquare
## 1 154101.5 0.4280225
### visualizing ridge regression results 

ridge<-glmnet(scale(x),y,alpha=0)

plot(ridge, xvar = "lambda", label=T)
lbs_fun(ridge)
abline(v=cv.ridge$lambda.min, col = "red", lty=2)
abline(v=cv.ridge$lambda.1se, col="blue", lty=2)

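# Values taken from the outputs reported above. The linear regression row is model 3 in-sample
# (its RMSE mixes log-scale fitted values with level-scale observations); Lasso and Ridge are evaluated on the test set.
tab <- matrix(c(513343,148288.6,154101.5,0.7174,0.4261,0.4280), ncol=2, byrow=FALSE)
colnames(tab) <- c('RMSE','R2')
rownames(tab) <- c('Linear Regression','Lasso','Ridge')
tab <- as.table(tab)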

Interpret the regression results of selected regression model.

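The selected model is model 3, which is the one that best explains the relationship between the independent variables and the dependent variable.
• Educación enters as log(Educación^2), which equals 2 × log(Educación), with a coefficient of 0.589. A 1% increase in the average years of education is therefore associated with an increase of about 1.18% in IED_Flujos_MXN, although the effect is not statistically significant (p = 0.173).
• Innovación enters in logs with a coefficient of 1.349, so a 1% increase in Innovación is associated with an increase of about 1.35% in IED_Flujos_MXN (significant at the 5% level).
• Minimum wage, insecurity, and the financial crisis have negative coefficients, but none of them is statistically significant.
This model explains 62.81% (adjusted R^2) of the variance in the logarithm of the foreign investment flow.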
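
Because model 3 has a logged response, its coefficients read as elasticities or percentage changes rather than unit changes. The short sketch below works through that arithmetic using the coefficients printed in the regression table for model 3; the variable names are illustrative.

```r
# Coefficients as reported in the model 3 regression table
b_innov  <- 1.349    # on log(Innovación)
b_educ   <- 0.589    # on log(Educación^2) = 2 * log(Educación)
b_crisis <- -0.171   # on the Crisis_Financiera indicator

# Log-log terms: the coefficient is the elasticity (% change in flows per 1% change in x)
elasticity_innov <- b_innov
elasticity_educ  <- 2 * b_educ   # the factor 2 comes from the squared term inside the log

# Log-level indicator: percentage change in flows when the indicator goes from 0 to 1
pct_crisis <- 100 * (exp(b_crisis) - 1)

print(c(innov = elasticity_innov, educ = elasticity_educ, crisis_pct = pct_crisis))
```

These are point estimates only; apart from Innovación, the corresponding coefficients are not statistically significant in model 3.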

Part 5 – Conclusions

• To choose the regression model that best fits the data, we compare two options. Option 1 is the Ridge regression. Its shrinkage parameter lambda was selected by cross-validation (lambda = 2162.483), the final model was fitted with that value, and its accuracy was evaluated on the test data with the RMSE (root mean square error) and the R-squared: the RMSE is 154101.5 and the R-squared is 0.4280225. Option 2 is the polynomial regression estimated as model 3, whose formula includes the independent variables Education, Exchange Rate, Daily Wage, Innovation, Theft Insecurity, and Financial Crisis. For this model the summary reports coefficients, standard errors, t-values, and p-values for each variable, and its fit is assessed with the residual standard error (0.1936), the multiple R-squared (0.7174), the adjusted R-squared (0.6281), and the F-statistic.

• If Mexico wants to keep attracting new investors, it should focus on innovation: in this model, Innovación is the only variable with a statistically significant positive effect, which suggests it is attractive to nearshoring investors.

• Insecurity due to theft has a negative coefficient in the model, although it is not statistically significant. It remains an area of opportunity for Mexico in attracting new foreign investors.

• Average education has a positive coefficient, although it is not statistically significant. We can interpret this as follows: the higher the average education level in Mexico, the more foreign businesses will consider relocating their production centers to the country.

#### Show the predicted values of the dependent variable

effect_plot(model3, pred=Innovación, data=dataset, interval=TRUE)

effect_plot(model3, pred=Educación, data=dataset, interval=TRUE)

effect_plot(model3, pred=Salario_Diario, data=dataset, interval=TRUE)

effect_plot(model3, pred=Inseguridad_Robo, data=dataset, interval=TRUE)

References

• BBC News Mundo. (2017, January 10). Cómo se beneficia Estados Unidos fabricando productos en México. BBC News Mundo. https://www.bbc.com/mundo/noticias-38536136
• ¿Qué es el análisis predictivo? - Explicación del análisis predictivo - AWS. (n.d.). Amazon Web Services, Inc. https://aws.amazon.com/es/what-is/predictive-analytics/
• Achilles. (2021, September 15). Deslocalización de la cadena de suministro: enfoque en los datos. Achilles. https://www.achilles.com/es/industry-insights/te-estas-planteando-optar-por-la-deslocalizacion-cercana-un-enfoque-basado-en-los-datos/#:~:text=Como%20parte%20de%20estos%20cambios,o%20de%20un%20pa%C3%ADs%20cercano.
• Saucedo, D. (2023). Mexico and its Attractiveness for Nearshoring. Retrieved from CIC: Centro Internacional de Casos, Instituto Tecnológico de Estudios Superiores Monterrey.