Evidence-1

Introduction to Econometrics (Gpo 103)

Professor: David Saucedo de la Fuente

Abraham Castañon Alfaro - A01747966

Libraries

library(foreign)
library(dplyr)        # data manipulation

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(forcats)      # to work with categorical variables
library(ggplot2)      # data visualization 
library(readr)        # read specific csv files
library(janitor)      # data exploration and cleaning

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(Hmisc)        # several useful functions for data analysis

## 
## Attaching package: 'Hmisc'

## The following objects are masked from 'package:dplyr':
## 
##     src, summarize

## The following objects are masked from 'package:base':
## 
##     format.pval, units

library(psych)        # functions for multivariate analysis

## 
## Attaching package: 'psych'

## The following object is masked from 'package:Hmisc':
## 
##     describe

## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha

library(naniar)       # summaries and visualization of missing values NAs
library(dlookr)       # data diagnosis, exploration, and transformation

## 
## Attaching package: 'dlookr'

## The following object is masked from 'package:psych':
## 
##     describe

## The following object is masked from 'package:Hmisc':
## 
##     describe

## The following object is masked from 'package:base':
## 
##     transform

library(corrplot)     # correlation plots

## corrplot 0.92 loaded

library(jtools)       # presentation of regression analysis

## 
## Attaching package: 'jtools'

## The following object is masked from 'package:Hmisc':
## 
##     %nin%

library(lmtest)       # diagnostic checks - linear regression analysis

## Loading required package: zoo

## 
## Attaching package: 'zoo'

## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric

library(car)          # diagnostic checks - linear regression analysis

## Loading required package: carData

## 
## Attaching package: 'car'

## The following object is masked from 'package:psych':
## 
##     logit

## The following object is masked from 'package:dplyr':
## 
##     recode

library(olsrr)        # diagnostic checks - linear regression analysis

## 
## Attaching package: 'olsrr'

## The following object is masked from 'package:datasets':
## 
##     rivers

library(naniar)       # identifying missing values
library(stargazer)    # create publication quality tables

## 
## Please cite as:

##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.

##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer

library(effects)      # displays for linear and other regression models

## Registered S3 method overwritten by 'survey':
##   method      from  
##   summary.pps dlookr

## lattice theme set by effectsTheme()
## See ?effectsTheme for details.

library(tidyverse)    # collection of R packages designed for data science

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ✔ stringr   1.5.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%()       masks ggplot2::%+%()
## ✖ psych::alpha()     masks ggplot2::alpha()
## ✖ tidyr::extract()   masks dlookr::extract()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ car::recode()      masks dplyr::recode()
## ✖ purrr::some()      masks car::some()
## ✖ Hmisc::src()       masks dplyr::src()
## ✖ Hmisc::summarize() masks dplyr::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(caret)        # Classification and Regression Training

## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift

library(glmnet)       # methods for prediction and plotting, and functions for cross-validation

## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loaded glmnet 4.1-7

library(readxl)      #Read the excel file
library(xts)

## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last

library(zoo)
library(tseries)

## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo

library(stats)
library(forecast)
library(astsa)

## 
## Attaching package: 'astsa'
## 
## The following object is masked from 'package:forecast':
## 
##     gas
## 
## The following object is masked from 'package:psych':
## 
##     scatter.hist

library(corrplot)
library(AER)

## Loading required package: sandwich
## Loading required package: survival
## 
## Attaching package: 'survival'
## 
## The following object is masked from 'package:caret':
## 
##     cluster

library(vars)

## Loading required package: MASS
## 
## Attaching package: 'MASS'
## 
## The following object is masked from 'package:olsrr':
## 
##     cement
## 
## The following object is masked from 'package:dplyr':
## 
##     select
## 
## Loading required package: strucchange
## 
## Attaching package: 'strucchange'
## 
## The following object is masked from 'package:stringr':
## 
##     boundary
## 
## Loading required package: urca
## 
## Attaching package: 'vars'
## 
## The following object is masked from 'package:dlookr':
## 
##     normality

library(dynlm)
library(vars)
library(TSstudio)
library(sarima)

## Loading required package: stats4
## 
## Attaching package: 'sarima'
## 
## The following object is masked from 'package:astsa':
## 
##     sarima
## 
## The following object is masked from 'package:stats':
## 
##     spectrum

library(dygraphs)

Setting work directory

setwd("/Users/abrahamcast/Desktop/")

Importing Dataset (Excel document)

data <- read_excel("SP_DataMexicoAtractiveness_alumn-VF_1.xlsx", na = "-")

P1. Background

Nearshoring:

Nearshoring refers to a company outsourcing parts of its production to third parties located in a nearby country, ideally one in a similar time zone. The goal is to reduce risk by spreading production and suppliers across locations instead of keeping everything in one place, so that the company can react and relocate if a crisis hits the foreign country. The practice arose as a response to offshoring, which sought to cut production costs by moving to destinations such as Asia regardless of distance and without considering diversification, and which led to situations such as those experienced during the COVID-19 pandemic.

In recent years, many countries that are major market powers, such as the United States, have begun moving their production from Asian countries to countries that are closer and with which they have better relations. This means large foreign investment for the country selected as a destination. However, many variables affect where companies and countries decide to establish themselves. Mexico has gained ground because of its proximity to the United States, the slow growth of recent years, lower labor costs, the easy transfer of goods by sea, land, and air, its absence from international conflicts, and a favorable exchange rate. Mexico has been targeted mainly by American companies seeking to relocate their production, but also by companies from European and even Asian countries.

Predictive analytics:

Predictive analysis uses historical data, machine learning, and statistical procedures to estimate probable outcomes or, as the name suggests, predictive patterns that help anticipate the result of the topic under investigation. Different industries can benefit from analyzing historical and real-time data, and several methods are available to forecast the event or events of interest.

“Regression analysis is a statistical technique for determining the relationship between a single dependent (criterion) variable and one or more independent (predictor) variables.” (NCBI, 2009). The regression analysis method assumes that the relationship between the variables is one of dependence or causality. It is the most commonly used technique because it reliably identifies which variables have an impact; in other words, which factors influence the outcome under investigation and which can be ignored. In predictive analysis, it is therefore used to identify which variables will affect the dependent variable, based on relationships already observed in the past.

Predictive analysis and Nearshoring in Mexico:

Now that we know the context of both Nearshoring and predictive analysis, we can relate the two topics by asking how regression analysis can help us predict the occurrence of Nearshoring in Mexico. There are different ways in which a regression model could be helpful:

Collect information on the key independent variables that could influence the decision of foreign countries to invest in Mexico as a Nearshoring strategy.
Predict the likelihood of Nearshoring in Mexico based on the country's historical data, and assess whether the model's predictions are competitive with those for other countries that have already experienced Nearshoring.

P2. Problem Situation

Maria, an econometrics analyst from Mexico, believes that a database covering Mexico's socioeconomic, business-environment, technological, environmental, and security conditions makes it possible to determine whether the country is attractive for Nearshoring from a foreign company's point of view.

P3. Data and Methodology

Exploratory Data Analysis

head(data)

## # A tibble: 6 × 18
##     Año IED_Flujos IED_Flujos_MXN Exportaciones Exportaciones_MXN Empleo
##   <dbl>      <dbl>          <dbl>         <dbl>             <dbl>  <dbl>
## 1  1997     12146.        294151.         9088.           220091.   NA  
## 2  1998      8374.        210876.         9875.           248691.   NA  
## 3  1999     13960.        299734.        10990.           235961.   NA  
## 4  2000     18249.        362632.        12483.           248057.   97.8
## 5  2001     30057.        546548.        11300.           205483.   97.4
## 6  2002     24099.        468332.        11923.           231708.   97.7
## # ℹ 12 more variables: Educación <dbl>, Salario_Diario <dbl>, Innovación <dbl>,
## #   Inseguridad_Robo <dbl>, Inseguridad_Homicidio <dbl>, Tipo_de_Cambio <dbl>,
## #   Densidad_Carretera <dbl>, Densidad_Población <dbl>, CO2_Emisiones <dbl>,
## #   PIB_Per_Cápita <dbl>, INPC <dbl>, Crisis_Financiera <dbl>

str(data)

## tibble [26 × 18] (S3: tbl_df/tbl/data.frame)
##  $ Año                  : num [1:26] 1997 1998 1999 2000 2001 ...
##  $ IED_Flujos           : num [1:26] 12146 8374 13960 18249 30057 ...
##  $ IED_Flujos_MXN       : num [1:26] 294151 210876 299734 362632 546548 ...
##  $ Exportaciones        : num [1:26] 9088 9875 10990 12483 11300 ...
##  $ Exportaciones_MXN    : num [1:26] 220091 248691 235961 248057 205483 ...
##  $ Empleo               : num [1:26] NA NA NA 97.8 97.4 ...
##  $ Educación            : num [1:26] 7.2 7.3 7.4 7.6 7.7 7.8 7.9 8 8.1 8.3 ...
##  $ Salario_Diario       : num [1:26] 24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovación           : num [1:26] 11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num [1:26] 267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num [1:26] 14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num [1:26] 8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num [1:26] 0.0521 0.053 0.055 0.0552 0.0565 0.0576 0.0596 0.0595 0.0625 0.0628 ...
##  $ Densidad_Población   : num [1:26] 47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num [1:26] 3.68 3.85 3.69 3.87 3.81 ...
##  $ PIB_Per_Cápita       : num [1:26] 127570 126739 129165 130875 128083 ...
##  $ INPC                 : num [1:26] 33.3 39.5 44.3 48.3 50.4 ...
##  $ Crisis_Financiera    : num [1:26] 0 0 0 0 0 0 0 0 0 0 ...

names(data)

##  [1] "Año"                   "IED_Flujos"            "IED_Flujos_MXN"       
##  [4] "Exportaciones"         "Exportaciones_MXN"     "Empleo"               
##  [7] "Educación"             "Salario_Diario"        "Innovación"           
## [10] "Inseguridad_Robo"      "Inseguridad_Homicidio" "Tipo_de_Cambio"       
## [13] "Densidad_Carretera"    "Densidad_Población"    "CO2_Emisiones"        
## [16] "PIB_Per_Cápita"        "INPC"                  "Crisis_Financiera"

Variable names, units, and descriptions

FDI_Flows (IED_Flujos): Millions of dollars. Foreign direct investment flows.
FDI_Flows_MXN (IED_Flujos_MXN): Millions of pesos. Foreign direct investment flows.
Exports (Exportaciones): Millions of dollars. Non-oil exports, including the value of exports from the Maquiladora Export Industry.
Exports_MXN (Exportaciones_MXN): Millions of pesos. Non-oil exports, including the value of exports from the Maquiladora Export Industry.
Employment (Empleo): Percentage rate. Percentage of the economically active population that is employed.
Education (Educación): Average years. Years of education.
Daily_Salary (Salario_Diario): Pesos. Minimum daily wage in pesos.
Innovation (Innovación): Patent rate per 100,000 inhabitants. Number of patents applied for in Mexico.
Insecurity_Robbery (Inseguridad_Robo): Robbery rate per 100,000 inhabitants. Mainly violent robbery of homes, vehicles, passers-by, carriers, banking institutions, businesses, livestock, machinery, and auto parts.
Insecurity_Homicide (Inseguridad_Homicidio): Homicide rate per 100,000 inhabitants. Number of homicides.
Exchange_Rate (Tipo_de_Cambio): Pesos per dollar. FIX exchange rate.
Road_Density (Densidad_Carretera): Kilometers per km2. Kilometers of paved road for each km2 of territorial surface.
Population_Density (Densidad_Población): Population per km2. Population divided by the territorial extension of Mexico in km2.
CO2_Emissions (CO2_Emisiones): Metric tons per capita. Carbon dioxide emissions.
GDP_Per_Capita (PIB_Per_Cápita): Real 2013 pesos (MXN). Gross Domestic Product (GDP) divided by the population, adjusted to 2013 prices.
INPC (INPC): Index, base 2018 = 100. National Consumer Price Index (INPC).
Finance_Crisis (Crisis_Financiera): Indicator (0/1). Whether there was a financial crisis that year.

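Identify and replace missing values

# Replace the missing values in each column with that column's median
data <- data %>%
  mutate_all(~ifelse(is.na(.), median(., na.rm = TRUE), .))
# Count the missing values that remain after the imputation
sum(is.na(data))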

## [1] 0

colSums(is.na(data))

##                   Año            IED_Flujos        IED_Flujos_MXN 
##                     0                     0                     0 
##         Exportaciones     Exportaciones_MXN                Empleo 
##                     0                     0                     0 
##             Educación        Salario_Diario            Innovación 
##                     0                     0                     0 
##      Inseguridad_Robo Inseguridad_Homicidio        Tipo_de_Cambio 
##                     0                     0                     0 
##    Densidad_Carretera    Densidad_Población         CO2_Emisiones 
##                     0                     0                     0 
##        PIB_Per_Cápita                  INPC     Crisis_Financiera 
##                     0                     0                     0
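
Because the imputation runs before the counts, the zeros above confirm that no gaps remain rather than showing where the gaps were. A minimal sketch, on a small hypothetical data frame (`toy`, its columns, and the helper `impute_median()` are illustrative and not part of the project data), of counting missing values per column first and then applying the same median rule:

```r
# Hypothetical toy data with gaps (not the project data)
toy <- data.frame(
  year   = 2000:2004,
  employ = c(NA, NA, 97.8, 97.4, 97.7),
  wage   = c(35.1, 37.6, NA, 41.5, 43.3)
)

# 1) Inspect missingness before any imputation
na_before <- colSums(is.na(toy))
print(na_before)

# 2) Impute each column with its own median, ignoring NAs
impute_median <- function(x) {
  x[is.na(x)] <- median(x, na.rm = TRUE)
  x
}
toy_imputed <- as.data.frame(lapply(toy, impute_median))

# 3) Confirm that nothing is missing afterwards
na_after <- colSums(is.na(toy_imputed))
print(na_after)
```

Counting before imputing keeps a record of which variables were affected (in this dataset, `Empleo` shows NA values in the first rows of `head(data)`).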

gg_miss_var(data)

Descriptive statistics (mean, median, standard deviation, minimum, maximum)

summary(data)

##       Año         IED_Flujos    IED_Flujos_MXN   Exportaciones  
##  Min.   :1997   Min.   : 8374   Min.   :210876   Min.   : 9088  
##  1st Qu.:2003   1st Qu.:21367   1st Qu.:368560   1st Qu.:13260  
##  Median :2010   Median :27698   Median :497054   Median :21188  
##  Mean   :2010   Mean   :26770   Mean   :493596   Mean   :23601  
##  3rd Qu.:2016   3rd Qu.:32183   3rd Qu.:578606   3rd Qu.:31601  
##  Max.   :2022   Max.   :48354   Max.   :754438   Max.   :46478  
##  Exportaciones_MXN     Empleo        Educación     Salario_Diario  
##  Min.   :205483    Min.   :95.06   Min.   :7.200   Min.   : 24.30  
##  1st Qu.:262337    1st Qu.:96.08   1st Qu.:7.925   1st Qu.: 41.97  
##  Median :366294    Median :96.53   Median :8.500   Median : 54.48  
##  Mean   :433856    Mean   :96.48   Mean   :8.450   Mean   : 65.16  
##  3rd Qu.:632356    3rd Qu.:97.01   3rd Qu.:8.975   3rd Qu.: 72.31  
##  Max.   :785654    Max.   :97.83   Max.   :9.600   Max.   :172.87  
##    Innovación    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio 
##  Min.   :11.28   Min.   :120.5    Min.   : 8.04         Min.   : 8.06  
##  1st Qu.:12.60   1st Qu.:148.3    1st Qu.:10.40         1st Qu.:10.75  
##  Median :13.09   Median :181.8    Median :16.93         Median :13.02  
##  Mean   :13.10   Mean   :185.4    Mean   :17.28         Mean   :13.91  
##  3rd Qu.:13.61   3rd Qu.:209.9    3rd Qu.:22.34         3rd Qu.:18.49  
##  Max.   :15.11   Max.   :314.8    Max.   :29.59         Max.   :20.66  
##  Densidad_Carretera Densidad_Población CO2_Emisiones   PIB_Per_Cápita  
##  Min.   :0.05210    Min.   :47.44      Min.   :3.592   Min.   :126739  
##  1st Qu.:0.05953    1st Qu.:52.77      1st Qu.:3.842   1st Qu.:130964  
##  Median :0.06990    Median :58.09      Median :3.925   Median :136846  
##  Mean   :0.07106    Mean   :57.33      Mean   :3.944   Mean   :138550  
##  3rd Qu.:0.08273    3rd Qu.:61.39      3rd Qu.:4.088   3rd Qu.:146148  
##  Max.   :0.09020    Max.   :65.60      Max.   :4.221   Max.   :153236  
##       INPC        Crisis_Financiera
##  Min.   : 33.28   Min.   :0.00000  
##  1st Qu.: 56.15   1st Qu.:0.00000  
##  Median : 73.35   Median :0.00000  
##  Mean   : 75.17   Mean   :0.07692  
##  3rd Qu.: 91.29   3rd Qu.:0.00000  
##  Max.   :126.48   Max.   :1.00000

summary(data$IED_Flujos_MXN)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  210876  368560  497054  493596  578606  754438

summary(log(data$IED_Flujos_MXN))

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.26   12.82   13.12   13.06   13.27   13.53

Measures of dispersion

describe(log(data))

## # A tibble: 18 × 26
##    described_variables       n    na    mean        sd    se_mean       IQR
##    <chr>                 <int> <int>   <dbl>     <dbl>      <dbl>     <dbl>
##  1 Año                      26     0    7.61   0.00381   0.000746   0.00622
##  2 IED_Flujos               26     0   10.1    0.385     0.0755     0.410  
##  3 IED_Flujos_MXN           26     0   13.1    0.317     0.0623     0.451  
##  4 Exportaciones            26     0    9.95   0.501     0.0983     0.869  
##  5 Exportaciones_MXN        26     0   12.9    0.447     0.0877     0.879  
##  6 Empleo                   26     0    4.57   0.00748   0.00147    0.00958
##  7 Educación                26     0    2.13   0.0831    0.0163     0.124  
##  8 Salario_Diario           26     0    4.06   0.480     0.0942     0.544  
##  9 Innovación               26     0    2.57   0.0819    0.0161     0.0771 
## 10 Inseguridad_Robo         26     0    5.19   0.244     0.0478     0.348  
## 11 Inseguridad_Homicidio    26     0    2.77   0.422     0.0828     0.765  
## 12 Tipo_de_Cambio           26     0    2.59   0.293     0.0575     0.541  
## 13 Densidad_Carretera       26     0   -2.66   0.189     0.0370     0.329  
## 14 Densidad_Población       26     0    4.04   0.0959    0.0188     0.151  
## 15 CO2_Emisiones            26     0    1.37   0.0460    0.00902    0.0620 
## 16 PIB_Per_Cápita           26     0   11.8    0.0635    0.0125     0.110  
## 17 INPC                     26     0    4.26   0.347     0.0680     0.486  
## 18 Crisis_Financiera        26     0 -Inf    NaN       NaN        NaN      
## # ℹ 19 more variables: skewness <dbl>, kurtosis <dbl>, p00 <dbl>, p01 <dbl>,
## #   p05 <dbl>, p10 <dbl>, p20 <dbl>, p25 <dbl>, p30 <dbl>, p40 <dbl>,
## #   p50 <dbl>, p60 <dbl>, p70 <dbl>, p75 <dbl>, p80 <dbl>, p90 <dbl>,
## #   p95 <dbl>, p99 <dbl>, p100 <dbl>

Data Visualization

Histogram of dependent variable

ggplot(data =  data, aes(IED_Flujos_MXN)) +
  geom_histogram(fill = 'lightgreen', color='black')+
  labs(x = 'Foreign Investment Flow in Pesos') +
  ggtitle('Foreign Investment Flow Histogram in MXN')

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(data, aes(x = log(IED_Flujos_MXN))) +
  geom_histogram(fill = "lightgreen", color = "black") +
  labs(x = "Log of Foreign Investment Flow in Pesos", y = "Frequency") +
  ggtitle("Histogram of Log Foreign Investment Flow in Pesos")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Graphs to analyze variables

ggplot(data = data, aes(x = IED_Flujos, y = Exportaciones)) +
  geom_point() +
  labs(x = "FDI Flows (Millions of Dollars)",
       y = "Exports (Millions of Dollars)",
       title = "Scatter Plot: FDI Flows vs. Exports")

ggplot(data = data, aes(x = factor(Crisis_Financiera), y = IED_Flujos)) +
  geom_boxplot() +
  labs(x = "Finance Crisis",
       y = "FDI Flows (Millions of Dollars)",
       title = "Box Plot: FDI Flows by Finance Crisis")

ggplot(data = data, aes(x = Densidad_Población, y = IED_Flujos)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(x = "Population Density",
       y = "FDI Flows (Millions of Dollars)",
       title = "Linear Regression Plot: FDI Flows vs. Population Density")

## `geom_smooth()` using formula = 'y ~ x'

data %>% 
  gather(key, val, -Año, -IED_Flujos, -Exportaciones) %>%
  ggplot(aes(x=val)) +
  geom_histogram(fill = "lightgreen", color="black" ) +
  facet_wrap(~key, scales = "free")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Display correlation plot

datacor<-data
corrplot(cor(datacor),method = "color",
         type = "full", order = "hclust", addCoef.col = "black",
         tl.col = "black", tl.srt = 90, diag = FALSE, number.cex = 0.5)

colnames(data)

##  [1] "Año"                   "IED_Flujos"            "IED_Flujos_MXN"       
##  [4] "Exportaciones"         "Exportaciones_MXN"     "Empleo"               
##  [7] "Educación"             "Salario_Diario"        "Innovación"           
## [10] "Inseguridad_Robo"      "Inseguridad_Homicidio" "Tipo_de_Cambio"       
## [13] "Densidad_Carretera"    "Densidad_Población"    "CO2_Emisiones"        
## [16] "PIB_Per_Cápita"        "INPC"                  "Crisis_Financiera"

Which estimation method will be used to estimate the linear regression model?

We use the Ordinary Least Squares (OLS) method learned in class. It describes the relationship between one or more quantitative independent variables and the dependent variable (simple or multiple linear regression) by choosing the coefficients that minimize the sum of squared errors (SSE).
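
As a minimal sketch of what least squares does, the coefficients that minimize the SSE can be obtained from the normal equations, b = (X'X)^(-1) X'y, and compared with `lm()`. The data are simulated and all variable names are hypothetical.

```r
set.seed(123)

# Simulated data (hypothetical): y depends linearly on x1 and x2 plus noise
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - 0.5 * x2 + rnorm(n, sd = 0.3)

# Design matrix with an intercept column
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)

# OLS estimate from the normal equations: solve (X'X) b = X'y
beta_hat <- solve(crossprod(X), crossprod(X, y))

# Sum of squared errors at the OLS solution
sse <- sum((y - X %*% beta_hat)^2)

# lm() should give the same coefficients (up to numerical precision)
fit <- lm(y ~ x1 + x2)
print(cbind(normal_equations = drop(beta_hat), lm = coef(fit)))
print(sse)
```

In practice `lm()` is preferable because it uses a numerically more stable decomposition; the explicit formula is shown only to make the minimization concrete.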

P4. Regression Analysis

Hypothesis

- H0: Higher non-oil exports (millions of pesos) do not have a negative effect on foreign investment flows (millions of pesos).
- H1: Higher non-oil exports (millions of pesos) have a negative effect on foreign investment flows (millions of pesos).
- H0: A financial crisis does not have a positive effect on foreign investment flows (millions of pesos).
- H1: A financial crisis has a positive effect on foreign investment flows (millions of pesos).
- H0: Population density does not have an effect on foreign investment flows (millions of pesos).
- H1: Population density has an effect on foreign investment flows (millions of pesos).

- MODEL 1

model1 <- lm(log(IED_Flujos_MXN) ~ Exportaciones_MXN + Salario_Diario +
               Tipo_de_Cambio + Densidad_Población + Densidad_Carretera +
               Educación + Inseguridad_Robo + Inseguridad_Homicidio +
               PIB_Per_Cápita, data = data)
summary(model1)

## 
## Call:
## lm(formula = log(IED_Flujos_MXN) ~ Exportaciones_MXN + Salario_Diario + 
##     Tipo_de_Cambio + Densidad_Población + Densidad_Carretera + 
##     Educación + Inseguridad_Robo + Inseguridad_Homicidio + PIB_Per_Cápita, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.33651 -0.09014 -0.02813  0.10138  0.39561 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            8.473e+00  3.547e+00   2.389   0.0296 *
## Exportaciones_MXN     -4.338e-06  2.790e-06  -1.555   0.1395  
## Salario_Diario        -3.538e-03  7.376e-03  -0.480   0.6380  
## Tipo_de_Cambio         1.051e-01  7.342e-02   1.432   0.1714  
## Densidad_Población    -1.883e-03  8.137e-02  -0.023   0.9818  
## Densidad_Carretera     3.975e+01  4.834e+01   0.822   0.4230  
## Educación             -4.126e-01  5.635e-01  -0.732   0.4747  
## Inseguridad_Robo      -2.631e-03  2.109e-03  -1.247   0.2302  
## Inseguridad_Homicidio  2.231e-03  2.288e-02   0.098   0.9235  
## PIB_Per_Cápita         4.662e-05  3.226e-05   1.445   0.1677  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2101 on 16 degrees of freedom
## Multiple R-squared:  0.7197, Adjusted R-squared:  0.562 
## F-statistic: 4.565 on 9 and 16 DF,  p-value: 0.004104

Diagnostic Tests for Model 1

vif(model1)

##     Exportaciones_MXN        Salario_Diario        Tipo_de_Cambio 
##             167.64883              39.61086              52.58801 
##    Densidad_Población    Densidad_Carretera             Educación 
##             109.81181             236.68622              86.80646 
##      Inseguridad_Robo Inseguridad_Homicidio        PIB_Per_Cápita 
##               5.72725              15.02041              46.29822
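
Most of the VIF values above are far beyond the common rule-of-thumb threshold of 10. As a sketch of what the statistic measures, the VIF of regressor j equals 1 / (1 - R_j^2), where R_j^2 comes from regressing that regressor on all the others. The example uses simulated, hypothetical data and base R only.

```r
set.seed(42)

# Simulated regressors (hypothetical): x2 is built to be highly collinear with x1
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing x_j on the other regressors
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
print(round(vif_manual, 2))
```

The collinear pair (`x1`, `x2`) gets large VIFs while the independent `x3` stays close to 1, which is the pattern to look for when deciding which regressors to drop or transform.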

bptest(model1)

## 
##  studentized Breusch-Pagan test
## 
## data:  model1
## BP = 6.2615, df = 9, p-value = 0.7135

cat("AIC:", AIC(model1),"\n")

## AIC: 2.027688

selected_model1 <- model1
cat("RMSE:", RMSE(selected_model1$fitted.values, data$IED_Flujos_MXN)) # root mean squared error; note that the fitted values are in logs and the observed values in levels

## RMSE: 513342.8
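
The RMSE call above compares fitted values on the log scale (around 13) with observed flows in levels (hundreds of thousands), so the result mostly reflects the size of the observed series rather than the model's errors. A hedged sketch of two consistent alternatives, on simulated data and with a hypothetical helper `rmse()`: compute the error on the log scale, or back-transform the fitted values with `exp()` before comparing.

```r
set.seed(1)

# Simulated data (hypothetical): a positive outcome modeled in logs
n <- 40
x <- runif(n, 1, 10)
y <- exp(12 + 0.1 * x + rnorm(n, sd = 0.2))
fit <- lm(log(y) ~ x)

# Hypothetical helper: root mean squared error between two vectors
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))

# Mismatched scales: log-scale fitted values vs. observed levels
rmse_mixed <- rmse(fitted(fit), y)

# Consistent option 1: both on the log scale
rmse_log <- rmse(fitted(fit), log(y))

# Consistent option 2: both in levels (fitted values back-transformed)
rmse_level <- rmse(exp(fitted(fit)), y)

print(c(mixed = rmse_mixed, log = rmse_log, level = rmse_level))
```

When models with a lagged regressor are compared, the observed vector would also need to be restricted to the rows actually used in the fit, for example through `model.frame(fit)`.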

plot(model1)

hist(model1$residuals)

shapiro.test(model1$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model1$residuals
## W = 0.97442, p-value = 0.7393

- MODEL 2

model2 <- lm(log(IED_Flujos_MXN) ~ log(lag(IED_Flujos_MXN)) +
               log(Exportaciones_MXN) + Educación + I(Educación^2) +
               I(Exportaciones^2) + log(Tipo_de_Cambio) + log(Densidad_Población) +
               Crisis_Financiera + log(PIB_Per_Cápita) + log(Salario_Diario) +
               Inseguridad_Homicidio, data = data)
summary(model2)

## 
## Call:
## lm(formula = log(IED_Flujos_MXN) ~ log(lag(IED_Flujos_MXN)) + 
##     log(Exportaciones_MXN) + Educación + I(Educación^2) + I(Exportaciones^2) + 
##     log(Tipo_de_Cambio) + log(Densidad_Población) + Crisis_Financiera + 
##     log(PIB_Per_Cápita) + log(Salario_Diario) + Inseguridad_Homicidio, 
##     data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.38260 -0.07348  0.00691  0.07571  0.25273 
## 
## Coefficients:
##                            Estimate Std. Error t value Pr(>|t|)  
## (Intercept)              -5.891e+01  5.594e+01  -1.053   0.3115  
## log(lag(IED_Flujos_MXN)) -1.849e-01  2.466e-01  -0.750   0.4668  
## log(Exportaciones_MXN)   -3.356e+00  1.305e+00  -2.571   0.0233 *
## Educación                 4.893e+00  6.697e+00   0.731   0.4779  
## I(Educación^2)           -2.780e-01  3.695e-01  -0.752   0.4653  
## I(Exportaciones^2)        1.027e-09  8.358e-10   1.229   0.2408  
## log(Tipo_de_Cambio)       3.223e+00  1.277e+00   2.523   0.0255 *
## log(Densidad_Población)   3.068e+00  1.110e+01   0.276   0.7865  
## Crisis_Financiera        -1.627e-01  2.184e-01  -0.745   0.4696  
## log(PIB_Per_Cápita)       6.736e+00  4.563e+00   1.476   0.1637  
## log(Salario_Diario)      -1.223e+00  1.591e+00  -0.768   0.4560  
## Inseguridad_Homicidio     1.190e-03  2.026e-02   0.059   0.9541  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1913 on 13 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.792,  Adjusted R-squared:  0.616 
## F-statistic:   4.5 on 11 and 13 DF,  p-value: 0.006194
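
The note "1 observation deleted due to missingness" suggests that `lag()` resolved here to `dplyr::lag()`, which shifts the values and leaves an NA in the first row. With this many packages attached, which `lag()` is found depends on the load order, and `stats::lag()` behaves differently: it only changes the time index and leaves the values of a plain vector in place. A small base R sketch, with a hypothetical helper `lag1()` that makes the shift explicit:

```r
# Hypothetical short series
x <- c(10, 12, 15, 11, 14)

# stats::lag() only shifts the time attributes; the values stay in place
x_stats <- as.numeric(stats::lag(x, 1))

# Hypothetical helper: shift values down by one position, padding with NA
lag1 <- function(v) c(NA, head(v, -1))
x_shift <- lag1(x)

print(rbind(original = x, stats_lag = x_stats, shifted = x_shift))

# In a regression, the shifted version drops the first observation
d   <- data.frame(y = x, y_lag = x_shift)
fit <- lm(y ~ y_lag, data = d)
print(nobs(fit))
```

Writing `dplyr::lag()` explicitly in the formula, or building the lagged column beforehand, would make the model independent of the package load order.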

Diagnostic Tests for Model 2

vif(model2)

## log(lag(IED_Flujos_MXN))   log(Exportaciones_MXN)                Educación 
##                 4.140384               216.357363             12796.415174 
##           I(Educación^2)       I(Exportaciones^2)      log(Tipo_de_Cambio) 
##             11183.397028               169.220090                84.108497 
##  log(Densidad_Población)        Crisis_Financiera      log(PIB_Per_Cápita) 
##               653.472774                 2.399339                53.578935 
##      log(Salario_Diario)    Inseguridad_Homicidio 
##               344.646336                14.111623

bptest(model2)

## 
##  studentized Breusch-Pagan test
## 
## data:  model2
## BP = 11.176, df = 11, p-value = 0.4287

cat("AIC:", AIC(model2),"\n")

## AIC: -2.09908

selected_model2 <- model2
cat("RMSE:", RMSE(selected_model2$fitted.values, data$IED_Flujos_MXN)) # root mean squared error; note that the fitted values are in logs and the observed values in levels

## Warning in pred - obs: longer object length is not a multiple of shorter object
## length

## RMSE: 513342.8

plot(model2)

hist(model2$residuals)

shapiro.test(model2$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model2$residuals
## W = 0.97045, p-value = 0.6565

- MODEL 3

model3 <- lm(log(IED_Flujos_MXN) ~ log(lag(IED_Flujos_MXN)) +
               log(Exportaciones_MXN) + Tipo_de_Cambio + log(Densidad_Población) +
               Crisis_Financiera + log(PIB_Per_Cápita) + log(Salario_Diario) +
               Inseguridad_Homicidio, data = data)
summary(model3)

## 
## Call:
## lm(formula = log(IED_Flujos_MXN) ~ log(lag(IED_Flujos_MXN)) + 
##     log(Exportaciones_MXN) + Tipo_de_Cambio + log(Densidad_Población) + 
##     Crisis_Financiera + log(PIB_Per_Cápita) + log(Salario_Diario) + 
##     Inseguridad_Homicidio, data = data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.34465 -0.08638 -0.00315  0.10122  0.33533 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)   
## (Intercept)              -37.35332   18.16190  -2.057  0.05640 . 
## log(lag(IED_Flujos_MXN))  -0.20985    0.19890  -1.055  0.30708   
## log(Exportaciones_MXN)    -2.28851    0.65174  -3.511  0.00289 **
## Tipo_de_Cambio             0.15130    0.05195   2.912  0.01018 * 
## log(Densidad_Población)    6.82734    2.61992   2.606  0.01911 * 
## Crisis_Financiera         -0.17269    0.18002  -0.959  0.35168   
## log(PIB_Per_Cápita)        4.59791    2.05420   2.238  0.03977 * 
## log(Salario_Diario)       -0.31766    0.40823  -0.778  0.44784   
## Inseguridad_Homicidio     -0.01238    0.01047  -1.182  0.25440   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1775 on 16 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.7797, Adjusted R-squared:  0.6696 
## F-statistic: 7.079 on 8 and 16 DF,  p-value: 0.000477
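Because the response is in logs, the coefficients above read as percentage effects rather than unit effects. The sketch below converts three of the reported estimates, holding the other regressors constant. The helper names `pct_loglog` and `pct_loglevel` are hypothetical, and the numbers are copied from the summary output, not re-estimated.

```r
# Coefficients copied from the summary(model3) output above.
b_density <- 6.82734    # log(Densidad_Población): log-log term
b_exports <- -2.28851   # log(Exportaciones_MXN):  log-log term
b_fx      <- 0.15130    # Tipo_de_Cambio: regressor in levels, response in logs

# Log-log term: exact % change in the response for a `pct`% rise in the regressor.
pct_loglog <- function(b, pct = 1) ((1 + pct / 100)^b - 1) * 100

# Log-level term: exact % change in the response for a one-unit rise.
pct_loglevel <- function(b) (exp(b) - 1) * 100

cat("1% more population density:", round(pct_loglog(b_density), 2), "% change in FDI\n")
cat("1% more exports:           ", round(pct_loglog(b_exports), 2), "% change in FDI\n")
cat("Exchange rate up one unit: ", round(pct_loglevel(b_fx), 2), "% change in FDI\n")
```

For small changes the log-log result is close to the coefficient itself, which is why such coefficients are usually read directly as elasticities.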

Diagnosis Test Model 3

vif(model3)

## log(lag(IED_Flujos_MXN))   log(Exportaciones_MXN)           Tipo_de_Cambio 
##                 3.129644                62.660354                33.852695 
##  log(Densidad_Población)        Crisis_Financiera      log(PIB_Per_Cápita) 
##                42.322720                 1.893510                12.615028 
##      log(Salario_Diario)    Inseguridad_Homicidio 
##                26.364516                 4.379455
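For reference, the standard variance inflation factor of a regressor is 1 / (1 - R²), where R² comes from regressing that regressor on all the others. The sketch below reproduces this definition on synthetic data. `vif_manual` is a hypothetical helper, and it is assumed, not confirmed by the source, that the `vif` call above uses the same definition for single-term predictors.

```r
# Minimal sketch with synthetic data: x2 is built to be nearly collinear with x1.
set.seed(42)
n  <- 100
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)
x3 <- rnorm(n)
toy <- data.frame(x1, x2, x3)

# Hypothetical helper: VIF of each column from its auxiliary regression.
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    # Regress column v on every other column and take the R-squared.
    aux <- lm(reformulate(setdiff(names(df), v), response = v), data = df)
    1 / (1 - summary(aux)$r.squared)
  })
}

v <- vif_manual(toy)
print(round(v, 2))
```

A common rule of thumb treats values above 10 as a sign of serious multicollinearity, which several of the regressors in model 3 exceed.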

bptest(model3)

## 
##  studentized Breusch-Pagan test
## 
## data:  model3
## BP = 9.3066, df = 8, p-value = 0.3171

cat("AIC:", AIC(model3),"\n")

## AIC: -6.661234

selected_model3<-model3
cat("RMSE:",RMSE(selected_model3$fitted.values,data$IED_Flujos_MXN)) ### Root Mean Square

## RMSE: 513342.8

plot(model3)

hist(model3$residuals)

shapiro.test(model3$residuals)

## 
##  Shapiro-Wilk normality test
## 
## data:  model3$residuals
## W = 0.97629, p-value = 0.8032

effect_plot(model3,pred=Densidad_Población, data=data, interval=TRUE)

effect_plot(model3,pred=Exportaciones_MXN, data=data, interval=TRUE)

effect_plot(model3,pred=PIB_Per_Cápita, data=data, interval=TRUE)

Extra

Autocorrelation

acf(data$IED_Flujos_MXN, main = "Autocorrelation")

Box.test(model3$residuals, lag=5, type="Ljung-Box")

## 
##  Box-Ljung test
## 
## data:  model3$residuals
## X-squared = 4.068, df = 5, p-value = 0.5397
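The null hypothesis of the Ljung-Box test is that there is no autocorrelation up to the chosen lag, so a large p-value means the null is not rejected. The sketch below shows the decision rule on a simulated series that is autocorrelated by construction, and then applies it to the p-value reported above. `interpret_lb` is a hypothetical helper.

```r
# Simulated AR(1) series with strong positive autocorrelation (synthetic data).
set.seed(123)
ar1 <- as.numeric(arima.sim(model = list(ar = 0.8), n = 200))
lb  <- Box.test(ar1, lag = 5, type = "Ljung-Box")

# Hypothetical helper: turn a Ljung-Box p-value into a verbal decision.
interpret_lb <- function(p, alpha = 0.05) {
  if (p < alpha) {
    "reject H0: evidence of autocorrelation"
  } else {
    "fail to reject H0: no evidence of autocorrelation"
  }
}

cat("Simulated AR(1):              ", interpret_lb(lb$p.value), "\n")
cat("Model 3 residuals (p = 0.5397):", interpret_lb(0.5397), "\n")
```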

Selection of best regression model

Model selected: Model 3

According to the results of the analysis, model 3 is the best fit. It has the lowest AIC (-6.661234) and the best adjusted R-squared (0.6696), even though its multiple R-squared is not the highest (model 2 has a higher one). The three models report the same RMSE of 513342.8, so RMSE does not help to discriminate between them; an identical value is consistent with the RMSE having been computed between log-scale fitted values and the observed series in levels.

Model 3 also shows the least multicollinearity of the three (see the VIF results in each set of diagnostic tests), although several of its VIFs remain high, for example 62.66 for log(Exportaciones_MXN). Its residuals show the least evidence against normality (Shapiro-Wilk p-value = 0.8032). Model 2 has a higher Breusch-Pagan p-value (0.4287 versus 0.3171), but both are well above 0.05, so homoscedasticity is not rejected for either model. Overall, model 3 is the better choice: its multiple R-squared is only about 0.1 lower than that of model 2, whereas the difference in VIFs is very large, and severe multicollinearity reduces the precision of the coefficient estimates.

The effect plots confirm the direction of the three largest effects. Because both the dependent variable and these regressors are in logs, the coefficients are elasticities: holding the other regressors constant, a 1% increase in population density is associated with an increase of about 6.8% in FDI flows, a 1% increase in exports with a decrease of about 2.3%, and a 1% increase in GDP per capita (PIB_Per_Cápita) with an increase of about 4.6%. Finally, the Ljung-Box test on the residuals of model 3 gives a p-value of 0.5397 > 0.05, so the null hypothesis of no autocorrelation is not rejected: there is no evidence of autocorrelation in the residuals.
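The comparison above can be made easier to audit by collecting the criteria in one table. The sketch below does this for three illustrative models fitted to the built-in `mtcars` data, not the FDI models. `compare_models` is a hypothetical helper, and it uses only base R, so the Breusch-Pagan and VIF columns are left out.

```r
# Three illustrative models with the same response and the same observations,
# which is what makes their AIC values comparable.
fits <- list(
  m1 = lm(log(mpg) ~ log(wt), data = mtcars),
  m2 = lm(log(mpg) ~ log(wt) + log(hp), data = mtcars),
  m3 = lm(log(mpg) ~ log(wt) + log(hp) + am, data = mtcars)
)

# Hypothetical helper: one row of selection criteria per fitted model.
compare_models <- function(models) {
  data.frame(
    model  = names(models),
    n_obs  = sapply(models, nobs),
    AIC    = sapply(models, AIC),
    adj_R2 = sapply(models, function(m) summary(m)$adj.r.squared),
    SW_p   = sapply(models, function(m) shapiro.test(residuals(m))$p.value),
    row.names = NULL
  )
}

tab <- compare_models(fits)
print(tab[order(tab$AIC), ])   # lowest AIC first
```

The `n_obs` column is included because a lagged regressor drops the first observation, and AIC values are only directly comparable when the models are fitted to the same rows.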

P5.Conclusions

Insights

Now that the data analysis is finished, the insights can be listed:

The three variables with the largest effect on the dependent variable (FDI flows, IED_Flujos_MXN) are GDP per capita (PIB_Per_Cápita), population density and exports.
Exports: When a country exports far more than it imports, it has a large influence on the supply of certain products, which means a loss when that supply increases. A country that exports in large quantities may also be vulnerable to global trends, which runs directly against the idea of nearshoring.
GDP per capita (PIB_Per_Cápita): A high and stable GDP per capita means there is more money to construct buildings and houses or to buy machinery, and that more goods and services will be produced. This benefits everyone, because there will be more employment and more business opportunities, so a foreign company faces lower risk when entering Mexico.
Population density: A dense population may represent a larger potential market for the products and services of foreign companies. Companies may be more willing to invest in countries with a large population, since this widens the consumer base for their products. In terms of labor, a higher population density can also mean greater availability of skilled and unskilled workers, which can be attractive to foreign companies seeking access to a large and diverse workforce.

References

Arriaga, C. (2017). Inversión extranjera directa en México: comparación entre la inversión procedente de los Estados Unidos y del resto del mundo. Retrieved from: https://www.scielo.org.mx/scielo.php?script=sci_arttext&pid=S0185-013X2017000200317

Dabla-Norris, E., & Duval, R. (2016). La reducción de las barreras comerciales puede reactivar la productividad y el crecimiento mundial. Retrieved from: https://www.imf.org/es/Blogs/Articles/2016/06/20/how-lowering-trade-barriers-can-revive-global-productivity-and-growth

León, J. (2010). Economía aplicada. Retrieved from: https://economia.unmsm.edu.pe/org/arch_doc/JLeonM/publ/Interiores_Economia_Aplicada.pdf

Palmer, P. (2009). Regression Analysis for Prediction: Understanding the Process. Retrieved from: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2845248/#:~:text=In%20most%20cases%2C%20the%20investigators,more%20independent%20(predictor)%20variables.

Saucedo, D. (2023). Mexico and Its Attractiveness for Nearshoring. Retrieved from: https://cic.itesm.mx/Paginas/Pagina-DocumentoCic.aspx?id=1860