Part 1. Introduction

Time series analysis is a statistical method for analyzing and interpreting data points collected over time, typically at regular intervals. It focuses on understanding and modeling the patterns, trends, and dependencies within a time-ordered dataset. Time series data arise in many fields, including finance, economics, and weather forecasting, which makes time series analysis a key tool for forecasting future values and making informed decisions.

One widely recognized reference for time series analysis is the book “Time Series Analysis and Its Applications: With R Examples” by Robert H. Shumway and David S. Stoffer. This book provides a comprehensive introduction to time series analysis, covering topics such as data decomposition, trend identification, seasonality detection, and various statistical methods for modeling and forecasting time series data. Time series analysis plays a pivotal role in understanding historical patterns, making predictions, and informing decision-making processes across numerous domains.
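To make the decomposition idea concrete, the sketch below builds a quarterly `ts` object from the first four years of the quarterly FDI series loaded later in this report (1999 I to 2002 IV) and splits it with base R's `decompose()` into trend, seasonal, and random components. An additive model is assumed, and four years is barely enough data, so the output is illustrative rather than a finding. Classical decomposition is also sensitive to outliers such as the 2001 III value.

```r
# Quarterly FDI flows, 1999 I to 2002 IV (values copied from df_time_series below)
ied <- c(3596.08, 3395.89, 3028.45, 3939.90,
         4600.64, 4857.42, 3056.95, 5733.68,
         3598.68, 5218.83, 16314.05, 4925.63,
         5067.98, 6258.52, 6114.34, 6658.37)

# frequency = 4 marks the data as quarterly; start = c(1999, 1) is the first quarter of 1999
ied_ts <- ts(ied, start = c(1999, 1), frequency = 4)

# Classical additive decomposition: observed = trend + seasonal + random
dec <- decompose(ied_ts, type = "additive")

dec$trend    # centred moving average; NA for the first and last two quarters
dec$figure   # the four estimated quarterly effects (they sum to zero)
```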

Part 2. Background

Sustainability: Sustainability and environmental considerations were gaining importance in nearshoring decisions, with companies looking for partners who embraced eco-friendly practices.

Part 3. Problem Situation

The Case Study poses two problems. The first is to explain why Mexico is an attractive country for nearshoring, as the title of the Case Study states. The second is to determine which econometric model(s) should be applied to predict the effect of nearshoring in Mexico and, from that, what investors may consider when relocating their investments to Mexico in 2023 and in the years ahead.

This problem will be approached first by analyzing the case study together with some basic background information on the topic. Next, the provided database will be analyzed and tested in RStudio, using “Flujos de Inversion Extranjera Directa” (foreign direct investment flows, `IED_Flujos` in the data) as the dependent variable. Finally, different tests and models will be applied to estimate and predict the effect of nearshoring in Mexico, identifying which variable has the greatest impact on the dependent variable in order to reach a conclusion and an accurate interpretation of the results.
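One common way to make "which variable impacts the dependent variable the most" concrete is to compare standardized regression coefficients: fit the model after rescaling every variable to mean 0 and standard deviation 1, so each coefficient is the effect of a one-standard-deviation change. The sketch below uses simulated data (the names `y`, `x1`, `x2` and the true effects are made up for illustration), not the case-study database, and it ignores the time-series issues (trends, autocorrelation) that a real FDI model would have to address.

```r
set.seed(123)
n  <- 200
x1 <- rnorm(n)                       # hypothetical regressor with a large effect
x2 <- rnorm(n, sd = 5)               # hypothetical regressor measured on a wider scale
y  <- 3 * x1 + 0.1 * x2 + rnorm(n)   # simulated outcome

sim <- data.frame(y, x1, x2)

# Raw coefficients depend on each variable's units, so they are not directly comparable
raw_fit <- lm(y ~ x1 + x2, data = sim)

# Standardized coefficients: effect of a one-SD change, comparable across regressors
std_fit  <- lm(y ~ x1 + x2, data = as.data.frame(scale(sim)))
std_beta <- coef(std_fit)[-1]        # drop the intercept (zero after standardizing)

names(which.max(abs(std_beta)))      # regressor with the largest standardized effect
```

The same comparison on the real data would only be meaningful once the regression itself passes the diagnostic checks (lmtest, car, olsrr) loaded below.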


Part 4. Data and Methodology

Import libraries

library(foreign)      # read data stored by other software (Stata, SPSS, SAS, ...)
library(dplyr)        # data manipulation 
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(forcats)      # to work with categorical variables
library(ggplot2)      # data visualization 
library(readr)        # read rectangular data such as csv files
library(janitor)      # data exploration and cleaning 
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(Hmisc)        # several useful functions for data analysis 
## 
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
## 
##     src, summarize
## The following objects are masked from 'package:base':
## 
##     format.pval, units
library(psych)        # functions for multivariate analysis 
## 
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
## 
##     describe
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(naniar)       # summaries and visualization of missing values (NAs)
library(corrplot)     # correlation plots
## corrplot 0.92 loaded
library(jtools)       # presentation of regression analysis 
## 
## Attaching package: 'jtools'
## The following object is masked from 'package:Hmisc':
## 
##     %nin%
library(lmtest)       # diagnostic checks - linear regression analysis 
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
library(car)          # diagnostic checks - linear regression analysis
## Loading required package: carData
## 
## Attaching package: 'car'
## The following object is masked from 'package:psych':
## 
##     logit
## The following object is masked from 'package:dplyr':
## 
##     recode
library(olsrr)        # diagnostic checks - linear regression analysis 
## 
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
## 
##     rivers
library(stargazer)    # create publication quality tables
## 
## Please cite as:
##  Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
##  R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(effects)      # effect displays for linear and other regression models
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(tidyverse)    # collection of R packages designed for data science
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ✔ stringr   1.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%()       masks ggplot2::%+%()
## ✖ psych::alpha()     masks ggplot2::alpha()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ car::recode()      masks dplyr::recode()
## ✖ purrr::some()      masks car::some()
## ✖ Hmisc::src()       masks dplyr::src()
## ✖ Hmisc::summarize() masks dplyr::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(caret)        # Classification and Regression Training 
## Loading required package: lattice
## 
## Attaching package: 'caret'
## 
## The following object is masked from 'package:purrr':
## 
##     lift
library(glmnet)       # lasso and elastic-net regularized regression, with cross-validation (cv.glmnet)
## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## Loaded glmnet 4.1-7
library(xts)          # extensible time series objects
## 
## ######################### Warning from 'xts' package ##########################
## #                                                                             #
## # The dplyr lag() function breaks how base R's lag() function is supposed to  #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or       #
## # source() into this session won't work correctly.                            #
## #                                                                             #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop           #
## # dplyr from breaking base R's lag() function.                                #
## #                                                                             #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning.  #
## #                                                                             #
## ###############################################################################
## 
## Attaching package: 'xts'
## 
## The following objects are masked from 'package:dplyr':
## 
##     first, last
library(tseries)      # time series tools, including unit-root tests such as adf.test()
## Registered S3 method overwritten by 'quantmod':
##   method            from
##   as.zoo.data.frame zoo
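The `xts` warning above matters for this report because the quarterly FDI series may be lagged later. Since `dplyr` masks `stats::lag()`, a safe habit is to call the namespace-qualified function explicitly on time series objects. The sketch below uses only base R and a toy quarterly series (`x` is illustrative, not case-study data).

```r
# Toy quarterly series: 1999 Q1 to 2000 Q4
x <- ts(1:8, start = c(1999, 1), frequency = 4)

# stats::lag(x, k = -1) shifts the time base so that, at time t, the series holds x[t-1]
x_lag1 <- stats::lag(x, k = -1)

start(x_lag1)   # 1999 Q2: the first lagged value belongs to the next quarter

# ts.union aligns both series on a common time axis, padding with NA
aligned <- ts.union(x = x, x_lag1 = x_lag1)
head(aligned, 3)
```

Calling `stats::lag()` explicitly keeps the result correct regardless of the order in which packages were attached.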

Load database

#file.choose()
df <- read.csv("/Users/genarorodriguezalcantara/Desktop/Tec/Introduction to econometrics/Ev1/ev2_data.csv")
df # Confirm that the database has been loaded correctly.
##    periodo IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## 1     1997   12145.60       9087.62     NA      7.20          24.30      11.30
## 2     1998    8373.50       9875.07     NA      7.31          31.91      11.37
## 3     1999   13960.32      10990.01     NA      7.43          31.91      12.46
## 4     2000   18248.69      12482.96  97.83      7.56          35.12      13.15
## 5     2001   30057.18      11300.44  97.36      7.68          37.57      13.47
## 6     2002   24099.21      11923.10  97.66      7.80          39.74      12.80
## 7     2003   18249.97      13156.00  97.06      7.93          41.53      11.81
## 8     2004   25015.57      13573.13  96.48      8.04          43.30      12.61
## 9     2005   25795.82      16465.81  97.17      8.14          45.24      13.41
## 10    2006   21232.54      17485.93  96.53      8.26          47.05      14.23
## 11    2007   32393.33      19103.85  96.60      8.36          48.88      15.04
## 12    2008   29502.46      16924.76  95.68      8.46          50.84      14.82
## 13    2009   17849.95      19702.63  95.20      8.56          53.19      12.59
## 14    2010   27189.28      22673.14  95.06      8.63          55.77      12.69
## 15    2011   25632.52      24333.02  95.49      8.75          58.06      12.10
## 16    2012   21769.32      26297.98  95.53      8.85          60.75      13.03
## 17    2013   48354.42      27687.57  95.75      8.95          63.12      13.22
## 18    2014   30351.25      31676.78  96.24      9.05          65.58      13.65
## 19    2015   35943.75      29959.94  96.04      9.15          70.10      15.11
## 20    2016   31188.98      31375.06  96.62      9.25          73.04      14.40
## 21    2017   34017.05      33322.62  96.85      9.35          88.36      14.05
## 22    2018   34100.43      35341.90  96.64      9.45          88.36      13.25
## 23    2019   34577.16      36414.73  97.09      9.58         102.68      12.70
## 24    2020   28205.89      41077.34  96.21        NA         123.22      11.28
## 25    2021   31553.52      44914.78  96.49        NA         141.70         NA
## 26    2022   36215.37      46477.59  97.24        NA         172.87         NA
##    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera
## 1            266.51                 14.55           8.06               0.05
## 2            314.78                 14.32           9.94               0.05
## 3            272.89                 12.64           9.52               0.06
## 4            216.98                 10.86           9.60               0.06
## 5            214.53                 10.25           9.17               0.06
## 6            197.80                  9.94          10.36               0.06
## 7            183.22                  9.81          11.20               0.06
## 8            146.28                  8.92          11.22               0.06
## 9            136.94                  9.22          10.71               0.06
## 10           135.59                  9.60          10.88               0.06
## 11           145.92                  8.04          10.90               0.06
## 12           158.17                 12.52          13.77               0.07
## 13           175.77                 17.46          13.04               0.07
## 14           201.94                 22.43          12.38               0.07
## 15           212.61                 23.42          13.98               0.07
## 16           190.28                 22.09          12.99               0.07
## 17           185.56                 19.74          13.07               0.08
## 18           154.41                 16.93          14.73               0.08
## 19           180.44                 17.37          17.34               0.08
## 20           160.57                 20.31          20.66               0.08
## 21           230.43                 26.22          19.74               0.09
## 22           184.25                 29.59          19.66               0.09
## 23           173.45                 29.21          18.87               0.09
## 24           133.90                 28.98          19.94               0.09
## 25           127.13                 27.89          20.52               0.09
## 26           120.49                    NA          19.41               0.09
##    Densidad_Poblacion CO2_Emisiones PIB_Per_Capita   INPC
## 1               47.44          3.68       127570.1  33.28
## 2               48.76          3.85       126738.8  39.47
## 3               49.48          3.69       129164.7  44.34
## 4               50.58          3.87       130874.9  48.31
## 5               51.28          3.81       128083.4  50.43
## 6               51.95          3.82       128205.9  53.31
## 7               52.61          3.95       128737.9  55.43
## 8               53.27          3.98       132563.5  58.31
## 9               54.78          4.10       132941.1  60.25
## 10              55.44          4.19       135894.9  62.69
## 11              56.17          4.22       137795.7  65.05
## 12              56.96          4.19       135176.0  69.30
## 13              57.73          4.04       131233.0  71.77
## 14              58.45          4.11       134991.7  74.93
## 15              59.15          4.19       138891.9  77.79
## 16              59.85          4.20       141530.2  80.57
## 17              59.49          4.06       144112.0  83.77
## 18              60.17          3.89       147277.4  87.19
## 19              60.86          3.93       149433.5  89.05
## 20              61.57          3.89       152275.4  92.04
## 21              62.28          3.84       153235.7  98.27
## 22              63.11          3.65       153133.8  99.91
## 23              63.90          3.59       150233.1 105.93
## 24              64.59            NA       142609.3 109.27
## 25              65.16            NA       142772.0 117.31
## 26              65.60            NA       146826.7 126.48
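The printout shows gaps in several columns: `Empleo` is missing for 1997 to 1999, `Educacion` and `CO2_Emisiones` for 2020 to 2022, `Innovacion` for 2021 and 2022, and `Inseguridad_Homicidio` for 2022. This matters because `lm()` drops any row with an NA in a model variable by default, so a model using all of these columns would keep only 2000 to 2019. A base-R count per column is a quick check before modeling; the sketch runs it on a few rows copied from the printout (`d` is a hypothetical subset), and on the full data the same call would be `colSums(is.na(df))`.

```r
# Hypothetical subset copied from the printout (years 1997, 1998, 2020, 2022)
d <- data.frame(
  periodo    = c(1997, 1998, 2020, 2022),
  Empleo     = c(NA, NA, 96.21, 97.24),
  Educacion  = c(7.20, 7.31, NA, NA),
  Innovacion = c(11.30, 11.37, 11.28, NA)
)

colSums(is.na(d))        # number of NAs per column
sum(complete.cases(d))   # rows that lm() would keep if all columns were used
```

The `naniar` package loaded above offers `miss_var_summary()` for the same per-column summary in tabular form.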
df_cash <- read.csv("/Users/genarorodriguezalcantara/Desktop/Tec/Introduction to econometrics/Ev1/ev2_data.csv") # second copy of the data, used for the peso conversion
df_cash # Confirm that the database has been loaded correctly.
df_time_series <- read.csv("/Users/genarorodriguezalcantara/Desktop/Tec/Introduction to econometrics/Ev1/ev22_series_tiempo.csv")
df_time_series 
##     Año Trimestre IED_Flujos
## 1  1999         I    3596.08
## 2  1999        II    3395.89
## 3  1999       III    3028.45
## 4  1999        IV    3939.90
## 5  2000         I    4600.64
## 6  2000        II    4857.42
## 7  2000       III    3056.95
## 8  2000        IV    5733.68
## 9  2001         I    3598.68
## 10 2001        II    5218.83
## 11 2001       III   16314.05
## 12 2001        IV    4925.63
## 13 2002         I    5067.98
## 14 2002        II    6258.52
## 15 2002       III    6114.34
## 16 2002        IV    6658.37
## 17 2003         I    3963.69
## 18 2003        II    5547.34
## 19 2003       III    2521.68
## 20 2003        IV    6217.27
## 21 2004         I    9363.46
## 22 2004        II    4351.50
## 23 2004       III    3284.91
## 24 2004        IV    8015.70
## 25 2005         I    6761.62
## 26 2005        II    6773.62
## 27 2005       III    5478.92
## 28 2005        IV    6781.66
## 29 2006         I    7436.81
## 30 2006        II    6634.31
## 31 2006       III    2346.57
## 32 2006        IV    4814.85
## 33 2007         I   10815.78
## 34 2007        II    6137.64
## 35 2007       III    7628.45
## 36 2007        IV    7811.47
## 37 2008         I    8546.44
## 38 2008        II    8376.14
## 39 2008       III    5643.68
## 40 2008        IV    6936.20
## 41 2009         I    6105.30
## 42 2009        II    6094.18
## 43 2009       III    2397.52
## 44 2009        IV    3252.95
## 45 2010         I    8722.26
## 46 2010        II    9301.62
## 47 2010       III    3932.89
## 48 2010        IV    5232.50
## 49 2011         I    8431.21
## 50 2011        II    6697.41
## 51 2011       III    4349.89
## 52 2011        IV    6154.00
## 53 2012         I    7892.69
## 54 2012        II    5622.13
## 55 2012       III    5736.29
## 56 2012        IV    2518.21
## 57 2013         I   10571.58
## 58 2013        II   21019.14
## 59 2013       III    4178.81
## 60 2013        IV   12584.89
## 61 2014         I   13828.00
## 62 2014        II    5478.74
## 63 2014       III    3222.08
## 64 2014        IV    7822.43
## 65 2015         I   12136.33
## 66 2015        II    6656.85
## 67 2015       III    9635.56
## 68 2015        IV    7515.02
## 69 2016         I   12805.36
## 70 2016        II    6210.62
## 71 2016       III    4317.27
## 72 2016        IV    7855.73
## 73 2017         I   13779.06
## 74 2017        II    6814.16
## 75 2017       III    6361.22
## 76 2017        IV    7062.61
## 77 2018         I   14067.52
## 78 2018        II    9577.75
## 79 2018       III    4132.97
## 80 2018        IV    6322.19
## 81 2019         I   15175.27
## 82 2019        II    6504.62
## 83 2019       III    8217.40
## 84 2019        IV    4679.87
## 85 2020         I   16807.60
## 86 2020        II    7293.96
## 87 2020       III    1340.58
## 88 2020        IV    2763.75
## 89 2021         I   16206.05
## 90 2021        II    5883.73
## 91 2021       III    6419.43
## 92 2021        IV    3044.31
## 93 2022         I   22794.16
## 94 2022        II    8164.27
## 95 2022       III    3479.68
## 96 2023        IV    1777.26
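Before treating `df_time_series` as a regular quarterly series, it is worth checking that the quarters are consecutive. In the printout, the last row is labelled 2023 IV although it directly follows 2022 III, so that label should be verified against the source. The sketch below maps each (year, quarter) pair to a running quarter index and flags any step that is not exactly 1; it uses the last five rows copied from the printout, with the hypothetical column name `Anio` standing in for `Año`.

```r
# Last five rows copied from the printout (Anio stands in for the Año column)
trim <- data.frame(
  Anio       = c(2021, 2022, 2022, 2022, 2023),
  Trimestre  = c("IV", "I", "II", "III", "IV"),
  IED_Flujos = c(3044.31, 22794.16, 8164.27, 3479.68, 1777.26)
)

# Roman numeral -> quarter number (1 to 4)
q_num <- match(trim$Trimestre, c("I", "II", "III", "IV"))

# Running quarter index: consecutive quarters differ by exactly 1
idx  <- trim$Anio * 4 + (q_num - 1)
gaps <- which(diff(idx) != 1)
gaps   # position(s) after which the sequence breaks

# Quarterly ts object starting at the first row's year and quarter
ied_q <- ts(trim$IED_Flujos, start = c(trim$Anio[1], q_num[1]), frequency = 4)
```

Once the labels are confirmed, the full series could be converted with `ts(df_time_series$IED_Flujos, start = c(1999, 1), frequency = 4)`.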
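# Convert the dollar series to pesos (multiply by Tipo_de_Cambio) and deflate by the
# consumer price index (divide by INPC/100) to express them in real terms
df_cash$IED_FlujosMX <- ((df_cash$IED_Flujos * df_cash$Tipo_de_Cambio)/df_cash$INPC)*100
df_cash$ExportacionesMX <- ((df_cash$Exportaciones * df_cash$Tipo_de_Cambio)/df_cash$INPC)*100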
df_cash
##    periodo IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## 1     1997   12145.60       9087.62     NA      7.20          24.30      11.30
## 2     1998    8373.50       9875.07     NA      7.31          31.91      11.37
## 3     1999   13960.32      10990.01     NA      7.43          31.91      12.46
## 4     2000   18248.69      12482.96  97.83      7.56          35.12      13.15
## 5     2001   30057.18      11300.44  97.36      7.68          37.57      13.47
## 6     2002   24099.21      11923.10  97.66      7.80          39.74      12.80
## 7     2003   18249.97      13156.00  97.06      7.93          41.53      11.81
## 8     2004   25015.57      13573.13  96.48      8.04          43.30      12.61
## 9     2005   25795.82      16465.81  97.17      8.14          45.24      13.41
## 10    2006   21232.54      17485.93  96.53      8.26          47.05      14.23
## 11    2007   32393.33      19103.85  96.60      8.36          48.88      15.04
## 12    2008   29502.46      16924.76  95.68      8.46          50.84      14.82
## 13    2009   17849.95      19702.63  95.20      8.56          53.19      12.59
## 14    2010   27189.28      22673.14  95.06      8.63          55.77      12.69
## 15    2011   25632.52      24333.02  95.49      8.75          58.06      12.10
## 16    2012   21769.32      26297.98  95.53      8.85          60.75      13.03
## 17    2013   48354.42      27687.57  95.75      8.95          63.12      13.22
## 18    2014   30351.25      31676.78  96.24      9.05          65.58      13.65
## 19    2015   35943.75      29959.94  96.04      9.15          70.10      15.11
## 20    2016   31188.98      31375.06  96.62      9.25          73.04      14.40
## 21    2017   34017.05      33322.62  96.85      9.35          88.36      14.05
## 22    2018   34100.43      35341.90  96.64      9.45          88.36      13.25
## 23    2019   34577.16      36414.73  97.09      9.58         102.68      12.70
## 24    2020   28205.89      41077.34  96.21        NA         123.22      11.28
## 25    2021   31553.52      44914.78  96.49        NA         141.70         NA
## 26    2022   36215.37      46477.59  97.24        NA         172.87         NA
##    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera
## 1            266.51                 14.55           8.06               0.05
## 2            314.78                 14.32           9.94               0.05
## 3            272.89                 12.64           9.52               0.06
## 4            216.98                 10.86           9.60               0.06
## 5            214.53                 10.25           9.17               0.06
## 6            197.80                  9.94          10.36               0.06
## 7            183.22                  9.81          11.20               0.06
## 8            146.28                  8.92          11.22               0.06
## 9            136.94                  9.22          10.71               0.06
## 10           135.59                  9.60          10.88               0.06
## 11           145.92                  8.04          10.90               0.06
## 12           158.17                 12.52          13.77               0.07
## 13           175.77                 17.46          13.04               0.07
## 14           201.94                 22.43          12.38               0.07
## 15           212.61                 23.42          13.98               0.07
## 16           190.28                 22.09          12.99               0.07
## 17           185.56                 19.74          13.07               0.08
## 18           154.41                 16.93          14.73               0.08
## 19           180.44                 17.37          17.34               0.08
## 20           160.57                 20.31          20.66               0.08
## 21           230.43                 26.22          19.74               0.09
## 22           184.25                 29.59          19.66               0.09
## 23           173.45                 29.21          18.87               0.09
## 24           133.90                 28.98          19.94               0.09
## 25           127.13                 27.89          20.52               0.09
## 26           120.49                    NA          19.41               0.09
##    Densidad_Poblacion CO2_Emisiones PIB_Per_Capita   INPC IED_FlujosMX
## 1               47.44          3.68       127570.1  33.28     294151.2
## 2               48.76          3.85       126738.8  39.47     210875.6
## 3               49.48          3.69       129164.7  44.34     299734.4
## 4               50.58          3.87       130874.9  48.31     362631.8
## 5               51.28          3.81       128083.4  50.43     546548.4
## 6               51.95          3.82       128205.9  53.31     468332.0
## 7               52.61          3.95       128737.9  55.43     368752.8
## 8               53.27          3.98       132563.5  58.31     481349.2
## 9               54.78          4.10       132941.1  60.25     458544.8
## 10              55.44          4.19       135894.9  62.69     368495.8
## 11              56.17          4.22       137795.7  65.05     542793.7
## 12              56.96          4.19       135176.0  69.30     586217.7
## 13              57.73          4.04       131233.0  71.77     324318.4
## 14              58.45          4.11       134991.7  74.93     449223.7
## 15              59.15          4.19       138891.9  77.79     460653.8
## 16              59.85          4.20       141530.2  80.57     350978.6
## 17              59.49          4.06       144112.0  83.77     754437.5
## 18              60.17          3.89       147277.4  87.19     512758.2
## 19              60.86          3.93       149433.5  89.05     699904.1
## 20              61.57          3.89       152275.4  92.04     700091.6
## 21              62.28          3.84       153235.7  98.27     683318.0
## 22              63.11          3.65       153133.8  99.91     671018.4
## 23              63.90          3.59       150233.1 105.93     615945.4
## 24              64.59            NA       142609.3 109.27     514711.7
## 25              65.16            NA       142772.0 117.31     551937.8
## 26              65.60            NA       146826.7 126.48     555771.9
##    ExportacionesMX
## 1         220090.8
## 2         248690.6
## 3         235960.5
## 4         248057.2
## 5         205482.9
## 6         231707.6
## 7         265825.7
## 8         261173.9
## 9         292695.1
## 10        303472.5
## 11        320110.6
## 12        336297.2
## 13        357980.1
## 14        374607.6
## 15        437299.9
## 16        423992.5
## 17        431988.2
## 18        535151.9
## 19        583386.1
## 20        704268.5
## 21        669368.6
## 22        695447.7
## 23        648679.3
## 24        749594.7
## 25        785654.5
## 26        713259.0
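The conversion above computes IED_FlujosMX = IED_Flujos × Tipo_de_Cambio / INPC × 100 (and likewise for exports). Multiplying by the exchange rate converts dollars to pesos, assuming Tipo_de_Cambio is quoted in pesos per dollar (the data do not state this), and dividing by INPC/100 deflates to constant prices of the INPC base period. The sketch wraps the formula in a small hypothetical helper, `to_real_pesos()`, and reproduces the 1997 values from the printout.

```r
# Hypothetical helper: nominal dollars -> real pesos (prices of the INPC base period)
to_real_pesos <- function(usd, fx, inpc) {
  pesos <- usd * fx        # dollars -> nominal pesos
  pesos / (inpc / 100)     # deflate by the price index (base = 100)
}

# 1997 row of the data: IED_Flujos = 12145.60, Exportaciones = 9087.62,
# Tipo_de_Cambio = 8.06, INPC = 33.28
ied_1997 <- to_real_pesos(12145.60, 8.06, 33.28)
exp_1997 <- to_real_pesos(9087.62, 8.06, 33.28)
c(ied_1997, exp_1997)
```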
# Drop the original dollar columns, now replaced by the real-peso versions
df_cash$IED_Flujos <- NULL
df_cash$Exportaciones <- NULL
df_cash
##    periodo Empleo Educacion Salario_Diario Innovacion Inseguridad_Robo
## 1     1997     NA      7.20          24.30      11.30           266.51
## 2     1998     NA      7.31          31.91      11.37           314.78
## 3     1999     NA      7.43          31.91      12.46           272.89
## 4     2000  97.83      7.56          35.12      13.15           216.98
## 5     2001  97.36      7.68          37.57      13.47           214.53
## 6     2002  97.66      7.80          39.74      12.80           197.80
## 7     2003  97.06      7.93          41.53      11.81           183.22
## 8     2004  96.48      8.04          43.30      12.61           146.28
## 9     2005  97.17      8.14          45.24      13.41           136.94
## 10    2006  96.53      8.26          47.05      14.23           135.59
## 11    2007  96.60      8.36          48.88      15.04           145.92
## 12    2008  95.68      8.46          50.84      14.82           158.17
## 13    2009  95.20      8.56          53.19      12.59           175.77
## 14    2010  95.06      8.63          55.77      12.69           201.94
## 15    2011  95.49      8.75          58.06      12.10           212.61
## 16    2012  95.53      8.85          60.75      13.03           190.28
## 17    2013  95.75      8.95          63.12      13.22           185.56
## 18    2014  96.24      9.05          65.58      13.65           154.41
## 19    2015  96.04      9.15          70.10      15.11           180.44
## 20    2016  96.62      9.25          73.04      14.40           160.57
## 21    2017  96.85      9.35          88.36      14.05           230.43
## 22    2018  96.64      9.45          88.36      13.25           184.25
## 23    2019  97.09      9.58         102.68      12.70           173.45
## 24    2020  96.21        NA         123.22      11.28           133.90
## 25    2021  96.49        NA         141.70         NA           127.13
## 26    2022  97.24        NA         172.87         NA           120.49
##    Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## 1                  14.55           8.06               0.05              47.44
## 2                  14.32           9.94               0.05              48.76
## 3                  12.64           9.52               0.06              49.48
## 4                  10.86           9.60               0.06              50.58
## 5                  10.25           9.17               0.06              51.28
## 6                   9.94          10.36               0.06              51.95
## 7                   9.81          11.20               0.06              52.61
## 8                   8.92          11.22               0.06              53.27
## 9                   9.22          10.71               0.06              54.78
## 10                  9.60          10.88               0.06              55.44
## 11                  8.04          10.90               0.06              56.17
## 12                 12.52          13.77               0.07              56.96
## 13                 17.46          13.04               0.07              57.73
## 14                 22.43          12.38               0.07              58.45
## 15                 23.42          13.98               0.07              59.15
## 16                 22.09          12.99               0.07              59.85
## 17                 19.74          13.07               0.08              59.49
## 18                 16.93          14.73               0.08              60.17
## 19                 17.37          17.34               0.08              60.86
## 20                 20.31          20.66               0.08              61.57
## 21                 26.22          19.74               0.09              62.28
## 22                 29.59          19.66               0.09              63.11
## 23                 29.21          18.87               0.09              63.90
## 24                 28.98          19.94               0.09              64.59
## 25                 27.89          20.52               0.09              65.16
## 26                    NA          19.41               0.09              65.60
##    CO2_Emisiones PIB_Per_Capita   INPC IED_FlujosMX ExportacionesMX
## 1           3.68       127570.1  33.28     294151.2        220090.8
## 2           3.85       126738.8  39.47     210875.6        248690.6
## 3           3.69       129164.7  44.34     299734.4        235960.5
## 4           3.87       130874.9  48.31     362631.8        248057.2
## 5           3.81       128083.4  50.43     546548.4        205482.9
## 6           3.82       128205.9  53.31     468332.0        231707.6
## 7           3.95       128737.9  55.43     368752.8        265825.7
## 8           3.98       132563.5  58.31     481349.2        261173.9
## 9           4.10       132941.1  60.25     458544.8        292695.1
## 10          4.19       135894.9  62.69     368495.8        303472.5
## 11          4.22       137795.7  65.05     542793.7        320110.6
## 12          4.19       135176.0  69.30     586217.7        336297.2
## 13          4.04       131233.0  71.77     324318.4        357980.1
## 14          4.11       134991.7  74.93     449223.7        374607.6
## 15          4.19       138891.9  77.79     460653.8        437299.9
## 16          4.20       141530.2  80.57     350978.6        423992.5
## 17          4.06       144112.0  83.77     754437.5        431988.2
## 18          3.89       147277.4  87.19     512758.2        535151.9
## 19          3.93       149433.5  89.05     699904.1        583386.1
## 20          3.89       152275.4  92.04     700091.6        704268.5
## 21          3.84       153235.7  98.27     683318.0        669368.6
## 22          3.65       153133.8  99.91     671018.4        695447.7
## 23          3.59       150233.1 105.93     615945.4        648679.3
## 24            NA       142609.3 109.27     514711.7        749594.7
## 25            NA       142772.0 117.31     551937.8        785654.5
## 26            NA       146826.7 126.48     555771.9        713259.0

Exploratory Data Analysis

Descriptive Statistics & Measures of Dispersion

str(df_cash)
## 'data.frame':    26 obs. of  15 variables:
##  $ periodo              : int  1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
##  $ Empleo               : num  NA NA NA 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
##  $ IED_FlujosMX         : num  294151 210876 299734 362632 546548 ...
##  $ ExportacionesMX      : num  220091 248691 235961 248057 205483 ...
str(df_time_series)
## 'data.frame':    96 obs. of  3 variables:
##  $ Año       : int  1999 1999 1999 1999 2000 2000 2000 2000 2001 2001 ...
##  $ Trimestre : chr  "I" "II" "III" "IV" ...
##  $ IED_Flujos: num  3596 3396 3028 3940 4601 ...
summary(df_cash)
##     periodo         Empleo        Educacion     Salario_Diario  
##  Min.   :1997   Min.   :95.06   Min.   :7.200   Min.   : 24.30  
##  1st Qu.:2003   1st Qu.:95.89   1st Qu.:7.865   1st Qu.: 41.97  
##  Median :2010   Median :96.53   Median :8.460   Median : 54.48  
##  Mean   :2010   Mean   :96.47   Mean   :8.423   Mean   : 65.16  
##  3rd Qu.:2016   3rd Qu.:97.08   3rd Qu.:9.000   3rd Qu.: 72.31  
##  Max.   :2022   Max.   :97.83   Max.   :9.580   Max.   :172.87  
##                 NA's   :3       NA's   :3                       
##    Innovacion    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio 
##  Min.   :11.28   Min.   :120.5    Min.   : 8.04         Min.   : 8.06  
##  1st Qu.:12.56   1st Qu.:148.3    1st Qu.:10.25         1st Qu.:10.75  
##  Median :13.09   Median :181.8    Median :16.93         Median :13.02  
##  Mean   :13.11   Mean   :185.4    Mean   :17.29         Mean   :13.91  
##  3rd Qu.:13.75   3rd Qu.:209.9    3rd Qu.:22.43         3rd Qu.:18.49  
##  Max.   :15.11   Max.   :314.8    Max.   :29.59         Max.   :20.66  
##  NA's   :2                        NA's   :1                            
##  Densidad_Carretera Densidad_Poblacion CO2_Emisiones   PIB_Per_Capita  
##  Min.   :0.05000    Min.   :47.44      Min.   :3.590   Min.   :126739  
##  1st Qu.:0.06000    1st Qu.:52.77      1st Qu.:3.830   1st Qu.:130964  
##  Median :0.07000    Median :58.09      Median :3.930   Median :136845  
##  Mean   :0.07115    Mean   :57.33      Mean   :3.945   Mean   :138550  
##  3rd Qu.:0.08000    3rd Qu.:61.39      3rd Qu.:4.105   3rd Qu.:146148  
##  Max.   :0.09000    Max.   :65.60      Max.   :4.220   Max.   :153236  
##                                        NA's   :3                       
##       INPC         IED_FlujosMX    ExportacionesMX 
##  Min.   : 33.28   Min.   :210876   Min.   :205483  
##  1st Qu.: 56.15   1st Qu.:368560   1st Qu.:262337  
##  Median : 73.35   Median :497054   Median :366294  
##  Mean   : 75.17   Mean   :493596   Mean   :433856  
##  3rd Qu.: 91.29   3rd Qu.:578606   3rd Qu.:632356  
##  Max.   :126.48   Max.   :754438   Max.   :785654  
## 
summary(df_time_series)
##       Año        Trimestre           IED_Flujos   
##  Min.   :1999   Length:96          Min.   : 1341  
##  1st Qu.:2005   Class :character   1st Qu.: 4351  
##  Median :2010   Mode  :character   Median : 6238  
##  Mean   :2011                      Mean   : 7036  
##  3rd Qu.:2016                      3rd Qu.: 8053  
##  Max.   :2023                      Max.   :22794
# Convert the integer year columns of both data frames to numeric.
df_cash$periodo <- as.numeric(df_cash$periodo)
df_time_series$Año <- as.numeric(df_time_series$Año)

# Recode the Roman-numeral quarters (I, II, III, IV) of the second data frame as 1-4
quarter_codes <- c(I = 1, II = 2, III = 3, IV = 4)
df_time_series$Trimestre <- unname(quarter_codes[df_time_series$Trimestre])
print(df_time_series)
##     Año Trimestre IED_Flujos
## 1  1999         1    3596.08
## 2  1999         2    3395.89
## 3  1999         3    3028.45
## 4  1999         4    3939.90
## 5  2000         1    4600.64
## 6  2000         2    4857.42
## 7  2000         3    3056.95
## 8  2000         4    5733.68
## 9  2001         1    3598.68
## 10 2001         2    5218.83
## 11 2001         3   16314.05
## 12 2001         4    4925.63
## 13 2002         1    5067.98
## 14 2002         2    6258.52
## 15 2002         3    6114.34
## 16 2002         4    6658.37
## 17 2003         1    3963.69
## 18 2003         2    5547.34
## 19 2003         3    2521.68
## 20 2003         4    6217.27
## 21 2004         1    9363.46
## 22 2004         2    4351.50
## 23 2004         3    3284.91
## 24 2004         4    8015.70
## 25 2005         1    6761.62
## 26 2005         2    6773.62
## 27 2005         3    5478.92
## 28 2005         4    6781.66
## 29 2006         1    7436.81
## 30 2006         2    6634.31
## 31 2006         3    2346.57
## 32 2006         4    4814.85
## 33 2007         1   10815.78
## 34 2007         2    6137.64
## 35 2007         3    7628.45
## 36 2007         4    7811.47
## 37 2008         1    8546.44
## 38 2008         2    8376.14
## 39 2008         3    5643.68
## 40 2008         4    6936.20
## 41 2009         1    6105.30
## 42 2009         2    6094.18
## 43 2009         3    2397.52
## 44 2009         4    3252.95
## 45 2010         1    8722.26
## 46 2010         2    9301.62
## 47 2010         3    3932.89
## 48 2010         4    5232.50
## 49 2011         1    8431.21
## 50 2011         2    6697.41
## 51 2011         3    4349.89
## 52 2011         4    6154.00
## 53 2012         1    7892.69
## 54 2012         2    5622.13
## 55 2012         3    5736.29
## 56 2012         4    2518.21
## 57 2013         1   10571.58
## 58 2013         2   21019.14
## 59 2013         3    4178.81
## 60 2013         4   12584.89
## 61 2014         1   13828.00
## 62 2014         2    5478.74
## 63 2014         3    3222.08
## 64 2014         4    7822.43
## 65 2015         1   12136.33
## 66 2015         2    6656.85
## 67 2015         3    9635.56
## 68 2015         4    7515.02
## 69 2016         1   12805.36
## 70 2016         2    6210.62
## 71 2016         3    4317.27
## 72 2016         4    7855.73
## 73 2017         1   13779.06
## 74 2017         2    6814.16
## 75 2017         3    6361.22
## 76 2017         4    7062.61
## 77 2018         1   14067.52
## 78 2018         2    9577.75
## 79 2018         3    4132.97
## 80 2018         4    6322.19
## 81 2019         1   15175.27
## 82 2019         2    6504.62
## 83 2019         3    8217.40
## 84 2019         4    4679.87
## 85 2020         1   16807.60
## 86 2020         2    7293.96
## 87 2020         3    1340.58
## 88 2020         4    2763.75
## 89 2021         1   16206.05
## 90 2021         2    5883.73
## 91 2021         3    6419.43
## 92 2021         4    3044.31
## 93 2022         1   22794.16
## 94 2022         2    8164.27
## 95 2022         3    3479.68
## 96 2023         4    1777.26
df_time_series$Trimestre <- as.numeric(df_time_series$Trimestre)
str(df_cash)
## 'data.frame':    26 obs. of  15 variables:
##  $ periodo              : num  1997 1998 1999 2000 2001 ...
##  $ Empleo               : num  NA NA NA 97.8 97.4 ...
##  $ Educacion            : num  7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
##  $ Salario_Diario       : num  24.3 31.9 31.9 35.1 37.6 ...
##  $ Innovacion           : num  11.3 11.4 12.5 13.2 13.5 ...
##  $ Inseguridad_Robo     : num  267 315 273 217 215 ...
##  $ Inseguridad_Homicidio: num  14.6 14.3 12.6 10.9 10.2 ...
##  $ Tipo_de_Cambio       : num  8.06 9.94 9.52 9.6 9.17 ...
##  $ Densidad_Carretera   : num  0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
##  $ Densidad_Poblacion   : num  47.4 48.8 49.5 50.6 51.3 ...
##  $ CO2_Emisiones        : num  3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
##  $ PIB_Per_Capita       : num  127570 126739 129165 130875 128083 ...
##  $ INPC                 : num  33.3 39.5 44.3 48.3 50.4 ...
##  $ IED_FlujosMX         : num  294151 210876 299734 362632 546548 ...
##  $ ExportacionesMX      : num  220091 248691 235961 248057 205483 ...
str(df_time_series)
## 'data.frame':    96 obs. of  3 variables:
##  $ Año       : num  1999 1999 1999 1999 2000 ...
##  $ Trimestre : num  1 2 3 4 1 2 3 4 1 2 ...
##  $ IED_Flujos: num  3596 3396 3028 3940 4601 ...
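Because `df_time_series` holds quarterly observations, base R's `ts` class can encode the year/quarter structure directly, which is what functions such as decompose() and acf() expect. The sketch below is a minimal illustration using only the first eight quarters printed above (1999 Q1 to 2000 Q4); the full series would be built the same way from `df_time_series$IED_Flujos`, assuming the rows are sorted by year and quarter.

```r
# First eight quarterly IED_Flujos values from df_time_series (1999 Q1 - 2000 Q4)
ied <- c(3596.08, 3395.89, 3028.45, 3939.90, 4600.64, 4857.42, 3056.95, 5733.68)

# Quarterly series: frequency = 4, starting in the first quarter of 1999
ied_ts <- ts(ied, start = c(1999, 1), frequency = 4)

print(ied_ts)
print(cycle(ied_ts))  # quarter position of each observation: 1 2 3 4 1 2 3 4
print(end(ied_ts))    # 2000 4
```

In the printed table the final row (row 96) is labelled 2023, quarter 4, directly after 2022 Q3. Ninety-six consecutive quarters starting in 1999 Q1 would end in 2022 Q4, so that label is worth checking against the source data before the full `ts` object is built.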
# Identify the names of the variables
colnames(df_cash)
##  [1] "periodo"               "Empleo"                "Educacion"            
##  [4] "Salario_Diario"        "Innovacion"            "Inseguridad_Robo"     
##  [7] "Inseguridad_Homicidio" "Tipo_de_Cambio"        "Densidad_Carretera"   
## [10] "Densidad_Poblacion"    "CO2_Emisiones"         "PIB_Per_Capita"       
## [13] "INPC"                  "IED_FlujosMX"          "ExportacionesMX"
colnames(df_time_series)
## [1] "Año"        "Trimestre"  "IED_Flujos"
# Identify missing values
df_missing_values <-  sum(is.na(df_cash))
df_missing_values
## [1] 12
df_time_series_missing_values <-  sum(is.na(df_time_series))
df_time_series_missing_values
## [1] 0
# The first database contains 12 missing values, which must be handled with an imputation method. The second database contains no NAs.
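A single total does not show which columns need imputation. Counting NAs per column makes that explicit. The sketch below uses a hypothetical toy data frame, `na_pattern`, that reproduces only the NA positions of the five affected `df_cash` columns from the printed table (rows 1-26 correspond to 1997-2022). Every non-missing cell is a placeholder value of 1.

```r
# Hypothetical toy data frame reproducing only the NA positions of df_cash
# (rows 1-26 = years 1997-2022); non-missing cells are placeholder 1s.
n <- 26
na_pattern <- data.frame(
  Empleo                = replace(rep(1, n), 1:3, NA),   # 1997-1999 missing
  Educacion             = replace(rep(1, n), 24:26, NA), # 2020-2022 missing
  Innovacion            = replace(rep(1, n), 25:26, NA), # 2021-2022 missing
  Inseguridad_Homicidio = replace(rep(1, n), 26, NA),    # 2022 missing
  CO2_Emisiones         = replace(rep(1, n), 24:26, NA)  # 2020-2022 missing
)

# Number of missing values in each column
na_per_column <- colSums(is.na(na_pattern))
print(na_per_column)

# The per-column counts add up to the overall total
print(sum(na_per_column))  # 12
```

On the real data the equivalent call is `colSums(is.na(df_cash))`.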

# Imputation of missing values: first compute the mean of each variable that has NAs.
mean_empleo <- mean(df_cash$Empleo, na.rm = TRUE)
mean_innovacion <-  mean(df_cash$Innovacion, na.rm = TRUE)
mean_inseguridad <- mean(df_cash$Inseguridad_Homicidio, na.rm = TRUE)
mean_co2 <- mean(df_cash$CO2_Emisiones, na.rm = TRUE)

# Impute the missing values with the mean of each variable.
df_cash$Empleo[is.na(df_cash$Empleo)] <- mean_empleo
df_cash$Innovacion[is.na(df_cash$Innovacion)] <- mean_innovacion
df_cash$Inseguridad_Homicidio[is.na(df_cash$Inseguridad_Homicidio)] <- mean_inseguridad
df_cash$CO2_Emisiones[is.na(df_cash$CO2_Emisiones)] <- mean_co2
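The four mean-imputation lines above repeat the same pattern. A reusable helper can apply that pattern to several columns at once. This is a minimal sketch: `impute_mean()` is a hypothetical name, not part of the original code, and it is demonstrated on a small made-up data frame so the result can be checked by hand.

```r
# Hypothetical helper: replace NAs in a numeric vector with the mean of its observed values
impute_mean <- function(x) {
  if (is.numeric(x)) {
    x[is.na(x)] <- mean(x, na.rm = TRUE)
  }
  x
}

# Toy example (made-up numbers): the mean of 2, 4 and 6 is 4
toy <- data.frame(a = c(2, NA, 4, 6), b = c(1, 1, NA, 1))
toy[] <- lapply(toy, impute_mean)  # `toy[] <-` keeps the data.frame structure
print(toy)
```

On `df_cash` this would be `cols <- c("Empleo", "Innovacion", "Inseguridad_Homicidio", "CO2_Emisiones")` followed by `df_cash[cols] <- lapply(df_cash[cols], impute_mean)`. A whole-period mean ignores any time trend, so for trending series it can differ noticeably from neighbouring years.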

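# Impute the column "Educacion" with linear interpolation.
# Note: approx() drops the NAs and, with n = length(x), spreads the 23 observed points
# evenly over all 26 rows, so observed years shift as well (1998: 7.31 -> 7.2968).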
ascending_educacion <- approx(seq_along(df_cash$Educacion), df_cash$Educacion, method = "linear", n = length(df_cash$Educacion))$y
df_cash$Educacion <- ascending_educacion
print(df_cash$Educacion)
##  [1] 7.2000 7.2968 7.4012 7.5132 7.6224 7.7280 7.8364 7.9476 8.0440 8.1320
## [11] 8.2360 8.3280 8.4160 8.5040 8.5824 8.6540 8.7580 8.8460 8.9340 9.0220
## [21] 9.1100 9.1980 9.2860 9.3740 9.4656 9.5800
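The three missing `Educacion` values fall at the end of the series (2020-2022), outside the observed range. Interpolation therefore cannot fill them: approx() does not extrapolate linearly, and its `rule = 2` option only repeats the endpoint value. One alternative is to fit a linear time trend on the observed years with lm() and predict only the missing years. The sketch below assumes that schooling keeps a roughly linear trend; the values are copied from the printed `df_cash` table.

```r
# Educacion values for 1997-2022 as printed in df_cash (NA for 2020-2022)
periodo <- 1997:2022
educacion <- c(7.20, 7.31, 7.43, 7.56, 7.68, 7.80, 7.93, 8.04, 8.14, 8.26,
               8.36, 8.46, 8.56, 8.63, 8.75, 8.85, 8.95, 9.05, 9.15, 9.25,
               9.35, 9.45, 9.58, NA, NA, NA)

is_missing <- is.na(educacion)

# Fit a linear time trend on the observed years only
trend <- lm(educacion ~ periodo, subset = !is_missing)

# Predict only the missing years; observed values are left untouched
educacion_imputed <- educacion
educacion_imputed[is_missing] <- predict(trend,
                                         newdata = data.frame(periodo = periodo[is_missing]))

print(round(educacion_imputed[is_missing], 2))  # 9.70 9.80 9.91
```

Whether a linear trend suits 2020-2022 is an assumption. The key property is that only the missing cells change.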
# Check that no missing values remain.
na_df_2 <- sum(is.na(df_cash))
na_df_2
## [1] 0
print(df_cash)
##    periodo   Empleo Educacion Salario_Diario Innovacion Inseguridad_Robo
## 1     1997 96.47043    7.2000          24.30   11.30000           266.51
## 2     1998 96.47043    7.2968          31.91   11.37000           314.78
## 3     1999 96.47043    7.4012          31.91   12.46000           272.89
## 4     2000 97.83000    7.5132          35.12   13.15000           216.98
## 5     2001 97.36000    7.6224          37.57   13.47000           214.53
## 6     2002 97.66000    7.7280          39.74   12.80000           197.80
## 7     2003 97.06000    7.8364          41.53   11.81000           183.22
## 8     2004 96.48000    7.9476          43.30   12.61000           146.28
## 9     2005 97.17000    8.0440          45.24   13.41000           136.94
## 10    2006 96.53000    8.1320          47.05   14.23000           135.59
## 11    2007 96.60000    8.2360          48.88   15.04000           145.92
## 12    2008 95.68000    8.3280          50.84   14.82000           158.17
## 13    2009 95.20000    8.4160          53.19   12.59000           175.77
## 14    2010 95.06000    8.5040          55.77   12.69000           201.94
## 15    2011 95.49000    8.5824          58.06   12.10000           212.61
## 16    2012 95.53000    8.6540          60.75   13.03000           190.28
## 17    2013 95.75000    8.7580          63.12   13.22000           185.56
## 18    2014 96.24000    8.8460          65.58   13.65000           154.41
## 19    2015 96.04000    8.9340          70.10   15.11000           180.44
## 20    2016 96.62000    9.0220          73.04   14.40000           160.57
## 21    2017 96.85000    9.1100          88.36   14.05000           230.43
## 22    2018 96.64000    9.1980          88.36   13.25000           184.25
## 23    2019 97.09000    9.2860         102.68   12.70000           173.45
## 24    2020 96.21000    9.3740         123.22   11.28000           133.90
## 25    2021 96.49000    9.4656         141.70   13.10583           127.13
## 26    2022 97.24000    9.5800         172.87   13.10583           120.49
##    Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## 1                14.5500           8.06               0.05              47.44
## 2                14.3200           9.94               0.05              48.76
## 3                12.6400           9.52               0.06              49.48
## 4                10.8600           9.60               0.06              50.58
## 5                10.2500           9.17               0.06              51.28
## 6                 9.9400          10.36               0.06              51.95
## 7                 9.8100          11.20               0.06              52.61
## 8                 8.9200          11.22               0.06              53.27
## 9                 9.2200          10.71               0.06              54.78
## 10                9.6000          10.88               0.06              55.44
## 11                8.0400          10.90               0.06              56.17
## 12               12.5200          13.77               0.07              56.96
## 13               17.4600          13.04               0.07              57.73
## 14               22.4300          12.38               0.07              58.45
## 15               23.4200          13.98               0.07              59.15
## 16               22.0900          12.99               0.07              59.85
## 17               19.7400          13.07               0.08              59.49
## 18               16.9300          14.73               0.08              60.17
## 19               17.3700          17.34               0.08              60.86
## 20               20.3100          20.66               0.08              61.57
## 21               26.2200          19.74               0.09              62.28
## 22               29.5900          19.66               0.09              63.11
## 23               29.2100          18.87               0.09              63.90
## 24               28.9800          19.94               0.09              64.59
## 25               27.8900          20.52               0.09              65.16
## 26               17.2924          19.41               0.09              65.60
##    CO2_Emisiones PIB_Per_Capita   INPC IED_FlujosMX ExportacionesMX
## 1       3.680000       127570.1  33.28     294151.2        220090.8
## 2       3.850000       126738.8  39.47     210875.6        248690.6
## 3       3.690000       129164.7  44.34     299734.4        235960.5
## 4       3.870000       130874.9  48.31     362631.8        248057.2
## 5       3.810000       128083.4  50.43     546548.4        205482.9
## 6       3.820000       128205.9  53.31     468332.0        231707.6
## 7       3.950000       128737.9  55.43     368752.8        265825.7
## 8       3.980000       132563.5  58.31     481349.2        261173.9
## 9       4.100000       132941.1  60.25     458544.8        292695.1
## 10      4.190000       135894.9  62.69     368495.8        303472.5
## 11      4.220000       137795.7  65.05     542793.7        320110.6
## 12      4.190000       135176.0  69.30     586217.7        336297.2
## 13      4.040000       131233.0  71.77     324318.4        357980.1
## 14      4.110000       134991.7  74.93     449223.7        374607.6
## 15      4.190000       138891.9  77.79     460653.8        437299.9
## 16      4.200000       141530.2  80.57     350978.6        423992.5
## 17      4.060000       144112.0  83.77     754437.5        431988.2
## 18      3.890000       147277.4  87.19     512758.2        535151.9
## 19      3.930000       149433.5  89.05     699904.1        583386.1
## 20      3.890000       152275.4  92.04     700091.6        704268.5
## 21      3.840000       153235.7  98.27     683318.0        669368.6
## 22      3.650000       153133.8  99.91     671018.4        695447.7
## 23      3.590000       150233.1 105.93     615945.4        648679.3
## 24      3.945217       142609.3 109.27     514711.7        749594.7
## 25      3.945217       142772.0 117.31     551937.8        785654.5
## 26      3.945217       146826.7 126.48     555771.9        713259.0
# Basic descriptive statistics and measures of dispersion
df_descriptive_statistics <- summary(df_cash)
df_descriptive_statistics
##     periodo         Empleo        Educacion     Salario_Diario  
##  Min.   :1997   Min.   :95.06   Min.   :7.200   Min.   : 24.30  
##  1st Qu.:2003   1st Qu.:96.08   1st Qu.:7.864   1st Qu.: 41.97  
##  Median :2010   Median :96.48   Median :8.460   Median : 54.48  
##  Mean   :2010   Mean   :96.47   Mean   :8.424   Mean   : 65.16  
##  3rd Qu.:2016   3rd Qu.:97.01   3rd Qu.:9.000   3rd Qu.: 72.31  
##  Max.   :2022   Max.   :97.83   Max.   :9.580   Max.   :172.87  
##    Innovacion    Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio 
##  Min.   :11.28   Min.   :120.5    Min.   : 8.04         Min.   : 8.06  
##  1st Qu.:12.60   1st Qu.:148.3    1st Qu.:10.40         1st Qu.:10.75  
##  Median :13.11   Median :181.8    Median :17.11         Median :13.02  
##  Mean   :13.11   Mean   :185.4    Mean   :17.29         Mean   :13.91  
##  3rd Qu.:13.61   3rd Qu.:209.9    3rd Qu.:22.34         3rd Qu.:18.49  
##  Max.   :15.11   Max.   :314.8    Max.   :29.59         Max.   :20.66  
##  Densidad_Carretera Densidad_Poblacion CO2_Emisiones   PIB_Per_Capita  
##  Min.   :0.05000    Min.   :47.44      Min.   :3.590   Min.   :126739  
##  1st Qu.:0.06000    1st Qu.:52.77      1st Qu.:3.842   1st Qu.:130964  
##  Median :0.07000    Median :58.09      Median :3.945   Median :136845  
##  Mean   :0.07115    Mean   :57.33      Mean   :3.945   Mean   :138550  
##  3rd Qu.:0.08000    3rd Qu.:61.39      3rd Qu.:4.090   3rd Qu.:146148  
##  Max.   :0.09000    Max.   :65.60      Max.   :4.220   Max.   :153236  
##       INPC         IED_FlujosMX    ExportacionesMX 
##  Min.   : 33.28   Min.   :210876   Min.   :205483  
##  1st Qu.: 56.15   1st Qu.:368560   1st Qu.:262337  
##  Median : 73.35   Median :497054   Median :366294  
##  Mean   : 75.17   Mean   :493596   Mean   :433856  
##  3rd Qu.: 91.29   3rd Qu.:578606   3rd Qu.:632356  
##  Max.   :126.48   Max.   :754438   Max.   :785654
df_describe <- describe(df_cash)  # describe() from the psych package
df_describe
##                       vars  n      mean        sd    median   trimmed       mad
## periodo                  1 26   2009.50      7.65   2009.50   2009.50      9.64
## Empleo                   2 26     96.47      0.72     96.48     96.48      0.76
## Educacion                3 26      8.42      0.71      8.46      8.43      0.88
## Salario_Diario           4 26     65.16     35.85     54.48     60.16     22.51
## Innovacion               5 26     13.11      1.07     13.11     13.09      0.79
## Inseguridad_Robo         6 26    185.42     47.67    181.83    181.16     47.06
## Inseguridad_Homicidio    7 26     17.29      7.12     17.11     16.99      9.31
## Tipo_de_Cambio           8 26     13.91      4.15     13.02     13.78      4.25
## Densidad_Carretera       9 26      0.07      0.01      0.07      0.07      0.01
## Densidad_Poblacion      10 26     57.33      5.41     58.09     57.44      6.68
## CO2_Emisiones           11 26      3.95      0.18      3.95      3.95      0.18
## PIB_Per_Capita          12 26 138550.10   8861.10 136845.30 138255.64  11080.42
## INPC                    13 26     75.17     24.81     73.35     74.45     27.14
## IED_FlujosMX            14 26 493596.02 143849.16 497053.70 494270.03 183243.92
## ExportacionesMX         15 26 433855.52 195018.66 366293.83 423610.02 184264.93
##                             min       max     range  skew kurtosis       se
## periodo                 1997.00   2022.00     25.00  0.00    -1.34     1.50
## Empleo                    95.06     97.83      2.77 -0.14    -0.73     0.14
## Educacion                  7.20      9.58      2.38 -0.09    -1.28     0.14
## Salario_Diario            24.30    172.87    148.57  1.43     1.44     7.03
## Innovacion                11.28     15.11      3.83  0.12    -0.70     0.21
## Inseguridad_Robo         120.49    314.78    194.29  0.89     0.30     9.35
## Inseguridad_Homicidio      8.04     29.59     21.55  0.38    -1.28     1.40
## Tipo_de_Cambio             8.06     20.66     12.60  0.44    -1.39     0.81
## Densidad_Carretera         0.05      0.09      0.04  0.19    -1.41     0.00
## Densidad_Poblacion        47.44     65.60     18.16 -0.19    -1.24     1.06
## CO2_Emisiones              3.59      4.22      0.63 -0.14    -0.95     0.04
## PIB_Per_Capita        126738.75 153235.73  26496.98  0.28    -1.41  1737.81
## INPC                      33.28    126.48     93.20  0.26    -0.95     4.87
## IED_FlujosMX          210875.58 754437.47 543561.89 -0.01    -1.00 28211.14
## ExportacionesMX       205482.92 785654.49 580171.58  0.48    -1.40 38246.31
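The variables sit on very different scales, from PIB_Per_Capita in the hundreds of thousands to Densidad_Carretera below 0.1. As a result, the `sd` column cannot be compared across rows. A scale-free measure of dispersion is the coefficient of variation, CV = sd / mean. The sketch below uses a hypothetical helper, `coef_var()`, on made-up data.

```r
# Hypothetical helper: coefficient of variation (sample sd divided by the mean)
coef_var <- function(x) sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)

# Toy data (made-up): same relative spread, very different scales
toy <- data.frame(small = c(8, 10, 12), large = c(8000, 10000, 12000))

print(sapply(toy, sd))        # 2 and 2000: not comparable across scales
print(sapply(toy, coef_var))  # 0.2 for both columns
```

On the real data this would be `sapply(df_cash[-1], coef_var)`, which drops `periodo`. From the describe() table above, IED_FlujosMX has a CV of about 143849.16 / 493596.02 ≈ 0.29. ExportacionesMX has a CV of about 195018.66 / 433855.52 ≈ 0.45.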
# var() on a data frame returns the variance-covariance matrix (variances on the diagonal)
df_variance <- var(df_cash)
df_variance
##                             periodo        Empleo     Educacion Salario_Diario
## periodo                    58.50000 -1.093600e+00  5.456280e+00   2.423374e+02
## Empleo                     -1.09360  5.193158e-01 -1.086377e-01   1.186760e+00
## Educacion                   5.45628 -1.086377e-01  5.098458e-01   2.239210e+01
## Salario_Diario            242.33740  1.186760e+00  2.239210e+01   1.285149e+03
## Innovacion                  2.09800  1.825317e-02  2.106674e-01   2.001901e+00
## Inseguridad_Robo         -214.76000 -1.448614e-01 -2.083824e+01  -9.258424e+02
## Inseguridad_Homicidio      42.46360 -1.639959e+00  3.881321e+00   1.660927e+02
## Tipo_de_Cambio             29.86600 -2.223791e-01  2.764304e+00   1.265472e+02
## Densidad_Carretera          0.09860 -1.105391e-03  9.144708e-03   4.111906e-01
## Densidad_Poblacion         41.20340 -9.326584e-01  3.854972e+00   1.672534e+02
## CO2_Emisiones               0.04560 -6.470219e-02  8.375443e-03  -5.454132e-01
## PIB_Per_Capita          60266.10260 -6.479285e+02  5.599582e+03   2.138486e+05
## INPC                      188.09020 -2.245450e+00  1.751434e+01   8.304526e+02
## IED_FlujosMX           754867.37199  3.266634e+03  7.102963e+04   2.483604e+06
## ExportacionesMX       1422733.70764 -1.178481e+04  1.312817e+05   6.160194e+06
##                          Innovacion Inseguridad_Robo Inseguridad_Homicidio
## periodo                2.098000e+00    -2.147600e+02          4.246360e+01
## Empleo                 1.825317e-02    -1.448614e-01         -1.639959e+00
## Educacion              2.106674e-01    -2.083824e+01          3.881321e+00
## Salario_Diario         2.001901e+00    -9.258424e+02          1.660927e+02
## Innovacion             1.149911e+00    -2.152945e+01         -1.255865e+00
## Inseguridad_Robo      -2.152945e+01     2.272065e+03         -2.656937e+01
## Inseguridad_Homicidio -1.255865e+00    -2.656937e+01          5.063701e+01
## Tipo_de_Cambio         9.687697e-01    -8.945536e+01          2.338468e+01
## Densidad_Carretera     3.090333e-03    -2.994803e-01          7.752704e-02
## Densidad_Poblacion     1.621845e+00    -1.589738e+02          2.957680e+01
## CO2_Emisiones          6.180296e-02    -3.594647e+00         -3.080327e-01
## PIB_Per_Capita         4.104525e+03    -1.700151e+05          4.447242e+04
## INPC                   5.854670e+00    -7.031076e+02          1.332051e+02
## IED_FlujosMX           9.019878e+04    -3.098033e+06          4.262488e+05
## ExportacionesMX        3.577342e+04    -4.159591e+06          1.138066e+06
##                       Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## periodo                 2.986600e+01       9.860000e-02       4.120340e+01
## Empleo                 -2.223791e-01      -1.105391e-03      -9.326584e-01
## Educacion               2.764304e+00       9.144708e-03       3.854972e+00
## Salario_Diario          1.265472e+02       4.111906e-01       1.672534e+02
## Innovacion              9.687697e-01       3.090333e-03       1.621845e+00
## Inseguridad_Robo       -8.945536e+01      -2.994803e-01      -1.589738e+02
## Inseguridad_Homicidio   2.338468e+01       7.752704e-02       2.957680e+01
## Tipo_de_Cambio          1.722372e+01       5.231600e-02       2.068166e+01
## Densidad_Carretera      5.231600e-02       1.786154e-04       6.856569e-02
## Densidad_Poblacion      2.068166e+01       6.856569e-02       2.928160e+01
## CO2_Emisiones          -1.174623e-01      -3.697391e-04       1.025527e-01
## PIB_Per_Capita          3.240201e+04       1.048757e+02       4.182754e+04
## INPC                    9.627926e+01       3.189026e-01       1.319716e+02
## IED_FlujosMX            4.029994e+05       1.385829e+03       5.227170e+05
## ExportacionesMX         7.950197e+05       2.470075e+03       9.814354e+05
##                       CO2_Emisiones PIB_Per_Capita          INPC  IED_FlujosMX
## periodo                4.560000e-02   6.026610e+04  1.880902e+02  7.548674e+05
## Empleo                -6.470219e-02  -6.479285e+02 -2.245450e+00  3.266634e+03
## Educacion              8.375443e-03   5.599582e+03  1.751434e+01  7.102963e+04
## Salario_Diario        -5.454132e-01   2.138486e+05  8.304526e+02  2.483604e+06
## Innovacion             6.180296e-02   4.104525e+03  5.854670e+00  9.019878e+04
## Inseguridad_Robo      -3.594647e+00  -1.700151e+05 -7.031076e+02 -3.098033e+06
## Inseguridad_Homicidio -3.080327e-01   4.447242e+04  1.332051e+02  4.262488e+05
## Tipo_de_Cambio        -1.174623e-01   3.240201e+04  9.627926e+01  4.029994e+05
## Densidad_Carretera    -3.697391e-04   1.048757e+02  3.189026e-01  1.385829e+03
## Densidad_Poblacion     1.025527e-01   4.182754e+04  1.319716e+02  5.227170e+05
## CO2_Emisiones          3.238296e-02  -1.716434e+02  1.572087e-02 -1.397987e+03
## PIB_Per_Capita        -1.716434e+02   7.851913e+07  1.871820e+05  9.938355e+08
## INPC                   1.572087e-02   1.871820e+05  6.154715e+02  2.334113e+06
## IED_FlujosMX          -1.397987e+03   9.938355e+08  2.334113e+06  2.069258e+10
## ExportacionesMX       -5.626649e+03   1.533901e+09  4.597023e+06  1.789059e+10
##                       ExportacionesMX
## periodo                  1.422734e+06
## Empleo                  -1.178481e+04
## Educacion                1.312817e+05
## Salario_Diario           6.160194e+06
## Innovacion               3.577342e+04
## Inseguridad_Robo        -4.159591e+06
## Inseguridad_Homicidio    1.138066e+06
## Tipo_de_Cambio           7.950197e+05
## Densidad_Carretera       2.470075e+03
## Densidad_Poblacion       9.814354e+05
## CO2_Emisiones           -5.626649e+03
## PIB_Per_Capita           1.533901e+09
## INPC                     4.597023e+06
## IED_FlujosMX             1.789059e+10
## ExportacionesMX          3.803228e+10
df_time_series_descriptive_statistics <- summary(df_time_series)
df_time_series_descriptive_statistics
##       Año         Trimestre      IED_Flujos   
##  Min.   :1999   Min.   :1.00   Min.   : 1341  
##  1st Qu.:2005   1st Qu.:1.75   1st Qu.: 4351  
##  Median :2010   Median :2.50   Median : 6238  
##  Mean   :2011   Mean   :2.50   Mean   : 7036  
##  3rd Qu.:2016   3rd Qu.:3.25   3rd Qu.: 8053  
##  Max.   :2023   Max.   :4.00   Max.   :22794
df_time_series_describe <- describe(df_time_series)
df_time_series_describe
##            vars  n    mean      sd median trimmed     mad     min      max
## Año           1 96 2010.51    6.98 2010.5 2010.50    8.90 1999.00  2023.00
## Trimestre     2 96    2.50    1.12    2.5    2.50    1.48    1.00     4.00
## IED_Flujos    3 96 7036.50 3978.53 6237.9 6458.65 2797.96 1340.58 22794.16
##               range skew kurtosis     se
## Año           24.00 0.01    -1.23   0.71
## Trimestre      3.00 0.00    -1.39   0.11
## IED_Flujos 21453.58 1.60     3.01 406.06
df_time_series_variance <- var(df_time_series)
df_time_series_variance
##                     Año     Trimestre   IED_Flujos
## Año        4.867357e+01  1.578947e-02     8212.462
## Trimestre  1.578947e-02  1.263158e+00    -1861.527
## IED_Flujos 8.212462e+03 -1.861527e+03 15828730.902
# Dependent variable: foreign direct investment flows (Inversión Extranjera Directa, IED)
df_cash$IED_FlujosMX
##  [1] 294151.2 210875.6 299734.4 362631.8 546548.4 468332.0 368752.8 481349.2
##  [9] 458544.8 368495.8 542793.7 586217.7 324318.4 449223.7 460653.8 350978.6
## [17] 754437.5 512758.2 699904.1 700091.6 683318.0 671018.4 615945.4 514711.7
## [25] 551937.8 555771.9
summary(df_cash$IED_FlujosMX)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  210876  368560  497054  493596  578606  754438
# Check unique values
unique_vals <- unique(df_time_series$Trimestre)
print(unique_vals)
## [1] 1 2 3 4
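# Map Roman-numeral quarter labels (I to IV, either case) to numeric codes.
# Trimestre already holds 1 to 4 here, so no values match, but the assignment still
# coerces the column to character, as the printed output shows.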
df_time_series$Trimestre[df_time_series$Trimestre == "I" | df_time_series$Trimestre == "i"] <- "01"
df_time_series$Trimestre[df_time_series$Trimestre == "II" | df_time_series$Trimestre == "ii"] <- "02"
df_time_series$Trimestre[df_time_series$Trimestre == "III" | df_time_series$Trimestre == "iii"] <- "03"
df_time_series$Trimestre[df_time_series$Trimestre == "IV" | df_time_series$Trimestre == "iv"] <- "04"

# Print the updated Trimestre column
print(df_time_series$Trimestre)
##  [1] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3"
## [20] "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2"
## [39] "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1"
## [58] "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4"
## [77] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3"
## [96] "4"
# Concatenate the 'Año' and 'Trimestre' columns into a "year/quarter" string
df_time_series$Date <- paste(df_time_series$Año, df_time_series$Trimestre, sep = "/")

# View the result
print(df_time_series)
##     Año Trimestre IED_Flujos   Date
## 1  1999         1    3596.08 1999/1
## 2  1999         2    3395.89 1999/2
## 3  1999         3    3028.45 1999/3
## 4  1999         4    3939.90 1999/4
## 5  2000         1    4600.64 2000/1
## 6  2000         2    4857.42 2000/2
## 7  2000         3    3056.95 2000/3
## 8  2000         4    5733.68 2000/4
## 9  2001         1    3598.68 2001/1
## 10 2001         2    5218.83 2001/2
## 11 2001         3   16314.05 2001/3
## 12 2001         4    4925.63 2001/4
## 13 2002         1    5067.98 2002/1
## 14 2002         2    6258.52 2002/2
## 15 2002         3    6114.34 2002/3
## 16 2002         4    6658.37 2002/4
## 17 2003         1    3963.69 2003/1
## 18 2003         2    5547.34 2003/2
## 19 2003         3    2521.68 2003/3
## 20 2003         4    6217.27 2003/4
## 21 2004         1    9363.46 2004/1
## 22 2004         2    4351.50 2004/2
## 23 2004         3    3284.91 2004/3
## 24 2004         4    8015.70 2004/4
## 25 2005         1    6761.62 2005/1
## 26 2005         2    6773.62 2005/2
## 27 2005         3    5478.92 2005/3
## 28 2005         4    6781.66 2005/4
## 29 2006         1    7436.81 2006/1
## 30 2006         2    6634.31 2006/2
## 31 2006         3    2346.57 2006/3
## 32 2006         4    4814.85 2006/4
## 33 2007         1   10815.78 2007/1
## 34 2007         2    6137.64 2007/2
## 35 2007         3    7628.45 2007/3
## 36 2007         4    7811.47 2007/4
## 37 2008         1    8546.44 2008/1
## 38 2008         2    8376.14 2008/2
## 39 2008         3    5643.68 2008/3
## 40 2008         4    6936.20 2008/4
## 41 2009         1    6105.30 2009/1
## 42 2009         2    6094.18 2009/2
## 43 2009         3    2397.52 2009/3
## 44 2009         4    3252.95 2009/4
## 45 2010         1    8722.26 2010/1
## 46 2010         2    9301.62 2010/2
## 47 2010         3    3932.89 2010/3
## 48 2010         4    5232.50 2010/4
## 49 2011         1    8431.21 2011/1
## 50 2011         2    6697.41 2011/2
## 51 2011         3    4349.89 2011/3
## 52 2011         4    6154.00 2011/4
## 53 2012         1    7892.69 2012/1
## 54 2012         2    5622.13 2012/2
## 55 2012         3    5736.29 2012/3
## 56 2012         4    2518.21 2012/4
## 57 2013         1   10571.58 2013/1
## 58 2013         2   21019.14 2013/2
## 59 2013         3    4178.81 2013/3
## 60 2013         4   12584.89 2013/4
## 61 2014         1   13828.00 2014/1
## 62 2014         2    5478.74 2014/2
## 63 2014         3    3222.08 2014/3
## 64 2014         4    7822.43 2014/4
## 65 2015         1   12136.33 2015/1
## 66 2015         2    6656.85 2015/2
## 67 2015         3    9635.56 2015/3
## 68 2015         4    7515.02 2015/4
## 69 2016         1   12805.36 2016/1
## 70 2016         2    6210.62 2016/2
## 71 2016         3    4317.27 2016/3
## 72 2016         4    7855.73 2016/4
## 73 2017         1   13779.06 2017/1
## 74 2017         2    6814.16 2017/2
## 75 2017         3    6361.22 2017/3
## 76 2017         4    7062.61 2017/4
## 77 2018         1   14067.52 2018/1
## 78 2018         2    9577.75 2018/2
## 79 2018         3    4132.97 2018/3
## 80 2018         4    6322.19 2018/4
## 81 2019         1   15175.27 2019/1
## 82 2019         2    6504.62 2019/2
## 83 2019         3    8217.40 2019/3
## 84 2019         4    4679.87 2019/4
## 85 2020         1   16807.60 2020/1
## 86 2020         2    7293.96 2020/2
## 87 2020         3    1340.58 2020/3
## 88 2020         4    2763.75 2020/4
## 89 2021         1   16206.05 2021/1
## 90 2021         2    5883.73 2021/2
## 91 2021         3    6419.43 2021/3
## 92 2021         4    3044.31 2021/4
## 93 2022         1   22794.16 2022/1
## 94 2022         2    8164.27 2022/2
## 95 2022         3    3479.68 2022/3
## 96 2023         4    1777.26 2023/4
# Convert the "year/quarter" strings into a zoo yearqtr index
df_time_series$quarter <- as.yearqtr(df_time_series$Date, format = "%Y/%q")
head(df_time_series)
##    Año Trimestre IED_Flujos   Date quarter
## 1 1999         1    3596.08 1999/1 1999 Q1
## 2 1999         2    3395.89 1999/2 1999 Q2
## 3 1999         3    3028.45 1999/3 1999 Q3
## 4 1999         4    3939.90 1999/4 1999 Q4
## 5 2000         1    4600.64 2000/1 2000 Q1
## 6 2000         2    4857.42 2000/2 2000 Q2
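The `ts()` call that follows assumes that the 96 rows are consecutive quarters. However, the printed table jumps from 2022 Q3 (row 95) to 2023 Q4 (row 96). Below is a minimal base-R sketch for flagging such breaks. It assumes that `Año` is numeric and that `Trimestre` holds the quarter codes "1" to "4" as printed above. The helper name `check_quarter_gaps` is hypothetical.

```r
# Hypothetical helper: returns the row positions where a row does NOT follow
# the previous row by exactly one quarter.
check_quarter_gaps <- function(year, quarter) {
  # Encode each (year, quarter) pair as a running quarter count
  idx <- as.numeric(year) * 4 + (as.integer(quarter) - 1)
  # Consecutive quarters differ by exactly 1
  which(diff(idx) != 1) + 1L
}

# Self-contained example with the same kind of break (2022 Q3 followed by 2023 Q4)
check_quarter_gaps(c(2022, 2022, 2022, 2023), c("1", "2", "3", "4"))   # returns 4

# On the case-study data (flags row 96):
# check_quarter_gaps(df_time_series$Año, df_time_series$Trimestre)
```

If a gap is confirmed, the decomposition and tests below treat the last observation as if it directly followed 2022 Q3.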

Data Visualization

library(ggplot2)

# Time series plot of IED_Flujos; as.Date() on the zoo yearqtr column gives the first day of each quarter
ggplot(df_time_series, aes(x = as.Date(quarter), y = IED_Flujos)) +
  geom_line() +
  labs(x = "Year", y = "IED_Flujos", title = "Time Series Plot of IED_Flujos")
# Quarterly series starting in 1999 Q1, the first row of df_time_series
ts_data <- ts(df_time_series$IED_Flujos, start = c(1999, 1), frequency = 4)

# Perform the decomposition
decomposed_data <- decompose(ts_data)

# Plot the decomposition
plot(decomposed_data)
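`decompose()` performs a classical additive decomposition. The sketch below reproduces its trend and seasonal figure in base R. It uses a simulated quarterly series: the data are made up, and only the frequency of 4 and the 1999 Q1 start mirror the case study. The trend is a centered 2x4 moving average. The seasonal figure is the mean detrended value for each quarter, re-centered so that the four values sum to zero. `stats::filter` is written with its namespace because `dplyr::filter` masks it when dplyr is attached.

```r
# Simulated quarterly series: linear trend + fixed quarterly pattern + noise
set.seed(1)
sim_q <- ts(100 + 1:40 + rep(c(5, -2, -4, 1), 10) + rnorm(40),
            start = c(1999, 1), frequency = 4)

# Trend: centered 2x4 moving average (weights 1/8, 1/4, 1/4, 1/4, 1/8)
trend_hat <- stats::filter(sim_q, c(0.5, 1, 1, 1, 0.5) / 4, sides = 2)

# Seasonal figure: mean detrended value per quarter, centered to sum to zero
detrended <- as.numeric(sim_q) - as.numeric(trend_hat)
figure_hat <- tapply(detrended, as.numeric(cycle(sim_q)), mean, na.rm = TRUE)
figure_hat <- figure_hat - mean(figure_hat)

# Compare with stats::decompose()
dec <- decompose(sim_q)
all.equal(as.numeric(trend_hat), as.numeric(dec$trend))
all.equal(as.numeric(figure_hat), dec$figure)
```

The first two and last two trend values are `NA`, because a centered moving average needs two quarters on each side.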

# Check whether the series is stationary with the Augmented Dickey-Fuller test (tseries::adf.test)
# H0: the series has a unit root (non-stationary); H1: the series is stationary
# A p-value below 0.05 rejects H0
adf.test(ts_data)
## Warning in adf.test(ts_data): p-value smaller than printed p-value
## 
##  Augmented Dickey-Fuller Test
## 
## data:  ts_data
## Dickey-Fuller = -4.1994, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
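The printed p-value is below 0.01, so the unit root is rejected at the 5% level and the series can be treated as stationary. The ADF statistic is the t-ratio on the lagged level in a regression of the first difference on a constant, a linear trend, the lagged level, and k lagged differences. The sketch below builds that regression with `embed()` and `lm()`, following the specification documented for `tseries::adf.test()`. That specification uses a constant plus trend and a default k = trunc((n - 1)^(1/3)), which gives the printed lag order 4 for n = 96. This is a didactic sketch on simulated white noise, not a replacement for the package. P-values still require Dickey-Fuller tables rather than the usual t distribution.

```r
# Minimal ADF test statistic: regression with constant, linear trend,
# lagged level and k lagged differences
adf_stat <- function(x, k = trunc((length(x) - 1)^(1/3))) {
  x <- as.numeric(x)
  dx <- diff(x)
  n <- length(dx)
  z <- embed(dx, k + 1)        # column 1: dx_t; columns 2..k+1: dx_{t-1}..dx_{t-k}
  yt <- z[, 1]
  xt1 <- x[(k + 1):n]          # lagged level x_{t-1}
  tt <- (k + 1):n              # linear time trend
  if (k > 0) {
    lagged <- z[, -1, drop = FALSE]
    fit <- lm(yt ~ xt1 + tt + lagged)
  } else {
    fit <- lm(yt ~ xt1 + tt)
  }
  # ADF statistic: t-ratio on the lagged level (compare with Dickey-Fuller critical values)
  summary(fit)$coefficients["xt1", "t value"]
}

set.seed(123)
white_noise <- rnorm(96)       # simulated stationary series, same length as the case study
adf_stat(white_noise)          # typically strongly negative for a stationary series
```

Applied to the case-study series, `adf_stat(ts_data)` is expected to reproduce the `Dickey-Fuller = -4.1994` line above, provided the sketch mirrors tseries exactly.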
# Ljung-Box test for serial autocorrelation in the series up to lag 5
# H0: there is no serial autocorrelation
# H1: there is serial autocorrelation
# A p-value below 0.05 rejects H0
ljung_box_test <- Box.test(ts_data, lag = 5, type = "Ljung-Box")

# Print the test results
print(ljung_box_test)
## 
##  Box-Ljung test
## 
## data:  ts_data
## X-squared = 21.982, df = 5, p-value = 0.0005277
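The printed p-value of 0.0005 is far below 0.05, so H0 of no autocorrelation up to lag 5 is rejected. This suggests that past quarters carry information about current FDI flows. The Ljung-Box statistic is Q = n(n + 2) * sum over k = 1..h of rho_k^2 / (n - k), compared with a chi-squared distribution with h degrees of freedom (h = 5 here). The base-R sketch below recomputes Q from `acf()` and checks it against `Box.test()` on a simulated AR(1) series, not the case-study data.

```r
# Ljung-Box Q statistic and p-value from the sample autocorrelations
ljung_box_q <- function(x, h) {
  n <- length(x)
  rho <- acf(x, lag.max = h, plot = FALSE)$acf[2:(h + 1)]  # drop lag 0
  q <- n * (n + 2) * sum(rho^2 / (n - seq_len(h)))
  c(statistic = q, p.value = pchisq(q, df = h, lower.tail = FALSE))
}

set.seed(42)
sim_ar <- arima.sim(model = list(ar = 0.5), n = 96)  # simulated autocorrelated series
ljung_box_q(sim_ar, h = 5)
Box.test(sim_ar, lag = 5, type = "Ljung-Box")
```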

Part 4. Regression Analysis

Hypotheses

  • Hypothesis 1: Higher foreign direct investment (FDI) flows into Mexico are expected to be positively correlated with employment of the economically active population (Empleo). Such a correlation would be consistent with foreign investment being attracted by Mexico's high employment rate, which may reflect a well-educated workforce and a favorable business environment for foreign companies.

  • Hypothesis 2: FDI flows are expected to be negatively associated with the incidence of robbery and theft in the country (Inseguridad_Robo). The reasoning is that rising FDI can improve economic conditions and support stronger security measures, which may in turn reduce robbery and theft.

  • Hypothesis 3: A higher patent rate per 100,000 inhabitants (Innovacion) is expected to be strongly correlated with FDI flows, and this variable is expected to be the most important factor in attracting foreign investment. The rationale is that foreign investment brings new developments, advanced technology, and knowledge, which in turn drive innovation within the country.

Estimate three regression specifications (linear, log-linear, and polynomial), each with and without the time variable “periodo”.

Linear regression

linear_regresion_df <- lm(IED_FlujosMX ~ periodo + Empleo + Innovacion + Tipo_de_Cambio +
                            CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
                            PIB_Per_Capita + ExportacionesMX + Salario_Diario +
                            Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                          data = df_cash)
summary(linear_regresion_df)
## 
## Call:
## lm(formula = IED_FlujosMX ~ periodo + Empleo + Innovacion + Tipo_de_Cambio + 
##     CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + 
##     PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + 
##     Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -86345 -30391   5560  29891  80704 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -1.255e+08  3.382e+08  -0.371  0.71758   
## periodo                5.680e+04  1.749e+05   0.325  0.75143   
## Empleo                 9.505e+04  4.186e+04   2.271  0.04426 * 
## Innovacion             8.253e+04  2.933e+04   2.814  0.01685 * 
## Tipo_de_Cambio         2.267e+04  2.644e+04   0.857  0.40958   
## CO2_Emisiones          2.933e+05  1.999e+05   1.467  0.17025   
## Educacion              2.055e+06  1.775e+06   1.158  0.27136   
## Inseguridad_Robo       1.551e+02  9.177e+02   0.169  0.86888   
## Densidad_Carretera     4.847e+06  7.888e+06   0.614  0.55142   
## PIB_Per_Capita        -7.667e+00  9.019e+00  -0.850  0.41343   
## ExportacionesMX       -1.283e+00  9.226e-01  -1.390  0.19194   
## Salario_Diario        -1.406e+02  4.142e+03  -0.034  0.97354   
## Inseguridad_Homicidio  1.869e+04  1.093e+04   1.711  0.11518   
## Densidad_Poblacion    -2.660e+05  8.498e+04  -3.130  0.00957 **
## INPC                  -1.328e+04  2.013e+04  -0.660  0.52305   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 69350 on 11 degrees of freedom
## Multiple R-squared:  0.8977, Adjusted R-squared:  0.7676 
## F-statistic: 6.897 on 14 and 11 DF,  p-value: 0.001377
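With 26 observations and 14 regressors, this model leaves only 11 residual degrees of freedom, and several standard errors are very large. Dropping `periodo` in the next model shrinks the standard error of the intercept from 3.382e+08 to 6.050e+06 and that of `Educacion` from 1.775e+06 to 7.264e+05. This pattern suggests multicollinearity. Variance inflation factors quantify it: VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. The base-R sketch below uses simulated data; the column names `trend`, `educ`, and `noise` are hypothetical.

```r
# VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing predictor j on the others
vif_manual <- function(X) {
  X <- as.data.frame(X)
  vapply(names(X), function(v) {
    aux <- lm(reformulate(setdiff(names(X), v), response = v), data = X)
    1 / (1 - summary(aux)$r.squared)
  }, numeric(1))
}

# Simulated predictors: 'educ' is almost a linear function of 'trend'
set.seed(42)
n_obs <- 26
sim_X <- data.frame(trend = 1:n_obs,
                    educ  = 7 + 0.1 * (1:n_obs) + rnorm(n_obs, sd = 0.05),
                    noise = rnorm(n_obs))
round(vif_manual(sim_X), 1)
```

For the case study, `car::vif(linear_regresion_df)` gives the same diagnostic. Values above about 10 are commonly read as severe collinearity.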

Linear regression without “periodo”

linear_regresion2_df <- lm(IED_FlujosMX ~ Empleo + Innovacion + Tipo_de_Cambio +
                             CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
                             PIB_Per_Capita + ExportacionesMX + Salario_Diario +
                             Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                           data = df_cash)
summary(linear_regresion2_df)
## 
## Call:
## lm(formula = IED_FlujosMX ~ Empleo + Innovacion + Tipo_de_Cambio + 
##     CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + 
##     PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + 
##     Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -92169 -31318   5827  27474  83660 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -1.569e+07  6.050e+06  -2.594  0.02350 * 
## Empleo                 9.333e+04  3.995e+04   2.336  0.03764 * 
## Innovacion             8.272e+04  2.821e+04   2.933  0.01254 * 
## Tipo_de_Cambio         1.762e+04  2.058e+04   0.856  0.40863   
## CO2_Emisiones          2.756e+05  1.850e+05   1.490  0.16207   
## Educacion              2.577e+06  7.264e+05   3.548  0.00401 **
## Inseguridad_Robo       2.294e+02  8.550e+02   0.268  0.79302   
## Densidad_Carretera     4.838e+06  7.588e+06   0.638  0.53576   
## PIB_Per_Capita        -7.182e+00  8.557e+00  -0.839  0.41770   
## ExportacionesMX       -1.067e+00  6.173e-01  -1.729  0.10943   
## Salario_Diario        -1.419e+02  3.984e+03  -0.036  0.97218   
## Inseguridad_Homicidio  1.950e+04  1.023e+04   1.905  0.08098 . 
## Densidad_Poblacion    -2.674e+05  8.165e+04  -3.275  0.00664 **
## INPC                  -1.153e+04  1.866e+04  -0.618  0.54829   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 66720 on 12 degrees of freedom
## Multiple R-squared:  0.8968, Adjusted R-squared:  0.7849 
## F-statistic: 8.017 on 13 and 12 DF,  p-value: 0.0004816
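The three hypotheses are directional, while `summary.lm()` reports two-sided p-values. A one-sided p-value follows from the printed t value and the 12 residual degrees of freedom of this model. Use `pt(t, df, lower.tail = FALSE)` for an expected positive effect and `pt(t, df)` for an expected negative one. The sketch below applies this to the three hypothesized variables, using the t values printed above. It is a quick check under the usual OLS assumptions, which are strained with 26 observations and 13 regressors.

```r
# t values and residual df copied from summary(linear_regresion2_df)
t_vals <- c(Empleo = 2.336, Innovacion = 2.933, Inseguridad_Robo = 0.268)
expected_sign <- c(Empleo = 1, Innovacion = 1, Inseguridad_Robo = -1)  # H1, H3, H2
df_resid <- 12

# One-sided p-value in the hypothesized direction
p_one_sided <- ifelse(expected_sign > 0,
                      pt(t_vals, df_resid, lower.tail = FALSE),
                      pt(t_vals, df_resid))
round(p_one_sided, 4)
```

Empleo and Innovacion come out at half their two-sided p-values (about 0.019 and 0.006). Inseguridad_Robo has a positive t value, so these data give no support to the negative association in Hypothesis 2. With the fitted object, `coef(summary(linear_regresion2_df))[, "t value"]` supplies the unrounded t values.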

Log-linear regression

log_linear_regresion_df <- lm(log(IED_FlujosMX) ~ periodo + Empleo + Innovacion + Tipo_de_Cambio +
                                CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
                                PIB_Per_Capita + ExportacionesMX + Salario_Diario +
                                Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                              data = df_cash)
summary(log_linear_regresion_df)
## 
## Call:
## lm(formula = log(IED_FlujosMX) ~ periodo + Empleo + Innovacion + 
##     Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + 
##     Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + 
##     Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20253 -0.08031  0.02460  0.06382  0.19858 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)            3.970e+01  7.581e+02   0.052   0.9592  
## periodo               -3.140e-02  3.921e-01  -0.080   0.9376  
## Empleo                 2.127e-01  9.385e-02   2.266   0.0446 *
## Innovacion             1.784e-01  6.575e-02   2.713   0.0202 *
## Tipo_de_Cambio         2.969e-02  5.928e-02   0.501   0.6264  
## CO2_Emisiones          5.625e-01  4.481e-01   1.255   0.2354  
## Educacion              5.430e+00  3.979e+00   1.365   0.1996  
## Inseguridad_Robo      -6.764e-04  2.058e-03  -0.329   0.7485  
## Densidad_Carretera     1.071e+01  1.768e+01   0.606   0.5569  
## PIB_Per_Capita        -1.695e-05  2.022e-05  -0.838   0.4198  
## ExportacionesMX       -2.243e-06  2.068e-06  -1.084   0.3014  
## Salario_Diario        -1.994e-03  9.285e-03  -0.215   0.8339  
## Inseguridad_Homicidio  4.624e-02  2.450e-02   1.888   0.0857 .
## Densidad_Poblacion    -5.507e-01  1.905e-01  -2.891   0.0147 *
## INPC                  -1.558e-02  4.514e-02  -0.345   0.7364  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1555 on 11 degrees of freedom
## Multiple R-squared:  0.8944, Adjusted R-squared:  0.7601 
## F-statistic: 6.658 on 14 and 11 DF,  p-value: 0.001609

Log-linear regression without “periodo”

log_linear_regresion2_df <- lm(log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio +
                                 CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
                                 PIB_Per_Capita + ExportacionesMX + Salario_Diario +
                                 Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                               data = df_cash)
summary(log_linear_regresion2_df)
## 
## Call:
## lm(formula = log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio + 
##     CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + 
##     PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + 
##     Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.20292 -0.07955  0.02458  0.06454  0.20099 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -2.101e+01  1.350e+01  -1.556  0.14572   
## Empleo                 2.137e-01  8.916e-02   2.396  0.03374 * 
## Innovacion             1.783e-01  6.296e-02   2.832  0.01512 * 
## Tipo_de_Cambio         3.248e-02  4.593e-02   0.707  0.49295   
## CO2_Emisiones          5.723e-01  4.128e-01   1.386  0.19092   
## Educacion              5.141e+00  1.621e+00   3.171  0.00805 **
## Inseguridad_Robo      -7.175e-04  1.908e-03  -0.376  0.71349   
## Densidad_Carretera     1.072e+01  1.694e+01   0.633  0.53868   
## PIB_Per_Capita        -1.722e-05  1.910e-05  -0.901  0.38510   
## ExportacionesMX       -2.362e-06  1.378e-06  -1.714  0.11217   
## Salario_Diario        -1.993e-03  8.893e-03  -0.224  0.82642   
## Inseguridad_Homicidio  4.580e-02  2.284e-02   2.005  0.06808 . 
## Densidad_Poblacion    -5.499e-01  1.822e-01  -3.018  0.01071 * 
## INPC                  -1.656e-02  4.164e-02  -0.398  0.69794   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1489 on 12 degrees of freedom
## Multiple R-squared:  0.8944, Adjusted R-squared:   0.78 
## F-statistic: 7.817 on 13 and 12 DF,  p-value: 0.0005452
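In the log-linear models, a coefficient b has a percentage interpretation. A one-unit increase in the regressor changes IED_FlujosMX by approximately 100 * b percent, or exactly 100 * (exp(b) - 1) percent, holding the other regressors fixed. The short sketch below applies this to three coefficients printed above for `log_linear_regresion2_df`.

```r
# Coefficients copied from summary(log_linear_regresion2_df)
b_log <- c(Empleo = 2.137e-01, Innovacion = 1.783e-01, Densidad_Poblacion = -5.499e-01)

# Exact percentage change in FDI flows for a one-unit increase in each regressor
pct_change <- 100 * (exp(b_log) - 1)
round(pct_change, 1)   # Empleo 23.8, Innovacion 19.5, Densidad_Poblacion -42.3
```

With the fitted object, `100 * (exp(coef(log_linear_regresion2_df)) - 1)` gives the figure for every regressor. For large coefficients the 100 * b approximation breaks down. The Educacion estimate (5.141), for example, implies an increase of more than 16,000 percent per unit, which suggests that this coefficient is poorly identified in such a small sample.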

Polynomial regression

polynomial_regresion_df <- lm(IED_FlujosMX ~ periodo + Empleo + Innovacion + I(Innovacion^2) +
                                Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo +
                                Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario +
                                Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                              data = df_cash)
summary(polynomial_regresion_df)
## 
## Call:
## lm(formula = IED_FlujosMX ~ periodo + Empleo + Innovacion + I(Innovacion^2) + 
##     Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + 
##     Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + 
##     Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -82800 -29173   1974  29850  82890 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           -5.784e+07  4.200e+08  -0.138   0.8932  
## periodo                2.294e+04  2.152e+05   0.107   0.9172  
## Empleo                 9.368e+04  4.395e+04   2.131   0.0589 .
## Innovacion            -9.438e+04  5.953e+05  -0.159   0.8772  
## I(Innovacion^2)        6.718e+03  2.258e+04   0.298   0.7721  
## Tipo_de_Cambio         1.929e+04  2.986e+04   0.646   0.5328  
## CO2_Emisiones          2.780e+05  2.149e+05   1.294   0.2249  
## Educacion              2.219e+06  1.933e+06   1.148   0.2777  
## Inseguridad_Robo       4.308e+01  1.030e+03   0.042   0.9675  
## Densidad_Carretera     4.223e+06  8.499e+06   0.497   0.6300  
## PIB_Per_Capita        -6.985e+00  9.693e+00  -0.721   0.4876  
## ExportacionesMX       -1.220e+00  9.860e-01  -1.238   0.2441  
## Salario_Diario        -1.343e+03  5.919e+03  -0.227   0.8251  
## Inseguridad_Homicidio  2.054e+04  1.299e+04   1.581   0.1450  
## Densidad_Poblacion    -2.694e+05  8.947e+04  -3.011   0.0131 *
## INPC                  -5.523e+03  3.349e+04  -0.165   0.8723  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 72420 on 10 degrees of freedom
## Multiple R-squared:  0.8986, Adjusted R-squared:  0.7466 
## F-statistic:  5.91 on 15 and 10 DF,  p-value: 0.003692

Polynomial regression without “periodo”

polynomial_regresion2_df <- lm(IED_FlujosMX ~ Empleo + Innovacion + I(Innovacion^2) +
                                 Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo +
                                 Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario +
                                 Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                               data = df_cash)
summary(polynomial_regresion2_df)
## 
## Call:
## lm(formula = IED_FlujosMX ~ Empleo + Innovacion + I(Innovacion^2) + 
##     Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + 
##     Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + 
##     Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -83822 -29212   2696  29056  84164 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -1.307e+07  8.667e+06  -1.509  0.15957   
## Empleo                 9.292e+04  4.137e+04   2.246  0.04622 * 
## Innovacion            -1.278e+05  4.825e+05  -0.265  0.79596   
## I(Innovacion^2)        7.991e+03  1.828e+04   0.437  0.67044   
## Tipo_de_Cambio         1.718e+04  2.133e+04   0.805  0.43769   
## CO2_Emisiones          2.700e+05  1.920e+05   1.406  0.18724   
## Educacion              2.402e+06  8.523e+05   2.818  0.01672 * 
## Inseguridad_Robo       4.347e+01  9.822e+02   0.044  0.96549   
## Densidad_Carretera     4.102e+06  8.036e+06   0.511  0.61978   
## PIB_Per_Capita        -6.715e+00  8.925e+00  -0.752  0.46762   
## ExportacionesMX       -1.146e+00  6.639e-01  -1.726  0.11234   
## Salario_Diario        -1.571e+03  5.264e+03  -0.298  0.77089   
## Inseguridad_Homicidio  2.113e+04  1.123e+04   1.881  0.08671 . 
## Densidad_Poblacion    -2.705e+05  8.484e+04  -3.188  0.00863 **
## INPC                  -3.543e+03  2.658e+04  -0.133  0.89640   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 69080 on 11 degrees of freedom
## Multiple R-squared:  0.8985, Adjusted R-squared:  0.7694 
## F-statistic: 6.956 on 14 and 11 DF,  p-value: 0.001325
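With a quadratic term, the effect of Innovacion is no longer a single coefficient. The marginal effect is b1 + 2 * b2 * Innovacion, with a turning point at -b1 / (2 * b2). Using the estimates printed above for `polynomial_regresion2_df` (b1 = -1.278e+05, b2 = 7.991e+03), the turning point lies near 8. That is below the Innovacion values shown in the LASSO training matrix (about 11.3 to 15.1), so over the observed range the implied effect is positive. Neither term is individually significant, however, so the sketch below is illustrative only.

```r
# Coefficients copied from summary(polynomial_regresion2_df)
b_lin  <- -1.278e+05   # Innovacion
b_quad <-  7.991e+03   # I(Innovacion^2)

# Marginal effect on IED_FlujosMX of one more patent per 100,000 inhabitants
marginal_effect <- function(innov) b_lin + 2 * b_quad * innov

turning_point <- -b_lin / (2 * b_quad)
turning_point                      # about 8.0
marginal_effect(c(11, 13, 15))
```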

LASSO regression model

### Splitting the Data into Training and Test Sets
# Let's randomly split the data into a training set and a test set
set.seed(123)  # Sets the random seed for reproducibility of results
training.samples <- df_cash$IED_FlujosMX %>%
  createDataPartition(p = 0.75, list = FALSE)  # Consider 75% of the data for building a predictive model

train.data <- df_cash[training.samples, ]  # Training data to fit the linear regression model
test.data <- df_cash[-training.samples, ]  # Testing data to test the linear regression model

# glmnet accepts only a numeric predictor matrix, so the predictors are converted with model.matrix()
# Independent variables (only the right-hand side of the formula is used; the response y is defined separately below)
x <- model.matrix(~ Empleo + Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion +
                    Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + ExportacionesMX +
                    Salario_Diario + Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                  train.data)[, -1]  # drop the intercept column; glmnet fits its own intercept
x
##      Empleo Innovacion Tipo_de_Cambio CO2_Emisiones Educacion Inseguridad_Robo
## 2  96.47043   11.37000           9.94      3.850000    7.2968           314.78
## 3  96.47043   12.46000           9.52      3.690000    7.4012           272.89
## 4  97.83000   13.15000           9.60      3.870000    7.5132           216.98
## 5  97.36000   13.47000           9.17      3.810000    7.6224           214.53
## 7  97.06000   11.81000          11.20      3.950000    7.8364           183.22
## 8  96.48000   12.61000          11.22      3.980000    7.9476           146.28
## 9  97.17000   13.41000          10.71      4.100000    8.0440           136.94
## 10 96.53000   14.23000          10.88      4.190000    8.1320           135.59
## 11 96.60000   15.04000          10.90      4.220000    8.2360           145.92
## 12 95.68000   14.82000          13.77      4.190000    8.3280           158.17
## 13 95.20000   12.59000          13.04      4.040000    8.4160           175.77
## 14 95.06000   12.69000          12.38      4.110000    8.5040           201.94
## 15 95.49000   12.10000          13.98      4.190000    8.5824           212.61
## 16 95.53000   13.03000          12.99      4.200000    8.6540           190.28
## 18 96.24000   13.65000          14.73      3.890000    8.8460           154.41
## 19 96.04000   15.11000          17.34      3.930000    8.9340           180.44
## 20 96.62000   14.40000          20.66      3.890000    9.0220           160.57
## 21 96.85000   14.05000          19.74      3.840000    9.1100           230.43
## 22 96.64000   13.25000          19.66      3.650000    9.1980           184.25
## 23 97.09000   12.70000          18.87      3.590000    9.2860           173.45
## 24 96.21000   11.28000          19.94      3.945217    9.3740           133.90
## 26 97.24000   13.10583          19.41      3.945217    9.5800           120.49
##    Densidad_Carretera PIB_Per_Capita ExportacionesMX Salario_Diario
## 2                0.05       126738.8        248690.6          31.91
## 3                0.06       129164.7        235960.5          31.91
## 4                0.06       130874.9        248057.2          35.12
## 5                0.06       128083.4        205482.9          37.57
## 7                0.06       128737.9        265825.7          41.53
## 8                0.06       132563.5        261173.9          43.30
## 9                0.06       132941.1        292695.1          45.24
## 10               0.06       135894.9        303472.5          47.05
## 11               0.06       137795.7        320110.6          48.88
## 12               0.07       135176.0        336297.2          50.84
## 13               0.07       131233.0        357980.1          53.19
## 14               0.07       134991.7        374607.6          55.77
## 15               0.07       138891.9        437299.9          58.06
## 16               0.07       141530.2        423992.5          60.75
## 18               0.08       147277.4        535151.9          65.58
## 19               0.08       149433.5        583386.1          70.10
## 20               0.08       152275.4        704268.5          73.04
## 21               0.09       153235.7        669368.6          88.36
## 22               0.09       153133.8        695447.7          88.36
## 23               0.09       150233.1        648679.3         102.68
## 24               0.09       142609.3        749594.7         123.22
## 26               0.09       146826.7        713259.0         172.87
##    Inseguridad_Homicidio Densidad_Poblacion   INPC
## 2                14.3200              48.76  39.47
## 3                12.6400              49.48  44.34
## 4                10.8600              50.58  48.31
## 5                10.2500              51.28  50.43
## 7                 9.8100              52.61  55.43
## 8                 8.9200              53.27  58.31
## 9                 9.2200              54.78  60.25
## 10                9.6000              55.44  62.69
## 11                8.0400              56.17  65.05
## 12               12.5200              56.96  69.30
## 13               17.4600              57.73  71.77
## 14               22.4300              58.45  74.93
## 15               23.4200              59.15  77.79
## 16               22.0900              59.85  80.57
## 18               16.9300              60.17  87.19
## 19               17.3700              60.86  89.05
## 20               20.3100              61.57  92.04
## 21               26.2200              62.28  98.27
## 22               29.5900              63.11  99.91
## 23               29.2100              63.90 105.93
## 24               28.9800              64.59 109.27
## 26               17.2924              65.60 126.48
# x <- model.matrix(~., train.data)[,-1]  # Matrix of independent variables X's
y <- train.data$IED_FlujosMX  # Dependent variable
y
##  [1] 210875.6 299734.4 362631.8 546548.4 368752.8 481349.2 458544.8 368495.8
##  [9] 542793.7 586217.7 324318.4 449223.7 460653.8 350978.6 512758.2 699904.1
## [17] 700091.6 683318.0 671018.4 615945.4 514711.7 555771.9
# LASSO requires choosing lambda, the penalty strength; the goal is the lambda that minimizes the prediction error.
# k-fold cross-validation rotates the training observations through the validation folds, so every observation is used for both fitting and validation.
# Find the best lambda using cross-validation.
set.seed(123)
cv.lasso <- cv.glmnet(x, y, alpha = 1)  # alpha = 1 for LASSO
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
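The warning arises because `cv.glmnet()` uses 10 folds by default and the training set has only 22 rows. The average fold therefore holds 22 / 10 = 2.2 observations, below the 3 per fold that the function needs for grouped error estimates. The base-R sketch below shows the fold sizes. It assumes that folds are assigned as a random permutation of `rep(1:nfolds, length.out = n)`, which is how cv.glmnet builds `foldid` when none is supplied. The helper name `fold_sizes` is hypothetical.

```r
# Hypothetical helper: number of observations in each CV fold
fold_sizes <- function(n, nfolds) {
  foldid <- sample(rep(seq_len(nfolds), length.out = n))  # random fold labels
  as.vector(table(factor(foldid, levels = seq_len(nfolds))))
}

set.seed(123)
fold_sizes(22, 10)   # mostly 2 per fold: too few
fold_sizes(22, 5)    # 4 or 5 per fold

# On the case-study training data this would be, for example:
# cv.lasso <- cv.glmnet(x, y, alpha = 1, nfolds = 5)
```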
# Note: glmnet cannot be used with only one predictor variable (x needs at least two columns)

# Display the best lambda value
cv.lasso$lambda.min  # lambda.min: the lambda with the lowest mean cross-validated error; larger lambda means stronger shrinkage (more penalization)
## [1] 11569.57
# Fit the final model on the training data
lassomodel <- glmnet(x, y, alpha = 1, lambda = cv.lasso$lambda.min)
lassomodel
## 
## Call:  glmnet(x = x, y = y, alpha = 1, lambda = cv.lasso$lambda.min) 
## 
##   Df  %Dev Lambda
## 1  6 79.04  11570
# Display regression coefficients
coef(lassomodel)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                                  s0
## (Intercept)           -1.213581e+06
## Empleo                 7.376953e+03
## Innovacion             5.304716e+04
## Tipo_de_Cambio         1.585699e+04
## CO2_Emisiones         -1.913095e+04
## Educacion              .           
## Inseguridad_Robo       .           
## Densidad_Carretera     1.268284e+06
## PIB_Per_Capita         3.837093e-01
## ExportacionesMX        .           
## Salario_Diario         .           
## Inseguridad_Homicidio  .           
## Densidad_Poblacion     .           
## INPC                   .
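The dots in this listing are coefficients set exactly to zero. With lambda = 11570, LASSO keeps 6 of the 13 predictors (the `Df 6` above). The mechanism is soft-thresholding. For the Gaussian objective (1 / (2n)) * ||y - X b||^2 + lambda * ||b||_1, cyclic coordinate descent updates each coefficient as b_j = S(x_j' r_j / n, lambda) / (x_j' x_j / n). Here r_j is the partial residual and S(z, lambda) = sign(z) * max(|z| - lambda, 0). Any predictor whose term stays within lambda is set exactly to zero. The sketch below runs on simulated, centered data. glmnet minimizes a comparable objective but adds standardization, an unpenalized intercept, warm starts, and convergence checks, so its numbers are not interchangeable with these.

```r
# Soft-thresholding operator S(z, lambda)
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# Cyclic coordinate descent for (1/(2n)) * ||y - X b||^2 + lambda * ||b||_1
# Assumes the columns of X and y are centered (no intercept is fitted)
lasso_cd <- function(X, y, lambda, n_iter = 500) {
  n <- nrow(X)
  b <- numeric(ncol(X))
  col_ss <- colSums(X^2) / n
  for (iter in seq_len(n_iter)) {
    for (j in seq_len(ncol(X))) {
      r_j <- y - X[, -j, drop = FALSE] %*% b[-j]   # partial residual without predictor j
      z_j <- sum(X[, j] * r_j) / n
      b[j] <- soft_threshold(z_j, lambda) / col_ss[j]
    }
  }
  b
}

set.seed(1)
n_sim <- 100
X_sim <- scale(matrix(rnorm(n_sim * 3), n_sim, 3))
y_sim <- drop(X_sim %*% c(2, 0, -1) + rnorm(n_sim))
y_sim <- y_sim - mean(y_sim)

# Coefficients typically shrink, and drop to exactly 0 one by one, as lambda grows
for (lam in c(0, 0.5, 1.5)) print(round(lasso_cd(X_sim, y_sim, lam), 3))
```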
# Make predictions on the test data
x.test <- model.matrix(~ Empleo + Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion +
                         Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + ExportacionesMX +
                         Salario_Diario + Inseguridad_Homicidio + Densidad_Poblacion + INPC,
                       test.data)[, -1]  # same predictors as the training matrix x
lassopredictions <- lassomodel %>% predict(x.test) %>% as.vector()

# Model Accuracy
data.frame(
  RMSE = RMSE(lassopredictions, test.data$IED_FlujosMX),
  Rsquare = R2(lassopredictions, test.data$IED_FlujosMX)
)
##       RMSE   Rsquare
## 1 144761.4 0.4561103
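`RMSE()` and `R2()` come from caret. `RMSE()` is the square root of the mean squared prediction error. `R2()` by default (`formula = "corr"`) is the squared correlation between predictions and observations, not 1 - SSE/SST. It can therefore look acceptable even when the predictions are biased. With 22 of the 26 rows used for training, the test set holds only 4 observations, so both numbers are noisy. The base-R sketch below computes the three quantities on toy numbers.

```r
rmse <- function(pred, obs) sqrt(mean((pred - obs)^2))
r2_corr <- function(pred, obs) cor(pred, obs)^2                 # caret's default R2
r2_traditional <- function(pred, obs) 1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)

pred_demo <- c(1, 2, 3, 4)
obs_demo  <- c(2, 2, 4, 4)
c(rmse = rmse(pred_demo, obs_demo),
  r2_corr = r2_corr(pred_demo, obs_demo),
  r2_trad = r2_traditional(pred_demo, obs_demo))
# rmse 0.7071, r2_corr 0.8, r2_trad 0.5
```

For the LASSO fit, `r2_traditional(lassopredictions, test.data$IED_FlujosMX)` would show how much of the test-set variance the predictions actually explain.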
### Visualizing LASSO Regression Results
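# Helper: write each variable name next to its coefficient path at the smallest lambda
# in the fitted sequence (the x position is on the log(lambda) scale used by plot.glmnet)
lbs_fun <- function(fit, offset_x = 1, ...) {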
  L <- length(fit$lambda)
  x <- log(fit$lambda[L]) + offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels = labs, ...)
}

lasso <- glmnet(scale(x), y, alpha = 1)

plot(lasso, xvar = "lambda", label = TRUE)
lbs_fun(lasso)
abline(v = log(cv.lasso$lambda.min), col = "brown", lty = 2)   # the x-axis is log(lambda)
abline(v = log(cv.lasso$lambda.1se), col = "yellow", lty = 2)

Ridge regression model

# Find the best lambda using cross-validation
set.seed(123)
cv.ridge <- cv.glmnet(x, y, alpha = 0.1)  # note: alpha = 0.1 is an elastic-net mix; pure RIDGE is alpha = 0, as in the final fit below
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
# Display the best lambda value
cv.ridge$lambda.min  # lambda.min: the lambda with the lowest mean cross-validated error (larger lambda = stronger shrinkage)
## [1] 37885.11
# Fit the final model on the training data
ridgemodel <- glmnet(x, y, alpha = 0, lambda = cv.ridge$lambda.min)

# Display regression coefficients
coef(ridgemodel)
## 14 x 1 sparse Matrix of class "dgCMatrix"
##                                  s0
## (Intercept)           -2.232324e+06
## Empleo                 1.835178e+04
## Innovacion             4.460466e+04
## Tipo_de_Cambio         6.556839e+03
## CO2_Emisiones         -7.646332e+04
## Educacion              1.598268e+04
## Inseguridad_Robo      -2.635834e+02
## Densidad_Carretera     1.368237e+06
## PIB_Per_Capita         2.053151e+00
## ExportacionesMX        4.333402e-02
## Salario_Diario        -3.615001e+02
## Inseguridad_Homicidio  4.690874e+02
## Densidad_Poblacion     1.530222e+03
## INPC                   1.572627e+02
# Make predictions on the test data
x.test <- model.matrix(log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + Densidad_Poblacion + INPC, test.data)[,-1]
ridgepredictions <- ridgemodel %>% predict(x.test) %>% as.vector()

# Model Accuracy
data.frame(
  RMSE = RMSE(ridgepredictions, test.data$IED_FlujosMX),
  Rsquare = R2(ridgepredictions, test.data$IED_FlujosMX)
)
##       RMSE   Rsquare
## 1 135411.3 0.5144119
### Visualizing Ridge Regression Results
ridge <- glmnet(scale(x), y, alpha = 0)
plot(ridge, xvar = "lambda", label = TRUE)
lbs_fun(ridge)
abline(v = cv.ridge$lambda.min, col = "brown", lty = 2)
abline(v = cv.ridge$lambda.1se, col = "yellow", lty = 2)
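The accuracy tables above report RMSE and R². Assuming `RMSE()` and `R2()` are the caret package's functions (presumably loaded earlier in the document, outside this section), caret's `R2()` by default (`formula = "corr"`) returns the squared Pearson correlation between predictions and observations, not 1 − SSE/SST. The base-R sketch below computes RMSE and both versions of R² on made-up numbers; `rmse`, `r2_corr` and `r2_trad` are hypothetical helper names.

```r
# Root mean squared error between predictions and observations
rmse <- function(pred, obs) sqrt(mean((obs - pred)^2))

# R^2 as the squared Pearson correlation (caret's default, formula = "corr")
r2_corr <- function(pred, obs) cor(pred, obs)^2

# "Traditional" R^2: 1 - SSE / SST (can be negative for poor predictions)
r2_trad <- function(pred, obs) {
  1 - sum((obs - pred)^2) / sum((obs - mean(obs))^2)
}

# Made-up test-set values: predictions perfectly correlated with obs,
# but systematically shifted upward by 15
obs  <- c(10, 20, 30, 40)
pred <- obs + 15

data.frame(
  RMSE           = rmse(pred, obs),
  R2_corr        = r2_corr(pred, obs),
  R2_traditional = r2_trad(pred, obs)
)
```

Here the correlation-based R² is 1 even though every prediction is off by 15, so the test-set R² values above (0.456 for LASSO, 0.514 for ridge) are best read together with RMSE.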

### Diagnostic Tests
# Linear regression:
vif(linear_regresion_df)
##               periodo                Empleo            Innovacion 
##           9300.244675              4.730382              5.141043 
##        Tipo_de_Cambio         CO2_Emisiones             Educacion 
##             62.602797              6.723812           8347.798634 
##      Inseguridad_Robo    Densidad_Carretera        PIB_Per_Capita 
##              9.947260             57.766617             33.202772 
##       ExportacionesMX        Salario_Diario Inseguridad_Homicidio 
##            168.278645            114.585619             31.424549 
##    Densidad_Poblacion                  INPC 
##           1099.236379           1296.841200
bptest(linear_regresion_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  linear_regresion_df
## BP = 6.5965, df = 14, p-value = 0.9491
AIC(linear_regresion_df)
## [1] 663.0599
histogram(linear_regresion_df$residuals)

# Linear regression without "periodo":
vif(linear_regresion2_df)
##                Empleo            Innovacion        Tipo_de_Cambio 
##              4.654336              5.138823             40.959177 
##         CO2_Emisiones             Educacion      Inseguridad_Robo 
##              6.222893           1511.043702              9.328882 
##    Densidad_Carretera        PIB_Per_Capita       ExportacionesMX 
##             57.765894             32.294264             81.407268 
##        Salario_Diario Inseguridad_Homicidio    Densidad_Poblacion 
##            114.585512             29.790759           1096.435161 
##                  INPC 
##           1203.388890
bptest(linear_regresion2_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  linear_regresion2_df
## BP = 6.9369, df = 13, p-value = 0.9054
AIC(linear_regresion2_df)
## [1] 661.3081
histogram(linear_regresion2_df$residuals)

# Log Linear regression:
vif(log_linear_regresion_df)
##               periodo                Empleo            Innovacion 
##           9300.244675              4.730382              5.141043 
##        Tipo_de_Cambio         CO2_Emisiones             Educacion 
##             62.602797              6.723812           8347.798634 
##      Inseguridad_Robo    Densidad_Carretera        PIB_Per_Capita 
##              9.947260             57.766617             33.202772 
##       ExportacionesMX        Salario_Diario Inseguridad_Homicidio 
##            168.278645            114.585619             31.424549 
##    Densidad_Poblacion                  INPC 
##           1099.236379           1296.841200
bptest(log_linear_regresion_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  log_linear_regresion_df
## BP = 5.0509, df = 14, p-value = 0.9851
AIC(log_linear_regresion_df)
## [1] -13.36425
histogram(log_linear_regresion_df$residuals)

# Log Linear regression without "periodo":
vif(log_linear_regresion2_df)
##                Empleo            Innovacion        Tipo_de_Cambio 
##              4.654336              5.138823             40.959177 
##         CO2_Emisiones             Educacion      Inseguridad_Robo 
##              6.222893           1511.043702              9.328882 
##    Densidad_Carretera        PIB_Per_Capita       ExportacionesMX 
##             57.765894             32.294264             81.407268 
##        Salario_Diario Inseguridad_Homicidio    Densidad_Poblacion 
##            114.585512             29.790759           1096.435161 
##                  INPC 
##           1203.388890
bptest(log_linear_regresion2_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  log_linear_regresion2_df
## BP = 4.9872, df = 13, p-value = 0.9755
AIC(log_linear_regresion2_df)
## [1] -15.34909
histogram(log_linear_regresion2_df$residuals)

# Polynomial regression:
vif(polynomial_regresion_df)
##               periodo                Empleo            Innovacion 
##          12911.624364              4.782716           1942.427754 
##       I(Innovacion^2)        Tipo_de_Cambio         CO2_Emisiones 
##           1944.059476             73.201471              7.130581 
##             Educacion      Inseguridad_Robo    Densidad_Carretera 
##           9083.580621             11.481684             61.505633 
##        PIB_Per_Capita       ExportacionesMX        Salario_Diario 
##             35.166094            176.261000            214.651936 
## Inseguridad_Homicidio    Densidad_Poblacion                  INPC 
##             40.756241           1117.525234           3291.032434
bptest(polynomial_regresion_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  polynomial_regresion_df
## BP = 11.514, df = 15, p-value = 0.7154
AIC(polynomial_regresion_df)
## [1] 664.8307
histogram(polynomial_regresion_df$residuals)

# Polynomial regression without "periodo":
vif(polynomial_regresion2_df)
##                Empleo            Innovacion       I(Innovacion^2) 
##              4.656670           1402.428957           1400.306288 
##        Tipo_de_Cambio         CO2_Emisiones             Educacion 
##             41.050732              6.250776           1939.855948 
##      Inseguridad_Robo    Densidad_Carretera        PIB_Per_Capita 
##             11.481533             60.412638             32.763627 
##       ExportacionesMX        Salario_Diario Inseguridad_Homicidio 
##             87.815456            186.570266             33.464161 
##    Densidad_Poblacion                  INPC 
##           1103.966735           2278.408362
bptest(polynomial_regresion2_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  polynomial_regresion2_df
## BP = 11.056, df = 14, p-value = 0.6816
AIC(polynomial_regresion2_df)
## [1] 662.8602
histogram(polynomial_regresion2_df$residuals)
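The `vif()` calls above (presumably from the car package, loaded outside this section) report variance inflation factors, VIF_j = 1 / (1 − R_j²), where R_j² is the R-squared from regressing predictor j on all the other predictors. The base-R sketch below computes that definition directly on simulated data with two nearly identical predictors. `vif_manual` is a hypothetical helper; for models without factor terms it should agree with the standard VIF. Values in the thousands, such as those for Educacion, Densidad_Poblacion and INPC, correspond to R_j² extremely close to 1 (about 0.9993 for a VIF of 1536).

```r
# Variance inflation factors from auxiliary regressions:
# VIF_j = 1 / (1 - R_j^2), with R_j^2 from lm(x_j ~ all other predictors)
vif_manual <- function(X) {
  X <- as.data.frame(X)
  sapply(names(X), function(v) {
    others <- setdiff(names(X), v)
    f  <- reformulate(others, response = v)
    r2 <- summary(lm(f, data = X))$r.squared
    1 / (1 - r2)
  })
}

set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)   # nearly a copy of x1 -> very large VIF
x3 <- rnorm(n)                   # independent of x1, x2 -> VIF close to 1
X  <- data.frame(x1, x2, x3)

v <- vif_manual(X)
print(round(v, 2))

# Equivalent computation: diagonal of the inverse correlation matrix
print(round(diag(solve(cor(X))), 2))
```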

### Adjusting the regression models: dropping "periodo" and adding the lagged dependent variable
# Adjusted Linear regression:
linear_regresion3_df<-lm(IED_FlujosMX ~ lag(IED_FlujosMX)+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(linear_regresion3_df)
## 
## Call:
## lm(formula = IED_FlujosMX ~ lag(IED_FlujosMX) + Empleo + Innovacion + 
##     Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + 
##     Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + 
##     Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -69413 -25598  -3945  23034  81196 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)   
## (Intercept)           -1.928e+07  5.817e+06  -3.315  0.00782 **
## lag(IED_FlujosMX)     -4.201e-01  2.207e-01  -1.904  0.08611 . 
## Empleo                 1.190e+05  4.045e+04   2.941  0.01476 * 
## Innovacion             5.969e+04  3.072e+04   1.943  0.08064 . 
## Tipo_de_Cambio         2.858e+04  2.086e+04   1.370  0.20062   
## CO2_Emisiones          1.312e+05  1.835e+05   0.715  0.49096   
## Educacion              2.477e+06  8.371e+05   2.959  0.01431 * 
## Inseguridad_Robo       5.658e+02  8.528e+02   0.663  0.52203   
## Densidad_Carretera     7.320e+06  7.153e+06   1.023  0.33022   
## PIB_Per_Capita        -1.011e+01  8.350e+00  -1.211  0.25375   
## ExportacionesMX       -9.568e-01  5.705e-01  -1.677  0.12443   
## Salario_Diario        -1.148e+03  4.811e+03  -0.239  0.81628   
## Inseguridad_Homicidio  7.911e+03  1.323e+04   0.598  0.56323   
## Densidad_Poblacion    -1.997e+05  9.551e+04  -2.091  0.06301 . 
## INPC                  -1.975e+04  1.996e+04  -0.989  0.34587   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 61180 on 10 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9214, Adjusted R-squared:  0.8112 
## F-statistic: 8.368 on 14 and 10 DF,  p-value: 0.0009276
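The adjusted models add `lag(IED_FlujosMX)`, the previous period's FDI flow. The note “(1 observation deleted due to missingness)” indicates that `lag()` here shifted the series by one position and padded the first value with NA, which is how `dplyr::lag()` behaves. Base R's `stats::lag()` does not shift the values of a plain numeric vector, so which package is attached matters. The sketch below builds the lag explicitly with a hypothetical helper `lag1` on made-up data and shows that `lm()` drops the first row.

```r
# One-period lag: shift values down by one and pad the start with NA
# (same result as dplyr::lag(x, 1) for a plain numeric vector)
lag1 <- function(x) c(NA, head(x, -1))

# Hypothetical annual series, for illustration only
toy <- data.frame(
  fdi    = c(100, 120, 115, 140, 160, 155),
  empleo = c(50, 52, 53, 55, 58, 60)
)
toy$fdi_lag1 <- lag1(toy$fdi)
print(toy)

fit <- lm(fdi ~ fdi_lag1 + empleo, data = toy)
# lm() uses na.omit by default, so the first row (NA lag) is dropped
print(nobs(fit))
```

In this report the same mechanism leaves 25 observations for the adjusted models, which with 15 estimated coefficients gives the 10 residual degrees of freedom shown in the summaries.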
# Adjusted Logarithmic Linear regression:
log_linear_regresion3_df<-lm(log(IED_FlujosMX) ~ lag(IED_FlujosMX)+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(log_linear_regresion3_df)
## 
## Call:
## lm(formula = log(IED_FlujosMX) ~ lag(IED_FlujosMX) + Empleo + 
##     Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion + 
##     Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + 
##     ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + 
##     Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.14763 -0.06662  0.01253  0.05292  0.19195 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           -2.791e+01  1.338e+01  -2.085   0.0636 .
## lag(IED_FlujosMX)     -9.335e-07  5.076e-07  -1.839   0.0958 .
## Empleo                 2.784e-01  9.306e-02   2.991   0.0135 *
## Innovacion             1.183e-01  7.066e-02   1.674   0.1251  
## Tipo_de_Cambio         6.257e-02  4.799e-02   1.304   0.2215  
## CO2_Emisiones          2.816e-01  4.221e-01   0.667   0.5199  
## Educacion              4.353e+00  1.926e+00   2.261   0.0473 *
## Inseguridad_Robo       2.927e-04  1.962e-03   0.149   0.8844  
## Densidad_Carretera     1.697e+01  1.645e+01   1.031   0.3266  
## PIB_Per_Capita        -2.592e-05  1.921e-05  -1.349   0.2069  
## ExportacionesMX       -2.089e-06  1.312e-06  -1.592   0.1425  
## Salario_Diario        -7.603e-03  1.107e-02  -0.687   0.5077  
## Inseguridad_Homicidio  1.410e-02  3.044e-02   0.463   0.6532  
## Densidad_Poblacion    -3.579e-01  2.197e-01  -1.629   0.1344  
## INPC                  -2.178e-02  4.592e-02  -0.474   0.6455  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1407 on 10 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9134, Adjusted R-squared:  0.7921 
## F-statistic: 7.533 on 14 and 10 DF,  p-value: 0.00144
# Adjusted Polynomial regression:
polynomial_regresion3_df <- lm(IED_FlujosMX ~ lag(IED_FlujosMX)+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+I(Innovacion^2)+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(polynomial_regresion3_df)
## 
## Call:
## lm(formula = IED_FlujosMX ~ lag(IED_FlujosMX) + Empleo + Innovacion + 
##     Tipo_de_Cambio + CO2_Emisiones + Educacion + I(Innovacion^2) + 
##     Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + 
##     ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + 
##     Densidad_Poblacion + INPC, data = df_cash)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -69204 -25692  -3701  23449  81170 
## 
## Coefficients:
##                         Estimate Std. Error t value Pr(>|t|)  
## (Intercept)           -1.942e+07  8.754e+06  -2.218   0.0537 .
## lag(IED_FlujosMX)     -4.214e-01  2.408e-01  -1.750   0.1141  
## Empleo                 1.191e+05  4.296e+04   2.772   0.0217 *
## Innovacion             6.955e+04  4.628e+05   0.150   0.8838  
## Tipo_de_Cambio         2.865e+04  2.219e+04   1.291   0.2290  
## CO2_Emisiones          1.310e+05  1.935e+05   0.677   0.5153  
## Educacion              2.484e+06  9.430e+05   2.634   0.0272 *
## I(Innovacion^2)       -3.776e+02  1.767e+04  -0.021   0.9834  
## Inseguridad_Robo       5.761e+02  1.019e+03   0.565   0.5856  
## Densidad_Carretera     7.364e+06  7.812e+06   0.943   0.3705  
## PIB_Per_Capita        -1.015e+01  8.950e+00  -1.134   0.2862  
## ExportacionesMX       -9.527e-01  6.312e-01  -1.510   0.1654  
## Salario_Diario        -1.088e+03  5.781e+03  -0.188   0.8549  
## Inseguridad_Homicidio  7.789e+03  1.508e+04   0.516   0.6180  
## Densidad_Poblacion    -1.993e+05  1.026e+05  -1.943   0.0839 .
## INPC                  -2.013e+04  2.767e+04  -0.728   0.4853  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 64490 on 9 degrees of freedom
##   (1 observation deleted due to missingness)
## Multiple R-squared:  0.9214, Adjusted R-squared:  0.7903 
## F-statistic: 7.029 on 15 and 9 DF,  p-value: 0.002846
### Adjusted Diagnostics Tests
# Adjusted Linear regression:
vif(linear_regresion3_df)
##     lag(IED_FlujosMX)                Empleo            Innovacion 
##              6.677654              5.676148              6.390772 
##        Tipo_de_Cambio         CO2_Emisiones             Educacion 
##             45.921300              6.623875           2094.472155 
##      Inseguridad_Robo    Densidad_Carretera        PIB_Per_Capita 
##              9.707707             54.669429             34.229534 
##       ExportacionesMX        Salario_Diario Inseguridad_Homicidio 
##             78.538609            187.950284             58.857167 
##    Densidad_Poblacion                  INPC 
##           1536.038948           1443.904797
bptest(linear_regresion3_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  linear_regresion3_df
## BP = 8.8414, df = 14, p-value = 0.8411
AIC(linear_regresion3_df)
## [1] 631.1195
histogram(linear_regresion3_df$residuals)

# Adjusted Log Linear regression:
vif(log_linear_regresion3_df)
##     lag(IED_FlujosMX)                Empleo            Innovacion 
##              6.677654              5.676148              6.390772 
##        Tipo_de_Cambio         CO2_Emisiones             Educacion 
##             45.921300              6.623875           2094.472155 
##      Inseguridad_Robo    Densidad_Carretera        PIB_Per_Capita 
##              9.707707             54.669429             34.229534 
##       ExportacionesMX        Salario_Diario Inseguridad_Homicidio 
##             78.538609            187.950284             58.857167 
##    Densidad_Poblacion                  INPC 
##           1536.038948           1443.904797
bptest(log_linear_regresion3_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  log_linear_regresion3_df
## BP = 7.6539, df = 14, p-value = 0.9066
AIC(log_linear_regresion3_df)
## [1] -18.00063
histogram(log_linear_regresion3_df$residuals)

# Adjusted Polynomial regression:
vif(polynomial_regresion3_df)
##     lag(IED_FlujosMX)                Empleo            Innovacion 
##              7.158290              5.762157           1305.720393 
##        Tipo_de_Cambio         CO2_Emisiones             Educacion 
##             46.778600              6.632396           2392.565230 
##       I(Innovacion^2)      Inseguridad_Robo    Densidad_Carretera 
##           1342.833493             12.470976             58.698280 
##        PIB_Per_Capita       ExportacionesMX        Salario_Diario 
##             35.392175             86.523871            244.246185 
## Inseguridad_Homicidio    Densidad_Poblacion                  INPC 
##             68.813372           1595.158196           2495.929531
bptest(polynomial_regresion3_df)
## 
##  studentized Breusch-Pagan test
## 
## data:  polynomial_regresion3_df
## BP = 9.0352, df = 15, p-value = 0.8757
AIC(polynomial_regresion3_df)
## [1] 633.1182
histogram(polynomial_regresion3_df$residuals)
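`bptest()` (assumed to be lmtest's, loaded outside this section) computes by default the studentized (Koenker) Breusch-Pagan statistic: regress the squared OLS residuals on the model's regressors, take n · R² of that auxiliary regression, and compare it with a χ² distribution whose degrees of freedom equal the number of regressors excluding the intercept (14 for the adjusted linear model). The base-R sketch below reproduces that recipe on simulated data; `bp_koenker` is a hypothetical helper meant to illustrate the statistic, not to replace `bptest()`.

```r
# Studentized (Koenker) Breusch-Pagan statistic: n * R^2 from regressing
# squared OLS residuals on the same regressors, compared with chi^2(k)
bp_koenker <- function(fit) {
  Z    <- model.matrix(fit)[, -1, drop = FALSE]  # regressors without intercept
  e2   <- residuals(fit)^2
  aux  <- lm(e2 ~ Z)                             # auxiliary regression
  stat <- length(e2) * summary(aux)$r.squared
  k    <- ncol(Z)
  c(BP = stat, df = k, p.value = pchisq(stat, df = k, lower.tail = FALSE))
}

set.seed(42)
n <- 200
x <- runif(n, 1, 10)
y_het <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)  # error spread grows with x
y_hom <- 2 + 3 * x + rnorm(n, sd = 2)        # constant error spread

bp_het <- bp_koenker(lm(y_het ~ x))
bp_hom <- bp_koenker(lm(y_hom ~ x))
print(rbind(heteroscedastic = bp_het, homoscedastic = bp_hom))
```

For the models in this report every p-value is above 0.68, so the test gives no evidence against constant error variance; with only 25 or 26 observations, however, the test has limited power.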

Selection of the regression model that best fits the data (taking the diagnostic tests into account), and interpretation of the regression results of the selected model.

To select the best regression model, several rounds of changes (approximately five) were made to the variables and tests, including excluding variables as suggested by the LASSO model, adding variables according to their significance for the dependent variable, and including a lagged dependent variable. The final models dropped “periodo” and included all remaining variables plus the lagged variable “lag(IED_FlujosMX)”. The Adjusted Logarithmic Linear Regression Model was chosen because it fits well, with an adjusted R-squared of 0.7921. This is slightly lower than the adjusted R-squared of the adjusted linear model (0.8112), but still indicates a good fit. Its AIC (-18.00063) is also far lower than that of the adjusted linear model (631.1195), which was taken as further support for its suitability; note, however, that AIC values are only directly comparable between models with the same dependent variable, so this comparison between a log-scale and a level-scale model should be read with caution.

The regression results of the selected model indicate that employment (Empleo, p = 0.0135) and education (Educacion, p = 0.0473) are the only predictors of foreign direct investment in Mexico that are significant at the 5% level. The coefficient for “Empleo” is positive (0.2784), suggesting that an increase in the economically active population is associated with a higher level of foreign direct investment. This finding supports the hypothesis of a positive correlation between employment and foreign direct investment flows. Similarly, the positive coefficient for “Educacion” (4.353) indicates that higher education levels are associated with higher foreign direct investment in Mexico.
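Because the selected model uses log(IED_FlujosMX) as the dependent variable, each coefficient b is a semi-elasticity: holding the other predictors constant, a one-unit increase in the predictor is associated with a change of about 100 × (exp(b) − 1) percent in FDI flows. The sketch below applies that conversion to the Empleo and Educacion estimates copied from the summary of log_linear_regresion3_df. What “one unit” means depends on how each variable is measured, which is defined elsewhere in the report and not assumed here.

```r
# Coefficients copied from summary(log_linear_regresion3_df)
b <- c(Empleo = 2.784e-01, Educacion = 4.353e+00)

# Exact percentage change in FDI for a one-unit increase (log-level model)
pct_exact <- 100 * (exp(b) - 1)

# Common small-coefficient approximation 100 * b (poor when |b| is large)
pct_approx <- 100 * b

print(round(cbind(pct_exact, pct_approx), 1))
```

For Empleo the approximation (27.8%) and the exact figure (about 32.1%) are close. For Educacion the coefficient is far too large for the approximation to be meaningful, and the implied change of several thousand percent, together with its VIF of about 2094, suggests checking the scale of that variable before drawing economic conclusions.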

Show the predicted values of the dependent variable

effect_plot(log_linear_regresion3_df,pred=Empleo,interval=TRUE)
## Using data df_cash from global environment. This could cause incorrect
## results if df_cash has been altered since the model was fit. You can
## manually provide the data to the "data =" argument.
## Warning: Removed 1 row containing missing values (`geom_path()`).

effect_plot(log_linear_regresion3_df,pred=Educacion,interval=TRUE)
## Using data df_cash from global environment. This could cause incorrect
## results if df_cash has been altered since the model was fit. You can
## manually provide the data to the "data =" argument.
## Warning: Removed 1 row containing missing values (`geom_path()`).

effect_plot(log_linear_regresion3_df,pred=Innovacion,interval=TRUE)
## Using data df_cash from global environment. This could cause incorrect
## results if df_cash has been altered since the model was fit. You can
## manually provide the data to the "data =" argument.
## Warning: Removed 1 row containing missing values (`geom_path()`).
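`effect_plot()` (from the jtools package) draws the model's predicted values of the dependent variable across the range of one predictor while holding the others fixed. A rough base-R analogue is sketched below on simulated data. It assumes the other predictors are held at their means and that predictions and confidence limits from the log model are back-transformed with exp(); the variable names are hypothetical, and this does not claim to reproduce jtools' exact defaults. The “Using data df_cash from global environment” messages above can be avoided by passing `data = df_cash` to `effect_plot()`, as the message itself suggests.

```r
set.seed(7)
n <- 40
d <- data.frame(empleo = runif(n, 50, 60), educ = runif(n, 8, 10))
d$fdi <- exp(1 + 0.05 * d$empleo + 0.2 * d$educ + rnorm(n, sd = 0.1))

fit <- lm(log(fdi) ~ empleo + educ, data = d)

# Grid over the focal predictor; the other predictor is held at its mean
grid <- data.frame(
  empleo = seq(min(d$empleo), max(d$empleo), length.out = 20),
  educ   = mean(d$educ)
)

# 95% confidence interval on the log scale, then back-transformed with exp()
p <- predict(fit, newdata = grid, interval = "confidence", level = 0.95)
pred_fdi <- exp(p)   # columns fit, lwr, upr on the original FDI scale

plot(grid$empleo, pred_fdi[, "fit"], type = "l", ylim = range(pred_fdi),
     xlab = "empleo", ylab = "Predicted FDI")
lines(grid$empleo, pred_fdi[, "lwr"], lty = 2)
lines(grid$empleo, pred_fdi[, "upr"], lty = 2)
```

Back-transforming with exp() gives predictions that correspond to the conditional median of FDI rather than its mean when errors are normal on the log scale.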


Part 5. Conclusions

Briefly summarize the main insights from your data analysis in Parts 3-4.

Summary of Data Analysis Insights:

  1. Model Selection: Initially, the Linear Regression Model showed the best fit among all the regression models, with the highest adjusted R-squared value. However, given the multicollinearity issues, adjustments were made to the variables: adding the new variable “lag(IED_FlujosMX)” and excluding the “periodo” variable improved the fit. The Adjusted Logarithmic Linear Regression Model had the second-highest adjusted R-squared (0.7921, versus 0.8112 for the adjusted Linear Regression Model), but a much lower AIC (-18.00063 versus 631.1195), although AIC values from models with a logged and an unlogged dependent variable are not directly comparable.

  2. Hypothesis Confirmation: Based on the regression results, Hypothesis 1, which posits a positive relationship between employment (Empleo) and foreign direct investment (FDI), is supported by the analysis. The positive and statistically significant coefficient for “Empleo” (p = 0.0135 in the selected model) indicates that an increase in the economically active population is associated with a higher level of FDI.

  3. Ridge Model Performance: On the test data, the Ridge model had a lower Root Mean Squared Error (RMSE) and a higher R-squared than the Lasso model (RMSE 135411.3 vs. 144761.4; R-squared 0.5144 vs. 0.4561), suggesting that it may predict FDI better.

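  4. Multicollinearity: There is evidence of strong multicollinearity among the independent variables in all three model types (linear, log-linear and polynomial). Several variance inflation factors are far above the commonly used threshold of 10, for example Educacion (2094.5), Densidad_Poblacion (1536.0) and INPC (1443.9) in the adjusted linear and log-linear models. This indicates that some predictors are highly correlated with each other, which inflates the standard errors and makes the individual coefficient estimates unstable. Although efforts were made to address multicollinearity, further investigation is needed to reduce it.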

  5. Heteroscedasticity: The Breusch-Pagan tests on all models did not reject the null hypothesis of constant variance (all p-values above 0.68), so the homoscedasticity assumption appears reasonable.

  6. Insights and Lessons Learned: The analysis showed that variables such as education, employment and population density were more strongly associated with foreign direct investment than initially anticipated (population density only marginally, at the 10% level, in the adjusted linear and polynomial models). This highlights the importance of thorough analysis and diagnostic testing to uncover the relationships in the data and identify the most influential factors. It also shows the need for continued investigation to refine the results.
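The AIC of a model for log(y) and the AIC of a model for y are on different scales, because the log model's likelihood is a density for log(y), not for y. Assuming normal errors on the log scale, the change of variables adds a Jacobian term: AIC on the y scale = AIC(log model) + 2 · Σ log(yᵢ). The sketch below checks this identity on simulated data against a direct lognormal log-likelihood. Applying it to the adjusted models in this report would require the 25 values of IED_FlujosMX used in the fit, which are not shown in this section.

```r
set.seed(3)
n <- 30
x <- runif(n, 0, 2)
y <- exp(1 + 0.8 * x + rnorm(n, sd = 0.2))   # positive outcome

fit_log   <- lm(log(y) ~ x)   # model for log(y)
fit_level <- lm(y ~ x)        # model for y

# Put the log model's AIC on the scale of y via the Jacobian of log()
aic_log_on_y <- AIC(fit_log) + 2 * sum(log(y))

# Direct check: log-likelihood of y under the implied lognormal model,
# using the maximum-likelihood error SD (as logLik.lm does)
sigma_mle  <- sqrt(mean(residuals(fit_log)^2))
ll_y       <- sum(dlnorm(y, meanlog = fitted(fit_log), sdlog = sigma_mle, log = TRUE))
aic_direct <- -2 * ll_y + 2 * (length(coef(fit_log)) + 1)

print(c(level          = AIC(fit_level),
        log_unadjusted = AIC(fit_log),
        log_on_y_scale = aic_log_on_y,
        direct_check   = aic_direct))
```

Only the adjusted value (`log_on_y_scale`) can be ranked against the level model's AIC, and only when both models are fitted to the same observations.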

References