Time series analysis is a statistical method used to analyze and interpret data points collected over time, typically at regular intervals. It focuses on understanding and modeling the patterns, trends, and dependencies within a time-ordered dataset. Time series data can be found in various fields, including finance, economics, weather forecasting, and more, making it a crucial tool for forecasting future values and making informed decisions.
One widely recognized reference for time series analysis is the book “Time Series Analysis and Its Applications: With R Examples” by Robert H. Shumway and David S. Stoffer. This book provides a comprehensive introduction to time series analysis, covering topics such as data decomposition, trend identification, seasonality detection, and various statistical methods for modeling and forecasting time series data. Time series analysis plays a pivotal role in understanding historical patterns, making predictions, and informing decision-making processes across numerous domains.
Mexico is a popular nearshoring destination for companies, especially those in the United States, due to its geographic proximity, cost-effectiveness, and skilled workforce.
Supply Chain Diversification: Many companies were looking to diversify their supply chains to reduce dependency on a single region, especially in light of disruptions caused by the COVID-19 pandemic.
Digital Transformation: The adoption of digital technologies and Industry 4.0 practices was on the rise in Mexican manufacturing facilities, making it an attractive destination for companies seeking advanced capabilities.
Talent Development: The Mexican government and educational institutions were investing in skill development programs to ensure a steady supply of skilled labor for industries like IT, manufacturing, and engineering.
Logistics Infrastructure: Investments in logistics and transportation infrastructure were improving connectivity and reducing lead times, making Mexico even more competitive for nearshoring.
The problem addressed in the case study is twofold. First, it asks why Mexico is an attractive country for nearshoring, as the title of the case study states. Second, it asks which econometric model(s) should be applied to predict the effect of nearshoring in Mexico, and therefore what investors may consider when relocating their investments to Mexico in 2023 and in future years.
This problem will be approached by analyzing the case study and some basic background information on the topic. Next, the database provided will be analyzed and tested in RStudio, taking “Flujos de Inversion Extranjera Directa” (foreign direct investment flows) as the dependent variable. Lastly, different tests and models will be run to predict the effect of nearshoring in Mexico and to identify which variable has the greatest impact on the dependent variable, leading to a conclusion and an accurate interpretation of the results.
library(foreign)
library(dplyr) # data manipulation
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(forcats) # to work with categorical variables
library(ggplot2) # data visualization
library(readr) # read specific csv files
library(janitor) # data exploration and cleaning
##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(Hmisc) # several useful functions for data analysis
##
## Attaching package: 'Hmisc'
## The following objects are masked from 'package:dplyr':
##
## src, summarize
## The following objects are masked from 'package:base':
##
## format.pval, units
library(psych) # functions for multivariate analysis
##
## Attaching package: 'psych'
## The following object is masked from 'package:Hmisc':
##
## describe
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(naniar) # summaries and visualization of missing values NA's
library(corrplot) # correlation plots
## corrplot 0.92 loaded
library(jtools) # presentation of regression analysis
##
## Attaching package: 'jtools'
## The following object is masked from 'package:Hmisc':
##
## %nin%
library(lmtest) # diagnostic checks - linear regression analysis
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(car) # diagnostic checks - linear regression analysis
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:psych':
##
## logit
## The following object is masked from 'package:dplyr':
##
## recode
library(olsrr) # diagnostic checks - linear regression analysis
##
## Attaching package: 'olsrr'
## The following object is masked from 'package:datasets':
##
## rivers
library(naniar) # identifying missing values
library(stargazer) # create publication quality tables
##
## Please cite as:
## Hlavac, Marek (2022). stargazer: Well-Formatted Regression and Summary Statistics Tables.
## R package version 5.2.3. https://CRAN.R-project.org/package=stargazer
library(effects) # displays for linear and other regression models
## lattice theme set by effectsTheme()
## See ?effectsTheme for details.
library(tidyverse) # collection of R packages designed for data science
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ✔ stringr 1.5.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ psych::%+%() masks ggplot2::%+%()
## ✖ psych::alpha() masks ggplot2::alpha()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ✖ car::recode() masks dplyr::recode()
## ✖ purrr::some() masks car::some()
## ✖ Hmisc::src() masks dplyr::src()
## ✖ Hmisc::summarize() masks dplyr::summarize()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(caret) # Classification and Regression Training
## Loading required package: lattice
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(glmnet) # methods for prediction and plotting, and functions for cross-validation
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
## Loaded glmnet 4.1-7
library(xts)
##
## ######################### Warning from 'xts' package ##########################
## # #
## # The dplyr lag() function breaks how base R's lag() function is supposed to #
## # work, which breaks lag(my_xts). Calls to lag(my_xts) that you type or #
## # source() into this session won't work correctly. #
## # #
## # Use stats::lag() to make sure you're not using dplyr::lag(), or you can add #
## # conflictRules('dplyr', exclude = 'lag') to your .Rprofile to stop #
## # dplyr from breaking base R's lag() function. #
## # #
## # Code in packages is not affected. It's protected by R's namespace mechanism #
## # Set `options(xts.warn_dplyr_breaks_lag = FALSE)` to suppress this warning. #
## # #
## ###############################################################################
##
## Attaching package: 'xts'
##
## The following objects are masked from 'package:dplyr':
##
## first, last
library(tseries)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
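The startup messages above report that `dplyr` masks `stats::lag()` and `stats::filter()`, which matters for the time series work later on. A minimal sketch of one way to deal with this is shown below: load packages quietly and call the masked function through its namespace. Only packages that ship with R are used so that the example runs anywhere; the vector `pkgs` is a stand-in for the packages loaded in this report.

```r
# Sketch: load packages without startup messages and call masked functions explicitly.
# Assumption: pkgs is a placeholder; in the report it would list the packages loaded above.
pkgs <- c("stats", "utils")
for (p in pkgs) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}

# With dplyr attached, a bare lag() refers to dplyr's version.
# Writing the namespace removes the ambiguity.
x <- ts(c(10, 20, 30, 40), start = c(1999, 1), frequency = 4)
x_lagged <- stats::lag(x, k = -1)   # moves the time index forward by one quarter
time(x_lagged)[1]                   # first time stamp of the shifted series
```

The values of the series are unchanged by `stats::lag()`; only the time index moves, which is the behaviour the `xts` warning refers to.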
#file.choose()
df <- read.csv("/Users/genarorodriguezalcantara/Desktop/Tec/Introduction to econometrics/Ev1/ev2_data.csv")
df # Confirm that the database has been loaded correctly.
## periodo IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## 1 1997 12145.60 9087.62 NA 7.20 24.30 11.30
## 2 1998 8373.50 9875.07 NA 7.31 31.91 11.37
## 3 1999 13960.32 10990.01 NA 7.43 31.91 12.46
## 4 2000 18248.69 12482.96 97.83 7.56 35.12 13.15
## 5 2001 30057.18 11300.44 97.36 7.68 37.57 13.47
## 6 2002 24099.21 11923.10 97.66 7.80 39.74 12.80
## 7 2003 18249.97 13156.00 97.06 7.93 41.53 11.81
## 8 2004 25015.57 13573.13 96.48 8.04 43.30 12.61
## 9 2005 25795.82 16465.81 97.17 8.14 45.24 13.41
## 10 2006 21232.54 17485.93 96.53 8.26 47.05 14.23
## 11 2007 32393.33 19103.85 96.60 8.36 48.88 15.04
## 12 2008 29502.46 16924.76 95.68 8.46 50.84 14.82
## 13 2009 17849.95 19702.63 95.20 8.56 53.19 12.59
## 14 2010 27189.28 22673.14 95.06 8.63 55.77 12.69
## 15 2011 25632.52 24333.02 95.49 8.75 58.06 12.10
## 16 2012 21769.32 26297.98 95.53 8.85 60.75 13.03
## 17 2013 48354.42 27687.57 95.75 8.95 63.12 13.22
## 18 2014 30351.25 31676.78 96.24 9.05 65.58 13.65
## 19 2015 35943.75 29959.94 96.04 9.15 70.10 15.11
## 20 2016 31188.98 31375.06 96.62 9.25 73.04 14.40
## 21 2017 34017.05 33322.62 96.85 9.35 88.36 14.05
## 22 2018 34100.43 35341.90 96.64 9.45 88.36 13.25
## 23 2019 34577.16 36414.73 97.09 9.58 102.68 12.70
## 24 2020 28205.89 41077.34 96.21 NA 123.22 11.28
## 25 2021 31553.52 44914.78 96.49 NA 141.70 NA
## 26 2022 36215.37 46477.59 97.24 NA 172.87 NA
## Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera
## 1 266.51 14.55 8.06 0.05
## 2 314.78 14.32 9.94 0.05
## 3 272.89 12.64 9.52 0.06
## 4 216.98 10.86 9.60 0.06
## 5 214.53 10.25 9.17 0.06
## 6 197.80 9.94 10.36 0.06
## 7 183.22 9.81 11.20 0.06
## 8 146.28 8.92 11.22 0.06
## 9 136.94 9.22 10.71 0.06
## 10 135.59 9.60 10.88 0.06
## 11 145.92 8.04 10.90 0.06
## 12 158.17 12.52 13.77 0.07
## 13 175.77 17.46 13.04 0.07
## 14 201.94 22.43 12.38 0.07
## 15 212.61 23.42 13.98 0.07
## 16 190.28 22.09 12.99 0.07
## 17 185.56 19.74 13.07 0.08
## 18 154.41 16.93 14.73 0.08
## 19 180.44 17.37 17.34 0.08
## 20 160.57 20.31 20.66 0.08
## 21 230.43 26.22 19.74 0.09
## 22 184.25 29.59 19.66 0.09
## 23 173.45 29.21 18.87 0.09
## 24 133.90 28.98 19.94 0.09
## 25 127.13 27.89 20.52 0.09
## 26 120.49 NA 19.41 0.09
## Densidad_Poblacion CO2_Emisiones PIB_Per_Capita INPC
## 1 47.44 3.68 127570.1 33.28
## 2 48.76 3.85 126738.8 39.47
## 3 49.48 3.69 129164.7 44.34
## 4 50.58 3.87 130874.9 48.31
## 5 51.28 3.81 128083.4 50.43
## 6 51.95 3.82 128205.9 53.31
## 7 52.61 3.95 128737.9 55.43
## 8 53.27 3.98 132563.5 58.31
## 9 54.78 4.10 132941.1 60.25
## 10 55.44 4.19 135894.9 62.69
## 11 56.17 4.22 137795.7 65.05
## 12 56.96 4.19 135176.0 69.30
## 13 57.73 4.04 131233.0 71.77
## 14 58.45 4.11 134991.7 74.93
## 15 59.15 4.19 138891.9 77.79
## 16 59.85 4.20 141530.2 80.57
## 17 59.49 4.06 144112.0 83.77
## 18 60.17 3.89 147277.4 87.19
## 19 60.86 3.93 149433.5 89.05
## 20 61.57 3.89 152275.4 92.04
## 21 62.28 3.84 153235.7 98.27
## 22 63.11 3.65 153133.8 99.91
## 23 63.90 3.59 150233.1 105.93
## 24 64.59 NA 142609.3 109.27
## 25 65.16 NA 142772.0 117.31
## 26 65.60 NA 146826.7 126.48
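The path used in `read.csv()` is an absolute path on one machine, so the script will not run elsewhere without editing. The sketch below shows one possible pattern: build the path with `file.path()`, check that the file exists, read it once, and work on a copy. The folder, the file name `ev2_data_demo.csv` and the object names ending in `_demo` are assumptions made for illustration, and the two rows are copied from the table above.

```r
# Hypothetical sketch: the folder and file name below are assumptions.
data_dir <- tempdir()                        # stand-in for the project's data folder
csv_path <- file.path(data_dir, "ev2_data_demo.csv")

# Write a two-row demo file so the example runs on its own (values from the table above).
demo <- data.frame(periodo = c(1997, 1998),
                   IED_Flujos = c(12145.60, 8373.50),
                   Tipo_de_Cambio = c(8.06, 9.94),
                   INPC = c(33.28, 39.47))
write.csv(demo, csv_path, row.names = FALSE)

stopifnot(file.exists(csv_path))             # fail early with a clear error if the path is wrong
df_demo <- read.csv(csv_path)                # read the file once
df_cash_demo <- df_demo                      # work on a copy; the raw data stays untouched
str(df_demo)
```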
df_cash <- read.csv("/Users/genarorodriguezalcantara/Desktop/Tec/Introduction to econometrics/Ev1/ev2_data.csv")
df_cash # Confirm that the database has been loaded correctly (same file as df).
df_time_series <- read.csv("/Users/genarorodriguezalcantara/Desktop/Tec/Introduction to econometrics/Ev1/ev22_series_tiempo.csv")
df_time_series
## Año Trimestre IED_Flujos
## 1 1999 I 3596.08
## 2 1999 II 3395.89
## 3 1999 III 3028.45
## 4 1999 IV 3939.90
## 5 2000 I 4600.64
## 6 2000 II 4857.42
## 7 2000 III 3056.95
## 8 2000 IV 5733.68
## 9 2001 I 3598.68
## 10 2001 II 5218.83
## 11 2001 III 16314.05
## 12 2001 IV 4925.63
## 13 2002 I 5067.98
## 14 2002 II 6258.52
## 15 2002 III 6114.34
## 16 2002 IV 6658.37
## 17 2003 I 3963.69
## 18 2003 II 5547.34
## 19 2003 III 2521.68
## 20 2003 IV 6217.27
## 21 2004 I 9363.46
## 22 2004 II 4351.50
## 23 2004 III 3284.91
## 24 2004 IV 8015.70
## 25 2005 I 6761.62
## 26 2005 II 6773.62
## 27 2005 III 5478.92
## 28 2005 IV 6781.66
## 29 2006 I 7436.81
## 30 2006 II 6634.31
## 31 2006 III 2346.57
## 32 2006 IV 4814.85
## 33 2007 I 10815.78
## 34 2007 II 6137.64
## 35 2007 III 7628.45
## 36 2007 IV 7811.47
## 37 2008 I 8546.44
## 38 2008 II 8376.14
## 39 2008 III 5643.68
## 40 2008 IV 6936.20
## 41 2009 I 6105.30
## 42 2009 II 6094.18
## 43 2009 III 2397.52
## 44 2009 IV 3252.95
## 45 2010 I 8722.26
## 46 2010 II 9301.62
## 47 2010 III 3932.89
## 48 2010 IV 5232.50
## 49 2011 I 8431.21
## 50 2011 II 6697.41
## 51 2011 III 4349.89
## 52 2011 IV 6154.00
## 53 2012 I 7892.69
## 54 2012 II 5622.13
## 55 2012 III 5736.29
## 56 2012 IV 2518.21
## 57 2013 I 10571.58
## 58 2013 II 21019.14
## 59 2013 III 4178.81
## 60 2013 IV 12584.89
## 61 2014 I 13828.00
## 62 2014 II 5478.74
## 63 2014 III 3222.08
## 64 2014 IV 7822.43
## 65 2015 I 12136.33
## 66 2015 II 6656.85
## 67 2015 III 9635.56
## 68 2015 IV 7515.02
## 69 2016 I 12805.36
## 70 2016 II 6210.62
## 71 2016 III 4317.27
## 72 2016 IV 7855.73
## 73 2017 I 13779.06
## 74 2017 II 6814.16
## 75 2017 III 6361.22
## 76 2017 IV 7062.61
## 77 2018 I 14067.52
## 78 2018 II 9577.75
## 79 2018 III 4132.97
## 80 2018 IV 6322.19
## 81 2019 I 15175.27
## 82 2019 II 6504.62
## 83 2019 III 8217.40
## 84 2019 IV 4679.87
## 85 2020 I 16807.60
## 86 2020 II 7293.96
## 87 2020 III 1340.58
## 88 2020 IV 2763.75
## 89 2021 I 16206.05
## 90 2021 II 5883.73
## 91 2021 III 6419.43
## 92 2021 IV 3044.31
## 93 2022 I 22794.16
## 94 2022 II 8164.27
## 95 2022 III 3479.68
## 96 2023 IV 1777.26
df_cash$IED_FlujosMX <- ((df_cash$IED_Flujos * df_cash$Tipo_de_Cambio)/df_cash$INPC)*100
df_cash$ExportacionesMX <- ((df_cash$Exportaciones * df_cash$Tipo_de_Cambio)/df_cash$INPC)*100
df_cash
## periodo IED_Flujos Exportaciones Empleo Educacion Salario_Diario Innovacion
## 1 1997 12145.60 9087.62 NA 7.20 24.30 11.30
## 2 1998 8373.50 9875.07 NA 7.31 31.91 11.37
## 3 1999 13960.32 10990.01 NA 7.43 31.91 12.46
## 4 2000 18248.69 12482.96 97.83 7.56 35.12 13.15
## 5 2001 30057.18 11300.44 97.36 7.68 37.57 13.47
## 6 2002 24099.21 11923.10 97.66 7.80 39.74 12.80
## 7 2003 18249.97 13156.00 97.06 7.93 41.53 11.81
## 8 2004 25015.57 13573.13 96.48 8.04 43.30 12.61
## 9 2005 25795.82 16465.81 97.17 8.14 45.24 13.41
## 10 2006 21232.54 17485.93 96.53 8.26 47.05 14.23
## 11 2007 32393.33 19103.85 96.60 8.36 48.88 15.04
## 12 2008 29502.46 16924.76 95.68 8.46 50.84 14.82
## 13 2009 17849.95 19702.63 95.20 8.56 53.19 12.59
## 14 2010 27189.28 22673.14 95.06 8.63 55.77 12.69
## 15 2011 25632.52 24333.02 95.49 8.75 58.06 12.10
## 16 2012 21769.32 26297.98 95.53 8.85 60.75 13.03
## 17 2013 48354.42 27687.57 95.75 8.95 63.12 13.22
## 18 2014 30351.25 31676.78 96.24 9.05 65.58 13.65
## 19 2015 35943.75 29959.94 96.04 9.15 70.10 15.11
## 20 2016 31188.98 31375.06 96.62 9.25 73.04 14.40
## 21 2017 34017.05 33322.62 96.85 9.35 88.36 14.05
## 22 2018 34100.43 35341.90 96.64 9.45 88.36 13.25
## 23 2019 34577.16 36414.73 97.09 9.58 102.68 12.70
## 24 2020 28205.89 41077.34 96.21 NA 123.22 11.28
## 25 2021 31553.52 44914.78 96.49 NA 141.70 NA
## 26 2022 36215.37 46477.59 97.24 NA 172.87 NA
## Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera
## 1 266.51 14.55 8.06 0.05
## 2 314.78 14.32 9.94 0.05
## 3 272.89 12.64 9.52 0.06
## 4 216.98 10.86 9.60 0.06
## 5 214.53 10.25 9.17 0.06
## 6 197.80 9.94 10.36 0.06
## 7 183.22 9.81 11.20 0.06
## 8 146.28 8.92 11.22 0.06
## 9 136.94 9.22 10.71 0.06
## 10 135.59 9.60 10.88 0.06
## 11 145.92 8.04 10.90 0.06
## 12 158.17 12.52 13.77 0.07
## 13 175.77 17.46 13.04 0.07
## 14 201.94 22.43 12.38 0.07
## 15 212.61 23.42 13.98 0.07
## 16 190.28 22.09 12.99 0.07
## 17 185.56 19.74 13.07 0.08
## 18 154.41 16.93 14.73 0.08
## 19 180.44 17.37 17.34 0.08
## 20 160.57 20.31 20.66 0.08
## 21 230.43 26.22 19.74 0.09
## 22 184.25 29.59 19.66 0.09
## 23 173.45 29.21 18.87 0.09
## 24 133.90 28.98 19.94 0.09
## 25 127.13 27.89 20.52 0.09
## 26 120.49 NA 19.41 0.09
## Densidad_Poblacion CO2_Emisiones PIB_Per_Capita INPC IED_FlujosMX
## 1 47.44 3.68 127570.1 33.28 294151.2
## 2 48.76 3.85 126738.8 39.47 210875.6
## 3 49.48 3.69 129164.7 44.34 299734.4
## 4 50.58 3.87 130874.9 48.31 362631.8
## 5 51.28 3.81 128083.4 50.43 546548.4
## 6 51.95 3.82 128205.9 53.31 468332.0
## 7 52.61 3.95 128737.9 55.43 368752.8
## 8 53.27 3.98 132563.5 58.31 481349.2
## 9 54.78 4.10 132941.1 60.25 458544.8
## 10 55.44 4.19 135894.9 62.69 368495.8
## 11 56.17 4.22 137795.7 65.05 542793.7
## 12 56.96 4.19 135176.0 69.30 586217.7
## 13 57.73 4.04 131233.0 71.77 324318.4
## 14 58.45 4.11 134991.7 74.93 449223.7
## 15 59.15 4.19 138891.9 77.79 460653.8
## 16 59.85 4.20 141530.2 80.57 350978.6
## 17 59.49 4.06 144112.0 83.77 754437.5
## 18 60.17 3.89 147277.4 87.19 512758.2
## 19 60.86 3.93 149433.5 89.05 699904.1
## 20 61.57 3.89 152275.4 92.04 700091.6
## 21 62.28 3.84 153235.7 98.27 683318.0
## 22 63.11 3.65 153133.8 99.91 671018.4
## 23 63.90 3.59 150233.1 105.93 615945.4
## 24 64.59 NA 142609.3 109.27 514711.7
## 25 65.16 NA 142772.0 117.31 551937.8
## 26 65.60 NA 146826.7 126.48 555771.9
## ExportacionesMX
## 1 220090.8
## 2 248690.6
## 3 235960.5
## 4 248057.2
## 5 205482.9
## 6 231707.6
## 7 265825.7
## 8 261173.9
## 9 292695.1
## 10 303472.5
## 11 320110.6
## 12 336297.2
## 13 357980.1
## 14 374607.6
## 15 437299.9
## 16 423992.5
## 17 431988.2
## 18 535151.9
## 19 583386.1
## 20 704268.5
## 21 669368.6
## 22 695447.7
## 23 648679.3
## 24 749594.7
## 25 785654.5
## 26 713259.0
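The two new columns convert the dollar amounts into pesos with the exchange rate and then deflate them with the price index (INPC), so that values from different years are expressed at the prices of the index's base period. A minimal sketch of the same arithmetic for a single row is given below. The function name `to_real_mxn()` is hypothetical and is not part of the original script.

```r
# Sketch: convert a nominal USD amount to real pesos.
# to_real_mxn() is a hypothetical helper name used only for this illustration.
to_real_mxn <- function(usd, fx, inpc) {
  (usd * fx / inpc) * 100   # nominal pesos, deflated by the price index (base = 100)
}

# First row of the table (1997): IED_Flujos, Tipo_de_Cambio, INPC
ied_1997 <- to_real_mxn(usd = 12145.60, fx = 8.06, inpc = 33.28)
round(ied_1997, 1)
```

Up to rounding, the result agrees with the first value of `IED_FlujosMX` printed above.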
df_cash$IED_Flujos <- NULL
df_cash$Exportaciones <- NULL
df_cash
## periodo Empleo Educacion Salario_Diario Innovacion Inseguridad_Robo
## 1 1997 NA 7.20 24.30 11.30 266.51
## 2 1998 NA 7.31 31.91 11.37 314.78
## 3 1999 NA 7.43 31.91 12.46 272.89
## 4 2000 97.83 7.56 35.12 13.15 216.98
## 5 2001 97.36 7.68 37.57 13.47 214.53
## 6 2002 97.66 7.80 39.74 12.80 197.80
## 7 2003 97.06 7.93 41.53 11.81 183.22
## 8 2004 96.48 8.04 43.30 12.61 146.28
## 9 2005 97.17 8.14 45.24 13.41 136.94
## 10 2006 96.53 8.26 47.05 14.23 135.59
## 11 2007 96.60 8.36 48.88 15.04 145.92
## 12 2008 95.68 8.46 50.84 14.82 158.17
## 13 2009 95.20 8.56 53.19 12.59 175.77
## 14 2010 95.06 8.63 55.77 12.69 201.94
## 15 2011 95.49 8.75 58.06 12.10 212.61
## 16 2012 95.53 8.85 60.75 13.03 190.28
## 17 2013 95.75 8.95 63.12 13.22 185.56
## 18 2014 96.24 9.05 65.58 13.65 154.41
## 19 2015 96.04 9.15 70.10 15.11 180.44
## 20 2016 96.62 9.25 73.04 14.40 160.57
## 21 2017 96.85 9.35 88.36 14.05 230.43
## 22 2018 96.64 9.45 88.36 13.25 184.25
## 23 2019 97.09 9.58 102.68 12.70 173.45
## 24 2020 96.21 NA 123.22 11.28 133.90
## 25 2021 96.49 NA 141.70 NA 127.13
## 26 2022 97.24 NA 172.87 NA 120.49
## Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## 1 14.55 8.06 0.05 47.44
## 2 14.32 9.94 0.05 48.76
## 3 12.64 9.52 0.06 49.48
## 4 10.86 9.60 0.06 50.58
## 5 10.25 9.17 0.06 51.28
## 6 9.94 10.36 0.06 51.95
## 7 9.81 11.20 0.06 52.61
## 8 8.92 11.22 0.06 53.27
## 9 9.22 10.71 0.06 54.78
## 10 9.60 10.88 0.06 55.44
## 11 8.04 10.90 0.06 56.17
## 12 12.52 13.77 0.07 56.96
## 13 17.46 13.04 0.07 57.73
## 14 22.43 12.38 0.07 58.45
## 15 23.42 13.98 0.07 59.15
## 16 22.09 12.99 0.07 59.85
## 17 19.74 13.07 0.08 59.49
## 18 16.93 14.73 0.08 60.17
## 19 17.37 17.34 0.08 60.86
## 20 20.31 20.66 0.08 61.57
## 21 26.22 19.74 0.09 62.28
## 22 29.59 19.66 0.09 63.11
## 23 29.21 18.87 0.09 63.90
## 24 28.98 19.94 0.09 64.59
## 25 27.89 20.52 0.09 65.16
## 26 NA 19.41 0.09 65.60
## CO2_Emisiones PIB_Per_Capita INPC IED_FlujosMX ExportacionesMX
## 1 3.68 127570.1 33.28 294151.2 220090.8
## 2 3.85 126738.8 39.47 210875.6 248690.6
## 3 3.69 129164.7 44.34 299734.4 235960.5
## 4 3.87 130874.9 48.31 362631.8 248057.2
## 5 3.81 128083.4 50.43 546548.4 205482.9
## 6 3.82 128205.9 53.31 468332.0 231707.6
## 7 3.95 128737.9 55.43 368752.8 265825.7
## 8 3.98 132563.5 58.31 481349.2 261173.9
## 9 4.10 132941.1 60.25 458544.8 292695.1
## 10 4.19 135894.9 62.69 368495.8 303472.5
## 11 4.22 137795.7 65.05 542793.7 320110.6
## 12 4.19 135176.0 69.30 586217.7 336297.2
## 13 4.04 131233.0 71.77 324318.4 357980.1
## 14 4.11 134991.7 74.93 449223.7 374607.6
## 15 4.19 138891.9 77.79 460653.8 437299.9
## 16 4.20 141530.2 80.57 350978.6 423992.5
## 17 4.06 144112.0 83.77 754437.5 431988.2
## 18 3.89 147277.4 87.19 512758.2 535151.9
## 19 3.93 149433.5 89.05 699904.1 583386.1
## 20 3.89 152275.4 92.04 700091.6 704268.5
## 21 3.84 153235.7 98.27 683318.0 669368.6
## 22 3.65 153133.8 99.91 671018.4 695447.7
## 23 3.59 150233.1 105.93 615945.4 648679.3
## 24 NA 142609.3 109.27 514711.7 749594.7
## 25 NA 142772.0 117.31 551937.8 785654.5
## 26 NA 146826.7 126.48 555771.9 713259.0
str(df_cash)
## 'data.frame': 26 obs. of 15 variables:
## $ periodo : int 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 ...
## $ Empleo : num NA NA NA 97.8 97.4 ...
## $ Educacion : num 7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
## $ Salario_Diario : num 24.3 31.9 31.9 35.1 37.6 ...
## $ Innovacion : num 11.3 11.4 12.5 13.2 13.5 ...
## $ Inseguridad_Robo : num 267 315 273 217 215 ...
## $ Inseguridad_Homicidio: num 14.6 14.3 12.6 10.9 10.2 ...
## $ Tipo_de_Cambio : num 8.06 9.94 9.52 9.6 9.17 ...
## $ Densidad_Carretera : num 0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
## $ Densidad_Poblacion : num 47.4 48.8 49.5 50.6 51.3 ...
## $ CO2_Emisiones : num 3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
## $ PIB_Per_Capita : num 127570 126739 129165 130875 128083 ...
## $ INPC : num 33.3 39.5 44.3 48.3 50.4 ...
## $ IED_FlujosMX : num 294151 210876 299734 362632 546548 ...
## $ ExportacionesMX : num 220091 248691 235961 248057 205483 ...
str(df_time_series)
## 'data.frame': 96 obs. of 3 variables:
## $ Año : int 1999 1999 1999 1999 2000 2000 2000 2000 2001 2001 ...
## $ Trimestre : chr "I" "II" "III" "IV" ...
## $ IED_Flujos: num 3596 3396 3028 3940 4601 ...
summary(df_cash)
## periodo Empleo Educacion Salario_Diario
## Min. :1997 Min. :95.06 Min. :7.200 Min. : 24.30
## 1st Qu.:2003 1st Qu.:95.89 1st Qu.:7.865 1st Qu.: 41.97
## Median :2010 Median :96.53 Median :8.460 Median : 54.48
## Mean :2010 Mean :96.47 Mean :8.423 Mean : 65.16
## 3rd Qu.:2016 3rd Qu.:97.08 3rd Qu.:9.000 3rd Qu.: 72.31
## Max. :2022 Max. :97.83 Max. :9.580 Max. :172.87
## NA's :3 NA's :3
## Innovacion Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio
## Min. :11.28 Min. :120.5 Min. : 8.04 Min. : 8.06
## 1st Qu.:12.56 1st Qu.:148.3 1st Qu.:10.25 1st Qu.:10.75
## Median :13.09 Median :181.8 Median :16.93 Median :13.02
## Mean :13.11 Mean :185.4 Mean :17.29 Mean :13.91
## 3rd Qu.:13.75 3rd Qu.:209.9 3rd Qu.:22.43 3rd Qu.:18.49
## Max. :15.11 Max. :314.8 Max. :29.59 Max. :20.66
## NA's :2 NA's :1
## Densidad_Carretera Densidad_Poblacion CO2_Emisiones PIB_Per_Capita
## Min. :0.05000 Min. :47.44 Min. :3.590 Min. :126739
## 1st Qu.:0.06000 1st Qu.:52.77 1st Qu.:3.830 1st Qu.:130964
## Median :0.07000 Median :58.09 Median :3.930 Median :136845
## Mean :0.07115 Mean :57.33 Mean :3.945 Mean :138550
## 3rd Qu.:0.08000 3rd Qu.:61.39 3rd Qu.:4.105 3rd Qu.:146148
## Max. :0.09000 Max. :65.60 Max. :4.220 Max. :153236
## NA's :3
## INPC IED_FlujosMX ExportacionesMX
## Min. : 33.28 Min. :210876 Min. :205483
## 1st Qu.: 56.15 1st Qu.:368560 1st Qu.:262337
## Median : 73.35 Median :497054 Median :366294
## Mean : 75.17 Mean :493596 Mean :433856
## 3rd Qu.: 91.29 3rd Qu.:578606 3rd Qu.:632356
## Max. :126.48 Max. :754438 Max. :785654
##
summary(df_time_series)
## Año Trimestre IED_Flujos
## Min. :1999 Length:96 Min. : 1341
## 1st Qu.:2005 Class :character 1st Qu.: 4351
## Median :2010 Mode :character Median : 6238
## Mean :2011 Mean : 7036
## 3rd Qu.:2016 3rd Qu.: 8053
## Max. :2023 Max. :22794
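The summary reports a maximum year of 2023, although the quarterly table otherwise ends in 2022 III. Its last row is labelled 2023 IV. A quick continuity check can flag this kind of gap before modeling. The sketch below uses the last five rows of the printed table, with the column names `year` and `quarter` as stand-ins for `Año` and `Trimestre`. Whether the last row was meant to be 2022 IV is an assumption that the data alone cannot confirm.

```r
# Sketch: check that a quarterly table has no gaps or mislabelled periods.
# The data frame copies the last five rows of the printed table.
tail_q <- data.frame(
  year = c(2021, 2022, 2022, 2022, 2023),
  quarter = c("IV", "I", "II", "III", "IV"),
  IED_Flujos = c(3044.31, 22794.16, 8164.27, 3479.68, 1777.26)
)

# Map the Roman quarter labels to 1..4 and build a running quarter index.
q_num <- match(tail_q$quarter, c("I", "II", "III", "IV"))
q_index <- tail_q$year * 4 + (q_num - 1)

# Consecutive quarters differ by exactly 1; anything else marks a gap.
gaps <- which(diff(q_index) != 1) + 1
tail_q[gaps, ]   # rows that do not follow directly from the previous quarter
```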
# Convert the integer year columns to numeric type in both databases.
df_cash$periodo <- as.numeric(df_cash$periodo)
df_time_series$Año <- as.numeric(df_time_series$Año)
# Recode the quarter labels (Roman numerals I to IV) as the numbers 1 to 4 in the second database
df_time_series$Trimestre <- match(df_time_series$Trimestre, c("I", "II", "III", "IV"))
print(df_time_series)
## Año Trimestre IED_Flujos
## 1 1999 1 3596.08
## 2 1999 2 3395.89
## 3 1999 3 3028.45
## 4 1999 4 3939.90
## 5 2000 1 4600.64
## 6 2000 2 4857.42
## 7 2000 3 3056.95
## 8 2000 4 5733.68
## 9 2001 1 3598.68
## 10 2001 2 5218.83
## 11 2001 3 16314.05
## 12 2001 4 4925.63
## 13 2002 1 5067.98
## 14 2002 2 6258.52
## 15 2002 3 6114.34
## 16 2002 4 6658.37
## 17 2003 1 3963.69
## 18 2003 2 5547.34
## 19 2003 3 2521.68
## 20 2003 4 6217.27
## 21 2004 1 9363.46
## 22 2004 2 4351.50
## 23 2004 3 3284.91
## 24 2004 4 8015.70
## 25 2005 1 6761.62
## 26 2005 2 6773.62
## 27 2005 3 5478.92
## 28 2005 4 6781.66
## 29 2006 1 7436.81
## 30 2006 2 6634.31
## 31 2006 3 2346.57
## 32 2006 4 4814.85
## 33 2007 1 10815.78
## 34 2007 2 6137.64
## 35 2007 3 7628.45
## 36 2007 4 7811.47
## 37 2008 1 8546.44
## 38 2008 2 8376.14
## 39 2008 3 5643.68
## 40 2008 4 6936.20
## 41 2009 1 6105.30
## 42 2009 2 6094.18
## 43 2009 3 2397.52
## 44 2009 4 3252.95
## 45 2010 1 8722.26
## 46 2010 2 9301.62
## 47 2010 3 3932.89
## 48 2010 4 5232.50
## 49 2011 1 8431.21
## 50 2011 2 6697.41
## 51 2011 3 4349.89
## 52 2011 4 6154.00
## 53 2012 1 7892.69
## 54 2012 2 5622.13
## 55 2012 3 5736.29
## 56 2012 4 2518.21
## 57 2013 1 10571.58
## 58 2013 2 21019.14
## 59 2013 3 4178.81
## 60 2013 4 12584.89
## 61 2014 1 13828.00
## 62 2014 2 5478.74
## 63 2014 3 3222.08
## 64 2014 4 7822.43
## 65 2015 1 12136.33
## 66 2015 2 6656.85
## 67 2015 3 9635.56
## 68 2015 4 7515.02
## 69 2016 1 12805.36
## 70 2016 2 6210.62
## 71 2016 3 4317.27
## 72 2016 4 7855.73
## 73 2017 1 13779.06
## 74 2017 2 6814.16
## 75 2017 3 6361.22
## 76 2017 4 7062.61
## 77 2018 1 14067.52
## 78 2018 2 9577.75
## 79 2018 3 4132.97
## 80 2018 4 6322.19
## 81 2019 1 15175.27
## 82 2019 2 6504.62
## 83 2019 3 8217.40
## 84 2019 4 4679.87
## 85 2020 1 16807.60
## 86 2020 2 7293.96
## 87 2020 3 1340.58
## 88 2020 4 2763.75
## 89 2021 1 16206.05
## 90 2021 2 5883.73
## 91 2021 3 6419.43
## 92 2021 4 3044.31
## 93 2022 1 22794.16
## 94 2022 2 8164.27
## 95 2022 3 3479.68
## 96 2023 4 1777.26
df_time_series$Trimestre <- as.numeric(df_time_series$Trimestre)
str(df_cash)
## 'data.frame': 26 obs. of 15 variables:
## $ periodo : num 1997 1998 1999 2000 2001 ...
## $ Empleo : num NA NA NA 97.8 97.4 ...
## $ Educacion : num 7.2 7.31 7.43 7.56 7.68 7.8 7.93 8.04 8.14 8.26 ...
## $ Salario_Diario : num 24.3 31.9 31.9 35.1 37.6 ...
## $ Innovacion : num 11.3 11.4 12.5 13.2 13.5 ...
## $ Inseguridad_Robo : num 267 315 273 217 215 ...
## $ Inseguridad_Homicidio: num 14.6 14.3 12.6 10.9 10.2 ...
## $ Tipo_de_Cambio : num 8.06 9.94 9.52 9.6 9.17 ...
## $ Densidad_Carretera : num 0.05 0.05 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 ...
## $ Densidad_Poblacion : num 47.4 48.8 49.5 50.6 51.3 ...
## $ CO2_Emisiones : num 3.68 3.85 3.69 3.87 3.81 3.82 3.95 3.98 4.1 4.19 ...
## $ PIB_Per_Capita : num 127570 126739 129165 130875 128083 ...
## $ INPC : num 33.3 39.5 44.3 48.3 50.4 ...
## $ IED_FlujosMX : num 294151 210876 299734 362632 546548 ...
## $ ExportacionesMX : num 220091 248691 235961 248057 205483 ...
str(df_time_series)
## 'data.frame': 96 obs. of 3 variables:
## $ Año : num 1999 1999 1999 1999 2000 ...
## $ Trimestre : num 1 2 3 4 1 2 3 4 1 2 ...
## $ IED_Flujos: num 3596 3396 3028 3940 4601 ...
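With the year and quarter stored as numbers, the quarterly flows can be turned into a `ts` object, which is the structure that most time series functions in R expect. The sketch below is a minimal illustration using the first eight `IED_Flujos` values printed above (1999 I to 2000 IV). The object names `ied`, `ied_ts` and `ied_2000` are chosen here for illustration only.

```r
# Sketch: build a quarterly ts object from the first eight observations.
ied <- c(3596.08, 3395.89, 3028.45, 3939.90,
         4600.64, 4857.42, 3056.95, 5733.68)

# frequency = 4 declares quarterly data; start = c(1999, 1) is the first quarter of 1999.
ied_ts <- ts(ied, start = c(1999, 1), frequency = 4)
print(ied_ts)

# window() extracts a sub-period given as (year, quarter).
ied_2000 <- window(ied_ts, start = c(2000, 1), end = c(2000, 4))
sum(ied_2000)   # total of the four quarters of 2000
```

The four quarters of 2000 add up to the annual `IED_Flujos` value reported for 2000 in the first database, which is a useful consistency check between the two files.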
# Identify the names of the variables
colnames(df_cash)
## [1] "periodo" "Empleo" "Educacion"
## [4] "Salario_Diario" "Innovacion" "Inseguridad_Robo"
## [7] "Inseguridad_Homicidio" "Tipo_de_Cambio" "Densidad_Carretera"
## [10] "Densidad_Poblacion" "CO2_Emisiones" "PIB_Per_Capita"
## [13] "INPC" "IED_FlujosMX" "ExportacionesMX"
colnames(df_time_series)
## [1] "Año" "Trimestre" "IED_Flujos"
# Identify missing values
df_missing_values <- sum(is.na(df_cash))
df_missing_values
## [1] 12
df_time_series_missing_values <- sum(is.na(df_time_series))
df_time_series_missing_values
## [1] 0
# The first database has 12 missing values, so an imputation method is needed to handle them. The second database has no NAs.
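The total of 12 does not show where the missing values are. According to the `summary()` output above, they fall in Empleo (3), Educacion (3), Innovacion (2), Inseguridad_Homicidio (1) and CO2_Emisiones (3). A per-column count can be obtained as sketched below. The data frame `toy` is a small subset of four rows and four columns taken from the annual table, used only to keep the example self-contained.

```r
# Sketch: count missing values per column instead of in total.
# toy holds four rows of the annual table (1997, 2000, 2020, 2022).
toy <- data.frame(
  periodo = c(1997, 2000, 2020, 2022),
  Empleo = c(NA, 97.83, 96.21, 97.24),
  Educacion = c(7.20, 7.56, NA, NA),
  Innovacion = c(11.30, 13.15, 11.28, NA)
)

na_by_col <- colSums(is.na(toy))   # named vector: number of NAs per column
na_by_col[na_by_col > 0]           # keep only the columns that need imputation
```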
## Imputation method for missing values.
# Calculating the mean for each of the variables that have NA.
mean_empleo <- mean(df_cash$Empleo, na.rm = TRUE)
mean_innovacion <- mean(df_cash$Innovacion, na.rm = TRUE)
mean_inseguridad <- mean(df_cash$Inseguridad_Homicidio, na.rm = TRUE)
mean_co2 <- mean(df_cash$CO2_Emisiones, na.rm = TRUE)
# Imputing the missing values with the mean of each variable.
df_cash$Empleo[is.na(df_cash$Empleo)] <- mean_empleo
df_cash$Innovacion[is.na(df_cash$Innovacion)] <- mean_innovacion
df_cash$Inseguridad_Homicidio[is.na(df_cash$Inseguridad_Homicidio)] <- mean_inseguridad
df_cash$CO2_Emisiones[is.na(df_cash$CO2_Emisiones)] <- mean_co2
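# Imputing the column "Educacion" with linear interpolation.
# Caution: without `xout`, approx() drops the NAs and returns `n` evenly spaced points over the
# non-missing range, so observed values can shift (compare the printed values with the str() output).
# Passing xout = seq_along(df_cash$Educacion) would keep the observed points fixed.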
ascending_educacion <- approx(seq_along(df_cash$Educacion), df_cash$Educacion, method = "linear", n = length(df_cash$Educacion))$y
df_cash$Educacion <- ascending_educacion
print(df_cash$Educacion)
## [1] 7.2000 7.2968 7.4012 7.5132 7.6224 7.7280 7.8364 7.9476 8.0440 8.1320
## [11] 8.2360 8.3280 8.4160 8.5040 8.5824 8.6540 8.7580 8.8460 8.9340 9.0220
## [21] 9.1100 9.1980 9.2860 9.3740 9.4656 9.5800
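The printed values no longer match the original observations (7.31 in the str() output becomes 7.2968), which suggests the series was resampled rather than gap-filled. The following is a minimal sketch of the difference, using a hypothetical toy vector rather than the actual Educacion column. The choice of `rule = 2` (carry the last observed value forward) is an assumption made for illustration.

```r
# Hypothetical toy vector: one interior gap and three trailing NAs
educ <- c(7.20, 7.31, 7.43, 7.56, NA, 7.80, NA, NA, NA)
idx <- seq_along(educ)

# n = ... resamples onto an evenly spaced grid over the non-missing range,
# so even the observed points move
resampled <- approx(idx, educ, method = "linear", n = length(educ))$y

# xout = idx evaluates at the original positions; rule = 2 repeats the
# closest observed value for trailing NAs (an assumption, not a forecast)
filled <- approx(idx, educ, xout = idx, method = "linear", rule = 2)$y

print(round(resampled, 4))
print(round(filled, 4))
```

With `xout`, the observed entries stay untouched and only the gaps are filled. How trailing gaps should be extrapolated is a modeling decision that this sketch does not settle.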
# We look for any missing values in our data.
na_df_2 <- sum(is.na(df_cash))
na_df_2
## [1] 0
print(df_cash)
## periodo Empleo Educacion Salario_Diario Innovacion Inseguridad_Robo
## 1 1997 96.47043 7.2000 24.30 11.30000 266.51
## 2 1998 96.47043 7.2968 31.91 11.37000 314.78
## 3 1999 96.47043 7.4012 31.91 12.46000 272.89
## 4 2000 97.83000 7.5132 35.12 13.15000 216.98
## 5 2001 97.36000 7.6224 37.57 13.47000 214.53
## 6 2002 97.66000 7.7280 39.74 12.80000 197.80
## 7 2003 97.06000 7.8364 41.53 11.81000 183.22
## 8 2004 96.48000 7.9476 43.30 12.61000 146.28
## 9 2005 97.17000 8.0440 45.24 13.41000 136.94
## 10 2006 96.53000 8.1320 47.05 14.23000 135.59
## 11 2007 96.60000 8.2360 48.88 15.04000 145.92
## 12 2008 95.68000 8.3280 50.84 14.82000 158.17
## 13 2009 95.20000 8.4160 53.19 12.59000 175.77
## 14 2010 95.06000 8.5040 55.77 12.69000 201.94
## 15 2011 95.49000 8.5824 58.06 12.10000 212.61
## 16 2012 95.53000 8.6540 60.75 13.03000 190.28
## 17 2013 95.75000 8.7580 63.12 13.22000 185.56
## 18 2014 96.24000 8.8460 65.58 13.65000 154.41
## 19 2015 96.04000 8.9340 70.10 15.11000 180.44
## 20 2016 96.62000 9.0220 73.04 14.40000 160.57
## 21 2017 96.85000 9.1100 88.36 14.05000 230.43
## 22 2018 96.64000 9.1980 88.36 13.25000 184.25
## 23 2019 97.09000 9.2860 102.68 12.70000 173.45
## 24 2020 96.21000 9.3740 123.22 11.28000 133.90
## 25 2021 96.49000 9.4656 141.70 13.10583 127.13
## 26 2022 97.24000 9.5800 172.87 13.10583 120.49
## Inseguridad_Homicidio Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## 1 14.5500 8.06 0.05 47.44
## 2 14.3200 9.94 0.05 48.76
## 3 12.6400 9.52 0.06 49.48
## 4 10.8600 9.60 0.06 50.58
## 5 10.2500 9.17 0.06 51.28
## 6 9.9400 10.36 0.06 51.95
## 7 9.8100 11.20 0.06 52.61
## 8 8.9200 11.22 0.06 53.27
## 9 9.2200 10.71 0.06 54.78
## 10 9.6000 10.88 0.06 55.44
## 11 8.0400 10.90 0.06 56.17
## 12 12.5200 13.77 0.07 56.96
## 13 17.4600 13.04 0.07 57.73
## 14 22.4300 12.38 0.07 58.45
## 15 23.4200 13.98 0.07 59.15
## 16 22.0900 12.99 0.07 59.85
## 17 19.7400 13.07 0.08 59.49
## 18 16.9300 14.73 0.08 60.17
## 19 17.3700 17.34 0.08 60.86
## 20 20.3100 20.66 0.08 61.57
## 21 26.2200 19.74 0.09 62.28
## 22 29.5900 19.66 0.09 63.11
## 23 29.2100 18.87 0.09 63.90
## 24 28.9800 19.94 0.09 64.59
## 25 27.8900 20.52 0.09 65.16
## 26 17.2924 19.41 0.09 65.60
## CO2_Emisiones PIB_Per_Capita INPC IED_FlujosMX ExportacionesMX
## 1 3.680000 127570.1 33.28 294151.2 220090.8
## 2 3.850000 126738.8 39.47 210875.6 248690.6
## 3 3.690000 129164.7 44.34 299734.4 235960.5
## 4 3.870000 130874.9 48.31 362631.8 248057.2
## 5 3.810000 128083.4 50.43 546548.4 205482.9
## 6 3.820000 128205.9 53.31 468332.0 231707.6
## 7 3.950000 128737.9 55.43 368752.8 265825.7
## 8 3.980000 132563.5 58.31 481349.2 261173.9
## 9 4.100000 132941.1 60.25 458544.8 292695.1
## 10 4.190000 135894.9 62.69 368495.8 303472.5
## 11 4.220000 137795.7 65.05 542793.7 320110.6
## 12 4.190000 135176.0 69.30 586217.7 336297.2
## 13 4.040000 131233.0 71.77 324318.4 357980.1
## 14 4.110000 134991.7 74.93 449223.7 374607.6
## 15 4.190000 138891.9 77.79 460653.8 437299.9
## 16 4.200000 141530.2 80.57 350978.6 423992.5
## 17 4.060000 144112.0 83.77 754437.5 431988.2
## 18 3.890000 147277.4 87.19 512758.2 535151.9
## 19 3.930000 149433.5 89.05 699904.1 583386.1
## 20 3.890000 152275.4 92.04 700091.6 704268.5
## 21 3.840000 153235.7 98.27 683318.0 669368.6
## 22 3.650000 153133.8 99.91 671018.4 695447.7
## 23 3.590000 150233.1 105.93 615945.4 648679.3
## 24 3.945217 142609.3 109.27 514711.7 749594.7
## 25 3.945217 142772.0 117.31 551937.8 785654.5
## 26 3.945217 146826.7 126.48 555771.9 713259.0
# Basic descriptive statistics and measures of dispersion
df_descriptive_statistics <- summary(df_cash)
df_descriptive_statistics
## periodo Empleo Educacion Salario_Diario
## Min. :1997 Min. :95.06 Min. :7.200 Min. : 24.30
## 1st Qu.:2003 1st Qu.:96.08 1st Qu.:7.864 1st Qu.: 41.97
## Median :2010 Median :96.48 Median :8.460 Median : 54.48
## Mean :2010 Mean :96.47 Mean :8.424 Mean : 65.16
## 3rd Qu.:2016 3rd Qu.:97.01 3rd Qu.:9.000 3rd Qu.: 72.31
## Max. :2022 Max. :97.83 Max. :9.580 Max. :172.87
## Innovacion Inseguridad_Robo Inseguridad_Homicidio Tipo_de_Cambio
## Min. :11.28 Min. :120.5 Min. : 8.04 Min. : 8.06
## 1st Qu.:12.60 1st Qu.:148.3 1st Qu.:10.40 1st Qu.:10.75
## Median :13.11 Median :181.8 Median :17.11 Median :13.02
## Mean :13.11 Mean :185.4 Mean :17.29 Mean :13.91
## 3rd Qu.:13.61 3rd Qu.:209.9 3rd Qu.:22.34 3rd Qu.:18.49
## Max. :15.11 Max. :314.8 Max. :29.59 Max. :20.66
## Densidad_Carretera Densidad_Poblacion CO2_Emisiones PIB_Per_Capita
## Min. :0.05000 Min. :47.44 Min. :3.590 Min. :126739
## 1st Qu.:0.06000 1st Qu.:52.77 1st Qu.:3.842 1st Qu.:130964
## Median :0.07000 Median :58.09 Median :3.945 Median :136845
## Mean :0.07115 Mean :57.33 Mean :3.945 Mean :138550
## 3rd Qu.:0.08000 3rd Qu.:61.39 3rd Qu.:4.090 3rd Qu.:146148
## Max. :0.09000 Max. :65.60 Max. :4.220 Max. :153236
## INPC IED_FlujosMX ExportacionesMX
## Min. : 33.28 Min. :210876 Min. :205483
## 1st Qu.: 56.15 1st Qu.:368560 1st Qu.:262337
## Median : 73.35 Median :497054 Median :366294
## Mean : 75.17 Mean :493596 Mean :433856
## 3rd Qu.: 91.29 3rd Qu.:578606 3rd Qu.:632356
## Max. :126.48 Max. :754438 Max. :785654
df_describe <- describe(df_cash)
df_describe
## vars n mean sd median trimmed mad
## periodo 1 26 2009.50 7.65 2009.50 2009.50 9.64
## Empleo 2 26 96.47 0.72 96.48 96.48 0.76
## Educacion 3 26 8.42 0.71 8.46 8.43 0.88
## Salario_Diario 4 26 65.16 35.85 54.48 60.16 22.51
## Innovacion 5 26 13.11 1.07 13.11 13.09 0.79
## Inseguridad_Robo 6 26 185.42 47.67 181.83 181.16 47.06
## Inseguridad_Homicidio 7 26 17.29 7.12 17.11 16.99 9.31
## Tipo_de_Cambio 8 26 13.91 4.15 13.02 13.78 4.25
## Densidad_Carretera 9 26 0.07 0.01 0.07 0.07 0.01
## Densidad_Poblacion 10 26 57.33 5.41 58.09 57.44 6.68
## CO2_Emisiones 11 26 3.95 0.18 3.95 3.95 0.18
## PIB_Per_Capita 12 26 138550.10 8861.10 136845.30 138255.64 11080.42
## INPC 13 26 75.17 24.81 73.35 74.45 27.14
## IED_FlujosMX 14 26 493596.02 143849.16 497053.70 494270.03 183243.92
## ExportacionesMX 15 26 433855.52 195018.66 366293.83 423610.02 184264.93
## min max range skew kurtosis se
## periodo 1997.00 2022.00 25.00 0.00 -1.34 1.50
## Empleo 95.06 97.83 2.77 -0.14 -0.73 0.14
## Educacion 7.20 9.58 2.38 -0.09 -1.28 0.14
## Salario_Diario 24.30 172.87 148.57 1.43 1.44 7.03
## Innovacion 11.28 15.11 3.83 0.12 -0.70 0.21
## Inseguridad_Robo 120.49 314.78 194.29 0.89 0.30 9.35
## Inseguridad_Homicidio 8.04 29.59 21.55 0.38 -1.28 1.40
## Tipo_de_Cambio 8.06 20.66 12.60 0.44 -1.39 0.81
## Densidad_Carretera 0.05 0.09 0.04 0.19 -1.41 0.00
## Densidad_Poblacion 47.44 65.60 18.16 -0.19 -1.24 1.06
## CO2_Emisiones 3.59 4.22 0.63 -0.14 -0.95 0.04
## PIB_Per_Capita 126738.75 153235.73 26496.98 0.28 -1.41 1737.81
## INPC 33.28 126.48 93.20 0.26 -0.95 4.87
## IED_FlujosMX 210875.58 754437.47 543561.89 -0.01 -1.00 28211.14
## ExportacionesMX 205482.92 785654.49 580171.58 0.48 -1.40 38246.31
df_variance <- var(df_cash)
df_variance
## periodo Empleo Educacion Salario_Diario
## periodo 58.50000 -1.093600e+00 5.456280e+00 2.423374e+02
## Empleo -1.09360 5.193158e-01 -1.086377e-01 1.186760e+00
## Educacion 5.45628 -1.086377e-01 5.098458e-01 2.239210e+01
## Salario_Diario 242.33740 1.186760e+00 2.239210e+01 1.285149e+03
## Innovacion 2.09800 1.825317e-02 2.106674e-01 2.001901e+00
## Inseguridad_Robo -214.76000 -1.448614e-01 -2.083824e+01 -9.258424e+02
## Inseguridad_Homicidio 42.46360 -1.639959e+00 3.881321e+00 1.660927e+02
## Tipo_de_Cambio 29.86600 -2.223791e-01 2.764304e+00 1.265472e+02
## Densidad_Carretera 0.09860 -1.105391e-03 9.144708e-03 4.111906e-01
## Densidad_Poblacion 41.20340 -9.326584e-01 3.854972e+00 1.672534e+02
## CO2_Emisiones 0.04560 -6.470219e-02 8.375443e-03 -5.454132e-01
## PIB_Per_Capita 60266.10260 -6.479285e+02 5.599582e+03 2.138486e+05
## INPC 188.09020 -2.245450e+00 1.751434e+01 8.304526e+02
## IED_FlujosMX 754867.37199 3.266634e+03 7.102963e+04 2.483604e+06
## ExportacionesMX 1422733.70764 -1.178481e+04 1.312817e+05 6.160194e+06
## Innovacion Inseguridad_Robo Inseguridad_Homicidio
## periodo 2.098000e+00 -2.147600e+02 4.246360e+01
## Empleo 1.825317e-02 -1.448614e-01 -1.639959e+00
## Educacion 2.106674e-01 -2.083824e+01 3.881321e+00
## Salario_Diario 2.001901e+00 -9.258424e+02 1.660927e+02
## Innovacion 1.149911e+00 -2.152945e+01 -1.255865e+00
## Inseguridad_Robo -2.152945e+01 2.272065e+03 -2.656937e+01
## Inseguridad_Homicidio -1.255865e+00 -2.656937e+01 5.063701e+01
## Tipo_de_Cambio 9.687697e-01 -8.945536e+01 2.338468e+01
## Densidad_Carretera 3.090333e-03 -2.994803e-01 7.752704e-02
## Densidad_Poblacion 1.621845e+00 -1.589738e+02 2.957680e+01
## CO2_Emisiones 6.180296e-02 -3.594647e+00 -3.080327e-01
## PIB_Per_Capita 4.104525e+03 -1.700151e+05 4.447242e+04
## INPC 5.854670e+00 -7.031076e+02 1.332051e+02
## IED_FlujosMX 9.019878e+04 -3.098033e+06 4.262488e+05
## ExportacionesMX 3.577342e+04 -4.159591e+06 1.138066e+06
## Tipo_de_Cambio Densidad_Carretera Densidad_Poblacion
## periodo 2.986600e+01 9.860000e-02 4.120340e+01
## Empleo -2.223791e-01 -1.105391e-03 -9.326584e-01
## Educacion 2.764304e+00 9.144708e-03 3.854972e+00
## Salario_Diario 1.265472e+02 4.111906e-01 1.672534e+02
## Innovacion 9.687697e-01 3.090333e-03 1.621845e+00
## Inseguridad_Robo -8.945536e+01 -2.994803e-01 -1.589738e+02
## Inseguridad_Homicidio 2.338468e+01 7.752704e-02 2.957680e+01
## Tipo_de_Cambio 1.722372e+01 5.231600e-02 2.068166e+01
## Densidad_Carretera 5.231600e-02 1.786154e-04 6.856569e-02
## Densidad_Poblacion 2.068166e+01 6.856569e-02 2.928160e+01
## CO2_Emisiones -1.174623e-01 -3.697391e-04 1.025527e-01
## PIB_Per_Capita 3.240201e+04 1.048757e+02 4.182754e+04
## INPC 9.627926e+01 3.189026e-01 1.319716e+02
## IED_FlujosMX 4.029994e+05 1.385829e+03 5.227170e+05
## ExportacionesMX 7.950197e+05 2.470075e+03 9.814354e+05
## CO2_Emisiones PIB_Per_Capita INPC IED_FlujosMX
## periodo 4.560000e-02 6.026610e+04 1.880902e+02 7.548674e+05
## Empleo -6.470219e-02 -6.479285e+02 -2.245450e+00 3.266634e+03
## Educacion 8.375443e-03 5.599582e+03 1.751434e+01 7.102963e+04
## Salario_Diario -5.454132e-01 2.138486e+05 8.304526e+02 2.483604e+06
## Innovacion 6.180296e-02 4.104525e+03 5.854670e+00 9.019878e+04
## Inseguridad_Robo -3.594647e+00 -1.700151e+05 -7.031076e+02 -3.098033e+06
## Inseguridad_Homicidio -3.080327e-01 4.447242e+04 1.332051e+02 4.262488e+05
## Tipo_de_Cambio -1.174623e-01 3.240201e+04 9.627926e+01 4.029994e+05
## Densidad_Carretera -3.697391e-04 1.048757e+02 3.189026e-01 1.385829e+03
## Densidad_Poblacion 1.025527e-01 4.182754e+04 1.319716e+02 5.227170e+05
## CO2_Emisiones 3.238296e-02 -1.716434e+02 1.572087e-02 -1.397987e+03
## PIB_Per_Capita -1.716434e+02 7.851913e+07 1.871820e+05 9.938355e+08
## INPC 1.572087e-02 1.871820e+05 6.154715e+02 2.334113e+06
## IED_FlujosMX -1.397987e+03 9.938355e+08 2.334113e+06 2.069258e+10
## ExportacionesMX -5.626649e+03 1.533901e+09 4.597023e+06 1.789059e+10
## ExportacionesMX
## periodo 1.422734e+06
## Empleo -1.178481e+04
## Educacion 1.312817e+05
## Salario_Diario 6.160194e+06
## Innovacion 3.577342e+04
## Inseguridad_Robo -4.159591e+06
## Inseguridad_Homicidio 1.138066e+06
## Tipo_de_Cambio 7.950197e+05
## Densidad_Carretera 2.470075e+03
## Densidad_Poblacion 9.814354e+05
## CO2_Emisiones -5.626649e+03
## PIB_Per_Capita 1.533901e+09
## INPC 4.597023e+06
## IED_FlujosMX 1.789059e+10
## ExportacionesMX 3.803228e+10
df_time_series_descriptive_statistics <- summary(df_time_series)
df_time_series_descriptive_statistics
## Año Trimestre IED_Flujos
## Min. :1999 Min. :1.00 Min. : 1341
## 1st Qu.:2005 1st Qu.:1.75 1st Qu.: 4351
## Median :2010 Median :2.50 Median : 6238
## Mean :2011 Mean :2.50 Mean : 7036
## 3rd Qu.:2016 3rd Qu.:3.25 3rd Qu.: 8053
## Max. :2023 Max. :4.00 Max. :22794
df_time_series_describe <- describe(df_time_series)
df_time_series_describe
## vars n mean sd median trimmed mad min max
## Año 1 96 2010.51 6.98 2010.5 2010.50 8.90 1999.00 2023.00
## Trimestre 2 96 2.50 1.12 2.5 2.50 1.48 1.00 4.00
## IED_Flujos 3 96 7036.50 3978.53 6237.9 6458.65 2797.96 1340.58 22794.16
## range skew kurtosis se
## Año 24.00 0.01 -1.23 0.71
## Trimestre 3.00 0.00 -1.39 0.11
## IED_Flujos 21453.58 1.60 3.01 406.06
df_time_series_variance <- var(df_time_series)
df_time_series_variance
## Año Trimestre IED_Flujos
## Año 4.867357e+01 1.578947e-02 8212.462
## Trimestre 1.578947e-02 1.263158e+00 -1861.527
## IED_Flujos 8.212462e+03 -1.861527e+03 15828730.902
# Dependent variable: foreign direct investment (Inversión Extranjera Directa, IED)
df_cash$IED_FlujosMX
## [1] 294151.2 210875.6 299734.4 362631.8 546548.4 468332.0 368752.8 481349.2
## [9] 458544.8 368495.8 542793.7 586217.7 324318.4 449223.7 460653.8 350978.6
## [17] 754437.5 512758.2 699904.1 700091.6 683318.0 671018.4 615945.4 514711.7
## [25] 551937.8 555771.9
summary(df_cash$IED_FlujosMX)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 210876 368560 497054 493596 578606 754438
# Check unique values
unique_vals <- unique(df_time_series$Trimestre)
print(unique_vals)
## [1] 1 2 3 4
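# Map Roman-numeral quarter labels to two-digit strings.
# The column already holds 1-4, so nothing matches; the assignment only coerces the column to character.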
df_time_series$Trimestre[df_time_series$Trimestre == "I" | df_time_series$Trimestre == "i"] <- "01"
df_time_series$Trimestre[df_time_series$Trimestre == "II" | df_time_series$Trimestre == "ii"] <- "02"
df_time_series$Trimestre[df_time_series$Trimestre == "III" | df_time_series$Trimestre == "iii"] <- "03"
df_time_series$Trimestre[df_time_series$Trimestre == "IV" | df_time_series$Trimestre == "iv"] <- "04"
# Print the updated Trimestre column
print(df_time_series$Trimestre)
## [1] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3"
## [20] "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2"
## [39] "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1"
## [58] "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4"
## [77] "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3" "4" "1" "2" "3"
## [96] "4"
# Concatenate the 'Año' and 'Trimestre' columns
df_time_series$Date <- paste(df_time_series$Año, df_time_series$Trimestre, sep = "/")
# View the result
print(df_time_series)
## Año Trimestre IED_Flujos Date
## 1 1999 1 3596.08 1999/1
## 2 1999 2 3395.89 1999/2
## 3 1999 3 3028.45 1999/3
## 4 1999 4 3939.90 1999/4
## 5 2000 1 4600.64 2000/1
## 6 2000 2 4857.42 2000/2
## 7 2000 3 3056.95 2000/3
## 8 2000 4 5733.68 2000/4
## 9 2001 1 3598.68 2001/1
## 10 2001 2 5218.83 2001/2
## 11 2001 3 16314.05 2001/3
## 12 2001 4 4925.63 2001/4
## 13 2002 1 5067.98 2002/1
## 14 2002 2 6258.52 2002/2
## 15 2002 3 6114.34 2002/3
## 16 2002 4 6658.37 2002/4
## 17 2003 1 3963.69 2003/1
## 18 2003 2 5547.34 2003/2
## 19 2003 3 2521.68 2003/3
## 20 2003 4 6217.27 2003/4
## 21 2004 1 9363.46 2004/1
## 22 2004 2 4351.50 2004/2
## 23 2004 3 3284.91 2004/3
## 24 2004 4 8015.70 2004/4
## 25 2005 1 6761.62 2005/1
## 26 2005 2 6773.62 2005/2
## 27 2005 3 5478.92 2005/3
## 28 2005 4 6781.66 2005/4
## 29 2006 1 7436.81 2006/1
## 30 2006 2 6634.31 2006/2
## 31 2006 3 2346.57 2006/3
## 32 2006 4 4814.85 2006/4
## 33 2007 1 10815.78 2007/1
## 34 2007 2 6137.64 2007/2
## 35 2007 3 7628.45 2007/3
## 36 2007 4 7811.47 2007/4
## 37 2008 1 8546.44 2008/1
## 38 2008 2 8376.14 2008/2
## 39 2008 3 5643.68 2008/3
## 40 2008 4 6936.20 2008/4
## 41 2009 1 6105.30 2009/1
## 42 2009 2 6094.18 2009/2
## 43 2009 3 2397.52 2009/3
## 44 2009 4 3252.95 2009/4
## 45 2010 1 8722.26 2010/1
## 46 2010 2 9301.62 2010/2
## 47 2010 3 3932.89 2010/3
## 48 2010 4 5232.50 2010/4
## 49 2011 1 8431.21 2011/1
## 50 2011 2 6697.41 2011/2
## 51 2011 3 4349.89 2011/3
## 52 2011 4 6154.00 2011/4
## 53 2012 1 7892.69 2012/1
## 54 2012 2 5622.13 2012/2
## 55 2012 3 5736.29 2012/3
## 56 2012 4 2518.21 2012/4
## 57 2013 1 10571.58 2013/1
## 58 2013 2 21019.14 2013/2
## 59 2013 3 4178.81 2013/3
## 60 2013 4 12584.89 2013/4
## 61 2014 1 13828.00 2014/1
## 62 2014 2 5478.74 2014/2
## 63 2014 3 3222.08 2014/3
## 64 2014 4 7822.43 2014/4
## 65 2015 1 12136.33 2015/1
## 66 2015 2 6656.85 2015/2
## 67 2015 3 9635.56 2015/3
## 68 2015 4 7515.02 2015/4
## 69 2016 1 12805.36 2016/1
## 70 2016 2 6210.62 2016/2
## 71 2016 3 4317.27 2016/3
## 72 2016 4 7855.73 2016/4
## 73 2017 1 13779.06 2017/1
## 74 2017 2 6814.16 2017/2
## 75 2017 3 6361.22 2017/3
## 76 2017 4 7062.61 2017/4
## 77 2018 1 14067.52 2018/1
## 78 2018 2 9577.75 2018/2
## 79 2018 3 4132.97 2018/3
## 80 2018 4 6322.19 2018/4
## 81 2019 1 15175.27 2019/1
## 82 2019 2 6504.62 2019/2
## 83 2019 3 8217.40 2019/3
## 84 2019 4 4679.87 2019/4
## 85 2020 1 16807.60 2020/1
## 86 2020 2 7293.96 2020/2
## 87 2020 3 1340.58 2020/3
## 88 2020 4 2763.75 2020/4
## 89 2021 1 16206.05 2021/1
## 90 2021 2 5883.73 2021/2
## 91 2021 3 6419.43 2021/3
## 92 2021 4 3044.31 2021/4
## 93 2022 1 22794.16 2022/1
## 94 2022 2 8164.27 2022/2
## 95 2022 3 3479.68 2022/3
## 96 2023 4 1777.26 2023/4
df_time_series$quarter <- as.yearqtr(df_time_series$Date, format = "%Y/%q") # as.yearqtr() comes from the zoo package
head(df_time_series)
## Año Trimestre IED_Flujos Date quarter
## 1 1999 1 3596.08 1999/1 1999 Q1
## 2 1999 2 3395.89 1999/2 1999 Q2
## 3 1999 3 3028.45 1999/3 1999 Q3
## 4 1999 4 3939.90 1999/4 1999 Q4
## 5 2000 1 4600.64 2000/1 2000 Q1
## 6 2000 2 4857.42 2000/2 2000 Q2
#ggplot(df_time_series, aes(x = as.Date(paste(Año, Date, "2"), format = "%Y %m %d"), y = IED_Flujos)) +
# geom_line() +
# labs(x = "Year", y = "IED_Flujos", title = "Time Series Plot of IED_Flujos")
ts_data <- ts(df_time_series$IED_Flujos, start = c(1999, 1), frequency = 4) # the series begins in 1999 Q1
# Perform the decomposition
decomposed_data <- decompose(ts_data)
# Plot the decomposition
plot(decomposed_data)
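To make the decomposition plot easier to read, here is a minimal sketch on a synthetic quarterly series (hypothetical values, not the IED data). It shows that the additive components returned by `decompose()` sum back to the original series and that the seasonal figures are one value per quarter.

```r
# Synthetic quarterly series (hypothetical): trend + seasonal pattern + noise
set.seed(1)
n_q <- 40
trend <- seq(100, 139, length.out = n_q)
seasonal <- rep(c(10, -5, -8, 3), times = n_q / 4)
y <- ts(trend + seasonal + rnorm(n_q, sd = 2), start = c(2000, 1), frequency = 4)

dec <- decompose(y, type = "additive")

# For an additive decomposition the components add back up to the series.
# Trend and random are NA for the first and last two quarters because of
# the centred moving average, so those positions are skipped.
rebuilt <- dec$trend + dec$seasonal + dec$random
print(all.equal(as.numeric(rebuilt[3:38]), as.numeric(y[3:38])))

# Estimated seasonal effect for each quarter (centred to sum to zero)
print(round(dec$figure, 2))
```

If the seasonal swings grow with the level of the series, `type = "multiplicative"` may be a better fit; which form suits the IED series is not established here.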
# It is important to assess whether the series under study is stationary
ts_data <- ts(df_time_series$IED_Flujos, start = c(1999, 1), frequency = 4) # the series begins in 1999 Q1
adf.test(ts_data) # Ho: unit root (non-stationary); a p-value < 0.05 rejects Ho in favour of stationarity
## Warning in adf.test(ts_data): p-value smaller than printed p-value
##
## Augmented Dickey-Fuller Test
##
## data: ts_data
## Dickey-Fuller = -4.1994, Lag order = 4, p-value = 0.01
## alternative hypothesis: stationary
# Ho: There is no serial autocorrelation
# H1: There is serial autocorrelation
#Box.test(IED_db_ARMA_residuals,lag=5,type="Ljung-Box") # Reject the Ho. P-value is < 0.05 indicating that ARMA Model does show residual serial autocorrelation.
#print(Box.test(IED_db_ARMA_residuals))
ljung_box_test <- Box.test(ts_data, lag = 5, type = "Ljung-Box")
# Print the Ljung-Box test results (a p-value < 0.05 rejects Ho, i.e. the series is serially autocorrelated)
print(ljung_box_test)
##
## Box-Ljung test
##
## data: ts_data
## X-squared = 21.982, df = 5, p-value = 0.0005277
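The test above is applied to the raw series, whereas the commented-out call targets model residuals. The following is a minimal sketch of the residual version on a synthetic AR(1) series (hypothetical data). The AR(1) order and `fitdf = 1` are assumptions chosen for illustration, not the model used in this analysis.

```r
set.seed(42)
# Synthetic AR(1) series (hypothetical), autocorrelated by construction
y <- arima.sim(model = list(ar = 0.6), n = 200)

# Ljung-Box on the raw series: tests whether the data themselves are autocorrelated
raw_test <- Box.test(y, lag = 5, type = "Ljung-Box")

# Fit an AR(1) and test its residuals; fitdf = 1 removes one degree of
# freedom for the estimated AR coefficient
fit <- arima(y, order = c(1, 0, 0))
res_test <- Box.test(residuals(fit), lag = 5, type = "Ljung-Box", fitdf = 1)

print(raw_test$p.value)
print(res_test$p.value)
```

A small p-value on the raw series indicates that there is structure to model. A large p-value on the residuals suggests that the fitted model has captured it.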
Hypothesis 1: Foreign direct investment (FDI) flows into Mexico are expected to be positively correlated with the employment rate of the economically active population. A higher employment rate is taken as a sign of a well-educated workforce and a favorable business environment for foreign entities, which attracts foreign investment.
Hypothesis 2: FDI flows are expected to be negatively associated with the incidence of robbery and theft. As FDI increases, it can contribute to improved economic conditions and enhanced security measures, potentially reducing robbery and theft incidents.
Hypothesis 3: A higher patent rate per 100,000 inhabitants is expected to be strongly correlated with FDI flows, and this variable is expected to be the most significant factor in attracting foreign investment. The rationale is that foreign investment brings new developments, advanced technology, and knowledge, which in turn drive innovation within the country.
linear_regresion_df<-lm(IED_FlujosMX ~ periodo+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(linear_regresion_df)
##
## Call:
## lm(formula = IED_FlujosMX ~ periodo + Empleo + Innovacion + Tipo_de_Cambio +
## CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
## PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio +
## Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -86345 -30391 5560 29891 80704
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.255e+08 3.382e+08 -0.371 0.71758
## periodo 5.680e+04 1.749e+05 0.325 0.75143
## Empleo 9.505e+04 4.186e+04 2.271 0.04426 *
## Innovacion 8.253e+04 2.933e+04 2.814 0.01685 *
## Tipo_de_Cambio 2.267e+04 2.644e+04 0.857 0.40958
## CO2_Emisiones 2.933e+05 1.999e+05 1.467 0.17025
## Educacion 2.055e+06 1.775e+06 1.158 0.27136
## Inseguridad_Robo 1.551e+02 9.177e+02 0.169 0.86888
## Densidad_Carretera 4.847e+06 7.888e+06 0.614 0.55142
## PIB_Per_Capita -7.667e+00 9.019e+00 -0.850 0.41343
## ExportacionesMX -1.283e+00 9.226e-01 -1.390 0.19194
## Salario_Diario -1.406e+02 4.142e+03 -0.034 0.97354
## Inseguridad_Homicidio 1.869e+04 1.093e+04 1.711 0.11518
## Densidad_Poblacion -2.660e+05 8.498e+04 -3.130 0.00957 **
## INPC -1.328e+04 2.013e+04 -0.660 0.52305
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69350 on 11 degrees of freedom
## Multiple R-squared: 0.8977, Adjusted R-squared: 0.7676
## F-statistic: 6.897 on 14 and 11 DF, p-value: 0.001377
linear_regresion2_df<-lm(IED_FlujosMX ~ Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(linear_regresion2_df)
##
## Call:
## lm(formula = IED_FlujosMX ~ Empleo + Innovacion + Tipo_de_Cambio +
## CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
## PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio +
## Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -92169 -31318 5827 27474 83660
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.569e+07 6.050e+06 -2.594 0.02350 *
## Empleo 9.333e+04 3.995e+04 2.336 0.03764 *
## Innovacion 8.272e+04 2.821e+04 2.933 0.01254 *
## Tipo_de_Cambio 1.762e+04 2.058e+04 0.856 0.40863
## CO2_Emisiones 2.756e+05 1.850e+05 1.490 0.16207
## Educacion 2.577e+06 7.264e+05 3.548 0.00401 **
## Inseguridad_Robo 2.294e+02 8.550e+02 0.268 0.79302
## Densidad_Carretera 4.838e+06 7.588e+06 0.638 0.53576
## PIB_Per_Capita -7.182e+00 8.557e+00 -0.839 0.41770
## ExportacionesMX -1.067e+00 6.173e-01 -1.729 0.10943
## Salario_Diario -1.419e+02 3.984e+03 -0.036 0.97218
## Inseguridad_Homicidio 1.950e+04 1.023e+04 1.905 0.08098 .
## Densidad_Poblacion -2.674e+05 8.165e+04 -3.275 0.00664 **
## INPC -1.153e+04 1.866e+04 -0.618 0.54829
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 66720 on 12 degrees of freedom
## Multiple R-squared: 0.8968, Adjusted R-squared: 0.7849
## F-statistic: 8.017 on 13 and 12 DF, p-value: 0.0004816
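The two specifications above differ only in the `periodo` term. Dropping it raises the adjusted R-squared from 0.7676 to 0.7849. One way to formalize such a comparison is a partial F-test on the nested models, sketched here on synthetic data with hypothetical variable names `year`, `x1`, and `x2`.

```r
set.seed(7)
n <- 26
# Hypothetical toy data: y depends on x1 and x2 but not on year
toy <- data.frame(year = 1997:2022, x1 = rnorm(n), x2 = rnorm(n))
toy$y <- 2 + 1.5 * toy$x1 - 0.8 * toy$x2 + rnorm(n)

full <- lm(y ~ year + x1 + x2, data = toy)
reduced <- lm(y ~ x1 + x2, data = toy)

# Partial F-test: does adding `year` significantly reduce the residual sum of squares?
cmp <- anova(reduced, full)
print(cmp)

# Adjusted R-squared and AIC penalize the extra parameter
print(c(full = summary(full)$adj.r.squared, reduced = summary(reduced)$adj.r.squared))
print(AIC(reduced, full))
```

The same three calls could be applied to `linear_regresion_df` and `linear_regresion2_df`; the results of doing so are not shown here.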
log_linear_regresion_df<-lm(log(IED_FlujosMX) ~ periodo+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(log_linear_regresion_df)
##
## Call:
## lm(formula = log(IED_FlujosMX) ~ periodo + Empleo + Innovacion +
## Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo +
## Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario +
## Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20253 -0.08031 0.02460 0.06382 0.19858
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.970e+01 7.581e+02 0.052 0.9592
## periodo -3.140e-02 3.921e-01 -0.080 0.9376
## Empleo 2.127e-01 9.385e-02 2.266 0.0446 *
## Innovacion 1.784e-01 6.575e-02 2.713 0.0202 *
## Tipo_de_Cambio 2.969e-02 5.928e-02 0.501 0.6264
## CO2_Emisiones 5.625e-01 4.481e-01 1.255 0.2354
## Educacion 5.430e+00 3.979e+00 1.365 0.1996
## Inseguridad_Robo -6.764e-04 2.058e-03 -0.329 0.7485
## Densidad_Carretera 1.071e+01 1.768e+01 0.606 0.5569
## PIB_Per_Capita -1.695e-05 2.022e-05 -0.838 0.4198
## ExportacionesMX -2.243e-06 2.068e-06 -1.084 0.3014
## Salario_Diario -1.994e-03 9.285e-03 -0.215 0.8339
## Inseguridad_Homicidio 4.624e-02 2.450e-02 1.888 0.0857 .
## Densidad_Poblacion -5.507e-01 1.905e-01 -2.891 0.0147 *
## INPC -1.558e-02 4.514e-02 -0.345 0.7364
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1555 on 11 degrees of freedom
## Multiple R-squared: 0.8944, Adjusted R-squared: 0.7601
## F-statistic: 6.658 on 14 and 11 DF, p-value: 0.001609
log_linear_regresion2_df<-lm(log(IED_FlujosMX) ~ Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(log_linear_regresion2_df)
##
## Call:
## lm(formula = log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio +
## CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera +
## PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio +
## Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.20292 -0.07955 0.02458 0.06454 0.20099
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.101e+01 1.350e+01 -1.556 0.14572
## Empleo 2.137e-01 8.916e-02 2.396 0.03374 *
## Innovacion 1.783e-01 6.296e-02 2.832 0.01512 *
## Tipo_de_Cambio 3.248e-02 4.593e-02 0.707 0.49295
## CO2_Emisiones 5.723e-01 4.128e-01 1.386 0.19092
## Educacion 5.141e+00 1.621e+00 3.171 0.00805 **
## Inseguridad_Robo -7.175e-04 1.908e-03 -0.376 0.71349
## Densidad_Carretera 1.072e+01 1.694e+01 0.633 0.53868
## PIB_Per_Capita -1.722e-05 1.910e-05 -0.901 0.38510
## ExportacionesMX -2.362e-06 1.378e-06 -1.714 0.11217
## Salario_Diario -1.993e-03 8.893e-03 -0.224 0.82642
## Inseguridad_Homicidio 4.580e-02 2.284e-02 2.005 0.06808 .
## Densidad_Poblacion -5.499e-01 1.822e-01 -3.018 0.01071 *
## INPC -1.656e-02 4.164e-02 -0.398 0.69794
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1489 on 12 degrees of freedom
## Multiple R-squared: 0.8944, Adjusted R-squared: 0.78
## F-statistic: 7.817 on 13 and 12 DF, p-value: 0.0005452
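Because the response is `log(IED_FlujosMX)`, each coefficient is a semi-elasticity: a one-unit increase in a predictor changes the response by roughly 100 * (exp(b) - 1) percent, holding the others fixed. The sketch below applies that formula to three coefficients copied from the summary above.

```r
# Coefficients copied from the summary above (log model without periodo)
b <- c(Empleo = 0.2137, Innovacion = 0.1783, Densidad_Poblacion = -0.5499)

# Exact percentage change in IED_FlujosMX for a one-unit increase in each predictor
pct_change <- 100 * (exp(b) - 1)
print(round(pct_change, 1))
```

For small coefficients, 100 * b is a close approximation. For values as large as these, the exact formula differs noticeably.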
polynomial_regresion_df <- lm(IED_FlujosMX ~ periodo+Empleo+Innovacion+I(Innovacion^2)+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC, data=df_cash )
summary(polynomial_regresion_df)
##
## Call:
## lm(formula = IED_FlujosMX ~ periodo + Empleo + Innovacion + I(Innovacion^2) +
## Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo +
## Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario +
## Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -82800 -29173 1974 29850 82890
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -5.784e+07 4.200e+08 -0.138 0.8932
## periodo 2.294e+04 2.152e+05 0.107 0.9172
## Empleo 9.368e+04 4.395e+04 2.131 0.0589 .
## Innovacion -9.438e+04 5.953e+05 -0.159 0.8772
## I(Innovacion^2) 6.718e+03 2.258e+04 0.298 0.7721
## Tipo_de_Cambio 1.929e+04 2.986e+04 0.646 0.5328
## CO2_Emisiones 2.780e+05 2.149e+05 1.294 0.2249
## Educacion 2.219e+06 1.933e+06 1.148 0.2777
## Inseguridad_Robo 4.308e+01 1.030e+03 0.042 0.9675
## Densidad_Carretera 4.223e+06 8.499e+06 0.497 0.6300
## PIB_Per_Capita -6.985e+00 9.693e+00 -0.721 0.4876
## ExportacionesMX -1.220e+00 9.860e-01 -1.238 0.2441
## Salario_Diario -1.343e+03 5.919e+03 -0.227 0.8251
## Inseguridad_Homicidio 2.054e+04 1.299e+04 1.581 0.1450
## Densidad_Poblacion -2.694e+05 8.947e+04 -3.011 0.0131 *
## INPC -5.523e+03 3.349e+04 -0.165 0.8723
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 72420 on 10 degrees of freedom
## Multiple R-squared: 0.8986, Adjusted R-squared: 0.7466
## F-statistic: 5.91 on 15 and 10 DF, p-value: 0.003692
polynomial_regresion2_df <- lm(IED_FlujosMX ~ Empleo+Innovacion+I(Innovacion^2)+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC, data=df_cash )
summary(polynomial_regresion2_df)
##
## Call:
## lm(formula = IED_FlujosMX ~ Empleo + Innovacion + I(Innovacion^2) +
## Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo +
## Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario +
## Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -83822 -29212 2696 29056 84164
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.307e+07 8.667e+06 -1.509 0.15957
## Empleo 9.292e+04 4.137e+04 2.246 0.04622 *
## Innovacion -1.278e+05 4.825e+05 -0.265 0.79596
## I(Innovacion^2) 7.991e+03 1.828e+04 0.437 0.67044
## Tipo_de_Cambio 1.718e+04 2.133e+04 0.805 0.43769
## CO2_Emisiones 2.700e+05 1.920e+05 1.406 0.18724
## Educacion 2.402e+06 8.523e+05 2.818 0.01672 *
## Inseguridad_Robo 4.347e+01 9.822e+02 0.044 0.96549
## Densidad_Carretera 4.102e+06 8.036e+06 0.511 0.61978
## PIB_Per_Capita -6.715e+00 8.925e+00 -0.752 0.46762
## ExportacionesMX -1.146e+00 6.639e-01 -1.726 0.11234
## Salario_Diario -1.571e+03 5.264e+03 -0.298 0.77089
## Inseguridad_Homicidio 2.113e+04 1.123e+04 1.881 0.08671 .
## Densidad_Poblacion -2.705e+05 8.484e+04 -3.188 0.00863 **
## INPC -3.543e+03 2.658e+04 -0.133 0.89640
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69080 on 11 degrees of freedom
## Multiple R-squared: 0.8985, Adjusted R-squared: 0.7694
## F-statistic: 6.956 on 14 and 11 DF, p-value: 0.001325
### Splitting the Data into Training and Test Sets
# Let's randomly split the data into a training set and a test set
set.seed(123) # Sets the random seed for reproducibility of results
training.samples <- df_cash$IED_FlujosMX %>%
createDataPartition(p = 0.75, list = FALSE) # Consider 75% of the data for building a predictive model
train.data <- df_cash[training.samples, ] # Training data to fit the linear regression model
test.data <- df_cash[-training.samples, ] # Testing data to test the linear regression model
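For readers without the caret package, the same idea can be sketched in base R. This is a simple random split on a hypothetical toy data frame. Unlike `createDataPartition()`, which samples within groups of the outcome, `sample()` does not stratify, so the resulting set sizes can differ from those above.

```r
set.seed(123)
# Hypothetical toy data frame with 26 rows, mirroring the size of the annual dataset
toy <- data.frame(id = 1:26, y = rnorm(26))

# Draw 75% of the row indices at random for training; the rest form the test set
train_idx <- sample(seq_len(nrow(toy)), size = floor(0.75 * nrow(toy)))
train_toy <- toy[train_idx, ]
test_toy <- toy[-train_idx, ]

print(dim(train_toy))
print(dim(test_toy))
```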
# LASSO regression via the glmnet package accepts only a numeric matrix of predictors, so the dataset is converted with model.matrix().
# Independent variables
x <- model.matrix(log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + Densidad_Poblacion + INPC, train.data)[,-1] # OLS model specification
x
## Empleo Innovacion Tipo_de_Cambio CO2_Emisiones Educacion Inseguridad_Robo
## 2 96.47043 11.37000 9.94 3.850000 7.2968 314.78
## 3 96.47043 12.46000 9.52 3.690000 7.4012 272.89
## 4 97.83000 13.15000 9.60 3.870000 7.5132 216.98
## 5 97.36000 13.47000 9.17 3.810000 7.6224 214.53
## 7 97.06000 11.81000 11.20 3.950000 7.8364 183.22
## 8 96.48000 12.61000 11.22 3.980000 7.9476 146.28
## 9 97.17000 13.41000 10.71 4.100000 8.0440 136.94
## 10 96.53000 14.23000 10.88 4.190000 8.1320 135.59
## 11 96.60000 15.04000 10.90 4.220000 8.2360 145.92
## 12 95.68000 14.82000 13.77 4.190000 8.3280 158.17
## 13 95.20000 12.59000 13.04 4.040000 8.4160 175.77
## 14 95.06000 12.69000 12.38 4.110000 8.5040 201.94
## 15 95.49000 12.10000 13.98 4.190000 8.5824 212.61
## 16 95.53000 13.03000 12.99 4.200000 8.6540 190.28
## 18 96.24000 13.65000 14.73 3.890000 8.8460 154.41
## 19 96.04000 15.11000 17.34 3.930000 8.9340 180.44
## 20 96.62000 14.40000 20.66 3.890000 9.0220 160.57
## 21 96.85000 14.05000 19.74 3.840000 9.1100 230.43
## 22 96.64000 13.25000 19.66 3.650000 9.1980 184.25
## 23 97.09000 12.70000 18.87 3.590000 9.2860 173.45
## 24 96.21000 11.28000 19.94 3.945217 9.3740 133.90
## 26 97.24000 13.10583 19.41 3.945217 9.5800 120.49
## Densidad_Carretera PIB_Per_Capita ExportacionesMX Salario_Diario
## 2 0.05 126738.8 248690.6 31.91
## 3 0.06 129164.7 235960.5 31.91
## 4 0.06 130874.9 248057.2 35.12
## 5 0.06 128083.4 205482.9 37.57
## 7 0.06 128737.9 265825.7 41.53
## 8 0.06 132563.5 261173.9 43.30
## 9 0.06 132941.1 292695.1 45.24
## 10 0.06 135894.9 303472.5 47.05
## 11 0.06 137795.7 320110.6 48.88
## 12 0.07 135176.0 336297.2 50.84
## 13 0.07 131233.0 357980.1 53.19
## 14 0.07 134991.7 374607.6 55.77
## 15 0.07 138891.9 437299.9 58.06
## 16 0.07 141530.2 423992.5 60.75
## 18 0.08 147277.4 535151.9 65.58
## 19 0.08 149433.5 583386.1 70.10
## 20 0.08 152275.4 704268.5 73.04
## 21 0.09 153235.7 669368.6 88.36
## 22 0.09 153133.8 695447.7 88.36
## 23 0.09 150233.1 648679.3 102.68
## 24 0.09 142609.3 749594.7 123.22
## 26 0.09 146826.7 713259.0 172.87
## Inseguridad_Homicidio Densidad_Poblacion INPC
## 2 14.3200 48.76 39.47
## 3 12.6400 49.48 44.34
## 4 10.8600 50.58 48.31
## 5 10.2500 51.28 50.43
## 7 9.8100 52.61 55.43
## 8 8.9200 53.27 58.31
## 9 9.2200 54.78 60.25
## 10 9.6000 55.44 62.69
## 11 8.0400 56.17 65.05
## 12 12.5200 56.96 69.30
## 13 17.4600 57.73 71.77
## 14 22.4300 58.45 74.93
## 15 23.4200 59.15 77.79
## 16 22.0900 59.85 80.57
## 18 16.9300 60.17 87.19
## 19 17.3700 60.86 89.05
## 20 20.3100 61.57 92.04
## 21 26.2200 62.28 98.27
## 22 29.5900 63.11 99.91
## 23 29.2100 63.90 105.93
## 24 28.9800 64.59 109.27
## 26 17.2924 65.60 126.48
# x <- model.matrix(~., train.data)[,-1] # Matrix of independent variables X's
y <- train.data$IED_FlujosMX # Dependent variable
y
## [1] 210875.6 299734.4 362631.8 546548.4 368752.8 481349.2 458544.8 368495.8
## [9] 542793.7 586217.7 324318.4 449223.7 460653.8 350978.6 512758.2 699904.1
## [17] 700091.6 683318.0 671018.4 615945.4 514711.7 555771.9
# LASSO needs a penalty value lambda; we choose the one that minimizes the cross-validated prediction error.
# k-fold cross-validation rotates the held-out fold, so every observation passed to cv.glmnet() is used for both fitting and validation.
# Find the best lambda using cross-validation.
set.seed(123)
cv.lasso <- cv.glmnet(x, y, alpha = 1) # alpha = 1 for LASSO
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
# Cannot be used with only one variable (glmnet requires at least two predictor columns)
# Display the best lambda value
cv.lasso$lambda.min # lambda.min is the lambda with the lowest cross-validated error; the larger lambda is, the stronger the shrinkage
## [1] 11569.57
# Fit the final model on the training data
lassomodel <- glmnet(x, y, alpha = 1, lambda = cv.lasso$lambda.min)
lassomodel
##
## Call: glmnet(x = x, y = y, alpha = 1, lambda = cv.lasso$lambda.min)
##
## Df %Dev Lambda
## 1 6 79.04 11570
# Display regression coefficients
coef(lassomodel)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) -1.213581e+06
## Empleo 7.376953e+03
## Innovacion 5.304716e+04
## Tipo_de_Cambio 1.585699e+04
## CO2_Emisiones -1.913095e+04
## Educacion .
## Inseguridad_Robo .
## Densidad_Carretera 1.268284e+06
## PIB_Per_Capita 3.837093e-01
## ExportacionesMX .
## Salario_Diario .
## Inseguridad_Homicidio .
## Densidad_Poblacion .
## INPC .
# Make predictions on the test data
x.test <- model.matrix(log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + Densidad_Poblacion + INPC, test.data)[,-1] # Same predictor matrix, built from the test data
lassopredictions <- lassomodel %>% predict(x.test) %>% as.vector()
# Model Accuracy
data.frame(
RMSE = RMSE(lassopredictions, test.data$IED_FlujosMX),
Rsquare = R2(lassopredictions, test.data$IED_FlujosMX)
)
## RMSE Rsquare
## 1 144761.4 0.4561103
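The two accuracy measures reported above can be computed by hand. The sketch below uses made-up `obs` and `pred` vectors (hypothetical values, not the test set of this report) and assumes that `caret::R2()` is used with its default definition, the squared correlation between predictions and observations.

```r
# Hypothetical observed and predicted values; not the report's test set.
obs  <- c(320, 410, 505, 610)
pred <- c(300, 450, 480, 650)

# Root mean squared error: typical size of a prediction error, in the units of obs.
rmse <- sqrt(mean((pred - obs)^2))

# Squared correlation between predictions and observations
# (assumed to match the default definition in caret::R2()).
r2 <- cor(pred, obs)^2

data.frame(RMSE = rmse, Rsquare = r2)
```

Because the RMSE is in the units of the dependent variable, it is best read relative to the typical size of `IED_FlujosMX` rather than as an absolute number.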
### Visualizing LASSO Regression Results
# Helper that writes each variable name next to its coefficient path, at the smallest lambda
lbs_fun <- function(fit, offset_x = 1, ...) {
  L <- length(fit$lambda)
  x <- log(fit$lambda[L]) + offset_x
  y <- fit$beta[, L]
  labs <- names(y)
  text(x, y, labels = labs, ...)
}
lasso <- glmnet(scale(x), y, alpha = 1)
plot(lasso, xvar = "lambda", label = TRUE)
lbs_fun(lasso)
abline(v = log(cv.lasso$lambda.min), col = "brown", lty = 2) # the x-axis is on the log(lambda) scale
abline(v = log(cv.lasso$lambda.1se), col = "yellow", lty = 2)
# Find the best lambda using cross-validation
set.seed(123)
cv.ridge <- cv.glmnet(x, y, alpha = 0.1) # Note: alpha = 0.1 is an elastic-net mix; pure RIDGE is alpha = 0, as used in the final fit below
## Warning: Option grouped=FALSE enforced in cv.glmnet, since < 3 observations per
## fold
# Display the best lambda value
cv.ridge$lambda.min # lambda.min is the lambda with the lowest cross-validated error; the larger lambda is, the stronger the shrinkage
## [1] 37885.11
# Fit the final model on the training data
ridgemodel <- glmnet(x, y, alpha = 0, lambda = cv.ridge$lambda.min)
# Display regression coefficients
coef(ridgemodel)
## 14 x 1 sparse Matrix of class "dgCMatrix"
## s0
## (Intercept) -2.232324e+06
## Empleo 1.835178e+04
## Innovacion 4.460466e+04
## Tipo_de_Cambio 6.556839e+03
## CO2_Emisiones -7.646332e+04
## Educacion 1.598268e+04
## Inseguridad_Robo -2.635834e+02
## Densidad_Carretera 1.368237e+06
## PIB_Per_Capita 2.053151e+00
## ExportacionesMX 4.333402e-02
## Salario_Diario -3.615001e+02
## Inseguridad_Homicidio 4.690874e+02
## Densidad_Poblacion 1.530222e+03
## INPC 1.572627e+02
# Make predictions on the test data
x.test <- model.matrix(log(IED_FlujosMX) ~ Empleo + Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario + Inseguridad_Homicidio + Densidad_Poblacion + INPC, test.data)[,-1]
ridgepredictions <- ridgemodel %>% predict(x.test) %>% as.vector()
# Model Accuracy
data.frame(
RMSE = RMSE(ridgepredictions, test.data$IED_FlujosMX),
Rsquare = R2(ridgepredictions, test.data$IED_FlujosMX)
)
## RMSE Rsquare
## 1 135411.3 0.5144119
### Visualizing Ridge Regression Results
ridge <- glmnet(scale(x), y, alpha = 0)
plot(ridge, xvar = "lambda", label = TRUE)
lbs_fun(ridge)
abline(v = log(cv.ridge$lambda.min), col = "brown", lty = 2) # the x-axis is on the log(lambda) scale
abline(v = log(cv.ridge$lambda.1se), col = "yellow", lty = 2)
### Diagnostic Tests
# Linear regression:
vif(linear_regresion_df)
## periodo Empleo Innovacion
## 9300.244675 4.730382 5.141043
## Tipo_de_Cambio CO2_Emisiones Educacion
## 62.602797 6.723812 8347.798634
## Inseguridad_Robo Densidad_Carretera PIB_Per_Capita
## 9.947260 57.766617 33.202772
## ExportacionesMX Salario_Diario Inseguridad_Homicidio
## 168.278645 114.585619 31.424549
## Densidad_Poblacion INPC
## 1099.236379 1296.841200
bptest(linear_regresion_df)
##
## studentized Breusch-Pagan test
##
## data: linear_regresion_df
## BP = 6.5965, df = 14, p-value = 0.9491
AIC(linear_regresion_df)
## [1] 663.0599
histogram(linear_regresion_df$residuals)
# Linear regression without "periodo":
vif(linear_regresion2_df)
## Empleo Innovacion Tipo_de_Cambio
## 4.654336 5.138823 40.959177
## CO2_Emisiones Educacion Inseguridad_Robo
## 6.222893 1511.043702 9.328882
## Densidad_Carretera PIB_Per_Capita ExportacionesMX
## 57.765894 32.294264 81.407268
## Salario_Diario Inseguridad_Homicidio Densidad_Poblacion
## 114.585512 29.790759 1096.435161
## INPC
## 1203.388890
bptest(linear_regresion2_df)
##
## studentized Breusch-Pagan test
##
## data: linear_regresion2_df
## BP = 6.9369, df = 13, p-value = 0.9054
AIC(linear_regresion2_df)
## [1] 661.3081
histogram(linear_regresion2_df$residuals)
# Log Linear regression:
vif(log_linear_regresion_df)
## periodo Empleo Innovacion
## 9300.244675 4.730382 5.141043
## Tipo_de_Cambio CO2_Emisiones Educacion
## 62.602797 6.723812 8347.798634
## Inseguridad_Robo Densidad_Carretera PIB_Per_Capita
## 9.947260 57.766617 33.202772
## ExportacionesMX Salario_Diario Inseguridad_Homicidio
## 168.278645 114.585619 31.424549
## Densidad_Poblacion INPC
## 1099.236379 1296.841200
bptest(log_linear_regresion_df)
##
## studentized Breusch-Pagan test
##
## data: log_linear_regresion_df
## BP = 5.0509, df = 14, p-value = 0.9851
AIC(log_linear_regresion_df)
## [1] -13.36425
histogram(log_linear_regresion_df$residuals)
# Log Linear regression without "periodo":
vif(log_linear_regresion2_df)
## Empleo Innovacion Tipo_de_Cambio
## 4.654336 5.138823 40.959177
## CO2_Emisiones Educacion Inseguridad_Robo
## 6.222893 1511.043702 9.328882
## Densidad_Carretera PIB_Per_Capita ExportacionesMX
## 57.765894 32.294264 81.407268
## Salario_Diario Inseguridad_Homicidio Densidad_Poblacion
## 114.585512 29.790759 1096.435161
## INPC
## 1203.388890
bptest(log_linear_regresion2_df)
##
## studentized Breusch-Pagan test
##
## data: log_linear_regresion2_df
## BP = 4.9872, df = 13, p-value = 0.9755
AIC(log_linear_regresion2_df)
## [1] -15.34909
histogram(log_linear_regresion2_df$residuals)
# Polynomial regression:
vif(polynomial_regresion_df)
## periodo Empleo Innovacion
## 12911.624364 4.782716 1942.427754
## I(Innovacion^2) Tipo_de_Cambio CO2_Emisiones
## 1944.059476 73.201471 7.130581
## Educacion Inseguridad_Robo Densidad_Carretera
## 9083.580621 11.481684 61.505633
## PIB_Per_Capita ExportacionesMX Salario_Diario
## 35.166094 176.261000 214.651936
## Inseguridad_Homicidio Densidad_Poblacion INPC
## 40.756241 1117.525234 3291.032434
bptest(polynomial_regresion_df)
##
## studentized Breusch-Pagan test
##
## data: polynomial_regresion_df
## BP = 11.514, df = 15, p-value = 0.7154
AIC(polynomial_regresion_df)
## [1] 664.8307
histogram(polynomial_regresion_df$residuals)
# Polynomial regression without "periodo":
vif(polynomial_regresion2_df)
## Empleo Innovacion I(Innovacion^2)
## 4.656670 1402.428957 1400.306288
## Tipo_de_Cambio CO2_Emisiones Educacion
## 41.050732 6.250776 1939.855948
## Inseguridad_Robo Densidad_Carretera PIB_Per_Capita
## 11.481533 60.412638 32.763627
## ExportacionesMX Salario_Diario Inseguridad_Homicidio
## 87.815456 186.570266 33.464161
## Densidad_Poblacion INPC
## 1103.966735 2278.408362
bptest(polynomial_regresion2_df)
##
## studentized Breusch-Pagan test
##
## data: polynomial_regresion2_df
## BP = 11.056, df = 14, p-value = 0.6816
AIC(polynomial_regresion2_df)
## [1] 662.8602
histogram(polynomial_regresion2_df$residuals)
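The VIF values in this section can be read as 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below reproduces that calculation in base R on simulated predictors (`x1`, `x2`, `x3` are hypothetical and unrelated to the variables in `df_cash`).

```r
set.seed(2)                       # simulated predictors, for illustration only
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)     # nearly a copy of x1, so strongly collinear
x3 <- rnorm(n)                    # unrelated to the others
X  <- data.frame(x1, x2, x3)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 is obtained by regressing
# predictor j on all the remaining predictors.
vif_manual <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})

vif_manual
```

A predictor that is almost a linear combination of the others has R² close to 1, which is why trending series such as `Educacion`, `Densidad_Poblacion`, and `INPC` produce VIF values in the thousands.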
### Adjusting the regression models by not including all variables
# Adjusted Linear regression:
linear_regresion3_df<-lm(IED_FlujosMX ~ lag(IED_FlujosMX)+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(linear_regresion3_df)
##
## Call:
## lm(formula = IED_FlujosMX ~ lag(IED_FlujosMX) + Empleo + Innovacion +
## Tipo_de_Cambio + CO2_Emisiones + Educacion + Inseguridad_Robo +
## Densidad_Carretera + PIB_Per_Capita + ExportacionesMX + Salario_Diario +
## Inseguridad_Homicidio + Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -69413 -25598 -3945 23034 81196
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.928e+07 5.817e+06 -3.315 0.00782 **
## lag(IED_FlujosMX) -4.201e-01 2.207e-01 -1.904 0.08611 .
## Empleo 1.190e+05 4.045e+04 2.941 0.01476 *
## Innovacion 5.969e+04 3.072e+04 1.943 0.08064 .
## Tipo_de_Cambio 2.858e+04 2.086e+04 1.370 0.20062
## CO2_Emisiones 1.312e+05 1.835e+05 0.715 0.49096
## Educacion 2.477e+06 8.371e+05 2.959 0.01431 *
## Inseguridad_Robo 5.658e+02 8.528e+02 0.663 0.52203
## Densidad_Carretera 7.320e+06 7.153e+06 1.023 0.33022
## PIB_Per_Capita -1.011e+01 8.350e+00 -1.211 0.25375
## ExportacionesMX -9.568e-01 5.705e-01 -1.677 0.12443
## Salario_Diario -1.148e+03 4.811e+03 -0.239 0.81628
## Inseguridad_Homicidio 7.911e+03 1.323e+04 0.598 0.56323
## Densidad_Poblacion -1.997e+05 9.551e+04 -2.091 0.06301 .
## INPC -1.975e+04 1.996e+04 -0.989 0.34587
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 61180 on 10 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.8112
## F-statistic: 8.368 on 14 and 10 DF, p-value: 0.0009276
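The term `lag(IED_FlujosMX)` depends on which `lag()` function is in scope. The line "1 observation deleted due to missingness" is consistent with `dplyr::lag()`, which shifts the values by one position and pads with `NA`. Base R's `stats::lag()` only shifts the time index of a series and leaves the values unchanged. The sketch below builds the lag explicitly on a made-up vector (`y` is hypothetical, not the FDI series), which avoids the ambiguity.

```r
# Hypothetical toy series; not the FDI data from this report.
y <- c(10, 12, 15, 11, 18)

# Explicit one-period lag: pad with NA in front and drop the last value.
# For a plain vector this is what dplyr::lag(y) returns.
y_lag1 <- c(NA, head(y, -1))

# stats::lag() moves the time index only; the values stay in place,
# so inside lm() it would regress y on an exact copy of itself.
same_values <- all(as.numeric(stats::lag(y)) == y)

toy <- data.frame(y = y, y_lag1 = y_lag1)
fit <- lm(y ~ y_lag1, data = toy)   # lm() drops the row whose lag is NA
n_used <- nobs(fit)
```

Creating the lagged column in the data frame before fitting also makes the model easier to reuse with `predict()`.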
# Adjusted Logarithmic Linear regression:
log_linear_regresion3_df<-lm(log(IED_FlujosMX) ~ lag(IED_FlujosMX)+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(log_linear_regresion3_df)
##
## Call:
## lm(formula = log(IED_FlujosMX) ~ lag(IED_FlujosMX) + Empleo +
## Innovacion + Tipo_de_Cambio + CO2_Emisiones + Educacion +
## Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita +
## ExportacionesMX + Salario_Diario + Inseguridad_Homicidio +
## Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.14763 -0.06662 0.01253 0.05292 0.19195
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -2.791e+01 1.338e+01 -2.085 0.0636 .
## lag(IED_FlujosMX) -9.335e-07 5.076e-07 -1.839 0.0958 .
## Empleo 2.784e-01 9.306e-02 2.991 0.0135 *
## Innovacion 1.183e-01 7.066e-02 1.674 0.1251
## Tipo_de_Cambio 6.257e-02 4.799e-02 1.304 0.2215
## CO2_Emisiones 2.816e-01 4.221e-01 0.667 0.5199
## Educacion 4.353e+00 1.926e+00 2.261 0.0473 *
## Inseguridad_Robo 2.927e-04 1.962e-03 0.149 0.8844
## Densidad_Carretera 1.697e+01 1.645e+01 1.031 0.3266
## PIB_Per_Capita -2.592e-05 1.921e-05 -1.349 0.2069
## ExportacionesMX -2.089e-06 1.312e-06 -1.592 0.1425
## Salario_Diario -7.603e-03 1.107e-02 -0.687 0.5077
## Inseguridad_Homicidio 1.410e-02 3.044e-02 0.463 0.6532
## Densidad_Poblacion -3.579e-01 2.197e-01 -1.629 0.1344
## INPC -2.178e-02 4.592e-02 -0.474 0.6455
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1407 on 10 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9134, Adjusted R-squared: 0.7921
## F-statistic: 7.533 on 14 and 10 DF, p-value: 0.00144
# Adjusted Polynomial regression:
polynomial_regresion3_df <- lm(IED_FlujosMX ~ lag(IED_FlujosMX)+Empleo+Innovacion+Tipo_de_Cambio+CO2_Emisiones+Educacion+I(Innovacion^2)+Inseguridad_Robo+Densidad_Carretera+PIB_Per_Capita+ExportacionesMX+Salario_Diario+Inseguridad_Homicidio+Densidad_Poblacion+INPC,data=df_cash)
summary(polynomial_regresion3_df)
##
## Call:
## lm(formula = IED_FlujosMX ~ lag(IED_FlujosMX) + Empleo + Innovacion +
## Tipo_de_Cambio + CO2_Emisiones + Educacion + I(Innovacion^2) +
## Inseguridad_Robo + Densidad_Carretera + PIB_Per_Capita +
## ExportacionesMX + Salario_Diario + Inseguridad_Homicidio +
## Densidad_Poblacion + INPC, data = df_cash)
##
## Residuals:
## Min 1Q Median 3Q Max
## -69204 -25692 -3701 23449 81170
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.942e+07 8.754e+06 -2.218 0.0537 .
## lag(IED_FlujosMX) -4.214e-01 2.408e-01 -1.750 0.1141
## Empleo 1.191e+05 4.296e+04 2.772 0.0217 *
## Innovacion 6.955e+04 4.628e+05 0.150 0.8838
## Tipo_de_Cambio 2.865e+04 2.219e+04 1.291 0.2290
## CO2_Emisiones 1.310e+05 1.935e+05 0.677 0.5153
## Educacion 2.484e+06 9.430e+05 2.634 0.0272 *
## I(Innovacion^2) -3.776e+02 1.767e+04 -0.021 0.9834
## Inseguridad_Robo 5.761e+02 1.019e+03 0.565 0.5856
## Densidad_Carretera 7.364e+06 7.812e+06 0.943 0.3705
## PIB_Per_Capita -1.015e+01 8.950e+00 -1.134 0.2862
## ExportacionesMX -9.527e-01 6.312e-01 -1.510 0.1654
## Salario_Diario -1.088e+03 5.781e+03 -0.188 0.8549
## Inseguridad_Homicidio 7.789e+03 1.508e+04 0.516 0.6180
## Densidad_Poblacion -1.993e+05 1.026e+05 -1.943 0.0839 .
## INPC -2.013e+04 2.767e+04 -0.728 0.4853
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 64490 on 9 degrees of freedom
## (1 observation deleted due to missingness)
## Multiple R-squared: 0.9214, Adjusted R-squared: 0.7903
## F-statistic: 7.029 on 15 and 9 DF, p-value: 0.002846
### Adjusted Diagnostics Tests
# Adjusted Linear regression:
vif(linear_regresion3_df)
## lag(IED_FlujosMX) Empleo Innovacion
## 6.677654 5.676148 6.390772
## Tipo_de_Cambio CO2_Emisiones Educacion
## 45.921300 6.623875 2094.472155
## Inseguridad_Robo Densidad_Carretera PIB_Per_Capita
## 9.707707 54.669429 34.229534
## ExportacionesMX Salario_Diario Inseguridad_Homicidio
## 78.538609 187.950284 58.857167
## Densidad_Poblacion INPC
## 1536.038948 1443.904797
bptest(linear_regresion3_df)
##
## studentized Breusch-Pagan test
##
## data: linear_regresion3_df
## BP = 8.8414, df = 14, p-value = 0.8411
AIC(linear_regresion3_df)
## [1] 631.1195
histogram(linear_regresion3_df$residuals)
# Adjusted Log Linear regression:
vif(log_linear_regresion3_df)
## lag(IED_FlujosMX) Empleo Innovacion
## 6.677654 5.676148 6.390772
## Tipo_de_Cambio CO2_Emisiones Educacion
## 45.921300 6.623875 2094.472155
## Inseguridad_Robo Densidad_Carretera PIB_Per_Capita
## 9.707707 54.669429 34.229534
## ExportacionesMX Salario_Diario Inseguridad_Homicidio
## 78.538609 187.950284 58.857167
## Densidad_Poblacion INPC
## 1536.038948 1443.904797
bptest(log_linear_regresion3_df)
##
## studentized Breusch-Pagan test
##
## data: log_linear_regresion3_df
## BP = 7.6539, df = 14, p-value = 0.9066
AIC(log_linear_regresion3_df)
## [1] -18.00063
histogram(log_linear_regresion3_df$residuals)
# Adjusted Polynomial regression:
vif(polynomial_regresion3_df)
## lag(IED_FlujosMX) Empleo Innovacion
## 7.158290 5.762157 1305.720393
## Tipo_de_Cambio CO2_Emisiones Educacion
## 46.778600 6.632396 2392.565230
## I(Innovacion^2) Inseguridad_Robo Densidad_Carretera
## 1342.833493 12.470976 58.698280
## PIB_Per_Capita ExportacionesMX Salario_Diario
## 35.392175 86.523871 244.246185
## Inseguridad_Homicidio Densidad_Poblacion INPC
## 68.813372 1595.158196 2495.929531
bptest(polynomial_regresion3_df)
##
## studentized Breusch-Pagan test
##
## data: polynomial_regresion3_df
## BP = 9.0352, df = 15, p-value = 0.8757
AIC(polynomial_regresion3_df)
## [1] 633.1182
histogram(polynomial_regresion3_df$residuals)
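The studentized Breusch-Pagan statistics reported above can be reproduced by hand: regress the squared residuals on the same regressors and multiply the R² of that auxiliary regression by the sample size. The sketch below does this in base R on simulated data (`x1`, `x2`, and `y` are hypothetical), assuming the default studentized version of `bptest()`.

```r
set.seed(3)                       # simulated data, for illustration only
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)  # constant error variance by construction

fit <- lm(y ~ x1 + x2)

# Auxiliary regression of the squared residuals on the same regressors.
aux <- lm(resid(fit)^2 ~ x1 + x2)

bp_stat <- n * summary(aux)$r.squared      # Koenker (studentized) statistic
bp_df   <- length(coef(aux)) - 1           # number of regressors
bp_p    <- pchisq(bp_stat, df = bp_df, lower.tail = FALSE)

c(BP = bp_stat, df = bp_df, p.value = bp_p)
```

The degrees of freedom equal the number of regressors, which matches the `df = 14` and `df = 15` values in the output above. A large p-value means the squared residuals are not explained by the regressors, so constant variance is not rejected.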
To select the best regression model, several analyses and changes were made to the variables and tests. Approximately five changes were tried, including excluding variables suggested by the LASSO model, adding variables based on their significance with the dependent variable, and considering the inclusion of lagged variables. The final model included all variables and the lagged variable “lag(IED_FlujosMX)”. The Adjusted Logarithmic Linear Regression Model was chosen because it showed a good fit, with an adjusted R-squared of 0.7921. Although this is slightly lower than the adjusted R-squared of the adjusted linear regression model (0.8112), the logarithmic model still fits well. Its AIC (-18.00063) is also far lower than that of the linear model (631.1195), although the two values are computed on different scales of the dependent variable (log versus level), so they are not directly comparable without an adjustment. The regression results of the selected model indicate that employment (Empleo) and education (Educacion) are significant predictors of foreign direct investment in Mexico at the 5% level. Specifically, the coefficient for “Empleo” is positive (0.2784, p = 0.0135), suggesting that an increase in the economically active population is associated with a higher level of foreign direct investment. This finding supports the hypothesis of a positive relationship between employment and foreign direct investment flows. Similarly, the positive coefficient for “Educacion” (4.353, p = 0.0473) indicates that higher education levels are associated with more foreign direct investment in Mexico.
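The values -18.00063 and 631.1195 quoted above are the `AIC()` outputs of the log and level models. An AIC computed for `log(y)` is on a different scale from one computed for `y`. A common adjustment is to add the Jacobian term `2 * sum(log(y))` to the AIC of the log model before comparing. The sketch below shows the adjustment on simulated data (all names and values are hypothetical); it is not a re-analysis of `df_cash`.

```r
set.seed(1)                       # simulated data, for illustration only
n <- 40
x <- runif(n, 1, 10)
y <- exp(1 + 0.3 * x + rnorm(n, sd = 0.2))   # strictly positive response

fit_level <- lm(y ~ x)            # response in levels
fit_log   <- lm(log(y) ~ x)       # response in logs

# AIC() of fit_log is measured on the log(y) scale. Adding the Jacobian
# term 2 * sum(log(y)) expresses it on the scale of y itself.
aic_level        <- AIC(fit_level)
aic_log_raw      <- AIC(fit_log)
aic_log_adjusted <- aic_log_raw + 2 * sum(log(y))

c(level = aic_level, log_raw = aic_log_raw, log_adjusted = aic_log_adjusted)
```

Only `aic_level` and `aic_log_adjusted` refer to the same response, so those are the two numbers to compare when choosing between a level and a log specification.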
effect_plot(log_linear_regresion3_df,pred=Empleo,interval=TRUE)
## Using data df_cash from global environment. This could cause incorrect
## results if df_cash has been altered since the model was fit. You can
## manually provide the data to the "data =" argument.
## Warning: Removed 1 row containing missing values (`geom_path()`).
effect_plot(log_linear_regresion3_df,pred=Educacion,interval=TRUE)
## Using data df_cash from global environment. This could cause incorrect
## results if df_cash has been altered since the model was fit. You can
## manually provide the data to the "data =" argument.
## Warning: Removed 1 row containing missing values (`geom_path()`).
effect_plot(log_linear_regresion3_df,pred=Innovacion,interval=TRUE)
## Using data df_cash from global environment. This could cause incorrect
## results if df_cash has been altered since the model was fit. You can
## manually provide the data to the "data =" argument.
## Warning: Removed 1 row containing missing values (`geom_path()`).
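Since the selected model has `log(IED_FlujosMX)` on the left-hand side, its coefficients are easier to read as percentage changes. The sketch below converts the `Empleo` and `Educacion` estimates from the summary output into approximate percentage effects, holding the other regressors fixed. The helper `pct_change()` and the 0.1-unit step for `Educacion` are illustrative choices, not part of the original analysis.

```r
# Estimates copied from the adjusted log-linear summary above.
b_empleo    <- 0.2784
b_educacion <- 4.353

# In a log-level model, raising x by delta multiplies the predicted
# response by exp(b * delta); the percentage change is 100 * (exp(b * delta) - 1).
pct_change <- function(b, delta = 1) 100 * (exp(b * delta) - 1)

pct_empleo    <- pct_change(b_empleo)           # one-unit rise in Empleo
pct_educacion <- pct_change(b_educacion, 0.1)   # 0.1-unit rise in Educacion

round(c(Empleo = pct_empleo, Educacion = pct_educacion), 1)
```

Given the wide standard errors and the multicollinearity in this model, these figures are best treated as rough magnitudes rather than precise effects.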
Model Selection: Initially, the Linear Regression Model showed the best fit among all the regression models, with the highest adjusted R-squared value. However, considering multicollinearity issues, adjustments were made to the variables. The inclusion of the new variable “lag(IED_FlujosMX)” and the exclusion of the “periodo” variable resulted in an improved fit. The Adjusted Logarithmic Linear Regression Model had the second-highest adjusted R-squared (0.7921), behind the Adjusted Linear Regression Model (0.8112), but a much lower AIC (-18.00063 versus 631.1195). Note that AIC values for a log response and a level response are on different scales, so this comparison is only indicative.
Hypothesis Confirmation: Based on the regression model results, Hypothesis 1, which suggests a positive relationship between employment (Empleo) and foreign direct investment (FDI), aligns with the analysis. The positive and statistically significant coefficient for “Empleo” (p = 0.0135 in the selected model) indicates that an increase in the economically active population is associated with a higher level of FDI.
Ridge Model Performance: On the test data, the Ridge model exhibited a lower Root Mean Squared Error (RMSE) than the Lasso model (135411.3 versus 144761.4) and a higher R-squared value (0.5144119 versus 0.4561103), suggesting that it may perform better in predicting FDI.
Multicollinearity: There is evidence of multicollinearity among the independent variables in all three models, with VIF values far above 10 for variables such as Educacion, Densidad_Poblacion, and INPC. This means that some variables are strongly correlated with each other, which can affect the stability and reliability of the coefficient estimates. Although efforts were made to address multicollinearity, further investigation and analysis are needed to mitigate it.
Heteroscedasticity: The Breusch-Pagan tests conducted on all models did not indicate any significant signs of heteroscedasticity (all p-values are above 0.68), suggesting that the assumption of constant variance is reasonable.
Insights and Lessons Learned: The analysis revealed that variables such as education, employment, and population density had more significant impacts on foreign direct investment than initially anticipated. This highlights the importance of conducting thorough analysis and diagnostic tests to uncover the true relationships and identify the most influential factors. It also emphasizes the need for continuous investigation and further analysis to refine and obtain accurate results.