Tesla, one of the leading companies in the production of electric vehicles, has expressed interest in expanding its presence in Mexico by building an automotive plant in the state of Nuevo León. In this sense, it is crucial to identify the municipality that offers the most favorable conditions for the installation of this plant, considering factors such as the availability of human resources, existing infrastructure, government incentives and environmental conditions, among others.
In this research project, we will use data analysis and visualization tools in R to explore different variables and assess the suitability of the municipalities of Nuevo León as possible locations for the Tesla automotive plant. Our goal is to present clear and concise information that enables decision makers in the company and in local government to make an informed decision on the best location for the plant.
Throughout the R markdown file, we will present our findings and analysis results for each of the criteria considered in the selection of the right municipality for the Tesla car plant in Nuevo León. We will also explain the methods and tools used to reach our conclusions, in order to provide a full and transparent understanding of the investigative process.
Let’s start exploring the data and discover the best location for the Tesla car plant in Nuevo León!
# install.packages("dplyr")
# install.packages("stringr")
# install.packages("tidyverse")
# install.packages("readxl")
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats 1.0.0 ✔ readr 2.1.4
## ✔ ggplot2 3.4.3 ✔ stringr 1.5.0
## ✔ lubridate 1.9.2 ✔ tibble 3.2.1
## ✔ purrr 1.0.1 ✔ tidyr 1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(readxl)
RESAGEBURB_19XLSX20 <- read_excel("/Users/daviddrums180/Downloads/RESAGEBURB_19XLSX20.xlsx")
df_full <- RESAGEBURB_19XLSX20
df_no_genre <- df_full %>%
select(!matches("_F$|_M$"))
df_filtered <- df_no_genre %>%
filter(!str_detect(NOM_LOC, "^Total"))
df_NA_replaced <- df_filtered %>%
mutate_all(~ ifelse(. == "*", NA, .))
df_filtered <- df_NA_replaced %>%
mutate(across(POBTOT:last_col(), as.numeric))
## Warning: There were 145 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(POBTOT:last_col(), as.numeric)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 144 remaining warnings.
averages_by_municipality <- df_filtered %>%
group_by(NOM_MUN) %>%
summarize(across(POBTOT:last_col(), mean, na.rm = TRUE))
## Warning: There was 1 warning in `summarize()`.
## ℹ In argument: `across(POBTOT:last_col(), mean, na.rm = TRUE)`.
## ℹ In group 1: `NOM_MUN = "Abasolo"`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
##
## # Previously
## across(a:b, mean, na.rm = TRUE)
##
## # Now
## across(a:b, \(x) mean(x, na.rm = TRUE))
averages_filtered_df <- df_filtered %>%
left_join(averages_by_municipality, by = "NOM_MUN", suffix = c("", "_average"))
# Add _avg suffix to track usage of avgs
averages_filtered_df <- averages_filtered_df %>%
mutate(across(POBTOT:last_col(), ~ifelse(is.na(.), ifelse(is.na(cur_data()[[paste0(cur_column(), "_prom")]]), 0, cur_data() [[paste0(cur_column(), "_avg")]]), .)))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(...)`.
## Caused by warning:
## ! `cur_data()` was deprecated in dplyr 1.1.0.
## ℹ Please use `pick()` instead.
df_clean <- averages_filtered_df %>%
select(!matches("_prom$"))
df_clean
## # A tibble: 76,995 × 309
## CVE ENTIDAD NOM_ENT MUN NOM_MUN LOC NOM_LOC AGEB MZA POBTOT POBFEM
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 001 80 43
## 2 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 002 25 13
## 3 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 003 42 20
## 4 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 004 25 14
## 5 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 005 37 13
## 6 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 006 68 32
## 7 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 007 82 35
## 8 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 008 73 44
## 9 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 009 72 41
## 10 190010… 19 Nuevo … 001 Abasolo 0001 Abasolo 0021 010 18 10
## # ℹ 76,985 more rows
## # ℹ 298 more variables: POBMAS <dbl>, P_0A2 <dbl>, P_3YMAS <dbl>,
## # P_5YMAS <dbl>, P_12YMAS <dbl>, P_15YMAS <dbl>, P_18YMAS <dbl>, P_3A5 <dbl>,
## # P_6A11 <dbl>, P_8A14 <dbl>, P_12A14 <dbl>, P_15A17 <dbl>, P_18A24 <dbl>,
## # P_60YMAS <dbl>, POB0_14 <dbl>, POB15_64 <dbl>, POB65_MAS <dbl>,
## # PROM_HNV <dbl>, PNACENT <dbl>, PNACOE <dbl>, PRES2015 <dbl>,
## # PRESOE15 <dbl>, P3YM_HLI <dbl>, P3HLINHE <dbl>, P3HLI_HE <dbl>, …