Introduction

Tesla, one of the leading companies in the production of electric vehicles, has expressed interest in expanding its presence in Mexico by building an automotive plant in the state of Nuevo León. In this sense, it is crucial to identify the municipality that offers the most favorable conditions for the installation of this plant, considering factors such as the availability of human resources, existing infrastructure, government incentives and environmental conditions, among others.

In this research project, we will use data analysis and visualization tools in R to explore different variables and assess the suitability of the municipalities of Nuevo León as possible locations for the Tesla automotive plant. Our goal is to present clear and concise information that enables decision makers in the company and in local government to make an informed decision on the best location for the plant.

Throughout the R markdown file, we will present our findings and analysis results for each of the criteria considered in the selection of the right municipality for the Tesla car plant in Nuevo León. We will also explain the methods and tools used to reach our conclusions, in order to provide a full and transparent understanding of the investigative process.

Let’s start exploring the data and discover the best location for the Tesla car plant in Nuevo León!

Installing Libraries

# install.packages("dplyr")
# install.packages("stringr")
# install.packages("tidyverse")
# install.packages("readxl")

Calling Libraries

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ forcats   1.0.0     ✔ readr     2.1.4
## ✔ ggplot2   3.4.3     ✔ stringr   1.5.0
## ✔ lubridate 1.9.2     ✔ tibble    3.2.1
## ✔ purrr     1.0.1     ✔ tidyr     1.3.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(stringr)
library(readxl)

Inegi database

RESAGEBURB_19XLSX20 <- read_excel("/Users/daviddrums180/Downloads/RESAGEBURB_19XLSX20.xlsx")

Dataframe

Make full dataframe copy

df_full <- RESAGEBURB_19XLSX20

Create a new dataframe with only the columns without gender

df_no_genre <- df_full %>%
   select(!matches("_F$|_M$"))

Filter rows that do not start with “Total” in the NOM_LOC column

df_filtered <- df_no_genre %>%
   filter(!str_detect(NOM_LOC, "^Total"))

Replace ’*’ with NA values

df_NA_replaced <- df_filtered %>%
   mutate_all(~ ifelse(. == "*", NA, .))

Change the data type of the POBTOT columns to PCLIM_PMEN from characters to numbers

df_filtered <- df_NA_replaced %>%
   mutate(across(POBTOT:last_col(), as.numeric))
## Warning: There were 145 warnings in `mutate()`.
## The first warning was:
## ℹ In argument: `across(POBTOT:last_col(), as.numeric)`.
## Caused by warning:
## ! NAs introduced by coercion
## ℹ Run `dplyr::last_dplyr_warnings()` to see the 144 remaining warnings.

Take the average of each column by municipality

averages_by_municipality <- df_filtered %>%
   group_by(NOM_MUN) %>%
   summarize(across(POBTOT:last_col(), mean, na.rm = TRUE))
## Warning: There was 1 warning in `summarize()`.
## ℹ In argument: `across(POBTOT:last_col(), mean, na.rm = TRUE)`.
## ℹ In group 1: `NOM_MUN = "Abasolo"`.
## Caused by warning:
## ! The `...` argument of `across()` is deprecated as of dplyr 1.1.0.
## Supply arguments directly to `.fns` through an anonymous function instead.
## 
##   # Previously
##   across(a:b, mean, na.rm = TRUE)
## 
##   # Now
##   across(a:b, \(x) mean(x, na.rm = TRUE))

Join the averages obtained from each column by municipality to the filtered df

averages_filtered_df <- df_filtered %>%
   left_join(averages_by_municipality, by = "NOM_MUN", suffix = c("", "_average"))
   # Add _avg suffix to track usage of avgs

Replaces the NA values on the POBTOT and PCLIM_PMEN columns by the average of each column by municipality

averages_filtered_df <- averages_filtered_df %>%
   mutate(across(POBTOT:last_col(), ~ifelse(is.na(.), ifelse(is.na(cur_data()[[paste0(cur_column(), "_prom")]]), 0, cur_data() [[paste0(cur_column(), "_avg")]]), .)))
## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `across(...)`.
## Caused by warning:
## ! `cur_data()` was deprecated in dplyr 1.1.0.
## ℹ Please use `pick()` instead.

Remove columns ending in _prom

df_clean <- averages_filtered_df %>%
   select(!matches("_prom$"))
  
df_clean
## # A tibble: 76,995 × 309
##    CVE     ENTIDAD NOM_ENT MUN   NOM_MUN LOC   NOM_LOC AGEB  MZA   POBTOT POBFEM
##    <chr>   <chr>   <chr>   <chr> <chr>   <chr> <chr>   <chr> <chr>  <dbl>  <dbl>
##  1 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  001       80     43
##  2 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  002       25     13
##  3 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  003       42     20
##  4 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  004       25     14
##  5 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  005       37     13
##  6 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  006       68     32
##  7 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  007       82     35
##  8 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  008       73     44
##  9 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  009       72     41
## 10 190010… 19      Nuevo … 001   Abasolo 0001  Abasolo 0021  010       18     10
## # ℹ 76,985 more rows
## # ℹ 298 more variables: POBMAS <dbl>, P_0A2 <dbl>, P_3YMAS <dbl>,
## #   P_5YMAS <dbl>, P_12YMAS <dbl>, P_15YMAS <dbl>, P_18YMAS <dbl>, P_3A5 <dbl>,
## #   P_6A11 <dbl>, P_8A14 <dbl>, P_12A14 <dbl>, P_15A17 <dbl>, P_18A24 <dbl>,
## #   P_60YMAS <dbl>, POB0_14 <dbl>, POB15_64 <dbl>, POB65_MAS <dbl>,
## #   PROM_HNV <dbl>, PNACENT <dbl>, PNACOE <dbl>, PRES2015 <dbl>,
## #   PRESOE15 <dbl>, P3YM_HLI <dbl>, P3HLINHE <dbl>, P3HLI_HE <dbl>, …