• Mexico’s exports grew strongly between 2015 and 2025, almost doubling in total value. This shows how important international trade is for the Mexican economy.
• Around 90% of exports come from manufactured goods, especially automotive, electronics, and machinery. Mexico is highly specialized in industrial production.
• More than 80% of exports go to the United States. This means Mexico depends heavily on the U.S. market and regional supply chains.
• Even with challenges such as COVID-19 and the renegotiation of NAFTA into USMCA, exports recovered quickly and continued growing.
• Panel data combines information across states or sectors and across time. This helps us see differences between regions and how exports change each year.
• Panel data also lets us control for characteristics that do not change over time, such as geography or industrial structure. This reduces the bias that comes from leaving those characteristics out of the model.
• Panel models help separate long-term structural factors from short-term shocks, such as temporary tariff changes.
• We can use past export trends from different states or sectors to estimate what could happen in the next 1–2 years.
• It is possible to include variables like exchange rate, U.S. economic growth, or tariff changes to simulate different scenarios.
• Predictions can be made at the state or sector level, not only for the whole country. This gives more specific and useful results.
• Business Intelligence tools can measure how much each state or sector depends on exports to the U.S. This helps identify which ones are more exposed to tariff changes.
• Predictive models can estimate how sensitive exports are to new trade rules. With this, we can classify states or industries by level of risk.
• Dashboards and visual tools allow decision makers to monitor export performance before and after policy changes.
• Machine learning techniques can group states or sectors with similar export structures. This helps identify patterns of vulnerability during trade negotiations.
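The point above about controlling for time-invariant characteristics can be illustrated with a small simulation. This is only a sketch on made-up data: the names `toy`, `state_effect`, `fdi`, and `exports` are hypothetical and do not come from the INEGI dataset. It assumes a fixed state characteristic that raises both the regressor and the outcome, and compares a pooled regression with one that includes state dummies.

```r
# Toy panel: 10 hypothetical states observed over 10 years (simulated, not real data)
set.seed(42)
states <- paste0("S", 1:10)
toy <- expand.grid(state = states, year = 2015:2024, stringsAsFactors = FALSE)

# Time-invariant state characteristic (think geography); hypothetical values
state_effect <- setNames(seq(10, 100, by = 10), states)
a_i <- unname(state_effect[toy$state])

# The regressor is correlated with the state characteristic
toy$fdi <- 0.5 * a_i + rnorm(nrow(toy))
# Assumed true slope on fdi is 2; the state characteristic also raises exports
toy$exports <- 2 * toy$fdi + a_i + rnorm(nrow(toy), sd = 0.5)

# Pooled OLS ignores the state characteristic
pooled_slope <- unname(coef(lm(exports ~ fdi, data = toy))["fdi"])

# State dummies (fixed effects) absorb everything constant within a state
fe_slope <- unname(coef(lm(exports ~ fdi + factor(state), data = toy))["fdi"])

print(c(pooled = pooled_slope, fixed_effects = fe_slope))
```

In this simulation the fixed-effects slope should land close to the assumed value of 2, while the pooled slope is pulled upward by the omitted state characteristic.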
library(readxl)
library(tidyverse)
library(plm)
library(sf)
library(rnaturalearth)
library(dplyr)
library(ggplot2)
library(reshape2)
df_expo <- read_excel("C:/Users/Osval/Downloads/inegi_exports_dataset.xlsx", sheet = "exports")
df_ts <- read_excel("C:/Users/Osval/Downloads/inegi_exports_dataset.xlsx", sheet = "ts_exports")
df_data <- read_excel("C:/Users/Osval/Downloads/inegi_exports_dataset.xlsx", sheet = "data")
df_fdi <- read_excel("C:/Users/Osval/Downloads/inegi_exports_dataset.xlsx", sheet = "fdi")
• The original datasets were cleaned and transformed. Export variables were converted from wide format to long format so each row represents one state in one specific year.
df_expolong <- df_expo %>%
pivot_longer(
cols = starts_with("real_exports_"),
names_to = "year",
names_prefix = "real_exports_",
values_to = "real_exports"
)
df_fdilong <- df_fdi %>%
pivot_longer(
cols = starts_with("fdi_"),
names_to = "year",
names_prefix = "fdi_",
values_to = "fdi"
)
• We verified that there were no duplicated observations by state and year before merging the datasets.
## # A tibble: 0 × 4
## # ℹ 4 variables: state <chr>, year <chr>, region <chr>, n <int>
## # A tibble: 0 × 3
## # ℹ 3 variables: state <chr>, year <dbl>, n <int>
## # A tibble: 0 × 4
## # ℹ 4 variables: state <chr>, year <chr>, region <chr>, n <int>
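The code behind the empty tibbles above is not shown. One way such a duplicate check could be written, assuming dplyr and the key columns that appear in the output (`state`, `year`, `region`), is sketched below. The data frame `toy_long`, the helper `find_duplicates()`, and the region label "Sur" are hypothetical and used only for illustration.

```r
library(dplyr)

# Hypothetical long-format data; the second Sonora-2016 row is a deliberate duplicate
toy_long <- tibble(
  state  = c("Sonora", "Sonora", "Sonora", "Oaxaca", "Oaxaca"),
  year   = c(2015, 2016, 2016, 2015, 2016),
  region = c("Noroeste", "Noroeste", "Noroeste", "Sur", "Sur"),
  real_exports = c(10, 11, 11, 2, 3)
)

# Count rows per key and keep only keys that appear more than once
find_duplicates <- function(df) {
  df %>%
    count(state, year, region) %>%
    filter(n > 1)
}

dups  <- find_duplicates(toy_long)            # flags the repeated key
clean <- find_duplicates(distinct(toy_long))  # empty once exact duplicates are dropped
print(dups)
```

An empty result, like the three tibbles above, means every state and year combination appears only once, so the joins that follow cannot multiply rows.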
• Different datasets (exports, FDI, macroeconomic variables, and state characteristics) were merged using state and year as key variables.
df_expolong$year <- as.numeric(df_expolong$year)
df_fdilong$year <- as.numeric(df_fdilong$year)
df_data$year <- as.numeric(df_data$year)
panel_full <- df_expolong %>%
left_join(df_fdilong, by = c("state", "year", "region")) %>%
left_join(df_data, by = c("state", "year"))
• Missing values were replaced using the average value by state in order to maintain a balanced panel structure.
panel_full <- panel_full %>%
group_by(state) %>%
mutate(
across(
where(is.numeric),
~ ifelse(is.na(.), mean(., na.rm = TRUE), .)
)
) %>%
ungroup()
• A panel dataset was created using state and year as indexes, allowing analysis across time and across regions.
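The step that declares the panel indexes is not shown in the code. A plausible sketch, assuming the plm package that is loaded at the top, uses `pdata.frame()` with `state` as the individual index and `year` as the time index. The toy data frame below is hypothetical.

```r
library(plm)

# Hypothetical balanced toy panel: 3 states x 4 years
toy <- expand.grid(state = c("A", "B", "C"), year = 2015:2018,
                   stringsAsFactors = FALSE)
toy$real_exports <- seq_len(nrow(toy))

# Declare state as the individual index and year as the time index
toy_panel <- pdata.frame(toy, index = c("state", "year"))

# pdim() reports the panel dimensions and whether the panel is balanced
dims <- pdim(toy_panel)
print(dims)
balanced <- is.pbalanced(toy_panel)
```

Checking balance right after the imputation step is a cheap way to confirm that every state has one row per year.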
regional_avg <- panel_full %>%
group_by(region) %>%
summarise(
avg_exports = mean(real_exports, na.rm = TRUE),
avg_fdi = mean(fdi, na.rm = TRUE),
avg_exchange_rate = mean(exchange_rate, na.rm = TRUE),
avg_college = mean(college_education, na.rm = TRUE),
.groups = "drop"
)
ggplot(regional_avg, aes(x = region, y = avg_exports)) +
geom_col() +
labs(title = "Average Exports by Region",
x = "Region",
y = "Average Real Exports") +
theme_minimal()
ggplot(regional_avg, aes(x = region, y = avg_fdi)) +
geom_col() +
labs(title = "Average FDI by Region",
x = "Region",
y = "Average FDI") +
theme_minimal()
ggplot(regional_avg, aes(x = region, y = avg_college)) +
geom_col() +
labs(title = "Average College Education by Region",
x = "Region",
y = "Average College Education") +
theme_minimal()
ggplot(panel_full, aes(x = real_exports)) +
geom_histogram(bins = 30) +
labs(title = "Distribution of Real Exports",
x = "Real Exports",
y = "Frequency") +
theme_minimal()
ggplot(panel_full, aes(x = fdi)) +
geom_histogram(bins = 30) +
labs(title = "Distribution of FDI",
x = "FDI",
y = "Frequency") +
theme_minimal()
ggplot(panel_full, aes(x = college_education)) +
geom_histogram(bins = 30) +
labs(title = "Distribution of College Education",
x = "College Education",
y = "Frequency") +
theme_minimal()
ggplot(panel_full, aes(x = region, y = real_exports)) +
geom_boxplot() +
labs(title = "Exports Distribution Across Regions",
x = "Region",
y = "Real Exports") +
theme_minimal()
ggplot(panel_full, aes(x = region, y = fdi)) +
geom_boxplot() +
labs(title = "FDI Distribution Across Regions",
x = "Region",
y = "FDI") +
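theme_minimal()
# The next two plots use lq_primary and lq_secondary, so the labels name those variables
ggplot(panel_full, aes(x = region, y = lq_primary)) +
geom_boxplot() +
labs(title = "Employment Concentration in the Primary Sector Across Regions",
x = "Region",
y = "Employment Concentration (Primary Sector)") +
theme_minimal()
ggplot(panel_full, aes(x = region, y = lq_secondary)) +
geom_boxplot() +
labs(title = "Employment Concentration in the Secondary Sector Across Regions",
x = "Region",
y = "Employment Concentration (Secondary Sector)") +
theme_minimal()
exports_time <- panel_full %>%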
group_by(year) %>%
summarise(total_exports = sum(real_exports, na.rm = TRUE))
ggplot(exports_time, aes(x = year, y = total_exports)) +
geom_line(linewidth = 1) +
labs(title = "Total Exports in Mexico",
x = "Year",
y = "Total Real Exports") +
theme_minimal()
• Total exports show a clear upward trend over the period. There is a
temporary decline around the COVID-19 period, but exports recover
quickly and continue growing.
exports_region <- panel_full %>%
group_by(region, year) %>%
summarise(total_exports = sum(real_exports, na.rm = TRUE),
.groups = "drop")
ggplot(exports_region, aes(x = year, y = total_exports, color = region)) +
geom_line(linewidth = 1) +
labs(title = "Total Exports by Region",
x = "Year",
y = "Total Real Exports",
color = "Region") +
theme_minimal()
• Northern regions show higher export levels compared to southern
regions.
• The gap between regions remains persistent over time, which suggests structural differences in industrial development.
macro_time <- panel_full %>%
group_by(year) %>%
summarise(
total_exports = sum(real_exports, na.rm = TRUE),
avg_exchange_rate = mean(exchange_rate, na.rm = TRUE),
.groups = "drop"
)
macro_time <- macro_time %>%
mutate(
exports_index = total_exports / max(total_exports),
exchange_index = avg_exchange_rate / max(avg_exchange_rate)
)
ggplot(macro_time, aes(x = year)) +
geom_line(aes(y = exports_index, color = "Exports"), linewidth = 1.2) +
geom_line(aes(y = exchange_index, color = "Exchange Rate"), linewidth = 1.2) +
labs(title = "Exports vs Exchange Rate (Indexed)",
x = "Year",
y = "Indexed Values",
color = "") +
theme_minimal()
• When comparing indexed exports and exchange rate, both variables show
similar movements in some years.
• This suggests that exchange rate fluctuations may influence export performance, although the relationship is not perfectly linear.
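A visual comparison of two indexed series can be misleading when both trend upward. One hedged way to quantify the co-movement is to correlate year-over-year growth rates instead of levels. The numbers below are purely hypothetical (not INEGI data), and `macro_toy` and `growth` are illustrative names.

```r
# Hypothetical yearly series (illustrative numbers only)
macro_toy <- data.frame(
  year = 2015:2022,
  total_exports = c(100, 104, 110, 118, 121, 112, 130, 141),
  avg_exchange_rate = c(10, 12, 12, 13, 13, 15, 14, 14)
)

# Year-over-year log growth removes the common trend in levels
growth <- data.frame(
  year = macro_toy$year[-1],
  exports_growth = diff(log(macro_toy$total_exports)),
  exchange_growth = diff(log(macro_toy$avg_exchange_rate))
)

# Correlation of growth rates: values near +1 or -1 indicate stronger co-movement
growth_cor <- cor(growth$exports_growth, growth$exchange_growth)
print(round(growth_cor, 2))
```

With only a handful of yearly observations, a correlation like this is descriptive at best and should not be read as a causal effect of the exchange rate.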
### Regional Differences
# Classify states as North or South by region.
# An earlier assignment based on border_distance < 400 was overwritten
# by this one, so only the region-based rule takes effect.
panel_full <- panel_full %>%
mutate(zone = ifelse(region %in% c("Noroeste", "Noreste"),
"North",
"South"))
north_south_exports <- panel_full %>%
group_by(zone) %>%
summarise(avg_exports = mean(real_exports, na.rm = TRUE),
.groups = "drop")
ggplot(north_south_exports, aes(x = zone, y = avg_exports)) +
geom_col() +
labs(title = "Average Exports: North vs South",
x = "Zone",
y = "Average Real Exports") +
theme_minimal()
• States classified as “North” have substantially higher average exports than those in the “South.”
• This is consistent with the importance of geographic proximity to the U.S. border and industrial concentration.
• Northern states show much higher export volumes. Southern states participate less in international trade.
• Geographic proximity to the U.S. border is an important factor. Northern states benefit from lower transportation costs and better access to cross-border infrastructure.
• Industrial structure is different. Northern states have a higher concentration in manufacturing and export-oriented industries, while southern states rely more on primary activities or services.
• Foreign Direct Investment is more concentrated in the North. Many multinational firms are located in northern states due to their connection with U.S. markets.
• Infrastructure and connectivity are generally stronger in northern states, including highways, industrial parks, and customs facilities, which facilitate international trade.
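The bar chart shows a gap in averages but not whether the gap is statistically distinguishable from noise. A minimal sketch of one possible check is a Welch two-sample t-test on state-level averages. State averages are used because the yearly observations for a single state are not independent. All values and names below (`state_avg`, `gap_test`) are hypothetical.

```r
# Hypothetical state-level average exports (made-up numbers)
state_avg <- data.frame(
  state = paste0("State", 1:10),
  zone = rep(c("North", "South"), each = 5),
  avg_exports = c(50, 60, 55, 70, 65, 10, 12, 8, 15, 9)
)

# Welch two-sample t-test: does not assume equal variances across zones
gap_test <- t.test(avg_exports ~ zone, data = state_avg)
print(gap_test)
```

With real data, export values are usually right-skewed, so running the same test on log exports may be a more reasonable choice.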
mexico_map <- ne_states(country = "mexico", returnclass = "sf")
setdiff(mexico_map$name, panel_data$state)
## [1] "Nuevo León" "Yucatán" "Michoacán" NA
## [5] "México" "Querétaro" "San Luis Potosí" "Distrito Federal"
panel_data$state <- recode(panel_data$state,
"Ciudad de Mexico" = "Distrito Federal",
"Michoacan" = "Michoacán",
"Queretaro" = "Querétaro",
"Mexico" = "México",
"San Luis Potosi" = "San Luis Potosí",
"Yucatan" = "Yucatán",
"Nuevo Leon" = "Nuevo León"
)
• Border states such as Chihuahua and other northern states show higher export levels.
avg_state_exports <- panel_data %>%
group_by(state) %>%
summarise(avg_exports = mean(real_exports, na.rm = TRUE),
.groups = "drop")
map_data <- mexico_map %>%
left_join(avg_state_exports, by = c("name" = "state"))
ggplot(map_data) +
geom_sf(aes(fill = avg_exports)) +
scale_fill_viridis_c(option = "mako", na.value = "grey90") +
labs(title = "Average Exports by Mexican State",
fill = "Avg Exports") +
theme_minimal()
corr_data <- panel_data %>%
select(real_exports, exchange_rate, pop_density, fdi, border_distance,
lq_primary, lq_secondary, lq_tertiary, average_daily_salary, crime_rate) %>%
drop_na()
corr_matrix <- cor(corr_data)
corr_long <- as.data.frame(as.table(corr_matrix))
ggplot(corr_long, aes(Var1, Var2, fill = Freq)) +
geom_tile() +
scale_fill_gradient2(
low = "blue",
mid = "white",
high = "red",
midpoint = 0,
limits = c(-1, 1)
) +
geom_text(aes(label = round(Freq, 2)), size = 4) +
labs(title = "Correlation Matrix",
x = "",
y = "",
fill = "Correlation") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))
• Exports show a positive correlation with FDI and with employment concentration in the secondary sector.
• There is a negative relationship between border distance and exports, meaning that states closer to the U.S. tend to export more.
• Some socioeconomic variables such as salary also show positive association with exports, suggesting that more industrialized states have higher income levels.
ggplot(panel_full, aes(x = border_distance, y = real_exports)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "black") +
labs(title = "Exports vs Border Distance",
x = "Border Distance",
y = "Exports") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
• The regression line shows a negative slope. States farther from the border tend to have lower export levels.
ggplot(panel_full, aes(x = lq_secondary, y = real_exports)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "black") +
labs(title = "Exports vs Employment Concentration in the Secondary Sector",
x = "Employment Concentration in the Secondary Sector",
y = "Real Exports") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
• There is a positive relationship. States with stronger industrial
specialization export more.
ggplot(panel_full, aes(x = average_daily_salary, y = real_exports)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = TRUE, color = "black") +
labs(title = "Exports vs Average Daily Salary",
x = "Average Daily Salary",
y = "Real Exports") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
• The relationship appears positive, suggesting that higher productivity and industrialization are linked to higher wages and exports.
# Hypothesis Statements
• H1: States with higher employment concentration in the secondary (industrial) and tertiary (services) sectors have significantly higher export levels.
• H2: Border distance has a negative and significant effect on exports. States closer to the U.S. border export more than states located farther away.
• H3: Higher levels of Foreign Direct Investment (FDI) positively affect export performance at the state level.
• H4: States with higher average wages, as a proxy for productivity and industrial development, show higher export performance.
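One point matters when testing H2: border distance is fixed for each state, so an estimator with state fixed effects cannot separate it from the state effect. The sketch below uses simulated data and a random-effects model from plm as one possible way to estimate such a coefficient. This is an illustrative choice, not necessarily the specification used in this analysis. The coefficients in the simulation (-0.002 and 0.5) are assumptions, and `toy`, `dist_by_state`, and `u_by_state` are hypothetical names.

```r
library(plm)

# Simulated toy panel: 8 hypothetical states x 10 years (not real data)
set.seed(1)
state_ids <- paste0("S", 1:8)
toy <- expand.grid(state = state_ids, year = 2015:2024,
                   stringsAsFactors = FALSE)
dist_by_state <- setNames(seq(100, 1500, by = 200), state_ids)
u_by_state <- setNames(rnorm(8, sd = 0.3), state_ids)

toy$border_distance <- unname(dist_by_state[toy$state])  # constant within a state
toy$fdi <- rnorm(nrow(toy))
# Assumed data-generating process: distance lowers exports, FDI raises them
toy$log_exports <- 10 - 0.002 * toy$border_distance + 0.5 * toy$fdi +
  unname(u_by_state[toy$state]) + rnorm(nrow(toy), sd = 0.2)

# Random effects keeps the time-invariant regressor in the model
re_model <- plm(log_exports ~ border_distance + fdi,
                data = toy, index = c("state", "year"), model = "random")
re_coef <- coef(re_model)
print(round(re_coef, 4))
```

The random-effects estimate is only reliable if the unobserved state effect is uncorrelated with the regressors, which is a strong assumption for variables like FDI. A Hausman-type comparison on the time-varying regressors is a common way to probe it.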
# Main Findings
• Total exports trend upward over the period, with year-to-year fluctuations and a temporary decline around the COVID-19 period.
• Northern states consistently show higher export levels than southern states.
• The scatter plot of exports versus border distance shows a negative relationship. States closer to the U.S. border tend to export more.
• There is a positive relationship between exports and employment concentration in the secondary sector. States with a larger share of manufacturing employment tend to achieve higher export levels.
• Exports are positively correlated with FDI and average daily salary. This suggests that more industrialized states perform better in international trade.