Mexico’s Export Performance

Characteristics of Mexico’s Exports

• Mexico’s exports grew strongly between 2015 and 2025, almost doubling in total value. This shows how important international trade is for the Mexican economy.

• Around 90% of exports are manufactured goods, especially automotive, electronics, and machinery. Mexico is highly specialized in industrial production.

• More than 80% of exports go to the United States. This means Mexico depends heavily on the U.S. market and regional supply chains.

• Even with challenges such as COVID-19 and the renegotiation of NAFTA into USMCA, exports recovered quickly and continued growing.

Contribution of Panel Data Analysis to Understanding Export Performance

• Panel data combines information across states or sectors and across time. This helps us see differences between regions and how exports change each year.

• It allows us to control for characteristics that do not change over time, such as geography or industrial structure. This makes the analysis more accurate.

• Panel models help separate long-term structural factors from short-term shocks, such as temporary tariff changes.
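
The point about controlling for time-invariant characteristics can be illustrated with a minimal base R sketch on simulated data. Everything here is an assumption made for illustration: the variable names, the true slope of 2, and the made-up state trait `alpha`. It compares pooled OLS with the within (fixed-effects) estimator, which subtracts each state's mean before regressing.

```r
# Simulated state-year panel (all values are made up for illustration).
set.seed(1)
n_states <- 32
n_years  <- 10
state <- rep(seq_len(n_states), each = n_years)

# alpha: unobserved, time-invariant state trait (e.g. geography).
alpha <- rnorm(n_states)[state]

# x is correlated with alpha, so pooled OLS is biased. True slope is 2.
x <- alpha + rnorm(n_states * n_years)
y <- 2 * x + 5 * alpha + rnorm(n_states * n_years, sd = 0.5)

# Pooled OLS ignores the state trait.
beta_pooled <- unname(coef(lm(y ~ x))["x"])

# Within (fixed-effects) estimator: subtract state means, then regress.
x_within <- x - ave(x, state)
y_within <- y - ave(y, state)
beta_within <- unname(coef(lm(y_within ~ x_within - 1)))

# Including one dummy per state gives the same slope.
beta_dummies <- unname(coef(lm(y ~ x + factor(state)))["x"])

print(c(pooled = beta_pooled, within = beta_within, dummies = beta_dummies))
```

With the plm package, the same within estimator is requested through `plm(..., model = "within")`; the base R version is used here only to keep the sketch self-contained.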

Use of Panel Data for Short‑Term Export Prediction

• We can use past export trends from different states or sectors to estimate what could happen in the next 1–2 years.

• It is possible to include variables like exchange rate, U.S. economic growth, or tariff changes to simulate different scenarios.

• Predictions can be made at the state or sector level, not only for the whole country. This gives more specific and useful results.
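
As a rough sketch of the scenario idea above, the following base R example fits a simple model with state effects, a linear trend, and the exchange rate, and then predicts one year ahead under two exchange-rate assumptions. The data are simulated, and the state names, coefficients, and the scenario values 19 and 21 are hypothetical; this is not the model estimated in this report.

```r
set.seed(2)
states <- paste0("state_", 1:5)
years  <- 2015:2024
panel  <- expand.grid(state = states, year = years, stringsAsFactors = TRUE)

# One exchange-rate value per year, shared by all states (made-up numbers).
fx_by_year <- setNames(18 + rnorm(length(years), sd = 1.5), years)
panel$exchange_rate <- unname(fx_by_year[as.character(panel$year)])

# Made-up data-generating process: state level + trend + exchange-rate effect.
state_level <- setNames(rnorm(length(states), mean = 8), states)
panel$log_exports <- unname(state_level[as.character(panel$state)]) +
  0.04 * (panel$year - 2015) + 0.05 * panel$exchange_rate +
  rnorm(nrow(panel), sd = 0.05)

# State dummies capture level differences; year captures the common trend.
fit <- lm(log_exports ~ state + year + exchange_rate, data = panel)

# Predict next year's exports for every state under a given exchange rate.
scenario <- function(fx) {
  newdata <- data.frame(state = factor(states, levels = levels(panel$state)),
                        year = 2025, exchange_rate = fx)
  unname(exp(predict(fit, newdata = newdata)))  # rough back-transform from logs
}

forecast <- data.frame(state = states,
                       baseline = scenario(19),
                       depreciation = scenario(21))
print(forecast)
```

Because the forecast is produced per state, the output can be compared across states rather than only at the national level.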

Role of Business Intelligence and Predictive Analytics for Identifying Vulnerability During USMCA Negotiations

• Business Intelligence tools can measure how much each state or sector depends on exports to the U.S. This helps identify which ones are more exposed to tariff changes.

• Predictive models can estimate how sensitive exports are to new trade rules. With this, we can classify states or industries by level of risk.

• Dashboards and visual tools allow decision makers to monitor export performance before and after policy changes.

• Machine learning techniques can group states or sectors with similar export structures. This helps identify patterns of vulnerability during trade negotiations.
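
The exposure and grouping ideas above can be sketched in base R with k-means clustering. The state profiles below are invented, and the two features (`us_share`, `manuf_share`) are assumptions chosen for illustration; a real analysis would use observed shares.

```r
set.seed(3)
# Hypothetical state profiles: share of exports going to the U.S. and
# share of exports that are manufactured goods (made-up values).
profile <- data.frame(
  state = paste0("state_", 1:12),
  us_share = c(rnorm(4, 0.90, 0.02), rnorm(4, 0.60, 0.02), rnorm(4, 0.30, 0.02)),
  manuf_share = c(rnorm(4, 0.85, 0.02), rnorm(4, 0.50, 0.02), rnorm(4, 0.20, 0.02))
)

# Standardize so both variables weigh equally, then group with k-means.
features <- scale(profile[, c("us_share", "manuf_share")])
km <- kmeans(features, centers = 3, nstart = 25)
profile$cluster <- km$cluster

# Rank clusters by average U.S. dependence: higher share = more exposed.
exposure <- aggregate(us_share ~ cluster, data = profile, FUN = mean)
exposure <- exposure[order(-exposure$us_share), ]
print(exposure)
```

The cluster numbers themselves are arbitrary labels; the ranking by average `us_share` is what gives them a risk interpretation.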

Exports Data Analysis

Data Cleaning and Transforming

library(readxl)
library(tidyverse)      # loads dplyr, tidyr, and ggplot2
library(plm)
library(sf)
library(rnaturalearth)
library(reshape2)

# Path to the INEGI workbook; adjust to the local copy of the file.
data_path <- "C:/Users/Osval/Downloads/inegi_exports_dataset.xlsx"
df_expo <- read_excel(data_path, sheet = "exports")
df_ts <- read_excel(data_path, sheet = "ts_exports")
df_data <- read_excel(data_path, sheet = "data")
df_fdi <- read_excel(data_path, sheet = "fdi")

• The original datasets were cleaned and transformed. Export variables were converted from wide format to long format so each row represents one state in one specific year.

df_expolong <- df_expo %>%
  pivot_longer(
    cols = starts_with("real_exports_"),
    names_to = "year",
    names_prefix = "real_exports_",
    values_to = "real_exports"
  )

df_fdilong <- df_fdi %>%
  pivot_longer(
    cols = starts_with("fdi_"),
    names_to = "year",
    names_prefix = "fdi_",
    values_to = "fdi"
  )

• We verified that there were no duplicated observations by state and year before merging the datasets.

df_expolong %>% count(state, year, region) %>% filter(n > 1)
## # A tibble: 0 × 4
## # ℹ 4 variables: state <chr>, year <chr>, region <chr>, n <int>
df_data %>% count(state, year) %>% filter(n > 1)
## # A tibble: 0 × 3
## # ℹ 3 variables: state <chr>, year <dbl>, n <int>
df_fdilong %>% count(state, year, region) %>% filter(n > 1)
## # A tibble: 0 × 4
## # ℹ 4 variables: state <chr>, year <chr>, region <chr>, n <int>
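
The duplicate check above relies on reading an empty tibble by eye. A small guard that stops the script when duplicates exist could look like the sketch below; `assert_unique_keys` is a hypothetical helper written for this example, not a function from any package.

```r
# Hypothetical helper: stop if any key combination appears more than once.
assert_unique_keys <- function(df, keys) {
  dup <- duplicated(df[, keys, drop = FALSE])
  if (any(dup)) {
    stop("Duplicated keys found in ", sum(dup), " row(s)")
  }
  invisible(TRUE)
}

# Toy data: unique state-year pairs pass silently.
toy <- data.frame(state = c("A", "A", "B"), year = c(2015, 2016, 2015))
assert_unique_keys(toy, c("state", "year"))

# Adding a repeated row triggers the error, captured here as a message.
toy_dup <- rbind(toy, toy[1, ])
result <- tryCatch(assert_unique_keys(toy_dup, c("state", "year")),
                   error = function(e) conditionMessage(e))
print(result)
```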

• Different datasets (exports, FDI, macroeconomic variables, and state characteristics) were merged using state and year as key variables.

df_expolong$year <- as.numeric(df_expolong$year)
df_fdilong$year <- as.numeric(df_fdilong$year)
df_data$year <- as.numeric(df_data$year)
panel_full <- df_expolong %>%
  left_join(df_fdilong, by = c("state", "year", "region")) %>%
  left_join(df_data, by = c("state", "year"))

• Missing values were replaced using the average value by state in order to maintain a balanced panel structure.

panel_full <- panel_full %>%
  group_by(state) %>%
  mutate(
    across(
      where(is.numeric), 
      ~ ifelse(is.na(.), mean(., na.rm = TRUE), .)
    )
  ) %>%
  ungroup()
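
State-mean imputation keeps the panel balanced, but it also flattens the time variation of the imputed cells. One way to keep track of this, sketched below on toy data with made-up values, is to flag imputed cells before overwriting them so that results can later be compared with and without those rows. The column name `fdi_imputed` is an assumption for illustration.

```r
toy <- data.frame(
  state = c("A", "A", "A", "B", "B", "B"),
  year  = rep(2015:2017, times = 2),
  fdi   = c(10, NA, 14, 3, 5, NA)
)

# Keep a record of which cells were imputed before overwriting them.
toy$fdi_imputed <- is.na(toy$fdi)

# State mean of the observed values, repeated for every row of that state.
state_mean <- ave(toy$fdi, toy$state, FUN = function(v) mean(v, na.rm = TRUE))
toy$fdi <- ifelse(is.na(toy$fdi), state_mean, toy$fdi)
print(toy)
```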

• A panel dataset was created using state and year as indexes, allowing analysis across time and across regions.

panel_data <- pdata.frame(panel_full,
                          index = c("state", "year"))
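
Whether the resulting panel is actually balanced can be checked directly. The base R sketch below, on toy data, treats a panel as balanced when every state-year cell appears exactly once; `is_balanced` is a hypothetical helper for this example. In plm, `pdim()` reports similar information for a `pdata.frame`.

```r
toy <- data.frame(state = rep(c("A", "B", "C"), each = 3),
                  year  = rep(2015:2017, times = 3))

# A panel is balanced when every state-year cell appears exactly once.
is_balanced <- function(df) all(table(df$state, df$year) == 1)

balanced_full    <- is_balanced(toy)        # full state-year grid
balanced_missing <- is_balanced(toy[-1, ])  # one state-year removed
print(c(full = balanced_full, missing_row = balanced_missing))
```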

Exploratory Data Analysis – EDA

Descriptive Statistics

regional_avg <- panel_full %>%
  group_by(region) %>%
  summarise(
    avg_exports = mean(real_exports, na.rm = TRUE),
    avg_fdi = mean(fdi, na.rm = TRUE),
    avg_exchange_rate = mean(exchange_rate, na.rm = TRUE),
    avg_college = mean(college_education, na.rm = TRUE),
    .groups = "drop"
  )
ggplot(regional_avg, aes(x = region, y = avg_exports)) +
  geom_col() +
  labs(title = "Average Exports by Region",
       x = "Region",
       y = "Average Real Exports") +
  theme_minimal()

ggplot(regional_avg, aes(x = region, y = avg_fdi)) +
  geom_col() +
  labs(title = "Average FDI by Region",
       x = "Region",
       y = "Average FDI") +
  theme_minimal()

ggplot(regional_avg, aes(x = region, y = avg_college)) +
  geom_col() +
  labs(title = "Average College Education by Region",
       x = "Region",
       y = "Average College Education") +
  theme_minimal()

ggplot(panel_full, aes(x = real_exports)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of Real Exports",
       x = "Real Exports",
       y = "Frequency") +
  theme_minimal()

ggplot(panel_full, aes(x = fdi)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of FDI",
       x = "FDI",
       y = "Frequency") +
  theme_minimal()

ggplot(panel_full, aes(x = college_education)) +
  geom_histogram(bins = 30) +
  labs(title = "Distribution of College Education",
       x = "College Education",
       y = "Frequency") +
  theme_minimal()

Statistics of Dispersion

ggplot(panel_full, aes(x = region, y = real_exports)) +
  geom_boxplot() +
  labs(title = "Exports Distribution Across Regions",
       x = "Region",
       y = "Real Exports") +
  theme_minimal()

ggplot(panel_full, aes(x = region, y = fdi)) +
  geom_boxplot() +
  labs(title = "FDI Distribution Across Regions",
       x = "Region",
       y = "FDI") +
  theme_minimal()

ggplot(panel_full, aes(x = region, y = lq_primary)) +
  geom_boxplot() +
  labs(title = "Employment Concentration in the Primary Sector Across Regions",
       x = "Region",
       y = "Employment Concentration (Primary Sector)") +
  theme_minimal()

ggplot(panel_full, aes(x = region, y = lq_secondary)) +
  geom_boxplot() +
  labs(title = "Employment Concentration in the Secondary Sector Across Regions",
       x = "Region",
       y = "Employment Concentration (Secondary Sector)") +
  theme_minimal()

Visualization across Time

exports_time <- panel_full %>%
  group_by(year) %>%
  summarise(total_exports = sum(real_exports, na.rm = TRUE))

ggplot(exports_time, aes(x = year, y = total_exports)) +
  geom_line(linewidth = 1) +
  labs(title = "Total Exports in Mexico",
       x = "Year",
       y = "Total Real Exports") +
  theme_minimal()

• Total exports show a clear upward trend over the period. There is a temporary decline around the COVID-19 period, but exports recover quickly and continue growing.

exports_region <- panel_full %>%
  group_by(region, year) %>%
  summarise(total_exports = sum(real_exports, na.rm = TRUE),
            .groups = "drop")

ggplot(exports_region, aes(x = year, y = total_exports, color = region)) +
  geom_line(linewidth = 1) +
  labs(title = "Total Exports by Region",
       x = "Year",
       y = "Total Real Exports",
       color = "Region") +
  theme_minimal()

• Northern regions show higher export levels compared to southern regions.

• The gap between regions remains persistent over time, which suggests structural differences in industrial development.

macro_time <- panel_full %>%
  group_by(year) %>%
  summarise(
    total_exports = sum(real_exports, na.rm = TRUE),
    avg_exchange_rate = mean(exchange_rate, na.rm = TRUE),
    .groups = "drop"
  )
macro_time <- macro_time %>%
  mutate(
    exports_index = total_exports / max(total_exports),
    exchange_index = avg_exchange_rate / max(avg_exchange_rate)
  )
ggplot(macro_time, aes(x = year)) +
  geom_line(aes(y = exports_index, color = "Exports"), linewidth = 1.2) +
  geom_line(aes(y = exchange_index, color = "Exchange Rate"), linewidth = 1.2) +
  labs(title = "Exports vs Exchange Rate (Indexed)",
       x = "Year",
       y = "Indexed Values",
       color = "") +
  theme_minimal()

• When comparing indexed exports and exchange rate, both variables show similar movements in some years.

• This suggests that exchange rate fluctuations may influence export performance, although the relationship is not perfectly linear.
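
Two series that both drift upward can look related even when their year-to-year movements are not. A hedged way to probe the comparison above is to index each series to a base year and then correlate year-over-year log changes. The numbers in this sketch are made up for illustration and are not the values from this dataset.

```r
# Made-up yearly series, for illustration only.
macro <- data.frame(
  year = 2015:2020,
  total_exports = c(100, 104, 110, 118, 121, 112),
  exchange_rate = c(15.9, 18.7, 18.9, 19.2, 19.3, 21.5)
)

# Index each series to its first year (first year = 100).
macro$exports_index  <- 100 * macro$total_exports / macro$total_exports[1]
macro$exchange_index <- 100 * macro$exchange_rate / macro$exchange_rate[1]

# Year-over-year log changes remove the common upward drift.
d_exports  <- diff(log(macro$total_exports))
d_exchange <- diff(log(macro$exchange_rate))
growth_cor <- cor(d_exports, d_exchange)
print(growth_cor)
```

Indexing to a base year, rather than to each series' maximum, makes the vertical axis read as percent of the starting level.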

Regional Differences

# Classify states by region; "Noroeste" and "Noreste" form the North zone.
panel_full <- panel_full %>%
  mutate(zone = ifelse(region %in% c("Noroeste", "Noreste"),
                       "North",
                       "South"))
north_south_exports <- panel_full %>%
  group_by(zone) %>%
  summarise(avg_exports = mean(real_exports, na.rm = TRUE),
            .groups = "drop")
ggplot(north_south_exports, aes(x = zone, y = avg_exports)) +
  geom_col() +
  labs(title = "Average Exports: North vs South",
       x = "Zone",
       y = "Average Real Exports") +
  theme_minimal()

• States classified as “North” have considerably higher average exports than those in the “South.”

• This is consistent with the importance of geographic proximity to the U.S. border and industrial concentration.

• Southern states, by contrast, participate less in international trade.

• Geographic proximity to the U.S. border is an important factor. Northern states benefit from lower transportation costs and better access to cross-border infrastructure.

• Industrial structure also differs. Northern states have a higher concentration of manufacturing and export-oriented industries, while southern states rely more on primary activities or services.

• Foreign Direct Investment is more concentrated in the North. Many multinational firms are located in northern states due to their connection with U.S. markets.

• Infrastructure and connectivity are generally stronger in northern states, including highways, industrial parks, and customs facilities, which facilitate international trade.
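
A bar chart of averages does not by itself show whether the North and South difference is statistically distinguishable. One possible check, sketched below on simulated data, is a Welch two-sample t-test on state-level averages, using one observation per state because yearly observations within a state are not independent. The zone means and sample sizes are made up for illustration.

```r
set.seed(4)
# Made-up state-level average exports (log scale), 10 states per zone.
state_avg <- data.frame(
  zone = rep(c("North", "South"), each = 10),
  log_exports = c(rnorm(10, mean = 10, sd = 0.5), rnorm(10, mean = 8, sd = 0.5))
)

# Welch two-sample t-test: does mean log exports differ between zones?
test <- t.test(log_exports ~ zone, data = state_avg)
print(test$estimate)
print(test$p.value)
```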

mexico_map <- ne_states(country = "mexico", returnclass = "sf")

setdiff(mexico_map$name, panel_data$state)
## [1] "Nuevo León"       "Yucatán"          "Michoacán"        NA                
## [5] "México"           "Querétaro"        "San Luis Potosí"  "Distrito Federal"
panel_data$state <- recode(panel_data$state,
  "Ciudad de Mexico" = "Distrito Federal",
  "Michoacan" = "Michoacán",
  "Queretaro" = "Querétaro",
  "Mexico" = "México",
  "San Luis Potosi" = "San Luis Potosí",
  "Yucatan" = "Yucatán",
  "Nuevo Leon" = "Nuevo León"
)

• Border states such as Chihuahua and other northern states show higher export levels.

avg_state_exports <- panel_data %>%
  group_by(state) %>%
  summarise(avg_exports = mean(real_exports, na.rm = TRUE),
            .groups = "drop")
map_data <- mexico_map %>%
  left_join(avg_state_exports, by = c("name" = "state"))
ggplot(map_data) +
  geom_sf(aes(fill = avg_exports)) +
  scale_fill_viridis_c(option = "mako", na.value = "grey90") +
  labs(title = "Average Exports by Mexican State",
       fill = "Avg Exports") +
  theme_minimal()

Correlation and Relationships

corr_data <- panel_data %>%
  select(real_exports, exchange_rate, pop_density, fdi, border_distance,
         lq_primary, lq_secondary, lq_tertiary,
         average_daily_salary, crime_rate) %>%
  drop_na()

corr_matrix <- cor(corr_data)

corr_long <- as.data.frame(as.table(corr_matrix))

ggplot(corr_long, aes(Var1, Var2, fill = Freq)) +
  geom_tile() +
  scale_fill_gradient2(
    low = "blue",
    mid = "white",
    high = "red",
    midpoint = 0,
    limits = c(-1, 1)
  ) +
  geom_text(aes(label = round(Freq, 2)), size = 4) +
  labs(title = "Correlation Matrix",
       x = "",
       y = "",
       fill = "Correlation") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

• Exports show positive correlation with FDI and employment concentration in the secondary sector.

• There is a negative relationship between border distance and exports, meaning that states closer to the U.S. tend to export more.

• Some socioeconomic variables such as salary also show positive association with exports, suggesting that more industrialized states have higher income levels.

ggplot(panel_full, aes(x = border_distance, y = real_exports)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(title = "Exports vs Border Distance",
       x = "Border Distance",
       y = "Exports") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

• The regression line shows a negative slope. States farther from the border tend to have lower export levels.

ggplot(panel_full, aes(x = lq_secondary, y = real_exports)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(title = "Exports vs Employment Concentration in the Secondary Sector",
       x = "Employment concentration in the secondary sector",
       y = "Real Exports") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

• There is a positive relationship. States with stronger industrial specialization export more.

ggplot(panel_full, aes(x = average_daily_salary, y = real_exports)) +
  geom_point(alpha = 0.6) +
  geom_smooth(method = "lm", se = TRUE, color = "black") +
  labs(title = "Exports vs Average Daily Salary",
       x = "Average Daily Salary",
       y = "Real Exports") +
  theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'

• The relationship appears positive, suggesting that higher productivity and industrialization are linked to higher wages and exports.

Hypothesis Statements

• H1: States with higher employment concentration in the secondary (industrial) and tertiary (services) sectors have significantly higher export levels.

• H2: Border distance has a negative and significant effect on exports. States closer to the U.S. border export more than states located farther away.

• H3: Higher levels of Foreign Direct Investment (FDI) positively affect export performance at the state level.

• H4: States with higher average wages, as a proxy for productivity and industrial development, show higher export performance.
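
One practical point for testing H2: border distance does not change over time, so a model with state fixed effects cannot estimate its coefficient, because the state effects absorb it. The base R sketch below shows this on simulated data, where all values and coefficients are made up. Testing H2 would therefore need a specification without state dummies, such as pooled OLS or a random-effects model (in plm, `model = "random"`).

```r
set.seed(5)
n_states <- 8
n_years  <- 6
state <- factor(rep(seq_len(n_states), each = n_years))

# border_distance is constant within each state (made-up values).
dist_by_state <- runif(n_states, 0, 2000)
border_distance <- dist_by_state[as.integer(state)]
fdi <- rnorm(n_states * n_years)
log_exports <- 10 - 0.001 * border_distance + 0.3 * fdi +
  rnorm(n_states * n_years, sd = 0.1)

# With one dummy per state, border_distance is perfectly collinear with
# the dummies, so lm() cannot estimate it and reports NA.
fe_fit <- lm(log_exports ~ state + fdi + border_distance)
print(coef(fe_fit)["border_distance"])

# Pooled OLS (no state dummies) can estimate it, at the cost of not
# controlling for other time-invariant state traits.
pooled_fit <- lm(log_exports ~ fdi + border_distance)
print(coef(pooled_fit)["border_distance"])
```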

Main Findings

• Total exports trend upward over the period, with year-to-year fluctuations and a temporary decline around the COVID-19 period.

• Northern states consistently show higher export levels than southern states.

• The scatter plot of exports versus border distance shows a negative relationship. States closer to the U.S. border tend to export more.

• There is a positive relationship between exports and employment concentration in the secondary sector. States with stronger manufacturing labor tend to achieve higher export levels.

• Exports are positively correlated with FDI and average daily salary. This suggests that more industrialized states perform better in international trade.

