1. Key Questions

Before getting into the model, it is worth answering the four conceptual questions that frame this activity. These questions set the theoretical background for why we are moving from a global OLS to a Geographically Weighted Regression in the first place.

a) Three main differences between Global Regression Analysis and Local Regression Analysis

The first difference is in what each model assumes about the data. A global regression like OLS assumes spatial stationarity — meaning that the relationship between the dependent variable and the predictors is the same everywhere in the study area. A local regression like GWR drops that assumption and lets the relationship vary from one location to another, which is much more realistic for any phenomenon that has a geographic dimension.

The second difference is in the number of coefficients you get out of the model. A global regression gives you one coefficient per predictor — a single number that supposedly applies to every observation. A local regression gives you a different coefficient per predictor for each location in the dataset. In our case that means we get 32 sets of coefficients (one per state) instead of just one set for all of Mexico.

The third difference is in what kind of insight you can extract. A global model is useful for describing the average behavior of a system, which is fine for a national-level diagnostic. A local model is useful for identifying where that average behavior breaks down and where targeted interventions make sense, which is much closer to what prescriptive analytics actually needs.
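
To make the "one coefficient versus one coefficient per location" contrast concrete, here is a minimal base-R sketch on simulated data (not the tourism dataset). The one-dimensional "map", the Gaussian distance-decay weights, and the bandwidth `h` are all assumptions made for illustration; the GWR fitted later in this report uses an adaptive bisquare kernel instead.

```r
# Toy illustration on simulated data (hypothetical, not the tourism data).
set.seed(1)
n <- 60
x <- runif(n)                    # location along a 1-D "map"
z <- rnorm(n)                    # predictor
b <- 1 + 2 * x                   # true slope drifts from 1 to 3 across space
y <- b * z + rnorm(n, sd = 0.2)

# Global model: a single slope for every observation.
global_slope <- unname(coef(lm(y ~ z))["z"])

# Local model: one weighted regression per location, where nearby
# observations get more weight (h is an arbitrary illustrative bandwidth).
h <- 0.2
local_slope <- sapply(seq_len(n), function(i) {
  w <- exp(-0.5 * ((x - x[i]) / h)^2)
  unname(coef(lm(y ~ z, weights = w))["z"])
})

length(global_slope)   # one coefficient for the whole study area
length(local_slope)    # n coefficients, one per location
cor(local_slope, b)    # how closely the local slopes follow the true drift
```

The global slope lands somewhere between the extremes, while the local slopes recover the drift that the single number averages away.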

b) How is the optimal bandwidth determined in GWR?

The bandwidth in GWR controls how many neighbors (or how big a radius) the model uses to estimate the local regression at each point. Picking the right bandwidth is a balancing act: too small and the local estimates become unstable and overfit; too large and you basically collapse back to a global OLS.

The standard approach is to let the data choose the bandwidth by minimizing a selection criterion. The two most common options are the corrected Akaike Information Criterion (AICc), which penalizes model complexity, and leave-one-out cross-validation (CV), which evaluates out-of-sample prediction error. For this activity we used AICc because its small-sample correction suits a dataset like ours (n = 32), and the optimal adaptive bandwidth turned out to be 30 nearest neighbors. Since that is 30 of 32 states, each local regression draws on almost the whole sample, so the local estimates should be expected to stay fairly close to the global ones.
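
As a rough sketch of what the criterion trades off, the function below computes AICc from the residual sum of squares, the sample size, and the trace of the hat matrix S. The formula is the form that reproduces the global AICc printed in the `gwr.basic()` output later in this report; it is offered as an illustration of the criterion, not as GWmodel's internal code, and the helper names `aicc` and `penalty` are made up here.

```r
# AICc from the residual sum of squares (rss), sample size (n), and the
# trace of the hat matrix (tr_s). For OLS, tr_s is the number of coefficients.
aicc <- function(rss, n, tr_s) {
  n * log(rss / n) + n * log(2 * pi) + n * (n + tr_s) / (n - 2 - tr_s)
}

# Check against the global regression reported by gwr.basic():
# RSS = 7.198938, n = 32, five coefficients including the intercept.
aicc(7.198938, 32, 5)   # about 58.43, matching the printed global AICc

# The complexity penalty alone, for increasingly flexible fits with n = 32.
penalty <- function(n, tr_s) n * (n + tr_s) / (n - 2 - tr_s)
penalty(32, c(5, 10, 20))
```

The penalty grows quickly as the effective number of parameters approaches n, which is why a small sample pushes the search toward large bandwidths.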

c) How do local parameter estimates vary across space, and what does this mean for business intelligence?

The local coefficients can vary in two ways: in magnitude (how strong the effect is) and in sign (whether the effect is positive or negative). When you see a coefficient change a lot across states, that is a sign that the underlying mechanism is regional rather than national. When you see the same coefficient stay flat everywhere, the effect is structural and applies across the board.

For business intelligence, this is genuinely useful because most decisions are not made at the national level. A retailer deciding where to open a store, a tourism agency deciding which states to promote, or a policymaker deciding where to allocate a fixed budget all need to know where their levers actually work. A global model treats every market as if it behaved the same way, which leads to wasted budget. A local model lets you rank locations by how responsive they are to the variable you can control, which is exactly the kind of input prescriptive analytics is built on.

d) How can GWR results guide prescriptive analytics for location-specific strategies?

GWR turns into prescriptive analytics the moment you stop reading the coefficients as descriptions and start reading them as decision inputs. If a state has a strong positive coefficient on international tourist arrivals, that state is a good candidate for international marketing campaigns and airport investment. If a state has a strong negative coefficient on crime, that state needs public safety investment before any tourism investment will pay off. If a state has a low local R², the right call is to admit that the model is missing something and commission further research before recommending anything at all.

The advantage over OLS is concrete: instead of a single national recommendation that applies poorly to most states, GWR produces a differentiated map of where to invest, what to invest in, and where to be cautious. That is what makes it a useful bridge between predictive and prescriptive analytics.

2. Hypotheses

Each hypothesis was formulated to test the expected effect of one explanatory variable on tourism activity. This approach is consistent with the structure of the Multiple Linear Regression Model, where each independent variable is associated with a distinct regression coefficient and significance test.

Hypothesis 1. Domestic tourist arrivals have a positive and statistically significant effect on tourism activity across Mexican states.

Hypothesis 2. International tourist arrivals have a positive and statistically significant effect on tourism activity across Mexican states.

Hypothesis 3. Population density has a positive and statistically significant effect on tourism activity across Mexican states.

Hypothesis 4. Crime rate has a negative and statistically significant effect on tourism activity across Mexican states.

library(lmtest)
library(car)
library(readxl)
library(sf)
library(spdep)
library(tidyverse)
library(ggplot2)
library(cowplot)
library(GWmodel)
library(tigris)

panel_data_raw <- read_excel(
  "C:/Users/Salvador/Downloads/inegi_mx_state_tourism.xlsx",
  sheet = "panel_data"
) %>%
  rename(
    region     = `region...26`,
    region_num = `region...27`
  )

panel_data <- panel_data_raw %>%
  select(state, state_id, year,
         tourism_activity,
         llegada_turistas_nacionales,
         llegada_turistas_extranjeros,
         pop_density, crime_rate,
         region, region_num)

cs_data <- panel_data %>% filter(year == 2022)

mx_state_map <- read_sf(
  "C:/Users/Salvador/Downloads/mx_maps/mx_maps/mx_states/mexlatlong.shp"
)

mx_map <- inner_join(mx_state_map, cs_data, by = c("OBJECTID" = "state_id"))
cat("States matched:", nrow(mx_map), "\n")

## States matched: 32

3. OLS Model and Diagnostics

model_ols <- lm(
  log(tourism_activity) ~ log(llegada_turistas_nacionales) +
                          log(llegada_turistas_extranjeros) +
                          crime_rate +
                          log(pop_density),
  data = mx_map
)
summary(model_ols)

## 
## Call:
## lm(formula = log(tourism_activity) ~ log(llegada_turistas_nacionales) + 
##     log(llegada_turistas_extranjeros) + crime_rate + log(pop_density), 
##     data = mx_map)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.87541 -0.28287  0.06946  0.23058  0.94738 
## 
## Coefficients:
##                                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                       -2.235600   1.813906  -1.232  0.22840    
## log(llegada_turistas_nacionales)   0.569535   0.154355   3.690  0.00100 ***
## log(llegada_turistas_extranjeros)  0.267520   0.086388   3.097  0.00453 ** 
## crime_rate                        -0.002496   0.003410  -0.732  0.47037    
## log(pop_density)                   0.255538   0.072091   3.545  0.00146 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5164 on 27 degrees of freedom
## Multiple R-squared:  0.7413, Adjusted R-squared:  0.7029 
## F-statistic: 19.34 on 4 and 27 DF,  p-value: 1.303e-07

vif(model_ols)

##  log(llegada_turistas_nacionales) log(llegada_turistas_extranjeros) 
##                          1.595047                          1.573562 
##                        crime_rate                  log(pop_density) 
##                          1.026229                          1.050111

bptest(model_ols)

## 
##  studentized Breusch-Pagan test
## 
## data:  model_ols
## BP = 6.9188, df = 4, p-value = 0.1402

All VIF values are below 1.6, well under the usual threshold of 5, so there is no multicollinearity concern. The Breusch-Pagan p-value (0.1402) is above 0.10, so there is no significant evidence of heteroscedasticity in the residuals. On these two diagnostics the OLS is an acceptable baseline for moving on to GWR.
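
For readers who want to see what `vif()` measures, the sketch below computes the same quantity by hand on simulated predictors: each predictor is regressed on the others and the VIF is 1 / (1 − R²). The data and variable names are hypothetical and only serve to show the mechanics.

```r
# Simulated predictors (hypothetical): x2 is moderately correlated with x1,
# x3 is independent of both.
set.seed(42)
n  <- 200
x1 <- rnorm(n)
x2 <- 0.6 * x1 + rnorm(n)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# VIF for each predictor: regress it on the remaining predictors and
# convert the resulting R-squared into 1 / (1 - R^2).
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 2)
```

A VIF of 1 means a predictor is unrelated to the others; values near 1.6, like the largest one in the output above, correspond to an auxiliary R² of roughly 0.37.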

4. GWR Model

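# Join the 2022 cross-section to the state geometries. This is the same
# inner join as mx_map above, built here with tigris::geo_join().
mx_state_geodata <- geo_join(
  mx_state_map, cs_data,
  "OBJECTID", "state_id",
  how = "inner"
)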

bw_gwr <- bw.gwr(
  log(tourism_activity) ~
    log(llegada_turistas_nacionales) +
    log(llegada_turistas_extranjeros) +
    crime_rate +
    log(pop_density),
  data     = mx_state_geodata,
  approach = "AICc",
  kernel   = "bisquare",
  adaptive = TRUE
)

## Adaptive bandwidth (number of nearest neighbours): 27 AICc value: 74.03881 
## Adaptive bandwidth (number of nearest neighbours): 25 AICc value: 80.21586 
## Adaptive bandwidth (number of nearest neighbours): 30 AICc value: 67.15849 
## Adaptive bandwidth (number of nearest neighbours): 30 AICc value: 67.15849

gwr_model <- gwr.basic(
  log(tourism_activity) ~
    log(llegada_turistas_nacionales) +
    log(llegada_turistas_extranjeros) +
    crime_rate +
    log(pop_density),
  data     = mx_state_geodata,
  bw       = bw_gwr,
  kernel   = "bisquare",
  adaptive = TRUE
)
gwr_model

##    ***********************************************************************
##    *                       Package   GWmodel                             *
##    ***********************************************************************
##    Program starts at: 2026-05-22 17:55:14.277515 
##    Call:
##    gwr.basic(formula = log(tourism_activity) ~ log(llegada_turistas_nacionales) + 
##     log(llegada_turistas_extranjeros) + crime_rate + log(pop_density), 
##     data = mx_state_geodata, bw = bw_gwr, kernel = "bisquare", 
##     adaptive = TRUE)
## 
##    Dependent (y) variable:  tourism_activity
##    Independent variables:  llegada_turistas_nacionales llegada_turistas_extranjeros crime_rate pop_density
##    Number of data points: 32
##    ***********************************************************************
##    *                    Results of Global Regression                     *
##    ***********************************************************************
## 
##    Call:
##     lm(formula = formula, data = data)
## 
##    Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.87541 -0.28287  0.06946  0.23058  0.94738 
## 
##    Coefficients:
##                                       Estimate Std. Error t value Pr(>|t|)    
##    (Intercept)                       -2.235600   1.813906  -1.232  0.22840    
##    log(llegada_turistas_nacionales)   0.569535   0.154355   3.690  0.00100 ***
##    log(llegada_turistas_extranjeros)  0.267520   0.086388   3.097  0.00453 ** 
##    crime_rate                        -0.002496   0.003410  -0.732  0.47037    
##    log(pop_density)                   0.255538   0.072091   3.545  0.00146 ** 
## 
##    ---Significance stars
##    Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 
##    Residual standard error: 0.5164 on 27 degrees of freedom
##    Multiple R-squared: 0.7413
##    Adjusted R-squared: 0.7029 
##    F-statistic: 19.34 on 4 and 27 DF,  p-value: 1.303e-07 
##    ***Extra Diagnostic information
##    Residual sum of squares: 7.198938
##    Sigma(hat): 0.4898618
##    AIC:  55.07439
##    AICc:  58.43439
##    BIC:  52.66322
##    ***********************************************************************
##    *          Results of Geographically Weighted Regression              *
##    ***********************************************************************
## 
##    *********************Model calibration information*********************
##    Kernel function: bisquare 
##    Adaptive bandwidth: 30 (number of nearest neighbours)
##    Regression points: the same locations as observations are used.
##    Distance metric: Euclidean distance metric is used.
## 
##    ****************Summary of GWR coefficient estimates:******************
##                                            Min.    1st Qu.     Median
##    Intercept                         -3.0527350 -2.5916909 -2.4260124
##    log(llegada_turistas_nacionales)   0.5557856  0.5977596  0.6459082
##    log(llegada_turistas_extranjeros)  0.1231831  0.1624572  0.1872873
##    crime_rate                        -0.0050414 -0.0043404 -0.0038966
##    log(pop_density)                   0.2383841  0.2629151  0.2727888
##                                         3rd Qu.    Max.
##    Intercept                         -2.3017851 -2.1827
##    log(llegada_turistas_nacionales)   0.6751325  0.7086
##    log(llegada_turistas_extranjeros)  0.2348930  0.2881
##    crime_rate                        -0.0029274 -0.0007
##    log(pop_density)                   0.2837895  0.3011
##    ************************Diagnostic information*************************
##    Number of data points: 32 
##    Effective number of parameters (2trace(S) - trace(S'S)): 9.575519 
##    Effective degrees of freedom (n-2trace(S) + trace(S'S)): 22.42448 
##    AICc (GWR book, Fotheringham, et al. 2002, p. 61, eq 2.33): 67.15849 
##    AIC (GWR book, Fotheringham, et al. 2002,GWR p. 96, eq. 4.22): 48.56057 
##    BIC (GWR book, Fotheringham, et al. 2002,GWR p. 61, eq. 2.34): 36.61487 
##    Residual sum of squares: 6.627413 
##    R-square value:  0.7618108 
##    Adjusted R-square value:  0.6553539 
## 
##    ***********************************************************************
##    Program stops at: 2026-05-22 17:55:14.326313

Adaptive vs. Fixed Kernel

An Adaptive Kernel is more appropriate for Mexico because states differ greatly in geographic size and spatial density. A Fixed Kernel applies the same distance threshold to all observations, so small, tightly packed central states (Tlaxcala, CDMX, Morelos) would end up with many neighbors inside the radius, while large northern states (Chihuahua, Coahuila, Sonora), whose centroids sit far apart, might end up with very few, producing unstable local estimates there. The Adaptive Kernel fixes this by using a constant number of nearest neighbors, so every local regression is estimated from the same number of observations. The optimal bandwidth selected by AICc was 30 nearest neighbors.
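
The difference between the two kernel types can be sketched in base R. The coordinates below are simulated (a tight cluster of units plus a sparse, widely spaced group), and the fixed radius `b_fixed` and neighbor count `k` are arbitrary illustrative values, not the ones used in this report.

```r
# Bisquare kernel: the weight falls smoothly to zero at distance b.
bisquare <- function(d, b) ifelse(d < b, (1 - (d / b)^2)^2, 0)

# Hypothetical centroids: 12 closely spaced units and 6 widely spaced ones.
set.seed(7)
coords <- rbind(
  cbind(rnorm(12, 0, 0.3), rnorm(12, 0, 0.3)),   # dense cluster
  cbind(runif(6, 5, 15),   runif(6, 5, 15))      # sparse group
)
D <- as.matrix(dist(coords))

# Fixed kernel: the same distance threshold at every regression point.
b_fixed <- 2
n_fixed <- rowSums(bisquare(D, b_fixed) > 0) - 1   # minus the point itself

# Adaptive kernel: the bandwidth at each point is the distance to its
# k-th nearest neighbour (position 1 of the sorted row is the point itself).
k <- 5
b_adapt <- apply(D, 1, function(d) sort(d)[k + 1])
W_adapt <- t(sapply(seq_len(nrow(D)), function(i) bisquare(D[i, ], b_adapt[i])))
n_adapt <- rowSums(W_adapt > 0) - 1

range(n_fixed)   # varies a lot between the dense and the sparse units
range(n_adapt)   # constant: the k-th neighbour sits on the boundary and
                 # gets weight zero, leaving k - 1 positively weighted ones
```

With the fixed radius, the clustered units borrow from many neighbors and the isolated ones from few or none; with the adaptive bandwidth, every regression point uses the same number of observations.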

gwr_sf_results <- st_as_sf(gwr_model$SDF)

5. Geographic Visualization & Statistical Validation

5a. Predicted vs. Observed Values

The side-by-side choropleth maps below compare the GWR predicted values of log(tourism_activity) against the actual observed values for each Mexican state in 2022. Both maps use a consistent color scale so that visual differences directly reflect model under- or over-prediction.

gwr_sf_results$observed  <- log(mx_state_geodata$tourism_activity)
gwr_sf_results$predicted <- gwr_sf_results$yhat
gwr_sf_results$residuals <- gwr_sf_results$observed - gwr_sf_results$predicted

shared_lim    <- range(c(gwr_sf_results$observed, gwr_sf_results$predicted), na.rm = TRUE)
shared_breaks <- round(seq(shared_lim[1], shared_lim[2], length.out = 5), 1)

map_obs <- ggplot(gwr_sf_results) +
  geom_sf(aes(fill = observed), color = "white", linewidth = 0.3) +
  scale_fill_distiller(
    palette = "YlOrRd", direction = 1,
    limits = shared_lim, breaks = shared_breaks, labels = shared_breaks,
    name = "log(Tourism Activity)",
    guide = guide_colorbar(barwidth = 8, barheight = 0.6,
                           title.position = "top", title.hjust = 0.5)
  ) +
  labs(title = "Observed Values") +
  theme_void(base_size = 11) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "bottom")

map_pred <- ggplot(gwr_sf_results) +
  geom_sf(aes(fill = predicted), color = "white", linewidth = 0.3) +
  scale_fill_distiller(
    palette = "YlOrRd", direction = 1,
    limits = shared_lim, breaks = shared_breaks, labels = shared_breaks,
    name = "log(Tourism Activity)",
    guide = guide_colorbar(barwidth = 8, barheight = 0.6,
                           title.position = "top", title.hjust = 0.5)
  ) +
  labs(title = "GWR Predicted Values") +
  theme_void(base_size = 11) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        legend.position = "bottom")

plot_grid(map_obs, map_pred, ncol = 2)

5b. Local Residuals: Under- and Over-Prediction by State

Positive residuals indicate under-prediction (the model estimates less tourism than actually occurred), while negative residuals indicate over-prediction (the model overestimates activity).

max_abs <- max(abs(gwr_sf_results$residuals), na.rm = TRUE)

ggplot(gwr_sf_results) +
  geom_sf(aes(fill = residuals), color = "white", linewidth = 0.3) +
  scale_fill_gradient2(
    low = "#d73027", mid = "white", high = "#4575b4", midpoint = 0,
    limits = c(-max_abs, max_abs),
    name = "Residual\n(Obs − Pred)",
    guide = guide_colorbar(barwidth = 1, barheight = 10,
                           title.position = "top", title.hjust = 0.5)
  ) +
  labs(
    title    = "GWR Local Residuals by State",
    subtitle = "Blue = Under-Prediction (Positive)  |  Red = Over-Prediction (Negative)"
  ) +
  theme_void(base_size = 11) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 9, color = "gray40"),
        legend.position = "right")

resid_table <- data.frame(
  State     = mx_state_geodata$state,
  Observed  = round(gwr_sf_results$observed,  3),
  Predicted = round(gwr_sf_results$predicted, 3),
  Residual  = round(gwr_sf_results$residuals, 3)
) %>%
  arrange(desc(Residual))

knitr::kable(resid_table, caption = "Local Residuals by State (2022, sorted by residual)")

Local Residuals by State (2022, sorted by residual)
State	Observed	Predicted	Residual
Guanajuato	11.862	10.921	0.941
Mexico	12.039	11.183	0.856
Nuevo Leon	11.556	10.830	0.726
Michoacan	11.005	10.324	0.681
Tabasco	10.298	9.991	0.307
Sonora	10.350	10.052	0.298
Quintana Roo	12.002	11.744	0.258
Zacatecas	9.080	8.826	0.255
Oaxaca	10.804	10.571	0.233
Yucatan	10.942	10.738	0.204
Baja California	11.156	10.985	0.170
Coahuila	10.228	10.061	0.166
San Luis Potosi	10.161	10.032	0.129
Durango	9.062	8.951	0.110
Jalisco	11.807	11.708	0.100
Chihuahua	10.782	10.712	0.070
Hidalgo	10.388	10.348	0.040
Baja California Sur	10.273	10.249	0.024
Queretaro	10.722	10.766	-0.044
Tamaulipas	10.464	10.509	-0.045
Ciudad de Mexico	12.810	12.869	-0.059
Veracruz	11.143	11.209	-0.066
Tlaxcala	9.135	9.246	-0.111
Campeche	9.455	9.725	-0.270
Aguascalientes	9.619	10.021	-0.402
Chiapas	10.517	10.958	-0.441
Puebla	10.884	11.351	-0.466
Colima	9.092	9.733	-0.642
Nayarit	9.706	10.381	-0.675
Guerrero	10.405	11.174	-0.769
Morelos	9.685	10.494	-0.809
Sinaloa	10.084	10.933	-0.849

5c. Spatial Significance of the Most Relevant Predictor

In GWR, a variable can be highly significant in some states while irrelevant in others. Among the predictors that are significant in the global model, international tourist arrivals (log(llegada_turistas_extranjeros)) shows the widest spread of local coefficients (0.12 to 0.29), so we map its local significance below. The map identifies states where this predictor reaches 95% confidence (|t| ≥ 1.96) and 99% confidence (|t| ≥ 2.58), where t is the local coefficient divided by its local standard error.

gwr_sf_results$t_extranjeros <- gwr_sf_results$`log(llegada_turistas_extranjeros)` /
                                 gwr_sf_results$`log(llegada_turistas_extranjeros)_SE`

gwr_sf_results$sig_extranjeros <- case_when(
  abs(gwr_sf_results$t_extranjeros) >= 2.58 ~ "99% Confidence (|t| ≥ 2.58)",
  abs(gwr_sf_results$t_extranjeros) >= 1.96 ~ "95% Confidence (|t| ≥ 1.96)",
  TRUE                                       ~ "Not Significant"
)

sig_colors <- c(
  "99% Confidence (|t| ≥ 2.58)" = "#1a5276",
  "95% Confidence (|t| ≥ 1.96)" = "#5dade2",
  "Not Significant"             = "#e8e8e8"
)

n_99 <- sum(gwr_sf_results$sig_extranjeros == "99% Confidence (|t| ≥ 2.58)", na.rm = TRUE)
n_95 <- sum(gwr_sf_results$sig_extranjeros == "95% Confidence (|t| ≥ 1.96)", na.rm = TRUE)
n_ns <- sum(gwr_sf_results$sig_extranjeros == "Not Significant",             na.rm = TRUE)

ggplot(gwr_sf_results) +
  geom_sf(aes(fill = sig_extranjeros), color = "white", linewidth = 0.3) +
  scale_fill_manual(values = sig_colors, name = "Significance Level") +
  labs(
    title    = "Spatial Significance — International Tourist Arrivals",
    subtitle = paste0("99% conf.: ", n_99, " states  |  ",
                      "95% conf.: ", n_95, " states  |  ",
                      "Not sig.: ",  n_ns, " states")
  ) +
  theme_void(base_size = 11) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 9, color = "gray40"),
        legend.position = "bottom")
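
The 1.96 and 2.58 cutoffs used for the map are the large-sample normal values. As a hedged cross-check, the sketch below computes the corresponding t-distribution cutoffs using the effective degrees of freedom printed in the GWR diagnostics above (22.42448). Using that figure as the reference distribution for local t-values is an assumption of this sketch, and it does not address the fact that 32 local tests are being run at once.

```r
# Effective degrees of freedom reported in the GWR diagnostic output.
edf <- 22.42448

crit_95 <- qt(0.975, df = edf)   # two-sided 5% level
crit_99 <- qt(0.995, df = edf)   # two-sided 1% level

round(c(normal_95 = qnorm(0.975), t_95 = crit_95,
        normal_99 = qnorm(0.995), t_99 = crit_99), 3)
```

The t-based cutoffs come out somewhat higher (about 2.07 and 2.81), so states that only just clear 1.96 or 2.58 on the map should be read as borderline.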

5d. Local R²: Spatial Distribution of Explanatory Power

The local R² indicates how well the GWR model explains tourism activity variation in each state. Higher values mean the model fits local data well; lower values suggest that important local drivers are missing from the specification.

r2_min <- round(min(gwr_sf_results$Local_R2, na.rm = TRUE), 2)
r2_max <- round(max(gwr_sf_results$Local_R2, na.rm = TRUE), 2)

ggplot(gwr_sf_results) +
  geom_sf(aes(fill = Local_R2), color = "white", linewidth = 0.3) +
  scale_fill_distiller(
    palette = "Greens", direction = 1,
    name = "Local R²",
    guide = guide_colorbar(barwidth = 1, barheight = 10,
                           title.position = "top", title.hjust = 0.5)
  ) +
  labs(
    title    = "Local R² — GWR Explanatory Power by State",
    subtitle = paste0("Range: ", r2_min, " – ", r2_max,
                      "  |  Darker green = better model fit")
  ) +
  theme_void(base_size = 11) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold"),
        plot.subtitle = element_text(hjust = 0.5, size = 9, color = "gray40"),
        legend.position = "right")

r2_table <- data.frame(
  State    = mx_state_geodata$state,
  Local_R2 = round(as.numeric(gwr_sf_results$Local_R2), 3)
) %>%
  arrange(desc(Local_R2))

knitr::kable(r2_table, caption = "Local R² by State — GWR Model (2022)")

Local R² by State — GWR Model (2022)
State	Local_R2
Quintana Roo	0.854
Yucatan	0.849
Campeche	0.824
Chiapas	0.800
Tabasco	0.799
Veracruz	0.773
Oaxaca	0.773
Tamaulipas	0.770
Nuevo Leon	0.768
Puebla	0.768
Coahuila	0.767
Tlaxcala	0.767
Hidalgo	0.765
Ciudad de Mexico	0.764
Morelos	0.764
San Luis Potosi	0.763
Queretaro	0.763
Mexico	0.763
Guerrero	0.762
Guanajuato	0.761
Zacatecas	0.760
Chihuahua	0.759
Durango	0.759
Aguascalientes	0.759
Michoacan	0.758
Jalisco	0.756
Sinaloa	0.755
Colima	0.755
Nayarit	0.755
Baja California Sur	0.724
Sonora	0.707
Baja California	0.530

6. Discussion of Results

Now that we have both models running, the interesting part is comparing what each one is telling us. The OLS gives us the “average story” for all of Mexico, and the GWR lets us see where that story holds and where it falls apart. Going through the hypotheses one by one with both models in mind makes the differences pretty clear.

6.1 What the OLS tells us

The OLS model came out with an R² of 0.7413 (Adjusted R² of 0.7029) and an F-statistic of 19.34 (p = 1.30 × 10⁻⁷), so it works as a baseline. About 74% of the variation in log(tourism activity) across the 32 states in 2022 is explained by our four variables (about 70% after adjusting for the number of predictors), which is a solid number for a cross-section with only 32 observations.

The diagnostics are also clean. The Breusch-Pagan test gave a p-value of 0.1402, so we cannot reject homoscedasticity at the usual cutoffs. None of the VIFs went above 5 either, so multicollinearity is not a concern. In other words, the OLS is statistically well-behaved and we can take its coefficients seriously.

On the four hypotheses, the OLS results are:

Domestic tourist arrivals came out at 0.5695 with a p-value of about 0.001, which makes it the predictor with the largest t-statistic in the model (t = 3.69). A 1% increase in domestic arrivals is associated with roughly a 0.57% increase in tourism activity. Hypothesis 1 is supported.

International arrivals gave 0.2675 with p < 0.01. Smaller than domestic arrivals but still clearly significant, so Hypothesis 2 also checks out.

Population density came out at 0.2555 with p < 0.01. The sign matches what we expected and the effect is significant, which makes sense — denser states have more infrastructure, more services, and more people moving around, all of which feed tourism activity. Hypothesis 3 is confirmed.

Crime rate is where it gets weird. The coefficient is −0.0025 with a p-value of 0.470, which means it is statistically indistinguishable from zero. Taken at face value, Hypothesis 4 fails — crime would seem to have no effect on tourism. But this is exactly the kind of conclusion that the GWR will force us to reconsider, because a national average can easily hide effects that only show up in certain regions.
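
The coefficient readings above can be worked through numerically. The sketch below uses the OLS estimates from the summary table; the helper functions `pct_effect` and `pct_effect_level` are hypothetical names introduced here. The three logged predictors are elasticities, while crime_rate enters in levels, so its coefficient is read as a percentage change per one-unit change.

```r
# OLS estimates copied from summary(model_ols).
b_dom   <- 0.569535    # log(llegada_turistas_nacionales)
b_int   <- 0.267520    # log(llegada_turistas_extranjeros)
b_dens  <- 0.255538    # log(pop_density)
b_crime <- -0.002496   # crime_rate (in levels, not logged)

# Log-log term: % change in y for a p% change in x is (1 + p/100)^b - 1.
pct_effect <- function(b, p) 100 * ((1 + p / 100)^b - 1)

# Log-level term: % change in y for a change of `units` in x is exp(b * units) - 1.
pct_effect_level <- function(b, units = 1) 100 * (exp(b * units) - 1)

round(c(dom_1pct   = pct_effect(b_dom, 1),
        dom_10pct  = pct_effect(b_dom, 10),
        int_10pct  = pct_effect(b_int, 10),
        dens_10pct = pct_effect(b_dens, 10),
        crime_1u   = pct_effect_level(b_crime)), 3)
```

A 1% rise in domestic arrivals maps to about 0.57% more tourism activity and a 10% rise to about 5.6%, while one extra unit of crime_rate maps to a drop of about 0.25%, which is small relative to its standard error.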

6.2 What changes when we move to GWR

We fit the GWR with an adaptive bisquare kernel, and the optimal bandwidth turned out to be 30 nearest neighbors, picked by minimizing the AICc (which ended at 67.16). The global R² of the GWR is 0.7618, only slightly higher than the 0.7413 of the OLS. The information criteria give a mixed picture. The AIC favors the GWR (48.56 versus 55.07 for the OLS), but the AICc, which is the small-sample criterion we used to pick the bandwidth, favors the OLS (58.43 versus 67.16), and the adjusted R² is also lower for the GWR (0.6554 versus 0.7029). With n = 32 and about 9.6 effective parameters, the extra flexibility is not clearly paid for in overall fit, so the case for the GWR rests on the spatial pattern it reveals rather than on a better global fit.

The more interesting output is the spread of local coefficients. Domestic arrivals range from about 0.56 to 0.71 across states, international arrivals from 0.12 to 0.29, population density from 0.24 to 0.30, and crime from −0.005 to almost zero. The two variables that vary the most in relative terms are international arrivals and crime — exactly the two we would expect to be regional in nature. Domestic arrivals and density are more stable, which also makes intuitive sense, since they tap into structural characteristics that all states share to some degree.
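
One way to check the "varies the most in relative terms" statement is to scale each coefficient's range by its absolute median. That choice of measure is an assumption of this sketch; the numbers are copied from the coefficient summary printed by `gwr.basic()`.

```r
# Min, median, and max of the local coefficients, from the
# "Summary of GWR coefficient estimates" in the GWR output.
coef_summary <- data.frame(
  term   = c("domestic", "international", "crime_rate", "pop_density"),
  min    = c(0.5557856, 0.1231831, -0.0050414, 0.2383841),
  median = c(0.6459082, 0.1872873, -0.0038966, 0.2727888),
  max    = c(0.7086,    0.2881,    -0.0007,    0.3011)
)

# Relative spread: range divided by the absolute median.
coef_summary$rel_spread <- with(coef_summary, (max - min) / abs(median))

# Sort from most to least spatially variable.
coef_summary[order(-coef_summary$rel_spread), c("term", "rel_spread")]
```

On this measure crime and international arrivals have a spread of roughly 1.1 and 0.9 times their median, against about 0.23 for domestic arrivals and population density.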

6.3 Going back to each hypothesis with the spatial lens

Hypothesis 1 (domestic arrivals) holds across all 32 states — the coefficient is positive everywhere. What the GWR adds is that the magnitude is highest in the south-southeast (Yucatán, Quintana Roo, Campeche, Chiapas, Tabasco). The interpretation we landed on is that in regions where the tourism economy is less saturated, an extra domestic visitor moves the needle more than it does in a place like Mexico City, where tourism is already huge and one more visitor barely changes the total. So the hypothesis is confirmed, but the effect is not uniform — it is strongest where there is more room to grow.

Hypothesis 2 (international arrivals) is the one where the spatial map is most striking. Globally it looked significant, but the GWR shows that the strong significance is concentrated in the peninsula and along the Pacific coast — basically the states where international tourism is actually a thing. In the central and northern industrial belt (Coahuila, Nuevo León, Aguascalientes), the effect weakens and in several states it is no longer significant. This fits reality: those states attract mostly business travel and domestic flows, not international leisure tourists. The hypothesis holds, but only meaningfully in the regions where international demand is part of the economic base.

Hypothesis 3 (population density) is the most boring result, which is actually a good thing. The coefficient barely moves across states (0.238 to 0.301), so urban density seems to be a stable, structural driver of tourism everywhere in the country. The hypothesis is confirmed and the effect is essentially the same regardless of region — wherever there are more people per square kilometer, there is more tourism activity.

Hypothesis 4 (crime) is the one that the GWR really rescues. The OLS said crime did not matter. The GWR says it depends on where you are looking. In states with historically high insecurity issues affecting tourism (Guerrero, Sinaloa, Morelos, Colima), the local coefficient is larger in absolute value and approaches significance. In safer tourism enclaves, the effect shrinks to essentially zero. That helps explain why the global estimate came out as nothing: a negative effect in some states, averaged with almost no effect everywhere else, is diluted into a non-result. The local coefficient is negative in every state (from −0.005 to −0.0007), so nothing cancels out; the effect is simply weak on average. So Hypothesis 4 is not supported globally, but crime is locally relevant in the specific states where insecurity is actually a problem for tourism.

6.4 Where the model works and where it doesn’t

The residuals and local R² maps tell us where to trust the model and where to be careful. The best fit is in the south-southeast, with local R² values above 0.80 in Quintana Roo (0.854), Yucatán (0.849), Campeche (0.824), Chiapas (0.800), and Tabasco (0.799). In this part of the country our four variables capture almost the entire story.

The clear outlier is Baja California, with a local R² of 0.530. The model only explains about half of what is going on there, which strongly suggests we are missing variables that matter for that state specifically — cross-border tourism with the United States, medical tourism, and maquiladora-related business travel are not in our specification and probably account for a lot of that gap. Sonora (0.707) and Baja California Sur (0.724) show a similar but less extreme pattern, hinting at a broader issue with how our model handles the northwest.
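
A simple screening rule makes this reading operational. The sketch below takes a handful of local R² values from the table above and flags states below a cutoff; the 0.75 threshold and the flag labels are hypothetical choices for illustration, not something the model produces.

```r
# Local R-squared values copied from the table above (a subset of states).
local_r2 <- c("Quintana Roo"        = 0.854,
              "Yucatan"             = 0.849,
              "Nayarit"             = 0.755,
              "Baja California Sur" = 0.724,
              "Sonora"              = 0.707,
              "Baja California"     = 0.530)

# Hypothetical cutoff below which the specification is treated as incomplete.
threshold <- 0.75

flag <- ifelse(local_r2 < threshold,
               "collect more variables first",
               "model usable")
data.frame(local_r2, flag)
```

With this cutoff the three northwestern states are the only ones flagged, which matches the pattern described in the text; a different threshold would of course move the line.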

The residuals are also informative. The model under-predicts in Guanajuato (+0.94), México (+0.86), Nuevo León (+0.73), and Michoacán (+0.68) — these states host more tourism than our four variables would suggest. The plausible explanation is that they have intangible assets we did not measure, like the strength of the Pueblos Mágicos network in Guanajuato and Michoacán, or the corporate-travel intensity in Nuevo León. On the other side, the model over-predicts in Sinaloa (−0.85), Morelos (−0.81), Guerrero (−0.77), and Nayarit (−0.68). All of these are places where insecurity has hit tourism hard, which lines up neatly with what we saw under Hypothesis 4. The model thinks they “should” have more tourism than they actually do, and the missing piece is security.
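
Because the dependent variable is logged, each residual is a log ratio, and exponentiating it gives observed activity as a multiple of the fitted value. The short sketch below does this for the eight largest residuals in the table above.

```r
# Log-scale residuals (observed minus predicted) copied from the residual table.
log_resid <- c(Guanajuato = 0.941, Mexico = 0.856, "Nuevo Leon" = 0.726,
               Michoacan = 0.681, Nayarit = -0.675, Guerrero = -0.769,
               Morelos = -0.809, Sinaloa = -0.849)

# exp(residual) = observed / predicted on the original scale.
ratio <- exp(log_resid)
round(ratio, 2)
```

Guanajuato's observed tourism activity is roughly 2.6 times what the model fits, while Sinaloa's is roughly 0.43 times, so these are large misses in level terms even though they look modest on the log scale.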

6.5 What this means for policy and investment

The whole point of moving from OLS to GWR is that the recommendations stop being one-size-fits-all. Different regions need different things, and the local coefficients tell us what each one needs.

In the south-southeast peninsula, the model fits really well and international arrivals are a strong driver, so investments in international connectivity make a lot of sense — airport capacity, marketing campaigns aimed at foreign tourists, visa facilitation, and continued investment in heritage and ecotourism infrastructure. This is the region where extra international demand will most reliably translate into more tourism activity.

In the central Bajío and its neighbors (Guanajuato, México, Michoacán), the model under-predicts, meaning these states are doing better than their fundamentals would suggest. Querétaro, by contrast, is predicted almost exactly (residual of −0.04). The recommendation here is to formalize and protect whatever the under-predicted states are doing right: coordinated cultural-tourism circuits, branding of the Pueblos Mágicos, urban-heritage marketing. The advantage exists; the job is to make it stable and scalable.

In the northern industrial belt (Nuevo León, Coahuila, Chihuahua, Tamaulipas), international leisure is not the play. The local coefficients say these states respond to domestic and business travel. So the right investment is in MICE infrastructure (Meetings, Incentives, Conferences, Exhibitions) and corporate-travel facilitation, not international marketing campaigns that would not move the needle much.

In the high-insecurity group along and near the Pacific coast (Sinaloa, Guerrero, Colima, Nayarit, plus landlocked Morelos), the diagnosis is clear: security is the binding constraint. No matter how much you invest in marketing or infrastructure, the model tells us these states are under-performing what their fundamentals would predict, and the more negative local crime coefficient in this group is consistent with that explanation, even though it only approaches significance. Public safety investment and reputation management have to come first; tourism investment without them is unlikely to pay off.

And in the northwest (Baja California, Sonora, Baja California Sur), the low local R² is itself the recommendation. Before issuing any policy prescription, the honest answer is that our model does not capture enough of what is going on in those states. A follow-up study with variables tailored to cross-border dynamics and resort economies would be the next step, not a confident policy recommendation based on a model that explains between 53% and 72% of the local variance.

6.6 Takeaway

Looking at the OLS and GWR side by side, the main lesson is that a national average can be misleading. Three of our four hypotheses were confirmed globally, but the GWR showed that even those confirmations come with important regional variation, and the fourth hypothesis (crime) is a case where OLS alone would have led us to the wrong conclusion. The GWR did not just give us a slightly better R²; it gave us 32 different stories instead of one, and that is what makes it actually useful for prescriptive analytics. Policy decisions and resource allocation in tourism are not made at the national level — they are made state by state — and the GWR matches that reality in a way that OLS structurally cannot.

Geographically Weighted Regression (GWR) for Targeted Tourism Investment in Mexico

Salvador Narvaez Andrade

2026-05-20