This section reconstructs Figure 2 from the paper “Migration, Specialization, and Trade: Evidence from Brazil’s March to the West.”
Due to data access limitations, we use a combination of open geospatial datasets and IPEA population data sourced from IBGE (Brazilian Institute of Geography and Statistics) to visualize the population distribution across Brazil’s meso-regions in 1950, 1980, and 2010.
Population (decennial): Brazilian Institute of Applied Economic Research (IPEA) data (Resident Population – Total), sourced from IBGE.
Frequency: Decennial from 1872 to 2022.
Unit: Inhabitant.
Geospatial boundaries: geobr package (IBGE-compatible geometries).
Comment: Resident population refers to the legal population, which consists of present residents and absent residents whose absence did not exceed 12 months as of the census date.
Notes from IPEA Metadata: For the year 2010, there is treatment, in accordance with international practices adopted by other countries, for closed households, allowing IBGE to estimate the portion of the population residing in these households.
The population residing in isolated urban areas is not included. For the year 2007, the population totals are from the Population Count, with a reference date of April 1, 2007, of the 5435 Brazilian municipalities that were the subject of this census survey.
For the remaining 128 municipalities and the Federal District, estimates of the resident population for the same reference date are presented, totaling 5564 municipalities. (See the list of municipalities with estimated population in the IPEA metadata.)
The universe of municipalities in the table is defined by the IBGE in the census survey and does not necessarily coincide with the set of municipalities officially existing or established on the reference date.
More information: census methodology 2000.pdf, census methodology 2010.pdf.
Last updated on: 03/20/2024.
# Libraries
library(sf)
library(tidyverse)
library(geobr)
# -----------------------------
# 1) Read + clean population data (IPEA CSV)
# -----------------------------
csv <- "../Data/ipeadata[26-01-2026-02-36].csv"
pop <- read_csv(
  csv,
  skip = 1,
  show_col_types = FALSE
)
data <- pop %>%
  select("Sigla", "Código", "Meso-região", "1950", "1980", "2010") %>%
  setNames(c("state_abbr", "code_meso", "meso_name", "pop_1950", "pop_1980", "pop_2010")) %>%
  mutate(
    state_abbr = as.character(state_abbr),
    code_meso = as.character(code_meso)
  )
# -----------------------------
# 2) Read meso-region geometry + join population
# -----------------------------
meso_regions <- geobr::read_meso_region(year = 2010, simplified = TRUE) %>%
  st_make_valid() %>%
  mutate(code_meso = as.character(code_meso)) %>%
  left_join(data, by = "code_meso")
cat("NAs in pop_2010 after join:", sum(is.na(meso_regions$pop_2010)), "\n")
## NAs in pop_2010 after join: 0
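A zero NA count confirms that every geometry found a population row, but it does not reveal population rows that found no geometry. A minimal sketch of a two-way check with `dplyr::anti_join()` follows; the toy tables `geo_toy` and `pop_toy` and their codes are hypothetical stand-ins, not the actual data.

```r
library(dplyr)

# Hypothetical toy tables standing in for the geometry and the IPEA population data
geo_toy <- tibble(code_meso = c("1101", "1102", "1201"))
pop_toy <- tibble(
  code_meso = c("1101", "1201", "9999"),
  pop_2010  = c(10, 20, 30)
)

# Geometry rows with no population match (these would become NA after left_join)
missing_pop <- anti_join(geo_toy, pop_toy, by = "code_meso")

# Population rows with no geometry match (silently dropped by left_join)
missing_geo <- anti_join(pop_toy, geo_toy, by = "code_meso")

print(missing_pop)
print(missing_geo)
```

Applied to the real objects, the same two calls would take the geometry table and `data` in place of the toy tables.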
# -----------------------------
# 3) Compute population shares + bins
# -----------------------------
break_list <- list(
  `1950` = c(0.00167, 0.146, 0.349, 0.615, 1.08, 6.59),
  `1980` = c(0.0102, 0.185, 0.393, 0.604, 1.02, 11.3),
  `2010` = c(0.0272, 0.216, 0.382, 0.557, 1.03, 11.1)
)
df <- meso_regions %>%
  mutate(
    pop_share_1950 = 100 * pop_1950 / sum(pop_1950, na.rm = TRUE),
    pop_share_1980 = 100 * pop_1980 / sum(pop_1980, na.rm = TRUE),
    pop_share_2010 = 100 * pop_2010 / sum(pop_2010, na.rm = TRUE),
    bin_1950 = cut(pop_share_1950, breaks = break_list$`1950`, include.lowest = TRUE),
    bin_1980 = cut(pop_share_1980, breaks = break_list$`1980`, include.lowest = TRUE),
    bin_2010 = cut(pop_share_2010, breaks = break_list$`2010`, include.lowest = TRUE)
  )
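Because the breaks are fixed, rounded values, any share that falls below the first break or above the last one is assigned NA by `cut()` and would be drawn in the `na.value` colour. The sketch below illustrates this with the 1950 breaks from `break_list`; the `shares` vector is made up for illustration.

```r
# Breaks copied from the 1950 entry of break_list
breaks_1950 <- c(0.00167, 0.146, 0.349, 0.615, 1.08, 6.59)

# Hypothetical shares: two lie outside the break range, two sit exactly on a break
shares <- c(0.001, 0.00167, 0.146, 0.2, 6.59, 7)

# Intervals are right-closed; include.lowest = TRUE also closes the first one on the left
bins <- cut(shares, breaks = breaks_1950, include.lowest = TRUE)

# Bin index per value; NA marks values outside [0.00167, 6.59]
bin_index <- as.integer(bins)
print(data.frame(share = shares, bin = bins, index = bin_index))

# Count of values that would be left unbinned
n_unbinned <- sum(is.na(bins))
```

Running `sum(is.na(df$bin_1950))` (and the 1980 and 2010 equivalents) on the real data is a quick way to confirm that no region falls outside the chosen breaks.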
# -----------------------------
# 4) West outline FROM MESO-REGIONS : defined as North and Central-West regions
# -----------------------------
west_ufs <- c("AC","RO","AM","RR","PA","AP","TO", # North
"MT","MS","GO","DF") # Central-West
# Dissolve West meso-regions into one geometry
west_union <- df %>%
  filter(state_abbr %in% west_ufs) %>%
  st_make_valid() %>%
  summarise(geom = st_union(st_geometry(.)), .groups = "drop") %>%
  st_as_sf() %>%
  st_set_geometry("geom") %>%
  st_make_valid()
# Remove dots/islands: split the union into single polygons and keep only the largest piece.
# st_collection_extract() returns a MULTIPOLYGON unchanged, so st_cast() is needed to separate its parts.
west_poly <- st_cast(st_collection_extract(st_geometry(west_union), "POLYGON"), "POLYGON")
west_poly_main <- west_poly[which.max(as.numeric(st_area(west_poly)))]
# Remove internal rings: keep only the longest boundary ring
rings <- st_cast(st_boundary(west_poly_main), "LINESTRING")
rings_main <- rings[which.max(as.numeric(st_length(rings)))]
# Final outline sf (single clean outer boundary)
west_outline <- st_sf(geometry = st_sfc(rings_main, crs = st_crs(df)))
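The outline extraction above can be checked on a small synthetic shape. The sketch below builds a hypothetical multipolygon (a large square with a hole plus a small island, with no CRS) and applies the same steps: split into polygons, keep the largest by area, then keep the longest boundary ring.

```r
library(sf)

# Hypothetical toy geometry: a 10 x 10 square with a 2 x 2 hole, plus a 1 x 1 island
big <- list(
  rbind(c(0, 0), c(10, 0), c(10, 10), c(0, 10), c(0, 0)),  # outer ring
  rbind(c(4, 4), c(4, 6), c(6, 6), c(6, 4), c(4, 4))       # interior ring (hole)
)
small <- list(rbind(c(20, 0), c(21, 0), c(21, 1), c(20, 1), c(20, 0)))
toy <- st_sfc(st_multipolygon(list(big, small)))

# Step 1: split the multipolygon into its polygon parts and keep the largest
parts <- st_cast(toy, "POLYGON")
main <- parts[which.max(as.numeric(st_area(parts)))]

# Step 2: split the boundary into rings and keep the longest (the outer ring)
rings <- st_cast(st_boundary(main), "LINESTRING")
outer <- rings[which.max(as.numeric(st_length(rings)))]

outer_length <- as.numeric(st_length(outer))
print(outer_length)
```

The island is dropped in step 1 and the hole in step 2, leaving a single closed line around the large square.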
# -----------------------------
# 5) Brazil outline
# -----------------------------
brazil <- geobr::read_country(year = 2010, simplified = TRUE) %>%
  st_make_valid()
# -----------------------------
# 6) Plot helper
# -----------------------------
plot_panel <- function(fill_var, title_txt) {
  ggplot() +
    # linewidth sets outline thickness in ggplot2 >= 3.4.0 (older versions used size)
    geom_sf(data = brazil, fill = NA, color = "black", linewidth = 1.0) +
    geom_sf(data = df, aes(fill = .data[[fill_var]]), color = NA) +
    geom_sf(data = west_outline, fill = NA, color = "red", linewidth = 1.0) +
    scale_fill_brewer(name = "Pop. Share", palette = "Blues", na.value = "grey90") +
    labs(title = title_txt) +
    theme_void(base_size = 11) +
    theme(
      plot.title = element_text(hjust = 0.5),
      legend.position = "left",
      legend.title = element_text(size = 9),
      legend.text = element_text(size = 8)
    )
}
# -----------------------------
# 7) Plots
# -----------------------------
p1950 <- plot_panel("bin_1950", "(a) 1950")
p1980 <- plot_panel("bin_1980", "(b) 1980")
p2010 <- plot_panel("bin_2010", "(c) 2010")
Graph of population shares in Brazil’s meso-regions for the years 1950, 1980, and 2010, with the West region outlined in red.
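The code builds `p1950`, `p1980`, and `p2010` but does not show how they are arranged into one figure. One possibility, assuming the patchwork package is installed (it is not among the libraries loaded above), is `wrap_plots()`. The toy panels below are hypothetical stand-ins for the three maps, and `make_panel` is an illustrative helper, not part of the original code.

```r
library(ggplot2)
library(patchwork)  # assumption: not loaded in the original script

# Hypothetical stand-in panels; in the real script these would be p1950, p1980, p2010
toy <- data.frame(x = 1:3, y = c(2, 1, 3))
make_panel <- function(title_txt) {
  ggplot(toy, aes(x, y)) +
    geom_point() +
    labs(title = title_txt)
}
panels <- list(
  make_panel("(a) 1950"),
  make_panel("(b) 1980"),
  make_panel("(c) 2010")
)

# Arrange the three panels side by side in a single row
combined <- wrap_plots(panels, ncol = 3)
print(combined)
```

With the real objects, `wrap_plots(p1950, p1980, p2010, ncol = 3)` would produce the corresponding three-panel layout. Each panel keeps its own legend, which suits this figure because the breaks differ by year.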