Public Transport Commuting Patterns in São Paulo

Author

Vinicius Hiago e Silva Gerônimo

Published

July 22, 2025

Data Preparation

The microdata from the 2023 Origin-Destination Survey were obtained from the São Paulo Metro Transparency Portal. The survey was designed to model the travel flows of the population of the São Paulo Metropolitan Region. It records trip data, such as departure times, travel times, transportation modes, and destinations, as well as socioeconomic information, such as income, household characteristics, and employment.

For this project, only trips that started at home and ended at work, departed between 4 a.m. and 10 a.m., and used public transportation (bus, subway, or train) were considered. To avoid cluttering the map, only origin-destination flows above 600 passengers per 15-minute interval were kept.

library(dplyr)

# Survey codes for work as the trip purpose and for public transport modes
motivo_trabalho <- c(1, 2, 3)
modos_publicos <- c(1, 2, 3, 4, 5, 6)

# Departure time in minutes since midnight
# (`df` is assumed to hold the survey microdata, one row per trip)
pod_com_hora_ini <- df %>%
  mutate(HORA_INI = (h_saida * 60) + min_saida)

# Keep trips whose destination purpose is work, whose mode (`modo1`) is
# public transport, and that depart between 04:00 and 10:00
demanda_pico <- pod_com_hora_ini %>%
  filter(motivo_d %in% motivo_trabalho,
         modo1 %in% modos_publicos,
         HORA_INI >= (4 * 60),
         HORA_INI < (10 * 60))

# Bin departures into 15-minute intervals, sum passenger volume per
# origin-destination (OD) pair, and keep only inter-zone flows above 600
fluxos_para_animacao <- demanda_pico %>%
  mutate(intervalo_tempo = floor(HORA_INI / 15) * 15) %>%
  group_by(zona_o, zona_d, intervalo_tempo) %>%
  summarise(volume_passageiros = sum(Fe_via, na.rm = TRUE), .groups = 'drop') %>%
  filter(zona_o != zona_d) %>%
  filter(volume_passageiros > 600)
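
To make the binning and thresholding steps concrete, here is a minimal base-R sketch of the same logic on a toy table. The five trips and their values are hypothetical and do not come from the survey; only the column names follow the code above.

```r
# Toy trips (hypothetical values, not survey data)
viagens <- data.frame(
  zona_o    = c(10, 10, 10, 20, 20),
  zona_d    = c(1, 1, 1, 1, 20),
  h_saida   = c(7, 7, 7, 3, 8),
  min_saida = c(31, 40, 44, 50, 5),
  Fe_via    = c(400, 350, 100, 900, 700)
)

# Minutes since midnight, e.g. 07:40 -> 7 * 60 + 40 = 460
viagens$HORA_INI <- viagens$h_saida * 60 + viagens$min_saida

# Keep departures in the window [04:00, 10:00)
pico <- viagens[viagens$HORA_INI >= 4 * 60 & viagens$HORA_INI < 10 * 60, ]

# Round down to the start of the 15-minute bin, e.g. 460 -> 450 (07:30)
pico$intervalo_tempo <- floor(pico$HORA_INI / 15) * 15

# Sum the expansion factors per origin, destination, and bin
fluxos <- aggregate(Fe_via ~ zona_o + zona_d + intervalo_tempo,
                    data = pico, FUN = sum)

# Drop intra-zone flows and keep only volumes above the threshold
fluxos <- fluxos[fluxos$zona_o != fluxos$zona_d & fluxos$Fe_via > 600, ]
print(fluxos)
```

In this toy case the three trips from zone 10 to zone 1 fall into the same 07:30 bin and are summed into a single flow, while the 03:50 departure and the intra-zone trip are discarded.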

Animation

The base map uses the shapefile of the municipalities of the state of São Paulo, available on the Metro’s website. The animated points are built from the centroid coordinates of the OD zones: each point represents one origin-destination flow in a 15-minute interval, moving from the origin centroid to the destination centroid. Darker points represent flows with lower passenger volumes, while lighter points represent flows with higher volumes.

library(tidyr)
library(purrr)
library(sf)
library(bezier)

# Lookup table with the centroid coordinates of each OD zone
coords_zonas <- as.data.frame(st_coordinates(cent_zonas)) %>%
  mutate(ID_ZONA = cent_zonas$NumeroZona)

# Attach the origin and destination coordinates to each flow
segmentos_coords <- fluxos_para_animacao %>%
  left_join(coords_zonas, by = c("zona_o" = "ID_ZONA")) %>%
  rename(x_o = X, y_o = Y) %>%
  left_join(coords_zonas, by = c("zona_d" = "ID_ZONA")) %>%
  rename(x_d = X, y_d = Y) %>%
  drop_na(x_o, y_o, x_d, y_d) %>%
  mutate(id_viagem = row_number())

# Trace each flow along a curved path from origin to destination, bent toward
# a control point offset perpendicular to the midpoint of the segment
pontos_curvados <- segmentos_coords %>%
  mutate(
    pontos = pmap(list(x_o, y_o, x_d, y_d), function(x1, y1, x2, y2) {
      pontos_controle <- matrix(
        c(x1, y1,
          (x1 + x2)/2 + (y2 - y1)*0.15, (y1 + y2)/2 - (x2 - x1)*0.15,
          x2, y2),
        ncol = 2, byrow = TRUE
      )
      # 50 points along the curve, one per animation step
      curva <- bezier(t = seq(0, 1, length.out = 50), p = pontos_controle)
      as.data.frame(curva) %>% mutate(passo_animacao = 1:50)
    })
  ) %>%
  unnest(pontos) %>%
  # Each step along the curve advances the animation clock by one unit
  mutate(tempo_total = intervalo_tempo + passo_animacao)
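
The curved paths can be understood without the `bezier` package. The sketch below assumes the curve is an ordinary quadratic Bézier through three control points and evaluates it by hand in base R; the helper `bezier_quadratica` and the coordinates are hypothetical and used only for illustration.

```r
# Quadratic Bezier: B(t) = (1 - t)^2 * P0 + 2 * (1 - t) * t * P1 + t^2 * P2
# Returns a matrix with one row per value of t and one column per coordinate
bezier_quadratica <- function(t, p0, p1, p2) {
  outer((1 - t)^2, p0) + outer(2 * (1 - t) * t, p1) + outer(t^2, p2)
}

# Hypothetical origin and destination coordinates
origem  <- c(0, 0)
destino <- c(10, 0)

# Control point: the midpoint shifted perpendicular to the segment by 15% of
# its length, using the same formula as the code above
controle <- c(
  (origem[1] + destino[1]) / 2 + (destino[2] - origem[2]) * 0.15,
  (origem[2] + destino[2]) / 2 - (destino[1] - origem[1]) * 0.15
)

# 50 evenly spaced steps from origin (t = 0) to destination (t = 1)
t <- seq(0, 1, length.out = 50)
curva <- bezier_quadratica(t, origem, controle, destino)
head(curva, 3)
```

Because the offset is proportional to the distance between the two centroids, longer flows bend more in absolute terms, which helps keep overlapping flows between the same pair of zones visually separable.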

Chart

The most striking feature is the massive convergence of flows. Points emerge across the entire urban area, especially in the more peripheral districts, and funnel into a very small and dense central area (Centro, Paulista, Faria Lima, Itaim, Berrini, etc.). This is the so-called “pendular movement” in action: the daily home-to-work commute.

The map visually exposes the city’s severe jobs-housing imbalance. People live scattered across a vast area, but the great majority of formal jobs are concentrated in a limited core.

The animation clearly reveals the rhythm of the rush hour. Before 6:00 AM, the city is still “waking up,” with only a few flows. From 6:30 AM onward, the intensity increases sharply. The peak “explosion” of trips occurs between 7:15 and 8:30 AM, when the brightest points (representing higher volumes) are most concentrated. After 9:00 AM, the pace visibly slows down.
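
One way to check this visual impression numerically is to sum the volume of all flows in each 15-minute bin and look for the largest total. The sketch below shows the idea in base R on a toy table; the values are purely illustrative and are not results from the survey.

```r
# Hypothetical flows: volume per OD pair and 15-minute bin (illustrative only)
fluxos_exemplo <- data.frame(
  intervalo_tempo    = c(360, 360, 435, 435, 435, 540),
  volume_passageiros = c(700, 650, 2400, 1800, 900, 800)
)

# Total volume departing in each bin
perfil <- aggregate(volume_passageiros ~ intervalo_tempo,
                    data = fluxos_exemplo, FUN = sum)

# Share of the morning's total volume carried by each bin
perfil$participacao <- perfil$volume_passageiros / sum(perfil$volume_passageiros)

# Start of the bin with the largest total, formatted as HH:MM
pico <- perfil$intervalo_tempo[which.max(perfil$volume_passageiros)]
sprintf("%02d:%02d", pico %/% 60, pico %% 60)
```

Applied to `fluxos_para_animacao`, the same aggregation would give the departure profile that the animation shows frame by frame.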

This concentration highlights the pressure placed on the transportation system: demand is not spread evenly through the morning but compressed into a short, critical time window.

Lower-income populations, who depend more on public transport, have historically been concentrated in more distant areas, which forces them to make the longest journeys. Although the map does not break trips down by income, the geographic pattern of trip origins strongly suggests this story of spatial segregation.

library(ggplot2)
library(gganimate)

# Format minutes since midnight as HH:MM for the subtitle
formatar_tempo <- function(minutos) {
  h <- floor(minutos / 60)
  m <- minutos %% 60
  return(sprintf("%02d:%02d", h, m))
}

# Volume range, used as the limits and legend breaks of the color scale
max_val <- round(max(fluxos_para_animacao$volume_passageiros, na.rm = TRUE), 0)
min_val <- round(min(fluxos_para_animacao$volume_passageiros, na.rm = TRUE), 0)

# Plot
animacao <- ggplot() +

  # Base map of the municipalities
  geom_sf(data = muni_shp, fill = "gray85", color = "white", linewidth = 0.1) +

  # One moving point per OD flow, colored by passenger volume
  geom_point(
    data = pontos_curvados,
    aes(
      x = V1, y = V2,
      group = id_viagem,
      color = volume_passageiros
    ),
    size = 0.5,
    alpha = 1
  ) +

  # Log-scaled color: darker for lower volumes, lighter for higher volumes
  scale_color_viridis_c(option = "inferno", trans = "log10", name = "",
                        limits = c(min_val, max_val),
                        breaks = c(min_val, max_val)) +

  labs(title = 'Public Transport Commuting Patterns in São Paulo',
       subtitle = 'Time: {formatar_tempo(floor(frame_along / 15) * 15)} AM',
       caption = 'Source: Pesquisa Origem e Destino 2023') +
  
  transition_reveal(tempo_total) +
  
  shadow_trail(
    max_frames = 15,
    alpha = 0.15
  ) +
  
  theme_void() +
  
  theme(
    plot.title = element_text(family = "pf", size = 32, face = "bold", color = "#222222", hjust = 0.5),
    plot.margin = margin(0,-1,0,-1),
    plot.subtitle = element_text(family = "pf", size = 20, color = "#555555", hjust = 0.5),
    legend.text = element_text(family = "pf", size = 16),
    plot.caption = element_text(family = "pf", size = 20, hjust = 0.5),
    legend.position = 'bottom',
    legend.key.width = unit(1.2, "cm"),
    legend.key.size = unit(0.3, "cm")
  )

# Raise the maximum device dimension allowed by ragg
options(ragg.max_dim = 900000)

# Render the animation: 300 frames at 20 fps, written as a GIF
animate(animacao,
        nframes = 300,
        fps = 20,
        width = 1600,
        height = 1200,
        units = 'px',
        renderer = gifski_renderer("animacao_com_ajustes.gif"),
        device = 'ragg_png')

# Save the most recently rendered animation as a GIF
anim_save("animacao_fluxo_final.gif")
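
The subtitle and the playback length follow from simple arithmetic. The sketch below restates an equivalent formatting helper so that it runs on its own, and uses a hypothetical value for `frame_along`, the variable that `transition_reveal()` exposes for the current position along `tempo_total`.

```r
# Equivalent to the formatting helper in the plot code, restated so that
# this sketch is self-contained
formatar_tempo <- function(minutos) {
  sprintf("%02d:%02d", minutos %/% 60, minutos %% 60)
}

# Hypothetical value of frame_along part-way through the animation
frame_along <- 457.3

# Round down to the start of the current 15-minute interval before formatting
rotulo <- formatar_tempo(floor(frame_along / 15) * 15)
rotulo

# Playback length in seconds for 300 frames at 20 frames per second
duracao_s <- 300 / 20
duracao_s
```

Rounding down before formatting keeps the displayed clock stepping in 15-minute increments, matching the bins used to aggregate the flows, rather than changing on every frame.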