The microdata from the 2023 Origin-Destination Survey were obtained from the São Paulo Metro Transparency Portal. The survey was created to model the travel flows of the population in the São Paulo Metropolitan Region. It currently includes trip data, such as schedules, travel times, transportation modes, and destinations, as well as socioeconomic information, such as income, household characteristics, and employment.
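For this project, only trips that originated at home and were destined for work, departing between 4 a.m. and 10 a.m. by public transportation (bus, subway, and train), were considered. Trips were grouped by origin zone, destination zone, and 15-minute departure interval. To avoid cluttering the map, only flows between different zones with a passenger volume (the sum of the Fe_via weights) above 600 were included in the analysis.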
# Packages used in this step
library(dplyr)

# Purpose and mode of transportation
motivo_trabalho <- c(1, 2, 3)
modos_publicos <- c(1, 2, 3, 4, 5, 6)

# Create a column with time in total minutes since midnight
pod_com_hora_ini <- df %>%
  mutate(HORA_INI = (h_saida * 60) + min_saida)

# Filter the dataset
demanda_pico <- pod_com_hora_ini %>%
  filter(motivo_d %in% motivo_trabalho,
         modo1 %in% modos_publicos,
         HORA_INI >= (4 * 60),
         HORA_INI < (10 * 60))

# Normalize the time interval, group volume by origin-destination (OD), and filter the highest volumes
fluxos_para_animacao <- demanda_pico %>%
  mutate(intervalo_tempo = floor(HORA_INI / 15) * 15) %>%
  group_by(zona_o, zona_d, intervalo_tempo) %>%
  summarise(volume_passageiros = sum(Fe_via, na.rm = TRUE), .groups = 'drop') %>%
  filter(zona_o != zona_d) %>%
  filter(volume_passageiros > 600)
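To make the time window and the 15-minute binning concrete, here is a minimal base R sketch. The departure times are made up for illustration, and the `saidas`, `pico`, and `rotulo` names are hypothetical; only `h_saida`, `min_saida`, `HORA_INI`, and `intervalo_tempo` mirror the pipeline.

```r
# Illustrative departure times (hour, minute); these values are invented.
saidas <- data.frame(
  h_saida   = c(3, 6, 7, 9, 10),
  min_saida = c(59, 5, 38, 59, 0)
)

# Minutes since midnight, as in the pipeline
saidas$HORA_INI <- saidas$h_saida * 60 + saidas$min_saida

# Keep departures in the half-open window [04:00, 10:00)
pico <- saidas[saidas$HORA_INI >= 4 * 60 & saidas$HORA_INI < 10 * 60, ]

# Snap each departure to the start of its 15-minute interval
pico$intervalo_tempo <- floor(pico$HORA_INI / 15) * 15

# Human-readable label for the interval start
pico$rotulo <- sprintf(
  "%02d:%02d",
  as.integer(pico$intervalo_tempo %/% 60),
  as.integer(pico$intervalo_tempo %% 60)
)

print(pico)
```

The 03:59 and 10:00 departures fall outside the window, because the lower bound is inclusive and the upper bound is exclusive. A 07:38 departure lands in the interval that starts at 07:30.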
Animation
The base map was obtained using the shapefile of the municipalities of the state of São Paulo, available on the Metro’s website. The animation of the points was built from the coordinates of the centroids of the OD zones: each moving point represents one origin-destination flow in a given 15-minute interval, traveling along a curved path from the origin centroid to the destination centroid. Darker points represent flows with lower passenger volumes, while lighter points indicate flows with higher volumes.
# Packages used in this step
library(dplyr)
library(tidyr)
library(purrr)
library(sf)
library(bezier)

# Retrieve the coordinates of the origin and destination of the trip
segmentos_coords <- fluxos_para_animacao %>%
  left_join(as.data.frame(st_coordinates(cent_zonas)) %>%
              mutate(ID_ZONA = cent_zonas$NumeroZona),
            by = c("zona_o" = "ID_ZONA")) %>%
  rename(x_o = X, y_o = Y) %>%
  left_join(as.data.frame(st_coordinates(cent_zonas)) %>%
              mutate(ID_ZONA = cent_zonas$NumeroZona),
            by = c("zona_d" = "ID_ZONA")) %>%
  rename(x_d = X, y_d = Y) %>%
  drop_na(x_o, y_o, x_d, y_d) %>%
  mutate(id_viagem = row_number())

# Create a visualization of the movement of the points
pontos_curvados <- segmentos_coords %>%
  mutate(pontos = pmap(list(x_o, y_o, x_d, y_d), function(x1, y1, x2, y2) {
    # Control points: origin, midpoint shifted sideways, destination
    pontos_controle <- matrix(
      c(x1, y1,
        (x1 + x2) / 2 + (y2 - y1) * 0.15, (y1 + y2) / 2 - (x2 - x1) * 0.15,
        x2, y2),
      ncol = 2, byrow = TRUE
    )
    curva <- bezier(t = seq(0, 1, length.out = 50), p = pontos_controle)
    return(as.data.frame(curva) %>% mutate(passo_animacao = 1:50))
  })) %>%
  unnest(pontos) %>%
  mutate(tempo_total = intervalo_tempo + passo_animacao)
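The curved paths come from a three-point control matrix: the origin, the destination, and the segment midpoint pushed sideways by 15% of the segment along its perpendicular. The sketch below reproduces that geometry in base R with the explicit quadratic Bézier formula, assuming the three control points define a quadratic curve. The `curva_quadratica` helper and the coordinates are hypothetical and are not part of the original code.

```r
# Quadratic Bezier curve through three control points (base R, no packages).
# p0 = origin, p2 = destination, p1 = midpoint pushed sideways.
curva_quadratica <- function(x1, y1, x2, y2, curvatura = 0.15, n = 50) {
  # Control point: segment midpoint shifted along the perpendicular (dy, -dx)
  cx <- (x1 + x2) / 2 + (y2 - y1) * curvatura
  cy <- (y1 + y2) / 2 - (x2 - x1) * curvatura
  t <- seq(0, 1, length.out = n)
  data.frame(
    x = (1 - t)^2 * x1 + 2 * (1 - t) * t * cx + t^2 * x2,
    y = (1 - t)^2 * y1 + 2 * (1 - t) * t * cy + t^2 * y2,
    passo_animacao = seq_len(n)
  )
}

# Hypothetical coordinates: a flow from (0, 0) to (10, 0)
curva <- curva_quadratica(0, 0, 10, 0, n = 51)

# First point, halfway point (t = 0.5), and last point
print(curva[c(1, 26, 51), ])
```

The curve starts and ends exactly at the two centroids. At the halfway point it bulges away from the straight line by half of the control point's offset, so a larger `curvatura` gives a more pronounced arc.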
Chart
The most striking and immediate feature is the pattern of massive convergence. Points emerge throughout the entire urban sprawl, especially in the more peripheral areas, and funnel toward a very small and dense central area (Centro, Paulista, Faria Lima, Itaim, Berrini, etc.). This is the so-called “pendular movement” in action: the daily home-to-work commute.
The map visually exposes the city’s severe job-housing imbalance. People live scattered across a vast area, but most formal jobs are concentrated in a limited core.
The animation clearly reveals the rhythm of the rush hour. Before 6:00 AM, the city is still “waking up,” with only a few flows. From 6:30 AM onward, the intensity increases dramatically. The peak “explosion” of trips occurs between 7:15 and 8:30 AM, when the brightest points (representing higher volumes) are concentrated. After 9:00 AM, the pace visibly slows down.
This highlights the pressure placed on the transportation system. Demand is not distributed evenly throughout the morning but rather compressed into a very short and critical time window. Lower-income populations, who depend more on public transport, have historically been settled in more distant areas, forcing them to make the longest journeys.
Although we are not mapping by income, the geographic pattern of trip origins strongly suggests this story of spatial segregation.
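The temporal concentration described above can also be checked numerically by summing the volume per 15-minute interval. The following is a minimal base R sketch on a toy table shaped like `fluxos_para_animacao`. Every number in it is invented for illustration and says nothing about the real survey, and the `fluxos_exemplo`, `perfil`, and `intervalo_pico` names are hypothetical.

```r
# Toy stand-in for fluxos_para_animacao; all numbers are invented for illustration.
fluxos_exemplo <- data.frame(
  zona_o = c(1, 2, 3, 1, 2, 3),
  zona_d = c(9, 9, 9, 8, 8, 8),
  intervalo_tempo = c(345, 435, 435, 450, 450, 555),
  volume_passageiros = c(700, 1500, 900, 2000, 1200, 650)
)

# Total volume per 15-minute interval
perfil <- aggregate(volume_passageiros ~ intervalo_tempo,
                    data = fluxos_exemplo, FUN = sum)

# Share of the morning's volume carried by each interval
perfil$participacao <- perfil$volume_passageiros / sum(perfil$volume_passageiros)

# Interval with the largest total volume, shown as a clock time
intervalo_pico <- perfil$intervalo_tempo[which.max(perfil$volume_passageiros)]
rotulo_pico <- sprintf("%02d:%02d",
                       as.integer(intervalo_pico %/% 60),
                       as.integer(intervalo_pico %% 60))

print(perfil)
print(rotulo_pico)
```

Applied to the real table, the `participacao` column would show how much of the morning's volume falls inside the peak window.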
# Packages used in this step
library(ggplot2)
library(gganimate)

# Format the time displayed in the title
formatar_tempo <- function(minutos) {
  h <- floor(minutos / 60)
  m <- minutos %% 60
  return(sprintf("%02d:%02d", h, m))
}

# Get the maximum and minimum of the volumes
# (ceiling/floor instead of round, so the extreme values stay inside the scale limits)
max_val <- ceiling(max(fluxos_para_animacao$volume_passageiros, na.rm = TRUE))
min_val <- floor(min(fluxos_para_animacao$volume_passageiros, na.rm = TRUE))

# Plot (the "pf" font family is assumed to be registered elsewhere)
animacao <- ggplot() +
  geom_sf(data = muni_shp, fill = "gray85", color = "white", linewidth = 0.1) +
  geom_point(
    data = pontos_curvados,
    aes(x = V1, y = V2, group = id_viagem, color = volume_passageiros),
    size = 0.5,
    alpha = 1
  ) +
  scale_color_viridis_c(option = "inferno", trans = "log10", name = "",
                        limits = c(min_val, max_val),
                        breaks = c(min_val, max_val)) +
  labs(title = 'Public Transport Commuting Patterns in São Paulo',
       subtitle = 'Time: {formatar_tempo(floor(frame_along / 15) * 15)} AM',
       caption = 'Source: Pesquisa Origem e Destino 2023') +
  transition_reveal(tempo_total) +
  shadow_trail(max_frames = 15, alpha = 0.15) +
  theme_void() +
  theme(
    plot.title = element_text(family = "pf", size = 32, face = "bold", color = "#222222", hjust = 0.5),
    plot.margin = margin(0, -1, 0, -1),
    plot.subtitle = element_text(family = "pf", size = 20, color = "#555555", hjust = 0.5),
    legend.text = element_text(family = "pf", size = 16),
    plot.caption = element_text(family = "pf", size = 20, hjust = 0.5),
    legend.position = 'bottom',
    legend.key.width = unit(1.2, "cm"),
    legend.key.size = unit(0.3, "cm")
  )

options(ragg.max_dim = 900000)

# Render the animation
animate(animacao, nframes = 300, fps = 20, width = 1600, height = 1200, units = 'px',
        renderer = gifski_renderer("animacao_com_ajustes.gif"),
        device = 'ragg_png')