Recorridos en Ecobici de la Ciudad de México

En esta entrega analizaremos los principales recorridos realizados mediante el sistema de bicicletas públicas “Ecobici” de la ciudad de México en la alcaldía Miguel Hidalgo en mayo de 2020.

Primero, se cargan las librerías necesarias.

## Warning: package 'sf' was built under R version 4.0.5
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
## Warning: package 'ggmap' was built under R version 4.0.5
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.0.5
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
## Warning: package 'leaflet' was built under R version 4.0.5
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble  3.1.0     v dplyr   1.0.5
## v tidyr   1.1.3     v stringr 1.4.0
## v readr   1.4.0     v forcats 0.5.1
## v purrr   0.3.4
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## Warning: package 'stringr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
## Warning: package 'osrm' was built under R version 4.0.5
## Data: (c) OpenStreetMap contributors, ODbL 1.0 - http://www.openstreetmap.org/copyright
## Routing: OSRM - http://project-osrm.org/

Después se cargan los datos de los viajes realizados mediante Ecobici en la alcaldía Miguel Hidalgo durante mayo de 2020.

bikes <- read.csv("ecobici_mayo2020.csv", stringsAsFactors = TRUE, encoding = "UTF-8")

Se observa que la base de datos contiene 12,004 registros de viajes.

summary(bikes)
##  X.U.FEFF.Genero      Edad            Bici       Estacion_Retiro
##   :  37          Min.   :17.00   Min.   :  868   Min.   :166.0  
##  F:2500          1st Qu.:30.00   1st Qu.: 7988   1st Qu.:197.0  
##  M:9467          Median :36.00   Median : 9684   Median :224.0  
##                  Mean   :38.31   Mean   : 9762   Mean   :271.2  
##                  3rd Qu.:45.00   3rd Qu.:11559   3rd Qu.:286.0  
##                  Max.   :75.00   Max.   :15339   Max.   :480.0  
##                                                                 
##                                              name         lat_origen   
##  208 HESIODO-LAMARTINE                         :  351   Min.   :19.40  
##  242 MIGUEL DE CERVANTES SAAVEDRA-PROL MOLIERE :  288   1st Qu.:19.42  
##  182 PROGRESO-ASTRONOMOS                       :  275   Median :19.43  
##  286 13 DE SEPTIEMBRE-AVENIDA PATRIOTISMO      :  255   Mean   :19.43  
##  194 CTO MAHATMA GANDHI-AV. PASEO DE LA REFORMA:  250   3rd Qu.:19.44  
##  211 NEWTON-HORACIO                            :  244   Max.   :19.45  
##  (Other)                                       :10341                  
##    lon_origen         Fecha_Retiro    Hora_Retiro    Estacion_Arribo
##  Min.   :-99.21   29/05/2020: 539   10:40:34:    4   Min.   :166.0  
##  1st Qu.:-99.20   04/05/2020: 503   13:26:16:    4   1st Qu.:197.0  
##  Median :-99.19   27/05/2020: 491   13:47:38:    4   Median :223.0  
##  Mean   :-99.19   15/05/2020: 486   14:35:45:    4   Mean   :268.6  
##  3rd Qu.:-99.18   25/05/2020: 479   14:37:40:    4   3rd Qu.:285.0  
##  Max.   :-99.17   28/05/2020: 458   18:42:26:    4   Max.   :480.0  
##                   (Other)   :9048   (Other) :11980                  
##                                            name.1       lat_destino   
##  208 HESIODO-LAMARTINE                        :  394   Min.   :19.40  
##  242 MIGUEL DE CERVANTES SAAVEDRA-PROL MOLIERE:  330   1st Qu.:19.42  
##  197 MIGUEL DE CERVANTES-FC DE CUERNAVACA     :  308   Median :19.43  
##  182 PROGRESO-ASTRONOMOS                      :  295   Mean   :19.43  
##  217 EULER-AV. HORACIO                        :  277   3rd Qu.:19.44  
##  206 MOLIERE-HOMERO                           :  261   Max.   :19.45  
##  (Other)                                      :10139                  
##   lon_destino         Fecha_Arribo    Hora_Arribo   
##  Min.   :-99.21   29/05/2020: 542   11:25:39:    5  
##  1st Qu.:-99.20   04/05/2020: 499   12:46:01:    4  
##  Median :-99.19   27/05/2020: 492   13:01:03:    4  
##  Mean   :-99.19   15/05/2020: 487   13:10:01:    4  
##  3rd Qu.:-99.18   25/05/2020: 478   13:55:08:    4  
##  Max.   :-99.17   28/05/2020: 458   14:56:25:    4  
##                   (Other)   :9048   (Other) :11979

Se puede observar que la tabla de datos contiene información del género y edad del usuario, ID, nombre y coordenadas geográficas de la estación de origen y destino del viaje.

Lo primero que haremos es un mapa de las estaciones de Ecobici

bbox_cdmx <- make_bbox(bikes$lon_origen, bikes$lat_origen)
mapa_cdmx <- get_stamenmap(bbox = bbox_cdmx,
                           maptype = "terrain-background",
                           zoom = 15)
## Source : http://tile.stamen.com/terrain-background/15/7353/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14583.png
ggmap(mapa_cdmx) +
  geom_point(data = bikes, aes( x = bikes$lon_origen, y = bikes$lat_origen)) +
  labs(title = "Localización de estaciones Ecobici",
       subtitle = "Alcaldía Miguel Hidalgo, CDMX",
       caption = "Fuente: Ecobici",
       x = "Longitud", 
       y = "Latitud") +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", angle= 90, size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60",  size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
## Warning: Use of `bikes$lon_origen` is discouraged. Use `lon_origen` instead.
## Warning: Use of `bikes$lat_origen` is discouraged. Use `lat_origen` instead.

Ahora identificaremos el recorrido con mayor frecuencia en mayo de 2020.

viajes <- bikes %>%
  group_by(Estacion_Retiro, name, Estacion_Arribo, name.1) %>%
  summarise(cant_viajes = n())
## `summarise()` has grouped output by 'Estacion_Retiro', 'name', 'Estacion_Arribo'. You can override using the `.groups` argument.

Podemos observar que muchos son viajes circulares, es decir, donde comienzan, terminan. Aún así, es posible graficar

ggplot(viajes) +
  geom_histogram(aes(cant_viajes))+
  labs(title = "Recorridos en Ecobici CDMX con mayor frecuencia",
       subtitle = "Alcaldía Miguel Hidalgo, mayo 2020",
       caption = "Fuente: Datos Abiertos Ecobici CDMX",
       x = "Estación",
       y = "Cantidad de viajes") +
  theme_bw() +
  coord_flip() +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60", size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

También se puede realizar un gráfico de matriz, pero es necesario quitar el valor de escala porque el número de las estaciones no es continuo.

ggplot(viajes) +
  geom_tile(aes(x=as.factor(Estacion_Retiro), y = as.factor(Estacion_Arribo), fill=cant_viajes)) +
  scale_fill_viridis_c()+
  labs(title = "Recorridos en Ecobici CDMX con mayor frecuencia",
       subtitle = "Alcaldía Miguel Hidalgo, mayo 2020",
       caption = "Fuente: Datos Abiertos Ecobici CDMX",
       x = "Estación de Arribo",
       y = "Estación de Retiro") +
  theme_bw() +
  coord_flip() +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60", angle= 90, size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))

Ahora, haremos un filtro de top 10 de los viajes con mayor cantidad de recorridos. Se filtrarán recorridos circulares y se ordenarán en orden descendente.

top <- viajes %>%
  filter(Estacion_Retiro != Estacion_Arribo) %>%
  arrange(desc(cant_viajes)) %>%
  head(10)
top
## # A tibble: 10 x 5
## # Groups:   Estacion_Retiro, name, Estacion_Arribo [10]
##    Estacion_Retiro name              Estacion_Arribo name.1          cant_viajes
##              <int> <fct>                       <int> <fct>                 <int>
##  1             166 166 PROGRESO-CDA~             167 167 GENERAL SA~          41
##  2             211 211 NEWTON-HORAC~             246 246 LEIBNITZ-T~          39
##  3             465 465 LAGUNA DE GI~             463 463 LAGO ANDRÃ~          38
##  4             190 E190 GDOR IGNACI~             175 E175 MUTUALISM~          36
##  5             242 242 MIGUEL DE CE~             208 208 HESIODO-LA~          35
##  6             197 197 MIGUEL DE CE~             208 208 HESIODO-LA~          34
##  7             476 476 LAGO COMO-LA~             217 217 EULER-AV. ~          33
##  8             206 206 MOLIERE-HOME~             205 205 TAINE-HOME~          32
##  9             182 182 PROGRESO-AST~             171 171 COMERCIO-J~          31
## 10             211 211 NEWTON-HORAC~             217 217 EULER-AV. ~          31

Se observa que la mayor cantidad de viajes no circulares con Ecobici en la alcaldía Miguel Hidalgo de la Ciudad de México durante mayo de 2020 fue entre las estaciones 166 y 167.

Se vuelve a realizar un gráfico de matriz

ggplot(top)+
  geom_tile(aes(x=as.factor(Estacion_Retiro), y=as.factor(Estacion_Arribo), fill=cant_viajes))+
  scale_fill_distiller(palette = "RdYlGn") +
  labs(title = "Top 10 Recorridos en Ecobici CDMX",
       subtitle = "Alcaldía Miguel Hidalgo, mayo 2020",
       caption = "Fuente: Datos Abiertos Ecobici CDMX",
       x = "Estación de Arribo",
       y = "Estación de Retiro") +
  theme_bw() +
  coord_flip() +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60", angle= 90, size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))

Ahora interesa saber la cantidad de kilómetros y el tiempo que se toma en cada viaje. Pero necesitamos las coordenadas geográficas por lo que nuevamente corremos parte del código anterior considerando las coordenadas.

viajes <- bikes %>%
  group_by(Estacion_Retiro, name, lat_origen, lon_origen, Estacion_Arribo, name.1, lat_destino, lon_destino) %>%
  summarise(cant_viajes = n())
## `summarise()` has grouped output by 'Estacion_Retiro', 'name', 'lat_origen', 'lon_origen', 'Estacion_Arribo', 'name.1', 'lat_destino'. You can override using the `.groups` argument.
top <- viajes %>%
  filter(Estacion_Retiro != Estacion_Arribo) %>%
  arrange(desc(cant_viajes)) %>%
  head(10)

Para representar en un mapa los recorridos, primero necesitamos definir una función.

ruteo_bikes <- function(name, lon_origen, lat_origen, name.1, lon_destino, lat_destino) {
  ruta <- osrmRoute(src = c(name, lon_origen, lat_origen), dst = c(name.1, lon_destino, lat_destino), returnclass = "sf",
              overview = "full",
              osrm.profile = "bike")
cbind(ORIGEN = name, DESTINO = name.1, ruta)
}
ruteo_top10 <- list(top$name, top$lon_origen, top$lat_origen, top$name.1, top$lon_destino, top$lat_destino)
ruteo_top10 <- pmap(ruteo_top10, ruteo_bikes) %>% 
  reduce(rbind)
summary(ruteo_top10)
##                                            ORIGEN 
##  211 NEWTON-HORACIO                           :2  
##  166 PROGRESO-CDA. ANTONIO MACEO              :1  
##  182 PROGRESO-ASTRONOMOS                      :1  
##  197 MIGUEL DE CERVANTES-FC DE CUERNAVACA     :1  
##  206 MOLIERE-HOMERO                           :1  
##  242 MIGUEL DE CERVANTES SAAVEDRA-PROL MOLIERE:1  
##  (Other)                                      :3  
##                                    DESTINO       src              dst        
##  208 HESIODO-LAMARTINE                 :2   Min.   :  1.00   Min.   :  2.00  
##  217 EULER-AV. HORACIO                 :2   1st Qu.: 24.25   1st Qu.: 37.75  
##  167 GENERAL SALVADOR ALVARADO-PROGRESO:1   Median : 36.00   Median : 45.00  
##  171 COMERCIO-JOSE MARTÍ               :1   Mean   : 49.20   Mean   : 53.60  
##  205 TAINE-HOMERO                      :1   3rd Qu.: 84.00   3rd Qu.: 74.75  
##  246 LEIBNITZ-THIERS                   :1   Max.   :103.00   Max.   :123.00  
##  (Other)                               :2                                    
##     duration        distance              geometry 
##  Min.   :1.865   Min.   :0.466   LINESTRING   :10  
##  1st Qu.:4.643   1st Qu.:1.031   epsg:4326    : 0  
##  Median :6.902   Median :1.582   +proj=long...: 0  
##  Mean   :6.327   Mean   :1.420                     
##  3rd Qu.:7.962   3rd Qu.:1.837                     
##  Max.   :9.870   Max.   :2.200                     
## 
ggmap(mapa_cdmx) +
  geom_point(data = bikes, aes(x = lon_origen, y = lat_origen), inherit.aes = FALSE)+
  geom_sf(data = ruteo_top10, color = "red", size=1.3, inherit.aes = FALSE)+
  labs(title = "Recorridos Ecobici con mayor frecuencia",
       subtitle = "Alcaldía Miguel Hidalgo, CDMX. Mayo de 2020",
       caption = "Fuente: Ecobici",
       x = "Longitud", 
       y = "Latitud") +
  theme (plot.title = element_text(family = "sans",
                                   size = rel(1), 
                                   vjust = 2, 
                                   face = "bold.italic", 
                                   color = "black", 
                                   lineheight = 1.5), 
          plot.subtitle = element_text(family = "sans",
                                      size = rel(0.8),
                                      vjust = 2, 
                                      face = "italic", 
                                      color = "gray40", 
                                      lineheight = 1.5),
          plot.caption = element_text(family = "sans",
                                     size = rel(0.7),
                                     vjust = 2, 
                                     face = "italic", 
                                     color = "gray30", 
                                     lineheight = 1.5)) + 
  theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)), 
        axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", angle= 90, size=rel(0.75)),
        axis.text.x = element_text(face="italic", colour="gray60", angle = 45,  size=rel(0.65)),
        axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
        legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
        legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

Finalmente, haremos un mapa interactivo en que se aprecie información relevante.

ruteo_top10 <- ruteo_top10 %>%
  left_join(top, by=c("ORIGEN"="name", "DESTINO"="name.1"))
ruteo_top10 <- ruteo_top10 %>%
  mutate(RUTA = paste("Desde", ORIGEN,"hasta", DESTINO))
paleta <- c(low="gold", high= "deeppink4")

labels <- sprintf(
  "<strong>%s</strong><br/>%g km <br/>%g min",
  ruteo_top10$RUTA, round(ruteo_top10$distance, 2), round(ruteo_top10$duration, 0)
) %>% lapply(htmltools::HTML)

leaflet(ruteo_top10) %>%
  addTiles() %>%
  addProviderTiles(providers$CartoDB) %>%
  addPolylines(color = ~colorNumeric(paleta, ruteo_top10$distance)(distance),
               weight = 6,
               label = labels,
    labelOptions = labelOptions(
      style = list("font-weight" = "normal", padding = "2px 5px"),
      textsize = "10px",
      direction = "top"),
              highlight = highlightOptions(weight = 8,
                                           bringToFront = TRUE)) %>% 
  addLegend("bottomright", pal = colorNumeric(paleta, ruteo_top10$distance), values = ~distance,
            title = "Distancia",
            labFormat = labelFormat(suffix = "km"),
            opacity = 0.75)