En esta entrega analizaremos los principales recorridos realizados mediante el sistema de bicicletas públicas “Ecobici” de la ciudad de México en la alcaldía Miguel Hidalgo en mayo de 2020.
Primero, se cargan las librerías necesarias.
## Warning: package 'sf' was built under R version 4.0.5
## Linking to GEOS 3.9.0, GDAL 3.2.1, PROJ 7.2.1
## Warning: package 'ggmap' was built under R version 4.0.5
## Loading required package: ggplot2
## Warning: package 'ggplot2' was built under R version 4.0.5
## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.
## Warning: package 'leaflet' was built under R version 4.0.5
## Warning: package 'tidyverse' was built under R version 4.0.5
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v tibble 3.1.0 v dplyr 1.0.5
## v tidyr 1.1.3 v stringr 1.4.0
## v readr 1.4.0 v forcats 0.5.1
## v purrr 0.3.4
## Warning: package 'tidyr' was built under R version 4.0.5
## Warning: package 'dplyr' was built under R version 4.0.5
## Warning: package 'stringr' was built under R version 4.0.5
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## Warning: package 'osrm' was built under R version 4.0.5
## Data: (c) OpenStreetMap contributors, ODbL 1.0 - http://www.openstreetmap.org/copyright
## Routing: OSRM - http://project-osrm.org/
Después se cargan los datos de los viajes realizados mediante Ecobici en la alcaldía Miguel Hidalgo durante mayo de 2020.
bikes <- read.csv("ecobici_mayo2020.csv", stringsAsFactors = TRUE, encoding = "UTF-8")
Se observa que la base de datos contiene 12,004 registros de viajes.
summary(bikes)
## X.U.FEFF.Genero Edad Bici Estacion_Retiro
## : 37 Min. :17.00 Min. : 868 Min. :166.0
## F:2500 1st Qu.:30.00 1st Qu.: 7988 1st Qu.:197.0
## M:9467 Median :36.00 Median : 9684 Median :224.0
## Mean :38.31 Mean : 9762 Mean :271.2
## 3rd Qu.:45.00 3rd Qu.:11559 3rd Qu.:286.0
## Max. :75.00 Max. :15339 Max. :480.0
##
## name lat_origen
## 208 HESIODO-LAMARTINE : 351 Min. :19.40
## 242 MIGUEL DE CERVANTES SAAVEDRA-PROL MOLIERE : 288 1st Qu.:19.42
## 182 PROGRESO-ASTRONOMOS : 275 Median :19.43
## 286 13 DE SEPTIEMBRE-AVENIDA PATRIOTISMO : 255 Mean :19.43
## 194 CTO MAHATMA GANDHI-AV. PASEO DE LA REFORMA: 250 3rd Qu.:19.44
## 211 NEWTON-HORACIO : 244 Max. :19.45
## (Other) :10341
## lon_origen Fecha_Retiro Hora_Retiro Estacion_Arribo
## Min. :-99.21 29/05/2020: 539 10:40:34: 4 Min. :166.0
## 1st Qu.:-99.20 04/05/2020: 503 13:26:16: 4 1st Qu.:197.0
## Median :-99.19 27/05/2020: 491 13:47:38: 4 Median :223.0
## Mean :-99.19 15/05/2020: 486 14:35:45: 4 Mean :268.6
## 3rd Qu.:-99.18 25/05/2020: 479 14:37:40: 4 3rd Qu.:285.0
## Max. :-99.17 28/05/2020: 458 18:42:26: 4 Max. :480.0
## (Other) :9048 (Other) :11980
## name.1 lat_destino
## 208 HESIODO-LAMARTINE : 394 Min. :19.40
## 242 MIGUEL DE CERVANTES SAAVEDRA-PROL MOLIERE: 330 1st Qu.:19.42
## 197 MIGUEL DE CERVANTES-FC DE CUERNAVACA : 308 Median :19.43
## 182 PROGRESO-ASTRONOMOS : 295 Mean :19.43
## 217 EULER-AV. HORACIO : 277 3rd Qu.:19.44
## 206 MOLIERE-HOMERO : 261 Max. :19.45
## (Other) :10139
## lon_destino Fecha_Arribo Hora_Arribo
## Min. :-99.21 29/05/2020: 542 11:25:39: 5
## 1st Qu.:-99.20 04/05/2020: 499 12:46:01: 4
## Median :-99.19 27/05/2020: 492 13:01:03: 4
## Mean :-99.19 15/05/2020: 487 13:10:01: 4
## 3rd Qu.:-99.18 25/05/2020: 478 13:55:08: 4
## Max. :-99.17 28/05/2020: 458 14:56:25: 4
## (Other) :9048 (Other) :11979
Se puede observar que la tabla de datos contiene información del género y edad del usuario, ID, nombre y coordenadas geográficas de la estación de origen y destino del viaje.
Lo primero que haremos es un mapa de las estaciones de Ecobici
bbox_cdmx <- make_bbox(bikes$lon_origen, bikes$lat_origen)
mapa_cdmx <- get_stamenmap(bbox = bbox_cdmx,
maptype = "terrain-background",
zoom = 15)
## Source : http://tile.stamen.com/terrain-background/15/7353/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14578.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14579.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14580.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14581.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14582.png
## Source : http://tile.stamen.com/terrain-background/15/7353/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7354/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7355/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7356/14583.png
## Source : http://tile.stamen.com/terrain-background/15/7357/14583.png
ggmap(mapa_cdmx) +
geom_point(data = bikes, aes( x = bikes$lon_origen, y = bikes$lat_origen)) +
labs(title = "Localización de estaciones Ecobici",
subtitle = "Alcaldía Miguel Hidalgo, CDMX",
caption = "Fuente: Ecobici",
x = "Longitud",
y = "Latitud") +
theme (plot.title = element_text(family = "sans",
size = rel(1),
vjust = 2,
face = "bold.italic",
color = "black",
lineheight = 1.5),
plot.subtitle = element_text(family = "sans",
size = rel(0.8),
vjust = 2,
face = "italic",
color = "gray40",
lineheight = 1.5),
plot.caption = element_text(family = "sans",
size = rel(0.7),
vjust = 2,
face = "italic",
color = "gray30",
lineheight = 1.5)) +
theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)),
axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", angle= 90, size=rel(0.75)),
axis.text.x = element_text(face="italic", colour="gray60", size=rel(0.65)),
axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
## Warning: Use of `bikes$lon_origen` is discouraged. Use `lon_origen` instead.
## Warning: Use of `bikes$lat_origen` is discouraged. Use `lat_origen` instead.
Ahora identificaremos el recorrido con mayor frecuencia en mayo de 2020.
viajes <- bikes %>%
group_by(Estacion_Retiro, name, Estacion_Arribo, name.1) %>%
summarise(cant_viajes = n())
## `summarise()` has grouped output by 'Estacion_Retiro', 'name', 'Estacion_Arribo'. You can override using the `.groups` argument.
Podemos observar que muchos son viajes circulares, es decir, donde comienzan, terminan. Aún así, es posible graficar
ggplot(viajes) +
geom_histogram(aes(cant_viajes))+
labs(title = "Recorridos en Ecobici CDMX con mayor frecuencia",
subtitle = "Alcaldía Miguel Hidalgo, mayo 2020",
caption = "Fuente: Datos Abiertos Ecobici CDMX",
x = "Estación",
y = "Cantidad de viajes") +
theme_bw() +
coord_flip() +
theme (plot.title = element_text(family = "sans",
size = rel(1),
vjust = 2,
face = "bold.italic",
color = "black",
lineheight = 1.5),
plot.subtitle = element_text(family = "sans",
size = rel(0.8),
vjust = 2,
face = "italic",
color = "gray40",
lineheight = 1.5),
plot.caption = element_text(family = "sans",
size = rel(0.7),
vjust = 2,
face = "italic",
color = "gray30",
lineheight = 1.5)) +
theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)),
axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
axis.text.x = element_text(face="italic", colour="gray60", size=rel(0.65)),
axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
También se puede realizar un gráfico de matriz, pero es necesario quitar el valor de escala porque el número de las estaciones no es continuo.
ggplot(viajes) +
geom_tile(aes(x=as.factor(Estacion_Retiro), y = as.factor(Estacion_Arribo), fill=cant_viajes)) +
scale_fill_viridis_c()+
labs(title = "Recorridos en Ecobici CDMX con mayor frecuencia",
subtitle = "Alcaldía Miguel Hidalgo, mayo 2020",
caption = "Fuente: Datos Abiertos Ecobici CDMX",
x = "Estación de Arribo",
y = "Estación de Retiro") +
theme_bw() +
coord_flip() +
theme (plot.title = element_text(family = "sans",
size = rel(1),
vjust = 2,
face = "bold.italic",
color = "black",
lineheight = 1.5),
plot.subtitle = element_text(family = "sans",
size = rel(0.8),
vjust = 2,
face = "italic",
color = "gray40",
lineheight = 1.5),
plot.caption = element_text(family = "sans",
size = rel(0.7),
vjust = 2,
face = "italic",
color = "gray30",
lineheight = 1.5)) +
theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)),
axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
axis.text.x = element_text(face="italic", colour="gray60", angle= 90, size=rel(0.65)),
axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
Ahora, haremos un filtro de top 10 de los viajes con mayor cantidad de recorridos. Se filtrarán recorridos circulares y se ordenarán en orden descendente.
top <- viajes %>%
filter(Estacion_Retiro != Estacion_Arribo) %>%
arrange(desc(cant_viajes)) %>%
head(10)
top
## # A tibble: 10 x 5
## # Groups: Estacion_Retiro, name, Estacion_Arribo [10]
## Estacion_Retiro name Estacion_Arribo name.1 cant_viajes
## <int> <fct> <int> <fct> <int>
## 1 166 166 PROGRESO-CDA~ 167 167 GENERAL SA~ 41
## 2 211 211 NEWTON-HORAC~ 246 246 LEIBNITZ-T~ 39
## 3 465 465 LAGUNA DE GI~ 463 463 LAGO ANDRÃ~ 38
## 4 190 E190 GDOR IGNACI~ 175 E175 MUTUALISM~ 36
## 5 242 242 MIGUEL DE CE~ 208 208 HESIODO-LA~ 35
## 6 197 197 MIGUEL DE CE~ 208 208 HESIODO-LA~ 34
## 7 476 476 LAGO COMO-LA~ 217 217 EULER-AV. ~ 33
## 8 206 206 MOLIERE-HOME~ 205 205 TAINE-HOME~ 32
## 9 182 182 PROGRESO-AST~ 171 171 COMERCIO-J~ 31
## 10 211 211 NEWTON-HORAC~ 217 217 EULER-AV. ~ 31
Se observa que la mayor cantidad de viajes no circulares con Ecobici en la alcaldía Miguel Hidalgo de la Ciudad de México durante mayo de 2020 fue entre las estaciones 166 y 167.
Se vuelve a realizar un gráfico de matriz
ggplot(top)+
geom_tile(aes(x=as.factor(Estacion_Retiro), y=as.factor(Estacion_Arribo), fill=cant_viajes))+
scale_fill_distiller(palette = "RdYlGn") +
labs(title = "Top 10 Recorridos en Ecobici CDMX",
subtitle = "Alcaldía Miguel Hidalgo, mayo 2020",
caption = "Fuente: Datos Abiertos Ecobici CDMX",
x = "Estación de Arribo",
y = "Estación de Retiro") +
theme_bw() +
coord_flip() +
theme (plot.title = element_text(family = "sans",
size = rel(1),
vjust = 2,
face = "bold.italic",
color = "black",
lineheight = 1.5),
plot.subtitle = element_text(family = "sans",
size = rel(0.8),
vjust = 2,
face = "italic",
color = "gray40",
lineheight = 1.5),
plot.caption = element_text(family = "sans",
size = rel(0.7),
vjust = 2,
face = "italic",
color = "gray30",
lineheight = 1.5)) +
theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)),
axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", size=rel(0.75)),
axis.text.x = element_text(face="italic", colour="gray60", angle= 90, size=rel(0.65)),
axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
Ahora interesa saber la cantidad de kilómetros y el tiempo que se toma en cada viaje. Pero necesitamos las coordenadas geográficas por lo que nuevamente corremos parte del código anterior considerando las coordenadas.
viajes <- bikes %>%
group_by(Estacion_Retiro, name, lat_origen, lon_origen, Estacion_Arribo, name.1, lat_destino, lon_destino) %>%
summarise(cant_viajes = n())
## `summarise()` has grouped output by 'Estacion_Retiro', 'name', 'lat_origen', 'lon_origen', 'Estacion_Arribo', 'name.1', 'lat_destino'. You can override using the `.groups` argument.
top <- viajes %>%
filter(Estacion_Retiro != Estacion_Arribo) %>%
arrange(desc(cant_viajes)) %>%
head(10)
Para representar en un mapa los recorridos, primero necesitamos definir una función.
ruteo_bikes <- function(name, lon_origen, lat_origen, name.1, lon_destino, lat_destino) {
ruta <- osrmRoute(src = c(name, lon_origen, lat_origen), dst = c(name.1, lon_destino, lat_destino), returnclass = "sf",
overview = "full",
osrm.profile = "bike")
cbind(ORIGEN = name, DESTINO = name.1, ruta)
}
ruteo_top10 <- list(top$name, top$lon_origen, top$lat_origen, top$name.1, top$lon_destino, top$lat_destino)
ruteo_top10 <- pmap(ruteo_top10, ruteo_bikes) %>%
reduce(rbind)
summary(ruteo_top10)
## ORIGEN
## 211 NEWTON-HORACIO :2
## 166 PROGRESO-CDA. ANTONIO MACEO :1
## 182 PROGRESO-ASTRONOMOS :1
## 197 MIGUEL DE CERVANTES-FC DE CUERNAVACA :1
## 206 MOLIERE-HOMERO :1
## 242 MIGUEL DE CERVANTES SAAVEDRA-PROL MOLIERE:1
## (Other) :3
## DESTINO src dst
## 208 HESIODO-LAMARTINE :2 Min. : 1.00 Min. : 2.00
## 217 EULER-AV. HORACIO :2 1st Qu.: 24.25 1st Qu.: 37.75
## 167 GENERAL SALVADOR ALVARADO-PROGRESO:1 Median : 36.00 Median : 45.00
## 171 COMERCIO-JOSE MARTÍ :1 Mean : 49.20 Mean : 53.60
## 205 TAINE-HOMERO :1 3rd Qu.: 84.00 3rd Qu.: 74.75
## 246 LEIBNITZ-THIERS :1 Max. :103.00 Max. :123.00
## (Other) :2
## duration distance geometry
## Min. :1.865 Min. :0.466 LINESTRING :10
## 1st Qu.:4.643 1st Qu.:1.031 epsg:4326 : 0
## Median :6.902 Median :1.582 +proj=long...: 0
## Mean :6.327 Mean :1.420
## 3rd Qu.:7.962 3rd Qu.:1.837
## Max. :9.870 Max. :2.200
##
ggmap(mapa_cdmx) +
geom_point(data = bikes, aes(x = lon_origen, y = lat_origen), inherit.aes = FALSE)+
geom_sf(data = ruteo_top10, color = "red", size=1.3, inherit.aes = FALSE)+
labs(title = "Recorridos Ecobici con mayor frecuencia",
subtitle = "Alcaldía Miguel Hidalgo, CDMX. Mayo de 2020",
caption = "Fuente: Ecobici",
x = "Longitud",
y = "Latitud") +
theme (plot.title = element_text(family = "sans",
size = rel(1),
vjust = 2,
face = "bold.italic",
color = "black",
lineheight = 1.5),
plot.subtitle = element_text(family = "sans",
size = rel(0.8),
vjust = 2,
face = "italic",
color = "gray40",
lineheight = 1.5),
plot.caption = element_text(family = "sans",
size = rel(0.7),
vjust = 2,
face = "italic",
color = "gray30",
lineheight = 1.5)) +
theme(axis.title.x = element_text(face="bold", vjust=-0.5, colour="gray60", size=rel(0.75)),
axis.title.y = element_text(face="bold", vjust=1.5, colour="gray60", angle= 90, size=rel(0.75)),
axis.text.x = element_text(face="italic", colour="gray60", angle = 45, size=rel(0.65)),
axis.text.y = element_text(face="italic", colour="gray60", size=rel(0.65)),
legend.title = element_text(face = "bold", colour="gray60", size=rel(0.75)),
legend.text = element_text(face="italic", colour="gray60", size=rel(0.6)))
## Coordinate system already present. Adding new coordinate system, which will replace the existing one.
Finalmente, haremos un mapa interactivo en que se aprecie información relevante.
ruteo_top10 <- ruteo_top10 %>%
left_join(top, by=c("ORIGEN"="name", "DESTINO"="name.1"))
ruteo_top10 <- ruteo_top10 %>%
mutate(RUTA = paste("Desde", ORIGEN,"hasta", DESTINO))
paleta <- c(low="gold", high= "deeppink4")
labels <- sprintf(
"<strong>%s</strong><br/>%g km <br/>%g min",
ruteo_top10$RUTA, round(ruteo_top10$distance, 2), round(ruteo_top10$duration, 0)
) %>% lapply(htmltools::HTML)
leaflet(ruteo_top10) %>%
addTiles() %>%
addProviderTiles(providers$CartoDB) %>%
addPolylines(color = ~colorNumeric(paleta, ruteo_top10$distance)(distance),
weight = 6,
label = labels,
labelOptions = labelOptions(
style = list("font-weight" = "normal", padding = "2px 5px"),
textsize = "10px",
direction = "top"),
highlight = highlightOptions(weight = 8,
bringToFront = TRUE)) %>%
addLegend("bottomright", pal = colorNumeric(paleta, ruteo_top10$distance), values = ~distance,
title = "Distancia",
labFormat = labelFormat(suffix = "km"),
opacity = 0.75)