MiBici, Guadalajara’s public bicycle-sharing system, began operations in 2014. Since its implementation, the system has grown in geographical coverage and operational capacity. This initial analysis describes the main characteristics of the system and its usage patterns. It will serve as a basis for further models and analyses to inform future infrastructure planning and evaluation.
Loading the libraries readr, dplyr, ggplot2, lubridate and tidyr.
library(readr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(tidyr)
MiBici operates a network of stations where users can take out a bicycle for a limited period of time to make their daily trips. According to official information, it has 3,972 bicycles and 360 stations, which are the origin-destination base of the system’s trips.
In the following lines we will download the list and location of the system’s stations, as well as the historical record of trips from the start of operations until the first half of 2024.
if(!file.exists("nomenclatura_2024_07.csv")) {
download.file("https://www.mibici.net/site/assets/files/1118/nomenclatura_2024_07.csv",
"nomenclatura_2024_07.csv")
}
# Read the station list whether it was just downloaded or was already on disk
mbpoints <- read.csv("nomenclatura_2024_07.csv", encoding = "latin1")
Historical trip data is available on the MiBici open data page. Each file corresponds to a single month of operational data. However, to automate the download, each URL must be constructed around a numerical string that does not follow an easily identifiable pattern.
This section will gather a URL list to effectively download data sets for years 2014 to 2024. Note that year 2014 only includes data from December, and year 2024 goes only from January to June.
Y2014
mb2014 <- "https://mibici.net/site/assets/files/1079/datos_abiertos_2014_12.csv"
Y2015
rango2015 <- c(1084,1097:1107)
mb2015 <- NULL
endf <- 0
for (i in rango2015) {
endf <- endf+1
urlmb <- paste("https://mibici.net/site/assets/files/",
i, "/datos_abiertos_2015_",
ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
mb2015 <- rbind(mb2015, urlmb)
}
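The zero-padding and string pasting in the loop above can also be expressed with a vectorized `sprintf()`. This is a minimal sketch and not the method used in this analysis: `build_mibici_urls` is a hypothetical helper name, and the folder IDs still have to be looked up by hand because they follow no pattern.

```r
# Hypothetical helper (an assumption, not part of the original analysis):
# builds one URL per month from a vector of folder IDs.
build_mibici_urls <- function(folder_ids, year, months = seq_along(folder_ids)) {
  sprintf("https://mibici.net/site/assets/files/%d/datos_abiertos_%d_%02d.csv",
          as.integer(folder_ids), as.integer(year), as.integer(months))
}

# Same folder IDs as rango2015 above
urls_2015 <- build_mibici_urls(c(1084, 1097:1107), 2015)
head(urls_2015, 2)
```

The result is a plain character vector, which can be iterated over directly. Years whose file names carry a suffix (such as `04-1` in 2017) would still need to be patched by hand.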
Y2016
mb2016 <- c("https://www.mibici.net/site/assets/files/1020/datos_abiertos_2016_01-1.csv",
"https://www.mibici.net/site/assets/files/1021/datos_abiertos_2016_02.csv",
"https://www.mibici.net/site/assets/files/1023/datos_abiertos_2016_03.csv",
"https://www.mibici.net/site/assets/files/1029/datos_abiertos_2016_04.csv",
"https://www.mibici.net/site/assets/files/1058/datos_abiertos_2016_05.csv",
"https://www.mibici.net/site/assets/files/1080/datos_abiertos_2016_06.csv",
"https://www.mibici.net/site/assets/files/1081/datos_abiertos_2016_07.csv",
"https://www.mibici.net/site/assets/files/1082/datos_abiertos_2016_08.csv",
"https://www.mibici.net/site/assets/files/1083/datos_abiertos_2016_09.csv",
"https://www.mibici.net/site/assets/files/1108/datos_abiertos_2016_10.csv",
"https://www.mibici.net/site/assets/files/1109/datos_abiertos_2016_11.csv",
"https://www.mibici.net/site/assets/files/1110/datos_abiertos_2016_12.csv")
mb2016 <- as.data.frame(mb2016)
colnames(mb2016) <- "V1"
Y2017
rango2017 <- c(1022,1111,1112,1115,1116,1119,1120,1122:1124,1197,1198)
mb2017 <- NULL
endf <- 0
for (i in rango2017) {
endf <- endf+1
urlmb <- paste("https://mibici.net/site/assets/files/",
i, "/datos_abiertos_2017_",
ifelse(endf<4, paste0("0", endf),
ifelse(endf==4, paste("04-1"), ifelse(endf<10, paste0("0", endf),endf))),".csv", sep = "")
mb2017 <- rbind(mb2017, urlmb)
}
Y2018
rango2018 <- c(1200:1211)
mb2018 <- NULL
endf <- 0
for (i in rango2018) {
endf <- endf+1
urlmb <- paste("https://mibici.net/site/assets/files/",
i, "/datos_abiertos_2018_",
ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
mb2018 <- rbind(mb2018, urlmb)
}
Y2019
rango2019 <- c(1214, 1217:1225, 1227,1228)
mb2019 <- NULL
endf<-0
for (i in rango2019) {
endf <- endf+1
urlmb <- paste("https://mibici.net/site/assets/files/",
i, "/datos_abiertos_2019_",
ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
mb2019 <- rbind(mb2019, urlmb)
}
Y2020
rango2020 <- c(1230:1232, 1235:1241, 1317, 1318)
mb2020 <- NULL
endf<-0
for (i in rango2020) {
endf <- endf+1
urlmb <- paste("https://mibici.net/site/assets/files/",
i, "/datos_abiertos_2020_",
ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
mb2020 <- rbind(mb2020, urlmb)
}
Y2021
rango2021 <- c(1320,1323,1322,2129,3292,4572,5728,7073,8461,10088,11538,12780)
mb2021 <- NULL
endf<-0
for (i in rango2021) {
endf <- endf+1
urlmb <- paste("https://mibici.net/site/assets/files/",
i, "/datos_abiertos_2021_",
ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
mb2021 <- rbind(mb2021, urlmb)
}
Y2022-2024
mb2022 <- c(
"https://mibici.net/site/assets/files/14797/datos_abiertos_2022_01.csv",
"https://mibici.net/site/assets/files/16831/datos_abiertos_2022_02.csv",
"https://mibici.net/site/assets/files/19034/datos_abiertos_2022_03.csv",
"https://mibici.net/site/assets/files/20338/datos_abiertos_2022_04.csv",
"https://mibici.net/site/assets/files/21842/datos_abiertos_2022_05.csv",
"https://mibici.net/site/assets/files/23473/datos_abiertos_2022_06.csv",
"https://mibici.net/site/assets/files/25235/datos_abiertos_2022_07.csv",
"https://mibici.net/site/assets/files/27484/datos_abiertos_2022_08.csv",
"https://mibici.net/site/assets/files/29663/datos_abiertos_2022_09.csv",
"https://mibici.net/site/assets/files/31507/datos_abiertos_2022_10.csv",
"https://mibici.net/site/assets/files/33115/datos_abiertos_2022_11.csv",
"https://mibici.net/site/assets/files/34432/datos_abiertos_2022_12.csv")
mb2023 <- c (
"https://mibici.net/site/assets/files/36762/datos_abiertos_2023_01.csv",
"https://mibici.net/site/assets/files/40197/datos_abiertos_2023_02.csv",
"https://mibici.net/site/assets/files/43035/datos_abiertos_2023_03.csv",
"https://mibici.net/site/assets/files/46758/datos_abiertos_2023_04.csv",
"https://mibici.net/site/assets/files/46759/datos_abiertos_2023_05.csv",
"https://mibici.net/site/assets/files/48116/datos_abiertos_2023_06.csv",
"https://mibici.net/site/assets/files/49448/datos_abiertos_2023_07.csv",
"https://mibici.net/site/assets/files/51295/datos_abiertos_2023_08.csv",
"https://mibici.net/site/assets/files/53344/datos_abiertos_2023_09.csv",
"https://mibici.net/site/assets/files/54935/datos_abiertos_2023_10.csv",
"https://mibici.net/site/assets/files/57344/datos_abiertos_2023_11.csv",
"https://mibici.net/site/assets/files/58715/datos_abiertos_2023_12.csv")
mb2024 <- c(
"https://mibici.net/site/assets/files/61450/datos_abiertos_2024_01.csv",
"https://mibici.net/site/assets/files/77251/datos_abiertos_2024_02.csv",
"https://mibici.net/site/assets/files/77252/datos_abiertos_2024_03.csv",
"https://mibici.net/site/assets/files/80028/datos_abiertos_2024_04.csv",
"https://mibici.net/site/assets/files/81663/datos_abiertos_2024_05.csv",
"https://mibici.net/site/assets/files/83901/datos_abiertos_2024_06.csv")
Once the URLs are listed, we can proceed to download the data.
# Flatten the four URL sets into one character vector (mb2016 is a data frame, so take its V1 column)
url14_17 <- c(mb2014, mb2015, mb2016$V1, mb2017)
if(!file.exists("MiBici_2014_2017.csv")) {
df2014_17 <- NULL
for (i in url14_17) {
dfmibici <- read.csv(i, encoding = "latin1")
df2014_17 <- rbind(df2014_17, dfmibici)
}
rm(dfmibici) # remove the temporary monthly data frame
write_csv(df2014_17, "MiBici_2014_2017.csv")
} else df2014_17 <- read.csv("MiBici_2014_2017.csv", encoding = "latin1")
if(!file.exists("MiBici_2018.csv")) {
df2018 <- NULL
for (i in mb2018) {
dfmibici <- read.csv(i, encoding = "latin1")
df2018 <- rbind(df2018, dfmibici)
}
rm(dfmibici)
write_csv(df2018, "MiBici_2018.csv")
} else df2018 <- read.csv("MiBici_2018.csv", encoding = "latin1")
if(!file.exists("MiBici_2019.csv")) {
df2019 <- NULL
for (i in mb2019) {
dfmibici <- read.csv(i, encoding = "latin1")
df2019 <- rbind(df2019, dfmibici)
}
rm(dfmibici)
write_csv(df2019, "MiBici_2019.csv")
} else df2019 <- read.csv("MiBici_2019.csv", encoding = "latin1")
if(!file.exists("MiBici_2020.csv")) {
df2020 <- NULL
for (i in mb2020) {
dfmibici <- read.csv(i, encoding = "latin1")
df2020 <- rbind(df2020, dfmibici)
}
rm(dfmibici)
write_csv(df2020, "MiBici_2020.csv")
} else df2020 <- read.csv("MiBici_2020.csv", encoding = "latin1")
if(!file.exists("MiBici_2021.csv")) {
df2021 <- NULL
for (i in mb2021) {
dfmibici <- read.csv(i, encoding = "latin1")
df2021 <- rbind(df2021, setNames(dfmibici, names(df2020)))
}
rm(dfmibici)
write_csv(df2021, "MiBici_2021.csv")
} else df2021 <- read.csv("MiBici_2021.csv", encoding = "latin1")
if(!file.exists("MiBici_2022.csv")) {
df2022 <- NULL
for (i in mb2022) {
dfmibici <- read.csv(i, encoding = "latin1")
df2022 <- rbind(df2022, setNames(dfmibici, names(df2021)))
}
rm(dfmibici)
write_csv(df2022, "MiBici_2022.csv")
} else df2022 <- read.csv("MiBici_2022.csv", encoding = "latin1")
# Originally, the 2023 data set mixes different formats in the date columns.
# Apparently this only happens in November, so we change the strategy here and treat
# that data frame separately.
mb2023a <- mb2023[1:10]
mb2023b <- mb2023[11]
mb2023c <- mb2023[12]
if(!file.exists("MiBici_2023a.csv")) {
df2023a <- NULL
for (i in mb2023a) {
dfmibici <- read.csv(i, encoding = "latin1")
df2023a <- rbind(df2023a, setNames(dfmibici, names(df2021)))
}
rm(dfmibici)
write_csv(df2023a, "MiBici_2023a.csv")
} else df2023a <- read.csv("MiBici_2023a.csv", encoding = "latin1")
if(!file.exists("MiBici_2023b.csv")) {
df2023b <- NULL
for (i in mb2023b) {
dfmibici <- read.csv(i, encoding = "latin1")
df2023b <- rbind(df2023b, setNames(dfmibici, names(df2021)))
}
rm(dfmibici)
write_csv(df2023b, "MiBici_2023b.csv")
} else df2023b <- read.csv("MiBici_2023b.csv", encoding = "latin1")
if(!file.exists("MiBici_2023c.csv")) {
df2023c <- NULL
for (i in mb2023c) {
dfmibici <- read.csv(i, encoding = "latin1")
df2023c <- rbind(df2023c, setNames(dfmibici, names(df2021)))
}
rm(dfmibici)
write_csv(df2023c, "MiBici_2023c.csv")
} else df2023c <- read.csv("MiBici_2023c.csv", encoding = "latin1")
if(!file.exists("MiBici_2024.csv")) {
df2024 <- NULL
for (i in mb2024) {
dfmibici <- read.csv(i, encoding = "latin1")
df2024 <- rbind(df2024, setNames(dfmibici, names(df2021)))
}
rm(dfmibici)
write_csv(df2024, "MiBici_2024.csv")
} else df2024 <- read.csv("MiBici_2024.csv", encoding = "latin1")
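The blocks above repeat one pattern for every year: use the cached yearly file if it exists, otherwise read each monthly file, bind the pieces and write the cache. As a hedged sketch, that pattern could be wrapped in a single function. `load_or_build` is a hypothetical name, and the demonstration uses two small local files instead of the real MiBici URLs so that it runs offline.

```r
# Hypothetical helper (not part of the original analysis): read a cached
# yearly file if present; otherwise read every monthly source, bind, and cache.
load_or_build <- function(cache_file, sources, col_names = NULL) {
  if (file.exists(cache_file)) {
    return(read.csv(cache_file, encoding = "latin1"))
  }
  parts <- lapply(sources, function(src) {
    part <- read.csv(src, encoding = "latin1")
    # Optionally force common column names, as done with setNames() above
    if (!is.null(col_names)) part <- setNames(part, col_names)
    part
  })
  combined <- do.call(rbind, parts)
  write.csv(combined, cache_file, row.names = FALSE)
  combined
}

# Demonstration with two small local files standing in for monthly downloads
demo_dir <- tempfile("mibici_demo")
dir.create(demo_dir)
month_files <- file.path(demo_dir, c("m01.csv", "m02.csv"))
write.csv(data.frame(Viaje_Id = 1:2, Origen_Id = c(2, 3)), month_files[1], row.names = FALSE)
write.csv(data.frame(Viaje_Id = 3:4, Origen_Id = c(4, 5)), month_files[2], row.names = FALSE)

cache_path <- file.path(demo_dir, "year.csv")
first_run  <- load_or_build(cache_path, month_files)  # reads both files, writes the cache
second_run <- load_or_build(cache_path, month_files)  # reads the cache only
```

Collecting the monthly pieces in a list and binding them once also avoids growing a data frame inside a loop, which can matter with files of this size.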
The data for November 2023 arrives from the source with a different date-time format. We will correct this before binding together the data frames for the different years.
df2023b$Inicio_del_viaje = as_datetime(df2023b$Inicio_del_viaje, format = "%d/%m/%Y %H:%M")
df2023b$Fin_del_viaje = as_datetime(df2023b$Fin_del_viaje, format = "%d/%m/%Y %H:%M")
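The format string above reads day, month and year separated by slashes, followed by hours and minutes. As an alternative to splitting the files by month, both layouts could be parsed in one pass. The sketch below uses base R only; `parse_mixed_datetime` is a hypothetical helper, and the ISO-like layout `%Y-%m-%d %H:%M:%S` for the other months is an assumption inferred from the values printed later in this document.

```r
# Hypothetical helper: try the ISO-like layout first, then fall back to the
# day/month/year layout used in the November 2023 file.
parse_mixed_datetime <- function(x) {
  iso <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  dmy <- as.POSIXct(x, format = "%d/%m/%Y %H:%M", tz = "UTC")
  out <- iso
  missing_iso <- is.na(out)
  out[missing_iso] <- dmy[missing_iso]  # fill only the values the first layout missed
  out
}

# Made-up example values, one per layout
raw <- c("2023-10-31 23:59:48", "01/11/2023 08:15")
parsed <- parse_mixed_datetime(raw)
format(parsed, "%Y-%m-%d %H:%M:%S")
```

Values that match neither layout stay `NA`, so a quick `sum(is.na(parsed))` after parsing shows whether a third format is hiding in the data.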
Grouping data frames according to their date-time column format before correcting.
dfMiBici_a <- bind_rows(df2014_17, df2018, df2019, df2020) #check datetime formatting
dfMiBici_b <- bind_rows(df2021, df2022) #check datetime formatting
dfMiBici_a$Inicio_del_viaje <- as_datetime(dfMiBici_a$Inicio_del_viaje)
dfMiBici_a$Fin_del_viaje <- as_datetime(dfMiBici_a$Fin_del_viaje)
dfMiBici_b$Inicio_del_viaje <- as_datetime(dfMiBici_b$Inicio_del_viaje)
dfMiBici_b$Fin_del_viaje <- as_datetime(dfMiBici_b$Fin_del_viaje)
dfMiBici_ab <- bind_rows(dfMiBici_a, dfMiBici_b) #check datetime formatting
rm(dfMiBici_a, dfMiBici_b)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 47631042 2543.8 92717744 4951.7 56382687 3011.2
## Vcells 514237582 3923.4 739172296 5639.5 737983057 5630.4
df2023a$Inicio_del_viaje <- as_datetime(df2023a$Inicio_del_viaje)
df2023a$Fin_del_viaje <- as_datetime(df2023a$Fin_del_viaje)
df2023c$Inicio_del_viaje <- as_datetime(df2023c$Inicio_del_viaje)
df2023c$Fin_del_viaje <- as_datetime(df2023c$Fin_del_viaje)
df2024$Inicio_del_viaje <- as_datetime(df2024$Inicio_del_viaje)
df2024$Fin_del_viaje <- as_datetime(df2024$Fin_del_viaje)
dfMiBici <- bind_rows(dfMiBici_ab, df2023a, df2023b, df2023c, df2024)
rm(df2014_17, df2018, df2019, df2020, df2021, df2022, df2023a, df2023b, df2023c, df2024, dfMiBici_ab)
gc()
## used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 937822 50.1 74174196 3961.4 56382687 3011.2
## Vcells 208431670 1590.3 591337837 4511.6 737983057 5630.4
summary(dfMiBici)
## Viaje_Id Usuario_Id Genero AÃ.o_de_nacimiento
## Min. : 4601 Min. : 3 Length:28885772 Min. : 1
## 1st Qu.: 8738406 1st Qu.: 166711 Class :character 1st Qu.:1982
## Median :17508282 Median : 403540 Mode :character Median :1989
## Mean :17385771 Mean : 625403 Mean :1986
## 3rd Qu.:26055293 3rd Qu.: 706031 3rd Qu.:1994
## Max. :34524981 Max. :4044423 Max. :2021
## NA's :10897672
## Inicio_del_viaje Fin_del_viaje
## Min. :2014-12-01 00:33:47.00 Min. :2014-12-01 00:36:54.00
## 1st Qu.:2018-12-14 16:58:07.75 1st Qu.:2018-12-14 17:10:19.75
## Median :2020-11-03 19:34:07.50 Median :2020-11-03 19:47:47.00
## Mean :2020-10-14 01:13:11.29 Mean :2020-10-14 01:28:46.64
## 3rd Qu.:2022-11-10 09:16:59.75 3rd Qu.:2022-11-10 09:26:04.75
## Max. :2024-06-30 23:59:48.00 Max. :2024-07-01 00:32:07.00
##
## Origen_Id Destino_Id AÃ.o_de_nacimiento
## Min. : 2.0 Min. : 2.0 Min. : 1
## 1st Qu.: 49.0 1st Qu.: 49.0 1st Qu.:1984
## Median : 84.0 Median : 80.0 Median :1992
## Mean :124.8 Mean :123.9 Mean :1989
## 3rd Qu.:197.0 3rd Qu.:198.0 3rd Qu.:1997
## Max. :390.0 Max. :390.0 Max. :2023
## NA's :18067697
There are a couple of minor issues with the year-of-birth columns: the name is garbled by the encoding, and the values are split across two columns. We will fix the column names and combine the two year-of-birth columns into one.
colnames(dfMiBici) <- c("Viaje_Id","Usuario_Id","Genero", "Anio_de_nacimiento", "Inicio_del_viaje", "Fin_del_viaje","Origen_Id","Destino_Id", "Anio_de_nac2")
dfMiBici <- dfMiBici %>%
mutate(Anio_de_nacimiento = ifelse(is.na(Anio_de_nacimiento), Anio_de_nac2, Anio_de_nacimiento))
dfMiBici <- dfMiBici[,1:8]
unique(dfMiBici$Genero)
## [1] "M" "F" NA "" "NULL"
Fixing gender and year-of-birth values: empty strings and "NULL" become NA, and birth years before 1900 are discarded.
dfMiBici <- dfMiBici %>%
mutate(Genero = ifelse(Genero=="NULL", NA, Genero)) %>%
mutate(Genero = ifelse(Genero=="", NA, Genero)) %>%
mutate(Anio_de_nacimiento = ifelse(Anio_de_nacimiento < 1900, NA, Anio_de_nacimiento))
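The same clean-up rules can be checked on a handful of values before they are applied to almost 29 million rows. This is a minimal base R sketch on made-up data; the vectors `genero_raw` and `birth_raw` are illustrative and do not come from the MiBici files.

```r
# Made-up sample covering the cases seen in unique(dfMiBici$Genero)
genero_raw <- c("M", "F", NA, "", "NULL")
birth_raw  <- c(1985, 1, 1992, NA, 2001)

# Treat empty strings and the literal text "NULL" as missing
genero_clean <- genero_raw
genero_clean[genero_clean %in% c("", "NULL")] <- NA

# Treat implausible birth years (before 1900) as missing
birth_clean <- birth_raw
birth_clean[!is.na(birth_clean) & birth_clean < 1900] <- NA

genero_clean
birth_clean
```

Using `%in%` sidesteps the `NA == "NULL"` comparison, which returns `NA` rather than `FALSE`.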
We will transform the Genero (gender) variable into a factor and create a few columns to facilitate travel time analysis.
dfMiBici <- dfMiBici %>%
mutate(Genero = as.factor(Genero)) %>%
mutate(Tiempo_viaje = Fin_del_viaje - Inicio_del_viaje) %>%
mutate(Minutos_viaje = as.numeric(Tiempo_viaje/60)) %>%
mutate(Minutos_viaje = round(Minutos_viaje, 2)) %>%
mutate(Hora_inicio = hms::as_hms(Inicio_del_viaje))
summary(dfMiBici)
## Viaje_Id Usuario_Id Genero Anio_de_nacimiento
## Min. : 4601 Min. : 3 F : 7526710 Min. :1917
## 1st Qu.: 8738406 1st Qu.: 166711 M :21292159 1st Qu.:1982
## Median :17508282 Median : 403540 NA's: 66903 Median :1990
## Mean :17385771 Mean : 625403 Mean :1987
## 3rd Qu.:26055293 3rd Qu.: 706031 3rd Qu.:1995
## Max. :34524981 Max. :4044423 Max. :2023
## NA's :80511
## Inicio_del_viaje Fin_del_viaje
## Min. :2014-12-01 00:33:47.00 Min. :2014-12-01 00:36:54.00
## 1st Qu.:2018-12-14 16:58:07.75 1st Qu.:2018-12-14 17:10:19.75
## Median :2020-11-03 19:34:07.50 Median :2020-11-03 19:47:47.00
## Mean :2020-10-14 01:13:11.29 Mean :2020-10-14 01:28:46.64
## 3rd Qu.:2022-11-10 09:16:59.75 3rd Qu.:2022-11-10 09:26:04.75
## Max. :2024-06-30 23:59:48.00 Max. :2024-07-01 00:32:07.00
##
## Origen_Id Destino_Id Tiempo_viaje Minutos_viaje
## Min. : 2.0 Min. : 2.0 Length:28885772 Min. : 0.0
## 1st Qu.: 49.0 1st Qu.: 49.0 Class :difftime 1st Qu.: 5.7
## Median : 84.0 Median : 80.0 Mode :numeric Median : 9.4
## Mean :124.8 Mean :123.9 Mean : 15.6
## 3rd Qu.:197.0 3rd Qu.:198.0 3rd Qu.: 14.6
## Max. :390.0 Max. :390.0 Max. :629228.8
##
## Hora_inicio
## Length:28885772
## Class1:hms
## Class2:difftime
## Mode :numeric
##
##
##
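Dividing `Tiempo_viaje` by 60 yields minutes only when the difference is stored in seconds. Subtracting two date-times lets R choose the unit automatically from the smallest difference, so it is worth seeing how that choice can change. The sketch below uses made-up timestamps and shows the explicit alternative with `difftime(..., units = "mins")`.

```r
# Made-up trips of 2 hours and 3.5 hours
inicio <- as.POSIXct(c("2024-01-01 08:00:00", "2024-01-01 09:00:00"), tz = "UTC")
fin    <- as.POSIXct(c("2024-01-01 10:00:00", "2024-01-01 12:30:00"), tz = "UTC")

auto_diff <- fin - inicio        # unit is picked automatically
units(auto_diff)                 # "hours" here, because every trip lasts at least one hour

# Dividing by 60 would now give hours / 60, not minutes
wrong_minutes <- as.numeric(auto_diff / 60)

# Stating the unit explicitly removes the ambiguity
minutos <- as.numeric(difftime(fin, inicio, units = "mins"))
minutos
```

In the full data set the shortest trips last well under a minute, which would make the automatic unit seconds and is consistent with the minute values shown in the summary above. The explicit form simply does not depend on that.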
There seem to be some values far out of range for the travel time variable (the maximum is 629,228.8 minutes). We will take a practical approach and cut them out, keeping only trips that last 120 minutes or less. For reference, MiBici limits the duration of free trips to 45 minutes; beyond that, the user has to pay for the additional minutes.
It is also worth noting that trips over 120 minutes account for only 0.06% of total trips.
dfMB_ov2hrs_ttime <- filter(dfMiBici, Minutos_viaje > 120)
dfMiBici <- filter(dfMiBici, Minutos_viaje <= 120) #filter out travel time outliers above 2 hrs.
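The 0.06% figure can be cross-checked from numbers that already appear in this document: the row count reported by `summary(dfMiBici)` before filtering, and the yearly totals in the `sm_years` table shown at the end of this section, which are computed after filtering. This is a hedged arithmetic sketch; it assumes the filter is the only step that removes rows between those two outputs.

```r
# Rows reported by summary(dfMiBici) before the 120-minute filter
total_before <- 28885772

# Total_trips column of sm_years (computed after the filter), 2014 to 2024
trips_by_year_after <- c(23920, 469441, 940206, 2512150, 3400344, 4632512,
                         2864299, 3183663, 4202142, 4291962, 2346387)
total_after <- sum(trips_by_year_after)

# Share of trips removed by the filter, in percent
share_over_2h <- 100 * (total_before - total_after) / total_before
round(share_over_2h, 2)

# On the real data, before filtering, the direct equivalent would be:
# 100 * mean(dfMiBici$Minutos_viaje > 120)
```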
Initial Plots
ggplot(dfMiBici, aes(x=year(Inicio_del_viaje))) + geom_histogram(fill="blue", alpha=0.6) + labs(title = "MiBici trips by year", x = "Year")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(filter(dfMiBici, year(Inicio_del_viaje) != 2024),
aes(x=month(Inicio_del_viaje))) + geom_histogram(fill="coral3", alpha=0.6) + labs(title = "MiBici total trips by month", x = "Month")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
ggplot(dfMiBici, aes(x=Minutos_viaje)) + geom_histogram(color = "darkgrey", fill="orange", alpha = 0.6)+ labs(title = "MiBici travel time histogram", x = "Minutes of travel")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
These initial plots give us a general picture of travel patterns. Since the start of operations, MiBici has shown a steady year-by-year increase in trips, interrupted only in 2020, notably as an effect of the COVID-19 pandemic. Data for 2024 only includes the first half of the year.
Travel distribution throughout the year shows minor variations. It could be argued that warmer months (April to July) have a slight effect in decreasing bike use, as well as holidays in December.
Finally, travel time rarely exceeds 50 minutes, and most trips last less than 20 minutes, with a mean of 10.86 and a median of 9.38 minutes.
Adding a trip year field and the approximate age of the user at the time of the trip.
dfMiBici <- dfMiBici %>%
mutate(Year_trip = year(Inicio_del_viaje)) %>%
mutate(Age_aprox = Year_trip - Anio_de_nacimiento)
ggplot(dfMiBici, aes(x=Age_aprox)) + geom_histogram(color = "darkgrey", fill="darkolivegreen", alpha = 0.6)+ labs(title = "MiBici users age", x = "Years of age at the time of travel")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
## Warning: Removed 78694 rows containing non-finite outside the scale range
## (`stat_bin()`).
ggplot(dfMiBici, aes(x=Genero, fill = Genero)) + geom_bar(alpha=.6) + labs(title="Users by Gender (Male-Female)", x = "Gender / Sex")
As the two previous plots show, when the trip (rather than the individual user) is the unit of analysis, MiBici use has some predominant characteristics. Most trips are made by males (almost 74%) and by users between 25 and 35 years old.
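The male share can be reproduced from the gender counts printed by `summary(dfMiBici)` earlier in this document. Those counts were taken before the 120-minute filter, so this is an approximate check under that assumption, not the exact figure behind the plot.

```r
# Gender counts as printed by summary(dfMiBici) (before the travel-time filter)
trips_f  <- 7526710
trips_m  <- 21292159
trips_na <- 66903

# Share of trips by male users among trips with a known gender, in percent
share_m_known <- 100 * trips_m / (trips_f + trips_m)
round(share_m_known, 1)

# Share of trips by male users among all trips, unknown gender included
share_m_all <- 100 * trips_m / (trips_f + trips_m + trips_na)
round(share_m_all, 1)
```

Both versions round to just under 74%, so the conclusion does not depend on how trips with unknown gender are treated.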
sm_years <- dfMiBici %>%
group_by(Year_trip, Genero) %>%
summarise(Ttrips = n())
## `summarise()` has grouped output by 'Year_trip'. You can override using the
## `.groups` argument.
sm_years <- pivot_wider(sm_years, names_from = Genero, values_from = Ttrips)
colnames(sm_years) <- c("Year_trip", "Fem", "Male", "NAs")
sm_years <- mutate(sm_years, Total_trips = sum(Fem, Male, NAs, na.rm = T))
sm_origins <- dfMiBici %>%
group_by(Year_trip, Origen_Id) %>%
summarise(Ttrips_orig = n())
## `summarise()` has grouped output by 'Year_trip'. You can override using the
## `.groups` argument.
sm_origins <- pivot_wider(sm_origins, names_from = Year_trip, values_from = Ttrips_orig)
sm_destin <- dfMiBici %>%
group_by(Year_trip, Destino_Id) %>%
summarise(Ttrips_dest = n())
## `summarise()` has grouped output by 'Year_trip'. You can override using the
## `.groups` argument.
sm_destin <- pivot_wider(sm_destin, names_from = Year_trip, values_from = Ttrips_dest)
To conclude this phase of the analysis, we show the summarized data frames. First we see total trips by year and gender, then how these yearly totals are distributed between origins and destinations.
sm_years
## # A tibble: 11 × 5
## # Groups: Year_trip [11]
## Year_trip Fem Male NAs Total_trips
## <dbl> <int> <int> <int> <int>
## 1 2014 4299 19616 5 23920
## 2 2015 105900 363305 236 469441
## 3 2016 217578 722421 207 940206
## 4 2017 615509 1896641 NA 2512150
## 5 2018 859585 2540759 NA 3400344
## 6 2019 1189202 3437067 6243 4632512
## 7 2020 747276 2113807 3216 2864299
## 8 2021 864118 2314735 4810 3183663
## 9 2022 1134489 3049509 18144 4202142
## 10 2023 1153882 3117552 20528 4291962
## 11 2024 629938 1703281 13168 2346387
sm_origins
## # A tibble: 375 × 12
## Origen_Id `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022`
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 2 245 4741 8811 18238 23595 21387 11673 11826 12852
## 2 3 240 4186 6052 16009 19914 16351 9464 10245 15928
## 3 4 293 6271 8300 16957 24238 26988 13042 15352 22096
## 4 5 117 2572 4440 11292 11971 13408 9214 11550 16411
## 5 6 191 3453 4616 8463 10290 12528 9843 10470 11669
## 6 8 143 2869 4364 7069 9793 11318 7898 7078 9451
## 7 9 221 4001 7429 15853 19627 22421 14069 14497 17400
## 8 10 197 4335 7129 10901 14913 17814 12281 13259 17388
## 9 11 390 10035 19008 38576 57167 72230 41351 46792 65839
## 10 12 166 5586 8438 15521 20661 27218 19338 19622 23863
## # ℹ 365 more rows
## # ℹ 2 more variables: `2023` <int>, `2024` <int>
sm_destin
## # A tibble: 375 × 12
## Destino_Id `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022`
## <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
## 1 2 245 4454 8663 18704 24857 22980 12698 13131 14198
## 2 3 219 3732 5779 14204 18646 14031 7992 8839 13821
## 3 4 268 5677 7953 15572 20425 24341 12703 15073 20598
## 4 5 112 2120 3740 10762 11593 13759 9042 10956 16244
## 5 6 246 4753 6909 11108 13500 16343 12438 12378 14629
## 6 8 163 3256 4937 8140 10839 13085 8999 8239 10825
## 7 9 228 4430 8080 17557 21819 23898 15610 16520 19135
## 8 10 234 4437 7544 11703 15515 19130 12825 13974 18785
## 9 11 331 9689 18822 35528 50161 68669 38005 43293 61166
## 10 12 176 5806 8980 16278 21668 28499 20884 21455 25088
## # ℹ 365 more rows
## # ℹ 2 more variables: `2023` <int>, `2024` <int>