MiBici Public Bikeshare System

Intro

MiBici, Guadalajara’s public bicycle-sharing system, began operations in 2014. Since its launch, the system has grown in geographical coverage and operational capacity. This initial analysis describes the main characteristics of the system and its usage patterns. It will serve as a basis for further models and analyses to inform future infrastructure planning and evaluation.

Initial Setup and Data Collection

Libraries and Setup

We load the libraries readr, dplyr, ggplot2, lubridate and tidyr.

library(readr)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(ggplot2)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tidyr)

MiBici Stations List

MiBici operates a network of stations where users can check out bicycles for a limited period to make their daily trips. According to official information, it has 3,972 bicycles and 360 stations, which serve as the origins and destinations of the system’s trips.

In the following lines we will download the list and location of the system’s stations, as well as the historical record of trips from the start of operations until the first half of 2024.

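# Download the stations list only if it is not already cached locally,
# then read it in either case (previously it was only read when the file already existed).
if(!file.exists("nomenclatura_2024_07.csv")) {
     download.file("https://www.mibici.net/site/assets/files/1118/nomenclatura_2024_07.csv",
                   "nomenclatura_2024_07.csv")
}
mbpoints <- read.csv("nomenclatura_2024_07.csv", encoding = "latin1")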

URLs List

Historical trip data is available on the MiBici open data page. Each file corresponds to a single month of operational data. However, to automate the download process, the URLs must be constructed taking into account a numerical identifier contained in each one, which does not follow an easily identifiable pattern.

This section builds the list of URLs needed to download the data sets for 2014 to 2024. Note that 2014 only includes data from December, and 2024 only covers January to June.
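The URL construction described above can be sketched as a small helper. This is only an illustration: `build_mibici_urls` and its `suffixes` argument are hypothetical names that do not appear in the original analysis, and the numerical folder ids still have to be looked up by hand on the open data page.

```r
# Hypothetical helper: builds one URL per month from a vector of folder ids.
# `suffixes` defaults to "01", "02", ... and can be overridden for irregular
# file names such as "04-1".
build_mibici_urls <- function(year, ids,
                              suffixes = sprintf("%02d", seq_along(ids))) {
  paste0("https://mibici.net/site/assets/files/", as.integer(ids),
         "/datos_abiertos_", as.integer(year), "_", suffixes, ".csv")
}

# Regular year: twelve consecutive ids, default month suffixes.
urls2018 <- build_mibici_urls(2018, 1200:1211)

# Irregular year: April 2017 is published with the suffix "04-1".
ids2017 <- c(1022, 1111, 1112, 1115, 1116, 1119, 1120, 1122:1124, 1197, 1198)
sfx2017 <- c("01", "02", "03", "04-1", sprintf("%02d", 5:12))
urls2017 <- build_mibici_urls(2017, ids2017, suffixes = sfx2017)
```

Because the helper returns a plain character vector, the result can be looped over directly with `for (i in urls2018)`.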

Y2014

mb2014 <- "https://mibici.net/site/assets/files/1079/datos_abiertos_2014_12.csv"

Y2015

rango2015 <- c(1084,1097:1107)
mb2015 <- NULL
endf <- 0
for (i in rango2015) {
      endf <- endf+1
      urlmb <- paste("https://mibici.net/site/assets/files/", 
                     i, "/datos_abiertos_2015_", 
                     ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
      mb2015 <- rbind(mb2015, urlmb)
}

Y2016

mb2016 <- c("https://www.mibici.net/site/assets/files/1020/datos_abiertos_2016_01-1.csv",
            "https://www.mibici.net/site/assets/files/1021/datos_abiertos_2016_02.csv",
            "https://www.mibici.net/site/assets/files/1023/datos_abiertos_2016_03.csv",
            "https://www.mibici.net/site/assets/files/1029/datos_abiertos_2016_04.csv",
            "https://www.mibici.net/site/assets/files/1058/datos_abiertos_2016_05.csv",
            "https://www.mibici.net/site/assets/files/1080/datos_abiertos_2016_06.csv",
            "https://www.mibici.net/site/assets/files/1081/datos_abiertos_2016_07.csv",
            "https://www.mibici.net/site/assets/files/1082/datos_abiertos_2016_08.csv",
            "https://www.mibici.net/site/assets/files/1083/datos_abiertos_2016_09.csv",
            "https://www.mibici.net/site/assets/files/1108/datos_abiertos_2016_10.csv",
            "https://www.mibici.net/site/assets/files/1109/datos_abiertos_2016_11.csv",
            "https://www.mibici.net/site/assets/files/1110/datos_abiertos_2016_12.csv")

mb2016 <- as.data.frame(mb2016)

colnames(mb2016) <- "V1"

Y2017

rango2017 <- c(1022,1111,1112,1115,1116,1119,1120,1122:1124,1197,1198)
mb2017 <- NULL
endf <- 0
for (i in rango2017) {
      endf <- endf+1
      urlmb <- paste("https://mibici.net/site/assets/files/", 
                     i, "/datos_abiertos_2017_", 
                     ifelse(endf<4, paste0("0", endf), 
                            ifelse(endf==4, paste("04-1"), ifelse(endf<10, paste0("0", endf),endf))),".csv", sep = "")
      mb2017 <- rbind(mb2017, urlmb)
}

Y2018

rango2018 <- c(1200:1211)
mb2018 <- NULL
endf <- 0
for (i in rango2018) {
      endf <- endf+1
      urlmb <- paste("https://mibici.net/site/assets/files/", 
                     i, "/datos_abiertos_2018_", 
                     ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
      mb2018 <- rbind(mb2018, urlmb)
}

Y2019

rango2019 <- c(1214, 1217:1225, 1227,1228)
mb2019 <- NULL
endf<-0
for (i in rango2019) {
      endf <- endf+1
      urlmb <- paste("https://mibici.net/site/assets/files/", 
                     i, "/datos_abiertos_2019_", 
                     ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
      mb2019 <- rbind(mb2019, urlmb)
}

Y2020

rango2020 <- c(1230:1232, 1235:1241, 1317, 1318)
mb2020 <- NULL
endf<-0
for (i in rango2020) {
      endf <- endf+1
      urlmb <- paste("https://mibici.net/site/assets/files/", 
                     i, "/datos_abiertos_2020_", 
                     ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
      mb2020 <- rbind(mb2020, urlmb)
}

Y2021

rango2021 <- c(1320,1323,1322,2129,3292,4572,5728,7073,8461,10088,11538,12780)
mb2021 <- NULL
endf<-0
for (i in rango2021) {
      endf <- endf+1
      urlmb <- paste("https://mibici.net/site/assets/files/", 
                     i, "/datos_abiertos_2021_", 
                     ifelse(endf<10, paste0("0", endf), endf), ".csv", sep = "")
      mb2021 <- rbind(mb2021, urlmb)
}

Y2022-2024

mb2022 <- c(
  "https://mibici.net/site/assets/files/14797/datos_abiertos_2022_01.csv",
  "https://mibici.net/site/assets/files/16831/datos_abiertos_2022_02.csv",
  "https://mibici.net/site/assets/files/19034/datos_abiertos_2022_03.csv",
  "https://mibici.net/site/assets/files/20338/datos_abiertos_2022_04.csv",
  "https://mibici.net/site/assets/files/21842/datos_abiertos_2022_05.csv",
  "https://mibici.net/site/assets/files/23473/datos_abiertos_2022_06.csv",
  "https://mibici.net/site/assets/files/25235/datos_abiertos_2022_07.csv",
  "https://mibici.net/site/assets/files/27484/datos_abiertos_2022_08.csv",
  "https://mibici.net/site/assets/files/29663/datos_abiertos_2022_09.csv",
  "https://mibici.net/site/assets/files/31507/datos_abiertos_2022_10.csv",
  "https://mibici.net/site/assets/files/33115/datos_abiertos_2022_11.csv", 
  "https://mibici.net/site/assets/files/34432/datos_abiertos_2022_12.csv")

mb2023 <- c (
  "https://mibici.net/site/assets/files/36762/datos_abiertos_2023_01.csv",
  "https://mibici.net/site/assets/files/40197/datos_abiertos_2023_02.csv",
  "https://mibici.net/site/assets/files/43035/datos_abiertos_2023_03.csv",
  "https://mibici.net/site/assets/files/46758/datos_abiertos_2023_04.csv",
  "https://mibici.net/site/assets/files/46759/datos_abiertos_2023_05.csv",
  "https://mibici.net/site/assets/files/48116/datos_abiertos_2023_06.csv",
  "https://mibici.net/site/assets/files/49448/datos_abiertos_2023_07.csv",
  "https://mibici.net/site/assets/files/51295/datos_abiertos_2023_08.csv",
  "https://mibici.net/site/assets/files/53344/datos_abiertos_2023_09.csv",
  "https://mibici.net/site/assets/files/54935/datos_abiertos_2023_10.csv",
  "https://mibici.net/site/assets/files/57344/datos_abiertos_2023_11.csv",
  "https://mibici.net/site/assets/files/58715/datos_abiertos_2023_12.csv")

mb2024 <- c(
  "https://mibici.net/site/assets/files/61450/datos_abiertos_2024_01.csv",
  "https://mibici.net/site/assets/files/77251/datos_abiertos_2024_02.csv",
  "https://mibici.net/site/assets/files/77252/datos_abiertos_2024_03.csv",
  "https://mibici.net/site/assets/files/80028/datos_abiertos_2024_04.csv",
  "https://mibici.net/site/assets/files/81663/datos_abiertos_2024_05.csv",
  "https://mibici.net/site/assets/files/83901/datos_abiertos_2024_06.csv")

Download and bind Data by Year

Once the URLs are listed, we can proceed to download the data.

# mb2015 and mb2017 are one-column matrices and mb2016 is a data frame,
# so flatten everything into a plain character vector before looping over the URLs.
url14_17 <- c(mb2014, mb2015, mb2016$V1, mb2017)

if(!file.exists("MiBici_2014_2017.csv")) {
  df2014_17 <- NULL
  for (i in url14_17) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2014_17 <- rbind(df2014_17, dfmibici)
  }
  rm(dfmibici)
  write_csv(df2014_17, "MiBici_2014_2017.csv")
} else df2014_17 <- read.csv("MiBici_2014_2017.csv", encoding = "latin1")

if(!file.exists("MiBici_2018.csv")) {
  df2018 <- NULL
  for (i in mb2018) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2018 <- rbind(df2018, dfmibici)
  }
  rm(dfmibici)
  write_csv(df2018, "MiBici_2018.csv")
} else df2018 <- read.csv("MiBici_2018.csv", encoding = "latin1")

if(!file.exists("MiBici_2019.csv")) {
  df2019 <- NULL
  for (i in mb2019) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2019 <- rbind(df2019, dfmibici)
  }
  rm(dfmibici)
  write_csv(df2019, "MiBici_2019.csv")
} else df2019 <- read.csv("MiBici_2019.csv", encoding = "latin1")

if(!file.exists("MiBici_2020.csv")) {
  df2020 <- NULL
  for (i in mb2020) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2020 <- rbind(df2020, dfmibici)
  }
  rm(dfmibici)
  write_csv(df2020, "MiBici_2020.csv")
} else df2020 <- read.csv("MiBici_2020.csv", encoding = "latin1")

if(!file.exists("MiBici_2021.csv")) {
  df2021 <- NULL
  for (i in mb2021) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2021 <- rbind(df2021, setNames(dfmibici, names(df2020)))
  }
  rm(dfmibici)
  write_csv(df2021, "MiBici_2021.csv")
} else df2021 <- read.csv("MiBici_2021.csv", encoding = "latin1")

if(!file.exists("MiBici_2022.csv")) {
  df2022 <- NULL
  for (i in mb2022) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2022 <- rbind(df2022, setNames(dfmibici, names(df2021)))
  }
  rm(dfmibici)
  write_csv(df2022, "MiBici_2022.csv")
} else df2022 <- read.csv("MiBici_2022.csv", encoding = "latin1")

# The 2023 data set originally mixes different formats in the date columns.
# Apparently this only happens in November, so we change the strategy here
# and treat that data frame separately.

mb2023a <- mb2023[1:10]
mb2023b <- mb2023[11]
mb2023c <- mb2023[12]

if(!file.exists("MiBici_2023a.csv")) {
  df2023a <- NULL
  for (i in mb2023a) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2023a <- rbind(df2023a, setNames(dfmibici, names(df2021)))
  }
  rm(dfmibici)
  write_csv(df2023a, "MiBici_2023a.csv")
} else df2023a <- read.csv("MiBici_2023a.csv", encoding = "latin1")

if(!file.exists("MiBici_2023b.csv")) {
  df2023b <- NULL
  for (i in mb2023b) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2023b <- rbind(df2023b, setNames(dfmibici, names(df2021)))
  }
  rm(dfmibici)
  write_csv(df2023b, "MiBici_2023b.csv")
} else df2023b <- read.csv("MiBici_2023b.csv", encoding = "latin1")


if(!file.exists("MiBici_2023c.csv")) {
  df2023c <- NULL
  for (i in mb2023c) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2023c <- rbind(df2023c, setNames(dfmibici, names(df2021)))
  }
  rm(dfmibici)
  write_csv(df2023c, "MiBici_2023c.csv")
} else df2023c <- read.csv("MiBici_2023c.csv", encoding = "latin1")

if(!file.exists("MiBici_2024.csv")) {
  df2024 <- NULL
  for (i in mb2024) {
        dfmibici <- read.csv(i, encoding = "latin1")
        df2024 <- rbind(df2024, setNames(dfmibici, names(df2021)))
  }
  rm(dfmibici)
  write_csv(df2024, "MiBici_2024.csv")
} else df2024 <- read.csv("MiBici_2024.csv", encoding = "latin1")
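
The download-and-cache pattern repeated above for each year could be wrapped in a single function. The following is a minimal sketch under stated assumptions: `load_or_build` is a hypothetical name, it uses base `write.csv` rather than `write_csv`, and the demo reads two small local files that stand in for the monthly downloads so that it runs without network access.

```r
# Hypothetical helper: return the cached table if it exists; otherwise read
# every source, optionally harmonise column names, bind, and cache the result.
load_or_build <- function(cache_file, sources, col_names = NULL) {
  if (file.exists(cache_file)) {
    return(read.csv(cache_file, encoding = "latin1"))
  }
  parts <- lapply(sources, function(src) {
    part <- read.csv(src, encoding = "latin1")
    if (!is.null(col_names)) part <- setNames(part, col_names)
    part
  })
  combined <- do.call(rbind, parts)
  write.csv(combined, cache_file, row.names = FALSE)
  combined
}

# Demo: two toy monthly files (made-up values, not MiBici data).
src1 <- tempfile(fileext = ".csv")
src2 <- tempfile(fileext = ".csv")
write.csv(data.frame(Viaje_Id = 1:2, Origen_Id = c(2, 3)), src1, row.names = FALSE)
write.csv(data.frame(Viaje_Id = 3:4, Origen_Id = c(4, 5)), src2, row.names = FALSE)

cache_file <- tempfile(fileext = ".csv")
first_run  <- load_or_build(cache_file, c(src1, src2))  # builds and caches
second_run <- load_or_build(cache_file, c(src1, src2))  # reads the cache
```

Collecting the parts in a list and binding once with `do.call(rbind, ...)` avoids growing the data frame inside the loop, which can matter with several million rows per year.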

Adjust Date Columns data type

The data for November 2023 comes in a different date-time format at the source. We will correct this before binding together the data frames for the different years.

df2023b$Inicio_del_viaje = as_datetime(df2023b$Inicio_del_viaje, format = "%d/%m/%Y %H:%M")
df2023b$Fin_del_viaje = as_datetime(df2023b$Fin_del_viaje, format = "%d/%m/%Y %H:%M")
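
As an alternative to splitting the data by format, a column with mixed date-time formats can be parsed in two passes. The sketch below uses base R only; `parse_mixed_datetime` is a hypothetical helper, and it assumes the two formats involved are `%Y-%m-%d %H:%M:%S` and the `%d/%m/%Y %H:%M` format used above for November 2023. The sample strings are made up.

```r
# Hypothetical helper: try the ISO-like format first, then fill the values
# that failed to parse using the day/month/year format.
parse_mixed_datetime <- function(x) {
  out <- as.POSIXct(x, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
  missing <- is.na(out)
  out[missing] <- as.POSIXct(x[missing], format = "%d/%m/%Y %H:%M", tz = "UTC")
  out
}

raw <- c("2023-10-31 23:58:10", "01/11/2023 08:15", "not a date")
parsed <- parse_mixed_datetime(raw)
```

Values that match neither format stay `NA`, so they can be counted and inspected afterwards.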

We group the data frames according to their date-time column format before converting them.

dfMiBici_a <- bind_rows(df2014_17, df2018, df2019, df2020) #check datetime formatting
dfMiBici_b <- bind_rows(df2021, df2022) #check datetime formatting

dfMiBici_a$Inicio_del_viaje <- as_datetime(dfMiBici_a$Inicio_del_viaje)
dfMiBici_a$Fin_del_viaje <- as_datetime(dfMiBici_a$Fin_del_viaje)

dfMiBici_b$Inicio_del_viaje <- as_datetime(dfMiBici_b$Inicio_del_viaje)
dfMiBici_b$Fin_del_viaje <- as_datetime(dfMiBici_b$Fin_del_viaje)

dfMiBici_ab <- bind_rows(dfMiBici_a, dfMiBici_b) #check datetime formatting
rm(dfMiBici_a, dfMiBici_b)
gc()

##             used   (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells  47631042 2543.8   92717744 4951.7  56382687 3011.2
## Vcells 514237582 3923.4  739172296 5639.5 737983057 5630.4

df2023a$Inicio_del_viaje <- as_datetime(df2023a$Inicio_del_viaje)
df2023a$Fin_del_viaje <- as_datetime(df2023a$Fin_del_viaje)

df2023c$Inicio_del_viaje <- as_datetime(df2023c$Inicio_del_viaje)
df2023c$Fin_del_viaje <- as_datetime(df2023c$Fin_del_viaje)

df2024$Inicio_del_viaje <- as_datetime(df2024$Inicio_del_viaje)
df2024$Fin_del_viaje <- as_datetime(df2024$Fin_del_viaje)

Binding data frames 2014-2024

dfMiBici <- bind_rows(dfMiBici_ab, df2023a, df2023b, df2023c, df2024)
  
rm(df2014_17, df2018, df2019, df2020, df2021, df2022, df2023a, df2023b, df2023c, df2024, dfMiBici_ab)

gc()

##             used   (Mb) gc trigger   (Mb)  max used   (Mb)
## Ncells    937822   50.1   74174196 3961.4  56382687 3011.2
## Vcells 208431670 1590.3  591337837 4511.6 737983057 5630.4

Data Exploration and Cleaning

summary(dfMiBici)

##     Viaje_Id          Usuario_Id         Genero          AÃ.o_de_nacimiento
##  Min.   :    4601   Min.   :      3   Length:28885772    Min.   :   1      
##  1st Qu.: 8738406   1st Qu.: 166711   Class :character   1st Qu.:1982      
##  Median :17508282   Median : 403540   Mode  :character   Median :1989      
##  Mean   :17385771   Mean   : 625403                      Mean   :1986      
##  3rd Qu.:26055293   3rd Qu.: 706031                      3rd Qu.:1994      
##  Max.   :34524981   Max.   :4044423                      Max.   :2021      
##                                                          NA's   :10897672  
##  Inicio_del_viaje                 Fin_del_viaje                   
##  Min.   :2014-12-01 00:33:47.00   Min.   :2014-12-01 00:36:54.00  
##  1st Qu.:2018-12-14 16:58:07.75   1st Qu.:2018-12-14 17:10:19.75  
##  Median :2020-11-03 19:34:07.50   Median :2020-11-03 19:47:47.00  
##  Mean   :2020-10-14 01:13:11.29   Mean   :2020-10-14 01:28:46.64  
##  3rd Qu.:2022-11-10 09:16:59.75   3rd Qu.:2022-11-10 09:26:04.75  
##  Max.   :2024-06-30 23:59:48.00   Max.   :2024-07-01 00:32:07.00  
##                                                                   
##    Origen_Id       Destino_Id    AÃƒ.o_de_nacimiento
##  Min.   :  2.0   Min.   :  2.0   Min.   :   1       
##  1st Qu.: 49.0   1st Qu.: 49.0   1st Qu.:1984       
##  Median : 84.0   Median : 80.0   Median :1992       
##  Mean   :124.8   Mean   :123.9   Mean   :1989       
##  3rd Qu.:197.0   3rd Qu.:198.0   3rd Qu.:1997       
##  Max.   :390.0   Max.   :390.0   Max.   :2023       
##                                  NA's   :18067697

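There are a couple of minor issues with the year-of-birth columns: the accented column name appears to have been mis-encoded in two different ways, which split the variable into two columns. We will fix the column names and combine the year-of-birth columns into one.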

colnames(dfMiBici) <- c("Viaje_Id","Usuario_Id","Genero", "Anio_de_nacimiento", "Inicio_del_viaje", "Fin_del_viaje","Origen_Id","Destino_Id", "Anio_de_nac2")

dfMiBici <- dfMiBici %>%
  mutate(Anio_de_nacimiento = ifelse(is.na(Anio_de_nacimiento), Anio_de_nac2, Anio_de_nacimiento))

dfMiBici <- dfMiBici[,1:8]
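
The same merge can also be written with `dplyr::coalesce()`, which takes the first non-missing value across its arguments. This is a sketch on a toy data frame with made-up values, not the code used for the analysis.

```r
library(dplyr)

# Toy data: each row has the birth year in at most one of the two columns.
toy <- data.frame(
  Anio_de_nacimiento = c(1980, NA, NA),
  Anio_de_nac2       = c(NA, 1990, NA)
)

toy <- toy %>%
  mutate(Anio_de_nacimiento = coalesce(Anio_de_nacimiento, Anio_de_nac2)) %>%
  select(-Anio_de_nac2)  # drop the helper column by name rather than by position
```

Dropping the second column by name avoids relying on it being the ninth column.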

unique(dfMiBici$Genero)

## [1] "M"    "F"    NA     ""     "NULL"

Next we fix the gender and year-of-birth values: empty and "NULL" gender entries become NA, as do birth years before 1900.

dfMiBici <- dfMiBici %>%
  mutate(Genero = ifelse(Genero=="NULL", NA, Genero)) %>%
  mutate(Genero = ifelse(Genero=="", NA, Genero)) %>%
  mutate(Anio_de_nacimiento = ifelse(Anio_de_nacimiento < 1900, NA, Anio_de_nacimiento))
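
The recoding above can be checked on a small vector. The sketch below uses base R and made-up values that mirror the categories listed by `unique()`; the variable names are illustrative only.

```r
# Gender codes as they appear in the raw data, including the two
# placeholder values that should be treated as missing.
genero <- c("M", "F", NA, "", "NULL")
genero_clean <- replace(genero, genero %in% c("", "NULL"), NA)
genero_factor <- factor(genero_clean)

# Birth years below 1900 are treated as data-entry errors.
anio <- c(1, 1917, 1989, NA)
anio_clean <- replace(anio, !is.na(anio) & anio < 1900, NA)
```

Using `%in%` sidesteps the `NA` that a plain `==` comparison returns for missing values.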

Feature Engineering

We will transform the gender variable into a factor and create a few columns to facilitate travel-time analysis.

dfMiBici <- dfMiBici %>%
  mutate(Genero = as.factor(Genero)) %>%
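  # Set the units explicitly: plain subtraction lets R choose them automatically
  mutate(Tiempo_viaje = difftime(Fin_del_viaje, Inicio_del_viaje, units = "secs")) %>%
  mutate(Minutos_viaje = as.numeric(Tiempo_viaje, units = "mins")) %>%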
  mutate(Minutos_viaje = round(Minutos_viaje, 2)) %>%
  mutate(Hora_inicio = hms::as_hms(Inicio_del_viaje))
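
When two date-time vectors are subtracted, R picks the units of the resulting `difftime` automatically, so dividing by 60 only yields minutes if those units happen to be seconds. A minimal sketch with made-up timestamps, requesting minutes explicitly:

```r
# Two toy trips: one of 12.5 minutes and one of 2 hours.
inicio <- as.POSIXct(c("2024-01-01 08:00:00", "2024-01-01 09:00:00"), tz = "UTC")
fin    <- as.POSIXct(c("2024-01-01 08:12:30", "2024-01-01 11:00:00"), tz = "UTC")

auto_units <- units(fin - inicio)  # whatever R decides for this input
minutos <- as.numeric(difftime(fin, inicio, units = "mins"))
```

Here `minutos` holds 12.5 and 120 regardless of the value of `auto_units`.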

summary(dfMiBici)

##     Viaje_Id          Usuario_Id       Genero         Anio_de_nacimiento
##  Min.   :    4601   Min.   :      3   F   : 7526710   Min.   :1917      
##  1st Qu.: 8738406   1st Qu.: 166711   M   :21292159   1st Qu.:1982      
##  Median :17508282   Median : 403540   NA's:   66903   Median :1990      
##  Mean   :17385771   Mean   : 625403                   Mean   :1987      
##  3rd Qu.:26055293   3rd Qu.: 706031                   3rd Qu.:1995      
##  Max.   :34524981   Max.   :4044423                   Max.   :2023      
##                                                       NA's   :80511     
##  Inicio_del_viaje                 Fin_del_viaje                   
##  Min.   :2014-12-01 00:33:47.00   Min.   :2014-12-01 00:36:54.00  
##  1st Qu.:2018-12-14 16:58:07.75   1st Qu.:2018-12-14 17:10:19.75  
##  Median :2020-11-03 19:34:07.50   Median :2020-11-03 19:47:47.00  
##  Mean   :2020-10-14 01:13:11.29   Mean   :2020-10-14 01:28:46.64  
##  3rd Qu.:2022-11-10 09:16:59.75   3rd Qu.:2022-11-10 09:26:04.75  
##  Max.   :2024-06-30 23:59:48.00   Max.   :2024-07-01 00:32:07.00  
##                                                                   
##    Origen_Id       Destino_Id    Tiempo_viaje      Minutos_viaje     
##  Min.   :  2.0   Min.   :  2.0   Length:28885772   Min.   :     0.0  
##  1st Qu.: 49.0   1st Qu.: 49.0   Class :difftime   1st Qu.:     5.7  
##  Median : 84.0   Median : 80.0   Mode  :numeric    Median :     9.4  
##  Mean   :124.8   Mean   :123.9                     Mean   :    15.6  
##  3rd Qu.:197.0   3rd Qu.:198.0                     3rd Qu.:    14.6  
##  Max.   :390.0   Max.   :390.0                     Max.   :629228.8  
##                                                                      
##  Hora_inicio      
##  Length:28885772  
##  Class1:hms       
##  Class2:difftime  
##  Mode  :numeric   
##                   
##                   
##

There seem to be some values way out of range for the travel-time variable. We will take a practical approach and cut them out, keeping only trips that last 120 minutes or less. For reference, MiBici limits free trips to 45 minutes; beyond that, the user has to pay for the additional minutes.

It is also worth noting that trips over 120 minutes account for only 0.06% of total trips.

dfMB_ov2hrs_ttime <- filter(dfMiBici, Minutos_viaje > 120)
dfMiBici <- filter(dfMiBici, Minutos_viaje <= 120) #filter out travel time outliers above 2 hrs.
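
The share of trips removed by such a cutoff can be computed as the mean of a logical vector. The example below is a sketch on eight made-up durations, so the resulting percentage is illustrative and unrelated to the 0.06% reported for the real data.

```r
# Toy trip durations in minutes, including two values above the cutoff.
minutos <- c(5.7, 9.4, 14.6, 45, 119.9, 120, 130.5, 629228.8)
limite <- 120

sobre_limite <- minutos > limite
pct_sobre_limite <- 100 * mean(sobre_limite)  # share of trips above the cutoff
minutos_filtrado <- minutos[!sobre_limite]    # trips kept (<= 120 minutes)
```

A trip of exactly 120 minutes is kept, matching the `<=` condition in the filter.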

Visualizing Data

Initial Plots

ggplot(dfMiBici, aes(x=year(Inicio_del_viaje))) + geom_histogram(fill="blue", alpha=0.6) + labs(title = "MiBici trips by year", x = "Year")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(filter(dfMiBici, year(Inicio_del_viaje) != 2024), 
       aes(x=month(Inicio_del_viaje))) + geom_histogram(fill="coral3", alpha=0.6) + labs(title = "MiBici total trips by month", x = "Month")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

ggplot(dfMiBici, aes(x=Minutos_viaje)) + geom_histogram(color = "darkgrey", fill="orange", alpha = 0.6)+ labs(title = "MiBici travel time histogram", x = "Minutes of travel")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

These initial plots give us a general picture of travel patterns. Since the start of operations, MiBici has shown a steady year-over-year increase in trips, interrupted only in 2020, notably as an effect of the COVID-19 pandemic. Data for 2024 only includes the first half of the year.

The distribution of trips throughout the year shows minor variations. It could be argued that the warmer months (April to July) slightly decrease bike use, as do the December holidays.

Finally, travel time rarely exceeds 50 minutes, and most trips last less than 20 minutes, with a mean of 10.86 and a median of 9.38 minutes.
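
The `stat_bin()` messages shown with the year and month plots appear because `geom_histogram()` treats those variables as continuous and splits them into 30 bins. One possible alternative, sketched here on a toy data frame with made-up values, is to treat the year as a discrete category and count rows with `geom_bar()`.

```r
library(ggplot2)

# Toy data: six trips spread over three years.
toy <- data.frame(Year_trip = c(2019, 2019, 2020, 2021, 2021, 2021))

p <- ggplot(toy, aes(x = factor(Year_trip))) +
  geom_bar(fill = "blue", alpha = 0.6) +
  labs(title = "MiBici trips by year", x = "Year", y = "Trips")

# Counts behind the bars, in the order of the x axis.
bar_counts <- ggplot_build(p)$data[[1]]$count
```

Each bar then corresponds to exactly one year, and no bin width needs to be chosen.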

Processing Summaries

More feature engineering

We add a trip-year field and the approximate age of the user at the time of the trip.

dfMiBici <- dfMiBici %>%
  mutate(Year_trip = year(Inicio_del_viaje)) %>%
  mutate(Age_aprox = Year_trip - Anio_de_nacimiento)

ggplot(dfMiBici, aes(x=Age_aprox)) + geom_histogram(color = "darkgrey", fill="darkolivegreen", alpha = 0.6)+ labs(title = "MiBici users age", x = "Years of age at the time of travel")

## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

## Warning: Removed 78694 rows containing non-finite outside the scale range
## (`stat_bin()`).

ggplot(dfMiBici, aes(x=Genero, fill = Genero)) + geom_bar(alpha=.6) + labs(title="Users by Gender (Male-Female)", x = "Gender / Sex")

As the two previous plots show, taking the user-trip as the unit of analysis, MiBici users have some predominant characteristics: most trips are made by males (almost 74%) and by users between 25 and 35 years old.
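
The gender share can be reproduced from the counts printed by `summary(dfMiBici)` earlier in the document. Note the assumption: those counts were taken before the trips over 120 minutes were filtered out, so the result is only an approximation of the share in the filtered data.

```r
# Trip counts by gender, copied from the summary() output above.
genero_counts <- c(F = 7526710, M = 21292159, Missing = 66903)

# Percentage of all trips in each category, rounded to one decimal place.
genero_pct <- round(100 * genero_counts / sum(genero_counts), 1)
```

With these counts, male users account for roughly 73.7% of all trips, in line with the "almost 74%" figure.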

sm_years <- dfMiBici %>%
  group_by(Year_trip, Genero) %>%
  summarise(Ttrips = n())

## `summarise()` has grouped output by 'Year_trip'. You can override using the
## `.groups` argument.

sm_years <- pivot_wider(sm_years, names_from = Genero, values_from = Ttrips)

colnames(sm_years) <- c("Year_trip", "Fem", "Male", "NAs")

sm_years <- mutate(sm_years, Total_trips = sum(Fem, Male, NAs, na.rm = T))

sm_origins <- dfMiBici %>%
  group_by(Year_trip, Origen_Id) %>%
  summarise(Ttrips_orig = n())

## `summarise()` has grouped output by 'Year_trip'. You can override using the
## `.groups` argument.

sm_origins <- pivot_wider(sm_origins, names_from = Year_trip, values_from = Ttrips_orig)

sm_destin <- dfMiBici %>%
  group_by(Year_trip, Destino_Id) %>%
  summarise(Ttrips_dest = n())

## `summarise()` has grouped output by 'Year_trip'. You can override using the
## `.groups` argument.

sm_destin <- pivot_wider(sm_destin, names_from = Year_trip, values_from = Ttrips_dest)
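
The year-by-gender summary above assigns column names by position, which depends on the order in which `pivot_wider()` creates the columns. A possible variant, sketched on a toy data frame with made-up values, recodes the categories before pivoting so that the names never need to be overwritten.

```r
library(dplyr)
library(tidyr)

# Toy data: five trips, one with unknown gender.
trips <- data.frame(
  Year_trip = c(2019, 2019, 2019, 2020, 2020),
  Genero    = factor(c("F", "M", "M", "M", NA))
)

sm_years_toy <- trips %>%
  count(Year_trip, Genero, name = "Ttrips") %>%
  # Give every category, including missing values, an explicit label.
  mutate(Genero = ifelse(is.na(Genero), "NAs",
                         ifelse(Genero == "F", "Fem", "Male"))) %>%
  pivot_wider(names_from = Genero, values_from = Ttrips, values_fill = 0) %>%
  mutate(Total_trips = Fem + Male + NAs)
```

`values_fill = 0` replaces the year-gender combinations that do not occur with zero, so the row totals can be computed with a plain sum.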

To conclude this phase of the analysis, we show the summarized data frames. First we see total trips by year and gender, then how these yearly totals are distributed between origins and destinations.

sm_years

## # A tibble: 11 × 5
## # Groups:   Year_trip [11]
##    Year_trip     Fem    Male   NAs Total_trips
##        <dbl>   <int>   <int> <int>       <int>
##  1      2014    4299   19616     5       23920
##  2      2015  105900  363305   236      469441
##  3      2016  217578  722421   207      940206
##  4      2017  615509 1896641    NA     2512150
##  5      2018  859585 2540759    NA     3400344
##  6      2019 1189202 3437067  6243     4632512
##  7      2020  747276 2113807  3216     2864299
##  8      2021  864118 2314735  4810     3183663
##  9      2022 1134489 3049509 18144     4202142
## 10      2023 1153882 3117552 20528     4291962
## 11      2024  629938 1703281 13168     2346387

sm_origins

## # A tibble: 375 × 12
##    Origen_Id `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022`
##        <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
##  1         2    245   4741   8811  18238  23595  21387  11673  11826  12852
##  2         3    240   4186   6052  16009  19914  16351   9464  10245  15928
##  3         4    293   6271   8300  16957  24238  26988  13042  15352  22096
##  4         5    117   2572   4440  11292  11971  13408   9214  11550  16411
##  5         6    191   3453   4616   8463  10290  12528   9843  10470  11669
##  6         8    143   2869   4364   7069   9793  11318   7898   7078   9451
##  7         9    221   4001   7429  15853  19627  22421  14069  14497  17400
##  8        10    197   4335   7129  10901  14913  17814  12281  13259  17388
##  9        11    390  10035  19008  38576  57167  72230  41351  46792  65839
## 10        12    166   5586   8438  15521  20661  27218  19338  19622  23863
## # ℹ 365 more rows
## # ℹ 2 more variables: `2023` <int>, `2024` <int>

sm_destin

## # A tibble: 375 × 12
##    Destino_Id `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022`
##         <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
##  1          2    245   4454   8663  18704  24857  22980  12698  13131  14198
##  2          3    219   3732   5779  14204  18646  14031   7992   8839  13821
##  3          4    268   5677   7953  15572  20425  24341  12703  15073  20598
##  4          5    112   2120   3740  10762  11593  13759   9042  10956  16244
##  5          6    246   4753   6909  11108  13500  16343  12438  12378  14629
##  6          8    163   3256   4937   8140  10839  13085   8999   8239  10825
##  7          9    228   4430   8080  17557  21819  23898  15610  16520  19135
##  8         10    234   4437   7544  11703  15515  19130  12825  13974  18785
##  9         11    331   9689  18822  35528  50161  68669  38005  43293  61166
## 10         12    176   5806   8980  16278  21668  28499  20884  21455  25088
## # ℹ 365 more rows
## # ℹ 2 more variables: `2023` <int>, `2024` <int>

MiBici Analytics

Luis Santana

03.09.2024