library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.1.0
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
bike <- read_rds("data-processed/01-bike.rds")
bike |> glimpse()
## Rows: 1,694,087
## Columns: 12
## $ trip_id <chr> "9900285854", "9900285855", "9900285856", "99002…
## $ membership_type <chr> "Annual (San Antonio B-cycle)", "24-Hour Kiosk (…
## $ bicycle_id <chr> "207", "969", "214", "745", "164", "37", "517", …
## $ checkout_time <time> 13:12:00, 13:12:00, 13:12:00, 13:12:00, 13:12:0…
## $ checkout_kiosk_id <chr> "2537", "2498", "2537", NA, "2538", NA, "2496", …
## $ checkout_kiosk <chr> "West & 6th St.", "Convention Center / 4th St. @…
## $ return_kiosk_id <chr> "2707", "2566", "2496", NA, NA, "2545", "2561", …
## $ return_kiosk <chr> "Rainey St @ Cummings", "Pfluger Bridge @ W 2nd …
## $ trip_duration_minutes <dbl> 76, 58, 8, 28, 15, 26, 35, 11, 0, 25, 10, 29, 34…
## $ month <dbl> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, …
## $ year <dbl> 2014, 2014, 2014, 2014, 2014, 2014, 2014, 2014, …
## $ checkout_date <date> 2014-10-26, 2014-10-26, 2014-10-26, 2014-10-26,…
The original data already has month and year columns, but some of their values are NA, so I rebuilt both from checkout_date. I also added a month_name column holding the month abbreviation (an ordered factor from lubridate) so plots show month names instead of numbers.
month_year <- bike |>
  mutate(
    month = month(checkout_date),                    # rebuild month from the date (fills the NAs)
    year = year(checkout_date),                      # rebuild year from the date
    month_name = month(checkout_date, label = TRUE)  # month abbreviation as an ordered factor, for chart labels
  )
month_year
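To check what the rebuilt columns look like, here is a minimal sketch on a made-up three-row tibble (`toy` is hypothetical, standing in for `bike`). It shows that `month()` and `year()` fill in the missing values from the date, and that `label = TRUE` returns an ordered factor, which is why months sort in calendar order rather than alphabetically on charts.

```r
library(dplyr)
library(lubridate)

# Hypothetical stand-in for `bike`: dates with missing month/year values
toy <- tibble(
  checkout_date = as.Date(c("2014-10-26", "2018-03-05", "2022-06-30")),
  month = c(10, NA, NA),
  year  = c(2014, NA, NA)
)

toy_clean <- toy |>
  mutate(
    month = month(checkout_date),                    # numeric month, 1-12
    year = year(checkout_date),                      # numeric year
    month_name = month(checkout_date, label = TRUE)  # ordered factor: Jan < Feb < ... < Dec
  )

toy_clean
```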
bike_membership <- bike |>
  count(membership_type, sort = TRUE, name = "times_used")  # rides per membership type, most used first
bike_membership
By number of rides, Walk Up is the most used membership type, and the least used are the 24-Hour Membership (Austin B-cycle) and the Heartland Pass (Monthly Pay). U.T. Student Membership ranks third.
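Raw counts make it hard to see how dominant one membership type is. A minimal sketch of adding each type's share of all rides, run here on a small hypothetical tibble (`toy_rides`) rather than the real data:

```r
library(dplyr)

# Hypothetical ride records; only membership_type is needed here
toy_rides <- tibble(
  membership_type = c("Walk Up", "Walk Up", "Walk Up",
                      "U.T. Student Membership", "U.T. Student Membership",
                      "24-Hour Membership (Austin B-cycle)")
)

membership_share <- toy_rides |>
  count(membership_type, sort = TRUE, name = "times_used") |>
  mutate(share = times_used / sum(times_used))  # fraction of all rides; sums to 1

membership_share
```

On the real data, the same `mutate()` line could be appended to `bike_membership`.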
Walk Up used to be called 24-Hour Kiosk, so counting walk-up rides means combining both names.
month_year |>
filter(str_detect(membership_type, "24-Hour Kiosk")) |>
group_by(membership_type, year) |>
summarise(use = n()) |>
arrange(desc(use))
## `summarise()` has grouped output by 'membership_type'. You can override using
## the `.groups` argument.
month_year |>
  filter(membership_type %in% c("Walk Up", "24-Hour Kiosk (Austin B-cycle)")) |>  # new and old names for walk-up rides
  count(year, name = "walk_up_uses", sort = TRUE)
Walk Up memberships were most frequently used in 2014.
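Because Walk Up used to be called 24-Hour Kiosk, another option is to merge the two names into one cleaned column before counting. This is a sketch on hypothetical data (`toy_walk` and `membership_clean` are names I made up); it assumes every value starting with "24-Hour Kiosk" is the old Walk Up name, which is worth confirming against the `str_detect()` query above first.

```r
library(dplyr)
library(stringr)

# Hypothetical rides under both the old and new walk-up names
toy_walk <- tibble(
  membership_type = c("24-Hour Kiosk (Austin B-cycle)", "Walk Up", "Walk Up",
                      "U.T. Student Membership"),
  year = c(2014, 2019, 2019, 2018)
)

walk_up_by_year <- toy_walk |>
  mutate(membership_clean = if_else(
    str_detect(membership_type, "^24-Hour Kiosk"),  # old name for Walk Up
    "Walk Up",
    membership_type
  )) |>
  filter(membership_clean == "Walk Up") |>
  count(year, name = "walk_up_uses", sort = TRUE)

walk_up_by_year
```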
month_year |>
  filter(membership_type == "U.T. Student Membership") |>
  count(year, name = "ut_uses", sort = TRUE)  # UT student rides per year, busiest year first
UT student memberships were used most in 2018, mostly because student memberships were free that year. The next highest year was 2019.
Lime scooters arrived in Austin on April 16, 2018, according to the Texas Tribune.
month_year |>
filter(checkout_date < "2018-04-16", membership_type == "U.T. Student Membership") |>
summarize(uses_before_lime = n())
month_year |>
filter(checkout_date >= "2018-04-16", membership_type == "U.T. Student Membership") |>
summarise(uses_since_lime = n())
The number of rides by people with the UT student membership increased after Lime scooters arrived in Austin.
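The two queries above can also be written as one pipeline that labels each ride with a period and counts per period. A minimal sketch on hypothetical dates (`toy_ut`, `lime_arrival`, and `period` are my own names); `as.Date()` makes the date comparison explicit instead of relying on R converting the string:

```r
library(dplyr)

lime_arrival <- as.Date("2018-04-16")  # Lime arrival date from the Texas Tribune

# Hypothetical UT student rides on either side of the arrival date
toy_ut <- tibble(
  checkout_date = as.Date(c("2017-09-01", "2018-04-15", "2018-04-16",
                            "2018-09-10", "2019-02-01")),
  membership_type = "U.T. Student Membership"
)

ut_by_period <- toy_ut |>
  filter(membership_type == "U.T. Student Membership") |>
  mutate(period = if_else(checkout_date < lime_arrival,
                          "before_lime", "since_lime")) |>
  count(period, name = "uses")

ut_by_period
```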
month_year |>
filter(checkout_date < "2018-04-16") |>
summarise(uses_before_lime = n())
month_year |>
filter(checkout_date >= "2018-04-16") |>
summarise(uses_after_lime = n())
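Overall, there were about 43,000 fewer rides after Lime scooters came to Austin than before. The two periods cover different lengths of time, though, so this is a raw total rather than a rate.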
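Since the before and after periods are not the same length, rides per day is a fairer comparison than raw totals. This is a sketch on hypothetical data (`toy_all`, `lime_arrival`, and `rides_per_day` are made-up names); it approximates each period's length by the span between its first and last checkout date, which is an assumption, not how the analysis above was done.

```r
library(dplyr)

lime_arrival <- as.Date("2018-04-16")

# Hypothetical checkout dates around the arrival date
toy_all <- tibble(
  checkout_date = as.Date(c("2018-04-10", "2018-04-10", "2018-04-12",
                            "2018-04-16", "2018-04-20"))
)

rides_per_day <- toy_all |>
  mutate(period = if_else(checkout_date < lime_arrival, "before_lime", "since_lime")) |>
  group_by(period) |>
  summarise(
    rides = n(),
    days = as.numeric(max(checkout_date) - min(checkout_date)) + 1,  # calendar days spanned
    rides_per_day = rides / days
  )

rides_per_day
```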
2017 and 2019 are the last full year before and the first full year after Lime scooters arrived.
month_year |>
  filter(year == 2017) |>  # year is numeric, so compare to a number
  summarize(total_rides = n())
month_year |>
  filter(year == 2019) |>
  summarise(total_rides = n())
The total number of rides was lower in 2019 than in 2017.
rides_since_2017 <- month_year |>
  filter(year >= 2017) |>  # numeric comparison instead of comparing strings
  group_by(year, month_name) |>
  summarise(rides = n()) |>
  arrange(desc(rides))
## `summarise()` has grouped output by 'year'. You can override using the
## `.groups` argument.
rides_since_2017
March 2018 had the highest number of rides.
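Sorting all rows by rides only shows the overall peak. To find each year's busiest month, one option is `slice_max()` within each year; a sketch on hypothetical monthly totals (`toy_monthly` and `busiest_month` are made-up names). Since `rides_since_2017` is still grouped by year (see the `summarise()` message above), `slice_max(rides, n = 1)` on it should also work per year.

```r
library(dplyr)

# Hypothetical monthly ride totals
toy_monthly <- tibble(
  year = c(2017, 2017, 2018, 2018),
  month_name = c("Mar", "Oct", "Mar", "Apr"),
  rides = c(120, 150, 400, 380)
)

busiest_month <- toy_monthly |>
  group_by(year) |>
  slice_max(rides, n = 1) |>  # top month per year; ties are kept by default
  ungroup()

busiest_month
```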
month_year |>
  filter(year >= 2017) |>  # numeric comparison instead of comparing strings
  group_by(year) |>
  summarise(rides = n())
The number of rides more than doubled from 2017 to 2018. Rides dropped by more than half from 2018 to 2019 but rose again in 2021.
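Claims like "more than doubled" and "dropped by more than half" can be checked with a year-over-year percent change. A sketch on hypothetical yearly totals (`toy_yearly` and `pct_change` are made-up names, not real results); `lag()` is namespaced as `dplyr::lag()` because the startup message shows it masks `stats::lag()`, which behaves differently.

```r
library(dplyr)

# Hypothetical yearly totals, not the real ride counts
toy_yearly <- tibble(
  year = c(2017, 2018, 2019),
  rides = c(100, 250, 110)
)

yearly_change <- toy_yearly |>
  arrange(year) |>
  mutate(pct_change = 100 * (rides - dplyr::lag(rides)) / dplyr::lag(rides))

yearly_change
```

A `pct_change` above 100 means the rides more than doubled, and one below -50 means they fell by more than half.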
The 2022 data ends in June, so to compare years on equal footing I counted only the rides from the first six months of each year.
month_year |>
filter(month <= 6) |>
group_by(year) |>
summarise(rides = n()) |>
arrange(desc(rides))
2020 had the fewest rides in January through June, and 2018 and 2022 had the most.
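Instead of hard-coding 6, the cutoff month can be read from the last date in the data, so the comparison still works if newer data is added. A sketch on hypothetical dates (`toy_dates`, `cutoff_month`, and `first_months` are made-up names). One caveat under this approach: if the data ends partway through its last month, earlier years still get that whole month, so a stricter version could filter on day of year with `lubridate::yday()`.

```r
library(dplyr)
library(lubridate)

# Hypothetical checkout dates; the data "ends" in June 2022
toy_dates <- tibble(
  checkout_date = as.Date(c("2020-02-01", "2020-08-01", "2021-05-20",
                            "2021-06-30", "2022-01-15", "2022-06-03"))
)

# Last month covered by the data (6 for June here)
cutoff_month <- month(max(toy_dates$checkout_date))

first_months <- toy_dates |>
  filter(month(checkout_date) <= cutoff_month) |>
  count(year = year(checkout_date), name = "rides", sort = TRUE)

first_months
```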
MetroBike rides from UT Austin student membership holders more than doubled after Lime scooters appeared in the city in 2018, according to an analysis of data from the City of Austin open data portal.
month_year |> write_rds("data-processed/02-clean-bike.rds")
rides_since_2017 |> write_rds("data-processed/02-rides-since.rds")