I’ll use this notebook to import and clean MetroBike trip data from the City of Austin. My goal is to draw insights about how Austin’s bike share program is used.
I downloaded the data from the City of Austin’s open data portal. It contains 1.8 million rows of trip data from the Austin MetroBike bicycle sharing program, covering 2017 to 2023.
The Austin MetroBike program is a partnership between CapMetro, Austin’s public transit authority, and BCycle, a bike share company based in Wisconsin. The program rents bikes to Austinites from kiosks throughout the city, and riders can return a bike to any kiosk in the system, not just the one where they checked it out. The partnership was formed in 2020. You can learn more about the system on the MetroBike website.
I’ll be using these packages to clean my data.
library("tidyverse")
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.0 ✔ stringr 1.4.1
## ✔ readr 2.1.2 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library("lubridate")
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library("janitor")
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
I’ll import my MetroBike data here.
raw_bikes <- read.csv("data-raw/metrobike_trips.csv")
raw_bikes
First, I’ll clean up the column names with janitor’s clean_names() so they all follow the same snake_case format.
named_bikes <- raw_bikes |>
  clean_names()
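To show what clean_names() does to the headers, here is a small sketch on a made-up data frame. The column names are assumptions modeled on the kind of headers read.csv() produces (it replaces spaces with dots, so "Trip ID" becomes Trip.ID); they are not copied from the real file.

```r
library(janitor)

# Hypothetical headers in the style read.csv() produces from "Trip ID", etc.
toy_raw <- data.frame(
  Trip.ID = c(1, 2),
  Checkout.Date = c("7/24/2019", "1/3/2020"),
  Bike.Type = c(NA, NA)
)

# clean_names() lower-cases each name and joins the words with underscores
toy_named <- clean_names(toy_raw)
names(toy_named)
# expected: "trip_id" "checkout_date" "bike_type"
```

Because clean_names() works the same way no matter how messy the originals are, later code can rely on names like checkout_date without checking the raw file’s capitalization or spacing.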
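Next, I’ll use lubridate’s mdy() to parse the checkout date into a proper date column. I’ll also drop the bike type column, since it holds no useful data (every value is NA), along with the month, year, and original checkout date columns, which the new date column makes redundant.
clean_bikes <- named_bikes |>
  # parse the month/day/year strings into a Date column
  mutate(date_checkout = mdy(checkout_date)) |>
  # drop the empty bike_type column and the columns the new date replaces
  select(-bike_type, -month, -year, -checkout_date)
clean_bikes # taking a peek at my results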
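The reasoning behind this step can be checked on a tiny made-up sample. This is a sketch that assumes the raw dates are written month/day/year (the order mdy() expects) and uses hypothetical values throughout. It confirms the bike type column is entirely NA before dropping it, then shows that lubridate’s month() and year() can rebuild the dropped month and year columns from the parsed date.

```r
library(dplyr)
library(lubridate)

# Hypothetical rows standing in for the output of clean_names()
toy_named <- tibble(
  trip_id = c(101, 102, 103),
  checkout_date = c("07/24/2019", "01/03/2020", "12/31/2022"),
  month = c(7, 1, 12),
  year = c(2019, 2020, 2022),
  bike_type = c(NA, NA, NA)
)

# Only drop bike_type if it really holds no data
stopifnot(all(is.na(toy_named$bike_type)))

toy_clean <- toy_named |>
  mutate(date_checkout = mdy(checkout_date)) |>
  select(-bike_type, -month, -year, -checkout_date)

# month() and year() recover the dropped columns from the parsed Date
toy_check <- toy_clean |>
  mutate(month = month(date_checkout), year = year(date_checkout))
```

On the real data, running the same all(is.na(...)) guard on named_bikes$bike_type before the select() would stop the notebook if a future download ever starts filling that column in.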
Now, I’ll export the cleaned data to an RDS file so I can load it into a separate R notebook for analysis.
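# write_rds() won't create a missing folder, so make sure it exists first
dir.create("data-processed", showWarnings = FALSE)
clean_bikes |>
  write_rds("data-processed/01-bikes.rds")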
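Saving to RDS rather than CSV keeps R-specific column types, such as the Date class of date_checkout, intact for the analysis notebook. Below is a minimal round-trip sketch on a made-up data frame, written to temporary files instead of data-processed/ so it can run anywhere. The CSV comparison uses base read.csv(), which reads date strings back as plain character.

```r
library(readr)

# Hypothetical cleaned rows with a Date column, like date_checkout
toy_clean <- data.frame(
  trip_id = c(101, 102),
  date_checkout = as.Date(c("2019-07-24", "2020-01-03"))
)

# Round-trip through a temporary RDS file: the object comes back unchanged
rds_path <- tempfile(fileext = ".rds")
write_rds(toy_clean, rds_path)
toy_back <- read_rds(rds_path)
identical(toy_clean, toy_back)

# Round-trip through a CSV: base read.csv() returns the dates as character
csv_path <- tempfile(fileext = ".csv")
write.csv(toy_clean, csv_path, row.names = FALSE)
class(read.csv(csv_path)$date_checkout)
```

The trade-off is that RDS files are only readable from R, which is fine here because the next step is another R notebook.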