Bike Data Cleaning Notebook

About this notebook

I’ll use this notebook to download and clean MetroBike data from the city of Austin. My goal is to draw insights about how Austin’s bike share program is used.

About the data:

I got the data from the city of Austin’s open data portal. It contains trip data for the Austin MetroBike bicycle sharing program dating from 2017 to 2023. It has 1.8 million rows.

The Austin MetroBike program is a partnership between CapMetro, Austin’s public transit authority, and BCycle, a public bicycle sharing company based in Wisconsin. It provides Austinites with bikes for rent at kiosks throughout the city; a rider can check out a bike at one kiosk and return it to any other. The city partnered with BCycle in 2020. You can learn more about the system on the MetroBike website.

Goals of this Notebook:

Import data
Clean dates/column names
Export cleaned data

Setup

I’ll be using these packages to clean my data.

library("tidyverse")

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
## ✔ tibble  3.1.8      ✔ dplyr   1.0.10
## ✔ tidyr   1.2.0      ✔ stringr 1.4.1 
## ✔ readr   2.1.2      ✔ forcats 0.5.2 
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library("lubridate")

## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library("janitor")

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

Import data

I’ll import my MetroBike data here.

raw_bikes <- read.csv("data-raw/metrobike_trips.csv")

raw_bikes

Clean data

First, I’ll clean up the names of the columns so they are uniform.

named_bikes <- raw_bikes |> 
  clean_names()
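
To show what clean_names() does, here is a minimal sketch on a toy data frame. The column names and values are made up for illustration and may not match the real MetroBike file.

```r
library(janitor)

# Toy data frame with hypothetical, messy column names
toy <- data.frame(
  `Trip ID` = c(1, 2),
  `Checkout Date` = c("10/26/2019", "10/27/2019"),
  `Bike Type` = c(NA, NA),
  check.names = FALSE  # keep the spaces so there is something to clean
)

# clean_names() converts column names to lowercase snake_case
toy_clean <- clean_names(toy)
names(toy_clean)
```

In this toy case the names become trip_id, checkout_date, and bike_type, which is the style the later steps rely on.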

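Next, I’ll create a column for the checkout date using lubridate’s mdy(), which parses month/day/year text into a proper date. I’ll also get rid of the bike type column, since there isn’t any useful data there (everything is N/A), along with the month, year, and original checkout date columns.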

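clean_bikes <- named_bikes |> 
  # parse the month/day/year text into a Date column
  mutate(date_checkout = mdy(checkout_date)) |> 
  # drop bike_type plus the month, year, and raw checkout_date columns
  select(-bike_type, -month, -year, -checkout_date)

clean_bikes # taking a peek at my results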
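
A quick sanity check at this point can catch dates that failed to parse and confirm that a column really is empty before it gets dropped. The sketch below runs on made-up values, not the real data. It assumes empty entries are stored as true NA values; if the file stores the literal text "N/A", the check would need to compare against that string instead.

```r
library(lubridate)

# Made-up values for illustration; the last date is deliberately malformed
toy <- data.frame(
  checkout_date = c("10/26/2019", "01/05/2020", "not a date"),
  bike_type = c(NA, NA, NA)
)

# mdy() returns NA (with a warning) for text it cannot parse;
# the warning is suppressed here only to keep the demo quiet
toy$date_checkout <- suppressWarnings(mdy(toy$checkout_date))

# Count the rows whose date failed to parse
n_failed <- sum(is.na(toy$date_checkout))

# Confirm the column holds no values before dropping it
bike_type_empty <- all(is.na(toy$bike_type))

# Earliest and latest dates among the rows that did parse
date_range <- range(toy$date_checkout, na.rm = TRUE)
```

On the real data, a nonzero n_failed would be a cue to look at the offending rows before moving on.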

Export data

Now, I’ll export my cleaned data to an RDS file, which I’ll read into a new R notebook for analysis.

clean_bikes |> 
  write_rds("data-processed/01-bikes.rds")
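
One reason to use an RDS file rather than a CSV is that it preserves column types such as dates. Here is a minimal round-trip sketch on a toy data frame; it writes to a temporary file rather than the project folder, and the values are made up.

```r
library(readr)

# Toy stand-in for the cleaned data
toy <- data.frame(
  trip_id = c(1, 2),
  date_checkout = as.Date(c("2019-10-26", "2020-01-05"))
)

# Write to a temporary file so the sketch does not touch real project files
tmp <- tempfile(fileext = ".rds")
write_rds(toy, tmp)

# Read it back, as the analysis notebook would
restored <- read_rds(tmp)

# The Date column is still a Date after the round trip
class(restored$date_checkout)
```

The analysis notebook could then load the real file with read_rds("data-processed/01-bikes.rds") without re-parsing the dates.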