Divvy is a bike share system that has hundreds of bikes and docking stations available for use all around the Chicagoland area
This case study uses historical bike trip data to analyze how the two different types of riders use the Divvy Bike Share system.
Divvy bike riders are grouped into two types of users: Casual and Member. Casual riders are those who purchase a single-ride pass or a day pass. Members are those riders who subscribe to a monthly or annual plan.
The purpose of this case study is to analyze how these two different types of riders use Divvy, and to recommend ways to convert Casual riders into Members
The data was obtained from Divvy via: https://divvybikes.com/system-data
Each trip is anonymized and includes:
Trip start day and time
Trip end day and time
Trip start station
Trip end station
Rider type (Member, Single Ride, and Day Pass)
The data has been processed to remove trips that are taken by staff as they service and inspect the system; and any trips that were below 60 seconds in length (potentially false starts or users trying to re-dock a bike to ensure it was secure).
My first step was to download the monthly trip data for 2022 from the Divvy website.
Then using RStudio I started a script to import the individual monthly .csv files in order to combine and clean the data
First I had to insall the R packages I would be using:
install.packages("tidyverse")
library(tidyverse)
library(lubridate)
library(ggplot2)
Then I read each monthly .csv file into individual data frames:
jan2022 <- read_csv("202201-divvy-tripdata.csv")
feb2022 <- read_csv("202202-divvy-tripdata.csv")
mar2022 <- read_csv("202203-divvy-tripdata.csv")
apr2022 <- read_csv("202204-divvy-tripdata.csv")
may2022 <- read_csv("202205-divvy-tripdata.csv")
jun2022 <- read_csv("202206-divvy-tripdata.csv")
jul2022 <- read_csv("202207-divvy-tripdata.csv")
aug2022 <- read_csv("202208-divvy-tripdata.csv")
sep2022 <- read_csv("202209-divvy-tripdata.csv")
oct2022 <- read_csv("202210-divvy-tripdata.csv")
nov2022 <- read_csv("202211-divvy-tripdata.csv")
dec2022 <- read_csv("202212-divvy-tripdata.csv")
Next, I combined these twelve data frames into one for the whole year’s data:
alltrips2022 <- bind_rows(jan2022, feb2022, mar2022, apr2022, may2022, jun2022, jul2022, aug2022, sep2022, oct2022, nov2022, dec2022)
With the new combined data frame I added a few new columns to assist with analysis:
#add a new column to the data frame called "date" that takes just the date portion of the "started_at" column
alltrips2022$date <- as.Date(alltrips2022$started_at, format="%m/%d/%Y")
#use the new date column to add 2 additional columns with just month and day respectively
alltrips2022$month <- format(as.Date(alltrips2022$date), "%m")
alltrips2022$day <- format(as.Date(alltrips2022$date), "%d")
#creates a new column titled day_of_week that contains the name of the day of the week
alltrips2022$day_of_week <- format(as.Date(alltrips2022$date), "%A")
I then, created another new column to calculate the length of each ride:
alltrips2022$ride_length <- difftime(alltrips2022$ended_at, alltrips2022$started_at)#calculates the difference in seconds between start and end times
alltrips2022$ride_length <- as.numeric(as.character(alltrips2022$ride_length))#converts the ride_length column values into numeric
I then performed some cleaning on the data to remove unnecessary data:
#this function cleans the data by removing any rows where the starting station is "HQ QR" or the ride length is a negative number
#it also saves the cleaned data into a new data frame titled alltrips2022_v2 in order to preserve old data
alltrips2022_v2 <- alltrips2022[!(alltrips2022$start_station_name == "HQ QR" | alltrips2022$ride_length<0),]
Finally, I export this new combined and cleaned data frame to a new .csv file:
write.csv(alltrips2022_v2, file = 'alltrips2022-divvy-tripdata.csv')
This clean data is now ready to import in to Tableau for future visualization