Divvy Case Study

This is my analysis for the GOOGLE Data Analysis Professional Certificate Cyclistic Case Study hosted by COURSERA

SCENARIO

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

Characters and teams
● Cyclistic: A bike-share program that features more than 5,800 bicycles and 600 docking stations. Cyclistic sets itself apart by also offering reclining bikes, hand tricycles, and cargo bikes, making bike-share more inclusive to people with disabilities and riders who can’t use a standard two-wheeled bike. The majority of riders opt for traditional bikes; about 8% of riders use the assistive options. Cyclistic users are more likely to ride for leisure, but about 30% use them to commute to work each day.
● Lily Moreno: The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
● Cyclistic marketing analytics team: A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy. You joined this team six months ago and have been busy learning about Cyclistic’s mission and business goals — as well as how you, as a junior data analyst, can help Cyclistic achieve them.
● Cyclistic executive team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

Moreno has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

PREPARE & PROCESS

Lets load some packages

library(DescTools)
library(readr)
library(tidyverse)
library(lubridate)

The reliability of the datasource can be varified utilzing the Reliable, Original, Comprehensive, Current, and Cited (R.O.C.C.C) method

Reliable: The data was provided from the Cyclistic marketing analytic team.
Original: The data is orginal due to it being gathered firsthand from the Cyclistic marketing analytic team.
Comprehensive: This data is comprehensive because it contains all fields needed to determine differences between Casual and Member ride share with no NULL fields.
Current: The data provided was previously udated on a quaterly basis then on a monthly basis up to January 2023.
Cited: the data set provided is publicly accessible hosted by the Cyclistic marketing analytic team.

Now lets load the last 12 months of the Divvy trip data. Link to data source bin here

Then assign the dataset to their respective months.

January_22 <- tibble(read.csv("202201-divvy-tripdata.csv"))
February_22 <- tibble(read.csv("202202-divvy-tripdata.csv"))
March_22 <- tibble(read.csv("202203-divvy-tripdata.csv"))
April_22 <- tibble(read.csv('202204-divvy-tripdata.csv'))
May_22 <- tibble(read.csv("202205-divvy-tripdata.csv"))
June_22 <- tibble(read.csv("202206-divvy-tripdata.csv"))
July_22 <- tibble(read.csv('202207-divvy-tripdata.csv'))
August_22 <- tibble(read.csv('202208-divvy-tripdata.csv'))
September_22 <- tibble(read.csv("202209-divvy-publictripdata.csv"))
October_22 <- tibble(read.csv('202210-divvy-tripdata.csv'))
November_22 <- tibble(read.csv('202211-divvy-tripdata.csv'))
December_22 <- tibble(read.csv('202212-divvy-tripdata.csv'))

Then I binded them into 12 month dataset named ‘df’,

df <- bind_rows(January_22, February_22, March_22, April_22, May_22, June_22, July_22, August_22, September_22, October_22, November_22, December_22)

Now I need to find the Ride length so I used difftime to create a new column Ride_Length in Minutes

df$Ride_Length <- (difftime(df$ended_at, df$started_at, units = 'mins'))
summary(df$Ride_Length)

##   Length    Class     Mode 
##  5667717 difftime  numeric

Then I created a week day field with wday to find which day the bicycle was rented on.

df$Started_at_weekday <- wday(df$started_at, label = TRUE, abbr = FALSE)
summary(df$Started_at_weekday)

##    Sunday    Monday   Tuesday Wednesday  Thursday    Friday  Saturday 
##    776259    751014    782372    798223    841591    801787    916471

summary of df dataset

summary(df)

##    ride_id          rideable_type       started_at          ended_at        
##  Length:5667717     Length:5667717     Length:5667717     Length:5667717    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_name start_station_id   end_station_name   end_station_id    
##  Length:5667717     Length:5667717     Length:5667717     Length:5667717    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    start_lat       start_lng         end_lat         end_lng      
##  Min.   :41.64   Min.   :-87.84   Min.   : 0.00   Min.   :-88.14  
##  1st Qu.:41.88   1st Qu.:-87.66   1st Qu.:41.88   1st Qu.:-87.66  
##  Median :41.90   Median :-87.64   Median :41.90   Median :-87.64  
##  Mean   :41.90   Mean   :-87.65   Mean   :41.90   Mean   :-87.65  
##  3rd Qu.:41.93   3rd Qu.:-87.63   3rd Qu.:41.93   3rd Qu.:-87.63  
##  Max.   :45.64   Max.   :-73.80   Max.   :42.37   Max.   :  0.00  
##                                   NA's   :5858    NA's   :5858    
##  member_casual      Ride_Length       Started_at_weekday
##  Length:5667717     Length:5667717    Sunday   :776259  
##  Class :character   Class :difftime   Monday   :751014  
##  Mode  :character   Mode  :numeric    Tuesday  :782372  
##                                       Wednesday:798223  
##                                       Thursday :841591  
##                                       Friday   :801787  
##                                       Saturday :916471

Now that I finished wrangling the df dataset. I am now able to start analyzing.

ANALYZE

mean of the Ride Length

Mean(df$Ride_Length)

## Time difference of 19.44364 mins

The Max Ride Length

max(df$Ride_Length)

## Time difference of 41387.25 mins

The mode or most frequent week day the bicycles are utilized are

Mode(df$Started_at_weekday)

## [1] Saturday
## attr(,"freq")
## [1] 916471
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday

Finding the average ride length for members and casual riders.

I created a pivot table of ride length and member_casual field, filtering by Members and Casual riders respectively

Average ride length for members

df_memebrs_Ride_Length <- df %>%
  select(Ride_Length, member_casual) %>%
  filter(member_casual == 'member')

mean(df_memebrs_Ride_Length$Ride_Length)

## Time difference of 12.71062 mins

Average ride length for Casual riders

df_casual_Ride_Length <- df %>%
  select(Ride_Length, member_casual, Started_at_weekday) %>%
  filter(member_casual == 'casual')

mean(df_casual_Ride_Length$Ride_Length)

## Time difference of 29.14486 mins

Average ride length for each week day

Average_Ride_Length_Weekday <- df %>%
  select(Ride_Length, Started_at_weekday) %>%
  group_by(Started_at_weekday)%>%
  summarise(Average_Ride_Length = mean(Ride_Length)) %>%
  arrange(desc(Average_Ride_Length))


Average_Ride_Length_Weekday

## # A tibble: 7 × 2
##   Started_at_weekday Average_Ride_Length
##   <ord>              <drtn>             
## 1 Sunday             24.06769 mins      
## 2 Saturday           23.67755 mins      
## 3 Friday             19.00669 mins      
## 4 Monday             18.52467 mins      
## 5 Thursday           17.16476 mins      
## 6 Tuesday            16.74360 mins      
## 7 Wednesday          16.43830 mins

Number of riders per weekday

Num_Riders_weekday <- df %>%
  select(ride_id, Started_at_weekday) %>%
  group_by(Started_at_weekday)%>%
  summarise(ride_num = length(ride_id)) %>%
  arrange(desc(ride_num))

Num_Riders_weekday

## # A tibble: 7 × 2
##   Started_at_weekday ride_num
##   <ord>                 <int>
## 1 Saturday             916471
## 2 Thursday             841591
## 3 Friday               801787
## 4 Wednesday            798223
## 5 Tuesday              782372
## 6 Sunday               776259
## 7 Monday               751014

VISUALIZATIONS & SHARE

A link to my tableau dashboard of this Case Study here

ggplot(df, aes(x = '', fill = member_casual)) +
  geom_bar()+
  coord_polar('y')

ggplot(df, aes(x= Started_at_weekday)) + 
  geom_bar()

ggplot(df, aes(x= month(started_at))) + 
  geom_bar()

ggplot(df, aes(x = rideable_type, fill = member_casual)) +
  geom_bar()

RECCOMENDATIONS & ACT

Increase marketing campaign for the SUMMER months, due to JUNE - AUGUST being the busiest months in ridership for MEMBER and CASUAL riders.
MEMBERS tend to use service for transportation to and from work, based on hourly and weekly useage. Promote bike-share service as alternative workplace transportation to increase CASUAL ridership.
Cater to MEMBERS by investing in more CLASSIC bikes due to this bike type being utilized the most in total riders and mostly by MEMBERS.
Increase SINGLE DAY PASSES for weekends to promote MEMBERSHIP SUBSCRIPTION for CASUAL riders.

TABLEAU PUBLIC DASHBOARD

https://public.tableau.com/app/profile/elijah.sisson/viz/Google_Case_Study/OVERVIEWDASHBOARD