This is my analysis for the GOOGLE Data Analysis Professional Certificate Cyclistic Case Study hosted by COURSERA
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Characters and teams
● Cyclistic: A bike-share program that features more than 5,800 bicycles
and 600 docking stations. Cyclistic sets itself apart by also offering
reclining bikes, hand tricycles, and cargo bikes, making bike-share more
inclusive to people with disabilities and riders who can’t use a
standard two-wheeled bike. The majority of riders opt for traditional
bikes; about 8% of riders use the assistive options. Cyclistic users are
more likely to ride for leisure, but about 30% use them to commute to
work each day.
● Lily Moreno: The director of marketing and your manager. Moreno is
responsible for the development of campaigns and initiatives to promote
the bike-share program. These may include email, social media, and other
channels.
● Cyclistic marketing analytics team: A team of data analysts who are
responsible for collecting, analyzing, and reporting data that helps
guide Cyclistic marketing strategy. You joined this team six months ago
and have been busy learning about Cyclistic’s mission and business goals
— as well as how you, as a junior data analyst, can help Cyclistic
achieve them.
● Cyclistic executive team: The notoriously detail-oriented executive
team will decide whether to approve the recommended marketing
program.
Moreno has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
Lets load some packages
library(DescTools)
library(readr)
library(tidyverse)
library(lubridate)
The reliability of the datasource can be varified utilzing the Reliable, Original, Comprehensive, Current, and Cited (R.O.C.C.C) method
Reliable: The data was provided from the Cyclistic marketing analytic
team.
Original: The data is orginal due to it being gathered firsthand from
the Cyclistic marketing analytic team.
Comprehensive: This data is comprehensive because it contains all fields
needed to determine differences between Casual and Member ride share
with no NULL fields.
Current: The data provided was previously udated on a quaterly basis
then on a monthly basis up to January 2023.
Cited: the data set provided is publicly accessible hosted by the
Cyclistic marketing analytic team.
Now lets load the last 12 months of the Divvy trip data. Link to data source bin here
Then assign the dataset to their respective months.
January_22 <- tibble(read.csv("202201-divvy-tripdata.csv"))
February_22 <- tibble(read.csv("202202-divvy-tripdata.csv"))
March_22 <- tibble(read.csv("202203-divvy-tripdata.csv"))
April_22 <- tibble(read.csv('202204-divvy-tripdata.csv'))
May_22 <- tibble(read.csv("202205-divvy-tripdata.csv"))
June_22 <- tibble(read.csv("202206-divvy-tripdata.csv"))
July_22 <- tibble(read.csv('202207-divvy-tripdata.csv'))
August_22 <- tibble(read.csv('202208-divvy-tripdata.csv'))
September_22 <- tibble(read.csv("202209-divvy-publictripdata.csv"))
October_22 <- tibble(read.csv('202210-divvy-tripdata.csv'))
November_22 <- tibble(read.csv('202211-divvy-tripdata.csv'))
December_22 <- tibble(read.csv('202212-divvy-tripdata.csv'))
Then I binded them into 12 month dataset named ‘df’,
df <- bind_rows(January_22, February_22, March_22, April_22, May_22, June_22, July_22, August_22, September_22, October_22, November_22, December_22)
Now I need to find the Ride length so I used difftime to create a new column Ride_Length in Minutes
df$Ride_Length <- (difftime(df$ended_at, df$started_at, units = 'mins'))
summary(df$Ride_Length)
## Length Class Mode
## 5667717 difftime numeric
Then I created a week day field with wday to find which day the bicycle was rented on.
df$Started_at_weekday <- wday(df$started_at, label = TRUE, abbr = FALSE)
summary(df$Started_at_weekday)
## Sunday Monday Tuesday Wednesday Thursday Friday Saturday
## 776259 751014 782372 798223 841591 801787 916471
summary of df dataset
summary(df)
## ride_id rideable_type started_at ended_at
## Length:5667717 Length:5667717 Length:5667717 Length:5667717
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## start_station_name start_station_id end_station_name end_station_id
## Length:5667717 Length:5667717 Length:5667717 Length:5667717
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## start_lat start_lng end_lat end_lng
## Min. :41.64 Min. :-87.84 Min. : 0.00 Min. :-88.14
## 1st Qu.:41.88 1st Qu.:-87.66 1st Qu.:41.88 1st Qu.:-87.66
## Median :41.90 Median :-87.64 Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65 Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :45.64 Max. :-73.80 Max. :42.37 Max. : 0.00
## NA's :5858 NA's :5858
## member_casual Ride_Length Started_at_weekday
## Length:5667717 Length:5667717 Sunday :776259
## Class :character Class :difftime Monday :751014
## Mode :character Mode :numeric Tuesday :782372
## Wednesday:798223
## Thursday :841591
## Friday :801787
## Saturday :916471
Now that I finished wrangling the df dataset. I am now able to start analyzing.
mean of the Ride Length
Mean(df$Ride_Length)
## Time difference of 19.44364 mins
The Max Ride Length
max(df$Ride_Length)
## Time difference of 41387.25 mins
The mode or most frequent week day the bicycles are utilized are
Mode(df$Started_at_weekday)
## [1] Saturday
## attr(,"freq")
## [1] 916471
## 7 Levels: Sunday < Monday < Tuesday < Wednesday < Thursday < ... < Saturday
Finding the average ride length for members and casual riders.
I created a pivot table of ride length and member_casual field, filtering by Members and Casual riders respectively
Average ride length for members
df_memebrs_Ride_Length <- df %>%
select(Ride_Length, member_casual) %>%
filter(member_casual == 'member')
mean(df_memebrs_Ride_Length$Ride_Length)
## Time difference of 12.71062 mins
Average ride length for Casual riders
df_casual_Ride_Length <- df %>%
select(Ride_Length, member_casual, Started_at_weekday) %>%
filter(member_casual == 'casual')
mean(df_casual_Ride_Length$Ride_Length)
## Time difference of 29.14486 mins
Average ride length for each week day
Average_Ride_Length_Weekday <- df %>%
select(Ride_Length, Started_at_weekday) %>%
group_by(Started_at_weekday)%>%
summarise(Average_Ride_Length = mean(Ride_Length)) %>%
arrange(desc(Average_Ride_Length))
Average_Ride_Length_Weekday
## # A tibble: 7 × 2
## Started_at_weekday Average_Ride_Length
## <ord> <drtn>
## 1 Sunday 24.06769 mins
## 2 Saturday 23.67755 mins
## 3 Friday 19.00669 mins
## 4 Monday 18.52467 mins
## 5 Thursday 17.16476 mins
## 6 Tuesday 16.74360 mins
## 7 Wednesday 16.43830 mins
Number of riders per weekday
Num_Riders_weekday <- df %>%
select(ride_id, Started_at_weekday) %>%
group_by(Started_at_weekday)%>%
summarise(ride_num = length(ride_id)) %>%
arrange(desc(ride_num))
Num_Riders_weekday
## # A tibble: 7 × 2
## Started_at_weekday ride_num
## <ord> <int>
## 1 Saturday 916471
## 2 Thursday 841591
## 3 Friday 801787
## 4 Wednesday 798223
## 5 Tuesday 782372
## 6 Sunday 776259
## 7 Monday 751014
Increase marketing campaign for the SUMMER months, due to JUNE - AUGUST being the busiest months in ridership for MEMBER and CASUAL riders.
MEMBERS tend to use service for transportation to and from work, based on hourly and weekly useage. Promote bike-share service as alternative workplace transportation to increase CASUAL ridership.
Cater to MEMBERS by investing in more CLASSIC bikes due to this bike type being utilized the most in total riders and mostly by MEMBERS.
Increase SINGLE DAY PASSES for weekends to promote MEMBERSHIP SUBSCRIPTION for CASUAL riders.