The Cyclistic Bike Share Capstone Case Study is a project that forms part of the Google Data Analytics Professional Certificate. Here’s a brief overview:
In this case study, I took the role of a Junior Data Analyst on the Marketing Analytics Team at Cyclistic Bike Share Company, tasked with carrying out a comprehensive analysis of the behaviour of two categories of bike users: Members and Casual Users. In the project, I apply data analytics techniques learned in the Coursera Google Data Analytics Professional Certificate course to a real-world business scenario, providing valuable insights into customer behaviour to support strategic decision-making.
• The case study is based on a fictional company named Cyclistic, a bike-share company with over 5,800 bicycles and 600 docking stations.
• Cyclistic offers a variety of bikes, including reclining bikes, hand tricycles, and cargo bikes, catering to a diverse range of users.
• The company has a flexible strategy allowing riders to unlock a bike from one station and return it to any other station.
Data Analysis Phases: The case study follows a structured analysis process divided into six phases: Ask, Prepare, Process, Analyze, Share, and Act.
Business Task:
• The primary aim is to understand how casual riders and annual members use Cyclistic bikes differently.
• Insights from this analysis are intended to help the marketing analytics team develop strategies to convert casual riders into annual members.
Tools Used:
• The analysis uses RStudio, Google Sheets, Tableau, and RPubs.
```{r}
library(tidyverse)
library(lubridate)
library(dplyr)
library(readr)
library(ggplot2)
```
```{r}
# Paths to the source files (the second assignment overwrites the first)
file_location <- file.path("C:\\User\\Okoye Benjamin E\\Documents\\divvy_Trips_2020_Q1.csv")
file_location <- file.path("C:\\User\\Okoye Benjamin E\\Documents\\divvy_Trips_2019_Q1.csv")
```
```{r}
# Read the two quarterly files from the working directory
q1_2019 <- read_csv("Divvy_Trips_2019_Q1.csv")
q1_2020 <- read_csv("Divvy_Trips_2020_Q1.csv")
```
# STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE
# Compare column names of each of the files
# While the names don't have to be in the same order, they DO need to match perfectly before we can use a command to join them into one file
```{r}
colnames(q1_2019)
colnames(q1_2020)
```
```{r}
# Rename the 2019 columns so they match the 2020 column names
(q1_2019 <- rename(q1_2019,
                   ride_id = trip_id,
                   rideable_type = bikeid,
                   started_at = start_time,
                   ended_at = end_time,
                   start_station_name = from_station_name,
                   start_station_id = from_station_id,
                   end_station_name = to_station_name,
                   end_station_id = to_station_id,
                   member_casual = usertype))
```
```{r}
# Inspect the data frames and look for mismatched column types
str(q1_2019)
str(q1_2020)
```
```{r}
# Convert ride_id and rideable_type to character so the frames can be stacked
q1_2019 <- mutate(q1_2019,
                  ride_id = as.character(ride_id),
                  rideable_type = as.character(rideable_type))
```
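Reading two long colnames() printouts side by side is error-prone. As a minimal sketch, base R's setdiff() can list the names that appear in one data frame but not the other. The two small data frames below (old_df and new_df) are made-up stand-ins for illustration, not the Divvy files.

```r
# Toy stand-ins for two quarterly files with different column names (hypothetical data)
old_df <- data.frame(trip_id = 1:2,
                     start_time = c("a", "b"),
                     usertype = c("Subscriber", "Customer"))
new_df <- data.frame(ride_id = c("x", "y"),
                     started_at = c("a", "b"),
                     member_casual = c("member", "casual"))

# Columns present in one frame but missing from the other
only_in_old <- setdiff(colnames(old_df), colnames(new_df))
only_in_new <- setdiff(colnames(new_df), colnames(old_df))
only_in_old
only_in_new

# After renaming, the two sets of names should be identical
colnames(old_df) <- c("ride_id", "started_at", "member_casual")
names_match <- setequal(colnames(old_df), colnames(new_df))
names_match
```

An empty result from both setdiff() calls (or TRUE from setequal()) is a quick signal that the frames are ready to be stacked.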
# Stack individual quarter's data frames into one big data frame
```{r}
all_trips <- bind_rows(q1_2019, q1_2020)
```
```{r}
# Remove the lat, long, birthyear, gender, and tripduration fields
all_trips <- all_trips %>%
  select(-c(start_lat, start_lng, end_lat, end_lng, birthyear, gender, tripduration))
```
# STEP 3: CLEAN UP AND ADD DATA TO PREPARE FOR ANALYSIS
# Inspect the new table that has been created
```{r}
colnames(all_trips)  # List of column names
nrow(all_trips)      # How many rows are in the data frame?
dim(all_trips)       # Dimensions of the data frame
head(all_trips)      # See the first 6 rows of the data frame
tail(all_trips)      # See the last 6 rows of the data frame
str(all_trips)       # See list of columns and data types (numeric, character, etc.)
summary(all_trips)   # Statistical summary of data, mainly for numerics
```
```{r}
# Count the observations under each rider label before recoding
table(all_trips$member_casual)
```
```{r}
# Recode "Subscriber" to "member" and "Customer" to "casual"
all_trips <- all_trips %>%
  mutate(member_casual = recode(member_casual,
                                "Subscriber" = "member",
                                "Customer" = "casual"))
```
```{r}
# Check that the labels were reassigned
table(all_trips$member_casual)
```
# Add columns that list the date, month, day, and year of each ride
# This will allow us to aggregate ride data for each month, day, or year ... before completing these operations we could only aggregate at the ride level
# https://www.statmethods.net/input/dates.html more on date formats in R found at that link
```{r}
all_trips$date <- as.Date(all_trips$started_at)  # The default format is yyyy-mm-dd
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
```
```{r}
# Add a ride_length calculation to all_trips
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at)
```
```{r}
# Inspect the structure of the columns
str(all_trips)
```
```{r}
# Convert ride_length to numeric so calculations can be run on it
is.factor(all_trips$ride_length)
all_trips$ride_length <- as.numeric(as.character(all_trips$ride_length))
is.numeric(all_trips$ride_length)
```
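One thing worth checking at this step: difftime() chooses its units automatically unless told otherwise, so the numeric value can come out in seconds, minutes, hours, or days depending on the data. The sketch below uses two made-up timestamps (not taken from the Divvy data) to show the difference. Passing units = "secs" is one possible way to make the later as.numeric() conversion unambiguous; it is a suggestion, not what the analysis above does.

```r
# Two made-up timestamps five minutes apart (hypothetical values)
start <- as.POSIXct("2020-01-01 10:00:00", tz = "UTC")
end   <- as.POSIXct("2020-01-01 10:05:00", tz = "UTC")

# With the default units = "auto", difftime() reports this gap in minutes
auto_len <- difftime(end, start)
units(auto_len)
as.numeric(auto_len)

# Asking for seconds explicitly always returns a value in seconds
secs_len <- as.numeric(difftime(end, start, units = "secs"))
secs_len
```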
```{r}
# Drop rows with start station "HQ QR" or a negative ride_length
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length < 0), ]
```
```{r}
mean(all_trips_v2$ride_length)    # straight average (total ride length / rides)
median(all_trips_v2$ride_length)  # midpoint number in the ascending array of ride lengths
max(all_trips_v2$ride_length)     # longest ride
min(all_trips_v2$ride_length)     # shortest ride
```
```{r}
summary(all_trips_v2$ride_length)
```
```{r}
# Compare members and casual users
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
```
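The aggregate() calls spell out the data frame name on both sides of the formula, which produces long column names in the output. As a hedged alternative, the formula interface also accepts a data argument. The sketch below uses a small made-up data frame (rides), not the Divvy results.

```r
# Made-up ride lengths in seconds (hypothetical data)
rides <- data.frame(
  member_casual = c("member", "member", "casual", "casual"),
  ride_length   = c(300, 500, 1200, 1800)
)

# With data = rides, the output columns keep the short variable names
avg_by_type <- aggregate(ride_length ~ member_casual, data = rides, FUN = mean)
avg_by_type
```

The same pattern extends to two grouping variables, for example ride_length ~ member_casual + day_of_week.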
# See the average ride time by each day for members vs casual users
```{r}
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
```
```{r}
# Put the days of the week in order
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week,
                                    levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                                               "Thursday", "Friday", "Saturday"))
```
```{r}
# Average ride time by each day for members vs casual users, now in weekday order
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
```
```{r}
# Ridership data by rider type and weekday
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday)
```
# Let's visualize the number of rides by rider type
```{r}
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = number_of_rides, fill = member_casual)) +
  geom_col(position = "dodge")
```
```{r}
# Visualize the average ride duration by rider type
all_trips_v2 %>%
  mutate(weekday = wday(started_at, label = TRUE)) %>%
  group_by(member_casual, weekday) %>%
  summarise(number_of_rides = n(),
            average_duration = mean(ride_length)) %>%
  arrange(member_casual, weekday) %>%
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge")
```
## My Presentation:
In the Cyclistic Bike-Share Analysis Case Study, I analyzed historical data from a fictional bike-sharing company called Cyclistic. Here are the key insights and recommendations:
## Business Task:
The aim of the case study was to understand how casual riders and annual members use Cyclistic bikes differently.
## Data Analysis Phases:
The analysis followed several phases:
1. Ask: Identifying the business task and considering stakeholder demands.
2. Prepare: Using Cyclistic’s historical trip data (the Q1 2019 and Q1 2020 files) for analysis.
3. Process: Cleaning and organizing the data.
4. Analyze: Identifying trends and patterns.
5. Share: Communicating findings.
6. Act: Implementing recommendations.
## Tools Used:
The main tools included RStudio, Google Sheets, Tableau, and RPubs.
## Details of my Observations:
1. For all_trips_v2$ride_length, the analysis showed:
   - Casual-Customer riders: 696019.11 secs
   - Members: 59395.69 secs
2. Maximum ride time/Average ride time:
   - Casual-Customer riders: 946684800 secs
   - Members: 820454400 secs
3. Ridership data by type and weekday showed that:
   - Casual-Customer riders ride on all 7 days (Sunday-Saturday)
   - Members ride on only 3 days (Sunday-Tuesday)
4. The visualization of rides by rider type showed that although members ride on only 3 days (Sunday-Tuesday), they take a greater number of rides.
5. The visualization of average ride duration indicates that casual customers have a greater average ride duration.
## Insights:
1. The analysis revealed distinct patterns between casual riders and annual members.
2. Casual Riders: These customers purchase single-ride or full-day passes.
3. Cyclistic Members: These customers have annual memberships.
These insights will be used by the Cyclistic marketing analytics team to design a new marketing strategy aimed at converting casual riders into annual members.
Supporting visualizations and key findings
## ACT
This phase will be carried out by the executive team, Director of Marketing (Lily Moreno) and the Marketing Analytics team based on my analysis.
## Conclusion:
The visualizations revealed that:
1. Members take more rides than casual riders.
2. More members than casual riders ride in every month.
3. Casual riders ride for longer periods.
4. Members ride more throughout the week, while casual riders record more rides at weekends (Saturday and Sunday) than on other days of the week.
5. Casual riders go farther in terms of distance.
## Deliverable
Based on these insights, I propose the following marketing strategies aimed at converting casual riders into annual members.
1. Run a discount sale or promotion for casual riders so they can ride more and experience the benefits of being a member.
2. Host fun biking competitions with prizes at intervals for casual riders on the weekends. Since many casual riders ride on weekends, this will also encourage them to get a membership.
3. Encourage casual riders to ride more throughout the year through advertisements, flyers, and coupons, so as to persuade them to become members.
## Acknowledgement:
My data was downloaded from Divvy Bikes (https://divvybikes.com/system-data), which makes its historical trip data available for public use.
Note: This data is provided according to the Divvy Data License Agreement and released on a monthly schedule.
I also used the Divvy Exercise R Script.
Thank you for taking the time to read my presentation. I look forward to your valuable feedback.
```{r}
write.csv(all_trips_v2, "C:\\Users\\Okoye Benjamin E\\Documents\\divvy_Trips_2019_Q1_csv", row.names = FALSE)
```
```{r}
file_location
```
```{r}
# Export the average ride length by rider type and day of week
counts <- aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
write.csv(counts, file = "avg_ride_length.csv")
```
## Create a DataFrame
To create a DataFrame in R, use this code:
```{r}
df <- data.frame(a = -(1:5), b = 1:5)
df$c[df$a > 0] <- 7  # no row has a > 0, so c is NA in every row
df
#    a b  c
# 1 -1 1 NA
# 2 -2 2 NA
# 3 -3 3 NA
# 4 -4 4 NA
# 5 -5 5 NA
```
```{r}
print(df)
```
## Use write.csv to Export the DataFrame
Next, add the syntax to export the DataFrame to a CSV file in R, as in the write.csv calls above.
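A minimal sketch of exporting a small data frame like df with write.csv() and reading it back to confirm the result. The tempfile() path is an assumption made for illustration, so that nothing in a real project folder is overwritten.

```r
# Small example data frame
df <- data.frame(a = -(1:5), b = 1:5)

# Write to a temporary file (hypothetical location chosen for this sketch)
out_file <- tempfile(fileext = ".csv")
write.csv(df, out_file, row.names = FALSE)

# Read the file back to confirm the export worked
df_back <- read.csv(out_file)
df_back
```

Setting row.names = FALSE keeps R's row numbers out of the file, so the CSV contains only the a and b columns.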