This is a hypothetical case study of a fictional bike-share company, Cyclistic. Based in Chicago, the company’s bike-sharing program features more than 5,800 bicycles and 600 docking stations. Customers who purchase single-ride or full-day passes are referred to as casual riders, while customers who purchase annual memberships are Cyclistic members. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. As a junior data analyst, my task is to organize and analyze the given data to understand how annual members and casual riders use Cyclistic bikes differently, and to turn those findings into data-driven recommendations that the Cyclistic executive team can approve. To do this, I follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.
ASK
What’s the problem we are trying to solve?
Business Task: To analyze how annual members and casual riders use Cyclistic bikes differently, and to formulate effective marketing strategies that can convert casual riders into annual members.
Key Stakeholders: Cyclistic executive team and Lily Moreno (Marketing Director and Manager)
PREPARE
Check for biases and determine the credibility of the data.
Data Source: The dataset used for this analysis meets the ROCCC criteria (reliable, original, comprehensive, current, cited). The data has been made available by Motivate International Inc.
The data does not contain personal information about the users.
I have used the previous 12 months of data, i.e., August 2021 to July 2022. The files are in .csv format.
Tools used:
R: Data Cleaning and Manipulation
Tableau: Visualization
PROCESS
1. To begin the process phase, I install the necessary packages,
install.packages('tidyverse')
install.packages('dplyr')
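2. Then, I import and merge all 12 files into one dataframe named “BikeTripData”.
# Assumes the 12 monthly .csv files sit in this folder
library(readr)  # read_csv()
library(dplyr)  # bind_rows() and %>%
BikeTripData <- list.files(path = "C:/Users/Lenovo/Desktop/Cyclistic",
                           pattern = "\\.csv$", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
View(BikeTripData)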
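As a minimal sketch of the merge step, the example below writes two tiny CSV files to a temporary folder and stacks them into one data frame. The folder name, file names, columns and values are all made up for illustration, and base R (`read.csv`, `rbind`) stands in for `read_csv` and `bind_rows` so that it runs without extra packages.

```r
# Hypothetical stand-ins for two monthly trip files, written to a temp folder
demo_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("casual", "member")),
          file.path(demo_dir, "2021_08.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(demo_dir, "2021_09.csv"), row.names = FALSE)

# List every .csv in the folder with its full path, read each one, stack the rows
csv_files <- list.files(demo_dir, pattern = "\\.csv$", full.names = TRUE)
demo_trips <- do.call(rbind, lapply(csv_files, read.csv))

nrow(demo_trips)  # 3 rows: 2 from the first file + 1 from the second
```

The key detail is `full.names = TRUE`: without it, `list.files` returns bare file names, and reading them only works if the working directory happens to be that folder.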
Removing NAs (if any)
sum(is.na(BikeTripData))  # total number of missing values
BikeTripData <- na.omit(BikeTripData)  # drop every row that has an NA in any column
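Before dropping rows, it can help to see which columns the missing values come from, since `na.omit` removes a row if any of its columns is NA. A small sketch on made-up data (the data frame `demo` and its values are hypothetical):

```r
# Toy data frame with two missing station names and one missing coordinate
demo <- data.frame(
  start_station_name = c("Clark St & Elm St", NA, "Millennium Park", NA),
  end_lat = c(41.90, 41.88, NA, 41.89)
)

na_per_column <- colSums(is.na(demo))  # start_station_name: 2, end_lat: 1
cleaned <- na.omit(demo)               # keeps only rows with no NA in any column
nrow(cleaned)                          # 1
```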
Adding necessary columns
3. To calculate the length of each ride in minutes and hours, I run the following code,
# Separate columns, so the hours value does not overwrite the minutes value
BikeTripData$ride_length_mins <- difftime(BikeTripData$ended_at, BikeTripData$started_at, units = "mins")
BikeTripData$ride_length_hours <- difftime(BikeTripData$ended_at, BikeTripData$started_at, units = "hours")
# Day of the week on which each ride started and ended (wday() comes from lubridate)
library(lubridate)
BikeTripData$Start_day <- wday(BikeTripData$started_at, label = TRUE)
BikeTripData$End_day <- wday(BikeTripData$ended_at, label = TRUE)
# Ride distance in km; distHaversine() returns metres
install.packages("geosphere")
library(geosphere)
# Assign the result back so the new column is kept
BikeTripData <- mutate(BikeTripData, distance_km = distHaversine(cbind(start_lng, start_lat), cbind(end_lng, end_lat)) * 0.001)
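To make the ride-length and distance calculations concrete, here is a small base-R sketch on a single made-up trip. `haversine_km` is a hypothetical helper written for this example and is not part of geosphere; it uses the Earth radius of 6,378.137 km, which matches the default of `distHaversine` (6,378,137 m), so the two should agree closely.

```r
# One hypothetical trip; timestamps and coordinates are invented
started_at <- as.POSIXct("2022-07-01 08:00:00", tz = "UTC")
ended_at   <- as.POSIXct("2022-07-01 08:30:00", tz = "UTC")

ride_mins  <- as.numeric(difftime(ended_at, started_at, units = "mins"))   # 30
ride_hours <- as.numeric(difftime(ended_at, started_at, units = "hours"))  # 0.5

# Great-circle distance in km via the haversine formula
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6378.137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(sqrt(a))
}

# Two points 0.01 degrees of latitude apart: roughly 1.11 km
trip_km <- haversine_km(-87.6123, 41.8923, -87.6123, 41.9023)
```

Note that this is a straight-line distance between the start and end stations, so a round trip that ends where it started has a distance of 0 regardless of how far the rider actually went.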
Organizing data: To make the data easier to read and understand, I separate the date and time in the started_at and ended_at columns into new columns,
For separating the date in YYYY-MM-DD format,
BikeTripData$Start_Date <- as.Date(BikeTripData$started_at)
BikeTripData$End_Date <- as.Date(BikeTripData$ended_at)
For separating Time in HH:MM:SS Format,
BikeTripData$Start_time <- format(as.POSIXct(BikeTripData$started_at),format = "%H:%M:%S")
BikeTripData$End_time <- format(as.POSIXct(BikeTripData$ended_at),format = "%H:%M:%S")
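A quick sketch of what these two conversions produce, using one hypothetical timestamp. The time zone is set to UTC explicitly here (an assumption of this example) so the result does not depend on the machine's local settings.

```r
# Hypothetical timestamp
ts <- as.POSIXct("2022-03-15 17:45:10", tz = "UTC")

start_date <- as.Date(ts)                      # Date object: 2022-03-15
start_time <- format(ts, format = "%H:%M:%S")  # character string: "17:45:10"

class(start_date)  # "Date"
class(start_time)  # "character"
```

Because the time column is stored as text, it is convenient for display and grouping, but it would need to be converted back before doing arithmetic on it.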
Deleting the irrelevant columns
6. ride_id: Since this column holds a distinct identifier for each ride rather than for each rider, it cannot be used to link pass purchases to individual riders, for example to determine whether casual riders live in the Cyclistic service area or have purchased multiple single-ride passes.
BikeTripData <- subset(BikeTripData, select = -c(ride_id))
7. started_at, ended_at: Since I have separated the date and time into new columns, these two columns are no longer needed,
BikeTripData <- subset(BikeTripData, select = -c(started_at, ended_at))
ANALYZE
After organizing and formatting data, perform calculations and identify trends and relationships.
To count the number of rides starting at each station for each rider type, I make a new table “t1”,
t1 <- BikeTripData %>%
  group_by(member_casual, start_station_name) %>%
  summarise(count_of = n()) %>%
  arrange(desc(count_of)) %>%
  filter(!is.na(start_station_name))  # drop rides with no recorded start station
# A tibble: 2,590 × 3
# Groups:   member_casual [2]
   member_casual start_station_name                 count_of
   <chr>         <chr>                                 <int>
 1 casual        Streeter Dr & Grand Ave               62984
 2 casual        DuSable Lake Shore Dr & Monroe St     33183
 3 casual        Millennium Park                       29219
 4 casual        Michigan Ave & Oak St                 28210
 5 casual        DuSable Lake Shore Dr & North Blvd    27312
 6 member        Kingsbury St & Kinzie St              26428
 7 member        Clark St & Elm St                     23548
 8 member        Wells St & Concord Ln                 23498
 9 casual        Shedd Aquarium                        21709
10 member        Wells St & Elm St                     20787
# … with 2,580 more rows
# ℹ Use `print(n = ...)` to see more rows
To find the top 5 stations frequented by casual riders,
table1.1 <- filter(t1, member_casual == "casual") %>%
  rename(number_of_trips = count_of) %>%
  slice(1:5)
# A tibble: 5 × 3
# Groups:   member_casual [1]
  member_casual start_station_name                 number_of_trips
  <chr>         <chr>                                        <int>
1 casual        Streeter Dr & Grand Ave                      62984
2 casual        DuSable Lake Shore Dr & Monroe St            33183
3 casual        Millennium Park                              29219
4 casual        Michigan Ave & Oak St                        28210
5 casual        DuSable Lake Shore Dr & North Blvd           27312
To find the top 5 stations frequented by annual members,
table1.2 <- filter(t1, member_casual == "member") %>%
  rename(number_of_trips = count_of) %>%
  slice(1:5)
# A tibble: 5 × 3
# Groups:   member_casual [1]
  member_casual start_station_name       number_of_trips
  <chr>         <chr>                              <int>
1 member        Kingsbury St & Kinzie St           26428
2 member        Clark St & Elm St                  23548
3 member        Wells St & Concord Ln              23498
4 member        Wells St & Elm St                  20787
5 member        Ellis Ave & 60th St                20437
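The counting logic behind `t1` can be seen on a handful of made-up rides. This sketch uses base R (`aggregate`, `order`) rather than dplyr so that it runs on its own; the station names are borrowed from the tables above, but the rides themselves are invented.

```r
# Six hypothetical rides, one with no recorded start station
rides <- data.frame(
  member_casual = c("casual", "casual", "casual", "member", "member", "casual"),
  start_station_name = c("Millennium Park", "Millennium Park", "Shedd Aquarium",
                         "Clark St & Elm St", "Clark St & Elm St", NA)
)

# Drop rides with no start station, then count rides per rider type and station
rides <- rides[!is.na(rides$start_station_name), ]
counts <- aggregate(ride ~ member_casual + start_station_name,
                    data = transform(rides, ride = 1), FUN = sum)
names(counts)[3] <- "count_of"

# Sort busiest first and keep (up to) the top 5 stations for casual riders
counts <- counts[order(-counts$count_of), ]
top_casual <- head(counts[counts$member_casual == "casual", ], 5)
top_casual
```

Filtering to one rider type before taking the first rows mirrors the `filter` then `slice(1:5)` pattern used for `table1.1` and `table1.2`.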
Users and Bike Preference:
Observation:
Casual riders mostly prefer Electric Bikes while Annual members mostly prefer Classic Bikes.
Docked Bikes are used by Casual riders only.
Total rides per day:
Observation:
Saturday is the busiest day, with casual riders accounting for 53.78% of rides and annual members for 46.22%.
Annual members tend to use the bike service more on weekdays, and casual riders more on weekends.
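The Saturday split is a within-day share: each rider type's rides divided by all rides on that day. A sketch of that calculation with hypothetical counts (these are not the case-study figures):

```r
# Made-up ride counts per weekday and rider type
day_counts <- matrix(c(120,  80,   # Sat: casual, member
                        40, 160),  # Tue: casual, member
                     nrow = 2, byrow = TRUE,
                     dimnames = list(day = c("Sat", "Tue"),
                                     rider = c("casual", "member")))

# Row-wise proportions, so each day's shares sum to 100%
share_pct <- round(100 * prop.table(day_counts, margin = 1), 2)
share_pct["Sat", ]  # casual 60, member 40
```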
Total rides per month:
Observation:
August 2021 saw the maximum number of casual riders, while January 2022 saw the minimum.
For annual members, the maximum was in July 2022 and the minimum, again, in January 2022.
From February to July, casual riders increased by 94.7% and annual members by 77.4%.
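Increases like these are presumably computed with the usual percent-change formula, (new - old) / old * 100. A minimal sketch, where `pct_change` is a helper defined only for this example and the ride counts are hypothetical:

```r
# Percent change from an old value to a new value
pct_change <- function(old, new) (new - old) / old * 100

# Hypothetical monthly ride counts, not taken from the dataset
feb_rides <- 200
jul_rides <- 350
growth <- pct_change(feb_rides, jul_rides)  # 75
```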
Average ride duration
Average ride duration per day
Observation:
Casual riders’ average ride duration is 29.21 minutes, and annual members’ average ride duration is 12.93 minutes.
Casual riders’ average ride duration is highest on Sunday (33.96 minutes) and lowest on Wednesday (25 minutes). Annual members’ average ride duration is also highest on Sunday (14.63 minutes) and lowest on Tuesday (12.15 minutes).
Casual riders’ average ride duration is about 126% longer than annual members’ (29.21 vs. 12.93 minutes).
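The relative difference can be checked directly from the two overall averages reported in this section:

```r
casual_avg <- 29.21  # minutes, casual riders' average ride duration
member_avg <- 12.93  # minutes, annual members' average ride duration

# How much longer the casual average is, as a percentage of the member average
extra_pct <- (casual_avg - member_avg) / member_avg * 100
round(extra_pct, 1)  # 125.9
```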
Extended Analysis:
Observations:
The average distance covered is 2.2605 km for casual riders and 2.0930 km for annual members.
By month, the average distance ranges from 1.93 km (January 2022) to 2.34 km (August 2021).
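Group averages of this kind can be computed with `tapply` in base R. The sketch below uses four made-up trips; the column names follow the ones used earlier in this analysis, but the values are hypothetical.

```r
# Made-up trips: rider type and distance in km
trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member"),
  distance_km   = c(2.0, 3.0, 1.5, 2.5)
)

# Mean distance per rider type
avg_km <- tapply(trips$distance_km, trips$member_casual, mean)
avg_km  # casual 2.5, member 2.0
```

Grouping by a month column instead of `member_casual` would give the month-by-month averages in the same way.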
Key Takeaways/Findings:
As per the analysis, the stations most frequented by casual riders tend to be near tourist attractions, whereas those most frequented by annual members tend to be near residential areas and workplaces.
Casual riders prefer Electric Bikes.
The graph of monthly rides declines until January 2022, then rises from February 2022. This indicates that seasons also play an important role: from February to July (spring to summer), the number of users increases noticeably.
From February to July:
* Casual riders increase by 94.7% and annual members by 77.4%.
* The average distance increases from 1.93 km to 2.22 km (casual riders).
Recommendations:
Run marketing campaigns near the hot-spot stations frequented by casual riders by partnering with eateries, theme parks and other recreational hubs.
When designing offers, consider discounted pricing for electric bikes, keeping average ride duration and distance in view.
Run heavy marketing campaigns during the peak season, i.e., summer, and on weekends all year round.