I’m a junior data analyst on the marketing team at Cyclistic, a fictional bike-share company in Chicago. The director of marketing is focused on increasing the number of annual memberships and believes this is key to the company’s future success. My goal is to understand how casual riders and annual members use Cyclistic bikes differently. By analyzing these patterns, I aim to design a marketing strategy that converts more casual riders into annual members, with each recommendation supported by the data and visualizations in this analysis.
To keep the analysis accurate and relevant, I used the most recent data available: the Divvy trip data from January through August 2024. After importing the eight monthly files, I combined them into a single data frame, “year24_data,” to manage the dataset efficiently.
library(tidyverse)  # read_csv(), dplyr verbs, ggplot2
library(lubridate)  # wday() and month()
jan24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202401-divvy-tripdata.csv")
feb24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202402-divvy-tripdata.csv")
mar24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202403-divvy-tripdata.csv")
apr24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202404-divvy-tripdata.csv")
may24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202405-divvy-tripdata.csv")
jun24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202406-divvy-tripdata.csv")
jul24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202407-divvy-tripdata.csv")
aug24_data <- read_csv("C:/Users/codyl/OneDrive/Desktop/202408-divvy-tripdata.csv")
# Combine the eight months (January to August 2024) into one data frame
year24_data <- rbind(jan24_data, feb24_data, mar24_data, apr24_data, may24_data, jun24_data, jul24_data, aug24_data)
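Listing eight read_csv() calls works, but a loop scales better if more months are added later. Below is a minimal base R sketch of that pattern, assuming every monthly file shares the same columns. So that it runs anywhere, it writes two tiny hypothetical CSVs to a temporary folder instead of reading the real Divvy downloads; the folder name and column values are made up.

```r
# Create a hypothetical demo folder holding two small "monthly" files
demo_dir <- file.path(tempdir(), "divvy_demo")
dir.create(demo_dir, showWarnings = FALSE)
for (m in c("202401", "202402")) {
  demo <- data.frame(ride_id = paste0(m, "-", 1:3),
                     member_casual = c("member", "casual", "member"))
  write.csv(demo, file.path(demo_dir, paste0(m, "-divvy-tripdata.csv")),
            row.names = FALSE)
}

# Read every file matching the monthly naming pattern, then stack the rows
files <- list.files(demo_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
monthly <- lapply(files, read.csv)
combined <- do.call(rbind, monthly)
print(nrow(combined))
```

With the real data, pointing demo_dir at the download folder and swapping in readr::read_csv() and dplyr::bind_rows() should produce the same combined table as the explicit list above.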
# Total number of missing cells across all columns (not the number of incomplete rows)
is.na(year24_data) %>% sum()
## [1] 2823066
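The total above counts missing cells across every column, so it does not show which fields are incomplete or how many rides are affected. A per-column count, plus the number of rows with at least one gap, answers both. A base R sketch on a small hypothetical data frame (the station names are invented):

```r
# Hypothetical mini dataset with gaps in the station columns
trips <- data.frame(
  ride_id            = c("a", "b", "c", "d"),
  start_station_name = c("Clark St", NA, "State St", NA),
  end_station_name   = c(NA, NA, "Lake St", "Clark St")
)

# Missing cells per column
na_per_column <- colSums(is.na(trips))
print(na_per_column)

# Rows (rides) with at least one missing value
rows_with_na <- sum(!complete.cases(trips))
print(rows_with_na)
```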
Removed columns not needed for this analysis (coordinates, station IDs, and the end station name):
year24_data <- year24_data %>% select(-c(start_lat, start_lng, end_lat, end_lng, start_station_id, end_station_id, end_station_name))
Created a day_of_week column based on the started_at timestamp
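weekdays() returns plain character strings in the session's language, so a grouped summary or bar chart sorts them alphabetically (Friday, Monday, Saturday, ...). An ordered factor keeps calendar order instead; lubridate's wday(label = TRUE) returns one directly. The base R sketch below shows the idea on two made-up timestamps, using POSIXlt's numeric weekday so the result does not depend on locale:

```r
# Two example timestamps: 2024-01-06 was a Saturday, 2024-01-07 a Sunday
started_at <- as.POSIXct(c("2024-01-06 08:15:00", "2024-01-07 17:40:00"), tz = "UTC")

# POSIXlt$wday runs from 0 (Sunday) to 6 (Saturday), independent of locale
day_labels  <- c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat")
day_of_week <- factor(day_labels[as.POSIXlt(started_at)$wday + 1],
                      levels = day_labels, ordered = TRUE)
print(day_of_week)
```

The Sunday-first level order matches what wday(label = TRUE) produces with lubridate's default week start, which is also the order in the weekday summary table further down.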
# Ordered weekday factor (Sun through Sat) so summaries and plots follow calendar order
year24_data <- year24_data %>%
mutate(day_of_week = wday(started_at, label = TRUE))
Created a month column based on the started_at timestamp
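lubridate's month(label = TRUE) returns an ordered factor, which is why grouping and sorting by month_name follow the calendar rather than the alphabet (Apr, Aug, Feb, ...). A base R sketch of the same idea on two made-up timestamps, using the built-in month.abb constant:

```r
# Two hypothetical ride start times, deliberately out of calendar order
started_at <- as.POSIXct(c("2024-08-03 09:00:00", "2024-01-15 12:30:00"), tz = "UTC")

# POSIXlt$mon is 0-based (0 = January); month.abb holds English abbreviations
month_num  <- as.POSIXlt(started_at)$mon + 1
month_name <- factor(month.abb[month_num], levels = month.abb, ordered = TRUE)
print(data.frame(started_at, month_num, month_name))

# Sorting an ordered factor follows the level order, i.e. the calendar
print(sort(month_name))
```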
year24_data <- year24_data %>%
mutate(month = month(started_at),                                  # month number, 1 to 12
month_name = month(started_at, label = TRUE, abbr = TRUE))  # ordered abbreviation: Jan, Feb, ...
Created a ride_length_mins column from the started_at and ended_at timestamps
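difftime() can report minutes directly through units = "mins", which gives the same value as computing seconds and dividing by 60. A base R sketch with one made-up ride:

```r
# One hypothetical ride lasting 18 minutes 20 seconds
started_at <- as.POSIXct("2024-03-01 08:00:00", tz = "UTC")
ended_at   <- as.POSIXct("2024-03-01 08:18:20", tz = "UTC")

# Ask difftime() for minutes directly
ride_length_mins <- as.numeric(difftime(ended_at, started_at, units = "mins"))

# Same result as converting seconds and dividing by 60
ride_length_from_secs <- as.numeric(difftime(ended_at, started_at, units = "secs")) / 60
print(c(ride_length_mins, ride_length_from_secs))
```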
# Ride duration in minutes
year24_data <- year24_data %>%
mutate(ride_length_mins = as.numeric(difftime(ended_at, started_at, units = "mins")))
Calculated how many rides casual riders and members take on each day of the week
year24_data %>%
mutate(day_of_week = wday(started_at, label = TRUE)) %>%  # creates weekday field using wday()
group_by(member_casual, day_of_week) %>%                   # groups by rider type and weekday
summarise(number_of_rides = n())
## # A tibble: 14 × 3
## # Groups: member_casual [2]
## member_casual day_of_week number_of_rides
## <chr> <ord> <int>
## 1 casual Sun 243043
## 2 casual Mon 163062
## 3 casual Tue 156226
## 4 casual Wed 185134
## 5 casual Thu 178479
## 6 casual Fri 215907
## 7 casual Sat 315103
## 8 member Sun 273800
## 9 member Mon 344997
## 10 member Tue 374590
## 11 member Wed 405132
## 12 member Thu 379644
## 13 member Fri 348272
## 14 member Sat 326175
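The same member/casual by weekday counts can be produced in base R with table(), which returns a two-way contingency table and, when day_of_week is a factor, keeps weekdays that have zero rides. A sketch on a few made-up rows:

```r
# Hypothetical rides, already labeled with rider type and weekday
rides <- data.frame(
  member_casual = c("casual", "member", "member", "casual", "member"),
  day_of_week   = factor(c("Sat", "Wed", "Wed", "Sun", "Mon"),
                         levels = c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"))
)

# Count rides for every rider-type / weekday combination (2 rows x 7 columns)
ride_counts <- table(rides$member_casual, rides$day_of_week)
print(ride_counts)
```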
mean_ride_length <- year24_data$ride_length_mins %>% mean()
# Mean ride length is 18.33 minutes
max_ride_length <- year24_data$ride_length_mins %>% max()
# Max is 1559.93 minutes
member_count <- sum(year24_data$member_casual == 'member')
# Total is 2452610
casual_count <- sum(year24_data$member_casual == 'casual')
# Total is 1456954
The graph above shows that members take more rides than casual riders on every day of the week. Casual riders ride most on Saturdays and Sundays (315,103 and 243,043 rides), while member rides peak midweek (405,132 on Wednesday). This suggests casual riders mostly ride for leisure, whereas members are more likely to use the bikes as regular transportation, such as commuting.
Code used for graph:
ride_per_day_graph <- year24_data %>%
group_by(member_casual, day_of_week) %>%
summarise(number_of_rides = n()) %>%
arrange(member_casual, day_of_week) %>%
ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) + geom_col(position = "dodge") +
labs(x='Day of Week', y='Total Number of Rides', title='Rides per Day of Week', fill = 'Type of Membership') +
scale_y_continuous(breaks = c(100000, 200000, 300000, 400000, 500000), labels = c("100K", "200K", "300K", "400K", "500K"))
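The weekend pattern can be put in numbers using the weekday counts from the summary table above. The short base R calculation below copies those counts; the helper name weekend_share is just illustrative. It gives about 38% of casual rides on Saturday or Sunday versus about 24% of member rides, compared with the 2/7 (about 29%) expected if rides were spread evenly across the week.

```r
# Ride counts per weekday, copied from the summary table above (Sun..Sat)
casual <- c(Sun = 243043, Mon = 163062, Tue = 156226, Wed = 185134,
            Thu = 178479, Fri = 215907, Sat = 315103)
member <- c(Sun = 273800, Mon = 344997, Tue = 374590, Wed = 405132,
            Thu = 379644, Fri = 348272, Sat = 326175)

# Share of a group's rides that fall on Saturday or Sunday
weekend_share <- function(counts) sum(counts[c("Sat", "Sun")]) / sum(counts)

shares <- round(c(casual = weekend_share(casual), member = weekend_share(member)), 3)
print(shares)
```

The column sums also reproduce the totals computed earlier (1,456,954 casual and 2,452,610 member rides), which is a quick consistency check on the table.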
The graph above shows a substantial increase in the number of rides over the course of the year. Part of this may reflect growing adoption, but seasonality likely plays a large role, since the period runs from a Chicago winter into summer; a full year of data, or a year-over-year comparison, would be needed to separate growth from seasonal effects. We can also see a noticeable gap between member and casual rides.
Code used for graph:
ride_per_month_graph <- year24_data %>%
group_by(member_casual, month_name) %>%
summarise(number_of_rides = n()) %>%
arrange(member_casual, month_name) %>%
ggplot(aes(x = month_name, y = number_of_rides, fill = member_casual)) + geom_col(position = "dodge") +
labs(x='Month', y='Total Number of Rides', title='Rides per Month', fill = 'Type of Membership') +
scale_y_continuous(breaks = c(100000, 200000, 300000, 400000, 500000), labels = c("100K", "200K", "300K", "400K", "500K"))
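Month-over-month percent change is one simple way to describe how quickly ridership climbs. The base R sketch below uses hypothetical counts, not Cyclistic figures; on its own it still cannot separate growth from seasonality.

```r
# Hypothetical monthly ride counts (not Cyclistic data), Jan..Apr
monthly_rides <- c(Jan = 100, Feb = 120, Mar = 150, Apr = 210)

# Percent change from each month to the next; names come from the later month
pct_change <- 100 * diff(monthly_rides) / head(monthly_rides, -1)
print(round(pct_change, 1))
```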
The graph above shows that members and casual riders spread their rides across bike types in broadly similar proportions. Since neither group shows a clear preference, I have left bike type out of the marketing recommendations for now.
Code used for graph:
type_of_bike_graph <- year24_data %>%
group_by(member_casual, rideable_type) %>%
summarise(number_of_rides = n()) %>%
arrange(member_casual, rideable_type) %>%
ggplot(aes(x = rideable_type, y = number_of_rides, fill = member_casual)) + geom_col(position = "dodge") +
labs(x='Type of Bike', y='Total Number of Rides', title='Bikes vs Membership', fill = 'Type of Membership') +
scale_y_continuous(breaks = c(200000, 400000, 600000, 800000, 1000000, 1200000), labels = c("200K", "400K", "600K", "800K", "1M", "1.2M"))
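Because members take far more rides in total, their bars are taller for every bike type, which can hide differences in mix. Comparing each group's share of rides by bike type is a fairer test of "no preference". A base R sketch with made-up rows, using prop.table() with margin = 1 so each rider type's row sums to 1:

```r
# Hypothetical rides with rider type and bike type
rides <- data.frame(
  member_casual = c("member", "member", "member", "member", "casual", "casual"),
  rideable_type = c("classic_bike", "classic_bike", "classic_bike", "electric_bike",
                    "classic_bike", "electric_bike")
)

# Row-wise proportions: how each rider type's rides split across bike types
bike_mix <- prop.table(table(rides$member_casual, rides$rideable_type), margin = 1)
print(round(bike_mix, 2))
```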
The graph above shows that casual riders have a higher average ride length than members month by month, even though members take more rides overall. This was surprising, so I also looked at the average by day of the week.
Code used for graph:
ride_length_month_graph <- year24_data %>%
group_by(member_casual, month_name) %>%
summarise(average_ride_length = mean(ride_length_mins, na.rm = TRUE)) %>%
arrange(month_name, member_casual) %>%
ggplot(aes(x = month_name, y = average_ride_length, fill = member_casual)) +
geom_col(position = "dodge") +
labs(x = 'Month',
y = 'Average Ride Length (mins)',
title = 'Monthly Rider Average',
fill = 'Type of Membership')
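Averages are sensitive to a few very long rides, and the maximum ride length computed earlier (1559.93 minutes, roughly 26 hours) shows such rides are in the data. Comparing the median with the mean for each rider type is a quick way to check that the casual-rider gap is not driven by outliers. A base R sketch on made-up values:

```r
# Hypothetical ride lengths in minutes; one casual ride was left running for a day
rides <- data.frame(
  member_casual    = c("casual", "casual", "casual", "member", "member", "member"),
  ride_length_mins = c(15, 20, 1440, 9, 11, 12)
)

# Mean and median ride length for each rider type
mean_by_type   <- tapply(rides$ride_length_mins, rides$member_casual, mean)
median_by_type <- tapply(rides$ride_length_mins, rides$member_casual, median)
print(cbind(mean = mean_by_type, median = median_by_type))
```

Here the single day-long ride pulls the casual mean far above its median. On the real data, the same check could sit inside the existing summarise() call as median(ride_length_mins, na.rm = TRUE) next to the mean.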
The graph above shows the same pattern by day of the week: casual riders have a higher average ride length than members.
Code used for graph:
ride_length_days_graph <- year24_data %>%
group_by(member_casual, day_of_week) %>%
summarise(average_ride_length = mean(ride_length_mins, na.rm = TRUE)) %>%
arrange(day_of_week, member_casual) %>%
ggplot(aes(x = day_of_week, y = average_ride_length, fill = member_casual)) +
geom_col(position = "dodge") +
labs(x = 'Day of Week',
y = 'Average Ride Length (mins)',
title = 'Daily Rider Average',
fill = 'Type of Membership')
Thus, members tend to ride more frequently and more consistently across the week, while casual riders take fewer but longer rides, concentrated on weekends.
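One way to put a number on "consistent" is the spread of ride lengths within each group. The base R sketch below computes the interquartile range (IQR) per rider type on made-up values; on the real data, a smaller IQR for members would support the summary above.

```r
# Hypothetical ride lengths in minutes for each rider type
rides <- data.frame(
  member_casual    = c("casual", "casual", "casual", "casual",
                       "member", "member", "member", "member"),
  ride_length_mins = c(5, 12, 30, 60, 8, 10, 11, 14)
)

# Interquartile range per rider type: smaller means more similar ride lengths
iqr_by_type <- tapply(rides$ride_length_mins, rides$member_casual, IQR)
print(iqr_by_type)
```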