cyclists_analysis

Strategies for Accelerating Growth: A Cyclistic Case Study

Introduction

This case study, part of the Google Data Analytics Professional Certificate program, delves into the strategic steps essential for fostering rapid expansion within the bike-share domain. The study meticulously examines the phases of:

Ask
Prepare
Process
Analyze
Share
Act

Company Overview

Cyclistic launched its bike-share service in 2016. The fleet has since grown to 5,824 bicycles stationed at 692 locations across Chicago. Riders can unlock a bike at one station and return it to any other station in the network.

Objectives

The primary aim is to design marketing initiatives that convert casual riders into annual members. This requires understanding how annual members and casual riders use Cyclistic differently, what would motivate casual riders to buy a membership, and how digital media could influence the marketing strategy. Lily Moreno, Director of Marketing, leads this effort and has asked for an analysis of historical bike trip data to identify patterns and trends among Cyclistic users.

Ask

Cyclistic now offers several pricing options: single-ride passes, full-day passes, and annual memberships. Single-ride and full-day passes mainly serve casual riders, while annual memberships identify Cyclistic's committed customers.

For the forthcoming analysis, data spanning 12 months, from April 2020 to March 2021, will be scrutinized. Key inquiries to be addressed include:

Comparative analysis of Cyclistic bike utilization patterns between annual members and casual riders, encompassing factors such as:

Duration of bike routes utilized
Frequency of bike usage
Predominant station locations frequented by each user segment

Evaluation of station performance metrics and geographical trends to discern usage disparities and potential growth avenues.

This meticulous inquiry sets the stage for subsequent phases of preparation, processing, analysis, and strategic action, ultimately facilitating Cyclistic’s quest for sustained growth and market dominance within the bike-share landscape.

Prepare

First, we create a function that reads every CSV file in a folder, combines them into a single data frame, adds the derived columns used later, and removes invalid rows. We then assign its result to the processed_data variable.

library(tidyverse)
library(geosphere)
library(wordcloud)


# First, define a function to process the data we collected
process_data <- function(file_path) {
  # Read all CSV files and combine them into one data frame.
  # Every column is read as character first so files with differing
  # column types can be bound together; type_convert() then re-infers types.
  # Note: list.files() takes a regular expression, not a glob.
  result <- list.files(path = file_path, pattern = "\\.csv$", full.names = TRUE) %>%
    purrr::map_dfr(~ read.csv(.x) %>% mutate(across(everything(), as.character))) %>%
    readr::type_convert()
  
  # Adding columns for month, year, and day_of_week
  result$month <- format(as.Date(result$started_at), "%b")
  result$year <- format(as.Date(result$started_at), "%Y")
  result$day_of_week <- format(as.Date(result$started_at), "%A")
  
  # Creating hour column
  result$hour <- strftime(result$ended_at, "%H")
  
  # Creating ride length column (in minutes)
  result$ride_length <- as.numeric(difftime(result$ended_at, result$started_at, units = "mins"))
  
  # Creating ride distance column (in km); distGeo() returns meters,
  # so divide by 1000 exactly once
  result$ride_distance <- geosphere::distGeo(cbind(result$start_lng, result$start_lat),
                                             cbind(result$end_lng, result$end_lat)) / 1000
  
  # Ordering the day_of_week column (misspelled levels would turn those days into NA)
  result$day_of_week <- ordered(result$day_of_week, levels = c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
  
  #Ordering the Month Column
  result$month <- ordered(result$month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
  
  #Renaming the Columns
  names(result) [2] <- 'bike_model'
  names(result) [13] <- 'member_type'
  
  #Remove rows with NA
  result <- drop_na(result)
  
  # Remove rides shorter than 1 minute (this also drops negative durations)
  result <- result[!result$ride_length < 1, ]
  
  # Remove rides longer than 1 day (1440 minutes)
  result <- result[!result$ride_length > 1440,]
  
  return(result)
}

processed_data <- process_data("./")
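
As a quick sanity check on the cleaning rules in process_data, the sketch below applies the same ride-length calculation and the same two filters to a small hand-made data frame. The rows and the names toy_rides and clean_rides are assumptions made up for illustration; they are not taken from the Cyclistic data.

```r
# Hypothetical toy data: four rides with known durations
# (30 minutes, 20 seconds, 2 days, and a negative duration)
toy_rides <- data.frame(
  started_at = as.POSIXct(c("2020-04-01 08:00:00", "2020-04-01 09:00:00",
                            "2020-04-01 10:00:00", "2020-04-01 11:00:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2020-04-01 08:30:00", "2020-04-01 09:00:20",
                            "2020-04-03 10:00:00", "2020-04-01 10:50:00"), tz = "UTC")
)

# Ride length in minutes, computed the same way as in process_data
toy_rides$ride_length <- as.numeric(
  difftime(toy_rides$ended_at, toy_rides$started_at, units = "mins")
)

# Keep rides between 1 minute and 1 day (1440 minutes), inclusive
clean_rides <- toy_rides[toy_rides$ride_length >= 1 & toy_rides$ride_length <= 1440, ]
```

Only the 30-minute ride survives, which is the behavior the two filters are meant to have.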

Process

After creating the processed_data variable, we will proceed to extract and analyze specific information. The following analyses will be conducted:

users: Total number of users as a percentage.
bikes: Total number of used bikes as a percentage.
bike_model_user_type_casual: Bikes used by Casual Members.
bike_model_user_type_member: Bikes used by Annual Members.
user_day_rel: Users’ usage compared to days of the week.
bike_model_day_rel: Bikes’ usage compared to days of the week.
user_month_rel: Users’ usage compared to months.
bike_model_month_rel: Bikes’ usage compared to months.
user_ride_length_rel: Total ride length by member type.
user_ride_length_month_rel: Total ride length by month and member type.
hours: Number of rides by hour and member type.
avg_bike_rides_month: Total ride length by month and bike type.
start_station_users_casual: Start stations of Casual users.
start_station_users_member: Start stations of Annual members.

users <- processed_data %>%
  group_by(member_type) %>%
  summarize(total = n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(member_type) %>%
  summarize(precentage = total/all_total * 100)

bikes <- processed_data %>%
  group_by(bike_model) %>%
  summarize(total = n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(bike_model) %>%
  summarize(precentage = total/all_total * 100)

bike_model_user_type_casual <- processed_data %>%
  filter(member_type == "casual") %>%
  group_by(bike_model) %>%
  summarize(total= n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(bike_model) %>%
  summarize(precentage = total/all_total * 100)

bike_model_user_type_member <- processed_data %>%
  filter(member_type == "member") %>%
  group_by(bike_model) %>%
  summarize(total= n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(bike_model) %>%
  summarize(precentage = total/all_total * 100)

# Share of rides by member type within each day of the week
user_day_rel <- processed_data %>%
  group_by(day_of_week, member_type) %>%
  summarize(total = n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(day_of_week, member_type) %>%
  summarize(precentage = total/all_total * 100)

# Share of rides by bike type within each day of the week
bike_model_day_rel <- processed_data %>%
  group_by(day_of_week, bike_model) %>%
  summarize(total = n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(day_of_week, bike_model) %>%
  summarize(precentage = total/all_total * 100)

# Share of rides by member type within each month
user_month_rel <- processed_data %>%
  group_by(month, member_type) %>%
  summarize(total = n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(month, member_type) %>%
  summarize(precentage = total/all_total * 100)

# Share of rides by bike type within each month
bike_model_month_rel <- processed_data %>%
  group_by(month, bike_model) %>%
  summarize(total = n()) %>%
  mutate(all_total = sum(total)) %>%
  group_by(month, bike_model) %>%
  summarize(precentage = total/all_total * 100)



user_ride_length_rel <- processed_data %>%
  group_by(member_type)%>%
  summarize(ride_length = sum(ride_length))

user_ride_length_month_rel <- processed_data %>%
  group_by(member_type, month)%>%
  summarize(ride_length = sum(ride_length))


hours <- processed_data %>%
  group_by(member_type, hour)%>%
  summarize(number_of_rides = n(), .groups = "drop")%>%
  arrange(hour)

avg_bike_rides_month <- processed_data %>%
  group_by(bike_model, month)%>%
  summarize(ride_length = sum(ride_length))

start_station_users_casual <- processed_data%>%
   filter(member_type == "casual")%>%
   group_by(start_station_name)%>%
   summarize(total = n())
start_station_users_member <- processed_data%>%
   filter(member_type == "member")%>%
   group_by(start_station_name)%>%
   summarize(total = n())
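
The count-then-share pattern used repeatedly above can be checked on a small example. The sketch below uses base R's table() and prop.table() instead of the dplyr pipeline, so it is an alternative way to get the same kind of percentages, not the code used in this report. The toy vectors and the names shares_df and day_shares are hypothetical.

```r
# Hypothetical toy data: 3 member rides and 1 casual ride
toy_member_type <- c("member", "casual", "member", "member")

# Count rides per type, then convert counts to percentages of the total
counts <- table(toy_member_type)
shares <- prop.table(counts) * 100
shares_df <- data.frame(member_type = names(shares),
                        precentage = as.numeric(shares))

# Shares within each day: margin = 1 makes every row (day) sum to 100
toy_day  <- c("Monday", "Monday", "Monday", "Monday", "Sunday", "Sunday")
toy_type <- c("member", "member", "member", "casual", "casual", "member")
day_shares <- prop.table(table(toy_day, toy_type), margin = 1) * 100
```

The second table corresponds to what user_day_rel computes: percentages are relative to the total for each day, not to the whole data set.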

Analyze

Following the data compilation, we will proceed with chart creation to analyze the gathered data. This process begins by establishing key variables essential for our analysis.

# Reused Variables
two_color_pallate <-  c("#FF204E", "#A0153E")
three_color_pallate <- c("#FF204E", "#A0153E","#5D0E41")

user_types_original <- c("casual", "member")
user_types_chart <- c("Casual Member", "Annual Member")
bike_types_original <- c("classic_bike", "docked_bike", "electric_bike")
bike_types_chart <- c("Classic", "Docked", "Electrical")

footer_text <- "Data: Motivate International"

Subsequently, we will develop functions tailored to generate insightful charts based on the provided data sets.

These functions encompass the following:

plot_distribution: Generates a pie chart showing how a total is distributed across categories.
stackbar_plot: Creates a dodged (side-by-side) bar chart with percentage labels, for comparing categories across months or days of the week.
stackbar_plot_lenght: Produces a stacked bar chart without labels, used for totals such as ride length and number of rides.

# Function to use for the charts
plot_distribution <- function(data, x_value, y_value, fill_value, legend, colors, top_name, sections_data, labels_text, main_title, subtitle_text, caption_text) {
  ggplot(data, aes(x={{x_value}}, y={{y_value}}, fill={{fill_value}}))+
    geom_bar(stat= "identity", width = 1)+
    coord_polar(theta = "y", start = 0)+
    geom_text(aes(label = scales :: percent(round({{y_value}}) / 100)), position = position_stack(vjust = 0.5), size = 5, fontface = "bold", color = "#FFFFFF") +
    scale_fill_manual(values = {{colors}}, name = {{top_name}}, breaks = {{sections_data}}, labels = {{labels_text}})+
    labs(title = {{main_title}}, subtitle = {{subtitle_text}}, caption = {{caption_text}}, fill = legend)+
    theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold", color = "#00224D"),
          plot.subtitle = element_text(hjust = 0.5, size = 12, face = "bold", color = "grey20"),
          plot.caption = element_text(size = 4, color = "grey35"),
          legend.title = element_text(size = 12, face = "bold", color = "#00224D"),
          legend.text = element_text(size = 10, color = "grey20"))
}

stackbar_plot <- function(data, x_value, y_value, fill_value, legend, colors, top_name, sections_data, labels_text, main_title, subtitle_text, caption_text){
  ggplot(data, aes(x={{x_value}}, y={{y_value}}, fill={{fill_value}}))+
    geom_bar(position = "dodge", stat="identity")+
    geom_text(aes(label = scales :: percent(round({{y_value}}) / 100)), position = position_dodge(width =0.9),vjust=-0.5, size = 3, fontface = "bold", color = "#000000")+
    scale_fill_manual(values = {{colors}}, name={{top_name}}, breaks = {{sections_data}}, labels= {{labels_text}})+
    scale_y_continuous(labels = scales::comma)+
    labs(title={{main_title}}, subtitle = {{subtitle_text}}, caption = {{caption_text}}, fill=legend)+
    theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold", color = "#00224D"),
          plot.subtitle = element_text(hjust = 0.5, size = 12, face = "bold", color = "grey20"),
          plot.caption = element_text(size = 4, color = "grey35"),
          legend.title = element_text(size = 12, face = "bold", color = "#00224D"),
          legend.text = element_text(size = 10, color = "grey20"))
}

stackbar_plot_lenght <- function(data, x_value, y_value, fill_value, legend, colors, top_name, sections_data, labels_text, main_title, subtitle_text, caption_text){
  ggplot(data, aes(x={{x_value}}, y={{y_value}}, fill={{fill_value}}))+
    geom_bar(position = "stack", stat="identity")+
    scale_fill_manual(values = {{colors}}, name={{top_name}}, breaks = {{sections_data}}, labels= {{labels_text}})+
    scale_y_continuous(labels = scales::comma)+
    labs(title={{main_title}}, subtitle = {{subtitle_text}}, caption = {{caption_text}}, fill=legend)+
    theme(plot.title = element_text(hjust = 0.5, size = 16, face = "bold", color = "#00224D"),
          plot.subtitle = element_text(hjust = 0.5, size = 12, face = "bold", color = "grey20"),
          plot.caption = element_text(size = 4, color = "grey35"),
          legend.title = element_text(size = 12, face = "bold", color = "#00224D"),
          legend.text = element_text(size = 10, color = "grey20"))
}

Total number of users as a percentage.

In this report, we examine the distribution of membership types and their respective percentages in relation to the total number of members. As observed, the majority of our members are classified as Annual members.

# Charts to display
plot_distribution(users, "", precentage, member_type, "Members", two_color_pallate, "Types of riders", 
                                user_types_original, user_types_chart, "Distribution of Riders", "What percentage of riders are using Cyclistic?", 
                                footer_text)

Total number of used bikes as a percentage.

Next, we look at how rides are distributed across bike types. According to the findings, Docked bikes are the most frequently used.

plot_distribution(bikes, "", precentage, bike_model, "Bikes", three_color_pallate, 
                                    "Types of Bikes", bike_types_original, bike_types_chart, 
                                    "Distribution of Bikes", "What percentage of all riders are using the Bikes?", 
                                    footer_text)

Bikes used by Annual Members.

Following this, we delve into the analysis of bike usage by Annual members. Once more, our observations indicate that Docked bikes represent the highest proportion in terms of usage among this membership category.

plot_distribution(bike_model_user_type_member, "", precentage, bike_model, "Bikes", three_color_pallate, 
                                       "Types of Bikes", bike_types_original, bike_types_chart, 
                                       "Member Bikes", "What percentage of member riders are using the Bikes?",
                                       footer_text)

Bikes used by Casual Members.

Once again, we conduct a similar report for Casual Members. Consistently, our findings reveal that Docked bikes remain the most utilized among this membership segment.

plot_distribution(bike_model_user_type_casual, "", precentage, bike_model, "Bikes", three_color_pallate, 
                                       "Types of Bikes", bike_types_original, bike_types_chart, 
                                       "Casual Bikes", "What percentage of casual riders are using the Bikes?", 
                                       footer_text)

Users’ usage compared to months.

In this report, our objective is to analyze the bike usage patterns among various membership types across different months. It is evident that Casual members exhibit the highest usage, particularly in January, with usage becoming more comparable across all membership types around June and July.

stackbar_plot(user_month_rel, month, precentage, member_type, "Members", two_color_pallate,
                                       "Members Type", user_types_original, user_types_chart, 
                                       "Members usage through the Months", "Comparison of the users through the months", 
                                       footer_text)

Users’ usage compared to days of the week.

In our comparative analysis across days of the week, it becomes apparent that Saturdays and Sundays observe heightened usage among our Annual users.

stackbar_plot(user_day_rel, day_of_week, precentage, member_type, "Members", two_color_pallate,
              "Members Type", user_types_original, user_types_chart, 
              "Members usage through Days of Week", "Comparison of the users through Days of Week", 
              footer_text)

Bikes’ usage compared to months.

Following our monthly analysis on the types of bikes utilized, a notable trend emerges: as temperatures rise, there is a noticeable increase in the usage of Docked bicycles. Conversely, as colder weather seasons approach, Classic bikes observe higher usage rates.

stackbar_plot(bike_model_month_rel, month, precentage, bike_model, "Bikes", three_color_pallate,
              "Types of Bikes", bike_types_original, bike_types_chart, 
              "Bikes usage through the Months", "Comparison of the bikes through the months", 
              footer_text)

Bikes’ usage compared to days of the week.

Continuing with our analysis, we scrutinized bike usage patterns across days of the week. It is evident that Docked bikes are consistently the most utilized throughout the entire week.

stackbar_plot(bike_model_day_rel, day_of_week, precentage, bike_model, "Bikes", three_color_pallate,
              "Types of Bikes", bike_types_original, bike_types_chart, 
              "Bikes usage through Days of Week", "Comparison of the bikes through Days of Week", 
              footer_text)

Users’ ride length by month and type.

In this analysis, we examined total ride length for the two member groups, month by month. Casual members accumulate more ride time than annual members, and total ride time for both groups rises in the warmer months.

stackbar_plot_lenght(user_ride_length_month_rel, month, ride_length, member_type, "Members", two_color_pallate,
              "Members Type", user_types_original, user_types_chart, 
              "Members Ride Length through the Months", "Comparison of the user Ride Length through the months", 
              footer_text)

Users’ ride length compared by member type.

This report shows a clear gap in total ride length between the two groups: casual members accumulate markedly more ride time than annual members.

stackbar_plot_lenght(user_ride_length_rel, "", ride_length, member_type, "Members", two_color_pallate,
                     "Members Type", user_types_original, user_types_chart, 
                     "Members Ride Length", "Comparison of total Ride Length by member type", 
                     footer_text)

Bikes’ ride length by month and type.

In this analysis, we compared total ride length across bike types for each month. The recurring trend is that Docked bikes consistently record the highest total ride length among all bike types.

stackbar_plot_lenght(avg_bike_rides_month, month, ride_length, bike_model, "Bikes", three_color_pallate,
                     "Types of Bikes", bike_types_original, bike_types_chart,
                     "Bikes Ride Length through the Months", "Comparison of the bike Ride Length through the months", 
                     footer_text)

Rides compared to hours.

This report shows how rides are spread across the 24 hours of the day, using the hour in which each ride ended. The analysis reveals that our members predominantly use the bikes between 16:00 and 21:00.

stackbar_plot_lenght(hours, hour, number_of_rides, member_type, "Members", two_color_pallate,
                     "Members Type", user_types_original, user_types_chart, 
                     "Rides by Hour of Day", "Number of rides per hour by member type", 
                     footer_text)

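The busiest hour can also be read off numerically rather than from the chart. The sketch below sums rides per hour over both member types and sorts the result, using a made-up table with the same columns as hours. The values and the names toy_hours, rides_per_hour and peak_hour are assumptions for illustration only.

```r
# Hypothetical toy table with the same columns as `hours`
toy_hours <- data.frame(
  member_type = c("casual", "member", "casual", "member", "casual", "member"),
  hour = c("08", "08", "17", "17", "22", "22"),
  number_of_rides = c(10, 40, 60, 70, 20, 15)
)

# Total rides per hour across both member types
rides_per_hour <- aggregate(number_of_rides ~ hour, data = toy_hours, FUN = sum)

# Sort from busiest to quietest and take the first hour
busiest <- rides_per_hour[order(-rides_per_hour$number_of_rides), ]
peak_hour <- as.character(busiest$hour[1])
```

Applied to the real hours table, the top rows of busiest would give the peak window directly.
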
Start stations of Casual users.

Looking at the start stations used most often by casual members, Millennium Park emerges as the favored start station among our casual riders.

wordcloud(words = start_station_users_casual$start_station_name, freq = start_station_users_casual$total, min.freq = 1, max.words = 200, random.order = FALSE, colors = brewer.pal(8, "Dark2"))

Start stations of Annual members.

Looking at the start stations used most often by annual members, Wells St & Elm St emerges as the favored start station among our annual riders.

wordcloud(words = start_station_users_member$start_station_name, freq = start_station_users_member$total, min.freq = 1, max.words = 200, random.order = FALSE, colors = brewer.pal(8, "Dark2"))
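
A word cloud gives a visual impression, but a sorted table makes the ranking explicit. The sketch below shows one way to list the top start stations from a table shaped like start_station_users_casual. The station names, counts and the names toy_stations and top_stations are hypothetical placeholders, not results from this analysis.

```r
# Hypothetical toy table with the same columns as start_station_users_casual
toy_stations <- data.frame(
  start_station_name = c("Station A", "Station B", "Station C"),
  total = c(120, 450, 300)
)

# Sort by ride count, highest first, and keep the top 2 stations
top_stations <- head(
  toy_stations[order(toy_stations$total, decreasing = TRUE), ],
  2
)
```

Replacing toy_stations with start_station_users_casual or start_station_users_member, and 2 with the desired number of rows, would give the corresponding ranking for each rider group.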

Act

Based on the analysis conducted, the following recommendations are proposed:

Introduce Docked type bikes specifically tailored for Annual memberships to enhance user experience and satisfaction.
Direct advertising campaigns towards summer periods to capitalize on increased bike usage during warmer weather.
Target casual riders with extended ride durations by showcasing potential savings achievable through Annual membership subscriptions.
Allocate advertising efforts towards start stations frequented by casual members, with particular emphasis on popular locations like Millennium Park.
Tailor advertising campaigns to individuals seeking transportation options for their commute back from work, aligning with peak usage times.

Kamyarmk

2024-04-04

Thank you for Reading