Hello, I’m Cherrie Cosby. My journey into the world of computers was ignited by an insatiable curiosity that led me to dive deep into learning. This journey has taken me from being a contract software specialist to becoming a skilled data analyst. My expertise encompasses a wide range of sectors, from manufacturing to marketing, where I’ve skillfully blended my understanding of business dynamics with my prowess in software.
Throughout my career, I’ve spearheaded initiatives that have not only amplified revenues but also streamlined expenditures. I’m deeply driven by the prospect of driving innovation at the intersection of technology and business. I’m enthusiastic about lending my multidisciplinary insights to pioneering projects that have the potential to redefine norms.
My toolkit is well-equipped with VBA, SQL, R, Git, Python, Tableau, and beyond. These tools enable me to navigate complex challenges with finesse and creativity. I’m excited about the opportunity to collaborate and pave fresh pathways in this dynamic and thrilling realm. If you’re as excited about the future of technology and business as I am, let’s join forces and make remarkable strides together!
Welcome to the Cyclistic Marketing Analysis, an exploration of Chicago’s premier bike share program. Cyclistic boasts over 5,800 bikes stationed across 692 docking points, catering to a diverse range of riders, from casual cyclists to daily commuters.
Key Participants:
Cherrie Cosby – Junior Data Analyst in the Cyclistic marketing analytics team.
Lily Moreno – Stakeholder - Director of Marketing in charge of development for campaigns and initiatives to promote the program, including email, social media, and/or other channels.
Company and Program Details:
Cyclistic offers a range of bikes, including classic bikes, docked bikes, and electric bikes, with various styles and options. This diversity ensures accessibility for all riders, including those with disabilities.
Approximately 70% of riders use the bikes for leisure, while 30% use them for daily commuting, with about 8% of riders utilizing the assistive options.
Cyclistic provides flexible pricing plans, such as single-ride, full-day, and annual membership options.
Background and Task:
The objective is clear: to delve into the usage patterns of Cyclistic’s annual members and casual riders. Through this analysis, I aim to unearth the unique characteristics and behaviors that set these two rider groups apart. Armed with this understanding, my mission is to craft precise marketing strategies with one overarching goal – transforming casual riders into dedicated annual members. This strategy is designed to maximize annual memberships and propel Cyclistic’s growth. My analysis will provide a data-driven roadmap, guiding the implementation of a potent marketing program that will fortify Cyclistic’s position as a premier bike-share company.
The marketing analyst team, led by Lily Moreno, is interested in analyzing the Cyclistic historical bike trip data to identify trends.
Hypothesis: Maximizing the number of annual members will be key to future growth.
The Goal: Increase conversion of casual riders into annual members.
The Questions to be Answered:
Join me on this exciting journey into the behaviors of Cyclistic’s ridership, where data-driven decisions pave the path to success.
My drive to undertake this project is fueled by a deep-seated passion for business analytics and an unwavering commitment to revolutionize urban mobility.Through refining Cyclistic’s annual membership program and optimizing the transition from casual user to dedicated member, I am committed to promoting sustainable and eco-friendly transportation options, providing a seamless and gratifying experience for all riders.
The primary audience for this project is prospective employers, recruiters, and fellow data enthusiasts who seek to understand the practical application of data analysis in real-world scenarios. My goal is to showcase a professional and skillful approach to business analytics, providing a tangible example of data-driven decision-making.
Original Source Data
Additional Temporal Data
The holidays data set focuses on the city of Chicago, Illinois, spanning the time frame from January 1, 2022, to December 31, 2022 and was compiled using Google Search. The data set is centered around key holidays observed within this period. It includes holiday names along with their respective start and end dates, providing insights into the specific days of significance throughout the year. The data set captures the essence of these holidays, allowing for a comprehensive understanding of their occurrences and durations in Chicago. This set of holiday data serves as a valuable resource for analyzing and contextualizing rider behaviors during these significant dates within the city.
I acquired the weather data from a reputable source, Visual Crossing, with explicit permission. The acquired weather data set is dedicated to the city of Chicago, Illinois, covering the time span from January 1, 2022, to December 31, 2022. This data set presents a temporal dimension through daily temperature recordings and encompasses a wide array of meteorological variables. Among the temperature measurements in Fahrenheit, the data set encompasses humidity levels, detailed precipitation data, atmospheric pressure readings, solar indexes, cloud information, visibility metrics, and wind statistics. Additionally, the data set includes essential data points such as sunrise and sunset times, moon phases, weather conditions, their corresponding descriptions, and the collection location for the data, with distance measurements provided in miles. This comprehensive data set offers invaluable insights into the climatic trends and conditions of Chicago throughout the specified time frame.
To enhance visual flow, collapsible sections have been utilized to incorporate both code chunks and their corresponding outputs, providing a streamlined and user-friendly presentation.
rmarkdown: Used for generating dynamic reports.
knitr: Enables dynamic document creation.
tidyverse: Used for data manipulation and visualization.
data.table: Provides fast data frame manipulation.
lubridate: Used for handling date and time data.
sf: Provides tools for spatial data analysis.
leaflet: Used for interactive maps.
hms: Used for handling time data in hours, minutes, and seconds format, and for performing time-related operations.
scales: Load the scales library for formatting numbers
ggpubr: Provides a suite of functions and tools for creating publication-ready plots and figures using ggplot2.
gridExtra: Extends the grid graphics system in R, allowing for the arrangement of multiple grid-based figures on one page.
patchwork: Enables the seamless combination and arrangement of multiple ggplots, offering flexibility in creating complex multi-panel visualizations
gt: an R package for creating and formatting tables with ease and flexibility
RPostgres: provides a connection between R and PostgreSQL, allowing users to interact with and manipulate PostgreSQL databases directly from R.
RODBC: for seamless interaction with databases, making data retrieval and manipulation straightforward and flexible
odbc: Simplifying database interaction in R with ease and flexibility, this package streamlines the process of data retrieval and manipulation.
dplyr: designed to streamline data manipulation tasks, offering ease and flexibility in creating and formatting tables.
dbplyr: specialized package that enhances data manipulation capabilities, providing seamless interaction with databases for creating and formatting tables with ease and flexibility.
plotly: An R package that empowers users to create interactive and visually appealing plots with ease and flexibility, enhancing data visualization capabilities.
RColorBrewer: a series of color maps that are designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency.
# Set mirror
options(repos = c(CRAN = "https://cran.rstudio.com/"))
# Update Packages
update.packages(ask = FALSE, dependencies = TRUE)
# List of Packages Used
library(DBI)
library(rmarkdown)
library(knitr)
library(tidyverse)
library(data.table)
library(lubridate)
library(sf)
library(leaflet)
library(hms)
library(scales)
library(ggpubr)
library(gridExtra)
library(patchwork)
library(gt)
library(RPostgres)
library(RODBC)
library(odbc)
library(dplyr)
library(dbplyr)
library(plotly)
library(RColorBrewer)
library(kableExtra)
# Load necessary packages
library(RPostgres)
library(RODBC)
library(odbc)
# Connect to Database
conn <- dbConnect(
drv = RPostgres::Postgres(),
dbname = dsn_database,
host = dsn_hostname,
port = dsn_port,
user = dsn_uid,
password = dsn_pwd
)
# Get List of Tables
dbListTables(conn)
## [1] "cyclistic_data_1" "cyclistic_data_2" "cyclistic_data_3"
## [4] "cyclistic_data_4" "cyclistic_data_5" "cyclistic_data_6"
## [7] "cyclistic_data_7" "cyclistic_data_8" "cyclistic_data_9"
## [10] "cyclistic_data_10" "cyclistic_data_11" "cyclistic_data_12"
## [13] "temporal_data" "holidays" "user_data"
## [16] "weather_data" "user_data_backup" "temporal_data_backup"
-> Please take note that the tables of interest are ‘user_data’ and ‘temporal_data’. To streamline the analysis, I consolidated the ‘holidays’ and ‘weather_data’ tables, as they pertain to the same time frame, into a unified table named ‘temporal_data’.
In this section, we’ll provide a brief overview of the data we’re working with.
The data is clean, and ready for analysis. To access the finalized data imported into R for this analysis, you can download the files user_data, and temporal_data
I initiated this process by obtaining the raw data files and employed VBA scripts for the preprocessing phase. Subsequently, I transferred the processed data into PostgreSQL for in-depth analysis. The VBA and SQL scripts used, along with the observations derived from this stage, can be found in the ‘Processes and Observations’ document, accessible here.
Let’s start by looking at some summary statistics for our data sets.
For the combined_data data set you can see, it contains 5,667,614 records and 18 columns, consisting of:
# Get User Data Summary
user_data_records_count <- dbGetQuery(conn,
"SELECT COUNT(*)
FROM user_data;")
print(user_data_records_count)
## count
## 1 5667614
These columns have been retained for analysis, as each record contains the required information for the evaluation process, as per the provided instructions.
And for the temporal data set you can see, it contains 367 (as there are two overlapping multi-day holidays) records and 37 columns, consisting of:
# Get Temporal Data Summary
temporal_data_record_count <- dbGetQuery(conn,
"SELECT COUNT(*)
FROM temporal_data;")
print(temporal_data_record_count)
## count
## 1 367
As this is an evaluation of user behaviors for casual users of the service versus those with memberships, from this point on the data will be segmented into ‘casual’ and ‘member’.
The chosen color palette was deliberately selected to ensure readability for all viewers, including those with various forms of color blindness.
# User Count Query
user_count <- dbGetQuery(conn,
"SELECT member_casual, Count(*) AS user_count
FROM user_data
GROUP BY member_casual;")
print(user_count)
## member_casual user_count
## 1 casual 2321974
## 2 member 3345640
# Average Ride Length Query
average_ride_length <- dbGetQuery(conn,
"SELECT
ROUND(AVG(
CASE WHEN member_casual = 'member'
THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END
)::numeric, 0) AS rounded_member_mean_ride_length_minutes,
ROUND(AVG(
CASE WHEN member_casual = 'casual'
THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END
)::numeric, 0) AS rounded_casual_mean_ride_length_minutes
FROM
user_data;")
print(average_ride_length)
## rounded_member_mean_ride_length_minutes
## 1 13
## rounded_casual_mean_ride_length_minutes
## 1 29
# Convert user_count to integer
user_count$user_count <- as.integer(user_count$user_count)
library(RColorBrewer) # Load the RColorBrewer package
colors <- brewer.pal(11, "PuOr")
# Create the user_count bar plot
count_plot <- ggplot(user_count, aes(x = member_casual, y = user_count, fill = member_casual)) +
geom_bar(stat = "identity", position = "dodge", color = "white") + # Manually set color
labs(x = NULL, y = "User Count (Millions)") + # Remove x-axis label
ggtitle("Yearly Count by Users") +
scale_fill_manual(values = c("#7F3B08", "#542788")) + # Use PuOr color palette
scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) +
theme(legend.position = "none", # Remove the legend
axis.text.x = element_text(face = "bold")) # Make x-axis labels bold
# Load the necessary package
library(RColorBrewer)
# Format data for plotting
plot_data <- data.frame(
User_Type = c("member", "casual"),
Average_Ride_Length = c(
average_ride_length$rounded_member_mean_ride_length_minutes[1],
average_ride_length$rounded_casual_mean_ride_length_minutes[1]
))
# Create the bar plot
average_plot <- ggplot(plot_data, aes(x = User_Type, y = Average_Ride_Length, fill = User_Type)) +
geom_bar(stat = "identity", position = "dodge", color = "white") + # Manually set color
xlab(NULL) + # Remove x-axis label
ylab("Average Ride Length (Minutes)") + # Define y-axis label
ggtitle("Yearly Average Ride Length") +
scale_fill_manual(values = c("#7F3B08", "#542788")) + # Use PuOr color palette
theme(legend.position = "none", # Remove legend
axis.text.x = element_text(face = "bold")) # Make x-axis labels bold
# Merging the two plots side by side into one chart
library(cowplot)
# Combine the plots
combined_year_plot <- cowplot::plot_grid(count_plot, average_plot, ncol = 2, align = "h")
Examining the data for the year 2022, it becomes evident that ‘Member’ users constitute the majority. However, a noteworthy distinction arises in the average ride duration between ‘Member’ and ‘Casual’ users. Although members engage in a higher number of rides, casual users exhibit extended durations. This contrast in ride lengths implies diverse levels of user engagement and utilization patterns.
NOTE: The contrast in average ride duration highlights the significant difference in behavior between ‘Casual’ and ‘Member’ users. It’s important to note that ‘Casual’ user data encompasses a wide range of ride durations, from full-day excursions to shorter individual rides. This diversity can sometimes skew the perception of average ride length, especially when considering the inclusion of both extended full-day rides and shorter individual trips.
Below, you’ll find visualizations summarizing my process. I’ll highlight key insights that play a pivotal role in shaping the marketing plan, rather than providing an exhaustive list of findings. For a comprehensive view of all queries and their corresponding observations, please consult the ‘Processes and Observations’ section.
monthly_user_count <- dbGetQuery(conn,
"SELECT
TO_CHAR(DATE_TRUNC('month', started_at), 'Month') AS month_name,
COUNT(*) FILTER (WHERE member_casual = 'casual') AS casual_count,
COUNT(*) FILTER (WHERE member_casual = 'member') AS member_count
FROM
user_data
GROUP BY
TO_CHAR(DATE_TRUNC('month', started_at), 'Month')
ORDER BY
MIN(started_at);")
monthly_average_ride_length <- dbGetQuery(conn,
"SELECT
TO_CHAR(started_at, 'Month') AS month,
ROUND(
AVG(CASE WHEN member_casual = 'member'
THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
) AS member_mean_ride_length_minutes,
ROUND(
AVG(CASE WHEN member_casual = 'casual'
THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
) AS casual_mean_ride_length_minutes
FROM
user_data
GROUP BY
TO_CHAR(started_at, 'Month')
ORDER BY
TO_DATE(TO_CHAR(started_at, 'Month'), 'Month');
")
library(ggplot2)
library(RColorBrewer)
# Creating data frame monthly_user_count
monthly_user_count <- data.frame(
month_name = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
casual_count = c(18520, 21416, 89880, 126417, 280413, 369043, 406046, 358917, 296694, 208987, 100747, 44894),
member_count = c(85250, 94193, 194160, 244832, 354443, 400148, 417426, 427000, 404636, 349693, 236947, 136912)
)
# Set month_name as a factor with the desired order
monthly_user_count$month_name <- factor(monthly_user_count$month_name, levels = c(
"January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"
))
# Reshape the data for plotting
monthly_user_count_long <- pivot_longer(monthly_user_count, cols = c(casual_count, member_count), names_to = "User_Type", values_to = "value")
# Create a stacked bar plot with legend, matching legend settings
monthly_user_plot <- ggplot(monthly_user_count_long, aes(x = month_name, y = value, fill = User_Type)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Monthly User Counts", x = NULL, y = "User Count (Millions)") +
scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("", "")) +
scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
legend.key.size = unit(0.1, "in"), # Match legend key size
legend.text = element_text(size = 1), # Adjust legend text size
legend.title = element_blank(), # Remove legend title
legend.background = element_rect(fill = "white") # Make legend background invisible
) +
guides(fill = guide_legend(override.aes = list(fill = NA))) # Make legend completely invisible
print(monthly_user_count)
## month_name casual_count member_count
## 1 January 18520 85250
## 2 February 21416 94193
## 3 March 89880 194160
## 4 April 126417 244832
## 5 May 280413 354443
## 6 June 369043 400148
## 7 July 406046 417426
## 8 August 358917 427000
## 9 September 296694 404636
## 10 October 208987 349693
## 11 November 100747 236947
## 12 December 44894 136912
library(ggplot2)
library(RColorBrewer)
# Create data frame monthly_average_ride_length data
monthly_average_ride_length <- data.frame(
month = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
member_mean_ride_length_minutes = c(12, 11, 12, 11, 13, 14, 14, 13, 13, 12, 11, 11),
casual_mean_ride_length_minutes = c(30, 27, 33, 30, 31, 32, 29, 29, 28, 26, 21, 22)
)
# Set month as a factor with the desired order
monthly_average_ride_length$month <- factor(monthly_average_ride_length$month, levels = c(
"January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"
))
# Reshape the data for plotting
monthly_average_ride_length_long <- pivot_longer(monthly_average_ride_length,
cols = c(member_mean_ride_length_minutes, casual_mean_ride_length_minutes),
names_to = "User_Type", values_to = "value")
# Create a stacked bar plot with RColorBrewer colors and no legend
monthly_ride_length_plot <- ggplot(monthly_average_ride_length_long, aes(x = month, y = value, fill = User_Type)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Monthly Average Ride Length", x = NULL, y = "Average Ride Length (Minutes)") +
scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("Casual", "Member")) + # Use PuOr color palette
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
legend.key.size = unit(0.1, "in"),
legend.text = element_text(size = 8), # Adjust legend text size
legend.title = element_blank() # Remove legend title
)
print(monthly_average_ride_length)
## month member_mean_ride_length_minutes casual_mean_ride_length_minutes
## 1 January 12 30
## 2 February 11 27
## 3 March 12 33
## 4 April 11 30
## 5 May 13 31
## 6 June 14 32
## 7 July 14 29
## 8 August 13 29
## 9 September 13 28
## 10 October 12 26
## 11 November 11 21
## 12 December 11 22
# Combine the plots
combined_monthly_plots <- cowplot::plot_grid(monthly_user_plot, monthly_ride_length_plot, ncol = 2, align = "h")
The data illustrates a clear distinction in ride counts between “Member” and “Casual” users. “Member” users consistently account for a significant portion of rides throughout the year, with notable spikes in the warmer months. In contrast, “Casual” users tend to have lower ride counts, with increases in the warmer months. These trends suggest a preference for membership-based usage, particularly during the winter season.
Additionally, the data showcases distinct patterns in average ride lengths for both member and casual users. Members exhibit a preference for shorter rides, with durations averaging around 12 to 14 minutes. Casual users, on the other hand, tend to opt for longer rides, with average durations ranging from 27 to 33 minutes. These differences in ride lengths are consistent throughout the year and highlight varying user preferences.
Both user engagement and ride lengths display seasonal fluctuations. Warmer months, such as May through August, show increased engagement from both member and casual users. Additionally, average ride lengths tend to peak during the spring and summer, indicating a shift towards longer rides in favorable weather conditions. Based on this observation we will look at the data in a seasonal format.
# Getting a overview of rider trends seasonally
seasonal_user_counts <- dbGetQuery(conn,
"WITH seasonal_counts AS (
SELECT
CASE
WHEN EXTRACT(MONTH FROM started_at) IN (12, 1, 2) THEN 'Winter'
WHEN EXTRACT(MONTH FROM started_at) IN (3, 4, 5) THEN 'Spring'
WHEN EXTRACT(MONTH FROM started_at) IN (6, 7, 8) THEN 'Summer'
WHEN EXTRACT(MONTH FROM started_at) IN (9, 10, 11) THEN 'Autumn'
ELSE 'Unknown'
END AS season,
COUNT(*) FILTER (WHERE member_casual = 'member') AS member_count,
COUNT(*) FILTER (WHERE member_casual = 'casual') AS casual_count
FROM
user_data
GROUP BY
season
)
SELECT
season,
(member_count) AS member,
(casual_count) AS casual
FROM
seasonal_counts
ORDER BY
season;")
print(seasonal_user_counts)
## season member casual
## 1 Autumn 991276 606428
## 2 Spring 793435 496710
## 3 Summer 1244574 1134006
## 4 Winter 316355 84830
# Getting Average Ride Length by Season
seasonal_average_ride_length <- dbGetQuery(conn,
"SELECT
CASE
WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
ELSE 'Winter'
END AS season,
CAST(
ROUND(
AVG(CASE WHEN member_casual = 'member'
THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 0
) AS INT
) AS member_ride_length,
CAST(
ROUND(
AVG(CASE WHEN member_casual = 'casual'
THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 0
) AS INT
) AS casual_ride_length
FROM
user_data
GROUP BY
CASE
WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
ELSE 'Winter'
END;
")
print(seasonal_average_ride_length)
## season member_ride_length casual_ride_length
## 1 Autumn 12 26
## 2 Spring 12 31
## 3 Summer 14 30
## 4 Winter 11 25
library(ggplot2)
library(RColorBrewer)
# Create a factor with custom levels
seasonal_user_counts$season <- factor(seasonal_user_counts$season, levels = c("Spring", "Summer", "Autumn", "Winter"))
# Adjusting data format to integer
seasonal_user_counts$member <- as.integer(seasonal_user_counts$member)
seasonal_user_counts$casual <- as.integer(seasonal_user_counts$casual)
# Reshape the data for plotting
seasonal_user_counts_long <- pivot_longer(seasonal_user_counts, cols = c(member, casual), names_to = "User_Type", values_to = "User_Count")
# Create a bar plot with separate bars for casual and member
seasonal_user_plot <- ggplot(seasonal_user_counts_long, aes(x = season, y = User_Count, fill = User_Type)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Seasonal User Counts", x = NULL, y = "User Count (Millions)") +
scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("", "")) +
scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
legend.key.size = unit(0.1, "in"), # Match legend key size
legend.text = element_text(size = 1), # Adjust legend text size
legend.title = element_blank(), # Remove legend title
legend.background = element_rect(fill = "white") # Make legend background invisible
) +
guides(fill = guide_legend(override.aes = list(fill = NA))) # Make legend completely invisible
library(ggplot2)
library(RColorBrewer)
# Create a data frame for member and casual ride lengths
member_casual_data <- seasonal_average_ride_length %>%
pivot_longer(cols = c(member_ride_length, casual_ride_length),
names_to = "User_Type",
values_to = "Ride_Length")
# Define the order of seasons
season_order <- c("Spring", "Summer", "Autumn", "Winter")
# Apply the order to the 'season' column
member_casual_data$season <- factor(member_casual_data$season, levels = season_order)
# Create a grouped bar plot
seasonal_average_ride_length_plot <- ggplot(member_casual_data, aes(x = season, y = Ride_Length, fill = User_Type)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Seasonal Average Ride Length", x = NULL, y = "Average Ride Length (Minutes)") +
scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("Casual", "Member")) + # Use PuOr color palette
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
legend.key.size = unit(0.1, "in"),
legend.text = element_text(size = 8), # Adjust legend text size
legend.title = element_blank() # Remove legend title
)
# Merging the two plots side by side into one chart
combined_seasonal_plot <- seasonal_user_plot + seasonal_average_ride_length_plot
The analysis of user engagement and ride lengths across seasons provides valuable insights into how users interact with the cycling platform under varying environmental conditions. This data uncovers distinct preferences between “Member” and “Casual” users, shedding light on their utilization patterns.
Ride Length Observations:
Examining the data reveals that “Casual” users tend to engage in longer rides compared to “Member” users, regardless of the season. Notably, both user groups favor longer rides during the warmer months of Spring and Summer, while ride durations shorten in Autumn and Winter.
User Utilization Trends:
Ride length patterns also exhibit interesting variations. “Member” users consistently demonstrate higher engagement levels across all seasons, indicating a strong preference for the platform. In contrast, “Casual” users exhibit more variability, with peaks in utilization during certain seasons, particularly in the warmer months of Spring and Summer.
Seasonal Dynamics:
Seasonal shifts play a significant role in user behavior. The data suggests that environmental factors strongly influence user engagement and ride lengths. As temperatures rise, users are inclined to embark on more extended journeys, while colder seasons witness shorter rides.
casual_monthly_mode_day <- dbGetQuery(conn,
"WITH DayOfWeekCounts AS (
SELECT
TO_CHAR(started_at, 'Month') AS month,
EXTRACT(DOW FROM started_at) AS day_of_week,
COUNT(*) AS ride_count,
ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60)::numeric, 2) AS avg_ride_length_minutes,
ROW_NUMBER() OVER (PARTITION BY TO_CHAR(started_at, 'Month')
ORDER BY COUNT(*) DESC) AS rank
FROM user_data
WHERE member_casual = 'casual'
GROUP BY month, EXTRACT(DOW FROM started_at)
)
SELECT
month,
CASE
WHEN day_of_week = 0 THEN 'Sunday'
WHEN day_of_week = 1 THEN 'Monday'
WHEN day_of_week = 2 THEN 'Tuesday'
WHEN day_of_week = 3 THEN 'Wednesday'
WHEN day_of_week = 4 THEN 'Thursday'
WHEN day_of_week = 5 THEN 'Friday'
WHEN day_of_week = 6 THEN 'Saturday'
END AS mode_day_of_week,
ride_count AS max_count,
avg_ride_length_minutes
FROM DayOfWeekCounts
WHERE rank = 1
ORDER BY to_timestamp(month, 'Month')::DATE;
")
print(casual_monthly_mode_day)
## month mode_day_of_week max_count avg_ride_length_minutes
## 1 January Saturday 3791 37.99
## 2 February Monday 4405 24.90
## 3 March Sunday 16575 38.79
## 4 April Saturday 36330 34.75
## 5 May Sunday 55320 33.06
## 6 June Sunday 65851 36.14
## 7 July Saturday 95230 32.90
## 8 August Saturday 66188 31.94
## 9 September Saturday 64076 35.39
## 10 October Saturday 52221 29.46
## 11 November Thursday 17981 24.17
## 12 December Thursday 8177 19.69
member_monthly_mode_day <- dbGetQuery(conn,
"WITH DayOfWeekCounts AS (
SELECT
TO_CHAR(started_at, 'Month') AS month,
EXTRACT(DOW FROM started_at) AS day_of_week,
COUNT(*) AS ride_count,
ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60)::numeric, 2) AS avg_ride_length_minutes,
ROW_NUMBER() OVER (PARTITION BY TO_CHAR(started_at, 'Month')
ORDER BY COUNT(*) DESC) AS rank
FROM user_data
WHERE member_casual = 'member'
GROUP BY month, EXTRACT(DOW FROM started_at)
)
SELECT
month,
CASE
WHEN day_of_week = 0 THEN 'Sunday'
WHEN day_of_week = 1 THEN 'Monday'
WHEN day_of_week = 2 THEN 'Tuesday'
WHEN day_of_week = 3 THEN 'Wednesday'
WHEN day_of_week = 4 THEN 'Thursday'
WHEN day_of_week = 5 THEN 'Friday'
WHEN day_of_week = 6 THEN 'Saturday'
END AS mode_day_of_week,
ride_count AS max_count,
avg_ride_length_minutes
FROM DayOfWeekCounts
WHERE rank = 1
ORDER BY to_timestamp(month, 'Month')::DATE;
")
print(member_monthly_mode_day)
## month mode_day_of_week max_count avg_ride_length_minutes
## 1 January Thursday 14011 11.62
## 2 February Monday 18375 11.36
## 3 March Wednesday 35957 11.96
## 4 April Tuesday 40432 10.83
## 5 May Monday 62061 13.39
## 6 June Thursday 73500 13.70
## 7 July Saturday 68867 15.30
## 8 August Tuesday 76718 13.06
## 9 September Thursday 76589 12.43
## 10 October Monday 58287 11.42
## 11 November Wednesday 47448 10.92
## 12 December Thursday 27095 10.40
library(ggplot2)
library(scales)
# Create the data
mode_day <- data.frame(
month = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
mode_day_of_week = c("Saturday", "Monday", "Sunday", "Saturday", "Sunday", "Sunday", "Saturday", "Saturday", "Saturday", "Saturday", "Thursday", "Thursday"),
max_count = c(3791, 4405, 16575, 36330, 55320, 65851, 95230, 66188, 64076, 52221, 17981, 8177),
avg_ride_length_minutes = c(37.99, 24.90, 38.79, 34.75, 33.06, 36.14, 32.90, 31.94, 35.39, 29.46, 24.17, 19.69)
)
# Specify the order of months
mode_day$month <- factor(mode_day$month, levels = c(
"January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"
))
# Specify the order of days
mode_day$mode_day_of_week <- factor(mode_day$mode_day_of_week, levels = c(
"Monday", "Thursday", "Saturday", "Sunday"
))
# Format max_count with commas
mode_day$max_count <- comma(mode_day$max_count)
# Plot for casual
casual_month_mode_plot <- ggplot(mode_day, aes(x = month, y = mode_day_of_week)) +
geom_bar(stat = "identity", fill = "#7F3B08") +
geom_text(aes(label = max_count), vjust = -0.5, size = 3) +
geom_text(aes(label = sprintf("%.2f", avg_ride_length_minutes), y = 0.5), vjust = 1, color = "white") +
labs(title = "Mode Day of the Week by Month (Casual)\nwith Rider Counts and Avg Ride Length",
x = "",
y = "") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
library(ggplot2)
library(scales)
# Create the data for 'member'
mode_day_member <- data.frame(
month = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
mode_day_of_week = c("Thursday", "Monday", "Wednesday", "Tuesday", "Monday", "Thursday", "Saturday", "Tuesday", "Thursday", "Monday", "Wednesday", "Thursday"),
max_count = c(14011, 18375, 35957, 40432, 62061, 73500, 68867, 76718, 76589, 58287, 47448, 27095),
avg_ride_length_minutes = c(11.62, 11.36, 11.96, 10.83, 13.39, 13.70, 15.30, 13.06, 12.43, 11.42, 10.92, 10.40)
)
# Specify the order of months
mode_day_member$month <- factor(mode_day_member$month, levels = c(
"January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November", "December"
))
# Specify the order of days
mode_day_member$mode_day_of_week <- factor(mode_day_member$mode_day_of_week, levels = c(
"Monday", "Thursday", "Wednesday", "Tuesday", "Saturday", "Sunday"
))
# Format max_count with commas
mode_day_member$max_count <- comma(mode_day_member$max_count)
# Plot for 'member' with rider counts and average ride length
member_month_mode_plot <- ggplot(mode_day_member, aes(x = month, y = mode_day_of_week)) +
geom_bar(stat = "identity", fill = "#542788") +
geom_text(aes(label = max_count), vjust = -0.5, size = 3) +
geom_text(aes(label = sprintf("%.2f", avg_ride_length_minutes), y = 0.5), vjust = 1, color = "white") +
labs(title = "Mode Day of the Week by Month (Member)\nwith Rider Counts and Avg Ride Length",
x = "",
y = "") +
theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
Casual User Group:
In January, Casual users prefer Saturdays for rides, with an average duration of approximately 37.99 minutes. February sees a shift towards weekday rides, with Mondays emerging as the mode day and rides averaging around 24.90 minutes.
March shows Sundays as the mode day, with an average ride length of approximately 38.79 minutes, indicating a consistent trend of longer rides. In April, Saturdays continue to be the preferred day with rides lasting around 34.75 minutes, suggesting a balanced mix of exploration. May sees Sundays remaining prominent, with rides averaging about 33.06 minutes, in line with the trend of longer rides. June continues the preference for Sundays as the mode day, with average rides lasting around 36.14 minutes, indicating more extensive rides during the warmer season.
July maintains Saturday’s prominence, with rides averaging around 32.90 minutes, indicating a balance between exploration and shorter trips. August sees Saturday remaining popular, with rides averaging approximately 31.94 minutes, suggesting a consistent trend of moderate-length rides. In September, Saturdays continue to be the preferred day, with rides averaging about 35.39 minutes, indicating a preference for weekend exploration. October shows Saturday retaining its position, suggesting a continued preference for weekend rides, with an average ride length of approximately 29.46 minutes.
November sees Thursdays emerging as the mode day, with rides lasting around 24.17 minutes. December maintains Thursdays’ prominence, with rides averaging around 19.69 minutes, suggesting a trend towards shorter rides in the colder season.
Across the year, Casual users exhibit distinct preferences for days of the week. In the warmer months of June, July, and August, Saturdays are particularly popular, with ride counts peaking at over 68,000. Sundays also exhibit high engagement during the summer. Conversely, the colder months of November and December see Thursdays as the preferred day for Casual riders. These trends underscore a seasonal shift in user behavior, with warmer months encouraging more weekend exploration.
Average ride lengths for Casual users follow a fairly consistent pattern. Throughout the year, rides tend to be longer, averaging around 19 to 38 minutes. Interestingly, March stands out with the longest average ride lengths, suggesting leisurely rides during the spring Conversely, December exhibits the shortest ride lengths, potentially indicating more purposeful, shorter trips in the colder weather.
Member User Group:
In January, Member riders prefer Thursdays for rides, with an average duration of approximately 11.62 minutes. February sees a continuation of weekday rides, with a shift to Mondays emerging as the mode day and rides averaging around 11.36 minutes.
March shows Wednesdays as the mode day, with an average ride length of approximately 11.96 minutes. In April, Tuesdays emerge as the mode day for Member users, indicating a preference for rides early in the week, with rides lasting around 10.83 minutes, suggesting purposeful, shorter trips. May continues to see Mondays as the mode day for Member riders. The average ride length is approximately 13.39 minutes, indicating a balance between purposeful trips and potential leisurely exploration.
June maintains Thursdays’ prominence, reflecting a preference for mid-week rides among Member users. The average ride length increases to about 13.70 minutes, potentially indicating slightly longer trips compared to previous months. July sees Saturdays emerging as the mode day for Member riders, signaling a shift towards weekend rides. The average ride length increases significantly to around 15.30 minutes, suggesting more extensive rides during the warmer season.
August sees Tuesdays regaining prominence as the mode day for Member users. The average ride length remains consistent at about 13.06 minutes, indicating a preference for purposeful, shorter trips. In September, Thursdays continue to be the mode day for Member users, aligning with a preference for mid-week rides. The average ride length remains fairly consistent at approximately 12.43 minutes.
October shows Mondays emerging as the mode day for Member users, signaling a shift towards weekday rides, with an average ride length of around 11.42 minutes, indicating a balance between purposeful trips and potential leisurely exploration. November sees Wednesdays regaining prominence as the mode day for Member riders. The average ride length is approximately 10.92 minutes. December maintains Thursdays’ prominence, aligning with a preference for mid-week rides among Member users. The average ride length decreases to approximately 10.40 minutes, suggesting a trend towards slightly shorter trips in the colder season.
Across the year, Member users exhibit varying preferences for days of the week. In July, Saturdays are particularly popular, with ride counts exceeding 68,867. In August, this switches to Tuesdays, with rides peaking at over 76,718, the highest ridership month of the year. Mondays and Thursdays also exhibit high engagement across the year. These trends highlight a seasonal shift in user behavior, with warmer months encouraging more weekend exploration.
Average ride lengths for Member users show a fairly consistent pattern. Throughout the year, rides tend to be around 10 to 15 minutes long. Interestingly, July sees the longest average ride lengths, which may suggest users taking leisurely rides during the summer. Conversely, December exhibits the shortest ride lengths, potentially indicating more purposeful, shorter trips in the colder weather.
Recognizing the critical role that weather data plays in contextualizing and analyzing trends, I embarked on the task of establishing a dedicated ‘weather’ data set for the Chicago area in 2022. This decision was spurred by the data set’s direct relevance to this specific region. Constructing a comprehensive record of weather patterns specific to Chicago holds immense potential in bolstering analytical capabilities and providing a deeper contextual understanding of the data. By incorporating local weather information, this data set will serve as an invaluable resource for conducting more nuanced analyses and deriving insights closely attuned to the dynamic weather conditions shaping the Chicago area.
library(DBI)
# Comparing effects of temperatures on rides
temps_and_rides <- dbGetQuery(conn,
"SELECT
CASE
WHEN td.temp < 50 THEN 'Less than 50°F'
WHEN td.temp >= 50 AND td.temp < 60 THEN '50-59°F'
WHEN td.temp >= 60 AND td.temp < 70 THEN '60-69°F'
WHEN td.temp >= 70 AND td.temp < 80 THEN '70-79°F'
ELSE '80°F and above'
END AS temperature_range,
SUM(CASE WHEN ud.member_casual = 'casual' THEN 1 ELSE 0 END) AS temporal_holiday_casual_riders_count,
SUM(CASE WHEN ud.member_casual = 'member' THEN 1 ELSE 0 END) AS temporal_holiday_member_riders_count
FROM
temporal_data td
LEFT JOIN user_data ud ON td.date_w = ud.date_start
GROUP BY
temperature_range
ORDER BY
temperature_range;")
print(temps_and_rides)
## temperature_range temporal_holiday_casual_riders_count
## 1 50-59°F 280946
## 2 60-69°F 364593
## 3 70-79°F 1079683
## 4 80°F and above 306548
## 5 Less than 50°F 312057
## temporal_holiday_member_riders_count
## 1 477149
## 2 484027
## 3 1190246
## 4 359555
## 5 861900
# Adjusting temps_and_rides$holiday_casual_riders_count and temps_and_rides$holiday_member_riders_count from integer64 to number
temps_and_rides$temporal_holiday_casual_riders_count <- as.numeric(temps_and_rides$temporal_holiday_casual_riders_count)
temps_and_rides$temporal_holiday_member_riders_count <- as.numeric(temps_and_rides$temporal_holiday_member_riders_count)
# Reshape the data for plotting
temps_and_rides_long <- pivot_longer(temps_and_rides,
cols = c(temporal_holiday_member_riders_count, temporal_holiday_casual_riders_count),
names_to = "User_Type", values_to = "value")
temps_and_rides_plot <- ggplot(temps_and_rides_long, aes(x = temperature_range, y = value, fill = User_Type)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Ridership by Temperature", x = "Temperature Range", y = "Ridership Count (Millions)") +
scale_fill_manual(values = c("temporal_holiday_casual_riders_count" = "#7F3B08", "temporal_holiday_member_riders_count" = "#542788"),
labels = c("temporal_holiday_casual_riders_count" = "Casual", "temporal_holiday_member_riders_count" = "Member")) +
scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) +
theme(legend.position = "right", axis.text.x = element_text(face = "bold")) +
guides(fill = guide_legend(title = NULL)) # This removes the legend title
Temperature Impacts:
The “70-79°F” temperature range shows the
highest ridership, both for casual and member riders. This indicates
that ridership tends to increase as temperatures rise, suggesting that
mild and pleasant weather conditions attract more riders.
Member ridership is notably higher when the temperature range between 50°F and 59°F. Ridership remains consistently high across both temperature categories in the 60°F to 69°F range. This information can help in planning marketing efforts based on expected temperature levels.
Seasonal Preferences:
Ridership is higher during mild and warm
weather, indicating a preference for comfortable riding conditions.
Member Engagement and Temperature:
The “70-79°F” temperature
range stands out with high member ridership counts, indicating that
members are more inclined to use the service during mild and pleasant
weather.
Casual Engagement and Temperature:
When considering casual
riders, the “70-79°F” temperature range also has the highest casual
ridership counts. This indicates that casual riders are similarly drawn
to use the service during this temperature range, emphasizing the
positive influence of comfortable weather conditions.
I realized the importance of establishing a dedicated ‘holidays’ table for the year 2022, tailored to the unique characteristics of the Chicago area. This decision was driven by the data set’s origin in this specific region. The creation of a comprehensive record of holidays specific to Chicago holds the potential to significantly enhance both the analytical capabilities and the contextual relevance of the data. By including local holidays, this table will serve as a valuable resource for conducting more insightful analyses and drawing meaningful conclusions that are closely aligned with the Chicago area’s socio-cultural dynamics.
I conducted thorough research using Google to gather accurate and relevant information regarding holidays and their corresponding dates specific to the Chicago area. This process ensured the creation of a comprehensive and contextually accurate “holidays” table for the year 2022. By utilizing reliable sources and cross-referencing information, I curated a data set that aligns with the local holiday calendar, enhancing the analytical capabilities and relevance of the collected data.
library(DBI)
# Comparing effects of holidays on rides
holidays_and_rides <- dbGetQuery(conn,
"SELECT
td.holiday_name,
COUNT(CASE WHEN member_casual = 'casual' THEN 1 END) AS holiday_casual_riders_count,
COUNT(CASE WHEN member_casual = 'member' THEN 1 END) AS holiday_member_riders_count
FROM
user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
WHERE
td.holiday_name IS NOT NULL
GROUP BY
td.holiday_name
ORDER BY
td.holiday_name;")
print(holidays_and_rides)
## holiday_name holiday_casual_riders_count
## 1 Boxing Day 272
## 2 Chicago Air and Water Show 28135
## 3 Chicago Blues Festival 53725
## 4 Chicago Jazz Festival 51014
## 5 Chicago Pride Parade 19493
## 6 Chicago St. Patrick's Day Parade 1507
## 7 Chinese New Year 656
## 8 Christmas Day 87
## 9 Columbus Day 6568
## 10 Diwali 6250
## 11 Earth Day 2382
## 12 Easter 1749
## 13 Eid al-Adha 20989
## 14 Eid al-Fitr (End of Ramadan) 3734
## 15 Father's Day 11627
## 16 Flag Day 10505
## 17 Halloween 4012
## 18 Independence Day 13700
## 19 International Women's Day 1653
## 20 International Worker's Day (May Day) 5113
## 21 Juneteenth 17442
## 22 Lollapalooza 63795
## 23 Mardi Gras 1857
## 24 Martin Luther King Jr. 643
## 25 Memorial Day 6743
## 26 Mother's Day 4925
## 27 New Year's 1085
## 28 Oktoberfest 105689
## 29 Pi Day 4014
## 30 Presidents Day 1591
## 31 Taste of Chicago 27375
## 32 Thanksgiving 2236
## 33 Valentine's Day 674
## 34 Veterans Day 4209
## holiday_member_riders_count
## 1 973
## 2 23426
## 3 52167
## 4 51252
## 5 13503
## 6 3262
## 7 4029
## 8 394
## 9 14559
## 10 13468
## 11 6076
## 12 5477
## 13 15161
## 14 10016
## 15 14022
## 16 13611
## 17 10518
## 18 8827
## 19 6878
## 20 7283
## 21 11853
## 22 56922
## 23 6870
## 24 3009
## 25 9614
## 26 10055
## 27 1478
## 28 132819
## 29 8331
## 30 5109
## 31 37915
## 32 3078
## 33 3787
## 34 8979
# Create a data frame from the provided data
holiday_data <- data.frame(
holiday_name = c("Boxing Day", "Chicago Air and Water Show", "Chicago Blues Festival", "Chicago Jazz Festival",
"Chicago Pride Parade", "Chicago St. Patrick's Day Parade", "Chinese New Year", "Christmas Day",
"Columbus Day", "Diwali", "Earth Day", "Easter", "Eid al-Adha", "Eid al-Fitr (End of Ramadan)",
"Father's Day", "Flag Day", "Halloween", "Independence Day", "International Women's Day",
"International Worker's Day (May Day)", "Juneteenth", "Lollapalooza", "Mardi Gras",
"Martin Luther King Jr.", "Memorial Day", "Mother's Day", "New Year's", "Oktoberfest",
"Pi Day", "Presidents Day", "Taste of Chicago", "Thanksgiving", "Valentine's Day",
"Veterans Day"),
casual_riders_count = c(272, 28135, 53725, 51014, 19493, 1507, 656, 87, 6568, 6250, 2382, 1749, 20989,
3734, 11627, 10505, 4012, 13700, 1653, 5113, 17442, 63795, 1857, 643, 6743,
4925, 1085, 105689, 4014, 1591, 27375, 2236, 674, 4209),
member_riders_count = c(973, 23426, 52167, 51252, 13503, 3262, 4029, 394, 14559, 13468, 6076, 5477,
15161, 10016, 14022, 13611, 10518, 8827, 6878, 7283, 11853, 56922, 6870,
3009, 9614, 10055, 1478, 132819, 8331, 5109, 37915, 3078, 3787, 8979)
)
# Reshape data for ggplot
library(tidyr)
library(dplyr)
holiday_data_long <- pivot_longer(holiday_data, cols = c(casual_riders_count, member_riders_count), names_to = "User_Type", values_to = "Rides_Count")
# Arrange the data in descending order based on Rides_Count
holiday_data_long <- holiday_data_long %>%
arrange(desc(Rides_Count))
# Create Horizontal Stacked Bar Chart
holiday_plot_1 <- ggplot(holiday_data_long, aes(y = reorder(holiday_name, Rides_Count), x = Rides_Count, fill = User_Type)) +
geom_bar(stat = "identity", position = "stack") +
labs(title = "Rider Engagement During Holidays",
x = "Number of Riders",
y = "") +
scale_fill_manual(values = c("casual_riders_count" = "#7F3B08", "member_riders_count" = "#542788"),
labels = c("casual_riders_count" = "Casual", "member_riders_count" = "Member")) +
theme_minimal() +
theme(axis.title.y = element_blank(), axis.text.y = element_text(hjust = 0),
legend.key.size = unit(0.1, "in"),
legend.text = element_text(size = 8), # Adjust legend text size
legend.title = element_blank() # Remove legend title
)
Looking at this I decided that I wanted to zoom in and focus on those holidays where ridership is greater than twenty thousand.
# Filter the data to include holidays with rider count >= 50,000
filtered_holiday_data <- holiday_data_long[holiday_data_long$Rides_Count > 20000, ]
# Calculate the total riders for each user type
total_casual_riders <- sum(filtered_holiday_data$Rides_Count[filtered_holiday_data$User_Type == "casual_riders_count"])
total_member_riders <- sum(filtered_holiday_data$Rides_Count[filtered_holiday_data$User_Type == "member_riders_count"])
# Define colors for User_Type
user_type_colors <- c("casual_riders_count" = "#7F3B08", "member_riders_count" = "#542788")
# Plot the filtered data
filtered_holiday <- ggplot(filtered_holiday_data, aes(y = reorder(holiday_name, Rides_Count), x = Rides_Count, fill = User_Type)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Rider Engagement During Holidays (Rider Count > 20,000)",
x = "", # Remove x-axis label for now
y = "") +
scale_fill_manual(values = user_type_colors,
labels = c("casual_riders_count" = "Casual", "member_riders_count" = "Member")) +
theme_minimal() +
theme(axis.title.y = element_blank(),
axis.text.y = element_text(face = "bold", hjust = 0, vjust = 0.5),
legend.key.size = unit(0.2, "in"),
legend.text = element_text(size = 8), # Adjust legend text size
legend.title = element_blank(), # Remove legend title
axis.text.x = element_text(hjust = 0, vjust = 0.5), # Center x-axis labels
axis.title.x = element_text(hjust = 1)) # Align x-axis title to the right
# Format x-axis labels in thousands and millions with commas
filtered_holiday +
scale_x_continuous(labels = scales::comma)
Looking at this chart we notice that for Eid al-Adha, there is only casual riders represented, this occurs due to the fact that member ridership falls under 20,000 while casual ridership is greater than 20,000.
I am excited to present three strategic proposals aimed at enhancing user engagement and driving membership conversions for Cyclistic. These initiatives leverage data-driven insights to create targeted marketing campaigns that align with Cyclistic’s business objectives. First I will present the overview, and following will be the details for each proposal.
Birthday Gift of Health:
To foster a deeper connection with Cyclistic’s users, I propose implementing a special birthday offer. By utilizing the birthday information collected from users during the sign-up process to utilize the service, Cyclistic can extend a personalized promotion encouraging users to invest in their health.
Execution:Automated Birthday Campaign
Personalized Offer with Promo Code
Engaging Content and Call to Action
Follow-Up Sequence
Compliance and Data Security
Cost Savings for Frequent Riders:
Cyclistic aims to convert frequent riders into loyal members by demonstrating potential cost savings through a membership plan. By analyzing ride history and providing personalized cost comparisons,Cyclistic can illustrate the financial benefits of membership.
Execution:Ride History Analysis
Personalized Cost Comparison
Targeted In-App Notifications and Emails
Visualize Savings
Holiday Membership Promotions:
Capitalizing on holiday ridership trends, I propose running targeted promotions during high-demand periods. By offering special incentives for membership, Cyclistic can capture the attention and possible buy-in of users in a festive spending mindset.
Execution:Holiday Ridership Data Analysis
Tailored Holiday Promotions
Multi-Platform Engagement
To enhance user engagement and encourage membership, extend a special birthday offer to users who have provided their birthday information.
Execution:Birthday Data Collection:
Ensure user profiles include birthday information as part of the sign-up process, in compliance with legal age requirements.
Automated Birthday Campaign:
Set up an automated system to trigger birthday greetings via email or in-app messages on the user’s special day.
Personalized Offer:
Include a unique promo code for a discounted membership rate as a birthday gift, encouraging them to invest in their health.
Engaging Content:
Craft a personalized message highlighting the value of a membership, emphasizing the positive impact on their well-being.
Call to Action:
Include a clear call to action guiding users to sign up for the membership using the provided promo code.
Follow-Up:
Implement a follow-up sequence to remind users of the limited-time offer and reinforce the benefits of membership.
Compliance and Data Security:
Ensure all data collection and usage comply with relevant privacy laws, providing users with transparency and control over their information.
Encourage frequent riders to switch to a membership plan based on their past ride history and potential cost savings.
Execution:Ride History Analysis:
Analyze user ride data to identify individuals who could benefit from a membership plan based on their past ride frequency.
Personalized Cost Comparison:
Provide each user with a customized cost analysis, demonstrating potential savings with a membership compared to pay-per-ride. In-App Notifications and Emails:
Send targeted in-app messages and emails to eligible users, presenting them with the cost-saving opportunity. Visualize Savings:
Create visual representations showcasing the financial advantages of being a member, particularly for frequent riders.
Capitalize on holiday ridership trends to drive membership conversions during high-demand periods.
Execution:Holiday Ridership Data Analysis:
Identify holidays with high ridership numbers, indicating potential interest and willingness to spend.
Tailored Holiday Promotions:
Design special promotions for these holidays, highlighting the financial benefits of membership during festive seasons.
Engage Across Platforms:
Utilize social media, email campaigns, and in-app notifications to reach users and promote the holiday offer.
I believe these proposals have the potential to significantly impact
user conversion from casual user to membership. I look forward to your
feedback and collaboration in implementing these strategies to drive
Cyclistic’s growth and success.