Hello, I’m Cherrie Cosby. My journey into the world of computers was ignited by an insatiable curiosity that led me to dive deep into learning. This journey has taken me from being a contract software specialist to becoming a skilled data analyst. My expertise encompasses a wide range of sectors, from manufacturing to marketing, where I’ve skillfully blended my understanding of business dynamics with my prowess in software.

Throughout my career, I’ve spearheaded initiatives that have not only amplified revenues but also streamlined expenditures. I’m deeply driven by the prospect of driving innovation at the intersection of technology and business. I’m enthusiastic about lending my multidisciplinary insights to pioneering projects that have the potential to redefine norms.

My toolkit is well-equipped with VBA, SQL, R, Git, Python, Tableau, and beyond. These tools enable me to navigate complex challenges with finesse and creativity. I’m excited about the opportunity to collaborate and pave fresh pathways in this dynamic and thrilling realm. If you’re as excited about the future of technology and business as I am, let’s join forces and make remarkable strides together!


Introduction to the Cyclistic Marketing Analysis

Project Overview

Welcome to the Cyclistic Marketing Analysis, an exploration of Chicago’s premier bike share program. Cyclistic boasts over 5,800 bikes stationed across 692 docking points, catering to a diverse range of riders, from casual cyclists to daily commuters.

Key Participants:

Cherrie Cosby – Junior Data Analyst in the Cyclistic marketing analytics team.

Lily Moreno – Stakeholder - Director of Marketing in charge of development for campaigns and initiatives to promote the program, including email, social media, and/or other channels.

Company and Program Details:

Cyclistic offers a range of bikes, including classic bikes, docked bikes, and electric bikes, with various styles and options. This diversity ensures accessibility for all riders, including those with disabilities.

Approximately 70% of riders use the bikes for leisure, while 30% use them for daily commuting, with about 8% of riders utilizing the assistive options.

Cyclistic provides flexible pricing plans, such as single-ride, full-day, and annual membership options.

Background and Task:

The objective is clear: to delve into the usage patterns of Cyclistic’s annual members and casual riders. Through this analysis, I aim to unearth the unique characteristics and behaviors that set these two rider groups apart. Armed with this understanding, my mission is to craft precise marketing strategies with one overarching goal – transforming casual riders into dedicated annual members. This strategy is designed to maximize annual memberships and propel Cyclistic’s growth. My analysis will provide a data-driven roadmap, guiding the implementation of a potent marketing program that will fortify Cyclistic’s position as a premier bike-share company.

The marketing analyst team, led by Lily Moreno, is interested in analyzing the Cyclistic historical bike trip data to identify trends.

Hypothesis: Maximizing the number of annual members will be key to future growth.

The Goal: Increase conversion of casual riders into annual members.

The Questions to be Answered:

  • How do annual members and casual riders use Cyclistic bikes differently?
    • Identify the differences between annual members and casual riders and how they use the program differently.
  • Why would casual riders buy a membership to Cyclistic?
    • Identify the factors that could influence that decision.
  • How can Cyclistic use digital media to influence casual riders to become members?
    • Identify the most effective communication channels and create detailed strategies for each identified channel to maximize success.

Join me on this exciting journey into the behaviors of Cyclistic’s ridership, where data-driven decisions pave the path to success.

My drive to undertake this project is fueled by a deep-seated passion for business analytics and an unwavering commitment to revolutionize urban mobility.Through refining Cyclistic’s annual membership program and optimizing the transition from casual user to dedicated member, I am committed to promoting sustainable and eco-friendly transportation options, providing a seamless and gratifying experience for all riders.

The primary audience for this project is prospective employers, recruiters, and fellow data enthusiasts who seek to understand the practical application of data analysis in real-world scenarios. My goal is to showcase a professional and skillful approach to business analytics, providing a tangible example of data-driven decision-making.


Citations:

Original Source Data

  • The data for Cyclistic has been made available by Motivate International Inc. under license and is available for review on their official website

Additional Temporal Data

  • The holidays data set focuses on the city of Chicago, Illinois, spanning the time frame from January 1, 2022, to December 31, 2022 and was compiled using Google Search. The data set is centered around key holidays observed within this period. It includes holiday names along with their respective start and end dates, providing insights into the specific days of significance throughout the year. The data set captures the essence of these holidays, allowing for a comprehensive understanding of their occurrences and durations in Chicago. This set of holiday data serves as a valuable resource for analyzing and contextualizing rider behaviors during these significant dates within the city.

  • I acquired the weather data from a reputable source, Visual Crossing, with explicit permission. The acquired weather data set is dedicated to the city of Chicago, Illinois, covering the time span from January 1, 2022, to December 31, 2022. This data set presents a temporal dimension through daily temperature recordings and encompasses a wide array of meteorological variables. Among the temperature measurements in Fahrenheit, the data set encompasses humidity levels, detailed precipitation data, atmospheric pressure readings, solar indexes, cloud information, visibility metrics, and wind statistics. Additionally, the data set includes essential data points such as sunrise and sunset times, moon phases, weather conditions, their corresponding descriptions, and the collection location for the data, with distance measurements provided in miles. This comprehensive data set offers invaluable insights into the climatic trends and conditions of Chicago throughout the specified time frame.


Data Exploration

To enhance visual flow, collapsible sections have been utilized to incorporate both code chunks and their corresponding outputs, providing a streamlined and user-friendly presentation.

<- This symbol serves as an indicator for a collapsed section. Simply click on it to expand and reveal the content.


This feature allows you to focus on specific content of interest, enhancing your reading experience. Feel free to explore the additional details within each section by clicking on the symbol.


Getting Started

Package installations - loading and reading in the data files


Packages

rmarkdown: Used for generating dynamic reports.

knitr: Enables dynamic document creation.

tidyverse: Used for data manipulation and visualization.

data.table: Provides fast data frame manipulation.

lubridate: Used for handling date and time data.

sf: Provides tools for spatial data analysis.

leaflet: Used for interactive maps.

hms: Used for handling time data in hours, minutes, and seconds format, and for performing time-related operations.

scales: Load the scales library for formatting numbers

ggpubr: Provides a suite of functions and tools for creating publication-ready plots and figures using ggplot2.

gridExtra: Extends the grid graphics system in R, allowing for the arrangement of multiple grid-based figures on one page.

patchwork: Enables the seamless combination and arrangement of multiple ggplots, offering flexibility in creating complex multi-panel visualizations

gt: an R package for creating and formatting tables with ease and flexibility

RPostgres: provides a connection between R and PostgreSQL, allowing users to interact with and manipulate PostgreSQL databases directly from R.

RODBC: for seamless interaction with databases, making data retrieval and manipulation straightforward and flexible

odbc: Simplifying database interaction in R with ease and flexibility, this package streamlines the process of data retrieval and manipulation.

dplyr: designed to streamline data manipulation tasks, offering ease and flexibility in creating and formatting tables.

dbplyr: specialized package that enhances data manipulation capabilities, providing seamless interaction with databases for creating and formatting tables with ease and flexibility.

plotly: An R package that empowers users to create interactive and visually appealing plots with ease and flexibility, enhancing data visualization capabilities.

RColorBrewer: a series of color maps that are designed to improve graph readability for readers with common forms of color blindness and/or color vision deficiency.

# Set mirror
options(repos = c(CRAN = "https://cran.rstudio.com/"))

# Update Packages
update.packages(ask = FALSE, dependencies = TRUE)
# List of Packages Used
library(DBI)
library(rmarkdown)
library(knitr)
library(tidyverse)
library(data.table)
library(lubridate)
library(sf)
library(leaflet)
library(hms)
library(scales)
library(ggpubr)
library(gridExtra)
library(patchwork)
library(gt)
library(RPostgres)
library(RODBC)
library(odbc)
library(dplyr)
library(dbplyr)
library(plotly)
library(RColorBrewer)
library(kableExtra)
Database
# Load necessary packages
library(RPostgres)
library(RODBC)
library(odbc)

# Connect to Database
conn <- dbConnect(
  drv = RPostgres::Postgres(),
  dbname = dsn_database,
  host = dsn_hostname,
  port = dsn_port,
  user = dsn_uid,
  password = dsn_pwd
)
# Get List of Tables
dbListTables(conn)
##  [1] "cyclistic_data_1"     "cyclistic_data_2"     "cyclistic_data_3"    
##  [4] "cyclistic_data_4"     "cyclistic_data_5"     "cyclistic_data_6"    
##  [7] "cyclistic_data_7"     "cyclistic_data_8"     "cyclistic_data_9"    
## [10] "cyclistic_data_10"    "cyclistic_data_11"    "cyclistic_data_12"   
## [13] "temporal_data"        "holidays"             "user_data"           
## [16] "weather_data"         "user_data_backup"     "temporal_data_backup"

-> Please take note that the tables of interest are ‘user_data’ and ‘temporal_data’. To streamline the analysis, I consolidated the ‘holidays’ and ‘weather_data’ tables, as they pertain to the same time frame, into a unified table named ‘temporal_data’.


General Data Overview

In this section, we’ll provide a brief overview of the data we’re working with.

Links to the data files used

The data is clean, and ready for analysis. To access the finalized data imported into R for this analysis, you can download the files user_data, and temporal_data

I initiated this process by obtaining the raw data files and employed VBA scripts for the preprocessing phase. Subsequently, I transferred the processed data into PostgreSQL for in-depth analysis. The VBA and SQL scripts used, along with the observations derived from this stage, can be found in the ‘Processes and Observations’ document, accessible here.


Summary Statistics

Let’s start by looking at some summary statistics for our data sets.

Get an overview of the source data


Metadata for user_data

For the combined_data data set you can see, it contains 5,667,614 records and 18 columns, consisting of:

  • ride_id
  • rideable_type
  • started_at
  • ended_at
  • start_station_name
  • start_station_id
  • end_station_name
  • end_station_id
  • start_lat (latitude)
  • start_lng (longitude)
  • end_lat
  • end_lng
  • member_casual
  • ride_length
  • day_of_week
  • name_of_day
# Get User Data Summary
user_data_records_count <- dbGetQuery(conn,
  "SELECT COUNT(*) 
   FROM user_data;")
print(user_data_records_count)
##     count
## 1 5667614
  • In the preprocessing phase, performed in PostgreSQL, it was observed that several columns, including start_station_name, start_station_id, end_station_name, end_station_id, start_lat, start_lng, end_lat, and end_lng, contained missing data.

These columns have been retained for analysis, as each record contains the required information for the evaluation process, as per the provided instructions.

  • A total of 103 records with negative ride length, or ride lengths exceeding 24 hours were excluded from the data set.


Metadata for temporal_data

And for the temporal data set you can see, it contains 367 (as there are two overlapping multi-day holidays) records and 37 columns, consisting of:

  • holiday_name
  • holiday_start_date
  • holiday_end_date
  • name (city/state)
  • datetime
  • tempmax
  • tempmin
  • temp
  • feelslikemax
  • feelslikemin
  • feelslike
  • dew
  • humidity
  • precip
  • precipprob
  • precipcover
  • preciptype
  • snow
  • snowdepth
  • windgust
  • windspeed
  • winddir
  • sealevelpressure
  • cloudcover
  • visibility
  • solarradiation
  • solarenergy
  • uvindex
  • severerisk
  • sunrise
  • sunset
  • moonphase
  • conditions
  • description
  • icon
  • stations
  • date_w (extracted date)
# Get Temporal Data Summary
temporal_data_record_count <- dbGetQuery(conn,
  "SELECT COUNT(*) 
   FROM temporal_data;")
print(temporal_data_record_count)
##   count
## 1   367
As part of the preprocessing phase, I combined both the ‘holidays’ and ‘weather_data’ tables into a single data set to facilitate seamless analysis.

Looking at the Cyclistic data

As this is an evaluation of user behaviors for casual users of the service versus those with memberships, from this point on the data will be segmented into ‘casual’ and ‘member’.

The chosen color palette was deliberately selected to ensure readability for all viewers, including those with various forms of color blindness.


Yearly count of user types compared to average ride lengths
# User Count Query
user_count <- dbGetQuery(conn,
  "SELECT member_casual, Count(*) AS user_count
   FROM user_data
   GROUP BY member_casual;")
print(user_count)
##   member_casual user_count
## 1        casual    2321974
## 2        member    3345640
# Average Ride Length Query
average_ride_length <- dbGetQuery(conn, 
  "SELECT
       ROUND(AVG(
            CASE WHEN member_casual = 'member'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END
        )::numeric, 0) AS rounded_member_mean_ride_length_minutes,
          ROUND(AVG(
            CASE WHEN member_casual = 'casual'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END
        )::numeric, 0) AS rounded_casual_mean_ride_length_minutes
  FROM
      user_data;")
print(average_ride_length)
##   rounded_member_mean_ride_length_minutes
## 1                                      13
##   rounded_casual_mean_ride_length_minutes
## 1                                      29
# Convert user_count to integer
user_count$user_count <- as.integer(user_count$user_count)

library(RColorBrewer)  # Load the RColorBrewer package
colors <- brewer.pal(11, "PuOr")

# Create the user_count bar plot
count_plot <- ggplot(user_count, aes(x = member_casual, y = user_count, fill = member_casual)) +
  geom_bar(stat = "identity", position = "dodge", color = "white") +  # Manually set color
  labs(x = NULL, y = "User Count (Millions)") +  # Remove x-axis label
  ggtitle("Yearly Count by Users") +
  scale_fill_manual(values = c("#7F3B08", "#542788")) +  # Use PuOr color palette
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) +
  theme(legend.position = "none",  # Remove the legend
        axis.text.x = element_text(face = "bold"))  # Make x-axis labels bold
# Load the necessary package
library(RColorBrewer)

# Format data for plotting
plot_data <- data.frame(
  User_Type = c("member", "casual"),
  Average_Ride_Length = c(
    average_ride_length$rounded_member_mean_ride_length_minutes[1],
    average_ride_length$rounded_casual_mean_ride_length_minutes[1]
  ))

# Create the bar plot
average_plot <- ggplot(plot_data, aes(x = User_Type, y = Average_Ride_Length, fill = User_Type)) +
  geom_bar(stat = "identity", position = "dodge", color = "white") +  # Manually set color
  xlab(NULL) +  # Remove x-axis label
  ylab("Average Ride Length (Minutes)") +  # Define y-axis label
  ggtitle("Yearly Average Ride Length") +
  scale_fill_manual(values = c("#7F3B08", "#542788")) +  # Use PuOr color palette
  theme(legend.position = "none",  # Remove legend
      axis.text.x = element_text(face = "bold"))  # Make x-axis labels bold
# Merging the two plots side by side into one chart
library(cowplot)

# Combine the plots
combined_year_plot <- cowplot::plot_grid(count_plot, average_plot, ncol = 2, align = "h")

Yearly Overview: User Engagement and Ride Length Analysis

Examining the data for the year 2022, it becomes evident that ‘Member’ users constitute the majority. However, a noteworthy distinction arises in the average ride duration between ‘Member’ and ‘Casual’ users. Although members engage in a higher number of rides, casual users exhibit extended durations. This contrast in ride lengths implies diverse levels of user engagement and utilization patterns.

NOTE: The contrast in average ride duration highlights the significant difference in behavior between ‘Casual’ and ‘Member’ users. It’s important to note that ‘Casual’ user data encompasses a wide range of ride durations, from full-day excursions to shorter individual rides. This diversity can sometimes skew the perception of average ride length, especially when considering the inclusion of both extended full-day rides and shorter individual trips.


Below, you’ll find visualizations summarizing my process. I’ll highlight key insights that play a pivotal role in shaping the marketing plan, rather than providing an exhaustive list of findings. For a comprehensive view of all queries and their corresponding observations, please consult the ‘Processes and Observations’ section.


Monthly count of user types compared to average ride lengths
monthly_user_count <- dbGetQuery(conn,
"SELECT
    TO_CHAR(DATE_TRUNC('month', started_at), 'Month') AS month_name,
    COUNT(*) FILTER (WHERE member_casual = 'casual') AS casual_count,
    COUNT(*) FILTER (WHERE member_casual = 'member') AS member_count
FROM
    user_data
GROUP BY
    TO_CHAR(DATE_TRUNC('month', started_at), 'Month')
ORDER BY
    MIN(started_at);")
monthly_average_ride_length <- dbGetQuery(conn,
"SELECT
    TO_CHAR(started_at, 'Month') AS month,
    ROUND(
        AVG(CASE WHEN member_casual = 'member'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
    ) AS member_mean_ride_length_minutes,
    ROUND(
        AVG(CASE WHEN member_casual = 'casual'
                 THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric
    ) AS casual_mean_ride_length_minutes
FROM
    user_data
GROUP BY
    TO_CHAR(started_at, 'Month')
ORDER BY
    TO_DATE(TO_CHAR(started_at, 'Month'), 'Month');
")
library(ggplot2)
library(RColorBrewer)

# Creating data frame monthly_user_count
monthly_user_count <- data.frame(
  month_name = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
  casual_count = c(18520, 21416, 89880, 126417, 280413, 369043, 406046, 358917, 296694, 208987, 100747, 44894),
  member_count = c(85250, 94193, 194160, 244832, 354443, 400148, 417426, 427000, 404636, 349693, 236947, 136912)
)

# Set month_name as a factor with the desired order
monthly_user_count$month_name <- factor(monthly_user_count$month_name, levels = c(
  "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"
))

# Reshape the data for plotting
monthly_user_count_long <- pivot_longer(monthly_user_count, cols = c(casual_count, member_count), names_to = "User_Type", values_to = "value")

# Create a stacked bar plot with legend, matching legend settings
monthly_user_plot <- ggplot(monthly_user_count_long, aes(x = month_name, y = value, fill = User_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Monthly User Counts", x = NULL, y = "User Count (Millions)") +
  scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("", "")) +
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        legend.key.size = unit(0.1, "in"),  # Match legend key size
        legend.text = element_text(size = 1),  # Adjust legend text size
        legend.title = element_blank(),  # Remove legend title
        legend.background = element_rect(fill = "white")  # Make legend background invisible
) +
  guides(fill = guide_legend(override.aes = list(fill = NA)))  # Make legend completely invisible

print(monthly_user_count)
##    month_name casual_count member_count
## 1     January        18520        85250
## 2    February        21416        94193
## 3       March        89880       194160
## 4       April       126417       244832
## 5         May       280413       354443
## 6        June       369043       400148
## 7        July       406046       417426
## 8      August       358917       427000
## 9   September       296694       404636
## 10    October       208987       349693
## 11   November       100747       236947
## 12   December        44894       136912
library(ggplot2)
library(RColorBrewer)


# Create data frame monthly_average_ride_length data
monthly_average_ride_length <- data.frame(
  month = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
  member_mean_ride_length_minutes = c(12, 11, 12, 11, 13, 14, 14, 13, 13, 12, 11, 11),  
  casual_mean_ride_length_minutes = c(30, 27, 33, 30, 31, 32, 29, 29, 28, 26, 21, 22)  
)

# Set month as a factor with the desired order
monthly_average_ride_length$month <- factor(monthly_average_ride_length$month, levels = c(
  "January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"
))

# Reshape the data for plotting
monthly_average_ride_length_long <- pivot_longer(monthly_average_ride_length, 
                                                 cols = c(member_mean_ride_length_minutes, casual_mean_ride_length_minutes), 
                                                 names_to = "User_Type", values_to = "value")

# Create a stacked bar plot with RColorBrewer colors and no legend
monthly_ride_length_plot <- ggplot(monthly_average_ride_length_long, aes(x = month, y = value, fill = User_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Monthly Average Ride Length", x = NULL, y = "Average Ride Length (Minutes)") +
  scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("Casual", "Member")) +  # Use PuOr color palette
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        legend.key.size = unit(0.1, "in"),
    legend.text = element_text(size = 8),  # Adjust legend text size
    legend.title = element_blank()  # Remove legend title
  )

print(monthly_average_ride_length)
##        month member_mean_ride_length_minutes casual_mean_ride_length_minutes
## 1    January                              12                              30
## 2   February                              11                              27
## 3      March                              12                              33
## 4      April                              11                              30
## 5        May                              13                              31
## 6       June                              14                              32
## 7       July                              14                              29
## 8     August                              13                              29
## 9  September                              13                              28
## 10   October                              12                              26
## 11  November                              11                              21
## 12  December                              11                              22
# Combine the plots
combined_monthly_plots <- cowplot::plot_grid(monthly_user_plot, monthly_ride_length_plot, ncol = 2, align = "h")

Monthly Overview: User Engagement and Ride Length Analysis

The data illustrates a clear distinction in ride counts between “Member” and “Casual” users. “Member” users consistently account for a significant portion of rides throughout the year, with notable spikes in the warmer months. In contrast, “Casual” users tend to have lower ride counts, with increases in the warmer months. These trends suggest a preference for membership-based usage, particularly during the winter season.

Monthly Variations in Ride Lengths:

Additionally, the data showcases distinct patterns in average ride lengths for both member and casual users. Members exhibit a preference for shorter rides, with durations averaging around 12 to 14 minutes. Casual users, on the other hand, tend to opt for longer rides, with average durations ranging from 27 to 33 minutes. These differences in ride lengths are consistent throughout the year and highlight varying user preferences.

Seasonal Patterns:

Both user engagement and ride lengths display seasonal fluctuations. Warmer months, such as May through August, show increased engagement from both member and casual users. Additionally, average ride lengths tend to peak during the spring and summer, indicating a shift towards longer rides in favorable weather conditions. Based on this observation we will look at the data in a seasonal format.


Seasonal count of user types compared to average ride lengths
# Getting a overview of rider trends seasonally
seasonal_user_counts <- dbGetQuery(conn, 
"WITH seasonal_counts AS (
    SELECT
        CASE
            WHEN EXTRACT(MONTH FROM started_at) IN (12, 1, 2) THEN 'Winter'
            WHEN EXTRACT(MONTH FROM started_at) IN (3, 4, 5) THEN 'Spring'
            WHEN EXTRACT(MONTH FROM started_at) IN (6, 7, 8) THEN 'Summer'
            WHEN EXTRACT(MONTH FROM started_at) IN (9, 10, 11) THEN 'Autumn'
            ELSE 'Unknown'
        END AS season,
        COUNT(*) FILTER (WHERE member_casual = 'member') AS member_count,
        COUNT(*) FILTER (WHERE member_casual = 'casual') AS casual_count
    FROM
        user_data
    GROUP BY
        season
)
SELECT
    season,
    (member_count) AS member,
    (casual_count) AS casual
FROM
    seasonal_counts
ORDER BY
    season;")

print(seasonal_user_counts)
##   season  member  casual
## 1 Autumn  991276  606428
## 2 Spring  793435  496710
## 3 Summer 1244574 1134006
## 4 Winter  316355   84830
# Getting Average Ride Length by Season
seasonal_average_ride_length <- dbGetQuery(conn, 
"SELECT
    CASE
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
        ELSE 'Winter'
    END AS season,
    CAST(
        ROUND(
            AVG(CASE WHEN member_casual = 'member'
                     THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 0
        ) AS INT
    ) AS member_ride_length,
    CAST(
        ROUND(
            AVG(CASE WHEN member_casual = 'casual'
                     THEN EXTRACT(EPOCH FROM (ended_at - started_at)) / 60 END)::numeric, 0
        ) AS INT
    ) AS casual_ride_length
FROM
    user_data
GROUP BY
    CASE
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 3 AND 5 THEN 'Spring'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 6 AND 8 THEN 'Summer'
        WHEN EXTRACT(MONTH FROM started_at) BETWEEN 9 AND 11 THEN 'Autumn'
        ELSE 'Winter'
    END;
")

print(seasonal_average_ride_length)
##   season member_ride_length casual_ride_length
## 1 Autumn                 12                 26
## 2 Spring                 12                 31
## 3 Summer                 14                 30
## 4 Winter                 11                 25
library(ggplot2)
library(RColorBrewer)


# Create a factor with custom levels
seasonal_user_counts$season <- factor(seasonal_user_counts$season, levels = c("Spring", "Summer", "Autumn", "Winter"))

# Adjusting data format to integer
seasonal_user_counts$member <- as.integer(seasonal_user_counts$member)
seasonal_user_counts$casual <- as.integer(seasonal_user_counts$casual)

# Reshape the data for plotting
seasonal_user_counts_long <- pivot_longer(seasonal_user_counts, cols = c(member, casual), names_to = "User_Type", values_to = "User_Count")

# Create a bar plot with separate bars for casual and member
seasonal_user_plot <- ggplot(seasonal_user_counts_long, aes(x = season, y = User_Count, fill = User_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Seasonal User Counts", x = NULL, y = "User Count (Millions)") +
  scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("", "")) +
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) + 
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        legend.key.size = unit(0.1, "in"),  # Match legend key size
        legend.text = element_text(size = 1),  # Adjust legend text size
        legend.title = element_blank(),  # Remove legend title
        legend.background = element_rect(fill = "white")  # Make legend background invisible
) +
  guides(fill = guide_legend(override.aes = list(fill = NA)))  # Make legend completely invisible
library(ggplot2)
library(RColorBrewer)


# Create a data frame for member and casual ride lengths
member_casual_data <- seasonal_average_ride_length %>%
  pivot_longer(cols = c(member_ride_length, casual_ride_length),
               names_to = "User_Type",
               values_to = "Ride_Length")

# Define the order of seasons
season_order <- c("Spring", "Summer", "Autumn", "Winter")

# Apply the order to the 'season' column
member_casual_data$season <- factor(member_casual_data$season, levels = season_order)

# Create a grouped bar plot
seasonal_average_ride_length_plot <- ggplot(member_casual_data, aes(x = season, y = Ride_Length, fill = User_Type)) +
  geom_bar(stat = "identity", position = "stack") + 
  labs(title = "Seasonal Average Ride Length", x = NULL, y = "Average Ride Length (Minutes)") +
  scale_fill_manual(values = c("#7F3B08", "#542788"), labels = c("Casual", "Member")) +  # Use PuOr color palette
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1),
        legend.key.size = unit(0.1, "in"),
        legend.text = element_text(size = 8),  # Adjust legend text size
        legend.title = element_blank()  # Remove legend title
  )
# Merging the two plots side by side into one chart
combined_seasonal_plot <- seasonal_user_plot + seasonal_average_ride_length_plot

Seasonal Overview: User Engagement and Ride Length Analysis

The analysis of user engagement and ride lengths across seasons provides valuable insights into how users interact with the cycling platform under varying environmental conditions. This data uncovers distinct preferences between “Member” and “Casual” users, shedding light on their utilization patterns.

Ride Length Observations:

Examining the data reveals that “Casual” users tend to engage in longer rides compared to “Member” users, regardless of the season. Notably, both user groups favor longer rides during the warmer months of Spring and Summer, while ride durations shorten in Autumn and Winter.

User Utilization Trends:

Ride length patterns also exhibit interesting variations. “Member” users consistently demonstrate higher engagement levels across all seasons, indicating a strong preference for the platform. In contrast, “Casual” users exhibit more variability, with peaks in utilization during certain seasons, particularly in the warmer months of Spring and Summer.

Seasonal Dynamics:

Seasonal shifts play a significant role in user behavior. The data suggests that environmental factors strongly influence user engagement and ride lengths. As temperatures rise, users are inclined to embark on more extended journeys, while colder seasons witness shorter rides.


Modes by month for the year
In the charts below we have the mode day for each month. Each column shows the rider count for that month at the top in black, along with the average ride length at the bottom of the column in white
casual_monthly_mode_day <- dbGetQuery(conn,
"WITH DayOfWeekCounts AS (
    SELECT
        TO_CHAR(started_at, 'Month') AS month,
        EXTRACT(DOW FROM started_at) AS day_of_week,
        COUNT(*) AS ride_count,
        ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60)::numeric, 2) AS avg_ride_length_minutes,
        ROW_NUMBER() OVER (PARTITION BY TO_CHAR(started_at, 'Month') 
                           ORDER BY COUNT(*) DESC) AS rank
    FROM user_data
    WHERE member_casual = 'casual'
    GROUP BY month, EXTRACT(DOW FROM started_at)
)
SELECT
    month,
    CASE
        WHEN day_of_week = 0 THEN 'Sunday'
        WHEN day_of_week = 1 THEN 'Monday'
        WHEN day_of_week = 2 THEN 'Tuesday'
        WHEN day_of_week = 3 THEN 'Wednesday'
        WHEN day_of_week = 4 THEN 'Thursday'
        WHEN day_of_week = 5 THEN 'Friday'
        WHEN day_of_week = 6 THEN 'Saturday'
    END AS mode_day_of_week,
    ride_count AS max_count,
    avg_ride_length_minutes
FROM DayOfWeekCounts
WHERE rank = 1
ORDER BY to_timestamp(month, 'Month')::DATE;

")
print(casual_monthly_mode_day)
##        month mode_day_of_week max_count avg_ride_length_minutes
## 1  January           Saturday      3791                   37.99
## 2  February            Monday      4405                   24.90
## 3  March               Sunday     16575                   38.79
## 4  April             Saturday     36330                   34.75
## 5  May                 Sunday     55320                   33.06
## 6  June                Sunday     65851                   36.14
## 7  July              Saturday     95230                   32.90
## 8  August            Saturday     66188                   31.94
## 9  September         Saturday     64076                   35.39
## 10 October           Saturday     52221                   29.46
## 11 November          Thursday     17981                   24.17
## 12 December          Thursday      8177                   19.69
member_monthly_mode_day <- dbGetQuery(conn,
"WITH DayOfWeekCounts AS (
    SELECT
        TO_CHAR(started_at, 'Month') AS month,
        EXTRACT(DOW FROM started_at) AS day_of_week,
        COUNT(*) AS ride_count,
        ROUND(AVG(EXTRACT(EPOCH FROM (ended_at - started_at)) / 60)::numeric, 2) AS avg_ride_length_minutes,
        ROW_NUMBER() OVER (PARTITION BY TO_CHAR(started_at, 'Month') 
                           ORDER BY COUNT(*) DESC) AS rank
    FROM user_data
    WHERE member_casual = 'member'
    GROUP BY month, EXTRACT(DOW FROM started_at)
)
SELECT
    month,
    CASE
        WHEN day_of_week = 0 THEN 'Sunday'
        WHEN day_of_week = 1 THEN 'Monday'
        WHEN day_of_week = 2 THEN 'Tuesday'
        WHEN day_of_week = 3 THEN 'Wednesday'
        WHEN day_of_week = 4 THEN 'Thursday'
        WHEN day_of_week = 5 THEN 'Friday'
        WHEN day_of_week = 6 THEN 'Saturday'
    END AS mode_day_of_week,
    ride_count AS max_count,
    avg_ride_length_minutes
FROM DayOfWeekCounts
WHERE rank = 1
ORDER BY to_timestamp(month, 'Month')::DATE;

")
print(member_monthly_mode_day)
##        month mode_day_of_week max_count avg_ride_length_minutes
## 1  January           Thursday     14011                   11.62
## 2  February            Monday     18375                   11.36
## 3  March            Wednesday     35957                   11.96
## 4  April              Tuesday     40432                   10.83
## 5  May                 Monday     62061                   13.39
## 6  June              Thursday     73500                   13.70
## 7  July              Saturday     68867                   15.30
## 8  August             Tuesday     76718                   13.06
## 9  September         Thursday     76589                   12.43
## 10 October             Monday     58287                   11.42
## 11 November         Wednesday     47448                   10.92
## 12 December          Thursday     27095                   10.40
library(ggplot2)
library(scales)

# Create the data
mode_day <- data.frame(
  month = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
   mode_day_of_week = c("Saturday", "Monday", "Sunday", "Saturday", "Sunday", "Sunday", "Saturday", "Saturday", "Saturday", "Saturday", "Thursday", "Thursday"),
  max_count = c(3791, 4405, 16575, 36330, 55320, 65851, 95230, 66188, 64076, 52221, 17981, 8177),
avg_ride_length_minutes = c(37.99, 24.90, 38.79, 34.75, 33.06, 36.14, 32.90, 31.94, 35.39, 29.46, 24.17, 19.69)
)

# Specify the order of months
mode_day$month <- factor(mode_day$month, levels = c(
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
))

# Specify the order of days
mode_day$mode_day_of_week <- factor(mode_day$mode_day_of_week, levels = c(
  "Monday", "Thursday", "Saturday", "Sunday"
))

# Format max_count with commas
mode_day$max_count <- comma(mode_day$max_count)

# Plot for casual
casual_month_mode_plot <- ggplot(mode_day, aes(x = month, y = mode_day_of_week)) +
  geom_bar(stat = "identity", fill = "#7F3B08") +
  geom_text(aes(label = max_count), vjust = -0.5, size = 3) +
  geom_text(aes(label = sprintf("%.2f", avg_ride_length_minutes), y = 0.5), vjust = 1, color = "white") +
  labs(title = "Mode Day of the Week by Month (Casual)\nwith Rider Counts and Avg Ride Length",
       x = "",
       y = "") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))
library(ggplot2)
library(scales)

# Create the data for 'member'
mode_day_member <- data.frame(
  month = c("January", "February", "March", "April", "May", "June", "July", "August", "September", "October", "November", "December"),
  mode_day_of_week = c("Thursday", "Monday", "Wednesday", "Tuesday", "Monday", "Thursday", "Saturday", "Tuesday", "Thursday", "Monday", "Wednesday", "Thursday"),
  max_count = c(14011, 18375, 35957, 40432, 62061, 73500, 68867, 76718, 76589, 58287, 47448, 27095),
  avg_ride_length_minutes = c(11.62, 11.36, 11.96, 10.83, 13.39, 13.70, 15.30, 13.06, 12.43, 11.42, 10.92, 10.40)
)

# Specify the order of months
mode_day_member$month <- factor(mode_day_member$month, levels = c(
  "January", "February", "March", "April", "May", "June",
  "July", "August", "September", "October", "November", "December"
))

# Specify the order of days
mode_day_member$mode_day_of_week <- factor(mode_day_member$mode_day_of_week, levels = c(
  "Monday", "Thursday", "Wednesday", "Tuesday", "Saturday", "Sunday"
))

# Format max_count with commas
mode_day_member$max_count <- comma(mode_day_member$max_count)

# Plot for 'member' with rider counts and average ride length
member_month_mode_plot <- ggplot(mode_day_member, aes(x = month, y = mode_day_of_week)) +
  geom_bar(stat = "identity", fill = "#542788") +
  geom_text(aes(label = max_count), vjust = -0.5, size = 3) +
  geom_text(aes(label = sprintf("%.2f", avg_ride_length_minutes), y = 0.5), vjust = 1, color = "white") +
  labs(title = "Mode Day of the Week by Month (Member)\nwith Rider Counts and Avg Ride Length",
       x = "",
       y = "") +
  theme(axis.text.x = element_text(angle = 45, vjust = 1, hjust = 1))


Mode Overview: User Engagement and Ride Length Analysis

Casual User Group:

In January, Casual users prefer Saturdays for rides, with an average duration of approximately 37.99 minutes. February sees a shift towards weekday rides, with Mondays emerging as the mode day and rides averaging around 24.90 minutes.

March shows Sundays as the mode day, with an average ride length of approximately 38.79 minutes, indicating a consistent trend of longer rides. In April, Saturdays continue to be the preferred day with rides lasting around 34.75 minutes, suggesting a balanced mix of exploration. May sees Sundays remaining prominent, with rides averaging about 33.06 minutes, in line with the trend of longer rides. June continues the preference for Sundays as the mode day, with average rides lasting around 36.14 minutes, indicating more extensive rides during the warmer season.

July maintains Saturday’s prominence, with rides averaging around 32.90 minutes, indicating a balance between exploration and shorter trips. August sees Saturday remaining popular, with rides averaging approximately 31.94 minutes, suggesting a consistent trend of moderate-length rides. In September, Saturdays continue to be the preferred day, with rides averaging about 35.39 minutes, indicating a preference for weekend exploration. October shows Saturday retaining its position, suggesting a continued preference for weekend rides, with an average ride length of approximately 29.46 minutes.

November sees Thursdays emerging as the mode day, with rides lasting around 24.17 minutes. December maintains Thursdays’ prominence, with rides averaging around 19.69 minutes, suggesting a trend towards shorter rides in the colder season.

Across the year, Casual users exhibit distinct preferences for days of the week. In the warmer months of June, July, and August, Saturdays are particularly popular, with ride counts peaking at over 68,000. Sundays also exhibit high engagement during the summer. Conversely, the colder months of November and December see Thursdays as the preferred day for Casual riders. These trends underscore a seasonal shift in user behavior, with warmer months encouraging more weekend exploration.

Average ride lengths for Casual users follow a fairly consistent pattern. Throughout the year, rides tend to be longer, averaging around 19 to 38 minutes. Interestingly, March stands out with the longest average ride lengths, suggesting leisurely rides during the spring Conversely, December exhibits the shortest ride lengths, potentially indicating more purposeful, shorter trips in the colder weather.


Member User Group:

In January, Member riders prefer Thursdays for rides, with an average duration of approximately 11.62 minutes. February sees a continuation of weekday rides, with a shift to Mondays emerging as the mode day and rides averaging around 11.36 minutes.

March shows Wednesdays as the mode day, with an average ride length of approximately 11.96 minutes. In April, Tuesdays emerge as the mode day for Member users, indicating a preference for rides early in the week, with rides lasting around 10.83 minutes, suggesting purposeful, shorter trips. May continues to see Mondays as the mode day for Member riders. The average ride length is approximately 13.39 minutes, indicating a balance between purposeful trips and potential leisurely exploration.

June maintains Thursdays’ prominence, reflecting a preference for mid-week rides among Member users. The average ride length increases to about 13.70 minutes, potentially indicating slightly longer trips compared to previous months. July sees Saturdays emerging as the mode day for Member riders, signaling a shift towards weekend rides. The average ride length increases significantly to around 15.30 minutes, suggesting more extensive rides during the warmer season.

August sees Tuesdays regaining prominence as the mode day for Member users. The average ride length remains consistent at about 13.06 minutes, indicating a preference for purposeful, shorter trips. In September, Thursdays continue to be the mode day for Member users, aligning with a preference for mid-week rides. The average ride length remains fairly consistent at approximately 12.43 minutes.

October shows Mondays emerging as the mode day for Member users, signaling a shift towards weekday rides, with an average ride length of around 11.42 minutes, indicating a balance between purposeful trips and potential leisurely exploration. November sees Wednesdays regaining prominence as the mode day for Member riders. The average ride length is approximately 10.92 minutes. December maintains Thursdays’ prominence, aligning with a preference for mid-week rides among Member users. The average ride length decreases to approximately 10.40 minutes, suggesting a trend towards slightly shorter trips in the colder season.

Across the year, Member users exhibit varying preferences for days of the week. In July, Saturdays are particularly popular, with ride counts exceeding 68,867. In August, this switches to Tuesdays, with rides peaking at over 76,718, the highest ridership month of the year. Mondays and Thursdays also exhibit high engagement across the year. These trends highlight a seasonal shift in user behavior, with warmer months encouraging more weekend exploration.

Average ride lengths for Member users show a fairly consistent pattern. Throughout the year, rides tend to be around 10 to 15 minutes long. Interestingly, July sees the longest average ride lengths, which may suggest users taking leisurely rides during the summer. Conversely, December exhibits the shortest ride lengths, potentially indicating more purposeful, shorter trips in the colder weather.


Third Party Data

Looking at user behaviors through temporal filters

Weather

Recognizing the critical role that weather data plays in contextualizing and analyzing trends, I embarked on the task of establishing a dedicated ‘weather’ data set for the Chicago area in 2022. This decision was spurred by the data set’s direct relevance to this specific region. Constructing a comprehensive record of weather patterns specific to Chicago holds immense potential in bolstering analytical capabilities and providing a deeper contextual understanding of the data. By incorporating local weather information, this data set will serve as an invaluable resource for conducting more nuanced analyses and deriving insights closely attuned to the dynamic weather conditions shaping the Chicago area.

library(DBI)

# Comparing effects of temperatures on rides
temps_and_rides <- dbGetQuery(conn,
"SELECT
    CASE
        WHEN td.temp < 50 THEN 'Less than 50°F'
        WHEN td.temp >= 50 AND td.temp < 60 THEN '50-59°F'
        WHEN td.temp >= 60 AND td.temp < 70 THEN '60-69°F'
        WHEN td.temp >= 70 AND td.temp < 80 THEN '70-79°F'
        ELSE '80°F and above'
    END AS temperature_range,
    SUM(CASE WHEN ud.member_casual = 'casual' THEN 1 ELSE 0 END) AS temporal_holiday_casual_riders_count,
    SUM(CASE WHEN ud.member_casual = 'member' THEN 1 ELSE 0 END) AS temporal_holiday_member_riders_count
FROM
    temporal_data td
LEFT JOIN user_data ud ON td.date_w = ud.date_start

GROUP BY
    temperature_range
ORDER BY
    temperature_range;")

print(temps_and_rides)
##   temperature_range temporal_holiday_casual_riders_count
## 1           50-59°F                               280946
## 2           60-69°F                               364593
## 3           70-79°F                              1079683
## 4    80°F and above                               306548
## 5    Less than 50°F                               312057
##   temporal_holiday_member_riders_count
## 1                               477149
## 2                               484027
## 3                              1190246
## 4                               359555
## 5                               861900
# Adjusting temps_and_rides$holiday_casual_riders_count and temps_and_rides$holiday_member_riders_count from integer64 to number
temps_and_rides$temporal_holiday_casual_riders_count <- as.numeric(temps_and_rides$temporal_holiday_casual_riders_count)
temps_and_rides$temporal_holiday_member_riders_count <- as.numeric(temps_and_rides$temporal_holiday_member_riders_count)

# Reshape the data for plotting
temps_and_rides_long <- pivot_longer(temps_and_rides, 
                                     cols = c(temporal_holiday_member_riders_count, temporal_holiday_casual_riders_count), 
                                     names_to = "User_Type", values_to = "value")

temps_and_rides_plot <- ggplot(temps_and_rides_long, aes(x = temperature_range, y = value, fill = User_Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Ridership by Temperature", x = "Temperature Range", y = "Ridership Count (Millions)") +
  scale_fill_manual(values = c("temporal_holiday_casual_riders_count" = "#7F3B08", "temporal_holiday_member_riders_count" = "#542788"),
  labels = c("temporal_holiday_casual_riders_count" = "Casual", "temporal_holiday_member_riders_count" = "Member")) +  
  scale_y_continuous(labels = scales::number_format(scale = 1e-6, accuracy = 0.1)) +
  theme(legend.position = "right", axis.text.x = element_text(face = "bold")) +
  guides(fill = guide_legend(title = NULL))  # This removes the legend title

Temperature Influence on Ridership:

Temperature Impacts:
The “70-79°F” temperature range shows the highest ridership, both for casual and member riders. This indicates that ridership tends to increase as temperatures rise, suggesting that mild and pleasant weather conditions attract more riders.

Member ridership is notably higher when the temperature range between 50°F and 59°F. Ridership remains consistently high across both temperature categories in the 60°F to 69°F range. This information can help in planning marketing efforts based on expected temperature levels.

Seasonal Preferences:
Ridership is higher during mild and warm weather, indicating a preference for comfortable riding conditions.

Member Engagement and Temperature:
The “70-79°F” temperature range stands out with high member ridership counts, indicating that members are more inclined to use the service during mild and pleasant weather.

Casual Engagement and Temperature:
When considering casual riders, the “70-79°F” temperature range also has the highest casual ridership counts. This indicates that casual riders are similarly drawn to use the service during this temperature range, emphasizing the positive influence of comfortable weather conditions.



I realized the importance of establishing a dedicated ‘holidays’ table for the year 2022, tailored to the unique characteristics of the Chicago area. This decision was driven by the data set’s origin in this specific region. The creation of a comprehensive record of holidays specific to Chicago holds the potential to significantly enhance both the analytical capabilities and the contextual relevance of the data. By including local holidays, this table will serve as a valuable resource for conducting more insightful analyses and drawing meaningful conclusions that are closely aligned with the Chicago area’s socio-cultural dynamics.

I conducted thorough research using Google to gather accurate and relevant information regarding holidays and their corresponding dates specific to the Chicago area. This process ensured the creation of a comprehensive and contextually accurate “holidays” table for the year 2022. By utilizing reliable sources and cross-referencing information, I curated a data set that aligns with the local holiday calendar, enhancing the analytical capabilities and relevance of the collected data.

Holidays
library(DBI)

# Comparing effects of holidays on rides
holidays_and_rides <- dbGetQuery(conn,
"SELECT
    td.holiday_name,
    COUNT(CASE WHEN member_casual = 'casual' THEN 1 END) AS holiday_casual_riders_count,
    COUNT(CASE WHEN member_casual = 'member' THEN 1 END) AS holiday_member_riders_count
FROM
    user_data ud
LEFT JOIN temporal_data td ON ud.date_start = td.holiday_start_date
WHERE
    td.holiday_name IS NOT NULL
GROUP BY
    td.holiday_name
ORDER BY
    td.holiday_name;")

print(holidays_and_rides)
##                            holiday_name holiday_casual_riders_count
## 1                            Boxing Day                         272
## 2            Chicago Air and Water Show                       28135
## 3                Chicago Blues Festival                       53725
## 4                 Chicago Jazz Festival                       51014
## 5                  Chicago Pride Parade                       19493
## 6      Chicago St. Patrick's Day Parade                        1507
## 7                      Chinese New Year                         656
## 8                         Christmas Day                          87
## 9                          Columbus Day                        6568
## 10                               Diwali                        6250
## 11                            Earth Day                        2382
## 12                               Easter                        1749
## 13                          Eid al-Adha                       20989
## 14         Eid al-Fitr (End of Ramadan)                        3734
## 15                         Father's Day                       11627
## 16                             Flag Day                       10505
## 17                            Halloween                        4012
## 18                     Independence Day                       13700
## 19            International Women's Day                        1653
## 20 International Worker's Day (May Day)                        5113
## 21                           Juneteenth                       17442
## 22                         Lollapalooza                       63795
## 23                           Mardi Gras                        1857
## 24               Martin Luther King Jr.                         643
## 25                         Memorial Day                        6743
## 26                         Mother's Day                        4925
## 27                           New Year's                        1085
## 28                          Oktoberfest                      105689
## 29                               Pi Day                        4014
## 30                       Presidents Day                        1591
## 31                     Taste of Chicago                       27375
## 32                         Thanksgiving                        2236
## 33                      Valentine's Day                         674
## 34                         Veterans Day                        4209
##    holiday_member_riders_count
## 1                          973
## 2                        23426
## 3                        52167
## 4                        51252
## 5                        13503
## 6                         3262
## 7                         4029
## 8                          394
## 9                        14559
## 10                       13468
## 11                        6076
## 12                        5477
## 13                       15161
## 14                       10016
## 15                       14022
## 16                       13611
## 17                       10518
## 18                        8827
## 19                        6878
## 20                        7283
## 21                       11853
## 22                       56922
## 23                        6870
## 24                        3009
## 25                        9614
## 26                       10055
## 27                        1478
## 28                      132819
## 29                        8331
## 30                        5109
## 31                       37915
## 32                        3078
## 33                        3787
## 34                        8979
# Create a data frame from the provided data
holiday_data <- data.frame(
  holiday_name = c("Boxing Day", "Chicago Air and Water Show", "Chicago Blues Festival", "Chicago Jazz Festival", 
                    "Chicago Pride Parade", "Chicago St. Patrick's Day Parade", "Chinese New Year", "Christmas Day", 
                    "Columbus Day", "Diwali", "Earth Day", "Easter", "Eid al-Adha", "Eid al-Fitr (End of Ramadan)", 
                    "Father's Day", "Flag Day", "Halloween", "Independence Day", "International Women's Day", 
                    "International Worker's Day (May Day)", "Juneteenth", "Lollapalooza", "Mardi Gras", 
                    "Martin Luther King Jr.", "Memorial Day", "Mother's Day", "New Year's", "Oktoberfest", 
                    "Pi Day", "Presidents Day", "Taste of Chicago", "Thanksgiving", "Valentine's Day", 
                    "Veterans Day"),

  casual_riders_count = c(272, 28135, 53725, 51014, 19493, 1507, 656, 87, 6568, 6250, 2382, 1749, 20989, 
                          3734, 11627, 10505, 4012, 13700, 1653, 5113, 17442, 63795, 1857, 643, 6743, 
                          4925, 1085, 105689, 4014, 1591, 27375, 2236, 674, 4209),
  member_riders_count = c(973, 23426, 52167, 51252, 13503, 3262, 4029, 394, 14559, 13468, 6076, 5477, 
                          15161, 10016, 14022, 13611, 10518, 8827, 6878, 7283, 11853, 56922, 6870, 
                          3009, 9614, 10055, 1478, 132819, 8331, 5109, 37915, 3078, 3787, 8979)
)

# Reshape data for ggplot
library(tidyr)
library(dplyr)

holiday_data_long <- pivot_longer(holiday_data, cols = c(casual_riders_count, member_riders_count), names_to = "User_Type", values_to = "Rides_Count")

# Arrange the data in descending order based on Rides_Count
holiday_data_long <- holiday_data_long %>%
  arrange(desc(Rides_Count))

# Create Horizontal Stacked Bar Chart
holiday_plot_1 <- ggplot(holiday_data_long, aes(y = reorder(holiday_name, Rides_Count), x = Rides_Count, fill = User_Type)) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Rider Engagement During Holidays",
       x = "Number of Riders",
       y = "") +
  scale_fill_manual(values = c("casual_riders_count" = "#7F3B08", "member_riders_count" = "#542788"),
    labels = c("casual_riders_count" = "Casual", "member_riders_count" = "Member")) +  
  theme_minimal() +
  theme(axis.title.y = element_blank(), axis.text.y = element_text(hjust = 0),
        legend.key.size = unit(0.1, "in"),
    legend.text = element_text(size = 8),  # Adjust legend text size
    legend.title = element_blank()  # Remove legend title
  )

Looking at this I decided that I wanted to zoom in and focus on those holidays where ridership is greater than twenty thousand.

# Filter the data to include holidays with rider count >= 50,000
filtered_holiday_data <- holiday_data_long[holiday_data_long$Rides_Count > 20000, ]

# Calculate the total riders for each user type
total_casual_riders <- sum(filtered_holiday_data$Rides_Count[filtered_holiday_data$User_Type == "casual_riders_count"])
total_member_riders <- sum(filtered_holiday_data$Rides_Count[filtered_holiday_data$User_Type == "member_riders_count"])

# Define colors for User_Type
user_type_colors <- c("casual_riders_count" = "#7F3B08", "member_riders_count" = "#542788")

# Plot the filtered data
filtered_holiday <- ggplot(filtered_holiday_data, aes(y = reorder(holiday_name, Rides_Count), x = Rides_Count, fill = User_Type)) +
  geom_bar(stat = "identity", position = "dodge") +
  labs(title = "Rider Engagement During Holidays (Rider Count > 20,000)",
       x = "",  # Remove x-axis label for now
       y = "") +
  scale_fill_manual(values = user_type_colors,
    labels = c("casual_riders_count" = "Casual", "member_riders_count" = "Member")) +  
  theme_minimal() +
  theme(axis.title.y = element_blank(),
        axis.text.y = element_text(face = "bold", hjust = 0, vjust = 0.5),
        legend.key.size = unit(0.2, "in"),
        legend.text = element_text(size = 8),  # Adjust legend text size
        legend.title = element_blank(),  # Remove legend title
        axis.text.x = element_text(hjust = 0, vjust = 0.5),  # Center x-axis labels
        axis.title.x = element_text(hjust = 1))  # Align x-axis title to the right

# Format x-axis labels in thousands and millions with commas
filtered_holiday +
  scale_x_continuous(labels = scales::comma)

Looking at this chart we notice that for Eid al-Adha, there is only casual riders represented, this occurs due to the fact that member ridership falls under 20,000 while casual ridership is greater than 20,000.


Marketing Suggesions:

I am excited to present three strategic proposals aimed at enhancing user engagement and driving membership conversions for Cyclistic. These initiatives leverage data-driven insights to create targeted marketing campaigns that align with Cyclistic’s business objectives. First I will present the overview, and following will be the details for each proposal.


Birthday Gift of Health:

To foster a deeper connection with Cyclistic’s users, I propose implementing a special birthday offer. By utilizing the birthday information collected from users during the sign-up process to utilize the service, Cyclistic can extend a personalized promotion encouraging users to invest in their health.

Execution:

Automated Birthday Campaign

Personalized Offer with Promo Code

Engaging Content and Call to Action

Follow-Up Sequence

Compliance and Data Security


Cost Savings for Frequent Riders:

Cyclistic aims to convert frequent riders into loyal members by demonstrating potential cost savings through a membership plan. By analyzing ride history and providing personalized cost comparisons,Cyclistic can illustrate the financial benefits of membership.

Execution:

Ride History Analysis

Personalized Cost Comparison

Targeted In-App Notifications and Emails

Visualize Savings


Holiday Membership Promotions:

Capitalizing on holiday ridership trends, I propose running targeted promotions during high-demand periods. By offering special incentives for membership, Cyclistic can capture the attention and possible buy-in of users in a festive spending mindset.

Execution:

Holiday Ridership Data Analysis

Tailored Holiday Promotions

Multi-Platform Engagement


Proposal 1: Birthday Gift of Health

Concept:

To enhance user engagement and encourage membership, extend a special birthday offer to users who have provided their birthday information.

Execution:

Birthday Data Collection:

Ensure user profiles include birthday information as part of the sign-up process, in compliance with legal age requirements.

Automated Birthday Campaign:

Set up an automated system to trigger birthday greetings via email or in-app messages on the user’s special day.

Personalized Offer:

Include a unique promo code for a discounted membership rate as a birthday gift, encouraging them to invest in their health.

Engaging Content:

Craft a personalized message highlighting the value of a membership, emphasizing the positive impact on their well-being.

Call to Action:

Include a clear call to action guiding users to sign up for the membership using the provided promo code.

Follow-Up:

Implement a follow-up sequence to remind users of the limited-time offer and reinforce the benefits of membership.

Compliance and Data Security:

Ensure all data collection and usage comply with relevant privacy laws, providing users with transparency and control over their information.


Proposal 2: Cost Savings for Frequent Riders

Concept:

Encourage frequent riders to switch to a membership plan based on their past ride history and potential cost savings.

Execution:

Ride History Analysis:

Analyze user ride data to identify individuals who could benefit from a membership plan based on their past ride frequency.

Personalized Cost Comparison:

Provide each user with a customized cost analysis, demonstrating potential savings with a membership compared to pay-per-ride. In-App Notifications and Emails:

Send targeted in-app messages and emails to eligible users, presenting them with the cost-saving opportunity. Visualize Savings:

Create visual representations showcasing the financial advantages of being a member, particularly for frequent riders.


Proposal 3: Holiday Membership Promotions

Concept:

Capitalize on holiday ridership trends to drive membership conversions during high-demand periods.

Execution:

Holiday Ridership Data Analysis:

Identify holidays with high ridership numbers, indicating potential interest and willingness to spend.

Tailored Holiday Promotions:

Design special promotions for these holidays, highlighting the financial benefits of membership during festive seasons.

Engage Across Platforms:

Utilize social media, email campaigns, and in-app notifications to reach users and promote the holiday offer.


I believe these proposals have the potential to significantly impact user conversion from casual user to membership. I look forward to your feedback and collaboration in implementing these strategies to drive Cyclistic’s growth and success.