Cyclistic case study

Step 1: Ask

1. What is the problem you are trying to solve? Understanding how casual riders and annual members use Cyclistic bikes differently.

2. What is the stakeholder’s expectation? To design marketing strategies aimed at converting casual riders into annual members. The team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics.

3. How can your insights drive business decisions? The results of this analysis will be used to design a new marketing strategy to convert casual riders into annual members.

Identify business task: Analyze the differences in bike usage between casual riders and annual members to develop insights that will inform a marketing strategy aimed at converting casual riders into annual members.

Consider key stakeholders: Director of Marketing (Lily Moreno), the Cyclistic marketing analytics team, and the Cyclistic executive team.

Introduction

The Cyclistic case study analyzes Divvy bike-share data to understand how usage patterns differ between members and casual riders. The data was sourced from the Divvy Trip Data repository and covers the year 2020.

Data Source

The following datasets were downloaded and used for this analysis:
- 202004-divvy-tripdata.zip
- 202005-divvy-tripdata.zip
- 202006-divvy-tripdata.zip
- 202007-divvy-tripdata.zip
- 202008-divvy-tripdata.zip
- 202009-divvy-tripdata.zip
- 202010-divvy-tripdata.zip
- 202011-divvy-tripdata.zip
- 202012-divvy-tripdata.zip
- Divvy_Trips_2020_Q1.zip

These datasets contain ride details for different months of 2020.

Workflow

The workflow involves the following stages:
1. Data Preparation: Combining all monthly files into a single table in BigQuery.
2. Data Cleaning: Removing invalid or missing data and ensuring uniform data types.
3. Data Analysis: Exploring patterns in ride data and generating insights.
4. Visualization: Presenting the findings using graphical plots in R.

Combining Data in BigQuery

The datasets were uploaded to BigQuery, and the following SQL query was used to combine them into a unified table:

SELECT * FROM `project.dataset.202004`
UNION ALL
SELECT * FROM `project.dataset.202005`
UNION ALL
SELECT * FROM `project.dataset.202006`
-- Repeat for other months
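For readers without BigQuery access, the same stacking step could be approximated in R. This is only a sketch and not the method used in this case study: the folder `csv_dir`, the two tiny stand-in files, and the station ID values are hypothetical, and every column is read as text on the assumption that column types may differ between monthly files.

```r
library(dplyr)
library(readr)
library(purrr)

# Hypothetical folder for the extracted monthly CSVs; two tiny stand-in
# files are written here so the sketch runs on its own.
csv_dir <- file.path(tempdir(), "divvy_csv")
dir.create(csv_dir, showWarnings = FALSE)
write_csv(tibble(ride_id = c("A1", "A2"), start_station_id = c(10, 11)),
          file.path(csv_dir, "202004-divvy-tripdata.csv"))
write_csv(tibble(ride_id = "B1", start_station_id = "X100"),
          file.path(csv_dir, "202012-divvy-tripdata.csv"))

# Read every column as character so files with differing types still stack,
# then bind all files into one table (the R counterpart of UNION ALL).
csv_files <- list.files(csv_dir, pattern = "\\.csv$", full.names = TRUE)
all_trips <- csv_files %>%
  map(~ read_csv(.x, col_types = cols(.default = col_character()))) %>%
  bind_rows()

nrow(all_trips)
```

Columns read as text would still need to be converted to their proper types before analysis.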

Load required packages:

# Load the packages used in this analysis (install them first with install.packages() if needed)
library(tidyverse)
library(readr)
library(dplyr)
library(tidyr)
library(lubridate)
library(DT)

Load uncleaned dataset

Data <- read_csv("data_trip.csv")

## Rows: 42140 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (7): ride_id, rideable_type, started_at, ended_at, start_station_name, e...
## dbl (6): start_station_id, end_station_id, start_lat, start_lng, end_lat, en...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Step 2: Prepare (Data Exploration)

1) ride_id: the length of the ride id should be uniform

library(tidyr)
library(dplyr)

# 1. Check Ride ID Length Consistency
ride_id_length <- Data %>%
  mutate(ride_id_length = nchar(ride_id)) %>%
  group_by(ride_id_length) %>%
  summarize(count = n()) %>%
  arrange(ride_id_length)

# Create an interactive DataTable with proper alignment
datatable(
  ride_id_length, 
  colnames = c("Ride ID Length", "Count"),           # Custom column names
  caption = htmltools::tags$caption(
    style = 'caption-side: bottom; text-align: center; font-size: 14px; color: grey;',
    "Interactive Table: Ride ID Length Summary"
  ),                                                 # Add a caption at the bottom
  options = list(
    pageLength = 5,                                  # Default number of rows per page
    autoWidth = TRUE,                                # Automatically adjust column width
    dom = 'Bfrtip',                                  # Add buttons (copy, CSV, etc.)
    buttons = c('copy', 'csv', 'excel', 'pdf', 'print'), # Export buttons
    className = 'hover',                             # Highlight rows on hover
    columnDefs = list(
      list(className = 'dt-center', targets = "_all"), # Center-align all columns
      list(width = '50%', targets = 0),              # Set specific width for the first column
      list(width = '50%', targets = 1)               # Set specific width for the second column
    ),
    initComplete = JS(                               # JavaScript for better theme
      "function(settings, json) {
         $(this.api().table().container()).css({'font-family': 'Arial', 'font-size': '14px'});
      }"
    )
  ),
  extensions = c('Buttons', 'Responsive')           # Add export buttons and responsiveness
)

- The ride_id is consistently 16 characters long.

2) rideable_type: determine the type of bikes

rideable_type <- Data %>%
  group_by(rideable_type) %>%
  summarize(total = n())

- There are two bike types: electric and docked.

3) started_at, ended_at: ride duration

# Flag rides of one minute or less (including negative durations) or of 24 hours or more
invalid_ride_duration <- Data %>%
  mutate(ride_duration = as.numeric(difftime(ended_at, started_at, units = "mins"))) %>%
  filter(ride_duration <= 1 | ride_duration >= 1440) %>%
  select(ride_id, started_at, ended_at, ride_duration)

- Check whether the ride time is one minute or less, or one day or longer.
- Check whether the end time falls before the start time.
- TIMESTAMP values are in YYYY-MM-DD hh:mm:ss UTC format.
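The duration checks can be made easier to read by labelling each problem separately. The sketch below runs on a small made-up table (`toy_rides` and the `issue` column are illustrative names, not part of the dataset) and assumes timestamps parse with the format shown.

```r
library(dplyr)

# Made-up rides covering each case: valid, too short, too long, negative
toy_rides <- tibble(
  ride_id    = c("r1", "r2", "r3", "r4"),
  started_at = c("2020-04-01 08:00:00", "2020-04-01 09:00:00",
                 "2020-04-01 10:00:00", "2020-04-01 11:00:00"),
  ended_at   = c("2020-04-01 08:20:00", "2020-04-01 09:00:30",
                 "2020-04-03 10:00:00", "2020-04-01 10:45:00")
)

duration_flags <- toy_rides %>%
  mutate(
    start_time    = as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    end_time      = as.POSIXct(ended_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC"),
    ride_duration = as.numeric(difftime(end_time, start_time, units = "mins")),
    # Conditions are evaluated in order, so negative durations are labelled first
    issue = case_when(
      ride_duration < 0     ~ "ends before start",
      ride_duration <= 1    ~ "one minute or less",
      ride_duration >= 1440 ~ "one day or more",
      TRUE                  ~ "ok"
    )
  )

count(duration_flags, issue)
```

Counting by `issue` shows how many rows each rule would remove before the cleaning step is applied.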

4) name & id of start_station and end_station

# Count Null Values for Start/End Station Names
station_nulls_summary <- Data %>%
  summarize(
    start_station_nulls = sum(is.na(start_station_name)),
    end_station_nulls = sum(is.na(end_station_name))
  )

- A total of 1229 null values are found in start_station_name.
- A total of 1049 null values are found in end_station_name.

# Check Null Values in Station IDs
station_id_nulls_summary <- Data %>%
  filter(is.na(start_station_id) | is.na(end_station_id)) %>%
  summarize(null_station_ids = n())

- 1690 rows have a null start_station_id or end_station_id.
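Because this count covers rows where either ID is missing, it need not equal the sum of the per-column null counts. A possible way to break it down is sketched below on made-up data; `toy_stations` and the `missing` labels are illustrative only.

```r
library(dplyr)

# Made-up station IDs: one row missing the start, one the end, one both
toy_stations <- tibble(
  start_station_id = c(1, NA, 3, NA, 5),
  end_station_id   = c(1, 2, NA, NA, 5)
)

missing_breakdown <- toy_stations %>%
  mutate(missing = case_when(
    is.na(start_station_id) & is.na(end_station_id) ~ "both",
    is.na(start_station_id)                         ~ "start only",
    is.na(end_station_id)                           ~ "end only",
    TRUE                                            ~ "none"
  )) %>%
  count(missing)

missing_breakdown
```

Adding up every category except "none" gives the number of rows that the filter on missing station IDs would return.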

5) member_casual: type of membership

membership_type_summary <- Data %>%
  group_by(member_casual) %>%
  summarize(membership_count = n())

- Only two types: member and casual.
- Total member count: 25543
- Total casual count: 16597
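As a quick check, the two counts can be turned into shares of all rides. The snippet below uses only the figures reported above; the variable names are illustrative.

```r
# Counts reported above
membership_counts <- c(member = 25543, casual = 16597)

# The counts should add up to the 42140 rows loaded earlier
total_rides <- sum(membership_counts)

# Share of rides per rider type, in percent
membership_share <- round(100 * membership_counts / total_rides, 1)
membership_share
```

Members account for roughly 61% of the rides in the uncleaned data and casual riders for roughly 39%.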

Step 3: Process (Data Cleaning)

Create new columns: ride_length, month_name, day_name
Validate data types
Check the ride_length column
Remove null values

data_trip <- Data %>%
  # Create 'month_name' from 'started_at'
  mutate(month_name = format(as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S"), "%B"),
         # Create 'day_name' from 'started_at'
         day_name = format(as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S"), "%A"),
         # Calculate 'ride_length' in minutes
         ride_length =round(as.numeric(difftime(as.POSIXct(ended_at, format = "%Y-%m-%d %H:%M:%S"),
                                           as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S"), 
                                           units = "mins")))) %>%
  # Filter rows based on the conditions below
  filter(
    ride_length > 1 & ride_length < 1440,  # Ride length must be between 1 minute and 24 hours
    !is.na(ride_id),                      # No null values in ride_id
    !is.na(rideable_type),                # No null values in rideable_type
    !is.na(started_at),                   # No null values in started_at
    !is.na(ended_at),                     # No null values in ended_at
    !is.na(start_station_name),           # No null values in start_station_name
    !is.na(start_station_id),             # No null values in start_station_id
    !is.na(end_station_name),             # No null values in end_station_name
    !is.na(end_station_id),               # No null values in end_station_id
    !is.na(start_lat),                    # No null values in start_lat
    !is.na(start_lng),                    # No null values in start_lng
    !is.na(end_lat),                      # No null values in end_lat
    !is.na(end_lng),                      # No null values in end_lng
    !is.na(member_casual)                 # No null values in member_casual
  )
# Remove duplicates
data_trip <- data_trip[!duplicated(data_trip), ]

# Confirm that no duplicate rows remain after removal
duplicate_rows <- data_trip[duplicated(data_trip), ]

# Count the number of duplicate rows
num_duplicates <- nrow(duplicate_rows)

# Output results
cat("Number of duplicate rows in the dataset:", num_duplicates, "\n")

## Number of duplicate rows in the dataset: 0

if (num_duplicates > 0) {
  cat("Here are the duplicate rows:\n")
  print(duplicate_rows)
} else {
  cat("No duplicate rows found in the dataset.\n")
}

## No duplicate rows found in the dataset.

# Check for total null values in the entire dataset
sum(is.na(data_trip))

## [1] 0

# Check null values for each column
colSums(is.na(data_trip))

##            ride_id      rideable_type         started_at           ended_at 
##                  0                  0                  0                  0 
## start_station_name   start_station_id   end_station_name     end_station_id 
##                  0                  0                  0                  0 
##          start_lat          start_lng            end_lat            end_lng 
##                  0                  0                  0                  0 
##      member_casual         month_name           day_name        ride_length 
##                  0                  0                  0                  0
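The long list of `!is.na()` conditions could be written more compactly. The sketch below shows one possible alternative using `tidyr::drop_na()` and `dplyr::distinct()` on a small made-up table; `toy_trips` is illustrative and has fewer columns than the real data.

```r
library(dplyr)
library(tidyr)

# Made-up trips: one row with a missing station, one exact duplicate
toy_trips <- tibble(
  ride_id            = c("a", "b", "c", "c"),
  start_station_name = c("X", NA, "Z", "Z"),
  ride_length        = c(12, 8, 30, 30)
)

cleaned <- toy_trips %>%
  filter(ride_length > 1, ride_length < 1440) %>%  # keep plausible durations
  drop_na() %>%                                    # drop rows with any NA
  distinct()                                       # drop exact duplicate rows

# Same validation as above: no nulls should remain
sum(is.na(cleaned))
```

With no arguments, `drop_na()` checks every column, which matches the intent of filtering each column in turn.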

Step 4 & 5: Analyze and Visualize

Table 1: Total Rides by Bike Type and Membership Type

# Total number of rides by bike type and membership type
rides_per_bike_type <- data_trip %>% 
  group_by(rideable_type,member_casual) %>% 
  summarise(total_rides= n(),.groups = "drop")

Visualize the table as a bar chart:

This bar chart compares the total number of rides for different bike types (rideable_type) between members and casual riders. It highlights trends in bike usage based on membership status.

library(ggplot2)
ggplot(rides_per_bike_type, aes(x = rideable_type, y = total_rides, fill = member_casual)) +
  geom_bar(stat = "identity", position = "dodge") +  # Bar plot with dodged bars
  labs(title = "Total Rides by Bike Type and Membership Type",
       x = "Bike Type",
       y = "Total Rides") +
  theme_minimal() +
  scale_fill_manual(values = c("member" = "blue", "casual" = "orange"))  # Customize colors

Figure 1: Total rides by bike type for members and casual riders.

Summary of Total Rides by Bike Type and Membership Type

Key findings include:

Docked bikes are predominantly used by members, with significantly higher ride counts compared to casual riders.
Electric bikes show a more balanced usage between members and casual riders, with casual riders slightly leading in total rides.

These findings suggest that docked bikes are favored by regular users, while electric bikes appeal about equally to both membership types, with a slight edge for casual riders.

Table 2: Total Rides per Month by Membership Type

The following table summarizes the total number of rides for each month, grouped by membership type:

library(dplyr)
rides_per_month <- data_trip %>%
  group_by(month_name, member_casual) %>%
  summarise(total_rides = n(),.groups = "drop")

Visualize Monthly Total Rides by Membership Type (Faceted View)

The faceted bar chart below visualizes the total rides for each month, grouped by membership type (member and casual). Each panel focuses on one membership type, allowing for clearer identification of trends.
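The plotting code for this chart is not shown in the report, so the following is only a sketch of how such a faceted chart might be built. It uses made-up counts in place of `rides_per_month`, assumes `month_name` holds English month names, and orders them by calendar rather than alphabetically.

```r
library(dplyr)
library(ggplot2)

# Made-up stand-in for rides_per_month
rides_per_month <- tibble(
  month_name    = c("May", "May", "January", "January", "August", "August"),
  member_casual = rep(c("casual", "member"), times = 3),
  total_rides   = c(5, 7, 1, 9, 8, 10)
)

# Turn month names into a factor so bars follow the calendar order
rides_per_month <- rides_per_month %>%
  mutate(month_name = factor(month_name, levels = month.name))

monthly_plot <- ggplot(rides_per_month,
                       aes(x = month_name, y = total_rides, fill = member_casual)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(~ member_casual, ncol = 1) +   # one panel per membership type
  labs(title = "Total Rides per Month by Membership Type",
       x = "Month", y = "Total Rides") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

monthly_plot
```

Without the factor conversion, ggplot2 would sort the months alphabetically, which hides seasonal patterns.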

Insights

Casual Riders:

Seasonal Preference: Rides peak around May, likely aligning with warmer weather, outdoor activities, or leisure riding.
Decline in Winter: There’s a noticeable drop during colder months, suggesting casual riders are less inclined to use the service during unfavorable weather.

Members:

Winter Peak: A spike in rides during January-February might reflect commuting for school or work, where regular transportation needs persist despite the weather.

Comparison:

Ride Volume: Members significantly outnumber casual riders in total ride counts, reflecting their regular and committed usage.
Purpose of Usage: The difference in ride patterns suggests that casual riders tend to ride for leisure, while members primarily use the service for daily commuting, such as traveling to school or work.

Table 3: Total Number of Rides by Member Type and Day of Week

This analysis shows the total number of rides for each day of the week, split by member and casual rider types.

total_rides_by_day <- data_trip %>%
  group_by(day_name, member_casual) %>%
  summarise(total_ride = n(), .groups = "drop")

Visualization of Total Rides by Day and Membership Type

The following plot shows the total number of rides for each day of the week, split by membership type (casual and member). The bars are grouped side-by-side to compare the number of rides between casual and member riders for each day.

Total Rides by Day and Membership Type

This visualization compares the total rides taken by casual riders and annual members across each day of the week. It provides valuable insights into rider behavior patterns.

Key Insights

Casual Riders:
- Casual riders show a clear preference for weekends (Saturday and Sunday), with the highest number of rides occurring on Saturday.
- The number of rides during weekdays is significantly lower, indicating that casual riders primarily use bikes for leisure purposes.
Annual Members:
- Members have consistent ridership across all days of the week, with slightly higher usage on weekdays, suggesting frequent usage for commuting purposes.
- This steady trend highlights the utility-oriented behavior of annual members compared to the leisure-driven behavior of casual riders.
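The weekend preference described above can be expressed as a single number per rider type. The sketch below computes the weekend share of rides from a table shaped like `total_rides_by_day`; the counts are made up for illustration and are not the study's results.

```r
library(dplyr)

days <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
          "Saturday", "Sunday")

# Made-up counts shaped like total_rides_by_day
total_rides_by_day <- tibble(
  day_name      = rep(days, times = 2),
  member_casual = rep(c("casual", "member"), each = 7),
  total_ride    = c(10, 10, 10, 10, 10, 30, 20,
                    20, 20, 20, 20, 20, 10, 10)
)

# Fraction of each rider type's rides that fall on Saturday or Sunday
weekend_share <- total_rides_by_day %>%
  mutate(is_weekend = day_name %in% c("Saturday", "Sunday")) %>%
  group_by(member_casual) %>%
  summarise(weekend_share = sum(total_ride[is_weekend]) / sum(total_ride),
            .groups = "drop")

weekend_share
```

If rides were spread evenly across the week, the weekend share would be 2/7, or about 0.29, so values well above that point to weekend-heavy usage.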

Table 4: Total Rides by Hour of Day and Membership Type

This analysis categorizes the time of day into four periods: Night, Morning, Afternoon, and Evening. We then group the data by hour of the day, membership type (casual or member), and time of day. The total number of rides for each category is calculated, and the results are summarized. A bar plot is generated to visualize how ride counts vary by time of day and membership type.

# Data preparation: extract the hour of day from started_at
data_trip$started_at <- as.POSIXct(data_trip$started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
data_trip$hour_of_day <- format(data_trip$started_at, "%H")
data_trip$hour_of_day <- as.numeric(data_trip$hour_of_day)  # Convert to numeric for easier analysis

# Categorize time of day
data_trip$time_of_day <- case_when(
  data_trip$hour_of_day >= 0 & data_trip$hour_of_day < 6 ~ "Night",
  data_trip$hour_of_day >= 6 & data_trip$hour_of_day < 12 ~"Morning",
  data_trip$hour_of_day >= 12 & data_trip$hour_of_day < 18 ~ "Afternoon",
  data_trip$hour_of_day >= 18 & data_trip$hour_of_day <= 23 ~ "Evening"
)
# Group by hour of day and member/casual, then count the number of rides
ride_counts_by_time <- data_trip %>%
  group_by(hour_of_day, member_casual,time_of_day) %>%
  summarise(total_rides = n(), .groups = "drop") %>% 
  arrange(hour_of_day, member_casual)

From this analysis, you can gain insights into which times of the day are most popular for both casual and member riders.
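One way to read the busiest hour for each rider type directly from the summary table is sketched below. It uses made-up counts in place of `ride_counts_by_time`, and the `peak_hours` name is illustrative.

```r
library(dplyr)

# Made-up stand-in for ride_counts_by_time (three hours only)
ride_counts_by_time <- tibble(
  hour_of_day   = rep(c(8, 13, 17), times = 2),
  member_casual = rep(c("casual", "member"), each = 3),
  total_rides   = c(5, 12, 20, 30, 15, 28)
)

# Keep the single busiest hour within each rider type
peak_hours <- ride_counts_by_time %>%
  group_by(member_casual) %>%
  slice_max(total_rides, n = 1, with_ties = FALSE) %>%
  ungroup()

peak_hours
```

Raising `n` would list the top few hours per group, which helps when a rider type has more than one peak.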

Summary of Observations and Insights

Observations:

Casual Riders:
- Usage peaks in the afternoon (4-5 PM) and is highest during leisure hours.
- Minimal activity in the early morning and late evening.
Annual Members:
- Two clear peaks: morning (7-9 AM) and evening (4-6 PM), reflecting commuting behavior.
- Steady usage throughout the day, with minimal activity after 7 PM.
Off-Peak Hours:
- Both groups show very low usage between 12 AM and 6 AM.

Insights:

User Behavior:
- Casual riders primarily use bikes for leisure, aligning with afternoon peaks.
- Members use bikes for structured, routine commuting, reflected in morning and evening peaks.
Marketing Opportunities:
- Target casual riders with promotions during late morning and afternoon hours.
- Highlight the convenience and cost-effectiveness of memberships for commuters.
Operational Adjustments:
- Ensure bike availability during commuting hours (7-9 AM, 4-6 PM) for members.
- Allocate additional resources for casual riders during afternoons and weekends.
Maintenance and Redistribution:
- Regularly redistribute bikes before peak hours to meet demand for both user groups.

Table 5: Average Ride Length by Membership Type

This section calculates and visualizes the average ride length by membership type (casual vs member).

# Average ride length for each membership type
average_ride_length <- data_trip %>%
  group_by(member_casual) %>%
  summarise(avg_ride_length = mean(ride_length, na.rm = TRUE))

Insights

Casual Riders:

Longer Ride Duration: Casual riders average about 40 minutes per ride, which is significantly longer than member rides.
Leisure Usage: The longer ride duration likely reflects recreational or exploratory trips rather than utilitarian purposes.

Members:

Shorter Ride Duration: Members average about 15 minutes per ride, indicating more practical, short-distance trips.
Commuting Focus: This shorter average ride length aligns with the hypothesis that members often use rides for commuting to work or school.

Comparison:

Behavioral Differences: The stark contrast in average ride lengths highlights differing usage patterns: casual riders tend to take longer, leisure-oriented trips, while members focus on shorter, utility-driven rides.

Table 6: Average Ride Length per Day of Week

# Average ride length per day of week
average_ride_length_per_day <- data_trip %>%
  group_by(day_name, member_casual) %>%
  summarise(avg_ride_length_per_day = mean(ride_length, na.rm = TRUE),.groups = "drop")

Key Observations

Casual Riders:
- Casual riders have consistently longer ride lengths compared to members across all days.
- The longest average ride lengths occur on weekends (Saturday: 40.01 minutes, Sunday: 41.13 minutes), indicating leisure-oriented usage.
- The shortest ride lengths are observed on weekdays (e.g., Tuesday: 34.66 minutes).
Annual Members:
- Members exhibit shorter and consistent ride lengths, typically ranging between 13.99 and 17.57 minutes.
- Slightly longer ride lengths are observed on weekends (e.g., Saturday: 17.57 minutes, Sunday: 16.99 minutes), likely reflecting some leisure usage.
Day-wise Trends:
- The longest rides for casual riders occur on Sundays, while members’ rides peak slightly on Saturdays.
- Both groups tend to take shorter rides during weekdays, likely due to time constraints and commuting patterns.
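To put the gap between the two groups in relative terms, the weekend averages quoted above can be placed side by side. The sketch below uses only those four reported figures; the table and column names are illustrative.

```r
library(dplyr)
library(tidyr)

# Weekend averages (minutes) quoted in the observations above
weekend_avg <- tibble(
  day_name                = c("Saturday", "Saturday", "Sunday", "Sunday"),
  member_casual           = c("casual", "member", "casual", "member"),
  avg_ride_length_per_day = c(40.01, 17.57, 41.13, 16.99)
)

# One row per day with a column per rider type, plus the casual/member ratio
weekend_ratio <- weekend_avg %>%
  pivot_wider(names_from = member_casual,
              values_from = avg_ride_length_per_day) %>%
  mutate(casual_to_member = round(casual / member, 2))

weekend_ratio
```

On both weekend days, the average casual ride is between two and two and a half times as long as the average member ride.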

Insights

User Behavior:
- Casual riders are leisure-oriented, with longer rides on weekends, while members are utility-driven with shorter, consistent rides.

Table 7: Most Popular Start and End Stations for Members and Casual Riders

1. Start station for casual riders

# Reusable function for filtering, grouping, and sorting by station
get_station_data <- function(data, rider_type, station_type) {
  data %>%
    filter(member_casual == rider_type) %>%
    group_by(!!sym(station_type)) %>%
    summarise(total_rides = n(), .groups = "drop") %>%
    arrange(desc(total_rides))
}

# Get data for casual riders (starting station)
start_station_casual <- get_station_data(data_trip, "casual",
                                         "start_station_name")

Key Observations

Top Locations:
- The most popular starting station for casual riders is Streeter Dr & Grand Ave, with a total of 332 rides.
- Other key locations include Lake Shore Dr & Monroe St (257 rides) and Millennium Park (201 rides).
Location Insights:
- Many high-ranking stations are near tourist hotspots or scenic locations in Chicago, such as Millennium Park and Lake Shore Drive, which attract leisure and occasional riders.
- Proximity to parks and waterfront areas plays a significant role in station popularity.
Optimization Opportunities:
- Stations near heavily trafficked areas (e.g., Streeter Dr & Grand Ave) should be prioritized for bike redistribution and maintenance to meet high demand.
- Marketing efforts targeting casual riders could focus on these locations, promoting scenic routes and leisure-friendly activities.
Long-Tail Pattern:
- While the top 10 stations account for a significant number of rides, the total number of starting stations is extensive, with many stations contributing smaller individual counts. This highlights the broad distribution of casual riders across the city.

Insights

Tourism and Leisure:
- Focus marketing campaigns around top starting stations, highlighting leisure benefits and tourist attractions nearby to attract casual riders.
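The rankings discussed in the next three subsections can presumably be produced with the same helper by changing its arguments. The sketch below repeats the helper so that it runs on its own and applies it to a small made-up table; `toy_trips` and the station names are illustrative.

```r
library(dplyr)

# Helper repeated from above so this sketch is self-contained
get_station_data <- function(data, rider_type, station_type) {
  data %>%
    filter(member_casual == rider_type) %>%
    group_by(!!sym(station_type)) %>%
    summarise(total_rides = n(), .groups = "drop") %>%
    arrange(desc(total_rides))
}

# Made-up trips with placeholder station names
toy_trips <- tibble(
  member_casual      = c("member", "member", "member",
                         "casual", "casual", "casual"),
  start_station_name = c("Station A", "Station A", "Station B",
                         "Station C", "Station C", "Station A"),
  end_station_name   = c("Station B", "Station B", "Station A",
                         "Station C", "Station A", "Station C")
)

start_station_member <- get_station_data(toy_trips, "member", "start_station_name")
end_station_casual   <- get_station_data(toy_trips, "casual", "end_station_name")
end_station_member   <- get_station_data(toy_trips, "member", "end_station_name")

# Show the ten busiest stations (fewer here, since the toy data is tiny)
head(start_station_member, 10)
```

Passing the column name as a string keeps one function for all four rankings instead of four near-identical pipelines.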

2. Start station for member riders

Observations

Popular Urban Stations:
- Broadway & Barry Ave, Wells St & Concord Ln, and Kingsbury St & Kinzie St are among the busiest stations for members, reflecting high commuter activity.
Proximity to Key Locations:
- Stations near parks, such as Theater on the Lake, and business hubs like Dearborn St & Erie St, are significant for leisure and work-related travel.
Commuter Habits:
- Members appear to prioritize urban and residential stations, showing a preference for stations with easy access to business districts and neighborhoods.
Operational Implications:
- High-traffic stations require regular maintenance and redistribution of bikes to ensure availability during peak hours.

3. End station for casual riders

Observations

Popular Destinations:
- Streeter Dr & Grand Ave stands out as the most popular ending station for casual riders, with significantly higher usage compared to others.
- Locations near parks and tourist attractions, such as Millennium Park and Theater on the Lake, indicate high usage for leisure activities.
Tourist and Recreational Use:
- Ending stations near lakefronts and tourist spots, such as Lake Shore Dr & Monroe St and Michigan Ave & Oak St, highlight the preference of casual riders for scenic routes and recreational areas.
Proximity to City Attractions:
- Stations like Michigan Ave & 8th St and Indiana Ave & Roosevelt Rd are close to prominent city attractions and serve as convenient stops for visitors.
Operational Insights:
- Frequent use of these stations suggests a need for improved bike availability and docking capacity during peak times, especially weekends and holidays.

4. End station for member riders

Observations

Frequent Stations:
- Clark St & Elm St emerges as the most popular station for ending rides among member riders, with a significant number of rides.
- Stations near key commercial and residential hubs such as St. Clair St & Erie St and Kingsbury St & Kinzie St also feature prominently.
Member Preferences:
- Members prefer stations located near workplaces, residential areas, and transportation hubs, indicating that bikes are commonly used for commuting purposes.
Balanced Distribution:
- Unlike casual riders, member riders display a more even spread of popular ending stations, reflecting diverse usage patterns across the city.
Tourist and Scenic Locations:
- Scenic stations like Theater on the Lake still feature among the top destinations, suggesting that some members use bikes for recreational purposes.

The chart compares total rides (blue bars) and average ride length (red line and points) for the two membership types: casual and member.

# Create a combined data frame for visualization
combined_data <- data.frame(
  member_casual = c("member", "casual"),
  total_rides = c(sum(data_trip$member_casual == "member"), sum(data_trip$member_casual == "casual")),
  avg_ride_length = c(mean(data_trip$ride_length[data_trip$member_casual == "member"], na.rm = TRUE),
                      mean(data_trip$ride_length[data_trip$member_casual == "casual"], na.rm = TRUE))
)

# Plot the data
ggplot(combined_data, aes(x = member_casual)) +
  geom_col(aes(y = total_rides), fill = "skyblue", alpha = 0.7) +
  # Average ride length is multiplied by 1000 to share the axis with ride counts;
  # the secondary axis divides by 1000 to show minutes again
  geom_line(aes(y = avg_ride_length * 1000, group = 1), color = "tomato", linewidth = 1) +
  geom_point(aes(y = avg_ride_length * 1000), color = "tomato", size = 3) +
  scale_y_continuous(
    name = "Total Rides",
    sec.axis = sec_axis(~ . / 1000, name = "Avg Ride Length (minutes)")
  ) +
  labs(
    title = "Comparison of Total Rides and Average Ride Length",
    x = "Membership Type"
  ) +
  theme_minimal()

Key Insights

Higher Total Rides for Members:
- Members account for significantly more total rides than casual riders.
- Indicates that members are consistent, frequent users, likely utilizing the service for daily routines such as commuting.
Longer Average Ride Length for Casual Riders:
- Casual riders have a substantially longer average ride length.
- Suggests that casual riders primarily use the service for leisure or recreational trips, like sightseeing or occasional activities.
Efficiency and Usage Patterns:
- Members tend to use the service for shorter, more frequent trips, indicating predictable and utilitarian use.
- Casual riders, on the other hand, favor longer but less frequent trips.
Membership Value Proposition:
- The high number of total rides by members demonstrates the value of converting casual riders into members.
- Targeting casual riders with membership promotions could boost ridership and revenue.
Strategic Implications:
- For Members: Continue focusing on convenience and affordability to retain and expand this user base.
- For Casual Riders: Consider offering flexible pricing or promotions tailored to their longer ride lengths and occasional usage to encourage higher adoption.

Step 6: Act (Recommendations)

1. Membership Conversion:

Offer 20–30% discounts on annual memberships for casual riders who have taken more than 10 rides in a month.
Provide a one-month free trial or 50% off the first month for new members.

2. Leisure Marketing:

Create weekend promotions offering 10% off ride costs or free rides on Sundays during off-peak hours.
Promote scenic bike routes and guided tours with discounts up to 15% for casual riders.

3. Operational Optimizations:

Ensure 95% bike availability at key commuter stations during peak hours.
Introduce a pre-booking feature allowing members to reserve bikes at no extra cost.

4. Seasonal Strategies:

Offer seasonal membership passes (e.g., summer pass for $50) with savings of 20–25% compared to casual pricing.
Provide free heated handle grips or 10% off winter gear rentals for winter riders.

5. Digital Engagement:

Send personalized ride summaries to casual riders, highlighting their potential savings as members.
Use push notifications offering 10–20% discounts for casual riders to encourage repeat usage.

6. Data-Driven Promotions:

Identify frequent casual riders (e.g., those with 5+ rides per month) and offer them 15–25% discounts on memberships.
Provide 5–10% discounts for casual riders taking rides longer than 30 minutes.

7. Improved Accessibility:

Allocate extra bikes at top stations (e.g., increasing availability by 20% at “Southport Ave & Wellington Ave”).
Allow casual riders to upgrade to membership with 0% interest monthly payment plans.

8. Behavioral Nudges:

Use price anchoring to show annual memberships as 40% cheaper per ride compared to casual pricing.
Gamify experiences with challenges like “Ride 50 miles in a month” to earn 10% off the next month’s rides.

9. Expand Marketing Channels:

Use paid social media ads offering exclusive 10% discounts on first rides.
Partner with local businesses to offer 5% discounts or special deals for rides to/from partner locations.

10. Performance Monitoring:

Target a 10% increase in membership conversion rates over 6 months.
Aim for a 5–7% increase in casual rider revenue through promotions and upselling.

Cyclistic case study

zahra khanlarzadehkolaei

2024-11-23
