The Phases of Data Analysis

Phase 1: Ask

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

Key Tasks

  • Determine how casual riders and annual members use Cyclistic bikes differently
  • Identify strategies that can convert casual riders into annual members
  • The key stakeholders are:
    • Director of Marketing
    • Analytics Team
    • Executive Team

Deliverable

The business task is to analyze how annual members and casual riders use Cyclistic bikes differently in order to design marketing strategies that convert casual riders into annual members.

Phase 2: Prepare

Key Considerations

1. Data Source and Location: The CSV files were downloaded from the given link and stored on a local drive.

2. Data Organization: The data are stored as CSV files in a data folder.

3. Issues with Bias or Credibility: The data are credible, reliable, and original because they were collected from the company’s own bike system. They can also be considered comprehensive because they contain the data needed to answer the business question that is the subject of this analysis.

4. Problems with the Data: Columns such as Gender and Birth Year contain null values.
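One quick way to confirm which columns contain nulls is to count the empty cells in each column. The sketch below is a hypothetical Python illustration, not the RStudio workflow used in this analysis; the column names gender and birthyear are assumed from the public 2019 Divvy files, the function name count_missing is made up, and the sample rows are invented.

```python
import csv
import io


def count_missing(csv_text):
    """Return the number of empty cells in each column of CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row.get(name)
            # Short rows yield None; blank fields yield "" (both count as missing).
            if value is None or value.strip() == "":
                missing[name] += 1
    return missing


# Invented rows for illustration only.
sample = """trip_id,usertype,gender,birthyear
1,Subscriber,Male,1985
2,Customer,,
3,Subscriber,Female,
"""

print(count_missing(sample))
```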

Phase 3: Process

The following tools were used to process the data:

1. RStudio: The CSV files were imported into data frames, and the data were cleaned using R functions in RStudio.
2. LibreOffice Calc: Used for the spreadsheet cleaning tasks on the 2019 and 2020 data described below.

The following steps were applied to clean the data:

  1. Using LibreOffice Calc, the following tasks were performed on the 2019 data:
    • Deleted the columns TripID, BikeID, from_station_id, to_station_id, and birthyear
    • Replaced null values with NA in the Gender column
    • Renamed start_time and end_time to start_date_time and end_date_time, respectively

  2. Using LibreOffice Calc, the following tasks were performed on the 2020 data:
    • Deleted the columns ride_id, rideable_type, start_lat, start_lng, end_lat, end_lng, start_station_id, and end_station_id
    • Replaced null values with NA in the Gender column
    • Renamed started_at and ended_at to start_date_time and end_date_time, respectively
    • Renamed start_station_name and end_station_name to from_station_name and to_station_name, respectively
    • Renamed member_casual to usertype
    • Replaced the usertype values member with Subscriber and casual with Customer
    • Calculated tripduration, expressed in minutes

  3. The columns in the two CSV files were compared to identify matching and non-matching columns. Matching columns were given the same name in preparation for merging the data from the two files.
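The renaming, recoding, and merging steps above were carried out in LibreOffice Calc and RStudio; as a hypothetical illustration only, the same logic can be sketched in plain Python. The sketch assumes the original column names of the public Divvy files, a YYYY-MM-DD HH:MM:SS timestamp format, and that the 2019 tripduration column is recorded in seconds, so it divides by 60 to put both years in minutes. The function names and sample rows are invented.

```python
from datetime import datetime

# Renames from the cleaning steps; source column names are assumed.
RENAMES_2019 = {"start_time": "start_date_time", "end_time": "end_date_time"}
RENAMES_2020 = {
    "started_at": "start_date_time",
    "ended_at": "end_date_time",
    "start_station_name": "from_station_name",
    "end_station_name": "to_station_name",
    "member_casual": "usertype",
}
USERTYPE_MAP = {"member": "Subscriber", "casual": "Customer"}
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def harmonize_2019(row):
    """Rename 2019 columns; convert tripduration from seconds (assumed) to minutes."""
    out = {RENAMES_2019.get(k, k): v for k, v in row.items()}
    # Strip thousands separators defensively, in case any are present.
    out["tripduration"] = float(out["tripduration"].replace(",", "")) / 60
    return out


def harmonize_2020(row):
    """Rename 2020 columns, recode usertype, and derive tripduration in minutes."""
    out = {RENAMES_2020.get(k, k): v for k, v in row.items()}
    out["usertype"] = USERTYPE_MAP.get(out["usertype"], out["usertype"])
    start = datetime.strptime(out["start_date_time"], TIME_FORMAT)
    end = datetime.strptime(out["end_date_time"], TIME_FORMAT)
    out["tripduration"] = (end - start).total_seconds() / 60
    return out


def merge_on_common_columns(rows_a, rows_b):
    """Stack two row lists, keeping only the columns both datasets share."""
    common = [c for c in rows_a[0] if c in rows_b[0]]
    return [{c: r[c] for c in common} for r in rows_a + rows_b]


# Invented rows for illustration only.
rows_2019 = [harmonize_2019({
    "start_time": "2019-01-01 00:04:37", "end_time": "2019-01-01 00:11:07",
    "tripduration": "390.0", "from_station_name": "Station A",
    "to_station_name": "Station B", "usertype": "Subscriber", "gender": "Male",
})]
rows_2020 = [harmonize_2020({
    "started_at": "2020-01-21 20:06:59", "ended_at": "2020-01-21 20:14:30",
    "start_station_name": "Station C", "end_station_name": "Station D",
    "member_casual": "casual",
})]
merged = merge_on_common_columns(rows_2019, rows_2020)
for row in merged:
    print(row["usertype"], round(row["tripduration"], 2))
```

Keeping only shared columns mirrors the matching step: gender exists only in the 2019 rows here, so it is dropped from the merged output.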

Phase 4: Analyze

Summary of Analysis

1. Data Cleaning and Preparation
  • Missing values in the Gender column were replaced with NA.
  • Datetime fields were converted and split so that rides could be analyzed by hour of day.
  • The 2019 Q1 and 2020 Q1 datasets were merged into a single data frame for analysis.

2. Exploratory Analysis
  • Ride distribution by user type: Subscribers account for 91% of rides, while casual riders account for 9%.
  • Average trip duration: Casual riders take longer trips (1,266.2 minutes) than subscribers (402.3 minutes).
  • Rides by time of day: Subscribers ride mostly during commuting hours (7–9 AM, 4–6 PM); casual riders ride mostly at midday and in the afternoon.
  • Top starting stations: Subscribers begin rides near offices and transit hubs; casual riders start near parks, waterfronts, and tourist areas.

3. Visualization and Insights
  • Bar charts and pie charts were used to illustrate differences in trip duration, ride counts, and station popularity.
  • The patterns identified confirm distinct usage behaviors between subscribers and casual riders.

4. Business Implications
  • Subscribers primarily use the service for commuting, while casual riders use it for leisure.
  • These insights can guide operational planning, marketing strategies, and service optimization.
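The ride-share percentages and average trip durations in the exploratory analysis are group-by summaries over the merged data. The following is a minimal Python sketch of that calculation, not the R code used here; it assumes each row carries a usertype and a tripduration in minutes, and the function name and sample trips are invented.

```python
from collections import defaultdict
from statistics import mean, median


def summarize_by_usertype(rows):
    """Ride share (%) plus mean and median trip duration (minutes) per user type."""
    durations = defaultdict(list)
    for row in rows:
        durations[row["usertype"]].append(row["tripduration"])
    total_rides = sum(len(values) for values in durations.values())
    return {
        usertype: {
            "share_pct": 100 * len(values) / total_rides,
            "mean_minutes": mean(values),
            "median_minutes": median(values),
        }
        for usertype, values in durations.items()
    }


# Invented trips; the 700-minute ride mimics an unusually long outlier.
rows = [
    {"usertype": "Subscriber", "tripduration": 10.0},
    {"usertype": "Subscriber", "tripduration": 14.0},
    {"usertype": "Subscriber", "tripduration": 12.0},
    {"usertype": "Customer", "tripduration": 20.0},
    {"usertype": "Customer", "tripduration": 30.0},
    {"usertype": "Customer", "tripduration": 700.0},
]

for usertype, stats in summarize_by_usertype(rows).items():
    print(usertype, stats)
```

A few very long trips can pull the mean far above a typical ride, so printing the median next to the mean is one way to check whether an average is driven by outliers.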

Phase 5: Share

Ride Distribution by User Type

The pie chart below shows the share of total rides taken by Cyclistic’s annual members (subscribers) and casual riders (customers).



Interpretation

The pie chart shows that subscribers account for 91% of total rides, while customers represent only 9%.

This indicates that the majority of Cyclistic’s bike usage comes from annual members who ride frequently, likely for commuting or regular travel. In contrast, casual riders (customers) make up a small portion of total rides, suggesting their use is occasional or recreational.

This insight highlights a key opportunity for Cyclistic: to design marketing strategies that encourage casual riders to become annual members, thereby increasing long-term engagement and revenue.



Interpretation

The bar chart shows that customers have a much longer average trip duration (1,266.2 minutes) compared to subscribers (402.3 minutes).

This finding suggests that casual riders (customers) typically use Cyclistic bikes for recreational or leisure activities, which often involve longer rides and flexible schedules. In contrast, subscribers (annual members) tend to take shorter trips, likely for daily commuting or routine travel.

The clear difference in trip duration highlights distinct user behavior patterns. Cyclistic could leverage this insight by developing membership promotions targeting casual riders who frequently take long rides, encouraging them to convert to annual members for better value.



Interpretation

The visualization shows distinct patterns in ride activity between subscribers and customers throughout the day.

Subscribers take most of their rides in the morning (around 7–9 AM) and evening (around 4–6 PM), which are typical commuting times. This pattern suggests that annual members primarily use Cyclistic bikes for work or school commutes.

Customers, on the other hand, show higher activity during midday and afternoon hours (10 AM–4 PM), indicating that they use the service mainly for leisure, exercise, or tourism.

This difference in riding patterns highlights how members use the service for practicality, while casual riders use it for recreation. Understanding these usage peaks can help Cyclistic optimize bike availability and target marketing campaigns based on time-of-day demand.
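The time-of-day comparison amounts to extracting the start hour of each ride and counting rides per hour for each user type. This is a hypothetical Python sketch rather than the analysis code; it assumes start_date_time values in YYYY-MM-DD HH:MM:SS format, and the function names and sample start times are invented.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp layout


def rides_by_hour(rows):
    """Count rides per start hour, separately for each user type."""
    counts = {}
    for row in rows:
        hour = datetime.strptime(row["start_date_time"], TIME_FORMAT).hour
        counts.setdefault(row["usertype"], Counter())[hour] += 1
    return counts


def peak_hours(counter, k=2):
    """Return the k busiest start hours, busiest first."""
    return [hour for hour, _ in counter.most_common(k)]


# Invented start times for illustration only.
rows = [
    {"usertype": "Subscriber", "start_date_time": "2019-01-07 08:05:00"},
    {"usertype": "Subscriber", "start_date_time": "2019-01-08 08:20:00"},
    {"usertype": "Subscriber", "start_date_time": "2019-01-09 08:45:00"},
    {"usertype": "Subscriber", "start_date_time": "2019-01-07 17:10:00"},
    {"usertype": "Subscriber", "start_date_time": "2019-01-08 17:30:00"},
    {"usertype": "Customer", "start_date_time": "2020-01-18 13:15:00"},
    {"usertype": "Customer", "start_date_time": "2020-01-19 14:00:00"},
    {"usertype": "Customer", "start_date_time": "2020-01-19 14:40:00"},
]

hourly = rides_by_hour(rows)
for usertype, counter in hourly.items():
    print(usertype, peak_hours(counter))
```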



Interpretation

From the chart above, we can see that annual members frequently start rides from stations located near downtown offices and transport hubs, such as Clark St & Lake St or Clinton St & Madison St. In contrast, casual riders tend to start from recreational or tourist areas, such as Streeter Dr & Grand Ave, which is near Navy Pier. This indicates that members often use bikes for commuting, while casual riders use them for leisure or tourism.
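Ranking starting stations by user type is a frequency count over from_station_name. The sketch below illustrates that step in Python, assuming the merged column names used in the Process phase; the function name is hypothetical, and the trip counts are invented (only the station names come from the text above).

```python
from collections import Counter


def top_start_stations(rows, usertype, k=3):
    """Return the k most common from_station_name values for one user type."""
    counts = Counter(
        row["from_station_name"] for row in rows if row["usertype"] == usertype
    )
    return counts.most_common(k)


# Invented trips using station names mentioned in the text.
rows = [
    {"usertype": "Subscriber", "from_station_name": "Clinton St & Madison St"},
    {"usertype": "Subscriber", "from_station_name": "Clinton St & Madison St"},
    {"usertype": "Subscriber", "from_station_name": "Clark St & Lake St"},
    {"usertype": "Customer", "from_station_name": "Streeter Dr & Grand Ave"},
    {"usertype": "Customer", "from_station_name": "Streeter Dr & Grand Ave"},
    {"usertype": "Customer", "from_station_name": "Clark St & Lake St"},
]

print(top_start_stations(rows, "Subscriber", k=2))
print(top_start_stations(rows, "Customer", k=2))
```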


Summary of Findings

Ride Distribution by User Type
  • Subscribers (annual members) account for 91% of total rides; customers (casual riders) make up only 9%.

Average Trip Duration
  • Customers take significantly longer trips (1,266.2 minutes) than subscribers (402.3 minutes).

Rides by Time of Day
  • Subscribers ride mainly in the morning (7–9 AM) and evening (4–6 PM), typical commuting hours.
  • Customers ride mainly at midday and in the afternoon (10 AM–4 PM).

Top Starting Stations
  • Subscribers start rides near downtown offices and transit hubs.
  • Customers start rides near parks, waterfronts, and tourist areas.

Recommendations

1. Targeted Marketing
  • Promote annual memberships to frequent casual riders to increase subscriber conversion.
  • Offer leisure-focused passes or discounts to casual riders to strengthen engagement.

2. Operational Optimization
  • Ensure bike availability near office districts and transit hubs during peak commuting hours for subscribers.
  • Add bikes at recreational and tourist locations during midday and on weekends for casual riders.

3. Service Enhancements
  • Use insights on trip duration and peak times to redistribute bikes efficiently and reduce shortages.
  • Tailor communications and incentives to riding patterns to improve customer satisfaction and retention.

Final Conclusion

Cyclistic bike-share usage differs clearly by user type: subscribers ride frequently and primarily for commuting, while casual riders take longer, leisure-oriented trips.

By leveraging these insights, Cyclistic can optimize operations, improve bike availability, and design targeted marketing strategies, ultimately enhancing both user experience and revenue.

Key takeaway: Understanding subscriber and casual rider behavior enables data-driven decisions that support sustainable growth and operational efficiency.