To showcase my obtained skills from Google Data Analytics Professional Certificate, this case study is part of the last course (Google Data Analytics Capstone: Complete a Case study). I’ve used the 6 steps in Data Analytics (Ask, Prepare, Process, Analyze, Share, and Act) to complete this case study.
In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Until now, Cyclistic’s marketing strategy relied on building general awareness and appealing to broad consumer segments. One approach that helped make these things possible was the flexibility of its pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.
Cyclistic’s finance analysts have concluded that annual members are much more profitable than casual riders. Although the pricing flexibility helps Cyclistic attract more customers, Moreno believes that maximizing the number of annual members will be key to future growth. Rather than creating a marketing campaign that targets all-new customers, Moreno believes there is a very good chance to convert casual riders into members. She notes that casual riders are already aware of the Cyclistic program and have chosen Cyclistic for their mobility needs.
Lily Moreno - The director of marketing and your manager. Moreno is responsible for the development of campaigns and initiatives to promote the bike-share program.
Cyclistic marketing analytics team - A team of data analysts who are responsible for collecting, analyzing, and reporting data that helps guide Cyclistic marketing strategy.
Cyclistic executive team - The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.
Design marketing strategies aimed at converting casual riders into annual members. In order to do that, however, the marketing analyst team needs to better understand how annual members and casual riders differ, why casual riders would buy a membership, and how digital media could affect their marketing tactics.
How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?
The data used in this case study is from Cyclistic’s historical trip data and is available here and stored as a zip file.
The license to use this data, under Motivate International Inc., is found here.
Regarding the issues about bias/credibility of the data, ROCCC (Reliable, Original, Comprehensive, Current, Cited) is observed and followed. The data protects the cyclist’s personal data by only providing the ride number.
The data range is from January 2022 to December 2022 to make a 1 year analysis.
Downloaded the data, stored, and extracted it to a folder (CSV), then saved it as .XLSX file in XLSX folder.
First, I ensured that there are no typographical errors by filtering each column (except ride_id). Highlighted the columns that were dealing with date & time and used the format type (dd/mm/yyyy h:mm) to ensure all dates are on the right format. For time data, I used the format type ([h]:mm:ss) to ensure data are on the right format. To check for missing data, I highlighted each column and used the =ISBLANK(range) function which returns TRUE if a cell is empty and FALSE otherwise, then used the =COUNTIF(range,criteria) function to see if there are missing cells.
The following steps were made in the 12 sheets (202201 to 202212). I proceeded to make another column called mean_ride_length to calculate the average ride length by using =ABS(D2-C2) function to avoid (-) time error then formatted the column by [h]:mm:ss. After that, I made another column called day_of_week to obtain the day the ride was made using =WEEKDAY(C2,1) function then formatted the column as general (1 = Sunday and 7 = Saturday).
The 12 month data sheets were combined in a new excel file to make the analysis faster and more efficient. The new excel file contains 13 sheets, including the main sheet where pivot tables will be made. Sheets were Main, 202201 until 202212.
In the main sheet, I calculated the average ride length of every month using the average function (=AVERAGE(‘202201’!C:C)) where 202201 corresponds to the month. Then, I used PivotTable and PivotChart Wizard to add the 12 sheets to save time.
During calculation, I followed these steps:
Calculate the average ride_length for members and casual riders. Try rows = member_casual; Values = Average of ride_length.
Calculate the average ride_length for users by day_of_week. Try columns = day_of_week; Rows = member_casual; Values = Average of ride_length.
Calculate the number of rides for users by day_of_week by adding Count of trip_id to Values.
From the pivot table, I visualized the graphs that would be used to answer the business task and additional questions.
In this phase, I will be answering the questions based on my analysis.
- How do annual members and casual riders use Cyclistic bikes differently?
- Why would casual riders buy Cyclistic annual memberships?
- How can Cyclistic use digital media to influence casual riders to become members?