Driv_2019 <- readRDS("Driv_2019.rds")
Cyclistic, a bike-share company in Chicago, is aiming to increase the number of annual memberships. To support this goal, the marketing analyst team seeks to better understand how casual riders and annual members use Cyclistic bikes differently. By identifying key usage patterns between these two customer segments, the team intends to design a data-driven marketing strategy focused on converting casual riders into loyal annual members.
What differences exist in the riding behavior between casual riders and annual members?
How can insights about these behaviors inform a marketing strategy to increase annual memberships?
Director of Marketing: Oversees the campaign strategy and needs compelling insights to justify marketing decisions.
Cyclistic Executives: Must approve the strategy and require professional visualizations and solid data evidence to support any recommendation.
Marketing Analyst Team: Responsible for conducting the analysis and delivering actionable insights.
We loaded the data from CSV files using R. Each dataset includes information about individual bike trips, such as trip duration, start and end times, station names and IDs, bike IDs, and user attributes like type, gender, and birth year.
library(tidyverse)  # provides read_csv(), the %>% pipe, and the dplyr verbs used below
Driv_2020 <- read_csv("Driv_2020.csv")
Driv_2019 <- read_csv("Driv_2019.csv")
Next, we compared the total ride duration of casual riders and annual members using the following R code:
# Total trip duration per user type (the data frame is Driv_2019, not Drive_2019)
duraciones_por_usuario <- Driv_2019 %>%
  group_by(usertype) %>%
  summarise(total_tripduration = sum(tripduration, na.rm = TRUE))
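Raw totals can be hard to compare at a glance, so the sketch below adds each user type's share of total riding time. It runs on a small made-up data frame (`toy_trips` is hypothetical, not the Cyclistic data) and assumes `tripduration` is recorded in seconds, which the text above does not state.

```r
library(dplyr)

# Hypothetical toy data standing in for Driv_2019; the values are made up.
toy_trips <- tibble(
  usertype     = c("Subscriber", "Subscriber", "Customer", "Customer"),
  tripduration = c(300, 600, 1800, NA)  # assumed to be seconds; NA mimics a missing record
)

totals <- toy_trips %>%
  group_by(usertype) %>%
  summarise(total_tripduration = sum(tripduration, na.rm = TRUE)) %>%
  mutate(
    total_hours = total_tripduration / 3600,                       # seconds to hours
    share       = total_tripduration / sum(total_tripduration)     # fraction of all riding time
  )

print(totals)
```

The same `mutate()` step could be appended to `duraciones_por_usuario` if the seconds assumption holds for the real data.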
We were also interested in analyzing the average duration of rides for casual riders and annual members. This metric helps us understand differences in riding behavior and engagement between the two user types. The following R code was used to calculate the average trip duration:
# Average trip duration per user type
promedio_duracion <- Driv_2019 %>%
  group_by(usertype) %>%
  summarise(promedio_duracion = mean(tripduration, na.rm = TRUE))
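A mean is sensitive to a few very long rides, so it may be worth reporting the median and the trip count next to it. The following is a minimal sketch on made-up data (`toy_trips` is hypothetical), again assuming `tripduration` is in seconds. It is an optional extension, not part of the analysis above.

```r
library(dplyr)

# Hypothetical toy data; one very long casual ride pulls the mean upward.
toy_trips <- tibble(
  usertype     = c("Subscriber", "Subscriber", "Subscriber",
                   "Customer", "Customer", "Customer"),
  tripduration = c(300, 420, 480, 900, 1200, 7200)  # assumed to be seconds
)

duration_summary <- toy_trips %>%
  group_by(usertype) %>%
  summarise(
    mean_minutes   = mean(tripduration, na.rm = TRUE) / 60,
    median_minutes = median(tripduration, na.rm = TRUE) / 60,
    trips          = n()
  )

print(duration_summary)
```

In this toy example the casual group's mean is far above its median, which is the kind of gap that would signal outliers in the real data.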
We also compared bike usage between casual riders and subscribers based on their age and gender. Understanding these demographic differences will help tailor marketing strategies to convert casual riders into annual members.
To analyze the age distribution by user type and gender, we first calculated each rider’s age by subtracting their birth year from 2019. We then filtered out unrealistic ages (less than 1 or greater than 99) and focused only on users with clearly identified gender (“Male” or “Female”).
Next, we summarized the data by user type (casual or subscriber) and gender to calculate the average and median ages, as well as the total number of trips for each group.
Finally, we created a histogram to visualize the age distribution of riders, separated by user type and gender. This plot helps identify key demographic segments and guides targeted marketing strategies.
# Derive age from birth year, then keep plausible ages and clearly identified genders
Driv_2019 <- Driv_2019 %>%
  mutate(age = 2019 - birthyear) %>%
  filter(age > 0 & age < 100) %>%
  filter(gender %in% c("Male", "Female"))
# Mean age, median age, and number of trips per user type and gender
age_gender_summary <- Driv_2019 %>%
  group_by(usertype, gender) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),
    median_age = median(age, na.rm = TRUE),
    count = n()
  )
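The histogram step described above could be sketched as follows. This is one possible ggplot2 layout, not necessarily the plot used in the original analysis: the toy data frame `toy_riders`, the 5-year bin width, and the faceting by user type are all assumptions made for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data; birth year 1900 yields an age of 119 and is filtered out.
toy_riders <- tibble(
  usertype  = c("Subscriber", "Subscriber", "Customer", "Customer", "Subscriber", "Customer"),
  gender    = c("Male", "Female", "Male", "Female", "Male", "Female"),
  birthyear = c(1985, 1990, 1995, 1998, 1900, 2001)
)

# Same cleaning rules as described in the text: ages 1 to 99, Male/Female only.
toy_clean <- toy_riders %>%
  mutate(age = 2019 - birthyear) %>%
  filter(age > 0, age < 100, gender %in% c("Male", "Female"))

# Overlaid histograms by gender, one panel per user type.
age_histogram <- ggplot(toy_clean, aes(x = age, fill = gender)) +
  geom_histogram(binwidth = 5, position = "identity", alpha = 0.6) +
  facet_wrap(~ usertype) +
  labs(
    title = "Age distribution by user type and gender",
    x = "Age (years)",
    y = "Number of trips",
    fill = "Gender"
  )

print(age_histogram)
```

Swapping `toy_clean` for the cleaned `Driv_2019` should produce the corresponding plot for the real data, provided its column names match those used here.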
### Phase: PREPARE
To begin the analysis, the dataset corresponding to the year 2019 (Driv_2019) was selected. This dataset contains information about bicycle trips made in the city of Chicago, including variables such as user type (member or casual), gender, estimated age (calculated from birth year), and trip duration, among others.
The data were explored with the goal of preparing an analysis focused on identifying demographic patterns associated with user types. In particular, the following questions were addressed:
What differences exist between members and casual users in terms of age and gender?
Which demographic segments could be targeted in future marketing campaigns to convert casual users into members?
During this phase, records with missing or inconsistent data (for example, ages less than 1 or greater than 99) were removed, and genders were filtered to include only the most reliable values: “Male” and “Female”. This initial cleaning ensured the quality of the data to be analyzed in the subsequent phases.
Once the data was cleaned and prepared, the analysis focused on comparing demographic characteristics between member and casual users. Summary statistics and visualizations were generated to explore differences in age distribution and gender proportions.
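Gender proportions within each user type can be computed with a grouped count. The sketch below uses made-up rows (`toy_riders` is hypothetical) and shows only the mechanics, not the actual Cyclistic figures.

```r
library(dplyr)

# Hypothetical toy data; the counts are made up for illustration.
toy_riders <- tibble(
  usertype = c("Subscriber", "Subscriber", "Subscriber", "Subscriber", "Customer", "Customer"),
  gender   = c("Male", "Male", "Male", "Female", "Male", "Female")
)

gender_share <- toy_riders %>%
  count(usertype, gender, name = "trips") %>%   # trips per user type and gender
  group_by(usertype) %>%
  mutate(proportion = trips / sum(trips)) %>%   # share within each user type
  ungroup()

print(gender_share)
```

Because the proportions are computed within each user type, they sum to 1 per group, which makes the two segments directly comparable even when their trip counts differ.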
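Key findings: members tend to be older and predominantly male, while casual users are younger with a more balanced gender distribution.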
These insights suggest that targeted campaigns could focus on younger demographics and female casual users to increase membership conversion. Further analysis and modeling could refine these recommendations.
The analysis revealed clear differences between member and casual bicycle users in Chicago during 2019. Members tend to be older and predominantly male, while casual users are younger and have a more balanced gender distribution.
These demographic patterns provide valuable insights for designing targeted marketing campaigns aimed at increasing the conversion of casual users into members, especially focusing on younger segments and female users.
Additionally, the careful data cleaning and preparation ensured the quality of the analysis, lending credibility to the conclusions drawn.
For future work, predictive analyses are recommended to more precisely identify potential users for loyalty strategies.
This portfolio will be published on platforms such as Kaggle and GitHub and shared through LinkedIn to showcase data analysis and results communication skills.