| title:“Cyclistic_Case_Study” |
| output:html_document |
| date:“2023-08-06” |
You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.
Three questions will guide the future marketing program:
Moreno has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?
Key tasks
Identify the business task: assess recorded bike trip data, focusing on how casual riders and members behave, in order to design marketing strategies aimed at converting casual riders into annual members.
Deliverable: A clear statement of the business task. Use bike trip data to recommend actions to management for converting casual riders into annual members.
Key tasks
The following R packages were installed and loaded to aid in processing:
install.packages("tidyverse")
library(tidyverse)
install.packages("lubridate")
library(lubridate)
install.packages("ggplot2")
library(ggplot2)
Download data and store it appropriately.
Identify how it’s organized.
Sort and filter the data.
The data was sorted in ascending date order, from June 2022 through May 2023.
Determine the credibility of the data.
Deliverable: A description of all data sources used.
The data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. The twelve months of data from June 2022 to May 2023 were downloaded as 12 .zip files and extracted to .csv files.
The data source consists of twelve months of bike trip data, from June 2022 through May 2023.
Key tasks
Check the data for errors.
Choose your tools.
Transform the data so you can work with it effectively.
Document the cleaning process.
Deliverable: Documentation of any cleaning or manipulation of data
Each monthly file was read using "read_csv()" and assigned to a variable named for the month and year it represents. The monthly dataframes were then merged using "bind_rows()" to create a single dataframe for manipulation and cleaning.
all_trips<-bind_rows(jun_2022,jul_2022,aug_2022,sep_2022,oct_2022,nov_2022,dec_2022,jan_2023,feb_2023,mar_2023,apr_2023,may_2023)
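The read step described above is not shown in the report. A minimal sketch of one way to read a folder of monthly files and stack them follows. The folder, file names, and the `demo_trips` object are hypothetical, and two tiny files are written to a temporary directory first so the sketch runs on its own.

```r
library(readr)
library(dplyr)
library(purrr)

# Hypothetical setup: write two tiny "monthly" files to a temp folder.
# In the real workflow these would be the 12 extracted monthly CSV files.
data_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(data_dir, showWarnings = FALSE)
write_csv(tibble(ride_id = c("A1", "A2"), member_casual = c("casual", "member")),
          file.path(data_dir, "202206-demo-tripdata.csv"))
write_csv(tibble(ride_id = "B1", member_casual = "member"),
          file.path(data_dir, "202207-demo-tripdata.csv"))

# List the monthly files in name order, read each one, and stack the results.
csv_files <- sort(list.files(data_dir, pattern = "\\.csv$", full.names = TRUE))
demo_trips <- csv_files %>%
  map(~ read_csv(.x, show_col_types = FALSE)) %>%
  bind_rows()

nrow(demo_trips)
```

Reading from a file list avoids naming twelve variables by hand, at the cost of relying on file names that sort chronologically.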
The following functions were used to inspect the new dataframe:
List of all column names and the number of rows in the dataframe.
colnames(all_trips)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual"
nrow(all_trips)
## [1] 5829030
"mutate()" was used to recode any "Subscriber" and "Customer" labels in the dataframe to the current "member" and "casual" labels.
all_trips <- all_trips %>%
mutate(member_casual = recode(member_casual
,"Subscriber" = "member"
,"Customer" = "casual"))
Columns for the date, month, day, year, and day of the week were added to the dataframe.
all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
colnames(all_trips)
## [1] "ride_id" "rideable_type" "started_at"
## [4] "ended_at" "start_station_name" "start_station_id"
## [7] "end_station_name" "end_station_id" "start_lat"
## [10] "start_lng" "end_lat" "end_lng"
## [13] "member_casual" "date" "month"
## [16] "day" "year" "day_of_week"
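As an alternative sketch, the same date parts can be derived with lubridate helpers. This is not the method used above, and the `started` vector is a small made-up sample. One practical difference is that `wday(label = TRUE)` returns an ordered factor, so tables and plots sort in calendar order rather than alphabetically.

```r
library(lubridate)

# Made-up sample timestamps; the first one matches the first trip shown in the report.
started <- as.POSIXct(c("2022-06-30 17:27:53", "2022-07-03 09:00:00"), tz = "UTC")

trip_month <- month(started)   # numeric month, 1 to 12
trip_year  <- year(started)    # numeric year

# Ordered factor of weekday names; by default the week starts on Sunday.
trip_dow <- wday(started, label = TRUE, abbr = FALSE)

as.integer(trip_dow)           # position within the week, Sunday = 1
```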
Ride length (in seconds) was added to each trip.
all_trips$ride_length <- difftime(all_trips$ended_at, all_trips$started_at, units = "secs")
str(all_trips)
## tibble [5,829,030 × 19] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:5829030] "600CFD130D0FD2A4" "F5E6B5C1682C6464" "B6EB6D27BAD771D2" "C9C320375DE1D5C6" ...
## $ rideable_type : chr [1:5829030] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
## $ started_at : POSIXct[1:5829030], format: "2022-06-30 17:27:53" "2022-06-30 18:39:52" ...
## $ ended_at : POSIXct[1:5829030], format: "2022-06-30 17:35:15" "2022-06-30 18:47:28" ...
## $ start_station_name: chr [1:5829030] NA NA NA NA ...
## $ start_station_id : chr [1:5829030] NA NA NA NA ...
## $ end_station_name : chr [1:5829030] NA NA NA NA ...
## $ end_station_id : chr [1:5829030] NA NA NA NA ...
## $ start_lat : num [1:5829030] 41.9 41.9 41.9 41.8 41.9 ...
## $ start_lng : num [1:5829030] -87.6 -87.6 -87.7 -87.7 -87.6 ...
## $ end_lat : num [1:5829030] 41.9 41.9 41.9 41.8 41.9 ...
## $ end_lng : num [1:5829030] -87.6 -87.6 -87.6 -87.7 -87.6 ...
## $ member_casual : chr [1:5829030] "casual" "casual" "casual" "casual" ...
## $ date : Date[1:5829030], format: "2022-06-30" "2022-06-30" ...
## $ month : chr [1:5829030] "06" "06" "06" "06" ...
## $ day : chr [1:5829030] "30" "30" "30" "30" ...
## $ year : chr [1:5829030] "2022" "2022" "2022" "2022" ...
## $ day_of_week : chr [1:5829030] "Thursday" "Thursday" "Thursday" "Thursday" ...
## $ ride_length : 'difftime' num [1:5829030] 442 456 809 258 ...
## ..- attr(*, "units")= chr "secs"
"ride_length" was converted from difftime to numeric so calculations could be run on the data.
A new version of the dataframe was created to remove entries that started at the "HQ QR" station or had a ride length of zero or less.
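The conversion code itself is not shown in the report. A minimal sketch of the idea, using a single made-up pair of timestamps rather than the full dataframe:

```r
# One trip, using the first start and end times shown in the str() output above.
started <- as.POSIXct("2022-06-30 17:27:53", tz = "UTC")
ended   <- as.POSIXct("2022-06-30 17:35:15", tz = "UTC")

# difftime() returns a 'difftime' object, which is.numeric() reports as FALSE.
ride_length <- difftime(ended, started, units = "secs")
is.numeric(ride_length)

# as.numeric() drops the difftime class and keeps the value in seconds.
ride_length_num <- as.numeric(ride_length)
is.numeric(ride_length_num)
```

Applied to the dataframe, the equivalent step would presumably be `all_trips$ride_length <- as.numeric(all_trips$ride_length)`.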
all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<=0),]
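One caveat with this style of subsetting: when `start_station_name` is NA, the comparison evaluates to NA, and bracket indexing with NA returns a row filled with NA instead of dropping or keeping the original row. The `<NA>` group that appears in the day-of-week summary later in this report is consistent with that behavior. The sketch below uses a small made-up dataframe (`demo`) to show the effect and one possible NA-safe alternative with `dplyr::filter()`. It is an illustration, not the filter used in this analysis.

```r
library(dplyr)

# Made-up sample: one HQ QR row, one row with a missing station,
# one negative ride length, and one ordinary row.
demo <- tibble(
  start_station_name = c("HQ QR", NA, "Clark St", "Clark St"),
  ride_length        = c(300, 442, -5, 809)
)

# Bracket subsetting: the NA station name makes the condition NA,
# so the second row comes back as an all-NA row.
base_v2 <- demo[!(demo$start_station_name == "HQ QR" | demo$ride_length <= 0), ]

# NA-safe alternative: keep rows whose station is missing or is not "HQ QR",
# and whose ride length is positive.
safe_v2 <- demo %>%
  filter(is.na(start_station_name) | start_station_name != "HQ QR",
         ride_length > 0)

base_v2
safe_v2
```

With the alternative, trips that lack a station name keep their real ride length and rider type instead of being blanked out.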
Key tasks
Aggregate your data so it’s useful and accessible.
Organize and format your data.
Perform calculations.
Identify trends and relationships.
Deliverable: A summary of your analysis
mean = straight average (total ride length / number of rides)
median = midpoint value in the ascending array of ride lengths
max = longest ride
min = shortest ride
all_trips_v2 %>% group_by(member_casual) %>%
summarise(average_ride_length = mean(ride_length), median_length = median(ride_length), max_ride_length = max(ride_length), min_ride_length = min(ride_length)) %>% drop_na()
## # A tibble: 2 × 5
## member_casual average_ride_length median_length max_ride_length
## <chr> <dbl> <dbl> <dbl>
## 1 casual 1844. 768 2483235
## 2 member 752. 521 93580
## # ℹ 1 more variable: min_ride_length <dbl>
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 1844.4176
## 2 member 752.3535
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 768
## 2 member 521
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 2483235
## 2 member 93580
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
## all_trips_v2$member_casual all_trips_v2$ride_length
## 1 casual 1
## 2 member 1
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
all_trips_v2 %>%
group_by(member_casual, day_of_week) %>% # groups by rider type and day of the week
summarise(number_of_rides = n() # counts the rides in each group
,average_ride_length = mean(ride_length),.groups="drop") %>% # calculates the average duration in seconds
arrange(member_casual, day_of_week) # sorts by rider type, then day of the week
## # A tibble: 15 × 4
## member_casual day_of_week number_of_rides average_ride_length
## <chr> <chr> <int> <dbl>
## 1 casual Friday 292763 1794.
## 2 casual Monday 219537 1802.
## 3 casual Saturday 390890 2094.
## 4 casual Sunday 322008 2181.
## 5 casual Thursday 265164 1588.
## 6 casual Tuesday 229913 1654.
## 7 casual Wednesday 247445 1561.
## 8 member Friday 428644 744.
## 9 member Monday 409435 714.
## 10 member Saturday 382347 846.
## 11 member Sunday 335506 839.
## 12 member Thursday 487275 725.
## 13 member Tuesday 480681 722.
## 14 member Wednesday 502344 718.
## 15 <NA> <NA> 834511 NA
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))
aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
## all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1 casual Sunday 2181.0479
## 2 member Sunday 838.9489
## 3 casual Monday 1801.8482
## 4 member Monday 713.6457
## 5 casual Tuesday 1653.7312
## 6 member Tuesday 721.7299
## 7 casual Wednesday 1560.6771
## 8 member Wednesday 717.5140
## 9 casual Thursday 1588.3589
## 10 member Thursday 725.3273
## 11 casual Friday 1793.9839
## 12 member Friday 743.8081
## 13 casual Saturday 2094.2628
## 14 member Saturday 846.1136
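To make the weekday pattern easier to compare, the ride counts from the day-of-week summary above can be turned into a weekend share per rider type. The sketch below re-enters those counts by hand into a hypothetical `rides_by_day` tibble. It is a worked check on the reported numbers, not part of the original analysis.

```r
library(dplyr)

# Ride counts copied from the day-of-week summary table in this report.
rides_by_day <- tibble(
  member_casual = rep(c("casual", "member"), each = 7),
  day_of_week = rep(c("Friday", "Monday", "Saturday", "Sunday",
                      "Thursday", "Tuesday", "Wednesday"), times = 2),
  number_of_rides = c(292763, 219537, 390890, 322008, 265164, 229913, 247445,
                      428644, 409435, 382347, 335506, 487275, 480681, 502344)
)

# Share of each rider type's trips that fall on Saturday or Sunday.
weekend_share <- rides_by_day %>%
  group_by(member_casual) %>%
  summarise(
    weekend_rides = sum(number_of_rides[day_of_week %in% c("Saturday", "Sunday")]),
    total_rides   = sum(number_of_rides),
    weekend_share = weekend_rides / total_rides
  )

weekend_share
```

On these counts, roughly 36% of casual rides fall on a weekend, compared with roughly 24% of member rides.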
all_trips_v2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarise(number_of_rides = n()
,average_duration = mean(ride_length)) %>%
drop_na() %>%
arrange(member_casual, weekday) %>%
ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
geom_col(position = "dodge") + labs(title ="Average ride time of Members and Casual riders Vs. Day of the week")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
all_trips_v2 %>%
group_by(member_casual,month) %>% summarize(number_of_rides=n()) %>% drop_na() %>%
ggplot() +
geom_col(mapping= aes(x= month, y= number_of_rides,fill=member_casual)) + labs(title ="Number of rides a month by member and casual riders")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
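Because `month` holds only "01" to "12", a chart keyed on it runs January to December even though the data spans June 2022 to May 2023. One possible adjustment, sketched here on a made-up tibble (`demo_months`), is to build a combined year-month label, which sorts chronologically as text. The `year_month` column name is an assumption for illustration.

```r
library(dplyr)

# Made-up sample of year and month values in the same character format
# as the columns added earlier in this report.
demo_months <- tibble(
  year  = c("2023", "2022", "2022", "2023"),
  month = c("01", "06", "12", "05")
)

# "YYYY-MM" labels sort in calendar order when sorted as text.
demo_months <- demo_months %>%
  mutate(year_month = paste(year, month, sep = "-")) %>%
  arrange(year_month)

demo_months$year_month
```

Mapping `year_month` to the x axis instead of `month` would then place June 2022 first and May 2023 last.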
Create your portfolio.
Add your case study.
Practice presenting your case study to a friend or family member.
Deliverable: Your top three recommendations based on your analysis
1. Offer a weekend membership type to encourage casual riders, who are more active on weekends, to buy memberships.
2. Increase ad campaigns starting in April, focused on a summer special with a membership discount, as riders of all types increase their activity in the summer months.
3. Implement a referral program where new members receive 20% off their membership for a year if someone they refer purchases an annual membership.
all_trips_v2 %>%
group_by(member_casual,rideable_type) %>% summarize(number_of_rides=n()) %>% drop_na() %>% filter( rideable_type != 'docked_bike') %>%
ggplot() +
geom_col(mapping= aes(x= member_casual, y= number_of_rides,fill=rideable_type))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.