title: "Cyclistic_Case_Study"
output: html_document
date: "2023-08-06"

Cyclistic Case Study

Scenario

You are a junior data analyst working in the marketing analyst team at Cyclistic, a bike-share company in Chicago. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. Therefore, your team wants to understand how casual riders and annual members use Cyclistic bikes differently. From these insights, your team will design a new marketing strategy to convert casual riders into annual members. But first, Cyclistic executives must approve your recommendations, so they must be backed up with compelling data insights and professional data visualizations.

ASK

Three questions will guide the future marketing program:

  1. How do annual members and casual riders use Cyclistic bikes differently?
  2. Why would casual riders buy Cyclistic annual memberships?
  3. How can Cyclistic use digital media to influence casual riders to become members?

Moreno has assigned you the first question to answer: How do annual members and casual riders use Cyclistic bikes differently?

Key tasks

  1. Identify the business task
  • Assess recorded bike trip data, focusing on how casual riders and members behave, in order to design marketing strategies aimed at converting casual riders into annual members.

  2. Consider key stakeholders
  • Key stakeholders include Lily Moreno, the Cyclistic marketing analyst team, and the Cyclistic executive team.

Deliverable: A clear statement of the business task

  • Use bike trip data to recommend actions to management that convert casual riders into annual members.

PREPARE

Key tasks

The following R packages were installed and loaded to aid in processing:

install.packages("tidyverse")

library(tidyverse)

install.packages("lubridate")

library(lubridate)

install.packages("ggplot2")

library(ggplot2)

  1. Download data and store it appropriately.

    • The data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. The twelve months of data from June 2022 through May 2023 were downloaded as 12 .CSV files and stored in a folder on my computer.
  2. Identify how it’s organized.

    • Data was organized into 13 columns containing fields that represent attributes of each bike trip. Each row is a record of one completed trip and has its own unique ride ID.
  3. Sort and filter the data.

    • The twelve CSV files were separated by date and grouped into the quarters they represented.

    Data was sorted in ascending order, starting with June 2022 and ending with May 2023.

  4. Determine the credibility of the data.

    • Data is current and comprehensive, containing all relevant fields needed for analysis. It is cited public data, which makes it valid and credible for the purposes of this study.

Deliverable: A description of all data sources used.

The data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. The twelve months of data from June 2022 through May 2023 were downloaded as 12 .zip files and converted into .CSV files.

The data sources consist of bike trip data for the twelve months from June 2022 through May 2023:

  1. 202206-divvy-tripdata.csv
  2. 202207-divvy-tripdata.csv
  3. 202208-divvy-tripdata.csv
  4. 202209-divvy-publictripdata.csv
  5. 202210-divvy-tripdata.csv
  6. 202211-divvy-tripdata.csv
  7. 202212-divvy-tripdata.csv
  8. 202301-divvy-tripdata.csv
  9. 202302-divvy-tripdata.csv
  10. 202303-divvy-tripdata.csv
  11. 202304-divvy-tripdata.csv
  12. 202305-divvy-tripdata.csv
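The file names above follow a year-month pattern, with September 2022 as the one exception. As a minimal sketch, the list could be generated rather than typed by hand; the variable names `months` and `files` are illustrative and not part of the original analysis.

```r
# Build the twelve year-month stamps from June 2022 through May 2023.
months <- format(
  seq(as.Date("2022-06-01"), as.Date("2023-05-01"), by = "month"),
  "%Y%m"
)

# Most files follow the "<YYYYMM>-divvy-tripdata.csv" pattern.
files <- paste0(months, "-divvy-tripdata.csv")

# The September 2022 file is named differently in the list above.
files[months == "202209"] <- "202209-divvy-publictripdata.csv"

print(files)
```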

Process

Key tasks

  1. Check the data for errors.

  2. Choose your tools.

  3. Transform the data so you can work with it effectively.

  4. Document the cleaning process.

Deliverable: Documentation of any cleaning or manipulation of data

Each file was read using "read_csv()" and assigned to a variable named for the month and year of the data it represents. The data was then merged using "bind_rows()" to create a single dataframe for manipulation and cleaning.
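A minimal sketch of this read-and-merge step follows. So that it runs without the real downloads, it writes two tiny stand-in files to a temporary folder; the folder, the stand-in rows, and the name `demo_trips` are assumptions for illustration, not the author's setup.

```r
library(readr)
library(dplyr)

# Hypothetical stand-ins for two monthly files, written to a temp folder.
data_dir <- tempdir()
write_csv(data.frame(ride_id = "A1", member_casual = "casual"),
          file.path(data_dir, "202206-divvy-tripdata.csv"))
write_csv(data.frame(ride_id = "B2", member_casual = "member"),
          file.path(data_dir, "202207-divvy-tripdata.csv"))

# Read each month into its own variable, named for month and year.
jun_2022 <- read_csv(file.path(data_dir, "202206-divvy-tripdata.csv"),
                     show_col_types = FALSE)
jul_2022 <- read_csv(file.path(data_dir, "202207-divvy-tripdata.csv"),
                     show_col_types = FALSE)

# Stack the monthly tables into one data frame.
demo_trips <- bind_rows(jun_2022, jul_2022)
print(demo_trips)
```

With the real data, the same pattern would repeat for all twelve files before the merge.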

all_trips<-bind_rows(jun_2022,jul_2022,aug_2022,sep_2022,oct_2022,nov_2022,dec_2022,jan_2023,feb_2023,mar_2023,apr_2023,may_2023)

The following functions were used to inspect the new dataframe:

  • colnames(all_trips) #List of column names
  • nrow(all_trips) #How many rows are in data frame?
  • dim(all_trips) #Dimensions of the data frame?
  • head(all_trips) #See the first 6 rows of data frame. Also tail(all_trips)
  • str(all_trips) #See list of columns and data types (numeric, character, etc)
  • summary(all_trips) #Statistical summary of data. Mainly for numerics
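Beyond the inspection functions listed above, checks for missing values and duplicate ride IDs may also be useful at this stage. The following is a sketch on a hypothetical toy data frame (`demo_trips` and its rows are invented for illustration); it was not necessarily part of the original workflow.

```r
# Hypothetical toy data standing in for all_trips.
demo_trips <- data.frame(
  ride_id = c("A1", "B2", "B2", "C3"),
  start_station_name = c(NA, "Clark St", "Clark St", NA),
  member_casual = c("casual", "member", "member", "casual")
)

# Count missing values in each column.
na_counts <- colSums(is.na(demo_trips))

# Count ride IDs that repeat an earlier row.
n_duplicate_ids <- sum(duplicated(demo_trips$ride_id))

# Tabulate the rider labels to confirm only expected values appear.
rider_labels <- table(demo_trips$member_casual)

print(na_counts)
print(n_duplicate_ids)
print(rider_labels)
```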

List of all column names and number of rows in the dataframe.

colnames(all_trips)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"
nrow(all_trips)
## [1] 5829030

"mutate()" was used with "recode()" to convert any "Subscriber" and "Customer" labels in the member_casual column to "member" and "casual".

all_trips <-  all_trips %>% 
  mutate(member_casual = recode(member_casual
                           ,"Subscriber" = "member"
                           ,"Customer" = "casual"))

Columns for the date, month, day, year, and day of the week were added to the dataframe.

all_trips$date <- as.Date(all_trips$started_at)
all_trips$month <- format(as.Date(all_trips$date), "%m")
all_trips$day <- format(as.Date(all_trips$date), "%d")
all_trips$year <- format(as.Date(all_trips$date), "%Y")
all_trips$day_of_week <- format(as.Date(all_trips$date), "%A")
colnames(all_trips)
##  [1] "ride_id"            "rideable_type"      "started_at"        
##  [4] "ended_at"           "start_station_name" "start_station_id"  
##  [7] "end_station_name"   "end_station_id"     "start_lat"         
## [10] "start_lng"          "end_lat"            "end_lng"           
## [13] "member_casual"      "date"               "month"             
## [16] "day"                "year"               "day_of_week"

Ride length (in seconds) was added to each trip.

all_trips$ride_length <- difftime(all_trips$ended_at,all_trips$started_at)
str(all_trips)
## tibble [5,829,030 × 19] (S3: tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:5829030] "600CFD130D0FD2A4" "F5E6B5C1682C6464" "B6EB6D27BAD771D2" "C9C320375DE1D5C6" ...
##  $ rideable_type     : chr [1:5829030] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:5829030], format: "2022-06-30 17:27:53" "2022-06-30 18:39:52" ...
##  $ ended_at          : POSIXct[1:5829030], format: "2022-06-30 17:35:15" "2022-06-30 18:47:28" ...
##  $ start_station_name: chr [1:5829030] NA NA NA NA ...
##  $ start_station_id  : chr [1:5829030] NA NA NA NA ...
##  $ end_station_name  : chr [1:5829030] NA NA NA NA ...
##  $ end_station_id    : chr [1:5829030] NA NA NA NA ...
##  $ start_lat         : num [1:5829030] 41.9 41.9 41.9 41.8 41.9 ...
##  $ start_lng         : num [1:5829030] -87.6 -87.6 -87.7 -87.7 -87.6 ...
##  $ end_lat           : num [1:5829030] 41.9 41.9 41.9 41.8 41.9 ...
##  $ end_lng           : num [1:5829030] -87.6 -87.6 -87.6 -87.7 -87.6 ...
##  $ member_casual     : chr [1:5829030] "casual" "casual" "casual" "casual" ...
##  $ date              : Date[1:5829030], format: "2022-06-30" "2022-06-30" ...
##  $ month             : chr [1:5829030] "06" "06" "06" "06" ...
##  $ day               : chr [1:5829030] "30" "30" "30" "30" ...
##  $ year              : chr [1:5829030] "2022" "2022" "2022" "2022" ...
##  $ day_of_week       : chr [1:5829030] "Thursday" "Thursday" "Thursday" "Thursday" ...
##  $ ride_length       : 'difftime' num [1:5829030] 442 456 809 258 ...
##   ..- attr(*, "units")= chr "secs"

"ride_length" was converted from a difftime object to numeric so calculations could be run on the data.
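The conversion code itself is not shown in the rendered output. A minimal sketch of one way to do it, using the first trip's timestamps from the `str()` output above (the variable names here are illustrative):

```r
# Timestamps of the first trip shown in the str() output; UTC is assumed.
started <- as.POSIXct("2022-06-30 17:27:53", tz = "UTC")
ended   <- as.POSIXct("2022-06-30 17:35:15", tz = "UTC")

# difftime returns a 'difftime' object; units are fixed to seconds here.
ride_length <- difftime(ended, started, units = "secs")

# Convert to a plain numeric value in seconds.
ride_length_num <- as.numeric(ride_length, units = "secs")

print(class(ride_length))
print(ride_length_num)
```

Setting `units = "secs"` explicitly avoids relying on the unit that `difftime()` picks automatically.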

A new version of the dataframe was created to remove trips that started at the "HQ QR" station and trips with a ride length of zero or less.

all_trips_v2 <- all_trips[!(all_trips$start_station_name == "HQ QR" | all_trips$ride_length<=0),]
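One caveat with this filter: start_station_name contains NA values, and a comparison against NA returns NA, so base-R logical indexing produces all-NA rows for those trips instead of keeping them. The sketch below shows the effect on a hypothetical toy data frame and one NA-safe alternative with `dplyr::filter()`; treating a missing station name as "not HQ QR" is an assumption, not the author's stated intent.

```r
library(dplyr)

# Hypothetical toy data: two trips with a missing start station.
demo <- data.frame(
  start_station_name = c(NA, "HQ QR", "Clark St", NA),
  ride_length = c(442, 300, -5, 0)
)

# Same indexing pattern as above: the first row's condition is NA,
# so it comes back as a row of NA values.
base_result <- demo[!(demo$start_station_name == "HQ QR" | demo$ride_length <= 0), ]

# NA-safe variant: keep rows whose station is missing or not "HQ QR",
# and whose ride length is positive.
safe_result <- demo %>%
  filter(is.na(start_station_name) | start_station_name != "HQ QR",
         ride_length > 0)

print(base_result)
print(safe_result)
```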

Analyze

Key tasks

  1. Aggregate your data so it’s useful and accessible.

  2. Organize and format your data.

  3. Perform calculations.

  4. Identify trends and relationships.

Deliverable: A summary of your analysis

Descriptive Analysis on ride length

  • mean=straight average (total ride length / rides)

  • median=midpoint number in the ascending array of ride lengths

  • max=longest ride

  • min=shortest ride

all_trips_v2 %>% 
  group_by(member_casual) %>%
  summarise(average_ride_length = mean(ride_length),
            median_length = median(ride_length),
            max_ride_length = max(ride_length),
            min_ride_length = min(ride_length)) %>% 
  drop_na()
## # A tibble: 2 × 5
##   member_casual average_ride_length median_length max_ride_length
##   <chr>                       <dbl>         <dbl>           <dbl>
## 1 casual                      1844.           768         2483235
## 2 member                       752.           521           93580
## # ℹ 1 more variable: min_ride_length <dbl>

Compare members and casual users

mean ride length

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = mean)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                1844.4176
## 2                     member                 752.3535

median ride length

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = median)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                      768
## 2                     member                      521

max ride length

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = max)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                  2483235
## 2                     member                    93580

min ride length

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual, FUN = min)
##   all_trips_v2$member_casual all_trips_v2$ride_length
## 1                     casual                        1
## 2                     member                        1

Total rides by rider type

Total rides by rideable type

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
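The code behind these two totals is not shown in the rendered output. A minimal sketch of how such counts could be computed, on a hypothetical toy data frame (`demo_trips` and its rows are invented for illustration):

```r
library(dplyr)

# Hypothetical toy data standing in for all_trips_v2.
demo_trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  rideable_type = c("electric_bike", "classic_bike", "classic_bike",
                    "classic_bike", "electric_bike")
)

# Total rides by rider type.
rides_by_rider <- demo_trips %>%
  count(member_casual, name = "number_of_rides")

# Total rides by rider type and rideable type.
# Setting .groups explicitly avoids the summarise() grouping message.
rides_by_type <- demo_trips %>%
  group_by(member_casual, rideable_type) %>%
  summarise(number_of_rides = n(), .groups = "drop")

print(rides_by_rider)
print(rides_by_type)
```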

Ridership by weekday

all_trips_v2 %>% 
  group_by(member_casual, day_of_week) %>%  #groups by rider type and weekday
  summarise(number_of_rides = n() #calculates the number of rides
  ,average_ride_length = mean(ride_length),.groups="drop") %>% # calculates the average duration
  arrange(member_casual, day_of_week) #sort
## # A tibble: 15 × 4
##    member_casual day_of_week number_of_rides average_ride_length
##    <chr>         <chr>                 <int>               <dbl>
##  1 casual        Friday               292763               1794.
##  2 casual        Monday               219537               1802.
##  3 casual        Saturday             390890               2094.
##  4 casual        Sunday               322008               2181.
##  5 casual        Thursday             265164               1588.
##  6 casual        Tuesday              229913               1654.
##  7 casual        Wednesday            247445               1561.
##  8 member        Friday               428644                744.
##  9 member        Monday               409435                714.
## 10 member        Saturday             382347                846.
## 11 member        Sunday               335506                839.
## 12 member        Thursday             487275                725.
## 13 member        Tuesday              480681                722.
## 14 member        Wednesday            502344                718.
## 15 <NA>          <NA>                 834511                 NA

Number of rides by member and casual users based on weekday

## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
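The plotting code for this chart is not shown in the rendered output. As a sketch only, a chart of this kind could be built as follows; the toy data, the object names, and the plot title are assumptions for illustration.

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data standing in for all_trips_v2.
demo_trips <- data.frame(
  member_casual = c("casual", "casual", "member", "member", "member"),
  day_of_week = c("Saturday", "Sunday", "Monday", "Monday", "Saturday")
)

# Count rides for each rider type and weekday.
weekday_counts <- demo_trips %>%
  group_by(member_casual, day_of_week) %>%
  summarise(number_of_rides = n(), .groups = "drop")

# Side-by-side bars, one colour per rider type.
weekday_plot <- ggplot(weekday_counts,
                       aes(x = day_of_week, y = number_of_rides,
                           fill = member_casual)) +
  geom_col(position = "dodge") +
  labs(title = "Number of rides by rider type and day of the week")

print(weekday_plot)
```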

Average ride time based on weekday for members vs casual users

all_trips_v2$day_of_week <- ordered(all_trips_v2$day_of_week, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

aggregate(all_trips_v2$ride_length ~ all_trips_v2$member_casual + all_trips_v2$day_of_week, FUN = mean)
##    all_trips_v2$member_casual all_trips_v2$day_of_week all_trips_v2$ride_length
## 1                      casual                   Sunday                2181.0479
## 2                      member                   Sunday                 838.9489
## 3                      casual                   Monday                1801.8482
## 4                      member                   Monday                 713.6457
## 5                      casual                  Tuesday                1653.7312
## 6                      member                  Tuesday                 721.7299
## 7                      casual                Wednesday                1560.6771
## 8                      member                Wednesday                 717.5140
## 9                      casual                 Thursday                1588.3589
## 10                     member                 Thursday                 725.3273
## 11                     casual                   Friday                1793.9839
## 12                     member                   Friday                 743.8081
## 13                     casual                 Saturday                2094.2628
## 14                     member                 Saturday                 846.1136

Visualization of average duration

all_trips_v2 %>% 
  mutate(weekday = wday(started_at, label = TRUE)) %>% 
  group_by(member_casual, weekday) %>% 
  summarise(number_of_rides = n()
            ,average_duration = mean(ride_length)) %>% 
  drop_na() %>%
  arrange(member_casual, weekday)  %>% 
  ggplot(aes(x = weekday, y = average_duration, fill = member_casual)) +
  geom_col(position = "dodge") + labs(title ="Average ride time of Members and Casual riders Vs. Day of the week")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.

Number of rides by month

all_trips_v2 %>% 
  group_by(member_casual, month) %>% 
  summarize(number_of_rides = n()) %>% 
  drop_na() %>%
  ggplot() +
  geom_col(mapping = aes(x = month, y = number_of_rides, fill = member_casual)) + 
  labs(title ="Number of rides a month by member and casual riders")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
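Because month is stored as text ("01" to "12"), the x axis of this chart sorts January first rather than following the June-to-May study period. A minimal sketch of one way to order months chronologically with a factor, shown on hypothetical counts (`demo_counts` and `study_months` are illustrative names):

```r
# Hypothetical monthly counts with month stored as text, as in all_trips_v2.
demo_counts <- data.frame(
  month = c("01", "06", "12", "05"),
  number_of_rides = c(10, 40, 15, 30)
)

# Month labels in study order: June 2022 through May 2023.
study_months <- c("06", "07", "08", "09", "10", "11", "12",
                  "01", "02", "03", "04", "05")

# A factor with explicit levels makes sorting and plotting follow that order.
demo_counts$month <- factor(demo_counts$month, levels = study_months)
ordered_counts <- demo_counts[order(demo_counts$month), ]

print(ordered_counts)
```

ggplot2 places discrete axis values in factor-level order, so the same factor would also reorder the bars.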

Share

  1. Determine the best way to share your findings.

  2. Create effective data visualizations.

  3. Present your findings.

  4. Ensure your work is accessible.

Deliverable: Supporting visualizations and key findings

Kaggle will be used to share and present my findings (INSERT LINK).

Act

  1. Create your portfolio.

  2. Add your case study.

  3. Practice presenting your case study to a friend or family member.

Deliverable: Your top three recommendations based on your analysis

  1. Offer a weekend membership type to encourage casual riders, who are more active on weekends, to buy memberships.

  2. Increase ad campaigns starting in April, focused on a summer special with a membership discount, as riders of all types increase activity in these months.

  3. Implement a referral program where new members receive 20% off their membership for a year if someone they refer purchases an annual membership.

all_trips_v2 %>% 
  group_by(member_casual, rideable_type) %>% 
  summarize(number_of_rides = n()) %>% 
  drop_na() %>% 
  filter(rideable_type != 'docked_bike') %>%
  ggplot() +
  geom_col(mapping = aes(x = member_casual, y = number_of_rides, fill = rideable_type))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
