Case Study on Cyclistic

A Bike Sharing Company

Author

By Somnath DasGupta

1 INTRODUCTION

Hello everyone. For the past few months I have been working through the Google Data Analytics Professional Certificate on Coursera. Along the way I have picked up a lot of interesting, insightful and, most importantly, useful knowledge about the tools covered in the program, such as Tableau, R, SQL and spreadsheets.

The curriculum also introduced me to standardized practices, key data-analyst terminology and processes, and a general framework to follow in every project. Below is a brief walkthrough of my thought process and the understanding I gained over time while completing the case study included with the course, using a range of tools, methods and strategies.

2 BACKGROUND INFORMATION

You are working for Cyclistic, a bike-sharing company. Bikes can be unlocked from one station and returned to any other station in the system anytime.

Cyclistic has flexible pricing plans: single-ride passes, full-day passes, and annual memberships. Customers who purchase single-ride or full-day passes are referred to as casual riders. Customers who purchase annual memberships are Cyclistic members.

The director of marketing believes the company’s future success depends on maximizing the number of annual memberships, as finance analysts have concluded that annual memberships are much more profitable than casual riders. She also believes that there is a good chance of converting casual riders to members as they are already aware of a Cyclistic program and have chosen it for their mobility needs.

3 ASK PHASE

The following questions will guide the direction of the future marketing program:

    • How do casual riders and annual members use Cyclistic services differently?
    • Roughly how many people choose Cyclistic over other daily commute services?
    • Does Cyclistic offer its riders a unique value proposition over competitors in the same segment?
    • What has the past experience been with different marketing channels, such as digital, influencer or traditional marketing?

3.1 Key Takeaways

    1. Identify the business task.
    • The main objective is to design marketing strategies for converting casual riders into annual members by understanding how the two groups differ.
    • The differences will be assessed on parameters such as preferred weekday, rides per week, weekly and monthly ride duration, most frequent routes, etc.
    2. Consider key stakeholders.
    • Director of Marketing (Lily Moreno).
    • Marketing Analytics team.
    • Executive team.

3.2 Deliverable

    • Identify every pattern in which the two rider types differ.
    • Identify the factors that may be keeping casual riders from opting for the annual membership program.

4 PREPARE PHASE

Here, for this analysis, I will be using a public dataset that is made available on this page. The data has been made available by Motivate International Inc. under this license.

4.1 Key Task

    • Download the datasets from the given online repository and save them in a separate folder.

    • Load the datasets month by month to preserve chronological order.

    • Identify the file format and check that the files are readable and writable.

    • Assess the credibility of the data by inspecting each dataset for vague or unwanted rows, then sort accordingly.

    • Check the number of columns and their names so that the datasets can be concatenated successfully.

4.2 Deliverable

    • Documenting the entire procedure involved in this phase.

    • A short description of each operation performed, for ease of understanding.

4.3 CODE CHUNK

Let’s load the libraries:

Code
library(tidyverse)
library(ggplot2)
library(janitor)
library(hms)
library(geosphere)
library(spatialrisk)
library(distances)
library(Distance)
library(measurements)
library(plotrix)
library(lubridate)
library(ggalt)
library(hrbrthemes)
library(viridis)
library(ggridges)
library(scales)
library(readxl)
library(writexl)
library(ggiraph)
library(viridisLite)
library(labeling)
library(farver)

Importing all Datasets

Code
january_2021 <- read_csv("202101-divvy-tripdata.csv")
february_2021 <- read_csv("202102-divvy-tripdata.csv")
march_2021 <- read_csv("202103-divvy-tripdata.csv")
april_2021 <- read_csv("202104-divvy-tripdata.csv")
may_2021 <- read_csv("202105-divvy-tripdata.csv")
june_2021 <- read_csv("202106-divvy-tripdata.csv")
july_2021 <- read_csv("202107-divvy-tripdata.csv")
august_2021 <- read_csv("202108-divvy-tripdata.csv")
september_2021 <- read_csv("202109-divvy-tripdata.csv")
october_2021 <- read_csv("202110-divvy-tripdata.csv")
november_2021 <- read_csv("202111-divvy-tripdata.csv")
december_2021 <- read_csv("202112-divvy-tripdata.csv")

Binding all monthly datasets for the year into one using the do.call(), rbind() and list() functions:

Code
all_datasets_2021 <- do.call("rbind",list( january_2021, february_2021, march_2021,april_2021, may_2021, june_2021,july_2021,august_2021,
                                           september_2021,october_2021,november_2021,december_2021))
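
As a rough alternative to twelve separate read calls, the import, column check and bind steps can be sketched as a loop. This is only a sketch in base R: the two tiny CSV files are made up for illustration and written to a temporary folder, and the `-divvy-tripdata.csv` pattern is assumed to match the monthly file names used above.

```r
# Create two tiny stand-in monthly files in a temporary folder (made-up data).
data_dir <- file.path(tempdir(), "divvy_sketch")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(data_dir, "202101-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "casual"),
          file.path(data_dir, "202102-divvy-tripdata.csv"), row.names = FALSE)

# List the files; names start with YYYYMM, so sorting keeps them in month order.
files <- sort(list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE))

# Read every file into a list of data frames.
monthly <- lapply(files, read.csv, stringsAsFactors = FALSE)

# Check that all files share the same column names before binding.
same_columns <- all(vapply(monthly,
                           function(d) identical(names(d), names(monthly[[1]])),
                           logical(1)))
stopifnot(same_columns)

# Bind all months into one data frame.
all_trips <- do.call(rbind, monthly)
print(dim(all_trips))
```

Checking the column names first makes a mismatch fail early with a clear message instead of inside rbind().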

5 PROCESS PHASE

Gathering the datasets for cleaning and preparing them for analysis.

5.1 Key Task

    • Checking for any errors or missing values within datasets.

    • Removing duplicates and outliers from datasets.

    • Selecting the right tools to perform these operations.

    • Creating a backup of the original data in case any crucial information is lost during the cleaning process.

    • Transforming the data into a workable format.

5.2 Deliverable

    • Applying some crucial changes to the dataset, such as changing data types and using various functions to calculate values.

    • Manipulating data by performing various computations as required.

    • Stating each computation use case along with a final preview of the entire changes made.

    • Documenting the step by step process in detail.

5.3 CODE CHUNK

Removing duplicated rows.

Code
all_datasets_2021 <- all_datasets_2021[!duplicated(all_datasets_2021), ]

Removing incomplete data through complete.cases().

Code
all_datasets_2021<- all_datasets_2021[complete.cases(all_datasets_2021), ]
Code
all_datasets_2021[complete.cases(all_datasets_2021), ]
# A tibble: 4,588,302 × 13
   ride_id       ridea…¹ started_at          ended_at            start…² start…³
   <chr>         <chr>   <dttm>              <dttm>              <chr>   <chr>  
 1 B9F73448DFBE… classi… 2021-01-24 19:15:38 2021-01-24 19:22:51 Califo… 17660  
 2 457C7F4B5D3D… electr… 2021-01-23 12:57:38 2021-01-23 13:02:10 Califo… 17660  
 3 57C750326F9F… electr… 2021-01-09 15:28:04 2021-01-09 15:37:51 Califo… 17660  
 4 4D518C65E338… electr… 2021-01-09 15:28:57 2021-01-09 15:37:54 Califo… 17660  
 5 9D08A3AFF410… classi… 2021-01-24 15:56:59 2021-01-24 16:07:08 Califo… 17660  
 6 49FCE1F8598F… electr… 2021-01-22 15:15:28 2021-01-22 15:36:01 Califo… 17660  
 7 0FEED5C2C874… classi… 2021-01-05 10:33:12 2021-01-05 10:39:12 Califo… 17660  
 8 E276FD43BDED… classi… 2021-01-30 11:59:16 2021-01-30 12:03:44 Califo… 17660  
 9 88BFCF66C2D5… electr… 2021-01-27 07:27:09 2021-01-27 07:45:32 Califo… 17660  
10 8BD6F6510F5C… electr… 2021-01-15 08:54:41 2021-01-15 09:11:46 Califo… 17660  
# … with 4,588,292 more rows, 7 more variables: end_station_name <chr>,
#   end_station_id <chr>, start_lat <dbl>, start_lng <dbl>, end_lat <dbl>,
#   end_lng <dbl>, member_casual <chr>, and abbreviated variable names
#   ¹​rideable_type, ²​start_station_name, ³​start_station_id
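
The two cleaning steps above can be tried on a small scale first. The following is a minimal sketch on made-up toy data (the column values are invented); it also keeps a backup copy, as listed in the key tasks, so the number of removed rows can be reported.

```r
# Made-up toy data: row 2 duplicates row 1, row 3 has a missing station name.
trips <- data.frame(
  ride_id = c("A1", "A1", "B2", "C3"),
  start_station_name = c("X St", "X St", NA, "Y Ave"),
  stringsAsFactors = FALSE
)

# Keep a backup before cleaning, so the counts can be compared afterwards.
trips_backup <- trips

# Drop exact duplicate rows, then rows containing any NA.
trips <- trips[!duplicated(trips), ]
trips <- trips[complete.cases(trips), ]

# Report how many rows the two steps removed in total.
rows_removed <- nrow(trips_backup) - nrow(trips)
print(rows_removed)
```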

Extracting the date from a timestamp column using the as.Date() function.

For started_at column

Code
all_datasets_2021$date<-as.Date(all_datasets_2021$started_at)

For ended_at column.

Code
all_datasets_2021$date_of_arrival<-as.Date(all_datasets_2021$ended_at)

Renaming columns in a dataset.

Code
all_datasets_2021<- rename(all_datasets_2021,date_of_journey = date)

Extracting Months from date.

Code
all_datasets_2021$month <- as.character(months(all_datasets_2021$date_of_journey))

Extracting the time of day from a timestamp column using as.POSIXct() and format().

For started_at column

Code
all_datasets_2021$journey_departure_time<-format(as.POSIXct(all_datasets_2021$started_at),
                                                  format = "%H:%M:%S")

Converting the resulting ‘character’ time string to class hms (which extends difftime) in order to perform calculations.

Code
all_datasets_2021$departure_time<-as_hms(all_datasets_2021$journey_departure_time)

For ended_at column

Code
all_datasets_2021$journey_arrival_time<-format(as.POSIXct(all_datasets_2021$ended_at),
                                                format = "%H:%M:%S")

Converting the resulting ‘character’ time string to class hms (which extends difftime) in order to perform calculations.

Code
all_datasets_2021$arrival_time<-as_hms(all_datasets_2021$journey_arrival_time)
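
To make the date and time split concrete, here is a minimal base R sketch on a single made-up timestamp. The time zone is set to UTC explicitly so the result does not depend on the machine; the real data may use a different zone. The last step computes seconds since midnight by hand, which is the quantity the hms columns hold (their units attribute is "secs" in the str() output further down).

```r
# Made-up timestamp; the time zone is set explicitly so the result is reproducible.
started_at <- as.POSIXct("2021-01-24 19:15:38", tz = "UTC")

# Calendar date of the ride.
date_of_journey <- as.Date(started_at, tz = "UTC")

# Time of day as a character string.
departure_clock <- format(started_at, format = "%H:%M:%S")

# Seconds since midnight, computed by hand from the clock string.
parts <- as.integer(strsplit(departure_clock, ":")[[1]])
seconds_since_midnight <- parts[1] * 3600 + parts[2] * 60 + parts[3]

print(date_of_journey)
print(departure_clock)
print(seconds_since_midnight)
```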

Deleting columns from dataset.

For started_at column.

Code
all_datasets_2021$started_at <- NULL

For ended_at column.

Code
all_datasets_2021$ended_at <- NULL

For journey_departure_time

Code
all_datasets_2021$journey_departure_time <-NULL

For journey_arrival_time.

Code
all_datasets_2021$journey_arrival_time <- NULL

Removing the monthly datasets, which are no longer needed after binding.

Code
remove(january_2021)
remove(february_2021)
remove(march_2021)
remove(april_2021)
remove(may_2021)
remove(june_2021)
remove(july_2021)
remove(august_2021)
remove(september_2021)
remove(october_2021)
remove(november_2021)
remove(december_2021)

Computing the difference between arrival and departure times with difftime(), in hours. Note that, despite its name, the total_mins column holds hours.

Code
all_datasets_2021$total_mins<- difftime(all_datasets_2021$arrival_time,all_datasets_2021$departure_time ,units = "hours")

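To remove negative time differences, the difference is converted to its absolute value using abs().

A ride duration cannot be negative in reality. Because only times of day are compared, a negative difference appears whenever the arrival time of day is earlier than the departure time of day, for example when a ride crosses midnight; in those cases the absolute value does not recover the true duration.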

Code
all_datasets_2021$total_mins_spent <- abs(all_datasets_2021$total_mins)
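
Because the difference above is taken between times of day only, a ride that crosses midnight ends up with a very large value after abs(). One possible alternative, sketched here in base R with made-up timestamps, is to subtract the full timestamps instead. It assumes the original started_at and ended_at values are still available at this point (in the code above they are deleted earlier), and clock_seconds() is a hypothetical helper written only for this comparison.

```r
# Made-up rides: the second one starts before midnight and ends after it.
started_at <- as.POSIXct(c("2021-07-01 10:00:00", "2021-07-01 23:50:00"), tz = "UTC")
ended_at   <- as.POSIXct(c("2021-07-01 10:30:00", "2021-07-02 00:10:00"), tz = "UTC")

# Duration from the full timestamps, in minutes.
ride_minutes <- as.numeric(difftime(ended_at, started_at, units = "mins"))

# Hypothetical helper: seconds since midnight for a timestamp.
clock_seconds <- function(x) {
  as.numeric(format(x, "%H")) * 3600 +
    as.numeric(format(x, "%M")) * 60 +
    as.numeric(format(x, "%S"))
}

# Duration from the time of day only, mirroring the abs() approach above.
clock_minutes <- abs(clock_seconds(ended_at) - clock_seconds(started_at)) / 60

print(ride_minutes)
print(clock_minutes)
```

For the first ride both approaches agree. For the second, the timestamp difference gives the actual 20 minutes, while the time-of-day difference gives a value close to a full day.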

Standardizing all column names (for example, removing spaces) with janitor::clean_names().

Code
all_datasets_2021<- all_datasets_2021 %>% 
  janitor::clean_names()

Removing total_mins column.

Code
all_datasets_2021$total_mins <- NULL

Creating a separate column for weekdays of journey using weekdays().

Code
all_datasets_2021$day_of_journey <- weekdays(all_datasets_2021$date_of_journey)

Creating a separate column for weekdays of arrival using weekdays().

Code
all_datasets_2021$day_of_arrival <- weekdays(all_datasets_2021$date_of_arrival)

Calculating the straight-line (great-circle) distance between the start and end point of each ride using haversine() from the spatialrisk package.

Code
all_datasets_2021$total <- spatialrisk::haversine(all_datasets_2021$start_lat,all_datasets_2021$start_lng,all_datasets_2021$end_lat,all_datasets_2021$end_lng)

Converting zero distances to NA using the na_if() function, so that those rows can be removed later.

Code
all_datasets_2021$distance <- na_if(all_datasets_2021$total, 0)

The computed distance is in meters by default, so it is converted to kilometers by dividing the entire column by 1000.

Code
all_datasets_2021$total_distance <- all_datasets_2021$distance / 1000
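
For readers who want to see what the distance step computes, here is a minimal base R sketch of the haversine formula followed by the zero-to-NA and meters-to-kilometers steps. haversine_m() is a hypothetical helper written for illustration, not the spatialrisk implementation; the Earth radius of 6378137 m is an assumption, and the coordinates are made up.

```r
# Hypothetical helper (not the spatialrisk implementation): great-circle distance
# in meters between two points given in decimal degrees.
# The default radius r is an assumed Earth radius in meters.
haversine_m <- function(lat_from, lon_from, lat_to, lon_to, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat_to - lat_from) * to_rad
  dlon <- (lon_to - lon_from) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat_from * to_rad) * cos(lat_to * to_rad) * sin(dlon / 2)^2
  2 * r * asin(pmin(1, sqrt(a)))
}

# Made-up coordinates near Chicago; the second ride starts and ends at the same point.
start_lat <- c(41.90, 41.88)
start_lng <- c(-87.70, -87.62)
end_lat   <- c(41.91, 41.88)
end_lng   <- c(-87.70, -87.62)

meters <- haversine_m(start_lat, start_lng, end_lat, end_lng)

# Treat zero distances as missing, then convert meters to kilometers.
meters[meters == 0] <- NA
total_distance_km <- meters / 1000
print(total_distance_km)
```

The result is the distance between the two stations, not the length of the path actually ridden, so a ride that returns to its starting station has a distance of zero and is marked as missing here.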

Removing two columns from the dataset.

Removing distance column.

Code
all_datasets_2021$distance <- NULL

Removing total column

Code
all_datasets_2021$total <- NULL

Applying na.omit() for removing all NA values from dataset.

Code
all_datasets_2021 <- na.omit(all_datasets_2021)

Converting the dataset to a plain data frame using the data.frame() function for the computations that follow.

Code
all_datasets_2021 <- data.frame(all_datasets_2021)

Summary of the data.

Code
summary(all_datasets_2021)
   ride_id          rideable_type      start_station_name start_station_id  
 Length:4311315     Length:4311315     Length:4311315     Length:4311315    
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
 end_station_name   end_station_id       start_lat       start_lng     
 Length:4311315     Length:4311315     Min.   :41.65   Min.   :-87.83  
 Class :character   Class :character   1st Qu.:41.88   1st Qu.:-87.66  
 Mode  :character   Mode  :character   Median :41.90   Median :-87.64  
                                       Mean   :41.90   Mean   :-87.64  
                                       3rd Qu.:41.93   3rd Qu.:-87.63  
                                       Max.   :42.06   Max.   :-87.53  
    end_lat         end_lng       member_casual      date_of_journey     
 Min.   :41.65   Min.   :-87.83   Length:4311315     Min.   :2021-01-01  
 1st Qu.:41.88   1st Qu.:-87.66   Class :character   1st Qu.:2021-06-05  
 Median :41.90   Median :-87.64   Mode  :character   Median :2021-07-29  
 Mean   :41.90   Mean   :-87.64                      Mean   :2021-07-25  
 3rd Qu.:41.93   3rd Qu.:-87.63                      3rd Qu.:2021-09-19  
 Max.   :42.17   Max.   :-87.52                      Max.   :2021-12-31  
 date_of_arrival         month           departure_time    arrival_time     
 Min.   :2021-01-01   Length:4311315     Length:4311315    Length:4311315   
 1st Qu.:2021-06-05   Class :character   Class1:hms        Class1:hms       
 Median :2021-07-29   Mode  :character   Class2:difftime   Class2:difftime  
 Mean   :2021-07-25                      Mode  :numeric    Mode  :numeric   
 3rd Qu.:2021-09-19                                                         
 Max.   :2022-01-03                                                         
 total_mins_spent  day_of_journey     day_of_arrival     total_distance    
 Length:4311315    Length:4311315     Length:4311315     Min.   : 0.00002  
 Class :difftime   Class :character   Class :character   1st Qu.: 1.02524  
 Mode  :numeric    Mode  :character   Mode  :character   Median : 1.71917  
                                                         Mean   : 2.26815  
                                                         3rd Qu.: 2.91957  
                                                         Max.   :33.83804  
Code
str(all_datasets_2021)
'data.frame':   4311315 obs. of  20 variables:
 $ ride_id           : chr  "B9F73448DFBE0D45" "457C7F4B5D3DA135" "57C750326F9FDABE" "4D518C65E338D070" ...
 $ rideable_type     : chr  "classic_bike" "electric_bike" "electric_bike" "electric_bike" ...
 $ start_station_name: chr  "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" ...
 $ start_station_id  : chr  "17660" "17660" "17660" "17660" ...
 $ end_station_name  : chr  "Wood St & Augusta Blvd" "California Ave & North Ave" "Wood St & Augusta Blvd" "Wood St & Augusta Blvd" ...
 $ end_station_id    : chr  "657" "13258" "657" "657" ...
 $ start_lat         : num  41.9 41.9 41.9 41.9 41.9 ...
 $ start_lng         : num  -87.7 -87.7 -87.7 -87.7 -87.7 ...
 $ end_lat           : num  41.9 41.9 41.9 41.9 41.9 ...
 $ end_lng           : num  -87.7 -87.7 -87.7 -87.7 -87.7 ...
 $ member_casual     : chr  "member" "member" "casual" "casual" ...
 $ date_of_journey   : Date, format: "2021-01-24" "2021-01-23" ...
 $ date_of_arrival   : Date, format: "2021-01-24" "2021-01-23" ...
 $ month             : chr  "January" "January" "January" "January" ...
 $ departure_time    : 'hms' num  19:15:38 12:57:38 15:28:04 15:28:57 ...
  ..- attr(*, "units")= chr "secs"
 $ arrival_time      : 'hms' num  19:22:51 13:02:10 15:37:51 15:37:54 ...
  ..- attr(*, "units")= chr "secs"
 $ total_mins_spent  : 'difftime' num  0.120277777777778 0.0755555555555556 0.163055555555556 0.149166666666667 ...
  ..- attr(*, "units")= chr "hours"
 $ day_of_journey    : chr  "Sunday" "Saturday" "Saturday" "Saturday" ...
 $ day_of_arrival    : chr  "Sunday" "Saturday" "Saturday" "Saturday" ...
 $ total_distance    : num  2.03 1.12 2.04 2.04 2.03 ...

6 ANALYZE PHASE

In this phase, I analyse the cleaned and processed dataset to answer the questions that will help Cyclistic’s stakeholders steer their marketing campaign in a specific direction, which in turn should help retain existing members and convert future users to the subscription programme.

6.1 Key Task

    • Make several comparisons to dive deep into the dataset and understand customers’ behaviour and preferences.

    • Perform a series of analyses to gather thorough detail and lay out a convincing path to answering existing and upcoming questions.

    • Aggregate several columns to explore various aspects of the dataset and what they reveal about how people perceive Cyclistic services.

    • Examine every nook and cranny of the dataset using built-in R functions to complete the final profile of both customer categories.

    • Identify trends and relationships for each member type and how they use the service.

6.2 Deliverable

    • Summaries established through several useful functions such as ‘head()’, ‘filter()’, ‘count()’, ‘glimpse()’, etc.

    • Computations that illustrate how each member type uses the service in terms of their choices and preferences.

    • Some statistical operations that give a brief on crucial factors influencing people’s behaviour.

6.3 CODE CHUNK

Summary of the total_mins_spent column in terms of mean, median, max and min (values are in hours).

Code
Ride_duration_summary <- all_datasets_2021 %>% 
  summarize(average_ride_duration = mean(total_mins_spent), 
            min_ride_duration = min(total_mins_spent), 
            median_ride_duration = median(total_mins_spent),
            max_ride_duration = max(total_mins_spent))

To get an overview of Ride_duration_summary.

Code
head(Ride_duration_summary)
  average_ride_duration min_ride_duration median_ride_duration
1       0.4311001 hours           0 hours      0.1997222 hours
  max_ride_duration
1       23.99 hours
Code
glimpse(Ride_duration_summary)
Rows: 1
Columns: 4
$ average_ride_duration <drtn> 0.4311001 hours
$ min_ride_duration     <drtn> 0 hours
$ median_ride_duration  <drtn> 0.1997222 hours
$ max_ride_duration     <drtn> 23.99 hours
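
The summary values above are in hours, which makes short rides hard to read (the median of about 0.2 hours is roughly 12 minutes). A minimal sketch of reporting the same four statistics in minutes is shown here in base R; the durations are made up and the output column names are illustrative only.

```r
# Made-up durations in hours, stored as a difftime like total_mins_spent.
duration <- as.difftime(c(0.10, 0.20, 0.50, 2.00), units = "hours")

# Convert to plain numbers expressed in minutes.
minutes <- as.numeric(duration, units = "mins")

# Same four statistics as above, now in minutes.
duration_summary <- data.frame(
  average_ride_minutes = mean(minutes),
  min_ride_minutes     = min(minutes),
  median_ride_minutes  = median(minutes),
  max_ride_minutes     = max(minutes)
)
print(duration_summary)
```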

Extracting the top 10 most popular routes for the member rider type through a series of processing steps.

Filtering the full dataset to create a new one containing only rider type member.

Code
all_trips_member <- filter(all_datasets_2021, member_casual == "member")

Adding a new column from the existing start_station_name column using the mutate() function.

Code
all_trips_member<-all_trips_member %>% 
  mutate(route = paste(start_station_name,"To", sep = " "))

Concatenating existing column with newly created column using mutate() function.

Code
all_trips_member<-all_trips_member %>% 
  mutate(route = paste(route,end_station_name, sep = " "))

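Using the pipe operator to group by route, summarize the number of rides and the mean of total_mins_spent, and then sort the result. Note that average_duration_minutes is expressed in hours, because total_mins_spent is in hours.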

Code
popular_ride_route_member <- all_trips_member %>% 
  group_by(route) %>% 
  summarize(number_of_rides = n(),average_duration_minutes = mean(total_mins_spent)) %>% 
  arrange(route,number_of_rides,average_duration_minutes)

Using the pipe operator to group by route, summarize the number of rides and the mean of total_distance, and then sort the result for the member type.

Code
popular_distance_travelled_member <- all_trips_member %>% 
  group_by(route) %>% 
  summarize(number_of_rides = n(),average_distance = mean(total_distance)) %>% 
  arrange(route,number_of_rides,average_distance)

Storing the top ten routes for the member type, in descending order of number of rides, using the head() function, along with the average ride duration.

Code
popular_ride_route_member_top10 <- head(arrange(popular_ride_route_member,desc(number_of_rides)), n = 10)

Storing the top ten routes for the member type, in descending order of number of rides, using head(), along with the average distance.

Code
popular_ride_distance_member_top10 <- head(arrange(popular_distance_travelled_member,desc(number_of_rides)), n = 10)

A glance at the newly obtained data.

Code
popular_ride_route_member_top10
# A tibble: 10 × 3
   route                                           number_of_rides average_dur…¹
   <chr>                                                     <int> <drtn>       
 1 Ellis Ave & 60th St To Ellis Ave & 55th St                 4082 0.10550989 h…
 2 Ellis Ave & 55th St To Ellis Ave & 60th St                 3652 0.11763394 h…
 3 Ellis Ave & 60th St To University Ave & 57th St            3109 0.13157196 h…
 4 University Ave & 57th St To Ellis Ave & 60th St            3010 0.12021290 h…
 5 Calumet Ave & 33rd St To State St & 33rd St                1989 0.06755684 h…
 6 State St & 33rd St To Calumet Ave & 33rd St                1954 0.09848971 h…
 7 Loomis St & Lexington St To Morgan St & Polk St            1860 0.09136798 h…
 8 Morgan St & Polk St To Loomis St & Lexington St            1653 0.11731717 h…
 9 MLK Jr Dr & 29th St To State St & 33rd St                  1422 0.21594116 h…
10 State St & 33rd St To MLK Jr Dr & 29th St                  1392 0.21096384 h…
# … with abbreviated variable name ¹​average_duration_minutes
Code
popular_ride_distance_member_top10
# A tibble: 10 × 3
   route                                           number_of_rides average_dis…¹
   <chr>                                                     <int>         <dbl>
 1 Ellis Ave & 60th St To Ellis Ave & 55th St                 4082         1.02 
 2 Ellis Ave & 55th St To Ellis Ave & 60th St                 3652         1.02 
 3 Ellis Ave & 60th St To University Ave & 57th St            3109         0.716
 4 University Ave & 57th St To Ellis Ave & 60th St            3010         0.716
 5 Calumet Ave & 33rd St To State St & 33rd St                1989         0.654
 6 State St & 33rd St To Calumet Ave & 33rd St                1954         0.653
 7 Loomis St & Lexington St To Morgan St & Polk St            1860         0.868
 8 Morgan St & Polk St To Loomis St & Lexington St            1653         0.867
 9 MLK Jr Dr & 29th St To State St & 33rd St                  1422         1.09 
10 State St & 33rd St To MLK Jr Dr & 29th St                  1392         1.09 
# … with abbreviated variable name ¹​average_distance
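
The route-building and top-n steps above can be condensed. The following is a minimal sketch in base R on made-up rides (station names and distances are invented, and only base functions are used so it runs without extra packages); it builds the route label in a single paste() call and keeps the top two routes instead of ten.

```r
# Made-up rides for a single rider type.
trips <- data.frame(
  start_station_name = c("A", "A", "B", "A", "B"),
  end_station_name   = c("B", "B", "A", "C", "A"),
  total_distance     = c(1.0, 1.2, 1.1, 2.0, 0.9),
  stringsAsFactors = FALSE
)

# Build the route label in one step instead of two.
trips$route <- paste(trips$start_station_name, "To", trips$end_station_name)

# Number of rides and mean distance per route.
rides <- aggregate(total_distance ~ route, data = trips, FUN = length)
names(rides)[2] <- "number_of_rides"
avg <- aggregate(total_distance ~ route, data = trips, FUN = mean)
names(avg)[2] <- "average_distance"
route_summary <- merge(rides, avg, by = "route")

# Sort by number of rides, descending (ties broken by route name), keep the top n.
route_summary <- route_summary[order(-route_summary$number_of_rides,
                                     route_summary$route), ]
top_routes <- head(route_summary, n = 2)
print(top_routes)
```

Note that "A To B" and "B To A" are counted as separate routes, as in the tables above.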

Extracting the top 10 most popular routes for the casual rider type through a series of processing steps.

Filtering the full dataset to create a new one containing only rider type casual.

Code
all_trips_casual <- filter(all_datasets_2021, member_casual == "casual")

Adding a new column from the existing start_station_name column using the mutate() function.

Code
all_trips_casual<-all_trips_casual %>% 
  mutate(route = paste(start_station_name,"To", sep = " "))

Concatenating existing column with newly created column using mutate() function.

Code
all_trips_casual<-all_trips_casual %>% 
  mutate(route = paste(route,end_station_name, sep = " "))

Using the pipe operator to group by route, summarize the number of rides and the mean of total_mins_spent, and then sort the result. As before, average_duration_minutes is expressed in hours.

Code
popular_ride_route_casual <- all_trips_casual %>% 
  group_by(route) %>% 
  summarize(number_of_rides = n(),average_duration_minutes = mean(total_mins_spent)) %>% 
  arrange(route,number_of_rides,average_duration_minutes)

Using the pipe operator to group by route, summarize the number of rides and the mean of total_distance, and then sort the result for the casual type.

Code
popular_distance_travelled_casual <- all_trips_casual %>% 
  group_by(route) %>% 
  summarize(number_of_rides = n(),average_distance = mean(total_distance)) %>% 
  arrange(route,number_of_rides,average_distance)

Storing the top ten routes for the casual type, in descending order of number of rides, using the head() function, along with the average ride duration.

Code
popular_ride_route_casual_top10 <- head(arrange(popular_ride_route_casual,desc(number_of_rides)), n = 10)

Storing the top ten routes for the casual type, in descending order of number of rides, using head(), along with the average distance.

Code
popular_ride_distance_casual_top10 <- head(arrange(popular_distance_travelled_casual,desc(number_of_rides)), n = 10)

A glance at the newly obtained data.

Code
popular_ride_route_casual_top10
# A tibble: 10 × 3
   route                                                        number…¹ avera…²
   <chr>                                                           <int> <drtn> 
 1 Streeter Dr & Grand Ave To Millennium Park                       3309 0.6745…
 2 Millennium Park To Streeter Dr & Grand Ave                       2927 0.7638…
 3 Shedd Aquarium To Streeter Dr & Grand Ave                        2822 0.5835…
 4 Lake Shore Dr & Monroe St To Streeter Dr & Grand Ave             2811 0.5459…
 5 DuSable Lake Shore Dr & Monroe St To Streeter Dr & Grand Ave     2736 0.4910…
 6 Streeter Dr & Grand Ave To Michigan Ave & Oak St                 2478 0.4997…
 7 Dusable Harbor To Streeter Dr & Grand Ave                        2280 0.4592…
 8 Michigan Ave & Oak St To Streeter Dr & Grand Ave                 2008 0.5707…
 9 Streeter Dr & Grand Ave To Theater on the Lake                   1951 0.5780…
10 Shedd Aquarium To Millennium Park                                1818 0.4921…
# … with abbreviated variable names ¹​number_of_rides, ²​average_duration_minutes
Code
popular_ride_distance_casual_top10
# A tibble: 10 × 3
   route                                                        number…¹ avera…²
   <chr>                                                           <int>   <dbl>
 1 Streeter Dr & Grand Ave To Millennium Park                       3309   1.60 
 2 Millennium Park To Streeter Dr & Grand Ave                       2927   1.60 
 3 Shedd Aquarium To Streeter Dr & Grand Ave                        2822   2.80 
 4 Lake Shore Dr & Monroe St To Streeter Dr & Grand Ave             2811   1.32 
 5 DuSable Lake Shore Dr & Monroe St To Streeter Dr & Grand Ave     2736   1.32 
 6 Streeter Dr & Grand Ave To Michigan Ave & Oak St                 2478   1.37 
 7 Dusable Harbor To Streeter Dr & Grand Ave                        2280   0.593
 8 Michigan Ave & Oak St To Streeter Dr & Grand Ave                 2008   1.37 
 9 Streeter Dr & Grand Ave To Theater on the Lake                   1951   4.09 
10 Shedd Aquarium To Millennium Park                                1818   1.70 
# … with abbreviated variable names ¹​number_of_rides, ²​average_distance

Removing the intermediate datasets that were created only to derive the results above.

Code
remove(all_trips_casual)
remove(all_trips_member)
remove(popular_ride_route_casual)
remove(popular_ride_route_member)
remove(popular_distance_travelled_casual)
remove(popular_distance_travelled_member)

A look at ride counts by start_station_name and member_casual, in descending order, limited to the top twenty entries.

Code
head(count(all_datasets_2021,start_station_name,member_casual,sort = TRUE), n = 20)
          start_station_name member_casual     n
1    Streeter Dr & Grand Ave        casual 54225
2            Millennium Park        casual 26847
3      Michigan Ave & Oak St        casual 23614
4          Clark St & Elm St        member 23200
5   Kingsbury St & Kinzie St        member 22277
6      Wells St & Concord Ln        member 22245
7             Shedd Aquarium        casual 20070
8          Wells St & Elm St        member 19663
9      Dearborn St & Erie St        member 18259
10     Wells St & Concord Ln        casual 18201
11       Wells St & Huron St        member 17845
12    St. Clair St & Erie St        member 17760
13       Theater on the Lake        casual 17706
14      Broadway & Barry Ave        member 16232
15   Clinton St & Madison St        member 16058
16 Desplaines St & Kinzie St        member 15775
17   Clark St & Armitage Ave        member 15478
18    Wabash Ave & Grand Ave        member 15432
19    Clark St & Lincoln Ave        member 15312
20    Clark St & Lincoln Ave        casual 15250
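
In the call above, sort = TRUE is what puts the largest counts first. For comparison, here is a minimal base R sketch of the same idea on made-up rides; it counts each station and rider-type combination with table() and then orders the counts from largest to smallest.

```r
# Made-up rides.
trips <- data.frame(
  start_station_name = c("A", "A", "A", "B", "B", "C"),
  member_casual      = c("casual", "casual", "member", "member", "member", "casual"),
  stringsAsFactors = FALSE
)

# Count rides per station and rider type.
counts <- as.data.frame(
  table(start_station_name = trips$start_station_name,
        member_casual = trips$member_casual),
  responseName = "n", stringsAsFactors = FALSE
)

# Drop empty combinations and sort so the largest counts come first.
counts <- counts[counts$n > 0, ]
counts <- counts[order(-counts$n, counts$start_station_name, counts$member_casual), ]
rownames(counts) <- NULL
print(counts)
```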

A look at ride counts by end_station_name and member_casual, in descending order, limited to the top twenty entries.

Code
head(count(all_datasets_2021,end_station_name,member_casual,sort = TRUE), n = 20)
                     end_station_name member_casual     n
1             Streeter Dr & Grand Ave        casual 57303
2                     Millennium Park        casual 28406
3               Michigan Ave & Oak St        casual 25317
4                   Clark St & Elm St        member 23273
5               Wells St & Concord Ln        member 22892
6            Kingsbury St & Kinzie St        member 22462
7                   Wells St & Elm St        member 20218
8                 Theater on the Lake        casual 19393
9               Dearborn St & Erie St        member 18918
10                     Shedd Aquarium        casual 18684
11              Wells St & Concord Ln        casual 17939
12             St. Clair St & Erie St        member 17587
13                Wells St & Huron St        member 17508
14               Broadway & Barry Ave        member 16815
15            Clinton St & Madison St        member 16412
16              Green St & Madison St        member 15949
17             Clark St & Lincoln Ave        casual 15504
18         Lake Shore Dr & North Blvd        casual 15474
19 DuSable Lake Shore Dr & North Blvd        casual 15420
20             Clark St & Lincoln Ave        member 15051

Counting the rides that start and end at the same station (start_station_name equal to end_station_name) by member_casual, in descending order.

Code
head(count(filter(all_datasets_2021,start_station_name == end_station_name),member_casual,sort = TRUE), n = 20)
  member_casual     n
1        casual 38301
2        member 22798
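
Raw counts like these depend on how many rides each rider type took overall, so a share can be easier to compare. A minimal base R sketch on made-up rides follows; the round_trip column and the share calculation are additions for illustration and are not part of the analysis above.

```r
# Made-up rides; a "round trip" starts and ends at the same station.
trips <- data.frame(
  start_station_name = c("A", "A", "B", "C", "C", "D"),
  end_station_name   = c("A", "B", "B", "C", "A", "A"),
  member_casual      = c("casual", "casual", "casual", "member", "member", "member"),
  stringsAsFactors = FALSE
)

# Flag rides that start and end at the same station.
trips$round_trip <- trips$start_station_name == trips$end_station_name

# Round trips and total rides per rider type.
round_trips <- tapply(trips$round_trip, trips$member_casual, sum)
total_rides <- tapply(trips$round_trip, trips$member_casual, length)

# Share of each rider type's rides that are round trips.
round_trip_share <- round_trips / total_rides
print(round_trip_share)
```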

Looking for further insight by breaking the same-station rides down by start_station_name and member_casual, in descending order, limited to twenty entries.

Code
head(count(filter(all_datasets_2021,start_station_name == end_station_name),start_station_name,member_casual,sort = TRUE), n = 20)
                  start_station_name member_casual    n
1            Streeter Dr & Grand Ave        casual 1458
2              Michigan Ave & Oak St        casual  852
3                    Millennium Park        casual  773
4         Indiana Ave & Roosevelt Rd        casual  552
5                Buckingham Fountain        casual  476
6          Lake Shore Dr & Monroe St        casual  474
7              Dearborn St & Erie St        member  458
8                     Shedd Aquarium        casual  456
9  DuSable Lake Shore Dr & Monroe St        casual  397
10                    Dusable Harbor        casual  392
11                   Montrose Harbor        casual  377
12             Michigan Ave & 8th St        casual  375
13              New St & Illinois St        casual  339
14         Columbus Dr & Randolph St        casual  321
15               Theater on the Lake        casual  305
16               Wabash Ave & 9th St        casual  282
17                 Adler Planetarium        casual  271
18   Lakefront Trail & Bryn Mawr Ave        casual  267
19          Fairbanks Ct & Grand Ave        casual  258
20            Michigan Ave & Lake St        casual  253

Counting rides by start_station_name and day_of_journey, sorted in descending order, limited to twenty entries.

Code
head(count(all_datasets_2021,start_station_name,day_of_journey,sort = TRUE),n=20)
        start_station_name day_of_journey     n
1  Streeter Dr & Grand Ave       Saturday 17218
2  Streeter Dr & Grand Ave         Sunday 14957
3  Streeter Dr & Grand Ave         Friday  9319
4    Wells St & Concord Ln       Saturday  8560
5    Michigan Ave & Oak St       Saturday  8508
6  Streeter Dr & Grand Ave         Monday  8322
7          Millennium Park       Saturday  7981
8   Clark St & Lincoln Ave       Saturday  7956
9    Michigan Ave & Oak St         Sunday  7688
10     Theater on the Lake       Saturday  7449
11     Theater on the Lake         Sunday  7380
12         Millennium Park         Sunday  7180
13       Clark St & Elm St       Saturday  6994
14 Clark St & Armitage Ave       Saturday  6799
15       Wells St & Elm St       Saturday  6645
16 Streeter Dr & Grand Ave      Wednesday  6596
17   Wells St & Concord Ln         Sunday  6591
18 Streeter Dr & Grand Ave        Tuesday  6272
19 Streeter Dr & Grand Ave       Thursday  6236
20   Wells St & Concord Ln         Friday  5982

Counting rides by start_station_name, day_of_journey and month, sorted in descending order and limited to the top twenty entries.

Code
head(count(all_datasets_2021,start_station_name,day_of_journey, month,sort = TRUE),n=20)
                   start_station_name day_of_journey     month    n
1             Streeter Dr & Grand Ave       Saturday      July 3762
2             Streeter Dr & Grand Ave         Sunday    August 2685
3             Streeter Dr & Grand Ave         Sunday      June 2613
4             Streeter Dr & Grand Ave       Saturday    August 2572
5             Streeter Dr & Grand Ave       Saturday September 2560
6             Streeter Dr & Grand Ave         Sunday      July 2459
7             Streeter Dr & Grand Ave       Saturday      June 2433
8             Streeter Dr & Grand Ave       Saturday       May 2392
9             Streeter Dr & Grand Ave         Sunday       May 2276
10            Streeter Dr & Grand Ave         Sunday September 2196
11            Streeter Dr & Grand Ave         Friday      July 2136
12 DuSable Lake Shore Dr & North Blvd       Saturday    August 1882
13 DuSable Lake Shore Dr & North Blvd         Sunday    August 1844
14            Streeter Dr & Grand Ave         Monday      July 1819
15              Michigan Ave & Oak St       Saturday      July 1664
16            Streeter Dr & Grand Ave         Friday    August 1654
17                Theater on the Lake         Sunday    August 1636
18            Streeter Dr & Grand Ave         Friday      June 1632
19         Lake Shore Dr & North Blvd         Sunday      June 1588
20         Lake Shore Dr & North Blvd       Saturday      June 1578

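Previewing the busiest departure times together with the weekday, limited to the top twenty entries. Because departure_time is recorded to the second, each row is one exact timestamp rather than an hourly total.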

Code
head(count(all_datasets_2021,departure_time,day_of_journey,sort = TRUE),n = 20)
   departure_time day_of_journey  n
1        17:09:44      Wednesday 40
2        17:35:20        Tuesday 39
3        14:17:03       Saturday 38
4        17:55:16        Tuesday 38
5        18:05:35      Wednesday 38
6        17:06:41      Wednesday 37
7        17:12:19      Wednesday 37
8        17:16:59      Wednesday 37
9        17:35:39        Tuesday 37
10       17:10:37        Tuesday 36
11       17:14:20      Wednesday 36
12       17:17:39        Tuesday 36
13       17:18:18         Friday 36
14       17:25:52      Wednesday 36
15       18:09:28        Tuesday 36
16       12:46:09       Saturday 35
17       13:42:30       Saturday 35
18       17:06:54      Wednesday 35
19       17:09:27      Wednesday 35
20       17:12:59        Tuesday 35
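
Counting exact timestamps hints at the evening peak, but grouping rides into hourly bins would show it more directly. The following is a minimal sketch, assuming departure_time can be read as "HH:MM:SS" text; toy_rides and departure_hour are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical toy data; departure_time assumed to be "HH:MM:SS" text
toy_rides <- data.frame(
  departure_time = c("17:09:44", "17:35:20", "14:17:03",
                     "17:55:16", "18:05:35", "08:15:00"),
  day_of_journey = c("Wednesday", "Tuesday", "Saturday",
                     "Tuesday", "Wednesday", "Monday")
)

# Keep only the hour so rides are grouped into hourly bins
hourly_counts <- toy_rides %>%
  mutate(departure_hour = as.integer(substr(as.character(departure_time), 1, 2))) %>%
  count(departure_hour, sort = TRUE)

print(hourly_counts)
```

Adding day_of_journey or member_casual to count() would split the hourly totals further.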

Previewing the busiest arrival times together with the weekday of arrival, limited to the top twenty entries.

Code
head(count(all_datasets_2021,arrival_time,day_of_arrival,sort = TRUE), n = 20)
   arrival_time day_of_arrival  n
1      17:21:00      Wednesday 39
2      17:29:39         Friday 39
3      17:15:14        Tuesday 38
4      18:17:16        Tuesday 37
5      17:27:01      Wednesday 36
6      17:52:40         Monday 36
7      15:06:53       Saturday 35
8      15:55:42         Sunday 35
9      17:21:44        Tuesday 35
10     17:25:55      Wednesday 35
11     17:34:27        Tuesday 35
12     17:54:36      Wednesday 35
13     17:57:11        Tuesday 35
14     17:58:25      Wednesday 35
15     18:19:47      Wednesday 35
16     18:21:23      Wednesday 35
17     18:27:55      Wednesday 35
18     16:55:01       Saturday 34
19     17:16:38       Thursday 34
20     17:21:35      Wednesday 34

Using count to find the number of rides on each weekday, split by member type, first by day of journey and then by day of arrival.

Code
head(count(all_datasets_2021, day_of_journey , member_casual,sort = FALSE),n = 20)
   day_of_journey member_casual      n
1          Friday        casual 267099
2          Friday        member 353772
3          Monday        casual 204519
4          Monday        member 333892
5        Saturday        casual 426554
6        Saturday        member 343383
7          Sunday        casual 360264
8          Sunday        member 297656
9        Thursday        casual 206866
10       Thursday        member 361814
11        Tuesday        casual 194990
12        Tuesday        member 375621
13      Wednesday        casual 199595
14      Wednesday        member 385290
Code
head(count(all_datasets_2021, day_of_arrival , member_casual,sort = FALSE),n = 20)
   day_of_arrival member_casual      n
1          Friday        casual 264993
2          Friday        member 353095
3          Monday        casual 205449
4          Monday        member 333960
5        Saturday        casual 424590
6        Saturday        member 343092
7          Sunday        casual 363926
8          Sunday        member 298948
9        Thursday        casual 206226
10       Thursday        member 361650
11        Tuesday        casual 195001
12        Tuesday        member 375520
13      Wednesday        casual 199702
14      Wednesday        member 385163

Checking the distinct values of the rideable_type, member_casual and month columns, and the number of rides in each category, using the table function.

Code
table(all_datasets_2021$rideable_type) 

 classic_bike   docked_bike electric_bike 
      3032819        247001       1031495 
Code
table(all_datasets_2021$member_casual)

 casual  member 
1859887 2451428 
Code
table(all_datasets_2021$month)

    April    August  December  February   January      July      June     March 
   273783    635053    170679     40206     79809    647972    565822    189425 
      May  November   October September 
   415840    247404    457316    588006 
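
The weekday and month outputs above appear in alphabetical order, which suggests the columns are plain character vectors (an assumption based on the printed order). A minimal sketch of how they could be converted to factors with calendar order follows; the toy vectors are hypothetical stand-ins for the real columns.

```r
# Hypothetical toy vectors standing in for the day_of_journey and month columns
day_of_journey <- c("Friday", "Monday", "Saturday", "Monday", "Sunday")
month <- c("July", "April", "December", "July", "January")

# Explicit level order: Monday to Sunday and January to December
weekday_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                    "Friday", "Saturday", "Sunday")
day_of_journey <- factor(day_of_journey, levels = weekday_levels)
month <- factor(month, levels = month.name)

# table() now reports counts in calendar order instead of alphabetical order
print(table(day_of_journey))
print(table(month))
```

The same level order would also carry over to the axes of any ggplot2 chart built from these columns.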

Checking how each rideable type is split across member types to get a sense of the distribution.

Code
head(count(all_datasets_2021,rideable_type,member_casual,sort = TRUE), n = 20)
  rideable_type member_casual       n
1  classic_bike        member 1892943
2  classic_bike        casual 1139876
3 electric_bike        member  558484
4 electric_bike        casual  473011
5   docked_bike        casual  247000
6   docked_bike        member       1
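
The raw counts can be turned into shares to make the comparison between rider types easier. The sketch below copies the six counts from the output above into a matrix and uses prop.table; the object names are chosen here for illustration only.

```r
# Counts copied from the output above (rows: bike type, columns: rider type)
ride_counts <- matrix(
  c(1139876, 1892943,
     247000,       1,
     473011,  558484),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    rideable_type = c("classic_bike", "docked_bike", "electric_bike"),
    member_casual = c("casual", "member")
  )
)

# Share of each bike type within each rider type (each column sums to 1)
share_within_rider_type <- prop.table(ride_counts, margin = 2)

print(round(share_within_rider_type, 3))
```

Using margin = 1 instead would give the split of each bike type between casual riders and members.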

Finding the maximum distance travelled on a single ride using the max function.

Code
max(all_datasets_2021$total_distance)
[1] 33.83804

Counting rides by total_distance and month without sorting by frequency (sort = FALSE), so rows appear in ascending order of distance, limited to twenty entries.

Code
head(count(all_datasets_2021,total_distance, month, sort = FALSE), n = 20)
   total_distance     month n
1    2.313061e-05  December 1
2    3.329401e-05  December 1
3    3.334390e-05    August 1
4    3.710650e-05       May 1
5    3.959635e-05    August 1
6    4.011380e-05  December 1
7    4.142217e-05 September 1
8    4.537711e-05      June 1
9    4.628751e-05      July 1
10   5.562038e-05  November 1
11   6.215622e-05  November 1
12   6.216103e-05   October 1
13   6.651749e-05      July 1
14   6.938482e-05      July 1
15   6.941745e-05  November 1
16   7.148174e-05 September 1
17   7.548542e-05      July 1
18   7.834864e-05    August 1
19   7.837490e-05  November 1
20   7.842346e-05   October 1
Code
head(count(all_datasets_2021,max(total_distance), month, sort = FALSE), n = 20)
   max(total_distance)     month      n
1             33.83804     April 273783
2             33.83804    August 635053
3             33.83804  December 170679
4             33.83804  February  40206
5             33.83804   January  79809
6             33.83804      July 647972
7             33.83804      June 565822
8             33.83804     March 189425
9             33.83804       May 415840
10            33.83804  November 247404
11            33.83804   October 457316
12            33.83804 September 588006
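
Because max(total_distance) is evaluated over the whole data frame inside count(), every row above repeats the overall maximum of 33.83804. If the maximum distance within each month is wanted, one option is to group first and then summarise. The following is a minimal sketch on made-up values; toy_rides, max_distance and rides are hypothetical names.

```r
library(dplyr)

# Hypothetical toy data; column names mirror all_datasets_2021
toy_rides <- data.frame(
  month          = c("July", "July", "August", "August", "August"),
  total_distance = c(1.2, 5.5, 0.8, 3.1, 2.4)
)

# One row per month with that month's own maximum and ride count
monthly_max <- toy_rides %>%
  group_by(month) %>%
  summarise(
    max_distance = max(total_distance),
    rides        = n()
  ) %>%
  arrange(desc(max_distance))

print(monthly_max)
```

The same pattern works for total_mins_spent, or with member_casual added to group_by().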

Counting rides by total_mins_spent and month, and then by month and membership type, sorted in descending order of count and limited to twenty entries each. Note that the total_mins_spent values are printed in hours.

Code
head(count(all_datasets_2021,total_mins_spent, month, sort = TRUE), n = 20)
   total_mins_spent     month   n
1  0.12055556 hours    August 643
2  0.10694444 hours    August 641
3  0.11222222 hours    August 636
4  0.14472222 hours    August 636
5  0.10861111 hours    August 633
6  0.11444444 hours    August 628
7  0.10583333 hours    August 627
8  0.09361111 hours    August 622
9  0.12388889 hours    August 620
10 0.11000000 hours    August 619
11 0.09694444 hours    August 618
12 0.11611111 hours    August 618
13 0.09527778 hours September 617
14 0.11361111 hours    August 617
15 0.10027778 hours    August 616
16 0.11250000 hours September 616
17 0.12777778 hours    August 616
18 0.11833333 hours    August 615
19 0.11527778 hours      July 614
20 0.12361111 hours    August 614
Code
head(count(all_datasets_2021,max(total_mins_spent), month,member_casual, sort = TRUE), n = 20)
   max(total_mins_spent)     month member_casual      n
1            23.99 hours      July        casual 336693
2            23.99 hours    August        member 321251
3            23.99 hours September        member 317155
4            23.99 hours    August        casual 313802
5            23.99 hours      July        member 311279
6            23.99 hours      June        member 293113
7            23.99 hours   October        member 280003
8            23.99 hours      June        casual 272709
9            23.99 hours September        casual 270851
10           23.99 hours       May        member 225037
11           23.99 hours       May        casual 190803
12           23.99 hours  November        member 181237
13           23.99 hours   October        casual 177313
14           23.99 hours     April        member 170269
15           23.99 hours  December        member 128318
16           23.99 hours     March        member 124581
17           23.99 hours     April        casual 103514
18           23.99 hours   January        member  66550
19           23.99 hours  November        casual  66167
20           23.99 hours     March        casual  64844

Checking the upper quantile values to identify outliers before winsorizing the dataset.

For total_mins_spent column.

Code
quantile(all_datasets_2021$total_mins_spent,probs = seq(.99,1.0,by=0.001))
Time differences in hours
      99%     99.1%     99.2%     99.3%     99.4%     99.5%     99.6%     99.7% 
 2.604444  2.847271  3.223889  4.272445  9.228217 21.973294 23.175762 23.525849 
    99.8%     99.9%      100% 
23.691944 23.805278 23.990000 

For total_distance column.

Code
quantile(all_datasets_2021$total_distance,probs = seq(.75,1.0,by=.05))
      75%       80%       85%       90%       95%      100% 
 2.919568  3.326758  3.858654  4.599971  5.895091 33.838043 
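
The quantiles above show where the extreme values begin, but the winsorization step itself is not shown. One common way to do it is to cap values at a chosen upper quantile. The following is a minimal sketch under stated assumptions: the durations are made-up numeric hours (a difftime column would first need as.numeric()), and the 90th percentile cap is chosen only to suit the tiny sample.

```r
# Hypothetical toy durations in hours, with one extreme value
duration_hours <- c(0.10, 0.12, 0.15, 0.20, 0.25, 0.30, 0.45, 0.60, 1.50, 23.99)

# Upper cap at the 90th percentile here; a higher quantile such as 0.99
# may suit the full dataset better
upper_cap <- quantile(duration_hours, probs = 0.90, names = FALSE)

# Winsorize: values above the cap are replaced by the cap, not dropped
duration_winsorized <- pmin(duration_hours, upper_cap)

print(upper_cap)
print(duration_winsorized)
```

Unlike filtering, this keeps the number of rides unchanged, so counts per weekday or month are not affected.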

Counting rides by day_of_journey for each member type separately, alongside the maximum of the total_mins_spent column, limited to twenty entries.

Code
head(count(filter(all_datasets_2021,member_casual == 'casual'),member_casual,day_of_journey,max(total_mins_spent),sort = TRUE),n = 20)
  member_casual day_of_journey max(total_mins_spent)      n
1        casual       Saturday           23.99 hours 426554
2        casual         Sunday           23.99 hours 360264
3        casual         Friday           23.99 hours 267099
4        casual       Thursday           23.99 hours 206866
5        casual         Monday           23.99 hours 204519
6        casual      Wednesday           23.99 hours 199595
7        casual        Tuesday           23.99 hours 194990
Code
head(count(filter(all_datasets_2021,member_casual == 'member'),member_casual,day_of_journey,max(total_mins_spent),sort = TRUE),n = 20)
  member_casual day_of_journey max(total_mins_spent)      n
1        member      Wednesday         23.9825 hours 385290
2        member        Tuesday         23.9825 hours 375621
3        member       Thursday         23.9825 hours 361814
4        member         Friday         23.9825 hours 353772
5        member       Saturday         23.9825 hours 343383
6        member         Monday         23.9825 hours 333892
7        member         Sunday         23.9825 hours 297656
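
The maximum is almost the same for both rider types, so it says little about typical ride length. A median and mean per rider type and weekday may be more informative. The following is a minimal sketch on made-up values, assuming total_mins_spent is a difftime in hours as the outputs above suggest; toy_rides and the summary column names are hypothetical.

```r
library(dplyr)

# Hypothetical toy data; total_mins_spent is a difftime in hours
toy_rides <- data.frame(
  member_casual  = c("casual", "casual", "casual", "member", "member", "member"),
  day_of_journey = c("Saturday", "Saturday", "Sunday",
                     "Tuesday", "Tuesday", "Wednesday")
)
toy_rides$total_mins_spent <- as.difftime(c(0.5, 1.5, 2.0, 0.2, 0.4, 0.3),
                                          units = "hours")

# Typical ride length in minutes for each rider type and weekday
duration_summary <- toy_rides %>%
  mutate(duration_mins = as.numeric(total_mins_spent, units = "mins")) %>%
  group_by(member_casual, day_of_journey) %>%
  summarise(
    rides       = n(),
    median_mins = median(duration_mins),
    mean_mins   = mean(duration_mins),
    .groups     = "drop"
  )

print(duration_summary)
```

Converting with units = "mins" also makes the values match the column name.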

7 SHARE PHASE

7.1 Key Task

    • Establish the best way to share visualizations using R and Tableau.

    • Illustrate every minute detail backed with explanation.

    • Choose an appropriate graph type for each finding, with legends, labels and headings to improve readability and interpretation.

    • Ensure work is easily accessible.

7.2 Deliverable

    • Convey findings with illustrations and explanations.

    • Put a short description of every visualization added under this phase.

7.3 VISUALIZATION

Looking at the overall distribution of member types by total ride duration on each weekday.

Code
ggplot(data = all_datasets_2021) + aes(x = day_of_journey, y = as.numeric(total_mins_spent), fill = member_casual) + 
  geom_bar(stat = "identity", width = 0.5, position = 'stack') + scale_fill_manual(values = c("red", "blue")) +
  labs(title = "Cyclistic Data: Week Day Vs Total Duration Spent", x = "Weekday of Journey", 
       y = "Total Duration", fill = "Member Type") +
  # theme_minimal() comes first so it does not reset the axis text settings
  theme_minimal() + theme(axis.text.x = element_text(angle = 60, hjust = 1))
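
With about 4.3 million rides in the dataset, passing the raw rows to geom_bar(stat = "identity") asks ggplot2 to stack one rectangle per ride. An alternative that may render faster is to sum the durations first and plot the summary. The following is a minimal sketch on made-up values; toy_rides, duration_by_day and total_hours are hypothetical names, and the plot is built only if ggplot2 is installed.

```r
library(dplyr)

# Hypothetical toy data; total_mins_spent is a difftime in hours
toy_rides <- data.frame(
  day_of_journey = c("Monday", "Monday", "Saturday", "Saturday", "Saturday"),
  member_casual  = c("member", "casual", "casual", "casual", "member")
)
toy_rides$total_mins_spent <- as.difftime(c(0.2, 0.5, 1.0, 1.5, 0.3),
                                          units = "hours")

# One row per weekday and rider type, so only a few rectangles are drawn
duration_by_day <- toy_rides %>%
  group_by(day_of_journey, member_casual) %>%
  summarise(total_hours = sum(as.numeric(total_mins_spent, units = "hours")),
            .groups = "drop")

print(duration_by_day)

# geom_col() plots the summed values as they are
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(duration_by_day,
              aes(x = day_of_journey, y = total_hours, fill = member_casual)) +
    geom_col(width = 0.5) +
    labs(x = "Weekday of Journey", y = "Total Duration (Hours)",
         fill = "Member Type") +
    theme_minimal()
}
```

The stacked heights are the same as in the original chart, since stacking individual rides and summing them first give the same totals.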

Looking at how each member type uses the rideable types, split across weekdays.

Code
ggplot(data = all_datasets_2021) + aes(x = member_casual, fill = rideable_type) + 
  geom_bar(width = 0.5) + facet_wrap(~day_of_journey) + 
  scale_fill_manual(values = c("black", "yellow", "green")) +
  labs(title = "Cyclistic Data: Member Type Preference With Rideable Type", x = "Member Type", 
       y = "No. of Rides", fill = "Rideable Type") +
  # theme_classic() comes first so it does not reset the axis text settings
  theme_classic() + theme(axis.text.x = element_text(angle = 60, hjust = 1))

Comparing member types by ride duration against total distance travelled.

Code
ggplot( data = all_datasets_2021) + 
  aes(x = hms::as_hms(total_mins_spent), y = total_distance, shape = member_casual,color = member_casual) +
  scale_color_manual(name = "Member Type",values = c("black", "purple")) + 
  scale_shape_manual(name = "Member Type",values = c(19,17)) + 
  geom_point(size = 3,alpha = 0.5,stroke = 1)+
  labs(title = "Cyclistic Data: Total Minutes spent with Total Distance Travelled",
       caption = "Comparing the Difference by Member Type",x = "Total Minutes Spent", y = "Total Distance Travelled") + 
  theme_minimal()

Looking at the top 10 stations by average ride duration for casual riders.

Code
ggplot( data = popular_ride_route_casual_top10) + aes(x = as_hms(average_duration_minutes) ,y = route, group = 1) + 
  geom_line(color="Red") + geom_point(shape=21, color="black", fill="blue", size=6) +
  labs( title = "Top 10 Most Visited Stations",subtitle = "For Casual Riders", 
        x = "Duration Spent (In Hours)", y = "Station Name", 
        caption = "Popularity of stations is determined by no of rides & average distance travelled") +
  theme_minimal()

Looking at the top 10 stations by average ride duration for subscribed members.

Code
ggplot( data = popular_ride_route_member_top10) + aes(x = as_hms(average_duration_minutes) ,y = route, group = 1) + 
  geom_line(color="green") + geom_point(shape=21, color="black", fill="Brown", size=6) +
  labs( title = "Top 10 Most Visited Stations",subtitle = "For Membership Riders", 
        x = "Duration Spent (In Hours)", y = "Station Name", 
        caption = "Popularity of stations is determined by no of rides & average distance travelled") +theme_minimal()

Looking at the top 10 stations by average distance travelled for casual riders.

Code
ggplot(data = popular_ride_distance_casual_top10) + aes(x = average_distance, y = route, group = 1) +
  geom_line(color="orange") +geom_point(shape = 21, color = "black", fill = "dark green", size = 6) + 
  labs( title = "Top 10 Most Visited Stations",subtitle = "For Casual Riders", 
        x = "Average Distance Travelled", y = "Station Name",
        caption = "Popularity of stations is determined by no of rides & average time spent") +theme_light()

Looking at the top 10 stations by average distance travelled for subscribed members.

Code
ggplot(data = popular_ride_distance_member_top10) + aes(x = average_distance, y = route, group = 1) + geom_line(color="orange") +
  geom_point(shape = 21, color = "black", fill = "dark green", size = 6) + 
  labs( title = "Top 10 Most Visited Stations",subtitle = "For Membership Riders", 
        x = "Average Distance Travelled", y = "Station Name", 
        caption = "Popularity of stations is determined by no of rides & average time spent") +theme_ipsum()

Overview of how member types are distributed across the weekdays of each month.

Code
ggplot(data = all_datasets_2021) + geom_col(aes(x = day_of_journey,y = month, fill = member_casual), position = "identity") + 
  scale_fill_manual(values = c("blue","red")) + labs(title = "Busiest WeekDay of The Month", 
subtitle = " Followed By Member Type", x = "Day of Journey" , y = "Month", fill = "Member Type") + 
  theme_bw()
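
Since both axes in this chart are categories, the number of rides behind each weekday and month combination is not visible. One alternative that might show it is a heat map of pre-computed counts. The following is a minimal sketch on made-up values; toy_rides, rides_by_day_month and rides are hypothetical names, and the plot is built only if ggplot2 is installed.

```r
library(dplyr)

# Hypothetical toy data; column names mirror all_datasets_2021
toy_rides <- data.frame(
  day_of_journey = c("Saturday", "Saturday", "Sunday",
                     "Saturday", "Monday", "Sunday"),
  month          = c("July", "July", "July", "August", "August", "August"),
  member_casual  = c("casual", "casual", "casual", "casual", "member", "member")
)

# Rides per weekday, month and rider type
rides_by_day_month <- toy_rides %>%
  count(day_of_journey, month, member_casual, name = "rides")

print(rides_by_day_month)

# Heat map: fill encodes the ride count, one panel per rider type
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(rides_by_day_month,
              aes(x = day_of_journey, y = month, fill = rides)) +
    geom_tile() +
    facet_wrap(~member_casual) +
    labs(x = "Day of Journey", y = "Month", fill = "Rides")
}
```

The busiest weekday of each month then shows up as the darkest or lightest tile in its row, depending on the fill scale.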

Looking at the total duration spent in each month, broken down by weekday.

Code
ggplot(data = all_datasets_2021) + geom_col(mapping = aes(x = month, y = as.numeric(as.difftime(total_mins_spent)),
  fill = day_of_journey)) + 
  scale_fill_manual(values = c("blue", "orange", "darkgreen", "violet", "red", "brown", "black")) + 
  labs(title = "Monthly time duration spent", subtitle = "Followed By Weekday", x = "Month",
  y = "Total Time Duration", fill = "Day of Journey") + 
  theme(axis.text.x = element_text(angle = 40, vjust = 0.5, hjust=1))

Overview of total distance travelled by each member type in every month.

Code
ggplot(data = all_datasets_2021) + aes(x = month, y = total_distance, fill = member_casual) + 
  geom_bar(stat ="identity",position = "dodge") + scale_fill_manual(values = c("blue","orange"))+
  labs(title = "Distance Travelled by Month", subtitle = "Followed by Member Type", x = " Month",
       y = "Total Distance", fill = "Member Type") +  
  theme(axis.text.x = element_text(angle = 40, vjust = 0.5, hjust=1),
        plot.background = element_rect(fill = "#D2B48C"))

Looking at the rideable type distribution across the weekdays of each month.

Code
ggplot(data = all_datasets_2021) + aes(x = day_of_journey, y = month) + 
  geom_col(aes(fill = rideable_type), width = .5, position = 'identity') + 
  scale_fill_manual(values = c("red", "blue", "darkgreen")) +
  labs(title = "Number of Rides Per Week", subtitle = "Followed By Rideable Type",
       x = "Day Of Journey", y = "Month", fill = "Rideable Type") +
  # theme_classic() comes first so it does not reset the axis text settings
  theme_classic() + theme(axis.text.x = element_text(angle = 50, vjust = 0.5, hjust = 1))

Overview of the monthly usage of each rideable type.

Code
ggplot(data = all_datasets_2021) + aes(x = month, fill = rideable_type) + 
  stat_count(width = 0.5) + scale_fill_manual(values = c("Dark Blue","Maroon","Purple")) +
labs( title = "Rideable type Popularity By Months", x = "Months", y = "Count", fill = "Rideable Type") + 
  theme(axis.text.x = element_text(angle = 40, vjust = 0.5, hjust = 1))

8 MAP VISUALS

[Map figures: Complete Map; Start Station Name; End Station Name]

9 ACT PHASE

This crucial phase will be carried out by the executive team, the Director of Marketing (Lily Moreno) and the Marketing Analytics team, based on the analysis above.

10 CONCLUSION

    • Cyclistic recorded more rides by subscribed members than by casual riders. Based on the member_casual counts in the analysis, members account for about 56.9% of rides and casual riders for about 43.1%, a gap of roughly 13.7 percentage points.

    • Casual riders mostly ride with Cyclistic at the end of the week: Friday, Saturday and Sunday.

    • Subscribed members generally ride on weekdays, Monday to Thursday, but remain active over the weekend as well.

    • Based on twelve months of data, casual riders took longer rides than subscribed members, and the month with the highest total time spent was July.

    • The distribution of rideable types is highly unbalanced: docked bikes are the least used, and only one docked-bike ride was recorded for a subscribed member in the entire year, compared with 247,000 by casual riders.

    • The month-wise distribution shows that Saturdays and Sundays attract more riders than the other weekdays, and this is consistent across every month.

    • Peak hours are mostly from 5 o’clock to 7 o’clock in the evening, for both departures and arrivals and for both member types.

    • Of the three rideable options, the classic bike is the most used, followed by the electric bike and then the docked bike, which subscribed members in particular barely use.
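
The comparison in the first point can be reproduced from the ride counts reported by table(all_datasets_2021$member_casual) in the analysis phase. The following is a small sketch, assuming the comparison is meant as the difference in each rider type's share of all rides; the object names are chosen here for illustration.

```r
# Ride counts copied from the table(all_datasets_2021$member_casual) output
rides <- c(casual = 1859887, member = 2451428)

# Share of all rides taken by each rider type
ride_share <- rides / sum(rides)

# Gap between the two shares, in percentage points
gap_points <- 100 * (ride_share[["member"]] - ride_share[["casual"]])

print(round(100 * ride_share, 2))
print(round(gap_points, 2))
```

Note that these figures describe rides rather than individual customers, since one rider can take many rides.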

11 DELIVERABLE

    • From the detailed analysis, it was learned that we need to cater to two different sets of casual riders: first, those who use Cyclistic services in a regular, purposeful manner, and second, those who use them only in an emergency or once in a while.

    • To convert the maximum number of casual riders, personalized ads and campaigns are essential.

    • It is also important to offer the services with multiple unique value propositions, each tailored to the type of customer it is promoted to.

    • It is better to promote benefits that matter to most people today, such as health, and then break health down into specific messages, such as protection from cancer, arthritis, stroke, etc., aimed at specific groups of casual riders.

    • First, introduce a one-month trial plan with all yearly perks and benefits, and then start charging on a monthly or annual basis. Also, use the oldest trick in the book: collect payment information in advance and enable auto-payment on a monthly, quarterly or yearly basis.

    • Remove the one-day pass and introduce a minimum monthly plan with all the benefits that an annual member gets. Price it higher than the annual plan, with discounting applied in a descending manner.

    • To enable the auto-debit feature, assure users that their data is absolutely safe with Cyclistic, and add a tagline saying “We are never hard on commitments”.

    • The overall distribution of bikes is unbalanced: docked bikes are hardly used by members and are mostly used by casual riders due to unavailability. The company should consider converting all rideable types to electric bikes with different specifications and variants, as the future is electric.

    • To make the subscription plan even more attractive, some own-brand gear could be introduced, or a partnership with a renowned brand could also work well.

    • Try hosting or sponsoring big events that promote Cyclistic and its services, and introduce flash sales on special events or occasions, as this may help attract more customers.

12 RESOURCES

    • Stack Overflow.
    • RStudio and Kaggle community.
    • Dataset was made available by Motivate International Inc.