Cyclistic Bike-Share Analysis

About the Case Study.

This case study is a capstone project done in pursuance of the Google Data Analytics Certificate. The dataset was provided by the Cyclistic Marketing Analytics team. Cyclistic is a bike-share program that features over 5,800 bicycles and 600 docking station. Cyclistics offers various options ranging from reclining bikes to hand tricycles and cargo bikes to make bike-sharing more inclusive for people with disabilities and riders who can’t use a standard two-wheeled bike.

The business task is to create a marketing strategy aimed at getting more casual riders who pay for a full-day or single-ride pass to buy the membership subscription (a plan that gives annual access to the Cyclistics bike-share services). To do this the marketing analytics team need to provide a detailed analysis and report their findings to show how casual riders and members use cyclistic rides differently and why casual riders would buy the annual membership plan. As part of this task I will be analyzing and visualizing insights from a 12-month time period of the historical bike trip data to show how members and casual riders differ and the trends within that time.

Documenting the Analysis Process.

Collecting and load the dataset.
Cleaning the dataset.
Data Aggregation.
Visualizing the insights.
Draw conclusions and share suggestions.

For this project, I used the R Studio for the analysis process. I have chosen the R Studio because it affords me the chance to collect, clean, manipulate, visualize and create report in one environment. With the R Markdown document, I can also include code chunks that were used in the analysis process.

Load packages

To carry out the various data analysis process in R Studio, I will need to load some packages that will enable me load, clean, manipulate and visualize the data. Some of these packages include

tidyvesre
dpylr
lubridate
ggplot2
rmakrdown I will load the various packages. *I have installed the relevant packages. They are not documented in this file.

Import the datasets.

The datasets are saved in .csv format on my local machine. To import them into R Studio, I used the read.csv() function.

Jan_rides <- read.csv('Jan-2021.csv')
Feb_rides <- read.csv('Feb-2021.csv')
Mar_rides <-read.csv('Mar-2021.csv')
Apr_rides<-read.csv('Apr-2021.csv')
May_rides<-read.csv('May-2021.csv')
Jun_rides<-read.csv('Jun-2021.csv')
Jul_rides<- read.csv('Jul-2021.csv')
Aug_rides<- read.csv('Aug-2021.csv')
Sep_rides<-read.csv('Sep-2021.csv')
Oct_rides<-read.csv('Oct-2021.csv')
Nov_rides<-read.csv('Nov-2021.csv')
Dec_rides<-read.csv('Dec-2021.csv')

To merge the various datasets into a single dataset I used the rbind() function.

Bike_rides<-rbind(Jan_rides,Feb_rides,Mar_rides,Apr_rides,May_rides,Jun_rides,Jul_rides,Aug_rides,Sep_rides,Oct_rides,Nov_rides,Dec_rides)

To inspect and familiarize with the dataset, we will use the colnames(), head(), View() function.

head(Bike_rides)

##            ride_id rideable_type          started_at            ended_at
## 1 E19E6F1B8D4C42ED electric_bike 2021-01-23 16:14:19 2021-01-23 16:24:44
## 2 DC88F20C2C55F27F electric_bike 2021-01-27 18:43:08 2021-01-27 18:47:12
## 3 EC45C94683FE3F27 electric_bike 2021-01-21 22:35:54 2021-01-21 22:37:14
## 4 4FA453A75AE377DB electric_bike 2021-01-07 13:31:13 2021-01-07 13:42:55
## 5 BE5E8EB4E7263A0B electric_bike 2021-01-23 02:24:02 2021-01-23 02:24:45
## 6 5D8969F88C773979 electric_bike 2021-01-09 14:24:07 2021-01-09 15:17:54
##           start_station_name start_station_id end_station_name end_station_id
## 1 California Ave & Cortez St            17660                                
## 2 California Ave & Cortez St            17660                                
## 3 California Ave & Cortez St            17660                                
## 4 California Ave & Cortez St            17660                                
## 5 California Ave & Cortez St            17660                                
## 6 California Ave & Cortez St            17660                                
##   start_lat start_lng end_lat end_lng member_casual
## 1  41.90034 -87.69674   41.89  -87.72        member
## 2  41.90033 -87.69671   41.90  -87.69        member
## 3  41.90031 -87.69664   41.90  -87.70        member
## 4  41.90040 -87.69666   41.92  -87.69        member
## 5  41.90033 -87.69670   41.90  -87.70        casual
## 6  41.90041 -87.69676   41.94  -87.71        casual

*The head() function returns a result showing the first 6 rows of the dataset and metadata such as the column names and the datatype.

Cleaning the dataset

Now that I have successfully loaded the dataset, I will go through it to inspect for anomalies. Some things I need to do at this stage of the analysis process inlclude

Remove duplicate rows from the data
Rename certain columns
Change the format of the time columns from ‘char’ to month and time. This will help us carry out calculations and help us notice trends in dataset.
Create month and day column
Create a calculated column (travel_time)

Removing duplicate rows.

To check for duplicate columns I will use the distinct() function from the dplyr package. To this, I will run the code below after installing the package then run the distinct() function.

library(dplyr)

## Warning: package 'dplyr' was built under R version 4.2.1

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

The syntax distinct(Bike_rides) returned the same number of rows, therefore there were no duplicate data.

I have left out the query results due to the magnitude of the data

Renaming Columns and creating a new data frame.

To rename the columns, I will use the rename() function. I will also use the assignment operator (<-) to create a new data frame.

Bike_rides_2021<-rename(Bike_rides, 'Bike_type'= 'rideable_type', 'Subscription'= 'member_casual', 'Start_time'= 'started_at', 'End_time'= 'ended_at')

I will use to head() function to check the new data frame

head(Bike_rides_2021)

##            ride_id     Bike_type          Start_time            End_time
## 1 E19E6F1B8D4C42ED electric_bike 2021-01-23 16:14:19 2021-01-23 16:24:44
## 2 DC88F20C2C55F27F electric_bike 2021-01-27 18:43:08 2021-01-27 18:47:12
## 3 EC45C94683FE3F27 electric_bike 2021-01-21 22:35:54 2021-01-21 22:37:14
## 4 4FA453A75AE377DB electric_bike 2021-01-07 13:31:13 2021-01-07 13:42:55
## 5 BE5E8EB4E7263A0B electric_bike 2021-01-23 02:24:02 2021-01-23 02:24:45
## 6 5D8969F88C773979 electric_bike 2021-01-09 14:24:07 2021-01-09 15:17:54
##           start_station_name start_station_id end_station_name end_station_id
## 1 California Ave & Cortez St            17660                                
## 2 California Ave & Cortez St            17660                                
## 3 California Ave & Cortez St            17660                                
## 4 California Ave & Cortez St            17660                                
## 5 California Ave & Cortez St            17660                                
## 6 California Ave & Cortez St            17660                                
##   start_lat start_lng end_lat end_lng Subscription
## 1  41.90034 -87.69674   41.89  -87.72       member
## 2  41.90033 -87.69671   41.90  -87.69       member
## 3  41.90031 -87.69664   41.90  -87.70       member
## 4  41.90040 -87.69666   41.92  -87.69       member
## 5  41.90033 -87.69670   41.90  -87.70       casual
## 6  41.90041 -87.69676   41.94  -87.71       casual

Removing Columns

For this analysis process, I will not need all the columns so I will delete certain columns, some of this columns include; * started_at * ended_at * start_station_id * end_station_id * start_lat * start_lng * end_lat * end_lng To delete columns, I used the syntax below

Bike_rides_2021 <-Bike_rides_2021 %>% select(-start_station_id,-end_station_id,-start_lat,-start_lng,-end_lat,-end_lng)

‘%>%’ in the syntax above is called a ‘pipe operator’.

I will run the head(Bike_rides_2021) syntax again to confirm that the columns have been deleted

head(Bike_rides_2021)

##            ride_id     Bike_type          Start_time            End_time
## 1 E19E6F1B8D4C42ED electric_bike 2021-01-23 16:14:19 2021-01-23 16:24:44
## 2 DC88F20C2C55F27F electric_bike 2021-01-27 18:43:08 2021-01-27 18:47:12
## 3 EC45C94683FE3F27 electric_bike 2021-01-21 22:35:54 2021-01-21 22:37:14
## 4 4FA453A75AE377DB electric_bike 2021-01-07 13:31:13 2021-01-07 13:42:55
## 5 BE5E8EB4E7263A0B electric_bike 2021-01-23 02:24:02 2021-01-23 02:24:45
## 6 5D8969F88C773979 electric_bike 2021-01-09 14:24:07 2021-01-09 15:17:54
##           start_station_name end_station_name Subscription
## 1 California Ave & Cortez St                        member
## 2 California Ave & Cortez St                        member
## 3 California Ave & Cortez St                        member
## 4 California Ave & Cortez St                        member
## 5 California Ave & Cortez St                        casual
## 6 California Ave & Cortez St                        casual

Changing the time and date column

For us to easily manipulate the data and carry out calculations on the date and time column (‘Start_time’ & ‘End_time’), we will need to change the data type from ‘CHAR’ to ‘DATETIME’. To do this we will need to use the mutate() function in the lubridate package to convert both columns to POSIXct. I have added the code chunk used below.

library(lubridate)

## Warning: package 'lubridate' was built under R version 4.2.1

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Bike_rides_2021 <- Bike_rides_2021 %>% mutate(Start_time=ymd_hms(Start_time))%>% 
  relocate(Start_time) %>%
  mutate(End_time = ymd_hms(End_time)) %>%
  relocate(End_time) %>%
  relocate(ride_id)

head(Bike_rides_2021)

##            ride_id            End_time          Start_time     Bike_type
## 1 E19E6F1B8D4C42ED 2021-01-23 16:24:44 2021-01-23 16:14:19 electric_bike
## 2 DC88F20C2C55F27F 2021-01-27 18:47:12 2021-01-27 18:43:08 electric_bike
## 3 EC45C94683FE3F27 2021-01-21 22:37:14 2021-01-21 22:35:54 electric_bike
## 4 4FA453A75AE377DB 2021-01-07 13:42:55 2021-01-07 13:31:13 electric_bike
## 5 BE5E8EB4E7263A0B 2021-01-23 02:24:45 2021-01-23 02:24:02 electric_bike
## 6 5D8969F88C773979 2021-01-09 15:17:54 2021-01-09 14:24:07 electric_bike
##           start_station_name end_station_name Subscription
## 1 California Ave & Cortez St                        member
## 2 California Ave & Cortez St                        member
## 3 California Ave & Cortez St                        member
## 4 California Ave & Cortez St                        member
## 5 California Ave & Cortez St                        casual
## 6 California Ave & Cortez St                        casual

We have changed the datatype for the ‘Start_time’ and ‘End_time’ columns, now we can carry out calculations the columns to analyze travel time.

Creating Month and Day Column

We will also create a new column for Month and Day, this will help us monitor trends that took place with the days of the week and the months. To do this we will use the mutate() function again.

Bike_rides_2021 <- Bike_rides_2021 %>% mutate(Months = month(Start_time, label = TRUE))

Bike_rides_2021 <- Bike_rides_2021 %>% mutate(Days_of_the_week = weekdays(Start_time))

Now we have modified the dataframe, we will examine it with the head()function

head(Bike_rides_2021)

##            ride_id            End_time          Start_time     Bike_type
## 1 E19E6F1B8D4C42ED 2021-01-23 16:24:44 2021-01-23 16:14:19 electric_bike
## 2 DC88F20C2C55F27F 2021-01-27 18:47:12 2021-01-27 18:43:08 electric_bike
## 3 EC45C94683FE3F27 2021-01-21 22:37:14 2021-01-21 22:35:54 electric_bike
## 4 4FA453A75AE377DB 2021-01-07 13:42:55 2021-01-07 13:31:13 electric_bike
## 5 BE5E8EB4E7263A0B 2021-01-23 02:24:45 2021-01-23 02:24:02 electric_bike
## 6 5D8969F88C773979 2021-01-09 15:17:54 2021-01-09 14:24:07 electric_bike
##           start_station_name end_station_name Subscription Months
## 1 California Ave & Cortez St                        member    Jan
## 2 California Ave & Cortez St                        member    Jan
## 3 California Ave & Cortez St                        member    Jan
## 4 California Ave & Cortez St                        member    Jan
## 5 California Ave & Cortez St                        casual    Jan
## 6 California Ave & Cortez St                        casual    Jan
##   Days_of_the_week
## 1         Saturday
## 2        Wednesday
## 3         Thursday
## 4         Thursday
## 5         Saturday
## 6         Saturday

Looking closely at the query results above, we can see that there are two new columns ‘Months’ and ‘Days_of_the_week’. Now that we have these new columns and we have changed the datatype of the ‘Start_time’ and ‘End_time’ columns, we can now find trends like

Average travel time spent by the different subscription type per month
Average travel time spent by the different subscription type daily
Number of rides taken monthly by the different subscription type.

Data Aggregation

Now that we have cleaned and organized our data, we will now do some data aggregation. Some data aggregation we will be carrying out in this analysis include;

Total trips taken .
Total trips taken by the different types of subscribers.
Total trips taken with different bike types.
Total trips taken per month.
Total trips taken by the different subscribers per month.
Total trips taken by the different subscribers per day.
Average travel time of different type of subscribers.
Average travel time of different bike types.
Average travel time of different subscribers monthly.

Some function we will be using for the aggregation include

summarise()
count()
arrange()
group_by()

Total trips taken

Bike_rides_2021 %>%
  summarise(count = n())

##     count
## 1 5595063

From the query results we see that Cyclistic recorded a total of 5,595,063 trip from January to December 2021.

Total trips per month

Monthly_Trips <- Bike_rides_2021 %>%
  count(Months)%>%
  arrange(Months)

Bike_rides_2021 %>%
  count(Months)%>%
  arrange(Months)

##    Months      n
## 1     Jan  96834
## 2     Feb  49622
## 3     Mar 228496
## 4     Apr 337230
## 5     May 531633
## 6     Jun 729595
## 7     Jul 822410
## 8     Aug 804352
## 9     Sep 756147
## 10    Oct 631226
## 11    Nov 359978
## 12    Dec 247540

Total trips taken with different biike types.

Bike_rides_2021 %>%
  group_by(Bike_type) %>%
  summarise(count = n())%>%
  arrange(desc(count))

## # A tibble: 3 × 2
##   Bike_type       count
##   <chr>           <int>
## 1 classic_bike  3251028
## 2 electric_bike 2031692
## 3 docked_bike    312343

This result shows us that majority of Cyclistic customers used the Classic Bikes for their rides in 2021. About 3 million users used that option. One reason could be that they are more familiar with it and therefore more comfortable using that option.

Total trips taken by the different types of subscribers.

Bike_rides_2021 %>%
  group_by(Subscription) %>%
  summarise(count = n())%>%
  arrange(desc(count))

## # A tibble: 2 × 2
##   Subscription   count
##   <chr>          <int>
## 1 member       3066058
## 2 casual       2529005

From the result we can tell that majority of the customers that used the Cyclistic Bike share services in 2021 are customers who have subscribed to the membership plan. Let’s take a look at the total number of trips the different type of subscribers monthly. To get this information from the dataset we will run the syntax below

Monthly_Rides_Subscription<-Bike_rides_2021 %>% 
  group_by(Subscription) %>%
  count(Months)%>%
  arrange(Months)

This query result shows us the breakdown of rides taken by ‘Member’ and ‘Casual’ subscribers monthly in the year. A visualization will help us make more sense of this information.

Total trips taken by the different subscribers per day.

Bike_rides_2021 %>% 
  group_by (Subscription) %>%
  count (Days_of_the_week)%>%
  arrange (Days_of_the_week)

## # A tibble: 14 × 3
## # Groups:   Subscription [2]
##    Subscription Days_of_the_week      n
##    <chr>        <chr>             <int>
##  1 casual       Friday           364080
##  2 member       Friday           446428
##  3 casual       Monday           286376
##  4 member       Monday           416212
##  5 casual       Saturday         558000
##  6 member       Saturday         433047
##  7 casual       Sunday           481143
##  8 member       Sunday           376142
##  9 casual       Thursday         286064
## 10 member       Thursday         451524
## 11 casual       Tuesday          274392
## 12 member       Tuesday          465513
## 13 casual       Wednesday        278950
## 14 member       Wednesday        477192

In the visualization section of this document we will see the trends in these result clearly.

Now that we have seen done some aggregation on the data to see the Total trips in within various time frames, we will go on to calculate the averages

Average Travel time of different type of subscribers (in Minutes)

Bike_rides_2021 %>% group_by(Subscription) %>%
  summarise(Travel_time = mean(minute(seconds_to_period(End_time - Start_time))))

## # A tibble: 2 × 2
##   Subscription Travel_time
##   <chr>              <dbl>
## 1 casual              18.0
## 2 member              12.0

We can see that casual riders had an average travel time of 18 minutes. This could mean that they rode longer or traveled farther distances

Average Travel time of different bike types

Bike_rides_2021 %>% group_by(Bike_type) %>%
  summarise(Travel_time = mean(minute(seconds_to_period(End_time - Start_time))))%>%
  arrange(Travel_time)

## # A tibble: 3 × 2
##   Bike_type     Travel_time
##   <chr>               <dbl>
## 1 electric_bike        13.4
## 2 classic_bike         14.6
## 3 docked_bike          24.4

The query results show that electric bikes have the shortest travel time, which means that it could be the fastest of the 3 options in the Cyclistic Bike-Sharing System. This could be as a result of the fact that electric bikes have an electric motor that aids propulsion.

Average Travel time of different type of subscribers monthly.

We want to tell what the travel time of the different subscribers was in the various month

 Bike_rides_2021 %>% group_by(Months,Subscription) %>% 
  summarise(Avg_travel_time=mean(minute(seconds_to_period(End_time - Start_time))))

## `summarise()` has grouped output by 'Months'. You can override using the
## `.groups` argument.

## # A tibble: 24 × 3
## # Groups:   Months [12]
##    Months Subscription Avg_travel_time
##    <ord>  <chr>                  <dbl>
##  1 Jan    casual                  15.2
##  2 Jan    member                  11.1
##  3 Feb    casual                  18.2
##  4 Feb    member                  12.7
##  5 Mar    casual                  19.7
##  6 Mar    member                  12.5
##  7 Apr    casual                  19.3
##  8 Apr    member                  12.9
##  9 May    casual                  19.7
## 10 May    member                  12.9
## # … with 14 more rows

Avg_travel_time_sub <- Bike_rides_2021 %>% group_by(Months,Subscription) %>% 
  summarise(Avg_travel_time=mean(minute(seconds_to_period(End_time - Start_time))))

## `summarise()` has grouped output by 'Months'. You can override using the
## `.groups` argument.

Visualizing Insights

We have done some cleaning and data aggregation, we now need to come up with visualization of some insights give the rest of the team a better understanding of the data and help them make decision from the findings. We will be plotting our visualizations using the ggplot() function from the ggplot2 package.

Note that packages have been pre-loaded

Some visualizations we will be creating include;

Total number of trips per month.
Total trips by Subscribers.
Total monthly trips by Subscription type.
Total trip by bike types.
Total trips by bike type and subscribers.
Average daily travel time by different subscribers.
Average demand per hour

Total trips per month

library(ggplot2)

## Warning: package 'ggplot2' was built under R version 4.2.1

options(scipen = 999)

Monthly_Trips %>%
  ggplot(aes(x=Months, y = n,fill = n))+
  geom_col(position = 'dodge')+
  labs(x= 'Month', y = 'Total trips', title = 'Total trips per month', fill = 'Trip Count')+
  scale_y_continuous(breaks = c(100000,200000,300000,400000,500000,600000,700000,800000))+
  scale_x_discrete(limits = c(Monthly_Trips$Months))

The visualization shows that the highest number of rides were recorded during the months of June to September.

Total trips by Subscription type

ggplot(data = Bike_rides_2021)+
  geom_bar(mapping = aes(x=Subscription, fill= Subscription))+
  labs(x= 'Subscription Type', y = 'Total trips', title = 'Total trips by Subscription Type', fill = 'Subscription Type')+
  scale_y_continuous(breaks = c(500000,1000000,1500000,2000000,2500000,3000000,3500000))

Total Monthly trips by Subscription Type

Monthly_Rides_Subscription %>%
  ggplot(aes(x=Months, y = n, fill = Subscription))+
  geom_col(position = 'dodge')+
  labs(x= 'Months', y= 'Total trips', fill = 'Subscription Type', title = 'Total Monthly trips by Subscription Type')+
  scale_y_continuous(breaks = c(50000,100000,150000,200000,250000,300000,350000,400000,450000))+
  scale_x_discrete(limits =c(Monthly_Rides_Subscription$Months))

The graph above shows us the breakdown of the total trips taken each month according to the subscription type.

Total Trips by bike types

Bike_rides_2021 %>%
  ggplot(aes(x=Bike_type,fill= Bike_type))+
  geom_bar(position = 'dodge')+
  labs(x= 'Bike Type', y = 'Total trips', title = 'Total trips by bike types')+
  scale_y_continuous(breaks = c (500000,1000000,1500000,2000000,2500000,3000000))

Total trips by bike types and subsccribers

Bike_rides_2021 %>%
  ggplot(aes(x=Bike_type,fill= Subscription))+
  geom_bar(position = 'dodge')+
  labs(x= 'Bike Type', y = 'Total trips', title = 'Total trips by bike types and subscribers')+
  scale_y_continuous(breaks = c (500000,1000000,1500000,2000000,2500000,3000000))

The graph shows us that majority of Cyclistic customers opted for the classic bike with about 3 million riders using the option and about 2 million of these riders subscribed were subscribed as members. All riders that opted for the docked bike options were casual subscribers.

Now that we have visualized the total trips and compared trends based on different categories. We look at the average values to examine trends.

Average daily travel time by different subscribers. (minutes)

Average Daily Travel time

Avg_travel_time <-Bike_rides_2021 %>% group_by(Subscription,Days_of_the_week) %>% 
  summarise(Avg_travel_time=mean(minute(seconds_to_period(End_time - Start_time))))

## `summarise()` has grouped output by 'Subscription'. You can override using the
## `.groups` argument.

Avg_travel_time %>% 
  ggplot(aes(x = Days_of_the_week, y = Avg_travel_time, color= Subscription, shape = Subscription)) + geom_point(position = "dodge") + 
  labs(x= 'Day of Week', y='Average Travel Time', title='Daily Average Travel time by Subcription type', fill = 'Subscription type') +
  scale_y_continuous(breaks = c(5, 10, 15, 20), labels = c("5mins", "10mins", "15mins", "20mins")) +
  scale_x_discrete(limits = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

## Warning: Width not defined. Set with `position_dodge(width = ?)`

The result of the visualization above shows us that casual riders had an average travel time of about 20 mins per day and member riders had an average travel time of about 15 minutes per day.

We will also take a look at the daily demand per hour to notice trends and observe peak periods.

To do this we will need to determine the daily demand. We will need to run some syntax using the mutate() function and some other functions in the dplyr package.

First we will create an hour column from the ‘Start_time’ column.

Bike_rides_2021<-Bike_rides_2021%>%
  mutate(hours = hour(Start_time))

head(Bike_rides_2021)

##            ride_id            End_time          Start_time     Bike_type
## 1 E19E6F1B8D4C42ED 2021-01-23 16:24:44 2021-01-23 16:14:19 electric_bike
## 2 DC88F20C2C55F27F 2021-01-27 18:47:12 2021-01-27 18:43:08 electric_bike
## 3 EC45C94683FE3F27 2021-01-21 22:37:14 2021-01-21 22:35:54 electric_bike
## 4 4FA453A75AE377DB 2021-01-07 13:42:55 2021-01-07 13:31:13 electric_bike
## 5 BE5E8EB4E7263A0B 2021-01-23 02:24:45 2021-01-23 02:24:02 electric_bike
## 6 5D8969F88C773979 2021-01-09 15:17:54 2021-01-09 14:24:07 electric_bike
##           start_station_name end_station_name Subscription Months
## 1 California Ave & Cortez St                        member    Jan
## 2 California Ave & Cortez St                        member    Jan
## 3 California Ave & Cortez St                        member    Jan
## 4 California Ave & Cortez St                        member    Jan
## 5 California Ave & Cortez St                        casual    Jan
## 6 California Ave & Cortez St                        casual    Jan
##   Days_of_the_week hours
## 1         Saturday    16
## 2        Wednesday    18
## 3         Thursday    22
## 4         Thursday    13
## 5         Saturday     2
## 6         Saturday    14

Now that we have created the hour column, we will find the daily demand.

Daily_demands<- Bike_rides_2021%>%
  group_by(Subscription, hours)%>%
  dplyr::summarise(Start_time)%>%
  as.data.frame()

## `summarise()` has grouped output by 'Subscription', 'hours'. You can override
## using the `.groups` argument.

Next we will group the daily demand by the hour

Daily_demands_hour <- Daily_demands%>%
  group_by(hours)%>%
  count(Subscription)

To help identify the trends easily, we ill plot the results on a line graph.

options(scipen = 999)

The syntax above helps to convert the scale of axes from scientific numbers to standard notation

Daily_demands_hour%>%
  ggplot(aes(x = hours, y = n , color = Subscription, group = Subscription))+
  geom_line(position = 'dodge')+
  labs(x = 'Hours', y = 'Number of trips', title = 'Daily demand per hour')+
  scale_y_continuous(breaks = c(50000, 100000, 150000, 200000, 250000, 300000, 350000))+
  scale_x_discrete(limits = c(Daily_demands_hour$hours))

## Warning: Continuous limits supplied to discrete scale.
## Did you mean `limits = factor(...)` or `scale_*_continuous()`?

## Warning: Width not defined. Set with `position_dodge(width = ?)`

Drawing Conclusions

We have identified different trends from this analysis process of the Cyclistic Bike-share Data. From these trends around the travel time and the number of trips we can come to the following conclusions

Most users prefer the classic bike type, the member subscribers (users who subscribed to the annual payment plan) never used the docked bike.
Based on the the data, July and August recorded the highest number of trips, this is the peak of summer and is a possible reason for the spike at that period.
The lowest number of trips are recorded within the months of December and March, these months are the winter months and customers won’t be able to do a lot of riding due to the temperature and weather conditions.
We also see that casual users more time on bike trips; especially on Saturdays and Sundays. This could mean that casual users ride bikes for leisure or exercise.
The data also shows a spike in demand especially by the member subscribers around the hours of 7am - 8am and 4pm - 6pm daily; these periods are the resumption and closing times respectively. This could mean that member subscribers use the Cyclistic Bike-Share as an alternative means of transportation to work to avoid traffic congestion and other delays.

Suggestions

From the analysis and conclusions, here are some suggestions to help the Cyclistics Marketing Analytics team design a new marketing strategy to convert casual subscribers to members.

Since casual users ride picks up on weekends, the marketing team can include a weekend only annual subscription plan that includes certain incentives not available for casual subscribers at a lower price.
Modifications to subscription plans that see users paying relatively less as they ride longer.
Rewards and bonuses for users who subscribe to the annual plan; some of these rewards could be vouchers or subscription discounts.
Marketing campaigns and promos that will give discount to first time subscribers should be launched between June to August as the number of trips and demand starts to build up at that time. This will entice casual users and make them opt for the annual plan.

Cyclistic Bike-Share Analysis

Sunday Itodo

2022-08-03

About the Case Study.

Documenting the Analysis Process.

Load packages

Import the datasets.

Cleaning the dataset

Removing duplicate rows.

Renaming Columns and creating a new data frame.

Removing Columns

Changing the time and date column

Creating Month and Day Column

Data Aggregation

Total trips taken

Total trips per month

Total trips taken with different biike types.

Total trips taken by the different types of subscribers.

Total trips taken by the different subscribers per day.

Average Travel time of different type of subscribers (in Minutes)

Average Travel time of different bike types

Average Travel time of different type of subscribers monthly.

Visualizing Insights

Total trips per month

Total trips by Subscription type

Total Monthly trips by Subscription Type

Total Trips by bike types

Total trips by bike types and subsccribers

Average daily travel time by different subscribers. (minutes)

Drawing Conclusions

Suggestions