Cyclistic bike-share analysis

About the company

Business Task:

This project aims to maximize the number of annual Cyclistic memberships across 692 stations in Chicago.

• To understand how casual riders and annual members use Cyclistic bikes differently

• To design a new marketing strategy to convert casual riders into annual members

This project does not include:

• Implementing any solutions

• Any areas outside the 692 stations across Chicago, as defined by Cyclistic

• Historical trip data older than 12 months

Data Preparation

The data has been made available by Motivate International Inc. under this license. This is public data that you can use to explore how different customer types are using Cyclistic bikes. But note that data-privacy issues prohibit you from using riders’ personally identifiable information. This means that you won’t be able to connect pass purchases to credit card numbers to determine if casual riders live in the Cyclistic service area or if they have purchased multiple single passes.

Load libraries
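The library-loading chunk is not shown in this write-up. A minimal sketch follows, assuming the functions used below come from dplyr, tidyr, tibble and ggplot2 (pipes, drop_na(), tibble(), plots), lubridate (year(), hour()), skimr (skim_without_charts()) and Hmisc (capitalize()). The Hmisc attribution in particular is a guess, since more than one package exports a function with that name.

```r
# Packages assumed to provide the functions used in this analysis
# (an assumption: the original loading chunk is not shown).
pkgs <- c("dplyr", "tidyr", "tibble", "ggplot2", "lubridate", "skimr", "Hmisc")

# Report anything that is not installed instead of failing halfway through
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]
if (length(not_installed) > 0) {
  message("Install first: ", paste(not_installed, collapse = ", "))
}

# Attach the packages that are available
for (p in setdiff(pkgs, not_installed)) {
  library(p, character.only = TRUE)
}
```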

STEP 1: COLLECT DATA

Set up work directory

tripdata1 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202109-divvy-tripdata.csv',header=TRUE)
tripdata2 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202110-divvy-tripdata.csv',header=TRUE)
tripdata3 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202111-divvy-tripdata.csv',header=TRUE)
tripdata4 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202112-divvy-tripdata.csv',header=TRUE)
tripdata5 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202201-divvy-tripdata.csv',header=TRUE)
tripdata6 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202202-divvy-tripdata.csv',header=TRUE)
tripdata7 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202203-divvy-tripdata.csv',header=TRUE)
tripdata8 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202204-divvy-tripdata.csv',header=TRUE)
tripdata9 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202205-divvy-tripdata.csv',header=TRUE)
tripdata10 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202206-divvy-tripdata.csv',header=TRUE)
tripdata11 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202207-divvy-tripdata.csv',header=TRUE)
tripdata12 <-read.csv('C:\\Users\\Student226\\Downloads\\Data Analytics Capstone\\Trip Dataset\\202208-divvy-tripdata.csv',header=TRUE)

STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE

Combine the datasets into a single R data frame

The combined dataset contains the last 12 months of trip data (September 2021 - August 2022).

combined_csv = rbind(tripdata1, tripdata2, tripdata3, tripdata4, tripdata5, tripdata6, tripdata7, tripdata8, tripdata9, tripdata10, tripdata11, tripdata12)
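The twelve read.csv() calls and the rbind() above can also be written as a loop over the folder. The sketch below is only an illustration: it writes two tiny stand-in files (made-up rows, file names following the pattern used above) to a temporary folder, then reads and stacks every file that matches the pattern. The names data_dir, monthly and demo_combined are hypothetical.

```r
# Demo folder with two tiny stand-in files (made-up rows, not the real trips)
data_dir <- file.path(tempdir(), "trip_demo")
dir.create(data_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("casual", "member")),
          file.path(data_dir, "202109-divvy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "B1", member_casual = "member"),
          file.path(data_dir, "202110-divvy-tripdata.csv"), row.names = FALSE)

# Find every monthly file (sorted by name), read each one, then stack them
files <- list.files(data_dir, pattern = "-divvy-tripdata\\.csv$", full.names = TRUE)
monthly <- lapply(files, read.csv, header = TRUE)
demo_combined <- do.call(rbind, monthly)

nrow(demo_combined)
```

With the real data, data_dir would point at the folder holding the twelve monthly files. rbind() requires every file to have the same columns, which is also true of the explicit version above.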

To summarize my data

summary(combined_csv)

##    ride_id          rideable_type       started_at          ended_at        
##  Length:5883043     Length:5883043     Length:5883043     Length:5883043    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##  start_station_name start_station_id   end_station_name   end_station_id    
##  Length:5883043     Length:5883043     Length:5883043     Length:5883043    
##  Class :character   Class :character   Class :character   Class :character  
##  Mode  :character   Mode  :character   Mode  :character   Mode  :character  
##                                                                             
##                                                                             
##                                                                             
##                                                                             
##    start_lat       start_lng         end_lat         end_lng      
##  Min.   :41.64   Min.   :-87.84   Min.   :41.39   Min.   :-88.97  
##  1st Qu.:41.88   1st Qu.:-87.66   1st Qu.:41.88   1st Qu.:-87.66  
##  Median :41.90   Median :-87.64   Median :41.90   Median :-87.64  
##  Mean   :41.90   Mean   :-87.65   Mean   :41.90   Mean   :-87.65  
##  3rd Qu.:41.93   3rd Qu.:-87.63   3rd Qu.:41.93   3rd Qu.:-87.63  
##  Max.   :45.64   Max.   :-73.80   Max.   :42.37   Max.   :-87.50  
##                                   NA's   :5727    NA's   :5727    
##  member_casual     
##  Length:5883043    
##  Class :character  
##  Mode  :character  
##                    
##                    
##                    
##

skim_without_charts(combined_csv)

Data summary
Name	combined_csv
Number of rows	5883043
Number of columns	13
_______________________
Column type frequency:
character	9
numeric	4
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
ride_id	1	16	16	0	5883043
rideable_type	1	11	13	0	3
started_at	1	19	19	0	4912339
ended_at	1	19	19	0	4919944
start_station_name	1	0	64	884365	1439
start_station_id	1	0	44	884363	1273
end_station_name	1	0	64	946303	1453
end_station_id	1	0	44	946303	1282
member_casual	1	6	6	0	2

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
start_lat	0	1	41.90	0.05	41.64	41.88	41.90	41.93	45.64
start_lng	0	1	-87.65	0.03	-87.84	-87.66	-87.64	-87.63	-73.80
end_lat	5727	1	41.90	0.05	41.39	41.88	41.90	41.93	42.37
end_lng	5727	1	-87.65	0.03	-88.97	-87.66	-87.64	-87.63	-87.50

To get the structure of the data

glimpse(combined_csv)

## Rows: 5,883,043
## Columns: 13
## $ ride_id            <chr> "9DC7B962304CBFD8", "F930E2C6872D6B32", "6EF7213790…
## $ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", …
## $ started_at         <chr> "2021-09-28 16:07:10", "2021-09-28 14:24:51", "2021…
## $ ended_at           <chr> "2021-09-28 16:09:54", "2021-09-28 14:40:05", "2021…
## $ start_station_name <chr> "", "", "", "", "", "", "", "", "", "", "Clark St &…
## $ start_station_id   <chr> "", "", "", "", "", "", "", "", "", "", "TA13070001…
## $ end_station_name   <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ end_station_id     <chr> "", "", "", "", "", "", "", "", "", "", "", "", "",…
## $ start_lat          <dbl> 41.89000, 41.94000, 41.81000, 41.80000, 41.88000, 4…
## $ start_lng          <dbl> -87.68000, -87.64000, -87.72000, -87.72000, -87.740…
## $ end_lat            <dbl> 41.89, 41.98, 41.80, 41.81, 41.88, 41.88, 41.74, 41…
## $ end_lng            <dbl> -87.67, -87.67, -87.72, -87.72, -87.71, -87.74, -87…
## $ member_casual      <chr> "casual", "casual", "casual", "casual", "casual", "…

STEP 3: CLEAN UP AND MANIPULATION TO PREPARE FOR ANALYSIS

Make a copy of the data

combined_csv2 <- combined_csv
combined_csv2

To remove duplicates

combined_csv2 <- combined_csv2 %>% 
  distinct(ride_id, .keep_all = TRUE)

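To remove rows with NA values (at this point only end_lat and end_lng contain NAs; blank station names are empty strings and are handled further down)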

combined_csv2 <- drop_na(combined_csv2)

Remove columns that are irrelevant to my analysis (ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng)

The ride_id is unique for every ride, so I don't need it for my analysis.

The start_station_id and end_station_id columns duplicate the start_station_name and end_station_name columns.

The latitude and longitude columns are not needed for my analysis.

combined_csv2 <- combined_csv2 %>%
  select(-c(ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng))

To check available columns

colnames(combined_csv2)

## [1] "rideable_type"      "started_at"         "ended_at"          
## [4] "start_station_name" "end_station_name"   "member_casual"

To rename member_casual as user_type

colnames(combined_csv2)[6] = "user_type"

To get an overview of the entire dataframe and check for inconsistencies and missing values

skim_without_charts(combined_csv2)

Data summary
Name	combined_csv2
Number of rows	5877316
Number of columns	6
_______________________
Column type frequency:
character	6
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	empty	n_unique
rideable_type	1	11	13	0	3
started_at	1	19	19	0	4908405
ended_at	1	19	19	0	4915726
start_station_name	1	0	64	884365	1439
end_station_name	1	0	64	940576	1453
user_type	1	6	6	0	2

Replace blank with NA

combined_csv2[combined_csv2 == "" ] <- NA

Check the number of missing values in each column

colSums(is.na(combined_csv2))

##      rideable_type         started_at           ended_at start_station_name 
##                  0                  0                  0             884365 
##   end_station_name          user_type 
##             940576                  0
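The reason the blanks have to be converted first is that is.na() does not treat an empty string as missing. A small sketch on made-up station names (the data frame demo is hypothetical) shows the counts before and after the conversion:

```r
# Made-up rows: blanks are stored as empty strings, not as NA
demo <- data.frame(
  start_station_name = c("Clark St & Elm St", "", "Streeter Dr & Grand Ave"),
  end_station_name   = c("", "", "Clark St & Elm St"),
  stringsAsFactors = FALSE
)

# Before the conversion the blanks are invisible to is.na()
na_before <- colSums(is.na(demo))

# Same replacement as above: every empty string becomes NA
demo[demo == ""] <- NA
na_after <- colSums(is.na(demo))

na_before
na_after
```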

Fill NA values using the mode

Function that finds the mode (most frequent value) of a column, ignoring NA values

mode <- function(x){
  ux <-na.omit(unique(x))
  tab <- tabulate(match(x, ux)); ux[tab == max(tab)]
}

Test the function on both station columns

mode(combined_csv2$start_station_name)

## [1] "Streeter Dr & Grand Ave"

mode(combined_csv2$end_station_name)

## [1] "Streeter Dr & Grand Ave"

Replace the NA values with the mode (the station columns are character data)

combined_csv2$start_station_name[is.na(combined_csv2$start_station_name)] <- mode(combined_csv2$start_station_name)
combined_csv2$end_station_name[is.na(combined_csv2$end_station_name)] <- mode(combined_csv2$end_station_name)
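The fill step can be traced on a short made-up vector. This sketch uses a hypothetical helper, stat_mode(), which follows the same idea as the function above but has a different name (so it does not mask base R's mode()) and returns a single value when several values tie.

```r
# Hypothetical helper: most frequent non-missing value, first one on ties
stat_mode <- function(x) {
  ux <- unique(x[!is.na(x)])          # distinct non-missing values
  counts <- tabulate(match(x, ux))    # NA entries of x are not counted
  ux[which.max(counts)]               # first value with the highest count
}

# Made-up station names with two missing entries
stations <- c("A St", NA, "B St", "A St", NA, "B St", "A St")

fill_value <- stat_mode(stations)
stations[is.na(stations)] <- fill_value

table(stations)
```

Because every blank receives the same name, the most common station gains all of the imputed rides, which is worth keeping in mind when reading the station rankings later on.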

To get an overview of the entire dataframe and check for inconsistencies and completeness

skim_without_charts(combined_csv2)

Data summary
Name	combined_csv2
Number of rows	5877316
Number of columns	6
_______________________
Column type frequency:
character	6
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
rideable_type	1	11	13	3
started_at	1	19	19	4908405
ended_at	1	19	19	4915726
start_station_name	1	7	64	1438
end_station_name	1	9	64	1452
user_type	1	6	6	2

Make a copy of my data frame

trip_data <- combined_csv2

To change the started_at and ended_at columns from character to date-time (POSIXct).

trip_data$started_at = as.POSIXct(trip_data$started_at)
trip_data$ended_at = as.POSIXct(trip_data$ended_at)
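as.POSIXct() as called above relies on the session's default time zone and on the default date-time format. A minimal sketch that spells both out, using the first ride shown in the glimpse() output; the time zone America/Chicago is an assumption, since the source does not state one.

```r
# First ride from the glimpse() output, parsed with an explicit format and zone
# (the time zone is an assumption, not something the data states)
fmt <- "%Y-%m-%d %H:%M:%S"
started <- as.POSIXct("2021-09-28 16:07:10", format = fmt, tz = "America/Chicago")
ended   <- as.POSIXct("2021-09-28 16:09:54", format = fmt, tz = "America/Chicago")

class(started)

# Once both columns are date-times, their difference is a ride length
ride_mins <- as.numeric(difftime(ended, started, units = "mins"))
ride_mins   # 2 min 44 s, i.e. 164 / 60 minutes
```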

Capitalizing the first letter of every text value in my data frame for consistency.

trip_data <- data.frame(trip_data)
trip_data$user_type <- capitalize(trip_data$user_type)
trip_data$rideable_type <- capitalize(trip_data$rideable_type)
trip_data$start_station_name<-capitalize(trip_data$start_station_name)
trip_data$end_station_name<-capitalize(trip_data$end_station_name)

Inspect the data frame

colnames(trip_data)  #List of column names

## [1] "rideable_type"      "started_at"         "ended_at"          
## [4] "start_station_name" "end_station_name"   "user_type"

nrow(trip_data)  #How many rows are in data frame?

## [1] 5877316

dim(trip_data)  #Dimensions of the data frame?

## [1] 5877316       6

str(trip_data)  #See list of columns and data types (numeric, character, etc)

## 'data.frame':    5877316 obs. of  6 variables:
##  $ rideable_type     : chr  "Electric_bike" "Electric_bike" "Electric_bike" "Electric_bike" ...
##  $ started_at        : POSIXct, format: "2021-09-28 16:07:10" "2021-09-28 14:24:51" ...
##  $ ended_at          : POSIXct, format: "2021-09-28 16:09:54" "2021-09-28 14:40:05" ...
##  $ start_station_name: chr  "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" ...
##  $ end_station_name  : chr  "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" "Streeter Dr & Grand Ave" ...
##  $ user_type         : chr  "Casual" "Casual" "Casual" "Casual" ...

summary(trip_data)  #Statistical summary of data. Mainly for numeric

##  rideable_type        started_at                    
##  Length:5877316     Min.   :2021-09-01 00:00:06.00  
##  Class :character   1st Qu.:2021-11-06 13:36:49.75  
##  Mode  :character   Median :2022-05-07 12:05:47.00  
##                     Mean   :2022-03-22 05:15:59.85  
##                     3rd Qu.:2022-07-06 15:54:58.25  
##                     Max.   :2022-08-31 23:59:39.00  
##     ended_at                      start_station_name end_station_name  
##  Min.   :2021-09-01 00:00:41.00   Length:5877316     Length:5877316    
##  1st Qu.:2021-11-06 13:56:35.75   Class :character   Class :character  
##  Median :2022-05-07 12:27:11.50   Mode  :character   Mode  :character  
##  Mean   :2022-03-22 05:33:16.32                                        
##  3rd Qu.:2022-07-06 16:10:50.00                                        
##  Max.   :2022-09-02 00:27:01.00                                        
##   user_type        
##  Length:5877316    
##  Class :character  
##  Mode  :character  
##                    
##                    
##

head(trip_data)  #See the first 6 rows of data frame

tail(trip_data)  #See the last 6 rows of data frame

Make a copy of my data frame

trip_data2 <- trip_data

There are a few problems we will need to fix:

Add columns that list the date, month, year, day of the week, ride length, day type and time period of each ride

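# Notable days as zero-padded "mm-dd" strings so they match format(started_at, "%m-%d").
# Movable holidays use the dates that fall inside September 2021 - August 2022.
Notable_days <- tibble(
  day = c("New Year's Day","Martin Luther King Jr. Day","Valentine's Day","Presidents' Day","Memorial Day","Independence Day","Labor Day",
          "Halloween","Veterans Day","Thanksgiving Day","Christmas Day"),
  date = c("01-01","01-17","02-14","02-21","05-30","07-04","09-06","10-31","11-11","11-25","12-25"))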


trip_data2 <- trip_data2 %>% 
  mutate(date = as.Date(format(started_at, "%Y-%m-%d")),  # calendar date of the ride start
         month = months(started_at),                      # month name
         year = year(started_at),
         weekdays = weekdays(started_at))                 # day-of-week name

Add a “ride_length” calculation to trip_data2 (in minutes), plus the type of day and the time period

trip_data2$ride_length <- difftime(trip_data2$ended_at,trip_data2$started_at, units = "mins")

trip_data2 <- trip_data2 %>% 
  mutate(day_type = case_when(
    format(started_at, "%m-%d") %in% Notable_days$date ~ 'Holidays',
    weekdays %in% c('Saturday','Sunday') ~ 'Weekend', TRUE ~ 'Weekday'),
    time_period = case_when(
      hour(started_at) >= 5 & hour(started_at) < 12 ~ 'Morning', 
      hour(started_at) >= 12 & hour(started_at) < 16 ~ 'Afternoon', 
      hour(started_at) >= 16 & hour(started_at) <= 20 ~ 'Evening', 
      TRUE ~ 'Night')
  )
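The classification rules above can be checked on a few hand-picked timestamps. The sketch below restates them with base R ifelse() instead of case_when(), so it runs without extra packages; classify_period() and holiday_dates are hypothetical names, and only one notable day is included.

```r
# Hypothetical helper restating the time-of-day rules used above
classify_period <- function(h) {
  ifelse(h >= 5 & h < 12, "Morning",
  ifelse(h >= 12 & h < 16, "Afternoon",
  ifelse(h >= 16 & h <= 20, "Evening", "Night")))
}

# Hand-picked start times: a holiday Monday, a Saturday, and a Tuesday evening/night
starts <- as.POSIXct(c("2022-07-04 09:15:00", "2022-07-09 13:00:00",
                       "2022-07-12 20:59:00", "2022-07-12 21:00:00"), tz = "UTC")

hours <- as.integer(format(starts, "%H"))
holiday_dates <- c("07-04")                    # one notable day is enough here

# Holidays take priority over weekends; %u gives the ISO weekday (6 = Sat, 7 = Sun)
day_type <- ifelse(format(starts, "%m-%d") %in% holiday_dates, "Holidays",
            ifelse(format(starts, "%u") %in% c("6", "7"), "Weekend", "Weekday"))
time_period <- classify_period(hours)

data.frame(starts, day_type, time_period)
```

The last two rows show the boundary: a ride starting at 20:59 still counts as Evening, while one starting at 21:00 is Night.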

Convert “ride_length” from difftime to numeric so we can run calculations on the data

trip_data2$ride_length <- as.numeric(trip_data2$ride_length, units = "mins")
is.numeric(trip_data2$ride_length)

## [1] TRUE

Copy of my data

trip_data3 <- trip_data2

Identifying outliers

The data frame includes a few hundred entries where ride_length is negative.

Remove casual rides shorter than 1 minute or longer than 1440 minutes (24 hours).

Remove member rides shorter than 1 minute (60 seconds), keeping for now those longer than 1440 minutes.

Note: Customers who purchase single-ride or full-day passes are referred to as casual riders, so their rides should not last more than a day. I assumed members might keep a bike for more than a day.

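# Sort by ride length, longest first
trip_data3 <- trip_data3[order(-trip_data3$ride_length),]

# Drop rides under 1 minute (any user) and casual rides over 1440 minutes
trip_data3 <- trip_data3[!(trip_data3$ride_length < 1 | (trip_data3$ride_length > 1440 & trip_data3$user_type == "Casual")),]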

After removing rides shorter than 1 minute and keeping member rides longer than 1440 minutes, the highest ride length is 1499.933 mins. That is almost 60 minutes over the 24-hour mark, on just 21 rides. These could be late returns of bikes, which I cannot confirm because there is no client to ask. I will remove these 21 rides of more than 1440 mins to have bias-free data.

trip_data3 <- trip_data3[!(trip_data3$ride_length > 1440),]

There were 111,156 outliers: rides shorter than 1 min or longer than 1440 mins.
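The two filtering steps can be traced on a handful of made-up rides. In this sketch the data frame rides and its values are hypothetical; the conditions are the ones used above, with parentheses added to make the grouping explicit.

```r
# Made-up rides covering each case: too short, normal, long member ride,
# negative length, long casual ride
rides <- data.frame(
  user_type   = c("Casual", "Casual", "Member", "Member", "Casual"),
  ride_length = c(0.5, 30, 1499.9, -2, 2000)
)

# Step 1: drop rides under 1 minute (any user) and casual rides over 24 hours
step1 <- rides[!(rides$ride_length < 1 |
                 (rides$ride_length > 1440 & rides$user_type == "Casual")), ]

# Step 2: drop whatever still exceeds 24 hours (the long member rides)
step2 <- step1[!(step1$ride_length > 1440), ]

# Number of rides removed in total
nrow(rides) - nrow(step2)
```

Taken together, the two steps keep exactly the rides whose length is between 1 and 1440 minutes, regardless of user type.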

To get an overview of the entire dataframe and check for inconsistencies and completeness

skim_without_charts(trip_data3)

Data summary
Name	trip_data3
Number of rows	5766160
Number of columns	13
_______________________
Column type frequency:
character	8
Date	1
numeric	2
POSIXct	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
rideable_type	1	11	13	3
start_station_name	1	7	64	1436
end_station_name	1	9	64	1449
user_type	1	6	6	2
month	1	3	9	12
weekdays	1	6	9	7
day_type	1	7	8	3
time_period	1	5	9	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	2021-09-01	2022-08-31	2022-05-06	365

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
year	0	1	2021.66	0.47	2021	2021.00	2022.00	2022.00	2022.00
ride_length	0	1	17.27	30.26	1	6.28	10.93	19.55	1439.37

Variable type: POSIXct

skim_variable	n_missing	complete_rate	min	max	median	n_unique
started_at	0	1	2021-09-01 00:00:06	2022-08-31 23:59:39	2022-05-06 22:22:20	4830709
ended_at	0	1	2021-09-01 00:03:37	2022-09-01 19:10:49	2022-05-06 22:34:10	4837768

For my analysis I will need the following columns:

rideable_type, start_station_name, end_station_name, user_type, date, month, year, weekdays, ride_length, day_type, time_period.

Additional columns to drop before analysis.

clean_data <- trip_data3 %>%
  select(-c(started_at,ended_at))

To get an overview of the entire dataframe and check for inconsistencies and completeness

skim_without_charts(clean_data)

Data summary
Name	clean_data
Number of rows	5766160
Number of columns	11
_______________________
Column type frequency:
character	8
Date	1
numeric	2
________________________
Group variables	None

Variable type: character

skim_variable	complete_rate	min	max	n_unique
rideable_type	1	11	13	3
start_station_name	1	7	64	1436
end_station_name	1	9	64	1449
user_type	1	6	6	2
month	1	3	9	12
weekdays	1	6	9	7
day_type	1	7	8	3
time_period	1	5	9	4

Variable type: Date

skim_variable	n_missing	complete_rate	min	max	median	n_unique
date	0	1	2021-09-01	2022-08-31	2022-05-06	365

Variable type: numeric

skim_variable	n_missing	complete_rate	mean	sd	p0	p25	p50	p75	p100
year	0	1	2021.66	0.47	2021	2021.00	2022.00	2022.00	2022.00
ride_length	0	1	17.27	30.26	1	6.28	10.93	19.55	1439.37

STEP 4: CONDUCT DESCRIPTIVE ANALYSIS

mean(clean_data$ride_length)

## [1] 17.27148

median(clean_data$ride_length)

## [1] 10.93333

max(clean_data$ride_length)

## [1] 1439.367

min(clean_data$ride_length)

## [1] 1

Total number of rides by each usertype and rideable type

total_ride_by_user <-
clean_data %>% 
  group_by(user_type)  %>%  
  summarise(total_ride = n()) %>%
  mutate(per_of_ride = (round(total_ride / sum(total_ride),4 )) * 100) %>% 
  arrange(desc(total_ride))

total_ride_by_user

total_ride_by_rideabletype <-
clean_data %>% 
  group_by(rideable_type)  %>%  
  summarise(total_ride = n()) %>% 
  mutate(per_of_ride = (round(total_ride / sum(total_ride),4 )) * 100) %>%
  arrange(desc(total_ride))

total_ride_by_rideabletype

Total number of ride by each usertype per the type of ride

totalride_user_per_rideabletype <- 
clean_data %>% 
  group_by(rideable_type,user_type)  %>%  
  summarise(total_ride = n(),.groups = "drop") %>% 
  mutate(per_of_ride = (round(total_ride / sum(total_ride),4 )) * 100) %>%
  arrange(desc(total_ride))

totalride_user_per_rideabletype
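Because the grouping is dropped before mutate(), sum(total_ride) runs over every row, so per_of_ride in this table is each combination's share of all rides, not its share within a rideable type. A sketch on made-up rows, written in base R so it runs without extra packages, shows both figures side by side; demo, counts and per_within_type are hypothetical names.

```r
# Made-up rides: 3 on classic bikes, 2 on electric bikes
demo <- data.frame(
  rideable_type = c("Classic_bike", "Classic_bike", "Classic_bike",
                    "Electric_bike", "Electric_bike"),
  user_type     = c("Member", "Member", "Casual", "Casual", "Member")
)

# One row per rideable_type / user_type combination with its ride count
counts <- as.data.frame(table(demo), responseName = "total_ride")

# Share of all rides (what the pipeline above computes)
counts$per_of_ride <- round(counts$total_ride / sum(counts$total_ride), 4) * 100

# Share within each rideable type, for comparison
type_totals <- ave(counts$total_ride, counts$rideable_type, FUN = sum)
counts$per_within_type <- round(counts$total_ride / type_totals, 4) * 100

counts
```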

Total number of rides and total ride length by each usertype and rideable type

total_ride_length_usertype <-
clean_data %>% 
  group_by(user_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length)) %>%
  mutate(per_of_ride = (round(total_ride / sum(total_ride),4 )) * 100) %>%
  mutate(per_of_ride_length = (round(total_ride_length / sum(total_ride_length ),4 )) * 100) %>%
  arrange(desc(total_ride_length))

total_ride_length_usertype

total_ride_length_rideabletype <-
clean_data %>% 
  group_by(rideable_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length)) %>%
  mutate(per_of_ride = (round(total_ride / sum(total_ride),4 )) * 100) %>%
  mutate(per_of_ride_length = (round(total_ride_length / sum(total_ride_length ),4 )) * 100) %>%
  arrange(desc(total_ride_length))

total_ride_length_rideabletype

Total number of rides and total ride length by each user type per the type of ride

totalridelength_user_per_rideabletype <-
clean_data %>% 
  group_by(user_type, rideable_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length), .groups = "drop") %>%
  mutate(per_of_ride = (round(total_ride / sum(total_ride),4 )) * 100) %>%
  mutate(per_of_ride_length = (round(total_ride_length / sum(total_ride_length ),4 )) * 100) %>%
  arrange(desc(total_ride_length))

totalridelength_user_per_rideabletype

Average and max ride length for each usertype

avg_max_usertype <-
clean_data %>% 
  group_by(user_type) %>%
  summarise(max_ride_length = max(ride_length)
            ,average_ride_length = mean(ride_length)) %>%
  arrange(desc(max_ride_length))

avg_max_usertype

Average and max ride length for each rideable type

avg_max_rideabletype <-
clean_data %>% 
  group_by(rideable_type) %>%
  summarise(max_ride_length = max(ride_length)
            ,average_ride_length = mean(ride_length)) %>%
  arrange(desc(max_ride_length))
avg_max_rideabletype

Average and max ride length for each usertype by rideable type

clean_data %>% 
  group_by(user_type, rideable_type) %>%
  summarise(max_ride_length = max(ride_length)
            ,average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(desc(max_ride_length))

No of rides, ride length, average ride length, max ride length by usertype for each weekday

clean_data$weekdays <- ordered(clean_data$weekdays, levels=c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday"))

user_type_weekday <-
clean_data %>% 
  group_by(weekdays, user_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(weekdays, user_type)

user_type_weekday
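The ordered() call is what makes arrange(weekdays, ...) list the days from Sunday to Saturday instead of alphabetically. A short sketch with three made-up values (days and ordered_days are hypothetical names):

```r
days <- c("Monday", "Sunday", "Friday")

# Plain character values sort alphabetically
alphabetical <- sort(days)

# An ordered factor sorts by the position of each value in `levels`
ordered_days <- ordered(days, levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                                         "Thursday", "Friday", "Saturday"))
by_week <- as.character(sort(ordered_days))

alphabetical
by_week
```

The same mechanism is used further down for time_period and month.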

No of rides, ride length, average ride length, max ride length for rideable for each weekday

rideable_type_weekday <-
clean_data %>% 
  group_by(weekdays, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length), .groups = "drop") %>% 
  arrange(weekdays, rideable_type)

rideable_type_weekday

No of ride, ride length, average ride length, max ride length by each usertype for rideable type on each weekday

user_rideable_weekday <-
clean_data %>% 
  group_by(weekdays,user_type, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length), .groups = "drop") %>% 
  arrange(weekdays, user_type, rideable_type)

user_rideable_weekday

No of ride, ride length, average ride length, max ride length by usertype for each daytype

user_type_daytype <-
clean_data %>% 
  group_by(day_type, user_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(day_type, user_type)

user_type_daytype

No of ride, ride length, average ride length, max ride length for rideable for each day type

rideable_type_daytype <-
clean_data %>% 
  group_by(day_type, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length), .groups = "drop") %>% 
  arrange(day_type, rideable_type)

rideable_type_daytype

No of ride, ride length, average ride length, max ride length by each usertype for rideable type on each daytype

user_rideable_daytype <-
clean_data %>% 
  group_by(day_type,user_type, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length), .groups = "drop") %>% 
  arrange(day_type, user_type, rideable_type)

user_rideable_daytype

No of rides and total ride length by usertype for each time period

clean_data$time_period <- ordered(clean_data$time_period, levels=c("Morning", "Afternoon", "Evening", "Night"))

user_type_time_period <-
clean_data %>% 
  group_by(time_period, user_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),.groups = "drop") %>%
  arrange(time_period, user_type)

user_type_time_period

No of rides and total ride length for rideable type for each time period

rideable_type_time_period <-
clean_data %>% 
  group_by(time_period, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length), .groups = "drop") %>% 
  arrange(time_period, rideable_type)

rideable_type_time_period

No of rides and total ride length by each usertype for rideable type for each time period

user_rideable_time_period <-
clean_data %>% 
  group_by(time_period,user_type, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length), .groups = "drop") %>% 
  arrange(time_period, user_type, rideable_type)

user_rideable_time_period

No of rides, ride length, average ride length, max ride length by usertype for each month

clean_data$month <- ordered(clean_data$month, levels=c("September", "October", "November", "December", "January", "February", "March", "April", "May", "June", "July", "August"))

user_type_month <-
clean_data %>% 
  group_by(month, user_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(month, user_type)

user_type_month

No of ride, ride length, average ride length, max ride length for rideable for each month

rideable_type_month <-
clean_data %>% 
  group_by(month, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(month, rideable_type)

rideable_type_month

No of ride, ride length, average ride length, max ride length by each usertype for rideable type on each month

user_rideable_month <-
clean_data %>% 
  group_by(month, user_type, rideable_type) %>%
  summarise(total_ride = n(),
            total_ride_length = sum(ride_length),
            max_ride_length = max(ride_length),
            average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(month, user_type, rideable_type)
user_rideable_month

Top 10 starting station for member users

top_10_member <-
clean_data %>% 
  filter(user_type == "Member") %>% 
  group_by(start_station_name, user_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length), .groups = "drop") %>%
  arrange(desc(total_ride)) %>% 
  head(10)

top_10_member

Top 10 starting stations for casual users

top_10_casual <-
clean_data %>% 
  filter(user_type == "Casual") %>% 
  group_by(start_station_name, user_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length), .groups = "drop") %>%
  arrange(desc(total_ride)) %>% 
  head(10)

top_10_casual

Top 10 starting stations for classic bikes

top_10_classicbike <-
clean_data %>% 
  filter(rideable_type == "Classic_bike") %>% 
  group_by(start_station_name, rideable_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length), .groups = "drop") %>%
  arrange(desc(total_ride)) %>% 
  head(10)

top_10_classicbike

Top 10 starting stations for docked bikes

top_10_dockedbike <-
clean_data %>% 
  filter(rideable_type == "Docked_bike") %>% 
  group_by(start_station_name, rideable_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length), .groups = "drop") %>%
  arrange(desc(total_ride)) %>% 
  head(10)

top_10_dockedbike

Top 10 starting stations for electric bikes

top_10_electricbike <-
clean_data %>% 
  filter(rideable_type == "Electric_bike") %>% 
  group_by(start_station_name, rideable_type) %>%
  summarise(total_ride = n()
            ,total_ride_length = sum(ride_length), .groups = "drop") %>%
  arrange(desc(total_ride)) %>% 
  head(10)

top_10_electricbike
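The five top-10 chunks above share one pattern: filter to a subset, count rides per start station, sort, and keep the first rows. As a rough sketch of that pattern, here is a hypothetical helper, top_stations(), written in base R on made-up rows. It is not the code used for the tables above and it only returns the ride counts.

```r
# Hypothetical helper: the n start stations with the most rides in `data`
top_stations <- function(data, n = 10) {
  counts <- sort(table(data$start_station_name), decreasing = TRUE)
  head(as.data.frame(counts, responseName = "total_ride",
                     stringsAsFactors = FALSE), n)
}

# Made-up rides
demo <- data.frame(
  start_station_name = c("A St", "B St", "A St", "C St", "A St", "B St"),
  user_type          = c("Casual", "Casual", "Casual", "Member", "Member", "Member"),
  stringsAsFactors = FALSE
)

# Top 2 start stations among casual riders only
top_casual <- top_stations(demo[demo$user_type == "Casual", ], n = 2)
top_casual
```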

Data Visualization

Total number of rides by each user type and rideable type

Total number of rides by users for the type of ride

totalride_user_per_rideabletype %>% 
  ggplot(mapping = aes(x= rideable_type, y = total_ride, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "The total number of rides by users for each type of ride",x="Type of ride by users",y="No of Rides")+
  geom_text(aes(label = total_ride), vjust=1.2,position = position_dodge(0.9), size=3)+
  theme(text = element_text(size = 9))

The total ride length by each user type and rideable type

total_ride_length_usertype %>% 
  ggplot(mapping = aes(x= user_type, y = total_ride_length, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "The total ride length by user type in mins",x="User type",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2), "mins")), vjust=1.6, position = position_dodge(0.7), size=3)+
  theme(text = element_text(size = 9))

total_ride_length_usertype %>% 
  ggplot(aes(x= "", y = per_of_ride_length, fill = user_type))+
  geom_col()+
  coord_polar(theta = "y")+
  labs(title = "Percentage of ride length")+
  theme(legend.position="top",
        text = element_text(size = 10))+
  geom_text(aes(label = paste(user_type, paste(per_of_ride_length, "%"), sep = "\n")),
            position = position_stack(vjust = 0.5), size = 3, color = "black")

total_ride_length_rideabletype %>% 
  ggplot(mapping = aes(x= rideable_type, y = total_ride_length, fill = rideable_type))+
  geom_col(position= "dodge")+
  labs(title = "The total ride length for rideable type in mins",x="Rideable type",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.7), size=3)+
  theme(text = element_text(size = 9))

total_ride_length_rideabletype %>% 
  ggplot(aes(x= "", y = per_of_ride_length, fill = rideable_type))+
  geom_col()+
  coord_polar(theta = "y")+
  labs(title = "Percentage of ride length")+
  theme(legend.position="top",
        text = element_text(size = 10))+
  geom_text(aes(label = paste(rideable_type, paste(per_of_ride_length, "%"), sep = "\n")),
            position = position_stack(vjust = 0.5), size = 3, color = "black")

Total number of rides and total ride length by each user type per the type of ride

totalridelength_user_per_rideabletype %>% 
  ggplot(mapping = aes(x= rideable_type, y = total_ride_length, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "The total ride length by user type for each rideable type in mins",x="Rideable type",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2), "mins")), vjust=1.6, position = position_dodge(0.7), size=2)+
  theme(text = element_text(size = 9))

Max and Average ride length for user type

avg_max_usertype %>% 
  ggplot(mapping = aes(x= user_type, y = max_ride_length, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "Max ride length by each user type",x="User type",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 9))

avg_max_usertype %>% 
  ggplot(mapping = aes(x= user_type, y = average_ride_length, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "Average ride length by each user type",x="User type",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 9))

Max and Average ride length for rideable type

avg_max_rideabletype %>% 
  ggplot(mapping = aes(x= rideable_type, y = max_ride_length, fill = rideable_type))+
  geom_col(position= "dodge")+
  labs(title = "Max ride length by each rideable type",x="Rideable type",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 9))

avg_max_rideabletype %>% 
  ggplot(mapping = aes(x= rideable_type, y = average_ride_length, fill = rideable_type))+
  geom_col(position= "dodge")+
  labs(title = "Average ride length by each rideable type",x="Rideable type",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 9))

Average and max ride by users for each type of ride

clean_data %>% 
  group_by(rideable_type, user_type) %>%
  summarise(average_ride_length = mean(ride_length),.groups = "drop") %>%
  arrange(rideable_type)%>% 
  ggplot(mapping = aes(x= rideable_type, y = average_ride_length, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "Average ride length by users for each type of ride",x="Type of ride by users",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

clean_data %>% 
  group_by(rideable_type, user_type) %>%
  summarise(Max_ride_length = max(ride_length),.groups = "drop") %>%
  arrange(rideable_type)%>% 
  ggplot(mapping = aes(x= rideable_type, y = Max_ride_length, fill = user_type))+
  geom_col(position= "dodge")+
  labs(title = "Max ride length by users for each type of ride",x="Type of ride by users",y="Max ride length")+
  geom_text(aes(label = paste(round(Max_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides, ride length, average ride length, max ride length by user type for each weekday

user_type_weekday %>% 
  ggplot(mapping = aes(x= weekdays, y =total_ride, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "Total no of rides by each user type for each week day",x="Week days",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_weekday %>% 
  ggplot(mapping = aes(x= weekdays, y =total_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "The total ride length by each user type for each week day",x="Week days",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2), "mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_weekday %>%
  ggplot(mapping = aes(x= weekdays, y =average_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "The average ride length by each user type for each week day",x="Week days",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_weekday %>% 
  ggplot(mapping = aes(x= weekdays, y =max_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "The max ride length by each user type for each week day",x="Week days",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides, ride length, average ride length, max ride length by rideable type for each weekday

rideable_type_weekday%>% 
  ggplot(mapping = aes(x= weekdays, y =total_ride, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Total no of rides by each rideable type for each week day",x="Week days",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_weekday %>% 
  ggplot(mapping = aes(x= weekdays, y =total_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Total ride length by each rideable type for each week day",x="Week days",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_weekday %>% 
  ggplot(mapping = aes(x= weekdays, y =average_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Average ride length by each rideable type for each week day",x="Week days",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_weekday %>% 
  ggplot(mapping = aes(x= weekdays, y =max_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "The max ride length by each rideable type for each week day",x="Week days",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides, ride length, average ride length, max ride length by rideable type for each day type

rideable_type_daytype%>% 
  ggplot(mapping = aes(x= day_type, y =total_ride, fill =rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Total no of rides by each rideable type for each day type",x="Day type",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_daytype %>% 
  ggplot(mapping = aes(x= day_type, y =total_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Total ride length by each rideable type for each day type",x="Day type",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_daytype %>% 
  ggplot(mapping = aes(x= day_type, y =average_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Average ride length by each rideable type for each day type",x="Day type",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_daytype %>% 
  ggplot(mapping = aes(x= day_type, y =max_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "The max ride length by each rideable type for each day type",x="Day type",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides, ride length, average ride length, max ride length by user type for each day type

user_type_daytype %>% 
  ggplot(mapping = aes(x= day_type, y =total_ride,fill =user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "Total no of rides by each user type for each day type",x="Day type",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_daytype %>%
  ggplot(mapping = aes(x= day_type, y =total_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "Total ride length by each user type for each day type",x="Day type",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_daytype %>%
  ggplot(mapping = aes(x= day_type, y =average_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "Average ride length by each user type for each day type",x="Day type",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_daytype %>%
  ggplot(mapping = aes(x= day_type, y =max_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "The max ride length by each user type for each day type",x="Day type",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides by rideable type for each time period

rideable_type_time_period %>% 
  ggplot(mapping = aes(x= time_period, y =total_ride, fill =rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  labs(title = "Total no of rides by each rideable type for each time period",x="Time period",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides and total ride length by users for each time period

user_type_time_period %>%  
  ggplot(mapping = aes(x= time_period, y =total_ride,fill =user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "Total no of rides by each user type for each time period",x="Time period",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 9))

user_type_time_period %>%  
  ggplot(mapping = aes(x= time_period, y =total_ride_length, fill =user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  labs(title = "Total ride length by each user type for each time period",x="Time period",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=3)+
  theme(text = element_text(size = 9))

No of rides, ride length, average ride length, max ride length by rideable type for each month

rideable_type_month %>% 
  ggplot(mapping = aes(x= month, y =total_ride, fill =rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Total no of rides by each rideable type for each month (September 2021 - August 2022)",x="Months",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 9))

rideable_type_month %>%  
  ggplot(mapping = aes(x= month, y =total_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Total ride length by each rideable type for each month (September 2021 - August 2022)",x="Months",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_month%>% 
  ggplot(mapping = aes(x= month, y =average_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Average ride length by each rideable type for each month (September 2021 - August 2022)",x="Months",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

rideable_type_month%>% 
  ggplot(mapping = aes(x= month, y =max_ride_length, fill = rideable_type))+
  geom_line(aes(group = rideable_type, color = rideable_type), size = 4)+
  geom_point(aes(group = rideable_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Max ride length by each rideable type for each month (September 2021 - August 2022)",x="Months",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

No of rides, ride length, average ride length, max ride length by user type for each month

user_type_month %>%  
  ggplot(mapping = aes(x= month, y =total_ride, fill =user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "No of rides by each user type for each month (September 2021 - August 2022)",x="Months",y="No of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_month %>%  
  ggplot(mapping = aes(x= month, y =total_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Total ride length by each user type for each month (September 2021 - August 2022)",x="Months",y="Total ride length")+
  geom_text(aes(label = paste(round(total_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_month %>%  
  ggplot(mapping = aes(x= month, y =average_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Average ride length by each user type for each month (September 2021 - August 2022)",x="Months",y="Average ride length")+
  geom_text(aes(label = paste(round(average_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))

user_type_month %>%  
  ggplot(mapping = aes(x= month, y =max_ride_length, fill = user_type))+
  geom_line(aes(group = user_type, color = user_type), size = 4)+
  geom_point(aes(group = user_type), size = 1)+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Max ride length by each user type for each month (September 2021 - August 2022)",x="Months",y="Max ride length")+
  geom_text(aes(label = paste(round(max_ride_length, digits = 2),"mins")), vjust=1.6,position = position_dodge(0.9), size=2)+
  theme(text = element_text(size = 7))
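The weekday, day-type, time-period and month charts above all stack the same layers (line, point, text label, theme) and differ only in the summary table, the columns and the labels. As a rough sketch of how that repetition could be factored out, the helper below takes column names as strings. The function name `plot_metric_lines`, its parameters and the toy table are hypothetical and not part of the original analysis; only the column names follow the ones used above.

```r
library(ggplot2)

# Hypothetical helper (not part of the original analysis): draws one line per
# group for a summary table that has one row per x value and group.
plot_metric_lines <- function(data, x, y, group, title, x_lab, y_lab, unit = "") {
  ggplot(data, aes(x = .data[[x]], y = .data[[y]],
                   group = .data[[group]], color = .data[[group]])) +
    geom_line() +
    geom_point(size = 1) +
    # Label each point with its rounded value and an optional unit.
    geom_text(aes(label = trimws(paste(round(.data[[y]], digits = 2), unit))),
              vjust = 1.6, size = 2, show.legend = FALSE) +
    labs(title = title, x = x_lab, y = y_lab, color = group) +
    theme(text = element_text(size = 7),
          axis.text.x = element_text(angle = 90, hjust = 1))
}

# Toy summary table with made-up values, only to show the call.
toy_month <- data.frame(
  month = factor(rep(c("Jan", "Feb", "Mar"), times = 2),
                 levels = c("Jan", "Feb", "Mar")),
  user_type = rep(c("Casual", "Member"), each = 3),
  average_ride_length = c(21.5, 23.1, 27.4, 11.2, 11.9, 12.6)
)

p <- plot_metric_lines(toy_month, x = "month", y = "average_ride_length",
                       group = "user_type",
                       title = "Average ride length by user type (toy data)",
                       x_lab = "Months", y_lab = "Average ride length",
                       unit = "mins")
```

The sketch maps `color` instead of `fill` and leaves out `position_dodge()`, so each label sits directly under its own point. That is a design choice of this sketch, not something the original charts do.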

Top 10 starting station for users

top_10_member %>% 
  ggplot(mapping = aes(x= start_station_name, y = total_ride, fill = user_type))+
  geom_col(position= "dodge")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Top 10 starting station for member users",x="Start station name",y="Number of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.7), size=2)+
  theme(text = element_text(size = 7))

top_10_casual %>% 
  ggplot(mapping = aes(x= start_station_name, y = total_ride, fill = user_type))+
  geom_col(position= "dodge")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Top 10 starting station for casual users",x="Start station name",y="Number of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.7), size=2)+
  theme(text = element_text(size = 7))

Top 10 starting station for rideable type

top_10_classicbike %>% 
  ggplot(mapping = aes(x= start_station_name, y = total_ride, fill = rideable_type))+
  geom_col(position= "dodge")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Top 10 starting station for classic bike riders",x="Start station name",y="Number of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.7), size=2)+
  theme(text = element_text(size = 7))

# Assumption: a top_10_dockedbike summary exists, built like top_10_classicbike.
top_10_dockedbike %>% 
  ggplot(mapping = aes(x= start_station_name, y = total_ride, fill = rideable_type))+
  geom_col(position= "dodge")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Top 10 starting station for docked bike riders",x="Start station name",y="Number of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.7), size=2)+
  theme(text = element_text(size = 7))

# Assumption: a top_10_electricbike summary exists, built like top_10_classicbike.
top_10_electricbike %>%
  ggplot(mapping = aes(x= start_station_name, y = total_ride, fill = rideable_type))+
  geom_col(position= "dodge")+
  theme(axis.text.x = element_text(angle = 90, hjust = 1))+
  labs(title = "Top 10 starting station for electric bike riders",x="Start station name",y="Number of rides")+
  geom_text(aes(label = total_ride), vjust=1.6,position = position_dodge(0.7), size=2)+
  theme(text = element_text(size = 7))

Findings and Recommendations

1. Based on my analysis, members take many more rides on electric and classic bikes than casual users do. However, casual users have a much longer total ride length than members. I recommend that Cyclistic adopt relationship marketing to build a stronger connection with casual users and create an emotional connection with the brand. This includes responding promptly and efficiently to all customer inquiries on all platforms.

2. Cyclistic should create a new membership program

a. The membership program should target casual users during the summer months (May - October). Casual users record their longest ride lengths during this period.

b. Membership should include access to all three types of bikes, since casual users ride all three types of bikes available at Cyclistic.

c. The membership program should allow unlimited weekend rides of up to 45 minutes (above the average ride length) on classic bikes.

i. Casual users ride more often on weekends, and classic bikes are the bikes used most often.

ii. The membership program should allow bikes to be ridden for longer than 30 minutes at a time, regardless of the type of bike.

d. Offer a discount on afternoon and evening rides, since more casual users ride at these times.

e. Docked bikes should be offered to members at a discounted price, since they are currently used only by casual users.

f. Provide discounted pricing at starting stations that are popular with all types of riders (Streeter Dr & Grand Ave).

Notable Places in the Area

• The Chicago Children’s Museum

• Lake Point Tower: Residential

• Navy Pier: Lake

• Milton Lee Olive: Public park
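Recommendation 2c proposes a 45-minute allowance because it sits above the average ride length. One way to sanity-check a cap like that is to compare it with the average weekend ride of casual users and with the share of rides it would cover. The sketch below uses made-up rides, not Cyclistic data. The labels `"Casual"`, `"Member"`, `"Weekend"` and `"Weekday"` are assumed values for `user_type` and `day_type`.

```r
# Made-up rides for illustration only; not Cyclistic data.
toy_rides <- data.frame(
  user_type   = c("Casual", "Casual", "Casual", "Member", "Member", "Casual"),
  day_type    = c("Weekend", "Weekend", "Weekday", "Weekend", "Weekday", "Weekend"),
  ride_length = c(30, 40, 20, 12, 10, 35)  # minutes
)

proposed_cap <- 45  # minutes, taken from recommendation 2c

# Keep only weekend rides by casual users.
weekend_casual <- subset(toy_rides,
                         user_type == "Casual" & day_type == "Weekend")

# Average weekend ride length for casual users.
avg_weekend_casual <- mean(weekend_casual$ride_length)

# Share of weekend casual rides that would fit within the cap.
share_within_cap <- mean(weekend_casual$ride_length <= proposed_cap)
```

Run against the cleaned trip data instead of the toy table, the same two numbers would show whether 45 minutes covers most casual weekend rides.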

Cyclistic bike-share analysis

Ogunnubi Abiodun Tobi

2022-09-30

About the company

Business Task:

This project aims to Maximize the number of annual Cyclistic memberships across 692 stations in Chicago

• To understand how casual riders and annual members use Cyclistic bikes differently

• To design a new marketing strategy to convert casual riders into annual members

This project does not include:

• Implementing any solutions

• Any areas outside the 692 stations across Chicago, as defined by Cyclistic

• No historical trip data older than 12 months will be considered in the project

Data Preparation

Load libraries

STEP 1: COLLECT DATA

Set up work directory

STEP 2: WRANGLE DATA AND COMBINE INTO A SINGLE FILE

Combine dataset into R studio data frame

The combined dataset contains the last 12 months of trip data (September 2021 - August 2022).
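As a minimal sketch of the combining step, the monthly files can be read in a loop and stacked with `rbind()`. The temporary folder, the toy file names and the two-column layout below are assumptions made so the example runs on its own; they are not the real trip files.

```r
# Write two tiny CSV files to a temporary folder so the sketch is runnable;
# in the real project these would be the twelve monthly trip files.
tmp_dir <- file.path(tempdir(), "tripdata_sketch")
dir.create(tmp_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("a1", "a2"),
                     member_casual = c("member", "casual")),
          file.path(tmp_dir, "202109-toy-tripdata.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "member"),
          file.path(tmp_dir, "202110-toy-tripdata.csv"), row.names = FALSE)

# Read every monthly file into a list of data frames.
files <- sort(list.files(tmp_dir, pattern = "-tripdata\\.csv$", full.names = TRUE))
monthly <- lapply(files, read.csv, header = TRUE)

# rbind() needs identical column names in every file, so check first.
same_columns <- sapply(monthly, function(d) identical(names(d), names(monthly[[1]])))
stopifnot(all(same_columns))

# Stack the monthly data frames into one.
all_trips <- do.call(rbind, monthly)
```

Checking the column names before stacking gives an early, readable failure if one month's file has a different layout.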

To summarize my data

To get the structure of the data

STEP 3: CLEAN UP AND MANIPULATION TO PREPARE FOR ANALYSIS

Make a copy of the data

To remove duplicates

To remove null values

Remove irrelevant columns for my analysis (ride_id, start_station_id, end_station_id, start_lat, start_lng, end_lat, end_lng)

The ride ID is unique to each ride, so I do not need it for my analysis

The start station ID and end station ID duplicate the start station name and end station name.

The latitude and longitude are not needed for my analysis

To check available columns

To rename member_casual as usertype

To get an overview of the entire dataframe and check for inconsistencies and missing values

Replace blank with NA

Check number of columns with missing values

Fill NA values using the mode

Function to exclude NA values and find the mode from a column with NA excluded

Test on columns if the code is working

Replace the NA values by the mode(factor data)
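The blank-to-NA and mode-fill steps above could be sketched as follows in base R. The function name `calc_mode`, the toy table and the station names "Station A" and "Station B" are hypothetical. Ties are resolved in favour of the value that appears first, which is an assumption of this sketch.

```r
# Return the most frequent non-missing value of a vector.
# On a tie, the value that appears first in the vector wins.
calc_mode <- function(x) {
  x <- x[!is.na(x)]
  if (length(x) == 0) return(NA)
  values <- unique(x)
  values[which.max(tabulate(match(x, values)))]
}

# Toy table with blank strings standing in for missing station names.
toy <- data.frame(
  start_station_name = c("Station A", "", "Station A", "Station B"),
  end_station_name   = c("", "Station B", "Station B", ""),
  stringsAsFactors = FALSE
)

# Replace blanks with NA, then count missing values per column.
toy[toy == ""] <- NA
missing_per_column <- colSums(is.na(toy))

# Fill each column's NA values with that column's mode.
for (col in names(toy)) {
  toy[[col]][is.na(toy[[col]])] <- calc_mode(toy[[col]])
}
```

Filling with the mode shifts extra rides onto the most popular station, so station rankings built afterwards may be somewhat inflated for that station.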

To get an overview of the entire dataframe and check for inconsistencies and completeness

Make a copy of my data frame

To change started_at, ended_at column data types from char to date.

Capitalizing all first letters in my data frame for consistency.

Inspect available column

Make a copy of my data frame

There are a few problems we will need to fix:

Adding columns that list the date, month, day, year, day of the week, ride length, day type and day time of each ride

Add a “ride_length” calculation to all_trips (in seconds), type of day and time period

Convert “ride_length” from Factor to numeric so we can run calculations on the data
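A minimal base R sketch of the derived columns is shown below on two made-up rides. Minutes are used for `ride_length` to match the "mins" labels in the plots, which is an assumption. The time-period cut-offs (6, 12, 17 and 21 o'clock) are also assumptions made for illustration.

```r
# Two made-up rides; timestamps are parsed as UTC to keep the example stable.
toy <- data.frame(
  started_at = c("2022-07-02 14:05:00", "2022-07-04 08:30:00"),
  ended_at   = c("2022-07-02 14:47:30", "2022-07-04 08:41:00"),
  stringsAsFactors = FALSE
)
toy$started_at <- as.POSIXct(toy$started_at, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
toy$ended_at   <- as.POSIXct(toy$ended_at,   format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Calendar parts of the start time.
toy$date     <- as.Date(toy$started_at)
toy$month    <- format(toy$date, "%m")
toy$day      <- format(toy$date, "%d")
toy$year     <- format(toy$date, "%Y")
toy$weekdays <- weekdays(toy$date)  # day names depend on the session locale

# Day type from the numeric weekday (0 = Sunday, 6 = Saturday).
wday <- as.POSIXlt(toy$date)$wday
toy$day_type <- ifelse(wday %in% c(0, 6), "Weekend", "Weekday")

# Ride length in minutes, stored as a plain numeric column.
toy$ride_length <- as.numeric(difftime(toy$ended_at, toy$started_at, units = "mins"))

# Time period from the start hour; the cut-offs are assumed.
hour <- as.integer(format(toy$started_at, "%H"))
toy$time_period <- ifelse(hour < 6, "Night",
                   ifelse(hour < 12, "Morning",
                   ifelse(hour < 17, "Afternoon",
                   ifelse(hour < 21, "Evening", "Night"))))
```

Deriving `day_type` from the numeric weekday rather than from the day name keeps the result independent of the session locale.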

Copy of my data

Identifying outliers

The dataframe includes a few hundred entries where ride_length was negative.

Customers who purchase single-ride or full-day passes are referred to as casual riders

Remove casual rides that are shorter than 1 min or longer than 1440 mins

Remove member rides that are shorter than 60, leaving those longer than 1440 mins.

Note: Customers who purchase single-ride or full-day passes are referred to as casual riders. I figured members can use a ride for more than a day

There were 111,156 outliers: rides shorter than 1 min or longer than 1440 mins
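The filtering rule above can be sketched on made-up rides as follows. The sketch assumes `ride_length` is in minutes and reads the member threshold of 60 as 60 seconds (1 minute); both are assumptions. The toy values are not Cyclistic data.

```r
# Made-up rides; ride_length is assumed to be in minutes.
toy <- data.frame(
  user_type   = c("Casual", "Casual", "Casual", "Member", "Member", "Member"),
  ride_length = c(-3, 25, 2000, 0.5, 18, 3000)
)

# Rides under 1 minute (including negative lengths) are dropped for everyone.
too_short <- toy$ride_length < 1

# Rides over 1440 minutes (one day) are dropped for casual users only.
too_long_casual <- toy$user_type == "Casual" & toy$ride_length > 1440

is_outlier <- too_short | too_long_casual
n_removed  <- sum(is_outlier)
clean_data <- toy[!is_outlier, ]
```

Building the logical flags first makes it easy to report how many rows each rule removes before any rows are dropped.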

To get an overview of the entire dataframe and check for inconsistencies and completeness

For my analysis I will need the following columns

Rideable_type, start_station_name, end_station_name, user_type, date, month, year, weekdays, ride_length, day_type, time_period.

Additional table to drop before analysis.

To get an overview of the entire dataframe and check for inconsistencies and completeness

STEP 4: CONDUCT DESCRIPTIVE ANALYSIS

Total number of rides by each usertype and rideable type

Total number of rides by each usertype per the type of ride

Total number of rides and total ride length by each usertype and rideable type

Total number of rides and total ride length by each user type per the type of ride

Average and max ride length for each usertype
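As a small illustration of this summary, the sketch below computes the average and maximum ride length per user type with base R on made-up rides. The column names follow the `avg_max_usertype` table used in the plots. The values are invented, and base R is used only so the example runs without extra packages.

```r
# Made-up rides; ride_length is in minutes.
toy <- data.frame(
  user_type   = c("Casual", "Casual", "Member", "Member", "Member"),
  ride_length = c(30, 50, 10, 14, 12)
)

# One value per user type for each statistic.
avg <- tapply(toy$ride_length, toy$user_type, mean)
mx  <- tapply(toy$ride_length, toy$user_type, max)

# Assemble a summary table shaped like the one the bar charts expect.
avg_max_usertype <- data.frame(
  user_type           = names(avg),
  average_ride_length = as.numeric(avg),
  max_ride_length     = as.numeric(mx[names(avg)])
)
```

A dplyr pipeline with `group_by(user_type)` and `summarise()`, like the ones in the plotting code, should give the same table.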

Average and max ride length for each rideable type

Average and max ride length for each usertype by rideable type

No of rides, ride length, average ride length, max ride length by usertype for each weekday

No of rides, ride length, average ride length, max ride length by rideable type for each weekday

No of rides, ride length, average ride length, max ride length by each usertype for rideable type on each weekday

No of rides, ride length, average ride length, max ride length by usertype for each daytype

No of rides, ride length, average ride length, max ride length by rideable type for each day type

No of rides, ride length, average ride length, max ride length by each usertype for rideable type on each daytype

No of rides, ride length, average ride length, max ride length by usertype for each time period

No of rides, ride length, average ride length, max ride length by rideable type for each time period

No of rides, ride length, average ride length, max ride length by each usertype for rideable type on each time period

No of rides, ride length, average ride length, max ride length by usertype for each month

No of rides, ride length, average ride length, max ride length by rideable type for each month