Imported the partially cleaned excel files into data frames
I took the time to do some preliminary data cleaning in Excel prior
to bring the data over to R. The reason for this is that there are thing
that Excel is just naturally good at from a data cleaning perspective.
These include fixing data types, performing basic functions and
calculations, as well as, highlighting errors. One major caveat here is
that the excel files for this project were huge, in excess of 60MB or
over 100,000 rows of data. Given that while I do a fairly competent
laptop, it wouldn’t be efficient to try to do everything in Excel,
especially when R can be brought in handle some of the heavy
lifting.
january_tripdata <- read_excel("Divvy-TripData/202201-divvy-tripdata.xlsx",
sheet = "202201-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
february_tripdata <- read_excel("Divvy-TripData/202202-divvy-tripdata.xlsx",
sheet = "202202-divvy-tripdata", col_types = c("text",
"text", "date", "date","text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
march_tripdata <- read_excel("Divvy-TripData/202203-divvy-tripdata.xlsx",
sheet = "202203-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
april_tripdata <- read_excel("Divvy-TripData/202204-divvy-tripdata.xlsx",
sheet = "202204-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
may_tripdata <- read_excel("Divvy-TripData/202205-divvy-tripdata.xlsx",
sheet = "202205-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
june_tripdata <- read_excel("Divvy-TripData/202206-divvy-tripdata.xlsx",
sheet = "202206-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
july_tripdata <- read_excel("Divvy-TripData/202207-divvy-tripdata.xlsx",
sheet = "202207-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
august_tripdata <- read_excel("Divvy-TripData/202208-divvy-tripdata.xlsx",
sheet = "202208-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
september_tripdata <- read_excel("Divvy-TripData/202209-divvy-tripdata.xlsx",
sheet = "202209-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
october_tripdata <- read_excel("Divvy-TripData/202210-divvy-tripdata.xlsx",
sheet = "202210-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
november_tripdata <- read_excel("Divvy-TripData/202211-divvy-tripdata.xlsx",
sheet = "202211-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
december_tripdata <- read_excel("Divvy-TripData/202212-divvy-tripdata.xlsx",
sheet = "202212-divvy-tripdata", col_types = c("text",
"text", "date", "date", "text", "text", "text",
"numeric", "text", "text", "text",
"text", "numeric", "numeric", "numeric",
"numeric", "text"))
Analyze and Share
I combined these two aspects of the data analysis process here
because as you are reading this, I am fulfilling these aspects. I have
analyzed the data which I am also sharing with you below.
Summarized the newly clean data
Much of the heavy lifting for cleaning the data was done in excel,
but due to the enormous file sizes for the data sets used in this
analysis, it wouldn’t be practical to do all of summarisation there. So,
we tackle that aspect of things here, since R is much more
efficient.
summary(trips_summary)
## ride_id rideable_type started_at
## Length:4292473 Length:4292473 Min. :2022-01-01 00:00:05.00
## Class :character Class :character 1st Qu.:2022-05-29 05:01:35.00
## Mode :character Mode :character Median :2022-07-20 19:37:51.00
## Mean :2022-07-19 11:37:12.23
## 3rd Qu.:2022-09-14 17:45:46.00
## Max. :2022-12-31 23:59:26.00
## ended_at trip_month trip_day
## Min. :2022-01-01 00:01:48.00 Length:4292473 Length:4292473
## 1st Qu.:2022-05-29 05:40:24.00 Class :character Class :character
## Median :2022-07-20 19:55:37.00 Mode :character Mode :character
## Mean :2022-07-19 11:54:35.89
## 3rd Qu.:2022-09-14 18:01:26.00
## Max. :2023-01-01 18:09:37.00
## trip_time_period trip_duration start_station_name start_station_id
## Length:4292473 Min. : 1.00 Length:4292473 Length:4292473
## Class :character 1st Qu.: 6.25 Class :character Class :character
## Mode :character Median : 10.80 Mode :character Mode :character
## Mean : 17.39
## 3rd Qu.: 19.25
## Max. :34354.07
## end_station_name end_station_id start_lat start_lng
## Length:4292473 Length:4292473 Min. :41.65 Min. :-87.83
## Class :character Class :character 1st Qu.:41.88 1st Qu.:-87.66
## Mode :character Mode :character Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.64
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :45.64 Max. :-73.80
## end_lat end_lng member_casual
## Min. : 0.00 Min. :-87.83 Length:4292473
## 1st Qu.:41.88 1st Qu.:-87.66 Class :character
## Median :41.90 Median :-87.64 Mode :character
## Mean :41.90 Mean :-87.64
## 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :42.06 Max. : 0.00
str(trips_summary)
## tibble [4,292,473 × 17] (S3: tbl_df/tbl/data.frame)
## $ ride_id : chr [1:4292473] "578BA30BA1348F18" "5EE2D7C533CCC17B" "5AA216F2E2138811" "81F3141973924C8C" ...
## $ rideable_type : chr [1:4292473] "docked_bike" "docked_bike" "docked_bike" "docked_bike" ...
## $ started_at : POSIXct[1:4292473], format: "2022-01-01 01:00:05" "2022-01-06 19:07:45" ...
## $ ended_at : POSIXct[1:4292473], format: "2022-01-21 08:51:11" "2022-01-25 14:30:33" ...
## $ trip_month : chr [1:4292473] "JANUARY" "JANUARY" "JANUARY" "JANUARY" ...
## $ trip_day : chr [1:4292473] "SATURDAY" "THURSDAY" "THURSDAY" "WEDNESDAY" ...
## $ trip_time_period : chr [1:4292473] "Morning" "Evening" "Night" "Afternoon" ...
## $ trip_duration : num [1:4292473] 29271 27083 14238 9839 8531 ...
## $ start_station_name: chr [1:4292473] "Millennium Park" "Wabash Ave & Grand Ave" "Broadway & Belmont Ave" "Sedgwick St & Schiller St" ...
## $ start_station_id : chr [1:4292473] "13008" "TA1307000117" "13277" "TA1307000143" ...
## $ end_station_name : chr [1:4292473] "Fairfield Ave & Roosevelt Rd" "Base - 2132 W Hubbard Warehouse" "Avers Ave & Belmont Ave" "Larrabee St & Division St" ...
## $ end_station_id : chr [1:4292473] "KA1504000102" "Hubbard Bike-checking (LBS-WH-TEST)" "15640" "KA1504000079" ...
## $ start_lat : num [1:4292473] 41.9 41.9 41.9 41.9 41.9 ...
## $ start_lng : num [1:4292473] -87.6 -87.6 -87.6 -87.6 -87.6 ...
## $ end_lat : num [1:4292473] 41.9 41.9 41.9 41.9 41.9 ...
## $ end_lng : num [1:4292473] -87.7 -87.7 -87.7 -87.6 -87.7 ...
## $ member_casual : chr [1:4292473] "casual" "casual" "casual" "casual" ...
Dived Deeper into the analysis
What was the summary information for trip duration
summary(trips_summary$trip_duration)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.25 10.80 17.39 19.25 34354.07
What as the total number of rides based on Membership Type>
members_trip_total <- data.frame(table(trips_summary$member_casual))
colnames(members_trip_total) <- c("Membership Type", "Number of Rides")
members_trip_total
## Membership Type Number of Rides
## 1 casual 1731141
## 2 member 2561332
Quick Observation: There are almost
50% more Annual Members than Casual
Riders
What was the Total Duration of rides based
on Membership Type?
members_trip_total_duration <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = sum)
colnames(members_trip_total_duration) <- c("Membership Type", "Trip Duration (Minutes)")
members_trip_total_duration
## Membership Type Trip Duration (Minutes)
## 1 casual 42171615
## 2 member 32492675
Quick Observation Casual riders rode
for almost 30% longer than Annual Members in
total
What was the average trip duration for based on Membership Type?
members_trip_mean <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = mean)
colnames(members_trip_mean) <- c("Membership Type","Average Trip (minutes)")
members_trip_mean
## Membership Type Average Trip (minutes)
## 1 casual 24.36059
## 2 member 12.68585
Quick Observation: Casual Riders for
50% more than Annual Members on average
What were the longest and shortest trip duration based on Membership
Type?
members_trip_max <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = max)
colnames(members_trip_max) <- c("Membership Type", "Longest Trip (minutes)")
members_trip_max
## Membership Type Longest Trip (minutes)
## 1 casual 34354.067
## 2 member 1493.233
members_trip_min <- aggregate(trips_summary$trip_duration, list(trips_summary$member_casual), FUN = min)
colnames(members_trip_min) <- c("Membership Type", "Shortest Trip (minutes)")
members_trip_min
## Membership Type Shortest Trip (minutes)
## 1 casual 1
## 2 member 1
Looked at how popular certain weekdays are between Casual, Annual
Members and Overall
What was the most popular day of the week overall?
members_trip_weekday_popular <- data.frame(table(trips_summary$trip_day))
colnames(members_trip_weekday_popular) <- c("Weekday", "Frequency")
members_trip_weekday_popular
## Weekday Frequency
## 1 FRIDAY 598037
## 2 MONDAY 575757
## 3 SATURDAY 692890
## 4 SUNDAY 588199
## 5 THURSDAY 634734
## 6 TUESDAY 597226
## 7 WEDNESDAY 605630
Quick Observation: Saturday gets the most riders
overall, regardless of membership type
What was the most popular day based on Membership Type?
members_trip_popular_days <- data.frame(table(trips_summary$member_casual, trips_summary$trip_day))
colnames(members_trip_popular_days) <- c("Membership Type","Weekday","Frequency")
members_trip_popular_days
## Membership Type Weekday Frequency
## 1 casual FRIDAY 244971
## 2 member FRIDAY 353066
## 3 casual MONDAY 207530
## 4 member MONDAY 368227
## 5 casual SATURDAY 361592
## 6 member SATURDAY 331298
## 7 casual SUNDAY 296578
## 8 member SUNDAY 291621
## 9 casual THURSDAY 226558
## 10 member THURSDAY 408176
## 11 casual TUESDAY 193402
## 12 member TUESDAY 403824
## 13 casual WEDNESDAY 200510
## 14 member WEDNESDAY 405120
Visualized the result
ggplot(members_trip_popular_days, aes(x = `Weekday`, y = `Frequency`, fill = `Membership Type`)) +
geom_col(position = "dodge") +
geom_text(aes(label = `Frequency`), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Weekday", y = "Frequency", fill = "Membership Type", title = "Most Popular Day based on Membership Type") +
theme_minimal()

Extracted the key results based on Membership Type
popular_casual_day <- members_trip_popular_days[members_trip_popular_days[, "Membership Type"] == "casual", ]
popular_member_day <- members_trip_popular_days[members_trip_popular_days[, "Membership Type"] == "member", ]
casual_most_popular_row <- which.max(popular_casual_day[, "Frequency"])
member_most_popular_row <- which.max(popular_member_day[, "Frequency"])
casual_most_popular_day <- popular_casual_day[casual_most_popular_row, c("Membership Type", "Weekday", "Frequency")]
member_most_popular_day <- popular_member_day[member_most_popular_row, c("Membership Type", "Weekday", "Frequency")]
most_popular_days <- rbind(casual_most_popular_day, member_most_popular_day)
most_popular_days
## Membership Type Weekday Frequency
## 5 casual SATURDAY 361592
## 10 member THURSDAY 408176
Quick Observation: For Casual Riders, the most
popular day of the week is Saturday, whereas for Annual Members its
Thursday
Exploring which time period is most popular based on Membership Type
and overall
What was the most popular time period overall?
members_trip_popular_time_period <- data.frame(table(trips_summary$trip_time_period))
colnames(members_trip_popular_time_period) <- c("Time Period", "Frequency")
members_trip_popular_time_period
## Time Period Frequency
## 1 Afternoon 1191788
## 2 Evening 636305
## 3 Morning 1157095
## 4 Night 1307285
Quick Observation: The most popular time period
overall was Night Time
What was the most popular time period based on membership type?
members_trip_popular_period <- data.frame(table(trips_summary$member_casual, trips_summary$trip_time_period))
colnames(members_trip_popular_period) <- c("Membership Type","Time Period","Frequency")
members_trip_popular_period
## Membership Type Time Period Frequency
## 1 casual Afternoon 517921
## 2 member Afternoon 673867
## 3 casual Evening 260318
## 4 member Evening 375987
## 5 casual Morning 382052
## 6 member Morning 775043
## 7 casual Night 570850
## 8 member Night 736435
Visualized the results
ggplot(members_trip_popular_period, aes(x = `Time Period`, y = `Frequency`, fill = `Membership Type`)) +
geom_col(position = "dodge") +
geom_text(aes(label = `Frequency`), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Weekday", y = "Frequency", fill = "Membership Type", title = "Most Popular Time Period by Membership Type") +
theme_minimal()

Extracted the key results based on Membership Type
popular_casual_period <- members_trip_popular_period[members_trip_popular_period[, "Membership Type"] == "casual", ]
popular_member_period <- members_trip_popular_period[members_trip_popular_period[, "Membership Type"] == "member", ]
casual_most_popular_period_row <- which.max(popular_casual_period[, "Frequency"])
member_most_popular_period_row <- which.max(popular_member_period[, "Frequency"])
casual_most_popular_period <- popular_casual_period[casual_most_popular_period_row, c("Membership Type", "Time Period", "Frequency")]
member_most_popular_period <- popular_member_period[member_most_popular_period_row, c("Membership Type", "Time Period", "Frequency")]
most_popular_time_period <- rbind(casual_most_popular_period, member_most_popular_period)
most_popular_time_period
## Membership Type Time Period Frequency
## 7 casual Night 570850
## 6 member Morning 775043
Investigated how popular which months are between Casual and Annual
Members, as well as overall
What was the most popular month overall?
members_trip_popular_month_overall <- data.frame(table(trips_summary$trip_month))
colnames(members_trip_popular_month_overall) <- c("Month", "Frequency")
members_trip_popular_month_overall
## Month Frequency
## 1 APRIL 268528
## 2 AUGUST 594293
## 3 DECEMBER 132604
## 4 FEBRUARY 87649
## 5 JANUARY 79052
## 6 JULY 630891
## 7 JUNE 609773
## 8 MARCH 212888
## 9 MAY 494082
## 10 NOVEMBER 251113
## 11 OCTOBER 406305
## 12 SEPTEMBER 525295
Quick Observation: July was the
most popular month overall
what was the most popular month based on Membership Type?
members_trip_popular_month <- data.frame(table(trips_summary$member_casual, trips_summary$trip_month))
colnames(members_trip_popular_month) <- c("Membership Type","Month","Duration (Minutes)")
members_trip_popular_month
## Membership Type Month Duration (Minutes)
## 1 casual APRIL 90812
## 2 member APRIL 177716
## 3 casual AUGUST 265735
## 4 member AUGUST 328558
## 5 casual DECEMBER 30979
## 6 member DECEMBER 101625
## 7 casual FEBRUARY 14972
## 8 member FEBRUARY 72677
## 9 casual JANUARY 12481
## 10 member JANUARY 66571
## 11 casual JULY 306599
## 12 member JULY 324292
## 13 casual JUNE 287536
## 14 member JUNE 322237
## 15 casual MARCH 66401
## 16 member MARCH 146487
## 17 casual MAY 216932
## 18 member MAY 277150
## 19 casual NOVEMBER 72364
## 20 member NOVEMBER 178749
## 21 casual OCTOBER 148857
## 22 member OCTOBER 257448
## 23 casual SEPTEMBER 217473
## 24 member SEPTEMBER 307822
Visualized the results
members_trip_popular_month$`Month` <- as.factor(members_trip_popular_month$`Month`)
ggplot(members_trip_popular_month, aes(x = `Month`, y = `Duration (Minutes)`, fill = `Membership Type`)) +
geom_col(position = "dodge", width = 1) +
geom_text(aes(label = sprintf("%.2f", `Duration (Minutes)`)), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Month", y = "Duration (Minutes)", fill = "Membership Type", title = "Trip Durarion by Month and Membership Type") +
theme_minimal()

Extracted the key results based on Membership Type
popular_casual_month <- members_trip_popular_month[members_trip_popular_month[, "Membership Type"] == "casual", ]
popular_member_month <- members_trip_popular_month[members_trip_popular_month[, "Membership Type"] == "member", ]
casual_most_popular_row2 <- which.max(popular_casual_month[, "Duration (Minutes)"])
member_most_popular_row2 <- which.max(popular_member_month[, "Duration (Minutes)"])
casual_most_popular_month <- popular_casual_month[casual_most_popular_row2, c("Membership Type", "Month", "Duration (Minutes)")]
member_most_popular_month <- popular_member_month[member_most_popular_row2, c("Membership Type", "Month", "Duration (Minutes)")]
most_popular_month <- rbind(casual_most_popular_month, member_most_popular_month)
most_popular_month
## Membership Type Month Duration (Minutes)
## 11 casual JULY 306599
## 4 member AUGUST 328558
Finally, explored how the most popular bicycle types based on
membership type
members_popular_bike <- data.frame(table(trips_summary$member_casual, trips_summary$rideable_type))
colnames(members_popular_bike) <- c("Membership Type","Bicycle Type","Number of rides")
members_popular_bike
## Membership Type Bicycle Type Number of rides
## 1 casual classic_bike 875958
## 2 member classic_bike 1682817
## 3 casual docked_bike 173342
## 4 member docked_bike 0
## 5 casual electric_bike 681841
## 6 member electric_bike 878515
Visualized the results
ggplot(members_popular_bike, aes(x = `Bicycle Type`, y = `Number of rides`, fill = `Membership Type`))+
geom_col(position = "dodge", width = 1)+
geom_text(aes(label = `Number of rides`), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Bicycle Type", y = "Number of rides", Fill = "Membership Type", title = "Most Popular Bicycle Type by Number of Rides and Membership Type")

Repeated the entire analysis again, but using Bicycle Type as the
focus
The Average Trip Duration by Bicycle Type
bicycle_trip_mean <- aggregate(trips_summary$trip_duration, list(trips_summary$rideable_type), FUN = mean)
colnames(bicycle_trip_mean) <- c("Bicycle Type","Average Trip (minutes)")
bicycle_trip_mean
## Bicycle Type Average Trip (minutes)
## 1 classic_bike 17.32218
## 2 docked_bike 51.14780
## 3 electric_bike 13.76267
Longest Trip by Bicycle Type
bicycle_trip_max <- aggregate(trips_summary$trip_duration, list(trips_summary$rideable_type), FUN = max)
colnames(bicycle_trip_max) <- c("Bicycle Type", "Longest Trip (minutes)")
bicycle_trip_max
## Bicycle Type Longest Trip (minutes)
## 1 classic_bike 1499.417
## 2 docked_bike 34354.067
## 3 electric_bike 480.000
Shortest Trip by Bicycle Type
bicycle_trip_min <- aggregate(trips_summary$trip_duration, list(trips_summary$rideable_type), FUN = min)
colnames(bicycle_trip_min) <- c("Bicycle Type", "Shortest Trip (minutes)")
bicycle_trip_min
## Bicycle Type Shortest Trip (minutes)
## 1 classic_bike 1
## 2 docked_bike 1
## 3 electric_bike 1
What was the most popular Bicycle Type overall?
members_trip_popular_bike <- data.frame(table(trips_summary$rideable_type))
colnames(members_trip_popular_bike) <- c("Bicycle Type", "Number of rides")
members_trip_popular_bike
## Bicycle Type Number of rides
## 1 classic_bike 2558775
## 2 docked_bike 173342
## 3 electric_bike 1560356
Quick Observation: The most popular bicycle type
overall was the Classic Bike
What was the most Popular Day by Bicycle Type?
bicycle_trip_popular_days <- data.frame(table(trips_summary$rideable_type, trips_summary$trip_day))
colnames(bicycle_trip_popular_days) <- c("Bicycle Type","Weekday","Number of rides")
bicycle_trip_popular_days
## Bicycle Type Weekday Number of rides
## 1 classic_bike FRIDAY 349141
## 2 docked_bike FRIDAY 22806
## 3 electric_bike FRIDAY 226090
## 4 classic_bike MONDAY 346180
## 5 docked_bike MONDAY 21995
## 6 electric_bike MONDAY 207582
## 7 classic_bike SATURDAY 416944
## 8 docked_bike SATURDAY 40026
## 9 electric_bike SATURDAY 235920
## 10 classic_bike SUNDAY 353180
## 11 docked_bike SUNDAY 34882
## 12 electric_bike SUNDAY 200137
## 13 classic_bike THURSDAY 375396
## 14 docked_bike THURSDAY 19304
## 15 electric_bike THURSDAY 240034
## 16 classic_bike TUESDAY 358874
## 17 docked_bike TUESDAY 17366
## 18 electric_bike TUESDAY 220986
## 19 classic_bike WEDNESDAY 359060
## 20 docked_bike WEDNESDAY 16963
## 21 electric_bike WEDNESDAY 229607
Visualized the results
ggplot(bicycle_trip_popular_days, aes(x = `Weekday`, y = `Number of rides`, fill = `Bicycle Type`)) +
geom_col(position = "dodge") +
geom_text(aes(label = `Number of rides`), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Weekday", y = "Number of rides", fill = "Bicycle Type", title = "Most Popular Weekday by Bicycle Type") +
theme_minimal()

Extracted the key results for each bicycle type
popular_classic_day <- bicycle_trip_popular_days[bicycle_trip_popular_days[, "Bicycle Type"] == "classic_bike", ]
popular_docked_day <- bicycle_trip_popular_days[bicycle_trip_popular_days[, "Bicycle Type"] == "docked_bike", ]
popular_electric_day <- bicycle_trip_popular_days[bicycle_trip_popular_days[, "Bicycle Type"] == "electric_bike", ]
classic_most_popular_row <- which.max(popular_classic_day[, "Number of rides"])
docked_most_popular_row <- which.max(popular_docked_day[, "Number of rides"])
electric_most_popular_row <- which.max(popular_electric_day[, "Number of rides"])
classic_most_popular_day <- popular_classic_day[classic_most_popular_row, c("Bicycle Type", "Weekday", "Number of rides")]
docked_most_popular_day <- popular_docked_day[docked_most_popular_row, c("Bicycle Type", "Weekday", "Number of rides")]
electric_most_popular_day <- popular_electric_day[electric_most_popular_row, c("Bicycle Type", "Weekday", "Number of rides")]
most_popular_bike_days <- rbind(classic_most_popular_day, docked_most_popular_day, electric_most_popular_day)
most_popular_bike_days
## Bicycle Type Weekday Number of rides
## 7 classic_bike SATURDAY 416944
## 8 docked_bike SATURDAY 40026
## 15 electric_bike THURSDAY 240034
Exploring which time period is most popular based on Bicycle
Type
What was the most popular time period based on bicycle type?
members_bike_popular_period <- data.frame(table(trips_summary$rideable_type, trips_summary$trip_time_period))
colnames(members_bike_popular_period) <- c("Bicycle Type","Time Period","Number of rides")
members_bike_popular_period
## Bicycle Type Time Period Number of rides
## 1 classic_bike Afternoon 696076
## 2 docked_bike Afternoon 61741
## 3 electric_bike Afternoon 433971
## 4 classic_bike Evening 398665
## 5 docked_bike Evening 22365
## 6 electric_bike Evening 215275
## 7 classic_bike Morning 686059
## 8 docked_bike Morning 35324
## 9 electric_bike Morning 435712
## 10 classic_bike Night 777975
## 11 docked_bike Night 53912
## 12 electric_bike Night 475398
Visualized the results
ggplot(members_bike_popular_period, aes(x = `Time Period`, y = `Number of rides`, fill = `Bicycle Type`)) +
geom_col(position = "dodge") +
geom_text(aes(label = `Number of rides`), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Time Period", y = "Number of rides", fill = "Bicycle Type", title = "Most Popular Time Period by Bicycle Type") +
theme_minimal()

Extracted the key results based on Bicycle Type
popular_classic_time_period <- members_bike_popular_period[members_bike_popular_period[, "Bicycle Type"] == "classic_bike", ]
popular_docked_time_period <- members_bike_popular_period[members_bike_popular_period[, "Bicycle Type"] == "docked_bike", ]
popular_electric_time_period <- members_bike_popular_period[members_bike_popular_period[, "Bicycle Type"] == "electric_bike", ]
classic_most_popular_period_row <- which.max(popular_classic_time_period[, "Number of rides"])
docked_most_popular_period_row <- which.max(popular_docked_time_period[, "Number of rides"])
electric_most_popular_period_row <- which.max(popular_electric_time_period[, "Number of rides"])
classic_most_popular_period <- popular_classic_time_period[classic_most_popular_period_row, c("Bicycle Type", "Time Period", "Number of rides")]
docked_most_popular_period <- popular_docked_time_period[docked_most_popular_period_row, c("Bicycle Type", "Time Period", "Number of rides")]
electric_most_popular_period <- popular_electric_time_period[electric_most_popular_period_row, c("Bicycle Type", "Time Period", "Number of rides")]
most_popular_bike_time_period <- rbind(classic_most_popular_period, docked_most_popular_period, electric_most_popular_period)
most_popular_bike_time_period
## Bicycle Type Time Period Number of rides
## 10 classic_bike Night 777975
## 2 docked_bike Afternoon 61741
## 12 electric_bike Night 475398
What was the most popular month based on bicycle type?
bicycle_trip_popular_month <- data.frame(table(trips_summary$rideable_type, trips_summary$trip_month))
colnames(bicycle_trip_popular_month) <- c("Bicycle Type","Month","Number of rides")
bicycle_trip_popular_month
## Bicycle Type Month Number of rides
## 1 classic_bike APRIL 164281
## 2 docked_bike APRIL 11910
## 3 electric_bike APRIL 92337
## 4 classic_bike AUGUST 338611
## 5 docked_bike AUGUST 25646
## 6 electric_bike AUGUST 230036
## 7 classic_bike DECEMBER 72196
## 8 docked_bike DECEMBER 1860
## 9 electric_bike DECEMBER 58548
## 10 classic_bike FEBRUARY 58140
## 11 docked_bike FEBRUARY 1338
## 12 electric_bike FEBRUARY 28171
## 13 classic_bike JANUARY 53978
## 14 docked_bike JANUARY 939
## 15 electric_bike JANUARY 24135
## 16 classic_bike JULY 366908
## 17 docked_bike JULY 30313
## 18 electric_bike JULY 233670
## 19 classic_bike JUNE 399684
## 20 docked_bike JUNE 29962
## 21 electric_bike JUNE 180127
## 22 classic_bike MARCH 132472
## 23 docked_bike MARCH 8173
## 24 electric_bike MARCH 72243
## 25 classic_bike MAY 318546
## 26 docked_bike MAY 25931
## 27 electric_bike MAY 149605
## 28 classic_bike NOVEMBER 142429
## 29 docked_bike NOVEMBER 5736
## 30 electric_bike NOVEMBER 102948
## 31 classic_bike OCTOBER 210209
## 32 docked_bike OCTOBER 12268
## 33 electric_bike OCTOBER 183828
## 34 classic_bike SEPTEMBER 301321
## 35 docked_bike SEPTEMBER 19266
## 36 electric_bike SEPTEMBER 204708
Visualized the results
bicycle_trip_popular_month$`Month` <- as.factor(bicycle_trip_popular_month$`Month`)
ggplot(bicycle_trip_popular_month, aes(x = `Month`, y = `Number of rides`, fill = `Bicycle Type`)) +
geom_col(position = "dodge") +
geom_text(aes(label = `Number of rides`), position = position_dodge(width = 0.9), vjust = -0.25)+
scale_y_continuous(labels = scales::comma) +
labs(x = "Month", y = "FNumber of rides", fill = "Bicycle Type", title = "Most Popular Month by Bicycle Type") +
theme_minimal()

Extracted the key results for each bicycle type
popular_classic_month <- bicycle_trip_popular_month[bicycle_trip_popular_month[, "Bicycle Type"] == "classic_bike", ]
popular_docked_month <- bicycle_trip_popular_month[bicycle_trip_popular_month[, "Bicycle Type"] == "docked_bike", ]
popular_electric_month <- bicycle_trip_popular_month[bicycle_trip_popular_month[, "Bicycle Type"] == "electric_bike", ]
classic_most_popular_month_row <- which.max(popular_classic_month[, "Number of rides"])
docked_most_popular_month_row <- which.max(popular_docked_month[, "Number of rides"])
electric_most_popular_month_row <- which.max(popular_electric_month[, "Number of rides"])
classic_most_popular_month <- popular_classic_month[classic_most_popular_month_row, c("Bicycle Type", "Month", "Number of rides")]
docked_most_popular_month <- popular_docked_month[docked_most_popular_month_row, c("Bicycle Type", "Month", "Number of rides")]
electric_most_popular_month <- popular_electric_month[electric_most_popular_month_row, c("Bicycle Type", "Month", "Number of rides")]
most_popular_bike_month <- rbind(classic_most_popular_month, docked_most_popular_month, electric_most_popular_month)
most_popular_bike_month
## Bicycle Type Month Number of rides
## 19 classic_bike JUNE 399684
## 17 docked_bike JULY 30313
## 18 electric_bike JULY 233670