Industry focus: Bike-share Company.
Problem Statement: Maximize the number of annual memberships by converting casual riders into annual members.
Business Use Case: Identify how annual members and casual riders use Cyclistic bikes differently.
Other questions for the marketing analysis team are:
Deliverables:
Datasets available in: https://divvy-tripdata.s3.amazonaws.com/index.html
Cyclistic is a fictional company created for the purposes of this case study. In 2016, Cyclistic launched a successful bike-share offering. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system anytime.
Some Key Notes:
Stakeholders
I chose RStudio for the data wrangling and for the analysis that follows.
# Setting up the environment
# install.packages("tidyverse")
# install.packages("lubridate")
# install.packages("janitor")
# install.packages("skimr")
# install.packages("geosphere")
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.4.0 ✔ purrr 1.0.0
## ✔ tibble 3.1.8 ✔ dplyr 1.0.10
## ✔ tidyr 1.2.1 ✔ stringr 1.5.0
## ✔ readr 2.1.3 ✔ forcats 0.5.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(lubridate)
##
## Attaching package: 'lubridate'
##
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(janitor)
##
## Attaching package: 'janitor'
##
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(skimr)
library(geosphere)
The 2022 data comes as 12 CSV files, one per month. Before stacking them, I checked every file in Excel to confirm its structure. The structures were compatible, so I stacked the files into one large data frame.
# Importing and stacking csv files in one data frame
trips22 <- list.files(path = "2023_Bikeshare_files", full.names = TRUE) %>%
lapply(read.csv) %>%
bind_rows
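The structure check done in Excel can also be scripted. The sketch below is an assumption about how that might look, not the method used in this report: it writes two toy CSV files to a temporary folder, confirms that they share the same column names with a hypothetical helper `same_columns()`, and only then stacks them. It uses base R's `rbind` in place of `bind_rows` so that it runs without extra packages.

```r
# Hypothetical helper: TRUE when every data frame has identical column names
same_columns <- function(frames) {
  reference <- names(frames[[1]])
  all(vapply(frames, function(df) identical(names(df), reference), logical(1)))
}

# Toy monthly files standing in for the real trip data
toy_dir <- tempfile("toy_trips_")
dir.create(toy_dir)
write.csv(data.frame(ride_id = c("A1", "A2"), member_casual = c("member", "casual")),
          file.path(toy_dir, "2022-01.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = c("B1", "B2"), member_casual = c("casual", "member")),
          file.path(toy_dir, "2022-02.csv"), row.names = FALSE)

# Read only the csv files, check their structure, then stack them
files  <- list.files(toy_dir, pattern = "\\.csv$", full.names = TRUE)
frames <- lapply(files, read.csv)
stopifnot(same_columns(frames))
toy_trips <- do.call(rbind, frames)
nrow(toy_trips)
```

The `pattern` argument keeps any non-CSV file in the folder from being passed to `read.csv`.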
Before starting the cleaning, I am going to check data structure basics:
# Checking the new data frame
str(trips22)
## 'data.frame': 5667717 obs. of 13 variables:
## $ ride_id : chr "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-01-13 11:59:47" "2022-01-10 08:41:56" "2022-01-25 04:53:40" "2022-01-04 00:18:04" ...
## $ ended_at : chr "2022-01-13 12:02:44" "2022-01-10 08:46:17" "2022-01-25 04:58:01" "2022-01-04 00:33:00" ...
## $ start_station_name: chr "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
## $ start_station_id : chr "525" "525" "TA1306000016" "KA1504000151" ...
## $ end_station_name : chr "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
## $ end_station_id : chr "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
## $ start_lat : num 42 42 41.9 42 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num 42 42 41.9 42 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr "casual" "casual" "member" "casual" ...
head(trips22)
tail(trips22)
summary(trips22)
## ride_id rideable_type started_at ended_at
## Length:5667717 Length:5667717 Length:5667717 Length:5667717
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## start_station_name start_station_id end_station_name end_station_id
## Length:5667717 Length:5667717 Length:5667717 Length:5667717
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
##
## start_lat start_lng end_lat end_lng
## Min. :41.64 Min. :-87.84 Min. : 0.00 Min. :-88.14
## 1st Qu.:41.88 1st Qu.:-87.66 1st Qu.:41.88 1st Qu.:-87.66
## Median :41.90 Median :-87.64 Median :41.90 Median :-87.64
## Mean :41.90 Mean :-87.65 Mean :41.90 Mean :-87.65
## 3rd Qu.:41.93 3rd Qu.:-87.63 3rd Qu.:41.93 3rd Qu.:-87.63
## Max. :45.64 Max. :-73.80 Max. :42.37 Max. : 0.00
## NA's :5858 NA's :5858
## member_casual
## Length:5667717
## Class :character
## Mode :character
##
##
##
##
The started_at and ended_at attributes need to be converted to datetime, and several derived columns need to be calculated, to get the information required for the analysis.
# Date
trips22$date <- as.Date(trips22$started_at)
# Year
trips22$year <- format(as.Date(trips22$date), "%Y")
# Month
trips22$month <- format(as.Date(trips22$date), "%m")
# Day
trips22$day <- format(as.Date(trips22$date), "%d")
# Day of the week
trips22$weekday <- format(as.Date(trips22$date),"%A")
# Part of the week
trips22 <- trips22 %>%
mutate(part_of_week = case_when(weekday == "Monday" ~ "Workday",
weekday == "Tuesday" ~ "Workday",
weekday == "Wednesday" ~ "Workday",
weekday == "Thursday" ~ "Workday",
weekday == "Friday" ~ "Workday",
weekday == "Saturday" ~ "Weekend",
weekday == "Sunday" ~ "Weekend"))
# Datetime
trips22$started_time <-strptime(trips22$started_at, "%Y-%m-%d %H:%M:%S")
trips22$ended_time <-strptime(trips22$ended_at, "%Y-%m-%d %H:%M:%S")
# Hour
trips22$hour <- trips22$started_at %>% hour()
# Calculating ride length in seconds (units are set explicitly so difftime cannot pick minutes or hours)
trips22$ride_length_s <- difftime(trips22$ended_time, trips22$started_time, units = "secs")
trips22$ride_length_s <- as.numeric(trips22$ride_length_s)
is.numeric(trips22$ride_length_s)
## [1] TRUE
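As a small worked check of the ride length calculation, the sketch below recomputes the duration of the first two rides shown in the `str()` output. It is an illustration under assumptions, not the code used for the full data set: it parses with `as.POSIXct` (a compact type that is easier to keep in a data frame column than the `POSIXlt` returned by `strptime`), it assumes the timestamps are Chicago local time, and it asks `difftime` for seconds explicitly.

```r
# Timestamps copied from the first two rides in the str() output
started_at <- c("2022-01-13 11:59:47", "2022-01-10 08:41:56")
ended_at   <- c("2022-01-13 12:02:44", "2022-01-10 08:46:17")

# Assumption: the timestamps are local Chicago time
started_time <- as.POSIXct(started_at, format = "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")
ended_time   <- as.POSIXct(ended_at,   format = "%Y-%m-%d %H:%M:%S", tz = "America/Chicago")

# Explicit units: without them difftime chooses the unit from the data
ride_length_s <- as.numeric(difftime(ended_time, started_time, units = "secs"))
ride_length_s  # 177 261
```

Both values agree with the `ride_length_s` column reported for these rides.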
# Converting Ride Length from seconds into minutes in a new column
trips22 <- trips22 %>%
mutate(ride_length_m = ride_length_s/60)
trips22$ride_length_m <- round(trips22$ride_length_m, digits = 0)
# Calculating distance in kilometers in a new column
trips22 <- trips22 %>%
mutate(distance_km = distHaversine(cbind(trips22$start_lng, trips22$start_lat),
cbind(trips22$end_lng,
trips22$end_lat))*0.001)
# Checking the new attributes
str(trips22)
## 'data.frame': 5667717 obs. of 25 variables:
## $ ride_id : chr "C2F7DD78E82EC875" "A6CF8980A652D272" "BD0F91DFF741C66D" "CBB80ED419105406" ...
## $ rideable_type : chr "electric_bike" "electric_bike" "classic_bike" "classic_bike" ...
## $ started_at : chr "2022-01-13 11:59:47" "2022-01-10 08:41:56" "2022-01-25 04:53:40" "2022-01-04 00:18:04" ...
## $ ended_at : chr "2022-01-13 12:02:44" "2022-01-10 08:46:17" "2022-01-25 04:58:01" "2022-01-04 00:33:00" ...
## $ start_station_name: chr "Glenwood Ave & Touhy Ave" "Glenwood Ave & Touhy Ave" "Sheffield Ave & Fullerton Ave" "Clark St & Bryn Mawr Ave" ...
## $ start_station_id : chr "525" "525" "TA1306000016" "KA1504000151" ...
## $ end_station_name : chr "Clark St & Touhy Ave" "Clark St & Touhy Ave" "Greenview Ave & Fullerton Ave" "Paulina St & Montrose Ave" ...
## $ end_station_id : chr "RP-007" "RP-007" "TA1307000001" "TA1309000021" ...
## $ start_lat : num 42 42 41.9 42 41.9 ...
## $ start_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ end_lat : num 42 42 41.9 42 41.9 ...
## $ end_lng : num -87.7 -87.7 -87.7 -87.7 -87.6 ...
## $ member_casual : chr "casual" "casual" "member" "casual" ...
## $ date : Date, format: "2022-01-13" "2022-01-10" ...
## $ year : chr "2022" "2022" "2022" "2022" ...
## $ month : chr "01" "01" "01" "01" ...
## $ day : chr "13" "10" "25" "04" ...
## $ weekday : chr "Thursday" "Monday" "Tuesday" "Tuesday" ...
## $ part_of_week : chr "Workday" "Workday" "Workday" "Workday" ...
## $ started_time : POSIXlt, format: "2022-01-13 11:59:47" "2022-01-10 08:41:56" ...
## $ ended_time : POSIXlt, format: "2022-01-13 12:02:44" "2022-01-10 08:46:17" ...
## $ hour : int 11 8 4 0 1 18 18 12 7 15 ...
## $ ride_length_s : num 177 261 261 896 362 ...
## $ ride_length_m : num 3 4 4 15 6 3 17 12 25 7 ...
## $ distance_km : num 0.7 0.695 1.002 2.466 0.815 ...
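The `distance_km` column is the great-circle distance between the start and end coordinates, not the distance actually cycled. To make clear what `distHaversine` computes, here is a minimal base R sketch of the haversine formula. The function name `haversine_km` is hypothetical; the default radius of 6378137 m is the same default that `geosphere::distHaversine` uses.

```r
# Haversine great-circle distance in kilometers between two points
# given as longitude/latitude in decimal degrees.
haversine_km <- function(lng1, lat1, lng2, lat2, radius_m = 6378137) {
  to_rad <- pi / 180
  d_lat <- (lat2 - lat1) * to_rad
  d_lng <- (lng2 - lng1) * to_rad
  a <- sin(d_lat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(d_lng / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_m * asin(pmin(1, sqrt(a))) / 1000
}

# One degree of latitude at a fixed longitude is about 111.3 km
haversine_km(-87.65, 41.90, -87.65, 42.90)
```

A ride that starts and ends at the same station has a distance of 0 under this measure, however long the ride lasted.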
Clean data is moved to a new data frame: trips22_2
# Cleaning the dataframe of negative and zero times
trips22_2 <- trips22[(trips22$ride_length_s > 0),]
# Calculating the rows eliminated
nrow(trips22)-nrow(trips22_2)
## [1] 531
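One detail of this kind of filter is worth illustrating. When the condition inside `[ , ]` evaluates to `NA` (for example because `ride_length_s` is missing), base R does not drop the row: it returns a row in which every column is `NA`. The toy example below, with made-up values, shows the effect and two ways to avoid it. Whether this is the source of the `NA` values seen later in this report is an assumption that would need to be checked against the data.

```r
# Toy data: one valid ride, one negative, one missing, one zero-length
toy <- data.frame(
  ride_id       = c("a", "b", "c", "d"),
  ride_length_s = c(120, -5, NA, 0)
)

# Logical indexing keeps an all-NA placeholder row for the NA comparison
kept_bracket <- toy[toy$ride_length_s > 0, ]
nrow(kept_bracket)                 # 2
sum(is.na(kept_bracket$ride_id))   # 1

# which() drops the NA comparison, so only the valid ride remains
kept_which <- toy[which(toy$ride_length_s > 0), ]
nrow(kept_which)                   # 1

# subset() (like dplyr::filter()) also treats NA as "do not keep"
kept_subset <- subset(toy, ride_length_s > 0)
nrow(kept_subset)                  # 1
```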
Checking for duplicates. The rest of the cleaning and analysis takes place in the new data frame.
# Listing duplicated ride_id values
trips22_2$ride_id[duplicated(trips22_2$ride_id)]
## [1] NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA NA
## [output truncated: 1,025 duplicated values in total, all NA]
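The call above only lists the duplicated values; it does not remove anything. A minimal sketch of counting and then dropping duplicated IDs, on made-up data, could look like this. Note that `duplicated()` treats repeated `NA` values as duplicates of each other, which is why a column with many missing IDs reports many "duplicates".

```r
# Toy data: "a" appears twice and the ID is missing twice
toy <- data.frame(
  ride_id       = c("a", "b", "a", NA, NA),
  ride_length_s = c(100, 200, 100, NA, NA)
)

# Number of rows whose ride_id already appeared earlier
n_dup <- sum(duplicated(toy$ride_id))
n_dup          # 2

# Keep the first occurrence of every ride_id
deduped <- toy[!duplicated(toy$ride_id), ]
nrow(deduped)  # 3
```

After this step one row with a missing `ride_id` is still present, so missing IDs need their own filter, for example `!is.na(ride_id)`.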
Limitations: Some attributes have missing values, which may cause inaccuracies in further analysis.
# Identifying missing data
sapply(trips22_2,function(x) sum(is.na(x)))
## ride_id rideable_type started_at ended_at
## 1026 1026 1026 1026
## start_station_name start_station_id end_station_name end_station_id
## 1026 1026 1026 1026
## start_lat start_lng end_lat end_lng
## 1026 1026 6884 6884
## member_casual date year month
## 1026 1026 1026 1026
## day weekday part_of_week started_time
## 1026 1026 1026 1026
## ended_time hour ride_length_s ride_length_m
## 1026 1026 1026 1026
## distance_km
## 6884
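The counts above fall into two groups: 1026 for most columns and 6884 for the end coordinates and the distance. The difference, 6884 - 1026 = 5858, equals the number of NA's that `summary(trips22)` reported for `end_lat` and `end_lng`. That pattern suggests 1026 rows that are empty in every column plus 5858 rides that only lack end coordinates. The sketch below, on made-up data, shows one way to separate the two cases; the variable names are illustrative.

```r
# Toy data: one complete ride, one ride without end coordinates, one empty row
toy <- data.frame(
  ride_id = c("a", "b", NA),
  end_lat = c(41.9, NA, NA),
  end_lng = c(-87.6, NA, NA)
)

# Rows in which every column is missing
all_missing <- rowSums(is.na(toy)) == ncol(toy)
sum(all_missing)                       # 1

# Missing values per column once the empty rows are set aside
na_per_column <- colSums(is.na(toy[!all_missing, ]))
na_per_column
```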
**Summary**
Where:
* number_rides: number of rides made by users
* avg_duration: average duration of a ride
* min_duration: the shortest duration of a ride
* max_duration: the longest duration of a ride
* avg_distance: average distance of a ride
* min_distance: the shortest distance of a ride
* max_distance: the longest distance of a ride
# General stats
trips22_2 %>%
group_by(member_casual) %>%
summarize(number_rides = n(), avg_duration_m = mean(ride_length_m),
min_duration_m = min(ride_length_m), max_duration_m = max(ride_length_m),
avg_distance_km = mean(distance_km, na.rm = TRUE) ) %>%
drop_na()
# For distance
summary(trips22_2$distance_km)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.000 0.874 1.576 2.142 2.783 9825.063 6884
# For duration
trips22_2 %>%
group_by(member_casual) %>%
summarize(avg_duration_m = mean (ride_length_m), median_duration_m = median (ride_length_m),
max_duration_m = max (ride_length_m), min_duration_m = min (ride_length_m)) %>%
drop_na()
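The distance summary has a maximum of 9825.063 km, and the earlier `summary(trips22)` output shows a minimum `end_lat` of 0.00 and a maximum `end_lng` of 0.00. A plausible reading, which would need to be confirmed in the data, is that a few rides carry placeholder end coordinates of (0, 0) and therefore an unrealistic distance. The sketch below uses made-up values and shows one way to blank out such distances before averaging.

```r
# Toy data: the second ride ends at the placeholder point (0, 0)
toy <- data.frame(
  end_lat     = c(41.90, 0, 41.93),
  end_lng     = c(-87.64, 0, -87.66),
  distance_km = c(1.2, 9825.1, 2.5)
)

# Assumption: an end point at exactly (0, 0) is a placeholder, not a real location
placeholder_end <- !is.na(toy$end_lat) & !is.na(toy$end_lng) &
  toy$end_lat == 0 & toy$end_lng == 0

# Blank out the affected distances so they do not inflate the mean
toy$distance_km[placeholder_end] <- NA
avg_distance_km <- mean(toy$distance_km, na.rm = TRUE)
avg_distance_km  # 1.85
```

The median is less sensitive to such outliers than the mean, so comparing both is a quick way to see how much a few extreme rows matter.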
**Pie chart of total rides**
# Pie chart of total rides
trips22_2 %>%
group_by(member_casual) %>%
summarize(count_of = round(n()/5667186*100, 0)) %>%
drop_na() %>%
ggplot(aes(x = "", y = count_of, fill = member_casual)) +coord_polar(theta="y")+
geom_bar(stat = "identity")+ labs(title = "Number of rides by user type (in %)",
subtitle = "For the period between January to December of 2022",caption = "Total rides = 5,667,186",fill = "Member type")+theme_void() +
geom_text(aes(label=count_of), position=position_stack(vjust=0.5),color="white",size=5)
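The percentages in the chart are computed against a hard-coded total of 5,667,186 rides. As a hedged alternative, the shares can be derived from the data itself so that they stay correct if the number of rows changes. The sketch below uses a made-up vector of user types and base R's `table()` and `prop.table()`.

```r
# Toy vector standing in for trips22_2$member_casual
user_type <- c("member", "member", "member", "casual", "casual")

# Share of rides per user type, in whole percent
share_pct <- round(prop.table(table(user_type)) * 100)
share_pct
```

`table()` ignores `NA` values by default, so rows without a user type do not enter the denominator.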
**Number of rides by user type, and day of the week**
# Number of rides by user type by day of the week
trips22_2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarize(number_rides_m = n()/1000
,avg_duration_m = mean(ride_length_m)) %>%
arrange(member_casual, weekday) %>%
drop_na() %>%
ggplot(aes(x = weekday, y = number_rides_m, fill = member_casual)) + facet_wrap(~member_casual)+
geom_col(position = "dodge") +
labs(title = "Number of rides by user type and day of the week",
subtitle = "For the period between January to December of 2022",
x = "Day of the week", y = "Number of rides (in thousands)", fill = "Member Type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
**Number of rides by user type, and month**
# Number of rides by user type by month
trips22_2 %>%
mutate(month = month(started_at, label = TRUE)) %>%
group_by(member_casual, month) %>%
summarize(number_of_rides_m = n()/1000
,avg_duration_m = mean(ride_length_m)) %>%
arrange(member_casual, month) %>%
drop_na() %>%
ggplot(aes(x = month, y = number_of_rides_m)) +
geom_point(aes(group = member_casual, color = member_casual)) +
geom_line(aes(group = member_casual, color = member_casual)) +
labs(title = "Number of rides by user type and month",
subtitle = "For the period between January to December of 2022",
x = "Month", y = "Number of rides (in thousands)", color = "Member Type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
**Number of rides by day hour, part of the week, and user**
# Number of rides by day hour, part of the week, and user
trips22_2 %>%
group_by(member_casual, hour, part_of_week) %>%
summarize(number_of_rides_m = n()/1000
,avg_duration_m = mean(ride_length_m)) %>%
arrange(member_casual, hour) %>%
drop_na() %>%
ggplot(aes(x = hour, y = number_of_rides_m)) +
geom_point(aes(group = member_casual, color = member_casual)) +
geom_line(aes(group = member_casual, color = member_casual)) +
facet_wrap(~part_of_week)+labs(title = "Number of rides by user type, hour, and part of the week",
subtitle = "For the period between January to December of 2022",
x = "Started hour", y = "Number of rides (in thousands)",
color = "Member Type")
## `summarise()` has grouped output by 'member_casual', 'hour'. You can override
## using the `.groups` argument.
**Average distance by member type, and part of the week**
# Average distance by member type, and part of the week
trips22_2 %>%
group_by(member_casual, part_of_week) %>%
summarize(avg_distance_km = mean(distance_km, na.rm = TRUE)) %>%
drop_na() %>%
ggplot(aes(x = part_of_week, y = avg_distance_km, fill = member_casual))+facet_wrap(~member_casual)+
geom_col(position = "dodge")+ labs(title = "Average distance by user type and part of the week",
subtitle = "For the period between January to December of 2022",
x = "Part of the week", y = "Average distance (in km)", fill = "Member Type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
**Average distance by member type, and month**
# Average distance by member type, and month
trips22_2 %>%
mutate(month = month(started_at, label = TRUE)) %>%
group_by(member_casual, month) %>%
summarize(avg_distance_km = mean(distance_km, na.rm = TRUE)) %>%
drop_na() %>%
ggplot(aes(x = month, y = avg_distance_km, fill = member_casual))+
geom_col(position = "dodge")+ labs(title = "Average distance by user type and month",
subtitle = "For the period between January to December of 2022",
x = "Month", y = "Average distance (in km)", fill = "Member Type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
**Average duration of rides (in minutes) by user type, and month**
# Average duration of rides (in minutes) by user type, and month
trips22_2 %>%
mutate(month = month(started_at, label = TRUE)) %>%
group_by(member_casual, month) %>%
summarize(avg_duration_m = mean (ride_length_m)) %>%
drop_na() %>%
ggplot(aes(x = month, y = avg_duration_m, fill = member_casual))+facet_wrap(~member_casual)+
geom_col(position = "dodge")+ labs(title = "Average duration by user type and month",
subtitle = "For the period between January to December of 2022",
x = "Month", y = "Average duration (in minutes)", fill = "Member Type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
**Average duration of rides (in minutes) by day of the week**
# Average duration of rides (in minutes) by day of the week
trips22_2 %>%
mutate(weekday = wday(started_at, label = TRUE)) %>%
group_by(member_casual, weekday) %>%
summarize(avg_duration_m = mean (ride_length_m)) %>%
drop_na() %>%
ggplot(aes(x = weekday, y = avg_duration_m, fill = member_casual))+facet_wrap(~member_casual)+
geom_col(position = "dodge")+ labs(title = "Average duration by user type and day of the week",
subtitle = "For the period between January to December of 2022",
x = "Day of the week", y = "Average duration (in minutes)", fill = "Member Type")
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
**Bike preference by user type (in thousands of rides)**
# Bike preference by user type (in thousands of rides)
trips22_2 %>%
group_by(member_casual, rideable_type, part_of_week) %>%
summarize(count_of_m = n()/1000) %>%
drop_na() %>%
ggplot(aes(x = member_casual, y = count_of_m, fill = rideable_type)) + facet_wrap(~part_of_week)+
geom_bar(stat = "identity")+ labs(title = "Bike preference by user type",
subtitle = "For the period between January to December of 2022",
x = "User Type", y = "Number of rides (in thousands)", fill = "Bike Type")
## `summarise()` has grouped output by 'member_casual', 'rideable_type',
## 'part_of_week'. You can override using the `.groups` argument.
**Top 5 start stations**
First, I filtered out the start stations without a name (empty string or NA).
# Filtering out start stations without names
top_5_start_st <- trips22_2 %>%
filter (start_station_name != "") %>%
group_by(member_casual, start_station_name) %>%
drop_na(start_station_name) %>%
summarize(count_of= n()) %>%
arrange(desc(count_of))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
Then, I ranked the top 5 start stations for members and for casual riders.
# Top 5 start stations for members
top_5_start_member <- filter (top_5_start_st, member_casual == "member") %>%
slice(1:5)
print(top_5_start_member)
## # A tibble: 5 × 3
## # Groups: member_casual [1]
## member_casual start_station_name count_of
## <chr> <chr> <int>
## 1 member Kingsbury St & Kinzie St 24936
## 2 member Clark St & Elm St 22030
## 3 member Wells St & Concord Ln 21294
## 4 member University Ave & 57th St 19948
## 5 member Clinton St & Washington Blvd 19827
# Top 5 start stations for casual riders
top_5_start_casual <- filter (top_5_start_st, member_casual == "casual") %>%
slice(1:5)
print(top_5_start_casual)
## # A tibble: 5 × 3
## # Groups: member_casual [1]
## member_casual start_station_name count_of
## <chr> <chr> <int>
## 1 casual Streeter Dr & Grand Ave 58078
## 2 casual DuSable Lake Shore Dr & Monroe St 31850
## 3 casual Millennium Park 25519
## 4 casual Michigan Ave & Oak St 25263
## 5 casual DuSable Lake Shore Dr & North Blvd 23651
# Filtering out end stations without names
top_5_end_st <- trips22_2 %>%
filter (end_station_name != "") %>%
group_by(member_casual, end_station_name) %>%
drop_na(end_station_name) %>%
summarize(count_of= n()) %>%
arrange(desc(count_of))
## `summarise()` has grouped output by 'member_casual'. You can override using the
## `.groups` argument.
# Top 5 end stations for members
top_5_end_member <- filter (top_5_end_st, member_casual == "member") %>%
slice(1:5)
print(top_5_end_member)
## # A tibble: 5 × 3
## # Groups: member_casual [1]
## member_casual end_station_name count_of
## <chr> <chr> <int>
## 1 member Kingsbury St & Kinzie St 24634
## 2 member Clark St & Elm St 22361
## 3 member Wells St & Concord Ln 21912
## 4 member University Ave & 57th St 20531
## 5 member Clinton St & Washington Blvd 20529
# Top 5 end stations for casual riders
top_5_end_casual <- filter (top_5_end_st, member_casual == "casual") %>%
slice(1:5)
print(top_5_end_casual)
## # A tibble: 5 × 3
## # Groups: member_casual [1]
## member_casual end_station_name count_of
## <chr> <chr> <int>
## 1 casual Streeter Dr & Grand Ave 59864
## 2 casual DuSable Lake Shore Dr & Monroe St 29600
## 3 casual Millennium Park 26673
## 4 casual Michigan Ave & Oak St 26446
## 5 casual DuSable Lake Shore Dr & North Blvd 26139
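The same ranking logic (drop unnamed stations, count rides per station, keep the busiest ones for one user type) can be wrapped in a single function so that it is not repeated for start and end stations. The sketch below is a base R illustration on made-up data; `top_stations()` and its arguments are hypothetical names, not part of the analysis above.

```r
# Toy data: station names per ride, with one unnamed station
toy <- data.frame(
  member_casual = c("member", "member", "member",
                    "casual", "casual", "casual", "casual"),
  start_station_name = c("Clark St & Elm St", "Clark St & Elm St", "",
                         "Millennium Park", "Millennium Park",
                         "Millennium Park", "Clark St & Elm St")
)

# Hypothetical helper: the n busiest stations in `column` for one user type
top_stations <- function(df, user, column = "start_station_name", n = 5) {
  stations <- df[[column]]
  # which() discards NA comparisons, so missing names are dropped too
  keep <- which(df$member_casual == user & stations != "")
  head(sort(table(stations[keep]), decreasing = TRUE), n)
}

top_casual <- top_stations(toy, "casual", n = 1)
top_member <- top_stations(toy, "member")
top_casual
```

Passing `column = "end_station_name"` would reuse the same function for end stations.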