Cyclistic_Case_Study

1. ASK

Asking Questions

Goal of the Marketing Director: Design marketing strategies aimed at converting casual riders into annual members

Three questions will guide the future marketing program:

How do annual members and casual riders use Cyclistic bikes differently?
Why would casual riders buy Cyclistic annual memberships?
How can Cyclistic use digital media to influence casual riders to become members?

This Case Study focuses on the first question.

What is the Business Task ?

“Analyzing the difference in usage patterns of casual riders and annual members with the aim to convert casual riders into annual members”

Who are Key Stakeholders ?

The Director of Marketing: responsible for the development of campaigns and initiatives to promote the bike-share program. These may include email, social media, and other channels.
Cyclistic Executive Team: The notoriously detail-oriented executive team will decide whether to approve the recommended marketing program.

2. DATA PREPARATION

The data is on an AWS server where it is easily downloadable and named correctly. I have downloaded the previous 12 months data and stored it locally for the next steps in this process. It is organized by year and Fiscal Quarters. The data is reliable and original since it comes from the company.This Data is ROCCC.

3. DATA PROCESSING

Starting with the Process and installing the packages and libraries required for this job

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching packages
## ───────────────────────────────────────
## tidyverse 1.3.2 ──

## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ stringr 1.4.0
## ✔ tidyr   1.2.0     ✔ forcats 0.5.1
## ✔ readr   2.1.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()

library(skimr)
library(janitor)

## 
## Attaching package: 'janitor'
## 
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

library(readr)
library(ggmap)

## Google's Terms of Service: https://cloud.google.com/maps-platform/terms/.
## Please cite ggmap if you use it! See citation("ggmap") for details.

library(lubridate)

## 
## Attaching package: 'lubridate'
## 
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

Loading Cyclist Data in R

Because of limited RAM Available on RStudio, here we are using only 6 Datasets, because adding more datasets results in RStudio Crash.

trip_jan <- read_csv("D:/Manjeet/RStudio/Cyclistic_Bike_Data/202101-divvy-tripdata.csv")

## Rows: 96834 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

trip_feb <- read_csv("D:/Manjeet/RStudio/Cyclistic_Bike_Data/202102-divvy-tripdata.csv")

## Rows: 49622 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

trip_mar <- read_csv("D:/Manjeet/RStudio/Cyclistic_Bike_Data/202103-divvy-tripdata.csv")

## Rows: 228496 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

trip_apr <- read_csv("D:/Manjeet/RStudio/Cyclistic_Bike_Data/202104-divvy-tripdata.csv")

## Rows: 337230 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

trip_may <- read_csv("D:/Manjeet/RStudio/Cyclistic_Bike_Data/202105-divvy-tripdata.csv")

## Rows: 531633 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

trip_june <- read_csv("D:/Manjeet/RStudio/Cyclistic_Bike_Data/202106-divvy-tripdata.csv")

## Rows: 729595 Columns: 13
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (7): ride_id, rideable_type, start_station_name, start_station_id, end_...
## dbl  (4): start_lat, start_lng, end_lat, end_lng
## dttm (2): started_at, ended_at
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Binding Data Together

trip2021 <- rbind(trip_jan,trip_feb,trip_mar,trip_apr,trip_may,trip_june)

Checking Data Structure

str(trip2021)

## spec_tbl_df [1,973,410 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:1973410] "E19E6F1B8D4C42ED" "DC88F20C2C55F27F" "EC45C94683FE3F27" "4FA453A75AE377DB" ...
##  $ rideable_type     : chr [1:1973410] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXct[1:1973410], format: "2021-01-23 16:14:19" "2021-01-27 18:43:08" ...
##  $ ended_at          : POSIXct[1:1973410], format: "2021-01-23 16:24:44" "2021-01-27 18:47:12" ...
##  $ start_station_name: chr [1:1973410] "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" ...
##  $ start_station_id  : chr [1:1973410] "17660" "17660" "17660" "17660" ...
##  $ end_station_name  : chr [1:1973410] NA NA NA NA ...
##  $ end_station_id    : chr [1:1973410] NA NA NA NA ...
##  $ start_lat         : num [1:1973410] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:1973410] -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num [1:1973410] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:1973410] -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr [1:1973410] "member" "member" "member" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

When we saw the structure of the file, we found that start and end date were in character format. We need to change that to date and time format.

Changing TIme and Data Format

trip2021$started_at = strptime(trip2021$started_at,"%Y-%m-%d %H:%M:%S")
trip2021$ended_at = strptime(trip2021$ended_at,"%Y-%m-%d %H:%M:%S")

Check Data Structure Again for Date and Time

str(trip2021)

## spec_tbl_df [1,973,410 × 13] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ ride_id           : chr [1:1973410] "E19E6F1B8D4C42ED" "DC88F20C2C55F27F" "EC45C94683FE3F27" "4FA453A75AE377DB" ...
##  $ rideable_type     : chr [1:1973410] "electric_bike" "electric_bike" "electric_bike" "electric_bike" ...
##  $ started_at        : POSIXlt[1:1973410], format: "2021-01-23 16:14:19" "2021-01-27 18:43:08" ...
##  $ ended_at          : POSIXlt[1:1973410], format: "2021-01-23 16:24:44" "2021-01-27 18:47:12" ...
##  $ start_station_name: chr [1:1973410] "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" "California Ave & Cortez St" ...
##  $ start_station_id  : chr [1:1973410] "17660" "17660" "17660" "17660" ...
##  $ end_station_name  : chr [1:1973410] NA NA NA NA ...
##  $ end_station_id    : chr [1:1973410] NA NA NA NA ...
##  $ start_lat         : num [1:1973410] 41.9 41.9 41.9 41.9 41.9 ...
##  $ start_lng         : num [1:1973410] -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ end_lat           : num [1:1973410] 41.9 41.9 41.9 41.9 41.9 ...
##  $ end_lng           : num [1:1973410] -87.7 -87.7 -87.7 -87.7 -87.7 ...
##  $ member_casual     : chr [1:1973410] "member" "member" "member" "member" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   ride_id = col_character(),
##   ..   rideable_type = col_character(),
##   ..   started_at = col_datetime(format = ""),
##   ..   ended_at = col_datetime(format = ""),
##   ..   start_station_name = col_character(),
##   ..   start_station_id = col_character(),
##   ..   end_station_name = col_character(),
##   ..   end_station_id = col_character(),
##   ..   start_lat = col_double(),
##   ..   start_lng = col_double(),
##   ..   end_lat = col_double(),
##   ..   end_lng = col_double(),
##   ..   member_casual = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

Adding Trip Duration

trip2021<-mutate(trip2021,tripduration=difftime(ended_at,started_at, units = "secs"))

Preview Data

head(trip2021)

## # A tibble: 6 × 14
##   ride_id        ridea…¹ started_at          ended_at            start…² start…³
##   <chr>          <chr>   <dttm>              <dttm>              <chr>   <chr>  
## 1 E19E6F1B8D4C4… electr… 2021-01-23 16:14:19 2021-01-23 16:24:44 Califo… 17660  
## 2 DC88F20C2C55F… electr… 2021-01-27 18:43:08 2021-01-27 18:47:12 Califo… 17660  
## 3 EC45C94683FE3… electr… 2021-01-21 22:35:54 2021-01-21 22:37:14 Califo… 17660  
## 4 4FA453A75AE37… electr… 2021-01-07 13:31:13 2021-01-07 13:42:55 Califo… 17660  
## 5 BE5E8EB4E7263… electr… 2021-01-23 02:24:02 2021-01-23 02:24:45 Califo… 17660  
## 6 5D8969F88C773… electr… 2021-01-09 14:24:07 2021-01-09 15:17:54 Califo… 17660  
## # … with 8 more variables: end_station_name <chr>, end_station_id <chr>,
## #   start_lat <dbl>, start_lng <dbl>, end_lat <dbl>, end_lng <dbl>,
## #   member_casual <chr>, tripduration <drtn>, and abbreviated variable names
## #   ¹rideable_type, ²start_station_name, ³start_station_id
## # ℹ Use `colnames()` to see all variable names

Filter the Data

Here we’re filtering out ‘tripduaration’ greater than zero because is after Viewing Trip data we found a lot of values less than zero

trip2021 <- filter(trip2021,trip2021$tripduration>0)

Taking a Glimpse at data

glimpse(trip2021)

## Rows: 1,973,230
## Columns: 14
## $ ride_id            <chr> "E19E6F1B8D4C42ED", "DC88F20C2C55F27F", "EC45C94683…
## $ rideable_type      <chr> "electric_bike", "electric_bike", "electric_bike", …
## $ started_at         <dttm> 2021-01-23 16:14:19, 2021-01-27 18:43:08, 2021-01-…
## $ ended_at           <dttm> 2021-01-23 16:24:44, 2021-01-27 18:47:12, 2021-01-…
## $ start_station_name <chr> "California Ave & Cortez St", "California Ave & Cor…
## $ start_station_id   <chr> "17660", "17660", "17660", "17660", "17660", "17660…
## $ end_station_name   <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "Wood St & Augu…
## $ end_station_id     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, "657", "13258",…
## $ start_lat          <dbl> 41.90034, 41.90033, 41.90031, 41.90040, 41.90033, 4…
## $ start_lng          <dbl> -87.69674, -87.69671, -87.69664, -87.69666, -87.696…
## $ end_lat            <dbl> 41.89000, 41.90000, 41.90000, 41.92000, 41.90000, 4…
## $ end_lng            <dbl> -87.72000, -87.69000, -87.70000, -87.69000, -87.700…
## $ member_casual      <chr> "member", "member", "member", "member", "casual", "…
## $ tripduration       <drtn> 625 secs, 244 secs, 80 secs, 702 secs, 43 secs, 32…

Removing values that are missing/blank

trip2021<-trip2021 %>% 
  na.omit()

Adding the individual columns for date, day, month, year, day of the week to ease the in-depth analysis

trip2021$date<-as.Date(trip2021$started_at)
trip2021$month<-format(as.Date(trip2021$started_at),"%m")
trip2021$day<-format(as.Date(trip2021$started_at),"%d")
trip2021$year<-format(as.Date(trip2021$started_at),"%y")
trip2021$day_of_week<-format(as.Date(trip2021$started_at),"%A")

4. ANALYZE

The Data has been Prepared and Processed now ready for Descriptive Analysis

Here will be performing calculations on the cleaned, consistent data and identification of trends, patterns and relationships as well as statistical analysis by calculating mean, median, maximum and minimum on ride length column for both casual riders and members

Finding mean (total ride length/total rides), median(midpoint), max(longest), min(shortest) for ride_length

trip2021 %>% 
  group_by(member_casual) %>% summarise(average_trip_duration=mean(tripduration),median_duration=median(tripduration),max_trip_duration=max(tripduration),min_trip_duration=min(tripduration))

## # A tibble: 2 × 5
##   member_casual average_trip_duration median_duration max_trip_duration min_tr…¹
##   <chr>         <drtn>                <drtn>          <drtn>            <drtn>  
## 1 casual        2321.3509 secs        1116 secs       3356649 secs      1 secs  
## 2 member         841.1266 secs         621 secs         89738 secs      1 secs  
## # … with abbreviated variable name ¹min_trip_duration

Calculating Total No. of Rides.

trip2021 %>% 
  group_by(member_casual) %>% 
  summarise(ride_count=length(ride_id))

## # A tibble: 2 × 2
##   member_casual ride_count
##   <chr>              <int>
## 1 casual            740369
## 2 member            949760

Calculating average ride length and no. of rides as per day of the week.

trip2021 %>% 
  group_by(member_casual,day_of_week) %>% 
  summarise(number_of_rides=n(),average_trip_duration=mean(tripduration),.groups = "drop")

## # A tibble: 14 × 4
##    member_casual day_of_week number_of_rides average_trip_duration
##    <chr>         <chr>                 <int> <drtn>               
##  1 casual        Friday                98136 2260.3380 secs       
##  2 casual        Monday                84057 2187.2885 secs       
##  3 casual        Saturday             171318 2469.5763 secs       
##  4 casual        Sunday               155323 2657.3575 secs       
##  5 casual        Thursday              69671 1964.1777 secs       
##  6 casual        Tuesday               83191 2124.9971 secs       
##  7 casual        Wednesday             78673 2078.4810 secs       
##  8 member        Friday               134603  805.0850 secs       
##  9 member        Monday               131540  810.5589 secs       
## 10 member        Saturday             139237  934.6932 secs       
## 11 member        Sunday               124329  966.5500 secs       
## 12 member        Thursday             127236  779.0815 secs       
## 13 member        Tuesday              144874  806.9532 secs       
## 14 member        Wednesday            147941  794.4575 secs

Adding another column for different periods in a day

breaks<-hour((hm("00:00", "6:00", "12:00", "18:00", "23:59")))

labels <- c("Night", "Morning", "Afternoon", "Evening")

Defining Time of the Day

trip2021$time_of_the_trip <-cut(x=hour(trip2021$started_at), breaks = breaks, labels = labels, include.lowest = "true")

Comparing Ride Lengths between different Times of the Day

trip2021 %>% 
  group_by(member_casual,time_of_the_trip) %>% 
  summarise(number_of_rides=n(),average_trip_duration=mean(tripduration),.groups = "drop")

## # A tibble: 8 × 4
##   member_casual time_of_the_trip number_of_rides average_trip_duration
##   <chr>         <fct>                      <int> <drtn>               
## 1 casual        Night                      46680 2760.9889 secs       
## 2 casual        Morning                   162059 2100.8855 secs       
## 3 casual        Afternoon                 367788 2303.0961 secs       
## 4 casual        Evening                   163842 2455.1382 secs       
## 5 member        Night                      55862  761.2808 secs       
## 6 member        Morning                   280562  800.9260 secs       
## 7 member        Afternoon                 450785  877.1041 secs       
## 8 member        Evening                   162551  838.1800 secs

5. SHARE

In this phase, we’ll be sharing the gained insights and findings through effective data visualizations. Bar charts are used to share the above analysis.

Visualizing Total Rides taken by Members & Casual Riders

trip2021 %>% 
  group_by(member_casual) %>% 
  summarise(ride_count=length(ride_id)) %>% 
  ggplot()+geom_col(mapping = aes(x=member_casual,y=ride_count, fill=member_casual),show.legend = "false") + 
  labs(title = "Total No of Rides")

## Warning: `show.legend` must be a logical vector.

Visualizing the Days of the Week with No. of Rides taken by Riders

trip2021 %>% 
  group_by(member_casual, day_of_week) %>% 
  summarise(number_of_rides=n(), .groups = "drop") %>% 
  ggplot(aes(x = day_of_week, y = number_of_rides, fill = member_casual)) +
  labs(title = "Total Rides vs.Day of The Week") +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  scale_y_continuous(labels = function(x) format(x,scientific = FALSE))

Visualizing Average Ride by Day of the Week

trip2021 %>% 
  group_by(member_casual,day_of_week) %>% 
  summarise(average_trip_duration=mean(tripduration), .groups = "drop") %>% 
  ggplot(aes(x = day_of_week, y = average_trip_duration, fill = member_casual)) +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  labs(title = "Average Trip Duration vs. Day of The Week")

## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.

Visualizing average rides by Month

trip2021 %>% 
  group_by(member_casual, month) %>%
  summarise(average_ride_length = mean(tripduration), .groups = "drop") %>%
  ggplot(aes(x = month, y = average_ride_length, fill = member_casual)) +
  geom_col(width = 0.5, position = position_dodge(width = 0.5)) +
  labs(title = "Average ride length vs. month")

## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.

visualizing and Comparing Casual and Member Rides by Distance

trip2021 %>% 
  group_by(member_casual) %>%
  summarise(average_ride_distance = mean(tripduration)) %>%
  ggplot() + geom_col(mapping = aes(x = member_casual, y = average_ride_distance, fill = member_casual), show.legend = FALSE) +
  labs(title = "Mean Distance Travelled")

## Don't know how to automatically pick scale for object of type difftime. Defaulting to continuous.

Comparison of Total rides with the Type of Ride

trip2021 %>% 
  group_by(member_casual, rideable_type) %>%
  summarise(number_of_rides = n(), .groups = "drop") %>%
  ggplot() + geom_col(mapping = aes(x = rideable_type, y = number_of_rides, fill = member_casual), show.legend = TRUE) +
  labs(title = "Total no. of Rides vs. Ride Type")

Visualization on MAP Using

Creating a New DataFrame for Most Popular Routes > 200 Rides

coordinates_df <- trip2021 %>%
  filter(start_lat != end_lng & start_lng != end_lat) %>%
  group_by(start_lng, start_lat, end_lng, end_lat, member_casual, rideable_type) %>%
  summarise(total_rides = n(), .groups = "drop") %>%
  filter(total_rides > 200)

casual_riders <- coordinates_df %>%
  filter(member_casual == "casual")

member_riders <- coordinates_df %>%
  filter(member_casual =="member")

Loading the Map of Chicago

chicago <- c(left = -87.700424, bottom = 41.790769, right = -87.554855, top = 41.990119)
chicago_map <- get_stamenmap(bbox = chicago, zoom = 12, maptype = "terrain" )

## Source : http://tile.stamen.com/terrain/12/1050/1520.png

## Source : http://tile.stamen.com/terrain/12/1051/1520.png

## Source : http://tile.stamen.com/terrain/12/1050/1521.png

## Source : http://tile.stamen.com/terrain/12/1051/1521.png

## Source : http://tile.stamen.com/terrain/12/1050/1522.png

## Source : http://tile.stamen.com/terrain/12/1051/1522.png

## Source : http://tile.stamen.com/terrain/12/1050/1523.png

## Source : http://tile.stamen.com/terrain/12/1051/1523.png

Casual Riders Hotspots

ggmap(chicago_map, darken = c(0.1, "white")) +
  geom_point(casual_riders, mapping = aes(x = start_lng, y = start_lat, color = rideable_type), size = 2) + coord_fixed(0.8) +
  labs(title = "Casual Riders Hotspots", x=NULL, y=NULL) + theme(legend.position = "Right")

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

## Warning: Removed 3 rows containing missing values (geom_point).

Member Riders Hotspots

ggmap(chicago_map, darken = c(0.1, "white")) +
  geom_point(member_riders, mapping = aes(x = start_lng, y = start_lat, color = rideable_type), size = 2) + coord_fixed(0.8) +
  labs(title = "Member Riders Hotspots", x=NULL, y=NULL) + theme(legend.position = "Right")

## Coordinate system already present. Adding new coordinate system, which will replace the existing one.

## Warning: Removed 7 rows containing missing values (geom_point).

6. ACT

Now that we have finished creating visualizations, its time to act on our findings and proposing the Top 3 Recommendations based on our analysis.

Weekend Membership: As we found that most of the casual riders prefer riding on weekends more, thus a weekend membership can attract new casual riders as well as the existing ones and also the weekend membership benefits can be used to influence them for extended memberships.
Marketing and Promotional Campaigns: The busiest time of the year for Cyclistic is in the 3rd quarter of the year when rides are on its peak for both type of riders which is the best time for promotional activities and campaigns. Those can be conducted nearby riding hotspots. Classic bikes are used the most thus offerings can be created for those.
Discounts and Riding Competitions: Cyclistic can organize bike riding competitions with exciting prizes and can offer discounted yearly memberships to the participants.

Additional data like pricing details etc. could be used to expand our findings and scope of analysis but the provided data is sufficient to conclude our findings and accomplish the business task.

Resources Used

For ggmap: http://journal.r-project.org/archive/2013-1/kahle-wickham.pdf https://cran.r-project.org/web/packages/ggmap/citation.html

Cyclistic_Case_Study_1

ManjeetGulati

2022-08-06

1. ASK

Asking Questions

What is the Business Task ?

Who are Key Stakeholders ?

2. DATA PREPARATION

3. DATA PROCESSING

Loading Cyclist Data in R

Binding Data Together

Checking Data Structure

Changing TIme and Data Format

Check Data Structure Again for Date and Time

Adding Trip Duration

Preview Data

Filter the Data

4. ANALYZE

Finding mean (total ride length/total rides), median(midpoint), max(longest), min(shortest) for ride_length

Calculating Total No. of Rides.

Calculating average ride length and no. of rides as per day of the week.

Adding another column for different periods in a day

Defining Time of the Day

Comparing Ride Lengths between different Times of the Day

6. ACT

Resources Used

Cyclistic_Case_Study_1

ManjeetGulati

2022-08-06

How Does a Bike-Share Navigate Speedy Success?

About the Company

Scenario

1. ASK

Asking Questions

What is the Business Task ?

Who are Key Stakeholders ?

2. DATA PREPARATION

3. DATA PROCESSING

Loading Cyclist Data in R

Binding Data Together

Checking Data Structure

Changing TIme and Data Format

Check Data Structure Again for Date and Time

Adding Trip Duration

Preview Data

Filter the Data

4. ANALYZE

Finding mean (total ride length/total rides), median(midpoint), max(longest), min(shortest) for ride_length

Calculating Total No. of Rides.

Calculating average ride length and no. of rides as per day of the week.

Adding another column for different periods in a day

Defining Time of the Day

Comparing Ride Lengths between different Times of the Day

5. SHARE

Visualizing Total Rides taken by Members & Casual Riders

Visualizing the Days of the Week with No. of Rides taken by Riders

Visualizing Average Ride by Day of the Week

Visualizing average rides by Month

visualizing and Comparing Casual and Member Rides by Distance

Comparison of Total rides with the Type of Ride

Visualization on MAP Using

Creating a New DataFrame for Most Popular Routes > 200 Rides

Loading the Map of Chicago

Casual Riders Hotspots

Member Riders Hotspots

6. ACT

Resources Used