This is a hypothetical case study of a fictional bike-share company, Cyclistic. Based in Chicago, the company’s bike-sharing program features more than 5,800 bicycles and 600 docking stations. Customers who purchase single-ride or full-day passes are referred to as casual riders, while customers who purchase annual memberships are Cyclistic members. The director of marketing believes the company’s future success depends on maximizing the number of annual memberships. As a junior data analyst, my task is to organize and analyze the given data to understand how annual members and casual riders use Cyclistic bikes differently, make data-driven decisions, and come up with recommendations that the Cyclistic executive team would approve. For this, I follow the steps of the data analysis process: ask, prepare, process, analyze, share, and act.

ASK

What’s the problem we are trying to solve?

Business Task: To analyze how annual members and casual riders use Cyclistic bikes differently, and to formulate effective marketing strategies that convert casual riders into annual members.

Key Stakeholders: Cyclistic executive team and Lily Moreno (Marketing Director and Manager)

PREPARE

Check for biases and determine the credibility of the data.

Data Source: The dataset used for this analysis meets the ROCCC criteria (reliable, original, comprehensive, current, cited). The data has been made available by Motivate International Inc.

The data does not contain personal information about the users.

I use the previous 12 months of data, i.e., August 2021 to July 2022. The files are in .csv format.

Tools used:

R: Data Cleaning and Manipulation

Tableau: Visualization

PROCESS

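1. To begin the process phase, I install and load the necessary packages,

install.packages('tidyverse')
install.packages('dplyr')
library(tidyverse)  # loads dplyr and readr
library(lubridate)  # provides wday() for the day-of-week columns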

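2. Then, I import and merge all 12 files into one data frame named “BikeTripData”.

library(readr)  # provides read_csv()
library(dplyr)  # provides %>% and bind_rows()
# path must be the folder that holds the 12 monthly .csv files
BikeTripData <- list.files(path = "C:/Users/Lenovo/Desktop/Cyclistic/BikeTripData.csv", full.names = TRUE) %>%
  lapply(read_csv) %>%
  bind_rows()
View(BikeTripData)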
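
As a quick check on the merge step, the sketch below writes two tiny CSV files to a temporary folder, stacks them, and confirms that no rows were lost. It uses base R (read.csv and rbind) instead of readr so that it runs without extra packages. The folder, file names, values, and the helper name merge_csv_folder are all hypothetical.

```r
# Hypothetical helper: read every .csv file in a folder and stack the rows.
merge_csv_folder <- function(folder) {
  files <- list.files(path = folder, pattern = "\\.csv$", full.names = TRUE)
  tables <- lapply(files, read.csv, stringsAsFactors = FALSE)
  merged <- do.call(rbind, tables)
  # Sanity check: the merged row count equals the sum of the per-file counts.
  stopifnot(nrow(merged) == sum(vapply(tables, nrow, integer(1))))
  merged
}

# Toy data standing in for two monthly files (made-up values).
demo_dir <- file.path(tempdir(), "cyclistic_demo")
dir.create(demo_dir, showWarnings = FALSE)
write.csv(data.frame(ride_id = c("a1", "a2"), member_casual = c("member", "casual")),
          file.path(demo_dir, "2021-08.csv"), row.names = FALSE)
write.csv(data.frame(ride_id = "b1", member_casual = "casual"),
          file.path(demo_dir, "2021-09.csv"), row.names = FALSE)

merged_demo <- merge_csv_folder(demo_dir)
print(nrow(merged_demo))
```

The same row-count comparison can be applied to the real monthly files before moving on to cleaning.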

Removing NAs (if any)

sum(is.na(BikeTripData))  # total number of missing values
BikeTripData <- na.omit(BikeTripData)  # drop every row that contains at least one NA
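
Because na.omit() drops a row if any column is missing, it can help to look at the missing values per column first. The sketch below does this on a toy data frame with made-up values; the column names mirror the trip data, but the numbers are illustrative only.

```r
# Toy data with made-up values; NA marks a missing entry.
trips <- data.frame(
  start_station_name = c("A St", NA, "B St", NA),
  end_lat = c(41.9, 41.8, NA, 41.7),
  member_casual = c("member", "casual", "casual", "member"),
  stringsAsFactors = FALSE
)

# Missing values per column, to see which columns drive the row loss.
na_per_column <- colSums(is.na(trips))
print(na_per_column)

# Number of rows that na.omit() removes.
cleaned <- na.omit(trips)
rows_dropped <- nrow(trips) - nrow(cleaned)
print(rows_dropped)
```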

Adding necessary columns

3. To calculate the length of each ride in minutes and in hours, I run the following code,

BikeTripData$ride_length_mins <- difftime(BikeTripData$ended_at, BikeTripData$started_at, units = "mins")
BikeTripData$ride_length_hours <- difftime(BikeTripData$ended_at, BikeTripData$started_at, units = "hours")

4. To calculate the day of the week on which each ride started and ended, I run the code,

BikeTripData$Start_day <- wday(BikeTripData$started_at, label=TRUE)
BikeTripData$End_day <- wday(BikeTripData$ended_at, label=TRUE)

5. I then calculate the straight-line distance of each ride in kilometres,

install.packages("geosphere")
library(geosphere)
# distHaversine() returns metres; divide by 1000 for km and assign the result back
BikeTripData <- mutate(BikeTripData, distance_km = distHaversine(cbind(start_lng, start_lat), cbind(end_lng, end_lat)) / 1000)
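
To make the derived columns easier to sanity-check, here is a minimal base-R sketch on two made-up rides. It computes the ride length in minutes, reimplements the haversine formula by hand (using 6378137 m, the default radius of distHaversine), and drops rides with a non-positive duration. The helper name haversine_km is hypothetical, and whether the real data contains such bad records is an assumption.

```r
# Toy rides with made-up timestamps and coordinates.
rides <- data.frame(
  started_at = as.POSIXct(c("2021-08-01 10:00:00", "2021-08-01 12:00:00"), tz = "UTC"),
  ended_at   = as.POSIXct(c("2021-08-01 10:30:00", "2021-08-01 11:59:00"), tz = "UTC"),
  start_lng = c(-87.6, -87.6), start_lat = c(41.0, 41.9),
  end_lng   = c(-87.6, -87.6), end_lat   = c(42.0, 41.9)
)

# Ride length in minutes; a negative value signals a bad record.
rides$ride_length_mins <- as.numeric(
  difftime(rides$ended_at, rides$started_at, units = "mins")
)

# Hypothetical helper: great-circle distance in km via the haversine formula.
haversine_km <- function(lng1, lat1, lng2, lat2, r = 6378137) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlng <- (lng2 - lng1) * to_rad
  a <- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlng / 2)^2
  2 * r * asin(pmin(1, sqrt(a))) / 1000
}
rides$distance_km <- haversine_km(rides$start_lng, rides$start_lat,
                                  rides$end_lng, rides$end_lat)

# Keep only rides with a positive duration.
valid_rides <- rides[rides$ride_length_mins > 0, ]
print(valid_rides[, c("ride_length_mins", "distance_km")])
```

Note that this distance is measured between the start and end stations, so a round trip that returns to the same station has a distance of zero regardless of how far the rider travelled.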

Organizing data

To make the data easier to read and understand, I separate the date and time in the started_at and ended_at columns into new columns,

For separating the date in YYYY-MM-DD format,

BikeTripData$Start_Date <- as.Date(BikeTripData$started_at)
BikeTripData$End_Date <- as.Date(BikeTripData$ended_at)

For separating the time in HH:MM:SS format,

BikeTripData$Start_time <- format(as.POSIXct(BikeTripData$started_at),format = "%H:%M:%S")
BikeTripData$End_time <- format(as.POSIXct(BikeTripData$ended_at),format = "%H:%M:%S")

Deleting the irrelevant columns

6. ride_id: This column holds a distinct identifier for each ride, not for each rider, so it cannot be used to connect pass purchases to individual riders, for example to determine whether casual riders live in the Cyclistic service area or have purchased multiple single-ride passes. I therefore remove it,

BikeTripData = subset(BikeTripData, select = -c(ride_id)) 

7. started_at, ended_at: Since I have separated the date and time into new columns, the original columns are no longer needed,

BikeTripData = subset(BikeTripData, select = -c(started_at, ended_at)) 

ANALYZE

After organizing and formatting the data, I perform calculations and identify trends and relationships.

To count the number of trips that start at each station for each user type, I make a new table “t1”

t1 <- BikeTripData %>%
  group_by(member_casual, start_station_name) %>%
  summarise(count_of = n()) %>%
  arrange(desc(count_of)) %>%
  na.omit()

# A tibble: 2,590 × 3
# Groups:   member_casual [2]
   member_casual start_station_name                 count_of
   <chr>         <chr>                                 <int>
 1 casual        Streeter Dr & Grand Ave               62984
 2 casual        DuSable Lake Shore Dr & Monroe St     33183
 3 casual        Millennium Park                       29219
 4 casual        Michigan Ave & Oak St                 28210
 5 casual        DuSable Lake Shore Dr & North Blvd    27312
 6 member        Kingsbury St & Kinzie St              26428
 7 member        Clark St & Elm St                     23548
 8 member        Wells St & Concord Ln                 23498
 9 casual        Shedd Aquarium                        21709
10 member        Wells St & Elm St                     20787
# … with 2,580 more rows
# ℹ Use `print(n = ...)` to see more rows

To find the top 5 stations frequented by casual riders,

table1.1 <- filter(t1, member_casual == "casual") %>%
  rename(number_of_trips = count_of) %>%
  slice(1:5)

# A tibble: 5 × 3
# Groups:   member_casual [1]
  member_casual start_station_name                 number_of_trips
  <chr>         <chr>                                        <int>
1 casual        Streeter Dr & Grand Ave                      62984
2 casual        DuSable Lake Shore Dr & Monroe St            33183
3 casual        Millennium Park                              29219
4 casual        Michigan Ave & Oak St                        28210
5 casual        DuSable Lake Shore Dr & North Blvd           27312

To find the top 5 stations frequented by annual members,

table1.2 <- filter(t1, member_casual == "member") %>%
  rename(number_of_trips = count_of) %>%
  slice(1:5)

# A tibble: 5 × 3
# Groups:   member_casual [1]
  member_casual start_station_name       number_of_trips
  <chr>         <chr>                              <int>
1 member        Kingsbury St & Kinzie St           26428
2 member        Clark St & Elm St                  23548
3 member        Wells St & Concord Ln              23498
4 member        Wells St & Elm St                  20787
5 member        Ellis Ave & 60th St                20437
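
The count-then-rank pattern used for the tables above can also be written without dplyr. The following is a minimal base-R sketch on made-up trips; the station names, counts, and the helper name top_stations are hypothetical and only illustrate the technique.

```r
# Toy trips with made-up station names.
trips <- data.frame(
  member_casual = c("casual", "casual", "casual",
                    "member", "member", "member", "member"),
  start_station_name = c("Pier", "Pier", "Park",
                         "Office", "Office", "Office", "Park"),
  stringsAsFactors = FALSE
)

# Count trips per rider type and start station.
counts <- aggregate(
  list(number_of_trips = rep(1L, nrow(trips))),
  by = list(member_casual = trips$member_casual,
            start_station_name = trips$start_station_name),
  FUN = sum
)

# Hypothetical helper: the n busiest start stations for one rider type.
top_stations <- function(counts, rider_type, n = 5) {
  subset_counts <- counts[counts$member_casual == rider_type, ]
  ordered <- subset_counts[order(-subset_counts$number_of_trips), ]
  head(ordered, n)
}

top_casual <- top_stations(counts, "casual", n = 1)
top_member <- top_stations(counts, "member", n = 1)
print(top_casual)
print(top_member)
```

Wrapping the ranking in one function keeps the casual and member tables consistent, since both are produced by the same filter, sort, and cut-off.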

[Figure: Users and bike preference]

[Figure: Total rides (month)]

[Figure: Average ride duration]

[Figure: Average ride duration per day]

Observation:

* There is an increase of 94.7% for casual riders and 77.4% for annual members.

* The average distance for casual riders increases from 1.93 km to 2.22 km.
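
The percentages above read as relative changes. As a worked example of that arithmetic, the sketch below applies the standard percent-change formula to the casual-rider distance figures quoted above, which comes to roughly 15%. The function name pct_change is hypothetical.

```r
# Percent change from an old value to a new value.
pct_change <- function(old, new) {
  (new - old) / old * 100
}

# Average distance for casual riders, using the figures from the observation above.
distance_increase <- pct_change(1.93, 2.22)
print(round(distance_increase, 1))
```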

Recommendations: