Kahlen Cheung

21-04-2022

# knitr::opts_chunk$set(echo = TRUE, fig.align = "left")
# knitr::opts_chunk$set(tidy = "styler")

library(tsibbledata)
library(tsibble)
library(tidyverse)
library(janitor)
library(lubridate)

Overview

The Data Analysis on NYC Citi Bikes Performance 2018 (the report) is based on data on the usage of 10 bikes in 2018 operated by NYC Citi Bike. There are a total of 4268 observations, covering customers’ gender, age and bike usage type, as well as trip details including start and end times and locations.

Data Ethics Guideline

The data used in this report is publicly available in the tsibbledata R package, which is hosted on GitHub. No direct consent was obtained from NYC Citi Bike or its users. This report is used only as a data analysis case study.

As for the data collected by NYC Citi Bike, there is no evidence that it discloses users’ private data such as names, personal IDs or addresses. However, the Citi Bike app and the online payment system record users’ travel patterns and consumption habits, which may risk exposing personal behaviour to third parties without agreement.

Key Performance Indicators

This report explores the overall bike hiring service performance in 2018. The analysis focuses on the following indicators:

  • The number of users;
  • The relationship between gender and user type;
  • The relationship between travel duration and age;
  • Monthly performance;
  • Weekly performance;
  • An overview of the popularity of different stations.

Report flow

  • Data cleaning and wrangling
  • Data analysis and visualizations
  • Key Suggestions

Preparing Data for Visualisation

The original dataset has 12 variables and 4268 observations. For better evaluation and understanding of the topic, it is expanded with six more columns: separate columns for year, month, weekday and date, plus age and travelled duration. The table with date, month, year, age and travelled duration (17 columns) is saved as nyc_bikes_mass; the weekday column is added in the next step.
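The derived columns could be built along the following lines. This is a minimal sketch, not the report's actual code: it assumes lubridate helpers and a simple `year - birth_year` definition of age, and it uses two trips copied from the preview table (with whole seconds, so the durations differ slightly from the fractional values shown in the report).

```r
library(dplyr)
library(lubridate)

# Two trips copied from the preview table in this report (whole seconds only)
trips <- tibble(
  bike_id    = factor(c(26301, 26301)),
  start_time = ymd_hms(c("2018-02-26 19:11:03", "2018-02-27 07:52:49"), tz = "UTC"),
  stop_time  = ymd_hms(c("2018-02-26 19:15:40", "2018-02-27 07:58:13"), tz = "UTC"),
  birth_year = c(1986, 1979)
)

trips_expanded <- trips %>%
  mutate(
    date  = day(start_time),                                # day of month (integer)
    month = month(start_time, label = TRUE, abbr = FALSE),  # ordered factor, locale-dependent labels
    year  = year(start_time),
    travelled_time = difftime(stop_time, start_time, units = "mins"),
    age   = year - birth_year                               # assumed definition of age
  )

trips_expanded
```

The same `mutate()` call should also work on the full dataset in place of the two-row `trips` table.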

View dataset


knitr::kable(nyc_bikes_mass[1:5, 1:17], format = "markdown")

| bike_id | start_time | date | month | year | stop_time | travelled_time | start_station | start_lat | start_long | end_station | end_lat | end_long | type | age | birth_year | gender |
|:--|:--|--:|:--|--:|:--|:--|:--|--:|--:|:--|--:|--:|:--|--:|--:|:--|
| 26301 | 2018-02-26 19:11:03 | 26 | February | 2018 | 2018-02-26 19:15:40 | 4.618300 mins | 3186 | 40.71959 | -74.04312 | 3203 | 40.72760 | -74.04425 | Subscriber | 32 | 1986 | Male |
| 26301 | 2018-02-27 07:52:49 | 27 | February | 2018 | 2018-02-27 07:58:13 | 5.398733 mins | 3203 | 40.72760 | -74.04425 | 3202 | 40.72722 | -74.03376 | Subscriber | 39 | 1979 | Male |
| 26301 | 2018-02-27 12:03:27 | 27 | February | 2018 | 2018-02-27 12:04:54 | 1.442117 mins | 3202 | 40.72722 | -74.03376 | 3638 | 40.72429 | -74.03548 | Subscriber | 55 | 1963 | Male |
| 26301 | 2018-02-27 13:53:51 | 27 | February | 2018 | 2018-02-27 14:21:04 | 27.215750 mins | 3638 | 40.72429 | -74.03548 | 3638 | 40.72429 | -74.03548 | Subscriber | 34 | 1984 | Male |
| 26301 | 2018-02-27 14:30:42 | 27 | February | 2018 | 2018-02-27 14:33:11 | 2.475567 mins | 3638 | 40.72429 | -74.03548 | 3187 | 40.72112 | -74.03805 | Subscriber | 38 | 1980 | Male |

Add the weekday column in a proper weekday format (full day names), and save the table as nyc_bikes_mass_weekday.
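One way to obtain full weekday names is lubridate's `wday()` with `label = TRUE` and `abbr = FALSE`. The sketch below is an assumption about how the column could be derived, not the report's own code; it uses two start times from the preview table.

```r
library(lubridate)

# Start times of the first two trips in the preview table
start_time <- ymd_hms(c("2018-02-26 19:11:03", "2018-02-27 07:52:49"), tz = "UTC")

# Full weekday names as an ordered factor; the labels follow the session locale
weekday <- wday(start_time, label = TRUE, abbr = FALSE)

# Locale-independent form: with week_start = 1, Monday is 1 and Tuesday is 2
weekday_number <- wday(start_time, week_start = 1)

weekday
weekday_number
```

Keeping the result as an ordered factor means later plots and tables list the days in calendar order rather than alphabetically.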

View dataset


knitr::kable(nyc_bikes_mass_weekday[1:5, 1:18], format = "markdown")

| bike_id | start_time | date | month | year | weekday | stop_time | travelled_time | start_station | start_lat | start_long | end_station | end_lat | end_long | type | age | birth_year | gender |
|:--|:--|--:|:--|--:|:--|:--|:--|:--|--:|--:|:--|--:|--:|:--|--:|--:|:--|
| 26301 | 2018-02-26 19:11:03 | 26 | February | 2018 | Monday | 2018-02-26 19:15:40 | 4.618300 mins | 3186 | 40.71959 | -74.04312 | 3203 | 40.72760 | -74.04425 | Subscriber | 32 | 1986 | Male |
| 26301 | 2018-02-27 07:52:49 | 27 | February | 2018 | Tuesday | 2018-02-27 07:58:13 | 5.398733 mins | 3203 | 40.72760 | -74.04425 | 3202 | 40.72722 | -74.03376 | Subscriber | 39 | 1979 | Male |
| 26301 | 2018-02-27 12:03:27 | 27 | February | 2018 | Tuesday | 2018-02-27 12:04:54 | 1.442117 mins | 3202 | 40.72722 | -74.03376 | 3638 | 40.72429 | -74.03548 | Subscriber | 55 | 1963 | Male |
| 26301 | 2018-02-27 13:53:51 | 27 | February | 2018 | Tuesday | 2018-02-27 14:21:04 | 27.215750 mins | 3638 | 40.72429 | -74.03548 | 3638 | 40.72429 | -74.03548 | Subscriber | 34 | 1984 | Male |
| 26301 | 2018-02-27 14:30:42 | 27 | February | 2018 | Tuesday | 2018-02-27 14:33:11 | 2.475567 mins | 3638 | 40.72429 | -74.03548 | 3187 | 40.72112 | -74.03805 | Subscriber | 38 | 1980 | Male |


Data types

The dataset used in this report includes a range of data types, such as factor, ordered factor, date-time, duration, integer and double.

glimpse(nyc_bikes_mass)
## Rows: 4,268
## Columns: 17
## Key: bike_id [10]
## $ bike_id        <fct> 26301, 26301, 26301, 26301, 26301, 26301, 26301, 26301,…
## $ start_time     <dttm> 2018-02-26 19:11:03, 2018-02-27 07:52:49, 2018-02-27 1…
## $ date           <int> 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 1, 3, 3…
## $ month          <ord> February, February, February, February, February, Febru…
## $ year           <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2…
## $ stop_time      <dttm> 2018-02-26 19:15:40, 2018-02-27 07:58:13, 2018-02-27 1…
## $ travelled_time <drtn> 4.618300 mins, 5.398733 mins, 1.442117 mins, 27.215750…
## $ start_station  <fct> 3186, 3203, 3202, 3638, 3638, 3187, 3638, 3639, 3202, 3…
## $ start_lat      <dbl> 40.71959, 40.72760, 40.72722, 40.72429, 40.72429, 40.72…
## $ start_long     <dbl> -74.04312, -74.04425, -74.03376, -74.03548, -74.03548, …
## $ end_station    <fct> 3203, 3202, 3638, 3638, 3187, 3638, 3639, 3202, 3638, 3…
## $ end_lat        <dbl> 40.72760, 40.72722, 40.72429, 40.72429, 40.72112, 40.72…
## $ end_long       <dbl> -74.04425, -74.03376, -74.03548, -74.03548, -74.03805, …
## $ type           <fct> Subscriber, Subscriber, Subscriber, Subscriber, Subscri…
## $ age            <dbl> 32, 39, 55, 34, 38, 38, 29, 56, 26, 37, 46, 30, 32, 40,…
## $ birth_year     <dbl> 1986, 1979, 1963, 1984, 1980, 1980, 1989, 1962, 1992, 1…
## $ gender         <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, M…

Brief Summary

There was a total of 4268 users in 2018, of which 3953 were subscribers and 315 were customers.

| Gender | No. of Users |
|:--|--:|
| Male | 3069 |
| Female | 930 |
| Unknown | 269 |
| Total | 4268 |
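A table like this could be produced with `dplyr::count()` and `janitor::adorn_totals()`. The sketch below is illustrative only: the `users` table is a stand-in rebuilt from the counts reported above, not the original data, and the column name `users` is an assumption.

```r
library(dplyr)
library(janitor)

# Stand-in for the gender column, rebuilt from the counts reported above
users <- tibble(
  gender = rep(c("Male", "Female", "Unknown"), times = c(3069, 930, 269))
)

gender_table <- users %>%
  count(gender, name = "users", sort = TRUE) %>%  # one row per gender, largest first
  adorn_totals(where = "row")                     # append a "Total" row

gender_table
```

The total row also acts as a quick consistency check against the 4268 observations in the dataset.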

Data visualizations for decision making

  1. User demographic Insights

  • Users aged around 30 are the most common, followed by those aged 44 to 49.

Suggestions

The company is advised to offer a discount scheme to users aged under 30, as full-time students may want to rent bicycles as a cheaper form of transport.

  • Of the 3069 male users, 2985 are subscribers (about 97%), while of the 930 female users, 883 are subscribers (about 95%).

  • The number of male subscribers is more than three times the number of female subscribers (about 338%).
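The shares can be checked with a few lines of arithmetic. This sketch uses only figures reported in this document: the male and female totals from the gender table (3069 and 930) and the subscriber counts quoted in the bullets (2985 and 883).

```r
# Counts taken from this report; the male total is the one in the gender table
by_gender <- data.frame(
  gender      = c("Male", "Female"),
  users       = c(3069, 930),
  subscribers = c(2985, 883)
)

# Share of each gender's users who are subscribers, in percent
by_gender$subscriber_share <- round(100 * by_gender$subscribers / by_gender$users, 1)

# Ratio of male to female subscribers
male_to_female <- by_gender$subscribers[1] / by_gender$subscribers[2]

by_gender
male_to_female
```

On these figures the subscriber shares are 97.3% and 94.9%, and there are roughly 3.4 male subscribers for every female subscriber.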

Suggestions

Generally speaking, there are far fewer female users than male users. The company is advised to plan and run a promotion aimed at female users, for example by introducing a route that passes many restaurants and is easily accessible by bicycle.

  2. Monthly Performance Insights

  • This visualization shows gender, number of users and usage for each month in 2018.

  • The plot shows that demand for the bicycle hiring service is highest in July and August and lowest in January and February.

  • The mean number of users per month is about 356 (4268 / 12); only half of the year (May to October) reaches this value.
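The monthly mean follows directly from the yearly total, and months can then be flagged against it. In the sketch below the monthly counts are hypothetical placeholders for illustration only, not the report's figures, and `flag_months()` is a made-up helper name.

```r
# Mean users per month, from the yearly total reported above
total_users  <- 4268
monthly_mean <- total_users / 12

# Hypothetical helper: TRUE for months whose count reaches the mean of the counts
flag_months <- function(counts) {
  counts >= mean(counts)
}

# Made-up counts for four months, for illustration only
example_counts <- c(January = 100, February = 120, July = 400, August = 380)
meets_mean <- flag_months(example_counts)

round(monthly_mean, 2)
meets_mean
```

Applied to the twelve real monthly counts, the same comparison would show which months reach the yearly average.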

Suggestions

The most popular season for the bike hiring service is summer (May to October). A seasonal pass for winter may help to boost rentals during the months from November to March.

  3. Weekly Performance Insights

  • This visualization reveals that Tuesday has the most users, followed by Friday and Monday, while Sunday has the fewest.

  • Generally speaking, users in the 31-40 and 21-30 age groups tend to use the service on weekdays.

Suggestions

Most users use the service on working days. For weekends, a discount package would suit people who want to go for a bicycle trip in the city centre.

  4. Travel Duration Insights

  • The most common travel time is less than 25 mins, mainly accounted for by the 21-30 age group.

  • The 41-50 age group tends to have longer travel times (50 mins or more) than other age groups.
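Age groups like these can be derived with `cut()` and then summarised per group. The sketch below is an assumption about the grouping, not the report's code: the break points and the "20 or under" and "51+" labels are illustrative, and the five rows are copied from the preview table earlier in the report.

```r
library(dplyr)

# Ages and travel times (in minutes) of the five trips in the preview table
trips <- tibble(
  age            = c(32, 39, 55, 34, 38),
  travelled_time = c(4.618300, 5.398733, 1.442117, 27.215750, 2.475567)
)

trips <- trips %>%
  mutate(age_group = cut(
    age,
    breaks = c(-Inf, 20, 30, 40, 50, Inf),  # right-closed bins: (20, 30] is "21-30"
    labels = c("20 or under", "21-30", "31-40", "41-50", "51+")
  ))

duration_by_group <- trips %>%
  group_by(age_group) %>%
  summarise(trips = n(), median_mins = median(travelled_time))

duration_by_group
```

The median is used here because a few very long trips would otherwise pull the group average upward.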

Suggestions

The company is advised to offer a daily pass that lets users enjoy a suburban bicycle ride without a time limit, or to cooperate with other transport providers to offer an interchange service.

Also, a promotion targeting people aged over 50 would help to expand the age range of users.

  5. Location Insights
  • This map shows that, of the 52 stations in New York City, Richmond Rd (3186), Steinway St (3203) and Sip Ave (3195) are the three most popular (see the blue markers).

  • The warning markers show stations with fewer than 20 uses, which are mainly located outside the city centre.
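Station popularity and the low-usage flag can be computed by counting trips per station. This sketch assumes usage is counted by start station only, which the report does not state, and it uses the five start stations from the preview table, so every station falls under the threshold of 20 here.

```r
library(dplyr)

# Start stations of the five trips in the preview table
trips <- tibble(start_station = factor(c(3186, 3203, 3202, 3638, 3638)))

station_usage <- trips %>%
  count(start_station, name = "uses", sort = TRUE) %>%  # most used station first
  mutate(low_usage = uses < 20)                          # threshold used for the warning markers

station_usage
```

On the full dataset, the first three rows of such a table would correspond to the most popular stations, and the `low_usage` rows to the warning markers.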

Suggestions

The company is advised to build more stations in the suburban area in order to enlarge the bicycle network and attract new users. Alternatively, removing the low-usage stations could reduce running costs, especially maintenance.




The End