Kahlen Cheung
21-04-2022
# knitr::opts_chunk$set(echo = TRUE, fig.align = "left")
# knitr::opts_chunk$set(tidy = "styler")
library(tsibbledata)
library(tsibble)
library(tidyverse)
library(janitor)
library(lubridate)
The Data Analysis on NYC City Bikes Performance 2018 (the report) is based on research into the usage of 10 bikes operated by NYC Citi Bike in 2018. There are 4268 observations in total, covering customers’ gender, age and bike usage type, as well as trip details including start and end times and locations.
The data used in this report is publicly available through the tsibbledata R package, which is hosted on GitHub. No direct consent was obtained from NYC Citi Bike or its users. This report is intended solely as a data analysis case study.
As for the data collected by NYC Citi Bike, there is no evidence that it discloses users’ private data such as names, personal IDs or addresses. However, the Citi Bike app and the online payment system record users’ travel patterns and spending habits, which may risk exposing personal behaviour to third parties without the users’ agreement.
This report explores the overall bike hiring service performance in 2018. The analysis focuses on the following indicators:
The original dataset had 12 variables and 4268 observations. To support the analysis, it is expanded with 6 more columns: separate columns for year, month, weekday and date, plus age and travelled duration. The table is saved as nyc_bikes_mass.
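The report does not show the code that derives these columns. As a minimal sketch in base R, assuming age is computed as the trip year minus the birth year and the duration as stop time minus start time, the step could look like this. The `trips` data frame is a toy stand-in built from two rows of the table in this report, not the real dataset.

```r
# Toy stand-in for two rows of the trip data (values copied from this report's table).
trips <- data.frame(
  start_time = as.POSIXct(c("2018-02-26 19:11:03", "2018-02-27 07:52:49"), tz = "UTC"),
  stop_time  = as.POSIXct(c("2018-02-26 19:15:40", "2018-02-27 07:58:13"), tz = "UTC"),
  birth_year = c(1986, 1979)
)

# Split the start timestamp into calendar parts.
trips$year  <- as.integer(format(trips$start_time, "%Y"))
trips$month <- month.name[as.integer(format(trips$start_time, "%m"))]
trips$date  <- as.integer(format(trips$start_time, "%d"))

# Assumed definitions: age in the year of the trip, duration in minutes.
trips$age <- trips$year - trips$birth_year
trips$travelled_time <- difftime(trips$stop_time, trips$start_time, units = "mins")

print(trips[, c("date", "month", "year", "age", "travelled_time")])
```

The toy timestamps are truncated to whole seconds, so the durations differ slightly from the fractional values printed in the report.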
knitr::kable(nyc_bikes_mass[1:5, 1:17], format = "markdown")
| bike_id | start_time | date | month | year | stop_time | travelled_time | start_station | start_lat | start_long | end_station | end_lat | end_long | type | age | birth_year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26301 | 2018-02-26 19:11:03 | 26 | February | 2018 | 2018-02-26 19:15:40 | 4.618300 mins | 3186 | 40.71959 | -74.04312 | 3203 | 40.72760 | -74.04425 | Subscriber | 32 | 1986 | Male |
| 26301 | 2018-02-27 07:52:49 | 27 | February | 2018 | 2018-02-27 07:58:13 | 5.398733 mins | 3203 | 40.72760 | -74.04425 | 3202 | 40.72722 | -74.03376 | Subscriber | 39 | 1979 | Male |
| 26301 | 2018-02-27 12:03:27 | 27 | February | 2018 | 2018-02-27 12:04:54 | 1.442117 mins | 3202 | 40.72722 | -74.03376 | 3638 | 40.72429 | -74.03548 | Subscriber | 55 | 1963 | Male |
| 26301 | 2018-02-27 13:53:51 | 27 | February | 2018 | 2018-02-27 14:21:04 | 27.215750 mins | 3638 | 40.72429 | -74.03548 | 3638 | 40.72429 | -74.03548 | Subscriber | 34 | 1984 | Male |
| 26301 | 2018-02-27 14:30:42 | 27 | February | 2018 | 2018-02-27 14:33:11 | 2.475567 mins | 3638 | 40.72429 | -74.03548 | 3187 | 40.72112 | -74.03805 | Subscriber | 38 | 1980 | Male |
The weekday column is converted to a proper weekday format, and the table is saved as nyc_bikes_mass_weekday.
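How the conversion was done is not shown in the report. One possible approach, sketched here in base R, maps the numeric day of the week to English day names so the result does not depend on the system locale. The `day_names` vector and the Monday-first level order are assumptions made for illustration.

```r
# Start times copied from the first two rows of the table in this report.
start_time <- as.POSIXct(c("2018-02-26 19:11:03", "2018-02-27 07:52:49"), tz = "UTC")

# as.POSIXlt()$wday counts days since Sunday (0 = Sunday, ..., 6 = Saturday).
day_names <- c("Sunday", "Monday", "Tuesday", "Wednesday",
               "Thursday", "Friday", "Saturday")

# Store the result as a factor with Monday first, so plots sort in week order.
weekday <- factor(day_names[as.POSIXlt(start_time)$wday + 1],
                  levels = c(day_names[2:7], day_names[1]))
print(weekday)
```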
knitr::kable(nyc_bikes_mass_weekday[1:5, 1:18], format = "markdown")
| bike_id | start_time | date | month | year | weekday | stop_time | travelled_time | start_station | start_lat | start_long | end_station | end_lat | end_long | type | age | birth_year | gender |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 26301 | 2018-02-26 19:11:03 | 26 | February | 2018 | Monday | 2018-02-26 19:15:40 | 4.618300 mins | 3186 | 40.71959 | -74.04312 | 3203 | 40.72760 | -74.04425 | Subscriber | 32 | 1986 | Male |
| 26301 | 2018-02-27 07:52:49 | 27 | February | 2018 | Tuesday | 2018-02-27 07:58:13 | 5.398733 mins | 3203 | 40.72760 | -74.04425 | 3202 | 40.72722 | -74.03376 | Subscriber | 39 | 1979 | Male |
| 26301 | 2018-02-27 12:03:27 | 27 | February | 2018 | Tuesday | 2018-02-27 12:04:54 | 1.442117 mins | 3202 | 40.72722 | -74.03376 | 3638 | 40.72429 | -74.03548 | Subscriber | 55 | 1963 | Male |
| 26301 | 2018-02-27 13:53:51 | 27 | February | 2018 | Tuesday | 2018-02-27 14:21:04 | 27.215750 mins | 3638 | 40.72429 | -74.03548 | 3638 | 40.72429 | -74.03548 | Subscriber | 34 | 1984 | Male |
| 26301 | 2018-02-27 14:30:42 | 27 | February | 2018 | Tuesday | 2018-02-27 14:33:11 | 2.475567 mins | 3638 | 40.72429 | -74.03548 | 3187 | 40.72112 | -74.03805 | Subscriber | 38 | 1980 | Male |
The dataset used in this report is in CSV format and contains several column types, including factor, date-time, integer and double.
glimpse(nyc_bikes_mass)
## Rows: 4,268
## Columns: 17
## Key: bike_id [10]
## $ bike_id <fct> 26301, 26301, 26301, 26301, 26301, 26301, 26301, 26301,…
## $ start_time <dttm> 2018-02-26 19:11:03, 2018-02-27 07:52:49, 2018-02-27 1…
## $ date <int> 26, 27, 27, 27, 27, 27, 27, 27, 27, 27, 28, 28, 1, 3, 3…
## $ month <ord> February, February, February, February, February, Febru…
## $ year <dbl> 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2018, 2…
## $ stop_time <dttm> 2018-02-26 19:15:40, 2018-02-27 07:58:13, 2018-02-27 1…
## $ travelled_time <drtn> 4.618300 mins, 5.398733 mins, 1.442117 mins, 27.215750…
## $ start_station <fct> 3186, 3203, 3202, 3638, 3638, 3187, 3638, 3639, 3202, 3…
## $ start_lat <dbl> 40.71959, 40.72760, 40.72722, 40.72429, 40.72429, 40.72…
## $ start_long <dbl> -74.04312, -74.04425, -74.03376, -74.03548, -74.03548, …
## $ end_station <fct> 3203, 3202, 3638, 3638, 3187, 3638, 3639, 3202, 3638, 3…
## $ end_lat <dbl> 40.72760, 40.72722, 40.72429, 40.72429, 40.72112, 40.72…
## $ end_long <dbl> -74.04425, -74.03376, -74.03548, -74.03548, -74.03805, …
## $ type <fct> Subscriber, Subscriber, Subscriber, Subscriber, Subscri…
## $ age <dbl> 32, 39, 55, 34, 38, 38, 29, 56, 26, 37, 46, 30, 32, 40,…
## $ birth_year <dbl> 1986, 1979, 1963, 1984, 1980, 1980, 1989, 1962, 1992, 1…
## $ gender <fct> Male, Male, Male, Male, Male, Male, Male, Male, Male, M…
There were 4268 users in total in 2018, of which 3953 were subscribers and 315 were customers.
| Gender | No. of Users |
|---|---|
| Male | 3069 |
| Female | 930 |
| Unknown | 269 |
| Total | 4268 |
Users aged 30 are the most common, followed by those aged 44 to 49.
Suggestions
The company could offer a discount scheme to users aged under 30, since full-time students may want to rent bicycles as a cheaper form of transport.
Of the 3069 male users, 2985 are subscribers (97%),
while of the 930 female users, 883 are subscribers (95%).
The number of male subscribers is also more than three times the number of female subscribers (about 3.4 times).
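The subscriber shares can be checked with a few lines of arithmetic. This sketch uses only counts quoted in this report and takes the male total from the gender table (3069), which is the figure consistent with the overall total of 4268.

```r
# Counts quoted in this report (male total taken from the gender table).
users       <- c(Male = 3069, Female = 930)
subscribers <- c(Male = 2985, Female = 883)

# Share of subscribers within each gender, in percent.
subscriber_share <- round(100 * subscribers / users, 1)
print(subscriber_share)

# Ratio of male to female subscribers.
male_to_female <- subscribers[["Male"]] / subscribers[["Female"]]
print(round(male_to_female, 2))
```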
Suggestions
Overall, there are far fewer female users than male users. The company could plan and run a promotion aimed at female users, for example by introducing a route that passes many restaurants and is easy to reach by bicycle.
This visualization shows gender, number of users and usage for each month of 2018.
The plot shows that demand for the bicycle hire service is highest in July and August and lowest in January and February.
The mean number of users per month is about 356, and only half of the year (May to October) reaches that value.
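The monthly mean follows directly from the yearly total, and the same comparison identifies which months reach it. The report does not list the monthly counts, so the `toy_counts` vector below is invented purely to illustrate the comparison step.

```r
# Average number of observations per month, from the report's yearly total.
total_trips  <- 4268
monthly_mean <- total_trips / 12
print(round(monthly_mean, 1))

# Hypothetical monthly counts (NOT from the report), used only to show the comparison.
toy_counts <- c(Jan = 10, Feb = 12, Mar = 20, Apr = 30, May = 45, Jun = 50)
above_mean <- names(toy_counts)[toy_counts >= mean(toy_counts)]
print(above_mean)
```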
Suggestions
The most popular season for the bike hire service is summer (May to October). A seasonal pass for winter may help to boost rentals during the months from November to March.
This visualization reveals that Tuesday has the most users, followed by Friday and Monday, while Sunday has the fewest.
Generally speaking, users in the 31-40 and 21-30 age groups tend to use the service on weekdays.
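Age groups such as 21-30 and 31-40 can be built from the age column with `cut()`. This is a sketch only: the report names the 21-30, 31-40 and 41-50 groups, while the remaining break points and labels here are assumptions.

```r
# Ages copied from the first rows of the table in this report.
age <- c(32, 39, 55, 34, 38)

# Right-closed bins: (20, 30] is labelled "21-30", (30, 40] is "31-40", and so on.
age_group <- cut(age,
                 breaks = c(10, 20, 30, 40, 50, 60, Inf),
                 labels = c("11-20", "21-30", "31-40", "41-50", "51-60", "61+"))

# Number of observations in each age group.
print(table(age_group))
```

Because `cut()` closes intervals on the right by default, an age of exactly 30 falls in "21-30" and an age of 31 falls in "31-40".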
Suggestions
Most users ride on working days. For weekends, a discount package would suit people who want to take a bicycle trip in the city centre.
The most common travel time is less than 25 mins, mainly among the 21-30 age group.
The 41-50 age group tends to have longer travel times (50 mins or more) than the other age groups.
Suggestions
The company could offer a daily pass that lets users enjoy a suburban bicycle ride without a time limit, or cooperate with other transport providers to offer an interchange service.
A promotion targeting people aged over 50 would also help to widen the age range of users.
This map shows that, of the 52 stations in New York City, Richmond Rd (3186), Steinway St (3203) and Sip Ave (3195) are the three most popular (shown with blue markers).
The warning markers show stations with fewer than 20 uses, which are mainly located outside the city centre.
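Ranking stations and flagging the low-usage ones comes down to counting trips per station. The sketch below assumes usage is counted by start station only; the station IDs appear in this report, but the repetition counts are invented for illustration.

```r
# Toy start-station IDs; the repetition counts are invented, not taken from the data.
start_station <- c(rep("3186", 25), rep("3203", 22), rep("3195", 21), rep("3638", 4))

# Trips per start station, sorted from most to least used.
usage <- sort(table(start_station), decreasing = TRUE)

# Three busiest stations, and stations below the threshold of 20 trips.
top_3     <- names(usage)[1:3]
low_usage <- names(usage)[as.vector(usage) < 20]
print(top_3)
print(low_usage)
```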
Suggestions
The company could build more stations in suburban areas to enlarge the bicycle network and attract new users. Alternatively, removing the low-usage stations could reduce running costs, especially for maintenance.
The End