In this LBB, I’m going to create visualizations to examine the effect of rainfall on the number of cyclists crossing several bridges in New York. I’ll be working with a dataset of cyclist counts on New York bridges in April 2016.
- Loading package
library(ggplot2)
library(GGally)
library(lubridate)
library(reshape2)
library(scales)
- Import data
bike <- read.csv("nyc-bicycle.csv")
bike
- Check data dimension
dim(bike)
## [1] 210  11
- Check data summary
summary(bike)
##        X                          Date                       Day
##  Min.   :  0.00   2016-04-01 00:00:00:  7   2016-04-01 00:00:00:  7
##  1st Qu.: 52.25   2016-04-02 00:00:00:  7   2016-04-02 00:00:00:  7
##  Median :104.50   2016-04-03 00:00:00:  7   2016-04-03 00:00:00:  7
##  Mean   :104.50   2016-04-04 00:00:00:  7   2016-04-04 00:00:00:  7
##  3rd Qu.:156.75   2016-04-05 00:00:00:  7   2016-04-05 00:00:00:  7
##  Max.   :209.00   2016-04-06 00:00:00:  7   2016-04-06 00:00:00:  7
##                   (Other)            :168   (Other)            :168
##  High.Temp...F.   Low.Temp...F.    Precipitation   Brooklyn.Bridge
##  Min.   :39.90    Min.   :26.10    0      :119     Min.   : 504
##  1st Qu.:55.00    1st Qu.:44.10    0.01   : 21     1st Qu.:1447
##  Median :62.10    Median :46.90    0.09   : 21     Median :2380
##  Mean   :60.58    Mean   :46.41    0.05   :  7     Mean   :2270
##  3rd Qu.:68.00    3rd Qu.:50.00    0.15   :  7     3rd Qu.:3147
##  Max.   :81.00    Max.   :66.00    0.16   :  7     Max.   :3871
##                                    (Other): 28
##  Manhattan.Bridge   Williamsburg.Bridge   Queensboro.Bridge   Total
##  Min.   : 997       Min.   :1440          Min.   :1306        Min.   : 4335
##  1st Qu.:2617       1st Qu.:3282          1st Qu.:2457        1st Qu.: 9596
##  Median :4165       Median :5194          Median :3477        Median :15292
##  Mean   :4050       Mean   :4862          Mean   :3353        Mean   :14534
##  3rd Qu.:5309       3rd Qu.:6030          3rd Qu.:4192        3rd Qu.:18315
##  Max.   :6951       Max.   :7834          Max.   :5032        Max.   :23318
##
- Convert data type
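# Precipitation was read as a factor, so convert it via character.
# as.numeric() on the factor itself would return level codes, not rainfall.
# Any non-numeric entries become NA (with a warning).
bike$Precipitation <- as.numeric(as.character(bike$Precipitation))
bike$Date <- ymd_hms(bike$Date)
bike$Day <- wday(bike$Day, label = TRUE, abbr = FALSE)
bike$Manhattan.Bridge <- as.numeric(bike$Manhattan.Bridge)
bike$Williamsburg.Bridge <- as.numeric(bike$Williamsburg.Bridge)
bike$Queensboro.Bridge <- as.numeric(bike$Queensboro.Bridge)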
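Because `summary()` reports Precipitation as counts per level, the column appears to have been read as a factor. The sketch below uses made-up values (`precip` is a hypothetical vector, not the real column) to show why the route to numeric matters: converting a factor directly yields its internal level codes, while converting through character yields the recorded values.

```r
# Hypothetical values mimicking a factor column such as Precipitation
precip <- factor(c("0", "0.01", "0.09", "0", "0.15"))

# Direct conversion returns the integer level codes
codes <- as.numeric(precip)

# Converting through character returns the values that were recorded
values <- as.numeric(as.character(precip))

print(codes)   # 1 2 3 1 4
print(values)  # 0.00 0.01 0.09 0.00 0.15
```

If a plot's x-axis shows small whole numbers instead of rainfall amounts, this is a likely cause.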
bike
- Check correlation
ggcorr(bike,label=T)
## Warning in ggcorr(bike, label = T): data in column(s) 'Date', 'Day' are not
## numeric and were ignored
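To read the numbers behind a correlation plot, the same kind of matrix can be computed with base R. This is a minimal sketch on made-up data (`toy` and its values are assumptions), and it assumes the plot uses pairwise Pearson correlation, which I believe is the `ggcorr()` default.

```r
# Toy data standing in for the bike data frame (values are made up)
toy <- data.frame(
  Date          = as.Date("2016-04-01") + 0:4,
  Precipitation = c(0.00, 0.01, 0.09, 0.15, 0.20),
  Total         = c(20000, 19000, 15000, 11000, 9000)
)

# Keep only the numeric columns; Date is dropped, as in the warning above
numeric_cols <- toy[sapply(toy, is.numeric)]

# Pairwise Pearson correlations between the remaining columns
corr_matrix <- cor(numeric_cols, use = "pairwise.complete.obs")
print(round(corr_matrix, 2))
```

A negative entry for Precipitation against Total would match the downward trend examined in the plots that follow.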
- Plot data visualization
Relationship between precipitation levels and total bike crossings
ggplot(bike, aes(x = Precipitation, y = Total)) +
  geom_point() +
  ggtitle("Effect of Precipitation on Number of Cyclists Using New York Bridges") +
  stat_smooth(method = "lm") +
  labs(y = "Cyclists") +
  scale_y_continuous(labels = comma)
Insight: The number of cyclists tends to decrease as rainfall increases.
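The trend line in a plot like this comes from a linear model, so fitting the model directly gives a number for the slope. A minimal sketch, assuming a made-up data frame `toy` whose values were chosen to lie exactly on a line:

```r
# Made-up data: total crossings fall by 3,000 for every 0.1 of precipitation
toy <- data.frame(
  Precipitation = c(0.0, 0.1, 0.2, 0.3, 0.4),
  Total         = c(20000, 17000, 14000, 11000, 8000)
)

# Same model that stat_smooth(method = "lm") fits: Total as a linear function of Precipitation
fit <- lm(Total ~ Precipitation, data = toy)

intercept <- coef(fit)[["(Intercept)"]]
slope     <- coef(fit)[["Precipitation"]]

# The slope is the change in Total per one unit of Precipitation
print(coef(fit))
```

On the real data, `summary(fit)` would also report the standard error of the slope, which helps judge how much weight the trend can bear.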
Day vs Total Cyclists
library(reshape2)
bridge1 <- melt(bike, id.vars = "Day", measure.vars = c("Brooklyn.Bridge", "Manhattan.Bridge", "Williamsburg.Bridge", "Queensboro.Bridge"))
bridge1
ggplot(bridge1, aes(Day, value)) +
  geom_col(aes(fill = variable), position = "dodge") +
  scale_y_continuous(labels = comma) +
  labs(y = "Total Cyclists")
Insight: The graph above shows that the number of cyclists on each bridge is roughly the same from day to day. Williamsburg Bridge has the most cyclists.
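Each weekday occurs several times in the data, and as far as I can tell dodged bars that share a day and a bridge are drawn on top of one another rather than added together. One way to make bar height unambiguous is to aggregate to one value per day and bridge before plotting. A minimal sketch on made-up long-format data (`toy` is hypothetical, shaped like the output of `melt()`):

```r
# Made-up long-format data: two Fridays and two Saturdays for one bridge
toy <- data.frame(
  Day      = c("Friday", "Friday", "Saturday", "Saturday"),
  variable = rep("Brooklyn.Bridge", 4),
  value    = c(1000, 3000, 2000, 2500)
)

# One row per day and bridge, holding the mean daily count
daily_mean <- aggregate(value ~ Day + variable, data = toy, FUN = mean)
print(daily_mean)
```

Passing the aggregated frame to the plot gives one bar per day and bridge; swapping `mean` for `sum` would show monthly totals per weekday instead.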
Precipitation Level vs Cyclist Numbers by Bridge
ggplot(bike, aes(x = Precipitation)) +
  geom_smooth(aes(y = Brooklyn.Bridge, colour = 'Brooklyn Bridge'), method = 'glm', method.args = list(family = 'poisson')) +
  geom_smooth(aes(y = Manhattan.Bridge, colour = 'Manhattan Bridge'), method = 'glm', method.args = list(family = 'poisson')) +
  geom_smooth(aes(y = Williamsburg.Bridge, colour = 'Williamsburg Bridge'), method = 'glm', method.args = list(family = 'poisson')) +
  geom_smooth(aes(y = Queensboro.Bridge, colour = 'Queensboro Bridge'), method = 'glm', method.args = list(family = 'poisson')) +
  scale_colour_brewer(type = "qual", palette = "Dark2") +
  labs(colour = 'Bridge', y = 'Cyclists') +
  ggtitle("Effect of Precipitation Levels on Cyclist Numbers by Bridge")
Insight: The bridges most used by cyclists are, in order, Williamsburg Bridge, Manhattan Bridge, Queensboro Bridge, and lastly Brooklyn Bridge. All four show decreasing usage as rainfall increases, with Manhattan Bridge showing the strongest trend.
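Comparing the strength of the trend across bridges is easier with the fitted coefficients than by eye. The sketch below fits the same kind of Poisson GLM on made-up counts (`toy` and the decay rate of 2 are assumptions, not estimates from the real data) and turns the coefficient into a multiplicative change in expected cyclists.

```r
# Made-up counts that decay exponentially with precipitation
toy <- data.frame(Precipitation = seq(0, 0.5, by = 0.05))
toy$Cyclists <- round(5000 * exp(-2 * toy$Precipitation))

# Poisson regression with the default log link
fit <- glm(Cyclists ~ Precipitation, family = poisson, data = toy)

# Coefficient on the log scale: change in log(expected count) per unit of precipitation
beta <- coef(fit)[["Precipitation"]]

# Multiplicative change in expected count for 0.1 more precipitation
rate_ratio <- exp(beta * 0.1)
print(c(beta = beta, rate_ratio = rate_ratio))
```

Fitting one such model per bridge and comparing the `beta` values would give a direct check on which bridge has the steepest decline.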