---
title: "The Quantified Self: Citybike Usage Analysis"
author: 'by Thanita Sokphoodsa #222320'
output:
  flexdashboard::flex_dashboard:
    #storyboard: true
    social: menu
    source: embed
---
```{r setup, include=FALSE}
#####=============== Setup =================#####
#setwd("~/Desktop/HU/ANLY512/Project") #set working directory
library(flexdashboard)
library(readr)
library(leaflet)
library(dygraphs)
library(dplyr)
library(plyr)
library(htmlwidgets)
library(ggplot2)
library(ggthemes)
library(reshape)
library(data.table)
library(reshape2)
library(plotly)
library(ggmap)
library(xts)
library(Quandl)
library(scales)
library(lubridate)
####============ Read-in data ===========#####
citibike <- read_csv("citibikeData.csv")
stationCodes <- read_csv("stationCodes.csv")
#============== cleanup a bit =============#
#create tripDate field
citibike$date <- as.Date(factor(citibike$Start),format="%m/%d/%y")
# convert character to POSIXct
citibike$startTime <- as.POSIXct(strptime(citibike$Start,"%m/%d/%y %H:%M"))
citibike$endTime <- as.POSIXct(strptime(citibike$End,"%m/%d/%y %H:%M"))
#extract time
citibike$startTime <- strftime(citibike$startTime,format="%H:%M")
citibike$endTime <- strftime(citibike$endTime,format="%H:%M")
citibike$hour <- as.numeric(substr(citibike$startTime,1,2))
#new variables
#day of week: wday() returns 1 = Sunday ... 7 = Saturday by default
dayNames <- c("Sunday","Monday","Tuesday","Wednesday","Thursday","Friday","Saturday")
citibike$wday <- factor(dayNames[wday(citibike$date)],
                        levels = c("Monday","Tuesday","Wednesday","Thursday","Friday","Saturday","Sunday"))
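citibike$tripSec <- citibike$tripMin * 60
citibike$tripHr <- citibike$tripMin / 60
#estimates: assumed average speed of 7.456 mph, 0.041 gallon of gas per mile (24.1 mpg), 0.812 lbs of CO2 per mile
citibike$distMile <- citibike$tripHr * 7.456
citibike$gassavedGal <- citibike$distMile * 0.041
citibike$co2reducedLbs <- citibike$distMile * 0.812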
#month
citibike$mth1 <- as.numeric(month(citibike$date))
#month.abb is base R's built-in vector "Jan" ... "Dec"
citibike$mth <- factor(month.abb[citibike$mth1], levels = month.abb)
#year
citibike$yr <- year(citibike$date)
#number of ride 1 for all (one row per ride, so avoid hard-coding the row count)
citibike$numRide <- rep(1, times = nrow(citibike))
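#group Hours
#summary(citibike$hour)
#note: cut() uses right-closed intervals by default, so "6-9AM" covers start hours 7, 8 and 9 (6 < hour <= 9)
#and a start hour of 0 falls outside the first interval and becomes NA
citibike$groupHr <- cut(citibike$hour,
                        breaks=c(0,6,9,12,15,18,21,24),
                        labels=c("Before 6AM","6-9AM","9-12AM","12-3PM","3-6PM","6-9PM","After 9PM"))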
#weekdays vs weekends
citibike$wdayCat <- factor(
  ifelse(citibike$wday == "Saturday" | citibike$wday == "Sunday", "Weekend", "Weekday"),
  levels = c("Weekday","Weekend"))
# combine start and end stations into one column of unique station names (as character)
#names(citibike)
stationList <- data.frame(
  location = unique(as.character(c(citibike$fromStation, citibike$toStation))),
  stringsAsFactors = FALSE)
tail(stationList)
numStations <- count(stationList)
sum(numStations$freq)
# NumRides by day of week
a <- select(citibike, wday, numRide)%>%
group_by(wday)%>%
summarise(numRide = sum(numRide, na.rm = T))
time1 <- select(citibike, groupHr, numRide)%>%
group_by(groupHr)%>%
summarise(numRide = sum(numRide))
citibike4dist <- filter(citibike, tripMin <1000 )
myRides <- select(citibike, date, mth, mth1, yr, numRide, tripMin, tripHr)
#numRides/day
rideDay2 <- group_by(myRides, mth, yr, date) %>%
summarise(numRide = sum(numRide, na.rm = T)) %>%
arrange(yr)
#numRides/mth
rideMth2 <- group_by(myRides, mth, yr) %>%
summarise(numRide = sum(numRide, na.rm = T))%>%
arrange(yr)
#numRides/yr
rideYr2 <- group_by(myRides, yr) %>%
summarise(numRide = sum(numRide, na.rm = T))%>%
arrange(yr)
#Duration
myRides4dura <- select(citibike, date, mth, mth1, yr, numRide, tripMin, tripHr) %>%
filter(tripMin < 1000)
#head(myRides4dura)
#duration/day
durDay2 <- group_by(myRides4dura, mth, yr, date) %>%
summarise(duration = sum(tripMin, na.rm = T)) %>%
arrange(yr)
#duration/mth
durMth2 <- group_by(myRides4dura, mth, mth1, yr) %>%
summarise(duration = sum(tripHr, na.rm = T)) %>%
arrange(yr, mth1)
#duration/yr
durYr2 <- group_by(myRides4dura, yr) %>%
summarise(duration = sum(tripHr, na.rm = T))
# ====================================================== #
```
Summary
=========================================
Row
-----------------------------------------------------------------------
### Total stations used
```{r}
stations <- sum(numStations$freq)
valueBox(stations, icon = "fas fa-map-marker")
```
### Highest daily rides
```{r}
highRide <- max(rideDay2$numRide)
valueBox(highRide, icon = "fas fa-signal", color = "pink")
```
### Highest monthly rides
```{r}
highRidemth <- max(rideMth2$numRide)
valueBox(highRidemth, icon = "fas fa-signal", color = "pink")
```
### Highest yearly rides
```{r}
highRideyr <- max(rideYr2$numRide)
valueBox(highRideyr, icon = "fas fa-signal", color = "pink")
```
### Longest daily trip duration (minutes)
```{r}
longRideday <- max(durDay2$duration)
valueBox(longRideday, icon = "fal fa-hourglass-start", color = "purple")
```
Row
-----------------------------------------------------------------------
### Number of rides
```{r sumRides}
rides <- sum(citibike$numRide)
valueBox(rides, icon = "fas fa-bicycle")
```
### Average rides per day
```{r avgDayride}
averageDayride <- round(mean(rideDay2$numRide, na.rm = T))
valueBox(averageDayride, icon = "far fa-calendar", color = "orange")
# color = ifelse(spam > 10, "warning", "primary"))
```
### Average rides per month
```{r avgMthride}
averageMthride <- round(mean(rideMth2$numRide, na.rm = T))
valueBox(averageMthride, icon = "far fa-calendar", color = "orange")
# color = ifelse(spam > 10, "warning", "primary"))
```
### Average rides per year
```{r avgYrride}
averageYrride <- round(mean(rideYr2$numRide, na.rm = T))
valueBox(averageYrride, icon = "far fa-calendar", color = "orange")
# color = ifelse(spam > 10, "warning", "primary"))
```
### Longest monthly trip duration (hours)
```{r}
longRidemth <- round(max(durMth2$duration), digits = 2)
valueBox(longRidemth, icon = "fal fa-hourglass-start", color = "purple")
```
Row
-----------------------------------------------------------------------
### Total usage time (hours)
```{r}
usage <- round(sum(citibike4dist$tripHr))
valueBox(usage, icon = "glyphicon glyphicon-time")
```
### Distance traveled (estimated miles)
```{r estDistance}
#use citibike4dist without the outlier
estDist <- round(sum(citibike4dist$distMile))
valueBox(estDist, icon = "fas fa-road", color = "green")
# color = ifelse(spam > 10, "warning", "primary"))
```
### Gas saved (estimated gallons)
```{r gas}
#use citibike4dist without the outlier
gas <- round(sum(citibike4dist$gassavedGal))
valueBox(gas, icon = "glyphicon glyphicon-piggy-bank", color = "green")
# color = ifelse(spam > 10, "warning", "primary"))
```
### CO2 reduced (estimated lbs)
```{r co2}
#use citibike4dist without the outlier
co2 <- round(sum(citibike4dist$co2reducedLbs))
valueBox(co2, icon = "glyphicon glyphicon-tree-deciduous", color = "green")
# color = ifelse(spam > 10, "warning", "primary"))
```
### Longest yearly trip duration (hours)
```{r}
longRideyr <- round(max(durYr2$duration), digits = 2)
valueBox(longRideyr, icon = "fal fa-hourglass-start", color = "purple")
```
Analysis {.storyboard}
=========================================
### **113 bike stations** have been used since the start of membership in August 2014
```{r mapLocation}
#========= create map for Q1# citibike stations I have visited since 2014 =========#
#names(stationCodes)
leaflet() %>%
addTiles() %>%
addMarkers(lng= stationCodes$lon, lat= stationCodes$lat, label = stationCodes$station)
```
***
This map shows the Citibike stations where I have picked up or returned bikes since I became a member in **August 2014**.

I have used `113` bike stations out of the `750` Citibike stations in the city, per Citibike's records as of October 2017.

Most of the stations I use are in Midtown Manhattan, with some on the west side of Downtown. I use **6 stations** in Long Island City, Queens; the rest are in the city. Notably, I have never used the service on the Lower East Side or in the upper part of the city. Part of the reason is that the observations in the dataset run from 2014 to 2016, when Citibike had not yet expanded its service to the upper area of the city.
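
The station count can be reproduced by pooling the start and end stations and keeping each name once. The following is a minimal base R sketch, not the dashboard's own code: the `rides` data frame and its rows are hypothetical, and only the column names `fromStation` and `toStation` follow the real data.

```r
# Toy stand-in for the ride log (hypothetical rows; column names follow the dashboard data)
rides <- data.frame(
  fromStation = c("W 56 St & 6 Ave", "45 Rd & 11 St", "W 56 St & 6 Ave"),
  toStation   = c("Broadway & W 56 St", "46 Ave & 5 St", "W 56 St & 6 Ave"),
  stringsAsFactors = FALSE
)

# Pool start and end stations, then keep each station name once
stationsUsed <- unique(c(rides$fromStation, rides$toStation))

# Number of distinct stations touched by at least one ride
numStationsUsed <- length(stationsUsed)
numStationsUsed
```

A station counts once no matter how many rides start or end there, which is why the round trip in the third row adds nothing to the total.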
### **6 hot stations** visited most often: 5 in Midtown Manhattan and 1 in Midtown East
```{r mapHeat}
#===================== create bubble map based on usage of the stations ====================#
#=========Q#2: which location or area I have visited most often - Bubble map!
# or based on number of time I have visited the stations
# stack start and end stations, then count how many times each station appears
# result columns: Var1 (station name) and Freq (number of uses)
stationCombined <- as.data.frame(table(as.character(c(citibike$fromStation, citibike$toStation))))
# merge geocodes and stationCombined
stationCombined <- merge(stationCombined, stationCodes, by.x = "Var1", by.y = "station", all.x = TRUE)
# sort by usage so the most used stations come last
stationCombined <- arrange(stationCombined, Freq)
# now map
# cut the value into bins
stationCombined$cut <- cut(stationCombined$Freq,
breaks= c(0,30,60,90,150), right = FALSE,
labels= c("Light[0-30)", "Moderate[30-60)", "Often[60-90)", "Extreme[90-150)"))
# define color
pal <- colorFactor(palette = c("yellow","orange","red","purple"), domain = stationCombined$cut)
leaflet(stationCombined) %>%
addTiles() %>%
addCircleMarkers(lng = ~lon,
lat = ~lat,
color = ~pal(cut),
opacity = 1,
label = paste("Location:",stationCombined$Var1,
";",
"Frequency:", stationCombined$Freq,
"Level:", stationCombined$cut)) %>%
leaflet::addLegend(position= "bottomright",
pal= pal,
values= ~cut,
title = "Usage Counts",
opacity = 1)
```
***
Building on the previous map, this map uses circle markers to show the stations I use, colored by how often I use them. Usage counts fall into 4 groups: `0-30 times` is light use, `30-60 times` is moderate, `60-90 times` is often, and `90-150 times` is considered extreme use. Each group includes its lower bound and excludes its upper bound, so a station used exactly 30 times counts as moderate.

The map shows two hot spots that I use most often: **W 53 St & 10 Ave** with `105 times` and **W 56 St & 6 Ave** with `148 times`.

There are four other stations that I use often in Midtown and one spot in Midtown East. This makes sense, since the hot spots in orange, red, and purple are the areas where my work, school, and go-to places are located.
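
To make the grouping concrete, here is a small sketch of how `cut()` assigns usage levels with the same breaks and labels as the map chunk. The `visits` vector is made up for illustration, apart from 105 and 148, which are the two hot-spot counts quoted above.

```r
# Hypothetical visit counts per station (105 and 148 are the two hot spots named in the text)
visits <- c(5, 29, 30, 75, 105, 148)

# right = FALSE makes each bin include its lower bound and exclude its upper bound
usageLevel <- cut(visits,
                  breaks = c(0, 30, 60, 90, 150),
                  right  = FALSE,
                  labels = c("Light[0-30)", "Moderate[30-60)", "Often[60-90)", "Extreme[90-150)"))

# 29 stays "Light", while 30 already counts as "Moderate"
data.frame(visits, usageLevel)
```

One consequence of these breaks is that a count of 150 or more would fall outside the last bin and become `NA`, so the upper break would need raising if usage grows.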
### **13 top routes** by number of rides between the start stations and the end stations
```{r alluvialTopStations, fig.width=10, fig.height=5}
#================== link from station to station: alluvial ==================#
library(alluvial)
library(ggalluvial)
#names(citibike)
topRoute <- select(citibike, fromStation, toStation, numRide) %>%
group_by(fromStation, toStation) %>%
summarise(numRide = sum(numRide, na.rm = T)) %>%
arrange(desc(numRide))%>%
filter(numRide >10)
#is_alluvial(as.data.frame(topRoute), logical= FALSE, silent= TRUE)
#names(topRoute)
ggplot(as.data.frame(topRoute),
aes(y = numRide, axis1 = fromStation, axis2 = toStation)) +
geom_alluvium(aes(fill= fromStation, alpha= 1)) +
geom_stratum(fill = "black", color = "grey", alpha= .2) +
geom_label(stat = "stratum", aes(label = after_stat(stratum))) +
scale_x_discrete(limits = c("fromStation", "toStation"), expand = c(.1, .1)) +
scale_fill_viridis_d() +
labs(title= "My Top Citibike Routes by Number of Rides Between the Stations")+
ylab("Number of Rides")+
#theme_classic()+
theme(legend.position="none",
#axis.text.x=element_blank(),
#axis.ticks = element_blank(),
#legend.position="bottom",
plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
text=element_text(size= 12, family="Arial"))
```
***
This alluvial diagram shows my **top 13 routes**, ranked by the number of rides between each pair of stations.

The height of each block represents the number of rides involving that station, and the width of each stream represents the number of rides between the two connected stations.

Two routes stand out as the ones I ride most often: from **W 56 St & 6 Ave** to **Broadway & W 56 St** with `54 rides`, and from **45 Rd & 11 St** to **46 Ave & 5 St** with `31 rides`. Note that every route shown here has more than 10 rides.
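
The route ranking behind the diagram amounts to counting rides per (start, end) pair and keeping the pairs above a threshold. Below is a base R sketch of that idea using `aggregate()` rather than the dplyr pipeline in the chunk. The station names `A`, `B`, `C` and the `minRides` threshold of 1 are placeholders chosen so the toy data has something to filter; the dashboard itself keeps routes with more than 10 rides.

```r
# Toy ride log with placeholder station names (hypothetical data)
rides <- data.frame(
  fromStation = c("A", "A", "A", "B", "B", "C"),
  toStation   = c("B", "B", "B", "C", "C", "A"),
  stringsAsFactors = FALSE
)
rides$numRide <- 1  # one row per ride

# Count rides for every (start, end) pair
routeCounts <- aggregate(numRide ~ fromStation + toStation, data = rides, FUN = sum)

# Sort so the busiest route comes first
routeCounts <- routeCounts[order(-routeCounts$numRide), ]

# Keep only routes above the threshold (placeholder value for the toy data)
minRides <- 1
topRoutes <- routeCounts[routeCounts$numRide > minRides, ]
topRoutes
```

Direction matters in this count: a ride from A to B and a ride from B to A are tallied as two different routes.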
### **Highest rides in July 2015** with the trend of higher usage in the summer and lower usage in the winter
```{r timeseries1}
#============= number of rides overtime by day/month/Yr ===========#
myRides <- select(citibike, date, mth, mth1, yr, numRide, tripMin, tripHr)
#numRides/day
rideDay <- group_by(myRides, mth, yr, date) %>%
summarise(numRide = sum(numRide, na.rm = T)) %>%
arrange(yr) %>%
filter(yr <2018)
#keep years before 2018: rides after the membership gap are dropped to make the plot look better
#numRides/mth
rideMth <- group_by(myRides, mth, mth1, yr) %>%
summarise(numRide = sum(numRide, na.rm = T)) %>%
arrange(yr, mth1) %>%
filter(yr <2018)
#create date column with day 1
rideMth$dateRef <- as.Date(paste(rideMth$yr, rideMth$mth1, 1, sep='-'))
#Make plots for per day
dayData <- xts(x = rideDay$numRide, order.by = rideDay$date)
dygraph(dayData,
main = "Daily Rides from August 2014 to August 2016 ",
ylab = "Number of Rides / Day") %>%
dyOptions( drawPoints = TRUE,
pointSize = 4,
fillGraph=TRUE)%>%
dyRangeSelector(strokeColor = "darkred", fillColor = "darkred") %>%
dyLegend(show = "always", hideOnMouseOut = FALSE) %>%
dySeries("V1", label = "Number of Rides")
```
***
This time-series plot, which combines a dot plot and a line plot, shows the trend in my daily bike usage.

The dots show that on most days I take `1-2 rides a day`. Usage is also higher from July through October, the summer and early fall months, and the trend is most obvious in 2015. The highest daily count was on June 13th, 2015, with 7 rides that day. What a busy day!

Note that the dataset used in this project also includes observations from 2018. Because I paused my membership from August 2016 to May 2018, which leaves a large gap, those observations were removed from this plot to make the trend easier to spot.
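
Dropping the observations after the gap is a filter on the year of each ride. Here is a minimal base R sketch with made-up dates; the name `ridesForPlot` is hypothetical, and the year is taken with `format()` instead of lubridate's `year()` so the snippet has no package dependencies.

```r
# Hypothetical ride dates on both sides of the membership gap
rides <- data.frame(date = as.Date(c("2014-08-20", "2015-06-13", "2016-08-01", "2018-05-10")))

# Extract the calendar year from each date
rides$yr <- as.integer(format(rides$date, "%Y"))

# Keep only the rides before 2018 for plotting
ridesForPlot <- rides[rides$yr < 2018, , drop = FALSE]
ridesForPlot
```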
### **Monthly rides** emphasize the same trend as the daily rides with the highest rides again in July 2015 for 70 rides!
```{r timeseries2}
#Make plots for per month
mthData <- xts(x = rideMth$numRide, order.by = rideMth$dateRef)
dygraph(mthData,
main = "Monthly Rides from August 2014 to August 2016 ",
ylab = "Number of Rides / Month") %>%
dyOptions(drawPoints = TRUE, pointSize = 4, fillGraph=TRUE)%>%
dyRangeSelector(strokeColor = "darkred", fillColor = "darkred") %>%
dyLegend(show = "always", hideOnMouseOut = FALSE) %>%
dySeries("V1", label = "Number of Rides")
```
***
This monthly time-series chart shows the same trend as the daily chart and gives a clearer picture of it.

Rides are totaled by month, with a high of `70 rides` in **July 2015** and a low of only `5 rides` in **February 2015**. Again, this emphasizes the pattern of higher usage in summer and lower usage in winter.
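
Daily and monthly totals come from the same ride dates, grouped at different granularity. The sketch below illustrates this in base R with `table()` on a made-up `rideDates` vector; it is an illustration of the idea, not the `group_by()`/`summarise()` code used for the charts.

```r
# Hypothetical ride dates: one ride in February, three in July (two on the same day)
rideDates <- as.Date(c("2015-02-03", "2015-07-01", "2015-07-01", "2015-07-15"))

# Rides per day: count how often each date appears
ridesPerDay <- table(rideDates)

# Rides per month: reduce each date to "YYYY-MM" before counting
ridesPerMonth <- table(format(rideDates, "%Y-%m"))

ridesPerDay
ridesPerMonth
```

Days or months with no rides simply do not appear in these tables, which is worth keeping in mind when reading averages computed from them.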
### **Longer rides from July through October 2015**, with the longest daily total of 2 hours
```{r timeseries3}
#============== duration Min overtime by month/Yr ==============#
#Had to get rid one extreme data point since it skewed the plot
myRides4dura <- select(citibike, date, mth, mth1, yr, numRide, tripMin, tripHr) %>%
filter(tripMin < 1000)
#head(myRides4dura)
#duration/day
durDay <- group_by(myRides4dura, mth, yr, date) %>%
summarise(duration = sum(tripMin, na.rm = T)) %>%
arrange(yr) %>%
filter(yr <2018)
#keep years before 2018: rides after the membership gap are dropped to make the plot look better
#Make plots for per day
dayDataDur <- xts(x = durDay$duration, order.by = durDay$date)
dygraph(dayDataDur,
main = "Daily Trip Duration from August 2014 to August 2016 ",
ylab = "Duration (min) / Day") %>%
dyOptions( drawPoints = TRUE,
pointSize = 4,
fillGraph=TRUE)%>%
dyRangeSelector(strokeColor = "darkred", fillColor = "darkred") %>%
dyLegend(show = "always", hideOnMouseOut = FALSE) %>%
dySeries("V1", label = "Duration (minutes)", color = "darkblue")
```
***
This time-series chart looks similar to the previous two. The difference is that it shows the trend in trip duration, the amount of time spent riding. Each point represents the total minutes ridden in a day.

It shows the same seasonal trend as before. The outlier, the longest time spent riding in one day, was **September 6th, 2015**, with a total of 2 hours. Even so, the higher usage in June and July 2015 is still visible.
### **Longest monthly rides in June 2015** with accumulation of 16 hours
```{r timeseries4}
#duration/mth
durMth <- group_by(myRides4dura, mth, mth1, yr) %>%
summarise(duration = sum(tripHr, na.rm = T)) %>%
arrange(yr, mth1) %>%
filter(yr <2018)
#create date column with day 1
durMth$dateRef <- as.Date(paste(durMth$yr, durMth$mth1, 1, sep='-'))
#Make plots for per month
mthDataDur <- xts(x = durMth$duration, order.by = durMth$dateRef)
dygraph(mthDataDur,
main = "Monthly Trip Duration from August 2014 to August 2016 ",
ylab = "Duration (hr) / Month") %>%
dyOptions(drawPoints = TRUE, pointSize = 4, fillGraph=TRUE)%>%
dyRangeSelector(strokeColor = "darkred", fillColor = "darkred") %>%
dyLegend(show = "always", hideOnMouseOut = FALSE) %>%
dySeries("V1", label = "Duration (hrs)", color = "darkblue")
```
***
This monthly plot shows the same trend, with the peak shifting to June 2015, when the accumulated riding time was 16 hours.
### **Weekend rides are slightly longer** than weekdays
```{r boxplotWday}
#===================== weekday distribution by duration ======================#
#head(citibike)
#names(citibike)
citibike4dist <- filter(citibike, tripMin <1000 )
weekday <- citibike4dist %>%
ggplot(aes(x=wday,y=tripMin,fill=wday))+
geom_boxplot(alpha=.5)+
#theme_fivethirtyeight()+
scale_fill_brewer(palette="Set3")+
theme_classic()+
theme(legend.position="none",
plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
text=element_text(size= 12, family="Arial"))+
ggtitle("Distribution of Ride Duration by Days of Week") +
scale_x_discrete(name = "")+
scale_y_continuous(name = "Trip Duration (minute)",
limits=c(0, 77))
ggplotly(weekday)
```
***
This boxplot shows the distribution of ride duration by day of the week.

The median trip duration is higher on the weekend: 8.07 minutes on Saturday and 9 minutes on Sunday. Monday through Friday have lower medians, ranging from 4.30 to 6.44 minutes. Weekend durations also vary more, while weekdays have more outliers, that is, trips much longer than the typical range.
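
The medians quoted above are per-day summaries of `tripMin`. As a rough sketch of how such values can be computed, and of why the median is a sensible choice when a few long trips are present, the following uses `tapply()` on a made-up `trips` data frame (the durations are hypothetical, not taken from the dataset).

```r
# Hypothetical trip durations in minutes; Monday includes one unusually long trip
trips <- data.frame(
  wday    = c("Monday", "Monday", "Monday", "Saturday", "Saturday", "Saturday"),
  tripMin = c(4, 5, 30, 8, 9, 12)
)

# Median duration per day of week
medianByDay <- tapply(trips$tripMin, trips$wday, median)

# Mean duration per day of week, for comparison
meanByDay <- tapply(trips$tripMin, trips$wday, mean)

medianByDay
meanByDay
```

In this toy data the single 30-minute trip pulls Monday's mean above Saturday's, while Monday's median stays below it, which mirrors how weekday outliers can coexist with lower weekday medians.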
### **More rides on weekdays** than weekends
```{r barRidebByday}
# rides by day
#citibike %>%
ggplotly(ggplot(citibike, aes(x=wday,fill=wday))+
geom_bar(alpha=1)+
#theme_fivethirtyeight()+
theme_classic()+
scale_fill_brewer(palette="Set3")+
theme(legend.position="none",
plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
text=element_text(size= 12, family="Arial"))+
scale_x_discrete(name = "")+
ggtitle("Number of Rides by Days of Week"))
```
***
This bar plot compares the total number of rides by day of the week. I ride more often on weekdays than on weekends. The highest count is on Friday and the lowest is on Saturday. The totals are as follows:

| Day | Total count |
|---|---|
| Monday | 115 |
| Tuesday | 74 |
| Wednesday | 110 |
| Thursday | 85 |
| Friday | 164 |
| Saturday | 44 |
| Sunday | 79 |

Taken together with the previous boxplot, this means that I ride more often on weekdays but tend to ride longer on weekends.
### **Peak hours** on weekdays are in the morning 6-9am and 3-6pm while peak hours on weekends are from noon to 6pm
```{r barTimedist}
#===================== Hours distribution by duration ======================#
#time of day distribution by usertype
ggplotly(citibike %>%
ggplot(aes(x=groupHr, fill= wdayCat)) +
geom_bar(alpha=1) +
theme_classic() +
scale_fill_brewer(palette="Set3")+
theme(legend.title=element_blank(),
legend.position="bottom",
plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
text=element_text(size= 12, family="Arial"))+
scale_x_discrete(name = "")+
ggtitle("Time of Day Distribution by Weekday vs Weekend"))
```
***
This stacked bar plot shows the distribution of rides by hour of the day, which makes it possible to compare weekday and weekend usage by time of day. Start times are grouped into six ranges.

3-6pm seems to be my peak time every day. I also bike more in the early hours on weekdays than on weekends. I am clearly not a night rider, although it does happen occasionally. Below is the number of rides by time of day:

| Hours | Total count |
|---|---|
| 6-9AM | 167 |
| 9-12AM | 145 |
| 12-3PM | 92 |
| 3-6PM | 230 |
| 6-9PM | 29 |
| After 9PM | 8 |

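
One detail worth knowing when reading these groups: the setup chunk calls `cut()` with its default right-closed intervals, so a ride that starts at 9:xx is counted under "6-9AM" and one that starts at 6:xx under "Before 6AM". The sketch below compares that default with `right = FALSE` on a made-up `startHour` vector, using the same breaks and labels as the dashboard. It is only an illustration of the two conventions, not a recomputation of the table above.

```r
# Hypothetical start hours (0-23), including values that sit exactly on a break
startHour <- c(6, 8, 9, 15, 17, 18, 22)

hourBreaks <- c(0, 6, 9, 12, 15, 18, 21, 24)
hourLabels <- c("Before 6AM", "6-9AM", "9-12AM", "12-3PM", "3-6PM", "6-9PM", "After 9PM")

# Default: intervals are right-closed, e.g. "6-9AM" means 6 < hour <= 9
rightClosed <- cut(startHour, breaks = hourBreaks, labels = hourLabels)

# Alternative: left-closed intervals, e.g. "6-9AM" means 6 <= hour < 9
leftClosed <- cut(startHour, breaks = hourBreaks, labels = hourLabels, right = FALSE)

# Hours 6, 9, 15 and 18 land in different groups under the two conventions
data.frame(startHour, rightClosed, leftClosed)
```

With the left-closed variant the labels match the clock times they name, and a start hour of 0 is kept in "Before 6AM" instead of becoming `NA`.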
### **Biking for Good Causes:** Do you know that we can reduce our spending and help our environment at the same time by just biking?
```{r barGoodcauses, fig.width=10, fig.height=5}
#================ barchart show distance, saveGallon, reduceCO2 by year ============= #
#names(citibike4dist)
groupbar <- select(citibike4dist, date,distMile,gassavedGal,co2reducedLbs, yr) %>%
group_by(yr)%>%
summarise(distMile = sum(distMile, na.rm = T),
gassavedGal = sum(gassavedGal, na.rm = T),
co2reducedLbs = sum(co2reducedLbs, na.rm = T))
groupbar$yr <- factor(as.character(groupbar$yr),
                      levels = c("2014","2015","2016","2018"))
#reshape data to long
library(tidyr)
groupbar <- gather(groupbar, items, value, distMile:co2reducedLbs, factor_key=TRUE)
groupbar <- rename(groupbar, Year = yr)
#plot
require(viridis)
# Create facet labels in the same order as the gathered columns (distMile, gassavedGal, co2reducedLbs)
itemLabels <- c("Distance (mile)","Gas Saved (gallon)","CO2 Reduced (lbs)")
levels(groupbar$items) <- itemLabels
ggplotly(ggplot(transform(groupbar, items=factor(items, levels = c("Distance (mile)","Gas Saved (gallon)","CO2 Reduced (lbs)"))),
aes(Year, value)) +
geom_bar(aes(fill = Year), position = "dodge", stat="identity") +
facet_wrap(~items, scales = "free") +
xlab("") +
ylab("Measurement") +
theme_classic()+
theme(axis.text.x=element_blank(),
axis.ticks = element_blank(),
legend.position="bottom",
plot.title = element_text(size = 14, face = "bold"),
axis.text = element_text(size = 10),
axis.title = element_text(size = 10),
text=element_text(size= 10, family="Arial"))+
scale_fill_viridis(discrete=TRUE)+
labs(title= "Estimated Gas Saving & CO2 Reduction Over Time"))
#legend.title=element_blank())+
# subtitle= "",
#caption= "Source:"))
```
***
**Yes!** Biking is not just a sport but also a recreational activity and a form of exercise for many people. It also has hidden benefits that may never cross our minds: biking can save us money and help the environment!

Following the Citibike and EPA (Environmental Protection Agency) websites, distance traveled is estimated from the total usage time with an assumed average speed of `7.456 miles` per hour. Gas saved is estimated as the distance traveled multiplied by `0.041 gallon per mile (24.1 mpg)`, and CO2 reduced is estimated as the distance traveled multiplied by `0.812 lbs. per mile`.

These three bar plots show the same pattern as the earlier time-series charts of trip duration, since all three numbers are calculated from the total usage time.
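
As a worked example of these estimates, the sketch below applies the three factors quoted above to a single hypothetical 30-minute ride. The variable names are illustrative; only the constants come from the text.

```r
# Conversion factors quoted in the text
mphAssumed   <- 7.456  # assumed average speed, miles per hour
galPerMile   <- 0.041  # gallons of gas per mile (about 24.1 mpg)
co2LbPerMile <- 0.812  # pounds of CO2 per mile

# Hypothetical ride of 30 minutes
tripMin <- 30

# Distance = hours ridden x assumed speed
distMile <- (tripMin / 60) * mphAssumed      # 3.728 miles

# Gas saved and CO2 reduced both scale linearly with distance
gasSavedGal   <- distMile * galPerMile       # about 0.153 gallons
co2ReducedLbs <- distMile * co2LbPerMile     # about 3.03 lbs

c(distMile = distMile, gasSavedGal = gasSavedGal, co2ReducedLbs = co2ReducedLbs)
```

Because every step is a multiplication by a constant, all three estimates are proportional to riding time, which is why the three bar plots share one shape.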
Source:
www.citibikenyc.com
http://www.epa.gov/otaq/consumer/420f08024.pdf