Matias Lespiau
December 2014
Kaggle has a competition on predicting bike sharing demand in Washington, D.C., based on weather and historical data.
A tool that visualizes how demand is distributed across the hours of the day, the days of the week and the months of the year could reveal patterns that help us build a better predictive model.
The app plots bike sharing demand on a color scale (darker for higher demand), with the hour of the day on one axis and the day of the week on the other.
The filters let you split the data by user type, by month, or by both.
The dataset is downloaded from Kaggle's Bike Sharing site. Hour, day of the week and month are normalized using basic R commands, and the data is then queried and aggregated using dplyr.
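The normalization step is not shown in these slides. A minimal sketch of how it could look in base R follows, assuming the file has a `datetime` column in `YYYY-MM-DD HH:MM:SS` format; the two sample rows are made up for illustration, and the column names `hour`, `dayOfTheWeek` and `month` are chosen to match the aggregation code.

```r
# Made-up sample rows standing in for the downloaded file (assumption:
# the timestamp lives in a "datetime" column as "YYYY-MM-DD HH:MM:SS").
data <- data.frame(
  datetime   = c("2011-01-01 00:00:00", "2011-01-02 13:00:00"),
  casual     = c(3, 10),
  registered = c(13, 40),
  count      = c(16, 50),
  stringsAsFactors = FALSE
)

# Parse the timestamps; a fixed time zone keeps the result reproducible.
ts <- as.POSIXct(data$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Zero-padded hour and month as text, e.g. "00" and "01".
data$hour  <- format(ts, "%H")
data$month <- format(ts, "%m")

# Day names from a fixed vector, so the result does not depend on the locale.
# POSIXlt's wday runs from 0 (Sunday) to 6 (Saturday).
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")
data$dayOfTheWeek <- factor(dayNames[as.POSIXlt(ts)$wday + 1], levels = dayNames)
```

Storing the day as a factor with explicit levels keeps the days in calendar order, rather than alphabetical order, when they are later used as a plot axis.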
library(dplyr)
# Average total, casual and registered demand for each weekday/hour pair
countGroup <- group_by(data, dayOfTheWeek, hour)
count <- summarise(countGroup, count=mean(count), casual=mean(casual), registered=mean(registered))
head(count, n=3)
Source: local data frame [3 x 5]
Groups: dayOfTheWeek

  dayOfTheWeek hour count casual registered
1       Sunday   00 96.23  17.47      78.76
2       Sunday   01 79.45  15.23      64.23
3       Sunday   02 62.48  12.38      50.11
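The month and user type filters could be applied before the same aggregation. The sketch below is one possible way to do it, not necessarily how the app does it: `averageDemand`, its parameters and the toy data frame are hypothetical names introduced for illustration, and the data is assumed to already have `month`, `hour` and `dayOfTheWeek` columns.

```r
library(dplyr)

# Hypothetical helper: average demand per weekday and hour, optionally
# restricted to one month. userType picks the column to average:
# "count" (all users), "casual" or "registered".
averageDemand <- function(data, userType = "count", monthFilter = NULL) {
  if (!is.null(monthFilter)) {
    data <- filter(data, month == monthFilter)
  }
  # Copy the selected user type into a common column name.
  data$demand <- data[[userType]]
  grouped <- group_by(data, dayOfTheWeek, hour)
  ungroup(summarise(grouped, demand = mean(demand)))
}

# Toy data, made up for illustration only.
toy <- data.frame(
  dayOfTheWeek = c("Sunday", "Sunday", "Sunday", "Monday"),
  hour         = c("00", "00", "01", "00"),
  month        = c("01", "01", "02", "01"),
  casual       = c(2, 4, 6, 8),
  registered   = c(10, 20, 30, 40),
  count        = c(12, 24, 36, 48),
  stringsAsFactors = FALSE
)

# Average casual demand in January only.
januaryCasual <- averageDemand(toy, userType = "casual", monthFilter = "01")
```

Averaging into a single `demand` column means the plotting code can stay the same whichever user type is selected.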
library(ggplot2)
# Heat map: one tile per weekday/hour pair, filled by average demand
ggplot(count, aes(x = hour, y = dayOfTheWeek)) +
  scale_fill_gradient(name="Average Counts", low="white", high="violet") +
  theme(axis.title.y = element_blank()) +
  ggtitle("Bicycle sharing count density") +
  geom_tile(aes(fill = count))