Visualizing bike sharing demand for a Kaggle competition

Matias Lespiau
December 2014

Motivation

Kaggle hosts a competition on predicting bike sharing demand in Washington, D.C., based on weather and historical usage data.

Main Screen

A tool that visualizes how demand is distributed across the hours of the day, the days of the week, and the months of the year could reveal patterns that help us build a better predictive model.

Usage

The app plots average bike sharing demand on a color scale (darker for higher demand), with the hour of the day on one axis and the day of the week on the other.

The filters let you break the data down by user type, by month, or by both.
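The slides do not show how the app implements its filters, so the following is only a minimal sketch of one way to do it with dplyr. The helper `averageDemand`, its parameters `userType` and `selectedMonth`, and the toy data frame are all hypothetical; only the column names (`dayOfTheWeek`, `hour`, `count`, `casual`, `registered`) follow the slides, and the `month` column is assumed to hold month names.

```r
library(dplyr)

# Hypothetical helper (not from the app): average demand per
# (day of week, hour) cell, optionally restricted to one month
# and to one user type.
averageDemand <- function(data, userType = "count", selectedMonth = NULL) {
  stopifnot(userType %in% c("count", "casual", "registered"))
  if (!is.null(selectedMonth)) {
    # Keep only the rows of the requested month
    data <- filter(data, month == selectedMonth)
  }
  # Copy the chosen user-type column so the summary has a fixed name
  data$demand <- data[[userType]]
  ungroup(summarise(group_by(data, dayOfTheWeek, hour), demand = mean(demand)))
}

# Made-up rows, only to exercise the helper
toy <- data.frame(
  dayOfTheWeek = c("Sunday", "Sunday", "Monday", "Sunday"),
  hour         = c("08", "08", "08", "08"),
  month        = c("January", "January", "January", "July"),
  casual       = c(10, 20, 5, 40),
  registered   = c(30, 50, 80, 60),
  stringsAsFactors = FALSE
)
toy$count <- toy$casual + toy$registered

januaryCasual <- averageDemand(toy, userType = "casual", selectedMonth = "January")
allUsers      <- averageDemand(toy)
```

In a Shiny app, `userType` and `selectedMonth` would presumably come from input widgets, and the returned data frame could be passed straight to the plotting code.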

Dataset preparation with dplyr

The dataset is downloaded from Kaggle's Bike Sharing Demand competition site. The hour, day of the week, and month are normalized using basic R commands, and the data is then queried and aggregated using dplyr.

library(dplyr)
# Average each count column for every (day of week, hour) combination
countGroup <- group_by(data, dayOfTheWeek, hour)
count <- summarise(countGroup, count=mean(count), casual=mean(casual), registered=mean(registered))
head(count, n=3)

Source: local data frame [3 x 5]
Groups: dayOfTheWeek

  dayOfTheWeek hour count casual registered
1       Sunday   00 96.23  17.47      78.76
2       Sunday   01 79.45  15.23      64.23
3       Sunday   02 62.48  12.38      50.11
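The normalization step mentioned above is not shown on the slide. A minimal base R sketch of what it could look like follows, assuming the raw file has a `datetime` column formatted like `2011-01-01 00:00:00`; that column name, the toy rows, and the helper variables `ts` and `dayNames` are assumptions for illustration, not the author's code.

```r
# Made-up rows in the assumed shape of the raw training data
data <- data.frame(
  datetime   = c("2011-01-01 00:00:00", "2011-01-02 13:00:00", "2011-07-04 08:00:00"),
  casual     = c(3, 20, 35),
  registered = c(13, 70, 120),
  stringsAsFactors = FALSE
)
data$count <- data$casual + data$registered

# Parse the timestamp once, then derive the grouping variables from it
ts <- as.POSIXlt(data$datetime, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Zero-padded hour as text, e.g. "00", "13"
data$hour <- format(ts, "%H")

# Index into fixed English names so the result does not depend on the locale;
# wday is 0 for Sunday, mon is 0 for January
dayNames <- c("Sunday", "Monday", "Tuesday", "Wednesday",
              "Thursday", "Friday", "Saturday")
data$dayOfTheWeek <- factor(dayNames[ts$wday + 1], levels = dayNames)
data$month        <- factor(month.name[ts$mon + 1], levels = month.name)
```

Storing the day and month as factors with explicit levels keeps them in calendar order rather than alphabetical order when they are grouped or plotted.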

Plot example

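library(ggplot2)
# Heatmap: one tile per (hour, day of week) cell, shaded by the average count
ggplot(count, aes(x = hour, y = dayOfTheWeek)) +
        geom_tile(aes(fill = count)) +
        # Higher average demand maps to a deeper violet
        scale_fill_gradient(name="Average Counts", low="white", high="violet") +
        theme(axis.title.y = element_blank()) +
        ggtitle("Bicycle sharing count density")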
