Bike-sharing systems are a means of renting bicycles in which obtaining membership, renting, and returning bikes are automated via a network of kiosk locations throughout a city. Using these systems, people can rent a bike at one location and return it at a different one on an as-needed basis. Currently, there are over 500 bike-sharing programs around the world.
The data generated by these systems make them attractive for researchers because the duration of travel, departure location, arrival location, and time elapsed are explicitly recorded. Bike-sharing systems therefore function as a sensor network, which can be used for studying mobility in a city. In this competition, participants are asked to combine historical usage patterns with weather data in order to forecast bike rental demand in the Capital Bikeshare program in Washington, D.C.
You are provided with hourly rental data spanning two years. For this competition, the training set comprises the first 19 days of each month, while the test set runs from the 20th to the end of the month. You must predict the total count of bikes rented during each hour covered by the test set, using only information available prior to the rental period.
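The day-of-month split rule can be expressed in a couple of lines. This is a minimal sketch, assuming timestamps in the same `YYYY-MM-DD HH:MM:SS` form as the competition's `datetime` column; the helper `split_by_day_of_month` is hypothetical and not part of the original analysis.

```r
library(lubridate)

# Hypothetical helper: label each timestamp "train" (days 1-19 of its month)
# or "test" (day 20 to the end of the month), following the split rule above.
split_by_day_of_month <- function(datetimes, last_train_day = 19) {
  dt <- ymd_hms(datetimes)  # parsed as UTC by default
  ifelse(mday(dt) <= last_train_day, "train", "test")
}

split_by_day_of_month(c("2011-01-19 23:00:00", "2011-01-20 00:00:00"))
# returns "train" "test"
```

Lowering `last_train_day` (for example to 14) gives a rough local validation split inside the training file that mimics the "predict later days of the month" setting.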
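library(lubridate); library(ggplot2); library(scales); library(plotly); library(dplyr)
bike_sharing <- read.csv("Kaggle_bike_sharing_train.csv")
datetime_utc <- ymd_hms(bike_sharing$datetime)  # parsed once, as UTC
bike_sharing$hour <- hour(datetime_utc)
# Clock time on a common date for scale_x_datetime(); format() keeps UTC, whereas strftime() converts to the local zone
bike_sharing$times <- as.POSIXct(format(datetime_utc, "%H:%M:%S"), format = "%H:%M:%S", tz = "UTC")
bike_sharing$day <- wday(datetime_utc, label = TRUE)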
# Smoothed hourly rentals, one curve per day of the week (se = FALSE hides the confidence band)
rent_plot <- ggplot(bike_sharing, aes(x = times, y = count, color = day)) +
  geom_smooth(se = FALSE, linewidth = 2) +
  theme_light(base_size = 20) +
  xlab("Hour of the Day") +
  scale_x_datetime(breaks = date_breaks("4 hours"), labels = date_format("%I:%M %p")) +
  ylab("Number of Bike Rentals") +
  scale_color_discrete("") +
  ggtitle("Number of bike rentals by day of the week and hour") +
  theme(plot.title = element_text(size = 18),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 12))
# Smoothed temperature by hour of the day (a single curve, not split by weekday)
temp_plot <- ggplot(bike_sharing, aes(x = times, y = temp)) +
  geom_smooth(se = FALSE, linewidth = 2) +
  theme_light(base_size = 20) +
  xlab("Hour of the Day") +
  scale_x_datetime(breaks = date_breaks("4 hours"), labels = date_format("%I:%M %p")) +
  ylab("Temperature [Celsius]") +
  ggtitle("Temperature by hour of the day") +
  theme(plot.title = element_text(size = 18),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 12))
ggplotly(rent_plot)
ggplotly(temp_plot)
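The curves above are smoothed: with the default `method = NULL`, `geom_smooth()` fits LOESS when the largest group has fewer than 1,000 points and a GAM via `mgcv` otherwise. To inspect the unsmoothed values behind such curves, the mean count for each (group, hour) cell can be tabulated. A minimal sketch in base R, assuming the `hour` and `count` columns created above; the helper `hourly_means` is hypothetical.

```r
# Hypothetical helper: mean rentals for every combination of a grouping
# column (e.g. "day") and hour; the result's `count` column holds the mean.
hourly_means <- function(df, group) {
  aggregate(df["count"], by = df[c(group, "hour")], FUN = mean)
}

# Usage on the competition data:
# hourly_means(bike_sharing, "day")
```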
Are wind speed, humidity, and temperature associated with the number of bike rentals?
ggplot(bike_sharing, aes(x=windspeed, y=count)) +
geom_point(size=2, shape=23)
ggplot(bike_sharing, aes(x=humidity, y=count)) +
geom_point(size=2, shape=23)
ggplot(bike_sharing, aes(x=temp, y=count)) +
geom_point(size=2, shape=23)
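The scatter plots can be complemented by one summary number per weather variable. This is a minimal sketch, assuming the competition's column names `windspeed`, `humidity`, `temp`, and `count`; the helper `weather_correlations` is hypothetical. Spearman's rank correlation is a choice made for this sketch, not part of the original analysis: it captures any monotone association, not only a linear one, but it still misses non-monotone patterns, so the plots remain the primary check.

```r
# Hypothetical helper: Spearman rank correlation of each weather variable
# with the hourly rental count, returned as a named numeric vector.
weather_correlations <- function(df, vars = c("windspeed", "humidity", "temp")) {
  sapply(vars, function(v) cor(df[[v]], df$count, method = "spearman"))
}

# Usage on the competition data:
# weather_correlations(bike_sharing)
```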
bike_sharing$month <- month(ymd_hms(bike_sharing$datetime), label = TRUE)
ggplotly(ggplot(bike_sharing, aes(x = times, y = count, color = month)) +
  geom_smooth(se = FALSE, linewidth = 2) +
  theme_light(base_size = 20) +
  xlab("Hour of the Day") +
  scale_x_datetime(breaks = date_breaks("4 hours"), labels = date_format("%I:%M %p")) +
  ylab("Number of Bike Rentals") +
  scale_color_discrete("") +
  ggtitle("Number of bike rentals by month and hour of the day") +
  theme(plot.title = element_text(size = 18),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 12)))
# Map the numeric season code to labelled, ordered factor levels
bike_sharing <-
  bike_sharing %>%
  mutate(
    season_fct = case_when(
      season == 1 ~ "Spring",
      season == 2 ~ "Summer",
      season == 3 ~ "Autumn",
      season == 4 ~ "Winter"
    )) %>%
  mutate(season_fct = factor(season_fct, levels = c("Spring", "Summer", "Autumn", "Winter")))
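Before comparing the seasonal curves, it can help to see the average level and the number of hourly records per season. A minimal sketch with dplyr, assuming the `season_fct` and `count` columns above; the helper `season_means` is hypothetical.

```r
library(dplyr)

# Hypothetical helper: mean hourly rentals and number of hourly records
# per season level (levels with no rows are dropped).
season_means <- function(df) {
  df %>%
    group_by(season_fct) %>%
    summarise(mean_count = mean(count), n_hours = n(), .groups = "drop")
}

# Usage on the competition data:
# season_means(bike_sharing)
```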
ggplotly(ggplot(bike_sharing, aes(x = times, y = count, color = season_fct)) +
  geom_smooth(se = FALSE, linewidth = 2) +
  theme_light(base_size = 20) +
  xlab("Hour of the Day") +
  scale_x_datetime(breaks = date_breaks("4 hours"), labels = date_format("%I:%M %p")) +
  ylab("Number of Bike Rentals") +
  scale_color_discrete("") +
  ggtitle("Number of bike rentals by season and hour of the day") +
  theme(plot.title = element_text(size = 18),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 12)))
# Map the 0/1 workingday flag to labelled factor levels
bike_sharing <-
  bike_sharing %>%
  mutate(
    workingday_fct = case_when(
      workingday == 1 ~ "Working Day",
      workingday == 0 ~ "Not Working Day"
    )) %>%
  mutate(workingday_fct = factor(workingday_fct, levels = c("Working Day", "Not Working Day")))
ggplotly(ggplot(bike_sharing, aes(x = times, y = count, color = workingday_fct)) +
  geom_smooth(se = FALSE, linewidth = 2) +
  theme_light(base_size = 20) +
  xlab("Hour of the Day") +
  scale_x_datetime(breaks = date_breaks("4 hours"), labels = date_format("%I:%M %p")) +
  ylab("Number of Bike Rentals") +
  scale_color_discrete("") +
  ggtitle("Number of bike rentals by working day and hour of the day") +
  theme(plot.title = element_text(size = 18),
        axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 12)))
On working days, rental peaks are visible at commuting times.
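The commuting peaks can also be read off numerically by listing the hours with the highest mean rentals in each working-day group. A minimal sketch with dplyr, assuming the `workingday_fct`, `hour`, and `count` columns created above; the helper `peak_hours` and its `top_n_hours` parameter are hypothetical.

```r
library(dplyr)

# Hypothetical helper: within each working-day group, the `top_n_hours`
# hours with the highest mean rental count.
peak_hours <- function(df, top_n_hours = 2) {
  df %>%
    group_by(workingday_fct, hour) %>%
    summarise(mean_count = mean(count), .groups = "drop_last") %>%  # still grouped by workingday_fct
    slice_max(mean_count, n = top_n_hours, with_ties = FALSE) %>%
    ungroup()
}

# Usage on the competition data:
# peak_hours(bike_sharing)
```

If the peaks reflect commuting, the working-day rows should show one morning and one late-afternoon hour, while the non-working-day rows should not.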