What Factors Influence Bike Rentals: An Exploratory Analysis on Seoul Bike Rentals

Authors

Slimeball Goats

Grady Freeman, Nathan Gruwell, Shayaan Cyclewalla, Jackson Mayo

library(qrcode)
plot(qr_code("https://rpubs.com/gradyfreem05/1428021"))

Introduction

  • Bikes are a common alternative to cars in Seoul, most likely because they can avoid traffic and cost much less to rent or maintain than a car.

  • Knowing which factors influence bike rental counts in Seoul is valuable to the company renting them out.

  • If the company can predict which days will have higher or lower rental counts, it can distribute bikes efficiently on busy days and take them in for maintenance on days with low demand.

Project Goal

This project aims to determine what factors affect bike rentals so that the company can make informed decisions about bike availability and distribution in order to maximize efficiency and revenue.

Data

Seoul Bike Rental Dataset

The data come from Kaggle.com and contain 8,760 hourly observations across 14 variables. The variables describe weather conditions for each hour and indicate whether the day was a holiday and whether it was a functioning day (rental service available or not). The variable names are listed below.

library(tidyverse)
library(knitr)
library(plotly)
# df is assumed to have been read from the Kaggle CSV beforehand (loading step not shown)
data.frame(Variable_Names = names(df)) |>
  knitr::kable(
    caption = "Variable Names in Bike Rental Dataset"
  )
Variable Names in Bike Rental Dataset
Variable_Names
Date
Rented.Bike.Count
Hour
Temperature.蚓.
Humidity...
Wind.speed..m.s.
Visibility..10m.
Dew.point.temperature.蚓.
Solar.Radiation..MJ.m2.
Rainfall.mm.
Snowfall..cm.
Seasons
Holiday
Functioning.Day

Everything looks fine in this table besides Temperature and Dew.point.temperature. The character “蚓” after both names appears to be a mis-encoded “°C” symbol, so both columns are in degrees Celsius. We convert both to Fahrenheit and then remove the original columns.

# Convert Celsius to Fahrenheit: F = C * 9/5 + 32
df$Temperature_F <- (df$Temperature.蚓. * 9/5) + 32
df$Dew.point.temperature_F <- (df$Dew.point.temperature.蚓. * 9/5) + 32
# Remove the original Celsius columns
library(dplyr)
df <- df |>
  select(-Temperature.蚓., -Dew.point.temperature.蚓.)
library(DT)
datatable(df)
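
The conversion above applies the standard formula F = C × 9/5 + 32 to each column. As a minimal sketch, the same step could be wrapped in a small helper and checked against known reference points before it is applied to the data. The function name `c_to_f` is hypothetical and is not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): Celsius to Fahrenheit.
c_to_f <- function(celsius) {
  celsius * 9 / 5 + 32
}

# Sanity checks against well-known reference points.
c_to_f(0)    # freezing point of water: 32
c_to_f(100)  # boiling point of water: 212
c_to_f(-40)  # the two scales coincide at -40
```

Because the helper is vectorized, it could be applied to a whole column in one call.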

Analysis

#Rental/Temp Scatterplot

p1 <- ggplot(df, aes(x = Temperature_F, y = Rented.Bike.Count, color = Seasons)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "lm", color = "black") +
  labs(title = "Rentals by Temperature (Fahrenheit)",
       x = "Temperature (°F)",
       y = "Bikes Rented") +
  theme_minimal()
p1

This scatter plot shows the relationship between temperature (in Fahrenheit) and hourly rental demand. The black trend line and the upward drift of the points indicate a positive correlation: as the weather warms up, the number of bikes rented tends to increase.

#Holidays/Rentals Boxplot
p2 <- ggplot(df, aes(x = Holiday, y = Rented.Bike.Count, fill = Holiday)) +
  geom_boxplot() +
  labs(title = "Impact of Holidays on Rentals",
       x = "Day Type",
       y = "Bikes Rented") +
  theme_minimal() +
  theme(legend.position = "none")
ggplotly(p2)

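These box plots compare the number of bikes rented per hour on holidays with the number rented on days that are not holidays. The holiday box shows a lower overall spread, which suggests that demand tends to be lower and less variable on holidays. A box plot alone cannot tell us why, so this should be read as a pattern rather than a stated preference.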
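
A box plot reading can be backed up with a numeric summary. The sketch below computes the median and interquartile range per day type on a small toy data frame. The values are made up for illustration, and the labels "Holiday" and "No Holiday" are assumed to match the levels of the Holiday column.

```r
# Toy hourly counts (illustrative values only, not taken from the dataset).
toy <- data.frame(
  Holiday = rep(c("Holiday", "No Holiday"), each = 4),
  Rented.Bike.Count = c(100, 200, 300, 400, 200, 600, 1000, 1400)
)

# Median and interquartile range of hourly rentals for each day type.
med <- tapply(toy$Rented.Bike.Count, toy$Holiday, median)
iqr <- tapply(toy$Rented.Bike.Count, toy$Holiday, IQR)

data.frame(Median = med, IQR = iqr)
```

Running the same two `tapply()` calls on the real data frame would show whether the holiday group really has both a lower center and a narrower spread.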

#data cleaning
#demand function
df_clean <- df |>
  mutate(Demand_Level = case_when(
    Rented.Bike.Count > 1500 ~ "Very High",
    Rented.Bike.Count > 1000 ~ "High",
    Rented.Bike.Count > 500 ~ "Moderate",
    TRUE ~ "Low"
  )) |>
  filter(Functioning.Day == "Yes") 

#seasonal (summary)
seasonal_stats <- df_clean |>
  group_by(Seasons) |>
  summarize(
    Avg_Rentals = mean(Rented.Bike.Count),
    Max_Rentals = max(Rented.Bike.Count),
    Total_Hours = n()
  )

knitr::kable(seasonal_stats, caption = "Summary of Rentals by Season")
Summary of Rentals by Season

Seasons   Avg_Rentals   Max_Rentals   Total_Hours
Autumn       924.1105          3298          1937
Spring       746.2542          3251          2160
Summer      1034.0734          3556          2208
Winter       225.5412           937          2160

We used mutate and case_when to categorize hourly demand, then used filter to keep only functioning days. Lastly, we grouped by season and summarized to better understand the rental data. Summer is the clear leader in average rentals, and winter is last by a wide margin. The maximum rentals for the top three seasons are relatively close, so peak demand can be similar whenever those seasons get very busy.

#demand lvl

df_clean$Demand_Level <- factor(df_clean$Demand_Level, levels = c("Low", "Moderate", "High", "Very High"))

p3 <- ggplot(df_clean, aes(x = Demand_Level, fill = Demand_Level)) +
  geom_bar() +
  labs(title = "Frequency of Demand Levels",
       x = "Demand Type",
       y = "Frequency (Hours)") +
  theme_minimal()
p3

The demand categories sort each hour into one of four levels: Low, Moderate, High, and Very High. Because the “Low” level has the highest count, we can deduce that in most hours only a small share of the bikes is in use.
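
As an alternative sketch, the same thresholds can be expressed with base R's `cut()`, which returns an ordered factor in one step. The counts below are illustrative values chosen to sit on either side of each threshold; they are not taken from the dataset.

```r
# Illustrative hourly counts on both sides of each threshold.
counts <- c(500, 501, 1000, 1001, 1500, 1501)

# Intervals are closed on the right, so 500 is "Low" and 501 is "Moderate",
# matching the strict ">" comparisons used in case_when().
demand <- cut(
  counts,
  breaks = c(-Inf, 500, 1000, 1500, Inf),
  labels = c("Low", "Moderate", "High", "Very High"),
  ordered_result = TRUE
)

# Share of hours at each demand level.
prop.table(table(demand))
```

Reporting shares rather than raw counts makes it easier to say how often demand is low.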

p4 <- ggplot(df_clean, aes(x = Humidity...)) +
  geom_histogram(binwidth = 5, fill = "skyblue1", color = "blue") +
  scale_x_continuous(breaks = seq(0, 100, 10)) +
  labs(title = "Distribution of Humidity Levels", 
       x = "Humidity (%)", 
       y = "Frequency (Hours)") +
  theme_minimal()
p4

This histogram shows how humidity levels are distributed across functioning hours in Seoul. The bins are 5 percentage points wide, with axis ticks every 10%. Mid-level (moderate) humidity is the most common condition. Note that the histogram counts hours rather than rentals, so on its own it does not show how humidity affects demand.
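
Because the histogram above counts hours rather than rentals, one way to relate humidity to demand is to average rentals within humidity bands. The sketch below does this on toy data. The values and the three band boundaries are assumptions chosen for illustration only.

```r
# Toy data (illustrative values only): humidity in percent, hourly rentals.
toy <- data.frame(
  Humidity = c(25, 35, 45, 55, 55, 65, 85, 95),
  Rented.Bike.Count = c(900, 1000, 1100, 800, 1000, 700, 300, 100)
)

# Assign each hour to a humidity band: [0,30), [30,60), [60,100].
toy$Band <- cut(toy$Humidity, breaks = c(0, 30, 60, 100),
                right = FALSE, include.lowest = TRUE)

# Number of hours and mean rentals per band.
hours_per_band <- table(toy$Band)
mean_per_band  <- tapply(toy$Rented.Bike.Count, toy$Band, mean)

data.frame(Hours = as.vector(hours_per_band), Mean_Rentals = mean_per_band)
```

Showing both columns side by side separates how common a humidity level is from how many bikes are rented at that level.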

dewpointbike <- ggplot(df, aes(x = Dew.point.temperature_F, y = Rented.Bike.Count, color = Seasons)) +
  geom_point() +
  geom_smooth(color = "grey40",method = "lm", se = FALSE) +
  labs(title = "Scatter Plot of Dew Point and Rented Bike Count",
       x = "Dew Point (°F)",
       y = "Rented Bike Count") +
  theme_minimal()
dewpointbike

This plot shows a clear positive correlation between dew point and rented bike count. That does not necessarily mean dew point itself drives rentals. As the seasonal summary showed, summer has the highest average rentals, and summer also has the highest dew points, so season alone could make the two look correlated.
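
One way to check whether season explains the pattern is to compare the pooled correlation with the correlation inside each season. The sketch below uses toy data constructed so that dew point and rentals are unrelated within each season but strongly related when the seasons are pooled. All values and the short column names are illustrative assumptions.

```r
# Toy data: within each season, rentals do not trend with dew point.
toy <- data.frame(
  Seasons  = rep(c("Winter", "Summer"), each = 4),
  DewPoint = c(20, 25, 30, 35, 60, 65, 70, 75),
  Rentals  = c(210, 200, 200, 210, 1010, 1000, 1000, 1010)
)

# Correlation across all rows, ignoring season.
pooled <- cor(toy$DewPoint, toy$Rentals)

# Correlation computed separately inside each season.
within <- sapply(split(toy, toy$Seasons),
                 function(d) cor(d$DewPoint, d$Rentals))

pooled  # strongly positive
within  # zero for both seasons
```

If the real within-season correlations stay clearly positive, dew point carries information beyond season; if they shrink toward zero, season is the more likely driver.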

windbike <- ggplot(df |> filter(Wind.speed..m.s. > 0.5,
                                Wind.speed..m.s. < 5),
                   aes(x = Wind.speed..m.s., 
                       y = Rented.Bike.Count, 
                       color = Seasons)) +
  geom_point(alpha = 0.35) +
  geom_smooth(method = "lm", se = FALSE) +
  facet_wrap(~ Seasons) +
  labs(title = "Wind Speed and Bike Rentals by Season",
       x = "Wind Speed (m/s)",
       y = "Rented Bike Count") +
  theme_minimal()

windbike

windhour <- ggplot(df, aes(x = Hour, 
                           y = Wind.speed..m.s.)) +
  geom_point(alpha = 0.25) +
  geom_smooth(method = "lm", se = FALSE, color = "grey40") +
  labs(title = "Wind Speed by Hour",
       x = "Hour of Day",
       y = "Wind Speed (m/s)") +
  theme_minimal()

bikehour <- ggplot(df, aes(x = Hour, 
                           y = Rented.Bike.Count,
                           color = Seasons)) +
  geom_point(alpha = 0.25) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Bike Rentals by Hour and Season",
       x = "Hour of Day",
       y = "Rented Bike Count") +
  theme_minimal()

library(patchwork)  # provides `+` and `/` for combining ggplot objects
windhour + bikehour

These graphs show that bike rentals rise with wind speed in almost every season. Summer still has the highest rental counts, Autumn and Spring are also fairly high, and Winter stays much lower. The side-by-side graphs suggest one explanation: wind speed rises later in the day, and bike rentals also rise later in the day. So wind by itself may not be causing more rentals. Instead, people rent bikes during the daytime and evening, when wind speeds happen to be a little higher. In the summer, a little more wind might also make it feel cooler outside, which could make people more willing to ride.
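
The idea that hour of day explains the wind pattern can be sketched with two linear models, one with wind alone and one that also includes hour. The toy data below are constructed so that rentals depend only on hour, while wind merely rises with hour. The values and short column names are assumptions for illustration.

```r
# Toy data: rentals are driven entirely by hour; wind also rises with hour.
toy <- data.frame(
  Hour = rep(c(6, 12, 18), each = 2),
  Wind = c(0.5, 1.5, 1.5, 2.5, 2.5, 3.5)
)
toy$Rentals <- 100 + 50 * toy$Hour

# Wind coefficient when hour is ignored.
naive <- coef(lm(Rentals ~ Wind, data = toy))[["Wind"]]

# Wind coefficient after adjusting for hour.
adjusted <- coef(lm(Rentals ~ Wind + Hour, data = toy))[["Wind"]]

naive     # clearly positive
adjusted  # essentially zero
```

On the real data the adjusted coefficient would not vanish so cleanly, but a large drop after adding hour would support the time-of-day explanation.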

p5 <- ggplot(seasonal_stats, aes(x = Seasons, y = Avg_Rentals)) +
  geom_segment(aes(x = Seasons, xend = Seasons, y = 0, yend = Avg_Rentals), color = "black") +
  geom_point(size = 5, color = "orange") +
  labs(title = "Average Rentals by Season") +
  theme_minimal()
p5

This lollipop chart, built with geom_segment and geom_point, displays the average hourly rentals for each season. The highest average is in summer, at just over a thousand rentals, and the lowest is in winter.

outlier_box <- ggplot(df, aes(x = Seasons, y = Rented.Bike.Count, fill = Seasons)) +
  geom_boxplot(outlier.colour = "red", outlier.shape = 16, outlier.size = 2) +
  coord_flip() +
  labs(title = "Boxplot of Rented Bike Count by Season",
       x = "Season",
       y = "Rented Bike Count") +
  theme_minimal()
outlier_box

In this plot we can see that Summer has the highest median rentals (box plots show medians rather than means), as well as the most outliers. Winter has the smallest spread, which means we can take bikes in for repairs with little risk of a sudden increase in demand catching us off guard. Autumn also has some outliers, so extra bikes should be kept ready during that season.
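
By default, geom_boxplot draws as outliers the points that lie more than 1.5 times the interquartile range beyond the box. A minimal sketch of that rule on a toy vector (illustrative values only) looks like this:

```r
# Toy hourly counts with one unusually large value.
x <- c(10, 12, 14, 16, 18, 100)

# Quartiles and interquartile range.
q   <- quantile(x, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]

# Fences at 1.5 * IQR beyond the quartiles.
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers <- x[x < lower | x > upper]
outliers
```

Applying the same steps per season would give a count of outliers that can be compared directly instead of judged by eye.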

hourly_trends <- df_clean |>
  group_by(Hour, Seasons) |>
  summarize(Avg_Rentals = mean(Rented.Bike.Count), .groups = 'drop')

p7 <- ggplot(hourly_trends, aes(x = Hour, y = Avg_Rentals, color = Seasons)) +
  geom_line(linewidth = 1.2) +  # `size` for lines is deprecated in ggplot2 >= 3.4.0
  geom_point() +
  scale_x_continuous(breaks = seq(0, 23, 2)) +
  labs(title = "Hourly Rental Trends by Season",
       x = "Hour of Day (24hr)", y = "Average Bikes Rented") +
  theme_minimal()
ggplotly(p7)

This line chart follows the average number of bike rentals over a 24-hour period to show commuting and usage patterns. The chart contains two peaks, at hours 8 and 18, which correspond to the morning and evening rush hours and appear consistently across all seasons. This could be especially useful for scheduling bike redistribution around commute times in order to maximize efficiency.
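
Peak hours can also be located programmatically rather than read off the chart. The sketch below marks an hour as a peak when its average is strictly higher than both neighboring hours. The 24 averages are made-up values shaped like a commuter profile, not figures from the dataset.

```r
# Toy average rentals for hours 0 to 23 (illustrative values only).
hours <- 0:23
avg <- c(5, 4, 3, 2, 2, 3, 6, 10, 15, 9, 7, 8,
         9, 9, 10, 11, 13, 16, 20, 15, 12, 10, 8, 6)

# An interior hour is a peak if it exceeds both of its neighbors.
n <- length(avg)
interior <- 2:(n - 1)
is_peak <- c(FALSE,
             avg[interior] > avg[interior - 1] & avg[interior] > avg[interior + 1],
             FALSE)

peak_hours <- hours[is_peak]
peak_hours
```

On real hourly averages this simple rule can also flag small bumps, so a minimum height or some smoothing may be needed.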

#plot 8:rainfall density
p8 <- ggplot(df_clean, aes(x = Rented.Bike.Count, fill = Seasons)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~(Rainfall.mm. > 0), 
             labeller = as_labeller(c("FALSE" = "No Rain", "TRUE" = "During Rain"))) +
  labs(title = "Rentals in Rain vs. No Rain",
       x = "Bikes Rented", y = "Density") +
  theme_minimal() + theme(legend.position = "none")

#plot 9:snowfall density
p9 <- ggplot(df_clean, aes(x = Rented.Bike.Count, fill = Seasons)) +
  geom_density(alpha = 0.5) +
  facet_wrap(~(Snowfall..cm. > 0), 
             labeller = as_labeller(c("FALSE" = "No Snow", "TRUE" = "During Snow"))) +
  labs(title = "Rentals in Snow vs. No Snow",
       x = "Bikes Rented", y = "Density") +
  theme_minimal() + theme(legend.position = "none")

#plot 10: seasonal precipitation
p10 <- ggplot(df_clean |> filter(Rainfall.mm. > 0 | Snowfall..cm. > 0), 
              aes(x = Seasons, y = Rented.Bike.Count, color = Seasons)) +
  geom_jitter(alpha = 0.3, width = 0.2) +
  geom_violin(alpha = 0.1, color = "black") +
  labs(title = "Rentals During Active Precipitation",
       x = "Season", y = "Bikes Rented") +
  theme_minimal() + theme(legend.position = "bottom")

(p8 + p9) / p10

The combined visualization shows how precipitation affects the distribution of rentals and how demand during precipitation varies by season. In the upper density plots, rain and snow compress the distribution toward the low end of the rental scale, which supports the idea that poor weather discourages riders. In the bottom panel, the spread of the points and the violin shapes show that even during precipitation, summer still has the highest rentals, while spring and winter are far lower.

Conclusions

  • Overall, the main things that affected bike rentals were season, temperature, hour, and bad weather.
  • Summer had the highest bike rentals and Winter had the lowest, which makes sense because people are more likely to ride bikes when it is warm outside.
  • Temperature and dew point both went up with bike rentals, but that is probably because warmer seasons already have higher temperatures and higher rental counts.
  • Wind speed also looked like it increased bike rentals, but that does not mean wind is the main reason.
  • The hour graphs showed that wind speed and bike rentals both go up later in the day, so time of day is probably part of why that happens.
  • In the summer, some wind might make it feel cooler outside, which could make people more likely to ride bikes.
  • Rain and snow lowered bike rentals a lot because people are less likely to ride in bad weather.
  • The hourly graph showed higher rentals around morning and evening commute times.
  • Bike companies should have more bikes ready during warmer seasons, clear weather, and busy hours.
  • Winter and bad weather days would be better times for repairs or maintenance because demand is lower.

Contact Information

Thank you for visiting our page!

  • gfreem20@students.kennesaw.edu

  • ngruwell@students.kennesaw.edu

  • scyclewa@students.kennesaw.edu

  • jmayo20@students.kennesaw.edu