Bike-sharing platforms are booming and opening in many cities across the world. They have automated the entire process of membership, rental, and return. In 2017, there were 55 systems in place across the US, with 42,000 bikes in the network. Across the world, there are 1000 similar programs in operation. These programs are gaining popularity over time because of growing public awareness of traffic, environmental, and health issues. The picture below depicts the increase in bike-sharing trips in the US alone. Unlike other transport services such as bus or subway, the duration of travel and the departure and arrival locations are not controlled by an organization. These programs are also affected by changing weather conditions and calamities. Currently, many programs are springing up at universities throughout the States, encouraging younger people to develop and follow a healthier lifestyle. This research studies the effect of calendar days and various weather factors on the demand for bicycles.
[Figure: growth in bike-sharing trips in the US]
More and more people prefer to rent bikes because it is a cheap, convenient, healthy, and environmentally friendly method for short trips (Shaheen, Guzman, & Zhang, 2010). This growth in the user base makes the systems more complex to manage. Much current research focuses on forecasting bike demand and rebalancing stations to meet the anticipated demand. Raviv, Tzur, and Forma (2013) tackle the problem of finding truck routes and planning how many bikes to move between stations. Their paper minimizes an objective function tied to both the operating cost of the vehicles and penalty functions relating to station imbalance. Similarly, Rainer-Harbach et al. (2013) take a local search approach to finding a similar solution.
“Just the mismatch between docks and bikes”, “Sometimes these racks are empty; sometimes they’re packed, and that forces me to fend, when I’ve got to ditch a bike or I need to be somewhere fast.”
“And another daily biker says he has trouble finding bikes in some spots, like near the Port Authority, you have to walk up 10 blocks or so.”
The bike share demand data is recorded by the Capital Bikeshare platform, which operates in Washington, DC. The data covers a period of 2 years, with daily weather and general conditions. It has 731 entries with 16 attributes. A brief description of the attributes is provided below:
instant: record index
dteday : date
season : season (1: spring, 2: summer, 3: fall, 4: winter)
yr : year (0: 2011, 1: 2012)
mnth : month (1 to 12)
holiday : whether the day is a holiday or not (extracted from [Web Link])
weekday : day of the week
workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0
weathersit : weather situation
i: Clear, Few clouds, Partly cloudy
ii: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist
iii: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds
iv: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog
temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)
atemp: Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)
hum: Normalized humidity. The values are divided by 100 (max)
windspeed: Normalized wind speed. The values are divided by 67 (max)
casual: count of casual users
registered: count of registered users
cnt: count of total rental bikes including both casual and registered
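The normalized weather columns can be mapped back to physical units by inverting the formulas in the list above. The following is a minimal sketch, assuming the daily file uses the same scaling constants as listed; `denormalize` is a hypothetical helper name, not part of the dataset or of base R.

```r
# Hypothetical helper: invert (t - t_min) / (t_max - t_min)
denormalize <- function(x, t_min, t_max) {
  x * (t_max - t_min) + t_min
}

# A normalized value of 0.5 is used purely as an illustration
temp_c   <- denormalize(0.5, t_min = -8,  t_max = 39)  # 15.5 degrees Celsius
atemp_c  <- denormalize(0.5, t_min = -16, t_max = 50)  # 17 degrees Celsius
hum_pct  <- 0.5 * 100                                   # humidity, undoing the division by 100
wind_raw <- 0.5 * 67                                    # wind speed, undoing the division by 67

print(c(temp_c = temp_c, atemp_c = atemp_c, hum_pct = hum_pct, wind_raw = wind_raw))
```

Working in original units can make regression coefficients easier to read, since a coefficient on normalized windspeed describes the change in count over the full 0 to 67 range rather than per unit of wind speed.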
From the above list of attributes, we remove the first two (instant and dteday), as they are not required for our analysis. The data is provided by the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
The model we develop here is a linear regression model of the form:
\[\text{Count} = \alpha_0 + \alpha_1(i) \cdot \text{AttributesRegardingDay} + \alpha_2(j) \cdot \text{AttributesRegardingWeather} + \epsilon\]
bike <- read.csv("day.csv")
model <- lm(bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$weekday + bike$weathersit + bike$windspeed)
summary(model)
##
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$weekday +
## bike$weathersit + bike$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5014.4 -915.3 134.9 923.4 3362.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3511.98 225.75 15.557 < 2e-16 ***
## bike$season 916.71 74.59 12.290 < 2e-16 ***
## bike$yr 2141.37 91.58 23.383 < 2e-16 ***
## bike$mnth -97.04 23.91 -4.059 5.47e-05 ***
## bike$weekday 81.21 22.84 3.555 0.000403 ***
## bike$weathersit -961.08 84.33 -11.397 < 2e-16 ***
## bike$windspeed -3349.88 607.68 -5.513 4.92e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1236 on 724 degrees of freedom
## Multiple R-squared: 0.596, Adjusted R-squared: 0.5926
## F-statistic: 178 on 6 and 724 DF, p-value: < 2.2e-16
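In the fit above, coded attributes such as season and weathersit enter as plain numbers, so each receives a single slope. The indexed coefficients in the model equation suggest one coefficient per level, which in R is usually obtained with `factor()`. The sketch below shows that variant on made-up stand-in data; `bike_toy` and its simulated counts are assumptions for illustration only and do not reproduce the reported results.

```r
# Made-up stand-in data with column names borrowed from day.csv.
# To use the real records instead: bike_toy <- read.csv("day.csv")
set.seed(1)
n <- 200
bike_toy <- data.frame(
  season     = sample(1:4, n, replace = TRUE),
  yr         = sample(0:1, n, replace = TRUE),
  weathersit = sample(1:3, n, replace = TRUE),
  windspeed  = runif(n, 0, 0.5)
)
bike_toy$cnt <- round(3000 + 500 * (bike_toy$season == 3) + 2000 * bike_toy$yr -
  800 * bike_toy$weathersit - 3000 * bike_toy$windspeed + rnorm(n, sd = 500))

# factor() makes lm() estimate one coefficient per level (relative to the
# first level) instead of a single slope across the numeric codes.
model_factor <- lm(cnt ~ factor(season) + yr + factor(weathersit) + windspeed,
                   data = bike_toy)
print(summary(model_factor))
```

With this encoding, the summary lists separate rows such as `factor(season)2`, `factor(season)3`, and `factor(season)4`, each read as a difference from season 1.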
Our analysis yields a model that accounts for 59.60% of the variation in Count. Using this model, we can predict the daily usage of shared bikes. The model accounts for both calendar and weather data.
In the end, we are able to test associations between the variables in the dataset and present a model that can help a manager or decision maker better plan for the day ahead. Using this data, the supply of shared bikes can be better balanced against demand.
Of the 13 variables under consideration, season, yr, mnth, weekday, weathersit, and windspeed are associated with the number of bikes used every day. The model above helps forecast the daily use of shared bikes.
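To make the forecasting step concrete, the sketch below computes R-squared by hand and predicts the count for a single day. It assumes a model fitted with bare column names and a `data` argument, because `predict()` generally cannot apply `newdata` to a formula written with `bike$...` terms. The data frame `bike_toy`, its simulated counts, and the example day are made up for illustration.

```r
# Made-up stand-in data with column names borrowed from day.csv.
# To use the real records instead: bike_toy <- read.csv("day.csv")
set.seed(2)
n <- 200
bike_toy <- data.frame(
  yr         = rep(0:1, each = n / 2),
  weathersit = sample(1:3, n, replace = TRUE),
  windspeed  = runif(n, 0, 0.5)
)
bike_toy$cnt <- 3500 + 2000 * bike_toy$yr - 900 * bike_toy$weathersit -
  3000 * bike_toy$windspeed + rnorm(n, sd = 600)

# Fit with bare column names and data =, so that newdata is honored later
fit <- lm(cnt ~ yr + weathersit + windspeed, data = bike_toy)

# R-squared by hand: 1 - residual sum of squares / total sum of squares
rss <- sum(residuals(fit)^2)
tss <- sum((bike_toy$cnt - mean(bike_toy$cnt))^2)
r2  <- 1 - rss / tss

# Forecast for one hypothetical day, with a 95% prediction interval
new_day  <- data.frame(yr = 1, weathersit = 2, windspeed = 0.2)
forecast <- predict(fit, newdata = new_day, interval = "prediction")
print(r2)
print(forecast)
```

The prediction interval, rather than the point estimate alone, is probably the more useful quantity for planning, since it indicates how far actual demand may fall from the forecast.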
https://www.curbed.com/2017/3/21/15006248/bike-share-ridership-transit-safety
Raviv, T., Tzur, M., & Forma, I. (2013). Static repositioning in a bike-sharing system: models and solution approaches. EURO Journal on Transportation and Logistics, 2(3), 187-229
Rainer-Harbach, M., Papazek, P., Hu, B., & Raidl, G. R. (2013). Balancing bicycle sharing systems: A variable neighborhood search approach. In M. Middendorf & C. Blum (Eds.), EvoCOP, volume 7832 of Lecture Notes in Computer Science, 121-132. Springer
Shaheen, S., Guzman, S., & Zhang, H. (2010). Bikesharing in Europe, the Americas, and Asia: past, present, and future. Transportation Research Record: Journal of the Transportation Research Board, (2143), 159-167
http://abc7ny.com/traffic/new-york-city-bike-share-audit-reveals-problems/433473/
http://newyork.cbslocal.com/2015/08/24/citi-bike-customers-seeing-red-over-broken-bikes/
t.test(bike$cnt ~ bike$holiday)
##
## Welch Two Sample t-test
##
## data: bike$cnt by bike$holiday
## t = 1.7047, df = 21.007, p-value = 0.103
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -174.1953 1758.4038
## sample estimates:
## mean in group 0 mean in group 1
## 4527.104 3735.000
t.test(bike$cnt,bike$season)
##
## Welch Two Sample t-test
##
## data: bike$cnt and bike$season
## t = 62.831, df = 730, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 4361.187 4642.518
## sample estimates:
## mean of x mean of y
## 4504.34884 2.49658
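The two-vector call above compares the mean of cnt with the mean of the numeric season codes (4504.35 against 2.50), so it does not show whether rentals differ between seasons. One way to test that question is a one-way ANOVA with season as a factor. The sketch below runs on made-up stand-in data; `bike_toy` and its simulated counts are assumptions for illustration only.

```r
# Made-up stand-in data with column names borrowed from day.csv.
# To use the real records instead: bike_toy <- read.csv("day.csv")
set.seed(3)
bike_toy <- data.frame(season = rep(1:4, each = 50))
bike_toy$cnt <- round(3000 + 600 * bike_toy$season + rnorm(200, sd = 800))

# One-way ANOVA: does mean cnt differ across the four seasons?
season_aov <- aov(cnt ~ factor(season), data = bike_toy)
print(summary(season_aov))

# Rank-based alternative that does not assume normal residuals
season_kw <- kruskal.test(cnt ~ factor(season), data = bike_toy)
print(season_kw)
```

The formula interface (`cnt ~ factor(season)`) is what tells R to group the counts by season, in the same way that `bike$cnt ~ bike$holiday` groups them by holiday in the earlier t-test.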
model3 <- lm(bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$holiday + bike$weekday + bike$workingday + bike$weathersit + bike$windspeed + bike$hum)
summary(model3)
##
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$holiday +
## bike$weekday + bike$workingday + bike$weathersit + bike$windspeed +
## bike$hum)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4916.0 -917.1 112.6 921.6 3501.7
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3147.19 313.02 10.054 < 2e-16 ***
## bike$season 899.44 74.19 12.123 < 2e-16 ***
## bike$yr 2155.91 91.48 23.566 < 2e-16 ***
## bike$mnth -94.77 23.84 -3.975 7.74e-05 ***
## bike$holiday -593.91 282.54 -2.102 0.0359 *
## bike$weekday 77.07 22.89 3.367 0.0008 ***
## bike$workingday 225.15 101.12 2.226 0.0263 *
## bike$weathersit -1066.68 107.44 -9.928 < 2e-16 ***
## bike$windspeed -3062.71 632.35 -4.843 1.56e-06 ***
## bike$hum 563.64 432.63 1.303 0.1930
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1227 on 721 degrees of freedom
## Multiple R-squared: 0.6037, Adjusted R-squared: 0.5988
## F-statistic: 122 on 9 and 721 DF, p-value: < 2.2e-16
model4 <- lm(bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$holiday + bike$weekday + bike$workingday + bike$weathersit + bike$windspeed)
summary(model4)
##
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$holiday +
## bike$weekday + bike$workingday + bike$weathersit + bike$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4861.3 -903.0 95.9 933.5 3528.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3417.80 234.29 14.588 < 2e-16 ***
## bike$season 903.34 74.17 12.180 < 2e-16 ***
## bike$yr 2142.34 90.93 23.560 < 2e-16 ***
## bike$mnth -92.38 23.78 -3.885 0.000112 ***
## bike$holiday -596.58 282.67 -2.111 0.035153 *
## bike$weekday 74.33 22.80 3.259 0.001169 **
## bike$workingday 222.44 101.15 2.199 0.028185 *
## bike$weathersit -979.20 83.91 -11.669 < 2e-16 ***
## bike$windspeed -3309.88 603.52 -5.484 5.74e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1228 on 722 degrees of freedom
## Multiple R-squared: 0.6028, Adjusted R-squared: 0.5984
## F-statistic: 137 on 8 and 722 DF, p-value: < 2.2e-16
This model includes all the variables significantly related to count at the 0.05 level.
model5 <- lm(bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$weekday + bike$weathersit + bike$windspeed)
summary(model5)
##
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$weekday +
## bike$weathersit + bike$windspeed)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5014.4 -915.3 134.9 923.4 3362.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3511.98 225.75 15.557 < 2e-16 ***
## bike$season 916.71 74.59 12.290 < 2e-16 ***
## bike$yr 2141.37 91.58 23.383 < 2e-16 ***
## bike$mnth -97.04 23.91 -4.059 5.47e-05 ***
## bike$weekday 81.21 22.84 3.555 0.000403 ***
## bike$weathersit -961.08 84.33 -11.397 < 2e-16 ***
## bike$windspeed -3349.88 607.68 -5.513 4.92e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1236 on 724 degrees of freedom
## Multiple R-squared: 0.596, Adjusted R-squared: 0.5926
## F-statistic: 178 on 6 and 724 DF, p-value: < 2.2e-16
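This model includes only the variables most strongly related to count (p < 0.001). It explains slightly less of the variation in Count (R-squared of 0.596, compared with 0.6028 for the model that also includes holiday and workingday), but is simpler as it contains fewer variables.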
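The trade-off between the smaller and larger models can also be checked formally, since one is nested in the other. The sketch below uses a partial F-test via `anova()` and a comparison of AIC values. It runs on made-up stand-in data; `bike_toy`, `small`, and `large` are hypothetical names, and the simulated counts do not reproduce the reported results.

```r
# Made-up stand-in data with column names borrowed from day.csv.
# To use the real records instead: bike_toy <- read.csv("day.csv")
set.seed(4)
n <- 300
bike_toy <- data.frame(
  yr         = rep(0:1, each = n / 2),
  weathersit = sample(1:3, n, replace = TRUE),
  windspeed  = runif(n, 0, 0.5),
  holiday    = rep(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 1), length.out = n),
  workingday = rep(c(1, 1, 1, 1, 1, 0, 0), length.out = n)
)
bike_toy$cnt <- 3500 + 2000 * bike_toy$yr - 900 * bike_toy$weathersit -
  3000 * bike_toy$windspeed + 200 * bike_toy$workingday + rnorm(n, sd = 600)

# Smaller model, then the same model extended with holiday and workingday
small <- lm(cnt ~ yr + weathersit + windspeed, data = bike_toy)
large <- update(small, . ~ . + holiday + workingday)

# Partial F-test: do the two extra terms reduce residual variance significantly?
comparison <- anova(small, large)
print(comparison)

# AIC penalizes extra parameters; a lower value is preferred
print(AIC(small, large))
```

If the F-test is not significant and the AIC values are close, choosing the simpler model on grounds of parsimony is easier to defend.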