Email: anshul.nograiya@gmail.com

COLLEGE: Penn State University

1. Introduction

Bike-sharing platforms are booming and opening in many cities across the world. They have automated the complete process of membership, rental, and return. In 2017, there were 55 systems in place across the US, with 42,000 bikes in the network. Worldwide, there are 1,000 similar programs in operation. These programs are gaining popularity over time because of increasing public awareness of traffic, environmental, and health issues. The picture below depicts the growth in bike-sharing trips in the US alone. Unlike other transport services such as buses or subways, the duration of travel and the departure and arrival locations are not controlled by an organization. These programs are also affected by changing weather conditions and calamities. Currently, many programs are springing up at universities throughout the States, encouraging younger people to develop and follow a healthier lifestyle. This research studies the effect of calendar days and various weather factors on the demand for bicycles.

image

image

2. Overview of the Study

More and more people prefer to rent bikes because it is a cheap, convenient, healthy, and environmentally friendly method for short trips (Shaheen, Guzman, & Zhang, 2010). This growth in the user base makes the systems more complex to manage. Much current research focuses on forecasting bike demand and rebalancing stations to meet the anticipated demand. Raviv, Tzur, and Forma (2013) tackle the problem of finding truck routes and planning how many bikes to move between stations. Their paper minimizes an objective function tied to both the operating cost of the vehicles and penalty functions relating to station imbalance. Similarly, Rainer-Harbach et al. (2013) take a local search approach to finding a similar solution.

image

image

CBS New York: (August 24, 2015)

“Just the mismatch between docks and bikes”, “Sometimes these racks are empty; sometimes they’re packed, and that forces me to fend, when I’ve got to ditch a bike or I need to be somewhere fast.”

Abc7NY: (December 12, 2014)

“And another daily biker says he has trouble finding bikes in some spots, like near the Port Authority, you have to walk up 10 blocks or so.”

3. Data Description

The bike share demand data was recorded by the Capital Bikeshare platform, which operates in Washington, D.C. The data covers a period of two years, with daily weather and general conditions. It has 731 entries with 16 attributes. A brief description of the attributes is provided below:

  1. instant: record index

  2. dteday : date

  3. season : season (1: spring, 2: summer, 3: fall, 4: winter)

  4. yr : year (0: 2011, 1:2012)

  5. mnth : month ( 1 to 12)

  6. holiday : whether the day is a holiday or not (extracted from [Web Link])

  7. weekday : day of the week

  8. workingday : 1 if the day is neither a weekend nor a holiday, otherwise 0

  9. weathersit :

    i: Clear, Few clouds, Partly cloudy

    ii: Mist + Cloudy, Mist + Broken clouds, Mist + Few clouds, Mist

    iii: Light Snow, Light Rain + Thunderstorm + Scattered clouds, Light Rain + Scattered clouds

    iv: Heavy Rain + Ice Pellets + Thunderstorm + Mist, Snow + Fog

  10. temp : Normalized temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-8, t_max=+39 (only in hourly scale)

  11. atemp : Normalized feeling temperature in Celsius. The values are derived via (t-t_min)/(t_max-t_min), t_min=-16, t_max=+50 (only in hourly scale)

  12. hum : Normalized humidity. The values are divided by 100 (max)

  13. windspeed : Normalized wind speed. The values are divided by 67 (max)

  14. casual : count of casual users

  15. registered : count of registered users

  16. cnt : count of total rental bikes, including both casual and registered

From the above list of attributes, we remove the first two, as they are not required for our analysis. The data is provided by the UCI Machine Learning Repository: https://archive.ics.uci.edu/ml/datasets/bike+sharing+dataset
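As a minimal sketch of this preprocessing step, the snippet below drops `instant` and `dteday` and converts the normalized `temp` back to degrees Celsius by inverting the formula given in the attribute list. The three-row data frame is made up for illustration and is not taken from day.csv; the names `bike_demo` and `denormalize` are hypothetical.

```r
# Toy rows mimicking the layout of day.csv (values are illustrative only).
bike_demo <- data.frame(
  instant = 1:3,
  dteday  = c("2011-01-01", "2011-01-02", "2011-01-03"),
  temp    = c(0, 0.5, 1),
  cnt     = c(100, 200, 300)
)

# Drop the record index and the date, which the analysis does not use.
bike_demo <- bike_demo[, !(names(bike_demo) %in% c("instant", "dteday"))]

# Invert (t - t_min) / (t_max - t_min) to recover t in degrees Celsius.
denormalize <- function(x, t_min, t_max) {
  x * (t_max - t_min) + t_min
}

# t_min = -8 and t_max = 39 are the bounds listed for `temp`.
bike_demo$temp_celsius <- denormalize(bike_demo$temp, t_min = -8, t_max = 39)
print(bike_demo)
```

The same helper would apply to `atemp` with t_min = -16 and t_max = 50.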

4. Model Analysis

The model we develop here is a linear regression model of the form:

\[\text{Count} = \alpha_0 + \sum_i \alpha_{1,i}\,\text{DayAttribute}_i + \sum_j \alpha_{2,j}\,\text{WeatherAttribute}_j + \epsilon\]

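# Load the daily Capital Bikeshare data
bike <- read.csv("day.csv")
# Regress the daily count on the calendar and weather attributes
model <- lm(bike$cnt~bike$season+bike$yr+bike$mnth+bike$weekday+bike$weathersit+bike$windspeed)
# Print coefficients, significance levels, and R-squared
summary(model)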
## 
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$weekday + 
##     bike$weathersit + bike$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5014.4  -915.3   134.9   923.4  3362.8 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3511.98     225.75  15.557  < 2e-16 ***
## bike$season       916.71      74.59  12.290  < 2e-16 ***
## bike$yr          2141.37      91.58  23.383  < 2e-16 ***
## bike$mnth         -97.04      23.91  -4.059 5.47e-05 ***
## bike$weekday       81.21      22.84   3.555 0.000403 ***
## bike$weathersit  -961.08      84.33 -11.397  < 2e-16 ***
## bike$windspeed  -3349.88     607.68  -5.513 4.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1236 on 724 degrees of freedom
## Multiple R-squared:  0.596,  Adjusted R-squared:  0.5926 
## F-statistic:   178 on 6 and 724 DF,  p-value: < 2.2e-16

From the analysis we have conducted, we are able to develop a model that accounts for 59.60% of the variation in Count (multiple R-squared of 0.596). Using the above model, we can predict the daily usage of shared bikes. The model accounts for both the calendar and the weather data.
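To make the prediction step concrete, the sketch below plugs one hypothetical day into the fitted equation, using the coefficient estimates printed in the summary above. The input values for the day are assumptions chosen only for illustration.

```r
# Coefficients copied from the summary(model) output above.
coefs <- c(
  intercept  = 3511.98,
  season     = 916.71,
  yr         = 2141.37,
  mnth       = -97.04,
  weekday    = 81.21,
  weathersit = -961.08,
  windspeed  = -3349.88
)

# Hypothetical day: season code 3 (fall), yr = 1 (2012), month 7,
# weekday code 3, weathersit 1 (clear), normalized wind speed 0.2.
day <- c(intercept = 1, season = 3, yr = 1, mnth = 7,
         weekday = 3, weathersit = 1, windspeed = 0.2)

# Linear predictor: intercept plus the sum of coefficient * value.
predicted_cnt <- sum(coefs * day[names(coefs)])
print(round(predicted_cnt))  # about 6337 rentals
```

Because the model was fitted with `bike$`-prefixed terms, `predict()` would likely ignore a `newdata` argument. Refitting with a formula such as `cnt ~ season + ...` and `data = bike` would allow `predict(model, newdata)` to be used instead of the manual sum.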

5. Results

In the end, we are able to test the association between variables present in the dataset and present a model that can help a manager or decision maker better plan for the day ahead. Using this model, the supply of shared bikes can be better balanced against demand.

6. Conclusion

Of the 13 variables under consideration, season, yr, mnth, weekday, weathersit, and windspeed are associated with the number of bikes used every day. The model above helps in forecasting the daily use of shared bikes.

7. References

https://www.curbed.com/2017/3/21/15006248/bike-share-ridership-transit-safety

Raviv, T.; Tzur, M.; and Forma, I. 2013. Static repositioning in a bike-sharing system: models and solution approaches. EURO Journal on Transportation and Logistics 2(3):187-229

Rainer-Harbach, M.; Papazek, P.; Hu, B.; and Raidl, G. R. 2013. Balancing bicycle sharing systems: A variable neighborhood search approach. In Middendorf, M., and Blum, C., eds., EvoCOP, volume 7832 of Lecture Notes in Computer Science, 121-132. Springer

Shaheen, S., Guzman, S., & Zhang, H. (2010). Bikesharing in Europe, the Americas, and Asia: past, present, and future. Transportation Research Record: Journal of the Transportation Research Board, (2143), 159-167

http://abc7ny.com/traffic/new-york-city-bike-share-audit-reveals-problems/433473/

http://newyork.cbslocal.com/2015/08/24/citi-bike-customers-seeing-red-over-broken-bikes/

8. R code for Statistical tests

T-test for Count and Holiday

t.test(bike$cnt~bike$holiday)
## 
##  Welch Two Sample t-test
## 
## data:  bike$cnt by bike$holiday
## t = 1.7047, df = 21.007, p-value = 0.103
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -174.1953 1758.4038
## sample estimates:
## mean in group 0 mean in group 1 
##        4527.104        3735.000

T-test for Count and Seasons

t.test(bike$cnt,bike$season)
## 
##  Welch Two Sample t-test
## 
## data:  bike$cnt and bike$season
## t = 62.831, df = 730, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  4361.187 4642.518
## sample estimates:
##  mean of x  mean of y 
## 4504.34884    2.49658
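
Note that the two-argument call above compares the mean of cnt with the mean of the season codes (4504.35 versus 2.50) rather than comparing counts across seasons. A comparison across four season groups is more commonly done with a one-way ANOVA. The sketch below shows the shape of such a test on synthetic data generated for illustration; the group means and spread are made up and are not the study data.

```r
set.seed(1)  # reproducible synthetic data, not the study data

# Fake daily counts for four season codes, 30 days each.
demo <- data.frame(
  season = factor(rep(1:4, each = 30)),
  cnt    = c(rnorm(30, 2500, 800), rnorm(30, 5000, 800),
             rnorm(30, 5500, 800), rnorm(30, 4500, 800))
)

# One-way ANOVA: does mean cnt differ across the season groups?
fit <- aov(cnt ~ season, data = demo)
anova_table <- summary(fit)[[1]]
print(anova_table)
```

On the real data, the analogous call would presumably be `aov(cnt ~ factor(season), data = bike)`, with `factor()` ensuring the season codes are treated as groups rather than as a number.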

Linear model including all the variables

model3 <-lm(bike$cnt~bike$season+bike$yr+bike$mnth+bike$holiday+bike$weekday+bike$workingday+bike$weathersit+bike$windspeed+bike$hum)
summary(model3)
## 
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$holiday + 
##     bike$weekday + bike$workingday + bike$weathersit + bike$windspeed + 
##     bike$hum)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4916.0  -917.1   112.6   921.6  3501.7 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3147.19     313.02  10.054  < 2e-16 ***
## bike$season       899.44      74.19  12.123  < 2e-16 ***
## bike$yr          2155.91      91.48  23.566  < 2e-16 ***
## bike$mnth         -94.77      23.84  -3.975 7.74e-05 ***
## bike$holiday     -593.91     282.54  -2.102   0.0359 *  
## bike$weekday       77.07      22.89   3.367   0.0008 ***
## bike$workingday   225.15     101.12   2.226   0.0263 *  
## bike$weathersit -1066.68     107.44  -9.928  < 2e-16 ***
## bike$windspeed  -3062.71     632.35  -4.843 1.56e-06 ***
## bike$hum          563.64     432.63   1.303   0.1930    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1227 on 721 degrees of freedom
## Multiple R-squared:  0.6037, Adjusted R-squared:  0.5988 
## F-statistic:   122 on 9 and 721 DF,  p-value: < 2.2e-16

Linear model including well associated variables

model4 <-lm(bike$cnt~bike$season+bike$yr+bike$mnth+bike$holiday+bike$weekday+bike$workingday+bike$weathersit+bike$windspeed)
summary(model4)
## 
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$holiday + 
##     bike$weekday + bike$workingday + bike$weathersit + bike$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4861.3  -903.0    95.9   933.5  3528.5 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3417.80     234.29  14.588  < 2e-16 ***
## bike$season       903.34      74.17  12.180  < 2e-16 ***
## bike$yr          2142.34      90.93  23.560  < 2e-16 ***
## bike$mnth         -92.38      23.78  -3.885 0.000112 ***
## bike$holiday     -596.58     282.67  -2.111 0.035153 *  
## bike$weekday       74.33      22.80   3.259 0.001169 ** 
## bike$workingday   222.44     101.15   2.199 0.028185 *  
## bike$weathersit  -979.20      83.91 -11.669  < 2e-16 ***
## bike$windspeed  -3309.88     603.52  -5.484 5.74e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1228 on 722 degrees of freedom
## Multiple R-squared:  0.6028, Adjusted R-squared:  0.5984 
## F-statistic:   137 on 8 and 722 DF,  p-value: < 2.2e-16

This model includes all the variables significantly related to Count.

Linear model including highly associated variables

model5 <-lm(bike$cnt~bike$season+bike$yr+bike$mnth+bike$weekday+bike$weathersit+bike$windspeed)
summary(model5)
## 
## Call:
## lm(formula = bike$cnt ~ bike$season + bike$yr + bike$mnth + bike$weekday + 
##     bike$weathersit + bike$windspeed)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5014.4  -915.3   134.9   923.4  3362.8 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      3511.98     225.75  15.557  < 2e-16 ***
## bike$season       916.71      74.59  12.290  < 2e-16 ***
## bike$yr          2141.37      91.58  23.383  < 2e-16 ***
## bike$mnth         -97.04      23.91  -4.059 5.47e-05 ***
## bike$weekday       81.21      22.84   3.555 0.000403 ***
## bike$weathersit  -961.08      84.33 -11.397  < 2e-16 ***
## bike$windspeed  -3349.88     607.68  -5.513 4.92e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1236 on 724 degrees of freedom
## Multiple R-squared:  0.596,  Adjusted R-squared:  0.5926 
## F-statistic:   178 on 6 and 724 DF,  p-value: < 2.2e-16

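This model includes all the variables highly related to Count. It explains slightly less of the variation in Count than the previous model (multiple R-squared of 0.596 versus 0.6028, a difference of about 0.7 percentage points), but is simpler as it contains fewer variables.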
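
The trade-off between the six-variable model and the eight-variable model can be checked with a partial F-test. The sketch below computes it by hand from the rounded R-squared values and residual degrees of freedom printed in the two summaries, so the result is approximate. With the fitted objects available, `anova(model5, model4)` would perform the same comparison directly.

```r
# Values read off the summary() outputs above (rounded as printed).
r2_reduced <- 0.596    # model5: six predictors
r2_full    <- 0.6028   # model4: eight predictors
df_full    <- 722      # residual degrees of freedom of model4
q          <- 2        # predictors dropped: holiday and workingday

# Partial F statistic for the q dropped predictors.
f_stat <- ((r2_full - r2_reduced) / q) / ((1 - r2_full) / df_full)

# Upper-tail probability under the F(q, df_full) distribution.
p_value <- pf(f_stat, df1 = q, df2 = df_full, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```

A small p-value here would suggest that holiday and workingday jointly carry some signal, in which case preferring the six-variable model is a choice in favor of simplicity rather than one that costs nothing.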