GLM - Bikeshare

Sections

  • Packages Required
  • Questions

Data

Base regression model

Total riders by temperature

Fit of total riders as a function of temperature using a third-degree polynomial.

                              Estimate   Std. Error   t value   Pr(>|t|)    Signif.
(Intercept)                   519        775.3        0.6694    0.5035
poly(temp, 3, raw = TRUE)1    63.14      134.5        0.4693    0.639
poly(temp, 3, raw = TRUE)2    16.63      7.217        2.305     0.02146     *
poly(temp, 3, raw = TRUE)3    -0.4324    0.1208       -3.58     0.0003663   ***

Fitting linear model: total_riders ~ poly(temp, 3, raw = TRUE)

Observations   Residual Std. Error   \(R^2\)   Adjusted \(R^2\)
731            1423                  0.4627    0.4604
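
A minimal sketch of how a fit like this can be produced in R. The data frame `bikes` and its simulated values are hypothetical stand-ins, since the real data set is not shown in this report; only the model formula is taken from the output above.

```r
# Hypothetical stand-in data; the real bikeshare data set is not reproduced here.
set.seed(1)
n <- 731
bikes <- data.frame(temp = runif(n, min = 0, max = 35))
bikes$total_riders <- 500 + 60 * bikes$temp + 17 * bikes$temp^2 -
  0.43 * bikes$temp^3 + rnorm(n, sd = 1400)

# raw = TRUE fits plain powers temp, temp^2, temp^3 instead of orthogonal
# polynomials, so each coefficient multiplies the corresponding power of temp.
base_model <- lm(total_riders ~ poly(temp, 3, raw = TRUE), data = bikes)
summary(base_model)
```

With 731 observations and four coefficients, such a fit has 727 residual degrees of freedom, which agrees with the ANOVA table later in this report.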

Residuals

1. Best regression model possible

We tried nine different models, incrementally adding independent variables; some additions improved the fit and others did not. The model below is the best fit.

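\[ \text{total\_riders} = b_0 + b_1\,\text{temp} + b_2\,\text{temp}^2 + b_3\,\text{temp}^3 + b_4\,\text{promotion} + b_5\,\text{mnth} + b_6\,\text{weathersit} + b_7\,\text{humidity} + b_8\,\text{windspeed} \]

Here promotion, mnth and weathersit enter as factors, so \(b_4\), \(b_5\) and \(b_6\) each stand for one coefficient per non-reference level.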

Values

                              Estimate   Std. Error   t value   Pr(>|t|)     Signif.
(Intercept)                   3656       476.9        7.665     5.895e-14    ***
poly(temp, 3, raw = TRUE)1    -290.6     83.35        -3.487    0.0005192    ***
poly(temp, 3, raw = TRUE)2    32.6       4.519        7.215     1.39e-12     ***
poly(temp, 3, raw = TRUE)3    -0.6887    0.07543      -9.13     6.967e-19    ***
as.factor(Promotion)1         1962       56.05        35        8.011e-157   ***
as.factor(mnth)2              43.41      143.1        0.3033    0.7618
as.factor(mnth)3              510.1      155.6        3.278     0.001095     **
as.factor(mnth)4              708        172.7        4.099     4.623e-05    ***
as.factor(mnth)5              885.3      194.6        4.55      6.303e-06    ***
as.factor(mnth)6              1042       219.2        4.754     2.419e-06    ***
as.factor(mnth)7              1255       242          5.186     2.807e-07    ***
as.factor(mnth)8              1054       224          4.704     3.059e-06    ***
as.factor(mnth)9              1302       201          6.476     1.752e-10    ***
as.factor(mnth)10             1409       173.3        8.128     1.93e-15     ***
as.factor(mnth)11             1149       157.1        7.314     7.014e-13    ***
as.factor(mnth)12             812.9      147.5        5.512     4.961e-08    ***
as.factor(weathersit)2        -398.7     73.44        -5.429    7.801e-08    ***
as.factor(weathersit)3        -1836      186.8        -9.825    1.89e-21     ***
humidity                      -21.66     2.801        -7.731    3.66e-14     ***
windspeed                     -55.11     5.786        -9.525    2.537e-20    ***

Fitting linear model: total_riders ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) + as.factor(mnth) + as.factor(weathersit) + humidity + windspeed

Observations   Residual Std. Error   \(R^2\)   Adjusted \(R^2\)
731            737.1                 0.859     0.8552
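
The following sketch shows how a model with this formula can be fitted in R. The data frame `bikes` is simulated and purely hypothetical (column names follow the formula above, and the values are made up), so the estimates it produces will not match the table.

```r
# Hypothetical simulated data with the columns used in the model formula.
set.seed(2)
n <- 731
bikes <- data.frame(
  temp       = runif(n, 0, 35),
  Promotion  = rbinom(n, 1, 0.5),
  mnth       = sample(1:12, n, replace = TRUE),
  weathersit = sample(1:3, n, replace = TRUE, prob = c(0.6, 0.35, 0.05)),
  humidity   = runif(n, 20, 100),
  windspeed  = runif(n, 0, 30)
)
bikes$total_riders <- 3000 + 100 * bikes$temp + 2000 * bikes$Promotion -
  400 * (bikes$weathersit == 2) - 1800 * (bikes$weathersit == 3) -
  20 * bikes$humidity - 55 * bikes$windspeed + rnorm(n, sd = 700)

# as.factor() turns the numeric codes into dummy variables, with the first
# level (Promotion 0, mnth 1, weathersit 1) as the reference category.
best_model <- lm(
  total_riders ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) +
    as.factor(mnth) + as.factor(weathersit) + humidity + windspeed,
  data = bikes
)
summary(best_model)
```

The formula yields 20 coefficients (intercept, 3 temperature terms, 1 promotion, 11 months, 2 weather levels, humidity, windspeed), leaving 711 residual degrees of freedom for 731 observations.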

Plot

2. Problems with assumptions

  • Season and month are multicollinear (83%); the model above includes month but not season.
  • The residuals vs fitted plot shows that the model overestimates at low and high fitted values.
  • We compared the two models with an ANOVA F-test, and this model fits significantly better than the base model (F = 124.88, p < 2.2e-16).

Analysis of Variance Table

Model 1: total_riders ~ poly(temp, 3, raw = TRUE)
Model 2: total_riders ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) + 
    as.factor(mnth) + as.factor(weathersit) + humidity + windspeed
  Res.Df        RSS Df  Sum of Sq      F    Pr(>F)    
1    727 1472082143                                   
2    711  386341671 16 1085740472 124.88 < 2.2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
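
As a check on the table, the F statistic can be recomputed by hand from the two residual sums of squares. The sketch below uses only numbers copied from the output above; the variable names are illustrative.

```r
# Values copied from the Analysis of Variance Table above.
rss_base <- 1472082143  # residual sum of squares, Model 1
rss_best <- 386341671   # residual sum of squares, Model 2
df_extra <- 16          # additional coefficients in Model 2 (727 - 711)
df_resid <- 711         # residual degrees of freedom, Model 2

# F = (drop in RSS per added coefficient) / (residual variance of Model 2)
f_stat <- ((rss_base - rss_best) / df_extra) / (rss_best / df_resid)
round(f_stat, 2)  # 124.88, matching the table

# Upper-tail probability of the F distribution with (16, 711) degrees of freedom
p_value <- pf(f_stat, df_extra, df_resid, lower.tail = FALSE)
p_value
```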

3. Month with highest number

  • Which month has the highest number of riders, holding everything else constant?

Month = 10

  • What if this month became unseasonably cold and rainy?

  • Coefficients: month 10 = 1409.04 and weathersit 3 = -1835.55 (light snow or light rain)
  • The bad weather does not change the month coefficient, but its effect outweighs it: the prediction ends up 426.51 riders below the baseline of month 1 with weathersit 1, holding the other variables constant.
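
The 426.51 figure is the sum of the two coefficients. As a worked check, with \(\Delta\) (a symbol introduced here) denoting the shift in predicted riders relative to the reference levels month 1 and weathersit 1:

\[
\Delta = b_{\text{mnth}=10} + b_{\text{weathersit}=3} = 1409.04 + (-1835.55) = -426.51
\]

Compared with a clear day in month 10, the drop is the full 1835.55 riders. An unseasonably cold day would also shift the prediction through the temperature polynomial, which this sum leaves out.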

4. Promotion

  • The promotion seems to increase the total number of riders.
  • The coefficient of 1962 means that a promotion day brings about 1962 more riders than a day with no promotion, holding the other variables constant.

Coefficient                   Estimate
(Intercept)                   3656
poly(temp, 3, raw = TRUE)1    -290.6
poly(temp, 3, raw = TRUE)2    32.6
poly(temp, 3, raw = TRUE)3    -0.6887
as.factor(Promotion)1         1962
as.factor(mnth)2              43.41
as.factor(mnth)3              510.1
as.factor(mnth)4              708
as.factor(mnth)5              885.3
as.factor(mnth)6              1042
as.factor(mnth)7              1255
as.factor(mnth)8              1054
as.factor(mnth)9              1302
as.factor(mnth)10             1409
as.factor(mnth)11             1149
as.factor(mnth)12             812.9
as.factor(weathersit)2        -398.7
as.factor(weathersit)3        -1836
humidity                      -21.66
windspeed                     -55.11

5. Casual vs Registered riders

Casual - Model

Using the best model to predict number of casual riders

  • The effect of promotion on casual riders is an increase of about 287 riders (coefficient 286.57).
  • An R2 of 0.461 (adjusted 0.446) indicates that the model is not a good fit for casual riders.

Dependent variable: casual

Predictors                    Estimates   CI                      p
(Intercept)                   937.85      290.04 – 1585.65        0.005
poly(temp, 3, raw = TRUE)1    -114.97     -228.18 – -1.76         0.047
poly(temp, 3, raw = TRUE)2    10.21       4.07 – 16.35            0.001
poly(temp, 3, raw = TRUE)3    -0.20       -0.31 – -0.10           <0.001
as.factor(Promotion)1         286.57      210.44 – 362.70         <0.001
as.factor(mnth)2              -17.23      -211.63 – 177.18        0.862
as.factor(mnth)3              306.09      94.75 – 517.43          0.005
as.factor(mnth)4              454.03      219.44 – 688.62         <0.001
as.factor(mnth)5              463.55      199.28 – 727.83         0.001
as.factor(mnth)6              398.53      100.84 – 696.22         0.009
as.factor(mnth)7              526.90      198.21 – 855.58         0.002
as.factor(mnth)8              357.66      53.39 – 661.93          0.022
as.factor(mnth)9              415.33      142.31 – 688.36         0.003
as.factor(mnth)10             393.41      157.96 – 628.86         0.001
as.factor(mnth)11             212.52      -0.83 – 425.88          0.051
as.factor(mnth)12             82.53       -117.79 – 282.85        0.420
as.factor(weathersit)2        -157.39     -257.15 – -57.64        0.002
as.factor(weathersit)3        -458.34     -712.09 – -204.59       <0.001
humidity                      -5.28       -9.09 – -1.48           0.007
windspeed                     -15.66      -23.52 – -7.80          <0.001

Observations                  731
R2 / adjusted R2              0.461 / 0.446

Registered riders - Model

  • The effect of promotion on registered riders is an increase of about 1675 riders (coefficient 1675.39).
  • An R2 of 0.771 (adjusted 0.764) indicates that the model is a good fit for registered riders.

Dependent variable: registered

Predictors                    Estimates   CI                      p
(Intercept)                   2717.71     1757.34 – 3678.08       <0.001
poly(temp, 3, raw = TRUE)1    -175.64     -343.48 – -7.81         0.041
poly(temp, 3, raw = TRUE)2    22.39       13.29 – 31.49           <0.001
poly(temp, 3, raw = TRUE)3    -0.49       -0.64 – -0.33           <0.001
as.factor(Promotion)1         1675.39     1562.53 – 1788.26       <0.001
as.factor(mnth)2              60.63       -227.57 – 348.84        0.680
as.factor(mnth)3              204.02      -109.29 – 517.34        0.202
as.factor(mnth)4              253.96      -93.81 – 601.74         0.153
as.factor(mnth)5              421.77      29.98 – 813.56          0.035
as.factor(mnth)6              643.34      202.01 – 1084.66        0.004
as.factor(mnth)7              728.03      240.75 – 1215.30        0.004
as.factor(mnth)8              696.23      245.15 – 1147.32        0.003
as.factor(mnth)9              886.49      481.74 – 1291.25        <0.001
as.factor(mnth)10             1015.63     666.58 – 1364.69        <0.001
as.factor(mnth)11             936.36      620.06 – 1252.66        <0.001
as.factor(mnth)12             730.41      433.45 – 1027.38        <0.001
as.factor(weathersit)2        -241.29     -389.18 – -93.41        0.001
as.factor(weathersit)3        -1377.21    -1753.39 – -1001.03     <0.001
humidity                      -16.37      -22.01 – -10.73         <0.001
windspeed                     -39.45      -51.11 – -27.80         <0.001

Observations                  731
R2 / adjusted R2              0.771 / 0.764
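
A quick consistency check, assuming total_riders is the sum of casual and registered riders (an assumption not stated in this report, though the reported numbers are consistent with it). Because all three models use the same predictors, the casual and registered coefficients should then add up to the total-riders coefficients. The sketch below copies a few estimates from the tables; the vector names are illustrative.

```r
# Selected estimates copied from the casual and registered tables above.
casual     <- c(intercept = 937.85,  promotion = 286.57,
                mnth10 = 393.41,  weathersit3 = -458.34)
registered <- c(intercept = 2717.71, promotion = 1675.39,
                mnth10 = 1015.63, weathersit3 = -1377.21)

# Matching estimates from the total-riders model (rounded as reported).
total <- c(intercept = 3656, promotion = 1962,
           mnth10 = 1409, weathersit3 = -1836)

# Sum of the two sub-models, coefficient by coefficient.
combined <- casual + registered
combined

# Fraction of the promotion effect attributable to each rider type.
promotion_share <- c(casual = casual[["promotion"]],
                     registered = registered[["promotion"]]) /
  combined[["promotion"]]
promotion_share
```

Under that assumption, most of the promotion effect comes from registered riders, which is relevant when judging who the promotion actually reaches.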

6. Promotion was a financial success or a failure

  • To judge this, we need to know whether promotion days at least break even, which means weighing the extra riders from the promotion against its costs. The inputs we would need are:
  • Operational cost
  • Maintenance cost
  • Taxes
  • Fees for registered and casual users, in order to calculate profit

  • In general, we can say that registered users pay 35% in a year for riding bikes. The question is whether the profit margin is high enough to make the business profitable.
  • The program should be revised to separate promotions for casual and registered riders.
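
One way to frame the break-even question, as a sketch only: the symbols \(m_c\), \(m_r\) and \(C\) are hypothetical and none of their values are known from this report. Let \(m_c\) and \(m_r\) be the net margin per casual and per registered rider on a promotion day (fees minus operational cost, maintenance cost and taxes), and let \(C\) be the daily cost of running the promotion. Using the promotion coefficients from section 5:

\[
\Delta\text{profit per promotion day} \approx 286.57\, m_c + 1675.39\, m_r - C
\]

The promotion at least breaks even when this quantity is non-negative. If the promotion works by discounting fees, \(m_c\) and \(m_r\) must be the discounted margins, and the discount given to riders who would have ridden anyway belongs in \(C\).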

2019-11-26