For the final model I decided to use temp as a third-degree polynomial, month as a factor variable, weathersit as a factor variable, humidity, windspeed, and promotion.
The relationship did not appear to be linear, so I applied a log transformation to the total_riders variable, which appeared to fix it. I also saw high collinearity between the month variable and the temperature variable. I would have taken it out, but Joel said in question 3 that we should leave it in, so I did.
The month with the highest number of riders, holding all other variables constant, is November. Even if November became unseasonably cold, its coefficient would not change, because the coefficient measures the month effect with temperature and the other predictors held fixed; the colder weather enters the model through the temp terms instead. The final prediction would change, but the coefficient would not. Ceteris paribus, yo.
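To make the "prediction changes, coefficient does not" point concrete, here is a small sketch. It borrows the temp and November coefficients from the log(registered) output reported further down (not the total_riders model discussed here), and the two temperatures are made-up values for illustration only, since the units of temp are not stated.

```r
# Coefficients copied from the log(registered) regression output in this write-up.
b_temp <- c(0.0079848, 0.0047233, -0.0001235)  # poly(temp, 3, raw = TRUE) terms 1..3
b_nov  <- 0.3649706                            # as.factor(mnth)11

# Temperature's contribution to the linear predictor (log scale).
temp_part <- function(temp) sum(b_temp * temp^(1:3))

# Hypothetical temperatures (assumed values, not from the data).
typical_temp <- 10
cold_temp    <- 2

# November's term is the same constant on both days; only the temp part moves.
log_diff <- (b_nov + temp_part(cold_temp)) - (b_nov + temp_part(typical_temp))

# Predicted riders on the cold day relative to the typical day.
ratio <- exp(log_diff)
print(ratio)
```

The November coefficient cancels out of the difference, which is the sense in which a cold snap moves the prediction without touching the month effect.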
Based on the regression analysis, promotion increases the number of riders by about 55%, holding all other variables constant, which is a pretty substantial effect.
##
## Call:
## lm(formula = log(casual) ~ poly(temp, 3, raw = TRUE) + as.factor(mnth) +
##     as.factor(weathersit) + humidity + windspeed + Promotion,
##     data = bikeshare)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.1911 -0.3779 -0.0964  0.4284  1.8439
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 4.268e+00  3.645e-01  11.708  < 2e-16 ***
## poly(temp, 3, raw = TRUE)1  1.571e-01  6.370e-02   2.466 0.013895 *
## poly(temp, 3, raw = TRUE)2  1.656e-04  3.454e-03   0.048 0.961767
## poly(temp, 3, raw = TRUE)3 -8.243e-05  5.765e-05  -1.430 0.153185
## as.factor(mnth)2            5.424e-02  1.094e-01   0.496 0.620128
## as.factor(mnth)3            7.059e-01  1.189e-01   5.936 4.56e-09 ***
## as.factor(mnth)4            8.390e-01  1.320e-01   6.356 3.69e-10 ***
## as.factor(mnth)5            8.652e-01  1.487e-01   5.818 9.00e-09 ***
## as.factor(mnth)6            7.911e-01  1.675e-01   4.723 2.80e-06 ***
## as.factor(mnth)7            9.601e-01  1.849e-01   5.191 2.73e-07 ***
## as.factor(mnth)8            8.351e-01  1.712e-01   4.878 1.33e-06 ***
## as.factor(mnth)9            8.189e-01  1.536e-01   5.331 1.32e-07 ***
## as.factor(mnth)10           7.685e-01  1.325e-01   5.801 9.93e-09 ***
## as.factor(mnth)11           6.198e-01  1.201e-01   5.163 3.16e-07 ***
## as.factor(mnth)12           4.010e-01  1.127e-01   3.558 0.000399 ***
## as.factor(weathersit)2     -2.362e-01  5.613e-02  -4.209 2.90e-05 ***
## as.factor(weathersit)3     -1.473e+00  1.428e-01 -10.316  < 2e-16 ***
## humidity                   -9.322e-03  2.141e-03  -4.355 1.53e-05 ***
## windspeed                  -2.510e-02  4.422e-03  -5.675 2.02e-08 ***
## Promotion                   3.399e-01  4.284e-02   7.935 8.21e-15 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.5634 on 711 degrees of freedom
## Multiple R-squared: 0.7019, Adjusted R-squared: 0.694
## F-statistic: 88.12 on 19 and 711 DF, p-value: < 2.2e-16
##
## Call:
## lm(formula = log(registered) ~ poly(temp, 3, raw = TRUE) + as.factor(mnth) +
##     as.factor(weathersit) + humidity + windspeed + Promotion,
##     data = bikeshare)
##
## Residuals:
##     Min      1Q  Median      3Q     Max
## -4.3228 -0.1074  0.0473  0.1623  0.9714
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                 7.4008944  0.2086642  35.468  < 2e-16 ***
## poly(temp, 3, raw = TRUE)1  0.0079848  0.0364660   0.219 0.826740
## poly(temp, 3, raw = TRUE)2  0.0047233  0.0019771   2.389 0.017155 *
## poly(temp, 3, raw = TRUE)3 -0.0001235  0.0000330  -3.742 0.000197 ***
## as.factor(mnth)2            0.0686611  0.0626198   1.096 0.273242
## as.factor(mnth)3            0.0599312  0.0680762   0.880 0.378964
## as.factor(mnth)4            0.0817854  0.0755631   1.082 0.279466
## as.factor(mnth)5            0.1253561  0.0851256   1.473 0.141301
## as.factor(mnth)6            0.1544088  0.0958897   1.610 0.107782
## as.factor(mnth)7            0.1839751  0.1058729   1.738 0.082697 .
## as.factor(mnth)8            0.1463051  0.0980098   1.493 0.135944
## as.factor(mnth)9            0.2201372  0.0879432   2.503 0.012532 *
## as.factor(mnth)10           0.2355972  0.0758409   3.106 0.001969 **
## as.factor(mnth)11           0.3649706  0.0687244   5.311 1.46e-07 ***
## as.factor(mnth)12           0.2425880  0.0645243   3.760 0.000184 ***
## as.factor(weathersit)2     -0.0521146  0.0321314  -1.622 0.105263
## as.factor(weathersit)3     -0.8142002  0.0817352  -9.961  < 2e-16 ***
## humidity                   -0.0061528  0.0012256  -5.020 6.53e-07 ***
## windspeed                  -0.0158321  0.0025315  -6.254 6.90e-10 ***
## Promotion                   0.4614168  0.0245229  18.816  < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3225 on 711 degrees of freedom
## Multiple R-squared: 0.6831, Adjusted R-squared: 0.6746
## F-statistic: 80.65 on 19 and 711 DF, p-value: < 2.2e-16
## [1] 1.404807
## [1] 1.58632
Based on the analysis above, exponentiating the Promotion coefficients (exp(0.3399) ≈ 1.40 and exp(0.4614) ≈ 1.59) shows that a promotion increases the number of casual riders by about 40%, while it increases the number of registered riders by about 59%.
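For reference, here is a minimal sketch of how a coefficient from a log-response model converts to a percent change. The two coefficient values are copied from the output above; the helper name pct_change is just something made up for this illustration.

```r
# Promotion coefficients copied from the two regression outputs above.
b_casual     <- 0.3399
b_registered <- 0.4614168

# With a log response, a one-unit change in a predictor multiplies the
# response by exp(b), so the percent change is 100 * (exp(b) - 1).
pct_change <- function(b) 100 * (exp(b) - 1)

mult_casual     <- exp(b_casual)      # matches the 1.404807 printed above
mult_registered <- exp(b_registered)  # matches the 1.58632 printed above

print(round(c(casual = pct_change(b_casual),
              registered = pct_change(b_registered)), 1))
```

Reading the raw coefficients as percentages directly (34% and 46%) would understate the effect, which is why the exponentiated values are used.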
Because the promotion is a price reduction/discount, it would be good to know whether the discount increased sales by enough to offset the lower prices. Knowing how much additional money the company made relative to its costs would determine whether the promotion was a financial success or failure.
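As a rough sketch of that break-even idea: if revenue is simply riders times price (a big assumption that ignores costs, memberships, and anything else about how the company actually charges), then a lift in riders offsets a fractional discount d only when (1 + lift) * (1 - d) is at least 1. The 25% discount below is a hypothetical number, since the actual size of the promotion is not given.

```r
# Estimated rider lifts from the promotion, taken from the analysis above.
lift <- c(casual = 0.40, registered = 0.59)

# Revenue during the promotion relative to no promotion, assuming
# revenue = riders * price and nothing else changes (an assumption).
revenue_ratio <- function(lift, discount) (1 + lift) * (1 - discount)

# Largest discount at which revenue is unchanged under that assumption.
break_even_discount <- 1 - 1 / (1 + lift)

# Hypothetical discount size; the real one is not stated in the assignment.
hypothetical_discount <- 0.25

print(round(break_even_discount, 3))
print(revenue_ratio(lift, hypothetical_discount))
```

A ratio above 1 would mean the extra riders more than paid for the lower price under these assumptions; judging actual success would still need the cost data mentioned above.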