Question 1

The regression model I found to fit total riders best included a cubic polynomial in temp, plus mnth, windspeed, humidity, weathersit, and Promotion.

First I built the simple model of riders as a function of temperature with a regression line:

Then I built the regression model with just the polynomial temp variable which gave these results:

## 
## Call:
## lm(formula = total ~ temp + I(temp * temp) + I(temp * temp * 
##     temp), data = bikeshare)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4724.0 -1034.4   -99.6  1130.1  3160.1 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)           518.9929   775.3459   0.669 0.503472    
## temp                   63.1408   134.5298   0.469 0.638964    
## I(temp * temp)         16.6342     7.2173   2.305 0.021461 *  
## I(temp * temp * temp)  -0.4324     0.1208  -3.580 0.000366 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1423 on 727 degrees of freedom
## Multiple R-squared:  0.4627, Adjusted R-squared:  0.4604 
## F-statistic: 208.6 on 3 and 727 DF,  p-value: < 0.00000000000000022

I then plotted this as a baseline:

I then built the regression model with the variables mentioned above, which gave these results:

## 
## Call:
## lm(formula = total ~ temp + I(temp * temp) + I(temp * temp * 
##     temp) + mnth + windspeed + humidity + weathersit + Promotion, 
##     data = bikeshare)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3308.2  -344.5    64.3   452.6  2315.3 
## 
## Coefficients:
##                          Estimate  Std. Error t value             Pr(>|t|)    
## (Intercept)            3655.55759   476.93890   7.665  0.00000000000005895 ***
## temp                   -290.60944    83.34955  -3.487             0.000519 ***
## I(temp * temp)           32.60304     4.51908   7.215  0.00000000000139040 ***
## I(temp * temp * temp)    -0.68868     0.07543  -9.130 < 0.0000000000000002 ***
## mnth2                    43.40561   143.12868   0.303             0.761778    
## mnth3                   510.11506   155.60011   3.278             0.001095 ** 
## mnth4                   707.99181   172.71289   4.099  0.00004623311183325 ***
## mnth5                   885.32358   194.56963   4.550  0.00000630285601288 ***
## mnth6                  1041.86721   219.17299   4.754  0.00000241897831399 ***
## mnth7                  1254.92258   241.99134   5.186  0.00000028070973731 ***
## mnth8                  1053.89186   224.01881   4.704  0.00000305872677145 ***
## mnth9                  1301.82875   201.00980   6.476  0.00000000017521292 ***
## mnth10                 1409.04014   173.34785   8.128  0.00000000000000193 ***
## mnth11                 1148.87931   157.08176   7.314  0.00000000000070144 ***
## mnth12                  812.94672   147.48171   5.512  0.00000004960769791 ***
## windspeed               -55.11091     5.78616  -9.525 < 0.0000000000000002 ***
## humidity                -21.65540     2.80124  -7.731  0.00000000000003660 ***
## weathersit2            -398.68533    73.44189  -5.429  0.00000007800758143 ***
## weathersit3           -1835.55211   186.82033  -9.825 < 0.0000000000000002 ***
## Promotion              1961.96028    56.05143  35.003 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 737.1 on 711 degrees of freedom
## Multiple R-squared:  0.859,  Adjusted R-squared:  0.8552 
## F-statistic: 227.9 on 19 and 711 DF,  p-value: < 0.00000000000000022

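Nearly all of the coefficients showed statistical significance, indicated by low p-values. The one exception was mnth2 (February), which is not significantly different from January, the baseline month absorbed into the intercept. The overall R-squared of 0.8589755 (adjusted 0.8552) showed the model to be a good fit.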
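As a quick check on the fit statistics, the adjusted R-squared can be recomputed from the reported R-squared and the degrees of freedom. This is only a sketch of the standard formula; the variable names are mine, and the sample size is inferred from the output (711 residual degrees of freedom plus 20 estimated coefficients).

```r
# Values read from the regression summary above
r2 <- 0.8589755   # multiple R-squared
p  <- 19          # number of slopes (numerator df of the F-statistic)
n  <- 711 + p + 1 # observations = residual df + slopes + intercept = 731

# Adjusted R-squared penalizes R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 4)  # 0.8552, matching the summary output
```

The small gap between the two values suggests the fit is not being inflated much by the number of terms in the model.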

The model was then plotted to show the residuals, then a component plot to show residuals for individual variables.

Question 2

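While I thought that at least humidity and temperature might be collinear, this did not turn out to be a concern.

A variance inflation factor (VIF) test shows no problematic collinearity among the distinct predictors: the adjusted values, GVIF^(1/(2*Df)), are below 1.5 for mnth, windspeed, humidity, weathersit, and Promotion. The three temp terms have large values, but this is expected because temp, its square, and its cube are strongly correlated with one another by construction: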

##                              GVIF Df GVIF^(1/(2*Df))
## temp                   525.701581  1       22.928183
## I(temp * temp)        2575.348359  1       50.747890
## I(temp * temp * temp)  824.325500  1       28.711069
## mnth                    22.066947 11        1.151010
## windspeed                1.212639  1        1.101199
## humidity                 2.138553  1        1.462379
## weathersit               1.834181  2        1.163752
## Promotion                1.056642  1        1.027931
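The last column of the table can be reproduced from the first two, which makes it easier to compare terms with different degrees of freedom. The sketch below copies three rows from the table; the second part uses a hypothetical temperature range (not the bikeshare data) only to illustrate how strongly a variable correlates with its own square.

```r
# GVIF and Df values copied from the table above
gvif <- c(temp = 525.701581, mnth = 22.066947, weathersit = 1.834181)
df   <- c(temp = 1,          mnth = 11,        weathersit = 2)

# Adjusted value: GVIF^(1/(2*Df)), comparable to the square root of a VIF
adj <- gvif^(1 / (2 * df))
round(adj, 3)

# Illustration only: an evenly spaced, made-up temperature range
temp_demo <- seq(0, 35, by = 0.5)
cor(temp_demo, temp_demo^2)
```

Although the raw GVIF for mnth is about 22, it is spread over 11 dummy variables, so its adjusted value is close to 1.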

Question 3

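From the regression output, the month with the most riders, holding the other variables constant, is the one with the highest coefficient: mnth10, or October, at about 1,409 more riders per day than January (the baseline). Because the model includes no interaction between month and the weather variables, a change in weather would not change the coefficient on the month.

Isolating the residual plot for months, this visualization confirms the estimate of higher ridership in October: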
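The ranking of months can be read directly from the coefficients. In this sketch the values are copied from the regression output, the month labels are my own (assuming mnth2 through mnth12 are February through December), and January is entered as 0 because it is the reference level.

```r
# Month coefficients from the regression output; January is the reference
# level, so its effect relative to itself is 0.
month_effect <- c(Jan = 0,          Feb = 43.40561,   Mar = 510.11506,
                  Apr = 707.99181,  May = 885.32358,  Jun = 1041.86721,
                  Jul = 1254.92258, Aug = 1053.89186, Sep = 1301.82875,
                  Oct = 1409.04014, Nov = 1148.87931, Dec = 812.94672)

# Month with the largest estimated effect, other variables held constant
names(which.max(month_effect))   # "Oct"

# Top three months by estimated effect
sort(month_effect, decreasing = TRUE)[1:3]
```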

Question 4

The Promotion variable, with a coefficient of 1961.9602777 and a very low p-value, does appear to have an effect on the number of riders based on this model. It indicates that on a day when the Promotion was active, total riders increase by about 1,962, holding the other variables constant.

This is helpful in understanding the overall impact of the Promotion; however, it does not necessarily imply that it is good business. For example, there may be a greater influence on registered riders than on casual riders, effectively giving already-captured customers an unnecessary discount.
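To give a sense of how precisely the Promotion effect is estimated, an approximate 95% confidence interval can be built from the estimate and standard error in the regression output. This is a sketch using the usual t-based interval; the variable names are mine.

```r
# Promotion row of the regression output
est <- 1961.96028   # estimated coefficient
se  <- 56.05143     # standard error
df  <- 711          # residual degrees of freedom

# t statistic, which should agree with the reported 35.003
t_value <- est / se
round(t_value, 3)

# 95% confidence interval: estimate +/- critical t value * standard error
ci <- est + c(-1, 1) * qt(0.975, df) * se
round(ci)
```

The interval is narrow relative to the estimate, so the uncertainty lies less in whether the Promotion raises ridership and more in whether that increase is profitable.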

Question 5

Using the same variables, but separating the rider populations into casual and registered riders, there is an apparent difference in the influence of the promotion on those types of riders.

Comparing the standardized coefficients of the variables shows this difference:

For casual riders, the standardized coefficient is 0.1670051, and for registered riders it is 0.4083562. This shows that, relative to their typical variation, registered riders increase considerably more than casual riders when the Promotion is in effect.
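The report does not show how the standardized coefficients were obtained, so the following is only a sketch of one common definition: the raw slope multiplied by the standard deviation of the predictor and divided by the standard deviation of the response. The data here are simulated and all names (promo, temp, riders, z) are hypothetical, not taken from the bikeshare data.

```r
set.seed(1)
n      <- 200
promo  <- rbinom(n, 1, 0.5)                  # made-up 0/1 promotion flag
temp   <- runif(n, 0, 35)                    # made-up temperature
riders <- 1000 + 500 * promo + 40 * temp + rnorm(n, sd = 300)

fit <- lm(riders ~ promo + temp)

# Standardized slope = raw slope * sd(predictor) / sd(response)
b_std <- coef(fit)["promo"] * sd(promo) / sd(riders)

# The same number comes from refitting on z-scored variables
z <- function(x) (x - mean(x)) / sd(x)
d <- data.frame(riders_z = z(riders), promo_z = z(promo), temp_z = z(temp))
fit_z <- lm(riders_z ~ promo_z + temp_z, data = d)

all.equal(unname(b_std), unname(coef(fit_z)["promo_z"]))
```

Because the result is expressed in standard deviations of the response, it allows the casual and registered models to be compared even though the two groups differ in overall size.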

Question 6

While the above model and statistics help to show the effect of the promotion, key information is still needed to understand the business impact of these results: namely, what is the net revenue of the promotion per rider and, more importantly, what is the net income of the promotion for casual riders vs. registered riders? Likely, casual riders provide more net revenue and are the more desirable riders. If the goal of the promotion is to maximize net revenue by attracting new or more casual riders, the results suggest it is less effective for them than it is for registered riders.