Regression Model

Below is my best regression model for the dependent variable total riders, which is made up of both casual and registered riders. I used the following independent variables to tell the story of total ridership: temperature, weekday, promotion, weather situation, humidity, holiday, wind speed, and month. I chose these variables because, while I am trying to understand what other riders do, they are the factors that would affect me as a rider. After deciding which variables were most applicable, I checked for collinearity and inflated variances using the cor() and vif() functions. I found some collinearity issues, which are discussed later, but overall the regression is a good fit. Most terms in the output below are significant at the 0.05 level; the few exceptions are individual factor levels or higher-order polynomial terms (weekday 1, month 2, and the squared wind speed term). Finally, the model has a Multiple R-squared of 0.8745 (87.45%) and an Adjusted R-squared of 0.8693 (86.93%). After reviewing this summary, I am comfortable using this model to represent total ridership.

## 
## Call:
## lm(formula = (total_riders) ~ poly(temp, 3, raw = TRUE) + as.factor(weekday) + 
##     as.factor(Promotion) + as.factor(weathersit) + poly(humidity, 
##     2, raw = TRUE) + as.factor(holiday) + poly(windspeed, 3, 
##     raw = TRUE) + as.factor(mnth), data = bikeshare)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3102.40  -329.13    44.27   413.98  2239.24 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      2.002e+03  6.503e+02   3.078 0.002162 ** 
## poly(temp, 3, raw = TRUE)1      -3.105e+02  7.955e+01  -3.903 0.000104 ***
## poly(temp, 3, raw = TRUE)2       3.412e+01  4.319e+00   7.900 1.08e-14 ***
## poly(temp, 3, raw = TRUE)3      -7.173e-01  7.217e-02  -9.938  < 2e-16 ***
## as.factor(weekday)1              1.515e+02  9.999e+01   1.516 0.130080    
## as.factor(weekday)2              2.896e+02  9.758e+01   2.968 0.003104 ** 
## as.factor(weekday)3              3.471e+02  9.793e+01   3.545 0.000419 ***
## as.factor(weekday)4              3.785e+02  9.791e+01   3.865 0.000121 ***
## as.factor(weekday)5              4.231e+02  9.788e+01   4.322 1.77e-05 ***
## as.factor(weekday)6              4.613e+02  9.716e+01   4.748 2.49e-06 ***
## as.factor(Promotion)1            1.954e+03  5.359e+01  36.464  < 2e-16 ***
## as.factor(weathersit)2          -3.715e+02  7.195e+01  -5.164 3.15e-07 ***
## as.factor(weathersit)3          -1.538e+03  2.034e+02  -7.561 1.25e-13 ***
## poly(humidity, 2, raw = TRUE)1   3.656e+01  1.358e+01   2.693 0.007249 ** 
## poly(humidity, 2, raw = TRUE)2  -4.797e-01  1.114e-01  -4.305 1.91e-05 ***
## as.factor(holiday)1             -6.186e+02  1.640e+02  -3.773 0.000175 ***
## poly(windspeed, 3, raw = TRUE)1 -1.368e+02  6.011e+01  -2.275 0.023183 *  
## poly(windspeed, 3, raw = TRUE)2  7.902e+00  4.153e+00   1.903 0.057505 .  
## poly(windspeed, 3, raw = TRUE)3 -2.045e-01  8.765e-02  -2.333 0.019927 *  
## as.factor(mnth)2                 4.211e+01  1.362e+02   0.309 0.757265    
## as.factor(mnth)3                 4.812e+02  1.483e+02   3.244 0.001234 ** 
## as.factor(mnth)4                 6.413e+02  1.651e+02   3.883 0.000113 ***
## as.factor(mnth)5                 7.920e+02  1.861e+02   4.255 2.37e-05 ***
## as.factor(mnth)6                 8.881e+02  2.108e+02   4.213 2.85e-05 ***
## as.factor(mnth)7                 1.143e+03  2.325e+02   4.918 1.09e-06 ***
## as.factor(mnth)8                 8.923e+02  2.154e+02   4.142 3.86e-05 ***
## as.factor(mnth)9                 1.230e+03  1.921e+02   6.404 2.77e-10 ***
## as.factor(mnth)10                1.319e+03  1.661e+02   7.940 8.01e-15 ***
## as.factor(mnth)11                1.082e+03  1.503e+02   7.198 1.58e-12 ***
## as.factor(mnth)12                7.713e+02  1.408e+02   5.476 6.05e-08 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 700.3 on 701 degrees of freedom
## Multiple R-squared:  0.8745, Adjusted R-squared:  0.8693 
## F-statistic: 168.4 on 29 and 701 DF,  p-value: < 2.2e-16
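As a quick sanity check on the fit statistics quoted above, the Adjusted R-squared and the overall F-statistic can be recomputed in R from numbers printed in the summary. This is a minimal sketch that only uses values copied from the output; the sample size `n` is inferred (701 residual degrees of freedom plus 29 slope terms plus the intercept), and because the printed R-squared is rounded, the recomputed values agree only to rounding.

```r
# Values copied from the summary() output for the total_riders model above
r2     <- 0.8745        # Multiple R-squared
p      <- 29            # numerator degrees of freedom (slope terms)
df_res <- 701           # residual degrees of freedom
n      <- df_res + p + 1  # inferred number of observations used in the fit

# Adjusted R-squared penalises R-squared for the number of terms in the model
adj_r2 <- 1 - (1 - r2) * (n - 1) / df_res

# Overall F-statistic for H0: every slope coefficient is zero
f_stat <- (r2 / p) / ((1 - r2) / df_res)

round(c(n = n, adj_r2 = adj_r2, f_stat = f_stat), 4)
```

The results line up with the printed Adjusted R-squared of 0.8693 and F-statistic of 168.4, which confirms that the small gap between the two R-squared values comes from the 29 terms the model spends.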

Correlation Matrix for the Variables in the Regression

##              total_riders          temp       weekday    Promotion
## total_riders   1.00000000  0.6274940090  0.0674434124  0.566709708
## temp           0.62749401  1.0000000000 -0.0001699624  0.047603572
## weekday        0.06744341 -0.0001699624  1.0000000000 -0.005460765
## Promotion      0.56670971  0.0476035719 -0.0054607652  1.000000000
## weathersit    -0.29739124 -0.1206022365  0.0310874694 -0.048726541
## humidity      -0.10065856  0.1269629390 -0.0522321004 -0.110651045
## holiday       -0.06834772 -0.0285555350 -0.1019602689  0.007954311
## windspeed     -0.23454500 -0.1579441204  0.0142821241 -0.011817060
## mnth           0.27997711  0.2202053352  0.0095093129 -0.001792434
##               weathersit    humidity      holiday    windspeed
## total_riders -0.29739124 -0.10065856 -0.068347716 -0.234544997
## temp         -0.12060224  0.12696294 -0.028555535 -0.157944120
## weekday       0.03108747 -0.05223210 -0.101960269  0.014282124
## Promotion    -0.04872654 -0.11065104  0.007954311 -0.011817060
## weathersit    1.00000000  0.59104460 -0.034626841  0.039511059
## humidity      0.59104460  1.00000000 -0.015937479 -0.248489099
## holiday      -0.03462684 -0.01593748  1.000000000  0.006291507
## windspeed     0.03951106 -0.24848910  0.006291507  1.000000000
## mnth          0.04352810  0.22220369  0.019190895 -0.207501752
##                      mnth
## total_riders  0.279977112
## temp          0.220205335
## weekday       0.009509313
## Promotion    -0.001792434
## weathersit    0.043528098
## humidity      0.222203691
## holiday       0.019190895
## windspeed    -0.207501752
## mnth          1.000000000

Variance Inflation Factors

##                                     GVIF Df GVIF^(1/(2*Df))
## poly(temp, 3, raw = TRUE)      21.694747  3        1.670036
## as.factor(weekday)              1.180943  6        1.013956
## as.factor(Promotion)            1.070005  1        1.034410
## as.factor(weathersit)           2.444317  2        1.250372
## poly(humidity, 2, raw = TRUE)   3.178286  2        1.335206
## as.factor(holiday)              1.117926  1        1.057320
## poly(windspeed, 3, raw = TRUE)  1.402216  3        1.057960
## as.factor(mnth)                25.285956 11        1.158156

Visualizations for Collinearity and Multicollinearity

Did you find any problems with the assumptions of multicollinearity or linearity in this regression?

The main multicollinearity issue arose when I was trying to maximize the Multiple and Adjusted R-squared values: the month, temp, and season independent variables are highly correlated with one another. I expected inflated variance factors when combining these three, but a seasonality effect may need to be tested, so it was important to include a variable relating to a timeframe even though all three are correlated. I chose month rather than season because month provides 12 levels, each with its own slope coefficient, rather than 4 seasons, which gives more opportunity to understand the data. It also makes sense given the relationship between temp and month: we can infer roughly what the season is from those two independent variables. Keeping this in mind, we can better understand total riders without including too many collinear variables with highly inflated variance factors, which can be problematic. Even so, temp and month still have large raw GVIFs (21.69 and 25.29), although their GVIF^(1/(2*Df)) values (1.67 and 1.16) are modest, and dropping month, as in the casual and registered models below, brings the temp GVIF down to 1.29. Using one time variable rather than both month and season can be naive, but for this model it will suffice.
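To make the GVIF figures easier to compare across terms with different degrees of freedom, the adjusted GVIF^(1/(2*Df)) column can be reproduced from the raw GVIF and Df columns. This is a sketch using only values copied from the VIF tables in this report; the labels such as `temp_full` are hypothetical names chosen for illustration. Squaring the adjusted value puts it back on a scale comparable to an ordinary VIF for a one-degree-of-freedom term.

```r
# GVIF and Df values copied from the VIF tables in this report
gvif <- c(temp_full   = 21.694747,  # temp term, total_riders model (with month)
          mnth_full   = 25.285956,  # month term, total_riders model
          temp_nomnth = 1.285583)   # temp term, casual/registered models (no month)
term_df <- c(temp_full = 3, mnth_full = 11, temp_nomnth = 3)

# GVIF^(1/(2*Df)) adjusts for the number of coefficients in each term
adj_gvif <- gvif^(1 / (2 * term_df))

# Squared adjusted GVIF is on a scale comparable to an ordinary VIF
round(cbind(gvif, term_df, adj_gvif, adj_gvif_sq = adj_gvif^2), 3)
```

The drop in the temp GVIF once month is removed is the clearest numerical sign that temp and month carry overlapping information.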

Your model from Q1 should include some means of assessing the impact the month of the year has on total ridership. Using your regression output, which month has the highest number of riders, holding everything else constant? What if this month became unseasonably cold and rainy? Would it change the coefficient on this month in any way?

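In my regression, holding everything else constant, month 10 (October) has the largest coefficient: about 1,319 more riders per day than the baseline month of January, followed by September (about 1,230) and July (about 1,143). June and August (months 6 and 8), at about 888 and 892, are high but not the highest. This is intuitive, because temperature and the other weather variables, which we can infer and test, also play a role in total ridership; the month coefficients measure the effect that remains once those are held constant. If October became unseasonably cold and rainy, the coefficient on October would not change. The coefficient is a fixed estimate from the fitted model, and the effect of cold, rainy weather would show up through the temp and weather situation terms instead. Only refitting the model on new data that included such a month could change the estimate.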
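To read the month effects straight off the output, the printed month coefficients can be ranked in R. This sketch types in the rounded estimates from the total_riders summary and adds January as the omitted baseline with an effect of 0 by construction.

```r
# Month coefficients from the total_riders model (January is the baseline)
mnth_coef <- c(`2`  = 42.11, `3`  = 481.2, `4`  = 641.3, `5`  = 792.0,
               `6`  = 888.1, `7`  = 1143,  `8`  = 892.3, `9`  = 1230,
               `10` = 1319,  `11` = 1082,  `12` = 771.3)

# Add the omitted baseline month, whose effect is 0 by construction
mnth_coef <- c(`1` = 0, mnth_coef)

# Month with the largest effect, holding the other variables constant
best_month <- names(which.max(mnth_coef))
best_month

# Top three months by coefficient
sort(mnth_coef, decreasing = TRUE)[1:3]
```

With the fitted model object available, the same vector could be pulled with `coef()` instead of being typed in by hand, which avoids rounding.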

Interpret (in simple terms) the coefficient on your “promotion” variable and make an initial judgement on the claims of the marketing department based on your analysis.

Looking at my regression model, we can expect about 1,954 more riders on a day when a promotion runs than on an otherwise identical day without one, holding every other independent variable constant. My initial judgement, and my advice to the marketing department, would be to run promotions in months where we see lower average ridership, such as January, February, March, November, and December. I also would not rely on the data alone; I would consider weather situation, temperature, and similar factors. Even though promotions have a large effect on ridership, we cannot infer that riders will take up a promotion and ride in the snow. The decision should combine judgement with the data so that the return is maximized. Models such as this one are inherently naive on the surface, so the ultimate decision should rest on both analysis and common sense.
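To put a range around the 1,954 figure, a 95% confidence interval can be built from the printed estimate and standard error using the t distribution with the model's 701 residual degrees of freedom. This is a minimal sketch using the rounded values from the output, not a rerun of the model.

```r
# Promotion estimate and standard error from the total_riders model
est    <- 1954    # as.factor(Promotion)1 estimate (1.954e+03)
se     <- 53.59   # its standard error (5.359e+01)
df_res <- 701     # residual degrees of freedom

# 95% confidence interval: estimate +/- t critical value * standard error
t_crit <- qt(0.975, df = df_res)
ci <- est + c(-1, 1) * t_crit * se
round(ci)
```

The interval runs from roughly 1,850 to 2,060 extra riders per promotion day, so even the low end is a large effect. With the fitted model object, `confint()` would return the same interval directly.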

You suspect the promotion might have influenced casual riders differently than the registered riders. Perform some type of analysis that allows you to assess if the program had a more substantial impact on the casual riders or the registered riders. What is your conclusion, and why? Include any data or screenshots to back up your claim.

I cannot conclude much beyond the fact that the promotion increased ridership in both groups: the promotion coefficient is positive and highly significant (p < 2e-16) in both the casual and the registered model. A percent-difference comparison is not a good explanation on its own, because too many unknowns remain; the models are naive, and while they fit reasonably well and bring the main relationships to the surface, they cannot fully describe how the promotion affected each group. The output below shows that both the Multiple and Adjusted R-squared values are higher for the casual riders (0.7606 and 0.7546) than for the registered riders (0.7284 and 0.7215), which suggests a somewhat better fit for casual riders. However, there are more registered riders, which can mean more variability in the data and make it harder to explain; the registered model is also fitted on log(registered) rather than raw counts, so the two R-squared values are not strictly comparable. The point is that there is not enough information given to tell the entire story, but we can get a decent overview of how the two groups of riders respond to promotions.

## 
## Call:
## lm(formula = (casual) ~ poly(temp, 3, raw = TRUE) + as.factor(weekday) + 
##     as.factor(Promotion) + as.factor(weathersit) + poly(humidity, 
##     2, raw = TRUE) + as.factor(holiday) + poly(windspeed, 3, 
##     raw = TRUE), data = bikeshare)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1084.7  -203.0   -29.2   177.2  1594.1 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                     1218.60837  303.97667   4.009 6.74e-05 ***
## poly(temp, 3, raw = TRUE)1      -100.26532   32.38442  -3.096  0.00204 ** 
## poly(temp, 3, raw = TRUE)2        11.83308    1.73955   6.802 2.18e-11 ***
## poly(temp, 3, raw = TRUE)3        -0.24645    0.02920  -8.440  < 2e-16 ***
## as.factor(weekday)1             -768.71679   48.49251 -15.852  < 2e-16 ***
## as.factor(weekday)2             -810.57388   47.25673 -17.153  < 2e-16 ***
## as.factor(weekday)3             -816.65755   47.40345 -17.228  < 2e-16 ***
## as.factor(weekday)4             -801.42404   47.34977 -16.926  < 2e-16 ***
## as.factor(weekday)5             -617.26920   47.48188 -13.000  < 2e-16 ***
## as.factor(weekday)6              162.16213   47.16914   3.438  0.00062 ***
## as.factor(Promotion)1            262.68796   25.79248  10.185  < 2e-16 ***
## as.factor(weathersit)2           -70.13307   34.43610  -2.037  0.04206 *  
## as.factor(weathersit)3          -197.89175   97.13492  -2.037  0.04199 *  
## poly(humidity, 2, raw = TRUE)1     4.05732    6.44084   0.630  0.52894    
## poly(humidity, 2, raw = TRUE)2    -0.09890    0.05294  -1.868  0.06213 .  
## as.factor(holiday)1              534.44998   78.89814   6.774 2.63e-11 ***
## poly(windspeed, 3, raw = TRUE)1  -80.33994   28.76781  -2.793  0.00537 ** 
## poly(windspeed, 3, raw = TRUE)2    5.42641    1.99096   2.726  0.00658 ** 
## poly(windspeed, 3, raw = TRUE)3   -0.13217    0.04205  -3.143  0.00174 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 340.2 on 712 degrees of freedom
## Multiple R-squared:  0.7606, Adjusted R-squared:  0.7546 
## F-statistic: 125.7 on 18 and 712 DF,  p-value: < 2.2e-16
##                 casual          temp       weekday    Promotion
## casual      1.00000000  0.5432846617  0.0599226375  0.248545664
## temp        0.54328466  1.0000000000 -0.0001699624  0.047603572
## weekday     0.05992264 -0.0001699624  1.0000000000 -0.005460765
## Promotion   0.24854566  0.0476035719 -0.0054607652  1.000000000
## weathersit -0.24735300 -0.1206022365  0.0310874694 -0.048726541
## humidity   -0.07700788  0.1269629390 -0.0522321004 -0.110651045
## holiday     0.05427420 -0.0285555350 -0.1019602689  0.007954311
## windspeed  -0.16761335 -0.1579441204  0.0142821241 -0.011817060
##             weathersit    humidity      holiday    windspeed
## casual     -0.24735300 -0.07700788  0.054274203 -0.167613349
## temp       -0.12060224  0.12696294 -0.028555535 -0.157944120
## weekday     0.03108747 -0.05223210 -0.101960269  0.014282124
## Promotion  -0.04872654 -0.11065104  0.007954311 -0.011817060
## weathersit  1.00000000  0.59104460 -0.034626841  0.039511059
## humidity    0.59104460  1.00000000 -0.015937479 -0.248489099
## holiday    -0.03462684 -0.01593748  1.000000000  0.006291507
## windspeed   0.03951106 -0.24848910  0.006291507  1.000000000
##                                    GVIF Df GVIF^(1/(2*Df))
## poly(temp, 3, raw = TRUE)      1.285583  3        1.042758
## as.factor(weekday)             1.155038  6        1.012084
## as.factor(Promotion)           1.050715  1        1.025044
## as.factor(weathersit)          2.302449  2        1.231821
## poly(humidity, 2, raw = TRUE)  2.765578  2        1.289575
## as.factor(holiday)             1.097326  1        1.047533
## poly(windspeed, 3, raw = TRUE) 1.282491  3        1.042339

## 
## Call:
## lm(formula = I(log(registered)) ~ poly(temp, 3, raw = TRUE) + 
##     as.factor(weekday) + as.factor(Promotion) + as.factor(weathersit) + 
##     poly(humidity, 2, raw = TRUE) + as.factor(holiday) + poly(windspeed, 
##     3, raw = TRUE), data = bikeshare)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.3084 -0.0996  0.0132  0.1367  0.7830 
## 
## Coefficients:
##                                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)                      6.312e+00  2.666e-01  23.677  < 2e-16 ***
## poly(temp, 3, raw = TRUE)1       7.132e-02  2.840e-02   2.511 0.012259 *  
## poly(temp, 3, raw = TRUE)2       1.693e-03  1.526e-03   1.109 0.267596    
## poly(temp, 3, raw = TRUE)3      -7.918e-05  2.561e-05  -3.091 0.002070 ** 
## as.factor(weekday)1              2.571e-01  4.253e-02   6.045 2.41e-09 ***
## as.factor(weekday)2              3.218e-01  4.145e-02   7.763 2.89e-14 ***
## as.factor(weekday)3              3.268e-01  4.158e-02   7.860 1.42e-14 ***
## as.factor(weekday)4              3.328e-01  4.153e-02   8.013 4.59e-15 ***
## as.factor(weekday)5              3.119e-01  4.164e-02   7.490 2.05e-13 ***
## as.factor(weekday)6              9.423e-02  4.137e-02   2.278 0.023040 *  
## as.factor(Promotion)1            4.563e-01  2.262e-02  20.173  < 2e-16 ***
## as.factor(weathersit)2          -7.973e-02  3.020e-02  -2.640 0.008475 ** 
## as.factor(weathersit)3          -7.155e-01  8.519e-02  -8.398 2.44e-16 ***
## poly(humidity, 2, raw = TRUE)1   1.698e-02  5.649e-03   3.005 0.002747 ** 
## poly(humidity, 2, raw = TRUE)2  -1.800e-04  4.643e-05  -3.876 0.000116 ***
## as.factor(holiday)1             -4.089e-01  6.920e-02  -5.909 5.35e-09 ***
## poly(windspeed, 3, raw = TRUE)1 -3.703e-02  2.523e-02  -1.468 0.142591    
## poly(windspeed, 3, raw = TRUE)2  2.012e-03  1.746e-03   1.152 0.249585    
## poly(windspeed, 3, raw = TRUE)3 -5.421e-05  3.688e-05  -1.470 0.142020    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2983 on 712 degrees of freedom
## Multiple R-squared:  0.7284, Adjusted R-squared:  0.7215 
## F-statistic: 106.1 on 18 and 712 DF,  p-value: < 2.2e-16
##             registered          temp       weekday    Promotion
## registered  1.00000000  0.5400119662  0.0573674440  0.594248168
## temp        0.54001197  1.0000000000 -0.0001699624  0.047603572
## weekday     0.05736744 -0.0001699624  1.0000000000 -0.005460765
## Promotion   0.59424817  0.0476035719 -0.0054607652  1.000000000
## weathersit -0.26038771 -0.1206022365  0.0310874694 -0.048726541
## humidity   -0.09108860  0.1269629390 -0.0522321004 -0.110651045
## holiday    -0.10874486 -0.0285555350 -0.1019602689  0.007954311
## windspeed  -0.21744898 -0.1579441204  0.0142821241 -0.011817060
##             weathersit    humidity      holiday    windspeed
## registered -0.26038771 -0.09108860 -0.108744863 -0.217448981
## temp       -0.12060224  0.12696294 -0.028555535 -0.157944120
## weekday     0.03108747 -0.05223210 -0.101960269  0.014282124
## Promotion  -0.04872654 -0.11065104  0.007954311 -0.011817060
## weathersit  1.00000000  0.59104460 -0.034626841  0.039511059
## humidity    0.59104460  1.00000000 -0.015937479 -0.248489099
## holiday    -0.03462684 -0.01593748  1.000000000  0.006291507
## windspeed   0.03951106 -0.24848910  0.006291507  1.000000000
##                                    GVIF Df GVIF^(1/(2*Df))
## poly(temp, 3, raw = TRUE)      1.285583  3        1.042758
## as.factor(weekday)             1.155038  6        1.012084
## as.factor(Promotion)           1.050715  1        1.025044
## as.factor(weathersit)          2.302449  2        1.231821
## poly(humidity, 2, raw = TRUE)  2.765578  2        1.289575
## as.factor(holiday)             1.097326  1        1.047533
## poly(windspeed, 3, raw = TRUE) 1.282491  3        1.042339
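Because the registered model uses `log(registered)` as the outcome while the casual model uses raw counts, the two promotion coefficients are on different scales and cannot be compared directly. A minimal sketch of putting the registered coefficient on a percentage scale, using only the printed estimates: in a log-level model, exp(b) - 1 is the proportional change in the outcome for a one-unit change in the predictor.

```r
# Promotion coefficients from the two rider-type models above
casual_promo  <- 262.68796  # casual model: extra casual riders per day (levels)
reg_promo_log <- 0.4563     # registered model: change in log(registered)

# Convert the log-scale coefficient to a percentage change in registered riders
reg_promo_pct <- (exp(reg_promo_log) - 1) * 100

round(c(casual_promo_riders  = casual_promo,
        registered_promo_pct = reg_promo_pct), 1)
```

This gives roughly a 58% increase in registered riders on promotion days, holding the other terms constant, while the casual model gives about 263 extra riders per day. Expressing the casual effect as a percentage would need the average casual ridership on non-promotion days, which is not shown here, so the sketch stops short of a head-to-head comparison.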

Report on the promotion’s influence on ridership to the CAO

In my final report, I am comfortable describing the relationship for total riders overall, but I cannot say the same when the riders are split into the two categories, as more information is needed to tell the entire story. For the model as a whole, I would also like to know whether this data is a sample or the entire population of riders. Another important point would be possible market basket attachments; in other words, I would like to know what else encourages people to ride besides promotions. Lastly, I would like to know about any liability issues when a bike is damaged, how those affect riders, and how they bear on profit margins and long-term sustainability.