This document analyzes the BikeShare program in Toronto, Canada. The goal is to assess the real impact of a marketing promotion on the use of the bike sharing program.
The bike sharing program operates on the following model:
Bikes may be rented only on a daily basis and only for the entire day (no partial days.)
There are two ways a person can rent a bike:
- First, casual users can walk up to any available bike rental terminal, swipe a credit card, and rent a bike by paying a daily fee (assuming bikes are available).
- Second, registered BikeShare™ members pay a monthly fee and are guaranteed bike rental availability whenever they want. They pay half the daily fee of the casual users.
The marketing department’s promotion ran for two years. In each year, half of the days were randomly assigned to be “promotional days” and the other half “non-promotional days.”
On promotional days, the daily rental fee paid by the individual is discounted by 30% of the standard daily rate. As a result, on promotional days casual users pay 70% of the normal daily fee, and registered members pay 20% of the normal daily fee (because membership already gives them a 50% discount on rentals).
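The fee arithmetic above can be checked with a short sketch. The names `member_discount`, `promo_discount`, and `fee_fraction` are illustrative and not part of the original analysis; the fractions are relative to the standard daily rate, whose actual value is not given.

```r
# Discounts as fractions of the standard daily rate, as described in the text.
member_discount <- 0.50  # registered members pay half the daily fee
promo_discount  <- 0.30  # promotional days take 30% of the standard rate off

# Fraction of the standard daily rate paid by one rider on one day.
fee_fraction <- function(registered, promo) {
  1 - (if (registered) member_discount else 0) - (if (promo) promo_discount else 0)
}

# All four combinations of rider type and day type.
fee_table <- data.frame(
  rider = c("casual", "casual", "registered", "registered"),
  promo = c(FALSE, TRUE, FALSE, TRUE)
)
fee_table$fraction <- mapply(fee_fraction,
                             fee_table$rider == "registered",
                             fee_table$promo)
print(fee_table)
```

The registered rate on promotional days is lower in relative terms than the casual rate, which matters later when the promotion effect is compared across the two groups.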
The following packages were required for this analysis:
| PACKAGE | Description |
|---|---|
| readr | Imports .csv files |
| skimr | Generates summary statistics |
| tidyverse | Loads the tidyverse packages |
| knitr | Renders R Markdown documents |
| rmdformats | R Markdown themes |
| car | Companion to Applied Regression (provides the VIF test) |
To access the data used in this case, download the .csv file of the data from the following website. Then load the data into a variable as follows:
# Download the web-based bike share data
# download.file("http://asayanalytics.com/bikeshare_csv", "bikeshare.csv")
# Read the data and store it in a data frame called bikeshare, with one row
# per day. The regressions below are based on 731 daily observations.
library(readr)
bikeshare <- read_csv("bikeshare.csv")
This analysis builds on a previously generated regression model for the Toronto Bike Share program that used temperature as the only predictor of the total number of riders. That model is the following third-degree polynomial:
Total Riders = B0 + B1 * Temp + B2 * Temp^2 + B3 * Temp^3
This equation fits the data as follows:
##
## Call:
## lm(formula = total_riders ~ poly(temp, 3, raw = TRUE), data = bikeshare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4724.0 -1034.4 -99.6 1130.1 3160.1
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 518.9929 775.3459 0.669 0.503472
## poly(temp, 3, raw = TRUE)1 63.1408 134.5298 0.469 0.638964
## poly(temp, 3, raw = TRUE)2 16.6342 7.2173 2.305 0.021461 *
## poly(temp, 3, raw = TRUE)3 -0.4324 0.1208 -3.580 0.000366 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1423 on 727 degrees of freedom
## Multiple R-squared: 0.4627, Adjusted R-squared: 0.4604
## F-statistic: 208.6 on 3 and 727 DF, p-value: < 2.2e-16
The plotted data is clearly not linear, so a curvilinear regression equation is more appropriate.
This regression has an R-squared value of 0.4627. The fitted equation is as follows:
Total riders = 518.99 + 63.14* temp + 16.63* temp^2 - 0.43* temp^3
This form seems adequate, but the regression could be better if additional variables are taken into account.
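To make the curvilinear shape concrete, the following sketch evaluates the fitted cubic and locates the temperature at which predicted ridership peaks. It uses only the coefficients printed in the output above; the names `b`, `cubic_riders`, and `peak_temp` are illustrative, and temperature is in whatever units the `temp` column uses.

```r
# B0..B3 copied from the regression output above.
b <- c(518.9929, 63.1408, 16.6342, -0.4324)

# Predicted total riders from the temperature-only model.
cubic_riders <- function(temp) {
  b[1] + b[2] * temp + b[3] * temp^2 + b[4] * temp^3
}

# Turning points solve the derivative B1 + 2*B2*t + 3*B3*t^2 = 0.
disc <- (2 * b[3])^2 - 4 * (3 * b[4]) * b[2]
turning <- (-2 * b[3] + c(1, -1) * sqrt(disc)) / (2 * 3 * b[4])

# With a negative cubic coefficient, the larger root is the local maximum.
peak_temp <- max(turning)
print(peak_temp)
print(cubic_riders(peak_temp))
```

Predicted ridership rises with temperature up to this point and falls beyond it, which a straight line cannot represent.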
To include new variables in the model, a new regression equation needs to be built.
##
## Call:
## lm(formula = total_riders ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) +
## as.factor(mnth) + as.factor(workingday) + humidity + as.factor(weathersit) +
## windspeed, data = bikeshare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3315.7 -344.3 53.7 441.3 2380.5
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.527e+03 4.770e+02 7.394 4.02e-13 ***
## poly(temp, 3, raw = TRUE)1 -2.877e+02 8.297e+01 -3.468 0.000556 ***
## poly(temp, 3, raw = TRUE)2 3.226e+01 4.500e+00 7.168 1.91e-12 ***
## poly(temp, 3, raw = TRUE)3 -6.820e-01 7.511e-02 -9.079 < 2e-16 ***
## as.factor(Promotion)1 1.964e+03 5.580e+01 35.206 < 2e-16 ***
## as.factor(mnth)2 4.200e+01 1.425e+02 0.295 0.768240
## as.factor(mnth)3 5.123e+02 1.549e+02 3.308 0.000988 ***
## as.factor(mnth)4 7.281e+02 1.721e+02 4.232 2.62e-05 ***
## as.factor(mnth)5 9.121e+02 1.939e+02 4.704 3.07e-06 ***
## as.factor(mnth)6 1.072e+03 2.184e+02 4.906 1.15e-06 ***
## as.factor(mnth)7 1.294e+03 2.413e+02 5.362 1.11e-07 ***
## as.factor(mnth)8 1.080e+03 2.232e+02 4.841 1.59e-06 ***
## as.factor(mnth)9 1.336e+03 2.005e+02 6.666 5.29e-11 ***
## as.factor(mnth)10 1.430e+03 1.727e+02 8.280 6.12e-16 ***
## as.factor(mnth)11 1.156e+03 1.564e+02 7.396 3.98e-13 ***
## as.factor(mnth)12 8.178e+02 1.468e+02 5.571 3.60e-08 ***
## as.factor(workingday)1 1.632e+02 5.889e+01 2.772 0.005720 **
## humidity -2.129e+01 2.791e+00 -7.628 7.67e-14 ***
## as.factor(weathersit)2 -4.138e+02 7.330e+01 -5.645 2.39e-08 ***
## as.factor(weathersit)3 -1.866e+03 1.863e+02 -10.018 < 2e-16 ***
## windspeed -5.457e+01 5.762e+00 -9.470 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 733.7 on 710 degrees of freedom
## Multiple R-squared: 0.8605, Adjusted R-squared: 0.8566
## F-statistic: 219 on 20 and 710 DF, p-value: < 2.2e-16
Important factors to note:
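- The R-squared value for the regression is 0.8605.
- All coefficients except the one for February (mnth 2) are statistically significant at the 1% level. February ridership is not distinguishable from the January baseline.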
The next step is to check for multicollinearity. To do so, the Variance Inflation Factor (VIF) is computed for each variable to see how strongly it is correlated with the other predictors.
The general guidelines are as follows:
VIF < 5 = Good, 5 < VIF < 10 = Possible Problem, VIF > 10 = Problem Very Likely
## GVIF Df GVIF^(1/(2*Df))
## poly(temp, 3, raw = TRUE) 20.775699 3 1.658031
## as.factor(Promotion) 1.056907 1 1.028060
## as.factor(mnth) 22.234971 11 1.151407
## as.factor(workingday) 1.018060 1 1.008990
## humidity 2.143279 1 1.463994
## as.factor(weathersit) 1.846672 2 1.165729
## windspeed 1.214038 1 1.101834
From the results above, the variables can be shown not to be redundant by looking at GVIF^(1/(2*Df)), which adjusts GVIF for the number of coefficients (Df) so that continuous and categorical variables can be compared. Because this measure is on the scale of the square root of VIF, the VIF guideline of 5 corresponds to roughly 2.24 here. All values are below 1.7, indicating that there is not a multicollinearity problem with this data.
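The scaled column can be reproduced by hand from the GVIF and Df columns. The sketch below uses only the numbers printed in the table above, which has the layout that `vif()` from the car package prints for models with multi-coefficient terms. The vector names are illustrative.

```r
# GVIF and Df values copied from the table above.
gvif <- c(temp_poly = 20.775699, Promotion = 1.056907, mnth = 22.234971,
          workingday = 1.018060, humidity = 2.143279,
          weathersit = 1.846672, windspeed = 1.214038)
dof <- c(temp_poly = 3, Promotion = 1, mnth = 11, workingday = 1,
         humidity = 1, weathersit = 2, windspeed = 1)

# Scale each GVIF by the number of coefficients in its term.
scaled <- gvif^(1 / (2 * dof))
print(round(scaled, 6))

# The scaled measure is comparable to sqrt(VIF), so compare against sqrt(5).
print(scaled < sqrt(5))
```

The raw GVIF values for the temperature polynomial and the month factor exceed 10 only because those terms carry 3 and 11 coefficients respectively, which is why the scaled column is the one to read.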
The last check is for homoscedasticity.
Based on the plot of total riders vs. temperature, the spread of the residuals increases as the temperature increases. This points to some level of heteroscedasticity in the data. This can be investigated by applying log() to the dependent variable and seeing how the R-squared changes.
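A check of this kind could be sketched as follows. The formula is taken from the `Call:` lines in the regression output; the use of base `read.csv()`, the object names, and the side-by-side residual plots are assumptions made for illustration, and the model-fitting part only runs if `bikeshare.csv` has been downloaded to the working directory.

```r
# Full model formula, as printed in the regression output.
full_formula <- total_riders ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) +
  as.factor(mnth) + as.factor(workingday) + humidity +
  as.factor(weathersit) + windspeed

# Same predictors with a log-transformed response.
log_formula <- update(full_formula, log(.) ~ .)

if (file.exists("bikeshare.csv")) {
  bikeshare <- read.csv("bikeshare.csv")
  fit_raw <- lm(full_formula, data = bikeshare)
  fit_log <- lm(log_formula, data = bikeshare)

  # A funnel shape in either plot suggests heteroscedasticity.
  par(mfrow = c(1, 2))
  plot(fitted(fit_raw), resid(fit_raw), main = "Raw response")
  plot(fitted(fit_log), resid(fit_log), main = "Log response")
}
```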
##
## Call:
## lm(formula = log(total_riders) ~ poly(temp, 3, raw = TRUE) +
## as.factor(Promotion) + as.factor(mnth) + as.factor(workingday) +
## humidity + as.factor(weathersit) + windspeed, data = bikeshare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.3236 -0.0828 0.0203 0.1233 1.0176
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.412e+00 1.852e-01 40.016 < 2e-16 ***
## poly(temp, 3, raw = TRUE)1 2.341e-02 3.222e-02 0.727 0.46771
## poly(temp, 3, raw = TRUE)2 4.306e-03 1.747e-03 2.464 0.01397 *
## poly(temp, 3, raw = TRUE)3 -1.208e-04 2.917e-05 -4.140 3.89e-05 ***
## as.factor(Promotion)1 4.393e-01 2.167e-02 20.273 < 2e-16 ***
## as.factor(mnth)2 6.617e-02 5.532e-02 1.196 0.23204
## as.factor(mnth)3 1.407e-01 6.014e-02 2.339 0.01961 *
## as.factor(mnth)4 1.974e-01 6.681e-02 2.954 0.00324 **
## as.factor(mnth)5 2.329e-01 7.530e-02 3.093 0.00206 **
## as.factor(mnth)6 2.331e-01 8.481e-02 2.748 0.00615 **
## as.factor(mnth)7 2.962e-01 9.369e-02 3.161 0.00164 **
## as.factor(mnth)8 2.247e-01 8.666e-02 2.592 0.00973 **
## as.factor(mnth)9 3.028e-01 7.784e-02 3.890 0.00011 ***
## as.factor(mnth)10 2.992e-01 6.706e-02 4.462 9.45e-06 ***
## as.factor(mnth)11 4.008e-01 6.072e-02 6.600 8.03e-11 ***
## as.factor(mnth)12 2.551e-01 5.701e-02 4.474 8.92e-06 ***
## as.factor(workingday)1 6.377e-02 2.287e-02 2.788 0.00544 **
## humidity -6.441e-03 1.084e-03 -5.942 4.41e-09 ***
## as.factor(weathersit)2 -8.775e-02 2.846e-02 -3.083 0.00213 **
## as.factor(weathersit)3 -9.274e-01 7.234e-02 -12.820 < 2e-16 ***
## windspeed -1.749e-02 2.238e-03 -7.816 1.97e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2849 on 710 degrees of freedom
## Multiple R-squared: 0.768, Adjusted R-squared: 0.7615
## F-statistic: 117.5 on 20 and 710 DF, p-value: < 2.2e-16
Before applying log to the dependent variable, the R-squared value was 0.8605. After applying log to flatten the residuals, the R-squared drops to 0.768. The two values are measured on different response scales, so the comparison is only a rough guide, but on this basis the non-log version of the regression model is treated as the better fit. The regression will therefore not use the log transform and will retain some level of heteroscedasticity. The regression equation is as follows:
Total riders = 3527 - 287.7 * temp + 32.26 * temp^2 - 0.682 * temp^3 + promotion + month + workingday - 21.29 * humidity + weathersit - 54.57 * windspeed
The terms for promotion, month, workingday, and weathersit are categorical: each contributes the coefficient for the level that applies on a given day, and zero for the baseline level. For example, if promotion = 1 (true), the month is June, workingday = 1 (true), and the weathersit variable indicates mist and cloudy (2), then the equation becomes:
Total riders = 3527 - 287.7 * temp + 32.26 * temp^2 - 0.682 * temp^3 + 1964 + 1072 + 163.2 - 21.29 * humidity - 413.8 - 54.57 * windspeed
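The way the categorical terms switch in and out can be sketched as a small function. The coefficients are copied from the full regression output; `predict_riders` and its argument names are illustrative, and the input values in the example call are hypothetical rather than taken from the data.

```r
# Coefficients copied from the full regression output above.
month_coef <- c(0, 42.00, 512.3, 728.1, 912.1, 1072, 1294, 1080,
                1336, 1430, 1156, 817.8)   # January is the baseline (0)
weather_coef <- c(0, -413.8, -1866)        # weathersit 1 is the baseline (0)

# promotion and workingday are 0 or 1; month is 1..12; weathersit is 1..3.
predict_riders <- function(temp, humidity, windspeed,
                           promotion, month, workingday, weathersit) {
  3527 - 287.7 * temp + 32.26 * temp^2 - 0.682 * temp^3 +
    1964 * promotion + month_coef[month] + 163.2 * workingday -
    21.29 * humidity + weather_coef[weathersit] - 54.57 * windspeed
}

# Hypothetical inputs: a promotional working day in June with mist and clouds.
example <- predict_riders(temp = 20, humidity = 50, windspeed = 10,
                          promotion = 1, month = 6, workingday = 1,
                          weathersit = 2)
print(example)
```

Calling the function twice with `promotion = 0` and `promotion = 1` and all other arguments equal always gives a difference of 1964, which is what "holding everything else constant" means for this coefficient.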
While generating the regression model, linearity, multicollinearity, and homoscedasticity were all evaluated to understand what changes were required in the model.
The initial third-order polynomial showed that a purely linear fit was inadequate and that the regression model needed to reflect curvilinear behavior.
While running the variance inflation factor test, it was noticed that a few variables had high collinearity with each other. The first thing noticed was that the season variable is represented within the month variable. When both season and month were included in the regression, the month variable had very little significance and both variables had very high VIF values. When the season variable was removed, the month variable became very significant. Therefore, the season variable was dropped and the month variable was kept.
The same scenario played out with the workingday, weekday, and holiday variables. The weekday and holiday variables are captured within the workingday variable, so they were removed from the equation, which resolved the multicollinearity issues.
Lastly, some heteroscedasticity remains in the final model, because applying log to the dependent variable reduced the fit of the regression and the transformation was not adopted.
Question: Using your regression output, which month has the highest number of riders, holding everything else constant? If this month became unseasonably cold and rainy, would it change the coefficient on this month in any way?
To answer these questions, we use the coefficients of the full regression model shown above.
To answer the first question, we need to find the month with the largest coefficient. Each month coefficient measures the difference in riders relative to the baseline month, January, with all other variables held constant.
In this case, October (10) has the highest estimated number of riders, about 1430 more per day than January, assuming all other independent variables remain constant.
To answer the second question, if the month were unseasonably cold and rainy, this coefficient would remain the same. The effect of the cold and rain would enter the prediction through the temperature and weathersit terms instead and, as demonstrated earlier with the multicollinearity test, the variables for month, weathersit, and windspeed are not likely to be redundant with each other.
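The month comparison can be reproduced directly from the printed coefficients. In this sketch the vector `month_coef` is copied from the regression output, with January set to zero because it is the baseline level.

```r
# Month coefficients from the full regression output (January = baseline).
month_coef <- c("1" = 0, "2" = 42.00, "3" = 512.3, "4" = 728.1, "5" = 912.1,
                "6" = 1072, "7" = 1294, "8" = 1080, "9" = 1336, "10" = 1430,
                "11" = 1156, "12" = 817.8)

# Month with the largest coefficient, all else held constant.
best_month <- names(which.max(month_coef))
print(best_month)

# The three highest months, for context.
print(sort(month_coef, decreasing = TRUE)[1:3])
```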
Question: Interpret (in simple terms) the coefficient on your “promotion” variable and make an initial judgement on the claims of the marketing department based on your analysis
In the full regression model, the promotion coefficient indicates that an active promotion increases the total number of riders by about 1964 per day, assuming that all other variables remain the same. The promotion variable also has the highest t-value (35.2), which means this effect is estimated with high precision and is clearly distinguishable from zero. Therefore, if the marketing department’s goal was to increase total ridership, then they could claim success.
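One way to express the precision of this estimate is a 95% confidence interval. The sketch below builds it from the estimate, standard error, and residual degrees of freedom printed in the regression output; the interval itself is not part of the original analysis.

```r
# Values copied from the full regression output.
estimate  <- 1964    # as.factor(Promotion)1 estimate
std_error <- 55.80   # its standard error
resid_df  <- 710     # residual degrees of freedom

# Two-sided 95% interval based on the t distribution.
t_crit <- qt(0.975, df = resid_df)
ci <- estimate + c(-1, 1) * t_crit * std_error
print(round(ci, 1))
```

The whole interval lies far above zero, which supports reading the promotion effect as real rather than as noise.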
This analysis looks at whether the BikeShare promotion by the Toronto marketing department was more effective for casual riders or for registered riders.
To do this, a separate regression equation is needed for casual riders and for registered riders.
First, the casual riders:
##
## Call:
## lm(formula = casual ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) +
## as.factor(mnth) + as.factor(workingday) + humidity + as.factor(weathersit) +
## windspeed, data = bikeshare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -989.3 -213.5 -32.7 170.4 1407.8
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1572.43944 224.86833 6.993 6.23e-12 ***
## poly(temp, 3, raw = TRUE)1 -129.15715 39.11434 -3.302 0.001008 **
## poly(temp, 3, raw = TRUE)2 11.92396 2.12137 5.621 2.73e-08 ***
## poly(temp, 3, raw = TRUE)3 -0.23563 0.03541 -6.654 5.70e-11 ***
## as.factor(Promotion)1 274.50084 26.30509 10.435 < 2e-16 ***
## as.factor(mnth)2 -10.27907 67.16271 -0.153 0.878405
## as.factor(mnth)3 295.37385 73.01536 4.045 5.80e-05 ***
## as.factor(mnth)4 354.74890 81.11679 4.373 1.41e-05 ***
## as.factor(mnth)5 331.57840 91.41402 3.627 0.000307 ***
## as.factor(mnth)6 251.89297 102.96989 2.446 0.014676 *
## as.factor(mnth)7 335.36537 113.74496 2.948 0.003299 **
## as.factor(mnth)8 227.18498 105.21576 2.159 0.031167 *
## as.factor(mnth)9 245.88313 94.50352 2.602 0.009466 **
## as.factor(mnth)10 290.35640 81.42005 3.566 0.000387 ***
## as.factor(mnth)11 175.04438 73.72101 2.374 0.017842 *
## as.factor(mnth)12 58.50001 69.20988 0.845 0.398253
## as.factor(workingday)1 -805.06827 27.76541 -28.995 < 2e-16 ***
## humidity -7.07425 1.31592 -5.376 1.03e-07 ***
## as.factor(weathersit)2 -82.92774 34.55773 -2.400 0.016666 *
## as.factor(weathersit)3 -307.22447 87.81911 -3.498 0.000497 ***
## windspeed -18.33012 2.71669 -6.747 3.13e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 345.9 on 710 degrees of freedom
## Multiple R-squared: 0.7532, Adjusted R-squared: 0.7462
## F-statistic: 108.3 on 20 and 710 DF, p-value: < 2.2e-16
From the regression output for casual riders, holding all other variables constant, a promotional day adds about 274.5 casual riders. A box plot of casual riders by promotion status also shows this graphically.
The number of casual riders increases by about 275 on promotional days.
##
## Call:
## lm(formula = registered ~ poly(temp, 3, raw = TRUE) + as.factor(Promotion) +
## as.factor(mnth) + as.factor(workingday) + humidity + as.factor(weathersit) +
## windspeed, data = bikeshare)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3188.0 -255.8 43.6 378.8 1669.2
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.954e+03 3.951e+02 4.947 9.44e-07 ***
## poly(temp, 3, raw = TRUE)1 -1.586e+02 6.873e+01 -2.307 0.021322 *
## poly(temp, 3, raw = TRUE)2 2.033e+01 3.727e+00 5.455 6.78e-08 ***
## poly(temp, 3, raw = TRUE)3 -4.464e-01 6.222e-02 -7.174 1.84e-12 ***
## as.factor(Promotion)1 1.690e+03 4.622e+01 36.563 < 2e-16 ***
## as.factor(mnth)2 5.228e+01 1.180e+02 0.443 0.657914
## as.factor(mnth)3 2.169e+02 1.283e+02 1.691 0.091316 .
## as.factor(mnth)4 3.734e+02 1.425e+02 2.620 0.008989 **
## as.factor(mnth)5 5.805e+02 1.606e+02 3.614 0.000323 ***
## as.factor(mnth)6 8.197e+02 1.809e+02 4.531 6.90e-06 ***
## as.factor(mnth)7 9.584e+02 1.999e+02 4.795 1.98e-06 ***
## as.factor(mnth)8 8.532e+02 1.849e+02 4.615 4.67e-06 ***
## as.factor(mnth)9 1.090e+03 1.660e+02 6.566 9.97e-11 ***
## as.factor(mnth)10 1.140e+03 1.431e+02 7.966 6.53e-15 ***
## as.factor(mnth)11 9.814e+02 1.295e+02 7.577 1.11e-13 ***
## as.factor(mnth)12 7.593e+02 1.216e+02 6.244 7.33e-10 ***
## as.factor(workingday)1 9.683e+02 4.879e+01 19.848 < 2e-16 ***
## humidity -1.422e+01 2.312e+00 -6.149 1.30e-09 ***
## as.factor(weathersit)2 -3.309e+02 6.072e+01 -5.449 6.99e-08 ***
## as.factor(weathersit)3 -1.559e+03 1.543e+02 -10.103 < 2e-16 ***
## windspeed -3.624e+01 4.773e+00 -7.592 9.95e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 607.8 on 710 degrees of freedom
## Multiple R-squared: 0.8524, Adjusted R-squared: 0.8483
## F-statistic: 205.1 on 20 and 710 DF, p-value: < 2.2e-16
From the regression output for registered riders, holding all other variables constant, a promotional day adds about 1690 registered riders.
For registered riders, the plotted data shows an increase that matches the regression.
Comparing the two models, the promotion draws far more registered riders than casual riders in absolute terms (about 1690 versus 275 additional riders per day), which together match the increase of about 1964 in total riders.
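The comparison can be checked numerically. This sketch uses the promotion coefficients from the casual, registered, and total-rider regressions; the share calculation is an illustration and not part of the original analysis.

```r
# Promotion coefficients from the casual and registered regressions.
promo_effect <- c(casual = 274.50084, registered = 1690)

# Promotion coefficient from the total-riders regression.
total_effect <- 1964

# The two group effects should add up to roughly the total effect.
print(sum(promo_effect))

# Share of the additional riders contributed by each group.
print(round(promo_effect / sum(promo_effect), 3))
```

Note that this compares absolute increases only. A relative comparison would also need the baseline ridership of each group on non-promotional days, which is not reported here.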
This question is about what additional pieces of information would be needed to determine whether the Toronto BikeShare program can be considered a financial success.
The first piece of information needed is the standard daily price that riders pay, given that registered users pay 50% less than that price on a non-promotional day. This determines how much revenue is gained or lost as a result of the promotional program.
Additionally, it would be useful to know how many registered members there are in total, relative to the number of registered riders taking advantage of the promotional days. Registered riders already pay a monthly fee, so additional revenue may not actually be generated from those riders.
A third set of information is the cost of implementing this type of program. This includes costs such as damage from casual riders, who are assumed to be less careful with equipment that they use less often than registered riders do, as well as the administrative and infrastructure costs of running the promotional days.
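To show why the daily price and baseline ridership matter, the following sketch estimates the change in daily rental revenue on a promotional day. The fee fractions come from the pricing model described at the start of this document and the uplifts from the casual and registered regressions. The function `revenue_change`, the fee, and the baseline rider counts are hypothetical placeholders, and monthly membership fees and program costs are ignored.

```r
# Change in daily rental revenue on a promotional day versus a normal day.
# fee: standard daily rate; base_*: riders on a comparable non-promotional day.
revenue_change <- function(fee, base_casual, base_registered,
                           uplift_casual = 274.5, uplift_registered = 1690) {
  # Normal day: casual riders pay 100%, registered riders pay 50%.
  normal <- fee * (1.0 * base_casual + 0.5 * base_registered)
  # Promotional day: casual riders pay 70%, registered riders pay 20%.
  promo <- fee * (0.7 * (base_casual + uplift_casual) +
                  0.2 * (base_registered + uplift_registered))
  promo - normal
}

# Hypothetical fee and baseline ridership, for illustration only.
change <- revenue_change(fee = 10, base_casual = 1000, base_registered = 3000)
print(change)
```

Whether the result is positive or negative depends entirely on the baseline values supplied, which is why those figures are needed before the promotion can be judged a financial success.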