This document analyzes data from a bike sharing program.
1. You want to build the best regression model possible for the dependent variable, total riders. Begin with the example from class where we fit total riders as a function of temperature using a third-degree polynomial. Add as many additional variables to your model as feasible to improve fit. Remember, your goal is to build the best fitting regression model explaining total ridership, using the tools we have covered regarding the linearity and multicollinearity assumptions.
Specify your generalized regression equation, the regression output (with coefficients, standard errors, etc.), and a summary of your work.
I considered all of the available variables for this question. The final model achieves a high adjusted R-squared of 0.881 (multiple R-squared 0.8834), with an overall F-test p-value below 0.001. The final regression equation is:
Total = 1664.02 - 334.75(temp) + 35.58(temp^2) - 0.74(temp^3) + 1052.12(season) - 115.41(season^2) - 22.37(mnth) + 74.11(weekday) + 1150.24(weathersit) - 527.70(weathersit^2) - 20.62(humidity) + 7.05(windspeed) - 1.98(windspeed^2) + 1966.87(as.factor(Promotion)) - 422.83(as.factor(holiday)) + 92.60(as.factor(workingday))
##
## Call:
## lm(formula = total ~ temp + I(temp * temp) + I(temp * temp *
## temp) + season + I(season * season) + mnth + weekday + weathersit +
## I(weathersit * weathersit) + humidity + windspeed + I(windspeed *
## windspeed) + as.factor(Promotion) + as.factor(holiday) +
## as.factor(workingday), data = bikeshare)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -3136.88  -319.64    44.84   416.70  2524.02
##
## Coefficients:
##                              Estimate Std. Error t value Pr(>|t|)
## (Intercept)                1664.01926  492.92459   3.376 0.000776 ***
## temp                       -334.75051   65.63295  -5.100 4.35e-07 ***
## I(temp * temp)               35.58087    3.51831  10.113  < 2e-16 ***
## I(temp * temp * temp)        -0.74325    0.05876 -12.649  < 2e-16 ***
## season                     1052.11956  209.00762   5.034 6.09e-07 ***
## I(season * season)         -115.41299   40.15527  -2.874 0.004171 **
## mnth                        -22.36875   13.34053  -1.677 0.094028 .
## weekday                      74.11407   12.48584   5.936 4.56e-09 ***
## weathersit                 1150.23585  295.37537   3.894 0.000108 ***
## I(weathersit * weathersit) -527.70222   87.08347  -6.060 2.21e-09 ***
## humidity                    -20.61944    2.46696  -8.358 3.31e-16 ***
## windspeed                     7.04873   20.19077   0.349 0.727112
## I(windspeed * windspeed)     -1.98079    0.67161  -2.949 0.003289 **
## as.factor(Promotion)1      1966.87134   50.53020  38.925  < 2e-16 ***
## as.factor(holiday)1        -422.82983  154.09659  -2.744 0.006223 **
## as.factor(workingday)1       92.60230   55.23083   1.677 0.094050 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 668.4 on 715 degrees of freedom
## Multiple R-squared: 0.8834, Adjusted R-squared: 0.881
## F-statistic: 361.2 on 15 and 715 DF, p-value: < 2.2e-16
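One way to read the temperature part of the fitted equation is to ask where the cubic reaches its local maximum, holding the other variables fixed. The sketch below only uses the three temperature coefficients reported in the output above; the variable names (`b1`, `b2`, `b3`, `peak_temp`) are illustrative, and the result is in whatever units `temp` is recorded in.

```r
# Temperature coefficients copied from the regression output above.
b1 <- -334.75051   # temp
b2 <-   35.58087   # temp^2
b3 <-   -0.74325   # temp^3

# The derivative of b1*t + b2*t^2 + b3*t^3 is b1 + 2*b2*t + 3*b3*t^2.
# polyroot() expects coefficients in increasing order of power.
turning_points <- sort(Re(polyroot(c(b1, 2 * b2, 3 * b3))))

# The second derivative, 2*b2 + 6*b3*t, is negative at a local maximum.
second_deriv <- 2 * b2 + 6 * b3 * turning_points
peak_temp <- turning_points[second_deriv < 0]

print(turning_points)
print(peak_temp)
```

Under these coefficients the temperature contribution peaks at a value of roughly 26 and declines on either side of it, which is consistent with ridership falling off on very hot days.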
2. Regarding your model from Q1, explain any problems you encountered with the assumptions of multicollinearity, linearity, or homoscedasticity in this regression and how you solved them.
Multicollinearity means that two or more explanatory variables in a multiple regression model are highly linearly related. Intuitively, total ridership is related to temperature, season, and windspeed. I tried several combinations of variables and ended up including every variable that I believed to be related to total riders, keeping the third-degree polynomial in temperature. The equation obtained in Q1 is the best performing of the combinations I tried.
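A common numerical check for multicollinearity is the variance inflation factor (VIF). The sketch below is not part of the analysis above: it computes VIFs in base R on a small synthetic data frame (`demo`, made up for illustration, not the bikeshare data), and the helper name `manual_vif` is hypothetical. It also shows why raw polynomial terms tend to produce large VIFs, and how centering before squaring can reduce them.

```r
# VIF for each predictor: regress it on the other predictors,
# then compute 1 / (1 - R^2) from that auxiliary regression.
manual_vif <- function(data, predictors) {
  sapply(predictors, function(p) {
    others <- setdiff(predictors, p)
    aux <- lm(reformulate(others, response = p), data = data)
    1 / (1 - summary(aux)$r.squared)
  })
}

# Synthetic stand-in data (NOT the bikeshare data), for illustration only.
set.seed(1)
n <- 200
demo <- data.frame(temp = runif(n, 0, 35), humidity = runif(n, 20, 100))
demo$temp2   <- demo$temp^2                       # raw squared term
demo$temp_c2 <- (demo$temp - mean(demo$temp))^2   # centered squared term

vif_raw      <- manual_vif(demo, c("temp", "temp2", "humidity"))
vif_centered <- manual_vif(demo, c("temp", "temp_c2", "humidity"))

print(round(vif_raw, 1))
print(round(vif_centered, 1))
```

Applied to the real data, the same helper could be called with the predictor names from the Q1 model. High VIFs on polynomial terms are expected and are usually less of a concern than high VIFs between distinct variables such as `season` and `mnth`.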
Linearity means that the relationship between each explanatory variable and the response is linear. For some variables the linearity is easy to see, but for others it is not, so I examined each variable's relationship with total riders one at a time by plotting the two against each other.
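A visual check can be complemented by a nested-model F test that compares a straight-line fit with a polynomial fit. The sketch below uses synthetic data with a deliberately curved relationship (the numbers are invented and are not the bikeshare data); only base R functions are used.

```r
# Synthetic stand-in data with a curved relationship (NOT the bikeshare data).
set.seed(2)
n <- 300
temp  <- runif(n, 0, 35)
total <- 1000 + 300 * temp - 6 * temp^2 + rnorm(n, sd = 300)
demo  <- data.frame(temp, total)

fit_linear <- lm(total ~ temp, data = demo)
fit_cubic  <- lm(total ~ temp + I(temp^2) + I(temp^3), data = demo)

# Nested-model F test: a small p-value suggests that the straight line
# misses curvature that the polynomial terms pick up.
comparison <- anova(fit_linear, fit_cubic)
print(comparison)

p_value <- comparison[["Pr(>F)"]][2]
```

The same comparison could be run for each variable that received a squared term in Q1 to confirm that the extra term is justified.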
Homoscedasticity means that the error terms have the same finite variance. Its violation (heteroscedasticity) does not bias the coefficient estimates, although it can make the standard errors unreliable. I also checked this assumption visually, in the same way we did in class.
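Alongside the visual check, a Breusch-Pagan style test gives a numerical read on whether the residual variance depends on the predictors. The sketch below implements the studentized (Koenker) form by hand in base R; the helper name `bp_check` is hypothetical, and the data are synthetic, built so that the error spread grows with `x`.

```r
# Breusch-Pagan style check (studentized form):
# regress squared residuals on the predictors; under constant variance,
# n * R^2 of that auxiliary regression is roughly chi-square with k df.
bp_check <- function(fit) {
  X  <- model.matrix(fit)[, -1, drop = FALSE]   # predictors, no intercept
  u2 <- residuals(fit)^2
  aux_r2 <- summary(lm(u2 ~ X))$r.squared
  stat <- length(u2) * aux_r2
  c(statistic = stat,
    df = ncol(X),
    p.value = pchisq(stat, df = ncol(X), lower.tail = FALSE))
}

# Synthetic stand-in data (NOT the bikeshare data): noise grows with x.
set.seed(5)
n <- 300
x <- runif(n, 1, 10)
y <- 2 + 3 * x + rnorm(n, sd = 0.5 * x)

result <- bp_check(lm(y ~ x))
print(result)
```

A small p-value points to heteroscedasticity. In that case the coefficient estimates remain usable, but robust standard errors would be a safer basis for the t tests.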
3. Your model from Q1 should include some means of assessing the impact the month of the year has on total ridership. Using your regression output, which month has the highest number of riders, holding everything else constant? If this month became unseasonably cold and rainy, would it change the coefficient on this month in any way?
July would be the month with the highest number of riders. The coefficient on this month would not change if the month became unseasonably cold and rainy: such weather is unusual and will not happen every year, and its effect is captured by the temperature and weather variables rather than by the month term. The number of riders in that month would, however, decrease substantially.
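In the Q1 output, `mnth` enters as a single numeric term, so the model estimates one slope (about -22.37 riders per month) rather than a separate effect for each month. One way to compare months individually is to code month as a factor. The sketch below shows the mechanics on synthetic data whose peak month (9) is an arbitrary choice made only so the example has a known answer; it says nothing about the real bikeshare data.

```r
# Synthetic stand-in data (NOT the bikeshare data).
# The peak month below is arbitrary and purely hypothetical.
set.seed(3)
peak <- 9
demo <- data.frame(mnth = rep(1:12, each = 30))
demo$total <- 7000 - 100 * (demo$mnth - peak)^2 + rnorm(nrow(demo), sd = 50)

# Coding month as a factor gives each month its own coefficient,
# measured relative to the reference level (month 1).
fit <- lm(total ~ as.factor(mnth), data = demo)

month_effects <- c(0, unname(coef(fit)[-1]))   # month 1 has effect 0
names(month_effects) <- 1:12

best_month <- as.integer(names(which.max(month_effects)))
print(round(month_effects))
print(best_month)
```

With the real data, the other Q1 predictors would stay in the formula so that each month coefficient is interpreted with everything else held constant.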
4. Interpret (in simple terms) the coefficient on your “promotion” variable and make an initial judgement on the claims of the marketing department based on your analysis.
The coefficient on Promotion is 1966.87, which means that, holding the other variables constant, days with the promotion have about 1,967 more total riders than days without it. The effect is positive and statistically significant (p < 2e-16). As an initial judgement, I conclude that the promotion contributed to total ridership, which suggests that it was successful.
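To give a sense of the uncertainty around the promotion effect, a 95% confidence interval can be computed directly from the estimate, standard error, and residual degrees of freedom reported in the Q1 output. The variable names below are illustrative.

```r
# Values copied from the regression output in Q1.
est      <- 1966.87134   # coefficient on as.factor(Promotion)1
se       <- 50.53020     # its standard error
df_resid <- 715          # residual degrees of freedom

# Two-sided 95% interval: estimate +/- t critical value * standard error.
t_crit <- qt(0.975, df_resid)
ci <- est + c(-1, 1) * t_crit * se

print(round(ci, 1))
```

The interval lies well above zero, roughly between 1,870 and 2,070 additional riders per promotion day, which agrees with the very small p-value on this coefficient.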
5. You suspect the promotion might have influenced casual riders differently than the registered riders. Perform some type of analysis that allows you to assess if the program had a more substantial impact on the casual riders or the registered riders. What is your conclusion, and why? Include any data or screenshots to back up your claim.
I visualized the two types of riders against the Promotion variable using boxplots. From the figures, I observe that the promotion boosts both types of riders, but it clearly has a larger effect on registered riders than on casual riders. This makes sense, since the promotion benefits registered riders more.
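The boxplot comparison can be backed up by fitting one regression per rider type and comparing the promotion effect in both absolute and relative terms. The sketch below assumes the data has columns named `casual` and `registered` (an assumption; adjust to the actual column names), and it runs on synthetic numbers invented for the demonstration, so its output says nothing about the real promotion.

```r
# Synthetic stand-in data (NOT the bikeshare data); all numbers are invented.
set.seed(4)
n <- 400
demo <- data.frame(Promotion = rep(c(0, 1), each = n / 2))
demo$casual     <- 600  + 400  * demo$Promotion + rnorm(n, sd = 100)
demo$registered <- 3000 + 1200 * demo$Promotion + rnorm(n, sd = 300)

# For one rider type: baseline (no promotion), absolute lift, relative lift.
lift <- function(rider_col) {
  fit <- lm(reformulate("as.factor(Promotion)", response = rider_col),
            data = demo)
  baseline <- unname(coef(fit)[1])
  absolute <- unname(coef(fit)[2])
  c(baseline = baseline, absolute = absolute, relative = absolute / baseline)
}

lifts <- sapply(c("casual", "registered"), lift)
print(round(lifts, 2))
```

Looking at both rows matters because registered riders start from a much higher baseline: a larger absolute increase does not by itself imply a larger proportional increase. With the real data, the remaining Q1 predictors could be added to each formula.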
6. With your analysis from questions 4 and 5 now in hand, you are prepared to report on the promotion’s influence on ridership. However, you lack some information required to make a meaningful report on whether the promotion was a financial success or a failure. What additional information (from a business perspective) do you need to accurately make such a conclusion?
For a new business project, the first key to success is generally popularity: the public needs to know about the product and be willing to use it. Based on the conclusions above, the project has succeeded in this first step, gaining popularity through the promotion. Most importantly, however, the project needs to be financially rewarding, which is the essential goal of a business project. To know whether the project is financially beneficial, we also need total sales on all days (with and without the promotion), broken down by membership fees, one-time riding fees, etc., in order to decide whether the project is financially feasible for investors.