This data set contains daily totals of bicycle counts on the Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge. The counts are conducted monthly to track cyclists entering and leaving Queens, Manhattan, and Brooklyn via the East River bridges, and the count data are collected by the Traffic Information Management System (TIMS). Each record represents the total number of cyclists over a 24-hour period. The data set contains seven variables: Date, Day, HighTemp, LowTemp, Precipitation, ManhattanBridge, and Total.
The objective of this case study is to examine the relationship between relevant predictor variables in the data set and the number of cyclists who rode on the Manhattan Bridge on a given day. We also examine the relationship between day of the week and the proportion of that day's total cyclists, across all bridges in the data set, who rode on the Manhattan Bridge. We use a Poisson regression model because the primary response is a count. The two responses of interest are therefore the daily count of cyclists on the Manhattan Bridge and the Manhattan Bridge's share of the daily total count of cyclists.
To begin the analysis, we evaluate the variables in the data set and choose which of them can be used to build the model.
| Date | Day | HighTemp | LowTemp | Precipitation | ManhattanBridge | Total |
|---|---|---|---|---|---|---|
| 2023-07-01 | Saturday | 84.9 | 72.0 | 0.23 | 2958 | 11867 |
| 2023-07-02 | Sunday | 87.1 | 73.0 | 0.00 | 3776 | 13995 |
| 2023-07-03 | Monday | 87.1 | 71.1 | 0.45 | 4199 | 16067 |
| 2023-07-04 | Tuesday | 82.9 | 70.0 | 0.00 | 4084 | 13925 |
| 2023-07-05 | Wednesday | 84.9 | 71.1 | 0.00 | 6770 | 23110 |
| 2023-07-06 | Thursday | 75.0 | 71.1 | 0.00 | 6243 | 21861 |
After an initial examination of the data set, the variable Date is not needed for the initial Poisson regression, so we omit it. We also check whether the mean of the response (ManhattanBridge) equals its variance. The R output below prints the mean (about 5,425) followed by the variance (about 2,453,414). The variance is roughly 452 times the mean, so the counts are strongly overdispersed.
## [1] 5424.613
## [1] 2453414
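The same equidispersion check can be sketched in Python. Because the full data set is not reproduced here, the sketch uses only the six July 2023 rows shown in the table above, so its numbers differ from the R output; it also recomputes the ratio for the full-data values R printed. `statistics.variance` uses an n - 1 denominator, matching R's `var()`.

```python
import statistics

# ManhattanBridge counts from the six sample rows shown above (2023-07-01 to 2023-07-06)
manhattan = [2958, 3776, 4199, 4084, 6770, 6243]

sample_mean = statistics.mean(manhattan)
sample_var = statistics.variance(manhattan)  # n - 1 denominator, like R's var()

# Under a Poisson model the variance/mean ratio should be close to 1
print(f"sample: mean = {sample_mean:.1f}, variance = {sample_var:.1f}, "
      f"ratio = {sample_var / sample_mean:.1f}")

# Same ratio for the full-data mean and variance printed by R above
full_ratio = 2453414 / 5424.613
print(f"full data: variance/mean ratio = {full_ratio:.1f}")
```

In both cases the ratio is far above 1, which is the overdispersion described above.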
We first build a Poisson regression model for the counts, using the final data set created during the exploratory data analysis. A Poisson regression model makes one very strong assumption: the variance of the response equals its mean. The data clearly violate this assumption, as shown above, but we still proceed with the analysis and revisit the issue later with a quasi-Poisson model.
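The R warnings later in this analysis come from `dpois()`, which indicates the models were fit with R's `glm(..., family = poisson)`. That function fits the model by iteratively reweighted least squares (IRLS). The sketch below shows that algorithm in Python with NumPy on simulated data, because the bridge data are not reproduced here. The function name `fit_poisson_irls` and the simulation settings are illustrative; this is not the code that produced the table that follows.

```python
import numpy as np


def fit_poisson_irls(X, y, offset=None, max_iter=100, tol=1e-10):
    """Fit a log-link Poisson GLM by iteratively reweighted least squares (IRLS).

    X      : (n, p) design matrix that already contains an intercept column
    y      : (n,) non-negative counts
    offset : optional (n,) known term added to the linear predictor, e.g. log(Total)
    Returns (coefficients, model-based standard errors, fitted means).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    offset = np.zeros(len(y)) if offset is None else np.asarray(offset, dtype=float)

    mu = y + 0.1                 # positive starting values for the fitted means
    eta = np.log(mu)             # linear predictor on the log scale
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = (eta - offset) + (y - mu) / mu   # working response with the offset removed
        w = mu                               # IRLS weights for the log link
        XtW = X.T * w                        # scales column i of X.T by w[i]
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new + offset
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break

    # Poisson standard errors: sqrt of the diagonal of (X' W X)^{-1}.
    # These assume variance == mean and are too small for overdispersed data.
    se = np.sqrt(np.diag(np.linalg.inv((X.T * mu) @ X)))
    return beta, se, mu


# Usage on simulated data (the real bridge data are not reproduced here)
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
true_beta = np.array([1.0, 0.5])
y = rng.poisson(np.exp(X @ true_beta))

beta_hat, se_hat, mu_hat = fit_poisson_irls(X, y)
print("estimates:", beta_hat.round(3), "std. errors:", se_hat.round(4))
```

The standard errors come from the Poisson variance function, which is why the violated mean-equals-variance assumption matters for the p-values below.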
| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 8.6345244 | 0.0440211 | 196.14501 | 0 |
| DayMonday | 0.1867363 | 0.0100703 | 18.54330 | 0 |
| DaySaturday | -0.2190999 | 0.0106321 | -20.60747 | 0 |
| DaySunday | -0.1331873 | 0.0107366 | -12.40503 | 0 |
| DayThursday | 0.1475508 | 0.0105831 | 13.94206 | 0 |
| DayTuesday | 0.2026021 | 0.0104875 | 19.31840 | 0 |
| DayWednesday | 0.2691229 | 0.0102750 | 26.19201 | 0 |
| HighTemp | 0.0128613 | 0.0007319 | 17.57245 | 0 |
| LowTemp | -0.0157873 | 0.0010597 | -14.89824 | 0 |
| Precipitation | -0.4307477 | 0.0104214 | -41.33284 | 0 |
The variables Day, HighTemp, LowTemp, and Precipitation all have significant p-values in the Poisson regression model. For Day, this suggests a statistically significant difference in cyclist counts between the baseline level (Friday) and each of the other days. Because the model uses a log link, each coefficient is a change in the log of the expected count, and exponentiating it gives a multiplicative effect. For example, the Saturday coefficient of -0.219 means the expected count on Saturday is exp(-0.219) ≈ 0.80 times the Friday count (about 20% lower), holding the weather variables constant. The other days are interpreted the same way. For Precipitation, each one-unit increase multiplies the expected count by exp(-0.431) ≈ 0.65, a decrease of about 35%. For HighTemp, each one-degree increase multiplies the expected count by exp(0.0129) ≈ 1.013, an increase of about 1.3%. For LowTemp, each one-degree increase multiplies the expected count by exp(-0.0158) ≈ 0.984, a decrease of about 1.6%. In each case the other variables are held constant.
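The conversion from log-scale coefficients to multiplicative rate ratios can be sketched as follows. The coefficient values are copied from the Poisson count-model table above. Only exponentiation is involved, so no model is refit.

```python
import math

# Poisson coefficients (log scale) copied from the count-model table above
coefs = {
    "DayMonday": 0.1867363,
    "DaySaturday": -0.2190999,
    "DaySunday": -0.1331873,
    "DayThursday": 0.1475508,
    "DayTuesday": 0.2026021,
    "DayWednesday": 0.2691229,
    "HighTemp": 0.0128613,
    "LowTemp": -0.0157873,
    "Precipitation": -0.4307477,
}

# exp(beta) is the multiplicative change in the expected count:
# relative to Friday for Day levels, per one-unit increase for numeric predictors
rate_ratios = {name: math.exp(b) for name, b in coefs.items()}

for name, rr in rate_ratios.items():
    print(f"{name:14s} rate ratio = {rr:.3f} ({(rr - 1) * 100:+.1f}%)")
```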
Because all of the predictor variables in this model are significant, we take it as the final model for the count of cyclists on the Manhattan Bridge. However, the key assumption that the mean of the response equals its variance is violated. When the data are overdispersed, the Poisson standard errors are too small, so these p-values overstate the evidence. A quasi-Poisson model is therefore needed to analyze this data set more reliably.
The following model assesses the relationship between the predictor variables from the final model above and the proportion of cyclists who rode on the Manhattan Bridge, which is the primary interest of this part of the analysis. We first define a new response variable: the Manhattan Bridge count divided by that day's total count. Because this proportion is not an integer, R's Poisson family emits a series of `non-integer x` warnings from `dpois()` when the model is fit.
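The new response variable can be sketched in Python for the six sample rows from the first table. Every value lies strictly between 0 and 1. This is why a Poisson likelihood, which is defined only for non-negative integers, produces warnings when the proportion is used as the response.

```python
# Sample rows from the table above: (ManhattanBridge, Total)
rows = [(2958, 11867), (3776, 13995), (4199, 16067),
        (4084, 13925), (6770, 23110), (6243, 21861)]

# New response: share of all counted cyclists who used the Manhattan Bridge
proportions = [manhattan / total for manhattan, total in rows]
print([round(p, 6) for p in proportions])
```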
| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -10.8490421 | 6.0861791 | -1.7825703 | 0.0746563 |
| DayMonday | -0.0539290 | 1.3610238 | -0.0396238 | 0.9683930 |
| DaySaturday | 0.1118994 | 1.3723120 | 0.0815408 | 0.9350119 |
| DaySunday | 0.1533927 | 1.4322407 | 0.1070998 | 0.9147098 |
| DayThursday | -0.1297518 | 1.4866392 | -0.0872786 | 0.9304501 |
| DayTuesday | -0.0927055 | 1.4768788 | -0.0627712 | 0.9499487 |
| DayWednesday | -0.1856679 | 1.4633411 | -0.1268794 | 0.8990358 |
| HighTemp | -0.0170374 | 0.1022715 | -0.1665900 | 0.8676927 |
| LowTemp | 0.0163443 | 0.1459438 | 0.1119905 | 0.9108309 |
| Precipitation | 0.2927874 | 1.0863862 | 0.2695058 | 0.7875405 |
When the full model is fit for the proportion of cyclists on the Manhattan Bridge, none of the predictor variables is significant (every p-value is above 0.78). We therefore fit a new model with Day as the only predictor, because it has the most practical interpretation.
| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -10.9452509 | 0.9521095 | -11.4957899 | 0.0000000 |
| DayMonday | -0.1860511 | 1.2583030 | -0.1478587 | 0.8824543 |
| DaySaturday | -0.0071692 | 1.2889910 | -0.0055618 | 0.9955623 |
| DaySunday | -0.0110928 | 1.2675963 | -0.0087511 | 0.9930178 |
| DayThursday | -0.2527537 | 1.3336595 | -0.1895189 | 0.8496861 |
| DayTuesday | -0.2116342 | 1.3181369 | -0.1605555 | 0.8724435 |
| DayWednesday | -0.3417495 | 1.3245037 | -0.2580208 | 0.7963908 |
With Day as the only predictor, none of its levels is statistically significant (every p-value is above 0.79). The Day coefficients are interpreted on the log scale, as in the count model. However, because log(Total) was used as an offset, expected values from this model must be multiplied by the day's total count.
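For reference, a common way to model a proportion within the Poisson framework is to keep the integer count as the response and place log(Total) in the offset. The exponentiated coefficients then act on the expected share of riders. This is the standard rate formulation and may differ in detail from the specification used to produce the tables above:

$$
\log \mathbb{E}[\text{ManhattanBridge}_i] = \log(\text{Total}_i) + \beta_0 + \beta_{\text{Day}(i)}
\quad\Longleftrightarrow\quad
\frac{\mathbb{E}[\text{ManhattanBridge}_i]}{\text{Total}_i} = e^{\beta_0}\, e^{\beta_{\text{Day}(i)}}
$$

Under this formulation, with $\beta_{\text{Friday}} = 0$, $e^{\beta_{\text{Day}}}$ is the ratio of the expected Manhattan Bridge share on that day to the expected share on Friday.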
We used a similar variable selection process for this model as for the count model. The major difference is that no predictor was significant when the proportion of cyclists was the response. This suggests that Poisson regression is better suited to the counts of cyclists than to the proportion. Both final models still give a practical interpretation of the day-of-week variable.
Overall, Poisson regression was much more successful at modeling the counts of cyclists than the proportion of cyclists. The count model produced statistically significant estimates for all relevant predictor variables, while the proportion model produced none. Because the key equidispersion assumption of the Poisson model is violated for these data, we cannot be fully confident in those findings. We therefore refit the same data with a quasi-Poisson model, which may be more appropriate.
To begin the quasi-Poisson part of the analysis, we modify some of the predictor variables derived during the earlier exploratory data analysis. The practical question and the data set are the same as for the Poisson regression modeling above.
With an average temperature variable (AvgTemp) created and the Precipitation variable discretized (NewPrecip), we construct a quasi-Poisson regression model. The count of Manhattan Bridge cyclists (ManhattanBridge) is the response, and Day, AvgTemp, and NewPrecip are the predictors.
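The source does not show how AvgTemp and NewPrecip were constructed. The sketch below therefore uses assumed definitions: AvgTemp as the midpoint of the daily high and low, and NewPrecip as 1 when any precipitation was recorded and 0 otherwise. The coefficient name `NewPrecip1` in the table below suggests a two-level factor with 0 as the baseline, which is consistent with this assumption but does not confirm it. In R the indicator would need to be wrapped in `factor()` to get that coefficient name. The function name `derive_predictors` is hypothetical.

```python
# Sample rows from the table above: (Date, HighTemp, LowTemp, Precipitation)
rows = [
    ("2023-07-01", 84.9, 72.0, 0.23),
    ("2023-07-02", 87.1, 73.0, 0.00),
    ("2023-07-03", 87.1, 71.1, 0.45),
    ("2023-07-04", 82.9, 70.0, 0.00),
    ("2023-07-05", 84.9, 71.1, 0.00),
    ("2023-07-06", 75.0, 71.1, 0.00),
]


def derive_predictors(high_temp, low_temp, precipitation):
    """Hypothetical definitions of AvgTemp and NewPrecip (assumptions, not from the source)."""
    avg_temp = (high_temp + low_temp) / 2          # assumed: midpoint of daily high and low
    new_precip = 1 if precipitation > 0 else 0     # assumed: 1 if any precipitation recorded
    return avg_temp, new_precip


for date, high, low, precip in rows:
    avg_temp, new_precip = derive_predictors(high, low, precip)
    print(date, f"AvgTemp={avg_temp:.2f}", f"NewPrecip={new_precip}")
```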
| Term | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | 8.073 | 0.5397 | 14.96 | 5.191e-13 |
| DayMonday | 0.2899 | 0.1285 | 2.256 | 0.03435 |
| DaySaturday | -0.1107 | 0.1393 | -0.7947 | 0.4353 |
| DaySunday | -0.03022 | 0.1366 | -0.2212 | 0.8269 |
| DayThursday | 0.2241 | 0.1351 | 1.658 | 0.1115 |
| DayTuesday | 0.2055 | 0.1371 | 1.499 | 0.1482 |
| DayWednesday | 0.2762 | 0.1351 | 2.045 | 0.05301 |
| AvgTemp | 0.006285 | 0.006822 | 0.9213 | 0.3669 |
| NewPrecip1 | -0.4057 | 0.09213 | -4.404 | 0.0002251 |
The dispersion parameter estimated by the quasi-Poisson model is:
| Dispersion |
|---|
| 264.9 |
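The estimated dispersion parameter is very far from 1. This means the variance of the counts is roughly 265 times larger than a plain Poisson model assumes, and it confirms that the regular Poisson model is not appropriate for these data: its standard errors are too small by a factor of about sqrt(264.9) ≈ 16.3. The quasi-Poisson model corrects for this by scaling its standard errors, which is consistent with fewer coefficients being significant here than in the Poisson count model. The regular Poisson model would be adequate only if the dispersion were close to 1.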
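R's `summary()` for a quasi-Poisson `glm` estimates the dispersion as the Pearson chi-square statistic divided by the residual degrees of freedom. It then multiplies the Poisson standard errors by the square root of that value and leaves the coefficient estimates unchanged. The Python sketch below reproduces that calculation. The function name and the toy inputs are hypothetical, and the sketch also computes the standard-error inflation factor implied by the reported dispersion of 264.9.

```python
import math


def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom (quasi-Poisson dispersion)."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Toy check with made-up observed counts y and fitted means mu
phi_toy = pearson_dispersion([2, 4, 6], [3, 3, 6], n_params=1)

# With the reported dispersion, quasi-Poisson standard errors are the
# Poisson ones multiplied by sqrt(phi); the estimates themselves do not change.
phi = 264.9
se_inflation = math.sqrt(phi)
print(f"toy dispersion = {phi_toy:.4f}, SE inflation factor = {se_inflation:.2f}")
```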
The accompanying graph shows the change in the count of cyclists on each day of the week when no precipitation is reported. Friday is the baseline level of Day, so each day's change in count is measured relative to Friday.
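The graph itself is not reproduced here. The sketch below computes the quantities it describes from the quasi-Poisson estimates in the table above, which are rounded as printed. With NewPrecip = 0, the precipitation term drops out. The value `avg_temp = 78.0` is a hypothetical choice for illustration, and the ratio to Friday does not depend on it.

```python
import math

# Quasi-Poisson estimates from the table above (log scale); Friday is the baseline
intercept = 8.073
day_effects = {
    "Friday": 0.0, "Monday": 0.2899, "Tuesday": 0.2055, "Wednesday": 0.2762,
    "Thursday": 0.2241, "Saturday": -0.1107, "Sunday": -0.03022,
}
avg_temp_coef = 0.006285
avg_temp = 78.0  # hypothetical average temperature chosen for illustration

# NewPrecip = 0 (no precipitation), so the NewPrecip1 coefficient does not enter
predicted = {
    day: math.exp(intercept + effect + avg_temp_coef * avg_temp)
    for day, effect in day_effects.items()
}

for day, count in predicted.items():
    ratio = count / predicted["Friday"]
    print(f"{day:9s} expected count ~ {count:7.0f}  ({ratio:.2f} x Friday)")
```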
After fitting both a quasi-Poisson and a regular Poisson model, neither can be fully validated for these data. The regular Poisson model violates its key assumption that the mean of the response equals its variance. The quasi-Poisson model relaxes that assumption, but its estimated dispersion parameter of about 265 is far above 1, which indicates substantial overdispersion. It also assumes that the variance grows only linearly with the mean. Of the two, the quasi-Poisson model gives the more honest standard errors, but both may be better replaced by a model such as negative binomial regression.
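To make the negative binomial suggestion concrete, compare the two variance functions. A negative binomial model lets the variance grow as mu + mu^2/theta, whereas quasi-Poisson fixes it at phi * mu. The sketch below computes a crude method-of-moments estimate of theta from the marginal mean and variance printed earlier, ignoring all covariates. A proper fit, for example with `glm.nb` from R's MASS package, would estimate theta jointly with the coefficients. The helper function names are hypothetical.

```python
# Marginal mean and variance of ManhattanBridge printed earlier (covariates ignored)
mean_y = 5424.613
var_y = 2453414

# Solve Var(Y) = mu + mu^2 / theta for theta at the marginal mean
theta = mean_y ** 2 / (var_y - mean_y)


def nb_variance(mu, theta):
    """Negative binomial variance: grows quadratically in mu."""
    return mu + mu ** 2 / theta


def quasi_poisson_variance(mu, phi):
    """Quasi-Poisson variance: proportional to mu."""
    return phi * mu


print(f"moment estimate of theta = {theta:.2f}")
for mu in (3000, 5000, 7000):
    print(mu, round(nb_variance(mu, theta)), round(quasi_poisson_variance(mu, 264.9)))
```

Because this theta ignores Day and the weather variables, it is only a rough indication of how much extra-Poisson variation a negative binomial model would need to absorb.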