Introduction

This data set contains daily totals of bike counts, conducted monthly, on the Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge. The counts track cyclists entering and leaving Queens, Manhattan, and Brooklyn via the East River bridges. The Traffic Information Management System (TIMS) collects the count data. Each record represents the total number of cyclists per 24 hours on these four bridges. The data set contains seven variables: Date, Day, HighTemp, LowTemp, Precipitation, ManhattanBridge, and Total.

Practical Question

The objective of this case study is to examine the relationship between the relevant predictor variables in the data set and the count of cyclists who rode on the Manhattan Bridge on a given day. We will also examine the relationship between day of the week and the proportion of cyclists who rode on the Manhattan Bridge relative to the total number of riders on all observed bridges on a given day. We use a Poisson regression model because the response variable of interest is a count. The two responses of interest are therefore the count of cyclists on the Manhattan Bridge and the proportion of the day's total cyclists who used the Manhattan Bridge.

Exploratory Data Analysis

To begin the analysis, we first evaluate the variables in the data set and choose which ones can be used to build the model.

First few records in the data set
Date Day HighTemp LowTemp Precipitation ManhattanBridge Total
2023-07-01 Saturday 84.9 72.0 0.23 2958 11867
2023-07-02 Sunday 87.1 73.0 0.00 3776 13995
2023-07-03 Monday 87.1 71.1 0.45 4199 16067
2023-07-04 Tuesday 82.9 70.0 0.00 4084 13925
2023-07-05 Wednesday 84.9 71.1 0.00 6770 23110
2023-07-06 Thursday 75.0 71.1 0.00 6243 21861

After an initial examination of the data set, the variable Date is not needed for the initial Poisson regression, so we omit it from the data set. We also check whether the mean of the response equals its variance. As the output below shows, the two values are far from equal: the variance (second value) is several hundred times the mean (first value).

## [1] 5424.613
## [1] 2453414
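The mean and variance check can be sketched in a few lines. This is a minimal illustration only: it uses the six ManhattanBridge counts displayed above rather than the full data set, and the variable names are assumptions made for the example. It also forms the variance-to-mean ratio from the two values reported in the output, which comes out at roughly 450.

```python
from statistics import mean, variance

# ManhattanBridge counts from the six records displayed above
# (an illustrative subset, NOT the full data set).
manhattan_counts = [2958, 3776, 4199, 4084, 6770, 6243]

sample_mean = mean(manhattan_counts)
# statistics.variance uses the n - 1 denominator (sample variance).
sample_var = variance(manhattan_counts)

# Full-data values copied from the report output above.
reported_mean = 5424.613
reported_var = 2453414
reported_ratio = reported_var / reported_mean

print(f"six-row mean = {sample_mean:.1f}, six-row variance = {sample_var:.1f}")
print(f"reported variance / mean = {reported_ratio:.1f}")
```

Under a Poisson model this ratio would be expected to sit near 1, so a ratio this large is a first signal of overdispersion.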

Poisson Regression on Manhattan Bridge Cyclist Counts

We first build a Poisson regression model using the final data set created in the Exploratory Data Analysis. Poisson regression rests on one very strong assumption: the mean of the response variable equals its variance. Even though the data does not meet this assumption, as shown above, we still carry out the analysis.

Poisson Regression Model: Counts of cyclists on the Manhattan Bridge versus Relevant Predictor Variables
Estimate Std. Error z value Pr(>|z|)
(Intercept) 8.6345244 0.0440211 196.14501 0
DayMonday 0.1867363 0.0100703 18.54330 0
DaySaturday -0.2190999 0.0106321 -20.60747 0
DaySunday -0.1331873 0.0107366 -12.40503 0
DayThursday 0.1475508 0.0105831 13.94206 0
DayTuesday 0.2026021 0.0104875 19.31840 0
DayWednesday 0.2691229 0.0102750 26.19201 0
HighTemp 0.0128613 0.0007319 17.57245 0
LowTemp -0.0157873 0.0010597 -14.89824 0
Precipitation -0.4307477 0.0104214 -41.33284 0

The variables Day, HighTemp, LowTemp, and Precipitation all have significant p-values when the Poisson regression model is built. This suggests that there is a statistically significant difference between the counts of cyclists for the baseline level of the categorical variable Day (Friday) and the other levels of the variable. Because the model uses a log link, the coefficients act on the log of the expected count, so they are interpreted multiplicatively after exponentiating. For example, the expected count on Saturday is exp(-0.219), or about 0.80 times the expected count on Friday, which is roughly a 20% decrease with the other predictors held fixed. The interpretation is the same for the other days of the week. For Precipitation, each one-unit increase multiplies the expected count by exp(-0.431), about 0.65, a decrease of roughly 35%. For HighTemp, each one-degree increase multiplies the expected count by exp(0.0129), an increase of about 1.3%. For LowTemp, each one-degree increase multiplies the expected count by exp(-0.0158), a decrease of about 1.6%.
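Since Poisson regression uses a log link, coefficients are usually read on the multiplicative scale. The following minimal sketch converts four coefficients copied from the table above into rate ratios and percent changes. The helper names `rate_ratio` and `percent_change` are introduced here for illustration only.

```python
import math

# Coefficients copied from the Poisson count model table above (log scale).
coefficients = {
    "DaySaturday": -0.2190999,
    "HighTemp": 0.0128613,
    "LowTemp": -0.0157873,
    "Precipitation": -0.4307477,
}


def rate_ratio(beta):
    """Multiplicative change in the expected count per one-unit increase."""
    return math.exp(beta)


def percent_change(beta):
    """The same change expressed as a percentage of the expected count."""
    return (math.exp(beta) - 1.0) * 100.0


for name, beta in coefficients.items():
    print(f"{name}: rate ratio = {rate_ratio(beta):.3f}, "
          f"change = {percent_change(beta):+.1f}%")
```

For a level of a categorical variable such as DaySaturday, the "one-unit increase" is the switch from the baseline level (Friday) to that level.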

Because all of the predictor variables in this model are significant, we keep it as the final model for the count of cyclists on the Manhattan Bridge. However, the key assumption that the mean of the response variable equals its variance is violated, so the Poisson standard errors are likely too small and the significance overstated. We will therefore eventually use a Quasi-Poisson model to analyze this data set more reliably.

Poisson Regression on Manhattan Bridge Proportion of Cyclists

The following model assesses the potential relationship between the relevant predictor variables from the final model above and the proportion of cyclists who rode on the Manhattan Bridge. This is the primary interest of the model. We must first define a new response variable, the proportion of cyclists on the Manhattan Bridge, as ManhattanBridge divided by Total.
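As a minimal sketch of the new response variable, the proportion can be computed as the Manhattan Bridge count divided by the day's total. The example uses only the six records displayed in the Exploratory Data Analysis, and the function name `manhattan_proportion` is hypothetical. The resulting values agree with the first non-integer x values reported in the warnings that follow.

```python
# (Date, ManhattanBridge, Total) from the first six records shown earlier.
records = [
    ("2023-07-01", 2958, 11867),
    ("2023-07-02", 3776, 13995),
    ("2023-07-03", 4199, 16067),
    ("2023-07-04", 4084, 13925),
    ("2023-07-05", 6770, 23110),
    ("2023-07-06", 6243, 21861),
]


def manhattan_proportion(manhattan, total):
    """Share of the day's total cyclists who used the Manhattan Bridge."""
    if total <= 0:
        raise ValueError("total must be positive")
    return manhattan / total


proportions = {date: manhattan_proportion(m, t) for date, m, t in records}

for date, p in proportions.items():
    print(f"{date}: {p:.6f}")
```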

## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.249263
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.269811
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.261343
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.293285
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.292947
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.285577
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.278563
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.278537
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.297271
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.302211
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.296003
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.296732
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.281575
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.265773
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.281418
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.295912
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.306918
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.308811
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.293693
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.293065
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.276589
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.259673
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.282470
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.302141
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.305308
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.296155
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.286394
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.282186
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.255637
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.282520
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.304892
Poisson Regression Model: Proportion of cyclists on the Manhattan Bridge versus Relevant Predictor Variables
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.8490421 6.0861791 -1.7825703 0.0746563
DayMonday -0.0539290 1.3610238 -0.0396238 0.9683930
DaySaturday 0.1118994 1.3723120 0.0815408 0.9350119
DaySunday 0.1533927 1.4322407 0.1070998 0.9147098
DayThursday -0.1297518 1.4866392 -0.0872786 0.9304501
DayTuesday -0.0927055 1.4768788 -0.0627712 0.9499487
DayWednesday -0.1856679 1.4633411 -0.1268794 0.8990358
HighTemp -0.0170374 0.1022715 -0.1665900 0.8676927
LowTemp 0.0163443 0.1459438 0.1119905 0.9108309
Precipitation 0.2927874 1.0863862 0.2695058 0.7875405

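When a full model is generated for the proportion of cyclists on the Manhattan Bridge, none of the predictor variables are significant. Note that the non-integer warnings arise because the proportion, rather than an integer count, was passed as the Poisson response, so these standard errors should be read with caution. We will next generate a new model using only Day as a predictor variable, because it has the most practical interpretation.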

## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.249263
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.269811
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.261343
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.293285
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.292947
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.285577
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.278563
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.278537
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.297271
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.302211
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.296003
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.296732
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.281575
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.265773
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.281418
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.295912
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.306918
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.308811
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.293693
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.293065
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.276589
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.259673
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.282470
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.302141
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.305308
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.296155
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.286394
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.282186
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.255637
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.282520
## Warning in dpois(y, mu, log = TRUE): non-integer x = 0.304892
Poisson Regression Model: Proportion of cyclists on the Manhattan Bridge versus Day of the Week
Estimate Std. Error z value Pr(>|z|)
(Intercept) -10.9452509 0.9521095 -11.4957899 0.0000000
DayMonday -0.1860511 1.2583030 -0.1478587 0.8824543
DaySaturday -0.0071692 1.2889910 -0.0055618 0.9955623
DaySunday -0.0110928 1.2675963 -0.0087511 0.9930178
DayThursday -0.2527537 1.3336595 -0.1895189 0.8496861
DayTuesday -0.2116342 1.3181369 -0.1605555 0.8724435
DayWednesday -0.3417495 1.3245037 -0.2580208 0.7963908

After generating a model with only one predictor variable, Day, the levels of this variable are still statistically insignificant. The coefficients for Day are interpreted in the same way as in the model for counts, except that, because log(Total) was used as an offset, the exponentiated coefficients describe changes in the rate (the proportion of the day's total) rather than in the raw count. Multiplying a fitted rate by the day's total count gives the expected count.
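The role of the offset can be written out explicitly. The following is a sketch of the standard rate formulation, assuming the response is the Manhattan Bridge count with mean \mu_i on day i; the symbols \mu_i, \beta_0, and \beta_{\mathrm{Day}(i)} are notation introduced here, not taken from the model output.

```latex
\[
\begin{aligned}
\log \mu_i &= \log(\mathrm{Total}_i) + \beta_0 + \beta_{\mathrm{Day}(i)} \\
\frac{\mu_i}{\mathrm{Total}_i} &= \exp\left(\beta_0 + \beta_{\mathrm{Day}(i)}\right) \\
\mu_i &= \mathrm{Total}_i \cdot \exp\left(\beta_0 + \beta_{\mathrm{Day}(i)}\right)
\end{aligned}
\]
```

With Friday as the baseline (its coefficient fixed at 0), \exp(\beta_{\mathrm{Day}}) is the ratio of that day's expected proportion to Friday's expected proportion.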

We used a similar variable selection process for this model as for the previous model for the counts of cyclists. The major difference is that this model had no significant predictor variables when the proportion of cyclists was the response variable. This leads to the conclusion that the Poisson regression model is better applied to the counts of cyclists. The final models for counts and for proportions both still have a practical interpretation for the day of the week variable.

Summary

Overall, we had much more success modeling the counts of cyclists with the Poisson regression model than modeling the proportion of cyclists. The count model yielded statistically significant results for all relevant predictor variables in the data set, while the proportion model yielded none. We can now use the same data to fit a Quasi-Poisson model instead. This may be a more practical model because key assumptions of the Poisson regression model were violated for this data, and as a result we cannot be as confident in its findings.

Begin Quasi-Poisson Section of Assignment

To begin the Quasi-Poisson part of this analysis, we first modify some of the predictor variables in the data set derived from the previous exploratory data analysis. We keep the same practical question and the same data set as for the Poisson regression modeling earlier in the assignment.

Now that we have an average temperature variable and have discretized the Precipitation variable, we can create a Quasi-Poisson regression model using these new variables. This model will include the variable ManhattanBridge as the response variable, and Day, AvgTemp, and NewPrecip as the predictor variables.
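The report does not show how the two new variables were derived, so the following is only a sketch under stated assumptions: AvgTemp is taken to be the midpoint of HighTemp and LowTemp, and NewPrecip is taken to be 1 when any precipitation was recorded and 0 otherwise. Both rules, and the helper name `add_derived_variables`, are assumptions made for illustration and may differ from the definitions actually used.

```python
# Two records from the table shown in the Exploratory Data Analysis.
records = [
    {"Day": "Saturday", "HighTemp": 84.9, "LowTemp": 72.0,
     "Precipitation": 0.23, "ManhattanBridge": 2958},
    {"Day": "Sunday", "HighTemp": 87.1, "LowTemp": 73.0,
     "Precipitation": 0.00, "ManhattanBridge": 3776},
]


def add_derived_variables(record):
    """Return a copy of the record with AvgTemp and NewPrecip added."""
    derived = dict(record)
    # ASSUMPTION: average temperature is the midpoint of high and low.
    derived["AvgTemp"] = (record["HighTemp"] + record["LowTemp"]) / 2
    # ASSUMPTION: precipitation is discretized as any rain (1) vs none (0).
    derived["NewPrecip"] = 1 if record["Precipitation"] > 0 else 0
    return derived


prepared = [add_derived_variables(r) for r in records]

for row in prepared:
    print(row["Day"], row["AvgTemp"], row["NewPrecip"])
```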

Quasi-Poisson Model Construction

We next construct a Quasi-Poisson model using the count of Manhattan Bridge cyclists as the response variable and the variables Day, AvgTemp, and NewPrecip as the predictor variables.

After generating the Quasi-Poisson regression model for our data, we can see that the NewPrecip variable at its “1” level is statistically significant (p = 0.0002). The Monday level of the Day variable is also significant at the 0.05 level (p = 0.034), while the Wednesday level falls just short (p = 0.053). The interpretation of the regression coefficients is the same as for the original Poisson regression model earlier in the report. Next we calculate the dispersion parameter for this model.

Quasi-Poisson Regression on the Count of Cyclists on the Manhattan Bridge
  Estimate Std. Error t value Pr(>|t|)
(Intercept) 8.073 0.5397 14.96 5.191e-13
DayMonday 0.2899 0.1285 2.256 0.03435
DaySaturday -0.1107 0.1393 -0.7947 0.4353
DaySunday -0.03022 0.1366 -0.2212 0.8269
DayThursday 0.2241 0.1351 1.658 0.1115
DayTuesday 0.2055 0.1371 1.499 0.1482
DayWednesday 0.2762 0.1351 2.045 0.05301
AvgTemp 0.006285 0.006822 0.9213 0.3669
NewPrecip1 -0.4057 0.09213 -4.404 0.0002251

The dispersion index extracted from the fitted Quasi-Poisson object is shown below.

Dispersion
264.9
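For reference, the dispersion of a quasi-Poisson fit is commonly estimated as the sum of squared Pearson residuals divided by the residual degrees of freedom. The sketch below implements that estimator in plain Python. The fitted means in the toy example are hypothetical values invented for illustration, not output from the model above. The last lines show how the reported dispersion of 264.9 translates into a standard-error inflation factor, since quasi-Poisson standard errors are the Poisson ones multiplied by the square root of the dispersion.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Sum of squared Pearson residuals over residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    # For a Poisson-type variance function, Var(y) is proportional to mu.
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / resid_df


# Observed counts from the six displayed records; the fitted means are
# HYPOTHETICAL and serve only to exercise the function.
observed = [2958, 3776, 4199, 4084, 6770, 6243]
hypothetical_fitted = [3400.0, 3700.0, 5100.0, 4700.0, 5900.0, 5600.0]
toy_dispersion = pearson_dispersion(observed, hypothetical_fitted, n_params=2)

# Dispersion reported above, and the implied standard-error multiplier.
reported_dispersion = 264.9
se_inflation = math.sqrt(reported_dispersion)

print(f"toy dispersion = {toy_dispersion:.1f}")
print(f"standard-error inflation factor = {se_inflation:.2f}")
```

A dispersion near 1 would leave the standard errors essentially unchanged; a value in the hundreds widens them by an order of magnitude.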

After we generate the dispersion parameter, we can see that the value is far greater than one. This indicates strong overdispersion, so the equal mean and variance assumption of the regular Poisson model does not hold, and the Quasi-Poisson model, which scales the standard errors by the square root of the dispersion, is the more appropriate of the two. If the dispersion were close to 1, the regular Poisson model would be adequate.

Some Graphical Comparison

In this graph, we can see the change in the count of cyclists on different days of the week when there is no precipitation reported for the day. This graph uses Friday as the baseline level of the variable day, so the change in counts for the days of the week is based upon the counts from Friday.

Summary

After testing both a Quasi-Poisson and a regular Poisson model, we can see that neither model fits this data well. With the regular Poisson model, the key assumption that the response variable’s mean equals its variance is not met. With the Quasi-Poisson model, the estimated dispersion parameter is about 265, far larger than the value of 1 that the Poisson model assumes. Of the two, the Quasi-Poisson model gives the more trustworthy standard errors, but both may be better replaced by a model such as a negative binomial model.