We read the bike data into R from GitHub.
The data record the number of cyclists crossing the Brooklyn Bridge each day, together with the day of the week, the precipitation, and the low and high temperatures of that day.
We first build a Poisson regression model for the raw counts and ignore the total number of cyclists recorded in the data.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 7.0206554 | 0.0504253 | 139.22894 | 0 |
| DayMonday | 0.3030707 | 0.0137679 | 22.01289 | 0 |
| DaySaturday | 0.0895943 | 0.0143381 | 6.24868 | 0 |
| DaySunday | 0.2042091 | 0.0140217 | 14.56380 | 0 |
| DayThursday | 0.2598638 | 0.0144328 | 18.00505 | 0 |
| DayTuesday | 0.2114776 | 0.0147706 | 14.31749 | 0 |
| DayWednesday | 0.2629973 | 0.0145466 | 18.07966 | 0 |
| HighTemp | 0.0094271 | 0.0005897 | 15.98643 | 0 |
| newp1 | -0.3541359 | 0.0094149 | -37.61431 | 0 |
The table above shows that both high temperature and precipitation are statistically significant: the number of cyclists crossing the Brooklyn Bridge rises with temperature and falls on days with precipitation.
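A count model of this kind can be sketched in R as follows. The bike data themselves are not reproduced here, so the sketch simulates a stand-in data set; the column names (Day, HighTemp, newp, BrooklynBridge) and all simulated values are assumptions chosen only to mirror the coefficient names in the table, not the original data.

```r
# Simulated stand-in for the bike data; column names are hypothetical.
set.seed(123)
n <- 210
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")
bikes <- data.frame(
  Day      = factor(rep(days, length.out = n), levels = days),  # Friday is the baseline
  HighTemp = round(runif(n, 45, 95), 1),
  newp     = factor(rbinom(n, 1, 0.3))                          # 1 = precipitation
)

# Generate counts from a known log-linear mean so the fit can be checked.
mu <- exp(7 + 0.01 * bikes$HighTemp - 0.35 * (bikes$newp == "1"))
bikes$BrooklynBridge <- rpois(n, mu)

# Poisson frequency model: log(E[count]) is linear in the predictors.
count_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp,
                 family = poisson(link = "log"), data = bikes)
coef_table <- summary(count_fit)$coefficients
print(round(coef_table, 4))
```

Because the counts are simulated with a precipitation effect of -0.35 on the log scale, the fitted newp1 coefficient should land close to that value, which is a quick check that the model formula is specified as intended.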
The following model assesses the relationship between the biker rate and both high temperature and precipitation (present or absent), which is the primary interest of the analysis, while adjusting for the day of the week.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -2.3023334 | 0.0508766 | -45.2532710 | 0.0000000 |
| DayMonday | 0.0880389 | 0.0137556 | 6.4002306 | 0.0000000 |
| DaySaturday | 0.1628197 | 0.0143334 | 11.3594543 | 0.0000000 |
| DaySunday | 0.2634215 | 0.0139480 | 18.8860141 | 0.0000000 |
| DayThursday | 0.0586048 | 0.0143601 | 4.0810902 | 0.0000448 |
| DayTuesday | 0.0706446 | 0.0146887 | 4.8094638 | 0.0000015 |
| DayWednesday | 0.0336450 | 0.0143897 | 2.3381354 | 0.0193802 |
| HighTemp | 0.0033975 | 0.0005922 | 5.7367142 | 0.0000000 |
| newp1 | -0.0058136 | 0.0093244 | -0.6234784 | 0.5329702 |
The table above indicates that the log biker rate barely changes with precipitation (the newp1 coefficient is not significant, p ≈ 0.53) and changes only slightly with temperature, so we next compare the days of the week with each other.
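A rate model is commonly fit in R by adding the log of the exposure as an offset. The sketch below assumes that this is how the biker rate was modeled, with a hypothetical Total column holding the total count; the data are simulated and the column names are assumptions.

```r
# Simulated stand-in data; Total is a hypothetical exposure column.
set.seed(456)
n <- 210
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")
bikes <- data.frame(
  Day      = factor(rep(days, length.out = n), levels = days),
  HighTemp = round(runif(n, 45, 95), 1),
  newp     = factor(rbinom(n, 1, 0.3)),
  Total    = round(runif(n, 8000, 25000))
)

# True rate depends on temperature only; counts scale with the exposure.
true_rate <- exp(-2.3 + 0.003 * bikes$HighTemp)
bikes$BrooklynBridge <- rpois(n, bikes$Total * true_rate)

# offset(log(Total)) fixes the coefficient of log(Total) at 1, so the
# linear predictor models log(count / Total), i.e. the log rate.
rate_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp + offset(log(Total)),
                family = poisson, data = bikes)
print(round(summary(rate_fit)$coefficients, 4))
```

With an offset, the exponentiated intercept is read as a rate (a share of Total) rather than as a count, which is why the intercept in a rate model is negative while the count-model intercept is large and positive.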
The inferential tables of the Poisson regression models in the previous sections give numerical information about the potential differences across the days of the week and between precipitation levels, but they are not intuitive. Next, we create two graphics that make the hidden pattern visible.
The following calculation is based on the regression equation with the coefficients of the rate model given in the table above. Note that the day-of-week and precipitation variables are indicator variables, each taking only two possible values, 0 and 1, while HighTemp is continuous.
For example, with HighTemp set to 0, \(\exp(-2.3023334) \approx 0.100\) gives the biker rate of the baseline day, Friday, at the baseline precipitation level 0. \(\exp(-2.3023334 - 0.0058136)\) gives the biker rate of the baseline day, Friday, at precipitation level 1.
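The same arithmetic can be checked directly in R. This is a small sketch using coefficients copied from the rate-model table; the high temperature of 70 in the last line is an arbitrary illustrative value, not one taken from the data.

```r
# Coefficients copied from the rate-model table.
b <- c("(Intercept)" = -2.3023334,
       DaySunday     =  0.2634215,
       HighTemp      =  0.0033975,
       newp1         = -0.0058136)

# Friday, no precipitation, HighTemp = 0 (the literal baseline).
rate_baseline <- exp(b[["(Intercept)"]])

# Friday with precipitation, HighTemp = 0.
rate_friday_precip <- exp(b[["(Intercept)"]] + b[["newp1"]])

# Sunday, no precipitation, at an illustrative high temperature of 70.
rate_sunday_70 <- exp(b[["(Intercept)"]] + b[["DaySunday"]] + 70 * b[["HighTemp"]])

print(round(c(baseline      = rate_baseline,
              friday_precip = rate_friday_precip,
              sunday_70     = rate_sunday_70), 4))
```

Each predicted rate is the exponential of the sum of the coefficients that are switched on, with the HighTemp coefficient multiplied by the temperature of interest.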
We can draw several conclusions from the output of the regression models.
The regression model based on the raw biker count is not appropriate because it cannot use the information on the total number of cyclists. A simple way to use that information is to include the total count as a predictor in the regression model. The output of the fitted Poisson regression model follows.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 6.2556817 | 0.0545087 | 114.764856 | 0.0000000 |
| DayMonday | 0.0962685 | 0.0147678 | 6.518809 | 0.0000000 |
| DaySaturday | 0.1905999 | 0.0145910 | 13.062812 | 0.0000000 |
| DaySunday | 0.2971638 | 0.0141647 | 20.979119 | 0.0000000 |
| DayThursday | 0.0965010 | 0.0150109 | 6.428715 | 0.0000000 |
| DayTuesday | 0.0829859 | 0.0151798 | 5.466873 | 0.0000000 |
| DayWednesday | 0.0343462 | 0.0156325 | 2.197099 | 0.0280134 |
| HighTemp | 0.0061623 | 0.0005958 | 10.343026 | 0.0000000 |
| newp1 | -0.0431121 | 0.0122411 | -3.521903 | 0.0004285 |
| Total | 0.0000539 | 0.0000014 | 38.513379 | 0.0000000 |
This note briefly outlines the regular Poisson regression model for fitting frequency data. The Poisson regression model has a simple structure and is easy to interpret, but it makes a relatively strong assumption: the variance is equal to the mean.
If this assumption is violated, we can use negative binomial regression as an alternative. Another potential issue is excess zeros in the data, in which case we can consider zero-inflated Poisson or zero-inflated negative binomial regression models.
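As a rough illustration of the negative binomial alternative, the sketch below simulates overdispersed counts, estimates the Pearson dispersion of a Poisson fit, and then refits with glm.nb() from the MASS package. The data and variable names are simulated assumptions, not the bike data.

```r
library(MASS)  # provides glm.nb()

# Simulated overdispersed counts; variable names are hypothetical.
set.seed(789)
n <- 300
sim <- data.frame(HighTemp = runif(n, 45, 95))
mu <- exp(5 + 0.01 * sim$HighTemp)
# Negative binomial counts: variance = mu + mu^2 / size, larger than the mean.
sim$count <- rnbinom(n, size = 5, mu = mu)

# Poisson fit and its Pearson dispersion estimate;
# values well above 1 suggest overdispersion.
pois_fit <- glm(count ~ HighTemp, family = poisson, data = sim)
pearson_disp <- sum(residuals(pois_fit, type = "pearson")^2) /
  df.residual(pois_fit)

# Negative binomial fit, which estimates the extra shape parameter theta.
nb_fit <- glm.nb(count ~ HighTemp, data = sim)

print(pearson_disp)
print(nb_fit$theta)
```

Zero-inflated models are not part of base R or MASS and would need an additional package, so they are left out of this sketch.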
The Poisson models above assume that there is no dispersion issue in the model. Fitting a quasi-Poisson model through glm() returns the dispersion coefficient.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -2.302 | 0.05088 | -45.25 | 0 |
| DayMonday | 0.08804 | 0.01376 | 6.4 | 1.551e-10 |
| DaySaturday | 0.1628 | 0.01433 | 11.36 | 6.656e-30 |
| DaySunday | 0.2634 | 0.01395 | 18.89 | 1.487e-79 |
| DayThursday | 0.0586 | 0.01436 | 4.081 | 4.482e-05 |
| DayTuesday | 0.07064 | 0.01469 | 4.809 | 1.513e-06 |
| DayWednesday | 0.03364 | 0.01439 | 2.338 | 0.01938 |
| HighTemp | 0.003397 | 0.0005922 | 5.737 | 9.653e-09 |
| newp1 | -0.005814 | 0.009324 | -0.6235 | 0.533 |
The dispersion index can be extracted from the quasi-Poisson object:
| Dispersion |
|---|
| 10.62 |
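One way to extract the dispersion from a quasi-Poisson fit is through the summary object, as in the sketch below. The data are simulated and the variable names are assumptions, so the printed value will not match the 10.62 reported above.

```r
# Simulated overdispersed rate data; names are hypothetical.
set.seed(2024)
n <- 200
sim <- data.frame(HighTemp = runif(n, 45, 95),
                  Total    = round(runif(n, 8000, 25000)))
mu <- sim$Total * exp(-2.3 + 0.003 * sim$HighTemp)
sim$count <- rnbinom(n, size = 100, mu = mu)  # variance exceeds the mean

quasi_fit <- glm(count ~ HighTemp + offset(log(Total)),
                 family = quasipoisson, data = sim)

# Dispersion estimate stored in the summary of the quasi-Poisson fit.
dispersion <- summary(quasi_fit)$dispersion
print(round(dispersion, 2))

# It equals the Pearson chi-square statistic divided by the residual df.
pearson_disp <- sum(residuals(quasi_fit, type = "pearson")^2) /
  df.residual(quasi_fit)

# Same coefficients as the Poisson fit, but standard errors are
# multiplied by sqrt(dispersion).
pois_fit <- update(quasi_fit, family = poisson)
se_ratio <- summary(quasi_fit)$coefficients[, "Std. Error"] /
  summary(pois_fit)$coefficients[, "Std. Error"]
print(se_ratio)
```

Since a quasi-Poisson fit only rescales the standard errors, its test statistics and p-values should differ from those of the corresponding Poisson fit whenever the dispersion is not 1, which is worth checking when comparing the two coefficient tables.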
The intercept represents the baseline log biker rate (baseline day of the week Friday, baseline precipitation 0, and a high temperature of 0). The actual rate is \(\exp(-2.3023334) \approx 0.100\), or about 10%. The coefficient \(-0.0058136\) of newp1 is the difference of the log-rates between days with and without precipitation for any given day of the week and high temperature. To be more specific, \(\log(R_{\text{precip}}) - \log(R_{\text{no precip}}) = -0.0058136\), which is equivalent to
\[ \log \left( \frac{R_{\text{precip}}}{R_{\text{no precip}}} \right) = -0.0058136 ~~~\Rightarrow~~~\frac{R_{\text{precip}}}{R_{\text{no precip}}} = e^{-0.0058136} \approx 0.9942 \]
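The same exponentiation applies to every indicator coefficient. As a sketch, the rate ratios and approximate 95% Wald intervals can be computed from the estimates and standard errors copied from the rate-model table; the normal-approximation interval is an assumption of this example, not something reported in the analysis.

```r
# Estimates and standard errors copied from the rate-model table.
est <- c(DayMonday = 0.0880389, DaySaturday = 0.1628197,
         DaySunday = 0.2634215, DayThursday = 0.0586048,
         DayTuesday = 0.0706446, DayWednesday = 0.0336450,
         newp1 = -0.0058136)
se <- c(0.0137556, 0.0143334, 0.0139480, 0.0143601,
        0.0146887, 0.0143897, 0.0093244)

z <- qnorm(0.975)  # two-sided 95% normal quantile

# Rate ratio relative to the baseline (Friday, no precipitation),
# with Wald limits computed on the log scale and then exponentiated.
rate_ratios <- data.frame(
  ratio = exp(est),
  lower = exp(est - z * se),
  upper = exp(est + z * se),
  row.names = names(est)
)
print(round(rate_ratios, 4))
```

These intervals use the Poisson standard errors. Under a quasi-Poisson model with dispersion 10.62, the standard errors would be larger by a factor of about 3.26 (the square root of the dispersion), so the intervals would be correspondingly wider.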
We can draw several conclusions from the output of the regression models.
The regression model based on the raw biker count is less accurate than the one that factors in the total from all the bridges, so we used the model that accounts for the total count. Because the dispersion is 10.62, well above 1, we use the quasi-Poisson regression model.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -2.302 | 0.05088 | -45.25 | 0 |
| DayMonday | 0.08804 | 0.01376 | 6.4 | 1.551e-10 |
| DaySaturday | 0.1628 | 0.01433 | 11.36 | 6.656e-30 |
| DaySunday | 0.2634 | 0.01395 | 18.89 | 1.487e-79 |
| DayThursday | 0.0586 | 0.01436 | 4.081 | 4.482e-05 |
| DayTuesday | 0.07064 | 0.01469 | 4.809 | 1.513e-06 |
| DayWednesday | 0.03364 | 0.01439 | 2.338 | 0.01938 |
| HighTemp | 0.003397 | 0.0005922 | 5.737 | 9.653e-09 |
| newp1 | -0.005814 | 0.009324 | -0.6235 | 0.533 |