1 Intro for Poisson

Here we read the bike data into R from GitHub.

The data record the number of bikers who cross the Brooklyn Bridge each day, together with the day of the week, the precipitation, and the low and high temperatures of that day.

1.1 Poisson Regression on bike frequency

We first build a Poisson regression model on the raw biker counts, ignoring the total population size of all the bikers in the data.

The Poisson regression model for the biker counts versus day of the week, high temperature, and precipitation.
Term           Estimate     Std. Error   z value     Pr(>|z|)
(Intercept)    7.0206554    0.0504253    139.22894   0
DayMonday      0.3030707    0.0137679     22.01289   0
DaySaturday    0.0895943    0.0143381      6.24868   0
DaySunday      0.2042091    0.0140217     14.56380   0
DayThursday    0.2598638    0.0144328     18.00505   0
DayTuesday     0.2114776    0.0147706     14.31749   0
DayWednesday   0.2629973    0.0145466     18.07966   0
HighTemp       0.0094271    0.0005897     15.98643   0
newp1         -0.3541359    0.0094149    -37.61431   0

The table above shows that high temperature and precipitation are both statistically significant: the number of bikers crossing the Brooklyn Bridge rises with the high temperature and falls on days with precipitation.
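A count model of this form can be sketched with glm(). Because the actual data set is not reproduced in this note, the sketch below runs on simulated data. The column names Day, HighTemp and newp are taken from the coefficient names in the table, while the response name BrooklynBridge and all simulated values are assumptions made only for illustration.

```r
# Simulated stand-in for the bike data (NOT the real data set).
# Day, HighTemp and newp follow the coefficient table; BrooklynBridge is a
# hypothetical name for the daily biker count.
set.seed(1)
n <- 210
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")
bike <- data.frame(
  Day      = factor(rep(days, length.out = n)),  # alphabetical levels: Friday is the baseline
  HighTemp = runif(n, 40, 95),
  newp     = factor(rbinom(n, 1, 0.3))           # 1 = precipitation, 0 = none
)

# Generate counts from a log-linear mean loosely based on the table above
mu <- exp(7.02 + 0.0094 * bike$HighTemp - 0.354 * (bike$newp == "1"))
bike$BrooklynBridge <- rpois(n, mu)

# Poisson regression on the raw counts; no offset, so the population size is ignored
count_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp,
                 family = poisson(link = "log"), data = bike)
round(summary(count_fit)$coefficients, 4)
```

The printed matrix has the same row and column layout as the table above, but its numbers come from the simulation and should not be compared with the reported estimates.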

1.2 Poisson Regression on Rates

The following model assesses the potential relationship between the biker rate and the high temperature and precipitation (present or absent). This relationship is the primary interest of the model. We also want to adjust the relationship for the day of the week.

Poisson regression on the biker rate, adjusted for day of the week, high temperature, and precipitation.
Term           Estimate     Std. Error   z value       Pr(>|z|)
(Intercept)   -2.3023334    0.0508766    -45.2532710   0.0000000
DayMonday      0.0880389    0.0137556      6.4002306   0.0000000
DaySaturday    0.1628197    0.0143334     11.3594543   0.0000000
DaySunday      0.2634215    0.0139480     18.8860141   0.0000000
DayThursday    0.0586048    0.0143601      4.0810902   0.0000448
DayTuesday     0.0706446    0.0146887      4.8094638   0.0000015
DayWednesday   0.0336450    0.0143897      2.3381354   0.0193802
HighTemp       0.0033975    0.0005922      5.7367142   0.0000000
newp1         -0.0058136    0.0093244     -0.6234784   0.5329702

The table above indicates that the log biker rate changes little with precipitation (the newp1 coefficient is not significant, p ≈ 0.53) and only slightly with the high temperature. We must therefore look into the days of the week and how they compare with each other.
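A rate model is commonly obtained from a count model by adding the log of the population size as an offset. The sketch below assumes that this is how the rate model was specified and that the population size is stored in a column named Total (the name that appears in a later table). The data are simulated and the response name BrooklynBridge is hypothetical.

```r
# Simulated stand-in for the bike data (NOT the real data set)
set.seed(2)
n <- 210
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")
bike <- data.frame(
  Day      = factor(rep(days, length.out = n)),
  HighTemp = runif(n, 40, 95),
  newp     = factor(rbinom(n, 1, 0.3)),
  Total    = round(runif(n, 5000, 25000))   # assumed population size per day
)

# Counts are generated as Total times a rate that depends on temperature
rate <- exp(-2.30 + 0.0034 * bike$HighTemp)
bike$BrooklynBridge <- rpois(n, bike$Total * rate)

# offset(log(Total)) fixes the coefficient of log(Total) at 1, so the linear
# predictor models log(count / Total), i.e. the log rate
rate_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp + offset(log(Total)),
                family = poisson(link = "log"), data = bike)
round(coef(rate_fit), 4)
```

With an offset, the exponentiated intercept is a rate for the baseline categories rather than an expected count, which is why the intercept of the rate model is negative while that of the count model is large and positive.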

2 Graphical Comparison

The inferential tables of the Poisson regression models in the previous sections give numerical information about the potential discrepancy across the days of the week and between days with and without precipitation, but they are not intuitive. Next, we create two graphics that make the hidden pattern visible.

The following calculation is based on the regression equation with the coefficients given in the rate-model table of Section 1.2. Note that the day-of-week and precipitation variables in the model are indicator variables, each of which takes only two possible values: 0 and 1. HighTemp is the only numeric predictor.

For example, \(\exp(-2.3023334)\) gives the biker rate on the baseline day, Friday, with the baseline precipitation, which is coded as 0. \(\exp(-2.3023334 - 0.0058136)\) gives the biker rate on the baseline day, Friday, with precipitation coded as 1. Both values hold HighTemp at 0.
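The same kind of calculation can be sketched in R by plugging values into the fitted regression equation. The coefficients below are copied from the rate-model table in Section 1.2. The helper friday_rate() is a hypothetical function written only for this illustration, and the temperature of 70 is an arbitrary example value.

```r
# Coefficients copied from the rate-model table in Section 1.2
b0     <- -2.3023334   # (Intercept): Friday, newp = 0, HighTemp = 0
b_temp <-  0.0033975   # HighTemp
b_prec <- -0.0058136   # newp1

# Hypothetical helper: predicted biker rate on a Friday
# high_temp is numeric; precip is the 0/1 precipitation indicator
friday_rate <- function(high_temp, precip) {
  exp(b0 + b_temp * high_temp + b_prec * precip)
}

friday_rate(0, 0)    # baseline rate, approximately 0.100
friday_rate(0, 1)    # Friday with precipitation, HighTemp = 0
friday_rate(70, 0)   # Friday without precipitation at an illustrative 70 degrees
```

For any other day, the corresponding Day coefficient would be added inside exp() in the same way.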

2.1 Conclusion and Discussion

We can draw several conclusions from the output of the regression models.

The regression model based on the raw biker count is not appropriate because it cannot use the information on the population size. Simply including the total biker count as a predictor in the regression model improves the model performance. See the following output of the fitted Poisson regression model.

The Poisson regression model for the biker counts versus day of the week, high temperature, precipitation, and total biker count.
Term           Estimate     Std. Error   z value      Pr(>|z|)
(Intercept)    6.2556817    0.0545087    114.764856   0.0000000
DayMonday      0.0962685    0.0147678      6.518809   0.0000000
DaySaturday    0.1905999    0.0145910     13.062812   0.0000000
DaySunday      0.2971638    0.0141647     20.979119   0.0000000
DayThursday    0.0965010    0.0150109      6.428715   0.0000000
DayTuesday     0.0829859    0.0151798      5.466873   0.0000000
DayWednesday   0.0343462    0.0156325      2.197099   0.0280134
HighTemp       0.0061623    0.0005958     10.343026   0.0000000
newp1         -0.0431121    0.0122411     -3.521903   0.0004285
Total          0.0000539    0.0000014     38.513379   0.0000000

This note briefly outlines the regular Poisson regression model for fitting frequency data. The Poisson regression model has a simple structure and is easy to interpret, but it makes a relatively strong assumption: the variance is equal to the mean.

If this assumption is violated, we can use negative binomial regression as an alternative. Another potential issue is that the data may contain excess zeros, in which case we can consider zero-inflated Poisson or zero-inflated negative binomial regression models.
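As a rough illustration of the negative binomial alternative, the sketch below compares a Poisson fit with a negative binomial fit from glm.nb() in the MASS package. The data are simulated to be overdispersed, and the variable names x and y are placeholders. None of this uses the bike data.

```r
library(MASS)  # provides glm.nb(); MASS ships with standard R distributions

# Simulated overdispersed counts (NOT the bike data)
set.seed(3)
n  <- 300
x  <- runif(n, 40, 95)               # placeholder temperature-like covariate
mu <- exp(3 + 0.01 * x)
y  <- rnbinom(n, size = 2, mu = mu)  # variance = mu + mu^2 / 2, larger than the mean

pois_fit <- glm(y ~ x, family = poisson)
nb_fit   <- glm.nb(y ~ x)

# Pearson dispersion of the Poisson fit; values well above 1 suggest that the
# equal mean-variance assumption is violated
pois_disp <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)
pois_disp

nb_fit$theta            # estimated negative binomial size parameter
AIC(pois_fit, nb_fit)   # a lower AIC indicates the better-fitting model
```

Zero-inflated models need an additional package and are not sketched here.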

3 Quasi-Poisson Rate Model

The Poisson models above assume that there is no dispersion issue, that is, that the variance equals the mean. Fitting a quasi-Poisson model through glm() returns an estimate of the dispersion coefficient.

Quasi-Poisson regression on the biker rate, adjusted for day of the week, high temperature, and precipitation.
Term           Estimate    Std. Error   z value   Pr(>|z|)
(Intercept)   -2.302       0.05088      -45.25    0
DayMonday      0.08804     0.01376        6.4     1.551e-10
DaySaturday    0.1628      0.01433       11.36    6.656e-30
DaySunday      0.2634      0.01395       18.89    1.487e-79
DayThursday    0.0586      0.01436        4.081   4.482e-05
DayTuesday     0.07064     0.01469        4.809   1.513e-06
DayWednesday   0.03364     0.01439        2.338   0.01938
HighTemp       0.003397    0.0005922      5.737   9.653e-09
newp1         -0.005814    0.009324      -0.6235  0.533

The dispersion index extracted from the fitted quasi-Poisson object is:

Dispersion
10.62
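One way to extract a dispersion index from a quasi-Poisson fit is through the dispersion component of summary(). The sketch below uses simulated, overdispersed data, with an assumed offset on log(Total) and the hypothetical response name BrooklynBridge, so its output will not reproduce the value 10.62 reported above.

```r
# Simulated overdispersed stand-in for the bike data (NOT the real data set)
set.seed(4)
n <- 210
days <- c("Friday", "Monday", "Saturday", "Sunday",
          "Thursday", "Tuesday", "Wednesday")
bike <- data.frame(
  Day      = factor(rep(days, length.out = n)),
  HighTemp = runif(n, 40, 95),
  newp     = factor(rbinom(n, 1, 0.3)),
  Total    = round(runif(n, 5000, 25000))
)
mu <- bike$Total * exp(-2.30 + 0.0034 * bike$HighTemp)
bike$BrooklynBridge <- rnbinom(n, size = 150, mu = mu)  # variance exceeds the mean

quasi_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp + offset(log(Total)),
                 family = quasipoisson(link = "log"), data = bike)

# Dispersion estimate stored in the model summary
disp <- summary(quasi_fit)$dispersion

# The same quantity by hand: Pearson chi-square divided by residual degrees of freedom
pearson_disp <- sum(residuals(quasi_fit, type = "pearson")^2) / df.residual(quasi_fit)

c(summary = disp, pearson = pearson_disp)  # the two agree up to numerical tolerance
```

A value close to 1 is consistent with the Poisson assumption, whereas a value well above 1 points to overdispersion.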

3.1 Final Working Model

The intercept represents the baseline log biker rate, that is, the log rate on the baseline day of the week (Friday) with the baseline precipitation (0) and HighTemp equal to 0. The corresponding rate is \(\exp(-2.3023334) \approx 0.100\), or about 10%. The coefficient \(-0.0058136\) of newp1 is the difference of the log-rates between days with precipitation (1) and days without precipitation (0) for any given day of the week and high temperature. To be more specific, \(\log(R_{\text{precip}}) - \log(R_{\text{no precip}}) = -0.0058136\), which is equivalent to

\[ \log \left( \frac{R_{\text{precip}}}{R_{\text{no precip}}} \right) = -0.0058136 ~~~\Rightarrow~~~\frac{R_{\text{precip}}}{R_{\text{no precip}}} = e^{-0.0058136} \approx 0.9942 \]
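Exponentiating any coefficient of the log-rate model gives a rate ratio relative to the corresponding baseline. As a small sketch, the coefficients below are copied from the rate-model table, with Friday and precipitation 0 as the baselines. The object names coefs and rate_ratio are chosen only for this illustration.

```r
# Coefficients copied from the rate-model table (Friday and newp = 0 are the baselines)
coefs <- c(DayMonday    = 0.0880389,
           DaySaturday  = 0.1628197,
           DaySunday    = 0.2634215,
           DayThursday  = 0.0586048,
           DayTuesday   = 0.0706446,
           DayWednesday = 0.0336450,
           HighTemp     = 0.0033975,
           newp1        = -0.0058136)

# Rate ratio of each level relative to its baseline
# (for HighTemp: per one-unit increase in temperature)
rate_ratio <- exp(coefs)
round(rate_ratio, 4)

# The same information as a percentage change in the rate
round(100 * (rate_ratio - 1), 2)
```

A ratio above 1 means a higher biker rate than on the baseline, and a ratio below 1 means a lower one. The newp1 entry is approximately 0.9942.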

3.2 Discussions and Conclusions

We can draw several conclusions from the output of the regression models.

The regression model based on the biker count isn’t as accurate as the one that factors in the total from all the bridges, so we used the model that accounts for the population size. Because the dispersion index is 10.62, well above 1, we use the quasi-Poisson regression model.

Quasi-Poisson regression on the biker rate, adjusted for day of the week, high temperature, and precipitation.
Term           Estimate    Std. Error   z value   Pr(>|z|)
(Intercept)   -2.302       0.05088      -45.25    0
DayMonday      0.08804     0.01376        6.4     1.551e-10
DaySaturday    0.1628      0.01433       11.36    6.656e-30
DaySunday      0.2634      0.01395       18.89    1.487e-79
DayThursday    0.0586      0.01436        4.081   4.482e-05
DayTuesday     0.07064     0.01469        4.809   1.513e-06
DayWednesday   0.03364     0.01439        2.338   0.01938
HighTemp       0.003397    0.0005922      5.737   9.653e-09
newp1         -0.005814    0.009324      -0.6235  0.533