1 Intro for Poisson

Here we read the bike data into R from GitHub.

The data record the number of bikers crossing the Brooklyn Bridge each day, together with the day of the week, the precipitation, and the low and high temperatures of that day.

1.1 Poisson Regression on bike frequency

We first build a Poisson regression model for the biker counts and ignore the total number of bikers recorded in the data.

The Poisson regression model for the Brooklyn Bridge biker counts versus day of the week, high temperature, and precipitation.
	Estimate	Std. Error	z value
(Intercept)	7.0206554	0.0504253	139.22894
DayMonday	0.3030707	0.0137679	22.01289
DaySaturday	0.0895943	0.0143381	6.24868
DaySunday	0.2042091	0.0140217	14.56380
DayThursday	0.2598638	0.0144328	18.00505
DayTuesday	0.2114776	0.0147706	14.31749
DayWednesday	0.2629973	0.0145466	18.07966
HighTemp	0.0094271	0.0005897	15.98643
newp1	-0.3541359	0.0094149	-37.61431

The table above shows that high temperature and precipitation are both significant predictors of how many bikers cross the Brooklyn Bridge: higher temperatures are associated with more bikers, and precipitation with fewer.
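A model of this form could be fitted along the following lines. This is a minimal sketch on simulated toy data, not the assignment's data or code: the response name `BrooklynBridge` and the simulated values are assumptions, while `Day`, `HighTemp`, and `newp` follow the coefficient names in the table above.

```r
# Toy data only: simulated to mirror the predictors in the table above.
# `BrooklynBridge` is an assumed name for the count column.
set.seed(1)
n <- 210
bikes <- data.frame(
  Day = factor(rep(c("Friday", "Monday", "Saturday", "Sunday",
                     "Thursday", "Tuesday", "Wednesday"), length.out = n)),
  HighTemp = round(runif(n, 40, 95), 1),
  newp = factor(rbinom(n, 1, 0.3))   # 1 = some precipitation, 0 = none
)

# Simulate counts from a log-linear mean so the model has a signal to recover.
mu <- exp(7 + 0.01 * bikes$HighTemp - 0.35 * (bikes$newp == "1"))
bikes$BrooklynBridge <- rpois(n, mu)

# Poisson regression on the raw counts (the log link is the default).
# Factor levels sort alphabetically, so Friday is the baseline day.
count_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp,
                 family = poisson, data = bikes)
round(summary(count_fit)$coefficients[, 1:3], 4)
```

Because the factor levels are sorted alphabetically, Friday becomes the reference day, which matches the absence of a `DayFriday` row in the table.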

1.2 Poisson Regression on Rates

The following model assesses the relationship between the biker rate and high temperature, precipitation (present or absent), and day of the week. This is the primary interest of the model. We also want to adjust the relationship for the different days of the week.

Poisson regression on the biker rate, adjusted by day of the week.
	Estimate	Std. Error	z value	Pr(>|z|)
(Intercept)	-2.3023334	0.0508766	-45.2532710	0.0000000
DayMonday	0.0880389	0.0137556	6.4002306	0.0000000
DaySaturday	0.1628197	0.0143334	11.3594543	0.0000000
DaySunday	0.2634215	0.0139480	18.8860141	0.0000000
DayThursday	0.0586048	0.0143601	4.0810902	0.0000448
DayTuesday	0.0706446	0.0146887	4.8094638	0.0000015
DayWednesday	0.0336450	0.0143897	2.3381354	0.0193802
HighTemp	0.0033975	0.0005922	5.7367142	0.0000000
newp1	-0.0058136	0.0093244	-0.6234784	0.5329702

The table above indicates that the log biker rate changes little with precipitation (p = 0.533) and only slightly with high temperature, so we must next compare the days of the week and how they relate to each other.
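One common way to model a rate with Poisson regression is to add the log of the exposure as an offset. The sketch below assumes that the rate is the Brooklyn Bridge count relative to `Total`; the source does not show how the rate model was specified, so the offset, the response name `BrooklynBridge`, and the simulated data are all assumptions.

```r
# Toy data only: simulated, with an assumed exposure column `Total`.
set.seed(2)
n <- 210
bikes <- data.frame(
  Day = factor(rep(c("Friday", "Monday", "Saturday", "Sunday",
                     "Thursday", "Tuesday", "Wednesday"), length.out = n)),
  HighTemp = round(runif(n, 40, 95), 1),
  newp = factor(rbinom(n, 1, 0.3)),
  Total = round(runif(n, 5000, 25000))
)

# Simulate counts whose expected value is Total times a temperature-driven rate.
true_rate <- exp(-2.3 + 0.003 * bikes$HighTemp)
bikes$BrooklynBridge <- rpois(n, bikes$Total * true_rate)

# offset(log(Total)) fixes the coefficient of log(Total) at 1, so the
# remaining coefficients describe log(count / Total), i.e. the log rate.
rate_fit <- glm(BrooklynBridge ~ Day + HighTemp + newp + offset(log(Total)),
                family = poisson, data = bikes)

# Baseline rate: Friday, no precipitation, HighTemp = 0.
exp(coef(rate_fit)["(Intercept)"])
```

With an offset, the exposure enters the model without an estimated coefficient, which is what distinguishes a rate model from a count model that simply adds `Total` as a covariate.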

2 Graphical Comparison

The inferential tables of the Poisson regression models in the previous sections give numerical information about the potential differences across the days of the week and between the precipitation levels, but this information is not intuitive. Next, we create two graphics that make the hidden pattern visible.

The following calculation is based on the regression equation with the coefficients given in the rate-model table in Section 1.2. Note that the day-of-week and precipitation variables in the model are indicator variables, each taking only two possible values, 0 and 1; HighTemp is continuous.

For example, $\exp(-2.3023334)$ gives the biker rate on the baseline day, Friday, at the baseline precipitation level 0 (with HighTemp at 0). $\exp(-2.3023334 - 0.0058136)$ gives the biker rate on the baseline day, Friday, at precipitation level 1.
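As a worked illustration, the rate-model coefficients from Section 1.2 give the following rates, with HighTemp held at 0 purely to keep the arithmetic simple (that temperature lies outside any realistic range, so these values only show how the terms combine):

\[
\begin{aligned}
\text{Friday, newp} = 0: &\quad \exp(-2.3023334) \approx 0.1000 \\
\text{Friday, newp} = 1: &\quad \exp(-2.3023334 - 0.0058136) \approx 0.0994 \\
\text{Sunday, newp} = 0: &\quad \exp(-2.3023334 + 0.2634215) \approx 0.1302
\end{aligned}
\]

Each indicator that equals 1 adds its coefficient inside the exponent, which is the same as multiplying the baseline rate by the exponential of that coefficient.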

2.1 Conclusion and Discussion

We can draw several conclusions from the output of the regression models.

The regression model based on the biker count alone is not appropriate, since it cannot use the information on the population size. Including the total biker count in the regression model improves the model performance. See the following output of the fitted Poisson regression model.

The Poisson regression model for the biker counts versus day of the week, high temperature, precipitation, and total biker count.
	Estimate	Std. Error	z value	Pr(>|z|)
(Intercept)	6.2556817	0.0545087	114.764856	0.0000000
DayMonday	0.0962685	0.0147678	6.518809	0.0000000
DaySaturday	0.1905999	0.0145910	13.062812	0.0000000
DaySunday	0.2971638	0.0141647	20.979119	0.0000000
DayThursday	0.0965010	0.0150109	6.428715	0.0000000
DayTuesday	0.0829859	0.0151798	5.466873	0.0000000
DayWednesday	0.0343462	0.0156325	2.197099	0.0280134
HighTemp	0.0061623	0.0005958	10.343026	0.0000000
newp1	-0.0431121	0.0122411	-3.521903	0.0004285
Total	0.0000539	0.0000014	38.513379	0.0000000

This note briefly outlines the regular Poisson regression model for fitting frequency data. The Poisson regression model has a simple structure and is easy to interpret but has a relatively strong assumption - variance is equal to the mean.

If this assumption is violated, we can use negative binomial regression as an alternative. The other potential issue is excess zeros in the data, in which case we can consider zero-inflated Poisson or zero-inflated negative binomial regression models.
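The mean-variance assumptions of these models can be written side by side. The symbols below are notation introduced here for illustration: $\mu$ is the conditional mean of the count $Y$, $\phi$ is a dispersion parameter, and $\theta$ is the negative binomial shape parameter.

\[
\begin{aligned}
\text{Poisson:} &\quad \operatorname{Var}(Y) = \mu \\
\text{quasi-Poisson:} &\quad \operatorname{Var}(Y) = \phi\,\mu \\
\text{negative binomial:} &\quad \operatorname{Var}(Y) = \mu + \mu^2/\theta
\end{aligned}
\]

The Poisson model is the special case $\phi = 1$ (or $\theta \to \infty$), so an estimated $\phi$ well above 1 signals overdispersion.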

3 Quasi-Poisson Rate Model

The Poisson models above assume that there is no dispersion issue, that is, that the variance equals the mean. Fitting a quasi-Poisson model through glm() returns an estimate of the dispersion coefficient.

Quasi-Poisson regression on the biker rate, adjusted by day of the week.
	Estimate	Std. Error	z value	Pr(>|z|)
(Intercept)	-2.302	0.05088	-45.25	0
DayMonday	0.08804	0.01376	6.4	1.551e-10
DaySaturday	0.1628	0.01433	11.36	6.656e-30
DaySunday	0.2634	0.01395	18.89	1.487e-79
DayThursday	0.0586	0.01436	4.081	4.482e-05
DayTuesday	0.07064	0.01469	4.809	1.513e-06
DayWednesday	0.03364	0.01439	2.338	0.01938
HighTemp	0.003397	0.0005922	5.737	9.653e-09
newp1	-0.005814	0.009324	-0.6235	0.533

The dispersion index extracted from the quasi-Poisson object is:

Dispersion
10.62
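The code that produced this value is not shown in the output. A minimal sketch of how a dispersion estimate can be extracted from a quasi-Poisson fit in R is given below; it uses simulated overdispersed toy data and a single predictor, so the variable names and the resulting number are illustrative only.

```r
# Toy data only: negative binomial draws have variance larger than the mean.
set.seed(3)
n <- 200
HighTemp <- runif(n, 40, 95)
y <- rnbinom(n, mu = exp(5 + 0.01 * HighTemp), size = 5)

pois_fit  <- glm(y ~ HighTemp, family = poisson)
quasi_fit <- glm(y ~ HighTemp, family = quasipoisson)

# The dispersion estimate stored in the summary of a quasi-Poisson fit ...
phi <- summary(quasi_fit)$dispersion

# ... equals the Pearson chi-square statistic over the residual degrees of freedom.
phi_by_hand <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)

# Quasi-Poisson keeps the Poisson coefficients and scales the
# standard errors by sqrt(phi).
se_ratio <- summary(quasi_fit)$coefficients[, "Std. Error"] /
  summary(pois_fit)$coefficients[, "Std. Error"]

c(phi = phi, phi_by_hand = phi_by_hand)
se_ratio / sqrt(phi)   # each entry should be 1 up to rounding
```

A dispersion estimate near 1 is consistent with the Poisson assumption; values well above 1, such as the 10.62 reported here, indicate overdispersion.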

3.1 Final Working Model

The intercept represents the baseline log biker rate (at the baseline precipitation level 0 on the baseline day of the week, Friday, with HighTemp at 0). The corresponding rate is $\exp(-2.3023334) \approx 0.100$, or about 10%. The coefficient $-0.005814$ is the difference in log-rates between precipitation level 1 and the baseline precipitation level 0 on any given day of the week and at any given high temperature. To be more specific, $\log(R_{\text{newp}=1}) - \log(R_{\text{newp}=0}) = -0.005814$, which is equivalent to

\[ \log \left( \frac{R_{\text{newp}=1}}{R_{\text{newp}=0}} \right) = -0.005814 ~~~\Rightarrow~~~\frac{R_{\text{newp}=1}}{R_{\text{newp}=0}} = e^{-0.005814} \approx 0.9942 \]
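The same exponentiation applies to every coefficient in the table. The short sketch below copies the estimates from the quasi-Poisson table in Section 3 and converts them to rate ratios and percentage changes; the vector name `coefs` is introduced here for illustration.

```r
# Estimates copied from the quasi-Poisson table in Section 3.
coefs <- c(DayMonday = 0.08804, DaySaturday = 0.1628, DaySunday = 0.2634,
           DayThursday = 0.0586, DayTuesday = 0.07064, DayWednesday = 0.03364,
           HighTemp = 0.003397, newp1 = -0.005814)

# exp(coefficient) is the multiplicative change in the rate relative to the
# baseline (Friday, newp = 0), or per one-unit increase for HighTemp.
rate_ratio <- exp(coefs)
pct_change <- 100 * (rate_ratio - 1)

round(cbind(rate_ratio, pct_change), 4)
```

A rate ratio above 1 means a higher rate than on the baseline day, and a ratio below 1 means a lower rate.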

3.2 Discussions and Conclusions

We can draw several conclusions from the output of the regression models.

The regression model based on the biker count alone is not as accurate as the one that factors in the total from all the bridges, so we used the model that accounts for the total. Because the dispersion is 10.62, well above 1, we use the quasi-Poisson regression model.

Quasi-Poisson regression on the biker rate, adjusted by day of the week.
	Estimate	Std. Error	z value	Pr(>|z|)
(Intercept)	-2.302	0.05088	-45.25	0
DayMonday	0.08804	0.01376	6.4	1.551e-10
DaySaturday	0.1628	0.01433	11.36	6.656e-30
DaySunday	0.2634	0.01395	18.89	1.487e-79
DayThursday	0.0586	0.01436	4.081	4.482e-05
DayTuesday	0.07064	0.01469	4.809	1.513e-06
DayWednesday	0.03364	0.01439	2.338	0.01938
HighTemp	0.003397	0.0005922	5.737	9.653e-09
newp1	-0.005814	0.009324	-0.6235	0.533

Poisson Regression

Jaiden Neff

Assignment 9 & 10