1 Introduction

In this project we fit a quasi-Poisson regression model to the counts of cyclists entering and leaving the Williamsburg Bridge. We report the estimated dispersion parameter and use its value to decide whether the regular Poisson model or the quasi-Poisson model should be the final model. Lastly, we visualize the relationship between the number of cyclists entering and leaving the bridge and the predictor variables.

1.1 Data

Here are all the variables in this project:

  • Date: observation ID.
  • AvgTemp: (HighTemp + LowTemp)/2.
  • NewPrecip: 0 if Precipitation = 0; 1 if Precipitation > 0.
  • WilliamsburgBridge: bike count on the Williamsburg Bridge.
  • Total: bike count across all bridges.
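The two derived variables can be sketched in a few lines of Python. This is only an illustration of the definitions in the list: the function name and the input values are hypothetical and are not taken from the data set.

```python
def derive_predictors(high_temp, low_temp, precipitation):
    """Return (AvgTemp, NewPrecip) for one day's weather record."""
    # AvgTemp is the midpoint of the daily high and low temperatures.
    avg_temp = (high_temp + low_temp) / 2
    # NewPrecip is a 0/1 indicator: 1 for any precipitation, 0 for none.
    new_precip = 1 if precipitation > 0 else 0
    return avg_temp, new_precip


# Hypothetical records, for illustration only.
print(derive_predictors(80, 60, 0))
print(derive_predictors(55.0, 47.0, 0.25))
```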

2 Poisson Regression on Rates

The following model assesses the potential relationship between the cyclist count rate on the Williamsburg Bridge and Date, AvgTemp, and NewPrecip. This relationship is the primary interest of the analysis.
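One way to write the rate model is sketched below. It assumes that the rate is the Williamsburg Bridge count relative to Total, which the report does not state explicitly; the symbol \mu_i (the expected bridge count on day i) and the coefficient names \beta_0, ..., \beta_3 are introduced here for illustration.

```latex
\log\!\left(\frac{\mu_i}{\text{Total}_i}\right)
  = \beta_0 + \beta_1\,\text{Date}_i + \beta_2\,\text{AvgTemp}_i + \beta_3\,\text{NewPrecip}_i
```

Under that assumption, the model is equivalent to a Poisson regression for the count with log(Total) entering as an offset, that is, with its coefficient fixed at 1.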

Poisson regression on the rate of cyclists entering and leaving the bridge.
             Estimate     Std. Error   z value   Pr(>|z|)
(Intercept)  5.036        14.68        0.343     0.7316
Date         -0.0001417   0.0003429    -0.4132   0.6795
AvgTemp      -0.001413    0.0003525    -4.009    6.095e-05
NewPrecip    0.01941      0.005767     3.366     0.0007623

The table above indicates that Date is not a statistically significant predictor in this model (p = 0.68). Each coefficient represents the change in the log of the expected rate for a one-unit increase in the corresponding predictor. Among the predictors, NewPrecip has the largest coefficient in absolute value, although the magnitudes are not directly comparable because the predictors are measured on different scales.
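Because the coefficients are on the log scale, exponentiating them gives rate ratios that are easier to read. The sketch below uses the estimates from the Poisson rate model table; the helper name rate_ratio is introduced here for illustration.

```python
import math

# Coefficients copied from the Poisson rate model table.
coefficients = {
    "Date": -0.0001417,
    "AvgTemp": -0.001413,
    "NewPrecip": 0.01941,
}


def rate_ratio(beta):
    """Multiplicative change in the expected rate per one-unit increase."""
    return math.exp(beta)


for name, beta in coefficients.items():
    ratio = rate_ratio(beta)
    percent = (ratio - 1) * 100
    print(f"{name}: rate ratio = {ratio:.5f} ({percent:+.3f}% per unit)")
```

Under these estimates, a day with precipitation has an expected rate roughly 2% higher than a dry day, and each additional degree of AvgTemp lowers the expected rate by roughly 0.14%.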

2.1 Quasi-Poisson Rate Model

The quasi-Poisson model has the same mean structure as the Poisson model but additionally estimates a dispersion parameter.
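A common estimator of the dispersion parameter is the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below illustrates that estimator only; the report does not show how its own value was computed, and the counts and fitted means used here are hypothetical.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    residual_df = len(observed) - n_params
    if residual_df <= 0:
        raise ValueError("need more observations than parameters")
    # For a Poisson mean-variance relationship, the variance equals the mean,
    # so each squared residual is scaled by the fitted mean.
    chi_square = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_square / residual_df


# Hypothetical counts and fitted means, for illustration only.
observed = [12, 30, 7, 25, 16, 40]
fitted = [15.0, 24.0, 10.0, 22.0, 18.0, 35.0]
phi = pearson_dispersion(observed, fitted, n_params=2)
print(f"estimated dispersion: {phi:.3f}")
```

A value near 1 is consistent with the Poisson assumption, while values well above 1 point to overdispersion.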

Quasi-Poisson regression on the rate of cyclists entering and leaving the bridge.
             Estimate     Std. Error   t value   Pr(>|t|)
(Intercept)  5.036        30.44        0.1654    0.8699
Date         -0.0001417   0.000711     -0.1993   0.8436
AvgTemp      -0.001413    0.0007308    -1.934    0.06412
NewPrecip    0.01941      0.01196      1.624     0.1165

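The coefficient estimates are identical to those of the Poisson model; only the standard errors, and therefore the test statistics and p-values, change. Date is still not statistically significant (p = 0.84), so we remove it from the final model. Among the predictors, Date still has the smallest estimate in absolute value and NewPrecip the largest. AvgTemp has the smallest p-value (0.064), but neither AvgTemp nor NewPrecip (p = 0.12) is significant at the 0.05 level once the dispersion is accounted for.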

The dispersion index can be extracted from the fitted quasi-Poisson object:

Dispersion
7.451

The dispersion index is 7.451, well above the value of 1 assumed by the Poisson model, which indicates overdispersion. We therefore keep the quasi-Poisson rate model.
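As a consistency check, the standard errors of a quasi-Poisson fit are the Poisson standard errors multiplied by the square root of the dispersion estimate, so the squared ratio of the two sets of standard errors should recover the dispersion. The sketch below applies this to the values in the two tables of this section; the function name implied_dispersion is introduced here for illustration.

```python
# Standard errors copied from the Poisson and quasi-Poisson tables.
poisson_se = {
    "(Intercept)": 14.68,
    "Date": 0.0003429,
    "AvgTemp": 0.0003525,
    "NewPrecip": 0.005767,
}
quasi_se = {
    "(Intercept)": 30.44,
    "Date": 0.000711,
    "AvgTemp": 0.0007308,
    "NewPrecip": 0.01196,
}


def implied_dispersion(se_poisson, se_quasi):
    """Dispersion implied by the ratio of quasi-Poisson to Poisson standard errors."""
    return (se_quasi / se_poisson) ** 2


implied = {
    name: implied_dispersion(poisson_se[name], quasi_se[name])
    for name in poisson_se
}
for name, value in implied.items():
    print(f"{name}: implied dispersion = {value:.3f}")
```

With the tabulated values, every ratio comes out near 4.3 rather than 7.451. Both values exceed 1, so the conclusion of overdispersion is unaffected, but the report does not show how 7.451 was computed, so the difference may be worth checking.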

2.2 Visuals

The visuals below show the relationship between the number of cyclists entering and leaving the bridge and AvgTemp, Date, and precipitation.

In both graphs, the predicted number of cyclists decreases as the date advances. The other graph shows that the “no precipitation” group has a higher average predicted number of cyclists than the “precipitation” group. Precipitation noticeably affects both graphs.

3 Discussions and Conclusions

A quasi-Poisson model based on the bike count alone is not appropriate, because the total bike count is a key variable in the study of the WilliamsburgBridge count distribution. Including the total bike count in the quasi-Poisson model reduces the statistical significance of AvgTemp and NewPrecip. The following output is from the fitted quasi-Poisson model of the Williamsburg Bridge bike count adjusted for the total bike count.

Quasi-Poisson regression of the bike count on average temperature, precipitation, and the log of the total bike count.
             Estimate     Std. Error   t value      Pr(>|t|)
(Intercept)  -0.5851487   0.1735099    -3.3724224   0.0023420
AvgTemp      -0.0000765   0.0008017    -0.0954342   0.9247017
NewPrecip    0.0053194    0.0117947    0.4509952    0.6557320
log(Total)   0.9458853    0.0206076    45.8998833   0.0000000

The coefficient of log(Total) is far larger and far more significant than those of the other predictors. Further investigation is needed into why log(Total) dominates the fit.
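A rate model with Total as the offset corresponds to fixing the coefficient of log(Total) at 1 (that Total is the offset is an assumption, since the report does not state it explicitly). The sketch below measures how far the estimated coefficient is from 1 in units of its standard error; the helper name t_statistic is introduced here for illustration.

```python
# Estimate and standard error of log(Total), copied from the table.
estimate = 0.9458853
std_error = 0.0206076


def t_statistic(estimate, std_error, null_value):
    """Distance of the estimate from a hypothesized value, in standard errors."""
    return (estimate - null_value) / std_error


t_vs_zero = t_statistic(estimate, std_error, 0.0)  # the test reported in the table
t_vs_one = t_statistic(estimate, std_error, 1.0)   # the value implied by an offset
print(f"t against 0: {t_vs_zero:.3f}")
print(f"t against 1: {t_vs_one:.3f}")
```

The estimate lies about 2.6 standard errors below 1. Whether that is a meaningful departure from the rate model depends on the residual degrees of freedom, which the report does not give.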

There is a negative linear relationship between AvgTemp and Date: AvgTemp decreases as Date increases.

The last statistical observation is that there is no interaction effect between the precipitation and no-precipitation groups: the rate curves are “parallel”.
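The parallel curves follow from the form of the rate model in Section 2, which has no interaction term: on the log scale, the gap between the two precipitation groups equals the NewPrecip coefficient at every date. The sketch below checks this with the estimates from the rate model table. The Date and AvgTemp values are hypothetical, since the report does not state the scale of Date, and the gap does not depend on them.

```python
# Coefficients copied from the rate model table in Section 2.
INTERCEPT = 5.036
B_DATE = -0.0001417
B_AVGTEMP = -0.001413
B_NEWPRECIP = 0.01941


def log_rate(date, avg_temp, new_precip):
    """Linear predictor (log of the expected rate) of the rate model."""
    return (INTERCEPT + B_DATE * date
            + B_AVGTEMP * avg_temp + B_NEWPRECIP * new_precip)


# Hypothetical Date and AvgTemp values, for illustration only.
dates = [0, 10, 20, 30]
avg_temp = 60.0

# Gap between the precipitation and no-precipitation curves at each date.
gaps = [log_rate(d, avg_temp, 1) - log_rate(d, avg_temp, 0) for d in dates]
for d, gap in zip(dates, gaps):
    print(f"Date = {d}: log-rate gap = {gap:.5f}")
```

A model with a Date-by-NewPrecip interaction term would let the gap vary with Date, and the curves would then no longer be parallel.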

This is only a small data set with limited information. All conclusions in this report are based only on the given data set.

