Introduction
In this project we fit a quasi-Poisson regression model to the counts
of cyclists who entered and left the Williamsburg Bridge. We report the
estimated dispersion parameter and, based on its value, determine
whether the regular Poisson model or the quasi-Poisson model should be
used as the final model. Lastly, we make visualizations to show the
relationship between the number of cyclists who entered and left the
bridge and the related predictor variables.
Data
Here are all the variables in this project:
- Date: observation ID.
- AvgTemp: (HighTemp + LowTemp)/2.
- NewPrecip: if Precipitation = 0, then NewPrecip = 0; if Precipitation
> 0, then NewPrecip = 1.
- WilliamsburgBridge: bike count on the Williamsburg Bridge.
- Total: bike count of all bridges.
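The two derived variables can be computed directly from the raw columns. The following is a minimal Python sketch of the definitions above; the helper name `add_derived_variables` and the example rows are made up for illustration and are not taken from the actual data set.

```python
def add_derived_variables(rows):
    """Return copies of the rows with AvgTemp and NewPrecip added."""
    out = []
    for row in rows:
        new_row = dict(row)
        # AvgTemp is the midpoint of the daily high and low temperature.
        new_row["AvgTemp"] = (row["HighTemp"] + row["LowTemp"]) / 2
        # NewPrecip is an indicator: 1 if there was any precipitation, else 0.
        new_row["NewPrecip"] = 1 if row["Precipitation"] > 0 else 0
        out.append(new_row)
    return out


# Made-up example rows (illustration only, not the real observations).
example = [
    {"HighTemp": 80.0, "LowTemp": 60.0, "Precipitation": 0.0},
    {"HighTemp": 70.0, "LowTemp": 65.0, "Precipitation": 0.25},
]
derived = add_derived_variables(example)
```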
Poisson Regression on Rates
The following model assesses the potential relationship between the
bike count rate on the Williamsburg Bridge and Date, AvgTemp, and
NewPrecip. This relationship is the primary interest of the model.
Poisson regression on the rate of cyclists entering and leaving the
bridge.
| | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | 5.036 | 14.68 | 0.343 | 0.7316 |
| Date | -0.0001417 | 0.0003429 | -0.4132 | 0.6795 |
| AvgTemp | -0.001413 | 0.0003525 | -4.009 | 6.095e-05 |
| NewPrecip | 0.01941 | 0.005767 | 3.366 | 0.0007623 |
The above table indicates that Date is not a statistically significant
predictor in this model (p = 0.6795). Each coefficient represents the
change in the log of the expected rate for a one-unit increase in the
corresponding predictor. Among the predictors, NewPrecip has the
largest coefficient in absolute value, although the predictors are
measured in different units, so the magnitudes are not directly
comparable.
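To make the interpretation of the coefficients concrete, the sketch below recomputes the Wald z statistic, the two-sided normal p-value, and the rate ratio exp(coefficient) from the estimates and standard errors in the Poisson table. It is only a check on the reported numbers, not the code used to fit the model; the helper name `wald_summary` is hypothetical.

```python
import math

# (estimate, standard error) pairs copied from the Poisson table.
poisson_fit = {
    "Date": (-0.0001417, 0.0003429),
    "AvgTemp": (-0.001413, 0.0003525),
    "NewPrecip": (0.01941, 0.005767),
}


def wald_summary(estimate, std_error):
    """Return (z, two-sided normal p-value, rate ratio) for one coefficient."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    # exp(coefficient) is the multiplicative change in the expected rate
    # for a one-unit increase in the predictor.
    rate_ratio = math.exp(estimate)
    return z, p_value, rate_ratio


results = {name: wald_summary(*vals) for name, vals in poisson_fit.items()}
for name, (z, p, rr) in results.items():
    print(f"{name}: z = {z:.3f}, p = {p:.4g}, rate ratio = {rr:.4f}")
```

For example, exp(0.01941) is about 1.02, so days with precipitation have an estimated rate about 2% higher than days without, holding the other predictors fixed.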
Quasi-Poisson Rate Model
The quasi-Poisson model additionally estimates a dispersion parameter.
Quasi-Poisson regression on the rate of cyclists entering and leaving
the bridge.
| | Estimate | Std. Error | t value | p-value |
|---|---|---|---|---|
| (Intercept) | 5.036 | 30.44 | 0.1654 | 0.8699 |
| Date | -0.0001417 | 0.000711 | -0.1993 | 0.8436 |
| AvgTemp | -0.001413 | 0.0007308 | -1.934 | 0.06412 |
| NewPrecip | 0.01941 | 0.01196 | 1.624 | 0.1165 |
Looking at this table, we can see that Date is still not a significant
predictor, so we will have to take it out of the final model. The
coefficient estimates are the same as in the Poisson model: Date still
has the smallest estimate in absolute value and NewPrecip the largest
among the predictors. AvgTemp has the smallest p-value (0.06412), but
no predictor is significant at the 0.05 level in this model.
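Because the quasi-Poisson fit keeps the Poisson estimates and only inflates each standard error by the square root of the dispersion, the two tables can be cross-checked against each other. The sketch below does this arithmetic with the standard errors copied from the tables; it is a consistency check under that assumption, not part of the original analysis.

```python
# Standard errors copied from the Poisson and quasi-Poisson tables.
poisson_se = {
    "(Intercept)": 14.68,
    "Date": 0.0003429,
    "AvgTemp": 0.0003525,
    "NewPrecip": 0.005767,
}
quasi_se = {
    "(Intercept)": 30.44,
    "Date": 0.000711,
    "AvgTemp": 0.0007308,
    "NewPrecip": 0.01196,
}

# SE_quasi = SE_poisson * sqrt(dispersion), so the squared ratio of the
# standard errors gives the dispersion implied by the two tables.
implied_dispersion = {
    name: (quasi_se[name] / poisson_se[name]) ** 2 for name in poisson_se
}
for name, phi in implied_dispersion.items():
    print(f"{name}: implied dispersion = {phi:.2f}")
```

Each ratio of standard errors comes out near 2.07, which implies a dispersion of roughly 4.3. It is worth comparing this figure with the dispersion value extracted from the fitted object.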
The dispersion index can be extracted from the fitted quasi-Poisson
object.
The dispersion index is 7.451. This is well above 1, so the counts are
overdispersed relative to the Poisson model. This means that we stay
with the quasi-Poisson rate model.
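For reference, the dispersion estimate reported by a quasi-Poisson fit is conventionally the Pearson chi-square statistic divided by the residual degrees of freedom. A minimal Python sketch of that calculation follows; the helper name `pearson_dispersion` is hypothetical, and the counts and fitted means are made up for illustration rather than taken from the bridge data.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by the residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    resid_df = len(observed) - n_params
    if resid_df <= 0:
        raise ValueError("need more observations than parameters")
    # For a Poisson-type model the variance function is V(mu) = mu.
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / resid_df


# Made-up counts and fitted means (illustration only).
observed = [12, 30, 7, 25, 16]
fitted = [10.0, 20.0, 10.0, 20.0, 20.0]
phi = pearson_dispersion(observed, fitted, n_params=2)
print(f"estimated dispersion = {phi:.3f}")
```

A value close to 1 is consistent with the Poisson assumption that the variance equals the mean; values well above 1 indicate overdispersion.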
Visuals
The plots below show the relationship between the number of cyclists
who entered and left the bridge and AvgTemp, Date, and Precipitation.


Looking at these graphs, you can see that the predicted number of
bikers goes down in both as the date advances. In the other graph,
“no precipitation” has a higher average number of predicted bikers
than “precipitation”. Precipitation affects both graphs in a big way.
Discussions and Conclusions
The quasi-Poisson rate model based on the bike count is not
appropriate, since the total bike count is a key variable in the study
of the WilliamsburgBridge distribution. Including the total bike count
in the quasi-Poisson rate model reduces the statistical significance of
AvgTemp and NewPrecip. See the following output of the fitted
quasi-Poisson rate model of the Williamsburg Bridge bike count adjusted
by the total bike count.
The quasi-Poisson regression model for the counts of bikers versus
the average temperature, precipitation, and the total bike count.
| | Estimate | Std. Error | t value | p-value |
|---|---|---|---|---|
| (Intercept) | -0.5851487 | 0.1735099 | -3.3724224 | 0.0023420 |
| AvgTemp | -0.0000765 | 0.0008017 | -0.0954342 | 0.9247017 |
| NewPrecip | 0.0053194 | 0.0117947 | 0.4509952 | 0.6557320 |
| log(Total) | 0.9458853 | 0.0206076 | 45.8998833 | 0.0000000 |
The coefficient of log(Total) is far larger, and far more significant,
than those of the other predictor variables. There should be more
investigation into why log(Total) dominates the model.
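Because Total enters the model on the log scale, its coefficient acts as an elasticity: multiplying Total by a factor r multiplies the expected Williamsburg Bridge count by r raised to the coefficient. The sketch below works through that arithmetic with the estimate from the table; the helper name `count_multiplier` is hypothetical.

```python
# Coefficient of log(Total), copied from the table.
LOG_TOTAL_COEF = 0.9458853


def count_multiplier(total_ratio, coef=LOG_TOTAL_COEF):
    """Factor by which the expected count changes when Total is scaled.

    On the log scale the model adds coef * log(total_ratio), which on the
    count scale is a multiplication by total_ratio ** coef.
    """
    return total_ratio ** coef


ten_percent = count_multiplier(1.10)  # Total increases by 10%
doubling = count_multiplier(2.0)      # Total doubles
print(f"10% more total bikers -> expected count x {ten_percent:.4f}")
print(f"twice the total bikers -> expected count x {doubling:.4f}")
```

A coefficient of exactly 1 would correspond to treating log(Total) as an offset, that is, a pure rate model in which the bridge count is proportional to the total. The estimate here is close to, but slightly below, 1.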
There is a negative linear relationship between AvgTemp and Date:
AvgTemp decreases as Date increases.
The last statistical observation is that there is no apparent
interaction effect involving precipitation: the rate curves for the
precipitation and no-precipitation groups are “parallel”.
This is only a small data set with limited information. All
conclusions in this report are only based on the given data set.
