Data Description
First few records in the data set
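|   | Date  |       | HighTemp | LowTemp | Precipitation | WilliamsburgBridge | Total |
|---|-------|-------|----------|---------|---------------|--------------------|-------|
| 1 | 42826 | 42826 | 46.0     | 37      | 0.00          | 1915               | 5397  |
| 2 | 42827 | 42827 | 62.1     | 41      | 0.00          | 4207               | 13033 |
| 3 | 42828 | 42828 | 63.0     | 50      | 0.03          | 5178               | 16325 |
| 4 | 42829 | 42829 | 51.1     | 46      | 1.18          | 2279               | 6581  |
| 5 | 42830 | 42830 | 63.0     | 46      | 0.00          | 5711               | 17991 |
| 6 | 42831 | 42831 | 48.9     | 41      | 0.73          | 1739               | 4896  |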
Research Question
In this assignment we look at how many cyclists enter and leave Queens,
Manhattan, and Brooklyn via the East River bridges over each 24-hour
period. We focus specifically on the Williamsburg Bridge.
Poisson Regression on Bike Count
We first build a Poisson frequency regression model.
The Poisson regression model for the Williamsburg Bridge counts
of bikes versus predictor variables.
|               | Estimate   | Std. Error | z value    | p-value |
|---------------|------------|------------|------------|---------|
| (Intercept)   | 92.5585282 | 15.8473148 | 5.840644   | 0e+00   |
| Date          | -0.0019898 | 0.0003702  | -5.375607  | 1e-07   |
| HighTemp      | 0.0124176  | 0.0005280  | 23.516474  | 0e+00   |
| LowTemp       | 0.0090757  | 0.0007776  | 11.671588  | 0e+00   |
| Precipitation | -0.8475197 | 0.0153209  | -55.317891 | 0e+00   |
The inferential table above shows that all four predictors are
statistically significant at the 0.05 level. Williamsburg Bridge bike
counts increase with the high and low temperatures and decrease with
precipitation and with date, so there is statistical evidence that the
daily count varies with each of these variables.
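Because the Poisson model uses a log link, exponentiating a coefficient gives the multiplicative change in the expected count for a one-unit increase in that predictor. The R sketch below applies this to the estimates reported in the table above. The object names (`count_coefs`, `rate_ratio`, `pct_change`) are illustrative and not part of the original analysis.

```r
# Coefficient estimates copied from the count-model table above
count_coefs <- c(
  Date          = -0.0019898,
  HighTemp      =  0.0124176,
  LowTemp       =  0.0090757,
  Precipitation = -0.8475197
)

# With a log link, exp(beta) is the multiplicative change in the expected
# count for a one-unit increase in the predictor (other predictors fixed)
rate_ratio <- exp(count_coefs)

# The same quantity expressed as a percentage change
pct_change <- 100 * (rate_ratio - 1)

print(round(cbind(rate_ratio, pct_change), 4))
```

Under this model, one additional unit of precipitation multiplies the expected count by roughly 0.43, while one additional degree of high temperature raises it by a little over 1%.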
Poisson Regression on Rates
The following model assesses the relationship between the predictors and
the Williamsburg Bridge rate, that is, the bridge's bike count relative
to the total bike count (Total). This is the primary interest of the
analysis.
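One way to write the rate model, consistent with the `offset = log(Total)` term in the fitted call shown later in this section, is sketched below. The symbols are notation introduced here for illustration: $\mu_i$ is the expected Williamsburg Bridge count on day $i$, $T_i$ is that day's total count, and $\beta_0, \dots, \beta_4$ are the regression coefficients.

$$
\log \mu_i = \log T_i + \beta_0 + \beta_1\,\mathrm{Date}_i + \beta_2\,\mathrm{HighTemp}_i + \beta_3\,\mathrm{LowTemp}_i + \beta_4\,\mathrm{Precipitation}_i
$$

Moving the offset to the left-hand side shows that the coefficients act on the log rate:

$$
\log \frac{\mu_i}{T_i} = \beta_0 + \beta_1\,\mathrm{Date}_i + \beta_2\,\mathrm{HighTemp}_i + \beta_3\,\mathrm{LowTemp}_i + \beta_4\,\mathrm{Precipitation}_i
$$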
Poisson regression on the rate of Williamsburg Bridge bikers entering
and leaving via this bridge.
|               | Estimate   | Std. Error | z value   | p-value   |
|---------------|------------|------------|-----------|-----------|
| (Intercept)   | 16.2190289 | 15.7531962 | 1.029571  | 0.3032116 |
| Date          | -0.0004033 | 0.0003679  | -1.096057 | 0.2730539 |
| HighTemp      | -0.0016855 | 0.0005412  | -3.114497 | 0.0018426 |
| LowTemp       | 0.0010453  | 0.0007805  | 1.339286  | 0.1804777 |
| Precipitation | 0.0473201  | 0.0147457  | 3.209079  | 0.0013316 |
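The table above indicates that the log rate of Williamsburg Bridge
crossings does not respond equally to date, temperature, and
precipitation. HighTemp (p = 0.0018) and Precipitation (p = 0.0013) are
significant at the 0.05 level, while Date and LowTemp are not.
Precipitation has the largest coefficient (0.0473) and HighTemp the most
negative (-0.0017). Each regression coefficient represents the change in
the log rate of bikes crossing the bridge each day that is associated
with a one-unit increase in the corresponding predictor, holding the
other predictors fixed.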
Refitting the same rate model with the quasi-Poisson family gives the
following output.

    Call:
    glm(formula = WilliamsburgBridge ~ Date + HighTemp + LowTemp +
        Precipitation, family = quasipoisson, data = bikes, offset = log(Total))

    Coefficients:
                    Estimate Std. Error t value Pr(>|t|)
    (Intercept)   16.2190289 31.5869867   0.513    0.612
    Date          -0.0004033  0.0007377  -0.547    0.589
    HighTemp      -0.0016855  0.0010851  -1.553    0.133
    LowTemp        0.0010453  0.0015650   0.668    0.510
    Precipitation  0.0473201  0.0295668   1.600    0.122

    (Dispersion parameter for quasipoisson family taken to be 4.02049)

        Null deviance: 151.051  on 29  degrees of freedom
    Residual deviance:  99.642  on 25  degrees of freedom
    AIC: NA

    Number of Fisher Scoring iterations: 3
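The quasi-Poisson output keeps the same coefficient estimates as the Poisson rate model but reports larger standard errors. In a quasi-Poisson fit, each Poisson standard error is multiplied by the square root of the estimated dispersion parameter. The R sketch below checks this against the numbers reported in this section. The object names are illustrative and not part of the original analysis.

```r
# Poisson standard errors copied from the rate-model table
se_poisson <- c(
  `(Intercept)` = 15.7531962,
  Date          = 0.0003679,
  HighTemp      = 0.0005412,
  LowTemp       = 0.0007805,
  Precipitation = 0.0147457
)

# Estimated dispersion parameter from the quasi-Poisson output
dispersion <- 4.02049

# Quasi-Poisson standard error = Poisson standard error * sqrt(dispersion)
se_quasi <- se_poisson * sqrt(dispersion)

# Standard errors printed in the quasi-Poisson output, for comparison
se_reported <- c(31.5869867, 0.0007377, 0.0010851, 0.0015650, 0.0295668)

print(round(cbind(se_quasi, se_reported), 7))
```

With a dispersion of about 4, the standard errors roughly double and the test statistics are roughly halved. None of the predictors has a p-value below 0.05 in the quasi-Poisson output (the smallest is 0.122, for Precipitation), so the significance of HighTemp and Precipitation in the Poisson rate model should be read with caution.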
Discussions and Conclusions
The regression model for the bike count adjusted by the total count is
appropriate because the total count carries important information.
Including the daily temperatures decreases the significance of the other
variables, so we do not put the temperature variables in the final
model. See the following output of the fitted Poisson regression model
of count adjusted by total count.
The Poisson regression model for the counts of bikers on the
Williamsburg Bridge versus the date, precipitation, and total biker
count.
|               | Estimate   | Std. Error | z value    | p-value   |
|---------------|------------|------------|------------|-----------|
| (Intercept)   | 21.9017937 | 15.8197149 | 1.3844620  | 0.1662170 |
| Date          | -0.0005253 | 0.0003692  | -1.4225474 | 0.1548674 |
| HighTemp      | -0.0009698 | 0.0005706  | -1.6995988 | 0.0892064 |
| LowTemp       | 0.0014775  | 0.0007879  | 1.8753115  | 0.0607499 |
| Precipitation | -0.0014458 | 0.0193091  | -0.0748743 | 0.9403148 |
| log(Total)    | 0.9459994  | 0.0137645  | 68.7272667 | 0.0000000 |
The output above shows that adding the total count to the model changes
the p-values of all predictor variables. The log(Total) term is highly
significant (z = 68.7), which reflects the strong association between
the total bike count and the Williamsburg Bridge count. On the other
hand, adding the total has increased the p-values of the other
variables; for precipitation, the p-value rises to 0.94.
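The model with `log(Total)` as a covariate is closely related to the rate model fitted with `offset = log(Total)`. As a sketch, write $\mu_i$ for the expected Williamsburg Bridge count and $T_i$ for the total count on day $i$, with $\mathbf{x}_i^{\top}\boldsymbol{\beta}$ standing for the intercept and the remaining predictors. This notation is introduced here for illustration only. The covariate model estimates a free coefficient $\gamma$ on the log total:

$$
\log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + \gamma \log T_i
$$

The offset model is the special case in which that coefficient is fixed at one:

$$
\log \mu_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + 1 \cdot \log T_i
$$

The estimate reported in the table above is $\hat{\gamma} \approx 0.946$ with a standard error of about $0.0138$.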
Looking at the final model, it would be best to take precipitation
out, since it has a very high p-value (0.94) and could be bringing the
other variables' p-values up with it.
This is a small data set with limited information. All conclusions in
this report are only based on the given data set.
