1 Data Description

This data set is a subset of a larger collection of data on bike traffic across 4 bridges in New York City. This subset concerns bike traffic on the Queensboro Bridge on each day of the month of July. The variables are the date, the day of the week, the day's high temperature, the day's low temperature, the amount of precipitation, the count of bikes that crossed the Queensboro Bridge, and the total count of bikes that crossed all 4 bridges.

2 Objectives and Distribution Assumptions

There are two main questions we are looking to answer in our analysis of this data set. These two questions will require us to build two separate predictive models. The first question is to determine the relationship between the 4 explanatory variables (weekday, high temp, low temp, precipitation) and the count of cyclists crossing the Queensboro Bridge. These relationships will show on which types of days cyclist traffic on the Queensboro Bridge is heaviest and lightest. The second question is to determine the relationship between the 4 explanatory variables and the proportion of cyclists using the Queensboro Bridge out of the total number of cyclists crossing any of the 4 bridges (Queensboro/Total). These relationships are practically important in determining whether the Queensboro Bridge's cyclist activity follows the same trends as the other 3 bridges.

We assume that both the count of cyclists crossing the Queensboro Bridge and the count of cyclists crossing all 4 bridges follow Poisson distributions. As a result of this assumption we will use Poisson regression models for both of our association analyses. Two assumptions of the Poisson distribution are that the mean equals the variance and that all observations are independent. We should be cautious about the independence assumption, as it might not hold in this case: it is not unreasonable to think that one cyclist crossing increases the probability of a second, because cyclists often ride together. For our analysis we will assume that these violations of independence are not frequent enough to change the distribution. After the two models are built, a quasi-Poisson model will be built for the count of cyclists crossing the Queensboro Bridge, and we will analyze the differences between the quasi-Poisson model and the original Poisson model.
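The equal mean and variance property can be checked numerically. The sketch below is illustrative only: the rate `lam = 4.0` is an arbitrary assumed value rather than an estimate from the bridge data, and `poisson_pmf` is a helper defined here for the example.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability of observing exactly k events when the mean rate is lam."""
    return math.exp(-lam) * lam**k / math.factorial(k)

lam = 4.0           # assumed rate, chosen only for illustration
ks = range(0, 100)  # the probability mass beyond 100 is negligible for lam = 4

# Mean and variance computed directly from the probability mass function.
mean = sum(k * poisson_pmf(k, lam) for k in ks)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in ks)

print(round(mean, 6), round(variance, 6))
```

Both quantities come out equal to the rate. When observed counts vary much more than their mean, this assumption is violated, which is the situation a quasi-Poisson model is meant to handle.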

## Rows: 31 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Date, Day
## dbl (3): HighTemp, LowTemp, Precipitation
## num (2): QueensboroBridge, Total
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

3 Frequency model

The following model is the Poisson regression model for predicting the number of cyclists crossing the Queensboro Bridge based on the day of the week, the high temperature for the day, the low temperature for the day, and the precipitation level. Based on the p-values of the coefficients, all 4 explanatory variables have a significant relationship with cyclist count. The coefficients are not directly interpretable on the count scale, because they describe the relationship between each variable and the log mean rather than the mean; exponentiating a coefficient gives its multiplicative effect on the mean. Cyclists crossing the bridge are least common on the weekends and most common on Wednesdays. I am unsure why cyclist traffic peaks in the middle of the workweek, but the drop on the weekend is most likely a result of the lack of commuters. Precipitation has a negative relationship with the number of cyclists. This is to be expected, as people are less likely to engage in any outdoor activity in the rain.

I was surprised to find that while increases in the daily high temperature increased the log mean cyclist count, increases in the daily low temperature decreased it. This is most likely an artifact of the high correlation between these two variables. As such I will build a second model replacing the variables high temp and low temp with a new variable, median temp, defined as (High + Low) / 2. Rebuilding the model with this new variable reveals that median temp is statistically significant, with a positive relationship with the log mean of the distribution. This is an intuitive result, as we would expect people to be more likely to engage in outdoor physical activity on warmer days.

The Poisson regression model for the counts of cyclists crossing the Queensboro Bridge (high and low temperature)
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)      8.6058987   0.0458725   187.604675   0.00e+00
DayFriday       -0.0541753   0.0108546    -4.991021   6.00e-07
DaySaturday     -0.2495134   0.0099422   -25.096506   0.00e+00
DaySunday       -0.2767917   0.0099669   -27.771104   0.00e+00
DayThursday      0.0615369   0.0106391     5.784024   0.00e+00
DayTuesday       0.0423816   0.0104438     4.058063   4.95e-05
DayWednesday     0.1294945   0.0099519    13.012008   0.00e+00
HighTemp         0.0158199   0.0008034    19.691515   0.00e+00
LowTemp         -0.0197465   0.0011701   -16.876092   0.00e+00
Precipitation   -0.3221763   0.0105342   -30.583731   0.00e+00

The Poisson regression model for the counts of cyclists crossing the Queensboro Bridge (median temperature replacing high and low temperature)
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)      8.3001431   0.0428085   193.890110  0.0000000
DayFriday       -0.1106318   0.0103709   -10.667495  0.0000000
DaySaturday     -0.2805862   0.0097873   -28.668364  0.0000000
DaySunday       -0.2986431   0.0099038   -30.154408  0.0000000
DayThursday     -0.0164139   0.0097710    -1.679867  0.0929832
DayTuesday      -0.0245571   0.0097855    -2.509536  0.0120890
DayWednesday     0.0782499   0.0095605     8.184737  0.0000000
Medtemp          0.0033094   0.0005487     6.030941  0.0000000
Precipitation   -0.3376053   0.0106236   -31.778712  0.0000000
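Because the coefficients act on the log mean, exponentiating them gives rate ratios that are easier to read. The sketch below copies a few estimates from the median temperature model; `rate_ratio` is a helper name introduced here for illustration and is not part of the original analysis.

```python
import math

# Coefficients copied from the median temperature Poisson model.
coefs = {
    "DaySaturday": -0.2805862,
    "DaySunday": -0.2986431,
    "DayWednesday": 0.0782499,
    "Medtemp": 0.0033094,
    "Precipitation": -0.3376053,
}

def rate_ratio(beta: float) -> float:
    """Multiplicative change in the expected count for a one-unit increase."""
    return math.exp(beta)

for name, beta in coefs.items():
    rr = rate_ratio(beta)
    print(f"{name:14s} rate ratio = {rr:.3f} ({(rr - 1) * 100:+.1f}%)")
```

Read this way, a one-unit increase in Precipitation multiplies the expected count by about 0.71, and Sunday has about 26% fewer expected crossings than Monday, the reference day, with the other variables held fixed.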

4 Rates Model

At an alpha level of 0.05 the variable median temp is not statistically significant (p = 0.065), while precipitation and the Thursday, Friday, and Saturday contrasts with Monday are. We can therefore remove median temp from the model. Precipitation has a positive relationship with the log proportion of cyclists. This indicates that the Queensboro Bridge has a smaller drop in cyclists per increase in precipitation than the other bridges. One potential reason for this could be that the Queensboro Bridge has more cover than the other 3 bridges. The log proportion is at its smallest on Sunday, increases until it reaches its peak on Friday, and then decreases again over the weekend. This cyclical trend is interesting, though it is hard to determine a reason for it from the data alone.

The Poisson regression model for the proportion of cyclists crossing the Queensboro Bridge out of the 4 bridges (with median temperature)
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)     -1.3629003   0.0437105   -31.180148  0.0000000
DayFriday        0.0587981   0.0105973     5.548389  0.0000000
DaySaturday      0.0357250   0.0097667     3.657854  0.0002543
DaySunday       -0.0125410   0.0098214    -1.276899  0.2016378
DayThursday      0.0372178   0.0096381     3.861527  0.0001127
DayTuesday       0.0138737   0.0096862     1.432327  0.1520504
DayWednesday     0.0170347   0.0094173     1.808868  0.0704715
Medtemp         -0.0010316   0.0005595    -1.843772  0.0652163
Precipitation    0.0433469   0.0097368     4.451860  0.0000085

The Poisson regression model for the proportion of cyclists crossing the Queensboro Bridge out of the 4 bridges (median temperature removed)
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)     -1.4425633   0.0066355  -217.400820  0.0000000
DayFriday        0.0567518   0.0105395     5.384695  0.0000001
DaySaturday      0.0352129   0.0097626     3.606934  0.0003098
DaySunday       -0.0113306   0.0097987    -1.156329  0.2475466
DayThursday      0.0364482   0.0096285     3.785455  0.0001534
DayTuesday       0.0148991   0.0096689     1.540925  0.1233350
DayWednesday     0.0158896   0.0093962     1.691062  0.0908250
Precipitation    0.0468076   0.0095518     4.900409  0.0000010
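To make the log proportions concrete, the estimates of the reduced rates model can be converted back into predicted shares. The sketch below is written under an assumption the report does not state explicitly, namely that the model uses a log link with the log of Total as an offset, so that the linear predictor is the log of the Queensboro share. `predicted_share` is a hypothetical helper introduced for this example.

```python
import math

# Coefficients copied from the rates model with median temperature removed.
intercept = -1.4425633
precip_coef = 0.0468076
day_effects = {
    "Monday": 0.0,  # reference level, so it does not appear in the table
    "Tuesday": 0.0148991,
    "Wednesday": 0.0158896,
    "Thursday": 0.0364482,
    "Friday": 0.0567518,
    "Saturday": 0.0352129,
    "Sunday": -0.0113306,
}

def predicted_share(day: str, precipitation: float = 0.0) -> float:
    """Expected Queensboro share of all crossings, assuming a log link."""
    return math.exp(intercept + day_effects[day] + precip_coef * precipitation)

# Predicted shares on a day with no precipitation.
shares = {day: predicted_share(day) for day in day_effects}
for day, share in shares.items():
    print(f"{day:9s} {share:.4f}")
```

Under these assumptions the predicted dry-day share ranges from roughly 23% on Sunday to about 25% on Friday, so the weekday pattern is statistically detectable but small in absolute terms.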

5 Summary

From this analysis we were able to conclude some notable things about the frequency of cyclists on the Queensboro Bridge. Rain tends to lower the number of cyclists, but not as much as on the other 3 bridges. In the Poisson model, cyclists are more likely to cross the bridge on warmer days. Cyclists also tend to cross the bridge least on the weekends and most on Wednesdays.

6 Quasi Poisson model

Before building a quasi-Poisson model we will modify the precipitation variable to be a binary variable indicating whether there was any precipitation or not. This will make the model easier to visualize and won't affect any practical interpretation. Even if there is a meaningful difference between a day of light rain and a day of heavy rain, it is not practically useful, as the amount of precipitation is hard to estimate before the fact. We rebuild the original Poisson model, replacing precipitation with the new binary version of the variable; there is no practical difference in our interpretation of the variable compared to the non-discretized form.

Fitting a quasi-Poisson model leaves the coefficient estimates unchanged but inflates the standard errors, so there are a number of differences in the significance of certain variables. Median temperature no longer has a significant relationship with the log mean of bikes crossing the Queensboro Bridge. Median temperature can be removed from the model without significantly affecting the estimates of the other coefficients. Precipitation is still significant. Based on the new standard errors and p-values, none of the weekdays have a statistically significant difference from Monday, but Saturday and Sunday do. The dispersion parameter is estimated to be 109.1. This is much larger than 1 and indicates we should definitely use the quasi-Poisson model as our final model.

The Poisson regression model for the counts of cyclists crossing the Queensboro Bridge (binary precipitation)
                  Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)      8.1325853   0.0425137   191.293386  0.0000000
DayFriday       -0.1586579   0.0102399   -15.494037  0.0000000
DaySaturday     -0.2575913   0.0097688   -26.368845  0.0000000
DaySunday       -0.2869852   0.0097652   -29.388510  0.0000000
DayThursday      0.0100075   0.0095887     1.043677  0.2966350
DayTuesday      -0.0652087   0.0098213    -6.639518  0.0000000
DayWednesday     0.0308267   0.0096415     3.197286  0.0013873
Medtemp          0.0060353   0.0005483    11.006896  0.0000000
PrecipYNRain    -0.3189133   0.0072562   -43.950275  0.0000000

The quasi-Poisson regression model for the counts of cyclists crossing the Queensboro Bridge
                  Estimate  Std. Error      t value   Pr(>|t|)
(Intercept)      8.1325853   0.4440895   18.3129423  0.0000000
DayFriday       -0.1586579   0.1069643   -1.4832787  0.1521859
DaySaturday     -0.2575913   0.1020427   -2.5243483  0.0193106
DaySunday       -0.2869852   0.1020055   -2.8134275  0.0101220
DayThursday      0.0100075   0.1001618    0.0999135  0.9213179
DayTuesday      -0.0652087   0.1025914   -0.6356159  0.5315836
DayWednesday     0.0308267   0.1007133    0.3060833  0.7624190
Medtemp          0.0060353   0.0057277    1.0537146  0.3034495
PrecipYNRain    -0.3189133   0.0757971   -4.2074578  0.0003635

Dispersion: 109.1
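The two tables share the same estimates and differ only in their standard errors. For a quasi-Poisson fit, each Poisson standard error is multiplied by the square root of the estimated dispersion. The sketch below checks this against a few rows copied from the two tables; the small discrepancies come from the dispersion being reported to one decimal place.

```python
import math

dispersion = 109.1  # estimate reported with the quasi-Poisson model

# Poisson standard errors copied from the binary precipitation Poisson table.
poisson_se = {
    "(Intercept)": 0.0425137,
    "DaySaturday": 0.0097688,
    "Medtemp": 0.0005483,
    "PrecipYNRain": 0.0072562,
}
# Quasi-Poisson standard errors copied from the second table, for comparison.
reported_quasi_se = {
    "(Intercept)": 0.4440895,
    "DaySaturday": 0.1020427,
    "Medtemp": 0.0057277,
    "PrecipYNRain": 0.0757971,
}

scale = math.sqrt(dispersion)  # factor by which every standard error grows
for name, se in poisson_se.items():
    scaled = se * scale
    print(f"{name:13s} scaled = {scaled:.5f}  reported = {reported_quasi_se[name]:.5f}")
```

With a dispersion near 109 the standard errors grow by a factor of a little more than 10, which is why several day effects and median temperature lose significance even though their estimates are unchanged.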

7 Graphing the Poisson regression estimates

The following graph shows the predicted number of cyclists crossing the Queensboro Bridge based on the quasi-Poisson model. As temperature was found not to be statistically significant in the quasi-Poisson model, it is not included in the graph or the underlying model. This graph highlights some aspects we have discussed in the coefficient analysis, mainly how the number of cyclists drops on rainy days and on the weekend. The two lines have the same shape for days with and without precipitation. This is a consequence of the model containing no interaction term between precipitation and weekday, so the predictions differ by a constant multiplicative factor; it should not be read as evidence that no interaction exists.
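The matching shape of the two lines follows from the additive form of the model on the log scale. The sketch below illustrates this with the quasi-Poisson estimates from section 6. Two assumptions are worth flagging: the graphed model omits temperature, so its coefficients will differ somewhat from the ones used here, and `ASSUMED_MEDTEMP` is a hypothetical value chosen only to produce concrete numbers.

```python
import math

# Estimates copied from the quasi-Poisson table in section 6.
intercept = 8.1325853
medtemp_coef = 0.0060353
rain_coef = -0.3189133
day_effects = {
    "Monday": 0.0,  # reference level
    "Tuesday": -0.0652087,
    "Wednesday": 0.0308267,
    "Thursday": 0.0100075,
    "Friday": -0.1586579,
    "Saturday": -0.2575913,
    "Sunday": -0.2869852,
}

ASSUMED_MEDTEMP = 75.0  # hypothetical temperature, not taken from the data

def predicted_count(day: str, rain: bool) -> float:
    """Expected crossings from the additive log-linear predictor."""
    eta = intercept + day_effects[day] + medtemp_coef * ASSUMED_MEDTEMP
    if rain:
        eta += rain_coef
    return math.exp(eta)

# Ratio of rainy-day to dry-day predictions for each day of the week.
ratios = {
    day: predicted_count(day, True) / predicted_count(day, False)
    for day in day_effects
}
for day, ratio in ratios.items():
    print(f"{day:9s} rain / no rain = {ratio:.4f}")
```

Every day gives the same ratio, equal to the exponentiated rain coefficient, because the rain effect is added to the linear predictor regardless of the day. Testing for an interaction would require fitting a model that includes a precipitation by weekday term.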