1 Data Description

This data set is a subset of a larger collection of data on bike traffic across four bridges in New York City. This specific subset covers bike traffic on the Queensboro Bridge on each day of July. The variables in this data set are the date, the day of the week, the day’s high temperature, the day’s low temperature, the amount of precipitation, the number of bikes that crossed the Queensboro Bridge, and the total number of bikes that crossed all four bridges.
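The Python sketch below shows one way to prepare the two response quantities. It reads the data from CSV and derives the Queensboro share used in the second question. It assumes a header row with the column names Date, Day, HighTemp, LowTemp, Precipitation, QueensboroBridge and Total. It also assumes the two count columns may contain thousands separators such as "4,000". The function names are hypothetical, and the sample row is made up for illustration; it is not from the data set.

```python
import csv
import io

def parse_count(text):
    """Parse a count that may contain thousands separators, e.g. '3,019'."""
    return int(text.replace(",", "").strip())

def load_rows(handle):
    """Read the bridge data from an open CSV handle (assumed column names)."""
    rows = []
    for rec in csv.DictReader(handle):
        queensboro = parse_count(rec["QueensboroBridge"])
        total = parse_count(rec["Total"])
        rows.append({
            "Date": rec["Date"],
            "Day": rec["Day"],
            "HighTemp": float(rec["HighTemp"]),
            "LowTemp": float(rec["LowTemp"]),
            "Precipitation": float(rec["Precipitation"]),
            "QueensboroBridge": queensboro,
            "Total": total,
            # Share of all four-bridge traffic that used the Queensboro Bridge
            "Proportion": queensboro / total,
        })
    return rows

# Illustrative row only; these are NOT values from the data set.
sample = io.StringIO(
    "Date,Day,HighTemp,LowTemp,Precipitation,QueensboroBridge,Total\n"
    '1-Jul,Monday,80,65,0,"4,000","16,000"\n'
)
rows = load_rows(sample)
print(rows[0]["Proportion"])  # 0.25
```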

2 Objectives and Distribution Assumptions

There are two main questions we want to answer in our analysis of this data set, and they require two separate predictive models. The first question concerns the relationship between the 4 explanatory variables (WeekDay, High temp, Low temp, rain) and the number of cyclists crossing the Queensboro Bridge. These relationships show on which kinds of days cyclist traffic on the Queensboro Bridge is heaviest and lightest. The second question concerns the relationship between the 4 explanatory variables and the proportion of cyclists using the Queensboro Bridge out of the total number of cyclists crossing any of the 4 bridges (Queensboro/Total). These relationships are practically important for determining whether cyclist activity on the Queensboro Bridge follows the same trends as on the other three bridges.
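The two questions can be written as two Poisson regression models, assuming the usual GLM formulation (the analysis does not spell this out). Here $Y_i$ is the Queensboro count on day $i$ and $T_i$ is the four-bridge total. $\mathbf{x}_i$ holds that day's predictors: day-of-week indicators, temperature and precipitation. The baseline day is apparently Monday, the one day with no row in the coefficient tables. We also assume the proportion model was fit with $\log T_i$ as an offset, a term whose coefficient is fixed at 1.

```latex
\begin{aligned}
Y_i &\sim \operatorname{Poisson}(\mu_i) \\
\text{Frequency model:}\quad \log \mu_i &= \mathbf{x}_i^\top \boldsymbol{\beta} \\
\text{Rate model:}\quad \log \mu_i &= \log T_i + \mathbf{x}_i^\top \boldsymbol{\gamma}
  \quad\Longleftrightarrow\quad \log \frac{\mu_i}{T_i} = \mathbf{x}_i^\top \boldsymbol{\gamma}
\end{aligned}
```

Under this form, $e^{\beta_j}$ is a multiplicative effect on the expected count, and $e^{\gamma_j}$ is a multiplicative effect on the expected proportion.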

We assume that both the count of cyclists crossing the Queensboro Bridge and the count of cyclists crossing all four bridges follow Poisson distributions. As a result, we use Poisson regression models for both association analyses. A Poisson model assumes that the variance of each count equals its mean and that the observations are independent. We should be cautious about the independence assumption, which may not hold here. It is plausible that one cyclist crossing increases the probability of a second, because cyclists often ride together. For this analysis we assume that such violations of independence are not frequent enough to change the distribution.
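The Pearson dispersion statistic offers one way to check the mean-equals-variance assumption after fitting. It is the sum of squared Pearson residuals divided by the residual degrees of freedom. A value near 1 is consistent with the Poisson assumption. A value well above 1 points to overdispersion, such as the group riding described above. The sketch below is plain Python, and `pearson_dispersion` is a hypothetical helper. The observed and fitted counts would come from the fitted model. For the first frequency model, 31 days minus 10 coefficients leaves 21 residual degrees of freedom.

```python
def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Close to 1 when the variance equals the mean (Poisson); well above 1
    suggests overdispersion.
    """
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df = len(observed) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    if any(mu <= 0 for mu in fitted):
        raise ValueError("fitted means must be positive")
    # Each term is a squared Pearson residual (y - mu)^2 / mu
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / df

# Illustrative numbers only, not from the data set.
example = pearson_dispersion([10, 12, 8, 14], [11, 11, 11, 11], n_params=1)
print(round(example, 3))  # 0.606
```

A ratio clearly above 1 would usually prompt a quasi-Poisson or negative binomial fit instead.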

3 Frequency model

The following model is the Poisson regression model for predicting the number of cyclists crossing the Queensboro Bridge from the day of the week, the daily high temperature, the daily low temperature, and the precipitation level. Based on the p-values of the coefficients, all 4 explanatory variables have a significant relationship with the cyclist count. The coefficients cannot be read directly as changes in the count, because they describe effects on the log of the mean rather than on the mean itself. Exponentiating a coefficient gives the multiplicative change in the expected count. Relative to Monday, the baseline level, cyclists crossing the bridge are least common on the weekends and most common on Wednesdays. I am unsure why cycling peaks in the middle of the workweek, but the drop on the weekend is most likely a result of the lack of commuters. Precipitation has a negative relationship with the number of cyclists. This is to be expected, as people are less likely to engage in any outdoor activity in rain or snow.

I was surprised to find that increases in the daily high temperature increased the log mean cyclist count, while increases in the daily low temperature decreased it. This is most likely a modeling artifact of the high correlation between the two temperatures (multicollinearity) rather than a real effect. As such, I will build a second model that replaces HighTemp and LowTemp with a new variable, median temperature (Medtemp), computed as the midpoint of the two, (HighTemp + LowTemp)/2. In the rebuilt model, Medtemp is statistically significant and has a positive relationship with the log mean of the distribution. This is an intuitive result, as we would expect people to be more likely to engage in outdoor physical activity on warmer days.
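Because the coefficients act on the log mean, exponentiating them turns them into rate ratios, which are easier to read. The sketch below applies this to a few estimates copied from the first frequency model's table. Day effects are relative to the baseline day (Monday, the level with no row in the table). The precipitation effect is per unit of precipitation as recorded in the data. The helper names are hypothetical.

```python
import math

# Log-scale estimates copied from the first frequency model's table
coefficients = {
    "DaySaturday": -0.2495134,
    "DayWednesday": 0.1294945,
    "HighTemp": 0.0158199,
    "LowTemp": -0.0197465,
    "Precipitation": -0.3221763,
}

def rate_ratio(beta):
    """Multiplicative change in the expected count (per unit, or vs. baseline day)."""
    return math.exp(beta)

def percent_change(beta):
    """Same effect expressed as a percentage change in the expected count."""
    return 100.0 * (math.exp(beta) - 1.0)

for name, beta in coefficients.items():
    print(f"{name:14s} rate ratio = {rate_ratio(beta):.3f} ({percent_change(beta):+.1f}%)")
```

For example, exp(-0.3221763) ≈ 0.725. Each additional unit of precipitation is therefore associated with roughly 27.5% fewer expected crossings, holding the other variables fixed. Saturdays have about 22% fewer expected crossings than Mondays.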

The Poisson regression model for the counts of cyclists crossing the Queensboro Bridge (Model 1: HighTemp and LowTemp)
Term            Estimate  Std. Error     z value  Pr(>|z|)
(Intercept)    8.6058987   0.0458725  187.604675  0.00e+00
DayFriday     -0.0541753   0.0108546   -4.991021  6.00e-07
DaySaturday   -0.2495134   0.0099422  -25.096506  0.00e+00
DaySunday     -0.2767917   0.0099669  -27.771104  0.00e+00
DayThursday    0.0615369   0.0106391    5.784024  0.00e+00
DayTuesday     0.0423816   0.0104438    4.058063  4.95e-05
DayWednesday   0.1294945   0.0099519   13.012008  0.00e+00
HighTemp       0.0158199   0.0008034   19.691515  0.00e+00
LowTemp       -0.0197465   0.0011701  -16.876092  0.00e+00
Precipitation -0.3221763   0.0105342  -30.583731  0.00e+00
The Poisson regression model for the counts of cyclists crossing the Queensboro Bridge (Model 2: Medtemp replaces HighTemp and LowTemp)
Term            Estimate  Std. Error     z value   Pr(>|z|)
(Intercept)    8.3001431   0.0428085  193.890110  0.0000000
DayFriday     -0.1106318   0.0103709  -10.667495  0.0000000
DaySaturday   -0.2805862   0.0097873  -28.668364  0.0000000
DaySunday     -0.2986431   0.0099038  -30.154408  0.0000000
DayThursday   -0.0164139   0.0097710   -1.679867  0.0929832
DayTuesday    -0.0245571   0.0097855   -2.509536  0.0120890
DayWednesday   0.0782499   0.0095605    8.184737  0.0000000
Medtemp        0.0033094   0.0005487    6.030941  0.0000000
Precipitation -0.3376053   0.0106236  -31.778712  0.0000000
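The temperature variables behind the second model can be sketched in a few lines. `medtemp` computes the midpoint (HighTemp + LowTemp)/2 used as Medtemp. `pearson_r` measures how strongly the daily highs and lows move together, which is the collinearity blamed for the opposite-signed temperature coefficients in the first model. Both helpers are hypothetical plain-Python illustrations, and the correlation value itself is not reported in this analysis.

```python
import math

def medtemp(high, low):
    """Midpoint of the daily high and low: (HighTemp + LowTemp) / 2."""
    return (high + low) / 2.0

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

Given the 31 daily highs and lows, `pearson_r(highs, lows)` close to 1 would support the collinearity explanation. `[medtemp(h, l) for h, l in zip(highs, lows)]` would give the single replacement predictor.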

4 Rates Model

At an alpha level of 0.05, the variable Medtemp is not statistically significant (p ≈ 0.065), so we remove it from the model. In both rate models, precipitation and the Friday, Saturday, and Thursday indicators are significant, while the Sunday, Tuesday, and Wednesday indicators are not. Precipitation has a positive relationship with the log proportion of cyclists using the Queensboro Bridge. This indicates that cyclist traffic on the Queensboro Bridge drops less per unit increase in precipitation than traffic on the other bridges. One potential reason is that the Queensboro Bridge offers more cover than the other three bridges. The log proportion is at its smallest on Sunday, increases until it peaks on Friday, and then decreases again over the weekend. This cyclical trend is interesting, though it is hard to determine a reason for it from the data alone.
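The p-values in these tables are Wald tests. The z value is the estimate divided by its standard error, compared against a standard normal distribution. The sketch below recomputes the test for Medtemp from the rounded table values, using only Python's `math.erfc`. The helper name is hypothetical. The small difference from the printed z value of -1.843772 comes from rounding in the table.

```python
import math

def wald_p_value(estimate, std_error):
    """Wald z statistic and two-sided normal p-value for H0: coefficient = 0."""
    z = estimate / std_error
    # For standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2))
    return z, math.erfc(abs(z) / math.sqrt(2.0))

z, p = wald_p_value(-0.0010316, 0.0005595)  # Medtemp row of the rate model
alpha = 0.05
print(f"z = {z:.3f}, p = {p:.4f}, significant at alpha = {alpha}: {p < alpha}")
```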

The Poisson regression model for the proportion of cyclists crossing the Queensboro Bridge out of the 4 bridges (full model, including Medtemp)
Term            Estimate  Std. Error     z value   Pr(>|z|)
(Intercept)   -1.3629003   0.0437105  -31.180148  0.0000000
DayFriday      0.0587981   0.0105973    5.548389  0.0000000
DaySaturday    0.0357250   0.0097667    3.657854  0.0002543
DaySunday     -0.0125410   0.0098214   -1.276899  0.2016378
DayThursday    0.0372178   0.0096381    3.861527  0.0001127
DayTuesday     0.0138737   0.0096862    1.432327  0.1520504
DayWednesday   0.0170347   0.0094173    1.808868  0.0704715
Medtemp       -0.0010316   0.0005595   -1.843772  0.0652163
Precipitation  0.0433469   0.0097368    4.451860  0.0000085
The Poisson regression model for the proportion of cyclists crossing the Queensboro Bridge out of the 4 bridges (reduced model, Medtemp removed)
Term            Estimate  Std. Error      z value   Pr(>|z|)
(Intercept)   -1.4425633   0.0066355  -217.400820  0.0000000
DayFriday      0.0567518   0.0105395     5.384695  0.0000001
DaySaturday    0.0352129   0.0097626     3.606934  0.0003098
DaySunday     -0.0113306   0.0097987    -1.156329  0.2475466
DayThursday    0.0364482   0.0096285     3.785455  0.0001534
DayTuesday     0.0148991   0.0096689     1.540925  0.1233350
DayWednesday   0.0158896   0.0093962     1.691062  0.0908250
Precipitation  0.0468076   0.0095518     4.900409  0.0000010
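Assume the rate model was fit with log(Total) as an offset. Exponentiating the linear predictor then gives the expected Queensboro share of the four-bridge traffic. The sketch below uses the reduced rate model's coefficients to compute that share for each day at zero precipitation. The constant and function names are hypothetical. Monday is treated as the baseline because it has no row in the table.

```python
import math

# Coefficients from the reduced rate model (Medtemp removed)
INTERCEPT = -1.4425633
DAY_EFFECT = {
    "Monday": 0.0,  # assumed baseline level: no row in the table
    "Tuesday": 0.0148991,
    "Wednesday": 0.0158896,
    "Thursday": 0.0364482,
    "Friday": 0.0567518,
    "Saturday": 0.0352129,
    "Sunday": -0.0113306,
}
PRECIPITATION = 0.0468076

def predicted_share(day, precipitation):
    """Expected Queensboro/Total share: exp(intercept + day effect + slope * precipitation)."""
    eta = INTERCEPT + DAY_EFFECT[day] + PRECIPITATION * precipitation
    return math.exp(eta)

for day in DAY_EFFECT:
    print(f"{day:9s} {predicted_share(day, 0.0):.4f}")
```

At zero precipitation this gives a share of about 0.236 on Monday. The ordering across days follows the day coefficients, with the largest share on Friday and the smallest on Sunday.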

5 Summary

From this analysis we were able to draw several notable conclusions about cyclist traffic on the Queensboro Bridge. Rain or snow tends to lower the number of cyclists, but proportionally less than on the other three bridges. Bikers are more likely to cross the bridge on warmer days. Bikers also tend to cross the bridge least on the weekends and most on Wednesdays.