1 Introduction and Background

This analysis uses a dataset sourced from New York City’s Traffic Information Management System (TIMS). TIMS recorded the number of cyclists entering and leaving three of New York City’s five boroughs (Queens, Manhattan and Brooklyn) via the East River Bridges: the Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge. The recordings were made in 2017, and our available copy of the data covers three months: April, July and October.

This analysis looks at a randomly selected subset of the larger dataset (chosen using R’s runif function) that pertains to cyclists who entered and left our three boroughs of interest via the Manhattan Bridge throughout July 2017. The subset has 31 observations, one for each day of the month, and no missing values. A breakdown of the original dataset’s variables, their practical meanings and their data types is below.

| Name | Meaning | Data_Type |
|---|---|---|
| Date | Date for that observation; YYYY-MM-DD form | Date |
| Day | Day of the week for that observation | character |
| HighTemp | That day’s highest recorded temperature | double |
| LowTemp | That day’s lowest recorded temperature | double |
| Precipitation | Measure of rain that day (inches) | double |
| Manhattan | Number of cyclists entering/leaving Queens, Manhattan or Brooklyn via the MANHATTAN Bridge | double |
| Total | Total number of cyclists entering/leaving Queens, Manhattan or Brooklyn via ANY of the East River Bridges | double |

1.1 Objective of Analysis

With the available data, my goal for this analysis is to examine how weather conditions and day of the week are associated with the amount of cyclist traffic on the Manhattan Bridge. To do this, I created two new variables, MeanTemp and TempDiff: MeanTemp is the average of a day’s low and high temperatures, and TempDiff is the difference between them.

Using these temperature-related metrics, along with precipitation and day of the week, I will use Poisson regression to see which, if any, of these factors are associated with the overall amount or the relative rate of cyclist traffic on the Manhattan Bridge.
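A minimal sketch of how the two derived variables could be computed, assuming the data frame is named `Data` as in the model code later in this report and that TempDiff is taken as high minus low. The three rows of temperatures are made-up illustration values, not observations from the TIMS data.

```r
# Toy stand-in for the real data frame; these temperature values are hypothetical.
Data <- data.frame(
  HighTemp = c(84.9, 87.1, 75.0),
  LowTemp  = c(72.0, 73.0, 66.9)
)

# MeanTemp: midpoint of the day's high and low temperatures
Data$MeanTemp <- (Data$HighTemp + Data$LowTemp) / 2

# TempDiff: daily temperature range (assumed here to be high minus low)
Data$TempDiff <- Data$HighTemp - Data$LowTemp

print(Data)
```

Defining TempDiff as high minus low keeps the variable non-negative, so a positive coefficient reads as "more cyclists on days with a wider temperature range."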

2 Poisson Regression Modeling

To explore any potential associations, I fit two types of Poisson regression models: one for counts and one for rates.

Poisson counts regression models the number of occurrences of a particular event (in this case cyclists on the Manhattan Bridge) and uses a logarithmic link function to determine which, if any, of the explanatory variables have a significant effect on the response variable’s mean. The formula for this regression is below:

\[\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\]

  • \(\mu\) = the mean of our response variable; \(x_1, \dots, x_p\) = the predictor variables

  • \(\beta_0\) = the log of our response variable’s mean when every predictor is zero or at its baseline level; not very useful for practical interpretation

  • \(\beta_1\), \(\beta_2\), \(\beta_3\), … \(\beta_p\) = the change in our response variable’s log mean associated with a one unit increase in the corresponding predictor variable, holding the other predictors fixed


Poisson rates regression, in turn, models the expected rate of an event’s occurrence relative to a larger “population.” In this analysis, our variable Total, which represents the total number of cyclists on all the East River Bridges, is the population of which the cyclists on the Manhattan Bridge are a proportion. The calculation is similar to counts regression, but the logarithm of the population variable enters the model as an offset. Writing \(t\) for the population variable, this can be expressed in both of the following ways.

\[\log\left(\frac{\mu}{t}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\]

\[\log(\mu) = \log(t) + \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p\]

  • In Poisson rates regression, the parameters \(\beta_0\), … \(\beta_p\) are interpreted in the same manner as in the Poisson counts model, except that they describe the log of the rate \(\mu / t\) rather than the log of the mean count.
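The two ways of including the population term can be checked against each other on simulated data. The sketch below is an illustration under assumed names (`sim`, `x`, `total`, `fit_arg`, `fit_term` are all hypothetical) and does not use the TIMS data; it compares supplying the offset through the `offset` argument of `glm()` with writing it as an `offset()` term in the formula.

```r
set.seed(1)  # reproducible simulated data; none of this comes from the TIMS dataset

# Hypothetical example: 'total' is the population size, 'x' a single predictor
n     <- 200
x     <- rnorm(n)
total <- sample(5000:20000, n, replace = TRUE)
y     <- rpois(n, lambda = total * exp(-1.2 + 0.05 * x))
sim   <- data.frame(y, x, total)

# Form 1: offset supplied through glm()'s 'offset' argument
fit_arg <- glm(y ~ x, offset = log(total),
               family = poisson(link = "log"), data = sim)

# Form 2: offset written as a term in the model formula
fit_term <- glm(y ~ x + offset(log(total)),
                family = poisson(link = "log"), data = sim)

# Compare the estimated coefficients from the two forms
print(all.equal(coef(fit_arg), coef(fit_term)))
print(coef(fit_arg))
```

Because the offset has a fixed coefficient of 1, no parameter is estimated for `log(total)`; the remaining coefficients describe the log of the rate.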

2.1 Poisson Regression (Counts)

Below is a summary of the Poisson counts regression model I created, with mean temperature, temperature range, precipitation and day of the week as predictors of how many cyclists crossed the Manhattan Bridge into or out of our three boroughs of interest.

library(knitr)  # provides kable()

# Counts Model:
  # Response = Manhattan
  # Predictors = Day, MeanTemp, TempDiff, Precipitation
    # Day is stored as a factor

Counts_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, family = poisson(link = "log"), data = Data)

Counts_Model_Sum = summary(Counts_Model)
Counts_Model_Coef = Counts_Model_Sum$coefficients

invisible(Counts_Model_Coef)
kable(Counts_Model_Coef, caption = "Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")

Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 8.5013371 | 0.0421490 | 201.697286 | 0.0000000 |
| DayMonday | 0.3199236 | 0.0089935 | 35.572866 | 0.0000000 |
| DayTuesday | 0.3357894 | 0.0093242 | 36.012843 | 0.0000000 |
| DayWednesday | 0.4023102 | 0.0090796 | 44.309388 | 0.0000000 |
| DayThursday | 0.2807381 | 0.0096557 | 29.074878 | 0.0000000 |
| DayFriday | 0.1331873 | 0.0107366 | 12.405032 | 0.0000000 |
| DaySaturday | -0.0859127 | 0.0097279 | -8.831613 | 0.0000000 |
| MeanTemp | -0.0029260 | 0.0006129 | -4.774076 | 0.0000018 |
| TempDiff | 0.0143243 | 0.0008575 | 16.703807 | 0.0000000 |
| Precipitation | -0.4307477 | 0.0104214 | -41.332836 | 0.0000000 |

# All predictor variables are significant

In the model, every predictor variable is statistically significant, with p-values well below the conventional threshold of 0.05, so no stepwise regression or model simplification is necessary.

As for the practical implications of our model summary, although every predictor variable is statistically significant, the magnitudes of the estimated coefficients on the log scale are relatively small. Precipitation’s estimated negative effect on the log mean of Manhattan Bridge cyclists has an absolute value of about 0.4307, which is the highest of all our predictors.

The day’s average temperature and the difference between the daily high and low appear to have very little practical effect on the log mean of that day’s cyclists (estimates of -0.0029 and 0.0143 respectively). The day of the week has a somewhat larger effect. With Sunday coded as the baseline, Wednesday has the greatest amount of cyclist traffic (0.4023) and Saturday the least (-0.0859). The higher count of cyclists during the workweek could be due to many cyclists using the Manhattan Bridge as a commuting route.

All in all, our Poisson counts model yields some interesting and statistically significant findings, most notably that precipitation is more strongly associated with cyclist counts than temperature is, and that cyclist traffic rises through the middle of the workweek before falling off toward the weekend. However, the relatively small magnitude of each variable’s estimated effect limits the model’s practical utility.
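Because the coefficients are on the log scale, their size can be easier to judge after exponentiating, which turns each one into a multiplicative effect on the expected count. The sketch below copies a few estimates from the counts model table; the object names (`counts_coef`, `rate_ratios`, `percent_change`) are illustrative and not part of the original analysis.

```r
# Coefficients copied from the counts model table (log scale)
counts_coef <- c(
  DayWednesday  =  0.4023102,
  DaySaturday   = -0.0859127,
  MeanTemp      = -0.0029260,
  TempDiff      =  0.0143243,
  Precipitation = -0.4307477
)

# Exponentiating gives the multiplicative effect on the expected count
rate_ratios <- exp(counts_coef)

# Percent change in expected count per one-unit increase
# (for the Day terms: relative to the Sunday baseline)
percent_change <- 100 * (rate_ratios - 1)

print(round(cbind(rate_ratios, percent_change), 3))
```

Read this way, the Precipitation estimate corresponds to multiplying the expected count by roughly exp(-0.43) ≈ 0.65 for each additional inch of rain, holding the other predictors fixed, while the per-degree temperature terms stay within about 1.5% of 1.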

2.2 Poisson Regression (Rates)

After the Poisson counts regression, I performed Poisson rates regression, using the total number of cyclists entering and exiting our three boroughs of interest across all the East River Bridges as the “population” of which the Manhattan Bridge cyclists are a part.

I created two Poisson rates models. In the first, both temperature variables were statistically insignificant (p = 0.098 for MeanTemp and p = 0.327 for TempDiff). Given this, and their minute practical significance in the counts model, I removed them and fit a second Poisson rates model that does not include the day’s average temperature or temperature range.

### Rates Model 1
Rates_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model_Sum = summary(Rates_Model)
Rates_Model_Coef = Rates_Model_Sum$coefficients

invisible(Rates_Model_Coef)
kable(Rates_Model_Coef, caption = "Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")

Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -1.1844325 | 0.0418215 | -28.3211719 | 0.0000000 |
| DayMonday | 0.0418134 | 0.0088829 | 4.7071774 | 0.0000025 |
| DayTuesday | 0.0549949 | 0.0094416 | 5.8247706 | 0.0000000 |
| DayWednesday | 0.0316272 | 0.0090743 | 3.4853702 | 0.0004915 |
| DayThursday | 0.0048565 | 0.0096974 | 0.5008067 | 0.6165072 |
| DayFriday | -0.0167479 | 0.0108925 | -1.5375635 | 0.1241554 |
| DaySaturday | -0.0667274 | 0.0097414 | -6.8498669 | 0.0000000 |
| MeanTemp | -0.0010004 | 0.0006053 | -1.6527512 | 0.0983815 |
| TempDiff | 0.0008449 | 0.0008628 | 0.9792330 | 0.3274649 |
| Precipitation | -0.0306511 | 0.0095235 | -3.2184824 | 0.0012887 |

### Rates Model 2
Rates_Model2 = glm(Manhattan ~ Day + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model2_Sum = summary(Rates_Model2)
Rates_Model2_Coef = Rates_Model2_Sum$coefficients

invisible(Rates_Model2_Coef)
kable(Rates_Model2_Coef, caption = "Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists")

Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists

| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -1.2497005 | 0.0065309 | -191.3530685 | 0.0000000 |
| DayMonday | 0.0417392 | 0.0088231 | 4.7306551 | 0.0000022 |
| DayTuesday | 0.0522471 | 0.0090521 | 5.7718313 | 0.0000000 |
| DayWednesday | 0.0285783 | 0.0088706 | 3.2216687 | 0.0012745 |
| DayThursday | 0.0006398 | 0.0091826 | 0.0696739 | 0.9444532 |
| DayFriday | -0.0205402 | 0.0106430 | -1.9299220 | 0.0536165 |
| DaySaturday | -0.0684652 | 0.0096858 | -7.0685797 | 0.0000000 |
| Precipitation | -0.0288171 | 0.0093266 | -3.0897749 | 0.0020031 |

Looking at our second Poisson rates regression model, we see a pattern similar to that of the counts model: most terms are statistically significant, but the magnitudes of the regression coefficients suggest little practical significance.

Once again treating Sunday as our baseline, the Manhattan Bridge’s share of all East River Bridge cyclists is highest early in the week and declines going into the weekend. That said, the Thursday coefficient (p = 0.944) and, to a lesser extent, the Friday coefficient (p = 0.054) are not statistically significant at the 0.05 level, so the Manhattan Bridge’s rate on those days cannot be distinguished from its Sunday rate. The apparent decline at the tail end of the workweek could therefore be due to random chance rather than a characteristic of the bridge that affects its cyclists only on those days.
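As with the counts model, exponentiating the rates coefficients puts them on a more interpretable scale: the intercept becomes the estimated Manhattan Bridge share for the baseline case, and the other terms become multiplicative changes in that share. The sketch below copies a few estimates from the second rates model table; the object names are illustrative only.

```r
# Coefficients copied from the second rates model table (log scale)
rates_coef <- c(
  Intercept     = -1.2497005,
  DayTuesday    =  0.0522471,
  DaySaturday   = -0.0684652,
  Precipitation = -0.0288171
)

# Baseline: estimated Manhattan Bridge share on a Sunday with zero precipitation
baseline_share <- exp(rates_coef[["Intercept"]])

# Multiplicative change in that share for the remaining terms
share_ratios <- exp(rates_coef[-1])

print(round(baseline_share, 4))
print(round(share_ratios, 4))
```

The baseline share comes out at roughly 0.287, and each of the other terms shifts that share by only a few percent in either direction.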
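Dropping both temperature terms at once could also be checked with a likelihood-ratio (deviance) test between the two nested models, rather than relying only on the individual p-values. The sketch below shows the mechanics on simulated data; every name and value in it is hypothetical, and it is not a rerun of the original analysis. With the real models, the analogous call would compare the second rates model against the first.

```r
set.seed(42)  # simulated data only; not the TIMS observations

n     <- 31
day   <- factor(rep(c("Sun", "Mon", "Tue", "Wed", "Thu", "Fri", "Sat"),
                    length.out = n))
temp  <- rnorm(n, mean = 78, sd = 4)   # hypothetical daily mean temperature
tdiff <- rnorm(n, mean = 14, sd = 3)   # hypothetical daily temperature range
rain  <- rexp(n, rate = 5)             # hypothetical precipitation (inches)
total <- sample(12000:24000, n, replace = TRUE)

# The simulated share depends on rain only, so the temperature terms are noise here
y   <- rpois(n, lambda = total * exp(-1.25 - 0.03 * rain))
sim <- data.frame(y, day, temp, tdiff, rain, total)

full    <- glm(y ~ day + temp + tdiff + rain, offset = log(total),
               family = poisson(link = "log"), data = sim)
reduced <- glm(y ~ day + rain, offset = log(total),
               family = poisson(link = "log"), data = sim)

# Likelihood-ratio test of the two dropped terms (2 degrees of freedom)
lrt <- anova(reduced, full, test = "Chisq")
print(lrt)
```

A large p-value in the second row of the output would indicate that the two temperature terms do not improve the fit beyond what chance would explain.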

2.3 Day-By-Day Averages

Since both our counts and rates models suggested that the day of the week is clearly associated with the log mean of the Manhattan Bridge’s cyclists, I calculated the average count and rate for each day of the week to compare them with each other and with the mean across all days considered. The table with this information is below.

library(knitr)       # provides kable()
library(kableExtra)  # provides kable_styling() and %>%

Day_Names = c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Average daily count: all days first, then each day of the week
Day_Count_Means = sapply(Day_Names, function(d) mean(Data$Manhattan[Data$Day == d]), USE.NAMES = FALSE)
Count_Averages = round(c(mean(Data$Manhattan), Day_Count_Means))

# Manhattan Bridge share of all East River Bridge cyclists: all days, then each day of the week
AllDays_Rates_Avg = sum(Data$Manhattan)/sum(Data$Total)
Day_Rates = sapply(Day_Names, function(d) sum(Data$Manhattan[Data$Day == d])/sum(Data$Total[Data$Day == d]), USE.NAMES = FALSE)
Day_Rates_Averages = c(AllDays_Rates_Avg, Day_Rates)

Rate_Averages = round(Day_Rates_Averages, digits = 4)

Days = c("All Days", Day_Names)

# Differences from the all-days value (the first entry is the all-days value compared with itself, so 0)
Counts_Difference = Count_Averages - Count_Averages[1]
Rates_Difference = round(Day_Rates_Averages - AllDays_Rates_Avg, digits = 4)

Table = cbind(Days, Count_Averages, Counts_Difference, Rate_Averages, Rates_Difference)

kable(Table, caption = "Distribution of Manhattan Bridge Cyclist Count and Rates July 2017") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center"
  )

Distribution of Manhattan Bridge Cyclist Count and Rates July 2017

| Days | Count_Averages | Counts_Difference | Rate_Averages | Rates_Difference |
|---|---|---|---|---|
| All Days | 5425 | 0 | 0.2885 | 0 |
| Sunday | 4690 | -735 | 0.2865 | -0.002 |
| Monday | 6001 | 576 | 0.2975 | 0.009 |
| Tuesday | 6363 | 938 | 0.302 | 0.0135 |
| Wednesday | 6938 | 1513 | 0.2949 | 0.0064 |
| Thursday | 5999 | 574 | 0.2868 | -0.0017 |
| Friday | 4338 | -1087 | 0.2775 | -0.0109 |
| Saturday | 4031 | -1394 | 0.2665 | -0.022 |

The table adds detail to the implications of our Poisson count and rate models: Manhattan Bridge cyclist counts on Monday through Thursday far exceed those from Friday through Sunday. The average number of cyclists from Monday through Thursday is about 6,325, while the average from Friday through Sunday is about 4,353.

As for the rate of Manhattan Bridge cyclists relative to cyclists on all East River Bridges, the Manhattan Bridge’s rate is slightly above average Monday through Wednesday and below average Thursday through Sunday.
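The two group averages quoted above can be reproduced from the rounded day-of-week averages in the table. This is a small arithmetic check; the names `avg_counts`, `mon_thu` and `fri_sun` are illustrative only.

```r
# Rounded day-of-week average counts copied from the table
avg_counts <- c(
  Sunday = 4690, Monday = 6001, Tuesday = 6363, Wednesday = 6938,
  Thursday = 5999, Friday = 4338, Saturday = 4031
)

# Unweighted mean of the daily averages within each group of days
mon_thu <- mean(avg_counts[c("Monday", "Tuesday", "Wednesday", "Thursday")])
fri_sun <- mean(avg_counts[c("Friday", "Saturday", "Sunday")])

print(c(mon_thu = mon_thu, fri_sun = fri_sun))
```

These are unweighted means of the daily averages, so each day of the week counts equally regardless of how many times it occurs in the month.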

3 Conclusion and Takeaways

To conclude, any action taken in response to these findings should be taken with caution, given the low practical significance found in both our count and rate Poisson models. That said, there are still valuable takeaways from our analysis.

First, the Manhattan Bridge is busier, both in raw volume and as a proportion of overall East River Bridge traffic, early in and through most of the standard workweek than it is during the weekend. Second, the daily average temperature and the difference between the day’s high and low played very little, if any, role in the count or rate of cyclists on any given day, whereas precipitation does appear to have a noticeable negative association with the number of cyclists on the Manhattan Bridge that day.

If we were to continue or expand this analysis, it would be valuable to extend the data beyond July into months that border on seasonal changes, such as March, April or October. Intuitively, one might expect daily temperatures to play a much larger role at those times of year.

