Introduction and Background
Here we have a dataset sourced from New York City’s Traffic
Information Management System (TIMS). TIMS recorded the number of
cyclists entering and leaving three of New York City’s five boroughs -
Queens, Manhattan and Brooklyn - via a collection of bridges known as
the East River Bridges (Brooklyn Bridge, Manhattan Bridge, Williamsburg
Bridge, and Queensboro Bridge). These recordings took place in 2017.
April, July and October are the three months that are present in our
available copy of the data.
For today's analysis we will look at a randomly selected subset of the larger dataset (the subset was chosen using R's runif function): cyclists who entered and left our three boroughs of interest (Queens, Manhattan, and Brooklyn) via the Manhattan Bridge throughout the entire month of July 2017. This subset has 31 observations, one for each day, and no missing values. Each of the original dataset's variables, its practical meaning, and its data type are listed below.
| Name | Meaning | Data Type |
|---|---|---|
| Date | Date for that observation; YYYY-MM-DD form | Date |
| Day | Day of the week for that observation | character |
| HighTemp | That day's highest recorded temperature | double |
| LowTemp | That day's lowest recorded temperature | double |
| Precipitation | Measure of rain that day (inches) | double |
| Manhattan | Number of cyclists entering/leaving Queens, Manhattan or Brooklyn via the MANHATTAN Bridge | double |
| Total | Total number of cyclists entering/leaving Queens, Manhattan or Brooklyn via ANY of the East River Bridges | double |
Objective of Analysis
With the available data, my goal for this analysis is to examine how weather conditions and the day of the week are associated with the amount of cyclist traffic on the Manhattan Bridge. To do this, I created two new variables, MeanTemp and TempDiff, calculated respectively as the average of that day's low and high temperatures and the difference between the high and the low.
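The report does not show how the derived variables were created. Below is a minimal sketch, assuming a data frame named Data with the HighTemp, LowTemp, and Day columns described above; the three rows are hypothetical placeholder values, not TIMS data. The factor step is one possible way to make Sunday the baseline level that the model tables below report against, and is an assumption about how Day was prepared.

```r
# Hypothetical placeholder rows standing in for the TIMS data (values are made up)
Data = data.frame(
  Day = c("Saturday", "Sunday", "Monday"),
  HighTemp = c(88, 84, 90),
  LowTemp = c(72, 70, 75),
  stringsAsFactors = FALSE
)

# MeanTemp: average of the day's high and low temperatures
Data$MeanTemp = (Data$HighTemp + Data$LowTemp) / 2
# TempDiff: the day's temperature range (high minus low)
Data$TempDiff = Data$HighTemp - Data$LowTemp

# Assumption: Day stored as a factor with Sunday as the first (reference) level,
# so glm() reports every other day relative to Sunday
Data$Day = factor(Data$Day, levels = c("Sunday", "Monday", "Tuesday", "Wednesday",
                                       "Thursday", "Friday", "Saturday"))
print(Data)
```

Listing the levels explicitly matters because R otherwise orders factor levels alphabetically, which would make Friday the baseline.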
Using these temperature-related metrics, along with precipitation and the day of the week, I will use Poisson regression to see which, if any, of these factors are associated with the overall amount or the relative rate of cyclist traffic on the Manhattan Bridge.
Poisson Regression Modeling
To explore these potential associations, I fit two types of Poisson regression model: one for counts and one for rates.
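Poisson counts regression models the number of occurrences of a particular event (in this case, cyclists crossing the Manhattan Bridge on a given day). It relates the logarithm of the response variable's mean to a linear combination of the explanatory variables, which lets us assess which, if any, of those variables have a significant effect on that mean. The model is:

\[ \log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \]

where \(\mu\) is the mean of the response variable and \(X_1, \dots, X_p\) are the predictor variables.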
- \(\beta_0\) = the log of the response variable's mean when every predictor equals zero (or is at its baseline level); not very useful for practical interpretation
- \(\beta_1, \beta_2, \beta_3, \dots, \beta_p\) = the change in the response variable's log mean associated with a one-unit increase in the corresponding predictor, holding the other predictors constant
Poisson rates regression, in turn, models the expected rate at which an event occurs relative to the size of a larger "population." In this dataset, our variable Total, the total number of cyclists on all the East River Bridges, serves as the population of which Manhattan Bridge cyclists are treated as a share. The model is similar to counts regression, except that the logarithm of the population variable enters as an offset: a term whose coefficient is fixed at 1 rather than estimated. With \(\mu\) denoting the mean count of Manhattan Bridge cyclists and \(t\) denoting Total, this can be expressed in either of the following equivalent ways:

\[ \log\left(\frac{\mu}{t}\right) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \]

\[ \log(\mu) = \log(t) + \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_p X_p \]
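- In Poisson rates regression, the parameters \(\beta_0, \dots, \beta_p\) are interpreted the same way as in the Poisson counts model, except that they describe the log of the rate (Manhattan Bridge cyclists per East River Bridge cyclist) rather than the log of the mean count.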
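To illustrate the offset idea, the sketch below simulates hypothetical data (not the TIMS data) with a known log rate and fits it with glm(). It shows that passing log(Total) through glm()'s offset argument and writing offset(log(Total)) inside the formula give the same coefficients, and that exponentiating the intercept recovers the share of Total expected when the predictor is zero. The variable Rain and all numeric values are illustrative assumptions.

```r
set.seed(2017)

# Hypothetical simulated data (NOT the TIMS data): a daily "population" total
# and a count drawn as a share of it, with true log rate -1.25 - 0.03 * Rain
n = 200
Total = round(runif(n, min = 15000, max = 25000))
Rain = runif(n, min = 0, max = 1)
Manhattan = rpois(n, lambda = Total * exp(-1.25 - 0.03 * Rain))
Sim = data.frame(Manhattan, Total, Rain)

# Two equivalent ways to include log(Total) as an offset (coefficient fixed at 1)
fit_arg = glm(Manhattan ~ Rain, offset = log(Total),
              family = poisson(link = "log"), data = Sim)
fit_term = glm(Manhattan ~ Rain + offset(log(Total)),
               family = poisson(link = "log"), data = Sim)

print(coef(fit_arg))
print(coef(fit_term))

# exp(intercept): estimated share of Total expected when Rain = 0
print(exp(coef(fit_arg)[["(Intercept)"]]))
```

Because the offset's coefficient is fixed rather than estimated, the remaining coefficients describe the log of Manhattan / Total, which is exactly the rate form of the model.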
Poisson Regression (Counts)
Below is a summary of the Poisson counts regression model I created,
with measures of temperature range and averages, precipitation amount
and day of the week all functioning as predictors of how many cyclists
crossed the Manhattan Bridge in or out of our three boroughs of
interest.
library(knitr)  # provides kable()

# Counts Model:
# Response = Manhattan
# Predictors = Day, MeanTemp, TempDiff, Precipitation
# Day is stored as a factor with Sunday as the reference (baseline) level
Counts_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, family = poisson(link = "log"), data = Data)
Counts_Model_Sum = summary(Counts_Model)
# Coefficient matrix: Estimate, Std. Error, z value, Pr(>|z|)
Counts_Model_Coef = Counts_Model_Sum$coefficients
kable(Counts_Model_Coef, caption = "Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")
Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 8.5013371 | 0.0421490 | 201.697286 | 0.0000000 |
| DayMonday | 0.3199236 | 0.0089935 | 35.572866 | 0.0000000 |
| DayTuesday | 0.3357894 | 0.0093242 | 36.012843 | 0.0000000 |
| DayWednesday | 0.4023102 | 0.0090796 | 44.309388 | 0.0000000 |
| DayThursday | 0.2807381 | 0.0096557 | 29.074878 | 0.0000000 |
| DayFriday | 0.1331873 | 0.0107366 | 12.405032 | 0.0000000 |
| DaySaturday | -0.0859127 | 0.0097279 | -8.831613 | 0.0000000 |
| MeanTemp | -0.0029260 | 0.0006129 | -4.774076 | 0.0000018 |
| TempDiff | 0.0143243 | 0.0008575 | 16.703807 | 0.0000000 |
| Precipitation | -0.4307477 | 0.0104214 | -41.332836 | 0.0000000 |
# All predictor variables are significant
Every predictor variable in the model is statistically significant, with p-values well below the standard 0.05 threshold, so no stepwise regression or other model simplification is necessary.
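Poisson standard errors, and therefore these p-values, rely on the assumption that the variance of the counts equals their mean. With daily counts in the thousands, that assumption may be worth checking. A common rough check, sketched below on hypothetical simulated data (not the TIMS data), divides the Pearson chi-square statistic by the residual degrees of freedom: values well above 1 indicate overdispersion, in which case refitting with family = quasipoisson keeps the same coefficient estimates but scales up the standard errors. The helper name dispersion_ratio is hypothetical; applied to this report it would be called as dispersion_ratio(Counts_Model).

```r
set.seed(1)

# Pearson chi-square statistic divided by the residual degrees of freedom.
# Values near 1 are consistent with the Poisson variance = mean assumption;
# values well above 1 suggest overdispersion. (Hypothetical helper name.)
dispersion_ratio = function(model) {
  sum(residuals(model, type = "pearson")^2) / df.residual(model)
}

# Hypothetical simulated counts (NOT the TIMS data)
n = 500
x = runif(n)
mu = exp(3 + 0.5 * x)
y_pois = rpois(n, lambda = mu)           # variance equals the mean
y_over = rnbinom(n, size = 2, mu = mu)   # variance = mu + mu^2 / 2

fit_pois = glm(y_pois ~ x, family = poisson(link = "log"))
fit_over = glm(y_over ~ x, family = poisson(link = "log"))
print(dispersion_ratio(fit_pois))
print(dispersion_ratio(fit_over))

# Quasi-Poisson refit: same point estimates, standard errors scaled by the
# square root of the estimated dispersion
fit_quasi = glm(y_over ~ x, family = quasipoisson(link = "log"))
print(summary(fit_over)$coefficients)
print(summary(fit_quasi)$coefficients)
```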
As for the practical implications of the model summary, although every predictor variable is statistically significant, the magnitude of their estimated effects is relatively small. Precipitation has the largest coefficient in absolute value: its estimated effect on the log mean of Manhattan Bridge cyclists is about -0.431 per inch of precipitation.
The day's average temperature and the difference between its high and low appear to have very little practical association with the log mean of that day's cyclists. Differences across days of the week are somewhat larger. With Sunday as the baseline level, Wednesday has the highest expected cyclist traffic and Saturday the lowest. The higher counts during the workweek could be due to many people using the Manhattan Bridge as part of their commute.
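Because the model is on the log scale, exponentiating a coefficient gives a multiplicative effect on the expected count, which can make magnitudes easier to judge. The sketch below applies exp() to the point estimates copied from the counts table; with the fitted model in hand, exp(coef(Counts_Model)) would give the same values. For example, exp(-0.4307) ≈ 0.650, so under this model each additional inch of precipitation corresponds to an expected count about 35% lower.

```r
# Point estimates copied from the Poisson counts regression table above
counts_coef = c(
  DayMonday = 0.3199236, DayTuesday = 0.3357894, DayWednesday = 0.4023102,
  DayThursday = 0.2807381, DayFriday = 0.1331873, DaySaturday = -0.0859127,
  MeanTemp = -0.0029260, TempDiff = 0.0143243, Precipitation = -0.4307477
)

# exp(beta): multiplicative change in the expected count for a one-unit increase
# in the predictor (for the Day terms, relative to the Sunday baseline)
multiplier = exp(counts_coef)
percent_change = 100 * (multiplier - 1)

print(round(multiplier, 3))
print(round(percent_change, 1))
```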
All in all, the Poisson counts model yields some interesting and statistically significant results, most notably that cyclist counts appear far more sensitive to precipitation than to temperature, and that cyclist traffic rises from Sunday to a midweek peak on Wednesday before declining toward the weekend. However, the relatively small magnitude of each variable's estimated effect limits the model's practical utility.
Poisson Regression (Rates)
After the Poisson counts regression, I performed a Poisson rates regression, using the total number of cyclists entering and leaving our three boroughs of interest across all the East River Bridges as the "population" of which Manhattan Bridge cyclists are a subset.
This process involved two Poisson rates models. In the first, neither temperature variable was statistically significant (p ≈ 0.098 for MeanTemp and p ≈ 0.327 for TempDiff). Given this, and their minimal practical significance in the previous counts model, I removed them and fit a second Poisson rates model that does not include the day's average temperature or temperature range.
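Dropping terms based on their individual p-values can be complemented by a joint likelihood-ratio (drop-in-deviance) test of the removed terms. With this report's objects that call would be anova(Rates_Model2, Rates_Model, test = "Chisq"), which tests MeanTemp and TempDiff together on 2 degrees of freedom. The self-contained sketch below shows the same kind of call on hypothetical simulated data (not the TIMS data), in which the illustrative Temp variable has no true effect.

```r
set.seed(42)

# Hypothetical simulated data (NOT the TIMS data); Temp has no true effect
n = 200
Total = round(runif(n, min = 15000, max = 25000))
Rain = runif(n)
Temp = runif(n, min = 70, max = 95)
Manhattan = rpois(n, lambda = Total * exp(-1.25 - 0.03 * Rain))
Sim = data.frame(Manhattan, Total, Rain, Temp)

full = glm(Manhattan ~ Rain + Temp, offset = log(Total),
           family = poisson(link = "log"), data = Sim)
reduced = glm(Manhattan ~ Rain, offset = log(Total),
              family = poisson(link = "log"), data = Sim)

# Likelihood-ratio (drop-in-deviance) test of the term(s) removed from the full model
lrt = anova(reduced, full, test = "Chisq")
print(lrt)
p_value = lrt[2, "Pr(>Chi)"]
print(p_value)
```

A large p-value here is consistent with dropping the extra term; unlike separate z-tests, the joint test accounts for correlated predictors being removed together.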
### Rates Model 1
Rates_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)
Rates_Model_Sum = summary(Rates_Model)
Rates_Model_Coef = Rates_Model_Sum$coefficients
invisible(Rates_Model_Coef)
kable(Rates_Model_Coef, caption = "Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")
Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -1.1844325 | 0.0418215 | -28.3211719 | 0.0000000 |
| DayMonday | 0.0418134 | 0.0088829 | 4.7071774 | 0.0000025 |
| DayTuesday | 0.0549949 | 0.0094416 | 5.8247706 | 0.0000000 |
| DayWednesday | 0.0316272 | 0.0090743 | 3.4853702 | 0.0004915 |
| DayThursday | 0.0048565 | 0.0096974 | 0.5008067 | 0.6165072 |
| DayFriday | -0.0167479 | 0.0108925 | -1.5375635 | 0.1241554 |
| DaySaturday | -0.0667274 | 0.0097414 | -6.8498669 | 0.0000000 |
| MeanTemp | -0.0010004 | 0.0006053 | -1.6527512 | 0.0983815 |
| TempDiff | 0.0008449 | 0.0008628 | 0.9792330 | 0.3274649 |
| Precipitation | -0.0306511 | 0.0095235 | -3.2184824 | 0.0012887 |
### Rates Model 2
Rates_Model2 = glm(Manhattan ~ Day + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)
Rates_Model2_Sum = summary(Rates_Model2)
Rates_Model2_Coef = Rates_Model2_Sum$coefficients
invisible(Rates_Model2_Coef)
kable(Rates_Model2_Coef, caption = "Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists")
Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists

| Term | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -1.2497005 | 0.0065309 | -191.3530685 | 0.0000000 |
| DayMonday | 0.0417392 | 0.0088231 | 4.7306551 | 0.0000022 |
| DayTuesday | 0.0522471 | 0.0090521 | 5.7718313 | 0.0000000 |
| DayWednesday | 0.0285783 | 0.0088706 | 3.2216687 | 0.0012745 |
| DayThursday | 0.0006398 | 0.0091826 | 0.0696739 | 0.9444532 |
| DayFriday | -0.0205402 | 0.0106430 | -1.9299220 | 0.0536165 |
| DaySaturday | -0.0684652 | 0.0096858 | -7.0685797 | 0.0000000 |
| Precipitation | -0.0288171 | 0.0093266 | -3.0897749 | 0.0020031 |
The second Poisson rates model shows a pattern similar to the counts model: most predictors are statistically significant, but the coefficients are small in magnitude, so there is not a great deal of practical significance on display.
Once again treating Sunday as the baseline, the rate of Manhattan Bridge cyclists as a share of all East River Bridge cyclists is highest early in the week (peaking on Tuesday) and declines going into the weekend. That said, the Thursday coefficient is far from significant (p ≈ 0.94) and the Friday coefficient is borderline (p ≈ 0.054). Thursday's and Friday's rates therefore cannot be clearly distinguished from Sunday's, which suggests that the apparent late-workweek decline in the Manhattan Bridge cyclist rate could be due to random chance rather than a systematic feature of the bridge that affects its cyclists only on those particular days.
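Because the rates model includes log(Total) as an offset, exp(intercept + day coefficient) gives the implied share of all East River Bridge cyclists who use the Manhattan Bridge on that day of the week when Precipitation is zero. The sketch below computes these shares from the point estimates copied from Poisson Rates Regression (2). They land close to the observed Rate_Averages in the day-by-day table later in this report, though the two are not identical because the table pools wet and dry days.

```r
# Point estimates copied from Poisson Rates Regression (2) above
b0 = -1.2497005
day_coef = c(Sunday = 0, Monday = 0.0417392, Tuesday = 0.0522471,
             Wednesday = 0.0285783, Thursday = 0.0006398,
             Friday = -0.0205402, Saturday = -0.0684652)

# With log(Total) as the offset, exp(b0 + day coefficient) is the implied share
# of East River Bridge cyclists using the Manhattan Bridge on a day with no precipitation
implied_share = exp(b0 + day_coef)
print(round(implied_share, 4))
```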
Day-By-Day Averages
Since both the counts and rates models suggested that the day of the week has the greatest association with the log mean number of Manhattan Bridge cyclists, I calculated the average count and rate for each day of the week to compare them with each other and with the average across all days. The table with this information is below.
library(knitr)
library(kableExtra)  # kable_styling(); also re-exports the %>% pipe

Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
Week = Days[-1]  # the seven days of the week, Sunday first

# Average daily count of Manhattan Bridge cyclists: all days, then each day of the week
Count_Averages = unname(round(c(mean(Data$Manhattan),
                                tapply(Data$Manhattan, Data$Day, mean)[Week])))

# Pooled rate: Manhattan Bridge cyclists as a share of all East River Bridge cyclists
Day_Rates_Averages = unname(c(sum(Data$Manhattan) / sum(Data$Total),
                              (tapply(Data$Manhattan, Data$Day, sum) /
                                 tapply(Data$Total, Data$Day, sum))[Week]))
Rate_Averages = round(Day_Rates_Averages, digits = 4)

# Differences from the all-days value (the first element, so its own difference is 0)
Counts_Difference = Count_Averages - Count_Averages[1]
Rates_Difference = round(Day_Rates_Averages - Day_Rates_Averages[1], digits = 4)

Table = cbind(Days, Count_Averages, Counts_Difference, Rate_Averages, Rates_Difference)
kable(Table, caption = "Distribution of Manhattan Bridge Cyclist Count and Rates July 2017") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center"
  )
Distribution of Manhattan Bridge Cyclist Count and Rates July 2017

| Days | Count_Averages | Counts_Difference | Rate_Averages | Rates_Difference |
|---|---|---|---|---|
| All Days | 5425 | 0 | 0.2885 | 0 |
| Sunday | 4690 | -735 | 0.2865 | -0.002 |
| Monday | 6001 | 576 | 0.2975 | 0.009 |
| Tuesday | 6363 | 938 | 0.302 | 0.0135 |
| Wednesday | 6938 | 1513 | 0.2949 | 0.0064 |
| Thursday | 5999 | 574 | 0.2868 | -0.0017 |
| Friday | 4338 | -1087 | 0.2775 | -0.0109 |
| Saturday | 4031 | -1394 | 0.2665 | -0.022 |
The table adds detail to the implications of our Poisson count and rate models: average weekday counts of Manhattan Bridge cyclists (specifically Monday through Thursday) are well above those from Friday through Sunday. The average number of cyclists from Monday through Thursday is about 6,325, compared with about 4,353 from Friday through Sunday.
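These two figures are unweighted averages of the per-day means in the table, so each day of the week counts equally even though July 2017 (which began on a Saturday) had five Saturdays, Sundays, and Mondays but only four of each other day. A quick check of the arithmetic, using the values copied from the table:

```r
# Per-day average counts copied from the day-by-day table above
day_avg_counts = c(Sunday = 4690, Monday = 6001, Tuesday = 6363, Wednesday = 6938,
                   Thursday = 5999, Friday = 4338, Saturday = 4031)

# Unweighted means of the daily averages: each day of the week counts once
mon_thu = mean(day_avg_counts[c("Monday", "Tuesday", "Wednesday", "Thursday")])
fri_sun = mean(day_avg_counts[c("Friday", "Saturday", "Sunday")])

print(mon_thu)
print(fri_sun)
# Ratio of the Monday-Thursday average to the Friday-Sunday average
print(round(mon_thu / fri_sun, 2))
```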
As for the rate of Manhattan Bridge cyclists relative to cyclists on all East River Bridges, the Manhattan Bridge's share is slightly above average Monday through Wednesday and below average Thursday through Sunday.
Conclusion and Takeaways
To conclude, any decisions made in response to these findings should be made with caution, given the low practical significance found in both our count and rate Poisson models. That said, there are still valuable takeaways from this analysis.
First, the Manhattan Bridge is clearly busier during the standard workweek, especially early in the week, than on the weekend, both in raw volume and as a share of the overall East River Bridge network. Second, the daily average temperature and the difference between that day's high and low played little, if any, role in the count or rate of cyclists on a given day, whereas precipitation does appear to have a noticeable negative association with the number of that day's cyclists on the Manhattan Bridge.
If we were to continue or expand on this analysis in the future, it would be valuable to extend the data beyond July into months near seasonal transitions, such as March, April, or October. Intuitively, one might expect the day's temperatures to play a much larger role at those times of year.
