1 Introduction
Daily cyclist counts on New York City’s East River bridges provide
valuable insight into transportation patterns, recreational activity,
and how environmental conditions influence urban mobility. In last
week’s analysis, a standard Poisson model was used to examine how
continuous measures of temperature and precipitation affected cyclist
counts on the Manhattan Bridge. While the initial findings highlighted
clear relationships between weather and cycling behavior, the data also
showed signs of overdispersion, indicating that the variability in
cyclist counts may not be fully captured by a basic Poisson
framework.
To address this, the current study refines the analysis by
re-engineering key predictor variables and applying a quasi-Poisson
approach. Rather than treating high and low temperatures separately,
daily temperature is summarized using a single average temperature
variable (AvgTemp), offering a more stable and interpretable measure of
overall weather conditions. Precipitation is also simplified: because
the data contain many zero-rainfall days and a small number of positive
rainfall values, precipitation is converted into a binary indicator
(NewPrecip) distinguishing dry days from days with any measurable
rainfall. These modifications help create a more realistic
representation of how weather influences cyclist behavior.
The primary research question guiding this revised analysis is:
How do average temperature and precipitation status influence daily
cyclist counts on the Manhattan Bridge when accounting for potential
overdispersion in the data?
To answer this question, we conduct exploratory data analysis, fit
Poisson and quasi-Poisson regression models, evaluate the estimated
dispersion parameter, and visualize how expected cyclist counts vary
across temperature and precipitation conditions. This approach provides
a clearer and more robust understanding of how key environmental factors
shape daily cycling activity.
2 Description of Dataset
The dataset originates from the Traffic Information Management System
(TIMS), which collects automated bicycle counts across New York City’s
East River bridges. The data are summarized on a daily basis, where each
observation represents the total number of cyclists crossing the bridges
in a 24-hour period, along with corresponding weather conditions for
that day. These data provide a snapshot of how cycling activity
fluctuates with environmental factors such as temperature and
rainfall.
There are 5 explanatory variables in the data set:
- Day (numerical) - an index we created to represent the sequence of
observations (1, 2, …, n)
- LowTemp (numerical) - lowest daily temperature (°F)
- Precipitation (numerical) - daily amount of precipitation (inches)
- HighTemp (numerical) - highest daily temperature (°F)
- Total (numerical) - total number of cyclists per 24 hours on the
Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro
Bridge
We engineered two new variables:
- AvgTemp = (HighTemp + LowTemp) / 2
- NewPrecip = 0 if Precipitation = 0, otherwise 1
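As a minimal sketch of the temperature engineering step, the R lines below compute AvgTemp on a small hypothetical data frame, `toy`, whose rows are made up for illustration. On the real data the same assignment would be applied to `data`.

```r
# Hypothetical toy rows standing in for the real data frame `data`
toy <- data.frame(HighTemp = c(78, 64), LowTemp = c(66, 50))

# AvgTemp is the midpoint of the daily high and low temperatures
toy$AvgTemp <- (toy$HighTemp + toy$LowTemp) / 2
toy$AvgTemp  # 72 57
```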
Our response variable is:
- ManhattanBridge (numerical) - total number of cyclists per 24 hours
crossing the Manhattan Bridge
Assumptions and Conditions for Poisson Regression:
Count response variable: The dependent variable must represent counts
(non-negative integers), such as the number of cyclists crossing a
bridge per day.
Independence of observations: Each observation (daily count) is
assumed to be independent of the others.
Equidispersion: The mean and variance of the response variable are
approximately equal. Overdispersion (variance greater than mean) can
indicate model misfit and may require a quasi-Poisson or negative
binomial adjustment.
Log-linearity: The logarithm of the expected count is a linear
function of the predictor variables.
No excessive zeros: Poisson regression assumes the count data are not
dominated by zeros (zero-inflated models would be used otherwise).
Explanatory variables measured without error: Predictor variables
(e.g., temperature, precipitation) are assumed to be observed
accurately.
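The equidispersion condition can be screened before modeling by comparing the sample variance of the response with its sample mean. The sketch below uses a hypothetical helper, `var_mean_ratio()`, and made-up counts rather than the bridge data. This marginal ratio ignores the predictors, so it is only a rough screen; the dispersion estimated from a fitted model is the more relevant check.

```r
# Ratio of sample variance to sample mean for a count vector.
# Under Poisson equidispersion this should be near 1;
# values far above 1 suggest overdispersion.
var_mean_ratio <- function(y) {
  y <- y[!is.na(y)]
  var(y) / mean(y)
}

# Hypothetical daily counts, for illustration only
y_toy <- c(2000, 3100, 2600, 1200, 3500)
var_mean_ratio(y_toy)  # about 333, far above 1

# On the real data this would be: var_mean_ratio(data$ManhattanBridge)
```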
3 Exploratory Data Analysis
To begin, we make a frequency histogram of the Precipitation variable
to understand its behavior before modeling.
library(ggplot2)
ggplot(data, aes(x = Precipitation)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
  labs(
    title = "Distribution of Precipitation",
    x = "Precipitation (inches)",
    y = "Frequency"
  )

The histogram of precipitation clearly shows a highly skewed and
imbalanced distribution. The vast majority of observations fall at 0
inches, meaning most days in the dataset experienced no rainfall at all.
This is evident from the tall bar at zero, which dramatically outweighs
the frequency of all other values. In contrast, the days with measurable
rainfall are both rare and widely dispersed, with only a handful of
observations scattered across various positive precipitation amounts
(e.g., 0.1, 0.3, 0.7, 1.2 inches). None of these positive values occur
frequently enough to form meaningful clusters.
This extreme right skew indicates that precipitation behaves almost
like a binary variable in practice: the important distinction is simply
whether it rained or not, rather than the exact amount. Because of this
structure, treating precipitation as a continuous predictor would do
little to improve model accuracy and could introduce instability due to
the influence of a few outlying high-rainfall days. The histogram
therefore justifies discretizing precipitation into a binary
indicator, NewPrecip (0 for no rain, 1 for any rain), which better
reflects the underlying pattern in the data and aligns with how weather
affects cyclist behavior.
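The imbalance described above can also be quantified directly. A minimal sketch, using a hypothetical helper `zero_share()` and a made-up vector `p_toy` in place of data$Precipitation:

```r
# Proportion of days with exactly zero precipitation (missing values ignored)
zero_share <- function(p) mean(p == 0, na.rm = TRUE)

# Hypothetical precipitation amounts (inches), for illustration only
p_toy <- c(0, 0, 0, 0, 0, 0, 0.1, 0, 0.7, 0)
zero_share(p_toy)                # 0.8, i.e. 80% dry days
table(p_toy[which(p_toy > 0)])  # how often each positive amount occurs
```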
We next discretize Precipitation accordingly.
# 1 = any measurable precipitation, 0 = dry day.
# The vectorized comparison keeps missing values as NA; the earlier
# which()-based indexing would have silently coded them as 0 (dry).
data$NewPrecip <- as.integer(data$Precipitation > 0)
In this code chunk, we convert the original precipitation variable
into a simple binary indicator. Any day with precipitation greater than
zero is coded as 1, and days with no precipitation are coded as 0. This
creates the new variable NewPrecip, which simplifies the analysis by
focusing on whether it rained or not. This matters because the original
precipitation data are highly skewed, with many zero values.
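To confirm the recoding, the 0/1 indicator can be cross-tabulated against the original amounts. The sketch below uses made-up values (`precip_toy` is hypothetical) and the vectorized comparison `as.integer(x > 0)`, which leaves a missing precipitation value as NA instead of assigning it to either group.

```r
# Hypothetical toy values standing in for data$Precipitation
precip_toy <- c(0, 0.3, 0, NA, 1.2)

# Vectorized 0/1 coding; NA stays NA rather than being counted as dry
new_precip_toy <- as.integer(precip_toy > 0)
new_precip_toy  # 0 1 0 NA 1

# Cross-tabulate amounts against the indicator to confirm the coding
table(precip_toy, new_precip_toy, useNA = "ifany")
```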
4 Poisson Regression Modeling
We build both the regular Poisson and quasi-Poisson regression models
and extract the dispersion parameter to decide which model to use as
the working model.
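The fitting code is not shown here, so the following is a minimal sketch of how the two models and the dispersion estimate could be obtained with `glm()`. It runs on simulated, overdispersed counts rather than the TIMS data. The object names `sim`, `pois.model`, and `quasi.model.01` (a full model including Day) are assumptions for illustration; only `quasi.model.02` appears elsewhere in this report. The dispersion estimate is the sum of squared Pearson residuals divided by the residual degrees of freedom, which is the quantity `summary()` reports for a quasi-Poisson fit.

```r
set.seed(1)
n <- 200
# Simulated stand-in for the real data (hypothetical values, not TIMS counts)
sim <- data.frame(
  Day = 1:n,
  AvgTemp = runif(n, 45, 75),
  NewPrecip = rbinom(n, 1, 0.3)
)
# Negative binomial counts are overdispersed relative to Poisson
mu <- exp(7 + 0.02 * sim$AvgTemp - 0.5 * sim$NewPrecip)
sim$ManhattanBridge <- rnbinom(n, size = 10, mu = mu)

# Standard Poisson fit: dispersion fixed at 1
pois.model <- glm(ManhattanBridge ~ AvgTemp + NewPrecip + Day,
                  family = poisson, data = sim)
# Quasi-Poisson fit: same mean model, dispersion estimated from the data
quasi.model.01 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip + Day,
                      family = quasipoisson, data = sim)

# Pearson-based dispersion: sum of squared Pearson residuals / residual df
phi_hat <- sum(residuals(pois.model, type = "pearson")^2) /
  df.residual(pois.model)
phi_hat
summary(quasi.model.01)$dispersion  # same quantity, as reported by summary()
```

Both fits share the same coefficient estimates; the quasi-Poisson fit differs only in scaling the standard errors by the square root of the estimated dispersion.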
4.2 Visual Comparisons
Next, we visualize how the predictors in our final model relate to the
expected number of cyclists on the Manhattan Bridge. We define two
comparison groups based on precipitation status:
Dry days: NewPrecip = 0
Rainy days: NewPrecip = 1
For a grid of average temperatures spanning the observed range, we
compute the model-based predicted log counts (the linear predictor) and
exponentiate them to obtain expected cyclist counts. Plotting the two
curves together (dry vs. rainy) across temperature illustrates how
warmer conditions raise expected volume and how any precipitation
lowers expected counts at every temperature level.
## Final model assumed fitted as:
## quasi.model.02 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip,
##                       family = quasipoisson, data = data)
par(mar = c(5, 5, 4, 2))  # bottom, left, top, right
par(cex.lab = 1.2, cex.axis = 1, cex.main = 1.2)
co <- coef(quasi.model.02)
# Temperature grid across observed range
temps <- seq(min(data$AvgTemp, na.rm = TRUE),
             max(data$AvgTemp, na.rm = TRUE), length.out = 200)
# Linear predictors (log expected counts) by precipitation status
eta_dry <- co["(Intercept)"] + co["AvgTemp"] * temps + co["NewPrecip"] * 0
eta_wet <- co["(Intercept)"] + co["AvgTemp"] * temps + co["NewPrecip"] * 1
# Expected cyclist counts (inverse of the log link)
mu_dry <- exp(eta_dry)
mu_wet <- exp(eta_wet)
# Plot (base R), styled like the class example
ylim_max <- max(mu_dry, mu_wet, na.rm = TRUE)
plot(temps, mu_dry, type = "l", lty = 1,
     ylim = c(0, ylim_max),
     xlab = "Average Temperature (°F)",
     ylab = "Expected Manhattan Bridge Cyclists",
     main = "Expected Cyclist Counts by AvgTemp and Precipitation")
lines(temps, mu_wet, lty = 2)
legend("topleft",
       c("Dry day (NewPrecip = 0)", "Rain day (NewPrecip = 1)"),
       lty = c(1, 2), bty = "n", cex = 0.9)

The plot displays the expected number of cyclists crossing the
Manhattan Bridge as a function of average daily temperature, with
separate curves for dry days (NewPrecip = 0) and rainy days
(NewPrecip = 1). The expected counts were generated by exponentiating
the fitted quasi-Poisson linear predictor over a grid of 200
temperatures spanning the observed range, from 45°F to 75°F.
The visualization highlights two clear patterns. First, cyclist
counts increase steadily with increasing average temperature. Warmer
days are associated with substantially higher predicted cycling volumes,
consistent with the positive and significant coefficient for AvgTemp in
the regression model. Second, daily precipitation has a marked negative
effect on cycling levels. Across all temperatures, the predicted counts
on rainy days lie well below those on dry days, illustrating the
substantial decrease in cyclist activity when any precipitation
occurs.
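The constant separation between the curves follows from the log link. Writing T for AvgTemp, R for NewPrecip, and b0, b1, b2 for the fitted intercept and coefficients (notation introduced here for illustration):

$$
\hat{\mu}(T, R) = \exp\left(b_0 + b_1 T + b_2 R\right),
\qquad
\frac{\hat{\mu}(T, 1)}{\hat{\mu}(T, 0)} = \exp(b_2),
\qquad
\frac{\hat{\mu}(T + 1, R)}{\hat{\mu}(T, R)} = \exp(b_1).
$$

Rain therefore multiplies the expected count by the same factor exp(b2) at every temperature, so the proportional reduction is constant, while the absolute gap between the curves widens as temperature (and with it the dry-day count) increases.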
Overall, the graph reinforces the statistical findings from the final
quasi-Poisson model: average temperature is positively associated with
cyclist counts, while rainfall significantly reduces expected bridge
usage.
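As a cross-check on the hand-built curves, `predict()` with `type = "response"` applies the inverse log link automatically and should reproduce the manually exponentiated values. The sketch below fits a stand-in `quasi.model.02` to simulated data (`sim` and `grid` are hypothetical names) so that it runs on its own; with the real fit, the same `newdata` grid could be passed to the report's `quasi.model.02`.

```r
set.seed(2)
# Hypothetical simulated data; not the TIMS counts
sim <- data.frame(AvgTemp = runif(100, 45, 75),
                  NewPrecip = rbinom(100, 1, 0.3))
sim$ManhattanBridge <- rpois(100, exp(7 + 0.02 * sim$AvgTemp -
                                        0.5 * sim$NewPrecip))
quasi.model.02 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip,
                      family = quasipoisson, data = sim)

# Prediction grid: 200 temperatures for each precipitation status
temps <- seq(min(sim$AvgTemp), max(sim$AvgTemp), length.out = 200)
grid <- expand.grid(AvgTemp = temps, NewPrecip = c(0, 1))

# type = "response" returns exp(linear predictor) for the log link
grid$mu <- predict(quasi.model.02, newdata = grid, type = "response")

# Same values obtained by exponentiating the linear predictor by hand
co <- coef(quasi.model.02)
mu_manual <- exp(co["(Intercept)"] + co["AvgTemp"] * grid$AvgTemp +
                   co["NewPrecip"] * grid$NewPrecip)
all.equal(unname(grid$mu), unname(mu_manual))
```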
5 Conclusion
In this extended analysis of daily cyclist activity on the Manhattan
Bridge, we evaluated how average temperature and precipitation influence
cycling behavior under a quasi-Poisson (overdispersed Poisson)
framework. By engineering two key predictor variables, AvgTemp to
summarize overall daily temperature and NewPrecip to distinguish between
dry and rainy days, we captured the primary weather-related factors that
shape daily cyclist counts. After fitting both a standard Poisson model
and a quasi-Poisson model, the estimated dispersion parameter exceeded
300, indicating severe overdispersion relative to the Poisson assumption
that the variance equals the mean. This confirmed that the regular
Poisson assumptions were not appropriate for the data and that the
quasi-Poisson model was the appropriate working model for inference.
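The practical consequence of a dispersion this large can be seen from how the quasi-Poisson fit rescales uncertainty: the coefficient estimates b_j are unchanged, but each standard error is multiplied by the square root of the estimated dispersion.

$$
\mathrm{SE}_{\text{quasi}}(b_j) = \sqrt{\hat{\phi}}\;\mathrm{SE}_{\text{Poisson}}(b_j),
\qquad
\sqrt{300} \approx 17.3
$$

With an estimate above 300, the standard Poisson fit would understate every standard error by a factor of more than 17, which would tend to make weak effects look spuriously significant.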
The quasi-Poisson results demonstrated clear, intuitive effects.
AvgTemp had a strong, positive association with cyclist counts: warmer
days were consistently associated with higher expected volumes of bridge
crossings. In contrast, rainy days showed significantly lower cyclist
activity: any measurable rainfall, however small, corresponded to a
large reduction in expected counts. The Day variable, included initially
to assess underlying trends, was not statistically significant and was
removed from the final model. The simplified model therefore focused on
the two predictors with meaningful influence.
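One way the Day term could be tested and dropped is sketched below, using simulated data in which Day has no true effect. The names `sim` and `quasi.model.01` (the full model) are hypothetical; the reduced fit mirrors the report's `quasi.model.02`. R's `anova.glm` documentation recommends the F test for families whose dispersion is estimated from the data, such as quasipoisson.

```r
set.seed(3)
n <- 200
# Hypothetical simulated data in which Day has no true effect
sim <- data.frame(Day = 1:n, AvgTemp = runif(n, 45, 75),
                  NewPrecip = rbinom(n, 1, 0.3))
sim$ManhattanBridge <- rnbinom(n, size = 10,
                               mu = exp(7 + 0.02 * sim$AvgTemp -
                                          0.5 * sim$NewPrecip))

# Full model with the trend term, then the reduced model without it
quasi.model.01 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip + Day,
                      family = quasipoisson, data = sim)
quasi.model.02 <- update(quasi.model.01, . ~ . - Day)

# F test for the nested comparison, scaled by the estimated dispersion
cmp <- anova(quasi.model.02, quasi.model.01, test = "F")
cmp
```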
The visual comparison of predicted cyclist counts further supported
these findings. The expected counts rose steadily with increasing
average temperature, while the predicted curve for rainy days remained
consistently lower than that of dry days across the entire temperature
range. This illustrates how weather conditions jointly shape rider
behavior: favorable temperatures encourage cycling, while rain sharply
suppresses it.
Overall, the quasi-Poisson modeling approach provided a more accurate
and reliable picture of cyclist behavior on the Manhattan Bridge by
accounting for overdispersion in the data. The results highlight the
substantial role that daily weather conditions play in influencing
cycling patterns and offer valuable insight for city planners seeking to
anticipate and accommodate fluctuations in bicycle traffic.
