1 Introduction

Daily cyclist counts on New York City’s East River bridges provide valuable insight into transportation patterns, recreational activity, and how environmental conditions influence urban mobility. In last week’s analysis, a standard Poisson model was used to examine how continuous measures of temperature and precipitation affected cyclist counts on the Manhattan Bridge. While the initial findings highlighted clear relationships between weather and cycling behavior, the data also showed signs of overdispersion, indicating that the variability in cyclist counts may not be fully captured by a basic Poisson framework.

To address this, the current study refines the analysis by re-engineering key predictor variables and applying a dispersed Poisson approach. Rather than treating high and low temperatures separately, daily temperature is summarized using a single average temperature variable (AvgTemp), offering a more stable and interpretable measure of overall weather conditions. Precipitation is also simplified: because the data contain many zero-rainfall days and a small number of positive rainfall values, precipitation is converted into a binary indicator (NewPrecip) distinguishing dry days from days with any measurable rainfall. These modifications help create a more realistic representation of how weather influences cyclist behavior.

The primary research question guiding this revised analysis is:

How do average temperature and precipitation status influence daily cyclist counts on the Manhattan Bridge when accounting for potential overdispersion in the data?

To answer this question, we conduct exploratory data analysis, fit Poisson and quasi-Poisson regression models, evaluate the estimated dispersion parameter, and visualize how expected cyclist counts vary across temperature and precipitation conditions. This approach provides a clearer and more robust understanding of how key environmental factors shape daily cycling activity.

2 Description of Dataset

The dataset originates from the Traffic Information Management System (TIMS), which collects automated bicycle counts across New York City’s East River bridges. The data are summarized on a daily basis, where each observation represents the total number of cyclists crossing the bridges in a 24-hour period, along with corresponding weather conditions for that day. These data provide a snapshot of how cycling activity fluctuates with environmental factors such as temperature and rainfall.

The data set contains five variables in addition to the response:

  • Day (numerical) - an index we created to represent the sequence of observations (1, 2, …, n)
  • LowTemp (numerical) - lowest temperature of the day (°F)
  • Precipitation (numerical) - amount of precipitation (inches)
  • HighTemp (numerical) - highest temperature of the day (°F)
  • Total (numerical) - total number of cyclists per 24 hours on the Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge

We engineered two new variables:

  • AvgTemp = (HighTemp + LowTemp) / 2
  • NewPrecip = 0 if Precipitation = 0, otherwise 1

Our response variable is:

  • ManhattanBridge (numerical) - total number of cyclists per 24 hours crossing the Manhattan Bridge

Assumptions and Conditions for Poisson Regression:

Count response variable: The dependent variable must represent counts (non-negative integers), such as the number of cyclists crossing a bridge per day.

Independence of observations: Each observation (daily count) is assumed to be independent of the others.

Equidispersion: Conditional on the predictors, the variance of the response equals its mean. Overdispersion (variance greater than the mean) makes the Poisson standard errors too small and may require a quasi-Poisson or negative binomial adjustment.

Log-linearity: The logarithm of the expected count is a linear function of the predictor variables.

No excessive zeros: Poisson regression assumes the count data are not dominated by zeros (zero-inflated models would be used otherwise).

Explanatory variables measured without error: Predictor variables (e.g., temperature, precipitation) are assumed to be observed accurately.
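
The log-linearity and equidispersion conditions can be written compactly. The notation below (μ_i for the expected count on day i, β_j for the coefficients, φ for the dispersion) is introduced here for illustration and is an assumption about how to formalize the list above:

$$
Y_i \sim \mathrm{Poisson}(\mu_i), \qquad \log \mu_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}, \qquad \mathrm{Var}(Y_i) = \mu_i
$$

A quasi-Poisson model keeps the same mean structure but relaxes the variance condition to

$$
\mathrm{Var}(Y_i) = \phi\,\mu_i ,
$$

so φ = 1 recovers the Poisson case and φ > 1 corresponds to overdispersion.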

3 Exploratory Data Analysis

To begin, we plot a histogram of Precipitation to understand how the variable behaves before modeling.

ggplot(data, aes(x = Precipitation)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
  labs(
    title = "Distribution of Precipitation",
    x = "Precipitation (inches)",
    y = "Frequency"
  )

The histogram of precipitation clearly shows a highly skewed and imbalanced distribution. The vast majority of observations fall at 0 inches, meaning most days in the dataset experienced no rainfall at all. This is evident from the tall bar at zero, which dramatically outweighs the frequency of all other values. In contrast, the days with measurable rainfall are both rare and widely dispersed, with only a handful of observations scattered across various positive precipitation amounts (e.g., 0.1, 0.3, 0.7, 1.2 inches). None of these positive values occur frequently enough to form meaningful clusters.

This extreme right skew indicates that precipitation behaves almost like a binary variable in practice: the important distinction is simply whether it rained or not, rather than the exact amount. Because of this structure, treating precipitation as a continuous predictor would do little to improve model accuracy and could introduce instability due to the influence of a few outlying high-rainfall days. The histogram therefore justifies discretizing precipitation into a binary indicator—NewPrecip = 0 for no rain and 1 for any rain—which better reflects the underlying pattern in the data and aligns with how weather affects cyclist behavior.

We next discretize Precipitation accordingly.

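# Binary rain indicator: 1 if any measurable precipitation, 0 otherwise
id.rain   = which(data$Precipitation > 0)
NewPrecip = rep(0, nrow(data))   # start all-dry; avoids negative indexing, which fails silently if id.rain is empty
NewPrecip[id.rain] = 1
data$NewPrecip = NewPrecip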

In this code chunk, we convert the original precipitation variable into a simple binary indicator. Any day with precipitation greater than zero is coded as 1, and days with no precipitation are coded as 0. This creates the new variable NewPrecip, which simplifies the analysis by focusing on whether it rained or not—important because the original precipitation data are highly skewed with many zero values.
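
As a small illustration of the same coding rule, the sketch below applies it to a made-up vector rather than the bridge data. The names `precip`, `rain_flag`, and `index_flag` are hypothetical; the point is only that a vectorized comparison gives the same indicator as the index-based approach.

```r
# Toy precipitation values in inches (illustrative only, not the bridge data)
precip <- c(0, 0, 0.09, 0, 1.2, 0, 0.47)

# Vectorized binary indicator: 1 = any measurable rain, 0 = dry
rain_flag <- as.integer(precip > 0)

# Same rule written with an index vector, as in the chunk above
id_rain <- which(precip > 0)
index_flag <- rep(0L, length(precip))
index_flag[id_rain] <- 1L

# Share of dry days, the quantity the histogram makes visible
share_dry <- mean(rain_flag == 0)
```

Both versions agree on this toy input; the vectorized form is shorter and does not depend on `which()` returning at least one index.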

4 Poisson Regression Modeling

We build both the regular Poisson and Quasi-Poisson regression models and extract the dispersion parameter to decide which model should be used as a working model.

4.1 Extracting Dispersion Index

We calculate the estimated dispersion value using both Pearson residuals and deviance residuals. In practice, R’s built-in dispersion estimate is based on the Pearson residuals, which it uses to approximate how much the data deviate from the assumptions of a standard Poisson model.
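
In symbols, the two estimates can be written as follows. The notation is an assumption added for illustration: y_i is the observed count, μ̂_i the fitted mean from the Poisson model, D the residual deviance, n the number of observations, and p the number of estimated coefficients.

$$
\hat{\phi}_{\text{Pearson}} = \frac{1}{n - p} \sum_{i=1}^{n} \frac{(y_i - \hat{\mu}_i)^2}{\hat{\mu}_i},
\qquad
\hat{\phi}_{\text{Deviance}} = \frac{D}{n - p}
$$

For the full model here, n − p = 30 − 4 = 26, which is the residual degrees of freedom used in the code.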

# AvgTemp = (HighTemp + LowTemp)/2
data$AvgTemp <- (data$HighTemp + data$LowTemp) / 2

# Discretize Precipitation: NewPrecip = 0 if 0, 1 if > 0
id.rain <- which(data$Precipitation > 0)
NewPrecip <- rep(0, nrow(data))
NewPrecip[id.rain] <- 1
data$NewPrecip <- NewPrecip

# Day index (1, 2, ..., n). If you prefer calendar info, swap for as.integer(factor(Date)).
data$Day <- seq_len(nrow(data))

# --- Poisson model (ManhattanBridge counts) ---
pois_mod <- glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip,
                family = poisson(link = "log"), data = data)

# Predicted mean and dispersion estimates
yhat <- pois_mod$fitted.values
pearson_resid <- (data$ManhattanBridge - yhat) / sqrt(yhat)
Pearson.disp <- sum(pearson_resid^2) / pois_mod$df.residual
Deviance.disp <- pois_mod$deviance / pois_mod$df.residual

# Tidy little table for dispersion
disp_tab <- data.frame(Pearson.disp = Pearson.disp,
                       Deviance.disp = Deviance.disp)
knitr::kable(disp_tab, caption = "Estimated dispersion (Poisson fit)")

Estimated dispersion (Poisson fit)

| Pearson.disp | Deviance.disp |
|-------------:|--------------:|
| 326.1144 | 336.6871 |

# --- Quasi-Poisson model (same mean, SEs scaled by dispersion) ---
qpois_mod <- glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip,
                 family = quasipoisson, data = data)

knitr::kable(summary(qpois_mod)$coef,
             caption = "Quasi-Poisson regression coefficients (Manhattan Bridge)")

Quasi-Poisson regression coefficients (Manhattan Bridge)

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:------------|-----------:|----------:|-----------:|----------:|
| (Intercept) | 6.7958375 | 0.3724002 | 18.2487500 | 0.0000000 |
| Day | -0.0030259 | 0.0065783 | -0.4599783 | 0.6493554 |
| AvgTemp | 0.0295903 | 0.0065617 | 4.5095289 | 0.0001227 |
| NewPrecip | -0.2764622 | 0.1132013 | -2.4422176 | 0.0217057 |

The dispersion results show extremely high values for both the Pearson dispersion index (326.11) and the deviance dispersion index (336.69), indicating severe overdispersion in the Poisson model. Under the Poisson assumption, the variance equals the mean, so the dispersion value should be close to 1. Even modest departures (values around 2 or 3) signal that the Poisson model may not fit well. Here, the estimated dispersion is more than 300 times larger than expected, meaning the variability in cyclist counts is far greater than a standard Poisson model can accommodate. The Poisson model therefore dramatically understates the uncertainty in the data: its standard errors are too small by a factor of about √326 ≈ 18, which makes its p-values misleadingly small. The quasi-Poisson model keeps the same coefficient estimates but scales the standard errors by the square root of the estimated dispersion, so it is clearly the more appropriate modeling choice for this dataset.
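
The relationship between the two fits can be checked on simulated data. The sketch below is not the bridge analysis: the predictor `x`, the negative binomial generator, and all parameter values are assumptions chosen only to produce overdispersed counts.

```r
set.seed(1)
n  <- 200
x  <- runif(n, 40, 80)                # hypothetical temperature-like predictor
mu <- exp(5 + 0.03 * x)               # assumed true mean on the count scale
y  <- rnbinom(n, mu = mu, size = 2)   # overdispersed counts (variance > mean)

pois_fit  <- glm(y ~ x, family = poisson)
quasi_fit <- glm(y ~ x, family = quasipoisson)

# Pearson dispersion computed by hand from the Poisson fit
phi_hat <- sum(residuals(pois_fit, type = "pearson")^2) / df.residual(pois_fit)

se_pois  <- summary(pois_fit)$coefficients[, "Std. Error"]
se_quasi <- summary(quasi_fit)$coefficients[, "Std. Error"]

# Coefficients are identical; only the standard errors change
se_ratio <- se_quasi / se_pois
```

In this simulation `se_ratio` equals `sqrt(phi_hat)` for every coefficient, which is the sense in which the quasi-Poisson model "scales" the Poisson standard errors.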

Next, we summarize the inferential statistics about the regression coefficients in the following table.

quasi.model <- glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip, 
                 family = quasipoisson, data = data)  
summary(quasi.model )

Call:
glm(formula = ManhattanBridge ~ Day + AvgTemp + NewPrecip, family = quasipoisson, 
    data = data)

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)  6.795837   0.372400  18.249 2.42e-16 ***
Day         -0.003026   0.006578  -0.460 0.649355    
AvgTemp      0.029590   0.006562   4.510 0.000123 ***
NewPrecip   -0.276462   0.113201  -2.442 0.021706 *  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for quasipoisson family taken to be 326.1148)

    Null deviance: 20980.3  on 29  degrees of freedom
Residual deviance:  8753.9  on 26  degrees of freedom
AIC: NA

Number of Fisher Scoring iterations: 4

SE.quasi.pois = summary(quasi.model)$coef
kable(SE.quasi.pois, caption = "Summary statistics of quasi-poisson regression model")

Summary statistics of quasi-poisson regression model

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:------------|-----------:|----------:|-----------:|----------:|
| (Intercept) | 6.7958375 | 0.3724002 | 18.2487500 | 0.0000000 |
| Day | -0.0030259 | 0.0065783 | -0.4599783 | 0.6493554 |
| AvgTemp | 0.0295903 | 0.0065617 | 4.5095289 | 0.0001227 |
| NewPrecip | -0.2764622 | 0.1132013 | -2.4422176 | 0.0217057 |

The quasi-Poisson regression results reveal how average temperature, precipitation status, and day-to-day variation influence the number of cyclists crossing the Manhattan Bridge.

The coefficient for Day is small and not statistically significant (β = −0.0030, p = 0.649), so there is no evidence of a linear trend in cyclist counts across the 30 observed days after accounting for temperature and precipitation. Sequential effects therefore do not appear to be a major driver of variation in usage during the period studied.
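
A complementary way to assess a single term such as Day is to compare nested quasi-Poisson fits with an F test. The sketch below uses simulated data, not the bridge data; the data frame `sim`, the column `Count`, and the generating values are all hypothetical, and the counts are simulated with no Day effect.

```r
set.seed(2)
n <- 30
sim <- data.frame(
  Day       = seq_len(n),
  AvgTemp   = runif(n, 45, 75),
  NewPrecip = rbinom(n, 1, 0.3)
)

# Simulated overdispersed counts that do not depend on Day (assumed values)
mu <- exp(6.8 + 0.03 * sim$AvgTemp - 0.28 * sim$NewPrecip)
sim$Count <- rnbinom(n, mu = mu, size = 10)

full    <- glm(Count ~ Day + AvgTemp + NewPrecip, family = quasipoisson, data = sim)
reduced <- glm(Count ~ AvgTemp + NewPrecip, family = quasipoisson, data = sim)

# F test for the one dropped term; appropriate when the dispersion is estimated
day_test <- anova(reduced, full, test = "F")
p_day    <- day_test[2, "Pr(>F)"]
```

With one term dropped, this test usually leads to the same conclusion as the t test in the coefficient table, though the two p-values need not match exactly.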

Average temperature (AvgTemp) is a strong positive predictor of cycling activity (β = 0.0296, p < 0.001). Because the model uses a log link, exponentiating the coefficient gives a multiplicative effect on the expected count: holding all else constant, each one-degree increase in average temperature raises the expected cyclist count by approximately 3% (since exp(0.0296) ≈ 1.03). This aligns with the intuitive expectation that warmer weather encourages more cycling.

The precipitation indicator (NewPrecip) has a significant negative effect on cyclist counts (β = −0.2765, p = 0.022). On days with any precipitation, the expected number of cyclists decreases by about 24% compared to completely dry days (exp(−0.2765) ≈ 0.76). This substantial drop reflects the strong influence of rainfall on cycling behavior.
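
The percentage effects quoted for AvgTemp and NewPrecip follow directly from the reported estimates. A minimal sketch of the arithmetic, with the two estimates typed in by hand from the coefficient table (the object names are hypothetical):

```r
# Estimates copied from the quasi-Poisson coefficient table
beta <- c(AvgTemp = 0.0295903, NewPrecip = -0.2764622)

rate_ratio <- exp(beta)                # multiplicative change in the expected count
pct_change <- 100 * (rate_ratio - 1)   # the same effect as a percentage

round(pct_change, 1)
```

A positive value is a percentage increase per unit of the predictor and a negative value is a percentage decrease, which is where the roughly 3% and 24% figures come from.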

Overall, the quasi-Poisson model indicates that average temperature and precipitation meaningfully impact Manhattan Bridge cycling counts, while daily trends do not contribute significantly to variation.

The estimated dispersion index is 326.11 based on the Pearson residuals.

In our final model, we will drop Day because it was not statistically significant. We refit the quasi-Poisson model after dropping Day.

quasi.model.02 = glm(ManhattanBridge ~ AvgTemp + NewPrecip, 
                 family = quasipoisson, data = data)

kable(summary(quasi.model.02)$coef,
      caption = "Inferential statistics of the quasi-Poisson regression coefficients in the final working model.")

Inferential statistics of the quasi-Poisson regression coefficients in the final working model.

| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|:------------|-----------:|----------:|----------:|----------:|
| (Intercept) | 6.8265361 | 0.3623033 | 18.842050 | 0.0000000 |
| AvgTemp | 0.0283012 | 0.0058694 | 4.821823 | 0.0000492 |
| NewPrecip | -0.2859083 | 0.1097976 | -2.603957 | 0.0147972 |

The final quasi-Poisson model indicates that AvgTemp and NewPrecip are both significant predictors of daily cyclist counts on the Manhattan Bridge. AvgTemp has a positive and highly significant effect (Estimate = 0.0283, p < 0.001): each additional degree multiplies the expected count by exp(0.0283) ≈ 1.029, an increase of about 2.9%. NewPrecip has a significant negative effect (Estimate = −0.286, p = 0.015): days with any precipitation have expected counts about 25% lower than dry days (exp(−0.286) ≈ 0.75). Overall, higher temperatures are associated with more cycling, while precipitation suppresses it.
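
To convey the uncertainty around these rate ratios, one option is a Wald-type interval built from the reported estimates and standard errors. This is a sketch under stated assumptions rather than part of the original analysis: it uses a t reference distribution with 27 residual degrees of freedom (30 observations minus 3 coefficients), and the object names are hypothetical.

```r
# Estimates and standard errors copied from the final-model table
est <- c(AvgTemp = 0.0283012, NewPrecip = -0.2859083)
se  <- c(AvgTemp = 0.0058694, NewPrecip = 0.1097976)

df_resid <- 30 - 3                 # 30 daily observations, 3 coefficients
t_crit   <- qt(0.975, df_resid)    # two-sided 95% critical value

# Interval on the log scale, then exponentiated to the rate-ratio scale
ci <- cbind(
  ratio = exp(est),
  lower = exp(est - t_crit * se),
  upper = exp(est + t_crit * se)
)
round(ci, 3)
```

An interval entirely above 1 (AvgTemp) or entirely below 1 (NewPrecip) is consistent with the significance results in the table.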

4.2 Visual Comparisons

Next, we create a visualization to show how the predictors in our final model relate to the actual number of cyclists on the Manhattan Bridge. We define two comparison groups based on precipitation status:

Dry days: NewPrecip = 0

Rainy days: NewPrecip = 1

For a grid of average temperatures spanning the observed range, we compute the model-based predicted log counts, then exponentiate them to obtain predicted cyclist counts. Plotting the two curves (dry vs. rainy) on the same axes illustrates how warmer conditions raise expected volume and how any precipitation consistently lowers expected counts at each temperature level.

## Final model fitted above as:
## quasi.model.02 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip,
##                       family = quasipoisson, data = data)
par(mar = c(5, 5, 4, 2))  # bottom, left, top, right
par(cex.lab = 1.2, cex.axis = 1, cex.main = 1.2)

co <- coef(quasi.model.02)

# Temperature grid across observed range
temps <- seq(min(data$AvgTemp, na.rm = TRUE),
             max(data$AvgTemp, na.rm = TRUE), length.out = 200)

# Linear predictors by precipitation status
eta_dry  <- co["(Intercept)"] + co["AvgTemp"]*temps + co["NewPrecip"]*0
eta_wet  <- co["(Intercept)"] + co["AvgTemp"]*temps + co["NewPrecip"]*1

# Expected cyclist counts
mu_dry <- exp(eta_dry)
mu_wet <- exp(eta_wet)

# Plot (base R), styled like the class example
ylim_max <- max(mu_dry, mu_wet, na.rm = TRUE)
plot(temps, mu_dry, type = "l", lty = 1,
     ylim = c(0, ylim_max),
     xlab = "Average Temperature (°F)",
     ylab = "Expected Manhattan Bridge Cyclists",
     main = "Expected Cyclist Counts by AvgTemp and Precipitation")
lines(temps, mu_wet, lty = 2)

legend("topleft",
       c("Dry day (NewPrecip = 0)", "Rain day (NewPrecip = 1)"),
       lty = c(1, 2), bty = "n", cex = 0.9)

The plot displays the expected number of cyclists crossing the Manhattan Bridge as a function of average daily temperature, with separate curves shown for dry days (NewPrecip = 0) and rainy days (NewPrecip = 1). The expected counts were generated by exponentiating the fitted quasi-Poisson linear predictor over a grid of 200 temperatures spanning the observed range of AvgTemp (roughly 45°F to 75°F).

The visualization highlights two clear patterns. First, predicted cyclist counts increase steadily with average temperature. Warmer days are associated with substantially higher predicted cycling volumes, consistent with the positive and significant coefficient for AvgTemp in the regression model. Second, precipitation has a marked negative effect on cycling levels. Because the model contains no interaction term, the rainy-day curve is a constant multiple of the dry-day curve (exp(−0.286) ≈ 0.75), so predicted counts on rainy days lie about 25% below those on dry days at every temperature.

Overall, the graph reinforces the statistical findings from the final quasi-Poisson model: average temperature is positively associated with cyclist counts, while rainfall significantly reduces expected bridge usage.
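
The same curves can also be obtained with `predict()` on a grid of new data instead of assembling the linear predictor by hand. The sketch below uses simulated data so that it is self-contained; `sim`, `Count`, `grid`, and the generating values are hypothetical and do not come from the bridge dataset.

```r
set.seed(3)
n <- 30
sim <- data.frame(
  AvgTemp   = runif(n, 45, 75),
  NewPrecip = rbinom(n, 1, 0.3)
)
sim$Count <- rnbinom(n, mu = exp(6.8 + 0.03 * sim$AvgTemp - 0.28 * sim$NewPrecip),
                     size = 10)

fit <- glm(Count ~ AvgTemp + NewPrecip, family = quasipoisson, data = sim)

# Every temperature on the grid paired with both precipitation states
grid <- expand.grid(
  AvgTemp   = seq(min(sim$AvgTemp), max(sim$AvgTemp), length.out = 50),
  NewPrecip = c(0, 1)
)
grid$mu_hat <- as.numeric(predict(fit, newdata = grid, type = "response"))

# Manual version for comparison: exponentiate the linear predictor
co <- coef(fit)
manual <- exp(co[["(Intercept)"]] + co[["AvgTemp"]] * grid$AvgTemp +
                co[["NewPrecip"]] * grid$NewPrecip)

# With no interaction term, rainy / dry is the same at every temperature
wet_over_dry <- grid$mu_hat[grid$NewPrecip == 1] / grid$mu_hat[grid$NewPrecip == 0]
```

Here `manual` matches `grid$mu_hat`, and `wet_over_dry` is constant and equal to the exponentiated NewPrecip coefficient, which is why the two curves in the plot keep a fixed proportional gap.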

5 Conclusion

In this extended analysis of daily cyclist activity on the Manhattan Bridge, we evaluated how average temperature and precipitation influence cycling behavior under a dispersed Poisson framework. By engineering two key predictor variables—AvgTemp to summarize overall daily temperature, and NewPrecip to distinguish between dry and rainy days—we captured the primary weather-related factors that shape daily cyclist counts. After fitting both a standard Poisson model and a dispersed Poisson model, the estimated dispersion parameter indicated severe overdispersion in the raw Poisson fit, with values exceeding 300. This confirmed that the regular Poisson assumptions were not appropriate for the data, and that the quasi-Poisson model was the correct working model for inference.

The quasi-Poisson results demonstrated clear, intuitive effects. AvgTemp had a strong, positive association with cyclist counts: warmer days were consistently associated with higher expected volumes of bridge crossings. In contrast, precipitation significantly decreased cyclist activity; days with any measurable rainfall had expected counts about 25% lower than dry days. The Day variable, included initially to assess underlying trends, was not statistically significant and was removed from the final model. The simplified model therefore focused on the two predictors with meaningful influence.

The visual comparison of predicted cyclist counts further supported these findings. The expected counts rose steadily with increasing average temperature, while the predicted curve for rainy days remained consistently lower than that of dry days across the entire temperature range. This illustrates how weather conditions jointly shape rider behavior: favorable temperatures encourage cycling, while rain sharply suppresses it.

Overall, the quasi-Poisson modeling approach provided a more accurate and reliable picture of cyclist behavior on the Manhattan Bridge by accounting for overdispersion in the data. The results highlight the substantial role that daily weather conditions play in influencing cycling patterns and offer valuable insight for city planners seeking to anticipate and accommodate fluctuations in bicycle traffic.

---
title: "Quantifying the Effect of Temperature and Precipitation on Manhattan Bridge Cyclist Traffic"
author: "Luke Volm"
date: "2025-10-10"
output:
  html_document:           # output document format
    toc: yes               # add table contents
    toc_float: yes         # toc_property: floating
    toc_depth: 4           # depth of TOC headings
    fig_width: 6           # global figure width
    fig_height: 4          # global figure height
    fig_caption: yes       # add figure caption
    number_sections: no   # numbering section headings
    toc_collapsed: yes     # TOC subheading collapsing
    code_folding: hide     # folding/showing code 
    code_download: yes     # allow to download complete RMarkdown source code
    smooth_scroll: yes     # scrolling text of the document
    theme: lumen           # visual theme for HTML document only
    highlight: tango       # code syntax highlighting styles
  pdf_document: 
    toc: yes
    toc_depth: 4
    fig_caption: yes
    number_sections: yes
  word_document:
    toc: yes
    toc_depth: '4'
---

```{css, echo = FALSE}
div#TOC li {     /* table of contents list items */
    list-style: upper-roman;
    background-image: none;
    background-repeat: no-repeat;
    background-position: 0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 */
    font-size: 14px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
library(readxl)
library(boot)
library(dplyr)
library(knitr)
library(psych)
library(MASS)
library(tidyr)
library(ggplot2)
library(car)
library(pander)

# Set seed for reproducibility
set.seed(123)

# Read in data. The working directory below is specific to the author's machine;
# adjust the path before knitting elsewhere.
setwd("C:/Users/volm1/OneDrive/Desktop/STA321 new")
data <- read_excel("w09-AssignDataset.xlsx")
# Global chunk options
knitr::opts_chunk$set(
  echo = TRUE,      # show code
  warning = FALSE,  # suppress warnings
  message = FALSE,  # suppress messages
  results = TRUE,   # show results
  comment = NA      # cleaner output (no "##" prefix)
)
```

# 1 Introduction

Daily cyclist counts on New York City’s East River bridges provide valuable insight into transportation patterns, recreational activity, and how environmental conditions influence urban mobility. In last week’s analysis, a standard Poisson model was used to examine how continuous measures of temperature and precipitation affected cyclist counts on the Manhattan Bridge. While the initial findings highlighted clear relationships between weather and cycling behavior, the data also showed signs of overdispersion, indicating that the variability in cyclist counts may not be fully captured by a basic Poisson framework.

To address this, the current study refines the analysis by re-engineering key predictor variables and applying a dispersed Poisson approach. Rather than treating high and low temperatures separately, daily temperature is summarized using a single average temperature variable (AvgTemp), offering a more stable and interpretable measure of overall weather conditions. Precipitation is also simplified: because the data contain many zero-rainfall days and a small number of positive rainfall values, precipitation is converted into a binary indicator (NewPrecip) distinguishing dry days from days with any measurable rainfall. These modifications help create a more realistic representation of how weather influences cyclist behavior.

The primary research question guiding this revised analysis is:

How do average temperature and precipitation status influence daily cyclist counts on the Manhattan Bridge when accounting for potential overdispersion in the data?

To answer this question, we conduct exploratory data analysis, fit Poisson and quasi-Poisson regression models, evaluate the estimated dispersion parameter, and visualize how expected cyclist counts vary across temperature and precipitation conditions. This approach provides a clearer and more robust understanding of how key environmental factors shape daily cycling activity.

# 2 Description of Dataset

The dataset originates from the Traffic Information Management System (TIMS), which collects automated bicycle counts across New York City’s East River bridges. The data are summarized on a daily basis, where each observation represents the total number of cyclists crossing the bridges in a 24-hour period, along with corresponding weather conditions for that day. These data provide a snapshot of how cycling activity fluctuates with environmental factors such as temperature and rainfall.

The data set contains five variables in addition to the response:

* Day (numerical) - an index we created to represent the sequence of observations (1, 2, …, n)
* LowTemp (numerical) - lowest temperature of the day (°F)
* Precipitation (numerical) - amount of precipitation (inches)
* HighTemp (numerical) - highest temperature of the day (°F)
* Total (numerical) - total number of cyclists per 24 hours on the Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge

We engineered two new variables:

* AvgTemp = (HighTemp + LowTemp) / 2
* NewPrecip = 0 if Precipitation = 0, otherwise 1

Our response variable is:

* ManhattanBridge (numerical) - total number of cyclists per 24 hours crossing the Manhattan Bridge

**Assumptions and Conditions for Poisson Regression:**

Count response variable: The dependent variable must represent counts (non-negative integers), such as the number of cyclists crossing a bridge per day.

Independence of observations: Each observation (daily count) is assumed to be independent of the others.

Equidispersion: Conditional on the predictors, the variance of the response equals its mean. Overdispersion (variance greater than the mean) makes the Poisson standard errors too small and may require a quasi-Poisson or negative binomial adjustment.

Log-linearity: The logarithm of the expected count is a linear function of the predictor variables.

No excessive zeros: Poisson regression assumes the count data are not dominated by zeros (zero-inflated models would be used otherwise).

Explanatory variables measured without error: Predictor variables (e.g., temperature, precipitation) are assumed to be observed accurately.

# 3 Exploratory Data Analysis

To begin, we plot a histogram of Precipitation to understand how the variable behaves before modeling.

```{r}

ggplot(data, aes(x = Precipitation)) +
  geom_histogram(binwidth = 0.1, fill = "steelblue", color = "black") +
  labs(
    title = "Distribution of Precipitation",
    x = "Precipitation (inches)",
    y = "Frequency"
  )

```

The histogram of precipitation clearly shows a highly skewed and imbalanced distribution. The vast majority of observations fall at 0 inches, meaning most days in the dataset experienced no rainfall at all. This is evident from the tall bar at zero, which dramatically outweighs the frequency of all other values. In contrast, the days with measurable rainfall are both rare and widely dispersed, with only a handful of observations scattered across various positive precipitation amounts (e.g., 0.1, 0.3, 0.7, 1.2 inches). None of these positive values occur frequently enough to form meaningful clusters.

This extreme right skew indicates that precipitation behaves almost like a binary variable in practice: the important distinction is simply whether it rained or not, rather than the exact amount. Because of this structure, treating precipitation as a continuous predictor would do little to improve model accuracy and could introduce instability due to the influence of a few outlying high-rainfall days. The histogram therefore justifies discretizing precipitation into a binary indicator—NewPrecip = 0 for no rain and 1 for any rain—which better reflects the underlying pattern in the data and aligns with how weather affects cyclist behavior.

We next discretize Precipitation accordingly.

```{r}

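# Binary rain indicator: 1 if any measurable precipitation, 0 otherwise
id.rain   = which(data$Precipitation > 0)
NewPrecip = rep(0, nrow(data))   # start all-dry; avoids negative indexing, which fails silently if id.rain is empty
NewPrecip[id.rain] = 1
data$NewPrecip = NewPrecip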

```

In this code chunk, we convert the original precipitation variable into a simple binary indicator. Any day with precipitation greater than zero is coded as 1, and days with no precipitation are coded as 0. This creates the new variable NewPrecip, which simplifies the analysis by focusing on whether it rained or not—important because the original precipitation data are highly skewed with many zero values.

# 4 Poisson Regression Modeling

We build both the regular Poisson and Quasi-Poisson regression models and extract the dispersion parameter to decide which model should be used as a working model.

## 4.1 Extracting Dispersion Index

We calculate the estimated dispersion value using both Pearson residuals and deviance residuals. In practice, R’s built-in dispersion estimate is based on the Pearson residuals, which it uses to approximate how much the data deviate from the assumptions of a standard Poisson model.

```{r}

# AvgTemp = (HighTemp + LowTemp)/2
data$AvgTemp <- (data$HighTemp + data$LowTemp) / 2

# Discretize Precipitation: NewPrecip = 0 if 0, 1 if > 0
id.rain <- which(data$Precipitation > 0)
NewPrecip <- rep(0, nrow(data))
NewPrecip[id.rain] <- 1
data$NewPrecip <- NewPrecip

# Day index (1, 2, ..., n). If you prefer calendar info, swap for as.integer(factor(Date)).
data$Day <- seq_len(nrow(data))

# --- Poisson model (ManhattanBridge counts) ---
pois_mod <- glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip,
                family = poisson(link = "log"), data = data)

# Predicted mean and dispersion estimates
yhat <- pois_mod$fitted.values
pearson_resid <- (data$ManhattanBridge - yhat) / sqrt(yhat)
Pearson.disp <- sum(pearson_resid^2) / pois_mod$df.residual
Deviance.disp <- pois_mod$deviance / pois_mod$df.residual

# Tidy little table for dispersion
disp_tab <- data.frame(Pearson.disp = Pearson.disp,
                       Deviance.disp = Deviance.disp)
knitr::kable(disp_tab, caption = "Estimated dispersion (Poisson fit)")

# --- Quasi-Poisson model (same mean, SEs scaled by dispersion) ---
qpois_mod <- glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip,
                 family = quasipoisson, data = data)

knitr::kable(summary(qpois_mod)$coef,
             caption = "Quasi-Poisson regression coefficients (Manhattan Bridge)")

```

The dispersion results show extremely high values for both the Pearson dispersion index (326.11) and the deviance dispersion index (336.69), indicating severe overdispersion in the Poisson model. Under the Poisson assumption, the variance equals the mean, so the dispersion value should be close to 1. Even modest departures (values around 2 or 3) signal that the Poisson model may not fit well. Here, the estimated dispersion is more than 300 times larger than expected, meaning the variability in cyclist counts is far greater than a standard Poisson model can accommodate. The Poisson model therefore dramatically understates the uncertainty in the data: its standard errors are too small by a factor of about √326 ≈ 18, which makes its p-values misleadingly small. The quasi-Poisson model keeps the same coefficient estimates but scales the standard errors by the square root of the estimated dispersion, so it is clearly the more appropriate modeling choice for this dataset.

Next, we summarize the inferential statistics about the regression coefficients in the following table.

```{r}

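# Same specification as qpois_mod above, refit under the name used in this section
quasi.model <- glm(ManhattanBridge ~ Day + AvgTemp + NewPrecip,
                   family = quasipoisson, data = data)
summary(quasi.model)

# Coefficient table: estimates, dispersion-adjusted standard errors, t values, p-values
SE.quasi.pois <- summary(quasi.model)$coef
knitr::kable(SE.quasi.pois, caption = "Summary statistics of quasi-Poisson regression model")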

```

The quasi-Poisson regression results reveal how average temperature, precipitation status, and day-to-day variation influence the number of cyclists crossing the Manhattan Bridge. 

The coefficient for Day is small and not statistically significant (β = −0.0030, p = 0.649), so there is no evidence of a trend in cyclist counts across days after accounting for temperature and precipitation. This suggests that long-term or sequential effects are not a major driver of variation in usage during the period studied.

Average temperature (AvgTemp) is a strong positive predictor of cycling activity (β = 0.0296, p < 0.001). Because the model uses a log link, exponentiating the coefficient gives a multiplicative effect on the expected count: holding all else constant, each one-degree increase in average temperature raises the expected cyclist count by approximately 3% (since exp(0.0296) ≈ 1.03). This aligns with the intuitive expectation that warmer weather encourages more cycling.

The precipitation indicator (NewPrecip) has a significant negative effect on cyclist counts (β = −0.2765, p = 0.022). On days with any precipitation, the expected number of cyclists decreases by about 24% compared to completely dry days (exp(−0.2765) ≈ 0.76). This substantial drop reflects the strong influence of rainfall on cycling behavior.

Overall, the quasi-Poisson model indicates that average temperature and precipitation meaningfully impact Manhattan Bridge cycling counts, while daily trends do not contribute significantly to variation.
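The percentage effects quoted above follow from exponentiating the reported coefficients. As a worked illustration (the 10-degree comparison is an added example computed from the same estimate, not a separate result from the tables):

$$
\begin{aligned}
\text{AvgTemp, per } 1^\circ\text{F:}\quad & e^{0.0296} \approx 1.030 \;\Rightarrow\; \text{about } +3\% \\
\text{AvgTemp, per } 10^\circ\text{F:}\quad & e^{10 \times 0.0296} = e^{0.296} \approx 1.344 \;\Rightarrow\; \text{about } +34\% \\
\text{NewPrecip, rain vs. dry:}\quad & e^{-0.2765} \approx 0.758 \;\Rightarrow\; \text{about } -24\%
\end{aligned}
$$

Effects compound multiplicatively on the count scale, which is why a 10-degree change corresponds to roughly 34% rather than 30%.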

The dispersion parameter for this quasi-Poisson fit is estimated from the Pearson residuals and equals 326.11, the same value obtained above from the Poisson fit.

In our final model, we will drop Day because it was not statistically significant. We refit the quasi-Poisson model after dropping Day.

```{r}

# Final working model: quasi-Poisson with AvgTemp and NewPrecip only
quasi.model.02 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip,
                      family = quasipoisson, data = data)

knitr::kable(summary(quasi.model.02)$coef,
             caption = "Inferential statistics of the quasi-Poisson regression coefficients in the final working model.")

```

The final quasi-Poisson model indicates that AvgTemp and NewPrecip are both significant predictors of daily cyclist counts on the Manhattan Bridge. AvgTemp has a positive and highly significant effect (Estimate = 0.0283, p < 0.001): each additional degree of average temperature is associated with roughly a 2.9% increase in expected cyclists (exp(0.0283) ≈ 1.029). NewPrecip has a significant negative effect (Estimate = −0.286, p = 0.015): days with any precipitation have expected counts about 25% lower than dry days (exp(−0.286) ≈ 0.75). Overall, higher temperatures are associated with more cycling activity, while precipitation is associated with less.

# 4.2 Visual Comparisons

Next, we create a visualization to show how the predictors in our final model relate to the actual number of cyclists on the Manhattan Bridge. We define two comparison groups based on precipitation status:

Dry days: NewPrecip = 0

Rainy days: NewPrecip = 1

For a grid of average temperatures spanning the observed range, we compute the model-based predicted log counts, then exponentiate them to obtain predicted cyclist counts. Plotting these curves side by side (dry vs. rainy) across temperature illustrates how warmer conditions raise expected volume and how any precipitation consistently lowers expected counts at each temperature level.
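The prediction step described above can also be carried out with `predict()`. The sketch below uses simulated data rather than the bridge data: the data frame `sim`, its columns `temp`, `rain`, and `count`, and all parameter values are hypothetical stand-ins. It shows that exponentiating link-scale predictions gives the same expected counts as requesting response-scale predictions, and that both match the values assembled by hand from the coefficients.

```r
# Simulated, hypothetical data standing in for the bridge counts.
set.seed(2)
n   <- 150
sim <- data.frame(temp = runif(n, 45, 75),
                  rain = rbinom(n, 1, 0.3))
sim$count <- rnbinom(n, size = 5,
                     mu = exp(6 + 0.03 * sim$temp - 0.3 * sim$rain))

fit <- glm(count ~ temp + rain, family = quasipoisson, data = sim)

# Prediction grid: each temperature paired with a dry (0) and a rainy (1) day
grid <- expand.grid(temp = seq(min(sim$temp), max(sim$temp), length.out = 50),
                    rain = c(0, 1))

eta <- predict(fit, newdata = grid, type = "link")      # predicted log counts
mu  <- predict(fit, newdata = grid, type = "response")  # predicted counts

# Exponentiating the link-scale prediction recovers the expected count
all.equal(exp(eta), mu)

# The same values assembled by hand from the coefficients
co        <- coef(fit)
mu_manual <- exp(co["(Intercept)"] + co["temp"] * grid$temp + co["rain"] * grid$rain)
all.equal(unname(mu), unname(mu_manual))
```

Working from a prediction grid keeps the code unchanged if further predictors are added to the model, whereas the hand-built linear predictor must be edited term by term.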

```{r}

## Final model fitted above as:
## quasi.model.02 <- glm(ManhattanBridge ~ AvgTemp + NewPrecip,
##                       family = quasipoisson, data = data)
par(mar = c(5, 5, 4, 2))  # bottom, left, top, right
par(cex.lab = 1.2, cex.axis = 1, cex.main = 1.2)

co <- coef(quasi.model.02)

# Temperature grid across observed range
temps <- seq(min(data$AvgTemp, na.rm = TRUE),
             max(data$AvgTemp, na.rm = TRUE), length.out = 200)

# Linear predictors by precipitation status
eta_dry  <- co["(Intercept)"] + co["AvgTemp"]*temps + co["NewPrecip"]*0
eta_wet  <- co["(Intercept)"] + co["AvgTemp"]*temps + co["NewPrecip"]*1

# Expected cyclist counts
mu_dry <- exp(eta_dry)
mu_wet <- exp(eta_wet)

# Plot (base R), styled like the class example
ylim_max <- max(mu_dry, mu_wet, na.rm = TRUE)
plot(temps, mu_dry, type = "l", lty = 1,
     ylim = c(0, ylim_max),
     xlab = "Average Temperature (°F)",
     ylab = "Expected Manhattan Bridge Cyclists",
     main = "Expected Cyclist Counts by AvgTemp and Precipitation")
lines(temps, mu_wet, lty = 2)

legend("topleft",
       c("Dry day (NewPrecip = 0)", "Rainy day (NewPrecip = 1)"),
       lty = c(1, 2), bty = "n", cex = 0.9)

```

The plot displays the expected number of cyclists crossing the Manhattan Bridge as a function of average daily temperature, with separate curves shown for dry days (NewPrecip = 0) and rainy days (NewPrecip = 1). The expected counts were generated by exponentiating the fitted quasi-Poisson linear predictor over a grid of 200 temperatures spanning the observed range of AvgTemp (approximately 45°F to 75°F).

The visualization highlights two clear patterns. First, expected cyclist counts increase steadily with average temperature. Warmer days are associated with substantially higher predicted cycling volumes, consistent with the positive and significant coefficient for AvgTemp in the regression model. Second, precipitation is associated with markedly lower cycling levels. Across all temperatures, the predicted counts on rainy days lie about 25% below those on dry days, so the absolute gap between the two curves widens as temperature rises.
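The constant proportional gap between the two curves follows from the form of the model. Writing $\hat\beta_0$, $\hat\beta_1$, and $\hat\beta_2$ for the intercept, AvgTemp, and NewPrecip coefficients of the final model (notation introduced here for illustration) and $t$ for average temperature, the ratio of the rainy-day to the dry-day curve is

$$
\frac{\hat\mu_{\text{rain}}(t)}{\hat\mu_{\text{dry}}(t)}
= \frac{\exp(\hat\beta_0 + \hat\beta_1 t + \hat\beta_2)}{\exp(\hat\beta_0 + \hat\beta_1 t)}
= e^{\hat\beta_2}
= e^{-0.286} \approx 0.75 .
$$

The ratio does not depend on $t$ because the model contains no interaction between AvgTemp and NewPrecip. A temperature-dependent rain effect would require adding such a term.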

Overall, the graph reinforces the statistical findings from the final quasi-Poisson model: average temperature is positively associated with cyclist counts, while rainfall significantly reduces expected bridge usage.

# 5 Conclusion

In this extended analysis of daily cyclist activity on the Manhattan Bridge, we evaluated how average temperature and precipitation influence cycling behavior under a dispersed Poisson framework. By engineering two key predictor variables (AvgTemp to summarize overall daily temperature, and NewPrecip to distinguish between dry and rainy days), we captured the primary weather-related factors that shape daily cyclist counts. After fitting both a standard Poisson model and a dispersed Poisson model, the estimated dispersion parameter indicated severe overdispersion in the raw Poisson fit, with values exceeding 300. This confirmed that the standard Poisson variance assumption was not appropriate for the data, and that the quasi-Poisson model was the more appropriate working model for inference.

The quasi-Poisson results demonstrated clear, intuitive effects. AvgTemp had a strong, positive association with cyclist counts: warmer days were consistently associated with higher expected volumes of bridge crossings. In contrast, precipitation was associated with significantly lower cyclist activity: days with any measurable rainfall had expected counts roughly 25% lower than dry days. Because NewPrecip is a binary indicator, the model does not distinguish light from heavy rain. The Day variable, included initially to assess underlying trends, was not statistically significant and was removed from the final model. The simplified model therefore focused on the two predictors with meaningful influence.

The visual comparison of predicted cyclist counts further supported these findings. The expected counts rose steadily with increasing average temperature, while the predicted curve for rainy days remained consistently lower than that of dry days across the entire temperature range. This illustrates how weather conditions jointly shape rider behavior: favorable temperatures encourage cycling, while rain sharply suppresses it.

Overall, the quasi-Poisson modeling approach provided a more accurate and reliable picture of cyclist behavior on the Manhattan Bridge by accounting for overdispersion in the data. The results highlight the substantial role that daily weather conditions play in influencing cycling patterns and offer valuable insight for city planners seeking to anticipate and accommodate fluctuations in bicycle traffic.