Introduction

The goal of this project was to analyze the daily cyclist counts on the NYC East river bridges and evaluate how weather and day-of-week influence traffic volume. Two Poisson regression models were applied: a counts model predicting total daily cyclists, and a rate model which used an offset to examine relative flow rates after adjusting for exposure.

Materials

The dataset contains daily totals for bridge traffic aolng with weather variables (high temp, low temp, precipitation) and calendar information (date, day). All counts were non-negative integers representing the number of cyclists crossing a single bridge per day. Each observation was treated as independent. After cleaning and converting text dates to proper date objects, and factors were created for weekdays and months.

Methodology and analysis

Poisson generalized linear models were used because the response variable represented count data. Two versions of the model were fit: Counts models and rate models. Model selection used AIC, and overdispersion was checked using residual deviance/degrees of freedom.

Results and Conclusions

Counts Model

Weekday effects: Relative to Friday, Wednesday was +20%, Thursday +12%, and Tuesday +10% showed higher expected cyclist counts. Weekend days dropped sharply, with Saturday -18% and Sunday -20%. Weather Effects: Each 1 degree increase in temp raised expected counts by 1.6%. Lowtemp had an opposite effect, and precipitation reduced counts 27% per unit. # Rate Model Using offset(log(total)) produced nearly identical effects but a smaller deviance, although still over dispersed. This model appropriately interpreted coefficients as having effects on rates rather than raw counts.

Conclusions

Cyclists traffic increased mid-week on warm and dry days. Weekend and rainy conditions reduce riders.

General Discussion

The analysis shows how routine weather and temporal variables help explain variability in cyclist volume. Limitations within the study include bridge limitation as only one bridge had data, possible autocorrelation between days, and unmeasured variables like holidays or daylight length. Future work could expand to more bridges or time-series model to address over dispersion and temperature dependence.

knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)



library(readxl)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(stringr)
library(lubridate)

## 
## Attaching package: 'lubridate'

## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union

library(tidyr)
library(ggplot2)
library(broom)
library(knitr)
library(forcats)


data_path <- "C:/Users/rg03/Downloads/sta321/PoissonData.xlsx"
raw <- readxl::read_excel(data_path)



names(raw) <- names(raw) %>%
tolower() %>%
gsub("[[:space:]]+", "*", .) %>%
gsub("[^a-z0-9*]", "", .)

dat0 <- raw



has_total <- "total" %in% names(dat0)


bridge_cols <- names(dat0)[names(dat0) %in% c(
"brooklyn","manhattan","williamsburg","queensborobridge"
)]



date_col <- if ("date" %in% names(dat0)) "date" else NULL
day_col  <- if ("day"  %in% names(dat0)) "day"  else NULL



hitemp_col <- names(dat0)[grepl("^hightemp", names(dat0))][1]
if (is.na(hitemp_col)) hitemp_col <- NULL
lotemp_col <- names(dat0)[grepl("^lowtemp",  names(dat0))][1]
if (is.na(lotemp_col)) lotemp_col <- NULL
prcp_col   <- if ("precipitation" %in% names(dat0)) "precipitation" else NULL



if (length(bridge_cols) >= 1) {



if (length(bridge_cols) > 1) {
long_counts <- dat0 %>%
tidyr::pivot_longer(
cols = dplyr::all_of(bridge_cols),
names_to = "bridge",
values_to = "count"
)
} else {
bcol <- bridge_cols[1]
long_counts <- dat0 %>%
dplyr::mutate(
bridge = bcol,
count  = .data[[bcol]]
)
}
long_counts <- long_counts %>% dplyr::mutate(direction = "total")



if (has_total) {
long_counts$total <- long_counts$total
} else {
long_counts$total <- long_counts$count
}

} else if (has_total) {



long_counts <- dat0 %>%
dplyr::mutate(
bridge = "all",
direction = "total",
count = total
)
} else {
stop("Could not detect counts. Please ensure your sheet has a bridge count column (e.g., QueensboroBridge) or a Total column.")
}



dat <- long_counts %>%
{ if (!is.null(date_col)) dplyr::rename(., date = dplyr::all_of(date_col)) else dplyr::mutate(., date = NA) } %>%
dplyr::mutate(
date = suppressWarnings(lubridate::as_date(date)),

wday_chr = dplyr::case_when(
!is.null(day_col) ~ as.character(.data[[day_col]]),
!is.na(date)      ~ as.character(lubridate::wday(date, label = TRUE, abbr = TRUE)),
TRUE              ~ NA_character_
),
month_chr = dplyr::case_when(
!is.na(date) ~ as.character(lubridate::month(date, label = TRUE, abbr = TRUE)),
TRUE         ~ NA_character_
),
wday  = factor(wday_chr),
month = factor(month_chr),
bridge = factor(bridge),
direction = factor(direction),
count = as.numeric(count),
total = as.numeric(total)
) %>%
dplyr::select(-wday_chr, -month_chr)



if (!is.null(hitemp_col)) dat$hightemp <- suppressWarnings(as.numeric(dat0[[hitemp_col]]))
if (!is.null(lotemp_col)) dat$lowtemp  <- suppressWarnings(as.numeric(dat0[[lotemp_col]]))
if (!is.null(prcp_col))   dat$precip   <- suppressWarnings(as.numeric(dat0[[prcp_col]]))



dat <- dat %>% dplyr::filter(!is.na(count))



knitr::kable(head(dat), caption = "Preview of standardized data")

Preview of standardized data
date	day	hightemp	lowtemp	precipitation	queensborobridge	total	bridge	count	direction	wday	month	precip
2025-07-01	Saturday	84.9	72.0	0.23	3216	11867	queensborobridge	3216	total	Saturday	Jul	0.23
2025-07-02	Sunday	87.1	73.0	0.00	3579	13995	queensborobridge	3579	total	Sunday	Jul	0.00
2025-07-03	Monday	87.1	71.1	0.45	4230	16067	queensborobridge	4230	total	Monday	Jul	0.45
2025-07-04	Tuesday	82.9	70.0	0.00	3861	13925	queensborobridge	3861	total	Tuesday	Jul	0.00
2025-07-05	Wednesday	84.9	71.1	0.00	5862	23110	queensborobridge	5862	total	Wednesday	Jul	0.00
2025-07-06	Thursday	75.0	71.1	0.00	5251	21861	queensborobridge	5251	total	Thursday	Jul	0.00

has_2lvls <- function(x) length(unique(na.omit(x))) > 1
has_var   <- function(x) is.numeric(x) && is.finite(sd(x, na.rm = TRUE)) && sd(x, na.rm = TRUE) > 0

rhs_terms <- c()
if ("bridge" %in% names(dat)    && has_2lvls(dat$bridge))    rhs_terms <- c(rhs_terms, "bridge")
if ("direction" %in% names(dat) && has_2lvls(dat$direction)) rhs_terms <- c(rhs_terms, "direction")
if ("month" %in% names(dat)     && has_2lvls(dat$month))     rhs_terms <- c(rhs_terms, "month")
if ("wday" %in% names(dat)      && has_2lvls(dat$wday))      rhs_terms <- c(rhs_terms, "wday")
if ("hightemp" %in% names(dat)  && has_var(dat$hightemp))    rhs_terms <- c(rhs_terms, "hightemp")
if ("lowtemp"  %in% names(dat)  && has_var(dat$lowtemp))     rhs_terms <- c(rhs_terms, "lowtemp")
if ("precip"   %in% names(dat)  && has_var(dat$precip))      rhs_terms <- c(rhs_terms, "precip")
if (length(rhs_terms) == 0) rhs_terms <- "1"

counts_formula <- as.formula(paste("count ~", paste(rhs_terms, collapse = " + ")))


dat2 <- dat %>% mutate(
bridge    = if ("bridge" %in% names(.)) droplevels(bridge) else bridge,
direction = if ("direction" %in% names(.)) droplevels(direction) else direction,
month     = if ("month" %in% names(.)) droplevels(month) else month,
wday      = if ("wday" %in% names(.)) droplevels(wday) else wday
)

m_counts_full <- glm(counts_formula, family = poisson(link = "log"), data = dat2)

counts_full_tidy <- broom::tidy(m_counts_full, conf.int = TRUE, exponentiate = TRUE)
knitr::kable(counts_full_tidy, digits = 3, caption = "Counts model (IRRs = exp(coef)) — full")

Counts model (IRRs = exp(coef)) — full
term	estimate	std.error	statistic	conf.low	conf.high
(Intercept)	5175.666	0.048	178.508	4711.687	5685.036
wdayMonday	1.056	0.011	4.991	1.033	1.078
wdaySaturday	0.823	0.011	-17.525	0.805	0.841
wdaySunday	0.800	0.012	-19.211	0.782	0.819
wdayThursday	1.123	0.011	10.232	1.098	1.148
wdayTuesday	1.101	0.011	8.506	1.077	1.126
wdayWednesday	1.202	0.011	16.595	1.176	1.228
hightemp	1.016	0.001	19.692	1.014	1.018
lowtemp	0.980	0.001	-16.876	0.978	0.983
precip	0.725	0.011	-30.584	0.710	0.740

m_counts <- tryCatch(step(m_counts_full, trace = 0), error = function(e) m_counts_full)
knitr::kable(broom::tidy(m_counts, conf.int = TRUE, exponentiate = TRUE),
digits = 3, caption = "Counts model (IRRs) — final")

Counts model (IRRs) — final
term	estimate	std.error	statistic	conf.low	conf.high
(Intercept)	5175.666	0.048	178.508	4711.687	5685.036
wdayMonday	1.056	0.011	4.991	1.033	1.078
wdaySaturday	0.823	0.011	-17.525	0.805	0.841
wdaySunday	0.800	0.012	-19.211	0.782	0.819
wdayThursday	1.123	0.011	10.232	1.098	1.148
wdayTuesday	1.101	0.011	8.506	1.077	1.126
wdayWednesday	1.202	0.011	16.595	1.176	1.228
hightemp	1.016	0.001	19.692	1.014	1.018
lowtemp	0.980	0.001	-16.876	0.978	0.983
precip	0.725	0.011	-30.584	0.710	0.740

knitr::kable(broom::glance(m_counts), digits = 3, caption = "Counts model fit stats")

Counts model fit stats
null.deviance	df.null	logLik	AIC	BIC	deviance	df.residual	nobs
7891.022	30	-1670.379	3360.758	3375.098	3023.595	21	31

rate_dat <- dat2 %>% dplyr::filter(!is.na(total), total > 0)

if (nrow(rate_dat) >= 2) {



rhs_rate <- c()
if ("bridge" %in% names(rate_dat)    && has_2lvls(rate_dat$bridge))    rhs_rate <- c(rhs_rate, "bridge")
if ("direction" %in% names(rate_dat) && has_2lvls(rate_dat$direction)) rhs_rate <- c(rhs_rate, "direction")
if ("month" %in% names(rate_dat)     && has_2lvls(rate_dat$month))     rhs_rate <- c(rhs_rate, "month")
if ("wday" %in% names(rate_dat)      && has_2lvls(rate_dat$wday))      rhs_rate <- c(rhs_rate, "wday")
if ("hightemp" %in% names(rate_dat)  && has_var(rate_dat$hightemp))    rhs_rate <- c(rhs_rate, "hightemp")
if ("lowtemp"  %in% names(rate_dat)  && has_var(rate_dat$lowtemp))     rhs_rate <- c(rhs_rate, "lowtemp")
if ("precip"   %in% names(rate_dat)  && has_var(rate_dat$precip))      rhs_rate <- c(rhs_rate, "precip")
if (length(rhs_rate) == 0) rhs_rate <- "1"

rate_formula <- as.formula(paste("count ~", paste(rhs_rate, collapse = " + "),
"+ offset(log(total))"))

rate_dat2 <- rate_dat %>% mutate(
bridge    = if ("bridge" %in% names(.)) droplevels(bridge) else bridge,
direction = if ("direction" %in% names(.)) droplevels(direction) else direction,
month     = if ("month" %in% names(.)) droplevels(month) else month,
wday      = if ("wday" %in% names(.)) droplevels(wday) else wday
)

m_rate_full <- glm(rate_formula, family = poisson(link = "log"), data = rate_dat2)

rate_full_tidy <- broom::tidy(m_rate_full, conf.int = TRUE, exponentiate = TRUE)
knitr::kable(rate_full_tidy, digits = 3, caption = "Rate model (rate ratios) — full")

m_rate <- tryCatch(step(m_rate_full, trace = 0), error = function(e) m_rate_full)
knitr::kable(broom::tidy(m_rate, conf.int = TRUE, exponentiate = TRUE),
digits = 3, caption = "Rate model (rate ratios) — final")
knitr::kable(broom::glance(m_rate), digits = 3, caption = "Rate model fit stats")
} else {
knitr::kable(head(rate_dat), caption = "Rate data unavailable or lacks valid 'total'; skipping rate model")
}

Rate model fit stats
null.deviance	df.null	logLik	AIC	BIC	deviance	df.residual	nobs
467.325	30	-330.421	680.841	695.181	343.678	21	31

An analysis of cyclists and weather/days of week

2025-11-10