Introduction and Background
Here we have a dataset sourced from New York City’s Traffic
Information Management System (TIMS). TIMS recorded the number of
cyclists entering and leaving three of New York City’s five boroughs -
Queens, Manhattan and Brooklyn - via a collection of bridges known as
the East River Bridges (Brooklyn Bridge, Manhattan Bridge, Williamsburg
Bridge, and Queensboro Bridge). These recordings took place in 2017.
April, July and October are the three months that are present in our
available copy of the data.
For this analysis we look at a randomly selected subset of the larger
dataset (the subset was chosen using R’s runif function). It covers
cyclists who entered and left our three boroughs of interest - Queens,
Manhattan and Brooklyn - via the Manhattan Bridge throughout the entire
month of July 2017. The subset has 31 observations, one for each day,
and no missing values. A breakdown of the original dataset’s variables,
their practical meanings and their data types is below.
| Name | Meaning | Data_Type |
|---|---|---|
| Date | Date for that observation; YYYY-MM-DD form | Date |
| Day | Day of the week for that observation | character |
| HighTemp | That day’s highest recorded temperature | double |
| LowTemp | That day’s lowest recorded temperature | double |
| Precipitation | Measure of rain that day (inches) | double |
| Manhattan | Number of cyclists entering/leaving Queens, Manhattan or Brooklyn via the MANHATTAN Bridge | double |
| Total | Total number of cyclists entering/leaving Queens, Manhattan or Brooklyn via ANY of the East River Bridges | double |
Objective of Analysis
With the available data, my goal for this analysis is to examine the
association between weather conditions and day of the week with the
amount of cyclist traffic that the Manhattan Bridge experiences. In
order to do this, I created two new variables - MeanTemp and TempDiff -
which were calculated by averaging that particular day’s low and high
temperatures and finding the difference between those temperatures
respectively.
Using these temperature-related metrics, along with measures of
precipitation and records of the day of the week, I will use Poisson
regression techniques to see which if any of these factors play a
particular role in the overall amount or the relative rate of
cyclist traffic that the Manhattan Bridge experiences.
Poisson Regression Modeling
To explore any potential associations, I created Poisson models of
two different regression types, one being for counts and one being for
rates.
Poisson counts regression examines the total number of occurrences of
a particular event (in this case cyclists on the Manhattan Bridge) and
uses a logarithmic function to determine which, if any of the
explanatory variables have a significant effect on said response
variable’s mean. The formula for said regression is below:
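The formula image did not survive in this copy. The following is a reconstruction, assuming the model takes the standard log-linear Poisson form; \(\mu\) (the mean daily cyclist count) and \(x_1, \dots, x_p\) (the predictors) are notation introduced here rather than taken from the original figure.

\[
\log(\mu) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p
\]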

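- \(\beta_0\) = the log of our response variable’s mean when every
predictor is at its baseline (numeric predictors equal to zero and Day
at its reference level); not very useful for practical interpretation
- \(\beta_1\), \(\beta_2\), \(\beta_3\), … \(\beta_p\) = the change in
our response variable’s log mean associated with a one-unit increase in
the corresponding predictor variable, holding the other predictors fixed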
Additionally, Poisson rates regression models the expected rate of a
particular event’s occurrence relative to a larger “population.” In this
dataset and analysis, our variable Total, which represents the total
number of cyclists on all the East River Bridges, serves as that
population, so the number of cyclists on the Manhattan Bridge is treated
as a proportion of Total. The calculation for this type of Poisson
regression is similar to counts regression, but the logarithm of the
population variable enters the model as an offset, that is, a term whose
coefficient is fixed at 1. This can be expressed in either of the
following two ways.
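The two formula images are likewise missing from this copy. Assuming the standard offset formulation, and writing \(\mu\) for the mean Manhattan Bridge count and \(t\) for the value of Total (both symbols are introduced here), the two equivalent expressions would read roughly as follows.

\[
\log\!\left(\frac{\mu}{t}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
\]

\[
\log(\mu) = \log(t) + \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
\]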
- In Poisson rates regression, the parameters \(\beta_0\), … \(\beta_p\) are interpreted
in the same manner as in the Poisson counts model, except that they
describe the log of the rate rather than the log of the mean count.
Poisson Regression (Counts)
Below is a summary of the Poisson counts regression model I created,
with measures of temperature range and averages, precipitation amount
and day of the week all functioning as predictors of how many cyclists
crossed the Manhattan Bridge in or out of our three boroughs of
interest.
# Counts Model:
# Response = Manhattan
# Predictors = Day, MeanTemp, TempDiff, Precipitation
# Day is stored as a Factor
Counts_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, family = poisson(link = "log"), data = Data)
Counts_Model_Sum = summary(Counts_Model)
Counts_Model_Coef = Counts_Model_Sum$coefficients
invisible(Counts_Model_Coef)
kable(Counts_Model_Coef, caption = "Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")
Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists
| Term | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | 8.5013371 | 0.0421490 | 201.697286 | 0.0000000 |
| DayMonday | 0.3199236 | 0.0089935 | 35.572866 | 0.0000000 |
| DayTuesday | 0.3357894 | 0.0093242 | 36.012843 | 0.0000000 |
| DayWednesday | 0.4023102 | 0.0090796 | 44.309388 | 0.0000000 |
| DayThursday | 0.2807381 | 0.0096557 | 29.074878 | 0.0000000 |
| DayFriday | 0.1331873 | 0.0107366 | 12.405032 | 0.0000000 |
| DaySaturday | -0.0859127 | 0.0097279 | -8.831613 | 0.0000000 |
| MeanTemp | -0.0029260 | 0.0006129 | -4.774076 | 0.0000018 |
| TempDiff | 0.0143243 | 0.0008575 | 16.703807 | 0.0000000 |
| Precipitation | -0.4307477 | 0.0104214 | -41.332836 | 0.0000000 |
# All predictor variables are significant
In the model, we can see that every predictor variable is
statistically significant, with p-values well below the conventional
threshold of 0.05, so I did not apply stepwise regression or any other
model simplification.
As for the practical implications of our model summary, we can say
that although every predictor variable is statistically significant, the
magnitudes of their estimated effects are relatively small.
Precipitation’s estimated negative effect on the log mean of Manhattan
Bridge cyclists has an absolute value of about 0.4307 per inch of rain,
which is the highest of all our predictors.
It appears that the day’s average temperature and the difference
between the daily high and low had very little practical effect on the
log mean of that day’s cyclists. When we look at the difference in log
means from a day-of-the-week perspective, we do see a somewhat larger
effect. With Sunday coded as the baseline, Wednesday has the greatest
estimated cyclist traffic and Saturday the least. The higher count of
cyclists during the workweek could be due to the Manhattan Bridge
functioning for many as a commuting route.
All in all, our Poisson counts model yields some interesting and
statistically significant findings, most notably that cyclist counts are
far more sensitive to precipitation than to temperature, and that
cyclist traffic appears to rise through midweek, peaking on Wednesday,
before tapering off toward the weekend. However, the relatively small
magnitude of each variable’s estimated effect limits the model’s
practical utility.
Poisson Regression (Rates)
After fitting the Poisson counts regression, I performed Poisson rates
regression, using the total number of cyclists entering and exiting our
three boroughs of interest across all the East River Bridges as the
“population” of which the Manhattan Bridge cyclists form a part.
This process involved two Poisson rates models. In the first, both
temperature variables were statistically insignificant. Given that
result, and their minute practical significance in the previous counts
model, I removed them and fit a second Poisson rates model that did not
include the day’s average temperature or temperature range.
### Rates Model 1
Rates_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)
Rates_Model_Sum = summary(Rates_Model)
Rates_Model_Coef = Rates_Model_Sum$coefficients
invisible(Rates_Model_Coef)
kable(Rates_Model_Coef, caption = "Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")
Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists
| Term | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | -1.1844325 | 0.0418215 | -28.3211719 | 0.0000000 |
| DayMonday | 0.0418134 | 0.0088829 | 4.7071774 | 0.0000025 |
| DayTuesday | 0.0549949 | 0.0094416 | 5.8247706 | 0.0000000 |
| DayWednesday | 0.0316272 | 0.0090743 | 3.4853702 | 0.0004915 |
| DayThursday | 0.0048565 | 0.0096974 | 0.5008067 | 0.6165072 |
| DayFriday | -0.0167479 | 0.0108925 | -1.5375635 | 0.1241554 |
| DaySaturday | -0.0667274 | 0.0097414 | -6.8498669 | 0.0000000 |
| MeanTemp | -0.0010004 | 0.0006053 | -1.6527512 | 0.0983815 |
| TempDiff | 0.0008449 | 0.0008628 | 0.9792330 | 0.3274649 |
| Precipitation | -0.0306511 | 0.0095235 | -3.2184824 | 0.0012887 |
### Rates Model 2
Rates_Model2 = glm(Manhattan ~ Day + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)
Rates_Model2_Sum = summary(Rates_Model2)
Rates_Model2_Coef = Rates_Model2_Sum$coefficients
invisible(Rates_Model2_Coef)
kable(Rates_Model2_Coef, caption = "Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists")
Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists
| Term | Estimate | Std. Error | z value | p-value |
|---|---|---|---|---|
| (Intercept) | -1.2497005 | 0.0065309 | -191.3530685 | 0.0000000 |
| DayMonday | 0.0417392 | 0.0088231 | 4.7306551 | 0.0000022 |
| DayTuesday | 0.0522471 | 0.0090521 | 5.7718313 | 0.0000000 |
| DayWednesday | 0.0285783 | 0.0088706 | 3.2216687 | 0.0012745 |
| DayThursday | 0.0006398 | 0.0091826 | 0.0696739 | 0.9444532 |
| DayFriday | -0.0205402 | 0.0106430 | -1.9299220 | 0.0536165 |
| DaySaturday | -0.0684652 | 0.0096858 | -7.0685797 | 0.0000000 |
| Precipitation | -0.0288171 | 0.0093266 | -3.0897749 | 0.0020031 |
Looking at the findings of our second Poisson rates regression model,
we see a pattern similar to that of our Poisson counts regression model:
most coefficients are statistically significant, but their magnitudes
suggest little practical significance.
Once again treating Sunday as our baseline, the rate of Manhattan
Bridge cyclists as a proportion of all East River Bridge cyclists is at
its highest early in the week and declines going into the weekend. That
said, the Thursday coefficient (p ≈ 0.94) and, to a lesser extent, the
Friday coefficient (p ≈ 0.054) are not statistically significant at the
0.05 level, so the Manhattan Bridge’s rate on those days cannot be
distinguished from Sunday’s. Any difference between those days and
Sunday could therefore be due to random chance rather than a particular
characteristic of the Bridge that affects its cyclists only on those
days.
Day-By-Day Averages
Since both our counts and rates models suggested that the day of the
week has the greatest association with the log mean of the Manhattan
Bridge’s cyclists, I decided to calculate the average counts and rates
per day to compare them to each other and the mean across all days
considered. The table with this information is below.
Count_Averages = c(
round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Sunday"])),
round(mean(Data$Manhattan[Data$Day == "Monday"])),
round(mean(Data$Manhattan[Data$Day == "Tuesday"])),
round(mean(Data$Manhattan[Data$Day == "Wednesday"])),
round(mean(Data$Manhattan[Data$Day == "Thursday"])),
round(mean(Data$Manhattan[Data$Day == "Friday"])),
round(mean(Data$Manhattan[Data$Day == "Saturday"]))
)
AllDays_Rates_Avg = sum(Data$Manhattan)/sum(Data$Total)
Sun_Rates_Avg = sum(Data$Manhattan[Data$Day == "Sunday"])/sum(Data$Total[Data$Day == "Sunday"])
Mon_Rates_Avg = sum(Data$Manhattan[Data$Day == "Monday"])/sum(Data$Total[Data$Day == "Monday"])
Tues_Rates_Avg = sum(Data$Manhattan[Data$Day == "Tuesday"])/sum(Data$Total[Data$Day == "Tuesday"])
Wed_Rates_Avg = sum(Data$Manhattan[Data$Day == "Wednesday"])/sum(Data$Total[Data$Day == "Wednesday"])
Thur_Rates_Avg = sum(Data$Manhattan[Data$Day == "Thursday"])/sum(Data$Total[Data$Day == "Thursday"])
Fri_Rates_Avg = sum(Data$Manhattan[Data$Day == "Friday"])/sum(Data$Total[Data$Day == "Friday"])
Sat_Rates_Avg = sum(Data$Manhattan[Data$Day == "Saturday"])/sum(Data$Total[Data$Day == "Saturday"])
Day_Rates_Averages = c(AllDays_Rates_Avg, Sun_Rates_Avg, Mon_Rates_Avg, Tues_Rates_Avg, Wed_Rates_Avg, Thur_Rates_Avg, Fri_Rates_Avg, Sat_Rates_Avg)
Rate_Averages = round(Day_Rates_Averages, digits = 4)
Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")
Counts_Difference = c(
0, # Difference between the average count of all days and itself
round(mean(Data$Manhattan[Data$Day == "Sunday"])) - round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Monday"])) - round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Tuesday"])) - round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Wednesday"])) - round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Thursday"])) - round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Friday"])) - round(mean(Data$Manhattan)),
round(mean(Data$Manhattan[Data$Day == "Saturday"])) - round(mean(Data$Manhattan))
)
Rates_DifferenceB = c(
0,
Sun_Rates_Avg - AllDays_Rates_Avg,
Mon_Rates_Avg - AllDays_Rates_Avg,
Tues_Rates_Avg - AllDays_Rates_Avg,
Wed_Rates_Avg - AllDays_Rates_Avg,
Thur_Rates_Avg - AllDays_Rates_Avg,
Fri_Rates_Avg - AllDays_Rates_Avg,
Sat_Rates_Avg - AllDays_Rates_Avg
)
Rates_Difference = round(Rates_DifferenceB, digits = 4)
Table = cbind(Days, Count_Averages, Counts_Difference, Rate_Averages, Rates_Difference)
kable(Table, caption = "Distribution of Manhattan Bridge Cyclist Count and Rates July 2017") %>%
kable_styling(
bootstrap_options = c("striped", "bordered"),
full_width = FALSE,
position = "center"
)
Distribution of Manhattan Bridge Cyclist Count and Rates July 2017
| Days | Count_Averages | Counts_Difference | Rate_Averages | Rates_Difference |
|---|---|---|---|---|
| All Days | 5425 | 0 | 0.2885 | 0 |
| Sunday | 4690 | -735 | 0.2865 | -0.002 |
| Monday | 6001 | 576 | 0.2975 | 0.009 |
| Tuesday | 6363 | 938 | 0.302 | 0.0135 |
| Wednesday | 6938 | 1513 | 0.2949 | 0.0064 |
| Thursday | 5999 | 574 | 0.2868 | -0.0017 |
| Friday | 4338 | -1087 | 0.2775 | -0.0109 |
| Saturday | 4031 | -1394 | 0.2665 | -0.022 |
The table adds detail to the implications of our Poisson count and rate
models. Weekday counts of Manhattan Bridge cyclists (specifically
Monday - Thursday) far outweigh the counts from Friday to Sunday: the
average number of cyclists from Monday - Thursday is about 6,325, while
the average from Friday - Sunday is about 4,353.
As for the rate of Manhattan Bridge cyclists relative to cyclists on
all East River Bridges, we see that the Manhattan Bridge’s cyclist rate
is slightly above average Monday - Wednesday, but then below average
Thursday through Sunday.
Conclusion and Takeaways
To conclude, any actions taken in response to these models’ findings
should be taken with caution, given the low practical significance
found in both our count and rate Poisson models. That being said, there
are still valuable takeaways that we can draw from our analysis.
First, the Manhattan Bridge is clearly busier, both in raw volume and
as a proportion of the overall East River Bridge network, early in and
through the middle of the standard workweek than it is during the
weekend. Second, the daily average temperature and the difference
between that day’s high and low played very little if any role in the
count or rate of cyclists on any given day, whereas precipitation does
appear to have a relatively noticeable negative association with the
number of that day’s cyclists on the Manhattan Bridge.
If we were to continue or expand on this analysis in the future, it
would be valuable to extend the scope of our data beyond the month of
July and into months that border on seasonal changes, such as March,
April or October. Intuitively, one might guess that the day’s
temperatures play a much larger role at those times of the year.
---
title: "Association between Weather and Day of Week with Cyclist Traffic on Manhattan Bridge"
author: "Chris Bahm"
date: "2025-10-30"
output:
  html_document:
    toc: true
    toc_float:
      collapsed: true
      smooth_scroll: true
    toc_depth: 4
    fig_width: 6
    fig_height: 4
    fig_caption: true
    number_sections: true
    code_folding: hide
    code_download: true
    theme: lumen
    highlight: tango
  pdf_document:
    toc: true
    toc_depth: 4
    fig_caption: true
    number_sections: true
  word_document:
    toc: true
    toc_depth: 4
---

```{css, echo = FALSE}
div#TOC li {     /* table of content  */
    list-style:upper-roman;
    background-image:none;
    background-repeat:none;
    background-position:0;
}

h1.title {    /* level 1 header of title  */
  font-size: 24px;
  font-weight: bold;
  color: DarkRed;
  text-align: center;
}

h4.author { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkRed;
  text-align: center;
}

h4.date { /* Header 4 - and the author and data headers use this too  */
  font-size: 18px;
  font-weight: bold;
  font-family: "Times New Roman", Times, serif;
  color: DarkBlue;
  text-align: center;
}

h1 { /* Header 1 - and the author and data headers use this too  */
    font-size: 20px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: center;
}

h2 { /* Header 2 - and the author and data headers use this too  */
    font-size: 18px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h3 { /* Header 3 - and the author and data headers use this too  */
    font-size: 16px;
    font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: navy;
    text-align: left;
}

h4 { /* Header 4 - and the author and data headers use this too  */
    font-size: 14px;
  font-weight: bold;
    font-family: "Times New Roman", Times, serif;
    color: darkred;
    text-align: left;
}

/* Add dots after numbered headers */
.header-section-number::after {
  content: ".";
}
```

```{r setup, include=FALSE}
# code chunk specifies whether the R code, warnings, and output 
# will be included in the output files.

if (!require("knitr")) {                      # use conditional statement to detect
   install.packages("knitr")                  # whether a package was installed in
   library(knitr)                             # your machine. If not, install it and
}                                             # load it to the working directory.

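# Install each package only if it is missing, then load it.
if (!require(tidyverse)) {install.packages("tidyverse"); library(tidyverse)}

if (!require(GGally)) {install.packages("GGally"); library(GGally)}

if (!require(kableExtra)) {install.packages("kableExtra"); library(kableExtra)}

if (!require(ggplot2)) {install.packages("ggplot2"); library(ggplot2)}

if (!require(car)) {install.packages("car"); library(car)}

if (!require(dplyr)) {install.packages("dplyr"); library(dplyr)}

if (!require(pander)) {install.packages("pander"); library(pander)}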

if (!require("scales")) {
install.packages("scales")                                        
library("scales") 
}

knitr::opts_chunk$set(
	echo = TRUE,
	message = FALSE,
	warning = FALSE,
	comment = NA,
	results = TRUE
)
   
```

# Introduction and Background
Here we have a dataset sourced from New York City's Traffic Information Management System (TIMS). TIMS recorded the number of cyclists entering and leaving three of New York City's five boroughs - Queens, Manhattan and Brooklyn - via a collection of bridges known as the East River Bridges (Brooklyn Bridge, Manhattan Bridge, Williamsburg Bridge, and Queensboro Bridge). These recordings took place in 2017. April, July and October are the three months that are present in our available copy of the data.

For this analysis we look at a randomly selected subset of the larger dataset (the subset was chosen using R's runif function). It covers cyclists who entered and left our three boroughs of interest - Queens, Manhattan and Brooklyn - via the **Manhattan** Bridge throughout the entire month of July 2017. The subset has 31 observations, one for each day, and no missing values. A breakdown of the original dataset's variables, their practical meanings and their data types is below.

```{r Variable Table, echo=FALSE}
library(knitr)

Var_Table = data.frame(
  Name = c("Date",
           "Day", 
           "HighTemp", 
           "LowTemp",
           "Precipitation", 
           "Manhattan", 
           "Total"),
  
  Meaning = c("Date for that observation; YYYY-MM-DD form", 
              "Day of the week for that observation",
              "That day's highest recorded temperature", 
              "That day's lowest recorded temperature", 
              "Measure of rain that day (inches)",  
              "Number of cyclists entering/leaving Queens, Manhattan or Brooklyn via the MANHATTAN Bridge",
              "Total number of cyclists entering/leaving Queens, Manhattan or Brooklyn via ANY of the East River Bridges"), 
              
  
  Data_Type = c("Date", "character", "double", "double", "double", "double", "double"))

kable(Var_Table) %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center")

```
## Objective of Analysis
With the available data, my goal for this analysis is to examine the association between weather conditions and day of the week with the amount of cyclist traffic that the Manhattan Bridge experiences. In order to do this, I created two new variables - MeanTemp and TempDiff - which were calculated by averaging that particular day's low and high temperatures and finding the difference between those temperatures respectively. 

Using these temperature-related metrics, along with measures of precipitation and records of the day of the week, I will use Poisson regression techniques to see which if any of these factors play a particular role in the overall amount *or* the relative rate of cyclist traffic that the Manhattan Bridge experiences.

```{r Data Loading and Cleaning, include=FALSE}
library (openxlsx)
options(scipen = 999)

# round(runif(1, min = 1, max = 10))
  # Used line above to randomly select which subset of the data to do my analysis on. The fifth tab on the original data Excel spreadsheet was for observations on the Manhattan Bridge from 7/1 to 7/31

Data = read.xlsx("https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/NYC_Cyclists_Data.xlsx", sheet = "Manhattan 2")

glimpse(Data)

# Converting variable Date to a date object.
# Origin = 1899-12-30 since Excel stores data values as the number of days since then.
Data$Date = as.Date(Data$Date, origin = "1899-12-30")

# Originally converting variable Day to a date object, then using the weekdays function on it.
Data$Day = as.Date(Data$Day, origin = "1899-12-30")
  Data$Day = weekdays(as.Date(Data$Day))

  # Creation of Day_Num variable
Data$Day_Num[Data$Day == "Sunday"] = 1
Data$Day_Num[Data$Day == "Monday"] = 2
Data$Day_Num[Data$Day == "Tuesday"] = 3
Data$Day_Num[Data$Day == "Wednesday"] = 4
Data$Day_Num[Data$Day == "Thursday"] = 5
Data$Day_Num[Data$Day == "Friday"] = 6
Data$Day_Num[Data$Day == "Saturday"] = 7

  Data$MeanTemp = (Data$HighTemp + Data$LowTemp)/2
  Data$TempDiff = Data$HighTemp - Data$LowTemp
  Data$Day = factor(Data$Day,
                  levels = c("Sunday", "Monday", "Tuesday", "Wednesday", 
                             "Thursday", "Friday", "Saturday"))
            # Since Sunday is the first level of the factor listed here, it will be recognized as the baseline by R


# Reorder columns for ideal visual perception
Data = data.frame(Data$Date, Data$Day, Data$Day_Num, Data$HighTemp, Data$LowTemp, Data$MeanTemp, Data$TempDiff, Data$Precipitation, Data$Manhattan, Data$Total)
colnames(Data) = c("Date", "Day", "Day_Num", "HighTemp", "LowTemp", "MeanTemp","TempDiff", "Precipitation", "Manhattan", "Total")

glimpse(Data)
```
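
The hidden chunk above relies on the fact that Excel stores dates as day counts, which `as.Date()` can decode when given the origin `1899-12-30`. A minimal sketch of that conversion follows; the serial number `42917` is a hypothetical input chosen for illustration, not a value read from the spreadsheet.

```r
# Hypothetical Excel serial number; not taken from the actual workbook.
excel_serial <- 42917

# Excel's 1900 date system counts days from 1899-12-30, so that date is the origin.
converted <- as.Date(excel_serial, origin = "1899-12-30")
print(format(converted))   # 2017-07-01

# weekdays() returns the day name in the session's locale; the Day column
# in this report is built from that name.
day_name <- weekdays(converted)
print(day_name)
```

Because `weekdays()` is locale dependent, the later `Data$Day == "Sunday"` style comparisons assume an English locale.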

# Poisson Regression Modeling
To explore any potential associations, I created Poisson models of two different regression types, one being for counts and one being for rates.

Poisson counts regression examines the total number of occurrences of a particular event (in this case cyclists on the Manhattan Bridge) and uses a logarithmic function to determine which, if any of the explanatory variables have a significant effect on said response variable's mean. The formula for said regression is below:

```{r, echo=FALSE}
include_graphics("Poisson_Model_Form.png")
```

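- $\beta$~0~ = the log of our response variable's mean when every predictor is at its baseline (numeric predictors equal to zero and Day at its reference level); not very useful for practical interpretation 

- $\beta$~1~, $\beta$~2~, $\beta$~3~, ... $\beta$~p~ = the change in our response variable's log mean associated with a one-unit increase in the corresponding predictor variable, holding the other predictors fixed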

 <br> 

Additionally, Poisson rates regression models the expected rate of a particular event's occurrence relative to a larger "population." In this dataset and analysis, our variable Total, which represents the **total** number of cyclists on all the East River Bridges, serves as that population, so the number of cyclists on the Manhattan Bridge is treated as a proportion of Total. The calculation for this type of Poisson regression is similar to counts regression, but the logarithm of the population variable enters the model as an offset, that is, a term whose coefficient is fixed at 1. This can be expressed in either of the following two ways.
<br>
```{r, echo=FALSE}
include_graphics("Poisson_Form_Rates1.png")
```
 <br> 
```{r, echo=FALSE}
include_graphics("Poisson_Model_Form_Rates.png")
```
 <br> 
 
 - In Poisson rates regression, the parameters $\beta$~0~, .... $\beta$~p~ are interpreted in the same manner as in the Poisson counts model, except that they describe the log of the rate rather than the log of the mean count.
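
To make the offset idea concrete, here is a minimal sketch on simulated data. The data frame `toy`, its columns `total`, `rain` and `count`, and the "true" coefficients are all made up for illustration; only `glm()`, `offset` and `poisson()` mirror what this report uses. It shows that passing the offset as an argument and writing it inside the formula specify the same model.

```r
# Simulated toy data; the variable names and true coefficients are hypothetical.
set.seed(321)
n <- 200
toy <- data.frame(
  total = sample(10000:25000, n, replace = TRUE),  # plays the role of Total
  rain  = runif(n, 0, 1)                           # plays the role of Precipitation
)
# True share of traffic is exp(-1.2 - 0.05 * rain), applied to the exposure `total`.
toy$count <- rpois(n, lambda = toy$total * exp(-1.2 - 0.05 * toy$rain))

# Offset supplied through the `offset` argument (the style used in this report).
fit_arg <- glm(count ~ rain, offset = log(total),
               family = poisson(link = "log"), data = toy)

# Offset written inside the formula; this specifies the same model.
fit_formula <- glm(count ~ rain + offset(log(total)),
                   family = poisson(link = "log"), data = toy)

# Both fits give the same coefficients; exp(intercept) estimates the
# baseline share of `total` when rain is zero.
print(coef(fit_arg))
print(coef(fit_formula))
print(exp(coef(fit_arg)[["(Intercept)"]]))
```

In both fits the coefficient on `log(total)` is fixed at 1 rather than estimated, which is what turns a model for counts into a model for rates.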

## Poisson Regression (Counts)
Below is a summary of the Poisson counts regression model I created, with measures of temperature range and averages, precipitation amount and day of the week all functioning as predictors of how many cyclists crossed the Manhattan Bridge in or out of our three boroughs of interest.
```{r}
# Counts Model:
  # Response = Manhattan
  # Predictors = Day, MeanTemp, TempDiff, Precipitation
    # Day is stored as a Factor

Counts_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, family = poisson(link = "log"), data = Data)

Counts_Model_Sum = summary(Counts_Model)
Counts_Model_Coef = Counts_Model_Sum$coefficients

invisible(Counts_Model_Coef)
kable(Counts_Model_Coef, caption = "Poisson Counts Regression: Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")

# All predictor variables are significant
```
In the model, we can see that *every* predictor variable is statistically significant, with p-values well below the conventional threshold of 0.05, so I did not apply stepwise regression or any other model simplification. 

As for the practical implications of our model summary, we can say that although every predictor variable is statistically significant, the magnitudes of their estimated effects are relatively small. Precipitation's estimated negative effect on the log mean of Manhattan Bridge cyclists has an absolute value of about 0.4307 per inch of rain, which is the highest of all our predictors.
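
One way to read these log-scale coefficients is to exponentiate them, which gives the multiplicative change in the expected count. The sketch below does this for a few coefficients copied from the counts-model table; the vector name `counts_coef` and the derived objects are introduced here for illustration only.

```r
# Coefficients copied from the counts-model table (log scale).
counts_coef <- c(
  DayWednesday  =  0.4023102,
  DaySaturday   = -0.0859127,
  MeanTemp      = -0.0029260,
  TempDiff      =  0.0143243,
  Precipitation = -0.4307477
)

# exp(coefficient) is the multiplicative change in the expected count for a
# one-unit increase (or, for a Day level, relative to the Sunday baseline).
rate_ratio <- exp(counts_coef)

# The same quantity expressed as a percentage change.
pct_change <- 100 * (rate_ratio - 1)
print(round(cbind(rate_ratio, pct_change), 3))
```

Under this reading, the precipitation coefficient corresponds to a factor of roughly 0.65 per additional inch of rain, and the Wednesday coefficient to a factor of roughly 1.50 relative to Sunday, holding the other predictors fixed.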

It appears that the day's average temperature and the difference between the daily high and low had very little practical effect on the log mean of that day's cyclists. When we look at the difference in log means from a day-of-the-week perspective, we do see a somewhat larger effect. With Sunday coded as the baseline, Wednesday has the greatest estimated cyclist traffic and Saturday the least. The higher count of cyclists during the workweek could be due to the Manhattan Bridge functioning for many as a commuting route.

All in all, our Poisson counts model yields some interesting and statistically significant findings, most notably that cyclist counts are far more sensitive to precipitation than to temperature, and that cyclist traffic appears to rise through midweek, peaking on Wednesday, before tapering off toward the weekend. However, the relatively small magnitude of each variable's estimated effect limits the model's practical utility. 

## Poisson Regression (Rates)
After fitting the Poisson counts regression, I performed Poisson rates regression, using the total number of cyclists entering and exiting our three boroughs of interest across *all* the East River Bridges as the "population" of which the Manhattan Bridge cyclists form a part. 

This process involved two Poisson rates models. In the first, both temperature variables were statistically insignificant. Given that result, and their minute practical significance in the previous counts model, I removed them and fit a second Poisson rates model that did not include the day's average temperature or temperature range.
```{r}
### Rates Model 1
Rates_Model = glm(Manhattan ~ Day + MeanTemp + TempDiff + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model_Sum = summary(Rates_Model)
Rates_Model_Coef = Rates_Model_Sum$coefficients

invisible(Rates_Model_Coef)
kable(Rates_Model_Coef, caption = "Poisson Rates Regression (1): Weather and Schedule Relationship with Count of Manhattan Bridge Cyclists")


### Rates Model 2
Rates_Model2 = glm(Manhattan ~ Day + Precipitation, offset = log(Total), family = poisson(link = "log"), data = Data)

Rates_Model2_Sum = summary(Rates_Model2)
Rates_Model2_Coef = Rates_Model2_Sum$coefficients

invisible(Rates_Model2_Coef)
kable(Rates_Model2_Coef, caption = "Poisson Rates Regression (2): Precipitation and Schedule Relationship with Count of Manhattan Bridge Cyclists")

```
Looking at the findings of our second Poisson rates regression model, we see a pattern similar to that of our Poisson counts regression model: most coefficients are statistically significant, but their magnitudes suggest little practical significance.

Once again treating Sunday as our baseline, the rate of Manhattan Bridge cyclists as a proportion of all East River Bridge cyclists is at its highest early in the week and declines going into the weekend. That said, the Thursday coefficient (p ≈ 0.94) and, to a lesser extent, the Friday coefficient (p ≈ 0.054) are not statistically significant at the 0.05 level, so the Manhattan Bridge's rate on those days cannot be distinguished from Sunday's. Any difference between those days and Sunday could therefore be due to random chance rather than a particular characteristic of the Bridge that affects its cyclists only on those days.
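
Because the rates model carries `log(Total)` as an offset, exponentiating its linear predictor gives the expected share of East River Bridge cyclists who use the Manhattan Bridge. The sketch below applies this to a few coefficients copied from the second rates-model table; the object names are introduced here for illustration, and the shares assume a day with zero precipitation.

```r
# Coefficients copied from the second rates-model table (log scale).
rates_coef <- c(
  Intercept     = -1.2497005,
  DayTuesday    =  0.0522471,
  DaySaturday   = -0.0684652,
  Precipitation = -0.0288171
)

# With offset log(Total), exp(linear predictor) is the expected share of
# East River Bridge cyclists on the Manhattan Bridge (here: no rain).
share_sunday   <- exp(rates_coef[["Intercept"]])   # baseline day
share_tuesday  <- exp(rates_coef[["Intercept"]] + rates_coef[["DayTuesday"]])
share_saturday <- exp(rates_coef[["Intercept"]] + rates_coef[["DaySaturday"]])

print(round(c(Sunday = share_sunday,
              Tuesday = share_tuesday,
              Saturday = share_saturday), 4))
```

These implied dry-day shares (about 0.287, 0.302 and 0.268) sit close to the raw per-day rates reported in the Day-By-Day Averages section.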


## Day-By-Day Averages
Since both our counts and rates models suggested that the day of the week has the greatest association with the log mean of the Manhattan Bridge's cyclists, I decided to calculate the average counts and rates per day to compare them to each other and the mean across all days considered. The table with this information is below.
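
The chunk that follows computes each per-day figure with one explicit line per day. As a hedged alternative, the same kind of summary can be sketched with `tapply()`; the `toy` data frame below is hypothetical and only mimics the `Day`, `Manhattan` and `Total` columns.

```r
# Hypothetical toy data with the same column names as the report's Data.
toy <- data.frame(
  Day       = factor(c("Sunday", "Monday", "Sunday", "Monday"),
                     levels = c("Sunday", "Monday")),
  Manhattan = c(4000, 6000, 5000, 6200),
  Total     = c(15000, 20000, 17000, 21000)
)

# Average Manhattan Bridge count per day of the week.
count_avg <- tapply(toy$Manhattan, toy$Day, mean)

# Pooled rate per day: summed Manhattan counts over summed totals,
# the same definition the report uses for its *_Rates_Avg objects.
rate_avg <- tapply(toy$Manhattan, toy$Day, sum) / tapply(toy$Total, toy$Day, sum)

# Differences from the all-days figures.
count_diff <- count_avg - mean(toy$Manhattan)
rate_diff  <- rate_avg - sum(toy$Manhattan) / sum(toy$Total)

print(round(count_avg))
print(round(rate_avg, 4))
```

Grouping by the `Day` factor keeps the results in the factor's level order, so Sunday stays first without listing each day by hand.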
```{r}
Count_Averages = c(
  round(mean(Data$Manhattan)),
  round(mean(Data$Manhattan[Data$Day == "Sunday"])),
  round(mean(Data$Manhattan[Data$Day == "Monday"])),
  round(mean(Data$Manhattan[Data$Day == "Tuesday"])),
  round(mean(Data$Manhattan[Data$Day == "Wednesday"])),
  round(mean(Data$Manhattan[Data$Day == "Thursday"])),
  round(mean(Data$Manhattan[Data$Day == "Friday"])),
  round(mean(Data$Manhattan[Data$Day == "Saturday"]))
)

AllDays_Rates_Avg = sum(Data$Manhattan)/sum(Data$Total)

Sun_Rates_Avg = sum(Data$Manhattan[Data$Day == "Sunday"])/sum(Data$Total[Data$Day == "Sunday"])

Mon_Rates_Avg = sum(Data$Manhattan[Data$Day == "Monday"])/sum(Data$Total[Data$Day == "Monday"])

Tues_Rates_Avg = sum(Data$Manhattan[Data$Day == "Tuesday"])/sum(Data$Total[Data$Day == "Tuesday"])

Wed_Rates_Avg = sum(Data$Manhattan[Data$Day == "Wednesday"])/sum(Data$Total[Data$Day == "Wednesday"])

Thur_Rates_Avg = sum(Data$Manhattan[Data$Day == "Thursday"])/sum(Data$Total[Data$Day == "Thursday"])

Fri_Rates_Avg = sum(Data$Manhattan[Data$Day == "Friday"])/sum(Data$Total[Data$Day == "Friday"])

Sat_Rates_Avg = sum(Data$Manhattan[Data$Day == "Saturday"])/sum(Data$Total[Data$Day == "Saturday"])

Day_Rates_Averages = c(AllDays_Rates_Avg, Sun_Rates_Avg, Mon_Rates_Avg, Tues_Rates_Avg, Wed_Rates_Avg, Thur_Rates_Avg, Fri_Rates_Avg, Sat_Rates_Avg)

Rate_Averages = round(Day_Rates_Averages, digits = 4)

Days = c("All Days", "Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

# Difference between each day's rounded average count and the rounded all-days average
# (the first element is "All Days", so its difference is 0)
Counts_Difference = Count_Averages - Count_Averages[1]

# Difference between each day's rate and the all-days rate, computed before rounding
Rates_DifferenceB = Day_Rates_Averages - AllDays_Rates_Avg

Rates_Difference = round(Rates_DifferenceB, digits = 4)

Table = cbind(Days, Count_Averages, Counts_Difference, Rate_Averages, Rates_Difference)

kable(Table, caption = "Distribution of Manhattan Bridge Cyclist Count and Rates July 2017") %>%
  kable_styling(
    bootstrap_options = c("striped", "bordered"),
    full_width = FALSE,
    position = "center"
  )
```

The table adds detail to the implications of our Poisson count and rate models: Manhattan Bridge cyclist counts on weekdays (specifically Monday - Thursday) far exceed the counts from Friday to Sunday. The average number of cyclists from Monday - Thursday is about 6,325, while the average from Friday - Sunday is about 4,353.

As for the rate of Manhattan Bridge cyclists relative to cyclists on all East River Bridges, we see that the Manhattan Bridge's cyclist rate is slightly above average Monday - Wednesday, but then below average Thursday through Sunday.
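
For readers who want a more compact way to build this kind of summary, the sketch below computes per-day count averages and pooled rates with `tapply` and collects them in a data frame. The `toy_days` data and its values are invented for illustration and are not the July 2017 observations; the column names simply mirror the ones used in this report.

```{r}
# Hypothetical toy data (invented values, NOT the TIMS data)
toy_days <- data.frame(
  Day = c("Sunday", "Monday", "Friday", "Sunday", "Monday", "Friday"),
  Manhattan = c(4000, 6000, 5000, 4200, 6400, 4800),
  Total = c(16000, 22000, 20000, 17000, 23000, 19000)
)

# Mean daily count for each day of the week
count_by_day <- tapply(toy_days$Manhattan, toy_days$Day, mean)

# Pooled rate per day: sum of bridge counts over sum of network totals
rate_by_day <- tapply(toy_days$Manhattan, toy_days$Day, sum) /
  tapply(toy_days$Total, toy_days$Day, sum)

# Overall benchmarks across all days
overall_count <- mean(toy_days$Manhattan)
overall_rate <- sum(toy_days$Manhattan) / sum(toy_days$Total)

# A data frame keeps the numeric columns numeric
toy_summary <- data.frame(
  Day = names(count_by_day),
  Count_Average = round(as.numeric(count_by_day)),
  Count_Difference = round(as.numeric(count_by_day)) - round(overall_count),
  Rate_Average = round(as.numeric(rate_by_day), 4),
  Rate_Difference = round(as.numeric(rate_by_day) - overall_rate, 4)
)
toy_summary
```

One design note: `cbind` on a character vector and numeric vectors yields a character matrix, whereas a data frame preserves the numeric types, which helps if the summary is sorted or plotted later.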


# Conclusion and Takeaways
To conclude, any action taken in response to these models' findings should be taken with caution, given the low practical significance found in both our count and rate Poisson models. That being said, there are still valuable takeaways that we can draw from our analysis.

First, the Manhattan Bridge is clearly busier during the standard workweek than on the weekend: raw volume is highest Monday - Thursday, and the bridge's share of overall East River Bridge traffic is above average Monday - Wednesday. Second, the daily average temperature and the difference between that day's high and low played very little, if any, role in the count or rate of cyclists on any given day, whereas precipitation does appear to have a relatively noticeable negative association with the number of cyclists on the Manhattan Bridge that day.

If we were to continue or expand on this analysis in the future, it would be valuable to extend the scope of our data beyond July and into months that border on seasonal changes, such as March, April or October. Intuitively, one might guess that the day's temperatures play a much larger role at those times of the year.

# References:

Original Dataset Source:

- https://pengdsci.github.io/STA321/ww09/w09-AssignDataSet.xlsx

Dataset Download Links via Github:

- https://raw.githubusercontent.com/ChrisB2323/STA321/refs/heads/main/NYC_Cyclists_Data.xlsx