1 Description of the Data Set

Daily totals of bike counts on the Manhattan Bridge are reported monthly. The counts keep track of cyclists entering and leaving Manhattan via the East River Bridges, and the Traffic Information Management System (TIMS) collects the count data. Each record gives the total number of cyclists crossing the Manhattan Bridge in a 24-hour period. The sample consists of 30 days, from April 1 to April 30. The variables are:

  1. Date

  2. Day (of the week)

  3. HighTemp

  4. LowTemp

  5. Precipitation

  6. ManhattanBridge (number of cyclists)

  7. total (total number of people crossing the bridge)

2 Research Question

Using the variables given, is there a model that can predict the number, or the proportion, of cyclists crossing the Manhattan Bridge?

The response variable is the daily count of cyclists entering and leaving the city, or that count as a proportion of the total, with Day, HighTemp, LowTemp, and Precipitation as the predictor variables.

3 Models

Poisson regression models will be fit to both the counts and the proportions.

The basic assumptions of Poisson regression are

  1. Poisson Response: The response variable is a count per unit of time

  2. Independence: The observations must be independent of one another.

  3. Mean is equal to variance: By definition, the mean of a Poisson random variable must be equal to its variance.

  4. Linearity: The log of the mean rate, \(\log(\lambda)\), must be a linear function of the predictors \(x\).
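Written out for the count model in Section 3.1, the assumptions above give the log-linear model below. Here \(Y_i\) is the ManhattanBridge count on day \(i\), and \(\text{Day}_{d,i}\) is an indicator equal to 1 if day \(i\) falls on weekday \(d\) and 0 otherwise, with Friday as the reference level:

\[
Y_i \sim \text{Poisson}(\lambda_i), \qquad E(Y_i) = \operatorname{Var}(Y_i) = \lambda_i,
\]
\[
\log(\lambda_i) = \beta_0 + \sum_{d \neq \text{Friday}} \beta_d\,\text{Day}_{d,i} + \beta_H\,\text{HighTemp}_i + \beta_L\,\text{LowTemp}_i + \beta_P\,\text{Precipitation}_i .
\]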

3.1 Poisson Regression Model on the Counts

A Poisson regression model on the counts of cyclists crossing the Manhattan Bridge, using the predictor variables Day, HighTemp, LowTemp, and Precipitation, is fit below.

The Poisson regression model for the counts of cyclists on the bridge versus the day of the week, high temperature, low temperature, and precipitation.
Term             Estimate  Std. Error     z value   Pr(>|z|)
(Intercept)     7.4842771   0.0236149  316.930067  0.0000000
DayMonday       0.0985625   0.0104808    9.404096  0.0000000
DaySaturday    -0.3552100   0.0110159  -32.245078  0.0000000
DaySunday      -0.2367027   0.0102777  -23.030627  0.0000000
DayThursday     0.0016272   0.0110615    0.147108  0.8830468
DayTuesday      0.2002094   0.0109684   18.253325  0.0000000
DayWednesday    0.0392152   0.0107568    3.645636  0.0002667
HighTemp        0.0197124   0.0006288   31.348897  0.0000000
LowTemp        -0.0046676   0.0008375   -5.573330  0.0000000
Precipitation  -1.1345660   0.0183601  -61.795096  0.0000000

The only term that was not statistically significant was DayThursday (p = 0.883). Because the other Day levels are significant, the Day variable as a whole will still be included in the model. All the other variables are highly statistically significant.
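The z value and Pr(>|z|) columns are Wald statistics: each estimate is divided by its standard error and compared with a standard normal distribution. The minimal sketch below, using only the Python standard library, recomputes them from two rows of the count-model table; the function name `wald_test` is hypothetical.

```python
import math


def wald_test(estimate: float, std_error: float) -> tuple[float, float]:
    """Return the Wald z statistic and its two-sided standard-normal p-value."""
    z = estimate / std_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Estimates and standard errors copied from the count-model table.
for term, est, se in [
    ("DayThursday", 0.0016272, 0.0110615),
    ("DayWednesday", 0.0392152, 0.0107568),
]:
    z, p = wald_test(est, se)
    print(f"{term}: z = {z:.4f}, p = {p:.6f}")
```

The recomputed values match the table to rounding, which confirms that DayThursday's large p-value comes from an estimate that is tiny relative to its standard error.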

In this model, holding the other variables fixed, the expected count of cyclists is lower on Saturday and Sunday than on the baseline day (Friday) and higher on Monday, Tuesday, and Wednesday. A higher HighTemp increases the expected count, more precipitation decreases it, and LowTemp has a small negative coefficient. This means that there is more cyclist traffic on the bridge on warm, dry weekdays and less cyclist traffic on cold, wet weekends.
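Because the model is on the log scale, \(e^{\beta}\) is the multiplicative change in the expected count for a one-unit increase in a predictor, or, for a Day term, relative to Friday. The sketch below, with a hypothetical helper `rate_ratio`, converts several coefficients from the count-model table. The source does not state the units of HighTemp, LowTemp, or Precipitation, so "one unit" means one unit on whatever scale the data use.

```python
import math

# Coefficients from the Poisson count model; Friday is the baseline day.
count_model = {
    "DayMonday": 0.0985625,
    "DaySaturday": -0.3552100,
    "DaySunday": -0.2367027,
    "DayTuesday": 0.2002094,
    "HighTemp": 0.0197124,
    "LowTemp": -0.0046676,
    "Precipitation": -1.1345660,
}


def rate_ratio(coef: float) -> float:
    """Multiplicative change in the expected count for a one-unit increase."""
    return math.exp(coef)


for term, coef in count_model.items():
    ratio = rate_ratio(coef)
    pct = (ratio - 1) * 100  # percent change in the expected count
    print(f"{term}: rate ratio {ratio:.3f} ({pct:+.1f}%)")
```

For example, \(e^{-0.3552} \approx 0.701\), so Saturday counts are expected to be about 30% lower than Friday counts under the same weather, and each extra unit of precipitation multiplies the expected count by about 0.32.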

3.2 Poisson Regression Model of Proportions

Next, another Poisson regression model is fit, this time to the proportion of cyclists relative to the total number of people crossing the bridge on each day in April.
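The source does not say how the proportion enters the model. One standard way to fit a Poisson model to a proportion, assumed here, is to model the daily count \(\text{ManhattanBridge}_i\) with \(\log(\text{total}_i)\) as an offset, a term whose coefficient is fixed at 1:

\[
\log\!\left(\frac{\lambda_i}{\text{total}_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
\quad\Longleftrightarrow\quad
\log(\lambda_i) = \log(\text{total}_i) + \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip},
\]

where \(\lambda_i = E(\text{ManhattanBridge}_i)\). Under this form each coefficient describes a change in the log of the expected proportion, and \(e^{\beta_0}\) is the baseline proportion.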

Poisson regression on the rate of cyclists crossing the bridge.
Term             Estimate  Std. Error     z value   Pr(>|z|)
(Intercept)    -1.2633758   0.0239141 -52.8297757  0.0000000
DayMonday       0.0439902   0.0104710   4.2011562  0.0000266
DaySaturday    -0.0456126   0.0109416  -4.1687367  0.0000306
DaySunday       0.0032549   0.0102893   0.3163426  0.7517425
DayThursday    -0.0126757   0.0110537  -1.1467342  0.2514915
DayTuesday      0.0152086   0.0109994   1.3826720  0.1667655
DayWednesday    0.0117739   0.0107482   1.0954316  0.2733276
HighTemp        0.0018419   0.0006280   2.9329459  0.0033576
LowTemp        -0.0019199   0.0008391  -2.2880910  0.0221322
Precipitation  -0.0387606   0.0177886  -2.1789584  0.0293348

The p-values in this model are generally larger than in the count model, so its effects are less strongly significant. The Day terms that are not significant at the 5% level are Sunday, Tuesday, Wednesday, and Thursday. All of the other terms, including HighTemp, LowTemp, and Precipitation, are statistically significant.

In this model, relative to Friday, the rate of cyclists crossing the bridge is lower on Thursday and Saturday and higher on Sunday, Monday, Tuesday, and Wednesday, although only the Monday and Saturday differences are statistically significant. A higher HighTemp increases the rate, while more precipitation and a higher LowTemp decrease it. This means that cyclists make up a larger share of bridge traffic on warm, dry Sundays, Mondays, Tuesdays, and Wednesdays, and a smaller share on cold, wet Thursdays and Saturdays.
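The table compares each day only with Friday. Because the weather terms are identical on both sides of a comparison, the ratio of expected proportions for two other days is \(e^{\beta_a - \beta_b}\). The sketch below, with a hypothetical `relative_rate` helper, applies this to Monday versus Saturday using the rate-model table.

```python
import math

# Day coefficients from the Poisson rate (proportion) model; Friday is the baseline (0).
day_coef = {
    "Friday": 0.0,
    "Monday": 0.0439902,
    "Saturday": -0.0456126,
    "Sunday": 0.0032549,
    "Thursday": -0.0126757,
    "Tuesday": 0.0152086,
    "Wednesday": 0.0117739,
}


def relative_rate(day_a: str, day_b: str) -> float:
    """Ratio of the expected cyclist proportion on day_a to that on day_b, weather held fixed."""
    return math.exp(day_coef[day_a] - day_coef[day_b])


print(f"Monday vs Saturday: {relative_rate('Monday', 'Saturday'):.3f}")
```

Here \(e^{0.0440 + 0.0456} = e^{0.0896} \approx 1.094\), so the expected share of cyclists is about 9% higher on Monday than on Saturday. This is only a point estimate: its standard error would need the covariance between the two coefficients, which the table does not report.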

4 Quasi-Poisson Model

Next, a Quasi-Poisson Model will be fit, with two of the predictor variables modified. First, a new variable called AvgTemp is created as the average of the high and low temperatures.

Second, the variable Precipitation is discretized into NewPrecip: if Precipitation = 0, then NewPrecip = 0, and if Precipitation > 0, then NewPrecip = 1.
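The two transformed predictors can be sketched in Python as below. The `DayRecord` type and the `avg_temp` and `new_precip` helpers are hypothetical names for illustration; the source does not show how the variables were computed.

```python
from dataclasses import dataclass


@dataclass
class DayRecord:
    """Hypothetical record holding one day's weather variables."""
    high_temp: float
    low_temp: float
    precipitation: float


def avg_temp(rec: DayRecord) -> float:
    """AvgTemp: the mean of the daily high and low temperatures."""
    return (rec.high_temp + rec.low_temp) / 2


def new_precip(rec: DayRecord) -> int:
    """NewPrecip: 0 when there was no precipitation, 1 when there was any."""
    return 1 if rec.precipitation > 0 else 0


# Hypothetical dry day with a high of 60 and a low of 45.
example = DayRecord(high_temp=60.0, low_temp=45.0, precipitation=0.0)
print(avg_temp(example), new_precip(example))
```

For this hypothetical day, AvgTemp = 52.5 and NewPrecip = 0. Note that the dichotomy discards how much it rained, so a trace of rain and a downpour are treated the same.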

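Thus the Quasi-Poisson Regression Model will have three predictor variables: Day, AvgTemp, and NewPrecip. As in Section 3.2, the response is the rate (proportion) of cyclists crossing the Manhattan Bridge on a given day in April; the intercept of about -1.28 is on the log-rate scale of that model, not the log-count scale of the model in Section 3.1.

Quasi-Poisson regression on the rate of cyclists crossing the Manhattan Bridge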
Term             Estimate  Std. Error     t value   Pr(>|t|)
(Intercept)    -1.2841788   0.0259352 -49.5149736  0.0000000
DayMonday       0.0491628   0.0206233   2.3838525  0.0266478
DaySaturday    -0.0431585   0.0220493  -1.9573633  0.0637280
DaySunday       0.0055753   0.0210582   0.2647556  0.7937790
DayThursday    -0.0106473   0.0224219  -0.4748613  0.6397871
DayTuesday      0.0115586   0.0218350   0.5293604  0.6021072
DayWednesday    0.0136944   0.0220048   0.6223377  0.5404191
AvgTemp         0.0052373   0.0023506   2.2280570  0.0369391
NewPrecip       0.0040403   0.0150458   0.2685317  0.7909104

The dispersion index is 3.932.
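The source does not state how the dispersion was estimated. The usual moment estimator, which R's `quasipoisson` family uses, divides the Pearson chi-square statistic by the residual degrees of freedom. Quasi-Poisson keeps the Poisson point estimates and multiplies their standard errors by \(\sqrt{\hat\phi}\). A minimal sketch with hypothetical helper names:

```python
import math


def pearson_dispersion(observed: list[float], fitted: list[float], n_params: int) -> float:
    """Estimate the quasi-Poisson dispersion: Pearson chi-square / residual df."""
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    df_resid = len(observed) - n_params
    return chi2 / df_resid


def quasi_se(poisson_se: float, dispersion: float) -> float:
    """Quasi-Poisson standard error: the Poisson SE scaled by sqrt(dispersion)."""
    return poisson_se * math.sqrt(dispersion)


# With 30 days and 9 coefficients, the residual degrees of freedom are 30 - 9 = 21.
print(f"SE inflation factor: {math.sqrt(3.932):.3f}")
```

Since \(\sqrt{3.932} \approx 1.98\), a dispersion of 3.932 roughly doubles every standard error relative to a Poisson fit with the same predictors, which is why fewer terms remain significant in the quasi-Poisson table.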

5 Final Working Model

The dispersion index is almost 4, far above the value of 1 that the Poisson model assumes, so the data are overdispersed and the Poisson standard errors are too small. For this reason, the Quasi-Poisson Regression model of the rate of cyclists will be used instead of the regular Poisson model. This makes our final model: \(\log(\widehat{\text{Rate}}) = -1.2842 + 0.0492\,\text{DayMonday} - 0.0432\,\text{DaySaturday} + 0.0056\,\text{DaySunday} - 0.0106\,\text{DayThursday} + 0.0116\,\text{DayTuesday} + 0.0137\,\text{DayWednesday} + 0.0052\,\text{AvgTemp} + 0.0040\,\text{NewPrecip}\)

The intercept is the baseline log-rate of cyclists: the value for Friday (the baseline day) with AvgTemp = 0 and no precipitation (NewPrecip = 0).
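Predictions from the final model come from exponentiating the linear predictor. The sketch below hard-codes the coefficients from the quasi-Poisson table; the helper name `predicted_rate` and the example input (a dry day with AvgTemp = 55) are hypothetical.

```python
import math

# Final quasi-Poisson model coefficients (log scale).
# Baseline: Friday, NewPrecip = 0.
INTERCEPT = -1.2841788
DAY = {
    "Friday": 0.0,
    "Monday": 0.0491628,
    "Saturday": -0.0431585,
    "Sunday": 0.0055753,
    "Thursday": -0.0106473,
    "Tuesday": 0.0115586,
    "Wednesday": 0.0136944,
}
AVG_TEMP = 0.0052373
NEW_PRECIP = 0.0040403


def predicted_rate(day: str, avg_temp: float, new_precip: int) -> float:
    """Predicted proportion of cyclists: exp of the linear predictor."""
    eta = INTERCEPT + DAY[day] + AVG_TEMP * avg_temp + NEW_PRECIP * new_precip
    return math.exp(eta)


# Hypothetical input: a dry day with an average temperature of 55.
for day in ("Monday", "Saturday"):
    print(day, round(predicted_rate(day, 55.0, 0), 4))
```

Because every day shares the same temperature and precipitation terms, the Monday-to-Saturday ratio is \(e^{0.0492 + 0.0432} \approx 1.10\) at any AvgTemp.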

6 Visual Comparisons

Here is a visualization of the rate of cyclists entering and exiting Manhattan via the bridge, by day of the week, on days with no precipitation.

7 Discussion and Conclusions

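According to the graph above and the model, the three days with the highest estimated rate of cyclists crossing the bridge are Monday, Tuesday, and Wednesday, and Saturday has the lowest, although among the day effects only Monday differs significantly from Friday at the 5% level. As the average temperature increases, so does the rate of cyclists. A higher average temperature does not by itself imply a larger gap between the high and low temperatures; that pattern is suggested instead by the earlier Poisson rate model, in which HighTemp has a positive coefficient and LowTemp a negative one, so a larger difference between the high and low temperatures is associated with a higher rate of cyclists crossing the bridge.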

The Quasi-Poisson Regression Model is used for prediction because of the high dispersion index. Overall, the rate of people cycling over the Manhattan Bridge is highest on Monday and lowest on Saturday. This means that a person driving across the Manhattan Bridge on a Monday will see a larger share of cyclists than they would on a Saturday.