Daily totals of bike counts are conducted monthly on the Manhattan Bridge to keep count of cyclists entering and leaving Manhattan via the East River bridges. The Traffic Information Management System (TIMS) collects the count data. Each record represents the total number of cyclists per 24 hours at the Manhattan Bridge. The sample consists of 30 daily records, from April 1 to April 30. The variables are:
Date
Day (of the week)
HighTemp
LowTemp
Precipitation
ManhattanBridge (number of cyclists)
total (number of people in total crossing the bridge)
Using the variables given, is there a model that can predict the number or the proportion of cyclists crossing the Manhattan Bridge?
The response variable is the proportion of cyclists entering and leaving the city in a day, with Day, HighTemp, LowTemp, and Precipitation as the predictor variables.
Poisson regression models on the counts and on the proportions will be fit to the data.
The basic assumptions of Poisson regression are:
Poisson Response: The response variable is a count per unit of time.
Independence: The observations must be independent of one another.
Mean is equal to variance: By definition, the mean of a Poisson random variable must be equal to its variance.
Linearity: The log of the mean rate, \(\log(\lambda)\), must be a linear function of \(x\).
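In symbols, these assumptions can be summarized as follows. The notation is introduced here for illustration only: \(y_i\) is the count on day \(i\), \(x_{i1}, \dots, x_{ip}\) are its predictor values, and \(\beta_0, \dots, \beta_p\) are the regression coefficients.

\[
y_i \sim \text{Poisson}(\lambda_i), \qquad \log(\lambda_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \qquad E(y_i) = \operatorname{Var}(y_i) = \lambda_i
\]

The observations \(y_1, \dots, y_n\) are taken to be independent given their predictors.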
A Poisson regression model on the counts of cyclists crossing the Manhattan Bridge, using the predictor variables Day, HighTemp, LowTemp, and Precipitation, is shown below.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | 7.4842771 | 0.0236149 | 316.930067 | 0.0000000 |
| DayMonday | 0.0985625 | 0.0104808 | 9.404096 | 0.0000000 |
| DaySaturday | -0.3552100 | 0.0110159 | -32.245078 | 0.0000000 |
| DaySunday | -0.2367027 | 0.0102777 | -23.030627 | 0.0000000 |
| DayThursday | 0.0016272 | 0.0110615 | 0.147108 | 0.8830468 |
| DayTuesday | 0.2002094 | 0.0109684 | 18.253325 | 0.0000000 |
| DayWednesday | 0.0392152 | 0.0107568 | 3.645636 | 0.0002667 |
| HighTemp | 0.0197124 | 0.0006288 | 31.348897 | 0.0000000 |
| LowTemp | -0.0046676 | 0.0008375 | -5.573330 | 0.0000000 |
| Precipitation | -1.1345660 | 0.0183601 | -61.795096 | 0.0000000 |
The only coefficient that was not statistically significant was DayThursday. Because the other levels of Day were significant, Day is kept in the model as a whole. All the other variables were highly statistically significant.
In this model, relative to the baseline day, the expected count of cyclists is lower on Saturday and Sunday and higher on Monday, Tuesday, and Wednesday. The count increases with the high temperature and decreases sharply with precipitation. The LowTemp coefficient is small and negative, so once HighTemp is accounted for, a higher low temperature is associated with slightly fewer cyclists. This means that there is more cyclist traffic on the bridge during warm, dry weekdays and less cyclist traffic during cold, wet weekends.
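Because a Poisson regression with the usual log link works on the log scale, exponentiating a coefficient gives a multiplicative effect on the expected count. The sketch below does this for the coefficients in the table above. It assumes the log link and takes Friday as the baseline day, and the names `count_coefs` and `rate_ratio` are illustrative, not part of the original analysis.

```python
import math

# Coefficients copied from the count-model table (log scale).
# Assumption: Friday is the baseline level of Day, so it has no row.
count_coefs = {
    "DayMonday": 0.0985625,
    "DaySaturday": -0.3552100,
    "DaySunday": -0.2367027,
    "DayThursday": 0.0016272,
    "DayTuesday": 0.2002094,
    "DayWednesday": 0.0392152,
    "HighTemp": 0.0197124,
    "LowTemp": -0.0046676,
    "Precipitation": -1.1345660,
}


def rate_ratio(coef):
    """Multiplicative change in the expected count for a one-unit increase."""
    return math.exp(coef)


rate_ratios = {name: rate_ratio(b) for name, b in count_coefs.items()}

for name, ratio in rate_ratios.items():
    # Percent change relative to the baseline day, or per one-unit increase
    print(f"{name:14s} ratio={ratio:.3f} change={100 * (ratio - 1):+.1f}%")
```

A ratio below 1 (for example, DaySaturday) means fewer expected cyclists than on the baseline day with the weather held fixed. A ratio above 1 (for example, HighTemp) means more expected cyclists per one-unit increase.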
Next, another Poisson regression model is fit, this time using the proportion of cyclists to the total number of people crossing the bridge on each day in April.
| | Estimate | Std. Error | z value | Pr(>\|z\|) |
|---|---|---|---|---|
| (Intercept) | -1.2633758 | 0.0239141 | -52.8297757 | 0.0000000 |
| DayMonday | 0.0439902 | 0.0104710 | 4.2011562 | 0.0000266 |
| DaySaturday | -0.0456126 | 0.0109416 | -4.1687367 | 0.0000306 |
| DaySunday | 0.0032549 | 0.0102893 | 0.3163426 | 0.7517425 |
| DayThursday | -0.0126757 | 0.0110537 | -1.1467342 | 0.2514915 |
| DayTuesday | 0.0152086 | 0.0109994 | 1.3826720 | 0.1667655 |
| DayWednesday | 0.0117739 | 0.0107482 | 1.0954316 | 0.2733276 |
| HighTemp | 0.0018419 | 0.0006280 | 2.9329459 | 0.0033576 |
| LowTemp | -0.0019199 | 0.0008391 | -2.2880910 | 0.0221322 |
| Precipitation | -0.0387606 | 0.0177886 | -2.1789584 | 0.0293348 |
The coefficients in this model are less statistically significant than those in the count model. The Day coefficients for Sunday, Tuesday, Wednesday, and Thursday are not significant at the 0.05 level. All of the other coefficients are statistically significant.
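The z values and p-values in the table follow from the estimates and standard errors. As a minimal sketch, assuming the standard Wald test with a two-sided normal p-value, the function below reproduces two rows of the proportion-model table from their rounded entries. The name `wald_z_test` is illustrative.

```python
import math


def wald_z_test(estimate, std_error):
    """Return the Wald z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    # erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)) for the standard normal
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Rounded values taken from the HighTemp and DaySunday rows of the table
z_high, p_high = wald_z_test(0.0018419, 0.0006280)
z_sun, p_sun = wald_z_test(0.0032549, 0.0102893)

print(f"HighTemp:  z={z_high:.3f}, p={p_high:.4f}")
print(f"DaySunday: z={z_sun:.3f}, p={p_sun:.4f}")
```

Because the inputs are rounded, the results agree with the table only to a few decimal places.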
In this model, relative to the baseline day, the rate of cyclists crossing the bridge is lower on Thursday and Saturday, and it decreases with precipitation. The rate is higher on Sunday, Monday, Tuesday, and Wednesday, and it increases with the high temperature. This suggests that cyclists make up a larger share of bridge traffic on warm, dry Sundays, Mondays, Tuesdays, and Wednesdays and a smaller share on cold, wet Thursdays and Saturdays, although among the days only the Monday and Saturday effects are statistically significant.
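One common way to fit a Poisson model to a proportion is to model the cyclist count with the log of the total as an offset. This is assumed here, because the source does not show how the proportion model was specified. With \(y_i\) the cyclist count and \(t_i\) the total on day \(i\) (notation introduced for illustration):

\[
\log\big(E(y_i)\big) = \log(t_i) + \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
\quad\Longleftrightarrow\quad
\log\!\left(\frac{E(y_i)}{t_i}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}
\]

Under that assumption, the intercept of \(-1.263\) corresponds to a baseline proportion of \(e^{-1.263} \approx 0.28\) when all predictors are zero.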
Next, a quasi-Poisson model will be fit with two of the predictor variables modified. First, a new variable called AvgTemp will be created as the average of the high and low temperatures.
Second, the variable Precipitation will be discretized: if Precipitation = 0, then NewPrecip = 0, and if Precipitation > 0, then NewPrecip = 1.
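The two derived variables can be sketched as a small helper. The function name `derive_predictors` and the example records are hypothetical and are not taken from the data set.

```python
def derive_predictors(high_temp, low_temp, precipitation):
    """Return (AvgTemp, NewPrecip) for one daily record."""
    # AvgTemp: midpoint of the daily high and low temperatures
    avg_temp = (high_temp + low_temp) / 2
    # NewPrecip: 1 if any precipitation was recorded, otherwise 0
    new_precip = 1 if precipitation > 0 else 0
    return avg_temp, new_precip


# Hypothetical records for illustration only
dry_day = derive_predictors(70.0, 50.0, 0.0)
wet_day = derive_predictors(60.0, 48.0, 0.25)

print(dry_day, wet_day)
```

Discretizing precipitation keeps only whether it rained, so the model can no longer distinguish light rain from heavy rain.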
Thus the quasi-Poisson regression model will have three predictor variables: Day, AvgTemp, and NewPrecip. The response variable will be the count of cyclists crossing the Manhattan Bridge on a given day in April, modeled as a rate relative to the total number of people crossing.
| | Estimate | Std. Error | t value | Pr(>\|t\|) |
|---|---|---|---|---|
| (Intercept) | -1.2841788 | 0.0259352 | -49.5149736 | 0.0000000 |
| DayMonday | 0.0491628 | 0.0206233 | 2.3838525 | 0.0266478 |
| DaySaturday | -0.0431585 | 0.0220493 | -1.9573633 | 0.0637280 |
| DaySunday | 0.0055753 | 0.0210582 | 0.2647556 | 0.7937790 |
| DayThursday | -0.0106473 | 0.0224219 | -0.4748613 | 0.6397871 |
| DayTuesday | 0.0115586 | 0.0218350 | 0.5293604 | 0.6021072 |
| DayWednesday | 0.0136944 | 0.0220048 | 0.6223377 | 0.5404191 |
| AvgTemp | 0.0052373 | 0.0023506 | 2.2280570 | 0.0369391 |
| NewPrecip | 0.0040403 | 0.0150458 | 0.2685317 | 0.7909104 |
The dispersion index is:
| Dispersion |
|---|
| 3.932 |
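A dispersion index of this kind is usually estimated as the Pearson chi-square statistic divided by the residual degrees of freedom. The sketch below assumes that estimator, since the source does not state how the value was computed. The observed and fitted values are hypothetical, and `pearson_dispersion` is an illustrative name.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square statistic divided by the residual degrees of freedom."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    df_resid = len(observed) - n_params
    if df_resid <= 0:
        raise ValueError("need more observations than parameters")
    # Each term is a squared Pearson residual: (y - mu)^2 / mu
    chi_sq = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi_sq / df_resid


# Hypothetical counts and fitted means, for illustration only
toy_dispersion = pearson_dispersion([10, 20, 30], [12.0, 18.0, 30.0], n_params=1)

# A quasi-Poisson fit scales the Poisson standard errors by sqrt(dispersion)
se_inflation = math.sqrt(3.932)  # dispersion value reported in the table

print(f"toy dispersion = {toy_dispersion:.4f}")
print(f"standard error inflation = {se_inflation:.3f}")
```

With 30 days and 9 estimated coefficients, the residual degrees of freedom for this model would be 21. A value near 1 is consistent with the Poisson assumption that the mean equals the variance, while a value well above 1 indicates overdispersion.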
The dispersion index is almost 4, which is high: under the Poisson assumption it should be close to 1. Because of this overdispersion, the quasi-Poisson regression model of the rate of cyclists will be used instead of the regular Poisson model. This makes our final model: \(\log(\text{Rate}) = -1.284 + 0.049 \cdot \text{DayMonday} - 0.043 \cdot \text{DaySaturday} + 0.0056 \cdot \text{DaySunday} - 0.011 \cdot \text{DayThursday} + 0.012 \cdot \text{DayTuesday} + 0.0137 \cdot \text{DayWednesday} + 0.0052 \cdot \text{AvgTemp} + 0.004 \cdot \text{NewPrecip}\)
The intercept represents the baseline log-rate of cyclists (for the baseline day, Friday, with AvgTemp and NewPrecip both equal to 0).
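Assuming the quasi-Poisson model uses the usual log link (the source does not state the link), a predicted rate is the exponential of the linear predictor built from the coefficient table. The sketch below uses the table's coefficients. The input values (an AvgTemp of 60 and no precipitation) are hypothetical, and the function name `predicted_rate` is illustrative.

```python
import math

# Coefficients from the quasi-Poisson table (log scale)
INTERCEPT = -1.2841788
AVG_TEMP_COEF = 0.0052373
NEW_PRECIP_COEF = 0.0040403

# Friday is the baseline day, so its effect is zero
DAY_EFFECTS = {
    "Friday": 0.0,
    "Monday": 0.0491628,
    "Saturday": -0.0431585,
    "Sunday": 0.0055753,
    "Thursday": -0.0106473,
    "Tuesday": 0.0115586,
    "Wednesday": 0.0136944,
}


def predicted_rate(day, avg_temp, new_precip):
    """Predicted rate under the assumed log link: exp(linear predictor)."""
    if day not in DAY_EFFECTS:
        raise ValueError(f"unknown day: {day}")
    eta = (
        INTERCEPT
        + DAY_EFFECTS[day]
        + AVG_TEMP_COEF * avg_temp
        + NEW_PRECIP_COEF * new_precip
    )
    return math.exp(eta)


# Hypothetical inputs: a dry day with an average temperature of 60
monday = predicted_rate("Monday", 60.0, 0)
saturday = predicted_rate("Saturday", 60.0, 0)

print(f"Monday rate   = {monday:.3f}")
print(f"Saturday rate = {saturday:.3f}")
print(f"Monday / Saturday = {monday / saturday:.3f}")
```

The ratio between two days does not depend on the weather inputs, because the shared terms cancel when one linear predictor is subtracted from the other.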
Here is a visualization of the rate of cyclists crossing the bridge by day of the week on days with no precipitation.
According to the graph above and the model, the three days with the highest rate of cyclists crossing the bridge are Monday, Tuesday, and Wednesday, and Saturday has the lowest rate. As the average temperature increases, so does the rate of cyclists. So the higher the average of the high and low temperatures, the higher the rate of cyclists crossing the bridge.
The quasi-Poisson regression model is used for prediction because of the high dispersion value. Overall, the rate of people cycling over the Manhattan Bridge is highest on Monday and lowest on Saturday. This means that a person driving across the Manhattan Bridge on a Monday will see a larger share of people cycling than they would on a Saturday.