Most Important Split & Interpretation of the Tree
The most important split in the regression tree is traffic index < 19 vs traffic index >= 19. Traffic appears to be a major factor in predicting usage ratio. The tree splits on four variables: traffic index, effective capacity, temperature, and whether there is an event.
The left split in the tree represents all data for which the traffic index is < 19. Low congestion therefore signifies a relatively lower usage ratio. The next split on the left side is on whether effective capacity is less than 3300. Usage ratio is the ratio of rentals to effective capacity, so when effective capacity drops one might expect usage ratio to rise. However, effective capacity is reduced to under 3300 on average during the hours of 12am-5am. So when effective capacity is less than 3300, it is the early hours of the day, when bike rentals are close to zero, and the usage ratio is predicted to be 1.5%. The right side of the initial left split separates on temperature. If there is low congestion, it is not the early morning, and the temperature is less than 78 degrees, then the usage ratio is predicted to be 5%. If those same conditions are met but the temperature is greater than 78 degrees, then the predicted usage ratio is 12%.
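The low-traffic branch described above can be written out as a small set of rules. This is only a sketch of the splits as they are described in the text, not the fitted model itself. The function names and argument names are hypothetical, and sending values exactly at a threshold (3300 or 78) to the right-hand side is an assumption.

```python
def usage_ratio_pct(rentals: float, effective_capacity: float) -> float:
    """Usage ratio as a percentage: rentals divided by effective capacity."""
    return 100.0 * rentals / effective_capacity


def predict_low_traffic_pct(effective_capacity: float, temperature_f: float) -> float:
    """Predicted usage ratio (%) for observations with traffic index < 19.

    Thresholds and leaf values are taken from the text. Values exactly at a
    threshold are assumed to fall on the right-hand side of the split.
    """
    if effective_capacity < 3300:
        # Reduced capacity: the early-morning hours with almost no rentals.
        return 1.5
    if temperature_f < 78:
        # Low congestion, not early morning, cooler than 78 degrees.
        return 5.0
    # Low congestion, not early morning, 78 degrees or warmer.
    return 12.0


if __name__ == "__main__":
    print(predict_low_traffic_pct(effective_capacity=3000, temperature_f=85))
    print(predict_low_traffic_pct(effective_capacity=4000, temperature_f=70))
    print(predict_low_traffic_pct(effective_capacity=4000, temperature_f=85))
```

Note that the capacity check comes first, so a warm early-morning hour is still assigned the 1.5% leaf regardless of temperature.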
On the right side of the tree, where the traffic index is greater than or equal to 19, the next split is on whether the temperature is greater than or less than 60F. The left side of this split then splits on a traffic index of 56%, a significantly higher threshold. The tree predicts that if the traffic index is between 19% and 56% and the temperature is less than 60F, then the usage ratio will be 9.4%, which is below the average usage ratio of 11%. On the right side of the split, a traffic index greater than 56% and temperature above 60F indicates a usage ratio of 18%, which is significantly higher than average. This is to be expected, as the multiple regression highlighted that warmer temperatures and higher traffic index scores were predictors of a higher usage ratio.
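To make the comparison with the 11% average concrete, the gap for each of these two leaves can be computed in percentage points. This is a minimal sketch using only the figures quoted above; the function name and the dictionary labels are hypothetical.

```python
AVERAGE_USAGE_RATIO_PCT = 11.0  # average usage ratio quoted in the text


def points_vs_average(predicted_pct: float,
                      average_pct: float = AVERAGE_USAGE_RATIO_PCT) -> float:
    """Difference between a leaf prediction and the average, in percentage points."""
    return round(predicted_pct - average_pct, 1)


# Leaf predictions quoted in the text for this part of the tree.
leaves = {
    "traffic index 19-56": 9.4,
    "traffic index above 56": 18.0,
}

if __name__ == "__main__":
    for label, predicted in leaves.items():
        print(f"{label}: {points_vs_average(predicted):+.1f} points vs average")
```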
If the traffic index is greater than 19% and the temperature is greater than 60F, the next split is again on traffic index. A traffic index greater than 52% with a temperature greater than 60F results in a predicted usage ratio of 25%. Together with the other prediction, that a temperature greater than 60F and a traffic index greater than 56% result in a predicted usage ratio of 18%, this suggests that when the temperature is warm (above 60F), a traffic index of around 50 places the most stress on Capital Bikeshare’s system. In this warm scenario, as the traffic index grows above 56, the predicted usage ratio tapers down from 25% to 18%. One potential explanation is that when congestion is too high, the bike lanes get crowded and squeezed by the additional cars on the road.
If the traffic index is less than 52%, the next split is on whether there is an event. If there is an event, the traffic index is between 19% and 52%, and the temperature is above 60F, then the usage ratio is predicted to be 24%. Having an event, especially in warm conditions with moderate levels of traffic, appears to be an ideal situation for high bike rental demand.
If there is no event, the final split is on whether the temperature is less than or greater than 78F. If the temperature is between 60F and 78F, there is no event, and the traffic index is between 19% and 52%, then the tree predicts the usage ratio to be 14%, slightly above average. This makes sense, as moderately warm temperatures with slight to moderate traffic result in a slightly above average usage ratio. If the traffic index is between 19% and 52%, there is no event, and the weather is warmer than 78F, the predicted usage ratio is 19%. Like the other predicted usage ratios above 18%, these days with higher than average usage ratios are marked by warm weather, moderate traffic, and the potential of an event.
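The warm-weather branch (traffic index of at least 19 and temperature above 60F) can be summarized as a short chain of rules. The sketch below only restates the splits and leaf values described in the text; it is not the fitted tree. The function name and arguments are hypothetical, and treating values exactly at a threshold (52 or 78) as belonging to the right-hand side is an assumption.

```python
def predict_warm_branch_pct(traffic_index: float,
                            temperature_f: float,
                            has_event: bool) -> float:
    """Predicted usage ratio (%) when traffic index >= 19 and temperature >= 60F.

    Leaf values come from the text. Boundary handling at 52 and 78 is assumed.
    """
    if traffic_index < 19 or temperature_f < 60:
        raise ValueError("this sketch only covers traffic index >= 19 and temperature >= 60F")
    if traffic_index >= 52:
        # Warm weather with heavier traffic.
        return 25.0
    if has_event:
        # Warm weather, moderate traffic, and an event.
        return 24.0
    if temperature_f < 78:
        # No event, 60F to 78F, moderate traffic.
        return 14.0
    # No event, 78F or warmer, moderate traffic.
    return 19.0


if __name__ == "__main__":
    print(predict_warm_branch_pct(60, 70, has_event=False))
    print(predict_warm_branch_pct(30, 70, has_event=True))
    print(predict_warm_branch_pct(30, 70, has_event=False))
    print(predict_warm_branch_pct(30, 85, has_event=False))
```

Writing the rules in this order mirrors the order of the splits, so the event flag and the 78F threshold only matter once the traffic index is known to be below 52.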