This assignment covers two very different modeling problems. The first looks at daily stock price series for the NASDAQ, Callaway, and Caesars. Since financial price series usually trend over time, the analysis has to begin with stationarity. That is why the time series portion starts with unit root testing, then moves to Engle-Granger cointegration testing, and then to an error correction model only if the data support a long-run equilibrium relationship.
The second part focuses on football no-shows. Here the question is not whether someone bought a ticket, but whether they actually used it. That makes the no-show problem a good fit for both a regression tree and a multiple regression model. The tree helps reveal practical threshold behavior, while the regression provides coefficient-based estimates, diagnostic testing, and formal inference.
| Series | Mean | Median | SD | Min | Max |
|---|---|---|---|---|---|
| NASDAQ | 9247.76 | 7973.39 | 2995.18 | 5477.00 | 16057.44 |
| Callaway | 19.35 | 18.07 | 6.22 | 5.34 | 37.29 |
| Caesars | 50.12 | 44.07 | 26.73 | 7.10 | 119.49 |
The stock plot suggests that all three series move in persistent trends rather than fluctuating around a fixed mean. That is a warning sign against running simple regressions in levels without first checking for unit roots. In financial data, apparent co-movement can be driven by shared trends rather than a genuine equilibrium relationship, which is the spurious regression problem.
| Series | Level statistic | Level p-value | Difference statistic | Difference p-value | Order of integration |
|---|---|---|---|---|---|
| NASDAQ | -1.4955 | 0.7919 | -10.4276 | 0.01 | I(1) |
| Callaway | -2.2879 | 0.4564 | -10.7754 | 0.01 | I(1) |
| Caesars | -2.3786 | 0.4180 | -8.9633 | 0.01 | I(1) |
All three series are classified as I(1). In levels, none of the tests rejects a unit root (p-values of 0.7919 for NASDAQ, 0.4564 for Callaway, and 0.4180 for Caesars). In first differences, all three tests reject a unit root with reported p-values of 0.01, which is the classic pattern for a nonstationary financial price series.
The key question in this section is whether shocks to each stock series are temporary or persistent. If a series is I(1), shocks have lasting effects in levels, but the first-differenced series is stationary. That matters because Engle-Granger cointegration testing only makes sense when the variables are integrated of the same order, which is the case here since all three series are I(1).
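The levels-versus-differences logic can be illustrated with a minimal Dickey-Fuller sketch in Python. This is not the code behind the table: it uses a simulated random walk rather than the stock data, includes only a constant, and omits the lagged-difference and trend terms that an augmented test would normally add. The function name `df_tstat` and the seed are illustrative assumptions.

```python
import math
import random


def df_tstat(y):
    """t-statistic on rho in the regression diff(y)_t = alpha + rho * y_{t-1} + e_t."""
    lag = y[:-1]                                   # y_{t-1}
    d = [b - a for a, b in zip(y[:-1], y[1:])]     # first difference of y
    n = len(d)
    m_lag, m_d = sum(lag) / n, sum(d) / n
    sxx = sum((v - m_lag) ** 2 for v in lag)
    sxy = sum((v - m_lag) * (di - m_d) for v, di in zip(lag, d))
    rho = sxy / sxx
    alpha = m_d - rho * m_lag
    ssr = sum((di - alpha - rho * v) ** 2 for v, di in zip(lag, d))
    se = math.sqrt(ssr / (n - 2) / sxx)            # conventional OLS standard error
    return rho / se


rng = random.Random(42)                            # arbitrary seed, illustration only
level = [0.0]
for _ in range(500):
    level.append(level[-1] + rng.gauss(0.0, 1.0))  # random walk, I(1) by construction
diff = [b - a for a, b in zip(level[:-1], level[1:])]

level_stat = df_tstat(level)
diff_stat = df_tstat(diff)
print(f"level: {level_stat:.2f}  first difference: {diff_stat:.2f}")
```

The statistic must be compared with Dickey-Fuller critical values rather than the usual t table. For an I(1) series the level statistic is typically close to zero, while the differenced series gives a large negative value.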
| Pair | Residual test statistic | Critical value (1%) | Critical value (5%) | Critical value (10%) | Cointegrated at 5%? |
|---|---|---|---|---|---|
| Callaway ~ NASDAQ | -2.3875 | -2.58 | -1.95 | -1.62 | Yes |
| Caesars ~ NASDAQ | -2.5380 | -2.58 | -1.95 | -1.62 | Yes |
| Caesars ~ Callaway | -3.5453 | -2.58 | -1.95 | -1.62 | Yes |
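All three pairs are classified as cointegrated at the 5 percent level. Each residual test statistic (-2.3875 for Callaway ~ NASDAQ, -2.5380 for Caesars ~ NASDAQ, and -3.5453 for Caesars ~ Callaway) lies below the tabulated 5 percent critical value of -1.95, which suggests that the residuals from each long-run regression are stationary and that the paired series share a stable long-run equilibrium relationship. One caution applies. The values -2.58, -1.95, and -1.62 are the standard Dickey-Fuller critical values for a test with no constant or trend, whereas residual-based Engle-Granger tests are normally compared with more negative critical values because the residuals come from an estimated regression. If the long-run regression includes a constant, the asymptotic MacKinnon 5 percent value for two variables is roughly -3.34, and against that benchmark only Caesars ~ Callaway would still reject. The evidence for the two NASDAQ pairs should therefore be treated as weak.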
The Engle-Granger test asks whether two nonstationary series still move together in a way that prevents them from drifting apart forever. That is stronger than simple correlation: cointegration means that some linear combination of the two series is stationary, which is evidence of a stable long-run relationship.
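The two-step procedure can be sketched as follows. This is a hedged illustration on simulated data, not the assignment's code: the series are cointegrated by construction, the residual test has no lag augmentation, and the helper names are made up. The 5 percent critical value of about -3.34 is the asymptotic MacKinnon value for two variables with a constant in the first step, which is more negative than the standard Dickey-Fuller values because the residuals come from an estimated regression.

```python
import math
import random


def ols(x, y):
    """Intercept and slope of y = a + b * x by least squares."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def df_tstat_no_const(u):
    """t-statistic on rho in diff(u)_t = rho * u_{t-1} + e_t (no constant)."""
    lag = u[:-1]
    d = [b - a for a, b in zip(u[:-1], u[1:])]
    sxx = sum(v * v for v in lag)
    rho = sum(v * di for v, di in zip(lag, d)) / sxx
    ssr = sum((di - rho * v) ** 2 for v, di in zip(lag, d))
    se = math.sqrt(ssr / (len(d) - 1) / sxx)
    return rho / se


rng = random.Random(7)                             # arbitrary seed, illustration only
x = [0.0]
for _ in range(500):
    x.append(x[-1] + rng.gauss(0.0, 1.0))          # I(1) driver series
# y shares x's stochastic trend, so the pair is cointegrated by construction.
y = [2.0 + 0.5 * xi + rng.gauss(0.0, 1.0) for xi in x]

a, b = ols(x, y)                                   # step 1: long-run regression
resid = [yi - a - b * xi for xi, yi in zip(x, y)]
eg_stat = df_tstat_no_const(resid)                 # step 2: unit root test on residuals

EG_CRIT_5PCT = -3.34  # approximate asymptotic MacKinnon value (two variables, constant)
print(f"slope: {b:.3f}  residual statistic: {eg_stat:.2f}  "
      f"cointegrated at 5%: {eg_stat < EG_CRIT_5PCT}")
```

If the residuals are judged stationary, their lagged value is what enters an error correction model as the adjustment term.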
| Variable | Mean | SD | Min | Max |
|---|---|---|---|---|
| notatt | 5859.53 | 2105.17 | 2554.0 | 11040.00 |
| line | -0.98 | 6.01 | -13.0 | 10.50 |
| total | 45.95 | 4.75 | 37.5 | 58.00 |
| temp | 49.62 | 13.42 | 20.0 | 79.00 |
| hum | 72.69 | 10.86 | 54.0 | 95.00 |
| precip | 0.23 | 0.55 | 0.0 | 2.17 |
| wind | 8.72 | 4.23 | 3.0 | 23.00 |
| thur | 0.03 | 0.18 | 0.0 | 1.00 |
| mon | 0.03 | 0.18 | 0.0 | 1.00 |
| sunnight | 0.06 | 0.25 | 0.0 | 1.00 |
| div | 0.38 | 0.49 | 0.0 | 1.00 |
| absline | 5.08 | 3.23 | 1.0 | 13.00 |
| winpct | 0.42 | 0.31 | 0.0 | 1.00 |
| offw | 0.47 | 0.51 | 0.0 | 1.00 |
| offl | 0.50 | 0.51 | 0.0 | 1.00 |
| win1 | 0.22 | 0.42 | 0.0 | 1.00 |
| win2 | 0.19 | 0.40 | 0.0 | 1.00 |
| win2p | 0.06 | 0.25 | 0.0 | 1.00 |
| loss1 | 0.31 | 0.47 | 0.0 | 1.00 |
| loss2 | 0.03 | 0.18 | 0.0 | 1.00 |
| loss2p | 0.16 | 0.37 | 0.0 | 1.00 |
| sep | 0.25 | 0.44 | 0.0 | 1.00 |
| oct | 0.22 | 0.42 | 0.0 | 1.00 |
| nov | 0.28 | 0.46 | 0.0 | 1.00 |
| dec | 0.25 | 0.44 | 0.0 | 1.00 |
| fav | 0.59 | 0.50 | 0.0 | 1.00 |
The no-show analysis looks at the number of ticket holders who did not attend the game. This is a behavioral outcome shaped by weather, scheduling, team quality, and game context. A regression tree is helpful for uncovering split-based decision structure, while a regression model is useful for measuring average relationships and testing statistical reliability.
The first split in the regression tree is on winpct. That means a split on this variable produces the largest reduction in the residual sum of squares among all candidate splits at the root, making it the most important top-level separator in the no-show data. The later branches refine that pattern and show how combinations of game conditions help distinguish lower no-show situations from higher no-show situations.
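How a tree chooses its first split can be sketched in a few lines of Python. This is a generic illustration of the variance-reduction criterion, not the fitted tree from the assignment: the rows are made-up toy values, only the column names mirror the report, and `best_split` is a hypothetical helper.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty group)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(rows, target, predictors):
    """Return (predictor, threshold, sse_reduction) for the best single split."""
    parent = sse([r[target] for r in rows])
    best = (None, None, 0.0)
    for p in predictors:
        levels = sorted({r[p] for r in rows})
        # Candidate thresholds are midpoints between adjacent observed values.
        for lo, hi in zip(levels[:-1], levels[1:]):
            cut = (lo + hi) / 2
            left = [r[target] for r in rows if r[p] < cut]
            right = [r[target] for r in rows if r[p] >= cut]
            gain = parent - sse(left) - sse(right)
            if gain > best[2]:
                best = (p, cut, gain)
    return best


# Made-up toy rows, NOT the assignment data.
games = [
    {"winpct": 0.00, "wind": 5, "notatt": 9000},
    {"winpct": 0.20, "wind": 12, "notatt": 8800},
    {"winpct": 0.25, "wind": 8, "notatt": 8600},
    {"winpct": 0.50, "wind": 6, "notatt": 5200},
    {"winpct": 0.60, "wind": 14, "notatt": 5000},
    {"winpct": 0.75, "wind": 9, "notatt": 4800},
    {"winpct": 1.00, "wind": 7, "notatt": 4500},
]

split_var, threshold, gain = best_split(games, "notatt", ["winpct", "wind"])
print(split_var, threshold, round(gain))
```

A full tree applies the same search recursively inside each resulting group, usually with a minimum node size or a complexity penalty to stop the splitting.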
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 10887.1976 | 5333.8937 | 2.0411 | 0.0571 |
| absline | -107.0390 | 112.2758 | -0.9534 | 0.3538 |
| total | -1.1953 | 95.3327 | -0.0125 | 0.9901 |
| temp | -7.0252 | 37.0124 | -0.1898 | 0.8517 |
| hum | -21.9673 | 40.7770 | -0.5387 | 0.5971 |
| precip | 598.6617 | 824.2997 | 0.7263 | 0.4776 |
| wind | 141.9897 | 92.5427 | 1.5343 | 0.1434 |
| thur | 709.4595 | 1879.4306 | 0.3775 | 0.7105 |
| mon | -2958.8447 | 2165.4318 | -1.3664 | 0.1896 |
| sunnight | -1889.7905 | 1518.5936 | -1.2444 | 0.2302 |
| div | -218.8445 | 1032.4446 | -0.2120 | 0.8347 |
| winpct | -4602.5944 | 1233.7453 | -3.7306 | 0.0017 |
| sep | -2089.6916 | 1797.0985 | -1.1628 | 0.2610 |
| oct | -2806.3902 | 1780.0615 | -1.5766 | 0.1333 |
| nov | -1825.2117 | 1446.3944 | -1.2619 | 0.2240 |
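This regression follows the assignment specification directly, with no-shows (notatt) as the dependent variable and the provided weather, timing, line, and team quality variables as predictors. In this full specification, winpct is the only predictor that is significant at the 5 percent level (p = 0.0017).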
| Variable | VIF |
|---|---|
| absline | 1.710 |
| total | 2.659 |
| temp | 3.202 |
| hum | 2.543 |
| precip | 2.702 |
| wind | 1.988 |
| thur | 1.433 |
| mon | 1.902 |
| sunnight | 1.810 |
| div | 3.347 |
| winpct | 1.884 |
| sep | 8.113 |
| oct | 7.255 |
| nov | 5.666 |
The following predictors had VIF values above 5 and were removed to reduce multicollinearity: sep, oct, and nov. The model is then re-estimated without them.
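The screening rule can be made concrete with a short calculation. A VIF is defined as 1 / (1 - R²), where R² comes from regressing that predictor on all the others, so each VIF implies an auxiliary R². The sketch below uses the VIF values reported in the table; the function name `implied_r2` is an illustrative assumption, and the threshold of 5 is the one stated in the report.

```python
# VIF values copied from the table in this report.
vif = {
    "absline": 1.710, "total": 2.659, "temp": 3.202, "hum": 2.543,
    "precip": 2.702, "wind": 1.988, "thur": 1.433, "mon": 1.902,
    "sunnight": 1.810, "div": 3.347, "winpct": 1.884,
    "sep": 8.113, "oct": 7.255, "nov": 5.666,
}


def implied_r2(v):
    """Auxiliary-regression R-squared implied by VIF = 1 / (1 - R^2)."""
    return 1.0 - 1.0 / v


THRESHOLD = 5.0  # cutoff used in the report
flagged = [name for name, v in vif.items() if v > THRESHOLD]
for name in flagged:
    print(f"{name}: VIF {vif[name]:.3f} -> auxiliary R^2 {implied_r2(vif[name]):.4f}")
```

The month indicators are the only terms above the cutoff, which is plausible because they partition the same set of games and are therefore strongly related to one another.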
| term | estimate | std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 9581.7230 | 5119.4196 | 1.8716 | 0.0760 |
| absline | -48.4811 | 103.5376 | -0.4682 | 0.6447 |
| total | -51.4474 | 83.9265 | -0.6130 | 0.5468 |
| temp | -28.4994 | 27.7970 | -1.0253 | 0.3175 |
| hum | 11.9470 | 33.7419 | 0.3541 | 0.7270 |
| precip | -51.1611 | 690.7997 | -0.0741 | 0.9417 |
| wind | 129.1415 | 72.0212 | 1.7931 | 0.0881 |
| thur | 692.7809 | 1794.1814 | 0.3861 | 0.7035 |
| mon | -2304.2140 | 1894.2189 | -1.2164 | 0.2380 |
| sunnight | -1850.6369 | 1313.5858 | -1.4088 | 0.1742 |
| div | 813.1232 | 718.1246 | 1.1323 | 0.2709 |
| winpct | -4314.7880 | 1164.7130 | -3.7046 | 0.0014 |
| Breusch-Pagan statistic | p-value |
|---|---|
| 7.6586 | 0.7435 |
| term | estimate | robust std.error | statistic | p.value |
|---|---|---|---|---|
| (Intercept) | 9581.7230 | 4851.5049 | 1.9750 | 0.0622 |
| absline | -48.4811 | 116.2502 | -0.4170 | 0.6811 |
| total | -51.4474 | 66.9874 | -0.7680 | 0.4515 |
| temp | -28.4994 | 31.8241 | -0.8955 | 0.3812 |
| hum | 11.9470 | 37.1728 | 0.3214 | 0.7512 |
| precip | -51.1611 | 610.3954 | -0.0838 | 0.9340 |
| wind | 129.1415 | 59.3513 | 2.1759 | 0.0417 |
| thur | 692.7809 | 1138.0079 | 0.6088 | 0.5495 |
| mon | -2304.2140 | 1170.1993 | -1.9691 | 0.0630 |
| sunnight | -1850.6369 | 568.8708 | -3.2532 | 0.0040 |
| div | 813.1232 | 571.6541 | 1.4224 | 0.1703 |
| winpct | -4314.7880 | 1575.5072 | -2.7387 | 0.0127 |
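Using robust standard errors, three predictors are statistically significant at the 5 percent level: wind (p = 0.0417), sunnight (p = 0.0040), and winpct (p = 0.0127). More wind is associated with more no-shows (about 129 per unit of wind), while the sunnight indicator and a higher winning percentage are associated with fewer no-shows. The mon indicator falls just short of significance (p = 0.0630).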
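Because robust standard errors change only the denominator of each t-statistic, the shift in significance can be checked by hand. The sketch below recomputes the statistics for three predictors from the reduced-model estimates and the two sets of standard errors reported in the coefficient tables; the dictionary layout is an illustrative choice.

```python
# (estimate, conventional SE, robust SE) copied from the reduced-model tables.
rows = {
    "wind": (129.1415, 72.0212, 59.3513),
    "sunnight": (-1850.6369, 1313.5858, 568.8708),
    "winpct": (-4314.7880, 1164.7130, 1575.5072),
}

# t = estimate / standard error, computed under each set of standard errors.
t_stats = {
    name: (est / se_conv, est / se_robust)
    for name, (est, se_conv, se_robust) in rows.items()
}

for name, (t_conv, t_robust) in t_stats.items():
    print(f"{name}: conventional t = {t_conv:.4f}, robust t = {t_robust:.4f}")
```

The coefficients are identical in both tables. The robust standard errors are smaller for wind and sunnight and larger for winpct, which is why the first two gain significance while winpct loses some.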
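The Breusch-Pagan test (statistic 7.6586, p = 0.7435) fails to reject homoskedasticity. Robust standard errors are therefore a cautious choice rather than a necessary one, and under the conventional standard errors only winpct is significant at the 5 percent level (wind has p = 0.0881 and sunnight has p = 0.1742).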
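The idea behind the test can be sketched with a single predictor. The version below uses the n times R² form, in which squared OLS residuals are regressed on the predictor and the result is compared with a chi-square distribution. It is an illustration on deterministic toy data, not the assignment's calculation, which involves eleven predictors; all names are hypothetical.

```python
def ols_resid(x, y):
    """Residuals from the least-squares fit y = a + b * x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return [yi - a - b * xi for xi, yi in zip(x, y)]


def r_squared(x, y):
    """R-squared of the regression of y on a constant and x."""
    my = sum(y) / len(y)
    sst = sum((yi - my) ** 2 for yi in y)
    ssr = sum(e * e for e in ols_resid(x, y))
    return 1.0 - ssr / sst


def breusch_pagan_lm(x, y):
    """LM statistic: n * R^2 from regressing squared residuals on x."""
    e2 = [e * e for e in ols_resid(x, y)]
    return len(x) * r_squared(x, e2)


x = list(range(1, 41))
sign = [1.0 if i % 2 == 0 else -1.0 for i in x]    # alternating error sign
# Toy outcome with constant error size (homoskedastic).
y_hom = [2.0 + 3.0 * xi + s for xi, s in zip(x, sign)]
# Toy outcome whose error size grows with x (heteroskedastic).
y_het = [2.0 + 3.0 * xi + 0.5 * xi * s for xi, s in zip(x, sign)]

CHI2_1DF_5PCT = 3.84  # 5 percent critical value, chi-square with 1 degree of freedom
lm_hom = breusch_pagan_lm(x, y_hom)
lm_het = breusch_pagan_lm(x, y_het)
print(f"homoskedastic LM: {lm_hom:.3f}  heteroskedastic LM: {lm_het:.3f}")
```

With several predictors, the degrees of freedom equal the number of predictors in the auxiliary regression, so the critical value is larger than the single-predictor value used here.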
The regression tree and the regression model approach the same problem from different angles. The tree is designed to uncover decision rules and thresholds, while the regression summarizes average conditional relationships. Here winpct is both the first split in the tree and the one predictor that is significant under conventional and robust standard errors alike, so the two methods agree on the main driver. Where the tree emphasizes thresholds that the regression smooths over, that suggests the attendance process may be nonlinear.
The stock portion of the assignment shows why time series structure matters before running level-based regressions. Unit root testing identifies whether the series are stationary or integrated, and Engle-Granger testing determines whether any of the nonstationary series share a long-run equilibrium. Only if that equilibrium is present does an error correction model make sense.
The no-show portion shows the value of combining machine-learning-style partitioning with classical regression. The tree provides an intuitive picture of how the predictors split the data, while the regression model provides coefficient-based estimates, multicollinearity checks, and heteroskedasticity-robust inference. Together, the two approaches provide a fuller explanation of football no-show behavior.