Introduction

This assignment covers two very different modeling problems. The first looks at daily stock price series for the NASDAQ, Callaway, and Caesars. Since financial price series usually trend over time, the analysis has to begin with stationarity. That is why the time series portion starts with unit root testing, then moves to Engle Granger cointegration testing, and then to an error correction model only if the data support a long run equilibrium relationship.

The second part focuses on football no shows. Here the question is not whether someone bought a ticket, but whether they actually used it. That makes the no show problem a good fit for both a regression tree and a multiple regression model. The tree helps reveal practical threshold behavior, while the regression provides coefficient based estimates, diagnostic testing, and formal inference.

Part 1: Stock Time Series Analysis

Stock Data Overview

Summary Statistics for the Stock Series
Series      Mean      Median    SD        Min       Max
NASDAQ      9247.76   7973.39   2995.18   5477.00   16057.44
Callaway    19.35     18.07     6.22      5.34      37.29
Caesars     50.12     44.07     26.73     7.10      119.49

The stock plot suggests that all three series move in persistent trends rather than fluctuating around a fixed mean. That is a warning sign against running simple regressions in levels without first checking for unit roots. In financial data, apparent co movement in levels can reflect shared trends rather than a genuine equilibrium relationship, which is the classic spurious regression problem.

Unit Root Tests and Order of Integration

ADF Unit Root Results and Order of Integration
Series      Level_Statistic   Level_p_value   Diff_Statistic   Diff_p_value   Order_of_Integration
NASDAQ      -1.4955           0.7919          -10.4276         0.01           I(1)
Callaway    -2.2879           0.4564          -10.7754         0.01           I(1)
Caesars     -2.3786           0.4180          -8.9633          0.01           I(1)

NASDAQ: I(1). The ADF test fails to reject a unit root in levels (statistic -1.4955, p = 0.7919) but rejects it in first differences (statistic -10.4276, p = 0.01).

Callaway: I(1). The ADF test fails to reject a unit root in levels (statistic -2.2879, p = 0.4564) but rejects it in first differences (statistic -10.7754, p = 0.01).

Caesars: I(1). The ADF test fails to reject a unit root in levels (statistic -2.3786, p = 0.4180) but rejects it in first differences (statistic -8.9633, p = 0.01).

All three series therefore show the classic pattern for a nonstationary financial price series: nonstationary in levels, stationary after differencing once.

The key question in this section is whether shocks to each stock series are temporary or persistent. If a series is I(1), shocks have lasting effects in levels, but the differenced series becomes stationary. That matters because Engle Granger cointegration testing only makes sense when the variables are integrated of the same order, which here means all three series are I(1).
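
The level versus first difference comparison behind the table can be illustrated with a minimal Dickey Fuller sketch. It is a simplification under stated assumptions: it uses a simulated random walk rather than the stock data, it includes only a constant (no trend and no lagged differences, unlike a full ADF test), and the names `df_tstat`, `walk`, and `diffs` are hypothetical.

```python
import math
import random


def df_tstat(y):
    """t-statistic on rho in dy_t = alpha + rho * y_{t-1} + e_t (no lag terms)."""
    lagged = y[:-1]
    dy = [b - a for a, b in zip(y[:-1], y[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_dy = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_dy) for x, d in zip(lagged, dy))
    rho = sxy / sxx
    alpha = mean_dy - rho * mean_x
    ssr = sum((d - alpha - rho * x) ** 2 for x, d in zip(lagged, dy))
    se_rho = math.sqrt(ssr / (n - 2) / sxx)
    return rho / se_rho


# Simulated I(1) series (assumption: stands in for a price series).
random.seed(42)
walk = [0.0]
for _ in range(500):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
diffs = [b - a for a, b in zip(walk[:-1], walk[1:])]

stat_level = df_tstat(walk)   # statistic for the series in levels
stat_diff = df_tstat(diffs)   # statistic for the first difference
print(f"level: {stat_level:.2f}, first difference: {stat_diff:.2f}")
```

With a constant in the test regression, the approximate 5 percent Dickey Fuller critical value is about -2.86, so a statistic well below that value rejects the unit root. The differenced series should produce a far more negative statistic than the level series, which is the same pattern the table reports.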

Engle Granger Cointegration Tests

Engle Granger Cointegration Test Results
Pair                 Residual_Test_Statistic   Critical_Value_1pct   Critical_Value_5pct   Critical_Value_10pct   Cointegrated_at_5pct
Callaway ~ NASDAQ    -2.3875                   -2.58                 -1.95                 -1.62                  Yes
Caesars ~ NASDAQ     -2.5380                   -2.58                 -1.95                 -1.62                  Yes
Caesars ~ Callaway   -3.5453                   -2.58                 -1.95                 -1.62                  Yes

Callaway ~ NASDAQ: Yes against the tabulated critical values. The residual test statistic (-2.3875) is below the 5 percent critical value (-1.95) but not the 1 percent value (-2.58), so the residuals from the long run regression appear stationary at the 5 percent level only.

Caesars ~ NASDAQ: Yes against the tabulated critical values. The residual test statistic (-2.5380) is below the 5 percent critical value (-1.95) but narrowly misses the 1 percent value (-2.58).

Caesars ~ Callaway: Yes against the tabulated critical values. The residual test statistic (-3.5453) is below all three critical values, which makes this the strongest evidence of a stable long run equilibrium relationship among the three pairs.

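The Engle Granger test asks whether two nonstationary series still move together in a way that prevents them from drifting apart forever. That is stronger than simple correlation. Cointegration means there is evidence of a stable long run relationship. One caution applies to the table: -2.58, -1.95, and -1.62 are the standard Dickey Fuller critical values for a test with no constant or trend. Because the residuals come from an estimated regression, Engle Granger tests are normally judged against more negative critical values, approximately -3.34 at the 5 percent level for two variables with a constant in the long run regression (the exact value depends on sample size and deterministic terms). Against that benchmark only Caesars ~ Callaway (-3.5453) would still reject, so the evidence for the two NASDAQ pairs should be read as weak.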
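
The two step logic of the Engle Granger procedure can be sketched as follows. This is an illustration on simulated data, not the stock series: the pair `x` and `y` is constructed to be cointegrated, the residual test omits lagged differences, and the 5 percent critical value of -3.34 is an approximate asymptotic value for two variables with a constant in the first step. All function and variable names are hypothetical.

```python
import math
import random


def ols_simple(x, y):
    """Intercept and slope from a one-predictor least squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    beta = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - beta * mx, beta


def residual_df_tstat(u):
    """t-statistic on rho in du_t = rho * u_{t-1} + e_t (no constant)."""
    lagged = u[:-1]
    du = [b - a for a, b in zip(u[:-1], u[1:])]
    sxx = sum(v * v for v in lagged)
    rho = sum(v * d for v, d in zip(lagged, du)) / sxx
    ssr = sum((d - rho * v) ** 2 for v, d in zip(lagged, du))
    return rho / math.sqrt(ssr / (len(du) - 1) / sxx)


# Step 0: simulate an I(1) series x and a series y tied to it.
random.seed(7)
x = [0.0]
for _ in range(499):
    x.append(x[-1] + random.gauss(0.0, 1.0))
y = [2.0 + 0.5 * xi + random.gauss(0.0, 1.0) for xi in x]

# Step 1: long run regression in levels.
alpha, beta = ols_simple(x, y)
resid = [yi - alpha - beta * xi for xi, yi in zip(x, y)]

# Step 2: unit root test on the residuals, using residual-based critical values.
eg_stat = residual_df_tstat(resid)
EG_CRIT_5PCT = -3.34  # assumption: approximate asymptotic value, see lead-in
cointegrated = eg_stat < EG_CRIT_5PCT
print(f"beta = {beta:.3f}, residual statistic = {eg_stat:.2f}, cointegrated = {cointegrated}")
```

The design choice worth noting is the critical value in step 2: because the residuals are estimated rather than observed, the threshold is more negative than the one used for an ordinary unit root test.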

Part 2: No Show Analysis

No Show Data Overview

Summary Statistics for the No Show Data
Variable    Mean      SD        Min      Max
notatt      5859.53   2105.17   2554.0   11040.00
line        -0.98     6.01      -13.0    10.50
total       45.95     4.75      37.5     58.00
temp        49.62     13.42     20.0     79.00
hum         72.69     10.86     54.0     95.00
precip      0.23      0.55      0.0      2.17
wind        8.72      4.23      3.0      23.00
thur        0.03      0.18      0.0      1.00
mon         0.03      0.18      0.0      1.00
sunnight    0.06      0.25      0.0      1.00
div         0.38      0.49      0.0      1.00
absline     5.08      3.23      1.0      13.00
winpct      0.42      0.31      0.0      1.00
offw        0.47      0.51      0.0      1.00
offl        0.50      0.51      0.0      1.00
win1        0.22      0.42      0.0      1.00
win2        0.19      0.40      0.0      1.00
win2p       0.06      0.25      0.0      1.00
loss1       0.31      0.47      0.0      1.00
loss2       0.03      0.18      0.0      1.00
loss2p      0.16      0.37      0.0      1.00
sep         0.25      0.44      0.0      1.00
oct         0.22      0.42      0.0      1.00
nov         0.28      0.46      0.0      1.00
dec         0.25      0.44      0.0      1.00
fav         0.59      0.50      0.0      1.00

The no show analysis looks at the number of ticket holders who did not attend the game. This is a behavioral outcome shaped by weather, scheduling, team quality, and game context. A regression tree is helpful for uncovering split based decision structure, while a regression model is useful for measuring average relationships and testing statistical reliability.

Regression Tree

The first split in the regression tree is on winpct. That means a split on this variable produces the largest reduction in the residual sum of squares at the root node, making it the strongest single top level separator in the no show data. The later branches refine that pattern and show how combinations of game conditions help distinguish lower no show situations from higher no show situations.
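
How a regression tree picks its first split can be sketched in a few lines. The data below are made up purely for illustration (they are not the assignment data), and `best_split`, `toy`, and `notatt` are hypothetical names. The sketch scans every candidate threshold for each predictor and keeps the one that most reduces the sum of squared errors.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_reduction) for the best binary split on x."""
    pairs = sorted(zip(x, y))
    total = sse([v for _, v in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold can separate tied predictor values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        reduction = total - sse(left) - sse(right)
        if reduction > best[1]:
            midpoint = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (midpoint, reduction)
    return best


# Toy data (assumption: invented values, eight hypothetical games).
toy = {
    "winpct": [0.0, 0.1, 0.2, 0.3, 0.6, 0.7, 0.8, 1.0],
    "wind": [5, 12, 7, 9, 6, 11, 8, 10],
}
notatt = [9000, 8500, 8800, 9200, 4000, 4200, 3900, 4100]

splits = {name: best_split(col, notatt) for name, col in toy.items()}
root_var = max(splits, key=lambda name: splits[name][1])
root_threshold = splits[root_var][0]
print(root_var, root_threshold)
```

In this toy example the root split falls on winpct because no shows drop sharply once winpct passes the midpoint between 0.3 and 0.6. A fitted tree repeats the same search inside each resulting branch.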

Multiple Regression Model

Full Multiple Regression Model for No Shows
term          estimate      std.error    statistic   p.value
(Intercept)   10887.1976    5333.8937    2.0411      0.0571
absline       -107.0390     112.2758     -0.9534     0.3538
total         -1.1953       95.3327      -0.0125     0.9901
temp          -7.0252       37.0124      -0.1898     0.8517
hum           -21.9673      40.7770      -0.5387     0.5971
precip        598.6617      824.2997     0.7263      0.4776
wind          141.9897      92.5427      1.5343      0.1434
thur          709.4595      1879.4306    0.3775      0.7105
mon           -2958.8447    2165.4318    -1.3664     0.1896
sunnight      -1889.7905    1518.5936    -1.2444     0.2302
div           -218.8445     1032.4446    -0.2120     0.8347
winpct        -4602.5944    1233.7453    -3.7306     0.0017
sep           -2089.6916    1797.0985    -1.1628     0.2610
oct           -2806.3902    1780.0615    -1.5766     0.1333
nov           -1825.2117    1446.3944    -1.2619     0.2240

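This regression follows the assignment specification directly, with no shows (notatt) as the dependent variable and the provided weather, timing, line, and team quality variables as predictors. December is the omitted month, so the sep, oct, and nov coefficients are measured relative to December games. Only winpct is statistically significant at the 5 percent level (p = 0.0017).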
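
As a quick check on how to read the table, the short calculation below reproduces the winpct test statistic from its estimate and standard error and converts the coefficient into a more practical unit. The numbers are copied from the full model table; the variable names are hypothetical.

```python
# Values copied from the winpct row of the full model table.
estimate = -4602.5944
std_error = 1233.7453

# The reported statistic is the estimate divided by its standard error.
t_stat = estimate / std_error

# winpct runs from 0 to 1, so a 0.10 change is a 10 percentage point change.
effect_per_10_points = estimate * 0.10

print(f"t statistic: {t_stat:.4f}")
print(f"change in no shows per 0.10 rise in winpct: {effect_per_10_points:.1f}")
```

Read this way, a 10 percentage point higher winning percentage is associated with roughly 460 fewer no shows, holding the other predictors fixed.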

VIF Check and Multicollinearity Correction

Variance Inflation Factors for the Full No Show Model
Variable    VIF
absline     1.710
total       2.659
temp        3.202
hum         2.543
precip      2.702
wind        1.988
thur        1.433
mon         1.902
sunnight    1.810
div         3.347
winpct      1.884
sep         8.113
oct         7.255
nov         5.666

The following predictors had VIF values above 5 and were removed to reduce multicollinearity: sep, oct, nov.

Final No Show Regression Model After VIF Review
term          estimate      std.error    statistic   p.value
(Intercept)   9581.7230     5119.4196    1.8716      0.0760
absline       -48.4811      103.5376     -0.4682     0.6447
total         -51.4474      83.9265      -0.6130     0.5468
temp          -28.4994      27.7970      -1.0253     0.3175
hum           11.9470       33.7419      0.3541      0.7270
precip        -51.1611      690.7997     -0.0741     0.9417
wind          129.1415      72.0212      1.7931      0.0881
thur          692.7809      1794.1814    0.3861      0.7035
mon           -2304.2140    1894.2189    -1.2164     0.2380
sunnight      -1850.6369    1313.5858    -1.4088     0.1742
div           813.1232      718.1246     1.1323      0.2709
winpct        -4314.7880    1164.7130    -3.7046     0.0014
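
The VIF screen used above can be made concrete with the standard identity VIF = 1 / (1 - R²), where R² comes from regressing one predictor on all the others. The sketch below inverts that identity for a few VIF values copied from the table and applies the cutoff of 5 used in this section. The names `r2_from_vif` and `flagged` are hypothetical.

```python
# VIF values copied from the full model VIF table (subset for brevity).
vif = {"sep": 8.113, "oct": 7.255, "nov": 5.666, "div": 3.347, "temp": 3.202}

CUTOFF = 5.0  # threshold used in this section


def r2_from_vif(value):
    """Share of a predictor's variance explained by the other predictors."""
    return 1.0 - 1.0 / value


implied_r2 = {name: r2_from_vif(value) for name, value in vif.items()}
flagged = sorted(name for name, value in vif.items() if value > CUTOFF)

for name in vif:
    print(f"{name}: VIF = {vif[name]:.3f}, implied R^2 = {implied_r2[name]:.3f}")
print("flagged:", flagged)
```

A VIF of 8.113 for sep implies that about 88 percent of its variation is explained by the other predictors. Since sep, oct, and nov belong to one set of month indicators, high VIF values among them are partly mechanical; removing all three at once is the choice made here, and dropping them one at a time while recomputing the VIFs would be a more conservative alternative.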

Heteroskedasticity Test and Robust Inference

Breusch Pagan Test for Heteroskedasticity
Statistic   p_value
7.6586      0.7435

Final No Show Regression Model with Robust Standard Errors
term          estimate      std.error    statistic   p.value
(Intercept)   9581.7230     4851.5049    1.9750      0.0622
absline       -48.4811      116.2502     -0.4170     0.6811
total         -51.4474      66.9874      -0.7680     0.4515
temp          -28.4994      31.8241      -0.8955     0.3812
hum           11.9470       37.1728      0.3214      0.7512
precip        -51.1611      610.3954     -0.0838     0.9340
wind          129.1415      59.3513      2.1759      0.0417
thur          692.7809      1138.0079    0.6088      0.5495
mon           -2304.2140    1170.1993    -1.9691     0.0630
sunnight      -1850.6369    568.8708     -3.2532     0.0040
div           813.1232      571.6541     1.4224      0.1703
winpct        -4314.7880    1575.5072    -2.7387     0.0127

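Using robust standard errors, three predictors are statistically significant at the 5 percent level: wind (positive, p = 0.0417), sunnight (negative, p = 0.0040), and winpct (negative, p = 0.0127). Higher wind is associated with more no shows, while Sunday night games and a higher winning percentage are associated with fewer. Monday games (mon, p = 0.0630) fall just short of the 5 percent threshold.

The Breusch Pagan test (statistic 7.6586, p = 0.7435) fails to reject homoskedasticity, so the conventional standard errors are not obviously invalid. The robust results are reported as a more cautious basis for inference. The winpct result holds under both sets of standard errors, whereas wind and sunnight are significant only under the robust ones.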
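
The two steps in this section, testing for heteroskedasticity and then computing robust standard errors, can be sketched on simulated data. This is a minimal illustration with a single predictor, not a reproduction of the no show model: the data are generated so that the error spread grows with `x`, the test shown is the studentized (n times R²) form of the Breusch Pagan statistic, the robust standard error uses the HC1 correction, and all names are hypothetical. Which variants the original analysis used is not stated.

```python
import math
import random


def simple_ols(x, y):
    """One-predictor OLS: returns intercept, slope, residuals, mean of x, Sxx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    return intercept, slope, resid, mx, sxx


# Simulated data whose error standard deviation grows with x (assumption).
random.seed(1)
n = 400
x = [random.uniform(0.0, 10.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + random.gauss(0.0, 0.5 * xi) for xi in x]

intercept, slope, resid, mx, sxx = simple_ols(x, y)

# Conventional standard error of the slope.
s2 = sum(e * e for e in resid) / (n - 2)
classical_se = math.sqrt(s2 / sxx)

# HC1 robust standard error: weight each squared residual by (x - mean)^2.
meat = sum(((xi - mx) ** 2) * (e * e) for xi, e in zip(x, resid))
robust_se = math.sqrt(n / (n - 2) * meat / sxx ** 2)

# Studentized Breusch Pagan: regress squared residuals on x, LM = n * R^2.
sq = [e * e for e in resid]
_, _, aux_resid, _, _ = simple_ols(x, sq)
mean_sq = sum(sq) / n
tss = sum((v - mean_sq) ** 2 for v in sq)
r2_aux = 1.0 - sum(e * e for e in aux_resid) / tss
bp_stat = n * r2_aux
bp_p = math.erfc(math.sqrt(bp_stat / 2.0))  # chi-square upper tail, 1 df

print(f"slope = {slope:.3f}")
print(f"classical SE = {classical_se:.4f}, robust SE = {robust_se:.4f}")
print(f"BP statistic = {bp_stat:.2f}, p = {bp_p:.4g}")
```

In the simulated case the test rejects homoskedasticity, which is the opposite of the result reported above (p = 0.7435). The point of the sketch is the mechanics: the test asks whether squared residuals are predictable from the regressors, and the robust standard error replaces the single error variance with observation specific squared residuals.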

Comparing the Tree and the Regression

The regression tree and the regression model approach the same problem from different angles. The tree is designed to uncover decision rules and thresholds, while the regression summarizes average conditional relationships. If the main regression predictors also appear near the top of the tree, that strengthens the overall story. If the tree emphasizes thresholds that the regression smooths over, that suggests the attendance process may be nonlinear.

Conclusion

The stock portion of the assignment shows why time series structure matters before running level based regressions. Unit root testing identifies whether the series are stationary or integrated, and Engle Granger testing determines whether any of the nonstationary series share a long run equilibrium. Only if that equilibrium is present does an error correction model make sense.

The no show portion shows the value of combining machine learning style partitioning with classical regression. The tree provides an intuitive picture of how the predictors split the data, while the regression model provides coefficient based estimates, multicollinearity checks, and heteroskedasticity robust inference. Together, the two approaches provide a fuller explanation of football no show behavior.