Introduction

This assignment covers two very different modeling problems. The first looks at daily stock price series for the NASDAQ, Callaway, and Caesars. Because financial price series usually trend over time, the analysis has to begin with stationarity. The time series portion therefore starts with unit root testing, moves to Engle-Granger cointegration testing, and proceeds to an error correction model only if the data support a long-run equilibrium relationship.

The second part focuses on football no-shows. Here the question is not whether someone bought a ticket, but whether they actually used it. The no-show problem is a good fit for both a regression tree and a multiple regression model. The tree helps reveal practical threshold behavior, while the regression provides coefficient-based estimates, diagnostic testing, and formal inference.

Part 1: Stock Time Series Analysis

Stock Data Overview

Summary Statistics for the Stock Series
Series	Mean	Median	SD	Min	Max
NASDAQ	9247.76	7973.39	2995.18	5477.00	16057.44
Callaway	19.35	18.07	6.22	5.34	37.29
Caesars	50.12	44.07	26.73	7.10	119.49

The stock plot suggests that all three series move in persistent trends rather than fluctuating around a fixed mean. That is a warning sign against running simple regressions in levels without first checking for unit roots. In financial data, apparent co-movement can be driven by shared trending rather than a genuine equilibrium relationship, which is the spurious regression problem.

Unit Root Tests and Order of Integration

ADF Unit Root Results and Order of Integration
Series	Level_Statistic	Level_p_value	Diff_Statistic	Diff_p_value	Order_of_Integration
NASDAQ	-1.4955	0.7919	-10.4276	0.01	I(1)
Callaway	-2.2879	0.4564	-10.7754	0.01	I(1)
Caesars	-2.3786	0.4180	-8.9633	0.01	I(1)

NASDAQ, Callaway, and Caesars: I(1). For each series, the ADF test fails to reject a unit root in levels (p-values of 0.7919, 0.4564, and 0.4180) but rejects it in first differences (reported p-value of 0.01 in each case), which is the classic pattern for a nonstationary financial price series.

The key question in this section is whether shocks to each stock series are temporary or persistent. If a series is I(1), shocks have lasting effects in levels, but the first-differenced series is stationary. That matters because Engle-Granger cointegration testing only makes sense when the variables are integrated of the same order, which here means all three being I(1).
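The difference between persistent and temporary shocks can be illustrated with a simulated random walk. The sketch below is a simplified Dickey-Fuller regression with no constant and no lagged differences, so it is not the ADF specification behind the table above; the function name `df_stat` and the simulated data are illustrative assumptions.

```python
import math
import random


def df_stat(y):
    """t-statistic on rho in diff(y)[t] = rho * y[t-1] + e[t] (no constant, no lags)."""
    lagged = y[:-1]
    delta = [b - a for a, b in zip(y[:-1], y[1:])]
    sxx = sum(x * x for x in lagged)
    rho = sum(x * d for x, d in zip(lagged, delta)) / sxx
    resid = [d - rho * x for x, d in zip(lagged, delta)]
    sigma2 = sum(e * e for e in resid) / (len(delta) - 1)
    return rho / math.sqrt(sigma2 / sxx)


random.seed(0)
shocks = [random.gauss(0, 1) for _ in range(500)]

# Random walk: every shock is carried forward permanently in the level.
level = []
running = 0.0
for s in shocks:
    running += s
    level.append(running)

# First difference of the random walk recovers the (stationary) shocks.
diff = [b - a for a, b in zip(level[:-1], level[1:])]

stat_level = df_stat(level)
stat_diff = df_stat(diff)
print(round(stat_level, 2), round(stat_diff, 2))
```

The statistic for the differenced series should be far more negative than the one for the level series, which mirrors the pattern in the Level_Statistic and Diff_Statistic columns.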

Engle-Granger Cointegration Tests

Engle-Granger Cointegration Test Results
Pair	Residual_Test_Statistic	Critical_Value_1pct	Critical_Value_5pct	Critical_Value_10pct	Cointegrated_at_5pct
Callaway ~ NASDAQ	-2.3875	-2.58	-1.95	-1.62	Yes
Caesars ~ NASDAQ	-2.5380	-2.58	-1.95	-1.62	Yes
Caesars ~ Callaway	-3.5453	-2.58	-1.95	-1.62	Yes

All three pairs: Yes against the tabulated critical values. Each residual test statistic is more negative than the 5 percent value of -1.95, and only Caesars ~ Callaway (-3.5453) also clears the 1 percent value of -2.58. One caution applies: -2.58, -1.95, and -1.62 are the standard Dickey-Fuller critical values for a series tested without a constant or trend. Because the residuals come from an estimated long-run regression, Engle-Granger tests normally use more negative critical values, roughly -3.34 at the 5 percent level for two variables with a constant (MacKinnon). Against that benchmark only Caesars ~ Callaway clearly supports a stable long-run equilibrium, so the evidence for the two NASDAQ pairs should be treated as weak.

The Engle-Granger test asks whether two nonstationary series move together closely enough that they cannot drift apart indefinitely. That is stronger than simple correlation: cointegration means some linear combination of the two I(1) series is itself stationary, which is evidence of a stable long-run relationship.
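The decision rule for the residual test can be sketched as a comparison of each statistic with a critical value. The statistics and the -1.95 value are copied from the table above; the -3.34 value is an assumption, taken as the approximate MacKinnon 5 percent critical value for two variables with a constant, and is included only to show how sensitive the conclusion is to the choice of critical value.

```python
# Residual test statistics copied from the Engle-Granger results table.
residual_stats = {
    "Callaway ~ NASDAQ": -2.3875,
    "Caesars ~ NASDAQ": -2.5380,
    "Caesars ~ Callaway": -3.5453,
}

DF_5PCT = -1.95  # 5 percent value reported in the table
EG_5PCT = -3.34  # assumption: approximate MacKinnon value, two variables, constant


def rejects_unit_root(stat, critical):
    """Reject the unit-root null when the statistic is more negative than the critical value."""
    return stat < critical


by_df = {pair: rejects_unit_root(s, DF_5PCT) for pair, s in residual_stats.items()}
by_eg = {pair: rejects_unit_root(s, EG_5PCT) for pair, s in residual_stats.items()}

for pair in residual_stats:
    print(pair, "| table value:", by_df[pair], "| Engle-Granger value:", by_eg[pair])
```

Under the tabulated value every pair rejects a unit root in the residuals, while under the stricter value only Caesars ~ Callaway does.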

Part 2: No-Show Analysis

No-Show Data Overview

Summary Statistics for the No-Show Data
Variable	Mean	SD	Min	Max
notatt	5859.53	2105.17	2554.0	11040.00
line	-0.98	6.01	-13.0	10.50
total	45.95	4.75	37.5	58.00
temp	49.62	13.42	20.0	79.00
hum	72.69	10.86	54.0	95.00
precip	0.23	0.55	0.0	2.17
wind	8.72	4.23	3.0	23.00
thur	0.03	0.18	0.0	1.00
mon	0.03	0.18	0.0	1.00
sunnight	0.06	0.25	0.0	1.00
div	0.38	0.49	0.0	1.00
absline	5.08	3.23	1.0	13.00
winpct	0.42	0.31	0.0	1.00
offw	0.47	0.51	0.0	1.00
offl	0.50	0.51	0.0	1.00
win1	0.22	0.42	0.0	1.00
win2	0.19	0.40	0.0	1.00
win2p	0.06	0.25	0.0	1.00
loss1	0.31	0.47	0.0	1.00
loss2	0.03	0.18	0.0	1.00
loss2p	0.16	0.37	0.0	1.00
sep	0.25	0.44	0.0	1.00
oct	0.22	0.42	0.0	1.00
nov	0.28	0.46	0.0	1.00
dec	0.25	0.44	0.0	1.00
fav	0.59	0.50	0.0	1.00

The no-show analysis looks at the number of ticket holders who did not attend the game (notatt). This is a behavioral outcome shaped by weather, scheduling, team quality, and game context. A regression tree is helpful for uncovering split-based decision structure, while a regression model is useful for measuring average relationships and testing statistical reliability.

Regression Tree

The first split in the regression tree is on winpct. Among all candidate splits at the root, a split on this variable produces the largest reduction in the residual sum of squares, which makes it the most important top-level separator in the no-show data. The later branches refine that pattern and show how combinations of game conditions help distinguish lower no-show situations from higher no-show situations.
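How a regression tree chooses a split can be sketched as a search over thresholds for the one that most reduces the sum of squared errors. The example below is a minimal illustration on hypothetical toy data, not the assignment data, and the helper names `sse` and `best_split` are assumptions rather than functions from any tree library.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, sse_reduction) for the best single split on x."""
    pairs = sorted(zip(x, y))
    parent = sse([v for _, v in pairs])
    best = (None, 0.0)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # no threshold fits between identical x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        gain = parent - sse(left) - sse(right)
        if gain > best[1]:
            # Place the threshold midway between neighboring x values.
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, gain)
    return best


# Hypothetical toy data: win percentage and no-show counts for six games.
toy_winpct = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
toy_notatt = [9000, 8500, 8800, 5000, 5200, 4800]

threshold, gain = best_split(toy_winpct, toy_notatt)
print(threshold, round(gain))
```

A full tree repeats this search over every predictor at every node; the variable chosen at the root is the one whose best threshold gives the largest reduction.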

Multiple Regression Model

Full Multiple Regression Model for No-Shows
term	estimate	std.error	statistic	p.value
(Intercept)	10887.1976	5333.8937	2.0411	0.0571
absline	-107.0390	112.2758	-0.9534	0.3538
total	-1.1953	95.3327	-0.0125	0.9901
temp	-7.0252	37.0124	-0.1898	0.8517
hum	-21.9673	40.7770	-0.5387	0.5971
precip	598.6617	824.2997	0.7263	0.4776
wind	141.9897	92.5427	1.5343	0.1434
thur	709.4595	1879.4306	0.3775	0.7105
mon	-2958.8447	2165.4318	-1.3664	0.1896
sunnight	-1889.7905	1518.5936	-1.2444	0.2302
div	-218.8445	1032.4446	-0.2120	0.8347
winpct	-4602.5944	1233.7453	-3.7306	0.0017
sep	-2089.6916	1797.0985	-1.1628	0.2610
oct	-2806.3902	1780.0615	-1.5766	0.1333
nov	-1825.2117	1446.3944	-1.2619	0.2240

This regression follows the assignment specification directly, with no-shows (notatt) as the dependent variable and the provided weather, timing, betting line, and team quality variables as predictors. December is the omitted reference month for the sep, oct, and nov indicators.

VIF Check and Multicollinearity Correction

Variance Inflation Factors for the Full No-Show Model
Variable	VIF
absline	1.710
total	2.659
temp	3.202
hum	2.543
precip	2.702
wind	1.988
thur	1.433
mon	1.902
sunnight	1.810
div	3.347
winpct	1.884
sep	8.113
oct	7.255
nov	5.666

The following predictors had VIF values above 5 and were removed to reduce multicollinearity: sep, oct, nov.
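The VIF screen can be sketched from the table values alone. A VIF is defined as 1 / (1 - R^2), where R^2 comes from regressing one predictor on all the others, so each VIF can be inverted to show how much of that predictor is explained by the rest. The cutoff of 5 is the one stated above; the helper name `implied_r2` is an illustrative assumption.

```python
# VIF values copied from the table for the full no-show model.
vif = {
    "absline": 1.710, "total": 2.659, "temp": 3.202, "hum": 2.543,
    "precip": 2.702, "wind": 1.988, "thur": 1.433, "mon": 1.902,
    "sunnight": 1.810, "div": 3.347, "winpct": 1.884,
    "sep": 8.113, "oct": 7.255, "nov": 5.666,
}

VIF_CUTOFF = 5.0  # threshold stated in the report


def implied_r2(v):
    """Invert VIF = 1 / (1 - R^2) to recover the auxiliary-regression R^2."""
    return 1.0 - 1.0 / v


flagged = [name for name, v in vif.items() if v > VIF_CUTOFF]
for name in flagged:
    print(name, round(implied_r2(vif[name]), 3))
```

The three flagged variables are the month indicators, which are expected to be correlated with one another and with temperature, so the high VIFs reflect the calendar structure of the season rather than a data problem.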

Final No-Show Regression Model After VIF Review
term	estimate	std.error	statistic	p.value
(Intercept)	9581.7230	5119.4196	1.8716	0.0760
absline	-48.4811	103.5376	-0.4682	0.6447
total	-51.4474	83.9265	-0.6130	0.5468
temp	-28.4994	27.7970	-1.0253	0.3175
hum	11.9470	33.7419	0.3541	0.7270
precip	-51.1611	690.7997	-0.0741	0.9417
wind	129.1415	72.0212	1.7931	0.0881
thur	692.7809	1794.1814	0.3861	0.7035
mon	-2304.2140	1894.2189	-1.2164	0.2380
sunnight	-1850.6369	1313.5858	-1.4088	0.1742
div	813.1232	718.1246	1.1323	0.2709
winpct	-4314.7880	1164.7130	-3.7046	0.0014

Heteroskedasticity Test and Robust Inference

Breusch-Pagan Test for Heteroskedasticity
Statistic	p_value
7.6586	0.7435

Final No-Show Regression Model with Robust Standard Errors
term	estimate	std.error	statistic	p.value
(Intercept)	9581.7230	4851.5049	1.9750	0.0622
absline	-48.4811	116.2502	-0.4170	0.6811
total	-51.4474	66.9874	-0.7680	0.4515
temp	-28.4994	31.8241	-0.8955	0.3812
hum	11.9470	37.1728	0.3214	0.7512
precip	-51.1611	610.3954	-0.0838	0.9340
wind	129.1415	59.3513	2.1759	0.0417
thur	692.7809	1138.0079	0.6088	0.5495
mon	-2304.2140	1170.1993	-1.9691	0.0630
sunnight	-1850.6369	568.8708	-3.2532	0.0040
div	813.1232	571.6541	1.4224	0.1703
winpct	-4314.7880	1575.5072	-2.7387	0.0127

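Using robust standard errors, the predictors that are statistically significant at the 5 percent level are wind (p = 0.0417), sunnight (p = 0.0040), and winpct (p = 0.0127). Under the classical standard errors, only winpct (p = 0.0014) clears that threshold, so the wind and sunnight results depend on the choice of standard errors.

The Breusch-Pagan test (statistic 7.6586, p = 0.7435) does not reject homoskedasticity, so the classical standard errors are not obviously invalid. Robust standard errors are also not automatically the more conservative choice here: several of them are smaller than their classical counterparts, and heteroskedasticity-robust estimators can understate uncertainty in small samples. The winpct effect is significant under both sets of standard errors and is the most reliable finding, while the wind and sunnight effects should be read more cautiously.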
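The effect of switching standard errors can be checked directly from the two final-model tables. The sketch below copies the reported estimates, standard errors, and p-values, then lists which predictors are significant at the 5 percent level under each set; the structure and names (`rows`, `t_stat`) are illustrative assumptions.

```python
# (estimate, classical SE, classical p, robust SE, robust p) from the final-model tables.
rows = {
    "absline":  (-48.4811, 103.5376, 0.6447, 116.2502, 0.6811),
    "total":    (-51.4474, 83.9265, 0.5468, 66.9874, 0.4515),
    "temp":     (-28.4994, 27.7970, 0.3175, 31.8241, 0.3812),
    "hum":      (11.9470, 33.7419, 0.7270, 37.1728, 0.7512),
    "precip":   (-51.1611, 690.7997, 0.9417, 610.3954, 0.9340),
    "wind":     (129.1415, 72.0212, 0.0881, 59.3513, 0.0417),
    "thur":     (692.7809, 1794.1814, 0.7035, 1138.0079, 0.5495),
    "mon":      (-2304.2140, 1894.2189, 0.2380, 1170.1993, 0.0630),
    "sunnight": (-1850.6369, 1313.5858, 0.1742, 568.8708, 0.0040),
    "div":      (813.1232, 718.1246, 0.2709, 571.6541, 0.1703),
    "winpct":   (-4314.7880, 1164.7130, 0.0014, 1575.5072, 0.0127),
}

ALPHA = 0.05


def t_stat(estimate, std_error):
    """t-statistic is the coefficient divided by its standard error."""
    return estimate / std_error


classical_sig = [name for name, r in rows.items() if r[2] < ALPHA]
robust_sig = [name for name, r in rows.items() if r[4] < ALPHA]

print("classical:", classical_sig)
print("robust:", robust_sig)

# The coefficient is unchanged; only the standard error (and so the t-statistic) moves.
est, se_classical, _, se_robust, _ = rows["sunnight"]
print(round(t_stat(est, se_classical), 4), round(t_stat(est, se_robust), 4))
```

The coefficients are identical across the two tables, so any change in significance comes entirely from the standard errors.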

Comparing the Tree and the Regression

The regression tree and the regression model approach the same problem from different angles. The tree is designed to uncover decision rules and thresholds, while the regression summarizes average conditional relationships. When the main regression predictors also appear near the top of the tree, that strengthens the overall story, and that is the case here: winpct is both the first split in the tree and the only predictor that is significant under both classical and robust standard errors. Where the tree emphasizes thresholds that the regression smooths over, that suggests the attendance process may be nonlinear.

Conclusion

The stock portion of the assignment shows why time series structure matters before running level-based regressions. Unit root testing identifies whether the series are stationary or integrated, and Engle-Granger testing determines whether any of the nonstationary series share a long-run equilibrium. Only if that equilibrium is present does an error correction model make sense.

The no-show portion shows the value of combining machine-learning-style partitioning with classical regression. The tree provides an intuitive picture of how the predictors split the data, while the regression model provides coefficient-based estimates, multicollinearity checks, and heteroskedasticity-robust inference. Together, the two approaches provide a fuller explanation of football no-show behavior.

Time Series and No-Show Analysis

DJ Barry