Modeling Changes in Maximal Vertical Jump Heights with Sparse Information

Rasim Muzaffer Musal, Kevin McCurdy (plots assisted by ChatGPT)

Practical Motivation

  • A large body of work discusses how, whether, and to what extent a covariate affects athletic performance.

  • The literature on the Maximal Vertical Jump (MVJ) of athletes, measured with a Vertec device, is no exception.

  • We want to build a model that helps athletes and trainers develop a comprehensive training plan.

  • MVJ is an important indicator of athletic ability.

Statistical Motivation

  • For each athlete, a multitude of covariates has been measured.

  • The number of MVJ measurements is small.

  • This is known as the large-P, small-N problem.

  • A common solution treats it as a variable selection problem, addressed with information-theoretic methods: AIC, BIC, AICc.

  • Other approaches include dimension reduction, such as PCA, and penalized regression models.

  • All of these approaches either remove variables from the model entirely or fail to penalize the larger variable effects.

Motivation

  • In athletic training, many covariates can affect the outcome in incremental ways.

  • If some covariates have larger effects, those are the ones picked by information-theoretic methods; the same holds for penalized regression models.

  • A comprehensive approach is important for athletes and their trainers.

Data: Target Variable

- Texas State University Women’s Basketball Team: \(MVJ_{dif}\), the change in an athlete’s maximal vertical jump height.

Data: Covariates

  • \(WL_{t,j}\): Workload t days from MVJ measurement for athlete j
  • \(IL_{t,j}\): Intensity Load, measured as Workload/Time, t days from MVJ measurement for athlete j
  • \(DL_{t,j}\): \(WL_{t,j} \times IL_{t,j}\) for athlete j
  • Survey questions on \(Stress_{j}\), \(Sleep_{j}\), \(Recovery_{j}\) for athlete j
  • \(RT_{j}\): Resistance Training in the gym per week for athlete j
  • Athlete Baseline

Mandatory Preliminaries

- Parameters of the probability distributions are assigned a prior distribution: \[f(\boldsymbol{\theta})\]

  • Likelihood \[ f( \mathbf{Y} \vert \boldsymbol{\theta} ) = \prod_{i=1}^{n} f( Y_{i} \vert \boldsymbol{\theta}) \]

  • The larger this quantity, the better the function fits the observations.

Mandatory Preliminaries

- Posterior of the parameters: \[f(\boldsymbol{\theta} \vert \mathbf{Y})=\frac{\prod_{i=1}^{n} f( Y_{i} \vert \boldsymbol{\theta}) \times f(\boldsymbol{\theta})}{\int_{\boldsymbol{\theta}} \prod_{i=1}^{n} f( Y_{i} \vert \boldsymbol{\theta}) \times f(\boldsymbol{\theta}) \, d\boldsymbol{\theta}}=\frac{f(\mathbf{Y},\boldsymbol{\theta})}{f(\mathbf{Y})}\] The integral in the denominator is usually not easy to evaluate, so MCMC methods are used.
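
A minimal sketch of one such MCMC algorithm (random-walk Metropolis), for intuition only: the toy normal likelihood, standard normal prior, and all tuning values are illustrative assumptions, not the model used in this work.

```python
import numpy as np

def log_post(theta, y):
    # Unnormalized log posterior: normal likelihood (sd 1) + standard normal prior.
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * theta ** 2

def metropolis(y, n_iter=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta, draws = 0.0, np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        # Accept with probability min(1, posterior ratio); f(Y) cancels out.
        if np.log(rng.uniform()) < log_post(prop, y) - log_post(theta, y):
            theta = prop
        draws[i] = theta
    return draws

y = np.random.default_rng(1).normal(1.2, 1.0, size=20)  # toy data
theta_draws = metropolis(y)
```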

Mandatory Preliminaries

  • Posterior Predictives: generate \(M \times K\) new posterior predictive values \(Y^{new}\), K from each of the M posterior parameter draws \((\theta^{+})\).

\[f(Y^{new} \vert \theta^{+}) \]
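
A minimal sketch of this step, under the same toy normal model assumed above; `theta_draws` is a stand-in for real MCMC output.

```python
import numpy as np

# Generate K draws of Y^new per posterior draw theta^+, for M x K values total.
rng = np.random.default_rng(2)
M, K = 1000, 10
theta_draws = rng.normal(1.0, 0.2, size=M)  # stand-in for M posterior draws
y_new = theta_draws[:, None] + rng.standard_normal((M, K))
print(y_new.shape)  # (M, K)
```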

Literature on Variable Selection

  • Different approaches exist.

    • Information theoretic methods eliminate variables
    • Penalized Likelihood models either eliminate or penalize coefficients
  • The Bayesian approach may look like penalized likelihood, but it is not.

  • The priors ensure that all coefficients, including the larger ones, are penalized, but no coefficient is set exactly to 0.

Frequentist Variable Selection

  • AIC, BIC, and AICc are all based on the same idea: if an added coefficient does not improve the likelihood enough to offset the penalty it incurs, eliminate it from the model.
\[\begin{aligned} AIC & = -2\times L(\boldsymbol{\theta};MVJ_{dif}) + 2\times p \\ BIC & = -2\times L(\boldsymbol{\theta};MVJ_{dif}) + \log(n)\times p \\ AICc &= AIC+ \frac{2 \times p \times (p+1)}{n-p-1} \end{aligned}\]
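
For concreteness, a small sketch computing the three criteria from a maximized log-likelihood; the `loglik`, `p`, and `n` values are placeholders, not results from this study.

```python
import numpy as np

def info_criteria(loglik, p, n):
    # p: number of estimated parameters; n: sample size.
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + np.log(n) * p
    aicc = aic + 2.0 * p * (p + 1) / (n - p - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}

print(info_criteria(loglik=-42.0, p=6, n=33))
```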

BIC Example Table 1

| Coefficient | Estimate | Std. Error | t value | \(p_{v}\) |
|-------------|----------|------------|---------|-----------|
| \(b_{0}\)   | -0.46    | 0.34       | -1.36   | 0.18      |
| \(A_2\)     | -0.34    | 0.53       | -0.63   | 0.53      |
| \(A_3\)     | 0.74     | 0.46       | 1.61    | 0.11      |
| \(A_4\)     | 1.21     | 0.58       | 2.09    | 0.04      |
| \(A_5\)     | 0.32     | 0.56       | 0.57    | 0.57      |
| \(A_6\)     | 1.26     | 0.30       | 4.17    | 0.00      |

BIC Example Table 2

| Coefficient           | Estimate | Std. Error | t value | \(p_{v}\) |
|-----------------------|----------|------------|---------|-----------|
| \(\beta_{WL_1}\)      | -0.40    | 0.14       | -2.92   | 0.01      |
| \(\beta_{WL_4}\)      | 0.46     | 0.15       | 3.13    | 0.00      |
| \(\beta_{IL_5}\)      | 0.22     | 0.12       | 1.87    | 0.07      |
| \(\beta_{DL_2}\)      | -0.26    | 0.13       | -1.93   | 0.06      |
| \(\beta_{Sign_{-1}}\) | -0.69    | 0.21       | -3.20   | 0.00      |
| \(\beta_{Sign_{+1}}\) | -0.29    | 0.24       | -1.21   | 0.23      |
| \(\beta_{Sleep}\)     | 0.44     | 0.12       | 3.75    | 0.00      |
| \(\beta_{Stress}\)    | 0.67     | 0.23       | 2.97    | 0.00      |

Bayesian Model Development

  • Mitchell and Beauchamp 1988 JASA
  • George and McCulloch 1993 JASA
  • Carvalho Polson and Scott 2010 Biometrika
  • Polson and Scott 2011 Bayesian Statistics
  • Piironen and Vehtari 2017 EJS

Mitchell and Beauchamp 1988 JASA

\[Y=\sum_{j=1}^{K} \beta_{j}\times x_{j} +\epsilon \] - Assign a prior to “vulnerable” coefficients so that the \(j\)th such coefficient \(\beta_{j}\) has point mass at 0 and a uniform distribution over the rest of the possible values. \(\epsilon \sim N(0,\sigma^{2})\); \(\sigma\) is assigned a prior distribution.

  • The paper’s goal is to find the probabilities of subset models; we set that aside here.

Mitchell and Beauchamp 1988

  • \(f(\beta_{j} \vert \beta_{j}\ne 0)\sim Uniform(-f_{j},+f_{j})\), \(\Pr(\beta_{j}=0)=h_{0,j}\)
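
A minimal sketch of drawing coefficients from this spike-and-slab prior; the values of `h0`, `f`, and `J` are illustrative assumptions, not the paper’s.

```python
import numpy as np

# beta_j = 0 with probability h0; otherwise Uniform(-f, +f).
rng = np.random.default_rng(3)
h0, f, J = 0.5, 10.0, 8
spike = rng.uniform(size=J) < h0
beta = np.where(spike, 0.0, rng.uniform(-f, f, size=J))
```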

George and McCulloch 1993

  • Stochastic Search Variable Selection
\[\begin{aligned} & f(\beta_{j} \vert\gamma_{j})\sim (1-\gamma_{j})N(0,\tau^{2}_{j})+\gamma_{j}N(0,c^{2}_{j}\tau^{2}_{j}) \\ & \gamma_{j} \in \{0,1\},\ \tau_{j}>0 \text{ small},\ c_{j}>1 \text{ large} \\ & \text{if }\gamma_{j}= 0 \text{ then } \beta_{j} \approx 0 \text{ otherwise } \beta_{j} \ne 0 \\ & \boldsymbol{\gamma} \sim Bernoulli(\mathbf{p}), \quad \sigma^{2}\vert \boldsymbol{\gamma} \sim IG(\nu_{\gamma}/2,\nu_{\gamma}\lambda_{\gamma}/2) \end{aligned}\]
  • \(\nu_{\gamma}\) and \(\lambda_{\gamma}\) will also be assigned priors that depend on how many non-zero values we think there might be. More non-zero \(\beta\) values imply a smaller \(\sigma^{2}\) for Y. A sketch of this prior follows.
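
A minimal sketch of drawing coefficients from the SSVS mixture prior; `tau`, `c`, `p`, and `J` are illustrative values (\(\tau\) small, \(c\) large), not the paper’s choices.

```python
import numpy as np

# gamma_j ~ Bernoulli(p); beta_j comes from the narrow N(0, tau^2) when
# gamma_j = 0 and from the wide N(0, c^2 tau^2) when gamma_j = 1.
rng = np.random.default_rng(4)
tau, c, p, J = 0.05, 10.0, 0.3, 8
gamma = rng.uniform(size=J) < p
sd = np.where(gamma, c * tau, tau)
beta = rng.normal(0.0, sd)
```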

1988 \(\&\) 1993

\[\begin{aligned} f(\beta_{j} \vert \lambda_{j})\sim & \lambda_{j}N(0,c^{2}) +(1-\lambda_{j})N(0,\epsilon^2)\\ p(\lambda_{j})\sim & Ber(\pi), \quad \epsilon \ll c , \text{ often } \epsilon=0\\ \text{c and } \pi: & \text{ different priors possible } \end{aligned}\]
  • if \(\epsilon=0\), the above can be written as

\[ \begin{aligned} f(\beta_{j} \vert \lambda_{j}, c)\sim & N(0,c^{2}\lambda^{2}_{j}), \quad \lambda_{j} \in \{0,1\}\\ p(\lambda_{j})\sim & Ber(\pi)\\ \text{c and } \pi: & \text{ different priors possible }\\ \end{aligned} \]

Carvalho Polson and Scott 2010

  • The Horseshoe estimator is the posterior mean \(E(\beta_{j} \vert y)\) under the following prior:

\[\begin{aligned} f(Y \vert \beta)\sim N(\beta,\sigma^{2} I) \\ f(\beta_{j} \vert \lambda_{j}) \sim N(0,\lambda^{2}_{j}),\\ f(\lambda_{j} \vert \tau) \sim C^{+}(0,\tau),\\ f(\tau \vert \sigma) \sim C^{+}(0,\sigma). \end{aligned} \]
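
A sketch of forward draws from this hierarchy (with \(\sigma\) fixed at 1 for simplicity), using the fact that a \(C^{+}(0,s)\) draw is the absolute value of \(s\) times a standard Cauchy draw.

```python
import numpy as np

rng = np.random.default_rng(5)
J, sigma = 1000, 1.0
tau = np.abs(sigma * rng.standard_cauchy())  # tau | sigma ~ C+(0, sigma)
lam = np.abs(tau * rng.standard_cauchy(J))   # lambda_j | tau ~ C+(0, tau)
beta = rng.normal(0.0, lam)                  # beta_j | lambda_j ~ N(0, lambda_j^2)
```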

Why the phrase horseshoe?

  • Assume \(\sigma^{2}\) and \(\tau^{2}\) are 1.
  • Define the shrinkage weight \(\kappa_{j}=\frac{1}{1+\lambda^{2}_{j}}\).
  • \(E(\beta_{j}\vert y)= \int_{0}^{1}(1-\kappa_{j})y_{j}f(\kappa_{j}\vert y)\,d\kappa_{j}=[1-E(\kappa_{j}\vert y)]y_{j}\)
  • If \(\kappa_{j}=1\), \(\beta_{j}\) is 0; if \(\kappa_{j}=0\), \(\beta_{j}\) is preserved.

Why the phrase horseshoe?

  • With \(\lambda_{j} \sim C^{+}(0,1)\) and \(\tau=\sigma=1\), the shrinkage weight \(\kappa_{j}\) follows a \(Beta(1/2,1/2)\) distribution, whose U-shaped density, high near 0 (no shrinkage) and near 1 (total shrinkage), resembles a horseshoe.
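
A quick simulation check of this fact, as a sketch:

```python
import numpy as np

# With lambda_j ~ C+(0,1), kappa_j = 1/(1 + lambda_j^2) follows Beta(1/2, 1/2);
# its histogram is tall at both ends and low in the middle: the "horseshoe".
rng = np.random.default_rng(6)
lam = np.abs(rng.standard_cauchy(100_000))
kappa = 1.0 / (1.0 + lam ** 2)
hist, _ = np.histogram(kappa, bins=10, range=(0, 1), density=True)
print(np.round(hist, 2))
```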

Piironen and Vehtari 2017

  • Another way to write the horseshoe prior: \[\begin{aligned} f(Y \vert \beta)\sim N(\beta,\sigma^{2} I) \\ f(\beta_{j} \vert \lambda_{j}, \tau) \sim N(0,\lambda^{2}_{j}\tau^{2}),\\ f(\lambda_{j}) \sim C^{+}(0,1). \end{aligned} \] \(\tau\) is the global shrinkage parameter; \(\lambda_{j}\) is what allows some parameters to escape total shrinkage.

Piironen and Vehtari 2017

  • Issues to highlight for the horseshoe prior:
    • The prior on \(\tau\) has a very large effect.
    • Large signals are left unregularized: either \(\beta_{j} \approx 0\) or \(\beta_{j}= \hat{\beta_{j}}\).
    • In problems with sparse information, large \(\beta_{j}\) can make the other coefficients vanish.

Model

\[\begin{aligned} f(\beta_{j} \vert \hat{\lambda}_{j},\tau,c)\sim N(0,\tau^{2}\hat{\lambda}^{2}_{j})\\ \hat{\lambda}^{2}_{j}=\frac{c^{2}\lambda^{2}_{j}}{c^{2}+\tau^{2}\lambda^{2}_{j}}\\ \lambda_{j} \sim C^{+}(0,1) \end{aligned}\]
  • If \(\tau^{2}\lambda^{2}_{j} \ll c^{2}\), then \(\hat{\lambda}_{j} \to \lambda_{j}\) and \(\beta_{j}\) stays close to 0

  • If \(\tau^{2}\lambda^{2}_{j} \gg c^{2}\), then \(\hat{\lambda}^{2}_{j} \to \frac{c^{2}}{\tau^{2}}\) and \(\beta_{j} \approx N(0,c^{2})\)

  • \(c^{2}\sim InvGamma(\alpha_c,\beta_c)\), with \(\alpha_{c}= \frac{\nu}{2}\), \(\beta_{c} = \frac{\nu s^{2}}{2}\); a sketch of this prior follows.
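
A sketch of forward draws from the regularized horseshoe; `tau`, `nu`, and `s` are illustrative values, not the paper’s choices.

```python
import numpy as np

# lambda_hat^2 interpolates between lambda_j^2 (small signals) and
# c^2/tau^2 (large signals), so big coefficients are still regularized.
rng = np.random.default_rng(7)
J, tau, nu, s = 1000, 0.1, 4.0, 2.0
c2 = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * s**2))  # c^2 ~ InvGamma(nu/2, nu s^2/2)
lam = np.abs(rng.standard_cauchy(J))               # lambda_j ~ C+(0, 1)
lam_hat2 = c2 * lam**2 / (c2 + tau**2 * lam**2)
beta = rng.normal(0.0, tau * np.sqrt(lam_hat2))    # beta_j ~ N(0, tau^2 lambda_hat_j^2)
```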

Results

\(MVJ_{T_{j}}\) Model Comparisons

  • AIC=0.89
  • AICc=0.94
  • BIC=1.02
  • \(Bayes_{M2}\)

Limitations

  • Theoretical and practical limitations abound.
  • More vertical jump measurements are needed, from more athletes.
  • Pre-season covariate data are needed.
  • Resistance training could be recorded in more detail.
  • Decision support from statisticians to the coaches is needed.

Questions?