Modeling Changes in Maximal Vertical Jump Heights with Sparse Information

Rasim Muzaffer Musal, Kevin McCurdy (plots assisted by ChatGPT)

Practical Motivation

  • A large body of work discusses how, whether, and to what extent a covariate affects athletic performance.

  • The literature on the Maximal Vertical Jump (MVJ) of athletes, measured with a Vertec device, is no exception.

  • We want to build a model that helps athletes and trainers develop a comprehensive training plan.

  • MVJ is an important indicator of athletic ability.

Statistical Motivation

  • For each athlete, a multitude of covariates has been measured.

  • The number of MVJ measurements is small.

  • This is known as the large-P, small-N problem.

  • A common solution treats it as a variable selection problem, addressed with information-theoretic methods: AIC, BIC, AICc.

  • Other approaches include dimension reduction, such as PCA, and penalized regression models.

  • All of these approaches either remove variables from the model entirely or fail to penalize the larger variable effects.

Motivation

  • In athletic training, many covariates can affect the outcome in incremental ways.

  • If some covariates have larger effects, those are the ones picked by information-theoretic methods; the same holds for penalized regression models.

  • A comprehensive approach is important for athletes and their trainers.

Data: Target Variable

- Texas State University Women’s Basketball Team: \(MVJ_{dif}\), the change in an athlete’s maximal vertical jump height.

Data: Covariates

  • \(WL_{t,j}\): Workload t days from MVJ measurement for athlete j
  • \(IL_{t,j}\): Intensity Load, measured as Workload/Time, t days from MVJ measurement for athlete j
  • \(DL_{t,j}\): \(WL_{t,j} \times IL_{t,j}\) for athlete j
  • Survey questions on \(Stress_{j}\), \(Sleep_{j}\), \(Recovery_{j}\) for athlete j
  • \(RT_{j}\): Resistance Training in the gym per week for athlete j
  • Athlete Baseline

Mandatory Preliminaries

- Parameters of the probability distributions are assigned a prior distribution: \[f(\boldsymbol{\theta})\]

  • Likelihood \[ f( \mathbf{Y} \vert \boldsymbol{\theta} ) = \prod_{i=1}^{n} f( Y_{i} \vert \boldsymbol{\theta}) \]

  • The larger this quantity, the better the function fits the observations.

Mandatory Preliminaries

- Posterior of the parameters: \[f(\boldsymbol{\theta} \vert \mathbf{Y})=\frac{\prod_{i=1}^{n} f( Y_{i} \vert \boldsymbol{\theta}) \times f(\boldsymbol{\theta})}{\int_{\boldsymbol{\theta}} \prod_{i=1}^{n} f( Y_{i} \vert \boldsymbol{\theta}) \times f(\boldsymbol{\theta}) \, d\boldsymbol{\theta}}=\frac{f(\mathbf{Y},\boldsymbol{\theta})}{f(\mathbf{Y})}\] The integral in the denominator is usually not easy to evaluate, so MCMC methods are used.
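
A minimal sketch of one such MCMC algorithm (random-walk Metropolis), for intuition only: the toy normal likelihood, standard normal prior, and all tuning values are illustrative assumptions, not the model used in this work.

```python
import numpy as np

def log_post(theta, y):
    # Unnormalized log posterior: normal likelihood (sd 1) + standard normal prior.
    return -0.5 * np.sum((y - theta) ** 2) - 0.5 * theta ** 2

def metropolis(y, n_iter=5000, step=0.5, seed=0):
    rng = np.random.default_rng(seed)
    theta, draws = 0.0, np.empty(n_iter)
    for i in range(n_iter):
        prop = theta + step * rng.standard_normal()
        # Accept with probability min(1, posterior ratio); f(Y) cancels out.
        if np.log(rng.uniform()) < log_post(prop, y) - log_post(theta, y):
            theta = prop
        draws[i] = theta
    return draws

y = np.random.default_rng(1).normal(1.2, 1.0, size=20)  # toy data
theta_draws = metropolis(y)
```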

Mandatory Preliminaries

  • Posterior Predictives: generate \(M \times K\) new posterior predictive values \(Y^{new}\), K from each of the M posterior parameter draws \((\theta^{+})\).

\[f(Y^{new} \vert \theta^{+}) \]
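
A minimal sketch of this step, under the same toy normal model assumed above; `theta_draws` is a stand-in for real MCMC output.

```python
import numpy as np

# Generate K draws of Y^new per posterior draw theta^+, for M x K values total.
rng = np.random.default_rng(2)
M, K = 1000, 10
theta_draws = rng.normal(1.0, 0.2, size=M)  # stand-in for M posterior draws
y_new = theta_draws[:, None] + rng.standard_normal((M, K))
print(y_new.shape)  # (M, K)
```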

Literature on Variable Selection

  • Different approaches exist.

    • Information theoretic methods eliminate variables
    • Penalized Likelihood models either eliminate or penalize coefficients
  • The Bayesian approach may look like penalized likelihood, but it is not.

  • The priors ensure that all coefficients, including the larger ones, are penalized, but no coefficient is set exactly to 0.

Frequentist Variable Selection

  • AIC, BIC, and AICc are all based on the same idea: if an added coefficient does not improve the likelihood enough to offset the penalty it incurs, eliminate it from the model.
\[\begin{aligned} AIC & = -2\times L(\boldsymbol{\theta};MVJ_{dif}) + 2\times p \\ BIC & = -2\times L(\boldsymbol{\theta};MVJ_{dif}) + \log(n)\times p \\ AICc &= AIC+ \frac{2 \times p \times (p+1)}{n-p-1} \end{aligned}\]
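
For concreteness, a small sketch computing the three criteria from a maximized log-likelihood; the `loglik`, `p`, and `n` values are placeholders, not results from this study.

```python
import numpy as np

def info_criteria(loglik, p, n):
    # p: number of estimated parameters; n: sample size.
    aic = -2.0 * loglik + 2.0 * p
    bic = -2.0 * loglik + np.log(n) * p
    aicc = aic + 2.0 * p * (p + 1) / (n - p - 1)
    return {"AIC": aic, "BIC": bic, "AICc": aicc}

print(info_criteria(loglik=-42.0, p=6, n=33))
```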

BIC Example Table 1

| Coefficient | Estimate | Std. Error | t value | \(p_{v}\) |
|-------------|----------|------------|---------|-----------|
| \(b_{0}\)   | -0.46    | 0.34       | -1.36   | 0.18      |
| \(A_2\)     | -0.34    | 0.53       | -0.63   | 0.53      |
| \(A_3\)     | 0.74     | 0.46       | 1.61    | 0.11      |
| \(A_4\)     | 1.21     | 0.58       | 2.09    | 0.04      |
| \(A_5\)     | 0.32     | 0.56       | 0.57    | 0.57      |
| \(A_6\)     | 1.26     | 0.30       | 4.17    | 0.00      |

BIC Example Table 2

| Coefficient           | Estimate | Std. Error | t value | \(p_{v}\) |
|-----------------------|----------|------------|---------|-----------|
| \(\beta_{WL_1}\)      | -0.40    | 0.14       | -2.92   | 0.01      |
| \(\beta_{WL_4}\)      | 0.46     | 0.15       | 3.13    | 0.00      |
| \(\beta_{IL_5}\)      | 0.22     | 0.12       | 1.87    | 0.07      |
| \(\beta_{DL_2}\)      | -0.26    | 0.13       | -1.93   | 0.06      |
| \(\beta_{Sign_{-1}}\) | -0.69    | 0.21       | -3.20   | 0.00      |
| \(\beta_{Sign_{+1}}\) | -0.29    | 0.24       | -1.21   | 0.23      |
| \(\beta_{Sleep}\)     | 0.44     | 0.12       | 3.75    | 0.00      |
| \(\beta_{Stress}\)    | 0.67     | 0.23       | 2.97    | 0.00      |

Bayesian Model Development

  • Mitchell and Beauchamp 1988 JASA
  • George and McCulloch 1993 JASA
  • Carvalho Polson and Scott 2010 Biometrika
  • Polson and Scott 2011 Bayesian Statistics
  • Piironen and Vehtari 2017 EJS

Mitchell and Beauchamp 1988 JASA

\[Y=\sum_{j=1}^{K} \beta_{j}\times x_{j} +\epsilon \] - Assign a prior to “vulnerable” coefficients so that the \(j\)th such coefficient \(\beta_{j}\) has point mass at 0 and a uniform distribution over the rest of the possible values. \(\epsilon \sim N(0,\sigma^{2})\); \(\sigma\) is assigned a prior distribution.

  • The paper’s goal is to find the probabilities of subset models; we set that aside here.

Mitchell and Beauchamp 1988

  • \(f(\beta_{j} \vert \beta_{j}\ne 0)\sim Uniform(-f_{j},+f_{j})\), \(\Pr(\beta_{j}=0)=h_{0,j}\)
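
A minimal sketch of drawing coefficients from this spike-and-slab prior; the values of `h0`, `f`, and `J` are illustrative assumptions, not the paper’s.

```python
import numpy as np

# beta_j = 0 with probability h0; otherwise Uniform(-f, +f).
rng = np.random.default_rng(3)
h0, f, J = 0.5, 10.0, 8
spike = rng.uniform(size=J) < h0
beta = np.where(spike, 0.0, rng.uniform(-f, f, size=J))
```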

George and McCulloch 1993

  • Stochastic Search Variable Selection
\[\begin{aligned} & f(\beta_{j} \vert\gamma_{j})\sim (1-\gamma_{j})N(0,\tau^{2}_{j})+\gamma_{j}N(0,c^{2}_{j}\tau^{2}_{j}) \\ & \gamma_{j} \in \{0,1\},\ \tau_{j}>0 \text{ small},\ c_{j}>1 \text{ large} \\ & \text{if }\gamma_{j}= 0 \text{ then } \beta_{j} \approx 0 \text{ otherwise } \beta_{j} \ne 0 \\ & \boldsymbol{\gamma} \sim Bernoulli(\mathbf{p}), \quad \sigma^{2}\vert \boldsymbol{\gamma} \sim IG(\nu_{\gamma}/2,\nu_{\gamma}\lambda_{\gamma}/2) \end{aligned}\]
  • \(\nu_{\gamma}\) and \(\lambda_{\gamma}\) will also be assigned priors that depend on how many non-zero values we think there might be. More non-zero \(\beta\) values imply a smaller \(\sigma^{2}\) for Y. A sketch of this prior follows.
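
A minimal sketch of drawing coefficients from the SSVS mixture prior; `tau`, `c`, `p`, and `J` are illustrative values (\(\tau\) small, \(c\) large), not the paper’s choices.

```python
import numpy as np

# gamma_j ~ Bernoulli(p); beta_j comes from the narrow N(0, tau^2) when
# gamma_j = 0 and from the wide N(0, c^2 tau^2) when gamma_j = 1.
rng = np.random.default_rng(4)
tau, c, p, J = 0.05, 10.0, 0.3, 8
gamma = rng.uniform(size=J) < p
sd = np.where(gamma, c * tau, tau)
beta = rng.normal(0.0, sd)
```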

1988 \(\&\) 1993

\[\begin{aligned} f(\beta_{j} \vert \lambda_{j})\sim & \lambda_{j}N(0,c^{2}) +(1-\lambda_{j})N(0,\epsilon^2)\\ p(\lambda_{j})\sim & Ber(\pi), \quad \epsilon \ll c , \text{ often } \epsilon=0\\ \text{c and } \pi: & \text{ different priors possible } \end{aligned}\]
  • if \(\epsilon=0\), the above can be written as

\[ \begin{aligned} f(\beta_{j} \vert \lambda_{j}, c)\sim & N(0,c^{2}\lambda^{2}_{j}), \quad \lambda_{j} \in \{0,1\}\\ p(\lambda_{j})\sim & Ber(\pi)\\ \text{c and } \pi: & \text{ different priors possible }\\ \end{aligned} \]

Carvalho Polson and Scott 2010

  • The Horseshoe estimator is the posterior mean \(E(\beta_{j} \vert y)\) under the following prior:

\[\begin{aligned} f(Y \vert \beta)\sim N(\beta,\sigma^{2} I) \\ f(\beta_{j} \vert \lambda_{j}) \sim N(0,\lambda^{2}_{j}),\\ f(\lambda_{j} \vert \tau) \sim C^{+}(0,\tau),\\ f(\tau \vert \sigma) \sim C^{+}(0,\sigma). \end{aligned} \]
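
A sketch of forward draws from this hierarchy (with \(\sigma\) fixed at 1 for simplicity), using the fact that a \(C^{+}(0,s)\) draw is the absolute value of \(s\) times a standard Cauchy draw.

```python
import numpy as np

rng = np.random.default_rng(5)
J, sigma = 1000, 1.0
tau = np.abs(sigma * rng.standard_cauchy())  # tau | sigma ~ C+(0, sigma)
lam = np.abs(tau * rng.standard_cauchy(J))   # lambda_j | tau ~ C+(0, tau)
beta = rng.normal(0.0, lam)                  # beta_j | lambda_j ~ N(0, lambda_j^2)
```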

Why the phrase horseshoe?

  • Assume \(\sigma^{2}\) and \(\tau^{2}\) are 1.
  • Define the shrinkage weight \(\kappa_{j}=\frac{1}{1+\lambda^{2}_{j}}\).
  • \(E(\beta_{j}\vert y)= \int_{0}^{1}(1-\kappa_{j})y_{j}f(\kappa_{j}\vert y)\,d\kappa_{j}=[1-E(\kappa_{j}\vert y)]y_{j}\)
  • If \(\kappa_{j}=1\), \(\beta_{j}\) is 0; if \(\kappa_{j}=0\), \(\beta_{j}\) is preserved.

Why the phrase horseshoe?

  • With \(\lambda_{j} \sim C^{+}(0,1)\) and \(\tau=\sigma=1\), the shrinkage weight \(\kappa_{j}\) follows a \(Beta(1/2,1/2)\) distribution, whose U-shaped density, high near 0 (no shrinkage) and near 1 (total shrinkage), resembles a horseshoe.
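
A quick simulation check of this fact, as a sketch:

```python
import numpy as np

# With lambda_j ~ C+(0,1), kappa_j = 1/(1 + lambda_j^2) follows Beta(1/2, 1/2);
# its histogram is tall at both ends and low in the middle: the "horseshoe".
rng = np.random.default_rng(6)
lam = np.abs(rng.standard_cauchy(100_000))
kappa = 1.0 / (1.0 + lam ** 2)
hist, _ = np.histogram(kappa, bins=10, range=(0, 1), density=True)
print(np.round(hist, 2))
```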

Piironen and Vehtari 2017

  • Another way to write the horseshoe prior: \[\begin{aligned} f(Y \vert \beta)\sim N(\beta,\sigma^{2} I) \\ f(\beta_{j} \vert \lambda_{j}, \tau) \sim N(0,\lambda^{2}_{j}\tau^{2}),\\ f(\lambda_{j}) \sim C^{+}(0,1). \end{aligned} \] \(\tau\) is the global shrinkage parameter; \(\lambda_{j}\) is what allows some parameters to escape total shrinkage.

Piironen and Vehtari 2017

  • Issues to highlight for the horseshoe prior:
    • The prior on \(\tau\) has a very large effect.
    • Large signals are left unregularized: either \(\beta_{j} \approx 0\) or \(\beta_{j}= \hat{\beta_{j}}\).
    • In problems with sparse information, large \(\beta_{j}\) can make the other coefficients vanish.

Model

\[\begin{aligned} f(\beta_{j} \vert \hat{\lambda}_{j},\tau,c)\sim N(0,\tau^{2}\hat{\lambda}^{2}_{j})\\ \hat{\lambda}^{2}_{j}=\frac{c^{2}\lambda^{2}_{j}}{c^{2}+\tau^{2}\lambda^{2}_{j}}\\ \lambda_{j} \sim C^{+}(0,1) \end{aligned}\]
  • If \(\tau^{2}\lambda^{2}_{j} \ll c^{2}\), then \(\hat{\lambda}_{j} \to \lambda_{j}\) and \(\beta_{j}\) stays close to 0

  • If \(\tau^{2}\lambda^{2}_{j} \gg c^{2}\), then \(\hat{\lambda}^{2}_{j} \to \frac{c^{2}}{\tau^{2}}\) and \(\beta_{j} \approx N(0,c^{2})\)

  • \(c^{2}\sim InvGamma(\alpha_c,\beta_c)\), with \(\alpha_{c}= \frac{\nu}{2}\), \(\beta_{c} = \frac{\nu s^{2}}{2}\); a sketch of this prior follows.
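
A sketch of forward draws from the regularized horseshoe; `tau`, `nu`, and `s` are illustrative values, not the paper’s choices.

```python
import numpy as np

# lambda_hat^2 interpolates between lambda_j^2 (small signals) and
# c^2/tau^2 (large signals), so big coefficients are still regularized.
rng = np.random.default_rng(7)
J, tau, nu, s = 1000, 0.1, 4.0, 2.0
c2 = 1.0 / rng.gamma(nu / 2.0, 2.0 / (nu * s**2))  # c^2 ~ InvGamma(nu/2, nu s^2/2)
lam = np.abs(rng.standard_cauchy(J))               # lambda_j ~ C+(0, 1)
lam_hat2 = c2 * lam**2 / (c2 + tau**2 * lam**2)
beta = rng.normal(0.0, tau * np.sqrt(lam_hat2))    # beta_j ~ N(0, tau^2 lambda_hat_j^2)
```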

Results

\(MVJ_{T_{j}}\) Model Comparisons

  • AIC=0.89
  • AICc=0.94
  • BIC=1.02
  • \(Bayes_{M2}\)

Limitations

  • Theoretical and practical limitations abound.
  • More vertical jump measurements are needed, from more athletes.
  • Pre-season covariate data are needed.
  • Resistance training could be recorded in more detail.
  • Decision support from statisticians to the coaches is needed.

Questions?