Human Movement 2026: Inferential Statistics

Inferential statistics

If we want to see whether there are any systematic differences between our conditions, we need to run inferential statistics. First, though, it is good practice to check whether our data meets the assumption that it is “normally distributed”. See the introduction to R workbook from a few weeks ago for more details.

Histograms and density plots

We can visualise our data to see if it looks normally distributed (a bell-shaped curve) using histograms and/or density plots. A really simple histogram can be produced in base R like so:

hist(cmj$cmj_height)

The data looks pretty good. We can improve this by calculating the density function, which will give a better picture of the distribution curve. Note I’ve added a title and an x label here. Try changing colors or text and making this your own.

hist(cmj$cmj_height,
     freq = FALSE,           # IMPORTANT: histogram must show density, not counts
     col = "lightgray",
     border = "white",
     main = "CMJ Height Distribution",
     xlab = "CMJ Height")

lines(density(cmj$cmj_height),
      col = "red",
      lwd = 2)
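If you want something to compare the red density line against, one option is to draw the normal curve that has the same mean and SD as the data. This is only a sketch: the `height` vector below is simulated stand-in data (not the workbook's `cmj` dataset), and the colours and legend text are arbitrary choices.

```r
# Simulated stand-in for cmj$cmj_height (NOT the workbook data)
set.seed(42)
height <- rnorm(36, mean = 29, sd = 4)

# Histogram on the density scale so curves can be overlaid
hist(height,
     freq = FALSE,
     col = "lightgray",
     border = "white",
     main = "CMJ Height Distribution",
     xlab = "CMJ Height (cm)")

# Kernel density estimate of the observed data
dens <- density(height)
lines(dens, col = "red", lwd = 2)

# Normal curve with the same mean and SD as the data, for reference
curve(dnorm(x, mean = mean(height), sd = sd(height)),
      add = TRUE, col = "blue", lwd = 2, lty = 2)

legend("topright",
       legend = c("Data density", "Normal reference"),
       col = c("red", "blue"), lwd = 2, lty = c(1, 2), bty = "n")
```

If the red and blue lines roughly follow each other, the bell-shaped assumption looks reasonable. To try it on the real data, swap `height` for `cmj$cmj_height`.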

Q-Q plot

The Q-Q plot, or quantile-quantile plot, is a graphical tool to help us assess if a set of data plausibly came from some theoretical distribution such as a Normal or exponential. For example, if we run a statistical analysis that assumes our dependent variable is Normally distributed, we can use a Normal Q-Q plot to check that assumption. It’s just a visual check, not an air-tight proof, so it is somewhat subjective. But it allows us to see at a glance if our assumption is plausible and, if not, how the assumption is violated and which data points contribute to the violation.

A Q-Q plot is a scatterplot created by plotting two sets of quantiles against one another. If both sets of quantiles came from the same distribution, we should see the points forming a line that’s roughly straight.
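To make the "two sets of quantiles" idea concrete, here is a minimal sketch that builds a Normal Q-Q plot by hand. The `height` vector is simulated stand-in data (not the workbook's dataset), and the variable names are just for illustration.

```r
# Simulated stand-in for cmj$cmj_height (NOT the workbook data)
set.seed(42)
height <- rnorm(36, mean = 29, sd = 4)
n <- length(height)

# Quantile set 1: the observed data, sorted from smallest to largest
sample_q <- sort(height)

# Quantile set 2: where n evenly spread points would fall on a standard normal
theoretical_q <- qnorm(ppoints(n))

# Plot one set of quantiles against the other
plot(theoretical_q, sample_q,
     xlab = "Theoretical quantiles",
     ylab = "Sample quantiles",
     main = "Normal Q-Q plot built by hand")

# Reference line through the first and third quartiles
qqline(height)
```

If the points hug the line, the data behaves like a sample from a normal distribution; points that bend away at the ends suggest heavy or light tails.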

We can do this in base R or with library(car); you might need to install the car package if you don’t have it already.

To produce a Q-Q plot in base R:

qqnorm(cmj$cmj_height); qqline(cmj$cmj_height)

Or we can use library(car) as such:

library(car) # you might want to add this to the top of your script to keep it neat and tidy

qqPlot(cmj$cmj_height)

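[1]  1 23

The two numbers printed by qqPlot() are the row numbers of the two most extreme points, which are also labelled on the plot.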

Shapiro-Wilk normality test

The Shapiro–Wilk test is a test of normality and simply tells us if the data is significantly different from normal. Here the null hypothesis is that the data is normally distributed, and the p-value is the probability of seeing data at least this far from normal if the null hypothesis is true. Remember that p-values are crude: in reality, why is p = 0.051 any better than p = 0.049? As such, this is also not “air-tight”.

The code below runs the Shapiro-Wilk test for CMJ height, and we can report the statistic and p-value as such: (W = 0.????, p = 0.???).

shapiro.test(cmj$cmj_height)


    Shapiro-Wilk normality test

data:  cmj$cmj_height
W = 0.97901, p-value = 0.7119

➡️ Key takeaway:

This is consistent with what our plots show: there is no strong evidence that this data is different from normal.

So let’s crack on with a standard statistical approach.
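Rather than copying W and p from the console by hand, you can pull them out of the test result and format them for your write-up. A minimal sketch, using simulated stand-in data (not the workbook's `cmj` dataset); the names `sw` and `report` are just illustrative.

```r
# Simulated stand-in for cmj$cmj_height (NOT the workbook data)
set.seed(42)
height <- rnorm(36, mean = 29, sd = 4)

# Store the test result instead of only printing it
sw <- shapiro.test(height)

# The result is a list: $statistic holds W and $p.value holds the p-value
report <- sprintf("W = %.3f, p = %.3f", sw$statistic, sw$p.value)
print(report)
```

On the real data you would replace `height` with `cmj$cmj_height`, and the string should match the values printed in the console output.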

Statistical analysis

To understand if there are any differences between our conditions we need to fit an appropriate model. We will also need to install and load some new packages (sorry!)

library(lme4)
library(lmerTest)
library(emmeans)

We’re using a mixed model because each participant completed jumps in all three conditions. That means their results are not independent: the same person tends to jump in a similar way each time.

The model looks at how condition affects jump height, while accounting for differences between people.

Before we run the model we’ll set the order of the levels of our fixed effect (condition) so that we are always comparing to our control condition. Otherwise R orders the levels alphabetically, which would make Away the reference and compare Control and Towards to it.

cmj$condition <- factor(cmj$condition, levels = c("Control", "Away", "Towards"))
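You can see what this line changes by checking the factor levels before and after. A small sketch with a made-up `condition` vector (the real one lives in `cmj$condition`):

```r
# A made-up vector of condition labels, for illustration only
condition <- c("Control", "Away", "Towards", "Control", "Away", "Towards")

# Default behaviour: levels are sorted alphabetically, so "Away" comes first
default_order <- levels(factor(condition))
print(default_order)

# Setting the levels explicitly puts "Control" first, making it the reference
releveled <- factor(condition, levels = c("Control", "Away", "Towards"))
print(levels(releveled))
```

The first level is the one every other condition is compared against in the model summary, which is why we want Control there.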

Now let’s run our model and call it m1:

m1 <- lmer(cmj_height ~ condition + (1|id), data = cmj)

The (1 | id) part tells R to give every participant their own starting level (random intercept), so the analysis separates:

Differences between people (some jump higher than others)
Differences within a person across conditions (the effect we care about)

This is the same idea as a repeated‑measures ANOVA, but mixed models handle this design more flexibly.
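For comparison, here is roughly what the repeated-measures ANOVA version looks like in base R. This is a sketch on simulated data (12 made-up participants, three conditions, no true condition effect), not the workbook's dataset; the SDs used to simulate it are assumptions chosen only to look plausible.

```r
# Simulate a balanced repeated-measures dataset (NOT the workbook data)
set.seed(1)
n_id <- 12
conditions <- c("Control", "Away", "Towards")

sim <- data.frame(
  id = factor(rep(seq_len(n_id), each = 3)),
  condition = factor(rep(conditions, times = n_id), levels = conditions)
)

# Each person gets their own typical jump height (between-person differences)
person_level <- rnorm(n_id, mean = 29, sd = 3.8)

# Observed jump = person's own level + trial-to-trial noise
sim$cmj_height <- person_level[as.integer(sim$id)] +
  rnorm(nrow(sim), mean = 0, sd = 1.2)

# Error(id) tells aov() that observations are grouped within participants,
# playing a similar role to (1 | id) in the mixed model
rm_fit <- aov(cmj_height ~ condition + Error(id), data = sim)
rm_summary <- summary(rm_fit)
print(rm_summary)
```

The condition effect is tested in the "Within" part of the output, with 2 and 22 degrees of freedom for this design.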

To show results we’ll use this code:

summary(m1)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: cmj_height ~ condition + (1 | id)
   Data: cmj

REML criterion at convergence: 150.2

Scaled residuals: 
     Min       1Q   Median       3Q      Max 
-1.89146 -0.56256  0.05483  0.71277  1.52094 

Random effects:
 Groups   Name        Variance Std.Dev.
 id       (Intercept) 14.343   3.787   
 Residual              1.397   1.182   
Number of obs: 36, groups:  id, 12

Fixed effects:
                 Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)       28.6167     1.1453 12.4026  24.987 5.53e-12 ***
conditionAway      0.6667     0.4825 22.0000   1.382    0.181    
conditionTowards   0.7083     0.4825 22.0000   1.468    0.156    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) cndtnA
conditinAwy -0.211       
condtnTwrds -0.211  0.500

What does this output mean?

Random effects

id (Intercept): People differ from each other in how high they jump. The model estimates that the variation between participants is quite large (SD ≈ 3.8 cm).
Residual: This is the leftover variation within a person across trials that isn’t explained by condition. Smaller than the between‑person variation (SD ≈ 1.2 cm), which makes sense — people are more similar to themselves than to others.

➡️ Key takeaway:

The model knows that people start at different jump heights, and it adjusts for that.
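One way to put a number on "people differ more from each other than from themselves" is the share of the total variance that sits between participants, often called the intraclass correlation (ICC). This is an extra, not something the workbook requires; the sketch below just does the arithmetic on the two variances printed in the Random effects table.

```r
# Variances copied from the Random effects table in summary(m1)
var_between <- 14.343  # id (Intercept)
var_residual <- 1.397  # Residual

# Proportion of total variance that is due to differences between people
icc <- var_between / (var_between + var_residual)
print(round(icc, 2))
```

A value this close to 1 says most of the spread in jump height comes from who is jumping rather than from trial-to-trial noise, which is exactly why the random intercept matters here.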

Intercept

This is the average jump height in the reference condition (“Control”, as we set this above).

So you can read this as:

“On average, people jumped about 28.6 cm in the control condition.”

Condition effects

These show how much higher or lower the other two conditions are compared to the reference:

Away: about 0.67 cm higher, but not statistically significant (p = 0.181)
Towards: about 0.71 cm higher, also not significant (p = 0.156)

➡️ Key takeaway:

The model doesn’t find clear evidence that jump height changes across the 3 conditions.
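If you are curious where the Std. Error, t value and p-value columns come from, they can be rebuilt from the variances in the output. The sketch below assumes a balanced design (every participant has one jump in every condition, as here) with a random intercept only; the formulas are the textbook ones for that case, not something read out of lme4.

```r
# Numbers copied from summary(m1)
var_between <- 14.343   # id (Intercept) variance
var_residual <- 1.397   # Residual variance
n_id <- 12              # number of participants
est_away <- 0.6667      # conditionAway estimate
df_away <- 22           # df for the condition effects

# SE of the Control mean: both sources of variation contribute
se_intercept <- sqrt((var_between + var_residual) / n_id)

# SE of a within-person difference: between-person variance cancels out
se_condition <- sqrt(2 * var_residual / n_id)

# t value and two-sided p-value for the Away effect
t_away <- est_away / se_condition
p_away <- 2 * pt(-abs(t_away), df = df_away)

print(round(c(se_intercept = se_intercept, se_condition = se_condition,
              t_away = t_away, p_away = p_away), 3))
```

Notice that the condition SE only uses the residual variance. That is the payoff of the repeated-measures design: large differences between people do not blur the comparison between conditions.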

Estimated marginal mean differences

emmeans helps us estimate the average performance in each condition, after the model has already adjusted for the fact that people are different.

It gives us:

the estimated mean in each condition
the difference between conditions
the confidence interval for each difference
a p‑value

This is a safe and clear way to compare conditions in a repeated‑measures design, because the estimates come from the mixed model rather than from raw averages.

emm <- emmeans(m1, ~ condition)
pairs(emm)

 contrast          estimate    SE df t.ratio p.value
 Control - Away     -0.6667 0.483 22  -1.382  0.3674
 Control - Towards  -0.7083 0.483 22  -1.468  0.3251
 Away - Towards     -0.0417 0.483 22  -0.086  0.9959

Degrees-of-freedom method: kenward-roger 
P value adjustment: tukey method for comparing a family of 3 estimates

confint(pairs(emm))

 contrast          estimate    SE df lower.CL upper.CL
 Control - Away     -0.6667 0.483 22    -1.88    0.545
 Control - Towards  -0.7083 0.483 22    -1.92    0.504
 Away - Towards     -0.0417 0.483 22    -1.25    1.170

Degrees-of-freedom method: kenward-roger 
Confidence level used: 0.95 
Conf-level adjustment: tukey method for comparing a family of 3 estimates

So this tells us there is a 0.67 cm increase in the Away condition compared to the control, but the Tukey-adjusted 95% confidence interval ranges from -0.55 to 1.88 cm and the adjusted p-value is 0.367.

This just gives more detail to our conclusion above that we have found no clear evidence that jump height changes across the 3 conditions.
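The p-values here are larger than in summary(m1) because emmeans applies a Tukey adjustment for making three comparisons. As a rough check, the adjusted interval and p-value for Control - Away can be rebuilt in base R from the estimate, SE and df in the table. This is a sketch of the standard Tukey calculation, not a copy of the emmeans internals.

```r
# Numbers copied from the pairs(emm) output for Control - Away
est <- -0.6667
se <- 0.4825
df <- 22
k <- 3        # number of condition means being compared

# Tukey critical value, converted to the t scale
crit <- qtukey(0.95, nmeans = k, df = df) / sqrt(2)

# Tukey-adjusted 95% confidence interval
ci <- est + c(-1, 1) * crit * se

# Tukey-adjusted p-value from the t ratio
p_adj <- ptukey(abs(est / se) * sqrt(2), nmeans = k, df = df,
                lower.tail = FALSE)

print(round(c(lower = ci[1], upper = ci[2], p_adj = p_adj), 3))
```

The result should land very close to the -1.88, 0.545 and 0.3674 shown above, with small differences down to rounding of the inputs.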

Statistical equivalence

This is the final piece of the puzzle — and there’s absolutely no need to go this far unless you want to. What we’re covering now is well beyond standard undergraduate statistics.

But if you’ve understood everything above and feel like a challenge, let’s explore it.

Just because we didn’t find evidence that jump height differs across conditions, we cannot conclude the conditions are the same.

That’s not how traditional null‑hypothesis significance testing works.

To actually test whether the conditions are similar enough, we can use an equivalence test (Lakens, Scheel, and Isager 2018).

Instead of asking “is there a difference?”, an equivalence test asks:

“Is any difference so small that it doesn’t matter in practice?”

To do this, we define a lower and upper limit within which differences are considered trivially small.

This range is defined by the smallest effect size of interest (SESOI), or minimal important difference. To choose it, we need to ask: how much of a change in jump height would actually matter for performance?

There are two sensible ways to choose this:

Option 1: Standardised thresholds

We can set a threshold at 0.2 or 0.5 times the between‑participant SD (a common rule of thumb in sports science).

From our model the between‑participant SD ≈ 3.8 cm

So:

Small effect (0.2 × SD): 0.2 × 3.8 = 0.76 cm
Medium effect (0.5 × SD): 0.5 × 3.8 = 1.9 cm

These numbers suggest:

“Any difference smaller than about 0.8 to 2 cm probably isn’t meaningful.”

Option 2: Use real-world evidence

Datson et al. (2021) surveyed experts in women’s football and reported that meaningful changes in CMJ height are around 2.1 to 3.4 cm.

Those numbers are larger than the standardised thresholds, so they give a wider, more lenient equivalence range: the wider the bounds, the easier it is to conclude equivalence.

Applying this to our results

If we use Datson’s smallest estimate (±2.1 cm) as our equivalence bound, we ask:

➡️ Key takeaway:

Do the 90% confidence intervals for the pairwise differences lie entirely within −2.1 to +2.1 cm?

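If the answer is yes (and in our data it is: even the wider Tukey-adjusted 95% intervals above sit inside ±2.1 cm, so the narrower 90% intervals must too), then we can say the conditions are statistically, and practically, equivalent.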

This gives a much stronger conclusion than “non‑significant” differences.
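The check described above can be sketched in a few lines of base R using the estimates, SE and df from the pairs(emm) output. This is one common way of doing it (two one-sided tests, with unadjusted 90% intervals), written out by hand as an illustration; the column names and the `bound` variable are my own.

```r
# Estimates, SE and df copied from the pairs(emm) output
contrasts <- data.frame(
  contrast = c("Control - Away", "Control - Towards", "Away - Towards"),
  estimate = c(-0.6667, -0.7083, -0.0417),
  se = 0.4825,
  df = 22
)

# Equivalence bound: Datson's smallest estimate, in cm
bound <- 2.1

# Unadjusted 90% confidence interval for each difference
crit <- qt(0.95, df = contrasts$df)
contrasts$lower90 <- contrasts$estimate - crit * contrasts$se
contrasts$upper90 <- contrasts$estimate + crit * contrasts$se

# Equivalent if the whole interval sits inside (-bound, +bound)
contrasts$equivalent <- contrasts$lower90 > -bound & contrasts$upper90 < bound

# Two one-sided tests: is the difference above -bound AND below +bound?
p_lower <- pt((contrasts$estimate + bound) / contrasts$se,
              df = contrasts$df, lower.tail = FALSE)
p_upper <- pt((contrasts$estimate - bound) / contrasts$se,
              df = contrasts$df, lower.tail = TRUE)
contrasts$p_tost <- pmax(p_lower, p_upper)

print(contrasts)
```

With the ±2.1 cm bound all three comparisons come out as equivalent. If you change `bound` to the stricter 0.2 × SD value (about 0.76 cm), none of the three intervals fits inside, which shows how much the conclusion depends on the threshold you choose. If you would rather get the intervals from emmeans directly, confint(pairs(emm), level = 0.90, adjust = "none") should give matching numbers.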

References

Datson, Naomi, Lorenzo Lolli, Barry Drust, Greg Atkinson, Matthew Weston, and Warren Gregson. 2021. “Inter-Methodological Quantification of the Target Change for Performance Test Outcomes Relevant to Elite Female Soccer Players.” Science and Medicine in Football 6 (2): 248–61. https://doi.org/10.1080/24733938.2021.1942538.

Lakens, Daniël, Anne M. Scheel, and Peder M. Isager. 2018. “Equivalence Testing for Psychological Research: A Tutorial.” Advances in Methods and Practices in Psychological Science 1 (2): 259–69. https://doi.org/10.1177/2515245918770963.