# load packages you need ----
library(tidyverse)
library(janitor)
library(readxl)
Human Movement 2026 Data Analysis
Aims and Hypotheses
Our research aimed to build on previous studies and test whether external cuing with directional foci (away and towards) can improve jump performance over a control condition.
Hypothesis 1: External cues will improve jump height (and/or other variables) above the control.
Null hypothesis: There will be no difference between cuing conditions.
Hypothesis 2: You may have made another hypothesis. For example, you may think that one of the two external cuing conditions will improve a specific outcome measure.
Before you go any further - write down your hypotheses if you haven’t already.
Data
We have collected data through a Microsoft Form and through VALD Force Decks. As such, we need to load and tidy two files. The first is our participant information sheet (Participant Information.xlsx) and the second is our jump data (cmj.csv).
This workbook is going to provide some code to help you read in these two files, organise the data that you are interested in and provide some summary statistics.
Reading in your Participant Information Data
First, we need the three packages loaded at the top of this workbook (tidyverse, janitor and readxl) to read our data into RStudio and organise it before summarising it in plots or tables.
We will then use readxl to read in the data: the code read_excel("Participant Information.xlsx") reads the document in, and we will call the result p_info.
There are a few things we might want to do with this data though before we look at it:
Read the data into R:
read_excel("Participant Information.xlsx")
Clean the column names (we use the janitor package for this):
clean_names()
We have participants’ dates of birth but not their ages, so we need to create a new column that uses dob and date of testing to work out age at testing. We will call this age and create it with the mutate() function from the tidyverse. The helpers interval() and years() come from lubridate, which library(tidyverse) attaches from tidyverse 2.0.0 onwards.
mutate(age = as.integer(interval(dob, date) / years(1)))
Finally, we want to select the columns we are interested in. Again we can use the tidyverse to do this, with the function select():
select(id = id1, sex, experience = jump_experience_experience, stature, age)
We could run all of these as separate functions, but it is neater to combine them into a “pipe”. We put %>% at the end of each line to pass the result on to the next function.
So the code you need to read your data in, tidy it up and select the correct columns is as follows:
# read in the participant information data ----
p_info <- read_excel("Participant Information.xlsx") %>%
  clean_names() %>%
  mutate(age = as.integer(interval(dob, date) / years(1))) %>%
  select(id = id1,
         sex,
         experience = jump_experience_experience,
         stature,
         age)
head(p_info)
# A tibble: 6 × 5
  id    sex   experience             stature   age
  <chr> <chr> <chr>                    <dbl> <int>
1 P5    Male  Moderately experienced    183     19
2 P6    Male  Moderately experienced    173     21
3 P3    Male  Moderately experienced    172.    20
4 P7    Male  No experience at all      171     21
5 P4    Male  A little experience       184.    22
6 P8    Male  A little experience       192.    20
You can then run view(p_info) to view your data. You’ll notice we have almost everything we need; however, participant mass is recorded in our jump data. We’ll get to that in a minute!
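If you want to convince yourself that the age calculation behaves as expected, here is a minimal sketch on made-up data. The tibble toy and its dates are invented for illustration and are not part of our data set; the mutate() line is the same one used above.

```r
library(dplyr)
library(lubridate)

# Made-up participants: dates of birth and a single (hypothetical) testing date
toy <- tibble::tibble(
  id   = c("A", "B"),
  dob  = as.Date(c("2005-03-10", "2004-11-02")),
  date = as.Date(c("2026-02-01", "2026-02-01"))
)

# interval() measures the time between the two dates; dividing by years(1)
# converts it to years, and as.integer() drops the fraction so we get
# age in completed years
toy_age <- toy %>%
  mutate(age = as.integer(interval(dob, date) / years(1)))

toy_age
```

Participant A has not yet had their birthday in the testing year, so they come out as 20 rather than 21.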
Reading in your CMJ Data
For our jump data we are reading a .csv file, so we use read_csv(). I will read it in without making any changes to the file first (apart from cleaning the column names). You can run the code below and then, as above, use view(cmj) to see your data.
# read in the jump data ----
cmj <- read_csv("cmj.csv") %>%
  clean_names()
Before we go any further, let’s create a data frame which has participant id and body mass. We can then join it to our p_info data to complete it.
The code below does the following:
We name our new data “mass” and then write <- cmj to indicate we are starting with the “cmj” data frame.
We then use select() to select the column “name”, but we will call it “id” so it matches our participant information data. We will also select bw_kg but call that mass_kg.
We have a slight problem in that we weighed our participants on every trial, so we have three mass values per participant and only need one. To fix this we use group_by() to group by id and then summarise() to get the mean of each participant’s mass.
As above, we have used pipes to build our code.
# pull out mass so you can add that into your participant information data frame
mass <- cmj %>%
  select(id = name, mass_kg = bw_kg) %>%
  group_by(id) %>%
  summarise(mass_kg = mean(mass_kg))
head(mass)
# A tibble: 6 × 2
  id    mass_kg
  <chr>   <dbl>
1 P1       62.3
2 P10      82.3
3 P11      96.4
4 P12      80.8
5 P2       53.2
6 P3       64.7
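To see what group_by() and summarise() are doing here, the sketch below runs the same steps on a tiny made-up data frame. The name toy_cmj and its values are invented for illustration; only the column names name and bw_kg match our real file.

```r
library(dplyr)

# Two made-up participants, each weighed on three trials
toy_cmj <- tibble::tibble(
  name  = c("P1", "P1", "P1", "P2", "P2", "P2"),
  bw_kg = c(62.0, 62.4, 62.5, 53.0, 53.2, 53.4)
)

# Rename the columns, group the rows by participant,
# then collapse each group to a single mean mass
toy_mass <- toy_cmj %>%
  select(id = name, mass_kg = bw_kg) %>%
  group_by(id) %>%
  summarise(mass_kg = mean(mass_kg))

toy_mass
```

Six rows go in and two come out: one row per participant. If any of your mass values were missing you would probably want mean(mass_kg, na.rm = TRUE) instead, but that is an assumption about your data rather than something this workbook requires.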
Finally, we need to merge this data with our p_info data, and we’ll use a “left_join” to do that. Notice in the code we name the two data sets we are joining together and we are doing this by “id”. This ensures that the correct mass is assigned to the correct participant.
merged_p_info <- left_join(p_info, mass, by = "id") # join mass to p_info
head(merged_p_info)
# A tibble: 6 × 6
  id    sex   experience             stature   age mass_kg
  <chr> <chr> <chr>                    <dbl> <int>   <dbl>
1 P5    Male  Moderately experienced    183     19    81.5
2 P6    Male  Moderately experienced    173     21    79.7
3 P3    Male  Moderately experienced    172.    20    64.7
4 P7    Male  No experience at all      171     21   102.
5 P4    Male  A little experience       184.    22   134.
6 P8    Male  A little experience       192.    20    83.1
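A left join keeps every row of the first data set, so a participant whose id has no match in the second one ends up with a missing mass. A quick way to check for that is sketched below on made-up data; toy_info and toy_mass are hypothetical names, and anti_join() is a dplyr function that returns the rows with no match.

```r
library(dplyr)

# Made-up data: P3 filled in the form but has no jump data
toy_info <- tibble::tibble(id = c("P1", "P2", "P3"), stature = c(170, 180, 175))
toy_mass <- tibble::tibble(id = c("P1", "P2"), mass_kg = c(62.3, 53.2))

# left_join() keeps all three participants; P3 gets NA for mass_kg
toy_merged <- left_join(toy_info, toy_mass, by = "id")

# anti_join() lists the participants that found no match in toy_mass
unmatched <- anti_join(toy_info, toy_mass, by = "id")

toy_merged
unmatched
```

If unmatched has zero rows for your real data, every participant has been given a mass. A non-empty result usually points to an id typed differently in the two files.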
We now have a complete participant information data set that we can summarise.
Summary statistics for participant information
We need to summarise who our participants are. In your write-up you will say how you recruited 12 sport and exercise science students who were studying a Human Movement module. However, we also want to report their age, stature and mass, and typically we’d present this as mean ± standard deviation.
The following code pulls these statistics out for you and saves them as a data frame called “summary_p_info”. Here we use summarise(), and the code is straightforward to read:
summary_p_info <- merged_p_info %>%
  summarise(stature_mean = mean(stature),
            stature_sd = sd(stature),
            mass_mean = mean(mass_kg),
            mass_sd = sd(mass_kg),
            age_mean = mean(age),
            age_sd = sd(age))
head(summary_p_info)
# A tibble: 1 × 6
  stature_mean stature_sd mass_mean mass_sd age_mean age_sd
         <dbl>      <dbl>     <dbl>   <dbl>    <dbl>  <dbl>
1         178.       8.93      83.5    20.9     24.5   8.08
These values can be written straight into your participant information section in your methods.
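If you would rather have R build the “mean ± SD” text for you, one option is sprintf(). The sketch below uses made-up stature values; toy and toy_summary are hypothetical names, and the one-decimal-place format is just a choice you can change.

```r
library(dplyr)

# Made-up statures (cm) for four participants
toy <- tibble::tibble(stature = c(170, 180, 175, 185))

# "%.1f" prints a number to one decimal place, so this pastes the
# mean and the standard deviation into a single "mean ± SD" string
toy_summary <- toy %>%
  summarise(stature = sprintf("%.1f ± %.1f", mean(stature), sd(stature)))

toy_summary$stature
```

Check what rounding your write-up guidelines ask for before copying values across.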
You might also want to summarise the number of males and females in your sample and their jump experience. To do this we can use fct_count():
fct_count(p_info$sex)
# A tibble: 2 × 2
  f          n
  <fct>  <int>
1 Female     1
2 Male      11
fct_count(p_info$experience)
# A tibble: 4 × 2
  f                          n
  <fct>                  <int>
1 A little experience        3
2 Moderately experienced     7
3 No experience at all       1
4 Very experienced           1
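An alternative you may come across is dplyr’s count(), which keeps the original column name and makes it easy to add a percentage. This is a sketch on made-up data (toy_info is a hypothetical name), not a replacement for the code above.

```r
library(dplyr)

# Made-up sample of three participants
toy_info <- tibble::tibble(sex = c("Male", "Male", "Female"))

# count() gives one row per category with the number of rows in n;
# mutate() then turns those counts into percentages of the sample
sex_counts <- toy_info %>%
  count(sex) %>%
  mutate(percent = 100 * n / sum(n))

sex_counts
```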
Summary statistics for CMJ data
First we need to clean our CMJ data and select the variables of interest. For everyone this will include jump height, but you might also want to choose some others. You will need to do your research and justify these as exploratory variables.
If you want to look at your column names use colnames() as such:
colnames(cmj)
[1] "name"
[2] "condition"
[3] "test_type"
[4] "date"
[5] "time"
[6] "bw_kg"
[7] "reps"
[8] "tags"
[9] "additional_load_kg"
[10] "jump_height_imp_mom_cm"
[11] "jump_height_flight_time_cm"
[12] "countermovement_depth_cm"
[13] "contraction_time_ms"
[14] "rsi_modified_imp_mom_m_s"
[15] "rsi_modified_m_s"
[16] "peak_landing_force_n_l"
[17] "peak_landing_force_n_r"
[18] "concentric_mean_force_n_l"
[19] "concentric_mean_force_n_r"
[20] "eccentric_deceleration_rfd_bm_n_s_kg"
[21] "peak_power_bm_w_kg"
[22] "eccentric_peak_power_bm_w_kg"
[23] "eccentric_peak_velocity_m_s"
[24] "bodyweight_in_kilograms_kg"
[25] "concentric_impulse_n_s"
We will use the same select() approach as above, but this time we will choose jump height, concentric impulse, contraction time and modified RSI:
cmj <- clean_names(cmj) %>%
  select(id = name,
         condition,
         cmj_height = jump_height_imp_mom_cm,
         con_impulse = concentric_impulse_n_s,
         cont_time = contraction_time_ms,
         rsi_mod = rsi_modified_imp_mom_m_s)
Visualise the main outcome variable first
Let’s take a look at our data. A nice way to do this is to summarise the data in a box and whisker plot. This gives us the median and interquartile range (box) and the range of our data (whiskers). The whiskers exclude any extreme values, termed outliers; if there are any, they are shown as individual points on the plot.
To do this we will use the package ggplot2, which is part of the tidyverse set of packages.
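To make the idea of an outlier concrete, here is a minimal sketch using base R’s boxplot.stats() on made-up jump heights. It flags values that lie more than 1.5 times the box length beyond the box. As far as I know geom_boxplot() uses the same 1.5 rule by default, although it calculates the quartiles slightly differently, so treat this as an illustration rather than an exact copy of what the plot does.

```r
# Made-up jump heights (cm) with one extreme value
x <- c(28, 29, 30, 31, 45)

bs <- boxplot.stats(x)

# $stats holds: lower whisker, lower hinge, median, upper hinge, upper whisker
whisker_top <- bs$stats[5]

# $out holds any values beyond the whiskers (the outliers)
out <- bs$out

whisker_top
out
```

Here the box runs from 29 to 31, so anything above 31 + 1.5 × 2 = 34 is treated as an outlier. The upper whisker therefore stops at 31 and the value 45 would be drawn as a separate point.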
ggplots follow a similar structure, where the first line sets the canvas, e.g.
ggplot(data = cmj, aes(x = condition, y = cmj_height))
The code above tells ggplot to plot data from the data frame “cmj” and to plot “condition” on the x-axis and “cmj_height” on the y-axis, but we haven’t told it what type of plot (geom) to apply. So let’s do that now:
ggplot(data = cmj, aes(x = condition, y = cmj_height)) +
  geom_boxplot()
So now we have told it to use a box plot, but would it not be useful to see the actual data points too? To do this we’ll add another line to our plot:
ggplot(data = cmj, aes(x = condition, y = cmj_height)) +
  geom_boxplot() +
  geom_point()
We might now want to add a fill colour to each box, or add a plot title and specific text for the x- and y-axis labels. We might also like to apply a theme to the plot. Below is an example I came up with, but you will be able to personalise your plots.
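As a starting point for personalising, here is one possible sketch. It is not the workbook’s original figure: the data frame toy_cmj and its values are made up so the code runs on its own, and the title, axis labels and theme are simply choices you can swap. With your real data you would replace toy_cmj with cmj.

```r
library(ggplot2)

# Made-up jump heights (cm), four per condition, for illustration only
toy_cmj <- data.frame(
  condition  = rep(c("Away", "Control", "Towards"), each = 4),
  cmj_height = c(29.1, 31.4, 27.8, 30.2,
                 28.0, 29.5, 27.1, 30.0,
                 29.9, 30.8, 28.2, 31.0)
)

p <- ggplot(toy_cmj, aes(x = condition, y = cmj_height, fill = condition)) +
  geom_boxplot(outlier.shape = NA) +        # hide boxplot outliers; points are drawn below
  geom_jitter(width = 0.1, height = 0) +    # nudge points sideways so they do not overlap
  labs(title = "Jump height by cuing condition",
       x = "Cuing condition",
       y = "Jump height (cm)") +
  theme_minimal() +
  theme(legend.position = "none")           # the x-axis already names each condition

p
```

Setting outlier.shape = NA stops any outlier being drawn twice, once by the box plot and once by geom_jitter().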
Summarise with mean and standard deviation
Our box plot shows no outliers, and the median sits centrally in each box, which in turn sits nicely within the whiskers. This suggests our data are approximately normally distributed and not influenced by outliers. Whilst we will want to verify this with some distribution checks (in the next workbook), I think we can assume it is appropriate to summarise these data with means and standard deviations.
So, we are going to adapt the code from above to do this for us. We need to group our data by condition this time (away, control and towards), and then we can summarise for each condition.
cmj_summary <- cmj %>%
  group_by(condition) %>%
  summarise(cmj_height_mean = mean(cmj_height),
            cmj_height_sd = sd(cmj_height),
            con_impulse_mean = mean(con_impulse),
            con_impulse_sd = sd(con_impulse),
            cont_time_mean = mean(cont_time),
            cont_time_sd = sd(cont_time),
            cmj_rsi_mean = mean(rsi_mod),
            cmj_rsi_sd = sd(rsi_mod))
head(cmj_summary)
# A tibble: 3 × 9
  condition cmj_height_mean cmj_height_sd con_impulse_mean con_impulse_sd
  <chr>               <dbl>         <dbl>            <dbl>          <dbl>
1 Away                 29.3          4.18             203.           47.6
2 Control              28.6          3.48             202.           49.3
3 Towards              29.3          4.20             203.           46.3
# ℹ 4 more variables: cont_time_mean <dbl>, cont_time_sd <dbl>,
#   cmj_rsi_mean <dbl>, cmj_rsi_sd <dbl>
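Writing a mean and an sd line for every variable gets repetitive. If you add more exploratory variables, dplyr’s across() can apply the same functions to several columns at once. The sketch below uses a made-up data frame (toy_cmj, with invented values) and only two of our variables; it is an optional shortcut, not a change to the analysis above.

```r
library(dplyr)

# Made-up data: two trials per condition
toy_cmj <- tibble::tibble(
  condition  = rep(c("Away", "Control", "Towards"), each = 2),
  cmj_height = c(29, 31, 28, 30, 30, 32),
  rsi_mod    = c(0.40, 0.44, 0.38, 0.42, 0.41, 0.45)
)

# across() applies every function in the list to every column named in c();
# the new columns are named <column>_<function>, e.g. cmj_height_mean
toy_summary <- toy_cmj %>%
  group_by(condition) %>%
  summarise(across(c(cmj_height, rsi_mod), list(mean = mean, sd = sd)))

toy_summary
```

Note that this names the RSI columns rsi_mod_mean and rsi_mod_sd rather than cmj_rsi_mean and cmj_rsi_sd, so adjust any later code that refers to them.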