setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/Teaching/Human Movement/2025/Data")
Human Movement 25 Data Analysis
Aims and hypotheses
Our research aimed to build on previous studies and test the “constrained action hypothesis”, which suggests that adopting an internal focus of attention negatively affects performance, particularly when a secondary task or distraction is present. We tested this in a basketball free throw scenario, investigating (A) whether free throw performance was affected by focus of attention (external versus internal) and (B) whether dual-task performance was affected by focus of attention. As such we had two hypotheses:
Hypothesis 1: Free throw performance would be improved when throwing with an external focus of attention.
Hypothesis 2: The addition of a second task (dual-task condition) would have a greater negative effect upon free throw performance in the internal condition.
Variables of interest
Our independent variable throughout is attention focus (external versus internal).
For hypothesis 1 our dependent variable (outcome measure) is basketball score (out of 50).
For hypothesis 2 our dependent variable (outcome measure) is the difference in basketball score between the dual-task and single-task conditions (dual-task score - single-task score).
R: Script to meet the aims
Before you start
Make sure you have the following files downloaded from Blackboard and saved into your working folder:
Participant information file: Human Movement Participant Details.xlsx
Basketball data file: data_24.xlsx
Stage 1: Load the packages you require
Remember, you may need to install these first.
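The package list below is an assumption, inferred from the functions used later in this worksheet (read_excel, clean_names, fct_count, ggplot, pivot_wider, qqPlot, yuend). As a rough sketch, this checks which packages are missing before attaching them:

```r
# Packages this worksheet appears to rely on (assumed from the functions used)
pkgs <- c("tidyverse", "readxl", "janitor", "car", "WRS2")

# Which of them are not installed yet?
not_installed <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(not_installed) > 0) {
  # Run the printed command once, then re-run this block
  message("Install first: install.packages(c(",
          paste0('"', not_installed, '"', collapse = ", "), "))")
} else {
  # Attach every package for this session
  invisible(lapply(pkgs, library, character.only = TRUE))
}
```

If your own script uses other functions, add their packages to pkgs.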
Stage 2: Set your working directory
Remember to set the working directory to your folder, not mine!
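A minimal sketch of this step is shown here. The folder path is hypothetical and must be replaced with your own; the two file names are the ones listed under “Before you start”.

```r
# Hypothetical folder: replace with the path to your own working folder
my_folder <- "~/Documents/human_movement"

# Only change directory if the folder actually exists
if (dir.exists(my_folder)) setwd(my_folder)

# Confirm where R is now looking for files
getwd()

# Check that both data files are in the working directory
files_needed <- c("Human Movement Participant Details.xlsx", "data_24.xlsx")
found <- file.exists(files_needed)
found
```

If either value in found is FALSE, the matching file is not in the folder R is using.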
Stage 3: Load your data
The code below reads in our participant information data, which was collected using the Microsoft form.
We need to import the data and make sure that the columns for Age, Stature and Mass are numeric. The code read_excel("Human Movement Participant Details.xlsx") reads the data, and the code mutate(across(c(Age, Stature, Mass), as.numeric)) turns those columns into numeric variables.
p_info <- read_excel("Human Movement Participant Details.xlsx") %>%
mutate(across(c(Age, Stature, Mass), as.numeric))
head(p_info)
# A tibble: 6 × 12
Id `Start time` `Completion time` Email Name ID1 Sex Age
<dbl> <dttm> <dttm> <chr> <lgl> <chr> <chr> <dbl>
1 42 2025-02-25 09:12:30 2025-02-25 09:16:37 anonymo… NA P1 Male 21
2 43 2025-02-25 09:16:47 2025-02-25 09:18:11 anonymo… NA P2 Fema… 20
3 44 2025-02-25 09:23:12 2025-02-25 09:24:49 anonymo… NA P4 Fema… 21
4 45 2025-02-25 09:24:51 2025-02-25 09:26:32 anonymo… NA P3 Male 20
5 46 2025-02-25 09:43:25 2025-02-25 09:44:19 anonymo… NA P6 Male 21
6 47 2025-02-25 09:44:23 2025-02-25 09:45:44 anonymo… NA P5 Male 19
# ℹ 4 more variables: Stature <dbl>, Mass <dbl>, `Limb dominance` <chr>,
# `Basketball experience.How regularly have you played basketball` <chr>
We also need to import our basketball data.
We also need to turn our scores for each of the 10 throws into numbers and then add them up. The code mutate(Total = rowSums(across(10:19))) creates a new variable called “Total” by taking the sum of columns 10 to 19, i.e. the sum of our 10 basketball shots.
data <- read_excel("data_24.xlsx") %>%
mutate(across(10:19, as.numeric)) %>%
mutate(Total = rowSums(across(10:19)))
head(data)
# A tibble: 6 × 20
Id `Start time` `Completion time` Email Name ID1 `Trial number`
<dbl> <dttm> <dttm> <chr> <lgl> <chr> <chr>
1 3 2025-02-25 09:30:50 2025-02-25 09:33:45 anon… NA P1 Trial 1
2 4 2025-02-25 09:34:16 2025-02-25 09:37:39 anon… NA P1 Trial 2
3 10 2025-02-25 09:55:50 2025-02-25 09:58:02 anon… NA P1 Trial 3
4 12 2025-02-25 10:00:32 2025-02-25 10:03:11 anon… NA P1 Trial 4
5 33 2025-02-25 12:07:07 2025-02-25 12:13:22 anon… NA P10 Trial 1
6 35 2025-02-25 12:16:02 2025-02-25 12:19:01 anon… NA P10 Trial 2
# ℹ 13 more variables: `Attention focus` <chr>, `Task condition` <chr>,
# `Score.Throw 1` <dbl>, `Score.Throw 2` <dbl>, `Score.Throw 3` <dbl>,
# `Score.Throw 4` <dbl>, `Score.Throw 5` <dbl>, `Score.Throw 6` <dbl>,
# `Score.Throw 7` <dbl>, `Score.Throw 8` <dbl>, `Score.Throw 9` <dbl>,
# `Score.Throw 10` <dbl>, Total <dbl>
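Selecting columns by position (10:19) only works while the column order stays the same. As an alternative sketch, the score columns can be picked by name with starts_with(). The toy data below is made up and uses only three throws.

```r
library(dplyr)

# Made-up scores for two trials of three throws each (not the class data)
toy <- tibble(
  id1 = c("P1", "P2"),
  `Score.Throw 1` = c("3", "5"),
  `Score.Throw 2` = c("0", "2"),
  `Score.Throw 3` = c("4", "1")
)

toy_total <- toy %>%
  # convert every column whose name starts with "Score" to numeric
  mutate(across(starts_with("Score"), as.numeric)) %>%
  # add those columns up row by row
  mutate(Total = rowSums(across(starts_with("Score"))))

toy_total$Total  # 7 for P1 (3 + 0 + 4) and 8 for P2 (5 + 2 + 1)
```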
Stage 4: Wrangle and tidy your data
Participant data
Let’s start by cleaning up our column names
p_info <- clean_names(p_info)
colnames(p_info)
[1] "id"
[2] "start_time"
[3] "completion_time"
[4] "email"
[5] "name"
[6] "id1"
[7] "sex"
[8] "age"
[9] "stature"
[10] "mass"
[11] "limb_dominance"
[12] "basketball_experience_how_regularly_have_you_played_basketball"
Now let’s select and rename the columns as we wish. We will call this p_info_clean:
p_info_clean <- p_info %>%
select(c(id = "id1",
sex,
age,
stature,
mass,
limb_dominance,
bb_experience = "basketball_experience_how_regularly_have_you_played_basketball"))
head(p_info_clean)
# A tibble: 6 × 7
id sex age stature mass limb_dominance bb_experience
<chr> <chr> <dbl> <dbl> <dbl> <chr> <chr>
1 P1 Male 21 188. 60 Left handed Very little experience
2 P2 Female 20 169 66 Right handed No experience
3 P4 Female 21 164 70.6 Right handed Very little experience
4 P3 Male 20 178 72.6 Right handed Very little experience
5 P6 Male 21 176 75 Right handed Very little experience
6 P5 Male 19 183 80 Right handed Very little experience
Participant summary statistics
These data describe our sample from the population of Sport and Exercise students, so we need to create some summary statistics for the participant information section of our methods.
For our numeric values (age, stature, mass) we need means and standard deviations:
# For the mean (the result is not called "mean" so it does not mask the mean() function)
mean_vals <- p_info_clean %>%
  summarize(across(c(age, stature, mass), ~ mean(.x, na.rm = TRUE)))
# For standard deviation
sd_vals <- p_info_clean %>%
  summarize(across(c(age, stature, mass), ~ sd(.x, na.rm = TRUE)))
For our categorical values (sex, limb_dominance, bb_experience) the count and proportion are ideal. Here is example code; you will need to repeat it for the other variables.
# sex
sex <- fct_count(p_info_clean$sex, prop = TRUE)
Basketball data
Remember we called our basketball data “data”. As above, you will need to clean your names:
data <- clean_names(data)
Then we need to select the variables we need: participant id, attention focus (external or internal), task condition (single or dual) and finally our total score out of 50 (total):
data_clean <- data %>%
select(c(id = "id1",
attention_focus,
task_condition,
total))
head(data_clean)
# A tibble: 6 × 4
id attention_focus task_condition total
<chr> <chr> <chr> <dbl>
1 P1 Internal Single-task 0
2 P1 Internal Dual-task 0
3 P1 External Single-task 0
4 P1 External Dual-task 3
5 P10 Internal Single-task 25
6 P10 Internal Dual-task 15
Basketball summary statistics
The code below groups data by task condition and attention focus and gives us our means. You will also need to amend and run this code to give you a standard deviation.
data_clean %>%
group_by(task_condition, attention_focus) %>%
summarise(mean = mean(total, na.rm = TRUE))
`summarise()` has grouped output by 'task_condition'. You can override using the `.groups` argument.
# A tibble: 4 × 3
# Groups: task_condition [2]
task_condition attention_focus mean
<chr> <chr> <dbl>
1 Dual-task External 20.8
2 Dual-task Internal 21.4
3 Single-task External 22.2
4 Single-task Internal 21.7
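One way to get the mean and the standard deviation in a single call is sketched here on made-up totals, not the class data. Setting .groups = "drop" is optional; it returns an ungrouped tibble and stops the message about grouped output.

```r
library(dplyr)

# Made-up totals (not the class data)
toy <- tibble(
  task_condition  = rep(c("Single-task", "Dual-task"), each = 4),
  attention_focus = rep(c("External", "Internal"), times = 4),
  total           = c(20, 22, 24, 18, 15, 19, 21, 17)
)

toy_summary <- toy %>%
  group_by(task_condition, attention_focus) %>%
  summarise(
    mean = mean(total, na.rm = TRUE),
    sd   = sd(total, na.rm = TRUE),
    .groups = "drop"   # return an ungrouped tibble and silence the message
  )

toy_summary
```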
Hypothesis 1 data
At the moment each participant has two data points in each attention focus condition (single-task and dual-task). For hypothesis 1 we are simply comparing external and internal, so we need to reduce this to one value for external and one value for internal. To do this we take the mean of the single-task and dual-task scores in each condition for each person.
The code group_by(id, attention_focus) groups the data by id and attention_focus. We can then use summarise(mean_total = mean(total, na.rm = TRUE)) to get a mean of “total”, which we call mean_total.
hyp_1 <- data_clean %>%
group_by(id, attention_focus) %>%
summarise(mean_total = mean(total, na.rm = TRUE))
`summarise()` has grouped output by 'id'. You can override using the `.groups` argument.
head(hyp_1)
# A tibble: 6 × 3
# Groups: id [3]
id attention_focus mean_total
<chr> <chr> <dbl>
1 P1 External 1.5
2 P1 Internal 0
3 P10 External 17.5
4 P10 Internal 20
5 P11 External 24
6 P11 Internal 26.5
Hypothesis 2 data
For hypothesis 2 we need to know the difference between dual and single task conditions:
# This code takes one away from the other:
hyp_2 <- data_clean %>%
# group by attention_focus and id to ensure correct matching
group_by(attention_focus, id) %>%
# mutate new variable called "dif" that takes Single-task away from Dual-task
mutate(dif = if_else(task_condition == "Dual-task",
                     total[task_condition == "Dual-task"] - total[task_condition == "Single-task"],
                     NA_real_)) %>%
ungroup() %>%
filter(!is.na(dif)) %>%
# we don't need these columns now so we can get rid of them here
select(-c(task_condition))
Hypothesis 2 summary statistics
We use the same code as above, but this time we only need to group by attention focus. We also need to take the mean and sd of “dif” (i.e. the difference between single and dual task). The code below is for the standard deviation; you will also need to repeat it for the mean.
hyp_2 %>%
group_by(attention_focus) %>%
summarise(sd = sd(dif, na.rm = TRUE))
# A tibble: 2 × 2
attention_focus sd
<chr> <dbl>
1 External 8.13
2 Internal 6.46
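The “dif” variable summarised above is the dual-task score minus the single-task score. A second way to compute it, sketched here on made-up totals and not taken from the worksheet, is to put the two task conditions side by side with pivot_wider() and subtract the columns.

```r
library(dplyr)
library(tidyr)

# Made-up totals for two participants (not the class data)
toy <- tibble(
  id              = rep(c("P1", "P2"), each = 4),
  attention_focus = rep(c("Internal", "Internal", "External", "External"), times = 2),
  task_condition  = rep(c("Single-task", "Dual-task"), times = 4),
  total           = c(20, 15, 22, 21, 30, 28, 25, 27)
)

toy_dif <- toy %>%
  # one row per participant and focus, one column per task condition
  pivot_wider(names_from = task_condition, values_from = total) %>%
  # dual minus single: negative values mean the second task hurt performance
  mutate(dif = `Dual-task` - `Single-task`)

toy_dif$dif  # -5, -1, -2, 2
```

A participant with a missing trial simply gets NA for dif in this version, which makes gaps in the data easy to spot.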
Stage 5: Visualise our data
So now we have two data sets to work with:
hyp_1 to test hypothesis 1
hyp_2 to test hypothesis 2
Visualise hypothesis 1 data
Remember we need to check to see if our data is “normally distributed” - see previous work sheet.
# check for normality hypothesis 1 data
hist(hyp_1$mean_total)
qqPlot(hyp_1$mean_total)
[1] 2 22
shapiro.test(hyp_1$mean_total)
Shapiro-Wilk normality test
data: hyp_1$mean_total
W = 0.94012, p-value = 0.05123
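The p-value above (0.051) sits just above the conventional 0.05 cut-off, so the result is borderline and the plots matter as much as the test. As a rough sketch on simulated scores (not the class data), this shows how the p-value can be pulled out of the test result and compared with 0.05:

```r
set.seed(1)  # make the simulated scores reproducible

# Simulated scores (not the class data): one roughly normal, one skewed
normal_scores <- rnorm(40, mean = 21, sd = 5)
skewed_scores <- rexp(40, rate = 0.2)

sw_normal <- shapiro.test(normal_scores)
sw_skewed <- shapiro.test(skewed_scores)

# The p-value is stored inside the result object
sw_normal$p.value
sw_skewed$p.value

# Conventional reading: p below 0.05 is taken as evidence against normality
sw_normal$p.value < 0.05
sw_skewed$p.value < 0.05
```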
You might want to make a nicer histogram and density plot in ggplot; if you do, see the previous workbook.
# summarise through box plot for example?
ggplot(data = hyp_1, aes(x = attention_focus, y = mean_total))+
geom_boxplot() +
geom_point() +
theme_classic()
You will want to make a nicer graph by filling the boxes, adding axis titles and maybe a plot title. You might want to add a theme (theme_classic() or theme_minimal()). You might even want to specify your colors to put your own stamp on them. You will need to work this out yourself.
Here is an example:
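This is one possible version, sketched on made-up scores rather than the class data. The colours, titles and jitter width are arbitrary choices for illustration only.

```r
library(ggplot2)

# Made-up scores (not the class data)
toy <- data.frame(
  attention_focus = rep(c("External", "Internal"), each = 5),
  mean_total      = c(18, 22, 25, 20, 27, 17, 21, 24, 19, 23)
)

p <- ggplot(toy, aes(x = attention_focus, y = mean_total, fill = attention_focus)) +
  geom_boxplot(alpha = 0.6, outlier.shape = NA) +  # hide outlier dots; raw points are drawn next
  geom_jitter(width = 0.1, height = 0) +           # spread the points sideways only
  scale_fill_manual(values = c(External = "steelblue", Internal = "darkorange")) +
  labs(title = "Free throw score by attention focus",
       x = "Attention focus",
       y = "Mean score (out of 50)") +
  theme_classic() +
  theme(legend.position = "none")                  # the x axis already names the groups

p
```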
Visualise hypothesis 2 data
You might want to look at total score here but you will definitely want to graph the difference between dual and single-task.
ggplot(data = hyp_2, aes(x = attention_focus, y = dif))+
geom_boxplot() +
geom_point()
Think hard about what your plots show you. What do you think the results of your statistical tests will show?
Stage 6: Run your statistical tests
Hypothesis 1:
Pivot data wider to make sure it is “paired” correctly
First, the data is currently in “long format”, with external and internal stacked on top of each other. We need our data to be paired for each participant, so we need to convert it to “wide format”. To do this we use the function pivot_wider().
hyp_1_wide <- hyp_1 %>%
pivot_wider(names_from = attention_focus, values_from = mean_total)
head(hyp_1_wide)
# A tibble: 6 × 3
# Groups: id [6]
id External Internal
<chr> <dbl> <dbl>
1 P1 1.5 0
2 P10 17.5 20
3 P11 24 26.5
4 P12 18.5 21
5 P13 21.5 24
6 P14 21.5 22.5
Now we can run our t-test:
t <- t.test(hyp_1_wide$Internal, hyp_1_wide$External, paired = TRUE)
But…
Was our data normally distributed? Possibly, but probably not. We could use a non-parametric alternative to a t-test (the Wilcoxon signed-rank test). As you will see if you run the test, we don’t get much useful information beyond a p-value and a test statistic.
Or we could use a robust t-test based on the work of Field and Wilcox (2017). This might be a good option here, given that some outliers are skewing our data a little. It basically uses the difference between “trimmed means” rather than ordinary means: with tr = 0.2, the lowest 20% and the highest 20% of scores are trimmed, so the middle 60% of the data is kept.
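To see what trimming does, this base R sketch compares an ordinary mean with a 20% trimmed mean on made-up scores containing one extreme value. It only illustrates the trimming idea and is not the yuend() calculation itself.

```r
# Made-up scores with one extreme value (not the class data)
x <- c(2, 3, 4, 5, 6, 7, 8, 9, 10, 50)

ordinary_mean <- mean(x)              # 10.4, pulled upwards by the 50
trimmed_mean  <- mean(x, trim = 0.2)  # 6.5, the mean of 4, 5, 6, 7, 8 and 9

ordinary_mean
trimmed_mean
```

With 10 scores, trim = 0.2 drops the two lowest and the two highest values before averaging, so the outlier no longer influences the result.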
# traditional non-parametric approach (Wilcoxon test):
w <- wilcox.test(hyp_1_wide$Internal, hyp_1_wide$External, paired = TRUE)
Warning in wilcox.test.default(hyp_1_wide$Internal, hyp_1_wide$External, :
cannot compute exact p-value with ties
Warning in wilcox.test.default(hyp_1_wide$Internal, hyp_1_wide$External, :
cannot compute exact p-value with zeroes
# or run robust t-test on the "trimmed means" to deal with our outliers:
rob_t <- yuend(hyp_1_wide$Internal, hyp_1_wide$External, tr = 0.2)
Field, A.P. and Wilcox, R.R., 2017. Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour Research and Therapy, 98, pp.19-38.
Hypothesis 2:
So for hypothesis 2 we do the same thing, but this data was normally distributed so only a simple paired t-test is needed:
# for hypothesis 2: note we need the data in "wide_format"
dif_wide <- pivot_wider(hyp_2, names_from = attention_focus, values_from = c(dif, total))
# run paired samples t-test:
dif_t <- t.test(dif_wide$dif_Internal, dif_wide$dif_External, paired = TRUE)
You can now report the mean difference in arbitrary units (rounded to 3 significant digits), the 95% confidence interval and the p-value (exact if possible).
e.g. NOTE THIS IS NOT THE CORRECT DATA
No significant difference was found between attention foci in the difference between dual-task and single-task scores (mean difference 1.5 AU, 95% confidence interval -4.45 to 6.55 AU, p = 0.7589).
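The numbers for a sentence like this can be taken straight from the t-test object. The sketch below uses simulated difference scores (not the class data) and rounds with signif() to 3 significant digits.

```r
set.seed(42)  # reproducible made-up data (not the class data)

# Simulated paired difference scores for 20 participants
dif_internal <- rnorm(20, mean = -2, sd = 6)
dif_external <- rnorm(20, mean = -1, sd = 6)

res <- t.test(dif_internal, dif_external, paired = TRUE)

mean_dif <- signif(unname(res$estimate), 3)  # mean difference
ci       <- signif(res$conf.int, 3)          # lower and upper 95% limits
p_val    <- signif(res$p.value, 3)

# Assemble the reporting string
report <- sprintf("mean difference %s AU, 95%% CI %s to %s AU, p = %s",
                  mean_dif, ci[1], ci[2], p_val)
report
```

The same three elements (estimate, conf.int, p.value) are available in the dif_t object created by the worksheet's own paired t-test.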