Human Movement 25 Data Analysis

Aims and hypotheses

Our research aimed to build on previous studies and test the “constrained action hypothesis”, which suggests that an internal focus of attention negatively affects performance, particularly when a secondary task or distraction is present. We tested this in a basketball free-throw scenario, aiming to investigate (A) whether free-throw performance was affected by focus of attention (external versus internal) and (B) whether dual-task performance was affected by focus of attention. We therefore had two hypotheses:

  • Hypothesis 1: Free throw performance would be improved when throwing with an external focus of attention.

  • Hypothesis 2: Adding a second task (dual-task condition) would have a greater negative effect on free-throw performance in the internal condition.

Variables of interest

  • Our independent variable throughout is attention focus (External versus internal)

  • For hypothesis 1 our dependent variable (outcome measure) is basketball score (out of 50)

  • For hypothesis 2 our dependent variable (outcome measure) is the difference in basketball score between dual and single-task (dual task score - single task score)

R: Script to meet the aims

Before you start

Make sure you have downloaded the following files from Blackboard and saved them into your working folder:

  • Participant information file: Human Movement Participant Details.xlsx

  • Basketball data file: data_24.xlsx

Stage 1: Load the packages you require

Remember, you may need to install these first.
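The package-loading code is not shown in this handout. Below is a minimal sketch, assuming the packages needed are the ones whose functions appear later in this script: readxl (read_excel()), tidyverse (the pipe, mutate(), select(), pivot_wider(), fct_count(), ggplot()), janitor (clean_names()), car (qqPlot()) and WRS2 (yuend()). Check this list against the one given in class.

```r
# Packages inferred from the functions used later in this script (an assumption)
pkgs <- c("readxl", "tidyverse", "janitor", "car", "WRS2")

# Find any packages that are not installed yet
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

if (length(missing_pkgs) > 0) {
  # Install these once, e.g. install.packages(missing_pkgs), then re-run this block
  message("Not installed yet: ", paste(missing_pkgs, collapse = ", "))
} else {
  # Load every package for this session
  for (p in pkgs) library(p, character.only = TRUE)
}
```

Installation only needs to happen once per computer; the library() calls need to be run every time you start a new R session.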

Stage 2: Set your working directory

Remember to set the working directory to your own folder, not mine!

setwd("~/Library/CloudStorage/OneDrive-TeessideUniversity/Work/Teaching/Human Movement/2025/Data")

Stage 3: Load your data

The code below reads in our participant information data, which was collected on a Microsoft Form.

We need to import the data and make sure that the columns for Age, Stature and Mass are numeric: read_excel("Human Movement Participant Details.xlsx") reads the data, and mutate(across(c(Age, Stature, Mass), as.numeric)) converts those columns to numeric variables.

p_info <- read_excel("Human Movement Participant Details.xlsx") %>% 
  mutate(across(c(Age, Stature, Mass), as.numeric))

head(p_info)
# A tibble: 6 × 12
     Id `Start time`        `Completion time`   Email    Name  ID1   Sex     Age
  <dbl> <dttm>              <dttm>              <chr>    <lgl> <chr> <chr> <dbl>
1    42 2025-02-25 09:12:30 2025-02-25 09:16:37 anonymo… NA    P1    Male     21
2    43 2025-02-25 09:16:47 2025-02-25 09:18:11 anonymo… NA    P2    Fema…    20
3    44 2025-02-25 09:23:12 2025-02-25 09:24:49 anonymo… NA    P4    Fema…    21
4    45 2025-02-25 09:24:51 2025-02-25 09:26:32 anonymo… NA    P3    Male     20
5    46 2025-02-25 09:43:25 2025-02-25 09:44:19 anonymo… NA    P6    Male     21
6    47 2025-02-25 09:44:23 2025-02-25 09:45:44 anonymo… NA    P5    Male     19
# ℹ 4 more variables: Stature <dbl>, Mass <dbl>, `Limb dominance` <chr>,
#   `Basketball experience.How regularly have you played basketball` <chr>

We also need to import our basketball data.

We also need to convert the scores for each of the 10 throws to numbers and then add them up. The code mutate(Total = rowSums(across(10:19))) creates a new variable called “Total” by summing columns 10 to 19, i.e. the scores for our 10 basketball shots.

data <- read_excel("data_24.xlsx") %>% 
  mutate(across(10:19, as.numeric)) %>%
  mutate(Total = rowSums(across(10:19)))

head(data)
# A tibble: 6 × 20
     Id `Start time`        `Completion time`   Email Name  ID1   `Trial number`
  <dbl> <dttm>              <dttm>              <chr> <lgl> <chr> <chr>         
1     3 2025-02-25 09:30:50 2025-02-25 09:33:45 anon… NA    P1    Trial 1       
2     4 2025-02-25 09:34:16 2025-02-25 09:37:39 anon… NA    P1    Trial 2       
3    10 2025-02-25 09:55:50 2025-02-25 09:58:02 anon… NA    P1    Trial 3       
4    12 2025-02-25 10:00:32 2025-02-25 10:03:11 anon… NA    P1    Trial 4       
5    33 2025-02-25 12:07:07 2025-02-25 12:13:22 anon… NA    P10   Trial 1       
6    35 2025-02-25 12:16:02 2025-02-25 12:19:01 anon… NA    P10   Trial 2       
# ℹ 13 more variables: `Attention focus` <chr>, `Task condition` <chr>,
#   `Score.Throw 1` <dbl>, `Score.Throw 2` <dbl>, `Score.Throw  3` <dbl>,
#   `Score.Throw 4` <dbl>, `Score.Throw 5` <dbl>, `Score.Throw 6` <dbl>,
#   `Score.Throw 7` <dbl>, `Score.Throw 8` <dbl>, `Score.Throw 9` <dbl>,
#   `Score.Throw 10` <dbl>, Total <dbl>
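To see what converting columns with as.numeric and summing them with rowSums() does, here is a base R sketch on a small made-up data frame (the ids and scores are hypothetical, and only three throws are used to keep it short). Note that as.numeric() turns any text that is not a number into NA, and rowSums() then returns NA for that row unless you add na.rm = TRUE.

```r
# Hypothetical data: scores stored as text, as they can be after import
throws <- data.frame(
  id      = c("P1", "P2"),
  throw_1 = c("5", "3"),
  throw_2 = c("2", "4"),
  throw_3 = c("0", "5")
)

# Convert the score columns (positions 2 to 4 in this toy example) to numbers
throws[2:4] <- lapply(throws[2:4], as.numeric)

# Sum across the score columns for each row (one row = one trial)
throws$total <- rowSums(throws[2:4])
throws
```

Selecting columns by position (10:19 in the real data) only works if the spreadsheet columns are always in the same order, so it is worth checking head(data) after importing.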

Stage 4: Wrangle and tidy your data

Participant data

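Let’s start by cleaning up our column names. clean_names() (from the janitor package) converts them to lower case and replaces spaces and punctuation with underscores: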

p_info <- clean_names(p_info)

colnames(p_info)
 [1] "id"                                                            
 [2] "start_time"                                                    
 [3] "completion_time"                                               
 [4] "email"                                                         
 [5] "name"                                                          
 [6] "id1"                                                           
 [7] "sex"                                                           
 [8] "age"                                                           
 [9] "stature"                                                       
[10] "mass"                                                          
[11] "limb_dominance"                                                
[12] "basketball_experience_how_regularly_have_you_played_basketball"
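The idea behind clean_names() can be illustrated with a few lines of base R. The function below, simple_clean_names(), is a hypothetical, simplified approximation written only for illustration; it is not janitor's implementation and will not match it in every case (janitor also handles accents, duplicate names, camelCase and more).

```r
# Hypothetical helper: a rough imitation of janitor::clean_names() for illustration
simple_clean_names <- function(x) {
  x <- tolower(x)                    # lower case
  x <- gsub("[^a-z0-9]+", "_", x)    # any run of other characters -> one underscore
  gsub("^_+|_+$", "", x)             # drop leading/trailing underscores
}

simple_clean_names(c("Start time", "ID1", "Limb dominance",
                     "Basketball experience.How regularly have you played basketball"))
```

This reproduces the names above for these columns, including the long basketball experience question, which is why it is renamed to bb_experience in the next step.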

Now let’s select and rename the columns as we wish. We will call this p_info_clean:

p_info_clean <- p_info %>% 
  select(c(id = "id1",
           sex,
           age,
           stature,
           mass,
           limb_dominance,
           bb_experience = "basketball_experience_how_regularly_have_you_played_basketball"))

head(p_info_clean)
# A tibble: 6 × 7
  id    sex      age stature  mass limb_dominance bb_experience         
  <chr> <chr>  <dbl>   <dbl> <dbl> <chr>          <chr>                 
1 P1    Male      21    188.  60   Left handed    Very little experience
2 P2    Female    20    169   66   Right handed   No experience         
3 P4    Female    21    164   70.6 Right handed   Very little experience
4 P3    Male      20    178   72.6 Right handed   Very little experience
5 P6    Male      21    176   75   Right handed   Very little experience
6 P5    Male      19    183   80   Right handed   Very little experience

Participant summary statistics

These data describe our sample from the population of Sport and Exercise students, so we need to create some summary statistics for the participant information section of our methods.

For our numeric variables (age, stature, mass) we need the mean and standard deviation:

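# For mean (objects named p_means/p_sds so they do not mask the mean() and sd() functions)
p_means <- p_info_clean %>% 
  summarize(across(c(age, stature, mass), ~ mean(.x, na.rm = TRUE)))

# For standard deviation
p_sds <- p_info_clean %>% 
  summarize(across(c(age, stature, mass), ~ sd(.x, na.rm = TRUE)))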
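Methods sections usually report these as mean ± SD. A base R sketch of computing and formatting both in one step, using hypothetical values (not our participants):

```r
# Hypothetical participant data (not our sample)
demo <- data.frame(age     = c(19, 21, 22, 20),
                   stature = c(170, 180, 175, 165),
                   mass    = c(65, 75, 70, 80))

# Mean and SD of every column, ignoring missing values
demo_means <- sapply(demo, mean, na.rm = TRUE)
demo_sds   <- sapply(demo, sd, na.rm = TRUE)

# Format as "mean ± SD" to one decimal place
demo_summary <- paste0(names(demo), ": ", sprintf("%.1f", demo_means),
                       " ± ", sprintf("%.1f", demo_sds))
demo_summary
```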

For our categorical variables (sex, limb_dominance, bb_experience), the count and proportion are ideal. Here is example code for sex; you will need to repeat it for the other variables.

# sex

sex <- fct_count(p_info_clean$sex, prop = TRUE)
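fct_count() returns each category with its count and, with prop = TRUE, its proportion. The same numbers can be produced in base R with table() and prop.table(); a sketch with hypothetical responses (not our sample):

```r
# Hypothetical limb dominance responses
limb <- c("Right handed", "Right handed", "Left handed", "Right handed")

counts <- table(limb)          # count per category
props  <- prop.table(counts)   # proportion per category (sums to 1)

counts
round(props * 100, 1)          # as percentages
```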

Basketball data

Remember we called our basketball data “data”. As above, you will need to clean the column names:

data <- clean_names(data)

Then we need to select the variables we need: participant id, attention focus (external or internal), task condition (single or dual) and, finally, our total score out of 50 (total):

data_clean <- data %>% 
  select(c(id = "id1",
           attention_focus,
           task_condition,
           total))

head(data_clean)
# A tibble: 6 × 4
  id    attention_focus task_condition total
  <chr> <chr>           <chr>          <dbl>
1 P1    Internal        Single-task        0
2 P1    Internal        Dual-task          0
3 P1    External        Single-task        0
4 P1    External        Dual-task          3
5 P10   Internal        Single-task       25
6 P10   Internal        Dual-task         15

Basketball summary statistics

The code below groups data by task condition and attention focus and gives us our means. You will also need to amend and run this code to give you a standard deviation.

data_clean %>%
  group_by(task_condition, attention_focus) %>%
  summarise(mean = mean(total, na.rm = TRUE))
`summarise()` has grouped output by 'task_condition'. You can override using
the `.groups` argument.
# A tibble: 4 × 3
# Groups:   task_condition [2]
  task_condition attention_focus  mean
  <chr>          <chr>           <dbl>
1 Dual-task      External         20.8
2 Dual-task      Internal         21.4
3 Single-task    External         22.2
4 Single-task    Internal         21.7
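If you want to check your dplyr answer against a second method, base R's aggregate() can summarise a score by two grouping variables at once. A sketch on hypothetical data (two made-up participants), returning both the mean and SD for each combination:

```r
# Hypothetical scores: two participants, each tested in all four conditions
scores <- data.frame(
  id              = rep(c("A", "B"), each = 4),
  task_condition  = rep(c("Single-task", "Dual-task"), times = 4),
  attention_focus = rep(rep(c("Internal", "External"), each = 2), times = 2),
  total           = c(20, 16, 24, 22, 30, 26, 28, 20)
)

# Mean and SD of total for each task condition x attention focus combination
group_summary <- aggregate(total ~ task_condition + attention_focus, data = scores,
                           FUN = function(x) c(mean = mean(x), sd = sd(x)))
group_summary
```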

Hypothesis 1 data

At the moment each participant has two scores in each attention focus condition (one single-task and one dual-task). For hypothesis 1 we are simply comparing external and internal, so we need to reduce this to one value for external and one value for internal. To do this we take the mean of the single-task and dual-task scores in each condition for each person.

The code group_by(id, attention_focus) groups the data by id and attention_focus, and summarise(mean_total = mean(total, na.rm = TRUE)) then calculates the mean of “total” for each group, which we call mean_total.

hyp_1 <- data_clean %>%
  group_by(id, attention_focus) %>%
  summarise(mean_total = mean(total, na.rm = TRUE))
`summarise()` has grouped output by 'id'. You can override using the `.groups`
argument.
head(hyp_1)
# A tibble: 6 × 3
# Groups:   id [3]
  id    attention_focus mean_total
  <chr> <chr>                <dbl>
1 P1    External               1.5
2 P1    Internal               0  
3 P10   External              17.5
4 P10   Internal              20  
5 P11   External              24  
6 P11   Internal              26.5

Hypothesis 2 data

For hypothesis 2 we need the difference between the dual- and single-task conditions. The code below subtracts the single-task score from the dual-task score for each participant within each attention focus:

hyp_2 <- data_clean %>%
  
  # group by attention_focus and id so each dual-task score is matched to the
  # single-task score from the same participant and condition
  group_by(attention_focus, id) %>%
  
  # new variable "dif" = Dual-task total minus Single-task total,
  # stored on the dual-task row only (NA on the single-task row)
  mutate(dif = if_else(task_condition == "Dual-task",
                       total[task_condition == "Dual-task"] - total[task_condition == "Single-task"],
                       NA_real_)) %>%
  ungroup() %>%
  
  # keep one row per participant and attention focus
  # (the remaining "total" column is therefore the dual-task total)
  filter(!is.na(dif)) %>%
  
  # we no longer need the task_condition column
  select(-task_condition)
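The if_else() step relies on each id and attention focus group containing exactly one single-task row and one dual-task row. An alternative that makes the pairing explicit is to split the data by task condition and merge the two halves on id and attention focus. A base R sketch with hypothetical data (not our participants):

```r
# Hypothetical long-format data: one row per participant x focus x task
scores <- data.frame(
  id              = rep(c("A", "B"), each = 4),
  attention_focus = rep(rep(c("Internal", "External"), each = 2), times = 2),
  task_condition  = rep(c("Single-task", "Dual-task"), times = 4),
  total           = c(20, 16, 24, 22, 30, 26, 28, 20)
)

single <- subset(scores, task_condition == "Single-task", select = -task_condition)
dual   <- subset(scores, task_condition == "Dual-task",   select = -task_condition)

# Match each dual-task score to the single-task score from the same id and focus
paired <- merge(dual, single, by = c("id", "attention_focus"),
                suffixes = c("_dual", "_single"))

# Difference = dual-task minus single-task (negative = worse under the dual task)
paired$dif <- paired$total_dual - paired$total_single
paired
```

If a participant is missing one of the two trials, merge() simply drops that pair rather than producing a mismatched difference.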

Hypothesis 2 summary statistics

This time we use the same code as above, but we only need to group by attention focus. We also need the mean and SD of “dif” (i.e. the dual-task score minus the single-task score). The code below is for the standard deviation; you will also need to repeat it for the mean.

hyp_2 %>%
  group_by(attention_focus) %>%
  summarise(sd = sd(dif, na.rm = TRUE))
# A tibble: 2 × 2
  attention_focus    sd
  <chr>           <dbl>
1 External         8.13
2 Internal         6.46

Stage 5: Visualise our data

So now we have two data sets to work with:

  1. hyp_1 to test hypothesis 1

  2. hyp_2 to test hypothesis 2

Visualise hypothesis 1 data

Remember we need to check to see if our data is “normally distributed” - see previous work sheet.

# check for normality hypothesis 1 data

hist(hyp_1$mean_total)

qqPlot(hyp_1$mean_total)

[1]  2 22
shapiro.test(hyp_1$mean_total)

    Shapiro-Wilk normality test

data:  hyp_1$mean_total
W = 0.94012, p-value = 0.05123
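Strictly, the paired t-test assumes that the paired differences (for example, internal minus external for each participant) are approximately normal, so you may also want to check those differences. A base R sketch with hypothetical paired scores (not our data):

```r
# Hypothetical paired scores for 8 participants
internal <- c(14, 20, 25, 19, 23, 22, 18, 16)
external <- c(12, 17, 24, 18, 21, 22, 19, 13)

# The paired t-test's normality assumption concerns these differences
d <- internal - external
hist(d)
sw <- shapiro.test(d)
sw
```

With real data, you could build these differences from the wide-format data created in the next stage (hyp_1_wide$Internal - hyp_1_wide$External).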

You might want to make a nicer histogram and density plot in ggplot; if you do, see the previous workbook.

# summarise through box plot for example?

ggplot(data = hyp_1, aes(x = attention_focus, y = mean_total))+ 
  geom_boxplot() + 
  geom_point() +
  theme_classic()

You will want to make a nicer graph by filling the boxes, adding axis titles and maybe a plot title. You might want to add a theme (theme_classic() or theme_minimal()). You might even want to specify your own colors to put your own stamp on it. You will need to work this out yourself.

Here is an example:

Visualise hypothesis 2 data

You might want to look at total score here, but you will definitely want to graph the difference between the dual- and single-task conditions.

ggplot(data = hyp_2, aes(x = attention_focus, y = dif))+ 
  geom_boxplot() + 
  geom_point() 

Think hard about what your plots show you. What do you think the results of your statistical tests will show?

Stage 6: Run your statistical tests

Hypothesis 1:

Pivot data wider to make sure it is “paired” correctly

First, the data are currently in “long format”, with the external and internal scores stacked on top of each other. A paired test needs each participant's two scores side by side, so we convert to “wide format” using the function pivot_wider():

hyp_1_wide <- hyp_1 %>%
  pivot_wider(names_from = attention_focus, values_from = mean_total)

head(hyp_1_wide)
# A tibble: 6 × 3
# Groups:   id [6]
  id    External Internal
  <chr>    <dbl>    <dbl>
1 P1         1.5      0  
2 P10       17.5     20  
3 P11       24       26.5
4 P12       18.5     21  
5 P13       21.5     24  
6 P14       21.5     22.5
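pivot_wider() does the same job as base R's reshape() with direction = "wide". A sketch on hypothetical values shows the idea: each participant ends up on one row, so the internal and external scores for the same person sit side by side.

```r
# Hypothetical long-format data: one row per participant x attention focus
long <- data.frame(
  id              = c("A", "A", "B", "B"),
  attention_focus = c("External", "Internal", "External", "Internal"),
  mean_total      = c(15, 18, 22, 21)
)

wide <- reshape(long, idvar = "id", timevar = "attention_focus",
                direction = "wide")
wide   # columns: id, mean_total.External, mean_total.Internal
```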

Now we can run our t-test:

t <- t.test(hyp_1_wide$Internal, hyp_1_wide$External, paired = TRUE)
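A paired t-test is the same as a one-sample t-test on each participant's difference score. Because Internal is entered first, the differences are Internal minus External, so a positive mean difference means higher scores with an internal focus. A base R sketch with hypothetical scores, showing that both approaches give the same t statistic and how t is calculated by hand:

```r
# Hypothetical paired scores (not our data)
internal <- c(14, 20, 25, 19, 23, 22, 18, 16)
external <- c(12, 17, 24, 18, 21, 22, 19, 13)

paired_test <- t.test(internal, external, paired = TRUE)
one_sample  <- t.test(internal - external)   # one-sample test of the differences vs 0

# By hand: t = mean(d) / (sd(d) / sqrt(n))
d <- internal - external
t_by_hand <- mean(d) / (sd(d) / sqrt(length(d)))

c(paired     = unname(paired_test$statistic),
  one_sample = unname(one_sample$statistic),
  by_hand    = t_by_hand)
```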

But…….

Were our data normally distributed? Possibly, but probably not: the Shapiro-Wilk test was borderline (p = 0.051). We could use a non-parametric alternative to the paired t-test, the Wilcoxon signed-rank test. As you will see if you run it, it gives us little useful information beyond a test statistic and a p-value.

Or we could use a robust t-test based on the work of Field and Wilcox (2017). This might be a good option here, given that some outliers are skewing our data a little. Rather than comparing ordinary means, it compares trimmed means: with tr = 0.2, the lowest 20% and the highest 20% of values are trimmed, so the central 60% of the data are kept.
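To see what trimming does, base R's mean() has a trim argument that removes the given fraction of values from each end before averaging. With trim = 0.2 and 10 values, the two smallest and the two largest are dropped. A sketch with hypothetical numbers, including one extreme value:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)   # hypothetical scores with one outlier

mean(x)               # ordinary mean, pulled up by the outlier: 14.5
mean(x, trim = 0.2)   # 20% trimmed mean, i.e. the mean of 3 to 8: 5.5
```

The outlier moves the ordinary mean a long way but has no effect on the trimmed mean, which is why a test based on trimmed means is less sensitive to a few extreme participants.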

# traditional non-parametric approach: Wilcoxon signed-rank test

w <- wilcox.test(hyp_1_wide$Internal, hyp_1_wide$External, paired = TRUE)
Warning in wilcox.test.default(hyp_1_wide$Internal, hyp_1_wide$External, :
cannot compute exact p-value with ties
Warning in wilcox.test.default(hyp_1_wide$Internal, hyp_1_wide$External, :
cannot compute exact p-value with zeroes
# or run robust t-test on the "trimmed means" to deal with our outliers:

rob_t <- yuend(hyp_1_wide$Internal, hyp_1_wide$External, tr = 0.2) 

Field, A.P. and Wilcox, R.R., 2017. Robust statistical methods: A primer for clinical psychology and experimental psychopathology researchers. Behaviour Research and Therapy, 98, pp.19-38.

Hypothesis 2:

For hypothesis 2 we do the same thing, but these data were normally distributed, so a standard paired t-test is all that is needed:

# for hypothesis 2: note we need the data in "wide_format"

dif_wide <- pivot_wider(hyp_2, names_from = attention_focus, values_from = c(dif, total) )

# run paired samples t-test:

dif_t <- t.test(dif_wide$dif_Internal, dif_wide$dif_External, paired = TRUE)

You can now report the mean difference in arbitrary units (AU, rounded to 3 significant digits), the 95% confidence interval and the p-value (exact if possible).
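These values are stored inside the t-test object, so you do not have to copy them off the screen: estimate holds the mean difference, conf.int the 95% confidence interval and p.value the p-value, while signif() rounds to significant digits. A sketch using hypothetical difference scores (not our data):

```r
# Hypothetical difference scores (dual - single) for each focus
dif_internal <- c(-4, -2, -6, 1, -3, -5)
dif_external <- c(-2, -3, 0, 2, -1, -4)

res <- t.test(dif_internal, dif_external, paired = TRUE)

mean_diff <- signif(unname(res$estimate), 3)   # mean of (internal - external)
ci_95     <- signif(res$conf.int, 3)           # lower and upper 95% limits
p_value   <- res$p.value                       # unrounded p-value

cat("mean difference", mean_diff, "AU, 95% CI", ci_95[1], "to", ci_95[2],
    "AU, p =", format.pval(p_value, digits = 3), "\n")
```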

e.g. NOTE THIS IS NOT THE CORRECT DATA

No significant difference in the dual-task minus single-task score was found between attention foci (mean difference 1.5 AU, 95% confidence interval -4.45 to 6.55 AU, p = 0.7589).