This week's coding goals

This week, I had several goals:

  1. Finalise which questions I would answer in my exploratory analyses
  2. Answer most of the questions, so that next week I can focus on tidying up my verification report
  3. Figure out how to knit my R Markdown document to PDF or Word

How did I go?

Question 1

Attempt 1

Last week, I answered the question: Does the number of years spent at university affect students' bias scores? I looked at the average bias scores recorded at each of the 4 IAT timepoints, averaged over the cued and uncued conditions. I visualised this as a line graph with 4 coloured lines, one for each number of years spent at university (0, 1, 2, or 3). The x-axis showed the 4 IAT timepoints (baseline, prenap, postnap, and one-week delay), and the y-axis showed the participants' average bias scores.

However, the feedback on last week's learning log was that the question may not have been well thought out, as its premise did not make sense. So I started again and reread the paper. My revised question became: Does the number of years spent at university influence the effectiveness of targeted memory reactivation (TMR) for reducing implicit biases? This question looks at the difference between the prenap (post-counterbias training) and postnap (after TMR) IAT scores. I'll visualise this using a column graph to see whether there are any differences between the 4 year groups.

If TMR proves particularly effective for a certain year group, further research could examine why or how TMR works so well for that group. For example, a particular culture or demographic within the year group may make TMR especially effective. It would also allow research efforts to be redirected towards refining existing techniques, or discovering more effective ones, for the other year groups. Overall, this could have real-world implications for the use of TMR in education, the workforce and beyond.

Preliminaries

Load packages

library(tidyverse)
library(readspss) 
library(ggplot2)
library(janitor)
library(plotrix)
library(gt)

Read in the data

cleandata <- read_csv("cleandata.csv")

Figuring out what data I need

First, I need to calculate the changes in implicit bias levels at the immediate tests (i.e. from pre- to post-nap).

Since I'm only looking at the effectiveness of TMR, I'm only looking at the cued condition.

Thus, I need the difference between the prenap and postnap bias scores for each of the 4 year groups. This difference is stored in the variable postnap_change_cued.

Descriptive statistics

To create my plot, I first need to calculate the mean and standard error of the change in bias scores for each of the 4 year groups. I did this by creating a new object, year_summary_postnapchange, using <-. The <- operator assigns the value on its right to the name on its left. The pipeline starts with the data cleandata.

  • The pipe operator %>% chains several function calls into a single statement, without having to create and store intermediate objects. It takes the output of one step and passes it in as the input of the next step.
    • For example, the first line starts with cleandata. %>% passes it into group_by(), and the output of group_by() is passed into summarise(). The final result of the whole chain is what gets assigned to year_summary_postnapchange.
  • group_by() is from the dplyr package. It converts an existing tibble (i.e. cleandata) into a grouped tibble, grouped by the variable given in the brackets. Here, we group by General_1_UniYears (number of years spent at university).
  • summarise() is from the dplyr package. It creates a new data frame of summary statistics with one row per group. It contains a column for each grouping variable (here there is only one: General_1_UniYears) and a column for each summary statistic I specified (mean, sd, n and se, where se = sd/sqrt(n)).
year_summary_postnapchange <- cleandata %>%
  group_by(General_1_UniYears) %>% 
  summarise(mean = mean(postnap_change_cued),
            sd = sd(postnap_change_cued),
            n = n(),
            se = sd/sqrt(n))
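For intuition, here is a rough base-R sketch of what group_by() + summarise() computes. It uses a small hypothetical stand-in for cleandata, and the scores are invented for illustration rather than taken from the study. split() divides the scores by year group, and sapply() applies each summary function to every group.

```r
# Hypothetical stand-in for cleandata (invented scores, not study data)
toy <- data.frame(
  General_1_UniYears  = c(0, 0, 0, 1, 1),
  postnap_change_cued = c(0.2, -0.1, 0.5, 0.3, 0.7)
)

# Split the scores by year group, then summarise each group
groups <- split(toy$postnap_change_cued, toy$General_1_UniYears)
summary_tab <- data.frame(
  General_1_UniYears = names(groups),
  mean = sapply(groups, mean),
  sd   = sapply(groups, sd),
  n    = sapply(groups, length)
)
# Standard error of the mean for each group
summary_tab$se <- summary_tab$sd / sqrt(summary_tab$n)
summary_tab
```

The dplyr version above does the same thing more readably and keeps the result as a tibble.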

Now that I've calculated the values I need, I need to put them into a table. Again, I've used the gt package to do this.

  • I first specify the data frame to use, year_summary_postnapchange, then use the pipe operator %>% to pass it into gt().
  • tab_header() is from the gt package and adds a header to the table. It accepts both a title and a subtitle, so I specify that I'm creating a title. The title text goes in quotation marks.
    • md() is from the gt package. It lets the text be formatted as Markdown, so I can use bold or italic fonts etc.
  • fmt_number() is from the gt package and controls how numeric values are formatted.
    • columns = specifies which columns to format. vars() works like select() in that it picks out the variables needed (newer versions of gt deprecate vars() in favour of c()). Combined, these arguments say that mean, sd and se should be formatted.
    • decimals = specifies how many decimal places to display.
  • cols_label() is from the gt package. It relabels columns: I give the original variable name (e.g. General_1_UniYears), then after = I place the new label in quotation marks.
year_summary_postnapchange %>% 
  gt() %>% 
  tab_header(title = md("**Change in implicit bias levels at the immediate test for each year group**")) %>% 
  fmt_number(
    columns = vars(mean, sd, se),
    decimals = 2
  ) %>% 
  cols_label(General_1_UniYears = "Number of Years at University", 
             mean = "Mean", 
             sd = "SD", 
             n = "n",
             se = "SE")
Change in implicit bias levels at the immediate test for each year group
Number of Years at University | Mean  | SD   | n  | SE
0                             | −0.05 | 0.53 | 11 | 0.16
1                             | 0.24  | 0.75 | 4  | 0.38
2                             | 0.33  | 0.56 | 10 | 0.18
3                             | −0.12 | 0.28 | 6  | 0.11

Note: sample sizes are quite small for each year group (n = 4 to 11), so the standard errors are relatively large.
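As a quick check on the table, se = sd/sqrt(n) can be recomputed from the printed SDs and sample sizes. Because those SDs are already rounded, the results match the SE column only to about two decimal places. The check also shows why the 1-year group, with only 4 participants, has the widest standard error.

```r
# Rounded SDs and sample sizes from the year-group table
sd_vals <- c(0.53, 0.75, 0.56, 0.28)
n_vals  <- c(11, 4, 10, 6)

# Standard error of the mean: SD divided by the square root of n
se_vals <- sd_vals / sqrt(n_vals)
round(se_vals, 3)
```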

Visualisation

Now that the descriptive statistics have been calculated, it's time to create the plot. First, I have to create the data frame for the figure.

  • condition holds the x-axis categories, which are the 4 year groups.
  • bias_change holds the y-axis values (the mean change for each year group).
  • stderror holds the standard errors used for the error bars.
  • data = data.frame(...) combines these three vectors into a data frame named data.
  • head() prints the first rows so the data can be checked.
condition <-c("0", "1", "2", "3")
bias_change <- c(-0.05, 0.24, 0.33, -0.12)
stderror <- c(0.16, 0.38, 0.18, 0.11)
data = data.frame(condition, bias_change, stderror)

head(data)
##   condition bias_change stderror
## 1         0       -0.05     0.16
## 2         1        0.24     0.38
## 3         2        0.33     0.18
## 4         3       -0.12     0.11
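Retyping the rounded means and SEs works. An alternative is to build the plotting data frame directly from the summary table, so the plot always matches it. In the sketch below, year_summary is a hypothetical stand-in typed from the table above. With the real data, you would use year_summary_postnapchange in its place.

```r
# Hypothetical stand-in for year_summary_postnapchange, typed from the table above;
# with the real data, use year_summary_postnapchange directly instead
year_summary <- data.frame(
  General_1_UniYears = c(0, 1, 2, 3),
  mean = c(-0.05, 0.24, 0.33, -0.12),
  se   = c(0.16, 0.38, 0.18, 0.11)
)

# Build the plotting data from the summary columns rather than retyping numbers
data <- data.frame(
  condition   = factor(year_summary$General_1_UniYears),  # year group as a discrete x-axis
  bias_change = year_summary$mean,
  stderror    = year_summary$se
)
head(data)
```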

Plotting the graph

I've plotted the graph using ggplot().

  • ggplot() indicates that I am going to create a ggplot object.
  • data = specifies what dataset to use for the plot.
  • aes() specifies which variables are to be used for the x-axis (x =), y-axis (y =).
    • fill = indicates that different colours are to be allocated for each condition
  • geom_bar() adds the column bars. By default, it makes each bar's height proportional to the number of cases in each group (see stat below). Without it, the plot would be empty, with only the x- and y-axes showing.
  • position = "dodge" ensures that bars sharing an x position are placed side by side rather than stacked.
  • stat = "identity" is needed because, by default, geom_bar() counts the observations in each group (stat = "count") and so cannot be given a y aesthetic. Setting stat = "identity" tells ggplot2 to use the y values we supply as the bar heights. (geom_col() is a shortcut for geom_bar(stat = "identity").)
  • alpha determines the opacity of a geom, with lower values indicating more transparency.
  • geom_errorbar() adds error bars and specifies where they sit in terms of x- and y-values. x = places each error bar on its condition's bar; otherwise the error bars would sit next to each other rather than on each bar. ymin = and ymax = set where each error bar ends on the y-axis. width = sets how wide the error bar ends are, and colour = sets their colour.
  • ylim() sets the limits of the y-axis. Note that any data outside these limits are removed (with a warning); coord_cartesian(ylim = ...) zooms in without removing data.
  • labs() relabels axes, legends and plot labels. Here, I'm using it to relabel my x-axis (x =), y-axis (y =) and plot title (title =).
ggplot(data = data, aes(
  x = condition,
  y = bias_change,
  fill = condition
)) +
  geom_bar(
    position = "dodge", 
    stat = "identity", 
    alpha=0.7) +
  geom_errorbar(aes(
    x = condition, 
    ymin=bias_change-stderror, 
    ymax=bias_change+stderror), 
    width=0.4, 
    colour="black", 
    alpha= 0.9, 
    position = position_dodge(0.9)) +
  ylim(-0.4, 0.7) +
  labs(x = "Number of years participant has spent in university", 
       y = "Bias Change", 
       title = "Change in implicit bias levels at the immediate test for each year group")
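Since ylim() drops anything outside its range, it is worth checking that every error bar end fits inside ylim(-0.4, 0.7). The sketch below recomputes ymin and ymax with the same expressions passed to geom_errorbar(), using the values from the table.

```r
# Values typed from the Attempt 1 summary table
bias_change <- c(-0.05, 0.24, 0.33, -0.12)
stderror    <- c(0.16, 0.38, 0.18, 0.11)

# The same expressions geom_errorbar() uses for the bar ends
ymin <- bias_change - stderror
ymax <- bias_change + stderror

# TRUE if every error bar lies inside ylim(-0.4, 0.7)
all(ymin >= -0.4 & ymax <= 0.7)
```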

Attempt 2

However, I realised that including the uncued condition (counterbias training only) could provide more information, so I tried again. The question is worded the same, but I now define it differently.

  • Q: Does the number of years spent at university influence the effectiveness of targeted memory reactivation (TMR) for reducing implicit biases?

I've defined this question as:

  • Looking at the differences between the prenap (post-counterbias training) and postnap (after TMR) IAT scores
  • Measuring the effectiveness of TMR by comparing the prenap-to-postnap change in the cued (counterbias training + TMR) condition with the change in the uncued (counterbias training only) condition
  • Recall: prenap scores are obtained after counterbias training and postnap scores are obtained after TMR

Descriptive statistics

For the cued condition:

year_summary_postnapcued <- cleandata %>%  # cleandata was read in for Attempt 1
  group_by(General_1_UniYears) %>% 
  summarise(mean = mean(postnap_change_cued),
            sd = sd(postnap_change_cued),
            n = n(),
            se = sd/sqrt(n))
year_summary_postnapcued %>% 
  gt() %>% 
  tab_header(title = md("**Change in implicit bias levels using Targeted Memory Reactivation for each year group**")) %>% 
  fmt_number(
    columns = vars(mean, sd, se),
    decimals = 2
  ) %>% 
  cols_label(General_1_UniYears = "Number of Years at University", 
             mean = "Mean", 
             sd = "SD", 
             n = "n",
             se = "SE")

For the uncued condition:

year_summary_postnapuncued <- cleandata %>%  # cleandata was read in for Attempt 1
  group_by(General_1_UniYears) %>% 
  summarise(mean = mean(postnap_change_uncued),
            sd = sd(postnap_change_uncued),
            n = n(),
            se = sd/sqrt(n))
year_summary_postnapuncued %>% 
  gt() %>% 
  tab_header(title = md("**Change in implicit bias levels using counterbias training for each year group**")) %>% 
  fmt_number(
    columns = vars(mean, sd, se),
    decimals = 2
  ) %>% 
  cols_label(General_1_UniYears = "Number of Years at University", 
             mean = "Mean", 
             sd = "SD", 
             n = "n",
             se = "SE")

Visualisation

Now that the descriptive statistics have been calculated, it's time to create the figure. First, I have to create the dataframe for the figure.

time1 <- c(rep("0",2), rep("1",2), rep("2",2), rep("3",2))  # year group for each bar
condition <-rep(c("cued","uncued"),4)                       # cued/uncued alternate within each year group
bias_change <- c(-0.05, -0.14, 0.24, -0.19, 0.33, -0.11, -0.12, 0.28)
stderror <- c(0.16, 0.16, 0.38, 0.21, 0.18, 0.14, 0.11, 0.36)
# include time1 so the plots below read it from data1, not from a stray global vector
data1 = data.frame(time1, condition, bias_change, stderror)

head(data1)
##   time1 condition bias_change stderror
## 1     0      cued       -0.05     0.16
## 2     0    uncued       -0.14     0.16
## 3     1      cued        0.24     0.38
## 4     1    uncued       -0.19     0.21
## 5     2      cued        0.33     0.18
## 6     2    uncued       -0.11     0.14

Plotting the graph

Attempt 1:

ggplot(data = data1, aes(
  x = time1,
  y = bias_change,
  fill = condition
)) +
  geom_bar(
    position = "dodge", 
    stat = "identity", 
    alpha=0.7) +
  geom_errorbar(aes(
    x = condition, 
    ymin=bias_change-stderror, 
    ymax=bias_change+stderror), 
    width=0.4, 
    colour="black", 
    alpha= 0.9, 
    position = position_dodge(0.9)) +
  ylim(-0.4, 0.7) +
  labs(x = "Number of years participant has spent in university", 
       y = "Bias Change", 
       title = "Change in implicit bias levels at the immediate test for each year group")

Attempt 2:

ggplot(data = data1, aes(
  x = time1,
  y = bias_change,
  fill = condition
)) +
  geom_bar(
    position = "dodge", 
    stat = "identity", 
    alpha=0.7) +
  geom_errorbar(aes(
    x = time1, 
    ymin=bias_change-stderror, 
    ymax=bias_change+stderror), 
    width=0.4, 
    colour="black", 
    alpha= 0.9, 
    position = position_dodge(0.9)) +
  ylim(-0.4, 0.7) +
  labs(x = "Number of years spent in university", 
       y = "Bias Change", 
       title = "Change in implicit bias levels at the immediate test for each year group") +
  theme_classic()

As this plot shows, only the 1-year and 2-year groups experienced both an increase in cued bias scores and a decrease in uncued bias scores. This can be interpreted as TMR having an undesirable effect, increasing bias levels for these students, while counterbias training somewhat reduced bias levels.

The opposite is observed in the 3-year group, which experienced an increase in uncued bias scores and a decrease in cued bias scores. This can be interpreted as TMR having a desirable effect, decreasing bias levels for these students, while counterbias training increased their bias levels.

The only year group in which both the TMR and counterbias training procedures reduced bias levels is the group that had spent 0 years at university. However, these participants experienced a greater reduction in bias with the counterbias procedure than with the TMR procedure.

Statistics

I want to compare several things:

  1. Compare means between groups (number of years spent at university) for the cued condition: is there a difference in TMR effectiveness between university year groups?
    • Use a one-way ANOVA to compare the 4 group means
  2. Compare means between groups (number of years spent at university) for the uncued condition: is there a difference in counterbias training effectiveness between university year groups?
    • Use a one-way ANOVA to compare the 4 group means
  3. For each year group, is TMR more effective than counterbias training?
    • Use a t-test to compare the cued and uncued conditions within each year group.

Writing the ANOVA code for points 1 and 2 seems difficult, so I will leave this for later.
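For points 1 and 2, base R's aov() is one option for a one-way ANOVA. The sketch below uses a hypothetical stand-in for cleandata with randomly generated scores, so its F and p values mean nothing. With the real data, the formula would be postnap_change_cued ~ General_1_UniYears (and the same with postnap_change_uncued). The key step is converting year group to a factor; otherwise aov() treats it as a numeric slope with 1 degree of freedom.

```r
# Hypothetical stand-in for cleandata: 3 participants per year group, random scores
set.seed(1)
toy <- data.frame(
  General_1_UniYears  = rep(0:3, each = 3),
  postnap_change_cued = rnorm(12)
)

# Year group must be a factor so the ANOVA compares 4 group means (3 df)
toy$General_1_UniYears <- factor(toy$General_1_UniYears)

fit <- aov(postnap_change_cued ~ General_1_UniYears, data = toy)
summary(fit)  # F test of whether the 4 year-group means differ
```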

For point 3, I'll try using the stat_compare_means() function from the ggpubr package, adding it to the plot.

cuedvsuncued <- list(c("cued", "uncued"))

ggplot(data = data1, aes(
  x = time1,
  y = bias_change,
  fill = condition
)) +
  geom_bar(
    position = "dodge", 
    stat = "identity", 
    alpha=0.7) +
  geom_errorbar(aes(
    x = time1, 
    ymin=bias_change-stderror, 
    ymax=bias_change+stderror), 
    width=0.4, 
    colour="black", 
    alpha= 0.9, 
    position = position_dodge(0.9)) +
  ylim(-0.4, 0.7) +
  labs(x = "Number of years spent in university", 
       y = "Bias Change", 
       title = "Change in implicit bias levels at the immediate test for each year group") +
  theme_classic() +
  stat_compare_means(comparisons = cuedvsuncued, method = "t.test")
## Error in stat_compare_means(comparisons = cuedvsuncued, method = "t.test"): could not find function "stat_compare_means"

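This fails because stat_compare_means() comes from the ggpubr package, which I haven't loaded. Even with ggpubr loaded, it would need the participant-level scores, whereas data1 holds only one summary value per bar, so no t-test can be computed from it. Instead, I'll run ordinary t-tests: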

Year0:

year0uni_participants <- cleandata %>% filter(General_1_UniYears == 0)  # participants with 0 years at university
t.test(year0uni_participants$postnap_change_cued, year0uni_participants$postnap_change_uncued)

The p-value is 0.7082, so for participants who had spent 0 years at university, there is no evidence of a significant difference between TMR and counterbias training in reducing implicit biases.

Year1:

year1uni_participants <- cleandata %>% filter(General_1_UniYears == 1)  # participants with 1 year at university
t.test(year1uni_participants$postnap_change_cued, year1uni_participants$postnap_change_uncued)

Year 2:

year2uni_participants <- cleandata %>% filter(General_1_UniYears == 2)  # participants with 2 years at university
t.test(year2uni_participants$postnap_change_cued, year2uni_participants$postnap_change_uncued)

Year 3:

year3uni_participants <- cleandata %>% filter(General_1_UniYears == 3)  # participants with 3 years at university
t.test(year3uni_participants$postnap_change_cued, year3uni_participants$postnap_change_uncued)

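As seen above, the p-values for all 4 year groups are above 0.05. Thus, for each year group, there is no evidence of a significant difference between TMR and counterbias training in reducing implicit biases. One caveat: calling t.test() with two vectors runs a Welch two-sample test, which treats the cued and uncued scores as independent groups. Because each participant contributes both a cued and an uncued score, a paired t-test (paired = TRUE) may be more appropriate.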
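The sketch below shows how much the choice between Welch and paired tests can matter when each participant's cued and uncued scores move together. The numbers are invented and are not the study's data. A paired test works on each participant's cued minus uncued difference, so consistent within-person differences show up clearly even when scores vary a lot between people.

```r
# Hypothetical cued and uncued change scores for the same 6 participants (invented)
cued   <- c(0.10, -0.20, 0.35, 0.05, -0.15, 0.25)
uncued <- c(0.00, -0.30, 0.30, -0.05, -0.20, 0.15)

# Default: Welch two-sample test, treating the two vectors as independent groups
welch  <- t.test(cued, uncued)
# Paired test: uses each participant's cued - uncued difference
paired <- t.test(cued, uncued, paired = TRUE)

c(welch = welch$p.value, paired = paired$p.value)
```

With the real data, the paired version would be t.test(year0uni_participants$postnap_change_cued, year0uni_participants$postnap_change_uncued, paired = TRUE). This assumes the two columns line up row by row for the same participants.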

Question 2

Q: Does the number of cues that participants are exposed to influence the effectiveness of the TMR procedure?

During the targeted memory reactivation procedure, participants were exposed to sound cues during their 90-minute nap. However, the sound cues automatically stopped if participants showed signs of awakening or of entering another sleep stage. As a result, the number of cues that participants were exposed to varied from as few as 37.5 to as many as 660.

I've defined this question as:

  • Looking at the differential bias change between cued (counterbias training + TMR) and uncued (counterbias training only) conditions
    • Differential bias change = the baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias
  • I'll be looking at the correlation between this differential bias change and the number of cues participants are exposed to. Thus, I'll use a scatterplot to illustrate this.

Preliminaries

Load packages

library(tidyverse)
library(readspss) 
library(ggplot2)
library(janitor)
library(plotrix)
library(gt)

Read in the data

exploratory2 <- read_csv("cleandata.csv")

Figuring out what data I need

  • Number of cues that participants are exposed to: already provided in the open data, under variable cues_total
  • Differential bias change: I need to calculate this myself using mutate(). As noted earlier, differential bias change is defined as the baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias.

Thus, the equation will look like this:

differential bias change = (baseline_cued - delayed_cued) - (baseline_uncued - delayed_uncued)

This equation must be applied to each participant's scores, so mutate() is the best function for the job. mutate() is from the dplyr package and creates, modifies or deletes columns; new variables are added while existing ones are kept. The two bracketed terms of the equation become 2 new variables. cued_differential is defined as baseIATcued - weekIATcued, and uncued_differential is defined as baseIATuncued - weekIATuncued. The variable diff_bias_change can then also be defined as:

diff_bias_change = cued_differential - uncued_differential

However, before that equation can be coded in, several other steps need to occur first:

  • First, a new object, differential, is created using <-, which assigns the value on its right to the name on its left. The pipeline starts with the data exploratory2.
  • The pipe operator %>% chains several function calls into a single statement, without having to create and store intermediate objects. It takes the output of one step and passes it in as the input of the next.
  • For example, the first line starts with exploratory2. %>% passes it into select(), and the output of select() is passed into mutate(). The final result is assigned to differential.
  • select() is from the dplyr package. It keeps only the variables listed in the brackets from the exploratory2 data frame.
  • mutate() creates the three new variables described above and adds them as new columns to the selected data. exploratory2 itself is not changed; the result is stored in differential.
  • head() shows the first six rows so the newly calculated variables can be checked.
differential <- exploratory2 %>%
  select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, cues_total) %>%
  mutate(cued_differential = baseIATcued - weekIATcued,
         uncued_differential = baseIATuncued - weekIATuncued,
         diff_bias_change = cued_differential - uncued_differential) 

head(differential)
## # A tibble: 6 x 9
##   ParticipantID baseIATcued weekIATcued baseIATuncued weekIATuncued cues_total
##   <chr>               <dbl>       <dbl>         <dbl>         <dbl>      <dbl>
## 1 ub6                0.575       0.204         0.610         0.683        142.
## 2 ub7                0.0991      0.459         0.644        -0.0107       180 
## 3 ub8                0.206       0.399         1.52          0.712        232.
## 4 ub9                0.353       0.923         0.131         0.202        240 
## 5 ub11               0.572      -0.0187        0.0488        0.131        225 
## 6 ub13               0.310       0.561         0.901         1.12         375 
## # … with 3 more variables: cued_differential <dbl>, uncued_differential <dbl>,
## #   diff_bias_change <dbl>
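To check the formula by hand, the sketch below recomputes diff_bias_change in base R for the six participants printed by head(differential). The inputs are the rounded values shown in the printout, so the results agree with the full-precision column only to about two decimal places. For the first participant (ub6), the calculation is (0.575 - 0.204) - (0.610 - 0.683) = 0.371 + 0.073 = 0.444.

```r
# First six participants as printed by head(differential) (values rounded by the printout)
six <- data.frame(
  ParticipantID = c("ub6", "ub7", "ub8", "ub9", "ub11", "ub13"),
  baseIATcued   = c(0.575, 0.0991, 0.206, 0.353, 0.572, 0.310),
  weekIATcued   = c(0.204, 0.459, 0.399, 0.923, -0.0187, 0.561),
  baseIATuncued = c(0.610, 0.644, 1.52, 0.131, 0.0488, 0.901),
  weekIATuncued = c(0.683, -0.0107, 0.712, 0.202, 0.131, 1.12)
)

# Same three steps as the mutate() call: baseline minus delayed, then cued minus uncued
six$cued_differential   <- six$baseIATcued - six$weekIATcued
six$uncued_differential <- six$baseIATuncued - six$weekIATuncued
six$diff_bias_change    <- six$cued_differential - six$uncued_differential

round(six$diff_bias_change, 2)
```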

Descriptive statistics

I calculated my descriptive statistics in the same way as for my first exploratory question. However, my table came out strangely, with no SD or SE for most values. This is because nearly every cue count belongs to only one participant (n = 1), and the standard deviation of a single value is undefined (NA). After the Thursday class, Jenny pointed out that because participants were exposed to so many different numbers of cues, descriptive statistics grouped by cue count are not necessary to answer my question.

cuesexposed <- differential %>% 
  group_by(cues_total) %>% 
  summarise(mean = mean(diff_bias_change),
            sd = sd(diff_bias_change),
            n = n(),
            se = sd/sqrt(n))
cuesexposed %>% 
  gt() %>% 
  tab_header(title = md("**Is there an association between number of cues exposed to and differential bias change?**")) %>% 
  fmt_number(
    columns = vars(mean, sd, se),
    decimals = 2
  ) %>% 
  cols_label(cues_total = "Number of cues exposed", 
             mean = "Mean", 
             sd = "SD", 
             n = "n",
             se = "SE")
Is there an association between number of cues exposed to and differential bias change?
Number of cues exposed | Mean  | SD   | n | SE
37.5                   | 1.27  | NA   | 1 | NA
45.0                   | −0.16 | NA   | 1 | NA
52.5                   | 0.34  | NA   | 1 | NA
142.5                  | 0.44  | NA   | 1 | NA
165.0                  | 0.21  | NA   | 1 | NA
180.0                  | −1.01 | NA   | 1 | NA
187.5                  | 0.81  | NA   | 1 | NA
210.0                  | −0.83 | NA   | 1 | NA
225.0                  | 0.67  | NA   | 1 | NA
232.5                  | −1.01 | NA   | 1 | NA
240.0                  | −0.50 | NA   | 1 | NA
262.5                  | 0.29  | NA   | 1 | NA
270.0                  | −0.23 | NA   | 1 | NA
277.5                  | −0.53 | NA   | 1 | NA
285.0                  | −0.04 | 0.36 | 2 | 0.26
345.0                  | −1.48 | NA   | 1 | NA
352.5                  | −0.02 | NA   | 1 | NA
375.0                  | −0.04 | NA   | 1 | NA
390.0                  | 0.25  | NA   | 1 | NA
405.0                  | −0.31 | NA   | 1 | NA
420.0                  | 0.54  | 0.25 | 2 | 0.17
427.5                  | −0.53 | NA   | 1 | NA
442.5                  | 0.18  | NA   | 1 | NA
487.5                  | 0.69  | NA   | 1 | NA
502.5                  | −0.79 | NA   | 1 | NA
540.0                  | −1.11 | NA   | 1 | NA
562.5                  | 0.03  | NA   | 1 | NA
600.0                  | −0.13 | NA   | 1 | NA
660.0                  | 0.04  | NA   | 1 | NA
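The NA entries in the SD and SE columns come straight from R's sd(). The standard deviation divides by n - 1, so with a single observation there is no spread to measure, and R returns NA. Only groups with at least two participants get a value. The two values in the sketch below are invented for illustration.

```r
# Standard deviation of one value: undefined, so R returns NA
sd(0.44)

# Two hypothetical values (invented): now there is spread to measure
sd(c(0.3, 0.8))
```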

Instead, I calculated the overall mean and standard error of the number of cues. On average, participants were presented with 323 ± 29 individual cues (mean ± SE).

cuesexposed2 <- exploratory2 %>% 
  summarise(mean = mean(cues_total),
            sd = sd(cues_total),
            n = n(),
            se = sd/sqrt(n))
cuesexposed2 %>% 
  gt() %>% 
  tab_header(title = md("**Number of cues presented to each participant**")) %>% 
  fmt_number(
    columns = vars(mean, sd, se),  # n is a count, so leave it as a whole number
    decimals = 2
  ) %>% 
  cols_label(mean = "Mean", 
             sd = "SD", 
             n = "n",
             se = "SE")
Number of cues presented to each participant
Mean   | SD     | n  | SE
323.47 | 162.33 | 31 | 29.15

Visualisation

I've plotted the graph using ggplot().

  • ggplot() indicates that I am going to create a ggplot object.
  • data = specifies what dataset to use for the plot.
  • aes() specifies which variables are to be used for the x-axis (x =), y-axis (y =).
  • geom_point() is used to add the scatterplot. Without this, the plot would be empty, with only x- and y-axis showing.
  • geom_smooth() adds a fitted line to the graph.
    • method = sets the smoothing method, e.g. "loess", "lm" (linear model) or "glm" (generalised linear model). For a straight regression line, use method = "lm". The code below passes the lm function itself, which has the same effect.
    • To remove the confidence interval shading, set se = FALSE.
  • scale_x_continuous() and scale_y_continuous() allow for formatting of the position scales for continuous data.
    • limits defines the limits of the scale.
    • By default, ggplot2 expands continuous scales by 5% on each side, leaving some padding around the data. To remove that padding, so that the x-axis starts exactly at 0, expand = is set to c(0,0) for both the x- and y-axes.
    • The c(...) in c(0,0) combines the arguments (i.e. the values within the brackets 0, 0) to form a vector
  • labs() relabels axis, legends and plot labels. Here, I'm using it to relabel my x-axis (x=), y-axis (y=) and plot title (using subtitle= because my question was too long and subtitle= uses a smaller font).
  • theme_classic() removes the grey and gridded background.
ggplot(data = differential, aes(
  x = cues_total,
  y = diff_bias_change
))+
  geom_point()+
  geom_smooth(method = lm, 
              se = F)+ 
  scale_x_continuous(expand = c(0,0),limits = c(0,700))+ 
  scale_y_continuous(expand = c(0,0),limits = c(-2,2))+
  labs(subtitle = "Do the number of cues that participants are exposed to influence the efffectiveness of TMR?", 
       x = "Number of cues exposed",
       y = "Differential bias change")+
  theme_classic()

Looking at the plot, there seems to be a slight decreasing trend, but the line is almost horizontal. So I need to run a statistical test to see whether this trend is significant.

Statistics!

To measure the correlation, I'll use cor.test() from base R's stats package, which runs a Pearson correlation test by default.

cor.test(differential$cues_total, differential$diff_bias_change)
## 
##  Pearson's product-moment correlation
## 
## data:  differential$cues_total and differential$diff_bias_change
## t = -0.99925, df = 29, p-value = 0.3259
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.5041878  0.1837791
## sample estimates:
##        cor 
## -0.1824417

There is a weak, non-significant negative correlation (r = -0.18, p = 0.33). Participants exposed to more cues tended to have a slightly lower differential bias change. Even so, there is no evidence of an association between the number of cues participants were exposed to and the effectiveness of TMR.
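The t statistic and p-value that cor.test() reports are derived directly from r. The test statistic is t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom. The sketch below reproduces the output above from r = -0.1824417 and the 31 participants.

```r
r       <- -0.1824417   # sample correlation reported by cor.test() above
n       <- 31           # number of participants
dof     <- n - 2        # 29, matching the output
t_stat  <- r * sqrt(dof) / sqrt(1 - r^2)   # test statistic for H0: true correlation = 0
p_value <- 2 * pt(-abs(t_stat), dof)       # two-sided p-value from the t distribution

c(t = t_stat, p = p_value)
```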

Question 3

Q: Does the difference in procedure matter - does getting offered course credit (vs. cash) significantly change implicit biases?

I defined this question in the following way:

  • Again, I'll be looking at the differential bias change between cued (counterbias training + TMR) and uncued (counterbias training only) conditions
    • Differential bias change = the baseline minus delayed score for uncued bias subtracted from the baseline minus delayed score for cued bias
  • I will construct a boxplot to compare the distributions of the two compensation groups

Preliminaries

Load packages

library(tidyverse)
library(readspss) 
library(ggplot2)
library(janitor)
library(plotrix)
library(gt)
library(ggeasy)
library(jmv)

Read in the data

exploratory3 <- read_csv("cleandata.csv")

Figuring out what data I need

  • What compensation was used: already provided in the open data, under variable compensation
  • I'll be using the same formula to calculate for differential bias change, as I used for my second exploratory analysis. The only difference in my code below is that I'll be selecting the variable compensation (instead of cues_total), as this is my variable of interest.
differential3 <- exploratory3 %>%
  select(ParticipantID, baseIATcued, weekIATcued, baseIATuncued, weekIATuncued, compensation) %>%
  mutate(cued_differential = baseIATcued - weekIATcued,
         uncued_differential = baseIATuncued - weekIATuncued,
         diff_bias_change = cued_differential - uncued_differential) 

head(differential3)

Descriptive statistics

I've calculated my descriptive statistics in the same way as I did for my first exploratory analysis.

coursecompensation <- differential3 %>% 
  group_by(compensation) %>% 
  summarise(mean = mean(diff_bias_change),
            sd = sd(diff_bias_change),
            n = n(),
            se = sd/sqrt(n))
coursecompensation %>% 
  gt() %>% 
  tab_header(title = md("**Does getting offered course credit (vs. cash) significantly change implicit biases in the long-term?**")) %>% 
  fmt_number(
    columns = vars(mean, sd, se),  # n is a count, so leave it as a whole number
    decimals = 2
  ) %>% 
  cols_label(compensation = "Compensation",
             mean = "Mean", 
             sd = "SD", 
             n = "n",
             se = "SE")
Does getting offered course credit (vs. cash) significantly change implicit biases in the long-term?
Compensation  | Mean  | SD   | n  | SE
cash          | −0.20 | 0.48 | 12 | 0.14
course credit | 0.00  | 0.72 | 19 | 0.16

It seems that using course credit as compensation did not change how effective TMR was relative to counterbias training only. A mean of 0.00 indicates no average difference between the cued (TMR) and uncued (counterbias training) conditions. There was a slight difference when cash was used as compensation: the negative mean indicates a greater change in implicit biases in the uncued condition than in the cued condition.

Visualisation

I've plotted the graph using ggplot().

  • ggplot() indicates that I am going to create a ggplot object.
  • data = specifies what dataset to use for the plot.
  • aes() specifies which variables are to be used for the x-axis (x =), y-axis (y =).
  • fill = indicates that a different colour is allocated to each condition, according to the variable compensation.
  • geom_boxplot() adds the boxplot. Without it, the plot would be empty, with only the x- and y-axes showing.
ggplot(data = differential3, aes(
    x = compensation,
    y = diff_bias_change,
    fill = compensation))+
  geom_boxplot()

  • geom_jitter() is to visualise each datapoint.
  • To show which condition each data point belongs to, colour = is mapped to compensation. alpha = specifies the opacity of the points; lower values make them more transparent.
  • theme_classic() removes the grey and gridded background.
  • easy_remove_legend() from the ggeasy package removes the legend.
ggplot(data = differential3, aes(
    x = compensation,
    y = diff_bias_change,
    fill = compensation))+
  geom_boxplot()+
  geom_jitter(width = 0.2, height = 0, alpha = 0.8, aes(colour=compensation))+  # jitter sideways only, keeping y values exact
  theme_classic() +
  easy_remove_legend() 

  • labs() relabels axes, legends and plot labels. Here, I'm using it to relabel my x-axis (x =), y-axis (y =) and plot title (title =).

ggplot(data = differential3, aes(
    x = compensation,
    y = diff_bias_change,
    fill = compensation))+
  geom_boxplot()+
  geom_jitter(width = 0.2, height = 0, alpha = 0.8, aes(colour=compensation))+  # jitter sideways only, keeping y values exact
  easy_remove_legend() +
  labs(x = "Compensation",
       y = "Differential Bias Change",
       title = "Does getting offered course credit (vs. cash) significantly change implicit biases?")

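Note: the middle line in each box is the median, not the mean, which is why it differs from the means in my descriptive statistics table. If I want the means on the plot, ggplot2's stat_summary(fun = mean, geom = "point") can add them.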
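A small sketch with invented numbers (not the study's data) shows how far the mean and median can drift apart. A single low score pulls the mean down, while the median, which is the line a boxplot draws, stays put. boxplot.stats() returns the five values a boxplot is built from, and the third of these is the median.

```r
# Hypothetical differential bias change scores for one group (invented)
x <- c(-1.5, -0.2, 0.1, 0.2, 0.3)

mean(x)                     # pulled down by the single low score
median(x)                   # the middle value
boxplot.stats(x)$stats[3]   # the line inside the box: the median again
```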

Statistics

To perform a t-test, I used the ttestIS() function from the jmv package. I've specified the DV, diff_bias_change, and I want to know whether it varies significantly between the levels of compensation. The data come from the differential3 data frame I created at the beginning of this question's exploratory analysis.

ttestIS(formula = diff_bias_change ~ compensation, data = differential3)
## 
##  INDEPENDENT SAMPLES T-TEST
## 
##  Independent Samples T-Test                                                 
##  ────────────────────────────────────────────────────────────────────────── 
##                                       Statistic     df          p           
##  ────────────────────────────────────────────────────────────────────────── 
##    diff_bias_change    Student's t    -0.8417201    29.00000    0.4068325   
##  ──────────────────────────────────────────────────────────────────────────

As can be seen, the p-value (0.4068325) is greater than 0.05. Thus, there is no evidence that the compensation procedure matters: being offered course credit (vs. cash) did not significantly change implicit biases.
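ttestIS() reported a Student's t-test, which assumes the two groups have equal variances. The SDs here (0.48 vs 0.72) differ somewhat, so Welch's test, which drops that assumption, is a possible cross-check. jmv's ttestIS() appears to offer this through a welchs option (worth confirming in ?ttestIS). The sketch below shows the equivalent choice with base R's t.test(), using a hypothetical stand-in for differential3 with invented values.

```r
# Hypothetical stand-in for differential3 (invented values, not study data)
toy <- data.frame(
  compensation     = rep(c("cash", "course credit"), times = c(4, 5)),
  diff_bias_change = c(-0.6, -0.1, 0.2, -0.3, 0.9, -0.8, 0.4, -0.2, 0.1)
)

# Student's t-test: equal variances assumed (the test reported by ttestIS() above)
student <- t.test(diff_bias_change ~ compensation, data = toy, var.equal = TRUE)
# Welch's t-test: R's default, no equal-variance assumption
welch <- t.test(diff_bias_change ~ compensation, data = toy)

student$parameter  # df = 4 + 5 - 2 = 7
welch$parameter    # Welch-Satterthwaite df, generally not a whole number
c(student = student$p.value, welch = welch$p.value)
```

With the real data, the call would be t.test(diff_bias_change ~ compensation, data = differential3, var.equal = TRUE). This should give the same t and df as the ttestIS() output, with the sign of t depending on group order.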

Knitting to word/pdf

I tried using this website http://tug.org/mactex/faq/3-4.html to download TeX, which is needed to knit to PDF (knitting to Word goes through Pandoc and does not need TeX). I loaded the package tinytex and was able to knit to Word, although my gt() tables did not display properly. However, I was unable to knit to PDF, as it kept saying that I do not have TeX installed (even though I've deleted and redownloaded the TeX package multiple times). Note that loading tinytex does not by itself install TeX; tinytex::install_tinytex() installs the TinyTeX distribution.

After some research, I tried downloading MiKTeX, and I was able to knit my test document to PDF! However, I was unable to knit more complex R Markdown files (including plots and gt tables).
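A possible explanation for the "TeX not installed" message is that TeX is installed but not on the PATH that R sees. This is an assumption, not something confirmed here. For example, MacTeX installs to /Library/TeX/texbin, which RStudio may not pick up. The sketch below only checks which LaTeX engines R can find. R Markdown's pdf_document uses pdflatex by default.

```r
# Look for LaTeX engines on the PATH that R (and so RStudio's Knit button) can see
engines   <- c("pdflatex", "xelatex", "lualatex")
found     <- Sys.which(engines)   # "" for any engine that is not found
has_latex <- any(nzchar(found))

if (has_latex) {
  print(found[nzchar(found)])
} else {
  message("No LaTeX engine found; tinytex::install_tinytex() is one way to install one")
}
```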

Do I have to knit to PDF, or is knitting to Word OK?

My next steps

My next steps are to fine-tune my statistics sections for each exploratory analysis question and to finalise my verification report.