Article
Tybur, J. M., Lieberman, D., Fan, L., Kupfer, T. R., & de Vries, R. E. (2020). Behavioral immune trade-offs: Interpersonal value relaxes social pathogen avoidance. Psychological Science, 31(10), 1211-1221.
In order to navigate an environment full of pathogens, natural selection has shaped human sensory and motivational systems to actively avoid sources of infection such as bodily wastes, spoiled foods, and people with infectious symptoms or diseases. However, the recent COVID-19 pandemic has circumvented these systems, as infected people often show no signs of infection, which is partly responsible for its rapid spread. This has prompted extensive research aiming to determine the factors that promote personal hygiene practices, physical distancing, and pathogen avoidance, in order to prevent and reduce the spread of the virus.
Previous research has found that people are less disgusted by the bodily fluids and wastes of a friend than those of a stranger. The most common explanation for these findings is that people subconsciously use familiarity as an indicator of infection threat, assuming that those they know better are less likely to be infected. Tybur et al. propose an alternative explanation, suggesting that interpersonal value directly influences willingness to engage in infection-risky behaviour, regardless of symptoms. As such, Tybur et al. aimed to investigate how people evaluate the trade-off between the costs of pathogen exposure and the benefits of social interaction. They hypothesised that people would be more willing to engage in infection-risky behaviour with individuals they place more interpersonal value on, such as a romantic partner or friend, than with less interpersonally valued people such as acquaintances and enemies. They also hypothesised that this would extend to more socially valued strangers, compared with less socially valued strangers. To explore their explanation, Tybur et al. conducted three studies.
Study 1 involved asking participants to consider a romantic partner, close friend, acquaintance, or enemy. Participants were then asked a series of questions relating to contact comfort, interpersonal value, germ avoidance, and honesty-humility. As a measure of contact comfort, participants rated how comfortable they would be engaging in contact with their target; for example, how comfortable they would be using the same towel that the target had used to dry themselves earlier, on a scale of -3 (very uncomfortable) to 3 (very comfortable). As a measure of interpersonal value, participants completed a welfare trade-off ratio (WTR) task, where they had to choose between either the target or themselves receiving an amount of money. For example, participants were asked whether they would prefer to receive $17 while the target received nothing, or for the target to receive $37 while they received nothing. Each choice corresponds to a ratio (e.g., 17/37 ≈ 0.46), which was used to determine where the participant changed from prioritising themselves to prioritising the target, known as a switchpoint. Participants were then asked to rate the honesty-humility of the target, as well as their own general motivation to avoid pathogens (measured through a disgust scale). Study 2 was conducted very similarly to the first study, only altering the method to measure the target's prosocial traits (as opposed to honesty-humility) and controlling for sex (by removing the romantic partner condition). Finally, Study 3 was conducted to address the limitations of the first two studies, namely that more interpersonally valued people may actually be less infectious, and to control for the extraneous variables that come with picturing a known individual. As such, participants were given a photo and a description of a stranger, either high or low in honesty-humility and agreeableness, and completed the same WTR task and questions as in Study 2.
Notably, across all three studies, they found that participants were more comfortable with infection-risky behaviours if they valued the person more interpersonally, or if a stranger was higher in honesty-humility and agreeableness. This indicates that someone's interpersonal value influences the trade-off between infection risk and social benefits. Thus, they concluded that interpersonal value and prosocial personality traits are both important predictors of infection-risky behaviour, which has serious implications for the spread of infectious diseases. Tybur et al. noted that these findings have practical implications for COVID-19, as people are likely less motivated to abide by COVID-19 regulations such as social distancing and lockdowns if they value the person more highly.
I was surprised that… people are more comfortable exposing themselves to pathogens, or engaging in infection-risky behaviour, with more honest and agreeable strangers than with less honest and agreeable strangers. It seems more intuitive to me that people are more comfortable with people they know and place high personal value on, such as a romantic partner rather than an acquaintance. However, I was surprised that the traits of a complete stranger influence how comfortable someone may be with potentially exposing themselves to pathogens. The article proposes that this bias occurs because more prosocial strangers offer more potential benefits that can offset the risk of exposure. I think this finding is another demonstration of how, as soon as we meet or hear of a person, we immediately form an impression of them, making judgements about who they are and the value they could provide. I find this process fascinating, and I am still undecided on how accurate these first impressions are. I can also see how these first impressions feed into confirmation biases, where you then treat a person according to how you expect them to behave, and only look for signs that prove this. The processes underlying these judgements are particularly interesting when considering the spread of COVID-19, as we need to correctly judge the infectiousness of someone in order to effectively avoid pathogen exposure.
It seems that the next step in this area of research would be to… consider and study the many other variables that may also affect the extent to which people are willing to engage in infection-risky behaviour. The article focuses on how the type of relationship, and subsequently the value of a person, affects comfort in engaging in infection-risky behaviour. This seems intuitive; however, I can imagine there are many potential moderating or confounding variables that also influence a person's willingness to engage in infection-risky behaviour. The article proposes two factors, sexual value and genetic relatedness, that the authors believe would influence interpersonal value, which in turn would impact contact comfort. Further research into both of these would be very interesting and would potentially allow directionality to be established, as it is not clear to me that the causal chain is necessarily genetic relatedness -> interpersonal value -> contact comfort. As such, I think further research should be conducted that looks at other potential variables influencing contact comfort and, if possible, the directionality of these effects.
The authors seemed to have missed… an analysis of sex differences. In the data exclusion section of Study 1, the authors explicitly state that they excluded participants who indicated they were neither male nor female so that “sex differences could be examined”. However, this is the only time that sex differences are referred to in the article, and no findings or plots are published regarding them. This made me wonder why these results were not published, and why they had excluded these participants in the first place. It is a known issue in psychology that authors tend to publish only findings that reach significance, as these are believed to be more important or to attract more funding. However, this has huge implications for replicability and reproducibility. I am not saying that the authors didn't publish their findings on sex differences because they didn't reach significance; however, it did get me thinking about reproducibility in this paper, and more generally in the field of psychology.
The paper we are planning to reproduce is Tybur et al.'s (2020) Behavioral Immune Trade-Offs: Interpersonal Value Relaxes Social Pathogen Avoidance. We were able to download the data, analysis script, and codebook from the OSF repository.
We aimed to reproduce the demographic statistics reported in each of the three studies, the exclusion criteria, the two figures, and the descriptive statistics.
Initially, it was not clear that we would need to reproduce the exclusion criteria in order to reproduce the figures and descriptive statistics. We attempted to reproduce the descriptive statistics first; however, we were unable to get the right statistics, which led us to realise we needed to first reproduce the exclusion criteria, so we updated our goals.
The demographic statistics for each study relate to the participants and include the total number of participants, the percentage of males, and the mean and standard deviation of participants' ages.
Our goal was to reproduce each of these statistics exactly as presented in the article.
For each study, there are three types of exclusion criteria:
WTR: participants with more than two switch points within any of the WTR anchors
Gender: participants who selected neither male nor female as their gender
English: participants who demonstrated a nonsensical or poor English description of their partner
There are two figures in the article we aimed to reproduce:
Figure 1: a violin plot with an internal box plot, describing the average level of contact comfort for each relationship type, combined across all three studies.
Figure 2: a scatterplot describing the average level of contact comfort across WTR scores, with a line of best fit and a shaded area displaying the 95% confidence intervals.
Our goal was to reproduce both of these figures as closely to the originals as possible, in terms of both the data and the aesthetics.
Throughout the article, there are only two instances where descriptive statistics are mentioned. Our goal was to reproduce both of these.
The first step was to source the data and code. The article includes a link to the OSF where we could openly access and download the raw data, the authors’ code, and the codebooks for each study.
After opening RStudio, we first needed to load the appropriate packages. At the beginning, the only packages we loaded were tidyverse and janitor; however, throughout the term we installed and loaded many more packages, each of which will be described as we use it.
# load the packages
library(tidyverse) # for almost all analysis
library(janitor) # for data exploration
Next, we needed to load in the data for each study. As the data were in .csv files, we used the base R function read.csv to import the data for each study, naming the objects appropriately so we did not confuse the study datasets.
Although seemingly simple, reading in the data was actually our first challenge. When Danielle explained how to import the .csv files, she did so on a non-Mac computer; however, the majority of our group use Macs, so we had to Google how to load the data. We found that the simple solution was to provide the exact file location. So that we could all use the same code, we each saved the files to our desktops, meaning the path used to read in the data would be the same for everyone (a more portable alternative is sketched after the import code below).
# import the data
study_1 <- read.csv("~/Desktop/Reproduction Assessment/WTR_Comfort_S1.csv")
study_2 <- read.csv("~/Desktop/Reproduction Assessment/WTR_Comfort_S2.csv")
study_3 <- read.csv("~/Desktop/Reproduction Assessment/WTR_Comfort_S3.csv")
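A more portable option, sketched below, would have been to build the path with file.path relative to the home directory, so the code does not depend on everyone typing out the same absolute path. This is only a sketch, and it still assumes the files sit in a "Reproduction Assessment" folder on the desktop.
# sketch: build the file paths programmatically so the same code works across machines
data_dir <- file.path(path.expand("~"), "Desktop", "Reproduction Assessment")
study_1 <- read.csv(file.path(data_dir, "WTR_Comfort_S1.csv"))
study_2 <- read.csv(file.path(data_dir, "WTR_Comfort_S2.csv"))
study_3 <- read.csv(file.path(data_dir, "WTR_Comfort_S3.csv"))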
Now that we had loaded in the data, we decided to start by reproducing the demographics for Study 1, as each study's data is in a separate file. Before doing so, I first had a look at the dataset to get an idea of how many participants there were and what each variable was called.
To easily reproduce the number of participants, I used the nrow function, which prints the number of rows, or participants, in each study.
nrow(study_1)
## [1] 504
This confirmed there were 504 participants in Study 1. Next, we wanted to determine the percentage of males. After some searching on Google, we found the janitor function tabyl, which creates a table of basic statistics, including the count and percentage, for a selected variable. We created a new variable sex_1, passing in the data for Study 1 and the variable sex, to output the percentage of each sex.
sex_1 <- tabyl(study_1, sex)
print(sex_1)
## sex n percent
## 1 278 0.551587302
## 2 224 0.444444444
## 3 2 0.003968254
Using the codebook provided by the authors, we found that a 1 in the sex variable means male. This means we reproduced the first demographic statistic, 55.16% male!
Next was finding the mean age and standard deviation of participants. To do this, we created a new value age_1 by pulling the age variable from the Study 1 dataset. We then used the mean and sd functions, wrapped in print, to output the demographics.
age_1 <- study_1$age
print(mean(age_1))
## [1] 35.87698
print(sd(age_1))
## [1] 10.20061
Success! Although this produced the correct demographics, I wanted to find another function that summarised the data more clearly and rounded to 2 decimal places (as in the article). I used the summarise function, which Danielle had taught us, piped in the data from study_1, and then used the same mean and sd functions. After Googling, I found the round function, which I used to round the numbers to two digits.
age_1 <- study_1 %>%
summarise(
mean = round(mean(study_1$age), digits = 2),
sd = round(sd(study_1$age), digits = 2))
print(age_1)
## mean sd
## 1 35.88 10.2
Now the output is much clearer and exactly the same as in the article!
For Study 2, we used the exact same code to reproduce the demographic statistics.
nrow(study_2) # number of participants
## [1] 430
# percentage of males
sex_2 <- tabyl(study_2, sex)
print(sex_2)
## sex n percent
## 1 242 0.562790698
## 2 187 0.434883721
## 3 1 0.002325581
age_2 <- study_2 %>%
summarise(
mean = round(mean(study_2$age), digits = 2),
sd = round(sd(study_2$age), digits = 2))
print(age_2)
## mean sd
## 1 36.29 10.99
All of these statistics match those stated in the article!
For Study 3, again we used the same code as above to reproduce the statistics.
nrow(study_3)
## [1] 905
sex_3 <- tabyl(study_3, sex)
print(sex_3)
## sex n percent
## 1 452 0.499447514
## 2 445 0.491712707
## 3 8 0.008839779
age_3 <- study_3 %>%
summarise(
mean = round(mean(study_3$age), digits = 2),
sd = round(sd(study_3$age), digits = 2))
print(age_3)
## mean sd
## 1 38.22 11.64
Again, all of these statistics match those in the article! After reproducing the demographics for each study, we were feeling very satisfied and confident.
After reproducing the demographics, we were searching through the paper for the descriptive statistics, and came across a line referencing the mean contact comfort in the procedure for Study 1.
According to the article, contact comfort refers to how comfortable the participant would be with a given scenario, such as using someone else’s deodorant. Despite this, it was unclear what the mean contact comfort was referring to, and we were unable to find it anywhere in the article, code, or codebook. This was the first time we realised that even when data and analysis are provided openly, if they are not clearly labelled, described, and explained, it greatly impacts the ease with which someone can reproduce the work.
After looking at the data, we found 10 variables that relate to contact comfort, so we assumed the mean contact comfort was referring to the grand mean for all 10 variables for all 504 participants. As we already knew how to calculate a mean using the mean function, we attempted to use this and the $ to select each column.
mean(study_1$comf1,
study_1$comf2,
study_1$comf3,
study_1$comf4,
study_1$comf5,
study_1$comf6,
study_1$comf7,
study_1$comf8,
study_1$comf9,
study_1$comf10)
## Error in mean.default(study_1$comf1, study_1$comf2, study_1$comf3, study_1$comf4, : 'trim' must be numeric of length one
Unfortunately this gave us an error, so, based on logic, we tried another way, piping in the dataset and then selecting from the first to the last contact comfort column.
study_1 %>%
mean(comf1:comf10)
## Warning in mean.default(., comf1:comf10): argument is not numeric or logical:
## returning NA
## [1] NA
Unfortunately this gave us another error, and we soon realised that we would not be able to do this assignment solely by using our ‘logic’. Up until this point we had all been ‘coding as caterpillars’, all viewing one person’s R, but this was the first time we had hit a few errors. So, we decided to each try searching and testing out code until we found something that worked. We ended up using this strategy throughout the entirety of our journey, but more on that later!
After each searching through forums, Caitlyn found a solution! The function colMeans calculates the mean of each specified column, and taking the mean of those gives the grand mean. We used the data from Study 1 and specified the columns 30:39 using c(30:39). We also found that including na.rm = TRUE removes all NAs in the columns. At first, we did not include this, and the result was NA: at least one value must have been an NA, and any calculation involving an NA returns NA. As such, we included the argument to remove them.
comfort_1 <- colMeans(study_1[c(30:39)],na.rm = TRUE)
print(mean(comfort_1))
## [1] 4.079103
print(sd(comfort_1))
## [1] 0.5663728
This number was much larger than the one reported in the article (0.06)! After looking at the article, we realised the scale for contact comfort ranged from -3 to 3, yet our output was outside of this range. After looking at the dataset, we noticed the values ranged from 1 to 7, which we presumed was so the dataset did not include negative numbers. This was frustrating, as nothing was mentioned in the article regarding this, and once again we were confused because the authors had not explained their data or code. Thankfully, the solution to transform the range was easy: we simply subtracted 4 to return to the original range.
comfort_1 <- colMeans(study_1[, c(30:39)], na.rm = TRUE)
print(mean(comfort_1)-4)
## [1] 0.07910276
Unfortunately this value still did not match that in the article! As this function was completely new to us, we weren’t positive it was calculating the right thing, so, to test the integrity of the code, Caitlyn created a tiny example dataset, which we could use to run the code and then compare against the manually calculated mean. Caitlyn created the dataset in Excel and calculated the mean to be 4.5. We read in the .csv and then used the same code as above to calculate the mean of the data.
tinydataset <- read_csv("~/Desktop/Reproduction Assessment/example_data.csv")
print(tinydataset)
## # A tibble: 2 × 5
## `1` `2` `3` `4` `5`
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 2 3 4 5 6
## 2 3 4 5 6 7
comfort_tiny <- colMeans(tinydataset[ , c(1:5)], na.rm=TRUE)
print(mean(comfort_tiny))
## [1] 4.5
As this produced the correct result, we knew that the code was calculating the right thing, but with the wrong values. The only plausible explanation our group could come up with was that this statistic was calculated after the exclusion criteria had been applied, and hence our code was correct but was being fed the wrong dataset. However, the line stating the mean contact comfort comes before the exclusion criteria in the article, which is problematic if this explanation is true. Despite our hesitation, we decided to move on to the exclusion criteria, and then come back to see whether this code worked after we had excluded all the necessary participants.
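A sketch of that plan is below: it reuses the same colMeans approach, but on the data_excluded_1 dataset that we create later in this report. We show no output here because at this point in the journey that dataset did not exist yet.
# sketch of the plan: recompute the grand mean on the post-exclusion dataset
comfort_1_excluded <- colMeans(data_excluded_1[, c(30:39)], na.rm = TRUE)
mean(comfort_1_excluded) - 4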
As mentioned previously, for each study there are three types of exclusion criteria:
WTR exclusion
Gender exclusion
English exclusion
We figured the gender and English exclusions would be simpler than the WTR, so we decided to start with these.
The article states that any participant who indicated they were neither a man nor a woman was excluded. In the codebook, we found that a 3 in the sex column indicated this. We created a new dataset sex_exclusion_1 to hold the filtered data and used a filter function to keep only participants without (!=) a 3 in the sex column. According to the article, two participants were removed due to this criterion, so we ran the nrow function to determine the total number of participants after exclusion.
sex_exclusion_1 <- study_1 %>%
filter(sex!=3)
nrow(sex_exclusion_1)
## [1] 502
As there were now 502 participants in the study, 2 participants had been excluded, which matches the article.
The article states that they excluded participants who gave nonsensical descriptions of their partners or demonstrated poor English. By viewing the entire dataset, we found a column called English_exclude which flags those they excluded, denoted by a 1. So, we filtered the original dataset to keep only participants with a 0 in the English_exclude column. Again, we ran the nrow function to determine the total number of participants after the English exclusion.
English_exclusion_1 <- study_1 %>%
filter(English_exclude==0)
nrow(English_exclusion_1)
## [1] 501
As there were now 501 participants in the study, 3 participants had been excluded, which matches the article.
The WTR exclusion criteria ended up being our biggest challenge in our whole journey. In the article, the authors defined a switchpoint as
“Switch points—the ratio at which participants begin choosing the benefit for the target.”
As mentioned above, any participant with more than 2 switchpoints on any anchor (6 anchors in total in this study) was excluded.
Before doing anything, we looked at the dataset to try and understand what a switchpoint looked like. There were 60 columns relating to the WTR task, divided into 6 anchors with 10 columns each. This was about as far as our understanding went, and we very quickly became completely confused, with each person in our group thinking something different. For the first few weeks, admittedly, we didn’t have a concrete understanding of what a switchpoint was; however, we moved forward regardless.
According to the authors’ codebook, in each anchor, a value of 0 for any of the first 9 items means the participant prioritised themselves, while a non-zero value means the participant prioritised the target. For the last item in each anchor, -0.25 means the participant prioritised themselves, while -0.45 means the participant prioritised the target. Despite giving us this information, the authors did not explain how they obtained these numbers or what they meant, which was very frustrating, but this ended up not being particularly important for reproducing the exclusion criteria.
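To make this concrete, here is a toy illustration of the understanding we eventually arrived at (our own sketch, not the authors’ code): if one anchor’s ten choices are recoded as 0 (self) and 1 (target), a switch point is any position where a 0 is followed by a 1, which shows up as a difference of +1 between neighbouring items.
# toy example: one anchor's ten choices, recoded as 0 (self) or 1 (target)
anchor <- c(0, 0, 0, 1, 1, 1, 1, 1, 1, 1)
# a switch point is a 0 followed by a 1, i.e. a difference of +1 between neighbours
sum(diff(anchor) == 1)
## [1] 1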
Now that we (vaguely) understood what a switchpoint was, we still had no idea where to start in reproducing the exclusion criteria, so we thought it logical to look at the authors’ code and try to understand what it was doing. We copied and pasted their code, and changed the variable names and data source to match ours. Very quickly, we realised that the authors did not provide much explanation of the code, or of what each chunk was doing, which would have made understanding it a lot easier. Regardless, the first chunk of code selects the columns relating to the WTR task.
Next, they changed all values of -0.45 to 0. However, -0.45 signifies ‘other’ on that item, while 0 signifies ‘self’ on all the other items, so this seemed like an incorrect change. For the majority of the term we were very confused as to why this had been done, as it meant that either the authors’ code or the codebook was incorrect. We decided to use their code, so we could reproduce the rest of their analysis, and hoped that the codebook was incorrect, as if the code were incorrect, this would have serious implications for the reliability of their results.
When looking at it recently, I realised that -0.45 only appears after all of the other values in that anchor are 0, signifying the participant always prioritised themselves, and it doesn’t make sense that they would all coincidentally choose to value the other person on only the last item. So, I am now fairly confident that their code is correct and the codebook is wrong (thankfully!). Regardless, we followed the authors’ code so that we were able to reproduce their results.
# select the columns related to the WTR task
study_1_WTR<-study_1 %>%
select(49:40,59:50,69:60,79:70,89:80,99:90)
# change all instances of -0.45 in the specified columns for each anchor to 0
study_1_WTR$X37_.13[which(study_1_WTR$X37_.13==-.45)]<-0
study_1_WTR$X23_.8[which(study_1_WTR$X23_.8==-.45)]<-0
study_1_WTR$X75_.26[which(study_1_WTR$X75_.26==-.45)]<-0
study_1_WTR$X19_.7[which(study_1_WTR$X19_.7==-.45)]<-0
study_1_WTR$X46_.16[which(study_1_WTR$X46_.16==-.45)]<-0
study_1_WTR$X68_.24[which(study_1_WTR$X68_.24==-.45)]<-0
# create a matrix to store the above data (that removes null)
study_1_WTRdummy <- as.matrix(study_1_WTR)
# change any non-zero value to 1
study_1_WTRdummy[study_1_WTRdummy != 0] <- 1
# transform the matrix back into a dataset
study_1_WTR <- as.data.frame(study_1_WTRdummy)
rm(study_1_WTRdummy) # remove the dummy dataset
This changed all the values to either a 0 or 1, which is great! From this point onwards, we completely failed to understand what the code was doing. We spent quite a while reading through the lines of code, running one at a time and trying to work out what it was doing.
participantnumber <- 1
Caulsum <- matrix(,,ncol=9)
Caulsum <- Caulsum[-1,]
colnames(Caulsum) <- c("WTR37","WTR23","WTR75","WTR19","WTR46","WTR68","WTRTOTAL",'DS','HH')
The variable Caulsum was completely unknown to us: this was the first time it appeared in the code, there was no explanation of what it meant or what it was doing, and it wasn’t mentioned in the article, codebook, or supplementary materials at all. After Googling it for a while and coming up blank, we decided it might be best to write our own code, rather than painstakingly try to understand theirs.
After thinking about it logically, we realised the first step was to mutate new columns holding the difference between each pair of adjacent columns; a difference of 1 would then signify a switch point, where the participant went from valuing themselves to the target.
Before doing this, we noticed that in the authors’ code, they reversed the order of the columns in each anchor (signified by select(49:40) rather than select(40:49)). As we couldn’t figure out why they had done this, we decided to change it back to the normal order so it didn’t affect whether -1 or 1 signified a switch point.
study_1_WTR<-study_1 %>%
select(40:49,50:59,60:69,70:79,80:89,90:99)
# change all instances of -0.45 (in the last column of each anchor) to 0
study_1_WTR$X37_.13[which(study_1_WTR$X37_.13==-.45)]<-0
study_1_WTR$X23_.8[which(study_1_WTR$X23_.8==-.45)]<-0
study_1_WTR$X75_.26[which(study_1_WTR$X75_.26==-.45)]<-0
study_1_WTR$X19_.7[which(study_1_WTR$X19_.7==-.45)]<-0
study_1_WTR$X46_.16[which(study_1_WTR$X46_.16==-.45)]<-0
study_1_WTR$X68_.24[which(study_1_WTR$X68_.24==-.45)]<-0
# create a matrix to store the above data (that removes null)
study_1_WTRdummy <- as.matrix(study_1_WTR)
# change any non-zero value to 1
study_1_WTRdummy[study_1_WTRdummy != 0] <- 1
glimpse(study_1_WTRdummy)
## num [1:504, 1:60] 0 0 0 0 0 0 0 0 1 0 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:60] "X37_54" "X37_46" "X37_39" "X37_31" ...
study_1_WTR <- as.data.frame(study_1_WTRdummy)
rm(study_1_WTRdummy)
Now that we had a dataset holding all of the participants’ choices about whether to prioritise themselves or the target, represented as 1s and 0s, we needed to mutate new columns holding the differences. We first tried to do this for a single pair of adjacent columns using the mutate function, based on an approach we found on a forum.
diff <- study_1_WTR %>%
mutate(diff = X37_46 - X37_54) # difference between the first two columns of the first anchor
ncol(diff)
## [1] 61
This worked! In our excitement, we started typing out the code for each pair of columns, but very soon realised how long that would take, as there were 60 columns; there had to be an easier way to do it! We then tried a variety of different methods, including a for loop, an if/else statement, and a while loop, but we didn’t have any luck. After hunting for quite a while on Google and trudging through many forums, I stumbled across a function called colDiffs from the matrixStats package that looked like it did exactly what we were looking for: it creates a new matrix holding just the differences between columns. Our WTR dataset had 60 columns, so the new dataset should have 59 columns. After installing the package, we used the code below.
library(matrixStats) # provides the colDiffs and rowDiffs functions
# transform study_1_WTR into a matrix, required for the function
WTR_matrix <- as.matrix(study_1_WTR)
col_diff <- colDiffs(WTR_matrix)
ncol(col_diff)
## [1] 60
Unfortunately, the result still had 60 columns, and after a manual inspection I could see it hadn’t taken the differences between the columns, but instead between the rows! Despite the confusing naming of these functions, I decided to try the function rowDiffs, as this seemed to do the opposite.
row_diff <- rowDiffs(WTR_matrix)
ncol(row_diff)
## [1] 59
Success! This was so rewarding, as it was really looking like we might have to manually write out the code to mutate a new column 59 times, once for each pair of columns. This also reinforced how effective dividing and conquering can be when facing challenges. Caitlyn and Helen were trying different loops, while I tried out this new package, and then with Sophie’s help was able to make it work. This strategy worked particularly well for us, as it means you can try more things in less time, since most attempts don’t end up working! It also made me realise that there are so many different packages out there that often do exactly what you want; the difficult bit is finding them!
Next, we wanted to turn this matrix holding the differences back into a data frame so we could calculate the switch points. To do this, we used the function as.data.frame and then removed the WTR_matrix variable using rm.
row_diff <- as.data.frame(row_diff)
rm(WTR_matrix)
We then needed to count the number of switch points in each anchor, so that we could remove any participant who had more than two in any anchor. At this point, we still didn’t fully understand what a switchpoint was, as the article was very hard to follow. We were under the impression that a switch point was any time the participant changed from valuing themselves to the target, or vice versa. As such, we needed to change all the -1s to 1s in the dataset. This would mean that if we summed the columns in each anchor for each participant, we would get the number of switch points. We used the following code to change all values equal to (==) -1 in the row_diff dataset into a 1.
row_diff[row_diff == -1] <- 1
This worked! Now, we wanted to sum each set of columns corresponding to an anchor. This got a little confusing; however, we realised that every tenth column (10, 20, etc.) represented the difference between two different anchors, and hence should be ignored. We found the function rowSums, which calculates the sum across specified columns for each row, and used this to mutate new columns holding the total for each anchor.
data_excluded_1 <- study_1 %>%
mutate(switchpoint1= rowSums(row_diff[,1:9] == 1),
switchpoint2= rowSums(row_diff[,11:19] == 1),
switchpoint3= rowSums(row_diff[,21:29] == 1),
switchpoint4= rowSums(row_diff[,31:39] == 1),
switchpoint5= rowSums(row_diff[,41:49] == 1),
switchpoint6= rowSums(row_diff[,51:59] == 1))
We then wanted to remove all participants who had more than two switchpoints for any anchor, so we used the filter function to keep only participants with no more than two switchpoints in every anchor.
data_excluded_1 <- data_excluded_1 %>%
filter(switchpoint1 <= 2,
switchpoint2 <= 2,
switchpoint3 <= 2,
switchpoint4 <= 2,
switchpoint5 <= 2,
switchpoint6 <= 2)
nrow(data_excluded_1)
## [1] 372
Heartbreakingly, this was far fewer than the number stated in the article, so we knew something was wrong. After a lot of discussion and many hours on call, we realised that a switchpoint is actually only when the participant goes from valuing themselves to the target, and not the other way around. So, we reran the code, removing the line that changed all the -1s into 1s.
data_excluded_1 <- study_1 %>%
mutate(switchpoint1= rowSums(row_diff[,1:9] == 1),
switchpoint2= rowSums(row_diff[,11:19] == 1),
switchpoint3= rowSums(row_diff[,21:29] == 1),
switchpoint4= rowSums(row_diff[,31:39] == 1),
switchpoint5= rowSums(row_diff[,41:49] == 1),
switchpoint6= rowSums(row_diff[,51:59] == 1)) %>%
filter(switchpoint1 <= 2,
switchpoint2 <= 2,
switchpoint3 <= 2,
switchpoint4 <= 2,
switchpoint5 <= 2,
switchpoint6 <= 2)
nrow(data_excluded_1)
## [1] 472
This value is much closer to the one in the article, so we must be on the right track! The article states that 35 participants were excluded based on the switchpoints, and we had excluded 32, so we were getting close! We then filtered the dataset again to remove the participants excluded for either gender or English (as described above).
data_excluded_1 <- study_1 %>%
mutate(switchpoint1= rowSums(row_diff[,1:9] == 1),
switchpoint2= rowSums(row_diff[,11:19] == 1),
switchpoint3= rowSums(row_diff[,21:29] == 1),
switchpoint4= rowSums(row_diff[,31:39] == 1),
switchpoint5= rowSums(row_diff[,41:49] == 1),
switchpoint6= rowSums(row_diff[,51:59] == 1)) %>%
filter(sex!=3, # exclude gender = "other"
English_exclude==0, # exclude poor English
switchpoint1 <= 2, # exclude >2 switchpoints for each anchor
switchpoint2 <= 2,
switchpoint3 <= 2,
switchpoint4 <= 2,
switchpoint5 <= 2,
switchpoint6 <= 2)
nrow(data_excluded_1)
## [1] 470
This number is so close to the 467 in the authors’ final dataset, only 3 participants off. However, when rereading the article, we noticed that it stated there were 464 participants after exclusion, which is inconsistent with their own final dataset! We also noticed that adding our gender and English exclusion criteria only excluded a further 2 participants, even though 5 participants were excluded when we had previously run those filters on their own (as above). This means that three participants who were excluded for gender or English were also being excluded based on WTR; however, the article states there was no overlap between these participants, as 35 were excluded for WTR and a further 5 for English and gender, totalling 464. At this point we were thinking something fishy was definitely going on, but we wanted to prove it!
Up until this point, we had for the most part been coding as a team, with one person screen sharing their R session and everyone else looking on; however, we were stumped as to why this wasn’t producing the right number, and we decided to capitalise on our individual strengths to try and figure out why. Helen has always been really comfortable with Excel, so she decided to recreate our code in Excel and exclude participants that way, to see whether it produced the same number. Sophie, who had originally looked through the raw data manually to identify people with switchpoints, very generously offered to go through each of the 504 participants again, visually determine whether they had more than two switchpoints in any anchor, and note their participant number if they did. Caitlyn and I were confident our code was correct, so I pulled up the authors’ final dataset with 467 participants, and together we read out each excluded participant to see where the discrepancies were.
After quite some time, when our eyes were really starting to burn, Sophie had finished her manual check and had a list of participants she had excluded. She had found 27 people to exclude (based on WTR, gender, and English), and 6 more she was uncertain about, as they ‘switched’ on the first item of an anchor, which we weren’t sure counted. This gave her a final number of 471, which was very close to ours, with the difference probably the result of an error in the manual check. However, this number still wasn’t the 464 reported in the article. At this point we were getting very frustrated, but Caitlyn and I finished our comparison of our dataset and the authors’ dataset. We found three discrepancies: the authors’ code was excluding participants 130, 181, and 364, while ours was not. Sophie noted that she hadn’t excluded these participants either, and after each of us manually checked them, none of us could see any reason for them to be excluded, and we believe their exclusion was incorrect. However, in order to reproduce the figures, we manually excluded these three participants from the dataset using a filter, so that our final dataset is exactly the same as the authors’. We assume these three extra people were excluded because the authors didn’t realise there were three overlaps between the WTR and gender/English exclusion criteria, but we can never know for sure, as we cannot see any reason to exclude them!
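For anyone repeating this comparison, it does not have to be done by reading IDs out loud; a quicker sketch (assuming the authors’ final 467-row dataset has been read into a data frame we will call authors_final_1, with a participant column) would be:
# sketch: participants kept by our code but excluded by the authors, and vice versa
# authors_final_1 is a hypothetical name for the authors' 467-row final dataset
setdiff(data_excluded_1$participant, authors_final_1$participant)
setdiff(authors_final_1$participant, data_excluded_1$participant)
With the three IDs confirmed, the manual exclusion below brings our dataset in line with the authors’.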
data_excluded_1 <- data_excluded_1 %>%
filter(participant != 130,
participant != 181,
participant != 364)
nrow(data_excluded_1)
## [1] 467
Finally, the same dataset as in the authors’ code! This was quite the emotional sigh of relief, and we were all feeling quite proud of ourselves for rewriting their exclusion criteria in code that we can actually understand and that is arguably much simpler. This process also allowed us to spot what we think is a mistake in their dataset, something we would never have noticed had we just used their code and tried to understand it. This was an interesting aspect of the project, as we realised that even if we are able to reproduce something, it doesn’t necessarily mean the results are correct. I think this gave us all a profound appreciation for how important commenting your code and explaining your analysis is, so that not only can others reproduce your results, but they can also check that the analysis is correct.
Using the same code as for Study 1, we filtered out participants who indicated they were neither a man nor a woman, as well as those who gave nonsensical descriptions of their partners or demonstrated poor English, to reproduce the numbers excluded in the article.
sex_exclusion_2 <- study_2 %>%
filter(sex!=3)
nrow(sex_exclusion_2)
## [1] 429
English_exclusion_2 <- study_2 %>%
filter(English_exclude==0)
nrow(English_exclusion_2)
## [1] 423
Our code excluded one participant based on gender and 7 based on English, both of which match the numbers reported in the article!
For the WTR exclusion, we used the exact same code as we did for Study 1, only slightly changing it, as the number of WTR anchors was different.
# select the columns that relate to the WTR anchors
study_2_WTR<-study_2 %>%
select(55:64, 65:74, 75:84)
# change the value of -.45 to 0
study_2_WTR$X75_.26[which(study_2_WTR$X75_.26==-.45)]<-0
study_2_WTR$X19_.7[which(study_2_WTR$X19_.7==-.45)]<-0
study_2_WTR$X46_.16[which(study_2_WTR$X46_.16==-.45)]<-0
study_2_WTRdummy <- as.matrix(study_2_WTR) # create a matrix to store the above
study_2_WTRdummy[study_2_WTRdummy != 0] <- 1 # change any non-zero value to 1
study_2_WTR <- as.data.frame(study_2_WTRdummy) # transform the above back into a dataset
rm(study_2_WTRdummy) # remove study_2_WTRdummy
WTR_matrix <- as.matrix(study_2_WTR) # transform study_2_WTR into a matrix
row_diff <- rowDiffs(WTR_matrix) # calculate the differences between columns
row_diff <- as.data.frame(row_diff) # turn row_diff back into dataset
rm(WTR_matrix) # remove WTR_matrix
# count the number of switch points in each anchor and create excluded dataset
data_excluded_2 <- study_2 %>%
mutate(switchpoint1= rowSums(row_diff[,1:9] == 1),
switchpoint2= rowSums(row_diff[,11:19] == 1),
switchpoint3= rowSums(row_diff[,21:29] == 1)) %>%
filter(switchpoint1 <= 2,
switchpoint2 <= 2,
switchpoint3 <= 2)
nrow(data_excluded_2)
## [1] 409
The article states that 19 participants were excluded based on the WTR criterion; however, our code had actually excluded more than this, 21 in total. We thought this was unusual, as it was the opposite problem to Study 1. Before manually checking the data, we decided to add in the English and sex exclusions, to see how many participants were excluded in total.
data_excluded_2 <- study_2 %>%
mutate(switchpoint1= rowSums(row_diff[,1:9] == 1),
switchpoint2= rowSums(row_diff[,11:19] == 1),
switchpoint3= rowSums(row_diff[,21:29] == 1)) %>%
filter(switchpoint1 <= 2,
switchpoint2 <= 2,
switchpoint3 <= 2,
sex!=3,
English_exclude==0)
nrow(data_excluded_2)
## [1] 404
As 8 were excluded for English and gender, and 21 for WTR, but only 26 in total, there must be three participants who were excluded for both WTR and gender/English. However, similar to Study 1, the authors did not report this overlap when writing the article, which makes reproduction confusing.
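Rather than inferring the overlap from the totals, it can also be counted directly; here is a sketch that reuses the Study 2 row_diff object and switchpoint logic from above (we did not rely on it for the numbers reported here).
# sketch: count participants flagged by both the WTR rule and the sex/English rules
study_2 %>%
mutate(switchpoint1 = rowSums(row_diff[,1:9] == 1),
switchpoint2 = rowSums(row_diff[,11:19] == 1),
switchpoint3 = rowSums(row_diff[,21:29] == 1)) %>%
summarise(overlap = sum((switchpoint1 > 2 | switchpoint2 > 2 | switchpoint3 > 2) &
(sex == 3 | English_exclude == 1)))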
As the above total was very similar to that reported in the article, we decided to manually compare our final dataset to the authors’, as this had worked well for Study 1. We found that the only discrepancy was participant 263, whom our code was not excluding but who was excluded from the authors’ final dataset. After manually checking all of their WTR anchors, as well as their gender and English, once again we couldn’t see any logical reason why they were excluded, similar to the three in Study 1. Again, for the purposes of reproducing the figures, we manually excluded this participant.
data_excluded_2 <- data_excluded_2 %>%
filter(participant != 263)
nrow(data_excluded_2)
## [1] 403
We finally had the same dataset as the authors, and this time it matched the final number reported in the article (unlike Study 1), so we were ready to move on to Study 3.
Using the same code as in Studies 1 and 2, we excluded participants based on gender and English. At first we ran into an error saying that the variable English_exclude could not be found; however, we soon realised that a single capital letter was causing the error, so we changed the statement to use English_Exclude. This was the first time we noticed the inconsistencies in variable labelling across the studies, which made locating the correct variables extremely frustrating.
sex_exclusion_3 <- study_3 %>%
filter(sex!=3)
nrow(sex_exclusion_3)
## [1] 897
English_exclusion_3 <- study_3 %>%
filter(English_Exclude==0)
nrow(English_exclusion_3)
## [1] 865
Our code excluded 8 participants based on gender and 40 based on English, both of which match the numbers reported in the article!
For the WTR exclusion, similar to Study 2, we used the same code, only slightly changing it so that it selected the correct columns and used the correct dataset.
# select just the columns related to WTR ratios
study_3_WTR<-study_3 %>%
select(38:47, 48:57, 58:67)
# change the value of -.45 to 0
study_3_WTR$X75_.26[which(study_3_WTR$X75_.26==-.45)]<-0
study_3_WTR$X19_.7[which(study_3_WTR$X19_.7==-.45)]<-0
study_3_WTR$X46_.16[which(study_3_WTR$X46_.16==-.45)]<-0
study_3_WTRdummy <- as.matrix(study_3_WTR) # create a matrix to store the above
study_3_WTRdummy[study_3_WTRdummy != 0] <- 1 # change any non-zero value to 1
study_3_WTR <- as.data.frame(study_3_WTRdummy) # transform the above back into a dataset
rm(study_3_WTRdummy) # remove study_3_WTRdummy
WTR_matrix <- as.matrix(study_3_WTR) # transform study_3_WTR into a matrix
row_diff <- rowDiffs(WTR_matrix) # calculate the differences between columns
row_diff <- as.data.frame(row_diff) # turn row_diff back into dataset
rm(WTR_matrix) # remove WTR_matrix
# count the number of switch points in each anchor and create excluded dataset
data_excluded_3 <- study_3 %>%
mutate(switchpoint1= rowSums(row_diff[,1:9] == 1),
switchpoint2= rowSums(row_diff[,11:19] == 1),
switchpoint3= rowSums(row_diff[,21:29] == 1)) %>%
filter(switchpoint1 <= 2,
switchpoint2 <= 2,
switchpoint3 <= 2)
nrow(data_excluded_3)
## [1] 859
So, our code excluded 46 participants based on WTR, however the article states that only 30 were excluded. Before doing a manual check, we combined all the exclusion criteria, as the overlap between them had been the source of a lot of the issues in Studies 1 and 2.
data_excluded_3 <- data_excluded_3 %>%
filter(sex!=3,
English_Exclude==0)
nrow(data_excluded_3)
## [1] 828
As 48 were excluded for English and gender, and 46 for WTR, but only 77 total, there must be 17 participants that were excluded for both WTR and gender/English. However, similar to Studies 1 and 2, the authors did not report this overlap when writing the article. Despite how misleading this is in the article, we were used to it by now, and after a manual comparison of our dataset and the authors’ final dataset, we found only one discrepancy, participant 619. Again, we could not see any reason for this participant to be excluded, however we manually excluded them for the purposes of reproducing the figures.
data_excluded_3 <- data_excluded_3 %>%
filter(participant != 619)
nrow(data_excluded_3)
## [1] 827
Now our dataset finally matched the authors’ final dataset, and we were ready to move on to the figures. Beyond giving us datasets for each study that matched the authors’, the journey of reproducing the WTR exclusions taught us so much!
Reproducing the exclusion criteria allowed us to find participants who we believe should not have been excluded, which, although only minimally, would have impacted the results. Not only this, but the way the authors wrote the article made the whole journey far more difficult than it would have been had they described their process in the code and labelled their variables consistently. At first we were disgruntled that they didn’t better label their code and provide explanations; however, I am now grateful that they didn’t, as it meant we attempted the exclusion criteria using our own code, and this allowed us to notice the issues in their code and datasets, which we probably would not have if we had just followed their code!
We also learnt the most about R while doing the exclusion criteria. Not only did we find lots of new packages and functions, but we learned how to learn R. Personally, forums are my best friend, as they let you copy and paste code, edit it for your dataset, and see different methods (proposed by different people) in the same thread. We also learnt how to overcome errors, both in the code itself and in the logic behind it. We found that when R produced an error, simply Googling it often told you what you needed to change. Somewhat more difficult were errors in the logic of our code, as these were harder to pick up on and fix. Despite this, I think the process of rewriting the exclusion criteria taught us so much about how to find solutions and overcome challenges.
Finally, I wanted to summarise what we learnt about teamwork: working together while things were easy, and then dividing and conquering when we faced challenges. As I described, this allowed us to maximise our time and efficiency, while also making sure we all understood all the code. Understanding the code was essential for each of us to be able to test out possible solutions ourselves, which in turn allowed us to keep progressing in our coding.
So, it is probably quite obvious by now, but the WTR exclusion criteria were extremely frustrating, yet in the end, extremely rewarding.
Now that we had the same datasets as the article, it was time to move on to creating the figures. We had a look through the authors’ code expecting to find the code for them and… nothing! Unfortunately for us, the authors did not provide any of the code for the graphs, so we turned to Google.
To create the violin and box plots, Google instructed us to first load the ggplot2 package.
library(ggplot2) # used to make graphs and plots
We then needed to create the backdrop for the plot. The function ggplot allows you to create this, using the dataset and then inputting the x- and y-axis variables into aes(x,y). Along the x-axis the graph has the relationship type, and the y-axis has the average contact comfort. We looked through the dataset and found a column called relationship_category which we assumed was the variable for the x-axis. For the y-axis, we thought there would be a column with the average contact comfort, however, no luck. So, we created a new column in the dataset using the function mutate which calculated the average value across each of the 10 contact comfort variables (comf1:comf10) for each participant. As mentioned above, we subtracted 4 to return it back to the correct range (-3 to 3).
data_excluded_1 = data_excluded_1 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4))
We then used the ggplot function to create the plot with the dataset data_excluded_1, and added the violin plot and boxplot. We also realised that we needed to transform the relationship_category column into a factor so that ggplot treated it as discrete categories rather than as a continuous variable along the x-axis.
# transform relationship_category into a factor
data_excluded_1$relationship_category <- as.factor(data_excluded_1$relationship_category)
#create the violin plot
violin_1 <- ggplot(data_excluded_1, aes(relationship_category, average_cc)) +
geom_violin() + # add the violin plot
geom_boxplot() # add the box plot
print(violin_1)
Next, we needed to change the aesthetics, to make it more similar to the one in the article, as well as more aesthetically pleasing. I spent far too long Googling and playing around with the aesthetics, as I took it on as my personal challenge to reproduce Figure 1 as best as I could.
violin_1 <- ggplot(data_excluded_1, aes(relationship_category, average_cc)) +
geom_violin() + # add the violinplot
geom_boxplot( # add the boxplot
width = 0.1) + # change the width to be smaller
ylab("Contact Comfort") + # rename the y-axis
xlab("Relationship Category") + # rename the x-axis
ggtitle("Study 1") # title the graph
print(violin_1) # print the graph
Using the same code as above, we created the basic graph for Study 2.
# mutate the new average_cc column in the dataset
data_excluded_2 = data_excluded_2 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4))
# turn target category column into a factor
data_excluded_2$target_category <- as.factor(data_excluded_2$target_category)
# create the violin plot
violin_2 <- ggplot(data_excluded_2, aes(target_category, average_cc)) +
geom_violin() + # add the violinplot
geom_boxplot( # add the boxplot
width = 0.1) + # change the width to be smaller
ylab("Contact Comfort") + # rename the y-axis
xlab("Relationship Category") + # rename the x-axis
ggtitle("Study 2") # title the graph
print(violin_2) # print the graph
Again, we used the same code to produce the basic graph for Study 3.
# mutate the new average_cc column in the dataset
data_excluded_3 = data_excluded_3 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4))
# turn Value column into a factor
data_excluded_3$Value <- as.factor(data_excluded_3$Value)
# create the violin plot
violin_3 <- ggplot(data_excluded_3, aes(Value, average_cc)) +
geom_violin() + # add the violinplot
geom_boxplot( # add the boxplot
width = 0.1) + # change the width to be smaller
ylab("Contact Comfort") + # rename the y-axis
xlab("Relationship Category") + # rename the x-axis
ggtitle("Study 3") # title the graph
print(violin_3) # print the graph
We noticed that, for some reason, the two values on the x-axis were reversed in our graph compared to the article. We consulted Google and tried the function scale_x_reverse; however, we were unable to make it work. We then found the levels argument of factor, which allows you to nominate the order of the categories, and used it to reverse them.
data_excluded_3$Value <- as.factor(data_excluded_3$Value)
# create a new factor reversing the x-axis order
x1 = factor(data_excluded_3$Value, levels = c(1,0))
# update the plot to use the reversed factor on the x-axis
violin_3 <- violin_3 + aes(x = x1)
print(violin_3) # print the graph
At last, we had the basic graphs for each study. The biggest lesson we learnt during this process was that inconsistent variable naming is a barrier to reproducibility. The authors used a variety of different labels to refer to the same variable: when referring to the target’s relationship type, they used relationship_category in Study 1, target_category in Study 2, and Value in Study 3. This meant we often struggled to determine which variables to use in the graphs, and more than once we used the wrong one, or ran into an error because a variable did not exist in one of the studies.
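One way to reduce this friction would have been to rename the relationship variable to a single common name in each dataset, for example with dplyr’s rename. This is only a sketch; we did not run it here, since our plotting code below uses the original names.
# sketch: give the relationship variable the same name in every study
data_excluded_1 <- data_excluded_1 %>% rename(relationship = relationship_category)
data_excluded_2 <- data_excluded_2 %>% rename(relationship = target_category)
data_excluded_3 <- data_excluded_3 %>% rename(relationship = Value)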
Now that we had the individual plots for each study, we needed to combine them into one figure. After searching on Google, we found the functions ggarrange, grid.arrange, and facet_grid, all of which combined the graphs; however, they did not allow us (or at least we couldn’t work out how) to remove the repeated y-axes so that the panels shared a single one. After some time, I found the patchwork package, which allows you to combine graphs using just a plus symbol.
library(patchwork) # allows you to easily merge graphs
violin_1 + violin_2 + violin_3
Now that we had reproduced the basic figure, I was determined to replicate the aesthetics to exactly match the article. As a bit of a perfectionist, I really enjoyed this part of the process, as it was so satisfying hunting on Google for answers, coming back to add them to the code, and watching the graph slowly come to life. I haven’t included all the intermediary steps in this report, as there were so (so!) many, but by the end I was able to get it very close to the one in the paper, an achievement I am extremely proud of, and I learnt so much in the process.
violin_1 <- ggplot(data_excluded_1, aes(relationship_category, average_cc)) +
geom_violin(aes(fill = relationship_category, # fill colour with relationship type
colour = relationship_category, # outline colour with relationship type
alpha = .8)) + # make it slightly transparent
geom_boxplot(aes(colour = relationship_category),
outlier.shape = 16, # make the outliers circles
outlier.size = 1, # make the outlier circles smaller
width = 0.1) + # make the boxplot thinner
ylab("Contact Comfort") + # rename y-axis
ggtitle("Study 1") + # title the graph
scale_x_discrete(name = NULL, labels = c('Romantic\nPartner', 'Friend', 'Acquaintance', 'Enemy')) + # rename each x-axis value with the appropriate relationship type
theme(panel.background = element_rect(fill = "transparent"), # change the background to be transparent
legend.position = "none", # remove the legend
plot.title = element_text(hjust = 0.5), # centre the title
axis.line.x.bottom = element_line(colour = "black"), # add in a black x-axis
axis.line.y.left = element_line(colour = "black"), # add in a black y-axis
plot.margin = margin(0, 0, 0, 0,'cm')) + # remove the plot margins
scale_fill_manual(values = c("#3F6487","#B17276", "#829966", '#E89B39')) + # add the fill colours manually by Hex
scale_colour_manual(values = c("#3F6487","#B17276", "#829966", '#E89B39')) + # add the fill colours manually by Hex
scale_y_continuous(breaks = c(-3, -2, -1, 0, 1, 2, 3)) # change the y-axis numbers to range from -3 to 3
violin_2 <- ggplot(data_excluded_2, aes(target_category, average_cc)) +
geom_violin(aes(fill = target_category, # fill colour with relationship type
colour = target_category, # outline colour with relationship type
alpha = .8)) + # make it slightly transparent
geom_boxplot(aes(colour = target_category), # fill colour with relationship type
outlier.shape = 16, # make the outliers circles
outlier.size = 1, # make the outlier circles smaller
width = 0.1) + # make the boxplot thinner
ggtitle("Study 2") + # title the graph
scale_x_discrete(name = NULL, labels = c('Friend', 'Acquaintance', 'Enemy')) + # rename each x-axis value with the appropriate relationship type
theme(panel.background = element_rect(fill = "transparent"), # change the background to be transparent
legend.position = "none", # remove the legend
plot.title = element_text(hjust = 0.5), # centre the title
axis.line.x.bottom = element_line(colour = "black"), # add in a black x-axis
plot.margin = margin(0, 0, 0, 0, 'cm')) + # remove the margins
scale_fill_manual(values = c("#B17276", "#829966", "#E89B39")) + # add the fill colours manually by Hex
scale_colour_manual(values = c("#B17276", "#829966", "#E89B39")) # add the fill colours manually by Hex
violin_3 <- ggplot(data_excluded_3, aes(x1, average_cc)) +
# add the violin plot
geom_violin(aes(fill = x1, # fill colour with relationship type
colour = x1, # outline colour with relationship type
alpha = .8)) + # set the transparency
geom_boxplot(aes(colour = x1), # outline colour with value of stranger
outlier.shape = 16, # make the outliers circles
outlier.size = 1, # make the outlier circles smaller
width = 0.1) + # make the boxplot thinner
ggtitle("Study 3") + # title the graph
scale_x_discrete(name = NULL, labels = c('High-Value\nStranger','Low-Value\nStranger')) + # rename each x-axis value with the appropriate value type
theme(panel.background = element_rect(fill = "transparent"), # change the background to be transparent
legend.position = "none", # remove the legend
plot.title = element_text(hjust = 0.5), # centre the title
axis.line.x.bottom = element_line(colour = "black"), # add in a black x-axis
plot.margin = margin(0, 0, 0, 0, 'cm')) + # remove the plot margins
scale_fill_manual(values = c("#86699A", "#6494A5")) + # add the fill colours manually by Hex
scale_colour_manual(values = c("#86699A", "#6494A5")) # add the outline colours manually by Hex
# COMBINED
# merge the graphs and remove y-axis elements for Study 2 and Study 3
Figure_1 <- violin_1 + violin_2 + theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank(),
axis.title.y = element_blank()) + violin_3 +
theme(axis.text.y = element_blank(), axis.ticks.y = element_blank(), axis.title.y = element_blank())
library(cowplot) # allows drawing on plots
Figure_1 <- ggdraw() + draw_plot(Figure_1) +
draw_line( # draw the first line
x = c(0.37, 0.37), # in between the first and second graph
y = c(0.1, 0.95), # from the bottom to the top
colour = "dark grey", size = 0.5, linetype = 2) + # change the colour, size, and type to dashed
draw_line( # draw the second line
x = c(0.68, 0.68), # in between the second and third graph
y = c(0.1, 0.95), # from the bottom to the top
colour = "darkgrey", size = 0.5, linetype = 2 # change the colour, size, and type to dashed
)
print(Figure_1)
Above is the figure we reproduced (unfortunately a bit stretched when knitted), and below is the one from the article. Success!
The next figure is three scatterplots and a legend, arranged in a grid. In the figure, the x-axis is labelled WTR and the y-axis Contact Comfort. As with Figure 1, we tried to locate the axis variables in the data; however, the only things that looked appropriate were the 60 WTR columns. To test whether these were where the data were coming from, we used a single WTR column as an example.
# create the scatterplot
scatterplot <- ggplot(data_excluded_1, aes(X37_54, average_cc)) + geom_point()
print(scatterplot)
This looked like it was producing the correct type of graph, but using only a small part of the data. From this, we assumed WTR was referring to an average WTR for each participant, or something similar. However, there was no information in the article, codebook, or data. So, we looked at the authors' original code to try and find how it was calculated. We found a column in the original code called WTRTOTAL with a value in it for each participant. By manually looking at each participant, we tried to work out how they had arrived at this number, but it was not an average, and we couldn't see how they could possibly have got to it! We looked at the codebook and supplementary materials as well, but there was no mention of this variable. After speaking to Yuki, we decided to give in and use the authors' code, so that we could reproduce the figures, even though we have no idea where WTRTOTAL came from. We copied and pasted in the authors' original code from the start up until the creation of the final dataset. I have omitted it from view in this report as we used the exact code from the authors, only changing Finaldata to Finaldata_1, Finaldata_2, and Finaldata_3 so we could distinguish between the studies.
In order to make the scatterplots, we needed to mutate a new column with the average contact comfort score for each participant, called average_cc as in our own final dataset. We also needed to transform the variable containing the relationship type, previously called relationship_category in our dataset but called relation in the authors' final dataset, into a factor.
# mutate new average_cc column (subtracting 4 presumably rescales the stored responses onto the -3 to 3 scale used in the figure)
Finaldata_1 = Finaldata_1 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4))
Finaldata_1$relation <- as.factor(Finaldata_1$relation) # transform into factor
We were then ready to create a basic scatterplot with 95% CIs. We used the functions geom_point and geom_smooth, and included method = lm to create a linear line of best fit.
scatterplot_1 <- ggplot(Finaldata_1, aes(WTRTOTAL, average_cc)) +
geom_point() +
geom_smooth(method = lm)
print(scatterplot_1)
## `geom_smooth()` using formula 'y ~ x'
This worked; however, we realised we were missing the separate regression lines for each relationship type. To add these in, we added another call to geom_smooth, this time filling by relation.
scatterplot_1 <- ggplot(Finaldata_1, aes(WTRTOTAL, average_cc)) +
geom_point() +
geom_smooth(method = lm) +
geom_smooth(aes(fill = relation), method = "lm")
print(scatterplot_1)
Now that we had created the basic scatterplot with all the important information, it was time to reproduce the aesthetics. Helen took the lead on this one, using the code I had produced for Figure 1 and adapting it for this figure.
scatterplot_1 <- ggplot(Finaldata_1, aes(WTRTOTAL, average_cc)) +
geom_point(aes(colour = relation), size = 1) + # add the data points
ggtitle(label = "Study 1", ) + # add the title
theme(plot.title = element_text(hjust = 0.5)) + # centre the title
scale_y_continuous(breaks = c(-3, -2, -1, 0, 1, 2, 3), name = "Contact Comfort") + # change the y-axis range and name
scale_x_continuous(breaks = c(-0.45, 0.05, 0.55, 1.05, 1.55), name = "WTR") + # change the x-axis range and name
geom_smooth(aes(fill=relation), # add the relationship regression lines
method = "lm", # make the lines linear
linetype = "dashed", # make the lines dashsed
colour = "black", # make the lines black
size = 0.3) + # make the lines thinner
geom_smooth(method = "lm", colour = "black", size = 0.3) + # add the main regression line
theme(panel.background = element_rect(fill = "transparent"), # make the plot background transparent
axis.line.x.bottom = element_line(colour = "black"), # add the x-axis in black
axis.line.y.left = element_line(colour = "black")) + # add the y-axis in black
scale_fill_manual(values = c("#A3B5C4", "#D2AEB1","#BBC7AC", "#F6CB98"), # manually change the colour using Hex
name = "Target\nStudy 1", # label the legend
labels = c("Romantic Partner", "Friend", "Acquaintance", "Enemy")) + # change variable names in the legend
scale_colour_manual(values = c("#A3B5C4", "#D2AEB1", "#BBC7AC", "#F6CB98"), # manually change the colour using Hex
name = "Target\nStudy 1", # label the legend
labels = c("Romantic Partner", "Friend", "Acquaintance", "Enemy")) # change variable names in the legend
print(scatterplot_1)
We used equivalent code to produce the scatterplot for Study 2.
print(scatterplot_2)
Again, we used equivalent code to produce the scatterplot for Study 3.
print(scatterplot_3)
Finally, we needed to combine the graphs into a grid, with the bottom-right panel being the combined legends. After consulting Google (which we were getting very good at), I found that you could use patchwork with plot_layout(guides = 'collect').
Figure_2 <- scatterplot_1 + scatterplot_2 + scatterplot_3 + # merge the scatterplots
guide_area() + plot_layout(guides = 'collect') # group the legends together in the bottom-right
print(Figure_2)
Above is the figure we reproduced (unfortunately squished when knitted), and below is the one from the original article. At this point, we were finally feeling the satisfaction of coding!
As we now had the same final datasets as the author, we wanted to try and calculate the mean contact comfort from Study 1 again, hoping that now it would produce the right number.
We used the exact same code as we did previously.
comfort_1 <- colMeans(data_excluded_1[c(30:39)],na.rm = TRUE)
print(mean(comfort_1)-4)
## [1] 0.06775326
print(sd(comfort_1))
## [1] 0.5815826
Despite the result being much closer, we were still technically unable to reproduce the number in the paper (0.06), as the above statistic, when rounded, would be 0.07. As we were confident that our code was correctly calculating the mean contact comfort, we figured it was likely that either the authors wrote this statistic in the article, then changed the dataset and forgot to update it, or it was simply a rounding error. Alternatively, it may be that we were still using the wrong dataset, but given how close it was, we figured the former explanation was more likely. As for the standard deviation, we were unsure why this was so different from that in the article (1.97). We attempted to calculate the population standard deviation, to see if the authors had mistakenly used this formula, but this did not match either. Similar to the above, as we were confident in our code, we concluded that either the authors wrote this line before changing the dataset, or they simply reported the wrong statistic.
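For completeness, the standard deviation could also be taken across each participant's average contact comfort score, rather than across the ten item means; a minimal sketch, assuming columns 30 to 39 are the ten comfort items as in the code above:
participant_cc <- rowMeans(data_excluded_1[c(30:39)] - 4, na.rm = TRUE) # average contact comfort per participant, rescaled by -4 as before
sd(participant_cc, na.rm = TRUE) # SD across participants rather than across the item means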
As Study 2 did not have any descriptive statistics, the final part of our journey was to reproduce those for Study 3. The statistic was:
To do this, I used the summarise function I had previously learnt to create a table with the means for honesty and kindness across the two value conditions. I used the group_by function to separate them into high-value and low-value, and the mean and round functions to generate the statistics.
value_summary <- data_excluded_3 %>%
group_by(Value) %>%
summarise(
HonestMean = round(mean(Alex_honest), digits = 2),
KindMean = round(mean(Alex_kind), digits = 2))
print(value_summary)
## # A tibble: 2 × 3
## Value HonestMean KindMean
## <fct> <dbl> <dbl>
## 1 0 2.39 2.39
## 2 1 9.59 9.39
According to the codebook, a value of 0 indicates low-value, and a value of 1 indicates high-value. As such, the values are close, but not identical, to those published in the article. We were not sure why there was a difference, as our code was relatively simple, and we presume it is working correctly. Similar to the descriptive statistic for Study 1, we concluded that the authors likely either wrote this statistic before changing the dataset, or reported the incorrect number. Another explanation is that they were conducting a more complex analysis; however, they did not describe this or make clear what it was. Regardless of the explanation for these differences, as we were unable to reproduce any of their descriptive statistics, there is a clear issue with reproducibility in this paper. If the authors had provided more explanation for these figures, or the code that produced them, this would have greatly improved our ability to reproduce them, or to conclude that they were incorrect. As they did not, we were unable to draw any solid conclusions, which is a strong barrier to reproducibility.
One of my first reactions to the article was regarding the statement in the abstract that suggests people are less opposed to infection-risky behaviours with more socially valued strangers. When I read this, I started thinking about how quickly humans make judgements about the people we meet, and how we then use this impression as a basis for our interactions with them. It is clear from Figure 1 in the article that people are more comfortable with strangers deemed to be 'high-value' as opposed to 'low-value'. This led me to consider what other judgements the participants were making about this stranger that either influenced their contact comfort or were a by-product of their judgement of the stranger's value. While reproducing the exclusion criteria and figures for Study 3, I noticed that participants were asked to rate many characteristics of the stranger (called Alex); however, neither the article nor the supplementary materials published any findings on these. As such, I was particularly interested in answering the following question:
Does the perceived value of a stranger influence judgements of their personality and physical characteristics?
In Study 3, participants made judgements on 8 different characteristics of Alex, the high-value or low-value stranger, those being:
As honesty was explicitly addressed in the summary of Alex, participants in the High-Value condition should rate Alex as significantly more honest than those in the Low-Value condition. As such, I decided to look at this trait first, to ensure that the participants were sensitive to the summary they were given, and that this influenced their judgements on Alex.
First, I wanted to create a summary table calculating the average estimated honesty of Alex in each value condition. I used the summarise function to create the table, and the group_by function to separate it into High-Value and Low-Value. I then used the functions mean and sd to calculate the mean and standard deviation, and calculated the standard error as the standard deviation divided by the square root of the sample size. I used the round function I learnt when reproducing the demographic statistics from the paper to round each value to 2 decimal places.
honest_summary <- data_excluded_3 %>%
group_by(Value) %>% # split by Low-Value vs High-Value condition
summarise(
mean = round(mean(Alex_honest), digits = 2), # mean honesty rating
sd = round(sd(Alex_honest), digits = 2), # standard deviation
se = round(sd/sqrt(827), digits = 2)) # standard error, dividing by the total sample size of 827
print(honest_summary)
## # A tibble: 2 × 4
## Value mean sd se
## <fct> <dbl> <dbl> <dbl>
## 1 0 2.39 1.53 0.05
## 2 1 9.59 1.24 0.04
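As a side note, the standard error above divides by the full sample of 827; if we instead wanted each group's own standard error, summarise can use n() for the group size. A minimal sketch:
honest_summary_by_group <- data_excluded_3 %>%
group_by(Value) %>%
summarise(
mean = round(mean(Alex_honest), digits = 2),
se = round(sd(Alex_honest)/sqrt(n()), digits = 2)) # n() is the size of each Value group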
As per the codebook, a value of 0 indicates Low-Value and a value of 1 indicates High-Value. It is clear from this table that participants in the High-Value condition estimated Alex's honesty as much higher than those in the Low-Value condition did. This indicates participants were sensitive to the condition they were in, and were evaluating Alex based on the summary they were given. I then wanted to create a plot comparing the conditions, so I could visualise the difference. Using the code I learnt reproducing the violin and boxplots from the article, I was able to very easily use the functions ggplot and geom_boxplot to create the plot, as well as rename the values and axes and change the colours.
honest_plot <- ggplot(data_excluded_3,
aes(Value, Alex_honest, colour = Value, fill = Value)) +
geom_point() + # add the data points
geom_boxplot(alpha = 0.4) + # add the boxplot
scale_x_discrete(labels = c("Low-Value\nStranger", "High-Value\nStranger"), name = NULL) + # add the x-axis labels and remove the x-axis title
ylab("Estimated Honesty") + # change the y-axis label
theme(legend.position = 'none') + # remove the legend
ylim(0, 10) +
scale_fill_manual(values = c("#B17276", "#829966")) +
scale_colour_manual(values = c("#B17276", "#829966"))
print(honest_plot)
Although this graph is informative and shows the difference between the conditions, I wanted it to show each of the individual data points along the y-axis. After some searching I found the function geom_jitter, which adds a small amount of horizontal variation to the points so that all data points can be seen. As such, I changed geom_point to geom_jitter in the above code.
honest_plot <- ggplot(data_excluded_3,
aes(Value, Alex_honest, colour = Value, fill = Value)) +
geom_jitter() + # add the data points
geom_boxplot(alpha = 0.4) + # add the boxplot
scale_x_discrete(labels = c("Low-Value\nStranger", "High-Value\nStranger"), name = NULL) + # add the x-axis labels and remove the x-axis title
ylab("Estimated Honesty") + # change the y-axis label
theme(legend.position = 'none') + # remove the legend
ylim(0, 10) +
scale_fill_manual(values = c("#B17276", "#829966")) +
scale_colour_manual(values = c("#B17276", "#829966"))
print(honest_plot)
Much better! Even though the difference between the conditions is clear, I wanted to run a t-test to establish significance and 95% CIs. First I needed to create two different datasets, one for each condition. I piped our final dataset for Study 3 into a filter statement, selecting only participants with a 1 in the Value column for the high-value dataset and a 0 for the low-value dataset.
Study_3_high <- data_excluded_3 %>%
filter(Value == "1")
Study_3_low <- data_excluded_3 %>%
filter(Value == "0")
I then used the function t.test to run the t-test, using the column Alex_honest from both the high- and low-value datasets.
t.test(Study_3_high$Alex_honest, Study_3_low$Alex_honest)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_honest and Study_3_low$Alex_honest
## t = 74.336, df = 801.86, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 7.004219 7.384161
## sample estimates:
## mean of x mean of y
## 9.585185 2.390995
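As an aside, the same comparison can be run without splitting the data at all, using the formula interface of t.test; a minimal sketch:
t.test(Alex_honest ~ Value, data = data_excluded_3) # grouped by the two-level Value factor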
As the p value is much less than 0.05, significance has been established, and the 95% CIs demonstrate that the difference is between 7.00 and 7.38 points. Now that I had established this difference, I wanted to answer my question by looking at what other judgements are influenced by someone's perceived value. Although a summary table shows the mean, sd and se, I find a graph easier to understand and interpret, so I ran the same plot code as above to create a graph for each characteristic, only changing it to accommodate each variable. I have omitted this from view, as the code for each graph is equivalent. I then used the package patchwork, which we used previously to reproduce the figures in the paper, to combine all the graphs so they could be viewed together.
# patch the graphs together
Figure_3 <- honest_plot + attract_plot + intell_plot + kind_plot + weight_plot + wealth_plot + age_plot + height_plot
print(Figure_3)
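The eight plots could also have been generated with a small helper function instead of copying the plot code for each characteristic; a minimal sketch, assuming the same column names as in the dataset:
# helper that builds the same jittered boxplot for any of Alex's rated characteristics
value_plot <- function(data, var, y_label) {
ggplot(data, aes(Value, .data[[var]], colour = Value, fill = Value)) +
geom_jitter() + # add the individual data points
geom_boxplot(alpha = 0.4) + # overlay the boxplot
scale_x_discrete(labels = c("Low-Value\nStranger", "High-Value\nStranger"), name = NULL) +
ylab(y_label) + # label the y-axis with the characteristic
theme(legend.position = "none") + # remove the legend
scale_fill_manual(values = c("#B17276", "#829966")) +
scale_colour_manual(values = c("#B17276", "#829966"))
}
attract_plot <- value_plot(data_excluded_3, "Alex_attractive", "Estimated Attractiveness") # for example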
From the above figure, it is clear that the value of a stranger influences at least some of their perceived characteristics. Similar to honesty, value has a large impact on estimated kindness, which was unsurprising given the description of Alex. More surprising, however, were the differences in estimated attractiveness, intelligence, and wealth, as none of these characteristics were mentioned in the description. From the figure, there does not seem to be a significant difference in perceptions of age, height, and weight between the conditions. As the plots are quite small, it is difficult to determine whether the differences are significant, so I ran the same code as above to carry out a t-test for each variable to establish significance and the 95% CIs.
t.test(Study_3_high$Alex_attractive, Study_3_low$Alex_attractive)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_attractive and Study_3_low$Alex_attractive
## t = 11.043, df = 795.75, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.241564 1.778388
## sample estimates:
## mean of x mean of y
## 7.012346 5.502370
t.test(Study_3_high$Alex_intelligent, Study_3_low$Alex_intelligent)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_intelligent and Study_3_low$Alex_intelligent
## t = 17.601, df = 753.52, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 1.762132 2.204564
## sample estimates:
## mean of x mean of y
## 8.116049 6.132701
t.test(Study_3_high$Alex_kind, Study_3_low$Alex_kind)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_kind and Study_3_low$Alex_kind
## t = 69.34, df = 822.45, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 6.800807 7.197051
## sample estimates:
## mean of x mean of y
## 9.385185 2.386256
t.test(Study_3_high$Alex_weight, Study_3_low$Alex_weight)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_weight and Study_3_low$Alex_weight
## t = -1.3898, df = 806.77, p-value = 0.165
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.7090478 0.1212063
## sample estimates:
## mean of x mean of y
## 7.056790 7.350711
t.test(Study_3_high$Alex_wealth, Study_3_low$Alex_wealth)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_wealth and Study_3_low$Alex_wealth
## t = 3.4693, df = 819.22, p-value = 0.000549
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1640794 0.5916517
## sample estimates:
## mean of x mean of y
## 6.674074 6.296209
t.test(Study_3_high$Alex_age, Study_3_low$Alex_age)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_age and Study_3_low$Alex_age
## t = 1.1255, df = 821.72, p-value = 0.2607
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.2610931 0.9629479
## sample estimates:
## mean of x mean of y
## 12.06420 11.71327
t.test(Study_3_high$Alex_height, Study_3_low$Alex_height)
##
## Welch Two Sample t-test
##
## data: Study_3_high$Alex_height and Study_3_low$Alex_height
## t = 1.168, df = 824.86, p-value = 0.2431
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1706559 0.6722532
## sample estimates:
## mean of x mean of y
## 8.696296 8.445498
From the above t-tests, when comparing high-value and low-value strangers, there is a significant difference in ratings of attractiveness, intelligence, kindness, and wealth. No significant differences were established for weight, age, or height. The t-tests confirmed the conclusions drawn from viewing the graphs.
I found these results fascinating, as I am very intrigued by this aspect of psychology. There is research demonstrating that people ascribe more desirable personality traits and more successful life outcomes to more attractive people. In this study, however, it seems the reverse may also be true: people judge more valued individuals to be more attractive and more successful (in regards to wealth and intelligence). It is interesting that the link between value on the one hand, and attractiveness and success on the other, appears to be bidirectional, wherein underlying processes associate these characteristics, such that when given only a short summary of a person, people's judgements of seemingly unrelated characteristics are influenced. I also found it interesting that the stranger's value did not significantly influence judgements of physical traits, those being height, weight, and age. This lack of variation is likely because participants were given a photo of Alex (randomly assigned to 1 of 40 different photos in both conditions) along with the summary, and thus these characteristics are less subjective. It should be noted, however, that there was still a significant difference in attractiveness, despite the faces being selected from a range of attractiveness and both conditions being given the same range. Thus, through my exploration I have found that a stranger's value significantly influences judgements of their perceived characteristics.
After reading through the article, I was convinced that the interpersonal value of known individuals affects willingness to engage in infection-risky behaviour; however, I was curious as to what other factors also influence this behaviour. When looking through the codebook for the studies, I noticed that in Study 1 and Study 2, participants were asked how long they had known the target. This piqued my interest, as the length of time you have known someone often influences how comfortable you are around them, and thus would likely influence how comfortable you are engaging in infection-risky behaviour with them. Furthermore, the length of time you have known someone is not always equivalent to how much you value them; that is, many people may value a partner they have known only a few years over a lifelong friend. This led me to explore the following question:
Is willingness to engage in infection-risky behaviour predicted by the length of the interpersonal relationship?
In order to answer my question, I first needed to determine the correlation between relationship length and contact comfort across Studies 1 and 2. As Study 3 involved only strangers, its data was not relevant to my question. To start off with, I needed to figure out how to combine the data for Studies 1 and 2, as this was not something we had needed to do thus far in our reproduction journey. On an R forum, I found the function rbind, which allows you to combine two datasets. First, I created a new dataset for each study and mutated the average_cc column (explained in depth in the verification section) in each new dataset. I then attempted to bind the two together using rbind.
# mutate new average_cc column calculating average contact comfort across the 10 items
Study1_length <- data_excluded_1 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4))
Study2_length <- data_excluded_2 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4))
Study_length <- rbind(Study1_length, Study2_length) # bind the datasets
## Error in rbind(deparse.level, ...): numbers of columns of arguments do not match
Unfortunately, this produced an error saying that the number of columns did not match. Thankfully the R error made it very clear what the problem was, and to solve it I simply used the select function to keep only the necessary columns, such that there were the same number in each dataset.
Study1_length <- Study1_length %>%
select(participant, part_leng, average_cc) # select only these three columns
Study2_length <- Study2_length %>%
select(participant, relate_length, average_cc) # select only these three columns
Study_length <- rbind(Study1_length, Study2_length) # bind the datasets
## Error in match.names(clabs, names(xi)): names do not match previous names
Another error! This time it was because the column names didn't match up. This was something I hadn't realised needed to be the same, but with the power of hindsight, it made sense that it does. I used the mutate function to create a new column, relate_length, copying part_leng so the two datasets shared the same names, and then re-selected the columns for Study 1.
Study1_length <- data_excluded_1 %>%
mutate(relate_length = part_leng) %>% # rename part_leng to relate_length to bind
select(participant, relate_length, average_cc)
Study_length <- rbind(Study1_length, Study2_length)
Finally, the two datasets were bound together! This process made me realise that I needed to slow down and take a step back. Although rushing into coding and just copying and pasting a line of code straight from a forum may work, it is often better to think logically about what you want to do, how you could do it, and how a particular function would achieve it. Clearly I was unfamiliar with rbind, and had not taken the time to understand how it worked, which meant I ran into errors that were both time-consuming and disheartening. Thankfully both these errors were easy to fix by reading the R output; however, they did make me consider each of my lines of code more deeply going forward.
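In hindsight, dplyr also offers rename() and bind_rows(), which sidestep both of these errors because columns are matched by name; a minimal sketch using the same columns as above:
Study1_length <- data_excluded_1 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10) - 4)) %>%
rename(relate_length = part_leng) %>% # rename rather than copy the column
select(participant, relate_length, average_cc)
Study_length <- bind_rows(Study1_length, Study2_length) # binds rows, matching columns by name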
Next, I wanted to determine whether relationship length was correlated with contact comfort. A quick Google produced the function cor.test, which gives both the correlation and the p-value. As I had learnt previously, I included na.rm = TRUE so that missing values would not make the output NA (although, in hindsight, cor.test drops incomplete pairs on its own, so this extra argument is simply ignored).
cor.test(Study_length$relate_length, Study_length$average_cc, na.rm = TRUE)
##
## Pearson's product-moment correlation
##
## data: Study_length$relate_length and Study_length$average_cc
## t = 6.1004, df = 862, p-value = 1.597e-09
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1386189 0.2665143
## sample estimates:
## cor
## 0.2034343
This demonstrates there is a positive correlation between relationship length and contact comfort, and the p-value is less than 0.05, so the effect has reached significance. To better visualise the correlation, I wanted to create a scatterplot with this data. Using similar code that we used to reproduce Figure 2 from the article, I was quickly able to create a scatterplot representing the correlation.
scatter <- ggplot(Study_length, aes(relate_length, average_cc)) +
geom_jitter(colour="#B17276", size = 0.6) + # add the dots
geom_smooth(colour="black", method = "lm", size = 0.5) + # add the regression line
ggtitle(label = "Average Contact Comfort by Relationship Length") + # title the graph
xlab("Relationship Length (years)") + # add the x-axis label
ylab("Average Contact Comfort") + # add the y-axis label
theme(legend.position = "none", # remove the legend
plot.title = element_text(hjust = 0.5)) + # centre the title
scale_y_continuous(breaks = c(-3, -2, -1, 0, 1, 2, 3)) # change the y-axis range
print(scatter)
Now that I had established the correlation across Studies 1 and 2, I wanted to determine whether the correlation was similar between the studies, or whether one was larger than the other. I hypothesised that the correlation would be higher for Study 2, as I thought the Romantic Partner condition in Study 1 might reduce the correlation, since partners are often highly valued after a shorter amount of time. I used the exact same code as above to produce the correlations, only omitting the rbind step and changing the variable names.
cor.test(Study1_length$relate_length, Study1_length$average_cc, na.rm = TRUE)
##
## Pearson's product-moment correlation
##
## data: Study1_length$relate_length and Study1_length$average_cc
## t = 4.4531, df = 462, p-value = 1.063e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1139415 0.2885718
## sample estimates:
## cor
## 0.2028692
cor.test(Study2_length$relate_length, Study2_length$average_cc, na.rm = TRUE)
##
## Pearson's product-moment correlation
##
## data: Study2_length$relate_length and Study2_length$average_cc
## t = 4.0352, df = 398, p-value = 6.542e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1021881 0.2906551
## sample estimates:
## cor
## 0.1982534
From the correlation tests, there is a significant positive correlation in both studies (p-value < 0.05); however, the correlation is slightly higher for Study 1. To visualise the correlations, I created a scatterplot for each, using the same code as above (omitted as it is equivalent).
scatter_1 + scatter_2
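Whether this small gap between the two correlations is itself statistically meaningful could be checked with a Fisher r-to-z comparison; a rough sketch, using the correlations and sample sizes (df + 2) from the outputs above:
r1 <- 0.203; n1 <- 464 # Study 1
r2 <- 0.198; n2 <- 400 # Study 2
z <- (atanh(r1) - atanh(r2)) / sqrt(1/(n1 - 3) + 1/(n2 - 3)) # Fisher r-to-z test statistic
2 * pnorm(-abs(z)) # two-sided p-value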
Now, after being able to clearly visualise the correlations between the two studies, I found it interesting that the correlation was higher for Study 1, which contradicted my hypothesis and led me to think about why this may be. The major difference between the studies is that Study 2 controlled for sex, and thus removed the Romantic Partner condition. I based my hypothesis on the fact that, for myself and other young people I know, a romantic partner who hasn't been known for long is often more valued than a longer-term friend. Thus, to determine whether it was in fact the inclusion of romantic partners in Study 1 driving the difference, I wanted to compare Study 1 with romantic partners included to Study 1 without them. First, I needed to create a new dataset for Study 1 excluding those where the target was a romantic partner. I used the filter function to include only participants without (!=) a 1 in the relationship_category variable, and then used the same code as above to rename the variables and select only the relevant columns.
# create a new dataset for Study 1 excluding participants in Romantic Partner condition
Study1_length_excRP <- data_excluded_1 %>%
mutate(average_cc = rowMeans(select(., comf1:comf10)-4)) %>% # mutate average_cc variable
mutate(relate_length = part_leng) %>% # rename part_leng to relate_length to bind
filter(relationship_category!=1) %>% # exclude targets that were romantic partners
select(participant, relate_length, average_cc)
Then, using the same code as for Study 1, I ran a correlation test to determine whether excluding romantic partners changed the correlation.
cor.test(Study1_length_excRP$relate_length, Study1_length_excRP$average_cc, na.rm = TRUE)
##
## Pearson's product-moment correlation
##
## data: Study1_length_excRP$relate_length and Study1_length_excRP$average_cc
## t = 4.1025, df = 339, p-value = 5.123e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.1139097 0.3163826
## sample estimates:
## cor
## 0.2174845
When including targets that were romantic partners, the correlation was 0.203 (as above). As such, removing these participants only slightly increased the overall correlation (to 0.217). To visualise this, I used the same code to create a scatterplot for both the included and excluded Study 1 datasets (code omitted as it is equivalent).
scatter_1a + scatter_3
Plotting relationship length against contact comfort and comparing the inclusion and exclusion of romantic partner targets was surprisingly informative! From the graphs, the correlation between the variables is similar across the two datasets; however, the mean contact comfort varies noticeably. Notably, the average contact comfort is lower when romantic partners were excluded. This result is to be expected, as it is in line with the paper's finding that contact comfort is highest for romantic partners. However, it is interesting that the exclusion of romantic partners does not greatly influence the correlation between relationship length and contact comfort. Thus, to answer my question, these results suggest relationship length and contact comfort are associated fairly independently of relationship type. That is, as relationship length increases, average contact comfort is likely to be higher.
One of my reactions after initially reading the article was that the authors noted they excluded participants who identified as neither male nor female so that they could examine sex differences. However, no such findings were reported in the paper, which left me confused as to why they mentioned this, and why they excluded these participants. After starting the verification journey, I soon realised that additional findings were published in the Supplementary Materials, which included an analysis of the relationship between participant sex and target sex on contact comfort (pictured below).
While looking at this figure, I found the differences particularly interesting, and was curious as to how honesty-humility ratings would differ based on the relationship between participant sex and target sex. I find myself particularly interested in sex differences, as it is fascinating how differences in either nature (sex) or nurture (upbringing and societal pressure) prompt noticeable differences in behaviours and judgements. I found it surprising in the above graph that women were more comfortable engaging in contact with male targets. As a woman, I am more comfortable engaging in contact with other women, excluding only my romantic partner. When looking again at the graph, I noticed that this effect in women was only apparent in Study 1, which made me think that the inclusion of romantic partners was driving the higher ratings of comfort with male targets for female participants. This process made me interested in examining what other sex differences exist in the data, and specifically whether the combination of participant sex and target sex affected honesty-humility ratings. As such, my final exploration question is:
Does the relationship between participant sex and target sex influence participant honesty-humility ratings?
To explore this question, I first created a new dataset, and used the mutate function to create a new column with the average honesty-humility (HH) rating across all 10 questions. I then used the select function to keep only the necessary columns, and the as.factor function to transform sex and part_sex into factors, so they could be used for the graph. Then, using a combination of code from the above explorations and from the figures in the article, I was able to produce a graph similar to the supplementary materials, with honesty-humility ratings instead of contact comfort.
Study1_HH <- data_excluded_1 %>% # create a new dataset
mutate(mean_HH = rowMeans(select(., HH1:HH10))) %>% # mutate mean HH column
select(participant, sex, part_sex, mean_HH) # select only relevant columns
Study1_HH$sex <- as.factor(Study1_HH$sex) # transform sex into factor
Study1_HH$part_sex <- as.factor(Study1_HH$part_sex) # transform part_sex into factor
HH_graph1 <- ggplot(Study1_HH, aes(sex, mean_HH, fill=factor(part_sex))) +
geom_boxplot() + # add the boxplot
ylab("Honest Humility") + # rename the y-axis
xlab(" Participant Sex") + # rename the x-axis
scale_x_discrete(labels = c('Male', 'Female'))+
ggtitle("Study 1") + # title the graph
theme(panel.background = element_rect(fill = "transparent"), # change the background to be transparent
plot.title = element_text(hjust = 0.5), # centre the title
axis.line.x.bottom = element_line(colour = "black"), # add in a black x-axis
axis.line.y.left = element_line(colour = "black"), # add in a black y-axis
plot.margin = margin(0, 0, 0, 0, 'cm')) + # remove the plot margins
scale_fill_manual(values = c("#A3B5C4", "#D2AEB1"), # manually change the colour using Hex
name = "Target Sex", # label the legend
labels = c("Male", "Female")) # change variable names in the legend
print(HH_graph1) # print the graph
From the graph, the pattern of results was very similar to that for contact comfort. That is, male participants rated female targets on average as higher in honesty-humility (than male targets), while female participants rated male targets as higher (than female targets). Similar to contact comfort, this effect was larger for male participants than female participants.
Initially, I aimed to produce the same figure as the authors’, with honesty-humility instead of contact comfort, however I soon realised that Study 3 did not involve honesty-humility ratings, nor whether the target was male or female, and as such I could only create the figures for Studies 1 and 2. So, I used the above code for Study 2, only changing the variable names (code omitted as equivalent).
print(HH_graph2) # print the graph
This graph was more interesting, as it varied significantly from the pattern of results observed for contact comfort. That is, target sex seems to produce only minimal differences in honesty-humility ratings for both males and females. For contact comfort, males in Study 2 had significantly higher contact comfort for female targets, whereas in this figure, no significant difference can be seen.
The variable driving the differences between Study 1 and 2 is likely the inclusion of romantic partners in Study 1. That is, I assume that participants are more likely to rate their romantic partner as higher in honesty-humility, and the majority of couples in this study were opposite sex. Thus, ratings of honesty-humility in Study 1 were much higher when the target was of the opposite sex. In order to test this hypothesis, I ran a t.test similar to that used in Question 1 to determine whether honesty-humility ratings were higher for romantic partners. I used the filter function to filter those either with a target that was a romantic partner (==1) or not a romantic partner (!=1), and the mutate function to create the average honesty-humility rating.
# create new dataset for only those with a romantic partner target
Study_1_RP <- data_excluded_1 %>%
filter(relationship_category == "1") %>%
mutate(mean_HH = rowMeans(select(., HH1:HH10))) # mutate mean HH column
# create new dataset for those with a non-romantic partner target
Study_1_NRP <- data_excluded_1 %>%
filter(relationship_category != "1") %>%
mutate(mean_HH = rowMeans(select(., HH1:HH10))) # mutate mean HH column
# run the t-test
t.test(Study_1_RP$mean_HH, Study_1_NRP$mean_HH)
##
## Welch Two Sample t-test
##
## data: Study_1_RP$mean_HH and Study_1_NRP$mean_HH
## t = 7.139, df = 278.72, p-value = 8.176e-12
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.4303125 0.7579698
## sample estimates:
## mean of x mean of y
## 3.431200 2.837059
As the p-value is less than 0.05, this effect reached significance. However, I realised this is fairly intuitive, as the non-romantic-partner dataset includes less valued targets such as enemies. As such, I wanted to use a similar method to Question 2 to remove the romantic partner targets from Study 1, allowing a fairer comparison between the studies.
Study1_HH_excRP <- data_excluded_1 %>% # create a new dataset
mutate(mean_HH = rowMeans(select(., HH1:HH10))) %>% # mutate mean HH column
filter(relationship_category!=1) %>% # exclude targets that were romantic partners
select(participant, sex, part_sex, mean_HH) # select only relevant columns
# transform variables into factors
Study1_HH_excRP$sex <- as.factor(Study1_HH_excRP$sex) # transform sex into factor
Study1_HH_excRP$part_sex <- as.factor(Study1_HH_excRP$part_sex) # transform part_sex into factor
# create the plot
HH_excRP_graph1 <- ggplot(Study1_HH_excRP, aes(sex, mean_HH, fill=factor(part_sex))) +
geom_boxplot() + # add the boxplot
ylab("Honest Humility") + # rename the y-axis
xlab(" Participant Sex") + # rename the x-axis
scale_x_discrete(labels = c('Male', 'Female'))+
ggtitle("Exc. Romantic Partners") + # title the graph
theme(panel.background = element_rect(fill = "transparent"), # change the background to be transparent
plot.title = element_text(hjust = 0.5), # centre the title
axis.line.x.bottom = element_line(colour = "black"), # add in a black x-axis
axis.line.y.left = element_line(colour = "black"), # add in a black y-axis
plot.margin = margin(0, 0, 0, 0, 'cm')) + # remove the plot margins
scale_fill_manual(values = c("#A3B5C4", "#D2AEB1"), # manually change the colour using Hex
name = "Target Sex", # label the legend
labels = c("Male", "Female")) # change variable names in the legend
HH_incRP_graph1 + HH_excRP_graph1 + # merge the plots
guide_area() + plot_layout(guides = 'collect') # group the legends together in the bottom-right
When comparing the pattern of results above, it is interesting that male participants still rate female targets as significantly higher in honesty-humility than male targets, even when the target is not a romantic partner. Conversely, for female participants, when the target is not a romantic partner, there is no significant difference in honesty-humility ratings for male vs. female targets. The change in the pattern of results for females was particularly interesting to me, as it became clear that the romantic partner condition was driving the higher honesty-humility ratings for opposite-sex targets, but only for female participants. As such, it seems that male participants rate female targets as higher in honesty-humility regardless of whether they are in a romantic relationship with the target.
As such, this answered my question, suggesting that target sex does influence honesty-humility ratings for both male and female participants. Additionally, in females this effect is driven by the target being a romantic partner, while for males it is independent of whether the target is a romantic partner.
All in all, the exploration process was the part I personally found most interesting and rewarding, as I could look at the variables I was interested in that the authors had not published any findings on. Furthermore, I could utilise a lot of the knowledge our group had learnt throughout the term, which made producing the plots much easier and much less frustrating!
The most significant barrier to reproducibility for us was the lack of explanation of the authors' code. There were many times throughout the journey where we looked to the authors' code, either to try and understand what they were doing, or to determine whether their reported statistics were correct. However, the authors only included a few lines of comments in their code. Moreover, the majority of these comments were general labels describing what a chunk was doing, such as 'Variable Computing', as opposed to explaining what each line of code was actually computing.
The lack of explanation was particularly important when we were trying to reproduce the exclusion criteria, as the code was particularly complicated. Initially, we attempted to use and understand the authors' code; however, after approximately 10 lines, we were unable to follow it, despite running one line at a time. The lack of comments was a major factor in our group deciding to write our own code, so that we could be confident it was doing what we wanted. However, when we found discrepancies between the dataset our code produced and the authors' final dataset, we could not be sure which results were correct, as we were still unable to understand theirs. If they had provided better explanations, it would have allowed us to compare and contrast the code and determine which was correct.
This highlights how important it is for authors to comment and explain their code, so that anyone reading it can understand it, ensure it is valid, and reproduce it themselves. The importance of open code is highlighted in this article written by Lindsay Morton, a senior manager at the Public Library of Science (PLOS). Tybur et al. are definitely not the only authors who did not provide adequate explanation of their code, which highlights how better practices need to be implemented and upheld within the field of psychological research. For authors looking to improve their comments, this article written by Jef Raskin, called 'Comments are More Important Than Code', explores how commenting can be used to clarify code and help other researchers and laymen understand it.
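To illustrate, compare a bare chunk label with a line-level comment on the same snippet (a made-up example, not the authors' actual code):
# Variable Computing
data <- data %>% mutate(average_cc = rowMeans(select(., comf1:comf10) - 4))
# versus: compute each participant's average contact comfort across the ten comfort items,
# subtracting 4 to put the scores back on the -3 to 3 scale used in the figures
data <- data %>% mutate(average_cc = rowMeans(select(., comf1:comf10) - 4))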
Another significant issue that we encountered during our reproducibility journey was the lack of consistent variable names.
As mentioned throughout the report, equivalent variables across the three studies were labelled differently, leading to significant confusion. This meant that more than once we were using the wrong variable, and we only realised when different results were produced. If the authors had used more consistent variable names, it would improve the reproducibility of their research.
Furthermore, it is important that variable names are logical. Throughout our analysis, we often found variable names that were illogical, leading to further confusion. For example, in Study 1, there is a variable called part_sex. It is not clear whether this refers to participant sex or partner (target) sex. When conducting my exploratory analysis, I encountered this issue: when viewing the dataset, I first noticed this variable and assumed it referred to participant sex, leading to an analysis that was incorrect.
Not only does inconsistent labeling of variables lead to confusion, but consistent labeling would also make the process of coding much more succinct. If the same variable across studies was named the same thing, the same chunk of code could be used for each study, allowing for much simpler and shorter code. However, as the variable names changed, we had to copy the code and change each instance of the variable, often missing one, which was time-consuming and frustrating.
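For example, harmonising the names once at the start would have let a single chunk of code run on every study; a minimal sketch using variable names from the two studies:
data_excluded_1 <- data_excluded_1 %>%
rename(relate_length = part_leng, # relationship length, as named in Study 2
target_category = relationship_category) # relationship type, as named in Study 2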
As such, the authors could have reduced barriers to reproducibility by using consistent variable names across studies and ensuring they were logical, so as not to introduce confusion. For authors looking to improve their identification, labeling, and definition of variables, this resource, written by Amanda J. Rockinson-Szapkiw, a professor at the University of Memphis, explores the different types of variables and how they can be appropriately and logically named to allow for sufficient transfer of meaning.
Including a logical and comprehensive codebook is extremely important for reproducibility. This article, written by Daniel Lakens, a scientist at the Eindhoven University of Technology in the Netherlands, illustrates how important a codebook is to computational reproducibility.
Although the authors did provide a codebook, we found that it was not sufficient in explaining the variables and how they were calculated. This was particularly relevant when trying to understand the WTR task. When we became confused by the data and the authors' code, we looked to the codebook for an explanation; however, it did not provide any more detail on how these scores were calculated. In fact, it introduced more confusion, as it stated that a -0.45 score indicated 'Other', while in their code the authors changed a -0.45 to a 0, indicating 'Self' (as explained in the above report). This is problematic, as it meant that we were unsure whether the authors' code was incorrect, which has huge implications for the reliability of their results, or whether the codebook was simply incorrect. This highlights how important it is to check that the codebook is correct and lines up with the study, the variables, and the code.
Furthermore, the codebook did not provide details of all variables, or explanations of them. For example, the data recorded the allocated sex of 'Alex', the stranger; however, the codebook did not indicate which number indicated male or female. This meant that when doing exploratory analysis, I was unable to look at the interaction between participant sex and target sex for this study. More importantly, it meant that if we attempted to reproduce the graphs in the supplementary materials, we would be unable to, as they rely on this variable.
Thus, when providing data and code openly, it is extremely important that authors also provide a codebook. In this article written by Daniel Lakens et al. (2020), the authors note the components that are necessary for an effective codebook, as well as providing links to codebook guides and examples of good codebooks. These resources and examples would be extremely helpful for any author aiming to improve the reproducibility of their work.
Ensuring that code and statistics reproduce after final revisions is extremely important. Throughout the process of peer-review, researchers often will slightly alter their code, or even their dataset, which will alter the results they produce. However, sometimes the authors will write their results and statistics into the body of the article, and not update these after code or dataset changes are made.
Although it is not certain that the authors failed to do this, it would explain some of the inconsistencies between our results and theirs, especially for the descriptive statistics. After reproducing the exclusion criteria, our descriptive statistics were very similar, but not identical, to those published, despite using the same dataset and code that we were confident was working correctly. This led me to believe that the authors may have written these in prior to changes, and forgotten to update them. Furthermore, the authors note in the article that after exclusions there are 464 participants in Study 1; however, the authors' final dataset in their code has 467. This is another reason why authors should be especially diligent when checking that what they publish in their article is consistent with their final analysis and code.
Daniel Lakens comments on the importance of cross-checking results prior to publishing in this article, noting that although it may be arduous, it is extremely important to the reliability of your results.
Although much less significant than the prior recommendations, Tybur et al. could have improved the ease of reproducibility by providing the code they used to create the plots. As they provided no code, it was very time-consuming for our group to reproduce these. The majority of this work was merely aesthetic, so it is not as important as the previously mentioned recommendations, but it would nonetheless be useful for those reproducing the findings.
Thank you for reading my report!