“The Emotional Path to Action: Empathy Promotes Physical Distancing and Wearing of Face Masks During the COVID-19 Pandemic” by Pfattheicher et al. (2020) is the first study of its kind to explore the relationship between empathy and adherence to health measures such as physical distancing and wearing of face masks in a COVID-19 specific context. COVID-19 as “an epidemic crisis” (Pfattheicher et al., 2020) necessitates rapid alterations in one’s core social habits in order to manage the uncontrolled growth in infections and reduce strain on the health care system. However, research examining the factors which can promote these changes in a COVID-19 pandemic environment is limited due to the contemporary nature of the issue. Thus, Pfattheicher et al.’s (2020) research in this area addresses an important gap in the literature.
Methods
The experimenters conducted 4 studies to address their research question.
Study 1 recruited 3 non-representative samples from 3 Western countries (USA, UK and Germany) to investigate the correlation between participants’ self-reported affective empathy for those most vulnerable to the virus and their self-reported adherence to physical distancing. To quantify empathy, participants were asked to respond to 3 prompts such as “I feel compassion for those most vulnerable to coronavirus (COVID-19)” on a 5-point Likert scale with 1 being “strongly disagree” and 5 being “strongly agree”. Physical distancing was also measured through responses on a similar 5-point Likert scale to the following question: “Because of coronavirus (COVID-19), I am massively curtailing social contact (so-called ‘social distancing’)”.
Study 2 was a correlational design similar to Study 1, except that it was conducted in a single German sample and participants’ adherence to physical distancing was measured in more concrete terms, i.e., rather than asking them about “curtailing social contact” in general, items referred to specific circumstances such as visiting the elderly and visiting places where other people would be, such as cafes and church. The likelihood of visiting these places was self-reported on a 5-point Likert scale with 1 being “very unlikely” and 5 being “very likely”. Responses were later recoded such that higher numbers represented greater motivation to adhere to physical distancing.
Study 3 tested whether manipulations aimed at promoting empathy can increase motivation to physically distance in a German sample. There were 3 conditions: a control condition, an information-only condition, and an information + empathy condition.
Adherence to physical distancing was measured using the same items as in Study 2.
Finally, to rule out the alternative explanation that affective empathy increases motivation to physically distance by way of increasing one’s sensitivity to one’s own vulnerability, Study 4 measured motivation to wear a face mask instead of motivation to physically distance, given that wearing masks (specifically non-surgical-grade masks) has a greater protective effect for others than for the wearer. There were 3 conditions: a control condition, an information-only condition, and an empathy condition.
As for measures: the study assessed empathy, subjective vulnerability, and motivation to wear a face mask.
Results
Although the paper has no explicitly stated hypotheses, it does note that affective empathy has previously been found to promote altruism and caring within a healthcare context. Thus, it appears Pfattheicher et al.’s (2020) stance is that high levels of empathy should increase prosocial behaviour in the context of the pandemic – i.e., increased physical distancing and wearing of face masks.
This aligns with all 4 studies’ findings: self-reported empathy was positively associated with physical distancing (Studies 1 and 2), and inducing empathy increased motivation to physically distance (Study 3) and to wear a face mask (Study 4).
Implications
This paper adds to the growing literature on how the COVID-19 pandemic has impacted our perceptions of health and our personal roles in preventing the spread of disease. It also establishes a positive correlation between empathy and adherence to COVID-19 measures – importantly, that interventions appealing to an individual’s empathy can increase motivation to engage in the prosocial protection of others in a pandemic environment.
Given that increasing adherence to COVID-19 health measures is of great interest to politicians, this paper has important real-world implications. While more studies will need to be conducted in more representative and diverse samples to support these initial findings, this paper provides some impetus for politicians to appeal to affective empathy when promoting adherence to physical distancing and wearing of face masks.
When I was reading, I was excited to learn that… empathy was consistently associated with adherence to COVID-19 health measures. Across all 4 studies, the findings relating empathy to motivation to physically distance or to wear a face mask were significant. This indicates to me that the relationship is somewhat robust, and it is groundbreaking because little is known about what can influence health-related behaviours specifically in a COVID-19 context. This remains a highly topical issue even though the official policy is to continue business as usual and “live with COVID”: the more people who continue to engage in these health practices, the better we can protect each other and reduce strain on the healthcare system. As such, it is important that we identify effective ways to encourage altruistic health behaviours so that we can continue on our path of “living with COVID” in a sustainable manner. I would be excited to see more research verifying these findings, as it could have promising real-world implications.
The most interesting part of this paper was… that while subjective vulnerability increased in both the empathy condition and the information-only condition, this did not significantly moderate the effect of empathy on one’s motivation to wear a face mask. Personally, I find myself wearing face masks to protect myself from illness and to avoid the inconvenience of having to isolate - my initial thought isn’t to wear a face mask to protect others around me just in case I’m carrying the virus. So it is interesting that the results run somewhat counter to my personal experience. I also wonder specifically how appeals to empathy work to encourage face-mask wearing - if it isn’t our sensitivity to our own vulnerability, nor to others’ vulnerability, what does empathy increase our sensitivity to? And how does this relate back to our motivation to wear a face mask? I’m also curious about the relationship between subjective vulnerability and objective vulnerability. Is there any correlation between the two? Does either of them significantly predict one’s motivation to wear a face mask? This measure really interests me because I love learning about how our perception and experience of the environment moderate our behaviour regardless of the realities of that environment.
I wonder whether… having self-report measures that were quite transparent about the construct they were getting at could enhance demand characteristics. Could participants have wanted to appear as empathetic as possible? How valid are these findings when, as the experimenters raised in the discussion, other experiments with similar manipulations yielded a null effect? By asking participants about their explicit opinions on mask-wearing/physical distancing in an online testing environment, the experimenters can’t really operationalise real-world behaviour effectively. Furthermore, it is interesting to consider what real-world implications this may have, since these measurements and manipulations occurred over a short period of time whilst mask wearing is most effective as a consistent habit. I would love to see future research investigate how this study’s findings might meaningfully be applied in the real world, i.e., how might this empathy manipulation translate into the real world? As an advertisement on the street, on TV, or perhaps in high-risk settings? It would definitely be interesting to see these kinds of experiments being run in the field.
Demographic statistics
Study 1: USA sample: n = 322, mean age = 33.3 years (SD = 13.0), 45.7% female; German sample: n = 326, mean age = 29.4 years (SD = 9.3), 46.6% female; UK sample: n = 317, mean age = 38.1 years (SD = 12.2), 59.3% female.
Study 2: n = 359, mean age = 29.75 years (SD = 9.40), 48.5% female.
Study 3: n = 868, mean age = 35.09 years (SD = 12.44), 43.8% female.
Study 4: n = 1526, mean age = 34.71 years (SD = 12.09), 42.7% female.
Means/SD
Study 1: Empathy: USA M = 4.46, SD = 0.74; Germany M = 4.02, SD = 0.93; UK M = 4.56, SD = 0.61. Physical distancing: USA M = 4.30, SD = 0.99; Germany M = 4.04, SD = 1.11; UK M = 4.12, SD = 1.01.
Study 2: Empathy M = 4.05, SD = 0.94; physical distancing motivation M = 4.56, SD = 0.65.
Study 3: Physical distancing motivation: control M = 4.30, SD = 0.76; information + empathy M = 4.51, SD = 0.66; information only M = 4.39, SD = 0.74.
Study 4: Empathy: control M = 2.10, SD = 1.01; information only M = 2.14, SD = 1.00; empathy M = 4.03, SD = 0.90. Motivation to wear a face mask: control M = 3.69, SD = 1.24; information only M = 3.83, SD = 1.20; empathy M = 4.00, SD = 1.12.
Figures
Here we installed and loaded all the packages necessary to successfully meet our verification goals. Because we were starting from a blank slate, our group had to first install all the relevant packages using the install.packages() function. Some were installed and loaded from the outset, while others (like the ggpubr package) were recruited as we were problem-solving how to accurately reproduce particular figures.
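For reference, the one-off installation step looked roughly like the sketch below (the package names are simply those loaded in this report; install.packages() only needs to be run once per machine, whereas library() is run every session).
# one-off installation of the packages used throughout this report
install.packages(c("tidyverse", "readxl", "here", "ggpubr"))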
library(tidyverse) # for most of the functions you'll need
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6 ✔ purrr 0.3.4
## ✔ tibble 3.1.7 ✔ dplyr 1.0.9
## ✔ tidyr 1.2.0 ✔ stringr 1.4.0
## ✔ readr 2.1.2 ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
library(readxl) # to read excel files into RStudio
library(here) # to specify the file path, organise files appropriately and work collaboratively through OneDrive
## here() starts at /Users/jennywang/Desktop/University/Courses/2022/Term 2/PSYC3361/Assessments/RStudio files
library(ggplot2) # for most of the functions needed to reproduce the figures
library(ggpubr) # for additional customisation of figures (explained in further detail later)
Initially, we struggled to understand the utility of the here() function and didn’t know how we should implement it to improve our verification report workflow. However, after some discussion with Anita (our tutor) and consulting Jenny Richmond’s blog (http://jenrichmond.rbind.io/post/how-to-use-the-here-package/), we figured out that the here package can be used to facilitate file organisation, building file paths relative to the project folder so that you can reliably read in the datasets you need. It turned out to be quite useful when organising the shared OneDrive so that it didn’t get too messy.
I ran the here() function by itself to verify where my file path started.
here()
## [1] "/Users/jennywang/Desktop/University/Courses/2022/Term 2/PSYC3361/Assessments/RStudio files"
We began with Study 3 as it looked like one of the simplest to reproduce out of the 4 studies presented in our paper (it had a single sample, unlike Study 1; Study 2 was related to Study 1, and Study 4 was related to Study 3).
The relevant data was saved as “Study3” in the “data&syntax” folder on the OSF. There were three formats available (sps, sav and xlsx). Wanting to limit the number of unknowns in an area already unfamiliar to us, we decided to read and interpret the xlsx file.
Study3 <- read_excel(here("data&syntax", "Study3.xlsx"))
Then, I used the glimpse() function from the pillar package (re-exported by dplyr) to gain a brief understanding of what variables were recorded in Study 3. This function makes it possible to see every column in the data frame which is helpful for our purposes.
glimpse(Study3)
## Rows: 868
## Columns: 17
## $ Q22_1 <dbl> 1, 1, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1…
## $ Q22_2 <dbl> 1, 1, 2, 1, 1, 1, 1, 4, 1, 1, 1, 4, 1, 5, 1, 4, 2, 4, 1…
## $ Q22_3 <dbl> 1, 1, 1, 1, 1, 1, 1, 2, 1, 5, 1, 5, 1, 1, 1, 3, 1, 1, 1…
## $ Q22_4 <dbl> 1, 1, 1, 1, 1, 1, 1, 3, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 1…
## $ Q22_5 <dbl> 2, 1, 1, 1, 1, 1, 2, 4, 4, 1, 3, 1, 1, 1, 2, 2, 1, 1, 1…
## $ Q22_6 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q20 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 32, 57, 63, 50, 52, 44, 47, 24, 43, 57, 31, 50, 19, 53,…
## $ Gender <dbl> 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 2, 1…
## $ Household_size <dbl> 1, 1, 2, 2, 1, 3, 3, 3, 3, 2, 3, 3, 2, 3, 1, 4, 1, 1, 2…
## $ Q28 <chr> "Toast", "Bratkartoffeln", "Hühnchen", "Ich esse mittag…
## $ bed <dbl> 2, 2, 0, 0, 2, 2, 0, 0, 0, 2, 0, 1, 2, 1, 1, 2, 2, 0, 0…
## $ sd <dbl> 1.2, 1.0, 1.2, 1.0, 1.0, 1.0, 1.2, 3.4, 1.6, 1.8, 1.4, …
## $ sdR <dbl> 4.8, 5.0, 4.8, 5.0, 5.0, 5.0, 4.8, 2.6, 4.4, 4.2, 4.6, …
## $ sd01 <dbl> 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1…
## $ d1 <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0…
## $ d2 <dbl> 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 1, 1, 0, 0…
This glimpse caused considerable confusion among our group. The variable labels made little intuitive sense and we were unsure why the variable labelled Q28 had various food items listed underneath it. We decided to investigate this further by exploring the other file formats as well as the supplementary materials included in the paper’s OSF.
The supplementary materials listed “bed” as corresponding with the condition participants were assigned to for Study 4. We verified that this was also the case for Study 3 by consulting its corresponding sav file (Study3.sav) through Jamovi.
Jamovi: Identifying conditions
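(As a side note, the same check could in principle be done without leaving R: the haven package reads sav files and keeps their value labels. We did the check in Jamovi, so the snippet below is just a sketch and assumes haven is installed.)
library(haven) # haven reads SPSS .sav files and retains their value labels
Study3_sav <- read_sav(here("data&syntax", "Study3.sav"))
attr(Study3_sav$bed, "labels") # shows which numeric code corresponds to which condition label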
To limit later confusion, I renamed the variable as “condition” using the rename() function from the dplyr package and assigned this new dataset to “Study3_Rename” to avoid overwriting the original dataset, in case we needed it later on. The pipe operator (%>%) strings functions together such that the rename() function acts upon the “Study3” object.
Study3_Rename <- Study3 %>%
rename(condition = "bed") # using the rename() function to rename "bed" to "condition"
By consulting the sav file in Jamovi we discovered the following: in the “condition” column, “0” corresponded to the control condition, “1” to the information + empathy condition, and “2” to the information-only condition.
However, it was still not absolutely clear which variable was used to operationalise physical distancing motivation. Our initial idea was to compute a new variable that was the average of all the questions asked as part of Q22 (except Q22_6, which served as an attention check).
Replicating means: Attempt One
The reported means and standard deviations for motivation to physically distance in Study 3 were: control M = 4.30, SD = 0.76; information + empathy M = 4.51, SD = 0.66; information only M = 4.39, SD = 0.74.
To do this:
Study3_RenameM <- Study3_Rename %>%
mutate(physdist_motiv = (Q22_1 + Q22_2 + Q22_3 + Q22_4 + Q22_5)/5) # creating a new variable for physical distancing motivation called "physdist_motiv" which is calculated by averaging participants' responses to the survey questions
Study3_MControl <- Study3_RenameM %>%
filter(condition == "0") %>% # generates the descriptives for the control condition only
summarise(controlM = mean(physdist_motiv), # using the mean() function to calculate the mean
controlsd = sd (physdist_motiv)) # using the sd() function to calculate the standard deviation
Study3_MInfEmp <- Study3_RenameM %>%
filter(condition == "1") %>% # generates the descriptives for the information + empathy condition only
summarise(infempM = mean (physdist_motiv),
infempsd = sd(physdist_motiv))
Study3_MInf <- Study3_RenameM %>%
filter(condition == "2") %>% # generates the descriptives for the information only condition only
summarise(infM = mean(physdist_motiv),
infsd = sd(physdist_motiv))
print(Study3_MControl)
## # A tibble: 1 × 2
## controlM controlsd
## <dbl> <dbl>
## 1 1.70 0.760
print(Study3_MInfEmp)
## # A tibble: 1 × 2
## infempM infempsd
## <dbl> <dbl>
## 1 1.49 0.657
print(Study3_MInf)
## # A tibble: 1 × 2
## infM infsd
## <dbl> <dbl>
## 1 1.61 0.739
The means did not match. But I also realised that the code was very inefficient. It would be easier to compare means if all the means were in one table. We looked through our notes from Danielle’s modules again, found the group_by() function from the dplyr package and tried it out to see if we could use it later in our coding journey.
Study3_M1 <- Study3_RenameM %>%
group_by(condition) %>% # group_by() function uses "condition" as the organising variable
summarise(mean_resp = mean(physdist_motiv),
sd_resp = sd(physdist_motiv)) %>%
ungroup() # it's good practice to ungroup() the variables so that we can regroup them later if need be
print(Study3_M1)
## # A tibble: 3 × 3
## condition mean_resp sd_resp
## <dbl> <dbl> <dbl>
## 1 0 1.70 0.760
## 2 1 1.49 0.657
## 3 2 1.61 0.739
A minor triumph! However, matching the means was still an issue.
We decided to adopt a trial and error approach and test the means for a number of variables in the dataset.
Replicating means: Attempt Two
Study3_M2 <- Study3_Rename %>%
group_by(condition) %>%
summarise (mean_resp = mean(sd),
sd_resp = sd(sd)) %>% # this generated a table identical to the one above; this means that "sd" is the average of participants' responses to each of the questions
ungroup() # "sd" is not the variable for motivation to physically distance
Study3_M2 <- Study3_Rename %>%
group_by(condition) %>%
summarise (mean_resp = mean(sdR),
sd_resp = sd(sdR)) %>% # the descriptive statistics in this table match the reported ones!
ungroup()
print(Study3_M2)
## # A tibble: 3 × 3
## condition mean_resp sd_resp
## <dbl> <dbl> <dbl>
## 1 0 4.30 0.760
## 2 1 4.51 0.657
## 3 2 4.39 0.739
After some trial and error, we realised that “sdR” was the variable that the experimenters had used to operationalise physical distancing motivation. I revisited the glimpse() output from above and realised that we could have recognised this earlier, given that the numbers listed under the variable name “sdR” were closest to the reported means.
It appears that the experimenters calculated this variable by subtracting “sd” from 6 which would allow them to invert the scale such that 5 represented the greatest motivation to physically distance and 1 represented the lowest motivation to physically distance.
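As a quick sanity check of this reverse-scoring idea, the sketch below recomputes 6 minus “sd” and compares it to “sdR” for every participant (near() from dplyr allows for floating-point rounding; we did not need this to proceed, but it is a simple way to confirm the hunch).
Study3_Rename %>%
  mutate(sdR_check = 6 - sd) %>% # if "sdR" really is the reverse-scored "sd", 6 - sd should reproduce it
  summarise(all_match = all(near(sdR_check, sdR))) # TRUE would confirm the reverse-scoring explanation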
To reduce later confusion and improve the clarity of subsequent code, we renamed the variable titled “sdR” to “physdist_motiv” using the rename() function in the dplyr package.
Study3_Rename <- Study3_Rename %>%
rename(physdist_motiv = "sdR")
Now that we had replicated the descriptive statistics, we moved onto replicating the demographic statistics (i.e., the number of participants, the age means and standard deviations as well as the gender distribution).
The number of participants
First, the number of participants. I used the count() function from the dplyr package to verify the number of participants in Study 3.
The paper reported 868 participants.
count(Study3)
## # A tibble: 1 × 1
## n
## <int>
## 1 868
Nice.
The age demographics
Now onto the age demographics. I used the summarise() function from the dplyr package to verify these numbers.
The paper reported the mean age as 35.09 years with the standard deviation being 12.44.
Study3_Age <- Study3_Rename %>%
summarise(mean_age = mean(Age),
sd_age = sd(Age))
print(Study3_Age)
## # A tibble: 1 × 2
## mean_age sd_age
## <dbl> <dbl>
## 1 35.1 12.4
Very nice. Going suspiciously smoothly so far…
The gender distribution
The paper reported the gender distribution as being 43.8% female.
table(Study3_Rename$Gender)
##
## 1 2
## 484 380
However, which gender did “1” and “2” correspond with? The information in Jamovi was written in German (see the image below) but, knowing a bit of German myself, I recognised that “1” was male and “2” was female. There was also a level coded as “3” which was labelled “Keine Angabe”. I had no idea what this meant so I popped it into Google Translate - it turns out it means “no information”.
Jamovi: Variables labelled in German
380/868 #trying to replicate the percentage female distribution through simple mathematics
## [1] 0.437788
This matches the reported gender distribution!
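As a side note, the same proportion can be computed straight from the data rather than typing the counts in by hand; a small sketch (assuming, as identified in Jamovi, that “2” codes female):
Study3_Rename %>%
  summarise(prop_female = sum(Gender == 2, na.rm = TRUE) / n()) # number of females divided by the full sample size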
Attempt One
Initially we tried to plot the summary table “Study3_M2”.
ggplot(data = Study3_M2, mapping = aes(x = condition, y = mean_resp)) +
geom_violin()
This does not look right at all. The condition variable needed to be a factor so that a separate violin was drawn for each condition (it was currently being treated as continuous).
ggplot(data = Study3_M2, mapping = aes(x = factor(condition), y = mean_resp)) +
geom_violin()
## Warning: Groups with fewer than two data points have been dropped.
## Groups with fewer than two data points have been dropped.
## Groups with fewer than two data points have been dropped.
## Warning in max(data$density): no non-missing arguments to max; returning -Inf
## Warning: Computation failed in `stat_ydensity()`:
## replacement has 1 row, data has 0
This returned some warnings. Perhaps this is because there is only one number per “factor”, which cannot be used to generate a violin plot. Upon further thought, we realised that by only plotting the means of physical distancing motivation, we would also lose the whiskers in the box plot shown in the paper. We needed to try an alternative method.
Attempt Two
We realised we needed to be graphing using the whole dataset (i.e., “Study3_Rename”), not just the summary table in order to capture the whole graph.
ggplot(data = Study3_Rename, mapping = aes(x = factor(condition), y = physdist_motiv))+
geom_violin()
This looks more correct!
However, I noticed that the condition labels were numeric on the generated figure and the one presented in the paper had the condition specified non-numerically. Thus, I wanted to create a new variable called “conditionlabel” that would non-numerically describe the condition (i.e., as “Control”, “Information + Empathy” and “Information Only”). After some Googling, I discovered how to do this via this website: https://www.statology.org/conditional-mutating-r/.
Accordingly, I used the case_when() function from the dplyr package such that a specified label (e.g. “Control”) corresponded with a specified number in the “condition” column (e.g. “0”).
Study3Conditions <- Study3_Rename %>%
mutate (conditionlabel = case_when(condition == 0 ~ "Control",
condition == 1 ~ "Information + Empathy",
condition == 2 ~ "Information Only"))
Furthermore, Figure 2 had a box plot overlaid on top of the violin plot. We decided to try geom_boxplot() from the ggplot2 package to see if it had all the features we needed (spoiler: fortunately, it did).
We also spent some time perfecting the aesthetics of the plot, often using Google as a tool, as you will see in the code below.
Figure2 <- ggplot(data = Study3Conditions, mapping = aes(x = reorder(conditionlabel, physdist_motiv), y = physdist_motiv)) +
geom_violin(aes(fill = conditionlabel)) +
geom_boxplot(width = 0.1) # this replicated the box plot overlaid on top of the violin plot; the width argument was used to adjust the size of the box plot to match the one presented in the paper
print(Figure2) # I printed the plot first to see if the basic graph was the right shape before I adjusted the aesthetics; it was! yay!
# Superficial Aesthetics #
Figure2 +
theme_bw()+ # from ggplot2 pckg: this gets rid of the grey background in a default ggplot graph
theme(panel.grid.major = element_blank(), # from ggplot2 pckg: this gets rid of the horizontal grid lines at each of the y-axis numbers
panel.grid.minor = element_blank(), # this gets rid of the horizontal grid lines halfway between the y-axis numbers
legend.position = "none") + # this removes the legend
scale_fill_manual(values = c("#A7CFE5", "#B3D88B", "#1779B6"))+ # Google Chrome's "ColourPick Eyedropper" extension tool was used to determine the exact hexadecimal codes for the colours used in the paper
scale_y_continuous(name = "Physical Distancing Motivation") + # from ggplot2 pckg: this specifies the y-axis label, given that it is a continuous variable
scale_x_discrete(name = NULL) + # from ggplot2 pckg: this removes the x-axis label, given that it is a discrete variable
stat_summary(fun = "mean", shape = 9) # from ggplot2 pckg: this plots the mean onto the graph as a specified shape ("9") which was discovered on this website: https://www.oreilly.com/library/view/r-data-visualization/9781788398312/74f79f51-f6ee-45e8-9f7c-8df78c825cf6.xhtml
## Warning: Removed 3 rows containing missing values (geom_segment).
Success! We’ve completed our verification goals for Study 3! We felt that it would be faster if we divided and conquered for the subsequent studies since all four experiments were largely similar. Study 1 was assigned to myself and Daphne, Study 2 to Kopal, and Study 4 was assigned to Rukshali and Jasper: we were to attempt the verification goals for our assigned studies and regroup in a week’s time, when we would present our efforts so that we were all on the same page.
This study had 3 samples from different countries and thus, there were 3 datasets that we needed to work with.
Study1USA <- read_excel(here("data&syntax", "Study1USA.xlsx"))
Study1Ger <- read_excel(here("data&syntax","Study1Germany.xlsx"))
Study1UK <- read_excel(here("data&syntax","Study1UK.xlsx"))
glimpse(Study1USA)
## Rows: 322
## Columns: 27
## $ Q22_1 <dbl> 4, 4, 5, 5, 5, 2, 5, 1, 5, 4, 5, 5, 4, 4, 5, 4, 5, 5, 4…
## $ Q22_2 <dbl> 4, 3, 5, 5, 4, 2, 5, 5, 5, 5, 5, 4, 4, 3, 5, 3, 4, 4, 2…
## $ Q22_3 <dbl> 4, 4, 5, 5, 5, 2, 5, 3, 5, 4, 5, 5, 4, 4, 5, 5, 5, 5, 4…
## $ Q22_4 <dbl> 1, 1, 4, 4, 3, 2, 2, 1, 4, 3, 5, 2, 4, 1, 5, 1, 4, 4, 1…
## $ Q22_5 <dbl> 4, 3, 5, 4, 4, 2, 5, 1, 5, 4, 5, 3, 4, 3, 5, 5, 5, 5, 4…
## $ Q22_6 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q22_7 <dbl> 5, 2, 4, 4, 4, 2, 2, 4, 5, 4, 4, 4, 3, 3, 5, 5, 4, 3, 4…
## $ Q24_1 <dbl> 4, 4, 5, 4, 5, 2, 4, 5, 5, 4, 4, 4, 5, 2, 5, 4, 4, 5, 4…
## $ Q24_2 <dbl> 4, 5, 5, 5, 5, 2, 5, 5, 5, 4, 5, 4, 5, 3, 5, 5, 5, 5, 4…
## $ Q24_3 <dbl> 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q24_4 <dbl> 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q25_1 <dbl> 5, 5, 5, 5, 5, 2, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q25_2 <dbl> 5, 5, 5, 5, 5, 3, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q25_3 <dbl> 5, 5, 5, 5, 4, 2, 4, 3, 5, 4, 5, 4, 5, 4, 5, 5, 5, 5, 4…
## $ Q25_4 <dbl> 5, 5, 5, 5, 4, 4, 5, 5, 5, 4, 5, 4, 5, 4, 5, 5, 5, 5, 4…
## $ Q32 <dbl> 3, 4, 1, 7, 1, 8, 3, 5, 2, 6, 3, 2, 5, 3, 5, 3, 1, 2, 4…
## $ Q20 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 20, 19, 22, 46, 32, 19, 49, 31, 51, 21, 18, 21, 25, 34,…
## $ Gender <dbl> 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, 2, 2…
## $ Household_size <dbl> 4, 4, 4, 3, 3, 3, 1, 5, 3, 3, 4, 3, 2, 4, 5, 1, 2, 1, 2…
## $ Education <dbl> 3, 2, 4, 5, 4, 2, 4, 4, 4, 3, 3, 3, 4, 4, 4, 4, 4, 5, 3…
## $ Income <dbl> 12, 3, 12, 6, 9, 1, 3, 7, 2, 6, 10, 8, 6, 8, 5, 2, 3, 5…
## $ ve <dbl> 4.000000, 3.666667, 5.000000, 4.666667, 4.666667, 2.000…
## $ sd <dbl> 4.0, 4.5, 5.0, 4.5, 5.0, 2.0, 4.5, 5.0, 5.0, 4.0, 4.5, …
## $ q <dbl> 5, 5, 5, 5, 5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ i <dbl> 5.0, 5.0, 5.0, 5.0, 5.0, 2.5, 5.0, 5.0, 5.0, 4.0, 5.0, …
## $ gov <dbl> 5.0, 5.0, 5.0, 5.0, 4.5, 3.0, 4.5, 4.0, 5.0, 4.5, 5.0, …
glimpse(Study1Ger)
## Rows: 326
## Columns: 29
## $ Q22_1 <dbl> 4, 5, 3, 2, 5, 4, 4, 4, 5, 2, 2, 4, 5, 5, 5, 4, 3, 4, 3…
## $ Q22_2 <dbl> 29, 32, 32, 32, 30, 32, 30, 29, 31, 29, 30, 30, 32, 31,…
## $ Q22_3 <dbl> 4, 5, 4, 3, 3, 3, 5, 5, 5, 3, 3, 4, 2, 5, 4, 2, 3, 4, 3…
## $ Q22_4 <dbl> 29, 30, 28, 32, 29, 30, 30, 29, 29, 28, 28, 29, 29, 31,…
## $ Q22_5 <dbl> 4, 5, 3, 3, 5, 2, 4, 4, 5, 3, 2, 4, 5, 5, 5, 2, 4, 5, 3…
## $ Q22_6 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q22_7 <dbl> 29, 30, 32, 30, 32, 29, 29, 30, 31, 30, 32, 30, 29, 31,…
## $ Q77_1 <dbl> 5, 4, 3, 5, 5, 1, 3, 5, 5, 1, 4, 4, 5, 4, 3, 5, 3, 5, 5…
## $ Q77_2 <dbl> 4, 4, 4, 5, 5, 4, 3, 5, 5, 2, 4, 4, 5, 5, 5, 5, 4, 5, 5…
## $ Q77_3 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q77_4 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q78_1 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q78_2 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 5, 5, 5…
## $ Q78_3 <dbl> 4, 5, 4, 5, 5, 4, 4, 5, 5, 1, 4, 4, 5, 5, 4, 4, 4, 5, 4…
## $ Q78_4 <dbl> 4, 5, 4, 5, 5, 4, 4, 5, 5, 3, 4, 4, 5, 5, 4, 5, 4, 5, 5…
## $ Q20 <dbl> 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 18, 29, 18, 32, 37, 39, 20, 22, 39, 22, 32, 29, 26, 24,…
## $ Gender <dbl> 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 2…
## $ Household_size <dbl> 2, 1, 4, 2, 1, 1, 3, 4, 2, 2, 2, 3, 2, 1, 3, 2, 1, 2, 1…
## $ Q21 <chr> "5b3d0c7477cb3a00019c1bb3", "5d9739389e5b9a0018820f68",…
## $ Q28 <chr> "Pizza", "Pizza", "Schweinefleisch mit Pilzen", "Brot u…
## $ `filter_$` <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ ve <dbl> 4.000000, 5.000000, 3.333333, 2.666667, 4.333333, 3.000…
## $ sd <dbl> 4.5, 4.0, 3.5, 5.0, 5.0, 2.5, 3.0, 5.0, 5.0, 1.5, 4.0, …
## $ q <dbl> 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, …
## $ i <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5…
## $ Zve <dbl> -0.01877901, 1.06156624, -0.73900918, -1.45923934, 0.34…
## $ Zsd <dbl> -0.5627027, 0.8062606, -0.5627027, 0.8062606, 0.8062606…
## $ gov <dbl> 4.0, 5.0, 4.0, 5.0, 5.0, 4.0, 4.0, 5.0, 5.0, 2.0, 4.0, …
glimpse(Study1UK)
## Rows: 317
## Columns: 28
## $ Q41 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q42 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q22_1 <dbl> 4, 4, 5, 5, 5, 5, 5, 4, 5, 5, 4, 5, 4, 5, 4, 5, 5, 2, 5…
## $ Q22_2 <dbl> 2, 4, 4, 3, 4, 5, 5, 5, 4, 4, 4, 4, 5, 3, 5, 5, 4, 4, 4…
## $ Q22_3 <dbl> 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 4, 5, 5, 5, 5, 4, 4…
## $ Q22_4 <dbl> 1, 4, 4, 1, 2, 4, 3, 3, 2, 3, 4, 4, 5, 3, 2, 5, 2, 3, 2…
## $ Q22_5 <dbl> 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 4, 4, 4, 5, 4, 5, 4, 2, 3…
## $ Q22_6 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q22_7 <dbl> 3, 5, 3, 3, 4, 2, 5, 3, 4, 5, 3, 4, 4, 4, 3, 5, 4, 1, 4…
## $ Q24_1 <dbl> 3, 5, 5, 4, 3, 5, 4, 4, 4, 3, 4, 4, 3, 2, 2, 5, 5, 4, 5…
## $ Q24_2 <dbl> 3, 5, 5, 4, 4, 5, 4, 5, 4, 4, 4, 4, 5, 2, 3, 5, 5, 4, 5…
## $ Q24_3 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, 5, 5, 5…
## $ Q24_4 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 5, 5, 5…
## $ Q25_1 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, 5, 5, 5…
## $ Q25_2 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 5, 5, 5…
## $ Q25_3 <dbl> 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 5, 4, 5, 4, 3, 4, 5, 5, 5…
## $ Q25_4 <dbl> 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 5, 4, 5, 4, 4, 4, 5, 5, 5…
## $ Q20 <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2…
## $ Age <dbl> 24, 31, 31, 33, 34, 21, 49, 27, 33, 45, 39, 39, 54, 53,…
## $ Gender <dbl> 1, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, 2, 2…
## $ Household_size <dbl> 7, 4, 2, 3, 4, 2, 4, 3, 3, 4, 4, 2, 2, 3, 1, 2, 4, 2, 3…
## $ Education <dbl> 4, 3, 5, 4, 4, 3, 2, 4, 4, 3, 4, 4, 3, 2, 5, 5, 3, 4, 3…
## $ Income <dbl> 5, 5, 11, 5, 2, 2, 4, 4, 11, 4, 9, 3, 4, 5, 3, 3, 2, 5,…
## $ ve <dbl> 4.666667, 4.000000, 5.000000, 4.666667, 4.333333, 4.666…
## $ sd <dbl> 3.0, 5.0, 5.0, 4.0, 3.5, 5.0, 4.0, 4.5, 4.0, 3.5, 4.0, …
## $ q <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 5, 5, 5…
## $ i <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 5, 5, 5…
## $ gov <dbl> 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.5, 4.5, 4.5, 5.0, …
The glimpse() function did not tell us much. These datasets had no variable called “sdR” as was used in Study 3 to represent physical distancing motivation. It wasn’t entirely clear which variable corresponded to empathy scores either.
Firstly, I checked the supplementary materials to see if there was anything of note there.
Supplementary materials: Studies 1 and 2
This included the translated items from the survey but didn’t specify which variable name corresponded with which question. To locate this information, I opened each of the datasets in Jamovi. The information was there, but I was still unsure whether the variables used to operationalise “empathy” and “physical distancing motivation” were averages of all these questions (as was the case in Study 3) or could be quantified through a single test item.
A trial and error approach was needed.
Finding empathy
Let’s begin with finding the variable for empathy. We ran multiple variables through the same summarise function used for Study 3 to find what the experimenters used to operationalise “empathy”. This time, however, we didn’t need the group_by() function given that there were no conditions that participants were assigned to.
The reported descriptive statistics were M = 4.46 and SD = 0.74.
First, I ran singular questions through the summarise function:
Study1USA %>%
summarise (mean_emp = mean(Q22_1), #Q: I am very concerned about those most vulnerable to coronavirus COVID-19. The questions in this dataset were in English.
sd_emp = sd(Q22_1))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.49 0.836
This wasn’t right.
Study1USA %>%
summarise (mean_emp = mean(Q22_3), #Q: I feel compassion for those most vulnerable to coronavirus COVID-19.
sd_emp = sd(Q22_3))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.61 0.733
This wasn’t right either.
Study1USA %>%
summarise (mean_emp = mean(Q22_5), #Q: I am quite moved by what can happen to those most vulnerable to coronavirus COVID-19.
sd_emp = sd(Q22_5))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.28 0.926
Still incorrect.
However, I was sure I could find the variable in the dataset without having to calculate it myself since that was the case with Study 3. I decided to run other variables through the summarise function to see what it could be. Revisiting the glimpse() function from above, perhaps it could be any one of “ve”, “sd”, “q”, “i” or “gov” since the numbers looked most feasible given the reported mean.
Study1USA %>%
summarise (mean_emp = mean(ve),
sd_emp = sd(ve))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.46 0.743
This matched the reported means! I also wanted to quickly check it wasn’t just a coincidence by checking the means and standard deviations of the other variables I thought it could be.
Study1USA %>%
summarise (mean(sd), mean(q), mean(i), mean(gov),
sd(sd), sd(q), sd(i), sd(gov))
## # A tibble: 1 × 8
## `mean(sd)` `mean(q)` `mean(i)` `mean(gov)` `sd(sd)` `sd(q)` `sd(i)` `sd(gov)`
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 4.44 4.88 4.89 4.75 0.802 0.434 0.406 0.471
None of the other means and standard deviations matched the reported ones. “Ve” must be the variable used to operationalise “empathy” in Study 1!
I hypothesised that this was calculated by averaging participants’ responses to the 3 empathy-related questions in the survey. To verify this hypothesis, I used the mutate() function from the dplyr package to create a new variable that was the average of participants’ three responses to Q22_1, Q22_3 and Q22_5 to see if this matched the variable “ve”. This process was similar to the one taken for Study 3 earlier.
Study1USA_Emp <- Study1USA %>%
mutate(empathy = ((Q22_1 + Q22_3 + Q22_5)/3))
Study1USA_Emp %>%
summarise(mean_emp = mean(empathy),
sd_emp = sd(empathy))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.46 0.743
It’s the same (as suspected)!
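To be extra thorough, an element-wise comparison (rather than just matching the summary statistics) could confirm that the computed average reproduces “ve” for every single participant; a quick sketch using near() from dplyr to allow for rounding differences:
Study1USA_Emp %>%
  summarise(all_match = all(near(empathy, ve))) # TRUE would mean every row of "ve" equals our computed average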
Let’s see if “ve” is also the variable for empathy in the German and UK sample.
The reported means and standard deviations for the German and UK sample were M = 4.02, SD = 0.93 and M = 4.56, SD = 0.61 respectively.
Study1Ger %>%
summarise (mean_emp = mean(ve),
sd_emp = sd(ve))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.02 0.926
Study1UK %>%
summarise (mean_emp = mean(ve),
sd_emp = sd(ve))
## # A tibble: 1 × 2
## mean_emp sd_emp
## <dbl> <dbl>
## 1 4.56 0.612
They match! Perfect. We have now verified the means for “empathy” in Study 1.
We decided to use the rename() function from the dplyr package to rename the variables accordingly and minimise confusion later down the track.
Study1USA_Rename <-Study1USA %>%
rename(empathy = ve)
Study1Ger_Rename <- Study1Ger %>%
rename(empathy = ve)
Study1UK_Rename <-Study1UK %>%
rename(empathy = ve)
What about physical distancing motivation?
Let’s start by seeing whether the item “Because of coronavirus COVID-19, I am massively curtailing social contact (so-called “social distancing”)” is the correct variable, as per the supplementary materials.
The reported descriptive statistics for the USA sample were M = 4.30, SD = 0.99.
Study1USA %>%
summarise(mean_physdist = mean(Q24_1), #Q: Because of coronavirus COVID-19, I am massively curtailing social contact (so-called "social distancing").
sd_physdist = sd(Q24_1))
## # A tibble: 1 × 2
## mean_physdist sd_physdist
## <dbl> <dbl>
## 1 4.30 0.991
They match! Nice.
Now, to replicate this process for the German and UK samples. The UK sample was relatively simple to figure out since the questions were listed in English. However, when opening up “Study1Germany.sav” in Jamovi, all the questions were written in German.
Jamovi: Study1Germany.sav in need of translation
Google Translate was once again our best friend.
The reported descriptive statistics for the UK sample were M = 4.12, SD = 1.01 and for the German sample were M = 4.04, SD = 1.11.
Study1UK %>%
summarise(mean_physdist = mean(Q24_1), #Q: Because of coronavirus COVID-19, I am massively curtailing social contact (so-called "social distancing").
sd_physdist = sd(Q24_1))
## # A tibble: 1 × 2
## mean_physdist sd_physdist
## <dbl> <dbl>
## 1 4.12 1.01
Study1Ger %>%
summarise(mean_physdist = mean(Q77_1), #This Q translated to: Due to the coronavirus COVID-19, I have massively reduced my direct social contacts (so-called "social distancing").
sd_physdist = sd(Q77_1))
## # A tibble: 1 × 2
## mean_physdist sd_physdist
## <dbl> <dbl>
## 1 4.04 1.11
They also match! Triple nice. We have now verified the means for “physical distancing motivation” in Study 1.
Time to update the “Study1???_Rename” objects to ensure the corresponding question items are renamed to something more intuitive: “physdist_motiv”.
Study1USA_Rename <-Study1USA_Rename %>%
rename(physdist_motiv = Q24_1)
Study1Ger_Rename <- Study1Ger_Rename %>%
rename(physdist_motiv = Q77_1)
Study1UK_Rename <-Study1UK_Rename %>%
rename(physdist_motiv = Q24_1)
The number of participants
First, the number of participants in each sample. We used the same process used for Study 3 to reproduce these numbers.
The paper reported the following numbers of participants: 322 in the USA sample, 326 in the German sample, and 317 in the UK sample.
count(Study1USA_Rename)
## # A tibble: 1 × 1
## n
## <int>
## 1 322
count(Study1Ger_Rename)
## # A tibble: 1 × 1
## n
## <int>
## 1 326
count(Study1UK_Rename)
## # A tibble: 1 × 1
## n
## <int>
## 1 317
All of the above match perfectly.
The age demographics
Now onto the age demographics.
The paper reported the following age demographics (to one decimal place): USA sample M = 33.3 years, SD = 13.0; German sample M = 29.4 years, SD = 9.3; UK sample M = 38.1 years, SD = 12.2.
Study1USA_Age <- Study1USA_Rename %>%
summarise (mean_age = mean(Age),
sd_age = sd(Age))
print(Study1USA_Age)
## # A tibble: 1 × 2
## mean_age sd_age
## <dbl> <dbl>
## 1 33.3 13.0
Study1Ger_Age <- Study1Ger_Rename %>%
summarise (mean_age = mean(Age),
sd_age = sd(Age))
print(Study1Ger_Age)
## # A tibble: 1 × 2
## mean_age sd_age
## <dbl> <dbl>
## 1 29.4 9.31
Study1UK_Age <- Study1UK_Rename %>%
summarise (mean_age = mean(Age),
sd_age = sd(Age))
print(Study1UK_Age)
## # A tibble: 1 × 2
## mean_age sd_age
## <dbl> <dbl>
## 1 38.1 12.2
Perfect.
The gender distribution
Onto the gender distribution.
The paper reported the following gender distributions: 45.7% female in the USA sample, 46.6% female in the German sample, and 59.3% female in the UK sample.
#1 = Male; 2 = Female; 3 = No information (N.A)
table(Study1USA_Rename$Gender)
##
## 1 2 3
## 172 147 3
147/322
## [1] 0.4565217
table(Study1Ger_Rename$Gender)
##
## 1 2 3
## 172 152 2
152/326
## [1] 0.4662577
table(Study1UK_Rename$Gender)
##
## 1 2 3
## 128 188 1
188/317
## [1] 0.5930599
Smooth sailing.
Because Studies 1 and 2 are represented in a single figure, I thought it was appropriate to satisfy the verification goals for Study 2 first before attempting to reproduce Figure 1.
This process proved relatively straightforward given that there was only one sample to deal with.
Study2 <- read_excel(here("data&syntax", "Study2.xlsx"))
glimpse(Study2)
## Rows: 359
## Columns: 20
## $ Q36_1 <dbl> 5, 5, 5, 4, 2, 2, 4, 4, 5, 4, 3, 5, 4, 4, 5, 3, 5, 4, 5…
## $ Q36_2 <dbl> 3, 3, 5, 3, 3, 4, 5, 3, 4, 3, 5, 3, 4, 4, 5, 4, 3, 3, 5…
## $ Q36_3 <dbl> 5, 5, 5, 4, 2, 2, 4, 4, 5, 4, 3, 5, 4, 2, 5, 3, 5, 4, 5…
## $ Q36_4 <dbl> 3, 1, 5, 3, 2, 4, 2, 1, 2, 1, 5, 3, 4, 4, 5, 3, 3, 3, 4…
## $ Q36_5 <dbl> 5, 5, 5, 4, 2, 2, 5, 4, 5, 4, 3, 5, 4, 4, 5, 2, 5, 4, 5…
## $ Q36_6 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q36_7 <dbl> 3, 4, 3, 3, 2, 3, 3, 1, 4, 3, 4, 4, 2, 4, 3, 4, 2, 4, 4…
## $ Q22_1 <dbl> 1, 1, 2, 1, 4, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1…
## $ Q22_2 <dbl> 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 5, 1…
## $ Q22_3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q22_4 <dbl> 1, 1, 2, 1, 4, 1, 1, 1, 1, 5, 1, 1, 1, 1, 2, 1, 1, 2, 1…
## $ Q22_5 <dbl> 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q22_6 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q20 <dbl> 3, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 27, 36, 26, 32, 29, 32, 37, 23, 27, 32, 34, 28, 57, 44,…
## $ Gender <dbl> 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1…
## $ Household_size <dbl> 1, 4, 1, 1, 1, 2, 3, 3, 3, 4, 3, 1, 1, 3, 1, 5, 2, 2, 4…
## $ ve <dbl> 5.000000, 5.000000, 5.000000, 4.000000, 2.000000, 2.000…
## $ sd <dbl> 1.0, 1.0, 1.6, 1.0, 2.8, 1.0, 1.0, 1.4, 1.0, 1.8, 2.0, …
## $ sdR <dbl> 5.0, 5.0, 4.4, 5.0, 3.2, 5.0, 5.0, 4.6, 5.0, 4.2, 4.0, …
The glimpse revealed some variable names that we were quite familiar with by this point. Perhaps “ve” refers to empathy (as in Study 1) and “sdR” relates to the motivation to physically distance (as in Study 3). We tested these hypotheses by using the summarise() function from the dplyr package that we’ve been using thus far.
The paper reported the following descriptive statistics: empathy M = 4.05, SD = 0.94; physical distancing motivation M = 4.56, SD = 0.65.
Study2 %>%
summarise (mean_emp = mean(ve),
sd_emp = sd(ve),
mean_physdist = mean(sdR),
sd_physdist = sd(sdR))
## # A tibble: 1 × 4
## mean_emp sd_emp mean_physdist sd_physdist
## <dbl> <dbl> <dbl> <dbl>
## 1 4.05 0.941 4.56 0.648
Oh my! They match perfectly. We’re finally getting the hang of it. It feels very satisfying being able to identify the variables correctly first try. We were glad to know that we didn’t have to individually run all the items through trial and error.
As usual, we renamed the variables accordingly to minimise as much confusion as possible later down the track.
Study2_Rename <-Study2 %>%
rename(empathy = ve, physdist_motiv = sdR)
The number of participants
This was where we noticed a mismatch between the paper’s reported number of participants and the number that was specified in the supplementary materials. The paper stated that there were 359 participants. In the supplementary materials, the number of participants was recorded as 361 (see image below).
Supplementary materials: Study 2 number of participants
count(Study2_Rename)
## # A tibble: 1 × 1
## n
## <int>
## 1 359
Looks like the number reported in the paper itself was the correct one.
The age demographics
The paper reported that the mean age was 29.75 years with a standard deviation of 9.40.
Study2_Age <- Study2_Rename %>%
summarise (mean_age = mean(Age),
sd_age = sd(Age))
print(Study2_Age)
## # A tibble: 1 × 2
## mean_age sd_age
## <dbl> <dbl>
## 1 29.8 9.40
Success!
The gender distribution
The experimenters reported that 48.5% of participants were female.
table(Study2_Rename$Gender)
##
## 1 2 3
## 184 174 1
174/359
## [1] 0.4846797
That rounds up to 48.5%. Perfect.
Figure 1 is a line graph that represents data from all 3 samples in Study 1 as well as the single sample in Study 2. As such, our thought as a team was to generate graphs for each of the samples in Studies 1 and 2 and then combine all the graphs together afterwards.
The aesthetic portions of the graph took a considerable amount of Googling to fine tune but since we were more familiar with the ggplot2 package now, we understood where to start and what to Google in order to get the result we desired.
This process was repeated for each of the individual graphs.
Study1Ger_plot <- ggplot(data = Study1Ger_Rename, mapping = aes(x = empathy, y = physdist_motiv)) +
geom_smooth(method = lm, colour = "black") +
scale_x_continuous(name = NULL, limits = c(1,5)) +
scale_y_continuous(name = NULL, limits =c (1,5)) +
ggtitle(label = "Study 1:\nGerman Sample" ) +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
print(Study1Ger_plot)
## `geom_smooth()` using formula 'y ~ x'
Study1UK_plot <- ggplot(data = Study1UK_Rename, mapping = aes(x = empathy, y = physdist_motiv)) +
geom_smooth(method = lm, colour = "black") +
scale_x_continuous(name = NULL, limits =c (1,5)) +
scale_y_continuous(name = NULL, limits= c (1,5)) +
ggtitle(label = "Study 1:\nUK Sample" ) +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
print(Study1UK_plot)
## `geom_smooth()` using formula 'y ~ x'
Study1USA_plot <- ggplot(data = Study1USA_Rename, mapping = aes(x = empathy, y = physdist_motiv)) +
geom_smooth(method = lm, colour = "black") +
scale_x_continuous(name = NULL, limits= c (1,5)) +
scale_y_continuous(name = NULL, limits= c (1,5)) +
ggtitle(label = "Study 1:\nU.S. Sample") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
print(Study1USA_plot)
## `geom_smooth()` using formula 'y ~ x'
Study2_plot <- ggplot(data = Study2_Rename, mapping = aes(x = empathy, y = physdist_motiv)) +
geom_smooth(method = lm, colour = "black") +
scale_x_continuous(name = NULL, limits = c(1,5)) +
scale_y_continuous(name = NULL, limits = c(1,5)) +
ggtitle(label = "Study 2:\nGerman Sample") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
plot.title = element_text(size = 10, hjust = 0.5))
print(Study2_plot)
## `geom_smooth()` using formula 'y ~ x'
The difficult part was knowing how to combine the graphs. We did some more Googling and found that the ggarrange() function as part of the “ggpubr” package could help us satisfy this verification goal (http://www.sthda.com/english/articles/32-r-graphics-essentials/126-combine-multiple-ggplots-in-one-graph/). We also found some articles discussing the “patchwork” package as a useful tool. However, the advantage of the “ggpubr” package was that it contained an annotate_figure() function which allowed us to input axes labels once the graphs were combined. Because this was necessary as part of Figure 1, we decided to go with the “ggpubr” package.
#load package
library(ggpubr)
#Attempt 1
Figure1 <- ggarrange(Study1USA_plot, Study1UK_plot, Study1Ger_plot, Study2_plot, #ggarrange() helps combine multiple plots and arrange them in a specified configuration
ncol=4) #this specifies how the plots are arranged (4 columns)
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Goodness me! It worked! I can finalise the code to include all the aesthetic adjustments now.
Figure1 <- ggarrange(Study1USA_plot, Study1UK_plot, Study1Ger_plot, Study2_plot,
ncol= 4) #4 columns
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Figure1_Final <- annotate_figure(Figure1, #annotate_figure() from the ggpubr package to add aesthetics
left = text_grob("Physical Distancing", #text_grob() function from the ggpubr package adds text into the graph on the left
rot = 90), #this rotates the title of the graph 90 degrees
bottom = text_grob("Empathy")) #text_grob() function used to add text into the graph on the bottom
print(Figure1_Final)
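For comparison, the patchwork route we considered would have looked something like the sketch below (assuming the patchwork package is installed); we preferred ggpubr mainly because annotate_figure() made adding the shared axis labels straightforward.
library(patchwork) # an alternative to ggpubr for combining ggplots
Figure1_patchwork <- Study1USA_plot + Study1UK_plot + Study1Ger_plot + Study2_plot +
  plot_layout(ncol = 4) # arranges the four plots in a single row of four columns
print(Figure1_patchwork)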
And with that, we’ve completed the verification goals for Study 1.
Study4 <- read_excel(here("data&syntax","Study4.xlsx"))
glimpse(Study4)
## Rows: 1,526
## Columns: 19
## $ QID90_1 <dbl> 1, 3, 1, 2, 2, 1, 5, 1, 1, 5, 5, 3, 1, 5, 4, 3, 4, 3, 1…
## $ QID90_2 <dbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, 3, 1…
## $ QID90_3 <dbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, 3, 2…
## $ Q22_1 <dbl> 5, 4, 4, 3, 5, 1, 5, 4, 4, 5, 5, 2, 5, 4, 4, 2, 5, 4, 5…
## $ Q92_1 <dbl> 4, 3, 5, 2, 5, 3, 2, 4, 3, 5, 5, 2, 4, 4, 3, 3, 5, 3, 5…
## $ Q92_2 <dbl> 1, 3, 1, 3, 1, 5, 1, 2, 2, 1, 1, 3, 1, 4, 3, 3, 1, 3, 1…
## $ Q92_3 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q100_1 <dbl> 2, 2, 3, 2, 3, 1, 2, 3, 4, 2, 4, 3, 3, 4, 3, 2, 3, 2, 4…
## $ Q100_2 <dbl> 3, 3, 4, 3, 3, 1, 3, 3, 4, 3, 4, 3, 3, 3, 4, 4, 3, 3, 4…
## $ Q61 <dbl> 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2…
## $ Q20 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 30, 52, 39, 29, 46, 20, 31, 43, 40, 18, 29, 33, 27, 28,…
## $ Gender <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 2…
## $ Household_size <dbl> 1, 1, 3, 1, 1, 2, 3, 1, 4, 5, 2, 4, 2, 5, 2, 2, 2, 2, 5…
## $ Q28 <chr> "Gulasch", "Nudeln", "Nudelsalat", "Nudelgratin", "Sala…
## $ bed <dbl> 1, 1, 1, 0, 1, 0, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 1, 0…
## $ empa <dbl> 1.000000, 3.000000, 1.000000, 2.000000, 2.000000, 1.000…
## $ Q92_2r <dbl> 5, 3, 5, 3, 5, 1, 5, 4, 4, 5, 5, 3, 5, 2, 3, 3, 5, 3, 5…
## $ maskePo <dbl> 4.5, 3.0, 5.0, 2.5, 5.0, 2.0, 3.5, 4.0, 3.5, 5.0, 5.0, …
By this point in the verification process, we had established a consistent workflow such that we knew how to approach and achieve our verification goals.
The reported descriptive statistics for “empathy” in the paper were: control M = 2.10, SD = 1.01; information only M = 2.14, SD = 1.00; empathy condition M = 4.03, SD = 0.90.
#First I wanted to rename "bed" to "condition" so that the dataset made more intuitive sense:
Study4_Rename <- Study4 %>%
rename(condition = "bed")
#Testing "empa"
Study4_Rename %>%
group_by(condition) %>%
summarise (mean_resp = mean(empa),
sd_resp = sd(empa)) %>%
ungroup()
## # A tibble: 3 × 3
## condition mean_resp sd_resp
## <dbl> <dbl> <dbl>
## 1 0 2.14 1.00
## 2 1 2.10 1.01
## 3 2 4.03 0.900
It looks like “empa” was used to operationalise empathy scores. I also wanted to verify that “Q22_1” was indeed the motivation to wear a face mask given that the supplementary materials can be misleading at times. I used the same summarise() function to do this.
The reported descriptive statistics for “motivation to wear a face mask” in the paper were: control M = 3.69, SD = 1.24; information only M = 3.83, SD = 1.20; empathy condition M = 4.00, SD = 1.12.
Study4_Rename %>%
group_by(condition) %>%
summarise (mean_resp = mean(Q22_1),
sd_resp = sd(Q22_1)) %>%
ungroup()
## # A tibble: 3 × 3
## condition mean_resp sd_resp
## <dbl> <dbl> <dbl>
## 1 0 3.83 1.20
## 2 1 3.69 1.24
## 3 2 4.00 1.12
This looks right. We weren’t misled by the supplementary materials this time! A small win.
To clarify variable labels in the dataset, we used the rename() function from the dplyr package to rename these variables to something more intuitive.
Study4_Rename <- Study4_Rename %>%
rename(empathy = "empa",
mask_motiv = "Q22_1")
Time to replicate the demographic statistics.
The number of participants
The paper reported n = 1526.
count(Study4_Rename)
## # A tibble: 1 × 1
## n
## <int>
## 1 1526
Perfect!
The age demographics
The mean age was reported to be 34.71 years old with a standard deviation of 12.09.
Study4_Age <- Study4_Rename %>%
summarise (mean_age = mean(Age),
sd_age = sd(Age))
print(Study4_Age)
## # A tibble: 1 × 2
## mean_age sd_age
## <dbl> <dbl>
## 1 34.7 12.1
Great!
The gender distribution
According to Jamovi, “1” corresponds with male and “2” with female (same as previous studies).
The sample was said to be 42.7% female.
table(Study4_Rename$Gender)
##
## 1 2
## 796 720
720/1526 #1526 being the total number of participants
## [1] 0.4718218
This checks out.
Now that we have completed verifying the demographic and descriptive statistics of this sample, it’s time to reproduce the last figure of the paper. What a journey we’ve been on so far…
Upon observation, this figure is really similar to Figure 2. The process for reproducing this figure should be the same as for Figure 2. Below you will see me replicate the process used for Figure 2 but with data from Study 4 in place of Study 3.
#Creating a new variable with condition labels
Study4_Conditions <- Study4_Rename %>%
mutate (conditionlabel = case_when(condition == 0 ~ "Information Only",
condition == 1 ~ "Control",
condition == 2 ~ "Empathy"))
#Generating the plot
Study4_plot <- ggplot(data = Study4_Conditions,
mapping = aes(x = reorder(conditionlabel, mask_motiv), y = mask_motiv)) +
geom_violin(aes(fill = conditionlabel)) + # from ggplot2 pckg: colour fills each violin plot differently according to the conditionlabel variable
geom_boxplot(width = 0.1) + # from ggplot2 pckg: adds a boxplot layer and specifies the width of the plot as 0.1 of the violin plot
theme_bw()+ # from ggplot2 pckg: this gets rid of the grey background in a default ggplot graph
theme(panel.grid.major = element_blank(), # theme() from ggplot2 pckg: this gets rid of the horizontal grid lines at each of the y-axis numbers
panel.grid.minor = element_blank(), # this gets rid of the horizontal grid lines halfway between the y-axis numbers
legend.position = "none") + # this removes the legend
scale_fill_manual(values = c("#A7CFE5", "#B3D88B", "#1779B6"))+ # from ggplot2 pckg: this allows me to manually specify the fill colours of each violin plot
scale_y_continuous(name = "Motivation to Wear a Face Mask") + # from ggplot2 pckg: this specifies the y-axis label, given that it is a continuous variable (for Study 4 the outcome is motivation to wear a face mask rather than physical distancing)
scale_x_discrete(name = NULL) + # from ggplot2 pckg: this removes the x-axis label, given that it is a discrete variable
stat_summary(fun = "mean", shape = 9) # from ggplot2 pckg: this plots the mean onto the graph as a specified shape ("9")
print(Study4_plot)
## Warning: Removed 3 rows containing missing values (geom_segment).
Success! After reproducing this figure in one attempt, I could tangibly see how much I had improved while learning R. It was really about consulting as many resources as possible (including but not limited to Stack Overflow forums, ALL the available files provided in the OSF, Google Translate, and various coding blogs) and trying out different functions until I came across one that consistently did what I wanted it to.
As part of my exploratory analyses I will be using the skills I have acquired over the verification process to ask questions of the data.
This question arose out of curiosity about whether living with more people would increase one’s sensitivity to one’s ability to pass the illness on to others and thus impact one’s motivation to physically distance. Personally, my primary worry when it comes to the virus is passing it on to others, particularly in the household, rather than my personal vulnerability. Therefore, I thought it would be interesting to explore this relationship.
I also wondered if constantly being surrounded by more people in a large household would impact one’s affective empathy. An article by Promsri (2019) [“Does A Family Size Have an Effect on Empathy Level?: Filling the Gap in Empathy Literature”] indicated that children with larger family sizes exhibited higher levels of empathetic skill, possibly due to family socialisation. I was interested to see if this trend could be observed in the Study 2 sample.
First, I wanted to generate a frequency table to gauge the range of household sizes that were recorded. This was the same process we used to verify the gender distribution in Part 2 of this report.
table(Study2_Rename$Household_size)
##
## 1 2 3 4 5 6 7 44
## 67 130 76 52 23 7 3 1
There was a household size of FORTY-FOUR?? Interesting…
I wanted to generate descriptive statistics to see if I could spot a pattern in the numbers.
Study2_HouseholdSummary <- Study2_Rename %>%
group_by(Household_size) %>%
summarise (mean_emp = mean(empathy),
mean_physdist = mean(physdist_motiv))
print(Study2_HouseholdSummary)
## # A tibble: 8 × 3
## Household_size mean_emp mean_physdist
## <dbl> <dbl> <dbl>
## 1 1 3.95 4.44
## 2 2 4.08 4.57
## 3 3 4.12 4.59
## 4 4 4.04 4.56
## 5 5 4 4.63
## 6 6 3.81 4.83
## 7 7 4.11 4.6
## 8 44 5 4.4
The numbers all seem pretty similar…Let’s try and represent this visually through a graph.
This is where I encountered some issues:
#Graphing attempt 1
ggplot(data = Study2_HouseholdSummary,
mapping = aes(x = factor(Household_size))) +
geom_col(aes(y = mean_emp)) # I was a bit stuck as to how to superimpose two column graphs on top of each other. Then, I thought to use the pivot_longer() function
Study2_HouseholdSummary2 <- Study2_HouseholdSummary %>%
pivot_longer(cols = c(mean_emp, mean_physdist),
names_to = "category",
values_to = "means")
print(Study2_HouseholdSummary2)
## # A tibble: 16 × 3
## Household_size category means
## <dbl> <chr> <dbl>
## 1 1 mean_emp 3.95
## 2 1 mean_physdist 4.44
## 3 2 mean_emp 4.08
## 4 2 mean_physdist 4.57
## 5 3 mean_emp 4.12
## 6 3 mean_physdist 4.59
## 7 4 mean_emp 4.04
## 8 4 mean_physdist 4.56
## 9 5 mean_emp 4
## 10 5 mean_physdist 4.63
## 11 6 mean_emp 3.81
## 12 6 mean_physdist 4.83
## 13 7 mean_emp 4.11
## 14 7 mean_physdist 4.6
## 15 44 mean_emp 5
## 16 44 mean_physdist 4.4
#Graphing attempt 2
ggplot(data = Study2_HouseholdSummary2,
mapping = aes(x = factor(Household_size))) +
geom_col(aes(y = means, fill = category),
position = "dodge") #I want to add error bars (this requires the sd)
Study2_HouseholdSummary2 <- Study2_Rename %>%
group_by(Household_size) %>%
summarise (mean_emp = mean(empathy),
sd_emp = sd(empathy),
mean_physdist = mean(physdist_motiv),
sd_physdist = sd(physdist_motiv))
print(Study2_HouseholdSummary2)
## # A tibble: 8 × 5
## Household_size mean_emp sd_emp mean_physdist sd_physdist
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 3.95 0.985 4.44 0.767
## 2 2 4.08 0.893 4.57 0.670
## 3 3 4.12 0.956 4.59 0.512
## 4 4 4.04 0.988 4.56 0.660
## 5 5 4 0.853 4.63 0.664
## 6 6 3.81 1.44 4.83 0.214
## 7 7 4.11 0.694 4.6 0.4
## 8 44 5 NA 4.4 NA
Study2_HouseholdSummary3 <- Study2_HouseholdSummary2 %>%
pivot_longer(cols = c(mean_emp, mean_physdist),
names_to = "category_mean",
values_to = "means") %>%
pivot_longer(cols = c(sd_emp, sd_physdist),
names_to = "category_sd",
               values_to = "sd") #this doesn't work: the numbers don't match up; I will need to pivot the means and SDs separately and then join them
print(Study2_HouseholdSummary3)
## # A tibble: 32 × 5
## Household_size category_mean means category_sd sd
## <dbl> <chr> <dbl> <chr> <dbl>
## 1 1 mean_emp 3.95 sd_emp 0.985
## 2 1 mean_emp 3.95 sd_physdist 0.767
## 3 1 mean_physdist 4.44 sd_emp 0.985
## 4 1 mean_physdist 4.44 sd_physdist 0.767
## 5 2 mean_emp 4.08 sd_emp 0.893
## 6 2 mean_emp 4.08 sd_physdist 0.670
## 7 2 mean_physdist 4.57 sd_emp 0.893
## 8 2 mean_physdist 4.57 sd_physdist 0.670
## 9 3 mean_emp 4.12 sd_emp 0.956
## 10 3 mean_emp 4.12 sd_physdist 0.512
## # … with 22 more rows
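As a side note, there is a tidyr feature that would avoid the separate-tables-and-join workaround altogether: when names_to includes the special ".value" sentinel, pivot_longer() can reshape the mean and sd columns in a single pass. The following is only a sketch (assuming the tidyverse is loaded, as elsewhere in this report, and using the same Study2_HouseholdSummary2 columns as above):
#Sketch: split the column names ("mean_emp", "sd_physdist", ...) at the underscore;
#the piece matched by ".value" stays as a column name (mean, sd) and the remainder
#becomes the "category" column (emp, physdist)
Study2_HouseholdLong <- Study2_HouseholdSummary2 %>%
  pivot_longer(cols = -Household_size,
               names_to = c(".value", "category"),
               names_sep = "_")
print(Study2_HouseholdLong) #16 rows: Household_size, category, mean, sd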
You can see my excitement in the code annotations below.
Study2_HouseholdSummary3 <- Study2_HouseholdSummary2 %>%
pivot_longer(cols = c(mean_emp, mean_physdist),
names_to = "category_mean",
values_to = "means") %>%
mutate(ID = 1:16) #https://statisticsglobe.com/assign-unique-id-for-each-group-in-r
Study2_HouseholdSummary4 <- Study2_HouseholdSummary2 %>%
pivot_longer(cols = c(sd_emp, sd_physdist),
names_to = "category_sd",
values_to = "sd") %>%
mutate(ID = 1:16)
Study2_HouseholdJoin <- Study2_HouseholdSummary4 %>%
left_join(Study2_HouseholdSummary3, by = "ID")
print(Study2_HouseholdJoin) #omg it works ok lets see if i can make it into a graph
## # A tibble: 16 × 11
## Household_size.x mean_emp mean_physdist category_sd sd ID
## <dbl> <dbl> <dbl> <chr> <dbl> <int>
## 1 1 3.95 4.44 sd_emp 0.985 1
## 2 1 3.95 4.44 sd_physdist 0.767 2
## 3 2 4.08 4.57 sd_emp 0.893 3
## 4 2 4.08 4.57 sd_physdist 0.670 4
## 5 3 4.12 4.59 sd_emp 0.956 5
## 6 3 4.12 4.59 sd_physdist 0.512 6
## 7 4 4.04 4.56 sd_emp 0.988 7
## 8 4 4.04 4.56 sd_physdist 0.660 8
## 9 5 4 4.63 sd_emp 0.853 9
## 10 5 4 4.63 sd_physdist 0.664 10
## 11 6 3.81 4.83 sd_emp 1.44 11
## 12 6 3.81 4.83 sd_physdist 0.214 12
## 13 7 4.11 4.6 sd_emp 0.694 13
## 14 7 4.11 4.6 sd_physdist 0.4 14
## 15 44 5 4.4 sd_emp NA 15
## 16 44 5 4.4 sd_physdist NA 16
## # … with 5 more variables: Household_size.y <dbl>, sd_emp <dbl>,
## # sd_physdist <dbl>, category_mean <chr>, means <dbl>
I was finally up to plotting this tibble. The basic components of the plot were specified as usual, i.e., using <- to assign a name to the plot object, specifying the input data and the aesthetic mappings in the ggplot() function, and so on.
#Graphing attempt 3
expplot1_3 <- ggplot(data = Study2_HouseholdJoin,
mapping = aes(x = factor(Household_size.x))) +
geom_col(aes(y = means, fill = category_mean),
position = "dodge",
alpha = 0.9) +
geom_errorbar(aes(ymin=means-sd, ymax=means+sd, group = category_mean),
position = position_dodge(0.9),
width = 0.25) + # the position_dodge() function required some tinkering but group = category_mean solved it because it gave it a position to dodge from
scale_fill_manual(values = c("#A7CFE5", "#1779B6"),
name = "Measure",
labels = c("Empathy", "Physical Distancing")) +
scale_x_discrete(name = "Household Size") +
scale_y_continuous(name = "Means") +
theme_bw()
print(expplot1_3)
It looks like there is little relationship between household size and either self-reported empathy or motivation to physically distance, so it did not seem worth running a formal statistical test.
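That said, if I did want to back up that visual impression with numbers, a quick check might look like the sketch below (my own addition rather than anything from the original analysis): Spearman correlations, which only assume a monotonic relationship, computed after dropping the implausible 44-person household.
#Sketch: rank-based correlations between household size and each measure,
#excluding the 44-person outlier; exact = FALSE avoids the tied-ranks warning
Study2_NoOutlier <- Study2_Rename %>%
  filter(Household_size < 10)
cor.test(Study2_NoOutlier$Household_size, Study2_NoOutlier$empathy,
         method = "spearman", exact = FALSE)
cor.test(Study2_NoOutlier$Household_size, Study2_NoOutlier$physdist_motiv,
         method = "spearman", exact = FALSE)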
As we get older, perhaps there is a greater chance that empathy impacts our behaviour as we rely more on others to look after us and our wellbeing. Furthermore, older participants may have higher empathy levels due to the accumulation of social interactions over the years. Younger age groups might be more independent and self-motivated because they are just entering adulthood and are looking to forge their own paths. Alternatively, perhaps younger age groups are more emotionally driven in their behaviour and high levels of empathy might actually increase motivation to physically distance.
For this question, I wanted to create a graph to visually represent these relationships before deciding whether it was worth running any statistical tests to check for significant differences.
To manage the large range of ages recorded, I decided to create a new variable that grouped age groups together.
table(Study2_Rename$Age) #looks like 18 is the youngest and 67 is the oldest; I will generate new age categories (18-28), (28-38), (38-48), (48-58), (58-68)
##
## 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43
## 9 8 13 14 27 28 20 27 19 21 15 20 16 13 14 10 5 11 10 4 1 7 3 6 2 4
## 44 45 47 48 50 52 53 54 55 56 57 58 60 67
## 4 2 1 1 2 1 3 2 2 6 3 1 3 1
Study2_Age <- Study2_Rename %>%
mutate(age_bracket = ifelse(Age %in% 18:28, "18-28",
ifelse(Age %in% 28:38, "28-38",
ifelse(Age %in% 38:48, "38-48",
ifelse(Age %in% 48:58, "48-58",
ifelse(Age %in% 58:68, "58-68", "NA"))))))
glimpse(Study2_Age) #looks like it worked!!
## Rows: 359
## Columns: 21
## $ Q36_1 <dbl> 5, 5, 5, 4, 2, 2, 4, 4, 5, 4, 3, 5, 4, 4, 5, 3, 5, 4, 5…
## $ Q36_2 <dbl> 3, 3, 5, 3, 3, 4, 5, 3, 4, 3, 5, 3, 4, 4, 5, 4, 3, 3, 5…
## $ Q36_3 <dbl> 5, 5, 5, 4, 2, 2, 4, 4, 5, 4, 3, 5, 4, 2, 5, 3, 5, 4, 5…
## $ Q36_4 <dbl> 3, 1, 5, 3, 2, 4, 2, 1, 2, 1, 5, 3, 4, 4, 5, 3, 3, 3, 4…
## $ Q36_5 <dbl> 5, 5, 5, 4, 2, 2, 5, 4, 5, 4, 3, 5, 4, 4, 5, 2, 5, 4, 5…
## $ Q36_6 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q36_7 <dbl> 3, 4, 3, 3, 2, 3, 3, 1, 4, 3, 4, 4, 2, 4, 3, 4, 2, 4, 4…
## $ Q22_1 <dbl> 1, 1, 2, 1, 4, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, 3, 1…
## $ Q22_2 <dbl> 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, 5, 1…
## $ Q22_3 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q22_4 <dbl> 1, 1, 2, 1, 4, 1, 1, 1, 1, 5, 1, 1, 1, 1, 2, 1, 1, 2, 1…
## $ Q22_5 <dbl> 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 5, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Q22_6 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ Q20 <dbl> 3, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 27, 36, 26, 32, 29, 32, 37, 23, 27, 32, 34, 28, 57, 44,…
## $ Gender <dbl> 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1…
## $ Household_size <dbl> 1, 4, 1, 1, 1, 2, 3, 3, 3, 4, 3, 1, 1, 3, 1, 5, 2, 2, 4…
## $ empathy <dbl> 5.000000, 5.000000, 5.000000, 4.000000, 2.000000, 2.000…
## $ sd <dbl> 1.0, 1.0, 1.6, 1.0, 2.8, 1.0, 1.0, 1.4, 1.0, 1.8, 2.0, …
## $ physdist_motiv <dbl> 5.0, 5.0, 4.4, 5.0, 3.2, 5.0, 5.0, 4.6, 5.0, 4.2, 4.0, …
## $ age_bracket <chr> "18-28", "28-38", "18-28", "28-38", "28-38", "28-38", "…
I was very excited to see that it worked! Now, onto graphing this relationship.
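Before moving on, one small caveat about the nested ifelse() calls above: the ranges overlap at the boundaries (28 appears in both 18:28 and 28:38, and so on), so boundary ages always land in the lower bracket because the first matching condition wins. A sketch of an equivalent but more explicit approach uses cut() with right-closed intervals and labels that reflect the ages each bin actually contains:
#Sketch: cut() assigns each age to exactly one right-closed interval, e.g. (28, 38],
#so 28 stays in the 18-28 bin, matching the behaviour of the ifelse() chain above
Study2_Age_cut <- Study2_Rename %>%
  mutate(age_bracket = cut(Age,
                           breaks = c(17, 28, 38, 48, 58, 68),
                           labels = c("18-28", "29-38", "39-48", "49-58", "59-68")))
table(Study2_Age_cut$age_bracket) #compare with table(Study2_Age$age_bracket) to confirm the binning matches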
Study2_AgeSummary <- Study2_Age %>%
group_by(age_bracket) %>%
summarise(mean_emp = mean(empathy),
mean_physdist = mean(physdist_motiv)) %>%
ungroup()
Study2_AgeSummary2 <- Study2_AgeSummary %>%
pivot_longer(cols = c(mean_emp, mean_physdist),
names_to = "category_mean",
values_to = "means")
print(Study2_AgeSummary2)
## # A tibble: 10 × 3
## age_bracket category_mean means
## <chr> <chr> <dbl>
## 1 18-28 mean_emp 4.03
## 2 18-28 mean_physdist 4.47
## 3 28-38 mean_emp 3.96
## 4 28-38 mean_physdist 4.67
## 5 38-48 mean_emp 4.37
## 6 38-48 mean_physdist 4.63
## 7 48-58 mean_emp 4.13
## 8 48-58 mean_physdist 4.83
## 9 58-68 mean_emp 4.75
## 10 58-68 mean_physdist 4.35
expplot2col <- ggplot(data = Study2_AgeSummary2,
mapping = aes(x = age_bracket, y = means)) +
geom_col(aes(fill = category_mean),
position = "dodge",
alpha = 0.9)
expplot2col
I felt that the graph didn’t really say much about the specific interaction between physical distancing motivation and empathy across the different age groups.
At this point, I had been coding for several hours. My previous hiccups with R were always made better by a pair of fresh eyes, so I decided to leave this for the next day.
…
Refreshed, I decided to take a completely different angle. Rather than limiting the input data to the means only, I tried to plot the entire dataset (“Study2_Age”) to look at the relationship between physical distancing and empathy and how this might differ between age groups. Thus, I decided to plot physical distancing on the x-axis, empathy on the y-axis and use the age bracket as different coloured lines on the graph.
The following code is a mix of code used to reproduce Figure 1 and Figures 2 and 3 from the original paper.
expplot2smooth <- ggplot(data = Study2_Age,
mapping = aes(x = physdist_motiv)) +
geom_smooth(aes(y = empathy, colour = age_bracket),
method = lm,
se = FALSE) +
scale_colour_discrete(name = "Age (Years)") +
  scale_x_continuous(name = "Physical Distancing Motivation", limits = c(1,5)) +
  scale_y_continuous(name = "Empathy", limits = c(1,5)) +
theme_bw()
expplot2smooth
## `geom_smooth()` using formula 'y ~ x'
## Warning: Removed 13 rows containing missing values (geom_smooth).
The resultant graph was very interesting indeed. Before jumping to any conclusions, I wanted to revisit the sample sizes of each age bracket (from memory, the age distribution was positively skewed).
table(Study2_Age$age_bracket)
##
## 18-28 28-38 38-48 48-58 58-68
## 201 104 30 20 4
Very much positively skewed. Because the sample sizes for the older age brackets were quite small, the trend observed in the graph may be attributed to individual differences rather than any real trend in the population. An interesting graph nonetheless and perhaps something to explore in future studies…
To check whether any of these differences reached statistical significance, I decided to conduct some statistical tests. Because I had never done this in R before, I needed to do some Googling. This website was helpful: https://www.scribbr.com/statistics/anova-in-r/
library(broom) #this takes the messy output of built-in functions in R such as aov() and turns them into tidy and more easily interpretable tibbles
one.way_empa <- aov(empathy ~ age_bracket, data = Study2_Age) #aov() conducts an ANOVA testing the effect of age on empathy using the data from "Study2_Age"
summary(one.way_empa) #summary() prints the results in an interpretable table; since p>.05 can't reject the null hypothesis
## Df Sum Sq Mean Sq F value Pr(>F)
## age_bracket 4 5.94 1.4854 1.689 0.152
## Residuals 354 311.39 0.8796
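Since broom is loaded above but not actually called in this chunk, here is a minimal sketch of what it adds: tidy() turns the aov output into a tibble that is easier to filter, join or export.
#Sketch: tidy() returns one row per term, with columns such as term, df, sumsq, meansq, statistic and p.value
tidy(one.way_empa)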
one.way_pd <- aov(physdist_motiv ~ age_bracket, data = Study2_Age) #aov() conducts an ANOVA testing the effect of age on physical distancing motivation using the data from "Study2_Age"
summary(one.way_pd) #since p<.05 can reject the null; let's run some post-hoc tests
## Df Sum Sq Mean Sq F value Pr(>F)
## age_bracket 4 4.6 1.1495 2.788 0.0264 *
## Residuals 354 146.0 0.4123
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
tukey_pd <- TukeyHSD(one.way_pd) #TukeyHSD() conducts a Tukey HSD test using the data from the "one.way_pd" ANOVA test
tukey_pd #this prints the results
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = physdist_motiv ~ age_bracket, data = Study2_Age)
##
## $age_bracket
## diff lwr upr p adj
## 28-38-18-28 0.19672790 -0.01593831 0.4093941 0.0850077
## 38-48-18-28 0.15800995 -0.18658635 0.5026062 0.7175504
## 48-58-18-28 0.36134328 -0.05146308 0.7741496 0.1176557
## 58-68-18-28 -0.11865672 -1.00767805 0.7703646 0.9961625
## 38-48-28-38 -0.03871795 -0.40358813 0.3261522 0.9984310
## 48-58-28-38 0.16461538 -0.26525987 0.5944906 0.8317528
## 58-68-28-38 -0.31538462 -1.21245907 0.5816898 0.8711085
## 48-58-38-48 0.20333333 -0.30491113 0.7115778 0.8080751
## 58-68-38-48 -0.27666667 -1.21382314 0.6604898 0.9276394
## 58-68-48-58 -0.48000000 -1.44432606 0.4843261 0.6506354
However, none of the individual comparisons had p < .05. This may be because the numbers of participants in the age brackets were drastically unequal: the comparisons involving the older age brackets have less power because of their smaller samples.
I decided to run one last statistical test to check if there is a significant interaction between age bracket and empathy which ultimately impacts the motivation to physically distance as I speculated previously.
interaction_empd <- aov(physdist_motiv ~ age_bracket*empathy, data = Study2_Age) #aov() conducts an ANOVA testing the effect of the interaction between age and empathy on physical distancing motivation using the data from "Study2_Age"
summary(interaction_empd)
## Df Sum Sq Mean Sq F value Pr(>F)
## age_bracket 4 4.60 1.149 3.186 0.0137 *
## empathy 1 15.20 15.203 42.133 2.93e-10 ***
## age_bracket:empathy 4 4.82 1.206 3.343 0.0105 *
## Residuals 349 125.93 0.361
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Given the statistically significant output for the interaction (p < .05), it appears that the relationship between age and physical distancing differs by one's level of empathy, and that the effect of age on physical distancing should not be considered without empathy scores in mind. This is in line with what the paper suggests: that empathy is the variable of interest when asking what influences motivation to physically distance.
Overall, the results are largely inconclusive. There are no significant comparisons between age brackets, but there do seem to be some promising findings on which to base future research (namely, how age might interact with empathy to impact physical distancing).
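One caveat on the interaction ANOVA above: summary() on an aov object reports sequential (Type I) sums of squares, so the main-effect rows depend on the order in which the terms were entered. A sketch of an alternative, assuming the car package is installed, is to request Type II tests via Anova():
#Sketch: Type II sums of squares test each main effect adjusted for the other;
#the interaction term, which was already entered last above, is unchanged
library(car)
Anova(interaction_empd, type = 2)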
As a psychology student, one thing that is often reiterated is that it is our perception of harm, risk and the world around us, rather than the reality of our surroundings, that tends to influence our choices. I wanted to see whether this holds even when people make important decisions regarding their own health and the health of others. Although the experimenters reported that neither subjective nor objective vulnerability significantly moderated the relationship between empathy and motivation to wear a face mask, I thought it would be interesting to see whether the two vulnerability measures relate differently to mask-wearing motivation.
To briefly recap, subjective vulnerability was coded on a 5-point Likert scale in which 1 represented the lowest and 5 the highest perceived personal harm that could come from infection with COVID-19. Objective vulnerability was a binary yes/no variable determined according to the criteria of the Robert Koch Institute. According to Jamovi, "High-risk" was coded as 1 and "Low-risk" was coded as 2.
To address this question, I wanted to produce a visual representation of the relationship before determining if any statistical tests are warranted.
Study4exp <- Study4_Rename %>%
rename(obj_vuln = "Q61",
subj_vuln = "Q100_1")
glimpse(Study4exp)
## Rows: 1,526
## Columns: 19
## $ QID90_1 <dbl> 1, 3, 1, 2, 2, 1, 5, 1, 1, 5, 5, 3, 1, 5, 4, 3, 4, 3, 1…
## $ QID90_2 <dbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, 3, 1…
## $ QID90_3 <dbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, 3, 2…
## $ mask_motiv <dbl> 5, 4, 4, 3, 5, 1, 5, 4, 4, 5, 5, 2, 5, 4, 4, 2, 5, 4, 5…
## $ Q92_1 <dbl> 4, 3, 5, 2, 5, 3, 2, 4, 3, 5, 5, 2, 4, 4, 3, 3, 5, 3, 5…
## $ Q92_2 <dbl> 1, 3, 1, 3, 1, 5, 1, 2, 2, 1, 1, 3, 1, 4, 3, 3, 1, 3, 1…
## $ Q92_3 <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5…
## $ subj_vuln <dbl> 2, 2, 3, 2, 3, 1, 2, 3, 4, 2, 4, 3, 3, 4, 3, 2, 3, 2, 4…
## $ Q100_2 <dbl> 3, 3, 4, 3, 3, 1, 3, 3, 4, 3, 4, 3, 3, 3, 4, 4, 3, 3, 4…
## $ obj_vuln <dbl> 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, 1, 2…
## $ Q20 <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1…
## $ Age <dbl> 30, 52, 39, 29, 46, 20, 31, 43, 40, 18, 29, 33, 27, 28,…
## $ Gender <dbl> 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 2…
## $ Household_size <dbl> 1, 1, 3, 1, 1, 2, 3, 1, 4, 5, 2, 4, 2, 5, 2, 2, 2, 2, 5…
## $ Q28 <chr> "Gulasch", "Nudeln", "Nudelsalat", "Nudelgratin", "Sala…
## $ condition <dbl> 1, 1, 1, 0, 1, 0, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, 1, 0…
## $ empathy <dbl> 1.000000, 3.000000, 1.000000, 2.000000, 2.000000, 1.000…
## $ Q92_2r <dbl> 5, 3, 5, 3, 5, 1, 5, 4, 4, 5, 5, 3, 5, 2, 3, 3, 5, 3, 5…
## $ maskePo <dbl> 4.5, 3.0, 5.0, 2.5, 5.0, 2.0, 3.5, 4.0, 3.5, 5.0, 5.0, …
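Because the 1 = "High-risk" / 2 = "Low-risk" coding described above is easy to get backwards when labelling plots, a hypothetical recode into a labelled factor (my own addition, not part of the original analysis) would make the levels self-documenting:
#Sketch: store the documented coding in a factor so that later legends and axis
#labels cannot get out of sync with the raw numbers
Study4exp_labelled <- Study4exp %>%
  mutate(obj_vuln_f = factor(obj_vuln,
                             levels = c(1, 2),
                             labels = c("High-risk", "Low-risk")))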
My next task was to create summary tables so that I could get a quick snapshot of the descriptive statistics and later graph the means for motivation to wear a face mask, with objective vulnerability and subjective vulnerability as grouping factors. This process was essentially identical to that used for the previous summary tables, except that:
Study4exp_o <- Study4exp %>%
group_by(obj_vuln) %>%
summarise(mean_masko = mean(mask_motiv),
sd_masko = sd(mask_motiv)) %>%
ungroup()
print(Study4exp_o) #looks like there are some entries that are blank
## # A tibble: 3 × 3
## obj_vuln mean_masko sd_masko
## <dbl> <dbl> <dbl>
## 1 1 3.98 1.17
## 2 2 3.76 1.20
## 3 NA 2.5 0.707
Study4expfilter <- Study4exp %>%
filter(obj_vuln < 3)
Study4exp_o <- Study4expfilter %>%
group_by(obj_vuln) %>%
summarise(mean_masko = mean(mask_motiv),
sd_masko = sd(mask_motiv)) %>%
ungroup()
Study4exp_s <- Study4exp %>%
group_by(subj_vuln) %>%
summarise(mean_masks = mean(mask_motiv),
sd_masks = sd(mask_motiv)) %>%
ungroup()
print(Study4exp_s)
## # A tibble: 5 × 3
## subj_vuln mean_masks sd_masks
## <dbl> <dbl> <dbl>
## 1 1 2.45 1.44
## 2 2 3.61 1.20
## 3 3 3.96 1.05
## 4 4 4.28 0.992
## 5 5 4.82 0.644
Now, onto plotting this information.
expplot4_1 <- ggplot(data = Study4expfilter,
mapping = aes(x = subj_vuln, y = mask_motiv)) +
geom_smooth(aes(colour = factor(obj_vuln)),
method = lm,
se = FALSE) + #this removes the confidence interval shading that was present in Figure 1
scale_x_continuous(name = "Subjective Vulnerability", limits = c(1,5)) +
scale_y_continuous(name = "Face-Mask Wearing Motivation", limits = c(1,5)) +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "bottom", #this specifies the location of the legend as being at the bottom of the plot
axis.title = element_text(size = 10),
legend.title = element_text(size = 9), #this specifies the size of the legend title
legend.text = element_text(size = 8)) + #this specifies the size of the legend labels
  scale_colour_manual(name = "Objective Vulnerability", #this specifies the legend title (a plain string; element_text() is not needed here)
                      labels = c("High-risk", "Low-risk"), #this specifies the legend labels in factor-level order (1 = High-risk, 2 = Low-risk, as noted above)
values = c("#B3D88B", "#1779B6")) #this specifies the colours of each line (I wanted to match the aesthetics of the original paper)
print(expplot4_1)
## `geom_smooth()` using formula 'y ~ x'
From this graph, it looks like there is absolutely no effect of objective vulnerability on the relationship between perceived personal vulnerability and face mask wearing: the lines are almost completely overlapping! Because this graph was a little difficult to wrap my head around, I decided to break it down into two smaller graphs plotting objective vulnerability against motivation to wear a face mask and subjective vulnerability against motivation to wear a face mask individually.
This was done using geom_col() which I am quite familiar with by now. I have noted any differences in the code below to reduce repetition as much as possible.
expplot4_2 <- ggplot(data = Study4exp_o,
mapping = aes(x = factor(obj_vuln), y = mean_masko)) +
geom_col(fill = "#A7CFE5") +
  scale_x_discrete(name = "Objective Vulnerability", labels = c("High-risk", "Low-risk")) + #label order follows the 1 = High-risk, 2 = Low-risk coding
scale_y_continuous(name = "Face-Mask Wearing Motivation", limits = c(0,5)) + # I set the limits to 0 and 5 to match the graph produced for subjective vulnerability
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "none",
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 6))
print(expplot4_2)
expplot4_3 <- ggplot(data = Study4exp_s,
mapping = aes(x = factor(subj_vuln), y = mean_masks)) +
geom_col(fill = "#A7CFE5") +
scale_x_discrete(name = "Subjective Vulnerability") +
scale_y_continuous(name = "Face-Mask Wearing Motivation") +
theme_bw() +
theme(panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "none",
axis.title.x = element_text(size = 10),
axis.title.y = element_text(size = 6))
print(expplot4_3)
For the sake of completeness, I wanted to put all the graphs together so that they complemented each other, with the smaller graphs on the right-hand side breaking down what is depicted in the larger one on the left. However, I wasn't sure how to use the ggarrange() function (as used to reproduce Figure 1) to arrange the graphs in a specified configuration. After some Googling, it looked like the "patchwork" package made this process very simple, with / stacking the next plot underneath and | placing the next plot alongside (here, as a column on the right).
#https://ggplot2-book.org/arranging-plots.html
library(patchwork)
(expplot4_1|expplot4_2/expplot4_3)
## `geom_smooth()` using formula 'y ~ x'
This produced exactly the output I wanted: the main graph on the left-hand side and the two smaller graphs on the right. While the paper recognised that there was no significant interaction between either vulnerability measure and empathy's effect on the wearing of face masks, the paper did not compare how subjective and objective vulnerability themselves related to the wearing of face masks. This figure indicates that subjective vulnerability may have a stronger impact on motivation to wear a face mask than objective vulnerability, which appears to have no effect at all. This supports my initial hypothesis regarding the importance of subjective perception in influencing behaviour.
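As a rough numerical counterpart to that visual impression (again my own sketch, not an analysis reported in the paper), both vulnerability measures can be entered into a single regression predicting mask-wearing motivation:
#Sketch: does subjective vulnerability predict mask-wearing motivation over and
#above the binary objective-vulnerability classification?
vuln_model <- lm(mask_motiv ~ subj_vuln + factor(obj_vuln), data = Study4expfilter)
summary(vuln_model)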
My first recommendation is for authors to include a codebook or other documentation that is sufficiently detailed and descriptive. One of the first issues we encountered was difficulty interpreting the variable labels, which led to a tedious process of trial and error when reproducing the reported descriptive statistics. To begin remedying the reproducibility crisis and facilitate maximal transparency within the scientific community, experiments should be relatively easy to reproduce. To fulfil this goal, any accompanying codebooks or supplementary materials should be easy to understand whilst maintaining a high level of detail. They should also be accurate: we found some discrepancies between the supplementary materials and the paper itself during our verification process.
When describing the data, there should be description at both the variable level and the collection level. Since it may be difficult for experimenters with specific (and habitual) naming conventions to change the way they name their variables, a comprehensive codebook detailing exactly which measure belongs to which variable name would be very useful. Furthermore, specific information including the text of all questions, details on missing data, exclusion criteria and information on how the variables were constructed or calculated should all be included. This helps uphold the principles of reproducibility and accountability that are central to open science. Codebooks can be created in programs such as SPSS and Excel, although experimenters unfamiliar with those programs can also create supplementary materials using a word processor.
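As a lightweight starting point, a codebook skeleton can even be generated from the data itself. The sketch below is my own suggestion rather than a tool used in this verification: it writes one row per variable to a CSV file that the researcher then fills in by hand.
#Sketch: one row per variable, with empty columns for the researcher to complete
library(tibble)
codebook_skeleton <- tibble(
  variable    = names(Study2_Rename),
  description = "", #question wording / construct measured
  values      = "", #e.g. "1 = strongly disagree ... 5 = strongly agree"
  missing     = ""  #how missing responses are coded
)
write.csv(codebook_skeleton, "codebook_study2.csv", row.names = FALSE)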
This is a useful resource that includes both examples and guidelines for writing an appropriately detailed codebook: https://www.icpsr.umich.edu/icpsrweb/content/shared/ICPSR/faqs/what-is-a-codebook.html. This website is also a useful resource for learning how to create a codebook in SPSS specifically: https://libguides.library.kent.edu/spss/codebooks.
Another issue we encountered was always having to check the data files to determine which number in the variable values corresponded to which parameter. For example, while the control condition was coded as "0" in Study 3, it was coded as "1" in Study 4. Additionally, while empathy was labelled "empa" in Study 4 (which is somewhat intuitive), it was labelled "ve" in Study 1. This generates unnecessary confusion and can be quite inefficient if the data for each study need to be re-consulted, especially when there are so many studies to reproduce within a single paper. Thus, my third recommendation is to encourage experimenters to code their variables consistently to minimise such confusion and make their findings easier to reproduce. By gradually eroding the barriers to reproducing studies, perhaps we as a scientific community can encourage more attempts at reproducing experimenters' findings and thereby cement accountability as a core tenet of academia.
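To illustrate, a single renaming step at import time would remove this ambiguity. The data-frame names below are placeholders I have made up, but the original column labels ("ve" and "empa") are the ones noted above:
#Sketch: map each study's idiosyncratic label onto one shared name at import
#"Study1_raw" and "Study4_raw" are hypothetical names for the raw data frames
Study1_clean <- Study1_raw %>% rename(empathy = ve)
Study4_clean <- Study4_raw %>% rename(empathy = empa)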
To learn more about best practice when coding variables, as well as other recommendations for upholding open science principles, see Kathawalla, Silverstein and Syed's (2021) article "Easing Into Open Science: A Guide for Graduate Students and Their Advisors." It is a very helpful article that details how and why certain practices are useful and addresses common worries about engaging in them. Furthermore, its tiered system of difficulty allows experimenters to ease into open science in a manageable way.
Thanks for reading!!