The social and economic impacts of the COVID-19 pandemic around the world have been nothing short of devastating, with evidence-based behavioural measures such as physical distancing and mask-wearing being implemented in the hope of reducing transmission of the virus. The ongoing nature of the pandemic highlights the importance of research into the psychological motivations behind complying with COVID-19 behavioural measures, as such research can inform policy makers on how best to promote these measures and slow the spread of COVID-19.
Previous research has found that empathy, a prosocial emotional process that involves understanding and concern for vulnerable people, improves health outcomes. This notion led Pfattheicher et al. (2020) to develop two aims that were tested across four studies with a total of 3718 participants.
The first two studies were correlational and investigated the first aim: to assess whether there is an association between empathy for the people who are most vulnerable to COVID-19 (for example, elderly people and frontline healthcare workers) and increased physical distancing. The last two studies explored the second aim: to determine whether inducing empathy can be used as a tool for motivating people to physically distance (Study 3) and to wear a face mask (Study 4).
Each study consisted of non-representative samples from Germany, the United States of America, and the United Kingdom. Participants completed a short, paid survey about their opinions of the current COVID-19 health regulations in their country, their current protective practices, and their perceived vulnerability to the virus.
In the two experimental studies (Studies 3 and 4), participants were randomly assigned to one of three conditions: the information-only condition (in which basic information about COVID-19 risks and the importance of adhering to health guidelines was presented), the combined information and empathy condition (in which the same information was presented, along with a video or text about the harrowing experiences of patients with severe COVID-19 or of family members who could not see their sick loved ones), and the control condition.
In both studies 1 and 2, Pfattheicher et al. (2020) found that empathy and physical distancing were positively correlated. Although the causal nature of this relationship could not be established, Pfattheicher et al. (2020) argued that empathy levels are a useful predictor of who is more likely to engage in physical distancing.
Additionally, studies 3 and 4 found that motivation to physically distance and wear masks increased with higher levels of state empathy, so it was concluded that empathy can be used as an emotional tool to promote motivation to follow COVID-19 measures. The policy implication of these conclusions is that merely providing basic background information about the importance of physical distancing and mask-wearing may be insufficient to encourage prosocial behaviour.
The most interesting part of this paper was that in both studies 3 and 4, the three experimental conditions did not significantly differ in terms of the assessed demographics, which included gender, age, and household size.
Previous studies have shown that women are more likely than men to engage in prosocial behaviours during epidemics and pandemics (Haischer et al., 2020), so naturally, I thought that the present study would find that the motivation to wear face masks and physically distance during the COVID-19 pandemic was higher among women.
Additionally, elderly people are at higher risk of developing a severe case of COVID-19, so I thought it was reasonable to expect that people in that age category would be much more motivated to engage in protective measures against COVID-19 than people in younger age categories. I also believed that bigger households would have higher motivation to adhere to COVID-19 measures given that if one family member is at increased risk of contracting COVID-19, then all the other family members would also be at increased risk.
So, even though I honestly thought that these demographic factors, along with empathy, would have been highly influential in compliance, the present study suggested that this was not the case.
However, I was not entirely convinced that the control groups were already highly motivated to engage in protective measures prior to, and during, the study. Although these measures are mandatory in public places in most countries, as Pfattheicher et al. (2020) note, physical distancing and wearing masks both come at considerable personal cost: they reduce social contact with loved ones and alter appearance and breathing.
So while many people adhere to public health advice, some individuals around the world, primarily in Western countries such as the United States, Canada, Australia, and parts of Europe, vehemently object to it. Believing that wearing a mask is ineffective in preventing COVID-19 and threatens personal freedom, some go as far as attending rallies that protest against masks, physical distancing, lockdowns, and vaccines (Taylor & Asmundson, 2020).
Thus, since the samples for the present study were drawn from Western countries (the US, the United Kingdom, and Germany), I cannot help but be skeptical about the high baseline motivation in each of the four studies. In fact, public health experts and policy-makers have appeared countless times in the media pleading with people to wear masks and physically distance when out in public, which suggests to me that expert health advice tended to be ignored until heavy fines and increased restrictions were introduced.
It seems that the next step in this area of research would be to determine whether empathy has the same effect on motivation to physically distance and wear face masks in non-Western countries. Pfattheicher et al. (2020) acknowledged in their discussion that all four of their studies were conducted only in Western countries, using samples that were not representative of the populations of those countries. Whilst I believe that this should not detract from their primary conclusion that empathy could be used as a tool to promote methods of mitigating the COVID-19 pandemic, I feel that the conclusion would have been stronger if the study had included non-Western countries such as China, India, and Nigeria, along with more representative samples.
Additionally, since the conclusions of studies 3 and 4 were based on a short-term induction of empathy amongst the participants in the empathy condition, I believe it would be beneficial for future research in this area to replicate the findings of the present study with different empathy-inducing stimuli and to measure the impact of empathy on motivation to adhere to COVID-19 measures over the longer term.
My group and I obtained the data that was gathered by Pfattheicher and his colleagues from the OSF repository, with the goal of reproducing the statistics and figures that were included in the paper.
The data files for each of the four studies were in .sav and .xlsx format, making them compatible with SPSS and Excel respectively. Since I am more familiar with SPSS, I chose to use the ‘read_sav()’ function from the ‘haven’ package, which reads SPSS files.
However, the variable labels in all of the data files were written in German, so the first step of our verification process was to translate these variable labels into English. I translated the variable names for study 4 into English and my group members did the translations for the other three studies. Although this was a tedious task, it allowed us to obtain important information about the variables we needed to incorporate into each of our data replications.
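For reference, haven stores these labels as attributes on each column, so they can be inspected directly in R once a data file has been read in (for example, the Study1Germany data loaded later in this report). A minimal sketch, assuming the ‘labelled’ helper package is installed (an assumption; it is not one of the packages we loaded below):
library(labelled) # helpers for SPSS-style variable and value labels
var_label(Study1Germany) # named list of the (German) variable labels
val_labels(Study1Germany$Gender) # value labels for a single variable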
Since four studies were conducted, my group and I had to replicate four sets of demographic descriptives, four sets of summary statistics, and three figures.
library(tidyverse) # for data manipulation and visualisation
library(dbplyr) # a database backend that allows for the use of remote database tables
library(ggplot2) # allows for the creation of graphs and figures
library(haven) # to read files from spss
library(here) # to help with file paths
library(janitor) # to clean names and count things
library(gt) # for nice tables
library(ggeasy) # for easy ggplot functions
library(gridExtra) # arranges multiple plots on a page
if (!tinytex::is_tinytex()) tinytex::install_tinytex() # install TinyTeX (needed to knit to PDF) only if it is missing
tinytex::is_tinytex()
## [1] TRUE
Another thing to note is that I needed to update R on my computer to the newest version (R 4.0.5), because I was unable to install the packages I needed to put together this report with the previous version, R 4.0.4. So I installed the “installr” package and used its “updateR()” function to launch the R updating process.
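For reference, the update process looked roughly like this (a sketch; “updateR()” walks through the update interactively and is Windows-oriented):
install.packages("installr") # one-off install of the updating package
library(installr)
updateR() # launches the interactive R update process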
Study 1 was a correlational study (N = 965) consisting of three samples from the US (n = 322), the UK (n = 317), and Germany (n = 326). To determine whether there was an association between empathy and physical distancing practices (the two dependent variables), the survey measured affective empathy for the people most vulnerable to COVID-19 by having the participants rate three statements on a Likert scale that ranged from 1 (strongly disagree) to 5 (strongly agree). The participants also had to self-report their current physical distancing practices on the same five-point Likert scale.
To start, we used the glimpse() function to examine the data in the United States sample from Study 1. Although the variable names were not very clear at first glance, the document titled “Supplementary Materials” within the OSF repository provided us with more details about what they meant.
# Read data
Study1USA <- read_sav("PSYC3361 Project/COVID paper data+syntax/Study1USA.sav")
glimpse(Study1USA)
## Rows: 322
## Columns: 27
## $ Q22_1 <dbl+lbl> 4, 4, 5, 5, 5, 2, 5, 1, 5, 4, 5, 5, 4, 4, 5, 4, 5, ~
## $ Q22_2 <dbl+lbl> 4, 3, 5, 5, 4, 2, 5, 5, 5, 5, 5, 4, 4, 3, 5, 3, 4, ~
## $ Q22_3 <dbl+lbl> 4, 4, 5, 5, 5, 2, 5, 3, 5, 4, 5, 5, 4, 4, 5, 5, 5, ~
## $ Q22_4 <dbl+lbl> 1, 1, 4, 4, 3, 2, 2, 1, 4, 3, 5, 2, 4, 1, 5, 1, 4, ~
## $ Q22_5 <dbl+lbl> 4, 3, 5, 4, 4, 2, 5, 1, 5, 4, 5, 3, 4, 3, 5, 5, 5, ~
## $ Q22_6 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q22_7 <dbl+lbl> 5, 2, 4, 4, 4, 2, 2, 4, 5, 4, 4, 4, 3, 3, 5, 5, 4, ~
## $ Q24_1 <dbl+lbl> 4, 4, 5, 4, 5, 2, 4, 5, 5, 4, 4, 4, 5, 2, 5, 4, 4, ~
## $ Q24_2 <dbl+lbl> 4, 5, 5, 5, 5, 2, 5, 5, 5, 4, 5, 4, 5, 3, 5, 5, 5, ~
## $ Q24_3 <dbl+lbl> 5, 5, 5, 5, 5, 2, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q24_4 <dbl+lbl> 5, 5, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q25_1 <dbl+lbl> 5, 5, 5, 5, 5, 2, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q25_2 <dbl+lbl> 5, 5, 5, 5, 5, 3, 5, 5, 5, 4, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q25_3 <dbl+lbl> 5, 5, 5, 5, 4, 2, 4, 3, 5, 4, 5, 4, 5, 4, 5, 5, 5, ~
## $ Q25_4 <dbl+lbl> 5, 5, 5, 5, 4, 4, 5, 5, 5, 4, 5, 4, 5, 4, 5, 5, 5, ~
## $ Q32 <dbl+lbl> 3, 4, 1, 7, 1, 8, 3, 5, 2, 6, 3, 2, 5,~
## $ Q20 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Age <dbl> 20, 19, 22, 46, 32, 19, 49, 31, 51, 21, 18, 21, 25, 34,~
## $ Gender <dbl+lbl> 1, 1, 2, 2, 1, 1, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 2, ~
## $ Household_size <dbl> 4, 4, 4, 3, 3, 3, 1, 5, 3, 3, 4, 3, 2, 4, 5, 1, 2, 1, 2~
## $ Education <dbl+lbl> 3, 2, 4, 5, 4, 2, 4, 4, 4, 3, 3, 3, 4, 4, 4, 4, 4, ~
## $ Income <dbl+lbl> 12, 3, 12, 6, 9, 1, 3, 7, 2, 6, 10, 8, 6,~
## $ ve <dbl> 4.000000, 3.666667, 5.000000, 4.666667, 4.666667, 2.000~
## $ sd <dbl> 4, 4, 5, 4, 5, 2, 4, 5, 5, 4, 4, 4, 5, 2, 5, 4, 4, 5, 4~
## $ q <dbl> 5, 5, 5, 5, 5, 3, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5~
## $ i <dbl> 5.0, 5.0, 5.0, 5.0, 5.0, 2.5, 5.0, 5.0, 5.0, 4.0, 5.0, ~
## $ gov <dbl> 5.0, 5.0, 5.0, 5.0, 4.5, 3.0, 4.5, 4.0, 5.0, 4.5, 5.0, ~
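As an aside, since the ‘here’ package was loaded to help with file paths, the same file could also have been read with a more portable path; a sketch, assuming the project root contains the “PSYC3361 Project” folder:
Study1USA <- read_sav(here("PSYC3361 Project", "COVID paper data+syntax", "Study1USA.sav"))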
Upon looking at the value labels for the “Gender” variable in the SPSS dataset, we determined that “1” stood for “Male” and “2” stood for “Female”. We then used the “tabyl()” function to reproduce the percentages of males and females in the Study 1 U.S. sample that were reported on page 1365 of the paper.
Following this, we used the “mean()” and “sd()” functions to reproduce the average age of the participants and the standard deviation.
# Demographic descriptives
Study1USA %>%
tabyl(Gender) %>%
adorn_pct_formatting()
## Gender n percent
## 1 172 53.4%
## 2 147 45.7%
## 3 3 0.9%
mean(Study1USA$Age)
## [1] 33.3323
sd(Study1USA$Age)
## [1] 13.00269
Next, we reproduced the summary statistics that were reported on page 1365. Firstly, we had to determine which survey items were used to measure the participants’ empathy and physical distancing practices. By looking at the Supplementary Materials and the SPSS data, we found that the empathy measure consisted of three items, which were labelled Q22_1, Q22_3 and Q22_5. Since the paper reported only one mean and one standard deviation for empathy, reproducing the summary statistics for this measure was a bit complicated, because we could not simply compute the average of each of the three items separately.
However, we realised that we could use the “mutate()” function to create a new variable (that we titled “empathy”) that summed the three empathy items together. After this, we created another variable called “empathy_avg” that divided our previous “empathy” variable by three in order to calculate the average empathy score for each participant. Then we used the “mean()” and “sd()” function and found that we successfully reproduced the summary statistics for the empathy measure.
On the other hand, only one survey item was used to measure physical distancing, so the process of reproducing the summary statistics for this measure was a lot easier.
# Summary statistics (means/SD)
mean(Study1USA$Q24_1) # physical distancing mean
## [1] 4.298137
sd(Study1USA$Q24_1) # physical distancing standard deviation
## [1] 0.9912057
Study1USA <- mutate(Study1USA, empathy = Q22_1+Q22_3+Q22_5)
print(Study1USA) # combined empathy items
## # A tibble: 322 x 28
## Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q22_6 Q22_7 Q24_1 Q24_2
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 4 4 4 1 [stro~ 4 5 [stro~ 5 [str~ 4 4
## 2 4 3 4 1 [stro~ 3 5 [stro~ 2 4 5 [str~
## 3 5 [stro~ 5 [stro~ 5 [stro~ 4 5 [stro~ 5 [stro~ 4 5 [str~ 5 [str~
## 4 5 [stro~ 5 [stro~ 5 [stro~ 4 4 5 [stro~ 4 4 5 [str~
## 5 5 [stro~ 4 5 [stro~ 3 4 5 [stro~ 4 5 [str~ 5 [str~
## 6 2 2 2 2 2 5 [stro~ 2 2 2
## 7 5 [stro~ 5 [stro~ 5 [stro~ 2 5 [stro~ 5 [stro~ 2 4 5 [str~
## 8 1 [stro~ 5 [stro~ 3 1 [stro~ 1 [stro~ 5 [stro~ 4 5 [str~ 5 [str~
## 9 5 [stro~ 5 [stro~ 5 [stro~ 4 5 [stro~ 5 [stro~ 5 [str~ 5 [str~ 5 [str~
## 10 4 5 [stro~ 4 3 4 5 [stro~ 4 4 4
## # ... with 312 more rows, and 19 more variables: Q24_3 <dbl+lbl>,
## # Q24_4 <dbl+lbl>, Q25_1 <dbl+lbl>, Q25_2 <dbl+lbl>, Q25_3 <dbl+lbl>,
## # Q25_4 <dbl+lbl>, Q32 <dbl+lbl>, Q20 <dbl+lbl>, Age <dbl>, Gender <dbl+lbl>,
## # Household_size <dbl>, Education <dbl+lbl>, Income <dbl+lbl>, ve <dbl>,
## # sd <dbl>, q <dbl>, i <dbl>, gov <dbl>, empathy <dbl>
Study1USA <- mutate(Study1USA, empathy_avg = empathy/3) # average of combined empathy items
print(Study1USA)
## # A tibble: 322 x 29
## Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q22_6 Q22_7 Q24_1 Q24_2
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 4 4 4 1 [stro~ 4 5 [stro~ 5 [str~ 4 4
## 2 4 3 4 1 [stro~ 3 5 [stro~ 2 4 5 [str~
## 3 5 [stro~ 5 [stro~ 5 [stro~ 4 5 [stro~ 5 [stro~ 4 5 [str~ 5 [str~
## 4 5 [stro~ 5 [stro~ 5 [stro~ 4 4 5 [stro~ 4 4 5 [str~
## 5 5 [stro~ 4 5 [stro~ 3 4 5 [stro~ 4 5 [str~ 5 [str~
## 6 2 2 2 2 2 5 [stro~ 2 2 2
## 7 5 [stro~ 5 [stro~ 5 [stro~ 2 5 [stro~ 5 [stro~ 2 4 5 [str~
## 8 1 [stro~ 5 [stro~ 3 1 [stro~ 1 [stro~ 5 [stro~ 4 5 [str~ 5 [str~
## 9 5 [stro~ 5 [stro~ 5 [stro~ 4 5 [stro~ 5 [stro~ 5 [str~ 5 [str~ 5 [str~
## 10 4 5 [stro~ 4 3 4 5 [stro~ 4 4 4
## # ... with 312 more rows, and 20 more variables: Q24_3 <dbl+lbl>,
## # Q24_4 <dbl+lbl>, Q25_1 <dbl+lbl>, Q25_2 <dbl+lbl>, Q25_3 <dbl+lbl>,
## # Q25_4 <dbl+lbl>, Q32 <dbl+lbl>, Q20 <dbl+lbl>, Age <dbl>, Gender <dbl+lbl>,
## # Household_size <dbl>, Education <dbl+lbl>, Income <dbl+lbl>, ve <dbl>,
## # sd <dbl>, q <dbl>, i <dbl>, gov <dbl>, empathy <dbl>, empathy_avg <dbl>
mean(Study1USA$empathy_avg) # overall empathy average
## [1] 4.459627
sd(Study1USA$empathy_avg) # overall empathy standard deviation
## [1] 0.7427453
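In hindsight, the same per-participant average can be computed in a single “mutate()” call; a sketch that is equivalent to our two-step approach:
Study1USA <- mutate(Study1USA, empathy_avg = (Q22_1 + Q22_3 + Q22_5) / 3) # one-step empathy average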
We then had to reproduce the descriptive and summary statistics for the United Kingdom and German samples, which were also reported on page 1365. Luckily, this was not a very time-consuming task, as the same items were used across all three samples. So apart from the German sample having a slightly different label for the physical distancing item, we were able to use the same coding steps that we used for the United States sample.
# Read data
Study1UK <- read_sav("PSYC3361 Project/COVID paper data+syntax/Study1UK.sav")
glimpse(Study1UK)
## Rows: 317
## Columns: 28
## $ Q41 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Q42 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Q22_1 <dbl+lbl> 4, 4, 5, 5, 5, 5, 5, 4, 5, 5, 4, 5, 4, 5, 4, 5, 5, ~
## $ Q22_2 <dbl+lbl> 2, 4, 4, 3, 4, 5, 5, 5, 4, 4, 4, 4, 5, 3, 5, 5, 4, ~
## $ Q22_3 <dbl+lbl> 5, 4, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 4, 5, 5, 5, 5, ~
## $ Q22_4 <dbl+lbl> 1, 4, 4, 1, 2, 4, 3, 3, 2, 3, 4, 4, 5, 3, 2, 5, 2, ~
## $ Q22_5 <dbl+lbl> 5, 4, 5, 4, 3, 4, 3, 4, 5, 5, 4, 4, 4, 5, 4, 5, 4, ~
## $ Q22_6 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q22_7 <dbl+lbl> 3, 5, 3, 3, 4, 2, 5, 3, 4, 5, 3, 4, 4, 4, 3, 5, 4, ~
## $ Q24_1 <dbl+lbl> 3, 5, 5, 4, 3, 5, 4, 4, 4, 3, 4, 4, 3, 2, 2, 5, 5, ~
## $ Q24_2 <dbl+lbl> 3, 5, 5, 4, 4, 5, 4, 5, 4, 4, 4, 4, 5, 2, 3, 5, 5, ~
## $ Q24_3 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, 5, ~
## $ Q24_4 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 5, ~
## $ Q25_1 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 2, 5, ~
## $ Q25_2 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 4, 5, ~
## $ Q25_3 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 5, 4, 5, 4, 3, 4, 5, ~
## $ Q25_4 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 4, 4, 4, 5, 4, 5, 4, 4, 4, 5, ~
## $ Q20 <dbl+lbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, ~
## $ Age <dbl> 24, 31, 31, 33, 34, 21, 49, 27, 33, 45, 39, 39, 54, 53,~
## $ Gender <dbl+lbl> 1, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2, 2, 2, 1, 1, 2, ~
## $ Household_size <dbl> 7, 4, 2, 3, 4, 2, 4, 3, 3, 4, 4, 2, 2, 3, 1, 2, 4, 2, 3~
## $ Education <dbl+lbl> 4, 3, 5, 4, 4, 3, 2, 4, 4, 3, 4, 4, 3, 2, 5, 5, 3, ~
## $ Income <dbl+lbl> 5, 5, 11, 5, 2, 2, 4, 4, 11, 4, 9, 3, 4,~
## $ ve <dbl> 4.666667, 4.000000, 5.000000, 4.666667, 4.333333, 4.666~
## $ sd <dbl> 3, 5, 5, 4, 3, 5, 4, 4, 4, 3, 4, 4, 3, 2, 2, 5, 5, 4, 5~
## $ q <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 5, 5, 5~
## $ i <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 3, 5, 5, 5~
## $ gov <dbl> 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 4.5, 4.5, 4.5, 5.0, ~
# Demographic descriptives
Study1UK %>%
tabyl(Gender) %>%
adorn_pct_formatting()
## Gender n percent
## 1 128 40.4%
## 2 188 59.3%
## 3 1 0.3%
mean(Study1UK$Age)
## [1] 38.05363
sd(Study1UK$Age)
## [1] 12.19879
# Summary statistics (means/SD)
mean(Study1UK$Q24_1) # physical distancing mean
## [1] 4.123028
sd(Study1UK$Q24_1) # physical distancing standard deviation
## [1] 1.009765
Study1UK <- mutate(Study1UK, empathy = Q22_1+Q22_3+Q22_5)
print(Study1UK) # combining empathy variables
## # A tibble: 317 x 29
## Q41 Q42 Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q22_6 Q22_7
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 1 [Yes] 1 [Yes] 4 2 5 [stro~ 1 [stro~ 5 [str~ 5 [str~ 3
## 2 1 [Yes] 1 [Yes] 4 4 4 4 4 5 [str~ 5 [str~
## 3 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 4 5 [str~ 5 [str~ 3
## 4 1 [Yes] 1 [Yes] 5 [stro~ 3 5 [stro~ 1 [stro~ 4 5 [str~ 3
## 5 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 2 3 5 [str~ 4
## 6 1 [Yes] 1 [Yes] 5 [stro~ 5 [stro~ 5 [stro~ 4 4 5 [str~ 2
## 7 1 [Yes] 1 [Yes] 5 [stro~ 5 [stro~ 5 [stro~ 3 3 5 [str~ 5 [str~
## 8 1 [Yes] 1 [Yes] 4 5 [stro~ 5 [stro~ 3 4 5 [str~ 3
## 9 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 2 5 [str~ 5 [str~ 4
## 10 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 3 5 [str~ 5 [str~ 5 [str~
## # ... with 307 more rows, and 20 more variables: Q24_1 <dbl+lbl>,
## # Q24_2 <dbl+lbl>, Q24_3 <dbl+lbl>, Q24_4 <dbl+lbl>, Q25_1 <dbl+lbl>,
## # Q25_2 <dbl+lbl>, Q25_3 <dbl+lbl>, Q25_4 <dbl+lbl>, Q20 <dbl+lbl>,
## # Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>, Education <dbl+lbl>,
## # Income <dbl+lbl>, ve <dbl>, sd <dbl>, q <dbl>, i <dbl>, gov <dbl>,
## # empathy <dbl>
Study1UK <- mutate(Study1UK, empathy_avg = empathy/3) # average of combined empathy items
print(Study1UK)
## # A tibble: 317 x 30
## Q41 Q42 Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q22_6 Q22_7
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 1 [Yes] 1 [Yes] 4 2 5 [stro~ 1 [stro~ 5 [str~ 5 [str~ 3
## 2 1 [Yes] 1 [Yes] 4 4 4 4 4 5 [str~ 5 [str~
## 3 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 4 5 [str~ 5 [str~ 3
## 4 1 [Yes] 1 [Yes] 5 [stro~ 3 5 [stro~ 1 [stro~ 4 5 [str~ 3
## 5 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 2 3 5 [str~ 4
## 6 1 [Yes] 1 [Yes] 5 [stro~ 5 [stro~ 5 [stro~ 4 4 5 [str~ 2
## 7 1 [Yes] 1 [Yes] 5 [stro~ 5 [stro~ 5 [stro~ 3 3 5 [str~ 5 [str~
## 8 1 [Yes] 1 [Yes] 4 5 [stro~ 5 [stro~ 3 4 5 [str~ 3
## 9 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 2 5 [str~ 5 [str~ 4
## 10 1 [Yes] 1 [Yes] 5 [stro~ 4 5 [stro~ 3 5 [str~ 5 [str~ 5 [str~
## # ... with 307 more rows, and 21 more variables: Q24_1 <dbl+lbl>,
## # Q24_2 <dbl+lbl>, Q24_3 <dbl+lbl>, Q24_4 <dbl+lbl>, Q25_1 <dbl+lbl>,
## # Q25_2 <dbl+lbl>, Q25_3 <dbl+lbl>, Q25_4 <dbl+lbl>, Q20 <dbl+lbl>,
## # Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>, Education <dbl+lbl>,
## # Income <dbl+lbl>, ve <dbl>, sd <dbl>, q <dbl>, i <dbl>, gov <dbl>,
## # empathy <dbl>, empathy_avg <dbl>
mean(Study1UK$empathy_avg) # overall empathy average
## [1] 4.560463
sd(Study1UK$empathy_avg) # overall empathy standard deviation
## [1] 0.6124644
# Read data
Study1Germany <- read_sav("PSYC3361 Project/Study1Germany.sav")
glimpse(Study1Germany)
## Rows: 326
## Columns: 29
## $ Q22_1 <dbl+lbl> 4, 5, 3, 2, 5, 4, 4, 4, 5, 2, 2, 4, 5, 5, 5, 4, 3, ~
## $ Q22_2 <dbl+lbl> 29, 32, 32, 32, 30, 32, 30, 29, 31, 29, 30, 30, 32,~
## $ Q22_3 <dbl+lbl> 4, 5, 4, 3, 3, 3, 5, 5, 5, 3, 3, 4, 2, 5, 4, 2, 3, ~
## $ Q22_4 <dbl+lbl> 29, 30, 28, 32, 29, 30, 30, 29, 29, 28, 28, 29, 29,~
## $ Q22_5 <dbl+lbl> 4, 5, 3, 3, 5, 2, 4, 4, 5, 3, 2, 4, 5, 5, 5, 2, 4, ~
## $ Q22_6 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Q22_7 <dbl+lbl> 29, 30, 32, 30, 32, 29, 29, 30, 31, 30, 32, 30, 29,~
## $ Q77_1 <dbl+lbl> 5, 4, 3, 5, 5, 1, 3, 5, 5, 1, 4, 4, 5, 4, 3, 5, 3, ~
## $ Q77_2 <dbl+lbl> 4, 4, 4, 5, 5, 4, 3, 5, 5, 2, 4, 4, 5, 5, 5, 5, 4, ~
## $ Q77_3 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q77_4 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q78_1 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q78_2 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 3, 5, ~
## $ Q78_3 <dbl+lbl> 4, 5, 4, 5, 5, 4, 4, 5, 5, 1, 4, 4, 5, 5, 4, 4, 4, ~
## $ Q78_4 <dbl+lbl> 4, 5, 4, 5, 5, 4, 4, 5, 5, 3, 4, 4, 5, 5, 4, 5, 4, ~
## $ Q20 <dbl+lbl> 1, 1, 4, 4, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Age <dbl> 18, 29, 18, 32, 37, 39, 20, 22, 39, 22, 32, 29, 26, 24,~
## $ Gender <dbl+lbl> 1, 2, 1, 2, 2, 1, 1, 2, 2, 1, 1, 1, 2, 2, 1, 1, 1, ~
## $ Household_size <dbl> 2, 1, 4, 2, 1, 1, 3, 4, 2, 2, 2, 3, 2, 1, 3, 2, 1, 2, 1~
## $ Q21 <chr> "5b3d0c7477cb3a00019c1bb3", "5d9739389e5b9a0018820f68",~
## $ Q28 <chr> "Pizza", "Pizza", "Schweinefleisch mit Pilzen", "Brot u~
## $ `filter_$` <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ ve <dbl> 4.000000, 5.000000, 3.333333, 2.666667, 4.333333, 3.000~
## $ sd <dbl> 5, 4, 3, 5, 5, 1, 3, 5, 5, 1, 4, 4, 5, 4, 3, 5, 3, 5, 5~
## $ q <dbl> 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, 5.0, ~
## $ i <dbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 4, 5, 5, 5~
## $ Zve <dbl> -0.01877901, 1.06156624, -0.73900918, -1.45923934, 0.34~
## $ Zsd <dbl> -0.5627027, 0.8062606, -0.5627027, 0.8062606, 0.8062606~
## $ gov <dbl> 4.0, 5.0, 4.0, 5.0, 5.0, 4.0, 4.0, 5.0, 5.0, 2.0, 4.0, ~
# Demographic descriptives
Study1Germany %>%
tabyl(Gender) %>%
adorn_pct_formatting()
## Gender n percent
## 1 172 52.8%
## 2 152 46.6%
## 3 2 0.6%
mean(Study1Germany$Age) # mean age
## [1] 29.44172
sd(Study1Germany$Age) # standard deviation for age
## [1] 9.312092
# Summary statistics (means/SD)
Study1Germany <- mutate(Study1Germany, empathy = Q22_1+Q22_3+Q22_5)
print(Study1Germany) # combining empathy items
## # A tibble: 326 x 30
## Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q22_6 Q22_7 Q77_1 Q77_2 Q77_3
## <dbl+lb> <dbl+> <dbl+lb> <dbl+> <dbl+l> <dbl+l> <dbl> <dbl+l> <dbl+l> <dbl+l>
## 1 4 29 4 29 4 1 [str~ 29 5 [str~ 4 5 [str~
## 2 5 [stro~ 32 5 [stro~ 30 5 [str~ 1 [str~ 30 4 4 5 [str~
## 3 3 32 4 28 3 1 [str~ 32 3 4 5 [str~
## 4 2 32 3 32 3 1 [str~ 30 5 [str~ 5 [str~ 5 [str~
## 5 5 [stro~ 30 3 29 5 [str~ 1 [str~ 32 5 [str~ 5 [str~ 5 [str~
## 6 4 32 3 30 2 1 [str~ 29 1 [str~ 4 5 [str~
## 7 4 30 5 [stro~ 30 4 1 [str~ 29 3 3 5 [str~
## 8 4 29 5 [stro~ 29 4 1 [str~ 30 5 [str~ 5 [str~ 5 [str~
## 9 5 [stro~ 31 5 [stro~ 29 5 [str~ 1 [str~ 31 5 [str~ 5 [str~ 5 [str~
## 10 2 29 3 28 3 1 [str~ 30 1 [str~ 2 5 [str~
## # ... with 316 more rows, and 20 more variables: Q77_4 <dbl+lbl>,
## # Q78_1 <dbl+lbl>, Q78_2 <dbl+lbl>, Q78_3 <dbl+lbl>, Q78_4 <dbl+lbl>,
## # Q20 <dbl+lbl>, Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>,
## # Q21 <chr>, Q28 <chr>, filter_$ <dbl+lbl>, ve <dbl>, sd <dbl>, q <dbl>,
## # i <dbl>, Zve <dbl>, Zsd <dbl>, gov <dbl>, empathy <dbl>
Study1Germany <- mutate(Study1Germany, empathy_avg = empathy/3)
print(Study1Germany) # average of combined empathy items
## # A tibble: 326 x 31
## Q22_1 Q22_2 Q22_3 Q22_4 Q22_5 Q22_6 Q22_7 Q77_1 Q77_2 Q77_3
## <dbl+lb> <dbl+> <dbl+lb> <dbl+> <dbl+l> <dbl+l> <dbl> <dbl+l> <dbl+l> <dbl+l>
## 1 4 29 4 29 4 1 [str~ 29 5 [str~ 4 5 [str~
## 2 5 [stro~ 32 5 [stro~ 30 5 [str~ 1 [str~ 30 4 4 5 [str~
## 3 3 32 4 28 3 1 [str~ 32 3 4 5 [str~
## 4 2 32 3 32 3 1 [str~ 30 5 [str~ 5 [str~ 5 [str~
## 5 5 [stro~ 30 3 29 5 [str~ 1 [str~ 32 5 [str~ 5 [str~ 5 [str~
## 6 4 32 3 30 2 1 [str~ 29 1 [str~ 4 5 [str~
## 7 4 30 5 [stro~ 30 4 1 [str~ 29 3 3 5 [str~
## 8 4 29 5 [stro~ 29 4 1 [str~ 30 5 [str~ 5 [str~ 5 [str~
## 9 5 [stro~ 31 5 [stro~ 29 5 [str~ 1 [str~ 31 5 [str~ 5 [str~ 5 [str~
## 10 2 29 3 28 3 1 [str~ 30 1 [str~ 2 5 [str~
## # ... with 316 more rows, and 21 more variables: Q77_4 <dbl+lbl>,
## # Q78_1 <dbl+lbl>, Q78_2 <dbl+lbl>, Q78_3 <dbl+lbl>, Q78_4 <dbl+lbl>,
## # Q20 <dbl+lbl>, Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>,
## # Q21 <chr>, Q28 <chr>, filter_$ <dbl+lbl>, ve <dbl>, sd <dbl>, q <dbl>,
## # i <dbl>, Zve <dbl>, Zsd <dbl>, gov <dbl>, empathy <dbl>, empathy_avg <dbl>
mean(Study1Germany$empathy_avg) # overall empathy mean
## [1] 4.017382
sd(Study1Germany$empathy_avg) # overall empathy standard deviation
## [1] 0.92563
mean(Study1Germany$Q77_1) # physical distancing mean
## [1] 4.039877
sd(Study1Germany$Q77_1) # physical distancing standard deviation
## [1] 1.112836
Like Study 1, Study 2 was a correlational study with the same measurement of empathy. Unlike Study 1, however, Study 2 used only a German sample (N = 359). Another point of difference was that physical distancing was assessed in a more concrete way, by having the participants rate five statements (instead of just one) on a Likert scale that ranged from 1 (very unlikely) to 5 (very likely).
# Read data
Study2 <- read_sav("PSYC3361 Project/Study2.sav")
glimpse(Study2)
## Rows: 359
## Columns: 20
## $ Q36_1 <dbl+lbl> 5, 5, 5, 4, 2, 2, 4, 4, 5, 4, 3, 5, 4, 4, 5, 3, 5, ~
## $ Q36_2 <dbl+lbl> 3, 3, 5, 3, 3, 4, 5, 3, 4, 3, 5, 3, 4, 4, 5, 4, 3, ~
## $ Q36_3 <dbl+lbl> 5, 5, 5, 4, 2, 2, 4, 4, 5, 4, 3, 5, 4, 2, 5, 3, 5, ~
## $ Q36_4 <dbl+lbl> 3, 1, 5, 3, 2, 4, 2, 1, 2, 1, 5, 3, 4, 4, 5, 3, 3, ~
## $ Q36_5 <dbl+lbl> 5, 5, 5, 4, 2, 2, 5, 4, 5, 4, 3, 5, 4, 4, 5, 2, 5, ~
## $ Q36_6 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Q36_7 <dbl+lbl> 3, 4, 3, 3, 2, 3, 3, 1, 4, 3, 4, 4, 2, 4, 3, 4, 2, ~
## $ Q22_1 <dbl+lbl> 1, 1, 2, 1, 4, 1, 1, 2, 1, 1, 1, 1, 1, 1, 2, 1, 1, ~
## $ Q22_2 <dbl+lbl> 1, 1, 2, 1, 3, 1, 1, 1, 1, 1, 2, 1, 1, 1, 2, 1, 1, ~
## $ Q22_3 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Q22_4 <dbl+lbl> 1, 1, 2, 1, 4, 1, 1, 1, 1, 5, 1, 1, 1, 1, 2, 1, 1, ~
## $ Q22_5 <dbl+lbl> 1, 1, 1, 1, 2, 1, 1, 2, 1, 1, 5, 1, 1, 1, 1, 1, 1, ~
## $ Q22_6 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q20 <dbl+lbl> 3, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, 4, 1, 1, 1, 1, 1, ~
## $ Age <dbl> 27, 36, 26, 32, 29, 32, 37, 23, 27, 32, 34, 28, 57, 44,~
## $ Gender <dbl+lbl> 1, 1, 1, 1, 1, 2, 1, 2, 1, 1, 2, 2, 2, 2, 1, 1, 2, ~
## $ Household_size <dbl> 1, 4, 1, 1, 1, 2, 3, 3, 3, 4, 3, 1, 1, 3, 1, 5, 2, 2, 4~
## $ ve <dbl> 5.000000, 5.000000, 5.000000, 4.000000, 2.000000, 2.000~
## $ sd <dbl> 1.0, 1.0, 1.6, 1.0, 2.8, 1.0, 1.0, 1.4, 1.0, 1.8, 2.0, ~
## $ sdR <dbl> 5.0, 5.0, 4.4, 5.0, 3.2, 5.0, 5.0, 4.6, 5.0, 4.2, 4.0, ~
We used the steps we took in Study 1 to obtain the demographic descriptives for Study 2 that were reported on page 1365. The value labels for each gender were the same as those in Study 1.
# Demographic descriptives
Study2 %>%
tabyl(Gender) %>%
adorn_pct_formatting()
## Gender n percent
## 1 184 51.3%
## 2 174 48.5%
## 3 1 0.3%
mean(Study2$Age)
## [1] 29.75209
sd(Study2$Age)
## [1] 9.402682
However, obtaining the summary statistics for Study 2 (also reported on page 1365) was a bit more challenging than for Study 1. As shown below, averaging the physical distancing items in the way we averaged the empathy items in Study 1 produced the right value for the standard deviation, but the wrong value for the physical distancing mean.
### Attempt at reproducing the summary statistics
Study2 <- mutate(Study2, physical_distancing = Q22_1 + Q22_2 + Q22_3 + Q22_4 + Q22_5)
print(Study2) # combining physical distancing variables
## # A tibble: 359 x 21
## Q36_1 Q36_2 Q36_3 Q36_4 Q36_5 Q36_6 Q36_7 Q22_1 Q22_2
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 5 [stro~ 3 5 [stro~ 3 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 2 5 [stro~ 3 5 [stro~ 1 [stro~ 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 3 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 1 [stro~ 3 2 2
## 4 4 3 4 3 4 1 [stro~ 3 1 [ver~ 1 [ver~
## 5 2 3 2 2 2 1 [stro~ 2 4 3
## 6 2 4 2 4 2 1 [stro~ 3 1 [ver~ 1 [ver~
## 7 4 5 [stro~ 4 2 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 8 4 3 4 1 [stro~ 4 1 [stro~ 1 [str~ 2 1 [ver~
## 9 5 [stro~ 4 5 [stro~ 2 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 10 4 3 4 1 [stro~ 4 1 [stro~ 3 1 [ver~ 1 [ver~
## # ... with 349 more rows, and 12 more variables: Q22_3 <dbl+lbl>,
## # Q22_4 <dbl+lbl>, Q22_5 <dbl+lbl>, Q22_6 <dbl+lbl>, Q20 <dbl+lbl>,
## # Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>, ve <dbl>, sd <dbl>,
## # sdR <dbl>, physical_distancing <dbl>
Study2 <- mutate(Study2, physical_distancing_avg = physical_distancing/5)
print(Study2) # averaging total for physical distancing
## # A tibble: 359 x 22
## Q36_1 Q36_2 Q36_3 Q36_4 Q36_5 Q36_6 Q36_7 Q22_1 Q22_2
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 5 [stro~ 3 5 [stro~ 3 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 2 5 [stro~ 3 5 [stro~ 1 [stro~ 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 3 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 1 [stro~ 3 2 2
## 4 4 3 4 3 4 1 [stro~ 3 1 [ver~ 1 [ver~
## 5 2 3 2 2 2 1 [stro~ 2 4 3
## 6 2 4 2 4 2 1 [stro~ 3 1 [ver~ 1 [ver~
## 7 4 5 [stro~ 4 2 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 8 4 3 4 1 [stro~ 4 1 [stro~ 1 [str~ 2 1 [ver~
## 9 5 [stro~ 4 5 [stro~ 2 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 10 4 3 4 1 [stro~ 4 1 [stro~ 3 1 [ver~ 1 [ver~
## # ... with 349 more rows, and 13 more variables: Q22_3 <dbl+lbl>,
## # Q22_4 <dbl+lbl>, Q22_5 <dbl+lbl>, Q22_6 <dbl+lbl>, Q20 <dbl+lbl>,
## # Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>, ve <dbl>, sd <dbl>,
## # sdR <dbl>, physical_distancing <dbl>, physical_distancing_avg <dbl>
mean(Study2$physical_distancing_avg) # overall physical distancing mean -> incorrect value
## [1] 1.44234
sd(Study2$physical_distancing_avg) # overall physical distancing standard deviation -> correct value
## [1] 0.6484973
Until we attended one of the Q and A sessions, we did not realise that there was a variable called “sdR” in our dataset that averaged the physical distancing items whilst accounting for the fact that the authors had re-coded the values so that higher values reflected more physical distancing. So after we replaced the “physical_distancing_avg” variable we had created with the “sdR” variable, we successfully reproduced the summary statistics for the physical distancing measure.
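The re-coding also explains why our standard deviation matched but our mean did not: reversing a 1-5 item (replacing each response x with 6 - x) flips the mean around the midpoint of the scale while leaving the spread unchanged. A minimal sketch of reproducing “sdR” ourselves (the variable name “physical_distancing_avg_r” is ours):
# reverse-code the five 1-5 items and average them; this should reproduce sdR
Study2 <- mutate(Study2, physical_distancing_avg_r =
((6 - Q22_1) + (6 - Q22_2) + (6 - Q22_3) + (6 - Q22_4) + (6 - Q22_5)) / 5)
mean(Study2$physical_distancing_avg_r) # should match mean(Study2$sdR)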
Reproducing the summary statistics for the empathy measure was much easier, as we were able to use the exact same coding technique as Study 1.
# Summary statistics (means/SD)
mean(Study2$sdR) # physical distancing mean
## [1] 4.55766
sd(Study2$sdR) # physical distancing standard deviation
## [1] 0.6484973
Study2 <- mutate(Study2, empathy = Q36_1 + Q36_3 + Q36_5)
print(Study2) # Combining empathy items
## # A tibble: 359 x 23
## Q36_1 Q36_2 Q36_3 Q36_4 Q36_5 Q36_6 Q36_7 Q22_1 Q22_2
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 5 [stro~ 3 5 [stro~ 3 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 2 5 [stro~ 3 5 [stro~ 1 [stro~ 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 3 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 1 [stro~ 3 2 2
## 4 4 3 4 3 4 1 [stro~ 3 1 [ver~ 1 [ver~
## 5 2 3 2 2 2 1 [stro~ 2 4 3
## 6 2 4 2 4 2 1 [stro~ 3 1 [ver~ 1 [ver~
## 7 4 5 [stro~ 4 2 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 8 4 3 4 1 [stro~ 4 1 [stro~ 1 [str~ 2 1 [ver~
## 9 5 [stro~ 4 5 [stro~ 2 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 10 4 3 4 1 [stro~ 4 1 [stro~ 3 1 [ver~ 1 [ver~
## # ... with 349 more rows, and 14 more variables: Q22_3 <dbl+lbl>,
## # Q22_4 <dbl+lbl>, Q22_5 <dbl+lbl>, Q22_6 <dbl+lbl>, Q20 <dbl+lbl>,
## # Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>, ve <dbl>, sd <dbl>,
## # sdR <dbl>, physical_distancing <dbl>, physical_distancing_avg <dbl>,
## # empathy <dbl>
Study2 <- mutate(Study2, empathy_avg = empathy/3)
print(Study2) # average of combined empathy items
## # A tibble: 359 x 24
## Q36_1 Q36_2 Q36_3 Q36_4 Q36_5 Q36_6 Q36_7 Q22_1 Q22_2
## <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+lb> <dbl+l> <dbl+l> <dbl+l>
## 1 5 [stro~ 3 5 [stro~ 3 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 2 5 [stro~ 3 5 [stro~ 1 [stro~ 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 3 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 5 [stro~ 1 [stro~ 3 2 2
## 4 4 3 4 3 4 1 [stro~ 3 1 [ver~ 1 [ver~
## 5 2 3 2 2 2 1 [stro~ 2 4 3
## 6 2 4 2 4 2 1 [stro~ 3 1 [ver~ 1 [ver~
## 7 4 5 [stro~ 4 2 5 [stro~ 1 [stro~ 3 1 [ver~ 1 [ver~
## 8 4 3 4 1 [stro~ 4 1 [stro~ 1 [str~ 2 1 [ver~
## 9 5 [stro~ 4 5 [stro~ 2 5 [stro~ 1 [stro~ 4 1 [ver~ 1 [ver~
## 10 4 3 4 1 [stro~ 4 1 [stro~ 3 1 [ver~ 1 [ver~
## # ... with 349 more rows, and 15 more variables: Q22_3 <dbl+lbl>,
## # Q22_4 <dbl+lbl>, Q22_5 <dbl+lbl>, Q22_6 <dbl+lbl>, Q20 <dbl+lbl>,
## # Age <dbl>, Gender <dbl+lbl>, Household_size <dbl>, ve <dbl>, sd <dbl>,
## # sdR <dbl>, physical_distancing <dbl>, physical_distancing_avg <dbl>,
## # empathy <dbl>, empathy_avg <dbl>
mean(Study2$empathy_avg) # overall empathy mean
## [1] 4.052925
sd(Study2$empathy_avg) # overall empathy standard deviation
## [1] 0.9414831
The plot for studies 1 and 2 that was published on page 1366 consisted of four line graphs that each included a 95% confidence interval and compared the empathy of the participants against their tendency to engage in physical distancing.
The idea of reproducing this plot was absolutely horrifying, but using the “ggplot2” package to create a separate plot for each of the four samples made this task a lot less overwhelming.
From examining the Supplementary Materials, we discovered that the variables we needed for the plot were “ve”, which represented average empathy, and “sd”, which represented average physical distancing.
For the aesthetics, we used the “theme_classic()” function to remove the grid background and the “geom_smooth()” function to obtain the line of best fit.
ggplot(data = Study1Germany, aes(x = ve, y = sd)) +
geom_smooth() + labs(title = "Study 1: German Sample") +
theme_classic()
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
The above chunk was our original code for the Study1Germany plot, but when we first knitted it, we discovered that there was more work to do. After a lot of troubleshooting, we figured out that we needed to specify a linear model in order to make the line straight rather than curvy.
Additionally, we removed the individual x and y-axis labels with the “theme()” function because the published combined graph only had one common label on each axis. We also added other customisations such as the colour black for the line of best fit and limits on the y-axis so that all the plots aligned.
Once we had the final code for the Study1Germany plot, we repeated the process for the Study1UK, Study1USA, and Study2 plots.
Finally, after Jenny suggested that we use the “grid.arrange()” function during one of our Q and A sessions, we were able to successfully re-create the combined plot by specifying one row and adding common x and y-axis labels.
# creating line graphs for each of the samples from Studies 1 and 2
p1germany <- ggplot(data = Study1Germany, aes(x = ve, y = sd)) +
geom_smooth(method = lm, colour = "black") +
labs(title = " Study 1: \nGerman Sample") + theme_classic() +
theme(axis.title.x = element_blank(), # removing individual x and y-axis labels
axis.title.y = element_blank()) + ylim(1, 5) # adding limits to y-axis
p2uk <- ggplot(data = Study1UK, aes(x = ve, y = sd)) +
geom_smooth(method = lm, colour = "black") +
labs(title = " Study 1: \nUK Sample") + theme_classic() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank()) + xlim(1, 5) + ylim(1, 5) # adding limits to both axes
p3usa <- ggplot(data = Study1USA, aes(x = ve, y = sd)) +
geom_smooth(method = lm, colour = "black") +
labs(title = " Study 1: \nU.S. Sample") +
theme_classic() + theme(axis.title.x = element_blank(),
axis.title.y = element_blank()) + ylim(1, 5)
p4study2 <- ggplot(data = Study2, aes(x = ve, y = sdR)) +
geom_smooth(method = lm, colour = "black") +
labs(title = " Study 2: \nGerman Sample") + theme_classic() +
theme(axis.title.x = element_blank(),
axis.title.y = element_blank()) + ylim(1, 5)
grid.arrange(p3usa, p2uk, p1germany, p4study2, nrow = 1,
left = "Physical Distancing", bottom = "Empathy") # combining the above four plots
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
## `geom_smooth()` using formula 'y ~ x'
Unlike the first two studies, Study 3 involved randomly assigning the participants (N = 868), all of whom were German, into one of three conditions: the information-only condition (n = 305), the combined information and empathy condition (n = 258), and the control condition (n = 305). The independent variable was the level of state empathy induced, and the dependent variable was adherence to physical distancing, which was measured with the same survey items that were used in Study 2.
# Read data
study3 <- read_sav("PSYCHOLOGICAL SCIENCE/PSYCHOLOGY/PSYC 3361/study3.sav")
View(study3)
Fortunately, the steps needed to produce the gender and age statistics (reported on page 1366) were the same as those for the first two studies. The value labels for gender were also the same.
# Demographic descriptives
study3 %>%
tabyl(Gender) %>%
adorn_pct_formatting()
## Gender n percent valid_percent
## 1 484 55.8% 56.0%
## 2 380 43.8% 44.0%
## NA 4 0.5% -
mean(study3$Age)
## [1] 35.09447
sd(study3$Age)
## [1] 12.44069
When it came to reproducing the summary statistics, we determined that the variable titled “bed” represented the empathy conditions. The assigned value labels were “0” for the control condition, “1” for the combined information and empathy condition, and “2” for the information-only condition. We then used the “filter()” function to create new variables and name them after the condition they represented.
After this, we managed to reproduce the summary statistics (reported on page 1367) for each of the three conditions with the “sdR” variable, which represented physical distancing.
# Summary statistics (means/SD)
control <- study3 %>%
filter(bed == 0)
info_only <- study3 %>%
filter(bed == 2)
info_empathy <- study3 %>%
filter(bed == 1)
mean(control$sdR)
## [1] 4.297049
sd(control$sdR)
## [1] 0.7602098
mean(info_only$sdR)
## [1] 4.392787
sd(info_only$sdR)
## [1] 0.7393635
mean(info_empathy$sdR)
## [1] 4.510078
sd(info_empathy$sdR)
## [1] 0.6572961
The figure for Study 3 on page 1367 consisted of three violin plots, with one violin per experimental condition. The vertical box in each plot indicated the interquartile range from the 25th to the 75th percentile, the diamond indicated the mean, and the horizontal line indicated the median.
In order to replicate what looked like a very challenging figure, we first had to convert the condition variable into a factor. Then we used “ggplot()” to plot the empathy condition (“bed”) on the x-axis and physical distancing motivation (“sdR”) on the y-axis. The “geom_violin()” function was then added, along with the aesthetic “fill” to ensure each violin was coloured in. Following this, we added the boxplot with altered width and outlier sizes so that the boxplot would fit within the violins.
To add in the mean diamond, we used the “stat_summary()” function to specify a point to indicate the mean, which was subsequently changed into a diamond shape using “shape=9”.
The next part of the code for this figure was to match the aesthetics of the graph from the paper. Using an online colour-picking tool, we found the hexadecimal values for the colours of each violin, which were then added through the “scale_fill_manual()” function. An axis label titled “Physical-Distancing Motivation” was added for the y-axis, but we had to remove the common label for the x-axis. However, to match the figure in the paper, we created a label for each study condition on the x-axis by using the “scale_x_discrete()” function. Finally, the grey background was changed to white by using the “theme_bw()” function, and we used the “theme()” function to remove the gridlines and the legend.
# Check levels of condition
levels(study3$bed) # NULL, because bed is not yet a factor
## NULL
# re-level factors
study3$bed <- as.factor(study3$bed)
study3$bed <- fct_relevel(study3$bed, c("0", "2", "1"))
levels(study3$bed) #0, 2, 1
## [1] "0" "2" "1"
graph1 <- ggplot(data = study3, aes(x = bed, y = sdR)) +
geom_violin(aes(fill = bed)) +
geom_boxplot(width = 0.1, outlier.size=0.5) +
stat_summary(fun = mean, geom = "point", shape = 9,
size = 2) +
scale_fill_manual(values = c("#a4cee4","#1578b5", "#b3d789")) +
labs(y = "Physical-Distancing Motivation", x = NULL) +
scale_x_discrete(labels = c("Control", "Information Only", "Information + Empathy")) +
theme_bw() +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
theme(legend.position = "none")
print(graph1)
That all being said, the main problem that we encountered with this figure was that each condition was plotted in ascending order (as each condition was given a value of 0, 1, or 2), which was not the order that we desired.
So we had to figure out a way to reorder the conditions to 0, 2, and 1 in order to match the plot shown in the paper. We did this by using the “levels()” function to confirm that R was in fact plotting them in ascending order, and then we used the “fct_relevel()” function to reorganise the conditions into the right order. After re-checking the levels to see that the conditions had been reorganised, we re-ran the code for the plot and it was a success.
The participants in this final study (N = 1526) were randomly assigned into one of the same three conditions that were used in Study 3: the information-only condition (n = 492), the combined information and empathy condition (n = 500), and the control condition (n = 534). However, unlike Study 3, the dependent variable was motivation to wear a face mask, which was measured with one survey item.
# Read data
study4 <- read_sav("PSYCHOLOGICAL SCIENCE/PSYCHOLOGY/PSYC 3361/study4.sav")
glimpse(study4)
## Rows: 1,526
## Columns: 20
## $ QID90_1 <dbl+lbl> 1, 3, 1, 2, 2, 1, 5, 1, 1, 5, 5, 3, 1, 5, 4, 3, 4, ~
## $ QID90_2 <dbl+lbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, ~
## $ QID90_3 <dbl+lbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, ~
## $ Q22_1 <dbl+lbl> 5, 4, 4, 3, 5, 1, 5, 4, 4, 5, 5, 2, 5, 4, 4, 2, 5, ~
## $ Q92_1 <dbl+lbl> 4, 3, 5, 2, 5, 3, 2, 4, 3, 5, 5, 2, 4, 4, 3, 3, 5, ~
## $ Q92_2 <dbl+lbl> 1, 3, 1, 3, 1, 5, 1, 2, 2, 1, 1, 3, 1, 4, 3, 3, 1, ~
## $ Q92_3 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q100_1 <dbl+lbl> 2, 2, 3, 2, 3, 1, 2, 3, 4, 2, 4, 3, 3, 4, 3, 2, 3, ~
## $ Q100_2 <dbl+lbl> 3, 3, 4, 3, 3, 1, 3, 3, 4, 3, 4, 3, 3, 3, 4, 4, 3, ~
## $ Q61 <dbl+lbl> 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, ~
## $ Q20 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Age <dbl> 30, 52, 39, 29, 46, 20, 31, 43, 40, 18, 29, 33, 27, 28,~
## $ Gender <dbl+lbl> 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, ~
## $ Household_size <dbl> 1, 1, 3, 1, 1, 2, 3, 1, 4, 5, 2, 4, 2, 5, 2, 2, 2, 2, 5~
## $ Q28 <chr> "Gulasch", "Nudeln", "Nudelsalat", "Nudelgratin", "Sala~
## $ bed <dbl+lbl> 1, 1, 1, 0, 1, 0, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, ~
## $ empa <dbl> 1.000000, 3.000000, 1.000000, 2.000000, 2.000000, 1.000~
## $ Q92_2r <dbl> 5, 3, 5, 3, 5, 1, 5, 4, 4, 5, 5, 3, 5, 2, 3, 3, 5, 3, 5~
## $ maskePo <dbl> NA, 3.0, 5.0, 2.5, 5.0, 2.0, 3.5, 4.0, 3.5, 5.0, 5.0, 2~
## $ VAR00001 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
We completed the demographic descriptive statistics (reported on page 1368) in the same manner as previous studies. Similar functions and the same variable names were used, allowing for consistency between analyses.
# Demographic descriptives
study4 %>%
tabyl(Gender) %>%
adorn_pct_formatting()
## Gender n percent valid_percent
## 1 796 52.2% 52.5%
## 2 720 47.2% 47.5%
## NA 10 0.7% -
mean(study4$Age)
## [1] 34.70904
sd(study4$Age)
## [1] 12.08643
For study 4, we had to reproduce the summary statistics (reported on page 1369) for each condition in the same way as study 3. Again, we determined that the variable “bed” represented the conditions and the meaning of each value.
However, the values given to each condition differed slightly from those in Study 3, so we had to be careful not to mix them up. The assigned values were “0” for the information-only condition, “1” for the control condition, and “2” for the combined information and empathy condition. Then, using the filter() function in the same way as we did for Study 3, we created new variables and named them with respect to the condition they represented.
# Summary statistics (means/SD)
info_empathy <- study4 %>%
filter(bed == 2)
info_only <- study4 %>%
filter(bed == 0)
control <- study4 %>%
filter(bed == 1)
mean(info_empathy$Q22_1)
## [1] 4.004
sd(info_empathy$Q22_1)
## [1] 1.118251
mean(control$Q22_1)
## [1] 3.691011
sd(control$Q22_1)
## [1] 1.243382
mean(info_only$Q22_1)
## [1] 3.825203
sd(info_only$Q22_1)
## [1] 1.199938
The figure for Study 4 on page 1369 was very similar to that of Study 3, as it consisted of three violins, with one violin per experimental condition. Again, the vertical box in each plot indicated the interquartile range from the 25th to the 75th percentile, the diamond indicated the mean, and the horizontal line indicated the median.
Like in Study 3, we converted the experimental condition variable (called “bed” in SPSS) from numeric into a factor in order to control the order of the conditions.
However, initially, R did not recognise the experimental condition variable as a factor, so we used the as.factor() function to tell R that 0, 1 and 2 were in fact factor levels and not values of a continuous variable.
We also had to reorder these factors, as their numerical order was not the same as the order they were presented in the plot.
Another similarity to Study 3 was that, after some troubleshooting on Google, we discovered that shape number 9 was needed to add the mean diamond to the boxplots in the Study 4 figure.
After removing the legend, overall x-axis title, and grey background from the plots, we sifted through the different questions that participants were asked to find the one that matched the question from the paper, as there seemed to be no explicit variable that measured mask-wearing.
# Changing study condition into factors
## 0 = Information only
## 1 = Control
## 2 = Empathy
study4$bed <- as.factor(study4$bed)
# Re-level the factors so that the conditions are in the right position
study4$bed <- fct_relevel(
study4$bed,
c("1", "0", "2")
)
levels(study4$bed)
## [1] "1" "0" "2"
# Plot
ggplot(
data = study4,
aes(
x = bed, #experimental condition
y = Q22_1 #Q used for measuring mask wearing
)
) +
geom_violin(
aes(
fill = bed
)
) +
geom_boxplot(width = 0.1) +
# Adding mean diamond within the boxplot
stat_summary(
fun = mean,
geom = "point",
shape = 9,
size = 3
) +
# Adding violin plot colours
scale_fill_manual(
values = c("#a4cee4", "#1578b5", "#b3d789")
) +
# Removing grey background
theme_bw() +
# x-axis labels
scale_x_discrete(labels = c("Control","Information Only","Information + Empathy")) +
# Adding y-axis label
scale_y_continuous("Motivation to Wear a Face Mask") +
# Removed legend and added individual labels on each violin plot
theme(
panel.grid.major = element_blank(),
panel.grid.minor = element_blank(),
legend.position = "none",
axis.title.x = element_blank()
)
Overall, apart from the dataset used and the axis titles, the figure for Study 4 was essentially the same as the figure that we reproduced for Study 3 in terms of aesthetics. Thus, many of the functions used were identical between the two.
My first question is whether the number of people in a household has an influence on mask-wearing practices. In Part 1, I mentioned that the three experimental conditions in studies 3 and 4 did not significantly differ in terms of the assessed demographics, including household size. I found this very interesting because of how unexpected it was, and I therefore wanted to find out whether the same would hold for household size alone (not accounting for empathy) and engagement in mask-wearing.
Following on from the notion that all members of a household have a higher risk of contracting COVID-19 if at least one member is at high risk (U.S. Department of Health and Human Services, 2021), I hypothesised that living in a larger household would be associated with higher motivation to wear a mask.
I first started off by loading some extra packages to help with obtaining my summary statistics and constructing my figure. I then imported the data from Study 4, since this dataset contained the relevant variables I needed to answer my first exploratory analysis question. Converting the “Household_size” variable into a continuous variable was also necessary for me to be able to plot my data and complete my analysis.
Producing the summary statistics was relatively straightforward, as I only had to use the “summarise()” function to find the mean, standard deviation and standard error, and then I used gt() to put these statistics into a nice table. However, constructing the column graph was a little more complicated. I used the “labs()” function to customise the title and axis labels for my graph, added error bars with the “geom_errorbar()” function, and changed the limits of the y-axis with the “scale_y_continuous()” function.
From looking at the graph, I could not see much of a clear pattern between my two variables of interest, so I conducted a simple regression analysis to determine whether the association between my predictor variable (household size) and outcome variable (motivation to wear a face mask) was significant. After using the “lm()” function to work out the beta coefficients for my model and using the “summary()” function to display a statistical summary, I found that the p-value for the intercept was significant (which does not provide much information), but my predictor variable was not. This was confirmed by the scatterplot I created, which included the regression line. Thus, there is insufficient evidence to suggest that the number of people in a household is a useful predictor of motivation to wear a face mask.
#Household size vs. Motivation to wear a face mask
## load packages
library(rstatix) # creates basic statistical tests
##
## Attaching package: 'rstatix'
## The following object is masked from 'package:janitor':
##
## make_clean_names
## The following object is masked from 'package:stats':
##
## filter
library(corrplot) # visualises correlation matrices and confidence intervals
## corrplot 0.84 loaded
library(ggpubr) # easily creates a publication-ready plot
library(jmv) # allows for the use of common statistical methods
library(car) # contains functions for linear models
## Loading required package: carData
##
## Attaching package: 'car'
## The following object is masked from 'package:dplyr':
##
## recode
## The following object is masked from 'package:purrr':
##
## some
library(psych) # used primarily for multivariate analyses in experimental psychology
##
## Attaching package: 'psych'
## The following object is masked from 'package:car':
##
## logit
## The following object is masked from 'package:jmv':
##
## pca
## The following objects are masked from 'package:ggplot2':
##
## %+%, alpha
library(gt) # easily creates nice-looking tables
library(dplyr) # makes data manipulation easier and more efficient
## read data
study4 <- read_sav("PSYCHOLOGICAL SCIENCE/PSYCHOLOGY/PSYC 3361/study4.sav")
glimpse(study4)
## Rows: 1,526
## Columns: 20
## $ QID90_1 <dbl+lbl> 1, 3, 1, 2, 2, 1, 5, 1, 1, 5, 5, 3, 1, 5, 4, 3, 4, ~
## $ QID90_2 <dbl+lbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, ~
## $ QID90_3 <dbl+lbl> 1, 3, 1, 2, 2, 1, 4, 1, 1, 4, 5, 3, 1, 4, 3, 3, 4, ~
## $ Q22_1 <dbl+lbl> 5, 4, 4, 3, 5, 1, 5, 4, 4, 5, 5, 2, 5, 4, 4, 2, 5, ~
## $ Q92_1 <dbl+lbl> 4, 3, 5, 2, 5, 3, 2, 4, 3, 5, 5, 2, 4, 4, 3, 3, 5, ~
## $ Q92_2 <dbl+lbl> 1, 3, 1, 3, 1, 5, 1, 2, 2, 1, 1, 3, 1, 4, 3, 3, 1, ~
## $ Q92_3 <dbl+lbl> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, ~
## $ Q100_1 <dbl+lbl> 2, 2, 3, 2, 3, 1, 2, 3, 4, 2, 4, 3, 3, 4, 3, 2, 3, ~
## $ Q100_2 <dbl+lbl> 3, 3, 4, 3, 3, 1, 3, 3, 4, 3, 4, 3, 3, 3, 4, 4, 3, ~
## $ Q61 <dbl+lbl> 1, 1, 2, 2, 1, 1, 2, 2, 1, 2, 2, 2, 2, 1, 2, 2, 2, ~
## $ Q20 <dbl+lbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ~
## $ Age <dbl> 30, 52, 39, 29, 46, 20, 31, 43, 40, 18, 29, 33, 27, 28,~
## $ Gender <dbl+lbl> 2, 2, 2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 1, 1, 1, 1, 2, ~
## $ Household_size <dbl> 1, 1, 3, 1, 1, 2, 3, 1, 4, 5, 2, 4, 2, 5, 2, 2, 2, 2, 5~
## $ Q28 <chr> "Gulasch", "Nudeln", "Nudelsalat", "Nudelgratin", "Sala~
## $ bed <dbl+lbl> 1, 1, 1, 0, 1, 0, 2, 1, 1, 2, 2, 0, 1, 2, 0, 2, 2, ~
## $ empa <dbl> 1.000000, 3.000000, 1.000000, 2.000000, 2.000000, 1.000~
## $ Q92_2r <dbl> 5, 3, 5, 3, 5, 1, 5, 4, 4, 5, 5, 3, 5, 2, 3, 3, 5, 3, 5~
## $ maskePo <dbl> NA, 3.0, 5.0, 2.5, 5.0, 2.0, 3.5, 4.0, 3.5, 5.0, 5.0, 2~
## $ VAR00001 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,~
## Summary statistics
### Convert Household_size variable into a continuous variable
study4$Household_size <- as.numeric(as.character(study4$Household_size))
### Mean, standard deviation and standard error for the relationship between Household_size and mask-wearing
exploreQ1 <- study4 %>%
group_by(Household_size) %>%
select(Q22_1) %>%
summarise(mean = mean(Q22_1), sd = sd(Q22_1), n=n(),
se = sd/sqrt(n))
## Adding missing grouping variables: `Household_size`
gt(exploreQ1)
| Household_size | mean | sd | n | se |
|---|---|---|---|---|
| 1 | 3.711370 | 1.187687 | 343 | 0.06412909 |
| 2 | 3.889105 | 1.206447 | 514 | 0.05321410 |
| 3 | 3.959752 | 1.164267 | 323 | 0.06478156 |
| 4 | 3.710638 | 1.216558 | 235 | 0.07935953 |
| 5 | 3.962500 | 1.095951 | 80 | 0.12253100 |
| 6 | 3.809524 | 1.289149 | 21 | 0.28131534 |
| 7 | 3.200000 | 1.483240 | 5 | 0.66332496 |
| 8 | 5.000000 | NA | 1 | NA |
| 9 | 5.000000 | NA | 1 | NA |
| 11 | 3.000000 | NA | 1 | NA |
| 12 | 5.000000 | NA | 1 | NA |
| 20 | 1.000000 | NA | 1 | NA |
## bar plot
exploreQ1_plot <- exploreQ1 %>%
ggplot(aes(x = Household_size, y = mean, fill = Household_size)) + # column graph
geom_col() +
# customise title and axis labels
labs(title = "Motivation to wear a face mask across household size", x = "Number of people in the household", y = "Motivation to wear a face mask") +
# add error bars (mean +/- 1 SD)
geom_errorbar(aes(ymin = mean - sd, ymax = mean + sd)) +
# set the y-axis limits once; a separate ylim() call would be overridden with a warning
scale_y_continuous(expand = c(0, 0),
limits = c(0, 6))
print(exploreQ1_plot)
# Linear regression analysis
# modelling the relationship between Household_size and motivation to wear a face mask
exploreQ1_stats <- lm(Q22_1 ~ Household_size, data = study4) # to determine the beta coefficients of the linear model
confint(exploreQ1_stats, level=0.95) # 95% confidence intervals for the model coefficients
## 2.5 % 97.5 %
## (Intercept) 3.70726058 3.9601777
## Household_size -0.04231672 0.0447492
summary(exploreQ1_stats) # to display the statistical summary of the model
##
## Call:
## lm(formula = Q22_1 ~ Household_size, data = study4)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.8580 -0.8361 0.1638 1.1626 1.1651
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.833719 0.064470 59.466 <2e-16 ***
## Household_size 0.001216 0.022193 0.055 0.956
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.196 on 1524 degrees of freedom
## Multiple R-squared: 1.971e-06, Adjusted R-squared: -0.0006542
## F-statistic: 0.003003 on 1 and 1524 DF, p-value: 0.9563
exploreQ1_stats_plot <- exploreQ1 %>%
ggplot(aes(x = Household_size, y = mean)) + geom_point() + # scatter plot
geom_smooth(method = lm) # a single linear regression line (the default loess smoother is not needed)
print(exploreQ1_stats_plot)
## `geom_smooth()` using formula 'y ~ x'
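The household sizes above 7 each came from a single respondent (see the summary table above), so they contribute high-leverage points to this regression. As a quick robustness check (a sketch on the same study4 data frame, not part of the original analysis), the model can be refitted without them:
# refit the regression excluding the household sizes with only one respondent
study4_trimmed <- study4 %>% filter(Household_size <= 7)
exploreQ1_trimmed <- lm(Q22_1 ~ Household_size, data = study4_trimmed)
summary(exploreQ1_trimmed) # the flat slope should persist if the null result is robust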
My second question is whether empathy influences people’s opinions on whether the COVID-19 measures imposed by the government should be followed, a question inspired by Pfattheicher et al. (2020) mentioning in their paper that empathy motivates a variety of attitudes and behavioural outcomes. I chose to focus on the USA sample from Study 1 because, according to Papageorge et al. (2021), the United States does not have a consistent national policy on COVID-related behaviours, which has resulted in individuals having differing opinions of and responses to COVID policies. I wanted to see whether this notion was reflected in Pfattheicher et al.’s (2020) data.
Thus, I hypothesised that those who tend to be more empathetic are more likely to believe that the COVID measures imposed by the government should be followed.
To start, I imported the data from Study1USA, which contained the relevant variables for my second exploratory analysis question. I then produced the mean, standard deviation and standard error using the “summarise()” function, and used the “mutate()” function to add value labels to the empathy item. Finally, I used the “gt()” function to display my summary statistics in a neat table.
Following this, I produced a boxplot to visualise my data. I then used the “labs()” function to customise the title and axis labels of my graph, just as I did for the graph in my first exploratory analysis.
However, although the dots on the graph appeared to indicate a positive correlation, the pattern was still somewhat murky, so I conducted a one-way ANOVA to determine whether the relationship between my predictor variable (empathy) and outcome variable (belief that COVID rules should be followed) was significant. Type II sums of squares were requested since my data were unbalanced, although with a single predictor this is equivalent to the standard ANOVA table. Upon examining the ANOVA table, I found that opinions on following COVID rules differed significantly across ratings of empathy. Therefore, there is evidence to suggest that in the United States, those who rate higher on empathy are more likely to hold the opinion that people should follow the COVID rules imposed by the government.
# Influence of empathy on opinions of COVID-19 measures in the USA
## Q22_5 --> Empathy question: "I am quite moved by what can happen to those most vulnerable to COVID-19"
### Rated from 1 ("Strongly disagree" -> little to no empathy) to 5 ("Strongly agree" -> highly empathetic)
## Q25_4 --> Opinion question: "It is very important that others will follow all major rules imposed by the government to contain COVID-19"
### Rated from 1 ("Strongly disagree" -> little to no importance) to 5 ("Strongly agree" -> highly important)
## read data
Study1USA <- read_sav("PSYC3361 Project/COVID paper data+syntax/Study1USA.sav")
## Descriptive statistics
# Recoding the Q22_5 variable to its value labels and obtaining summary statistics
exploreQ2 <- Study1USA %>%
select(Q22_5, Q25_4) %>%
group_by(Q22_5) %>%
summarise(mean = mean(Q25_4), sd = sd(Q25_4), n=n(),
se = sd/sqrt(n)) %>%
mutate(Q22_5 = case_when(
Q22_5 == 1 ~ "Strongly disagree",
Q22_5 == 2 ~ "Somewhat disagree",
Q22_5 == 3 ~ "Neither agree nor disagree",
Q22_5 == 4 ~ "Somewhat agree",
Q22_5 == 5 ~ "Strongly agree"))
gt(exploreQ2)
| Q22_5 | mean | sd | n | se |
|---|---|---|---|---|
| Strongly disagree | 3.428571 | 1.7182494 | 7 | 0.64943722 |
| Somewhat disagree | 4.363636 | 0.8090398 | 11 | 0.24393469 |
| Neither agree nor disagree | 4.433333 | 0.6789106 | 30 | 0.12395154 |
| Somewhat agree | 4.600000 | 0.5617796 | 110 | 0.05356358 |
| Strongly agree | 4.884146 | 0.3572113 | 164 | 0.02789351 |
## box plot
Study1USA$Q22_5 <- as.factor(Study1USA$Q22_5)
Study1USA$Q25_4 <- as.numeric(Study1USA$Q25_4) # drop the haven_labelled class so ggplot defaults to a continuous y-axis
exploreQ2_plot <- Study1USA %>%
ggplot(aes(x = Q22_5, y = Q25_4, fill = Q22_5)) + geom_boxplot(alpha = 0.8) + geom_jitter(alpha = 0.5) +
# adding a title and axis labels
labs(title = "Opinion on following COVID-19 measures in the United States", x = "Level of empathy", y = "Rating of opinion on COVID measures")
print(exploreQ2_plot)
# One-way ANOVA
exploreQ2_anova <- aov(Q25_4 ~ Q22_5, data = Study1USA)
Anova(exploreQ2_anova, type = "II") # Type II sums of squares were requested because the group sizes are unbalanced
## Anova Table (Type II tests)
##
## Response: Q25_4
## Sum Sq Df F value Pr(>F)
## Q22_5 21.349 4 18.227 1.726e-13 ***
## Residuals 92.825 317
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
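The omnibus test only shows that at least two empathy groups differ on this opinion item. A natural follow-up (a sketch, not part of the original analysis) is to run pairwise comparisons with a Tukey correction to see which empathy ratings drive the effect:
# pairwise comparisons between empathy ratings, corrected for multiple testing
TukeyHSD(exploreQ2_anova)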
My third and final question is whether the importance that people place on using hand sanitiser during the COVID-19 pandemic has an effect on their tendency to engage in physical distancing. Since hand hygiene is one of the key prevention measures against COVID-19, along with physical distancing, I thought it would be worthwhile to investigate whether those who have stronger beliefs about the importance of hand sanitiser (and therefore use it more) also have a higher level of engagement in physical distancing, and whether this varies between countries. If the outcome of my analysis were significant, it could potentially be attributed to higher levels of worry or perceived vulnerability towards COVID-19, as research has suggested (Ebrahimi, Hoffart, & Johnson, 2021).
I first started off by importing the data for each of the three samples from Study 1, since these datasets contained the relevant variables I needed to answer my third exploratory analysis question. I also needed to convert both the hand sanitiser and physical distancing variables into continuous variables, since both involved ratings on a Likert scale.
Producing the summary statistics was quite time-consuming because I had to rely heavily on the “mutate()” function to combine the hand sanitiser data from each sample. I had to use “mutate()” twice on the German sample because the value labels for its hand sanitiser scale were inconsistent with those of the USA and UK samples. A lot of errors were made along the way, but I was eventually able to produce an overall mean for each country by using the “rbind()” function on the hand sanitiser data and to display the means in a table using the “gt()” function.
The table and bar graph that I produced indicated that there was very little difference between the means of each country, so I conducted a two-way ANOVA to determine whether physical distancing differed significantly across hand sanitiser ratings and/or countries. Type II sums of squares were used since the data were unbalanced, as in my second exploratory analysis. Upon looking at the ANOVA table, I found that physical distancing differed significantly across hand sanitiser ratings, but not across countries. Therefore, there is evidence to suggest that those who place more importance on using hand sanitiser are more likely to engage in physical distancing, but that this effect is not influenced by which country one lives in.
# Importance of hand sanitiser vs. Physical distancing
## read data
Study1USA <- read_sav("PSYC3361 Project/COVID paper data+syntax/Study1USA.sav")
Study1UK <- read_sav("PSYC3361 Project/COVID paper data+syntax/Study1UK.sav")
Study1Germany <- read_sav("PSYC3361 Project/Study1Germany.sav")
# Variable name for measuring importance of hand sanitiser: Q22_7
# Variable name for measuring physical distancing: sd
# Variable name for country: Q20
# Converting both the hand sanitiser and physical distancing variables into continuous variables
Study1USA$Q22_7 <- as.numeric(as.character(Study1USA$Q22_7))
Study1UK$Q22_7 <- as.numeric(as.character(Study1UK$Q22_7))
Study1Germany$Q22_7 <- as.numeric(as.character(Study1Germany$Q22_7))
Study1USA$sd <- as.numeric(Study1USA$sd)
Study1UK$sd <- as.numeric(Study1UK$sd)
Study1Germany$sd <- as.numeric(Study1Germany$sd)
# Summary statistics for the USA sample
exploreQ3 <- Study1USA %>%
select(Q20, Q22_7, sd) %>%
group_by(Q20) %>%
summarise(mean = mean(sd)) %>%
mutate(Q20 = case_when(
Q20 == 1 ~ "United States of America",
Q20 == 2 ~ "United Kingdom",
Q20 == 3 ~ "Ireland",
Q20 == 4 ~ "Canada",
Q20 == 5 ~ "Other"))
gt(exploreQ3)
| Q20 | mean |
|---|---|
| United States of America | 4.291139 |
| United Kingdom | 5.000000 |
| Canada | 4.500000 |
| Other | 4.666667 |
# Summary statistics for the UK sample
exploreQ3 <- Study1UK %>%
select(Q20, Q22_7, sd) %>%
group_by(Q20) %>%
summarise(mean = mean(sd)) %>%
mutate(Q20 = case_when(
Q20 == 1 ~ "United States of America",
Q20 == 2 ~ "United Kingdom",
Q20 == 3 ~ "Ireland",
Q20 == 4 ~ "Canada",
Q20 == 5 ~ "Other"))
gt(exploreQ3)
| Q20 | mean |
|---|---|
| United Kingdom | 4.121795 |
| Other | 4.200000 |
# Recode Study1Germany's Q22_7 values to be consistent with the values from the other samples
Study1Germany <- Study1Germany %>%
mutate(Q22_7 = case_when(Q22_7 == 28 ~ "1",
Q22_7 == 29 ~ "2",
Q22_7 == 30 ~ "3",
Q22_7 == 31 ~ "5",
Q22_7 == 32 ~ "4"))
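# NB: case_when() returns character values here, so after rbind() below the combined
# Q22_7 column is coerced to character and aov() treats it as a five-level factor
# (hence Df = 4 for Q22_7 in the ANOVA table at the end of this section).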
# Summary statistics for the German sample
exploreQ3 <- Study1Germany %>%
select(Q20, Q22_7, sd) %>%
group_by(Q20) %>%
summarise(mean = mean(sd)) %>%
mutate(Q20 = case_when(
Q20 == 1 ~ "Germany",
Q20 == 2 ~ "Switzerland",
Q20 == 3 ~ "Austria",
Q20 == 4 ~ "Other country"))
gt(exploreQ3)
| Q20 | mean |
|---|---|
| Germany | 3.981481 |
| Switzerland | 5.000000 |
| Austria | 5.000000 |
| Other country | 4.240000 |
# Combining hand sanitiser data from each sample
Study1USA <- Study1USA %>%
select(Q20, Q22_7, sd) %>%
mutate(Q20 = "USA")
Study1UK <- Study1UK %>%
select(Q20, Q22_7, sd) %>%
mutate(Q20 = "UK")
Study1Germany <- Study1Germany %>%
select(Q20, Q22_7, sd) %>%
mutate(Q20 = "Germany")
exploreQ3 <- rbind(Study1USA, Study1UK, Study1Germany) %>%
select(Q20, Q22_7, sd) %>%
group_by(Q20) %>%
summarise(mean = mean(sd))
gt(exploreQ3)
| Q20 | mean |
|---|---|
| Germany | 4.039877 |
| UK | 4.123028 |
| USA | 4.298137 |
# plotting mean physical distancing for each hand-sanitiser rating, by country
rbind(Study1USA, Study1UK, Study1Germany) %>%
group_by(Q20, Q22_7) %>%
summarise(mean_sd = mean(sd)) %>% # mean physical distancing per rating per country
ungroup() %>%
ggplot(aes(x = Q22_7, y = mean_sd, fill = Q20)) +
geom_col(position = "dodge") +
# colour scheme
scale_fill_brewer() +
scale_y_continuous(limits = c(0, 5)) +
labs(x = "Hand-sanitiser importance rating", y = "Mean physical distancing", fill = "Country")
# Two way ANOVA
exploreQ3_anova <- aov(sd ~ Q22_7 + Q20, data = rbind(Study1USA, Study1UK, Study1Germany))
Anova(exploreQ3_anova, type = "II") # Type two was used since data was unbalanced
## Anova Table (Type II tests)
##
## Response: sd
## Sum Sq Df F value Pr(>F)
## Q22_7 23.54 4 5.5457 0.0002048 ***
## Q20 3.41 2 1.6086 0.2006983
## Residuals 1016.52 958
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Physical distancing appears to differ significantly across hand-sanitiser ratings, but not across countries.
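The additive model above cannot tell us whether the hand-sanitiser effect itself differs between countries. A sketch of that follow-up (not part of the original analysis) adds the interaction term:
# test whether the effect of hand-sanitiser importance varies by country
exploreQ3_interaction <- aov(sd ~ Q22_7 * Q20, data = rbind(Study1USA, Study1UK, Study1Germany))
Anova(exploreQ3_interaction, type = "II") # a non-significant Q22_7:Q20 term would support the conclusion drawn above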
Based on the reproducibility checklist that my group and I made, some of our criteria were met, as the data were publicly available and easy to locate in the paper. However, if I were in a situation where I had to provide feedback to Pfattheicher and his colleagues, I would definitely have some recommendations on how they could make it easier for people like my group and me to reproduce their results.
Firstly, more consistent scaling of the survey items is needed to make the open data more user-friendly. Most of the survey items in the four studies involved participants giving ratings on a Likert scale, but while most items ranged from 1 to 5, others ranged from 28 to 32. This meant that many extra steps (mainly involving the “mutate()” function) were needed to reproduce the data from the paper, as well as to conduct my own exploratory analyses.
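For example, a named lookup vector (a minimal sketch, using the 28-to-32 coding encountered in the German sample) can recode such an item in one step instead of through repeated “case_when()” calls:
# illustrative sketch: map the 28-32 coding onto the 1-5 scale used by the other samples
rescale_lookup <- c(`28` = 1, `29` = 2, `30` = 3, `31` = 5, `32` = 4)
germany_raw <- read_sav("PSYC3361 Project/Study1Germany.sav") # re-read the raw file
germany_raw$Q22_7 <- unname(rescale_lookup[as.character(germany_raw$Q22_7)])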
Secondly, as mentioned in Part 2, my group and I had to translate the variable labels and survey items from German to English for the datasets in all four studies. This was a very time-consuming task that also carried the risk of important information being lost in translation, which could have hindered our ability to properly reproduce the data and figures from the paper. Although the authors cannot be faulted for preferring to write the variable labels in their native language, it would have been more logical to at least provide a copy of each dataset in English, given that the paper itself was published in English.
Another important part of our criteria was that the researchers communicated their findings in a clear and unambiguous way. This was mostly achieved, but some parts of the paper were quite confusing. The main example that comes to mind is the section on Study 4, which was highly convoluted and contained a lot of jargon. Although this may be easier said than done, my recommendation is to find a balance between writing in a professional manner that suits a scientific audience and using language that enables lay people to understand and interpret what is written.
This extends to ensuring that the variable names in each dataset are labelled clearly. Although the file names in the OSF repository were appropriate and legible, the same could not be said of the individual variables within the datasets. Some variables were given very arbitrary names, such as “bed” for the level of state empathy and “sdR” for physical distancing in the Study 4 dataset. So my final recommendation would be to ensure that variable names are readable for both humans and computers.
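In practice, descriptive names can be restored with a one-line rename at import; below is a sketch using the “bed” variable mentioned above (the new name is illustrative, not from the original codebook):
# give an opaque Study 4 column a self-documenting name (illustrative)
study4_renamed <- study4 %>%
rename(state_empathy = bed)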
Ebrahimi, O. V., Hoffart, A., & Johnson, S. U. (2021). Physical distancing and mental health during the COVID-19 pandemic: Factors associated with psychological symptoms and adherence to pandemic mitigation strategies. Clinical Psychological Science, 1-18. https://doi.org/10.1177/2167702621994545.
Papageorge, N. W., Zahn, M. V., Belot, M., Broek-Altenburg, E., Choi, S., Jamison, J. C., & Tripodi, E. (2021). Socio-demographic factors associated with self-protecting behavior during the Covid-19 pandemic. Journal of Population Economics, 34(1), 691-738. https://doi.org/10.1007/s00148-020-00818-x.
Pfattheicher, S., Nockur, L., Böhm, R., Sassenrath, C., & Petersen, M. B. (2020). The Emotional Path to Action: Empathy Promotes Physical Distancing and Wearing of Face Masks During the COVID-19 Pandemic. Psychological Science, 31(11), 1363–1373. https://doi.org/10.1177/0956797620964422.
Taylor, S., & Asmundson, G. J. G. (2020). Negative attitudes about facemasks during the COVID-19 pandemic: The dual importance of perceived ineffectiveness and psychological reactance. medRxiv.
U.S. Department of Health and Human Services (2021). Guidance for large or extended families living in the same home. https://www.cdc.gov/coronavirus/2019-ncov/downloads/living-in-close-quarters.pdf.