The Covid-19 pandemic that began in early 2020 was unprecedented and showed how under-prepared many nations were to face such a threat. However, it also acted as something of a global-scale social experiment. With so many people forced to quarantine and completely change the way they work and live, opportunities emerged to study psychological questions under socially isolating conditions that were not attainable before the pandemic and will not be attainable after it. Folk et al. (2020) capitalized on this and explored the impact of the socially depriving aspects of Covid-19 on social connectedness, comparing measures collected before the pandemic with the same measures from the same participants during the pandemic. They also analysed individual extraversion levels to see how social connectedness was affected in extraverted versus introverted groups. Because this was the first time nations across the globe had mandated quarantine on such a scale, there was no prior research on the matter, allowing Folk et al. to fill a previously open gap in psychology.
Two studies were conducted: study 1 collected data from undergraduates (n = 467) at a Canadian university, while study 2 used adult participants (n = 336) from 28 countries. In study 1, students were surveyed at time 1 (pre-Covid) on social connectedness, with measures of lethargy used as a proxy for well-being. At time 2, in early April 2020 (during Covid-19), the same students were surveyed on social connectedness again, along with measures of extraversion and of their experiences during Covid-19 for exploratory analyses. In study 2, participants were recruited at time 1 to complete an online survey measuring personality, social connectedness, subjective well-being and demographics. At time 2, in early April 2020, the same participants completed the same measures as at time 1, plus exploratory measures about their experiences during Covid-19.
Folk et al. found that, in general, there was no significant change in feelings of social connectedness as a result of the pandemic. An exploratory analysis of the most extraverted (> 75th percentile) and most introverted (< 25th percentile) participants in study 1 showed that extraverts experienced a larger drop in social connectedness than introverts between time 1 and time 2. However, this can be attributed to extraverts having more social connection to "lose": the effect of extraversion on the change in social connection disappeared when controlling for levels of social connectedness before the pandemic. Loneliness also slightly decreased for introverts between time 1 (pre-Covid) and time 2 (during Covid).
I was surprised that there were no large changes in social connectedness or loneliness for introverted or extraverted participants, despite the isolating nature of the pandemic. I would have expected extraverted individuals to experience a large decline in feelings of social connectedness and a corresponding rise in loneliness, and introverts to experience a much smaller drop in social connectedness and perhaps a large decrease in loneliness in comparison to extraverts.
I was confused by the fact that they were able to draw conclusions about extraversion in study 1 when they didn't measure participants' extraversion in the pre-Covid wave. Asking for a retrospective rating of pre-Covid extraversion leaves the measure open to recall bias. I was also confused as to why they believed lethargy ratings could be used as a rough proxy for well-being.
I wonder whether the small changes in social connectedness were related to the second measure being conducted a relatively short time after the first lockdown began. Perhaps if social connectedness had been measured after the first lockdown had been prolonged for a while, or during a second lockdown (depending on the country), the results would differ.
In this study, I verified the following demographics:

Study 1: n = 467; Age: M = 20.89, SD = 3.03; Gender: 77% women

Study 2: n = 336; Age: M = 32.03, SD = 11.94; Gender: 55% male; Ethnicity: 80% White; Country: 32% U.S., 27% U.K.
[Figure: Study 1 - distribution of social connectedness difference scores]
[Figure: Study 2 - distribution of relatedness and loneliness difference scores]
To start, we verified the demographics of each study. This was an easy process, as the study came with a codebook, so we could identify what each variable meant.
First, we read in our libraries.
library(dplyr) # for summarise, group_by and the %>% pipe
library(tidyverse) # loads the rest of the tidyverse (this also attaches dplyr)
Reading in the study 1 data.
Study_1_data <- read.csv("Study 1 data.csv")
In this code, we try to confirm that the number of participants in study 1 was 467. unique removes any duplicate participant IDs, and length then counts how many distinct IDs remain. Our result confirms n = 467.
length(unique(Study_1_data$Participant.ID))
## [1] 467
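As an aside, dplyr also provides n_distinct(), which wraps the same length(unique(...)) pattern in a single call:

n_distinct(Study_1_data$Participant.ID) # should likewise return 467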
Here we are trying to verify that the mean age of participants was 20.89 and the SD was 3.03. I had to convert "Age" from a character to an integer using as.integer before calculating the mean, because otherwise the result kept appearing as 'NA' in the output. This was odd because the Age data in study 2 was read in as an integer correctly. The na.rm = TRUE argument drops missing (NA) values before the calculation.
Study_1_data$Age <- as.integer(Study_1_data$Age)
sd(Study_1_data$Age, na.rm = TRUE)
## [1] 3.034918
mean(Study_1_data$Age, na.rm = TRUE)
## [1] 20.88616
And it was successfully confirmed!
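If you want to see which raw entries were producing those NAs, a minimal sketch is to re-read the unconverted column and list the values that fail integer conversion (the exact offending values depend on the file):

# Hedged sketch: list the raw Age entries that fail integer conversion
raw_age <- read.csv("Study 1 data.csv")$Age
unique(raw_age[is.na(suppressWarnings(as.integer(raw_age)))])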
Here we group the data by gender and create a percent column that counts the number of participants of each gender and puts it over the total number of participants in the study.
Study_1_data %>%
group_by(Gender) %>%
summarise(percent = 100 * n()/ nrow(Study_1_data)) # count number of participants in each category, over the total number of participants in study 1.
## # A tibble: 4 x 2
## Gender percent
## * <chr> <dbl>
## 1 [Decline to Answer] 0.428
## 2 Man 21.6
## 3 Other 0.857
## 4 Woman 77.1
Confirming that the percentage of women in the study was 77%.
The process of reproducing the study 2 demographics was pretty much the same.
Reading in the data.
Study_2 <- read.csv("Study 2.csv") # read data
As in study 1, unique removes duplicate participant IDs and length counts the distinct IDs. This confirms the number of participants in study 2 to be 336.
length(unique(Study_2$Participant_ID)) #no. of participants
## [1] 336
Calculating the mean and SD of Age from the Study_2 dataframe.
mean(Study_2$Age) # mean of participants age
## [1] 32.02679
sd(Study_2$Age) #standard deviation of participants age
## [1] 11.94299
Here we try to confirm the percentage of males in the study. We call Study_2, group by gender, and then create a new column with the percentage of each gender using the summarise function.
#55% males overall in study 2
Study_2 %>%
group_by (Gender) %>%
summarise (percent = 100 * n() / nrow (Study_2))
## # A tibble: 3 x 2
## Gender percent
## * <chr> <dbl>
## 1 Female 44.9
## 2 Male 54.8
## 3 Non-Binary 0.298
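An equivalent, slightly more compact idiom uses dplyr's count(), which wraps the group_by() and summarise(n()) steps:

# Same result via count(): n is the per-gender tally, sum(n) the total sample
Study_2 %>%
  count(Gender) %>%
  mutate(percent = 100 * n / sum(n))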
To create a percentage table for ethnicities, we use the same process as for the gender table. As we can see, roughly 80% of participants were white (79.76%, rounded up).
#80% white, percentage of ethnicities in study 2
Study_2 %>%
group_by(Ethnicity) %>%
summarise (percent = 100 * n() / nrow (Study_2))
## # A tibble: 8 x 2
## Ethnicity percent
## * <chr> <dbl>
## 1 American Indian/Alaskan Native 0.298
## 2 Asian 8.93
## 3 Black 2.68
## 4 Hispanic 4.17
## 5 Middle-Eastern 0.893
## 6 More than one 2.98
## 7 Other 0.298
## 8 White 79.8
Same process, but we added an arrange function in descending order so that the USA and UK appear at the top. There is a rounding discrepancy for the USA statistic: the percentage should be 31%, not the 32% reported in the paper, so that value failed to reproduce. The UK statistic was correct.
#32% US, 27% UK, what percentage of participants are from the US & UK?
Study_2 %>%
group_by(Country) %>%
summarise (percent = 100 * n() / nrow (Study_2)) %>%
arrange(desc(percent))
## # A tibble: 29 x 2
## Country percent
## <chr> <dbl>
## 1 USA 31.0
## 2 UK 26.8
## 3 Poland 8.63
## 4 Portugal 6.85
## 5 Canada 4.46
## 6 Greece 3.27
## 7 Italy 2.98
## 8 Germany 1.79
## 9 Mexico 1.79
## 10 Slovenia 1.49
## # ... with 19 more rows
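To pin down that rounding discrepancy, we can compute the exact USA percentage (104 of the 336 participants, per Table S2 further down):

# 104 / 336 = 30.95..., which rounds to 31% rather than the paper's 32%
100 * sum(Study_2$Country == "USA") / nrow(Study_2)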
library(janitor) # for the clean_names function
library(tidyverse) # for dplyr and ggplot
library(dplyr) # for various helpful data manipulation functions.
library(gt) # to create the gt table
clean_names converts all the variable names to lowercase snake_case. We then select all the columns relevant to the table using the select function, and rename them to the labels used in the paper using the rename function.
# Correlations and Means (SD) tables for Study 1 and Study 2
S1master <- read_csv("Study 1 Data.csv") %>%
clean_names() %>%
# Let's first select all the relevant columns we'll be manipulating and get rid of gender and age.
select(participant_id:extraversion, -gender, -age) %>%
# Rename them to match the original paper's labels
rename("T1 Lethargy" = lethaverage_t1,
"T2 Lethargy" = lethaverage_t2,
"Lethargy Diff (T2 - T1)" = leth_diff,
"T1 Social Connectedness" = scaverage_t1,
"T2 Social Connectedness" = scaverage_t2,
"Connectedness Diff (T2 - T1)" = s_cdiff,
"Extraversion" = extraversion)
We then create two data frames using the summarise function to find the mean and SD of the relevant columns. pivot_longer converts each data frame from wide to long. After these two data frames are created, we merge them together. We use mutate_if with its arguments to round only the numeric columns to 2 dp. Once the mean and SD frames are merged, we compute the correlations for study 1 in a separate data frame.
We select the variables we want to correlate with the select() function and store them in a variable called 'correlation'. Then we use the cor function to compute the correlations and round to round them to 2 decimal places. To reduce redundancy, we only display the lower triangle of the table, so we fill the upper triangle of s1upcor with empty strings ("") so it appears blank. We then convert the result from a matrix into a data frame using as.data.frame, because gt doesn't accept matrices.

Next, we bind the meansd data frame underneath the s1upcor data frame with rbind, which attaches the means row to the bottom of the table. We then pass the bound data frame to the gt function with rownames_to_stub = TRUE so the row names appear as a stub column matching the column names; we had a lot of trouble trying to figure out how to do this. Finally, we added a title and changed the font with opt_table_font.
# We first create a summary of means and sd. Pivoting longer is necessary for both dataframes in order to merge them horizontally later!
S1mean <- S1master %>%
summarise(across(where(is.numeric), mean)) %>% # get everything that's numeric
pivot_longer(!participant_id, names_to = "variable", values_to = "Mean") # the exclamation mark excludes participant_id from the pivot; the remaining column names go into "variable" and their means into "Mean"
# Do the same for sd
S1sd <- S1master %>%
summarise(across(where(is.numeric), sd)) %>%
pivot_longer(!participant_id, names_to = "variable", values_to = "SD")
# Merge the two dataframes by "variable", which is the only common column across the two sets
meansd <- merge(S1mean, S1sd, by = c("variable")) %>%
mutate_if(is.numeric, round, digits = 2) %>% # round the numbers to 2 decimal places.
mutate(SD = paste0("(", (SD), ")")) %>% # adds the brackets around the SD in the last row.
unite("Mean (SD)", Mean, "SD", sep = " ") %>% # Merges mean and SD columns together, and puts a space in between the mean and the SD.
pivot_wider(names_from = "variable", values_from = "Mean (SD)") %>% # we pivot wider so that the variable names turn into column headers.
select("Connectedness Diff (T2 - T1)":"T2 Social Connectedness") # we get rid of the columns we don't need.
row.names(meansd)[1] <- "Mean (SD)" # Changes the name of the row to "Mean (SD)"
correlation <- S1master %>%
select("T1 Lethargy":"Extraversion") # We need to select only the variables we want to perform a pairwise correlation on
s1corr <- round(cor(correlation), 2) # the "2" restricts the number of decimal places to 2
s1upcor <- s1corr
# Let's now select only the bottom half triangle of the matrix to reduce the redundancy
s1upcor[upper.tri(s1corr)] <- ""
s1upcor <- as.data.frame(s1upcor) # This step turns the matrix into a dataframe for us to feed into the gt function
s1final <- rbind(s1upcor, meansd) # rbind attaches the Mean (SD) row underneath the correlations
s1final <- s1final %>%
gt(rownames_to_stub = TRUE) %>%
tab_header(
title = md("**Table 1**: Means and Correlations Among Variables at Time 1 and Time 2 (Study 1)")
) %>%
opt_align_table_header("left") %>%
cols_width(
everything() ~px(150)
) %>%
opt_table_font("Times New Roman")
s1final
**Table 1**: Means and Correlations Among Variables at Time 1 and Time 2 (Study 1)

| | T1 Lethargy | T2 Lethargy | Lethargy Diff (T2 - T1) | T1 Social Connectedness | T2 Social Connectedness | Connectedness Diff (T2 - T1) | Extraversion |
|---|---|---|---|---|---|---|---|
| T1 Lethargy | 1 | | | | | | |
| T2 Lethargy | 0.41 | 1 | | | | | |
| Lethargy Diff (T2 - T1) | -0.48 | 0.6 | 1 | | | | |
| T1 Social Connectedness | -0.48 | -0.27 | 0.16 | 1 | | | |
| T2 Social Connectedness | -0.36 | -0.41 | -0.08 | 0.66 | 1 | | |
| Connectedness Diff (T2 - T1) | 0.16 | -0.16 | -0.3 | -0.45 | 0.38 | 1 | |
| Extraversion | -0.28 | -0.14 | 0.11 | 0.56 | 0.5 | -0.09 | 1 |
| Mean (SD) | 2.6 (1.16) | 3.16 (1.27) | 0.56 (1.33) | 4.11 (0.88) | 3.97 (0.85) | -0.14 (0.71) | 4.17 (1.01) |
The process for creating the correlations table for study 2 was essentially the same, just with some different values and variable names etc. Some minor adjustments were made to this table, like changing the order of the columns to match the table in the paper: s2correlation <- s2correlation[, c(2, 3, 4, 8, 9, 10, 5, 6, 7, 1)].
# Let's do the same for study 2
s2 <- read_csv("Study 2.csv") %>%
clean_names() %>%
# Let's first select all the relevant columns we'll be manipulating
select(participant_id:bmpn_diff, -gender, -age, -ethnicity) %>%
# Rename them to match the original paper's labels
rename("T1 Extraversion" = t1extraversion,
"T1 Life Satisfaction" = t1swls,
"T2 Life Satisfaction" = t2swls,
"Life Satisfaction Change (T2 - T1)" = swls_diff,
"T1 Relatedness" = t1bmpn,
"T2 Relatedness" = t2bmpn,
"Relatedness Change (T2 - T1)" = bmpn_diff,
"T1 Loneliness" = t1lonely,
"T2 Loneliness" = t2lonely,
"Loneliness Change (T2 - T1)" = lonely_diff)
# Again, we create a summary of means and sd.
s2mean <- s2 %>%
summarise(across(where(is.numeric), mean)) %>%
pivot_longer(!participant_id, names_to = "variable", values_to = "Mean")
s2sd <- s2 %>%
summarise(across(where(is.numeric), sd)) %>%
pivot_longer(!participant_id, names_to = "variable", values_to = "SD")
# Merge the two dataframes by "variable", which is the only common column across the two sets
s2meansd <- merge(s2mean, s2sd, by = c("variable")) %>%
mutate_if(is.numeric, round, digits = 2) %>%
mutate(SD = paste0("(", (SD), ")")) %>%
unite("Mean (SD)", Mean, "SD", sep = " ") %>%
pivot_wider(names_from = "variable", values_from = "Mean (SD)") %>%
select("Life Satisfaction Change (T2 - T1)":"T2 Relatedness")
row.names(s2meansd)[1] <- "Mean (SD)"
s2correlation <- s2 %>%
select("T1 Extraversion":"Relatedness Change (T2 - T1)") # We need to select only the variables we want to perform a pairwise correlation on
s2correlation <- s2correlation[, c(2, 3, 4, 8, 9, 10, 5, 6, 7, 1)] # reorder so extraversion is the last column, matching the table in the paper
s2corr <- round(cor(s2correlation), 2) # the "2" restricts the number of decimal places to 2
# Let's now select only the bottom half triangle of the matrix to reduce the redundancy
s2corr[upper.tri(s2corr)] <- ""
s2upcor <- as.data.frame(s2corr) # This step turns the matrix into a dataframe for us to feed into the gt function
s2final <- rbind(s2upcor, s2meansd)
s2final <- s2final %>%
gt(rownames_to_stub = TRUE) %>%
tab_header(
title = md("**Table 3**: Correlations Among Variables for Time 1 and Time 2 (Study 2).")
) %>%
opt_align_table_header("left") %>%
cols_width(
everything() ~px(150)
) %>%
opt_table_font("Times New Roman")
s2final
**Table 3**: Correlations Among Variables for Time 1 and Time 2 (Study 2)

| | T1 Life Satisfaction | T2 Life Satisfaction | Life Satisfaction Change (T2 - T1) | T1 Relatedness | T2 Relatedness | Relatedness Change (T2 - T1) | T1 Loneliness | T2 Loneliness | Loneliness Change (T2 - T1) | T1 Extraversion |
|---|---|---|---|---|---|---|---|---|---|---|
| T1 Life Satisfaction | 1 | | | | | | | | | |
| T2 Life Satisfaction | 0.83 | 1 | | | | | | | | |
| Life Satisfaction Change (T2 - T1) | -0.37 | 0.22 | 1 | | | | | | | |
| T1 Relatedness | 0.51 | 0.45 | -0.14 | 1 | | | | | | |
| T2 Relatedness | 0.35 | 0.42 | 0.09 | 0.5 | 1 | | | | | |
| Relatedness Change (T2 - T1) | -0.13 | 0 | 0.22 | -0.46 | 0.54 | 1 | | | | |
| T1 Loneliness | -0.56 | -0.55 | 0.07 | -0.69 | -0.48 | 0.19 | 1 | | | |
| T2 Loneliness | -0.46 | -0.55 | -0.11 | -0.58 | -0.65 | -0.1 | 0.8 | 1 | | |
| Loneliness Change (T2 - T1) | 0.2 | 0.03 | -0.29 | 0.22 | -0.24 | -0.46 | -0.38 | 0.24 | 1 | |
| T1 Extraversion | 0.4 | 0.4 | -0.02 | 0.34 | 0.24 | -0.08 | -0.54 | -0.42 | 0.22 | 1 |
| Mean (SD) | 3.97 (1.53) | 3.99 (1.45) | 0.02 (0.88) | 4.92 (1.09) | 4.91 (1.14) | -0.01 (1.11) | 2.12 (0.65) | 2.06 (0.62) | -0.06 (0.4) | 3.9 (0.79) |
This table shows the percentage of participants practising physical distancing in each country and the mean number of people (excluding household members) that participants came within 6 feet of on the day before the measure. Only countries with at least 15 participants are shown, ordered by sample size.
library(tidyverse) # for dplyr
library(dplyr) # for various helpful data manipulation functions.
library(gt) # to create the gt table
library(janitor) # for clean_names
PhyDist <- read_csv("Study 2.csv") %>%
clean_names() %>% # converts variable names to lowercase snake_case
rename(Country = country)
Here’s the code that we used to create the table. It required a lot of trial and error with filtering, arranging, summarising and formatting. The second half of the code is mainly just for aesthetics.
PhyDist <- PhyDist %>% #summarise % of people phys distancing and mean and sd of 6feet column
group_by(Country) %>% #Grouping by each country
summarise(sample_size = n(),
"% Physical \n distancing" = mean(social_distancing),
"Mean Six feet response \n (SD)" = mean(six_feet), # calculate mean number of people within 6 feet.
sd_six_feet = sd(six_feet) # calculate sd
) %>%
ungroup() %>% #ungroup after summarise
arrange(desc(sample_size)) %>% #arranging by sample size in descending order
filter(sample_size > 14) %>% #filter for number of participants greater than 14
gt() %>% #make it into gt table
fmt_number(columns = (3:5), decimals = 2) %>% # format columns 3-5 of the gt table (distancing, six-feet mean and SD) to 2 dp
fmt_number(columns = 5, pattern = "({x})") %>% #puts brackets around sd (column 5)
fmt_percent(
columns = vars("% Physical \n distancing"), #converting this col to % values
decimals = 0
) %>%
cols_merge(columns = vars(`Mean Six feet response \n (SD)`, sd_six_feet)) %>% #merge m and sd for 6feet
#aesthetics below
cols_width( #change col width - aesthetics
everything() ~ px(120)
) %>%
cols_label( #rename sample size to Sample Size - aesthetics
sample_size = "Sample Size"
) %>%
tab_source_note( #produces notes under table - table captions
source_note = md("*Note.* Only countries with at least 15 participants displayed")
) %>%
tab_footnote( #footnote below the table caption
footnote = "Mean number of people reported to have been within six feet of participant on previous day",
locations = cells_column_labels( #this associates the footnote with the mean for 6feet column
columns = "Mean Six feet response \n (SD)"
)
) %>%
opt_footnote_marks(marks = letters) %>% #make footnote a letter instead of number - the little a thing
tab_header( #heading of overall table
title = md("**Table S2**"),
subtitle = md("*Physical Distancing by Country (Study 2)*")
) %>%
opt_table_font("Times New Roman") %>% #align top header left, font, centring data in tables
opt_align_table_header("left") %>%
cols_align(
align = c("center"),
columns = everything()
)
PhyDist
**Table S2**
*Physical Distancing by Country (Study 2)*

| Country | Sample Size | % Physical distancing | Mean Six feet response (SD)ᵃ |
|---|---|---|---|
| USA | 104 | 93% | 1.01 (1.58) |
| UK | 90 | 96% | 1.12 (1.74) |
| Poland | 29 | 79% | 1.48 (2.11) |
| Portugal | 23 | 100% | 0.70 (1.82) |
| Canada | 15 | 100% | 0.87 (1.41) |

*Note.* Only countries with at least 15 participants displayed.
ᵃ Mean number of people reported to have been within six feet of participant on previous day.
In my third reaction to the paper, I was critical of the timing of the second measurement, which was conducted very early in the spread of Covid-19, when some countries had not yet implemented very strict quarantine rules. I thought this might explain the lack of significant change in social connectedness between the two time points. To test whether this pattern appeared in other measures, I wanted to investigate whether country of residence predicted life satisfaction scores before and during Covid.
library(tidyverse) # for dplyr and ggplot
library(dplyr) # for data frame manipulation functions
library(ggplot2) # for graphs
To do this, I used the data from study 2 and looked at life satisfaction at time 1 and time 2, by country. I chose to analyse countries with more than 14 participants, because country sample sizes below 15 were too small to be meaningful (the paper's exploratory country analyses used the same cutoff).
I called our data frame LS_Country to indicate that we were measuring Life Satisfaction (LS) by country. The life satisfaction difference variable was included to compare figures later on.
LS_Country <- read.csv("Study 2.csv")
LS_Country <- LS_Country %>%
group_by(Country) %>% # Group scores by country
select(Country, T1SWLS, T2SWLS, SWLS_Diff) %>% # Select only the country, T1 and T2 life satisfaction scores.
summarise(sample_size = n(), # create a new column that displays number of participants
"T1LS" = mean(T1SWLS), # create a column that displays mean life satisfaction at time 1
"T2LS" = mean(T2SWLS), # create a column that displays mean life satisfaction at time 2
"LS_Diff" = mean(SWLS_Diff) # create a column that displays mean life satisfaction differences across T1 and T2
) %>%
filter(sample_size > 14) %>% # filter out all countries with less than 15 participants
arrange(desc(sample_size)) # arrange with the highest no. of participants at the top
LS_Country
## # A tibble: 5 x 5
## Country sample_size T1LS T2LS LS_Diff
## <chr> <int> <dbl> <dbl> <dbl>
## 1 USA 104 3.98 4.03 0.0519
## 2 UK 90 4.14 4.02 -0.116
## 3 Poland 29 3.81 4 0.186
## 4 Portugal 23 3.90 3.65 -0.252
## 5 Canada 15 3.83 4.12 0.293
We are left with 5 out of 28 countries with 15 or more participants: USA, UK, Poland, Portugal and Canada. Now we will compare these mean scores in a bar graph.
To be able to create a bar graph with a side-by-side comparison of T1 and T2 by country, I first have to convert my dataframe from wide to long. Here I use the pivot_longer function, which takes the T1LS and T2LS columns from my data frame, moves the column names into a new 'Time' column, and aligns the means with each time point in a separate column labelled 'Means'. So now my data frame looks something like this:
LS_Country_Long <- pivot_longer(LS_Country,
cols = c(T1LS, T2LS),
names_to = "Time",
values_to = "Means")
LS_Country_Long
## # A tibble: 10 x 5
## Country sample_size LS_Diff Time Means
## <chr> <int> <dbl> <chr> <dbl>
## 1 USA 104 0.0519 T1LS 3.98
## 2 USA 104 0.0519 T2LS 4.03
## 3 UK 90 -0.116 T1LS 4.14
## 4 UK 90 -0.116 T2LS 4.02
## 5 Poland 29 0.186 T1LS 3.81
## 6 Poland 29 0.186 T2LS 4
## 7 Portugal 23 -0.252 T1LS 3.90
## 8 Portugal 23 -0.252 T2LS 3.65
## 9 Canada 15 0.293 T1LS 3.83
## 10 Canada 15 0.293 T2LS 4.12
I then create the graph using ggplot and geom_bar, assigning country to the x axis and means to the y axis. Using 'Time' as the fill argument differentiates time 1 and time 2 by colour. I had to use stat = 'identity', which tells ggplot that I will provide the values for y rather than letting it count the number of rows for each x value (the default). The position = "dodge" argument places the bars for each country side by side, something I discovered after a bit of Google searching. width = 0.5 sets the width of the bars. scale_y_continuous(limits = c(0, 5)) sets the y axis limits to 0 to 5, which is the scale of the life satisfaction measure, and expand = c(0, 0) removes the gap between the x axis and the graph. labs labels the y axis, and scale_fill_discrete(labels = ) relabels the fill variable (Time) in the legend.
LS_Country_Graph <- ggplot(data = LS_Country_Long) +
geom_bar(mapping = aes(x = Country, y = Means, fill = Time),
stat = 'identity',
position = "dodge",
width = 0.5) +
scale_y_continuous(limits = c(0, 5), expand = c(0, 0)) +
labs(y = "Mean life satisfaction") +
scale_fill_discrete(labels = c("Pre-Covid Life Satisfaction", "During-Covid Life Satisfaction")) # labels for the key
LS_Country_Graph
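As a side note, geom_col() is ggplot2's shorthand for geom_bar(stat = 'identity'), so an equivalent and arguably more idiomatic version of the same plot would be:

# Equivalent plot using geom_col(), which assumes the y values are supplied
ggplot(LS_Country_Long, aes(x = Country, y = Means, fill = Time)) +
  geom_col(position = "dodge", width = 0.5) +
  scale_y_continuous(limits = c(0, 5), expand = c(0, 0)) +
  labs(y = "Mean life satisfaction") +
  scale_fill_discrete(labels = c("Pre-Covid Life Satisfaction", "During-Covid Life Satisfaction"))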
The graph shows very minor differences in life satisfaction across countries and very little difference within countries between time 1 and time 2. Portugal was the most negatively affected by Covid-19, with an average drop in life satisfaction of 0.25 points. Conversely, Canada saw an average increase in life satisfaction of 0.29 points. Countries like the USA and the UK likely showed smaller changes in life satisfaction partly because they had more participants than Poland, Portugal and Canada, making extreme mean changes less likely. The differences in life satisfaction between time 1 and time 2 can be seen below. These results are consistent with my reaction that the time 2 measurements were taken too prematurely in the lifespan of the pandemic.
LS_Country %>%
select(Country, LS_Diff)
## # A tibble: 5 x 2
## Country LS_Diff
## <chr> <dbl>
## 1 USA 0.0519
## 2 UK -0.116
## 3 Poland 0.186
## 4 Portugal -0.252
## 5 Canada 0.293
In my first reaction to the paper, I was surprised that the isolation of Covid-19 had no real impact on social connectedness and loneliness for the most introverted and extraverted participants. As a result, I wanted to explore whether participant loneliness differed between genders during Covid-19 compared to before. In study 2, the researchers measured loneliness pre-Covid using a 20-item loneliness measure, and participants' loneliness was re-measured during Covid-19 (in early April). Scores range from 1 (never lonely) to 4 (often lonely).
library(ggplot2) # for creating the graph
library(tidyverse) # for ggplot
library(dplyr) # for dataframe manipulation
Study_2 <- read.csv("Study 2.csv")
Here we are calling from Study_2, grouping by gender and using summarise to generate a column that specifies the percentage of genders in study 2. This is the same code that was used in the gender demographics for the verification section.
#55% males overall in study 2
Study_2 %>%
group_by (Gender) %>%
summarise (percent = 100 * n() / nrow (Study_2))
## # A tibble: 3 x 2
## Gender percent
## * <chr> <dbl>
## 1 Female 44.9
## 2 Male 54.8
## 3 Non-Binary 0.298
Because the sample size for the 'Non-Binary' category is too small (n = 1), I will use only the male and female groups, which have an almost even 50/50 split.
Because I'm comparing the two visually, I'm going to make my data frame long to make life easier. I select the variables I need and pipe them into a pivot_longer function. cols selects the columns I want to pivot, names_to sends the "T1Lonely" and "T2Lonely" variable names into a new "time" column, and values_to sends the loneliness scores into a new "values" column, keeping each score aligned with its time point.
GenderLonely_long <- Study_2 %>%
select(Gender, T1Lonely, T2Lonely) %>%
pivot_longer(cols = T1Lonely:T2Lonely,
names_to = "time",
values_to = "values")
GenderLonely_long
## # A tibble: 672 x 3
## Gender time values
## <chr> <chr> <dbl>
## 1 Male T1Lonely 1.42
## 2 Male T2Lonely 1.47
## 3 Female T1Lonely 3.16
## 4 Female T2Lonely 2.89
## 5 Female T1Lonely 2.68
## 6 Female T2Lonely 2.74
## 7 Male T1Lonely 1.42
## 8 Male T2Lonely 2.21
## 9 Male T1Lonely 1.42
## 10 Male T2Lonely 1.11
## # ... with 662 more rows
I spent a lot of time fiddling with this section. Originally I had split my dataframe into male and female groups and merged them, as I did in my second exploratory analysis. But with the data in long format, I knew there would be a way to do it all in one code chunk, and it turned out to be a lot simpler than I had thought. I plugged in my long dataset, grouped by gender and time, and created a new column containing the mean loneliness scores for each gender-by-time combination. Finally, I filtered out the n = 1 non-binary participant due to the small sample size.
Gender_lonely <- GenderLonely_long %>%
group_by(Gender, time) %>%
summarise(Loneliness = mean(values)) %>%
filter(Gender != "Non-Binary")
## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.
Gender_lonely
## # A tibble: 4 x 3
## # Groups: Gender [2]
## Gender time Loneliness
## <chr> <chr> <dbl>
## 1 Female T1Lonely 2.08
## 2 Female T2Lonely 2.00
## 3 Male T1Lonely 2.16
## 4 Male T2Lonely 2.12
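As an aside, the grouping message printed by summarise above can be silenced by passing .groups = "drop" (available in dplyr 1.0.0 and later), which also drops the leftover grouping in one step:

# Same summary without the grouping message
Gender_lonely <- GenderLonely_long %>%
  group_by(Gender, time) %>%
  summarise(Loneliness = mean(values), .groups = "drop") %>%
  filter(Gender != "Non-Binary")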
I tried to keep the graph as simple as I could. I set the aesthetics to group the lines by Gender and set the axes; linetype was also mapped to Gender so the two lines look different. geom_line() draws the lines and geom_point() adds the mean points. scale_y_continuous sets the y axis limits with the limits argument and brings the graph down onto the x axis with the expand argument.
Gender_lonelyline <- ggplot(Gender_lonely,
aes(group = Gender,
x = time,
y = Loneliness,
linetype = Gender)) +
geom_line() +
geom_point() +
scale_y_continuous(limits = c(1, 4), expand = c(0, 0)) +
labs(x = "Time") +
scale_x_discrete(labels = c("Before Pandemic", "During Pandemic")) +
theme_gray()
Gender_lonelyline
Interestingly, the loneliness scores do not differ much between pre-Covid and during Covid; there is only a very slight decrease for both genders. On average, females seem to experience slightly less loneliness than males, both before and during Covid-19.
I would recommend using clearer variable names in the codebook and R files. A good variable name lets you know what it is at a glance. For example, instead of using BMPN (the tool used to measure relatedness) to represent relatedness in the code, just use “Relatedness”. It sounds simple, but just naming things what they are can save a lot of confusion in reading code.
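For instance, the relatedness columns could be renamed on import (a hypothetical sketch; I'm assuming the raw column names T1BMPN and T2BMPN from the Study 2 file):

# Hypothetical illustration: give codebook variables self-explanatory names
# (T1BMPN / T2BMPN assumed to be the raw relatedness column names)
Study_2 <- read.csv("Study 2.csv") %>%
  rename(T1_Relatedness = T1BMPN,
         T2_Relatedness = T2BMPN)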
The studies included a number of measures but never operationalised what counted as a "high" or "low" score on any variable, nor whether a higher score was better or worse for the construct being measured. I therefore had to look up the actual measures on Google to work out how to interpret the scores, which was a bit of a time waster.
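One lightweight way to bake this information into the analysis itself is an explicit range check that documents the scale's direction and bounds in code (a sketch using the 1-4 loneliness anchors described earlier):

# Hedged sketch: document and enforce the loneliness scale in code
# (1 = never lonely, 4 = often lonely, per the measure used in study 2)
stopifnot(all(Study_2$T1Lonely >= 1 & Study_2$T1Lonely <= 4, na.rm = TRUE))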
While this seems like an obvious one: include code for all analyses, including figures. It might take the researchers a while to annotate what the code does, but it would make the whole process of replication much faster. You wouldn't have to waste time working out which packages, functions, arguments and aesthetics/themes they used. It would also open up the opportunity to first reproduce the researchers' results with your own process, and then cross-reference afterwards if you got different results, to check whether their code is accurate.