Part 1: Summary and Reaction

Summary

The Covid-19 pandemic that began in early 2020 was unprecedented and showed how under-prepared many nations were to face such a threat. However, it also acted as something of a global scale social experiment. With many having to quarantine and completely change the way they work and live, there became opportunities to study psychology, as such socially isolating conditions were not attainable prior to the pandemic and would not be attainable post-pandemic. Folk, et al. (2020) capitalized on this and explored the impact of the socially depriving aspects of Covid-19 on social-connectedness, comparing measures conducted prior to the pandemic with the same recruits measured during the pandemic. Analyses of individual extraversion levels were also explored to see how social connectedness impacted extraverted and introverted groups. Because this was the first time nations across the globe have mandated quarantine like we’ve seen during Covid-19, there has been no prior research on the matter. Thus, allowing Folk et al. to research a previously unfilled gap in Psychology.

Two studies were conducted, study 1 collected data from (n = 467) undergrads from a Canadian university, while study 2 used (n = 336) adult participants from 28 countries. In study 1, students were surveyed at time 1 (pre-Covid) on social connectedness, with measures of lethargy used as a proxy for well-being. At time 2, early April 2020 (during Covid-19), the same students were surveyed on social connectedness, but also included measures of extraversion and experiences during Covid-19, for exploratory analyses. In study 2, at time 1, participants were recruited to complete an online survey that measured personality, social connectedness, subjective wellbeing and demographics. At time 2, early April 2020 (during Covid-19), the same students were surveyed on the same measures as time 1, as well as including exploratory measures around experiences during Covid-19.

Folk et al. found that in general, there was no significant change in feelings of social connectedness as a result of the pandemic. Through an exploratory analysis of impact on the most extraverted (> 75th percentile) and introverted participants (< 25th percentile) in study 1, findings show that extraverts experienced a larger drop in social connectedness than introverts between time 1 and 2. However, this can be attributed to the fact that extraverts had more social connection to “lose”, as impact of extraversion on social connection disappeared when controlled for levels of social connectedness before the pandemic. What was also found was that levels of loneliness slightly decreased for introverts between time 1 (pre-Covid) and time 2 (during Covid).

Reaction

I was surprised that there seemed to be no big changes in social connectedness and loneliness for introverted or extraverted participants, despite the isolating nature of the pandemic. I would have expected extraverted individuals to experience a large decline in feelings of social connectedness & loneliness. While I would have expected introverts to experience a much smaller drop in social connectedness and perhaps a large decrease in loneliness in comparison to extraverts.
I was confused by the fact that they were able to draw results about extraversion in study 1 when they didn’t measure extraversion of participants in the pre-Covid measure. By asking for a pre-Covid retrospective rating of extraversion, the experiment is lending itself to being influenced by the Hawthorne effect. I was also confused as to why they believed lethargy ratings could be used as a rough proxy to measure well-being.
I wonder whether the low changes in social connectedness were related to the second measure being conducted relatively short time after the first lockdown happened. Perhaps if measures of social connectedness were conducted after the first lockdown had been prolonged for a while, or during a second lockdown (depending on what country you’re in) there would be a difference in the results.

Part 2: Verification

In this study, I verified:

Demographics data for:

Study 1:

n = 467, Age = (M = 20.89, SD = 3.03), Gender: 77% women

Study 2:

n = 336, Age: (M = 32.03, SD = 11.94), Gender: 55% Male, Ethnicity: 80% White, Country: 32% U.S., 27% U.K.)

Means and Correlations Among Variables at Time 1 and Time 2 for:

Study 1

Study 2

Physical Distancing by Country (Study 2)

Histograms for:

Study 1 - Distribution of Social Connectedness Differences scores

Study 2 - Distribution of relatedness and loneliness difference scores

Study 1 - demographics

To start, we verified the demographics of the study. This was an easy process as our study had a codebook, so we could identify what the variables meant.

Read in libraries

First, we read in our libraries.

library(dplyr) # for summary, group_by and %>% 
library(tidyverse) # for dplyr

Read in data

Reading in the study 1 data.

Study_1_data <- read.csv("Study 1 data.csv")

Confirm n = 467

In this code, we try to confirm that the number of participants in study 1 was 467. length gets the length of a vector, in this case, the length of participant ID. unique is in there to return the vector to length. Our result confirms n = 467.

length(unique(Study_1_data$Participant.ID))

## [1] 467

Confirm Age demographics, confirm if M = 20.89, SD = 3.03

Here we are trying to verify that the mean age of participants was 20.89, and the SD was 3.03. I had to convert “Age” from a character to an integer using as.integer, to be able to calculate the mean, because it kept appearing as ‘NA’ in the output. This was weird because the Age data in study 2 was set to integer correctly. The na.rm argument removes all missing (NA) values.

Study_1_data$Age <- as.integer(Study_1_data$Age)
sd(Study_1_data$Age, na.rm  = TRUE)

## [1] 3.034918

mean(Study_1_data$Age, na.rm  = TRUE)

## [1] 20.88616

And it was successfully confirmed!

Confirm Gender demographics, confirm if 77% women

Here we group the data by gender and create a percent column that counts the number of participants of each gender and puts it over the total number of participants in the study.

Study_1_data %>% 
  group_by(Gender) %>% 
  summarise(percent = 100 * n()/ nrow(Study_1_data)) # count number of participants in each category, over the total number of participants in study 1.

## # A tibble: 4 x 2
##   Gender              percent
## * <chr>                 <dbl>
## 1 [Decline to Answer]   0.428
## 2 Man                  21.6  
## 3 Other                 0.857
## 4 Woman                77.1

Confirming that the percentage of women in the study was 77%.

Study 2 Demographics

The process of reproducing the study 2 demographics was pretty much the same.

Read in data

Reading in the data.

Study_2 <- read.csv("Study 2.csv") # read data

Number of participants, confirm if n = 336

length which gets length of a vector, in this case, the length of participant ID. unique is in there to return the vector to length. This confirms the number of participants for study 2 to be 336.

length(unique(Study_2$Participant_ID)) #no. of participants

## [1] 336

Mean and SD age of participants, confirm age M = 32.03 SD = 11.94

Calculating the mean and SD of Age from the Study_2 dataframe.

mean(Study_2$Age) # mean of participants age

## [1] 32.02679

sd(Study_2$Age) #standard deviation of participants age

## [1] 11.94299

% of males in study 2, confirm if = 55% males

Here try to confirm the % of males in the study. We first call Study_2, and say group_by gender and then create a new column with the percentage of genders using the summarise function.

#55% males overall in study 2
Study_2 %>% 
  group_by (Gender) %>% 
  summarise (percent = 100 * n() / nrow (Study_2))

## # A tibble: 3 x 2
##   Gender     percent
## * <chr>        <dbl>
## 1 Female      44.9  
## 2 Male        54.8  
## 3 Non-Binary   0.298

% of ethnicities in study 2, confirm if = 80% white

To create a percentage table for ethnicities, we use the same process as we did for the gender table. As we can see, 80% of participants were white (79.76% rounded up).

#80% white, percentage of ethnicities in study 2
Study_2 %>% 
  group_by(Ethnicity) %>% 
  summarise (percent = 100 * n() / nrow (Study_2))

## # A tibble: 8 x 2
##   Ethnicity                      percent
## * <chr>                            <dbl>
## 1 American Indian/Alaskan Native   0.298
## 2 Asian                            8.93 
## 3 Black                            2.68 
## 4 Hispanic                         4.17 
## 5 Middle-Eastern                   0.893
## 6 More than one                    2.98 
## 7 Other                            0.298
## 8 White                           79.8

What percentage of participants are from the US & UK?, confirm if = 32% US, 27% UK

Same process, but we added an arrange function in descending order so that USA and UK appear at the top. There is a rounding error for the USA statistic, the percentage should be 31%, not 32%, so that failed to be reproduced. But the UK statistic was correct.

#32% US, 27% UK, what percentage of participants are from the US & UK?
Study_2 %>% 
  group_by(Country) %>% 
  summarise (percent = 100 * n() / nrow (Study_2)) %>% 
  arrange(desc(percent))

## # A tibble: 29 x 2
##    Country  percent
##    <chr>      <dbl>
##  1 USA        31.0 
##  2 UK         26.8 
##  3 Poland      8.63
##  4 Portugal    6.85
##  5 Canada      4.46
##  6 Greece      3.27
##  7 Italy       2.98
##  8 Germany     1.79
##  9 Mexico      1.79
## 10 Slovenia    1.49
## # ... with 19 more rows

Means and Correlations Among Variables at Time 1 and Time 2

Study 1 - Means and Correlations Among Variables at Time 1 and Time 2

Read packages

library(janitor) # for the clean_names function
library(tidyverse) # for dplyr and ggplot
library(dplyr) # for various helpful data manipulation functions.
library(gt) # to create the gt table

Read in the data

clean_names makes all the variable names lower case. We then select all the columns relevant to the table using the select function. We then renamed all the columns to the names they used in the paper using the rename function.

# Correlations and Means (SD) tables for Study 1 and Study 2 

S1master <- read_csv("Study 1 Data.csv") %>% 
  clean_names() %>% 
  
  # Let's first select all the relevant columns we'll be manipulating and get rid of gender and age.
  select(participant_id:extraversion, -gender, -age) %>% 
  
  # Rename them to match the original paper's labels
  rename("T1 Lethargy" = lethaverage_t1,
         "T2 Lethargy" = lethaverage_t2,
         "Lethargy Diff (T2 - T1)" = leth_diff,
         "T1 Social Connectedness" = scaverage_t1,
         "T2 Social Connectedness" = scaverage_t2,
         "Connectedness Diff (T2 - T1)" = s_cdiff, 
         "Extraversion" = extraversion)

Creating the table

We then create two data frames using the summarise function to find the mean and SD for the relevant columns. pivot_longer was used to convert the data frame from wide to long. After these two dataframes are created, we merge the two dataframes together. We use mutate_if and it’s arguments to mutate only numerics and round them to 2 dp. Once we merge the data frames for the means and sd of each variable, we correlate the data for the s1 into a different dataframe.

We select the variables we want to perform the correlation on with the select() function and put that into a variable called ‘correlation’. Then, we use the cor function to correlate and use round to round to 2 decimal places. We then want to only display correlations for the bottom half of the table to reduce redundancies, so we filled the upper.tri for s1upcor with just empty space so it appears blank (""). We then convert the table from a matrix into a dataframe using as.data.frame, because gt does like matrices. We then bind both the meansd data frame, and s1upcor dataframe together, which binds meansd to the bottom of the table. We then put our bound dataframe into a gt function and added rownames_to_stub = TRUE to make the row names the same as the column names, we had a lot of trouble trying to figure out how to do this. Finally, we added a title and changed the font with opt_table_font.

# We first create a summary of means and sd. Pivoting longer is necessary for both dataframes in order to merge them horizontally later! 
S1mean <- S1master %>% 
  summarise(across(where(is.numeric), mean)) %>% # get everything that's numeric
  pivot_longer(!participant_id, names_to = "variable", values_to = "Mean") # the exclamation mark indicates that we don't want to edit the participant_id column. All the columns get labeled under "variable" and make values 

# Do the same for sd
S1sd <- S1master %>%  
  summarise(across(where(is.numeric), sd)) %>% 
  pivot_longer(!participant_id, names_to = "variable", values_to = "SD") 

# Merge the two dataframes by "variable", which is the only common column across the two sets
meansd <- merge(S1mean, S1sd, by = c("variable")) %>% 
  mutate_if(is.numeric, round, digits = 2) %>% # round the numbers to 2 decimal places.
  mutate(SD = paste0("(", (SD), ")")) %>% # adds the brackets around the SD in the last row.
  unite("Mean (SD)", Mean, "SD", sep = " ") %>% # Merges mean and SD columns together, and puts a space in between the mean and the SD. 
  pivot_wider(names_from = "variable", values_from = "Mean (SD)") %>% # we pivot wider so that the variable names turn into column headers. 
  select("Connectedness Diff (T2 - T1)":"T2 Social Connectedness") # we get rid of the columns we don't need.
  row.names(meansd)[1] <- "Mean (SD)" # Changes the name of the row to "Mean (SD)" 

  
  
correlation <- S1master %>% 
  select("T1 Lethargy":"Extraversion") # We need to select only the variables we want to perform a pairwise correlation on

s1corr <- round(cor(correlation), 2) # the "2" restricts the number of decimal places to 2 
s1upcor <- s1corr

# Let's now select only the bottom half triangle of the matrix to reduce the redundancy
s1upcor[upper.tri(s1corr)]<-""
s1upcor<-as.data.frame(s1upcor) # This step turns the matrix into a dataframe for us to feed into the gt function. Bind combines them.

s1final <- rbind(s1upcor, meansd)
s1final <- s1final %>% 
  gt(rownames_to_stub = TRUE) %>% 
  tab_header(
    title = md("**Table 1**: Means and Correlations Among Variables at Time 1 and Time 2 (Study 1)")
  ) %>% 
  opt_align_table_header("left") %>% 
  cols_width(
    everything() ~px(150)
  ) %>% 
  opt_table_font("Times New Roman")

s1final

Table 1: Means and Correlations Among Variables at Time 1 and Time 2 (Study 1)

	T1 Lethargy	T2 Lethargy	Lethargy Diff (T2 - T1)	T1 Social Connectedness	T2 Social Connectedness	Connectedness Diff (T2 - T1)	Extraversion
T1 Lethargy	1
T2 Lethargy	0.41	1
Lethargy Diff (T2 - T1)	-0.48	0.6	1
T1 Social Connectedness	-0.48	-0.27	0.16	1
T2 Social Connectedness	-0.36	-0.41	-0.08	0.66	1
Connectedness Diff (T2 - T1)	0.16	-0.16	-0.3	-0.45	0.38	1
Extraversion	-0.28	-0.14	0.11	0.56	0.5	-0.09	1
Mean (SD)	2.6 (1.16)	3.16 (1.27)	0.56 (1.33)	4.11 (0.88)	3.97 (0.85)	-0.14 (0.71)	4.17 (1.01)

Study 2 - Correlations Among Variables for Time 1 and Time 2

The process for creating the correlations table for study 2 was essentially the same, just with some different values and variable names etc. Some minor adjustments were made to this table, like changing the order of the columns to match the table in the paper: s2correlation <- s2correlation[, c(2, 3, 4, 8, 9, 10, 5, 6, 7, 1)].

# Let's do the same for study 2 

s2 <- read_csv("Study 2.csv") %>% 
  clean_names() %>% 
# Let's first select all the relevant columns we'll be manipulating
  select(participant_id:bmpn_diff, -gender, -age, -ethnicity) %>% 
  
  # Rename them to match the original paper's labels
  rename("T1 Extraversion" = t1extraversion,
         "T1 Life Satisfaction" = t1swls,
         "T2 Life Satisfaction" = t2swls,
         "Life Satisfaction Change (T2 - T1)" = swls_diff,
         "T1 Relatedness" = t1bmpn,
         "T2 Relatedness" = t2bmpn, 
         "Relatedness Change (T2 - T1)" = bmpn_diff,
         "T1 Loneliness" = t1lonely,
         "T2 Loneliness" = t2lonely,
         "Loneliness Change (T2 - T1)" = lonely_diff)

# Again, we create a summary of means and sd.
s2mean <- s2 %>%
  summarise(across(where(is.numeric), mean)) %>% 
  pivot_longer(!participant_id, names_to = "variable", values_to = "Mean")

s2sd <- s2 %>%  
  summarise(across(where(is.numeric), sd)) %>% 
  pivot_longer(!participant_id, names_to = "variable", values_to = "SD")

# Merge the two dataframes by "variable", which is the only common column across the two sets
s2meansd <- merge(s2mean, s2sd, by = c("variable")) %>% 
  mutate_if(is.numeric, round, digits = 2) %>% 
  mutate(SD = paste0("(", (SD), ")")) %>% 
  unite("Mean (SD)", Mean, "SD", sep = " ") %>% 
  pivot_wider(names_from = "variable", values_from = "Mean (SD)") %>% 
  select("Life Satisfaction Change (T2 - T1)":"T2 Relatedness")
  row.names(s2meansd)[1] <- "Mean (SD)"
  
  
s2correlation <- s2 %>% 
  select("T1 Extraversion":"Relatedness Change (T2 - T1)") # We need to select only the variables we want to perform a pairwise correlation on

s2correlation <- s2correlation[, c(2, 3, 4, 8, 9, 10, 5, 6, 7, 1)] # Puts extraversion as last column. Because in the paper, extraversion was the last column in the table. 

s2corr <- round(cor(s2correlation), 2) # the "2" restricts the number of decimal places to 2 

# Let's now select only the bottom half triangle of the matrix to reduce the redundancy
s2corr[upper.tri(s2corr)]<-""
s2upcor <- as.data.frame(s2corr) # This step turns the matrix into a dataframe for us to feed into the gt function 

s2final <- rbind(s2upcor, s2meansd)

s2final <- s2final %>% 
  gt(rownames_to_stub = TRUE) %>% 
  tab_header(
    title = md("**Table 3**: Correlations Among Variables for Time 1 and Time 2 (Study 2).")
  ) %>% 
  opt_align_table_header("left") %>% 
  cols_width(
    everything() ~px(150)
  ) %>% 
  opt_table_font("Times New Roman")

s2final

Table 3: Correlations Among Variables for Time 1 and Time 2 (Study 2).

	T1 Life Satisfaction	T2 Life Satisfaction	Life Satisfaction Change (T2 - T1)	T1 Relatedness	T2 Relatedness	Relatedness Change (T2 - T1)	T1 Loneliness	T2 Loneliness	Loneliness Change (T2 - T1)	T1 Extraversion
T1 Life Satisfaction	1
T2 Life Satisfaction	0.83	1
Life Satisfaction Change (T2 - T1)	-0.37	0.22	1
T1 Relatedness	0.51	0.45	-0.14	1
T2 Relatedness	0.35	0.42	0.09	0.5	1
Relatedness Change (T2 - T1)	-0.13	0	0.22	-0.46	0.54	1
T1 Loneliness	-0.56	-0.55	0.07	-0.69	-0.48	0.19	1
T2 Loneliness	-0.46	-0.55	-0.11	-0.58	-0.65	-0.1	0.8	1
Loneliness Change (T2 - T1)	0.2	0.03	-0.29	0.22	-0.24	-0.46	-0.38	0.24	1
T1 Extraversion	0.4	0.4	-0.02	0.34	0.24	-0.08	-0.54	-0.42	0.22	1
Mean (SD)	3.97 (1.53)	3.99 (1.45)	0.02 (0.88)	4.92 (1.09)	4.91 (1.14)	-0.01 (1.11)	2.12 (0.65)	2.06 (0.62)	-0.06 (0.4)	3.9 (0.79)

Physical distancing table (study 2)

This table shows what percentage of participants practice physical distancing and the mean amount of people (excluding household members) that participants came within 6 feet of the day before the measure. This was ordered by countries with the most participants in the study (>15).

Read libaries

library(tidyverse) # for dplyr 
library(dplyr) # for various helpful data manipulation functions.
library(gt) # to create the gt table
library(janitor) # for clean_names

Reading the data

PhyDist <- read_csv("Study 2.csv") %>% 
  clean_names() %>% #changes variable names to lowercase.
  rename(Country = country)

Creating the table

Here’s the code that we used to create the table. It required a lot of trial and error with filtering, arranging, summarising and formatting. The second half of the code is mainly just for aesthetics.

PhyDist <- PhyDist %>%  #summarise % of people phys distancing and mean and sd of 6feet column 
  group_by(Country) %>% #Grouping by each country 
  summarise(sample_size = n(),
            "% Physical \n distancing" = mean(social_distancing),
            "Mean Six feet response \n (SD)" = mean(six_feet), # calculate mean number of people within 6 feet.
            sd_six_feet = sd(six_feet) # calculate sd
            ) %>% 
  ungroup() %>% #ungroup after summarise 
  arrange(desc(sample_size)) %>% #arranging by sample size in descending order
  filter(sample_size > 14) %>% #filter for number of participants greater than 14
  gt() %>% #make it into gt table 

  fmt_number(columns = (3:5), decimals = 2) %>% #columns of gt table - phys distnacing, 6feet, 2dp for 3rd to 5th col
  fmt_number(columns = 5, pattern = "({x})") %>% #puts brackets around sd (column 5)
  fmt_percent(
    columns = vars("% Physical \n distancing"), #converting this col to % values
    decimals = 0
  ) %>% 
  cols_merge(columns = vars(`Mean Six feet response \n (SD)`, sd_six_feet)) %>% #merge m and sd for 6feet

  #aesthetics below  
  
  cols_width( #change col width - aesthetics
    everything() ~ px(120)
  ) %>%
  cols_label( #rename sample size to Sample Size - aesthetics
    sample_size = "Sample Size"
  ) %>% 

  tab_source_note( #produces notes under table - table captions 
    source_note = md("*Note.* Only countries with at least 15 participants displayed")
  ) %>% 
  tab_footnote( #footnote below the table caption
    footnote = "Mean number of people reported to have been within six feet of participant on previous day",
    locations = cells_column_labels( #this associates the footnote with the mean for 6feet column 
      columns = "Mean Six feet response \n (SD)"
    )
  ) %>% 
  opt_footnote_marks(marks = letters) %>% #make footnote a letter instead of number - the little a thing
  tab_header( #heading of overall table
    title = md("**Table S2**"),
    subtitle = md("*Physical Distancing by Country (Study 2)*")
  ) %>%
  opt_table_font("Times New Roman") %>% #align top header left, font, centring data in tables
  opt_align_table_header("left") %>% 
  cols_align(
    align = c("center"),
    columns = everything()
  )

PhyDist

Table S2
Physical Distancing by Country (Study 2)
Country	Sample Size	% Physical distancing	Mean Six feet response (SD)^a
USA	104	93%	1.01 (1.58)
UK	90	96%	1.12 (1.74)
Poland	29	79%	1.48 (2.11)
Portugal	23	100%	0.70 (1.82)
Canada	15	100%	0.87 (1.41)
Note. Only countries with at least 15 participants displayed
^a Mean number of people reported to have been within six feet of participant on previous day

Histograms

Study 1 - Distribution of Social Connectedness Differences scores

Read Packages

library(ggplot2) # To let us create the histograms
library(tidyverse) # For ggplot

Read Data

I used read.csv to read in the .csv data file for study 1 from the OSF. The as_tibble function sets the data frame as a tibble, displaying only the first 10 rows of data in the data frame when called, as shown:

Data <- read.csv("Study 1 Data.csv") #reading in data
Data <- as_tibble(Data)

## # A tibble: 467 x 15
##    Participant.ID Gender Age   LETHAVERAGE.T1 LETHAVERAGE.T2 LethDiff
##             <int> <chr>  <chr>          <dbl>          <dbl>    <dbl>
##  1              1 Woman  25               4.8            5.7      0.9
##  2              2 Woman  20               3.4            5        1.6
##  3              3 Man    23               1.8            2.4      0.6
##  4              4 Man    26               1.2            5.4      4.2
##  5              5 Woman  19               1.9            3.8      1.9
##  6              6 Woman  22               5.1            5.8      0.7
##  7              7 Woman  20               2.5            2.8      0.3
##  8              8 Woman  22               3.9            3.1     -0.8
##  9              9 Woman  22               1.4            4.4      3  
## 10             10 Woman  21               2              1.1     -0.9
## # ... with 457 more rows, and 9 more variables: SCAVERAGE.T1 <dbl>,
## #   SCAVERAGE.T2 <dbl>, SCdiff <dbl>, EXTRAVERSION <dbl>, SPANE.P <int>,
## #   SPANE.N <int>, SPANE.B <int>, SocialDistancing <int>, SixFeet <int>

Graphing

Hist() failure

When first attempting to create the histogram for study 1, I tried to use the Hist function. However, the aesthetics that resulted from using this method did not match that used in the original paper:

hist(Data$SCdiff, = tells R to compute a histogram of the data values in the social connectedness difference column from the dataframe.
xlab = lets you label the x axis.
main = lets you label the title.
cex.main = lets you change the size of the title font.
col = lets you change the colour of the histogram columns.

hist(Data$SCdiff, 
     xlab = "Social Connectedness Difference Score (T2 - T1)", 
     main = "Distribution of social connectedness difference scores (Study 1)", cex.main = 0.75, col = "darkgrey")

geom_histogram(), a process

Jenny then suggested that I use the ggplot’s own histogram function geom_histogram instead. I spent a lot of time working out the kinks of the function. My first attempts at using it were fails, as I couldn’t get the right height on the bars, and the aesthetics were still completely off as you can see below. I put all the code for the histogram into the “picture” variable, not a very unique name, I know. The aes(SCdiff) is mapping the aesthetics of the histogram to use the variable of social connectedness differences. The binwidth argument controls the width of the histogram bars, I set it at 0.5 because at default it was at 30, and did not look anything like the one in the paper.

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5, colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency")
  

print(picture)

The reason this looks so different from the one in the paper was because I was yet to discover a couple of very helpful arguments that would enable me to perfectly replicate the graph in the paper, including ggplot themes!

The graph below was my finished graph. boundary was set to 0 so that the middle of the bars did not align with the middle of the x-axis intervals. scale_x_continuous and scale_y_continuous allow me to adjust the x and y axes. The limits arguments allow me to set the limits of the axis’. breaks allows me to set the intervals for my x axis. expand set to (0, 0) for both axes means that there will be no spacing between the plot and the x & y axes. Through a bit of Googling around themes, I discovered theme_classic() which was the theme used in the paper for histograms, removing the background grid and styling the axes.

picture <- ggplot(Data, (aes(SCdiff))) + 
  geom_histogram(binwidth = 0.5, boundary = 0, colour = 'black', fill = 'darkgrey') +
  labs(x = "Social Connectedness Difference Score (T2 - T1)", y = "Frequency") +
  scale_x_continuous(limits = c(-3.5, 3), breaks = c(-3, -2, -1, 0, 1, 2, 3), expand = c(0, 0)) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_classic()

print(picture)

Study 2 - Distribution of relatedness and loneliness difference scores

Read packages

library(tidyverse) # For ggplot
library(patchwork) # Used to merge the two figures together
library(extrafont) # Used to change the font of the figure

## Registering fonts with R

library(ggplot2) # To create the histogram

The OSF for Study 2 provided a markdown document with a few different chunks of code that were used to conduct separate analyses, but none were used during reproducibility of the figures.

Reading the data

Reading in the study 2 csv:

Study2 <- read.csv("Study 2.csv")

Graphing

Because I had already made the first histogram for Study 1, I already had the code to create the study 2 histograms. The only new functions/arguments I had to learn were to do with the aesthetics.

I didn’t know why ‘BMPN’ was supposed to represent relatedness, but then I looked in the measures pdf and it stood for ‘Balanced Measure of Psychological Needs’, which they used to measure relatedness. I think they could of used something more obvious for the variable names like they did for loneliness.

In the code chunk specifications, I set the height of the figure using fig.height and set the width using fig.width. I used the ggtitle function to create titles for each of my histograms. All other functions and arguments are repurposed from the first histogram, just with altered values to match the new histograms. I set the relatedness_hist x limits to NA for the left side because there is no far left number on the graph in the paper. I discovered you could use preset themes like theme_classic in conjunction with custom theme elements with the theme function. text = element_text(family = "Times New Roman") lets me set my figure font to Times New Roman. plot.title = element_text(face = "bold", hjust = 0.5) lets me bold and adjust my title horizontally to the middle of the figure (hjust is only defined between 0 and 1, so 0.5 is in the middle). axis.line = element_line(colour = 'black', size = 0.75) lets me set the colour and size of the axis lines. axis.ticks = element_line(colour = "black", size = 0.75) lets me change the colour and size of the axis notches. The patchwork package lets me merge the two figures together side-by-side, simply by using: relatedness_hist + loneliness_hist.

relatedness_hist <- ggplot(Study2, (aes(BMPN_Diff))) +
  ggtitle("Distribution of Relatedness Difference Scores") +
  geom_histogram(binwidth = 0.5, boundary = 0, colour = 'black', fill = 'grey') +
  labs(x = "Relatedness Difference Score (T2 - T1)", y = "Frequency") +
  scale_x_continuous(limits = c(NA, 4), expand = c(0, 0)) + 
  scale_y_continuous(expand = c(0, 0), limits = c(0, 80)) +
  theme_classic() +
  theme(text = element_text(family = "Times New Roman"),
        plot.title = element_text(face = "bold", hjust = 0.5),
        axis.line = element_line(colour = 'black', size = 0.75),
        axis.ticks = element_line(colour = "black", size = 0.75))


loneliness_hist <- ggplot(Study2, (aes(Lonely_Diff))) +
    ggtitle("Distribution of Loneliness Scores") +
  geom_histogram(binwidth = 0.1, boundary = 0, colour = 'black', fill = 'grey') +
  labs(x = "Loneliness Difference Score (T2 - T1)", y = "Frequency") +
  scale_x_continuous(limits = c(-2, 2), expand = c(0, 0)) + # just for x axis
  scale_y_continuous(expand = c(0, 0), limits = c(0, 50)) + # just for y axis
  theme_classic() +
  theme(text = element_text(family = "Times New Roman"),
        plot.title = element_text(face = "bold", hjust = 0.5),
        axis.line = element_line(colour = 'black', size = 0.75),
        axis.ticks = element_line(colour = "black", size = 0.75))

relatedness_hist + loneliness_hist

Error Bar Line Graph - Changes in social connectedness and loneliness for the most introverted and extraverted participants (Study 1 and Study 2; 95% CI error bars)

Read packages

library(tidyverse) # For dplyr and ggplot
library(ggplot2) # To create the graphs
library(dplyr) # For manipulation of dataframes
library(reshape2) # For the melt function

## 
## Attaching package: 'reshape2'

## The following object is masked from 'package:tidyr':
## 
##     smiths

library(extrafont) # Used to change the font of the figure
library(patchwork) # Used to merge the two figures together

Graphing

This part of the replication process was by far the most time consuming. There was lots of trial and error, I have many different code chunks that attempted to replicate it. All have weird comments, and many lines of code were commented out to test where I was running into issues, or if I wanted to come back to old code if new code I tried didn’t work. Here’s an attempt I labelled ATTEMPT 1 in angry capital letters:

My first failed attempt

In this attempt, I was only calculating one graph, because I figured if I could one, I could do another. This chunk tries to calculate the means and standard deviations of change in lonelinessness scores for the most introverted and extraverted participants in study 2. I don’t remember the exact logic behind the code because, but quantile lets you examine values about a certain threshold. subset creates vectors that meet conditions (in this case less than or equal to 3.33333 for the most introverted, and greater than or equal to 4.416667 for most). I also mistakenly didn’t use the right signage, so they both were <= symbols. Disaster. So we filter these values and create new dataframes with them. Then we call each data frame when calculating the mean and sd and filter out just the loneliness variable for each time.

ATTEMPT1:

# Means and SDs Lonely
subset
# most introverted change in loneliness
quantile(Study2$T1Extraversion)
MostIntrovertedS2 <- subset(Study2, T1Extraversion <= 3.33333)

IntLonelyMean <- mean(MostIntrovertedS2$T2Lonely)
sd(MostIntrovertedS2$T2Lonely)
mean(MostIntrovertedS2$T1Lonely)
sd(MostIntrovertedS2$T1Lonely)

#most extraverted, change in loneliness
quantile(Study2$T1Extraversion)
MostExtravertedS2 <- subset(Study2, T1Extraversion <= 4.416667)

mean(MostExtravertedS2$T2Lonely)
sd(MostExtravertedS2$T2Lonely)
mean(MostExtravertedS2$T1Lonely)
sd(MostExtravertedS2$T1Lonely)

I then went on to make the original dataframe ‘long’ so that I can manipulate it easier. I used the melt function from the reshape2 package, which the tutors pointed out to me in week 8, was oudated, and the pivot_longer function was better. id.vars sets my identification variable to be participant ID in the dataframe. measure.vars sets my measurement variables to be T1Lonely and T2Lonely. The colnames function does what it sounds like, lets me set the column names.

Study2_long <- melt(Study2,
                  id.vars = "Participant_ID",
                  measure.vars = c("T1Lonely", "T2Lonely"))

colnames(Study2_long) = c("ID", "Time", "Loneliness")

The rest of this code is just a mix of confusion and sadness. I was trying to wrap my head around how to plot 3 different things: time 1 and 2, loneliness change means, filtered by introversion/extraversion. I can’t really even comment on the rest, because I don’t know what I was trying to do.

IntroLine <- Study2_long %>% # Trying to filter out introvert only line.
  filter(Loneliness == 'Introvert')

lonelyline = ggplot(Study2_long,
                    aes(Time, Loneliness,)
                    ) +
  geom_line() +
    geom_errorbar(width=.1, aes(ymin=Time-ci, ymax=Time+ci)) +
    geom_point(shape=21, size=3, fill="white") +
    ylim(40,60)

lonelyline




#lonelyline +
 # stat_summary(fun = mean, ## Adds points
  #             geom = "point") +
  #stat_summary(fun = mean, ## Adds the line
  #             geom = "line",
  #             aes(group = 1)) +
  #stat_summary(fun.data = mean_cl_normal,
  #             geom = "errorbar",
  #             width = .2) +
  #xlab("Time") +
  #ylab("Mean Loneliness") +
  #scale_x_discrete(labels = c("Before Pandemic", "After Pandemic"))

My successful attempt

First I read in the data. Then I used the melt function to convert my data long. I used extraversion as an id variable in this attempt, my rationale was that it might be easier to filter them that way. Social connectedness scores for time 1 and 2 were my measure variables. colnames was used to rename the columns.

################### STUDY 1 Social Connectedness ##################

#read data
Study_1 <- read.csv("Study 1 data.csv")

# Make Study_2 Long
Study1_long <- melt(Study_1,
                    id.vars = c("Participant.ID", "EXTRAVERSION"),
                    measure.vars = c("SCAVERAGE.T1", "SCAVERAGE.T2"))

# Naming the columns
colnames(Study1_long) = c("ID", "EXTRAVERSION", "Time", "SocialConnectedness")

I then take the new long data frame and make dataframes out of it, one df for social connectedness scores for the most introverted and one df for the most extraverted participants. I used filter to look for the most introverted (EXTRAVERSION <= 3.41667) and to look for the most extraverted (EXTRAVERSION >= 4.8333). group_by was then used to group together groups by the time of the measurement (time 1 or 2). I then found used summarise to add a mean column & calculated the mean of social connectedness scores, as grouped by time 1 and 2. I then calculated the 95% confidence interval limits and hard coded them as their own variables.

#filter introvert/extravert for T1 SC
T1_1IntroSC <- Study1_long %>% 
  filter(EXTRAVERSION <= 3.41667) %>% 
  group_by(Time) %>% 
  summarise(
    mean = mean(SocialConnectedness)
  )
  SCupperIn = c(3.573387, 3.469464)
  SClowerIn = c(3.3195757, 3.232880)

#filter extravert only for T1 SC
T1_1ExtraSC <- Study1_long %>% 
  filter(EXTRAVERSION >= 4.8333) %>% 
  group_by(Time) %>% 
  summarise(
    mean = mean(SocialConnectedness)
  )
  SCupperEx = c(4.8294, 4.589643)
  SClowerEx = c(4.579, 4.301045)

T1_1IntroSC

## # A tibble: 2 x 2
##   Time          mean
## * <fct>        <dbl>
## 1 SCAVERAGE.T1  3.45
## 2 SCAVERAGE.T2  3.35

T1_1ExtraSC

## # A tibble: 2 x 2
##   Time          mean
## * <fct>        <dbl>
## 1 SCAVERAGE.T1  4.70
## 2 SCAVERAGE.T2  4.45

To create the error line bar graph for the social connectedness, we make variable socialline = ggplot() + geom_point, to position a point for each mean for the most introverted and extraverted at time 1 and 2. I called the data frames that I made in the previous chunk and set the x axis to ‘Time’ and y to ‘mean’. I used the group argument to differentiate them. geom_line was then used to connect a line between the points, using the same aesthetics as geom_point. The linetype argument lets me change the aesthetics of the line (eg. I can make on of the lines “dashed”). geom_errorbar was added for each line graph. expand_limits lets me set the limits of the y axis. theme_classic was used in combination with custom theme options. One new trick was the use \n in the ggtitle so that it would change which line the second half of the title was on.

socialline = ggplot() + 
   labs(y = "Mean Social Connectedness") + # Label y-axis
   scale_x_discrete(labels = c("Before Pandemic", "During Pandemic")) + # label x-axis
   geom_point(data = T1_1IntroSC, aes(x = Time, y = mean, group = 1)) + # puts two points on the graph, for most introverted social connectedness scores at time 1 and time 2.
   geom_point(data = T1_1ExtraSC, aes(x = Time, y = mean, group = 2)) + # puts two points on the graph, for most extraverted social connectedness scores at time 1 and time 2.
   
   geom_line(data = T1_1IntroSC, aes(x = Time, y = mean, group = 1), linetype = "dashed") +
   geom_line(data = T1_1ExtraSC, aes(x = Time, y = mean, group = 2), linetype = "solid") +
   geom_errorbar(aes(x = T1_1IntroSC$Time, ymin = SClowerIn, ymax = SCupperIn, width = 0.1, group = 1)) + # sets the range for the error bars for the introverted participants by calling from the variables set in the previous chunk.
   geom_errorbar(aes(x = T1_1ExtraSC$Time, ymin = SClowerEx, ymax = SCupperEx, width = 0.1, group = 2)) + # sets the range for the error bars for the extraverted participants by calling from the variables set in the previous chunk.
   
   expand_limits(y = c(3, 5)) + # used to set the limits 
   theme_classic() +
   theme(text = element_text(family = "Times New Roman"),
        plot.title = element_text(hjust = 0.5),
        axis.ticks = element_line(colour = "black", size = 0.75)) +
   ggtitle("Social Connectedness Changes \nBased on Extraversion")

socialline

Then I made the graph for the change in loneliness for the most introverted and extraverted at time 1 and time 2, which was essentially the same code but with different variable names and values. I then used the patchwork package to merge the graphs into 1 figure.

################### STUDY 2 Loneliness #############################

# Lonely - Make study2 long
Study2_long <- melt(Study2,
                  id.vars = c("Participant_ID", "T1Extraversion"),
                  measure.vars = c("T1Lonely", "T2Lonely"))
                
# Naming columns

colnames(Study2_long) = c("ID", "T1Extraversion", "Time", "Loneliness")

# Creating most intro and extrovert data sets by time

T1_2IntroLonely <- Study2_long %>% 
    filter(T1Extraversion <= 3.33333) %>% 
    group_by(Time) %>% 
    summarise(mean = mean(Loneliness))
    LonlowerIn = c(2.391, 2.175)
    LonupperIn = c(2.671, 2.450)
  
  
T1_2ExtraLonely <- Study2_long %>% 
    filter(T1Extraversion >= 4.416667) %>% 
    group_by(Time) %>% 
    summarise(mean = mean(Loneliness))
    LonlowerEx = c(1.5203, 1.568)
    LonupperEx = c(1.734, 1.7788)


# Plotting the graphs

lonelyline = ggplot() +
   labs(y = "Mean Loneliness") +
   scale_x_discrete(labels = c("Before Pandemic", "During Pandemic")) +
   geom_point(data = T1_2IntroLonely, aes(x = Time, y = mean, group = 1)) +
   geom_point(data = T1_2ExtraLonely, aes(x = Time, y = mean, group = 2)) +
   
   geom_line(data = T1_2IntroLonely, aes(x = Time, y = mean, group = 1, linetype = "solid")) +
   geom_line(data = T1_2ExtraLonely, aes(x = Time, y = mean, group = 2, linetype = "dashed")) +
  geom_errorbar(aes(x = T1_2IntroLonely$Time, ymin = LonlowerIn, ymax = LonupperIn, width = 0.1, group = 1)) +
  geom_errorbar(aes(x = T1_2ExtraLonely$Time, ymin = LonlowerEx, ymax = LonupperEx, width = 0.1, group = 2)) +
  
   expand_limits(y = c(1, 3)) +
   scale_linetype_discrete(labels = c("Most Extraverted", "Most Introverted")) + # sets graph key to differentiate between linetypes (dashed and solid lines)
   theme_classic() +
   theme(text = element_text(family = "Times New Roman"), # sets font to times new roman
        plot.title = element_text(hjust = 0.5), # sets title to the middle
        axis.ticks = element_line(colour = "black", size = 0.75), 
        legend.position = "right", # positions the legend on the right side
        legend.title = element_blank()) +
        
   ggtitle("Loneliness Changes \nBased on Extraversion") # creates a title for the loneliness graph
      

socialline + lonelyline #uses patchwork to put the two graphs together into one figure

Part 3: Exploration

3a. Country vs. life satisfaction scores at time 1 and time 2

In my third reaction to the paper, I was critical of the timing of the second measurement being conducted too soon into the spread of Covid-19, with some countries not yet implementing very strict quarantining rules. I thought this may have been a possible explanation for the lack of significant results for social connectedness changes pre and post. To test if this was apparent in other measurements, I wanted to investigate whether country of residence was a predictor of life satisfaction scores before and during Covid.

Loading libraries

library(tidyverse) # for dplyr and ggplot
library(dplyr) # for data frame manipulation functions
library(ggplot2) # for graphs

Reading and creating data frame

To do this, I used the data from study 2 and looked at the life satisfaction at time 1 and time 2, by country. I chose to analyse countries with above >14 participants because country sample sizes below 15 were too small to be meaningful (the paper also did exploratory analyses of countries with above 14 participants).

I called our data frame LS_Country to indicate that we were measuring Life Satisfaction (LS) by country. The life satisfaction difference variable was included to compare figures later on.

LS_Country <- read.csv("Study 2.csv")

LS_Country <- LS_Country %>% 
  group_by(Country) %>% # Group scores by country
  select(Country, T1SWLS, T2SWLS, SWLS_Diff) %>% # Select only the country, T1 and T2 life satisfaction scores.
  summarise(sample_size = n(), # create a new column that displays number of participants
            "T1LS" = mean(T1SWLS), # create a column that displays mean life satisfaction at time 1
            "T2LS" = mean(T2SWLS), # create a column that displays mean life satisfaction at time 2
            "LS_Diff" = mean(SWLS_Diff) # create a column that displays mean life satisfaction differences across T1 and T2
            ) %>% 
  filter(sample_size > 14) %>% # filter out all countries with less than 15 participants
  arrange(desc(sample_size)) # arrange with the highest no. of participants at the top

LS_Country

## # A tibble: 5 x 5
##   Country  sample_size  T1LS  T2LS LS_Diff
##   <chr>          <int> <dbl> <dbl>   <dbl>
## 1 USA              104  3.98  4.03  0.0519
## 2 UK                90  4.14  4.02 -0.116 
## 3 Poland            29  3.81  4     0.186 
## 4 Portugal          23  3.90  3.65 -0.252 
## 5 Canada            15  3.83  4.12  0.293

We are left with 5 out of 28 countries with 15 or more participants: USA, UK, Poland, Portugal and Canada. Now we will compare these mean scores in a bar graph.

To be able to create a bar graph that has a side-by-side comparison for T1 and T2, by country, I first have to convert my dataframe from wide to long. Here I use the pivot_longer function, which takes the T1LS and T2LS columns from my data frame and puts the names of the columns into the data, and aligns the means with each ‘time’ in a separate column labelled ‘Means’. So now my data frame looks something like this:

LS_Country_Long <- pivot_longer(LS_Country, 
                                cols = c(T1LS, T2LS), 
                                names_to = "Time", 
                                values_to = "Means")
LS_Country_Long

## # A tibble: 10 x 5
##    Country  sample_size LS_Diff Time  Means
##    <chr>          <int>   <dbl> <chr> <dbl>
##  1 USA              104  0.0519 T1LS   3.98
##  2 USA              104  0.0519 T2LS   4.03
##  3 UK                90 -0.116  T1LS   4.14
##  4 UK                90 -0.116  T2LS   4.02
##  5 Poland            29  0.186  T1LS   3.81
##  6 Poland            29  0.186  T2LS   4   
##  7 Portugal          23 -0.252  T1LS   3.90
##  8 Portugal          23 -0.252  T2LS   3.65
##  9 Canada            15  0.293  T1LS   3.83
## 10 Canada            15  0.293  T2LS   4.12

Graphing

I then create the graph using ggplot and geom_bar. Assigning country to the x axis and means to the y axis. By using ‘Time’ as my fill argument, it differentiates Time 1 and Time 2 with red and blue. I had to use stat = ‘identity’ which tells ggplot that I will provide the values for y, rather than letting it aggregate the number of rows for each x axis (which is the default). The position = “dodge” argument lets me put the bars for each country next to each other, something I discovered after a bit of Google searching. Width = 0.5 lets me set the width of the bars. scale_y_continuous(limits = c(0, 5)) lets me set the limits of the y axis to be 0 to 5, which is the scale of the life satisfaction measure. expand = c(0, 0) makes it so there’s not gap between the x axis and the graph. labs is used to let me label the y axis. scale_fill_discrete(labels = ) enables me to label the fill variable (Time) and therefore change the key labels.

LS_Country_Graph <- ggplot(data = LS_Country_Long) +
  geom_bar(mapping = aes(x = Country, y = Means, fill = Time), 
           stat = 'identity', 
           position = "dodge",
           width = 0.5) +
  scale_y_continuous(limits = c(0, 5), expand = c(0, 0)) +
  labs(y = "Mean life satisfaction") +
  scale_fill_discrete(labels = c("Pre-Covid Life Satisfaction", "During-Covid Life Satisfaction"))  # labels for the key

LS_Country_Graph

The graph shows very minor differences in life satisfaction across countries and also very little difference within countries between time 1 and time 2. Portugal was the most negatively affected by Covid-19, with an average drop in life satisfaction of 0.25 points. Conversely, Canada saw an on average increase in life satisfaction of 0.29 points. Countries like USA and the UK likely had smaller changes in life satisfaction scores as they had more participants than Poland, Portugal and Canada. The differences in life satisfaction between time 1 and time 2 can be seen below. These results support my reaction that time 2 measurements were taken too pre-maturely into the lifespan of the pandemic.

LS_Country %>% 
  select(Country, LS_Diff)

## # A tibble: 5 x 2
##   Country  LS_Diff
##   <chr>      <dbl>
## 1 USA       0.0519
## 2 UK       -0.116 
## 3 Poland    0.186 
## 4 Portugal -0.252 
## 5 Canada    0.293

3b. Age and social distancing.

I was interested to see how age of participants influenced their compliance with social distancing regulations during Covid-19. Canning, Karra, Dayula, Guo & Bloom (2020) found in their study on social distancing in the United States that older people are having significantly fewer close contacts than younger people. Thus I want to see if those results are consistent in the Folk et al. paper. Study 2 was used in this analysis as it measured both age and social distancing. I assessed the age range of participants in the data set and found that there were a lot more participants that were under 40 than there were over 40. Social distancing was measured using a yes or no question that asked if they were practicising social distancing or not. A score of 1 = they were social distancing, 0 = they were not social distancing.

Load libraries

library(dplyr) # used for data frame manipulation functions
library(tidyverse) # used for dplyr
library(gt) # used for table making and to convert decimal to percentage

Read in the data.

study_2 <- read.csv("Study 2.csv")

I chose age ranges 18 - 22 for the younger demographic, and 40 - 72 for the older demographic. This was mainly because those ranges of age both contained equal number of participants at n = 75. Here is how I calculated the number of participants by age (individually and total).

Calculating number of participants by age, Young group: 18 -> 22 year olds

I created a Younger_age dataframe and selected Age, filtered it to show me only participants 18 or older, but no older than 22. I then grouped by age to group individuals of each age into 1 number. I then added a sample size column that calculated the number of participants per age. Below that is code to sum the total sample size of participants aged 18 to 22, and the total was n = 75.

# Calculates number of participants in each age from 18 - 22
Younger_age <- study_2 %>% 
  select(Age) %>% 
  filter(Age >= 18, Age <= 22) %>% 
  group_by(Age) %>% 
  summarise(Sample_size_young = n())

Younger_age

## # A tibble: 5 x 2
##     Age Sample_size_young
## * <int>             <int>
## 1    18                12
## 2    19                13
## 3    20                17
## 4    21                12
## 5    22                21

# Calculates the total number of participants aged 18 - 22
Younger_age$Sample_size_young %>% 
  sum()

## [1] 75

Calculating number of participants by age, Old group: 40 -> 72 year olds

Here I calculate the sample size of participants that were aged from 40 to 72. I basically did the same thing as I did for the younger group, except changed the values when filtering.

Older_age <- study_2 %>% 
  select(Age) %>% 
  filter(Age >= 40, Age <= 73) %>% 
  group_by(Age) %>% 
  summarise(Sample_size_old = n())

Older_age

## # A tibble: 28 x 2
##      Age Sample_size_old
##  * <int>           <int>
##  1    40               5
##  2    41               4
##  3    42               5
##  4    43               3
##  5    44               1
##  6    45               5
##  7    46               7
##  8    47               3
##  9    48               2
## 10    49               5
## # ... with 18 more rows

# Calculates the total number of participants aged 40 - 72
Older_age$Sample_size_old %>% 
  sum()

## [1] 75

(Young group) Calculating social distancing compliance for individual age groups

This code shows me the number of participants and the percentage that practice social distancing per age. I reused the old code from when I was just counting the participants and added another variable to my summarise function called “Social_Distancing” which calculates the mean of social distancing for each individual age group.

# Calculates social distancing compliance of each age from 18 - 22
Dist_by_young <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 18, Age <= 22) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing" = mean(SocialDistancing))
  

Dist_by_young

## # A tibble: 5 x 3
##     Age Sample_size Social_Distancing
## * <int>       <int>             <dbl>
## 1    18          12             0.917
## 2    19          13             1    
## 3    20          17             0.882
## 4    21          12             0.917
## 5    22          21             0.857

(Young group) Total mean social distancing compliance for ages (18 - 22)

I then wanted to find the total mean of social distancing across all ages in the young group. To do this, I reused the code from above and piped in a second summarise function to summarise the means we had just calculated into a new “Social distancing” variable that is the mean of those means. I also added in an “Age_range” variable that displays the young age range group we’re calculating. A sample_size varible was added again that sums the total of sample sizes we got in the previous code chunk (above).

# Calculates social distancing compliance of each age from 18 - 22
Dist_by_young <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 18, Age <= 22) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing" = mean(SocialDistancing)) %>% 
  summarise(Age_range = "18 - 22",
            Sample_size = sum(Sample_size),
            "Social distancing" = mean(Social_Distancing))

Dist_by_young

## # A tibble: 1 x 3
##   Age_range Sample_size `Social distancing`
##   <chr>           <int>               <dbl>
## 1 18 - 22            75               0.915

So roughly 91.5% of our 75 young participants, aged 18 - 22, practice social distancing.

(Old group) Calculating social distancing compliance for individual age groups

Here we are selecting Age and SocialDistancing, filtering for Ages between 40 and 72, grouping them together and calculating the sample size AND the percentage that social distance in that age group.

# Calculates social distancing compliance of each age from 40 - 72
Dist_by_old <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 40, Age <= 72) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing_old" = mean(SocialDistancing))

Dist_by_old

## # A tibble: 28 x 3
##      Age Sample_size Social_Distancing_old
##  * <int>       <int>                 <dbl>
##  1    40           5                   1  
##  2    41           4                   1  
##  3    42           5                   1  
##  4    43           3                   1  
##  5    44           1                   1  
##  6    45           5                   1  
##  7    46           7                   1  
##  8    47           3                   1  
##  9    48           2                   0.5
## 10    49           5                   0.8
## # ... with 18 more rows

(Old group) Total mean social distancing compliance for ages (40 - 72)

We then add onto the code above to find the total mean social distancing compliance for those aged 40 to 72 by summarising a new “Social distancing” variable that takes the mean of all the social distancing percentages. Sample size was also summed for all ages (40 - 72), and an Age_range variable for 40 - 72 was included.

# Calculates social distancing compliance of each age from 40 - 72
Dist_by_old <- study_2 %>% 
  select(Age, SocialDistancing) %>% 
  filter(Age >= 40, Age <= 72) %>% 
  group_by(Age) %>% 
  summarise(Sample_size = n(),
            "Social_Distancing_Old" = mean(SocialDistancing)) %>% 
  summarise(Age_range = "40 - 72",
            Sample_size = sum(Sample_size),
            "Social distancing" = mean(Social_Distancing_Old))


Dist_by_old

## # A tibble: 1 x 3
##   Age_range Sample_size `Social distancing`
##   <chr>           <int>               <dbl>
## 1 40 - 72            75               0.907

Merging the old and young dataframes together!!

Finally, I get to merge the two together into a nice gt plot and use fmt_percent (from the gt package) to convert the decimal for Social Distancing into a percentage.

Dist_by_age <- rbind(Dist_by_young, Dist_by_old)

gt(Dist_by_age) %>% 

fmt_percent(columns = vars("Social distancing")) # fmt_percent from the gt() pack

Age_range	Sample_size	Social distancing
18 - 22	75	91.46%
40 - 72	75	90.65%

Dist_by_age

## # A tibble: 2 x 3
##   Age_range Sample_size `Social distancing`
##   <chr>           <int>               <dbl>
## 1 18 - 22            75               0.915
## 2 40 - 72            75               0.907

The results in the final table show no significant difference in social distancing compliance between younger and older age groups. With compliance only 0.81% higher for younger participants. These results are not consistent with the Canning et al. (2020) paper. Perhaps similar results would have been achieved with a larger sample size as Canning et al. (2020) had n = 4676, while Folk et al. (2020) only had n = 336 for study 2.

3c. How does loneliness differ between genders during Covid-19

In my first reaction to the paper, I was surprised that the isolation of Covid-19 had no real impact on social connectedness and loneliness for the most introverted and extraverted participants. As a result, I wanted to explore to see if participant loneliness differed between genders during Covid-19 compared to before Covid-19. In study 2, the researchers measured loneliness pre-Covid using a 20-item loneliness measure. Participants loneliness scores were re-measured during Covid-19 (in early April). A score of 1 (never lonely) was the minimum score, while a score of 4 (often lonely) was the maximum score.

Read in libraries

library(ggplot2) # for creating the graph
library(tidyverse) # for ggplot
library(dplyr) # for dataframe manipulation

Read in the data

Study_2 <- read.csv("Study 2.csv")

Determining gender ratio

Here we are calling from Study_2, grouping by gender and using summarise to generate a column that specifies the percentage of genders in study 2. This is the same code that was used in the gender demographics for the verification section.

#55% males overall in study 2
Study_2 %>% 
  group_by (Gender) %>% 
  summarise (percent = 100 * n() / nrow (Study_2))

## # A tibble: 3 x 2
##   Gender     percent
## * <chr>        <dbl>
## 1 Female      44.9  
## 2 Male        54.8  
## 3 Non-Binary   0.298

Because the sample size is too small for the ‘Non-Binary’ category (n = 1), I will just use male and female because there’s almost an even 50/50 split between them.

Making data long

Because I’m choosing to compare the two visually, I’m going to make my data frame long to make life easier. I select the variables I need, and then pipe into a pivot_longer function. cols lets me select the columns I want to pivot. names_to lets me send the “T1Lonely” and “T2Lonely” variable names into the cells, and values_to sends the values of the loneliness scores to a new column called “values”, which aligns each value to its class.

GenderLonely_long <- Study_2 %>% 
  select(Gender, T1Lonely, T2Lonely) %>% 
  pivot_longer(cols = T1Lonely:T2Lonely, 
               names_to = "time",
               values_to = "values")

GenderLonely_long

## # A tibble: 672 x 3
##    Gender time     values
##    <chr>  <chr>     <dbl>
##  1 Male   T1Lonely   1.42
##  2 Male   T2Lonely   1.47
##  3 Female T1Lonely   3.16
##  4 Female T2Lonely   2.89
##  5 Female T1Lonely   2.68
##  6 Female T2Lonely   2.74
##  7 Male   T1Lonely   1.42
##  8 Male   T2Lonely   2.21
##  9 Male   T1Lonely   1.42
## 10 Male   T2Lonely   1.11
## # ... with 662 more rows

Creating the dataframe

I spent a lot of time fiddling around with this section. Originally I had split up my dataframes into male and female groups and merged them like I did in my 2nd exploratory analysis. But I knew with the data being long, that there would be a way to do it all in 1 code chunk. It turned out to be a lot simpler than I had thought. I plugged in my long dataset, grouped by gender and class, and then I created a new column that takes the means of the loneliness scores, grouped by gender and time. Finally, I filtered out the n = 1 non-binary participant due to the small sample size.

Gender_lonely <- GenderLonely_long %>% 
  group_by(Gender, time) %>% 
  summarise(Loneliness = mean(values)) %>% 
  filter(Gender != "Non-Binary")

## `summarise()` has grouped output by 'Gender'. You can override using the `.groups` argument.

Gender_lonely

## # A tibble: 4 x 3
## # Groups:   Gender [2]
##   Gender time     Loneliness
##   <chr>  <chr>         <dbl>
## 1 Female T1Lonely       2.08
## 2 Female T2Lonely       2.00
## 3 Male   T1Lonely       2.16
## 4 Male   T2Lonely       2.12

Graphing

I tried to simplify the graph as much as I could. I set my aesthetics to group the lines by Gender, and set my axes; linetype was set to Gender as well so that the lines would look different. I added geom_line() to add the line to graph, and geom_point to insert the points of means. scale_y_continuous lets me set the y axis limits with the limits argument, and bring the graph onto the x-axis with the expand argument.

Gender_lonelyline <- ggplot(Gender_lonely,
                            aes(group = Gender, 
                                x = time, 
                                y = Loneliness, 
                                linetype = Gender)) +
  geom_line() +
  geom_point() +
  scale_y_continuous(limits = c(1, 4), expand = c(0, 0)) +
  labs(x = "Time") +
  scale_x_discrete(labels = c("Before Pandemic", "During Pandemic")) +
  theme_gray()

Gender_lonelyline

Interestingly, the loneliness scores do not differ much between pre-covid and during covid, there is only a very slight decrease for both genders. On average, females seem to experience slightly less loneliness compared to males, both pre and during Covid-19.

Part 4: Recommendations.

4a. Clear variable names

I would recommend using clearer variable names in the codebook and R files. A good variable name lets you know what it is at a glance. For example, instead of using BMPN (the tool used to measure relatedness) to represent relatedness in the code, just use “Relatedness”. It sounds simple, but just naming things what they are can save a lot of confusion in reading code.

4b. Operationalising scores in the codebook

The studies had a number of measures, but never operationalised what was considered a “high” or “low” score for any of the variables. They also never outlined whether a higher score was better or worse for the variable it was measuring. Therefore, I had to go through the actual measures externally on Google and figure out what was considered a significant score which was a bit of a time waster.

4c. Including code for all analyses/figures

While this seems like an obvious one, just including code for all analyses, including figures. While it might take the researchers a while to annotate what the code is doing, this would make the whole process of replication so much faster. When trying to replicate, you wouldn’t have to waste so much time working out what packages, functions, arguments and aesthetics/themes they used. It would also open up the opportunity to first reproduce the researchers results with your own processes, and then cross reference afterwards if you got different results to see whether or not their code is accurate.

References

Canning, D., Karra, M., Dayalu, R., Guo, M., & Bloom, D. E. (2020). The association between age, COVID-19 symptoms, and social distancing behavior in the United States. MedRxiv.

Did social connection decline during the first wave of COVID-19?

Thomas Gillings

14/04/2021

Part 1: Summary and Reaction

Summary

Reaction

Part 2: Verification

Demographics data for:

Means and Correlations Among Variables at Time 1 and Time 2 for:

Physical Distancing by Country (Study 2)

Histograms for:

Error Bar Line Graph - Changes in social connectedness and loneliness for the most introverted and extraverted participants (Study 1 and Study 2; 95% CI error bars).

Study 1 - demographics

Read in libraries

Read in data

Confirm n = 467

Confirm Age demographics, confirm if M = 20.89, SD = 3.03

Confirm Gender demographics, confirm if 77% women

Study 2 Demographics

Read in data

Number of participants, confirm if n = 336

Mean and SD age of participants, confirm age M = 32.03 SD = 11.94

% of males in study 2, confirm if = 55% males

% of ethnicities in study 2, confirm if = 80% white

What percentage of participants are from the US & UK?, confirm if = 32% US, 27% UK

Means and Correlations Among Variables at Time 1 and Time 2

Study 1 - Means and Correlations Among Variables at Time 1 and Time 2

Read packages

Read in the data

Creating the table

Study 2 - Correlations Among Variables for Time 1 and Time 2

Physical distancing table (study 2)

Read libaries

Reading the data

Creating the table

Histograms

Study 1 - Distribution of Social Connectedness Differences scores

Read Packages

Read Data

Graphing

Hist() failure

geom_histogram(), a process

Study 2 - Distribution of relatedness and loneliness difference scores

Read packages

Reading the data

Graphing

Error Bar Line Graph - Changes in social connectedness and loneliness for the most introverted and extraverted participants (Study 1 and Study 2; 95% CI error bars)

Read packages

Graphing

My first failed attempt

My successful attempt

Part 3: Exploration

3a. Country vs. life satisfaction scores at time 1 and time 2

Loading libraries

Reading and creating data frame

Graphing

3b. Age and social distancing.

Load libraries

Read in the data.

Calculating number of participants by age, Young group: 18 -> 22 year olds

Calculating number of participants by age, Old group: 40 -> 72 year olds

(Young group) Calculating social distancing compliance for individual age groups

(Young group) Total mean social distancing compliance for ages (18 - 22)

(Old group) Calculating social distancing compliance for individual age groups

(Old group) Total mean social distancing compliance for ages (40 - 72)

Merging the old and young dataframes together!!

3c. How does loneliness differ between genders during Covid-19

Read in libraries

Read in the data

Determining gender ratio

Making data long

Creating the dataframe

Graphing

Part 4: Recommendations.

4a. Clear variable names

4b. Operationalising scores in the codebook

4c. Including code for all analyses/figures

References