Summary

The COVID-19 pandemic instigated radical change in everyone’s daily social behaviour. Particularly by depriving people of social contact through physical distancing restrictions and isolating lockdowns (Department of Health and Aged Care, 2020). This dramatic decrease in physical social interaction raises the question whether the pandemic has affected people’s subjective sense of social connection and as an extension their wellbeing. Folk and colleagues (2020) explore this question in their study and further stipulate that changes in social connection and wellbeing may depend on whether a person is an introvert or extravert. This is because introverts and extraverts have fundamentally different approaches to their social life (Smilie et al., 2019) which may influence the effects of social distancing. The distinction can have important practical implications as it could help identify individuals who are more susceptible to decreases in social connection and wellbeing due to the pandemic. Folk and colleagues (2020) predicted that social connection and wellbeing would decline during the pandemic and that extroverts would experience a greater decline compared to introverts as extroverts had ‘more social connection to lose’. 

To test the research question two experiments were conducted. Both studies capitalised on existing data collected prior to the pandemic on measures of social connection and wellbeing. The same individuals were then surveyed during the pandemic, enabling for within-in person change to be examined. 

In the first study, 467 undergraduates from a Canadian university were surveyed before the pandemic (Janurary-February 2020). The survey obtained self-assessed measures of social connection and lethargy (used as a proxy for wellbeing). Social connection was measured using the revised 20-item Social Connectedness Scale (Lee et al., 2001) which involved rating statements such as ‘I feel close to people’ on a 6-point Likert scale (1 = strongly disagree to 6 = strongly agree) and lethargy was measured with a 10-item measure that accessed mental and physical fatigue with items like ‘I am fatigued now’ (1 = not at all, 6 = extremely) (Folk et al., 2020). Next, the same participants were surveyed during the pandemic (April 2020) and social connection and lethargy were measured again with additional measures of extraversion and physical distancing information (eg. if they were physical distancing and how many people they got within 6 feet of them in the previous day). Extraversion was accessed with a 12-item subscale of the Big-Five inventory (Soto & John, 2017).

The second study involved 336 adults from the US and UK who were surveyed on self-assessed measures of personality, social connection and wellbeing before the pandemic (February 2020) and again during the pandemic (April 2020). The second survey also included the same physical distancing information as the first study. Social connection was accessed with two measures: the 6-item relatedness subscale of the Balanced Measure of Psychological Needs Scale (Sheldon & Hilpert, 2012) which asked participants to rate statements based on the past, and the 20-item UCLA Loneliness Scale (Russell et al., 1980) which asked more general questions about their social tendencies. Life satisfaction was measured by a 5-item Satisfaction with Life Scale (Diener et al., 2010) completed by rating items such as ‘I am satisfied with my life’ (1 = strongly disagree to 7 = strongly agree). Lastly, extraversion was measured with the 60-item Big-Five Inventory (Soto & John, 2017).

The results were consistent between both studies. Social connectedness and well-being remained relatively the same from pre to during pandemic (with a slight decline in the first study). It was also determined that extraverts had larger declines in social connection than introverts. However, when pre-pandemic levels of social connectedness was controlled for, that is when extraverts and introverts begin with the same level of social connection pre-pandemic, there was no significant decline between introverts and extraverts. Furthermore, over both studies, individuals who reported larger declines in social connection also experiences larger decreases in wellbeing. The implications of these findings are that contrary to the initial hypothesis, social connection did not change during the pandemic despite restrictions, suggesting that people are resilient in finding other avenues of social connection. The results offer a hopeful perspective that people often find opportunities for social bonding even in unprecedented times of upheaval.

Reaction

The most interesting part of this paper was that participants’ social connectedness was not largely impacted by restrictions, implying that people have found alternate ways to satisfy their need for social connection. Contrary to what was hypothesized, during the pandemic people have found alternate ways to satisfy their social connection whether that be through organized video ‘happy hours’ (Tiffany, 2020) or social media as an outlet to satisfy the need to belong. It was also interesting that change in social connectedness did not differ between introverts and extroverts once pre-pandemic levels were controlled for. Like the authors hypothesized I expected extraverts to be more impacted because they would be more likely to rely on physical social interaction to gain their sense of social connection and wellbeing. However, it appeared that this was not the case. The results made me rethink the ways in which social connection can occur and made me realize how resilient people can be in these unprecedented times.

It seems that the next step in this area of research would be to investigate the mechanisms behind the adaptability that was demonstrated in the study and how this may have changed over the course of the pandemic. The interesting results provided by Folk and colleagues (2020) can be further explored to determine what mechanisms people have harnessed to have a consistent social connection and wellbeing during the pandemic. These mechanisms could then be used to help individuals who did experience significant decreases in social connection and wellbeing. As noted by Folk and colleagues (2020) a segment of participants did experience significant decreases in social connection, particularly the most introverted participants. Furthermore, this study was conducted at the beginning of the first wave of the pandemic. Further research could explore whether stricter restrictions or a second or even third wave of COVID has an impact on social connection or if it will stay consistent with current findings. 

I wonder whether age has a large impact on social connection during the pandemic. I think it would be interesting to explore whether people from different age groups have had the same emotional reaction to the pandemic. This would be crucial to explore if the mechanism behind people’s resilience are tools that are accessed through technology. Older people are a very vulnerable age group and are less likely to be skilled in technological communication and on social media (Jacobs & Ellis, 2021). I would be very interested to investigate this line of questioning further in my exploratory section.

Verification

Reproducibility goals

The goals of the verification section are to reproduce the demographic descriptives, means and SD reported in the study and all figures from the article for both studies 1 and 2. The following are screenshots of the information and figures that we set out to reproduce:

Descriptive statistics

Study 1:

‘Our final sample included 467 participants (age: M = 20.89, SD = 3.03; 77% women) who completed both our Time1 and Time2 surveys and met our inclusion criteria.’

Study 2:

‘Our final sample comprised 336 participants (age: M = 32.03, SD = 11.94; 55% Male; 80% White; 45% single/never married; 32% U.S.; 27% U.K.) who completed both our Time1 and Time2 surveys and met our pre-registered inclusion criteria.’


Results (means and SD)

Study 1:

‘Almost all participants (98.5%) reported practicing physical/social distancing, and most participants indicated that no one outside their household came within 6 feet of them the day before (Mode = 0, M = 0.77, SD = 1.39).’

‘We found that participants reported lower levels of social connection during the COVID-19 pandemic (Time2: M = 3.97, SD = 0.85) than before (Time1: M = 4.11,SD = 0.88), t(466) = 4.19,p < .001’

‘Our most introverted participants exhibited a small drop (d = 0.14, 95% CI = [–0.11, 0.39]) in social connectedness between Time1 (M = 3.45, SD = 0.70) and Time2 (M = 3.35, SD = 0.66); t(118) = 1.73, p = .087’

‘The most extraverted participants exhibited a larger drop (d = 0.33, 95% CI = [0.08, 0.58]) in social connectedness between Time1 (M = 4.70,SD = 0.72) and Time2 (M = 4.45, SD = 0.84); t(129) = 3.49;p < .001.’

‘Lethargy increased from Time1 (M = 2.60,SD = 1.16) to Time2 (M = 3.16, SD = 1.27); t(466) = 9.21,p < .001; d = 0.46, 95% CI = [0.33, 0.60]’

Study 2:

‘No significant difference emerged between extraversion scores at Time1 (M = 3.90;SD = 0.79) and Time2 (M = 3.86, SD = 1.09), t(335) = –1.40,p = .162.’

92.9% of participants reported practicing physical/social distancing, and the modal person indicated that no one got within 6 feet of them the day before (Mode = 0, M = 1.11,SD = 0.75).’

‘We found no differences between participants’ reports of relatedness prior to the pandemic (Time1: M = 4.90, SD = 1.11) versus during the pandemic (Time2: M = 4.91, SD = 1.15), t(335) = –0.18,p = .857, d = 0.01, 95% CI = [–0.14, 0.16]’

‘Participants reported feeling a little less lonely during the pandemic (Time2: M = 2.07, SD = 0.63) than before (Time1: M = 2.14,SD = 0.67), t(335) = –2.63,p = .009, d = 0.12, 95% CI = [–.03, 0.27]’

‘Most extraverted participants showed no change in loneliness from Time1 (M = 1.64, SD = 0.51) to Time2 (M = 1.67, SD = 0.49),t(82) = –1.09, p = .279,d = 0.07, 95% CI = [–0.23, 0.38]’

‘Our most introverted participants decreased in loneliness from Time1 (M = 2.56, SD = 0.63) to Time2 (M = 2.31, SD = 0.63), t(79) = 4.02, p < .001,d = 0.39, 95% CI = [.07, .70].’

‘Life satisfaction did not change from Time1 (M = 3.96, SD = 1.56) to Time2 (M = 3.98, SD = 1.46); t(335) = –0.43, p = .666,d = 0.02, 95% CI = [–.13, .17].’


Figures

Table 1 (Means and correlations among variables at Time 1 and Time 2) (Study 1)

Figure 1 (Distribution of social connectedness difference scores) (Study 1)

Figure 2 (Distribution of relatedness and loneliness difference scores) (Study 2)


Table 3 (Correlations among variables for Time 1 and Time 2) (Study 2)

Figure 3 (Changes in social connectedness and loneliness for the most introverted and extraverted participants) (Study 1 and Study 2; 95% CI error bars)

Loading Packages

First we loaded the packages that are required for our code. We used tidyverse for most of functions in our codes. The tidyverse package also includes dplyr (used for data wrangling), readr (used to read data in R) and gglot2 (used to visualise our data by creating figures and tables) packages which means they do not need to be loaded separately.

The flextable package was used to customize the correlation table elements and the psych package is used for the describeby() function. The gridExtra package is used to combine graphs together in Figure 3.

library(tidyverse) #used for most functions required for our code and includes dplyr, readr and ggplot2 packages
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
## ✔ ggplot2 3.3.6     ✔ purrr   0.3.4
## ✔ tibble  3.1.8     ✔ dplyr   1.0.9
## ✔ tidyr   1.2.0     ✔ stringr 1.4.0
## ✔ readr   2.1.2     ✔ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(psych)
## 
## Attaching package: 'psych'
## 
## The following objects are masked from 'package:ggplot2':
## 
##     %+%, alpha
library(flextable)
## 
## Attaching package: 'flextable'
## 
## The following object is masked from 'package:purrr':
## 
##     compose
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## 
## The following object is masked from 'package:dplyr':
## 
##     combine

Reading that data

Next, we loaded experiment 1’s data set which was available by following the OSF links within the paper. It is important to note that data set provided by the authors is not the raw data set but rather the final sample data set, meaning that participants who were excluded from the data set had already been removed.

I saved experiment 1’s data set as ‘study_1_data.cvs’ in the working directory. To load it into my Rmd file is used the function read_csv() and labelled the data ‘data_1’ . read_cvs() is from the readr package which is in the tidyverse package.

data_1 <- read_csv("study_1_data.csv")
## Rows: 467 Columns: 15
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (2): Gender, Age
## dbl (13): Participant ID, LETHAVERAGE.T1, LETHAVERAGE.T2, LethDiff, SCAVERAG...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

The same processes was replicated for study 2’s data.

data_2 <- read_csv("study_2_data.csv")
## Rows: 336 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Gender, Ethnicity, Country
## dbl (14): Participant_ID, Age, T1Extraversion, T1SWLS, T2SWLS, SWLS_Diff, T1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Demographic Statistics

Study 1

Know we will begin reproducing the descriptive statistics! It was reported that the final sample for experiment 1 included 467 participants. To reproduce this I use the count() function, from the dplyr package which is part of tidyverse package, to find the final participants by counting the number of rows in the data frame.

count(data_1)
## # A tibble: 1 × 1
##       n
##   <int>
## 1   467

Next, I attempted to reproduce the mean and SD of the participant’s age (M = 20.89, SD = 3.03). First when looking at the Age column I noticed that a handful of the responses had ‘Decline to Answer’. I therefore used the filter() function from the dplyr package to filter out the non-responses.

data_1 %>% filter (Age != “[Decline to Answer]”)

mean(data_1$Age)

sd(data_1$Age)

Our first challenge! When this code is run an error message occurs. Based on this message I went back to the data frame to check if the responses were entered as numeric data. When looking at the tibble in the console I noticed that the age column was in characters (chr). To rectify this I consulted Google and the function as.numeric() was recommended as a possible solution. I applied the as.numeric() function by adding a new line of code, where age$Age means to select the subset Age from the data frame ‘age’. The round() function was also applied to round the output to 2 decimal places. Therefore, successfully reproduce the same result as the paper.

age <- data_1 %>% filter (Age != "[Decline to Answer]")
age <- as.numeric(age$Age)
mean(age) %>% round(digits = 2)
## [1] 20.89
sd(age) %>% round(digits = 2)
## [1] 3.03

The last descriptive statistic provided for study 1 the percentage of women in the sample (77%). First I used the filter() function to only include the response ‘woman’ and labelled it gender. Next I used the count() function to determine how many women were in the sample, the result was 360.

gender <- filter(data_1, Gender == "Woman") 
count(gender)
## # A tibble: 1 × 1
##       n
##   <int>
## 1   360

To convert this into a percentage I simply divided the number of women by the total number of participants (467 - as previously determined) and multiply by 100 to produce the percentage. I also rounded to the nearest whole number to reproduce the same result as the paper.

women <- (360/467)*100
round(women, digit = 0)
## [1] 77

Study 2

Moving on to reproducing study 2’s descriptive statistics. Similar to study 1 the count() function is used to determine the number of participants.

count(data_2)
## # A tibble: 1 × 1
##       n
##   <int>
## 1   336

Next, the mean and SD of the participant’s age is determined. The summarise() function was used to display the results in a single data frame to make the results look more aesthetic.

data_2 %>% summarise(mean = mean(Age), SD = sd(Age))
## # A tibble: 1 × 2
##    mean    SD
##   <dbl> <dbl>
## 1  32.0  11.9

The gender of participants is determined by firstly grouping the data frame by gender using the groupby() function. Next the percentage is found by finding n() which is the number of participants of each gender divided by the total number of participants (nrow(data_2)) then multiplied by 100. Because we are only provided with the male percentage, the male row is filtered to reproduce the same statistic.

data_2 %>% group_by(Gender) %>% summarise(percentage = n()/nrow(data_2)*100) %>% filter(Gender == 'Male')
## # A tibble: 1 × 2
##   Gender percentage
##   <chr>       <dbl>
## 1 Male         54.8

Next, the ethnicity of participants were determine with the same process as gender.

data_2 %>% group_by(Ethnicity) %>% summarise(percetage = n()/nrow(data_2)*100) %>% filter(Ethnicity == 'White')
## # A tibble: 1 × 2
##   Ethnicity percetage
##   <chr>         <dbl>
## 1 White          79.8

And then the country statistics.

data_2 %>% group_by(Country) %>% summarise(percetage = n()/nrow(data_2)*100) %>% filter(Country %in% c('USA','UK'))
## # A tibble: 2 × 2
##   Country percetage
##   <chr>       <dbl>
## 1 UK           26.8
## 2 USA          31.0

An interesting point to note is the authors reported that 45% of participants were single/never married. However, the data provided did not include this information! It is therefore not possible to fully reproduce the demographic statistics as we have been given an incomplete data set. This shows that reproducibility can be dependent on if the full, raw data is provided and if it is not it is possible that some results may not be reproducible.

Results

Study 1

For the results section of study 1, I first reproduced the percentage of participants who reported to be social distancing (98.5%). This was achieved by counting the number of participants who social distanced and dividing this number by the total number of participants, then multiply by 100 to obtain the percentage.

data_1 %>% 
  count(SocialDistancing)
## # A tibble: 2 × 2
##   SocialDistancing     n
##              <dbl> <int>
## 1                1   460
## 2                3     7
  (460/467)*100
## [1] 98.50107

Next, the paper reported: most participants indicated that no one outside their household came within 6 feet of them the day before (Mode = 0, M = 0.77, SD = 1.39). After some Googling I discovered that there was no mode function in R, to combat this I utilized the table() function. This function presented in a table format the frequency of how many people came within 6ft of the participant the day before. The result demonstrated that for 313 participants no people came within the 6ft of them which is the highest frequency. Therefore, confirming that the mode is 0.

The mean (0.77) and SD (1.39) was also reproduced by using the mean() and sd() functions. The $ sign extracts the SixFeet data from the data_1 data set to use the functions on.

table(data_1$SixFeet)
## 
##   0   1   2   3   4   5   7 
## 313  64  34  23  14  16   3
mean(data_1$SixFeet)
## [1] 0.7665953
sd(data_1$SixFeet)
## [1] 1.392468

The social connection means and SD for time 1 and 2 were determined using the same method as above. Where the SCAVERAGE.T1 and SCAVERAGE.T2 is the social connection averages from time 1 and 2.

  • Time1 (M = 4.11, SD = 0.88) to Time2 (M = 3.97, SD = 0.85)
mean(data_1$SCAVERAGE.T1)
## [1] 4.113212
sd(data_1$SCAVERAGE.T1)
## [1] 0.8785128
mean(data_1$SCAVERAGE.T2)
## [1] 3.974868
sd(data_1$SCAVERAGE.T2)
## [1] 0.8496547

Same process for lethargy.

  • Time1 (M = 2.60, SD = 1.16) to Time2 (M = 3.16, SD = 1.27)
mean(data_1$LETHAVERAGE.T1)
## [1] 2.597359
sd(data_1$LETHAVERAGE.T1)
## [1] 1.157475
mean(data_1$LETHAVERAGE.T2)
## [1] 3.162313
sd(data_1$LETHAVERAGE.T2)
## [1] 1.272857

Next, to determine the most introverted participants, the authors split the sample into those with an extroversion score at or below the 25th percentile. We achieved this we used the quantile() function which outputted the extraversion scores for those in the 0, 25th, 50th, 75th and 100th percentile ranges. From this information we determined that a score of 3.41667 and below puts a participant in the 25th percentile of extraversion. We then extracted the participants in the 25th percentile from the data set by using the subset() function and named the subset ‘Introvert’. The means and SD of social connectedness of the most introverted participants from time 1 and 2 were then obtained with the mean() and sd() functions from the SCAVERAGE.T1 and SCAVERAGE.T2 columns in the ‘Introvert’ subset.

  • Time1 (M = 3.45, SD = 0.70) and Time2 (M = 3.35, SD = 0.66)
quantile(data_1$EXTRAVERSION) 
##      0%     25%     50%     75%    100% 
## 1.50000 3.41667 4.16667 4.83333 6.75000
Introvert<- subset(data_1, EXTRAVERSION <= 3.41667) %>% as_tibble()
print(Introvert)
## # A tibble: 119 × 15
##    Partic…¹ Gender Age   LETHA…² LETHA…³ LethD…⁴ SCAVE…⁵ SCAVE…⁶  SCdiff EXTRA…⁷
##       <dbl> <chr>  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1        4 Man    26        1.2     5.4     4.2    4.47    2.37 -2.11      2.67
##  2        5 Woman  19        1.9     3.8     1.9    3.68    3.84  0.158     2.92
##  3       12 Woman  20        2.4     2.5     0.1    2.47    4     1.53      2.92
##  4       16 Woman  21        4       5.4     1.4    2.17    3     0.833     2   
##  5       18 Woman  19        3.5     4.9     1.4    3.74    4.05  0.316     2.92
##  6       22 Woman  21        1.7     5.2     3.5    3.89    3.05 -0.842     3.33
##  7       26 Woman  21        5.8     5.7    -0.1    3.05    2.68 -0.368     3   
##  8       28 Woman  27        3.6     4.2     0.6    2.32    2.68  0.368     3.25
##  9       30 Woman  22        3.1     4.6     1.5    3.42    3.05 -0.368     3.25
## 10       32 Woman  23        3.7     1.5    -2.2    2.63    2.58 -0.0526    3.17
## # … with 109 more rows, 5 more variables: SPANE.P <dbl>, SPANE.N <dbl>,
## #   SPANE.B <dbl>, SocialDistancing <dbl>, SixFeet <dbl>, and abbreviated
## #   variable names ¹​`Participant ID`, ²​LETHAVERAGE.T1, ³​LETHAVERAGE.T2,
## #   ⁴​LethDiff, ⁵​SCAVERAGE.T1, ⁶​SCAVERAGE.T2, ⁷​EXTRAVERSION
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
mean(Introvert$SCAVERAGE.T1)
## [1] 3.446572
sd(Introvert$SCAVERAGE.T1)
## [1] 0.7024337
mean(Introvert$SCAVERAGE.T2)
## [1] 3.351172
sd(Introvert$SCAVERAGE.T2)
## [1] 0.655226

The same processes as above was replicated for the most extraverted participants. The most extraverted participants are in the 75th percentile and above. As determined by the quantile() output a score of 4.8333 in extraversion and above is in the 75th percentile. The means and SD of social connection from time 1 and 2 are then determined for the most extraverted participants.

  • Time1 (M = 4.70,SD = 0.72) and Time2 (M = 4.45, SD = 0.84)
quantile(data_1$EXTRAVERSION) 
##      0%     25%     50%     75%    100% 
## 1.50000 3.41667 4.16667 4.83333 6.75000
Extravert <- subset(data_1, EXTRAVERSION >= 4.8333) %>% as_tibble()
print(Extravert)
## # A tibble: 130 × 15
##    Partic…¹ Gender Age   LETHA…² LETHA…³ LethD…⁴ SCAVE…⁵ SCAVE…⁶  SCdiff EXTRA…⁷
##       <dbl> <chr>  <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
##  1        1 Woman  25        4.8     5.7     0.9    3.21    2.95 -0.263     6.25
##  2        2 Woman  20        3.4     5       1.6    4       4.47  0.474     5.25
##  3        8 Woman  22        3.9     3.1    -0.8    4       4.11  0.105     5.92
##  4       10 Woman  21        2       1.1    -0.9    5.68    5.63 -0.0526    6.17
##  5       13 Woman  21        1.2     1.8     0.6    5.37    4.74 -0.632     5.33
##  6       14 Woman  22        2       2.1     0.1    3.84    4.68  0.842     5.08
##  7       17 Woman  22        1.5     5.8     4.3    4.05    3.32 -0.737     5.08
##  8       27 Woman  18        2.7     2.6    -0.1    5.21    3.21 -2         6.25
##  9       35 Man    22        1.5     2.2     0.7    5.74    5.74  0         5.75
## 10       36 Man    19        1.1     2.7     1.6    2.89    3.16  0.263     4.92
## # … with 120 more rows, 5 more variables: SPANE.P <dbl>, SPANE.N <dbl>,
## #   SPANE.B <dbl>, SocialDistancing <dbl>, SixFeet <dbl>, and abbreviated
## #   variable names ¹​`Participant ID`, ²​LETHAVERAGE.T1, ³​LETHAVERAGE.T2,
## #   ⁴​LethDiff, ⁵​SCAVERAGE.T1, ⁶​SCAVERAGE.T2, ⁷​EXTRAVERSION
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
mean(Extravert$SCAVERAGE.T1)
## [1] 4.704611
sd(Extravert$SCAVERAGE.T1)
## [1] 0.7228096
mean(Extravert$SCAVERAGE.T2)
## [1] 4.445344
sd(Extravert$SCAVERAGE.T2)
## [1] 0.835402

Study 2

Moving on to the study 2 results. We obtained the mean and SD of participant’s extraversion scores at time 1. However, the extraversion scores at time 2 were not provided by the authors. Therefore, we were not able to reproduce the mean and SD of extraversion at time 2. Although this is a small detail, it demonstrates the importance of providing a complete data set when publishing.

  • Time1 (M = 3.90;SD = 0.79) and Time2 (M = 3.86, SD = 1.09)
mean(data_2$T1Extraversion)
## [1] 3.896577
sd(data_2$T1Extraversion)
## [1] 0.7882516

To reproduce the social distancing statistics for study 2, we went through the same processes as study 1. A noticeable difference was that the reported percentage of participants practicing social distancing was 92.9%, however we reproduced 93.2%. Furthermore, the mean number of people that got within 6ft of the participants was also different by 0.01 and the SD was noticably different by 1.

As a team we are unsure of where these disparities arose from. However, throughout study 2’s results we failed to reproduce a lot of the statistics. It is possible that the study’s author’s may have included additional exclusion criteria that had not been applied to the final data set provided. This problem reaffirms the importance that author’s should provide readers with the complete, raw, unedited data set.

  • 92.9% of participants reported practicing physical/social distancing, and the modal person indicated that no one got within 6 feet of them the day before (Mode = 0, M = 1.11, SD = 0.75).’
data_2 %>% count(SocialDistancing)
## # A tibble: 2 × 2
##   SocialDistancing     n
##              <dbl> <int>
## 1                0    23
## 2                1   313
(313/336)*100
## [1] 93.15476
# Mode, Mean, SD
table(data_2$SixFeet)
## 
##   0   1   2   3   4   5   7 
## 204  41  23  24  16  22   6
mean(data_2$SixFeet)
## [1] 1.116071
sd(data_2$SixFeet)
## [1] 1.753864

As alluded above, participant’s relatedness prior and during the pandemic were for the most part not able to be reproduced. Only the mean at time 2 was successfully reproduced.

When trying to obtain the relatedness mean and SD we were at first very confused to which variable in the dataset contained the relatedness information. Luckily the author’s provided a codebook (https://osf.io/69hxa?view_only=14b462058b2745f8a51997a49f8b62e0) on OSF which detailed variable names and descriptions of each variable. After looking through the codebook we were able to determine that the T1BMPN and T2BMPN variables contained relatedness measures for time 1 and 2 respectively. This experience taught us the usefulness of having a codebook and how necessary they are when trying to understand the labels of variables in data. We were very grateful that a codebook was provided to us from the authors as it really helped with reproducibility!

  • (Time1: M = 4.90, SD = 1.11) and (Time2: M = 4.91, SD = 1.15)
mean(data_2$T1BMPN)
## [1] 4.922619
sd(data_2$T1BMPN)
## [1] 1.085849
mean(data_2$T2BMPN)
## [1] 4.911706
sd(data_2$T2BMPN)
## [1] 1.141836

Participant’s loneliness results were not reproducible :( the reasons for this were discussed above.

  • (Time1: M = 2.14,SD = 0.67) and (Time2: M = 2.07, SD = 0.63)
mean(data_2$T1Lonely)
## [1] 2.121711
sd(data_2$T1Lonely)
## [1] 0.6517097
mean(data_2$T2Lonely)
## [1] 2.064223
sd(data_2$T2Lonely)
## [1] 0.6208416

Participant’s life satisfaction results were not reproducible.

  • Time1 (M = 3.96, SD = 1.56) to Time2 (M = 3.98, SD = 1.46)
mean(data_2$T1SWLS)
## [1] 3.970833
sd(data_2$T1SWLS)
## [1] 1.527454
mean(data_2$T2SWLS)
## [1] 3.991667
sd(data_2$T2SWLS)
## [1] 1.450946

To find the most extraverted participants we used the same method as in study 1. The extraverted participant’s change in loneliness from time 1 to 2 were not reproducible as demonstrated below.

  • ‘Most extraverted participants showed no change in loneliness from Time1 (M = 1.64, SD = 0.51) to Time2 (M = 1.67, SD = 0.49)’
quantile(data_2$T1Extraversion) 
##       0%      25%      50%      75%     100% 
## 2.083333 3.333333 3.833333 4.416667 6.000000
Extravert2<- subset(data_2, T1Extraversion >= 4.416667) %>% as_tibble()
print(Extravert2)
## # A tibble: 83 × 17
##    Particip…¹ Gender   Age Ethni…² Country T1Ext…³ T1SWLS T2SWLS SWLS_…⁴ T1Lon…⁵
##         <dbl> <chr>  <dbl> <chr>   <chr>     <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
##  1          1 Male      23 White   Canada     4.75    4.8    5       0.2    1.42
##  2         14 Female    21 Asian   USA        4.92    5.4    5.2    -0.2    1.42
##  3         18 Male      39 White   USA        6       5.8    5.2    -0.6    1.05
##  4         22 Female    27 Black   USA        4.67    3.8    3.4    -0.4    1.79
##  5         29 Female    22 White   Sweden     4.5     2.4    5.2     2.8    2.21
##  6         31 Male      33 More t… USA        5       5      5.4     0.4    1.47
##  7         36 Female    60 White   USA        4.5     2      2       0      3.32
##  8         37 Female    29 White   USA        5.42    5.2    4.8    -0.4    1.16
##  9         40 Female    46 White   USA        5.58    5.8    5.8     0      1.05
## 10         44 Female    27 White   Nether…    4.75    5.8    5.4    -0.4    1.42
## # … with 73 more rows, 7 more variables: T2Lonely <dbl>, Lonely_Diff <dbl>,
## #   T1BMPN <dbl>, T2BMPN <dbl>, BMPN_Diff <dbl>, SocialDistancing <dbl>,
## #   SixFeet <dbl>, and abbreviated variable names ¹​Participant_ID, ²​Ethnicity,
## #   ³​T1Extraversion, ⁴​SWLS_Diff, ⁵​T1Lonely
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
mean(Extravert2$T1Lonely)
## [1] 1.62714
sd(Extravert2$T1Lonely)
## [1] 0.494084
mean(Extravert2$T2Lonely)
## [1] 1.673431
sd(Extravert2$T2Lonely)
## [1] 0.4877916

Similarly, the most introverted participant’s loneliness scores were also not reproducible.

  • ‘Most introverted participants decreased in loneliness from Time1 (M = 2.56, SD = 0.63) to Time2 (M = 2.31, SD = 0.63)’
quantile(data_2$T1Extraversion) 
##       0%      25%      50%      75%     100% 
## 2.083333 3.333333 3.833333 4.416667 6.000000
Introvert2<- subset(data_2, T1Extraversion <= 3.333333) %>% as_tibble()
print(Introvert2)
## # A tibble: 80 × 17
##    Particip…¹ Gender   Age Ethni…² Country T1Ext…³ T1SWLS T2SWLS SWLS_…⁴ T1Lon…⁵
##         <dbl> <chr>  <dbl> <chr>   <chr>     <dbl>  <dbl>  <dbl>   <dbl>   <dbl>
##  1          9 Male      37 White   Ireland    2.08    1      1       0      3.84
##  2         19 Female    19 White   Sloven…    3.17    4      3.6    -0.4    2.74
##  3         27 Female    21 More t… UK         2.92    5.2    5      -0.2    2.47
##  4         32 Male      39 White   UK         3       2.2    2.6     0.4    2.58
##  5         33 Male      23 White   UK         3.17    4.2    4.2     0      2.11
##  6         38 Male      23 Asian   USA        3.08    2      2       0      2.63
##  7         39 Male      27 White   USA        2.92    4.6    6       1.4    1.89
##  8         41 Female    51 White   UK         3.25    5.6    4.8    -0.8    2.42
##  9         42 Female    28 White   Portug…    2.25    3.6    2      -1.6    3.37
## 10         43 Female    21 Middle… UK         2.75    4      3.2    -0.8    3.26
## # … with 70 more rows, 7 more variables: T2Lonely <dbl>, Lonely_Diff <dbl>,
## #   T1BMPN <dbl>, T2BMPN <dbl>, BMPN_Diff <dbl>, SocialDistancing <dbl>,
## #   SixFeet <dbl>, and abbreviated variable names ¹​Participant_ID, ²​Ethnicity,
## #   ³​T1Extraversion, ⁴​SWLS_Diff, ⁵​T1Lonely
## # ℹ Use `print(n = ...)` to see more rows, and `colnames()` to see all variable names
mean(Introvert2$T1Lonely)
## [1] 2.530921
sd(Introvert2$T1Lonely)
## [1] 0.6340831
mean(Introvert2$T2Lonely)
## [1] 2.3125
sd(Introvert2$T2Lonely)
## [1] 0.6250688

Figures

Table 1 - (Means and correlations among variables at Time 1 and Time 2) (Study 1)

# Turning correlation output into dataframe 
cor_1 <- data_1 %>% select(LETHAVERAGE.T1:EXTRAVERSION) %>% 
  cor() %>%
  round(digits=2) %>% 
  str_remove("^0+") %>% 
  str_remove("(?<=-)(.*?)(.)") %>% 
  matrix(nrow=7, ncol=7) %>% 
  as.data.frame()

# Finding the M and SD of each variable
data_1 %>% select(LETHAVERAGE.T1:EXTRAVERSION) %>% describeBy()
## Warning in describeBy(.): no grouping variable requested
##                vars   n  mean   sd median trimmed  mad   min  max range  skew
## LETHAVERAGE.T1    1 467  2.60 1.16   2.30    2.50 1.19  1.00 6.00  5.00  0.70
## LETHAVERAGE.T2    2 467  3.16 1.27   3.00    3.12 1.33  1.00 6.00  5.00  0.27
## LethDiff          3 467  0.56 1.33   0.40    0.54 1.33 -3.30 4.90  8.20  0.21
## SCAVERAGE.T1      4 467  4.11 0.88   4.16    4.14 0.94  1.16 5.95  4.79 -0.29
## SCAVERAGE.T2      5 467  3.97 0.85   4.00    3.99 0.94  1.63 5.95  4.32 -0.14
## SCdiff            6 467 -0.14 0.71  -0.11   -0.12 0.62 -3.05 2.16  5.21 -0.42
## EXTRAVERSION      7 467  4.17 1.01   4.17    4.15 1.11  1.50 6.75  5.25  0.09
##                kurtosis   se
## LETHAVERAGE.T1    -0.30 0.05
## LETHAVERAGE.T2    -0.90 0.06
## LethDiff           0.13 0.06
## SCAVERAGE.T1      -0.34 0.04
## SCAVERAGE.T2      -0.62 0.04
## SCdiff             1.60 0.03
## EXTRAVERSION      -0.53 0.05
# Creating new object with the M and SD 
new_row <- c("2.60(1.16)", "3.16 (1.27)", "0.56 (1.33)", "4.11 (0.88)", "3.97 (0.85)", "-0.14 (0.71)", "4.17 (1.01)")

# Binding new_row to bottom of cor_1
cor_1 <- cor_1 %>% rbind(new_row)

#Renaming variables 
rownames(cor_1) <- c("T1 Lethargy", "T2 Lethargy", "Lethargy diff(T2-T1)", "T1 Social Connectedness", "T2 Social Connectedness", "Connectedness diff(T2-T1)", "Extraversion", "Mean(SD)")


colnames(cor_1) <- c("T1 Lethargy", "T2 Lethargy", "Lethargy diff(T2-T1)", "T1 Social Connectedness", "T2 Social Connectedness", "Connectedness diff(T2-T1)", "Extraversion")

#Removing upper values 
cor_1[upper.tri(cor_1)]<- ""


# Creating an extra column for row names 
new_col <- c("T1 Lethargy", "T2 Lethargy", "Lethargy diff(T2-T1)", "T1 Social Connectedness", "T2 Social Connectedness", "Connectedness diff(T2-T1)", "Extraversion", "Mean(SD)")
new_col %>% as.data.frame()
##                           .
## 1               T1 Lethargy
## 2               T2 Lethargy
## 3      Lethargy diff(T2-T1)
## 4   T1 Social Connectedness
## 5   T2 Social Connectedness
## 6 Connectedness diff(T2-T1)
## 7              Extraversion
## 8                  Mean(SD)
cor_1 <- cor_1 %>% bind_cols(new_col)
## New names:
## • `` -> `...8`
cor_1 <-cor_1[,c(8,1,2,3,4,5,6,7)]

# Formatting the table 
cor_1 %>% 
  flextable() %>%
  bold(bold = TRUE, part = "header") %>% 
  set_header_labels(...8 = "") %>% 
  border_remove() %>% #get rid of all the default borders
  add_header_row(colwidths = c(8), values = c("Table 1: Means and Correlations Among Variables at Time 1 and Time 2 (Study 1)")) %>% 
  hline_top() # adds in the line underneath column names

Table 1 depicts the correlations among therapy, social connection and extraversion and the means at each time and difference. 

The first step was to select the relevant data from the data set. My team’s first challenge was that we only required some of the variables for the dataset. So we decided to attempt to create a new data set by extracting variables using the combine function. The following code demonstrates our initial attempt:

correlation <- c(Data$LETHAVERAGE.T1, Data$LETHAVERAGE.T2, Data$LethDiff, Data$SCAVERAGE.T1, Data$SCAVERAGE.T2, Data$SCdiff, Data$EXTRAVERSION)

cor(correlation)

However, ultimately we decided that the select() function would be more efficient as it is neater and shorter in code. The select() function chooses the LETHAGERAGE.T1 to EXTRAVERSION columns of the data_1 data set which are required for the table. The cor() function was then piped to create a table of correlations. The values were then rounded to 2 decimal points using round(). The main challenge we had was adding the Mean and SD for each variable as a new line in our table. This proved a relatively difficult feat as our research did not prove fruitful. We then consulted our tutor, who recommended that we turn our correlation output into a data frame and then manually add a row with the mean and SD. To turn our correlation output into a data frame we piped the as.data.frame() function.

Next, the mean and SD of each variable were obtained with the describeby() function which is part of the psych package. This function provides a quick way to find the means and SD of each variable.

After some googling, we found a way to add rows to a data frame. We created a new object called ‘new_row’ and manually put in the mean and SD values that were found above in the row for each associated variable with the combine function, this combined the values into a row. We then stuck the new_row to the correlation data set using rbind(). 

Next, we fixed some aesthetic differences in our name. First, we renamed the variables to match the original table using rownames() and colnames(). Then removed the upper half of the values in the table using this line of code: cor_1[upper.tri(cor_1)]<- ““. 

We then ran into another problem. When we ran the code up to this section we noticed that the table no longer produced the first column which consisted of the names for the variables. Luckily this was a relatively easy fix, as with our previous coding knowledge we already knew how to create another column and bind that to our table. So we created an extra column for the row names called ‘new_col’ using the combine function, converted it into a data frame using as.data.frame() and then bound it to the cor_1 table with bind_cols. However, then the new_col was bound to the end of the table. To fix this we reordered the columns by using the last line of the following code which placed the 8th column at the beginning. This challenge taught us that coding really does become easier over time and that the R journey is slow but is being built with each challenge.

The last challenge that we faced for table 1 was formatting. Through some research we stumbled upon a blog (https://rempsyc.remi-theriault.com/articles/table#Custom_cell_formatting%5D%20and%2) that introduced us to the rempsyc package. This package styles tables into APA 7th edition format and is compatible with the flextable package which we used to further format the table. 

The function nice_table() formats the table to APA 7th edition standards. Next, the function bold() is used to make the top row of headings bold. The set_header_labels() function changes the header 8 to blank as indicated by the empty quotation marks. The default borders are then removed with border_remove() which is part of the flextable package. Next, the title is added with the add_header_row which is also part of the flextable package with the colwidths being 8 to span the whole width of the table. Lastly, the hiline_top() function added the line underneath the column names to successfully reproduce table 1!

Table 3 - Correlations among variables for time 1 and time 2 (Study 2)

# Turning correlation output into dataframe 
cor_2 <- data_2 %>% select(T1Extraversion:BMPN_Diff) %>% 
  cor() %>% 
  round(digits = 2) %>% 
  as.data.frame()

# Rearrange variables to match the order in original table 
cor_2 <- cor_2[,c(2,3,4,8,9,10,5,6,7,1)] #Extraversion and Relatedness move columns
cor_2 <- cor_2[c(2,3,4,8,9,10,5,6,7,1),] #Extraversion and Relatednesss move rows 


# Finding M and SD 
data_2 %>% 
  select(T1Extraversion:BMPN_Diff) %>%
  describeBy()
## Warning in describeBy(.): no grouping variable requested
##                vars   n  mean   sd median trimmed  mad   min  max range  skew
## T1Extraversion    1 336  3.90 0.79   3.83    3.88 0.86  2.08 6.00  3.92  0.15
## T1SWLS            2 336  3.97 1.53   4.20    4.01 1.48  1.00 7.00  6.00 -0.23
## T2SWLS            3 336  3.99 1.45   4.20    4.05 1.48  1.00 7.00  6.00 -0.31
## SWLS_Diff         4 336  0.02 0.88   0.00    0.04 0.59 -5.00 2.80  7.80 -0.55
## T1Lonely          5 336  2.12 0.65   2.05    2.10 0.78  1.00 3.84  2.84  0.28
## T2Lonely          6 336  2.06 0.62   2.05    2.04 0.70  1.00 3.53  2.53  0.27
## Lonely_Diff       7 336 -0.06 0.40  -0.05   -0.05 0.31 -1.47 1.58  3.05 -0.03
## T1BMPN            8 336  4.92 1.09   5.00    4.98 1.24  1.83 7.00  5.17 -0.43
## T2BMPN            9 336  4.91 1.14   5.00    4.95 1.24  1.67 7.00  5.33 -0.30
## BMPN_Diff        10 336 -0.01 1.11   0.00    0.04 0.99 -5.17 3.00  8.17 -0.60
##                kurtosis   se
## T1Extraversion    -0.63 0.04
## T1SWLS            -0.78 0.08
## T2SWLS            -0.84 0.08
## SWLS_Diff          3.37 0.05
## T1Lonely          -0.73 0.04
## T2Lonely          -0.73 0.03
## Lonely_Diff        1.72 0.02
## T1BMPN            -0.18 0.06
## T2BMPN            -0.38 0.06
## BMPN_Diff          1.54 0.06
# New row of M and SD
new_row <- c("3.97(1.53)", "3.99(1.45)", "0.02(0.88)", "4.92(1.09)", "4.91(1.14)", "–0.01(1.11)", "2.12(0.65)", "2.06(0.62)", "–0.06(0.40)", "3.90(0.79)")
cor_2<- rbind(cor_2, new_row)

# Renaming column and row titles 
rownames(cor_2) <- c("T1 Life Satisfaction", "T2 Life Satisfaction", "Life Satisfaction change (T2-T1)", "T1 Relatedness", "T2 Relatedness", "Relatedness change (T2-T1)", "T1 Loneliness", "T2 Loneliness", "Loneliness change (T2-T1)", "T1 Extraversion", "Mean(SD)")

colnames(cor_2) <- c("T1 Life Satisfaction", "T2 Life Satisfaction", "Life Satisfaction change (T2-T1)", "T1 Relatedness", "T2 Relatedness", "Relatedness change (T2-T1)", "T1 Loneliness", "T2 Loneliness", "Loneliness change (T2-T1)", "T1 Extraversion")

# Removing upper values
cor_2[upper.tri(cor_2)]<- ""

# Creating an extra column for row names 
new_col2 <- c("T1 Life Satisfaction", "T2 Life Satisfaction", "Life Satisfaction change (T2-T1)", "T1 Relatedness", "T2 Relatedness", "Relatedness change (T2-T1)", "T1 Loneliness", "T2 Loneliness", "Loneliness change (T2-T1)", "T1 Extraversion", "Mean(SD)")

new_col2 %>% as.data.frame()
##                                   .
## 1              T1 Life Satisfaction
## 2              T2 Life Satisfaction
## 3  Life Satisfaction change (T2-T1)
## 4                    T1 Relatedness
## 5                    T2 Relatedness
## 6        Relatedness change (T2-T1)
## 7                     T1 Loneliness
## 8                     T2 Loneliness
## 9         Loneliness change (T2-T1)
## 10                  T1 Extraversion
## 11                         Mean(SD)
#Bind the new column to the dataframe 
cor_2 <- bind_cols(cor_2, new_col2)
## New names:
## • `` -> `...11`
cor_2 <- cor_2[,c(11,1,2,3,4,5,6,7,8,9,10)] #rearrange so new_col2 is at front

# Formatting the table - similar to Study 1 
cor_2 %>% 
  flextable() %>% 
  bold(bold = TRUE, part = "header") %>% 
  set_header_labels( ...11 = "") %>% 
  border_remove() %>%
  add_header_row(colwidths = c(11), values = c("Table 3: Correlations Among Variables for Time     1 and Time 2 (Study 2)")) %>% 
  hline_top()

The process of reprodoucing table 3 was extremely similar to table 1. This is because we were producing information and table as table 1 but this time using study 2’s data. There was however an additional step that needed to be taken. After turning the correlation output into a dataframe, we noticed that the order of the variables in dataframe did not match the order in the original table. To rectify this we use rearranged the variables using the same method that we used for table 1. The code looked as follows: 

cor_2 <- cor_2[,c(2,3,4,8,9,10,5,6,7,1)] #Extraversion and Relatedness move columns

cor_2 <- cor_2[c(2,3,4,8,9,10,5,6,7,1),] #Extraversion and Relatednesss move rows 

The variables then matched the original figure! From there we followed the same steps that were taken for table 1 to successfully reproduce the correlations among variables for time 1 and 2 for study 2.

Figure 1 - Distribution of social connectedness difference scores (Study 1)

hist(data_1$SCdiff,
     main = "Distribution of Social Connectedness Difference Score", 
     xlab="Social Connectedness Difference Score (T2-T1)", 
     xlim=c(-3,3), ylim = c(0,200),
     las = 1,
     xaxt="n",bty="l")
axis(1, line=-.5) 

Figure 1 is a histogram that shows the distribution of social connectedness difference scores for study 1. To replicate this figure we utilised the hist() function which we found in our google searches. To use the hist() function is quite straightforward. We took the social connectedness difference scores from study 1’s data set (data_1$SCdiff), the main in the hist function creates the title, the xlab creates the x-axis whilst the xlim and ylim are the range of difference scores for the x and y-axis that we want in the histogram. When the histogram is produced at this point the number on the y-axis are horizontal. To rectify this we used las to turn the values vertically. The main challenge that we faced for figure 1 was moving the x-axis so it lined up with the y-axis. On a stackoverflow forum (https://stackoverflow.com/questions/3491175/adjusting-x-axis-in-r-histograms) I found a code which removes the existing x-axis (xaxt=“n”,bty=“l”) and I then inserted a new axis with: axis(1, line=-.5). Thus successfully reproducing figure 1 yay!

Figure 2 - Distribution of relatedness and loneliness difference scores (Study 2)

# Creating a grid to combine the two graphs 
par(mfrow = c(1,2))

## Distribution of Relatedness Difference Scores 
relatedness_hist <- hist(data_2$BMPN_Diff, 
     main = "Distribution of Relatedness Difference Scores", 
     xlab="Relatedness Difference Score (T2-T1)", 
     xlim=c(-4,4), ylim = c(0,80),
     breaks = 13, las = 1,
     xaxt="n",bty="l")
axis(1, line=-.5)

## Distribution of Loneliness Difference Scores 
loneliness_hist <- hist(data_2$Lonely_Diff,
                       main = "Distribution of Loneliness Scores", 
                       xlab="Loneliness Difference Score (T2-T1)", 
                       xlim=c(-2,2), ylim = c(0,50),
                       breaks = 40,
                       xaxt="n",bty="l",
                       yaxt="n",bty="l")
axis(1, line=-.5)
axis(2, line=-.4, las = 1)

Figure 2 consists of two histograms, the distribution of relatedness difference scores and the distribution of loneliness difference scores. Although the hist() function works the same as in figure 1, there are some additional challenges that we faced. Firstly, when attempting to reproduce the relatedness score histogram our graph looked very different from the original figure. The number of bars was different and the frequency of relatedness scores went well over 80. This result was the same for the loneliness graph. The relatedness result is depicted below:

hist(data_2$BMPN_Diff,
      main = "Distribution of Relatedness Difference Scores", 
      xlab="Relatedness Difference Score (T2-T1)", 
      xlim=c(-5,4), ylim = c(0,150),
     las = 1,
     xaxt="n",bty="l")
axis(1, line=-.5)

Initially, our first reaction was to try a different method. We utilised ggplot but as demonstrated below it became evident that functionally and aesthetically it was very different as it used count instead of frequency on the y-axis.

ggplot(data_2) + geom_histogram(aes(x = BMPN_Diff), binwidth = 0.5, 
                                                colour = "black", fill = "grey")

We then decided to go back and keep working with the hist() function. I then came across a website (https://www.rdocumentation.org/packages/graphics/versions/3.6.2/topics/hist) that detailed the function breaks. The break function determines how many columns were used within the range of the histogram. It allowed us to change the number of columns in our histogram and our reproduction became identical to the original graph as the breaks also changed frequency at each number. This challenge showed us that sometimes persevering and working through a challenge instead of finding alternative solutions is sometimes the best path to take when coding in R.

However, the challenge is not yet over. The two graphs have to be combined together. Luckily, with some additional googling the function par(mfrow = …) successfully combined the graphs allowing us to reproduce the graph!

Figure 3 - Changes in social connectedness and loneliness for the most introverted and extraverted participants)

# Summary for points needed in social connectedness figure
df_SC <- data_1 %>% 
  mutate(Quartile = ntile(EXTRAVERSION, 4)) %>% 
  filter(Quartile == '1' | Quartile == '4') %>% 
  group_by(Quartile) %>% 
  summarize(Mean_T1 = mean(SCAVERAGE.T1),
            SD_T1 = sd(SCAVERAGE.T1),
            N_T1 = n(),
            Mean_T2 = mean(SCAVERAGE.T2),
            SD_T2 = sd(SCAVERAGE.T2),
            N_T2 = n()) %>% 
  mutate(SE_T1 = SD_T1 / sqrt(N_T1),
         LowerCI_T1 = Mean_T1 - qt(1 - (0.05 / 2), N_T1 - 1) * SE_T1,
         UpperCI_T1 = Mean_T1 + qt(1 - (0.05 / 2), N_T1 - 1) * SE_T1) %>% 
  mutate(SE_T2 = SD_T2 / sqrt(N_T2),
         LowerCI_T2 = Mean_T2 - qt(1 - (0.05 / 2), N_T2 - 1) * SE_T2,
         UpperCI_T2 = Mean_T2 + qt(1 - (0.05 / 2), N_T2 - 1) * SE_T2) %>%
  select(-c(SD_T1, N_T1, SD_T2, N_T2, SE_T1, SE_T2)) %>% 
  rename("Before Pandemic" = Mean_T1) %>% 
  rename("During Pandemic" = Mean_T2) %>% 
  gather(key = "Time", value = "Mean", "Before Pandemic", "During Pandemic")

# Summary for points needed in loneliness figure
df_lonely <- data_2 %>% 
  mutate(Quartile = ntile(T1Extraversion, 4)) %>% 
  filter(Quartile == '1' | Quartile == '4') %>%  
  group_by(Quartile) %>% 
  summarize(Mean_T1 = mean(T1Lonely),
            SD_T1 = sd(T1Lonely),
            N_T1 = n(),
            Mean_T2 = mean(T2Lonely),
            SD_T2 = sd(T2Lonely),
            N_T2 = n()) %>% 
  mutate(SE_T1 = SD_T1 / sqrt(N_T1),
         LowerCI_T1 = Mean_T1 - qt(1 - (0.05 / 2), N_T1 - 1) * SE_T1,
         UpperCI_T1 = Mean_T1 + qt(1 - (0.05 / 2), N_T1 - 1) * SE_T1) %>% 
  mutate(SE_T2 = SD_T2 / sqrt(N_T2),
         LowerCI_T2 = Mean_T2 - qt(1 - (0.05 / 2), N_T2 - 1) * SE_T2,
         UpperCI_T2 = Mean_T2 + qt(1 - (0.05 / 2), N_T2 - 1) * SE_T2) %>% 
  select(-c(SD_T1, N_T1, SD_T2, N_T2, SE_T1, SE_T2)) %>% 
  rename("Before Pandemic" = Mean_T1) %>% 
  rename("During Pandemic" = Mean_T2) %>% 
  gather(key = "Time", value = "Mean", "Before Pandemic", "During Pandemic")

#Converting tibbles into data frames
df_SC <- as.data.frame(df_SC)
df_lonely <- as.data.frame(df_lonely)

# Converting Quartile into factor variables
df_SC$Quartile <- as.factor(df_SC$Quartile)
df_lonely$Quartile <- as.factor(df_lonely$Quartile)

# Making the figures
SocialConnectedness <- ggplot(df_SC, aes(x = Time, y = Mean, group = Quartile)) +
  geom_point(aes(group = Mean)) +
  geom_errorbar(aes(x = "Before Pandemic", ymin = LowerCI_T1, ymax = UpperCI_T1),
                width = 0.1) +
  geom_errorbar(aes(x = "During Pandemic", ymin = LowerCI_T2, ymax = UpperCI_T2),
                width = 0.1) +
  geom_line(aes(linetype = Quartile)) +
  scale_linetype_manual(values=c("dashed", "solid")) +
  ggtitle("Social Connectedness Changes Based on Extraversion") + 
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "none",
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black")) +
  ylab("Mean Social Connectedness") +
  expand_limits(y=3) + expand_limits(y=5) +
  scale_y_continuous(breaks=0:20*0.5) 

Loneliness2 <- ggplot(df_lonely, aes(x = Time, y = Mean, group = Quartile)) +
  geom_point(aes(group = Mean)) +
  geom_errorbar(aes(x = "Before Pandemic", ymin = LowerCI_T1, ymax = UpperCI_T1),
                width = 0.1) +
  geom_errorbar(aes(x = "During Pandemic", ymin = LowerCI_T2, ymax = UpperCI_T2),
                width = 0.1) +
  geom_line(aes(linetype = Quartile)) +
  scale_linetype_manual(values=c("dashed", "solid"),
                        labels = c("Most Introverted", "Most Extraverted")) +
  ggtitle("Loneliness Changes Based on Extraversion") + 
  theme(plot.title = element_text(hjust = 0.5),
        legend.title=element_blank(),
        panel.background = element_blank(), 
        axis.line = element_line(colour = "black")) +
  ylab("Mean Loneliness") + 
  expand_limits(y=1) + expand_limits(y=3) +
  scale_y_continuous(breaks=0:20*0.5)

# Combining the two figures into one picture
grid.arrange(SocialConnectedness, Loneliness2, ncol=2, widths=c(3,4))

Figure 3 was the most challenging component of the paper to reproduce. It required us to create two line graphs with error bars. We focused on producing the social connectedness graph first. Because this graph contained information for only the most introverts and extraverted participants, we needed to establish what quartiles of extraversion each participant fit into. We created this line of code: mutate(Quartile = ntile(T1Extraversion, 4)) to create a new column in the dataset called Quartile which labelled each participant in one of 4 quartiles according to their extraversion score. We then filtered for quartiles 1 and 4 as this provided us with the most and least extraverted participants. We then grouped participants based on their quartile using group_by(). Based on data set 1 the social connectedness means, SD and number of participants based on quartiles for time 1 and 2 were found with the summarize() function. These values were required for the SE to calculate the confidence intervals for the error bars. Next, the mutate function was used to create a new column for the social connectedness SE, and the lower and upper confidence interval values for time 1. The same values were then created for time 2. The SD, n and SE values were no longer required and therefore were deleted from the data set. The variables mean at time 1 and 2 were renamed using rename() to ‘Before Pandemic’ and ‘After Pandemic’ to match the x-axis of the original figure.

The gather() function was then used to convert the data set from a wide-format to long-format which the values are the variables (‘Mean’, “Before Pandemic’, ‘During Pandemic’) and the key (‘Time’) are the key variables. This change is illustrated below:

Wide-format

Long-format

This change ensured we were able to produce the error bars as our x(time) and y(mean) and key(quartile) were now clearly defined in the long-format data set. 

Next, we converted the data set into a data frame (df_SC <- as.data.frame(df_SC)). This step was necessary so we can convert the quartiles into a factor variable, which is the next step: df_SC$Quartile <- as.factor(df_SC$Quartile). This line of code makes the quartiles of most introverts and extraverted into factors so they can be later turned into legends in the figure. 

Now we will begin making the figure. We are using ggplot to create our figure. The first line of code: ggplot(df_SC, aes(x = Time, y = Mean, group = Quartile)) defined our x, y and group (factor variable). Geom_point() function was then used to make is the 4 points in the graph (social connectedness means). The errors bars were added with geom_errorbar() function for time 1 and 2. The points were then connected with geom_line() based on the factor variable which is the lower and higher quartiles. The solid and dashed lines were then manually inputted using scale_linetype_manual. 

The title, themes, axis titles and further aesthetic features were also coded to ensure that our graph was a successful reproduction of the paper’s and that it followed APA guidelines.

One of the most challenging parts of reproducing this part of the graph is that initially, we believed that it could not be reproduced. This is because for the longest time we did not realise that the social connectedness part of the graph was based on study 1! We had assumed that because it was under Study 2’s exploratory analysis it was using the relatedness data from study 2, which meant that we were producing a significantly different-looking graph than what was shown in the paper. It wasn’t until we came together as a group and tested the code together that we realised, the graph is illustrating the social connectedness data from study 1. Once this was discovered we were finally able to successfully reproduce the graph with no problems. This experience showed us that when we come together to work as a team we are able to pick up little details as a group that we missed individually this has really helped us in our coding journey.

The loneliness changes based on extraversion part of the graph was produced with the same process as the social connectedness part of the graph. It used data from study 2, specifically the T1Lonely and T2Lonely variables. 

To connect the graphs together the grid.arrange() function from the gridExtra package was used. This function allowed us to control the size of each graph with ‘widths’. Now we have successfully reproduced Figure 3!

Exploratory

Question 1 - Does changes in social connectedness differ between countries?

Although Folk et al., (2020) determined social connectedness did not significantly change during the pandemic I was interested if changes in social connectedness differed between countries. Folk and colleague acknowledge that there was substantial variability in participants’ experiences during the pandemic depending on the restrictions in their nation. This variability could have a significant impact on people’s social connectedness when comparing between countries.

For study 2 in Folk and colleague’s paper, data was collected from participants between the 1-8 April 2020. During this period of time, restrictions in the United Kingdom detailed that citizens were unable to leave their homes ‘without reasonable excuse’ - which included exercise and shopping (BBC, 2020). In contrast, in the United States, depending on the state, official stay at home orders may not have been implemented until April 7th (Mervoshh et al., 2020). Therefore, the differing severity of restrictions could have impacted people’s sense of social connectedness during the pandemic. Folk and colleagues did not explore this possibility as the majority of participants in study 2 reported to participate in social distancing (mode = 0) and not going within 6ft of anyone outside their household the previous day (mode = 0). Although this suggests that social interaction decreased during the pandemic it does not address the potential effect of differing severity between national restrictions. For example, Folk and colleagues (2020) report that participants from Poland had the lowest levels of social distancing (79%) whilst Portugal and Canada participants reported the highest (100%). My question therefore explores the possibility of varying social connectedness between countries during the pandemic.

First I loaded the libraries and data required for my analyses.

library(tidyverse)
library(dplyr)
data_2 <- read_csv("study_2_data.csv")
## Rows: 336 Columns: 17
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): Gender, Ethnicity, Country
## dbl (14): Participant_ID, Age, T1Extraversion, T1SWLS, T2SWLS, SWLS_Diff, T1...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

To begin exploring this question I determined how many participants were surveyed from each country in Study 2 during the count() function. I also arranged the data in descending order to determine where most participants are from.

data_2 %>% count(Country) %>% arrange(desc(n))
## # A tibble: 29 × 2
##    Country      n
##    <chr>    <int>
##  1 USA        104
##  2 UK          90
##  3 Poland      29
##  4 Portugal    23
##  5 Canada      15
##  6 Greece      11
##  7 Italy       10
##  8 Germany      6
##  9 Mexico       6
## 10 Slovenia     5
## # … with 19 more rows
## # ℹ Use `print(n = ...)` to see more rows

From this information I decided that it would be interesting to test difference between the US and UK as they had the greatest number of participants and were both heavily impacted by the pandemic. I also thought that comparing Poland and Portugal may result in interesting results as Poland was reported to have the lowest levels of social distancing whilst Portugal one of the highest (Folk et al., 2020). Canada also had the highest levels of social distancing but Portugal was chosen over Canada because it has a greater number of participants which means more statistical power.

The descriptive statistics were then calculated. I grouped data_2 by country and then determined the mean and SD for the social connectedness difference scores for each country. The SE was also obtained to later turn into error bars in the data visualisation.

exp1data <- data_2 %>% group_by(Country) %>% 
  summarise(mean = mean(BMPN_Diff),
            SD = sd(BMPN_Diff),
            n = n(),
            SE = SD/sqrt(n)) %>% 
  filter(Country %in% c('USA', 'UK', 'Poland', 'Portugal'))
exp1data
## # A tibble: 4 × 5
##   Country      mean    SD     n    SE
##   <chr>       <dbl> <dbl> <int> <dbl>
## 1 Poland   -0.00575 1.52     29 0.282
## 2 Portugal  0.00725 0.712    23 0.148
## 3 UK        0.0333  1.17     90 0.124
## 4 USA      -0.125   1.06    104 0.104

To visualise the descriptive statistics I used ggplot. Country was put on the x-axis and mean social connectedness difference was put on the y-axis. Geom_col() created the columns in the graph and geom_errorbar() added the error bars with the SE calculated above. The next three functions added labels and titles to the graph.

When analysing the data visualisation I noted that Poland’s social connectedness increased during the pandemic (a negative difference between T1 and T2), in comparison to Portugal’s decrease. This is contrary to what was expected as Poland was reported to have the lowest social distancing whilst Portugal the highest. Also note that participants from the US experienced increased social connection during the pandemic whilst participants from the UK experience decreased social connectedness.

The data visualisation also illustrated very large SEs across all countries. This indicates that there is large variability within scores between participants and that the sample mean may not be an accurate reflection of the actual population mean. Having a large SE may make it difficult to obtain a significant inferential result.

exp1data %>% ggplot(aes(Country), mean, fill = Country) +
  geom_col(aes(y = mean)) +
  geom_errorbar(aes(ymin=mean-SE, ymax=mean+SE), width = 0.1) +
  scale_x_discrete(name = 'Country') +
  scale_y_continuous(name = 'Mean Social Connectedness Difference') +
  ggtitle(label = 'Mean Social Connectedness Difference Across Countries')

For the inferential statistics I carried out two separate independent samples t-test. One comparing participants from the US and UK and the other comparing Poland and Portugal.

The t-test analysis showed that there was no significant difference in social connectedness between the US and UK with t(180) = -0.98, p = 0.33 and 95% CI (-0.48, 0.16).

USA <- data_2 %>% filter(Country == 'USA')
UK <- data_2 %>% filter(Country == 'UK')

t.test(USA$BMPN_Diff, UK$BMPN_Diff, paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  USA$BMPN_Diff and UK$BMPN_Diff
## t = -0.98097, df = 180.85, p-value = 0.3279
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.4768126  0.1601460
## sample estimates:
##   mean of x   mean of y 
## -0.12500000  0.03333333

The second t-test analysis also showed that there was no significant difference in social connectedness between the Poland and Portugal with t(41) = -0.04, p = 0.97 and 95% CI (-0.66, 0.63).

Poland <- data_2 %>% filter(Country == 'Poland')
Portugal <- data_2 %>% filter(Country == 'Portugal')
t.test(Poland$BMPN_Diff, Portugal$BMPN_Diff, paired = FALSE)
## 
##  Welch Two Sample t-test
## 
## data:  Poland$BMPN_Diff and Portugal$BMPN_Diff
## t = -0.040734, df = 41.561, p-value = 0.9677
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.6569328  0.6309458
## sample estimates:
##    mean of x    mean of y 
## -0.005747126  0.007246377

Results

Therefore, these inferential tests demonstrate that change in social connection during the pandemic is not significantly different between the UK and US or Poland and Portugal, despite each country having varying levels of restrictions. An important point to keep in mind is the very high SE. Researchers should set out to have a large sample sizes to minimise variability in scores as much as possible. Because of the large variability it was difficult to obtain a significant result even if there was differences in overall means.

Question 2 - Does social connectedness differ between age?

The next exploratory question I wanted to ask was whether age influences social connection during the pandemic. Folk and colleagues (2020) concluded that people have found alternate ways to satisfy their need for social connection. For example, people have organized video call ‘happy hours’ (Tiffany, 2020) and used social media as an outlet to satisfy the need to belong. However, there was major concern that older adults may be more likely to be more disconnected from technology, living alone and suffering from chronic illnesses which may leave them more likely to be socially isolated (). In fact nearly a quarter of people aged 65 and older are reported to be socially isolated (). Consequently, it is possible that the pandemic has had a greater effect on older people’s social connectedness compared to others. Since Study 1 only involved university students I concentrated my analysis on study 2. 

The first step was to load the libraries and data.

library(tidyverse)
library(dplyr)
library(ggplot2) #used for the age_cut_number() and age_cut_interval() functions that were used to create the age groups
library(report)

Next I calculated the relevant descriptive statistics using the summarise() function. I found the median, minimum and maximum age of the participants in study 2 to try and get an understanding of what the age spread was like.

data_2 %>% summarise(
  Mean = mean(Age),
  SD = sd(Age),
  Median = median(Age),
  Min_age = min(Age), 
  Max_age = max(Age))
## # A tibble: 1 × 5
##    Mean    SD Median Min_age Max_age
##   <dbl> <dbl>  <dbl>   <dbl>   <dbl>
## 1  32.0  11.9     28      18      72

Next, I wanted to create age groups by dividing up the participants by age ranges. I was not sure how to achieve this but after some googling I discovered the cut_number() and cut_interval() functions from the ggplot2 package. The cut_number() makes n groups with equal group members whilst cut_interval() makes n groups with equal age ranges. After some deliberation I decided to base the age groups on even age ranges as I wanted to test changes based on young, middle-aged and older people. This was not possible with cut_numbers() as demonstrated below, as the majority of participants are in the young adult demographic.

exp2data <- mutate(data_2, age_groups = cut_number(Age, n = 3))
exp2data %>% count(age_groups)
## # A tibble: 3 × 2
##   age_groups     n
##   <fct>      <int>
## 1 [18,25]      124
## 2 (25,34]      106
## 3 (34,72]      106

I ultimately decided on splitting the participants into three age groups: 18-36, 36-54 and 54-71 using cut_interval. This will allow me to compare the social connection between young, middle-aged and older adults.

exp2data <- mutate(data_2, age_groups = cut_interval(Age, n = 3))
exp2data %>% count(age_groups)
## # A tibble: 3 × 2
##   age_groups     n
##   <fct>      <int>
## 1 [18,36]      243
## 2 (36,54]       72
## 3 (54,72]       21

To obtain the descriptive statistics I grouped the participants by age_groups that were determined above. I then used the function summarise() to determine the number of participants, mean, sd and se.

agdata <- exp2data %>% group_by(age_groups) %>%
  summarise(n = n(),
            mean = mean(BMPN_Diff),
            sd = sd(BMPN_Diff),
            se = sd/sqrt(n))
agdata
## # A tibble: 3 × 5
##   age_groups     n    mean    sd     se
##   <fct>      <int>   <dbl> <dbl>  <dbl>
## 1 [18,36]      243 -0.0583 1.16  0.0744
## 2 (36,54]       72  0.155  1.02  0.120 
## 3 (54,72]       21 -0.0317 0.804 0.175

Similar to my first exploratory question, I used ggplot to visualise the descriptive statistics. With the age_groups on the x-axis and mean social connectedness on the y-axis. The error bars were also calculated using the mean and se obtained above. The error bars when visualised were noticeably very large, much like my first exploratory question which implies there is large variability between participant’s scores. Furthermore, the older age range actually shows an increase in social connectedness during the pandemic rather than a decrease. As social connectedness scores increased during the pandemic rather than decrease, which is contrary to what was predicted.

agdata %>% ggplot(aes(age_groups), mean, fill = age_groups) +
  geom_col(aes(y = mean)) +
  geom_errorbar(aes(ymin=mean-se, ymax=mean+se), width = 0.1) +
  scale_x_discrete(name = 'Age Group') +
  scale_y_continuous(name = 'Mean Social Connectedness Difference') +
  ggtitle(label = 'Mean Social Connectedness Difference Across Ages')

For the inferential statistics, I carried out a one-way between ANOVA test to test the homogeneity hypothesis. That is is there a difference in mean social connectedness between the age groups. With some googling I found a website that taught me how to conduct an ANOVA test (http://www.cookbook-r.com/Statistical_analysis/ANOVA/#one-way-between-anova).

The conclusion is that there is no significant difference in social connectedness between age groups, with p = 0.36.

aov1 <- aov(BMPN_Diff~age_groups, data = exp2data)
report(aov1)
## For one-way between subjects designs, partial eta squared is equivalent to eta squared.
## Returning eta squared.
## Warning: 'data_findcols()' is deprecated and will be removed in a future update.
##   Its usage is discouraged. Please use 'data_find()' instead.
## The ANOVA (formula: BMPN_Diff ~ age_groups) suggests that:
## 
##   - The main effect of age_groups is statistically not significant and very small (F(2, 333) = 1.03, p = 0.359; Eta2 = 6.13e-03, 95% CI [0.00, 1.00])
## 
## Effect sizes were labelled following Field's (2013) recommendations.

Results

Therefore, the inferential statistic tests suggest that there is no significant difference in mean social connectedness between young (18-36), middle-aged (36-54) and older adults (54-72) during the pandemic. This is despite older people’s risk factors and being less proficient with technology. The insignificant result may be due the high SE, much like the problem encountered in the first exploratory question. The descriptive statistics also shed light on the older age group’s increase in social connectedness rather than the predicted decrease.

Question 3 - Does social connectedness correlate with physical interaction?

The next exploratory question I wanted to investigate was if differences in social connectedness correlate with physical interaction during the pandemic. Folk and colleagues (2020) reported on how many people participants physically interacted with the day before they completed the survey. Physical interaction was determined by if the participant got within 6ft of another person. This made me wonder if there may be an evident correlation between the number of physical interactions and a person’s subjective feelings of social connectedness. Sandstrom and Dunn (2014) reported that adults in Canada have on average six in-person interactions with others on a typical day and reported lower feelings of belonging on days when they had fewer interactions. Based on thus evidence we would expect social connection to correlate with the number of physical interactions a person has.

First I loaded the libraries that I needed to run the tests.

library(tidyverse)
library(dplyr)
library(ggpubr) #used to run the correlation test
## 
## Attaching package: 'ggpubr'
## The following objects are masked from 'package:flextable':
## 
##     border, font, rotate

Next I determined how many physical interactions people had and how many participants were at each physical interaction. This was determined by grouping the participants by the number of physical interactions they had which was called the variable ‘SixFeet’. Then using the count() function to determine how many participants were at each physical interaction.

data_2 %>% group_by(SixFeet) %>% count(SixFeet)
## # A tibble: 7 × 2
## # Groups:   SixFeet [7]
##   SixFeet     n
##     <dbl> <int>
## 1       0   204
## 2       1    41
## 3       2    23
## 4       3    24
## 5       4    16
## 6       5    22
## 7       7     6

Next the descriptive statistics were obtained in a similar fashion as the previous exploratory questions and labelled ‘exp3data’.

exp3data <- data_2 %>% group_by(SixFeet) %>% summarise(n = n(),
                                           mean = mean(BMPN_Diff),
                                           sd = sd(BMPN_Diff),
                                           se = sd/sqrt(n))
exp3data
## # A tibble: 7 × 5
##   SixFeet     n     mean    sd     se
##     <dbl> <int>    <dbl> <dbl>  <dbl>
## 1       0   204 -0.0408  1.20  0.0840
## 2       1    41  0.00813 0.988 0.154 
## 3       2    23  0.0652  1.04  0.216 
## 4       3    24  0.0556  1.00  0.204 
## 5       4    16 -0.0729  1.12  0.280 
## 6       5    22  0.0985  0.811 0.173 
## 7       7     6  0.0833  0.705 0.288

The descriptive statistics were visualised through a correlation graph. I created the graph using ggplot with the number of physical interactions (‘SixFeet’) on the x-axis and mean of social connectedness difference on the y-axis. The geom_point function created the mean points on the graph and geom_smooth with the argument method = ‘lm’ to create the line of best fit. The next for functions create aesthetic changes such as titles.

From the graph I can see that there is possible positive correlation between the number of physical interactions and social connectedness difference. Which actually indicates that the more physical interactions a person has the less socially connected they feel. As social connectedness scores decreased at a greater rate for those with more social interactions than those with less social interactions during the pandemic. This is contrary to what was predicted.

exp3plot <- exp3data %>%  
  ggplot(aes(mean, SixFeet)) +  
  geom_point() +  
  geom_smooth(method = "lm", se = FALSE) + 
  theme_minimal() + 
  scale_x_continuous(name = "Number of Physical Interactions" ) + 
  scale_y_continuous(name = "Mean Social Connectedness Difference") +  
  ggtitle(label = "Correlation between Social Connectedness and Physical Interaction") 
exp3plot
## `geom_smooth()` using formula 'y ~ x'

For the inferential statistic test we conducted a correlation test using the cor.test() function in the ggpubr package. This test calcuates the correlation between the social connectedness difference scores and the number of physical interactions a participant had.

There was insufficient evidence to suggest that there was a correlation between social connectedness and number of physical interactions.

cor.test(exp3data$SixFeet, exp3data$mean) 
## 
##  Pearson's product-moment correlation
## 
## data:  exp3data$SixFeet and exp3data$mean
## t = 1.2873, df = 5, p-value = 0.2544
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.4070827  0.9100567
## sample estimates:
##      cor 
## 0.498924

Therefore, my hypothesis was not supported by the descriptive or inferential test evidence. I think that further investigation using a larger sample size would be integral to future studies about social connection and physical interaction.

Recommendations

I would give three recommendations to Folk and colleagues (2020) to improve the reproducibility of their study. 

1. My first recommendation is for the authors to provide a complete raw data set to readers. A large challenge that my group experienced was being unable to reproduce the majority of the descriptive statistics in study 2 using the final data set the authors provided. Because we were not provided with the raw data set it was very difficult to figure out why some values were not reproducible. Our group suspected that the authors may have included additional exclusion criteria which results in a different set of data values than the one provided. Furthermore, missing data made it more difficult to understand how authors reach some statistical conclusions. For example, we were told that 46% were never married however were not provided with any data about marital status. These challenges taught us that authors should provide a complete raw data set so users can have the ability to fully reproduce their results.

2. My second recommendation is to make the variable names more intuitive. Folk and colleagues (2020) should be praised for including a basic codebook that outlined what each variable name equates to. To improve, I think the codebook could have been more comprehensive in explaining how variable names were chosen and how variables are measured and calculated. For example, ‘BMPN’ and ‘SWLS’ are not intuitive variable names and made it difficult to quickly understand some of the data. This blog (https://www.r-bloggers.com/2018/03/generating-codebooks-in-r/) gives a good example on how to make a codebook in R and what should be in a codebook.

3. My third recommendation is providing comprehensive explanations of code. Whilst Folk and colleagues (2020) did provide the R code that they used to obtain their statistics, no guidance was provided about what the code was doing. As a result my team often got stuck trying to understand large chunks of code when trying to reproduce the data. Our journey would have been greatly helped if authors provided line-by-line explanations for their code using the KISS method and reasoning behind why particular functions were used. The mechanisms behind the KISS method are explained well in this website: https://whitelabelcoders.com/blog/best-practices-in-programming-based-on-solid-kiss-and-personal-experience/. With comprehensive explanation of the code our reproducibility attempts would be much smoother and the goals of open science will be strongly upheld.

References

Department of Health and Aged Care. (2022). Restrictions, lockdowns and stay at home orders. Australian Government. https://www.health.gov.au/health-alerts/covid-19/restrictions-and-lockdowns

Diener, E., Emmons, R. A., Larsen, R. J. & Griffin, S. (1985). The Satisfaction With Life Scale. Journal of Personality Assessment, 49(1), 71-75. https://doi.org/10.1207/s15327752jpa4901_13.

Folk, D., Okabe-Miyamoto, K., Dunn, E. & Lyubomirsky, S. (2020). Did Social Connection Decline During the First Wave of COVID-19?: The Role of Extraversion. Collabra: Psychology, 6(1), 37. https://doi.org/10.1525/collabra.365.

Jacobs, M. & Ellis, C. (2021). Social Connectivity During the COVID-19 Pandemic: Disparities among Medicare Beneficiaries. Journal of Primary Care and Community Health. 12(3), 250-245. https://doi.org/10.1177/21501327211030135.

Lee, R. M., Draper, M. & Lee, S. (2001). Social connectedness, dysfunctional interpersonal behaviors, and psychological distress: Testing a mediator model. Journal of Counseling Psychology, 48(3), 310–318. https://doi.org/10.1037/0022-0167.48.3.310.

Russell, D., Peplau, L. A. & Cutrona, C. E. (1980). The revised UCLA Loneliness Scale: Concurrent and discriminant validity evidence. Journal of Personality and Social Psychology, 39(3), 472–480. https://doi.org/10.1037/0022-3514.39.3.472.

Sheldon, K. M. & Hilpert, J. C. (2012). The balanced measure of psychological needs (BMPN) scale: An alternative domain general measure of need satisfaction. Motivation and Emotion. 36, 439–451. https://doi.org/10.1007/s11031-012-9279-4.

Smillie, L. D., Kern, M. L. & Uljarevic, M. (2019). Extraversion: Description, development, and mechanisms. Handbook of personality development (pp. 118–136). The Guilford Press.

Soto, C. J. & John, O. P. (2017). The next Big Five Inventory (BFI-2): Developing and assessing a hierarchical model with 15 facets to enhance bandwidth, fidelity, and predictive power. Journal of Personality and Social Psychology, 113(1), 117–143. https://doi.org/10.1037/pspp0000096.