Carstensen et al. (2020) - Verification Report

Joshua Pham

2021-04-26


Summary

A robust finding in emotion literature is the relationship between age and improved emotional experience. Though whilst it is well-established that older individuals report more elevated emotional wellbeing than younger individuals, there still exists much debate surrounding the mechanisms that drive this effect. Two prevailing theories – socioemotional selectivity theory (SST) and the strength and vulnerability integration (SAVI) model – outline diverging accounts regarding age-related emotion regulation (Carstensen, Fung, and Charles 2003; Charles 2010).

SST attributes gradual enhancements in emotional wellbeing to motivational shifts arising from perceived limitations of the future, evolving from very expansive beliefs to more restricted perspectives as individuals age. This culminates in a redirection of cognitive processes across adulthood, where older people display an attentional bias towards positive stimuli whilst reacting less to aversive situations.

Conversely, the SAVI model ascribes improved emotional experience in later years to the proactive avoidance of emotionally threatening environments. Hence, during protracted periods of inescapable stress, advantages in emotional states are extinguished with older people possessing a similar capacity to their younger counterparts in regulating their emotions.

Previously, it was difficult to investigate the validity of the SAVI model due to ethical issues, but arising circumstances of the COVID pandemic presented an opportunity to assess this theory. Capitalising on this sustained duration of heightened stress, Carstensen, Shavit, and Barnes (2020) evaluated SAVI and SST explanations by examining whether age-related differences in emotional experience were maintained during the COVID pandemic. Through sampling 974 Americans between the ages 18 and 76 in an online survey, they found that benefits in emotional wellbeing for older individuals was maintained during the pandemic, consistent with documented findings on age-related advantages.

This contradicts SAVI model predictions which suggest similar, if not worsened, emotional experiences for older populations compared to younger ones in times of persistent stress. Furthermore, whilst it was shown that longer perceived futures were commensurate with emotional functioning, tentative support is lent to the SST theory as consequent reprioritisation of goals from perceived time constraints could not be measured.

Reactions

  1. I was surprised that the avoidance of stress factor described in the Strength and Vulnerability Integration (SAVI) model lacked sufficient clarity, resulting in its broad interpretation across the literature. If there exists a lack of consensus on what constitutes avoidance of stress, being the crux of the model, it seems very ill-founded to continue investigating such a variable as it will inevitably breed many contradicting and conflicting results depending on one’s subjective understanding. Instead, its definition and operationalisation should be specified clearly to allow further research to be conducted in an efficient manner that more accurately assesses its validity in the context of emotional wellbeing and age.
  2. I wasn’t convinced that this study captured the impact of the pandemic on emotional wellbeing across ages. Especially since this study was conducted during the very early stage of the pandemic, it is highly likely that it does not reflect an accurate assessment of people’s emotional response to the threat of the virus as it may not be fully realised or understood yet. To address this, the article surveyed perceived risk as one of the criteria but it did not mention anything about the lockdown measures being implemented and the possible confusion that may arise between individuals responding to the COVID threat or to the difficulties of quarantine on their daily lifestyle. It is very likely that the positive and negative emotions observed are reactions to the lockdown restrictions to which it can be argued that younger individuals would experience more difficulty adjusting than their older counterparts due to their proclivity for more outgoing, social endeavours rather than their lack of emotional experience. The lack of analysis differentiating between different age groups that would be more susceptible to either the impacts of COVID or mandatory lockdown procedures leaves room for many other explanations and interpretations of the data, thereby casting doubt on their conclusion of maintained age-related advantages in emotional wellbeing during a pandemic.
  3. The author seem to have missed factors other than health that would have impacted an individual’s perception of risk during the pandemic. Whilst they analysed the strong relationship between age, health and risk perception, they neglected to examine other factors which may have moderated this effect, such as personality and education, which would have implications for understanding people’s reactions to threatening, global-scale events. Instead much of the variation in perceived risk of the virus across age was attributed to recognised health concerns, which is definitely a contributing factor, but it does not explain the distinctions in perceived risk to the population or to the self and only addresses changes in perceived risk of developing complications. Hence, health ratings do not fully explain the fluctuations in other types of perceived risk and this could be clarified through an examination of other contributing factors.

Verification

The materials such as the data set, codebook, survey items and study-measures were available in an OSF repository.

Demographic Descriptives

We attempted to reproduce the summary characteristics of the participants reported on page 1376 of the study.

Participants were 49.2% female and ranged in age from 18 to 76 years (M = 45.15, SD = 16.79). Seventy-five percent of participants identified as White. The median household income for the sample was between $50,000 and $60,000, which is comparable with the median U.S. income of $62,000 (U.S. Census Bureau, 2018a). Fifty-six percent of participants were currently working for pay, which is comparable with the 60% employment rate nationwide (Bureau of Labor Statistics, 2020). The sample was somewhat more educated than the U.S. population: 88% of participants reported attending at least some college, compared with 60% in the U.S. population (U.S. Census Bureau, 2019). Twenty-three percent of participants reported living alone, again comparable with the 28% of American households with one occupant (U.S. Census Bureau, 2018b).

library(tidyverse) # for dplyr and ggplot
library(gt) # for aesthetically pleasing tables
library(ggpubr) # for easier plotting functions for data visualisation
library(papaja) # for apa themes 
library(rstatix) # for calculating statistics 
library(corrplot) # for correlation plots
library(gridExtra) # for merging multiple plots 
library(extrafont) # for importing fonts
library(plotrix) # for specialised plots and plotting accessories

The data set came available as a csv file so we used the read.csv function as part of the tidyverse package to read the data and placed it into a data frame called covid.

covid <- "Covid_Data.csv" %>% 
  read.csv()

Using the codebook provided, we selected the relevant variables to be reproduced, being:

  • Percentage of females (gender_bin_f)
  • Range, mean and standard deviation of age (age)
  • Percentage of White participants (race_bin)
  • Median household income (income)
  • Currently employed participants (emp_bin)
  • Percentage with a minimum college education (educ)
  • Percentage of participants living alone (livealone)

This is achievable through the use of the select function from the tidyverse package and we put them into a new data frame called pcp. The glimpse function provides a brief overview of the data, mainly showing me the columns which we were most interested in.

pcp <- covid %>% 
  select(age, gender_bin_f, race_bin, educ, income, emp_bin, livealone)

glimpse(pcp)
## Rows: 945
## Columns: 7
## $ age          <int> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 1~
## $ gender_bin_f <chr> "Non-female", "Non-female", "Non-female", "Non-female", "~
## $ race_bin     <chr> "White", "Non-white", "White", "White", "Non-white", "Non~
## $ educ         <chr> "Graduated high school (or GED)", "Graduated high school ~
## $ income       <chr> "Less than $10,000", "Less than $10,000", "$220,000 to $2~
## $ emp_bin      <chr> "Not_working", "Not_working", "Not_working", "Not_working~
## $ livealone    <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No~

Mean and Standard Deviation Age

Using the summarise function, we were able to quickly calculate the mean and standard deviation of the age variable.

pcp %>% 
  summarise(
    mean_age = mean(age),
    sd_age = sd(age)) 
##   mean_age   sd_age
## 1 45.15238 16.78756

Age Range

The range of participant ages in the study was obtainable through the range function.

pcp$age %>% range()
## [1] 18 76

Percentages

To calculate the percentage of females in the sample, we created a new data frame fem by selecting the gender_bin_f variable from pcp using select whilst filtering for values labelled “Female” using filter. The percentage was then calculated by dividing the number of females (fem) over the total number of participants (pcp) using count and this number was multipled by 100 to get a percentage.

fem <- pcp %>% 
  select(gender_bin_f) %>% 
  filter(gender_bin_f == "Female")

100 * count(fem) / count(pcp)
##          n
## 1 49.20635

Using the same procedure, the percentages of White participants, employed participants, participants with a minimum college education and participants who lived alone were able to be reproduced.

# Percentage of White participants

white <- pcp %>%
  select(race_bin) %>% 
  filter(race_bin == "White")      

100 * count(white) / count(pcp)
##          n
## 1 75.44974
# Working for pay

pay <- pcp %>%
  select(emp_bin) %>% 
  filter(emp_bin == "Working")     

100 * count(pay) / count(pcp)
##          n
## 1 56.19048
# Minimum of college-level education

smart <- pcp %>%
  select(educ) %>% 
  filter(educ != "Graduated high school (or GED)",
         educ != "Less than high school")  

100 * count(smart) / count(pcp)
##          n
## 1 87.72487
# Living alone

alone <- pcp %>% 
  select(livealone) %>% 
  filter(livealone == "Yes")

100 * count(alone) / count(pcp)
##          n
## 1 22.96296

Median Household Income

Initially, we tried to find the median household income using the most straightforward way possible, where we followed the same method I used in the percentage calculations - selecting income with select, filtering out any “Decline to answer” responses with filter and using summarise to find the median.

However, it returned a different value to the one reported in the study (50,000 to 60,000).

pcp %>% 
  select(income) %>% 
  filter(income != "Decline to answer") %>% 
  summarise(med = median(income))
##                  med
## 1 $30,000 to $40,000

We thought this error arose from the values being incorrectly ordered, as it is essential for the data to be arranged in either an ascending or descending fashion for the correct median to be calculated. Hence, we very tediously and very meticulously input the values into the correct order with the arrange and match functions and used summarise again to see if it worked.

As an aside, we could not use more efficient methods such as desc, which would arrange it in descending order, due to the non-numeric nature of the values.

pcp %>% 
  select(income) %>% 
  arrange(match(income, c(
    "Less than $10,000", 
    "$10,000 to $20,000",
    "$20,000 to $30,000",
    "$30,000 to $40,000",
    "$40,000 to $50,000",
    "$50,000 to $60,000",
    "$60,000 to $80,000",
    "$80,000 to $100,000",
    "$100,000 to $120,000",
    "$120,000 to $140,000",
    "$140,000 to $160,000",
    "$160,000 to $180,000",
    "$180,000 to $200,000",
    "$200,000 to $220,000",
    "$220,000 to $250,000",
    "Greater than $250,000"
  ))) %>% 
  filter(income != "Decline to answer") %>% 
  summarise(median = median(income))
##               median
## 1 $30,000 to $40,000

After this largely unsuccessful and onerous venture, we asked for help and realised that we needed to manipulate the data a bit more to find the median.

As the income values were presented as ranges, it would be easiest to find the median of the midpoints and from there, the median household income can be deduced.

However, as seen below, the income values not only contain non-numeric values such as “Less,” “than” and “Greater” but they also have “$” signs and commas which need to be removed before we can calculate the median of the midpoint.

unique(pcp$income)  # the unique function shows unique values within a variable 
##  [1] "Less than $10,000"     "$220,000 to $250,000"  "Decline to answer"    
##  [4] "$20,000 to $30,000"    "$100,000 to $120,000"  "$80,000 to $100,000"  
##  [7] "$30,000 to $40,000"    "$60,000 to $80,000"    "$40,000 to $50,000"   
## [10] "$160,000 to $180,000"  "$10,000 to $20,000"    "$50,000 to $60,000"   
## [13] "$120,000 to $140,000"  "$140,000 to $160,000"  "$200,000 to $220,000" 
## [16] "Greater than $250,000" "$180,000 to $200,000"

To eliminate the non-numeric values, we can capitalise on the values consisting of three chunks (e.g. “Less than $10,000”) and using separate, we can divide each value into three columns: low, to and high.

med_inc <- pcp %>%
  select(income) %>% 
  filter(income != "Decline to answer") %>% 
  separate(income, c("low", "to", "high"), sep = " ")

glimpse(med_inc)
## Rows: 923
## Columns: 3
## $ low  <chr> "Less", "Less", "$220,000", "$20,000", "$100,000", "$80,000", "$8~
## $ to   <chr> "than", "than", "to", "to", "to", "to", "to", "to", "to", "to", "~
## $ high <chr> "$10,000", "$10,000", "$250,000", "$30,000", "$120,000", "$100,00~

Since we don’t need the to column, we can just select low and high using select. Now, to get rid of the non-numeric values “Less” and “Greater” we will use mutate to add another column (low2) which will essentially be low with numeric replacements of “Less” and “Greater.” To achieve this, we can use the ifelse function to substitute “Less” with “0” and “Greater” with “275000” and if we select everything but low, we should now have a data frame with no non-numeric values.

med_inc <- med_inc %>% 
  select(low, high) %>%
  mutate(low2 = ifelse(low == "Less", 0,
                       ifelse(low == "Greater", 275000, low))) %>% 
  select(-low)

glimpse(med_inc)
## Rows: 923
## Columns: 2
## $ high <chr> "$10,000", "$10,000", "$250,000", "$30,000", "$120,000", "$100,00~
## $ low2 <chr> "0", "0", "$220,000", "$20,000", "$100,000", "$80,000", "$80,000"~

However, before we can calculate the midpoint, we need to remove the “$” sign and commas in high and low and this can be accomplished through the gsub function. The as.numeric function can then be used to make the values of both variables a numeric value rather than a character value.

med_inc$high <- gsub('[$]','',med_inc$high) 

med_inc$low2 <- gsub('[$]','',med_inc$low2)

med_inc$high <- as.numeric(gsub(',','',med_inc$high))

med_inc$low2 <- as.numeric(gsub(',','',med_inc$low2))

glimpse(med_inc)
## Rows: 923
## Columns: 2
## $ high <dbl> 10000, 10000, 250000, 30000, 120000, 100000, 100000, 40000, 80000~
## $ low2 <dbl> 0, 0, 220000, 20000, 100000, 80000, 80000, 30000, 60000, 40000, 2~

Finally, we can create a new variable using mutate which calculates the midpoint and subsequently, the summarise function can be used again to obtain the median of this midpoint.

med_inc %>%
  mutate(midpoint = (high+low2)/2) %>%
  summarise(med_income = median(midpoint))
##   med_income
## 1      55000

Hence, the median household income is 50,000 to 60,000 as reported in the paper.

Summary Statistics

The statistics to be reproduced can be found on pages 1376-1378 and consisted of the following:

  • Time horizons
    • Future Time Perspective (FTP) Scale
      • Overall FTP score (M = 4.09, SD = 1.35)
      • Extension (M = 4.02, SD = 1.91)
      • Opportunity (M = 4.73, SD = 1.62)
      • Constraint (M = 3.74, SD = 1.76)
  • Perceived risk
    • Risk to self (M = 2.36, SD = 1.11)
    • Risk of complications (M = 2.45, SD = 1.34)
    • Risk to general population (M = 2.97, SD = 0.94)
  • Effect on employment (M = 1.41, SD = 1.46)
  • Subjective health (M = 3.36, SD = 0.99)
  • Personality
    • Extraversion (M = 3.44, SD = 1.65)
    • Conscientiousness (M = 5.23, SD = 1.33)
  • Frequency of positive emotions (M = 1.97, SD = 0.56)
  • Frequency of negative emotions (M = 1.42, SD = 0.66)
  • Intensity of positive emotions (M = 1.92, SD = 0.58)
  • Intensity of negative emotions (M = 1.77, SD = 0.70)

Whilst there exists many of them, reproducing the summary statistics was a relatively easy feat through the use of mean and sd functions to calculate the mean and standard deviations respectively.

Any N/A values were omitted using “na.rm = TRUE.”

# Time Horizons 

## FPT Average 

mean(covid$FTP_av, na.rm = TRUE)
## [1] 4.094168
sd(covid$FTP_av, na.rm = TRUE)
## [1] 1.352868
## FPT Extension 

mean(covid$FTP_Ext, na.rm = TRUE)
## [1] 4.018028
sd(covid$FTP_Ext, na.rm = TRUE)
## [1] 1.906157
## FPT Opportunity 

mean(covid$FTP_Opp, na.rm = TRUE)
## [1] 4.727466
sd(covid$FTP_Opp, na.rm = TRUE)
## [1] 1.618943
## FPT Constraint 

mean(covid$FTP_Con, na.rm = TRUE)
## [1] 3.740191
sd(covid$FTP_Con, na.rm = TRUE)
## [1] 1.755594
# Perceived Risk

## Own risk of contraction

mean(covid$risk_self, na.rm = TRUE)
## [1] 2.362963
sd(covid$risk_self, na.rm = TRUE)
## [1] 1.113526
## Risk given current health

mean(covid$risk_comp, na.rm = TRUE)
## [1] 2.447619
sd(covid$risk_comp, na.rm = TRUE)
## [1] 1.343399
## GP risk  

mean(covid$risk_pop, na.rm = TRUE)
## [1] 2.965079
sd(covid$risk_pop, na.rm = TRUE)
## [1] 0.9387196
# Effect on employment 

mean(covid$emp_affect, na.rm = TRUE)
## [1] 1.412698
sd(covid$emp_affect, na.rm = TRUE)
## [1] 1.45521
# Subjective health 

mean(covid$health, na.rm = TRUE)
## [1] 3.357672
sd(covid$health, na.rm = TRUE)
## [1] 0.9856528
# Extraversion

mean(covid$Extraversion, na.rm = TRUE)
## [1] 3.440212
sd(covid$Extraversion, na.rm = TRUE)
## [1] 1.64625
# Conscientiousness

mean(covid$Conscientiousness, na.rm = TRUE)
## [1] 5.224868
sd(covid$Conscientiousness, na.rm = TRUE)
## [1] 1.327658
# Positive Emotions - Frequency

mean(covid$avg_f_pos, na.rm = TRUE)
## [1] 1.972079
sd(covid$avg_f_pos, na.rm = TRUE)
## [1] 0.5627299
# Negative Emotions - Frequency

mean(covid$avg_f_neg, na.rm = TRUE)
## [1] 1.41762
sd(covid$avg_f_neg, na.rm = TRUE)
## [1] 0.6613335
# Positive Emotions - Intensity

mean(covid$avg_i_pos, na.rm = TRUE)
## [1] 1.929739
sd(covid$avg_i_pos, na.rm = TRUE)
## [1] 0.5880139
# Negative Emotions - Intensity 

mean(covid$avg_i_neg, na.rm = TRUE)
## [1] 1.771569
sd(covid$avg_i_neg, na.rm = TRUE)
## [1] 0.6998724

However, this code is very long so more efficient alternatives were explored. This led to the discovery and use of the psych package which allowed for quick and easy statistical calculation and presentation. More specifically, the describe function was used to provide the means and standard deviations (along with other statistics such as standard error) in only one line of code.

# load packages 

library(psych)

# select relevant variables 

sum_stat <- covid %>% 
  select(FTP_av, FTP_Ext, FTP_Opp, FTP_Con, 
         risk_self, risk_comp, risk_pop, emp_affect,
         health, Extraversion, Conscientiousness, avg_f_pos,
         avg_f_neg, avg_i_pos, avg_i_neg)

# get tabulated summary stats (excluding irrelevant variables)

describe(sum_stat, na.rm = TRUE, interp=FALSE,
         skew = FALSE, ranges = FALSE,trim=.1,
         type=3,check=FALSE,fast=NULL,quant=NULL,
         IQR=FALSE,omit=TRUE,data=NULL)
##                   vars   n mean   sd   se
## FTP_av               1 943 4.09 1.35 0.04
## FTP_Ext              2 943 4.02 1.91 0.06
## FTP_Opp              3 943 4.73 1.62 0.05
## FTP_Con              4 943 3.74 1.76 0.06
## risk_self            5 945 2.36 1.11 0.04
## risk_comp            6 945 2.45 1.34 0.04
## risk_pop             7 945 2.97 0.94 0.03
## emp_affect           8 945 1.41 1.46 0.05
## health               9 945 3.36 0.99 0.03
## Extraversion        10 945 3.44 1.65 0.05
## Conscientiousness   11 945 5.22 1.33 0.04
## avg_f_pos           12 945 1.97 0.56 0.02
## avg_f_neg           13 945 1.42 0.66 0.02
## avg_i_pos           14 945 1.93 0.59 0.02
## avg_i_neg           15 942 1.77 0.70 0.02

Table 1

The next reproducibility goal was Table 1 from page 1378 of the study.

First, we extracted the relevant variables (positive and negative emotion frequencies) into a new data frame emot from the full data set covid using select.

emot <- covid %>%
  select(f_calm, f_qui, f_app, f_int, f_cont, f_hap, f_rela, f_pea,
         f_ener, f_aff, f_amu, f_acc, f_joy, f_pro, f_reli, f_exc,
         f_conc, f_anx, f_bor, f_fru, f_irr, f_sad, f_lon, f_fear,
         f_ang, f_dis, f_gui, f_emb, f_sha)

glimpse(emot)
## Rows: 945
## Columns: 29
## $ f_calm <int> 3, 2, 3, 3, 3, 2, 4, 0, 2, 4, 3, 2, 1, 2, 2, 4, NA, 0, 3, 2, 3,~
## $ f_qui  <int> 2, 3, 4, 2, 2, 1, 2, 2, 2, 3, 3, 4, 3, 2, 3, 2, 1, 3, 4, 2, 1, ~
## $ f_app  <int> 1, 3, 3, 1, 2, 2, 2, NA, 2, NA, 2, 1, 2, 3, 3, 4, 3, 0, 2, 1, 3~
## $ f_int  <int> 2, NA, 3, 2, 3, 2, 3, 1, 2, 2, 2, NA, 3, 2, 2, 2, 1, NA, 2, 2, ~
## $ f_cont <int> 2, 1, 2, 0, 3, 1, 3, 1, 1, NA, 3, 1, 2, 3, 2, 4, 2, NA, 3, 3, 3~
## $ f_hap  <int> 2, NA, 3, 2, 3, 2, 3, 1, 2, 2, 2, 2, 2, 2, 2, 3, 2, 1, 2, 2, 4,~
## $ f_rela <int> 3, 1, 1, 3, 2, 1, 3, NA, 1, 4, 2, 1, NA, 2, 2, 3, 1, NA, 3, 2, ~
## $ f_pea  <int> 2, 2, 2, 4, 3, 1, 3, 1, 2, 3, 2, 1, 1, 2, 2, 3, NA, 0, 3, 2, 3,~
## $ f_ener <int> 1, 1, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 2, 2, 1, 3, 1, 3, ~
## $ f_aff  <int> 1, 3, 2, 2, 2, 1, 1, 1, 1, 1, 2, 3, 2, 3, 2, 2, 3, 2, 2, 2, 3, ~
## $ f_amu  <int> 2, 0, 4, 3, 3, 2, 4, 0, 2, 2, 2, 3, 2, 3, 1, 2, 1, 1, 3, 2, 3, ~
## $ f_acc  <int> 1, NA, 3, 2, 2, NA, 1, 0, 0, 1, 2, 1, 2, 2, 2, 1, 1, 1, 1, 1, 3~
## $ f_joy  <int> 1, 0, 4, 3, 3, 2, 3, 0, 2, 2, 2, 2, 2, 2, 1, 3, 1, 1, 2, 1, 3, ~
## $ f_pro  <int> 2, 0, 2, 2, 1, NA, 1, NA, 2, 1, 1, 2, 2, 2, 0, 2, 2, 0, 2, 2, 3~
## $ f_reli <int> 1, NA, 1, 1, 1, NA, 2, NA, 2, 2, 1, 1, 2, 2, 1, 2, 2, 1, 3, 2, ~
## $ f_exc  <int> 3, NA, 1, 3, 2, 1, 2, 1, 2, 2, 2, 1, 2, 2, 0, 2, 1, 0, 2, 1, 3,~
## $ f_conc <int> 2, 3, 1, 2, 2, 2, 1, 3, 2, 3, 2, 2, 3, 1, 1, 1, 2, 2, 1, 1, 2, ~
## $ f_anx  <int> 1, 4, 3, 2, 1, 3, 1, 4, 2, 1, 2, 4, 3, 2, 2, 1, 2, 2, 1, 2, 2, ~
## $ f_bor  <int> 3, 3, 2, 4, 2, 3, 2, 4, 4, 1, 2, 4, 2, 3, 4, 2, 4, 3, 2, 3, 2, ~
## $ f_fru  <int> 0, 2, 2, 1, 1, 2, 1, 3, 3, 0, 1, 4, 2, 1, 2, 1, 3, 3, 1, 2, 1, ~
## $ f_irr  <int> 1, 0, 2, 2, 1, 2, 1, 2, 3, 0, 2, 4, 2, 1, 3, 1, 2, 2, 1, 1, 1, ~
## $ f_sad  <int> 3, 3, 0, 1, 1, 2, 1, 3, 2, NA, 2, 3, 2, 2, 2, NA, 3, 2, 1, 1, 1~
## $ f_lon  <int> 4, 4, 0, 4, 1, 3, 4, 1, 3, 1, 2, 4, NA, 2, 1, 1, 1, 2, 2, 2, 1,~
## $ f_fear <int> 0, 2, 1, 0, 1, 2, 0, 3, 2, NA, 1, 2, 2, 1, 0, NA, 2, 1, NA, 0, ~
## $ f_ang  <int> 0, 0, 1, 1, 1, 2, 0, 2, 3, 0, 2, 3, 1, 1, 1, 0, 2, 2, 2, 2, 2, ~
## $ f_dis  <int> 1, 3, 0, 1, NA, NA, 1, 3, 2, 0, 1, 3, 2, 1, 0, 0, 2, 2, NA, 1, ~
## $ f_gui  <int> 1, 3, 1, 1, 0, NA, 0, 1, 1, 1, 1, 3, 2, 0, 0, 1, 0, 2, 1, 1, 1,~
## $ f_emb  <int> 0, 2, 0, NA, 0, 1, 0, 1, 1, 0, 1, 3, 1, 0, 0, 1, 0, 0, NA, 2, 1~
## $ f_sha  <int> NA, 1, 0, 1, NA, 2, 0, 3, 1, 1, 1, 2, 2, 0, 0, 1, 0, 0, NA, 0, ~

Since we needed the mean and standard deviations for each variable, we used summarise and across to obtain these statistics across the columns whilst exluding NA values.

emot <- emot %>%
  summarise(
    across(.cols = everything(), na.rm = TRUE, list(M = mean, SD = sd)))

glimpse(emot)
## Rows: 1
## Columns: 58
## $ f_calm_M  <dbl> 2.444929
## $ f_calm_SD <dbl> 0.8656906
## $ f_qui_M   <dbl> 2.432139
## $ f_qui_SD  <dbl> 0.8747724
## $ f_app_M   <dbl> 2.404788
## $ f_app_SD  <dbl> 0.933181
## $ f_int_M   <dbl> 2.279045
## $ f_int_SD  <dbl> 0.8270466
## $ f_cont_M  <dbl> 2.151885
## $ f_cont_SD <dbl> 0.9388588
## $ f_hap_M   <dbl> 2.134133
## $ f_hap_SD  <dbl> 0.8042611
## $ f_rela_M  <dbl> 2.129103
## $ f_rela_SD <dbl> 0.8860399
## $ f_pea_M   <dbl> 2.049342
## $ f_pea_SD  <dbl> 0.9457163
## $ f_ener_M  <dbl> 1.89819
## $ f_ener_SD <dbl> 0.8040365
## $ f_aff_M   <dbl> 1.888018
## $ f_aff_SD  <dbl> 0.8607027
## $ f_amu_M   <dbl> 1.871287
## $ f_amu_SD  <dbl> 0.7194135
## $ f_acc_M   <dbl> 1.83557
## $ f_acc_SD  <dbl> 0.8667226
## $ f_joy_M   <dbl> 1.705069
## $ f_joy_SD  <dbl> 0.8975795
## $ f_pro_M   <dbl> 1.667053
## $ f_pro_SD  <dbl> 0.9719912
## $ f_reli_M  <dbl> 1.478114
## $ f_reli_SD <dbl> 0.8819814
## $ f_exc_M   <dbl> 1.463719
## $ f_exc_SD  <dbl> 0.7853215
## $ f_conc_M  <dbl> 2.231602
## $ f_conc_SD <dbl> 0.9064816
## $ f_anx_M   <dbl> 2.004376
## $ f_anx_SD  <dbl> 1.064195
## $ f_bor_M   <dbl> 1.875986
## $ f_bor_SD  <dbl> 1.119476
## $ f_fru_M   <dbl> 1.845645
## $ f_fru_SD  <dbl> 0.9257121
## $ f_irr_M   <dbl> 1.754464
## $ f_irr_SD  <dbl> 0.894416
## $ f_sad_M   <dbl> 1.693002
## $ f_sad_SD  <dbl> 0.9652122
## $ f_lon_M   <dbl> 1.545249
## $ f_lon_SD  <dbl> 1.253394
## $ f_fear_M  <dbl> 1.375291
## $ f_fear_SD <dbl> 1.055715
## $ f_ang_M   <dbl> 1.347477
## $ f_ang_SD  <dbl> 0.8924447
## $ f_dis_M   <dbl> 1.160333
## $ f_dis_SD  <dbl> 0.9870476
## $ f_gui_M   <dbl> 0.6339795
## $ f_gui_SD  <dbl> 0.8742299
## $ f_emb_M   <dbl> 0.5146886
## $ f_emb_SD  <dbl> 0.7565917
## $ f_sha_M   <dbl> 0.444186
## $ f_sha_SD  <dbl> 0.7585914

The variable names were quite distasteful and unhelpful so renaming them with rename cleaned them up significantly.

emot <- emot %>%
  rename(Calm = f_calm_M, Quiet = f_qui_M, Appreciative = f_app_M,
         Interested = f_int_M, Content = f_cont_M, Happy = f_hap_M,
         Relaxed = f_rela_M, Peaceful = f_pea_M, Energetic = f_ener_M,
         Affectionate = f_aff_M, Amused = f_amu_M, Accomplished = f_acc_M,
         Joyful = f_joy_M, Proud = f_pro_M, Relieved = f_reli_M,
         Excited = f_exc_M, Concerned = f_conc_M, "Anxious/worried" = f_anx_M,
         Bored = f_bor_M, Frustrated = f_fru_M, Irritated = f_irr_M,
         Sad = f_sad_M, Lonely = f_lon_M, Fearful = f_fear_M,
         Angry = f_ang_M, Disgusted = f_dis_M, Guilty = f_gui_M,
         Embarrassed = f_emb_M, Ashamed = f_sha_M)

To make the data frame more suitable for table-making, we used the pivot_longer function twice; once to place all the emotions into one column and means into another, and once more to flip the standard deviations into its own column. These two separate data frames are then combined into one through bind_cols.

emot_pivot1 <- emot %>%
  select(-starts_with("f_")) %>%
  pivot_longer(cols = everything(), 
               names_to = "Valence and emotion",
               values_to = "M")

emot_pivot2 <- emot %>%
  select(starts_with("f_")) %>%
  pivot_longer(cols = everything(),
               names_to = NULL,
               values_to = "SD")

emot_table <- bind_cols(emot_pivot1, emot_pivot2)

Going the extra mile, we decided to also reproduce the confidence interval limits through calculating the standard error by inputting the formula into R.

The confidence intervals themselves can be found by using mutate to create two new variables, lower_CI and upper_CI, and again inputting the calculation using the quantile function qt.

n <- 945; se <- emot_table$SD / sqrt(n)

emot_table <- emot_table %>%
  mutate(lower_CI = M - qt(1 - (0.05 / 2), n - 1) * se,
         upper_CI = M + qt(1 - (0.05 / 2), n - 1) * se)

We then rounded the CIs, means and SDs to 2 decimal places with format and round, combined the upper_CI and lower_CI into one column using unite and added square brackets with mutate and paste0.

emot_table$lower_CI <- format(round(emot_table$lower_CI, 2), nsmall = 2)
emot_table$upper_CI <- format(round(emot_table$upper_CI, 2), nsmall = 2)

emot_table$M <- format(round(emot_table$M, 2), nsmall = 2)
emot_table$SD <- format(round(emot_table$SD, 2), nsmall = 2)

emot_table <- emot_table %>%
  unite(col = "95% CI", 
        lower_CI:upper_CI,
        sep = " ") %>%
  mutate("95% CI" = paste0("[", `95% CI`, "]"))

The function apa_table from the papaja package allowed for the creation of an apa-style table with sections outlining positive and negative emotions once the original emot_table was separated into the appropriate groups using slice. (NOTE: proper apa formatting of the table can only be produced when knitted into a pdf or word file)

pos_emot_table <- emot_table %>%
  slice(1:16)

neg_emot_table <- emot_table %>%
  slice(17:29)

apa_table(list('Positive emotions' = pos_emot_table,
               'Negative emotions' = neg_emot_table), 
          caption = "Mean Frequencies of Emotions",
          note = "N = 945. CI = 95% confidence interval",
          align = c('l', 'c', 'c', 'c'),
          font_size = "scriptsize")
(#tab:unnamed-chunk-23)
Mean Frequencies of Emotions
Valence and emotion M SD 95% CI
Positive emotions
   Calm 2.44 0.87 [2.39 2.50]
   Quiet 2.43 0.87 [2.38 2.49]
   Appreciative 2.40 0.93 [2.35 2.46]
   Interested 2.28 0.83 [2.23 2.33]
   Content 2.15 0.94 [2.09 2.21]
   Happy 2.13 0.80 [2.08 2.19]
   Relaxed 2.13 0.89 [2.07 2.19]
   Peaceful 2.05 0.95 [1.99 2.11]
   Energetic 1.90 0.80 [1.85 1.95]
   Affectionate 1.89 0.86 [1.83 1.94]
   Amused 1.87 0.72 [1.83 1.92]
   Accomplished 1.84 0.87 [1.78 1.89]
   Joyful 1.71 0.90 [1.65 1.76]
   Proud 1.67 0.97 [1.61 1.73]
   Relieved 1.48 0.88 [1.42 1.53]
   Excited 1.46 0.79 [1.41 1.51]
Negative emotions
   Concerned 2.23 0.91 [2.17 2.29]
   Anxious/worried 2.00 1.06 [1.94 2.07]
   Bored 1.88 1.12 [1.80 1.95]
   Frustrated 1.85 0.93 [1.79 1.90]
   Irritated 1.75 0.89 [1.70 1.81]
   Sad 1.69 0.97 [1.63 1.75]
   Lonely 1.55 1.25 [1.47 1.63]
   Fearful 1.38 1.06 [1.31 1.44]
   Angry 1.35 0.89 [1.29 1.40]
   Disgusted 1.16 0.99 [1.10 1.22]
   Guilty 0.63 0.87 [0.58 0.69]
   Embarrassed 0.51 0.76 [0.47 0.56]
   Ashamed 0.44 0.76 [0.40 0.49]

Note. N = 945. CI = 95% confidence interval

 

Figure 1

Our final reproducibility goal was to reproduce Figure 1 on page 1380 of the study.

Initially, we thought it best to reproduce each plot independently before merging it together to produce the goal above.

For each plot, ggplot from the tidyverse packages coupled with geom_smooth and geom_point allow for plotting and linear regression lines of the intensity and frequency of positive and negative emotions across age. Some aesthetic changes were made through the theme function to remove certain axis titles and texts so that the graph could be easily accommodated when merged together.

# Frequency of Positive Emotions

pos_f <- ggplot(covid) +
  geom_point(
    alpha=1/18,
    mapping = aes(
      x = age, 
      y = avg_f_pos
    )
  ) +
  ylim(0, 2.5) +
  theme_classic() +
  theme(
    panel.background = element_rect(colour = "black", size=1/40),
    axis.title.x = element_blank(),
    axis.text.x = element_blank(),
    axis.text.y = element_blank()
  ) +
  geom_smooth(
    mapping = aes(
      x = age, 
      y = avg_f_pos
    ), method = "lm", colour = 'black') +
  ggtitle("      Positive Emotions") + xlab(" ") + ylab(" ")


# Frequency of Negative Emotions

neg_f <- ggplot(covid) +
  geom_point(
    alpha=1/18,
    mapping = aes(
      x = age, 
      y = avg_f_neg
    )
  ) +
  ylim(0, 2.5) +
  theme_classic() +
  theme(
    panel.background = element_rect(colour = "black", size=1/40),
    axis.title.x = element_blank(),
    axis.text.x = element_blank()
  )+
  geom_smooth(
    mapping = aes(
      x = age, 
      y = avg_f_neg
    ), method = "lm", colour = 'black') +
  ggtitle("   Negative Emotions") + 
  xlab("") + 
  ylab("Frequency of \n Emotional Experience")

# Intensity of Positive Emotions

pos_i <- ggplot(covid) +
  geom_point(
    alpha=1/18,
    mapping = aes(
      x = age, 
      y = avg_i_pos
    )
  ) +
  ylim(0, 2.5) +
  theme_classic() +
  theme(
    panel.background = element_rect(colour = "black", size=1/40),
    axis.text.y = element_blank(),
    axis.title.x = element_blank()
  ) +
  geom_smooth(
    mapping = aes(
      x = age, 
      y = avg_i_pos
    ), method = "lm", colour = 'black') +
  ggtitle(" ") + xlab("Age(Years)") + ylab(" ")

# Intensity of Negative Emotions

neg_i <- ggplot(covid) +
  geom_point(
    alpha=1/18,
    mapping = aes(
      x = age, 
      y = avg_i_neg
    )
  ) +
  ylim(0, 2.5) +
  theme_classic() +
  theme(
    panel.background = element_rect(colour = "black", size=1/40),
    axis.title.x = element_blank())+
  geom_smooth(
    mapping = aes(
      x = age, 
      y = avg_i_neg
    ), method = "lm", colour = 'black') +
  ggtitle(" ") + 
  xlab("Age(Years)") + 
  ylab("Intensity of \n Emotional Experience")

To combine these plots, the gridExtra package offers a grid.arrange function that achieves this goal.

grid.arrange(neg_f, pos_f, neg_i, pos_i) 

As a preliminary start, it was very good. However, there were some areas of the above graph that required correction:

  • Spacing between graphs
  • Scale lines for inner boxes should be removed
  • X-axis should be labelled
  • y-axis labelling needs to be cleaned up
  • Uncentered headings

And due to the inflexibility of grid.arrange, these problems were difficult to rectify.

Additionally, the code was extremely long and repetitive so it would be reasonable to state that a more efficient and effective method was likely to exist.

Therefore, we decided to instead manipulate the data set until it was able to be graphed easily using a single ggplot code chunk.

This involved the use of pivot_longer to move the intensities and frequencies of the positive and negative emotions into one column and the values into another. Following a similar process to the median household income calculation, the “avg” and “i”/“f” were removed from the value names using separate to divide them into individual columns and selecting the relevant variables with select.

plot <- "Covid_Data.csv" %>% 
  read_csv() %>% 
  select(age, avg_i_pos, avg_i_neg, avg_f_pos, avg_f_neg) %>% 
  na.omit(argh) %>% 
  pivot_longer(
    cols = -age,
    names_to = "fivalence",
    values_to = "means") %>% 
  separate(fivalence, c("average", "freqint", "valence"), sep = "_") %>% 
  select(-average)

glimpse(plot)
## Rows: 3,768
## Columns: 4
## $ age     <dbl> 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18, 18~
## $ freqint <chr> "i", "i", "f", "f", "i", "i", "f", "f", "i", "i", "f", "f", "i~
## $ valence <chr> "pos", "neg", "pos", "neg", "pos", "neg", "pos", "neg", "pos",~
## $ means   <dbl> 2.250000, 2.333333, 1.812500, 1.333333, 1.153846, 2.363636, 1.~

Cleaning up the value names for freqint and valence, we used mutate with ifelse to create a new variable with correct labels before selecting the final set of variables.

plot <- plot %>%  
  mutate(freqint2 = ifelse(freqint == "i", "Intensity of \n Emotional Experience",
                           ifelse(freqint == "f", "Frequency of \n Emotional Experience", freqint))) %>% 
  mutate(valence2 = ifelse(valence == "pos", "Positive Emotions", 
                           ifelse(valence == "neg", "Negative Emotions", valence))) %>% 
  select(valence2, age, freqint2, means)

The resultant code was mostly reused from the previous individual graph code (e.g. theme, geom_point and geom_smooth to create the scatterplot with a removed background) but with the notable addition of facet_grid from the ggplot2 package which allows the plotting of multiple grid graphs grouped across chosen variables.

In this case, we needed to place the frequency and intensity on the y-axis and the emotions on the x-axis so we inputted the freqint and valence and, since this defaults to putting the y-axis variable on the right, we switched it over to the left through ‘switch = “y”.’ Axes were labelled using xlab and ylab for x- and y-axes respectively.

ggplot(plot, aes(x = age, y = means)) +
  geom_point(alpha = 1/18) +
  ylim(0, 2.5) +
  theme_classic() +
  theme(panel.background = element_rect(colour = "black",
                                        size = 1/40),
        strip.text.x = element_text(size = 10, face = "bold"),
        strip.text.y = element_text(size = 10, face = "bold"),
        strip.placement = "outside",
        strip.background = element_blank(),
        axis.title.x = element_text(face = "bold")) +
  geom_smooth(method = "lm", colour = "black", size = 1.5) +
  facet_grid(freqint2 ~ valence2, switch = "y") +
  xlab("Age (Years)") +   
  ylab("")

As seen above, this approach managed to address many of the issues present within the first attempt but there were still two areas that required correction:

  • Incorrect regression lines (didn’t match the ones in the study)
  • Minor aesthetic changes with axis title fonts and formatting

To fix these problems, we used the coord_cartesian function as opposed to ylim alone, which we identified to be the cause behind the misaligned regression lines, since coord_cartersian acts to zoom in on the plot without changing the underlying data. Regarding aesthetic changes, we changed the font to “Arial Narrow” through the extrafont package which allows fonts currently installed on the computer to be imported and used in R.

ggplot(plot, aes(x = age, y = means)) +
  geom_point(alpha = 1/13, size = 2) +
  coord_cartesian(ylim = c(0,2.5)) +
  theme_bw() +
  theme(axis.text = element_text(family="Arial Narrow", size = 12)) +
  theme(panel.grid.major = element_blank(), 
        panel.grid.minor = element_blank()) +
  theme(panel.background = element_rect(colour = "black",
                                        size = 1/40),
        strip.text.x = element_text(family="Arial Narrow", 
                                    size = 14, face = "bold"),
        strip.text.y = element_text(family="Arial Narrow", 
                                    size = 13, face = "bold"),
        strip.placement = "outside",
        strip.background = element_blank(),
        axis.title.x = element_text(family="Arial Narrow", 
                                    size = 14, face = "bold")) +
  geom_smooth(method = "lm", colour = "black", size = 1.5) +
  facet_grid(freqint2 ~ valence2, switch = "y") +
  xlab("Age (Years)") +
  ylab("")

Exploration

Does Education Affect an Individual’s Perception of Risk?

Whilst the authors examined possible correlations between age and perceived risk, they neglected to analyse variations in perceived risk across different levels of education. This would be an interesting area of investigation as it could be posited that higher levels of education are commensurate with more informed perspectives of the pandemic threat and therefore influence ratings of perceived risk, with higher or lower perceived scores indicative of an overreaction or underreaction. Especially since age is generally proportional to level of education, the data could suggest a moderating influence of education on perceived risk.

The first step in analysing this would be to create a relevant data frame comprised of education, risk to self, risk of complications, and risk to the population using the select function. For the sake of clarity, I’m also going to use rename to change the risk variable names to more appropriate and understandable labels.

exp <- covid %>% 
  select(educ, risk_self, risk_pop, risk_comp) %>% 
  rename(Self = "risk_self",
         Population = "risk_pop",
         Complications = "risk_comp")

As I noticed the value names for the educ variable were quite lengthy, I took the liberty of splitting them up into separate lines by creating a new variable educ2 with mutate and applying ifelse to change each value. The educ variable now becomes redundant so I removed it with select.

exp <- exp %>% 
  mutate(educ2 = ifelse(educ == "Completed 4-year college (BA, BS)",
                        "Completed 4-year college \n (BA, BS)",
                        ifelse(educ == "Completed graduate or professional degree",
                               "Completed graduate or \n professional degree",
                               ifelse(educ == "Graduated high school (or GED)",
                                      "Graduated high school \n (or GED)",
                                      ifelse(educ == "Some college or technical school",
                                             "Some college or \n technical school",
                                             educ))))) %>% 
  select(-educ)

I thought a column graph would be the most appropriate way to present the data so I used ggplot and more specifically, geom_bar, to create the plot and inputting for mean values and for the bars to be placed alongside each other as opposed to the default ‘stacked’ option. To reduce code and pivot the data quickly, I chose to use gather instead of pivot_longer as they share the same purpose but the former is much easier to pipe in this format.

ggplot(exp %>% gather(Risk, Mean, c(Complications, Population, Self)), 
             aes(x = educ2, y = Mean, fill = Risk)) +
  geom_bar(stat = "summary", fun = "mean", position = "dodge") +
  xlab("Level of Education")

Whilst I thought this was a relatively good depiction of the data, I endeavoured to find a better way of visualising it as well as a way to add error bars for additional information. I also noticed that the levels of education were not ordered properly so that was something that needed attention.

To achieve these goals, I first addressed the disorganised education levels by using factor to manually reorder them into ascending order.

exp$educ2 <- factor(exp$educ2, 
                    levels = c("Less than high school", 
                           "Graduated high school \n (or GED)",
                           "Some college or \n technical school",
                           "Completed 4-year college \n (BA, BS)",
                           "Completed graduate or \n professional degree"))

I found that the easiest way to create error bars was to create two separate data frames: one containing the mean values for each risk type and the other containing the standard error values for each risk type, then merge them together into a single data frame using bind_cols.

For the first data frame, after grouping by education level through group_by, I used pivot_longer and summarise to collate all the types of risk into one variable and the respective mean values under another variable.

In the second data frame, I calculated the standard error for each risk type using summarise with the std.error function from the plotrix package, which provides numerous accessory functions for easier plotting and calculation. This was followed by pivot_longer to again house each risk type under one variable and standard errors under another.

Both data frames were then combined with bind_cols to form one data frame that was able to be graphed.

exp1 <- exp %>% 
  group_by(educ2) %>% 
  summarise(Complications = mean(Complications),
            Population = mean(Population),
            Self = mean(Self)) %>% 
  pivot_longer(
    cols = c(Complications, Population, Self),
    names_to = "Type",
    values_to = "means")

exp2 <- exp %>% 
  group_by(educ2) %>% 
  summarise(Complications = std.error(Complications),
            Population = std.error(Population),
            Self = std.error(Self)) %>% 
  pivot_longer(
    cols = -educ2,
    names_to = "Type",
    values_to = "se"
  ) %>% 
  select(se)

exp_full <- bind_cols(exp1, exp2)

glimpse(exp_full)
## Rows: 15
## Columns: 4
## $ educ2 <fct> "Less than high school", "Less than high school", "Less than hig~
## $ Type  <chr> "Complications", "Population", "Self", "Complications", "Populat~
## $ means <dbl> 2.600000, 3.100000, 1.700000, 2.113208, 3.056604, 2.066038, 2.50~
## $ se    <dbl> 0.52068331, 0.27688746, 0.39581140, 0.13559675, 0.11673161, 0.12~

This resultant data frame was then plotted with ggplot and the function geom_col was used instead of geom_bar as the latter, by default, plots the frequencies of an inputted x-axis variable, but the former plots specified x- and y-axis variables. Through geom_errorbar, I was able to add the error bars and, for further clarity, I divided each risk type into its own section with facet_wrap as I noted that in the first attempt, it was a little difficult to gauge education level on particular domains of perceived risk and was instead focusing on differences in risk domains for each level of education.

Some aesthetic changes were made as well using theme, where I reused some code from the reproduced figure in the study, and I removed unnecessary white space between the bars and the x-axis through scale_y_continuous.

ggplot(exp_full, aes(x = educ2, y = means, fill = educ2)) +
  facet_wrap(vars(Type)) +
  geom_col(position = "dodge") +
  geom_errorbar(aes(ymin = means-se, ymax = means+se), 
                width = 0.5, 
                position = "dodge") +
  scale_x_discrete(labels = NULL) +
  theme_minimal() +
  scale_fill_discrete(name = "Level of Education") +
  theme(strip.text.x = element_text(family="Arial Narrow", 
                                    size = 14, face = "bold"),
        strip.text.y = element_text(family="Arial Narrow", 
                                    size = 13, face = "bold"),
        axis.title.y = element_text(family="Arial Narrow", 
                                    size = 14, face = "bold")) +
  scale_y_continuous(expand = c(0,0), limits = c(0, 3.5)) +
  xlab("") +
  ylab("Mean Perceived Risk") +
  ggtitle("Level of Education on Perceived Risk Domains")

Interestingly enough, education did not affect an individual’s perception of the population’s risk whilst there was an observable trend for the risk to self, perhaps suggesting that generally most people consider others more prone to contracting the virus and that those with less education were more likely to underweight the threat posed by the pandemic. Alternatively, it could be the case that people with higher levels of education were more likely to overweight their risk of catching COVID but regardless, education appears to be a strong determinant in their assessment of self-risk.

Similar to the risk to the population, an individual’s perceived risk of developing complications from COVID seems to arise independently of education level, with a seemingly inexplicable decline in risk rating for those who have graduated high school or obtained a GED.

The large error bars for individuals with less than high school education could be explained by the unrepresentative sample, with the 88% figure of those who have a minimum college education deviating significantly from the 60% of the US population who received the same amount of education.

Therefore, education does affect a person’s perception of risk but mainly for risk to themselves and not risk to the population or risk of developing complications.

Do Any Personality Dimensions Correlate with Perceptions of Risk?

Another variable which the researchers did not explore in sufficient depth was personality traits in influencing perceived risk scores. Again, age has been shown to correlate with some personality dimensions, with increases in agreeableness, conscientiousness and emotional stability over time (McCrae et al. 1999). As the study reports age-related advantages in emotional wellbeing, it is expected for there to be a relatively significant correlation between emotional stability and perceived risk, but it would be interesting to examine whether other personality traits also correlated with an individual’s assessment of risk.

To investigate this area of inquiry, I first created a data frame including the personality dimensions, age and types of risk then calculated correlations with cor.

This was subsequently plotted using the corrplot function as part of the corrplot package which enables the creation of very simple and modifiable correlation plots.

# selecting relevant data and calculating correlations
interesting <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp, age) %>% 
  cor()

# plotting correlations
corrplot(interesting, type = "lower", 
         addCoef.col = "black", method = "color", diag = FALSE)

There seem to be quite substantial correlations between age and conscientiousness (r = .24), agreeableness (r = .3) and emotional stability (r = .26) and the significance of these correlations can be further explored using cor.test.

# creating another data set with same variables but no correlation calculation

var <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp, age)

# Age and Conscientiousness

cor.test(var$Conscientiousness, var$age)
## 
##  Pearson's product-moment correlation
## 
## data:  var$Conscientiousness and var$age
## t = 7.5608, df = 943, p-value = 9.485e-14
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.1780158 0.2982986
## sample estimates:
##       cor 
## 0.2390741
# Age and Agreeableness

cor.test(var$Agreeableness, var$age)
## 
##  Pearson's product-moment correlation
## 
## data:  var$Agreeableness and var$age
## t = 9.5052, df = 943, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2363752 0.3528099
## sample estimates:
##       cor 
## 0.2956904
# Age and Emotional Stability

cor.test(var$Em_Stability, var$age)
## 
##  Pearson's product-moment correlation
## 
## data:  var$Em_Stability and var$age
## t = 8.3547, df = 943, p-value = 2.329e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.2021350 0.3209229
## sample estimates:
##       cor 
## 0.2625233

Hence, all these correlations are very significant (p < 0.001), consistent with McCrae et al. (1999).

Age was correlated heavily with perceived risk as reported in the study but surprisingly there were no substantial correlations between personality and perceived risk except for emotional stability. It is interesting to note that there only seems to be a significant relationship for perceived risk to the population (r = -.16, p < .001), indicating that the more emotionally stable people were, the less they viewed the population to be threatened by the virus with risk towards themselves or developing complications remaining relatively unaffected.

A possible explanation of this finding could be that emotionally unstable people harbour more cynical views of other people’s compliance and competence whilst those who are more emotionally stable have more optimistic outlooks regarding the COVID threat.

# Emotional stability and perceived risk of population 
cor.test(var$Em_Stability, var$risk_pop)
## 
##  Pearson's product-moment correlation
## 
## data:  var$Em_Stability and var$risk_pop
## t = -5.1225, df = 943, p-value = 3.656e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.2259399 -0.1018344
## sample estimates:
##        cor 
## -0.1645383

To supplement this result, I also conducted a linear regression analysis of the relationship between the various personality dimensions and average perceived risk through the use of lm.

In creating the new data frame, I selected all the relevant variables and averaged the three types of risk into a new variable by using mutate and inputting an average calculation.

The lm function then performed a linear regression analysis and produced the following output:

# creating the data frame

new <- covid %>% 
  select(Openness, Conscientiousness, Extraversion, Agreeableness, 
         Em_Stability, risk_self, risk_pop, risk_comp, age) %>% 
  mutate(avg_risk = (risk_self + risk_comp + risk_pop)/3) %>% 
  select(-starts_with("risk"))

# calculating the linear regression

reg <- lm(avg_risk ~ Openness + Conscientiousness + 
            Extraversion + Agreeableness + Em_Stability, new)

summary(reg)
## 
## Call:
## lm(formula = avg_risk ~ Openness + Conscientiousness + Extraversion + 
##     Agreeableness + Em_Stability, data = new)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.57099 -0.62359  0.02178  0.56369  2.59794 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.61375    0.17092  15.292  < 2e-16 ***
## Openness           0.02187    0.02531   0.864    0.388    
## Conscientiousness  0.03112    0.02412   1.290    0.197    
## Extraversion      -0.01380    0.01898  -0.727    0.468    
## Agreeableness      0.03418    0.02577   1.326    0.185    
## Em_Stability      -0.08872    0.02187  -4.056  5.4e-05 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8862 on 939 degrees of freedom
## Multiple R-squared:  0.01863,    Adjusted R-squared:  0.01341 
## F-statistic: 3.565 on 5 and 939 DF,  p-value: 0.003364

Therefore, there was a significant main effect of emotional stability on average perceived risk (p < .001) and a significant regression equation was found (F(5, 939) = 3.565, p < .01) with an R^2 of 0.01.

To conclude, an individual’s average perception of risk is only significantly correlated with emotional stability, with the other personality domains showing substantially weak correlations that are largely uninformative in nature.

Do The Future Time Perspectives of Different Age Groups Vary Considerably Across Perceived Risk?

One of the primary objectives of the study was to measure a possible relationship between average FTP scores (with larger scores indicative of a more expansive, opportunistic future and lower scores indicative of a more limited, restrictive future) and perceived risk across age. But as they examined this relationship with age as a continuous variable, it eliminated any observations regarding variation between people in different stages of life, with possible fluctuations in the data for younger adults compared to middle aged adults compared to more elderly individuals unable to be analysed.

Hence, I sought to organise ages into 6 groups, with each group depicting an age range representative of particular decades of life where people may experience possible mental and emotional shifts. This was achieved through the use of case_when and between to classify different age ranges. Similar to the previous exploratory analysis, I averaged the risk variables with mutate and subsequently selected the relevant variables required for plotting through the select function.

albatross <- covid %>% 
  select(risk_self, risk_pop, risk_comp, FTP_av, age) %>%   
  mutate(average_risk = (risk_self + risk_comp + risk_pop)/3,  
         age_range = case_when(                                
           between(age, 18, 24) ~ "18-24",
           between(age, 25, 34) ~ "25-34",
           between(age, 35, 44) ~ "35-44",
           between(age, 45, 54) ~ "45-54",
           between(age, 55, 64) ~ "55-64",
           between(age, 65, 76) ~ "65-76"
         )) %>% 
  select(average_risk, age_range, FTP_av)

glimpse(albatross)
## Rows: 945
## Columns: 3
## $ average_risk <dbl> 2.333333, 2.666667, 1.333333, 2.666667, 2.000000, 2.33333~
## $ age_range    <chr> "18-24", "18-24", "18-24", "18-24", "18-24", "18-24", "18~
## $ FTP_av       <dbl> 4.6, 4.2, 6.2, 6.6, 6.6, 3.8, 5.2, 5.5, 4.2, 4.0, 4.9, 4.~

For the actual plotting, I decided a line graph would be the most appropriate choice so I used ggplot and geom_smooth to graph a regression line between average perceived risk and average future time perspective. To make the graph slightly more aesthetically appealing, I used theme to make some alterations to the text formatting of the axes titles and also added suitable labels to the x-, y- and legend axes.

ggplot(albatross, aes(y = FTP_av, x = average_risk, colour = age_range)) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  theme(panel.background = element_rect(colour = "black", size = 1/40),
        axis.title.x = element_text(family ="Arial Narrow", 
                                    size = 14, face = "bold"),
        axis.title.y = element_text(family = "Arial Narrow", 
                                    size = 14, face = "bold")) +
  scale_color_discrete(name = "Age Range") +
  ylab("Average FTP Score") +
  xlab("Average Perceived Risk")

From this graph, it can be seen that from mid-40s to mid-70s, people follow a similar trend for increases in average risk and decreases in their future time perspective. However, people younger than those ages displayed slight variations in the effect of their perceptions of risk on the future - the more middle aged individuals between 35 and 44 were less affected by their perceived risk on the outcome of the future whilst those between ages 25 and 34 reflected the steepest decline in their beliefs of a broad future the more threatening their perception of the virus was.

This could be attributed to most people mid-20s to mid-30s acknowledging that they are in the prime of their life and the emergence of the pandemic has significantly limited their options both now and consequently, far off into the future. Those who were less concerned could still have retained their vision of more opportunities later in life whilst those who were more concerned could have catastrophised their potential prospects.

As for the more middle-aged individuals, they may have more stable, well-established lives with most of their major goals achieved so their perceptions of the pandemic would have less of an impact on their future.

Also to echo the findings in the study, overall the older the age brackets of the participants, the more restrictive their mindset on the future.

Though surprisingly, the relationship between perceived risk and future time perspectives in the youngest individuals between 18 and 24 was very similar to that of elderly people. This could be due to a less realised understanding of the threat posed by the pandemic on their future or maybe a more rational and accurate perception but regardless, it would have been expected for them to overreact, especially if they had higher perceived risk, and judge their future to be extremely limited and closed.

Therefore, age groups do indicate marked distinctions in future time perspective across perceived risk, but this is mainly present for individuals between 25-34 and 35-44 with the youngest and more elderly people displaying similar trends.

Recommendations

Carstensen, Shavit, and Barnes (2020) should be commended for taking the appropriate steps to facilitate reproducibility attempts for their study. They made many of their materials openly available, including the data set, codebook and additional survey items and questionnaires. Both the data and codebook were very clearly and accurately annotated and averages for many of the data points were already calculated. As a result, we only ran into a few issues when reproducing the reported statistics, figures and tables, with a majority of these challenges arising from inexperience and unfamiliarity with R.

However, if we were to provide recommendations to make reproducibility efforts easier, we would suggest the inclusion of a README document that outlines any manipulations or exclusions made to the data as well as any specific analyses or packages used to produce their results. Also, perhaps submitting the entire raw data set would be helpful to ensure their exclusion criteria and average calculations were accurate. To prevent excessive data manipulation, they could input values that are already ordered correctly and contain no unnecessary signs or punctuation marks (e.g. “$” and “,” in income variable) or at the very least leave instructions on how to easily remove or change the values for quick and simple analysis and plotting. If possible, the authors could have supplied the code they used and that would very rapidly accelerate reproducibility attempts.

References

Carstensen, Laura L, Helene H Fung, and Susan T Charles. 2003. “Socioemotional Selectivity Theory and the Regulation of Emotion in the Second Half of Life.” Motivation and Emotion 27 (2): 103–23.
Carstensen, Laura L, Yochai Z Shavit, and Jessica T Barnes. 2020. “Age Advantages in Emotional Experience Persist Even Under Threat from the COVID-19 Pandemic.” Psychological Science 31 (11): 1374–85.
Charles, Susan Turk. 2010. “Strength and Vulnerability Integration: A Model of Emotional Well-Being Across Adulthood.” Psychological Bulletin 136 (6): 1068.
McCrae, Robert R, Paul T Costa, Margarida Pedroso de Lima, Antonio Simoes, Fritz Ostendorf, Alois Angleitner, Iris Marusic, et al. 1999. “Age Differences in Personality Across the Adult Life Span: Parallels in Five Cultures.” Developmental Psychology 35 (2): 466.