There is an abundance of evidence suggesting that emotional experience improves with age. For example, older people have more positive daily experiences than younger people (Burr et al., 2020), manage their emotions better (Gross et al., 1997), and demonstrate greater emotional stability (Burr et al., 2020). The age advantage in emotional experience is therefore well established, but the mechanism behind it is still unknown.
There are two main theories that attempt to explain these age advantages in emotional experience. Socioemotional selectivity theory (SST) suggests that the age advantage results from a tendency of people to shift their goals towards emotional meaning and positive experience as they age. By comparison, the strength and vulnerability integration (SAVI) model suggests that the age advantage can be attributed to people increasingly avoiding stressful situations as they age.
The current study by Carstensen et al. (2020) aimed to test whether the age advantages in emotional experience persist when people are exposed to prolonged and inescapable threats. To test whether older people have a better emotional experience because they actively avoid stressful situations, as suggested by the SAVI model, the sample was tested during the COVID-19 pandemic, when the stressful threat of infection could not be avoided. The researchers issued a questionnaire to 974 people measuring their age, the frequency and intensity of their positive and negative emotions, their perceived risk of contagion, and personality, health and demographic characteristics.
The results showed that the relationship between age and emotional experience persisted, both with and without controlling for perceived risk of contagion and other covariates. The researchers therefore concluded that older adults' relatively better emotional well-being persists even in the face of prolonged and inescapable stress, which contradicts the SAVI model.
I was surprised that the age advantage in emotional experience did not diminish or disappear during the COVID-19 pandemic. Considering that age is an established risk factor for complications and mortality from COVID-19, and that the pandemic has been the most prominent global issue for over a year, it seems likely that those with the greatest risk of death would be the most concerned about the virus. Nevertheless, this was a relatively early stage of the pandemic, and there was no evidence that people's emotional experience in general had suffered as a result of the virus, which had infected only a very small fraction of the US population at that point. This makes me wonder how the pandemic environment affects different groups of people. For example, variables other than age, such as gender or income, might also predict emotional experience in this sample. The researchers discuss interaction effects of these variables with age on emotional experience, but they do not investigate their main effects, which could be interesting.
The evidence for the claim that the age advantage in emotional experience persists in an environment of prolonged stress seemed like a stretch. The study was conducted in April 2020, when the COVID-19 pandemic had only just become a global issue. It is entirely possible that many participants in the sample did not consider COVID-19 a threat. This seems more likely when you consider the nationwide reaction to COVID-19 in the US and the relatively low infection and death rates at that stage of the pandemic, compared with what they were even six months later. The researchers themselves admit that they have no pre-pandemic baseline against which to compare their results and check whether stress or perceived threat had increased enough for the environment to be characterised as one of "prolonged and inescapable stress". The researchers would also have been wise to focus more on the perceived threat of the pandemic and how it (and the other covariates) accounts for the age advantage in emotional experience.
Further work in this area needs to properly establish baseline values and validate the characterisation of the national environment as one of prolonged stress. The authors could conduct a 6- or 12-month follow-up with the original sample, or a replication of the study, to see whether their results hold up at a later, more serious stage of the pandemic, when the threats of lockdown, infection and even death are greater than ever. Having said that, people may have become desensitised to the threat of the pandemic by that stage, which again supports including a stronger measure of stress to investigate how people's views have changed. Further research should also aim to test the idea put forward by SST, that older people have a better emotional experience because they prioritise emotional experience over exploration, and whether this explanation holds up during the pandemic.
The following packages were used to produce the verification report.
library(tidyverse)
library(ggplot2)
library(here)
library(dplyr)
library(papaja)
library(psych)
library(MOTE)
library(wesanderson)
library(ggpubr)
### Read in the raw data (using here() so the path is project-relative)
master <- read.csv(here("AgeAdvantagesEmotionCovid_Data.csv"), header = TRUE)
The role of each package in this verification report:
tidyverse: Allows for easy data wrangling with shared language across packages.
ggplot2: Allows for detailed data visualisation using a layered approach.
here: Allows for construction of a path to project files instead of setting working directory.
dplyr: Allows for simple data manipulation.
papaja: Allows for APA formatting of results and tables.
psych: A toolbox for personality, psychometrics and experimental psychology. Includes “describe” function (M and SD).
MOTE: Easy-to-use effect size calculator compatible with “papaja” package.
wesanderson: Cool colour palettes for graphs.
ggpubr: Allows for quick creation of quantile plots and other publication-ready plots.
Here are the demographics we are trying to reproduce:
For any demographic statistic in percentage form, a single process can be used. First, select() the target variable from the master data set, then filter() the response you are interested in. Next, calculate the percentage of participants who gave that response by counting the number of rows (participants) with that response and dividing it by the total number of rows in the master data set. This process is used to reproduce several of the demographics, so I won't repeat the explanation later on (a reusable helper version is sketched after the first example below).
# Demographics
## Percentage of females
### Filter female participants
n_female <- master %>%
select(gender_bin_f) %>%
filter(gender_bin_f == "Female")
### Calculate percentage of females
100 * nrow(n_female) / nrow(master)
## [1] 49.20635
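Since this select-filter-divide pattern repeats for several demographics, it could be wrapped in a small helper function. The sketch below is hypothetical (the name pct_response and its arguments are ours, not part of the original analysis):
### Hypothetical helper: percentage of rows where a variable equals a given response
pct_response <- function(data, var, value) {
  100 * sum(data[[var]] == value, na.rm = TRUE) / nrow(data)
}
### Example call: should match the 49.21% of female participants found above
pct_response(master, "gender_bin_f", "Female")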
To find the age range of the sample, select the age variable and apply the range() function. To find the age mean and SD of the sample, the summarise() function from the dplyr package can be used, calling mean() and sd() from within it.
## Age range, mean and SD
### Use range function to find age range
age_range <- master$age %>% range()
### Use summarise function to find age mean and SD
age_mean_sd <- master %>%
summarise(mean_age = mean(age), sd_age = sd(age))
age_mean_sd; age_range
## mean_age sd_age
## 1 45.15238 16.78756
## [1] 18 76
## Percentage white
### Filter white participants
n_white <- master %>%
select(race_bin) %>%
filter(race_bin == "White")
### Calculate percentage white
100 * nrow(n_white) / nrow(master)
## [1] 75.44974
Reproducing the median household income is a difficult task, largely because of how the data are coded: we must reorganise and rename the answers to the income variable. Firstly, we use the select() function to select the income variable and the filter() function to filter out the responses of participants who declined to answer. Next, we separate each individual response into three columns split on spaces, since each response contains three words/numbers. We then keep only the "low" and "high" columns, because the middle column only contains words like "to" and "than".
The next part is a little more complicated. The ifelse() function lets us apply a condition element-wise across all values of a variable. Here, we create a new variable called "low2" and specify that it should be identical to the "low" variable, except that if the response begins with the word "Less" it is replaced with a value of 0, and if it begins with the word "Greater" it is replaced with the value 275,000. Lastly, we drop the old "low" column, because we now have our new and improved "low2" column. The "low2" and "high" columns now contain the lower and upper limits of each participant's reported income range.
## Median household income
### Reorganise and rename income answers
dem_income <- master %>%
select(income) %>%
filter(income != "Decline to answer") %>%
separate(income, c("low", "to", "high"), sep = " ") %>%
select(low, high) %>%
mutate(low2 = ifelse(low == "Less", 0,
ifelse(low == "Greater", 275000, low))) %>%
select(-low)
Next, we simply use the gsub() function to substitute the "$" in each value with nothing, effectively removing the dollar sign from each response. After that, we make each variable numeric using the as.numeric() function, again substituting the commas in each response with nothing (effectively removing them). Finally, we can calculate the median income by creating a new variable, midpoint, which is the midpoint of the upper and lower income limits for each participant, and then applying the summarise() function to find the median of these midpoints. Sure enough, the answer falls within the range described in the original paper.
dem_income$high <- gsub('[$]','',dem_income$high)
dem_income$low2 <- gsub('[$]','',dem_income$low2)
dem_income$high <- as.numeric(gsub(',','',dem_income$high))
dem_income$low2 <- as.numeric(gsub(',','',dem_income$low2))
### Calculate median income
dem_income %>%
mutate(midpoint = (high+low2)/2) %>%
summarise(med_income = median(midpoint))
## med_income
## 1 55000
The rest of the demographics can be found using the same method described earlier regarding how to find percentages.
## Percentage working for pay
### Filter participants that work for pay
n_pay <- master %>%
select(emp_bin) %>%
filter(emp_bin == "Working")
### Calculate percentage working for pay
100 * nrow(n_pay) / nrow(master)
## [1] 56.19048
## Percentage educated
### Filter educated participants
n_educated <- master %>%
select(educ) %>%
filter(educ != "Graduated high school (or GED)",
educ != "Less than high school")
### Calculate percentage educated
100 * nrow(n_educated) / nrow(master)
## [1] 87.72487
## Percentage living alone
### Filter participants living alone
n_alone <- master %>%
select(livealone) %>%
filter(livealone == "Yes")
### Calculate percentage living alone
100 * nrow(n_alone) / nrow(master)
## [1] 22.96296
Success! All of the demographic statistics were reproduced. The process was lengthy and required a lot of repetition. The tidyverse package is extremely valuable for data wrangling, particularly because the ability to pipe between functions lets you construct a hierarchy of steps (i.e. select, then filter, then mutate, etc.). The median household income was definitely the hardest statistic to reproduce, because the original data were not in a format suitable for finding the median. A little recoding in the authors' original survey would have saved a lot of time here.
The main lesson we learned from our first step in the verification report was that there are multiple ways to achieve the same result in R. Our original code for reproducing the demographics was much longer and less efficient, but the ability to pipe between functions granted by the tidyverse package allowed us to “trim the fat” of our code significantly.
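As an illustration, a non-piped version of the female-percentage code might have looked something like this (a hypothetical reconstruction, not our actual first draft):
### Non-piped version: an intermediate object at every step
female_only <- subset(master, gender_bin_f == "Female")
n_female_base <- nrow(female_only)
100 * n_female_base / nrow(master)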
We intend to reproduce the summary statistics (means and SDs) for all of the following variables described in the paper:
After the lengthy process of reproducing the demographics, we decided to look for a more efficient way to reproduce the summary statistics. We found the psych package, which contains the describe() function; it produces the means and SDs for every variable passed to it. As such, we first created a new data frame with all the variables of interest, then ran it through the describe() function. Printing the result gives a table of summary statistics. The describe() function has a lot of arguments. For our example, most are not necessary, but here is a description of each one:
na.rm: Should missing data be removed before computing statistics? Defaults to TRUE.
interp: Should the median be standard or interpolated?
skew: Should the skew and kurtosis be calculated?
ranges: Should the range be calculated?
trim: The fraction to drop from the top and bottom when computing trimmed means (e.g. trim = .1).
type: Which estimate of skew and kurtosis should be used?
check: Should we check for non-numeric variables?
fast: If TRUE, only n, means, sds, min, max and ranges are computed, for an improvement in speed. If NULL, fast mode is used for large problems (ncol * nrow > 10^7); otherwise defaults to FALSE.
quant: If not NULL, the specified quantiles are found (e.g. quant = c(.25, .75) finds the 25th and 75th percentiles).
IQR: If TRUE, show the interquartile range.
omit: Do not convert non-numeric variables to numeric; omit them instead.
# Summary Statistics
### Select the appropriate variables from the data set
sum_stat <- master %>%
select(FTP_av, FTP_Ext, FTP_Opp, FTP_Con,
risk_self, risk_comp, risk_pop, emp_affect,
health, Extraversion, Conscientiousness, avg_f_pos,
avg_f_neg, avg_i_pos, avg_i_neg)
### Create a table of summary statistics using the describe function
sum_table <- describe(sum_stat, na.rm = TRUE, interp=FALSE,
skew = FALSE, ranges = FALSE,trim=.1,
type=3,check=FALSE,fast=NULL,quant=NULL,
IQR=FALSE,omit=TRUE,data=NULL) %>%
select(-starts_with("vars"))
print(sum_table)
## n mean sd se
## FTP_av 943 4.09 1.35 0.04
## FTP_Ext 943 4.02 1.91 0.06
## FTP_Opp 943 4.73 1.62 0.05
## FTP_Con 943 3.74 1.76 0.06
## risk_self 945 2.36 1.11 0.04
## risk_comp 945 2.45 1.34 0.04
## risk_pop 945 2.97 0.94 0.03
## emp_affect 945 1.41 1.46 0.05
## health 945 3.36 0.99 0.03
## Extraversion 945 3.44 1.65 0.05
## Conscientiousness 945 5.22 1.33 0.04
## avg_f_pos 945 1.97 0.56 0.02
## avg_f_neg 945 1.42 0.66 0.02
## avg_i_pos 945 1.93 0.59 0.02
## avg_i_neg 942 1.77 0.70 0.02
Success! All of the summary statistics were reproduced. This was much quicker than reproducing the demographics. The describe() function from the psych package is extremely useful for creating summary statistics for a range of numeric variables.
Once again, our original code for this section was much longer, and involved applying the mean() and sd() functions to each individual variable. We could have used the summarise() and across() functions together to produce the same summary statistics for all of the variables (sketched below), but the psych package offers a much simpler method and produces the results in a nicely formatted table.
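For completeness, here is a minimal sketch of that summarise()/across() alternative, applied to the same sum_stat data frame (it reproduces the means and SDs, but without describe()'s formatted table):
### tidyverse alternative: mean and SD for every selected column
sum_stat %>%
  summarise(across(everything(),
                   list(M = ~ mean(.x, na.rm = TRUE),
                        SD = ~ sd(.x, na.rm = TRUE))))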
Table 1 displays the mean frequency of emotions. Participants were asked how often they had experienced each emotion over the past week and responded on a 5-point Likert scale where 0 = "Never" and 4 = "Nearly all the time". Here is the table we are trying to reproduce:
Our plan to reproduce this table revolves around the apa_table() function from the papaja package. After reading about papaja and how its functions can be used to produce APA-formatted tables and statistical output, we found that Table 1 looks very similar to the tables produced by apa_table(). As such, we must reorganise the raw data into the appropriate long format, where the name, mean, SD and CI of each emotion frequency are separate columns.
Firstly, we must select the relevant variables from the master data set.
# Table 1 - Mean Frequencies of Emotions
### Select the frequencies of each emotion from master data set.
emot <- master %>%
select(f_calm, f_qui, f_app, f_int, f_cont, f_hap, f_rela, f_pea,
f_ener, f_aff, f_amu, f_acc, f_joy, f_pro, f_reli, f_exc,
f_conc, f_anx, f_bor, f_fru, f_irr, f_sad, f_lon, f_fear,
f_ang, f_dis, f_gui, f_emb, f_sha)
Next, we can obtain the means and SDs for each emotion frequency using the summarise() function, in conjunction with the across() function, which allows us to summarise across all of our variables. The “na.rm = TRUE” argument tells R to remove the NA responses from the calculation of the means and SDs.
### Create a new data frame with the descriptives (M and SD) for the emotions.
emot_des <- emot %>%
summarise(
across(.cols = everything(), na.rm = TRUE, list(M = mean, SD = sd)))
Next, we can rename each mean variable according to the name of each respective emotion using the rename() function. The format is "new name" = "old name".
### Rename the emotions to table-friendly names.
emot_des <- emot_des %>%
rename(Calm = f_calm_M, Quiet = f_qui_M, Appreciative = f_app_M,
Interested = f_int_M, Content = f_cont_M, Happy = f_hap_M,
Relaxed = f_rela_M, Peaceful = f_pea_M, Energetic = f_ener_M,
Affectionate = f_aff_M, Amused = f_amu_M, Accomplished = f_acc_M,
Joyful = f_joy_M, Proud = f_pro_M, Relieved = f_reli_M,
Excited = f_exc_M, Concerned = f_conc_M, "Anxious/worried" = f_anx_M,
Bored = f_bor_M, Frustrated = f_fru_M, Irritated = f_irr_M,
Sad = f_sad_M, Lonely = f_lon_M, Fearful = f_fear_M,
Angry = f_ang_M, Disgusted = f_dis_M, Guilty = f_gui_M,
Embarrassed = f_emb_M, Ashamed = f_sha_M)
Now we have to pivot the data into long form. First, we select the columns which we just renamed, by telling R to select all the columns that DON'T start with "f_". The pivot_longer() function takes all of the variables we just renamed, puts their names into a new column called "Valence and emotion", and puts their values into another column called "M" (for mean).
After that, we can repeat this pivot process for the SD values, by selecting all columns that DO start with “f_” (which are the SD columns) and pivoting their values to a new column called “SD”.
We can combine our two pivoted data frames into one, using the bind_cols() function. Now we have a data frame with three columns which display the names, means and SDs.
### Pivot the descriptive twice - one creates two variable with the name and mean of each emotion, the other creates the SD.
emot_pivot1 <- emot_des %>%
select(-starts_with("f_")) %>%
pivot_longer(cols = everything(),
names_to = "Valence and emotion",
values_to = "M")
emot_pivot2 <- emot_des %>%
select(starts_with("f_")) %>%
pivot_longer(cols = everything(),
names_to = NULL,
values_to = "SD")
### Bind them together to create table-friendly descriptive stats.
emot_des_table <- bind_cols(emot_pivot1, emot_pivot2)
Now it is time to find the confidence intervals for the frequency of each emotion. First, we create a new value for the number of participants and for the standard error, which is the SD divided by the square root of the number of participants.
Next, we can create two new variables, a lower CI limit and an upper CI limit, using the equation for t-based 95% CIs. The qt() function returns the t-value that has x% of the distribution to the left of it, given y degrees of freedom. As used below, it gives the t-value with 97.5% of the distribution to the left of it, given n - 1 degrees of freedom. We multiply that t-value by the newly created SE and add it to or subtract it from the means in the "M" column of the data frame. Consequently, we get two new variables in our data frame: one for the lower CI limits and another for the upper CI limits.
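In equation form, the interval we are constructing for each emotion is the standard t-based CI (this simply restates the code below, using the SE just defined):

$$\mathrm{CI}_{95\%} = M \pm t_{0.975,\; n-1} \times \frac{SD}{\sqrt{n}}$$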
### Create lower and upper 95% CI limits
n <- 945; se <- emot_des_table$SD / sqrt(n)
emot_des_table <- emot_des_table %>%
mutate(lower_CI = M - qt(1 - (0.05 / 2), n - 1) * se,
upper_CI = M + qt(1 - (0.05 / 2), n - 1) * se)
Next, we must round the CI limit values to two decimal places using the format() and round() functions. We specify the number of decimals using nsmall = 2.
### Round CIs to 2 d.p. (can't do this above because of non-numeric variable)
emot_des_table$lower_CI <- format(round(emot_des_table$lower_CI, 2), nsmall = 2)
emot_des_table$upper_CI <- format(round(emot_des_table$upper_CI, 2), nsmall = 2)
Time to merge the CI limits into one column. This is a tricky task. After consulting with Jenny, she pointed us in the direction of the unite() function, which combines the values of two columns into one column. The “col” argument specifies the name of the new column we are going to produce, then we specify which columns we want to merge and what we wish to separate the values with. In this case, we wish to separate them with a comma. Finally, to add square brackets around the CI limits, we can use the paste0() function to insert a square bracket on either side of the CI interval.
### Combine the CIs into one column
emot_des_table <- emot_des_table %>%
unite(col = "95% CI",
lower_CI:upper_CI,
sep = ", ") %>%
mutate("95% CI" = paste0("[", `95% CI`, "]"))
Table 1 groups the emotions under "Positive" and "Negative" in the original paper. Originally, we weren't sure how we could replicate this using the apa_table() function, so we thought we would have to make a separate table for each group of emotions. However, after a little investigation we found a solution.
We must first separate all of our variables so far into two new data frames. We can do this using the slice() function, which effectively acts like the select() function but selects rows instead of columns. Since the emotions are already ordered according to the original table, we can take the first 16 emotions and put them under "Positive", and put the remaining 13 emotions under "Negative".
### Split the data into positive/negative emotions
pos_emot_table <- emot_des_table %>%
slice(1:16)
neg_emot_table <- emot_des_table %>%
slice(17:29)
Finally, it is time to put the data through the apa_table() function. All of the time and effort put into correctly formatting the data is balanced out by the time saved formatting the aesthetics of the table. Within the apa_table() function, we specify that we would like to list the emotions under "Positive emotions" and "Negative emotions". We can also specify a caption, a note at the bottom of the table, how to align the values within each column, and the font size of the values within the table.
### Apply the apa_table function (NOTE: Only produces correct output in APA formatted markdown template which comes with the papaja package)
apa_table(list('Positive emotions' = pos_emot_table,
'Negative emotions' = neg_emot_table),
caption = "Mean Frequencies of Emotions",
note = "N = 945. CI = confidence interval",
align = c('l', 'c', 'c', 'c'),
font_size = "scriptsize")
| Valence and emotion | M | SD | 95% CI |
|---|---|---|---|
| Positive emotions | |||
| Calm | 2.44 | 0.87 | [2.39, 2.50] |
| Quiet | 2.43 | 0.87 | [2.38, 2.49] |
| Appreciative | 2.40 | 0.93 | [2.35, 2.46] |
| Interested | 2.28 | 0.83 | [2.23, 2.33] |
| Content | 2.15 | 0.94 | [2.09, 2.21] |
| Happy | 2.13 | 0.80 | [2.08, 2.19] |
| Relaxed | 2.13 | 0.89 | [2.07, 2.19] |
| Peaceful | 2.05 | 0.95 | [1.99, 2.11] |
| Energetic | 1.90 | 0.80 | [1.85, 1.95] |
| Affectionate | 1.89 | 0.86 | [1.83, 1.94] |
| Amused | 1.87 | 0.72 | [1.83, 1.92] |
| Accomplished | 1.84 | 0.87 | [1.78, 1.89] |
| Joyful | 1.71 | 0.90 | [1.65, 1.76] |
| Proud | 1.67 | 0.97 | [1.61, 1.73] |
| Relieved | 1.48 | 0.88 | [1.42, 1.53] |
| Excited | 1.46 | 0.79 | [1.41, 1.51] |
| Negative emotions | |||
| Concerned | 2.23 | 0.91 | [2.17, 2.29] |
| Anxious/worried | 2.00 | 1.06 | [1.94, 2.07] |
| Bored | 1.88 | 1.12 | [1.80, 1.95] |
| Frustrated | 1.85 | 0.93 | [1.79, 1.90] |
| Irritated | 1.75 | 0.89 | [1.70, 1.81] |
| Sad | 1.69 | 0.97 | [1.63, 1.75] |
| Lonely | 1.55 | 1.25 | [1.47, 1.63] |
| Fearful | 1.38 | 1.06 | [1.31, 1.44] |
| Angry | 1.35 | 0.89 | [1.29, 1.40] |
| Disgusted | 1.16 | 0.99 | [1.10, 1.22] |
| Guilty | 0.63 | 0.87 | [0.58, 0.69] |
| Embarrassed | 0.51 | 0.76 | [0.47, 0.56] |
| Ashamed | 0.44 | 0.76 | [0.40, 0.49] |
Note. N = 945. CI = confidence interval
Success! Table 1 was reproduced. It is important to note that the results from the apa_table() function look best when the markdown file is knitted to a Word document or PDF. In HTML form, the table still looks nice, but it is definitely not in APA format. This function (and package) is super cool. It is extremely useful for anybody who needs to produce APA formatted output for manuscripts or assignments. It does all the formatting for us, which is nice considering what is coming next!
Figure 1 shows the comparative frequency and intensity of emotions across different ages. Here are the scatter plots we are trying to reproduce:
This is a tough task, so we can break it down into two parts: preparing the data and plotting the data.
First we need to prepare the data for visualisation. We start by selecting the appropriate variables (age and emotion frequency and intensity), omitting the NA responses, and pivoting the emotion data into long form.
Next, we create a new variable called "freqint2" and use the ifelse() function again to apply two conditions to the values of the variable. The first condition specifies that all values beginning with "i" (signalling intensity) are replaced by "Intensity of Emotional Experience". Secondly, everything beginning with "f" (signalling frequency) is replaced with "Frequency of Emotional Experience". The "\n" tells R to push the rest of the words to the next line (akin to pressing "Enter" on your keyboard).
We repeat this step for the valence of the emotions, making the values either “Positive Emotions” or “Negative Emotions”.
Finally, we use the select() function to select only the variables of interest, discarding the old variables which we have since mutated.
### Select relevant variables, omit missing data, reorganise data into long form, create new variables for intensity and valence.
plot_data <- master %>%
select(age, avg_i_pos, avg_i_neg, avg_f_pos, avg_f_neg) %>%
na.omit() %>%
pivot_longer(cols = -age,
names_to = "fivalence",
values_to = "means") %>%
separate(fivalence, c("average", "freqint", "valence"), sep = "_") %>%
select(age, freqint, valence, means) %>%
mutate(freqint2 = ifelse(freqint == "i", "Intensity of \n Emotional Experience",
ifelse(freqint == "f", "Frequency of \n Emotional Experience", freqint))) %>%
mutate(valence2 = ifelse(valence == "pos", "Positive Emotions",
ifelse(valence == "neg", "Negative Emotions", valence))) %>%
select(valence2, age, freqint2, means)
The second step is to plot the data. This is where we first used the ggplot2 package, which allows for effective visualisation of data by building layered graphs. Each line of code below specifies an aesthetic component of the graph.
First, create the plot using the ggplot() function, assigning the data, the “age” variable along the x-axis, the emotion intensity/frequency along the y-axis.
The geom_point() function specifies that the data should be plotted as dots (scatter plot), and the arguments within specify the transparency (alpha) and size of the dots.
The coord_cartesian() function allows us to set the appropriate y-axis limits to reflect the scale of the original figure. The bulk of the rest of the code deals with the aesthetics of the figure text to help it resemble the original figure.
By layering the theme() function over and over, we can add multiple aesthetic changes. These changes include:
Change the colour scheme to black and white using the theme_bw() function.
Change the axis text to Arial Narrow size 12 using the element_text() function.
Remove the background grids from the plots by assigning panel.grid.major = element_blank() and panel.grid.minor = element_blank().
Outline the panels in black using panel.background = element_rect(colour = "black").
Change the size and font of the axis titles and make them bold using the same element_text() function as above.
Relocate the axis titles to outside the plots using strip.placement = "outside". This becomes more important when we group the plots together.
Next, we add the regression lines using the geom_smooth() function, assigning “lm” (linear model) as the method as well as specifying the size and colour. After that, we can combine the four plots using the facet_grid function, specifying which axis we want to plot against which using the “~” symbol. Finally, we rename the x and y labels by using the xlab() and ylab() functions accordingly.
### Create scatter plots with ggplot and geom_point. geom_smooth adds regression lines and facet_grid arranges the four plots together. Rest of code makes aesthetic changes.
scatter_plot <- ggplot(plot_data, aes(x = age, y = means)) +
geom_point(alpha = 1/13, size = 2) +
coord_cartesian(ylim = c(0,2.5)) +
theme_bw() +
theme(axis.text = element_text(family="Arial Narrow", size = 12)) +
theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank()) +
theme(panel.background = element_rect(colour = "black", size = 1/40),
strip.text.x = element_text(family="Arial Narrow", size = 14, face = "bold"),
strip.text.y = element_text(family="Arial Narrow", size = 13, face = "bold"),
strip.placement = "outside",
strip.background = element_blank(),
axis.title.x = element_text(family="Arial Narrow", size = 14, face = "bold")) +
geom_smooth(method = "lm", colour = "black", size = 1.5) +
facet_grid(freqint2 ~ valence2, switch = "y") +
xlab("Age (Years)") +
ylab("")
print(scatter_plot)
After all that, we get our scatter plots! Unlike Table 1, a lot of time went into formatting the aesthetics of this figure. Having said that, the ggplot2 package and its associated functions made it easy to layer small aesthetic changes one after the other as we figured out how to alter every little detail of our plots.
The way we display data has a tremendous impact on how it is interpreted. Figures like these demonstrate why sharing code is so important when it comes to reproducibility. It was an arduous journey reproducing these plots, and although it wasn’t necessarily difficult to obtain the same relevant data, it was certainly difficult to visualise it in the same manner as the original authors.
Although I personally did not work on reproducing these plots, this part of the verification report was a real test of endurance and teamwork in a coding environment. Whilst we took a "divide and conquer" approach to the overall assignment, this particular reproducibility triumph was achieved through the combined effort of multiple group members. Rory, Josh and Lucas did a great job building on each other's efforts and helping me understand what each line of code does.
One interesting point brought up by the authors was that the age effects in emotional experience held across gender, although there was a significant age-gender interaction for frequency of positive emotions. Prior research into emotional experience suggests that there are some significant differences in how men and women experience certain positive and negative emotions, but that these can mostly be explained by social factors (Simon & Nath, 2004). As mentioned in Reaction 1, it would be interesting to see whether there are any gender differences in emotional experience in a social environment as stressful as the one created by the COVID-19 pandemic. Specifically, is there a main effect of gender on the frequency of positive and negative emotions in the present sample?
To answer this question, I am going to use Jenny's approach of finding the descriptive statistics, displaying them visually, and then using an appropriate statistical test. First, I can find the means and SDs of positive and negative emotion frequency grouped by gender, using the same summarise() function from the verification section.
### Select relevant variables
explore_1 <- master %>%
select(gender_bin_f, avg_f_pos, avg_f_neg)
### Acquire the means and SDs of emotion frequency grouped by gender
explore_1_des <- explore_1 %>%
group_by(gender_bin_f) %>%
summarise(across(.cols = everything(), na.rm = TRUE,
list(M = mean, SD = sd)))
Next, I can pivot all of the means and SDs to long form similar to what we did when reproducing Table 1.
### Prepare the data for visualisation by pivoting to long-form
explore_1_long <- explore_1 %>%
pivot_longer(cols = starts_with("avg"),
names_to = "emotion",
values_to = "frequency")
After factoring the emotion variable and renaming its values, we can plot the graph. Using ggplot and the stat_summary() function, we can add bars for the means and error bars for the 95% CIs. After making some aesthetic changes using functions already explained in the verification section, we have a nice bar graph. The only new addition is the position_dodge argument, which makes sure the specified elements (e.g. error bars) do not overlap with each other.
### Change emotion to a factor
explore_1_long$emotion <- as.factor(explore_1_long$emotion)
### Recode emotions to table-appropriate names
explore_1_long <- explore_1_long %>%
mutate(emotion = recode(emotion,
'avg_f_neg' = "Negative",
'avg_f_pos' = "Positive"))
### Create plot of emotion frequency, grouped by emotion type and gender.
explore_1_plot <- ggplot(data = explore_1_long,
mapping = aes(
x = emotion, y = frequency, fill = gender_bin_f)) +
stat_summary(fun = mean,
geom = "bar",
position = "dodge",
colour = "black") +
stat_summary(fun.data = mean_cl_normal,
geom = "errorbar",
width = .2, position = position_dodge(width = 0.90)) +
xlab("Emotion Type") +
ylab("Frequency") +
scale_fill_manual(name = "Gender of Participants",
labels = c("Female", "Non-Female"),
values = c("magenta1", "deepskyblue1")) +
theme_apa() +
theme(legend.position = c(0.19, 0.85),
legend.title = element_text(size = 8),
legend.text = element_text(size = 9)) +
ylim(0, 2.5)
print(explore_1_plot)
As shown in the graph, there may be statistically significant differences in positive and negative emotion frequency between genders, although the differences look small enough that their practical importance is questionable. To test both of these, we can use a t-test and calculate Cohen's d as an effect size.
First, we create new variables grouping the mean frequency of positive emotions into “Female” and “Non-female”. We can then use the t.test() function to test for a difference between them.
### Create new variables for female and non-female positive emotion frequency.
female_pos <- explore_1_long %>%
filter(gender_bin_f == "Female", emotion == "Positive")
nonfemale_pos <- explore_1_long %>%
filter(gender_bin_f == "Non-female", emotion == "Positive")
### Use t.test function
t.test(female_pos$frequency, nonfemale_pos$frequency)
##
## Welch Two Sample t-test
##
## data: female_pos$frequency and nonfemale_pos$frequency
## t = -2.3124, df = 942.3, p-value = 0.02097
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.15615986 -0.01278386
## sample estimates:
## mean of x mean of y
## 1.929173 2.013645
Non-female individuals reported experiencing positive emotions slightly more frequently over the past week than female individuals, and this difference is statistically significant (p = .021).
To calculate effect size, we can use the MOTE package, which contains a range of functions designed for easy effect size calculation. Because our test is an independent samples t-test, we use the d.ind.t() function and input the relevant values. After that, we can tell R to print just Cohen’s d.
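For two independent groups, the d computed here is the standardised mean difference based on the pooled standard deviation (our understanding of what d.ind.t() calculates; see the MOTE documentation):

$$d = \frac{M_1 - M_2}{SD_{pooled}}, \qquad SD_{pooled} = \sqrt{\frac{(n_1 - 1)SD_1^2 + (n_2 - 1)SD_2^2}{n_1 + n_2 - 2}}$$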
### Use the functions from the MOTE package to calculate the effect size of the above difference.
pos_freq_es <- d.ind.t(m1 = mean(female_pos$frequency),
m2 = mean(nonfemale_pos$frequency),
sd1 = sd(female_pos$frequency),
sd2 = sd(nonfemale_pos$frequency),
n1 = 465,
n2 = 480,
a = 0.05)
### Print only Cohen's d
pos_freq_es$d
## [1] -0.1504561
The effect would be considered small (d < 0.2).
Now we can repeat the same process for the negative emotions.
### Create new variables for female and non-female negative emotion frequency.
female_neg <- explore_1_long %>%
filter(gender_bin_f == "Female", emotion == "Negative")
nonfemale_neg <- explore_1_long %>%
filter(gender_bin_f == "Non-female", emotion == "Negative")
### Use t.test function
t.test(female_neg$frequency, nonfemale_neg$frequency)
##
## Welch Two Sample t-test
##
## data: female_neg$frequency and nonfemale_neg$frequency
## t = 4.5822, df = 937.45, p-value = 5.222e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.1116344 0.2788905
## sample estimates:
## mean of x mean of y
## 1.516801 1.321539
Once again the test revealed a significant difference, but we should check the size of the effect.
### Use the functions from the MOTE package to calculate the effect size of the above difference.
neg_freq_es <- d.ind.t(m1 = mean(female_neg$frequency),
m2 = mean(nonfemale_neg$frequency),
sd1 = sd(female_neg$frequency),
sd2 = sd(nonfemale_neg$frequency),
n1 = 465,
n2 = 480,
a = 0.05)
### Print only Cohen's d
neg_freq_es$d
## [1] 0.298371
This is a small-to-medium effect size (0.2 < d < 0.5). As such, the frequencies of positive and negative emotions do differ across genders. Non-female individuals seem to experience positive emotions slightly more often and negative emotions less often, on average, than female individuals, and the gender difference is larger for negative emotions than for positive emotions. These results align with previous research and indicate that the effect persists in the COVID-19 environment.
Recent studies by Stolarski et al. (2020) and Witowska et al. (2020) both found that higher levels of neuroticism are associated with less balanced time perspectives. In other words, people high in neuroticism have difficulties balancing their perception of past, present and future time.
The authors of the original paper measured the Big Five personality traits and the future time horizons of participants; however, they did not investigate any potential relationship between the two. It would be interesting to see whether people lower in emotional stability (high neuroticism) perceive their future to contain more limitations than people higher in emotional stability (low neuroticism).
To investigate this, we can test whether there is a correlation between the two variables. Like before, we can first look for a visually noticeable linear relationship. Again, we can use geom_point(), adding some jitter to the data points and a regression line using geom_smooth().
### Select relevant variables
explore_3 <- master %>%
select(Em_Stability, FTP_Con)
### Create scatter plot
stability_scatter <- ggplot(explore_3,
aes(Em_Stability, FTP_Con)) +
geom_point(alpha = 5/13, position = "jitter") +
geom_smooth(method = "lm") +
xlab("Emotional Stability") +
ylab("Perception of Future Limitations") +
theme_apa()
print(stability_scatter)
The scatterplot looks fairly random. The regression line suggests a weak linear association, but the data do not clearly satisfy the assumption of linearity. We should check the other assumption: are the two variables normally distributed? We can do this using the ggqqplot() function from the ggpubr package.
### Create normal quantile plot for Emotional Stability
ggqqplot(explore_3$Em_Stability, ylab = "Emotional Stability")
### Create normal quantile plot for FTP Constraint
ggqqplot(explore_3$FTP_Con, ylab = "FTP Constraint")
From both of these plots it seems that neither variable satisfies the assumption of normality. This makes sense, because responses on a discrete Likert scale are unlikely to be normally distributed, but checking the assumption let us practise a new R skill. To deal with this, we can use the rank-based Spearman correlation for the correlation test.
### Use cor-test and print only the r estimate and p-value
emot_cor <- cor.test(explore_3$Em_Stability, explore_3$FTP_Con,
method = "spearman")
print(emot_cor$estimate); print(emot_cor$p.value)
## rho
## 0.2195225
## [1] 9.392069e-12
There is a statistically significant but weak correlation (ρ = .22), and this result should be interpreted tentatively because the scatterplot does not make a convincing case that the two variables are linearly related. As such, from these basic analyses, there is only weak evidence that neuroticism/emotional stability is related to the perception of future limitations.
Before I give my recommendations to the authors, I should first comment on the strides they have already made towards making their findings reproducible. They included a very detailed and accurate code book which explained their entire data set. Their figures and captions were very well labelled, and it was clear which variables within the data set they represented. However, there are a few things they could do to increase the reproducibility of their findings.
Firstly, I would recommend that they make a copy of their R script available. This would allow people to see how they analysed their data and quickly verify their results, could help others identify possible mistakes in the analyses, and would make reproducing their figures and tables much simpler. This is particularly relevant to our attempt to reproduce Figure 1, which was an arduous process. I would also recommend that they keep their coding style consistent and accompany their code with explanations wherever it gets complicated.
Beyond sharing their code, I would also recommend that the authors consider writing their manuscripts in R Markdown using the papaja package. This package allows statistical results and tables to be printed in APA format. There are many advantages to writing the paper in the same program where you complete your analysis. Most notably, should you change your analyses or data in any way, the reported results in the paper will automatically update if you print them using a papaja function. This would drastically reduce the incidence of misreported data.
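As a small illustration of that workflow (a sketch reusing our own t-test object from earlier; apa_print() is the relevant papaja function):
### Sketch: apa_print() converts a test object into APA-formatted output
t_out <- t.test(female_pos$frequency, nonfemale_pos$frequency)
apa_print(t_out)$full_result
### In R Markdown, this can also be printed inline with `r apa_print(t_out)$full_result`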
A final recommendation would be to make the raw data available including excluded participants. For this paper, the data file did not include any data from the excluded participants. In this case that was acceptable, because the participants were excluded for reasons which could not be inferred from the data itself. However, in future it is key that authors make the most complete data set available where possible, because discrepancies sometimes arise in verification projects when trying to apply the exclusion criteria outlined by the original authors. As such, to ensure complete transparency, I would recommend including excluded participants in the published data set, accompanied by instructions/guidelines on how they were determined to be exclusions.
Aside from these few recommendations, Carstensen et al. (2020) did fairly well in terms of the reproducibility of their results. All reproducibility goals were successfully fulfilled and there were no major issues along the way.
Burr, D. A., Castrellon, J. J., Zald, D. H., & Samanez-Larkin, G. R. (2020). Emotion dynamics across adulthood in everyday life: Older adults are more emotionally stable and better at regulating desires. Emotion. Advance online publication. https://doi.org/10.1037/emo0000734
Carstensen, L. L., Shavit, Y. Z., & Barnes, J. T. (2020). Age advantages in emotional experience persist even under threat from the COVID-19 pandemic. Psychological Science, 31(11), 1374–1385. https://doi.org/10.1177/0956797620967261
Gross, J. J., Carstensen, L. L., Pasupathi, M., Tsai, J., Skorpen, C. G., & Hsu, A. Y. (1997). Emotion and aging: Experience, expression, and control. Psychology and Aging, 12, 590–599. https://doi.org/10.1037/0882-7974.12.4.590
Simon, R., & Nath, L. (2004). Gender and emotion in the United States: Do men and women differ in self-reports of feelings and expressive behavior? American Journal of Sociology, 109(5), 1137–1176. https://doi.org/10.1086/382111
Stolarski, M., Zajenkowski, M., Jankowski, K. S., & Szymaniak, K. (2020). Deviation from the balanced time perspective: A systematic review of empirical relationships with psychological variables. Personality and Individual Differences, 156, 109772.
Witowska, J., Zajenkowski, M., & Wittmann, M. (2020). Integration of balanced time perspective and time perception: The role of executive control and neuroticism. Personality and Individual Differences, 163, 110061. https://doi.org/10.1016/j.paid.2020.110061