Goals

This week, my goal was to finish my verification report by adding comments to my explanatory section. I also hope to try to knit to word/PDF.

Progress

Setup

The following packages were used to produce the verification report.

library(tidyverse) 
library(ggplot2) 
library(here) 
library(dplyr) 
library(papaja) 
library(psych) 
library(MOTE) 
library(wesanderson) 
library(ggpubr) 


master <- read.csv("AgeAdvantagesEmotionCovid_Data.csv", header = TRUE)

Q1 - Does frequency of positive and emotion differ across genders?

One interesting point brought up by the authors was that the age effects in emotional experience held across gender, although there was a significant age-gender interaction for frequency of positive emotions. Prior research into emotional experience suggests that there are some significant differences in how men and women experience certain positive and negative emotions, but that these can be mostly be explained by social factors (Simon & Nath, 2004). As mentioned in Reaction 1, it would be interesting to see whether there are any gender differences in emotional experience in a social environment as stressful as the one created by the COVID-19 pandemic. Specifically, is there a main effect of gender on frequency of positive and negative emotions in the present sample?

To answer this question, I am going to use Jenny’s approach of finding the descriptive statistics, displaying them visually and then use an appropriate statistical test. First, I can find the means and SDs for positive and negative emotion frequency grouped by gender, by using the same summarise() function from the verification section.

### Select relevant variables
explore_1 <- master %>%
  select(gender_bin_f, avg_f_pos, avg_f_neg)

### Acquire the means and SDs of emotion frequency grouped by gender
explore_1_des <- explore_1 %>%
  group_by(gender_bin_f) %>%
  summarise(across(.cols = everything(), na.rm = TRUE, 
                   list(M = mean, SD = sd)))

Next, I can pivot all of the means and SDs to long form similar to what we did when reproducing Table 1.

### Prepare the data for visualisation by pivoting to long-form
explore_1_long <- explore_1 %>%
  pivot_longer(cols = starts_with("avg"),  
               names_to = "emotion",
               values_to = "frequency")

After factoring the emotion variable and renaming the values, we can plot the graph. Using ggplot, and the stat_summary() function, we can add bars for the means, and error bars for the 95% CIs. After making some aesthetic changes using functions already explained in the verification section, we have a nice bar graph. The only new addition is the position_dodge argument which makes sure the specified elements (e.g. error bars) do not overlap which each other.

### Change emotion to a factor
explore_1_long$emotion <- as.factor(explore_1_long$emotion) 

### Recode emotions to table-appropriate names
explore_1_long <- explore_1_long %>% 
  mutate(emotion = recode(emotion,
                          'avg_f_neg' = "Negative",
                          'avg_f_pos' = "Positive"))

### Create plot of emotion frequency, grouped by emotion type and gender.
explore_1_plot <- ggplot(data = explore_1_long, 
                   mapping = aes(
                     x = emotion, y = frequency, fill = gender_bin_f)) +
  stat_summary(fun.y = mean,
               geom = "bar",
               position = "dodge",
               colour = "black") +
  stat_summary(fun.data = mean_cl_normal,
               geom = "errorbar",
               width = .2, position = position_dodge(width = 0.90)) +
  xlab("Emotion Type") +
  ylab("Frequency") +
  scale_fill_manual(name = "Gender of Participants",
                    labels = c("Female", "Non-Female"),
                    values = c("magenta1", "deepskyblue1")) +
  theme_apa() +
  theme(legend.position = c(0.19, 0.85),
        legend.title = element_text(size = 8), 
        legend.text = element_text(size = 9)) +
  ylim(0, 2.5)

print(explore_1_plot)

As shown in the graph, there could be some statistically significant differences in positive and negative emotion frequency between genders. However, this difference doesn’t seem to be of any practical importance. Nevertheless, to test both of these, we can use a t-test and find the Cohen’s d effect size.

First, we create new variables grouping the mean frequency of positive emotions into “Female” and “Non-female”. We can then use the t.test() function to test for a difference between them.

### Create new variables for female and non-female positive emotion frequency.
female_pos <- explore_1_long %>%
  filter(gender_bin_f == "Female", emotion == "Positive")

nonfemale_pos <- explore_1_long %>%
  filter(gender_bin_f == "Non-female", emotion == "Positive")

### Use t.test function
t.test(female_pos$frequency, nonfemale_pos$frequency)

## 
##  Welch Two Sample t-test
## 
## data:  female_pos$frequency and nonfemale_pos$frequency
## t = -2.3124, df = 942.3, p-value = 0.02097
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.15615986 -0.01278386
## sample estimates:
## mean of x mean of y 
##  1.929173  2.013645

Non-female individuals experience on average more positive emotion over a week than female individuals. This result is statistically significant p < .05.

To calculate effect size, we can use the MOTE package, which contains a range of functions designed for easy effect size calculation. Because our test is an independent samples t-test, we use the d.ind.t() function and input the relevant values. After that, we can tell R to print just Cohen’s d.

### Use the functions from the MOTE package to calculate the effect size of the above difference.
pos_freq_es <- d.ind.t(m1 = mean(female_pos$frequency), 
                      m2 = mean(nonfemale_pos$frequency), 
                      sd1 = sd(female_pos$frequency), 
                      sd2 = sd(nonfemale_pos$frequency), 
                      n1 = 465, 
                      n2 = 480, 
                      a = 0.05)

### Print only Cohen's d
pos_freq_es$d

## [1] -0.1504561

The effect would be considered small (d < 0.2).

Now we can repeat the same process for the negative emotions.

### Create new variables for female and non-female negative emotion frequency.
female_neg <- explore_1_long %>%
  filter(gender_bin_f == "Female", emotion == "Negative")

nonfemale_neg <- explore_1_long %>%
  filter(gender_bin_f == "Non-female", emotion == "Negative")

### Use t.test function
t.test(female_neg$frequency, nonfemale_neg$frequency)

## 
##  Welch Two Sample t-test
## 
## data:  female_neg$frequency and nonfemale_neg$frequency
## t = 4.5822, df = 937.45, p-value = 5.222e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1116344 0.2788905
## sample estimates:
## mean of x mean of y 
##  1.516801  1.321539

Once again it revealed a significant result, but we should check the size of the effect.

### Use the functions from the MOTE package to calculate the effect size of the above difference.
neg_freq_es <- d.ind.t(m1 = mean(female_neg$frequency), 
                      m2 = mean(nonfemale_neg$frequency), 
                      sd1 = sd(female_neg$frequency), 
                      sd2 = sd(nonfemale_neg$frequency), 
                      n1 = 465, 
                      n2 = 480, 
                      a = 0.05)

### Use t.test function
neg_freq_es$d

## [1] 0.298371

This is a small-medium effect size (0.2 < d < 0.5). As such, frequency of positive and negative emotions do differ across genders. Non-female individuals seem to experience slightly more positive emotions and less negative emotions on average compared to female individuals. The difference in negative emotional experience is also larger compared to the difference in positive emotional experience. These results are in alignment with previous research and indicate that this effect persists in the COVID-19 environment.

Q2 - Does one’s perceived risk of contracting coronavirus correlate with their perceieved risk of others contracting coronavirus?

As suggested in Reaction 2, the authors perhaps should have focused more on the perceived risk of the virus and its complications to ensure the environment could correctly be defined as one of “prolonged stress”. However, even with their primitive measures of perceived risk, they did not investigate what would have been an interesting question - how closely related is people’s perceive risk of contracting coronavirus and their perceived risk of someone else contracting coronavirus? To achieve this, we can see if there is a correlation between the two variables risk_self and risk_pop. First, we can visualise this relationship with a special kind of scatter plot.

For this plot, we use geom_point() but add position = "jitter" to add some noise to the data points. This is important because plotting data from likert scales is difficult, since the answers often overlap with each other visually. Here the wesanderson package comes in handy by providing a cool colour pallete for our plot.

### Select relevant variables
explore_2 <- master %>%
  select(risk_self, risk_pop)

### Create scatter plot
risk_scatter <- ggplot(explore_2, 
                            aes(risk_self, risk_pop, 
                                colour = as.factor(risk_self))) + 
  geom_point(position = "jitter") +
  ylim(0, 5) +
  xlim(0, 5) +
  xlab("Perceieved Risk to Self") +
  ylab("Perceived Risk to Population") +
  theme_apa() +
  scale_color_manual(values = wes_palette(n = 6, 
                                          name = "Zissou1",
                                          type = "continuous")) +
  theme(legend.position = "none")

print(risk_scatter)

Because these variables are both scores on a Likert scale, to test for a correlation between them, we must use Spearman’s Rho. We can use the base cor.test() function and tell R to just print the estimate and it’s associated p-value.

### Use cor-test and print only the r estimate and p-value
risk_cor <- cor.test(explore_2$risk_self, explore_2$risk_pop, 
                method = "spearman")

print(risk_cor$estimate); print(risk_cor$p.value)

##       rho 
## 0.3449245

## [1] 8.628763e-28

One’s perceived risk of contracting coronavirus is weak-moderately correlated with one’s perceived risk of the general population contracting coronavirus. This is interesting because it suggests that people’s perceived risk of contracting the virus themselves does not match up directly with their perceived risk of someone else contracting it.

Q3 - Is there a relationship between personality (neuroticism/emotional stability) and perception of future limitations (constraint)?

Recent studies by Stolarski et al. (2020) and Witowska et al. (2020) both found that higher levels of neuroticism are associated with less balanced time perspectives. In other words, people high in neuroticism have difficulties balancing their perception of past, present and future time.

The authors of the original paper measured the Big 5 personality traits and the future time horizons of participants, however, they did not investigate any potential relationship between the two. It would be interesting to see whether people lower in emotional stability (high neuroticism) perceive their future to contain more limitations than people higher in emotional stability (low neuroticism).

To investigate this, we can test if there are is correlation between the two variables. Like before, we can see if there is any visually noticeable linear relationship. Again, we can use geom_point(), add some jitter to the data points and a regression line using geom_smooth().

### Select relevant variables
explore_3 <- master %>%
  select(Em_Stability, FTP_Con)

### Create scatter plot
stability_scatter <- ggplot(explore_3, 
                            aes(Em_Stability, FTP_Con)) +
  geom_point(alpha = 5/13, position = "jitter") +
  geom_smooth(method = "lm") +
  xlab("Emotional Stability") +
  ylab("Perception of Future Limitations") +
  theme_apa()

print(stability_scatter)

The scatterplot seems fairly random. The regression line suggests a weak linear association but the data does not seem to satisfy the assumption of linearity. We should check the other assumption - are the two variables normally distributed? We can do this using the ggqqplot() function from the ggpubr package.

### Create normal quantile plot
ggqqplot(explore_3$Em_Stability, ylab = "Emotional Stability")

### Create normal quantile plot
ggqqplot(explore_3$FTP_Con, ylab = "FTP Constraint")

From both of these plots it seems that neither variable satisfies the assumption of normality. This makes sense because likert scales cannot be normally distributed, but checking this assumption allowed us to test a new R skill. To deal with this, we can use the Spearman correlation again for a correlation test.

### Use cor-test and print only the r estimate and p-value
emot_cor <- cor.test(explore_3$Em_Stability, explore_3$FTP_Con, 
                method = "spearman")

print(emot_cor$estimate); print(emot_cor$p.value)

##       rho 
## 0.2195225

## [1] 9.392069e-12

There is some statistically significant correlation, but this result should be interpreted tentatively because the scatterplot does not give a very convincing case for the two variables being linearly related. As such, from these basic analyses, there is insufficient evidence to suggest that neuroticism/emotional stability is related to the perception of future limitations.

Challenges and Successes

The biggest challenge I faced this week was definitely knitting to word/PDF. At first, I was having trouble getting the document to knit at all, because I was receiving lots of error messages. As it turns out, the fonts we used for reproducing Figure 1 aren’t compatible with word or PDF format, so I had to delete those. However, even after knitting to word, the formatting was very ugly. I had to do a lot of editing in the word doc itself to make it look nice and readable. I considered using the papaja package to knit the report in APA format but there were too many issues with the formatting. If I had more time, perhaps I could have figured it out.

As for successes, I did manage to finish my verification report by adding comments to the explanatory section which explained my code.

Next Step

I guess now that the term and project are over there isn’t a next step, but my coding journey is only just beginning. I can’t wait to see what other skills I can develop and what new packages come out in the future that improve the coding experience.

Log Week 10

Luke Keevers

25/04/2021