Goals

This week I hope to start exploring my COVID paper dataset. I have a few ideas for some questions I can investigate. I plan to use Jenny’s method of descriptives-visualisation-statistics to answer the questions.

Progress

library(tidyverse) # Allows for easy data wrangling with shared language across packages
library(ggplot2) # Allows for detailed data visualisation
library(here) # Allows for construction of a path to project files
library(dplyr) # Allows for simple data manipulation
library(papaja) # Allows for APA formatting of results and tables
library(psych) # Includes "describe" function (M and SD)
library(MOTE) # Effect size calculator
library(wesanderson) # Cool colour palettes for graphs
library(ggpubr) # Allows for quick creation of quantile plots


master <- read.csv("AgeAdvantagesEmotionCovid_Data.csv", header = TRUE)

Q1 - Does frequency of positive and emotion differ across genders?

To answer this question, I am going to use Jenny’s approach of finding the descriptive statistics, displaying them visually and then use an appropriate statistical test. First, I can find the means and SDs for positive and negative emotion frequency grouped by gender.

explore_1 <- master %>%
  select(gender_bin_f, avg_f_pos, avg_f_neg)

explore_1_des <- explore_1 %>%
  group_by(gender_bin_f) %>%
  summarise(across(.cols = everything(), na.rm = TRUE, 
                   list(M = mean, SD = sd)))

Next, I can display the means visually with a bar graph, whilst also displaying 95% CI limits.

explore_1_long <- explore_1 %>%
  pivot_longer(cols = starts_with("avg"),  
               names_to = "emotion",
               values_to = "frequency")

explore_1_long$emotion <- as.factor(explore_1_long$emotion) 

explore_1_long <- explore_1_long %>% 
  mutate(emotion = recode(emotion,
                          'avg_f_neg' = "Negative",
                          'avg_f_pos' = "Positive"))

explore_1_plot <- ggplot(data = explore_1_long, 
                   mapping = aes(
                     x = emotion, y = frequency, fill = gender_bin_f)) +
  stat_summary(fun.y = mean,
               geom = "bar",
               position = "dodge",
               colour = "black") +
  stat_summary(fun.data = mean_cl_normal,
               geom = "errorbar",
               width = .2, position = position_dodge(width = 0.90)) +
  xlab("Emotion Type") +
  ylab("Frequency") +
  scale_fill_manual(name = "Gender of Participants",
                    labels = c("Female", "Non-Female"),
                    values = c("magenta1", "deepskyblue1")) +
  theme_apa() +
  theme(legend.position = c(0.19, 0.85),
        legend.title = element_text(size = 8), 
        legend.text = element_text(size = 9)) +
  ylim(0, 2.5)

print(explore_1_plot)

As shown in the graph, there could be some statistically significant differences in positive and negative emotion frequency between genders. However, this difference doesn’t seem to be of any practical importance. To test both of these, we can use a t-test and find the Cohen’s d effect size.

female_pos <- explore_1_long %>%
  filter(gender_bin_f == "Female", emotion == "Positive")

nonfemale_pos <- explore_1_long %>%
  filter(gender_bin_f == "Non-female", emotion == "Positive")

t.test(female_pos$frequency, nonfemale_pos$frequency)
## 
##  Welch Two Sample t-test
## 
## data:  female_pos$frequency and nonfemale_pos$frequency
## t = -2.3124, df = 942.3, p-value = 0.02097
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.15615986 -0.01278386
## sample estimates:
## mean of x mean of y 
##  1.929173  2.013645

Non-female individuals experience on average more positive emotion over a week than female individuals. This result is statistically significant p < .05.

pos_freq_es <- d.ind.t(m1 = mean(female_pos$frequency), 
                      m2 = mean(nonfemale_pos$frequency), 
                      sd1 = sd(female_pos$frequency), 
                      sd2 = sd(nonfemale_pos$frequency), 
                      n1 = 465, 
                      n2 = 480, 
                      a = 0.05)
pos_freq_es$d
## [1] -0.1504561

The effect would be considered small (d < 0.2).

Now we can repeat the same process for the negative emotions.

female_neg <- explore_1_long %>%
  filter(gender_bin_f == "Female", emotion == "Negative")

nonfemale_neg <- explore_1_long %>%
  filter(gender_bin_f == "Non-female", emotion == "Negative")

t.test(female_neg$frequency, nonfemale_neg$frequency)
## 
##  Welch Two Sample t-test
## 
## data:  female_neg$frequency and nonfemale_neg$frequency
## t = 4.5822, df = 937.45, p-value = 5.222e-06
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  0.1116344 0.2788905
## sample estimates:
## mean of x mean of y 
##  1.516801  1.321539

Once again it revealed a significant result, but we should check the size of the effect.

neg_freq_es <- d.ind.t(m1 = mean(female_neg$frequency), 
                      m2 = mean(nonfemale_neg$frequency), 
                      sd1 = sd(female_neg$frequency), 
                      sd2 = sd(nonfemale_neg$frequency), 
                      n1 = 465, 
                      n2 = 480, 
                      a = 0.05)
neg_freq_es$d
## [1] 0.298371

This is a small-medium effect size (0.2 < d < 0.5). As such, frequency of positive and negative emotions do differ across genders. Non-female individuals seem to experience slightly more positive emotions and less negative emotions on average compared to female individuals. The difference in negative emotional experience is also larger compared to the difference in positive emotional experience.

Q2 - Does one’s perceived risk of contracting coronavirus correlate with their perceieved risk of others contracting coronavirus?

It would be interesting to see if people perceive themselves to be as likely to contract coronavirus as other people. To achieve this, we can see if there is a correlation between the two variables risk_self and risk_pop. First, we can visualise this relationship with a special kind of scatter plot.

explore_2 <- master %>%
  select(risk_self, risk_pop)

risk_scatter <- ggplot(explore_2, 
                            aes(risk_self, risk_pop, 
                                colour = as.factor(risk_self))) + 
  geom_point(position = "jitter") +
  ylim(0, 5) +
  xlim(0, 5) +
  xlab("Perceieved Risk to Self") +
  ylab("Perceived Risk to Population") +
  theme_apa() +
  scale_color_manual(values = wes_palette(n = 6, 
                                          name = "Zissou1",
                                          type = "continuous")) +
  theme(legend.position = "none")

print(risk_scatter)

Because these variables are both scores on a Likert scale, we must use Spearman’s Rho.

risk_cor <- cor.test(explore_2$risk_self, explore_2$risk_pop, 
                method = "spearman")

print(risk_cor$estimate); print(risk_cor$p.value)
##       rho 
## 0.3449245
## [1] 8.628763e-28

One’s perceived risk of contracting coronavirus is weak-moderately correlated with one’s perceived risk of the general population contracting coronavirus.

Q3 - Is there a relationship between personality (neuroticism/emotional stability) and perception of future limitations (constraint)?

The authors of the original paper measured the Big 5 personality traits and the future time horizons of participants, however, they did not investigate any potential relationship between the two. It would be interesting to see whether people lower in emotional stability (high neuroticism) perceive their future to contain more limitations than people higher in emotional stability (low neuroticism). To investigate this, we can test if there are is correlation between the two variables. Like before, we can see if there is any visually noticeable linear relationship.

explore_3 <- master %>%
  select(Em_Stability, FTP_Con)

stability_scatter <- ggplot(explore_3, 
                            aes(Em_Stability, FTP_Con)) +
  geom_point(alpha = 5/13, position = "jitter") +
  geom_smooth(method = "lm") +
  xlab("Emotional Stability") +
  ylab("Perception of Future Limitations") +
  theme_apa()

print(stability_scatter)

The scatterplot seems fairly random. The regression line suggests a weak linear association but the data does not seem to satisfy the assumption of linearity. We should check the other assumption - are the two variables normally distributed? We can do this using the ggpubr package.

ggqqplot(explore_3$Em_Stability, ylab = "Emotional Stability")

ggqqplot(explore_3$FTP_Con, ylab = "FTP Constraint")

From both of these plots it seems that neither variable satisfies the assumption of normality. To deal with this, we can use the Spearman correlation again.

emot_cor <- cor.test(explore_3$Em_Stability, explore_3$FTP_Con, 
                method = "spearman")

print(emot_cor$estimate); print(emot_cor$p.value)
##       rho 
## 0.2195225
## [1] 9.392069e-12

There is some statistically significant correlation, but this result should be interpreted tentatively because the scatterplot does not give a very convincing case for the two variables being linearly related.

Challenges and Successes

Most of the challenges I encountered this week came from answering my second explorative question. I found it difficult to display and analyse the data for two ordinal, Likert scale variables. After some investigation, I found a relatively useful way to display the data and figured I could use Spearman’s correlation. I also realised I forgot to put comments on my code like I said I was going to do from now on :/ Oh well I will leave that for my future self to do :).

In terms of successes, I managed to complete the Exploration part of my verification report, and put my ggplot skills to use in creating purely original plots. I also did some basic statistical analysis (which I hope was accurate haha).

Next Step

My next step will be to explore R a little further and investigate some new packages, like ggpubr, as well as refine my code a little more.