The Goal

My goal this week was to finish my last two exploratory analyses, which consists of two questions:

Does extraversion affect the frequency and intensity of positive and negative emotions and does this change depending on whether or not people live alone?
Does perceived risk of COVID change people’s outlook on the longevity of the future and does the degree to which people’s employment is affected influence this effect?

Progress

Extraversion, Emotions and Living Alone

For the first question regarding extraversion, frequency and intensity of positive and negative emotions and living alone, let’s get the relevant variables into one data frame and do some manipulation. Since this graph quite closely resembles the reproduced figure in the study, I can reuse some code to make my life a little easier (but with a few improvements).

# load relevant packages

library(tidyverse) # for dplyr and ggplot
library(gt) # nice tables
library(ggpubr) # for more data visualisation functions 

# total raw data from data file

covid <- "Covid_Data.csv" %>% 
  read_csv()

# recycling some code from scatterplot made from reproducibility project to create data set

one <- covid %>% 
  select(avg_f_pos, avg_f_neg, avg_i_pos, avg_i_neg, Extraversion, livealone) %>% 
  pivot_longer(
    cols = starts_with("avg"), 
    names_to = "fivalence",
    values_to = "means") %>% 
  separate(fivalence, c("average", "freqint", "valence"), sep = "_") %>% 
  select(Extraversion, freqint, valence, means, livealone) %>% 
  mutate(freqint2 = ifelse(freqint == "i", "Intensity of \n Emotional Experience",
                           ifelse(freqint == "f", "Frequency of \n Emotional Experience", 
                                  freqint))) %>% 
  mutate(valence2 = ifelse(valence == "pos", "Positive Emotions", 
                           ifelse(valence == "neg", "Negative Emotions", valence))) %>% 
  select(Extraversion, freqint2, valence2, means, livealone)

one

## # A tibble: 3,780 x 5
##    Extraversion freqint2                         valence2        means livealone
##           <dbl> <chr>                            <chr>           <dbl> <chr>    
##  1          6   "Frequency of \n Emotional Expe~ Positive Emoti~  1.81 No       
##  2          6   "Frequency of \n Emotional Expe~ Negative Emoti~  1.33 No       
##  3          6   "Intensity of \n Emotional Expe~ Positive Emoti~  2.25 No       
##  4          6   "Intensity of \n Emotional Expe~ Negative Emoti~  2.33 No       
##  5          2.5 "Frequency of \n Emotional Expe~ Positive Emoti~  1.45 No       
##  6          2.5 "Frequency of \n Emotional Expe~ Negative Emoti~  2.31 No       
##  7          2.5 "Intensity of \n Emotional Expe~ Positive Emoti~  1.15 No       
##  8          2.5 "Intensity of \n Emotional Expe~ Negative Emoti~  2.36 No       
##  9          1.5 "Frequency of \n Emotional Expe~ Positive Emoti~  2.44 No       
## 10          1.5 "Frequency of \n Emotional Expe~ Negative Emoti~  1    No       
## # ... with 3,770 more rows

So far so good

And now for the graph where I can again recycle some code but now with calculated correlations and p-values. I also removed the standard error zones since the regression lines were so close it would be difficult to read otherwise.

aight <- ggplot(one, aes(x = Extraversion, y = means, colour = livealone)) + 
  stat_smooth(method = "lm", size = 1.5, se = FALSE) +
  theme_bw() +
  theme(panel.background = element_rect(colour = "black",
                                        size = 1/40),
        strip.text.x = element_text(family="Arial Narrow", size = 14, face = "bold"),
        strip.text.y = element_text(family="Arial Narrow", size = 13, face = "bold"),
        strip.placement = "outside",
        strip.background = element_blank(),
        axis.title.x = element_text(family="Arial Narrow", size = 14, face = "bold")) + 
  facet_grid(freqint2 ~ valence2, switch = "y") +
  xlab("Extraversion Score") +
  ylab("") +
  scale_color_discrete(name = "Living Alone") +
  stat_cor(aes(colour = livealone))

print(aight)

Surprisingly, overall the more extraverted people experienced more frequent and intense positive emotions and less frequent and intense negative emotions as a result of the pandemic compared to introverted people. It would be reasonable to predict the opposite effect, where introverted people had a greater emotional wellbeing than their more extraverted peers due to rejuvenating aspect of predominantly staying at home and spending more time on their own. When the factor of living alone comes into play, the results get increasingly confusing as introverts who live alone have fewer negative emotions than introverts who live with others, which is to be expected, but these moments are more intense.

There is also a very nice interaction effect from living alone and extraversion in the frequency of positive emotions experienced that I will not elaborate on for the sake of brevity (and declining mental resources).

Future Time Perspective, Perceived Risk and Affected Employment

For the final exploratory analysis, I thought it would be interesting to examine the relationship between perceived risk and perceived openness of the future with different employment situations.

The first step is to manipulate the data a bit. Since my intended plot for this data includes a categorical measure of employment impacted by COVID classed as either High or Low and it was a continuous variable in the original data, I needed to find a way to categorise ranges of values as separate levels. Through extensive Google, I discovered the case_when and between functions which allow you to do this very thing. I also was looking at overall perceived risk so I created a new variable that was the average of risk to population, risk to self and risk of complications and then I was good to go to create the plot.

blep <- covid %>% 
  select(FTP_av, risk_self, risk_pop, risk_comp, emp_affect) %>%
  mutate(risk_avg = (risk_self + risk_comp + risk_pop)/3,   # average perceived risk
         diff_emp = case_when(                               
           between(emp_affect, 0, 1) ~ "Low",               # employment impacted the least (rated 0 or 1) classed as 'Low'
           between(emp_affect, 2, 4) ~ "High"               # employment impacted the most (rated 2 to 4) classed as 'High'
         )) %>% 
  select(FTP_av, risk_avg, diff_emp)

blep

## # A tibble: 945 x 3
##    FTP_av risk_avg diff_emp
##     <dbl>    <dbl> <chr>   
##  1    4.6     2.33 Low     
##  2    4.2     2.67 Low     
##  3    6.2     1.33 High    
##  4    6.6     2.67 High    
##  5    6.6     2    Low     
##  6    3.8     2.33 Low     
##  7    5.2     2.33 High    
##  8    5.5     3    High    
##  9    4.2     2.67 Low     
## 10    4       1.67 High    
## # ... with 935 more rows

For the plot itself, I decided to introduce some variety and use a ggpubr function since it creates very nice-looking plots with a few lines of code.

ggscatter(blep, x = "risk_avg", y = "FTP_av",
          size = 1.5,
          alpha = 0.1,                                 # points made more transparent
          colour = "diff_emp",                          
          xlab = "Average Perceived Risk",
          ylab = "Average Future Time Perspective",
          legend = "right",                            # legend position on the right
          add = "reg.line") +                          # add a regression line 
  scale_colour_discrete(name = "Affect on Employment") +
  stat_cor(aes(colour = diff_emp), label.x = 4)        # Correlation coefficient

Oh, where is the colour? Where are the high and low regression lines? Is the colour = “diff_emp” failing me?!

After wasting my whole afternoon, it turns out my only crime here was being Australian. I seemed to have spelt ~~colour~~ color wrong and once I fixed that ‘mistake’, suddenly life flowed into the plot and this beauty surfaced.

ggscatter(blep, x = "risk_avg", y = "FTP_av",
          size = 1.5,
          alpha = 0.1,                                 
          color = "diff_emp",                  # make sure you spell this word correctly        
          xlab = "Average Perceived Risk",
          ylab = "Average Future Time Perspective",
          legend = "right",                            
          add = "reg.line") +                          
  scale_colour_discrete(name = "Affect on Employment") +
  stat_cor(aes(colour = diff_emp), label.x = 4)

Ooh splended, another interaction

And there you go - God bless America.

Challenges/Successes

For the first time in a while, I didn’t experience any challenges at all - with the obvious exception being some semantic spelling issues.

I managed to complete my exploratory analyses with little to no difficulty and I would like to think it is because I have become infinitely more proficient in coding but let’s not get ahead of ourselves.

Next Steps

Now that the exploratory analysis is done, I’ll probably start working on the next parts of the verification report and maybe have some fun with coding and make some random plots or tables or do some more stats using R and the information from Jenny’s workshops.

Learning Log 8

Joshua Pham

10/04/2021