Learning Log 10

Joshua Pham

2021-04-24

The Goal

My goal was to do yet another exploratory analysis.

Progress

As per tradition, let’s load the packages and extract the data.

library(tidyverse)
library(gt)

covid <- "Covid_Data.csv" %>% 
  read.csv()

Although the study I was reproducing focused on correlations between emotion, risk and age, they didn’t mention anything about emotion and risk correlations across different age groups which I think would be an interesting factor to examine.

To do this, I created a data frame (arbitrarily named bah) through selecting the risk variables, emotion frequencies and age. Since I was only interested in average risk, I averaged out the three risk variables and divided the ages into various ranges.

Some data manipulation would be great to make it easier to plot so I pivotted the positive and negative emotion frequencies and renamed them to more appropriate labels.

bah <- covid %>% 
  select(risk_self, risk_pop, risk_comp, avg_f_pos, avg_f_neg, age) %>%   
  mutate(average_risk = (risk_self + risk_comp + risk_pop)/3,  
         age_range = case_when(                                
           between(age, 18, 24) ~ "18-24",
           between(age, 25, 34) ~ "25-34",
           between(age, 35, 44) ~ "35-44",
           between(age, 45, 54) ~ "45-54",
           between(age, 55, 64) ~ "55-64",
           between(age, 65, 76) ~ "65-76"
         )) %>% 
  pivot_longer(
    cols = starts_with("avg"),                            
    names_to = "Frequency",
    values_to = "means") %>% 
  separate(Frequency, c("average", "freqint", "valence"), sep = "_") %>%  
  mutate(valence2 = ifelse(valence == "pos", "Positive Emotions",         
                           ifelse(valence == "neg", "Negative Emotions", valence))) %>% 
  select(average_risk, age_range, valence2, means)

glimpse(bah)
## Rows: 1,890
## Columns: 4
## $ average_risk <dbl> 2.333333, 2.333333, 2.666667, 2.666667, 1.333333, 1.33333~
## $ age_range    <chr> "18-24", "18-24", "18-24", "18-24", "18-24", "18-24", "18~
## $ valence2     <chr> "Positive Emotions", "Negative Emotions", "Positive Emoti~
## $ means        <dbl> 1.8125000, 1.3333333, 1.4545455, 2.3076923, 2.4375000, 1.~

WONDERFUL

Now for the actual plotting.

I’m thinking a nice facet-wrapped line graph should do the trick so let’s see how this goes.

ggplot(bah, aes(x = means, y = average_risk, colour = age_range)) +
  geom_smooth(method = "lm", se = FALSE) +
  theme_bw() +
  theme(panel.background = element_rect(colour = "black",
                                        size = 1/40),
        strip.text.x = element_text(family ="Arial Narrow", size = 14, face = "bold"),
        strip.background = element_blank(),
        axis.title.x = element_text(family ="Arial Narrow", size = 14, face = "bold"),
        axis.title.y = element_text(family = "Arial Narrow", size = 14, face = "bold")) +
  facet_wrap(vars(valence2)) +
  scale_color_discrete(name = "Age Range") +
  xlab("Average Frequency") +
  ylab("Average Perceived Risk")

And it actually worked first try so it might be reasonable to suggest that I actually learnt something about R since the beginning of the term.

Challenges/Successes

Nothing too difficult this week - I completed my final exploratory analysis by plotting a line graph in record time.

Next steps

Seeing as the verification report due date is rapidly approaching, it might be wise to actually start doing it.