The Goal
My goal was to do yet another exploratory analysis.
Progress
As per tradition, let’s load the packages and extract the data.
library(tidyverse)
library(gt)
covid <- "Covid_Data.csv" %>%
read.csv()Although the study I was reproducing focused on correlations between emotion, risk and age, they didn’t mention anything about emotion and risk correlations across different age groups which I think would be an interesting factor to examine.
To do this, I created a data frame (arbitrarily named bah) through selecting the risk variables, emotion frequencies and age. Since I was only interested in average risk, I averaged out the three risk variables and divided the ages into various ranges.
Some data manipulation would be great to make it easier to plot so I pivotted the positive and negative emotion frequencies and renamed them to more appropriate labels.
bah <- covid %>%
select(risk_self, risk_pop, risk_comp, avg_f_pos, avg_f_neg, age) %>%
mutate(average_risk = (risk_self + risk_comp + risk_pop)/3,
age_range = case_when(
between(age, 18, 24) ~ "18-24",
between(age, 25, 34) ~ "25-34",
between(age, 35, 44) ~ "35-44",
between(age, 45, 54) ~ "45-54",
between(age, 55, 64) ~ "55-64",
between(age, 65, 76) ~ "65-76"
)) %>%
pivot_longer(
cols = starts_with("avg"),
names_to = "Frequency",
values_to = "means") %>%
separate(Frequency, c("average", "freqint", "valence"), sep = "_") %>%
mutate(valence2 = ifelse(valence == "pos", "Positive Emotions",
ifelse(valence == "neg", "Negative Emotions", valence))) %>%
select(average_risk, age_range, valence2, means)
glimpse(bah)## Rows: 1,890
## Columns: 4
## $ average_risk <dbl> 2.333333, 2.333333, 2.666667, 2.666667, 1.333333, 1.33333~
## $ age_range <chr> "18-24", "18-24", "18-24", "18-24", "18-24", "18-24", "18~
## $ valence2 <chr> "Positive Emotions", "Negative Emotions", "Positive Emoti~
## $ means <dbl> 1.8125000, 1.3333333, 1.4545455, 2.3076923, 2.4375000, 1.~
WONDERFUL
Now for the actual plotting.
I’m thinking a nice facet-wrapped line graph should do the trick so let’s see how this goes.
ggplot(bah, aes(x = means, y = average_risk, colour = age_range)) +
geom_smooth(method = "lm", se = FALSE) +
theme_bw() +
theme(panel.background = element_rect(colour = "black",
size = 1/40),
strip.text.x = element_text(family ="Arial Narrow", size = 14, face = "bold"),
strip.background = element_blank(),
axis.title.x = element_text(family ="Arial Narrow", size = 14, face = "bold"),
axis.title.y = element_text(family = "Arial Narrow", size = 14, face = "bold")) +
facet_wrap(vars(valence2)) +
scale_color_discrete(name = "Age Range") +
xlab("Average Frequency") +
ylab("Average Perceived Risk")And it actually worked first try so it might be reasonable to suggest that I actually learnt something about R since the beginning of the term.
Challenges/Successes
Nothing too difficult this week - I completed my final exploratory analysis by plotting a line graph in record time.
Next steps
Seeing as the verification report due date is rapidly approaching, it might be wise to actually start doing it.