My goals for this week:
This week my goals were to attend the Q and A sessions to have my questions answered about my exploratory analysis, as well as run significance tests on my current exploratory analysis questions.
We also received the rubric for our verification reports, so my other aim was to either tweak my current exploratory analysis questions or come up with new ones in the hopes that they will fit the HD criteria.
Successes and challenges:
During the Q and A session on Thursday, I was the only student there, so I was very lucky to have Jenny and Kate there to answer my questions and help me with brainstorming for other possible exploratory analysis questions.
Upon looking further at the study 4 dataset, we found that the values given to some of the variables in SPSS were inconsistent with the values of the raw data (if that makes sense). Hopefully the screenshots below will help explain it a bit more!
So here, the values on this Likert scale range from 28 to 32. The values for “moderately applies” and “perfectly applicable” also appear to be in the wrong order, so I will have to double-check the German version of this dataset.
On the other hand, the values of the raw data range from 1 to 5, so I’m a littleeeeee bit confused.
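One way to pin down a mismatch like this is to compare the codes that carry value labels with the codes that actually occur in the data. Here is a minimal base R sketch of that check. The vectors `label_codes` and `raw_responses` are made-up stand-ins, not the real study 4 columns.

```r
# Hypothetical stand-ins: the real values come from the study 4 SPSS file.
label_codes   <- 28:32                   # codes that carry value labels in SPSS
raw_responses <- c(1, 3, 5, 2, 4, 4, 1)  # made-up responses on the 1-5 scale

# Codes that occur in the data but have no label attached
unlabelled <- setdiff(sort(unique(raw_responses)), label_codes)
print(unlabelled)

# Label codes that never occur in the data
unused_labels <- setdiff(label_codes, raw_responses)
print(unused_labels)
```

If every observed code turns up as unlabelled, the labels and the data are on different scales, which is a concrete thing to raise when checking the German version.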
Aside from that, Jenny and Kate said that the exploratory questions that I included in my last learning log would be good for this particular data set, which gave me a lot of reassurance.
However, based on the rubric we were given, and what Jenny said yesterday about how our questions need to be backed up by the literature, I think I will come up with some more questions from the study 1 data set that might be considered more literature-based, just in case.
So my updated questions include:
- Whether the country you live in has an influence on your perceived vulnerability
  - A limitation is that this study was only done in Western countries, so it is not a representative sample. I will therefore gear this question towards Western societies and make it clear that the analysis applies only to them and not to the general population.
- Whether empathy has an influence on people’s opinions on the prescribed COVID measures (whether they believe they are necessary or too much)
- Whether the number of people in a household has an influence on wearing a mask
  - This is one of my old ones that I thought was still worth exploring, as I hypothesised that more people in a household would result in higher motivation to wear a mask
Although I was able to successfully create the descriptive statistics for each of my three questions, I was not able to create bar graphs for my first two questions. I have put screenshots of the code for those two bar graphs. I have obviously done things that are completely wrong, but not even Google could help me figure out what the error messages mean :(
# Country vs. Perceived vulnerability
##Q20 --> Variable name for (Western) country you live in
##Q24_1 --> Variable name for perceived vulnerability to COVID-19
### Descriptive statistics
exploreQ1 <- Study1USA %>%
  group_by(Q20) %>%
  select(Q24_1) %>%
  summarise(mean = mean(Q24_1), sd = sd(Q24_1), n = n(),
            se = sd / sqrt(n)) %>%
  mutate(Q20 = case_when(
    Q20 == 1 ~ "United States of America",
    Q20 == 2 ~ "United Kingdom",
    Q20 == 3 ~ "Ireland",
    Q20 == 4 ~ "Canada",
    Q20 == 5 ~ "Other"))  # assumes 5 is the code for "Other"; a repeated `Q20 == 4` never matches and leaves NA
## Adding missing grouping variables: `Q20`
gt(exploreQ1)

| Q20 | mean | sd | n | se |
|---|---|---|---|---|
| United States of America | 4.291139 | 0.9971629 | 316 | 0.0560948 |
| United Kingdom | 5.000000 | NA | 1 | NA |
| Canada | 4.500000 | 0.7071068 | 2 | 0.5000000 |
| NA | 4.666667 | 0.5773503 | 3 | 0.3333333 |
Screenshot of the code I tried to run for my bar graph on “Country vs. Perceived vulnerability”
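For comparison, here is a minimal sketch of how a bar graph for this question could be built, following the same pattern as the household-size plot further down. It assumes `ggplot2` is installed. The data frame `exploreQ1_demo` is a hypothetical stand-in that copies the summary values from the table above, and in the real script `exploreQ1` would take its place. One common cause of ggplot errors is a missing `+` at the end of a line, since every layer has to be joined to the previous one with `+`. Whether that is what went wrong in the screenshot is not something this sketch can confirm.

```r
library(ggplot2)

# Hypothetical stand-in for exploreQ1, using the values from the table above
exploreQ1_demo <- data.frame(
  Q20  = c("United States of America", "United Kingdom", "Canada", NA),
  mean = c(4.291139, 5.000000, 4.500000, 4.666667)
)

# Every line except the last ends with `+` so the layers are chained together
exploreQ1_plot <- ggplot(exploreQ1_demo, aes(x = Q20, y = mean, fill = Q20)) +
  geom_col() +
  labs(title = "Perceived vulnerability to COVID-19 across countries",
       x = "Country",
       y = "Mean perceived vulnerability") +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 6))

print(exploreQ1_plot)
```

Because the country names are text, `fill = Q20` gives each bar its own colour rather than a gradient.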
# Influence of empathy on opinions of COVID-19 measures
## Q22_5 --> Empathy question: "I am quite moved by what can happen to those most vulnerable to COVID-19"
### Rated from 1 ("Strongly disagree" -> little to no empathy) to 5 ("Strongly agree" -> highly empathetic)
## Q25_3 --> relates to participants' opinions on whether COVID-19 measures should be followed
## Descriptive statistics part 1
exploreQ2 <- Study1USA %>%
  group_by(Q22_5) %>%
  select(Q25_4) %>%
  summarise(mean = mean(Q25_4), sd = sd(Q25_4), n = n(),
            se = sd / sqrt(n)) %>%
  mutate(Q22_5 = case_when(
    Q22_5 == 1 ~ "Strongly disagree",
    Q22_5 == 2 ~ "Somewhat disagree",
    Q22_5 == 3 ~ "Neither agree nor disagree",
    Q22_5 == 4 ~ "Somewhat agree",
    Q22_5 == 5 ~ "Strongly agree"))
## Adding missing grouping variables: `Q22_5`
gt(exploreQ2)

| Q22_5 | mean | sd | n | se |
|---|---|---|---|---|
| Strongly disagree | 3.428571 | 1.7182494 | 7 | 0.64943722 |
| Somewhat disagree | 4.363636 | 0.8090398 | 11 | 0.24393469 |
| Neither agree nor disagree | 4.433333 | 0.6789106 | 30 | 0.12395154 |
| Somewhat agree | 4.600000 | 0.5617796 | 110 | 0.05356358 |
| Strongly agree | 4.884146 | 0.3572113 | 164 | 0.02789351 |
Screenshot of the code I tried to run for my bar graph on “Influence of empathy on opinions of COVID-19 measures”
#Household size vs. Motivation to wear a face mask
## Descriptive statistics
explore2$Household_size <- as.numeric(as.character(explore2$Household_size))
exploreQ3 <- explore2 %>%
  group_by(Household_size) %>%
  select(Q22_1) %>%
  summarise(mean = mean(Q22_1), sd = sd(Q22_1), n = n(),
            se = sd / sqrt(n))
## Adding missing grouping variables: `Household_size`
gt(exploreQ3)

| Household_size | mean | sd | n | se |
|---|---|---|---|---|
| 1 | 3.711370 | 1.187687 | 343 | 0.06412909 |
| 2 | 3.889105 | 1.206447 | 514 | 0.05321410 |
| 3 | 3.959752 | 1.164267 | 323 | 0.06478156 |
| 4 | 3.710638 | 1.216558 | 235 | 0.07935953 |
| 5 | 3.962500 | 1.095951 | 80 | 0.12253100 |
| 6 | 3.809524 | 1.289149 | 21 | 0.28131534 |
| 7 | 3.200000 | 1.483240 | 5 | 0.66332496 |
| 8 | 5.000000 | NA | 1 | NA |
| 9 | 5.000000 | NA | 1 | NA |
| 11 | 3.000000 | NA | 1 | NA |
| 12 | 5.000000 | NA | 1 | NA |
| 20 | 1.000000 | NA | 1 | NA |
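Since one goal for the week was running significance tests, here is a minimal sketch of one option for the household-size question: Spearman's rank correlation with base R's `cor.test()`. The choice of test is an assumption, not something the rubric prescribes. The data frame `demo` is made-up random data that only stands in for `explore2`, so the numbers it prints mean nothing about the real study.

```r
set.seed(1)  # made-up data, fixed seed only so the example is repeatable

# Hypothetical stand-in for explore2: one row per participant
demo <- data.frame(
  Household_size = sample(1:6, 200, replace = TRUE),
  Q22_1          = sample(1:5, 200, replace = TRUE)
)

# Spearman's rank correlation suits ordinal ratings;
# exact = FALSE because tied ranks rule out an exact p-value
result <- cor.test(demo$Household_size, demo$Q22_1,
                   method = "spearman", exact = FALSE)

print(result$estimate)  # rho, the strength and direction of the association
print(result$p.value)
```

Working from the participant-level rows rather than the summary table keeps the very small groups (household sizes with n = 1) from being treated as if they were as reliable as the large ones.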
## bar plot
exploreQ3_plot <- exploreQ3 %>%
  ggplot(aes(x = Household_size, y = mean, fill = Household_size)) +
  geom_col() +
  labs(title = "Motivation to wear a face mask across household size",
       x = "Number of people in the household",
       y = "Motivation to wear a face mask") +
  scale_y_continuous(expand = c(0, 0),
                     limits = c(0, 6))
print(exploreQ3_plot)
Next steps in my coding journey:
My next steps in my coding journey are to continue working on my verification report and try and figure out how to resolve the error messages on my first two bar graphs. At least I will have plenty of questions to ask during the next Q and A (though I know I will not have the luxury of being the only student there again)!
I will probably post on the “coding” slack page as well, so that I can get my questions answered ahead of time!
I will also double-check whether my questions are grounded in the literature; if I have doubts, I will probably change them again. This will also depend on whether I can solve the issues I am having and on the outcomes of my statistical analyses.