Goals
This week I wanted to start digging through the data for some questions to ask. I achieved this goal through a very big week of coding - finding statistics and plotting them for two questions.
Successes:
I am toying around with the following questions to look into:
- age and ability to identify symptoms
- gender and adherence
- gender and perceived risk to self or others
- adherence and region
- perceived immunity and level of education
Here is my box plot so far for gender and adherence. I just need to fix the error bars and play around with the aesthetics. I used the functions ggplot(), geom_col() and stat_boxplot().
Here is a table of statistics I made for this relationship. I had to create a mean adherence score across all three adhere variables and then use the group_by() function for gender. After that, the process just involved finding the statistics like I did in Jenny’s Q&A session.
Here is where I’m at with investigating age and ability to name symptoms:
library(tidyverse) # attaches ggplot2 and dplyr as well
library(haven)
library(ggeasy)
library(corrplot)
library(ggpubr)
library(jmv)
library(rstatix)
library(here)
library(car)
library(psych)
library(janitor)
library(gt)
# read in the SPSS data file
week8data <- read_sav("data_data.sav")

# mean, sd, n and se of symptom identification for each age group
investigate2 <- week8data %>%
  group_by(age_categories) %>%
  summarise(mean = mean(Sx_covid_nomissing), sd = sd(Sx_covid_nomissing), n = n(),
            se = sd / sqrt(n))

# same statistics, with the age codes swapped for readable labels for the table
investigate22 <- investigate2 %>%
  mutate(age_categories = case_when(age_categories == 1 ~ "18-24 years",
                                    age_categories == 2 ~ "25-34 years",
                                    age_categories == 3 ~ "35-44 years",
                                    age_categories == 4 ~ "45-54 years",
                                    age_categories == 5 ~ "55+ years"))
gt(investigate22)

| age_categories | mean | sd | n | se |
|---|---|---|---|---|
| 18-24 years | 0.4704641 | 0.4993025 | 1422 | 0.01324079 |
| 25-34 years | 0.5290270 | 0.4993609 | 1223 | 0.01427912 |
| 35-44 years | 0.6220096 | 0.4851174 | 1045 | 0.01500682 |
| 45-54 years | 0.6364903 | 0.4813451 | 718 | 0.01796364 |
| 55+ years | 0.6944285 | 0.4607814 | 1741 | 0.01104322 |
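The se column in this table is the ingredient for error bars. As a minimal sketch (not the plot from this log), the table can be retyped by hand and drawn with geom_col() plus geom_errorbar(). The names age_summary and age_plot are made up for this example, and the numbers are rounded from the table above.

```r
library(ggplot2)

# Values retyped (rounded) from the gt table above; age_summary is a
# hypothetical name used only for this sketch.
age_summary <- data.frame(
  age_categories = c("18-24 years", "25-34 years", "35-44 years",
                     "45-54 years", "55+ years"),
  mean = c(0.470, 0.529, 0.622, 0.636, 0.694),
  se   = c(0.0132, 0.0143, 0.0150, 0.0180, 0.0110)
)

# Columns show the group means; error bars run from mean - se to mean + se.
age_plot <- ggplot(age_summary, aes(x = age_categories, y = mean)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2) +
  labs(x = "age group", y = "Average ability to identify COVID-19 symptoms")

print(age_plot)
```

Using one standard error either side of the mean is only one convention; a 95% confidence interval (roughly 1.96 times se) would be another reasonable choice.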
To be honest, at the beginning of the semester I thought coding was only for the super tech savvy or the FBI.
But not only have I reproduced a research report, I have also done statistics in R code!
During the coding tutorial on Tuesday I was able to create correlational data using R, and it looks so aesthetic that I am very proud. Here is the code I used: Here is the plot:
Challenges
- Error in XYZ: object ‘my_penguins’ not found.
This error first appeared for my COVID data. Jenny helped me out, identifying that I should read in the fresh csv file in this Rmd, and this worked. However, the same error message then appeared for the penguins data, and because I don’t have the csv for penguins I wasn’t able to use this fix. So I cut my losses and just screenshotted my data and output, because a finished learning log is better than a perfect learning log.
- For my first investigation, gender and adherence, I am having trouble getting the sd and se to print in the table. I would also like to do a significance test for this investigation, as the difference doesn’t look very large in the box plot. Here is the code I am using:
covid_data_investigate <- week8data %>%
group_by(gender) %>% na.omit() %>%
select(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends) %>%
summarise(mean = mean(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends, sd = sd(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends), n=n(), se = sd/sqrt(n)))
print(covid_data_investigate)
## # A tibble: 2 x 2
## gender mean
## <dbl+lbl> <dbl>
## 1 1 [Male] 0.575
## 2 2 [Female] 0.646
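A likely reason only the mean column prints: in the summarise() call above, the closing bracket of mean() sits at the very end of the line, so sd, n and se are handed to mean() instead of summarise(). On top of that, mean() only averages its first argument and reads any further unnamed arguments as options, so the other two adherence columns are not being averaged in. The sketch below shows one way around both problems, and adds a Welch t-test as one possible significance test. It runs on a small made-up data frame (toy_data, with items assumed to be coded 0/1), not on week8data, and all the toy_ names are invented for this example.

```r
library(dplyr)

# Hypothetical toy data standing in for week8data; the real file is not
# reproduced here. Adherence items are assumed to be coded 0/1.
toy_data <- data.frame(
  gender = c(1, 1, 1, 2, 2, 2),
  Adhere_shop_groceries = c(1, 0, 1, 1, 1, 0),
  Adhere_shop_other     = c(0, 0, 1, 1, 1, 1),
  Adhere_meet_friends   = c(1, 0, 0, 1, 0, 1)
)

# Step 1: one mean adherence score per person (row-wise across the items).
toy_scored <- toy_data %>%
  mutate(adhere_mean = rowMeans(
    cbind(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends)
  ))

# Step 2: summarise per gender; each statistic is its own argument
# to summarise(), and every bracket closes before the next comma.
toy_summary <- toy_scored %>%
  group_by(gender) %>%
  summarise(
    mean = mean(adhere_mean),
    sd   = sd(adhere_mean),
    n    = n(),
    se   = sd / sqrt(n)
  )
print(toy_summary)

# Step 3: Welch two-sample t-test comparing the two gender groups.
toy_test <- t.test(adhere_mean ~ gender, data = toy_scored)
print(toy_test)
```

With the real data, missing values would need handling first (for example na.rm = TRUE inside rowMeans(), mean() and sd()), and whether a t-test suits scores built from three items is a judgement call rather than something this sketch settles.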
- For my second investigation, age and ability to name symptoms, I am having trouble getting a line of best fit. I have tried using the function “geom_abline”, but no line seems to appear on my graph. Here is the graph so far:
graph_investiagte2 <- investigate2 %>%
ggplot(mapping = aes(x = age_categories, y = mean)) +
geom_point() + geom_line() + labs(title = "Ability to detect main symptoms of COVID-19 across age groups", x = "age group", y = "Average ability to identify COVID-19 symptoms")
print(graph_investiagte2)
Next steps:
- find statistics for the other questions to ask of the data
- plot these statistics
- play around with aesthetics
- write up report
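On the line of best fit from the Challenges section: geom_abline() draws a line from a slope and an intercept that you give it. If it was called with no values, it falls back to slope 1 and intercept 0, and that line sits well outside a y-axis running from about 0.47 to 0.69, which would explain why nothing showed up. geom_smooth(method = "lm") fits the line from the plotted points instead. A minimal sketch, using the age group codes 1 to 5 and the means rounded from the table in Successes (the names age_means, best_fit_plot and fit are made up here):

```r
library(ggplot2)

# Group means retyped (rounded) from the summary table; age codes 1-5
# stand for 18-24, 25-34, 35-44, 45-54 and 55+ years.
age_means <- data.frame(
  age_categories = 1:5,
  mean = c(0.470, 0.529, 0.622, 0.636, 0.694)
)

# geom_smooth(method = "lm") fits a straight line through the points;
# se = FALSE hides the confidence band.
best_fit_plot <- ggplot(age_means, aes(x = age_categories, y = mean)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  labs(title = "Ability to detect main symptoms of COVID-19 across age groups",
       x = "age group",
       y = "Average ability to identify COVID-19 symptoms")
print(best_fit_plot)

# The same line as numbers, in case geom_abline() is still preferred:
# pass these on as geom_abline(intercept = ..., slope = ...).
fit <- lm(mean ~ age_categories, data = age_means)
print(coef(fit))
```

A line fitted through five group means only describes the trend in those means; fitting on the individual responses in the full data would be a separate analysis.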