Goals

This week I wanted to start digging through the data to find some questions to investigate. I achieved this goal through a very big week of coding: finding statistics and plotting them for two questions.

Successes:

I am toying around with the following questions to look into:

  1. Does adherence differ by gender?
  2. Does age relate to the ability to name COVID-19 symptoms?

Here is my box plot so far for gender and adherence. I just need to fix the error bars and play around with the aesthetics. I used the functions ggplot(), geom_col() and stat_boxplot().
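A common ggplot2 pattern for whisker-style error bars on a box plot is stat_boxplot(geom = "errorbar") layered underneath geom_boxplot(). The sketch below assumes a data frame with a gender column and a per-participant mean adherence column; adherence_mean is a hypothetical name, and the toy values are randomly generated just so the example runs, not taken from the survey data.

```r
library(ggplot2)

# Toy stand-in for the real data; adherence_mean is a hypothetical column
# holding each participant's mean adherence score (random values here).
set.seed(1)
toy <- data.frame(
  gender = factor(rep(c("Male", "Female"), each = 50)),
  adherence_mean = runif(100, min = 0, max = 1)
)

boxplot_gender <- ggplot(toy, aes(x = gender, y = adherence_mean)) +
  # Whiskers drawn as error bars with horizontal end caps
  stat_boxplot(geom = "errorbar", width = 0.2) +
  # Boxes drawn on top, so only the whisker parts of the error bars show
  geom_boxplot(width = 0.5) +
  labs(x = "Gender", y = "Mean adherence")

print(boxplot_gender)
```

Because geom_col() draws bar heights rather than box plot summaries, it may be worth checking whether it is needed at all once the boxes and whiskers come from the two layers above.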

Here is a table of statistics I made for this relationship. I had to create a mean adherence score across all three adherence variables and then use the group_by() function for gender. After that, the process just involved finding the statistics as I did in Jenny's Q&A session.
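The steps described above (average the three adherence items for each participant, then group by gender and summarise) can be sketched with dplyr as follows. The column names Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends and gender come from the code later in this log; adherence_mean is a hypothetical name, and the small made-up data frame stands in for week8data.

```r
library(dplyr)

# Toy stand-in for week8data, using the column names from the real data
toy_week8 <- tibble(
  gender = c(1, 1, 2, 2, 2),
  Adhere_shop_groceries = c(1, 0, 1, 1, 0),
  Adhere_shop_other     = c(1, 1, 0, 1, 1),
  Adhere_meet_friends   = c(0, 1, 1, 1, 1)
)

gender_summary <- toy_week8 %>%
  # Average the three adherence items within each participant (row)
  mutate(adherence_mean = rowMeans(cbind(Adhere_shop_groceries,
                                         Adhere_shop_other,
                                         Adhere_meet_friends),
                                   na.rm = TRUE)) %>%
  group_by(gender) %>%
  # Each statistic is its own named argument to summarise()
  summarise(mean = mean(adherence_mean),
            sd   = sd(adherence_mean),
            n    = n(),
            se   = sd / sqrt(n))

print(gender_summary)
```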

Here is where I’m at with investigating age and ability to name symptoms:

library(tidyverse)   # also loads ggplot2 and dplyr
library(haven)       # read_sav() for SPSS files
library(ggeasy)
library(corrplot)
library(ggpubr)
library(jmv)
library(rstatix)
library(here)
library(car)
library(psych)
library(janitor)
library(gt)          # gt() tables

week8data <- read_sav("data_data.sav")


# Mean, SD, n and standard error of symptom-naming ability for each age category
investigate2 <- week8data %>%
  group_by(age_categories) %>%
  summarise(mean = mean(Sx_covid_nomissing), sd = sd(Sx_covid_nomissing), n = n(),
            se = sd / sqrt(n))

# Same summary, with readable labels for the age categories
investigate22 <- investigate2 %>%
  mutate(age_categories = case_when(age_categories == 1 ~ "18-24 years",
                                    age_categories == 2 ~ "25-34 years",
                                    age_categories == 3 ~ "35-44 years",
                                    age_categories == 4 ~ "45-54 years",
                                    age_categories == 5 ~ "55+ years"))

gt(investigate22)
age_categories   mean        sd          n      se
18-24 years      0.4704641   0.4993025   1422   0.01324079
25-34 years      0.5290270   0.4993609   1223   0.01427912
35-44 years      0.6220096   0.4851174   1045   0.01500682
45-54 years      0.6364903   0.4813451    718   0.01796364
55+ years        0.6944285   0.4607814   1741   0.01104322
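The se column is the standard error of the mean, computed in the code above as sd/sqrt(n). Checking the first row of the table by hand:

```latex
\mathrm{SE} = \frac{s}{\sqrt{n}}, \qquad
\mathrm{SE}_{\text{18-24}} = \frac{0.4993025}{\sqrt{1422}}
  \approx \frac{0.4993025}{37.7094} \approx 0.01324
```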

To be honest, at the beginning of the semester I thought coding was only for the super tech-savvy or the FBI.

But not only have I reproduced a research report, I have also done statistics in R!

During the coding tutorial on Tuesday I was able to create a correlational plot using R, and it looks so aesthetic that I am very proud of it. Here is the code I used and the plot:

Challenges

  1. Error in XYZ: object ‘my_penguins’ not found.

This error first appeared for my COVID data. Jenny helped me out by identifying that I should read the fresh csv file into this Rmd, and this worked. However, the same error then appeared for the penguins data, and because I don't have the csv for the penguins data I wasn't able to use this fix. So I cut my losses and just screenshotted my data and output, because a finished learning log is better than a perfect learning log.
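One possible workaround, assuming the object still exists in the interactive console session: knitting an Rmd from RStudio runs in a fresh R session, so objects created only in the console are not visible there. Saving the object to a file from the console and reading that file back in inside the Rmd gets around this. The file name my_penguins.csv is hypothetical; the sketch below uses a temporary file and a small made-up data frame so it runs on its own.

```r
# Toy stand-in for an object that only exists in the console session
my_penguins <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3500)
)

# In the console: save the object to disk (a temp file here; in practice a
# path inside the project folder, e.g. the hypothetical "my_penguins.csv")
csv_path <- file.path(tempdir(), "my_penguins.csv")
write.csv(my_penguins, csv_path, row.names = FALSE)

# In the Rmd: read the file back in so the knitting session has the data
my_penguins_knit <- read.csv(csv_path)
print(my_penguins_knit)
```

saveRDS() and readRDS() do the same job and also keep column types and attributes such as haven value labels, which a csv drops.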

  2. For my first investigation, of gender and adherence, I am having trouble getting the sd and se to print in the table. I would also like to run a significance test for this investigation, as the difference doesn't look very large in the box plot. Here is the code I am using:
covid_data_investigate <- week8data %>%
  group_by(gender) %>% na.omit() %>%
  select(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends) %>%
  summarise(mean = mean(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends, sd = sd(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends), n=n(), se = sd/sqrt(n)))

print(covid_data_investigate)
## # A tibble: 2 x 2
##       gender  mean
##    <dbl+lbl> <dbl>
## 1 1 [Male]   0.575
## 2 2 [Female] 0.646
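In the summarise() call above, the closing bracket of mean() sits at the very end of the line, so sd =, n = and se = are passed into mean() instead of becoming separate columns, which is why only mean prints. mean() also only averages its first argument, so the three adherence variables are not being combined. For the significance test, the sketch below uses Welch's two-sample t-test via base R's t.test() as one reasonable choice (not something from the original analysis), run on a hypothetical per-participant adherence_mean column with made-up values.

```r
# Toy per-participant data: gender coded 1 = Male, 2 = Female (as in the
# labels printed above) and a hypothetical adherence_mean column
toy_gender <- data.frame(
  gender = factor(c(1, 1, 1, 1, 2, 2, 2, 2), labels = c("Male", "Female")),
  adherence_mean = c(0.33, 0.67, 0.67, 1.00, 0.67, 1.00, 1.00, 0.67)
)

# Welch two-sample t-test: does mean adherence differ between genders?
# t.test() does not assume equal variances unless var.equal = TRUE
gender_test <- t.test(adherence_mean ~ gender, data = toy_gender)
print(gender_test)

# The values usually reported
gender_test$statistic  # t value
gender_test$parameter  # Welch-adjusted degrees of freedom
gender_test$p.value
```

The test needs one score per participant, not the two group means from the summary table, so it would be run on the data before summarise().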
  3. For my second investigation, of age and ability to name symptoms, I am having trouble getting a line of best fit. I have tried using the function geom_abline(), but no line seems to appear on my graph. Here is the graph so far:
graph_investigate2 <- investigate2 %>%
  ggplot(mapping = aes(x = age_categories, y = mean)) +
  geom_point() + geom_line() + labs(title = "Ability to detect main symptoms of COVID-19 across age groups", x = "age group", y = "Average ability to identify COVID-19 symptoms")

print(graph_investigate2)
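A likely reason nothing appears: geom_abline() with no intercept or slope defaults to the line y = x (intercept 0, slope 1). With age categories 1 to 5 on the x-axis and means between about 0.47 and 0.69 on the y-axis, that line lies entirely above the plotted range. geom_smooth(method = "lm") fits and draws a least-squares line instead. The sketch rebuilds the five means from the summary table above so it runs on its own; treating the ordered age categories as evenly spaced numbers on the x-axis is an assumption.

```r
library(ggplot2)

# Group means copied from the summary table above
age_means <- data.frame(
  age_categories = 1:5,
  mean = c(0.4704641, 0.5290270, 0.6220096, 0.6364903, 0.6944285)
)

graph_fit <- ggplot(age_means, aes(x = age_categories, y = mean)) +
  geom_point() +
  # Fit and draw an ordinary least-squares line; se = FALSE hides the ribbon
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
  scale_x_continuous(breaks = 1:5,
                     labels = c("18-24", "25-34", "35-44", "45-54", "55+")) +
  labs(title = "Ability to detect main symptoms of COVID-19 across age groups",
       x = "Age group (years)",
       y = "Average ability to identify COVID-19 symptoms")

print(graph_fit)

# The same line as numbers: intercept and change per age category
fit <- lm(mean ~ age_categories, data = age_means)
coef(fit)
```

Fitting the line to five group means is fine for a picture; for a significance test it would be better to model the participant-level Sx_covid_nomissing scores.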

Next steps: