Goals:
- post learning log BEFORE MONDAY
- be close to finishing plots/tables for investigation questions
- start putting things together for verification report
Successes:
At Tuesday’s tutorial I succeeded with some troubleshooting. Whilst I was not excited to practise making errors, as I come across so many when I code, this tutorial was SO helpful. I learned to really look into the error messages, rather than just clicking the run button again and hoping for a different answer. And now, when I can’t figure out the issue from the message, I search the error and look at helpful websites such as Stack Overflow for solutions. In the tutorial we tried to use data that was not pre-installed, and to install packages (emo and devtools) which were not already installed in R. This meant going to Stack Overflow and finding the installation link. From this I was able to plot the data from ozbabynames. I want to work on this graph over the weekend, to try to get rid of the overlapping names. Here is my code and graph so far:
I started my investigation of the relationship between those who have been tested for covid-19, the results and perceived immunity. Below is my table and output for this investigation. I played around with lots of additions to this graph, using functions for colours: scale_fill_brewer(palette = "Blues"), titles and text size: theme(legend.text = element_text(size = 8)), error bars and a legend! I am pretty proud of this graph.
At first I wasn’t sure why the error bar for the positive test result wasn’t showing. However, after 15 minutes of staring at my code I realised it was because my y axis was too short to fit the error bar in. Here is the before and after:
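The disappearing error bar can be reproduced with a small toy example. This is only a sketch: the values and the names toy_summary, short_axis and tall_axis are made up for illustration, and it assumes the axis was shortened with ylim() (or scale limits), which makes ggplot2 treat anything outside the limits as missing.

```r
library(ggplot2)

# Hypothetical summary values, not the real survey results
toy_summary <- data.frame(
  test_result = c("negative", "positive"),
  mean = c(0.20, 0.45),
  se = c(0.05, 0.10)
)

base_plot <- ggplot(toy_summary, aes(x = test_result, y = mean)) +
  geom_col() +
  geom_errorbar(aes(ymin = mean - se, ymax = mean + se), width = 0.2)

# The y axis stops at 0.5, but the "positive" error bar reaches 0.55,
# so its upper limit is out of range and that error bar is not drawn
short_axis <- base_plot + ylim(0, 0.5)

# Extending the axis past 0.55 leaves room for the whole error bar
tall_axis <- base_plot + ylim(0, 0.7)
```

Printing short_axis and tall_axis side by side should show the same kind of before and after.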
Here is the graph so far for adherence to lockdown in different regions. I can’t add an error bar yet because I can’t find the sd for this data. See challenges below.
Here is a table I made investigating the relationship between age and ability to name covid-19 symptoms:
library(ggplot2)
library(tidyverse)
library(haven)
library(dplyr)
library(ggeasy)
library(corrplot)
library(ggpubr)
library(jmv)
library(rstatix)
library(here)
library(car)
library(psych)
library(janitor)
library(gt)
# Read in the SPSS data file
week8data <- read_sav("data_data.sav")

# Mean, sd, n and se of symptom naming for each age category
investigate2 <- week8data %>%
  group_by(age_categories) %>%
  select(Sx_covid_nomissing) %>%
  summarise(mean = mean(Sx_covid_nomissing), sd = sd(Sx_covid_nomissing), n = n(),
            se = sd / sqrt(n))

# Replace the numeric age codes with readable labels
investigate22 <- investigate2 %>%
  mutate(age_categories = case_when(age_categories == 1 ~ "18-24 years",
                                    age_categories == 2 ~ "25-34 years",
                                    age_categories == 3 ~ "35-44 years",
                                    age_categories == 4 ~ "45-54 years",
                                    age_categories == 5 ~ "55+ years"))
gt(investigate22)

| age_categories | mean | sd | n | se |
|---|---|---|---|---|
| 18-24 years | 0.4704641 | 0.4993025 | 1422 | 0.01324079 |
| 25-34 years | 0.5290270 | 0.4993609 | 1223 | 0.01427912 |
| 35-44 years | 0.6220096 | 0.4851174 | 1045 | 0.01500682 |
| 45-54 years | 0.6364903 | 0.4813451 | 718 | 0.01796364 |
| 55+ years | 0.6944285 | 0.4607814 | 1741 | 0.01104322 |
Here is the graph I was able to create to plot this data. Inspired by Jenny Sloans’ tutorial, where we used baby names as the points, I added the variable names to the points instead of squishing them on the x axis.
QUESTION regarding error bars: my error bars depict standard deviations. However, if the standard deviations go above or below the scale (e.g. below 0 or above 1), do researchers cut these error bars (as such a response is impossible), or do they leave them this way to depict the true SD? In this case, are error bars less helpful because of their size?
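As a rough check of the numbers behind this question, the sketch below recomputes the error bar limits from the mean, sd and n values in the age table. The column names sd_lower, sd_upper, se_lower, se_upper and sd_bar_leaves_scale are made up for this illustration, and it does not settle what researchers conventionally do.

```r
library(dplyr)

# Summary values copied from the age x symptom-naming table
age_summary <- tibble(
  age_categories = c("18-24 years", "25-34 years", "35-44 years",
                     "45-54 years", "55+ years"),
  mean = c(0.4704641, 0.5290270, 0.6220096, 0.6364903, 0.6944285),
  sd   = c(0.4993025, 0.4993609, 0.4851174, 0.4813451, 0.4607814),
  n    = c(1422, 1223, 1045, 718, 1741)
)

bar_limits <- age_summary %>%
  mutate(
    se = sd / sqrt(n),
    sd_lower = mean - sd,   # lower end of a mean +/- 1 SD bar
    sd_upper = mean + sd,   # upper end of a mean +/- 1 SD bar
    se_lower = mean - se,   # lower end of a mean +/- 1 SE bar
    se_upper = mean + se,   # upper end of a mean +/- 1 SE bar
    # TRUE when the SD bar extends outside the possible 0 to 1 range
    sd_bar_leaves_scale = sd_lower < 0 | sd_upper > 1
  )

bar_limits
```

With these values every SD bar runs past 0 or 1, while the SE bars (which describe the precision of the mean rather than the spread of responses) stay well inside the scale.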
Challenges:
- moving the “age_category” labels on my graph, adjusting them up and down, left and right until they were in the correct spot.
- creating error bars for a line graph, as the function is different. I was trying to use geom_errorbar() instead of geom_pointrange().
- QUESTION I can’t get the sd, se or n to print in the tibble. However, I have used this code previously to find these statistics for different variables and it worked. Here is the code and output:
covid_data_investigate <- week8data %>%
  group_by(region) %>%
  select(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends) %>%
  summarise(mean0 = mean(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends, sd = sd(Adhere_shop_groceries, Adhere_shop_other, Adhere_meet_friends), n=n(), se = sd/sqrt(n)))
## Adding missing grouping variables: `region`
gt(covid_data_investigate)

| region | mean0 |
|---|---|
| 1 | 0.6172481 |
| 2 | 0.6425770 |
| 3 | 0.6199313 |
| 4 | 0.5570000 |
| 5 | 0.5895097 |
Me looking for the sd that I clearly asked R to calculate:
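One possible explanation: in the summarise() call above, the closing bracket of mean() sits at the very end of the line, so sd, n and se are passed as arguments to mean() instead of becoming their own columns. mean() also only averages its first argument. The sketch below is one way this could be restructured, assuming the goal is a single adherence score per person (the average of the three items). The toy data, the adherence column and the adherence_summary name are hypothetical.

```r
library(dplyr)

# Toy stand-in for week8data with made-up values
toy_data <- tibble(
  region = c(1, 1, 1, 2, 2, 2),
  Adhere_shop_groceries = c(1, 0, 1, 0, 1, 1),
  Adhere_shop_other     = c(0, 0, 1, 1, 1, 0),
  Adhere_meet_friends   = c(1, 1, 1, 0, 0, 1)
)

adherence_summary <- toy_data %>%
  # Assumption: combine the three items into one score per person
  mutate(adherence = rowMeans(cbind(Adhere_shop_groceries,
                                    Adhere_shop_other,
                                    Adhere_meet_friends))) %>%
  group_by(region) %>%
  # Each statistic is its own argument to summarise(), closed separately
  summarise(mean = mean(adherence),
            sd = sd(adherence),
            n = n(),
            se = sd / sqrt(n))

adherence_summary
```

With the brackets closed after each statistic, the tibble should contain mean, sd, n and se columns for every region.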
- QUESTION I want to try turning the column plot displaying covid testing x perceived immunity into a box plot (rather than having another column graph), but I am having issues getting the box to appear: there is only a line for the mean and no individual data points. Can you see anything obvious that I am missing? I rewatched Danny’s videos where we made a boxplot but still can’t solve this issue. Here is the code I am using and the output I am getting:
Next steps:
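One thing that might be worth checking, as a guess only: a box plot drawn from an already summarised table (one mean per group) collapses to a single line, because there is no spread left to draw. The sketch below shows the difference using made-up data. The names raw_scores, summarised_scores, test_result and perceived_immunity are hypothetical and are not the real survey variables.

```r
library(ggplot2)
library(dplyr)

set.seed(1)

# Made-up raw data: one row per participant
raw_scores <- tibble(
  test_result = rep(c("negative", "not tested", "positive"), each = 30),
  perceived_immunity = runif(90)
)

# The kind of table used for a column graph: one mean per group
summarised_scores <- raw_scores %>%
  group_by(test_result) %>%
  summarise(perceived_immunity = mean(perceived_immunity))

# One value per group, so each "box" is just a flat line
flat_plot <- ggplot(summarised_scores, aes(x = test_result, y = perceived_immunity)) +
  geom_boxplot()

# Raw rows give geom_boxplot() a distribution to summarise,
# and geom_jitter() adds the individual data points
box_plot <- ggplot(raw_scores, aes(x = test_result, y = perceived_immunity)) +
  geom_boxplot() +
  geom_jitter(width = 0.1)
```

If the plotted variable only takes the values 0 and 1, the box can also look degenerate even with raw data, so that may be another thing to rule out.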
- solve challenge
- continue writing up verification
- help out teammates
- go to Tuesday tutorial for assistance with my challenges
- add 2 decimal places to table for age x identifying symptoms
- create box plot
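For the two-decimal-places step, one option might be fmt_number() from the gt package. This is a minimal sketch that rebuilds the age table by hand from the values shown earlier; the names age_table and rounded_table are made up for the example.

```r
library(dplyr)
library(gt)

# Values copied from the age x symptom-naming table
age_table <- tibble(
  age_categories = c("18-24 years", "25-34 years", "35-44 years",
                     "45-54 years", "55+ years"),
  mean = c(0.4704641, 0.5290270, 0.6220096, 0.6364903, 0.6944285),
  sd   = c(0.4993025, 0.4993609, 0.4851174, 0.4813451, 0.4607814),
  n    = c(1422, 1223, 1045, 718, 1741),
  se   = c(0.01324079, 0.01427912, 0.01500682, 0.01796364, 0.01104322)
)

# Show mean, sd and se to 2 decimal places; n is left as a whole number
rounded_table <- age_table %>%
  gt() %>%
  fmt_number(columns = c(mean, sd, se), decimals = 2)

rounded_table
```

This only changes how the numbers are displayed in the table; the underlying values in age_table keep their full precision.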