Week 10: Learning Log
My Coding Goals this Week
My goals this week are to finish my report.
Challenges/Successes
To be honest, I had a hard time this week on my report.
I ran into many challenges trying to knit my super long document and edit out any errors. I also found errors within our group coding for our verification, even though the same code worked just fine a few weeks ago.
Here is my drafted report:
Part 1: Reaction
Summary
This paper by Smith et al investigates if believing you have had COVID-19 impacts how an individual self-reports their adherence to lockdown measures. It is important to understand if people will stop adhering to measures after testing positive, as it is still unclear if an individual can have COVID more than once, and how common this could be. So far, there is no previous evidence on if adhering to protective measures is different if an individual thinks they have COVID or not (could be self-diagnosed or with antigen/antibody test). Understanding if a COVID diagnosis changes the way we try to protect ourselves from COVID and if the diagnosis impacts how we report our behaviour will help us understand future ways of exiting from lockdown strategies.
This study utilised an online cross-sectional survey, spanning 6149 UK participants aged 18+. Participants were asked if they had had COVID-19, if they had been tested, their perceived immunity to COVID-19, and other variables such as how often they went shopping and saw friends. Measurement of variables included Likert scales, binary and continuous variables. The authors hypothesised that believing you have had COVID-19 makes you more likely to believe you are immune, in addition to being less likely to adhere to social distancing measures.
It was found that those who believed they had had COVID-19 were more likely to think they are immune and stop participating in activities such as washing hands and social distancing. There was no evidence found for an association between thinking that you had had COVID-19 and its perceived risk. It remains likely that people will be required to adhere to protective measures for COVID-19 even if they have had the illness previously.
Since research around COVID is still novel, the results from this paper may significantly impact the future of implementation of lockdown rules. Currently, no media communications specifically target those who believed they have had COVID-19. As the opinions towards lockdown measures and COVID-19 immunity are different in these people, it is worth addressing this gap in the media. However, this study heavily relies on self-reported measures, where the social desirability bias impacts the way participants respond, especially in the rates of adhering to lockdown measures. The response from participants also may not be representative of the UK population.
Reaction
I wonder whether the results of this study would be universal if the same method was applied in a different country, such as Australia. It would be interesting to see if an increase/decrease of COVID-19 rates in the particular country would change these results. Similarly, if harsher COVID-19 restrictions would have any effect on the results gathered
I was confused by Figure 1, I couldn’t figure out what scale they were using for the graph and it seemed very out of place. I didn’t like how the labels on the X axis were so long and how the percentages of yes and no for each variable did not add up to 100%.
The most interesting parts of this paper was the statistics for the COVID-19 antigen test. I thought it was interesting that more than half of those who tested negative believed they had COVID-19. And since people who thought they had had COVID-19 were less likely to correctly identify COVID-19 symptoms, it seems to me that those who think they are COVID negative have a better understanding of COVID-19 symptoms. Does that mean that people who think they have had COVID-19 and think they have increased immunity become complacent?
Part 2: Verification
Now we are going to reproduce!
install.packages("dplyr")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("gt")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("forcats")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("ggplot")## Warning: package 'ggplot' is not available for this version of R
##
## A version of this package for your version of R might be available elsewhere,
## see the ideas at
## https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
install.packages("tidyr")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("janitor")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("hrbrthemes")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("scales")##
## The downloaded binary packages are in
## /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
library(dplyr)##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyverse)## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3 ✓ purrr 0.3.4
## ✓ tibble 3.1.0 ✓ stringr 1.4.0
## ✓ tidyr 1.1.3 ✓ forcats 0.5.1
## ✓ readr 1.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(haven)
library(gt)
library(forcats)
library(ggplot2)
library(tidyr)
library(janitor)##
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
##
## chisq.test, fisher.test
library(hrbrthemes)## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
## Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
## if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(scales)##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
COVID <- read_sav(file ="coviddata.sav")Demographic descriptives
the number of participants who thought they didn’t and did have COVID
amount_covid <- COVID %>%
group_by(Ever_covid) %>%
count(Ever_covid)
print(amount_covid)## # A tibble: 2 x 2
## # Groups: Ever_covid [2]
## Ever_covid n
## <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus] 4656
## 2 1 [Think have had coronavirus] 1493
how many had an antigen test for COVID
amount_covid1_antigen <- COVID %>%
group_by(q7beentested) %>%
count(q7beentested)
print(amount_covid1_antigen)## # A tibble: 3 x 2
## # Groups: q7beentested [3]
## q7beentested n
## <dbl+lbl> <int>
## 1 0 [Not been tested] 5574
## 2 1 [Tested and result showed no coronavirus] 330
## 3 2 [Tested and result showed yes coronavirus] 245
how many tested negative but thought they had COVID
negativethought <- COVID %>%
group_by(q7beentested, Ever_covid) %>%
filter(Ever_covid == 1, q7beentested == 1) %>%
count(Ever_covid)
print(negativethought)## # A tibble: 1 x 3
## # Groups: q7beentested, Ever_covid [1]
## q7beentested Ever_covid n
## <dbl+lbl> <dbl+lbl> <int>
## 1 1 [Tested and result showed no coronaviru… 1 [Think have had coronaviru… 187
how many tested positive but thought they didnt have COVID
positivethought <- COVID %>%
group_by(q7beentested, Ever_covid) %>%
filter(Ever_covid == 0, q7beentested == 2) %>%
count(Ever_covid)
print(positivethought)## # A tibble: 1 x 3
## # Groups: q7beentested, Ever_covid [1]
## q7beentested Ever_covid n
## <dbl+lbl> <dbl+lbl> <int>
## 1 2 [Tested and result showed yes coronav… 0 [Think have not had coronavi… 56
how many males/females
COVID <- read_sav(file ="coviddata.sav")
covidgender <- COVID %>%
group_by(gender, Ever_covid) %>%
mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "female")) %>%
mutate(Ever_covid = case_when(Ever_covid == 0 ~ "Think have not had COVID-19", Ever_covid == 1 ~ "Think have had COVID-19")) %>%
count(gender)
print(covidgender)## # A tibble: 4 x 3
## # Groups: gender, Ever_covid [4]
## gender Ever_covid n
## <chr> <chr> <int>
## 1 female Think have had COVID-19 796
## 2 female Think have not had COVID-19 2459
## 3 Male Think have had COVID-19 697
## 4 Male Think have not had COVID-19 2197
how many in each age category
age <- COVID %>%
group_by(age_categories) %>%
mutate(age_categories = case_when(age_categories == 1 ~ "18 to 24 years", age_categories == 2 ~ "25 to 34 years", age_categories == 3 ~ "35 to 44 years", age_categories == 4 ~ "45 to 54 years", age_categories == 5 ~ "55 years and over")) %>%
count(age_categories)
print(age)## # A tibble: 5 x 2
## # Groups: age_categories [5]
## age_categories n
## <chr> <int>
## 1 18 to 24 years 1422
## 2 25 to 34 years 1223
## 3 35 to 44 years 1045
## 4 45 to 54 years 718
## 5 55 years and over 1741
how many have children
They excluded 361 people here for some reason. Our guess is that they didn’t answer the question
child <- COVID %>%
group_by(Has_child) %>%
count(Has_child)
print(child)## # A tibble: 3 x 2
## # Groups: Has_child [3]
## Has_child n
## <dbl+lbl> <int>
## 1 0 [Does not have a child] 2626
## 2 1 [Has a child] 3162
## 3 NA 361
employment status
They excluded 83 people here
employment <- COVID %>%
group_by(Working) %>%
count(Working)
print(employment)## # A tibble: 3 x 2
## # Groups: Working [3]
## Working n
## <dbl+lbl> <int>
## 1 0 [Not working] 2071
## 2 1 [Working (full or part time or self-employed)] 3995
## 3 NA 83
how many are working in key sector
worker <- COVID %>%
group_by(Key_worker) %>%
count(Key_worker)
print(worker)## # A tibble: 2 x 2
## # Groups: Key_worker [2]
## Key_worker n
## <dbl+lbl> <int>
## 1 0 [Not key worker] 3858
## 2 1 [Key worker] 2291
level of education
They excluded 92 people, maybe those who didn’t fit into either category?
education <- COVID %>%
group_by(degree) %>%
count(degree)
print(education)## # A tibble: 3 x 2
## # Groups: degree [3]
## degree n
## <dbl+lbl> <int>
## 1 0 [GCSE/vocational/A-level/no formal qualifications] 4442
## 2 1 [Degree or higher (Bachelors, Masters, PhD)] 1615
## 3 NA 92
how many from each region
region1 <- COVID %>%
group_by(region) %>%
count(region)
print(region1)## # A tibble: 5 x 2
## # Groups: region [5]
## region n
## <dbl+lbl> <int>
## 1 1 [Midlands] 1032
## 2 2 [South & East] 1785
## 3 3 [North] 1455
## 4 4 [London] 1000
## 5 5 [Wales/Scot/NI] 877
how many agreed/strongly agreed that they had some immunity to COVID & this also compared to if they thought they had COVID
agreecovid <- COVID %>%
group_by(q8haveimmunity) %>%
filter(q8haveimmunity > 3) %>%
count(q8haveimmunity)
print(agreecovid)## # A tibble: 2 x 2
## # Groups: q8haveimmunity [2]
## q8haveimmunity n
## <dbl+lbl> <int>
## 1 4 [Agree] 841
## 2 5 [Strongly agree] 299
agreecovid1 <- COVID %>%
group_by(q8haveimmunity, Ever_covid) %>%
filter(q8haveimmunity > 3) %>%
count(Ever_covid)
print(agreecovid1)## # A tibble: 4 x 3
## # Groups: q8haveimmunity, Ever_covid [4]
## q8haveimmunity Ever_covid n
## <dbl+lbl> <dbl+lbl> <int>
## 1 4 [Agree] 0 [Think have not had coronavirus] 382
## 2 4 [Agree] 1 [Think have had coronavirus] 459
## 3 5 [Strongly agree] 0 [Think have not had coronavirus] 118
## 4 5 [Strongly agree] 1 [Think have had coronavirus] 181
Means & SDs, Tables/Figures
Table 1
This code finds the numbers and percentages for variables in table 1. First, we used zap labels to remove the pre-existing labels from each vector. We used pivot longer in order to find the total number for each variable to create percentages. We filtered out variables not needed for table 1 and then omitted NA values. We found percentages using mutate, rounding to 1 decimal place.
covid_1 <- COVID %>%
zap_labels() %>%
pivot_longer(gender:region, names_to = "vars", values_to = "values") %>%
group_by(Ever_covid, vars,values) %>%
count(name = "number") %>%
na.omit %>%
group_by(vars,values) %>%
mutate(percentage = round(number/sum(number) * 100, 1))
print(covid_1)## # A tibble: 40 x 5
## # Groups: vars, values [20]
## Ever_covid vars values number percentage
## <dbl> <chr> <dbl> <int> <dbl>
## 1 0 age_categories 1 1003 70.5
## 2 0 age_categories 2 823 67.3
## 3 0 age_categories 3 751 71.9
## 4 0 age_categories 4 554 77.2
## 5 0 age_categories 5 1525 87.6
## 6 0 degree 0 3382 76.1
## 7 0 degree 1 1200 74.3
## 8 0 gender 1 2197 75.9
## 9 0 gender 2 2459 75.5
## 10 0 Has_child 0 2005 76.4
## # … with 30 more rows
We used this code to find how many thought they have not had COVID and how many thought they did. These numbers are at the top row of table 1 and table 3
amount_covid <- COVID %>%
group_by(Ever_covid) %>%
count(Ever_covid)
print(amount_covid)## # A tibble: 2 x 2
## # Groups: Ever_covid [2]
## Ever_covid n
## <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus] 4656
## 2 1 [Think have had coronavirus] 1493
We now changed the data back to wide data to begin reproducing the table. We again used zap labels to remove any leftover labels since we previously grouped before omitting NA values, so now there should not be any more labels. We used mutate to turn everything into a factor.
table1 <- covid_1 %>%
pivot_wider(names_from = vars, values_from = values) %>%
zap_labels() %>% mutate_if(is.numeric, as.factor)
print(table1)## # A tibble: 40 x 10
## Ever_covid number percentage age_categories degree gender Has_child
## <fct> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 0 1003 70.5 1 <NA> <NA> <NA>
## 2 0 823 67.3 2 <NA> <NA> <NA>
## 3 0 751 71.9 3 <NA> <NA> <NA>
## 4 0 554 77.2 4 <NA> <NA> <NA>
## 5 0 1525 87.6 5 <NA> <NA> <NA>
## 6 0 3382 76.1 <NA> 0 <NA> <NA>
## 7 0 1200 74.3 <NA> 1 <NA> <NA>
## 8 0 2197 75.9 <NA> <NA> 1 <NA>
## 9 0 2459 75.5 <NA> <NA> 2 <NA>
## 10 0 2005 76.4 <NA> <NA> <NA> 0
## # … with 30 more rows, and 3 more variables: Key_worker <fct>, region <fct>,
## # Working <fct>
Now using factors, we can create the labels from the original table.
table1$Ever_covid <- factor(table1$Ever_covid, labels = c("Think have not had COVID-19 n = 4656", "Think have had COVID-19 n = 1493"))
table1$gender <-factor(table1$gender, labels = c("Male", "Female"))
table1$age_categories <- factor(table1$age_categories, labels = c("18 to 24 years", "25 to 34 years", "35 to 44 years", "45 to 54 years", "55 years and over"))
table1$Has_child <- factor(table1$Has_child, labels = c("No", "Yes"))
table1$Working <- factor(table1$Working, labels = c("Not working", "Working"))
table1$Key_worker <- factor(table1$Key_worker, labels = c("No", "Yes"))
table1$degree <-factor(table1$degree, labels = c("GCSE/vocational/A-level/No formal qualifications", "Degree or higher (Bachelors, Masters, PhD"))
table1$region <- factor(table1$region, labels = c("Midlands", "South and East", "North", "London", "Wales, Scotland and Northern Ireland"))Now we pivot again to long data to find our percentages using mutate, and add the rest of the labels.
table1 <- table1 %>%
pivot_longer(age_categories:Working, names_to = "Participant Characteristics", values_to = "Levels") %>%
na.omit %>%
mutate(Real = paste0(number,"(",percentage,"%)")) %>%
select(-percentage, -number) %>%
pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real)
print(table1)## # A tibble: 20 x 4
## `Participant Chara… Levels `Think have not had … `Think have had CO…
## <chr> <fct> <chr> <chr>
## 1 age_categories 18 to 24 years 1003(70.5%) 419(29.5%)
## 2 age_categories 25 to 34 years 823(67.3%) 400(32.7%)
## 3 age_categories 35 to 44 years 751(71.9%) 294(28.1%)
## 4 age_categories 45 to 54 years 554(77.2%) 164(22.8%)
## 5 age_categories 55 years and o… 1525(87.6%) 216(12.4%)
## 6 degree GCSE/vocationa… 3382(76.1%) 1060(23.9%)
## 7 degree Degree or high… 1200(74.3%) 415(25.7%)
## 8 gender Male 2197(75.9%) 697(24.1%)
## 9 gender Female 2459(75.5%) 796(24.5%)
## 10 Has_child No 2005(76.4%) 621(23.6%)
## 11 Has_child Yes 2386(75.5%) 776(24.5%)
## 12 Key_worker No 3105(80.5%) 753(19.5%)
## 13 Key_worker Yes 1551(67.7%) 740(32.3%)
## 14 region Midlands 781(75.7%) 251(24.3%)
## 15 region South and East 1369(76.7%) 416(23.3%)
## 16 region North 1120(77%) 335(23%)
## 17 region London 701(70.1%) 299(29.9%)
## 18 region Wales, Scotlan… 685(78.1%) 192(21.9%)
## 19 Working Not working 1714(82.8%) 357(17.2%)
## 20 Working Working 2871(71.9%) 1124(28.1%)
table1$`Participant Characteristics` <- as.factor(table1$`Participant Characteristics`)
table1$`Participant Characteristics` <- factor(table1$`Participant Characteristics`, levels = c ( "gender", "age_categories", "Has_child", "Working", "Key_worker","degree", "region"), labels = c ( "Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region"))Using gt to create the table. Using rowname_col and groupname_col to create the headings and values. We pretty much used the gt package from here onwards for this table, reordering the groups and adding labels and fixing text to match the original table
table1 <- table1 %>%
gt(rowname_col = "Levels",
groupname_col = "Participant Characteristics") %>%
tab_spanner(label = md("**Had COVID-19**"),
columns = vars("Think have not had COVID-19 n = 4656","Think have had COVID-19 n = 1493")) %>%
row_group_order(c("Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region")) %>%
cols_label(
'Think have not had COVID-19 n = 4656' = md("**Think have not had COVID-19 n = 4656**"),
'Think have had COVID-19 n = 1493' = md( "**Think have had COVID-19 n = 1493**")
) %>%
tab_style(
style = list(
cell_text(weight = "bold"),
cell_fill()
),
locations = cells_row_groups())Table 2
Started off by finding the mean and SD for variables in table 2, as this is what is needed to reproduce the original.
MSD <- COVID %>%
group_by(Ever_covid) %>%
summarise(across(c(q8haveimmunity,Going_out_total,q9worry,q10arisk,q10brisk),
list(mean = mean, standard_deviation = sd))) %>%
mutate(across(2:11, round,2))
print(MSD)## # A tibble: 2 x 11
## Ever_covid q8haveimmunity_… q8haveimmunity_… Going_out_total… Going_out_total…
## <dbl+lbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 [Think … 2.38 1.01 6.69 5.63
## 2 1 [Think … 3.33 1 9.35 7.69
## # … with 6 more variables: q9worry_mean <dbl>,
## # q9worry_standard_deviation <dbl>, q10arisk_mean <dbl>,
## # q10arisk_standard_deviation <dbl>, q10brisk_mean <dbl>,
## # q10brisk_standard_deviation <dbl>
Once we have the means/SD, we pivot to long data, selecting only the variables we want and arranging by Ever_covid
table2mean <- MSD %>%
select(ends_with("mean"), Ever_covid) %>%
pivot_longer(ends_with("mean") , names_to = "Mean_vars", values_to = "Mean_values")
table2SD <- MSD %>%
select(ends_with("deviation"), Ever_covid) %>%
pivot_longer(ends_with("deviation") , names_to = "SD_vars", values_to = "SD_values")And now we bind these together using bind_cols, and mutate to change into the labels we want for our table, as well as renaming them.
table2 <- bind_cols(table2SD,table2mean) %>%
mutate(real = paste0("M = ", Mean_values,",", " SD = ", SD_values)) %>%
select(Ever_covid...1, Mean_vars, real) %>%
mutate(Level = case_when(startsWith(Mean_vars,"q8") ~ "1 = Strongly disagree to 5 = Strongly agree", startsWith(Mean_vars,"q9") ~"1 = not worried at all to 5 = extremely worried", startsWith(Mean_vars, "Going") ~"Range = 0 to 42", TRUE ~"1 = not risk at all to 4 = major risk ")) %>%
rename("Participant characteristics" = "Mean_vars") ## New names:
## * Ever_covid -> Ever_covid...1
## * Ever_covid -> Ever_covid...4
print(table2)## # A tibble: 10 x 4
## Ever_covid...1 `Participant charact… real Level
## <dbl+lbl> <chr> <chr> <chr>
## 1 0 [Think have not had… q8haveimmunity_mean M = 2.38,… "1 = Strongly disagr…
## 2 0 [Think have not had… Going_out_total_mean M = 6.69,… "Range = 0 to 42"
## 3 0 [Think have not had… q9worry_mean M = 3.59,… "1 = not worried at …
## 4 0 [Think have not had… q10arisk_mean M = 2.81,… "1 = not risk at all…
## 5 0 [Think have not had… q10brisk_mean M = 3.39,… "1 = not risk at all…
## 6 1 [Think have had cor… q8haveimmunity_mean M = 3.33,… "1 = Strongly disagr…
## 7 1 [Think have had cor… Going_out_total_mean M = 9.35,… "Range = 0 to 42"
## 8 1 [Think have had cor… q9worry_mean M = 3.38,… "1 = not worried at …
## 9 1 [Think have had cor… q10arisk_mean M = 2.81,… "1 = not risk at all…
## 10 1 [Think have had cor… q10brisk_mean M = 3.3, … "1 = not risk at all…
Now we change them into factors..
table2$Ever_covid...1 <- as.factor(table2$Ever_covid...1)
table2$`Participant characteristics` <- as.factor(table2$`Participant characteristics`)
table2$Ever_covid...1 <- factor(table2$Ever_covid...1, labels = c("Think have not had COVID-19", "Think have had COVID-19"))
table2$`Participant characteristics` <- factor(table2$`Participant characteristics`, labels = c("I think I have some immunity to COVID-19", "Total out-of-home activity in the last seven days","Worry about COVID-19","Perceived risk of COVID-19 to onself","Perceived risk of COVID-19 to people in the UK"))Back to wide data to use gt. We add our subheadings and heading/title and it’s complete.
table2 <- table2 %>%
pivot_wider(id_cols = -real, names_from = Ever_covid...1, values_from = real) %>%
gt(rowname_col = "Levels",
groupname_col = "Participant Characteristics") %>%
tab_spanner(label = "Had COVID-19",
columns = vars("Think have not had COVID-19", "Think have had COVID-19"))Table 3
We pivot to longer data to find numbers for each variable in table 3, sorted according to if they have had COVID-19 or not.
t3_top <- COVID %>%
pivot_longer(Adhere_shop_groceries:Sx_covid_nomissing, names_to = "vars" , values_to = "values") %>%
group_by(vars, values) %>%
count(name = "number")
print(t3_top)## # A tibble: 8 x 3
## # Groups: vars, values [8]
## vars values number
## <chr> <dbl> <int>
## 1 Adhere_meet_friends 0 878
## 2 Adhere_meet_friends 1 5271
## 3 Adhere_shop_groceries 0 2389
## 4 Adhere_shop_groceries 1 3760
## 5 Adhere_shop_other 0 1833
## 6 Adhere_shop_other 1 4316
## 7 Sx_covid_nomissing 0 2517
## 8 Sx_covid_nomissing 1 3632
We now include Ever_covid while pivoting to long data as this is needed for the left column in our table. Omitting any NA values and mutating creates a new column with our percentages.
covid_3 <- COVID %>%
pivot_longer(Adhere_shop_groceries:Sx_covid_nomissing, names_to = "vars", values_to = "values") %>%
group_by(Ever_covid,vars,values) %>%
count(name = "number") %>%
na.omit %>%
group_by(Ever_covid, vars) %>%
mutate(percentage = round(number/sum(number) * 100,1))
print(covid_3)## # A tibble: 16 x 5
## # Groups: Ever_covid, vars [8]
## Ever_covid vars values number percentage
## <dbl+lbl> <chr> <dbl> <int> <dbl>
## 1 0 [Think have not had coronavir… Adhere_meet_friends 0 456 9.8
## 2 0 [Think have not had coronavir… Adhere_meet_friends 1 4200 90.2
## 3 0 [Think have not had coronavir… Adhere_shop_grocer… 0 1701 36.5
## 4 0 [Think have not had coronavir… Adhere_shop_grocer… 1 2955 63.5
## 5 0 [Think have not had coronavir… Adhere_shop_other 0 1156 24.8
## 6 0 [Think have not had coronavir… Adhere_shop_other 1 3500 75.2
## 7 0 [Think have not had coronavir… Sx_covid_nomissing 0 1729 37.1
## 8 0 [Think have not had coronavir… Sx_covid_nomissing 1 2927 62.9
## 9 1 [Think have had coronavirus] Adhere_meet_friends 0 422 28.3
## 10 1 [Think have had coronavirus] Adhere_meet_friends 1 1071 71.7
## 11 1 [Think have had coronavirus] Adhere_shop_grocer… 0 688 46.1
## 12 1 [Think have had coronavirus] Adhere_shop_grocer… 1 805 53.9
## 13 1 [Think have had coronavirus] Adhere_shop_other 0 677 45.3
## 14 1 [Think have had coronavirus] Adhere_shop_other 1 816 54.7
## 15 1 [Think have had coronavirus] Sx_covid_nomissing 0 788 52.8
## 16 1 [Think have had coronavirus] Sx_covid_nomissing 1 705 47.2
Back to wide data to prepare our tibble for the table by removing pre-existing labels using zap labels like before, and changing into factors.
table3 <- covid_3 %>%
pivot_wider(names_from = vars, values_from = values) %>%
zap_labels() %>% mutate_if(is.numeric, as.factor) ## `mutate_if()` ignored the following grouping variables:
## Column `Ever_covid`
print(table3)## # A tibble: 16 x 7
## # Groups: Ever_covid [2]
## Ever_covid number percentage Adhere_meet_friends Adhere_shop_groceries
## <dbl> <fct> <fct> <fct> <fct>
## 1 0 456 9.8 0 <NA>
## 2 0 4200 90.2 1 <NA>
## 3 0 1701 36.5 <NA> 0
## 4 0 2955 63.5 <NA> 1
## 5 0 1156 24.8 <NA> <NA>
## 6 0 3500 75.2 <NA> <NA>
## 7 0 1729 37.1 <NA> <NA>
## 8 0 2927 62.9 <NA> <NA>
## 9 1 422 28.3 0 <NA>
## 10 1 1071 71.7 1 <NA>
## 11 1 688 46.1 <NA> 0
## 12 1 805 53.9 <NA> 1
## 13 1 677 45.3 <NA> <NA>
## 14 1 816 54.7 <NA> <NA>
## 15 1 788 52.8 <NA> <NA>
## 16 1 705 47.2 <NA> <NA>
## # … with 2 more variables: Adhere_shop_other <fct>, Sx_covid_nomissing <fct>
table3$Ever_covid <- as.factor(table3$Ever_covid)
table3$Ever_covid <- factor(table3$Ever_covid, labels = c("Think have not had COVID-19", "Think have had COVID-19"))
table3$Adhere_shop_groceries <- factor(table3$Adhere_shop_groceries, labels = c("On one or fewer days in the last week, n = 2389", "On two or more days in the last week, n = 3760"))
table3$Adhere_meet_friends <- factor(table3$Adhere_meet_friends, labels = c("Not at all in the last week, n = 5271", "On one or more days in the last week, n = 878"))
table3$Adhere_shop_other <- factor(table3$Adhere_shop_other, labels = c("Not at all in the last week, n = 1833", "On one or more days in the last week, n = 4316"))
table3$Sx_covid_nomissing <- factor(table3$Sx_covid_nomissing, labels = c("Did not correctly identify symptoms, n = 2390", "Correctly identified common symptoms, n = 3632"))Pivoting to long data once again to fix up our column headings
table3 <- table3 %>%
pivot_longer(Adhere_meet_friends:Sx_covid_nomissing, names_to = "Participant Characteristics", values_to = "Levels") %>%
na.omit %>%
mutate(Real = paste0(number,"(",percentage,")")) %>%
select(-number,-percentage) %>%
pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real)
print(table3)## # A tibble: 8 x 4
## `Participant Charac… Levels `Think have not ha… `Think have had C…
## <chr> <fct> <chr> <chr>
## 1 Adhere_meet_friends Not at all in the… 456(9.8) 422(28.3)
## 2 Adhere_meet_friends On one or more da… 4200(90.2) 1071(71.7)
## 3 Adhere_shop_groceri… On one or fewer d… 1701(36.5) 688(46.1)
## 4 Adhere_shop_groceri… On two or more da… 2955(63.5) 805(53.9)
## 5 Adhere_shop_other Not at all in the… 1156(24.8) 677(45.3)
## 6 Adhere_shop_other On one or more da… 3500(75.2) 816(54.7)
## 7 Sx_covid_nomissing Did not correctly… 1729(37.1) 788(52.8)
## 8 Sx_covid_nomissing Correctly identif… 2927(62.9) 705(47.2)
Changing to factors
table3$`Participant Characteristics` <- as.factor(table3$`Participant Characteristics`)
table3$`Participant Characteristics` <- factor(table3$`Participant Characteristics`, labels = c(
"Meet up with family and friends","Shopping for groceries/pharmacy", "Shopping for items other than groceries or pharmacy",
"Correct identification of main symptoms"
))Using gt to create the table. Complete
table3 <- table3 %>%
gt(rowname_col = "Levels",
groupname_col = "Participant Characteristics") %>%
tab_spanner(label = "Had COVID-19",
columns = vars("Think have not had COVID-19",
"Think have had COVID-19")) %>%
row_group_order(c("Meet up with family and friends", "Shopping for groceries/pharmacy",
"Shopping for items other than groceries or pharmacy", "Correct identification of main symptoms" ))Figure 1 graph
This reproduces the original graph
We first removed the pre-existing labels with zap_labels and pivoted to long data with all our variables as columns. Once we had a column with our number of participants for each variable and also a column stating whether or not they thought they have had COVID, we were able to create a percentage column and filter for the values we needed from each variable. Finally, we add our labels back for the Ever_Covid variable.
graph_values_real <- COVID %>%
zap_labels() %>%
mutate(Going_out_total = case_when(Going_out_total > 7 ~ 1, TRUE ~0)) %>%
pivot_longer(q8haveimmunity:Sx_covid_nomissing, names_to = "vars", values_to = "values") %>%
group_by(Ever_covid,vars,values) %>%
count(name = "number") %>%
na.omit %>%
group_by(Ever_covid, vars) %>%
mutate(Percentage = round(number/sum(number) * 100,1)) %>%
filter(case_when(vars == "q8haveimmunity" ~ values == 5,
vars == "Sx_covid_nomissing" ~ values == 0 , startsWith(vars,'Adhere') ~ values ==0,
TRUE ~ values == 1)) %>%
select(vars, Ever_covid, Percentage) %>%
mutate(qcovid = case_when(Ever_covid==1 ~ "Think have had COVID-19", TRUE ~ "Think have not had COVID-19"))
print(graph_values_real)## # A tibble: 18 x 4
## # Groups: Ever_covid, vars [18]
## vars Ever_covid Percentage qcovid
## <chr> <dbl> <dbl> <chr>
## 1 Adhere_meet_friends 0 9.8 Think have not had COVID-19
## 2 Adhere_shop_groceries 0 36.5 Think have not had COVID-19
## 3 Adhere_shop_other 0 24.8 Think have not had COVID-19
## 4 Going_out_total 0 37.6 Think have not had COVID-19
## 5 q10arisk 0 3.2 Think have not had COVID-19
## 6 q10brisk 0 1 Think have not had COVID-19
## 7 q8haveimmunity 0 2.5 Think have not had COVID-19
## 8 q9worry 0 2.5 Think have not had COVID-19
## 9 Sx_covid_nomissing 0 37.1 Think have not had COVID-19
## 10 Adhere_meet_friends 1 28.3 Think have had COVID-19
## 11 Adhere_shop_groceries 1 46.1 Think have had COVID-19
## 12 Adhere_shop_other 1 45.3 Think have had COVID-19
## 13 Going_out_total 1 51.2 Think have had COVID-19
## 14 q10arisk 1 3.9 Think have had COVID-19
## 15 q10brisk 1 1.3 Think have had COVID-19
## 16 q8haveimmunity 1 12.1 Think have had COVID-19
## 17 q9worry 1 6 Think have had COVID-19
## 18 Sx_covid_nomissing 1 52.8 Think have had COVID-19
covid_1 <- COVID %>%
zap_labels() %>%
pivot_longer(gender:region, names_to = "vars", values_to = "values") %>%
group_by(Ever_covid, vars,values) %>%
count(name = "number") %>%
na.omit %>%
group_by(vars,values) %>%
mutate(percentage = round(number/sum(number) * 100, 1))
alldata <- covid_1 %>%
pivot_wider(names_from = vars, values_from = values) %>%
zap_labels() %>% mutate_if(is.numeric, as.factor)changing each variable into a factor and adding our labels from the original table
alldata$Ever_covid <- factor(alldata$Ever_covid, labels = c("no_covid", "yes_covid"))
alldata$gender <-factor(alldata$gender, labels = c("Male", "Female"))
alldata$age_categories <- factor(alldata$age_categories, labels = c("18 to 24 years", "25 to 34 years", "35 to 44 years", "45 to 54 years", "55 years and over"))
alldata$Has_child <- factor(alldata$Has_child, labels = c("No", "Yes"))
alldata$Working <- factor(alldata$Working, labels = c("Not working", "Working"))
alldata$Key_worker <- factor(alldata$Key_worker, labels = c("No", "Yes"))
alldata$degree <-factor(alldata$degree, labels = c("GCSE/vocational/A-level/No formal qualifications", "Degree or higher (Bachelors, Masters, PhD"))
alldata$region <- factor(alldata$region, labels = c("Midlands", "South and East", "North", "London", "Wales, Scotland and Northern Ireland"))
alldata <- alldata %>%
pivot_longer(age_categories:Working, names_to = "vars", values_to = "values") %>%
na.omit %>%
mutate(Real = paste0(number,"(",percentage,")")) %>%
select(-percentage, -number) %>%
pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real) %>%
rename("Think have not had COVID-19" = "no_covid") %>%
rename("Think have had COVID-19" = "yes_covid") %>%
rename("Level" = "values") %>%
rename("Participant Characteristics" = "vars")
print(alldata) ## # A tibble: 20 x 4
## `Participant Charac… Level `Think have not ha… `Think have had C…
## <chr> <fct> <chr> <chr>
## 1 age_categories 18 to 24 years 1003(70.5) 419(29.5)
## 2 age_categories 25 to 34 years 823(67.3) 400(32.7)
## 3 age_categories 35 to 44 years 751(71.9) 294(28.1)
## 4 age_categories 45 to 54 years 554(77.2) 164(22.8)
## 5 age_categories 55 years and over 1525(87.6) 216(12.4)
## 6 degree GCSE/vocational/… 3382(76.1) 1060(23.9)
## 7 degree Degree or higher… 1200(74.3) 415(25.7)
## 8 gender Male 2197(75.9) 697(24.1)
## 9 gender Female 2459(75.5) 796(24.5)
## 10 Has_child No 2005(76.4) 621(23.6)
## 11 Has_child Yes 2386(75.5) 776(24.5)
## 12 Key_worker No 3105(80.5) 753(19.5)
## 13 Key_worker Yes 1551(67.7) 740(32.3)
## 14 region Midlands 781(75.7) 251(24.3)
## 15 region South and East 1369(76.7) 416(23.3)
## 16 region North 1120(77) 335(23)
## 17 region London 701(70.1) 299(29.9)
## 18 region Wales, Scotland … 685(78.1) 192(21.9)
## 19 Working Not working 1714(82.8) 357(17.2)
## 20 Working Working 2871(71.9) 1124(28.1)
alldata$`Participant Characteristics` <- as.factor(alldata$`Participant Characteristics`)
alldata$`Participant Characteristics` <- factor(alldata$`Participant Characteristics`, levels = c ( "gender", "age_categories","Has_child", "Working", "Key_worker","degree", "region"),labels = c ( "Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region"))Using gt to give a rough outline of our graph in the form of a table, making sure our percentages are rounded to 2 decimal places
alldata <- alldata %>%
gt() %>%
tab_spanner(label = "Had COVID-19",
columns = vars("Think have not had COVID-19","Think have had COVID-19"))
print(alldata)## <style>html {
## font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
## }
##
## #fsdeledpoe .gt_table {
## display: table;
## border-collapse: collapse;
## margin-left: auto;
## margin-right: auto;
## color: #333333;
## font-size: 16px;
## font-weight: normal;
## font-style: normal;
## background-color: #FFFFFF;
## width: auto;
## border-top-style: solid;
## border-top-width: 2px;
## border-top-color: #A8A8A8;
## border-right-style: none;
## border-right-width: 2px;
## border-right-color: #D3D3D3;
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #A8A8A8;
## border-left-style: none;
## border-left-width: 2px;
## border-left-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_heading {
## background-color: #FFFFFF;
## text-align: center;
## border-bottom-color: #FFFFFF;
## border-left-style: none;
## border-left-width: 1px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 1px;
## border-right-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_title {
## color: #333333;
## font-size: 125%;
## font-weight: initial;
## padding-top: 4px;
## padding-bottom: 4px;
## border-bottom-color: #FFFFFF;
## border-bottom-width: 0;
## }
##
## #fsdeledpoe .gt_subtitle {
## color: #333333;
## font-size: 85%;
## font-weight: initial;
## padding-top: 0;
## padding-bottom: 4px;
## border-top-color: #FFFFFF;
## border-top-width: 0;
## }
##
## #fsdeledpoe .gt_bottom_border {
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_col_headings {
## border-top-style: solid;
## border-top-width: 2px;
## border-top-color: #D3D3D3;
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## border-left-style: none;
## border-left-width: 1px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 1px;
## border-right-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_col_heading {
## color: #333333;
## background-color: #FFFFFF;
## font-size: 100%;
## font-weight: normal;
## text-transform: inherit;
## border-left-style: none;
## border-left-width: 1px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 1px;
## border-right-color: #D3D3D3;
## vertical-align: bottom;
## padding-top: 5px;
## padding-bottom: 6px;
## padding-left: 5px;
## padding-right: 5px;
## overflow-x: hidden;
## }
##
## #fsdeledpoe .gt_column_spanner_outer {
## color: #333333;
## background-color: #FFFFFF;
## font-size: 100%;
## font-weight: normal;
## text-transform: inherit;
## padding-top: 0;
## padding-bottom: 0;
## padding-left: 4px;
## padding-right: 4px;
## }
##
## #fsdeledpoe .gt_column_spanner_outer:first-child {
## padding-left: 0;
## }
##
## #fsdeledpoe .gt_column_spanner_outer:last-child {
## padding-right: 0;
## }
##
## #fsdeledpoe .gt_column_spanner {
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## vertical-align: bottom;
## padding-top: 5px;
## padding-bottom: 6px;
## overflow-x: hidden;
## display: inline-block;
## width: 100%;
## }
##
## #fsdeledpoe .gt_group_heading {
## padding: 8px;
## color: #333333;
## background-color: #FFFFFF;
## font-size: 100%;
## font-weight: initial;
## text-transform: inherit;
## border-top-style: solid;
## border-top-width: 2px;
## border-top-color: #D3D3D3;
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## border-left-style: none;
## border-left-width: 1px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 1px;
## border-right-color: #D3D3D3;
## vertical-align: middle;
## }
##
## #fsdeledpoe .gt_empty_group_heading {
## padding: 0.5px;
## color: #333333;
## background-color: #FFFFFF;
## font-size: 100%;
## font-weight: initial;
## border-top-style: solid;
## border-top-width: 2px;
## border-top-color: #D3D3D3;
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## vertical-align: middle;
## }
##
## #fsdeledpoe .gt_from_md > :first-child {
## margin-top: 0;
## }
##
## #fsdeledpoe .gt_from_md > :last-child {
## margin-bottom: 0;
## }
##
## #fsdeledpoe .gt_row {
## padding-top: 8px;
## padding-bottom: 8px;
## padding-left: 5px;
## padding-right: 5px;
## margin: 10px;
## border-top-style: solid;
## border-top-width: 1px;
## border-top-color: #D3D3D3;
## border-left-style: none;
## border-left-width: 1px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 1px;
## border-right-color: #D3D3D3;
## vertical-align: middle;
## overflow-x: hidden;
## }
##
## #fsdeledpoe .gt_stub {
## color: #333333;
## background-color: #FFFFFF;
## font-size: 100%;
## font-weight: initial;
## text-transform: inherit;
## border-right-style: solid;
## border-right-width: 2px;
## border-right-color: #D3D3D3;
## padding-left: 12px;
## }
##
## #fsdeledpoe .gt_summary_row {
## color: #333333;
## background-color: #FFFFFF;
## text-transform: inherit;
## padding-top: 8px;
## padding-bottom: 8px;
## padding-left: 5px;
## padding-right: 5px;
## }
##
## #fsdeledpoe .gt_first_summary_row {
## padding-top: 8px;
## padding-bottom: 8px;
## padding-left: 5px;
## padding-right: 5px;
## border-top-style: solid;
## border-top-width: 2px;
## border-top-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_grand_summary_row {
## color: #333333;
## background-color: #FFFFFF;
## text-transform: inherit;
## padding-top: 8px;
## padding-bottom: 8px;
## padding-left: 5px;
## padding-right: 5px;
## }
##
## #fsdeledpoe .gt_first_grand_summary_row {
## padding-top: 8px;
## padding-bottom: 8px;
## padding-left: 5px;
## padding-right: 5px;
## border-top-style: double;
## border-top-width: 6px;
## border-top-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_striped {
## background-color: rgba(128, 128, 128, 0.05);
## }
##
## #fsdeledpoe .gt_table_body {
## border-top-style: solid;
## border-top-width: 2px;
## border-top-color: #D3D3D3;
## border-bottom-style: solid;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_footnotes {
## color: #333333;
## background-color: #FFFFFF;
## border-bottom-style: none;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## border-left-style: none;
## border-left-width: 2px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 2px;
## border-right-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_footnote {
## margin: 0px;
## font-size: 90%;
## padding: 4px;
## }
##
## #fsdeledpoe .gt_sourcenotes {
## color: #333333;
## background-color: #FFFFFF;
## border-bottom-style: none;
## border-bottom-width: 2px;
## border-bottom-color: #D3D3D3;
## border-left-style: none;
## border-left-width: 2px;
## border-left-color: #D3D3D3;
## border-right-style: none;
## border-right-width: 2px;
## border-right-color: #D3D3D3;
## }
##
## #fsdeledpoe .gt_sourcenote {
## font-size: 90%;
## padding: 4px;
## }
##
## #fsdeledpoe .gt_left {
## text-align: left;
## }
##
## #fsdeledpoe .gt_center {
## text-align: center;
## }
##
## #fsdeledpoe .gt_right {
## text-align: right;
## font-variant-numeric: tabular-nums;
## }
##
## #fsdeledpoe .gt_font_normal {
## font-weight: normal;
## }
##
## #fsdeledpoe .gt_font_bold {
## font-weight: bold;
## }
##
## #fsdeledpoe .gt_font_italic {
## font-style: italic;
## }
##
## #fsdeledpoe .gt_super {
## font-size: 65%;
## }
##
## #fsdeledpoe .gt_footnote_marks {
## font-style: italic;
## font-size: 65%;
## }
## </style>
## <div id="fsdeledpoe" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
##
## <thead class="gt_col_headings">
## <tr>
## <th class="gt_col_heading gt_center gt_columns_bottom_border" rowspan="2" colspan="1">Participant Characteristics</th>
## <th class="gt_col_heading gt_center gt_columns_bottom_border" rowspan="2" colspan="1">Level</th>
## <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2">
## <span class="gt_column_spanner">Had COVID-19</span>
## </th>
## </tr>
## <tr>
## <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1">Think have not had COVID-19</th>
## <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1">Think have had COVID-19</th>
## </tr>
## </thead>
## <tbody class="gt_table_body">
## <tr>
## <td class="gt_row gt_center">Age</td>
## <td class="gt_row gt_center">18 to 24 years</td>
## <td class="gt_row gt_left">1003(70.5)</td>
## <td class="gt_row gt_left">419(29.5)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Age</td>
## <td class="gt_row gt_center">25 to 34 years</td>
## <td class="gt_row gt_left">823(67.3)</td>
## <td class="gt_row gt_left">400(32.7)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Age</td>
## <td class="gt_row gt_center">35 to 44 years</td>
## <td class="gt_row gt_left">751(71.9)</td>
## <td class="gt_row gt_left">294(28.1)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Age</td>
## <td class="gt_row gt_center">45 to 54 years</td>
## <td class="gt_row gt_left">554(77.2)</td>
## <td class="gt_row gt_left">164(22.8)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Age</td>
## <td class="gt_row gt_center">55 years and over</td>
## <td class="gt_row gt_left">1525(87.6)</td>
## <td class="gt_row gt_left">216(12.4)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Highest Education or Professional Qualification</td>
## <td class="gt_row gt_center">GCSE/vocational/A-level/No formal qualifications</td>
## <td class="gt_row gt_left">3382(76.1)</td>
## <td class="gt_row gt_left">1060(23.9)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Highest Education or Professional Qualification</td>
## <td class="gt_row gt_center">Degree or higher (Bachelors, Masters, PhD</td>
## <td class="gt_row gt_left">1200(74.3)</td>
## <td class="gt_row gt_left">415(25.7)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Gender</td>
## <td class="gt_row gt_center">Male</td>
## <td class="gt_row gt_left">2197(75.9)</td>
## <td class="gt_row gt_left">697(24.1)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Gender</td>
## <td class="gt_row gt_center">Female</td>
## <td class="gt_row gt_left">2459(75.5)</td>
## <td class="gt_row gt_left">796(24.5)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Have a child</td>
## <td class="gt_row gt_center">No</td>
## <td class="gt_row gt_left">2005(76.4)</td>
## <td class="gt_row gt_left">621(23.6)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Have a child</td>
## <td class="gt_row gt_center">Yes</td>
## <td class="gt_row gt_left">2386(75.5)</td>
## <td class="gt_row gt_left">776(24.5)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Working in Key Sector</td>
## <td class="gt_row gt_center">No</td>
## <td class="gt_row gt_left">3105(80.5)</td>
## <td class="gt_row gt_left">753(19.5)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Working in Key Sector</td>
## <td class="gt_row gt_center">Yes</td>
## <td class="gt_row gt_left">1551(67.7)</td>
## <td class="gt_row gt_left">740(32.3)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Region</td>
## <td class="gt_row gt_center">Midlands</td>
## <td class="gt_row gt_left">781(75.7)</td>
## <td class="gt_row gt_left">251(24.3)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Region</td>
## <td class="gt_row gt_center">South and East</td>
## <td class="gt_row gt_left">1369(76.7)</td>
## <td class="gt_row gt_left">416(23.3)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Region</td>
## <td class="gt_row gt_center">North</td>
## <td class="gt_row gt_left">1120(77)</td>
## <td class="gt_row gt_left">335(23)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Region</td>
## <td class="gt_row gt_center">London</td>
## <td class="gt_row gt_left">701(70.1)</td>
## <td class="gt_row gt_left">299(29.9)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Region</td>
## <td class="gt_row gt_center">Wales, Scotland and Northern Ireland</td>
## <td class="gt_row gt_left">685(78.1)</td>
## <td class="gt_row gt_left">192(21.9)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Employment Status</td>
## <td class="gt_row gt_center">Not working</td>
## <td class="gt_row gt_left">1714(82.8)</td>
## <td class="gt_row gt_left">357(17.2)</td>
## </tr>
## <tr>
## <td class="gt_row gt_center">Employment Status</td>
## <td class="gt_row gt_center">Working</td>
## <td class="gt_row gt_left">2871(71.9)</td>
## <td class="gt_row gt_left">1124(28.1)</td>
## </tr>
## </tbody>
##
##
## </table></div>
Percentage <- c("Percentage")
graph_values_real$Percentage <- as.numeric(graph_values_real$Percentage)
graph_values_real$Ever_covid <- as.factor(graph_values_real$Ever_covid)
graphh <- graph_values_real %>%
ggplot(aes(fill=Ever_covid, x= vars, y=Percentage)) +
geom_bar(position = position_dodge(width = 0.8),stat = "identity", width = 0.6 ) +
scale_y_continuous(limits = c(0,60), breaks = c(0,10,20,30,40,50,60)) +
scale_fill_grey(start = 0.8, end = 0) +
theme_bw() +
theme(axis.text.x = element_text(angle = 55, vjust = 1, hjust = 1),
legend.position = "bottom",
legend.title = element_blank(),
axis.title.x = element_blank(),
axis.title.y = element_text(vjust = 4))This is our improved version of the graph
summarising all the adhere values since they are needed to reproduce the graph. We now also round to 2 decimal places.
graph2 <- COVID %>%
group_by(Ever_covid) %>%
summarise(across(c(starts_with("adhere")), list(mean = mean, standard_deviation = sd)))
graph2 <- graph2 %>%
mutate(across(everything(), round,2))
print(graph2)## # A tibble: 2 x 7
## Ever_covid Adhere_shop_gro… Adhere_shop_gro… Adhere_shop_oth… Adhere_shop_oth…
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0 0.63 0.48 0.75 0.43
## 2 1 0.54 0.5 0.55 0.5
## # … with 2 more variables: Adhere_meet_friends_mean <dbl>,
## # Adhere_meet_friends_standard_deviation <dbl>
pivoting to long data where we have our means, and repeating for SD
graph2mean <- graph2 %>%
select(ends_with("mean"), Ever_covid) %>%
pivot_longer(ends_with("mean") , names_to = "Mean_vars", values_to = "Mean_values")
graph2SD <- graph2 %>%
select(ends_with("deviation"), Ever_covid) %>%
pivot_longer(ends_with("deviation") , names_to = "SD_vars", values_to = "SD_values")using bind_cols to bind these two together, and changing to a factor. Also labelling the Ever_covid variable like it will appear in the graph
graph2 <- bind_cols(graph2SD,graph2mean)## New names:
## * Ever_covid -> Ever_covid...1
## * Ever_covid -> Ever_covid...4
graph2$Ever_covid...1 <- as.factor(graph2$Ever_covid...1)
graph2$Ever_covid...1 <- factor(graph2$Ever_covid...1,levels = c(0,1), labels = c("Think have not had COVID-19","Think have had COVID-19"))Part 3: Exploration
1 - does gender impact on shopping for non-essentials?
I wonder if gender impacts the percentage of non-essential shopping according to lockdown measures. Non-essential shopping is shopping for items other than groceries and medicine. Females are usually generalised to shop more, so I was curious if this applied here.
# overview of data needed
exploratory1 <- COVID %>%
group_by(Adhere_shop_other, gender) %>%
count(Adhere_shop_other)
print(exploratory1)## # A tibble: 4 x 3
## # Groups: Adhere_shop_other, gender [4]
## Adhere_shop_other gender n
## <dbl+lbl> <dbl+lbl> <int>
## 1 0 [Reported shopping once or more (not adhering to guidance)] 1 [Male] 1050
## 2 0 [Reported shopping once or more (not adhering to guidance)] 2 [Female] 783
## 3 1 [Reported not shopping for non-essentials (adhering)] 1 [Male] 1844
## 4 1 [Reported not shopping for non-essentials (adhering)] 2 [Female] 2472
# manually cleaning names + filtering for shopping for non-essentials (not adhering)
# the variables were coded wrong, so 0 is shopping and 1 is not shopping
exploratory2 <- exploratory1 %>%
mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "Female")) %>% mutate(Adhere_shop_other = case_when(Adhere_shop_other == 0 ~ "Shopping", Adhere_shop_other == 1 ~ "Not Shopping")) %>%
group_by(Adhere_shop_other, gender) %>%
summarise(Total = sum(n))## `summarise()` has grouped output by 'Adhere_shop_other'. You can override using the `.groups` argument.
print(exploratory2)## # A tibble: 4 x 3
## # Groups: Adhere_shop_other [2]
## Adhere_shop_other gender Total
## <chr> <chr> <int>
## 1 Not Shopping Female 2472
## 2 Not Shopping Male 1844
## 3 Shopping Female 783
## 4 Shopping Male 1050
# pivoting to wide data to get total participants
exploratory2wide <- exploratory2 %>%
pivot_wider(
id_cols = Adhere_shop_other,
names_from = gender,
values_from = Total
)
print(exploratory2wide)## # A tibble: 2 x 3
## # Groups: Adhere_shop_other [2]
## Adhere_shop_other Female Male
## <chr> <int> <int>
## 1 Not Shopping 2472 1844
## 2 Shopping 783 1050
# and adding Total column again and percentage
exploratory2wide %>%
rowwise() %>%
mutate(
Total = sum(c(Female, Male))
) %>%
mutate(
Percentage_Female = (Female/Total)*100
) %>%
mutate(
Percentage_Male = (Male/Total)*100
) %>%
filter(Adhere_shop_other == "Shopping") ## # A tibble: 1 x 6
## # Rowwise: Adhere_shop_other
## Adhere_shop_other Female Male Total Percentage_Female Percentage_Male
## <chr> <int> <int> <int> <dbl> <dbl>
## 1 Shopping 783 1050 1833 42.7 57.3
print(exploratory2wide)## # A tibble: 2 x 3
## # Groups: Adhere_shop_other [2]
## Adhere_shop_other Female Male
## <chr> <int> <int>
## 1 Not Shopping 2472 1844
## 2 Shopping 783 1050
# using janitor to put percentages in brackets
exploratory2wide %>%
adorn_percentages("row") %>%
adorn_pct_formatting(digits = 2) %>%
adorn_ns() %>%
filter(Adhere_shop_other == "Shopping") %>%
pivot_longer(
cols = c(Female, Male),
names_to = "gender",
values_to = "Percentage")## # A tibble: 2 x 3
## # Groups: Adhere_shop_other [1]
## Adhere_shop_other gender Percentage
## <chr> <chr> <chr>
## 1 Shopping Female 42.72% (783)
## 2 Shopping Male 57.28% (1050)
gender <- c("Female", "Male")
print(exploratory2wide)## # A tibble: 2 x 3
## # Groups: Adhere_shop_other [2]
## Adhere_shop_other Female Male
## <chr> <int> <int>
## 1 Not Shopping 2472 1844
## 2 Shopping 783 1050
# using ggplot to create a bar graph
ggplotfinal1 <- ggplot(data = exploratory2wide, aes(x = gender, y = Percentage, fill = gender)) + geom_bar(stat = "identity", width=0.5, colour="black", fill=rgb(1, 0.5, 0.7, 1)) +
labs(x = "Gender", y ="Percentage (%) ", title = "Non-Adherence to Non-Essential Shopping Guidelines") + theme_minimal()
print(ggplotfinal1)I had a lotttt of issues here, did a lot of Googling and filled up my laptop’s Chrome history with R-related questions.
- tried to pivot to wide data
exploratorywide <- exploratory2 %>% pivot_wider( id_cols = Adhering_to_Guidelines, names_from = Gender, values_from = n )
- tried to make a percentage column
exploratoryfinal %>% select(Total, n) %>% mutate( Percentage = (values*2))
Interestingly, it seems like males were more likely to go out for non-essential shopping. However, we must keep in mind still that this study heavily relied on self-reporting, and there is no real way to know if these numbers are accurate, or if these numbers represent the general UK population
2. does region impact on testing rates? I was interested in seeing if living in a different region in the UK impact testing rates. As a different number of participants were surveyed for each region, I counteracted this by finding the testing rate as a percentage, and included both positive and negative results.
# overview of data needed
q7beentested <- c("q7beentested")
COVID <- read_sav(file ="coviddata.sav")
exp2 <- COVID %>%
group_by(region, q7beentested) %>%
count(region)
print(exp2)## # A tibble: 15 x 3
## # Groups: region, q7beentested [15]
## region q7beentested n
## <dbl+lbl> <dbl+lbl> <int>
## 1 1 [Midlands] 0 [Not been tested] 930
## 2 1 [Midlands] 1 [Tested and result showed no coronavirus] 62
## 3 1 [Midlands] 2 [Tested and result showed yes coronavirus] 40
## 4 2 [South & East] 0 [Not been tested] 1656
## 5 2 [South & East] 1 [Tested and result showed no coronavirus] 70
## 6 2 [South & East] 2 [Tested and result showed yes coronavirus] 59
## 7 3 [North] 0 [Not been tested] 1333
## 8 3 [North] 1 [Tested and result showed no coronavirus] 66
## 9 3 [North] 2 [Tested and result showed yes coronavirus] 56
## 10 4 [London] 0 [Not been tested] 840
## 11 4 [London] 1 [Tested and result showed no coronavirus] 93
## 12 4 [London] 2 [Tested and result showed yes coronavirus] 67
## 13 5 [Wales/Scot/NI] 0 [Not been tested] 815
## 14 5 [Wales/Scot/NI] 1 [Tested and result showed no coronavirus] 39
## 15 5 [Wales/Scot/NI] 2 [Tested and result showed yes coronavirus] 23
# changing labels
exp2 %>%
mutate(region = case_when(region == 1 ~ "Midlands", region == 2 ~ "South & East", region == 3 ~ "North", region == 4 ~ "London", region == 5 ~ "Wales, Scotland, Northern Ireland")) %>%
summarise(Total = sum(n)) ## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 15 x 3
## # Groups: region [5]
## region q7beentested Total
## <chr> <dbl+lbl> <int>
## 1 London 0 [Not been tested] 840
## 2 London 1 [Tested and result showed no coronavi… 93
## 3 London 2 [Tested and result showed yes coronav… 67
## 4 Midlands 0 [Not been tested] 930
## 5 Midlands 1 [Tested and result showed no coronavi… 62
## 6 Midlands 2 [Tested and result showed yes coronav… 40
## 7 North 0 [Not been tested] 1333
## 8 North 1 [Tested and result showed no coronavi… 66
## 9 North 2 [Tested and result showed yes coronav… 56
## 10 South & East 0 [Not been tested] 1656
## 11 South & East 1 [Tested and result showed no coronavi… 70
## 12 South & East 2 [Tested and result showed yes coronav… 59
## 13 Wales, Scotland, Northern Ire… 0 [Not been tested] 815
## 14 Wales, Scotland, Northern Ire… 1 [Tested and result showed no coronavi… 39
## 15 Wales, Scotland, Northern Ire… 2 [Tested and result showed yes coronav… 23
# filtering & changing labels for q7beentested, creating totals column and totaltested column
# finding percentages
exp2final <- exp2 %>%
mutate(q7beentested = case_when(q7beentested == 1 ~ "TestednoCOVID", q7beentested == 2 ~ "TestedCOVID")) %>%
pivot_wider(
id_cols = region,
names_from = q7beentested,
values_from = n
) %>%
mutate(
totals = rowSums(across(where(is.numeric)))) %>%
select(3, 4, 5) %>%
mutate(
totaltested = sum(c(TestedCOVID, TestednoCOVID))
) %>%
select(starts_with("total")) %>%
mutate(
TestingRate = (totaltested/totals)*100
) %>%
select(TestingRate) %>%
mutate(region = case_when(region == 1 ~ "Midlands", region == 2 ~ "South & East", region == 3 ~ "North", region == 4 ~ "London", region == 5 ~ "Wales, Scotland, Northern Ireland"))## Adding missing grouping variables: `region`
## Adding missing grouping variables: `region`
## Adding missing grouping variables: `region`
print(exp2)## # A tibble: 15 x 3
## # Groups: region, q7beentested [15]
## region q7beentested n
## <dbl+lbl> <dbl+lbl> <int>
## 1 1 [Midlands] 0 [Not been tested] 930
## 2 1 [Midlands] 1 [Tested and result showed no coronavirus] 62
## 3 1 [Midlands] 2 [Tested and result showed yes coronavirus] 40
## 4 2 [South & East] 0 [Not been tested] 1656
## 5 2 [South & East] 1 [Tested and result showed no coronavirus] 70
## 6 2 [South & East] 2 [Tested and result showed yes coronavirus] 59
## 7 3 [North] 0 [Not been tested] 1333
## 8 3 [North] 1 [Tested and result showed no coronavirus] 66
## 9 3 [North] 2 [Tested and result showed yes coronavirus] 56
## 10 4 [London] 0 [Not been tested] 840
## 11 4 [London] 1 [Tested and result showed no coronavirus] 93
## 12 4 [London] 2 [Tested and result showed yes coronavirus] 67
## 13 5 [Wales/Scot/NI] 0 [Not been tested] 815
## 14 5 [Wales/Scot/NI] 1 [Tested and result showed no coronavirus] 39
## 15 5 [Wales/Scot/NI] 2 [Tested and result showed yes coronavirus] 23
# defining columns and q7beentested
TestednoCOVID <- c(q7beentested == 1)
TestedCOVID <- c(q7beentested == 2)
q7beentested <- c("TestednoCOVID", "TestedCOVID")
totals <- c("totals")
totaltested <- c("totaltested")
TestingRate <- c("TestingRate")
# creating the bar graph
ggplotfinal2 <- ggplot(data = exp2final, aes(x = region, y = TestingRate, fill = TestingRate)) + geom_bar(stat = "identity", width=0.5, colour="black", fill=rgb(1, 0.9, 1, 1)) +
labs(x = "Region", y ="Testing Rate (%) ", title = "COVID Antigen Testing Rates by Region") + theme_minimal()
print(ggplotfinal2)I mostly had trouble finding the total number of people who were tested (summing positive and negative COVID results):
- i tried mutate(sum = rowSums(exp2[, -totals])) and mutate(sum = rowSums(.[2:3])) and also mutate(tested = sum(c_across(Tested_no_COVID:Tested_and_COVID)))
Testing rate seems to be highest in the London region
This may be due to better access to testing locations in a bigger city or more testing locations available with faster results available.
Again, the social desirability bias plays a part in these results as participants across regions may inflate their response to testing.
3. does having a child increase your worry about COVID? This exploratory analysis had NA values, possibly due to participants not answering this question. You would hope that the researchers would make it compulsory to answer all questions in order to receive compensation.
# overview of data needed
exp3 <- COVID %>%
group_by(Has_child, q9worry) %>%
count(Has_child)
print(exp3)## # A tibble: 15 x 3
## # Groups: Has_child, q9worry [15]
## Has_child q9worry n
## <dbl+lbl> <dbl+lbl> <int>
## 1 0 [Does not have a child] 1 [Not at all worried] 96
## 2 0 [Does not have a child] 2 [Not very worried] 347
## 3 0 [Does not have a child] 3 [Somewhat worried] 1003
## 4 0 [Does not have a child] 4 [Very worried] 770
## 5 0 [Does not have a child] 5 [Extremely worried] 410
## 6 1 [Has a child] 1 [Not at all worried] 101
## 7 1 [Has a child] 2 [Not very worried] 285
## 8 1 [Has a child] 3 [Somewhat worried] 1027
## 9 1 [Has a child] 4 [Very worried] 962
## 10 1 [Has a child] 5 [Extremely worried] 787
## 11 NA 1 [Not at all worried] 9
## 12 NA 2 [Not very worried] 36
## 13 NA 3 [Somewhat worried] 122
## 14 NA 4 [Very worried] 111
## 15 NA 5 [Extremely worried] 83
# changing labels and omitting NA
exp3 %>%
mutate(Has_child = case_when(Has_child == 0 ~ "No Child", Has_child == 1 ~ "Child")) %>%
na.omit()## # A tibble: 10 x 3
## # Groups: Has_child, q9worry [10]
## Has_child q9worry n
## <chr> <dbl+lbl> <int>
## 1 No Child 1 [Not at all worried] 96
## 2 No Child 2 [Not very worried] 347
## 3 No Child 3 [Somewhat worried] 1003
## 4 No Child 4 [Very worried] 770
## 5 No Child 5 [Extremely worried] 410
## 6 Child 1 [Not at all worried] 101
## 7 Child 2 [Not very worried] 285
## 8 Child 3 [Somewhat worried] 1027
## 9 Child 4 [Very worried] 962
## 10 Child 5 [Extremely worried] 787
exp3final <- exp3 %>%
na.omit()
# changing them into factors for discrete variables
exp3final$q9worry <- as.factor(exp3final$q9worry)
exp3final$Has_child <-
as.factor(exp3final$Has_child)
exp3final %>%
mutate(Has_child = case_when(Has_child == 0 ~ "No Child", Has_child == 1 ~ "Child"))## # A tibble: 10 x 3
## # Groups: Has_child, q9worry [10]
## Has_child q9worry n
## <chr> <fct> <int>
## 1 No Child 1 96
## 2 No Child 2 347
## 3 No Child 3 1003
## 4 No Child 4 770
## 5 No Child 5 410
## 6 Child 1 101
## 7 Child 2 285
## 8 Child 3 1027
## 9 Child 4 962
## 10 Child 5 787
print(exp3final)## # A tibble: 10 x 3
## # Groups: Has_child, q9worry [10]
## Has_child q9worry n
## <fct> <fct> <int>
## 1 0 1 96
## 2 0 2 347
## 3 0 3 1003
## 4 0 4 770
## 5 0 5 410
## 6 1 1 101
## 7 1 2 285
## 8 1 3 1027
## 9 1 4 962
## 10 1 5 787
# using ggplot
explore3graph <- ggplot(exp3final, aes(q9worry, n, group=Has_child, colour=Has_child)) +
geom_line(mapping = aes(x = q9worry, y = n, group = Has_child), size = 1, linetype = 1) +
theme_ipsum() +
labs(x="Worry", y = "Number of Responses", title = "Impact of Having Children on Worry about COVID-19", subtitle = "1 = Not at all worried, 2 = Not very worried, 3 = Somewhat worried, 4 = Very worried, 5 = Extremely worried")
# final result!
explore3graph +
theme(
plot.subtitle = element_text(size = 10, face = "italic")
) + scale_color_manual(labels = c("No Child", "Child"), values = c("pink", "purple")) + labs(fill="Legend") +
theme(legend.title = element_blank())It seems that having a child only significantly impacts worry but at the higher end of the scale. There were more responses for “very worried” and “extremely worried” for those who have children. However, there are more participants who have children.
Has_child <- c("Has_Child")
exp3percent0 <- exp3 %>%
na.omit() %>%
pivot_wider(
id_cols = Has_child,
names_from = q9worry,
values_from = n
) %>%
mutate(totals = rowSums(across(where(is.numeric)))) %>%
mutate(
One = 48.73, Two = 54.91, Three = 49.40, Four = 44.46, Five = 34.25) %>%
select(c(Has_child, 8:12)) %>%
filter(Has_child == 0) %>%
pivot_longer(
cols = c(2:6),
names_to = "Worry",
values_to = "Percentages"
)
exp3percent1 <- exp3 %>%
na.omit() %>%
pivot_wider(
id_cols = Has_child,
names_from = q9worry,
values_from = n
) %>%
mutate(totals = rowSums(across(where(is.numeric)))) %>%
mutate(
One = 51.27, Two = 45.09, Three = 50.60, Four = 55.54, Five = 65.75) %>%
select(c(Has_child, 8:12)) %>%
filter(Has_child == 1) %>%
pivot_longer(
cols = c(2:6),
names_to = "Worry",
values_to = "Percentages") %>%
zap_label()
exp3percentfinal <- bind_rows(exp3percent0, exp3percent1) %>%
zap_labels()
Percentages <- c("Percentages")
Regions <- c("Regions")
Worry <- c("Worry")
cumulativesum <- c("cumulativesum")
Has_child <- c("Has_child")
exp3percentfinal$Worry <- as.factor(exp3percentfinal$Worry)
exp3percentfinal$Has_child <-
as.factor(exp3percentfinal$Has_child)
exp3percentfinal$Worry <- factor(exp3percentfinal$Worry, levels = c("One", "Two", "Three", "Four", "Five"))
finalgraph <- ggplot() +
geom_bar(aes(y = Percentages, x = Worry, fill = Has_child), data = exp3percentfinal, stat = "identity", width = 0.5, position = "stack") +
theme(legend.title = element_blank()) +
scale_fill_manual(labels = c("No Child", "Child"), values = c("pink", "purple")) +
labs(fill="Legend") +
scale_y_continuous(labels = dollar_format(suffix = "%", prefix = "")) +
labs(x = "Worry", y = "Percentage", title = "Percentage of Worry Responses (%)", subtitle = "1 = Not at all worried, 2 = Not very worried, 3 = Somewhat worried, 4 = Very worried, 5 = Extremely worried") +
theme(
plot.subtitle = element_text(size = 10, face = "italic"))
print(finalgraph)This stacked bar graph makes it really easy to visualise that more respondents with children are more intensely worried
Part 4: Recommendation
The 3 recommendations I have chosen are:
Clear exclusion criteria - we some variables, we didn’t know why participants were excluded. This means the total number of participants was not the original 6149 and interfered with our reproducibility as our numbers often didn’t match up
A code book - we didn’t even realise other groups got a code book, so getting pre-existing code for our descriptives would make reproducing everything so much easier. It was very time-consuming coding descriptives ourselves
read me document - this would’ve been a lifesaver to understand how each variable was coded in the data, as some Likert scales were 1-5 and some were 1-4, others were binary variables and some were continuous
Next Steps…
My next steps are to finalise my report, and figure out how to knit to pdf and word. I’ve heard some members of my group having trouble so I’m a bit nervous.