Week 10: Learning Log

My Coding Goals this Week

My goals this week are to finish my report.

Challenges/Successes

To be honest, I had a hard time this week on my report.

I ran into many challenges trying to knit my super long document and edit out any errors. I also found errors within our group coding for our verification, even though the same code worked just fine a few weeks ago.

Here is my drafted report:

Part 1: Reaction

Summary

This paper by Smith et al investigates if believing you have had COVID-19 impacts how an individual self-reports their adherence to lockdown measures. It is important to understand if people will stop adhering to measures after testing positive, as it is still unclear if an individual can have COVID more than once, and how common this could be. So far, there is no previous evidence on if adhering to protective measures is different if an individual thinks they have COVID or not (could be self-diagnosed or with antigen/antibody test). Understanding if a COVID diagnosis changes the way we try to protect ourselves from COVID and if the diagnosis impacts how we report our behaviour will help us understand future ways of exiting from lockdown strategies.

This study utilised an online cross-sectional survey, spanning 6149 UK participants aged 18+. Participants were asked if they had had COVID-19, if they had been tested, their perceived immunity to COVID-19, and other variables such as how often they went shopping and saw friends. Measurement of variables included Likert scales, binary and continuous variables. The authors hypothesised that believing you have had COVID-19 makes you more likely to believe you are immune, in addition to being less likely to adhere to social distancing measures.

It was found that those who believed they had had COVID-19 were more likely to think they are immune and stop participating in activities such as washing hands and social distancing. There was no evidence found for an association between thinking that you had had COVID-19 and its perceived risk. It remains likely that people will be required to adhere to protective measures for COVID-19 even if they have had the illness previously.

Since research around COVID is still novel, the results from this paper may significantly impact the future of implementation of lockdown rules. Currently, no media communications specifically target those who believed they have had COVID-19. As the opinions towards lockdown measures and COVID-19 immunity are different in these people, it is worth addressing this gap in the media. However, this study heavily relies on self-reported measures, where the social desirability bias impacts the way participants respond, especially in the rates of adhering to lockdown measures. The response from participants also may not be representative of the UK population.

Reaction

I wonder whether the results of this study would be universal if the same method was applied in a different country, such as Australia. It would be interesting to see if an increase/decrease of COVID-19 rates in the particular country would change these results. Similarly, if harsher COVID-19 restrictions would have any effect on the results gathered

I was confused by Figure 1, I couldn’t figure out what scale they were using for the graph and it seemed very out of place. I didn’t like how the labels on the X axis were so long and how the percentages of yes and no for each variable did not add up to 100%.

The most interesting parts of this paper was the statistics for the COVID-19 antigen test. I thought it was interesting that more than half of those who tested negative believed they had COVID-19. And since people who thought they had had COVID-19 were less likely to correctly identify COVID-19 symptoms, it seems to me that those who think they are COVID negative have a better understanding of COVID-19 symptoms. Does that mean that people who think they have had COVID-19 and think they have increased immunity become complacent?

Part 2: Verification

Now we are going to reproduce!

install.packages("dplyr")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("gt")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("forcats")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("ggplot")
## Warning: package 'ggplot' is not available for this version of R
## 
## A version of this package for your version of R might be available elsewhere,
## see the ideas at
## https://cran.r-project.org/doc/manuals/r-patched/R-admin.html#Installing-packages
install.packages("tidyr")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("janitor")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("hrbrthemes")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
install.packages("scales")
## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//Rtmp1esleE/downloaded_packages
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ readr   1.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(haven)
library(gt)
library(forcats)
library(ggplot2)
library(tidyr)
library(janitor)
## 
## Attaching package: 'janitor'
## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test
library(hrbrthemes)
## NOTE: Either Arial Narrow or Roboto Condensed fonts are required to use these themes.
##       Please use hrbrthemes::import_roboto_condensed() to install Roboto Condensed and
##       if Arial Narrow is not on your system, please see https://bit.ly/arialnarrow
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
COVID <- read_sav(file ="coviddata.sav")

Demographic descriptives

the number of participants who thought they didn’t and did have COVID

amount_covid <- COVID %>% 
  group_by(Ever_covid) %>% 
  count(Ever_covid)
print(amount_covid)
## # A tibble: 2 x 2
## # Groups:   Ever_covid [2]
##                           Ever_covid     n
##                            <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus]  4656
## 2 1 [Think have had coronavirus]      1493

how many had an antigen test for COVID

amount_covid1_antigen <- COVID %>% 
  group_by(q7beentested) %>%
  count(q7beentested)
print(amount_covid1_antigen)
## # A tibble: 3 x 2
## # Groups:   q7beentested [3]
##                                   q7beentested     n
##                                      <dbl+lbl> <int>
## 1 0 [Not been tested]                           5574
## 2 1 [Tested and result showed no coronavirus]    330
## 3 2 [Tested and result showed yes coronavirus]   245

how many tested negative but thought they had COVID

negativethought <- COVID %>% 
  group_by(q7beentested, Ever_covid) %>% 
  filter(Ever_covid == 1, q7beentested == 1) %>% 
  count(Ever_covid)
print(negativethought)
## # A tibble: 1 x 3
## # Groups:   q7beentested, Ever_covid [1]
##                                 q7beentested                    Ever_covid     n
##                                    <dbl+lbl>                     <dbl+lbl> <int>
## 1 1 [Tested and result showed no coronaviru… 1 [Think have had coronaviru…   187

how many tested positive but thought they didnt have COVID

positivethought <- COVID %>% 
  group_by(q7beentested, Ever_covid) %>% 
  filter(Ever_covid == 0, q7beentested == 2) %>% 
  count(Ever_covid)
print(positivethought)
## # A tibble: 1 x 3
## # Groups:   q7beentested, Ever_covid [1]
##                               q7beentested                      Ever_covid     n
##                                  <dbl+lbl>                       <dbl+lbl> <int>
## 1 2 [Tested and result showed yes coronav… 0 [Think have not had coronavi…    56

how many males/females

COVID <- read_sav(file ="coviddata.sav")
covidgender <- COVID %>% 
  group_by(gender, Ever_covid) %>%
  mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "female")) %>% 
  mutate(Ever_covid = case_when(Ever_covid == 0 ~ "Think have not had COVID-19", Ever_covid == 1 ~ "Think have had COVID-19")) %>%
  count(gender)
print(covidgender)
## # A tibble: 4 x 3
## # Groups:   gender, Ever_covid [4]
##   gender Ever_covid                      n
##   <chr>  <chr>                       <int>
## 1 female Think have had COVID-19       796
## 2 female Think have not had COVID-19  2459
## 3 Male   Think have had COVID-19       697
## 4 Male   Think have not had COVID-19  2197

how many in each age category

age <- COVID %>% 
  group_by(age_categories) %>% 
  mutate(age_categories = case_when(age_categories == 1 ~ "18 to 24 years", age_categories == 2 ~ "25 to 34 years", age_categories == 3 ~ "35 to 44 years", age_categories == 4 ~ "45 to 54 years", age_categories == 5 ~ "55 years and over")) %>%
  count(age_categories)
print(age)
## # A tibble: 5 x 2
## # Groups:   age_categories [5]
##   age_categories        n
##   <chr>             <int>
## 1 18 to 24 years     1422
## 2 25 to 34 years     1223
## 3 35 to 44 years     1045
## 4 45 to 54 years      718
## 5 55 years and over  1741

how many have children

They excluded 361 people here for some reason. Our guess is that they didn’t answer the question

child <- COVID %>% 
  group_by(Has_child) %>% 
  count(Has_child)
print(child)
## # A tibble: 3 x 2
## # Groups:   Has_child [3]
##                    Has_child     n
##                    <dbl+lbl> <int>
## 1  0 [Does not have a child]  2626
## 2  1 [Has a child]            3162
## 3 NA                           361

employment status

They excluded 83 people here

employment <- COVID %>% 
  group_by(Working) %>% 
  count(Working)
print(employment)
## # A tibble: 3 x 2
## # Groups:   Working [3]
##                                             Working     n
##                                           <dbl+lbl> <int>
## 1  0 [Not working]                                   2071
## 2  1 [Working (full or part time or self-employed)]  3995
## 3 NA                                                   83

how many are working in key sector

worker <- COVID %>% 
  group_by(Key_worker) %>% 
  count(Key_worker)
print(worker)
## # A tibble: 2 x 2
## # Groups:   Key_worker [2]
##           Key_worker     n
##            <dbl+lbl> <int>
## 1 0 [Not key worker]  3858
## 2 1 [Key worker]      2291

level of education

They excluded 92 people, maybe those who didn’t fit into either category?

education <- COVID %>% 
  group_by(degree) %>% 
  count(degree)
print(education)
## # A tibble: 3 x 2
## # Groups:   degree [3]
##                                                  degree     n
##                                               <dbl+lbl> <int>
## 1  0 [GCSE/vocational/A-level/no formal qualifications]  4442
## 2  1 [Degree or higher (Bachelors, Masters, PhD)]        1615
## 3 NA                                                       92

how many from each region

region1 <- COVID %>% 
  group_by(region) %>% 
  count(region)
print(region1)
## # A tibble: 5 x 2
## # Groups:   region [5]
##              region     n
##           <dbl+lbl> <int>
## 1 1 [Midlands]       1032
## 2 2 [South & East]   1785
## 3 3 [North]          1455
## 4 4 [London]         1000
## 5 5 [Wales/Scot/NI]   877

how many agreed/strongly agreed that they had some immunity to COVID & this also compared to if they thought they had COVID

agreecovid <- COVID %>% 
  group_by(q8haveimmunity) %>% 
  filter(q8haveimmunity > 3) %>% 
  count(q8haveimmunity)
print(agreecovid)
## # A tibble: 2 x 2
## # Groups:   q8haveimmunity [2]
##       q8haveimmunity     n
##            <dbl+lbl> <int>
## 1 4 [Agree]            841
## 2 5 [Strongly agree]   299
agreecovid1 <- COVID %>% 
  group_by(q8haveimmunity, Ever_covid) %>% 
  filter(q8haveimmunity > 3) %>% 
  count(Ever_covid)
print(agreecovid1)
## # A tibble: 4 x 3
## # Groups:   q8haveimmunity, Ever_covid [4]
##       q8haveimmunity                         Ever_covid     n
##            <dbl+lbl>                          <dbl+lbl> <int>
## 1 4 [Agree]          0 [Think have not had coronavirus]   382
## 2 4 [Agree]          1 [Think have had coronavirus]       459
## 3 5 [Strongly agree] 0 [Think have not had coronavirus]   118
## 4 5 [Strongly agree] 1 [Think have had coronavirus]       181

Means & SDs, Tables/Figures

Table 1

This code finds the numbers and percentages for variables in table 1. First, we used zap labels to remove the pre-existing labels from each vector. We used pivot longer in order to find the total number for each variable to create percentages. We filtered out variables not needed for table 1 and then omitted NA values. We found percentages using mutate, rounding to 1 decimal place.

covid_1 <- COVID %>%
  zap_labels() %>% 
  pivot_longer(gender:region, names_to = "vars", values_to = "values") %>% 
  group_by(Ever_covid, vars,values) %>% 
  count(name = "number") %>% 
  na.omit %>% 
  group_by(vars,values) %>% 
  mutate(percentage = round(number/sum(number) * 100, 1))
print(covid_1)
## # A tibble: 40 x 5
## # Groups:   vars, values [20]
##    Ever_covid vars           values number percentage
##         <dbl> <chr>           <dbl>  <int>      <dbl>
##  1          0 age_categories      1   1003       70.5
##  2          0 age_categories      2    823       67.3
##  3          0 age_categories      3    751       71.9
##  4          0 age_categories      4    554       77.2
##  5          0 age_categories      5   1525       87.6
##  6          0 degree              0   3382       76.1
##  7          0 degree              1   1200       74.3
##  8          0 gender              1   2197       75.9
##  9          0 gender              2   2459       75.5
## 10          0 Has_child           0   2005       76.4
## # … with 30 more rows

We used this code to find how many thought they have not had COVID and how many thought they did. These numbers are at the top row of table 1 and table 3

amount_covid <- COVID %>% 
  group_by(Ever_covid) %>% 
  count(Ever_covid)
  print(amount_covid)
## # A tibble: 2 x 2
## # Groups:   Ever_covid [2]
##                           Ever_covid     n
##                            <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus]  4656
## 2 1 [Think have had coronavirus]      1493

We now changed the data back to wide data to begin reproducing the table. We again used zap labels to remove any leftover labels since we previously grouped before omitting NA values, so now there should not be any more labels. We used mutate to turn everything into a factor.

table1 <- covid_1 %>% 
  pivot_wider(names_from = vars, values_from = values) %>% 
zap_labels() %>% mutate_if(is.numeric, as.factor)
print(table1)
## # A tibble: 40 x 10
##    Ever_covid number percentage age_categories degree gender Has_child
##    <fct>      <fct>  <fct>      <fct>          <fct>  <fct>  <fct>    
##  1 0          1003   70.5       1              <NA>   <NA>   <NA>     
##  2 0          823    67.3       2              <NA>   <NA>   <NA>     
##  3 0          751    71.9       3              <NA>   <NA>   <NA>     
##  4 0          554    77.2       4              <NA>   <NA>   <NA>     
##  5 0          1525   87.6       5              <NA>   <NA>   <NA>     
##  6 0          3382   76.1       <NA>           0      <NA>   <NA>     
##  7 0          1200   74.3       <NA>           1      <NA>   <NA>     
##  8 0          2197   75.9       <NA>           <NA>   1      <NA>     
##  9 0          2459   75.5       <NA>           <NA>   2      <NA>     
## 10 0          2005   76.4       <NA>           <NA>   <NA>   0        
## # … with 30 more rows, and 3 more variables: Key_worker <fct>, region <fct>,
## #   Working <fct>

Now using factors, we can create the labels from the original table.

  table1$Ever_covid <- factor(table1$Ever_covid, labels = c("Think have not had COVID-19 n = 4656", "Think have had COVID-19 n = 1493"))
table1$gender <-factor(table1$gender, labels = c("Male", "Female"))
table1$age_categories <- factor(table1$age_categories, labels = c("18 to 24 years", "25 to 34 years", "35 to 44 years", "45 to 54 years", "55 years and over"))
table1$Has_child <- factor(table1$Has_child, labels = c("No", "Yes"))
table1$Working <- factor(table1$Working, labels = c("Not working", "Working"))
table1$Key_worker <- factor(table1$Key_worker, labels = c("No", "Yes"))
table1$degree <-factor(table1$degree, labels = c("GCSE/vocational/A-level/No formal qualifications", "Degree or higher (Bachelors, Masters, PhD"))
table1$region <- factor(table1$region, labels = c("Midlands", "South and East", "North", "London", "Wales, Scotland and Northern Ireland"))

Now we pivot again to long data to find our percentages using mutate, and add the rest of the labels.

table1 <- table1 %>% 
  pivot_longer(age_categories:Working, names_to = "Participant Characteristics", values_to = "Levels") %>% 
  na.omit %>% 
  mutate(Real = paste0(number,"(",percentage,"%)")) %>%
  select(-percentage, -number) %>% 
  pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real) 
print(table1)
## # A tibble: 20 x 4
##    `Participant Chara… Levels          `Think have not had … `Think have had CO…
##    <chr>               <fct>           <chr>                 <chr>              
##  1 age_categories      18 to 24 years  1003(70.5%)           419(29.5%)         
##  2 age_categories      25 to 34 years  823(67.3%)            400(32.7%)         
##  3 age_categories      35 to 44 years  751(71.9%)            294(28.1%)         
##  4 age_categories      45 to 54 years  554(77.2%)            164(22.8%)         
##  5 age_categories      55 years and o… 1525(87.6%)           216(12.4%)         
##  6 degree              GCSE/vocationa… 3382(76.1%)           1060(23.9%)        
##  7 degree              Degree or high… 1200(74.3%)           415(25.7%)         
##  8 gender              Male            2197(75.9%)           697(24.1%)         
##  9 gender              Female          2459(75.5%)           796(24.5%)         
## 10 Has_child           No              2005(76.4%)           621(23.6%)         
## 11 Has_child           Yes             2386(75.5%)           776(24.5%)         
## 12 Key_worker          No              3105(80.5%)           753(19.5%)         
## 13 Key_worker          Yes             1551(67.7%)           740(32.3%)         
## 14 region              Midlands        781(75.7%)            251(24.3%)         
## 15 region              South and East  1369(76.7%)           416(23.3%)         
## 16 region              North           1120(77%)             335(23%)           
## 17 region              London          701(70.1%)            299(29.9%)         
## 18 region              Wales, Scotlan… 685(78.1%)            192(21.9%)         
## 19 Working             Not working     1714(82.8%)           357(17.2%)         
## 20 Working             Working         2871(71.9%)           1124(28.1%)
table1$`Participant Characteristics` <- as.factor(table1$`Participant Characteristics`)
table1$`Participant Characteristics` <- factor(table1$`Participant Characteristics`, levels = c ( "gender",  "age_categories", "Has_child", "Working", "Key_worker","degree", "region"), labels = c ( "Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region"))

Using gt to create the table. Using rowname_col and groupname_col to create the headings and values. We pretty much used the gt package from here onwards for this table, reordering the groups and adding labels and fixing text to match the original table

  table1 <- table1 %>% 
  gt(rowname_col = "Levels",
     groupname_col = "Participant Characteristics") %>% 
  tab_spanner(label = md("**Had COVID-19**"), 
              columns = vars("Think have not had COVID-19 n = 4656","Think have had COVID-19 n = 1493")) %>% 
    row_group_order(c("Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region")) %>% 
    cols_label(
      'Think have not had COVID-19 n = 4656' = md("**Think have not had COVID-19 n = 4656**"),
      'Think have had COVID-19 n = 1493' = md( "**Think have had COVID-19 n = 1493**")
    ) %>% 
    tab_style(
      style = list(
        cell_text(weight = "bold"),
        cell_fill()
      ),
  locations = cells_row_groups())

Table 2

Started off by finding the mean and SD for variables in table 2, as this is what is needed to reproduce the original.

 MSD <- COVID %>% 
   group_by(Ever_covid) %>% 
   summarise(across(c(q8haveimmunity,Going_out_total,q9worry,q10arisk,q10brisk),
                    list(mean = mean, standard_deviation = sd)))  %>% 
   mutate(across(2:11, round,2))
print(MSD)
## # A tibble: 2 x 11
##   Ever_covid q8haveimmunity_… q8haveimmunity_… Going_out_total… Going_out_total…
##    <dbl+lbl>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 0 [Think …             2.38             1.01             6.69             5.63
## 2 1 [Think …             3.33             1                9.35             7.69
## # … with 6 more variables: q9worry_mean <dbl>,
## #   q9worry_standard_deviation <dbl>, q10arisk_mean <dbl>,
## #   q10arisk_standard_deviation <dbl>, q10brisk_mean <dbl>,
## #   q10brisk_standard_deviation <dbl>

Once we have the means/SD, we pivot to long data, selecting only the variables we want and arranging by Ever_covid

table2mean <- MSD %>% 
  select(ends_with("mean"), Ever_covid) %>% 
  pivot_longer(ends_with("mean") ,  names_to = "Mean_vars", values_to = "Mean_values")

table2SD <- MSD %>% 
  select(ends_with("deviation"), Ever_covid) %>% 
  pivot_longer(ends_with("deviation") ,  names_to = "SD_vars", values_to = "SD_values")

And now we bind these together using bind_cols, and mutate to change into the labels we want for our table, as well as renaming them.

  table2 <- bind_cols(table2SD,table2mean) %>% 
  mutate(real = paste0("M = ", Mean_values,",", " SD = ", SD_values)) %>% 
  select(Ever_covid...1, Mean_vars, real) %>% 
  mutate(Level = case_when(startsWith(Mean_vars,"q8") ~ "1 = Strongly disagree to  5 = Strongly agree", startsWith(Mean_vars,"q9") ~"1 = not worried at all to  5 = extremely worried", startsWith(Mean_vars, "Going") ~"Range = 0 to 42", TRUE ~"1 = not risk at all to  4 = major risk ")) %>% 
  rename("Participant characteristics" = "Mean_vars") 
## New names:
## * Ever_covid -> Ever_covid...1
## * Ever_covid -> Ever_covid...4
print(table2)
## # A tibble: 10 x 4
##            Ever_covid...1 `Participant charact… real       Level                
##                 <dbl+lbl> <chr>                 <chr>      <chr>                
##  1 0 [Think have not had… q8haveimmunity_mean   M = 2.38,… "1 = Strongly disagr…
##  2 0 [Think have not had… Going_out_total_mean  M = 6.69,… "Range = 0 to 42"    
##  3 0 [Think have not had… q9worry_mean          M = 3.59,… "1 = not worried at …
##  4 0 [Think have not had… q10arisk_mean         M = 2.81,… "1 = not risk at all…
##  5 0 [Think have not had… q10brisk_mean         M = 3.39,… "1 = not risk at all…
##  6 1 [Think have had cor… q8haveimmunity_mean   M = 3.33,… "1 = Strongly disagr…
##  7 1 [Think have had cor… Going_out_total_mean  M = 9.35,… "Range = 0 to 42"    
##  8 1 [Think have had cor… q9worry_mean          M = 3.38,… "1 = not worried at …
##  9 1 [Think have had cor… q10arisk_mean         M = 2.81,… "1 = not risk at all…
## 10 1 [Think have had cor… q10brisk_mean         M = 3.3, … "1 = not risk at all…

Now we change them into factors..

  table2$Ever_covid...1 <- as.factor(table2$Ever_covid...1)
  table2$`Participant characteristics` <- as.factor(table2$`Participant characteristics`)
  table2$Ever_covid...1 <- factor(table2$Ever_covid...1, labels = c("Think have not had COVID-19", "Think have had COVID-19"))
  table2$`Participant characteristics` <- factor(table2$`Participant characteristics`, labels = c("I think I have some immunity to COVID-19", "Total out-of-home activity in the last seven days","Worry about COVID-19","Perceived risk of COVID-19 to onself","Perceived risk of COVID-19 to people in the UK"))

Back to wide data to use gt. We add our subheadings and heading/title and it’s complete.

  table2 <- table2 %>% 
    pivot_wider(id_cols = -real, names_from = Ever_covid...1, values_from = real) %>% 
    gt(rowname_col = "Levels",
       groupname_col = "Participant Characteristics") %>% 
    tab_spanner(label = "Had COVID-19", 
                columns = vars("Think have not had COVID-19", "Think have had COVID-19"))

Table 3

We pivot to longer data to find numbers for each variable in table 3, sorted according to if they have had COVID-19 or not.

  t3_top <- COVID %>% 
   pivot_longer(Adhere_shop_groceries:Sx_covid_nomissing, names_to = "vars" , values_to = "values") %>%
   group_by(vars, values) %>% 
   count(name = "number")
  print(t3_top)
## # A tibble: 8 x 3
## # Groups:   vars, values [8]
##   vars                  values number
##   <chr>                  <dbl>  <int>
## 1 Adhere_meet_friends        0    878
## 2 Adhere_meet_friends        1   5271
## 3 Adhere_shop_groceries      0   2389
## 4 Adhere_shop_groceries      1   3760
## 5 Adhere_shop_other          0   1833
## 6 Adhere_shop_other          1   4316
## 7 Sx_covid_nomissing         0   2517
## 8 Sx_covid_nomissing         1   3632

We now include Ever_covid while pivoting to long data as this is needed for the left column in our table. Omitting any NA values and mutating creates a new column with our percentages.

  covid_3 <- COVID %>% 
   pivot_longer(Adhere_shop_groceries:Sx_covid_nomissing, names_to = "vars", values_to = "values") %>% 
   group_by(Ever_covid,vars,values) %>% 
   count(name = "number") %>% 
   na.omit %>% 
   group_by(Ever_covid, vars) %>% 
   mutate(percentage = round(number/sum(number) * 100,1))
  print(covid_3)
## # A tibble: 16 x 5
## # Groups:   Ever_covid, vars [8]
##                          Ever_covid vars                values number percentage
##                           <dbl+lbl> <chr>                <dbl>  <int>      <dbl>
##  1 0 [Think have not had coronavir… Adhere_meet_friends      0    456        9.8
##  2 0 [Think have not had coronavir… Adhere_meet_friends      1   4200       90.2
##  3 0 [Think have not had coronavir… Adhere_shop_grocer…      0   1701       36.5
##  4 0 [Think have not had coronavir… Adhere_shop_grocer…      1   2955       63.5
##  5 0 [Think have not had coronavir… Adhere_shop_other        0   1156       24.8
##  6 0 [Think have not had coronavir… Adhere_shop_other        1   3500       75.2
##  7 0 [Think have not had coronavir… Sx_covid_nomissing       0   1729       37.1
##  8 0 [Think have not had coronavir… Sx_covid_nomissing       1   2927       62.9
##  9 1 [Think have had coronavirus]   Adhere_meet_friends      0    422       28.3
## 10 1 [Think have had coronavirus]   Adhere_meet_friends      1   1071       71.7
## 11 1 [Think have had coronavirus]   Adhere_shop_grocer…      0    688       46.1
## 12 1 [Think have had coronavirus]   Adhere_shop_grocer…      1    805       53.9
## 13 1 [Think have had coronavirus]   Adhere_shop_other        0    677       45.3
## 14 1 [Think have had coronavirus]   Adhere_shop_other        1    816       54.7
## 15 1 [Think have had coronavirus]   Sx_covid_nomissing       0    788       52.8
## 16 1 [Think have had coronavirus]   Sx_covid_nomissing       1    705       47.2

Back to wide data to prepare our tibble for the table by removing pre-existing labels using zap labels like before, and changing into factors.

  table3 <- covid_3 %>% 
    pivot_wider(names_from = vars, values_from = values) %>% 
    zap_labels() %>% mutate_if(is.numeric, as.factor) 
## `mutate_if()` ignored the following grouping variables:
## Column `Ever_covid`
print(table3)
## # A tibble: 16 x 7
## # Groups:   Ever_covid [2]
##    Ever_covid number percentage Adhere_meet_friends Adhere_shop_groceries
##         <dbl> <fct>  <fct>      <fct>               <fct>                
##  1          0 456    9.8        0                   <NA>                 
##  2          0 4200   90.2       1                   <NA>                 
##  3          0 1701   36.5       <NA>                0                    
##  4          0 2955   63.5       <NA>                1                    
##  5          0 1156   24.8       <NA>                <NA>                 
##  6          0 3500   75.2       <NA>                <NA>                 
##  7          0 1729   37.1       <NA>                <NA>                 
##  8          0 2927   62.9       <NA>                <NA>                 
##  9          1 422    28.3       0                   <NA>                 
## 10          1 1071   71.7       1                   <NA>                 
## 11          1 688    46.1       <NA>                0                    
## 12          1 805    53.9       <NA>                1                    
## 13          1 677    45.3       <NA>                <NA>                 
## 14          1 816    54.7       <NA>                <NA>                 
## 15          1 788    52.8       <NA>                <NA>                 
## 16          1 705    47.2       <NA>                <NA>                 
## # … with 2 more variables: Adhere_shop_other <fct>, Sx_covid_nomissing <fct>
  table3$Ever_covid <- as.factor(table3$Ever_covid)
  table3$Ever_covid <- factor(table3$Ever_covid, labels = c("Think have not had COVID-19", "Think have had COVID-19"))
  table3$Adhere_shop_groceries <- factor(table3$Adhere_shop_groceries, labels = c("On one or fewer days in the last week, n = 2389", "On two or more days in the last week, n = 3760"))
  table3$Adhere_meet_friends <- factor(table3$Adhere_meet_friends, labels = c("Not at all in the last week, n = 5271", "On one or more days in the last week, n = 878"))
  table3$Adhere_shop_other <- factor(table3$Adhere_shop_other, labels = c("Not at all in the last week, n = 1833", "On one or more days in the last week, n = 4316"))
  table3$Sx_covid_nomissing <- factor(table3$Sx_covid_nomissing, labels = c("Did not correctly identify symptoms, n = 2390", "Correctly identified common symptoms, n = 3632"))

Pivoting to long data once again to fix up our column headings

  table3 <- table3 %>% 
    pivot_longer(Adhere_meet_friends:Sx_covid_nomissing, names_to = "Participant Characteristics", values_to = "Levels") %>% 
    na.omit %>% 
    mutate(Real = paste0(number,"(",percentage,")")) %>% 
    select(-number,-percentage) %>%
    pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real) 
    print(table3)
## # A tibble: 8 x 4
##   `Participant Charac… Levels             `Think have not ha… `Think have had C…
##   <chr>                <fct>              <chr>               <chr>             
## 1 Adhere_meet_friends  Not at all in the… 456(9.8)            422(28.3)         
## 2 Adhere_meet_friends  On one or more da… 4200(90.2)          1071(71.7)        
## 3 Adhere_shop_groceri… On one or fewer d… 1701(36.5)          688(46.1)         
## 4 Adhere_shop_groceri… On two or more da… 2955(63.5)          805(53.9)         
## 5 Adhere_shop_other    Not at all in the… 1156(24.8)          677(45.3)         
## 6 Adhere_shop_other    On one or more da… 3500(75.2)          816(54.7)         
## 7 Sx_covid_nomissing   Did not correctly… 1729(37.1)          788(52.8)         
## 8 Sx_covid_nomissing   Correctly identif… 2927(62.9)          705(47.2)

Changing to factors

  table3$`Participant Characteristics` <- as.factor(table3$`Participant Characteristics`)
  table3$`Participant Characteristics` <- factor(table3$`Participant Characteristics`, labels = c(
    "Meet up with family and friends","Shopping for groceries/pharmacy", "Shopping for items other than groceries or pharmacy",
    "Correct identification of main symptoms"
  ))

Using gt to create the table. Complete

  table3 <- table3 %>% 
    gt(rowname_col = "Levels",
       groupname_col = "Participant Characteristics") %>% 
    tab_spanner(label = "Had COVID-19", 
                columns = vars("Think have not had COVID-19",
                               "Think have had COVID-19")) %>% 
    row_group_order(c("Meet up with family and friends", "Shopping for groceries/pharmacy",
                      "Shopping for items other than groceries or pharmacy", "Correct identification of main symptoms" ))

Figure 1 graph

This reproduces the original graph

We first removed the pre-existing labels with zap_labels and pivoted to long data with all our variables as columns. Once we had a column with our number of participants for each variable and also a column stating whether or not they thought they have had COVID, we were able to create a percentage column and filter for the values we needed from each variable. Finally, we add our labels back for the Ever_Covid variable.

  graph_values_real <- COVID %>% 
   zap_labels() %>% 
   mutate(Going_out_total = case_when(Going_out_total > 7 ~ 1, TRUE ~0)) %>% 
   pivot_longer(q8haveimmunity:Sx_covid_nomissing, names_to = "vars", values_to = "values") %>%
   group_by(Ever_covid,vars,values) %>% 
   count(name = "number") %>% 
   na.omit %>% 
   group_by(Ever_covid, vars) %>% 
   mutate(Percentage = round(number/sum(number) * 100,1)) %>% 
   filter(case_when(vars == "q8haveimmunity" ~ values == 5,
                    vars == "Sx_covid_nomissing" ~ values == 0 , startsWith(vars,'Adhere') ~ values ==0,
                    TRUE ~ values == 1)) %>% 
   select(vars, Ever_covid, Percentage) %>% 
  mutate(qcovid = case_when(Ever_covid==1 ~ "Think have had COVID-19", TRUE ~ "Think have not had COVID-19"))
 print(graph_values_real)
## # A tibble: 18 x 4
## # Groups:   Ever_covid, vars [18]
##    vars                  Ever_covid Percentage qcovid                     
##    <chr>                      <dbl>      <dbl> <chr>                      
##  1 Adhere_meet_friends            0        9.8 Think have not had COVID-19
##  2 Adhere_shop_groceries          0       36.5 Think have not had COVID-19
##  3 Adhere_shop_other              0       24.8 Think have not had COVID-19
##  4 Going_out_total                0       37.6 Think have not had COVID-19
##  5 q10arisk                       0        3.2 Think have not had COVID-19
##  6 q10brisk                       0        1   Think have not had COVID-19
##  7 q8haveimmunity                 0        2.5 Think have not had COVID-19
##  8 q9worry                        0        2.5 Think have not had COVID-19
##  9 Sx_covid_nomissing             0       37.1 Think have not had COVID-19
## 10 Adhere_meet_friends            1       28.3 Think have had COVID-19    
## 11 Adhere_shop_groceries          1       46.1 Think have had COVID-19    
## 12 Adhere_shop_other              1       45.3 Think have had COVID-19    
## 13 Going_out_total                1       51.2 Think have had COVID-19    
## 14 q10arisk                       1        3.9 Think have had COVID-19    
## 15 q10brisk                       1        1.3 Think have had COVID-19    
## 16 q8haveimmunity                 1       12.1 Think have had COVID-19    
## 17 q9worry                        1        6   Think have had COVID-19    
## 18 Sx_covid_nomissing             1       52.8 Think have had COVID-19
 covid_1 <- COVID %>%
  zap_labels() %>% 
  pivot_longer(gender:region, names_to = "vars", values_to = "values") %>% 
  group_by(Ever_covid, vars,values) %>% 
  count(name = "number") %>% 
  na.omit %>% 
  group_by(vars,values) %>% 
  mutate(percentage = round(number/sum(number) * 100, 1))
   
  alldata <- covid_1 %>% 
  pivot_wider(names_from = vars, values_from = values) %>% 
zap_labels() %>% mutate_if(is.numeric, as.factor)

changing each variable into a factor and adding our labels from the original table

  alldata$Ever_covid <- factor(alldata$Ever_covid, labels = c("no_covid", "yes_covid"))
alldata$gender <-factor(alldata$gender, labels = c("Male", "Female"))
alldata$age_categories <- factor(alldata$age_categories, labels = c("18 to 24 years", "25 to 34 years", "35 to 44 years", "45 to 54 years", "55 years and over"))
alldata$Has_child <- factor(alldata$Has_child, labels = c("No", "Yes"))
alldata$Working <- factor(alldata$Working, labels = c("Not working", "Working"))
alldata$Key_worker <- factor(alldata$Key_worker, labels = c("No", "Yes"))
alldata$degree <-factor(alldata$degree, labels = c("GCSE/vocational/A-level/No formal qualifications", "Degree or higher (Bachelors, Masters, PhD"))
alldata$region <- factor(alldata$region, labels = c("Midlands", "South and East", "North", "London", "Wales, Scotland and Northern Ireland"))
    
alldata <- alldata %>% 
  pivot_longer(age_categories:Working, names_to = "vars", values_to = "values") %>% 
  na.omit %>% 
  mutate(Real = paste0(number,"(",percentage,")")) %>%
  select(-percentage, -number) %>% 
  pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real) %>% 
  rename("Think have not had COVID-19" = "no_covid") %>% 
  rename("Think have had COVID-19" = "yes_covid") %>% 
  rename("Level" = "values") %>% 
  rename("Participant Characteristics" = "vars") 
print(alldata)  
## # A tibble: 20 x 4
##    `Participant Charac… Level             `Think have not ha… `Think have had C…
##    <chr>                <fct>             <chr>               <chr>             
##  1 age_categories       18 to 24 years    1003(70.5)          419(29.5)         
##  2 age_categories       25 to 34 years    823(67.3)           400(32.7)         
##  3 age_categories       35 to 44 years    751(71.9)           294(28.1)         
##  4 age_categories       45 to 54 years    554(77.2)           164(22.8)         
##  5 age_categories       55 years and over 1525(87.6)          216(12.4)         
##  6 degree               GCSE/vocational/… 3382(76.1)          1060(23.9)        
##  7 degree               Degree or higher… 1200(74.3)          415(25.7)         
##  8 gender               Male              2197(75.9)          697(24.1)         
##  9 gender               Female            2459(75.5)          796(24.5)         
## 10 Has_child            No                2005(76.4)          621(23.6)         
## 11 Has_child            Yes               2386(75.5)          776(24.5)         
## 12 Key_worker           No                3105(80.5)          753(19.5)         
## 13 Key_worker           Yes               1551(67.7)          740(32.3)         
## 14 region               Midlands          781(75.7)           251(24.3)         
## 15 region               South and East    1369(76.7)          416(23.3)         
## 16 region               North             1120(77)            335(23)           
## 17 region               London            701(70.1)           299(29.9)         
## 18 region               Wales, Scotland … 685(78.1)           192(21.9)         
## 19 Working              Not working       1714(82.8)          357(17.2)         
## 20 Working              Working           2871(71.9)          1124(28.1)
alldata$`Participant Characteristics` <- as.factor(alldata$`Participant Characteristics`)
alldata$`Participant Characteristics` <- factor(alldata$`Participant Characteristics`, levels = c ( "gender",  "age_categories","Has_child", "Working", "Key_worker","degree", "region"),labels = c ( "Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region"))

Using gt to give a rough outline of our graph in the form of a table, making sure our percentages are rounded to 2 decimal places

  alldata <- alldata %>% 
  gt() %>% 
  tab_spanner(label = "Had COVID-19", 
              columns = vars("Think have not had COVID-19","Think have had COVID-19"))
print(alldata)
## <style>html {
##   font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen, Ubuntu, Cantarell, 'Helvetica Neue', 'Fira Sans', 'Droid Sans', Arial, sans-serif;
## }
## 
## #fsdeledpoe .gt_table {
##   display: table;
##   border-collapse: collapse;
##   margin-left: auto;
##   margin-right: auto;
##   color: #333333;
##   font-size: 16px;
##   font-weight: normal;
##   font-style: normal;
##   background-color: #FFFFFF;
##   width: auto;
##   border-top-style: solid;
##   border-top-width: 2px;
##   border-top-color: #A8A8A8;
##   border-right-style: none;
##   border-right-width: 2px;
##   border-right-color: #D3D3D3;
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #A8A8A8;
##   border-left-style: none;
##   border-left-width: 2px;
##   border-left-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_heading {
##   background-color: #FFFFFF;
##   text-align: center;
##   border-bottom-color: #FFFFFF;
##   border-left-style: none;
##   border-left-width: 1px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 1px;
##   border-right-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_title {
##   color: #333333;
##   font-size: 125%;
##   font-weight: initial;
##   padding-top: 4px;
##   padding-bottom: 4px;
##   border-bottom-color: #FFFFFF;
##   border-bottom-width: 0;
## }
## 
## #fsdeledpoe .gt_subtitle {
##   color: #333333;
##   font-size: 85%;
##   font-weight: initial;
##   padding-top: 0;
##   padding-bottom: 4px;
##   border-top-color: #FFFFFF;
##   border-top-width: 0;
## }
## 
## #fsdeledpoe .gt_bottom_border {
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_col_headings {
##   border-top-style: solid;
##   border-top-width: 2px;
##   border-top-color: #D3D3D3;
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
##   border-left-style: none;
##   border-left-width: 1px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 1px;
##   border-right-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_col_heading {
##   color: #333333;
##   background-color: #FFFFFF;
##   font-size: 100%;
##   font-weight: normal;
##   text-transform: inherit;
##   border-left-style: none;
##   border-left-width: 1px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 1px;
##   border-right-color: #D3D3D3;
##   vertical-align: bottom;
##   padding-top: 5px;
##   padding-bottom: 6px;
##   padding-left: 5px;
##   padding-right: 5px;
##   overflow-x: hidden;
## }
## 
## #fsdeledpoe .gt_column_spanner_outer {
##   color: #333333;
##   background-color: #FFFFFF;
##   font-size: 100%;
##   font-weight: normal;
##   text-transform: inherit;
##   padding-top: 0;
##   padding-bottom: 0;
##   padding-left: 4px;
##   padding-right: 4px;
## }
## 
## #fsdeledpoe .gt_column_spanner_outer:first-child {
##   padding-left: 0;
## }
## 
## #fsdeledpoe .gt_column_spanner_outer:last-child {
##   padding-right: 0;
## }
## 
## #fsdeledpoe .gt_column_spanner {
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
##   vertical-align: bottom;
##   padding-top: 5px;
##   padding-bottom: 6px;
##   overflow-x: hidden;
##   display: inline-block;
##   width: 100%;
## }
## 
## #fsdeledpoe .gt_group_heading {
##   padding: 8px;
##   color: #333333;
##   background-color: #FFFFFF;
##   font-size: 100%;
##   font-weight: initial;
##   text-transform: inherit;
##   border-top-style: solid;
##   border-top-width: 2px;
##   border-top-color: #D3D3D3;
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
##   border-left-style: none;
##   border-left-width: 1px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 1px;
##   border-right-color: #D3D3D3;
##   vertical-align: middle;
## }
## 
## #fsdeledpoe .gt_empty_group_heading {
##   padding: 0.5px;
##   color: #333333;
##   background-color: #FFFFFF;
##   font-size: 100%;
##   font-weight: initial;
##   border-top-style: solid;
##   border-top-width: 2px;
##   border-top-color: #D3D3D3;
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
##   vertical-align: middle;
## }
## 
## #fsdeledpoe .gt_from_md > :first-child {
##   margin-top: 0;
## }
## 
## #fsdeledpoe .gt_from_md > :last-child {
##   margin-bottom: 0;
## }
## 
## #fsdeledpoe .gt_row {
##   padding-top: 8px;
##   padding-bottom: 8px;
##   padding-left: 5px;
##   padding-right: 5px;
##   margin: 10px;
##   border-top-style: solid;
##   border-top-width: 1px;
##   border-top-color: #D3D3D3;
##   border-left-style: none;
##   border-left-width: 1px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 1px;
##   border-right-color: #D3D3D3;
##   vertical-align: middle;
##   overflow-x: hidden;
## }
## 
## #fsdeledpoe .gt_stub {
##   color: #333333;
##   background-color: #FFFFFF;
##   font-size: 100%;
##   font-weight: initial;
##   text-transform: inherit;
##   border-right-style: solid;
##   border-right-width: 2px;
##   border-right-color: #D3D3D3;
##   padding-left: 12px;
## }
## 
## #fsdeledpoe .gt_summary_row {
##   color: #333333;
##   background-color: #FFFFFF;
##   text-transform: inherit;
##   padding-top: 8px;
##   padding-bottom: 8px;
##   padding-left: 5px;
##   padding-right: 5px;
## }
## 
## #fsdeledpoe .gt_first_summary_row {
##   padding-top: 8px;
##   padding-bottom: 8px;
##   padding-left: 5px;
##   padding-right: 5px;
##   border-top-style: solid;
##   border-top-width: 2px;
##   border-top-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_grand_summary_row {
##   color: #333333;
##   background-color: #FFFFFF;
##   text-transform: inherit;
##   padding-top: 8px;
##   padding-bottom: 8px;
##   padding-left: 5px;
##   padding-right: 5px;
## }
## 
## #fsdeledpoe .gt_first_grand_summary_row {
##   padding-top: 8px;
##   padding-bottom: 8px;
##   padding-left: 5px;
##   padding-right: 5px;
##   border-top-style: double;
##   border-top-width: 6px;
##   border-top-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_striped {
##   background-color: rgba(128, 128, 128, 0.05);
## }
## 
## #fsdeledpoe .gt_table_body {
##   border-top-style: solid;
##   border-top-width: 2px;
##   border-top-color: #D3D3D3;
##   border-bottom-style: solid;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_footnotes {
##   color: #333333;
##   background-color: #FFFFFF;
##   border-bottom-style: none;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
##   border-left-style: none;
##   border-left-width: 2px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 2px;
##   border-right-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_footnote {
##   margin: 0px;
##   font-size: 90%;
##   padding: 4px;
## }
## 
## #fsdeledpoe .gt_sourcenotes {
##   color: #333333;
##   background-color: #FFFFFF;
##   border-bottom-style: none;
##   border-bottom-width: 2px;
##   border-bottom-color: #D3D3D3;
##   border-left-style: none;
##   border-left-width: 2px;
##   border-left-color: #D3D3D3;
##   border-right-style: none;
##   border-right-width: 2px;
##   border-right-color: #D3D3D3;
## }
## 
## #fsdeledpoe .gt_sourcenote {
##   font-size: 90%;
##   padding: 4px;
## }
## 
## #fsdeledpoe .gt_left {
##   text-align: left;
## }
## 
## #fsdeledpoe .gt_center {
##   text-align: center;
## }
## 
## #fsdeledpoe .gt_right {
##   text-align: right;
##   font-variant-numeric: tabular-nums;
## }
## 
## #fsdeledpoe .gt_font_normal {
##   font-weight: normal;
## }
## 
## #fsdeledpoe .gt_font_bold {
##   font-weight: bold;
## }
## 
## #fsdeledpoe .gt_font_italic {
##   font-style: italic;
## }
## 
## #fsdeledpoe .gt_super {
##   font-size: 65%;
## }
## 
## #fsdeledpoe .gt_footnote_marks {
##   font-style: italic;
##   font-size: 65%;
## }
## </style>
## <div id="fsdeledpoe" style="overflow-x:auto;overflow-y:auto;width:auto;height:auto;"><table class="gt_table">
##   
##   <thead class="gt_col_headings">
##     <tr>
##       <th class="gt_col_heading gt_center gt_columns_bottom_border" rowspan="2" colspan="1">Participant Characteristics</th>
##       <th class="gt_col_heading gt_center gt_columns_bottom_border" rowspan="2" colspan="1">Level</th>
##       <th class="gt_center gt_columns_top_border gt_column_spanner_outer" rowspan="1" colspan="2">
##         <span class="gt_column_spanner">Had COVID-19</span>
##       </th>
##     </tr>
##     <tr>
##       <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1">Think have not had COVID-19</th>
##       <th class="gt_col_heading gt_columns_bottom_border gt_center" rowspan="1" colspan="1">Think have had COVID-19</th>
##     </tr>
##   </thead>
##   <tbody class="gt_table_body">
##     <tr>
##       <td class="gt_row gt_center">Age</td>
##       <td class="gt_row gt_center">18 to 24 years</td>
##       <td class="gt_row gt_left">1003(70.5)</td>
##       <td class="gt_row gt_left">419(29.5)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Age</td>
##       <td class="gt_row gt_center">25 to 34 years</td>
##       <td class="gt_row gt_left">823(67.3)</td>
##       <td class="gt_row gt_left">400(32.7)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Age</td>
##       <td class="gt_row gt_center">35 to 44 years</td>
##       <td class="gt_row gt_left">751(71.9)</td>
##       <td class="gt_row gt_left">294(28.1)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Age</td>
##       <td class="gt_row gt_center">45 to 54 years</td>
##       <td class="gt_row gt_left">554(77.2)</td>
##       <td class="gt_row gt_left">164(22.8)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Age</td>
##       <td class="gt_row gt_center">55 years and over</td>
##       <td class="gt_row gt_left">1525(87.6)</td>
##       <td class="gt_row gt_left">216(12.4)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Highest Education or Professional Qualification</td>
##       <td class="gt_row gt_center">GCSE/vocational/A-level/No formal qualifications</td>
##       <td class="gt_row gt_left">3382(76.1)</td>
##       <td class="gt_row gt_left">1060(23.9)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Highest Education or Professional Qualification</td>
##       <td class="gt_row gt_center">Degree or higher (Bachelors, Masters, PhD</td>
##       <td class="gt_row gt_left">1200(74.3)</td>
##       <td class="gt_row gt_left">415(25.7)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Gender</td>
##       <td class="gt_row gt_center">Male</td>
##       <td class="gt_row gt_left">2197(75.9)</td>
##       <td class="gt_row gt_left">697(24.1)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Gender</td>
##       <td class="gt_row gt_center">Female</td>
##       <td class="gt_row gt_left">2459(75.5)</td>
##       <td class="gt_row gt_left">796(24.5)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Have a child</td>
##       <td class="gt_row gt_center">No</td>
##       <td class="gt_row gt_left">2005(76.4)</td>
##       <td class="gt_row gt_left">621(23.6)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Have a child</td>
##       <td class="gt_row gt_center">Yes</td>
##       <td class="gt_row gt_left">2386(75.5)</td>
##       <td class="gt_row gt_left">776(24.5)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Working in Key Sector</td>
##       <td class="gt_row gt_center">No</td>
##       <td class="gt_row gt_left">3105(80.5)</td>
##       <td class="gt_row gt_left">753(19.5)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Working in Key Sector</td>
##       <td class="gt_row gt_center">Yes</td>
##       <td class="gt_row gt_left">1551(67.7)</td>
##       <td class="gt_row gt_left">740(32.3)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Region</td>
##       <td class="gt_row gt_center">Midlands</td>
##       <td class="gt_row gt_left">781(75.7)</td>
##       <td class="gt_row gt_left">251(24.3)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Region</td>
##       <td class="gt_row gt_center">South and East</td>
##       <td class="gt_row gt_left">1369(76.7)</td>
##       <td class="gt_row gt_left">416(23.3)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Region</td>
##       <td class="gt_row gt_center">North</td>
##       <td class="gt_row gt_left">1120(77)</td>
##       <td class="gt_row gt_left">335(23)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Region</td>
##       <td class="gt_row gt_center">London</td>
##       <td class="gt_row gt_left">701(70.1)</td>
##       <td class="gt_row gt_left">299(29.9)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Region</td>
##       <td class="gt_row gt_center">Wales, Scotland and Northern Ireland</td>
##       <td class="gt_row gt_left">685(78.1)</td>
##       <td class="gt_row gt_left">192(21.9)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Employment Status</td>
##       <td class="gt_row gt_center">Not working</td>
##       <td class="gt_row gt_left">1714(82.8)</td>
##       <td class="gt_row gt_left">357(17.2)</td>
##     </tr>
##     <tr>
##       <td class="gt_row gt_center">Employment Status</td>
##       <td class="gt_row gt_center">Working</td>
##       <td class="gt_row gt_left">2871(71.9)</td>
##       <td class="gt_row gt_left">1124(28.1)</td>
##     </tr>
##   </tbody>
##   
##   
## </table></div>
Percentage <- c("Percentage")

graph_values_real$Percentage <- as.numeric(graph_values_real$Percentage)
graph_values_real$Ever_covid <- as.factor(graph_values_real$Ever_covid)

 graphh <- graph_values_real %>% 
ggplot(aes(fill=Ever_covid, x= vars, y=Percentage)) +
   geom_bar(position = position_dodge(width = 0.8),stat = "identity", width = 0.6 ) +
   scale_y_continuous(limits = c(0,60), breaks = c(0,10,20,30,40,50,60)) +
   scale_fill_grey(start = 0.8, end = 0) +
   theme_bw() +
   theme(axis.text.x = element_text(angle = 55, vjust = 1, hjust = 1), 
         legend.position = "bottom",
         legend.title = element_blank(),
         axis.title.x =  element_blank(),
         axis.title.y = element_text(vjust = 4))

This is our improved version of the graph

summarising all the adhere values since they are needed to reproduce the graph. We now also round to 2 decimal places.

  graph2 <- COVID %>% 
   group_by(Ever_covid) %>% 
   summarise(across(c(starts_with("adhere")), list(mean = mean, standard_deviation = sd)))
 
 graph2 <- graph2 %>% 
   mutate(across(everything(), round,2))
 
print(graph2)
## # A tibble: 2 x 7
##   Ever_covid Adhere_shop_gro… Adhere_shop_gro… Adhere_shop_oth… Adhere_shop_oth…
##        <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
## 1          0             0.63             0.48             0.75             0.43
## 2          1             0.54             0.5              0.55             0.5 
## # … with 2 more variables: Adhere_meet_friends_mean <dbl>,
## #   Adhere_meet_friends_standard_deviation <dbl>

pivoting to long data where we have our means, and repeating for SD

 graph2mean <- graph2 %>% 
   select(ends_with("mean"), Ever_covid) %>% 
   pivot_longer(ends_with("mean") ,  names_to = "Mean_vars", values_to = "Mean_values")

 graph2SD <- graph2 %>% 
   select(ends_with("deviation"), Ever_covid) %>% 
   pivot_longer(ends_with("deviation") ,  names_to = "SD_vars", values_to = "SD_values")

using bind_cols to bind these two together, and changing to a factor. Also labelling the Ever_covid variable like it will appear in the graph

 graph2 <- bind_cols(graph2SD,graph2mean)
## New names:
## * Ever_covid -> Ever_covid...1
## * Ever_covid -> Ever_covid...4
 graph2$Ever_covid...1 <- as.factor(graph2$Ever_covid...1)
 graph2$Ever_covid...1 <- factor(graph2$Ever_covid...1,levels = c(0,1), labels = c("Think have not had COVID-19","Think have had COVID-19"))

Part 3: Exploration

1 - does gender impact on shopping for non-essentials?

I wonder if gender impacts the percentage of non-essential shopping according to lockdown measures. Non-essential shopping is shopping for items other than groceries and medicine. Females are usually generalised to shop more, so I was curious if this applied here.

# overview of data needed
exploratory1 <- COVID %>% 
  group_by(Adhere_shop_other, gender) %>%
  count(Adhere_shop_other)
print(exploratory1)
## # A tibble: 4 x 3
## # Groups:   Adhere_shop_other, gender [4]
##                                               Adhere_shop_other     gender     n
##                                                       <dbl+lbl>  <dbl+lbl> <int>
## 1 0 [Reported shopping once or more (not adhering to guidance)] 1 [Male]    1050
## 2 0 [Reported shopping once or more (not adhering to guidance)] 2 [Female]   783
## 3 1 [Reported not shopping for non-essentials (adhering)]       1 [Male]    1844
## 4 1 [Reported not shopping for non-essentials (adhering)]       2 [Female]  2472
# manually cleaning names + filtering for shopping for non-essentials (not adhering)
# the variables were coded wrong, so 0 is shopping and 1 is not shopping
exploratory2 <- exploratory1 %>% 
  mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "Female")) %>% mutate(Adhere_shop_other = case_when(Adhere_shop_other == 0 ~ "Shopping", Adhere_shop_other == 1 ~ "Not Shopping")) %>% 
  group_by(Adhere_shop_other, gender) %>%
  summarise(Total = sum(n))
## `summarise()` has grouped output by 'Adhere_shop_other'. You can override using the `.groups` argument.
print(exploratory2)
## # A tibble: 4 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other gender Total
##   <chr>             <chr>  <int>
## 1 Not Shopping      Female  2472
## 2 Not Shopping      Male    1844
## 3 Shopping          Female   783
## 4 Shopping          Male    1050
# pivoting to wide data to get total participants
exploratory2wide <- exploratory2 %>% 
  pivot_wider(
    id_cols = Adhere_shop_other,
    names_from = gender,
    values_from = Total
  ) 
print(exploratory2wide)
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other Female  Male
##   <chr>              <int> <int>
## 1 Not Shopping        2472  1844
## 2 Shopping             783  1050
# and adding Total column again and percentage
exploratory2wide %>% 
  rowwise() %>% 
  mutate(
    Total = sum(c(Female, Male))
  ) %>% 
  mutate(
    Percentage_Female = (Female/Total)*100
  ) %>% 
  mutate(
    Percentage_Male = (Male/Total)*100
  ) %>% 
  filter(Adhere_shop_other == "Shopping") 
## # A tibble: 1 x 6
## # Rowwise:  Adhere_shop_other
##   Adhere_shop_other Female  Male Total Percentage_Female Percentage_Male
##   <chr>              <int> <int> <int>             <dbl>           <dbl>
## 1 Shopping             783  1050  1833              42.7            57.3
print(exploratory2wide)
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other Female  Male
##   <chr>              <int> <int>
## 1 Not Shopping        2472  1844
## 2 Shopping             783  1050
# using janitor to put percentages in brackets
exploratory2wide %>% 
  adorn_percentages("row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns() %>% 
  filter(Adhere_shop_other == "Shopping") %>% 
  pivot_longer(
    cols = c(Female, Male),
    names_to = "gender",
    values_to = "Percentage")
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [1]
##   Adhere_shop_other gender Percentage   
##   <chr>             <chr>  <chr>        
## 1 Shopping          Female 42.72%  (783)
## 2 Shopping          Male   57.28% (1050)
gender <- c("Female", "Male")

print(exploratory2wide)
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other Female  Male
##   <chr>              <int> <int>
## 1 Not Shopping        2472  1844
## 2 Shopping             783  1050
# using ggplot to create a bar graph
ggplotfinal1 <- ggplot(data = exploratory2wide, aes(x = gender, y = Percentage, fill = gender)) + geom_bar(stat = "identity", width=0.5, colour="black", fill=rgb(1, 0.5, 0.7, 1)) +
  labs(x = "Gender", y ="Percentage (%) ", title = "Non-Adherence to Non-Essential Shopping Guidelines") + theme_minimal()

print(ggplotfinal1)

I had a lotttt of issues here, did a lot of Googling and filled up my laptop’s Chrome history with R-related questions.

  • tried to pivot to wide data

exploratorywide <- exploratory2 %>% pivot_wider( id_cols = Adhering_to_Guidelines, names_from = Gender, values_from = n )

  • tried to make a percentage column

exploratoryfinal %>% select(Total, n) %>% mutate( Percentage = (values*2))

Interestingly, it seems like males were more likely to go out for non-essential shopping. However, we must keep in mind still that this study heavily relied on self-reporting, and there is no real way to know if these numbers are accurate, or if these numbers represent the general UK population

2. does region impact on testing rates? I was interested in seeing if living in a different region in the UK impact testing rates. As a different number of participants were surveyed for each region, I counteracted this by finding the testing rate as a percentage, and included both positive and negative results.

# overview of data needed 
q7beentested <- c("q7beentested")

COVID <- read_sav(file ="coviddata.sav")

exp2 <- COVID %>% 
  group_by(region, q7beentested) %>% 
  count(region)
print(exp2)
## # A tibble: 15 x 3
## # Groups:   region, q7beentested [15]
##               region                                 q7beentested     n
##            <dbl+lbl>                                    <dbl+lbl> <int>
##  1 1 [Midlands]      0 [Not been tested]                            930
##  2 1 [Midlands]      1 [Tested and result showed no coronavirus]     62
##  3 1 [Midlands]      2 [Tested and result showed yes coronavirus]    40
##  4 2 [South & East]  0 [Not been tested]                           1656
##  5 2 [South & East]  1 [Tested and result showed no coronavirus]     70
##  6 2 [South & East]  2 [Tested and result showed yes coronavirus]    59
##  7 3 [North]         0 [Not been tested]                           1333
##  8 3 [North]         1 [Tested and result showed no coronavirus]     66
##  9 3 [North]         2 [Tested and result showed yes coronavirus]    56
## 10 4 [London]        0 [Not been tested]                            840
## 11 4 [London]        1 [Tested and result showed no coronavirus]     93
## 12 4 [London]        2 [Tested and result showed yes coronavirus]    67
## 13 5 [Wales/Scot/NI] 0 [Not been tested]                            815
## 14 5 [Wales/Scot/NI] 1 [Tested and result showed no coronavirus]     39
## 15 5 [Wales/Scot/NI] 2 [Tested and result showed yes coronavirus]    23
# changing labels
exp2 %>% 
  mutate(region = case_when(region == 1 ~ "Midlands", region == 2 ~ "South & East", region == 3 ~ "North", region == 4 ~ "London", region == 5 ~ "Wales, Scotland, Northern Ireland")) %>% 
  summarise(Total = sum(n)) 
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 15 x 3
## # Groups:   region [5]
##    region                                                     q7beentested Total
##    <chr>                                                         <dbl+lbl> <int>
##  1 London                         0 [Not been tested]                        840
##  2 London                         1 [Tested and result showed no coronavi…    93
##  3 London                         2 [Tested and result showed yes coronav…    67
##  4 Midlands                       0 [Not been tested]                        930
##  5 Midlands                       1 [Tested and result showed no coronavi…    62
##  6 Midlands                       2 [Tested and result showed yes coronav…    40
##  7 North                          0 [Not been tested]                       1333
##  8 North                          1 [Tested and result showed no coronavi…    66
##  9 North                          2 [Tested and result showed yes coronav…    56
## 10 South & East                   0 [Not been tested]                       1656
## 11 South & East                   1 [Tested and result showed no coronavi…    70
## 12 South & East                   2 [Tested and result showed yes coronav…    59
## 13 Wales, Scotland, Northern Ire… 0 [Not been tested]                        815
## 14 Wales, Scotland, Northern Ire… 1 [Tested and result showed no coronavi…    39
## 15 Wales, Scotland, Northern Ire… 2 [Tested and result showed yes coronav…    23
# filtering & changing labels for q7beentested, creating totals column and totaltested column
# finding percentages
exp2final <- exp2 %>% 
  mutate(q7beentested = case_when(q7beentested == 1 ~ "TestednoCOVID", q7beentested == 2 ~ "TestedCOVID")) %>% 
  pivot_wider(
    id_cols = region,
    names_from = q7beentested,
    values_from = n
  ) %>% 
  mutate(
    totals = rowSums(across(where(is.numeric)))) %>% 
  select(3, 4, 5) %>% 
  mutate(
    totaltested = sum(c(TestedCOVID, TestednoCOVID))
  ) %>% 
  select(starts_with("total")) %>% 
  mutate(
    TestingRate = (totaltested/totals)*100
  ) %>% 
  select(TestingRate) %>% 
  mutate(region = case_when(region == 1 ~ "Midlands", region == 2 ~ "South & East", region == 3 ~ "North", region == 4 ~ "London", region == 5 ~ "Wales, Scotland, Northern Ireland"))
## Adding missing grouping variables: `region`
## Adding missing grouping variables: `region`
## Adding missing grouping variables: `region`
print(exp2)
## # A tibble: 15 x 3
## # Groups:   region, q7beentested [15]
##               region                                 q7beentested     n
##            <dbl+lbl>                                    <dbl+lbl> <int>
##  1 1 [Midlands]      0 [Not been tested]                            930
##  2 1 [Midlands]      1 [Tested and result showed no coronavirus]     62
##  3 1 [Midlands]      2 [Tested and result showed yes coronavirus]    40
##  4 2 [South & East]  0 [Not been tested]                           1656
##  5 2 [South & East]  1 [Tested and result showed no coronavirus]     70
##  6 2 [South & East]  2 [Tested and result showed yes coronavirus]    59
##  7 3 [North]         0 [Not been tested]                           1333
##  8 3 [North]         1 [Tested and result showed no coronavirus]     66
##  9 3 [North]         2 [Tested and result showed yes coronavirus]    56
## 10 4 [London]        0 [Not been tested]                            840
## 11 4 [London]        1 [Tested and result showed no coronavirus]     93
## 12 4 [London]        2 [Tested and result showed yes coronavirus]    67
## 13 5 [Wales/Scot/NI] 0 [Not been tested]                            815
## 14 5 [Wales/Scot/NI] 1 [Tested and result showed no coronavirus]     39
## 15 5 [Wales/Scot/NI] 2 [Tested and result showed yes coronavirus]    23
# defining columns and q7beentested
TestednoCOVID <- c(q7beentested == 1)
TestedCOVID <- c(q7beentested == 2)
q7beentested <- c("TestednoCOVID", "TestedCOVID")
totals <- c("totals")
totaltested <- c("totaltested")
TestingRate <- c("TestingRate")

# creating the bar graph
ggplotfinal2 <- ggplot(data = exp2final, aes(x = region, y = TestingRate, fill = TestingRate)) + geom_bar(stat = "identity", width=0.5, colour="black", fill=rgb(1, 0.9, 1, 1)) +
  labs(x = "Region", y ="Testing Rate (%) ", title = "COVID Antigen Testing Rates by Region") + theme_minimal()

print(ggplotfinal2)

I mostly had trouble finding the total number of people who were tested (summing positive and negative COVID results):

  • i tried mutate(sum = rowSums(exp2[, -totals])) and mutate(sum = rowSums(.[2:3])) and also mutate(tested = sum(c_across(Tested_no_COVID:Tested_and_COVID)))

Testing rate seems to be highest in the London region

This may be due to better access to testing locations in a bigger city or more testing locations available with faster results available.

Again, the social desirability bias plays a part in these results as participants across regions may inflate their response to testing.

3. does having a child increase your worry about COVID? This exploratory analysis had NA values, possibly due to participants not answering this question. You would hope that the researchers would make it compulsory to answer all questions in order to receive compensation.

# overview of data needed  
exp3 <- COVID %>% 
  group_by(Has_child, q9worry) %>% 
  count(Has_child)
print(exp3)
## # A tibble: 15 x 3
## # Groups:   Has_child, q9worry [15]
##                     Has_child                q9worry     n
##                     <dbl+lbl>              <dbl+lbl> <int>
##  1  0 [Does not have a child] 1 [Not at all worried]    96
##  2  0 [Does not have a child] 2 [Not very worried]     347
##  3  0 [Does not have a child] 3 [Somewhat worried]    1003
##  4  0 [Does not have a child] 4 [Very worried]         770
##  5  0 [Does not have a child] 5 [Extremely worried]    410
##  6  1 [Has a child]           1 [Not at all worried]   101
##  7  1 [Has a child]           2 [Not very worried]     285
##  8  1 [Has a child]           3 [Somewhat worried]    1027
##  9  1 [Has a child]           4 [Very worried]         962
## 10  1 [Has a child]           5 [Extremely worried]    787
## 11 NA                         1 [Not at all worried]     9
## 12 NA                         2 [Not very worried]      36
## 13 NA                         3 [Somewhat worried]     122
## 14 NA                         4 [Very worried]         111
## 15 NA                         5 [Extremely worried]     83
# changing labels and omitting NA
exp3 %>% 
  mutate(Has_child = case_when(Has_child == 0 ~ "No Child", Has_child == 1 ~ "Child")) %>% 
  na.omit()
## # A tibble: 10 x 3
## # Groups:   Has_child, q9worry [10]
##    Has_child                q9worry     n
##    <chr>                  <dbl+lbl> <int>
##  1 No Child  1 [Not at all worried]    96
##  2 No Child  2 [Not very worried]     347
##  3 No Child  3 [Somewhat worried]    1003
##  4 No Child  4 [Very worried]         770
##  5 No Child  5 [Extremely worried]    410
##  6 Child     1 [Not at all worried]   101
##  7 Child     2 [Not very worried]     285
##  8 Child     3 [Somewhat worried]    1027
##  9 Child     4 [Very worried]         962
## 10 Child     5 [Extremely worried]    787
exp3final <- exp3 %>% 
  na.omit()

# changing them into factors for discrete variables
exp3final$q9worry <- as.factor(exp3final$q9worry)
exp3final$Has_child <-
  as.factor(exp3final$Has_child)

exp3final %>% 
  mutate(Has_child = case_when(Has_child == 0 ~ "No Child", Has_child == 1 ~ "Child"))
## # A tibble: 10 x 3
## # Groups:   Has_child, q9worry [10]
##    Has_child q9worry     n
##    <chr>     <fct>   <int>
##  1 No Child  1          96
##  2 No Child  2         347
##  3 No Child  3        1003
##  4 No Child  4         770
##  5 No Child  5         410
##  6 Child     1         101
##  7 Child     2         285
##  8 Child     3        1027
##  9 Child     4         962
## 10 Child     5         787
print(exp3final)
## # A tibble: 10 x 3
## # Groups:   Has_child, q9worry [10]
##    Has_child q9worry     n
##    <fct>     <fct>   <int>
##  1 0         1          96
##  2 0         2         347
##  3 0         3        1003
##  4 0         4         770
##  5 0         5         410
##  6 1         1         101
##  7 1         2         285
##  8 1         3        1027
##  9 1         4         962
## 10 1         5         787
# using ggplot
explore3graph <- ggplot(exp3final, aes(q9worry, n, group=Has_child, colour=Has_child)) +
  geom_line(mapping = aes(x = q9worry, y = n, group = Has_child), size = 1, linetype = 1) +
  theme_ipsum() +
  labs(x="Worry", y = "Number of Responses", title = "Impact of Having Children on Worry about COVID-19", subtitle = "1 = Not at all worried, 2 = Not very worried, 3 = Somewhat worried, 4 = Very worried, 5 = Extremely worried")

# final result!
explore3graph +
  theme(
    plot.subtitle = element_text(size = 10, face = "italic")
  ) + scale_color_manual(labels = c("No Child", "Child"), values = c("pink", "purple")) + labs(fill="Legend") +
  theme(legend.title = element_blank())

It seems that having a child only significantly impacts worry but at the higher end of the scale. There were more responses for “very worried” and “extremely worried” for those who have children. However, there are more participants who have children.

Has_child <- c("Has_Child")

exp3percent0 <- exp3 %>% 
  na.omit() %>% 
   pivot_wider(
    id_cols = Has_child,
    names_from = q9worry,
    values_from = n
  ) %>% 
   mutate(totals = rowSums(across(where(is.numeric)))) %>% 
  mutate(
    One = 48.73, Two = 54.91, Three = 49.40, Four = 44.46, Five = 34.25) %>% 
  select(c(Has_child, 8:12)) %>% 
  filter(Has_child == 0) %>% 
  pivot_longer(
    cols = c(2:6),
    names_to = "Worry",
    values_to = "Percentages"
  )

exp3percent1 <- exp3 %>% 
  na.omit() %>% 
   pivot_wider(
    id_cols = Has_child,
    names_from = q9worry,
    values_from = n
  ) %>% 
   mutate(totals = rowSums(across(where(is.numeric)))) %>% 
  mutate(
    One = 51.27, Two = 45.09, Three = 50.60, Four = 55.54, Five = 65.75) %>% 
  select(c(Has_child, 8:12)) %>% 
  filter(Has_child == 1) %>% 
  pivot_longer(
    cols = c(2:6),
    names_to = "Worry",
    values_to = "Percentages") %>% 
  zap_label()

exp3percentfinal <- bind_rows(exp3percent0, exp3percent1) %>% 
  zap_labels()


Percentages <- c("Percentages")
Regions <- c("Regions")
Worry <- c("Worry")
cumulativesum <- c("cumulativesum")
Has_child <- c("Has_child")

exp3percentfinal$Worry <- as.factor(exp3percentfinal$Worry)
exp3percentfinal$Has_child <-
  as.factor(exp3percentfinal$Has_child)

exp3percentfinal$Worry <- factor(exp3percentfinal$Worry, levels = c("One", "Two", "Three", "Four", "Five"))

finalgraph <- ggplot() +
  geom_bar(aes(y = Percentages, x = Worry, fill = Has_child), data = exp3percentfinal, stat = "identity", width = 0.5, position = "stack") + 
  theme(legend.title = element_blank()) +
  scale_fill_manual(labels = c("No Child", "Child"), values = c("pink", "purple")) +
  labs(fill="Legend") +
  scale_y_continuous(labels = dollar_format(suffix = "%", prefix = "")) +
  labs(x = "Worry", y = "Percentage", title = "Percentage of Worry Responses (%)", subtitle = "1 = Not at all worried, 2 = Not very worried, 3 = Somewhat worried, 4 = Very worried, 5 = Extremely worried") +
  theme(
    plot.subtitle = element_text(size = 10, face = "italic"))

print(finalgraph)

This stacked bar graph makes it really easy to visualise that more respondents with children are more intensely worried

Part 4: Recommendation

The 3 recommendations I have chosen are:

  • Clear exclusion criteria - we some variables, we didn’t know why participants were excluded. This means the total number of participants was not the original 6149 and interfered with our reproducibility as our numbers often didn’t match up

  • A code book - we didn’t even realise other groups got a code book, so getting pre-existing code for our descriptives would make reproducing everything so much easier. It was very time-consuming coding descriptives ourselves

  • read me document - this would’ve been a lifesaver to understand how each variable was coded in the data, as some Likert scales were 1-5 and some were 1-4, others were binary variables and some were continuous

Next Steps…

My next steps are to finalise my report, and figure out how to knit to pdf and word. I’ve heard some members of my group having trouble so I’m a bit nervous.

Thank you for reading my learning log :)