Verification Report

Part 1: Reaction

Summary

This paper by Smith et al investigates if believing you have had COVID-19 impacts how an individual self-reports their adherence to lockdown measures. It is important to understand if people will stop adhering to measures after testing positive, as it is still unclear if an individual can have COVID more than once, and how common this could be. So far, there is no previous evidence on if adhering to protective measures is different if an individual thinks they have COVID or not (could be self-diagnosed or with antigen/antibody test). Understanding if a COVID diagnosis changes the way we try to protect ourselves from COVID and if the diagnosis impacts how we report our behaviour will help us understand future ways of exiting from lockdown strategies.

This study utilised an online cross-sectional survey, spanning 6149 UK participants aged 18+. Participants were asked if they had had COVID-19, if they had been tested, their perceived immunity to COVID-19, and other variables such as how often they went shopping and saw friends. Measurement of variables included Likert scales, binary and continuous variables. The authors hypothesised that believing you have had COVID-19 makes you more likely to believe you are immune, in addition to being less likely to adhere to social distancing measures.

It was found that those who believed they had had COVID-19 were more likely to think they are immune and stop participating in activities such as washing hands and social distancing. There was no evidence found for an association between thinking that you had had COVID-19 and its perceived risk. It remains likely that people will be required to adhere to protective measures for COVID-19 even if they have had the illness previously.

Since research around COVID is still novel, the results from this paper may significantly impact the future of implementation of lockdown rules. Currently, no media communications specifically target those who believed they have had COVID-19. As the opinions towards lockdown measures and COVID-19 immunity are different in these people, it is worth addressing this gap in the media. However, this study heavily relies on self-reported measures, where the social desirability bias impacts the way participants respond, especially in the rates of adhering to lockdown measures. The response from participants also may not be representative of the UK population.

Reaction

I wonder whether the results of this study would be universal if the same method was applied in a different country, such as Australia. It would be interesting to see if an increase/decrease of COVID-19 rates in the particular country would change these results. Similarly, if harsher COVID-19 restrictions would have any effect on the results gathered

I was confused by Figure 1, I couldn’t figure out what scale they were using for the graph and it seemed very out of place. I didn’t like how the labels on the X axis were so long and how the percentages of yes and no for each variable did not add up to 100%.

The most interesting parts of this paper was the statistics for the COVID-19 antigen test. I thought it was interesting that more than half of those who tested negative believed they had COVID-19. And since people who thought they had had COVID-19 were less likely to correctly identify COVID-19 symptoms, it seems to me that those who think they are COVID negative have a better understanding of COVID-19 symptoms. Does that mean that people who think they have had COVID-19 and think they have increased immunity become complacent?

Part 2: Verification

Now we are going to reproduce!

Demographic descriptives

the number of participants who thought they didn’t and did have COVID

amount_covid <- COVID %>% 
  group_by(Ever_covid) %>% 
  count(Ever_covid)
print(amount_covid)
## # A tibble: 2 x 2
## # Groups:   Ever_covid [2]
##                           Ever_covid     n
##                            <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus]  4656
## 2 1 [Think have had coronavirus]      1493

how many had an antigen test for COVID

amount_covid1_antigen <- COVID %>% 
  group_by(q7beentested) %>%
  count(q7beentested)
print(amount_covid1_antigen)
## # A tibble: 3 x 2
## # Groups:   q7beentested [3]
##                                   q7beentested     n
##                                      <dbl+lbl> <int>
## 1 0 [Not been tested]                           5574
## 2 1 [Tested and result showed no coronavirus]    330
## 3 2 [Tested and result showed yes coronavirus]   245

how many tested negative but thought they had COVID

negativethought <- COVID %>% 
  group_by(q7beentested, Ever_covid) %>% 
  filter(Ever_covid == 1, q7beentested == 1) %>% 
  count(Ever_covid)
print(negativethought)
## # A tibble: 1 x 3
## # Groups:   q7beentested, Ever_covid [1]
##                                 q7beentested                    Ever_covid     n
##                                    <dbl+lbl>                     <dbl+lbl> <int>
## 1 1 [Tested and result showed no coronaviru… 1 [Think have had coronaviru…   187

how many tested positive but thought they didnt have COVID

positivethought <- COVID %>% 
  group_by(q7beentested, Ever_covid) %>% 
  filter(Ever_covid == 0, q7beentested == 2) %>% 
  count(Ever_covid)
print(positivethought)
## # A tibble: 1 x 3
## # Groups:   q7beentested, Ever_covid [1]
##                               q7beentested                      Ever_covid     n
##                                  <dbl+lbl>                       <dbl+lbl> <int>
## 1 2 [Tested and result showed yes coronav… 0 [Think have not had coronavi…    56

descriptives from table 1

how many males/females

COVID <- read_sav(file ="coviddata.sav")
covidgender <- COVID %>% 
  group_by(gender, Ever_covid) %>%
  mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "female")) %>% 
  mutate(Ever_covid = case_when(Ever_covid == 0 ~ "Think have not had COVID-19", Ever_covid == 1 ~ "Think have had COVID-19")) %>%
  count(gender)
print(covidgender)
## # A tibble: 4 x 3
## # Groups:   gender, Ever_covid [4]
##   gender Ever_covid                      n
##   <chr>  <chr>                       <int>
## 1 female Think have had COVID-19       796
## 2 female Think have not had COVID-19  2459
## 3 Male   Think have had COVID-19       697
## 4 Male   Think have not had COVID-19  2197

how many in each age category

age <- COVID %>% 
  group_by(age_categories) %>% 
  mutate(age_categories = case_when(age_categories == 1 ~ "18 to 24 years", age_categories == 2 ~ "25 to 34 years", age_categories == 3 ~ "35 to 44 years", age_categories == 4 ~ "45 to 54 years", age_categories == 5 ~ "55 years and over")) %>%
  count(age_categories)
print(age)
## # A tibble: 5 x 2
## # Groups:   age_categories [5]
##   age_categories        n
##   <chr>             <int>
## 1 18 to 24 years     1422
## 2 25 to 34 years     1223
## 3 35 to 44 years     1045
## 4 45 to 54 years      718
## 5 55 years and over  1741

how many have children

They excluded 361 people here for some reason. Our guess is that they didn’t answer the question

child <- COVID %>% 
  group_by(Has_child) %>% 
  count(Has_child)
print(child)
## # A tibble: 3 x 2
## # Groups:   Has_child [3]
##                    Has_child     n
##                    <dbl+lbl> <int>
## 1  0 [Does not have a child]  2626
## 2  1 [Has a child]            3162
## 3 NA                           361

employment status

They excluded 83 people here

employment <- COVID %>% 
  group_by(Working) %>% 
  count(Working)
print(employment)
## # A tibble: 3 x 2
## # Groups:   Working [3]
##                                             Working     n
##                                           <dbl+lbl> <int>
## 1  0 [Not working]                                   2071
## 2  1 [Working (full or part time or self-employed)]  3995
## 3 NA                                                   83

how many are working in key sector

worker <- COVID %>% 
  group_by(Key_worker) %>% 
  count(Key_worker)
print(worker)
## # A tibble: 2 x 2
## # Groups:   Key_worker [2]
##           Key_worker     n
##            <dbl+lbl> <int>
## 1 0 [Not key worker]  3858
## 2 1 [Key worker]      2291

level of education

They excluded 92 people, maybe those who didn’t fit into either category?

education <- COVID %>% 
  group_by(degree) %>% 
  count(degree)
print(education)
## # A tibble: 3 x 2
## # Groups:   degree [3]
##                                                  degree     n
##                                               <dbl+lbl> <int>
## 1  0 [GCSE/vocational/A-level/no formal qualifications]  4442
## 2  1 [Degree or higher (Bachelors, Masters, PhD)]        1615
## 3 NA                                                       92

how many from each region

region1 <- COVID %>% 
  group_by(region) %>% 
  count(region)
print(region1)
## # A tibble: 5 x 2
## # Groups:   region [5]
##              region     n
##           <dbl+lbl> <int>
## 1 1 [Midlands]       1032
## 2 2 [South & East]   1785
## 3 3 [North]          1455
## 4 4 [London]         1000
## 5 5 [Wales/Scot/NI]   877

how many agreed/strongly agreed that they had some immunity to COVID & this also compared to if they thought they had COVID

agreecovid <- COVID %>% 
  group_by(q8haveimmunity) %>% 
  filter(q8haveimmunity > 3) %>% 
  count(q8haveimmunity)
print(agreecovid)
## # A tibble: 2 x 2
## # Groups:   q8haveimmunity [2]
##       q8haveimmunity     n
##            <dbl+lbl> <int>
## 1 4 [Agree]            841
## 2 5 [Strongly agree]   299
agreecovid1 <- COVID %>% 
  group_by(q8haveimmunity, Ever_covid) %>% 
  filter(q8haveimmunity > 3) %>% 
  count(Ever_covid)
print(agreecovid1)
## # A tibble: 4 x 3
## # Groups:   q8haveimmunity, Ever_covid [4]
##       q8haveimmunity                         Ever_covid     n
##            <dbl+lbl>                          <dbl+lbl> <int>
## 1 4 [Agree]          0 [Think have not had coronavirus]   382
## 2 4 [Agree]          1 [Think have had coronavirus]       459
## 3 5 [Strongly agree] 0 [Think have not had coronavirus]   118
## 4 5 [Strongly agree] 1 [Think have had coronavirus]       181

Means & SDs, Tables/Figures

Table 1

The original table

This code finds the numbers and percentages for variables in table 1. First, we used zap labels to remove the pre-existing labels from each vector. We used pivot longer in order to find the total number for each variable to create percentages. We filtered out variables not needed for table 1 and then omitted NA values. We found percentages using mutate, rounding to 1 decimal place.

covid_1 <- COVID %>%
  zap_labels() %>% 
  pivot_longer(gender:region, names_to = "vars", values_to = "values") %>% 
  group_by(Ever_covid, vars,values) %>% 
  count(name = "number") %>% 
  na.omit %>% 
  group_by(vars,values) %>% 
  mutate(percentage = round(number/sum(number) * 100, 1))
print(covid_1)
## # A tibble: 40 x 5
## # Groups:   vars, values [20]
##    Ever_covid vars           values number percentage
##         <dbl> <chr>           <dbl>  <int>      <dbl>
##  1          0 age_categories      1   1003       70.5
##  2          0 age_categories      2    823       67.3
##  3          0 age_categories      3    751       71.9
##  4          0 age_categories      4    554       77.2
##  5          0 age_categories      5   1525       87.6
##  6          0 degree              0   3382       76.1
##  7          0 degree              1   1200       74.3
##  8          0 gender              1   2197       75.9
##  9          0 gender              2   2459       75.5
## 10          0 Has_child           0   2005       76.4
## # … with 30 more rows

We used this code to find how many thought they have not had COVID and how many thought they did. These numbers are at the top row of table 1 and table 3

amount_covid <- COVID %>% 
  group_by(Ever_covid) %>% 
  count(Ever_covid)
  print(amount_covid)
## # A tibble: 2 x 2
## # Groups:   Ever_covid [2]
##                           Ever_covid     n
##                            <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus]  4656
## 2 1 [Think have had coronavirus]      1493

We now changed the data back to wide data to begin reproducing the table. We again used zap labels to remove any leftover labels since we previously grouped before omitting NA values, so now there should not be any more labels. We used mutate to turn everything into a factor.

table1 <- covid_1 %>% 
  pivot_wider(names_from = vars, values_from = values) %>% 
zap_labels() %>% mutate_if(is.numeric, as.factor)
print(table1)
## # A tibble: 40 x 10
##    Ever_covid number percentage age_categories degree gender Has_child
##    <fct>      <fct>  <fct>      <fct>          <fct>  <fct>  <fct>    
##  1 0          1003   70.5       1              <NA>   <NA>   <NA>     
##  2 0          823    67.3       2              <NA>   <NA>   <NA>     
##  3 0          751    71.9       3              <NA>   <NA>   <NA>     
##  4 0          554    77.2       4              <NA>   <NA>   <NA>     
##  5 0          1525   87.6       5              <NA>   <NA>   <NA>     
##  6 0          3382   76.1       <NA>           0      <NA>   <NA>     
##  7 0          1200   74.3       <NA>           1      <NA>   <NA>     
##  8 0          2197   75.9       <NA>           <NA>   1      <NA>     
##  9 0          2459   75.5       <NA>           <NA>   2      <NA>     
## 10 0          2005   76.4       <NA>           <NA>   <NA>   0        
## # … with 30 more rows, and 3 more variables: Key_worker <fct>, region <fct>,
## #   Working <fct>

Now using factors, we can create the labels from the original table.

  table1$Ever_covid <- factor(table1$Ever_covid, labels = c("Think have not had COVID-19 n = 4656", "Think have had COVID-19 n = 1493"))
table1$gender <-factor(table1$gender, labels = c("Male", "Female"))
table1$age_categories <- factor(table1$age_categories, labels = c("18 to 24 years", "25 to 34 years", "35 to 44 years", "45 to 54 years", "55 years and over"))
table1$Has_child <- factor(table1$Has_child, labels = c("No", "Yes"))
table1$Working <- factor(table1$Working, labels = c("Not working", "Working"))
table1$Key_worker <- factor(table1$Key_worker, labels = c("No", "Yes"))
table1$degree <-factor(table1$degree, labels = c("GCSE/vocational/A-level/No formal qualifications", "Degree or higher (Bachelors, Masters, PhD"))
table1$region <- factor(table1$region, labels = c("Midlands", "South and East", "North", "London", "Wales, Scotland and Northern Ireland"))

Now we pivot again to long data to find our percentages using mutate, and add the rest of the labels.

table1 <- table1 %>% 
  pivot_longer(age_categories:Working, names_to = "Participant Characteristics", values_to = "Levels") %>% 
  na.omit %>% 
  mutate(Real = paste0(number,"(",percentage,"%)")) %>%
  select(-percentage, -number) %>% 
  pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real) 
print(table1)
## # A tibble: 20 x 4
##    `Participant Chara… Levels          `Think have not had … `Think have had CO…
##    <chr>               <fct>           <chr>                 <chr>              
##  1 age_categories      18 to 24 years  1003(70.5%)           419(29.5%)         
##  2 age_categories      25 to 34 years  823(67.3%)            400(32.7%)         
##  3 age_categories      35 to 44 years  751(71.9%)            294(28.1%)         
##  4 age_categories      45 to 54 years  554(77.2%)            164(22.8%)         
##  5 age_categories      55 years and o… 1525(87.6%)           216(12.4%)         
##  6 degree              GCSE/vocationa… 3382(76.1%)           1060(23.9%)        
##  7 degree              Degree or high… 1200(74.3%)           415(25.7%)         
##  8 gender              Male            2197(75.9%)           697(24.1%)         
##  9 gender              Female          2459(75.5%)           796(24.5%)         
## 10 Has_child           No              2005(76.4%)           621(23.6%)         
## 11 Has_child           Yes             2386(75.5%)           776(24.5%)         
## 12 Key_worker          No              3105(80.5%)           753(19.5%)         
## 13 Key_worker          Yes             1551(67.7%)           740(32.3%)         
## 14 region              Midlands        781(75.7%)            251(24.3%)         
## 15 region              South and East  1369(76.7%)           416(23.3%)         
## 16 region              North           1120(77%)             335(23%)           
## 17 region              London          701(70.1%)            299(29.9%)         
## 18 region              Wales, Scotlan… 685(78.1%)            192(21.9%)         
## 19 Working             Not working     1714(82.8%)           357(17.2%)         
## 20 Working             Working         2871(71.9%)           1124(28.1%)
table1$`Participant Characteristics` <- as.factor(table1$`Participant Characteristics`)
table1$`Participant Characteristics` <- factor(table1$`Participant Characteristics`, levels = c ( "gender",  "age_categories", "Has_child", "Working", "Key_worker","degree", "region"), labels = c ( "Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region"))

Using gt to create the table. Using rowname_col and groupname_col to create the headings and values. We pretty much used the gt package from here onwards for this table, reordering the groups and adding labels and fixing text to match the original table

  table1 <- table1 %>% 
  gt(rowname_col = "Levels",
     groupname_col = "Participant Characteristics") %>% 
  tab_spanner(label = md("**Had COVID-19**"), 
              columns = vars("Think have not had COVID-19 n = 4656","Think have had COVID-19 n = 1493")) %>% 
    row_group_order(c("Gender", "Age", "Have a child","Employment Status", "Working in Key Sector","Highest Education or Professional Qualification","Region")) %>% 
    cols_label(
      'Think have not had COVID-19 n = 4656' = md("**Think have not had COVID-19 n = 4656**"),
      'Think have had COVID-19 n = 1493' = md( "**Think have had COVID-19 n = 1493**")
    ) %>% 
    tab_style(
      style = list(
        cell_text(weight = "bold"),
        cell_fill()
      ),
  locations = cells_row_groups())

Table 2

Original table 2

Started off by finding the mean and SD for variables in table 2, as this is what is needed to reproduce the original.

 MSD <- COVID %>% 
   group_by(Ever_covid) %>% 
   summarise(across(c(q8haveimmunity,Going_out_total,q9worry,q10arisk,q10brisk),
                    list(mean = mean, standard_deviation = sd)))  %>% 
   mutate(across(2:11, round,2))
print(MSD)
## # A tibble: 2 x 11
##   Ever_covid q8haveimmunity_… q8haveimmunity_… Going_out_total… Going_out_total…
##    <dbl+lbl>            <dbl>            <dbl>            <dbl>            <dbl>
## 1 0 [Think …             2.38             1.01             6.69             5.63
## 2 1 [Think …             3.33             1                9.35             7.69
## # … with 6 more variables: q9worry_mean <dbl>,
## #   q9worry_standard_deviation <dbl>, q10arisk_mean <dbl>,
## #   q10arisk_standard_deviation <dbl>, q10brisk_mean <dbl>,
## #   q10brisk_standard_deviation <dbl>

Once we have the means/SD, we pivot to long data, selecting only the variables we want and arranging by Ever_covid

table2mean <- MSD %>% 
  select(ends_with("mean"), Ever_covid) %>% 
  pivot_longer(ends_with("mean") ,  names_to = "Mean_vars", values_to = "Mean_values")

table2SD <- MSD %>% 
  select(ends_with("deviation"), Ever_covid) %>% 
  pivot_longer(ends_with("deviation") ,  names_to = "SD_vars", values_to = "SD_values")

And now we bind these together using bind_cols, and mutate to change into the labels we want for our table, as well as renaming them.

  table2 <- bind_cols(table2SD,table2mean) %>% 
  mutate(real = paste0("M = ", Mean_values,",", " SD = ", SD_values)) %>% 
  select(Ever_covid...1, Mean_vars, real) %>% 
  mutate(Level = case_when(startsWith(Mean_vars,"q8") ~ "1 = Strongly disagree to  5 = Strongly agree", startsWith(Mean_vars,"q9") ~"1 = not worried at all to  5 = extremely worried", startsWith(Mean_vars, "Going") ~"Range = 0 to 42", TRUE ~"1 = not risk at all to  4 = major risk ")) %>% 
  rename("Participant characteristics" = "Mean_vars") 
## New names:
## * Ever_covid -> Ever_covid...1
## * Ever_covid -> Ever_covid...4
print(table2)
## # A tibble: 10 x 4
##            Ever_covid...1 `Participant charact… real       Level                
##                 <dbl+lbl> <chr>                 <chr>      <chr>                
##  1 0 [Think have not had… q8haveimmunity_mean   M = 2.38,… "1 = Strongly disagr…
##  2 0 [Think have not had… Going_out_total_mean  M = 6.69,… "Range = 0 to 42"    
##  3 0 [Think have not had… q9worry_mean          M = 3.59,… "1 = not worried at …
##  4 0 [Think have not had… q10arisk_mean         M = 2.81,… "1 = not risk at all…
##  5 0 [Think have not had… q10brisk_mean         M = 3.39,… "1 = not risk at all…
##  6 1 [Think have had cor… q8haveimmunity_mean   M = 3.33,… "1 = Strongly disagr…
##  7 1 [Think have had cor… Going_out_total_mean  M = 9.35,… "Range = 0 to 42"    
##  8 1 [Think have had cor… q9worry_mean          M = 3.38,… "1 = not worried at …
##  9 1 [Think have had cor… q10arisk_mean         M = 2.81,… "1 = not risk at all…
## 10 1 [Think have had cor… q10brisk_mean         M = 3.3, … "1 = not risk at all…

Now we change them into factors..

  table2$Ever_covid...1 <- as.factor(table2$Ever_covid...1)
  table2$`Participant characteristics` <- as.factor(table2$`Participant characteristics`)
  table2$Ever_covid...1 <- factor(table2$Ever_covid...1, labels = c("Think have not had COVID-19", "Think have had COVID-19"))
  table2$`Participant characteristics` <- factor(table2$`Participant characteristics`, labels = c("I think I have some immunity to COVID-19", "Total out-of-home activity in the last seven days","Worry about COVID-19","Perceived risk of COVID-19 to onself","Perceived risk of COVID-19 to people in the UK"))

Back to wide data to use gt. We add our subheadings and heading/title and it’s complete.

  table2 <- table2 %>% 
    pivot_wider(id_cols = -real, names_from = Ever_covid...1, values_from = real) %>% 
    gt(rowname_col = "Levels",
       groupname_col = "Participant Characteristics") %>% 
    tab_spanner(label = "Had COVID-19", 
                columns = vars("Think have not had COVID-19", "Think have had COVID-19"))

Table 3

Original table 3

We pivot to longer data to find numbers for each variable in table 3, sorted according to if they have had COVID-19 or not.

  t3_top <- COVID %>% 
   pivot_longer(Adhere_shop_groceries:Sx_covid_nomissing, names_to = "vars" , values_to = "values") %>%
   group_by(vars, values) %>% 
   count(name = "number")
  print(t3_top)
## # A tibble: 8 x 3
## # Groups:   vars, values [8]
##   vars                  values number
##   <chr>                  <dbl>  <int>
## 1 Adhere_meet_friends        0    878
## 2 Adhere_meet_friends        1   5271
## 3 Adhere_shop_groceries      0   2389
## 4 Adhere_shop_groceries      1   3760
## 5 Adhere_shop_other          0   1833
## 6 Adhere_shop_other          1   4316
## 7 Sx_covid_nomissing         0   2517
## 8 Sx_covid_nomissing         1   3632

We now include Ever_covid while pivoting to long data as this is needed for the left column in our table. Omitting any NA values and mutating creates a new column with our percentages.

  covid_3 <- COVID %>% 
   pivot_longer(Adhere_shop_groceries:Sx_covid_nomissing, names_to = "vars", values_to = "values") %>% 
   group_by(Ever_covid,vars,values) %>% 
   count(name = "number") %>% 
   na.omit %>% 
   group_by(Ever_covid, vars) %>% 
   mutate(percentage = round(number/sum(number) * 100,1))
  print(covid_3)
## # A tibble: 16 x 5
## # Groups:   Ever_covid, vars [8]
##                          Ever_covid vars                values number percentage
##                           <dbl+lbl> <chr>                <dbl>  <int>      <dbl>
##  1 0 [Think have not had coronavir… Adhere_meet_friends      0    456        9.8
##  2 0 [Think have not had coronavir… Adhere_meet_friends      1   4200       90.2
##  3 0 [Think have not had coronavir… Adhere_shop_grocer…      0   1701       36.5
##  4 0 [Think have not had coronavir… Adhere_shop_grocer…      1   2955       63.5
##  5 0 [Think have not had coronavir… Adhere_shop_other        0   1156       24.8
##  6 0 [Think have not had coronavir… Adhere_shop_other        1   3500       75.2
##  7 0 [Think have not had coronavir… Sx_covid_nomissing       0   1729       37.1
##  8 0 [Think have not had coronavir… Sx_covid_nomissing       1   2927       62.9
##  9 1 [Think have had coronavirus]   Adhere_meet_friends      0    422       28.3
## 10 1 [Think have had coronavirus]   Adhere_meet_friends      1   1071       71.7
## 11 1 [Think have had coronavirus]   Adhere_shop_grocer…      0    688       46.1
## 12 1 [Think have had coronavirus]   Adhere_shop_grocer…      1    805       53.9
## 13 1 [Think have had coronavirus]   Adhere_shop_other        0    677       45.3
## 14 1 [Think have had coronavirus]   Adhere_shop_other        1    816       54.7
## 15 1 [Think have had coronavirus]   Sx_covid_nomissing       0    788       52.8
## 16 1 [Think have had coronavirus]   Sx_covid_nomissing       1    705       47.2

Back to wide data to prepare our tibble for the table by removing pre-existing labels using zap labels like before, and changing into factors.

  table3 <- covid_3 %>% 
    pivot_wider(names_from = vars, values_from = values) %>% 
    zap_labels() %>% mutate_if(is.numeric, as.factor) 
## `mutate_if()` ignored the following grouping variables:
## Column `Ever_covid`
print(table3)
## # A tibble: 16 x 7
## # Groups:   Ever_covid [2]
##    Ever_covid number percentage Adhere_meet_friends Adhere_shop_groceries
##         <dbl> <fct>  <fct>      <fct>               <fct>                
##  1          0 456    9.8        0                   <NA>                 
##  2          0 4200   90.2       1                   <NA>                 
##  3          0 1701   36.5       <NA>                0                    
##  4          0 2955   63.5       <NA>                1                    
##  5          0 1156   24.8       <NA>                <NA>                 
##  6          0 3500   75.2       <NA>                <NA>                 
##  7          0 1729   37.1       <NA>                <NA>                 
##  8          0 2927   62.9       <NA>                <NA>                 
##  9          1 422    28.3       0                   <NA>                 
## 10          1 1071   71.7       1                   <NA>                 
## 11          1 688    46.1       <NA>                0                    
## 12          1 805    53.9       <NA>                1                    
## 13          1 677    45.3       <NA>                <NA>                 
## 14          1 816    54.7       <NA>                <NA>                 
## 15          1 788    52.8       <NA>                <NA>                 
## 16          1 705    47.2       <NA>                <NA>                 
## # … with 2 more variables: Adhere_shop_other <fct>, Sx_covid_nomissing <fct>
  table3$Ever_covid <- as.factor(table3$Ever_covid)
  table3$Ever_covid <- factor(table3$Ever_covid, labels = c("Think have not had COVID-19", "Think have had COVID-19"))
  table3$Adhere_shop_groceries <- factor(table3$Adhere_shop_groceries, labels = c("On one or fewer days in the last week, n = 2389", "On two or more days in the last week, n = 3760"))
  table3$Adhere_meet_friends <- factor(table3$Adhere_meet_friends, labels = c("Not at all in the last week, n = 5271", "On one or more days in the last week, n = 878"))
  table3$Adhere_shop_other <- factor(table3$Adhere_shop_other, labels = c("Not at all in the last week, n = 1833", "On one or more days in the last week, n = 4316"))
  table3$Sx_covid_nomissing <- factor(table3$Sx_covid_nomissing, labels = c("Did not correctly identify symptoms, n = 2390", "Correctly identified common symptoms, n = 3632"))

Pivoting to long data once again to fix up our column headings

  table3 <- table3 %>% 
    pivot_longer(Adhere_meet_friends:Sx_covid_nomissing, names_to = "Participant Characteristics", values_to = "Levels") %>% 
    na.omit %>% 
    mutate(Real = paste0(number,"(",percentage,")")) %>% 
    select(-number,-percentage) %>%
    pivot_wider(id_cols = -Real, names_from = Ever_covid, values_from = Real) 
    print(table3)
## # A tibble: 8 x 4
##   `Participant Charac… Levels             `Think have not ha… `Think have had C…
##   <chr>                <fct>              <chr>               <chr>             
## 1 Adhere_meet_friends  Not at all in the… 456(9.8)            422(28.3)         
## 2 Adhere_meet_friends  On one or more da… 4200(90.2)          1071(71.7)        
## 3 Adhere_shop_groceri… On one or fewer d… 1701(36.5)          688(46.1)         
## 4 Adhere_shop_groceri… On two or more da… 2955(63.5)          805(53.9)         
## 5 Adhere_shop_other    Not at all in the… 1156(24.8)          677(45.3)         
## 6 Adhere_shop_other    On one or more da… 3500(75.2)          816(54.7)         
## 7 Sx_covid_nomissing   Did not correctly… 1729(37.1)          788(52.8)         
## 8 Sx_covid_nomissing   Correctly identif… 2927(62.9)          705(47.2)

Changing to factors

  table3$`Participant Characteristics` <- as.factor(table3$`Participant Characteristics`)
  table3$`Participant Characteristics` <- factor(table3$`Participant Characteristics`, labels = c(
    "Meet up with family and friends","Shopping for groceries/pharmacy", "Shopping for items other than groceries or pharmacy",
    "Correct identification of main symptoms"
  ))

Using gt to create the table. Complete

  table3 <- table3 %>% 
    gt(rowname_col = "Levels",
       groupname_col = "Participant Characteristics") %>% 
    tab_spanner(label = "Had COVID-19", 
                columns = vars("Think have not had COVID-19",
                               "Think have had COVID-19")) %>% 
    row_group_order(c("Meet up with family and friends", "Shopping for groceries/pharmacy",
                      "Shopping for items other than groceries or pharmacy", "Correct identification of main symptoms" ))

Figure 1 graph

This reproduces the original graph

We first use mutate to change the labels for going out total into 0 and 1 for more than 8 times going out. We now count these and create a percentage column.

graph_valuesfinal <- COVID %>%  
   mutate(Going_out_total = case_when(Going_out_total > 7 ~ 1, TRUE ~0)) %>% 
   group_by(Ever_covid, Going_out_total) %>% 
   count(Going_out_total, name = "number") %>%
   group_by(Ever_covid) %>% 
   mutate(Percentage = round(number/sum(number) * 100,1))

print(graph_valuesfinal)
## # A tibble: 4 x 4
## # Groups:   Ever_covid [2]
##                           Ever_covid Going_out_total number Percentage
##                            <dbl+lbl>           <dbl>  <int>      <dbl>
## 1 0 [Think have not had coronavirus]               0   2905       62.4
## 2 0 [Think have not had coronavirus]               1   1751       37.6
## 3 1 [Think have had coronavirus]                   0    729       48.8
## 4 1 [Think have had coronavirus]                   1    764       51.2

We now zap labels to get rid of NA values, and then pivot to longer data. We zap labels again in case we missed anything, create the percentage column and delete all the other columns that are not needed.

graph_valuesfinal <- COVID %>% 
   zap_labels() %>% 
   mutate(Going_out_total = case_when(Going_out_total > 7 ~ 1, TRUE ~0)) %>% 
   pivot_longer(q8haveimmunity:Sx_covid_nomissing, names_to = "vars", values_to = "values") %>%
   group_by(Ever_covid,vars,values) %>% 
   count(name = "number") %>% 
   na.omit %>% 
   group_by(Ever_covid, vars) %>% 
   mutate(Percentage = round(number/sum(number) * 100,1)) %>% 
   filter(case_when(vars == "q8haveimmunity" ~ values == 5,
                    vars == "Sx_covid_nomissing" ~ values == 0 , startsWith(vars,'Adhere') ~ values ==0,
                    TRUE ~ values == 1)) %>% 
   select(vars, Ever_covid, Percentage)
  print(graph_valuesfinal)
## # A tibble: 18 x 3
## # Groups:   Ever_covid, vars [18]
##    vars                  Ever_covid Percentage
##    <chr>                      <dbl>      <dbl>
##  1 Adhere_meet_friends            0        9.8
##  2 Adhere_shop_groceries          0       36.5
##  3 Adhere_shop_other              0       24.8
##  4 Going_out_total                0       37.6
##  5 q10arisk                       0        3.2
##  6 q10brisk                       0        1  
##  7 q8haveimmunity                 0        2.5
##  8 q9worry                        0        2.5
##  9 Sx_covid_nomissing             0       37.1
## 10 Adhere_meet_friends            1       28.3
## 11 Adhere_shop_groceries          1       46.1
## 12 Adhere_shop_other              1       45.3
## 13 Going_out_total                1       51.2
## 14 q10arisk                       1        3.9
## 15 q10brisk                       1        1.3
## 16 q8haveimmunity                 1       12.1
## 17 q9worry                        1        6  
## 18 Sx_covid_nomissing             1       52.8

We want to change to a factor for our grph. We also change the labels according to the original graph.

graph_valuesfinal$vars <- as.factor(graph_valuesfinal$vars)
  graph_valuesfinal$Ever_covid <- as.factor(graph_valuesfinal$Ever_covid)
  graph_valuesfinal$Ever_covid <- factor(graph_valuesfinal$Ever_covid,
                                         levels = c(0,1),
                                         labels = c("Think have not had COVID-19","Think have had COVID-19"))
  graph_valuesfinal$vars <- factor(graph_valuesfinal$vars, 
                                   levels = c("q8haveimmunity", "Adhere_shop_groceries","Adhere_shop_other", "Adhere_meet_friends", "Going_out_total","q9worry", "q10arisk","q10brisk", "Sx_covid_nomissing"), labels = c("Strongly agree that they are immune","Shopping for groceries/pharmacy two or more times","Shopping for items other than groceries/pharmacy","Meet  up with friends/family they do not live with","Out-of-home actiivty, 8 or more outings","Not worried at all","Perceive no risk at all to themselves","Perceive no risk at all to others","Did not identity common symptoms"))
  print(graph_valuesfinal)
## # A tibble: 18 x 3
## # Groups:   Ever_covid, vars [18]
##    vars                                       Ever_covid              Percentage
##    <fct>                                      <fct>                        <dbl>
##  1 Meet  up with friends/family they do not … Think have not had COV…        9.8
##  2 Shopping for groceries/pharmacy two or mo… Think have not had COV…       36.5
##  3 Shopping for items other than groceries/p… Think have not had COV…       24.8
##  4 Out-of-home actiivty, 8 or more outings    Think have not had COV…       37.6
##  5 Perceive no risk at all to themselves      Think have not had COV…        3.2
##  6 Perceive no risk at all to others          Think have not had COV…        1  
##  7 Strongly agree that they are immune        Think have not had COV…        2.5
##  8 Not worried at all                         Think have not had COV…        2.5
##  9 Did not identity common symptoms           Think have not had COV…       37.1
## 10 Meet  up with friends/family they do not … Think have had COVID-19       28.3
## 11 Shopping for groceries/pharmacy two or mo… Think have had COVID-19       46.1
## 12 Shopping for items other than groceries/p… Think have had COVID-19       45.3
## 13 Out-of-home actiivty, 8 or more outings    Think have had COVID-19       51.2
## 14 Perceive no risk at all to themselves      Think have had COVID-19        3.9
## 15 Perceive no risk at all to others          Think have had COVID-19        1.3
## 16 Strongly agree that they are immune        Think have had COVID-19       12.1
## 17 Not worried at all                         Think have had COVID-19        6  
## 18 Did not identity common symptoms           Think have had COVID-19       52.8
graphgraph <- graph_valuesfinal %>% 
ggplot(aes(fill=Ever_covid, x= vars, y=Percentage)) +
   geom_bar(position = position_dodge(width = 0.8),stat = "identity", width = 0.6 ) +
   scale_y_continuous(limits = c(0,60), breaks = c(0,10,20,30,40,50,60)) +
   theme_bw() +
   scale_fill_grey(start = 0.7, end = 0) +
   theme(axis.text.x = element_text(angle = 55, vjust = 1, hjust = 1), 
         legend.position = "bottom",
         legend.title = element_blank(),
         axis.title.x =  element_blank(),
         axis.title.y = element_text(vjust = 4)) 

This is our improved version of the graph

summarising all the adhere values since they are needed to reproduce the graph. We now also round to 2 decimal places.

  graph2 <- COVID %>% 
   group_by(Ever_covid) %>% 
   summarise(across(c(starts_with("adhere")), list(mean = mean, standard_deviation = sd)))
 
 graph2 <- graph2 %>% 
   mutate(across(everything(), round,2))
 
print(graph2)
## # A tibble: 2 x 7
##   Ever_covid Adhere_shop_gro… Adhere_shop_gro… Adhere_shop_oth… Adhere_shop_oth…
##        <dbl>            <dbl>            <dbl>            <dbl>            <dbl>
## 1          0             0.63             0.48             0.75             0.43
## 2          1             0.54             0.5              0.55             0.5 
## # … with 2 more variables: Adhere_meet_friends_mean <dbl>,
## #   Adhere_meet_friends_standard_deviation <dbl>

pivoting to long data where we have our means, and repeating for SD

 graph2mean <- graph2 %>% 
   select(ends_with("mean"), Ever_covid) %>% 
   pivot_longer(ends_with("mean") ,  names_to = "Mean_vars", values_to = "Mean_values")

 graph2SD <- graph2 %>% 
   select(ends_with("deviation"), Ever_covid) %>% 
   pivot_longer(ends_with("deviation") ,  names_to = "SD_vars", values_to = "SD_values")

using bind_cols to bind these two together, and changing to a factor. Also labelling the Ever_covid variable like it will appear in the graph

 graph2 <- bind_cols(graph2SD,graph2mean)
## New names:
## * Ever_covid -> Ever_covid...1
## * Ever_covid -> Ever_covid...4
 graph2$Ever_covid...1 <- as.factor(graph2$Ever_covid...1)
 graph2$Ever_covid...1 <- factor(graph2$Ever_covid...1,levels = c(0,1), labels = c("Think have not had COVID-19","Think have had COVID-19"))
 
graph22 <- graph2 %>% 
  ggplot(data = graph2, mapping = aes(fill = Ever_covid...1, x = Mean_vars, y = Mean_values)) +
  geom_bar(position = position_dodge(width = 0.8), stat = "identity") +
  scale_y_continuous(limits = c(0, 1)) +
  theme(axis.text.x = element_text(angle = 55, vjust = 1),
        legend.position = "bottom",
        legend.title = element_blank(),
        axis.title.x = element_blank(),
        axis.title.y = element_text(vjust = 4)) +
  ggtitle("Average Self-Reported Adherence to COVID-19 Lockdown Measures") +
  labs(y = "Mean Self-Reported Adherence", x = "Adherence to Measures")

Part 3: Exploration

1 - does gender impact on shopping for non-essentials?

I wonder if gender impacts the percentage of non-essential shopping according to lockdown measures. Non-essential shopping is shopping for items other than groceries and medicine. Females are usually generalised to shop more, so I was curious if this applied here.

# overview of data needed
exploratory1 <- COVID %>% 
  group_by(Adhere_shop_other, gender) %>%
  count(Adhere_shop_other)
print(exploratory1)
## # A tibble: 4 x 3
## # Groups:   Adhere_shop_other, gender [4]
##                                               Adhere_shop_other     gender     n
##                                                       <dbl+lbl>  <dbl+lbl> <int>
## 1 0 [Reported shopping once or more (not adhering to guidance)] 1 [Male]    1050
## 2 0 [Reported shopping once or more (not adhering to guidance)] 2 [Female]   783
## 3 1 [Reported not shopping for non-essentials (adhering)]       1 [Male]    1844
## 4 1 [Reported not shopping for non-essentials (adhering)]       2 [Female]  2472

manually cleaning names + filtering for shopping for non-essentials (not adhering). Changing label names using mutate, and summarising to find the total

# the variables were coded wrong, so 0 is shopping and 1 is not shopping
exploratory2 <- exploratory1 %>% 
  mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "Female")) %>% mutate(Adhere_shop_other = case_when(Adhere_shop_other == 0 ~ "Shopping", Adhere_shop_other == 1 ~ "Not Shopping")) %>% 
  group_by(Adhere_shop_other, gender) %>%
  summarise(Total = sum(n))
## `summarise()` has grouped output by 'Adhere_shop_other'. You can override using the `.groups` argument.
print(exploratory2)
## # A tibble: 4 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other gender Total
##   <chr>             <chr>  <int>
## 1 Not Shopping      Female  2472
## 2 Not Shopping      Male    1844
## 3 Shopping          Female   783
## 4 Shopping          Male    1050

pivoting to wide data to get total participants, adding a Total column and finding percentages

exploratory2wide <- exploratory2 %>% 
  pivot_wider(
    id_cols = Adhere_shop_other,
    names_from = gender,
    values_from = Total
  ) 
print(exploratory2wide)
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other Female  Male
##   <chr>              <int> <int>
## 1 Not Shopping        2472  1844
## 2 Shopping             783  1050
# and adding Total column again and percentage using rowwise
exploratory2wide %>% 
  rowwise() %>% 
  mutate(
    Total = sum(c(Female, Male))
  ) %>% 
  mutate(
    Percentage_Female = (Female/Total)*100
  ) %>% 
  mutate(
    Percentage_Male = (Male/Total)*100
  ) %>% 
  filter(Adhere_shop_other == "Shopping") 
## # A tibble: 1 x 6
## # Rowwise:  Adhere_shop_other
##   Adhere_shop_other Female  Male Total Percentage_Female Percentage_Male
##   <chr>              <int> <int> <int>             <dbl>           <dbl>
## 1 Shopping             783  1050  1833              42.7            57.3
print(exploratory2wide)
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other Female  Male
##   <chr>              <int> <int>
## 1 Not Shopping        2472  1844
## 2 Shopping             783  1050

I figured that I could use janitor to put percentages in brackets

exploratory2wide %>% 
  adorn_percentages("row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns() %>% 
  filter(Adhere_shop_other == "Shopping") %>% 
  pivot_longer(
    cols = c(Female, Male),
    names_to = "gender",
    values_to = "Percentage")
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [1]
##   Adhere_shop_other gender Percentage   
##   <chr>             <chr>  <chr>        
## 1 Shopping          Female 42.72%  (783)
## 2 Shopping          Male   57.28% (1050)
gender <- c("Female", "Male")

print(exploratory2wide)
## # A tibble: 2 x 3
## # Groups:   Adhere_shop_other [2]
##   Adhere_shop_other Female  Male
##   <chr>              <int> <int>
## 1 Not Shopping        2472  1844
## 2 Shopping             783  1050

using ggplot to create a bar graph

Percentage <- c("Percentage")

ggplotfinal1 <- ggplot(data = exploratory2wide, aes(x = gender, y = Percentage, fill = gender)) + geom_bar(stat = "identity", width=0.5, colour="black", fill=rgb(1, 0.5, 0.7, 1)) +
  labs(x = "Gender", y ="Percentage (%) ", title = "Non-Adherence to Non-Essential Shopping Guidelines") + theme_minimal()

print(ggplotfinal1)

I had a lotttt of issues here, did a lot of Googling and filled up my laptop’s Chrome history with R-related questions.

  • tried to pivot to wide data

exploratorywide <- exploratory2 %>% pivot_wider( id_cols = Adhering_to_Guidelines, names_from = Gender, values_from = n )

  • tried to make a percentage column

exploratoryfinal %>% select(Total, n) %>% mutate( Percentage = (values*2))

Interestingly, it seems like males were more likely to go out for non-essential shopping. However, we must keep in mind still that this study heavily relied on self-reporting, and there is no real way to know if these numbers are accurate, or if these numbers represent the general UK population

2. does region impact on testing rates?

I was interested in seeing if living in a different region in the UK impact testing rates. As a different number of participants were surveyed for each region, I counteracted this by finding the testing rate as a percentage, and included both positive and negative results.

# overview of data needed 
q7beentested <- c("q7beentested")

COVID <- read_sav(file ="coviddata.sav")

exp2 <- COVID %>% 
  group_by(region, q7beentested) %>% 
  count(region)
print(exp2)
## # A tibble: 15 x 3
## # Groups:   region, q7beentested [15]
##               region                                 q7beentested     n
##            <dbl+lbl>                                    <dbl+lbl> <int>
##  1 1 [Midlands]      0 [Not been tested]                            930
##  2 1 [Midlands]      1 [Tested and result showed no coronavirus]     62
##  3 1 [Midlands]      2 [Tested and result showed yes coronavirus]    40
##  4 2 [South & East]  0 [Not been tested]                           1656
##  5 2 [South & East]  1 [Tested and result showed no coronavirus]     70
##  6 2 [South & East]  2 [Tested and result showed yes coronavirus]    59
##  7 3 [North]         0 [Not been tested]                           1333
##  8 3 [North]         1 [Tested and result showed no coronavirus]     66
##  9 3 [North]         2 [Tested and result showed yes coronavirus]    56
## 10 4 [London]        0 [Not been tested]                            840
## 11 4 [London]        1 [Tested and result showed no coronavirus]     93
## 12 4 [London]        2 [Tested and result showed yes coronavirus]    67
## 13 5 [Wales/Scot/NI] 0 [Not been tested]                            815
## 14 5 [Wales/Scot/NI] 1 [Tested and result showed no coronavirus]     39
## 15 5 [Wales/Scot/NI] 2 [Tested and result showed yes coronavirus]    23

changing labels and finding total using summarise

exp2 %>% 
  mutate(region = case_when(region == 1 ~ "Midlands", region == 2 ~ "South & East", region == 3 ~ "North", region == 4 ~ "London", region == 5 ~ "Wales, Scotland, Northern Ireland")) %>% 
  summarise(Total = sum(n)) 
## `summarise()` has grouped output by 'region'. You can override using the `.groups` argument.
## # A tibble: 15 x 3
## # Groups:   region [5]
##    region                                                     q7beentested Total
##    <chr>                                                         <dbl+lbl> <int>
##  1 London                         0 [Not been tested]                        840
##  2 London                         1 [Tested and result showed no coronavi…    93
##  3 London                         2 [Tested and result showed yes coronav…    67
##  4 Midlands                       0 [Not been tested]                        930
##  5 Midlands                       1 [Tested and result showed no coronavi…    62
##  6 Midlands                       2 [Tested and result showed yes coronav…    40
##  7 North                          0 [Not been tested]                       1333
##  8 North                          1 [Tested and result showed no coronavi…    66
##  9 North                          2 [Tested and result showed yes coronav…    56
## 10 South & East                   0 [Not been tested]                       1656
## 11 South & East                   1 [Tested and result showed no coronavi…    70
## 12 South & East                   2 [Tested and result showed yes coronav…    59
## 13 Wales, Scotland, Northern Ire… 0 [Not been tested]                        815
## 14 Wales, Scotland, Northern Ire… 1 [Tested and result showed no coronavi…    39
## 15 Wales, Scotland, Northern Ire… 2 [Tested and result showed yes coronav…    23

filtering & changing labels for q7beentested, creating totals column and totaltested column and finding percentages

exp2final <- exp2 %>% 
  mutate(q7beentested = case_when(q7beentested == 1 ~ "TestednoCOVID", q7beentested == 2 ~ "TestedCOVID")) %>% 
  pivot_wider(
    id_cols = region,
    names_from = q7beentested,
    values_from = n
  ) %>% 
  mutate(
    totals = rowSums(across(where(is.numeric)))) %>% 
  select(3, 4, 5) %>% 
  mutate(
    totaltested = sum(c(TestedCOVID, TestednoCOVID))
  ) %>% 
  select(starts_with("total")) %>% 
  mutate(
    TestingRate = (totaltested/totals)*100
  ) %>% 
  select(TestingRate) %>% 
  mutate(region = case_when(region == 1 ~ "Midlands", region == 2 ~ "South & East", region == 3 ~ "North", region == 4 ~ "London", region == 5 ~ "Wales, Scotland, Northern Ireland"))
## Adding missing grouping variables: `region`
## Adding missing grouping variables: `region`
## Adding missing grouping variables: `region`
print(exp2)
## # A tibble: 15 x 3
## # Groups:   region, q7beentested [15]
##               region                                 q7beentested     n
##            <dbl+lbl>                                    <dbl+lbl> <int>
##  1 1 [Midlands]      0 [Not been tested]                            930
##  2 1 [Midlands]      1 [Tested and result showed no coronavirus]     62
##  3 1 [Midlands]      2 [Tested and result showed yes coronavirus]    40
##  4 2 [South & East]  0 [Not been tested]                           1656
##  5 2 [South & East]  1 [Tested and result showed no coronavirus]     70
##  6 2 [South & East]  2 [Tested and result showed yes coronavirus]    59
##  7 3 [North]         0 [Not been tested]                           1333
##  8 3 [North]         1 [Tested and result showed no coronavirus]     66
##  9 3 [North]         2 [Tested and result showed yes coronavirus]    56
## 10 4 [London]        0 [Not been tested]                            840
## 11 4 [London]        1 [Tested and result showed no coronavirus]     93
## 12 4 [London]        2 [Tested and result showed yes coronavirus]    67
## 13 5 [Wales/Scot/NI] 0 [Not been tested]                            815
## 14 5 [Wales/Scot/NI] 1 [Tested and result showed no coronavirus]     39
## 15 5 [Wales/Scot/NI] 2 [Tested and result showed yes coronavirus]    23
# defining columns and q7beentested
TestednoCOVID <- c(q7beentested == 1)
TestedCOVID <- c(q7beentested == 2)
q7beentested <- c("TestednoCOVID", "TestedCOVID")
totals <- c("totals")
totaltested <- c("totaltested")
TestingRate <- c("TestingRate")

creating the bar graph using ggplot and geom_bar

ggplotfinal2 <- ggplot(data = exp2final, aes(x = region, y = TestingRate, fill = TestingRate)) + geom_bar(stat = "identity", width=0.5, colour="black", fill=rgb(1, 0.9, 1, 1)) +
  labs(x = "Region", y ="Testing Rate (%) ", title = "COVID Antigen Testing Rates by Region") + theme_minimal()

print(ggplotfinal2)

I mostly had trouble finding the total number of people who were tested (summing positive and negative COVID results):

  • i tried mutate(sum = rowSums(exp2[, -totals])) and mutate(sum = rowSums(.[2:3])) and also mutate(tested = sum(c_across(Tested_no_COVID:Tested_and_COVID)))

Testing rate seems to be highest in the London region

This may be due to better access to testing locations in a bigger city or more testing locations available with faster results available.

Again, the social desirability bias plays a part in these results as participants across regions may inflate their response to testing.

3. does having a child increase your worry about COVID?

This exploratory analysis had NA values, possibly due to participants not answering this question. You would hope that the researchers would make it compulsory to answer all questions in order to receive compensation.

# overview of data needed  
exp3 <- COVID %>% 
  group_by(Has_child, q9worry) %>% 
  count(Has_child)
print(exp3)
## # A tibble: 15 x 3
## # Groups:   Has_child, q9worry [15]
##                     Has_child                q9worry     n
##                     <dbl+lbl>              <dbl+lbl> <int>
##  1  0 [Does not have a child] 1 [Not at all worried]    96
##  2  0 [Does not have a child] 2 [Not very worried]     347
##  3  0 [Does not have a child] 3 [Somewhat worried]    1003
##  4  0 [Does not have a child] 4 [Very worried]         770
##  5  0 [Does not have a child] 5 [Extremely worried]    410
##  6  1 [Has a child]           1 [Not at all worried]   101
##  7  1 [Has a child]           2 [Not very worried]     285
##  8  1 [Has a child]           3 [Somewhat worried]    1027
##  9  1 [Has a child]           4 [Very worried]         962
## 10  1 [Has a child]           5 [Extremely worried]    787
## 11 NA                         1 [Not at all worried]     9
## 12 NA                         2 [Not very worried]      36
## 13 NA                         3 [Somewhat worried]     122
## 14 NA                         4 [Very worried]         111
## 15 NA                         5 [Extremely worried]     83

changing labels and omitting NA

exp3 %>% 
  mutate(Has_child = case_when(Has_child == 0 ~ "No Child", Has_child == 1 ~ "Child")) %>% 
  na.omit()
## # A tibble: 10 x 3
## # Groups:   Has_child, q9worry [10]
##    Has_child                q9worry     n
##    <chr>                  <dbl+lbl> <int>
##  1 No Child  1 [Not at all worried]    96
##  2 No Child  2 [Not very worried]     347
##  3 No Child  3 [Somewhat worried]    1003
##  4 No Child  4 [Very worried]         770
##  5 No Child  5 [Extremely worried]    410
##  6 Child     1 [Not at all worried]   101
##  7 Child     2 [Not very worried]     285
##  8 Child     3 [Somewhat worried]    1027
##  9 Child     4 [Very worried]         962
## 10 Child     5 [Extremely worried]    787
exp3final <- exp3 %>% 
  na.omit()

changing them into factors for discrete variables

exp3final$q9worry <- as.factor(exp3final$q9worry)
exp3final$Has_child <-
  as.factor(exp3final$Has_child)

exp3final %>% 
  mutate(Has_child = case_when(Has_child == 0 ~ "No Child", Has_child == 1 ~ "Child"))
## # A tibble: 10 x 3
## # Groups:   Has_child, q9worry [10]
##    Has_child q9worry     n
##    <chr>     <fct>   <int>
##  1 No Child  1          96
##  2 No Child  2         347
##  3 No Child  3        1003
##  4 No Child  4         770
##  5 No Child  5         410
##  6 Child     1         101
##  7 Child     2         285
##  8 Child     3        1027
##  9 Child     4         962
## 10 Child     5         787
print(exp3final)
## # A tibble: 10 x 3
## # Groups:   Has_child, q9worry [10]
##    Has_child q9worry     n
##    <fct>     <fct>   <int>
##  1 0         1          96
##  2 0         2         347
##  3 0         3        1003
##  4 0         4         770
##  5 0         5         410
##  6 1         1         101
##  7 1         2         285
##  8 1         3        1027
##  9 1         4         962
## 10 1         5         787

using ggplot to create the graph

explore3graph <- ggplot(exp3final, aes(q9worry, n, group=Has_child, colour=Has_child)) +
  geom_line(mapping = aes(x = q9worry, y = n, group = Has_child), size = 1, linetype = 1) +
  theme_ipsum() +
  labs(x="Worry", y = "Number of Responses", title = "Impact of Having Children on Worry about COVID-19", subtitle = "1 = Not at all worried, 2 = Not very worried, 3 = Somewhat worried, 4 = Very worried, 5 = Extremely worried")

# final result!
explore3graph +
  theme(
    plot.subtitle = element_text(size = 10, face = "italic")
  ) + scale_color_manual(labels = c("No Child", "Child"), values = c("pink", "purple")) + labs(fill="Legend") +
  theme(legend.title = element_blank())

It seems that having a child only significantly impacts worry but at the higher end of the scale. There were more responses for “very worried” and “extremely worried” for those who have children. However, there are more participants who have children.

Has_child <- c("Has_Child")

# finding percentages 
exp3percent0 <- exp3 %>% 
  na.omit() %>% 
   pivot_wider(
    id_cols = Has_child,
    names_from = q9worry,
    values_from = n
  ) %>% 
   mutate(totals = rowSums(across(where(is.numeric)))) %>% 
  mutate(
    One = 48.73, Two = 54.91, Three = 49.40, Four = 44.46, Five = 34.25) %>% 
  select(c(Has_child, 8:12)) %>% 
  filter(Has_child == 0) %>% 
  pivot_longer(
    cols = c(2:6),
    names_to = "Worry",
    values_to = "Percentages"
  )

exp3percent1 <- exp3 %>% 
  na.omit() %>% 
   pivot_wider(
    id_cols = Has_child,
    names_from = q9worry,
    values_from = n
  ) %>% 
   mutate(totals = rowSums(across(where(is.numeric)))) %>% 
  mutate(
    One = 51.27, Two = 45.09, Three = 50.60, Four = 55.54, Five = 65.75) %>% 
  select(c(Has_child, 8:12)) %>% 
  filter(Has_child == 1) %>% 
  pivot_longer(
    cols = c(2:6),
    names_to = "Worry",
    values_to = "Percentages") %>% 
  zap_label()

# deleting labels
exp3percentfinal <- bind_rows(exp3percent0, exp3percent1) %>% 
  zap_labels()

Percentages <- c("Percentages")
Regions <- c("Regions")
Worry <- c("Worry")
cumulativesum <- c("cumulativesum")
Has_child <- c("Has_child")

# changing into factors
exp3percentfinal$Worry <- as.factor(exp3percentfinal$Worry)
exp3percentfinal$Has_child <-
  as.factor(exp3percentfinal$Has_child)

exp3percentfinal$Worry <- factor(exp3percentfinal$Worry, levels = c("One", "Two", "Three", "Four", "Five"))

now we plot a stacked bar graph using geom_bar

the finished product!

finalgraph <- ggplot() +
  geom_bar(aes(y = Percentages, x = Worry, fill = Has_child), data = exp3percentfinal, stat = "identity", width = 0.5, position = "stack") + 
  theme(legend.title = element_blank()) +
  scale_fill_manual(labels = c("No Child", "Child"), values = c("pink", "purple")) +
  labs(fill="Legend") +
  scale_y_continuous(labels = dollar_format(suffix = "%", prefix = "")) +
  labs(x = "Worry", y = "Percentage", title = "Percentage of Worry Responses (%)", subtitle = "1 = Not at all worried, 2 = Not very worried, 3 = Somewhat worried, 4 = Very worried, 5 = Extremely worried") +
  theme(
    plot.subtitle = element_text(size = 10, face = "italic"))

print(finalgraph)

This stacked bar graph makes it really easy to visualise that more respondents with children are more intensely worried

Part 4: Recommendation

The 3 recommendations I have chosen are:

  • Clear exclusion criteria - with some variables, we didn’t know why participants were excluded. This means the total number of participants was not the original 6149 and interfered with our reproducibility as our numbers often didn’t match up. This impacted our percentages so we don’t know how accurate some are, which may have inflated or deflated my results in my exploratory analyses. The authors could’ve included their exclusion criteria within their paper, in the method section, with reasons as to why they excluded those participants within specific variables.

  • A code book - we didn’t even realise other groups got a code book, so getting pre-existing code for our descriptives would make reproducing everything so much easier. It was very time-consuming coding descriptives ourselves when we could’ve focused on reproducing figures instead. A codebook could have included the exact questions from the surveys, which would’ve helped us identify if the framing of the question influenced the participants to answer in a particular way. Summary statistics for each variable would also show where they excluded participants as well as any missing data, and also give us exact percentages for the values in our graph.

  • provide all values used in figures - this would’ve been a lifesaver to understand the percentages for each bar on our bar graph, as we had no way of confirming that our percentages were correct when we coded them ourselves. We just had to eye each percentage. Including each and every value would speed up the reproducibility process so much, and would not even have taken that much time on their part.

Thank you for reading my verification report! :)