Week 9: Learning Log

# setting the CRAN mirror
options(repos = c(CRAN = "https://cran.rstudio.com"))

My Coding Goals this Week

My goals this week are to complete the reaction and the descriptives, and to get started on my exploratory analyses.

Challenges/Successes

To be honest, I had a hard time with my report this week.

Part 1: Reaction

Summary

This paper by Smith et al. investigates whether believing you have had COVID-19 affects how individuals self-report their adherence to lockdown measures. It is important to understand whether people stop adhering to measures after testing positive, as it is still unclear whether an individual can have COVID-19 more than once, and how common this could be. So far, there is no previous evidence on whether adherence to protective measures differs depending on whether an individual thinks they have had COVID-19 (whether self-diagnosed or confirmed by an antigen/antibody test). Understanding whether a COVID-19 diagnosis changes the way we protect ourselves, and how we report that behaviour, will help inform future strategies for exiting lockdown.

This study used an online cross-sectional survey of 6149 UK participants aged 18 and over. Participants were asked whether they had had COVID-19, whether they had been tested, how immune they thought they were to COVID-19, and about other variables such as how often they went shopping and saw friends. Variables were measured with Likert scales, binary items and continuous measures. The authors hypothesised that believing you have had COVID-19 makes you more likely to believe you are immune, and less likely to adhere to social distancing measures.

The study found that those who believed they had had COVID-19 were more likely to think they were immune and less likely to keep up protective behaviours such as hand washing and social distancing. There was no evidence of an association between thinking you had had COVID-19 and the perceived risk of COVID-19. It remains likely that people will need to adhere to protective measures for COVID-19 even if they have had the illness before.

Since research on COVID-19 is still new, the results of this paper could significantly affect how lockdown rules are implemented in future. Currently, no media communications specifically target people who believe they have had COVID-19. Because these people hold different views on lockdown measures and COVID-19 immunity, this gap in communication is worth addressing. However, the study relies heavily on self-reported measures, so social desirability bias may affect how participants respond, especially when reporting their adherence to lockdown measures. The sample also may not be representative of the UK population.

Reaction

I wonder whether the results of this study would hold if the same method were applied in a different country, such as Australia. It would be interesting to see whether a rise or fall in COVID-19 rates in that country would change these results, and similarly whether harsher COVID-19 restrictions would affect them.

I was confused by Figure 1: I couldn’t work out what scale the graph was using, and it seemed out of place. I didn’t like how long the x-axis labels were, or that the percentages of yes and no for each variable did not add up to 100%.

The most interesting part of this paper was the statistics for the COVID-19 antigen test. I was surprised that more than half of those who tested negative believed they had had COVID-19. And since people who thought they had had COVID-19 were less likely to correctly identify COVID-19 symptoms, it seems to me that those who think they have not had COVID-19 have a better understanding of its symptoms. Does that mean that people who think they have had COVID-19, and so think they have some immunity, become complacent?

Part 2: Verification

Most of the tables and figures were reproduced as a group. I tried to reproduce as many as possible from the paper.

- However, I am having trouble creating a percentage column for each of these.

Next week I need to add a description to each descriptive, stating what it measures and how I/we coded that variable.

I’ve pasted my errors at the bottom.

install.packages("dplyr")

## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//RtmpD5QB5b/downloaded_packages

install.packages("gt")

## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//RtmpD5QB5b/downloaded_packages

install.packages("forcats")

## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//RtmpD5QB5b/downloaded_packages

install.packages("ggplot2")

install.packages("tidyr")

## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//RtmpD5QB5b/downloaded_packages

install.packages("janitor")

## 
## The downloaded binary packages are in
##  /var/folders/cw/l9bfyrms3md0tbkr1866zbl80000gn/T//RtmpD5QB5b/downloaded_packages

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(tidyverse)

## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──

## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.1.0     ✓ stringr 1.4.0
## ✓ tidyr   1.1.3     ✓ forcats 0.5.1
## ✓ readr   1.4.0

## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(haven)
library(gt)
library(forcats)
library(ggplot2)
library(tidyr)
library(janitor)

## 
## Attaching package: 'janitor'

## The following objects are masked from 'package:stats':
## 
##     chisq.test, fisher.test

COVID <- read_sav(file = "coviddata.sav")

# demographics - how many participants
# fix this: I want a percentage column
# I tried rename, mutate and pivot_longer to create a tibble I could use for a percentage column, but it didn't work

amount_covid <- COVID %>% 
  group_by(Ever_covid) %>% 
  count(Ever_covid)

amount_covid

## # A tibble: 2 x 2
## # Groups:   Ever_covid [2]
##                           Ever_covid     n
##                            <dbl+lbl> <int>
## 1 0 [Think have not had coronavirus]  4656
## 2 1 [Think have had coronavirus]      1493
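A possible fix for the percentage problem noted in the comments above, sketched under the assumption that dplyr 1.0 or later is installed. `count()` on ungrouped data gives one row per category, and because nothing is grouped afterwards, `sum(n)` inside `mutate()` is the grand total. To keep it runnable without `coviddata.sav`, the sketch rebuilds the counts printed above in a small tibble; on the real data the first step would be `COVID %>% count(Ever_covid)` instead.

```r
library(dplyr)

# Stand-in for COVID %>% count(Ever_covid), using the counts printed above
amount_covid <- tibble(
  Ever_covid = c(0, 1),   # 0 = think have not had, 1 = think have had
  n = c(4656L, 1493L)
)

# No group_by() here, so sum(n) is the total across both rows (6149)
amount_covid_pct <- amount_covid %>%
  mutate(percent = round(n / sum(n) * 100, 1))

amount_covid_pct
```

Leaving out `group_by()` is the important part: if the table is still grouped by `Ever_covid`, `sum(n)` is taken inside each one-row group and every percentage comes out as 100.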

# demographics - how many thought they had COVID-19
amount_covid1 <- COVID %>% 
  group_by(Ever_covid) %>% 
  filter(Ever_covid == 1) %>% 
  count(Ever_covid)

amount_covid1

## # A tibble: 1 x 2
## # Groups:   Ever_covid [1]
##                       Ever_covid     n
##                        <dbl+lbl> <int>
## 1 1 [Think have had coronavirus]  1493

# demographics - how many had antigen test for COVID
amount_covid1_antigen <- COVID %>% 
  group_by(q7beentested) %>%
  count(q7beentested)

amount_covid1_antigen

## # A tibble: 3 x 2
## # Groups:   q7beentested [3]
##                                   q7beentested     n
##                                      <dbl+lbl> <int>
## 1 0 [Not been tested]                           5574
## 2 1 [Tested and result showed no coronavirus]    330
## 3 2 [Tested and result showed yes coronavirus]   245

# demographics - how many tested negative but thought they had COVID
negativethought <- COVID %>% 
  group_by(q7beentested, Ever_covid) %>% 
  filter(Ever_covid == 1, q7beentested == 1) %>% 
  count(Ever_covid)

negativethought

## # A tibble: 1 x 3
## # Groups:   q7beentested, Ever_covid [1]
##                                 q7beentested                    Ever_covid     n
##                                    <dbl+lbl>                     <dbl+lbl> <int>
## 1 1 [Tested and result showed no coronaviru… 1 [Think have had coronaviru…   187

# demographics - how many tested positive but thought they didn't have COVID
positivethought <- COVID %>% 
  group_by(q7beentested, Ever_covid) %>% 
  filter(Ever_covid == 0, q7beentested == 2) %>% 
  count(Ever_covid)

positivethought

## # A tibble: 1 x 3
## # Groups:   q7beentested, Ever_covid [1]
##                               q7beentested                      Ever_covid     n
##                                  <dbl+lbl>                       <dbl+lbl> <int>
## 1 2 [Tested and result showed yes coronav… 0 [Think have not had coronavi…    56
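The two filtered counts above could also come from one table with row percentages. A sketch, assuming the real-data version starts from `COVID %>% count(q7beentested, Ever_covid)` (which would also include the untested group): grouping by test result before `mutate()` turns each count into a percentage of its own test group. This is where the "more than half" figure in my reaction comes from, since 187 of the 330 people who tested negative thought they had had COVID-19.

```r
library(dplyr)

# Counts for the two tested groups (1 = tested negative, 2 = tested positive).
# 187 and 56 are printed above; 143 (= 330 - 187) and 189 (= 245 - 56)
# are derived by subtraction from the totals in amount_covid1_antigen.
tested <- tibble(
  q7beentested = c(1, 1, 2, 2),
  Ever_covid   = c(0, 1, 0, 1),
  n            = c(143L, 187L, 56L, 189L)
)

# Grouping by test result makes sum(n) the total *within* each test group,
# so each percentage is a row percentage
tested_pct <- tested %>%
  group_by(q7beentested) %>%
  mutate(percent = round(n / sum(n) * 100, 1)) %>%
  ungroup()

tested_pct
```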

# demographics - how many males/females (1 is male, 2 is female)
covidgender <- COVID %>% 
  group_by(gender, Ever_covid) %>%
  mutate(gender = case_when(gender == 1 ~ "Male", gender == 2 ~ "female")) %>% 
  mutate(Ever_covid = case_when(Ever_covid == 0 ~ "Think have not had COVID-19", Ever_covid == 1 ~ "Think have had COVID-19")) %>%
  count(gender)

print(covidgender)

## # A tibble: 4 x 3
## # Groups:   gender, Ever_covid [4]
##   gender Ever_covid                      n
##   <chr>  <chr>                       <int>
## 1 female Think have had COVID-19       796
## 2 female Think have not had COVID-19  2459
## 3 Male   Think have had COVID-19       697
## 4 Male   Think have not had COVID-19  2197

# demographics - age
# 1 is 18-24, 2 is 25-34, 3 is 35-44, 4 is 45-54, 5 is 55+
age <- COVID %>% 
  group_by(age_categories, Ever_covid) %>% 
  mutate(age_categories = case_when(age_categories == 1 ~ "18 to 24 years",
                                    age_categories == 2 ~ "25 to 34 years",
                                    age_categories == 3 ~ "35 to 44 years",
                                    age_categories == 4 ~ "45 to 54 years",
                                    age_categories == 5 ~ "55 years and over")) %>%
  mutate(Ever_covid = case_when(Ever_covid == 0 ~ "Think have not had COVID-19", Ever_covid == 1 ~ "Think have had COVID-19")) %>%
  count(age_categories)

print(age)

## # A tibble: 10 x 3
## # Groups:   age_categories, Ever_covid [10]
##    age_categories    Ever_covid                      n
##    <chr>             <chr>                       <int>
##  1 18 to 24 years    Think have had COVID-19       419
##  2 18 to 24 years    Think have not had COVID-19  1003
##  3 25 to 34 years    Think have had COVID-19       400
##  4 25 to 34 years    Think have not had COVID-19   823
##  5 35 to 44 years    Think have had COVID-19       294
##  6 35 to 44 years    Think have not had COVID-19   751
##  7 45 to 54 years    Think have had COVID-19       164
##  8 45 to 54 years    Think have not had COVID-19   554
##  9 55 years and over Think have had COVID-19       216
## 10 55 years and over Think have not had COVID-19  1525

# demographics - child
# 0 is no child, 1 is has a child; 361 people are NA (265 + 96), so were they excluded because they didn't answer?
child <- COVID %>% 
  group_by(Has_child, Ever_covid) %>% 
  count(Has_child)

print(child)

## # A tibble: 6 x 3
## # Groups:   Has_child, Ever_covid [6]
##                    Has_child                         Ever_covid     n
##                    <dbl+lbl>                          <dbl+lbl> <int>
## 1  0 [Does not have a child] 0 [Think have not had coronavirus]  2005
## 2  0 [Does not have a child] 1 [Think have had coronavirus]       621
## 3  1 [Has a child]           0 [Think have not had coronavirus]  2386
## 4  1 [Has a child]           1 [Think have had coronavirus]       776
## 5 NA                         0 [Think have not had coronavirus]   265
## 6 NA                         1 [Think have had coronavirus]        96

# demographics - employment status
# 0 is not working, 1 is working; 83 people are NA (71 + 12), so were they excluded?
employment <- COVID %>% 
  group_by(Working, Ever_covid) %>% 
  count(Working)

print(employment)

## # A tibble: 6 x 3
## # Groups:   Working, Ever_covid [6]
##                                      Working                    Ever_covid     n
##                                    <dbl+lbl>                     <dbl+lbl> <int>
## 1  0 [Not working]                           0 [Think have not had corona…  1714
## 2  0 [Not working]                           1 [Think have had coronaviru…   357
## 3  1 [Working (full or part time or self-em… 0 [Think have not had corona…  2871
## 4  1 [Working (full or part time or self-em… 1 [Think have had coronaviru…  1124
## 5 NA                                         0 [Think have not had corona…    71
## 6 NA                                         1 [Think have had coronaviru…    12

# demographics - working in key sector
# 0 is no, 1 is yes
worker <- COVID %>% 
  group_by(Key_worker, Ever_covid) %>% 
  count(Key_worker)

print(worker)

## # A tibble: 4 x 3
## # Groups:   Key_worker, Ever_covid [4]
##           Key_worker                         Ever_covid     n
##            <dbl+lbl>                          <dbl+lbl> <int>
## 1 0 [Not key worker] 0 [Think have not had coronavirus]  3105
## 2 0 [Not key worker] 1 [Think have had coronavirus]       753
## 3 1 [Key worker]     0 [Think have not had coronavirus]  1551
## 4 1 [Key worker]     1 [Think have had coronavirus]       740

# demographics - education
# 0 is GCSE etc., 1 is degree or higher; 92 people are NA (74 + 18), maybe those who didn't fit into either category?
education <- COVID %>% 
  group_by(degree, Ever_covid) %>% 
  count(degree)

print(education)

## # A tibble: 6 x 3
## # Groups:   degree, Ever_covid [6]
##                                        degree                   Ever_covid     n
##                                     <dbl+lbl>                    <dbl+lbl> <int>
## 1  0 [GCSE/vocational/A-level/no formal qual… 0 [Think have not had coron…  3382
## 2  0 [GCSE/vocational/A-level/no formal qual… 1 [Think have had coronavir…  1060
## 3  1 [Degree or higher (Bachelors, Masters, … 0 [Think have not had coron…  1200
## 4  1 [Degree or higher (Bachelors, Masters, … 1 [Think have had coronavir…   415
## 5 NA                                          0 [Think have not had coron…    74
## 6 NA                                          1 [Think have had coronavir…    18
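The exclusions guessed in the comments above (361 for children, 83 for employment, 92 for education) match the `NA` rows in each table. A quick way to count the missing values in several columns at once, sketched on a made-up toy tibble (not study data) so it runs without the data file:

```r
library(dplyr)

# Toy data, made up for illustration only: NA marks a missing answer
toy <- tibble(
  Has_child = c(0, 1, NA, 1),
  Working   = c(1, NA, 0, 1),
  degree    = c(NA, NA, 0, 1)
)

# Count the missing values in each column in one step
missing_counts <- toy %>%
  summarise(across(c(Has_child, Working, degree), ~ sum(is.na(.x))))

missing_counts
```

Replacing `toy` with `COVID` should give 361, 83 and 92, the same totals as the `NA` rows above. Whether the paper excluded these people simply because they did not answer is still an assumption to check against its methods.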

# demographics - region
# 1 is Midlands, 2 is South & East, 3 is North, 4 is London, 5 is Wales, Scotland and Northern Ireland
region1 <- COVID %>% 
  group_by(region, Ever_covid) %>% 
  count(region)

print(region1)

## # A tibble: 10 x 3
## # Groups:   region, Ever_covid [10]
##               region                         Ever_covid     n
##            <dbl+lbl>                          <dbl+lbl> <int>
##  1 1 [Midlands]      0 [Think have not had coronavirus]   781
##  2 1 [Midlands]      1 [Think have had coronavirus]       251
##  3 2 [South & East]  0 [Think have not had coronavirus]  1369
##  4 2 [South & East]  1 [Think have had coronavirus]       416
##  5 3 [North]         0 [Think have not had coronavirus]  1120
##  6 3 [North]         1 [Think have had coronavirus]       335
##  7 4 [London]        0 [Think have not had coronavirus]   701
##  8 4 [London]        1 [Think have had coronavirus]       299
##  9 5 [Wales/Scot/NI] 0 [Think have not had coronavirus]   685
## 10 5 [Wales/Scot/NI] 1 [Think have had coronavirus]       192

# demographics - how many agreed/strongly agreed that they had some immunity to COVID

agreecovid <- COVID %>% 
  group_by(q8haveimmunity) %>% 
  filter(q8haveimmunity > 3) %>% 
  count(q8haveimmunity)

agreecovid

## # A tibble: 2 x 2
## # Groups:   q8haveimmunity [2]
##       q8haveimmunity     n
##            <dbl+lbl> <int>
## 1 4 [Agree]            841
## 2 5 [Strongly agree]   299

# Those who thought they had had COVID-19 were more likely to agree that they had some immunity to COVID-19
# (did not think they had had COVID-19: 10.7%, n = 500; thought they had had COVID-19: 42.9%, n = 640)
agreecovid1 <- COVID %>% 
  group_by(q8haveimmunity, Ever_covid) %>% 
  filter(q8haveimmunity > 3) %>% 
  count(Ever_covid)

agreecovid1

## # A tibble: 4 x 3
## # Groups:   q8haveimmunity, Ever_covid [4]
##       q8haveimmunity                         Ever_covid     n
##            <dbl+lbl>                          <dbl+lbl> <int>
## 1 4 [Agree]          0 [Think have not had coronavirus]   382
## 2 4 [Agree]          1 [Think have had coronavirus]       459
## 3 5 [Strongly agree] 0 [Think have not had coronavirus]   118
## 4 5 [Strongly agree] 1 [Think have had coronavirus]       181
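The percentages in the comment above can be checked from these counts: adding Agree and Strongly agree gives 500 and 640, and dividing by each group's size (4656 and 1493, from `amount_covid`) gives 10.7% and 42.9%. A sketch of that calculation, assuming the denominators are the full group sizes, which the matching numbers suggest:

```r
library(dplyr)

# Agree (4) and Strongly agree (5) counts by group, copied from agreecovid1
agree_counts <- tibble(
  q8haveimmunity = c(4, 4, 5, 5),
  Ever_covid     = c(0, 1, 0, 1),
  n              = c(382L, 459L, 118L, 181L)
)

# Group sizes copied from amount_covid (everyone in each Ever_covid group)
group_totals <- tibble(Ever_covid = c(0, 1), total = c(4656L, 1493L))

immunity_pct <- agree_counts %>%
  group_by(Ever_covid) %>%
  summarise(n_agree = sum(n), .groups = "drop") %>%   # agree + strongly agree
  left_join(group_totals, by = "Ever_covid") %>%
  mutate(percent = round(n_agree / total * 100, 1))

immunity_pct
```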

Descriptives: Errors

Error finding a percentage for the number thinking they have had COVID:

amount_covid <- COVID %>% 
  group_by(Ever_covid) %>% 
  count(name = "number") %>% 
  mutate(Percentage = round(number / sum(number) * 100, 1))

count(Ever_covid)

And another error from the same attempt:

amount_covidpercent %>% 
  tabyl(Ever_covid, n) %>% 
  adorn_totals("col") %>% 
  adorn_percentages("row") %>% 
  adorn_pct_formatting(digits = 2) %>% 
  adorn_ns() %>% 
  adorn_title()
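A few things probably went wrong in these attempts. In the first, `count()` keeps the `group_by(Ever_covid)` grouping, so `sum(number)` is taken within each group and each row would come out as 100%; the stray `count(Ever_covid)` line also has no data to work on. If curly quotes (“ ”) were in the code itself, R would stop with an "unexpected input" error before any of that. In the second, `tabyl(Ever_covid, n)` cross-tabulates `Ever_covid` against the *values* of the count column `n`, because `tabyl()` expects raw one-row-per-person data rather than counts. A sketch of the janitor route on raw data, assuming janitor and tidyr are installed; `tidyr::uncount()` rebuilds one row per participant from the gender counts printed earlier so it runs without the data file:

```r
library(dplyr)
library(tidyr)
library(janitor)

# Rebuild one row per participant from the gender x Ever_covid counts above
people <- tibble(
  gender     = c("female", "female", "Male", "Male"),
  Ever_covid = c(1, 0, 1, 0),
  n          = c(796L, 2459L, 697L, 2197L)
) %>%
  uncount(n)

# One-way table: tabyl() adds the percent column itself
people %>%
  tabyl(Ever_covid) %>%
  adorn_pct_formatting(digits = 1)

# Two-way table with row percentages and counts, as in the original attempt
gender_tab <- people %>% tabyl(gender, Ever_covid)

gender_tab %>%
  adorn_totals("col") %>%
  adorn_percentages("row") %>%
  adorn_pct_formatting(digits = 2) %>%
  adorn_ns() %>%
  adorn_title("combined")
```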

Part 3: Exploratory

I chose 3 questions I wanted to explore, and also chose 2 backup just in case.

Does gender affect shopping for non-essentials? I had a lot of trouble here.

Things I’m confused about:

- What format does my tibble need to be in for ggplot?
- Does it matter whether the columns are chr, dbl or int? If it does, how do we convert them to other types?
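On the two questions above: ggplot2 generally works best with long (tidy) data, one row per bar or point and one column per aesthetic. Character and factor columns are treated as discrete categories, while numeric columns (dbl or int alike) are treated as continuous. `read_sav()` gives `dbl+lbl` (haven-labelled) columns; a minimal sketch of converting them with haven's `as_factor()` and `zap_labels()`, using a made-up labelled vector with the same Male/Female codes as the survey:

```r
library(dplyr)
library(haven)

# Made-up stand-in for a column read with read_sav(): numeric codes with value labels
toy <- tibble(
  gender = labelled(c(1, 2, 2, 1), labels = c(Male = 1, Female = 2))
)

converted <- toy %>%
  mutate(
    gender_factor = as_factor(gender),              # dbl+lbl -> factor using the label text
    gender_code   = zap_labels(gender),             # dbl+lbl -> plain numeric codes
    gender_chr    = as.character(gender_factor),    # factor -> character
    gender_int    = as.integer(gender_code)         # double -> integer
  )

glimpse(converted)
```

`as_factor()` can stand in for the `case_when()` recoding used elsewhere in this log, since it reuses the labels already stored in the file.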

# overview of data needed
exploratory1 <- COVID %>% 
  group_by(Adhere_shop_other, gender) %>% 
  count(Adhere_shop_other)

exploratory1

## # A tibble: 4 x 3
## # Groups:   Adhere_shop_other, gender [4]
##                                               Adhere_shop_other     gender     n
##                                                       <dbl+lbl>  <dbl+lbl> <int>
## 1 0 [Reported shopping once or more (not adhering to guidance)] 1 [Male]    1050
## 2 0 [Reported shopping once or more (not adhering to guidance)] 2 [Female]   783
## 3 1 [Reported not shopping for non-essentials (adhering)]       1 [Male]    1844
## 4 1 [Reported not shopping for non-essentials (adhering)]       2 [Female]  2472

# renaming the columns and recoding the numeric codes as readable labels
# note: the variable measures adherence, so 0 = shopping (not adhering) and 1 = not shopping (adhering)
exploratory2 <- exploratory1 %>% 
  group_by(Adhere_shop_other, gender) %>% 
  rename(Adhering_to_Guidelines = Adhere_shop_other,
         Gender = gender) %>% 
  mutate(Gender = case_when(Gender == 1 ~ "Male", Gender == 2 ~ "Female")) %>% 
  mutate(Adhering_to_Guidelines = case_when(Adhering_to_Guidelines == 0 ~ "Shopping",
                                            Adhering_to_Guidelines == 1 ~ "Not Shopping"))

exploratory2

## # A tibble: 4 x 3
## # Groups:   Adhering_to_Guidelines, Gender [4]
##   Adhering_to_Guidelines Gender     n
##   <chr>                  <chr>  <int>
## 1 Shopping               Male    1050
## 2 Shopping               Female   783
## 3 Not Shopping           Male    1844
## 4 Not Shopping           Female  2472

# arranging by gender, male first - this works but I don't know what to do with it yet
exploratory2 %>% 
  group_by(Adhering_to_Guidelines, Gender) %>% 
  arrange(desc(Gender))

## # A tibble: 4 x 3
## # Groups:   Adhering_to_Guidelines, Gender [4]
##   Adhering_to_Guidelines Gender     n
##   <chr>                  <chr>  <int>
## 1 Shopping               Male    1050
## 2 Not Shopping           Male    1844
## 3 Shopping               Female   783
## 4 Not Shopping           Female  2472

exploratory2

## # A tibble: 4 x 3
## # Groups:   Adhering_to_Guidelines, Gender [4]
##   Adhering_to_Guidelines Gender     n
##   <chr>                  <chr>  <int>
## 1 Shopping               Male    1050
## 2 Shopping               Female   783
## 3 Not Shopping           Male    1844
## 4 Not Shopping           Female  2472

# pivoting to wide data
exploratorywide <- exploratory2 %>% 
  pivot_wider(
    id_cols = Adhering_to_Guidelines,
    names_from = Gender,
    values_from = n
  )

print(exploratorywide)

## # A tibble: 2 x 3
## # Groups:   Adhering_to_Guidelines [2]
##   Adhering_to_Guidelines  Male Female
##   <chr>                  <int>  <int>
## 1 Shopping                1050    783
## 2 Not Shopping            1844   2472

# adding Male + Female to get the total in each row, then keeping only the shopping row
exploratoryfinal <- exploratorywide %>%  
  mutate(Total = Male + Female) %>% 
  filter(Adhering_to_Guidelines == "Shopping")

exploratoryfinal

## # A tibble: 1 x 4
## # Groups:   Adhering_to_Guidelines [1]
##   Adhering_to_Guidelines  Male Female Total
##   <chr>                  <int>  <int> <int>
## 1 Shopping                1050    783  1833

# pivoting back to long data - this is the final one that works

exploratoryfinal %>% 
  pivot_longer(
    cols = c(Male, Female),
    names_to = "Gender",
    values_to = "values"
  )

## # A tibble: 2 x 4
## # Groups:   Adhering_to_Guidelines [1]
##   Adhering_to_Guidelines Total Gender values
##   <chr>                  <int> <chr>   <int>
## 1 Shopping                1833 Male     1050
## 2 Shopping                1833 Female    783

Errors once again

Making a percentage column (this doesn’t work):

exploratoryfinal %>% 
  select(Total, n) %>% 
  mutate(Percentage = (values * 2))

Attempt 2 at a percentage column (fail):

exploratoryfinal %>% 
  mutate(Percentage = (n / nrow(exploratory1final) * 100))
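Both attempts refer to things that do not exist at that point (`n` and `values` are not columns of `exploratoryfinal`, and there is no object called `exploratory1final`), and `nrow()` counts rows of the table rather than people. One way to get the share of each gender who reported shopping for non-essentials is to work from the long table `exploratory2` before filtering, grouping by gender so each count is divided by that gender's total. A sketch, with the four counts copied from the output above:

```r
library(dplyr)

# Same shape as exploratory2, counts copied from the output above
exploratory2 <- tibble(
  Adhering_to_Guidelines = c("Shopping", "Shopping", "Not Shopping", "Not Shopping"),
  Gender = c("Male", "Female", "Male", "Female"),
  n = c(1050L, 783L, 1844L, 2472L)
)

shopping_pct <- exploratory2 %>%
  group_by(Gender) %>%                               # denominator = all men / all women
  mutate(Percentage = round(n / sum(n) * 100, 1)) %>%
  ungroup() %>%
  filter(Adhering_to_Guidelines == "Shopping")       # keep the non-adhering rows

shopping_pct
```

The order matters here: filtering first would leave one row per gender, so every percentage would be 100.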

Turning it into a bar graph (doesn’t work):

ggplot(exploratoryfinal, aes(x = Gender_final, y = Total)) + 
  geom_bar()
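Two likely problems in the bar-graph attempt: `Gender_final` is not a column in `exploratoryfinal`, and `geom_bar()` counts rows by itself, so it errors when it is also given a `y` aesthetic. When the bar heights are already in a column, `geom_col()` is the ggplot2 geom to use. A sketch using the long table from the final `pivot_longer()` above (columns `Gender` and `values`), rebuilt from its printed output:

```r
library(ggplot2)
library(tibble)

# Long table from the final pivot_longer() above, rebuilt from its printed output
shopping_long <- tibble(
  Gender = c("Male", "Female"),
  values = c(1050L, 783L)
)

# geom_col() uses the y values as bar heights; geom_bar() would try to count rows
p <- ggplot(shopping_long, aes(x = Gender, y = values)) +
  geom_col() +
  labs(
    x = "Gender",
    y = "Number reporting shopping for non-essentials"
  )

p
```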

Does having a degree increase the ability to correctly identify symptoms of COVID-19?

# overview of data needed - 92 people were under NA, exclusion criteria?
degreeandcovid <- COVID %>% 
  group_by(degree, Sx_covid_nomissing) %>% 
  count(degree)
  
degreeandcovid

## # A tibble: 6 x 3
## # Groups:   degree, Sx_covid_nomissing [6]
##                                         degree          Sx_covid_nomissing     n
##                                      <dbl+lbl>                   <dbl+lbl> <int>
## 1  0 [GCSE/vocational/A-level/no formal quali… 0 [Did not identify sx]      1845
## 2  0 [GCSE/vocational/A-level/no formal quali… 1 [Identified cough and fe…  2597
## 3  1 [Degree or higher (Bachelors, Masters, P… 0 [Did not identify sx]       624
## 4  1 [Degree or higher (Bachelors, Masters, P… 1 [Identified cough and fe…   991
## 5 NA                                           0 [Did not identify sx]        48
## 6 NA                                           1 [Identified cough and fe…    44
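A possible next step for Q2, as a sketch: drop the 92 people with a missing `degree` (the `NA` rows, 48 + 44) and compare the percentage in each education group who identified the symptoms. The counts are copied from `degreeandcovid` above; on the real data the same steps would start from `COVID %>% count(degree, Sx_covid_nomissing)`. Whether the paper excluded these 92 people in the same way is an assumption.

```r
library(dplyr)

# Counts copied from degreeandcovid (Sx: 0 = did not identify, 1 = identified symptoms)
degree_counts <- tibble(
  degree             = c(0, 0, 1, 1, NA, NA),
  Sx_covid_nomissing = c(0, 1, 0, 1, 0, 1),
  n                  = c(1845L, 2597L, 624L, 991L, 48L, 44L)
)

symptoms_by_degree <- degree_counts %>%
  filter(!is.na(degree)) %>%                  # drop the 92 missing education answers
  group_by(degree) %>%
  mutate(percent = round(n / sum(n) * 100, 1)) %>%
  ungroup() %>%
  filter(Sx_covid_nomissing == 1)             # keep the "identified symptoms" rows

symptoms_by_degree
```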

That’s all I’ve got so far for Q2. I feel like I might run into the same questions, so I will probably attend the Q&A on Tuesday!

Does having a child increase your worry about COVID-19?

And two backup questions:

- Does region affect testing rates?
- Does age affect shopping for non-essentials? I could also look at whether gender matters here too.

Next Steps…

My next steps are to attend the Q&A on Tuesday, stop stressing about the verification report, and finish my exploratory analyses soon (hopefully!).

Thank you for reading my learning log :)