1 Group Homework

  • You will work with your group to complete this assignment.

  • Upload your html file on RPubs and include the link when you submit your submission files on Collab.

  • Submit your group’s shared .Rmd AND “knitted”.html files on Collab.

  • Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.

  • Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer”

  • Name the files with a group name and YOUR name for your submission.

  • One of the group members will present R codes and plots for Parts 2 and 3 in class on Nov. 2 (Tue). Please e-mail the instructor if you’re a presenter by 11:59pm, Nov. 1. Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.

2 Part 1

  • Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).

  • Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.

  • Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html, DS3003_visualizingerrors_reg_code.html, and DS3003_visualizingerrors_reg_code.html

# add your codes
library(foreign)
scs <- read.spss("SCS_QE.sav", to.data.frame=T)
library(boot)
library(tidyverse)

scs <- read.spss("SCS_QE.sav",to.data.frame = T)

b.stat <- function(data, i)
{
   b.dat <- data[i ,]
   out.lm <- lm(mathpre ~ mars, b.dat)
   predict(out.lm, data.frame(mars=scs$mars))  
}

scs2 <- scs[1:100,] # subset of the first 100 cases
b.out <- boot(scs2, b.stat, R = 2000) # R = num of replications

b.ci <- t(sapply(1:nrow(scs2), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(scs2), c('lower', 'upper'))

scs4 <- cbind(scs2, b.ci)

ggplot(scs4, aes(x = mars, y = mathpre, alpha = .3)) + geom_point() + geom_jitter() + theme_bw() + labs(x = "Math Anxiety Score", y = "Math Achievement Score", title = "Scatterplot of Math Anxiety and Achievment") +
        geom_smooth(method='lm', formula= y~x, se = FALSE) +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="#69b3a2")

3 Part 2

  • Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.

  • First, get the latest data from from https://covid19.who.int/table.

    • The file should likely be named “WHO COVID-19 global table data October XXth 2021 at XXXXX.csv”

    • Don’t use the data that I uploaded on Collab. It’s not the most recent data.

  • Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).

  • Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.

data1 <- read_csv("WHO_Nov_1st.csv")


focus <- c('Ethiopia', 'Kenya', 'Cameroon', 'Brazil', 'United States of America',
           'Canada', 'Egypt', 'Jordan', 'Saudi Arabia',
           'France', 'Spain', 'Germany', 'Indonesia', 'Thailand', 'Timor-Leste',
           'Republic of Korea', 'Japan', 'China')

data2 <- data1 %>% filter(Name %in% focus) %>%
  mutate(rate = `Deaths - cumulative total`/`Cases - cumulative total`,
         SE = sqrt(rate*(1-rate)/`Cases - cumulative total`))

ggplot(data2, aes(x=rate, y=Name, fill = `WHO Region`)) + geom_col() + theme_bw() + 
  xlab('WHO: # Reported Deaths / # Reported Cases') + 
  geom_errorbar(aes(xmin=rate-1.96*SE, xmax=rate+1.96*SE), width=.2) + 
  facet_grid(~ `WHO Region`, scale="free") + labs(x = "Mortality Rate",y = "WHO: # Reported Deaths / # Reported Cases", title = "Mortality Rate by WHO Region") + coord_flip() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))

4 Part 3

  • See TABLE 2. COVID-19 vaccine effectiveness against COVID-19–associated hospitalization among adults without immunocompromising conditions, by vaccine product — 21 hospitals in 18 U.S. states, March–August 2021 from a recent study on Comparative Effectiveness of Moderna, Pfizer-BioNTech, and Janssen (Johnson & Johnson) Vaccines.

  • Draw your best plot to visualize results of VE against COVID-19 hospitalization (95% CI), i.e., the third column of TABLE 2.

  • First, save data about Vaccine/Period and VE against COVID-19 hospitalization (95% CI).

    • You could create a .csv file including VE against COVID-19 hospitalization (95% CI) and Vaccine/Period.
    • Or, you could save data in R directly, say using data.frame() or tibble().
  • Second, draw bar plots with error bars, using the data you saved in the first step.

    • You could use the facets function as in Part 2.

    • Or, you could draw separate bar plots for each vaccine and collect three bar plots into a single figure using something like, e.g., gridExtra::grid.arrange(). Here are some helpul notes and explanations.

Moderna <- c(93,93,92,NA)
Pfizer <- c(88,91,77,NA)
Janssen <- c(71,NA, NA,68)
Period <- c("Full surveillance period","14-120 days after full vaccination",">120 days after full vaccination",">28 days after full vaccination")

data_ve <- data.frame(Period,Moderna, Pfizer, Janssen)
data_ve2 <- gather(data_ve, provider, ve, Moderna:Janssen, factor_key=TRUE)

lower <- c(91,90,87,NA,85,88,67,NA,56,NA,NA,49)
upper <- c(95,95,96,NA,91,93,84,NA,81,NA,NA,80)

data_ve2$lower <- lower
data_ve2$upper <- upper

data_ve3 <- na.omit(data_ve2)

ggplot(data_ve3, aes(x = Period, y = ve, fill = Period, label = ve)) + facet_grid(~provider, space = "free", scale = "free") + geom_col() + ylab("VE against COVID-19 hospitalization (95% CI)") + theme(axis.text.x=element_blank()) + geom_errorbar(aes(ymin=lower, ymax=upper),width=.1) + geom_text(vjust =7)