1 Group Homework

You will work with your group to complete this assignment.
Upload your html file on RPubs and include the link when you submit your submission files on Collab.
Submit your group’s shared .Rmd AND “knitted”.html files on Collab.
Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.
Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer”
Name the files with a group name and YOUR name for your submission.
One of the group members will present R codes and plots for Parts 2 and 3 in class on Nov. 2 (Tue). Please e-mail the instructor if you’re a presenter by 11:59pm, Nov. 1. Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.

2 Part 1

Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).
Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.
Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html, DS3003_visualizingerrors_reg_code.html, and DS3003_visualizingerrors_reg_code.html

library(foreign)
library(boot)
library(ggplot2)
# reading in data
scs_data <- read.spss('SCS_QE.sav',to.data.frame = TRUE)
# boot function
# b.stat function from notes
b.stat <- function(data, i)
{
   b.dat <- data[i ,]
   out.lm <- lm(mars ~ mathpre, b.dat)
   predict(out.lm, data.frame(mathpre=scs_data2$mathpre))   
}
# first 100 cases
scs_data2 <- scs_data[1:100,]
b.out <- boot(scs_data2, b.stat, R = 2000) # 2000 bootstrap replicates

boot.ci(b.out, index = 1, type = 'perc') # 95% CI for the first observation

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b.out, type = "perc", index = 1)
## 
## Intervals : 
## Level     Percentile     
## 95%   (52.46, 60.28 )  
## Calculations and Intervals on Original Scale

# lower and upper bounds of confidence intervals
b.ci <- t(sapply(1:nrow(scs_data2), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(scs_data2), c('lower', 'upper'))

# combining datasets
scs_data3 <- cbind(scs_data2, b.ci)

# plotting data, linear line, and bootstrap confidence intervals
ggplot(scs_data3, aes(x=mathpre, y=mars)) + geom_point(alpha=0.2) + labs(x = 'Math Achievement Score', y = 'Math Anxiety Score', title='Math Anxiety vs. Math Achievement') + theme_bw() + 
        geom_smooth(method='lm', formula= y~x, se = FALSE) +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="#69b3a2")

As shown in the graph, there is a negative relationship between Math Anxiety Score and Math Achievement Score. The 95% boostrap confidence intervals are smallest between Math Achievement Scores 5 through 10, corresponding to Math Anxiety Scores 50 through 60.

3 Part 2

Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.
First, get the latest data from from https://covid19.who.int/table.
- The file should likely be named “WHO COVID-19 global table data October XXth 2021 at XXXXX.csv”
- Don’t use the data that I uploaded on Collab. It’s not the most recent data.
Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).
Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.

library(DT)
library(dplyr)
library(TSA)
library(tidyverse)
# reading in data
covid_data <- read_csv('WHO_COVID-19_data_October_31st.csv')

# using datatable to help me choose three countries per region
datatable(covid_data)

regions_and_countries <- data.frame('Region' = c('Africa','Africa','Africa', 'Americas','Americas','Americas','Eastern Mediterranean','Eastern Mediterranean','Eastern Mediterranean','Europe','Europe','Europe','South-East Asia','South-East Asia','South-East Asia','Western Pacific', 'Western Pacific', 'Western Pacific'), 
                           'Country' = c('South Africa', 'Ethiopia', 'Kenya', 'Colombia', 'Brazil', 'Argentina', 'Morocco', 'Iraq', 'Pakistan', 'France', 'Spain', 'Turkey', 'India', 'Indonesia', 'Thailand', 'China', 'Malaysia', 'Japan'))

# getting error bars
covid_data2 <- covid_data %>% filter(Name %in% regions_and_countries$Country) %>% 
  mutate(rate = `Deaths - cumulative total`/`Cases - cumulative total`, 
         SE = sqrt(rate*(1-rate)/`Cases - cumulative total`))
# plotting
ggplot(covid_data2, aes(x=rate, y=Name)) + geom_col() + theme_bw() + 
  xlab('Mortality Rate') + ylab('Country') + labs(title='COVID-19 Mortality Rate vs. Country by WHO Region') +
  geom_errorbar(aes(xmin=rate-1.96*SE, xmax=rate+1.96*SE), width=.2) + 
  facet_grid(~ covid_data2$`WHO Region`, scale="free") + coord_flip() + 
  # changing country tick labels and facet grid WHO region label size
  theme(axis.text.x = element_text(angle = 45,hjust=1),strip.text.x = element_text(size = 6.5))

4 Part 3

See TABLE 2. COVID-19 vaccine effectiveness against COVID-19–associated hospitalization among adults without immunocompromising conditions, by vaccine product — 21 hospitals in 18 U.S. states, March–August 2021 from a recent study on Comparative Effectiveness of Moderna, Pfizer-BioNTech, and Janssen (Johnson & Johnson) Vaccines.
Draw your best plot to visualize results of VE against COVID-19 hospitalization (95% CI), i.e., the third column of TABLE 2.
First, save data about Vaccine/Period and VE against COVID-19 hospitalization (95% CI).
- You could create a .csv file including VE against COVID-19 hospitalization (95% CI) and Vaccine/Period.
- Or, you could save data in R directly, say using data.frame() or tibble().
Second, draw bar plots with error bars, using the data you saved in the first step.
- You could use the facets function as in Part 2.
- Or, you could draw separate bar plots for each vaccine and collect three bar plots into a single figure using something like, e.g., gridExtra::grid.arrange(). Here are some helpul notes and explanations.

vaccine_data <- data.frame('Vaccine' = c('Moderna', 'Moderna','Moderna','Pfizer-BioNTech','Pfizer-BioNTech','Pfizer-BioNTech', 'Janssen','Janssen'), 
                           'VaccinePeriod' = c('Full surveillance period','14-120 days after full vaccination','>120 days after full vaccination','Full surveillance period','14-120 days after full vaccination','>120 days after full vaccination','Full surveillance period','>28 days after full vaccination'),
                           'VE' = c(93,93,92,88,91,77,71,68), 
                           'LowerVECI' = c(91,90,87,85,88,67,56,49),
                           'UpperVECI' = c(95,95,96,91,93,84,81,80))

# plotting
attach(vaccine_data)
ggplot(vaccine_data, aes(x=VE, y=VaccinePeriod)) + geom_col() + theme_bw() + 
  xlab('VE against COVID-19 hospitalization') + ylab('Vaccine/Period') + labs(title='VE vs. Vaccine/Period by Vaccine') +
  geom_errorbar(aes(xmin=LowerVECI, xmax=UpperVECI), width=.2) + 
  facet_grid(~ Vaccine, scale="free") + coord_flip() + 
  # changing country tick labels and facet grid WHO region label size
  theme(axis.text.x = element_text(angle = 45,hjust=1),strip.text.x = element_text(size = 7))

Assignment 12 - Visualizing errors

Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer

Group #6, Abby Newbury (amn5ye)

Due Date: 11:59pm, Nov 1

1 Group Homework

2 Part 1

3 Part 2

4 Part 3