You will work with your group to complete this assignment.
Upload your html file on RPubs and include the link when you submit your submission files on Collab.
Submit your group’s shared .Rmd AND “knitted”.html files on Collab.
Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.
Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer”
Name the files with a group name and YOUR name for your submission.
One of the group members will present R codes and plots for Parts 2 and 3 in class on Nov. 2 (Tue). Please e-mail the instructor if you’re a presenter by 11:59pm, Nov. 1. Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.
Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).
Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.
Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html, DS3003_visualizingerrors_reg_code.html, and DS3003_visualizingerrors_reg_code.html
# add your codes
library(foreign)
scs <- read.spss("SCS_QE.sav", to.data.frame=T)
library(boot)
library(tidyverse)
scs <- read.spss("SCS_QE.sav",to.data.frame = T)
b.stat <- function(data, i)
{
b.dat <- data[i ,]
out.lm <- lm(mathpre ~ mars, b.dat)
predict(out.lm, data.frame(mars=scs$mars))
}
scs2 <- scs[1:100,] # subset of the first 100 cases
b.out <- boot(scs2, b.stat, R = 2000) # R = num of replications
b.ci <- t(sapply(1:nrow(scs2), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(scs2), c('lower', 'upper'))
scs4 <- cbind(scs2, b.ci)
ggplot(scs4, aes(x = mars, y = mathpre, alpha = .3)) + geom_point() + geom_jitter() + theme_bw() + labs(x = "Math Anxiety Score", y = "Math Achievement Score", title = "Scatterplot of Math Anxiety and Achievment") +
geom_smooth(method='lm', formula= y~x, se = FALSE) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="#69b3a2")
Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.
First, get the latest data from from https://covid19.who.int/table.
The file should likely be named “WHO COVID-19 global table data October XXth 2021 at XXXXX.csv”
Don’t use the data that I uploaded on Collab. It’s not the most recent data.
Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).
Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.
data1 <- read_csv("WHO_Nov_1st.csv")
focus <- c('Ethiopia', 'Kenya', 'Cameroon', 'Brazil', 'United States of America',
'Canada', 'Egypt', 'Jordan', 'Saudi Arabia',
'France', 'Spain', 'Germany', 'Indonesia', 'Thailand', 'Timor-Leste',
'Republic of Korea', 'Japan', 'China')
data2 <- data1 %>% filter(Name %in% focus) %>%
mutate(rate = `Deaths - cumulative total`/`Cases - cumulative total`,
SE = sqrt(rate*(1-rate)/`Cases - cumulative total`))
ggplot(data2, aes(x=rate, y=Name, fill = `WHO Region`)) + geom_col() + theme_bw() +
xlab('WHO: # Reported Deaths / # Reported Cases') +
geom_errorbar(aes(xmin=rate-1.96*SE, xmax=rate+1.96*SE), width=.2) +
facet_grid(~ `WHO Region`, scale="free") + labs(x = "Mortality Rate",y = "WHO: # Reported Deaths / # Reported Cases", title = "Mortality Rate by WHO Region") + coord_flip() + theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust=1))
See TABLE 2. COVID-19 vaccine effectiveness against COVID-19–associated hospitalization among adults without immunocompromising conditions, by vaccine product — 21 hospitals in 18 U.S. states, March–August 2021 from a recent study on Comparative Effectiveness of Moderna, Pfizer-BioNTech, and Janssen (Johnson & Johnson) Vaccines.
Draw your best plot to visualize results of VE against COVID-19 hospitalization (95% CI), i.e., the third column of TABLE 2.
First, save data about Vaccine/Period and VE against COVID-19 hospitalization (95% CI).
.csv file including VE against COVID-19 hospitalization (95% CI) and Vaccine/Period.data.frame() or tibble().Second, draw bar plots with error bars, using the data you saved in the first step.
You could use the facets function as in Part 2.
Or, you could draw separate bar plots for each vaccine and collect three bar plots into a single figure using something like, e.g., gridExtra::grid.arrange(). Here are some helpul notes and explanations.
Moderna <- c(93,93,92,NA)
Pfizer <- c(88,91,77,NA)
Janssen <- c(71,NA, NA,68)
Period <- c("Full surveillance period","14-120 days after full vaccination",">120 days after full vaccination",">28 days after full vaccination")
data_ve <- data.frame(Period,Moderna, Pfizer, Janssen)
data_ve2 <- gather(data_ve, provider, ve, Moderna:Janssen, factor_key=TRUE)
lower <- c(91,90,87,NA,85,88,67,NA,56,NA,NA,49)
upper <- c(95,95,96,NA,91,93,84,NA,81,NA,NA,80)
data_ve2$lower <- lower
data_ve2$upper <- upper
data_ve3 <- na.omit(data_ve2)
ggplot(data_ve3, aes(x = Period, y = ve, fill = Period, label = ve)) + facet_grid(~provider, space = "free", scale = "free") + geom_col() + ylab("VE against COVID-19 hospitalization (95% CI)") + theme(axis.text.x=element_blank()) + geom_errorbar(aes(ymin=lower, ymax=upper),width=.1) + geom_text(vjust =7)