In this assignment, we are going to look at data on COVID-19 cases recorded in China outside Hubei province and in a few other countries. The data set contains information on each patient's country of origin, gender, and age, among other variables, as you will see. The data were obtained from a public database reported in The Lancet. You will:
Marking rubric: The marking scheme for this assignment is displayed in the script below and will be used to mark your individual assignments. In addition, it is essential that the report you submit can be knitted into an HTML report with the R code chunks set to eval = TRUE (otherwise you will receive 0 marks, regardless of whether the individual R code chunks run). In this assignment you simply need to fill in the gaps marked with ---. The R code and the R code output must be visible in the knitted report. For this assignment, you will need to upload the following into Moodle:
library(tidyverse)
library(readr)
library(kableExtra)
library(ggplot2)
library(lubridate)
library(gridExtra)
library(rmdformats)
# Reading data (you do not need to modify this file)
dat <- read_csv("Data/COVID19_2020_outsideHubei_23March.csv")
dat <- dat %>%
mutate(sex = ifelse(sex == "female", "Female", sex),
sex = ifelse(sex == "male", "Male", sex))
There are different ways to find out the dimension (number of rows and columns) of a data set and below are a few options that you can use:
dim(---) # 1pt
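As a minimal sketch of the options mentioned above, the following uses a small made-up data frame called `toy` (hypothetical, not the assignment data) to show three base R functions that report the size of a data set.

```r
# Toy data frame for illustration only (not the COVID-19 data)
toy <- data.frame(
  id    = 1:4,
  score = c(34, 51, 27, 60),
  group = c("A", "B", "B", "A")
)

dims   <- dim(toy)   # integer vector: c(number of rows, number of columns)
n_rows <- nrow(toy)  # number of rows only
n_cols <- ncol(toy)  # number of columns only

print(dims)
print(n_rows)
print(n_cols)
```

In an R Markdown document, any of these expressions can be placed in the text as inline code of the form `r nrow(toy)`, so the knitted sentence shows the computed value rather than a hard-coded number.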
Using inline R code, complete the sentence where you report the number of rows and columns in the data set.
The data set has --- (1pt) rows and --- (1pt) variables.
Ensure that the table is captioned "These are the variables included in the COVID-19 data set". Make sure that you choose 2 kable_styling() options.
names(---) %>% # 1pt
---() %>% # 1pt
kable_styling(bootstrap_options = c(---, ---)) # 1pt
Create a new data set that contains only the following variables: country, age, sex, city, province, latitude, longitude, and display the first 5 rows.
dat2 <- dat %>%
dplyr::---(country, # 1pt
age,
sex,
city,
province,
latitude,
longitude)
head(---, ---) # 1pt
Inspect the data set you created in Question 4 and describe, in a list (using markdown syntax), the type of each variable (character, numeric, factor, etc.) in the data set. Print the names of the variables in bold text.
Make sure the variables latitude, longitude, and age are defined as numeric variables in your data set dat2. Do not create a new data set but instead modify dat2 to accommodate the changes. Display the first 3 rows of the data set dat2.
dat2 <- dat2 %>%
---(latitude = ---(latitude), # 1pt
longitude = ---(longitude), # 1pt
age = ---(age)) # 1pt
head(---, ---) # 1pt
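When a column stored as text is converted to numeric, any entry that is not a plain number becomes NA. The sketch below illustrates this on a made-up data frame; the column names `height` and `weight` are hypothetical and chosen only for the example.

```r
library(dplyr)

# Toy data: numbers stored as text, including one range and one missing value
toy <- data.frame(
  height = c("170", "182.5", "160-165", NA),
  weight = c("65", "80", "58", "72"),
  stringsAsFactors = FALSE
)

# Overwrite the columns in place rather than creating a new object name.
# suppressWarnings() hides the "NAs introduced by coercion" warning that
# as.numeric() raises for the entry "160-165".
toy <- toy %>%
  mutate(height = suppressWarnings(as.numeric(height)),
         weight = as.numeric(weight))

str(toy)
```

Checking how many NA values appear after the conversion is a quick way to see how many entries were not plain numbers to begin with.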
Remove the cases for which we do not have information on the patient's age, and keep only those for which the patient's gender is known. Name this newly created data set dat3.
dat3 <- dat2 %>% dplyr::filter(!is.na(---), # 1pt
sex %in% c(---, # 1pt
---) # 1pt
)
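The filtering pattern used here combines a missing-value test with a membership test. The sketch below shows the same pattern on a made-up data frame; the columns `score` and `group` and the labels "A" and "B" are hypothetical.

```r
library(dplyr)

# Toy data with a missing score, a missing group, and an invalid group label
toy <- data.frame(
  score = c(34, NA, 27, 60),
  group = c("A", "B", NA, "unknown"),
  stringsAsFactors = FALSE
)

# Conditions separated by commas inside filter() are combined with AND.
# Note that NA %in% c("A", "B") evaluates to FALSE, so rows with a missing
# group are dropped without needing a separate is.na() check.
kept <- toy %>%
  filter(!is.na(score),
         group %in% c("A", "B"))

print(kept)
```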
dat4 <- dat3 %>%
dplyr::---(age >= 1) # 1pt
Using inline R, write a sentence describing the age of the oldest patient in this data set.
dat4 %>% dplyr::---(---) %>% # 1pt
summary() %>% # Nothing to add here
---() %>% # 1pt
---(bootstrap_options = c("striped", "hover")) # 1pt
The oldest patient in this data set is --- (2pts) years old.
Count the number of cases per country, arrange the countries in decreasing order of cases, and display a table using kable() of the top 5 countries. Store the results into an object called dat5.
# Please set eval = TRUE above when you have completed Questions 1-9
dat5 <- dat4 %>%
dplyr::select(country) %>%
dplyr::filter(!is.na(country)) %>%
group_by(country) %>%
mutate(n = n()) %>%
unique() %>%
arrange(-n)
kable(dat5[1:5,])
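The pipeline above counts cases by adding a per-group count with mutate() and then dropping duplicate rows. As an alternative sketch, dplyr's count() produces the same one-row-per-country summary in a single step. The data frame `toy` below is made up for illustration, and unlike the pipeline above, count() returns an ungrouped result.

```r
library(dplyr)

# Toy data: one row per case, with one missing country
toy <- data.frame(
  country = c("China", "Japan", "China", NA, "Italy", "China", "Japan"),
  stringsAsFactors = FALSE
)

# count() groups by country, tallies the rows into a column called n,
# and with sort = TRUE orders the result from most to fewest cases.
counts <- toy %>%
  filter(!is.na(country)) %>%
  count(country, sort = TRUE)

# First two rows of the sorted table
top2 <- head(counts, 2)
print(top2)
```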
Use geom_point to plot the top 5 countries with the most cases using dat5. Store the plot in a variable called p1, and make sure the plot is also displayed in this section.
p1 = ---(dat5[1:5,], aes(x = country, y = n)) + # 1pt
---() + # 1pt
theme_bw() + # Nothing to add here
theme(axis.text.x = element_text(angle = 90)) # Nothing to add here
p1
Now repeat the same plot, but without the theme_bw() command.
# No new code here so no new points assigned
p2 = ---(dat5[1:5,], aes(x = country, y = n)) +
---() +
theme(axis.text.x = element_text(angle = 90))
p2
Plot figures p1 and p2 in the same plot using grid.arrange() from the gridExtra R package.
---(---, ---) # 3pts
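For reference, here is a minimal sketch of how grid.arrange() lays out several ggplot objects on one page. The data frame `toy` and the plot names `plot_a` and `plot_b` are hypothetical. The `ncol` argument, which controls the number of columns in the layout, is optional.

```r
library(ggplot2)
library(gridExtra)

# Toy data for two simple plots
toy <- data.frame(x = c("a", "b", "c"), y = c(3, 1, 2))

plot_a <- ggplot(toy, aes(x = x, y = y)) + geom_point() + theme_bw()
plot_b <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# Draw the two plots side by side. grid.arrange() draws on the current
# graphics device and invisibly returns the combined layout object.
combined <- grid.arrange(plot_a, plot_b, ncol = 2)
```

Omitting `ncol` lets grid.arrange() choose the layout itself. Setting `nrow` or `ncol` explicitly is useful when a particular arrangement is needed for comparison.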