Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.
Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer”
Name the files with a group name and YOUR name for your submission.
Each group member must be able to submit this assignment as created from their own computer. If only some members of the group submit the required files, those group members must additionally provide a supplemental explanation along with their submission as to why other students in their group have not completed this assignment.
Use the EuStockMarkets data that contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create multiple lines that show changes of each index’s daily closing prices over time.
Please use function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on dataformats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html
Use function plot_ly from package plotly to create a line plot.
html file.library(tidyr) # load tidyr package
library(plotly) # load plotly package
library(ggplot2)
library(gridExtra)
library(foreign)
data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat = gather(dat)
dat$time <- time(EuStockMarkets) # add `time` variable
# add your codes
head(dat)
## key value time
## 1 DAX 1628.75 1991.496
## 2 DAX 1613.63 1991.500
## 3 DAX 1606.51 1991.504
## 4 DAX 1621.04 1991.508
## 5 DAX 1618.16 1991.512
## 6 DAX 1610.61 1991.515
plot_ly(x = dat$time, y = dat$value, color = dat$key, type = 'scatter', mode = 'lines', colors = c("red", "blue", "dark green", "purple")) %>% layout(xaxis = list(title = "Time"), yaxis = list(title = "Price"))
Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).
Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.
Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html and DS3003_visualizingerrors_reg_code.html.
# add your codes
library(boot)
## Warning: package 'boot' was built under R version 4.0.5
scs_data <- read.spss('SCS_QE.sav', to.data.frame = TRUE)
## re-encoding from CP1252
## Warning in read.spss("SCS_QE.sav", to.data.frame = TRUE): Undeclared level(s) 0
## added in variable: married
b.stat <- function(data, i)
{
b.dat <- data[i ,]
out.lm <- lm(mathpre ~ mars, b.dat)
predict(out.lm, data.frame(mars=data2$mars))
}
data2 <- scs_data[1:100,] # subset of the first 100 cases
b.out <- boot(data2, b.stat, R = 2000) # R = num of replications
boot.ci(b.out, index = 1, type = 'perc') # 95% CI for the first observation
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
##
## CALL :
## boot.ci(boot.out = b.out, type = "perc", index = 1)
##
## Intervals :
## Level Percentile
## 95% ( 5.930, 6.946 )
## Calculations and Intervals on Original Scale
b.ci <- t(sapply(1:nrow(data2), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(data2), c('lower', 'upper'))
kable(head(b.ci, 4))
| lower | upper |
|---|---|
| 5.930232 | 6.946117 |
| 5.073862 | 6.184344 |
| 5.286621 | 6.299838 |
| 6.330801 | 7.748444 |
data4 <- cbind(data2, b.ci) # combine two datasets
ggplot(data4, aes(x=mars, y=mathpre)) + geom_point(alpha=0.2) + labs(x = 'Math Anxiety Rating Scale', y = 'Math Pretest Score', title = "Math Anxiety vs. Score") + theme_bw() +
geom_smooth(method='lm', formula= y~x, se = FALSE) +
geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="#69b3a2")
Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.
First, get the latest data from from https://covid19.who.int/table.
The file should likely be named “WHO COVID-19 global table data March XXth 2022 at XXXXX.csv”
Don’t use the data that I uploaded on Collab. It’s not the most recent data.
Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).
Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.
library(tidyverse)
library(gridExtra)
# add your codes
db <- read_csv("WHO-COVID-19-global-table-data.csv")
View(db)
regionMortalityRate =
select(db, "Name", "WHO Region", "Cases - cumulative total", "Deaths - cumulative total") %>%
rename("name"="Name",
"who_region" = "WHO Region",
"cases_cumulative" = "Cases - cumulative total",
"deaths_cumulative" = "Deaths - cumulative total") %>%
mutate(mortality_rate = deaths_cumulative/cases_cumulative) %>%
mutate(se = sqrt(mortality_rate*(1-mortality_rate)/cases_cumulative))
head(regionMortalityRate)
## # A tibble: 6 x 6
## name who_region cases_cumulative deaths_cumulati~ mortality_rate se
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Global <NA> 464809377 6062536 0.0130 5.26e-6
## 2 United St~ Americas 78932322 960935 0.0122 1.23e-5
## 3 India South-Eas~ 43004005 516281 0.0120 1.66e-5
## 4 Brazil Americas 29478039 655940 0.0223 2.72e-5
## 5 France Europe 23145857 137578 0.00594 1.60e-5
## 6 The Unite~ Europe 20001631 163386 0.00817 2.01e-5
print(as.list(regionMortalityRate[1,])[3])
## $cases_cumulative
## [1] 464809377
filter(regionMortalityRate, who_region=="Americas")
## # A tibble: 56 x 6
## name who_region cases_cumulative deaths_cumulati~ mortality_rate se
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 United S~ Americas 78932322 960935 0.0122 1.23e-5
## 2 Brazil Americas 29478039 655940 0.0223 2.72e-5
## 3 Argentina Americas 8990413 127363 0.0142 3.94e-5
## 4 Colombia Americas 6078487 139361 0.0229 6.07e-5
## 5 Mexico Americas 5619780 321619 0.0572 9.80e-5
## 6 Peru Americas 3538453 211661 0.0598 1.26e-4
## 7 Canada Americas 3379200 37020 0.0110 5.66e-5
## 8 Chile Americas 3353259 44246 0.0132 6.23e-5
## 9 Cuba Americas 1079111 8503 0.00788 8.51e-5
## 10 Bolivia ~ Americas 898941 21477 0.0239 1.61e-4
## # ... with 46 more rows
regions3mrate = filter(regionMortalityRate, name=="Zambia"|name=="South Africa"|name=="Kenya"|name=="Argentina"|name=="Peru"|name=="Mexico" |name=="Jordan"|name=="Iraq"|name=="Pakistan" | name=="France"|name=="Italy"|name=="Germany"|name=="Japan"|name=="Australia"|name=="Singapore"|name=="India"|name=="Nepal"|name=="Maldives")
head(regions3mrate)
## # A tibble: 6 x 6
## name who_region cases_cumulative deaths_cumulati~ mortality_rate se
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 India South-East~ 43004005 516281 0.0120 1.66e-5
## 2 France Europe 23145857 137578 0.00594 1.60e-5
## 3 Germany Europe 18287986 126646 0.00693 1.94e-5
## 4 Italy Europe 13645834 157442 0.0115 2.89e-5
## 5 Argentina Americas 8990413 127363 0.0142 3.94e-5
## 6 Japan Western Pa~ 5966960 26764 0.00449 2.74e-5
mortality_se = se = sd(regions3mrate$mortality_rate) / sqrt(length(regions3mrate$mortality_rate))
print(mortality_se)
## [1] 0.003983674
ggplot(regions3mrate) +
geom_bar( aes(x=name, y=mortality_rate), stat="identity", fill="skyblue", alpha=0.7) + facet_grid(~who_region, scale="free") + geom_errorbar( aes(x=name, ymin=mortality_rate-se, ymax=mortality_rate+se), width=0.4, colour="orange", alpha=0.9, size=1.3)