Group Homework

Group Homework

Part 1

Part 1: Instruction

  • Use the EuStockMarkets data that contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create multiple lines that show changes of each index’s daily closing prices over time.

  • Please use function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on dataformats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html

  • Use function plot_ly from package plotly to create a line plot.

Part 1: Example

  • see the html file.

Part 1: Results

library(tidyr) # load tidyr package
library(plotly) # load plotly package
library(ggplot2)
library(gridExtra)
library(foreign)


data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat = gather(dat)
dat$time <- time(EuStockMarkets) # add `time` variable

# add your codes
head(dat)
##   key   value     time
## 1 DAX 1628.75 1991.496
## 2 DAX 1613.63 1991.500
## 3 DAX 1606.51 1991.504
## 4 DAX 1621.04 1991.508
## 5 DAX 1618.16 1991.512
## 6 DAX 1610.61 1991.515
plot_ly(x = dat$time, y = dat$value, color = dat$key, type = 'scatter', mode = 'lines', colors = c("red", "blue", "dark green", "purple")) %>% layout(xaxis = list(title = "Time"), yaxis = list(title = "Price"))

Part 2

Part 2: Instruction

  • Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).

  • Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.

  • Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html and DS3003_visualizingerrors_reg_code.html.

# add your codes
library(boot)
## Warning: package 'boot' was built under R version 4.0.5
scs_data <- read.spss('SCS_QE.sav', to.data.frame = TRUE)
## re-encoding from CP1252
## Warning in read.spss("SCS_QE.sav", to.data.frame = TRUE): Undeclared level(s) 0
## added in variable: married
b.stat <- function(data, i)
{
   b.dat <- data[i ,]
   out.lm <- lm(mathpre ~ mars, b.dat)
   predict(out.lm, data.frame(mars=data2$mars))   
}

data2 <- scs_data[1:100,] # subset of the first 100 cases
b.out <- boot(data2, b.stat, R = 2000) # R = num of replications

boot.ci(b.out, index = 1, type = 'perc') # 95% CI for the first observation
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b.out, type = "perc", index = 1)
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 5.930,  6.946 )  
## Calculations and Intervals on Original Scale
b.ci <- t(sapply(1:nrow(data2), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(data2), c('lower', 'upper'))
kable(head(b.ci, 4))
lower upper
5.930232 6.946117
5.073862 6.184344
5.286621 6.299838
6.330801 7.748444
data4 <- cbind(data2, b.ci) # combine two datasets
ggplot(data4, aes(x=mars, y=mathpre)) + geom_point(alpha=0.2) + labs(x = 'Math Anxiety Rating Scale', y = 'Math Pretest Score', title = "Math Anxiety vs. Score") + theme_bw() + 
        geom_smooth(method='lm', formula= y~x, se = FALSE) +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="#69b3a2")

Part 3

Part 3: Instruction

  • Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.

  • First, get the latest data from from https://covid19.who.int/table.

    • The file should likely be named “WHO COVID-19 global table data March XXth 2022 at XXXXX.csv”

    • Don’t use the data that I uploaded on Collab. It’s not the most recent data.

Part 3: Instruction (Cont’d)

  • Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).

  • Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.

library(tidyverse)
library(gridExtra)

# add your codes
db <- read_csv("WHO-COVID-19-global-table-data.csv")
View(db)
regionMortalityRate = 
  select(db, "Name", "WHO Region", "Cases - cumulative total", "Deaths - cumulative total") %>%
  rename("name"="Name",
         "who_region" = "WHO Region",
         "cases_cumulative" = "Cases - cumulative total",
         "deaths_cumulative" = "Deaths - cumulative total") %>%
  mutate(mortality_rate = deaths_cumulative/cases_cumulative) %>%
  mutate(se = sqrt(mortality_rate*(1-mortality_rate)/cases_cumulative))

head(regionMortalityRate)
## # A tibble: 6 x 6
##   name       who_region cases_cumulative deaths_cumulati~ mortality_rate      se
##   <chr>      <chr>                 <dbl>            <dbl>          <dbl>   <dbl>
## 1 Global     <NA>              464809377          6062536        0.0130  5.26e-6
## 2 United St~ Americas           78932322           960935        0.0122  1.23e-5
## 3 India      South-Eas~         43004005           516281        0.0120  1.66e-5
## 4 Brazil     Americas           29478039           655940        0.0223  2.72e-5
## 5 France     Europe             23145857           137578        0.00594 1.60e-5
## 6 The Unite~ Europe             20001631           163386        0.00817 2.01e-5
print(as.list(regionMortalityRate[1,])[3])
## $cases_cumulative
## [1] 464809377
filter(regionMortalityRate, who_region=="Americas")
## # A tibble: 56 x 6
##    name      who_region cases_cumulative deaths_cumulati~ mortality_rate      se
##    <chr>     <chr>                 <dbl>            <dbl>          <dbl>   <dbl>
##  1 United S~ Americas           78932322           960935        0.0122  1.23e-5
##  2 Brazil    Americas           29478039           655940        0.0223  2.72e-5
##  3 Argentina Americas            8990413           127363        0.0142  3.94e-5
##  4 Colombia  Americas            6078487           139361        0.0229  6.07e-5
##  5 Mexico    Americas            5619780           321619        0.0572  9.80e-5
##  6 Peru      Americas            3538453           211661        0.0598  1.26e-4
##  7 Canada    Americas            3379200            37020        0.0110  5.66e-5
##  8 Chile     Americas            3353259            44246        0.0132  6.23e-5
##  9 Cuba      Americas            1079111             8503        0.00788 8.51e-5
## 10 Bolivia ~ Americas             898941            21477        0.0239  1.61e-4
## # ... with 46 more rows
regions3mrate = filter(regionMortalityRate, name=="Zambia"|name=="South Africa"|name=="Kenya"|name=="Argentina"|name=="Peru"|name=="Mexico" |name=="Jordan"|name=="Iraq"|name=="Pakistan" | name=="France"|name=="Italy"|name=="Germany"|name=="Japan"|name=="Australia"|name=="Singapore"|name=="India"|name=="Nepal"|name=="Maldives")
head(regions3mrate)
## # A tibble: 6 x 6
##   name      who_region  cases_cumulative deaths_cumulati~ mortality_rate      se
##   <chr>     <chr>                  <dbl>            <dbl>          <dbl>   <dbl>
## 1 India     South-East~         43004005           516281        0.0120  1.66e-5
## 2 France    Europe              23145857           137578        0.00594 1.60e-5
## 3 Germany   Europe              18287986           126646        0.00693 1.94e-5
## 4 Italy     Europe              13645834           157442        0.0115  2.89e-5
## 5 Argentina Americas             8990413           127363        0.0142  3.94e-5
## 6 Japan     Western Pa~          5966960            26764        0.00449 2.74e-5
mortality_se = se = sd(regions3mrate$mortality_rate) / sqrt(length(regions3mrate$mortality_rate))
print(mortality_se)
## [1] 0.003983674
ggplot(regions3mrate) +
    geom_bar( aes(x=name, y=mortality_rate), stat="identity", fill="skyblue", alpha=0.7) + facet_grid(~who_region, scale="free") + geom_errorbar( aes(x=name, ymin=mortality_rate-se, ymax=mortality_rate+se), width=0.4, colour="orange", alpha=0.9, size=1.3)

  • One of the group members will present R codes and plots for Part 3 in class on Mar. 21 (Mon). Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.