0.1 Group Homework

  • You will work with your group to complete this assignment.

  • Upload your html file on RPubs and include the link when you submit your submission files on Collab.

  • Submit your group’s shared .Rmd AND “knitted”.html files on Collab.

  • Note that this html file is now uploaded on RPubs.

0.2 Group Homework

  • Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.

  • Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer”

  • Name the files with a group name and YOUR name for your submission.

  • Each group member must be able to submit this assignment as created from their own computer. If only some members of the group submit the required files, those group members must additionally provide a supplemental explanation along with their submission as to why other students in their group have not completed this assignment.

1 Part 1

1.1 Part 1: Instruction

  • Use the EuStockMarkets data that contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create multiple lines that show changes of each index’s daily closing prices over time.

  • Please use function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on dataformats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html

  • Use function plot_ly from package plotly to create a line plot.

1.2 Part 1: Example

  • see the html file.

1.3 Part 1: Results

library(tidyr) # load tidyr package
library(plotly) # load plotly package
library(foreign)
library(gridExtra)

data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat = gather(dat)
dat$time <- time(EuStockMarkets) # add `time` variable

head(dat)
##   key   value     time
## 1 DAX 1628.75 1991.496
## 2 DAX 1613.63 1991.500
## 3 DAX 1606.51 1991.504
## 4 DAX 1621.04 1991.508
## 5 DAX 1618.16 1991.512
## 6 DAX 1610.61 1991.515
plot_ly(x=dat$time, y=dat$value, color=dat$key, type='scatter', mode='lines',colors=c("red","blue","dark green","purple")) %>%
  layout(xaxis=list(title="Time"), yaxis= list(title="Price"))

2 Part 2

2.1 Part 2: Instruction

  • Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).

  • Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.

  • Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html and DS3003_visualizingerrors_reg_code.html.

library(boot)
scs_data = read.spss('SCS_QE.sav', to.data.frame=TRUE)
## re-encoding from CP1252
## Warning in read.spss("SCS_QE.sav", to.data.frame = TRUE): Undeclared level(s) 0
## added in variable: married
head(scs_data)
##   vocabpre mathpre numbmath likemath likelit preflit pextra pagree pconsc pemot
## 1       24       7        2        6       7       2     16     39     35    29
## 2       26       3        2        2      10       3     22     41     35    29
## 3       17       5        1        3       8       3     31     39     39    29
## 4       23       4        2        8      10       2     22     46     34    33
## 5       23       5        2        2       7       3     29     48     48    40
## 6       28       7        2        2       9       3     28     43     34    36
##   pintell mars beck              rq          vm      cauc         afram other
## 1      42   51    6 quasiexperiment Mathematics Caucasian         Other     0
## 2      30   76    5 quasiexperiment Mathematics Caucasian         Other     0
## 3      37   71    4 quasiexperiment Mathematics     Other Afro-American     0
## 4      32   33    3 quasiexperiment  Vocabulary Caucasian         Other     0
## 5      31   77    0 quasiexperiment  Vocabulary Caucasian         Other     0
## 6      41   44    1 quasiexperiment  Vocabulary Caucasian         Other     0
##   age   male married   parents  momdegr  daddegr credit       majormi  actcomp
## 1  19   male       0  15000.00 14.00000 14.00000     30 non-technical 27.00000
## 2  28 female       0 113471.45 12.00000  6.00000     45 non-technical 21.28776
## 3  19 female       0  89485.82 14.09398 13.59029     30     technical 20.00000
## 4  21 female       0  36000.00 14.00000 12.00000      0 non-technical 19.94736
## 5  34 female       0 176952.91 13.58694 16.00000      0 non-technical 19.58231
## 6  20   male       0  60000.00 14.00000 18.00000      0 non-technical 26.00000
##   hsgpaar collgpaa vocaball mathall
## 1 4.00000    2.730        7      17
## 2 3.50000    3.600       13      10
## 3 2.95000    2.760        7      13
## 4 2.27000    1.640       18       3
## 5 2.60512    3.657       17       4
## 6 2.57000    3.423       20       2
b.stat = function(data,i){
  b.dat = data[i,]
  out.lm = lm(mathpre ~ mars, b.dat)
  predict(out.lm, data.frame(mars=data2$mars))
}

data2 = scs_data[1:100, ] #subset of the first 100 cases
b.out = boot(data2, b.stat, R = 2000)
b.out
## 
## ORDINARY NONPARAMETRIC BOOTSTRAP
## 
## 
## Call:
## boot(data = data2, statistic = b.stat, R = 2000)
## 
## 
## Bootstrap Statistics :
##       original      bias    std. error
## t1*   6.457544 0.008755915   0.2683447
## t2*   5.610226 0.012475856   0.2824500
## t3*   5.779689 0.011731868   0.2617920
## t4*   7.067613 0.006077558   0.3760440
## t5*   5.576333 0.012624653   0.2874160
## t6*   6.694793 0.007714332   0.3025511
## t7*   6.050831 0.010541487   0.2462173
## t8*   5.813582 0.011583070   0.2585881
## t9*   6.864257 0.006970344   0.3336158
## t10*  6.355866 0.009202308   0.2579657
## t11*  6.762579 0.007416737   0.3144213
## t12*  5.678011 0.012178260   0.2733121
## t13*  5.610226 0.012475856   0.2824500
## t14*  6.728686 0.007565534   0.3083863
## t15*  6.525330 0.008458320   0.2768022
## t16*  6.420827 0.008917113   0.2642633
## t17*  6.186402 0.009946296   0.2476242
## t18*  5.271298 0.013963832   0.3417568
## t19*  6.050831 0.010541487   0.2462173
## t20*  6.254188 0.009648701   0.2506548
## t21*  6.965935 0.006523951   0.3542348
## t22*  5.983046 0.010839082   0.2478675
## t23*  4.356194 0.017981368   0.5599513
## t24*  7.067613 0.006077558   0.3760440
## t25*  6.220295 0.009797499   0.2489496
## t26*  5.813582 0.011583070   0.2585881
## t27*  6.627008 0.008011927   0.2915283
## t28*  5.542440 0.012773451   0.2926289
## t29*  5.949153 0.010987879   0.2492722
## t30*  6.796471 0.007267939   0.3206450
## t31*  4.729014 0.016344594   0.4651194
## t32*  6.084724 0.010392689   0.2459797
## t33*  6.694793 0.007714332   0.3025511
## t34*  7.169292 0.005631165   0.3988483
## t35*  6.830364 0.007119141   0.3270466
## t36*  5.339084 0.013666237   0.3283954
## t37*  7.169292 0.005631165   0.3988483
## t38*  5.237405 0.014112629   0.3486624
## t39*  6.491437 0.008607118   0.2724285
## t40*  6.694793 0.007714332   0.3025511
## t41*  5.983046 0.010839082   0.2478675
## t42*  5.745797 0.011880665   0.2653226
## t43*  6.999828 0.006375153   0.3613827
## t44*  6.457544 0.008755915   0.2683447
## t45*  6.999828 0.006375153   0.3613827
## t46*  6.830364 0.007119141   0.3270466
## t47*  5.542440 0.012773451   0.2926289
## t48*  5.255764 0.014032031   0.3449042
## t49*  4.220623 0.018576558   0.5955736
## t50*  4.085052 0.019171749   0.6316413
## t51*  6.457544 0.008755915   0.2683447
## t52*  6.660901 0.007863129   0.2969277
## t53*  6.016938 0.010690284   0.2468476
## t54*  6.593115 0.008160725   0.2863655
## t55*  6.593115 0.008160725   0.2863655
## t56*  4.796800 0.016046999   0.4485319
## t57*  5.711904 0.012029463   0.2691670
## t58*  6.593115 0.008160725   0.2863655
## t59*  6.898150 0.006821546   0.3403429
## t60*  5.881368 0.011285475   0.2532087
## t61*  6.254188 0.009648701   0.2506548
## t62*  7.101506 0.005928760   0.3835435
## t63*  7.270970 0.005184772   0.4224865
## t64*  5.101835 0.014707820   0.3775903
## t65*  6.350217 0.009227108   0.2574762
## t66*  7.033721 0.006226356   0.3686548
## t67*  6.321973 0.009351106   0.2551726
## t68*  5.271298 0.013963832   0.3417568
## t69*  5.203513 0.014261427   0.3557064
## t70*  6.423651 0.008904713   0.2645642
## t71*  7.067613 0.006077558   0.3760440
## t72*  6.559222 0.008309522   0.2814524
## t73*  7.033721 0.006226356   0.3686548
## t74*  7.338755 0.004887177   0.4386423
## t75*  5.169620 0.014410225   0.3628807
## t76*  6.389759 0.009053510   0.2611003
## t77*  6.016938 0.010690284   0.2468476
## t78*  6.525330 0.008458320   0.2768022
## t79*  4.938020 0.015427008   0.4148457
## t80*  6.965935 0.006523951   0.3542348
## t81*  6.084724 0.010392689   0.2459797
## t82*  7.338755 0.004887177   0.4386423
## t83*  6.118617 0.010243891   0.2461357
## t84*  6.762579 0.007416737   0.3144213
## t85*  6.559222 0.008309522   0.2814524
## t86*  7.169292 0.005631165   0.3988483
## t87*  5.785338 0.011707068   0.2612350
## t88*  5.881368 0.011285475   0.2532087
## t89*  5.237405 0.014112629   0.3486624
## t90*  6.999828 0.006375153   0.3613827
## t91*  6.288080 0.009499903   0.2527322
## t92*  5.847475 0.011434272   0.2557231
## t93*  6.355866 0.009202308   0.2579657
## t94*  6.491437 0.008607118   0.2724285
## t95*  4.491765 0.017386177   0.5248654
## t96*  6.932042 0.006672749   0.3472188
## t97*  7.338755 0.004887177   0.4386423
## t98*  6.423651 0.008904713   0.2645642
## t99*  6.084724 0.010392689   0.2459797
## t100* 6.627008 0.008011927   0.2915283
boot.ci(b.out, index=1, type='perc')
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b.out, type = "perc", index = 1)
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 5.951,  6.989 )  
## Calculations and Intervals on Original Scale
b.ci = t(sapply(1:nrow(data2), function(x) boot.ci(b.out, index=x, type='perc')$percent))[,4:5]
dimnames(b.ci) = list(rownames(data2), c('lower','upper'))
kable(head(b.ci,4))
lower upper
5.950943 6.988670
5.077370 6.188557
5.291592 6.312136
6.300778 7.787897
data4 = cbind(data2, b.ci)
ggplot(data4, aes(x=mars, y=mathpre)) + geom_point(alpha=0.2) + labs(x= 'Math Anxiety Rating Scale', y= 'Math Pretest Score', title = "Math Anxiety vs. Score") + theme_bw()  + geom_ribbon(aes(ymin = lower, ymax = upper), alpha=0.3, fill="#69b3a2") + geom_smooth(method='lm', formula= y~x, se=FALSE)

3 Part 3

3.1 Part 3: Instruction

  • Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.

  • First, get the latest data from from https://covid19.who.int/table.

    • The file should likely be named “WHO COVID-19 global table data March XXth 2022 at XXXXX.csv”

    • Don’t use the data that I uploaded on Collab. It’s not the most recent data.

3.2 Part 3: Instruction (Cont’d)

  • Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).

  • Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.

db <- read_csv("WHO-COVID-19-global-table-data.csv")
## Warning: One or more parsing issues, see `problems()` for details
## Rows: 238 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): Name, WHO Region
## dbl (9): Cases - cumulative total, Cases - cumulative total per 100000 popul...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
regionMortalityRate = 
  select(db, "Name", "WHO Region", "Cases - cumulative total", "Deaths - cumulative total") %>%
  rename("name"="Name",
         "who_region" = "WHO Region",
         "cases_cumulative" = "Cases - cumulative total",
         "deaths_cumulative" = "Deaths - cumulative total") %>%
  mutate(mortality_rate = deaths_cumulative/cases_cumulative) %>%
  mutate(se = sqrt(mortality_rate*(1-mortality_rate)/cases_cumulative))

head(regionMortalityRate)
## # A tibble: 6 × 6
##   name       who_region cases_cumulative deaths_cumulati… mortality_rate      se
##   <chr>      <chr>                 <dbl>            <dbl>          <dbl>   <dbl>
## 1 Global     <NA>              199466211          4244541         0.0213 1.02e-5
## 2 United St… Americas           35010407           609022         0.0174 2.21e-5
## 3 India      South-Eas…         31769132           425757         0.0134 2.04e-5
## 4 Brazil     Americas           19953501           557223         0.0279 3.69e-5
## 5 Russian F… Europe              6356784           161715         0.0254 6.25e-5
## 6 France     Europe              6039483           110921         0.0184 5.46e-5
print(as.list(regionMortalityRate[1,])[3])
## $cases_cumulative
## [1] 199466211
filter(regionMortalityRate, who_region=="Americas")
## # A tibble: 56 × 6
##    name      who_region cases_cumulative deaths_cumulati… mortality_rate      se
##    <chr>     <chr>                 <dbl>            <dbl>          <dbl>   <dbl>
##  1 United S… Americas           35010407           609022         0.0174 2.21e-5
##  2 Brazil    Americas           19953501           557223         0.0279 3.69e-5
##  3 Argentina Americas            4947030           106045         0.0214 6.51e-5
##  4 Colombia  Americas            4801050           121216         0.0252 7.16e-5
##  5 Mexico    Americas            2861498           241279         0.0843 1.64e-4
##  6 Peru      Americas            2114445           196518         0.0929 2.00e-4
##  7 Chile     Americas            1618457            35640         0.0220 1.15e-4
##  8 Canada    Americas            1430483            26592         0.0186 1.13e-4
##  9 Ecuador   Americas             487702            31644         0.0649 3.53e-4
## 10 Bolivia … Americas             474538            17859         0.0376 2.76e-4
## # … with 46 more rows
regions3mrate = filter(regionMortalityRate, name=="Zambia"|name=="South Africa"|name=="Kenya"|name=="Argentina"|name=="Peru"|name=="Mexico" |name=="Jordan"|name=="Iraq"|name=="Pakistan" | name=="France"|name=="Italy"|name=="Germany"|name=="Japan"|name=="Australia"|name=="Singapore"|name=="India"|name=="Nepal"|name=="Maldives")
head(regions3mrate)
## # A tibble: 6 × 6
##   name      who_region  cases_cumulative deaths_cumulati… mortality_rate      se
##   <chr>     <chr>                  <dbl>            <dbl>          <dbl>   <dbl>
## 1 India     South-East…         31769132           425757         0.0134 2.04e-5
## 2 France    Europe               6039483           110921         0.0184 5.46e-5
## 3 Argentina Americas             4947030           106045         0.0214 6.51e-5
## 4 Italy     Europe               4363374           128115         0.0294 8.08e-5
## 5 Germany   Europe               3777446            91704         0.0243 7.92e-5
## 6 Mexico    Americas             2861498           241279         0.0843 1.64e-4
mortality_se = se = sd(regions3mrate$mortality_rate) / sqrt(length(regions3mrate$mortality_rate))
print(mortality_se)
## [1] 0.005738795
ggplot(regions3mrate) +
    geom_bar( aes(x=name, y=mortality_rate), stat="identity", fill="skyblue", alpha=0.7) + facet_grid(~who_region, scale="free") + geom_errorbar( aes(x=name, ymin=mortality_rate-se, ymax=mortality_rate+se), width=0.4, colour="orange", alpha=0.9, size=1.3)

  • One of the group members will present R codes and plots for Part 3 in class on Mar. 21 (Mon). Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.