Group Homework

Group Homework

Part 1

Part 1: Instruction

  • Use the EuStockMarkets data that contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create multiple lines that show changes of each index’s daily closing prices over time.

  • Please use function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on dataformats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html

  • Use function plot_ly from package plotly to create a line plot.

Part 1: Example

  • see the html file.

Part 1: Results

library(tidyr) # load tidyr package
library(plotly) # load plotly package
## Warning: package 'plotly' was built under R version 4.1.3
data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat$time <- time(EuStockMarkets) # add `time` variable


lstocks <- gather(dat, key = "Country", value = "stock_price", DAX, SMI, CAC, FTSE)

plot_ly(lstocks ,x = ~time, y = ~stock_price, type = 'scatter', mode = 'line', color = ~Country)  %>% layout(title = 'Stock prices over time grouped by country index', xaxis = list(title = 'Time (years)'), yaxis = list(title = 'Price (dollars)') )
# add your codes

#lstocks

Part 2

Part 2: Instruction

  • Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).

  • Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.

  • Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html and DS3003_visualizingerrors_reg_code.html.

library(foreign)

edu_df <- read.spss('SCS_QE.sav', to.data.frame = TRUE)
## re-encoding from CP1252
## Warning in read.spss("SCS_QE.sav", to.data.frame = TRUE): Undeclared level(s) 0
## added in variable: married
library("ggplot2")

ggplot(data = edu_df, aes(x=mars, y=mathpre)) + geom_point() + geom_jitter(width = .5, height = .5)+
    labs(title = "Scatter plot of Math Anxiety vs Math Achievement",
         x = "Math Anxiety",
         y = "Math Achievement") + geom_smooth(method='lm', formula= y~x, col="red", se = FALSE) + theme_bw()

library(boot)
b.stat <- function(data, i)
{
   b.dat <- data[i ,]
   out.lm <- lm(mathpre ~ mars, b.dat)
   predict(out.lm, data.frame(mars=edu_df$mars))   
}

b.out <- boot(edu_df, b.stat, R = 2000) # R = num of replications

boot.ci(b.out, index = 1, type = 'perc')
## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b.out, type = "perc", index = 1)
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 6.158,  6.936 )  
## Calculations and Intervals on Original Scale
b.ci <- t(sapply(1:nrow(edu_df), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(edu_df), c('lower', 'upper'))
kable(head(b.ci, 4))
lower upper
6.158172 6.936304
5.240956 6.035088
5.459879 6.178394
6.666771 7.762991
bs_edu_df <- cbind(edu_df, b.ci) # combine two datasets
ggplot(bs_edu_df, aes(x=mars, y= mathpre)) + geom_point() + labs(title = "Scatter plot of Math Anxiety vs Math Achievement with confidence interval",
         x = "Math Anxiety",
         y = "Math Achievement") + theme_bw() + 
        geom_smooth(method='lm', formula= y~x, se = FALSE, color = 'red') +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="blue")

Part 3

Part 3: Instruction

  • Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.

  • First, get the latest data from from https://covid19.who.int/table.

    • The file should likely be named “WHO COVID-19 global table data March XXth 2022 at XXXXX.csv”

    • Don’t use the data that I uploaded on Collab. It’s not the most recent data.

Part 3: Instruction (Cont’d)

  • Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).

  • Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.

covids<- read.csv('WHO-COVID-19-global-table-data.csv')
# bar graphs
covids <- cbind(Country = rownames(covids), covids)
rownames(covids) <- 1:nrow(covids)
names(covids)[2] <- 'Region'
names(covids)[9] <- 'Cumulative_deaths'
names(covids)[4] <- 'Cumulative_cases'
covid_df_region <- covids #[ which( covids$Region %in% c("Americas", "South-East Asia", "Europe") ),]
focus <- c('Brazil', 'Mexico', 'Colombia', 
           'Finland', 'Norway','Italy',
           'Indonesia', 'India', 'Thailand',
           'Republic of Korea', 'Japan', 'Viet Nam',
           'Qatar', 'Iraq', 'Morocco',
           'Zambia','Algeria','Nigeria'
           )

covid_df_region %>% filter(Country %in% focus) %>% mutate(rate = `Cumulative_deaths`/`Cumulative_cases`) %>% 
ggplot(aes(x=rate, y=Region)) + geom_col() + theme_bw() + xlab('WHO: # Reported Deaths / # Reported Cases')

covid_region_SE <- covid_df_region %>% filter(Country %in% focus) %>%
  mutate(rate = `Cumulative_deaths`/`Cumulative_cases`, 
         SE = sqrt(rate*(1-rate)/`Cumulative_cases`))
ggplot(covid_region_SE, aes(x=Country, y=rate)) + geom_col() + theme(axis.text.x=element_text(angle=90, hjust=1)) + 
  geom_errorbar(aes(ymin=rate-1.96*SE, ymax=rate+1.96*SE), width=.2) + facet_grid(~ Region, scale="free") + labs(x = 'WHO: # Reported Deaths / # Reported Cases', y = 'Mortality rate', title="Mortality rate of countries grouped by region")

  • One of the group members will present R codes and plots for Part 3 in class on Mar. 21 (Mon). Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.