Group Homework

You will work with your group to complete this assignment.
Upload your html file on RPubs and include the link when you submit your submission files on Collab.
Submit your group’s shared .Rmd AND “knitted”.html files on Collab.
Note that this html file is now uploaded on RPubs.

Group Homework

Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.
Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using using the tools I have installed on my own computer”
Name the files with a group name and YOUR name for your submission.
Each group member must be able to submit this assignment as created from their own computer. If only some members of the group submit the required files, those group members must additionally provide a supplemental explanation along with their submission as to why other students in their group have not completed this assignment.

Part 1

Part 1: Instruction

Use the EuStockMarkets data that contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create multiple lines that show changes of each index’s daily closing prices over time.
Please use function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on dataformats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html
Use function plot_ly from package plotly to create a line plot.

Part 1: Example

see the html file.

Part 1: Results

library(tidyr) # load tidyr package
library(plotly) # load plotly package

## Warning: package 'plotly' was built under R version 4.1.3

data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat$time <- time(EuStockMarkets) # add `time` variable


lstocks <- gather(dat, key = "Country", value = "stock_price", DAX, SMI, CAC, FTSE)

plot_ly(lstocks ,x = ~time, y = ~stock_price, type = 'scatter', mode = 'line', color = ~Country)  %>% layout(title = 'Stock prices over time grouped by country index', xaxis = list(title = 'Time (years)'), yaxis = list(title = 'Price (dollars)') )

# add your codes

#lstocks

Part 2

Part 2: Instruction

Use the SCS Data set you downloaded from the previous group assignments, and then investigate the relationship between the mathematics achievement score (“mathpre”) and the math anxiety score (“mars”).
Plot the data, linear line, and bootstrap confidence envelopes. Use 2,000 bootstrap replicates (i.e., R=2000) in function boot, and add appropriate x- and y- labels, and a title to the graph.
Please refer to section: Linear regression with bootstrap confidence intervals in DS3003_visualizingerrors_reg_note.html and DS3003_visualizingerrors_reg_code.html.

library(foreign)

edu_df <- read.spss('SCS_QE.sav', to.data.frame = TRUE)

## re-encoding from CP1252

## Warning in read.spss("SCS_QE.sav", to.data.frame = TRUE): Undeclared level(s) 0
## added in variable: married

library("ggplot2")

ggplot(data = edu_df, aes(x=mars, y=mathpre)) + geom_point() + geom_jitter(width = .5, height = .5)+
    labs(title = "Scatter plot of Math Anxiety vs Math Achievement",
         x = "Math Anxiety",
         y = "Math Achievement") + geom_smooth(method='lm', formula= y~x, col="red", se = FALSE) + theme_bw()

library(boot)
b.stat <- function(data, i)
{
   b.dat <- data[i ,]
   out.lm <- lm(mathpre ~ mars, b.dat)
   predict(out.lm, data.frame(mars=edu_df$mars))   
}

b.out <- boot(edu_df, b.stat, R = 2000) # R = num of replications

boot.ci(b.out, index = 1, type = 'perc')

## BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
## Based on 2000 bootstrap replicates
## 
## CALL : 
## boot.ci(boot.out = b.out, type = "perc", index = 1)
## 
## Intervals : 
## Level     Percentile     
## 95%   ( 6.158,  6.936 )  
## Calculations and Intervals on Original Scale

b.ci <- t(sapply(1:nrow(edu_df), function(x) boot.ci(b.out, index = x, type = 'perc')$percent))[, 4:5]
dimnames(b.ci) <- list(rownames(edu_df), c('lower', 'upper'))
kable(head(b.ci, 4))

lower	upper
6.158172	6.936304
5.240956	6.035088
5.459879	6.178394
6.666771	7.762991

bs_edu_df <- cbind(edu_df, b.ci) # combine two datasets
ggplot(bs_edu_df, aes(x=mars, y= mathpre)) + geom_point() + labs(title = "Scatter plot of Math Anxiety vs Math Achievement with confidence interval",
         x = "Math Anxiety",
         y = "Math Achievement") + theme_bw() + 
        geom_smooth(method='lm', formula= y~x, se = FALSE, color = 'red') +
        geom_ribbon(aes(ymin = lower, ymax = upper), alpha = 0.3, fill="blue")

Part 3

Part 3: Instruction

Create WHO Reporting Barplots with error bars separated by WHO region using either facet_grid or facet_wrap.
First, get the latest data from from https://covid19.who.int/table.
- The file should likely be named “WHO COVID-19 global table data March XXth 2022 at XXXXX.csv”
- Don’t use the data that I uploaded on Collab. It’s not the most recent data.

Part 3: Instruction (Cont’d)

Second, create a subset including 3 countries per WHO region (Africa, Americas, Eastern Mediterranean, Europe, South-East Asia, Western Pacific). You can choose any three countries within each WHO region to compare the mortality rate (mutate(rate = "Deaths - cumulative total"/"Cases - cumulative total")).
Third, draw bar plots with error bars using your subset, but adjust the graph in the facets using either facet_grid or facet_wrap (e.g., facet_grid(~ "WHO region", scale="free"). Please include scale="free" in your facet function.

covids<- read.csv('WHO-COVID-19-global-table-data.csv')
# bar graphs
covids <- cbind(Country = rownames(covids), covids)
rownames(covids) <- 1:nrow(covids)
names(covids)[2] <- 'Region'
names(covids)[9] <- 'Cumulative_deaths'
names(covids)[4] <- 'Cumulative_cases'
covid_df_region <- covids #[ which( covids$Region %in% c("Americas", "South-East Asia", "Europe") ),]

focus <- c('Brazil', 'Mexico', 'Colombia', 
           'Finland', 'Norway','Italy',
           'Indonesia', 'India', 'Thailand',
           'Republic of Korea', 'Japan', 'Viet Nam',
           'Qatar', 'Iraq', 'Morocco',
           'Zambia','Algeria','Nigeria'
           )

covid_df_region %>% filter(Country %in% focus) %>% mutate(rate = `Cumulative_deaths`/`Cumulative_cases`) %>% 
ggplot(aes(x=rate, y=Region)) + geom_col() + theme_bw() + xlab('WHO: # Reported Deaths / # Reported Cases')

covid_region_SE <- covid_df_region %>% filter(Country %in% focus) %>%
  mutate(rate = `Cumulative_deaths`/`Cumulative_cases`, 
         SE = sqrt(rate*(1-rate)/`Cumulative_cases`))

ggplot(covid_region_SE, aes(x=Country, y=rate)) + geom_col() + theme(axis.text.x=element_text(angle=90, hjust=1)) + 
  geom_errorbar(aes(ymin=rate-1.96*SE, ymax=rate+1.96*SE), width=.2) + facet_grid(~ Region, scale="free") + labs(x = 'WHO: # Reported Deaths / # Reported Cases', y = 'Mortality rate', title="Mortality rate of countries grouped by region")

One of the group members will present R codes and plots for Part 3 in class on Mar. 21 (Mon). Also, if you’re a presenter, please bring your laptop so that you can share your screen on zoom for the presentation.

Assignment 10 - plotly & Uncertainty

Group 8, Joseph Bannon(jb9war)

Due Date: 11:59pm, Mar 20

Group Homework

Group Homework

Part 1

Part 1: Instruction

Part 1: Example

Part 1: Results

Part 2

Part 2: Instruction

Part 3

Part 3: Instruction

Part 3: Instruction (Cont’d)