Due Date: 11:59pm, Oct 25

Group Homework

  • You will work with your group to complete this assignment.

  • Upload your html file to RPubs and include the link when you submit your files on Collab.

  • Submit your group’s shared .Rmd AND “knitted” .html files on Collab.

  • Note that this html file is now uploaded on RPubs.

  • Your “knitted .html” submission must be created from your “group .Rmd” but be created on your own computer.

  • Confirm this with the following comment included in your submission text box: “Honor Pledge: I have recreated my group submission using the tools I have installed on my own computer”

  • Name the files with a group name and YOUR name for your submission.

  • Each group member must be able to submit this assignment as created from their own computer. If only some members of the group submit the required files, those group members must additionally provide a supplemental explanation along with their submission as to why other students in their group have not completed this assignment.

Part 1

Part 1: Instruction

  • Use the EuStockMarkets data, which contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create a plot with multiple lines, one per index, showing how each index’s daily closing price changes over time.

  • Please use the function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on data formats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html).

  • Use function plot_ly from package plotly to create a line plot.

Part 1: Results

Long Data Transformation

library(tidyr) # load tidyr package
library(plotly) # load plotly package

data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat$time <- time(EuStockMarkets) # add `time` variable

# add your codes

# use gather to transform data from wide to long format
long_dat <- dat %>% gather(StockIndex, Price, c(DAX, SMI, CAC, FTSE))

Long Data

head(long_dat)
##       time StockIndex   Price
## 1 1991.496        DAX 1628.75
## 2 1991.500        DAX 1613.63
## 3 1991.504        DAX 1606.51
## 4 1991.508        DAX 1621.04
## 5 1991.512        DAX 1618.16
## 6 1991.515        DAX 1610.61
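In current tidyr releases, gather() is marked as superseded in favor of pivot_longer(). The assignment asks for gather(), so the following is only a sketch of the equivalent reshape for comparison. The name long_dat2 is hypothetical, and the time column is stored as a plain numeric vector here, which is an assumption made for this sketch rather than what the code above does.

```r
library(tidyr)

data(EuStockMarkets)
dat <- as.data.frame(EuStockMarkets)
# Assumption for this sketch: keep time as plain numbers rather than a ts object
dat$time <- as.numeric(time(EuStockMarkets))

# Same wide-to-long reshape as gather(), written with pivot_longer()
long_dat2 <- pivot_longer(dat,
                          cols = c(DAX, SMI, CAC, FTSE),
                          names_to = "StockIndex",
                          values_to = "Price")

# Every original row contributes one long row per index
nrow(long_dat2) == 4 * nrow(dat)
```

The row order differs from gather(): pivot_longer() keeps the four indices of each day together, while gather() stacks one whole index after another. The content is the same.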

Plot_ly Line Plot Code

# use plot_ly to create a line plot with one line per index, showing
# changes in each index's daily closing prices over time
line_plot <- plot_ly(x = long_dat$time, y = long_dat$Price,
                     type = 'scatter', mode = 'lines',
                     color = long_dat$StockIndex) %>%
  layout(title = "Daily Closing Prices vs. Time",
         xaxis = list(title = 'Time'),
         yaxis = list(title = 'Price'),
         legend = list(title = list(text = 'Stock Index')))
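The same kind of plot can also be written with plotly's formula interface, where the data frame is passed once and columns are referenced with `~`. This is a minimal sketch, not the submitted solution: the object name fig is hypothetical, and the time column is converted to plain numbers as an assumption for this example.

```r
library(tidyr)
library(plotly)

data(EuStockMarkets)
dat <- as.data.frame(EuStockMarkets)
# Assumption for this sketch: plain numeric time axis
dat$time <- as.numeric(time(EuStockMarkets))
long_dat <- gather(dat, StockIndex, Price, c(DAX, SMI, CAC, FTSE))

# Pass the long data once; ~column maps a column to an axis or to color
fig <- plot_ly(long_dat, x = ~time, y = ~Price, color = ~StockIndex,
               type = "scatter", mode = "lines")
fig <- layout(fig,
              title = "Daily Closing Prices vs. Time",
              xaxis = list(title = "Time"),
              yaxis = list(title = "Price"))

fig  # printing the object renders the interactive plot
```

Because color is mapped to StockIndex, plotly draws one trace per index, which is what the long format makes possible.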

Plot_ly Line Plot

Part 2

Part 2: Instruction

  • Use a dataset from a data repository (e.g., Kaggle) that gives measurements under different conditions, as the iris data does. For more info on the iris data, use ?iris.

  • Briefly describe the dataset you’re using for this assignment (e.g., means to access data, context, sample, variables, etc…).

  • Transform the dataset from a wide to a long format. Produce any ggplot where the key variable is used in function facet_grid or facet_wrap.

  • One of the group members will present R codes and plots for Part 2 in class on Oct. 26 (Tue). Please e-mail the instructor with your RPubs link if you’re a presenter by 11:59pm, Oct 25.
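As an illustration of what these instructions ask for, the sketch below applies the same steps to the built-in iris data mentioned above: reshape from wide to long, then facet on the key variable. The column names measure and value and the object names are hypothetical choices for this example.

```r
library(tidyr)
library(ggplot2)

# iris is wide: one column per measurement, one row per flower
long_iris <- gather(iris, key = "measure", value = "value", -Species)

# The key column (measure) drives the facets; Species is the condition
p <- ggplot(long_iris, aes(x = Species, y = value)) +
  geom_boxplot() +
  facet_wrap(~ measure, scales = "free_y")

p
```

Here scales = "free_y" gives each panel its own y-axis, which helps when the measurements are on different scales.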

Part 2: Data Description

This dataset is titled penguins and was collected by Dr. Kristen Gorman and the Palmer Station in Antarctica. The dataset can be found at this link. After rows with missing values are omitted, the dataset has 333 observations and 9 variables: a row index (X) read from the CSV file, plus species, island, bill length, bill depth, flipper length, body mass, sex, and year. The variable species is similar to the Species variable in the iris dataset in that it comprises different conditions (Adélie, Chinstrap and Gentoo). The variables bill length, bill depth, and flipper length are measurements in mm recorded for each penguin, so they can be compared across the species conditions. This dataset is in wide form and can be transformed into long form with the gather() function.

Part 2: Results

Long Data Transformation

# add your codes
penguins <- read.csv('penguins.csv')
# omit NA values
penguins <- na.omit(penguins)

# gather the three mm measurement columns into long format (similar to iris ex.);
# species and the other columns are kept and repeated for each measurement
long_penguins <- penguins %>%
  gather(key = penguin_att, value = measurement,
         c(bill_length_mm, bill_depth_mm, flipper_length_mm))

Long Data

head(long_penguins)
##   X species    island body_mass_g    sex year    penguin_att measurement
## 1 1  Adelie Torgersen        3750   male 2007 bill_length_mm        39.1
## 2 2  Adelie Torgersen        3800 female 2007 bill_length_mm        39.5
## 3 3  Adelie Torgersen        3250 female 2007 bill_length_mm        40.3
## 4 5  Adelie Torgersen        3450 female 2007 bill_length_mm        36.7
## 5 6  Adelie Torgersen        3650   male 2007 bill_length_mm        39.3
## 6 7  Adelie Torgersen        3625 female 2007 bill_length_mm        38.9
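To see why the long data has more rows than the wide data, here is a small sketch on a made-up two-row data frame shaped like the penguins data. The values and the names toy and long_toy are illustrative only and are not taken from penguins.csv.

```r
library(tidyr)

# Two made-up rows shaped like the penguins data (illustrative values only)
toy <- data.frame(
  species = c("Adelie", "Gentoo"),
  bill_length_mm = c(39.1, 46.1),
  bill_depth_mm = c(18.7, 13.2),
  flipper_length_mm = c(181, 211)
)

long_toy <- gather(toy, key = penguin_att, value = measurement,
                   c(bill_length_mm, bill_depth_mm, flipper_length_mm))

nrow(long_toy)               # 2 rows x 3 gathered columns = 6
table(long_toy$penguin_att)  # each attribute appears once per original row
```

By the same arithmetic, gathering three columns from the 333 complete penguin rows gives 333 x 3 = 999 long rows.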

ggplot with facet_grid code

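library(ggplot2) # load ggplot2 for ggplot(), geom_boxplot(), and facet_grid()

# one boxplot panel per measured attribute, comparing the three species
plot <- ggplot(long_penguins,
               aes(x = species, y = measurement)) +
  geom_boxplot() +
  facet_grid(~ penguin_att) +
  theme_classic() +
  labs(title = "Penguin Attribute Measurements for Each Species")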

ggplot with facet_grid