Group 9: Kay Mattern, Andrea Parrish, and Max Nelson
Due Date: 11:59pm, Oct 25
“Honor Pledge: I have recreated my group submission using the tools I have installed on my own computer.”
Use the EuStockMarkets data that contains the daily closing prices of major European stock indices: Germany DAX (Ibis), Switzerland SMI, France CAC, and UK FTSE. Then, create multiple lines that show changes of each index’s daily closing prices over time.
Please use function gather from package tidyr to transform the data from a wide to a long format. For more info, refer to our lecture materials on data formats (i.e., DS3003_dataformat_facets_note.pdf, DS3003_dataformat_facets_code.rmd, or DS3003_dataformat_facets_code.html).
Use function plot_ly from package plotly to create a line plot.
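The wide-to-long step can be illustrated on a tiny made-up data frame before applying it to the stock data. This is only a sketch: the names `wide`, `long`, `index`, and `price` are hypothetical and the values are invented for illustration.

```r
library(tidyr)

# Toy wide data (made-up values): one row per day, one column per index
wide <- data.frame(day = 1:2, A = c(10, 11), B = c(20, 21))

# gather() stacks columns A and B into a key column and a value column
long <- gather(wide, key = "index", value = "price", A:B)
long
```

Each row of `long` now holds one day, one index name, and one price, which is the shape that lets a plotting function draw one line per index.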
## Loading required package: ggplot2
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat$time <- time(EuStockMarkets) # add `time` variable
# add your codes
longEU <- gather(dat, key = EUMarket, value = price, DAX:FTSE)
line <- ggplot(longEU, aes(x = time, y = price, col = EUMarket)) + geom_line() + theme_classic()
plot1 <- ggplotly(line)
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
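Since the prompt asks for plot_ly, the same figure could also be built with plot_ly directly instead of converting a ggplot object. A minimal sketch, assuming the same reshaping as above; the names `long_eu` and `p` and the axis titles are illustrative choices, and storing `time` as a plain numeric vector is our assumption about how to avoid the `ts` scale message.

```r
library(tidyr)
library(plotly)

data(EuStockMarkets)
dat <- as.data.frame(EuStockMarkets)
# Assumption: keep time as a plain numeric vector rather than a ts object
dat$time <- as.numeric(time(EuStockMarkets))

# Wide to long: one row per (time, index) pair
long_eu <- gather(dat, key = "EUMarket", value = "price", DAX:FTSE)

# One line per index, colored by index name
p <- plot_ly(long_eu, x = ~time, y = ~price, color = ~EUMarket,
             type = "scatter", mode = "lines")
p <- layout(p,
            xaxis = list(title = "Time"),
            yaxis = list(title = "Daily closing price"))
p
```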
Use a dataset in data repositories (e.g., [kaggle](https://www.kaggle.com/datasets)) that gives the measurements in different conditions like iris data. For more info on iris data, use ?iris.
Briefly describe the dataset you’re using for this assignment (e.g., means to access data, context, sample, variables, etc…).
One of the group members will present R codes and plots for Part 2 in class on Oct. 26 (Tue). Please e-mail the instructor with your RPubs link if you’re a presenter by 11:59pm, Oct 25.
This dataset from Kaggle contains body fat predictions for 252 men in the United States. The variables include age, percentage of body fat, and circumferences of the chest, wrist, thigh, ankle, and other body parts, with the circumferences measured in centimeters. We will ignore body fat and instead look at boxplots of the circumferences of certain body parts, broken up by age range, to see how these circumferences change as men get older.
Link: https://www.kaggle.com/fedesoriano/body-fat-prediction-dataset
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(dplyr)   # mutate(), case_when(), %>%
library(readr)   # read_csv()
library(tidyr)
library(ggplot2)
library(plotly)
# getting the data
body <- read_csv("C:/Users/student/Desktop/Fourth Year/DS 3003 R Codes/Week 9 - plotly/bodyfat.csv")
## Rows: 252 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (15): Density, BodyFat, Age, Weight, Height, Neck, Chest, Abdomen, Hip, ...
##
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# changing the age column into ranges for ages
body <- body %>%
  mutate(AgeRange = case_when(
    Age >= 20 & Age <= 39 ~ '20s/30s',
    Age >= 40 & Age <= 49 ~ '40s',
    Age >= 50 & Age <= 59 ~ '50s',
    Age >= 60 & Age <= 89 ~ '60s+'
  ))
attach(body)
table(AgeRange)
## AgeRange
## 20s/30s     40s     50s    60s+
##      75      94      47      36
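The same binning could also be written with base R's cut(), which returns an ordered set of factor levels so the age ranges keep their order on a plot axis. This is a sketch on a made-up vector of ages (the vector `age` is hypothetical, not taken from bodyfat.csv). Note one difference from the case_when version: here ages of 90 and above would also fall into '60s+' rather than becoming NA.

```r
# Made-up ages, two per intended range (for illustration only)
age <- c(22, 39, 40, 49, 50, 59, 60, 81)

# right = FALSE gives the intervals [20,40), [40,50), [50,60), [60,Inf)
age_range <- cut(age,
                 breaks = c(20, 40, 50, 60, Inf),
                 right = FALSE,
                 labels = c("20s/30s", "40s", "50s", "60s+"))
table(age_range)
```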
# converting it to long
longbody <- gather(body, key = body_part, value = measurement,
                   c("Chest", "Thigh", "Neck", "Abdomen"))
box <- ggplot(longbody, aes(x = AgeRange, y = measurement)) +
  geom_boxplot() +
  facet_grid(~ body_part) +
  theme_classic() +
  labs(x = "Age Range",
       y = "Circumference (cm)",
       title = "Boxplots of Body Part Circumferences (cm) by Age")
plot2 <- ggplotly(box)
plot2 <- plot2 %>% layout(showlegend = TRUE)
plot2
Overall, the boxplots show that body part circumferences stay fairly consistent across men's lifetimes. The 40-49 age range consistently shows the most variability, with more outliers and larger spreads than the other ranges. The general trend is that circumferences increase with age, as seen in the rising medians for the chest, neck, and abdomen. The thigh is the only exception: its median circumference decreases as men get older.
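The medians and spreads read off the boxplots could also be checked numerically. A minimal sketch, assuming a long data frame with the columns `AgeRange`, `body_part`, and `measurement` like `longbody`; the helper `summarise_by_age` is hypothetical and the toy values below are made up only to show the shape of the output, they are not from bodyfat.csv.

```r
library(dplyr)

# Hypothetical helper: median and IQR of measurement per body part and age range
summarise_by_age <- function(long_df) {
  long_df %>%
    group_by(body_part, AgeRange) %>%
    summarise(med = median(measurement),
              iqr = IQR(measurement),
              n = n(),
              .groups = "drop")
}

# Made-up toy values (NOT from the body fat dataset)
toy <- data.frame(
  AgeRange    = rep(c("20s/30s", "40s"), each = 3, times = 2),
  body_part   = rep(c("Chest", "Thigh"), each = 6),
  measurement = c(95, 100, 105, 98, 102, 106, 60, 61, 62, 57, 59, 61)
)
res <- summarise_by_age(toy)
res
```

With the real data, calling summarise_by_age(longbody) would give one row per body part and age range, so the direction of each median trend can be compared against the plot.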