Assignment 11 - plotly

Group 9: Kay Mattern, Andrea Parrish, and Max Nelson

Due Date: 11:59pm, Oct 25

Group Homework

“Honor Pledge: I have recreated my group submission using the tools I have installed on my own computer”

Part 1

Part 1: Instruction

Part 1: Results

library(tidyr) # load tidyr package
library(plotly) # load plotly package
## Loading required package: ggplot2
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
data(EuStockMarkets) # load EuStockMarkets
dat <- as.data.frame(EuStockMarkets) # coerce it to a data frame
dat$time <- time(EuStockMarkets) # add `time` variable

# reshape to long format: one row per (time, market) pair
longEU <- gather(dat, key = EUMarket, value = price, DAX:FTSE)

line <- ggplot(longEU, aes(x = time, y = price, col = EUMarket)) + geom_line() + theme_classic()

plot1 <- ggplotly(line)
## Don't know how to automatically pick scale for object of type ts. Defaulting to continuous.
plot1 <- plot1 %>% layout(showlegend = TRUE)
plot1
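The message "Don't know how to automatically pick scale for object of type ts" appears because `time(EuStockMarkets)` returns a `ts` object rather than a plain numeric vector. The following is a minimal sketch of one way to avoid it, assuming tidyr 1.0.0 or later for `pivot_longer()` (the newer counterpart of `gather()`). The names `dat_num` and `long_eu` are illustrative and are not part of the submission above.

```r
library(tidyr)

data(EuStockMarkets)
dat_num <- as.data.frame(EuStockMarkets)

# Store time as a plain numeric vector (decimal years) instead of a ts object,
# so ggplot2 can pick a continuous scale without a message
dat_num$time <- as.numeric(time(EuStockMarkets))

# Reshape the four index columns into one `EUMarket` / `price` pair per row
long_eu <- pivot_longer(
  dat_num,
  cols = DAX:FTSE,
  names_to = "EUMarket",
  values_to = "price"
)

str(long_eu)
```

`long_eu` could then be passed to the same `ggplot()` and `ggplotly()` calls used above in place of `longEU`.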

Part 2

Part 2: Instruction

Part 2: Data Description

This Kaggle dataset on body fat prediction covers 252 men in the United States. Its variables include age, percentage of body fat, and circumference measurements (in centimeters) for the chest, wrist, thigh, ankle, and other body parts. We ignore body fat and instead use boxplots to compare the circumferences of selected body parts across age groups, to see how these circumferences change as men get older.

Link: https://www.kaggle.com/fedesoriano/body-fat-prediction-dataset

Part 2: Results

library(readr)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
library(ggplot2)
library(plotly)

# getting the data
body <- read_csv("C:/Users/student/Desktop/Fourth Year/DS 3003 R Codes/Week 9 - plotly/bodyfat.csv")
## Rows: 252 Columns: 15
## -- Column specification --------------------------------------------------------
## Delimiter: ","
## dbl (15): Density, BodyFat, Age, Weight, Height, Neck, Chest, Abdomen, Hip, ...
## 
## i Use `spec()` to retrieve the full column specification for this data.
## i Specify the column types or set `show_col_types = FALSE` to quiet this message.
# changing the age column into ranges for ages
body <- body %>%
  mutate(AgeRange = case_when(
    Age >= 20 & Age <= 39 ~ "20s/30s",
    Age >= 40 & Age <= 49 ~ "40s",
    Age >= 50 & Age <= 59 ~ "50s",
    Age >= 60 & Age <= 89 ~ "60s+"
  ))

# count the men in each age range; with() avoids attach(), which can mask other objects
with(body, table(AgeRange))
## AgeRange
## 20s/30s     40s     50s    60s+ 
##      75      94      47      36
# converting it to long
longbody <- gather(body, key = body_part, value = measurement,
                   c("Chest", "Thigh", "Neck", "Abdomen"))
box <- ggplot(longbody, aes(x = AgeRange, y = measurement)) +
  geom_boxplot() +
  facet_grid(~ body_part) +
  theme_classic() +
  labs(x = "Age Range", y = "Circumference (cm)",
       title = "Boxplots of Body Part Circumferences (cm)\nby Age")

plot2 <- ggplotly(box) 
plot2 <- plot2 %>% layout(showlegend = TRUE)
plot2
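The age binning done with `case_when()` above can also be sketched with base R's `cut()`, which returns a factor whose levels are already in age order. This is only an illustration under stated assumptions: the `ages` vector below is hypothetical and is not taken from `bodyfat.csv`, and the open-ended last bin means ages above 89 would fall into "60s+" rather than becoming `NA`.

```r
# Hypothetical ages, for illustration only (not taken from bodyfat.csv)
ages <- c(22, 39, 40, 49, 50, 59, 60, 81)

age_range <- cut(
  ages,
  breaks = c(20, 40, 50, 60, Inf),
  labels = c("20s/30s", "40s", "50s", "60s+"),
  right = FALSE  # left-closed bins: [20, 40), [40, 50), [50, 60), [60, Inf)
)

table(age_range)
```

For whole-number ages between 20 and 89 this assigns the same labels as the `case_when()` rules.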

Overall, the boxplots show that the circumferences of these body parts are broadly similar across age groups. The 40-49 age range consistently shows the most variability, with more outliers and wider spreads than the other ranges. The general trend is that circumference increases with age, as seen in the rising medians for the chest, neck, and abdomen. The one exception is the thigh, where the median circumference decreases with age.
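The median trends described above are read off the boxplots; they could also be checked numerically. The following is a minimal sketch using base R's `aggregate()`. The `toy` data frame contains made-up values purely to show the shape of the computation and does not reproduce any numbers from the dataset.

```r
# Toy long-format data with made-up values, for illustration only
toy <- data.frame(
  AgeRange    = rep(c("20s/30s", "40s"), each = 3, times = 2),
  body_part   = rep(c("Chest", "Thigh"), each = 6),
  measurement = c(95, 98, 101, 99, 102, 105, 60, 61, 62, 57, 58, 59)
)

# Median measurement for every body part / age range combination
medians <- aggregate(measurement ~ body_part + AgeRange, data = toy, FUN = median)
medians
```

With the real data, the same call with `data = longbody` should give one median per facet and age range, which can be compared against the center lines of the boxplots.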