Slack, Deaths, Fifa Attendance, iTunes, Bitcoin
Plot the median number of messages posted by the day of week. Do this in two ways.
stat231-9-23-slack-by-day.csv providedhttps://abaldenko.github.io/STAT231-Fal2018/hw/data/stat231-9-23-slack-by-day.csvSOLUTION:
library(tidyverse)
# define function to return a plot
# of the median number of messages each day of the week
plot_median <- function(file_name) {
df <- read_csv(file_name)
order_of_week <- c("Monday","Tuesday","Wednesday",
"Thursday","Friday","Saturday","Sunday")
final_plot <- df %>%
select(`Messages Posted`, Date) %>%
mutate(day_of_week = weekdays(as.Date(Date))) %>%
group_by(day_of_week) %>%
summarize(median_messages = median(`Messages Posted`)) %>%
mutate(day_of_week = factor(day_of_week, levels = order_of_week)) %>%
arrange(day_of_week) %>%
ggplot(aes(x=day_of_week, y=median_messages)) +
geom_bar(stat="identity") +
labs(x = "Day of the Week", y = "Median number of messages posted",
title = "Median number of messages posted each day of the week")
return(final_plot)
}
# call above function on each form of the desired CSV
plot_median('data/stat231-9-23-slack-by-day.csv')
plot_median('https://abaldenko.github.io/STAT231-Fall2018/hw/data/stat231-9-23-slack-by-day.csv')
Calculate the median age at death both for individuals with and without children. Calculate this using deaths.xlsx provided.
SOLUTION:
library(readxl)
calculate_median_age <- function(file_name){
arts_sheet <- read_xlsx(file_name, sheet=2,
range='A5:F15', col_names=TRUE)
other_sheet <- read_xlsx(file_name, sheet = 3,
range='A5:F15', col_names=TRUE)
# Combine second and third sheets
deaths_final <- rbind(arts_sheet, other_sheet) %>%
select(`Date of death`, `Has kids`) %>%
mutate(children_status = ifelse(`Has kids`==TRUE,
"Children",
"No children")) %>%
group_by(children_status) %>%
summarize(median_date_of_death = median(`Date of death`))
return(deaths_final)
}
kable(calculate_median_age('data/deaths.xlsx'),
caption = "Median date of death for individuals with and without children")
| children_status | median_date_of_death |
|---|---|
| Children | 2016-11-24 |
| No children | 2016-11-07 |
Using the approach described in Section 5.5.1. generate a plot displaying total FIFA World Cup attendance by year. Calculate this in two ways.
FIFA_World_Cup.html providedhttps://en.wikipedia.org/wiki/FIFA_World_Cup.SOLUTION:
library(rvest)
attendance_by_year <- function(file_name) {
tables <- file_name %>%
read_html() %>%
html_nodes("table")
html_df <- html_table(tables[[3]], fill=TRUE)
names(html_df) <- html_df[1, ]
html_df <- html_df[2:22,] %>%
select(Year, Totalattendance) %>%
mutate(Totalattendance = as.numeric(gsub(",","",Totalattendance)))
plot <- html_df %>%
ggplot(aes(x=Year, y=Totalattendance)) +
geom_bar(stat="identity") +
labs(x = "Year", y = "Total Attendance",
title = "Total Attendance at the Olympics by Year")
return(plot)
}
attendance_by_year('data/FIFA_World_Cup.html')
attendance_by_year("https://en.wikipedia.org/wiki/FIFA_World_Cup")
What is the mean and median price of iTunes tracks by the artist MGMT (include all track types)? Calculate this in two ways.
mgmt.txt providedSOLUTION:
library(jsonlite)
library(httr)
mgmt_price <- function(file_name) {
raw_mgmt_df <- jsonlite::fromJSON(file_name)
mgmt_df <- raw_mgmt_df$results
mgmt_df_final <- mgmt_df %>%
select(artistName, trackPrice) %>%
filter(artistName == "MGMT", trackPrice >= 0) %>%
summarize(mean_price = mean(trackPrice), median_price = median(trackPrice))
return(mgmt_df_final)
}
kable(mgmt_price("data/MGMT.txt"),
caption = "Mean and median price for tracks by MGMT")
| mean_price | median_price |
|---|---|
| 1.349794 | 1.29 |
kable(mgmt_price( "https://itunes.apple.com/search?term=mgmt&limit=400"),
caption = "Mean and median price for tracks by MGMT")
| mean_price | median_price |
|---|---|
| 1.346122 | 1.29 |
What is the mean and standard deviation of Bitcoin closing price for the month of August 2018. Calculate this in two ways.
bitcoin_close.json providedSOLUTION:
library(lubridate)
bitcoin_closing_price <- function(file_name) {
raw_bitcoin <- jsonlite::fromJSON(file_name)
bitcoin_list <- raw_bitcoin$bpi
bitcoin_df <- as.data.frame(bitcoin_list) %>%
gather(key = "date", value = "closing_price") %>%
mutate(date = substring(date, 2)) %>%
mutate(date = lubridate::ymd(date)) %>%
mutate(month = lubridate::month(date)) %>%
filter(month == 8) %>%
summarize(mean_closing_price = mean(closing_price),
sd_closing_price = sd(closing_price))
return(bitcoin_df)
}
# The data you gave us was from 2013 and not 2018 so I answered accordingly
kable(bitcoin_closing_price("data/bitcoin_close.json"),
caption = "Mean and standard deviation of the Bitcoin closing price in August 2013")
| mean_closing_price | sd_closing_price |
|---|---|
| 103.6357 | 9.763663 |
kable(bitcoin_closing_price("https://api.coindesk.com/v1/bpi/historical/close.json?start=2013-07-01&end=2013-09-24"),
caption = "Mean and standard deviation of the Bitcoin closing price in August 2013")
| mean_closing_price | sd_closing_price |
|---|---|
| 103.6357 | 9.763663 |
To figure out how to read in a local HTML file: https://stackoverflow.com/questions/26108375/read-local-html-file-into-r
To figure out how to identify a specific day of the week: https://stackoverflow.com/questions/9216138/find-the-day-of-a-week
For manually reordering the rows of a dataframe: https://stackoverflow.com/questions/26548495/reorder-rows-using-custom-order
To remove commas from a number: https://stackoverflow.com/questions/28129554/in-r-remove-commas-from-a-field-and-have-the-modified-field-remain-part-of-the
To remove the first character in a string for an entire column: https://stackoverflow.com/questions/41663097/remove-first-character-from-a-string-in-data-frame-column
Collaborated with Nicole Frontero (in and outside of class) on questions 1 and 2. Went to TA hours on Monday night and collaborated with Martin Glusker on the Bitcoin and iTunes questions and helped Nicole with how to create functions in R.