PROBLEMS TO TURN IN

Slack, Deaths, Fifa Attendance, iTunes, Bitcoin

Slack (CSV)

Plot the median number of messages posted by the day of week. Do this in two ways.

  1. Using stat231-9-23-slack-by-day.csv provided
  2. Read directly from https://abaldenko.github.io/STAT231-Fal2018/hw/data/stat231-9-23-slack-by-day.csv

SOLUTION:

library(tidyverse)

# define function to return a plot 
# of the median number of messages each day of the week

plot_median <- function(file_name) {
  df <- read_csv(file_name)
  order_of_week <- c("Monday","Tuesday","Wednesday",
                     "Thursday","Friday","Saturday","Sunday")
  final_plot <- df %>% 
    select(`Messages Posted`, Date) %>% 
    mutate(day_of_week = weekdays(as.Date(Date))) %>% 
    group_by(day_of_week) %>% 
    summarize(median_messages = median(`Messages Posted`)) %>% 
    mutate(day_of_week = factor(day_of_week, levels = order_of_week)) %>% 
    arrange(day_of_week) %>% 
    ggplot(aes(x=day_of_week, y=median_messages)) +
      geom_bar(stat="identity") + 
      labs(x = "Day of the Week", y = "Median number of messages posted",
       title = "Median number of messages posted each day of the week")
  return(final_plot)
}

# call above function on each form of the desired CSV
plot_median('data/stat231-9-23-slack-by-day.csv')

plot_median('https://abaldenko.github.io/STAT231-Fall2018/hw/data/stat231-9-23-slack-by-day.csv')

Deaths (Excel)

Calculate the median age at death both for individuals with and without children. Calculate this using deaths.xlsx provided.

SOLUTION:

library(readxl)

calculate_median_age <- function(file_name){
  arts_sheet <- read_xlsx(file_name, sheet=2,
                          range='A5:F15', col_names=TRUE)
  other_sheet <- read_xlsx(file_name, sheet = 3, 
                           range='A5:F15', col_names=TRUE)
  # Combine second and third sheets
  deaths_final <- rbind(arts_sheet, other_sheet) %>% 
    select(`Date of death`, `Has kids`) %>% 
    mutate(children_status = ifelse(`Has kids`==TRUE,
                                    "Children",
                                    "No children")) %>% 
    group_by(children_status) %>% 
    summarize(median_date_of_death = median(`Date of death`))
  return(deaths_final)
}

kable(calculate_median_age('data/deaths.xlsx'), 
      caption = "Median date of death for individuals with and without children")
Median date of death for individuals with and without children
children_status median_date_of_death
Children 2016-11-24
No children 2016-11-07

Fifa Attendance (HTML)

Using the approach described in Section 5.5.1. generate a plot displaying total FIFA World Cup attendance by year. Calculate this in two ways.

  1. Using FIFA_World_Cup.html provided
  2. Directly from the url https://en.wikipedia.org/wiki/FIFA_World_Cup.

SOLUTION:

library(rvest)

attendance_by_year <- function(file_name) {
  tables <- file_name %>% 
    read_html() %>% 
    html_nodes("table")
  
  html_df <- html_table(tables[[3]], fill=TRUE)
  names(html_df) <- html_df[1, ] 
  
  html_df <- html_df[2:22,] %>% 
    select(Year, Totalattendance) %>% 
    mutate(Totalattendance = as.numeric(gsub(",","",Totalattendance)))
    
  plot <- html_df %>% 
    ggplot(aes(x=Year, y=Totalattendance)) +
    geom_bar(stat="identity") +
    labs(x = "Year", y = "Total Attendance",
        title = "Total Attendance at the Olympics by Year") 

  return(plot)
}

attendance_by_year('data/FIFA_World_Cup.html')

attendance_by_year("https://en.wikipedia.org/wiki/FIFA_World_Cup")

iTunes (JSON)

What is the mean and median price of iTunes tracks by the artist MGMT (include all track types)? Calculate this in two ways.

  1. Using mgmt.txt provided
  2. Directly from the iTunes API. You’ll to have to read the API documentation to generate the correct url.

SOLUTION:

library(jsonlite)
library(httr)

mgmt_price <- function(file_name) {
  
  raw_mgmt_df <- jsonlite::fromJSON(file_name) 
  mgmt_df <- raw_mgmt_df$results 

  mgmt_df_final <- mgmt_df %>% 
    select(artistName, trackPrice) %>% 
    filter(artistName == "MGMT", trackPrice >= 0) %>% 
    summarize(mean_price = mean(trackPrice), median_price = median(trackPrice))
  
  return(mgmt_df_final)
}

kable(mgmt_price("data/MGMT.txt"), 
      caption = "Mean and median price for tracks by MGMT")
Mean and median price for tracks by MGMT
mean_price median_price
1.349794 1.29
kable(mgmt_price( "https://itunes.apple.com/search?term=mgmt&limit=400"), 
      caption = "Mean and median price for tracks by MGMT")
Mean and median price for tracks by MGMT
mean_price median_price
1.346122 1.29

Bitcoin (More JSON)

What is the mean and standard deviation of Bitcoin closing price for the month of August 2018. Calculate this in two ways.

  1. Using bitcoin_close.json provided
  2. Directly from the CoinDesk API. You’ll have to read the API documentation to generate the correct url.

SOLUTION:

library(lubridate)

bitcoin_closing_price <- function(file_name) {

  raw_bitcoin <- jsonlite::fromJSON(file_name)
  bitcoin_list <- raw_bitcoin$bpi

  bitcoin_df <- as.data.frame(bitcoin_list) %>% 
    gather(key = "date", value = "closing_price") %>% 
    mutate(date = substring(date, 2)) %>% 
    mutate(date = lubridate::ymd(date)) %>% 
    mutate(month = lubridate::month(date)) %>% 
    filter(month == 8) %>% 
    summarize(mean_closing_price = mean(closing_price), 
              sd_closing_price = sd(closing_price))

  return(bitcoin_df)
}

# The data you gave us was from 2013 and not 2018 so I answered accordingly

kable(bitcoin_closing_price("data/bitcoin_close.json"), 
      caption = "Mean and standard deviation of the Bitcoin closing price in August 2013")
Mean and standard deviation of the Bitcoin closing price in August 2013
mean_closing_price sd_closing_price
103.6357 9.763663
kable(bitcoin_closing_price("https://api.coindesk.com/v1/bpi/historical/close.json?start=2013-07-01&end=2013-09-24"), 
      caption = "Mean and standard deviation of the Bitcoin closing price in August 2013")
Mean and standard deviation of the Bitcoin closing price in August 2013
mean_closing_price sd_closing_price
103.6357 9.763663

Works Cited

To figure out how to read in a local HTML file: https://stackoverflow.com/questions/26108375/read-local-html-file-into-r

To figure out how to identify a specific day of the week: https://stackoverflow.com/questions/9216138/find-the-day-of-a-week

For manually reordering the rows of a dataframe: https://stackoverflow.com/questions/26548495/reorder-rows-using-custom-order

To remove commas from a number: https://stackoverflow.com/questions/28129554/in-r-remove-commas-from-a-field-and-have-the-modified-field-remain-part-of-the

To remove the first character in a string for an entire column: https://stackoverflow.com/questions/41663097/remove-first-character-from-a-string-in-data-frame-column

Collaboration

Collaborated with Nicole Frontero (in and outside of class) on questions 1 and 2. Went to TA hours on Monday night and collaborated with Martin Glusker on the Bitcoin and iTunes questions and helped Nicole with how to create functions in R.