Question 1:

Who      Variables                                    Threshold   Test % Correct
Alden    job2 + height + diet3 + income2              0.5         84.1
Amanda   income_level + job_new + body_type_buckets   0.5         70.7
Brenda   income + height                              0.5         83.2
James    income + job + orientation + body_type       0.5         71.4
Albert   height                                       0.5         83.0

Recall from Lec12.R and quiz
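
As a reminder of what a "Test % Correct" figure at a threshold of 0.5 means, here is a minimal sketch in base R. The vectors p_hat and actual are made-up illustration values, not the class data, and this is not necessarily the exact code in Lec12.R; in practice p_hat would hold predicted probabilities from a model fit on the training set.

```r
# Hypothetical predicted probabilities and true 0/1 outcomes for a test set
# (made-up values for illustration only)
p_hat  <- c(0.91, 0.35, 0.62, 0.08, 0.55, 0.47)
actual <- c(1, 0, 1, 0, 0, 1)

threshold <- 0.5

# Predict 1 when the predicted probability exceeds the threshold, else 0
predicted <- as.numeric(p_hat > threshold)

# Percentage of test observations classified correctly
pct_correct <- 100 * mean(predicted == actual)
pct_correct
```

Changing threshold and recomputing pct_correct is one way to see how sensitive each model's score is to the cutoff.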

Question 2:

Important Principle of Coding: Don’t Repeat Yourself

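# Packages used below
library(Quandl)     # Quandl()
library(dplyr)      # select(), mutate(), rename(), bind_rows(), group_by(), lag()
library(lubridate)  # year()
library(ggplot2)    # ggplot()

# For each data set:
# 1. Get gold & bitcoin data
# 2. Make column structure the same
# 3. Add variable type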
gold <- Quandl("BUNDESBANK/BBK01_WT5511") %>% 
  select(Date, Value) %>% 
  mutate(type="Gold")

bitcoin <- Quandl("BAVERAGE/USD") %>% 
  rename(Value = `24h Average`) %>% 
  select(Date, Value) %>% 
  mutate(type="Bitcoin")

# Combine them into single data frame using bind_rows()
combined <- bind_rows(gold, bitcoin) %>% 
  # Group by here!
  group_by(type) %>% 
  # Then do the following ONLY ONCE:
  filter(year(Date) >= 2011) %>% 
  arrange(Date) %>% 
  mutate(
    Value_yest = lag(Value),
    rel_diff = 100 * (Value-Value_yest)/Value_yest
    )

# Plot
ggplot(combined, aes(x=Date, y=rel_diff, col=type)) +
  geom_line() +
  labs(y="% Change")

Question 3:

When parsing the time, what most of you did:

jukebox_hourly <- jukebox %>%
  mutate(
    date_time = parse_date_time(date_time, "%b %d %H%M%S %Y"),
    hour=hour(date_time)
    ) %>%
  group_by(hour) %>%
  summarise(count=n())

What’s wrong with this plot?

ggplot(data=jukebox_hourly, aes(x=hour, y=count)) + 
  geom_bar(stat="identity") +
  xlab("Hour of day") + 
  ylab("Number of songs played")