load packages

library(tidyverse)
library(tidytuesdayR)
library(janitor)
library(ggeasy)

read the data

Use tt_load() to pull the week 39 data into R. This will create a list in your environment. This week there is only one "thing" in the list, but we still need to extract it using double square brackets [[ ]].

tt <- tt_load(2021, week=39)

## 
##  Downloading file 1 of 1: `nominees.csv`

nom <- tt[[1]]
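To see why the double brackets matter, here is a minimal sketch using a made-up list (the name toy and its contents are hypothetical, not the real tt object). Single brackets give back a smaller list, while double brackets give back the element itself.

```r
# Hypothetical stand-in for the downloaded list: one named data frame inside a list
toy <- list(nominees = data.frame(title = c("Show A", "Show B"),
                                  year = c(2014, 2015)))

single <- toy[1]    # still a list of length 1
double <- toy[[1]]  # the data frame itself

class(single)
class(double)

# Extracting by name gives the same object as extracting by position
identical(double, toy[["nominees"]])
```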

glimpse(nom)

## Rows: 29,678
## Columns: 10
## $ category    <chr> "Outstanding Character Voice-Over Performance - 2021", "Ou…
## $ logo        <chr> "https://www.emmys.com/sites/default/files/styles/show_sea…
## $ production  <chr> NA, NA, NA, NA, "Elisabeth Williams, Production Designer",…
## $ type        <chr> "Nominee", "Nominee", "Nominee", "Nominee", "Nominee", "No…
## $ title       <chr> "black-ish: Election Special (Part 2)", "Bridgerton", "Fam…
## $ distributor <chr> "ABC", "Netflix", "FOX", "FX Networks", "Hulu", "Hulu", "H…
## $ producer    <chr> "ABC", "A Netflix Original Series in association with shon…
## $ year        <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021…
## $ page        <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ page_id     <dbl> 1, 2, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9…

twitter plot

We decided to have a go at reproducing the plot that Thomas Mock posted on Twitter.

Filter for just 2014 and nominees (rather than winners as well), then drop the variables we don't need using select() with a minus sign in front of each unwanted variable.

nom2014 <- nom %>%
  filter(year == 2014) %>%
  filter(type == "Nominee") %>%
  select(-logo, -production, -producer, -starts_with("page"))

Then use tabyl() from the janitor package to count the nominations by distributor, sorting in descending order using arrange().

nom2014count <- nom2014 %>%
  tabyl(distributor) %>%
  arrange(-n) 

head(nom2014count)

##  distributor  n    percent
##          HBO 12 0.20689655
##      Netflix  8 0.13793103
##     Showtime  8 0.13793103
##  FX Networks  6 0.10344828
##          ABC  5 0.08620690
##          CBS  4 0.06896552
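If you would rather not load janitor, a similar table can be sketched with count() from dplyr. The toy data below is made up for illustration; with the real data you would start from nom2014 instead.

```r
library(dplyr)

# Hypothetical toy data standing in for nom2014
toy <- tibble(distributor = c("HBO", "Netflix", "HBO", "ABC", "HBO", "Netflix"))

toy_counts <- toy %>%
  count(distributor, sort = TRUE) %>%  # n per distributor, largest first
  mutate(percent = n / sum(n))         # the same proportion tabyl() reports

toy_counts
```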

It's not clear why there are only 12 nominations for HBO (there should be 99). Maybe the winners were counted with the nominees? Take out the nominee-only filter and try again.

nomwin <- nom %>%
  filter(year == 2014) %>%
  select(-logo, -production, -producer, -starts_with("page"))

nomwin %>%
  tabyl(distributor) %>%
  arrange(-n)

##  distributor  n    percent
##          HBO 13 0.18571429
##      Netflix  9 0.12857143
##     Showtime  9 0.12857143
##          CBS  7 0.10000000
##  FX Networks  7 0.10000000
##          ABC  6 0.08571429
##          AMC  6 0.08571429
##          PBS  5 0.07142857
##          NBC  4 0.05714286
##  BBC America  1 0.01428571
##          FOX  1 0.01428571
##          IFC  1 0.01428571
##     Lifetime  1 0.01428571

Hmmm that doesn’t help. The dataframe only has 70 observations. There should be 99 for HBO alone.
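One way to check whether the gap is in the data itself rather than in our filters is to count rows per year and type and look for a year that is much smaller than its neighbours. This is only a sketch on made-up data (toy is hypothetical); on the real data the same idea would be nom %>% count(year, type).

```r
library(dplyr)

# Hypothetical toy data: 2014 has far fewer rows than the years around it
toy <- tibble(
  year = c(rep(2013, 5), rep(2014, 1), rep(2015, 6)),
  type = c(rep("Nominee", 4), "Winner",
           "Nominee",
           rep("Nominee", 5), "Winner")
)

rows_per_year <- toy %>%
  count(year, type) %>%  # one row per year/type combination
  arrange(year)

rows_per_year
```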

Aside from the numbers not lining up at all, let's see what we can do with the graph.

Plot n (aka count) by distributor.

nom2014count %>%
  ggplot(aes(x = distributor, y = n)) +
  geom_col() +
  coord_flip()

In order to colour the bars by distribution category, we need to create a new variable using mutate. We can use case_when() and str_detect() to make a new column that categorises each service as cable, streaming or broadcast. Use head() to print just the top few rows to see if that worked.

nom2014count <- nom2014count %>%
  mutate(service = case_when(
    str_detect(distributor, "HBO|FX|AMC|Showtime|Comedy|Lifetime|IFC") ~ "cable",
    str_detect(distributor, "Netflix") ~ "streaming",
    str_detect(distributor, "CBS|NBC|ABC|PBS|FOX|BBC") ~ "broadcast"))

head(nom2014count)

##  distributor  n    percent   service
##          HBO 12 0.20689655     cable
##      Netflix  8 0.13793103 streaming
##     Showtime  8 0.13793103     cable
##  FX Networks  6 0.10344828     cable
##          ABC  5 0.08620690 broadcast
##          CBS  4 0.06896552 broadcast
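One thing to watch with case_when(): any distributor that matches none of the patterns ends up as NA. A small sketch of adding a catch-all, using made-up distributors (the "other" label is our own assumption, not part of the original plot):

```r
library(dplyr)
library(stringr)

# Hypothetical toy data, including one distributor that matches no pattern
toy <- tibble(distributor = c("HBO", "Netflix", "CBS", "Some New Channel"))

toy_service <- toy %>%
  mutate(service = case_when(
    str_detect(distributor, "HBO|FX") ~ "cable",
    str_detect(distributor, "Netflix") ~ "streaming",
    str_detect(distributor, "CBS|NBC") ~ "broadcast",
    TRUE ~ "other"  # catch-all; without this line unmatched rows would be NA
  ))

toy_service
```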

Now we can fill the bar by service type and use reorder to sort the bars from biggest to smallest by n.

nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  coord_flip()
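What reorder() does here is turn distributor into a factor whose levels are sorted by n instead of alphabetically. A quick sketch with made-up numbers shows the difference. With coord_flip(), the last level is drawn at the top, so the biggest bar ends up first.

```r
# Hypothetical toy values
distributor <- c("ABC", "HBO", "Netflix")
n <- c(5, 12, 8)

# Default factor levels are alphabetical
default_levels <- levels(factor(distributor))

# reorder() sorts the levels by increasing n
ordered_levels <- levels(reorder(distributor, n))

default_levels
ordered_levels
```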

Well the data are not at all similar to the original plot but we can pretend and make it look as similar as possible. First, use theme_minimal() to get rid of the grey, and change the colour scheme to match the plot with scale_fill_manual().

nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  scale_fill_manual(values = c(cable = "blue", broadcast = "yellow", streaming = "red")) +
  coord_flip() +
  theme_minimal()

Arggh, that is terrible! Generic red, yellow and blue are not good. Let's look up colours that are closer to the original. I found a handy tool that lets you upload an image and tells you the colour values in its palette. Check it out at imagecolorpicker.

nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  scale_fill_manual(values = c(cable = "#2f5fa3", broadcast = "#e9b53b", streaming = "#b12727")) +
  coord_flip() +
  theme_minimal()

Better! Let's move the legend and fix the axes. The ggeasy package functions make these kinds of things … easy!

nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  scale_fill_manual(values = c(cable = "#2f5fa3", broadcast = "#e9b53b", streaming = "#b12727")) +
  coord_flip() +
  theme_minimal() +
  easy_move_legend(to = c("top")) +
  easy_remove_legend_title() +
  labs(title = "Netflix Challenges TV Networks at the 2014 Emmys", 
       subtitle = "Nominations at the 2014 Primetime Emmy Awards") +
  easy_remove_x_axis() +
  theme(axis.title.y=element_blank()) # removes y axis label

DONE!

susie lu 1

Plot number 2 comes from Susie Lu.

This time we are filtering for just HBO and Netflix shows in the years between 2013 and 2017.

nom_red <- nom %>% 
  filter(year > 2012 & year < 2018) %>%
  filter(type == "Nominee") %>%
  filter(distributor %in% c("HBO", "Netflix"))

Again, use tabyl() to get counts by year and distributor.

hbo_net_counts <- nom_red %>%
  tabyl(year, distributor)

head(hbo_net_counts)

##  year HBO Netflix
##  2013 189      33
##  2014  12       8
##  2015 219      79
##  2016 198      95
##  2017 216     192

Hmmmm, that data is in wide format. We need to use pivot_longer() to pull the distributor names (HBO and Netflix) into one column and the counts into another.

hbo_net_long <- hbo_net_counts %>%
  pivot_longer(cols = HBO:Netflix, names_to = "distributor", values_to = "count")

glimpse(hbo_net_long)

## Rows: 10
## Columns: 3
## $ year        <dbl> 2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017
## $ distributor <chr> "HBO", "Netflix", "HBO", "Netflix", "HBO", "Netflix", "HBO…
## $ count       <dbl> 189, 33, 12, 8, 219, 79, 198, 95, 216, 192
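If the wide-to-long step is hard to picture, here is a tiny sketch on made-up numbers (wide and long are hypothetical names). Each year row with two distributor columns becomes two rows, one per distributor.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide table: one row per year, one column per distributor
wide <- tibble(year = c(2013, 2014),
               HBO = c(10, 12),
               Netflix = c(3, 8))

long <- wide %>%
  pivot_longer(cols = HBO:Netflix,         # columns to stack
               names_to = "distributor",   # old column names go here
               values_to = "count")        # old cell values go here

long
```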

Better! Now we want distributor to be a factor (rather than character) and we want the order of the levels to be Netflix then HBO (the default is alphabetical).

hbo_net_long$distributor <- fct_relevel(hbo_net_long$distributor, c("Netflix", "HBO"))

levels(hbo_net_long$distributor)

## [1] "Netflix" "HBO"
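To compare the default ordering with the releveled one side by side, here is a small sketch on a made-up vector (d is hypothetical). The level order is what decides which bar is drawn first within each dodged pair.

```r
library(forcats)

# Hypothetical toy vector
d <- c("HBO", "Netflix", "HBO")

# Default: levels are alphabetical
default_levels <- levels(factor(d))

# Releveled: Netflix is moved in front of HBO
releveled <- levels(fct_relevel(factor(d), "Netflix", "HBO"))

default_levels
releveled
```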

Now we can plot!

hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position= "dodge")

Hmmm the numbers are definitely off again (what is up with 2014?), but we can fix the styling.

First, theme and colours.

hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position= "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic()

Now add gridlines and make the bars sit on the axis.

hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position= "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic() +
  theme(panel.grid.major.y = element_line()) +
  scale_y_continuous(expand = c(0,0), limits = c(0,250))
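Setting expand = c(0, 0) removes the padding at both ends of the y axis, which is why limits is needed to leave some headroom above the tallest bar. As an alternative sketch (not what the plot above uses), expansion() from ggplot2 can drop the padding at the bottom only and keep a little at the top, so no hard-coded limit is needed. The toy data is made up.

```r
library(ggplot2)

# Hypothetical toy data in the same long shape as hbo_net_long
toy <- data.frame(
  year = c(2013, 2013, 2014, 2014),
  distributor = c("HBO", "Netflix", "HBO", "Netflix"),
  count = c(10, 3, 12, 8)
)

p <- ggplot(toy, aes(x = year, y = count, fill = distributor)) +
  geom_col(position = "dodge") +
  theme_classic() +
  # no padding below the bars, 5% padding above the tallest one
  scale_y_continuous(expand = expansion(mult = c(0, 0.05)))

p
```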

Drop the legend and add titles.

hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position= "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic() +
  theme(panel.grid.major.y = element_line()) +
  scale_y_continuous(expand = c(0,0), limits = c(0,250)) +
  easy_remove_legend() +
  labs(title = "NETFLIX VS. HBO AT THE EMMYS", subtitle = "Tracking the number of nominations HBO and Netflix have earned since the \n streaming service's first original program in 2013")

Out of interest, has Netflix surpassed HBO since 2017? Remove the < 2018 filter and plot again.

allyears <- nom %>% 
  filter(year > 2012) %>%
  filter(type == "Nominee") %>%
  filter(distributor %in% c("HBO", "Netflix")) %>%
  tabyl(year, distributor) %>%
  pivot_longer(cols = HBO:Netflix, names_to = "distributor", values_to = "count")

glimpse(allyears)

## Rows: 18
## Columns: 3
## $ year        <dbl> 2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017…
## $ distributor <chr> "HBO", "Netflix", "HBO", "Netflix", "HBO", "Netflix", "HBO…
## $ count       <dbl> 189, 33, 12, 8, 219, 79, 198, 95, 216, 192, 302, 251, 297,…

allyears$distributor <- fct_relevel(allyears$distributor, c("Netflix", "HBO"))

allyears$year <- as.factor(allyears$year) 

allyears %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position= "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic() +
  theme(panel.grid.major.y = element_line()) +
  scale_y_continuous(expand = c(0,0), limits = c(0,500)) +
  easy_remove_legend() +
  labs(title = "NETFLIX VS. HBO AT THE EMMYS", subtitle = "Tracking the number of nominations HBO and Netflix have earned since the \n streaming service's first original program in 2013")

Wow, 2020 was a bumper year for Netflix!

Week 39: The Emmys

2021-09-28