library(tidyverse)
library(tidytuesdayR)
library(janitor)
library(ggeasy)
Use tt_load() to pull last week’s data into R. This creates a list in your environment. This week there is only one “thing” in the list, but we still need to extract it using double square brackets [[ ]].
tt <- tt_load(2021, week=39)
##
## Downloading file 1 of 1: `nominees.csv`
nom <- tt[[1]]
glimpse(nom)
## Rows: 29,678
## Columns: 10
## $ category <chr> "Outstanding Character Voice-Over Performance - 2021", "Ou…
## $ logo <chr> "https://www.emmys.com/sites/default/files/styles/show_sea…
## $ production <chr> NA, NA, NA, NA, "Elisabeth Williams, Production Designer",…
## $ type <chr> "Nominee", "Nominee", "Nominee", "Nominee", "Nominee", "No…
## $ title <chr> "black-ish: Election Special (Part 2)", "Bridgerton", "Fam…
## $ distributor <chr> "ABC", "Netflix", "FOX", "FX Networks", "Hulu", "Hulu", "H…
## $ producer <chr> "ABC", "A Netflix Original Series in association with shon…
## $ year <dbl> 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021…
## $ page <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ page_id <dbl> 1, 2, 3, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9…
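The difference between single and double square brackets is easy to mix up. Below is a minimal sketch on a small hypothetical list standing in for the tt_load() result. The real object may carry extra classes and attributes, but indexing behaves the same way. Single brackets return a smaller list, and double brackets return the data frame itself.

```r
# Hypothetical stand-in for the tt_load() result: a named list holding one data frame
toy_tt <- list(nominees = data.frame(title = c("A", "B"), year = c(2021, 2021)))

single <- toy_tt[1]     # single brackets: still a list, now containing one element
inner  <- toy_tt[[1]]   # double brackets: the data frame stored inside the list

class(single)  # "list"
class(inner)   # "data.frame"
nrow(inner)    # 2

# Extracting by name gives the same data frame as extracting by position
same_thing <- toy_tt$nominees
```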
We decided to have a go at reproducing the plot that Thomas Mock posted on Twitter.
Filter for just 2014 and for nominees only (dropping the winners). Then drop the variables we don’t need using select() with a minus sign in front of each one (e.g. -logo).
nom2014 <- nom %>%
  filter(year == 2014) %>%   # year is numeric (<dbl>), so compare with a number
  filter(type == "Nominee") %>%
  select(-logo, -production, -producer, -starts_with("page"))
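To see what the filter and select steps do without downloading anything, here is a sketch on a tiny hypothetical data frame that reuses a few of nom’s column names. The rows are made up. Because year is stored as a number, it is compared with 2014 rather than "2014". Conditions separated by commas inside filter() are combined with AND.

```r
library(dplyr)

# Hypothetical toy data with some of the same column names as nom
toy_nom <- data.frame(
  year    = c(2013, 2014, 2014, 2014),
  type    = c("Nominee", "Nominee", "Winner", "Nominee"),
  title   = c("Show A", "Show B", "Show C", "Show D"),
  logo    = "logo.png",
  page    = 0,
  page_id = 1:4
)

toy_2014 <- toy_nom %>%
  filter(year == 2014, type == "Nominee") %>%  # both conditions must hold
  select(-logo, -starts_with("page"))          # minus drops columns; starts_with() catches page and page_id

names(toy_2014)  # "year" "type" "title"
nrow(toy_2014)   # 2
```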
Then use tabyl() from the janitor package to count the nominations by distributor, sorting in descending order of n using arrange().
nom2014count <- nom2014 %>%
  tabyl(distributor) %>%
  arrange(-n)   # largest counts first
head(nom2014count)
## distributor n percent
## HBO 12 0.20689655
## Netflix 8 0.13793103
## Showtime 8 0.13793103
## FX Networks 6 0.10344828
## ABC 5 0.08620690
## CBS 4 0.06896552
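tabyl() does the counting and the percentages in one step. For comparison, the sketch below gets a similar table using dplyr’s count(), a swapped-in alternative rather than what this walkthrough uses. It runs on a small hypothetical data frame, not the real nom2014. Note that tabyl() also adds extras, such as a valid_percent column when there are NAs, which this sketch does not reproduce.

```r
library(dplyr)

# Hypothetical toy data: one row per nomination
toy <- data.frame(distributor = c("HBO", "HBO", "HBO", "Netflix", "Netflix", "ABC"))

toy_count <- toy %>%
  count(distributor, sort = TRUE) %>%  # roughly tabyl(distributor) %>% arrange(-n)
  mutate(percent = n / sum(n))         # proportion of all rows, like tabyl's percent column

toy_count
```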
It’s not clear why there are only 12 nominations for HBO (there should be 99). Maybe the winners were counted separately from the nominees? Take out the nominee-only filter and try again.
nomwin <- nom %>%
  filter(year == 2014) %>%
  select(-logo, -production, -producer, -starts_with("page"))
nomwin %>%
  tabyl(distributor) %>%
  arrange(-n)
## distributor n percent
## HBO 13 0.18571429
## Netflix 9 0.12857143
## Showtime 9 0.12857143
## CBS 7 0.10000000
## FX Networks 7 0.10000000
## ABC 6 0.08571429
## AMC 6 0.08571429
## PBS 5 0.07142857
## NBC 4 0.05714286
## BBC America 1 0.01428571
## FOX 1 0.01428571
## IFC 1 0.01428571
## Lifetime 1 0.01428571
Hmmm, that doesn’t help. The data frame only has 70 observations for 2014, and there should be 99 for HBO alone.
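To see where the rows for a year come from, one quick check is to count rows by year and type and compare the totals with the count table. The sketch below uses a hypothetical toy data frame. On the real data the same call would be nom %>% count(year, type).

```r
library(dplyr)

# Hypothetical toy data standing in for nom
toy_nom <- data.frame(
  year = c(2014, 2014, 2014, 2015, 2015),
  type = c("Nominee", "Nominee", "Winner", "Nominee", "Winner")
)

# One row per year/type combination, with the number of rows in n
by_year_type <- toy_nom %>% count(year, type)

# Total rows for a single year; should equal the sum of n for that year
rows_2014 <- nrow(filter(toy_nom, year == 2014))
```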
Even though the numbers don’t line up at all, let’s see what we can do with the graph.
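Plot n (the count) by distributor. coord_flip() swaps the axes so the bars run horizontally and the distributor names are easier to read.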
nom2014count %>%
  ggplot(aes(x = distributor, y = n)) +
  geom_col() +
  coord_flip()
To colour the bars by service type, we need to create a new variable using mutate(). We can use case_when() and str_detect() to make a new column that categorises each distributor as cable, streaming or broadcast. Use head() to print just the top few rows to check that it worked.
nom2014count <- nom2014count %>%
  mutate(service = case_when(
    str_detect(distributor, "HBO|FX|AMC|Showtime|Comedy|Lifetime|IFC") ~ "cable",
    str_detect(distributor, "Netflix") ~ "streaming",
    str_detect(distributor, "CBS|NBC|ABC|PBS|FOX|BBC") ~ "broadcast"))
head(nom2014count)
## distributor n percent service
## HBO 12 0.20689655 cable
## Netflix 8 0.13793103 streaming
## Showtime 8 0.13793103 cable
## FX Networks 6 0.10344828 cable
## ABC 5 0.08620690 broadcast
## CBS 4 0.06896552 broadcast
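Two details of case_when() are worth knowing here. First, the conditions are checked in order, and the first match wins. Second, any name that matches none of the patterns becomes NA. The sketch below runs the same patterns on a short hypothetical vector that includes Hulu, which none of the patterns cover. It then adds a catch-all TRUE ~ "other" line, one common way to label the leftovers. The "other" label is an illustration, not part of the original analysis. Also note that str_detect() is case-sensitive.

```r
library(dplyr)
library(stringr)

# Hypothetical input; "Hulu" is not covered by any pattern
distributors <- c("HBO", "Netflix", "CBS", "FX Networks", "Hulu")

service <- case_when(
  str_detect(distributors, "HBO|FX|AMC|Showtime|Comedy|Lifetime|IFC") ~ "cable",
  str_detect(distributors, "Netflix") ~ "streaming",
  str_detect(distributors, "CBS|NBC|ABC|PBS|FOX|BBC") ~ "broadcast"
)
# service: "cable" "streaming" "broadcast" "cable" NA

# Same rules plus a final catch-all, so unmatched names get a label instead of NA
service_all <- case_when(
  str_detect(distributors, "HBO|FX|AMC|Showtime|Comedy|Lifetime|IFC") ~ "cable",
  str_detect(distributors, "Netflix") ~ "streaming",
  str_detect(distributors, "CBS|NBC|ABC|PBS|FOX|BBC") ~ "broadcast",
  TRUE ~ "other"
)
```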
Now we can fill the bars by service type and use reorder() to sort the bars by n, so the biggest bar ends up at the top.
nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  coord_flip()
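Why does reorder(distributor, n) put the biggest bar at the top? reorder() (from base R’s stats package) turns distributor into a factor whose levels are sorted by n, smallest first. After coord_flip(), the first level sits at the bottom of the vertical axis. Here is a sketch with three hypothetical values:

```r
distributor <- c("ABC", "HBO", "Netflix")
n <- c(5, 12, 8)

ascending <- reorder(distributor, n)    # levels sorted by n, smallest first
levels(ascending)                       # "ABC" "Netflix" "HBO"

descending <- reorder(distributor, -n)  # negating n flips the order
levels(descending)                      # "HBO" "Netflix" "ABC"
```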
Well, the data are not at all similar to the original plot, but we can pretend and make it look as similar as possible. First, use theme_minimal() to get rid of the grey background, and change the colour scheme to match the original plot with scale_fill_manual().
nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  scale_fill_manual(values = c(cable = "blue", broadcast = "yellow", streaming = "red")) +
  coord_flip() +
  theme_minimal()
Arggh, that is terrible! Generic red, yellow and blue don’t work. Let’s look up colours that are closer to the original. I found a handy tool that lets you upload an image and tells you the hex values of its colour palette. Check it out at imagecolorpicker.
nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  scale_fill_manual(values = c(cable = "#2f5fa3", broadcast = "#e9b53b", streaming = "#b12727")) +
  coord_flip() +
  theme_minimal()
Better! Let’s move the legend and fix the axes. The functions in the ggeasy package make these kinds of things … easy!
nom2014count %>%
  ggplot(aes(x = reorder(distributor, n), y = n, fill = service)) +
  geom_col() +
  scale_fill_manual(values = c(cable = "#2f5fa3", broadcast = "#e9b53b", streaming = "#b12727")) +
  coord_flip() +
  theme_minimal() +
  easy_move_legend(to = "top") +
  easy_remove_legend_title() +
  labs(title = "Netflix Challenges TV Networks at the 2014 Emmys",
       subtitle = "Nominations at the 2014 Primetime Emmy Awards") +
  easy_remove_x_axis() +
  theme(axis.title.y = element_blank()) # removes the vertical (distributor) axis title
DONE!
Plot number 2 comes from Susie Lu.
This time we are filtering for just HBO and Netflix nominations in the years 2013 to 2017.
nom_red <- nom %>%
  filter(year > 2012 & year < 2018) %>%
  filter(type == "Nominee") %>%
  filter(distributor %in% c("HBO", "Netflix"))
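The year condition can also be written with dplyr’s between(), which includes both endpoints. The sketch below, on a hypothetical toy data frame, checks that the single filter() call gives the same rows as the chained version used above.

```r
library(dplyr)

# Hypothetical toy data
toy <- data.frame(
  year        = c(2012, 2013, 2015, 2017, 2018, 2019),
  distributor = c("HBO", "Netflix", "ABC", "HBO", "Netflix", "HBO")
)

# Chained version, as in the walkthrough
a <- toy %>%
  filter(year > 2012 & year < 2018) %>%
  filter(distributor %in% c("HBO", "Netflix"))

# between(year, 2013, 2017) keeps 2013 <= year <= 2017
b <- toy %>%
  filter(between(year, 2013, 2017), distributor %in% c("HBO", "Netflix"))
```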
Again, use tabyl() to get counts by year and distributor.
hbo_net_counts <- nom_red %>%
  tabyl(year, distributor)
head(hbo_net_counts)
## year HBO Netflix
## 2013 189 33
## 2014 12 8
## 2015 219 79
## 2016 198 95
## 2017 216 192
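With two variables, tabyl(year, distributor) builds a cross-tab: one row per year and one column per distributor. The sketch below builds a similar wide table with dplyr’s count() and tidyr’s pivot_wider(), a swapped-in alternative shown on hypothetical toy rows. It makes clear that the distributor values have become column names, which is exactly what the next step undoes.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: one row per nomination
toy <- data.frame(
  year        = c(2013, 2013, 2013, 2014, 2014),
  distributor = c("HBO", "HBO", "Netflix", "HBO", "Netflix")
)

wide <- toy %>%
  count(year, distributor) %>%               # long counts: year, distributor, n
  pivot_wider(names_from = distributor,      # distributor values become columns
              values_from = n,
              values_fill = 0L)              # zero where a year has no rows for a distributor
```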
Hmmmm, those data are in wide format, so we need to use pivot_longer() to gather the distributor names (HBO and Netflix) into one column and the counts into another.
hbo_net_long <- hbo_net_counts %>%
  pivot_longer(cols = HBO:Netflix, names_to = "distributor", values_to = "count")
glimpse(hbo_net_long)
## Rows: 10
## Columns: 3
## $ year <dbl> 2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017
## $ distributor <chr> "HBO", "Netflix", "HBO", "Netflix", "HBO", "Netflix", "HBO…
## $ count <dbl> 189, 33, 12, 8, 219, 79, 198, 95, 216, 192
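Here is the same reshaping on a two-row wide table built from the 2013 and 2014 counts printed above. Each year row becomes two rows, one per distributor. The old column names (HBO, Netflix) go into distributor, and their values go into count.

```r
library(tidyr)

# 2013 and 2014 counts from head(hbo_net_counts) above
wide <- data.frame(year = c(2013, 2014), HBO = c(189, 12), Netflix = c(33, 8))

long <- pivot_longer(wide,
                     cols = HBO:Netflix,         # columns to stack
                     names_to = "distributor",   # where the old column names go
                     values_to = "count")        # where the old cell values go
```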
Better! Now we want distributor to be a factor (rather than character), with the levels in the order Netflix then HBO (the default order is alphabetical).
hbo_net_long$distributor <- fct_relevel(hbo_net_long$distributor, c("Netflix", "HBO"))
levels(hbo_net_long$distributor)
## [1] "Netflix" "HBO"
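A quick sketch on a hypothetical vector shows the default alphabetical levels next to the relevelled ones. Base R’s factor() with an explicit levels argument gives the same order.

```r
library(forcats)

distributor <- c("HBO", "Netflix", "HBO", "Netflix")

default_levels <- levels(factor(distributor))                        # "HBO" "Netflix"
relevelled     <- fct_relevel(distributor, c("Netflix", "HBO"))      # character in, factor out
base_version   <- factor(distributor, levels = c("Netflix", "HBO"))  # base R equivalent
```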
Now we can plot!
hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position = "dodge")
Hmmm, the numbers are definitely off again (what is up with 2014?), but we can fix the styling.
First, theme and colours.
hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic()
Now add horizontal gridlines and make the bars sit on the x axis by removing the default padding around the y scale with expand = c(0, 0).
hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic() +
  theme(panel.grid.major.y = element_line()) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 250))
Drop the legend and add titles.
hbo_net_long %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic() +
  theme(panel.grid.major.y = element_line()) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 250)) +
  easy_remove_legend() +
  labs(title = "NETFLIX VS. HBO AT THE EMMYS",
       subtitle = "Tracking the number of nominations HBO and Netflix have earned since the \n streaming service's first original program in 2013")
Out of interest, has Netflix overtaken HBO since 2017? Remove the year < 2018 filter and plot again.
allyears <- nom %>%
  filter(year > 2012) %>%
  filter(type == "Nominee") %>%
  filter(distributor %in% c("HBO", "Netflix")) %>%
  tabyl(year, distributor) %>%
  pivot_longer(cols = HBO:Netflix, names_to = "distributor", values_to = "count")
glimpse(allyears)
## Rows: 18
## Columns: 3
## $ year <dbl> 2013, 2013, 2014, 2014, 2015, 2015, 2016, 2016, 2017, 2017…
## $ distributor <chr> "HBO", "Netflix", "HBO", "Netflix", "HBO", "Netflix", "HBO…
## $ count <dbl> 189, 33, 12, 8, 219, 79, 198, 95, 216, 192, 302, 251, 297,…
allyears$distributor <- fct_relevel(allyears$distributor, c("Netflix", "HBO"))
allyears$year <- as.factor(allyears$year) # discrete x axis, so every year gets its own label
allyears %>%
  ggplot(aes(x = year, y = count, fill = distributor)) +
  geom_col(position = "dodge") +
  scale_fill_manual(values = c(Netflix = "#ee696d", HBO = "#d71920")) +
  theme_classic() +
  theme(panel.grid.major.y = element_line()) +
  scale_y_continuous(expand = c(0, 0), limits = c(0, 500)) +
  easy_remove_legend() +
  labs(title = "NETFLIX VS. HBO AT THE EMMYS",
       subtitle = "Tracking the number of nominations HBO and Netflix have earned since the \n streaming service's first original program in 2013")
Wow, 2020 was a bumper year for Netflix!
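The plot answers the question by eye. To check it numerically, one option is to widen the long table again and compare the two columns year by year. The sketch below uses only the 2013 to 2017 counts printed earlier by head(hbo_net_counts). Running the same pipeline on allyears would cover every year. The netflix_ahead column name is just an illustration.

```r
library(dplyr)
library(tidyr)

# 2013 to 2017 counts, as printed by head(hbo_net_counts) earlier
long <- data.frame(
  year        = rep(2013:2017, each = 2),
  distributor = rep(c("HBO", "Netflix"), times = 5),
  count       = c(189, 33, 12, 8, 219, 79, 198, 95, 216, 192)
)

comparison <- long %>%
  pivot_wider(names_from = distributor, values_from = count) %>%  # one row per year
  mutate(gap = Netflix - HBO,                                     # negative means HBO ahead
         netflix_ahead = Netflix > HBO)
```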