This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assignment.
library("tidyverse")
# Read the pipe-delimited guest list, flag self-portrayals, and make season numeric
simpsons <- readr::read_delim("https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2019/2019-08-27/simpsons-guests.csv", delim = "|") %>%
  mutate(self = str_detect(role, "self|selves"),
         season = parse_number(season))
The dataset has 1200 observations of six variables: one integer variable and five character variables. The five character variables are number (episode number), production_code (production code for the episode), episode_title (title of the episode), guest_star (the guest’s actual name), and role (the role in the show, either a character or themselves). The only integer variable is season (season of the show). Each row represents a guest star in a given episode.
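One way to check variable types and the number of observations is sketched below. To keep the example runnable without a download, it uses a tiny made-up table (`toy`, with hypothetical rows that are not from the real data) that has the same six columns and goes through the same `mutate()` step as `simpsons`.

```r
library(tidyverse)

# Hypothetical stand-in rows; the real data would be the `simpsons` object
toy <- tibble(
  season          = c("1", "1", "2"),
  number          = c("002", "003", "014"),
  production_code = c("X01", "X02", "X03"),
  episode_title   = c("Episode A", "Episode B", "Episode C"),
  guest_star      = c("Guest One", "Guest Two", "Guest Three"),
  role            = c("Character X", "Himself", "Themselves")
) %>%
  mutate(self = str_detect(role, "self|selves"),
         season = parse_number(season))

glimpse(toy)                          # one line per column: name, type, first values
dims <- dim(toy)                      # number of rows, then number of columns
col_classes <- map_chr(toy, class)    # class of every column as a named vector
```

Running the same three calls on `simpsons` would show its row count and column classes. Note that `parse_number()` returns a double, so R may report season as numeric rather than integer after the `mutate()` step, and the added `self` column is logical.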
Hint: One graph of your choice.
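simpsons %>%
  # Keep only the rows where the guest played themselves
  filter(self) %>%
  # Count appearances per guest and keep those with more than one
  count(guest_star, sort = TRUE) %>%
  filter(n > 1) %>%
  # Order the bars by number of appearances
  mutate(guest_star = fct_reorder(guest_star, n)) %>%
  ggplot(aes(guest_star, n)) +
  geom_col() +
  coord_flip() +
  labs(title = "Who has played themselves in multiple Simpsons episodes?")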
This graph shows each guest star who has played themselves in more than one Simpsons episode, along with the number of times they have done so. Stephen Hawking has played himself on the show the most times, with 4. The filter function was vital in making this graph: filter(self) keeps only the rows where the role contains “self” or “selves”, and filter(n > 1) keeps only the guests who have more than one such appearance. The graph also shows that Ken Burns, Stephen Hawking, and Gary Coleman have each played themselves on the show more than twice.
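The two filtering steps can be tried in isolation on a small made-up table. This is only a sketch: the guest names and roles in `toy_roles` are hypothetical, and the `self` column is built the same way as in the code above.

```r
library(tidyverse)

# Hypothetical rows: Guest A plays himself twice and a character once
toy_roles <- tibble(
  guest_star = c("Guest A", "Guest A", "Guest B", "Guest C", "Guest A"),
  role       = c("Himself", "Himself", "Herself", "Character X", "Character Y")
) %>%
  mutate(self = str_detect(role, "self|selves"))

repeat_selves <- toy_roles %>%
  filter(self) %>%                      # drops the two character roles
  count(guest_star, sort = TRUE) %>%    # Guest A: 2, Guest B: 1
  filter(n > 1)                         # keeps only Guest A

print(repeat_selves)
```

Only Guest A remains, with n equal to 2, because the first filter removes roles that are not self-portrayals and the second removes guests with a single appearance.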