This is an extension of the TidyTuesday assignment you have already done. Complete the questions below using the screencast you chose for the TidyTuesday assignment.
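library(tidyverse)  # dplyr, ggplot2, forcats, readr and friends
library(schrute)    # full transcript of The Office as the `theoffice` data set
theme_set(theme_light())
# Transcript data: one row per line of dialogue
office_transcripts <- as_tibble(theoffice)
# Episode-level IMDb ratings from the 2020-03-17 TidyTuesday release
office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv')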
The “schrute” package contains the entire transcript of The Office. Its theoffice data set has the columns index, season, episode, episode_name, director, writer, character, text, text_w_direction, imdb_rating, total_votes, and air_date. “index” is the row index. “season” is the season the episode belongs to, and “episode” is the episode number. “episode_name” is the title of the episode. “director” is the person who directed the episode, and “writer” lists the episode's writers. “character” is the name of the character speaking the line, “text” is the words that character speaks, and “text_w_direction” is the same dialogue with stage directions included. “imdb_rating” is the episode's score on IMDb, on a scale from 1 (worst) to 10 (best). “total_votes” is the total number of people who rated that episode, and “air_date” is the date the episode was first televised. Together, these columns can be used to examine the ratings of each episode and season, which is what David Robinson did in the TidyTuesday screencast.
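As a sketch of how the transcript table can be turned into per-episode and per-season ratings, the code below collapses theoffice to one row per episode and then summarises by season. It assumes that imdb_rating and total_votes are repeated on every dialogue line of an episode, so that distinct() leaves one row per episode; the object names office_episodes and season_summary are illustrative, not from the screencast.

```r
library(tidyverse)
library(schrute)

# theoffice has one row per line of dialogue. Assumption: imdb_rating and
# total_votes are repeated on every line of an episode, so distinct()
# collapses the table to one row per episode.
office_episodes <- as_tibble(theoffice) %>%
  distinct(season, episode, episode_name, imdb_rating, total_votes)

glimpse(office_episodes)

# Per-season summary: average episode rating, total votes cast, episode count
season_summary <- office_episodes %>%
  group_by(season) %>%
  summarise(avg_rating = mean(imdb_rating, na.rm = TRUE),
            total_votes = sum(total_votes, na.rm = TRUE),
            n_episodes = n(),
            .groups = "drop")

season_summary
```

Working at the episode level first avoids weighting talkative episodes more heavily, since an episode with more lines would otherwise contribute more rows to the season average.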
Hint: One graph of your choice.
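office_ratings %>%
  # Keep titles in air order and number episodes 1..n across the whole series
  mutate(title = fct_inorder(title),
         episode_number = row_number()) %>%
  ggplot(aes(episode_number, imdb_rating)) +
  geom_line() +
  # Smoothed trend of ratings over the series (loess by default for this many points)
  geom_smooth() +
  # Points colored by season and sized by number of IMDb votes
  geom_point(aes(color = factor(season), size = total_votes)) +
  # Label episodes, dropping labels that would overlap
  geom_text(aes(label = title), check_overlap = TRUE, hjust = 1) +
  # Leave room on the left for the right-aligned labels
  expand_limits(x = -10) +
  theme(panel.grid.major.x = element_blank(),
        legend.position = "none") +
  labs(x = "Episode Number",
       y = "IMDB Rating",
       title = "The Office Popularity Over Time",
       subtitle = "Color represents season, size represents number of votes")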
This graph shows the popularity of each episode over the course of the series. Each point is colored by season, so you can see which season an episode belongs to, and its size reflects how many IMDb votes the episode received: smaller points mean fewer votes and larger points mean more. The line connecting the points shows how each episode was rated relative to the others; the higher an episode sits on the y-axis, the higher its rating, and vice versa. My version of the graph is harder to read than the one in the screencast, where David Robinson was able to zoom in and enlarge the plot so every label was visible. Even so, it is still easy to follow for an Office fan. The smooth line running through the middle of the graph, drawn by geom_smooth(), shows the overall trend in ratings across the series.

Seasons 3, 4, and 5 seem to have the most votes, which suggests the most viewers, while seasons 7, 8, and 9 have the fewest. One of the lowest rated episodes was “The Banker”, a clip show that mostly recapped scenes from earlier episodes of The Office. There was nothing new in it, and people tend to rate clip shows poorly. Surprisingly, one of the highest rated episodes was the series finale. This surprised me because seasons 8 and 9 had poor ratings, yet the ratings picked up again toward the end of the series. It makes me wonder whether some people only watched and rated the final episode because they knew the series was ending.

I chose this graph over the others because it clearly shows how The Office performed over time. The Office is one of my favorite shows, and it was interesting to see how its reception changed. Using the graph, I could think about the episodes that were rated very poorly or very highly and remember exactly why they were rated that way.
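The observations about the lowest and highest rated episodes and about which seasons drew the most votes can be checked directly against the data behind the graph. This is a minimal sketch that re-reads the same TidyTuesday CSV; it assumes dplyr 1.0.0 or later for slice_min() and slice_max(), which keep ties, so more than three rows may come back. The names lowest, highest, and votes_by_season are illustrative.

```r
library(tidyverse)

# Same TidyTuesday file used for the graph
office_ratings <- readr::read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv"
)

# Three lowest- and highest-rated episodes (ties are kept, so more rows may appear)
lowest <- office_ratings %>%
  slice_min(imdb_rating, n = 3) %>%
  select(season, episode, title, imdb_rating)

highest <- office_ratings %>%
  slice_max(imdb_rating, n = 3) %>%
  select(season, episode, title, imdb_rating)

# Point sizes in the graph show votes per episode; averaging them by season
# makes the season-to-season comparison explicit
votes_by_season <- office_ratings %>%
  group_by(season) %>%
  summarise(mean_votes = mean(total_votes),
            mean_rating = mean(imdb_rating),
            .groups = "drop") %>%
  arrange(desc(mean_votes))

lowest
highest
votes_by_season
```

Averaging votes per episode, rather than summing them, keeps seasons with more episodes from looking more popular simply because they are longer.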
David Robinson did a great job explaining what he was doing as he built the graph and how each function helped make the graph clearer to viewers.