This is an extension of the TidyTuesday assignment you have already done. Complete the questions below, using the screencast you chose for the TidyTuesday assignment.
library(tidyverse)
library(schrute)
theme_set(theme_light())
office_transcripts <- as_tibble(theoffice) %>%
  mutate(season = as.integer(season),
         episode = as.integer(episode)) %>%
  mutate(character = str_remove_all(character, '"')) %>%
  mutate(name = str_to_lower(str_remove_all(episode_name, "\\.| \\(Part.*")))
office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv') %>%
  mutate(name = str_to_lower(str_remove_all(title, "\\.| \\(Part.*|\\: Part.*")))
The data are the ratings and scripts for The Office. The variables are the IMDb rating, the season, the episode, and the characters' lines.
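The `name` column built in both tables above is a cleaned-up episode title. A minimal sketch of what that cleaning does is given here; the helper `normalize_title()` and the sample titles are invented for illustration and are not taken from either data set. The pattern is the one applied to `office_ratings`.

```r
library(stringr)

# Hypothetical helper wrapping the same pattern used for office_ratings above:
# drop literal dots and any " (Part ..." or ": Part ..." suffix, then lowercase.
normalize_title <- function(x) {
  str_to_lower(str_remove_all(x, "\\.| \\(Part.*|\\: Part.*"))
}

# Made-up titles covering the three cases the pattern handles
sample_titles <- c("Pilot.", "The Job (Part 1)", "Big Day: Part 2")

keys <- normalize_title(sample_titles)
print(keys)
# "pilot" "the job" "big day"
```

Reducing each title to a lowercase key like this means small differences in punctuation or "Part" suffixes no longer stop the same episode from sharing a `name` in both tables.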
Hint: One graph of your choice.
library(ggrepel)
office_ratings %>%
  group_by(season) %>%
  summarize(avg_rating = mean(imdb_rating)) %>%
  ggplot(aes(season, avg_rating)) +
  geom_line() +
  scale_x_continuous(breaks = 1:9)
office_ratings %>%
  mutate(title = fct_inorder(title),
         episode_number = row_number()) %>%
  ggplot(aes(episode_number, imdb_rating)) +
  geom_line() +
  geom_smooth() +
  geom_point(aes(color = factor(season), size = total_votes)) +
  geom_text(aes(label = title), check_overlap = TRUE, hjust = 1) +
  expand_limits(x = -10) +
  theme(panel.grid.major.x = element_blank(),
        legend.position = "none") +
  labs(x = "Episode number",
       y = "IMDB Rating",
       title = "Popularity of The Office episodes over time",
       subtitle = "Color represents season, size represents # of ratings")
The story behind the graphed data is that of the very funny sitcom “The Office”. The data contain the IMDb ratings and the scripts for the show. The scripts are broken up by each character's dialogue and by season and episode. The graph plots the rating of each of the 188 episodes in order, with color showing the season and point size showing the number of ratings.