This is an extension of the tidytuesday assignment you have already done. Complete the questions below, using the screencast you chose for the tidytuesday assigment.

Import data

library(tidyverse)
library(schrute)
theme_set(theme_light())
office_transcripts <- as_tibble(theoffice) %>%
  mutate(season = as.integer(season),
         episode = as.integer(episode)) %>%
  mutate(character = str_remove_all(character, '"')) %>%
  mutate(name = str_to_lower(str_remove_all(episode_name, "\\.| \\(Part.*")))
office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv') %>%
  mutate(name = str_to_lower(str_remove_all(title, "\\.| \\(Part.*|\\: Part.*")))

Description of the data and definition of variables

the Data is ratings and scripts for The Office. The varriables are, ibm rating, the season the episode and characters lines.

Visualize data

Hint: One graph of your choice.

library(ggrepel)
office_ratings %>%
  group_by(season) %>%
  summarize(avg_rating = mean(imdb_rating)) %>%
  ggplot(aes(season, avg_rating)) +
  geom_line() +
  scale_x_continuous(breaks = 1:9)

office_ratings %>%
  mutate(title = fct_inorder(title),
         episode_number = row_number()) %>%
  ggplot(aes(episode_number, imdb_rating)) +
  geom_line() +
  geom_smooth() +
  geom_point(aes(color = factor(season), size = total_votes)) +
  geom_text(aes(label = title), check_overlap = TRUE, hjust = 1) +
  expand_limits(x = -10) +
  theme(panel.grid.major.x = element_blank(),
        legend.position = "none") +
  labs(x = "Episode number",
       y = "IMDB Rating",
       title = "Popularity of The Office episodes over time",
       subtitle = "Color represents season, size represents # of ratings")

What is the story behind the graph?

The story behind the graphed data is a display of the very funny sitcom called “The Office”. The data shows the ratings for the ratings and the scripts. It is broken up by diologue from characters and by season and episodes. it graphs the data by the rating, scripts, season and episode of which season of all the 188 episodes in total.

Hide the messages, but display the code and its results on the webpage.

Write your name for the author at the top.

Use the correct slug.