Choose one of David Robinson’s tidytuesday screencasts, watch the video, and summarise. https://www.youtube.com/channel/UCeiiqmVK07qhY-wvg3IZiZQ

Instructions

You must follow the instructions below to get credits for this assignment.

Q1 What is the title of the screencast?

Tidy Tuesday screencast: analyzing ratings and scripts from The Office

Q2 When was it published?

March 16, 2020

Q3 Describe the data

Hint: What’s the source of the data? What does each row represent? How many observations are there? What are the variables, and what do they mean?

The source of the data is an R package called “schrute”. To access this data, Dave loaded the packages with library(tidyverse) and library(schrute). He then loaded the transcripts and saved them as an object with office_transcripts <- as_tibble(theoffice). Finally, he read in the ratings and saved them as an object with office_ratings <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-03-17/office_ratings.csv')

Q4-Q5 Describe how Dave approached each step of the analysis.

Hint: For example, importing data, understanding the data, data exploration, etc.

To better understand the ratings data, Dave first organized the information by season. He showed that the ratings were best around season 4 and worst in season 8, after Steve Carell left the show. He then plotted each episode and color coded it by its rating to show which episodes had the best and worst reviews. For the transcripts, Dave excluded cast members who were not very significant to the show, such as characters with only one line. He left out certain words and kept only characters with at least a certain number of lines in the show. One of the things Dave was trying to do was find out which characters said certain words the most, for example by counting how many times Dwight and Jim say “Michael”, which they do a lot.
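The steps described above can be sketched roughly as follows. This is only a minimal illustration, not Dave's actual code: the two small tables are made-up toy data, the column names (season, imdb_rating, character, text) are assumptions about how the real data is laid out, and min_lines is a hypothetical threshold chosen for the example.

```r
library(dplyr)

# Toy stand-in for the ratings table (values invented for illustration)
ratings <- tibble(
  season = c(1, 1, 2, 2),
  episode = c(1, 2, 1, 2),
  imdb_rating = c(7.0, 8.0, 8.5, 9.0)
)

# Average rating per season
season_avg <- ratings %>%
  group_by(season) %>%
  summarise(avg_rating = mean(imdb_rating))

# Toy stand-in for the transcripts: one row per spoken line
lines <- tibble(
  character = c("Dwight", "Dwight", "Jim", "Jim", "Jim", "Delivery Man"),
  text = c("Michael!", "Michael, I need you.", "Hey Michael.",
           "Sure.", "Michael said no.", "Sign here.")
)

# Keep only characters with at least `min_lines` lines (hypothetical cutoff)
min_lines <- 2
main_cast <- lines %>%
  add_count(character, name = "n_lines") %>%
  filter(n_lines >= min_lines)

# Count how many of each character's lines mention "Michael"
michael_mentions <- main_cast %>%
  mutate(says_michael = grepl("michael", text, ignore.case = TRUE)) %>%
  group_by(character) %>%
  summarise(michael_lines = sum(says_michael))

print(season_avg)
print(michael_mentions)
```

With the real data, the same pattern would be applied to office_ratings and office_transcripts instead of the toy tables.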

Q6 Did you see anything in the video that you learned in class? Describe.

Something in the video that we learned in class is how to create a plot, such as a line graph, using a geom function. Using the ratings, Dave created a line graph showing the rating for each episode. We did a similar analysis in class with different data but used the same process. Dave then color coded the data that he wanted to stand out from the rest. In class we also learned how to color code data and add x and y labels to a graph.
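As a rough illustration of that kind of plot, here is a small sketch using ggplot2. The data frame is made-up toy data and the column names (episode_number, season, imdb_rating) are assumptions for the example, so this is not the exact plot from the video.

```r
library(ggplot2)

# Toy ratings data (values invented for illustration)
ratings <- data.frame(
  episode_number = 1:6,
  season = factor(c(1, 1, 1, 2, 2, 2)),
  imdb_rating = c(7.5, 8.0, 7.8, 8.4, 8.7, 8.2)
)

# Line graph of rating by episode, with points color coded by season
p <- ggplot(ratings, aes(x = episode_number, y = imdb_rating)) +
  geom_line() +
  geom_point(aes(color = season)) +
  labs(
    x = "Episode number",
    y = "IMDb rating",
    color = "Season",
    title = "Ratings by episode (toy data)"
  )

print(p)
```

The geom_line() layer draws the line, geom_point() adds the colored points, and labs() sets the axis labels and title.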

Q7 What is a major finding from the analysis?

The major finding that I found most interesting was how the ratings compared to the number of views. One of the worst-rated episodes had a significant number of views, and it was just a clip show, an episode made up of clips from previous episodes. Dave mentions that it does not surprise him that this episode was rated so poorly, because people do not like clip shows. In seasons 8 and 9, the view count was much lower than in previous seasons because Steve Carell had left the show. The last 3 episodes of the series had more views because it was the end of the series. Dave also showed that each season finale seemed to have more views than the other episodes in its season.

Q8 What is the most interesting thing you really liked about the analysis?

The most interesting thing that I found was that, when analyzing the transcripts, the analysis was able to pick up words that were not real words. In one episode of The Office, the cast sings the Christmas song “Little Drummer Boy”, and part of the lyrics is “pum”, like a drum sound. The analysis picked up that Dwight Schrute said this word far more often than would be expected for a single character. It amazed me that the functions Dave used could tie specific words to specific characters. The Office is one of my favorite shows, so it was cool to see the data picking up on things you would not think of.

Q9 Display the title and your name correctly at the top of the webpage.

Q10 Use the correct slug.