Visualizing Trends in The Office

Introduction

This report seeks to answer the following question:

Is there a relationship between the overall viewership of the show “The Office” and the show's ratings? While answering this question, we will also examine the data against outside factors that may have affected viewership over time or during specific seasons.

We will be using a data set called office_ratings, obtained from https://github.com/rfordatascience/tidytuesday/blob/main/data/2020/2020-03-17/readme.md and https://en.wikipedia.org/wiki/List_of_The_Office_(American_TV_series)_episodes. It contains 186 entries, which cover most of the episodes in the series. There are 7 variables for each episode; the relevant variables for this report include viewers (the total number of viewers, in millions, on the original air date), imdb_rating (the average fan rating on IMDb.com, from 1 to 10), and total_votes (the number of ratings on IMDb.com). Other variables, such as season (the season in which the episode aired) and episode (the episode number within the season), were also used in creating visualizations. The full data set can be viewed below:

Throughout, we will need the functionality of the tidyverse package, mainly to create visualizations.

library(tidyverse)

Distribution of Viewers, Average IMDb Rating, and Number of Ratings

The first trend we examine is the overall distribution of viewers across all episodes of The Office. The graph was created with the ggplot function, using a frequency polygon:

ggplot(data=office_ratings,mapping=aes(x=viewers))+
  geom_freqpoly()+
  labs(title="Distribution of Viewers",
       x="Viewers (In Millions)",
       y="Number of Episodes")

This graph shows that the distribution is right skewed: most episodes drew between 5 million and 8 million viewers, with a long tail toward higher values. The peak of the graph sits at around 7.5 million, making that the most common number of viewers on the original air date. The following graph shows the distribution of each episode's average IMDb rating; again, it was created with a frequency polygon:

ggplot(data=office_ratings,mapping=aes(x=imdb_rating))+
  geom_freqpoly()+
  labs(title="Average IMDb Rating",
       x="Average Rating",
       y="Number of Episodes")

The distribution is roughly normal, with a major peak around 8.2, where about 25 episodes fall. There are also smaller peaks at 8.7 and 9.3, which indicates clusters of episodes at each of those ratings. The last distribution in this section shows the total number of IMDb ratings per episode. This graph was also created with a frequency polygon:

ggplot(data=office_ratings,mapping=aes(x=total_votes))+
  geom_freqpoly()+
  labs(title="Distribution of Total IMDb Fan Ratings",
       x="Total Votes",
       y="Number of Episodes")

This graph is also right skewed, similar to the distribution of viewers. The peak sits at roughly 1,800 fan ratings, and the number of episodes slowly decreases from there. There are two outliers, one with about 5,900 ratings and one with close to 8,000. Overall, most episodes sit within the 1,500 to 2,500 range.
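The three frequency polygons in this section rely on the default of 30 bins, and where a peak appears depends on how the bins are drawn. A minimal sketch of setting the bin width explicitly follows. The toy_viewers tibble and its values are made up for illustration and are not the office_ratings data, and the 1-million-viewer bin width is an assumption rather than a setting used in this report.

```r
library(tidyverse)

# Hypothetical stand-in data (NOT the real office_ratings values)
toy_viewers <- tibble(
  viewers = c(4.2, 5.1, 5.8, 6.3, 7.1, 7.4, 7.6, 8.0, 8.9, 22.9)
)

# binwidth = 1 groups episodes into bins that are 1 million viewers wide
p_viewers <- ggplot(data = toy_viewers, mapping = aes(x = viewers)) +
  geom_freqpoly(binwidth = 1) +
  labs(title = "Distribution of Viewers",
       x = "Viewers (In Millions)",
       y = "Number of Episodes")

p_viewers
```

Trying a few bin widths is a quick way to check whether a peak is a feature of the data or of the binning.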

Relationship Between Viewership and IMDb Rating

With a better understanding of viewership, average IMDb ratings, and number of fan ratings, we can now compare them with one another. First, a scatter plot layered with a smoothed trend curve from geom_smooth, which we treat as the line of best fit, shows the relationship between viewership and average IMDb rating. This graph appears as follows:

ggplot(data=office_ratings,mapping=aes(x=viewers,y=imdb_rating))+
  geom_point()+
  geom_smooth(mapping = aes(x = viewers, y = imdb_rating), se=FALSE)+
  labs(title="Rating in Comparison to Viewers",
       x="Viewers",
       y="Average IMDb Rating")

As this graph demonstrates, there is a slight positive correlation between the number of viewers and the average IMDb rating. Because the trend curve is not very consistent, the relationship is weak, and a correlation alone does not show that higher viewership causes higher ratings. There are also a few exceptions to this pattern, as shown in the following graph:

ggplot(data=office_ratings,mapping=aes(x=viewers,y=imdb_rating))+
  geom_point(aes(color=season))+
  geom_smooth(mapping = aes(x = viewers, y = imdb_rating), se=FALSE)+
  labs(title="Rating in Comparison to Viewers",
       x="Viewers",
       y="Average IMDB Rating",
       color="Season")

As seen in this graph, there is an episode from season five that received both high ratings and high viewership. Three episodes from season nine received higher ratings than the line of best fit would predict. The first episode of season one has a lower rating than expected for the number of viewers it received. One episode each from seasons six and eight sits at a similar point.
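The strength of the relationship described above can also be put into a number. The sketch below computes a Pearson correlation with cor() and draws a straight least-squares line by passing method = "lm" to geom_smooth. The toy_ratings tibble is made-up illustrative data, not the office_ratings values, and the choice of a linear fit is an assumption, not the method used for the graphs in this report.

```r
library(tidyverse)

# Hypothetical stand-in data (NOT the real office_ratings values)
toy_ratings <- tibble(
  viewers     = c(4.5, 5.2, 6.0, 6.8, 7.4, 8.1, 8.9, 9.3),
  imdb_rating = c(7.8, 8.4, 7.9, 8.3, 8.1, 8.8, 8.2, 8.7)
)

# Pearson correlation: values near 0 are weak, values near 1 or -1 are strong
r_value <- cor(toy_ratings$viewers, toy_ratings$imdb_rating)
r_value

# method = "lm" fits a straight line instead of the default smoothed curve
p_fit <- ggplot(data = toy_ratings, mapping = aes(x = viewers, y = imdb_rating)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Rating in Comparison to Viewers",
       x = "Viewers",
       y = "Average IMDb Rating")

p_fit
```

A coefficient gives a single summary of how tightly the points follow the line, which is harder to judge by eye from a scatter plot.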

Relationship Between Viewership and Total Number of IMDb Ratings

After comparing viewership to the average rating, we shift the focus to the total number of IMDb votes. Again using ggplot with a scatter plot, the following graph depicts viewers compared to the total number of ratings:

ggplot(data=office_ratings,mapping=aes(x=viewers,y=total_votes))+
  geom_point()+
  geom_smooth(se=FALSE)+
  labs(title="Number of Ratings Compared to Viewers",
       x="Viewers",
       y="Total Number of Ratings")

This graph shows a positive correlation between viewership and the total number of fan ratings on IMDb. The relationship is stronger than the one between average rating and viewership, but it is still not a strong one. There are also some exceptions to this pattern, as the next visualization shows:

ggplot(data=office_ratings,mapping=aes(x=viewers,y=total_votes))+
  geom_point(aes(color=season))+
  geom_smooth(se=FALSE)+
  labs(title="Number of Ratings in Comparison to Viewers",
       x="Viewers",
       y="Total Number of Ratings",
       color="Season")

With color added to the existing plot, we can see an episode in season nine that has a large number of ratings considering the number of viewers it received. Similarly, an episode in season seven also had more ratings than expected based on the line of best fit.
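Episodes with unusually many ratings for their viewership can also be found without reading them off the plot. One possible approach, sketched below, is to compute the number of votes per million viewers and sort by it. The toy_votes tibble and the votes_per_million column are hypothetical names with made-up values, not the office_ratings data.

```r
library(tidyverse)

# Hypothetical stand-in data (NOT the real office_ratings values)
toy_votes <- tibble(
  season      = c(2, 4, 7, 9),
  episode     = c(1, 5, 22, 23),
  viewers     = c(9.0, 8.5, 8.4, 5.7),
  total_votes = c(2700, 2125, 5880, 7410)
)

# Votes per million viewers, with the most-rated-for-their-audience episodes first
votes_ranked <- toy_votes %>%
  mutate(votes_per_million = total_votes / viewers) %>%
  arrange(desc(votes_per_million))

votes_ranked
```

The rows at the top of the sorted table are the candidates for exceptions to the overall pattern.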

Popularity and Appeal of The Office Over Time

Now that we have looked at overall viewership and how it compares to IMDb interactions and ratings, we can turn to the popularity and appeal of the show throughout the series' run. Using box plots, the following graph shows viewership for each season:

ggplot(data = office_ratings,mapping = aes(x = season, y = viewers)) +
  geom_boxplot() +
  labs(title = "Viewers in Comparison to Season",
       x = "Season",
       y = "Viewers")

The visualization above allows each season to be examined individually and in comparison to the surrounding seasons. Based on the graph, the show was most popular from the second season through the fifth. There is a slight decline in viewership between seasons six and seven, followed by a drastic decline in seasons eight and nine. Overall, The Office grew in popularity from season one until seasons three and four; from then on its growth slowed, and its popularity then declined drastically. Replacing viewers with imdb_rating lets us ask a follow-up question: how does the overall appeal of the show change in relation to its popularity? The following graph, again using box plots, addresses this:

ggplot(data = office_ratings,mapping = aes(x = season, y = imdb_rating)) +
  geom_boxplot() +
  labs(title = "IMDb Rating in Comparison to Season",
       x = "Season",
       y = "IMDb Rating")

This visualization shows that the average IMDb rating also increases from season one to a peak in seasons three and four. It then decreases slightly through season six, increases in season seven, declines drastically in season eight, and improves slightly in season nine.
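One detail worth noting about the box plots above: geom_boxplot draws one box per season only when the x variable is discrete. If season were stored as a number, ggplot2 would warn about a continuous x aesthetic and draw a single box. We do not state here how season is stored in office_ratings, so the sketch below is only an assumption-based illustration, using made-up toy_seasons data and wrapping season in factor() to force a discrete axis.

```r
library(tidyverse)

# Hypothetical stand-in data (NOT the real office_ratings values),
# with season deliberately stored as a number
toy_seasons <- tibble(
  season  = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  viewers = c(5.0, 5.5, 6.0, 7.5, 8.0, 8.5, 8.0, 8.5, 9.0)
)

# factor(season) makes the x axis discrete, giving one box per season
p_box <- ggplot(data = toy_seasons, mapping = aes(x = factor(season), y = viewers)) +
  geom_boxplot() +
  labs(title = "Viewers in Comparison to Season",
       x = "Season",
       y = "Viewers")

p_box
```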

Appeal and popularity follow different patterns over the show's run. Popularity increases, peaks in seasons three and four, and then declines through the end of the series. Appeal also increases until it peaks in seasons three and four, but the peak is followed by two seasons of decline, an increase in season seven, a decrease in season eight, and a slight increase in season nine. The increase in season nine could be explained by loyal fans who continued to rate the show during its decline in popularity, which would account for ratings rising while viewership fell. A main character left the series at the end of season seven, which could explain the drastic dip in viewership. Those “hardcore” fans giving the show its love on IMDb after the character's departure may have allowed the appeal of the show to improve.

Lastly, using geom_smooth we can compare viewership across the episodes of each individual season:

ggplot(data = office_ratings,mapping = aes(x = episode,y=viewers)) +
  geom_smooth(aes(color=season),se=FALSE)+
  labs(title = "Viewership Throughout Each Season",
       x = "Episode",
       y = "Viewership (In Millions)",
       color="Season")

This final graph shows how viewership changed over the episodes of each season, with seasons distinguished by the color aesthetic. Seasons two through seven sit at roughly similar levels of viewership. Season one started off very strong and then decreased drastically over its six episodes. The season five episode that aired after the Super Bowl, which drew over 20 million viewers, has a large impact on the shape of that season's curve. Season eight started with lower viewership than previous seasons and continued to decrease throughout the season; this could be because a main character left at the end of season seven, resulting in fewer people coming back to watch the show. Lastly, season nine had the lowest viewership of any season throughout its run, except for a small spike during the series finale.
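The season-level comparisons in this section can also be checked numerically. A minimal sketch, assuming a per-season median is a reasonable summary, groups the episodes by season and summarises viewers and imdb_rating side by side. The toy_episodes tibble contains made-up values, not the office_ratings data, and the output column names are hypothetical.

```r
library(tidyverse)

# Hypothetical stand-in data (NOT the real office_ratings values)
toy_episodes <- tibble(
  season      = c(1, 1, 1, 2, 2, 2),
  viewers     = c(5.0, 6.0, 7.0, 8.0, 8.5, 9.0),
  imdb_rating = c(7.5, 8.0, 8.5, 8.2, 8.4, 8.6)
)

# One row per season: popularity (viewers) next to appeal (rating)
season_summary <- toy_episodes %>%
  group_by(season) %>%
  summarise(
    median_viewers = median(viewers),
    median_rating  = median(imdb_rating),
    episodes       = n()
  )

season_summary
```

Reading the two median columns together shows whether popularity and appeal rise and fall in the same seasons.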

Conclusion

In summary, we can conclude that there are relationships among the average IMDb rating, the number of IMDb ratings, and the original air date viewers for episodes of The Office, yet some of these relationships may not be strong enough to claim that one directly causes another. While the number of viewers doesn’t directly determine the appeal of an episode, it does appear to affect the number of ratings an episode receives. As seen in this report, The Office was and still is a very popular show that not only received but continues to receive good ratings.