Introduction

The Office is a popular TV series that began in 2005 and ran through 2013. Over the course of its run it had many successes as well as several failures. In this report I provide graphics and explanations of why viewership changed at certain points in time and how that affected ratings and the overall popularity of this hit TV show.

Packages

I have loaded the tidyverse package in order to produce effective and clean graphics.

library(tidyverse)
library(DT)  # provides datatable(), used to display the data set

Data Set

The following data set was sourced from GitHub and Wikipedia and is the data used throughout this report. It contains 186 entries with the variables season, episode, title, viewers, IMDB rating, total votes, and air date. The variables used most in this report are season, viewers, IMDB rating, and total votes.

datatable(office_ratings, options = list(scrollX = TRUE))

Continuous Variable Distribution

ggplot(data = office_ratings) +
  geom_point(mapping = aes(x = imdb_rating, y = total_votes, color = viewers)) +
  labs(title = "Continuous Variable Distribution",
       x = "IMDB Rating",
       y = "# of Votes",
       color = "# of Viewers")

The distribution of the three continuous variables (IMDB rating, number of viewers, and number of votes) shows a natural relationship among them. When ratings are at their lowest, so are the total numbers of votes and viewers. The same holds at the top end: the higher the rating, the higher the numbers of viewers and votes. The graph also shows a few extreme outliers. We will explore why these might have occurred using the following two graphics.
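
One way to put a number on the relationship described above is a correlation matrix for the three variables. The following is only a minimal sketch: `rating_correlations()` is a hypothetical helper, and `toy_ratings` is a small invented stand-in so the example runs on its own. It does not contain real values from `office_ratings`.

```r
# Toy stand-in for office_ratings: these values are invented for
# illustration and are NOT the real episode data.
toy_ratings <- data.frame(
  imdb_rating = c(7.6, 8.2, 8.8, 9.4, 8.0),
  total_votes = c(1500, 2000, 2900, 5700, 1800),
  viewers     = c(5.2, 7.9, 9.1, 8.5, 4.1)
)

# Hypothetical helper: pairwise Pearson correlations between the three
# continuous variables, rounded to two decimals for readability.
rating_correlations <- function(df) {
  vars <- c("imdb_rating", "total_votes", "viewers")
  round(cor(df[, vars], use = "complete.obs"), 2)
}

corr <- rating_correlations(toy_ratings)
print(corr)
```

Calling `rating_correlations(office_ratings)` on the real data would give the values that matter for this report. Values close to 1 would support the pattern seen in the scatter plot, while values near 0 would suggest the relationship is weaker than it looks.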

Outliers and Why They Occur

ggplot(data = office_ratings) +
  geom_point(mapping = aes(x = viewers, y = imdb_rating)) +
  labs(title = "Number of Viewers in Relation to TV Rating",
       x = "# of Viewers",
       y = "IMDB Rating")

Looking at the number of viewers compared to the IMDB ratings, we can see that most of the time the more viewers The Office has, the higher the IMDB rating is. There are, however, a few exceptions. Adding the season variable shows that some seasons contain episodes that were watched by many viewers but still received poor ratings. The season-colored graph also shows that earlier seasons tend to be rated higher, while later seasons are rated lower and draw fewer viewers. Think of how many times you have started watching a series, found you did not like it, and stopped watching. This is likely what happened as the seasons went on.

ggplot(data = office_ratings) +
  geom_point(mapping = aes(x = viewers, y = imdb_rating, color = season)) +
  labs(title = "Number of Viewers Based off Season and Rating",
       x = "# of Viewers",
       y = "IMDB Rating",
       color = "Season")
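
The season pattern in the scatter plot can also be checked with a per-season summary table. This is a sketch under stated assumptions: `season_summary()` is a hypothetical helper, and `toy_ratings` holds invented values that only mimic the column names of `office_ratings`.

```r
library(dplyr)

# Toy stand-in for office_ratings: invented values, NOT the real data.
toy_ratings <- tibble(
  season      = c(1, 1, 2, 2, 9, 9),
  imdb_rating = c(7.6, 8.2, 8.8, 8.4, 7.8, 8.0),
  viewers     = c(11.2, 6.0, 9.0, 8.2, 4.3, 3.9)
)

# Hypothetical helper: one row per season with the episode count,
# the mean IMDB rating, and the mean number of viewers.
season_summary <- function(df) {
  df %>%
    group_by(season) %>%
    summarise(
      episodes     = n(),
      mean_rating  = mean(imdb_rating, na.rm = TRUE),
      mean_viewers = mean(viewers, na.rm = TRUE),
      .groups = "drop"
    ) %>%
    arrange(season)
}

by_season <- season_summary(toy_ratings)
print(by_season)
```

Running `season_summary(office_ratings)` on the real data would show whether mean rating and mean viewership actually fall in the later seasons, rather than relying on the color gradient alone.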

Viewership Over the Course of Air Time

ggplot(data = office_ratings) +
  geom_point(mapping = aes(x = air_date, y = viewers, color = imdb_rating)) +
  geom_smooth(mapping = aes(x = air_date, y = viewers)) +
  labs(title = "Viewership over Time",
       x = "Date Aired",
       y = "Number of Viewers",
       color = "Rating")

This graphic shows the decline in The Office viewership over time. The trend curve begins a steady decline around 2008. Along with the decline in viewership, we also see a decline in IMDB rating starting around 2011. Taken together with the previous two graphs, this shows that the show lost popularity over time, peaking in 2006 and reaching its lowest point in 2013. Although popularity decreased, the appeal of the show (its rating) did not decrease until the last two seasons. A likely reason that appeal held up while popularity fell is that viewers who begin the show in the first season might not all finish the whole series, while loyal viewers will watch every season and give it a rating.
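
The years mentioned above (2006, 2008, 2011, 2013) are read off the smoothed curve by eye. A yearly summary gives a simple numeric cross-check. The sketch below assumes `air_date` is stored as a `Date`; `yearly_summary()` is a hypothetical helper, and the toy values are invented rather than taken from `office_ratings`.

```r
library(dplyr)

# Toy stand-in for office_ratings: invented values, NOT the real data.
toy_ratings <- tibble(
  air_date    = as.Date(c("2006-01-05", "2006-11-09", "2008-10-02", "2013-05-16")),
  viewers     = c(8.7, 9.1, 8.8, 5.7),
  imdb_rating = c(8.4, 8.6, 8.5, 7.9)
)

# Hypothetical helper: mean viewers and mean rating per calendar year.
# Assumes air_date is a Date column.
yearly_summary <- function(df) {
  df %>%
    mutate(year = as.integer(format(air_date, "%Y"))) %>%
    group_by(year) %>%
    summarise(
      mean_viewers = mean(viewers, na.rm = TRUE),
      mean_rating  = mean(imdb_rating, na.rm = TRUE),
      .groups = "drop"
    )
}

by_year <- yearly_summary(toy_ratings)
print(by_year)
```

Applied to the real data, `yearly_summary(office_ratings)` would show the year in which mean viewership starts to fall and whether ratings only drop later, as the plot suggests.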

ggplot(data = office_ratings) +
  # factor() gives each season its own box (assumes season is stored as a number)
  geom_boxplot(mapping = aes(x = factor(season), y = viewers)) +
  labs(title = "Season Viewership",
       x = "Season",
       y = "Number of Viewers")

This graph shows a clear trend in viewership by season. Seasons one and nine have extremely low viewership, most likely because of the content within those seasons. The beginning and end of a series often include background material that may not make for the most interesting story line, whereas the seasons in between might have better content.

Conclusion

After analyzing the data, we can see a slight connection between viewership and rating, but a stronger relationship between season and overall appeal. One way to better gauge the overall appeal of The Office would be to update the data to include viewership past 2013 and the number of times each season or episode is rewatched.