This report seeks to answer the following question:
Is there a relationship between the overall viewership of the show “The Office” and the show’s ratings? While answering this question, the data will also be examined against outside factors that may have affected viewership over time or during specific seasons.
We will be using a data set called office_ratings,
obtained from https://github.com/rfordatascience/tidytuesday/blob/main/data/2020/2020-03-17/readme.md
and https://en.wikipedia.org/wiki/List_of_The_Office_(American_TV_series)_episodes. It
contains 186 entries, which cover most of the episodes in the
series. There are 7 variables for each episode; relevant variables for
this report include viewers (the total number of viewers in
millions on the original air date), imdb_rating (the average fan
rating on IMDb.com from 1 to 10), and total_votes (the number of
ratings on IMDb.com). Other variables, such as season (the
season in which the episode aired) and episode (the episode number within
the season), were also used in creating visualizations. The full data set can
be viewed below:
Throughout, we will need the functionality of the tidyverse package, mainly to create visualizations.
library(tidyverse)
The first trend I am studying is the overall distribution of viewers
of The Office. The distribution below shows viewership
across the series as a whole. The graph was created with a
frequency polygon using the ggplot function:
ggplot(data=office_ratings,mapping=aes(x=viewers))+
geom_freqpoly()+
labs(title="Distribution of Viewers",
x="Viewers (In Millions)",
y="Number of Episodes")
This graph shows that the distribution is right skewed: most episodes drew between 5 million and 8 million viewers over the show’s run, with a tail of episodes that drew considerably more. The peak of the graph sits at 7.5 million, making that the most common number of viewers on an episode’s original air date. The following graph shows the distribution of each episode’s average IMDb rating; again, it was created with a frequency polygon:
ggplot(data=office_ratings,mapping=aes(x=imdb_rating))+
geom_freqpoly()+
labs(title="Average IMDb Rating",
x="Average Rating",
y="Number of Episodes")
The distribution is roughly normal, with a major peak around 8.2 where about 25 episodes sit. There are also smaller peaks at 8.7 and 9.3, indicating clusters of episodes with average ratings near those values. The last distribution in this section looks at the total number of IMDb ratings each episode received across the show’s run. This graph was also created with a frequency polygon:
ggplot(data=office_ratings,mapping=aes(x=total_votes))+
geom_freqpoly()+
labs(title="Distribution of Total IMDb Fan Ratings",
x="Total Votes",
y="Number of Episodes")
This graph is also right skewed, in a similar fashion to the
distribution of viewers. The peak sits at roughly 1,800 fan ratings,
and the number of episodes slowly decreases from there. There are two
outliers, one with 5,900 ratings and one with close to 8,000 ratings.
Overall, most episodes sit within the 1,500 to 2,500 range.
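The visual reading of skew and outliers above can be cross-checked numerically. The following is a rough sketch and not the method used in this report: it compares the mean to the median (a mean above the median is consistent with a right skew) and flags high values with the common 1.5 × IQR rule, which is an assumed convention. The toy_votes vector is hypothetical and only stands in for office_ratings$total_votes.

```r
# Hypothetical vote counts; made-up values standing in for
# office_ratings$total_votes.
toy_votes <- c(1500, 1600, 1700, 1800, 1800, 1900, 2000, 2200, 2500, 5900, 8000)

# A mean that sits above the median is consistent with a right skew.
vote_mean <- mean(toy_votes)
vote_median <- median(toy_votes)

# Upper fence of the 1.5 * IQR rule (an assumed convention, not the
# report's own definition of an outlier).
q <- quantile(toy_votes, probs = c(0.25, 0.75), names = FALSE)
upper_fence <- q[2] + 1.5 * (q[2] - q[1])
high_outliers <- toy_votes[toy_votes > upper_fence]

print(c(mean = vote_mean, median = vote_median))
print(high_outliers)
```

On the real data, the same steps could be applied to office_ratings$total_votes, office_ratings$viewers, or office_ratings$imdb_rating.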
With a better understanding of the viewership, average IMDb ratings, and number of fan ratings, we can now compare them with one another. First, a scatter plot layered with a line of best fit shows the relationship between viewership and average IMDb rating:
ggplot(data=office_ratings,mapping=aes(x=viewers,y=imdb_rating))+
geom_point()+
geom_smooth(se=FALSE)+
labs(title="Rating in Comparison to Viewers",
x="Viewers",
y="Average IMDb Rating")
As this graph demonstrates, there is a slight positive correlation
between the number of viewers and the average IMDb rating. Because the
trend curve is not very consistent, the relationship is weak, and a
correlation by itself does not show that higher viewership causes higher
ratings. There are also a few exceptions to this pattern, as shown in
the following graph:
ggplot(data=office_ratings,mapping=aes(x=viewers,y=imdb_rating))+
geom_point(aes(color=season))+
geom_smooth(se=FALSE)+
labs(title="Rating in Comparison to Viewers",
x="Viewers",
y="Average IMDb Rating",
color="Season")
As seen in this graph, there is an episode from season five that received
both high ratings and high viewership. Three episodes from season nine
received higher ratings than the line of best fit would predict. The
first episode of season one has a lower rating than expected for the
number of viewers it received. Episodes from seasons six and eight sit
at similar points.
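The strength of the relationship read off the plot can also be summarized with a correlation coefficient. This sketch is an addition and not part of the original analysis. It uses base R's cor() with the Pearson method on a made-up data frame, toy_ratings, whose column names mirror office_ratings. A coefficient near 0 indicates a weak linear relationship, and even a large coefficient would not by itself show causation.

```r
# Hypothetical data; the values are made up and only the column names
# mirror office_ratings.
toy_ratings <- data.frame(
  viewers = c(11.2, 6.0, 5.8, 9.0, 8.1, 7.7, 22.9, 4.5, 4.2, 5.7),
  imdb_rating = c(7.6, 8.3, 7.9, 8.8, 8.2, 8.4, 9.6, 7.7, 8.0, 9.7)
)

# Pearson correlation between viewers and average rating.
r <- cor(toy_ratings$viewers, toy_ratings$imdb_rating, method = "pearson")
print(round(r, 2))
```

If the real data contain missing values, passing use = "complete.obs" to cor() would drop the incomplete rows before the coefficient is computed.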
After comparing viewership with the average rating, the report now
turns to the total number of IMDb votes. Again using ggplot with a
scatter plot, the following graph depicts viewers compared to the total
number of ratings:
ggplot(data=office_ratings,mapping=aes(x=viewers,y=total_votes))+
geom_point()+
geom_smooth(se=FALSE)+
labs(title="Number of Ratings Compared to Viewers",
x="Viewers",
y="Total Number of Ratings")
This graph shows a positive correlation between viewership and the total number of fan ratings on IMDb. The relationship is stronger than that between average rating and viewership, but it is still not especially strong. There are also some exceptions to this pattern, as the next visualization shows:
ggplot(data=office_ratings,mapping=aes(x=viewers,y=total_votes))+
geom_point(aes(color=season))+
geom_smooth(se=FALSE)+
labs(title="Number of Ratings in Comparison to Viewers",
x="Viewers",
y="Total Number of Ratings",
color="Season")
With color added to the existing plot, we can see an episode in season
nine that has a large number of ratings considering the number of
viewers it received. Similarly, an episode in season seven also had more
ratings than the line of best fit would predict.
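Episodes that sit far from the trend can also be listed programmatically instead of being picked out by eye. This is a minimal sketch, assuming a straight-line fit with lm() as a stand-in for the smoothed curve in the plots; geom_smooth() falls back to a loess curve for data of this size, so the two will not agree exactly. The toy_ratings data frame is hypothetical.

```r
library(tidyverse)

# Hypothetical data; made-up values with column names mirroring office_ratings.
toy_ratings <- tibble(
  season = c(1, 2, 3, 5, 6, 7, 8, 9),
  viewers = c(11.2, 9.0, 8.4, 22.9, 8.2, 8.4, 5.9, 5.7),
  total_votes = c(3700, 3000, 2800, 5900, 2600, 5800, 1600, 7900)
)

# Straight-line fit of votes on viewers (an assumed substitute for the
# smoothed curve drawn by geom_smooth()).
fit <- lm(total_votes ~ viewers, data = toy_ratings)

# Keep the two episodes with the largest positive residuals, i.e. the
# ones with the most votes relative to what the fit predicts.
flagged <- toy_ratings %>%
  mutate(residual = residuals(fit)) %>%
  slice_max(residual, n = 2)

print(flagged)
```

The number of rows to keep (n = 2 here) is arbitrary; a residual threshold could be used instead.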
Now that the report has looked at overall viewership and how it compares to IMDb interactions and ratings, we can turn to the popularity and appeal of the show over the series’ run. Using box plots, the following graph looks at viewership for each season:
ggplot(data = office_ratings,mapping = aes(x = factor(season), y = viewers)) +
geom_boxplot() +
labs(title = "Viewers in Comparison to Season",
x = "Season",
y = "Viewers")
The visualization above allows each season to be examined
individually and in comparison to the surrounding seasons. Based on the
graph, the show appears to have been most popular from the second
season through the fifth. There is a slight decline in viewership
in seasons six and seven, followed by a drastic decline in
seasons eight and nine. Overall, The Office grew in popularity
from season one until seasons three and four; from then on its growth
slowed, and its popularity then drastically declined. Replacing viewers
with imdb_rating lets us ask a follow-up question: how does the overall
appeal of the show change in relation to its popularity? The following
graph, again using box plots, addresses this:
ggplot(data = office_ratings,mapping = aes(x = factor(season), y = imdb_rating)) +
geom_boxplot() +
labs(title = "IMDb Rating in Comparison to Season",
x = "Season",
y = "IMDb Rating")
This visualization shows that the average IMDb rating of each season
again increases from season one to a peak in seasons three
and four. It then decreases slightly through season six, increases
in season seven, declines drastically in season eight, and improves
slightly in season nine.
Appeal and popularity follow different patterns over the
duration of the show’s run. Popularity increases, peaks in seasons
three and four, and then declines through the end of the series.
Appeal also increases until it peaks in seasons three and four, but this
is followed by two seasons of decline, an increase in season seven,
a decrease in season eight, and a slight increase in season nine.
The increases in ratings while viewership was decreasing could be
explained by loyal fans who began to rate the show more as its
popularity declined. A main character left the series at the end of
season seven, which could explain the drastic dip in viewership. The
“hardcore” fans who kept supporting the show on IMDb around the
character’s departure may be what allowed its appeal to improve.
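The season-level patterns discussed above can be put side by side in a small summary table. This is a sketch on hypothetical data (toy_ratings is made up, and only its column names mirror office_ratings). It uses the median per season because that is the center line a box plot draws.

```r
library(tidyverse)

# Hypothetical data; made-up values with column names mirroring office_ratings.
toy_ratings <- tibble(
  season = c(1, 1, 1, 2, 2, 2, 3, 3, 3),
  viewers = c(11.2, 6.0, 4.8, 9.0, 8.1, 7.7, 9.1, 8.8, 7.8),
  imdb_rating = c(7.6, 8.3, 8.1, 8.7, 8.2, 8.4, 8.6, 8.2, 9.0)
)

# One row per season: median viewership (popularity), median rating
# (appeal), and the number of episodes behind each median.
season_summary <- toy_ratings %>%
  group_by(season) %>%
  summarise(
    median_viewers = median(viewers),
    median_rating = median(imdb_rating),
    episodes = n()
  )

print(season_summary)
```

Reading the two median columns down the table gives the same season-by-season comparison as the two box plots, without switching between figures.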
Lastly, using geom_smooth, we can compare viewership within
each individual season:
ggplot(data = office_ratings,mapping = aes(x = episode,y=viewers)) +
geom_smooth(aes(color=factor(season)),se=FALSE)+
labs(title = "Viewership Throughout Each Season",
x = "Episode",
y = "Viewership (In Millions)",
color="Season")
This final graph shows the trend in viewers across the episodes of each season, with seasons distinguished by the color aesthetic. Seasons two through seven sit at roughly similar levels of viewership. Season one started off very strong and then decreased drastically over its six episodes. The season five episode that aired after the Super Bowl, which drew over 20 million viewers, has a large impact on the shape of that season’s curve. Season eight started with lower viewership than previous seasons and continued to decrease throughout; this could be because a main character left at the end of season seven, resulting in fewer people coming back to watch the show. Lastly, season nine had the lowest viewership of any season for its entire run, except for a small spike during the series finale.
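Because a single high-viewership episode can pull a smoothed curve upward, it can help to list each season's most-watched episode alongside the plot. The following is a small sketch on hypothetical data (toy_ratings and its values are made up), using dplyr's slice_max() within each season.

```r
library(tidyverse)

# Hypothetical data; made-up values with column names mirroring office_ratings.
toy_ratings <- tibble(
  season = c(5, 5, 5, 9, 9, 9),
  episode = c(13, 14, 15, 21, 22, 23),
  viewers = c(8.6, 22.9, 8.4, 3.5, 4.6, 5.7)
)

# Most-watched episode in each season.
season_peaks <- toy_ratings %>%
  group_by(season) %>%
  slice_max(viewers, n = 1) %>%
  ungroup()

print(season_peaks)
```

Comparing each peak with the rest of its season shows whether a curve's shape is driven by one episode or by the season as a whole.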
In summary, we can conclude that there are relationships between the average IMDb rating, the number of IMDb ratings, and the original air date viewership for episodes of The Office, yet some of these relationships are too weak to support strong conclusions, and none of them by itself establishes causation. While the number of viewers does not appear to determine the appeal of an episode, it is associated with the number of ratings an episode receives. As seen in this report, The Office was and still is a very popular show that not only received but continues to receive good ratings.