Analyzing The Office

Relationships, Distributions, Trends, and Comments

Continuous Variables and Their Distributions

First, I am going to look at the distribution of each of the continuous variables.

ggplot(data = office_ratings) +
  geom_boxplot(mapping = aes(x = viewers)) +
  labs(x = "Viewers (Millions)",
       title = "Distribution of the Viewers Variable") +
  coord_flip()

In this box plot, we can see that for the viewers variable, the majority of the data falls around 6-8 million viewers per episode. A single episode drew about double the viewership of the next-highest episode, which heavily skews the distribution. This happened because the episode aired on NBC directly after the broadcast of Super Bowl XLIII, which greatly inflated the number of viewers.
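To make the outlier claim concrete, here is a minimal sketch of the 1.5 * IQR rule that a box plot uses to single out points. The vector `toy_viewers` is hypothetical, made-up data for illustration only, not values from `office_ratings`.

```r
# Made-up viewer counts in millions, NOT values from office_ratings.
toy_viewers <- c(6.1, 6.8, 7.2, 7.5, 7.9, 8.0, 8.4, 9.0, 20.0)

# Quartiles and fences: points beyond 1.5 * IQR from the quartiles
# are the ones a box plot draws individually as outliers.
q <- unname(quantile(toy_viewers, c(0.25, 0.75)))
iqr <- IQR(toy_viewers)
fence_lo <- q[1] - 1.5 * iqr
fence_hi <- q[2] + 1.5 * iqr

outliers <- toy_viewers[toy_viewers < fence_lo | toy_viewers > fence_hi]

# One large value pulls the mean upward but barely moves the median.
mean_viewers <- mean(toy_viewers)
median_viewers <- median(toy_viewers)
```

Running the same steps on `office_ratings$viewers` should flag the post-Super Bowl episode, assuming the column is numeric and measured in millions.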

ggplot(data = office_ratings) +
  geom_boxplot(mapping = aes(x = imdb_rating)) +
  labs(x = "Rating on IMDb.com",
       title = "Distribution of the IMDb_Rating Variable") +
  coord_flip()

In this distribution, we can observe that most points lie between about 7.9 and 8.6. There are a few exceptions below 7 and above 9.5, but there aren’t any extreme values in this set.

ggplot(data = office_ratings) +
  geom_boxplot(mapping = aes(x = total_votes)) +
  labs(x = "Total Votes on IMDb.com",
       title = "Distribution of the Total Votes Variable") +
  coord_flip()

In this distribution, the vast majority of data points are close to 2,000 votes, with several episodes around 4,000 and 6,000 and one that received about 8,000. I would guess that the episodes with more votes than normal were season finales or fan-favorite episodes. The one that got 8,000 was the episode that aired after the Super Bowl.

Relationships Between the Continuous Variables

Next, this report will look at the relationships between the continuous variables. I would like to see whether it is true that the more people who view an episode, the more it is liked.

ggplot(data = office_ratings, mapping = aes(x = viewers, y = imdb_rating)) +
  geom_smooth(se = FALSE) +
  geom_point() +
  labs(title = "Rating as a Function of Viewership",
       x = "Viewers (Millions)",
       y = "IMDb Rating")

It would appear that, as a general trend, the more people who view an episode, the higher it is rated.

There is a good amount of variation in this data, however, which produces a fair number of exceptions.
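One way to put a number on "general trend with a lot of variation" is a correlation coefficient. This is only a sketch: `toy_ratings` is hypothetical, made-up data that borrows the column names `viewers` and `imdb_rating`, and the values are not from the real data set.

```r
# Made-up data for illustration only, NOT values from office_ratings.
toy_ratings <- data.frame(
  viewers     = c(5.2, 6.0, 6.9, 7.4, 8.1, 8.8, 9.3),
  imdb_rating = c(7.6, 8.0, 7.9, 8.3, 8.2, 8.6, 8.5)
)

# Pearson measures linear association and is sensitive to extreme values.
pearson <- cor(toy_ratings$viewers, toy_ratings$imdb_rating)

# Spearman works on ranks, so a single extreme episode has less influence.
spearman <- cor(toy_ratings$viewers, toy_ratings$imdb_rating,
                method = "spearman")
```

A value near 1 would indicate a tight upward trend, while a value closer to 0 would match the scatter seen in the plot. Comparing the two coefficients on the real data would also hint at how much the Super Bowl episode affects the result.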

ggplot(data = office_ratings, mapping = aes(x = viewers, y = imdb_rating)) +
  geom_smooth(se = FALSE) +
  geom_point(mapping = aes(color = season)) +
  labs(title = "Rating as a Function of Viewership With Respect to Season",
       x = "Viewers (Millions)",
       y = "IMDb Rating",
       color = "Season")

In this visualization, I added season as a color to more easily distinguish outlier episodes. Several notable points above the curve, which were well liked despite lower viewership, are more than likely fan-favorite episodes and season finales. The points below the curve, episodes that were highly viewed but rated low, are more than likely highly anticipated episodes that weren’t as good as people were expecting. Also, many of the points below the curve are from the later seasons, after Steve Carell left.
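If `season` is stored as a number (an assumption, since the column type is not shown here), ggplot2 maps it to a continuous color gradient, which can make neighboring seasons hard to tell apart. A possible variation wraps it in `factor()` so each season gets its own discrete color. The data frame `toy_ratings` below is hypothetical, made-up data used only so the sketch runs on its own.

```r
library(ggplot2)

# Made-up data for illustration only, NOT values from office_ratings.
toy_ratings <- data.frame(
  season      = c(1, 1, 2, 2, 3, 3),
  viewers     = c(6.0, 5.4, 9.0, 8.2, 8.9, 8.0),
  imdb_rating = c(7.6, 7.9, 8.2, 8.6, 8.4, 8.8)
)

# factor(season) turns the numeric season into categories,
# so the legend shows one distinct color per season.
p <- ggplot(data = toy_ratings,
            mapping = aes(x = viewers, y = imdb_rating)) +
  geom_point(mapping = aes(color = factor(season))) +
  labs(x = "Viewers (Millions)",
       y = "IMDb Rating",
       color = "Season")
```

Whether the discrete legend reads better than the gradient is a judgment call; with nine seasons, the discrete version makes it easier to pick out the later seasons by eye.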

Next, I want to look at whether more people tend to leave a rating when more people watch an episode.

ggplot(data = office_ratings, mapping = aes(x = viewers, y = total_votes)) +
  geom_smooth(se = FALSE) +
  geom_point() +
  labs(title = "Total IMDb Ratings as a Function of Viewership",
       x = "Viewers (Millions)",
       y = "Total IMDb Ratings")

It would seem that when more people watch an episode, more people leave a rating. Again, this data is skewed by the season 5 episode that aired after the Super Bowl.
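To see how much a single episode can drive this relationship, one option is to fit a simple linear model with and without the extreme point. This is a sketch under stated assumptions: `toy_ratings` is hypothetical, made-up data, and the cutoff of 15 million viewers is an arbitrary threshold chosen for illustration.

```r
# Made-up data for illustration only, NOT values from office_ratings.
toy_ratings <- data.frame(
  viewers     = c(5.5, 6.2, 7.0, 7.8, 8.5, 9.1, 20.0),
  total_votes = c(1500, 1700, 1900, 2100, 2200, 2500, 6000)
)

# Fit on all episodes, then again without the extreme episode.
fit_all <- lm(total_votes ~ viewers, data = toy_ratings)
fit_trimmed <- lm(total_votes ~ viewers,
                  data = subset(toy_ratings, viewers < 15))

# Slope = estimated extra votes per additional million viewers.
slope_all <- unname(coef(fit_all)["viewers"])
slope_trimmed <- unname(coef(fit_trimmed)["viewers"])
```

If the two slopes differ a lot on the real data, the trend is being shaped mostly by the outlier rather than by the typical episodes.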

Popularity and Appeal of The Office Over Time

In this section, the report seeks to answer questions regarding the popularity and appeal of The Office to determine whether the show holds up season to season.

Firstly, I would like to examine the relationship between popularity and time to see if The Office remained popular for its duration, or if there were major fluctuations.

ggplot(data = office_ratings) +
  geom_line(mapping = aes(x = air_date, y = viewers)) +
  labs(x = "Air Date",
       y = "Viewers (Millions)",
       title = "Popularity of The Office Over Time")

The Office experienced a severe drop-off in viewers after the pilot episode, followed by a steady increase in viewership up to the third season. After that, The Office declined steadily in popularity until the beginning of the eighth season. From the beginning of the eighth season, viewership dropped off significantly and stayed low until the ninth season finale.

The next step in determining whether The Office holds up across the seasons is to examine the appeal of the show over time.

ggplot(data = office_ratings) +
  geom_line(mapping = aes(x = air_date, y = imdb_rating))  +
  labs(x = "Air Date",
       y = "Average Rating",
       title = "Appeal of The Office Over Time")

This line graph shows an increase in rating up to late 2006, then a generally steady trend to about 2011, with a sharp drop-off afterwards. There do seem to be some stronger episodes around 2007-2008 and again in early 2011. This shows that the middle of the show was received much better than the early episodes and the last few years of the show, with some stronger episodes in certain seasons.
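Because the episode-level line is noisy, a per-season average can make the rise, plateau, and drop easier to confirm. Here is a minimal sketch using base R's `aggregate()`. The data frame `toy_ratings` is hypothetical, made-up data that only borrows the column names `season` and `imdb_rating`.

```r
# Made-up data for illustration only, NOT values from office_ratings.
toy_ratings <- data.frame(
  season      = c(1, 1, 2, 2, 3, 3),
  imdb_rating = c(7.5, 7.9, 8.2, 8.4, 8.6, 8.8)
)

# Mean rating per season: one row per season in the result.
season_means <- aggregate(imdb_rating ~ season,
                          data = toy_ratings, FUN = mean)
```

Applied to the real data, the resulting table would give one average rating per season, which can be compared directly against the shape of the line graph.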

Trends In Individual Seasons

Finally, I am going to examine trends in viewership between individual seasons.

ggplot(data = office_ratings) +
  geom_line(mapping = aes(x = air_date, y = viewers, color = season)) +
  labs(x = "Air Date",
       y = "Viewers (millions)",
       color = "Season",
       title = "Viewers of The Office Season by Season")

In general, seasons 2-5 seem to be the most highly viewed seasons, with a steady decline in the next two. Following that, seasons 8 and 9 are much less viewed than the previous seasons, excluding season 1.

Season 1 sees a steep drop after the pilot. Based on the popularity and appeal graphs, I think this is because season 1 was not that well received. Seasons 2 and 3 both have lower initial viewership with a peak in the middle and a drop-off around the end, which also shows in the popularity and appeal graphs. Season 4 has fairly steady viewership with a drop-off around the end of the season. Season 5’s viewers trend down, with an extreme outlier because of the Super Bowl. Seasons 6 and 7 both have higher viewers toward the beginning and end of the season and a small peak in the middle, probably due to preseason hype and finale excitement.

Seasons 8 and 9 both trend downward, as many seasons do. However, those seasons have significantly lower viewers, as I would imagine a large number of people stopped watching after Steve Carell left the show. The exception is the finale of the ninth season, which got a lot of viewers. I think this is because it was the last episode of the show and because Steve Carell made an appearance.
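The within-season patterns described here could be checked numerically by comparing each season's first and last episode. This is a rough sketch, not the method used to produce the plot. `toy_ratings`, its dates, and its viewer counts are all hypothetical, made-up values that only reuse the column names `season`, `air_date`, and `viewers`.

```r
# Made-up data for illustration only, NOT values from office_ratings.
toy_ratings <- data.frame(
  season   = c(1, 1, 1, 2, 2, 2),
  air_date = as.Date(c("2000-01-01", "2000-01-08", "2000-01-15",
                       "2000-09-01", "2000-09-08", "2000-09-15")),
  viewers  = c(10.0, 6.0, 5.0, 8.0, 9.0, 7.5)
)

# Sort by air date so the first and last rows of each season
# are the premiere and the finale.
toy_ratings <- toy_ratings[order(toy_ratings$air_date), ]

# Finale viewers minus premiere viewers, one value per season.
# Negative values mean the season lost viewers from start to end.
season_change <- sapply(
  split(toy_ratings, toy_ratings$season),
  function(s) s$viewers[nrow(s)] - s$viewers[1]
)
```

A premiere-to-finale difference hides mid-season peaks, so it complements the line graph rather than replacing it.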

Andrew Moody

9/3/2024
