Introduction

This report investigates the relationship between the number of viewers of The Office and each episode's IMDb ratings as the seasons progress. One might expect that the more an episode is watched, the higher it is rated and the more ratings it receives. Conversely, the less an episode is watched, the lower it is rated and the fewer ratings it receives.

There are seven variables in this data set, each with one of three variable types.

-season; categorical; season during which the episode aired.

-episode; categorical; episode number within the season.

-title; categorical; title of the episode.

-viewers; continuous; number of viewers (in millions) on the original air date.

-imdb_rating; continuous; average fan rating on IMDb.com from 1 to 10.

-total_votes; continuous; number of ratings on IMDb.com.

-air_date; date; date the episode originally aired.

datatable(office_ratings, options=list(scrollX=TRUE))

Relationship Between the Number of Viewers and Episode Ratings

This chart shows all three continuous variables: the number of viewers (x-axis), the number of IMDb ratings (y-axis), and the average rating of each episode on a scale from 1 to 10 (color).

ggplot(data = office_ratings) +
geom_point(mapping = aes(x = viewers, y = total_votes, color = imdb_rating)) +
labs(x = "Number of Viewers (In Millions)",
     y = "Number of Ratings on IMDb",
     color = "Average Rating: 1 - 10",
     title = "Viewers and Ratings of The Office")

This plot suggests that neither the number of ratings nor the rating itself tracks the number of viewers closely. A few outliers have many more ratings than other episodes with a similar number of viewers. Aside from these, there is a weak trend of more viewers corresponding to a higher episode rating, although these points sit in the lower part of the plot.
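The visual impression can be checked numerically. The following is a minimal sketch, assuming office_ratings has the numeric columns viewers, total_votes, and imdb_rating used in the plot. The helper name pairwise_cor and the small toy data frame are made up for illustration; the toy values are placeholders, not real episode data.

```r
# Hypothetical helper: Pearson correlation matrix for the three
# continuous columns used in the plot.
pairwise_cor <- function(df) {
  cols <- c("viewers", "total_votes", "imdb_rating")
  # use = "complete.obs" drops rows with a missing value in any column
  cor(df[, cols], use = "complete.obs", method = "pearson")
}

# Made-up placeholder values, NOT real episode data
toy <- data.frame(
  viewers     = c(5.0, 6.5, 7.2, 8.1, 9.0),
  total_votes = c(1500, 1700, 1600, 2100, 2500),
  imdb_rating = c(7.8, 8.0, 8.4, 8.2, 8.9)
)

print(round(pairwise_cor(toy), 2))
# With the real data, the call would be: pairwise_cor(office_ratings)
```

Values near 0 would support the reading that the variables are only weakly related, while values near 1 would contradict it.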

Relationship Between the Number of Viewers in each Episode and Their Ratings

This chart shows the relationship between the episode number and the number of viewers, with the IMDb rating from 1 to 10 shown on a color scale.

ggplot(data = office_ratings) +
geom_point(mapping = aes(x = episode, y = viewers, color = imdb_rating)) +
labs(x = "Episode Number",
     y = "Number of Viewers (In Millions)",
     title = "Viewers and Ratings of The Office Episodes",
     color = "IMDb Rating Out of 10")

Although there are more light points (higher-rated episodes) among the episodes with more viewers, there is no clear trend of higher viewership going with higher ratings, because the colors are mixed throughout. Even so, the points toward the bottom of the plot tend to be lower rated overall than those at the top.

Number of Viewers and the Number of Votes

This chart shows the relationship between the number of viewers (in millions) and the number of IMDb votes each episode received.

ggplot(data = office_ratings) +
geom_point(mapping = aes(x = viewers, y = total_votes)) +
labs(x = "Number of Viewers (In Millions)",
     y = "Number of Votes on IMDb",
     title = "Viewers and Number of Votes of The Office Episodes")

In general, the more people watch an episode, the more votes it has on IMDb, but a few outliers sit far above the rest. These outliers range from just under 6,000 votes to about 8,000 votes.
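To see which episodes the outliers are, the rows can be filtered by vote count. This is a minimal sketch, assuming the column names from the variable list. The default cutoff of 5,500 votes is an assumption chosen to sit just below the outlier range described here, and high_vote_episodes and toy are hypothetical names with made-up values.

```r
# Hypothetical helper: episodes whose IMDb vote count exceeds a cutoff,
# sorted from most to fewest votes.
high_vote_episodes <- function(df, cutoff = 5500) {
  cols <- c("season", "episode", "title", "viewers", "total_votes")
  # which() skips rows where total_votes is missing
  hits <- df[which(df$total_votes > cutoff), cols]
  hits[order(hits$total_votes, decreasing = TRUE), ]
}

# Made-up placeholder values, NOT real episode data
toy <- data.frame(
  season      = c(1, 1, 2, 2),
  episode     = c(1, 2, 1, 2),
  title       = c("Episode A", "Episode B", "Episode C", "Episode D"),
  viewers     = c(6.0, 7.5, 8.0, 9.5),
  total_votes = c(2000, 6100, 1800, 7900),
  stringsAsFactors = FALSE
)

print(high_vote_episodes(toy))
# With the real data, the call would be: high_vote_episodes(office_ratings)
```

Looking at the titles returned for the real data may help explain why those particular episodes drew so many votes.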

Popularity of The Office Over Time

This chart shows the relationship between the number of viewers and the IMDb rating, with the season shown by the color of each point.

ggplot(data = office_ratings) +
geom_point(mapping = aes(x = imdb_rating, y = viewers, color = season)) +
labs(x = "IMDb Rating",
     y = "Number of Viewers (In Millions)",
     title = "Popularity of The Office Over Time",
     color = "Season Number")

This chart shows the popularity of The Office through the number of viewers in each season, together with the ratings. It is arranged this way to reveal any link between a decline in popularity (viewers) and IMDb rating (a factor that may influence viewer numbers in later seasons). The show peaks toward the middle (seasons 4-6) and is lower in both viewers and ratings toward the end (seasons 8-9). Many purple and pink points sit toward the bottom left, while many turquoise and teal points sit higher and toward the middle right. Across seasons the data therefore rises and then falls, much like a bell curve, with the middle seasons higher and the later seasons lower.

Appeal of The Office Over Time

This chart shows how episodes' IMDb ratings change as the seasons progress.

ggplot(data = office_ratings) +
geom_point(mapping = aes(x = season, y = imdb_rating, color = season)) +
labs(y = "IMDb Rating",
     x = "Season Number",
     color = "Season Number",
     title = "Appeal of The Office Over Time")

Appeal in this chart is measured by the IMDb rating, plotted against the season number. Each point represents an episode. Although there are a couple of outliers in later seasons, the points show a general increase followed by a decrease. This bell-curve-like trend means that the show's general appeal has decreased over time. Compared with popularity, appeal follows a similar curve, but the popularity curve is more obvious and has less overlap. Because the appeal chart stacks the points vertically by season, it is easier to see the overlap in IMDb ratings.

  1. Is there a trend in total viewership within the individual seasons? Are there any notable changes in viewership within any season? If so, can you explain the reason for these changes?

Popularity of The Office Within Seasons

This chart shows the different seasons and the number of viewers in each episode.

ggplot(data = office_ratings) +
geom_point(mapping = aes(x = viewers, y = season, color = season)) +
labs(y = "Season Number",
     x = "Number of Viewers (In Millions)",
     title = "Popularity of The Office Within Seasons",
     color = "Season Number")

This chart follows a trend similar to the previous charts: viewership starts lower, grows in the middle seasons, and then falls again, much like a bell curve. The largest jump in viewers is between the first and second seasons, where the points do not overlap at all.
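The rise and fall across seasons can also be summarized as per-season averages. This is a sketch in base R, assuming the columns season, viewers, and imdb_rating. The names season_means and toy are hypothetical, and the toy values are made up rather than taken from the real data.

```r
# Hypothetical helper: mean viewers and mean IMDb rating for each season.
season_means <- function(df) {
  # aggregate() averages each left-hand column within each season;
  # rows with missing values are dropped by the formula interface
  aggregate(cbind(viewers, imdb_rating) ~ season, data = df, FUN = mean)
}

# Made-up placeholder values, NOT real episode data
toy <- data.frame(
  season      = c(1, 1, 2, 2, 3, 3),
  viewers     = c(5.0, 6.0, 8.0, 9.0, 6.0, 7.0),
  imdb_rating = c(7.5, 8.5, 8.0, 9.0, 7.0, 8.0)
)

print(season_means(toy))
# With the real data, the call would be: season_means(office_ratings)
```

If the per-season means for the real data climb through the middle seasons and then drop, that would back up the pattern seen in the scatter plots.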

The popularity of The Office grew very quickly, with many ratings and viewers, but it did not last for the entirety of the show. With many comparisons showing a dramatic increase and then a decrease in viewers and ratings, the show's popularity and appeal follow a bell-curve-like pattern.