This report examines viewership trends and audience ratings for the American television series “The Office”. I use tools from the ggplot2 package in R to investigate how the show’s popularity and appeal evolved over time. The dataset consists of 186 episodes and 7 variables, including air date, season, episode, IMDB rating, total number of votes, and viewership. It was obtained from a publicly available CSV file hosted on GitHub.

library(DT)  # provides datatable()
datatable(office_ratings, options = list(scrollX = TRUE))

Use well-chosen visualizations to answer the following questions.

1. How is each of the three continuous variables distributed? (Where is the peak and what does it tell you? What does the shape of the distribution tell you? Are there any extreme values?)

ggplot(data = office_ratings) +
  geom_histogram(mapping = aes(x = viewers), bins = 20, fill = "darkorange") +
  labs(title = "Distribution of Episode Viewership", 
       x = "Viewers (in millions)",
       y = "Number of Episodes")

Most episodes have between 6 and 10 million viewers, with only a few episodes exceeding 10 million. The distribution is slightly right-skewed, which suggests that there were occasional spikes in popularity.

ggplot(data = office_ratings) +
  geom_histogram(mapping = aes(x = imdb_rating), bins = 20, fill = "darkred") +
  labs(title = "Distribution of IMDB Ratings", 
       x = "IMDB Rating (1-10)",
       y = "Number of Episodes")

Ratings cluster between 8 and 9, which indicates fairly high audience satisfaction. A few episodes stand out with very high or very low scores.

ggplot(data = office_ratings) +
  geom_histogram(mapping = aes(x = total_votes), bins = 20, fill = "darkblue") +
  labs(title = "Distribution of IMDB Vote Counts", 
       x = "Total Votes on IMDB",
       y = "Number of Episodes")

Vote counts vary widely: some episodes received thousands of votes, while others received far fewer. Even so, the counts show that there was continued fan engagement.
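The readings of the three histograms above can be backed up with a few numbers. The sketch below is one possible way to do that, not the method used in this report: `describe_distribution()` is a hypothetical helper, the 1.5 × IQR rule for extreme values is an assumed convention, and the toy vector is made up for illustration.

```r
# Hypothetical helper: numeric summary to accompany a histogram.
describe_distribution <- function(x) {
  x <- x[!is.na(x)]
  q <- quantile(x, probs = c(0.25, 0.5, 0.75), names = FALSE)
  iqr <- q[3] - q[1]
  lower <- q[1] - 1.5 * iqr   # assumed cutoff for low extreme values
  upper <- q[3] + 1.5 * iqr   # assumed cutoff for high extreme values
  list(
    mean = mean(x),
    median = q[2],
    # A mean above the median is a rough hint of right skew.
    right_skew_hint = mean(x) > q[2],
    outliers = x[x < lower | x > upper]
  )
}

# Made-up values for illustration only; not taken from office_ratings.
toy_viewers <- c(6.1, 7.0, 7.4, 8.0, 8.2, 8.5, 9.0, 22.9)
result <- describe_distribution(toy_viewers)
print(result)
```

With the report's data, the same helper could be called as `describe_distribution(office_ratings$viewers)`, and likewise for `imdb_rating` and `total_votes`.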

2. Is it the case that the more people watch an episode, the better it’s liked?

ggplot(data = office_ratings, mapping = aes(x = viewers, y = imdb_rating, color = episode)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE, color = "darkorange") +
  labs(title = "Relationship Between Viewership and IMDB Ratings",
       x = "Viewers (in Millions)",
       y = "IMDB Ratings")
## `geom_smooth()` using formula = 'y ~ x'

cor(office_ratings$viewers, office_ratings$imdb_rating)
## [1] 0.4918702

The scatterplot shows a moderate positive relationship between the two variables. The correlation is about 0.49, so episodes with more viewers tend to be rated somewhat higher, but the relationship is far from perfect: more viewers does not necessarily mean an episode is better liked.
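One way to put the correlation reported above in perspective is to square it. In a simple linear regression, the squared correlation is the share of the variation in ratings that a straight-line fit on viewership accounts for. A short worked calculation, using only the value printed by `cor()`:

```r
# r is the Pearson correlation printed by cor() in the report.
r <- 0.4918702

# In a simple linear regression, r^2 is the share of variance in the
# response that the fitted line accounts for.
r_squared <- r^2
print(round(r_squared, 2))
```

The result is roughly 0.24, so about three quarters of the variation in ratings is left unexplained by viewership alone.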

3. Are there any exceptions to the trend you noticed in the previous problem? Use a visualization to try to explain these exceptions.

ggplot(data = office_ratings, mapping = aes(x = viewers, y = imdb_rating, color = season)) +
  geom_point(size = 1.5) +
  labs(title = "Relationship Between Viewership and IMDB Ratings by Season",
       x = "Viewers (in Millions)",
       y = "IMDB Ratings")

Yes, there are some exceptions to the overall trend. In the scatterplot where the points are colored by season, we can see some episodes with low viewership but high ratings. There are also some early episodes that had high viewership but only average ratings.
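The exceptions described above can also be listed rather than picked out by eye. The sketch below ranks episodes by how far their rating sits from a straight-line fit of rating on viewership. It is an illustrative approach rather than part of the original analysis: `rank_exceptions()` and the `label` column are hypothetical, and the toy data frame is made up.

```r
# Rank episodes by how far their rating sits from the straight-line fit.
# Assumes the viewers and imdb_rating columns have no missing values.
rank_exceptions <- function(df) {
  fit <- lm(imdb_rating ~ viewers, data = df)
  df$residual <- residuals(fit)  # positive: rated higher than the line predicts
  df[order(abs(df$residual), decreasing = TRUE), ]
}

# Made-up values for illustration only; not taken from office_ratings.
toy <- data.frame(
  label = paste("Toy", LETTERS[1:7]),
  viewers = c(4, 5, 6, 7, 8, 9, 5),
  imdb_rating = c(8.0, 8.1, 8.2, 8.3, 8.4, 8.5, 9.7)
)
exceptions <- rank_exceptions(toy)
print(head(exceptions, 3))
```

In the toy data, the low-viewership, high-rating row comes out on top. Applied to `office_ratings`, the first few rows would be the episodes worth inspecting by season.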

4. Is it the case that the more people watch an episode, the more people leave an IMDb rating? Are there exceptions? If so, use a visualization to try to explain them.

ggplot(data = office_ratings, mapping = aes(x = viewers, y = total_votes, colour = episode)) +
  geom_point(size = 1.5) +
  geom_smooth(method = "lm", se = FALSE, color = "red") +
  labs(title = "Relationship Between Viewership and # of IMDB Ratings",
       x = "Viewers (in Millions)",
       y = "Number of IMDB Votes")
## `geom_smooth()` using formula = 'y ~ x'

cor(office_ratings$viewers, office_ratings$total_votes)
## [1] 0.4749562

There is a moderate positive relationship between the number of viewers and the number of votes. The correlation is 0.47, which means episodes with higher viewership tend to get more ratings. However, the scatterplot does show some exceptions where episodes with fewer viewers still received a high number of votes.
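Episodes that attract many votes relative to their audience can be surfaced with a simple ratio of votes to viewers. This is a rough sketch under stated assumptions: `votes_per_million()` is a hypothetical helper, `viewers` is taken to be in millions as in the plot labels, and the toy values are made up.

```r
# Votes received per million viewers, sorted from highest to lowest.
votes_per_million <- function(df) {
  df$votes_per_million <- df$total_votes / df$viewers
  df[order(df$votes_per_million, decreasing = TRUE), ]
}

# Made-up values for illustration only; not taken from office_ratings.
toy <- data.frame(
  viewers = c(8, 9, 4),
  total_votes = c(2000, 2300, 3000)
)
ranked <- votes_per_million(toy)
print(ranked)
```

Rows at the top of `votes_per_million(office_ratings)` would be candidates for the exceptions noted above, where a smaller audience still produced a large number of votes.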

5. How did the show’s popularity change over time?

ggplot(data = office_ratings, mapping = aes(x = air_date, y = viewers)) +
  geom_line(color = "blue", alpha = 0.4) +
  geom_point(size = 1.5, color = "black") +
  labs(title = "How The Office's Popularity Changed Over Time",
       x = "Air Date",
       y = "Viewers (in Millions)")

The line graph shows that early seasons tend to have more consistent viewership. After the mid-series peak, however, popularity gradually declines, especially in the later seasons, where viewership fell below 5 million.
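The episode-level line can be noisy, so a per-season average may make the rise and decline easier to state. A minimal sketch using base R's `aggregate()`; the `season_means()` wrapper is hypothetical and the toy values are made up.

```r
# Average viewership per season.
season_means <- function(df) {
  aggregate(viewers ~ season, data = df, FUN = mean)
}

# Made-up values for illustration only; not taken from office_ratings.
toy <- data.frame(
  season = c(1, 1, 2, 2),
  viewers = c(8, 10, 4, 6)
)
by_season <- season_means(toy)
print(by_season)
```

Calling `season_means(office_ratings)` would give one row per season, which could then be compared against the pattern seen in the line graph.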

6. How did the show’s appeal change over time? (Be careful: popularity and appeal are not the same thing. Think about which variables address these two attributes.)

ggplot(data = office_ratings, mapping = aes(x = air_date, y = imdb_rating)) +
  geom_point( size = 1.5, color = "black") +
  geom_line(color = "red", alpha = 0.4) +
  labs(title = "How The Office's Appeal Changed Over Time",
       x = "Air Date",
       y = "IMDB Ratings")

The ratings remained relatively stable throughout the series. There were some fluctuations, but overall the ratings did not decline sharply.

7. In the previous two problems, you should notice that the show’s popularity and appeal don’t change in exactly the same way throughout the series. Use the differences you notice in the visualizations to explain why this might be.

Although viewership declined in later seasons, the ratings remained relatively stable, showing that the show’s core fans continued to enjoy it. This suggests that casual viewers dropped off while loyal fans stayed engaged.
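Because viewership and ratings are measured on different scales, one way to compare how much each one moved is the coefficient of variation (standard deviation divided by mean). The sketch below is illustrative only: `coef_variation()` is a hypothetical helper and both toy vectors are made up.

```r
# Coefficient of variation: spread relative to the mean, unit-free.
coef_variation <- function(x) {
  sd(x, na.rm = TRUE) / mean(x, na.rm = TRUE)
}

# Made-up values for illustration only; not taken from office_ratings.
toy_viewers <- c(8, 9, 7, 5, 4)
toy_ratings <- c(8.2, 8.5, 8.3, 8.0, 7.9)

cv_viewers <- coef_variation(toy_viewers)
cv_ratings <- coef_variation(toy_ratings)
print(c(viewers = cv_viewers, ratings = cv_ratings))
```

If `coef_variation(office_ratings$viewers)` turned out to be much larger than `coef_variation(office_ratings$imdb_rating)`, that would be consistent with the reading above that popularity shifted far more than appeal.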

8. Is there a trend in total viewership within the individual seasons? Are there any notable changes in viewership within any season? If so, can you explain the reason for these changes?

ggplot(data = office_ratings, mapping = aes(x = episode, y = viewers, group = season, color = season)) +
  geom_line(alpha = 0.6) +
  geom_point(size = 1) +
  facet_grid(cols = vars(season)) +
  labs(title = "Episode to Episode Viewership Trends by Season",
       x = "Episode",
       y = "Viewers (Millions)")

Viewership fluctuates within each season, with early seasons showing more stability and later seasons showing sharper drops. Overall, trends vary by season, reflecting changing audience engagement.
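The within-season trends read from the faceted plot can be summarized as one slope per season: the change in viewers per episode from a straight-line fit. This is a sketch of one possible summary, not the analysis performed here. `season_slopes()` is a hypothetical helper, and the toy data frame is made up.

```r
# Slope of viewers on episode number, fitted separately for each season.
# A negative slope means viewership tends to fall as the season goes on.
season_slopes <- function(df) {
  sapply(split(df, df$season), function(s) {
    coef(lm(viewers ~ episode, data = s))[["episode"]]
  })
}

# Made-up values for illustration only; not taken from office_ratings.
toy <- data.frame(
  season = c(1, 1, 1, 2, 2, 2),
  episode = c(1, 2, 3, 1, 2, 3),
  viewers = c(8, 8, 8, 6, 5, 4)
)
slopes <- season_slopes(toy)
print(slopes)
```

A straight-line slope hides single-episode spikes or drops, so it complements the faceted plot rather than replacing it.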

This analysis reveals that while The Office experienced a decline in viewership over its run, its ratings remained fairly stable. Viewership has a moderate positive relationship with both ratings and total votes, but notable exceptions exist. Overall, these trends highlight the complex dynamics of audience engagement with a long-running TV series.