library(tidyverse)
# office_ratings must already be loaded in the session before this line runs
view(office_ratings)

This report tries to answer two questions. First, does the number of viewers have an impact on an episode's rating and appeal? Second, what happened to the appeal and rating of The Office over its life span?

We will be using the data set office_ratings from https://raw.githubusercontent.com. This data set contains every episode of The Office across all of its seasons (1-9). It also includes individual reviews and ratings for each episode. The variables that we will be using are viewers (total number of viewers per episode, in millions), imdb_rating (rating for each episode), and total_votes (total number of votes for each episode). We will also be using information from https://en.wikipedia.org/wiki/List_of_The_Office_(American_TV_series)_episodes to assist us.

In the graphs below we show the distribution of each continuous variable and look for any extreme values.

ggplot(data = office_ratings) +
  geom_histogram(mapping = aes(x = viewers)) +
  labs(title = "Viewership amount",
       caption = "Data obtained from https://raw.githubusercontent.com")

ggplot(data = office_ratings) +
  geom_histogram(mapping = aes(x = imdb_rating)) +
  labs(title = "Rating amount",
       caption = "Data obtained from https://raw.githubusercontent.com")

ggplot(data = office_ratings) +
  geom_histogram(mapping = aes(x = total_votes)) +
  labs(title = "Amount of Total Votes",
       caption = "Data obtained from https://raw.githubusercontent.com")

Looking at the extreme values: viewers peak at 7.5-8 million. The distribution is very tight, with most episodes in the 7-9 million range. For imdb_rating there is a peak at a rating of 8.2. The distribution is very wide, and most of the ratings are spread out. For total_votes there is a peak at 1,500-1,600 votes. The distribution is tight within the 1,400-2,700 range. Outside of that, the total votes are very dispersed.
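
The histograms are read by eye here. As a rough cross-check, one common convention (not the method used in this report) is the 1.5 x IQR rule, which flags values far outside the middle half of the data. The helper name iqr_outliers and the default k = 1.5 are assumptions made for this sketch.

```r
# Hypothetical helper: TRUE for values more than k * IQR below the first
# quartile or above the third quartile. Missing values are never flagged.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- k * (q[[2]] - q[[1]])
  !is.na(x) & (x < q[[1]] - fence | x > q[[2]] + fence)
}
```

Something like office_ratings[iqr_outliers(office_ratings$viewers), ] would then list the flagged episodes, assuming the data set is loaded.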


For the graph below we are trying to see whether an episode that more people watch is also better liked.

ggplot(data = office_ratings, mapping = aes(x = imdb_rating, y = viewers)) +
  geom_point() +
  # straight-line (linear model) fit without the confidence band
  geom_smooth(se = FALSE, method = "lm") +
  labs(title = "Relationship of episode views and its rating",
       caption = "Data obtained from https://raw.githubusercontent.com")

This graph shows the viewers and the rating for each episode. We are trying to see whether the viewership of The Office affects the rating. In this case most episodes have similar viewership, even as the rating increases.


We also looked for exceptions to the pattern in the graph above. There is an outlier at 26 million views with a rating of 9.6, and a point at 4.87 million views with a rating of 6.7. Unlike the rest, these two points both fit the idea that rating and viewership go together.
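
To put a number on the trend, and to see how much a few outlying points drive it, one option is to compare the Pearson correlation (sensitive to outliers) with the Spearman rank correlation (less sensitive). This is only a sketch; the helper name cor_summary is an assumption, not something from the original analysis.

```r
# Hypothetical helper: Pearson and Spearman correlations of two numeric
# vectors, ignoring pairs where either value is missing.
cor_summary <- function(x, y) {
  c(
    pearson  = cor(x, y, use = "complete.obs", method = "pearson"),
    spearman = cor(x, y, use = "complete.obs", method = "spearman")
  )
}
```

With the data set loaded, cor_summary(office_ratings$imdb_rating, office_ratings$viewers) would give both values. A Pearson value well above the Spearman value would suggest the outliers are doing much of the work.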


For the graph below we are trying to see if there is any relation between the total votes for an episode and its rating.

ggplot(data = office_ratings) +
  geom_point(mapping = aes(x = imdb_rating, y = total_votes, color = total_votes)) +
  labs(title = "Relation between total votes and rating",
       caption = "Data obtained from https://raw.githubusercontent.com")

The idea that a higher rating goes with more votes tends to hold in this graph. But the lowest-rated episodes also have more votes than expected. My guess is that people who love an episode will go out of their way to rate it very high. The same goes for the bad episodes: some people will go out of their way to rate an episode low, so more votes could come from that.
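
One way to check whether both ends of the rating scale attract extra votes is to group episodes into rating bands and compare the typical vote count in each band. This is a sketch only; the helper name votes_by_rating_band and the band width of 0.5 are assumptions chosen for illustration.

```r
library(dplyr)

# Hypothetical helper: median total_votes within bands of imdb_rating.
# Each band starts at a multiple of `width` (e.g. 8.0 covers 8.0 to 8.49).
votes_by_rating_band <- function(data, width = 0.5) {
  data %>%
    mutate(rating_band = floor(imdb_rating / width) * width) %>%
    group_by(rating_band) %>%
    summarise(
      episodes = n(),
      median_votes = median(total_votes),
      .groups = "drop"
    ) %>%
    arrange(rating_band)
}
```

If the guess above is right, votes_by_rating_band(office_ratings) should show higher medians in the lowest and highest bands than in the middle ones.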


In the graph below we are trying to see if the show's popularity changed over time. We will compare viewers and rating over time to get our answer.

ggplot(data = office_ratings, mapping = aes(x = air_date, y = viewers)) +
  # color only the points, so the trend line is fitted as a single group
  geom_point(mapping = aes(color = episode)) +
  geom_smooth(se = FALSE) +
  labs(title = "Popularity of Office episodes",
       caption = "Data obtained from https://raw.githubusercontent.com")

The Office started off well and picked up very quickly in the early seasons (2, 3, 4). Then it started to fall off in the later seasons (7, 8, 9). The popularity of The Office fell over time.


For the graph below we will try to see if the show's appeal changed over time. We will compare the air date of each episode, in order, with its rating.

ggplot(data = office_ratings, mapping = aes(x = air_date, y = imdb_rating)) +
  geom_smooth() +
  labs(title = "Episode appeal",
       caption = "Data obtained from https://raw.githubusercontent.com")

The appeal of The Office increased greatly at the start, then fell off over time. The appeal (rating) rose in the early seasons (1, 2, 3). It soon fell off around the middle seasons (4, 5, 6). The trend kept going down until season 9.


The appeal and the viewers line up for most of the run of The Office, but they don’t line up in the last season. Both trend lines go up in the early seasons, then fall off in the middle. In season 9 the appeal of the show went up, but the viewership went down. One reason could be that the viewers who stayed were hardcore fans of The Office. In season 9 they would rate the show’s appeal (rating) higher than the average viewer of the other seasons did.
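
The season-by-season comparison above can also be put in a small table. The sketch below assumes the data set has a season column alongside viewers and imdb_rating; the helper name season_summary is made up for illustration.

```r
library(dplyr)

# Hypothetical helper: average viewers and average rating for each season.
# Assumes columns named season, viewers and imdb_rating.
season_summary <- function(data) {
  data %>%
    group_by(season) %>%
    summarise(
      mean_viewers = mean(viewers, na.rm = TRUE),
      mean_rating = mean(imdb_rating, na.rm = TRUE),
      .groups = "drop"
    )
}
```

In the output of season_summary(office_ratings), a final season where mean_rating rises while mean_viewers falls would match the pattern described above.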


In the graph above titled "Popularity of Office episodes" there is an outlier: season 5, episode 13, "Stress Relief". It has 11 million more viewers than the second most popular episode, season 1, episode 1, "Pilot". Looking more closely, it is a two-part episode, so the viewers from the two parts add together and can make the total much higher. The two-part episode also had one of the highest ratings (3rd) in The Office's life span.
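
A gap like this can be read straight from the data by sorting episodes by viewers and taking the difference to the next one down. This is a sketch under the assumption that viewers is a numeric column; the helper name top_viewed is hypothetical.

```r
library(dplyr)

# Hypothetical helper: the n most-watched episodes, each with the gap in
# viewers to the next most-watched episode.
top_viewed <- function(data, n = 2) {
  data %>%
    arrange(desc(viewers)) %>%
    mutate(gap_to_next = viewers - lead(viewers)) %>%
    slice_head(n = n)
}
```

The first row of top_viewed(office_ratings) would then show how far the most-watched episode sits above the runner-up.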

In conclusion, viewers do not have a substantial impact on The Office's rating and appeal. The data above does not show enough support for such a link.