Anna and Daniel (Table 3). Lab 2. HIMYM ratings over seasons.
#install.packages("zoo")
#install.packages("changepoint")
#install.packages("ggplot2")
#install.packages("dplyr")
library(ggplot2)
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(changepoint)
## Successfully loaded changepoint package version 2.2.2
## NOTE: Predefined penalty values changed in version 2.2. Previous penalty values with a postfix 1 i.e. SIC1 are now without i.e. SIC and previous penalties without a postfix i.e. SIC are now with a postfix 0 i.e. SIC0. See NEWS and help files for further details.
himym <- read.csv("C:/Users/Chaca/Desktop/Lab 2/himym.csv")
While looking for blog posts to recreated we stumbled on a lot of blogs with bad data or missing code. After looking through many blogs we finally chose to do ratings over the hit TV show “How I Met Your Mother”. Despite neither of us ever watching one episode of the show, the writer of the blog (https://www.r-bloggers.com/when-did-how-i-met-your-mother-become-less-legen-wait-for-it/) had a lot to say about the quality of the show through out a whole nine seasons. Although we never watched the show, it seems to be a general consensus that the show got worse in the later seasons. We have broken down the data into multiple graphs showing the IMDB ratings of the show through all nine seasons to see if this was true or not.
(Besides the data being from 2018 rather than 2013, everything else is the same as the blog post)
mean2.pelt <- cpt.mean(himym$Rating, method="PELT")
plot(mean2.pelt, type='l', cpt.col='red',xlab="Episode",
ylab='IMDB Avg Rating', cpt.width=2, main="Changepoints in IMDB Ratings of 'How I Met Your Mother'", ylim=c(0,10), xlim=c(0,200))
The writer of the original blog did not post his data, but did say he had received it from the IMDB website. So after spending the time to go and extract all the data from the site, this is the graph we reproduced. The spread in the data is pretty consistent in the data until about episode 150, after this we start to see a much wider spread in the data. There are some drastic spikes in the the later episodes, and not in a good way. Most of the later episodes ratings are below the show’s average rating which is about an 8.1,a relatively high rating for a show with some many seasons. These later seasons average rating seems to be somewhere around a 7.5 rating, substantially lower than 8.1. This is what is to be expected because according to the original blogger and many others online suggested the show got much worse as the seasons went on, especially the last two seasons.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
df <- data.frame(himym)
meandf <- df %>%
group_by(Season) %>%
summarize(mean_Rating = mean(Rating))
ggplot(df) +
geom_line(aes(Episode, Rating))+
facet_grid(.~Season) +
geom_hline(aes(yintercept=mean_Rating), data=meandf, color="red")+
labs(y="IMDB Avg Rating",
x="Episode",
title="IMDB Ratings of 'How I Met Your Mother' as of September 2018",
subtitle = "By Season (Seasons 1-9)")
For our addition to the analysis we separated each season into its own graph so it is much easier to compare them. Each season’s average rating is indicated by the red line, which for the most part goes down slightly as each season goes on. Seasons one through seven all have a relatively close season average even with the slight decrease almost every season, it is seasons eight and nine where the rating really drops. After looking online and reading some blogs people say that the show “repeats the same jokes” and “seems to go on forever”. Speculation shows that people were almost getting tired of the show but were still watching because they had already watched the first seven seasons and wanted to see how they would end the show. People seem to be specifically displeased with the end and when they finally explained how they met the mother. Which explains the very bad rating with the ending seasons and specifically the last episode. Looking at the data, the writers of the show should have stopped the show earlier and it would have preserved the reputation of the show.
Seasonplot <- ggplot(data=himym, aes(x=Total.Episode,y=Rating,color=Season))+
labs(y="IMDB Avg Rating", x="Episode", title="IMDB Ratings of 'How I Met Your Mother' as of September 2018", subtitle = "By Season (Seasons 1 - 9)")+
xlim(0, 200)+
ylim(5, 10)+
geom_line()+
geom_smooth(method="lm", se=FALSE, color= "black")
Seasonplot+scale_color_gradientn(colours = rainbow(9))
## Warning: Removed 8 rows containing non-finite values (stat_smooth).
## Warning: Removed 8 rows containing missing values (geom_path).
This graph shows the trend line for the whole show which seems to wrap this whole blog up well. The more seasons that came out the less people liked the show and the lower the rating got. Since we had to go and extract the data ourselves we could make sure it was in tidy data form, which gave us no problems with R. The data set we choice before this was not in tidy data form and made it super difficult trying to graph anything. So we changed our data set to this which worked perfectly, showing us the important of tidy data and how it really is the most compatible way to use data in R. This data set may have been a process to get but it was very easy interpret and analyze despite never seeing the show. If we could change something we probably would have picked a show that we had a little more knowledge on and didn’t have to rely on the opinions of the internet so much. However, this data really showed how using small multiples can be super useful and easy to show comparison. In the first graph it was easy to see how the ending seasons had lower ratings. By splitting it up into each season it made it super easy to see the general trend downward but gave more information than our last graph with just one trendline.