We are investigating how the IMDB rating of Netflix shows varies with year of release and number of episodes. The dataset combines information from Wikipedia and the Internet Movie Database (IMDB) and pertains to 109 Netflix original series released from 2013 to 2017. One limitation is that the data do not take into account how heavily each show was advertised or how popular its actors are. These potential confounding variables could influence the ratings.
| show | year | episodes | imdb_rating | title |
|---|---|---|---|---|
| 1 | 2014 | 13 | 80 | Star Wars: The Clone Wars (season 6) |
| 2 | 2013 | 15 | 89 | Arrested Development (season 4) |
| 3 | 2013 | 4 | 60 | Russell Peters vs. the World |
| 4 | 2014 | 36 | 84 | BoJack Horseman |
| 5 | 2014 | 40 | 85 | Trailer Park Boys (seasons 8, 9, 10 and 11) |
| 6 | 2015 | 39 | 79 | Unbreakable Kimmy Schmidt |
Based on both plots, IMDB rating does not seem to vary dramatically by year or by number of episodes. However, there does seem to be a slight positive trend in the relationship between IMDB rating and number of episodes.
We studied the relationship between the number of episodes and each show’s IMDB rating. This analysis suggested that there is a slight positive relationship between number of episodes and IMDB rating, meaning that shows with more episodes also tended to have higher ratings. Based on our visual analysis, the median IMDB rating seems to decrease slightly from 2013 to 2017. The premiere year is represented by the color of the points shown on the graph.
| term | estimate | std_error | statistic | p_value | conf_low | conf_high |
|---|---|---|---|---|---|---|
| intercept | 71.591 | 12.082 | 5.925 | 0.000 | 47.618 | 95.564 |
| episodes | 0.179 | 0.283 | 0.634 | 0.528 | -0.382 | 0.741 |
| year2014 | 13.503 | 18.305 | 0.738 | 0.462 | -22.819 | 49.824 |
| year2015 | 1.598 | 14.222 | 0.112 | 0.911 | -26.621 | 29.818 |
| year2016 | 2.028 | 12.613 | 0.161 | 0.873 | -22.999 | 27.054 |
| year2017 | 11.312 | 17.321 | 0.653 | 0.515 | -23.057 | 45.681 |
| episodes:year2014 | -0.470 | 0.478 | -0.983 | 0.328 | -1.419 | 0.479 |
| episodes:year2015 | -0.158 | 0.379 | -0.417 | 0.678 | -0.911 | 0.594 |
| episodes:year2016 | -0.197 | 0.345 | -0.570 | 0.570 | -0.880 | 0.487 |
| episodes:year2017 | -1.228 | 1.163 | -1.056 | 0.293 | -3.536 | 1.079 |
From our table, there are two years, 2013 and 2015, with a positive relationship between number of episodes and IMDB rating. The slope for each later year is the 2013 episodes coefficient plus that year's interaction term (for example, 0.179 - 1.228 = -1.049 for 2017). For the other three years, IMDB ratings generally decreased as shows became longer. 2017 had the steepest slope, which was also negative. This indicates that in 2017, IMDB ratings changed the most with show length.
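The per-year slopes discussed above can be recovered from the regression table by adding each interaction term to the baseline `episodes` coefficient. The snippet below is a minimal sketch of that arithmetic; the coefficient values are copied from the table, while the function and variable names are illustrative and not part of the original analysis.

```python
# Coefficients copied from the regression table (2013 is the baseline year).
BASELINE_SLOPE = 0.179  # "episodes" row: slope of rating on episodes in 2013
INTERACTIONS = {        # "episodes:yearXXXX" rows: change in slope vs. 2013
    2014: -0.470,
    2015: -0.158,
    2016: -0.197,
    2017: -1.228,
}


def slopes_by_year(baseline, interactions, baseline_year=2013):
    """Return the episodes slope per year: baseline plus that year's interaction."""
    slopes = {baseline_year: baseline}
    for year, offset in interactions.items():
        slopes[year] = round(baseline + offset, 3)
    return slopes


slopes = slopes_by_year(BASELINE_SLOPE, INTERACTIONS)

# Year whose slope is largest in absolute value (the steepest line).
steepest = max(slopes, key=lambda year: abs(slopes[year]))

for year in sorted(slopes):
    direction = "positive" if slopes[year] > 0 else "negative"
    print(year, slopes[year], direction)
print("steepest slope:", steepest)
```

Under these table values, only 2013 and 2015 come out positive, and 2017 is the steepest.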
Our table results generally line up with the visualizations from our exploratory data analysis. Since most years have negative slopes, this ties in with the general declining medians of IMDB scores across the five years, which can be seen in our boxplots.
There seem to be some limitations of the data. For instance, in 2017 there were no shows with more than 25 episodes. Also, there could be many confounding variables affecting IMDB rating that we did not take into account, such as advertising, the popularity of the actors and actresses, and possible scandals surrounding the show. Finally, we had a relatively large data set, but it would be helpful to have more data points to expand our analysis. Even having data from the years 2010-2017 would help to generalize our findings further.
In the year 2013, as the number of episodes increased, the IMDB rating also increased by a fair amount. In 2014, as the number of episodes increased, the IMDB rating decreased fairly dramatically. In 2015, as the number of episodes increased, the IMDB rating gradually increased. In 2016, as the number of episodes increased, the IMDB rating slightly decreased, but there was no dramatic change. In 2017, as the number of episodes increased, the IMDB rating decreased dramatically, but the results cannot be compared to shows with more episodes because the 2017 data only include shows that had fewer than 25 episodes. Overall, excluding 2013, there seems to be either a decline in IMDB rating or no change as the number of episodes for the show increases.
Confidence Intervals: 2013 CI: [-0.382, 0.741] 2014 CI: [-1.419, 0.479] 2015 CI: [-0.911, 0.594] 2016 CI: [-0.880, 0.487] 2017 CI: [-3.536, 1.079]. Note that the 2013 interval is for the episodes slope itself, while the 2014 to 2017 intervals are for the interaction terms, that is, for the difference between that year's slope and the 2013 slope.
It is difficult to determine the practical significance of the results. The confidence intervals include both negative and positive values, which means the IMDB rating could either increase or decrease as the number of episodes increases. You could simply look at the general trend in the sample, but this trend is not necessarily representative of what would happen in every sample (as shown by the confidence intervals).
Given that our p-values are fairly large, we cannot rule out the possibility that the true slope relating the variables is 0. In other words, we cannot reject the hypothesis that the number of episodes of a particular show has no effect on its IMDB rating.
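The test statistics and confidence intervals in the regression table can be approximately reproduced from the estimates and standard errors alone. The sketch below assumes that all 109 shows were used to fit the 10 coefficients, giving 99 residual degrees of freedom and a two-sided 95% t critical value of about 1.984; both the degrees of freedom and the 95% level are assumptions, not stated in the analysis. The names are illustrative.

```python
# (estimate, std_error) pairs copied from the regression table.
ROWS = {
    "episodes": (0.179, 0.283),
    "episodes:year2014": (-0.470, 0.478),
    "episodes:year2015": (-0.158, 0.379),
    "episodes:year2016": (-0.197, 0.345),
    "episodes:year2017": (-1.228, 1.163),
}

# Assumption: 109 shows - 10 coefficients = 99 degrees of freedom,
# for which the two-sided 95% t critical value is roughly 1.984.
T_CRIT = 1.984


def summarize(estimate, std_error, t_crit=T_CRIT):
    """Return (t statistic, CI low, CI high, whether the CI contains zero)."""
    statistic = estimate / std_error
    low = estimate - t_crit * std_error
    high = estimate + t_crit * std_error
    return statistic, low, high, low <= 0.0 <= high


results = {term: summarize(est, se) for term, (est, se) in ROWS.items()}

for term, (stat, low, high, has_zero) in results.items():
    print(f"{term}: t={stat:.3f}, CI=[{low:.3f}, {high:.3f}], contains 0: {has_zero}")
```

Because the table values are rounded to three decimals, the reproduced numbers can differ from the table in the last digit. Every interval contains zero, which is the basis for not rejecting a slope (or slope difference) of 0.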
***
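For all CIs and p-values to have valid interpretations, the usual conditions for inference in linear regression must be met: linearity of the relationship, independence of the residuals, normality of the residuals, and equal variance of the residuals.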
Since there is a weak linear relationship between the residuals and number of episodes, and the residuals show a slight left skew, the conditions for inference are not fully met. As a result, the p-values and confidence intervals may not have valid interpretations and hold no strong descriptive power for the data. No strong claims of correlation can be made, which means we cannot make a clear statement about the relationship between number of episodes and IMDB rating based on this data.
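A residual check of this kind can be sketched in a few lines. The example below is illustrative only: it fits a simple least-squares line (without the year interaction used in our table) to just the six preview rows shown in the data table, then computes the residuals and their sample skewness. The helper names are hypothetical, and six rows are far too few to draw conclusions; the point is only to show the mechanics.

```python
from statistics import mean

# Six preview rows from the data table (episodes, imdb_rating).
episodes = [13, 15, 4, 36, 40, 39]
ratings = [80, 89, 60, 84, 85, 79]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def skewness(values):
    """Moment-based sample skewness: negative means a longer left tail."""
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


intercept, slope = fit_line(episodes, ratings)
residuals = [y - (intercept + slope * x) for x, y in zip(episodes, ratings)]
resid_skew = skewness(residuals)

print("slope:", round(slope, 3))
print("residuals:", [round(r, 2) for r in residuals])
print("residual skewness:", round(resid_skew, 3))
```

On the full data set, the same residuals would typically be inspected with a histogram and a residuals-versus-episodes scatterplot rather than a single skewness number.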
Since we could not reject our null hypothesis that number of episodes and IMDB rating have no correlation, we are still unsure whether these two factors are related. There is no clear evidence that IMDB rating increases or decreases as the length of a show grows. There is also no evidence that the relationship between these two variables changed over the years 2013-2017. This information could be useful to Netflix show producers, as increasing the length of a show does not appear to change its rating.
There definitely are limitations to our work. As discussed above, the IMDB rating is only one component of what people might deem a “good” show. The IMDB rating could also be affected by the amount of advertising done for a particular show or the popularity of the actors who star in it. The setting of the show could also play a particularly important role in how people rate it. Some people, for example, may enjoy the liveliness of a New York setting and have that play into their overall opinion of the show. The emotional appeal of the actors' performances could also largely shape the overall takeaway viewers have from the show. A final factor is the intended audience and whether that audience is reached. For example, House of Cards is likely intended for an audience of greater intelligence, and if that audience is not reached, the show could be confusing for some viewers and receive a lower IMDB rating.
However, even with the limitations of determining a “good” show, IMDB ratings are a simple and accessible way to gauge the success of a show. There is a great deal to investigate, and many other factors affecting the IMDB ratings of shows can be explored in future analysis. While our work focused on Netflix originals, another study might compare the IMDB ratings of original shows based on the platform on which they are released. You could compare the ratings of original shows currently airing on cable television with those of original shows released on streaming platforms. In future work we could also directly compare networks by averaging and then comparing the IMDB ratings of original shows from individual networks.