1 Investigating The Breakdown of Netflix Media

2 Introduction

Netflix is a streaming service and production company that offers a library of films and television series. As of December 31, 2021, Netflix had over 221.8 million subscribers worldwide. To serve such a diverse user base, Netflix is expanding its content to include media from many time periods, regions, and genres. This report examines the popularity of the different rating groups offered on Netflix and which release year produced the highest-rated media (among the media offered on Netflix).

3 Questions to Investigate

  1. In what year were the best TV shows released?
  2. Which rating group is the most popular?

3.0.1 Preparing the Data

Load Packages and Data

library(dplyr)
library(tidyverse)
netflix <- read.csv("~/R_DataScience/data/netflix.csv")

Filter out shows or movies that do not have a user rating score, as well as titles that were not rated (rating "NR").

filtered_netflix <- na.omit(netflix)  %>% 
  filter(rating != "NR")
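
The filtering step can be illustrated on a small toy data frame. This is only a sketch: the rows below are made up and merely reuse the column names from this report. Note that na.omit() drops a row when any column is NA, not only the user rating score, and that read.csv() reads blank text fields as empty strings rather than NA, so it may be worth checking how many rows each step removes.

```r
library(dplyr)

# Toy data standing in for netflix.csv; the values are made up for illustration.
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  rating = c("PG", "NR", "TV-MA", "R"),
  release.year = c(2015, 2016, 2017, 2012),
  user.rating.score = c(80, 75, NA, 90)
)

# na.omit() drops every row that has an NA in any column,
# then filter() drops the rows whose rating is "NR".
filtered_toy <- na.omit(toy) %>%
  filter(rating != "NR")

# Count how many rows the two steps removed in total.
rows_dropped <- nrow(toy) - nrow(filtered_toy)

print(filtered_toy)
print(rows_dropped)
```

Running the same row count on the real data would show how much of the catalogue the later figures are actually based on.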

4 Using Graphics to Answer Questions

4.0.1 What is the most common rating group for movies and TV shows on Netflix?

ggplot(filtered_netflix, aes(rating, fill = rating)) +
  geom_bar() +
  labs(caption = "Figure 1: Number of Movies and TV Shows on Netflix in Each Rating Group",
       y = "Number of Titles",
       x = "Rating Group",
       fill = "Rating") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0))
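
The bar heights in Figure 1 can also be read off as a table of counts. The sketch below uses a made-up toy data frame with the same rating column; the column name n_titles is chosen here for illustration only.

```r
library(dplyr)

# Toy ratings standing in for filtered_netflix; the values are made up.
toy <- data.frame(rating = c("PG", "TV-14", "PG", "R", "TV-14", "PG"))

# count() tallies the rows per rating group; sort = TRUE puts the
# most common group first, which is what the bar chart shows visually.
rating_counts <- toy %>%
  count(rating, sort = TRUE, name = "n_titles")

print(rating_counts)
```

Applied to filtered_netflix, the first and last rows of this table would name the most and least common rating groups directly.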

4.0.2 What is the user score distribution for each rating group?

ggplot(filtered_netflix, aes(rating, user.rating.score, fill = rating)) +
  geom_boxplot() +
  theme_bw() +
  labs (caption = "Figure 2: Distribution of User Rating Score for Each Rating Group",
        x = "Rating Group",
        y = "User Rating Score",
        fill  = "Rating") +
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        plot.caption = element_text(hjust = 0))
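
The box for each rating group in Figure 2 spans the first to third quartile, with the median marked inside. As a rough sketch, the same numbers can be computed per group; the toy data and the column names q1, median, and q3 below are assumptions made for illustration.

```r
library(dplyr)

# Toy scores standing in for filtered_netflix; the values are made up.
toy <- data.frame(
  rating = c("PG", "PG", "PG", "R", "R", "R"),
  user.rating.score = c(60, 70, 80, 85, 90, 95)
)

# Quartiles and median of the user rating score within each rating group.
box_stats <- toy %>%
  group_by(rating) %>%
  summarise(
    q1 = quantile(user.rating.score, 0.25, names = FALSE),
    median = median(user.rating.score),
    q3 = quantile(user.rating.score, 0.75, names = FALSE)
  )

print(box_stats)
```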

4.0.3 What is the association between release year for movies and TV shows and the average user score per year on Netflix?

# group the movies and shows by release year, then find the mean user rating score for each release year 
stats_netflix2 <- filtered_netflix %>% 
  group_by(release.year) %>% 
  summarise(mean(user.rating.score)) 

# assign appropriate column names
names(stats_netflix2) <- c("release_year", "mean_score")

ggplot(stats_netflix2, aes(release_year, mean_score)) +
  geom_point() +
  geom_smooth(method = lm)  +
  labs(caption =  "Figure 3: Average User Rating Score for each Release Year",
       y = "Average User Rating Score",
       x = "Release Year") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0)) 

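Note: The grey band is the 95% confidence interval around the fitted regression line [we can be 95% confident that the true average trend lies within that area]. It describes uncertainty in the fitted line, not the range in which individual yearly averages will fall.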
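
To make the meaning of the band concrete, the sketch below fits the same kind of linear model with lm() on made-up yearly means and compares a confidence interval with a prediction interval at one year. The data values and the chosen year (2010) are assumptions for illustration only.

```r
# Toy yearly means standing in for stats_netflix2; the values are made up.
toy_means <- data.frame(
  release_year = c(2000, 2004, 2008, 2012, 2016),
  mean_score = c(70, 74, 79, 83, 90)
)

# Same model family that geom_smooth(method = lm) fits.
fit <- lm(mean_score ~ release_year, data = toy_means)

new_year <- data.frame(release_year = 2010)

# interval = "confidence": uncertainty of the fitted line at that year
# (this is what the grey band shows).
conf_band <- predict(fit, new_year, interval = "confidence", level = 0.95)

# interval = "prediction": range expected for a new yearly average,
# which is always wider than the confidence interval.
pred_band <- predict(fit, new_year, interval = "prediction", level = 0.95)

print(conf_band)
print(pred_band)
```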

5 Analysis

5.0.1 What is the average user rating score for each rating?

# group the movies and shows by rating group, then find the mean user rating score for each rating
stats_netflix <- filtered_netflix  %>% 
  group_by(rating) %>% 
  summarise(mean(user.rating.score))

# assign appropriate column names
names(stats_netflix) <- c("rating", "mean_score")

# arrange rating groups from highest to lowest mean user score
ordered_stats_netflix <- arrange(stats_netflix, desc(mean_score))

knitr::kable(ordered_stats_netflix, align = "cc", caption = "Figure 4: The Average User Rating Score for Each Rating Group", col.names = c("Rating", "Average User Rating Score"))
Figure 4: The Average User Rating Score for Each Rating Group

Rating      Average User Rating Score
TV-MA       88.52083
R           87.11111
TV-14       86.42553
PG          86.20168
TV-PG       85.85714
G           77.24561
TV-Y        75.60000
TV-Y7-FV    75.31250
TV-G        74.50000
PG-13       74.00000
TV-Y7       73.92857

5.0.2 Which release year has the best ratings?

# group the movies and shows by release year, then find the mean user rating score for each release year 
stats_netflix2 <- filtered_netflix %>% 
  group_by(release.year) %>% 
  summarise(mean(user.rating.score)) 

# assign appropriate column names
names(stats_netflix2) <- c("release_year", "mean_score")

# arrange years from highest to lowest mean user score
ordered_stats_netflix2 <- arrange(stats_netflix2, desc(mean_score))



knitr::kable(ordered_stats_netflix2, align = "cc", caption = "Figure 5: The Average User Rating Score for Each Year the Movies Were Released", col.names = c("Release Year", "Average User Rating Score"))
Figure 5: The Average User Rating Score for Each Year the Movies Were Released

Release Year    Average User Rating Score
2017            90.42623
2002            89.37500
2011            89.33333
2005            89.30000
2000            88.91667
2008            87.18750
2001            87.00000
2016            86.68020
1978            86.00000
1993            85.00000
2012            84.55556
2015            84.50000
1997            83.50000
2004            83.50000
1999            82.80000
2003            82.10000
1994            82.00000
2013            81.36000
2009            80.85714
1989            80.00000
1998            79.23077
2010            78.23077
2014            78.00000
2007            74.30769
2006            72.66667
1992            70.00000
1982            68.00000
1986            67.00000
1995            65.66667
1990            65.00000
1940            61.00000
1987            58.00000
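
A yearly average based on only one or two titles is much less reliable than one based on many. As a sketch, the number of titles per release year can be reported next to the mean; the toy data and the column name n_titles are assumptions for illustration, and whether any year in the table above rests on so few titles would need to be checked against the real data.

```r
library(dplyr)

# Toy data standing in for filtered_netflix; the values are made up.
toy <- data.frame(
  release.year = c(1940, 2016, 2016, 2017, 2017, 2017),
  user.rating.score = c(61, 80, 90, 88, 92, 90)
)

# Mean score and number of titles per release year,
# ordered from highest to lowest mean score.
year_stats <- toy %>%
  group_by(release.year) %>%
  summarise(
    mean_score = mean(user.rating.score),
    n_titles = n()
  ) %>%
  arrange(desc(mean_score))

print(year_stats)
```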

6 Conclusion

6.0.1 Question 1

For the media available on Netflix, titles with more recent release years tend to have higher user rating scores [figures 3, 5]. The highest yearly average belongs to 2017 (90.43), and the eight highest-ranked years all fall in 2000 or later [figure 5]. There is also a large increase in average user rating score for media released after the 1980s [figures 3, 5].

6.0.2 Question 2

As for rating groups, the most popular were not necessarily the most common. For example, the R rating was the least common but had one of the highest average scores [figures 1, 2, 4]. In general, media aimed at older viewers (R, TV-MA) tended to score highest, while media rated for broader audiences (PG, TV-14) tended to be most common [figures 1, 2, 4].

6.0.3 Future Research

Because Netflix has different media options around the globe, future research could analyze these same patterns in other regions of the world.