Netflix is a streaming service and production company that offers a library of films and television series. As of December 31, 2021, Netflix had over 221.8 million subscribers worldwide. To serve such a diverse audience, Netflix is expanding its catalog to include media from many different time periods, regions, and genres. This script examines how user rating scores differ across content rating groups (such as PG or TV-MA) and which release years have the highest average user rating scores among the titles offered on Netflix.
Load Packages and Data
```r
library(tidyverse)  # attaches dplyr and ggplot2, among other core packages

netflix <- read.csv("~/R_DataScience/data/netflix.csv")
```
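The column names used below (`user.rating.score`, `release.year`) look like the result of `read.csv()`'s default `check.names = TRUE`, which passes each header through `make.names()` and replaces spaces with dots. A minimal sketch, assuming (this script does not confirm it) that the raw CSV headers contain spaces; the two-row CSV text is hypothetical:

```r
# Hypothetical two-row CSV; the real netflix.csv headers may differ
csv_text <- "title,rating,release year,user rating score
Title A,TV-MA,2017,90
Title B,PG,2016,85"

# check.names = TRUE (the default) converts headers to syntactic R names
toy <- read.csv(text = csv_text)
names(toy)

# make.names() is the conversion read.csv() applies to the headers
make.names(c("release year", "user rating score"))
```

Passing `check.names = FALSE` would keep the original headers, which would then need backticks in dplyr code.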
Filter out shows or movies that do not have a user rating score, as well as titles that were not rated (NR).
```r
# drop rows that contain an NA in any column, then drop titles rated "NR" (not rated)
filtered_netflix <- na.omit(netflix) %>%
  filter(rating != "NR")
```
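`na.omit()` removes a row if *any* column is `NA`, not only the score column. If the data set has other columns with missing values (not confirmed here), titles that do have a user rating score would be dropped as well. A minimal sketch on hypothetical toy data, comparing `na.omit()` with a filter that targets only the score column:

```r
library(dplyr)

# Hypothetical toy data: "B" lacks a description (an unrelated field),
# "C" is not rated, and "D" lacks a user rating score
toy <- data.frame(
  title = c("A", "B", "C", "D"),
  rating = c("PG", "TV-MA", "NR", "R"),
  description = c("x", NA, "z", "w"),
  user.rating.score = c(80, 90, 70, NA),
  stringsAsFactors = FALSE
)

# na.omit() drops "B" (NA description) and "D" (NA score)
nrow(na.omit(toy))

# Filtering on the score column alone keeps "B"
scored <- toy %>%
  filter(!is.na(user.rating.score), rating != "NR")
scored$title
```

Here `na.omit()` leaves "A" and "C", while the targeted filter leaves "A" and "B".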
```r
# Figure 1: number of titles in each rating group
ggplot(filtered_netflix, aes(rating, fill = rating)) +
  geom_bar() +
  labs(caption = "Figure 1: Number of Movies and TV Shows on Netflix in Each Rating Group",
       y = "Number of Titles",
       x = "Rating Group",
       fill = "Rating") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0))
```
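The bar heights in Figure 1 are simply the number of rows per rating group. A minimal sketch that computes those counts as a table, using a hypothetical stand-in for `filtered_netflix`:

```r
library(dplyr)

# Hypothetical stand-in for filtered_netflix
toy <- data.frame(
  rating = c("PG", "PG", "TV-14", "R", "PG", "TV-14"),
  user.rating.score = c(80, 85, 90, 95, 70, 88),
  stringsAsFactors = FALSE
)

# count() tallies rows per group; sort = TRUE orders from most to least common
rating_counts <- toy %>%
  count(rating, sort = TRUE)
rating_counts
```

On this toy data the result lists PG (3), TV-14 (2), and R (1).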
```r
# Figure 2: spread of user rating scores within each rating group
ggplot(filtered_netflix, aes(rating, user.rating.score, fill = rating)) +
  geom_boxplot() +
  theme_bw() +
  labs(caption = "Figure 2: Distribution of User Rating Score for Each Rating Group",
       x = "Rating Group",
       y = "User Rating Score",
       fill = "Rating") +
  # x-axis labels are hidden; the fill legend identifies each group
  theme(axis.ticks.x = element_blank(),
        axis.text.x = element_blank(),
        plot.caption = element_text(hjust = 0))
```
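Each box in Figure 2 is drawn from the group's lower quartile, median, and upper quartile. A minimal sketch that computes those numbers directly, on hypothetical toy scores, so the boxes can be read as a table:

```r
library(dplyr)

# Hypothetical scores for two rating groups
toy <- data.frame(
  rating = c(rep("G", 5), rep("TV-MA", 5)),
  user.rating.score = c(60, 70, 75, 80, 90, 85, 88, 90, 92, 95),
  stringsAsFactors = FALSE
)

# Quartiles per group: the box edges (lower/upper) and the middle line (median)
box_stats <- toy %>%
  group_by(rating) %>%
  summarise(lower_quartile = unname(quantile(user.rating.score, 0.25)),
            median_score   = median(user.rating.score),
            upper_quartile = unname(quantile(user.rating.score, 0.75)))
box_stats
```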
```r
# group the movies and shows by release year, then find the mean user rating score for each year
stats_netflix2 <- filtered_netflix %>%
  group_by(release_year = release.year) %>%
  summarise(mean_score = mean(user.rating.score))

# Figure 3: yearly averages with a fitted linear trend
ggplot(stats_netflix2, aes(release_year, mean_score)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x) +
  labs(caption = "Figure 3: Average User Rating Score for Each Release Year",
       y = "Average User Rating Score",
       x = "Release Year") +
  theme_bw() +
  theme(plot.caption = element_text(hjust = 0))
```
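Note: The grey band is the 95% confidence interval for the fitted regression line. It reflects uncertainty in the estimated mean trend, not the range within which individual yearly averages are expected to fall; that range would be given by a wider prediction interval.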
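The difference between the confidence band that `geom_smooth(method = "lm")` shades and a prediction interval can be checked with `lm()` and `predict()`. A minimal sketch on hypothetical yearly averages (not the Netflix values):

```r
# Hypothetical yearly averages, not taken from the Netflix data
yearly <- data.frame(
  release_year = c(1990, 1995, 2000, 2005, 2010, 2015),
  mean_score   = c(66, 70, 82, 80, 85, 88)
)

fit <- lm(mean_score ~ release_year, data = yearly)
new_year <- data.frame(release_year = 2012)

# Interval for the fitted mean: what the grey band shows at this x value
conf_int <- predict(fit, new_year, interval = "confidence", level = 0.95)
# Interval for a single new yearly average: always wider than conf_int
pred_int <- predict(fit, new_year, interval = "prediction", level = 0.95)

conf_int
pred_int
coef(fit)["release_year"]  # estimated change in average score per release year
```

The prediction interval adds the residual variance on top of the uncertainty in the fitted line, which is why it is wider.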
```r
# group the movies and shows by rating group, then find the mean user rating score for each group
stats_netflix <- filtered_netflix %>%
  group_by(rating) %>%
  summarise(mean_score = mean(user.rating.score))

# arrange rating groups from highest to lowest mean user score
ordered_stats_netflix <- arrange(stats_netflix, desc(mean_score))

knitr::kable(ordered_stats_netflix, align = "cc",
             caption = "Figure 4: The Average User Rating Score for Each Rating Group",
             col.names = c("Rating", "Average User Rating Score"))
```
| Rating | Average User Rating Score |
|---|---|
| TV-MA | 88.52083 |
| R | 87.11111 |
| TV-14 | 86.42553 |
| PG | 86.20168 |
| TV-PG | 85.85714 |
| G | 77.24561 |
| TV-Y | 75.60000 |
| TV-Y7-FV | 75.31250 |
| TV-G | 74.50000 |
| PG-13 | 74.00000 |
| TV-Y7 | 73.92857 |
```r
# reuse the per-year means (stats_netflix2) computed for Figure 3,
# arranged from highest to lowest mean user score
ordered_stats_netflix2 <- arrange(stats_netflix2, desc(mean_score))

knitr::kable(ordered_stats_netflix2, align = "cc",
             caption = "Figure 5: The Average User Rating Score for Each Release Year",
             col.names = c("Release Year", "Average User Rating Score"))
```
| Release Year | Average User Rating Score |
|---|---|
| 2017 | 90.42623 |
| 2002 | 89.37500 |
| 2011 | 89.33333 |
| 2005 | 89.30000 |
| 2000 | 88.91667 |
| 2008 | 87.18750 |
| 2001 | 87.00000 |
| 2016 | 86.68020 |
| 1978 | 86.00000 |
| 1993 | 85.00000 |
| 2012 | 84.55556 |
| 2015 | 84.50000 |
| 1997 | 83.50000 |
| 2004 | 83.50000 |
| 1999 | 82.80000 |
| 2003 | 82.10000 |
| 1994 | 82.00000 |
| 2013 | 81.36000 |
| 2009 | 80.85714 |
| 1989 | 80.00000 |
| 1998 | 79.23077 |
| 2010 | 78.23077 |
| 2014 | 78.00000 |
| 2007 | 74.30769 |
| 2006 | 72.66667 |
| 1992 | 70.00000 |
| 1982 | 68.00000 |
| 1986 | 67.00000 |
| 1995 | 65.66667 |
| 1990 | 65.00000 |
| 1940 | 61.00000 |
| 1987 | 58.00000 |
For the media available on Netflix, the titles with the highest average user rating scores tend to have more recent release years [figures 3, 5]. Average user rating scores also increase sharply for titles released after the 1980s [figures 3, 5].
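A yearly average resting on one or two titles is only as stable as those titles' scores, so reporting the number of titles next to each mean shows how much weight each year in Figure 5 can bear. A minimal sketch on a hypothetical stand-in for `filtered_netflix`; `min_titles` is a hypothetical cut-off, not a value used in this analysis:

```r
library(dplyr)

# Hypothetical stand-in for filtered_netflix
toy <- data.frame(
  release.year = c(1940, 2016, 2016, 2016, 2017, 2017),
  user.rating.score = c(61, 80, 90, 85, 92, 88)
)

min_titles <- 2  # hypothetical threshold; choose one suited to the real data

year_summary <- toy %>%
  group_by(release_year = release.year) %>%
  summarise(n_titles = n(),
            mean_score = mean(user.rating.score)) %>%
  filter(n_titles >= min_titles) %>%   # drop years backed by too few titles
  arrange(desc(mean_score))
year_summary
```

On this toy data the single-title 1940 row is dropped, leaving 2017 (mean 90, 2 titles) ahead of 2016 (mean 85, 3 titles).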
As for rating groups, the most popular were not necessarily the most common. For example, the R rating was the least common but one of the most popular [figures 1, 2]. Generally, media targeted toward adult viewers (R, TV-MA) tended to be more popular, while media aimed at broader audiences (PG, TV-14) tended to be the most common [figures 1, 2, 4].
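The "popular versus common" comparison can be made explicit by ranking each rating group on both its title count and its mean score in one table. A minimal sketch on a hypothetical stand-in for `filtered_netflix`:

```r
library(dplyr)

# Hypothetical stand-in for filtered_netflix
toy <- data.frame(
  rating = c("PG", "PG", "PG", "TV-14", "TV-14", "R"),
  user.rating.score = c(80, 84, 82, 85, 87, 95),
  stringsAsFactors = FALSE
)

rating_summary <- toy %>%
  group_by(rating) %>%
  summarise(n_titles = n(),
            mean_score = mean(user.rating.score)) %>%
  mutate(common_rank  = min_rank(desc(n_titles)),    # 1 = most titles
         popular_rank = min_rank(desc(mean_score))) %>% # 1 = highest mean score
  arrange(popular_rank)
rating_summary
```

On this toy data R ranks first for popularity but last for commonness, the same kind of mismatch described above.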
Because Netflix's catalog differs by region, future research could examine whether these same patterns hold in other parts of the world.