Netflix, Inc. is an American subscription streaming service and production company. Founded on August 29, 1997, it offers a library of films and television series through distribution deals as well as its own productions, known as Netflix Originals. As of March 31, 2022, Netflix had over 221.6 million subscribers worldwide.
Because Netflix is so popular and widely used, it generates a lot of data. Let's look at some of that data and try to find insights in it. The dataset, obtained from Kaggle, contains the movies and TV shows that were streaming on Netflix when it was collected.
Loading the libraries required for the analysis.
library(tidyverse)
library(readxl)
library(gghighlight)
library(corrplot)
library(ggtext)
The data is read from an Excel file with the read_excel() function from the readxl package and stored in a data frame. The data frame is then sorted by the release year of each movie or TV show in ascending order (oldest first) using the arrange() function.
titles <- read_excel("C:/Users/Ameya/Downloads/titles.xlsx")
titles <- arrange(titles, release_year)
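`arrange()` sorts in ascending order by default, which is why the oldest titles come first. To list the newest titles first instead, wrap the column in `desc()`. A minimal sketch on a small made-up tibble (the titles and years are illustrative, not taken from the dataset):

```r
library(dplyr)

# Toy data standing in for the Netflix titles table (values are made up)
toy <- tibble(
  title = c("A", "B", "C"),
  release_year = c(1999, 2021, 1954)
)

oldest_first <- arrange(toy, release_year)        # ascending (the default)
newest_first <- arrange(toy, desc(release_year))  # descending: newest first

print(newest_first$title)
```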
Columns 4 and 6, which are not needed for this analysis, are removed from the data frame.
titles <- titles[-c(4,6)]
head(titles, 30)
## # A tibble: 30 x 13
## id title type release_year runtime genres production_coun~ seasons
## <chr> <chr> <chr> <dbl> <dbl> <chr> <chr> <dbl>
## 1 ts300~ Five Ca~ SHOW 1945 48 ['docume~ United States 1
## 2 tm102~ Raya an~ MOVIE 1953 105 ['drama'~ ['EG'] NA
## 3 tm164~ White C~ MOVIE 1954 115 ['romanc~ United States NA
## 4 tm196~ The Bla~ MOVIE 1954 100 ['romanc~ ['EG'] NA
## 5 tm204~ Dark Wa~ MOVIE 1956 120 ['drama'~ ['EG'] NA
## 6 tm135~ Cairo S~ MOVIE 1958 77 ['drama'~ ['EG'] NA
## 7 tm358~ Ujala MOVIE 1959 142 ['romanc~ India NA
## 8 tm356~ Singapo~ MOVIE 1960 158 ['drama'~ India NA
## 9 tm442~ The Gun~ MOVIE 1961 158 ['war', ~ ['US', 'GB'] NA
## 10 tm102~ Profess~ MOVIE 1962 163 ['romanc~ India NA
## # ... with 20 more rows, and 5 more variables: imdb_id <chr>, imdb_score <dbl>,
## # imdb_votes <dbl>, tmdb_popularity <dbl>, tmdb_score <dbl>
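Dropping columns by position (`titles[-c(4,6)]`) silently removes the wrong columns if the column order in the file ever changes. Selecting by name with `select()` is more robust. The sketch below assumes the two dropped columns are named `description` and `age_certification`; that is an assumption about the Kaggle file, so check `names(titles)` before relying on it. It runs on a toy tibble:

```r
library(dplyr)

# Toy tibble; the column names `description` and `age_certification`
# are assumed, not confirmed from the original file
raw <- tibble(
  id = c("tm1", "ts2"),
  title = c("Movie A", "Show B"),
  type = c("MOVIE", "SHOW"),
  description = c("A film.", "A series."),
  release_year = c(2001, 2019),
  age_certification = c("PG", "TV-MA")
)

# Drop the unwanted columns by name instead of by position
trimmed <- select(raw, -c(description, age_certification))
names(trimmed)
```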
With the data loaded and cleaned, further analysis can be performed. Netflix offers two types of content: movies and TV shows. Let's see how the titles are distributed between them.
mp <- ggplot(titles, aes(x = type)) +
geom_bar(fill = "red", width = 0.4) + theme_bw() +
labs(title = "Netflix Source Distribution", y = "No of titles", x = "Type")
mp + theme(plot.title = element_text(hjust = 0.5))
This bar plot, made with ggplot() and geom_bar(), shows that there are more movies than TV shows streaming on Netflix.
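The bar chart shows the split visually; `count()` returns the exact numbers behind it, and a `mutate()` step turns them into shares. A minimal sketch on a made-up tibble that uses the same `type` values (MOVIE and SHOW) as the dataset; on the real data, `titles` would take the place of `toy`:

```r
library(dplyr)

# Made-up example rows using the dataset's MOVIE/SHOW labels
toy <- tibble(type = c("MOVIE", "MOVIE", "SHOW", "MOVIE", "SHOW"))

type_counts <- toy %>%
  count(type) %>%             # one row per type; column `n` holds the count
  mutate(share = n / sum(n))  # proportion of all titles

type_counts
```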
Netflix is available in many countries, and each country has its own entertainment industry. The next visualisation shows the five countries that produce the most titles available to stream on Netflix.
country <- titles %>% group_by(production_countries) %>% summarise(title = n())
country <- country %>% arrange(desc(title))
country <- head(country, 5)
cplot <- ggplot(country, aes(x="", y=title, fill= production_countries)) +
geom_bar(stat="identity", width=1, color="white") +
coord_polar("y", start=0) + theme_void()
cplot + guides(fill = guide_legend(title = "Country")) +
labs(title = "Content produced by Countries") + theme(plot.title = element_text(hjust = 0.8))
As the pie chart shows, the United States produces the most content available on Netflix, followed by India.
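Because `production_countries` stores a list-like string (for example `['US', 'GB']` in the printed output earlier), grouping on it treats every co-production combination as its own category, so a title made jointly by the US and the UK counts towards neither country. One way to count individual countries is to strip the brackets and quotes and then split each value into one row per country with `tidyr::separate_rows()`. A sketch on made-up values in the same format:

```r
library(dplyr)
library(tidyr)

# Made-up values in the same style as the dataset's production_countries column
toy <- tibble(
  title = c("A", "B", "C"),
  production_countries = c("['US', 'GB']", "['IN']", "['US']")
)

country_counts <- toy %>%
  # remove the square brackets and single quotes around the list-like string
  mutate(production_countries = gsub("\\[|\\]|'", "", production_countries)) %>%
  # one row per (title, country) pair
  separate_rows(production_countries, sep = ", ") %>%
  # titles per individual country, largest first
  count(production_countries, sort = TRUE)

country_counts
```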
People around the world have different tastes in movies and TV shows. The next visualisation shows the 15 most common genres among the titles on Netflix.
genre <- titles %>% group_by(genres) %>% summarise(title = n())
genre <- genre %>% arrange(desc(title))
genre <- head(genre, 15)
gplot <- ggplot(genre, aes(x = reorder(genres, -title), y = title)) +
geom_bar(stat = "identity", fill = "red") +
theme(axis.text.x = element_text(angle = 90, vjust = .1, hjust = .1)) +
labs(title = "Top Genres", y = "No of titles", x = "Genre")
gplot + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.3)) + coord_flip()
The visualisation shows that comedy is the most common genre among the titles on Netflix (the chart counts titles, not views), followed by drama and documentation.
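The `genres` column holds list-like strings such as `['drama', 'comedy']`, so grouping on it counts exact genre combinations: a drama-comedy adds nothing to the `['comedy']` bar. To count every title that includes a given genre, one option is to test for the quoted genre name inside the string. A sketch on made-up values; `has_genre()` is a hypothetical helper written only for this example:

```r
library(dplyr)

# Made-up genre strings in the dataset's list-like format
toy <- tibble(
  title = c("A", "B", "C", "D"),
  genres = c("['comedy']", "['drama', 'comedy']", "['documentation']", "['drama']")
)

# Hypothetical helper: TRUE when the quoted genre name appears in the string
has_genre <- function(genres, genre) {
  grepl(paste0("'", genre, "'"), genres, fixed = TRUE)
}

# Number of titles that include each genre, whatever else they are tagged with
genre_totals <- toy %>%
  summarise(
    comedy = sum(has_genre(genres, "comedy")),
    drama = sum(has_genre(genres, "drama")),
    documentation = sum(has_genre(genres, "documentation"))
  )

genre_totals
```

Grouping these four rows by the raw `genres` string would give four categories with one title each, whereas this counts comedy and drama twice each.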
For more detailed analysis, the titles data frame is split by type with the subset() function into two new data frames: movies and shows.
movies <- subset(titles, type == "MOVIE")
shows <- subset(titles, type == "SHOW")
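`subset()` is base R; the dplyr equivalent is `filter()`. A quick sanity check is that the two subsets together account for every title, which holds only if `type` contains nothing other than MOVIE and SHOW (rows where the condition is `NA` are dropped by both functions). A sketch on a toy tibble:

```r
library(dplyr)

# Toy rows standing in for `titles`
toy <- tibble(
  title = c("A", "B", "C"),
  type = c("MOVIE", "SHOW", "MOVIE")
)

movies_toy <- filter(toy, type == "MOVIE")
shows_toy <- filter(toy, type == "SHOW")

# Every title should land in exactly one of the two subsets
stopifnot(nrow(movies_toy) + nrow(shows_toy) == nrow(toy))
```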
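With the two new data frames created, let's find the most popular TV show on Netflix at the time the data was collected, based on TMDB popularity. TMDB popularity is a score calculated by The Movie Database (TMDB), a community-built film and TV database that is separate from IMDb (Internet Movie Database). To keep the chart readable, only shows with a TMDB popularity above 300 are plotted.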
shows1 <- subset(shows, tmdb_popularity > 300)
shows1 <- shows1[-c(1,9)]
plot <- ggplot(shows1, aes(x= title, y= tmdb_popularity)) +
geom_bar(stat = "identity", fill = "red") +
theme(axis.text.x = element_text(angle = 90, vjust = .1, hjust = .1)) +
labs(title = "Most Popular Show on Netflix as of now.", y = "TMDB Popularity", x = "Show Title") +
gghighlight(max(tmdb_popularity) > 1450)
plot + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = -0.1)) + coord_flip()
The Marked Heart is the most popular TV show on Netflix at the time the data was collected; it is highlighted in the horizontal bar plot.
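The highlight relies on a hand-picked cutoff (1450) passed to `gghighlight()`. `slice_max()` returns the top row directly, without choosing a threshold. A sketch on made-up popularity values; on the real data it would be applied to `shows`:

```r
library(dplyr)

# Made-up popularity values
toy_shows <- tibble(
  title = c("Show A", "Show B", "Show C"),
  tmdb_popularity = c(350.2, 1520.8, 910.4)
)

# Row with the highest TMDB popularity (ties, if any, are all kept)
top_show <- slice_max(toy_shows, tmdb_popularity, n = 1)
top_show$title
```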
The most popular movie is found in the same way.
movies1 <- subset(movies, tmdb_popularity > 300)
movies1 <- movies1[-c(1,9)]
plot1 <- ggplot(movies1, aes(x= title, y= tmdb_popularity)) +
geom_bar(stat = "identity", fill = "red") +
theme(axis.text.x = element_text(angle = 90, vjust = .1, hjust = .1)) +
gghighlight(max(tmdb_popularity) > 1450) +
labs(title = "Most Popular Movie on Netflix as of now.", y = "TMDB Popularity", x = "Movie Title")
plot1 + theme(panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black"),
plot.title = element_text(hjust = 0.1)) + coord_flip()
365 Days: This Day is the most popular movie on Netflix at the time the data was collected.
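Since the same steps are repeated for movies and shows, both answers can also come from one grouped pipeline on the full `titles` data, without splitting it first. A sketch on made-up rows covering both content types:

```r
library(dplyr)

# Made-up rows covering both content types
toy <- tibble(
  title = c("Movie A", "Movie B", "Show A", "Show B"),
  type = c("MOVIE", "MOVIE", "SHOW", "SHOW"),
  tmdb_popularity = c(1800.5, 420.0, 310.7, 1520.8)
)

# Most popular title within each type
top_by_type <- toy %>%
  group_by(type) %>%
  slice_max(tmdb_popularity, n = 1) %>%
  ungroup() %>%
  arrange(type)

top_by_type
```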
Next, the top-rated movies are identified from IMDb scores and vote counts: only movies with an IMDb score above 8 and more than 70,000 votes are kept, and the results are shown in a lollipop chart.
movies2 <- subset(movies, imdb_score > 8 & imdb_votes > 70000)
movies2 <- movies2[-c(1,9)]
plot2 <- ggplot(movies2, aes(x=title, y=imdb_score)) +
geom_segment( aes(x=title, xend=title, y=0, yend=imdb_score), color="red2") +
geom_point( color="red", size=4) +
theme_light() +
theme(
panel.grid.major.x = element_blank(),
panel.border = element_blank(),
axis.ticks.x = element_blank(),panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")
) +
xlab("Movies") +
ylab("Rating") + gghighlight(imdb_score > 8.5)
# adding a complete theme such as theme_bw() here would discard the custom theme() settings above
plot2 + coord_flip()
Based on the visualisation, the three top-rated movies on Netflix (among those with more than 70,000 IMDb votes) are Saving Private Ryan, Inception and Forrest Gump.
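The vote threshold matters: without it, a title with a very high score from only a handful of votes could top the list. The top three can also be listed directly, in order, by filtering with the same cutoffs and sorting. A sketch on made-up ratings:

```r
library(dplyr)

# Made-up ratings: "Obscure Gem" has a high score but very few votes
toy_movies <- tibble(
  title = c("Film A", "Film B", "Film C", "Film D", "Obscure Gem"),
  imdb_score = c(8.6, 8.8, 8.1, 8.7, 9.4),
  imdb_votes = c(1200000, 2300000, 90000, 2100000, 45)
)

top3 <- toy_movies %>%
  filter(imdb_score > 8, imdb_votes > 70000) %>%  # same cutoffs as the chart
  arrange(desc(imdb_score)) %>%                    # highest score first
  slice_head(n = 3)

top3$title
```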
Similarly, the top-rated shows on Netflix are found with the same kind of chart, this time keeping shows with an IMDb score above 8.5 and more than 70,000 votes.
shows2 <- subset(shows, imdb_score > 8.5 & imdb_votes > 70000)
shows2 <- shows2[-c(1,9)]
plot3 <- ggplot(shows2, aes(x=title, y=imdb_score)) +
geom_segment( aes(x=title, xend=title, y=0, yend=imdb_score), color="red3") +
geom_point( color="red", size=4) +
theme_light() +
theme(
panel.grid.major.x = element_blank(),
panel.border = element_blank(),
axis.ticks.x = element_blank(),panel.grid.major = element_blank(), panel.grid.minor = element_blank(),
panel.background = element_blank(), axis.line = element_line(colour = "black")
) +
xlab("Shows") +
ylab("Rating") + gghighlight(imdb_score > 9)
# adding a complete theme such as theme_bw() here would discard the custom theme() settings above
plot3 + coord_flip()
Four series are highlighted instead of three because Arcane and The Last Dance have the same rating and are tied for third place. Above them, Avatar: The Last Airbender is the second-highest rated, and Breaking Bad is the highest-rated TV series on Netflix.
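The tie can be made explicit with `min_rank()`, which gives tied values the same rank and then skips the next one (1, 2, 3, 3, 5), so keeping ranks up to 3 returns four titles. A sketch on made-up scores:

```r
library(dplyr)

# Made-up scores with a tie for third place
toy_shows <- tibble(
  title = c("Show A", "Show B", "Show C", "Show D", "Show E"),
  imdb_score = c(9.5, 9.3, 9.1, 9.1, 8.8)
)

ranked <- toy_shows %>%
  mutate(rank = min_rank(desc(imdb_score))) %>%  # tied scores share a rank
  filter(rank <= 3)

ranked
```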
Netflix is still growing and remains one of the leading streaming services. Its subscriber numbers, along with its catalogue of movies and TV shows, are likely to keep increasing in the coming years.