

Introduction

Netflix is a streaming service that offers a wide variety of TV shows and movies. Created in America, the on-demand streaming platform is very popular as it is convenient and flexible. Users are able to view Netflix on multiple devices, allowing them to watch their favorite content whenever and wherever. According to Netflix’s revenue and usage statistics, at the end of 2020 there were about 204 million Netflix subscribers worldwide. Netflix is a global phenomenon and is only on the rise.

Why This Topic?

I watch Netflix practically every day. I think that they do a great job of providing entertaining and relevant content. I am always able to find something to watch, and most of the time I end up enjoying it too.

This is why I became curious as to how Netflix’s selection comes to be. With a huge variety of content in several different categories, I thought it would be interesting to do an analysis of Netflix’s content. There are tons of different factors in their selection, such as genre, country of production, rating, release year, and duration.

Data

The data used for this project comes from Kaggle. This data set consists of TV Shows and Movies available on Netflix as of 2019. The Kaggle data set is collected from Flixable, a third-party Netflix search engine, which I consider a reasonably reliable source.

I had to make a few adjustments in Excel with the original data set. First, I didn’t see a necessity to include the cast or directors, so I deleted those columns. Then, some TV Shows and Movies were listed under multiple countries and genres. Using Excel, I deleted any entries following a comma so that it was focused on one variable. Lastly, the date added on Netflix column included the month and day. I knew this would be hard to analyze in R, so I created a formula in Excel to only account for the year.
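The same two cleanup steps could also be done directly in R. The following is a minimal sketch, not the Excel formulas I actually used: the sample values, the helper names `first_value` and `year_added`, and the "Month day, year" date format are all assumptions for illustration.

```r
# Hypothetical sample values; the real spreadsheet columns may be formatted differently.
country_raw <- c("United States, India", "Japan", NA)
added_raw   <- c("September 9, 2019", "January 1, 2016", NA)

# Keep only the text before the first comma (the first listed country or genre).
first_value <- function(x) trimws(sub(",.*$", "", x))

# Pull the trailing four-digit year out of a "Month day, year" string.
year_added <- function(x) as.integer(sub("^.*(\\d{4})\\s*$", "\\1", x))

country_first <- first_value(country_raw)
added_year    <- year_added(added_raw)
```

One design note: with this approach a missing date stays `NA` instead of turning into a placeholder year, which makes missing values easier to spot later.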

How Does Netflix Choose Their Content?

When it comes to how Netflix chooses the content to stream on their platform, I did a lot of research and came across a very informative blog post. Of course, Netflix needs licenses from studios in order to broadcast their content on the streaming service, but the selection of their Movies and TV Shows is definitely not done at random.

Netflix is a very data-driven company. The decision to add each TV Show and Movie to their selection is backed up by a ton of data. Netflix uses analytics to determine which content will best serve their users. Since they have such a large number of subscribers, Netflix is able to gather a tremendous amount of data. From this data, Netflix is able to make better decisions.

Analysis

Since Netflix is an American company, I was curious to view the role the home-base country plays on the international streaming platform. Netflix’s media library is home to works from a lot of different countries, but since Netflix is an American company, does it hold mostly American produced content?

My analysis will look at the TV Shows and Movies on Netflix as of 2019. Although I won’t be able to determine the reasoning for why each TV Show and Movie was selected to be a part of Netflix’s platform, I will be able to view the number of TV Shows and Movies on Netflix, their categorized genres, ratings and release year. In addition to analyzing Netflix’s 2019 data as a whole, I’ll identify which title in each of the TV Shows and Movies sections has been on Netflix the longest.

From there, I’ll be able to look into which countries have the most content on the streaming platform when it comes to number of TV Shows and Movies produced from their locations. I will also be able to analyze that country’s genre, ratings and release year to see if the specific country takes up the majority of Netflix’s content.

library(tidytext)
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.3.3     ✓ purrr   0.3.4
## ✓ tibble  3.0.6     ✓ dplyr   1.0.4
## ✓ tidyr   1.1.2     ✓ stringr 1.4.0
## ✓ readr   1.4.0     ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
library(readxl)
library(rmarkdown)
netflix_titles <- read_excel("~/Desktop/netflix_titles.xlsx")
## New names:
## * `` -> ...2
## * `` -> ...4
## * `` -> ...5
## * `` -> ...6
## * `` -> ...7
## * ...
colnames(netflix_titles) <- c('Type','Title','Country', 'notincludeddate','ReleaseYear','Rating','Duration', 'Genre','Added')

Netflix’s Selection as a Whole in 2019

The first topics I want to look at have to do with Netflix’s 2019 selection as a whole.

TV Shows Offered on Netflix in 2019

netflix_titles %>% 
  filter(Type%in%"TV Show") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1  2410

In 2019, there were 2,410 TV Shows to select from on Netflix.

Movies Offered on Netflix in 2019

netflix_titles %>% 
  filter(Type%in%"Movie") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1  5377

In 2019, there were 5,377 Movies to select from on Netflix.

From here, I can analyze the genres of the two types of content on Netflix.

TV Shows Genres

netflix_titles %>% 
  filter(Type%in%"TV Show") -> NetflixTV

NetflixTV %>% filter(!Genre %in% "TV Shows") %>% ggplot(aes(Genre, fill = Genre)) + geom_bar() + coord_flip()

For TV Shows on Netflix, it was found that the most common genre was International TV Shows. This was interesting to see since Netflix is an American company.

 netflix_titles %>% 
    filter(Type%in%"TV Show" & Genre%in% "International TV Shows") %>% 
    count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   690

A total of 690 International TV Shows were on Netflix in 2019.

Movies Genres

  netflix_titles %>% 
    filter(Type%in%"Movie") -> NetflixMOV
  
  ggplot(NetflixMOV, aes(Genre, fill = Genre)) + geom_bar() + coord_flip()

For Movies on Netflix, it was found that the most common genre was Dramas.

  netflix_titles %>% 
    filter(Type%in%"Movie" & Genre%in% "Dramas") %>% 
    count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1  1384

A total of 1,384 Drama Movies were on Netflix in 2019.
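Reading the top genre off a plot can be double-checked with a small table. This is a rough sketch on made-up data, assuming the same `Type` and `Genre` columns as `netflix_titles`; the `toy` data frame and the `share` column are hypothetical.

```r
library(dplyr)

# Toy stand-in for netflix_titles; the values are made up for illustration.
toy <- data.frame(
  Type  = c("Movie", "Movie", "Movie", "TV Show", "TV Show", "TV Show"),
  Genre = c("Dramas", "Dramas", "Comedies", "Kids' TV", "Crime TV Shows", "Kids' TV")
)

top_genres <- toy %>%
  count(Type, Genre) %>%            # titles per Type/Genre pair
  group_by(Type) %>%
  mutate(share = n / sum(n)) %>%    # share of that Type's titles
  slice_max(n, n = 1) %>%           # keep the most common genre per Type
  ungroup()
```

The `share` column also shows whether the top genre is an actual majority or just the largest single category.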

After looking at the genres, it’s important to highlight the different ratings for the content.

TV Show Ratings

  ggplot(NetflixTV, aes(Rating, fill=Rating)) + geom_bar() + coord_flip()

This plot depicts how the most popular TV Show rating was TV-MA.

Movie Ratings

  ggplot(NetflixMOV,  aes(Rating, fill = Rating)) + geom_bar() + coord_flip()

This plot depicts how the most popular Movie rating was TV-MA, as well.

It’s interesting that TV-MA is the most common rating for both Movies and TV Shows. According to Spectrum, TV-MA means that the content is intended for adults and may be unsuitable for children under 17. It might make sense that so much of Netflix’s content is rated TV-MA, because a study by Statista showed that a lot of Netflix subscribers are above the age of 18.

Next, I wanted to look into how recent Netflix’s selection is. This can be done by using the release year provided by the data set, since it tells us the year the TV Show/Movie was released to the public. Had the Movies on the streaming platform been recently released? Or did Netflix add older movies?

Netflix Overall Selection Relevancy

  netflix_titles %>%
    group_by(ReleaseYear) %>% 
    count(sort = TRUE) %>% 
    ggplot(aes(ReleaseYear,n)) + geom_point() + geom_smooth() 
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'

When looking at the content selection portrayed in this plot, it was found that Netflix’s selection comes from a lot of different years. It was evident that more recent years had more titles. Now, I’ll look into the year that Netflix added each piece of content for TV Shows and Movies. This is out of pure curiosity to see which TV Show or Movie has been on Netflix the longest.

TV Show Year Added

NetflixTV %>% 
  filter(!Added %in% 1900) %>% 
  filter(!Added %in% 2020) %>% 
  count(Added, sort = TRUE)
## # A tibble: 9 x 2
##   Added     n
##   <dbl> <int>
## 1  2019   656
## 2  2018   430
## 3  2017   361
## 4  2016   185
## 5  2015    30
## 6  2021    29
## 7  2014     6
## 8  2013     5
## 9  2008     1

I needed to filter out the years 1900 and 2020 because it didn’t make sense that the data set included them. The 1900 entries are probably a placeholder for titles with no date added, and 2020 had not even happened yet, since this data focuses on Netflix in 2019. Note that the output still includes 29 titles marked 2021, which the same reasoning would also exclude. This could be credited to messy data. Once those years were filtered out, I was able to see the number of TV Shows added to Netflix in each year. It was found that 656 TV Shows were added in 2019, more than in any other year, showing that much of Netflix’s TV Show selection had joined the platform recently.
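The output above still lists 2021, so excluding individual years one at a time is easy to get wrong. An alternative is to keep only an allowed range. This is a sketch on made-up rows, and the 2008 to 2019 window is my assumption about which years are in scope.

```r
library(dplyr)

# Made-up rows standing in for NetflixTV; only the Added column matters here.
toy_tv <- data.frame(
  Title = c("A", "B", "C", "D", "E"),
  Added = c(1900, 2008, 2019, 2020, 2021)
)

# Keep only years inside the assumed window instead of dropping specific values.
in_scope <- toy_tv %>%
  filter(between(Added, 2008, 2019))
```

With a range filter, any unexpected year outside the window is dropped without having to list it.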

NetflixTV %>% 
  filter(!Added %in% 1900) %>%
  filter(!Added %in% 2020) %>% 
  count(Added, Title,sort = TRUE)
## # A tibble: 1,703 x 3
##    Added Title                            n
##    <dbl> <chr>                        <int>
##  1  2008 Dinner for Five                  1
##  2  2013 Breaking Bad                     1
##  3  2013 Gossip Girl                      1
##  4  2013 Jack Taylor                      1
##  5  2013 Russell Peters vs. the World     1
##  6  2013 The 4400                         1
##  7  2014 Goosebumps                       1
##  8  2014 Lilyhammer                       1
##  9  2014 Pee-wee's Playhouse              1
## 10  2014 The Borgias                      1
## # … with 1,693 more rows

This line of code shows that the TV Show that had been on Netflix the longest in their 2019 selection was called “Dinner for Five.” It was added in 2008.

Movie Year Added

NetflixMOV %>% 
  filter(!Added %in% 1900) %>% 
  filter(!Added %in% 2020) %>% 
  count(Added, sort = TRUE)
## # A tibble: 13 x 2
##    Added     n
##    <dbl> <int>
##  1  2019  1497
##  2  2018  1255
##  3  2017   864
##  4  2016   258
##  5  2021    88
##  6  2015    58
##  7  2014    19
##  8  2011    13
##  9  2013     6
## 10  2012     3
## 11  2009     2
## 12  2008     1
## 13  2010     1

Similar to TV Shows, I filtered out 1900 and 2020. After doing this, it was found that 1,497 Movies were added in 2019, more than in any other year. This also showed that much of Netflix’s Movie selection had been added recently.

NetflixMOV %>% 
  filter(!Added %in% 1900) %>%
  filter(!Added %in% 2020) %>% 
  count(Added, Title,sort = TRUE)
## # A tibble: 4,065 x 3
##    Added Title                           n
##    <dbl> <chr>                       <int>
##  1  2008 To and From New York            1
##  2  2009 Just Another Love Story         1
##  3  2009 Splatter                        1
##  4  2010 Mad Ron's Prevues from Hell     1
##  5  2011 A Stoning in Fulham County      1
##  6  2011 Adam: His Song Continues        1
##  7  2011 Even the Rain                   1
##  8  2011 Hard Lessons                    1
##  9  2011 In Defense of a Married Man     1
## 10  2011 Joseph: King of Dreams          1
## # … with 4,055 more rows

It was interesting to see that the movie “To and From New York” had been in Netflix’s 2019 selection the longest, because it was also added in 2008.
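Instead of sorting the whole table and reading the first row, the earliest-added title for each type can be pulled out directly. This is a minimal sketch on made-up titles; the `toy` data frame is hypothetical and only mirrors the `Type`, `Title` and `Added` columns.

```r
library(dplyr)

# Made-up titles; 1900 plays the role of the placeholder year filtered out above.
toy <- data.frame(
  Type  = c("TV Show", "TV Show", "Movie", "Movie", "Movie"),
  Title = c("Show A", "Show B", "Film A", "Film B", "Film C"),
  Added = c(2013, 2008, 2011, 1900, 2009)
)

earliest <- toy %>%
  filter(Added != 1900) %>%       # drop the placeholder year
  group_by(Type) %>%
  slice_min(Added, n = 1) %>%     # earliest year added within each Type
  ungroup()
```

By default `slice_min()` keeps ties, so if several titles share the earliest year they would all be returned.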

Overall, looking at Netflix’s selection in terms of the year the content was released to the public, the conclusion can be made that in 2019 Netflix’s content came from more recent years than it did older years. Perhaps this is one reason why Netflix has so many subscribers: people are able to watch recent Movies and TV Shows whenever and wherever they want.

Netflix in Different Countries

It’s very important to note that Netflix has content on their streaming platform from a ton of different countries. So what country had the most produced content on Netflix? Let’s find out!

After creating a geo-spatial chart in Tableau depicting the countries and their number of produced TV shows and movies, it was very apparent that the United States took the lead for the most produced movies AND TV shows. The Tableau chart shows a map of the world for Movies (on top) and TV Shows (on bottom).

How many different countries have content on Netflix?

netflix_titles %>%
  group_by(Country) %>% 
  count(Type, sort = TRUE)%>% 
filter(!Country %in% NA) 
## # A tibble: 131 x 3
## # Groups:   Country [81]
##    Country        Type        n
##    <chr>          <chr>   <int>
##  1 United States  Movie    2100
##  2 India          Movie     883
##  3 United States  TV Show   783
##  4 United Kingdom Movie     341
##  5 United Kingdom TV Show   236
##  6 Canada         Movie     175
##  7 Japan          TV Show   162
##  8 South Korea    TV Show   152
##  9 France         Movie     137
## 10 Spain          Movie     119
## # … with 121 more rows

81 countries, across 131 country and type combinations! This is definitely more than I thought. Perhaps this is one reason why Netflix has so many subscribers, since they have a worldwide variety.

Looking at the information in a plot helps get a closer look at which countries were close to America in terms of production:

netflix_titles %>%
  group_by(Country) %>% 
  count(Type, sort = TRUE) %>% 
  filter(!Country %in% NA) %>% 
  ggplot(aes(reorder(Country, n), n, fill = Type)) + 
  geom_col() +
  coord_flip() +
  facet_wrap(~Type, scales = "free_y")

This plot shows all of the countries and their number of productions for both TV Shows and Movies. To make it clearer and easier to read, I filtered it to the country and type combinations with more than 50 titles:

netflix_titles %>% 
  group_by(Country) %>% 
  count(Type, sort = TRUE) %>% 
  filter(n > 50) %>%
  filter(!Country %in% NA) %>% 
  ggplot(aes(reorder(Country, n), n, fill = Type)) + 
  geom_col() +
  coord_flip() +
  facet_wrap(~Type, scales = "free_y")

So much easier to read! This chart enables viewers to see how much more production the US did in comparison to the other countries.
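Note that `filter(n > 50)` keeps every country and type pair above a threshold, which is not the same as keeping a fixed number of countries. The sketch below contrasts the two on a small table; the first four counts are taken from the output above, while "Exampleland" and its count are made up.

```r
library(dplyr)

# Four counts from the table above plus one made-up small entry.
toy_counts <- data.frame(
  Country = c("United States", "India", "United States", "Japan", "Exampleland"),
  Type    = c("Movie", "Movie", "TV Show", "TV Show", "Movie"),
  n       = c(2100, 883, 783, 162, 40)
)

# Threshold filter: every Country/Type pair with more than 50 titles.
over_50 <- toy_counts %>%
  filter(n > 50)

# Top-N filter: the two largest producers within each Type.
top_2 <- toy_counts %>%
  group_by(Type) %>%
  slice_max(n, n = 2) %>%
  ungroup()
```

Either result can be piped into the same `ggplot()` call; which one to use depends on whether a cutoff or a fixed number of bars is wanted.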

This already shows what I was looking for, that the United States has the highest number of TV Shows and Movies on Netflix, but I now want to dive deeper into the specifics of those American TV Shows and Movies.

United States Netflix Content

For this portion of the project, I will now look into the TV Shows and Movies produced in America that were in Netflix’s selection in 2019. I’ll start by addressing the exact count number of American TV Shows and Movies, then look into the genres for each, and lastly look at the ratings. I’ll then compare these findings to the overall Netflix findings from above.

Number of American TV Shows

netflix_titles %>% 
  filter(Country %in% ("United States")& Type %in% ("TV Show")) ->USATV
USATV %>% count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   783

It was found that there were 783 American produced TV Shows on the streaming platform. Looking back, the total number of TV Shows on Netflix in 2019 was 2,410. 783 is 32.5% of 2,410, meaning that about 32.5% of the TV Show content on Netflix in 2019 was produced in America.

Number of American Movies

netflix_titles %>% 
  filter(Country %in% ("United States")& Type %in% ("Movie")) ->USAMOV  
USAMOV%>%count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1  2100

There were 2,100 American produced Movies on Netflix. Looking back, the total number of Movies on Netflix in 2019 was 5,377. 2,100 is 39.06% of 5,377, meaning that about 39.06% of the Movie content on Netflix in 2019 was produced in America.
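The two shares above can also be computed in one step rather than by hand. This is a rough sketch on made-up rows, assuming the same `Type` and `Country` columns; the `toy` data frame and the column names `total`, `us` and `pct_us` are hypothetical.

```r
library(dplyr)

# Made-up rows standing in for netflix_titles.
toy <- data.frame(
  Type    = c("Movie", "Movie", "Movie", "Movie", "TV Show", "TV Show"),
  Country = c("United States", "India", "United States", NA, "United States", "Japan")
)

us_share <- toy %>%
  group_by(Type) %>%
  summarise(
    total  = n(),                                  # all titles of this Type
    us     = sum(Country %in% "United States"),    # American produced titles
    pct_us = round(100 * us / total, 2)            # share as a percentage
  )
```

Using `%in%` here means rows with a missing country count toward the total but not toward the American titles, which matches how the counts above were produced.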

Genres of USA TV

Referencing back to the TV Show Genres on Netflix overall, it was shown that International TV Shows was the most popular genre of the entire selection. What’s the most popular genre for the American TV Shows?

ggplot(USATV, aes(Genre, fill=Genre))+geom_bar()+ coord_flip()

Very interesting! The most popular genre among the USA produced TV Shows on Netflix was Kids’ TV. How many TV Shows on Netflix as a whole were categorized as Kids’ TV?

netflix_titles %>% 
  filter(Type%in%"TV Show" & Genre%in% "Kids' TV") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   359

Of the TV Shows selection as a whole, 359 shows were categorized under Kids’ TV. How many of these were American produced?

USATV %>% 
  filter(Type%in%"TV Show" & Genre%in% "Kids' TV") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   163

163 TV Shows that were produced in the United States were categorized under Kids’ TV, meaning that 45.4% of the Kids’ TV genre was produced in the United States.

Genres of USA Movies

Referencing back to the Movies Genres on Netflix overall, it was shown that Dramas were the most popular genre of the entire selection. What’s the most popular genre for the American Movies?

ggplot(USAMOV, aes(Genre, fill=Genre))+geom_bar()+ coord_flip()

The most popular genre among the USA produced Movies was Documentaries. It’s noted that Dramas were a close second. How many Movies on Netflix as a whole were categorized as Documentaries in 2019?

netflix_titles %>% 
  filter(Type%in%"Movie" & Genre%in% "Documentaries") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   751

Of the Movies selection as a whole, 751 movies were categorized under Documentaries. How many of these were American produced?

USAMOV %>% 
  filter(Type%in%"Movie" & Genre%in% "Documentaries") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   397

397 Movies that were produced in the United States were categorized as Documentaries. From this number, it can be said that 52.86% of the Documentaries Genre on Netflix in 2019 were produced in the United States.

Ratings of USA TV

The same steps will be applied to determine how much content in terms of the ratings are based off of American production. Looking back at the TV Show Ratings on Netflix overall, it was shown that TV-MA was the most popular rating of the entire TV Show selection. What’s the most popular rating for the American TV Shows?

ggplot(USATV, aes(Rating, fill=Rating))+geom_bar()+ coord_flip()

The most popular rating among the TV Shows produced in America was TV-MA. This doesn’t come as a surprise since the overall selection on Netflix for both TV Shows and even Movies was TV-MA, but what percentage of that overall selection was produced in America?

USATV %>% 
  filter(Type%in%"TV Show" & Rating%in% "TV-MA") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   318

318 American TV Shows were rated TV-MA. What was the total number of TV Shows on Netflix that were rated TV-MA?

netflix_titles %>% 
  filter(Type%in%"TV Show" & Rating%in% "TV-MA") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1  1018

A huge number! 1,018 of the entire selection of TV Shows were rated TV-MA. Of that number, 31.24% were produced in America.

Ratings of USA Movies

Similar to the TV Show Ratings on Netflix overall, it was shown that TV-MA was the most popular rating of the entire Movie selection too. What’s the most popular rating among the American Movies?

ggplot(USAMOV, aes(Rating, fill=Rating))+geom_bar()+ coord_flip()

No surprises here! The most popular rating among the Movies produced in America was TV-MA. What percentage of that overall selection was produced in America?

USAMOV %>% 
  filter(Type%in%"Movie" & Rating%in% "TV-MA") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   610

610 American Movies were rated TV-MA. What was the total number of Movies on Netflix that were rated TV-MA?

netflix_titles %>% 
  filter(Type%in%"Movie" & Rating%in% "TV-MA") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1  1845

1,845 of the entire selection of Movies on Netflix were rated TV-MA. What does this mean for American content? That 33.06% of the TV-MA rated Movies were produced in the United States.
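The genre and rating percentages in this section are simple divisions, so they can be checked in a few lines. The counts come from the outputs above; the helper name `pct` and the vector names are my own.

```r
# Percentage helper: part as a share of total, rounded to two decimals.
pct <- function(part, total) round(100 * part / total, 2)

# American counts and overall counts, taken from the outputs above.
american <- c(kids_tv = 163, documentaries = 397, tv_ma_shows = 318, tv_ma_movies = 610)
overall  <- c(kids_tv = 359, documentaries = 751, tv_ma_shows = 1018, tv_ma_movies = 1845)

shares <- pct(american, overall)
```

Because `pct()` is vectorised, all four shares are computed at once and keep their names.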

What percentage of the recent content on Netflix in 2019 was American produced?

Release Year of USA TV Shows

USATV %>% 
  filter(ReleaseYear%in%"2019") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   163

There were 163 American produced TV Shows on Netflix in 2019 that were also released in 2019. Looking back at the TV Shows overall, 656 shows were added to Netflix in 2019. Taking that as the total, only about 24.85% of it was new content from America. Since 163 is based on release year and 656 is based on the year added, this percentage is a rough estimate rather than an exact share. What about for movies?

Release Year of USA Movies

USAMOV %>% 
  filter(ReleaseYear%in%"2019") %>% 
  count()
## # A tibble: 1 x 1
##       n
##   <int>
## 1   224

There were 224 American produced Movies on Netflix in 2019 that were also released in 2019. Looking back at the Movies overall, 1,497 movies were added to Netflix in 2019. Taking that as the total, only about 14.96% of it was new content from America. As with TV Shows, 224 is based on release year and 1,497 is based on the year added, so this percentage is a rough estimate.
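The percentages above divide a release-year count by a year-added count. A like-for-like version would use `ReleaseYear` for both the American titles and the total. This is a sketch on made-up rows, so the resulting numbers say nothing about the real data.

```r
library(dplyr)

# Made-up rows standing in for netflix_titles.
toy <- data.frame(
  Type        = c("TV Show", "TV Show", "TV Show", "Movie", "Movie", "Movie"),
  Country     = c("United States", "Japan", "United States", "United States", "India", "India"),
  ReleaseYear = c(2019, 2019, 2018, 2019, 2019, 2019)
)

us_share_2019 <- toy %>%
  filter(ReleaseYear == 2019) %>%     # same year definition for both counts
  group_by(Type) %>%
  summarise(pct_us = round(100 * mean(Country %in% "United States"), 2))
```

Taking the mean of a TRUE/FALSE vector gives the proportion of TRUE values, so no separate numerator and denominator are needed.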

Conclusion

As I finish up my project, I’ll revert back to the beginning of the analysis. At the beginning of my project, I stated my interest in America’s role on Netflix in terms of production quantity. Since America is Netflix’s home country, I wanted to look into the content that the United States produced. I looked into Netflix’s selection as a whole in terms of TV Show and Movie count and then dove into the most popular genre, ratings, and release year for both TV Shows and Movies.

After looking at the different countries Netflix had, it was evident that the United States had the most produced content on Netflix. In reference to the “Netflix in Different Countries” section of my project, the United States clearly had the highest number of TV Shows and Movies on Netflix, totaling 783 TV Shows and 2,100 Movies.

I included the percentages that America took up when it came to most popular genre, ratings and release year for both TV Shows and Movies. One big find was that of the total Movies selection, 751 movies were categorized under Documentaries. 397 American Documentary Movies made up 52.86% of the Documentaries genre on Netflix in 2019. This showed that American content made up more than half of the Movie Documentary selection in 2019. As for the other categories, the USA percentage ranged from about 15% to 45%, with most of the categories falling in the 30s. This is a large share for one country, especially with all of the other countries to account for.

Future Exploration

With further research and data, it would be interesting to view the data for Netflix in 2020 and see if there was a big difference between the years on the streaming service. Another interesting exploration would be analyzing the growth and decline of the sections I analyzed, such as genres, ratings, and release year, on Netflix over a number of years. That could point to some of the factors that go into how Netflix has added or gotten rid of certain pieces of content. Lastly, since I looked into the American produced content on Netflix, I would love to take it further and look into the reasoning behind why this could be. Is it due to the home court advantage? Price? Relevancy? It could be due to a ton of different factors, so it would make for an interesting analysis.