Netflix, an online streaming platform, started off in 1997 as a rent-by-mail DVD service that operated on a pay-per-rental model. Users would browse and add movies to their order, and Netflix would send the DVDs by mail. After watching, users would mail them back to Netflix. Rentals cost around $4 each, with an additional $2 postage charge.
Later, Netflix switched to a model where users could keep the DVDs for as long as they liked, but could only rent a new movie after returning their existing one.
As of the fourth quarter of 2020, Netflix had about 203.67 million paid subscribers worldwide.
The name Netflix is derived from the words "Internet" and "flick".
Project Description
library(dplyr)
library(tidyverse)
library(ggplot2)
library(data.table)
library(lubridate)
library(DT)
library(wordcloud)
library(tidytext)
library(ggthemes)
Step 1: Examine the structure of the dataset and understand the variables
## 'data.frame': 7787 obs. of 12 variables:
## $ show_id : chr "s1" "s2" "s3" "s4" ...
## $ type : chr "TV Show" "Movie" "Movie" "Movie" ...
## $ title : chr "3%" "7:19" "23:59" "9" ...
## $ director : chr NA "Jorge Michel Grau" "Gilbert Chan" "Shane Acker" ...
## $ cast : chr "João Miguel, Bianca Comparato, Michel Gomes, Rodolfo Valente, Vaneza Oliveira, Rafael Lozano, Viviane Porto, M"| __truncated__ "Demián Bichir, Héctor Bonilla, Oscar Serrano, Azalia Ortiz, Octavio Michel, Carmen Beato" "Tedd Chan, Stella Chung, Henley Hii, Lawrence Koh, Tommy Kuan, Josh Lai, Mark Lee, Susan Leong, Benjamin Lim" "Elijah Wood, John C. Reilly, Jennifer Connelly, Christopher Plummer, Crispin Glover, Martin Landau, Fred Tatasc"| __truncated__ ...
## $ country : chr "Brazil" "Mexico" "Singapore" "United States" ...
## $ date_added : chr "August 14, 2020" "December 23, 2016" "December 20, 2018" "November 16, 2017" ...
## $ release_year: int 2020 2016 2011 2009 2008 2016 2019 1997 2019 2008 ...
## $ rating : chr "TV-MA" "TV-MA" "R" "PG-13" ...
## $ duration : chr "4 Seasons" "93 min" "78 min" "80 min" ...
## $ listed_in : chr "International TV Shows, TV Dramas, TV Sci-Fi & Fantasy" "Dramas, International Movies" "Horror Movies, International Movies" "Action & Adventure, Independent Movies, Sci-Fi & Fantasy" ...
## $ description : chr "In a future where the elite inhabit an island paradise far from the crowded slums, you get one chance to join t"| __truncated__ "After a devastating earthquake hits Mexico City, trapped survivors from all walks of life wait to be rescued wh"| __truncated__ "When an army recruit is found dead, his fellow soldiers are forced to confront a terrifying secret that's haunt"| __truncated__ "In a postapocalyptic world, rag-doll robots hide in fear from dangerous machines out to exterminate them, until"| __truncated__ ...
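The listing above is the kind of output that str() prints. The original loading code is not shown, so the following is only a minimal sketch of how such a listing could be produced. The temporary CSV path, the two sample rows (copied from the listing, with only four of the twelve columns), and the na.strings setting are assumptions made for illustration.

```r
# Assumption: the real data lives in a CSV file. Two rows from the listing are
# written to a temporary file here so that the sketch runs on its own.
csv_path <- tempfile(fileext = ".csv")
writeLines(
  c("show_id,type,title,director",
    "s1,TV Show,3%,",
    "s2,Movie,7:19,Jorge Michel Grau"),
  csv_path
)

# Treat empty cells as missing so they show up as NA, as in the listing
netflix_data <- read.csv(csv_path, na.strings = c("", "NA"),
                         stringsAsFactors = FALSE)

# Print column names, types and the first few values of each column
str(netflix_data)
```

Reading empty cells as NA at this stage is what makes the later missing-value counts meaningful; without it, blank director or country cells would be read as empty strings.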
Step 2: Change data type of necessary variables
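# Convert the categorical columns of netflix_data to factors in place
netflix_data$type <- as.factor(netflix_data$type)
netflix_data$country <- as.factor(netflix_data$country)
netflix_data$rating <- as.factor(netflix_data$rating)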
Step 3: Create new columns from existing data if required
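# Parse strings such as "August 14, 2020" into Date values
netflix_data$date_added <- mdy(netflix_data$date_added)
# Derive year, month name and day of month from the parsed date
netflix_data$year_added <- format(netflix_data$date_added, "%Y")
netflix_data$month_added <- format(netflix_data$date_added, "%B")
netflix_data$day_added <- format(netflix_data$date_added, "%d")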
Step 4: Identify missing values
##      show_id         type        title     director         cast      country
##            0            0            0         2389          718          507
##   date_added release_year       rating     duration    listed_in  description
##           10            0            7            0            0            0
##   year_added  month_added    day_added
##           10           10           10
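The per-column counts above look like the result of summing is.na() over each column. A minimal sketch of that idea follows, assuming the data frame is called netflix_data and using four rows taken from the structure listing in Step 1 as stand-in data. The exact call used in the original analysis is not shown.

```r
# Stand-in data: four rows and three columns from the Step 1 listing
netflix_data <- data.frame(
  title    = c("3%", "7:19", "23:59", "9"),
  director = c(NA, "Jorge Michel Grau", "Gilbert Chan", "Shane Acker"),
  country  = c("Brazil", "Mexico", "Singapore", "United States"),
  stringsAsFactors = FALSE
)

# is.na() gives a TRUE/FALSE matrix; column sums count the TRUE values
missing_counts <- colSums(is.na(netflix_data))
print(missing_counts)

# Column means give the share of missing values instead of the count
missing_share <- colMeans(is.na(netflix_data))
print(round(missing_share, 2))
```

Looking at the share as well as the count helps when deciding whether a column such as director, with 2389 of 7787 values missing, can still be used.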
Step 5: Separate columns that have multiple values in the same cell
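The listed_in column holds several comma-separated genres per title and needs to be separated.
The country column can likewise hold more than one country per title and needs to be separated.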
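One way to separate such cells is to give every value its own row. The helper below, split_rows(), is a hypothetical name introduced for this sketch and is written in base R. With the tidyverse loaded, tidyr::separate_rows() would be an alternative. The sample rows come from the Step 1 listing.

```r
# Stand-in data: two titles with their comma-separated genres
netflix_data <- data.frame(
  title     = c("3%", "7:19"),
  listed_in = c("International TV Shows, TV Dramas, TV Sci-Fi & Fantasy",
                "Dramas, International Movies"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: repeat each row once per value found in `column`
split_rows <- function(df, column, sep = ",") {
  values <- as.character(df[[column]])          # also works for factor columns
  parts  <- strsplit(values, sep, fixed = TRUE) # one character vector per row
  out    <- df[rep(seq_len(nrow(df)), lengths(parts)), , drop = FALSE]
  out[[column]] <- trimws(unlist(parts))        # drop the space after commas
  rownames(out) <- NULL
  out
}

genres <- split_rows(netflix_data, "listed_in")
print(genres)
```

The same call with "country" as the column name would handle the second column. Missing values stay as a single NA row, so they can still be counted or filtered later.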
Which type of content is more popular on Netflix: TV shows or movies?
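A question like this can be approached by counting titles per type. The sketch below assumes the type column shown in Step 1 and uses four stand-in rows from that listing, so the printed numbers are not the real result for the full dataset.

```r
# Stand-in data: the type of the first four titles in the Step 1 listing
netflix_data <- data.frame(
  type = c("TV Show", "Movie", "Movie", "Movie"),
  stringsAsFactors = FALSE
)

# Number of titles per type
type_counts <- table(netflix_data$type)
print(type_counts)

# The same counts expressed as a percentage of all titles
type_share <- round(100 * prop.table(type_counts), 1)
print(type_share)
```

Note that this measures how much content of each type is in the catalogue, not how often it is watched.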
Which country has the most TV shows on Netflix?
Which country has the most movies on Netflix?
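Both country questions can be answered with the same counting step, applied once per content type. The function top_countries() below is a hypothetical helper written for this sketch. It assumes the type and country columns from Step 1 and splits multi-country cells on commas before counting. The four sample rows come from the Step 1 listing.

```r
# Stand-in data: type and country of the first four titles
netflix_data <- data.frame(
  type    = c("TV Show", "Movie", "Movie", "Movie"),
  country = c("Brazil", "Mexico", "Singapore", "United States"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the n countries with the most titles of one type
top_countries <- function(df, content_type, n = 10) {
  keep <- which(df$type == content_type & !is.na(df$country))
  # A cell may list several countries, so split before counting
  countries <- trimws(unlist(strsplit(as.character(df$country[keep]), ",")))
  head(sort(table(countries), decreasing = TRUE), n)
}

tv_top    <- top_countries(netflix_data, "TV Show")
movie_top <- top_countries(netflix_data, "Movie")
print(tv_top)
print(movie_top)
```

Titles with a missing country are left out of the ranking here. With 507 missing values in that column, it may be worth reporting them as a separate group instead.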
Top 10 countries with the most content in 2020
Which genre is most prominent on Netflix?
How are ratings distributed on Netflix?
How does content addition change over time from 2011 to 2021?
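A trend like this can be read from a cross-tabulation of the year a title was added against its type. The sketch below assumes the year_added column created in Step 3, which format() returns as text, and uses stand-in values derived from the first four date_added entries in the Step 1 listing.

```r
# Stand-in data: type and year added for the first four titles
netflix_data <- data.frame(
  type       = c("TV Show", "Movie", "Movie", "Movie"),
  year_added = c("2020", "2016", "2018", "2017"),
  stringsAsFactors = FALSE
)

# year_added is text, so convert it before comparing against the range
year <- as.integer(netflix_data$year_added)
keep <- !is.na(year) & year >= 2011 & year <= 2021

# Rows are years, columns are content types, cells are numbers of titles added
additions <- table(year = year[keep], type = netflix_data$type[keep])
print(additions)
```

Calling as.data.frame(additions) turns the table into one row per year and type with a Freq column, which is a convenient shape for a line chart per type.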
TV Show
Movie