This project analyzes a Netflix dataset containing information about
movies and TV shows.
The dataset includes attributes such as title, type, release year,
country, and duration.
The goal is to identify trends in content growth, country-wise
distribution, and differences between movies and TV shows.
# Load dataset
data <- read.csv("Netflix.csv", na.strings = c("", " ", "NA"))
View(data)
# View first rows
head(data)
## show_id type title director
## 1 s1 Movie Dick Johnson Is Dead Kirsten Johnson
## 2 s2 TV Show Blood & Water <NA>
## 3 s3 TV Show Ganglands Julien Leclercq
## 4 s4 TV Show Jailbirds New Orleans <NA>
## 5 s5 TV Show Kota Factory <NA>
## 6 s6 TV Show Midnight Mass Mike Flanagan
## cast
## 1 <NA>
## 2 Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng
## 3 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera
## 4 <NA>
## 5 Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar
## 6 Kate Siegel, Zach Gilford, Hamish Linklater, Henry Thomas, Kristin Lehman, Samantha Sloyan, Igby Rigney, Rahul Kohli, Annarah Cymone, Annabeth Gish, Alex Essoe, Rahul Abburi, Matt Biedel, Michael Trucco, Crystal Balint, Louis Oliver
## country date_added release_year rating duration
## 1 United States September 25, 2021 2020 PG-13 90 min
## 2 South Africa September 24, 2021 2021 TV-MA 2 Seasons
## 3 <NA> September 24, 2021 2021 TV-MA 1 Season
## 4 <NA> September 24, 2021 2021 TV-MA 1 Season
## 5 India September 24, 2021 2021 TV-MA 2 Seasons
## 6 <NA> September 24, 2021 2021 TV-MA 1 Season
## listed_in
## 1 Documentaries
## 2 International TV Shows, TV Dramas, TV Mysteries
## 3 Crime TV Shows, International TV Shows, TV Action & Adventure
## 4 Docuseries, Reality TV
## 5 International TV Shows, Romantic TV Shows, TV Comedies
## 6 TV Dramas, TV Horror, TV Mysteries
## description
## 1 As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.
## 2 After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth.
## 3 To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war.
## 4 Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series.
## 5 In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life.
## 6 The arrival of a charismatic young priest brings glorious miracles, ominous mysteries and renewed religious fervor to a dying town desperate to believe.
Interpretation: The dataset is successfully loaded, and the first few rows show the structure of Netflix titles.
# Structure and summary
str(data)
## 'data.frame': 8807 obs. of 12 variables:
## $ show_id : chr "s1" "s2" "s3" "s4" ...
## $ type : chr "Movie" "TV Show" "TV Show" "TV Show" ...
## $ title : chr "Dick Johnson Is Dead" "Blood & Water" "Ganglands" "Jailbirds New Orleans" ...
## $ director : chr "Kirsten Johnson" NA "Julien Leclercq" NA ...
## $ cast : chr NA "Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile "| __truncated__ "Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, G"| __truncated__ NA ...
## $ country : chr "United States" "South Africa" NA NA ...
## $ date_added : chr "September 25, 2021" "September 24, 2021" "September 24, 2021" "September 24, 2021" ...
## $ release_year: int 2020 2021 2021 2021 2021 2021 2021 1993 2021 2021 ...
## $ rating : chr "PG-13" "TV-MA" "TV-MA" "TV-MA" ...
## $ duration : chr "90 min" "2 Seasons" "1 Season" "1 Season" ...
## $ listed_in : chr "Documentaries" "International TV Shows, TV Dramas, TV Mysteries" "Crime TV Shows, International TV Shows, TV Action & Adventure" "Docuseries, Reality TV" ...
## $ description : chr "As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical wa"| __truncated__ "After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is h"| __truncated__ "To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled "| __truncated__ "Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Or"| __truncated__ ...
summary(data)
## show_id type title director
## Length:8807 Length:8807 Length:8807 Length:8807
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## cast country date_added release_year
## Length:8807 Length:8807 Length:8807 Min. :1925
## Class :character Class :character Class :character 1st Qu.:2013
## Mode :character Mode :character Mode :character Median :2017
## Mean :2014
## 3rd Qu.:2019
## Max. :2021
## rating duration listed_in description
## Length:8807 Length:8807 Length:8807 Length:8807
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
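Interpretation: The data frame has 8,807 rows and 12 columns. Every column except release_year is stored as character, so summary() reports only their length and class; release_year is the only numeric variable, ranging from 1925 to 2021 with a median of 2017. summary() does not count missing values in character columns, so they are checked separately in the next step.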
# Check missing values
colSums(is.na(data))
## show_id type title director cast country
## 0 0 0 2634 825 831
## date_added release_year rating duration listed_in description
## 10 0 4 3 0 0
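Interpretation: director has the most missing values (2,634), followed by country (831) and cast (825); date_added (10), rating (4), and duration (3) have only a few. These counts determine which columns need placeholders, imputation, or row removal in the cleaning steps that follow.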
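The same information can be reported as a share of rows, which makes columns easier to compare. The sketch below is an illustration only: missing_report() is a hypothetical helper, not part of the original analysis, and the toy data frame stands in for the Netflix file (missing_report(data) would apply it to the real data).

```r
# Hypothetical helper: missing values per column as a count and a percentage,
# with the most incomplete columns listed first.
missing_report <- function(df) {
  n_missing <- colSums(is.na(df))
  report <- data.frame(
    column = names(n_missing),
    n_missing = as.integer(n_missing),
    pct_missing = round(100 * unname(n_missing) / nrow(df), 2),
    stringsAsFactors = FALSE
  )
  report <- report[order(report$n_missing, decreasing = TRUE), ]
  rownames(report) <- NULL
  report
}

# Small illustrative data frame, not the Netflix file
toy <- data.frame(
  director = c("A", NA, NA, "B"),
  country  = c("US", NA, "India", "UK"),
  title    = c("t1", "t2", "t3", "t4"),
  stringsAsFactors = FALSE
)
missing_report(toy)
```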
# Fix misplaced values (duration stored in rating)
wrong_rows <- grepl("min", data$rating)
data$duration[wrong_rows] <- data$rating[wrong_rows]
data$rating[wrong_rows] <- NA
View(data)
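Interpretation: A few rows have a runtime (a value ending in "min") in the rating column instead of in duration. This step moves those values into duration and sets rating to NA so it can be imputed later. The code overwrites duration in those rows unconditionally, which assumes duration is empty there; the 3 missing durations counted above are presumably these rows.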
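The fix above matches any rating that contains "min" and overwrites duration whatever its current value. A slightly stricter sketch follows, assuming the misplaced values always look like a number followed by " min"; fix_misplaced_duration() and the toy rows are hypothetical.

```r
# Hypothetical sketch: move runtime-like values out of `rating`.
# Assumption: a misplaced value looks like "<digits> min", e.g. "74 min".
fix_misplaced_duration <- function(df) {
  looks_like_runtime <- !is.na(df$rating) & grepl("^[0-9]+ min$", df$rating)
  # Only fill duration where it is actually empty, so existing values survive
  fill_duration <- looks_like_runtime & is.na(df$duration)
  df$duration[fill_duration] <- df$rating[fill_duration]
  # A runtime is never a valid rating, so clear it in every matching row
  df$rating[looks_like_runtime] <- NA
  df
}

toy <- data.frame(
  rating   = c("TV-MA", "74 min", "PG-13", NA),
  duration = c("1 Season", NA, "90 min", "2 Seasons"),
  stringsAsFactors = FALSE
)
fixed <- fix_misplaced_duration(toy)
fixed
```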
# Handle missing values
data$director[is.na(data$director)] <- "Unknown"
data$cast[is.na(data$cast)] <- "Not Available"
data$country[is.na(data$country)] <- "Unknown"
data$duration[is.na(data$duration)] <- "Not Available"
View(data)
head(data)
## show_id type title director
## 1 s1 Movie Dick Johnson Is Dead Kirsten Johnson
## 2 s2 TV Show Blood & Water Unknown
## 3 s3 TV Show Ganglands Julien Leclercq
## 4 s4 TV Show Jailbirds New Orleans Unknown
## 5 s5 TV Show Kota Factory Unknown
## 6 s6 TV Show Midnight Mass Mike Flanagan
## cast
## 1 Not Available
## 2 Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng
## 3 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera
## 4 Not Available
## 5 Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar
## 6 Kate Siegel, Zach Gilford, Hamish Linklater, Henry Thomas, Kristin Lehman, Samantha Sloyan, Igby Rigney, Rahul Kohli, Annarah Cymone, Annabeth Gish, Alex Essoe, Rahul Abburi, Matt Biedel, Michael Trucco, Crystal Balint, Louis Oliver
## country date_added release_year rating duration
## 1 United States September 25, 2021 2020 PG-13 90 min
## 2 South Africa September 24, 2021 2021 TV-MA 2 Seasons
## 3 Unknown September 24, 2021 2021 TV-MA 1 Season
## 4 Unknown September 24, 2021 2021 TV-MA 1 Season
## 5 India September 24, 2021 2021 TV-MA 2 Seasons
## 6 Unknown September 24, 2021 2021 TV-MA 1 Season
## listed_in
## 1 Documentaries
## 2 International TV Shows, TV Dramas, TV Mysteries
## 3 Crime TV Shows, International TV Shows, TV Action & Adventure
## 4 Docuseries, Reality TV
## 5 International TV Shows, Romantic TV Shows, TV Comedies
## 6 TV Dramas, TV Horror, TV Mysteries
## description
## 1 As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.
## 2 After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth.
## 3 To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war.
## 4 Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series.
## 5 In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life.
## 6 The arrival of a charismatic young priest brings glorious miracles, ominous mysteries and renewed religious fervor to a dying town desperate to believe.
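Interpretation: Missing director, cast, and country values are replaced with text placeholders ("Unknown" or "Not Available") so these columns no longer contain NA, and any duration still missing is set to "Not Available". The placeholders behave like ordinary categories afterwards; for example, "Unknown" appears as a country in the counts below.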
# Remove rows where date_added is missing
data <- data[!is.na(data$date_added), ]
Interpretation: The 10 rows with no date_added are removed, leaving 8,797 titles, because the time-based analysis depends on this column.
# Fill missing rating with mode
mode_rating <- names(sort(table(data$rating), decreasing = TRUE))[1]
data$rating[is.na(data$rating)] <- mode_rating
View(data)
Interpretation: Missing ratings, including those cleared by the misplaced-value fix, are replaced with the most frequent rating (the mode), which is TV-MA.
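The mode lookup can be wrapped in a small reusable helper, sketched below. mode_value() is a hypothetical name; it relies on which.max(), so a tie goes to whichever value table() lists first (alphabetical order), which is an assumption about acceptable tie-breaking.

```r
# Hypothetical helper: most frequent non-missing value of a vector.
# table() drops NA by default; ties go to the first value in table() order.
mode_value <- function(x) {
  counts <- table(x)
  if (length(counts) == 0) return(NA_character_)
  names(counts)[which.max(counts)]
}

ratings <- c("TV-MA", "PG", NA, "TV-MA", "TV-14", NA)
imputed <- ratings
imputed[is.na(imputed)] <- mode_value(ratings)
imputed
```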
# Fill missing duration with mode
mode_duration <- names(sort(table(data$duration), decreasing = TRUE))[1]
data$duration[is.na(data$duration)] <- mode_duration
View(data)
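Interpretation: This step has no effect in this run. The placeholder step above already replaced every missing duration with "Not Available", so is.na(data$duration) is FALSE for every row here. To impute durations with the mode instead, this step would need to run before the placeholder assignment.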
# Convert date column
data$date_added <- as.Date(data$date_added, format = "%B %d, %Y")
View(data)
head(data$date_added)
## [1] "2021-09-25" "2021-09-24" "2021-09-24" "2021-09-24" "2021-09-24"
## [6] "2021-09-24"
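Interpretation: date_added is converted from text such as "September 25, 2021" to the Date class for time-based analysis. Not every value parses: the per-year counts below contain 88 NA entries, so 88 strings did not match the "%B %d, %Y" format, possibly because of stray leading whitespace. %B also matches month names in the session's locale, so the conversion assumes an English-language locale.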
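A defensive version of the conversion is sketched below. It assumes the failures come from leading or trailing spaces, which trimws() removes, and it warns if any non-missing string still fails to parse. parse_date_added() is a hypothetical helper, and the format string assumes English month names.

```r
# Hypothetical helper: parse "Month day, Year" strings after trimming whitespace.
# Assumption: unparsed values are caused by stray spaces; %B needs English month names.
parse_date_added <- function(x) {
  parsed <- as.Date(trimws(x), format = "%B %d, %Y")
  n_failed <- sum(is.na(parsed) & !is.na(x))
  if (n_failed > 0) warning(n_failed, " non-missing value(s) could not be parsed")
  parsed
}

raw <- c("September 25, 2021", " August 4, 2017", NA)
parse_date_added(raw)
```

Applied to the original strings, parse_date_added(data$date_added) would report through its warning how many values still fail after trimming.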
# Convert categorical variables
data$type <- as.factor(data$type)
data$rating <- as.factor(data$rating)
View(data)
head(data)
## show_id type title director
## 1 s1 Movie Dick Johnson Is Dead Kirsten Johnson
## 2 s2 TV Show Blood & Water Unknown
## 3 s3 TV Show Ganglands Julien Leclercq
## 4 s4 TV Show Jailbirds New Orleans Unknown
## 5 s5 TV Show Kota Factory Unknown
## 6 s6 TV Show Midnight Mass Mike Flanagan
## cast
## 1 Not Available
## 2 Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng
## 3 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera
## 4 Not Available
## 5 Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar
## 6 Kate Siegel, Zach Gilford, Hamish Linklater, Henry Thomas, Kristin Lehman, Samantha Sloyan, Igby Rigney, Rahul Kohli, Annarah Cymone, Annabeth Gish, Alex Essoe, Rahul Abburi, Matt Biedel, Michael Trucco, Crystal Balint, Louis Oliver
## country date_added release_year rating duration
## 1 United States 2021-09-25 2020 PG-13 90 min
## 2 South Africa 2021-09-24 2021 TV-MA 2 Seasons
## 3 Unknown 2021-09-24 2021 TV-MA 1 Season
## 4 Unknown 2021-09-24 2021 TV-MA 1 Season
## 5 India 2021-09-24 2021 TV-MA 2 Seasons
## 6 Unknown 2021-09-24 2021 TV-MA 1 Season
## listed_in
## 1 Documentaries
## 2 International TV Shows, TV Dramas, TV Mysteries
## 3 Crime TV Shows, International TV Shows, TV Action & Adventure
## 4 Docuseries, Reality TV
## 5 International TV Shows, Romantic TV Shows, TV Comedies
## 6 TV Dramas, TV Horror, TV Mysteries
## description
## 1 As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.
## 2 After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth.
## 3 To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war.
## 4 Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series.
## 5 In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life.
## 6 The arrival of a charismatic young priest brings glorious miracles, ominous mysteries and renewed religious fervor to a dying town desperate to believe.
Interpretation: type and rating are converted to factors, so functions such as table(), barplot(), and lm() treat them as categories with a fixed set of levels.
# Extract year and month
data$year_added <- format(data$date_added, "%Y")
data$month_added <- format(data$date_added, "%m")
View(data)
table(data$year_added)
##
## 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021
## 2 2 1 13 3 10 23 73 418 1164 1625 1999 1878 1498
table(data$month_added)
##
## 01 02 03 04 05 06 07 08 09 10 11 12
## 727 557 734 759 626 724 819 749 765 755 697 797
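Interpretation: year_added and month_added (stored as character) support yearly and monthly trend analysis; table() silently drops the 88 dates that failed to parse. Additions are concentrated in 2016 to 2021, while the monthly totals are fairly even, ranging from 557 in February to 819 in July.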
# Create content age column
data$content_age <- as.numeric(format(Sys.Date(), "%Y")) - data$release_year
table(data$content_age)
##
## 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 592 953 1030 1146 1032 901 558 352 287 236 185 193 152 135 88 96
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
## 80 64 59 51 45 37 39 36 38 24 25 22 28 23 17 22
## 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52
## 16 18 8 13 10 12 11 17 13 11 11 7 7 9 7 7
## 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## 10 5 5 2 2 3 5 1 2 2 2 3 1 4 1 3
## 70 71 72 79 80 81 82 83 84 101
## 2 3 2 1 2 4 3 3 2 1
View(data)
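Interpretation: content_age is the current calendar year (taken from Sys.Date()) minus the release year, so it measures how old each title is today. The newest titles (2021) have an age of 5, which means this report was run in 2026; re-running it in another year shifts every value. Ages range from 5 to 101 years.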
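Because Sys.Date() changes, content_age shifts every time the report is re-run. One option, sketched below, is to measure age against a fixed reference year. age_from_release() is hypothetical, and its default of 2021 (the latest release_year in the data) is an assumed choice, not something the original analysis uses.

```r
# Hypothetical helper: age relative to a fixed, documented reference year,
# so results do not depend on the date the script is run.
# Assumption: 2021 (the latest release_year in this dataset) as the default.
age_from_release <- function(release_year, reference_year = 2021) {
  reference_year - release_year
}

age_from_release(c(2021, 2020, 1925))
```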
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Content by country
data %>%
group_by(country) %>%
summarise(total = n())
## # A tibble: 749 × 2
## country total
## <chr> <int>
## 1 , France, Algeria 1
## 2 , South Korea 1
## 3 Argentina 56
## 4 Argentina, Brazil, France, Poland, Germany, Denmark 1
## 5 Argentina, Chile 2
## 6 Argentina, Chile, Peru 1
## 7 Argentina, France 1
## 8 Argentina, France, United States, Germany, Qatar 1
## 9 Argentina, Italy 1
## 10 Argentina, Spain 8
## # ℹ 739 more rows
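Interpretation: This counts titles per distinct value of the country column, not per individual country. Co-productions are stored as comma-separated lists (for example "Argentina, Chile"), so each combination becomes its own group, which is why there are 749 groups; some values also carry a stray leading comma (", South Korea").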
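To count titles per individual country, the comma-separated values can be split first, as in the sketch below. count_by_country() is a hypothetical helper: it counts a co-production once for each listed country and drops empty fragments left by stray commas, so its totals are not directly comparable with the grouped counts above.

```r
# Hypothetical helper: split comma-separated country lists and count each country.
count_by_country <- function(country) {
  parts <- strsplit(country, ",", fixed = TRUE)
  countries <- trimws(unlist(parts))
  # Drop empty fragments (e.g. from ", South Korea") and missing values
  countries <- countries[!is.na(countries) & countries != ""]
  sort(table(countries), decreasing = TRUE)
}

toy_country <- c("United States", "Argentina, Chile", ", South Korea",
                 "United States, India", "India")
count_by_country(toy_country)
```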
# Top 10 countries
data %>%
group_by(country) %>%
summarise(total = n()) %>%
arrange(desc(total)) %>%
head(10)
## # A tibble: 10 × 2
## country total
## <chr> <int>
## 1 United States 2812
## 2 India 972
## 3 Unknown 830
## 4 United Kingdom 418
## 5 Japan 244
## 6 South Korea 199
## 7 Canada 181
## 8 Spain 145
## 9 France 124
## 10 Mexico 110
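Interpretation: These are the 10 most frequent country values. Only titles credited to that country alone are counted here (co-productions fall into their own combined groups), so these totals understate each country's involvement. "Unknown" (830 titles) is the placeholder for a missing country, not an actual country.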
# Content by rating
table(data$rating)
##
## G NC-17 NR PG PG-13 R TV-14 TV-G
## 41 3 79 287 490 799 2157 220
## TV-MA TV-PG TV-Y TV-Y7 TV-Y7-FV UR
## 3212 861 306 333 6 3
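Interpretation: TV-MA is by far the most common rating (3,212 titles), followed by TV-14 (2,157) and TV-PG (861). The TV-MA count includes the ratings imputed with the mode earlier. NR and UR both denote unrated titles.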
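Rating counts are easier to compare as percentages of all titles. The sketch below uses prop.table(); rating_share() is a hypothetical helper, shown on a toy vector rather than the real data (rating_share(data$rating) would apply it to the catalogue).

```r
# Hypothetical helper: share of titles per rating, in percent (one decimal place).
rating_share <- function(rating) {
  round(100 * prop.table(table(rating)), 1)
}

rating_share(c("TV-MA", "TV-MA", "PG", "TV-14"))
```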
# Titles added per year
data %>%
group_by(year_added) %>%
summarise(total = n())
## # A tibble: 15 × 2
## year_added total
## <chr> <int>
## 1 2008 2
## 2 2009 2
## 3 2010 1
## 4 2011 13
## 5 2012 3
## 6 2013 10
## 7 2014 23
## 8 2015 73
## 9 2016 418
## 10 2017 1164
## 11 2018 1625
## 12 2019 1999
## 13 2020 1878
## 14 2021 1498
## 15 <NA> 88
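Interpretation: Additions grow slowly until 2015 (73 titles), jump to 418 in 2016 and 1,164 in 2017, and peak at 1,999 in 2019, before falling to 1,878 in 2020 and 1,498 in 2021. The 2021 figure likely covers only part of the year (the latest date_added in the first rows is 2021-09-25). The 88 NA entries are dates that failed to parse.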
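The growth trend can be quantified as the change in additions from one listed year to the next, as sketched below. yearly_change() is hypothetical; it drops NA years (as table() does) and compares consecutive years that appear in the data, so a year with no additions would be skipped rather than counted as zero.

```r
# Hypothetical helper: change in the number of titles added versus the previous listed year.
yearly_change <- function(year_added) {
  counts <- table(year_added)          # NA years are dropped here
  change <- diff(as.integer(counts))   # difference from the previous listed year
  names(change) <- names(counts)[-1]
  change
}

yearly_change(c("2019", "2019", "2020", "2020", "2020", "2021", NA))
```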
# Sort by release year
data_sorted <- data[order(data$release_year, decreasing = TRUE), ]
head(data_sorted)
## show_id type title
## 2 s2 TV Show Blood & Water
## 3 s3 TV Show Ganglands
## 4 s4 TV Show Jailbirds New Orleans
## 5 s5 TV Show Kota Factory
## 6 s6 TV Show Midnight Mass
## 7 s7 Movie My Little Pony: A New Generation
## director
## 2 Unknown
## 3 Julien Leclercq
## 4 Unknown
## 5 Unknown
## 6 Mike Flanagan
## 7 Robert Cullen, José Luis Ucha
## cast
## 2 Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng
## 3 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera
## 4 Not Available
## 5 Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar
## 6 Kate Siegel, Zach Gilford, Hamish Linklater, Henry Thomas, Kristin Lehman, Samantha Sloyan, Igby Rigney, Rahul Kohli, Annarah Cymone, Annabeth Gish, Alex Essoe, Rahul Abburi, Matt Biedel, Michael Trucco, Crystal Balint, Louis Oliver
## 7 Vanessa Hudgens, Kimiko Glenn, James Marsden, Sofia Carson, Liza Koshy, Ken Jeong, Elizabeth Perkins, Jane Krakowski, Michael McKean, Phil LaMarr
## country date_added release_year rating duration
## 2 South Africa 2021-09-24 2021 TV-MA 2 Seasons
## 3 Unknown 2021-09-24 2021 TV-MA 1 Season
## 4 Unknown 2021-09-24 2021 TV-MA 1 Season
## 5 India 2021-09-24 2021 TV-MA 2 Seasons
## 6 Unknown 2021-09-24 2021 TV-MA 1 Season
## 7 Unknown 2021-09-24 2021 PG 91 min
## listed_in
## 2 International TV Shows, TV Dramas, TV Mysteries
## 3 Crime TV Shows, International TV Shows, TV Action & Adventure
## 4 Docuseries, Reality TV
## 5 International TV Shows, Romantic TV Shows, TV Comedies
## 6 TV Dramas, TV Horror, TV Mysteries
## 7 Children & Family Movies
## description
## 2 After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth.
## 3 To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war.
## 4 Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series.
## 5 In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life.
## 6 The arrival of a charismatic young priest brings glorious miracles, ominous mysteries and renewed religious fervor to a dying town desperate to believe.
## 7 Equestria's divided. But a bright-eyed hero believes Earth Ponies, Pegasi and Unicorns should be pals — and, hoof to heart, she’s determined to prove it.
## year_added month_added content_age
## 2 2021 09 5
## 3 2021 09 5
## 4 2021 09 5
## 5 2021 09 5
## 6 2021 09 5
## 7 2021 09 5
Interpretation: Titles are ordered by release year, newest first. Rows with the same release year (here, the 2021 releases) keep their original row order.
# Separate Movies and TV Shows
movies <- subset(data, type == "Movie")
tvshows <- subset(data, type == "TV Show")
head(movies)
## show_id type title director
## 1 s1 Movie Dick Johnson Is Dead Kirsten Johnson
## 7 s7 Movie My Little Pony: A New Generation Robert Cullen, José Luis Ucha
## 8 s8 Movie Sankofa Haile Gerima
## 10 s10 Movie The Starling Theodore Melfi
## 13 s13 Movie Je Suis Karl Christian Schwochow
## 14 s14 Movie Confessions of an Invisible Girl Bruno Garotti
## cast
## 1 Not Available
## 7 Vanessa Hudgens, Kimiko Glenn, James Marsden, Sofia Carson, Liza Koshy, Ken Jeong, Elizabeth Perkins, Jane Krakowski, Michael McKean, Phil LaMarr
## 8 Kofi Ghanaba, Oyafunmike Ogunlano, Alexandra Duah, Nick Medley, Mutabaruka, Afemo Omilami, Reggie Carter, Mzuri
## 10 Melissa McCarthy, Chris O'Dowd, Kevin Kline, Timothy Olyphant, Daveed Diggs, Skyler Gisondo, Laura Harrier, Rosalind Chao, Kimberly Quinn, Loretta Devine, Ravi Kapoor
## 13 Luna Wedler, Jannis Niewöhner, Milan Peschel, Edin Hasanović, Anna Fialová, Marlon Boess, Victor Boccard, Fleur Geffrier, Aziz Dyab, Mélanie Fouché, Elizaveta Maximová
## 14 Klara Castanho, Lucca Picon, Júlia Gomes, Marcus Bessa, Kiria Malheiros, Fernanda Concon, Gabriel Lima, Caio Cabral, Leonardo Cidade, Jade Cardozo
## country
## 1 United States
## 7 Unknown
## 8 United States, Ghana, Burkina Faso, United Kingdom, Germany, Ethiopia
## 10 United States
## 13 Germany, Czech Republic
## 14 Unknown
## date_added release_year rating duration
## 1 2021-09-25 2020 PG-13 90 min
## 7 2021-09-24 2021 PG 91 min
## 8 2021-09-24 1993 TV-MA 125 min
## 10 2021-09-24 2021 PG-13 104 min
## 13 2021-09-23 2021 TV-MA 127 min
## 14 2021-09-22 2021 TV-PG 91 min
## listed_in
## 1 Documentaries
## 7 Children & Family Movies
## 8 Dramas, Independent Movies, International Movies
## 10 Comedies, Dramas
## 13 Dramas, International Movies
## 14 Children & Family Movies, Comedies
## description
## 1 As her father nears the end of his life, filmmaker Kirsten Johnson stages his death in inventive and comical ways to help them both face the inevitable.
## 7 Equestria's divided. But a bright-eyed hero believes Earth Ponies, Pegasi and Unicorns should be pals — and, hoof to heart, she’s determined to prove it.
## 8 On a photo shoot in Ghana, an American model slips back in time, becomes enslaved on a plantation and bears witness to the agony of her ancestral past.
## 10 A woman adjusting to life after a loss contends with a feisty bird that's taken over her garden — and a husband who's struggling to find a way forward.
## 13 After most of her family is murdered in a terrorist bombing, a young woman is unknowingly lured into joining the very group that killed them.
## 14 When the clever but socially-awkward Tetê joins a new school, she'll do anything to fit in. But the queen bee among her classmates has other ideas.
## year_added month_added content_age
## 1 2021 09 6
## 7 2021 09 5
## 8 2021 09 33
## 10 2021 09 5
## 13 2021 09 5
## 14 2021 09 5
head(tvshows)
## show_id type title director
## 2 s2 TV Show Blood & Water Unknown
## 3 s3 TV Show Ganglands Julien Leclercq
## 4 s4 TV Show Jailbirds New Orleans Unknown
## 5 s5 TV Show Kota Factory Unknown
## 6 s6 TV Show Midnight Mass Mike Flanagan
## 9 s9 TV Show The Great British Baking Show Andy Devonshire
## cast
## 2 Ama Qamata, Khosi Ngema, Gail Mabalane, Thabang Molaba, Dillon Windvogel, Natasha Thahane, Arno Greeff, Xolile Tshabalala, Getmore Sithole, Cindy Mahlangu, Ryle De Morny, Greteli Fincham, Sello Maake Ka-Ncube, Odwa Gwanya, Mekaila Mathys, Sandi Schultz, Duane Williams, Shamilla Miller, Patrick Mofokeng
## 3 Sami Bouajila, Tracy Gotoas, Samuel Jouy, Nabiha Akkari, Sofia Lesaffre, Salim Kechiouche, Noureddine Farihi, Geert Van Rampelberg, Bakary Diombera
## 4 Not Available
## 5 Mayur More, Jitendra Kumar, Ranjan Raj, Alam Khan, Ahsaas Channa, Revathi Pillai, Urvi Singh, Arun Kumar
## 6 Kate Siegel, Zach Gilford, Hamish Linklater, Henry Thomas, Kristin Lehman, Samantha Sloyan, Igby Rigney, Rahul Kohli, Annarah Cymone, Annabeth Gish, Alex Essoe, Rahul Abburi, Matt Biedel, Michael Trucco, Crystal Balint, Louis Oliver
## 9 Mel Giedroyc, Sue Perkins, Mary Berry, Paul Hollywood
## country date_added release_year rating duration
## 2 South Africa 2021-09-24 2021 TV-MA 2 Seasons
## 3 Unknown 2021-09-24 2021 TV-MA 1 Season
## 4 Unknown 2021-09-24 2021 TV-MA 1 Season
## 5 India 2021-09-24 2021 TV-MA 2 Seasons
## 6 Unknown 2021-09-24 2021 TV-MA 1 Season
## 9 United Kingdom 2021-09-24 2021 TV-14 9 Seasons
## listed_in
## 2 International TV Shows, TV Dramas, TV Mysteries
## 3 Crime TV Shows, International TV Shows, TV Action & Adventure
## 4 Docuseries, Reality TV
## 5 International TV Shows, Romantic TV Shows, TV Comedies
## 6 TV Dramas, TV Horror, TV Mysteries
## 9 British TV Shows, Reality TV
## description
## 2 After crossing paths at a party, a Cape Town teen sets out to prove whether a private-school swimming star is her sister who was abducted at birth.
## 3 To protect his family from a powerful drug lord, skilled thief Mehdi and his expert team of robbers are pulled into a violent and deadly turf war.
## 4 Feuds, flirtations and toilet talk go down among the incarcerated women at the Orleans Justice Center in New Orleans on this gritty reality series.
## 5 In a city of coaching centers known to train India’s finest collegiate minds, an earnest but unexceptional student and his friends navigate campus life.
## 6 The arrival of a charismatic young priest brings glorious miracles, ominous mysteries and renewed religious fervor to a dying town desperate to believe.
## 9 A talented batch of amateur bakers face off in a 10-week competition, whipping up their best dishes in the hopes of being named the U.K.'s best.
## year_added month_added content_age
## 2 2021 09 5
## 3 2021 09 5
## 4 2021 09 5
## 5 2021 09 5
## 6 2021 09 5
## 9 2021 09 5
Interpretation: Separates dataset into movies and TV shows for comparison.
# Latest and oldest content
max(data$release_year)
## [1] 2021
min(data$release_year)
## [1] 1925
Interpretation: The maximum year shows the newest content, while the minimum year shows the oldest content available.
# Visualization 1: Distribution of Movies and TV Shows
barplot(table(data$type),
col = c("skyblue", "orange"),
main = "Distribution of Movies and TV Shows",
xlab = "Content Type",
ylab = "Number of Titles")
Interpretation: This bar chart compares the number of
Movies and TV Shows available on Netflix, showing which type of content
dominates the catalogue. A taller Movie bar would indicate that the
catalogue leans toward films rather than episodic content; note that it
counts titles, not episodes or viewing hours.
# Visualization 2: Content Added Over Years
year_data <- table(data$year_added)
barplot(year_data,
col = "lightgreen",
las = 2,
main = "Netflix Content Added Over Years",
xlab = "Year",
ylab = "Number of Titles")
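Interpretation: Displays how many titles were added each
year, showing Netflix's catalogue growth. table() leaves out the 88
unparsed dates, and 2021 appears to be a partial year, so its shorter
bar does not necessarily indicate a slowdown.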
# Visualization 3: Top 10 Countries by Content
library(dplyr)
top_countries <- data %>%
count(country) %>%
arrange(desc(n)) %>%
head(10)
barplot(top_countries$n,
names.arg = top_countries$country,
col = "purple",
las = 2,
main = "Top 10 Countries by Netflix Content",
xlab = "Country",
ylab = "Number of Titles")
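Interpretation: Shows the ten most frequent country
values, highlighting the main producing countries. Co-productions are
counted under separate combined labels, and the "Unknown" bar is the
placeholder for missing countries rather than a producer.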
# Visualization 4: Distribution of Content Ratings
barplot(table(data$rating),
col = "pink",
las = 2,
main = "Distribution of Content Ratings",
xlab = "Rating",
ylab = "Number of Titles")
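Interpretation: Displays the distribution of audience
ratings, indicating which audiences the catalogue targets. Mature
ratings (TV-MA, TV-14, R) account for most titles. Bars appear in
alphabetical level order, not by frequency.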
# Visualization 5: Distribution of Release Years
hist(data$release_year,
col = "lightblue",
main = "Distribution of Release Years",
xlab = "Release Year",
ylab = "Frequency")
Interpretation: Shows how release years are distributed.
The distribution is strongly left-skewed: most titles were released in
the last decade, with a long tail of older titles going back to 1925.
# Mean, Median, Standard Deviation
mean(data$release_year)
## [1] 2014.183
median(data$release_year)
## [1] 2017
sd(data$release_year)
## [1] 8.822191
# Quartiles & Percentiles
quantile(data$release_year)
## 0% 25% 50% 75% 100%
## 1925 2013 2017 2019 2021
# Interquartile Range (IQR)
IQR(data$release_year)
## [1] 6
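Interpretation: The mean release year (2014.2) is below the median (2017), consistent with a left-skewed distribution: a tail of old titles pulls the mean down. Half of all titles were released between 2013 and 2019 (IQR = 6 years), and the standard deviation of 8.8 years is inflated by the same tail.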
# Outlier Detection using IQR
Q1 <- quantile(data$release_year, 0.25)
Q3 <- quantile(data$release_year, 0.75)
IQR_value <- IQR(data$release_year)
lower_bound <- Q1 - 1.5 * IQR_value
upper_bound <- Q3 + 1.5 * IQR_value
outliers <- data$release_year[
data$release_year < lower_bound | data$release_year > upper_bound
]
outliers
## [1] 1993 1996 1998 1997 1975 1978 1983 1987 2001 2002 2003 2001 1994 1994 2003
## [16] 1982 1994 1993 2003 2001 1989 1990 1991 1994 1998 1999 1986 2003 1992 1996
## [31] 2003 1999 1984 2001 1997 2003 2003 1980 1986 1961 1996 2000 1993 2002 2003
## [46] 2001 1995 1993 1985 1993 1995 1992 1993 2002 1999 1986 1999 1995 1991 1994
## [61] 2003 2000 1991 1995 2000 1999 1983 1976 1959 1997 2002 1997 1999 1997 2000
## [76] 1995 1993 1988 1995 2001 1992 2000 1999 1997 1984 1986 1989 2003 1980 2002
## [91] 2003 1998 1998 2003 2001 1981 1982 1988 1981 1972 2001 2000 1997 2003 2001
## [106] 1997 2001 1997 1984 2001 1993 1981 1996 1976 1998 1997 1999 1989 2003 1994
## [121] 2002 2000 1992 2001 1992 2001 2003 1981 2001 1996 1964 2002 1999 1998 1990
## [136] 1988 2002 1945 1997 2003 1999 1987 1997 1988 1993 1990 1999 1954 1979 2002
## [151] 1989 1994 1975 1991 2003 1982 1999 1980 1990 1982 2002 2003 2002 2003 1999
## [166] 2002 2002 1998 1998 1998 2000 1993 1999 1990 1997 1994 1979 1989 1982 1958
## [181] 1956 1963 1954 1970 2002 1999 1999 1981 1996 1982 1979 1984 1985 1981 1973
## [196] 1976 1994 2003 1990 1984 1998 1980 1993 2001 2003 1998 1987 1986 1994 1999
## [211] 2001 2001 2003 2002 2003 2000 1998 1998 1995 1997 2001 2003 2003 2003 2001
## [226] 2002 2002 1999 1975 1991 1992 1993 1989 1987 1991 1991 1990 1991 1925 2003
## [241] 2000 1998 1982 1972 1974 1979 1989 1998 2000 2003 1979 1993 2000 1998 1998
## [256] 1980 1986 2003 2000 1986 1991 2002 1978 1960 2000 1988 1976 1983 1988 1973
## [271] 1974 1989 1984 1966 1971 1962 1994 1993 1993 1995 1969 1998 1992 1994 1995
## [286] 1996 1996 1971 1975 1992 2001 2001 1977 1999 2001 1979 2001 1994 1992 1998
## [301] 1988 1986 1986 1990 1988 1987 1989 1988 1991 1990 1991 2000 1997 2000 2003
## [316] 1989 1977 1971 1993 1998 1996 2002 2002 1974 1989 2001 1977 1999 1998 2000
## [331] 1986 1991 1999 2000 2001 2003 1997 1994 1995 2003 1995 2001 2002 1995 1998
## [346] 1992 1972 1982 1997 1996 1992 1978 2003 1997 1999 2002 2002 2000 2001 2001
## [361] 1999 1991 1973 1967 2003 1960 1998 1992 1976 1992 1992 1958 2000 2000 2003
## [376] 1973 1997 1992 1999 1988 1994 1968 1975 2002 1993 1973 1999 2002 1967 2000
## [391] 1989 1977 2003 1997 2003 1991 2002 1992 2002 2002 1985 1965 1997 1989 2003
## [406] 1996 2000 2003 2003 1983 1979 2002 1973 2002 1990 1985 1997 2001 1996 2001
## [421] 1998 1998 1971 2000 2003 1982 1978 1977 1956 1994 1993 2002 2003 1990 1997
## [436] 1958 1979 1995 1990 2001 2003 1978 2000 1987 1999 1998 2001 1976 1997 1988
## [451] 1975 2000 1997 1995 1999 1974 1989 1981 1984 2003 2002 1995 1997 2001 1973
## [466] 1973 1960 1997 1992 2002 1981 2001 1982 1990 1996 1993 1980 2003 1995 2003
## [481] 1988 1996 1997 1945 1972 1998 2000 1987 2003 1982 1946 1997 1992 1990 1983
## [496] 2000 1998 1976 1962 1990 1994 1995 1979 1987 1981 1989 1983 1999 1986 2002
## [511] 1983 1988 1973 1985 2003 1997 2002 1996 2003 1997 2002 1984 1962 2001 1999
## [526] 1982 1993 1945 1981 1997 1995 1960 1983 1968 1984 2002 1988 2002 2000 1997
## [541] 1990 2003 1992 1992 1990 1988 1993 1946 2002 1998 1986 1980 1980 1998 2000
## [556] 1999 2003 2001 1997 2002 1996 1942 1996 1984 2000 1994 1980 1996 1974 1985
## [571] 1955 1984 1980 1985 1976 1979 1982 1985 1990 1968 1998 1995 1999 1998 2003
## [586] 1982 1983 1945 1964 1982 1955 2000 1993 1988 1998 1997 2000 1993 1999 1986
## [601] 2002 2002 2002 1990 1996 1993 1998 1991 2003 1976 2001 2002 2003 1992 1981
## [616] 1996 2002 1972 1999 1977 1980 1999 1993 1999 1991 1990 2003 1995 1967 2000
## [631] 1993 1942 2001 1983 1977 1965 1996 1994 1970 1992 1982 1981 1996 1978 1994
## [646] 2000 1975 2003 1979 1995 1992 1974 2003 2002 1982 1993 1998 1999 2003 2003
## [661] 1944 1991 1988 1984 1944 2001 1997 1993 1997 2002 1994 2001 1955 1978 1989
## [676] 1999 1974 2002 1996 2002 1963 1998 1999 1985 1988 1947 1995 2001 1997 2001
## [691] 1996 1990 1995 2001 1985 1995 1969 1944 1990 1943 1995 1983 1967 2000 2002
## [706] 2000 1967 1943 1999 1971 1981 1943 1994 2002 2001 2001 1973
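Interpretation: With Q1 = 2013, Q3 = 2019, and IQR = 6, the fences are 2013 - 1.5 * 6 = 2004 and 2019 + 1.5 * 6 = 2028. No release year exceeds the upper fence, so every flagged value is a title released in 2003 or earlier (717 titles). These are genuine older titles rather than data errors; the rule flags them because the distribution is concentrated in recent years.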
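Printing every flagged value is hard to read; the sketch below returns the fences and a count instead. iqr_outliers() is a hypothetical helper implementing Tukey's 1.5 * IQR rule with R's default quantile type (type 7), the same computation as the code above; on the real data it would be called as iqr_outliers(data$release_year).

```r
# Hypothetical helper: Tukey's IQR rule, summarised instead of printed in full.
iqr_outliers <- function(x, k = 1.5) {
  q <- quantile(x, c(0.25, 0.75), names = FALSE)  # default type 7, like IQR()
  spread <- q[2] - q[1]
  fences <- c(lower = q[1] - k * spread, upper = q[2] + k * spread)
  flagged <- x < fences[["lower"]] | x > fences[["upper"]]
  list(
    fences = fences,
    n_outliers = sum(flagged),
    flagged_values = sort(unique(x[flagged]))
  )
}

iqr_outliers(c(2013, 2015, 2017, 2018, 2019, 2019, 2020, 1925))
```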
# Correlation 1: Correlation Matrix
numeric_data <- data.frame(
release_year = data$release_year,
year_added = as.numeric(data$year_added),
content_age = data$content_age
)
cor(numeric_data)
## release_year year_added content_age
## release_year 1 NA -1
## year_added NA 1 NA
## content_age -1 NA 1
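Interpretation: Every correlation involving year_added is NA because that column contains 88 missing values and cor() uses use = "everything" by default; passing use = "pairwise.complete.obs" would ignore them. The -1 between release_year and content_age is not an empirical finding: content_age was computed as the current year minus release_year, so the two are perfectly negatively related by construction. In general, values near +1 indicate a strong positive linear relationship, values near -1 a strong negative one, and values near 0 little linear relationship.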
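The NA behaviour of cor() can be reproduced on a toy data frame, sketched below: with the default use = "everything", any pair containing a missing value yields NA, while use = "pairwise.complete.obs" drops the incomplete rows for each pair separately. The data are illustrative only.

```r
# Illustrative data: y has one missing value
toy_num <- data.frame(
  x = c(1, 2, 3, 4, 5),
  y = c(2, 4, NA, 8, 10),
  z = c(5, 4, 3, 2, 1)
)

cor(toy_num)                                  # x-y and y-z entries are NA
cor(toy_num, use = "pairwise.complete.obs")   # incomplete rows dropped per pair
```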
# Correlation 2: Release Year vs Content Age
cor(data$release_year, data$content_age)
## [1] -1
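Interpretation: The correlation is exactly -1 because content_age is defined as the current year minus release_year. Newer content has a lower age by definition, so this result confirms the calculation rather than revealing a pattern in the data.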
# Correlation 3: Scatter Plot
plot(data$release_year, data$content_age,
col = "blue",
main = "Release Year vs Content Age",
xlab = "Release Year",
ylab = "Content Age")
Interpretation: The points fall exactly on a straight
downward line, the visual counterpart of the -1 correlation; again, this
follows directly from how content_age was defined.
# Regression 1: Simple Linear Regression
# Fit model
model1 <- lm(content_age ~ release_year, data = data)
# Model summary
summary(model1)
##
## Call:
## lm(formula = content_age ~ release_year, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.10e-12 -1.90e-12 -1.70e-12 -1.30e-12 1.21e-08
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.026e+03 3.141e-10 6.449e+12 <2e-16 ***
## release_year -1.000e+00 1.560e-13 -6.412e+12 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.29e-10 on 8795 degrees of freedom
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 4.111e+25 on 1 and 8795 DF, p-value: < 2.2e-16
# Plot with regression line
plot(data$release_year, data$content_age,
col = "blue",
main = "Release Year vs Content Age",
xlab = "Release Year",
ylab = "Content Age")
abline(model1, col = "red", lwd = 2)
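Interpretation: The regression recovers the formula used
to build content_age: an intercept of 2026 (the year the report was run)
and a slope of exactly -1, with R-squared = 1. The residuals (at most
about 1e-8) are floating-point rounding, which is what produces the
enormous t and F statistics. The model checks the derived column rather
than revealing a relationship.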
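The toy example below shows why the fit is perfect: when the response is defined as a fixed year minus the predictor, lm() simply returns that definition. The release years are made up, and 2026 is an assumption chosen to match the year this report appears to have been run.

```r
# Made-up release years; age is built the same way as content_age (assumed run year 2026)
release_year <- c(1925, 1990, 2005, 2013, 2017, 2019, 2021)
age <- 2026 - release_year

fit <- lm(age ~ release_year)
coef(fit)                  # intercept ~ 2026, slope ~ -1
max(abs(residuals(fit)))   # essentially zero (floating-point rounding only)
```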
# Regression 2: Multiple Linear Regression
# Fit model
model2 <- lm(content_age ~ release_year + as.numeric(year_added), data = data)
# Summary
summary(model2)
##
## Call:
## lm(formula = content_age ~ release_year + as.numeric(year_added),
## data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.399e-09 2.000e-14 3.500e-13 6.100e-13 8.700e-13
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.026e+03 3.556e-10 5.697e+12 <2e-16 ***
## release_year -1.000e+00 3.141e-14 -3.183e+13 <2e-16 ***
## as.numeric(year_added) -2.288e-13 1.768e-13 -1.294e+00 0.196
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.571e-11 on 8706 degrees of freedom
## (88 observations deleted due to missingness)
## Multiple R-squared: 1, Adjusted R-squared: 1
## F-statistic: 5.13e+26 on 2 and 8706 DF, p-value: < 2.2e-16
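Interpretation: Adding year_added does not improve the model. R-squared was already 1, the year_added coefficient is effectively zero (about -2.3e-13, p = 0.196), and 88 rows with unparsed dates are dropped. release_year alone determines content_age exactly.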
# Regression 3: Prediction
# New data
new_data <- data.frame(release_year = 2020)
# Prediction
predict(model1, newdata = new_data)
## 1
## 6
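Interpretation: For a title released in 2020 the model predicts an age of 6 years, which is simply 2026 - 2020. Because the model reproduces a formula, this is a calculation rather than a forecast.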