The dataset was downloaded from Kaggle. It contains details of the content Netflix added between 2008 and 2021, broken down by country, along with other details of each show such as title, director, cast, duration and description. The objective of the visualization is to see how Netflix has changed the type of content it adds over the years. To do this, the year each title was added was derived from the date_added field.
# Load the tidyverse quietly; it also attaches readr, stringr, tidyr, dplyr and ggplot2
suppressPackageStartupMessages(library(tidyverse, warn.conflicts = FALSE))
# Read the Kaggle Netflix titles file; read_csv() already returns a tibble
netflix_data <- read_csv('/Users/samishav/Documents/Fall 2021/Data Wrangling/Week 5/netflix_titles.csv')
netflix_data
## # A tibble: 8,807 × 12
## show_id type title director cast country date_added release_year rating
## <chr> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <chr>
## 1 s1 Movie Dick … Kirsten … <NA> United… September… 2020 PG-13
## 2 s2 TV Show Blood… <NA> Ama … South … September… 2021 TV-MA
## 3 s3 TV Show Gangl… Julien L… Sami… <NA> September… 2021 TV-MA
## 4 s4 TV Show Jailb… <NA> <NA> <NA> September… 2021 TV-MA
## 5 s5 TV Show Kota … <NA> Mayu… India September… 2021 TV-MA
## 6 s6 TV Show Midni… Mike Fla… Kate… <NA> September… 2021 TV-MA
## 7 s7 Movie My Li… Robert C… Vane… <NA> September… 2021 PG
## 8 s8 Movie Sanko… Haile Ge… Kofi… United… September… 1993 TV-MA
## 9 s9 TV Show The G… Andy Dev… Mel … United… September… 2021 TV-14
## 10 s10 Movie The S… Theodore… Meli… United… September… 2021 PG-13
## # … with 8,797 more rows, and 3 more variables: duration <chr>,
## # listed_in <chr>, description <chr>
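# date_added looks like "September 25, 2021", so its last four characters are the year
netflix_data$year <- as.integer(str_sub(netflix_data$date_added, -4))
# Titles with a missing date_added get an NA year; drop them before plotting
netflix_data_trimmed <- netflix_data %>% drop_na(year)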
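To make the year derivation above concrete, here is a minimal sketch on a few hypothetical date_added strings (not rows from the real file). It applies the same last-four-characters rule and shows that a missing date becomes an NA year that drop_na() removes. The str_trim() call is an assumption added defensively in case some values carry stray surrounding spaces; it is not part of the original code.

```r
suppressPackageStartupMessages({
  library(dplyr)
  library(stringr)
  library(tidyr)
})

# Hypothetical sample rows mimicking the "Month D, YYYY" format of date_added
sample_titles <- tibble(
  title = c("Title A", "Title B", "Title C", "Title D"),
  date_added = c("September 25, 2021", "January 1, 2008", NA, "March 3, 2015 ")
)

# Last four characters of the (trimmed) string are the year; NA stays NA
sample_titles <- sample_titles %>%
  mutate(year = as.integer(str_sub(str_trim(date_added), -4)))

# Rows whose year could not be derived are removed, as in netflix_data_trimmed
sample_trimmed <- sample_titles %>% drop_na(year)
print(sample_trimmed)
```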
The bar plot below shows, for each year from 2008 to 2021, the share of titles added to Netflix that were movies versus TV shows. In 2008, Netflix added an equal number of TV shows and movies. In the four years that followed, however, it added only movies. From 2013 onwards it started adding TV shows again, but the pattern is fairly erratic up until 2018, when the share of TV shows dips relative to movies. From 2018 onwards, the share of TV shows added rises slowly but steadily compared with movies.
# position = "fill" scales each year's bar to 100%, so the bars show the movie/TV show share
ggplot(netflix_data_trimmed, aes(x = factor(year), fill = factor(type))) + geom_bar(position = "fill") +
  scale_y_continuous(name = "Percent", labels = scales::percent) + scale_fill_manual(values = c("darkblue", "darkred")) +
  theme(axis.text.x = element_text(angle = 0, vjust = 0.5)) + labs(x = "Year", fill = "Type")
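The percentages in the plot are each type's count divided by that year's total number of titles added. As a sketch of that arithmetic, the table below computes the same shares with dplyr on a small hypothetical tibble standing in for netflix_data_trimmed; the values are invented for illustration and are not real Netflix figures.

```r
suppressPackageStartupMessages(library(dplyr))

# Hypothetical toy data standing in for netflix_data_trimmed
toy <- tibble(
  year = c(2008L, 2008L, 2009L, 2009L, 2009L, 2020L, 2020L, 2020L, 2020L),
  type = c("Movie", "TV Show", "Movie", "Movie", "Movie",
           "Movie", "Movie", "TV Show", "TV Show")
)

# Count titles per year and type, then divide by each year's total;
# these shares correspond to the segment heights geom_bar(position = "fill") draws
type_share <- toy %>%
  count(year, type) %>%
  group_by(year) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(type_share)
```

A year in which only one type was added (2009 in the toy data) simply has no row for the other type; tidyr::complete() can add explicit zero-count rows if a full table is needed.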