🎬 Weekly Exercise: Netflix Dataset Analysis

Objective:
Find the 6 actors with the most appearances in TV Shows using the Netflix dataset.


1. Load Required Libraries

library(tidyverse)   # includes dplyr, tidyr, ggplot2, etc.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)       # already attached by tidyverse above; loading it again is harmless
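If the tidyverse is not installed yet, library(tidyverse) stops with an error. The guard below installs the package only when it is missing. It is a minimal sketch that assumes an internet connection and a configured CRAN mirror; the original exercise does not cover setup.

```r
# Install the tidyverse only if it is not already available
# (assumes internet access and a configured CRAN mirror)
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}

library(tidyverse)   # attaches dplyr, tidyr, readr, ggplot2, forcats, ...
```

On a machine that already has the tidyverse, this reduces to a plain library() call.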

2. Import Dataset

Make sure you have downloaded the dataset from Kaggle and saved it as Netflix.csv in your working directory (getwd() shows which directory that is).

Dataset link:
https://www.kaggle.com/datasets/dearsirmehta/100-analysis-using-netflix-datasets

# Read the Netflix dataset
Netflix <- read_csv("Netflix.csv")
## Rows: 6234 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): type, title, director, cast, country, date_added, rating, duration...
## dbl  (2): show_id, release_year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display first few rows to confirm
head(Netflix)
## # A tibble: 6 × 12
##    show_id type    title   director cast  country date_added release_year rating
##      <dbl> <chr>   <chr>   <chr>    <chr> <chr>   <chr>             <dbl> <chr> 
## 1 81145628 Movie   Norm o… Richard… Alan… United… September…         2019 TV-PG 
## 2 80117401 Movie   Jandin… <NA>     Jand… United… September…         2016 TV-MA 
## 3 70234439 TV Show Transf… <NA>     Pete… United… September…         2013 TV-Y7…
## 4 80058654 TV Show Transf… <NA>     Will… United… September…         2016 TV-Y7 
## 5 80125979 Movie   #reali… Fernand… Nest… United… September…         2017 TV-14 
## 6 80163890 TV Show Apaches <NA>     Albe… Spain   September…         2016 TV-MA 
## # ℹ 3 more variables: duration <chr>, listed_in <chr>, description <chr>
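The message above suggests passing show_col_types = FALSE to read_csv() to silence the column specification. The sketch below does that on a tiny hypothetical inline dataset, wrapped in I() so readr treats the string as literal data. It then runs two quick checks that are worth doing on the real data: how many rows are Movies versus TV Shows, and how many titles have no cast listed. The names toy, "Film A" and "Ann Lee" are made up for illustration. On the real data, replace toy with Netflix.

```r
library(tidyverse)

# Hypothetical toy data standing in for Netflix.csv (same column names: type, title, cast)
toy <- read_csv(
  I("type,title,cast\nMovie,Film A,\"Ann Lee, Bo Kim\"\nTV Show,Show B,\"Bo Kim, Cy Ray\"\nTV Show,Show C,\n"),
  show_col_types = FALSE   # silences the column-specification message
)

# How many rows of each type?
toy %>% count(type)

# How many titles have no cast listed? (empty fields are read as NA)
toy %>% summarise(missing_cast = sum(is.na(cast)))
```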

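3. Transform Data

The cast column stores all actors of a title in a single comma-separated string. separate_rows() splits that string into one row per actor, drop_na() removes titles with no cast listed, and the column is renamed to actor.

Netflix_Actor <- Netflix %>% 
  separate_rows(cast, sep = ", ") %>%   # one row per actor
  drop_na(cast) %>%                     # drop titles without a cast
  rename(actor = cast)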

head(Netflix_Actor)
## # A tibble: 6 × 12
##    show_id type  title     director actor country date_added release_year rating
##      <dbl> <chr> <chr>     <chr>    <chr> <chr>   <chr>             <dbl> <chr> 
## 1 81145628 Movie Norm of … Richard… Alan… United… September…         2019 TV-PG 
## 2 81145628 Movie Norm of … Richard… Andr… United… September…         2019 TV-PG 
## 3 81145628 Movie Norm of … Richard… Bria… United… September…         2019 TV-PG 
## 4 81145628 Movie Norm of … Richard… Cole… United… September…         2019 TV-PG 
## 5 81145628 Movie Norm of … Richard… Jenn… United… September…         2019 TV-PG 
## 6 81145628 Movie Norm of … Richard… Jona… United… September…         2019 TV-PG 
## # ℹ 3 more variables: duration <chr>, listed_in <chr>, description <chr>
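Splitting on ", " assumes every name is separated by exactly a comma and one space. The sketch below is more defensive. It splits on the comma alone, trims whitespace with str_trim(), and keeps each title/actor pair only once. It uses a hypothetical mini-dataset; whether the real cast column actually has irregular spacing or duplicate names has not been checked here.

```r
library(tidyverse)

# Hypothetical mini-dataset: one row per title, cast stored as one comma-separated string
shows <- tibble(
  type  = c("TV Show", "TV Show", "Movie"),
  title = c("Show A", "Show B", "Film C"),
  cast  = c("Ann Lee, Bo Kim", "Bo Kim,Cy Ray ", NA)
)

actors <- shows %>%
  drop_na(cast) %>%                          # drop titles with no cast first
  separate_rows(cast, sep = ",") %>%         # split on the comma alone ...
  mutate(cast = str_trim(cast)) %>%          # ... then trim stray spaces around each name
  rename(actor = cast) %>%
  distinct(title, actor, .keep_all = TRUE)   # guard against an actor listed twice for one title

actors
```

With sep = ", " the string "Bo Kim,Cy Ray " would stay as one entry. Here it becomes two clean names, and the result has four rows.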

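4. Find the Top 6 Actors in TV Shows

Keep only the TV Show rows, count the rows per actor, sort in descending order, and take the first six.

Top_Actors <- Netflix_Actor %>%
  select(type, actor) %>%            # keep only the columns needed
  filter(type == "TV Show") %>%      # TV Shows only
  group_by(actor) %>% 
  summarise(Appearances = n()) %>%   # number of TV Show rows per actor
  arrange(desc(Appearances)) %>% 
  head(6)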

Top_Actors
## # A tibble: 6 × 2
##   actor              Appearances
##   <chr>                    <int>
## 1 Takahiro Sakurai            18
## 2 Yuki Kaji                   16
## 3 Daisuke Ono                 14
## 4 David Attenborough          14
## 5 Ashleigh Ball               12
## 6 Hiroshi Kamiya              12
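head(6) cuts the list at exactly six rows. Ashleigh Ball and Hiroshi Kamiya share sixth place with 12 appearances, so any other actor who also has 12 would be dropped silently. dplyr's count() and slice_max() express the same ranking and keep ties at the cutoff. The sketch below runs on hypothetical toy data with n = 1 so the tie is visible. On the real data you would start from Netflix_Actor and use n = 6.

```r
library(tidyverse)

# Hypothetical long table: one row per (title, actor) pair
actor_rows <- tibble(
  type  = c("TV Show", "TV Show", "TV Show", "TV Show", "TV Show", "Movie"),
  actor = c("Ann Lee", "Ann Lee", "Bo Kim", "Bo Kim", "Cy Ray", "Cy Ray")
)

top_actors <- actor_rows %>%
  filter(type == "TV Show") %>%
  count(actor, name = "Appearances") %>%            # same as group_by() + summarise(n())
  slice_max(Appearances, n = 1, with_ties = TRUE)   # keeps every actor tied at the cutoff

top_actors
```

Ann Lee and Bo Kim both have 2 TV Show appearances, so both are returned even though n = 1.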

5. Visualize the Top 6 Actors

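Let’s make a horizontal bar chart of the six actors with the most Netflix TV Show appearances. reorder() sorts the bars by count, and coord_flip() turns them sideways so the long names stay readable.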

Top_Actors %>%
  ggplot(aes(x = reorder(actor, Appearances), y = Appearances, fill = actor)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(
    title = "Top 6 Actors with the Most TV Show Appearances on Netflix",
    x = "Actor",
    y = "Number of TV Shows"
  ) +
  theme_minimal()
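Current ggplot2 versions can draw horizontal bars directly by mapping the actor to the y axis, so coord_flip() is not needed. The variant below also prints each count at the end of its bar. It is a sketch, not the exercise's required chart. The single fill colour and the label placement are stylistic choices. The numbers are copied from the Top_Actors output in step 4, so the code runs without the CSV.

```r
library(tidyverse)

# Values copied from the Top_Actors output printed in step 4
top6 <- tibble(
  actor = c("Takahiro Sakurai", "Yuki Kaji", "Daisuke Ono",
            "David Attenborough", "Ashleigh Ball", "Hiroshi Kamiya"),
  Appearances = c(18, 16, 14, 14, 12, 12)
)

p <- top6 %>%
  ggplot(aes(x = Appearances, y = fct_reorder(actor, Appearances))) +  # actor on y: no coord_flip() needed
  geom_col(fill = "steelblue") +
  geom_text(aes(label = Appearances), hjust = -0.2) +                  # print each count past the bar's end
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +           # leave room for the labels
  labs(
    title = "Top 6 Actors with the Most TV Show Appearances on Netflix",
    x = "Number of TV Shows",
    y = NULL
  ) +
  theme_minimal()

p
```

To plot the live result, replace top6 with Top_Actors.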


✅ Summary

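  • The comma-separated cast column was split into one row per actor, and titles without a cast were dropped.
  • Counting TV Show rows per actor puts Takahiro Sakurai first with 18 appearances. He is followed by Yuki Kaji (16), Daisuke Ono and David Attenborough (14 each), and Ashleigh Ball and Hiroshi Kamiya (12 each).
  • A horizontal bar plot displays this ranking.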

End of Exercise.