Objective:
Find the 6 actors with the most appearances in TV Shows
using the Netflix dataset.
library(tidyverse)   # includes dplyr, tidyr, ggplot2, etc.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.1     ✔ stringr   1.5.2
## ✔ ggplot2   4.0.0     ✔ tibble    3.3.0
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.1.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)
Make sure you downloaded the dataset from Kaggle and saved it as Netflix.csv in your working directory.
Dataset link:
https://www.kaggle.com/datasets/dearsirmehta/100-analysis-using-netflix-datasets
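Before reading the file, it can help to confirm that R actually sees it in the working directory. The sketch below uses only base R; the helper name `dataset_available` is hypothetical and not part of the original exercise.

```r
# Hypothetical helper (an assumption, not part of the original exercise):
# returns TRUE if the file exists; otherwise prints a hint and returns FALSE.
dataset_available <- function(path = "Netflix.csv") {
  if (file.exists(path)) {
    return(TRUE)
  }
  message("File not found: ", path, " (working directory: ", getwd(), ")")
  FALSE
}

dataset_available("Netflix.csv")
```

If this returns FALSE, either move the file or change the working directory with `setwd()` before calling `read_csv()`.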
# Read the Netflix dataset
Netflix <- read_csv("Netflix.csv")
## Rows: 6234 Columns: 12
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (10): type, title, director, cast, country, date_added, rating, duration...
## dbl  (2): show_id, release_year
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Display first few rows to confirm
head(Netflix)
## # A tibble: 6 × 12
##    show_id type    title   director cast  country date_added release_year rating
##      <dbl> <chr>   <chr>   <chr>    <chr> <chr>   <chr>             <dbl> <chr> 
## 1 81145628 Movie   Norm o… Richard… Alan… United… September…         2019 TV-PG 
## 2 80117401 Movie   Jandin… <NA>     Jand… United… September…         2016 TV-MA 
## 3 70234439 TV Show Transf… <NA>     Pete… United… September…         2013 TV-Y7…
## 4 80058654 TV Show Transf… <NA>     Will… United… September…         2016 TV-Y7 
## 5 80125979 Movie   #reali… Fernand… Nest… United… September…         2017 TV-14 
## 6 80163890 TV Show Apaches <NA>     Albe… Spain   September…         2016 TV-MA 
## # ℹ 3 more variables: duration <chr>, listed_in <chr>, description <chr>
# Split the comma-separated cast column into one row per actor
Netflix_Actor <- Netflix %>%
  separate_rows(cast, sep = ", ") %>% 
  drop_na(cast) %>% 
  rename(actor = cast)
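To see what this reshaping step does, here is a minimal sketch on made-up data. The `toy` tibble and its values are assumptions for illustration only and are not taken from Netflix.csv.

```r
library(tidyverse)

# Toy data (made up for illustration; not from Netflix.csv)
toy <- tibble(
  title = c("Show A", "Show B", "Show C"),
  cast  = c("Ann Lee, Bob Ray", NA, "Ann Lee")
)

toy_actor <- toy %>%
  separate_rows(cast, sep = ", ") %>%  # one row per name in the cast string
  drop_na(cast) %>%                    # drop titles with no cast listed
  rename(actor = cast)

toy_actor
```

"Show A" becomes two rows, "Show B" disappears because its cast is missing, and "Show C" stays as one row. In tidyr 1.3.0 and later, `separate_rows()` is marked as superseded in favour of `separate_longer_delim()`; it still works, so the exercise code is left as written.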
head(Netflix_Actor)
## # A tibble: 6 × 12
##    show_id type  title     director actor country date_added release_year rating
##      <dbl> <chr> <chr>     <chr>    <chr> <chr>   <chr>             <dbl> <chr> 
## 1 81145628 Movie Norm of … Richard… Alan… United… September…         2019 TV-PG 
## 2 81145628 Movie Norm of … Richard… Andr… United… September…         2019 TV-PG 
## 3 81145628 Movie Norm of … Richard… Bria… United… September…         2019 TV-PG 
## 4 81145628 Movie Norm of … Richard… Cole… United… September…         2019 TV-PG 
## 5 81145628 Movie Norm of … Richard… Jenn… United… September…         2019 TV-PG 
## 6 81145628 Movie Norm of … Richard… Jona… United… September…         2019 TV-PG 
## # ℹ 3 more variables: duration <chr>, listed_in <chr>, description <chr>
# Count TV Show appearances per actor and keep the 6 highest counts
Top_Actors <- Netflix_Actor %>%
  select(type, actor) %>% 
  filter(type == "TV Show") %>% 
  group_by(actor) %>% 
  summarise(Appearances = n()) %>% 
  arrange(desc(Appearances)) %>% 
  head(6)
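One caveat with `head(6)`: it keeps exactly six rows, so if several actors share the sixth-highest count, the cut is made by row order rather than by count. The sketch below, on made-up data, shows a possible alternative using `count()` and `slice_max()`, which keeps tied rows by default. The `toy_rows` tibble is an assumption for illustration only.

```r
library(tidyverse)

# Toy data (made up for illustration): one row per actor per title
toy_rows <- tibble(
  type  = c(rep("TV Show", 8), "Movie"),
  actor = c("A", "A", "A", "B", "B", "C", "C", "D", "D")
)

toy_top <- toy_rows %>%
  filter(type == "TV Show") %>%
  count(actor, name = "Appearances", sort = TRUE) %>%   # group, count, sort in one step
  slice_max(Appearances, n = 2, with_ties = TRUE)       # keep every row tied for 2nd place

toy_top
```

Here B and C both have 2 appearances, so asking for the top 2 returns three rows (A, B, C), whereas `head(2)` would return only A and B. Whether to keep ties is a judgment call; the exercise asks for exactly six actors, so `head(6)` is a reasonable choice there.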
Top_Actors
## # A tibble: 6 × 2
##   actor              Appearances
##   <chr>                    <int>
## 1 Takahiro Sakurai            18
## 2 Yuki Kaji                   16
## 3 Daisuke Ono                 14
## 4 David Attenborough          14
## 5 Ashleigh Ball               12
## 6 Hiroshi Kamiya              12
Let’s make a bar chart for the top 6 actors with the most Netflix TV Show appearances.
Top_Actors %>%
  ggplot(aes(x = reorder(actor, Appearances), y = Appearances, fill = actor)) +
  geom_col(show.legend = FALSE) +
  coord_flip() +
  labs(
    title = "Top 6 Actors with the Most TV Show Appearances on Netflix",
    x = "Actor",
    y = "Number of TV Shows"
  ) +
  theme_minimal()
End of Exercise.