In addition to this report, I also built an interactive dashboard with Shiny. You can access the dashboard here: https://hairulysin.shinyapps.io/Netflix/

Dataset

Import Data

netflix <- read.csv("data_IP/netflix.csv")
netflix

Import Library
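
The library-loading chunk is not shown in this rendered report. Judging only from the functions called later (glimpse, ggplot, glue, ggplotly, map_data), the set of packages below is a reasonable guess rather than the author's confirmed list. The sketch attaches whichever of them are installed and reports the rest instead of stopping with an error.

```r
# Packages inferred from the functions used in this report (an assumption,
# because the original library chunk is not shown).
pkgs <- c("dplyr", "ggplot2", "glue", "plotly", "maps")

# Find any that are not installed, without raising an error.
missing_pkgs <- pkgs[!vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)]

# Attach the available ones and report the rest.
for (p in setdiff(pkgs, missing_pkgs)) {
  suppressPackageStartupMessages(library(p, character.only = TRUE))
}
if (length(missing_pkgs) > 0) {
  message("Not installed: ", paste(missing_pkgs, collapse = ", "))
}
```

The scales package is also called through scales:: later on; it is installed as a dependency of ggplot2.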

🛠 Check the data structure

# glimpse() is provided by dplyr, not base R
glimpse(netflix)

#> Rows: 8,790
#> Columns: 10
#> $ show_id      <chr> "s1", "s3", "s6", "s14", "s8", "s9", "s10", "s939", "s13"…
#> $ type         <chr> "Movie", "TV Show", "TV Show", "Movie", "Movie", "TV Show…
#> $ title        <chr> "Dick Johnson Is Dead", "Ganglands", "Midnight Mass", "Co…
#> $ director     <chr> "Kirsten Johnson", "Julien Leclercq", "Mike Flanagan", "B…
#> $ country      <chr> "United States", "France", "United States", "Brazil", "Un…
#> $ date_added   <chr> "9/25/2021", "9/24/2021", "9/24/2021", "9/22/2021", "9/24…
#> $ release_year <int> 2020, 2021, 2021, 2021, 1993, 2021, 2021, 2019, 2021, 201…
#> $ rating       <chr> "PG-13", "TV-MA", "TV-MA", "TV-PG", "TV-MA", "TV-14", "PG…
#> $ duration     <chr> "90 min", "1 Season", "1 Season", "91 min", "125 min", "9…
#> $ listed_in    <chr> "Documentaries", "Crime TV Shows, International TV Shows,…

The variables in the ‘netflix.csv’ dataset are described below:

show_id : Unique ID of each entry.
type : Content type (Movie or TV Show).
title : Title of the movie or TV show.
director : Name of the director.
country : Country of origin.
date_added : Date the title was added to the Netflix platform.
release_year : Year the title was released.
rating : Age classification of the movie or TV show.
duration : Duration of the movie or TV show.
listed_in : Genre of the movie or TV show.

Data Wrangling

🛠 Column selection:

Question: How are the content types (Movie or TV Show) distributed in the Netflix dataset?
Columns needed: type, plus a new column count holding the number of rows for each type.
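
The bar chart further down reads from a data frame called netflix1, which this report never defines. Going by the description above, it presumably holds one row per content type and a count column. The sketch below shows one way to build such a table in base R; netflix_demo and netflix1_demo are hypothetical names with made-up rows, used only for illustration.

```r
# Toy stand-in for the real data (hypothetical values, illustration only).
netflix_demo <- data.frame(
  type = c("Movie", "TV Show", "Movie", "Movie", "TV Show"),
  stringsAsFactors = FALSE
)

# One row per content type, with its frequency in a `count` column.
# With dplyr the same table could be built as:
#   netflix_demo %>% count(type, name = "count")
netflix1_demo <- as.data.frame(
  table(type = netflix_demo$type),
  responseName = "count",
  stringsAsFactors = FALSE
)

netflix1_demo
```

Applied to the real netflix data frame, the same call would yield the two columns (type and count) that the plot expects.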

Create the Algoritma theme for consistent visualization branding

theme_algoritma <- theme(legend.key = element_rect(fill="black"),
           legend.background = element_rect(color="white", fill="#263238"),
           plot.subtitle = element_text(size=6, color="white"),
           panel.background = element_rect(fill="#dddddd"),
           panel.border = element_rect(fill=NA),
           panel.grid.minor.x = element_blank(),
           panel.grid.major.x = element_blank(),
           panel.grid.major.y = element_line(color="darkgrey", linetype=2), 
           panel.grid.minor.y = element_blank(),
           plot.background = element_rect(fill="#263238"),
           text = element_text(color="white"),
           axis.text = element_text(color="white")
           )

Business Question & Visualization

How Is Content Distributed on Netflix?

palet_warna <- c("#db0000", "black")

# Build the bar chart
plot_bar <- ggplot(netflix1,
                   aes(x = type, 
                       y = count, 
                       fill = type,
                       text = glue("Content type: {type}
                                   Count: {count}
                                   Share: {round(count/sum(count)*100,2)}%"))) +
  geom_bar(stat = "identity", color = "black") +
  labs(x = "Content type", 
       y = "Count", 
       fill = "Content type") +
  ggtitle("Content Types on Netflix") +
  scale_fill_manual(values = palet_warna) +
  theme(plot.title = element_text(hjust = 0.5))+
  theme_algoritma+
  # guides(fill = FALSE) is deprecated; "none" hides the legend
  guides(fill = "none")

ggplotly(plot_bar, tooltip = "text")

Which countries contribute the most content available on Netflix?

library(maps)

world_map <- map_data("world")

netflix2 <- netflix %>% 
  group_by(country) %>% 
  summarise(contributors = n()) %>% 
  arrange(desc(contributors))

# Rename countries in the dataset so they match the region names in the map data
netflix2 <- netflix2 %>%
  mutate(country = case_when(
    country == "United States" ~ "USA",
    # Add further renaming rules here as needed
    TRUE ~ country  # Leave all other countries unchanged
  ))

map_data <- left_join(world_map, 
                      netflix2, 
                      by = c("region" = "country"))

plot_map <- ggplot(map_data, 
                   aes(x = long, 
                       y = lat, 
                       group = group, 
                       fill = contributors,
                      text = glue("Country: {region}<br>Contribution: {contributors}")))+
  geom_polygon(color = "black") +
  scale_fill_gradient(low = "#db0000", high = "black", limits = c(0, max(netflix2$contributors))) +
  labs(title = "Country Contributions to Netflix Content",
       x = NULL,
       y = NULL,
       fill = "Contribution") +
  theme_algoritma+
    theme(plot.title = element_text(hjust = 0.5),
      panel.grid.major.x = element_line(color = "darkgrey", linetype = 2))

ggplotly(plot_map, tooltip = "text")
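
Only "United States" is recoded above, so any other country whose spelling differs from the map's region names is silently left without a fill value after the join. A quick set difference shows which names fail to match. The sketch below uses short, hypothetical name vectors instead of the real world_map and netflix2 objects, and the second recoding rule is an illustrative assumption.

```r
# Region names as the map data might spell them (hypothetical subset).
map_regions <- c("USA", "UK", "France", "Brazil", "South Korea")

# Country names as the dataset might spell them (hypothetical subset).
data_countries <- c("United States", "United Kingdom", "France",
                    "Brazil", "South Korea")

# Names present in the data but absent from the map: these get no fill.
unmatched <- setdiff(data_countries, map_regions)
unmatched

# A named lookup vector recodes the mismatches and leaves other names alone.
recode_map <- c("United States" = "USA", "United Kingdom" = "UK")
fixed <- ifelse(data_countries %in% names(recode_map),
                recode_map[data_countries],
                data_countries)

# After recoding, nothing should be left unmatched.
setdiff(fixed, map_regions)
```

With the real objects, the equivalent check would be setdiff(netflix2$country, unique(world_map$region)).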

How has the amount of content on Netflix trended over time (by release year)?

netflix3 <- netflix %>% 
  group_by(release_year) %>% 
  summarise(count = n()) %>% 
  arrange(release_year) %>% 
  filter(release_year >= 1980)

plot_trend <- ggplot(netflix3, 
                     aes(x = release_year, 
                         y = count,
                         text = glue("Release year: {release_year}
                                     Number of titles: {count}"))) +
  geom_col(fill = "#564d4d", width = 0.8) +
  # Highlight 2018 in red
  geom_col(data = netflix3 %>% 
             filter(release_year == 2018), fill = "#db0000", width = 1) +
  labs(title = "Netflix Content by Release Year",
       x = "Release year",
       y = "Number of titles") +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(axis.text.x = element_blank()) +
  theme_algoritma 

ggplotly(plot_trend, tooltip = "text")
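
The chart above counts titles by release_year, which is the year a title was produced rather than the year it arrived on the platform. To look at additions instead, the date_added strings (month/day/year, as in the glimpse output) would first need to be parsed. A minimal sketch in base R follows; the sample vector mixes values from the glimpse output with made-up ones.

```r
# Sample values in the same month/day/year layout as the date_added column.
date_added <- c("9/25/2021", "9/24/2021", "1/1/2020", "12/31/2019", "9/22/2021")

# Parse the strings into Date objects, then extract the calendar year.
added_date <- as.Date(date_added, format = "%m/%d/%Y")
added_year <- as.integer(format(added_date, "%Y"))

# Count how many titles were added in each year.
added_per_year <- as.data.frame(table(year = added_year), responseName = "count")
added_per_year
```

The resulting table has the same shape as netflix3 and could be plotted the same way.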

How are content ratings distributed on Netflix?

Rating information:

“G”: Content suitable for all ages.
“NC-17”: Adults only; no one aged 17 or under is admitted.
“NR”: Content that is not rated or whose rating has not been determined.
“PG”: Parental guidance suggested. Content may not be suitable for young children.
“PG-13”: Parents strongly cautioned. Content may be inappropriate for children under 13.
“R”: Restricted. Viewers under 17 require an accompanying parent or adult guardian.
“TV-14”: Content recommended for viewers aged 14 and older.
“TV-G”: Content suitable for viewers of all ages.
“TV-MA”: Content for mature audiences only. Not suitable for children.
“TV-PG”: Parental guidance suggested. Content may not be suitable for young children.
“TV-Y”: Content suitable for young children, including preschoolers.
“TV-Y7”: Content suitable for children aged 7 and older.
“TV-Y7-FV”: Content suitable for children aged 7 and older that may contain fantasy violence.
“UR”: Unrated; the rating is undefined or unavailable.

# Take the rating column from the Netflix dataset
netflix4 <- select(netflix, rating)

# Count how often each rating occurs
rating_counts <- table(netflix4$rating)
rating_data <- data.frame(rating = names(rating_counts), 
                          count = as.numeric(rating_counts))

# Build a bar chart with a black-to-red gradient fill
plot_rating <- ggplot(rating_data, 
                      aes(x = reorder(rating, +count), 
                          y = count,
                          text = glue("Rating: {rating}
                                      Count: {count}"))) +
  geom_bar(stat = "identity", aes(fill = count)) +
  labs(title = "Distribution of Content Ratings on Netflix",
       x = "Rating",
       y = "Count",
       fill = "Count") +
  # coord_flip() +
  scale_fill_gradient(low = "black", high = "#db0000") +  # Black-to-red gradient
  theme(plot.title = element_text(hjust = 0.5)) +
  theme_algoritma

ggplotly(plot_rating, tooltip = "text")

What is the average duration of Movie content on Netflix?

# Keep only rows of type 'Movie'
netflix_5 <- netflix %>%
  filter(type == 'Movie')

# Create the 'Durasi' column holding the duration in minutes as a number
netflix_5 <- netflix_5 %>%
  mutate(Durasi = as.numeric(gsub("[^0-9]", "", duration)))

# Compute the mean duration, ignoring any missing values
mean_duration <- mean(netflix_5$Durasi, na.rm = TRUE)

# Visualize with a histogram
plot_movie_duration <- ggplot(netflix_5, aes(x = Durasi)) +
  geom_histogram(binwidth = 10, fill = "#831010", color = "black") +
  geom_vline(xintercept = mean_duration, color = "black", linetype = "dashed", size = 1) +
  labs(title = "Distribution of Movie Durations on Netflix",
       x = "Duration (minutes)",
       y = "Number of movies") +
  # annotate() draws the label once instead of once per data row
  annotate("text", x = mean_duration, y = 140, label = "Mean", color = "white", vjust = -2, hjust = -0.2) +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme_algoritma

ggplotly(plot_movie_duration)
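
The duration column stores text such as "90 min" or "1 Season", so the chunk above strips every non-digit character before converting to a number. The short sketch below, using sample values taken from the glimpse output, shows why the filter on type matters: a TV show's "1 Season" would otherwise be read as a 1-minute duration.

```r
# Sample values in the format of the duration column.
duration <- c("90 min", "1 Season", "91 min", "125 min")

# Remove everything that is not a digit, then convert to numeric.
minutes <- as.numeric(gsub("[^0-9]", "", duration))
minutes

# Only the entries measured in minutes belong in a movie-length average.
is_movie_length <- grepl("min$", duration)
mean(minutes[is_movie_length])
```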

What is the trend in the number of movies and TV series on Netflix released since 2000?

# Keep titles released in 2000 or later (the 21st century, despite the variable name)
netflix_20th_century <- netflix %>%
  filter(release_year >= 2000)

# Count movies and TV series per release year and type
trend_counts <- netflix_20th_century %>%
  group_by(release_year, type) %>%
  summarise(Jumlah = n()) %>%
  ungroup()

# Compute each type's share within its release year
trend_counts <- trend_counts %>%
  group_by(release_year) %>%
  mutate(proportion = Jumlah / sum(Jumlah))

# Visualize the trend in the number of movies and TV series
plot_trend <- trend_counts %>%
  ggplot(mapping = aes(x = release_year,
                       y = Jumlah,
                       fill = type,
                       text = paste("Release year: ", release_year, "<br>Type: ", type, "<br>Count: ", Jumlah))) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Movies and TV Series on Netflix by Release Year (2000 Onward)",
       x = "Release year",
       y = "Count",
       fill = "Type") +
  scale_fill_manual(values = c("#831010", "black")) +
  theme(plot.title = element_text(hjust = 0.5)) +
  theme(legend.position = "none") +
  theme_algoritma

# Convert the plot to an interactive one with tooltips
plotly_plot <- ggplotly(plot_trend, tooltip = "text")

# Display the interactive plot
plotly_plot
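
The chunk above also computes a proportion column, the share of each type within its release year, although the stacked bars are drawn from the raw counts. As a small worked example of that per-group share, here is a base R sketch using ave(); trend_demo and its numbers are hypothetical.

```r
# Hypothetical counts per release year and type.
trend_demo <- data.frame(
  release_year = c(2019, 2019, 2020, 2020),
  type = c("Movie", "TV Show", "Movie", "TV Show"),
  Jumlah = c(30, 10, 20, 20)
)

# Divide each count by the total of its own release year.
trend_demo$proportion <- ave(
  trend_demo$Jumlah,
  trend_demo$release_year,
  FUN = function(x) x / sum(x)
)

trend_demo

# Within every year the shares add up to 1.
tapply(trend_demo$proportion, trend_demo$release_year, sum)
```

Plotting proportion instead of Jumlah would show the changing mix of movies and TV series independently of the overall growth in titles.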

How does the number of titles released each year compare between Indonesia and Thailand?

library(plotly)

# Filter titles from Indonesia released in 2000 or later
filtered_data_indonesia <- netflix[netflix$country == "Indonesia" & netflix$release_year >= 2000, ]

# Filter titles from Thailand released in 2000 or later
filtered_data_thailand <- netflix[netflix$country == "Thailand" & netflix$release_year >= 2000, ]

# Count the titles released each year in Indonesia
yearly_counts_indonesia <- table(filtered_data_indonesia$release_year)

# Count the titles released each year in Thailand
yearly_counts_thailand <- table(filtered_data_thailand$release_year)

# Convert the tables to data frames
df_indonesia <- data.frame(tahun = as.numeric(names(yearly_counts_indonesia)), jumlah = as.numeric(yearly_counts_indonesia))
df_thailand <- data.frame(tahun = as.numeric(names(yearly_counts_thailand)), jumlah = as.numeric(yearly_counts_thailand))

# Build the line plot
line_plot <- ggplot() +
  geom_line(data = df_indonesia, aes(x = tahun, 
                                     y = jumlah, 
                                     color = "Indonesia"), size = 0.8) +
  geom_line(data = df_thailand, aes(x = tahun, 
                                    y = jumlah, 
                                    color = "Thailand"), size = 0.8) +
  labs(title = "Release Trend Comparison (Indonesia vs Thailand)",
       x = NULL, y = "Number of titles",
       color = "Country") +
  scale_color_manual(values = c("Indonesia" = "#db0000", "Thailand" = "black")) +
  theme_algoritma +
  theme(plot.title = element_text(hjust = 0.5),
      panel.grid.major.x = element_line(color = "darkgrey", linetype = 2))

# Convert the plot to an interactive one with plotly.
# "colour" is the mapped aesthetic that carries the country name here;
# hovertemplate is a trace attribute, so it is not passed to layout().
interactive_plot <- ggplotly(line_plot, tooltip = c("x", "y", "colour")) %>%
  layout(hoverlabel = list(namelength = -1)) %>% 
   layout(legend = list(orientation = "h", x = 0.35, y = -0.1)) 

# Display the interactive plot
interactive_plot
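
One detail of the table() approach used above is that a year with no titles produces no row at all, so the line is drawn straight across the gap instead of dropping to zero. If zero counts are wanted, converting the years to a factor with explicit levels is one option. This is a sketch with made-up years, not part of the original analysis.

```r
# Hypothetical release years for one country; 2017 has no titles.
release_year <- c(2016, 2016, 2018, 2019, 2019, 2019)

# A plain table() silently skips 2017.
table(release_year)

# Declaring every year as a factor level keeps the empty year with a count of 0.
all_years <- 2016:2019
counts <- table(factor(release_year, levels = all_years))

df_demo <- data.frame(tahun = all_years, jumlah = as.numeric(counts))
df_demo
```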

Which director has the most content on Netflix?

# Count titles per director
director_counts <- netflix %>%
  count(director, sort = TRUE)

# Take the 15 most frequent director values
top_directors <- head(director_counts, n = 15)

# Drop the placeholder value "Not Given"
top_directors <- top_directors %>%
  filter(director != "Not Given")

# Compute each director's share of the titles in this top list
top_directors <- top_directors %>%
  mutate(percentage = n / sum(n) * 100)

# Sort directors by share, highest first
top_directors <- top_directors %>%
  arrange(desc(percentage))

# Build a horizontal bar chart ordered by share, with a gradient fill
p <- ggplot(top_directors, aes(x = percentage, 
                               y = reorder(director, percentage),
                               text = glue("Director: {director}
                                           Share: {round(percentage, 2)}%"))) +
  geom_col(aes(fill = percentage)) +
  labs(title = "Directors with the Most Content on Netflix",
       x = "Share among top directors (%)", y = "", fill = "Share") +
  theme_algoritma +
  theme(plot.title = element_text(hjust = 0.5),
      panel.grid.major.x = element_line(color = "darkgrey", linetype = 2),
      panel.grid.major.y = element_blank()) +
  theme(legend.position = "none") +
  scale_x_continuous(labels = scales::number_format(suffix = "%")) +
  scale_fill_gradient(low = "#474747", high = "#c40815")

# Convert the plot to an interactive one with ggplotly
ggplotly(p, tooltip = "text")

How do Adult (Dewasa) and Teen (Remaja) ratings compare in the top countries?
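
The remaining chunks use a data frame called netflix_new with two extra columns, rating_type (values "Dewasa", "Remaja", "Lainnya") and durasi_menit, but the step that creates it is not shown in this report. The sketch below is only a guess at what such a step could look like. In particular, the assignment of ratings to the adult and teen groups is an assumption made for illustration, not the author's actual mapping, and netflix_demo holds made-up rows.

```r
# Toy rows in the shape of the Netflix data (hypothetical values).
netflix_demo <- data.frame(
  type = c("Movie", "TV Show", "Movie", "Movie"),
  rating = c("PG-13", "TV-MA", "R", "TV-Y"),
  duration = c("90 min", "1 Season", "125 min", "60 min"),
  stringsAsFactors = FALSE
)

# ASSUMPTION: this grouping of ratings is illustrative only.
adult_ratings <- c("TV-MA", "R", "NC-17")
teen_ratings <- c("TV-14", "PG-13")

netflix_new_demo <- netflix_demo

# Adult -> "Dewasa", teen -> "Remaja", everything else -> "Lainnya".
netflix_new_demo$rating_type <- ifelse(
  netflix_demo$rating %in% adult_ratings, "Dewasa",
  ifelse(netflix_demo$rating %in% teen_ratings, "Remaja", "Lainnya")
)

# Minutes only make sense for movies; TV shows get NA.
netflix_new_demo$durasi_menit <- ifelse(
  netflix_demo$type == "Movie",
  as.numeric(gsub("[^0-9]", "", netflix_demo$duration)),
  NA_real_
)

netflix_new_demo
```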

perbandingan_rating <- netflix_new %>%
  group_by(country, rating_type) %>%
  # Keep the country grouping so the share below is computed within each country
  summarize(jumlah_film = n(), .groups = "drop_last") %>%
  filter(rating_type %in% c("Remaja", "Dewasa")) %>%
  filter(country != "Not Given") %>%
  mutate(proporsi = round((jumlah_film / sum(jumlah_film)) * 100))

# Keep the 10 countries with the most adult and teen titles combined
top_10_countries <- perbandingan_rating %>%
  group_by(country) %>%
  summarize(total_jumlah_film = sum(jumlah_film)) %>%
  top_n(10, total_jumlah_film) %>%
  inner_join(perbandingan_rating, by = "country") %>%
  mutate(country = reorder(country, jumlah_film))

plot_composition <- top_10_countries %>% 
  ggplot(mapping = aes(x = proporsi,
                       y = country,
                       fill = factor(rating_type, levels = c("Dewasa", "Remaja")),
                       text = glue("Rating: {rating_type}\nShare: {proporsi}%"))) +
  geom_col(position = position_stack(reverse = TRUE)) +
  geom_vline(xintercept = 50, lty = 2, lwd = 1.5, col = "white") +
  scale_fill_manual(values = c("Dewasa" = "#db0000", "Remaja" = "#474747"), drop = FALSE) +
  scale_x_continuous(labels = scales::number_format(suffix = "%")) +
  labs(title = "Adult (Dewasa) vs Teen (Remaja) Ratings",
       x = NULL,
       y = NULL) +
  theme_algoritma +
  theme(plot.title = element_text(hjust = 0.5),
      panel.grid.major.x = element_line(color = "darkgrey", linetype = 2),
      panel.grid.major.y = element_blank()) +
  theme(axis.text.y = element_text(hjust = 0.5, size = 10)) +
  theme(legend.position = "none")

ggplotly(plot_composition, tooltip = c("text"))

Is there a difference in the number of movies and TV shows across ASEAN countries?

asia_countries <- c("Indonesia", "Malaysia", "Thailand", "Singapore", "Philippines", "Vietnam")

asia_plot <- netflix_new %>%
  filter(country %in% asia_countries) %>%
  group_by(country, type) %>%
  summarise(count = n()) %>%
  ggplot(aes(x = country, y = count, fill = type, text = glue("Country: {country}\nCount: {count}\nType: {type}"))) +
  geom_bar(stat = "identity", position = "stack") +
  labs(title = "Movies vs TV Shows in Selected ASEAN Countries",
       x = "", y = "Count", fill = "Type:") +
  scale_fill_manual(values = c("Movie" = "#971400", "TV Show" = "black")) +
  # theme_minimal() is a complete theme and replaces any earlier theme() tweaks,
  # so it comes first and the adjustments follow it
  theme_minimal() +
  theme(plot.title = element_text(size = 14)) +
  theme_algoritma

asia_plot_interaktif <- ggplotly(asia_plot, tooltip = "text") %>%
  layout(legend = list(orientation = "h", x = 0.35, y = -0.1)) 
asia_plot_interaktif

Is there a difference in movie duration (in minutes) between adult-rated and teen-rated movies?

# Keep only adult and teen titles that have a duration in minutes
netflix_new <- netflix_new[netflix_new$rating_type != "Lainnya" & !is.na(netflix_new$durasi_menit), ]

# Compute the mean duration
rata_durasi <- mean(netflix_new$durasi_menit, na.rm = TRUE)

# Build a histogram with a dashed line at the mean duration
durasi_rating_plot <- ggplot(netflix_new, aes(x = durasi_menit, fill = rating_type, 
                                              text = paste("Duration:", durasi_menit, "minutes",
                                                           "\nCategory:", rating_type))) +
  geom_histogram(binwidth = 10, position = "identity", alpha = 0.7) +
  geom_vline(xintercept = rata_durasi, linetype = "dashed", color = "white") +  # Dashed line at the mean
  annotate("text", x = rata_durasi, y = 10, label = "Mean", vjust = -1, color = "white", size = 3, angle = 90) +  # Label the mean line once, not once per row
  labs(title = "Duration Difference between Adult and Teen Ratings",
       x = "Duration (minutes)", y = "Number of movies") +
  scale_fill_manual(values = c("Dewasa" = "#db0000", "Remaja" = "black"), 
                    labels = c("Dewasa", "Remaja"),
                    breaks = c("Dewasa", "Remaja"),
                    drop = FALSE) +
  theme(legend.position = "none") +
  theme_algoritma +
  theme(plot.title = element_text(hjust = 0.5),
        panel.grid.major.x = element_line(color = "darkgrey", linetype = 2))

# Convert the plot to an interactive one with ggplotly
durasi_rating_plot <- ggplotly(durasi_rating_plot, tooltip = "text") %>%
  layout(legend = list(orientation = "h", x = 0.35, y = -0.2)) 

durasi_rating_plot

Netflix Visualization

Hairul Yasin

27 July 2023