Data Visualization - Disney
1 Introduction
In this project, I analyze the Disney dataset from Kaggle. Disney Plus, also known as Disney+, is a video on-demand streaming platform owned and operated by The Walt Disney Company. Launched in November 2019, Disney Plus has become one of the world's leading streaming platforms, with a broad and varied content catalog. I then visualize the findings from the analysis of this dataset.
1.1 Input Data
First, we need to load the dataset.
disney <- read.csv("datainput/disney_plus_titles.csv")
Column descriptions for the dataset
- show_id: Unique ID
- type: Whether the title is a Movie or a TV Show
- title: Name of the movie / show
- director: Directors of the movie / TV show
- cast: Main cast of the movie / show
- country: Country of production
- date_added: Date added on Disney+
- release_year: Original release year of the movie / TV show
- rating: Rating of the movie / TV show
- duration: Total duration of the movie / TV show
- listed_in: Categories of the movie / show
- description: Description of the movie / show
Next, look at the first 10 rows of the data frame we just created.
head(disney,10)
#> show_id type title
#> 1 s1 Movie Duck the Halls: A Mickey Mouse Christmas Special
#> 2 s2 Movie Ernest Saves Christmas
#> 3 s3 Movie Ice Age: A Mammoth Christmas
#> 4 s4 Movie The Queen Family Singalong
#> 5 s5 TV Show The Beatles: Get Back
#> 6 s6 Movie Becoming Cousteau
#> 7 s7 TV Show Hawkeye
#> 8 s8 TV Show Port Protection Alaska
#> 9 s9 TV Show Secrets of the Zoo: Tampa
#> 10 s10 Movie A Muppets Christmas: Letters To Santa
#> director
#> 1 Alonso Ramirez Ramos, Dave Wasson
#> 2 John Cherry
#> 3 Karen Disher
#> 4 Hamish Hamilton
#> 5
#> 6 Liz Garbus
#> 7
#> 8
#> 9
#> 10 Kirk R. Thatcher
#> cast
#> 1 Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton
#> 2 Jim Varney, Noelle Parker, Douglas Seale
#> 3 Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah
#> 4 Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen
#> 5 John Lennon, Paul McCartney, George Harrison, Ringo Starr
#> 6 Jacques Yves Cousteau, Vincent Cassel
#> 7 Jeremy Renner, Hailee Steinfeld, Vera Farmiga, Fra Fee, Tony Dalton, Zahn McClarnon
#> 8 Gary Muehlberger, Mary Miller, Curly Leach, Sam Carlson, Stuart Andrews, David Squibb
#> 9 Dr. Ray Ball, Dr. Lauren Smith, Chris Massaro, Tiffany Burns, Mike Burns, Melinda Mendolusky
#> 10 Steve Whitmire, Dave Goelz, Bill Barretta, Eric Jacobson
#> country date_added release_year rating duration
#> 1 November 26, 2021 2016 TV-G 23 min
#> 2 November 26, 2021 1988 PG 91 min
#> 3 United States November 26, 2021 2011 TV-G 23 min
#> 4 November 26, 2021 2021 TV-PG 41 min
#> 5 November 25, 2021 2021 1 Season
#> 6 United States November 24, 2021 2021 PG-13 94 min
#> 7 November 24, 2021 2021 TV-14 1 Season
#> 8 United States November 24, 2021 2015 TV-14 2 Seasons
#> 9 United States November 24, 2021 2019 TV-PG 2 Seasons
#> 10 United States November 19, 2021 2008 G 45 min
#> listed_in
#> 1 Animation, Family
#> 2 Comedy
#> 3 Animation, Comedy, Family
#> 4 Musical
#> 5 Docuseries, Historical, Music
#> 6 Biographical, Documentary
#> 7 Action-Adventure, Superhero
#> 8 Docuseries, Reality, Survival
#> 9 Animals & Nature, Docuseries, Family
#> 10 Comedy, Family, Musical
#> description
#> 1 Join Mickey and the gang as they duck the halls!
#> 2 Santa Claus passes his magic bag to a new St. Nic.
#> 3 Sid the Sloth is on Santa's naughty list.
#> 4 This is real life, not just fantasy!
#> 5 A three-part documentary from Peter Jackson capturing a moment in music history with The Beatles.
#> 6 An inside look at the legendary life of adventurer Jacques-Yves Cousteau.
#> 7 Clint Barton/Hawkeye must team up with skilled archer Kate Bishop to unravel a criminal conspiracy.
#> 8 Residents of Port Protection must combat volatile conditions to survive and thrive in Alaska.
#> 9 A day in the life at ZooTampa is anything but ordinary. It's extraordinary!
#> 10 Celebrate the holiday season with all your favorite Muppets.
Next, check the dimensions of the data frame.
dim(disney)
#> [1] 1450 12
2 Data Cleansing
The first step in the analysis is to make sure the data we will use is clean.
2.1 Load Libraries
First, we need to load the required libraries.
library(dplyr) # for data transformation
library(plotly) # for making plots interactive
library(glue) # for custom tooltip text in interactive plots
library(scales) # for customizing axis labels and similar annotations
library(tidyr) # for reshaping data, such as splitting one column into several
library(ggpubr) # for arranging and exporting plots
library(tidyverse) # loads dplyr, tidyr, ggplot2 and related packages
library(lubridate) # for parsing and handling dates
library(ggplot2) # for building the plots
2.2 Explicit Coercion
Next, we need to check that every column has the correct data type.
str(disney)
#> 'data.frame': 1450 obs. of 12 variables:
#> $ show_id : chr "s1" "s2" "s3" "s4" ...
#> $ type : chr "Movie" "Movie" "Movie" "Movie" ...
#> $ title : chr "Duck the Halls: A Mickey Mouse Christmas Special" "Ernest Saves Christmas" "Ice Age: A Mammoth Christmas" "The Queen Family Singalong" ...
#> $ director : chr "Alonso Ramirez Ramos, Dave Wasson" "John Cherry" "Karen Disher" "Hamish Hamilton" ...
#> $ cast : chr "Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton" "Jim Varney, Noelle Parker, Douglas Seale" "Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah" "Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen" ...
#> $ country : chr "" "" "United States" "" ...
#> $ date_added : chr "November 26, 2021" "November 26, 2021" "November 26, 2021" "November 26, 2021" ...
#> $ release_year: int 2016 1988 2011 2021 2021 2021 2021 2015 2019 2008 ...
#> $ rating : chr "TV-G" "PG" "TV-G" "TV-PG" ...
#> $ duration : chr "23 min" "91 min" "23 min" "41 min" ...
#> $ listed_in : chr "Animation, Family" "Comedy" "Animation, Comedy, Family" "Musical" ...
#> $ description : chr "Join Mickey and the gang as they duck the halls!" "Santa Claus passes his magic bag to a new St. Nic." "Sid the Sloth is on Santa's naughty list." "This is real life, not just fantasy!" ...
Some of the column types above are not appropriate: type is categorical, while date_added and release_year hold dates. We therefore convert them.
disney$type <- as.factor(disney$type)
disney$date_added <- mdy(disney$date_added)
disney$release_year <- parse_date_time(disney$release_year,'y')
str(disney)
#> 'data.frame': 1450 obs. of 12 variables:
#> $ show_id : chr "s1" "s2" "s3" "s4" ...
#> $ type : Factor w/ 2 levels "Movie","TV Show": 1 1 1 1 2 1 2 2 2 1 ...
#> $ title : chr "Duck the Halls: A Mickey Mouse Christmas Special" "Ernest Saves Christmas" "Ice Age: A Mammoth Christmas" "The Queen Family Singalong" ...
#> $ director : chr "Alonso Ramirez Ramos, Dave Wasson" "John Cherry" "Karen Disher" "Hamish Hamilton" ...
#> $ cast : chr "Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton" "Jim Varney, Noelle Parker, Douglas Seale" "Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah" "Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen" ...
#> $ country : chr "" "" "United States" "" ...
#> $ date_added : Date, format: "2021-11-26" "2021-11-26" ...
#> $ release_year: POSIXct, format: "2016-01-01" "1988-01-01" ...
#> $ rating : chr "TV-G" "PG" "TV-G" "TV-PG" ...
#> $ duration : chr "23 min" "91 min" "23 min" "41 min" ...
#> $ listed_in : chr "Animation, Family" "Comedy" "Animation, Comedy, Family" "Musical" ...
#> $ description : chr "Join Mickey and the gang as they duck the halls!" "Santa Claus passes his magic bag to a new St. Nic." "Sid the Sloth is on Santa's naughty list." "This is real life, not just fantasy!" ...
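The coercion step above can also be sketched without lubridate. This is a minimal illustration on toy values, not the author's code: the vectors `type_raw` and `release_year_raw` are made up for the example, and the year is stored as a Date (anchored at January 1) rather than the POSIXct that parse_date_time() returns.

```r
# Toy values shaped like the dataset's columns (hypothetical, not read from the CSV).
type_raw <- c("Movie", "Movie", "TV Show")
release_year_raw <- c(2016L, 1988L, 2021L)

# Categorical column: a factor with one level per distinct value.
type <- as.factor(type_raw)

# Year-only column: anchor each year at January 1 to obtain a Date.
release_year <- as.Date(sprintf("%d-01-01", release_year_raw))

print(levels(type))
print(format(release_year))
```

Keeping release_year as a plain integer is another reasonable choice when only the year is ever used.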
2.3 Check Missing Values
colSums(is.na(disney))
#> show_id type title director cast country
#> 0 0 0 0 0 0
#> date_added release_year rating duration listed_in description
#> 3 0 0 0 0 0
There are missing values in the date_added column. Several character columns also contain empty strings, so I first convert those to NA and then replace every NA with the string "Missing Values" (date_added gets a placeholder date instead).
disney$director[disney$director==""] <- NA
disney$cast[disney$cast==""] <- NA
disney$country[disney$country==""] <- NA
disney$rating[disney$rating==""] <- NA
disney$director[which(is.na(disney$director))] <- "Missing Values"
disney$cast[which(is.na(disney$cast))] <- "Missing Values"
disney$country[which(is.na(disney$country))] <- "Missing Values"
disney$date_added[which(is.na(disney$date_added))] <- as.Date("0001-01-01") # placeholder date, because the date_added column has a Date type
disney$rating[which(is.na(disney$rating))] <- "Missing Values"
colSums(is.na(disney))
#> show_id type title director cast country
#> 0 0 0 0 0 0
#> date_added release_year rating duration listed_in description
#> 0 0 0 0 0 0
Good! There are no missing values left.
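The two-step pattern used here (empty string to NA, then NA to a label) can be applied to a whole data frame at once. The sketch below uses a small made-up data frame `df` with only character columns; it is an illustration of the idea, not a drop-in replacement for the code above, because a Date column such as date_added still needs its own placeholder.

```r
# Hypothetical toy data with empty strings standing in for missing entries.
df <- data.frame(
  director = c("John Cherry", "", "Liz Garbus"),
  rating   = c("PG", "", "PG-13"),
  stringsAsFactors = FALSE
)

# Step 1: treat empty strings as missing values.
df[df == ""] <- NA
na_before <- colSums(is.na(df))

# Step 2: replace every NA with an explicit label.
df[is.na(df)] <- "Missing Values"
na_after <- colSums(is.na(df))

print(na_before)
print(na_after)
```

Counting NAs between the two steps shows how many values were actually missing, which the final all-zero count no longer reveals.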
2.4 Finishing Data Cleansing
First, I check the data again.
head(disney,10)
#> show_id type title
#> 1 s1 Movie Duck the Halls: A Mickey Mouse Christmas Special
#> 2 s2 Movie Ernest Saves Christmas
#> 3 s3 Movie Ice Age: A Mammoth Christmas
#> 4 s4 Movie The Queen Family Singalong
#> 5 s5 TV Show The Beatles: Get Back
#> 6 s6 Movie Becoming Cousteau
#> 7 s7 TV Show Hawkeye
#> 8 s8 TV Show Port Protection Alaska
#> 9 s9 TV Show Secrets of the Zoo: Tampa
#> 10 s10 Movie A Muppets Christmas: Letters To Santa
#> director
#> 1 Alonso Ramirez Ramos, Dave Wasson
#> 2 John Cherry
#> 3 Karen Disher
#> 4 Hamish Hamilton
#> 5 Missing Values
#> 6 Liz Garbus
#> 7 Missing Values
#> 8 Missing Values
#> 9 Missing Values
#> 10 Kirk R. Thatcher
#> cast
#> 1 Chris Diamantopoulos, Tony Anselmo, Tress MacNeille, Bill Farmer, Russi Taylor, Corey Burton
#> 2 Jim Varney, Noelle Parker, Douglas Seale
#> 3 Raymond Albert Romano, John Leguizamo, Denis Leary, Queen Latifah
#> 4 Darren Criss, Adam Lambert, Derek Hough, Alexander Jean, Fall Out Boy, Jimmie Allen
#> 5 John Lennon, Paul McCartney, George Harrison, Ringo Starr
#> 6 Jacques Yves Cousteau, Vincent Cassel
#> 7 Jeremy Renner, Hailee Steinfeld, Vera Farmiga, Fra Fee, Tony Dalton, Zahn McClarnon
#> 8 Gary Muehlberger, Mary Miller, Curly Leach, Sam Carlson, Stuart Andrews, David Squibb
#> 9 Dr. Ray Ball, Dr. Lauren Smith, Chris Massaro, Tiffany Burns, Mike Burns, Melinda Mendolusky
#> 10 Steve Whitmire, Dave Goelz, Bill Barretta, Eric Jacobson
#> country date_added release_year rating duration
#> 1 Missing Values 2021-11-26 2016-01-01 TV-G 23 min
#> 2 Missing Values 2021-11-26 1988-01-01 PG 91 min
#> 3 United States 2021-11-26 2011-01-01 TV-G 23 min
#> 4 Missing Values 2021-11-26 2021-01-01 TV-PG 41 min
#> 5 Missing Values 2021-11-25 2021-01-01 Missing Values 1 Season
#> 6 United States 2021-11-24 2021-01-01 PG-13 94 min
#> 7 Missing Values 2021-11-24 2021-01-01 TV-14 1 Season
#> 8 United States 2021-11-24 2015-01-01 TV-14 2 Seasons
#> 9 United States 2021-11-24 2019-01-01 TV-PG 2 Seasons
#> 10 United States 2021-11-19 2008-01-01 G 45 min
#> listed_in
#> 1 Animation, Family
#> 2 Comedy
#> 3 Animation, Comedy, Family
#> 4 Musical
#> 5 Docuseries, Historical, Music
#> 6 Biographical, Documentary
#> 7 Action-Adventure, Superhero
#> 8 Docuseries, Reality, Survival
#> 9 Animals & Nature, Docuseries, Family
#> 10 Comedy, Family, Musical
#> description
#> 1 Join Mickey and the gang as they duck the halls!
#> 2 Santa Claus passes his magic bag to a new St. Nic.
#> 3 Sid the Sloth is on Santa's naughty list.
#> 4 This is real life, not just fantasy!
#> 5 A three-part documentary from Peter Jackson capturing a moment in music history with The Beatles.
#> 6 An inside look at the legendary life of adventurer Jacques-Yves Cousteau.
#> 7 Clint Barton/Hawkeye must team up with skilled archer Kate Bishop to unravel a criminal conspiracy.
#> 8 Residents of Port Protection must combat volatile conditions to survive and thrive in Alaska.
#> 9 A day in the life at ZooTampa is anything but ordinary. It's extraordinary!
#> 10 Celebrate the holiday season with all your favorite Muppets.
Next, I keep only the columns needed for the analysis by dropping "show_id", "director", "cast", "country", and "description".
disney <- disney %>% select(-c(show_id, director, cast, country, description))
head(disney,10)
#> type title date_added
#> 1 Movie Duck the Halls: A Mickey Mouse Christmas Special 2021-11-26
#> 2 Movie Ernest Saves Christmas 2021-11-26
#> 3 Movie Ice Age: A Mammoth Christmas 2021-11-26
#> 4 Movie The Queen Family Singalong 2021-11-26
#> 5 TV Show The Beatles: Get Back 2021-11-25
#> 6 Movie Becoming Cousteau 2021-11-24
#> 7 TV Show Hawkeye 2021-11-24
#> 8 TV Show Port Protection Alaska 2021-11-24
#> 9 TV Show Secrets of the Zoo: Tampa 2021-11-24
#> 10 Movie A Muppets Christmas: Letters To Santa 2021-11-19
#> release_year rating duration listed_in
#> 1 2016-01-01 TV-G 23 min Animation, Family
#> 2 1988-01-01 PG 91 min Comedy
#> 3 2011-01-01 TV-G 23 min Animation, Comedy, Family
#> 4 2021-01-01 TV-PG 41 min Musical
#> 5 2021-01-01 Missing Values 1 Season Docuseries, Historical, Music
#> 6 2021-01-01 PG-13 94 min Biographical, Documentary
#> 7 2021-01-01 TV-14 1 Season Action-Adventure, Superhero
#> 8 2015-01-01 TV-14 2 Seasons Docuseries, Reality, Survival
#> 9 2019-01-01 TV-PG 2 Seasons Animals & Nature, Docuseries, Family
#> 10 2008-01-01 G 45 min Comedy, Family, Musical
We want to keep only the first category from the listed_in column.
disney <- disney %>% separate(listed_in,c("category","category2","category3"), sep = ",")
disney <- disney %>% select(-c(category2, category3))
head(disney,10)
#> type title date_added
#> 1 Movie Duck the Halls: A Mickey Mouse Christmas Special 2021-11-26
#> 2 Movie Ernest Saves Christmas 2021-11-26
#> 3 Movie Ice Age: A Mammoth Christmas 2021-11-26
#> 4 Movie The Queen Family Singalong 2021-11-26
#> 5 TV Show The Beatles: Get Back 2021-11-25
#> 6 Movie Becoming Cousteau 2021-11-24
#> 7 TV Show Hawkeye 2021-11-24
#> 8 TV Show Port Protection Alaska 2021-11-24
#> 9 TV Show Secrets of the Zoo: Tampa 2021-11-24
#> 10 Movie A Muppets Christmas: Letters To Santa 2021-11-19
#> release_year rating duration category
#> 1 2016-01-01 TV-G 23 min Animation
#> 2 1988-01-01 PG 91 min Comedy
#> 3 2011-01-01 TV-G 23 min Animation
#> 4 2021-01-01 TV-PG 41 min Musical
#> 5 2021-01-01 Missing Values 1 Season Docuseries
#> 6 2021-01-01 PG-13 94 min Biographical
#> 7 2021-01-01 TV-14 1 Season Action-Adventure
#> 8 2015-01-01 TV-14 2 Seasons Docuseries
#> 9 2019-01-01 TV-PG 2 Seasons Animals & Nature
#> 10 2008-01-01 G 45 min Comedy
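Extracting the first category can also be done directly on the string, without creating and then dropping helper columns. The following is a sketch in base R on a made-up `listed_in` vector; it swaps separate() for a regular expression, so it is an alternative technique rather than the approach used above.

```r
# Hypothetical sample of listed_in values, copied in shape from the output above.
listed_in <- c("Animation, Family", "Comedy", "Docuseries, Historical, Music")

# Drop everything from the first comma onward, then trim stray whitespace.
category <- trimws(sub(",.*$", "", listed_in))

print(category)
```

This version does not depend on how many categories a title lists, whereas separate() with three target names assumes at most three.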
Finally, we create two new columns, year_added and month_added, from the date_added column.
disney$year_added <- year(disney$date_added)
disney$month_added <- month(disney$date_added, label = TRUE)
head(disney,10)
#> type title date_added
#> 1 Movie Duck the Halls: A Mickey Mouse Christmas Special 2021-11-26
#> 2 Movie Ernest Saves Christmas 2021-11-26
#> 3 Movie Ice Age: A Mammoth Christmas 2021-11-26
#> 4 Movie The Queen Family Singalong 2021-11-26
#> 5 TV Show The Beatles: Get Back 2021-11-25
#> 6 Movie Becoming Cousteau 2021-11-24
#> 7 TV Show Hawkeye 2021-11-24
#> 8 TV Show Port Protection Alaska 2021-11-24
#> 9 TV Show Secrets of the Zoo: Tampa 2021-11-24
#> 10 Movie A Muppets Christmas: Letters To Santa 2021-11-19
#> release_year rating duration category year_added
#> 1 2016-01-01 TV-G 23 min Animation 2021
#> 2 1988-01-01 PG 91 min Comedy 2021
#> 3 2011-01-01 TV-G 23 min Animation 2021
#> 4 2021-01-01 TV-PG 41 min Musical 2021
#> 5 2021-01-01 Missing Values 1 Season Docuseries 2021
#> 6 2021-01-01 PG-13 94 min Biographical 2021
#> 7 2021-01-01 TV-14 1 Season Action-Adventure 2021
#> 8 2015-01-01 TV-14 2 Seasons Docuseries 2021
#> 9 2019-01-01 TV-PG 2 Seasons Animals & Nature 2021
#> 10 2008-01-01 G 45 min Comedy 2021
#> month_added
#> 1 Nov
#> 2 Nov
#> 3 Nov
#> 4 Nov
#> 5 Nov
#> 6 Nov
#> 7 Nov
#> 8 Nov
#> 9 Nov
#> 10 Nov
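For readers without lubridate, the same two derived columns can be sketched in base R. The `date_added` vector below is made up for illustration. Month labels here come from R's built-in `month.abb` constant, which is always English; lubridate's labels follow the session locale, which is presumably why labels such as "Okt" and "Mei" appear in the summary later on.

```r
# Hypothetical dates in the same format as the cleaned date_added column.
date_added <- as.Date(c("2021-11-26", "2021-11-19", "2020-04-03"))

# Year as an integer.
year_added <- as.integer(format(date_added, "%Y"))

# Month as an ordered factor so that plots keep calendar order.
month_num <- as.integer(format(date_added, "%m"))
month_added <- factor(month.abb[month_num], levels = month.abb, ordered = TRUE)

print(year_added)
print(as.character(month_added))
```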
Data cleansing is complete, and the data is ready for analysis and visualization.
3 Data Explanation
summary(disney)
#> type title date_added
#> Movie :1052 Length:1450 Min. :0001-01-01
#> TV Show: 398 Class :character 1st Qu.:2019-11-12
#> Mode :character Median :2019-11-12
#> Mean :2016-03-18
#> 3rd Qu.:2020-11-23
#> Max. :2021-11-26
#>
#> release_year rating duration
#> Min. :1928-01-01 00:00:00.00 Length:1450 Length:1450
#> 1st Qu.:1999-01-01 00:00:00.00 Class :character Class :character
#> Median :2011-01-01 00:00:00.00 Mode :character Mode :character
#> Mean :2003-02-03 15:03:43.45
#> 3rd Qu.:2018-01-01 00:00:00.00
#> Max. :2021-01-01 00:00:00.00
#>
#> category year_added month_added
#> Length:1450 Min. : 1 Nov :809
#> Class :character 1st Qu.:2019 Apr : 86
#> Mode :character Median :2019 Jul : 85
#> Mean :2016 Jan : 64
#> 3rd Qu.:2020 Okt : 63
#> Max. :2021 Mei : 62
#> (Other):281
Insight:
1. The type column contains 1052 Movie titles and 398 TV Show titles.
2. The most recent date_added in this data is 2021-11-26. The minimum, 0001-01-01, is the placeholder assigned to missing dates, and it also pulls down the mean of date_added and year_added.
3. Release years of Movies and TV Shows on the Disney platform range from 1928 to 2021.
4. The most recent release year for a Movie or TV Show is 2021.
5. Disney added the most Movies and TV Shows in November, April, and July.
4 Case Study
- Compare Movies and TV Shows by release year
disney %>% ggplot(mapping = aes(x=release_year, fill=type)) +
geom_histogram() +
labs(title = "Disney Films Released by Year", x="Release Year", y="Total Film") +
scale_fill_manual(values = c("Movie" = "Black", "TV Show" = "Red")) +
theme_minimal() +
theme(plot.title = element_text(face = "bold", hjust = 1))
Both Movies and TV Shows tend to trend upward by release year.
Movies outnumber TV Shows in every year, although TV Shows also
grow considerably between 2000 and 2020.
- Which categories are in Disney's top 10
top_categories <- disney %>%
group_by(category) %>%
count(name = "Freq") %>%
arrange(desc(Freq))
top10_categories <- head(top_categories,10)
top10_categories
#> # A tibble: 10 × 2
#> # Groups: category [10]
#> category Freq
#> <chr> <int>
#> 1 Action-Adventure 452
#> 2 Animation 320
#> 3 Comedy 193
#> 4 Animals & Nature 173
#> 5 Documentary 65
#> 6 Coming of Age 56
#> 7 Biographical 35
#> 8 Docuseries 33
#> 9 Drama 27
#> 10 Buddy 20
ggplot(top10_categories,mapping=aes(x=Freq, y=reorder(category,Freq)))+
geom_col(aes(fill=Freq),color = "maroon",show.legend = F)+
scale_fill_gradient(low="pink",high="#cf2e2e")+
labs(title = "Disney's Top 10 Categories", x = "Total Film", y = NULL)+
theme_minimal()+
theme(plot.title=element_text(face="bold", hjust = 1))+
geom_label(data=top10_categories[1:4,], mapping=aes(label=Freq))+
geom_vline(xintercept = mean(top10_categories$Freq), col="yellow",linetype=2,lwd=1)
The plot above visualizes Disney's top 10 categories.
The yellow line marks the mean count across these 10 categories. Four
categories exceed the mean: Action-Adventure, Animation, Comedy, and
Animals & Nature.
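The position of the mean line and the set of categories above it can be checked numerically. The sketch below re-enters the ten counts from the table above as a named vector called `freq` (a name chosen for this example) instead of recomputing them from the dataset.

```r
# Counts copied from the top-10 table above.
freq <- c(
  "Action-Adventure" = 452, "Animation" = 320, "Comedy" = 193,
  "Animals & Nature" = 173, "Documentary" = 65, "Coming of Age" = 56,
  "Biographical" = 35, "Docuseries" = 33, "Drama" = 27, "Buddy" = 20
)

# This is the value drawn by geom_vline(): the mean of the ten counts.
avg <- mean(freq)

# Categories whose count lies above the mean line.
above <- names(freq)[freq > avg]

print(avg)
print(above)
```

Note that this mean is taken over the top 10 only, so it is higher than the mean over all categories in the dataset would be.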
- Which ratings are in Disney's top 10
top_ratings <- disney %>% group_by(rating) %>% count(name = "Freq") %>% arrange(desc(Freq))
top10_ratings <- head(top_ratings,10)
top10_ratings
#> # A tibble: 10 × 2
#> # Groups: rating [10]
#> rating Freq
#> <chr> <int>
#> 1 TV-G 318
#> 2 TV-PG 301
#> 3 G 253
#> 4 PG 236
#> 5 TV-Y7 131
#> 6 TV-14 79
#> 7 PG-13 66
#> 8 TV-Y 50
#> 9 TV-Y7-FV 13
#> 10 Missing Values 3
ggplot(data = top10_ratings, mapping=aes(x=Freq,y=reorder(rating,Freq)))+
geom_col(aes(fill=Freq), color="black", show.legend = F)+
scale_fill_gradient(low="#79DAE8",high="#0AA1DD")+
labs(title="Disney's Top 10 Ratings", x = "Total Film", y= NULL)+
theme_minimal()+
theme(plot.title = element_text(face="bold",hjust=1))+
geom_label(data = top10_ratings[1:4,], mapping=aes(label=Freq))+
geom_vline(xintercept = mean(top10_ratings$Freq), col = "#FCF69C",linetype=2,lwd=1)
The plot above visualizes Disney's top 10 ratings.
The yellow line marks the mean count across these ratings. Four
ratings exceed the mean: TV-G, TV-PG, G, and PG. These ratings mean
the following: TV-G marks a TV Show for General Audiences, suitable
for all ages; TV-PG marks a TV Show for which Parental Guidance is
advised; G marks a Movie for General Audiences; and PG marks a Movie
for which Parental Guidance is advised.
- In which months does Disney add the most titles
disney %>% group_by(month_added,type)%>%
count(name = "Freq") %>%
ggplot(aes(x=month_added,y=Freq,fill=type))+
geom_col(aes(fill=type))+
labs(title="Disney Films Added by Month", x="Month", y="Total Film")+
theme_minimal()+
theme(plot.title=element_text(face="bold",hjust=1))
The plot shows that November is the month in which Disney added
the most titles, both Movies and TV Shows.
- Show the trend in movie duration from 2000 to 2020
disney %>%
filter(type=='Movie' & year(release_year)>=2000 & year(release_year)<=2020) %>%
mutate(movie_duration=substr(duration,1,nchar(as.character(duration))-4)) %>%
mutate(movie_duration = as.integer(movie_duration)) %>%
group_by(release_year) %>%
summarise(avg_duration = mean(movie_duration)) %>%
ggplot(aes(x=release_year, y= avg_duration))+
geom_point() + geom_smooth()+
labs(title = "Disney Movie Duration From 2000 - 2020", x = "Year", y = "Duration(Minutes)")+
theme_minimal()+
theme(plot.title=element_text(face="bold",hjust=1))
The plot above shows that average movie duration trends downward
from 2000 to 2020.
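The duration parsing inside the pipeline relies on every movie duration ending in " min". A slightly more defensive variant is sketched below on a made-up `duration` vector: it selects the minute-based entries by pattern and strips the suffix with a regular expression instead of counting characters. This is an alternative technique shown for illustration, not the code that produced the plot.

```r
# Hypothetical sample mixing movie durations and a TV Show season count.
duration <- c("23 min", "91 min", "94 min", "1 Season")

# Keep only entries expressed in minutes.
is_minutes <- grepl(" min$", duration)

# Strip the " min" suffix and convert the remainder to integer minutes.
minutes <- as.integer(sub(" min$", "", duration[is_minutes]))

print(minutes)
print(mean(minutes))
```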
5 Final Conclusion
From the analysis and visualization above, we can conclude that the number of Movies and TV Shows increases with each release year, with Movies outnumbering TV Shows in every year. TV Shows nevertheless show a significant upward trend from 2000 to 2021. The most popular categories on Disney are Action-Adventure, Animation, and Comedy.
Four ratings exceed the mean: TV-G, TV-PG, G, and PG. Movies are added in greater numbers than TV Shows, and November is the month in which Disney adds the most titles. Finally, movie duration trends downward from 2000 to 2020.