Hi everyone! The dataset used in this data visualization comes from https://github.com/rfordatascience/tidytuesday/blob/master/data/2019/2019-04-23/raw_anime.csv. The code below reads the tidied version of the same data, tidy_anime.csv, which is published in the same TidyTuesday folder.
The first step in any data analysis is importing the data. Make sure the project file is in the same folder as the dataset.
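# encoding = 'UTF-8' marks the strings as UTF-8 so the Japanese titles display correctly
anime <- read.csv('tidy_anime.csv', encoding = 'UTF-8')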
Inspect the structure of the dataset, then convert columns whose data types are not yet appropriate. We can also drop the columns we don't need.
str(anime)
## 'data.frame': 77911 obs. of 28 variables:
## $ animeID : int 1 1 1 1 1 1 5 5 5 5 ...
## $ name : chr "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" ...
## $ title_english : chr "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" ...
## $ title_japanese: chr "カウボーイビバップ" "カウボーイビバップ" "カウボーイビバップ" "カウボーイビバップ" ...
## $ title_synonyms: chr "[]" "[]" "[]" "[]" ...
## $ type : chr "TV" "TV" "TV" "TV" ...
## $ source : chr "Original" "Original" "Original" "Original" ...
## $ producers : chr "Bandai Visual" "Bandai Visual" "Bandai Visual" "Bandai Visual" ...
## $ genre : chr "Action" "Adventure" "Comedy" "Drama" ...
## $ studio : chr "Sunrise" "Sunrise" "Sunrise" "Sunrise" ...
## $ episodes : int 26 26 26 26 26 26 1 1 1 1 ...
## $ status : chr "Finished Airing" "Finished Airing" "Finished Airing" "Finished Airing" ...
## $ airing : logi FALSE FALSE FALSE FALSE FALSE FALSE ...
## $ start_date : chr "1998-04-03" "1998-04-03" "1998-04-03" "1998-04-03" ...
## $ end_date : chr "1999-04-02" "1999-04-02" "1999-04-02" "1999-04-02" ...
## $ duration : chr "24 min per ep" "24 min per ep" "24 min per ep" "24 min per ep" ...
## $ rating : chr "R - 17+ (violence & profanity)" "R - 17+ (violence & profanity)" "R - 17+ (violence & profanity)" "R - 17+ (violence & profanity)" ...
## $ score : num 8.81 8.81 8.81 8.81 8.81 8.81 8.41 8.41 8.41 8.41 ...
## $ scored_by : int 405664 405664 405664 405664 405664 405664 120243 120243 120243 120243 ...
## $ rank : int 26 26 26 26 26 26 164 164 164 164 ...
## $ popularity : int 39 39 39 39 39 39 449 449 449 449 ...
## $ members : int 795733 795733 795733 795733 795733 795733 197791 197791 197791 197791 ...
## $ favorites : int 43460 43460 43460 43460 43460 43460 776 776 776 776 ...
## $ synopsis : chr "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ ...
## $ background : chr "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ ...
## $ premiered : chr "Spring 1998" "Spring 1998" "Spring 1998" "Spring 1998" ...
## $ broadcast : chr "Saturdays at 01:00 (JST)" "Saturdays at 01:00 (JST)" "Saturdays at 01:00 (JST)" "Saturdays at 01:00 (JST)" ...
## $ related : chr "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ ...
anime$animeID <- as.character(anime$animeID)
anime$type <- as.factor(anime$type)
anime$genre <- as.factor(anime$genre)
# Load lubridate (install it once with install.packages('lubridate')) to convert strings to the Date type.
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.5
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
anime$start_date <- ymd(anime$start_date)
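# Drop title_english, title_japanese, title_synonyms, status, end_date,
# synopsis, background, and related (positions 3, 4, 5, 12, 15, 24, 25, 28)
anime <- anime[, -c(3,4,5,12,15,24,25,28)]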
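Dropping columns by position works, but the indices silently point at the wrong columns if the file layout ever changes. Here is a minimal sketch of the same step done by name. It runs on a small made-up data frame; `toy` and `drop_cols` are illustrative names, not part of the original analysis.

```r
# Small made-up data frame standing in for the anime data
toy <- data.frame(animeID = c(1, 5),
                  name = c("Cowboy Bebop", "Cowboy Bebop: Tengoku no Tobira"),
                  status = c("Finished Airing", "Finished Airing"),
                  synopsis = c("placeholder", "placeholder"),
                  stringsAsFactors = FALSE)

# Columns to remove, listed by name instead of by position
drop_cols <- c("status", "synopsis")

# Keep every column whose name is not in drop_cols
toy <- toy[, !(names(toy) %in% drop_cols), drop = FALSE]
names(toy)  # "animeID" "name"
```

On the real data, `drop_cols` would list the eight column names from the comment above.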
Split the premiered and broadcast columns to get more specific data, then drop the columns we don't need. Load tidyr for separate() and dplyr for select().
library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
anime <- anime %>%
  separate(premiered, c('Prem_Season', 'Prem_Year')) %>%
  separate(broadcast, c("Broad_Day", "at", "Broad_Time", "timezone"), sep = " ") %>%
  select(-c(at, timezone))
## Warning: Expected 4 pieces. Additional pieces discarded in 331 rows [7263,
## 7264, 7265, 7266, 7267, 7268, 8899, 8900, 8901, 9960, 9961, 11288, 11289, 11290,
## 11291, 11292, 11293, 11294, 11295, 11296, ...].
## Warning: Expected 4 pieces. Missing pieces filled with `NA` in 14618 rows
## [23, 24, 25, 26, 27, 28, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
## 65, ...].
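The two warnings above come from broadcast values that do not split into exactly four space-separated words. tidyr's `separate()` has `extra` and `fill` arguments that make this handling explicit and silence the warnings. The sketch below uses made-up strings. The value "Not scheduled once per week" is illustrative, but a multi-word value starting with "Not" would presumably explain the "Not" entries converted to NA later on.

```r
library(tidyr)

# Made-up strings in the same shape as the dataset's broadcast column
toy <- data.frame(broadcast = c("Saturdays at 01:00 (JST)",
                                "Unknown",
                                "Not scheduled once per week"),
                  stringsAsFactors = FALSE)

# extra = "drop": silently discard words beyond the fourth
# fill = "right": silently pad short values with NA on the right
toy <- separate(toy, broadcast,
                into = c("Broad_Day", "at", "Broad_Time", "timezone"),
                sep = " ", extra = "drop", fill = "right")

toy$Broad_Day   # "Saturdays" "Unknown" "Not"
toy$Broad_Time  # "01:00" NA "once"
```

The output also shows why extra words are a problem: "once" ends up in Broad_Time.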
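Recompute Prem_Season from the month of start_date: December to February is Winter, March to May is Spring, June to August is Summer, and September to November is Fall. This overwrites the whole column, so rows whose premiered value was missing also get a season whenever start_date is known.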
# Month number of the start date (lubridate::month); NA dates give NA
start_month <- month(anime$start_date)
anime$Prem_Season <- ifelse(start_month %in% c(12, 1, 2), "Winter",
                     ifelse(start_month %in% c(3, 4, 5), "Spring",
                     ifelse(start_month %in% c(6, 7, 8), "Summer",
                     ifelse(start_month %in% c(9, 10, 11), "Fall", NA))))
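The nested ifelse() can also be written as a lookup table indexed by month number. This is a base-R sketch on made-up dates; `month_to_season` is an illustrative name, not from the original.

```r
# Season for each calendar month, January (1) through December (12)
month_to_season <- c("Winter", "Winter", "Spring", "Spring", "Spring",
                     "Summer", "Summer", "Summer", "Fall", "Fall", "Fall",
                     "Winter")

# Made-up start dates, including a missing one
start_date <- as.Date(c("1998-04-03", "2001-09-01", "2005-01-10", NA))

# Index the lookup vector by month number; an NA month gives an NA season
prem_season <- month_to_season[as.integer(format(start_date, "%m"))]
prem_season  # "Spring" "Fall" "Winter" NA
```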
Convert the "Unknown" values in the source and Broad_Day columns to NA, then convert the newly created columns to their proper data types.
# Treat "Unknown" (and the leftover "Not" token from the broadcast split) as missing
anime$source[anime$source == 'Unknown'] <- NA
anime$Broad_Day[anime$Broad_Day %in% c("Not", "Unknown")] <- NA
anime$rating <- as.factor(anime$rating)
anime$Prem_Season <- as.factor(anime$Prem_Season)
anime$Broad_Day <- as.factor(anime$Broad_Day)
anime$Prem_Year <- as.numeric(anime$Prem_Year)
names(anime)
## [1] "animeID" "name" "type" "source" "producers"
## [6] "genre" "studio" "episodes" "airing" "start_date"
## [11] "duration" "rating" "score" "scored_by" "rank"
## [16] "popularity" "members" "favorites" "Prem_Season" "Prem_Year"
## [21] "Broad_Day" "Broad_Time"
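The recoding of "Unknown" and "Not" to NA can also be written with dplyr's `na_if()`, which replaces one given value with NA. This minimal sketch uses made-up values; only `na_if()` and `mutate()` come from dplyr.

```r
library(dplyr)

# Made-up values in the shape of the source and Broad_Day columns
toy <- data.frame(source = c("Manga", "Unknown", "Original"),
                  Broad_Day = c("Saturdays", "Not", "Unknown"),
                  stringsAsFactors = FALSE)

# na_if() handles one value at a time; nesting it handles two
toy <- toy %>%
  mutate(source = na_if(source, "Unknown"),
         Broad_Day = na_if(na_if(Broad_Day, "Unknown"), "Not"))

toy$source     # "Manga" NA "Original"
toy$Broad_Day  # "Saturdays" NA NA
```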
The anime data now has 22 variables (columns):
* animeID : anime ID
* name : anime title
* type : anime type (TV, Movie, OVA, etc.)
* source : source material (Original, Manga, Game, etc.)
* producers : producer
* genre : genre
* studio : animation studio
* episodes : number of episodes
* airing : whether the anime is still airing
* start_date : date the anime started airing
* duration : duration
* rating : age rating
* score : viewers' mean score
* scored_by : number of users who gave a score
* rank : anime ranking
* popularity : popularity rank, based on how many members added the anime to their list (a smaller number means more members)
* members : number of members who added the anime to their list
* favorites : number of members who marked the anime as a favorite
* Prem_Season : season in which the anime premiered
* Prem_Year : year in which the anime premiered
* Broad_Day : broadcast day
* Broad_Time : broadcast time
head(anime)
## animeID name type source producers genre studio episodes
## 1 1 Cowboy Bebop TV Original Bandai Visual Action Sunrise 26
## 2 1 Cowboy Bebop TV Original Bandai Visual Adventure Sunrise 26
## 3 1 Cowboy Bebop TV Original Bandai Visual Comedy Sunrise 26
## 4 1 Cowboy Bebop TV Original Bandai Visual Drama Sunrise 26
## 5 1 Cowboy Bebop TV Original Bandai Visual Sci-Fi Sunrise 26
## 6 1 Cowboy Bebop TV Original Bandai Visual Space Sunrise 26
## airing start_date duration rating score
## 1 FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity) 8.81
## 2 FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity) 8.81
## 3 FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity) 8.81
## 4 FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity) 8.81
## 5 FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity) 8.81
## 6 FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity) 8.81
## scored_by rank popularity members favorites Prem_Season Prem_Year Broad_Day
## 1 405664 26 39 795733 43460 Spring 1998 Saturdays
## 2 405664 26 39 795733 43460 Spring 1998 Saturdays
## 3 405664 26 39 795733 43460 Spring 1998 Saturdays
## 4 405664 26 39 795733 43460 Spring 1998 Saturdays
## 5 405664 26 39 795733 43460 Spring 1998 Saturdays
## 6 405664 26 39 795733 43460 Spring 1998 Saturdays
## Broad_Time
## 1 01:00
## 2 01:00
## 3 01:00
## 4 01:00
## 5 01:00
## 6 01:00
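The data is redundant: each anime appears in several rows. Cowboy Bebop, for example, appears once for each of its six genres. We therefore keep only one row per anime. distinct() keeps the first row of each animeID, so the genre column in unique_anime holds only that anime's first genre.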
unique_anime <- data.frame(anime %>% distinct(animeID, .keep_all = TRUE))
head(unique_anime)
## animeID name type source producers
## 1 1 Cowboy Bebop TV Original Bandai Visual
## 2 5 Cowboy Bebop: Tengoku no Tobira Movie Original Sunrise
## 3 6 Trigun TV Manga Victor Entertainment
## 4 7 Witch Hunter Robin TV Original Bandai Visual
## 5 8 Bouken Ou Beet TV Manga TV Tokyo
## 6 16 Hachimitsu to Clover TV Manga Genco
## genre studio episodes airing start_date duration
## 1 Action Sunrise 26 FALSE 1998-04-03 24 min per ep
## 2 Action Bones 1 FALSE 2001-09-01 1 hr 55 min
## 3 Action Madhouse 26 FALSE 1998-04-01 24 min per ep
## 4 Action Sunrise 26 FALSE 2002-07-02 25 min per ep
## 5 Adventure Toei Animation 52 FALSE 2004-09-30 23 min per ep
## 6 Comedy J.C.Staff 24 FALSE 2005-04-15 23 min per ep
## rating score scored_by rank popularity members
## 1 R - 17+ (violence & profanity) 8.81 405664 26 39 795733
## 2 R - 17+ (violence & profanity) 8.41 120243 164 449 197791
## 3 PG-13 - Teens 13 or older 8.30 212537 255 146 408548
## 4 PG-13 - Teens 13 or older 7.33 32837 2371 1171 79397
## 5 PG - Children 7.03 4894 3544 3704 11708
## 6 PG-13 - Teens 13 or older 8.12 57065 419 536 172274
## favorites Prem_Season Prem_Year Broad_Day Broad_Time
## 1 43460 Spring 1998 Saturdays 01:00
## 2 776 Fall NA <NA> <NA>
## 3 10432 Spring 1998 Thursdays 01:15
## 4 537 Summer 2002 Tuesdays Unknown
## 5 14 Fall 2004 Thursdays 18:30
## 6 3752 Spring 2005 Fridays 00:35
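The sketch below shows how many rows the deduplication removes and what `distinct(animeID, .keep_all = TRUE)` keeps. It uses a base-R equivalent on made-up rows in the long format (one row per anime and genre), so the data is illustrative only.

```r
# Made-up rows in the long format: one row per anime and genre
toy <- data.frame(animeID = c(1, 1, 1, 5, 5),
                  genre = c("Action", "Adventure", "Comedy", "Action", "Drama"),
                  stringsAsFactors = FALSE)

# Number of redundant rows (repeats of an animeID already seen)
sum(duplicated(toy$animeID))  # 3

# Base R equivalent of distinct(animeID, .keep_all = TRUE):
# keep the first row of each animeID
unique_toy <- toy[!duplicated(toy$animeID), ]
unique_toy$genre  # "Action" "Action": only the first genre survives
```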
Load the ggplot2 package (install it first with install.packages('ggplot2') if needed) to build more polished plots.
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5
This visualization shows the trend in anime production over time as a line plot, using the premiere year. The subsetting is done with pipes, so we don't have to assign each intermediate result to a new object.
# Drop 2019 and missing premiere years, then count anime per year
anime_peryear <- unique_anime %>%
  filter(Prem_Year != 2019) %>%
  filter(!is.na(Prem_Year)) %>%
  group_by("Year" = Prem_Year) %>%
  summarise(freq = n())
ggplot(anime_peryear, aes(x = Year, y = freq)) +
  geom_line(col = "navy") +
  labs(title = "Anime per Year",
       x = 'Year',
       y = 'Number of anime') +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
The chart above shows the number of anime produced per year, from the 1960s to 2018. Anime production has increased over time. The sharpest rise occurs around 2005, followed by a drop the next year. Overall, the count grows from almost none in the earliest years to more than 200 anime per year.
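The filter, group_by, and summarise chain can be shortened with dplyr's `count()`, which groups and counts rows in one call and accepts a renamed grouping column. This is a minimal sketch on made-up years, not the real data.

```r
library(dplyr)

# Made-up premiere years, including a missing value and the excluded year 2019
toy <- data.frame(Prem_Year = c(1998, 1998, 2001, NA, 2019))

# filter() drops NA years and 2019; count() groups by Year and counts rows
per_year <- toy %>%
  filter(!is.na(Prem_Year), Prem_Year != 2019) %>%
  count(Year = Prem_Year, name = "freq")

per_year  # Year 1998 -> freq 2, Year 2001 -> freq 1
```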
Next, compute the mean number of users who scored anime in each genre (scored_by) and keep the five genres with the highest mean.
anime_agg <- anime %>%
  group_by(genre) %>%
  summarise(rata_rata = mean(scored_by)) %>%
  ungroup()
anime_agg <- head(anime_agg[order(anime_agg$rata_rata, decreasing = TRUE), ], 5)
anime_agg
## # A tibble: 5 x 2
## genre rata_rata
## <fct> <dbl>
## 1 Thriller 149982.
## 2 Psychological 101005.
## 3 Harem 82452.
## 4 Supernatural 75783.
## 5 Super Power 74591.
ggplot(anime_agg, aes(y = reorder(genre, rata_rata), x = rata_rata)) +
  geom_col(aes(fill = rata_rata)) +
  scale_fill_gradient(low = 'pink', high = 'cornflowerblue') +
  labs(title = "Top 5 Anime Genres by Number of Users Who Scored",
       x = 'Mean number of users who scored', y = NULL) +
  theme_classic() +
  theme(legend.position = 'none',
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
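The chart above shows the five genres with the highest mean number of users who gave a score (scored_by). Thriller leads with about 150,000 scoring users on average, well ahead of Psychological (about 101,000) and the other three genres. In other words, viewers score thriller anime more often than other genres. The chart says nothing about how high those scores are.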
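`anime` holds one row per genre for each anime, and the tidy data may repeat rows further when an anime has several producers or studios. As a result, an anime can be counted more than once within a genre, which weights the mean toward those titles. The sketch below keeps one row per (animeID, genre) pair before averaging. It is an alternative to the code above, shown on made-up data. Its numbers would not match the table above.

```r
library(dplyr)

# Made-up data: anime 1 has two producers, so its Action row appears twice
toy <- data.frame(animeID = c(1, 1, 2, 2),
                  genre = c("Action", "Action", "Action", "Drama"),
                  producers = c("A", "B", "C", "C"),
                  scored_by = c(100, 100, 300, 300),
                  stringsAsFactors = FALSE)

top_genres <- toy %>%
  distinct(animeID, genre, .keep_all = TRUE) %>%  # one row per anime and genre
  group_by(genre) %>%
  summarise(mean_scored_by = mean(scored_by)) %>%
  slice_max(mean_scored_by, n = 5)                # top genres, highest first

top_genres  # Drama 300, Action 200
```

Without `distinct()`, Action would average three rows, (100 + 100 + 300) / 3, instead of two titles.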
# mean_score holds the mean popularity rank of each premiere season
anime_season <- unique_anime %>%
  filter(!is.na(Prem_Season)) %>%
  group_by(Prem_Season) %>%
  summarise(mean_score = mean(popularity))
anime_season
## # A tibble: 4 x 2
## Prem_Season mean_score
## <fct> <dbl>
## 1 Fall 7119.
## 2 Spring 7340.
## 3 Summer 7572.
## 4 Winter 8879.
ggplot(anime_season, aes(x = reorder(Prem_Season, mean_score), y = mean_score)) +
  geom_col(aes(fill = mean_score)) +
  scale_fill_gradient(low = 'beige', high = 'skyblue') +
  geom_point(aes(col = Prem_Season)) +
  labs(title = "Anime Season Premiere",
       subtitle = 'Premiere Season by Popularity',
       y = 'Mean popularity rank (lower = more popular)', x = NULL) +
  theme_dark() +
  theme(legend.position = 'none',
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))
The chart shows that anime premiering in Winter have the highest mean popularity value (about 8,879), followed by Summer (about 7,572), Spring (about 7,340), and Fall (about 7,119). However, popularity is a rank in which a smaller number means more members: Cowboy Bebop, with 795,733 members, has popularity 39. A higher mean therefore indicates lower average popularity. On this measure, Fall premieres are the most popular on average and Winter premieres the least, with Spring and Summer in between.
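Popularity is a rank: in the data above, Cowboy Bebop has popularity 39 with 795,733 members, while Bouken Ou Beet has 3704 with 11,708. One hedged alternative is to compare seasons on `members` directly, so that a larger value simply means more members. The base-R sketch below uses made-up rows.

```r
# Made-up rows: popularity 1 goes to the title with the most members
toy <- data.frame(Prem_Season = c("Winter", "Fall", "Fall", NA),
                  members = c(100, 500, 300, 50),
                  popularity = c(3, 1, 2, 4),
                  stringsAsFactors = FALSE)

# The smallest popularity value belongs to the largest member count
toy$members[which.min(toy$popularity)]  # 500

# Mean members per season; the formula interface drops the NA season row
season_members <- aggregate(members ~ Prem_Season, data = toy, FUN = mean)
season_members  # Fall: 400, Winter: 100
```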
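From the three charts above, we can conclude that:
- The number of anime premiering each year generally rose from 2000 to 2018.
- Thriller is the genre scored by the most users on average, so producers could consider releasing thriller anime to attract more viewers.
- The season chart should be read with care. Popularity is a rank where a smaller value means more members, so Winter's high mean value points to lower, not higher, average popularity. By itself, it does not support Winter as the best premiere season.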