Hi everyone! The dataset used in this data visualization comes from https://github.com/rfordatascience/tidytuesday/blob/master/data/2019/2019-04-23/raw_anime.csv (the code below reads the cleaned version of this data, tidy_anime.csv, from the same TidyTuesday folder).

Data Preparation

1. Read and Check the Data

The first step in the analysis is to import the data. Make sure the project file is in the same folder as the dataset.

anime <- read.csv('tidy_anime.csv')
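Empty cells deserve attention at this step. read.csv() turns a blank field into NA only in numeric, integer, and logical columns. In a character column, a blank field stays an empty string "". The minimal sketch below uses a small hypothetical inline CSV (not the real file) to show how the na.strings argument changes that. Whether the real file stores missing values as blanks is an assumption worth checking.

```r
# Hypothetical two-row CSV; the second row has an empty premiered field
csv_text <- "animeID,name,premiered
1,Cowboy Bebop,Spring 1998
5,Cowboy Bebop: Tengoku no Tobira,
"

# Default na.strings = "NA": the blank character field stays ""
default_read <- read.csv(text = csv_text)

# Treat both "" and "NA" as missing values
na_read <- read.csv(text = csv_text, na.strings = c("", "NA"))

default_read$premiered
na_read$premiered
```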

Look at the structure of the dataset, convert columns that do not yet have the right data type, and drop the columns we do not need.

str(anime)
## 'data.frame':    77911 obs. of  28 variables:
##  $ animeID       : int  1 1 1 1 1 1 5 5 5 5 ...
##  $ name          : chr  "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" ...
##  $ title_english : chr  "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" "Cowboy Bebop" ...
##  $ title_japanese: chr  "カウボーイビバップ" "カウボーイビバップ" "カウボーイビバップ" "カウボーイビバップ" ...
##  $ title_synonyms: chr  "[]" "[]" "[]" "[]" ...
##  $ type          : chr  "TV" "TV" "TV" "TV" ...
##  $ source        : chr  "Original" "Original" "Original" "Original" ...
##  $ producers     : chr  "Bandai Visual" "Bandai Visual" "Bandai Visual" "Bandai Visual" ...
##  $ genre         : chr  "Action" "Adventure" "Comedy" "Drama" ...
##  $ studio        : chr  "Sunrise" "Sunrise" "Sunrise" "Sunrise" ...
##  $ episodes      : int  26 26 26 26 26 26 1 1 1 1 ...
##  $ status        : chr  "Finished Airing" "Finished Airing" "Finished Airing" "Finished Airing" ...
##  $ airing        : logi  FALSE FALSE FALSE FALSE FALSE FALSE ...
##  $ start_date    : chr  "1998-04-03" "1998-04-03" "1998-04-03" "1998-04-03" ...
##  $ end_date      : chr  "1999-04-02" "1999-04-02" "1999-04-02" "1999-04-02" ...
##  $ duration      : chr  "24 min per ep" "24 min per ep" "24 min per ep" "24 min per ep" ...
##  $ rating        : chr  "R - 17+ (violence & profanity)" "R - 17+ (violence & profanity)" "R - 17+ (violence & profanity)" "R - 17+ (violence & profanity)" ...
##  $ score         : num  8.81 8.81 8.81 8.81 8.81 8.81 8.41 8.41 8.41 8.41 ...
##  $ scored_by     : int  405664 405664 405664 405664 405664 405664 120243 120243 120243 120243 ...
##  $ rank          : int  26 26 26 26 26 26 164 164 164 164 ...
##  $ popularity    : int  39 39 39 39 39 39 449 449 449 449 ...
##  $ members       : int  795733 795733 795733 795733 795733 795733 197791 197791 197791 197791 ...
##  $ favorites     : int  43460 43460 43460 43460 43460 43460 776 776 776 776 ...
##  $ synopsis      : chr  "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ "In the year 2071, humanity has colonized several of the planets and moons of the solar system leaving the now u"| __truncated__ ...
##  $ background    : chr  "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ "When Cowboy Bebop first aired in spring of 1998 on TV Tokyo, only episodes 2, 3, 7-15, and 18 were broadcast, i"| __truncated__ ...
##  $ premiered     : chr  "Spring 1998" "Spring 1998" "Spring 1998" "Spring 1998" ...
##  $ broadcast     : chr  "Saturdays at 01:00 (JST)" "Saturdays at 01:00 (JST)" "Saturdays at 01:00 (JST)" "Saturdays at 01:00 (JST)" ...
##  $ related       : chr  "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ "{'Adaptation': [{'mal_id': 173, 'type': 'manga', 'name': 'Cowboy Bebop', 'url': 'https://myanimelist.net/manga/"| __truncated__ ...
anime$animeID <- as.character(anime$animeID)
anime$type <- as.factor(anime$type)
anime$genre <- as.factor(anime$genre)

# Load the lubridate package to convert strings to the Date type (install it first with install.packages("lubridate") if needed).
library(lubridate)
## Warning: package 'lubridate' was built under R version 4.0.5
## 
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
## 
##     date, intersect, setdiff, union
anime$start_date <- ymd(anime$start_date)

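# Drop columns not used in the analysis:
# title_english, title_japanese, title_synonyms, status, end_date, synopsis, background, related
anime <- anime[, -c(3,4,5,12,15,24,25,28)]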
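Dropping columns by position works. The risk is that the indices silently point at different columns if the file layout ever changes. Here is a sketch of the same step done by name, on a small hypothetical data frame. With the real data, drop_cols would hold the eight columns removed above: title_english, title_japanese, title_synonyms, status, end_date, synopsis, background, and related.

```r
# Hypothetical miniature of the anime data
df <- data.frame(
  animeID = "1",
  name = "Cowboy Bebop",
  title_english = "Cowboy Bebop",
  status = "Finished Airing",
  score = 8.81
)

# Columns to drop, listed by name instead of by position
drop_cols <- c("title_english", "status")

# drop = FALSE keeps a data frame even if only one column remains
df <- df[, !(names(df) %in% drop_cols), drop = FALSE]
names(df)
```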

2. Data Cleaning

Split the premiered and broadcast columns to get more specific data, and drop the parts we do not need. Load the dplyr and tidyr packages to use select() (from dplyr) and separate() (from tidyr).

library(dplyr)
## Warning: package 'dplyr' was built under R version 4.0.5
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(tidyr)
## Warning: package 'tidyr' was built under R version 4.0.5
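anime <- anime %>%
  # "Spring 1998" -> Prem_Season = "Spring", Prem_Year = "1998"
  separate(premiered, c("Prem_Season", "Prem_Year")) %>%
  # "Saturdays at 01:00 (JST)" -> Broad_Day, at, Broad_Time, timezone
  separate(broadcast, c("Broad_Day", "at", "Broad_Time", "timezone"), sep = " ") %>%
  select(-c(at, timezone))  # drop the helper pieces "at" and "timezone"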
## Warning: Expected 4 pieces. Additional pieces discarded in 331 rows [7263,
## 7264, 7265, 7266, 7267, 7268, 8899, 8900, 8901, 9960, 9961, 11288, 11289, 11290,
## 11291, 11292, 11293, 11294, 11295, 11296, ...].
## Warning: Expected 4 pieces. Missing pieces filled with `NA` in 14618 rows
## [23, 24, 25, 26, 27, 28, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64,
## 65, ...].
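separate() does three things to the broadcast column here. It splits on spaces, keeps the 1st piece (day) and the 3rd piece (time), and pads short strings with NA, which is what the warnings above report. The base R sketch below makes this concrete. The sample strings are illustrative, modeled on the output above.

```r
# Sample broadcast strings; "Unknown" and NA mimic incomplete entries
broadcast <- c("Saturdays at 01:00 (JST)", "Thursdays at 18:30 (JST)", "Unknown", NA)

# Split each string on single spaces (an NA input gives an NA element)
parts <- strsplit(broadcast, " ", fixed = TRUE)

# First piece is the day; third piece, if present, is the time
broad_day <- vapply(parts, function(p) p[1], character(1))
broad_time <- vapply(parts, function(p) if (length(p) >= 3) p[3] else NA_character_,
                     character(1))

data.frame(broadcast, broad_day, broad_time)
```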

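Derive the premiere season from the month of start_date. Note that the code below recomputes Prem_Season for every row, not only the missing ones. December to February become Winter, March to May Spring, June to August Summer, and September to November Fall. Rows without a start_date get NA.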

# Month of the start date (NA when start_date is missing)
start_month <- as.numeric(format(anime$start_date, "%m"))
anime$Prem_Season <- ifelse(start_month %in% c(12, 1, 2), "Winter",
                     ifelse(start_month %in% c(3, 4, 5), "Spring",
                     ifelse(start_month %in% c(6, 7, 8), "Summer",
                     ifelse(start_month %in% c(9, 10, 11), "Fall",
                     NA))))
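The nested ifelse() above can also be written as a lookup table indexed by month. The sketch below assumes the same month-to-season mapping used above and starts from raw date strings. month_to_season() is a hypothetical helper, not part of any package, and the dates are sample values.

```r
# Hypothetical helper: position i holds the season for month i
month_to_season <- function(month) {
  seasons <- c("Winter", "Winter", "Spring", "Spring", "Spring", "Summer",
               "Summer", "Summer", "Fall", "Fall", "Fall", "Winter")
  seasons[month]  # an NA month gives an NA season
}

# An explicit format turns blank or unparseable strings into NA
start_date <- as.Date(c("1998-04-03", "2001-09-01", "2005-12-24", ""),
                      format = "%Y-%m-%d")
start_month <- as.integer(format(start_date, "%m"))

month_to_season(start_month)
```

Indexing a 12-element vector avoids repeating the month test in every branch and keeps the whole mapping visible on two lines.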

Replace the value "Unknown" in the source and Broad_Day columns, and the leftover word "Not" in Broad_Day, with NA. Then convert the newly created columns to the appropriate data types.

anime$source[anime$source == 'Unknown'] <- NA
anime$Broad_Day[anime$Broad_Day == "Not"] <- NA
anime$Broad_Day[anime$Broad_Day == 'Unknown'] <- NA

anime$rating <- as.factor(anime$rating)
anime$Prem_Season <- as.factor(anime$Prem_Season)
anime$Broad_Day <- as.factor(anime$Broad_Day)
anime$Prem_Year <- as.numeric(anime$Prem_Year)
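The replacement lines for Broad_Day can be collapsed into one with %in%. An optional further step, sketched here on sample values, gives the day factor an explicit weekday order. That way plots list days from Monday to Sunday instead of alphabetically. The plural day names are assumed from the output shown in this post (e.g. "Saturdays").

```r
broad_day <- c("Saturdays", "Not", "Unknown", "Fridays", "Mondays")

# Replace every placeholder value with NA in a single step
broad_day[broad_day %in% c("Not", "Unknown")] <- NA

# Factor in weekday order instead of alphabetical order
day_levels <- c("Mondays", "Tuesdays", "Wednesdays", "Thursdays",
                "Fridays", "Saturdays", "Sundays")
broad_day <- factor(broad_day, levels = day_levels)

table(broad_day, useNA = "ifany")
```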

names(anime)
##  [1] "animeID"     "name"        "type"        "source"      "producers"  
##  [6] "genre"       "studio"      "episodes"    "airing"      "start_date" 
## [11] "duration"    "rating"      "score"       "scored_by"   "rank"       
## [16] "popularity"  "members"     "favorites"   "Prem_Season" "Prem_Year"  
## [21] "Broad_Day"   "Broad_Time"

The anime data now has 22 variables (columns):
* animeID: anime ID
* name: anime title
* type: anime type (TV, Movie, OVA, etc.)
* source: source material (Original, Manga, Game, etc.)
* producers: producer
* genre: genre
* studio: studio
* episodes: number of episodes
* airing: airing status (TRUE if the anime is still airing)
* start_date: date the anime started airing
* duration: anime duration
* rating: age rating
* score: viewer score
* scored_by: number of users who gave a score
* rank: anime ranking
* popularity: popularity rank, based on how many members have added the anime to their list (a smaller number means more popular)
* members: number of members who have added the anime to their list
* favorites: number of members who marked the anime as a favorite
* Prem_Season: season in which the anime premiered
* Prem_Year: year in which the anime premiered
* Broad_Day: broadcast day
* Broad_Time: broadcast time

Exploratory Data Analysis and Visualization

head(anime)
##   animeID         name type   source     producers     genre  studio episodes
## 1       1 Cowboy Bebop   TV Original Bandai Visual    Action Sunrise       26
## 2       1 Cowboy Bebop   TV Original Bandai Visual Adventure Sunrise       26
## 3       1 Cowboy Bebop   TV Original Bandai Visual    Comedy Sunrise       26
## 4       1 Cowboy Bebop   TV Original Bandai Visual     Drama Sunrise       26
## 5       1 Cowboy Bebop   TV Original Bandai Visual    Sci-Fi Sunrise       26
## 6       1 Cowboy Bebop   TV Original Bandai Visual     Space Sunrise       26
##   airing start_date      duration                         rating score
## 1  FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity)  8.81
## 2  FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity)  8.81
## 3  FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity)  8.81
## 4  FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity)  8.81
## 5  FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity)  8.81
## 6  FALSE 1998-04-03 24 min per ep R - 17+ (violence & profanity)  8.81
##   scored_by rank popularity members favorites Prem_Season Prem_Year Broad_Day
## 1    405664   26         39  795733     43460      Spring      1998 Saturdays
## 2    405664   26         39  795733     43460      Spring      1998 Saturdays
## 3    405664   26         39  795733     43460      Spring      1998 Saturdays
## 4    405664   26         39  795733     43460      Spring      1998 Saturdays
## 5    405664   26         39  795733     43460      Spring      1998 Saturdays
## 6    405664   26         39  795733     43460      Spring      1998 Saturdays
##   Broad_Time
## 1      01:00
## 2      01:00
## 3      01:00
## 4      01:00
## 5      01:00
## 6      01:00

The data contains redundancy: each anime appears in several rows (one per genre, as the Cowboy Bebop rows above show), so we keep only one row per anime.

unique_anime <- data.frame(anime %>% distinct(animeID, .keep_all = TRUE))
head(unique_anime)
##   animeID                            name  type   source            producers
## 1       1                    Cowboy Bebop    TV Original        Bandai Visual
## 2       5 Cowboy Bebop: Tengoku no Tobira Movie Original              Sunrise
## 3       6                          Trigun    TV    Manga Victor Entertainment
## 4       7              Witch Hunter Robin    TV Original        Bandai Visual
## 5       8                  Bouken Ou Beet    TV    Manga             TV Tokyo
## 6      16            Hachimitsu to Clover    TV    Manga                Genco
##       genre         studio episodes airing start_date      duration
## 1    Action        Sunrise       26  FALSE 1998-04-03 24 min per ep
## 2    Action          Bones        1  FALSE 2001-09-01   1 hr 55 min
## 3    Action       Madhouse       26  FALSE 1998-04-01 24 min per ep
## 4    Action        Sunrise       26  FALSE 2002-07-02 25 min per ep
## 5 Adventure Toei Animation       52  FALSE 2004-09-30 23 min per ep
## 6    Comedy      J.C.Staff       24  FALSE 2005-04-15 23 min per ep
##                           rating score scored_by rank popularity members
## 1 R - 17+ (violence & profanity)  8.81    405664   26         39  795733
## 2 R - 17+ (violence & profanity)  8.41    120243  164        449  197791
## 3      PG-13 - Teens 13 or older  8.30    212537  255        146  408548
## 4      PG-13 - Teens 13 or older  7.33     32837 2371       1171   79397
## 5                  PG - Children  7.03      4894 3544       3704   11708
## 6      PG-13 - Teens 13 or older  8.12     57065  419        536  172274
##   favorites Prem_Season Prem_Year Broad_Day Broad_Time
## 1     43460      Spring      1998 Saturdays      01:00
## 2       776        Fall        NA      <NA>       <NA>
## 3     10432      Spring      1998 Thursdays      01:15
## 4       537      Summer      2002  Tuesdays    Unknown
## 5        14        Fall      2004 Thursdays      18:30
## 6      3752      Spring      2005   Fridays      00:35
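One consequence of distinct(animeID, .keep_all = TRUE) is worth keeping in mind. It keeps only the first row of each anime, so unique_anime holds just one genre per title (Action for Cowboy Bebop in the output above). That is fine for per-anime counts and season summaries, but genre summaries should use the full anime data. Below is a base R sketch of the same deduplication with duplicated(), on a small sample. The genres for animeID 5 other than Action are illustrative.

```r
# Sample in the long format: one row per anime-genre pair
anime_rows <- data.frame(
  animeID = c("1", "1", "1", "5", "5"),
  genre   = c("Action", "Adventure", "Comedy", "Action", "Drama"),
  score   = c(8.81, 8.81, 8.81, 8.41, 8.41)
)

# Keep the first row of each animeID, like distinct(animeID, .keep_all = TRUE)
unique_rows <- anime_rows[!duplicated(anime_rows$animeID), ]

nrow(unique_rows)
unique_rows$genre  # only the first genre of each anime survives
```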

Load the ggplot2 package (install it first if needed) to build more attractive plots.

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.0.5

This visualization shows the trend in anime production over time, based on the premiere year, using a line plot. The data is subset with the pipe operator (%>%), so we do not need to assign each intermediate result to a new object. The year 2019 is excluded because it was not yet complete when the dataset was published (April 2019).

anime_peryear <- unique_anime %>% 
  filter(Prem_Year != 2019) %>% 
  filter(!is.na(Prem_Year)) %>% 
  group_by("Year" = Prem_Year) %>% 
  summarise(freq = n())

ggplot(anime_peryear, aes(x = Year, y = freq)) +
  geom_line(col = "navy") +
  labs(title = "Anime per Year",
       x = "Year",
       y = "Number of Anime") +
  theme_minimal() +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

The chart above shows the number of anime premiering each year, from the 1960s to 2018. Anime production grows over time, from 0 titles per year at the start to more than 200. The largest jump occurs around 2005, followed by a drop the following year.
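Anime without a premiered value are dropped from this chart. One example is the Cowboy Bebop movie, whose Prem_Year is NA in the preview above. To also count titles that have a start date but no premiere season, one option is to take the year from start_date instead. The base R sketch below does this with a handful of the start dates shown above (the trailing NA is illustrative).

```r
start_date <- as.Date(c("1998-04-03", "2001-09-01", "1998-04-01",
                        "2002-07-02", "2004-09-30", "2005-04-15", NA))

# Year of each start date; NA dates stay NA
start_year <- as.integer(format(start_date, "%Y"))

# table() drops NA by default, giving one count per year
per_year <- as.data.frame(table(Year = start_year), stringsAsFactors = FALSE)
per_year$Year <- as.integer(per_year$Year)

per_year
```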

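Next, look at which genres attract the most ratings. For each genre, compute the mean of scored_by (the number of users who scored an anime) and keep the five highest.

library(ggplot2)
library(dplyr)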

anime_agg <- anime %>% 
  group_by(genre) %>% 
  summarise(rata_rata = mean(scored_by)) %>% 
  ungroup()
anime_agg <- head(anime_agg[order(anime_agg$rata_rata, decreasing = TRUE), ], 5)
anime_agg
## # A tibble: 5 x 2
##   genre         rata_rata
##   <fct>             <dbl>
## 1 Thriller        149982.
## 2 Psychological   101005.
## 3 Harem            82452.
## 4 Supernatural     75783.
## 5 Super Power      74591.
ggplot(anime_agg, aes(y = reorder(genre, rata_rata), x = rata_rata)) +
  geom_col(aes(fill = rata_rata)) +
  scale_fill_gradient(low = "pink", high = "cornflowerblue") +
  labs(title = "Top 5 Anime Genres by Number of Users Who Scored",
       x = "Mean Number of Users Who Scored", y = NULL) +
  theme_classic() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

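The chart shows the five genres with the highest mean number of users who scored each anime. Note that this is the mean of scored_by, not the mean score. Thriller comes first with about 149,982 users per anime, well ahead of Psychological (about 101,005), Harem, Supernatural, and Super Power. In other words, on average more viewers give a score to thriller anime than to anime in the other genres.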
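This summary runs on the long anime table, so an anime is counted once for every row it has in a genre. Suppose the data can contain several rows for the same anime and genre, for example if other multi-valued columns were also split into rows (an assumption worth checking). In that case, those titles get extra weight in the mean. The base R sketch below deduplicates animeID-genre pairs before averaging. The repeated Action row for anime 1 and the Sci-Fi and Drama rows are hypothetical. The scored_by values come from the preview above.

```r
rows <- data.frame(
  animeID   = c("1", "1", "1", "5", "5", "6"),
  genre     = c("Action", "Action", "Sci-Fi", "Action", "Drama", "Action"),
  scored_by = c(405664, 405664, 405664, 120243, 120243, 212537)
)

# Keep one row per anime-genre pair before averaging
pairs <- unique(rows[, c("animeID", "genre", "scored_by")])

# Mean number of users who scored, per genre, in descending order
genre_mean <- aggregate(scored_by ~ genre, data = pairs, FUN = mean)
genre_mean <- genre_mean[order(genre_mean$scored_by, decreasing = TRUE), ]
head(genre_mean, 5)
```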

anime_season <- unique_anime %>% 
  filter(!is.na(Prem_Season)) %>% 
  group_by(Prem_Season) %>% 
  # popularity is a rank: a smaller value means a more popular anime
  summarise(mean_score = mean(popularity))
anime_season
## # A tibble: 4 x 2
##   Prem_Season mean_score
##   <fct>            <dbl>
## 1 Fall             7119.
## 2 Spring           7340.
## 3 Summer           7572.
## 4 Winter           8879.
ggplot(anime_season, aes(x = reorder(Prem_Season, mean_score), y = mean_score)) +
  geom_col(aes(fill = mean_score)) +
  scale_fill_gradient(low = "beige", high = "skyblue") +
  geom_point(aes(col = Prem_Season)) +
  labs(title = "Anime Season Premiere", subtitle = "Premiere Season by Popularity",
       y = "Mean Popularity Rank (lower = more popular)", x = NULL) +
  theme_dark() +
  theme(legend.position = "none",
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5))

The chart shows the mean popularity value for each premiere season. Winter is highest (about 8,879), followed by Summer (7,572), Spring (7,340), and Fall (7,119). Popularity is a rank in which a smaller number means a more popular anime, so a taller bar here means lower average popularity. Anime that premiered in winter are therefore the least popular on average, and fall premieres are the most popular. The gap among Fall, Spring, and Summer is relatively small.
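Since popularity is a rank, it is worth confirming its direction before reading the chart. The sketch below uses the six anime shown in the unique_anime preview above. The Spearman correlation between members and popularity is -1, so more members always means a smaller popularity number. Means of ranks can also be pulled up by a few obscure titles. For that reason the sketch also shows a per-season median as a more robust alternative. This is a design option, not what the chart above uses.

```r
# Values copied from the unique_anime preview (first six anime)
members    <- c(795733, 197791, 408548, 79397, 11708, 172274)
popularity <- c(39, 449, 146, 1171, 3704, 536)
season     <- c("Spring", "Fall", "Spring", "Summer", "Fall", "Spring")

# Rank correlation: -1 means popularity falls exactly as members rise
cor(members, popularity, method = "spearman")

# Per-season mean and median popularity rank (smaller = more popular)
tapply(popularity, season, mean)
tapply(popularity, season, median)
```

With only six titles these per-season numbers say nothing about the full dataset. The point is the direction of the scale.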

Conclusion

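From the three charts above, we can conclude:
- The number of anime produced per year rose over roughly 2000 to 2018.
- Producers could consider the thriller genre to attract more viewers, since thriller anime are scored by the most users on average. For the premiere season, keep in mind that popularity is a rank (smaller is better). Winter premieres have the highest mean rank (about 8,879), i.e. the lowest average popularity, while fall premieres have the lowest mean rank (about 7,119). The season chart therefore favors a fall premiere rather than a winter one.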