WARNING

DO NOT TAKE THIS ANALYSIS SERIOUSLY!! It was made just for fun.

Exploring the kpop_idols dataset

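# Packages used below: dplyr for the pipeline verbs, lubridate for month()
library(dplyr)
library(lubridate)
# Full data (on R >= 4.0, stringsAsFactors = TRUE is needed to get the factor columns shown below)
kpop_idols <- read.csv("kpop_idols.csv", stringsAsFactors = TRUE)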
# Structure and summary of data in each column
str(kpop_idols)
## 'data.frame':    1310 obs. of  10 variables:
##  $ Stage.Name   : Factor w/ 1135 levels "A.M","Ace","Aeji",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Full.Name    : Factor w/ 1252 levels "","Abe Haruno",..: 1079 272 684 702 155 78 407 861 498 205 ...
##  $ Korean.Name  : Factor w/ 1242 levels "","강가영","강경원",..: 521 920 72 794 44 1047 21 798 298 1202 ...
##  $ K..Stage.Name: Factor w/ 1083 levels "","가가","가린",..: 508 507 495 490 475 482 491 509 484 481 ...
##  $ Date.of.Birth: Factor w/ 1181 levels "1977-12-31","1980-12-25",..: 670 276 928 921 1059 195 630 88 202 1006 ...
##  $ Group        : Factor w/ 209 levels "","(G)I-DLE",..: 119 196 93 130 77 62 146 1 148 72 ...
##  $ Country      : Factor w/ 12 levels "Australia","Canada",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ Birthplace   : Factor w/ 136 levels "","Andong","Ansan",..: 1 1 25 124 132 105 97 28 1 1 ...
##  $ Other.Group  : Factor w/ 56 levels "","2YOON","3RACHA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ Gender       : Factor w/ 2 levels "F","M": 2 2 1 1 1 1 1 1 1 1 ...
summary(kpop_idols)
##     Stage.Name          Full.Name     Korean.Name   K..Stage.Name 
##  Dawon   :   4               :   6          :   6   수빈   :   5  
##  Hayoung :   4   Kim Chaewon :   3   김동현 :   3   유진   :   5  
##  Jinwoo  :   4   Kim Donghyun:   3   김민석 :   3   지수   :   5  
##  Jinyoung:   4   Kim Jiwon   :   3   김소희 :   3   다원   :   4  
##  Jisoo   :   4   Kim Minseok :   3   김지원 :   3   민재   :   4  
##  Minhyuk :   4   Kim Sohee   :   3   김채원 :   3   민혁   :   4  
##  (Other) :1286   (Other)     :1289   (Other):1289   (Other):1283  
##     Date.of.Birth           Group             Country       Birthplace 
##  1994-01-18:   3               :  91   South Korea:1204          :621  
##  1994-12-24:   3   NCT         :  18   China      :  39   Seoul  :188  
##  1995-01-23:   3   14U         :  14   Japan      :  27   Busan  : 60  
##  1996-11-06:   3   Cosmic Girls:  13   USA        :  14   Gwangju: 35  
##  1996-11-21:   3   Seventeen   :  13   Taiwan     :   6   Incheon: 29  
##  1998-01-16:   3   IZ*ONE      :  12   Thailand   :   6   Daegu  : 27  
##  (Other)   :1292   (Other)     :1149   (Other)    :  14   (Other):350  
##          Other.Group   Gender 
##                :1188   F:634  
##  Super Junior-M:   6   M:676  
##  Loona 1/3     :   5          
##  NCT Dream     :   5          
##  9MUSES A      :   4          
##  Loona yyxy    :   4          
##  (Other)       :  98

Cleaning the kpop_idols dataset

kpop_idols_cleaned <- kpop_idols %>%
  mutate(
    Stage.Name = as.character(Stage.Name),
    Full.Name = as.character(Full.Name),
    Date.of.Birth = as.Date(Date.of.Birth)
  ) %>%
  select(Stage.Name, Full.Name, Date.of.Birth, Group, Country, Gender)
# Blank cells in the Group column refer to solo artists, so I replace them with "Solo"
kpop_idols_cleaned$Group <- as.factor(if_else(kpop_idols_cleaned$Group == "", "Solo", as.character(kpop_idols_cleaned$Group)))
kpop_idols_cleaned
summary(kpop_idols_cleaned)
##   Stage.Name         Full.Name         Date.of.Birth                 Group     
##  Length:1310        Length:1310        Min.   :1977-12-31   Solo        :  91  
##  Class :character   Class :character   1st Qu.:1993-02-12   NCT         :  18  
##  Mode  :character   Mode  :character   Median :1996-05-30   14U         :  14  
##                                        Mean   :1996-01-13   Cosmic Girls:  13  
##                                        3rd Qu.:1999-05-14   Seventeen   :  13  
##                                        Max.   :2005-08-22   IZ*ONE      :  12  
##                                                             (Other)     :1149  
##         Country     Gender 
##  South Korea:1204   F:634  
##  China      :  39   M:676  
##  Japan      :  27          
##  USA        :  14          
##  Taiwan     :   6          
##  Thailand   :   6          
##  (Other)    :  14
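
The cleaning step can be sanity-checked on a tiny example before trusting it on the full data. The sketch below uses a made-up three-row data frame (the names `toy`, `GroupX`, and `GroupY` are hypothetical, not taken from the dataset) and base R only, so it is an illustration of the same idea rather than the exact pipeline above.

```r
# Toy data frame with factor columns, mimicking the layout of kpop_idols.
# All values here are invented for illustration.
toy <- data.frame(
  Stage.Name = c("A", "B", "C"),
  Date.of.Birth = c("1995-01-23", "1998-01-16", "1996-11-06"),
  Group = c("GroupX", "", "GroupY"),
  stringsAsFactors = TRUE
)

# as.Date() accepts a factor and converts it through its character labels.
toy$Date.of.Birth <- as.Date(toy$Date.of.Birth)

# Replace blank group names with "Solo", then turn the column back into a factor.
group_chr <- as.character(toy$Group)
group_chr[group_chr == ""] <- "Solo"
toy$Group <- factor(group_chr)

# Checks: no date failed to parse, and no blank group name is left.
stopifnot(!anyNA(toy$Date.of.Birth))
stopifnot(!any(toy$Group == ""))
table(toy$Group)
```

Running the same two `stopifnot()` checks on `kpop_idols_cleaned` would confirm that every birth date parsed and that no blank group name survived.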

Kpop idols and the things they have in common

Birthdates

More kpop idols were born in January than in any other month, and the fewest were born in June.

# most common month
kpop_idols_month <- kpop_idols_cleaned %>%
  mutate(month = month(Date.of.Birth)) %>%
  count(month) %>%
  arrange(desc(n)) %>%
  mutate(percent = n/sum(n)*100)
kpop_idols_month
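
Whether monthly counts differ by more than chance can be checked with a chi-squared goodness-of-fit test. The sketch below is only an illustration: the counts in `month_counts` are invented (they are NOT the real numbers from the table), and the expected shares assume every day of a non-leap year is equally likely, which is itself a simplification.

```r
# Hypothetical monthly birth counts, January to December (toy numbers only).
month_counts <- c(30, 22, 25, 24, 26, 19, 25, 24, 26, 23, 27, 25)
names(month_counts) <- month.abb

# Expected share of births per month if every day were equally likely.
days_in_month <- c(31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)
expected_share <- days_in_month / sum(days_in_month)

# Goodness-of-fit test of the observed counts against the expected shares.
month_test <- chisq.test(x = month_counts, p = expected_share)
month_test$p.value
```

A large p-value would be consistent with the months being roughly evenly represented. To try this on the real data, the `n` column of `kpop_idols_month` would have to be put in calendar order first, since the table is sorted by count.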

In my view, the table shows no significant difference among the months. So, let’s be more specific and find the most common month-day combinations.

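# Keep only the month and day of each birth date, e.g. "03-20"
kpop_idols_md <- kpop_idols_cleaned %>%
  mutate(md = format(Date.of.Birth, "%m-%d"))
# Count by the md column directly so the result column is named md
kpop_idols_md %>%
  count(md) %>%
  arrange(desc(n))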
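
To see what the month-day key does, here is a minimal base R sketch on invented dates (`toy_births` is hypothetical toy data). Formatting with `"%m-%d"` drops the year, so people born in different years on the same calendar day end up under the same key.

```r
# Six made-up birth dates; three of them share January 15 across different years.
toy_births <- as.Date(c(
  "1994-01-15", "1997-01-15", "1995-09-02",
  "2000-01-15", "1996-09-02", "1999-07-01"
))

# Drop the year, keeping only month and day.
toy_md <- format(toy_births, "%m-%d")

# Count each month-day key and sort from most to least common.
md_counts <- sort(table(toy_md), decreasing = TRUE)
md_counts
```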

Many idols were born on March 20. Let’s find out who was born on this date.

kpop_idols_md %>%
  filter(md == "03-20") %>%
  arrange(Date.of.Birth)

Now, let’s prepare the data to see whether it covers every month-day pair.

# 1992 is a leap year, so this sequence includes February 29 (366 days in total)
all_ymd <- seq(as.Date("1992-01-01"), as.Date("1992-12-31"), by = "1 day")
all_md <- as.factor(format(all_ymd, "%m-%d"))
kpop_md <- as.factor(unique(kpop_idols_md$md))
all_md_df_1 <- as.data.frame(all_md)
kpop_md_df_1 <- as.data.frame(kpop_md)
all_md_df_2 <- all_md_df_1 %>%
  mutate(md=all_md)
kpop_md_df_2 <- kpop_md_df_1 %>%
  mutate(md=kpop_md)
str(all_md_df_2)
## 'data.frame':    366 obs. of  2 variables:
##  $ all_md: Factor w/ 366 levels "01-01","01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ md    : Factor w/ 366 levels "01-01","01-02",..: 1 2 3 4 5 6 7 8 9 10 ...
str(kpop_md_df_2)
## 'data.frame':    355 obs. of  2 variables:
##  $ kpop_md: Factor w/ 355 levels "01-01","01-02",..: 355 231 289 261 50 143 230 147 187 195 ...
##  $ md     : Factor w/ 355 levels "01-01","01-02",..: 355 231 289 261 50 143 230 147 187 195 ...

No, it does not: only 355 of the 366 possible month-day pairs appear. Therefore, you have to be born on one of the following dates in order to be unique in the kpop industry.

no_kpop <- all_md_df_2 %>%
  anti_join(kpop_md_df_2, by="md")
## Warning: Column `md` joining factors with different levels, coercing to
## character vector
no_kpop
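
The same lookup can be written more compactly with plain character vectors and `setdiff()`, which also sidesteps the factor-level warning from the join. This is an alternative sketch, not the method used above, and `toy_births` is invented toy data.

```r
# All 366 month-day strings, taken from a leap year so that "02-29" is included.
all_md <- format(
  seq(as.Date("1992-01-01"), as.Date("1992-12-31"), by = "day"),
  "%m-%d"
)

# Hypothetical observed birthdays (toy data, not the real dataset).
toy_births <- as.Date(c("1994-01-18", "1996-11-06", "1992-02-29"))
observed_md <- unique(format(toy_births, "%m-%d"))

# Month-day pairs that never occur among the observed birthdays.
missing_md <- setdiff(all_md, observed_md)
length(missing_md)
```

With the real data, `observed_md` would be `unique(kpop_idols_md$md)`.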

Final Words

According to the dataset, no kpop idol was born on the above dates. If one of them is your birthday, you might add diversity to the kpop industry!
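
For context, a rough back-of-the-envelope calculation suggests how many empty calendar days to expect by chance alone. The sketch assumes that all 1310 birthdays are independent and equally likely to fall on any of 366 days; that is a simplification (February 29 is much rarer in reality), so treat the result as a ballpark figure only.

```r
n_idols <- 1310  # rows in the dataset, from the str() output
n_days <- 366    # possible month-day pairs, including February 29

# Probability that a given day is nobody's birthday under the uniform assumption.
p_day_empty <- (1 - 1 / n_days)^n_idols

# Expected number of days with no birthdays at all.
expected_empty_days <- n_days * p_day_empty

# Days actually missing from the data: 366 possible minus 355 observed.
observed_empty_days <- 366 - 355

round(expected_empty_days, 1)
```

Under these assumptions the expected number of empty days comes out at about 10, which is close to the 11 missing dates found in the data.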

Data source: Kaggle

THANK YOU :)