This is an R Markdown document. Markdown is a simple formatting syntax for authoring HTML, PDF, and MS Word documents. For more details on using R Markdown see http://rmarkdown.rstudio.com.
When you click the Knit button, a document is generated that includes both the content and the output of any embedded R code chunks. You can embed an R code chunk like this (the setup chunk here loads dplyr, which prints the messages below):
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
Note that the echo = FALSE parameter was added to the code chunk to prevent printing of the R code that generated the plot.
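# Work with the first 40 rows to keep the printed output readable
data1 <- head(data, 40)
# Average and maximum play time (ms) for each artist
data_group1 <- data1 %>%
  group_by(artistName) %>%
  summarise(avg_msplayed = mean(msPlayed),
            Maximum = max(msPlayed))
print(data_group1)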
## # A tibble: 36 × 3
## artistName avg_msplayed Maximum
## <chr> <dbl> <int>
## 1 AURORA 1945555 1945555
## 2 Austyn Johnson 60453 60453
## 3 Avenged Sevenfold 40539 40539
## 4 Beyoncé 233 233
## 5 Bhikhari Bala 57728 57728
## 6 Black Hill 229535 229535
## 7 Brent Faiyaz 104821 104821
## 8 Catie Turner 6158627 6158627
## 9 Chris Lake 192455 192455
## 10 Clean Bandit 246584 246584
## # ℹ 26 more rows
summary(data_group1)
## artistName avg_msplayed Maximum
## Length:36 Min. : 233 Min. : 233
## Class :character 1st Qu.: 116204 1st Qu.: 116204
## Mode :character Median : 217668 Median : 217668
## Mean :1184368 Mean :1229712
## 3rd Qu.:1297857 3rd Qu.:1701588
## Max. :9160154 Max. :9160154
# Axis labels must be passed to labs() as named arguments (x =, y =)
ggplot(data_group1, aes(artistName, avg_msplayed)) +
  geom_col(fill = "#bb0a69", colour = "yellow") +
  labs(title = "Avg msPlayed by artist", x = "artistName", y = "avg_msplayed") +
  theme(axis.text.x = element_text(angle = 90))
# "Expected probability": each artist's share of the summed average play time
data_group1 <- data_group1 %>%
  mutate(exp_prob = avg_msplayed / sum(avg_msplayed))
print(data_group1)
## # A tibble: 36 × 4
## artistName avg_msplayed Maximum exp_prob
## <chr> <dbl> <int> <dbl>
## 1 AURORA 1945555 1945555 0.0456
## 2 Austyn Johnson 60453 60453 0.00142
## 3 Avenged Sevenfold 40539 40539 0.000951
## 4 Beyoncé 233 233 0.00000546
## 5 Bhikhari Bala 57728 57728 0.00135
## 6 Black Hill 229535 229535 0.00538
## 7 Brent Faiyaz 104821 104821 0.00246
## 8 Catie Turner 6158627 6158627 0.144
## 9 Chris Lake 192455 192455 0.00451
## 10 Clean Bandit 246584 246584 0.00578
## # ℹ 26 more rows
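The exp_prob column is each artist's average divided by the sum of all averages, so the column always sums to 1. Here is a minimal base-R sketch of that arithmetic. The vector avg_ms and its values are made up for illustration and are not taken from the data.

```r
# Hypothetical per-artist averages in ms (illustrative values only)
avg_ms <- c(A = 1000, B = 3000, C = 6000)

# Each artist's share of the total: the "expected probability"
exp_prob <- avg_ms / sum(avg_ms)
print(exp_prob)  # A = 0.1, B = 0.3, C = 0.6

# The smallest share marks the anomaly candidate
names(exp_prob)[exp_prob == min(exp_prob)]  # "A"
```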
ggplot(data_group1, aes(artistName, exp_prob)) +
  geom_col(fill = "#15c8b1", colour = "red") +
  labs(title = "Expected probability by artist", x = "artistName", y = "exp_prob") +
  theme(axis.text.x = element_text(angle = 90))
anomaly1 <- data_group1 %>%
filter(exp_prob == min(exp_prob)) %>%
pull(artistName)
print(anomaly1)
## [1] "Beyoncé"
# Tag every row of the 40-row sample; %in% also handles ties for the minimum
data1 <- data1 %>%
  mutate(anomaly_1 = ifelse(artistName %in% anomaly1, "Anomaly", "Normal"))
print(data1)
## trackName artistName msPlayed
## 1 A Better Place Project AER 119999
## 2 A Dangerous Thing AURORA 1945555
## 3 A Different Way (with Lauv) DJ Snake 66060
## 4 A Drug From God Chris Lake 192455
## 5 A Gift Of A Thistle James Horner 97568
## 6 A Little Bit Yours JP Saxe 99339
## 7 A Little More Catie Turner 6158627
## 8 A Little Piece of Heaven Avenged Sevenfold 40539
## 9 A Million Dreams Ziv Zaifman 269453
## 10 A Million Dreams (Reprise) Austyn Johnson 60453
## 11 A Moment Apart ODESZA 300828
## 12 A New Beginning Yasumu 143378
## 13 A Night to Remember High School Musical Cast 1514502
## 14 A Night to Remember - Original Version High School Musical Cast 832507
## 15 A Place Among the Stars Hans Zimmer 1670916
## 16 A Sky Full of Stars Taron Egerton 372839
## 17 A Soulmate Who Wasn’t Meant to Be Jess Benko 9160154
## 18 A wild river to take you home Black Hill 229535
## 19 A+E (feat. Kandaka Moore & Nikki Cislyn) Clean Bandit 246584
## 20 A-Punk Vampire Weekend 1804847
## 21 AGEN WIDA JOYRYDE 198997
## 22 ALIEN SUPERSTAR Beyoncé 233
## 23 ALL MINE Brent Faiyaz 104821
## 24 ALLERGIC clide 155200
## 25 AMAZING Rex Orange County 6890534
## 26 AMNESIA (feat. Boy In Space) DREAMDNVR 3734865
## 27 ANGOSTURA keshi 171018
## 28 ANKLES Jessie Reyez 187423
## 29 ANUBIS KUTE 262699
## 30 AVOID ME KUTE 800336
## 31 AVOID ME 3 KUTE 40239
## 32 Aaftaab The Local Train 2540803
## 33 Aag Lage Chahe Basti Mai Sirazee 27373
## 34 Aahe Neelasaila Bhikhari Bala 57728
## 35 Aaj Bhi Vishal Mishra 1793602
## 36 Aaj Bhi - WORMONO LoFi Remake Vishal Mishra 76002
## 37 Aaja We Mahiya Imran Khan 2400332
## 38 Aajkal Tere Mere Pyar Ke Charche Suman Kalyanpur 205800
## 39 Aankhein Khuli Jatin-Lalit 173287
## 40 Aap Ki Kashish Himesh Reshammiya 333653
## genre danceability energy key loudness speechiness
## 1 ambient guitar 0.4960 0.2550 9 -17.984 0.0283
## 2 art pop 0.5410 0.5560 11 -6.150 0.0356
## 3 edm 0.7840 0.7570 8 -3.912 0.0384
## 4 bass house 0.7140 0.8830 9 -4.430 0.0625
## 5 orchestral soundtrack 0.0828 0.0120 9 -36.045 0.0451
## 6 alt z 0.5980 0.2950 1 -8.553 0.0276
## 7 alt z 0.7920 0.4840 4 -9.897 0.1920
## 8 alternative metal 0.4860 0.8810 2 -5.623 0.0474
## 9 show tunes 0.2650 0.3120 7 -11.689 0.0569
## 10 show tunes 0.2530 0.1390 6 -17.067 0.0414
## 11 chillwave 0.5150 0.6630 7 -7.896 0.0328
## 12 lo-fi study 0.6950 0.2870 0 -14.608 0.0499
## 13 post-teen pop 0.7980 0.9040 0 -3.316 0.0476
## 14 post-teen pop 0.7980 0.9040 0 -3.316 0.0476
## 15 german soundtrack 0.1710 0.0325 4 -25.382 0.0430
## 16 hollywood 0.4400 0.6760 6 -7.570 0.0582
## 17 gen z singer-songwriter 0.5710 0.0274 9 -20.274 0.0649
## 18 instrumental post-rock 0.4580 0.3610 6 -12.066 0.0353
## 19 pop 0.8080 0.5190 5 -8.268 0.0403
## 20 baroque pop 0.5510 0.8190 2 -4.489 0.0525
## 21 bass house 0.7840 0.9130 1 -1.208 0.1770
## 22 pop 0.5450 0.6410 10 -6.398 0.0998
## 23 r&b 0.6170 0.3780 9 -8.540 0.0315
## 24 singer-songwriter pop 0.6090 0.4280 11 -7.796 0.0244
## 25 bedroom pop 0.7190 0.4610 8 -8.191 0.0454
## 26 dark r&b 0.7080 0.4200 9 -9.478 0.1210
## 27 chill r&b 0.5360 0.6590 6 -9.732 0.0483
## 28 canadian contemporary r&b 0.5140 0.7490 8 -4.336 0.4030
## 29 aggressive phonk 0.6270 0.9930 1 -1.304 0.0714
## 30 aggressive phonk 0.6560 0.9640 6 -2.087 0.0964
## 31 aggressive phonk 0.7060 0.9330 10 -0.096 0.0721
## 32 hindi indie 0.4640 0.3910 6 -9.884 0.0384
## 33 himachali pop 0.6130 0.5510 9 -8.613 0.1980
## 34 odia bhajan 0.3280 0.5710 1 -6.913 0.0395
## 35 desi pop 0.5810 0.3040 1 -9.707 0.0274
## 36 desi pop 0.3070 0.2650 1 -14.701 0.0497
## 37 desi hip hop 0.6390 0.5940 10 -6.497 0.0347
## 38 classic bollywood 0.5560 0.7920 9 -10.683 0.0715
## 39 afghan pop 0.7920 0.6350 7 -13.940 0.0863
## 40 filmi 0.6900 0.7800 8 -7.412 0.0489
## valence tempo id duration_ms anomaly_1
## 1 0.0809 141.961 2oC9Ah7npALCCPW5DC1gob 120000 Normal
## 2 0.1060 105.886 0PDlmmYkuQCUAFhMXvtlsU 215573 Normal
## 3 0.5870 104.996 1YMBg7rOjxzbya0fPOYfNX 198286 Normal
## 4 0.8190 126.016 4skbQNtyjy8A7mo8oqe2oD 192455 Normal
## 5 0.0578 169.895 3Yvi5NkUrSppVwrMHYkB6u 97667 Normal
## 6 0.3140 87.025 00cBcYOlnHoXX9ver3cmdE 225680 Normal
## 7 0.2450 96.939 2thRPgNgcWAwAYxP4HWdji 224528 Normal
## 8 0.6670 144.997 1BLfQ6dPXmuDrFmbdfW7Jl 480707 Normal
## 9 0.0998 54.747 0RoA7ObU6phWpqhlC9zH4Z 269453 Normal
## 10 0.1020 74.369 66y7x28jXOPrcmu3D5Zjh6 60453 Normal
## 11 0.0756 120.024 59wlTaYOL5tDUgXnbBQ3my 234244 Normal
## 12 0.3790 128.028 0xtbVIWkbfu5G6TgCVmvVn 143379 Normal
## 13 0.9660 135.008 3RuXekS5criwismaaj87iF 236400 Normal
## 14 0.9660 135.008 2BHu6HOFXgHR936cj10XIi 236400 Normal
## 15 0.0284 70.821 7KNoh3Bn4QfgkIGBywkAB0 207118 Normal
## 16 0.4290 125.013 2VeLdiYiILcCOEV0izfzEW 206438 Normal
## 17 0.0870 124.209 0gOz9JUXsaKVzLTSmFDtdo 316556 Normal
## 18 0.2310 86.966 0UFkbnxj34vZVgwwEDy29e 229536 Normal
## 19 0.3840 114.977 1Qi6CGoqTG1UvFcuRzKRNL 246584 Normal
## 20 0.8460 174.917 3AydAydLzyyZutA0375XIz 137760 Normal
## 21 0.5900 126.054 35V6qXUDCvwZdmbKe5PlPG 199048 Normal
## 22 0.4640 121.892 1Hohk6AufHZOrrhMXZppax 215460 Anomaly
## 23 0.2200 141.967 3XgGQ1wjo5khvq2UImjyNF 216063 Normal
## 24 0.6270 96.071 6DP8InyxdyYChHb2tcV6ia 155200 Normal
## 25 0.5480 123.915 3OM6qQmdFV6uy61GIqpRtf 209200 Normal
## 26 0.3170 145.981 1UyDQh3HBJwL6IxR9mXipZ 175827 Normal
## 27 0.1990 170.035 38umMmZQdeoOG7Zojor4g3 171020 Normal
## 28 0.4160 179.623 3ZWHzPucGUZvgNleqllzdK 169240 Normal
## 29 0.5800 109.975 1aqP9g3lLR8bpFtkCTm41N 131455 Normal
## 30 0.9650 114.007 5FX30idriKlEIRYTxpNf65 105789 Normal
## 31 0.7610 114.042 7dXyRgDZlQnA5OU1SPD1oe 120000 Normal
## 32 0.4180 106.887 4LtSTc3xANVhYeeN69nscM 233472 Normal
## 33 0.4820 110.243 0gbNOv3x5M87aczFOIEzfp 142410 Normal
## 34 0.6540 186.602 3fB2jhxOmwMsTxCYpjYjd4 313748 Normal
## 35 0.0973 109.006 65jpYG7TQCh0T7TtKO4Rjj 241206 Normal
## 36 0.0569 94.115 0muXl7VdnqjsMH3IDS2JFI 217976 Normal
## 37 0.3780 80.034 75Ak0cit8hV604Wjlq1gf2 232067 Normal
## 38 0.7760 98.554 2uiAZVVKqqYlM6KM9POTYq 309493 Normal
## 39 0.6480 122.004 16XEVyPh5NT31CAAqPbxQF 422713 Normal
## 40 0.7180 110.963 35Rrbt58ADLxyo1lu7xhZu 333653 Normal
Conclusion for the first “group by” data frame:
Beyoncé has the lowest expected probability of all the artists. Her only track in the sample, "ALIEN SUPERSTAR", was played for just 233 ms. After the result is mapped back to the original data, that row is tagged as an anomaly.
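The steps above can be collected into one reusable function: average within each group, compute each group's share, flag the minimum, and map the flag back onto the rows. The sketch below uses base R. The helper flag_min_share() and the toy data frame are hypothetical and exist only for illustration; they are not part of the original analysis or of any package.

```r
# flag_min_share() is a hypothetical helper, not a package function.
# It flags the group(s) whose mean value has the smallest share of the total
# and returns one "Anomaly"/"Normal" label per row of df.
flag_min_share <- function(df, group_col, value_col) {
  # Mean of value_col within each level of group_col
  means <- tapply(df[[value_col]], df[[group_col]], mean)
  # Each group's share of the summed means ("expected probability")
  shares <- means / sum(means)
  # Every group tied for the minimum share counts as an anomaly
  anomalous <- names(shares)[shares == min(shares)]
  # %in% works for one or several anomalous groups
  ifelse(df[[group_col]] %in% anomalous, "Anomaly", "Normal")
}

# Toy data standing in for the listening history (values are made up)
toy <- data.frame(
  artistName = c("X", "Y", "Y", "Z"),
  msPlayed   = c(500, 200000, 100000, 90000)
)
toy$anomaly <- flag_min_share(toy, "artistName", "msPlayed")
print(toy)
```

Artist X averages only 500 ms against 150,000 ms for Y and 90,000 ms for Z, so only X's row is tagged.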
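# Every artist paired with every observed msPlayed value
# (stringsAsFactors = FALSE keeps the keys as character for the join below)
possible_combinations <- expand.grid(artistName = unique(data1$artistName),
                                     msPlayed = unique(data1$msPlayed),
                                     stringsAsFactors = FALSE)
print(head(possible_combinations))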
## artistName msPlayed
## 1 Project AER 119999
## 2 AURORA 119999
## 3 DJ Snake 119999
## 4 Chris Lake 119999
## 5 James Horner 119999
## 6 JP Saxe 119999
missing_combinations <- anti_join(possible_combinations, data1, by = c("artistName", "msPlayed"))
print(head(missing_combinations))
## artistName msPlayed
## 1 AURORA 119999
## 2 DJ Snake 119999
## 3 Chris Lake 119999
## 4 James Horner 119999
## 5 JP Saxe 119999
## 6 Catie Turner 119999
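The grid-and-anti-join idea does not depend on dplyr. Below is a minimal base-R sketch on toy data (the observed data frame is hypothetical). It builds a combined key for each row and keeps the grid rows whose key never appears in the observed data. This mirrors dplyr::anti_join(grid, observed, by = c("artistName", "msPlayed")).

```r
# Toy observed data (hypothetical values)
observed <- data.frame(
  artistName = c("A", "A", "B"),
  msPlayed   = c(100L, 200L, 100L)
)

# Every artist paired with every observed msPlayed value
grid <- expand.grid(artistName = unique(observed$artistName),
                    msPlayed   = unique(observed$msPlayed),
                    stringsAsFactors = FALSE)

# Combine both columns into one key; a tab separator avoids accidental matches
key <- function(d) paste(d$artistName, d$msPlayed, sep = "\t")

# Base-R anti-join: grid rows whose key is absent from the observed data
missing_pairs <- grid[!key(grid) %in% key(observed), ]
print(missing_pairs)
```

Only the pair (B, 200) never occurs, so it is the single missing combination.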
# Count how often each (artistName, msPlayed) pair occurs in the first 8,000 rows
data_for_combination <- head(data, 8000)
combination_counts <- data_for_combination %>%
  group_by(artistName, msPlayed) %>%
  summarise(count = n(), .groups = 'drop')
print(combination_counts)
## # A tibble: 4,288 × 3
## artistName msPlayed count
## <chr> <int> <int>
## 1 !!! 3130 1
## 2 !!! 227066 2
## 3 $NOT 360000 2
## 4 $uicideboy$ 120005 1
## 5 11:11 Music Group 134582 2
## 6 12 Stones 206562 1
## 7 1nonly 270757 2
## 8 2 Chainz 50807 1
## 9 22 Void Beats 828909 1
## 10 24Herbs 263 1
## # ℹ 4,278 more rows
The frequency table gives a count for every (artistName, msPlayed) pair in the first 8,000 rows. It shows how many times each artist appears with a given play duration, and which pairs occur more than once.
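The same pair counts can be produced in base R with aggregate(), which is handy for checking the dplyr result. This is a sketch on a hypothetical toy data frame in which the pair ("A", 100) appears twice. Treating repeated pairs as possible repeated plays or duplicated rows is an interpretation, not something the counts prove.

```r
# Toy listening rows (hypothetical); ("A", 100) appears twice
plays <- data.frame(
  artistName = c("A", "A", "A", "B"),
  msPlayed   = c(100L, 100L, 250L, 100L)
)

# Count rows per (artistName, msPlayed) pair,
# the base-R analogue of group_by() + summarise(count = n())
counts <- aggregate(count ~ artistName + msPlayed,
                    data = transform(plays, count = 1L),
                    FUN = sum)
print(counts)

# Pairs seen more than once (possibly repeated plays or duplicated rows)
counts[counts$count > 1, ]
```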
combination_data <- table(data1$artistName, data1$msPlayed)
print(head(combination_data))
##
## 233 27373 40239 40539 57728 60453 66060 76002 97568 99339
## AURORA 0 0 0 0 0 0 0 0 0 0
## Austyn Johnson 0 0 0 0 0 1 0 0 0 0
## Avenged Sevenfold 0 0 0 1 0 0 0 0 0 0
## Beyoncé 1 0 0 0 0 0 0 0 0 0
## Bhikhari Bala 0 0 0 0 1 0 0 0 0 0
## Black Hill 0 0 0 0 0 0 0 0 0 0
##
## 104821 119999 143378 155200 171018 173287 187423 192455
## AURORA 0 0 0 0 0 0 0 0
## Austyn Johnson 0 0 0 0 0 0 0 0
## Avenged Sevenfold 0 0 0 0 0 0 0 0
## Beyoncé 0 0 0 0 0 0 0 0
## Bhikhari Bala 0 0 0 0 0 0 0 0
## Black Hill 0 0 0 0 0 0 0 0
##
## 198997 205800 229535 246584 262699 269453 300828 333653
## AURORA 0 0 0 0 0 0 0 0
## Austyn Johnson 0 0 0 0 0 0 0 0
## Avenged Sevenfold 0 0 0 0 0 0 0 0
## Beyoncé 0 0 0 0 0 0 0 0
## Bhikhari Bala 0 0 0 0 0 0 0 0
## Black Hill 0 0 1 0 0 0 0 0
##
## 372839 800336 832507 1514502 1670916 1793602 1804847
## AURORA 0 0 0 0 0 0 0
## Austyn Johnson 0 0 0 0 0 0 0
## Avenged Sevenfold 0 0 0 0 0 0 0
## Beyoncé 0 0 0 0 0 0 0
## Bhikhari Bala 0 0 0 0 0 0 0
## Black Hill 0 0 0 0 0 0 0
##
## 1945555 2400332 2540803 3734865 6158627 6890534 9160154
## AURORA 1 0 0 0 0 0 0
## Austyn Johnson 0 0 0 0 0 0 0
## Avenged Sevenfold 0 0 0 0 0 0 0
## Beyoncé 0 0 0 0 0 0 0
## Bhikhari Bala 0 0 0 0 0 0 0
## Black Hill 0 0 0 0 0 0 0
missing_combination <- which(combination_data == 0, arr.ind = TRUE)
print(head(missing_combination))
## row col
## AURORA 1 1
## Austyn Johnson 2 1
## Avenged Sevenfold 3 1
## Bhikhari Bala 5 1
## Black Hill 6 1
## Brent Faiyaz 7 1
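The row and col values returned by which(..., arr.ind = TRUE) are indices into the table. The sketch below turns them back into readable (artistName, msPlayed) labels, which makes the result comparable with the anti_join() output. It uses toy data; the observed data frame and missing_pairs are hypothetical names.

```r
# Toy observed data (hypothetical values)
observed <- data.frame(
  artistName = c("A", "A", "B"),
  msPlayed   = c(100L, 200L, 100L)
)

# Cross-tabulate artists against play durations
tab <- table(observed$artistName, observed$msPlayed)

# Zero cells are combinations that never occur
idx <- which(tab == 0, arr.ind = TRUE)

# Translate row/column indices back into labels
missing_pairs <- data.frame(
  artistName = rownames(tab)[idx[, "row"]],
  msPlayed   = as.integer(colnames(tab)[idx[, "col"]])
)
print(missing_pairs)
```

As with the grid approach, the only missing combination here is (B, 200).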
# Median and minimum track duration (ms) per genre, computed on the full data set
data_group2 <- data %>%
  group_by(genre) %>%
  summarise(median_duration = median(duration_ms),
            Minimun2 = min(duration_ms))
print(head(data_group2))
## # A tibble: 6 × 3
## genre median_duration Minimun2
## <chr> <dbl> <int>
## 1 a cappella 269636 269636
## 2 abstract 465200 465200
## 3 abstract beats 73898 73898
## 4 abstract hip hop 283747 283747
## 5 acoustic opm 244800 244800
## 6 acoustic pop 232941 186584
summary(data_group2)
## genre median_duration Minimun2
## Length:523 Min. : 65800 Min. : 10027
## Class :character 1st Qu.:173634 1st Qu.:132486
## Mode :character Median :200673 Median :171467
## Mean :206649 Mean :177224
## 3rd Qu.:230168 3rd Qu.:211425
## Max. :621659 Max. :621659
ggplot(data_group2, aes(genre, median_duration)) +
  geom_point(size = 3.5, color = "#ff9600") +
  labs(title = "Median duration by genre", x = "genre", y = "median_duration") +
  theme(axis.text.x = element_text(angle = 90))
# Each genre has a single median value, so every "box" collapses to one line
ggplot(data_group2, aes(genre, median_duration)) +
  geom_boxplot() +
  theme(axis.title.x = element_text(colour = "blue"),
        axis.title.y = element_text(colour = "blue")) +
  labs(title = "genre vs median_duration")
data_group2 <- data_group2 %>%
mutate(exp_prob2 = (median_duration / sum(median_duration)))
print(head(data_group2))
## # A tibble: 6 × 4
## genre median_duration Minimun2 exp_prob2
## <chr> <dbl> <int> <dbl>
## 1 a cappella 269636 269636 0.00249
## 2 abstract 465200 465200 0.00430
## 3 abstract beats 73898 73898 0.000684
## 4 abstract hip hop 283747 283747 0.00263
## 5 acoustic opm 244800 244800 0.00227
## 6 acoustic pop 232941 186584 0.00216
anomaly2 <- data_group2 %>%
filter(exp_prob2 == min(exp_prob2)) %>%
pull(genre)
print(anomaly2)
## [1] "ballet class"
ggplot(data_group2, aes(genre, exp_prob2)) +
geom_point()+
geom_smooth()
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'
# Tag every row whose genre has the smallest expected probability (anomaly2)
data2 <- data %>%
  mutate(anomaly_2 = ifelse(genre %in% anomaly2, "Anomaly", "Normal"))
print(head(data2))
## trackName artistName msPlayed genre
## 1 A Better Place Project AER 119999 ambient guitar
## 2 A Dangerous Thing AURORA 1945555 art pop
## 3 A Different Way (with Lauv) DJ Snake 66060 edm
## 4 A Drug From God Chris Lake 192455 bass house
## 5 A Gift Of A Thistle James Horner 97568 orchestral soundtrack
## 6 A Little Bit Yours JP Saxe 99339 alt z
## danceability energy key loudness speechiness valence tempo
## 1 0.4960 0.255 9 -17.984 0.0283 0.0809 141.961
## 2 0.5410 0.556 11 -6.150 0.0356 0.1060 105.886
## 3 0.7840 0.757 8 -3.912 0.0384 0.5870 104.996
## 4 0.7140 0.883 9 -4.430 0.0625 0.8190 126.016
## 5 0.0828 0.012 9 -36.045 0.0451 0.0578 169.895
## 6 0.5980 0.295 1 -8.553 0.0276 0.3140 87.025
## id duration_ms anomaly_2
## 1 2oC9Ah7npALCCPW5DC1gob 120000 Normal
## 2 0PDlmmYkuQCUAFhMXvtlsU 215573 Normal
## 3 1YMBg7rOjxzbya0fPOYfNX 198286 Normal
## 4 4skbQNtyjy8A7mo8oqe2oD 192455 Normal
## 5 3Yvi5NkUrSppVwrMHYkB6u 97667 Normal
## 6 00cBcYOlnHoXX9ver3cmdE 225680 Normal
# Every genre paired with every observed duration_ms value
possible_combinations2 <- expand.grid(genre = unique(data$genre),
                                      duration_ms = unique(data$duration_ms),
                                      stringsAsFactors = FALSE)
print(head(possible_combinations2))
## genre duration_ms
## 1 ambient guitar 120000
## 2 art pop 120000
## 3 edm 120000
## 4 bass house 120000
## 5 orchestral soundtrack 120000
## 6 alt z 120000
missing_combinations2 <- anti_join(possible_combinations2, data, by = c("genre" , "duration_ms"))
print(head(missing_combinations2))
## genre duration_ms
## 1 art pop 120000
## 2 edm 120000
## 3 bass house 120000
## 4 orchestral soundtrack 120000
## 5 alt z 120000
## 6 alternative metal 120000
data_for_combination2 <- head(data, 8000)
combination_counts2 <- data_for_combination2 %>%
group_by(genre, duration_ms) %>%
summarise(count = n(), .groups = 'drop')
print(combination_counts2)
## # A tibble: 4,252 × 3
## genre duration_ms count
## <chr> <int> <int>
## 1 a cappella 269636 2
## 2 abstract 465200 2
## 3 abstract beats 73898 2
## 4 abstract hip hop 283747 1
## 5 acoustic opm 244800 2
## 6 acoustic pop 186584 2
## 7 acoustic pop 189613 2
## 8 acoustic pop 198853 2
## 9 acoustic pop 226680 2
## 10 acoustic pop 231587 1
## # ℹ 4,242 more rows
The frequency table gives a count for every (genre, duration_ms) pair in the first 8,000 rows. It shows how many times each genre appears with a given track duration.
# Mean loudness (dB) per genre, computed on the full data set
data_group3 <- data %>%
  group_by(genre) %>%
  summarise(mean_loudness = mean(loudness))
print(data_group3)
## # A tibble: 523 × 2
## genre mean_loudness
## <chr> <dbl>
## 1 a cappella -37.8
## 2 abstract -11.9
## 3 abstract beats -8.69
## 4 abstract hip hop -6.46
## 5 acoustic opm -8.13
## 6 acoustic pop -9.25
## 7 adult standards -12.4
## 8 aesthetic rap -9.71
## 9 afghan pop -10.6
## 10 afrobeats -7.23
## # ℹ 513 more rows
summary(data_group3)
## genre mean_loudness
## Length:523 Min. :-37.841
## Class :character 1st Qu.: -9.626
## Mode :character Median : -7.528
## Mean : -8.528
## 3rd Qu.: -5.728
## Max. : -0.503
# linewidth replaces the size aesthetic for lines, deprecated in ggplot2 3.4.0
ggplot(data_group3, aes(genre, mean_loudness)) +
  geom_col(linewidth = 3.5, colour = "#3b5998") +
  labs(title = "Mean loudness by genre", x = "genre", y = "mean_loudness") +
  theme(axis.text.x = element_text(angle = 90))
data_group3 <- data_group3 %>%
mutate(exp_prob3 = (mean_loudness / sum(mean_loudness)))
print(data_group3)
## # A tibble: 523 × 3
## genre mean_loudness exp_prob3
## <chr> <dbl> <dbl>
## 1 a cappella -37.8 0.00848
## 2 abstract -11.9 0.00267
## 3 abstract beats -8.69 0.00195
## 4 abstract hip hop -6.46 0.00145
## 5 acoustic opm -8.13 0.00182
## 6 acoustic pop -9.25 0.00207
## 7 adult standards -12.4 0.00279
## 8 aesthetic rap -9.71 0.00218
## 9 afghan pop -10.6 0.00238
## 10 afrobeats -7.23 0.00162
## # ℹ 513 more rows
anomaly3 <- data_group3 %>%
filter(exp_prob3 == min(exp_prob3)) %>%
pull(genre)
print(anomaly3)
## [1] "j-idol"
# Tag every row whose genre has the smallest expected probability (anomaly3)
data3 <- data %>%
  mutate(anomaly_3 = ifelse(genre %in% anomaly3, "Anomaly", "Normal"))
print(head(data3$anomaly_3 == "Anomaly"))
## [1] FALSE FALSE FALSE FALSE FALSE FALSE
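In this group, the j-idol genre is tagged as the anomaly because it has the lowest expected probability. Every mean loudness value is negative, so dividing by the negative total gives positive shares. The smallest share therefore belongs to the genre whose mean loudness is closest to 0 dB, that is, the loudest genre (the maximum in the summary above is -0.503). None of the first six rows belong to this genre, which is why the check above prints only FALSE.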
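The effect of the negative loudness values on the share calculation can be checked directly. In this sketch with made-up dB values (the vector mean_loudness here is hypothetical), the smallest share and the largest raw value pick the same genre.

```r
# Hypothetical mean loudness values in dB (all negative, as in the data)
mean_loudness <- c(quiet = -30, medium = -10, loud = -1)

# Dividing by the negative total yields positive shares
share <- mean_loudness / sum(mean_loudness)

# The smallest share belongs to the value closest to 0 dB (the loudest)
names(share)[which.min(share)]

# Taking the maximum of the raw values selects the same genre explicitly
names(mean_loudness)[which.max(mean_loudness)]
```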
# Every genre paired with every observed loudness value
possible_combinations3 <- expand.grid(genre = unique(data$genre),
                                      loudness = unique(data$loudness),
                                      stringsAsFactors = FALSE)
print(head(possible_combinations3))
## genre loudness
## 1 ambient guitar -17.984
## 2 art pop -17.984
## 3 edm -17.984
## 4 bass house -17.984
## 5 orchestral soundtrack -17.984
## 6 alt z -17.984
missing_combinations3 <- anti_join(possible_combinations3, data, by = c("genre" , "loudness"))
print(head(missing_combinations3))
## genre loudness
## 1 art pop -17.984
## 2 edm -17.984
## 3 bass house -17.984
## 4 orchestral soundtrack -17.984
## 5 alt z -17.984
## 6 alternative metal -17.984
data_for_combination3 <- head(data, 8000)
combination_counts3 <- data_for_combination3 %>%
group_by(genre, loudness) %>%
summarise(count = n(), .groups = 'drop')
print(head(combination_counts3))
## # A tibble: 6 × 3
## genre loudness count
## <chr> <dbl> <int>
## 1 a cappella -37.8 2
## 2 abstract -11.9 2
## 3 abstract beats -8.69 2
## 4 abstract hip hop -6.46 1
## 5 acoustic opm -8.13 2
## 6 acoustic pop -12.6 2
The frequency table gives a count for every (genre, loudness) pair in the first 8,000 rows. It shows how many times each genre appears with a given loudness value.
The categorical variables have a large number of unique categories (523 genres, for example), so plots that show every category are cluttered and hard to interpret. It is more appropriate to aggregate or group the categories before visualizing them.
For this reason, I was unable to plot the graphs with all the category names legible.
There is no clear pattern or relationship between the categorical and numerical variables, so in these cases a visualization may not reveal any meaningful insight.
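One common way to aggregate categories before plotting is to keep the n most frequent ones and lump the rest into "Other". This is a minimal base-R sketch on hypothetical genre labels. The helper lump_top_n() is illustrative and not part of the analysis above. The forcats package provides fct_lump_n() for the same purpose on factors.

```r
# Hypothetical genre labels (the real data has 523 distinct genres)
genre <- c("pop", "pop", "pop", "rock", "rock", "jazz", "folk", "ska")

# lump_top_n() is an illustrative helper, not a package function:
# keep the n most frequent categories and relabel the rest as "Other"
lump_top_n <- function(x, n) {
  counts <- sort(table(x), decreasing = TRUE)
  keep <- names(counts)[seq_len(min(n, length(counts)))]
  ifelse(x %in% keep, x, "Other")
}

lumped <- lump_top_n(genre, n = 2)
table(lumped)
```

With n = 2, "pop" and "rock" are kept and the three single-occurrence genres become "Other". The lumped column can then be plotted with a readable number of axis labels.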