Week3

R Markdown

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

data1<-head(data,40)

Grouping Data by 1

data_group1 <- data1 %>%
  group_by(artistName) %>%
  summarise(avg_msplayed = mean(msPlayed),
            Maximum = max(msPlayed))
  
print(data_group1)

## # A tibble: 36 × 3
##    artistName        avg_msplayed Maximum
##    <chr>                    <dbl>   <int>
##  1 AURORA                 1945555 1945555
##  2 Austyn Johnson           60453   60453
##  3 Avenged Sevenfold        40539   40539
##  4 Beyoncé                    233     233
##  5 Bhikhari Bala            57728   57728
##  6 Black Hill              229535  229535
##  7 Brent Faiyaz            104821  104821
##  8 Catie Turner           6158627 6158627
##  9 Chris Lake              192455  192455
## 10 Clean Bandit            246584  246584
## # ℹ 26 more rows

summary(data_group1)

##   artistName         avg_msplayed        Maximum       
##  Length:36          Min.   :    233   Min.   :    233  
##  Class :character   1st Qu.: 116204   1st Qu.: 116204  
##  Mode  :character   Median : 217668   Median : 217668  
##                     Mean   :1184368   Mean   :1229712  
##                     3rd Qu.:1297857   3rd Qu.:1701588  
##                     Max.   :9160154   Max.   :9160154

ggplot(data_group1, aes(artistName, Maximum)) +
  geom_col(fill = "#bb0a69", colour = "yellow") +
  # Name the axis labels explicitly; the title matches the plotted column (Maximum)
  labs(title = "Maximum msPlayed by artist", x = "artistName", y = "Maximum") +
  theme(axis.text.x = element_text(angle = 90))

Calculating the expected probability for each group from ‘avg_msplayed’

data_group1 <- data_group1 %>%
 mutate(exp_prob = (avg_msplayed/ sum(avg_msplayed)))
print(data_group1)

## # A tibble: 36 × 4
##    artistName        avg_msplayed Maximum   exp_prob
##    <chr>                    <dbl>   <int>      <dbl>
##  1 AURORA                 1945555 1945555 0.0456    
##  2 Austyn Johnson           60453   60453 0.00142   
##  3 Avenged Sevenfold        40539   40539 0.000951  
##  4 Beyoncé                    233     233 0.00000546
##  5 Bhikhari Bala            57728   57728 0.00135   
##  6 Black Hill              229535  229535 0.00538   
##  7 Brent Faiyaz            104821  104821 0.00246   
##  8 Catie Turner           6158627 6158627 0.144     
##  9 Chris Lake              192455  192455 0.00451   
## 10 Clean Bandit            246584  246584 0.00578   
## # ℹ 26 more rows
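
The share calculation can be checked on a tiny table. The sketch below uses made-up artists and values (`toy` and the names "A", "B", "C" are illustrative assumptions, not rows from the Spotify data) and assumes `dplyr` is installed. It only shows that dividing each group's value by the column total gives shares that sum to 1.

```r
library(dplyr)

# Made-up per-artist averages (illustrative only)
toy <- tibble(
  artistName   = c("A", "B", "C"),
  avg_msplayed = c(100, 300, 600)
)

# Same pattern as the report: each group's share of the total
toy <- toy %>%
  mutate(exp_prob = avg_msplayed / sum(avg_msplayed))

print(toy)

# The shares add up to 1
print(sum(toy$exp_prob))
```

Read this way, `exp_prob` is each artist's share of the summed averages, so an artist with very little listening time gets a share close to 0.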

ggplot(data_group1, aes(artistName, exp_prob)) +
  geom_col(fill = "#15c8b1", colour = "red") +
  # Name the axis labels explicitly; the title matches the plotted column (exp_prob)
  labs(title = "Expected probability by artist", x = "artistName", y = "exp_prob") +
  theme(axis.text.x = element_text(angle = 90))

Assigning ‘anomaly’ tag to the lowest probability group

anomaly1 <- data_group1 %>%
  filter(exp_prob == min(exp_prob)) %>%
  pull(artistName)

print(anomaly1)

## [1] "Beyoncé"
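
Filtering on `exp_prob == min(exp_prob)` returns every group that ties for the minimum, not necessarily a single one. A minimal sketch with made-up values (the `toy` table is an assumption for illustration; `dplyr` is assumed to be installed) shows the difference from picking only the first minimum:

```r
library(dplyr)

# Made-up shares in which two artists tie for the lowest value
toy <- tibble(
  artistName = c("A", "B", "C", "D"),
  exp_prob   = c(0.05, 0.45, 0.05, 0.45)
)

# Same pattern as the report: keep every row equal to the minimum
lowest_all <- toy %>%
  filter(exp_prob == min(exp_prob)) %>%
  pull(artistName)

# Base R alternative that returns only the first minimum
lowest_first <- toy$artistName[which.min(toy$exp_prob)]

print(lowest_all)    # both tied artists
print(lowest_first)  # only the first of them
```

In the 40-row sample only one artist is returned, so both approaches would agree there.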

Translating the “anomaly” tag back into the original data frame (data1)

data1 <- data1 %>%
  mutate(anomaly_1 = ifelse(artistName %in% anomaly1, "Anomaly", "Normal"))  # %in% also works if several artists tie
print(data1)

##                                   trackName               artistName msPlayed
## 1                            A Better Place              Project AER   119999
## 2                         A Dangerous Thing                   AURORA  1945555
## 3               A Different Way (with Lauv)                 DJ Snake    66060
## 4                           A Drug From God               Chris Lake   192455
## 5                       A Gift Of A Thistle             James Horner    97568
## 6                        A Little Bit Yours                  JP Saxe    99339
## 7                             A Little More             Catie Turner  6158627
## 8                  A Little Piece of Heaven        Avenged Sevenfold    40539
## 9                          A Million Dreams              Ziv Zaifman   269453
## 10               A Million Dreams (Reprise)           Austyn Johnson    60453
## 11                           A Moment Apart                   ODESZA   300828
## 12                          A New Beginning                   Yasumu   143378
## 13                      A Night to Remember High School Musical Cast  1514502
## 14   A Night to Remember - Original Version High School Musical Cast   832507
## 15                  A Place Among the Stars              Hans Zimmer  1670916
## 16                      A Sky Full of Stars            Taron Egerton   372839
## 17        A Soulmate Who Wasn’t Meant to Be               Jess Benko  9160154
## 18            A wild river to take you home               Black Hill   229535
## 19 A+E (feat. Kandaka Moore & Nikki Cislyn)             Clean Bandit   246584
## 20                                   A-Punk          Vampire Weekend  1804847
## 21                                AGEN WIDA                  JOYRYDE   198997
## 22                          ALIEN SUPERSTAR                  Beyoncé      233
## 23                                 ALL MINE             Brent Faiyaz   104821
## 24                                 ALLERGIC                    clide   155200
## 25                                  AMAZING        Rex Orange County  6890534
## 26             AMNESIA (feat. Boy In Space)                DREAMDNVR  3734865
## 27                                ANGOSTURA                    keshi   171018
## 28                                   ANKLES             Jessie Reyez   187423
## 29                                   ANUBIS                     KUTE   262699
## 30                                 AVOID ME                     KUTE   800336
## 31                               AVOID ME 3                     KUTE    40239
## 32                                  Aaftaab          The Local Train  2540803
## 33                 Aag Lage Chahe Basti Mai                  Sirazee    27373
## 34                          Aahe Neelasaila            Bhikhari Bala    57728
## 35                                  Aaj Bhi            Vishal Mishra  1793602
## 36            Aaj Bhi - WORMONO LoFi Remake            Vishal Mishra    76002
## 37                           Aaja We Mahiya               Imran Khan  2400332
## 38         Aajkal Tere Mere Pyar Ke Charche          Suman Kalyanpur   205800
## 39                           Aankhein Khuli              Jatin-Lalit   173287
## 40                           Aap Ki Kashish        Himesh Reshammiya   333653
##                        genre danceability energy key loudness speechiness
## 1             ambient guitar       0.4960 0.2550   9  -17.984      0.0283
## 2                    art pop       0.5410 0.5560  11   -6.150      0.0356
## 3                        edm       0.7840 0.7570   8   -3.912      0.0384
## 4                 bass house       0.7140 0.8830   9   -4.430      0.0625
## 5      orchestral soundtrack       0.0828 0.0120   9  -36.045      0.0451
## 6                      alt z       0.5980 0.2950   1   -8.553      0.0276
## 7                      alt z       0.7920 0.4840   4   -9.897      0.1920
## 8          alternative metal       0.4860 0.8810   2   -5.623      0.0474
## 9                 show tunes       0.2650 0.3120   7  -11.689      0.0569
## 10                show tunes       0.2530 0.1390   6  -17.067      0.0414
## 11                 chillwave       0.5150 0.6630   7   -7.896      0.0328
## 12               lo-fi study       0.6950 0.2870   0  -14.608      0.0499
## 13             post-teen pop       0.7980 0.9040   0   -3.316      0.0476
## 14             post-teen pop       0.7980 0.9040   0   -3.316      0.0476
## 15         german soundtrack       0.1710 0.0325   4  -25.382      0.0430
## 16                 hollywood       0.4400 0.6760   6   -7.570      0.0582
## 17   gen z singer-songwriter       0.5710 0.0274   9  -20.274      0.0649
## 18    instrumental post-rock       0.4580 0.3610   6  -12.066      0.0353
## 19                       pop       0.8080 0.5190   5   -8.268      0.0403
## 20               baroque pop       0.5510 0.8190   2   -4.489      0.0525
## 21                bass house       0.7840 0.9130   1   -1.208      0.1770
## 22                       pop       0.5450 0.6410  10   -6.398      0.0998
## 23                       r&b       0.6170 0.3780   9   -8.540      0.0315
## 24     singer-songwriter pop       0.6090 0.4280  11   -7.796      0.0244
## 25               bedroom pop       0.7190 0.4610   8   -8.191      0.0454
## 26                  dark r&b       0.7080 0.4200   9   -9.478      0.1210
## 27                 chill r&b       0.5360 0.6590   6   -9.732      0.0483
## 28 canadian contemporary r&b       0.5140 0.7490   8   -4.336      0.4030
## 29          aggressive phonk       0.6270 0.9930   1   -1.304      0.0714
## 30          aggressive phonk       0.6560 0.9640   6   -2.087      0.0964
## 31          aggressive phonk       0.7060 0.9330  10   -0.096      0.0721
## 32               hindi indie       0.4640 0.3910   6   -9.884      0.0384
## 33             himachali pop       0.6130 0.5510   9   -8.613      0.1980
## 34               odia bhajan       0.3280 0.5710   1   -6.913      0.0395
## 35                  desi pop       0.5810 0.3040   1   -9.707      0.0274
## 36                  desi pop       0.3070 0.2650   1  -14.701      0.0497
## 37              desi hip hop       0.6390 0.5940  10   -6.497      0.0347
## 38         classic bollywood       0.5560 0.7920   9  -10.683      0.0715
## 39                afghan pop       0.7920 0.6350   7  -13.940      0.0863
## 40                     filmi       0.6900 0.7800   8   -7.412      0.0489
##    valence   tempo                     id duration_ms anomaly_1
## 1   0.0809 141.961 2oC9Ah7npALCCPW5DC1gob      120000    Normal
## 2   0.1060 105.886 0PDlmmYkuQCUAFhMXvtlsU      215573    Normal
## 3   0.5870 104.996 1YMBg7rOjxzbya0fPOYfNX      198286    Normal
## 4   0.8190 126.016 4skbQNtyjy8A7mo8oqe2oD      192455    Normal
## 5   0.0578 169.895 3Yvi5NkUrSppVwrMHYkB6u       97667    Normal
## 6   0.3140  87.025 00cBcYOlnHoXX9ver3cmdE      225680    Normal
## 7   0.2450  96.939 2thRPgNgcWAwAYxP4HWdji      224528    Normal
## 8   0.6670 144.997 1BLfQ6dPXmuDrFmbdfW7Jl      480707    Normal
## 9   0.0998  54.747 0RoA7ObU6phWpqhlC9zH4Z      269453    Normal
## 10  0.1020  74.369 66y7x28jXOPrcmu3D5Zjh6       60453    Normal
## 11  0.0756 120.024 59wlTaYOL5tDUgXnbBQ3my      234244    Normal
## 12  0.3790 128.028 0xtbVIWkbfu5G6TgCVmvVn      143379    Normal
## 13  0.9660 135.008 3RuXekS5criwismaaj87iF      236400    Normal
## 14  0.9660 135.008 2BHu6HOFXgHR936cj10XIi      236400    Normal
## 15  0.0284  70.821 7KNoh3Bn4QfgkIGBywkAB0      207118    Normal
## 16  0.4290 125.013 2VeLdiYiILcCOEV0izfzEW      206438    Normal
## 17  0.0870 124.209 0gOz9JUXsaKVzLTSmFDtdo      316556    Normal
## 18  0.2310  86.966 0UFkbnxj34vZVgwwEDy29e      229536    Normal
## 19  0.3840 114.977 1Qi6CGoqTG1UvFcuRzKRNL      246584    Normal
## 20  0.8460 174.917 3AydAydLzyyZutA0375XIz      137760    Normal
## 21  0.5900 126.054 35V6qXUDCvwZdmbKe5PlPG      199048    Normal
## 22  0.4640 121.892 1Hohk6AufHZOrrhMXZppax      215460   Anomaly
## 23  0.2200 141.967 3XgGQ1wjo5khvq2UImjyNF      216063    Normal
## 24  0.6270  96.071 6DP8InyxdyYChHb2tcV6ia      155200    Normal
## 25  0.5480 123.915 3OM6qQmdFV6uy61GIqpRtf      209200    Normal
## 26  0.3170 145.981 1UyDQh3HBJwL6IxR9mXipZ      175827    Normal
## 27  0.1990 170.035 38umMmZQdeoOG7Zojor4g3      171020    Normal
## 28  0.4160 179.623 3ZWHzPucGUZvgNleqllzdK      169240    Normal
## 29  0.5800 109.975 1aqP9g3lLR8bpFtkCTm41N      131455    Normal
## 30  0.9650 114.007 5FX30idriKlEIRYTxpNf65      105789    Normal
## 31  0.7610 114.042 7dXyRgDZlQnA5OU1SPD1oe      120000    Normal
## 32  0.4180 106.887 4LtSTc3xANVhYeeN69nscM      233472    Normal
## 33  0.4820 110.243 0gbNOv3x5M87aczFOIEzfp      142410    Normal
## 34  0.6540 186.602 3fB2jhxOmwMsTxCYpjYjd4      313748    Normal
## 35  0.0973 109.006 65jpYG7TQCh0T7TtKO4Rjj      241206    Normal
## 36  0.0569  94.115 0muXl7VdnqjsMH3IDS2JFI      217976    Normal
## 37  0.3780  80.034 75Ak0cit8hV604Wjlq1gf2      232067    Normal
## 38  0.7760  98.554 2uiAZVVKqqYlM6KM9POTYq      309493    Normal
## 39  0.6480 122.004 16XEVyPh5NT31CAAqPbxQF      422713    Normal
## 40  0.7180 110.963 35Rrbt58ADLxyo1lu7xhZu      333653    Normal

Conclusion for the first “group by” data frame

Beyoncé has the lowest expected probability of all the artists in this sample (about 0.0000055, from a single row with msPlayed = 233), so her row receives the “Anomaly” tag once the result is translated back to the original data.
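
When the anomaly vector holds more than one name, the comparison operator used for tagging matters. The base R sketch below uses made-up names (`artist` and `anomalies` are hypothetical) to compare `==`, which recycles the shorter vector element by element, with `%in%`, which tests membership:

```r
# Made-up data: four rows, two anomalous artists
artist    <- c("C", "A", "B", "C")
anomalies <- c("A", "C")

# `==` recycles anomalies as A, C, A, C and compares position by position
tag_eq <- ifelse(artist == anomalies, "Anomaly", "Normal")

# `%in%` checks whether each artist appears anywhere in anomalies
tag_in <- ifelse(artist %in% anomalies, "Anomaly", "Normal")

print(data.frame(artist, tag_eq, tag_in))
```

With a single anomaly, as in this sample, both operators give the same tags. With ties, only `%in%` tags every matching row.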

possible_combinations <- expand.grid(artistName = unique(data1$artistName),
                                      msPlayed = unique(data1$msPlayed))
print(head(possible_combinations))

##     artistName msPlayed
## 1  Project AER   119999
## 2       AURORA   119999
## 3     DJ Snake   119999
## 4   Chris Lake   119999
## 5 James Horner   119999
## 6      JP Saxe   119999

missing_combinations <- anti_join(possible_combinations, data1, by = c("artistName", "msPlayed"))
print(head(missing_combinations))

##     artistName msPlayed
## 1       AURORA   119999
## 2     DJ Snake   119999
## 3   Chris Lake   119999
## 4 James Horner   119999
## 5      JP Saxe   119999
## 6 Catie Turner   119999
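
The `expand.grid()` plus `anti_join()` pattern is easier to follow on a small table. This is a sketch on made-up data (the `obs` table and its `artist` and `bucket` columns are assumptions for illustration; `dplyr` is assumed to be installed):

```r
library(dplyr)

# Made-up observations: artist B never appears with the "long" bucket
obs <- tibble(
  artist = c("A", "A", "B"),
  bucket = c("short", "long", "short")
)

# Every pairing of the distinct values: 2 artists x 2 buckets = 4 rows
all_pairs <- expand.grid(
  artist = unique(obs$artist),
  bucket = unique(obs$bucket),
  stringsAsFactors = FALSE
)

# Pairs that are possible but were never observed
missing <- anti_join(all_pairs, obs, by = c("artist", "bucket"))

print(all_pairs)
print(missing)
```

The grid grows as the product of the distinct values in each column. For the 40-row sample, with 36 artists and 40 distinct msPlayed values, that is 36 × 40 = 1,440 possible pairs, nearly all of them unobserved because msPlayed is close to unique per row.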

Calculating the frequency of each combination

data_for_combination<-head(data,8000) 

combination_counts <- data_for_combination %>%
  group_by(artistName, msPlayed) %>%
  summarise(count = n(), .groups = 'drop')
print(combination_counts)

## # A tibble: 4,288 × 3
##    artistName        msPlayed count
##    <chr>                <int> <int>
##  1 !!!                   3130     1
##  2 !!!                 227066     2
##  3 $NOT                360000     2
##  4 $uicideboy$         120005     1
##  5 11:11 Music Group   134582     2
##  6 12 Stones           206562     1
##  7 1nonly              270757     2
##  8 2 Chainz             50807     1
##  9 22 Void Beats       828909     1
## 10 24Herbs                263     1
## # ℹ 4,278 more rows

The counts show how often each (artistName, msPlayed) pair occurs in the first 8,000 rows. There are 4,288 distinct pairs, and each pair in the rows shown occurs once or twice. From this we can see how many distinct msPlayed values each artist has.
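
As a cross-check, `dplyr::count()` is a shorthand for the `group_by()` plus `summarise(n())` pattern. The sketch below runs both on made-up rows (the `toy` table is an assumption, not the Spotify data) and confirms they agree:

```r
library(dplyr)

# Made-up rows: the pair (A, 10) occurs twice
toy <- tibble(
  artistName = c("A", "A", "A", "B"),
  msPlayed   = c(10L, 10L, 20L, 10L)
)

# Pattern used in the report
by_group <- toy %>%
  group_by(artistName, msPlayed) %>%
  summarise(count = n(), .groups = "drop")

# Shorthand that produces the same table
by_count <- toy %>%
  count(artistName, msPlayed, name = "count")

print(by_group)
print(by_count)
```

Either form can be followed by `filter(count > 1)` to list only the pairs that occur more than once.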

Understanding combinations and missing combinations

combination_data <- table(data1$artistName, data1$msPlayed)
print(head(combination_data))

##                    
##                     233 27373 40239 40539 57728 60453 66060 76002 97568 99339
##   AURORA              0     0     0     0     0     0     0     0     0     0
##   Austyn Johnson      0     0     0     0     0     1     0     0     0     0
##   Avenged Sevenfold   0     0     0     1     0     0     0     0     0     0
##   Beyoncé             1     0     0     0     0     0     0     0     0     0
##   Bhikhari Bala       0     0     0     0     1     0     0     0     0     0
##   Black Hill          0     0     0     0     0     0     0     0     0     0
##                    
##                     104821 119999 143378 155200 171018 173287 187423 192455
##   AURORA                 0      0      0      0      0      0      0      0
##   Austyn Johnson         0      0      0      0      0      0      0      0
##   Avenged Sevenfold      0      0      0      0      0      0      0      0
##   Beyoncé                0      0      0      0      0      0      0      0
##   Bhikhari Bala          0      0      0      0      0      0      0      0
##   Black Hill             0      0      0      0      0      0      0      0
##                    
##                     198997 205800 229535 246584 262699 269453 300828 333653
##   AURORA                 0      0      0      0      0      0      0      0
##   Austyn Johnson         0      0      0      0      0      0      0      0
##   Avenged Sevenfold      0      0      0      0      0      0      0      0
##   Beyoncé                0      0      0      0      0      0      0      0
##   Bhikhari Bala          0      0      0      0      0      0      0      0
##   Black Hill             0      0      1      0      0      0      0      0
##                    
##                     372839 800336 832507 1514502 1670916 1793602 1804847
##   AURORA                 0      0      0       0       0       0       0
##   Austyn Johnson         0      0      0       0       0       0       0
##   Avenged Sevenfold      0      0      0       0       0       0       0
##   Beyoncé                0      0      0       0       0       0       0
##   Bhikhari Bala          0      0      0       0       0       0       0
##   Black Hill             0      0      0       0       0       0       0
##                    
##                     1945555 2400332 2540803 3734865 6158627 6890534 9160154
##   AURORA                  1       0       0       0       0       0       0
##   Austyn Johnson          0       0       0       0       0       0       0
##   Avenged Sevenfold       0       0       0       0       0       0       0
##   Beyoncé                 0       0       0       0       0       0       0
##   Bhikhari Bala           0       0       0       0       0       0       0
##   Black Hill              0       0       0       0       0       0       0

missing_combination <- which(combination_data == 0, arr.ind = TRUE)
print(head(missing_combination))

##                   row col
## AURORA              1   1
## Austyn Johnson      2   1
## Avenged Sevenfold   3   1
## Bhikhari Bala       5   1
## Black Hill          6   1
## Brent Faiyaz        7   1
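
The `row` and `col` values returned by `which(..., arr.ind = TRUE)` are positions in the contingency table, not the values themselves. A small base R sketch on made-up data (the `toy` data frame is an assumption for illustration) shows one way to translate them back into readable pairs:

```r
# Made-up data: artist B was never played for 20 ms
toy <- data.frame(
  artist = c("A", "A", "B"),
  played = c(10, 20, 10)
)

# Contingency table of artist by played value
tab <- table(toy$artist, toy$played)

# Positions of the empty cells (column 1 = row index, column 2 = column index)
idx <- which(tab == 0, arr.ind = TRUE)

# Look the positions up in the table's row and column names
missing_pairs <- data.frame(
  artist = rownames(tab)[idx[, 1]],
  played = colnames(tab)[idx[, 2]]
)

print(tab)
print(missing_pairs)
```

Note that the column names of a table are character strings, so `played` comes back as text and would need `as.numeric()` before any arithmetic.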

Group by 2

data2 <- head(data, 40)  # not used by the summary below, which runs on the full data

data_group2 <- data %>%
  group_by(genre) %>%
  summarise(median_duration = median(duration_ms),
            Minimun2 = min(duration_ms))
  
print(head(data_group2))

## # A tibble: 6 × 3
##   genre            median_duration Minimun2
##   <chr>                      <dbl>    <int>
## 1 a cappella                269636   269636
## 2 abstract                  465200   465200
## 3 abstract beats             73898    73898
## 4 abstract hip hop          283747   283747
## 5 acoustic opm              244800   244800
## 6 acoustic pop              232941   186584

summary(data_group2)

##     genre           median_duration     Minimun2     
##  Length:523         Min.   : 65800   Min.   : 10027  
##  Class :character   1st Qu.:173634   1st Qu.:132486  
##  Mode  :character   Median :200673   Median :171467  
##                     Mean   :206649   Mean   :177224  
##                     3rd Qu.:230168   3rd Qu.:211425  
##                     Max.   :621659   Max.   :621659

ggplot(data_group2, aes(genre, median_duration)) +
  geom_point(size = 3.5, color = "#ff9600") +
  # Name the axis labels explicitly; the plotted statistic is the median, not the average
  labs(title = "Median duration by genre", x = "genre", y = "median_duration") +
  theme(axis.text.x = element_text(angle = 90))

ggplot(data_group2, aes(genre,median_duration))  + 
    geom_boxplot()+theme(axis.title.x=element_text(colour="blue"),axis.title.y = element_text(colour="blue"))+ labs(title = "genre vs median_duration")

data_group2 <- data_group2 %>%
 mutate(exp_prob2 = (median_duration / sum(median_duration)))
print(head(data_group2))

## # A tibble: 6 × 4
##   genre            median_duration Minimun2 exp_prob2
##   <chr>                      <dbl>    <int>     <dbl>
## 1 a cappella                269636   269636  0.00249 
## 2 abstract                  465200   465200  0.00430 
## 3 abstract beats             73898    73898  0.000684
## 4 abstract hip hop          283747   283747  0.00263 
## 5 acoustic opm              244800   244800  0.00227 
## 6 acoustic pop              232941   186584  0.00216

anomaly2 <- data_group2 %>%
  filter(exp_prob2 == min(exp_prob2)) %>%
  pull(genre)

print(anomaly2)

## [1] "ballet class"

ggplot(data_group2, aes(genre, exp_prob2)) + 
  geom_point()+
  geom_smooth()

## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

data2 <- data %>%
  mutate(anomaly_2 = ifelse(genre %in% anomaly2, "Anomaly", "Normal"))  # tag by genre, using this section's anomaly
print(head(data2))

##                     trackName   artistName msPlayed                 genre
## 1              A Better Place  Project AER   119999        ambient guitar
## 2           A Dangerous Thing       AURORA  1945555               art pop
## 3 A Different Way (with Lauv)     DJ Snake    66060                   edm
## 4             A Drug From God   Chris Lake   192455            bass house
## 5         A Gift Of A Thistle James Horner    97568 orchestral soundtrack
## 6          A Little Bit Yours      JP Saxe    99339                 alt z
##   danceability energy key loudness speechiness valence   tempo
## 1       0.4960  0.255   9  -17.984      0.0283  0.0809 141.961
## 2       0.5410  0.556  11   -6.150      0.0356  0.1060 105.886
## 3       0.7840  0.757   8   -3.912      0.0384  0.5870 104.996
## 4       0.7140  0.883   9   -4.430      0.0625  0.8190 126.016
## 5       0.0828  0.012   9  -36.045      0.0451  0.0578 169.895
## 6       0.5980  0.295   1   -8.553      0.0276  0.3140  87.025
##                       id duration_ms anomaly_2
## 1 2oC9Ah7npALCCPW5DC1gob      120000    Normal
## 2 0PDlmmYkuQCUAFhMXvtlsU      215573    Normal
## 3 1YMBg7rOjxzbya0fPOYfNX      198286    Normal
## 4 4skbQNtyjy8A7mo8oqe2oD      192455    Normal
## 5 3Yvi5NkUrSppVwrMHYkB6u       97667    Normal
## 6 00cBcYOlnHoXX9ver3cmdE      225680    Normal

Finding Possible and Missing Combinations.

possible_combinations2 <- expand.grid(genre = unique(data$genre),
                                      duration_ms = unique(data$duration_ms))
print(head(possible_combinations2))

##                   genre duration_ms
## 1        ambient guitar      120000
## 2               art pop      120000
## 3                   edm      120000
## 4            bass house      120000
## 5 orchestral soundtrack      120000
## 6                 alt z      120000

missing_combinations2 <- anti_join(possible_combinations2, data, by = c("genre" , "duration_ms"))
print(head(missing_combinations2))

##                   genre duration_ms
## 1               art pop      120000
## 2                   edm      120000
## 3            bass house      120000
## 4 orchestral soundtrack      120000
## 5                 alt z      120000
## 6     alternative metal      120000

data_for_combination2 <-head(data,8000)

combination_counts2 <- data_for_combination2 %>%
  group_by(genre, duration_ms) %>%
  summarise(count = n(), .groups = 'drop')
print(combination_counts2)

## # A tibble: 4,252 × 3
##    genre            duration_ms count
##    <chr>                  <int> <int>
##  1 a cappella            269636     2
##  2 abstract              465200     2
##  3 abstract beats         73898     2
##  4 abstract hip hop      283747     1
##  5 acoustic opm          244800     2
##  6 acoustic pop          186584     2
##  7 acoustic pop          189613     2
##  8 acoustic pop          198853     2
##  9 acoustic pop          226680     2
## 10 acoustic pop          231587     1
## # ℹ 4,242 more rows

The counts show how often each (genre, duration_ms) pair occurs in the first 8,000 rows (4,252 distinct pairs), so we can see how many distinct durations each genre has.

Group by 3

data3 <- head(data, 40)

data_group3 <- data %>%
  group_by(genre) %>%
  summarise(mean_loudness = mean(loudness))

print(data_group3)

## # A tibble: 523 × 2
##    genre            mean_loudness
##    <chr>                    <dbl>
##  1 a cappella              -37.8 
##  2 abstract                -11.9 
##  3 abstract beats           -8.69
##  4 abstract hip hop         -6.46
##  5 acoustic opm             -8.13
##  6 acoustic pop             -9.25
##  7 adult standards         -12.4 
##  8 aesthetic rap            -9.71
##  9 afghan pop              -10.6 
## 10 afrobeats                -7.23
## # ℹ 513 more rows

summary(data_group3)

##     genre           mean_loudness    
##  Length:523         Min.   :-37.841  
##  Class :character   1st Qu.: -9.626  
##  Mode  :character   Median : -7.528  
##                     Mean   : -8.528  
##                     3rd Qu.: -5.728  
##                     Max.   : -0.503

ggplot(data_group3, aes(genre, mean_loudness)) +
  # linewidth replaces size for bar outlines (size is deprecated for lines since ggplot2 3.4.0)
  geom_col(linewidth = 3.5, color = "#3b5998") +
  labs(title = "Mean loudness by genre", x = "genre", y = "mean_loudness") +
  theme(axis.text.x = element_text(angle = 90))

data_group3 <- data_group3 %>%
 mutate(exp_prob3 = (mean_loudness / sum(mean_loudness)))
print(data_group3)

## # A tibble: 523 × 3
##    genre            mean_loudness exp_prob3
##    <chr>                    <dbl>     <dbl>
##  1 a cappella              -37.8    0.00848
##  2 abstract                -11.9    0.00267
##  3 abstract beats           -8.69   0.00195
##  4 abstract hip hop         -6.46   0.00145
##  5 acoustic opm             -8.13   0.00182
##  6 acoustic pop             -9.25   0.00207
##  7 adult standards         -12.4    0.00279
##  8 aesthetic rap            -9.71   0.00218
##  9 afghan pop              -10.6    0.00238
## 10 afrobeats                -7.23   0.00162
## # ℹ 513 more rows

anomaly3 <- data_group3 %>%
  filter(exp_prob3 == min(exp_prob3)) %>%
  pull(genre)

print(anomaly3)

## [1] "j-idol"

data3 <- data %>%
  mutate(anomaly_3 = ifelse(genre == anomaly3, "Anomaly", "Normal"))
print(head(data3$anomaly_3 == "Anomaly") )

## [1] FALSE FALSE FALSE FALSE FALSE FALSE

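In this grouping, the genre “j-idol” has the lowest expected probability, so it is tagged as the anomaly. Every mean_loudness value is negative (the maximum is -0.503), so dividing by the negative sum gives the smallest share to the genre whose mean loudness is closest to zero, that is, the loudest genre on average rather than the quietest.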
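
Because every `mean_loudness` value in the summary is negative, it helps to check how a share of a negative total behaves. The base R sketch below uses made-up loudness values (the names `quiet`, `medium`, `loud` and their numbers are assumptions for illustration):

```r
# Made-up mean loudness values, all negative as in the summary above
loudness <- c(quiet = -30, medium = -10, loud = -2)

# Same formula as the report: value divided by the column total
share <- loudness / sum(loudness)

print(share)

# The smallest share belongs to the value closest to zero ...
print(names(which.min(share)))

# ... while the smallest raw value is the quietest entry
print(names(which.min(loudness)))
```

Dividing negative values by a negative total gives positive shares, but the ordering is reversed relative to the raw values, so the lowest share points at the loudest group.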

Finding Possible and Missing Combinations.

possible_combinations3 <- expand.grid(genre = unique(data$genre),
                                      loudness = unique(data$loudness))
print(head(possible_combinations3))

##                   genre loudness
## 1        ambient guitar  -17.984
## 2               art pop  -17.984
## 3                   edm  -17.984
## 4            bass house  -17.984
## 5 orchestral soundtrack  -17.984
## 6                 alt z  -17.984

missing_combinations3 <- anti_join(possible_combinations3, data, by = c("genre" , "loudness"))
print(head(missing_combinations3))

##                   genre loudness
## 1               art pop  -17.984
## 2                   edm  -17.984
## 3            bass house  -17.984
## 4 orchestral soundtrack  -17.984
## 5                 alt z  -17.984
## 6     alternative metal  -17.984

data_for_combination3 <-head(data,8000)

combination_counts3 <- data_for_combination3 %>%
  group_by(genre, loudness) %>%
  summarise(count = n(), .groups = 'drop')
print(head(combination_counts3))

## # A tibble: 6 × 3
##   genre            loudness count
##   <chr>               <dbl> <int>
## 1 a cappella         -37.8      2
## 2 abstract           -11.9      2
## 3 abstract beats      -8.69     2
## 4 abstract hip hop    -6.46     1
## 5 acoustic opm        -8.13     2
## 6 acoustic pop       -12.6      2

The counts show how often each (genre, loudness) pair occurs in the first 8,000 rows, so we can see how many distinct loudness values each genre has.

Week3

2023-09-10
