Music is the language of the spirit. It opens the secret of life bringing peace, abolishing strife. - Kahlil Gibran

1 Introduction

1.1 Statement of Problems

Songs and music are among the art forms of today’s modern world. With technology, modern music is developing rapidly. Experts group music according to their own criteria; familiar examples are the genres Pop, Rock, and Jazz. However, the clusters of songs and music are not limited to these three genres.

Music has a broad, highly branched spectrum of genres. Listeners can interpret a song differently because their judgments are subjective, based on preferences for the tones, lyrics, and instruments in a piece. However, music can be grouped more objectively by measuring its features and clustering them mathematically.

1.2 Project Goals

This project attempts, first, to cluster a range of songs based on several measured criteria, which yields more objective groupings. Second, it may reveal new groupings that show how pieces of music differ from one another. Finally, we interpret the resulting clusters.

2 Solution

2.1 Data Preparation

library(dplyr)
library(tidyr)
library(GGally)
library(FactoMineR)
library(factoextra)
library(ggplot2)

Import the data and save it as df.

df <- read.csv("SpotifyFeatures.csv")
glimpse(df)

#> Rows: 232,725
#> Columns: 18
#> $ ï..genre         <chr> "Movie", "Movie", "Movie", "Movie", "Movie", "Movie",~
#> $ artist_name      <chr> "Henri Salvador", "Martin & les fÃ©es", "Joseph Willi~
#> $ track_name       <chr> "C'est beau de faire un Show", "Perdu d'avance (par G~
#> $ track_id         <chr> "0BRjO6ga9RKCKjfDqeFgWV", "0BjC1NfoEOOusryehmNudP", "~
#> $ popularity       <int> 0, 1, 3, 0, 4, 0, 2, 15, 0, 10, 0, 2, 4, 3, 0, 0, 0, ~
#> $ acousticness     <dbl> 0.61100, 0.24600, 0.95200, 0.70300, 0.95000, 0.74900,~
#> $ danceability     <dbl> 0.389, 0.590, 0.663, 0.240, 0.331, 0.578, 0.703, 0.41~
#> $ duration_ms      <int> 99373, 137373, 170267, 152427, 82625, 160627, 212293,~
#> $ energy           <dbl> 0.9100, 0.7370, 0.1310, 0.3260, 0.2250, 0.0948, 0.270~
#> $ instrumentalness <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.123~
#> $ key              <chr> "C#", "F#", "C", "C#", "F", "C#", "C#", "F#", "C", "G~
#> $ liveness         <dbl> 0.3460, 0.1510, 0.1030, 0.0985, 0.2020, 0.1070, 0.105~
#> $ loudness         <dbl> -1.828, -5.559, -13.879, -12.178, -21.150, -14.970, -~
#> $ mode             <chr> "Major", "Minor", "Minor", "Major", "Major", "Major",~
#> $ speechiness      <dbl> 0.0525, 0.0868, 0.0362, 0.0395, 0.0456, 0.1430, 0.953~
#> $ tempo            <dbl> 166.969, 174.003, 99.488, 171.758, 140.576, 87.479, 8~
#> $ time_signature   <chr> "4-Apr", "4-Apr", "4-May", "4-Apr", "4-Apr", "4-Apr",~
#> $ valence          <dbl> 0.8140, 0.8160, 0.3680, 0.2270, 0.3900, 0.3580, 0.533~

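# The genre column is imported as "ï..genre" (byte-order-mark debris from the CSV header),
# so a clean copy named genre is created alongside the other factor conversions.
df1 <- df %>% 
  mutate(genre = as.factor(ï..genre),
         artist_name = as.factor(artist_name),
         key = as.factor(key),
         mode = as.factor(mode),
         time_signature = as.factor(time_signature))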

head(df1)

# NA Checking
colSums(is.na(df1))

#>         ï..genre      artist_name       track_name         track_id 
#>                0                0                0                0 
#>       popularity     acousticness     danceability      duration_ms 
#>                0                0                0                0 
#>           energy instrumentalness              key         liveness 
#>                0                0                0                0 
#>         loudness             mode      speechiness            tempo 
#>                0                0                0                0 
#>   time_signature          valence            genre 
#>                0                0                0

There are no NA values in this dataset.

# keep only the numeric columns, dropping popularity and duration_ms

song <- df1 %>% 
  select_if(is.numeric) %>% 
  select(-c(popularity, duration_ms))
glimpse(song)

#> Rows: 232,725
#> Columns: 9
#> $ acousticness     <dbl> 0.61100, 0.24600, 0.95200, 0.70300, 0.95000, 0.74900,~
#> $ danceability     <dbl> 0.389, 0.590, 0.663, 0.240, 0.331, 0.578, 0.703, 0.41~
#> $ energy           <dbl> 0.9100, 0.7370, 0.1310, 0.3260, 0.2250, 0.0948, 0.270~
#> $ instrumentalness <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.123~
#> $ liveness         <dbl> 0.3460, 0.1510, 0.1030, 0.0985, 0.2020, 0.1070, 0.105~
#> $ loudness         <dbl> -1.828, -5.559, -13.879, -12.178, -21.150, -14.970, -~
#> $ speechiness      <dbl> 0.0525, 0.0868, 0.0362, 0.0395, 0.0456, 0.1430, 0.953~
#> $ tempo            <dbl> 166.969, 174.003, 99.488, 171.758, 140.576, 87.479, 8~
#> $ valence          <dbl> 0.8140, 0.8160, 0.3680, 0.2270, 0.3900, 0.3580, 0.533~

Of the original columns, only the numeric ones, excluding popularity and duration_ms, will be used in the PCA and clustering analysis.

2.2 Understand the Data

Before we carry out further preparation and analysis, we should understand the definition of each variable that we use. The following is an explanation of each variable:

Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence that the track is acoustic.
Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: A measure from 0.0 to 1.0 that represents a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
Instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context.
Liveness: The assertion of human presence within a technological network of communication (Sanden 2013).
Loudness: The attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud.
Speechiness: Detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.
Tempo: The overall estimated tempo of a track in beats per minute (BPM).
Valence: A measure from 0.0 to 1.0 describing the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).

3 Exploratory Data Analysis

3.1 Correlation Matrix

ggcorr(song, label = T)

3.2 Summary of the Data

summary(song)

#>   acousticness     danceability        energy          instrumentalness   
#>  Min.   :0.0000   Min.   :0.0569   Min.   :0.0000203   Min.   :0.0000000  
#>  1st Qu.:0.0376   1st Qu.:0.4350   1st Qu.:0.3850000   1st Qu.:0.0000000  
#>  Median :0.2320   Median :0.5710   Median :0.6050000   Median :0.0000443  
#>  Mean   :0.3686   Mean   :0.5544   Mean   :0.5709577   Mean   :0.1483012  
#>  3rd Qu.:0.7220   3rd Qu.:0.6920   3rd Qu.:0.7870000   3rd Qu.:0.0358000  
#>  Max.   :0.9960   Max.   :0.9890   Max.   :0.9990000   Max.   :0.9990000  
#>     liveness          loudness        speechiness         tempo       
#>  Min.   :0.00967   Min.   :-52.457   Min.   :0.0222   Min.   : 30.38  
#>  1st Qu.:0.09740   1st Qu.:-11.771   1st Qu.:0.0367   1st Qu.: 92.96  
#>  Median :0.12800   Median : -7.762   Median :0.0501   Median :115.78  
#>  Mean   :0.21501   Mean   : -9.570   Mean   :0.1208   Mean   :117.67  
#>  3rd Qu.:0.26400   3rd Qu.: -5.501   3rd Qu.:0.1050   3rd Qu.:139.05  
#>  Max.   :1.00000   Max.   :  3.744   Max.   :0.9670   Max.   :242.90  
#>     valence      
#>  Min.   :0.0000  
#>  1st Qu.:0.2370  
#>  Median :0.4440  
#>  Mean   :0.4549  
#>  3rd Qu.:0.6600  
#>  Max.   :1.0000

The data has nine columns measured on different scales. Acousticness, danceability, energy, instrumentalness, liveness, speechiness, and valence share a similar scale, from 0 to 1, whereas loudness and tempo are on different scales.

# Covariance matrix (first six rows)
head(var(song))

#>                  acousticness danceability      energy instrumentalness
#> acousticness      0.125860362 -0.024004549 -0.06781644      0.033958915
#> danceability     -0.024004549  0.034450413  0.01593181     -0.020508345
#> energy           -0.067816440  0.015931805  0.06940883     -0.030227884
#> instrumentalness  0.033958915 -0.020508345 -0.03022788      0.091668682
#> liveness          0.004853762 -0.001534008  0.01007115     -0.008055978
#> loudness         -1.468729115  0.488376611  1.28963126     -0.919510999
#>                      liveness    loudness  speechiness      tempo      valence
#> acousticness      0.004853762 -1.46872911  0.009933929 -2.6116544 -0.030059094
#> danceability     -0.001534008  0.48837661  0.004633400  0.1258228  0.026411285
#> energy            0.010071149  1.28963126  0.007092851  1.8623333  0.029925682
#> instrumentalness -0.008055978 -0.91951100 -0.009950208 -0.9741865 -0.024214148
#> liveness          0.039312018  0.05433307  0.018764818 -0.3146191  0.000608679
#> loudness          0.054333071 35.97844681 -0.002529084 42.3244475  0.623816414

In unsupervised learning, and in Principal Component Analysis in particular, differences in scale strongly affect the results and can lead to a flawed analysis. Variables on a larger scale dominate, while variables on a small scale make almost no visible contribution to the PCA. The covariance matrix above shows this: the variance of loudness (35.98) is far larger than that of any of the 0 to 1 variables. Therefore, before performing PCA, we must standardize the variables using the z-score.
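The effect of scale on PCA can be illustrated with a small sketch. The data below is a made-up toy set (the names `toy`, `small`, and `large` are assumptions for illustration, not part of the Spotify data), and base R's `prcomp()` is used instead of FactoMineR to keep the example self-contained.

```r
# Toy data (hypothetical, not the Spotify dataset): two variables
# on very different scales.
set.seed(1)
toy <- data.frame(
  small = runif(200, min = 0, max = 1),    # 0 to 1 scale, like energy
  large = rnorm(200, mean = 120, sd = 30)  # BPM-like scale, like tempo
)

# PCA on the raw data: PC1 is driven almost entirely by the large-scale variable
pca_raw <- prcomp(toy, center = TRUE, scale. = FALSE)
raw_loadings <- abs(pca_raw$rotation[, "PC1"])

# PCA on z-scores: both variables receive comparable weight
pca_std <- prcomp(toy, center = TRUE, scale. = TRUE)
std_loadings <- abs(pca_std$rotation[, "PC1"])

print(round(raw_loadings, 3))
print(round(std_loadings, 3))
```

Without scaling, the loading of `large` on PC1 is close to 1 simply because its variance is far larger; after scaling, the two variables are weighted equally.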

4 Principal Component Analysis

The scale() function standardizes each column to z-scores. After that, we can run the PCA.

songz <- scale(song)
song_pca <- PCA(X = songz,
                scale.unit = FALSE, # FALSE because the data have already been scaled
                graph = FALSE,
                ncp = 9)
data.frame(song_pca$eig)

The eigenvalues show how much of the total variance each component captures. Comp 1 and Comp 2 capture the most: according to the cumulative percentage, together they account for 56% of the variance. Each later component contributes less. In this case we want to retain at least 85% of the information, so of the nine components we keep the first five. Retaining 85% means that the remaining 15% of the information is discarded.
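The selection rule described above can be sketched in a few lines. The eigenvalues below are made-up illustrative numbers, not the values from `song_pca$eig`; only the arithmetic (share of variance, cumulative share, first component that reaches the threshold) is the point.

```r
# Hypothetical eigenvalues for nine standardized variables (they sum to 9).
# These are NOT the real values; the real ones are in song_pca$eig.
eigenvalues <- c(3.2, 1.8, 1.0, 0.9, 0.8, 0.5, 0.4, 0.3, 0.1)

var_pct <- 100 * eigenvalues / sum(eigenvalues)  # share of variance per component
cum_pct <- cumsum(var_pct)                       # cumulative share

# Smallest number of components whose cumulative share reaches 85%
n_comp <- which(cum_pct >= 85)[1]

print(round(cum_pct, 1))
print(n_comp)
```

With FactoMineR, the same cumulative column is already part of the eigenvalue table, so the rule amounts to finding the first row at or above the chosen threshold.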

4.1 Variables Contribution to Comp

PCA also tells us how much each variable contributes to a particular component. For example, we want to know which variables contribute to Comp 1, the component that captures the most information. The loudness, energy, acousticness, valence, and danceability variables cross the red dashed line, which marks the contribution expected if all variables contributed equally, so they are the most important variables for Comp 1.

fviz_contrib(song_pca,
             choice = "var", # variable contributions
             axes = 1) # which PC to inspect

Change axes to 2 to see the contributions to Comp 2.

fviz_contrib(song_pca,
             choice = "var", # variable contributions
             axes = 2) # which PC to inspect

Only speechiness and liveness exceed the reference line for Comp 2.
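As a rough sketch of what these contribution plots compute: the contribution of a variable to a component is its squared loading expressed as a percentage, and the reference line is the value each variable would have if all contributed equally. The built-in `USArrests` data and base R's `prcomp()` are stand-ins chosen for illustration, not the song data.

```r
# Built-in USArrests data stands in for the song features (illustration only)
pca <- prcomp(USArrests, scale. = TRUE)

# Contribution (%) of each variable to each component: squared loading x 100
contrib <- 100 * pca$rotation^2

# Reference level: the contribution expected if all variables contributed equally
expected <- 100 / ncol(USArrests)

# Variables whose contribution to PC1 exceeds the reference level
above <- names(which(contrib[, "PC1"] > expected))

print(round(contrib[, "PC1"], 1))
print(above)
```

For the nine song features the reference level would be 100 / 9, or about 11.1%.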

4.2 Visualization

A PCA biplot combines two plots in one: the distribution of the individual observations and the variable arrows, which show the correlation between variables and their contribution to each Comp.

song_pca2 <- prcomp(song, scale. = TRUE)

# biplot() draws the plot directly and returns NULL, so its result is not stored
biplot(song_pca2,
       cex = 0.5,
       scale = 0)

4.3 Individual Plot

An individual plot places each observation on a two-dimensional graph. Because the dataset is large, the points are very dense, so little information can be read from the chart below.

plot_pca <- plot.PCA(x = song_pca,
         choix = "ind", 
         select = "contrib 10")
plot_pca

4.4 Variable Plot

To see the variables and their relation to Comp 1 and Comp 2, we use plot.PCA with choix = "var". The graph below shows only the variable arrows, not the individual points. The direction and length of each arrow provide insight.

plot.PCA(x = song_pca,
         choix = "var")

Direction of the arrow relative to the X and Y axes

A nearly horizontal arrow indicates a variable summarized mostly by PC 1, and a nearly vertical arrow indicates a variable summarized mostly by PC 2. PC 1 summarizes most of the information from the energy, valence, and loudness variables, while PC 2 mainly summarizes information from speechiness and liveness. Tempo, acousticness, and instrumentalness contribute less than the other variables.

fviz_pca_var(song_pca, col.var="contrib",
             gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
             repel = TRUE # Avoid text overlapping
             )

Arrow length
The length of an arrow indicates how much the variable contributes to PC 1 or PC 2. For PC 1, acousticness, loudness, and energy contribute significantly, while speechiness and liveness contribute significantly to PC 2.
Angle between arrows
The narrower the angle between two arrows, the higher the positive correlation between the variables. An angle of less than 90 degrees indicates a positive correlation, an angle close to 180 degrees indicates a negative correlation, and an angle close to 90 degrees indicates a weak correlation.

Strong Positive Correlation

danceability, energy, valence, loudness
liveness, speechiness

Strong Negative Correlation

acousticness with danceability, energy, valence, loudness
liveness, speechiness with instrumentalness

Weak/No Correlation

danceability, energy, valence, loudness with liveness, speechiness
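The link between arrows and correlations can be checked numerically. In the sketch below, the built-in `USArrests` data and base R's `prcomp()` are stand-ins for illustration. With all components, the dot products of the variable arrows reproduce the correlation matrix exactly; the two-dimensional plot keeps only the first two coordinates, so the angles read from it are an approximation.

```r
pca <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates: loadings scaled by the component standard deviations.
# Each entry is the correlation between a variable and a component.
var_coord <- sweep(pca$rotation, 2, pca$sdev, `*`)

# Using all components, the dot products of the arrows give the correlation matrix
full_cor <- tcrossprod(var_coord)

# Using only PC1 and PC2 (what the variable plot shows) gives an approximation
approx_cor <- tcrossprod(var_coord[, 1:2])

print(round(cor(USArrests), 2))
print(round(approx_cor, 2))
```

The approximation is best for variables whose arrows are long, that is, variables well represented by the first two components; angles involving short arrows should be read with caution.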

5 Clustering with K-Means

5.1 Subsetting Components

head(song_pca$ind$coord[,1:5])

#>       Dim.1       Dim.2      Dim.3     Dim.4       Dim.5
#> 1  1.536645  0.01776867  1.2923246 1.1526013  0.17904537
#> 2  1.733206 -0.72461861  0.4588937 1.5785938  0.08757924
#> 3 -1.718394 -0.10520193 -1.5885322 0.5148125 -1.14236750
#> 4 -1.525497 -0.79869202  1.5463158 1.3008236 -1.40469333
#> 5 -2.596386 -0.06531696  0.2101561 1.4183869 -0.68891197
#> 6 -1.827470  0.23731037 -1.4573221 0.0958377 -1.08928900

song_kmeans <- as.data.frame(song_pca$ind$coord[,1:5])

5.2 Defining Optimum K or Groups

The optimal K is usually determined with the Elbow method. In this project, however, given the size of the data and the limited computational capacity, the usual helper functions for the Elbow method are not practical. The code below applies the same method by computing the within-cluster sum of squares directly for 1 to 15 clusters.

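mydata <- song_kmeans
# k = 1: the within sum of squares equals the total sum of squares
wss <- (nrow(mydata) - 1) * sum(apply(mydata, 2, var))
# k = 2 to 15: total within-cluster sum of squares from k-means
for (i in 2:15) {
  wss[i] <- sum(kmeans(mydata, centers = i)$withinss)
}
plot(1:15, wss, type = "b", xlab = "Number of Clusters",
     ylab = "Within groups sum of squares")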

Based on the plot above, 5 clusters appears to be a reasonable choice. Therefore, we will use 5 clusters.
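When computation is the constraint, one possible variation is to run the Elbow method on a random subsample. The sketch below is an assumption-laden illustration rather than the procedure used in this report: the data is a made-up set with three well-separated groups, and the subsample size (500), `nstart`, and `iter.max` values are arbitrary choices.

```r
set.seed(42)
# Hypothetical stand-in for song_kmeans: three well-separated groups in 2 dimensions
toy <- rbind(
  matrix(rnorm(2000, mean = 0, sd = 0.5), ncol = 2),
  matrix(rnorm(2000, mean = 5, sd = 0.5), ncol = 2),
  matrix(rnorm(2000, mean = 10, sd = 0.5), ncol = 2)
)

# Work on a random subsample to keep k-means cheap on large data
idx <- sample(nrow(toy), size = 500)
sub <- toy[idx, ]

# Total within-cluster sum of squares for k = 1 to 8
wss <- sapply(1:8, function(k) {
  kmeans(sub, centers = k, nstart = 10, iter.max = 50)$tot.withinss
})

plot(1:8, wss, type = "b",
     xlab = "Number of clusters", ylab = "Total within-cluster sum of squares")
```

In this toy case the curve drops sharply up to k = 3 and then flattens, which is the "elbow". Using several random starts (`nstart`) makes each point on the curve less dependent on the initial centers.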

5.3 K-Means Fitting

Model Fitting

# Model Fitting
set.seed(255) # Set seed

song_cluster <- kmeans(song_kmeans,
                       centers = 5)
# Cluster size
song_cluster$size # 5 clusters

#> [1] 65189 44759 84123 10825 27829

song_cluster$tot.withinss # Goodness of Fit

#> [1] 696425.8

The total within-cluster sum of squares is high, so we can expect the clusters to overlap considerably.
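Because the raw within-cluster sum of squares grows with the number of observations, a scale-free complement is the ratio of the between-cluster to the total sum of squares, both of which are stored in the object returned by `kmeans()`. The sketch below uses made-up two-group data, not the Spotify data, to show the calculation.

```r
set.seed(255)
# Hypothetical data: two groups in 2 dimensions (not the Spotify data)
toy <- rbind(
  matrix(rnorm(400, mean = 0), ncol = 2),
  matrix(rnorm(400, mean = 4), ncol = 2)
)
fit <- kmeans(toy, centers = 2, nstart = 10)

# totss = tot.withinss + betweenss, so the ratio lies between 0 and 1
ratio <- fit$betweenss / fit$totss
print(round(ratio, 3))
```

A ratio close to 1 means most of the variation lies between clusters (good separation). For the model above, the same quantity would be `song_cluster$betweenss / song_cluster$totss`.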

5.4 Visualize Clusters

fviz_cluster(object = song_cluster,
             data = song,
             geom = "point",
             ellipse = T) # numeric feature only

6 Conclusion

The K-Means clusters still overlap and are not strictly separated. For example, clusters 1, 2, and 3 share overlapping data points, while clusters 4 and 5 are separated from each other.

What this clustering suggests is that music and its genres span a broad spectrum. With measurements like the ones used here, it remains difficult to separate one type of music firmly from another. In other words, the pieces of music within one group are very likely to have different characteristics.

Using PCA, we can see the relationships between the variables in our data and examine their correlations and contributions. Combined with the K-Means algorithm, this lets us cluster our music data.

6.1 Clusters Interpretation

After performing PCA and fitting the clusters with K-Means, we can label the original data with the cluster assignments. We then interpret each cluster by looking at the average value of each feature.

# assign new label to the previous data
song$group <- song_cluster$cluster

# song profiling
song %>% 
  group_by(group) %>% 
  summarise_all(mean)

We take the mean of each measurement per group and use it to build an interpretation.

Group 1 - “Playlist at Bar”
Music in this group has the lowest acousticness and speechiness, relatively high danceability, and the highest energy and instrumentalness of all groups. Its liveness is low, its loudness and valence are high, and its tempo is the fastest. This is music that releases energy and makes the listener dance. The very fast tempo makes it suitable for bars. Examples are Disco and EDM.
Group 2 - “Dance Music”
This group has characteristics similar to Group 1. It has high acousticness and danceability, medium valence, a fast tempo, and low energy, instrumentalness, liveness, and loudness. High acousticness and danceability dominate this group. An example is dance music.
Group 3 - “Listen to This to Lift Your Focus”
Group 3, on the other hand, differs significantly from the previous two groups. It has the highest valence and a slower tempo. Its danceability, energy, instrumentalness, loudness, and speechiness are at a medium level, and its acousticness and liveness are low. Group 3 is “positivity” music that has a positive effect on the listener. It can be used to calm the mind or improve focus.
Group 4 - “Pop Music”
This music is predominantly acoustic but has the highest danceability, energy, and speechiness, with low instrumentalness, liveness, and loudness. It has a medium valence and a relatively slow tempo. An example of this type of music is pop-jazz.
Group 5 - “Acoustic-Instrumentalism”
Group 5 consists of instrumental acoustic music dominated by instrument sounds. Accordingly, it has the slowest tempo and the lowest speechiness, valence, loudness, liveness, energy, and danceability. This is pure instrumental acoustic music.

6.2 References

https://www.routledge.com/Liveness-in-Modern-Music-Musicians-Technology-and-the-Perception-of-Performance/Sanden/p/book/9781138107977
https://www.theverge.com/tldr/2018/2/5/16974194/spotify-recommendation-algorithm-playlist-hack-nelson

Spotify Music Categorization using K-Means Clustering

M Asadullah Al Ghozi

5/20/2021