Music is the language of the spirit. It opens the secret of life bringing peace, abolishing strife. - Kahlil Gibran
Songs and music are among the art forms of the modern world, and with technology, modern music is proliferating. Many experts group music according to their own criteria; for example, we know the genres Pop, Rock, and Jazz. In practice, however, the clusters of songs and music are not limited to these three genres.
In its development, music has a spectrum and variety of genres that branch widely. A song or piece of music can be interpreted differently by each listener because of subjective judgments based on preferences for the tones, lyrics, and instruments in it. However, it turns out that music grouping can be done more objectively through measurement and clustering in a mathematical way.
This project attempts to, first, cluster various examples of songs and music based on several measured criteria. By clustering, we can form more objective groups. Second, we may even produce new groupings that show how pieces of music differ from one another. Last, at the end of this project, we will interpret those clusters.
library(dplyr)
library(tidyr)
library(GGally)
library(FactoMineR)
library(factoextra)
library(ggplot2)
Import the data and save it as df.
df <- read.csv("SpotifyFeatures.csv")
glimpse(df)
#> Rows: 232,725
#> Columns: 18
#> $ ï..genre <chr> "Movie", "Movie", "Movie", "Movie", "Movie", "Movie",~
#> $ artist_name <chr> "Henri Salvador", "Martin & les fées", "Joseph Willi~
#> $ track_name <chr> "C'est beau de faire un Show", "Perdu d'avance (par G~
#> $ track_id <chr> "0BRjO6ga9RKCKjfDqeFgWV", "0BjC1NfoEOOusryehmNudP", "~
#> $ popularity <int> 0, 1, 3, 0, 4, 0, 2, 15, 0, 10, 0, 2, 4, 3, 0, 0, 0, ~
#> $ acousticness <dbl> 0.61100, 0.24600, 0.95200, 0.70300, 0.95000, 0.74900,~
#> $ danceability <dbl> 0.389, 0.590, 0.663, 0.240, 0.331, 0.578, 0.703, 0.41~
#> $ duration_ms <int> 99373, 137373, 170267, 152427, 82625, 160627, 212293,~
#> $ energy <dbl> 0.9100, 0.7370, 0.1310, 0.3260, 0.2250, 0.0948, 0.270~
#> $ instrumentalness <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.123~
#> $ key <chr> "C#", "F#", "C", "C#", "F", "C#", "C#", "F#", "C", "G~
#> $ liveness <dbl> 0.3460, 0.1510, 0.1030, 0.0985, 0.2020, 0.1070, 0.105~
#> $ loudness <dbl> -1.828, -5.559, -13.879, -12.178, -21.150, -14.970, -~
#> $ mode <chr> "Major", "Minor", "Minor", "Major", "Major", "Major",~
#> $ speechiness <dbl> 0.0525, 0.0868, 0.0362, 0.0395, 0.0456, 0.1430, 0.953~
#> $ tempo <dbl> 166.969, 174.003, 99.488, 171.758, 140.576, 87.479, 8~
#> $ time_signature <chr> "4-Apr", "4-Apr", "4-May", "4-Apr", "4-Apr", "4-Apr",~
#> $ valence <dbl> 0.8140, 0.8160, 0.3680, 0.2270, 0.3900, 0.3580, 0.533~
df1 <- df %>%
mutate(genre = as.factor(ï..genre),
artist_name = as.factor(artist_name),
key = as.factor(key),
mode = as.factor(mode),
time_signature = as.factor(time_signature))
head(df1)
# NA Checking
colSums(is.na(df1))
#> ï..genre artist_name track_name track_id
#> 0 0 0 0
#> popularity acousticness danceability duration_ms
#> 0 0 0 0
#> energy instrumentalness key liveness
#> 0 0 0 0
#> loudness mode speechiness tempo
#> 0 0 0 0
#> time_signature valence genre
#> 0 0 0
There are no NA values in this dataset.
# take only the numeric columns
song <- df1 %>%
select_if(is.numeric) %>%
select(-c(popularity, duration_ms))
glimpse(song)
#> Rows: 232,725
#> Columns: 9
#> $ acousticness <dbl> 0.61100, 0.24600, 0.95200, 0.70300, 0.95000, 0.74900,~
#> $ danceability <dbl> 0.389, 0.590, 0.663, 0.240, 0.331, 0.578, 0.703, 0.41~
#> $ energy <dbl> 0.9100, 0.7370, 0.1310, 0.3260, 0.2250, 0.0948, 0.270~
#> $ instrumentalness <dbl> 0.00000000, 0.00000000, 0.00000000, 0.00000000, 0.123~
#> $ liveness <dbl> 0.3460, 0.1510, 0.1030, 0.0985, 0.2020, 0.1070, 0.105~
#> $ loudness <dbl> -1.828, -5.559, -13.879, -12.178, -21.150, -14.970, -~
#> $ speechiness <dbl> 0.0525, 0.0868, 0.0362, 0.0395, 0.0456, 0.1430, 0.953~
#> $ tempo <dbl> 166.969, 174.003, 99.488, 171.758, 140.576, 87.479, 8~
#> $ valence <dbl> 0.8140, 0.8160, 0.3680, 0.2270, 0.3900, 0.3580, 0.533~
Of the many columns in the original dataset, only the numeric columns (excluding the categorical ones, as well as popularity and duration_ms) will be used in the PCA and clustering analysis.
Before we carry out further preparation and analysis, we should understand the definition of each variable that we use. The following is an explanation of each variable:
Acousticness: A confidence measure from 0.0 to 1.0 of whether the track is acoustic. 1.0 represents high confidence the track is acoustic.
Danceability: Describes how suitable a track is for dancing based on a combination of musical elements including tempo, rhythm stability, beat strength, and overall regularity. A value of 0.0 is least danceable and 1.0 is most danceable.
Energy: A measure from 0.0 to 1.0 representing a perceptual measure of intensity and activity. Typically, energetic tracks feel fast, loud, and noisy.
Instrumentalness: Predicts whether a track contains no vocals. “Ooh” and “aah” sounds are treated as instrumental in this context.
Liveness: An assertion of human presence within a technological network of communication ([Sanden 2013](https://www.routledge.com/Liveness-in-Modern-Music-Musicians-Technology-and-the-Perception-of-Performance/Sanden/p/book/9781138107977)).
Loudness: That attribute of auditory sensation in terms of which sounds can be ordered on a scale extending from quiet to loud.
Speechiness: Speechiness detects the presence of spoken words in a track. The more exclusively speech-like the recording (e.g. talk show, audio book, poetry), the closer to 1.0 the attribute value.
Tempo: The overall estimated tempo of a track in beats per minute (BPM).
Valence: Describes the musical positiveness conveyed by a track. Tracks with high valence sound more positive (e.g. happy, cheerful, euphoric), while tracks with low valence sound more negative (e.g. sad, depressed, angry).
ggcorr(song, label = T)
summary(song)
#> acousticness danceability energy instrumentalness
#> Min. :0.0000 Min. :0.0569 Min. :0.0000203 Min. :0.0000000
#> 1st Qu.:0.0376 1st Qu.:0.4350 1st Qu.:0.3850000 1st Qu.:0.0000000
#> Median :0.2320 Median :0.5710 Median :0.6050000 Median :0.0000443
#> Mean :0.3686 Mean :0.5544 Mean :0.5709577 Mean :0.1483012
#> 3rd Qu.:0.7220 3rd Qu.:0.6920 3rd Qu.:0.7870000 3rd Qu.:0.0358000
#> Max. :0.9960 Max. :0.9890 Max. :0.9990000 Max. :0.9990000
#> liveness loudness speechiness tempo
#> Min. :0.00967 Min. :-52.457 Min. :0.0222 Min. : 30.38
#> 1st Qu.:0.09740 1st Qu.:-11.771 1st Qu.:0.0367 1st Qu.: 92.96
#> Median :0.12800 Median : -7.762 Median :0.0501 Median :115.78
#> Mean :0.21501 Mean : -9.570 Mean :0.1208 Mean :117.67
#> 3rd Qu.:0.26400 3rd Qu.: -5.501 3rd Qu.:0.1050 3rd Qu.:139.05
#> Max. :1.00000 Max. : 3.744 Max. :0.9670 Max. :242.90
#> valence
#> Min. :0.0000
#> 1st Qu.:0.2370
#> Median :0.4440
#> Mean :0.4549
#> 3rd Qu.:0.6600
#> Max. :1.0000
The data used has nine columns with different rating scales. Acousticness, danceability, energy, instrumentalness, liveness, speechiness, and valence share a similar scale, namely 0–1, while the loudness and tempo columns are on different scales.
# Covariance matrix
head(var(song))
#> acousticness danceability energy instrumentalness
#> acousticness 0.125860362 -0.024004549 -0.06781644 0.033958915
#> danceability -0.024004549 0.034450413 0.01593181 -0.020508345
#> energy -0.067816440 0.015931805 0.06940883 -0.030227884
#> instrumentalness 0.033958915 -0.020508345 -0.03022788 0.091668682
#> liveness 0.004853762 -0.001534008 0.01007115 -0.008055978
#> loudness -1.468729115 0.488376611 1.28963126 -0.919510999
#> liveness loudness speechiness tempo valence
#> acousticness 0.004853762 -1.46872911 0.009933929 -2.6116544 -0.030059094
#> danceability -0.001534008 0.48837661 0.004633400 0.1258228 0.026411285
#> energy 0.010071149 1.28963126 0.007092851 1.8623333 0.029925682
#> instrumentalness -0.008055978 -0.91951100 -0.009950208 -0.9741865 -0.024214148
#> liveness 0.039312018 0.05433307 0.018764818 -0.3146191 0.000608679
#> loudness 0.054333071 35.97844681 -0.002529084 42.3244475 0.623816414
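Since head() shows only the first six rows of the covariance matrix, extracting just its diagonal (the raw variances) makes the scale gap explicit: loudness alone has a variance of about 36, while the 0–1 features all stay below 0.13. A minimal sketch:
# raw variances sit on the diagonal of the covariance matrix
round(diag(var(song)), 3)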
In unsupervised learning, and specifically in Principal Component Analysis, differences in scale will significantly affect the PCA results and lead to a flawed analysis. Variables on a larger scale will have more influence, while variables on a small scale will contribute almost nothing visible to the PCA. Therefore, before performing PCA, we must standardize the data using the z-score.
The scale() function rescales the data using the z-score. After that, we can perform the PCA.
songz <- scale(song)
song_pca <- PCA(X = songz,
scale.unit = F, # scale.unit = FALSE because scaling was already applied above
graph = F,
ncp = 9)
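As a quick sanity check on the scaling step, every column of songz should now have a mean of approximately zero and a standard deviation of one; a minimal sketch:
# z-scored columns: mean ~0, sd = 1
round(colMeans(songz), 3)
apply(songz, 2, sd)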
data.frame(song_pca$eig)
By looking at the eigenvalues of the PCA, we can see how much each component contributes. Comp 1 and Comp 2 capture the most information of all the components: from the cumulative percentage, together they capture about 56% of the variance. Each additional component contributes less than the one before. In this case, we want to retain at least 85% of the information, so of the nine components we keep the first five. Retaining 85% of the information means losing the 15% that is not retrieved.
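This cutoff can also be computed programmatically from the eigenvalue table that FactoMineR returns; a minimal sketch:
# smallest number of components reaching 85% cumulative variance
which(song_pca$eig[, "cumulative percentage of variance"] >= 85)[1]
This should return 5 here, matching the choice above.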
PCA also lets us examine the contribution of each variable to a particular component. For example, suppose we want to know which variables contribute to Comp 1 (the component that captures the most information). We find that the loudness, energy, acousticness, valence, and danceability variables (those crossing the red line) are the significant ones.
fviz_contrib(song_pca,
             choice = "var", # variable contributions
             axes = 1)       # which principal component to show
Substitute axes = 2 to see the contributions to Comp 2.
fviz_contrib(song_pca,
             choice = "var", # variable contributions
             axes = 2)       # which principal component to show
Only speechiness and liveness appear to be significant.
A PCA is commonly visualized with a biplot, which overlays the distribution of the individual observations with the correlations between variables and their contributions to the components.
song_pca2 <- prcomp(song, scale = T)
biplot(song_pca2,
       cex = 0.5,
       scale = F)
The individual plot maps each observation onto a two-dimensional graph. Due to the large amount of data, we get a graph with very dense points, so less information can be obtained from the chart below.
plot_pca <- plot.PCA(x = song_pca,
                     choix = "ind",
                     select = "contrib 10")
plot_pca
To see the variables and their relation to Comp 1 and Comp 2, we use plot.PCA() with choix = "var". The graph below does not show the distribution of points, only the variable arrows. From the direction and length of each arrow, we can draw insight.
plot.PCA(x = song_pca,
         choix = "var")
A nearly horizontal or vertical arrow indicates that the variable is summarized mostly by PC 1 or PC 2, respectively. PC 1 summarizes the most information from the energy, valence, and loudness variables, while PC 2 mainly summarizes information from speechiness and liveness. Tempo, acousticness, and instrumentalness contribute less than the other variables.
fviz_pca_var(song_pca, col.var="contrib",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE # Avoid text overlapping
)
Arrow length
The length of an arrow indicates the magnitude of that variable's contribution to PC 1 or PC 2. For PC 1, acousticness, loudness, and energy contribute significantly, while speechiness and liveness contribute significantly to PC 2.
Angle between arrows
The narrower the angle between two arrows, the higher the correlation between the variables. An angle of less than 90 degrees indicates a positive correlation, an angle closer to 180 degrees indicates a negative correlation, and an angle close to 90 degrees indicates a weak correlation.
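As a quick sanity check on this angle rule, we can compare the arrows against the raw correlations. For instance, energy and loudness (a positive covariance in the matrix above) should show a narrow angle, while energy and acousticness (a negative covariance) should point in roughly opposite directions:
# narrow angle -> strong positive correlation
cor(song$energy, song$loudness)
# near-opposite arrows -> negative correlation
cor(song$energy, song$acousticness)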
head(song_pca$ind$coord[,1:5])
#> Dim.1 Dim.2 Dim.3 Dim.4 Dim.5
#> 1 1.536645 0.01776867 1.2923246 1.1526013 0.17904537
#> 2 1.733206 -0.72461861 0.4588937 1.5785938 0.08757924
#> 3 -1.718394 -0.10520193 -1.5885322 0.5148125 -1.14236750
#> 4 -1.525497 -0.79869202 1.5463158 1.3008236 -1.40469333
#> 5 -2.596386 -0.06531696 0.2101561 1.4183869 -0.68891197
#> 6 -1.827470 0.23731037 -1.4573221 0.0958377 -1.08928900
song_kmeans <- as.data.frame(song_pca$ind$coord[,1:5])
The optimal K is usually determined with the Elbow method. However, given the size of this data and our limited computational capacity, the usual helper functions for the Elbow method are impractical here. The code below finds the optimal number of clusters with the Elbow method using a more manual approach.
mydata <- song_kmeans
# within-group sum of squares for k = 1 (all data in one cluster)
wss <- (nrow(mydata)-1)*sum(apply(mydata,2,var))
# recompute the total within-group sum of squares for k = 2..15
set.seed(255) # fixed seed so the k-means runs are reproducible
for (i in 2:15) wss[i] <- sum(kmeans(mydata,
                                     centers=i)$withinss)
plot(1:15, wss, type="b", xlab="Number of Clusters",
     ylab="Within groups sum of squares")
Based on the visualization obtained above, five clusters already look optimal, so we will use 5 clusters.
Model Fitting
# Model Fitting
set.seed(255) # Set seed
song_cluster <- kmeans(song_kmeans,
centers = 5)
# Cluster size
song_cluster$size # 5 clusters
#> [1] 65189 44759 84123 10825 27829
song_cluster$tot.withinss # Goodness of Fit
#> [1] 696425.8
The Within Sum of Squares score is high. Thus, we might expect that the clustering still contains many overlapping data points.
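Besides tot.withinss, another common goodness-of-fit summary for K-Means (a minimal sketch) is the ratio of the between-cluster sum of squares to the total sum of squares; the closer to 1, the better separated the clusters:
# proportion of total variation captured by the cluster assignment
song_cluster$betweenss / song_cluster$totss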
fviz_cluster(object = song_cluster,
             data = song,
             geom = "point",
             ellipse = T) # numeric features only
The clustering produced by K-Means still seems to contain data points that overlap each other rather than being strictly separated. For example, clusters 1, 2, and 3 have data points that still overlap, while clusters 4 and 5 are separated from each other.
What can be interpreted from clustering this music data is that music and its genres span a broad spectrum. Using measurements like those in this case, it is still difficult to firmly classify and separate one type of music from another. In other words, every piece of music within a group is very likely to have different characteristics.
Using PCA, we can see the relationships between the variables in our data and determine their correlations and contributions. Combined with the K-Means clustering algorithm, we can generate a clustering of our music data.
After performing the PCA and building clusters with K-Means, we can label the original data. Afterward, we will interpret the groupings that K-Means produced by looking at the average value of each column or feature per group.
# assign new label to the previous data
song$group <- song_cluster$cluster
# song profiling
song %>%
  group_by(group) %>%
  summarise_all(mean)
We can take the mean of each measurement per group and then generate an interpretation.
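To make the profiling below easier to derive, a small helper can flag which cluster has the highest and lowest mean for each feature. This sketch (the profile object and the max/min table are additions, not part of the original analysis) relies on group_by() sorting the groups 1 through 5, so the row index returned by which.max is the cluster number:
# store the per-cluster feature means
profile <- song %>%
  group_by(group) %>%
  summarise_all(mean)
# for each feature, the cluster with the highest and lowest mean
data.frame(feature   = names(profile)[-1],
           max_group = apply(profile[-1], 2, which.max),
           min_group = apply(profile[-1], 2, which.min))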
Group 1 - “Playlist at Bar”
Music in this category has the lowest acousticness and speechiness, relatively high danceability, and the highest energy and instrumentalness compared to the others. Liveness is low, loudness and valence are high, and the tempo is the fastest. This is the type of music that releases energy and makes the listener dance, and the very high tempo makes it suitable for bars. Examples of this category are Disco and EDM.
Group 2 - “Dance Music”
This group has similar characteristics to Group 1. Its music has high acousticness and danceability, medium valence, a fast tempo, and low energy, instrumentalness, liveness, and loudness. High acousticness and danceability dominate this music. An example of this category is dance music.
Group 3 - “Listen to This to Lift Up Your Focus”
On the other hand, Group 3 differs significantly from the previous two groups. It has the highest valence, with a slower tempo. In terms of danceability, energy, instrumentalness, loudness, and speechiness, this music sits at a medium level, while its acousticness and liveness are low. Group 3 is a type of “positivity” music that has a positive effect on the listener; this category can be used to calm the mind or improve focus.
Group 4 - “Pop Music”
This music is predominantly acoustic but has the highest danceability, energy, and speechiness, with low instrumentalness, liveness, and loudness. It has a medium valence and a relatively slow tempo. An example of this type of music is pop-jazz.
Group 5 - “Acoustic-Instrumentalism”
Group 5 is instrumental acoustic music, dominated by instrument sounds. In line with that character, it has the slowest tempo and the lowest speechiness, valence, loudness, liveness, energy, and danceability of all the groups.