Spotify is a Swedish audio streaming and media services provider founded on 23rd April 2006 by Daniel Ek and Martin Lorentzon.
It has substantially changed the music industry’s operating model: instead of buying tapes and CDs, listeners can play whatever music they want, whenever and wherever they want, on their smartphone or tablet.
Spotify learns about its listeners’ habits by asking questions when they first sign up, such as their favorite music genres, and uses machine learning algorithms to recommend songs daily and weekly. It also compiles each user’s listening history to determine their preferences.
An interesting question is how Spotify sorts songs into broad genres. What characterizes each genre, and how can those characteristics be used to categorize songs? This project explores these questions in more depth.
In the data, each song has 12 audio features and is assigned to one of 6 broad genres and one of 24 subgenres. The following sections concentrate on these 14 variables.
The purposes of this project are to:
understand the relationships between different features
identify patterns in audio characteristics across genres
understand which features make a song popular
To fulfill those goals, we will:
check the correlation between different features
Learning Correlation between Features
library(tibble) : used to create tibbles
library(tidyr) : used to tidy up data
library(prettydoc) : document themes for R Markdown
library(DT) : used to display R data objects (matrices or data frames) as tables on HTML pages
library(lubridate) : used for date/time functions
library(magrittr) : used for piping
library(ggplot2) : used for data visualization
library(dplyr) : used for data manipulation
library(corrplot) : used to display correlation matrices and confidence intervals
library(tm) : used for text mining the “Genre” column
library(treemap) : used for treemap plots
library(factoextra) : used to visualize clusters
library(ggpubr) : used to arrange multiple plots in a grid
library(plotly) : used for interactive plots
In this section, we’ll go over the procedures for preparing data for analysis.
We use the dataset provided in the curriculum.
The data used to analyze songs played on Spotify was obtained via the spotifyr package. This package was created by Charlie Thompson, Josiah Parry, Donal Phipps, and Tom Wolff to make it easier to acquire your own data or generic metadata from Spotify’s API. Check out the spotifyr package webpage to learn how to collect your own data!
# Get the Data
library(tibble)
url <- "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2020/2020-01-21/spotify_songs.csv"
# as_tibble() replaces the deprecated as_data_frame()
spotify_df <- as_tibble(read.csv(url, stringsAsFactors = FALSE))
head(spotify_df)
## # A tibble: 6 x 23
## track_id track_name track_artist track_popularity track_album_id
## <chr> <chr> <chr> <int> <chr>
## 1 6f807x0ima9a1j3VPbc7VN I Don't C~ Ed Sheeran 66 2oCs0DGTsRO98~
## 2 0r7CVbZTWZgbTCYdfa2P31 Memories ~ Maroon 5 67 63rPSO264uRjW~
## 3 1z1Hg7Vb0AhHDiEmnDE79l All the T~ Zara Larsson 70 1HoSmj2eLcsrR~
## 4 75FpbthrwQmzHlBJLuGdC7 Call You ~ The Chainsm~ 60 1nqYsOef1yKKu~
## 5 1e8PAfcKUYoKkxPhrHqw4x Someone Y~ Lewis Capal~ 69 7m7vv9wlQ4i0L~
## 6 7fvUMiyapMsRRxr07cU8Ef Beautiful~ Ed Sheeran 67 2yiy9cd2QktrN~
## # ... with 18 more variables: track_album_name <chr>,
## # track_album_release_date <chr>, playlist_name <chr>, playlist_id <chr>,
## # playlist_genre <chr>, playlist_subgenre <chr>, danceability <dbl>,
## # energy <dbl>, key <int>, loudness <dbl>, mode <int>, speechiness <dbl>,
## # acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
## # tempo <dbl>, duration_ms <int>
dim(spotify_df)
## [1] 32833 23
str(spotify_df)
## tibble [32,833 x 23] (S3: tbl_df/tbl/data.frame)
## $ track_id : chr [1:32833] "6f807x0ima9a1j3VPbc7VN" "0r7CVbZTWZgbTCYdfa2P31" "1z1Hg7Vb0AhHDiEmnDE79l" "75FpbthrwQmzHlBJLuGdC7" ...
## $ track_name : chr [1:32833] "I Don't Care (with Justin Bieber) - Loud Luxury Remix" "Memories - Dillon Francis Remix" "All the Time - Don Diablo Remix" "Call You Mine - Keanu Silva Remix" ...
## $ track_artist : chr [1:32833] "Ed Sheeran" "Maroon 5" "Zara Larsson" "The Chainsmokers" ...
## $ track_popularity : int [1:32833] 66 67 70 60 69 67 62 69 68 67 ...
## $ track_album_id : chr [1:32833] "2oCs0DGTsRO98Gh5ZSl2Cx" "63rPSO264uRjW1X5E6cWv6" "1HoSmj2eLcsrR0vE9gThr4" "1nqYsOef1yKKuGOVchbsk6" ...
## $ track_album_name : chr [1:32833] "I Don't Care (with Justin Bieber) [Loud Luxury Remix]" "Memories (Dillon Francis Remix)" "All the Time (Don Diablo Remix)" "Call You Mine - The Remixes" ...
## $ track_album_release_date: chr [1:32833] "2019-06-14" "2019-12-13" "2019-07-05" "2019-07-19" ...
## $ playlist_name : chr [1:32833] "Pop Remix" "Pop Remix" "Pop Remix" "Pop Remix" ...
## $ playlist_id : chr [1:32833] "37i9dQZF1DXcZDD7cfEKhW" "37i9dQZF1DXcZDD7cfEKhW" "37i9dQZF1DXcZDD7cfEKhW" "37i9dQZF1DXcZDD7cfEKhW" ...
## $ playlist_genre : chr [1:32833] "pop" "pop" "pop" "pop" ...
## $ playlist_subgenre : chr [1:32833] "dance pop" "dance pop" "dance pop" "dance pop" ...
## $ danceability : num [1:32833] 0.748 0.726 0.675 0.718 0.65 0.675 0.449 0.542 0.594 0.642 ...
## $ energy : num [1:32833] 0.916 0.815 0.931 0.93 0.833 0.919 0.856 0.903 0.935 0.818 ...
## $ key : int [1:32833] 6 11 1 7 1 8 5 4 8 2 ...
## $ loudness : num [1:32833] -2.63 -4.97 -3.43 -3.78 -4.67 ...
## $ mode : int [1:32833] 1 1 0 1 1 1 0 0 1 1 ...
## $ speechiness : num [1:32833] 0.0583 0.0373 0.0742 0.102 0.0359 0.127 0.0623 0.0434 0.0565 0.032 ...
## $ acousticness : num [1:32833] 0.102 0.0724 0.0794 0.0287 0.0803 0.0799 0.187 0.0335 0.0249 0.0567 ...
## $ instrumentalness : num [1:32833] 0.00 4.21e-03 2.33e-05 9.43e-06 0.00 0.00 0.00 4.83e-06 3.97e-06 0.00 ...
## $ liveness : num [1:32833] 0.0653 0.357 0.11 0.204 0.0833 0.143 0.176 0.111 0.637 0.0919 ...
## $ valence : num [1:32833] 0.518 0.693 0.613 0.277 0.725 0.585 0.152 0.367 0.366 0.59 ...
## $ tempo : num [1:32833] 122 100 124 122 124 ...
## $ duration_ms : int [1:32833] 194754 162600 176616 169093 189052 163049 187675 207619 193187 253040 ...
names(spotify_df)
## [1] "track_id" "track_name"
## [3] "track_artist" "track_popularity"
## [5] "track_album_id" "track_album_name"
## [7] "track_album_release_date" "playlist_name"
## [9] "playlist_id" "playlist_genre"
## [11] "playlist_subgenre" "danceability"
## [13] "energy" "key"
## [15] "loudness" "mode"
## [17] "speechiness" "acousticness"
## [19] "instrumentalness" "liveness"
## [21] "valence" "tempo"
## [23] "duration_ms"
summary(spotify_df)
## track_id track_name track_artist track_popularity
## Length:32833 Length:32833 Length:32833 Min. : 0.00
## Class :character Class :character Class :character 1st Qu.: 24.00
## Mode :character Mode :character Mode :character Median : 45.00
## Mean : 42.48
## 3rd Qu.: 62.00
## Max. :100.00
## track_album_id track_album_name track_album_release_date
## Length:32833 Length:32833 Length:32833
## Class :character Class :character Class :character
## Mode :character Mode :character Mode :character
##
##
##
## playlist_name playlist_id playlist_genre playlist_subgenre
## Length:32833 Length:32833 Length:32833 Length:32833
## Class :character Class :character Class :character Class :character
## Mode :character Mode :character Mode :character Mode :character
##
##
##
## danceability energy key loudness
## Min. :0.0000 Min. :0.000175 Min. : 0.000 Min. :-46.448
## 1st Qu.:0.5630 1st Qu.:0.581000 1st Qu.: 2.000 1st Qu.: -8.171
## Median :0.6720 Median :0.721000 Median : 6.000 Median : -6.166
## Mean :0.6548 Mean :0.698619 Mean : 5.374 Mean : -6.720
## 3rd Qu.:0.7610 3rd Qu.:0.840000 3rd Qu.: 9.000 3rd Qu.: -4.645
## Max. :0.9830 Max. :1.000000 Max. :11.000 Max. : 1.275
## mode speechiness acousticness instrumentalness
## Min. :0.0000 Min. :0.0000 Min. :0.0000 Min. :0.0000000
## 1st Qu.:0.0000 1st Qu.:0.0410 1st Qu.:0.0151 1st Qu.:0.0000000
## Median :1.0000 Median :0.0625 Median :0.0804 Median :0.0000161
## Mean :0.5657 Mean :0.1071 Mean :0.1753 Mean :0.0847472
## 3rd Qu.:1.0000 3rd Qu.:0.1320 3rd Qu.:0.2550 3rd Qu.:0.0048300
## Max. :1.0000 Max. :0.9180 Max. :0.9940 Max. :0.9940000
## liveness valence tempo duration_ms
## Min. :0.0000 Min. :0.0000 Min. : 0.00 Min. : 4000
## 1st Qu.:0.0927 1st Qu.:0.3310 1st Qu.: 99.96 1st Qu.:187819
## Median :0.1270 Median :0.5120 Median :121.98 Median :216000
## Mean :0.1902 Mean :0.5106 Mean :120.88 Mean :225800
## 3rd Qu.:0.2480 3rd Qu.:0.6930 3rd Qu.:133.92 3rd Qu.:253585
## Max. :0.9960 Max. :0.9910 Max. :239.44 Max. :517810
To get reliable analysis results, the data must first be cleaned and made analysis-ready. To clean the data, we perform the following steps:
We start by looking for duplicate records. Duplicates would skew our results, so it’s critical to deduplicate the data before proceeding with the analysis. The first occurrence of each track_id is kept:
spotify_df <- spotify_df[!duplicated(spotify_df$track_id),]
We can remove the ID columns from the dataset because they serve only as unique identifiers and will not affect the analysis:
spotify_df <- spotify_df %>%
  select(-ends_with("id"))
dim(spotify_df)
## [1] 28356 20
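The two cleaning steps above can be illustrated with a minimal base-R sketch. It uses a hypothetical four-row table whose values are invented for illustration and are not taken from the Spotify data. The sketch shows that `duplicated()` keeps the first row for each `track_id`. It also uses `endsWith()` as a base-R stand-in for dplyr’s `select(-ends_with("id"))`.

```r
# Toy data (hypothetical): the same track_id appears in two playlists
songs <- data.frame(
  track_id       = c("a1", "b2", "a1", "c3"),
  track_name     = c("Song A", "Song B", "Song A", "Song C"),
  playlist_genre = c("pop", "rap", "edm", "rock"),
  playlist_id    = c("p1", "p2", "p3", "p4"),
  stringsAsFactors = FALSE
)

# duplicated() flags every repeat after the first occurrence,
# so negating it keeps the first row seen for each track_id
deduped <- songs[!duplicated(songs$track_id), ]

# Drop identifier columns whose names end in "id"
# (base-R analogue of dplyr's select(-ends_with("id")))
deduped <- deduped[, !endsWith(names(deduped), "id"), drop = FALSE]

print(deduped)
```

Only the first occurrence survives. A track listed in several playlists therefore keeps only the genre of the first playlist it appears in, which is worth keeping in mind when comparing genres later.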
Missing values (NA) can have a significant impact on our analysis and its interpretation, so it’s critical to first identify the columns that contain them and then treat them.
Let’s count the missing values in each column:
colSums(is.na(spotify_df))
## track_name track_artist track_popularity
## 4 4 0
## track_album_name track_album_release_date playlist_name
## 4 0 0
## playlist_genre playlist_subgenre danceability
## 0 0 0
## energy key loudness
## 0 0 0
## mode speechiness acousticness
## 0 0 0
## instrumentalness liveness valence
## 0 0 0
## tempo duration_ms
## 0 0
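`colSums()` returns counts rather than percentages. Here is a minimal sketch of turning the counts into percentages with `colMeans()`, using a small hypothetical data frame. On the real data this would give 4 / 28,356 × 100 ≈ 0.014% for each affected column.

```r
# Toy data (hypothetical) with a few NA values
df <- data.frame(
  track_name   = c("A", NA, "C", "D"),
  track_artist = c("X", NA, "Z", "W"),
  danceability = c(0.7, 0.6, 0.5, 0.8),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colMeans() gives the share of
# missing values per column, which we express as a percentage
na_pct <- colMeans(is.na(df)) * 100
print(na_pct)
```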
Three columns, track_name, track_artist and track_album_name, each have 4 missing values. We can drop these records, because removing 4 rows has no meaningful effect on an analysis of 28,356 records:
spotify_df <- na.omit(spotify_df)
Next, we convert the following variables to factors to facilitate the analysis:
spotify_df <- spotify_df %>%
  mutate(playlist_genre = as.factor(playlist_genre),
         playlist_subgenre = as.factor(playlist_subgenre),
         mode = as.factor(mode),
         key = as.factor(key))
Let’s check whether the conversion was successful:
str(spotify_df)
## tibble [28,352 x 20] (S3: tbl_df/tbl/data.frame)
## $ track_name : chr [1:28352] "I Don't Care (with Justin Bieber) - Loud Luxury Remix" "Memories - Dillon Francis Remix" "All the Time - Don Diablo Remix" "Call You Mine - Keanu Silva Remix" ...
## $ track_artist : chr [1:28352] "Ed Sheeran" "Maroon 5" "Zara Larsson" "The Chainsmokers" ...
## $ track_popularity : int [1:28352] 66 67 70 60 69 67 62 69 68 67 ...
## $ track_album_name : chr [1:28352] "I Don't Care (with Justin Bieber) [Loud Luxury Remix]" "Memories (Dillon Francis Remix)" "All the Time (Don Diablo Remix)" "Call You Mine - The Remixes" ...
## $ track_album_release_date: chr [1:28352] "2019-06-14" "2019-12-13" "2019-07-05" "2019-07-19" ...
## $ playlist_name : chr [1:28352] "Pop Remix" "Pop Remix" "Pop Remix" "Pop Remix" ...
## $ playlist_genre : Factor w/ 6 levels "edm","latin",..: 3 3 3 3 3 3 3 3 3 3 ...
## $ playlist_subgenre : Factor w/ 24 levels "album rock","big room",..: 4 4 4 4 4 4 4 4 4 4 ...
## $ danceability : num [1:28352] 0.748 0.726 0.675 0.718 0.65 0.675 0.449 0.542 0.594 0.642 ...
## $ energy : num [1:28352] 0.916 0.815 0.931 0.93 0.833 0.919 0.856 0.903 0.935 0.818 ...
## $ key : Factor w/ 12 levels "0","1","2","3",..: 7 12 2 8 2 9 6 5 9 3 ...
## $ loudness : num [1:28352] -2.63 -4.97 -3.43 -3.78 -4.67 ...
## $ mode : Factor w/ 2 levels "0","1": 2 2 1 2 2 2 1 1 2 2 ...
## $ speechiness : num [1:28352] 0.0583 0.0373 0.0742 0.102 0.0359 0.127 0.0623 0.0434 0.0565 0.032 ...
## $ acousticness : num [1:28352] 0.102 0.0724 0.0794 0.0287 0.0803 0.0799 0.187 0.0335 0.0249 0.0567 ...
## $ instrumentalness : num [1:28352] 0.00 4.21e-03 2.33e-05 9.43e-06 0.00 0.00 0.00 4.83e-06 3.97e-06 0.00 ...
## $ liveness : num [1:28352] 0.0653 0.357 0.11 0.204 0.0833 0.143 0.176 0.111 0.637 0.0919 ...
## $ valence : num [1:28352] 0.518 0.693 0.613 0.277 0.725 0.585 0.152 0.367 0.366 0.59 ...
## $ tempo : num [1:28352] 122 100 124 122 124 ...
## $ duration_ms : int [1:28352] 194754 162600 176616 169093 189052 163049 187675 207619 193187 253040 ...
## - attr(*, "na.action")= 'omit' Named int [1:4] 7669 8693 8694 17666
## ..- attr(*, "names")= chr [1:4] "7669" "8693" "8694" "17666"
We can also see that duration is given in milliseconds, so let’s convert it to minutes:
spotify_df <- spotify_df %>% mutate(duration_min = duration_ms / 60000)
Next, let’s create a variable that groups tracks into four bands based on their track_popularity value:
spotify_df <- spotify_df %>%
  mutate(popularity_group = as.numeric(case_when(
    ((track_popularity > 0) & (track_popularity < 20)) ~ "1",
    ((track_popularity >= 20) & (track_popularity < 40)) ~ "2",
    ((track_popularity >= 40) & (track_popularity < 60)) ~ "3",
    # Note: the first condition excludes 0, so tracks with
    # track_popularity == 0 fall through to TRUE and land in group 4
    TRUE ~ "4"))
  )
table(spotify_df$popularity_group)
##
## 1 2 3 4
## 4182 6162 8975 9033
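The same banding can also be expressed with base R’s `cut()`. Here is a minimal sketch on hypothetical scores that include the boundary values. Unlike the `case_when()` above, this version places a popularity of 0 in group 1, so its counts on the real data would differ slightly from the table above.

```r
# Hypothetical popularity scores, including the edge values 0, 20 and 100
track_popularity <- c(0, 5, 19, 20, 39, 40, 59, 60, 100)

# right = FALSE closes intervals on the left: [0,20), [20,40), [40,60), [60,Inf)
popularity_group <- cut(track_popularity,
                        breaks = c(0, 20, 40, 60, Inf),
                        labels = 1:4,
                        right  = FALSE)

# cut() returns a factor; convert its labels back to integers
popularity_group <- as.integer(as.character(popularity_group))
print(popularity_group)
```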
library(DT)
datatable(head(spotify_df, 5))
Exploratory Data Analysis (EDA) can reveal relevant information in data that isn’t immediately obvious, but only if it’s done carefully. EDA is required before we build a model on the data: it helps us identify patterns, spot outliers or unusual events, and discover interesting relationships between variables.
We’ll start by looking at the correlation between the variables. Correlation tells us whether two variables are linearly related: its magnitude indicates the strength of the relationship, while its sign indicates whether the variables move in the same or in opposite directions.
The correlation plot shows a few pairs of highly correlated variables. To avoid multicollinearity, we should either keep only one variable from each such pair or use dimensionality reduction techniques.
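The correlation plot itself is not reproduced here. As a minimal base-R sketch of how strongly correlated pairs can be listed, we compute the Pearson correlation matrix with `cor()` and keep each pair whose absolute correlation exceeds a threshold. The data are simulated, and the 0.7 cut-off is an assumption rather than a value from the original analysis.

```r
set.seed(1)
n <- 200
# Simulated features (hypothetical): loudness is built from energy,
# so the two are strongly correlated; valence is independent noise
energy   <- runif(n)
loudness <- -12 + 10 * energy + rnorm(n, sd = 0.5)
valence  <- runif(n)
features <- data.frame(energy, loudness, valence)

# Pearson correlation matrix of the numeric features
cor_mat <- cor(features)

# Look at each pair once (upper triangle) and keep those with |r| > threshold
threshold <- 0.7
idx <- which(abs(cor_mat) > threshold & upper.tri(cor_mat), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(cor_mat)[idx[, "row"]],
  var2 = colnames(cor_mat)[idx[, "col"]],
  r    = cor_mat[idx],
  stringsAsFactors = FALSE
)
print(strong_pairs)
```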
The correlation plot shows that energy and loudness are strongly correlated; let’s draw a scatter plot to visualize the relationship:
b <- ggplot(spotify_df,aes(x = energy, y = loudness))
b + geom_point()
The graph indicates a strong positive relationship between the audio features energy and loudness: louder tracks tend to be more energetic.
The dataset groups songs into six broad genres. Let’s look at the number of songs in each genre:
# songs per genre
spotify_df %>% group_by(Genre = playlist_genre) %>%
  summarise(No_of_tracks = n()) %>% knitr::kable()
| Genre | No_of_tracks |
|---|---|
| edm | 4877 |
| latin | 4136 |
| pop | 5132 |
| r&b | 4504 |
| rap | 5398 |
| rock | 4305 |
From the table above, rap has the most songs (5,398), followed by pop (5,132).
Let’s check the percentage of tracks belonging to each genre:
spotify_df_pie_data <- spotify_df %>%
group_by(playlist_genre) %>%
summarise(Total_number_of_tracks = length(playlist_genre))
ggplot(spotify_df_pie_data, aes(x="", y=Total_number_of_tracks, fill=playlist_genre)) +
geom_bar(width = 1, stat = "identity") +
coord_polar("y", start=0) +
  geom_text(aes(label = paste(round(Total_number_of_tracks / sum(Total_number_of_tracks) * 100, 1), "%")),
            position = position_stack(vjust = 0.5))
We can see from the pie chart that the tracks are fairly evenly spread across the six genres.
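The percentages in the pie chart can be checked directly against the genre counts in the table above. Here is a small base-R sketch of that arithmetic:

```r
# Track counts per genre, copied from the table above
genre_counts <- c(edm = 4877, latin = 4136, pop = 5132,
                  "r&b" = 4504, rap = 5398, rock = 4305)

# Share of all tracks in each genre, as a percentage rounded to one decimal
genre_pct <- round(genre_counts / sum(genre_counts) * 100, 1)
print(genre_pct)
```

The shares range from 14.6% (latin) to 19.0% (rap), which supports the observation that the genres are fairly balanced.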
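Let’s see how the tracks of each genre are distributed across the popularity groups:
# count tracks per genre within each popularity group
spotify_df_bar_data <- spotify_df %>% count(playlist_genre, popularity_group, name = "Total_playlist_genre")
ggplot(spotify_df_bar_data, aes(fill = playlist_genre, y = Total_playlist_genre, x = popularity_group)) +
  geom_bar(position = "dodge", stat = "identity")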
This plot supports some interesting inferences, such as:
Next, let’s examine how genre relates to the other audio features in the data:
library(ggpubr)
p1 <- spotify_df %>%
ggplot(aes(x = playlist_genre, y = valence, color = playlist_genre)) +
geom_boxplot(alpha = 0.5, notch = TRUE) +
theme_bw() +
labs(title = 'Which genre is the happiest?', x= 'Genres', y = 'Happiness' )
p2 <- spotify_df %>% ggplot(aes(x = playlist_genre, y = energy, color = playlist_genre)) +
geom_boxplot(alpha = 0.1, notch = TRUE) +
theme_bw() +
labs(title = 'How energetic are different Genres?', x= 'Genres', y = 'Energy' )
p3 <- spotify_df %>% ggplot(aes(x = playlist_genre, y = danceability, color = playlist_genre)) +
geom_boxplot(alpha = 0.5, notch = TRUE) +
theme_bw() +
labs(title = 'Which genre is the most danceable?', x= 'Genres', y = 'Danceability' )
p4 <- spotify_df %>% ggplot(aes(x = playlist_genre, y = tempo, color = playlist_genre)) +
geom_boxplot(alpha = 0.5, notch = TRUE) +
theme_bw() +
labs(title = 'Tempo across different Genres', x= 'Genres', y = 'Tempo' )
ggarrange(p1,p2,p3,p4 , nrow = 2, ncol = 2)
The graphs above depict the genres’ characteristics in terms of happiness, energy, danceability, and tempo. Let’s have a look at some of the findings:
Let’s look at the top 3 subgenres (by number of tracks) within each genre:
top <- spotify_df %>% group_by(playlist_genre, playlist_subgenre) %>%
  summarise(n = n(), .groups = "drop_last") %>% top_n(3, n)  # top_n() works within each playlist_genre
tm <- treemap(top, index = c("playlist_genre", "playlist_subgenre"), vSize = "n", vColor = 'playlist_genre', palette="RdYlBu")
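The group-wise top-n step can also be sketched in base R, which makes the logic explicit. We split the counts by genre, sort each piece by n in descending order, and keep the first k rows. The genre/subgenre counts below are hypothetical, and `top_k_per_group()` is an illustrative helper, not part of any package.

```r
# Toy genre/subgenre counts (hypothetical)
counts <- data.frame(
  genre    = c("pop", "pop", "pop", "pop", "rap", "rap", "rap"),
  subgenre = c("dance pop", "electropop", "indie poptimism", "post-teen pop",
               "hip hop", "trap", "gangster rap"),
  n        = c(50, 40, 30, 20, 60, 45, 10),
  stringsAsFactors = FALSE
)

# Within each group, order rows by the value column (descending) and keep the first k
top_k_per_group <- function(df, group, value, k) {
  pieces <- lapply(split(df, df[[group]]), function(g) {
    head(g[order(-g[[value]]), ], k)
  })
  do.call(rbind, pieces)
}

top3 <- top_k_per_group(counts, "genre", "n", 3)
print(top3)
```

This differs from dplyr’s `top_n()` in one respect. `top_n()` keeps all rows tied at the cut-off, so it can return more than k rows per group, while `head()` always returns at most k.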
The top 15 artists (by number of tracks) within each genre:
top <- spotify_df %>% group_by(playlist_genre, track_artist) %>%
  summarise(n = n(), .groups = "drop_last") %>% top_n(15, n)  # top_n() works within each playlist_genre
tm <- treemap(top, index = c("playlist_genre", "track_artist"), vSize = "n", vColor = 'playlist_genre', palette="RdYlBu")
The top 15 songs overall:
library(ggplot2)
library(plotly)
# finding popular songs (titles with more than 2 entries in the data)
popular_songs <- spotify_df %>% group_by(Songs = track_name) %>%
  summarise(No_of_tracks = n(), Popularity = mean(track_popularity)) %>%
  filter(No_of_tracks > 2) %>%
  arrange(desc(Popularity)) %>%
  top_n(15, wt = Popularity) %>%
  ggplot(aes(x = reorder(Songs, Popularity), y = Popularity)) +
  geom_bar(stat = "identity") +
  coord_flip() + labs(title = "Popular songs overall", x = "Songs", y = "Popularity")
ggplotly(popular_songs)
The top 15 artists overall:
library(ggplot2)
library(plotly)
# finding popular artists (artists with more than 2 tracks in the data)
popular_artists <- spotify_df %>% group_by(Artist = track_artist) %>%
  summarise(No_of_tracks = n(), Popularity = mean(track_popularity)) %>%
  filter(No_of_tracks > 2) %>%
  arrange(desc(Popularity)) %>%
  top_n(15, wt = Popularity) %>%
  ggplot(aes(x = reorder(Artist, Popularity), y = Popularity)) +
  geom_bar(stat = "identity") +
  coord_flip() + labs(title = "Popular artists overall", x = "Artists", y = "Popularity")
ggplotly(popular_artists)
To identify songs that belong together, we perform K-Means clustering, which groups songs with similar audio characteristics.
To perform K-Means, we start by selecting the predictor variables: energy, liveness, tempo, speechiness, acousticness, instrumentalness, danceability, duration_ms, loudness and valence.
spotify.inp <- spotify_df[, c('energy', 'liveness', 'tempo', 'speechiness', 'acousticness', 'instrumentalness', 'danceability', 'duration_ms', 'loudness', 'valence')]
The next step is to scale the data. Scaling standardizes every column to mean 0 and standard deviation 1, so that features measured in large units (such as duration_ms) do not dominate the distances K-Means uses.
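To see why this matters, consider danceability and duration_ms for the first four tracks shown by `head()` earlier. Danceability varies by a few hundredths, while duration_ms varies by tens of thousands. The minimal base-R sketch below shows that, without scaling, the Euclidean distance between two tracks is almost entirely determined by duration_ms. It also shows that `scale()` gives each column mean 0 and standard deviation 1.

```r
# danceability and duration_ms of the first four tracks from head(spotify_df)
features <- data.frame(
  danceability = c(0.748, 0.726, 0.675, 0.718),
  duration_ms  = c(194754, 162600, 176616, 169093)
)

# Distance between tracks 1 and 2 on the raw scale: dominated by duration_ms
raw_dist <- as.numeric(dist(features[1:2, ]))

# scale() centres each column on its mean and divides by its standard deviation
scaled <- scale(features)
scaled_dist <- as.numeric(dist(scaled[1:2, ]))

print(raw_dist)
print(colMeans(scaled))      # ~0 for both columns (up to rounding error)
print(apply(scaled, 2, sd))  # 1 for both columns
print(scaled_dist)
```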
cluster.spotify_df.scaled <- scale(spotify.inp)
K-Means groups the data into K clusters, so we need to identify the optimal number of clusters. We use the elbow method to find it.
set.seed(100)
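# Note: this runs on the first 2,000 unscaled rows, whereas the model below is fitted on cluster.spotify_df.scaled
fviz_nbclust(spotify.inp[1:2000,], kmeans, method = "wss")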
We can see that the elbow (the bend) in the graph above is at k = 3, so we select k = 3 and fit the model:
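The idea behind `fviz_nbclust(method = "wss")` can be sketched directly with `stats::kmeans()`. We fit K-Means for several values of k and record the total within-cluster sum of squares (`tot.withinss`). The simulated data below are assumed to have three well-separated groups so that the elbow is obvious. WSS should therefore fall steeply up to k = 3 and only slightly after that.

```r
set.seed(100)
# Simulated 2-D data (hypothetical): 50 points around each of three centres
blob_centres <- rbind(c(0, 0), c(5, 5), c(0, 5))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, mean = blob_centres[i, 1], sd = 0.5),
        rnorm(50, mean = blob_centres[i, 2], sd = 0.5))
}))

# Total within-cluster sum of squares for k = 1..6;
# nstart = 10 runs K-Means from 10 random starts and keeps the best fit
ks  <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)
print(round(wss, 1))
```

Plotting `wss` against `ks` (for example with `plot(ks, wss, type = "b")`) gives the same kind of curve as `fviz_nbclust()`. The elbow is the k after which extra clusters barely reduce WSS.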
k <- kmeans(cluster.spotify_df.scaled, centers = 3)
fviz_cluster(k, geom = "point", data = cluster.spotify_df.scaled) + ggtitle("Grouping similar songs")
The plot shows the three clusters obtained with K-Means clustering.
Let’s check the song characteristics within each cluster:
insights
## # A tibble: 3 x 9
## kclust acousticness danceability energy instrumentalness speechiness valence
## <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.478 0.615 0.440 0.143 0.0889 0.399
## 2 2 0.148 0.742 0.714 0.0274 0.137 0.653
## 3 3 0.0596 0.560 0.812 0.147 0.0800 0.384
## # ... with 2 more variables: liveness <dbl>, track_popularity <dbl>
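These are the average characteristics of the songs within each cluster:
Cluster 1 groups acoustic, lower-energy tracks (mean acousticness 0.478, energy 0.440).
Cluster 2 groups danceable, upbeat tracks (danceability 0.742, valence 0.653).
Cluster 3 groups high-energy, non-acoustic tracks (energy 0.812, acousticness 0.0596).
Based on this analysis, artists can estimate which cluster their song is likely to fall into and the average popularity of songs in that cluster.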
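To place a new song in one of the clusters, it must first be standardized with the same means and standard deviations used for the training data. It is then assigned to the nearest cluster centre. Here is a minimal sketch on simulated two-feature data. `assign_cluster()` is a hypothetical helper, and it handles one song at a time.

```r
set.seed(42)
# Simulated training data (hypothetical): two features, two well-separated groups
train <- rbind(
  cbind(energy = rnorm(30, 0.40, 0.05), danceability = rnorm(30, 0.60, 0.05)),
  cbind(energy = rnorm(30, 0.80, 0.05), danceability = rnorm(30, 0.75, 0.05))
)

# Standardize, then fit K-Means on the scaled data
train_scaled <- scale(train)
km <- kmeans(train_scaled, centers = 2, nstart = 10)

# Hypothetical helper: scale ONE new song with the training means/SDs and
# return the index of the nearest cluster centre (squared Euclidean distance)
assign_cluster <- function(new_x, scaled_train, fit) {
  z <- as.numeric(scale(new_x,
                        center = attr(scaled_train, "scaled:center"),
                        scale  = attr(scaled_train, "scaled:scale")))
  d <- apply(fit$centers, 1, function(ctr) sum((z - ctr)^2))
  unname(which.min(d))
}

new_song <- matrix(c(0.82, 0.76), nrow = 1,
                   dimnames = list(NULL, c("energy", "danceability")))
cl <- assign_cluster(new_song, train_scaled, km)
print(cl)
```

The cluster numbers returned by `kmeans()` are arbitrary labels, so they can change between runs unless the random seed is fixed.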
1. Objective:
The objective of this study was to understand the features of different musical genres. Using Spotify data, we also explored the underlying patterns and relationships among the audio parameters that describe music.
2. Data:
The Spotify data used for this analysis had 32,833 records and 23 columns, including both categorical and continuous variables; 28,352 records remained after removing duplicates and missing values. This information was sufficient to analyze the relationship between genres and audio characteristics, and to examine the most popular songs and artists.
3. Methodology:
We started by examining the relationships between audio features and drew scatter plots to explore variable pairs with high correlation values.
Then we looked at genre-level aspects such as popularity and the number of tracks per genre, as well as each genre’s characteristics such as valence, energy and danceability.
Using a tree map, we found popular subgenres within the 6 broad genres, as well as popular artists
We also came up with a list of the top 15 most popular artists and songs
4. Insights:
Certain audio characteristics were found to be strongly correlated with one another: energy and loudness have a strong positive linear relationship, whereas energy and acousticness are strongly negatively correlated.
Pop songs tended to have the highest popularity, followed by rap. Rap had the most songs, although tracks were fairly evenly distributed across genres.
EDM songs were found to be the most energetic, and EDM tracks showed high variance in tempo.
Latin songs had the highest valence and danceability, possibly because their syncopated rhythms make them sound jubilant and lively.
From our analysis, “Just the Way You Are” was the most popular track and “JACKBOYS” the most popular artist.
5. Insights from Model Fitting: