Spotify Music Analysis
Analyzing Popularity by Song Attributes
Authors: Caitlin, Katelyn, Aidan
Hypothesis 1: Energy vs. Popularity
We hypothesize that higher-energy songs tend to be more popular, that is, that a song's energy level is positively correlated with its popularity.
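Hypothesis 1 is a claim about correlation, which is commonly measured with the Pearson coefficient r (from -1 to 1; values near 0 mean no linear relationship). A minimal sketch in plain Python follows. `pearson_r` is a hypothetical helper, not the authors' code, and the inputs are the nrgy and pop columns of the six songs in the sample table below, far too few rows to test the hypothesis.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / n
    my = sum(ys) / n
    # Covariance and standard deviations, all unnormalized (the 1/n factors cancel).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# nrgy and pop for the six songs shown in the sample table.
nrgy = [89, 93, 84, 92, 84, 86]
pop = [83, 82, 80, 79, 78, 77]
print(round(pearson_r(nrgy, pop), 3))  # 0.503
```

With the full dataset, the same computation on the energy and popularity columns would give the correlation the hypothesis refers to.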
Hypothesis 2: Genre vs. Danceability
We hypothesize that dance-focused genres have higher danceability scores than other genres, which would indicate systematic genre-based differences in danceability.
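Hypothesis 2 compares groups of genres, so a first step is to average danceability within each group. The sketch below uses the top.genre and dnce values of the six songs in the sample table below. The rule that a genre is "dance-focused" when its name contains "dance" is a hypothetical stand-in for whatever grouping the analysis actually uses, and `mean_by_group` is likewise a hypothetical helper.

```python
from collections import defaultdict

# (top.genre, dnce) for the six songs shown in the sample table.
songs = [
    ("neo mellow", 67),
    ("detroit hip hop", 75),
    ("dance pop", 76),
    ("dance pop", 70),
    ("pop", 64),
    ("canadian pop", 73),
]


def mean_by_group(pairs, key):
    """Average the values of (genre, value) pairs within the group chosen by key(genre)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for genre, value in pairs:
        group = key(genre)
        totals[group] += value
        counts[group] += 1
    return {g: totals[g] / counts[g] for g in totals}


def is_dance_genre(genre):
    """Hypothetical rule: a genre is dance-focused if its name contains 'dance'."""
    return "dance" in genre


result = mean_by_group(songs, lambda g: "dance-focused" if is_dance_genre(g) else "other")
print(result)  # {'other': 69.75, 'dance-focused': 73.0}
```

Passing `key=lambda g: g` instead gives the mean danceability of each individual genre.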
Hypothesis 3: Acousticness vs. Popularity by Genre
We hypothesize that a genre's typical acousticness is associated with its popularity, so that more acoustic genres show different popularity levels from less acoustic ones.
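For Hypothesis 3, one way to start is to summarize each genre by its mean acousticness and mean popularity and then compare genres ranked by acousticness. The sketch below does this for the acous and pop values of the six sample songs in the table below, which are too few to draw conclusions from; `genre_summary` is a hypothetical helper.

```python
# (top.genre, acous, pop) for the six songs shown in the sample table.
rows = [
    ("neo mellow", 19, 83),
    ("detroit hip hop", 24, 82),
    ("dance pop", 10, 80),
    ("dance pop", 0, 79),
    ("pop", 2, 78),
    ("canadian pop", 4, 77),
]


def genre_summary(rows):
    """Return [(genre, mean acousticness, mean popularity, n)], most acoustic genre first."""
    groups = {}
    for genre, acous, pop in rows:
        groups.setdefault(genre, []).append((acous, pop))
    summary = []
    for genre, vals in groups.items():
        n = len(vals)
        mean_acous = sum(a for a, _ in vals) / n
        mean_pop = sum(p for _, p in vals) / n
        summary.append((genre, mean_acous, mean_pop, n))
    return sorted(summary, key=lambda s: s[1], reverse=True)


for genre, mean_acous, mean_pop, n in genre_summary(rows):
    print(f"{genre:16s} acous={mean_acous:5.1f} pop={mean_pop:5.1f} n={n}")
```

Keeping the group size `n` in the output matters: a genre mean based on one or two songs is much noisier than one based on dozens.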
Figure: Energy vs. Popularity
Figure: Danceability vs. Popularity
Figure: Acousticness vs. Popularity by Genre
Using Machine Learning to Predict Song Popularity
We used machine learning (ML) to predict song popularity from three attributes: energy, danceability, and acousticness. We fit two models: linear regression, as a simple and interpretable baseline, and a random forest, which can capture nonlinear relationships and interactions between features that a linear model cannot.
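The linear model (fitted with R's lm(), per the call in the summary output below) estimates an intercept and one coefficient per feature by ordinary least squares. As an illustration, here is a minimal pure-Python sketch that solves the least-squares normal equations directly. lm() itself uses a QR decomposition, which is more numerically stable but yields the same estimates on well-conditioned data. `fit_ols` and `predict` are hypothetical helper names, and the toy data are synthetic, built from known coefficients so the fit can be checked.

```python
def fit_ols(X, y):
    """Return [intercept, b1, ..., bk] minimizing the sum of squared residuals.

    X is a list of rows (each a list of k predictor values); y is a list of targets.
    Solves the normal equations (A^T A) beta = A^T y, where A is X with a leading
    column of ones for the intercept, using Gaussian elimination with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]
    n, k = len(A), len(A[0])
    # Augmented matrix [A^T A | A^T y].
    M = [[sum(A[r][i] * A[r][j] for r in range(n)) for j in range(k)]
         + [sum(A[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            factor = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution on the upper-triangular system.
    beta = [0.0] * k
    for i in reversed(range(k)):
        beta[i] = (M[i][k] - sum(M[i][j] * beta[j] for j in range(i + 1, k))) / M[i][i]
    return beta


def predict(beta, row):
    """Linear prediction: intercept plus coefficient-weighted features."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))


# Synthetic data generated from y = 50 + 2*energy - 3*danceability + 0.5*acousticness.
X = [[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 1], [2, 1, 0]]
y = [50 + 2 * e - 3 * d + 0.5 * a for e, d, a in X]
beta = fit_ols(X, y)
print([round(b, 6) for b in beta])  # [50.0, 2.0, -3.0, 0.5]
```

Because the toy targets contain no noise, the fit recovers the generating coefficients exactly; on real data the coefficients come with standard errors, as in the lm() summary.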
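First six rows of the dataset. The last four columns (popularity, energy, danceability, acousticness) are the variables used in the models. The abbreviated columns are bpm = tempo (beats per minute), nrgy = energy, dnce = danceability, dB = loudness (decibels), live = liveness, val = valence, dur = duration (seconds), acous = acousticness, spch = speechiness, and pop = popularity.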
  title                 artist         top.genre        year  bpm  nrgy  dnce  dB  live
1 Hey, Soul Sister      Train          neo mellow       2010   97    89    67  -4     8
2 Love The Way You Lie  Eminem         detroit hip hop  2010   87    93    75  -5    52
3 TiK ToK               Kesha          dance pop        2010  120    84    76  -3    29
4 Bad Romance           Lady Gaga      dance pop        2010  119    92    70  -4     8
5 Just the Way You Are  Bruno Mars     pop              2010  109    84    64  -5     9
6 Baby                  Justin Bieber  canadian pop     2010   65    86    73  -5    11

  val  dur  acous  spch  pop  popularity     energy  danceability  acousticness
1  80  217     19     4   83          31  0.2363399     0.2334955    0.04439971
2  64  263     24    23   82          79  0.5754357     0.2988560    0.93478335
3  71  200     10    14   80          51  0.4822579     0.1594213    0.56871703
4  71  295      0     4   79          14  0.5693527     0.5855671    0.22080275
5  43  221      2     4   78          67  0.1441523     0.1487780    0.87796118
6  54  214      4    14   77          42  0.1457088     0.1790596    0.46728881
     popularity      energy  danceability  acousticness
415          18  0.54799887    0.65368773   0.008041536
463          63  0.88597485    0.03753984   0.750375905
179          85  0.02368864    0.33111733   0.099271719
526          10  0.73555840    0.60210591   0.572619808
195          10  0.96261534    0.39009471   0.124420746
118          35  0.89171373    0.11210649   0.464568546
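The row labels in the sample above (415, 463, 179, ...) are not consecutive, which is consistent with rows drawn at random, as in a train/test split. The regression summary below reports 478 residual degrees of freedom for 4 estimated coefficients, i.e. 482 training rows. The split itself is not shown, so the following is a minimal Python sketch under assumed settings: the hypothetical `split_rows` helper, the 80/20 ratio, and the seed are all illustrative choices, not taken from the report.

```python
import random


def split_rows(rows, train_frac=0.8, seed=42):
    """Randomly split rows into (train, test) lists; reproducible for a given seed.

    train_frac=0.8 and seed=42 are illustrative defaults, not values from the report.
    """
    rng = random.Random(seed)  # private generator so the global random state is untouched
    idx = list(range(len(rows)))
    rng.shuffle(idx)  # random order of row indices
    cut = int(round(train_frac * len(rows)))
    train = [rows[i] for i in idx[:cut]]
    test = [rows[i] for i in idx[cut:]]
    return train, test


train, test = split_rows(list(range(10)))
print(len(train), len(test))  # 8 2
```

Fixing the seed makes the split, and therefore the reported error metrics, reproducible; in R the usual equivalent is calling set.seed() before sample().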
Building a Linear Regression Model
Call:
lm(formula = popularity ~ energy + danceability + acousticness,
    data = train_data)

Residuals:
    Min      1Q  Median      3Q     Max
-49.578 -24.181  -0.917  24.697  50.257

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)   50.5272     4.2739  11.822   <2e-16 ***
energy        -1.0673     4.6938  -0.227    0.820
danceability   0.5892     4.5274   0.130    0.897
acousticness   0.6701     4.4996   0.149    0.882
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 28.52 on 478 degrees of freedom
Multiple R-squared:  0.0001888,  Adjusted R-squared:  -0.006086
F-statistic: 0.03009 on 3 and 478 DF,  p-value: 0.993
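The summary statistics above can be checked from R² alone. With p = 3 predictors and 478 residual degrees of freedom, the training set has n = 478 + 3 + 1 = 482 rows, and the adjusted R² and overall F statistic follow from the standard formulas in the sketch below (helper names are hypothetical). An R² of 0.0001888 means the three features explain about 0.02% of the variance in popularity. Together with the negative adjusted R², the F-test p-value of 0.993, and coefficient p-values between 0.820 and 0.897, this indicates no detectable linear relationship; in particular, the energy coefficient is slightly negative and not significant (p = 0.820).

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors (plus an intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


def overall_f(r2, n, p):
    """Overall F statistic for H0: all p slope coefficients are zero."""
    return (r2 / p) / ((1 - r2) / (n - p - 1))


# Values from the lm() summary: 3 predictors, 478 residual degrees of freedom.
r2, p = 0.0001888, 3
n = 478 + p + 1  # 482 training rows
print(round(adjusted_r2(r2, n, p), 6))  # -0.006086
print(round(overall_f(r2, n, p), 5))    # 0.03009
```

Both recomputed values match the printed summary, confirming that the near-zero fit is a property of the data rather than a reporting error.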
Mean Squared Error (MSE): 757.13
Root Mean Squared Error (RMSE): 27.52
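MSE averages the squared prediction errors, and RMSE takes the square root to return to popularity units, so an RMSE of about 27.5 means the root of the average squared error is roughly 27.5 popularity points. A minimal plain-Python sketch of both metrics follows, plus a hypothetical `baseline_rmse` helper: comparing the model's RMSE with the RMSE of always predicting the mean popularity of the same data shows whether the features add anything beyond that trivial guess.

```python
import math


def mse(actual, predicted):
    """Mean squared error between two equal-length, non-empty sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error, in the same units as the target (popularity points)."""
    return math.sqrt(mse(actual, predicted))


def baseline_rmse(actual):
    """RMSE of always predicting the mean of `actual` (a no-skill reference model)."""
    mean = sum(actual) / len(actual)
    return rmse(actual, [mean] * len(actual))


# The two reported figures are consistent: RMSE is the square root of MSE.
print(round(math.sqrt(757.132166183559), 4))  # 27.516
```

Because the linear model's R² is essentially zero, its RMSE should be close to this mean-only baseline.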
Feature Importance in Predicting Song Popularity
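One common, model-agnostic way to measure feature importance is permutation importance: shuffle one feature column at a time and record how much the prediction error grows. The sketch below illustrates that technique and is not necessarily the measure behind this report's results (random forest implementations also offer, for example, impurity-based importance). `permutation_importance` is a hypothetical helper that works with any `predict` function, such as the prediction function of a fitted random forest.

```python
import random


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in mean squared error when each feature column is shuffled.

    predict: function mapping one row (a list of feature values) to a prediction.
    X: list of rows (lists); y: list of true targets, same length as X.
    A larger increase means the model relies more heavily on that feature.
    """
    rng = random.Random(seed)  # fixed seed so results are reproducible

    def mse_of(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse_of(X)
    importances = []
    for j in range(len(X[0])):
        total = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            total += mse_of(shuffled) - baseline
        importances.append(total / n_repeats)
    return importances


# Toy check with a hypothetical model that uses only the first feature.
X = [[float(i), float(i % 3)] for i in range(20)]
y = [3.0 * row[0] for row in X]


def model(row):
    return 3.0 * row[0]


print(permutation_importance(model, X, y))  # second entry is 0.0: feature unused
```

Computing importance on held-out data rather than on the training data avoids crediting features the model has merely memorized.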