Spotify Music Analysis

Analyzing Popularity by Song Attributes

Authors: Caitlin, Katelyn, Aidan

Hypothesis 1: Energy vs. Popularity

We hypothesize that higher-energy songs tend to be more popular, that is, that energy is positively correlated with popularity.
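One direct check of Hypothesis 1 is the Pearson correlation between energy and popularity. The analysis itself appears to have been done in R (see the `lm()` output later in this section); the sketch below is a plain-Python illustration, and the helper name `pearson` is our own. The sample values are the `energy` and `popularity` columns of the six example rows in the data table; six songs are far too few to support a conclusion.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)

# energy and popularity of the six example rows in the data table
energy = [0.2363399, 0.5754357, 0.4822579, 0.5693527, 0.1441523, 0.1457088]
popularity = [31, 79, 51, 14, 67, 42]
print(f"r(energy, popularity) = {pearson(energy, popularity):.3f}")
```

A value near +1 would support the hypothesis; a value near 0 would not.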

Hypothesis 2: Genre vs. Danceability

We hypothesize that dance-focused genres, such as dance pop, have higher danceability scores than other genres; comparing genres on this attribute helps us understand how they differ.
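Hypothesis 2 comes down to comparing average danceability across genres. A minimal plain-Python sketch of that group-by, using the `top.genre` and `dnce` (danceability, 0 to 100) values of the six example rows in the data table; `mean_by_group` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import mean

def mean_by_group(rows, group_key, value_key):
    """Average of value_key within each distinct value of group_key."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])
    return {group: mean(values) for group, values in groups.items()}

# top.genre and dnce for the six example rows
songs = [
    {"genre": "neo mellow", "dnce": 67},
    {"genre": "detroit hip hop", "dnce": 75},
    {"genre": "dance pop", "dnce": 76},
    {"genre": "dance pop", "dnce": 70},
    {"genre": "pop", "dnce": 64},
    {"genre": "canadian pop", "dnce": 73},
]
by_genre = mean_by_group(songs, "genre", "dnce")
# Print genres from most to least danceable on average.
for genre, avg in sorted(by_genre.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{genre:>16}: {avg:.1f}")
```

With only one or two songs per genre these averages are purely illustrative; on the full data a formal comparison (for example an ANOVA across genres) would be the natural next step.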

Hypothesis 3: Acousticness vs. Popularity by Genre

We hypothesize that the relationship between acousticness and popularity differs by genre, so that genres with higher acousticness show different popularity levels.
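One simple way to look at Hypothesis 3 is to split songs into "low" and "high" acousticness bands and compare mean popularity within each genre. The sketch below assumes a cut-off of 0.5 on the 0 to 1 `acousticness` column (an arbitrary, hypothetical choice) and uses the six example rows from the data table; `popularity_by_genre_and_band` is our own helper name.

```python
from collections import defaultdict
from statistics import mean

def popularity_by_genre_and_band(rows, threshold=0.5):
    """Mean popularity for each (genre, acousticness band) pair.

    Songs with acousticness >= threshold are labelled "high", others "low".
    """
    groups = defaultdict(list)
    for row in rows:
        band = "high" if row["acousticness"] >= threshold else "low"
        groups[(row["genre"], band)].append(row["popularity"])
    return {key: mean(pops) for key, pops in groups.items()}

# genre, acousticness (0-1) and popularity for the six example rows
songs = [
    {"genre": "neo mellow", "acousticness": 0.04439971, "popularity": 31},
    {"genre": "detroit hip hop", "acousticness": 0.93478335, "popularity": 79},
    {"genre": "dance pop", "acousticness": 0.56871703, "popularity": 51},
    {"genre": "dance pop", "acousticness": 0.22080275, "popularity": 14},
    {"genre": "pop", "acousticness": 0.87796118, "popularity": 67},
    {"genre": "canadian pop", "acousticness": 0.46728881, "popularity": 42},
]
result = popularity_by_genre_and_band(songs)
for (genre, band), pop in sorted(result.items()):
    print(f"{genre:>16} {band:>4}: {pop:.1f}")
```

A per-genre correlation or a regression with a genre-by-acousticness interaction would be finer-grained alternatives once each genre has enough songs.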

Energy vs Popularity

Danceability vs Popularity

Acousticness vs Popularity by Genre

Using Machine Learning to Predict Song Popularity

Goal: Use ML to identify which attributes most influence song popularity.
  • What did we do?

  • We used machine learning (ML) techniques to predict song popularity from attributes such as energy, danceability, and acousticness. We applied two models: Linear Regression, chosen for its simplicity and interpretability, and Random Forest, which can capture nonlinear relationships and interactions between attributes and may therefore predict more accurately.
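Both models need held-out data for evaluation. The regression output below reports 478 residual degrees of freedom with four estimated coefficients, which implies 482 training rows; the exact split procedure is not shown, so the 80/20 fraction and the seed in this plain-Python sketch are assumptions, and `train_test_split` here is our own helper, not a library function.

```python
import random

def train_test_split(rows, train_frac=0.8, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    if not 0 < train_frac < 1:
        raise ValueError("train_frac must be between 0 and 1")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
    cut = round(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]

indices = list(range(1, 101))  # stand-in for 100 row indices
train, test = train_test_split(indices)
print(len(train), len(test))
```

Fixing the seed makes the reported error metrics reproducible; without it, each run would evaluate on a different test set.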

Values of popularity, energy, danceability, and acousticness (first six rows of the data, all columns):

                 title        artist       top.genre year bpm nrgy dnce dB live val dur acous spch pop popularity    energy danceability acousticness
1     Hey, Soul Sister         Train      neo mellow 2010  97   89   67 -4    8  80 217    19    4  83         31 0.2363399    0.2334955   0.04439971
2 Love The Way You Lie        Eminem detroit hip hop 2010  87   93   75 -5   52  64 263    24   23  82         79 0.5754357    0.2988560   0.93478335
3              TiK ToK         Kesha       dance pop 2010 120   84   76 -3   29  71 200    10   14  80         51 0.4822579    0.1594213   0.56871703
4          Bad Romance     Lady Gaga       dance pop 2010 119   92   70 -4    8  71 295     0    4  79         14 0.5693527    0.5855671   0.22080275
5 Just the Way You Are    Bruno Mars             pop 2010 109   84   64 -5    9  43 221     2    4  78         67 0.1441523    0.1487780   0.87796118
6                 Baby Justin Bieber    canadian pop 2010  65   86   73 -5   11  54 214     4   14  77         42 0.1457088    0.1790596   0.46728881

Column key: bpm = tempo in beats per minute; nrgy = energy; dnce = danceability; dB = loudness in decibels; live = liveness; val = valence; dur = duration in seconds; acous = acousticness; spch = speechiness; pop = popularity. The four spelled-out columns on the right (popularity, energy, danceability, acousticness) are the variables used in the models below; energy, danceability, and acousticness there are on a 0 to 1 scale.

The four modeling columns for a sample of rows (row labels are the original row indices):

    popularity     energy danceability acousticness
415         18 0.54799887   0.65368773  0.008041536
463         63 0.88597485   0.03753984  0.750375905
179         85 0.02368864   0.33111733  0.099271719
526         10 0.73555840   0.60210591  0.572619808
195         10 0.96261534   0.39009471  0.124420746
118         35 0.89171373   0.11210649  0.464568546

Build a linear regression model
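The output below comes from R's `lm(popularity ~ energy + danceability + acousticness, data = train_data)`, which estimates the coefficients by ordinary least squares. As a sketch of that computation, the plain-Python function below solves the normal equations (XᵀX)β = Xᵀy with an added intercept column. `fit_ols` is a hypothetical helper; R's `lm()` itself uses a QR decomposition, which is numerically more stable, and the normal equations are used here only because they are short. The usage example fits made-up, noise-free data so the recovered coefficients can be checked.

```python
from itertools import product

def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations.

    X is a list of feature rows, y a list of targets.
    Returns [intercept, b1, ..., bk].
    """
    A = [[1.0] + list(row) for row in X]  # prepend a column of ones
    p = len(A[0])
    xtx = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(p)]
    # Augmented matrix [X^T X | X^T y], solved by Gaussian elimination
    # with partial pivoting.
    M = [xtx[i] + [xty[i]] for i in range(p)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular design matrix (collinear features?)")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, p):
            factor = M[r][col] / M[col][col]
            for c in range(col, p + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    beta = [0.0] * p
    for i in reversed(range(p)):
        tail = sum(M[i][j] * beta[j] for j in range(i + 1, p))
        beta[i] = (M[i][p] - tail) / M[i][i]
    return beta

# Made-up data: popularity = 50 + 2*energy - 3*danceability + 1*acousticness
X = [list(pt) for pt in product([0.0, 0.5, 1.0], repeat=3)]
y = [50 + 2 * e - 3 * d + 1 * a for e, d, a in X]
beta = fit_ols(X, y)
print("intercept, energy, danceability, acousticness:", [round(b, 4) for b in beta])
```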


Call:
lm(formula = popularity ~ energy + danceability + acousticness, 
    data = train_data)

Residuals:
    Min      1Q  Median      3Q     Max 
-49.578 -24.181  -0.917  24.697  50.257 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept)   50.5272     4.2739  11.822   <2e-16 ***
energy        -1.0673     4.6938  -0.227    0.820    
danceability   0.5892     4.5274   0.130    0.897    
acousticness   0.6701     4.4996   0.149    0.882    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 28.52 on 478 degrees of freedom
Multiple R-squared:  0.0001888, Adjusted R-squared:  -0.006086 
F-statistic: 0.03009 on 3 and 478 DF,  p-value: 0.993
[1] "Mean Squared Error: 757.132166183559"
[1] "Root Mean Squared Error: 27.5160347103931"
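The two error lines above are related by RMSE = √MSE (√757.13 ≈ 27.52). Read together with the near-zero R² (0.0001888), the large p-values for all three attributes (0.82 and above), and the residual standard error of 28.52, the output suggests that, at face value, energy, danceability, and acousticness explain almost none of the variation in popularity in this data: an R² near 0 means the model's squared error is about the same as that of always predicting the mean. A mean-only baseline makes that comparison concrete. The plain-Python sketch below computes MSE, RMSE, and the baseline RMSE; the helper names and all numbers in it are hypothetical.

```python
import math

def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))

def baseline_rmse(y_train, y_test):
    """RMSE of a model that always predicts the training-set mean."""
    mean_pop = sum(y_train) / len(y_train)
    return rmse(y_test, [mean_pop] * len(y_test))

# Hypothetical values, for illustration only.
y_train = [40, 60, 50, 30, 70]
y_test = [45, 55, 65]
y_pred = [50, 50, 50]
print(f"model RMSE:    {rmse(y_test, y_pred):.2f}")
print(f"baseline RMSE: {baseline_rmse(y_train, y_test):.2f}")
```

A useful model should have a test RMSE clearly below the baseline's.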

Feature Importance in Predicting Song Popularity
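Feature importance for a model like a Random Forest can be measured in several ways; tree ensembles can report impurity-based importance, and a model-agnostic alternative is permutation importance: shuffle one feature's column and measure how much the prediction error rises. Which measure underlies the authors' importance results is not stated, so the plain-Python sketch below is only an illustration of the permutation technique; `permutation_importance` and the toy model `toy_predict` are hypothetical.

```python
import random

def mse(y_true, y_pred):
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE when each feature column is shuffled.

    predict: function mapping one feature row (a list) to a number.
    X: list of feature rows (lists); y: list of targets.
    """
    rng = random.Random(seed)
    base = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - base)
        importances.append(sum(increases) / n_repeats)
    return importances

# Hypothetical model that uses only the first feature.
def toy_predict(row):
    return 3.0 * row[0]

X = [[float(i), float(i % 4)] for i in range(20)]
y = [toy_predict(row) for row in X]
print(permutation_importance(toy_predict, X, y))
```

The unused second feature gets an importance of zero, while shuffling the first feature raises the error sharply.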