Using Machine Learning to Predict Song Popularity
Goal: Use ML to identify which attributes most influence song popularity.
**What did we do?**
We used Machine Learning (ML) techniques to predict song popularity based on attributes such as energy, danceability, and acousticness. Two models were applied: Linear Regression for simplicity and Random Forest for enhanced prediction accuracy.
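A minimal sketch of the two models, assuming a training data frame with the four columns named in this report. The data here is simulated purely for illustration (the coefficients in the simulation are invented), the seed is arbitrary, and the Random Forest part assumes the `randomForest` package is installed; it is skipped otherwise.

```r
set.seed(1)  # assumed seed, only for reproducibility of the sketch

# Simulated stand-in for the training data; the relationship below is invented.
n <- 200
train <- data.frame(
  energy       = runif(n),
  danceability = runif(n),
  acousticness = runif(n)
)
train$popularity <- 50 + 20 * train$energy - 10 * train$acousticness +
  rnorm(n, sd = 5)

# Linear Regression: a simple baseline with directly readable coefficients.
lm_fit <- lm(popularity ~ energy + danceability + acousticness, data = train)
print(coef(lm_fit))

# Random Forest: can capture non-linear effects and interactions.
if (requireNamespace("randomForest", quietly = TRUE)) {
  rf_fit <- randomForest::randomForest(
    popularity ~ energy + danceability + acousticness,
    data = train, ntree = 200, importance = TRUE
  )
  # Importance scores give a rough ranking of the attributes.
  print(randomForest::importance(rf_fit))
}
```

The linear model's coefficients and the forest's importance scores are two different views of which attributes matter; they need not agree when the relationships are non-linear.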
#Values of popularity, energy, danceability, and acousticness
This code prepares the Spotify dataset for analysis:
- Load Libraries: Uses dplyr and ggplot2 for data handling and visuals.
- Import Data: Reads the Spotify dataset from a CSV file.
- Fill Missing Data: Adds randomly generated placeholder values for popularity, energy, danceability, and acousticness.
- Clean Data: Keeps only the necessary columns and removes rows with missing values.
- Split Data: Divides the dataset into 80% training and 20% testing for machine learning.
After these steps the data is complete, cleaned, and split, so it is ready for analysis and modeling.
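The steps above can be sketched in base R as follows. This is an illustration under stated assumptions, not the report's original code: the data frame `songs` is a small stand-in built in place of the CSV file, the seed and the 1 to 100 popularity range are assumptions, `split_train_test` is a hypothetical helper, and the report itself uses dplyr for data handling.

```r
set.seed(42)  # assumed seed, only so the sketch is reproducible

# Stand-in for the imported CSV (a real file would be read with read.csv()).
n <- 100
songs <- data.frame(
  title = paste("song", seq_len(n)),
  year  = sample(2010:2019, n, replace = TRUE)
)

# Add placeholder columns filled with random values, as the report describes.
songs$popularity   <- sample(1:100, n, replace = TRUE)  # assumed range
songs$energy       <- runif(n)
songs$danceability <- runif(n)
songs$acousticness <- runif(n)

# Keep only the modeling columns and drop rows with missing values.
keep  <- c("popularity", "energy", "danceability", "acousticness")
clean <- na.omit(songs[, keep])

# Hypothetical helper: random split into training and testing rows.
split_train_test <- function(df, train_frac = 0.8) {
  train_idx <- sample(nrow(df), size = floor(train_frac * nrow(df)))
  list(train = df[train_idx, ], test = df[-train_idx, ])
}

parts <- split_train_test(clean)
head(parts$train)
```

Because the split samples row indices at random, the row names printed by `head()` are the original row positions in shuffled order.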
#Key Takeaways
- Data Preparation is Crucial: Missing or incomplete data can skew results. Adding placeholder values makes the dataset complete for analysis, but because the placeholders are random they carry no real information about the songs.
- Feature Selection: Keeping popularity as the target and energy, danceability, and acousticness as predictors focuses the analysis on the attributes considered most relevant to song popularity.
- Data Cleaning: Removing rows with missing values means the model is fit only on complete observations, which improves the quality of its input data.
- Training and Testing Split: Dividing the data means the model is trained on one subset and evaluated on another, which makes overfitting detectable and gives a more reliable estimate of predictive performance.
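To illustrate how a held-out test set exposes overfitting, the sketch below compares training and test error for a simple model and a deliberately over-flexible one. The data is simulated, the seed is arbitrary, and `rmse` is a hypothetical helper; none of this comes from the report's own code.

```r
set.seed(7)  # assumed seed, only for reproducibility of the sketch

# Simulated data: popularity depends linearly on energy, plus noise.
n <- 100
songs <- data.frame(energy = runif(n))
songs$popularity <- 40 + 30 * songs$energy + rnorm(n, sd = 10)

# 80/20 split into training and testing rows.
train_idx <- sample(n, size = 0.8 * n)
train <- songs[train_idx, ]
test  <- songs[-train_idx, ]

# Hypothetical helper: root mean squared error.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

simple <- lm(popularity ~ energy, data = train)
# A degree-15 polynomial is flexible enough to chase noise in the training set.
complex <- lm(popularity ~ poly(energy, 15), data = train)

results <- data.frame(
  model      = c("simple", "complex"),
  train_rmse = c(rmse(train$popularity, predict(simple, train)),
                 rmse(train$popularity, predict(complex, train))),
  test_rmse  = c(rmse(test$popularity, predict(simple, test)),
                 rmse(test$popularity, predict(complex, test)))
)
print(results)
```

The flexible model's training error can never exceed the simple model's, because the linear fit is a special case of the polynomial. A test error well above the training error for the same model is the sign of overfitting, and only the held-out rows can reveal it.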
#How This Improves Predictions
- Consistency: Cleaning the data and keeping a consistent set of columns ensures the machine learning model learns from complete, uniformly formatted inputs.
- Focus: Limiting the dataset to the most relevant features allows the model to better capture relationships between attributes and popularity.
- Validation: Testing on separate data measures how well the model predicts songs it has not seen, increasing confidence in its predictions.
                 title        artist       top.genre year bpm nrgy dnce dB live
1     Hey, Soul Sister         Train      neo mellow 2010  97   89   67 -4    8
2 Love The Way You Lie        Eminem detroit hip hop 2010  87   93   75 -5   52
3              TiK ToK         Kesha       dance pop 2010 120   84   76 -3   29
4          Bad Romance     Lady Gaga       dance pop 2010 119   92   70 -4    8
5 Just the Way You Are    Bruno Mars             pop 2010 109   84   64 -5    9
6                 Baby Justin Bieber    canadian pop 2010  65   86   73 -5   11
  val dur acous spch pop popularity    energy danceability acousticness
1  80 217    19    4  83         31 0.2363399    0.2334955   0.04439971
2  64 263    24   23  82         79 0.5754357    0.2988560   0.93478335
3  71 200    10   14  80         51 0.4822579    0.1594213   0.56871703
4  71 295     0    4  79         14 0.5693527    0.5855671   0.22080275
5  43 221     2    4  78         67 0.1441523    0.1487780   0.87796118
6  54 214     4   14  77         42 0.1457088    0.1790596   0.46728881
    popularity     energy danceability acousticness
415         18 0.54799887   0.65368773  0.008041536
463         63 0.88597485   0.03753984  0.750375905
179         85 0.02368864   0.33111733  0.099271719
526         10 0.73555840   0.60210591  0.572619808
195         10 0.96261534   0.39009471  0.124420746
118         35 0.89171373   0.11210649  0.464568546