This analysis examines the characteristics of songs on Spotify, focusing on how musical features relate to song popularity. The main variables are song title, artist, duration, explicit-content flag, release year, popularity, genre, and audio features such as danceability, energy, and tempo. The data, sourced from Spotify, cover 2,000 songs released between 1998 and 2020 and make it possible to look for trends in popular music over that period.
Streaming platforms have transformed the music industry by offering vast song libraries and, with them, detailed data on what people listen to. Analyzing these data can reveal listener preferences and the characteristics that popular songs share. I chose this topic because I want to understand the dynamics of the music industry and the factors that drive a song's success.
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(highcharter)
Registered S3 method overwritten by 'quantmod':
method from
as.zoo.data.frame zoo
Rows: 2000 Columns: 18
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): artist, song, genre
dbl (14): duration_ms, year, popularity, danceability, energy, key, loudness...
lgl (1): explicit
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
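The read call that produced the message above is not shown in the report. As a minimal sketch, the column types it lists (three character columns, one logical, the rest double) could be declared up front, which also quiets the message. The helper name `read_songs` and the toy input are assumptions made for illustration; they are not the author's code or data.

```r
library(readr)

# Hypothetical helper: the report does not show its read call or file path,
# so the function name and its argument are assumptions.
read_songs <- function(file) {
  read_csv(
    file,
    col_types = cols(
      artist   = col_character(),
      song     = col_character(),
      genre    = col_character(),
      explicit = col_logical(),
      .default = col_double()   # all remaining columns are numeric
    )
  )
}

# Toy two-row input, invented only to exercise the helper.
# I() tells readr to treat the string as literal data rather than a path.
toy_csv <- I(paste(
  "artist,song,duration_ms,explicit,year,popularity,genre",
  "Artist A,Song A,200000,FALSE,2001,50,pop",
  "Artist B,Song B,210000,TRUE,2002,60,hip hop",
  sep = "\n"
))

songs_toy <- read_songs(toy_csv)
print(songs_toy)
```

With the real file, the same helper would be called with the path to the CSV in place of `toy_csv`.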
artist                song           duration_ms      explicit
Rihanna       :  25   Sorry  :   5   Min.   :113000   Mode :logical
Drake         :  23   Breathe:   3   1st Qu.:203580   FALSE:1449
Eminem        :  21   Closer :   3   Median :223280   TRUE :551
Calvin Harris :  20   Don't  :   3   Mean   :228748
Britney Spears:  19   Faded  :   3   3rd Qu.:248133
David Guetta  :  18   Higher :   3   Max.   :484146
(Other)       :1874   (Other):1980
year           popularity      danceability     energy
Min.   :1998   Min.   : 0.00   Min.   :0.1290   Min.   :0.0549
1st Qu.:2004   1st Qu.:56.00   1st Qu.:0.5810   1st Qu.:0.6220
Median :2010   Median :65.50   Median :0.6760   Median :0.7360
Mean   :2009   Mean   :59.87   Mean   :0.6674   Mean   :0.7204
3rd Qu.:2015   3rd Qu.:73.00   3rd Qu.:0.7640   3rd Qu.:0.8390
Max.   :2020   Max.   :89.00   Max.   :0.9750   Max.   :0.9990
key              loudness          mode             speechiness
Min.   : 0.000   Min.   :-20.514   Min.   :0.0000   Min.   :0.02320
1st Qu.: 2.000   1st Qu.: -6.490   1st Qu.:0.0000   1st Qu.:0.03960
Median : 6.000   Median : -5.285   Median :1.0000   Median :0.05985
Mean   : 5.378   Mean   : -5.512   Mean   :0.5535   Mean   :0.10357
3rd Qu.: 8.000   3rd Qu.: -4.168   3rd Qu.:1.0000   3rd Qu.:0.12900
Max.   :11.000   Max.   : -0.276   Max.   :1.0000   Max.   :0.57600
acousticness        instrumentalness    liveness         valence
Min.   :0.0000192   Min.   :0.0000000   Min.   :0.0215   Min.   :0.0381
1st Qu.:0.0140000   1st Qu.:0.0000000   1st Qu.:0.0881   1st Qu.:0.3867
Median :0.0557000   Median :0.0000000   Median :0.1240   Median :0.5575
Mean   :0.1289549   Mean   :0.0152260   Mean   :0.1812   Mean   :0.5517
3rd Qu.:0.1762500   3rd Qu.:0.0000683   3rd Qu.:0.2410   3rd Qu.:0.7300
Max.   :0.9760000   Max.   :0.9850000   Max.   :0.8530   Max.   :0.9730
tempo            genre
Min.   : 60.02   pop                  :428
1st Qu.: 98.99   hip hop, pop         :277
Median :120.02   hip hop, pop, R&B    :244
Mean   :120.12   pop, Dance/Electronic:221
3rd Qu.:134.27   pop, R&B             :178
Max.   :210.85   hip hop              :124
                 (Other)              :528
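The per-level counts reported for artist, song, and genre suggest that those columns were converted to factors before `summary()` was called, since `summary()` on plain character columns reports only length and class. The code for that step is not shown in the report. The sketch below is one way it might be done, using a small invented table (`songs_toy`, a hypothetical stand-in) rather than the real data.

```r
library(dplyr)

# Toy stand-in for the song table; every value here is invented for illustration.
songs_toy <- tibble::tibble(
  artist     = c("Artist A", "Artist A", "Artist B"),
  genre      = c("pop", "pop", "hip hop"),
  explicit   = c(FALSE, TRUE, FALSE),
  popularity = c(50, 60, 70)
)

# Convert every character column to a factor so that summary()
# reports counts per level instead of "Length / Class / Mode".
songs_fct <- songs_toy %>%
  mutate(across(where(is.character), as.factor))

summary(songs_fct)
```

Numeric and logical columns pass through unchanged, so they keep the quartile and TRUE/FALSE summaries seen in the output above.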