The world of music streaming is abuzz with catchy tunes vying for our ears and attention. But what makes a song truly soar to the top of the charts and capture our hearts? In this exploration, we dive into the fascinating world of Spotify’s most streamed songs in 2023, armed with a dataset of nearly 1,000 musical champions. Our mission: to uncover the hidden gems within these songs, the attributes that contribute to their meteoric rise on the streaming platform.
This treasure trove of data holds a wealth of information about each song, such as its name, artists and release date, and even its musical pulse and mood. The dataset contains the following columns:
track_name: Name of the song
artist(s)_name: Name of the artist(s) of the song
artist_count: Number of artists contributing to the song
released_year: Year when the song was released
released_month: Month when the song was released
released_day: Day of the month when the song was released
in_spotify_playlists: Number of Spotify playlists the song is included in
in_spotify_charts: Presence and rank of the song on Spotify charts
streams: Total number of streams on Spotify
in_apple_playlists: Number of Apple Music playlists the song is included in
in_apple_charts: Presence and rank of the song on Apple Music charts
in_deezer_playlists: Number of Deezer playlists the song is included in
in_deezer_charts: Presence and rank of the song on Deezer charts
in_shazam_charts: Presence and rank of the song on Shazam charts
bpm: Beats per minute, a measure of song tempo
key: Key of the song
mode: Mode of the song (major or minor)
danceability_%: Percentage indicating how suitable the song is for dancing
valence_%: Positivity of the song’s musical content
energy_%: Perceived energy level of the song
acousticness_%: Amount of acoustic sound in the song
instrumentalness_%: Amount of instrumental content in the song
liveness_%: Presence of live performance elements
speechiness_%: Amount of spoken words in the song
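Set the working directory either with setwd() in the console or through the RStudio menu (Session -> Set Working Directory -> Choose Directory...), and confirm it with getwd():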
getwd()
## [1] "D:/INFO-H 510 Statistics for Datascience/Data Dive"
library(ggplot2)
data <- read.csv("spotify-2023.csv")
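read.csv() does not keep headers such as artist(s)_name or danceability_% verbatim. With its default check.names = TRUE, it passes the header row through make.names(), which replaces characters that are not valid in R names with dots. That is why these columns appear as artist.s._name and danceability_. in the head() output below. The sketch below uses only base R. The one-row inline CSV is a hypothetical example, not the Spotify file.

```r
# CSV headers containing "(", ")" and "%", which are not valid in syntactic R names
raw_names <- c("artist(s)_name", "danceability_%", "valence_%", "streams")

# make.names() is the conversion read.csv() applies when check.names = TRUE (the default):
# each invalid character becomes "."
clean_names <- make.names(raw_names)
print(clean_names)  # values: artist.s._name, danceability_., valence_., streams

# The same effect on a tiny inline CSV (hypothetical one-row example)
toy <- read.csv(text = "artist(s)_name,danceability_%\nLatto,80")
print(names(toy))

# check.names = FALSE keeps the original headers; such columns then need backticks
toy_raw <- read.csv(text = "artist(s)_name,danceability_%\nLatto,80",
                    check.names = FALSE)
print(toy_raw$`danceability_%`)
```

Keeping the default is usually more convenient, because names like danceability_. can be typed without backticks.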
The View() function in R can be used to invoke a spreadsheet-style data viewer within RStudio.
Alternatively, use the RStudio GUI: in the Environment tab, under Data, click the table icon next to the data frame.
View(data)
The head() function returns the first part of a vector, matrix, table, data frame or function.
Syntax: head(x, n)
Parameters:
x: the object to inspect (here, a data frame)
n: number of rows (or elements) to return; defaults to 6
head(data,3)
##                            track_name   artist.s._name artist_count
## 1 Seven (feat. Latto) (Explicit Ver.) Latto, Jung Kook            2
## 2                                LALA      Myke Towers            1
## 3                             vampire   Olivia Rodrigo            1
##   released_year released_month released_day in_spotify_playlists
## 1          2023              7           14                  553
## 2          2023              3           23                 1474
## 3          2023              6           30                 1397
##   in_spotify_charts   streams in_apple_playlists in_apple_charts
## 1               147 141381703                 43             263
## 2                48 133716286                 48             126
## 3               113 140003974                 94             207
##   in_deezer_playlists in_deezer_charts in_shazam_charts bpm key  mode
## 1                  45               10              826 125   B Major
## 2                  58               14              382  92  C# Major
## 3                  91               14              949 138   F Major
##   danceability_. valence_. energy_. acousticness_. instrumentalness_.
## 1             80        89       83             31                  0
## 2             71        61       74              7                  0
## 3             51        32       53             17                  0
##   liveness_. speechiness_.
## 1          8             4
## 2         10             4
## 3         31             6
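The n parameter also accepts negative values. A negative n drops that many rows from the end instead of keeping rows from the start. tail() is the matching function for the last rows. The short sketch below runs on a hypothetical toy data frame, not the Spotify data.

```r
# Hypothetical toy data frame with 8 rows
toy <- data.frame(id = 1:8, value = c(5, 3, 9, 1, 7, 2, 8, 4))

first_rows <- head(toy)      # n defaults to 6
top3       <- head(toy, 3)   # first 3 rows, like head(data, 3) above
all_but_2  <- head(toy, -2)  # negative n: every row except the last 2
last2      <- tail(toy, 2)   # tail() returns rows from the end

print(nrow(first_rows))
print(nrow(all_but_2))
print(last2)
```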
Applied to a numeric vector, summary() returns the following statistics:
Min: The minimum value
1st Qu: The first quartile (25th percentile)
Median: The median value
Mean: The arithmetic mean
3rd Qu: The third quartile (75th percentile)
Max: The maximum value
We apply it to two numeric columns:
artist_count: Number of artists contributing to the song
bpm: Beats per minute, a measure of song tempo
summary(data$artist_count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   1.000   1.000   1.556   2.000   8.000
summary(data$bpm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##    65.0   100.0   121.0   122.5   140.0   206.0
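For a numeric vector, summary() is shorthand for a handful of base functions. The sketch below rebuilds the same six numbers with min(), quantile(), mean() and max(). It uses a hypothetical toy vector in place of the real bpm column. The quartiles match because summary() calls quantile() with its default method (type 7).

```r
# Hypothetical toy vector standing in for a numeric column such as bpm
x <- c(65, 100, 121, 140, 206, 98, 130)

s <- summary(x)

# Rebuild Min, 1st Qu., Median, Mean, 3rd Qu., Max by hand
manual <- c(min(x), quantile(x, 0.25), median(x), mean(x),
            quantile(x, 0.75), max(x))

print(s)
print(unname(manual))
```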
The table() function builds a frequency table for a categorical variable: it lists each distinct value with the number of times it occurs.
Syntax: table(x)
Parameters:
x: Column to be summarized
We use it on two categorical columns:
key: Key of the song
mode: Mode of the song (major or minor)
table(data$key)
##
##     A A#  B  C#  D D#  E  F F#  G G#
## 95 75 57 81 120 81 33 62 89 73 96 91
The first, unlabeled column counts the 95 songs whose key field is empty in the CSV file.
table(data$mode)
##
## Major Minor
##   550   403
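The blank label in the key table appears because table() treats an empty string "" as an ordinary category. The sketch below uses a hypothetical toy key column to show that behaviour. It also shows one way to recode blanks as NA and count them explicitly with useNA = "ifany". prop.table() then converts counts to relative frequencies.

```r
# Hypothetical toy key column; "" mimics a song whose key is missing in the CSV
key <- c("C#", "", "A", "C#", "G", "", "A", "C#")

counts <- table(key)
print(counts)  # the "" level prints as an unlabeled first column

# Recode empty strings as NA and count them explicitly
key_na <- ifelse(key == "", NA, key)
counts_na <- table(key_na, useNA = "ifany")
print(counts_na)

# Relative frequencies of the known keys (NA excluded by default)
shares <- prop.table(table(key_na))
print(round(shares, 2))
```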
Objective: analyze how different attributes contribute to a song's success on Spotify, using the most-streamed songs of 2023. We focus on three questions:
How does danceability vary between songs in major and minor keys?
Does the release date of a song influence its popularity?
Is the number of Spotify streams correlated with a song's energy level?
# Mean danceability for songs in each mode (Major vs Minor)
danceability_aggregation <- aggregate(danceability_. ~ mode, data = data, FUN = mean)
danceability_aggregation
##    mode danceability_.
## 1 Major       65.23818
## 2 Minor       69.33251
Interpretation:
The aggregation results reveal interesting insights into the danceability of songs categorized by major and minor keys:
Major Key Danceability: The average danceability for songs in the major key is approximately 65.24.
Minor Key Danceability: In contrast, songs in the minor key exhibit a higher average danceability, around 69.33.
These findings suggest that songs in minor keys tend to have a slightly higher average danceability than songs in major keys: the gap is about 4.1 points (69.33 vs. 65.24). Possible explanations based on music theory, offered as hypotheses rather than tested results, include:
Emotional Intensity: Minor keys are often associated with a more melancholic or emotional tone. The heightened emotional intensity in minor-key songs might contribute to a greater sense of rhythm and danceability.
Melodic Patterns: Minor keys may encourage certain melodic patterns or rhythmic structures that resonate well with danceable music styles. The inherent characteristics of minor keys might align with popular dance music trends.
Genre Influence: Different music genres often favor specific keys. If the dataset includes a variety of genres, the observed differences in danceability could be influenced by genre preferences for major or minor keys.
Cultural Trends: Cultural and regional music preferences can influence the popularity of major or minor keys in danceable songs. Analyzing regional variations might provide additional insights.
These hypotheses provide a starting point for further exploration of the nuanced relationship between key and danceability in this dataset.
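The per-mode averages from aggregate() can be cross-checked with tapply(), which returns the group means as a named vector. The sketch below uses a small hypothetical data frame whose column names are simplified to mode and danceability. On the real data, the equivalent call would be tapply(data$danceability_., data$mode, mean).

```r
# Hypothetical toy data: a few songs with a mode and a danceability score
songs <- data.frame(
  mode = c("Major", "Minor", "Major", "Minor", "Major"),
  danceability = c(60, 72, 70, 66, 65)
)

# Formula interface: one row per mode with its mean danceability
by_mode <- aggregate(danceability ~ mode, data = songs, FUN = mean)
print(by_mode)

# Same grouping with tapply(), returning a named vector
by_mode_vec <- tapply(songs$danceability, songs$mode, mean)
print(by_mode_vec)

# Difference between the group means (Minor minus Major)
gap <- by_mode_vec[["Minor"]] - by_mode_vec[["Major"]]
print(gap)
```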
Does the release date of a song influence its popularity?
data |>
  ggplot(aes(x = released_year, y = streams, color = as.factor(released_year))) +
  geom_point() +
  theme_classic() +
  labs(title = "Song Streams Over Years",
       x = "Release Year",
       y = "Streams",
       color = "Release Year")
Interpretation:
The plot suggests a general upward trend in the number of song streams over time. This may reflect several factors, including the increasing popularity of music streaming services, the growing global population, and the increasing ease of music production and distribution. Note that streams is a cumulative total, so a song's release year also determines how long it has had to accumulate plays.
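With one point per song, a scatter plot makes a trend hard to judge by eye. One way to check it is to reduce streams to a single number per release year. The median is a reasonable choice because it is less sensitive than the mean to a few very large stream counts. The sketch below runs on hypothetical toy values, not the real data. On the real data, the same aggregate() call would be applied to data.

```r
# Hypothetical toy data: release year and cumulative streams for six songs
songs <- data.frame(
  released_year = c(2019, 2019, 2021, 2021, 2023, 2023),
  streams = c(5e8, 9e8, 6e8, 1.0e9, 4e8, 1.2e9)
)

# Median streams per release year
yearly <- aggregate(streams ~ released_year, data = songs, FUN = median)
print(yearly)

# Songs per year: years with few songs give unstable medians
print(table(songs$released_year))
```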
Correlation between the number of Spotify streams and the song’s energy level
data |>
  ggplot(aes(x = streams, y = energy_.)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Correlation between Spotify Streams and Energy Level",
       x = "Spotify Streams",
       y = "Energy Level")
## `geom_smooth()` using formula = 'y ~ x'
cor(data$streams,data$energy_.)
## [1] -0.02631091
Interpretation:
The correlation coefficient of -0.0263 is close to zero, which implies that there is almost no linear relationship between Spotify streams and the energy level of songs.
The negative sign points to a slight tendency for energy levels to decrease as streams increase, but the association is negligible. Since r² ≈ 0.0007, streams account for less than 0.1% of the variance in energy level.
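The sketch below shows what the coefficient measures. It computes Pearson's r by hand, compares the result with cor(), and runs cor.test(), which tests whether the true correlation differs from zero. The numbers are hypothetical toy values, not the Spotify data. On the real data you would pass data$streams and data$energy_. instead. The Spearman variant at the end is an alternative not used in the analysis above. It is included because stream counts tend to be strongly right-skewed.

```r
# Hypothetical toy data: streams (millions) and energy_% for eight songs
streams <- c(120, 340, 95, 800, 210, 150, 60, 430)
energy  <- c(65, 58, 80, 52, 71, 74, 69, 60)

# Pearson r: sum of cross-deviations divided by the square root of the product
# of the sums of squared deviations (equivalently cov / (sd_x * sd_y))
dx <- streams - mean(streams)
dy <- energy - mean(energy)
r_manual  <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))
r_builtin <- cor(streams, energy)

# Share of variance in energy linearly associated with streams
r_squared <- r_builtin^2

# Test of H0: true correlation = 0; a large p-value means r cannot be
# distinguished from zero at this sample size
test <- cor.test(streams, energy, method = "pearson")
print(test$estimate)
print(test$p.value)

# Rank-based alternative, less affected by a few very large stream counts
r_spearman <- cor(streams, energy, method = "spearman")
print(r_spearman)
```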