Week 2 | Data Dive - Summaries

Introduction

The world of music streaming is abuzz with catchy tunes vying for our ears and attention. But what makes a song truly soar to the top of the charts and capture our hearts? In this exploration, we dive into the fascinating world of Spotify’s most streamed songs in 2023, armed with a dataset of nearly 1,000 musical champions. Our mission: to uncover the hidden gems within these songs, the attributes that contribute to their meteoric rise on the streaming platform.

Key Features:

This treasure trove of data holds a wealth of information about each song, like its name, artists, release date, and even its musical pulse and mood.

  • track_name: Name of the song

  • artist(s)_name: Name of the artist(s) of the song

  • artist_count: Number of artists contributing to the song

  • released_year: Year when the song was released

  • released_month: Month when the song was released

  • released_day: Day of the month when the song was released

  • in_spotify_playlists: Number of Spotify playlists the song is included in

  • in_spotify_charts: Presence and rank of the song on Spotify charts

  • streams: Total number of streams on Spotify

  • in_apple_playlists: Number of Apple Music playlists the song is included in

  • in_apple_charts: Presence and rank of the song on Apple Music charts

  • in_deezer_playlists: Number of Deezer playlists the song is included in

  • in_deezer_charts: Presence and rank of the song on Deezer charts

  • in_shazam_charts: Presence and rank of the song on Shazam charts

  • bpm: Beats per minute, a measure of song tempo

  • key: Key of the song

  • mode: Mode of the song (major or minor)

  • danceability_%: Percentage indicating how suitable the song is for dancing

  • valence_%: Positivity of the song’s musical content

  • energy_%: Perceived energy level of the song

  • acousticness_%: Amount of acoustic sound in the song

  • instrumentalness_%: Amount of instrumental content in the song

  • liveness_%: Presence of live performance elements

  • speechiness_%: Amount of spoken words in the song

Working Directory

Set the working directory path by using either

  • the setwd() function

or

  • the RStudio menu: Session -> Set Working Directory -> Choose Directory...

Then confirm the current path with getwd():
getwd()
## [1] "D:/INFO-H 510 Statistics for Datascience/Data Dive"
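Here is a minimal sketch of switching directories from code instead of the menu. setwd() invisibly returns the previous working directory, so the change is easy to undo. tempdir() stands in for a real project folder such as the path printed above.

```r
# setwd() returns the previous directory invisibly, so keep it for restoring later
old_wd <- setwd(tempdir())   # temporarily switch to R's session temp directory
current_wd <- getwd()        # confirm the switch took effect
print(current_wd)
setwd(old_wd)                # restore the original working directory
```

This pattern is useful inside scripts that must not leave the session in a different folder than they found it.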

Libraries

library(ggplot2)

Load Data

data <- read.csv("spotify-2023.csv")
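By default (check.names = TRUE), read.csv() passes column names through make.names(). This is why headers such as artist(s)_name and danceability_% appear as artist.s._name and danceability_. in the head() output and in later code. The minimal sketch below reproduces this with a tiny temporary CSV. The two rows are copied from the first songs in the dataset, and the path is a throwaway tempfile(), not the real spotify-2023.csv.

```r
# Write a tiny CSV that uses two of the dataset's original headers
path <- tempfile(fileext = ".csv")
writeLines(c("artist(s)_name,danceability_%",
             "\"Latto, Jung Kook\",80",
             "Myke Towers,71"), path)

df <- read.csv(path)                        # check.names = TRUE by default
names(df)                                   # invalid characters become "."

raw <- read.csv(path, check.names = FALSE)  # keep the headers exactly as written
names(raw)

str(df)                                     # inspect column names and types
```

Running str(data) on the real file shows the same renamed columns along with each column's type.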

View Table

The View() function opens a spreadsheet-style data viewer. In RStudio it appears as a new tab.

Alternatively, in the RStudio Environment tab, click the table icon next to the data object under Data.

View(data)

First n records

The head() function returns the first parts of a vector, matrix, table, data frame, or function.

Syntax: head(x, n)

Parameters:
x: the object to inspect (here, the data frame)
n: number of rows to return (default 6)

head(data,3)
##                            track_name   artist.s._name artist_count
## 1 Seven (feat. Latto) (Explicit Ver.) Latto, Jung Kook            2
## 2                                LALA      Myke Towers            1
## 3                             vampire   Olivia Rodrigo            1
##   released_year released_month released_day in_spotify_playlists
## 1          2023              7           14                  553
## 2          2023              3           23                 1474
## 3          2023              6           30                 1397
##   in_spotify_charts   streams in_apple_playlists in_apple_charts
## 1               147 141381703                 43             263
## 2                48 133716286                 48             126
## 3               113 140003974                 94             207
##   in_deezer_playlists in_deezer_charts in_shazam_charts bpm key  mode
## 1                  45               10              826 125   B Major
## 2                  58               14              382  92  C# Major
## 3                  91               14              949 138   F Major
##   danceability_. valence_. energy_. acousticness_. instrumentalness_.
## 1             80        89       83             31                  0
## 2             71        61       74              7                  0
## 3             51        32       53             17                  0
##   liveness_. speechiness_.
## 1          8             4
## 2         10             4
## 3         31             6
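head() has a few companions that help when scanning a data frame. The minimal sketch below runs on a three-row stand-in whose track names are taken from the output above. It shows that tail() returns the last rows, that a negative n drops rows from the end, and that omitting n falls back to the default of 6.

```r
# A small stand-in data frame (track names from the head() output above)
tracks <- data.frame(
  track_name   = c("Seven (feat. Latto) (Explicit Ver.)", "LALA", "vampire"),
  artist_count = c(2, 1, 1)
)

head(tracks, 2)    # first 2 rows
tail(tracks, 1)    # last row
head(tracks, -1)   # negative n: everything except the last row
head(tracks)       # n defaults to 6, so all 3 rows are returned here
```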

Column Summaries

Numeric columns summaries

For a numeric vector, summary() returns the following six statistics:

  • Min: The minimum value in the given data

  • 1st Qu: The 1st quartile (25th percentile) of the given data

  • Median: The median (50th percentile) of the given data

  • Mean: The arithmetic mean of the given data

  • 3rd Qu: The 3rd quartile (75th percentile) of the given data

  • Max: The maximum value in the given data

artist_count: Number of artists contributing to the song

bpm: Beats per minute, a measure of song tempo

summary(data$artist_count)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   1.000   1.000   1.556   2.000   8.000
summary(data$bpm)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    65.0   100.0   121.0   122.5   140.0   206.0
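The minimal sketch below rebuilds the six numbers by hand from quantile() and mean(). summary() also uses quantile() with its default type 7. The vector x is hypothetical. Substituting data$artist_count or data$bpm should reproduce the outputs above.

```r
# Hypothetical artist counts, used only to illustrate the calculation
x <- c(1, 1, 1, 2, 1, 3, 2, 1, 8, 1)

# summary() combines the type-7 quantiles with the mean
q <- quantile(x, probs = c(0, 0.25, 0.5, 0.75, 1), names = FALSE)
manual <- c(q[1:3], mean(x), q[4:5])
names(manual) <- c("Min.", "1st Qu.", "Median", "Mean", "3rd Qu.", "Max.")

manual
summary(x)   # same six numbers
```

If a column contains NAs, summary() drops them and appends an NA's count. In contrast, quantile() and mean() need na.rm = TRUE.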

Categorical column summaries

The table() function builds a frequency table that lists each distinct value of a categorical variable with the number of times it occurs.

Syntax: table(x)

Parameters:
x: Column to be summarized

key: Key of the song

mode: Mode of the song (major or minor)

table(data$key) 
## 
##       A  A#   B  C#   D  D#   E   F  F#   G  G# 
##  95  75  57  81 120  81  33  62  89  73  96  91
table(data$mode)
## 
## Major Minor 
##   550   403
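The first, unlabeled column in the key table (95 songs) appears to hold songs whose key field is an empty string rather than a recorded key. The minimal sketch below uses hypothetical vectors. It recodes empty strings as NA so they are reported explicitly, then uses prop.table() to get shares instead of counts. Applied to data$mode, the shares would be about 57.7% Major and 42.3% Minor (550 and 403 of 953 songs).

```r
# Hypothetical values; "" marks a song whose key was not recorded
keys  <- c("A", "", "C#", "A", "", "G", "A")
modes <- c("Major", "Minor", "Major", "Major", "Minor", "Major", "Minor")

table(keys)                      # "" appears as an unlabeled level

keys[keys == ""] <- NA           # treat empty strings as missing
table(keys, useNA = "ifany")     # missing keys now appear as <NA>

prop.table(table(modes))         # proportion of each mode instead of raw counts
```

Passing na.strings = c("", "NA") to read.csv() would apply the same recoding to every column at load time.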

Project Goals/Aim

Analyze how different attributes contribute to a song’s success on the Spotify streaming platform, using the most-streamed songs of 2023.

Novel Questions

  1. Does danceability differ between songs in major and minor keys?

  2. Does the release date of a song influence its popularity?

  3. Is there a correlation between the number of Spotify streams and a song’s energy level?

Aggregation function

# Mean danceability (%) for each mode (Major / Minor);
# inside a formula, refer to columns by name rather than data$...
danceability_aggregation <- aggregate(danceability_. ~ mode, data = data, FUN = mean)

danceability_aggregation
##    mode danceability_.
## 1 Major       65.23818
## 2 Minor       69.33251

Interpretation:

The aggregation compares the average danceability of songs in major and minor keys:

  • Major Key Danceability: The average danceability for songs in a major key is approximately 65.24.

  • Minor Key Danceability: In contrast, songs in a minor key have a higher average danceability of approximately 69.33.

Songs in minor keys therefore average about 4.1 points higher in danceability than songs in major keys. This difference has not yet been tested for statistical significance. Possible explanations, offered here as hypotheses, include:

  1. Emotional Intensity: Minor keys are often associated with a more melancholic or emotional tone. The heightened emotional intensity in minor-key songs might contribute to a greater sense of rhythm and danceability.

  2. Melodic Patterns: Minor keys may encourage certain melodic patterns or rhythmic structures that resonate well with danceable music styles. The inherent characteristics of minor keys might align with popular dance music trends.

  3. Genre Influence: Different music genres often favor specific keys. The dataset has no genre column, so genre could act as a hidden factor. If genres that favor minor keys are also more danceable, that alone could produce the observed difference.

  4. Cultural Trends: Cultural and regional music preferences can influence the popularity of major or minor keys in danceable songs. Analyzing regional variations might provide additional insights.

These hypotheses are a starting point for exploring the relationship between a song’s key and its danceability in this dataset.
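Before relying on these hypotheses, it is worth checking whether the gap between the two means could plausibly be noise. The minimal sketch below runs a Welch two-sample t-test, the default for R's t.test(), on simulated scores. The data frame songs and its values are invented for illustration and are not the Spotify data.

```r
# Simulated danceability scores for illustration only (not the Spotify data)
set.seed(42)
songs <- data.frame(
  mode = rep(c("Major", "Minor"), times = c(55, 40)),
  danceability = c(rnorm(55, mean = 65, sd = 14), rnorm(40, mean = 69, sd = 14))
)

# Welch two-sample t-test (t.test() default: variances not assumed equal)
result <- t.test(danceability ~ mode, data = songs)
result$estimate   # mean of each group
result$p.value    # chance of a gap at least this large if the true means were equal
result$conf.int   # 95% CI for the difference in means (Major minus Minor)
```

On the real data, the equivalent call would be t.test(danceability_. ~ mode, data = data). With several hundred songs per group, the t-test is reasonably robust to the bounded 0 to 100 scale. wilcox.test() accepts the same formula and provides a rank-based alternative.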

Visualization

Does the release date of a song influence its popularity?

data |>
  ggplot(aes(x = released_year, y = streams, color = as.factor(released_year))) +
  geom_point() +
  theme_classic() +
  labs(title = "Song Streams Over Years",
       x = "Release Year",
       y = "Streams",
       color = "Release Year")

Interpretation:

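The plot suggests a general upward trend in song streams across release years. The dataset includes only songs that were among the most streamed in 2023, so this trend describes a pattern within that selection. It is not evidence that release date by itself drives popularity. Possible contributing factors include the growing popularity of music streaming services, a larger global listener base, and the increasing ease of music production and distribution. These remain hypotheses that this plot cannot test directly.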
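A per-year summary can make a trend easier to judge than a scatter plot with many overlapping points. The minimal sketch below uses made-up rows in place of the real data. It computes median streams and song counts per release year, plus a Spearman rank correlation. Spearman is less sensitive than Pearson to the very large values that stream counts typically contain.

```r
# Made-up rows standing in for the real data (not actual Spotify values)
songs <- data.frame(
  released_year = c(2019, 2019, 2021, 2022, 2022, 2023, 2023, 2023),
  streams       = c(9e8, 1.4e9, 6e8, 2.2e9, 3e8, 1.1e9, 5e8, 1.8e9)
)

# Median streams per release year (medians resist a few very large counts)
med <- aggregate(streams ~ released_year, data = songs, FUN = median)

# Number of songs per release year
cnt <- aggregate(streams ~ released_year, data = songs, FUN = length)
names(cnt)[2] <- "n_songs"

by_year <- merge(med, cnt, by = "released_year")
by_year

# Spearman rank correlation between release year and streams
cor(songs$released_year, songs$streams, method = "spearman")
```

On the real data, replace songs with data. The n_songs column shows how unevenly the release years are represented.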

Correlation between the number of Spotify streams and the song’s energy level

data |>
  ggplot(aes(x = streams, y = energy_.)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Correlation between Spotify Streams and Energy Level",
       x = "Spotify Streams",
       y = "Energy Level")
## `geom_smooth()` using formula = 'y ~ x'

cor(data$streams, data$energy_.)
## [1] -0.02631091

Interpretation:

The Pearson correlation coefficient of -0.026 is very close to zero. This indicates almost no linear relationship between Spotify streams and the energy level of songs.

The sign is negative, but the magnitude is negligible. With r² ≈ 0.0007, energy level accounts for less than 0.1% of the variation in streams. Pearson's r only measures linear association, so this number alone cannot rule out a non-linear relationship.
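The minimal sketch below adds a significance test and a rank-based check to the single correlation value. The vectors streams and energy are simulated stand-ins. Their right-skewed streams are an assumption meant to mimic typical stream counts and are not drawn from the dataset.

```r
# Simulated stand-ins for streams and energy_% (illustration only)
set.seed(1)
streams <- rlnorm(200, meanlog = 19, sdlog = 1)   # right-skewed positive counts
energy  <- round(runif(200, min = 20, max = 95))  # percentages on a 0-100 scale

# Pearson: linear association, with a test of zero correlation and a 95% CI
pearson <- cor.test(streams, energy, method = "pearson")
pearson$estimate
pearson$conf.int

# Spearman: rank-based, so a few very large stream counts carry less weight;
# exact = FALSE because rounded energy values contain ties
spearman <- cor.test(streams, energy, method = "spearman", exact = FALSE)
spearman$estimate
spearman$p.value

# Share of variance implied by the Pearson r
unname(pearson$estimate)^2
```

On the real data, the calls would be cor.test(data$streams, data$energy_.) and the same call with method = "spearman", exact = FALSE.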