Project 2

Author

Jhonathan Urquilla

Final Project: Spotify Song Popularity Influences

Source: Spotify website, Katy Perry artist page

My goal for this project is to explore which audio characteristics contribute most to a song’s success and whether these factors differ across artists. While audio features play a role, a song’s popularity is also shaped by factors beyond the music itself. In today’s digital age, music discovery and consumption happen through multiple channels such as streaming platforms, social media, and curated playlists. These channels use algorithms and marketing strategies that influence which songs gain exposure and reach wide audiences. This suggests that popularity depends not only on a song’s musical qualities but also on its cultural context and promotional support (Music Marketing Monday, 2023).

Loading the libraries we may use and the data set

library(tidyverse)
library(tidyr)
library(leaflet)

setwd("C:/Users/ubjho/Downloads")
songs <- read_csv("spotifysongs.csv")

Looking at the data

head(songs)
# A tibble: 6 × 18
  artist   song  duration_ms explicit  year popularity danceability energy   key
  <chr>    <chr>       <dbl> <lgl>    <dbl>      <dbl>        <dbl>  <dbl> <dbl>
1 Britney… Oops…      211160 FALSE     2000         77        0.751  0.834     1
2 blink-1… All …      167066 FALSE     1999         79        0.434  0.897     0
3 Faith H… Brea…      250546 FALSE     1999         66        0.529  0.496     7
4 Bon Jovi It's…      224493 FALSE     2000         78        0.551  0.913     0
5 *NSYNC   Bye …      200560 FALSE     2000         65        0.614  0.928     8
6 Sisqo    Thon…      253733 TRUE      1999         69        0.706  0.888     2
# ℹ 9 more variables: loudness <dbl>, mode <dbl>, speechiness <dbl>,
#   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#   tempo <dbl>, genre <chr>

Creating new variables

songs <- mutate(songs, explicit = as.factor(explicit))
songs <- mutate(songs, mode = as.factor(mode))
songs <- mutate(songs, year = as.integer(year))
songs <- mutate(songs, duration_min = duration_ms / 60000)
# Drop rows with missing popularity or danceability (both columns are numeric)
songs <- filter(songs, !is.na(popularity), !is.na(danceability))
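
The steps that follow work on a table named songs_2010 that carries song_clean and artist_clean columns, but the code that builds it is not shown in this report. Below is a minimal sketch of how such a table could be derived. It assumes that songs_2010 holds songs released in 2010 or later and that the clean columns are lower-cased, whitespace-trimmed copies used only to match duplicates. Both are assumptions, not the original code, and the toy tibble songs_demo is a hypothetical stand-in for songs.

```r
library(dplyr)

# Toy stand-in for `songs` (hypothetical rows, for illustration only)
songs_demo <- tibble(
  artist = c("Artist A", "artist a ", "Artist B", "Artist C"),
  song   = c("Song One", "Song One", "Song Two", "Song Three"),
  year   = c(2012L, 2012L, 2009L, 2015L)
)

# Assumption: keep 2010 onward and add normalized keys for duplicate matching
songs_2010_demo <- songs_demo |>
  filter(year >= 2010) |>
  mutate(
    song_clean   = tolower(trimws(song)),
    artist_clean = tolower(trimws(artist))
  )

songs_2010_demo
```

Normalizing case and whitespace first means that rows differing only in capitalization or stray spaces share the same key when distinct() is applied later.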

Checking the top 10 songs by popularity

songs_sorted <- arrange(songs_2010, desc(popularity))
head(songs_sorted, 10)
# A tibble: 10 × 21
   artist  song  duration_ms explicit  year popularity danceability energy   key
   <chr>   <chr>       <dbl> <fct>    <int>      <dbl>        <dbl>  <dbl> <dbl>
 1 The Ne… Swea…      240400 FALSE     2013         89        0.612  0.807    10
 2 Tom Od… Anot…      244360 TRUE      2013         88        0.445  0.537     4
 3 WILLOW  Wait…      196520 FALSE     2015         86        0.764  0.705     3
 4 Billie… love…      200185 FALSE     2018         86        0.351  0.296     4
 5 Billie… love…      200185 FALSE     2018         86        0.351  0.296     4
 6 Bruno … Lock…      233478 FALSE     2012         85        0.726  0.698     5
 7 Bruno … Lock…      233478 FALSE     2012         85        0.726  0.698     5
 8 The Ne… Dadd…      260173 FALSE     2015         85        0.588  0.521    10
 9 Avicii  The …      176658 FALSE     2014         85        0.527  0.835     6
10 Ed She… Perf…      263400 FALSE     2017         85        0.599  0.448     8
# ℹ 12 more variables: loudness <dbl>, mode <fct>, speechiness <dbl>,
#   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#   tempo <dbl>, genre <chr>, duration_min <dbl>, song_clean <chr>,
#   artist_clean <chr>

Checking if “Locked out of Heaven” is a duplicate

locked_out_songs <- filter(songs_2010, song == "Locked out of Heaven" & artist == "Bruno Mars")

# Check the number of times the song appears
nrow(locked_out_songs)
[1] 2
Confirmed: the song appears more than once.
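
Checking one title at a time does not scale to the whole table. A minimal sketch, run on a hypothetical toy tibble rather than the real data, of listing every song and artist pair that occurs more than once with count():

```r
library(dplyr)

# Toy data (hypothetical rows, for illustration only)
songs_demo <- tibble(
  artist = c("Artist A", "Artist A", "Artist B", "Artist C", "Artist C"),
  song   = c("Song One", "Song One", "Song Two", "Song Three", "Song Three")
)

# Count rows per (song, artist) pair and keep pairs seen more than once
duplicate_pairs <- songs_demo |>
  count(song, artist, name = "times_seen") |>
  filter(times_seen > 1)

duplicate_pairs
```

The same pipeline could be pointed at songs_2010 to see all repeated entries at once before deciding how to remove them.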

Remove potential duplicates

songs_2010 <- distinct(songs_2010, song_clean, artist_clean, .keep_all = TRUE)
songs_2010 <- select(songs_2010, -song_clean, -artist_clean)

Check that duplicates were removed

songs_sorted <- arrange(songs_2010, desc(popularity))
head(songs_sorted, 10)
# A tibble: 10 × 19
   artist  song  duration_ms explicit  year popularity danceability energy   key
   <chr>   <chr>       <dbl> <fct>    <int>      <dbl>        <dbl>  <dbl> <dbl>
 1 The Ne… Swea…      240400 FALSE     2013         89        0.612  0.807    10
 2 Tom Od… Anot…      244360 TRUE      2013         88        0.445  0.537     4
 3 WILLOW  Wait…      196520 FALSE     2015         86        0.764  0.705     3
 4 Billie… love…      200185 FALSE     2018         86        0.351  0.296     4
 5 Bruno … Lock…      233478 FALSE     2012         85        0.726  0.698     5
 6 The Ne… Dadd…      260173 FALSE     2015         85        0.588  0.521    10
 7 Avicii  The …      176658 FALSE     2014         85        0.527  0.835     6
 8 Ed She… Perf…      263400 FALSE     2017         85        0.599  0.448     8
 9 Post M… Circ…      215280 FALSE     2019         85        0.695  0.762     0
10 Arctic… Why'…      161123 FALSE     2013         84        0.691  0.631     2
# ℹ 10 more variables: loudness <dbl>, mode <fct>, speechiness <dbl>,
#   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#   tempo <dbl>, genre <chr>, duration_min <dbl>

locked_out_songs <- filter(songs_2010, song == "Locked out of Heaven" & artist == "Bruno Mars")

nrow(locked_out_songs)
[1] 1

Confirmed: the duplicates were removed.

Checking the top 5 artists by total song count

top_artists <- songs_2010 |>
  group_by(artist) |>
  summarize(song_count = n()) |>
  arrange(desc(song_count)) |>
  slice_head(n = 5)
head(top_artists)
# A tibble: 5 × 2
  artist        song_count
  <chr>              <int>
1 Drake                 20
2 Calvin Harris         18
3 David Guetta          18
4 Ariana Grande         13
5 Katy Perry            13

I am going to start by focusing only on Katy Perry.

Converting popularity to numeric and ordering her songs by popularity

katy_songs <- songs_2010 |>
  filter(tolower(artist) == "katy perry")

katy_songs <- katy_songs |>
  mutate(popularity = as.numeric(popularity)) |>
  arrange(desc(popularity))

Creating a graph of only Katy Perry's songs to compare them with each other

ggplot(katy_songs, aes(x = reorder(song, popularity), y = popularity, fill = factor(year))) +
  geom_col() +
  scale_fill_manual(values = c(
    "2010" = "violetred3", "2012" = "orchid", "2013" = "brown3", "2017" = "gold"
  )) +
  labs(
    title = "Katy Perry Songs (2010+) by Popularity",
    x = "Song Title",
    y = "Popularity",
    fill = "Release Year",
    caption = "Data Source: Spotify Web"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", size = 14),
    axis.text.y = element_text(size = 10)
  )

katy_songs <- filter(songs_2010, artist == "Katy Perry") |>
  select(song, year, popularity, danceability, energy, loudness, tempo) |>
  arrange(desc(popularity))

katy_songs
# A tibble: 13 × 7
   song                       year popularity danceability energy loudness tempo
   <chr>                     <int>      <dbl>        <dbl>  <dbl>    <dbl> <dbl>
 1 Last Friday Night (T.G.I…  2012         74        0.649  0.815    -3.80 126. 
 2 Dark Horse                 2013         74        0.647  0.585    -6.12 132. 
 3 Part Of Me                 2012         73        0.678  0.918    -4.63 130. 
 4 Roar                       2013         73        0.554  0.772    -4.82 180. 
 5 California Gurls           2012         72        0.791  0.754    -3.73 125. 
 6 Firework                   2010         72        0.638  0.832    -5.04 124. 
 7 The One That Got Away      2012         72        0.687  0.792    -4.02 134. 
 8 Teenage Dream              2010         69        0.719  0.798    -4.58 120. 
 9 Chained To The Rhythm      2017         69        0.562  0.8      -5.40  95.0
10 E.T.                       2012         65        0.62   0.869    -5.25 152. 
11 Wide Awake                 2012         65        0.514  0.683    -5.10 160. 
12 This Is How We Do          2013         60        0.69   0.636    -6.03  96  
13 Unconditionally            2013          0        0.555  0.729    -4.81 129. 

Katy Perry Songs Regression Analysis

katy_songs <- filter(songs_2010, artist == "Katy Perry")

katy_songs_numeric <- select(katy_songs, popularity, danceability, energy, loudness, tempo)
library(DataExplorer)

plot_correlation(katy_songs_numeric)
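
A correlation plot shows pairwise relationships only. To estimate how the four features relate to popularity jointly, a linear regression could be fit on the same columns. The following is a minimal sketch, not a definitive model. It rebuilds the data from the 13-row Katy Perry table printed earlier, so the values are rounded as displayed there, and the names katy_demo and katy_model are hypothetical.

```r
# Values copied from the 13-row table printed earlier (rounded as displayed)
katy_demo <- data.frame(
  popularity   = c(74, 74, 73, 73, 72, 72, 72, 69, 69, 65, 65, 60, 0),
  danceability = c(0.649, 0.647, 0.678, 0.554, 0.791, 0.638, 0.687,
                   0.719, 0.562, 0.62, 0.514, 0.69, 0.555),
  energy       = c(0.815, 0.585, 0.918, 0.772, 0.754, 0.832, 0.792,
                   0.798, 0.8, 0.869, 0.683, 0.636, 0.729),
  loudness     = c(-3.80, -6.12, -4.63, -4.82, -3.73, -5.04, -4.02,
                   -4.58, -5.40, -5.25, -5.10, -6.03, -4.81),
  tempo        = c(126, 132, 130, 180, 125, 124, 134, 120, 95, 152, 160, 96, 129)
)

# Fit popularity on the four audio features
katy_model <- lm(popularity ~ danceability + energy + loudness + tempo,
                 data = katy_demo)

# Coefficients, standard errors, and R-squared
summary(katy_model)
```

With only 13 songs, and one of them ("Unconditionally") recorded at popularity 0, any coefficients from this fit should be read with caution.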

For my third graph, I will plot an interactive scatter plot for the top 5 artists in my data set.

Filtering and mutating songs for those artists

library(plotly)

top5_artists <- c("Drake", "Calvin Harris", "David Guetta", "Ariana Grande", "Katy Perry")

top5_allsongs <- filter(songs_2010, artist %in% top5_artists)
top5_allsongs <- mutate(
  top5_allsongs,
  # Help from ChatGPT: I was having issues with point sizing on my graph
  popularity_scaled = (popularity / 100) * 35 + 5
)
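
The scaling formula is a linear rescale from the 0 to 100 popularity range onto marker diameters between 5 and 40. A small worked sketch, wrapping the same formula in a helper named scale_popularity (a hypothetical name used only for illustration):

```r
# Linear rescale: popularity 0-100 mapped to a marker diameter of 5-40
scale_popularity <- function(popularity) {
  (popularity / 100) * 35 + 5
}

# Lowest possible score, a typical Katy Perry score, and the highest possible score
scale_popularity(c(0, 74, 100))
```

This maps 0 to 5, 74 to 30.9, and 100 to 40, so even the least popular song stays visible while the most popular ones do not swamp the plot.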

Setting up the graph to display

scatter_plot <- plot_ly(
  data = top5_allsongs,
  x = ~energy,
  y = ~danceability,
  color = ~artist,
  colors = c("pink2", "darkturquoise", "gold", "forestgreen", "orchid"),
  type = "scatter",
  mode = "markers",
  text = ~paste(
    "Song:", song,
    "<br>Artist:", artist,
    "<br>Popularity:", popularity,
    "<br>Year:", year,
    "<br>Tempo:", tempo
  ),
  marker = list(
    size = ~popularity_scaled,
    sizemode = "diameter",
    opacity = 0.7,
    sizemin = 1
  )
)

Editing the labels and titles on the graph

scatter_plot <- layout(
  scatter_plot,
  title = "Energy vs Danceability (Point Size by Popularity)",
  xaxis = list(title = "Energy"),
  yaxis = list(title = "Danceability"),
  # I didn't think adding Artist would change the hover pop-up color,
  # but I'm glad I added it and it didn't break anything
  legend = list(title = list(text = "Artist"))
)

Showing the graph created

scatter_plot

Conclusion