I chose the “spotifysongs.csv” data set from the class Google Drive, which was scraped directly from Spotify. I am going to explore the top 5 artists for one particular year, 2019. My plan is to plot each artist's top song, how popular it was that year, and how danceable it was. The variables I will focus on are artist, song, year, danceability, and popularity. Artist and song are categorical variables, while year, danceability, and popularity are quantitative. To clean this data set, I first filter for the year 2019. Next, to find the top 5 artists, I use the group_by, summarize, and arrange functions to group each artist's songs together and add up their popularity points.

I chose this data set because I love listening to songs without being interrupted, and Spotify is the best platform I have used (and I am still using it). I picked 2019 because of COVID: I wanted to see how these artists did in that particular year. The variables in this data set include artist name, song release year, genre, tempo, danceability, popularity, valence, whether a song is explicit, energy, acousticness, key, and several more.
Spotify is a Swedish audio streaming and media service provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 602 million monthly active users, including 236 million paying subscribers, as of December 2023. This data set includes popular songs released from 1998 to 2020, across a variety of genres and artists.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.4.4 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(tinytex)
library(plotly)
##
## Attaching package: 'plotly'
##
## The following object is masked from 'package:ggplot2':
##
## last_plot
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following object is masked from 'package:graphics':
##
## layout
library(dplyr)
setwd("/Users/janithrithilakasiri/Downloads")
Spotify <- read_csv("spotifysongs.csv")
## Rows: 2000 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (3): artist, song, genre
## dbl (14): duration_ms, year, popularity, danceability, energy, key, loudness...
## lgl (1): explicit
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
spotifyy_2019 <- Spotify %>%
  filter(year == 2019)
spotifyy_2019
## # A tibble: 89 × 18
## artist song duration_ms explicit year popularity danceability energy key
## <chr> <chr> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 K-Ci &… Crazy 262773 FALSE 2019 30 0.68 0.644 0
## 2 Angie … If I… 244466 FALSE 2019 40 0.583 0.643 9
## 3 Aaliyah Rock… 275026 FALSE 2019 0 0.641 0.72 5
## 4 Libert… Just… 237359 FALSE 2019 43 0.786 0.614 5
## 5 Lil' K… Magi… 359973 TRUE 2019 47 0.849 0.498 2
## 6 Hinder Lips… 261053 FALSE 2019 35 0.474 0.744 2
## 7 Hinder Bett… 223533 FALSE 2019 30 0.451 0.682 2
## 8 Chris … Beau… 225881 FALSE 2019 53 0.415 0.775 5
## 9 Hayden… NUMB 217296 TRUE 2019 47 0.617 0.558 10
## 10 Nicky … X 172854 FALSE 2019 74 0.594 0.749 9
## # ℹ 79 more rows
## # ℹ 9 more variables: loudness <dbl>, mode <dbl>, speechiness <dbl>,
## # acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
## # tempo <dbl>, genre <chr>
top_5 <- spotifyy_2019 %>%
  group_by(artist) %>%
  # Add up the popularity points of each artist's 2019 songs
  summarise(total_popularity = sum(popularity)) %>%
  # slice_max() replaces the superseded top_n() and returns rows in descending order
  slice_max(total_popularity, n = 5)
top_5
## # A tibble: 5 × 2
## artist total_popularity
## <chr> <dbl>
## 1 Post Malone 244
## 2 Ariana Grande 236
## 3 Lil Nas X 226
## 4 Ed Sheeran 193
## 5 Billie Eilish 158
filtered_songs_top5 <- spotifyy_2019 %>%
  filter(artist %in% top_5$artist) %>%
  distinct(song)
filtered_songs_top5
## # A tibble: 15 × 1
## song
## <chr>
## 1 Old Town Road - Remix
## 2 bad guy
## 3 7 rings
## 4 Sunflower - Spider-Man: Into the Spider-Verse
## 5 Antisocial (with Travis Scott)
## 6 Old Town Road
## 7 Wow.
## 8 Panini
## 9 break up with your girlfriend, i'm bored
## 10 Take Me Back to London (feat. Stormzy)
## 11 boyfriend (with Social House)
## 12 bury a friend
## 13 Cross Me (feat. Chance the Rapper & PnB Rock)
## 14 Goodbyes (Feat. Young Thug)
## 15 Circles
top_songs_by_top5 <- spotifyy_2019 %>%
  filter(artist %in% top_5$artist) %>%
  group_by(artist, song, danceability) %>%
  # .groups = "drop" avoids the grouped-output message from summarise()
  summarise(total_popularity = sum(popularity), .groups = "drop") %>%
  arrange(artist, desc(total_popularity)) %>%
  group_by(artist) %>%
  # Keep each artist's most popular song
  slice_max(total_popularity, n = 1)
top_songs_by_top5
## # A tibble: 5 × 4
## # Groups: artist [5]
## artist song danceability total_popularity
## <chr> <chr> <dbl> <dbl>
## 1 Ariana Grande 7 rings 0.778 83
## 2 Billie Eilish bad guy 0.701 83
## 3 Ed Sheeran Take Me Back to London (feat. Sto… 0.885 66
## 4 Lil Nas X Old Town Road - Remix 0.878 79
## 5 Post Malone Circles 0.695 85
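To make the comparison between popularity and danceability easier to read, the five rows above can be ranked on both measures. This is only a minimal sketch: it retypes the values from the table above into a small data frame, and the names `top_songs`, `ranked`, `popularity_rank`, and `danceability_rank` are made up for illustration and are not part of the original analysis.

```r
library(dplyr)

# Values copied by hand from the top_songs_by_top5 output above
top_songs <- data.frame(
  artist = c("Ariana Grande", "Billie Eilish", "Ed Sheeran",
             "Lil Nas X", "Post Malone"),
  song = c("7 rings", "bad guy", "Take Me Back to London (feat. Stormzy)",
           "Old Town Road - Remix", "Circles"),
  danceability = c(0.778, 0.701, 0.885, 0.878, 0.695),
  total_popularity = c(83, 83, 66, 79, 85)
)

# min_rank() gives tied values the same rank; desc() makes rank 1 the highest
ranked <- top_songs %>%
  mutate(
    popularity_rank = min_rank(desc(total_popularity)),
    danceability_rank = min_rank(desc(danceability))
  ) %>%
  arrange(popularity_rank)

ranked
```

With these values, “Circles” ranks first in popularity but last in danceability, while “7 rings” and “bad guy” share the second popularity rank because both have 83 points.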
# One fixed color per artist so the legend stays consistent
artist_colors <- c("Post Malone" = "hotpink",
                   "Ariana Grande" = "orange",
                   "Lil Nas X" = "green",
                   "Ed Sheeran" = "red",
                   "Billie Eilish" = "purple")
ggplot(top_songs_by_top5, aes(x = total_popularity, y = song, color = artist, size = danceability)) +
  # Jitter only vertically so the popularity values on the x-axis are not shifted
  geom_point(alpha = 0.7, position = position_jitter(width = 0, height = 0.1)) +
  labs(title = "Top Song of Each Top 5 Artist in 2019",
       x = "Popularity Rating",
       y = "Song",
       size = "Danceability",
       color = "Artist",
       caption = "Source: Spotify") +
  scale_color_manual(values = artist_colors) +
  theme_gray()
The visualization above shows the top song of each of the top 5 artists in 2019, with the artists ranked by total popularity. The number 1 artist was Post Malone, whose song “Circles” had the highest popularity (85) but the lowest danceability of the five songs (0.695). In 2nd place, Ariana Grande's “7 rings” (popularity 83) had a fairly high danceability of 0.778. In 3rd place, Lil Nas X's “Old Town Road - Remix” (popularity 79) was very danceable at 0.878, as we all know. In 4th place, Ed Sheeran's “Take Me Back to London” (popularity 66) had the highest danceability of the 5 songs at 0.885. Last, Billie Eilish's “bad guy” (popularity 83) had a danceability of 0.701. So not all of the popular songs were high in danceability. At first I tried to plot the top 10 songs, but it did not work as well as I expected because the plot points overlapped. Other than that, I had so much fun with this project.
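One possible way to deal with the overlapping points is to give every song its own row, sorted by popularity, and draw bars instead of jittered points. The sketch below is only an illustration under stated assumptions: it retypes the five songs from the results above into a small data frame (the same idea should extend to 10 songs), and the names `top_songs` and `p` are hypothetical and not part of the original analysis.

```r
library(ggplot2)

# Values copied by hand from the top_songs_by_top5 results
top_songs <- data.frame(
  artist = c("Ariana Grande", "Billie Eilish", "Ed Sheeran",
             "Lil Nas X", "Post Malone"),
  song = c("7 rings", "bad guy", "Take Me Back to London (feat. Stormzy)",
           "Old Town Road - Remix", "Circles"),
  danceability = c(0.778, 0.701, 0.885, 0.878, 0.695),
  total_popularity = c(83, 83, 66, 79, 85)
)

# reorder() turns song into a factor whose levels are sorted by popularity,
# so the most popular song is drawn at the top of the y-axis
top_songs$song <- reorder(top_songs$song, top_songs$total_popularity)

p <- ggplot(top_songs, aes(x = total_popularity, y = song, fill = artist)) +
  # One bar per song, so no two songs can land on top of each other
  geom_col(width = 0.6) +
  # Print the danceability value just inside the end of each bar
  geom_text(aes(label = danceability), hjust = 1.2, size = 3) +
  labs(title = "Top Song of Each Top 5 Artist in 2019",
       x = "Popularity Rating",
       y = "Song",
       fill = "Artist",
       caption = "Source: Spotify")
```

Running `print(p)` draws the chart. Because each song sits on its own row, no jitter is needed and the popularity values stay exact.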