Project 2

What is Spotify?

Spotify is a Swedish audio streaming and media service provider founded on 23 April 2006 by Daniel Ek and Martin Lorentzon. It is one of the largest music streaming service providers, with over 602 million monthly active users, including 236 million paying subscribers, as of December 2023. This data set contains popular songs spanning 1998 to 2020, across a variety of genres and artists.

Let’s load the libraries we need for this analysis.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.4.4     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(tinytex)
library(plotly)

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

library(dplyr)

Set the working directory and load the data set.

setwd("/Users/janithrithilakasiri/Downloads")
Spotify <- read_csv("spotifysongs.csv")

## Rows: 2000 Columns: 18
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr  (3): artist, song, genre
## dbl (14): duration_ms, year, popularity, danceability, energy, key, loudness...
## lgl  (1): explicit
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
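
The message above suggests specifying the column types. Here is a minimal sketch of that idea, which also passes a full path to read_csv() so that setwd() is not needed. The temporary file and its four columns are made up for illustration and are not the real spotifysongs.csv.

```r
library(readr)

# Write a tiny made-up CSV to a temporary file so the sketch runs on its own
path <- tempfile(fileext = ".csv")
write_lines(
  c("artist,song,year,explicit",
    "A,a1,2019,FALSE",
    "B,b1,2018,TRUE"),
  path
)

# Declaring col_types up front quiets the column specification message
songs <- read_csv(
  path,
  col_types = cols(
    artist   = col_character(),
    song     = col_character(),
    year     = col_integer(),
    explicit = col_logical()
  )
)

songs
```

With the real file, the same call would need one entry per column (or cols(.default = ...) for the rest), which I have not spelled out here.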

Let’s filter the data down to songs from 2019.

spotifyy_2019 <- Spotify %>%
  filter(year == 2019)
spotifyy_2019

## # A tibble: 89 × 18
##    artist  song  duration_ms explicit  year popularity danceability energy   key
##    <chr>   <chr>       <dbl> <lgl>    <dbl>      <dbl>        <dbl>  <dbl> <dbl>
##  1 K-Ci &… Crazy      262773 FALSE     2019         30        0.68   0.644     0
##  2 Angie … If I…      244466 FALSE     2019         40        0.583  0.643     9
##  3 Aaliyah Rock…      275026 FALSE     2019          0        0.641  0.72      5
##  4 Libert… Just…      237359 FALSE     2019         43        0.786  0.614     5
##  5 Lil' K… Magi…      359973 TRUE      2019         47        0.849  0.498     2
##  6 Hinder  Lips…      261053 FALSE     2019         35        0.474  0.744     2
##  7 Hinder  Bett…      223533 FALSE     2019         30        0.451  0.682     2
##  8 Chris … Beau…      225881 FALSE     2019         53        0.415  0.775     5
##  9 Hayden… NUMB       217296 TRUE      2019         47        0.617  0.558    10
## 10 Nicky … X          172854 FALSE     2019         74        0.594  0.749     9
## # ℹ 79 more rows
## # ℹ 9 more variables: loudness <dbl>, mode <dbl>, speechiness <dbl>,
## #   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
## #   tempo <dbl>, genre <chr>

Now let’s find the top 5 artists of 2019, ranked by the total popularity of their songs.

top_5 <- spotifyy_2019 %>%
  group_by(artist) %>%
  summarise(total_popularity = sum(popularity)) %>%
  arrange(desc(total_popularity)) %>%
  top_n(5)

## Selecting by total_popularity

top_5

## # A tibble: 5 × 2
##   artist        total_popularity
##   <chr>                    <dbl>
## 1 Post Malone                244
## 2 Ariana Grande              236
## 3 Lil Nas X                  226
## 4 Ed Sheeran                 193
## 5 Billie Eilish              158
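
As a side note, top_n() is superseded in current dplyr in favour of slice_max(). A minimal sketch of the same ranking step with slice_max(), run on a made-up tibble (the `toy` data and its values are hypothetical, not taken from the Spotify file):

```r
library(dplyr)

# Hypothetical toy data standing in for spotifyy_2019 (values are made up)
toy <- tibble(
  artist     = c("A", "A", "B", "C", "C", "D"),
  popularity = c(80, 70, 90, 40, 30, 10)
)

top_2 <- toy %>%
  group_by(artist) %>%
  summarise(total_popularity = sum(popularity)) %>%
  # slice_max() keeps the n rows with the largest values, sorted descending;
  # with_ties = FALSE guarantees exactly n rows even when totals are equal
  slice_max(total_popularity, n = 2, with_ties = FALSE)

top_2
```

Because slice_max() names the ranking column explicitly, it does not print the "Selecting by ..." message, and it already returns the rows in descending order.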

Let’s list the 2019 songs by these artists to see whether I ever listened to any of them.

filtered_songs_top5 <- spotifyy_2019 %>%
  filter(artist %in% top_5$artist) %>%
  distinct(song)

filtered_songs_top5

## # A tibble: 15 × 1
##    song                                         
##    <chr>                                        
##  1 Old Town Road - Remix                        
##  2 bad guy                                      
##  3 7 rings                                      
##  4 Sunflower - Spider-Man: Into the Spider-Verse
##  5 Antisocial (with Travis Scott)               
##  6 Old Town Road                                
##  7 Wow.                                         
##  8 Panini                                       
##  9 break up with your girlfriend, i'm bored     
## 10 Take Me Back to London (feat. Stormzy)       
## 11 boyfriend (with Social House)                
## 12 bury a friend                                
## 13 Cross Me (feat. Chance the Rapper & PnB Rock)
## 14 Goodbyes (Feat. Young Thug)                  
## 15 Circles

Let’s find the most popular song of each of these artists.

top_songs_by_top5 <- spotifyy_2019 %>%
  filter(artist %in% top_5$artist) %>%
  group_by(artist, song,danceability) %>%
  summarise(total_popularity = sum(popularity)) %>%
  arrange(artist, desc(total_popularity)) %>%
  group_by(artist) %>%
  top_n(1)

## `summarise()` has grouped output by 'artist', 'song'. You can override using
## the `.groups` argument.
## Selecting by total_popularity

top_songs_by_top5

## # A tibble: 5 × 4
## # Groups:   artist [5]
##   artist        song                               danceability total_popularity
##   <chr>         <chr>                                     <dbl>            <dbl>
## 1 Ariana Grande 7 rings                                   0.778               83
## 2 Billie Eilish bad guy                                   0.701               83
## 3 Ed Sheeran    Take Me Back to London (feat. Sto…        0.885               66
## 4 Lil Nas X     Old Town Road - Remix                     0.878               79
## 5 Post Malone   Circles                                   0.695               85
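
The same "best song per artist" step could also be written without summarise(), which avoids the grouping message shown above. This is only a sketch on made-up data, and it assumes each song appears in exactly one row (the `toy` tibble is hypothetical):

```r
library(dplyr)

# Hypothetical toy data (made-up values), one row per song
toy <- tibble(
  artist       = c("A", "A", "B", "B"),
  song         = c("a1", "a2", "b1", "b2"),
  danceability = c(0.5, 0.6, 0.7, 0.8),
  popularity   = c(60, 85, 79, 40)
)

top_song <- toy %>%
  group_by(artist) %>%
  # keep each artist's single most popular row; the other columns come along
  slice_max(popularity, n = 1, with_ties = FALSE) %>%
  # drop the grouping so later steps work on a plain tibble
  ungroup()

top_song
```

If a song can appear in more than one row, the popularity values would first need to be combined per song, as the summarise() version does.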

artist_colors <- c("Post Malone"   = "hotpink",
                   "Ariana Grande" = "orange",
                   "Lil Nas X"     = "green",
                   "Ed Sheeran"    = "red",
                   "Billie Eilish" = "purple")


ggplot(top_songs_by_top5, aes(x = total_popularity, y = song, color = artist, size = danceability)) +
  geom_point(alpha = 0.7, position = "jitter") +
  labs(title = "Top Song of Each of the Top 5 Artists in 2019",
       x = "Popularity Rating",
       y = "Song",
       size = "Danceability",
       color = "Artist",
       caption = "Source: Spotify") +
  scale_color_manual(values = artist_colors) +
  theme_gray()
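
Since plotly is already loaded, a chart like this could also be made interactive with ggplotly(). A minimal sketch, assuming a made-up data frame (`toy` and its values are hypothetical) rather than the real top_songs_by_top5:

```r
library(ggplot2)
library(plotly)

# Hypothetical toy data (made-up values) with the same columns as the plot uses
toy <- data.frame(
  song             = c("a2", "b1"),
  artist           = c("A", "B"),
  total_popularity = c(85, 79),
  danceability     = c(0.6, 0.7)
)

p <- ggplot(toy, aes(x = total_popularity, y = song, color = artist, size = danceability)) +
  geom_point(alpha = 0.7) +
  labs(x = "Popularity Rating", y = "Song")

# ggplotly() converts a ggplot object into an interactive plotly widget
interactive_plot <- ggplotly(p)
```

The widget shows hover tooltips in HTML output; for a PDF report the static ggplot is still the one to use.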

Janithri Pannala

2024-04-12

What I will be doing and the reason behind the data set?