Step 2 | Sentiment

Remove Objects from Environment

ls()

## character(0)

rm(list = ls())

Load packages and lexicon.

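library(genius)    # fetch song lyrics from Genius (genius_lyrics())
library(dplyr)     # data manipulation: mutate(), select(), summarise()
library(tidytext)  # unnest_tokens() and get_sentiments()
# AFINN lexicon: one row per word with an integer score from -5 to +5.
# Note: newer tidytext releases load AFINN through the textdata package and
# name the score column `value`; this walkthrough uses the older `score` name.
lexicon <- get_sentiments("afinn")
glimpse(lexicon)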

## Observations: 2,476
## Variables: 2
## $ word  <chr> "abandon", "abandoned", "abandons", "abducted", "abduction…
## $ score <int> -2, -2, -2, -2, -2, -2, -3, -3, -3, -3, 2, 2, 1, -1, -1, 2…
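AFINN gives each word an integer score between -5 and +5. A text's score is the sum of the scores of its words that appear in the lexicon; words that are not in the lexicon contribute nothing. The base-R sketch below shows that idea with a tiny hand-made lexicon. Its words and scores are hypothetical stand-ins, not rows taken from AFINN.

```r
# Hypothetical mini-lexicon in the same shape as `lexicon` (word, score);
# these scores are illustrative, not the real AFINN values.
toy_lexicon <- data.frame(
  word  = c("love", "hate", "happy", "lonely"),
  score = c(3L, -3L, 3L, -2L),
  stringsAsFactors = FALSE
)

# Score a text: lower-case it, split on anything that is not a letter or an
# apostrophe, look each word up in the lexicon, and add up the matched scores.
# Repeated words are counted every time they occur.
score_text <- function(text, lex) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words <- words[words != ""]
  matched <- lex$score[match(words, lex$word)]  # NA for words not in the lexicon
  sum(matched, na.rm = TRUE)
}

score_text("I love you but I feel lonely", toy_lexicon)  # 3 + (-2) = 1
score_text("Nothing here matches", toy_lexicon)          # 0
```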

1. Load data from step 1: top100songs.rds

top100songs <- readRDS("top100songs.rds")

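2. Data cleaning: the genius package has trouble matching track names that contain special characters, so remove apostrophes, trailing text in parentheses, and plus signs.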

top100songsv2 <- top100songs %>% 
  mutate(Track = gsub("'", "", Track),                       # remove apostrophes
         Track = gsub("\\s*\\([^\\)]+\\)\\s*$", "", Track),  # remove trailing text in parentheses
         Track = gsub("+", "", Track, fixed = TRUE))          # remove literal "+" signs
head(top100songsv2)

## # A tibble: 6 x 3
##   Artist       Track         TotalStreams
##   <chr>        <chr>                <dbl>
## 1 Drake        Gods Plan        453226629
## 2 XXXTENTACION SAD!             332633597
## 3 Post Malone  Psycho           306877012
## 4 Juice WRLD   Lucid Dreams     299907223
## 5 BlocBoy JB   Look Alive       266861797
## 6 Drake        Nice For What    263455062
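Before calling Genius, you can check the cleaning rules on a few made-up titles. The base-R sketch below wraps the rules above in a function. The example titles are hypothetical. The last rule is written with `fixed = TRUE`: `+` is a regular-expression quantifier, so matching a literal plus sign needs either a fixed-string match or an escaped pattern.

```r
# The cleaning rules above, wrapped in a function so they can be tested.
clean_track <- function(track) {
  track <- gsub("'", "", track)                       # drop apostrophes
  track <- gsub("\\s*\\([^\\)]+\\)\\s*$", "", track)  # drop a trailing "(...)"
  track <- gsub("+", "", track, fixed = TRUE)         # drop literal "+" signs
  track
}

# Hypothetical titles, used only to exercise each rule.
clean_track(c("Gods Plan", "Don't Stop", "Song Title (feat. Someone)", "A+B"))
# "Gods Plan" "Dont Stop" "Song Title" "AB"
```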

3. Extract lyrics for the Top 100 songs and summarise sentiment.

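For each row, this function does the following:
1. Pulls the lyrics from Genius using the Artist and Track columns of the top 100 songs data frame
2. Splits the lyrics into words, keeps the words found in the AFINN lexicon, and sums their scores to give one sentiment score per track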

This may take 3 to 4 seconds per song, depending on the speed of your internet connection. Also keep in mind that lyrics may not be found for every song. A failed lookup prints a message and leaves that song's score as NA.

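Sentiment <- sapply(
  # X = 1:5,  # uncomment to test on the first five songs only
  X = seq_len(nrow(top100songsv2)),
  FUN = function(row_num, topSongTBL) {
    # Default score; stays NA if the lookup or the scoring fails
    sentiment <- NA
    tryCatch({
      # Download the lyrics for this artist/track (text is in the `lyric` column)
      lyricTBL <- genius::genius_lyrics(
        artist = topSongTBL[["Artist"]][row_num],
        song   = topSongTBL[["Track"]][row_num]
      )

      # One row per word -> keep words in the AFINN lexicon -> total score
      sentiment <- lyricTBL %>%
        unnest_tokens(word, lyric) %>%
        select(word) %>%
        inner_join(lexicon, by = "word") %>%
        summarise(score = sum(score))

      # Extract the single number from the one-row summary
      sentiment <- sentiment[[1]]
    }, error = function(e) {
      print(paste0("Failed for song name: ", topSongTBL[["Track"]][row_num]))
    })

    return(sentiment)
  },
  topSongTBL = top100songsv2
)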

## [1] "Failed for song name: Ric Flair Drip"
## [1] "Failed for song name: Yes Indeed"
## [1] "Failed for song name: Love Lies"
## [1] "Failed for song name: Drip Too Hard"
## [1] "Failed for song name: LOVE. FEAT. ZACARI."
## [1] "Failed for song name: Finesse - Remix; feat. Cardi B"
## [1] "Failed for song name: Sunflower - Spider-Man: Into the Spider-Verse"
## [1] "Failed for song name: the remedy for a broken heart"
## [1] "Failed for song name: Tequila"
## [1] "Failed for song name: Wake Up in the Sky"
## [1] "Failed for song name: Pray For Me"
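Several of the failed titles carry extra text after a spaced hyphen, for example "Finesse - Remix; feat. Cardi B" and "Sunflower - Spider-Man: Into the Spider-Verse". The parentheses rule does not catch these. One possible extra cleaning rule is to drop everything from the first " - " onward before querying Genius. This is an untested assumption, not a verified fix for these lookups. The function name `drop_dash_suffix()` is hypothetical.

```r
# Hypothetical extra rule: drop a " - suffix" (remix, soundtrack tie-in, etc.).
# Only a hyphen with whitespace on both sides is treated as a separator,
# so hyphenated words such as "Spider-Man" are left alone.
drop_dash_suffix <- function(track) {
  sub("\\s+-\\s+.*$", "", track)
}

drop_dash_suffix(c("Finesse - Remix; feat. Cardi B",
                   "Sunflower - Spider-Man: Into the Spider-Verse",
                   "Look Alive"))
# "Finesse" "Sunflower" "Look Alive"
```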

print(head(as.data.frame(Sentiment)))

##   Sentiment
## 1       -24
## 2       -40
## 3       -18
## 4       -36
## 5       -82
## 6       -14
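The core of the `sapply()` call is a "score or NA" pattern. Every lookup runs inside `tryCatch()`, so one failed song prints a message and leaves `NA` instead of stopping the whole loop. The base-R sketch below reproduces that pattern offline. `fake_lyrics()` is a hypothetical stand-in for `genius::genius_lyrics()`, and the songs, lyrics, and lexicon scores are made up. Simple string splitting replaces `unnest_tokens()` here, so tokenisation may differ slightly from tidytext.

```r
# Hypothetical stand-in for genius::genius_lyrics(): returns a data frame with
# a `lyric` column, or throws an error when the song is "not found".
fake_lyrics <- function(artist, song) {
  store <- list("Song A" = c("I love this", "so happy"),
                "Song B" = c("I hate waiting"))
  if (is.null(store[[song]])) stop("lyrics not found")
  data.frame(lyric = store[[song]], stringsAsFactors = FALSE)
}

# Made-up lexicon and song list in the same shape as the real objects.
toy_lexicon <- data.frame(word = c("love", "happy", "hate"),
                          score = c(3L, 3L, -3L), stringsAsFactors = FALSE)
songs <- data.frame(Artist = c("X", "Y", "Z"),
                    Track  = c("Song A", "Song B", "Song C"),
                    stringsAsFactors = FALSE)

score_song <- function(row_num, tbl) {
  sentiment <- NA  # stays NA if anything below fails
  tryCatch({
    lyrics <- fake_lyrics(tbl$Artist[row_num], tbl$Track[row_num])
    # Tokenise: lower-case, split on non-letters, drop empty strings.
    words <- unlist(strsplit(tolower(lyrics$lyric), "[^a-z']+"))
    words <- words[words != ""]
    # Rough equivalent of inner_join(lexicon) + summarise(sum(score)).
    sentiment <- sum(toy_lexicon$score[match(words, toy_lexicon$word)],
                     na.rm = TRUE)
  }, error = function(e) {
    print(paste0("Failed for song name: ", tbl$Track[row_num]))
  })
  sentiment
}

scores <- sapply(seq_len(nrow(songs)), score_song, tbl = songs)
scores  # 6 -3 NA
```

The error handler only prints, so the loop always returns exactly one value per row. That is what lets the scores line up with the song rows when they are combined with `cbind()`.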

Add the scores as a column and save the final result to an RDS file, top100songsSENTIMENT.rds.

top100songsSENTIMENT <- cbind(top100songsv2, Sentiment)
saveRDS(top100songsSENTIMENT, "top100songsSENTIMENT.rds")
head(top100songsSENTIMENT)

##         Artist         Track TotalStreams Sentiment
## 1        Drake     Gods Plan    453226629       -24
## 2 XXXTENTACION          SAD!    332633597       -40
## 3  Post Malone        Psycho    306877012       -18
## 4   Juice WRLD  Lucid Dreams    299907223       -36
## 5   BlocBoy JB    Look Alive    266861797       -82
## 6        Drake Nice For What    263455062       -14
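`saveRDS()` stores a single R object together with its column types, so a later step can read the file back with `readRDS()` and get the same data frame. The sketch below does a quick round-trip check using the six rows printed above. It writes to a temporary file instead of the real `top100songsSENTIMENT.rds`. The `order()` call at the end is only an example of what a later step might do with the scores.

```r
# The six rows printed above, rebuilt by hand for a self-contained check.
top6 <- data.frame(
  Artist       = c("Drake", "XXXTENTACION", "Post Malone", "Juice WRLD",
                   "BlocBoy JB", "Drake"),
  Track        = c("Gods Plan", "SAD!", "Psycho", "Lucid Dreams",
                   "Look Alive", "Nice For What"),
  TotalStreams = c(453226629, 332633597, 306877012, 299907223,
                   266861797, 263455062),
  Sentiment    = c(-24, -40, -18, -36, -82, -14),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".rds")  # temporary file instead of the real output
saveRDS(top6, path)
restored <- readRDS(path)
identical(restored, top6)           # TRUE: values and types survive the round trip

# Example follow-up: tracks ordered from most negative to most positive score.
restored[order(restored$Sentiment), c("Track", "Sentiment")]
```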

Step 2 | Sentiment_Analysis

J Argueta

4/24/2019
