A Jimmy Buffett-Inspired Unsupervised Learning and Text Mining of Emotion Terms in R

Regis O'Connor
July 2017

***

Objective & Methodology

This is a reverse engineering of a similar analysis published May 17, 2017 in DataScience+ and shared on R-bloggers.

https://datascienceplus.com/unsupervised-learning-and-text-mining-of-emotion-terms-using-r/

The raw data is a CSV file of 18 songs written by Jimmy Buffett. The code uses tidyverse and tidytext to process the 18x2 data.frame through 8 steps, including:

  • tokenization (tidytext)
  • filtering (tidyverse/dplyr)
  • anti-joining (tidyverse/dplyr)
  • joining (tidyverse/dplyr)
  • lexicon matching (tidytext)
  • summarizing (tidyverse/dplyr)
  • spreading (tidyverse/tidyr)
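
The first three steps can be tried on a single made-up lyric line before running the full data set. This is a minimal sketch, assuming the dplyr and tidytext packages are installed. The song title and lyric text are invented for illustration and are not taken from the CSV file.

```r
library(dplyr)
library(tidytext)

# Hypothetical one-row stand-in for the 18x2 data.frame (columns: song, lyrics)
toy_lyrics <- tibble(
  song   = "Toy Song",
  lyrics = "The boat sailed at 5 into the storm"
)

toy_tokens <- toy_lyrics %>%
  unnest_tokens(word, lyrics) %>%          # split into lowercase words, one per row
  anti_join(stop_words, by = "word") %>%   # remove stop words such as "the" and "at"
  filter(!grepl("[0-9]", word))            # remove tokens containing digits, here "5"

toy_tokens$word
```

Only content words such as "boat" and "storm" should remain, still tagged with their song, which is the shape the lexicon join expects.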

Pull Emotion Words, Aggregate by Song & Emotion

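library(tidyverse)   # dplyr and tidyr verbs, %>%
library(tidytext)    # unnest_tokens(), stop_words, get_sentiments()

# song_lyrics: the 18x2 data.frame read from the csv, with columns song and lyrics
emotions_Jimmy <- song_lyrics %>%
  unnest_tokens(word, lyrics) %>%                      # one lowercase word per row
  anti_join(stop_words, by = "word") %>%               # drop common stop words
  filter(!grepl('[0-9]', word)) %>%                    # drop tokens containing digits
  left_join(get_sentiments("nrc"), by = "word") %>%    # attach NRC categories (NA if unmatched)
  # keep the eight emotions only; unmatched (NA) rows are dropped here too
  filter(!(sentiment == "negative" | sentiment == "positive")) %>%
  group_by(song, sentiment) %>%
  summarize(freq = n()) %>%                            # result is still grouped by song
  mutate(percent = round(freq / sum(freq) * 100)) %>%  # share of each emotion within a song
  select(-freq) %>%
  spread(sentiment, percent, fill = 0) %>%             # one column per emotion
  ungroup()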
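
Two details of this pipeline are easy to miss: the filter on sentiment also discards words with no lexicon match (their sentiment is NA), and the percentages are computed within each song because the summarized result is still grouped by song. The sketch below shows both on toy data. It assumes dplyr and tidyr are installed, and it uses a hand-made four-word lexicon as a stand-in for NRC, so the words and their emotion labels are hypothetical.

```r
library(dplyr)
library(tidyr)

# Hypothetical tokens, already one word per row
tokens <- tibble(
  song = c("A", "A", "A", "A", "B", "B", "B"),
  word = c("sun", "storm", "beach", "happy", "storm", "storm", "boat")
)

# Hand-made stand-in for the NRC lexicon (not the real word-emotion pairs)
toy_lexicon <- tibble(
  word      = c("sun", "happy", "beach", "storm"),
  sentiment = c("joy", "joy", "positive", "fear")
)

toy_emotions <- tokens %>%
  left_join(toy_lexicon, by = "word") %>%   # "boat" has no match, so sentiment is NA
  filter(!(sentiment == "negative" | sentiment == "positive")) %>%  # drops "beach" and the NA row
  group_by(song, sentiment) %>%
  summarize(freq = n()) %>%                 # still grouped by song afterwards
  mutate(percent = round(freq / sum(freq) * 100)) %>%  # sum(freq) is the per-song total
  select(-freq) %>%
  spread(sentiment, percent, fill = 0) %>%
  ungroup()

toy_emotions
# song A: fear 33, joy 67
# song B: fear 100, joy 0
```

Each row sums to roughly 100 (rounding can shift it by a point), so songs of different lengths become comparable.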

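# z-score: center on the mean and divide by the standard deviation
sd_scale_Jimmy <- function(x) {
  (x - mean(x))/sd(x)
}
# standardize the eight emotion columns (columns 2 to 9) across songs
emotions_Jimmy[,c(2:9)] <- apply(emotions_Jimmy[,c(2:9)], 2, sd_scale_Jimmy)
# move song titles into row names and keep a numeric matrix for the heatmap
emotions_Jimmy <- as.data.frame(emotions_Jimmy)
rownames(emotions_Jimmy) <- emotions_Jimmy[,1]
emotions3_JB <- emotions_Jimmy[,-1]
emotions3_JB <- as.matrix(emotions3_JB)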
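
The hand-written scaling function computes a column-wise z-score, which base R's scale() also does with its default arguments. The sketch below checks that equivalence on a small invented matrix. The song names and values are placeholders, not results from the lyrics.

```r
# Same z-score formula as the slide's helper
sd_scale <- function(x) (x - mean(x)) / sd(x)

# Hypothetical percent values for three songs and two emotions
toy <- matrix(
  c(10, 20, 30,
     5,  5, 20),
  nrow = 3,
  dimnames = list(c("song_a", "song_b", "song_c"), c("joy", "fear"))
)

by_hand  <- apply(toy, 2, sd_scale)   # column by column, as in the slide
built_in <- scale(toy)                # centers and scales each column by default

# TRUE when the two results agree (ignoring the extra attributes scale() attaches)
all.equal(by_hand, built_in, check.attributes = FALSE)
```

After scaling, every emotion column has mean 0 and standard deviation 1, so a heatmap compares songs on each emotion relative to the other songs rather than on raw percentages.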

Heatmap produced in RPres

[Figure: heatmap of the scaled emotion scores by song]
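
The code that drew this heatmap is not included in the slides. As a rough sketch only, a scaled song-by-emotion matrix could be plotted with base R's heatmap(). The choice of function and its arguments are an assumption, not the author's method, and the random matrix below is a hypothetical stand-in for emotions3_JB.

```r
set.seed(42)

# Hypothetical stand-in for emotions3_JB: 4 songs x 8 NRC emotions, random z-scores
toy_scores <- matrix(
  rnorm(4 * 8),
  nrow = 4,
  dimnames = list(
    paste("song", 1:4),
    c("anger", "anticipation", "disgust", "fear",
      "joy", "sadness", "surprise", "trust")
  )
)

# scale = "none" because the columns are already standardized;
# rows and columns are reordered by hierarchical clustering (the default)
hm <- heatmap(toy_scores, scale = "none", margins = c(8, 8))
```

The dendrograms that heatmap() adds by default group songs with similar emotion profiles, which is the unsupervised-learning part of the analysis.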

Heatmap processed in PowerPoint and imported into RPres