Regis O'Connor
July 2017
This reverse-engineers a similar analysis published May 17, 2017 on DataScience+ and shared on R-bloggers:
https://datascienceplus.com/unsupervised-learning-and-text-mining-of-emotion-terms-using-r/
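The raw data is a CSV file of 18 songs written by Jimmy Buffett. The code uses the tidyverse and tidytext packages to take the resulting 18 x 2 data frame (one row per song, with columns song and lyrics) through eight steps: tokenize the lyrics into words, remove stop words, remove tokens containing digits, tag words with the NRC lexicon, keep only the eight emotions, count tags per song, convert counts to percentages, and spread the result into one column per emotion.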
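library(tidyverse)  # dplyr, tidyr (spread) and the %>% pipe
library(tidytext)   # unnest_tokens(), stop_words, get_sentiments(); "nrc" may need the textdata package

# song_lyrics: 18 x 2 data frame with columns song and lyrics
emotions_Jimmy <- song_lyrics %>%
  unnest_tokens(word, lyrics) %>%                      # one row per word
  anti_join(stop_words, by = "word") %>%               # drop stop words
  filter(!grepl("[0-9]", word)) %>%                    # drop tokens containing digits
  left_join(get_sentiments("nrc"), by = "word") %>%    # tag words with NRC sentiments
  # keep the 8 emotions; words with no NRC match (NA sentiment) are also dropped here
  filter(!(sentiment == "negative" | sentiment == "positive")) %>%
  group_by(song, sentiment) %>%
  summarize(freq = n()) %>%                            # tag counts; result stays grouped by song
  mutate(percent = round(freq / sum(freq) * 100)) %>%  # share of each song's emotion tags
  select(-freq) %>%
  spread(sentiment, percent, fill = 0) %>%             # wide: one column per emotion
  ungroup()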
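To make the counting and percentage steps concrete, here is a base-R sketch on a tiny made-up input. The tokens and lexicon data frames are hypothetical stand-ins for the tokenized lyrics and the NRC lexicon, and the stop-word and digit filters are skipped. It uses an inner join (merge()) in place of the left_join() plus filter() combination above; the rows kept are the same, because filter() discards rows whose condition evaluates to NA, which is what happens for words with no NRC match.

```r
# Hypothetical tokenized lyrics: one row per (song, word)
tokens <- data.frame(
  song = c("A", "A", "A", "B", "B"),
  word = c("sun", "beer", "storm", "storm", "sun"),
  stringsAsFactors = FALSE
)

# Hypothetical mini lexicon standing in for get_sentiments("nrc");
# as in NRC, one word can carry several tags
lexicon <- data.frame(
  word      = c("sun", "sun", "beer", "storm", "storm"),
  sentiment = c("joy", "positive", "joy", "fear", "negative"),
  stringsAsFactors = FALSE
)

# Inner join keeps only words found in the lexicon, then the
# positive/negative polarity tags are dropped so only emotions remain
tagged <- merge(tokens, lexicon, by = "word")
tagged <- tagged[!tagged$sentiment %in% c("positive", "negative"), ]

# Song x emotion counts; absent combinations show up as 0 (like fill = 0)
counts <- table(tagged$song, tagged$sentiment)

# Each song's counts as rounded percentages of that song's total
percent <- round(prop.table(counts, margin = 1) * 100)
print(percent)
```

Song A has one fear tag and two joy tags, so it comes out as 33% fear and 67% joy. Because each value is rounded separately, a song's percentages are not guaranteed to sum to exactly 100.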
# z-score standardization: center on the mean, divide by the sample standard deviation
sd_scale_Jimmy <- function(x) {
  (x - mean(x)) / sd(x)
}
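# sd_scale_Jimmy() turns a column into z-scores. A quick check on made-up
# numbers (not the song data): it matches base R's scale(), which also
# divides by the n - 1 standard deviation. The function is redefined here
# only so this snippet runs on its own.
sd_scale_Jimmy <- function(x) {
  (x - mean(x)) / sd(x)
}

x <- c(10, 20, 30, 40)  # made-up values
z <- sd_scale_Jimmy(x)

# scale() returns a one-column matrix, so drop it to a plain vector first
print(all.equal(z, as.vector(scale(x))))

# z-scores have mean 0 and standard deviation 1
print(round(c(mean = mean(z), sd = sd(z)), 10))

# Caveat: a constant column has sd = 0, so every entry becomes 0/0
print(sd_scale_Jimmy(c(5, 5, 5)))  # NaN NaN NaN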
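# Convert the tibble to a plain data.frame first (row names are set below)
emotions_Jimmy <- as.data.frame(emotions_Jimmy)
# Standardize the 8 emotion columns (2:9) across the 18 songs
emotions_Jimmy[, 2:9] <- apply(emotions_Jimmy[, 2:9], 2, sd_scale_Jimmy)
# Use song titles as row names, then keep only the numeric emotion columns
rownames(emotions_Jimmy) <- emotions_Jimmy[, 1]
emotions3_JB <- as.matrix(emotions_Jimmy[, -1])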
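The final lines reshape the standardized table into a numeric matrix with one row per song, labeled by song title, and one column per emotion. The sketch below repeats that reshaping on a hypothetical three-song, two-emotion table so the resulting shape is easy to inspect; the song names and values are made up.

```r
# Hypothetical wide table: one row per song, one column per emotion (percent)
wide <- data.frame(
  song = c("Song A", "Song B", "Song C"),
  joy  = c(60, 20, 40),
  fear = c(10, 50, 30),
  stringsAsFactors = FALSE
)

# Standardize every emotion column (everything after the song column)
wide[, -1] <- apply(wide[, -1], 2, function(x) (x - mean(x)) / sd(x))

# Move song titles into row names, then keep only the numeric columns
rownames(wide) <- wide$song
m <- as.matrix(wide[, -1])

print(m)
```

Selecting columns by position (2:9) assumes all eight NRC emotions appear as columns after spread(), which only happens if each emotion occurs in at least one song; taking everything after the first column, as with [, -1] here, avoids depending on that count.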