Pick an album of your choice - I recommend one with lots of lyrics. Rap albums are great for this.
- Find the album and get the lyrics, and unnest them.
library(geniusr)
library(tidyverse)
library(tidytext)
library(wordcloud2)
genius_token()
search_song("complexion (a zulu love)")
get_song_meta(722287)
tpab_tracks <- scrape_tracklist(120991)
argument is not an atomic vector; coercing
# Scrape each track's lyrics once; tpab_words keeps whole lines for the bigram step.
tpab_words <- map_df(tpab_tracks$song_lyrics_url, scrape_lyrics_url)
# Split every lyric line into one word per row.
tpab_lyrics <- tpab_words %>%
  unnest_tokens(word, line) %>%
  select(song_name, word)
tpab_lyrics
For this assignment I chose one of my favorite hip-hop albums, Kendrick Lamar’s “To Pimp a Butterfly.” Here are the lyrics from each track, unnested into one word per row. As shown above, the album contains nearly 13,000 words.
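To make the unnesting step concrete, here is a minimal sketch on a made-up two-line song. The `toy_lines` table and its contents are hypothetical stand-ins for the scraped lyrics; only `unnest_tokens()` and `select()` mirror the code above.

```r
library(dplyr)
library(tidytext)

# Hypothetical two-line "song"; the real lines come from the scraping step.
toy_lines <- tibble(
  song_name = c("Toy Song", "Toy Song"),
  line      = c("The sun is up", "Up and up we go")
)

# One row per word: unnest_tokens() lowercases and strips punctuation by default.
toy_words <- toy_lines %>%
  unnest_tokens(word, line) %>%
  select(song_name, word)

toy_words        # 9 rows, one per word
nrow(toy_words)  # total word count for the toy song
```

Counting the rows of the unnested table is how a figure like the album's word total can be read off.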
- Clean the lyrics by removing stopwords, and then create a table and word cloud with the word counts.
tpab_lyrics %>%
  anti_join(get_stopwords()) %>%
  count(word, sort = TRUE) %>%
  top_n(200)
Joining, by = "word"
Selecting by n
Here is a table of the most used words on “To Pimp a Butterfly” with “stop words” removed. Among these top words, “know” appears the most (101 times) and “watch” appears the least (7 times).
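The stop word removal and counting can be sketched on toy data. Both `toy_words` and `toy_stops` below are hypothetical; the real analysis uses `tpab_lyrics` and the list returned by `get_stopwords()`.

```r
library(dplyr)

# Hypothetical word table and stop list (stand-ins for tpab_lyrics and get_stopwords()).
toy_words <- tibble(word = c("up", "the", "sun", "is", "up", "and", "up", "we", "go"))
toy_stops <- tibble(word = c("the", "is", "and", "we"))

toy_counts <- toy_words %>%
  anti_join(toy_stops, by = "word") %>%  # keep only words absent from the stop list
  count(word, sort = TRUE)               # tally what is left, most frequent first

toy_counts
```

Naming the join column with `by = "word"` also avoids the “Joining, by” message that appears in the output above.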
tpab_lyrics %>%
  anti_join(get_stopwords()) %>%
  count(word, sort = TRUE) %>%
  top_n(200) %>%
  wordcloud2(size = .5)
Joining, by = "word"
Selecting by n
Here is a word cloud of the most frequently used words on Kendrick’s album, with “stop words” removed. The larger a word appears in the cloud, the more often it is used. As you can see, this album contains lots of profanity.
- Do sentiment analyses using bing and nrc, and create graphs of the words that contribute most to each sentiment.
bing <- get_sentiments("bing")
nrc <- get_sentiments("nrc")
tpab_lyrics %>%
  inner_join(bing) %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(sentiment), scales = "free") +
  labs(y = "To Pimp a Butterfly: 'Bing' Sentiment Lyrical Analysis",
       x = NULL) +
  scale_fill_viridis_d() +
  coord_flip() +
  theme_minimal()
Joining, by = "word"
Selecting by n

Here is a graph depicting the “Bing” sentiment analysis of the lyrics on “TPAB.” “Like” and “love” are the two most used words that the “Bing” dictionary labels positive, and “shit” and “fuck” are the top two negative words. Interestingly, “love,” “lie” and “complicated” are also among the most commonly used words, which reflects some of the themes running through the album.
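As a rough illustration of how the lexicon join behaves, here is a sketch with a hypothetical three-word lexicon. The `toy_lexicon` table is made up for the example and is not the real Bing list.

```r
library(dplyr)

# Hypothetical words and a hypothetical two-class lexicon (not the real Bing list).
toy_words   <- tibble(word = c("love", "lie", "love", "sun", "pain", "lie", "love"))
toy_lexicon <- tibble(
  word      = c("love", "lie", "pain"),
  sentiment = c("positive", "negative", "negative")
)

toy_sentiment <- toy_words %>%
  inner_join(toy_lexicon, by = "word") %>%  # words missing from the lexicon drop out
  count(word, sentiment, sort = TRUE)

toy_sentiment
```

Because `inner_join()` keeps only matching rows, words the lexicon does not cover (here “sun”) contribute nothing, so the graph reflects just the part of the lyrics the lexicon recognizes.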
tpab_lyrics %>%
  inner_join(nrc) %>%
  count(word, sentiment, sort = TRUE) %>%
  group_by(sentiment) %>%
  top_n(10) %>%
  ungroup() %>%
  mutate(word = reorder(word, n)) %>%
  ggplot(aes(word, n, fill = sentiment)) +
  geom_col(show.legend = FALSE) +
  facet_wrap(vars(sentiment), scales = "free") +
  labs(y = "To Pimp a Butterfly: 'NRC' Sentiment Lyrical Analysis",
       x = NULL) +
  scale_fill_viridis_d() +
  coord_flip() +
  theme_minimal()
Joining, by = "word"
Selecting by n

Here is a graph showing the “NRC” sentiment analysis of the lyrics. Interestingly, “love” and “shit” rank among the top words across the different sentiment categories, similar to the Bing analysis.
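One reason the same word can show up in several panels is that the NRC lexicon may assign more than one category to a word. A minimal sketch, assuming a hypothetical lexicon (`toy_multi`) whose labels are illustrative only:

```r
library(dplyr)

# Hypothetical word counts and a hypothetical lexicon in which one word
# carries several labels (the labels here are illustrative only).
toy_counts <- tibble(word = c("love", "pain"), n = c(3L, 1L))
toy_multi  <- tibble(
  word      = c("love", "love", "pain", "pain"),
  sentiment = c("joy", "positive", "fear", "negative")
)

# Each word count is copied once per label it carries.
toy_by_sentiment <- toy_counts %>%
  inner_join(toy_multi, by = "word")

toy_by_sentiment
```

In this toy result “love” appears twice with the same count, once per category, which is the pattern visible in the faceted graph.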
- Create bigrams of the lyrics, remove the stopwords, and create a table and word cloud of the most common bigrams.
tpab_bigrams <- tpab_words %>%
  unnest_tokens(bigram, line, token = "ngrams", n = 2) %>%
  select(bigram)
tpab_bigrams %>%
  # sep is an argument of separate(), not part of the column names
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  unite(bigram, word1, word2, sep = " ") %>%
  count(bigram, sort = TRUE) %>%
  filter(n > 1)
Here is a table of the most common bigrams on the album. “Ain t” and “don t” come out on top when the contractions “ain't” and “don't” are split at the apostrophe, but those are not real bigrams, so I would say that “zoom zoom” is the most common, with “gotta lie” appearing nearly as many times.
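The “ain t” and “don t” entries come from how `separate()` splits strings. A small sketch on two hand-written bigrams (hypothetical, not taken from the scraped data) shows the difference between the default separator and an explicit space:

```r
library(dplyr)
library(tidyr)

# Hypothetical bigrams, written by hand rather than produced by unnest_tokens().
toy_bigrams <- tibble(bigram = c("ain't you", "zoom zoom"))

# Default sep: split on any run of non-alphanumeric characters, apostrophes included.
split_default <- toy_bigrams %>%
  separate(bigram, c("word1", "word2"), extra = "drop")

# Explicit sep: split on the space only, so contractions stay whole.
split_space <- toy_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ")

split_default$word1  # "ain"   "zoom"
split_space$word1    # "ain't" "zoom"
```

With the default separator the contraction breaks into “ain” and “t”, which is why such fragments can survive the stop word filter.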
tpab_bigrams %>%
  # sep is an argument of separate(), not part of the column names
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(!word1 %in% stop_words$word) %>%
  filter(!word2 %in% stop_words$word) %>%
  unite(bigram, word1, word2, sep = " ") %>%
  count(bigram, sort = TRUE) %>%
  filter(n > 1) %>%
  wordcloud2(size = .6)
Here is a word cloud of the same data. Leaving out the contraction fragments “ain t” and “don t,” the largest bigrams are “zoom zoom,” “gotta lie” and “complicated loving.”
- Use the bigram method to find the most common words that come after words of your choice, like i/you or he/she.
first_word <- c("complicated", "love")
tpab_bigrams %>%
  count(bigram, sort = TRUE) %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% first_word) %>%
  # name = "total" sets the count column directly, so no rename() is needed
  count(word1, word2, wt = n, sort = TRUE, name = "total")
I chose “complicated” and “love” as the first words, since they were very prominent throughout the album. According to the table, “complicated” was followed by “loving” 18 times. “Love” appeared many more times and was followed by a variety of words, such as “it,” “myself,” “you,” and “complexion.”
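The same lookup can be sketched on a handful of made-up bigrams. The `toy_bigrams` table is hypothetical and stands in for `tpab_bigrams`; the `name` argument of `count()` is used here to label the count column.

```r
library(dplyr)
library(tidyr)

# Hypothetical bigrams standing in for tpab_bigrams.
toy_bigrams <- tibble(bigram = c("love you", "love you", "love it",
                                 "complicated loving", "zoom zoom"))
first_word <- c("complicated", "love")

toy_followers <- toy_bigrams %>%
  separate(bigram, c("word1", "word2"), sep = " ") %>%
  filter(word1 %in% first_word) %>%                 # keep the chosen first words
  count(word1, word2, sort = TRUE, name = "total")  # tally each following word

toy_followers
```

Each row pairs a chosen first word with a word that follows it, so the top rows are the most common continuations.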