My Third Try Working with TS Red Lyrics
I created a PDF of all the lyrics from Taylor Swift’s Red (Taylor’s Version) album, including all of the From the Vault bonus songs.
The following were stripped out:
First, I’ll load in the necessary libraries.
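The library calls themselves are not shown here. As a rough sketch, these are the packages whose functions appear in the code further down; the exact list is an assumption based on those function names (for example, `textplot_wordcloud()` lives in quanteda.textplots in recent quanteda releases), not a record of what was originally loaded.

```r
# Packages whose functions are used later in this post (assumed list).
pkgs <- c("pdftools", "quanteda", "quanteda.textplots", "tm",
          "wordcloud", "wordcloud2", "RColorBrewer", "magrittr")

# Report anything that is not installed instead of stopping with an error.
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
if (any(!installed)) {
  message("Not installed: ", paste(pkgs[!installed], collapse = ", "))
}

# Attach the packages that are available.
for (p in pkgs[installed]) library(p, character.only = TRUE)
```

Both tm and quanteda provide a `stopwords()` function, so whichever package is attached last is the one a bare `stopwords('english')` call will use.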
Next, I’ll read in the PDF.
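The code for this step is not shown either, but the word cloud code below expects an object called `corpus_red`. Here is a minimal sketch of one way it could be built, assuming the same PDF path that appears later in the post and assuming quanteda's `corpus()` was used. The helper name `read_lyrics_corpus` is hypothetical.

```r
# Hypothetical helper: read a lyrics PDF and wrap it in a quanteda corpus.
read_lyrics_corpus <- function(path) {
  pages <- pdftools::pdf_text(path)  # one character string per PDF page
  quanteda::corpus(pages)            # one document per page
}

# Path taken from later in this post (assumed to be the same file);
# adjust it for your own machine.
red_path <- "/Users/lissie/DACCS R/Text as Data/TS Data/Red_stripped_down.pdf"

# Only try to read the file if it actually exists on this machine.
if (file.exists(red_path)) {
  corpus_red <- read_lyrics_corpus(red_path)
}
```

Because `pdf_text()` returns one string per page, each document in the corpus is a page of the PDF rather than a song.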
Now I will attempt to create a wordcloud:
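# Tokenize, drop punctuation and English stopwords, then build the document-feature matrix
dfm_red <- tokens(corpus_red, remove_punct = TRUE) %>%
tokens_remove(stopwords('english')) %>%
dfm() %>%
dfm_trim(min_termfreq = 10, verbose = FALSE) # keep words used at least 10 times
set.seed(100) # for reproducibility
textplot_wordcloud(dfm_red)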
Trying a different way of doing this:
I’m going to follow a tutorial I found online.
# read in the PDF (pdf_text returns one string per page)
red_pdf <- pdf_text("/Users/lissie/DACCS R/Text as Data/TS Data/Red_stripped_down.pdf")
# create a tm corpus with one document per page
corpus_red2 <- Corpus(VectorSource(red_pdf))
Next, we’ll clean the data using tm:
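# Build a term-document matrix (rows are terms, columns are pages)
tdm <- TermDocumentMatrix(corpus_red2)
tdm_matrix <- as.matrix(tdm)
# Total count of each word across all pages, most frequent first
words <- sort(rowSums(tdm_matrix), decreasing = TRUE)
df <- data.frame(word = names(words), freq = words)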
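The cleaning code itself is not shown above. As a hedged sketch of what cleaning with tm usually looks like, here is the common sequence of `tm_map()` steps run on a two-line toy corpus. The toy text and the names `toy_corpus`, `toy_clean`, `toy_tdm`, and `toy_freqs` are made up for illustration, and the choice of steps is an assumption rather than what was actually run on the lyrics.

```r
library(tm)

# Toy stand-in for the lyrics corpus (illustrative text only).
toy_corpus <- Corpus(VectorSource(c(
  "Loving him was RED, red!",
  "We are never, ever getting back together 22"
)))

# Typical cleaning steps: lowercase, strip punctuation and numbers,
# drop English stopwords, then collapse leftover whitespace.
toy_clean <- tm_map(toy_corpus, content_transformer(tolower))
toy_clean <- tm_map(toy_clean, removePunctuation)
toy_clean <- tm_map(toy_clean, removeNumbers)
toy_clean <- tm_map(toy_clean, removeWords, stopwords("english"))
toy_clean <- tm_map(toy_clean, stripWhitespace)

# Same frequency-table recipe as above, applied to the cleaned corpus.
toy_tdm <- TermDocumentMatrix(toy_clean)
toy_freqs <- sort(rowSums(as.matrix(toy_tdm)), decreasing = TRUE)
head(data.frame(word = names(toy_freqs), freq = toy_freqs))
```

Lowercasing first matters here: without it, "RED" and "red" would be counted as two different words, and capitalized stopwords would survive the `removeWords` step.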
Create word cloud:
set.seed(1234) # for reproducibility
wordcloud(words = df$word, freq = df$freq, min.freq = 5, max.words=200, random.order=FALSE, scale=c(3.5,0.25), rot.per=0.35, colors=brewer.pal(8, "Dark2"))
Or a different version:
wordcloud2(data=df, size=1.6, color='random-dark')
Now I’m going to try to create a wordcloud with phrases instead of individual words:
text <- readLines("/Users/lissie/DACCS R/Text as Data/TS Data/red_lyrics.txt")
# freq = 1 adds a column of 1s, so each lyric line starts with a count of one.
my_data <- data.frame(text = text, freq = 1, stringsAsFactors = FALSE)
# aggregate the data: sum the counts for identical lines.
my_agr <- aggregate(freq ~ ., data = my_data, sum)
wordcloud(words = my_agr$text, freq = my_agr$freq, min.freq = 1,
max.words=200, random.order=FALSE, rot.per=0.35,
colors=brewer.pal(8, "Dark2"), scale = c(10, .5))
(Note: a large number of sentences did not fit in the plot, but for the sake of publishing, I hid the warnings.)
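To make the phrase-counting step easier to follow, here is a small base R sketch of the same `aggregate()` idea on a few made-up lines. The vector `toy_lines` and the names `toy_data` and `phrase_counts` are hypothetical. Dropping blank lines is an extra step added here as an assumption, since `readLines()` keeps empty lines from the text file.

```r
# Toy stand-in for readLines() output: one lyric line per element.
toy_lines <- c(
  "I remember it all too well",
  "We are never ever getting back together",
  "I remember it all too well",
  ""
)

# Give every line a starting count of 1.
toy_data <- data.frame(text = toy_lines, freq = 1, stringsAsFactors = FALSE)

# Drop blank lines so they do not show up as a "phrase".
toy_data <- toy_data[nzchar(trimws(toy_data$text)), ]

# Sum the counts for identical lines, then sort with the most repeated first.
phrase_counts <- aggregate(freq ~ text, data = toy_data, FUN = sum)
phrase_counts <- phrase_counts[order(-phrase_counts$freq), ]
phrase_counts
```

Each "phrase" in this approach is a whole line of the text file, so repeated chorus lines end up with the largest counts. Long lines drawn at a large `scale` are the ones most likely to trigger the "could not be fit" warnings, so a smaller first value in `scale` may let more of them fit.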