Libraries
library(tidyverse)
## -- Attaching packages --------------------------------------------------------------------------------------------------------------------------------- tidyverse 1.2.1 --
## v ggplot2 3.2.0 v purrr 0.3.2
## v tibble 2.1.3 v dplyr 0.8.3
## v tidyr 0.8.3 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'dplyr' was built under R version 3.6.1
## -- Conflicts ------------------------------------------------------------------------------------------------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(fivethirtyeight)
## Warning: package 'fivethirtyeight' was built under R version 3.6.1
library("tm")
## Warning: package 'tm' was built under R version 3.6.1
## Loading required package: NLP
##
## Attaching package: 'NLP'
## The following object is masked from 'package:ggplot2':
##
## annotate
library("SnowballC")
library("wordcloud")
## Warning: package 'wordcloud' was built under R version 3.6.1
## Loading required package: RColorBrewer
library("RColorBrewer")
Grab the data. The bob_ross data set comes with the fivethirtyeight package.
br <- bob_ross
Tidy it up some: split episode codes like S01E01 into integer season and episode columns. This code is from the TidyTuesday page.
br_tidy <- br %>%
separate(episode, into = c("season", "episode"), sep = "E") %>%
mutate(season = str_extract(season, "[:digit:]+")) %>%
mutate_at(vars(season, episode), as.integer)
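To see what the separate() and str_extract() steps do to a single code, here is a rough base-R equivalent run on a few made-up episode codes in the same "SxxEyy" form. These codes are illustrative and are not pulled from the data. Splitting on "E" leaves "S01" and "01", and stripping the non-digits from the first piece gives the season.

```r
# Illustrative episode codes in the "SxxEyy" form used by the data set
episodes <- c("S01E01", "S01E13", "S31E13")

# Split each code on the literal "E": "S01E01" -> c("S01", "01")
parts <- do.call(rbind, strsplit(episodes, "E", fixed = TRUE))

# Drop the leading "S" (any non-digit characters) and convert both pieces to integers
season  <- as.integer(gsub("\\D", "", parts[, 1]))
episode <- as.integer(parts[, 2])

data.frame(season, episode)
```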
Examine the names.
names(br_tidy)
## [1] "season" "episode" "episode_num"
## [4] "title" "apple_frame" "aurora_borealis"
## [7] "barn" "beach" "boat"
## [10] "bridge" "building" "bushes"
## [13] "cabin" "cactus" "circle_frame"
## [16] "cirrus" "cliff" "clouds"
## [19] "conifer" "cumulus" "deciduous"
## [22] "diane_andre" "dock" "double_oval_frame"
## [25] "farm" "fence" "fire"
## [28] "florida_frame" "flowers" "fog"
## [31] "framed" "grass" "guest"
## [34] "half_circle_frame" "half_oval_frame" "hills"
## [37] "lake" "lakes" "lighthouse"
## [40] "mill" "moon" "mountain"
## [43] "mountains" "night" "ocean"
## [46] "oval_frame" "palm_trees" "path"
## [49] "person" "portrait" "rectangle_3d_frame"
## [52] "rectangular_frame" "river" "rocks"
## [55] "seashell_frame" "snow" "snowy_mountain"
## [58] "split_frame" "steve_ross" "structure"
## [61] "sun" "tomb_frame" "tree"
## [64] "trees" "triple_frame" "waterfall"
## [67] "waves" "windmill" "window_frame"
## [70] "winter" "wood_framed"
I am going to drop the columns with "frame" in their names and the two columns named after people (steve_ross and diane_andre).
br_redacted <- br_tidy %>%
  select(-contains("frame"), -steve_ross, -diane_andre)
names(br_redacted)
## [1] "season" "episode" "episode_num"
## [4] "title" "aurora_borealis" "barn"
## [7] "beach" "boat" "bridge"
## [10] "building" "bushes" "cabin"
## [13] "cactus" "cirrus" "cliff"
## [16] "clouds" "conifer" "cumulus"
## [19] "deciduous" "dock" "farm"
## [22] "fence" "fire" "flowers"
## [25] "fog" "grass" "guest"
## [28] "hills" "lake" "lakes"
## [31] "lighthouse" "mill" "moon"
## [34] "mountain" "mountains" "night"
## [37] "ocean" "palm_trees" "path"
## [40] "person" "portrait" "river"
## [43] "rocks" "snow" "snowy_mountain"
## [46] "structure" "sun" "tree"
## [49] "trees" "waterfall" "waves"
## [52] "windmill" "winter"
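contains() does a substring match, and it ignores case by default. So "frame" also catches framed and wood_framed, and the output above confirms both are gone. The base-R sketch below mimics the selection on a handful of the column names using grepl() with fixed = TRUE. That is case-sensitive, but it makes no difference here because every name is lowercase. It can serve as a quick check of what a pattern will drop before you drop it.

```r
# A few of the column names printed above
cols <- c("title", "apple_frame", "framed", "wood_framed", "tree", "steve_ross", "diane_andre")

# Names containing "frame" anywhere, like contains("frame")
frame_cols <- cols[grepl("frame", cols, fixed = TRUE)]

# Keep everything that is neither a frame column nor one of the two people
kept <- setdiff(cols, c(frame_cols, "steve_ross", "diane_andre"))
kept
```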
I would like to build a word cloud, so I read part of this article: http://www.sthda.com/english/wiki/text-mining-and-word-cloud-fundamentals-in-r-5-simple-steps-you-should-know
A word cloud is quickest to build from a word-frequency table. I considered using gather(), but I couldn’t remember its syntax off the top of my head, so instead I created a tibble of the variable names and their column sums.
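# Columns 5-53 are the 0/1 scene-element indicators; each column sum counts the paintings containing that element
wordcounts <- tibble("word" = names(br_redacted[5:53]), "count" = colSums(br_redacted[5:53]))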
arrange(wordcounts, desc(count))
## # A tibble: 49 x 2
## word count
## <chr> <dbl>
## 1 tree 361
## 2 trees 337
## 3 deciduous 227
## 4 conifer 212
## 5 clouds 179
## 6 mountain 160
## 7 lake 143
## 8 grass 142
## 9 river 126
## 10 bushes 120
## # ... with 39 more rows
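For the record, the gather() route mentioned above could look roughly like this. It reshapes the indicator columns into long word/present pairs and then sums per word. The sketch uses a small made-up tibble (toy and its values are hypothetical) rather than br_redacted. On the real data the same pipeline should work with -(season:title) in place of -title. Newer tidyr versions recommend pivot_longer() over gather().

```r
library(dplyr)
library(tidyr)

# Hypothetical stand-in for br_redacted: one row per painting, 0/1 indicators
toy <- tibble(
  title  = c("Painting A", "Painting B", "Painting C"),
  tree   = c(1, 1, 1),
  clouds = c(0, 1, 1),
  barn   = c(0, 0, 1)
)

toy_counts <- toy %>%
  gather(key = "word", value = "present", -title) %>%  # long format: one row per painting/word pair
  group_by(word) %>%
  summarise(count = sum(present)) %>%                   # number of paintings containing each word
  arrange(desc(count))

toy_counts
```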
This code is copied and modified from the above website resource.
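set.seed(1234)  # fixes the random word placement so the layout is reproducible
# random.order = FALSE plots words in decreasing frequency, so the most frequent land in the center;
# rot.per = 0.35 rotates about 35% of the words by 90 degrees
wordcloud(words = wordcounts$word, freq = wordcounts$count, random.order = FALSE, rot.per = 0.35,
          colors = brewer.pal(8, "Dark2"))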
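On picking colors for individual words: wordcloud() has an ordered.colors argument. When it is TRUE, the colors vector is matched to the words vector one-to-one, so the two must have the same length. Here is a minimal sketch with made-up words and counts that highlights a single word. The "darkgreen"/"grey40" choice is arbitrary.

```r
library(wordcloud)

# Made-up words and frequencies, for illustration only
words <- c("tree", "clouds", "lake", "grass")
freq  <- c(36, 18, 14, 14)

# One color per word: highlight "tree", grey for the rest
word_colors <- ifelse(words == "tree", "darkgreen", "grey40")

set.seed(1234)
wordcloud(words = words, freq = freq, min.freq = 1, random.order = FALSE,
          colors = word_colors, ordered.colors = TRUE)
```

With the real data, word_colors would be built from wordcounts$word the same way.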
Comments: This was a super fun and quick project. I set aside one hour for it and was only interrupted a few times for baby issues!
Future ideas: Figure out how to pick colors for individual words. Combine tree and trees and other similar words.
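On combining near-duplicates like tree/trees: each column is a 0/1 indicator per painting, so adding the two column sums would count a painting tagged with both twice. The sketch below instead counts a painting once if any of the merged columns is 1. The toy data and the merge_map lookup are hypothetical.

```r
# Hypothetical 0/1 indicator columns, one row per painting
toy <- data.frame(
  tree  = c(1, 1, 0, 1),
  trees = c(1, 0, 0, 1),
  lake  = c(0, 1, 1, 0),
  lakes = c(0, 0, 1, 0)
)

# Hypothetical lookup: original column -> word it should be merged into
merge_map <- c(tree = "tree", trees = "tree", lake = "lake", lakes = "lake")

# For each merged word, a painting counts once if any of its columns is 1
merged <- sapply(split(names(merge_map), merge_map), function(cols) {
  sum(apply(toy[, cols, drop = FALSE], 1, max))
})

# Naive approach for comparison: adds up the tags, double-counting overlaps
naive_tree <- sum(toy$tree) + sum(toy$trees)

merged
```

The merged counts could then go straight into wordcloud(words = names(merged), freq = merged).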