DATA 607 Context Presentation

Nathan Cooper

November 1, 2017

Motivation

Libraries

suppressMessages(suppressWarnings(library("tm")))
suppressMessages(suppressWarnings(library("wordcloud")))
suppressMessages(suppressWarnings(library("tidyverse")))
suppressMessages(suppressWarnings(library("stringr")))
suppressMessages(suppressWarnings(library("SnowballC")))

The ‘tm’ library extracts the text from the source document, while ‘tidyverse’ and ‘stringr’ provide methods for cleaning the data. ‘SnowballC’ contains a word-stemming algorithm that collapses words to their common roots, and ‘wordcloud’ creates the visualization.
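
To make the stemming step concrete, here is a minimal sketch using `wordStem()` from ‘SnowballC’. The word lists are made up for illustration and are not taken from the TED data. It also shows why stemmed output can look truncated: the stemmer returns roots, not dictionary words.

```r
library(SnowballC)

# Hypothetical example words (not from the TED dataset)
variants <- c("running", "runs", "run")

# wordStem() uses the Porter stemmer by default and maps each word to its root
roots <- wordStem(variants)
print(roots)   # all three collapse to "run"

# Roots are not always real words, which is why stemmed text can look cut off
odd <- wordStem("stories")
print(odd)     # "stori"
```

If readable words matter more than merging variants, as in a word cloud, skipping the stemming step is a reasonable choice.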

Prepping Data

pth <- 'C:\\Users\\Nate\\Documents\\DataSet\\ted_main.csv'
ted <- pth %>% read.csv(stringsAsFactors = FALSE) # read.csv() already returns a data frame; keep text columns as character
ncol(ted)
## [1] 17
nrow(ted)
## [1] 2550
# head(ted)  # too big to display during the talk

Word Cloud Code Block

# Build a corpus with one document per talk description
dsCloud <- Corpus(VectorSource(ted$description))
# Wrap non-tm functions in content_transformer() so tm_map() returns a valid corpus
dsCloud <- dsCloud %>% tm_map(content_transformer(str_replace_all), pattern = "[[:punct:]]", replacement = " ")
dsCloud <- dsCloud %>% tm_map(content_transformer(tolower))
# Drop words that dominate the descriptions, plus standard English stop words
dsCloud <- dsCloud %>% tm_map(removeWords, c('ted', 'talk', 'says', 'three', 'two', 'david', stopwords('english')))
#dsCloud <- dsCloud %>% tm_map(stemDocument, language = 'english') #Stemming seems to truncate words
wordcloud(dsCloud, max.words = 100, random.order = FALSE, random.color = TRUE, colors = c('orange', 'black'))
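
The size of each word in the cloud reflects how often it occurs in the corpus. As a rough sketch of that counting step, the example below builds a term-document matrix from a toy corpus and sums the term frequencies. The `toy` vector is hypothetical and stands in for `ted$description`. The cleaning steps mirror the ones on this slide but are an assumption about how one might inspect the counts, not part of the original talk.

```r
library(tm)

# Hypothetical toy descriptions standing in for ted$description
toy <- c("Ideas worth spreading about science.",
         "Science and art: new ideas!",
         "The art of storytelling.")

corp <- VCorpus(VectorSource(toy))
# Replace punctuation with spaces, lower-case, then drop English stop words
corp <- tm_map(corp, content_transformer(function(x) gsub("[[:punct:]]", " ", x)))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removeWords, stopwords("english"))

# Rows are terms, columns are documents; row sums give corpus-wide counts
tdm  <- TermDocumentMatrix(corp)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(head(freq))
```

Inspecting `freq` before plotting is a quick way to spot filler words worth adding to the `removeWords` list.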

References