DATA 607 Context Presentation

Nathan Cooper

November 1, 2017

Motivation

Excellent visualization for word frequency statistics
Enhances “telling a story with data”
Note: it does not replace more rigorous analysis

Libraries

suppressMessages(suppressWarnings(library("tm")))
suppressMessages(suppressWarnings(library("wordcloud")))
suppressMessages(suppressWarnings(library("tidyverse")))
suppressMessages(suppressWarnings(library("stringr")))
suppressMessages(suppressWarnings(library("SnowballC")))

Library ‘tm’ builds a corpus from the text and applies the cleaning transformations. tidyverse (which supplies the %>% pipe) and stringr provide additional methods for cleaning the data. Library ‘SnowballC’ contains a word-stemming algorithm that collapses words to their common roots. ‘wordcloud’ creates the visualization.
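To see what stemming and stop-word removal actually do, here is a minimal sketch on a few hypothetical sample words that are not taken from the TED data. It assumes SnowballC's `wordStem()` with the Snowball English stemmer and tm's built-in English stop-word list. It also hints at why stemmed output can look truncated.

```r
# Minimal sketch: effect of stemming and stop-word removal on sample text.
# The sample words and sentence are hypothetical, not from ted_main.csv.
library(tm)         # provides stopwords() and removeWords()
library(SnowballC)  # provides wordStem()

words <- c("running", "runs", "stories")
stems <- wordStem(words, language = "english")
print(stems)  # "run" "run" "stori" (stems need not be dictionary words)

# Stop-word removal drops common function words ("the", "is", ...) from a string
cleaned <- removeWords("the talk is about the future", stopwords("english"))
print(cleaned)
```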

Prepping Data

pth <- 'C:\\Users\\Nate\\Documents\\DataSet\\ted_main.csv'
# read.csv() already returns a data frame; keep text columns as character, not factor
ted <- pth %>% read.csv(stringsAsFactors = FALSE)
ncol(ted)

## [1] 17

nrow(ted)

## [1] 2550

#head(ted) This is too big for display during the talk.
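Before R 4.0.0, `read.csv()` defaulted to `stringsAsFactors = TRUE`, which turns a text column such as `description` into a factor. The minimal check below uses a tiny hypothetical inline CSV (`csv_text`, `toy`) in place of `ted_main.csv`.

```r
# Minimal sketch: a hypothetical two-row CSV standing in for ted_main.csv.
# The second description contains a comma inside quotes to mimic real text.
csv_text <- 'name,description
"Talk A","A short description of talk A."
"Talk B","Another description, with a comma."'

toy <- read.csv(text = csv_text, stringsAsFactors = FALSE)
str(toy$description)            # character vector, not factor
is.character(toy$description)   # TRUE
```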

Word Cloud Code Block

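# Build a corpus with one document per talk description
dsCloud <- Corpus(VectorSource(ted$description))
# Non-tm functions must be wrapped in content_transformer() before tm_map()
dsCloud <- dsCloud %>% tm_map(content_transformer(str_replace_all), pattern = "[[:punct:]]", replacement = " ")
dsCloud <- dsCloud %>% tm_map(content_transformer(tolower))
# Drop English stop words plus a few frequent but uninformative terms
dsCloud <- dsCloud %>% tm_map(removeWords, c('ted', 'talk', 'says', 'three', 'two', 'david', stopwords('english')))
dsCloud <- dsCloud %>% tm_map(stripWhitespace)
#dsCloud <- dsCloud %>% tm_map(stemDocument, language = 'english') # Stemming truncates words (e.g. "stories" -> "stori")
set.seed(607)  # word placement is random; fix the seed for a reproducible layout
wordcloud(dsCloud, max.words = 100, random.order = FALSE, random.color = TRUE, colors = c('orange', 'black'))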
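A word cloud is a picture of term frequencies. The sketch below computes those frequencies explicitly for three hypothetical descriptions that stand in for `ted$description`. It applies cleaning steps similar to the ones above with tm, then sums the rows of a `TermDocumentMatrix`. The variable names (`docs`, `corp`, `tdm`, `freq`) are illustrative only.

```r
# Minimal sketch of the word counts a word cloud visualizes.
# The three descriptions are hypothetical stand-ins for ted$description.
library(tm)

docs <- c("Ideas about the future of cities.",
          "The future of education and ideas.",
          "Cities grow; ideas spread.")
corp <- Corpus(VectorSource(docs))
corp <- tm_map(corp, content_transformer(tolower))
corp <- tm_map(corp, removePunctuation)
corp <- tm_map(corp, removeWords, stopwords("english"))

# Rows are terms, columns are documents; row sums give total counts per term
tdm  <- TermDocumentMatrix(corp)
freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
print(freq)
```

The resulting named vector can be plotted with `wordcloud(names(freq), freq)`. When given a corpus instead, `wordcloud()` builds a term-document matrix internally, so both routes should give similar counts. `TermDocumentMatrix` also drops terms shorter than three characters by default.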
