My Research Prospectus in 50 words

After spending months putting together my 38-page research prospectus, I thought it would be fun to see which keywords it all boiled down to. Here’s how I did it.

Bring in Data

# iExplorer and iMazing are $40 apps that will export iMessage conversations to .txt, but you can also just copy and paste an iMessage conversation from your computer directly into a Word document and save it as .txt. That's what I did.
path <- '/Users/richpauloo/Desktop/Natural Languge Processing/Research Prospectus/rp.txt'

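temp <- read.table(path, header = FALSE, fill = TRUE, # fill = TRUE b/c rows are of unequal length
                   quote = "", comment.char = "",     # treat quotes, apostrophes, and '#' as plain text
                   stringsAsFactors = FALSE)          # keep words as character strings, not factors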
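
To see what `fill = TRUE` does, here is a minimal sketch on a throwaway file. The two lines of text are made up for illustration and are not from the prospectus; `tmp` and `toy` are hypothetical names.

```r
# Write a tiny text file whose rows have different numbers of words (toy data)
tmp <- tempfile(fileext = ".txt")
writeLines(c("the quick brown fox jumps", "over the dog"), tmp)

# fill = TRUE adds blank fields to short rows instead of raising an error
toy <- read.table(tmp, header = FALSE, fill = TRUE,
                  quote = "", comment.char = "", stringsAsFactors = FALSE)

dim(toy)   # rows = lines of text, columns = words in the longest line
toy[2, ]   # the shorter line, with blank fields at the end
```

Each word lands in its own cell, which is why the data frame has to be reshaped into a single column before tokenizing.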

Load libraries

library(dplyr)
library(tidytext)
library(stringr)
library(wordcloud)
library(knitr) # for tables
library(DT) # for dynamic tables

Tokenize and Remove Stop Words

# reshape the .txt data frame into one column
convo <- tidyr::gather(temp, key, word) %>% select(word)

# tokenize
tokens <- convo %>% 
  unnest_tokens(word, word) %>% 
  count(word, sort = TRUE) %>% 
  ungroup()

# remove stop words
data("stop_words")
tokens_clean <- tokens %>%
  anti_join(stop_words, by = "word") # name the join column explicitly

# remove numbers (any token that starts with a digit)
nums <- tokens_clean %>% filter(str_detect(word, "^[0-9]")) %>% select(word) %>% unique()

tokens_clean <- tokens_clean %>% 
  anti_join(nums, by = "word")

# remove document-specific stop words that slipped through
uni_sw <- data.frame(word = c("al", "figure", "i.e", "l3"), stringsAsFactors = FALSE)

tokens_clean <- tokens_clean %>% 
  anti_join(uni_sw, by = "word")
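
The tokenize, count, and filter steps can be tried on a toy input to see what each one contributes. This is only a sketch: the two sentences are made up, and `toy` and `toy_counts` are hypothetical names.

```r
library(dplyr)
library(tidytext)

# Toy input: a single column named `word`, shaped like `convo` (made-up text)
toy <- data.frame(word = c("The cat and the dog",
                           "A dog chased the cat"),
                  stringsAsFactors = FALSE)

toy_counts <- toy %>%
  unnest_tokens(word, word) %>%        # lowercase and split into one token per row
  count(word, sort = TRUE) %>%         # tally each distinct token, most frequent first
  anti_join(stop_words, by = "word")   # drop common words such as "the", "and", "a"

toy_counts
```

Only the content words ("cat", "dog", "chased") survive the stop-word filter, which is the same effect the pipeline has on the full prospectus.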

Visualize the top 50 words

# define a nice color palette (brewer.pal() comes from RColorBrewer, which wordcloud loads)
pal <- brewer.pal(8, "Dark2")

# plot the 50 most common words
tokens_clean %>% 
  with(wordcloud(word, n, random.order = FALSE, max.words = 50, colors=pal))
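
With `max.words = 50`, `wordcloud()` drops the least frequent terms, so the cloud shows the 50 highest counts. A minimal sketch for listing the top words explicitly, using made-up counts in place of `tokens_clean` (the names `toy_counts`, `k`, and `top_k` are hypothetical):

```r
library(dplyr)

# Toy counts standing in for `tokens_clean` (made-up values)
toy_counts <- data.frame(word = c("alpha", "beta", "gamma", "delta"),
                         n = c(10, 7, 7, 2),
                         stringsAsFactors = FALSE)

# Keep the k most frequent words; k = 50 would mirror max.words = 50
k <- 2
top_k <- toy_counts %>%
  arrange(desc(n)) %>%
  head(k)

top_k
```

The word cloud layout is randomized, so calling `set.seed()` before `wordcloud()` should make the figure reproducible between runs.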

Sort through the words with a searchable data table

tokens_clean %>%
  datatable()
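
For a static document where an interactive widget is not wanted, `knitr::kable()` is one alternative to `datatable()`. This is a sketch with made-up counts, not part of the original analysis; `toy_counts` and `tbl` are hypothetical names.

```r
library(knitr)

# Toy counts standing in for `tokens_clean` (made-up values)
toy_counts <- data.frame(word = c("alpha", "beta", "gamma"),
                         n = c(10, 7, 2),
                         stringsAsFactors = FALSE)

# Render the first rows as a plain Markdown table
tbl <- kable(head(toy_counts, 2), format = "markdown")
tbl
```

The trade-off is that a `kable()` table is not searchable or sortable, so `datatable()` remains the better fit for browsing the full word list.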

Rich Pauloo

10/1/2017
