While continuing to work with, practice, and learn R Markdown, I came across the gutenbergr package on ropensci.org, so I thought I would try my hand at some more data wrangling and data visualization.

To begin, I loaded the required libraries.

library(gutenbergr)
library(tidytext)
library(tm)
## Loading required package: NLP
library(twitteR)
library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:twitteR':
## 
##     id, location
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(wordcloud)
## Loading required package: RColorBrewer
After some thought I decided on Plato, and I searched gutenbergr's copy of the Project Gutenberg (https://www.gutenberg.org) catalog to see which of his works were available.
gutenberg_works(author == "Plato")
## # A tibble: 27 x 8
##    gutenberg_id title        author gutenbe~ langu~ gutenb~ rights   has_~
##           <int> <chr>        <chr>     <int> <chr>  <chr>   <chr>    <lgl>
##  1          150 The Republic Plato        93 en     <NA>    Public ~ T    
##  2         1571 Critias      Plato        93 en     <NA>    Public ~ T    
##  3         1572 Timaeus      Plato        93 en     <NA>    Public ~ T    
##  4         1579 Lysis        Plato        93 en     <NA>    Public ~ T    
##  5         1580 Charmides    Plato        93 en     <NA>    Public ~ T    
##  6         1584 Laches       Plato        93 en     <NA>    Public ~ T    
##  7         1591 Protagoras   Plato        93 en     <NA>    Public ~ T    
##  8         1598 Euthydemus   Plato        93 en     <NA>    Public ~ T    
##  9         1600 Symposium    Plato        93 en     <NA>    Public ~ T    
## 10         1616 Cratylus     Plato        93 en     <NA>    Public ~ T    
## # ... with 17 more rows
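Here is a minimal sketch of narrowing the list to a single title. It assumes the bundled metadata still lists The Republic under the same title shown in the output above. Because `gutenberg_works()` filters the metadata table that ships with gutenbergr, this step should not need a network connection. Only `gutenberg_download()` fetches the actual text.

```r
library(gutenbergr)
library(dplyr)

# gutenberg_works() filters the gutenberg_metadata table bundled with the
# package, so no text is downloaded at this step.
republic <- gutenberg_works(author == "Plato", title == "The Republic")

# With the default distinct = TRUE, repeated title/author pairs collapse to the
# lowest gutenberg_id, so this should leave a single row.
republic_id <- republic$gutenberg_id[1]
republic %>% select(gutenberg_id, title)
```

The resulting ID is what gets passed to `gutenberg_download()`.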
Looking over the list, I narrowed it down to The Republic (gutenberg_id 150). After downloading the book, I began cleaning it up and getting it ready for analysis.
plato <- gutenberg_download(150)
## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest
## Using mirror http://aleph.gutenberg.org
words <- plato %>% unnest_tokens(word, text)
# words3 holds the tokens after further cleaning steps that are not shown here
plato_corpus <- Corpus(VectorSource(words3))
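The cleaning steps between `words` and `words3` are not shown, so here is a hedged sketch of one common tidytext approach. It assumes stop-word removal is the kind of cleaning wanted: tokenize with `unnest_tokens()`, drop the words in tidytext's `stop_words` lexicon with `anti_join()`, and count what is left. The two-line `plato_demo` tibble is a hypothetical stand-in that mimics the `gutenberg_id`/`text` shape returned by `gutenberg_download()`. It is not the author's data and not necessarily the author's exact method.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for gutenberg_download() output: one row per line of text
plato_demo <- tibble(
  gutenberg_id = 150L,
  text = c("I went down yesterday to the Piraeus with Glaucon",
           "the son of Ariston, that I might offer up my prayers")
)

# One row per word; unnest_tokens() lower-cases and strips punctuation by default
demo_words <- plato_demo %>% unnest_tokens(word, text)

# Drop common English words ("the", "to", "of", ...) listed in tidytext's stop_words
demo_clean <- demo_words %>% anti_join(stop_words, by = "word")

# Frequency of each remaining word, most common first
demo_counts <- demo_clean %>% count(word, sort = TRUE)
demo_counts
```

If you need a tm corpus afterwards, as in the line above, you can pass the cleaned words along with `Corpus(VectorSource(demo_clean$word))`.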
With The Republic in cleaned-up corpus form, I made a few word clouds.

Word cloud with words plotted in random order.

Word cloud with words plotted in decreasing order of frequency, from most to least common.
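The difference between the two clouds comes down to the `random.order` argument of `wordcloud()`. With `random.order = TRUE` (the default), words are placed in random order. With `FALSE`, they are plotted in decreasing frequency, so the most common words end up near the center. The sketch below assumes word frequencies are taken from a tm `TermDocumentMatrix`. The toy `docs` vector is made up for illustration.

```r
library(tm)
library(wordcloud)  # also loads RColorBrewer for brewer.pal()

# Toy documents standing in for the cleaned Republic corpus (hypothetical)
docs <- c("justice city soul justice guardians",
          "soul justice philosopher city good")
corpus <- Corpus(VectorSource(docs))

# Total count of each term across all documents, most frequent first
tdm <- TermDocumentMatrix(corpus)
freqs <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)

set.seed(1234)  # word placement is random, so fix the seed for repeatable plots

# Words placed in random order
wordcloud(names(freqs), freqs, min.freq = 1, random.order = TRUE,
          colors = brewer.pal(8, "Dark2"))

# Words plotted from most to least frequent, so the top words sit near the center
wordcloud(names(freqs), freqs, min.freq = 1, random.order = FALSE,
          colors = brewer.pal(8, "Dark2"))
```

`min.freq = 1` is lowered from the default of 3 only because the toy data is so small.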

Five dialogues of Plato

After finishing with The Republic, I took a look at five of the best-known Socratic dialogues written by Plato. Downloading and cleaning a single book went fine without too many misfires, so I wanted to try wrangling multiple book downloads. Admittedly, the five I chose are not lengthy texts, but hey, baby steps; I'm still getting a handle on this data wrangling stuff.
  • Euthyphro
  • Apology
  • Crito
  • Meno
  • Phaedo
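One way to keep several books apart in a single tidy data frame is to carry a `title` column through tokenization and count words per title. In practice the input would come from something like `gutenberg_download(ids, meta_fields = "title")`. The `dialogues` tibble below is hypothetical placeholder text so that the sketch runs offline.

```r
library(dplyr)
library(tidytext)

# Hypothetical stand-in for gutenberg_download(ids, meta_fields = "title"):
# one row per line, tagged with the title of the dialogue it came from
dialogues <- tibble(
  title = c("Euthyphro", "Euthyphro", "Crito", "Crito"),
  text  = c("piety is what is dear to the gods",
            "Socrates asks Euthyphro about piety",
            "Crito urges Socrates to escape",
            "the laws speak to Socrates")
)

# Tokenize, drop stop words, then count each word within each dialogue
dial_counts <- dialogues %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word") %>%
  count(title, word, sort = TRUE)

# Most frequent word in each dialogue
top_per_book <- dial_counts %>%
  group_by(title) %>%
  slice_max(n, n = 1, with_ties = FALSE) %>%
  ungroup()
top_per_book
```

Speakers' names such as "socrates" quickly rise to the top of the counts. That is presumably why the names are stripped out in the next step.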
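# dial_words3 holds the cleaned tokens from the five dialogues (cleaning steps not shown here)
dial_corpus <- Corpus(VectorSource(dial_words3))
# Remove the speakers' names so they don't crowd out the other words
dial_clean <- tm_map(dial_corpus, removeWords, c("socrates", "meno", "euthyphro", "crito", "phaedo"))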
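One thing to watch with `removeWords()` is that it matches case-sensitively. Capitalized names slip through unless the text is lower-cased first, although tokens from `unnest_tokens()` already are. Here is a small sketch on a hypothetical two-sentence corpus. It also uses `stripWhitespace` to tidy up the gaps the removed words leave behind.

```r
library(tm)

# Hypothetical sample text; note the capitalized "Socrates" and "Meno"
docs <- c("Socrates speaks with Meno about virtue",
          "phaedo recalls the last day of socrates")
corpus <- VCorpus(VectorSource(docs))

names_to_drop <- c("socrates", "meno", "euthyphro", "crito", "phaedo")

# removeWords() is case-sensitive, so lower-case everything first
cleaned <- tm_map(corpus, content_transformer(tolower))
cleaned <- tm_map(cleaned, removeWords, names_to_drop)
# Removing words leaves runs of spaces behind; collapse them to single spaces
cleaned <- tm_map(cleaned, stripWhitespace)

# Pull the text back out of each document and trim leading/trailing spaces
texts <- trimws(vapply(seq_along(cleaned),
                       function(i) paste(content(cleaned[[i]]), collapse = " "),
                       character(1)))
texts
```

Without the `tolower` step, the capitalized "Socrates" and "Meno" in the first sentence would survive.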

## [1] "Using direct authentication"

Trumpy Trump and the Trumper bunch!

Well, what was I thinking last time, working with the Twitter API and not including POTUS Donald Trump? I have been admonished for my lapse, so I decided to throw him in here at the end.

Wow, great, this is great, how could it not be great. It's great to have such great things going on now, isn't that great!