While continuing to practice and learn R Markdown, I came across the gutenbergr package on ropensci.org, so I thought I would try my hand at some more practice with data wrangling and data visualization.

To begin, I loaded the required libraries.

library(gutenbergr)
library(tidytext)
library(tm)

## Loading required package: NLP

library(twitteR)
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:twitteR':
## 
##     id, location

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

library(wordcloud)

## Loading required package: RColorBrewer

After some thought I decided on Plato, and I called gutenberg_works() to see which of his works are available through Project Gutenberg (https://www.gutenberg.org).

gutenberg_works(author == "Plato")

## # A tibble: 27 x 8
##    gutenberg_id title        author gutenbe~ langu~ gutenb~ rights   has_~
##           <int> <chr>        <chr>     <int> <chr>  <chr>   <chr>    <lgl>
##  1          150 The Republic Plato        93 en     <NA>    Public ~ T    
##  2         1571 Critias      Plato        93 en     <NA>    Public ~ T    
##  3         1572 Timaeus      Plato        93 en     <NA>    Public ~ T    
##  4         1579 Lysis        Plato        93 en     <NA>    Public ~ T    
##  5         1580 Charmides    Plato        93 en     <NA>    Public ~ T    
##  6         1584 Laches       Plato        93 en     <NA>    Public ~ T    
##  7         1591 Protagoras   Plato        93 en     <NA>    Public ~ T    
##  8         1598 Euthydemus   Plato        93 en     <NA>    Public ~ T    
##  9         1600 Symposium    Plato        93 en     <NA>    Public ~ T    
## 10         1616 Cratylus     Plato        93 en     <NA>    Public ~ T    
## # ... with 17 more rows

Taking a look at the list, I narrowed it down to The Republic. With the book downloaded, I began to clean it up and get it ready for analysis.
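The download call itself is not shown here. A minimal sketch, assuming the book was fetched with gutenberg_download() using the ID 150 listed for The Republic in the table above, and stored in the `plato` object that the tokenizing step uses:

```r
library(gutenbergr)

# Assumption: 150 is the gutenberg_id shown for The Republic in the listing above.
# gutenberg_download() needs a network connection; it returns a tibble with
# one row per line of the book, in the columns gutenberg_id and text.
plato <- gutenberg_download(150)
```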

## Determining mirror for Project Gutenberg from http://www.gutenberg.org/robot/harvest

## Using mirror http://aleph.gutenberg.org

words <- plato %>% unnest_tokens(word,text)

plato_corpus <- Corpus(VectorSource(words3))
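The steps between `words` and `words3` are not shown, so what the cleaning involved is unknown. One plausible version, sketched on toy text, drops common English stop words before building the corpus. The stop-word step and the names `toy_book`, `toy_words`, `toy_clean`, and `toy_corpus` are assumptions for illustration, not the original code.

```r
library(dplyr)
library(tidytext)
library(tm)

# Toy stand-in for a downloaded book: one row per line, in a `text` column.
toy_book <- tibble(text = c("The just man is happy",
                            "and the unjust man is miserable"))

# One row per lowercase word.
toy_words <- toy_book %>% unnest_tokens(word, text)

# Drop common English stop words using tidytext's built-in `stop_words` table.
toy_clean <- toy_words %>% anti_join(stop_words, by = "word")

# Build a tm corpus with one document per remaining word.
toy_corpus <- Corpus(VectorSource(toy_clean$word))
```

Passing the `word` column rather than the whole data frame gives VectorSource() a plain character vector to work with.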

With The Republic in cleaned-up corpus form, I made a few word clouds.

Word cloud with words plotted in random order.

Word cloud with words plotted in order from high frequency to low frequency.
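The plotting calls are not shown. The difference between the two clouds most likely comes down to the `random.order` argument of wordcloud(), sketched here on made-up frequencies; `toy_freq` and its counts are illustrative and not taken from The Republic.

```r
library(wordcloud)

# Made-up word frequencies, for illustration only.
toy_freq <- data.frame(
  word = c("justice", "state", "soul", "truth", "city"),
  n    = c(50, 40, 30, 20, 10)
)

set.seed(42)  # word placement is random, so fix the seed for a repeatable layout

# Words plotted in random order.
wordcloud(toy_freq$word, toy_freq$n, min.freq = 1, random.order = TRUE)

# Words plotted in decreasing frequency, so the most frequent go down first.
wordcloud(toy_freq$word, toy_freq$n, min.freq = 1, random.order = FALSE)
```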

Five dialogues of Plato

After finishing with The Republic, I started to take a look at the five most common dialogues of Socrates written by Plato. Downloading just the one book and cleaning it went fine without too many misfires, so I wanted to try wrangling multiple book downloads. Admittedly the five I chose are not lengthy texts, but hey, baby steps: I’m still getting a handle on this data wrangling stuff.

Euthyphro
Apology
Crito
Meno
Phaedo
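Downloading several books at once might look like the sketch below. It assumes the five titles match the `title` column of the gutenbergr metadata exactly; `dialogue_titles`, `dial_ids`, and `dialogues` are illustrative names, not the ones used in the original analysis.

```r
library(gutenbergr)
library(dplyr)

# The five dialogues listed above.
dialogue_titles <- c("Euthyphro", "Apology", "Crito", "Meno", "Phaedo")

# Look the IDs up in the metadata bundled with the package (no network needed).
# Assumption: each title appears in the metadata exactly as spelled here.
dial_ids <- gutenberg_works(author == "Plato", title %in% dialogue_titles)

# Download every matching work in one call (needs network); meta_fields = "title"
# adds a title column so rows from different books stay distinguishable.
dialogues <- gutenberg_download(dial_ids$gutenberg_id, meta_fields = "title")

# Number of lines retrieved per dialogue.
dialogues %>% count(title)
```

Looking the IDs up by title avoids hard-coding them, at the cost of depending on how the metadata spells each title.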

dial_corpus <- Corpus(VectorSource(dial_words3))

dial_clean <- tm_map(dial_corpus, removeWords, c("socrates", "meno", "euthyphro", "crito", "phaedo"))

## [1] "Using direct authentication"

Trumpy Trump and the Trumper bunch!

Well, what was I thinking last time, working with the Twitter API and not including POTUS Donald Trump? I have been admonished for my lapse and decided to throw it in here at the end.

Gutenberg Text Mining Practice

Michael Warner

January 15, 2018

Wow, great, this is great, how could it not be great? It’s great to have such great things going on now, isn’t that great!