This report summarizes basic statistics for the Blogs, News and Twitter data sets. The first four charts summarize word and line counts. They are followed by a table of frequency counts for the words that follow the word ‘the’, and then by the transition matrix of a Markov chain model. These data will form the backbone of a Shiny application that lets a user predict the next word of a text.
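Below is a minimal sketch of how the word and line counts behind these charts could be computed. It is written in Python rather than the R used for the report. It assumes that words are whitespace-separated tokens, which may not match the tokenizer used in the original analysis. The file names in the usage section (`en_US.blogs.txt` and so on) are hypothetical placeholders.

```python
from pathlib import Path
from typing import Iterable, Tuple


def count_lines_and_words(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (line_count, word_count), treating any run of whitespace as a word separator."""
    n_lines = 0
    n_words = 0
    for line in lines:
        n_lines += 1
        n_words += len(line.split())
    return n_lines, n_words


def count_file(path: str) -> Tuple[int, int]:
    """Count lines and words in a UTF-8 text file, streaming it line by line."""
    # errors="replace" keeps a single bad byte from aborting the whole count
    with Path(path).open(encoding="utf-8", errors="replace") as handle:
        return count_lines_and_words(handle)


if __name__ == "__main__":
    # Hypothetical file names; substitute the actual Blogs, News and Twitter files.
    for name in ["en_US.blogs.txt", "en_US.news.txt", "en_US.twitter.txt"]:
        if Path(name).exists():
            lines, words = count_file(name)
            print(f"{name}: {lines} lines, {words} words")
```

Streaming the file avoids loading a whole corpus into memory. That matters for files of this size.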
Number of words in US News
The following table lists the most frequent words that follow “the”:
| Next word | Number of occurrences | Percent of total |
|---|---:|---:|
| first | 501 | 1.23 |
| same | 443 | 1.09 |
| most | 280 | 0.69 |
| other | 280 | 0.69 |
| best | 242 | 0.59 |
| way | 224 | 0.55 |
| time | 223 | 0.55 |
| last | 222 | 0.54 |
| world | 217 | 0.53 |
| next | 212 | 0.52 |
| new | 199 | 0.49 |
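A table like this one can be reproduced, in outline, by scanning adjacent word pairs. The Python sketch below rests on two assumptions: the corpus is already tokenized into a list of words, and "percent of total" means the share of all words that follow “the”. The helper `predict_next` is also hypothetical. It returns the most frequent follower as a simple next-word guess. Tokens are compared exactly, so "The" and "the" are counted separately unless the text is lowercased first.

```python
from collections import Counter
from typing import List, Tuple


def followers_of(tokens: List[str], target: str = "the") -> Counter:
    """Count the words that immediately follow `target` in a token list."""
    counts: Counter = Counter()
    for current, nxt in zip(tokens, tokens[1:]):
        if current == target:
            counts[nxt] += 1
    return counts


def frequency_table(counts: Counter, top: int = 11) -> List[Tuple[str, int, float]]:
    """Return (word, occurrences, percent of all followers) for the `top` most common followers."""
    total = sum(counts.values())
    return [(word, n, round(100 * n / total, 2)) for word, n in counts.most_common(top)]


def predict_next(counts: Counter) -> str:
    """Guess the next word as the single most frequent follower (raises IndexError if empty)."""
    return counts.most_common(1)[0][0]
```

`Counter.most_common` returns words with equal counts in first-seen order. Ties such as *most* and *other* (280 each) may therefore come out in a different order than in the table above.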
The transition probabilities of the Markov chain are shown below, followed by sample sentences beginning with the word ‘the’.
| From \ To | In | Oil | after | and | fields | most | named | of | pagan | platforms | the | thereafter, | were | years | “gods”. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| In | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| Oil | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| after | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| and | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| fields | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| most | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| named | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| of | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| pagan | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
| platforms | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| the | 0 | 0.5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0.5 | 0 |
| thereafter, | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| were | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| years | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| “gods”. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
## [1] "the Oil fields and platforms were named after."
## [1] "the Oil fields and platforms were named after."
## [1] "the Oil fields and platforms were named after."
## [1] "the years thereafter, most of the Oil fields."
## [1] "the Oil fields and platforms were named after."
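The transition matrix and the sample sentences are consistent with a single training sentence, *In the years thereafter, most of the Oil fields and platforms were named after pagan “gods”.*, split on whitespace so that punctuation stays attached to words. The Python sketch below estimates a first-order transition matrix from bigram counts and generates text by a weighted random walk from “the”.

Three details are assumptions rather than statements in the report:

* The training sentence is reconstructed from the matrix.
* The eight-word length is inferred from the sample output.
* The trailing period is likewise inferred from the sample output.

Rows are stored sparsely, so only nonzero probabilities appear.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional

# Assumed training text, reconstructed from the transition matrix in the report.
EXAMPLE = ("In the years thereafter, most of the Oil fields and platforms "
           "were named after pagan “gods”.")


def transition_matrix(tokens: List[str]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next word | current word) from bigram counts, one row per distinct token."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))
    for current, nxt in zip(tokens, tokens[1:]):
        counts[current][nxt] += 1
    matrix: Dict[str, Dict[str, float]] = {}
    for word in sorted(set(tokens)):
        row = counts.get(word, {})
        total = sum(row.values())
        # Normalize each row to sum to 1; a word never followed by anything gets an empty row.
        matrix[word] = {nxt: n / total for nxt, n in row.items()} if total else {}
    return matrix


def generate(matrix: Dict[str, Dict[str, float]], start: str = "the",
             length: int = 8, rng: Optional[random.Random] = None) -> str:
    """Random walk through the chain from `start`, stopping after `length` words or at a dead end."""
    rng = rng or random.Random()
    words = [start]
    while len(words) < length:
        row = matrix.get(words[-1], {})
        if not row:
            break  # no observed successor for the current word
        successors = list(row)
        weights = [row[w] for w in successors]
        words.append(rng.choices(successors, weights=weights, k=1)[0])
    # A period is appended to mimic the sample output (an inferred convention).
    return " ".join(words) + "."


if __name__ == "__main__":
    chain = transition_matrix(EXAMPLE.split())
    print(chain["the"])  # {'years': 0.5, 'Oil': 0.5}
    for seed in range(5):
        print(generate(chain, rng=random.Random(seed)))
```

With this input, “the” is the only word that has more than one successor. An eight-word walk from “the” can therefore produce only the two sentences shown above or *the years thereafter, most of the years thereafter,.* This explains why the same sentence recurs in a handful of samples.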