R Markdown

This report summarizes basic statistics for the Blogs, News, and Twitter data sets. The first four charts summarize word and line counts. They are followed by a table of frequency counts for the words that follow the word ‘the’, and then by a transition matrix from a Markov model. These data will be the backbone of a Shiny app that lets a user predict the next word of text.

## Loading required package: ngram

## [1] "Number words in US News"
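
As a rough illustration of how word and line counts like those in the charts could be obtained, here is a minimal base R sketch. The two sample lines are made up for this example; for a real corpus file the lines would come from `readLines()` instead. Splitting on whitespace is an assumed tokenization and may differ from the one used for the charts.

```r
# Hypothetical sample standing in for one corpus file (not the real data).
sample_lines <- c(
  "The first storm of the season arrived early.",
  "Officials said the same thing last year."
)

# Line count: one vector element per line of text.
n_lines <- length(sample_lines)

# Word count: split each line on runs of whitespace and keep non-empty pieces.
tokens <- unlist(strsplit(sample_lines, "\\s+"))
tokens <- tokens[nzchar(tokens)]
n_words <- length(tokens)

print(c(lines = n_lines, words = n_words))
```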

The table below lists the most frequent words following “the”, with the number of occurrences of each and its percentage of the total.

##  Next Word Number Occurrences Percent of Total
##      first                501             1.23
##       same                443             1.09
##       most                280             0.69
##      other                280             0.69
##       best                242             0.59
##        way                224             0.55
##       time                223             0.55
##       last                222             0.54
##      world                217             0.53
##       next                212             0.52
##        new                199             0.49
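
A table of this kind can be sketched in base R as follows. The toy text and the names `follow_table`, `next_word`, `occurrences`, and `percent_of_total` are invented for illustration. The report does not state the denominator of “Percent of Total”, so this sketch assumes it is the total number of words that follow “the”.

```r
# Hypothetical toy text; the report's table was built from the full corpora.
text <- "the first time is the best time and the first day of the new year is the same as the first"

# Lowercase and split on whitespace to get a vector of word tokens.
words <- strsplit(tolower(text), "\\s+")[[1]]

# Positions of "the" that actually have a following word.
idx <- which(words == "the")
idx <- idx[idx < length(words)]

# Tabulate the words that follow "the", most frequent first.
followers <- sort(table(words[idx + 1]), decreasing = TRUE)

# Assumption: percent is taken relative to all words following "the".
follow_table <- data.frame(
  next_word = names(followers),
  occurrences = as.integer(followers),
  percent_of_total = round(100 * as.integer(followers) / sum(followers), 2),
  stringsAsFactors = FALSE
)
print(follow_table)
```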

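Below are the transition probabilities from the Markov chain, followed by sample sentences generated from it that begin with the word ‘the’. Each row is the current word and each column is a possible next word; R wraps the wide matrix, so the last four columns print in a second block.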

##             In Oil after and fields most named of pagan platforms the
## In           0 0.0     0   0      0    0     0  0     0         0   1
## Oil          0 0.0     0   0      1    0     0  0     0         0   0
## after        0 0.0     0   0      0    0     0  0     1         0   0
## and          0 0.0     0   0      0    0     0  0     0         1   0
## fields       0 0.0     0   1      0    0     0  0     0         0   0
## most         0 0.0     0   0      0    0     0  1     0         0   0
## named        0 0.0     1   0      0    0     0  0     0         0   0
## of           0 0.0     0   0      0    0     0  0     0         0   1
## pagan        0 0.0     0   0      0    0     0  0     0         0   0
## platforms    0 0.0     0   0      0    0     0  0     0         0   0
## the          0 0.5     0   0      0    0     0  0     0         0   0
## thereafter,  0 0.0     0   0      0    1     0  0     0         0   0
## were         0 0.0     0   0      0    0     1  0     0         0   0
## years        0 0.0     0   0      0    0     0  0     0         0   0
## “gods”.      0 0.0     0   0      0    0     0  0     0         0   0
##             thereafter, were years “gods”.
## In                    0    0   0.0       0
## Oil                   0    0   0.0       0
## after                 0    0   0.0       0
## and                   0    0   0.0       0
## fields                0    0   0.0       0
## most                  0    0   0.0       0
## named                 0    0   0.0       0
## of                    0    0   0.0       0
## pagan                 0    0   0.0       1
## platforms             0    1   0.0       0
## the                   0    0   0.5       0
## thereafter,           0    0   0.0       0
## were                  0    0   0.0       0
## years                 1    0   0.0       0
## “gods”.               0    0   0.0       0
## [1] "the Oil fields and platforms were named after."
## [1] "the Oil fields and platforms were named after."
## [1] "the Oil fields and platforms were named after."
## [1] "the years thereafter, most of the Oil fields."
## [1] "the Oil fields and platforms were named after."
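
For readers who want to see how such a matrix and such sentences can arise, here is a minimal base R sketch. It is not necessarily how the report's output was produced. The input sentence is reconstructed from the word pairs in the matrix, and the function name `generate` and the cap of eight words (chosen because the samples above are eight words long) are assumptions made for this example.

```r
# Sentence reconstructed from the transitions in the matrix (an assumption).
sentence <- "In the years thereafter, most of the Oil fields and platforms were named after pagan “gods”."

words <- strsplit(sentence, "\\s+")[[1]]
states <- sort(unique(words))

# Count transitions: rows are the current word, columns are the next word.
counts <- matrix(0, nrow = length(states), ncol = length(states),
                 dimnames = list(states, states))
for (i in seq_len(length(words) - 1)) {
  counts[words[i], words[i + 1]] <- counts[words[i], words[i + 1]] + 1
}

# Convert counts to probabilities; rows with no outgoing transitions stay zero.
row_totals <- rowSums(counts)
trans <- counts / ifelse(row_totals == 0, 1, row_totals)

# Walk the chain from a start word until max_words is reached
# or the current word has no successors.
generate <- function(start, trans, max_words = 8) {
  out <- start
  current <- start
  while (length(out) < max_words && sum(trans[current, ]) > 0) {
    current <- sample(colnames(trans), size = 1, prob = trans[current, ])
    out <- c(out, current)
  }
  paste(out, collapse = " ")
}

set.seed(1)
print(generate("the", trans))
```

Because “the” is followed by “Oil” once and by “years” once in the source sentence, each gets probability 0.5, and every other word with a successor moves to that successor with probability 1. This is why the sampled sentences differ only in which branch is taken after “the”.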