Shiny App for getting text N-grams (Coursera Project)

About App

This application finds the most frequent n-grams in an uploaded text.

According to Wikipedia: “N-gram is a contiguous sequence of n items (words in our app) from a given sequence of text or speech… The n-grams typically are collected from a text or speech corpus…

An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”. Larger sizes are sometimes referred to by the value of n, e.g., “four-gram”, “five-gram”, and so on.”
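
The definition can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the app's own R code; the function name `ngrams` and the sample sentence are hypothetical.

```python
def ngrams(words, n):
    """Return every contiguous run of n words as a tuple."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of k words contains k - n + 1 n-grams (none if n > k).
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "the quick brown fox".split()

unigrams = ngrams(words, 1)  # n-grams of size 1
bigrams = ngrams(words, 2)   # n-grams of size 2
trigrams = ngrams(words, 3)  # n-grams of size 3

print(bigrams)
```

Here the four-word sentence yields four unigrams, three bigrams, and two trigrams; asking for an n larger than the number of words returns an empty list.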

N-gram example

For example, if you have the text: “If I want to extract n-grams from text using R, I can make a Shiny App for Coursera Developing Data Products Class using R. After that I can make a presentation using R Presenter and submit it at Coursera Project Page. And here is some more text to make n-grams more frequent and more different. Downloaded text to Shiny App by the way can be very very long (you can try even some books like the default book, included in my GitHub repository also submitted at Coursera assignment page).”

The most frequent n-grams for N=2 are:

      Ngram Frequency
1   using R         3
2 Shiny App         2
3  can make         2
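
Counting bigram frequencies of this kind can be sketched as follows. This is an illustrative Python sketch, not the app's R implementation: the tokenization rule is an assumption, the text is a short excerpt adapted from the example, and no filters are applied, so the ranking need not match the table above.

```python
import re
from collections import Counter

# Short excerpt adapted from the example text.
text = (
    "If I want to extract n-grams from text using R, I can make a Shiny App "
    "for Coursera Developing Data Products Class using R. After that I can "
    "make a presentation using R Presenter and submit it at Coursera Project Page."
)

# Assumed tokenization: runs of letters or digits, keeping inner hyphens
# so that "n-grams" stays one word. Case is preserved.
words = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text)

n = 2
# Join each window of n consecutive words and count how often it occurs.
counts = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

top = counts.most_common(4)
for ngram, frequency in top:
    print(f"{ngram}\t{frequency}")
```

On this excerpt the most frequent bigram is "using R" with a count of 3. Because nothing is filtered out, pairs such as "I can" and "make a" are counted as well.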

Wordcloud

The Shiny app also draws a word cloud of the most frequent n-grams to make them easy to visualise. For the previous example with N=1 it would be:

[Figure: word cloud of the most frequent n-grams (N=1) for the example text]

How to use

To use the app here, follow these steps:

Upload the file with text in csv or txt format, checking the separator and header checkboxes if needed. Default text example is here.
Choose the N for n-grams (for example, if N=2, the app will get the most frequent bigrams, i.e. two-word sequences, from the text file).
To remove punctuation and spaces from the uploaded text, use the filter checkboxes.
Choose the minimum length and frequency of n-grams if needed. For example, “minimum length=3” means that all n-grams shorter than 3 characters will be discarded.
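
The effect of the filters in the steps above can be sketched as follows. This is a hypothetical Python illustration, not the app's R code: the names `clean_text` and `top_ngrams` and their parameters are assumptions, as is measuring the length of an n-gram in characters including the spaces between its words.

```python
import re
import string
from collections import Counter


def clean_text(text, remove_punctuation=True, squeeze_spaces=True):
    """Optionally drop ASCII punctuation and collapse runs of whitespace."""
    if remove_punctuation:
        text = text.translate(str.maketrans("", "", string.punctuation))
    if squeeze_spaces:
        text = re.sub(r"\s+", " ", text).strip()
    return text


def top_ngrams(text, n=2, min_length=1, min_freq=1):
    """Count n-grams, then keep those that are long and frequent enough."""
    words = clean_text(text).split()
    grams = [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    counts = Counter(grams)
    kept = {
        gram: freq
        for gram, freq in counts.items()
        if len(gram) >= min_length and freq >= min_freq
    }
    # Most frequent first; ties are broken alphabetically.
    return sorted(kept.items(), key=lambda item: (-item[1], item[0]))


sample = "It is a very, very long text; it is very   long!"
result = top_ngrams(sample, n=1, min_length=3, min_freq=2)
print(result)
```

With a minimum length of 3 and a minimum frequency of 2, short words such as "is" and one-off words such as "text" are dropped, leaving "very" (3 occurrences) and "long" (2 occurrences).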