March 25, 2017
Natural Language Processing (NLP) is a way to “train” computers to use human words to do useful things (Chowdhury, 2003)*. The collective wisdom of Wikipedia organizes NLP's specific tasks in four general areas:
*Chowdhury, G. G. (2003), Natural language processing. Ann. Rev. Info. Sci. Tech., 37: 51-89. doi:10.1002/aris.1440370103
This amateur NLP project includes a Shiny app that guesses a word based on how frequently it follows another word, that is, on how often the two words co-occur in sequence.
In other words, the algorithm uses a statistical model that combines tokenized uni-, bi- and tri-grams to offer a prediction of the next possible word.
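The idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code (the app itself is written for Shiny): the function names `tokenize`, `build_model` and `predict`, the toy corpus, and the simple "longest context first" back-off order are all made up for the example. The `n_guesses` argument plays the role of the app's slider.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (apostrophes kept)."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(sentences, max_n=3):
    """Map each context (a tuple of 0 to max_n - 1 words) to next-word counts."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k < 0:
                    break  # not enough preceding words for this context length
                context = tuple(tokens[i - k:i])  # k = 0 gives the empty context
                model[context][word] += 1
    return model


def predict(model, text, n_guesses=3, max_n=3):
    """Return up to n_guesses next words, trying the longest context first."""
    tokens = tokenize(text)
    guesses = []
    for k in range(max_n - 1, -1, -1):
        if k > len(tokens):
            continue  # the input is shorter than this context length
        context = tuple(tokens[len(tokens) - k:])
        # Assumed back-off: fall through to shorter contexts if guesses run out.
        for word, _count in model.get(context, Counter()).most_common():
            if word not in guesses:
                guesses.append(word)
            if len(guesses) == n_guesses:
                return guesses
    return guesses


# Toy corpus, invented for this example only.
corpus = [
    "how are you doing",
    "how are you doing today",
    "how are you feeling",
    "my heart is breaking",
]
model = build_model(corpus)
print(predict(model, "How are you", n_guesses=2))
print(predict(model, "My heart is", n_guesses=1))
```

With this toy corpus, "How are you" is matched on the two-word context "are you", where "doing" was seen more often than "feeling". An input the model has never seen falls back to the most frequent single words.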
The raw data include news, blog and Twitter posts from a dataset provided by Coursera.
This algorithm is unique in its willingness to work with “bad words”, because the author values spelling over censorship and because the profanity filter is not formally required in the final project.
The app uses a sample of one-, two- and three-word combinations, around 1 MB in size, which allows zippy loading and lightning-fast word guessing. A total of 127,666 observations feed the algorithm.
It is optimized for predicting fairly trivial phrases like “How are you…” ==> “doing” and “My heart is…” ==> “breaking”.
Along with that, it produces gems like “F…k animal…” ==> “cruelty” that the ASPCA would endorse in a heartbeat.
Quad- and quint-grams could be added, along with a larger sample, at the expense of computing resources and time.
To test the Next Word Best Guess (NWBG) app, just type in a word (or two or three) in the properly labeled box and set the slider to reflect the number of guesses you want to see.
The predictions will populate the space below. In this version of the app, you can ignore the input box at the bottom.
Other than that, try it out and have fun!