NLP Prediction

Mauricio Paletta
June 2, 2018

What is NLP?

Natural-Language Processing (NLP) is an area of computer science and artificial intelligence concerned with the interactions between computers and human (natural) languages, in particular how to program computers to fruitfully process large amounts of natural language data. For more information please visit:

NLP Prediction

The main goal of NLP prediction is forecasting the next word in a sentence based on the previous words. It might seem irrelevant if one thinks of NLP only in terms of processing text for semantic understanding. However, NLP also involves processing noisy data and checking text for errors.

An NLP algorithm that catches errors, for example a spell checker, would therefore need to look beyond which letters form words and instead attempt to determine which word is most probable in a given sentence. For more information please visit:

My NLP Prediction strategy

  1. I built a prediction model based on data from a corpus called HC Corpora.
  2. For learning, I used a small percentage of each of the three data sources: blogs, news and Twitter. The data was first filtered and cleaned.
  3. I built three different data frames based on the prediction of a word according to the first (1-gram), second (2-gram) and third (3-gram) previous words (for more details, please check the n-gram concept).
  4. The prediction model was built with a custom algorithm based on Katz back-off.
  5. The model was designed to balance memory consumption and execution time.
  6. The predicted words are ordered by a weighted average cost calculated from the frequencies of the 1-gram, 2-gram and 3-gram predictions. The corresponding weights are 0.075, 0.3 and 0.625, giving more relevance to 3-gram predictions.
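
The weighting in step 6 can be illustrated with a minimal sketch. This is not the algorithm behind the application, which is not shown here. It is a simple weighted interpolation rather than Katz back-off, and it assumes that the three tables condition on the last one, two and three words and that each frequency is normalized per context. The names `build_tables` and `predict` are hypothetical; only the three weights come from the list above.

```python
from collections import Counter, defaultdict

# Weights from step 6, keyed by the number of previous words used as context.
WEIGHTS = {1: 0.075, 2: 0.3, 3: 0.625}


def build_tables(tokens, max_context=3):
    """Map each context (a tuple of k previous words) to counts of the next word."""
    tables = {k: defaultdict(Counter) for k in range(1, max_context + 1)}
    for k in tables:
        for i in range(k, len(tokens)):
            context = tuple(tokens[i - k:i])
            tables[k][context][tokens[i]] += 1
    return tables


def predict(tables, previous_words, top=3):
    """Rank candidate next words by a weighted sum of relative frequencies."""
    scores = Counter()
    for k, weight in WEIGHTS.items():
        if len(previous_words) < k:
            continue  # not enough history for this context length
        followers = tables[k].get(tuple(previous_words[-k:]))
        if not followers:
            continue  # context never seen in the training tokens
        total = sum(followers.values())
        for word, count in followers.items():
            # Assumption: frequencies are normalized within each context.
            scores[word] += weight * count / total
    return [word for word, _ in scores.most_common(top)]


# Toy corpus, for illustration only.
tokens = "i like green tea and i like green apples and i like black tea".split()
tables = build_tables(tokens)
print(predict(tables, ["i", "like"]))  # ['green', 'black']
```

With this toy corpus, "green" follows "like" twice and "black" once, so "green" ranks first. A context that was never seen contributes nothing, which is where a real back-off scheme would fall back to shorter contexts with discounted probabilities.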

My NLP Prediction Shiny application