Word Predictor App

DomR
December, 2014

What does Word Predictor App do?



  • Word Predictor App predicts the next word based on the last three words of the sentence.
  • It also suggests five likely words based on the word frequency distribution of the corpus data from the Capstone Dataset.
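
As an illustration of the "last three words" idea, here is a minimal base R sketch. The helper name lastWords and its cleaning rules are assumptions made for this example; they are not taken from the app's source.

```r
# Hypothetical helper (not from the app's source): return the last n words
# of a sentence, lowercased, with punctuation replaced by spaces.
lastWords <- function(sentence, n = 3) {
  cleaned <- tolower(gsub("[^[:alpha:][:space:]']", " ", sentence))
  tokens <- strsplit(trimws(cleaned), "[[:space:]]+")[[1]]
  tokens <- tokens[nzchar(tokens)]   # drop empty tokens
  tail(tokens, n)                    # fewer than n words: return what exists
}

lastWords("Guys and gals are")   # returns c("and", "gals", "are")
```

Only these trailing words would then be used as the lookup context for the prediction step.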

How does it work?



  • A sample dataset (20% of the original corpus) was used to build the prediction engine with R's tm package. See sampleData.r.
  • Lookup datagrams for bigrams, trigrams and quadgrams were built using the TermDocumentMatrix function.
  • Datagrams are saved into data files and loaded by the Shiny app to reduce application load times. See Assignments2_V3.r.
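
The datagram-building step can be sketched roughly as follows. This is not the app's code: it counts n-grams with base R only, instead of tm's TermDocumentMatrix, and the function name countNgrams, the column names and the toy input lines are all assumptions made for illustration.

```r
# Hypothetical helper: count n-grams of order n over a character vector of lines.
# Uses base R only; the app itself is described as using the tm package.
countNgrams <- function(textLines, n) {
  grams <- unlist(lapply(textLines, function(line) {
    tokens <- strsplit(tolower(trimws(line)), "[^a-z']+")[[1]]
    tokens <- tokens[nzchar(tokens)]
    if (length(tokens) < n) return(character(0))
    starts <- seq_len(length(tokens) - n + 1)
    vapply(starts,
           function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(gram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy input, made up for this example.
sampleLines <- c("guys and gals are terrific", "guys and gals are here")
bigrams   <- countNgrams(sampleLines, 2)
quadgrams <- countNgrams(sampleLines, 4)

# A table like this could be cached with saveRDS() and read back with
# readRDS() at app start-up; the file name would be the author's choice.
```

Precomputing and caching the tables this way means the app only has to read them at start-up rather than re-tokenize the corpus.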

How does it work (continued)?



  • The Shiny server loads the processed datagrams and generates frequencies as well as frequency distributions.
  • Smoothing is applied using the Simple Good-Turing algorithm.
  • As the user types in the Shiny app user interface, a prediction algorithm selects five probable words, weighting matched grams as follows: 50% for quadgrams, 30% for trigrams and 20% for bigrams. See predictFunctions_V1.r.
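
One way to read the 50/30/20 weighting is as a weighted sum of per-order probabilities. The sketch below assumes that interpretation. The function scoreCandidates, the table layout (columns context, word, prob) and the toy probabilities are hypothetical; in the app, the probabilities would presumably be the smoothed estimates, and the actual logic lives in predictFunctions_V1.r.

```r
# Weights as stated on the slide: quadgrams 50%, trigrams 30%, bigrams 20%.
gramWeights <- c(quad = 0.5, tri = 0.3, bi = 0.2)

# Hypothetical scorer. `tables` is a named list (quad, tri, bi) of data frames
# with columns context, word, prob. `words` holds the last words typed.
scoreCandidates <- function(tables, words, weights = gramWeights, top = 5) {
  contexts <- c(quad = paste(tail(words, 3), collapse = " "),
                tri  = paste(tail(words, 2), collapse = " "),
                bi   = paste(tail(words, 1), collapse = " "))
  scores <- numeric(0)
  for (level in names(weights)) {
    tbl  <- tables[[level]]
    hits <- tbl[tbl$context == contexts[[level]], , drop = FALSE]
    for (i in seq_len(nrow(hits))) {
      w    <- hits$word[i]
      prev <- if (w %in% names(scores)) scores[[w]] else 0
      scores[w] <- prev + weights[[level]] * hits$prob[i]
    }
  }
  head(sort(scores, decreasing = TRUE), top)   # up to `top` best candidates
}

# Toy tables with made-up probabilities, for illustration only.
toyTables <- list(
  quad = data.frame(context = "and gals are", word = c("terrific", "here"),
                    prob = c(0.6, 0.4), stringsAsFactors = FALSE),
  tri  = data.frame(context = "gals are", word = c("terrific", "great"),
                    prob = c(0.5, 0.5), stringsAsFactors = FALSE),
  bi   = data.frame(context = "are", word = c("here", "great", "terrific"),
                    prob = c(0.5, 0.3, 0.2), stringsAsFactors = FALSE)
)

ranked <- scoreCandidates(toyTables, c("and", "gals", "are"))
names(ranked)   # "terrific" "here" "great"
```

With these toy numbers, "terrific" scores 0.5 × 0.6 + 0.3 × 0.5 + 0.2 × 0.2 = 0.49, ahead of "here" (0.30) and "great" (0.21). A word matched at several orders accumulates weight from each, so longer matches dominate without excluding shorter ones.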

How do I use it?



  • Visit https://dxrodri.shinyapps.io/WordPredictor/
  • Wait for the app to initialize. Once it is initialized, the app displays a “Start Typing” message.
  • The app predicts the next probable word as you type. It also suggests five probable words.
  • For example, if you type “guys and gals are”, the app predicts “terrific” as the next word.
nextWord <- getNextWords(wordFrameGrams, SGT, "guys and gals are")
print(nextWord)
[1] "terrific"

Demo of the App



Future Improvements to the app



  • Use semantic analysis and part-of-speech tagging
  • Expand the datagram dictionary based on user input

Questions?





Thank you.

dxrodri at gmail