Poobalan
22 April 2016
This application attempts to predict the next word from user input, using at most the last two words entered. The prediction is based on the datasets provided by SwiftKey, namely Twitter, blog, and news data.
Challenges faced
Solution/Workarounds
Two algorithms were used:
Simple Back-off
Simple Back-off looks for candidate words in a 3-word table (trigram) first, then in a 2-word table (bigram). If both the trigram and bigram searches fail, it returns the word with the highest occurrence in a 1-word table (unigram).
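The lookup order described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names (`build_tables`, `most_frequent_continuation`, `predict_backoff`) and the toy corpus are hypothetical, and the application's actual table format and code are not shown in this deck.

```python
from collections import Counter

def build_tables(tokens):
    """Count unigrams, bigrams and trigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return unigrams, bigrams, trigrams

def most_frequent_continuation(table, context):
    """Most frequent final word among n-grams that start with `context`, or None."""
    candidates = {gram[-1]: n for gram, n in table.items() if gram[:-1] == context}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)

def predict_backoff(text, unigrams, bigrams, trigrams):
    """Try the trigram table, then the bigram table, then the top unigram."""
    words = tuple(text.lower().split()[-2:])  # use at most the last two words
    if len(words) == 2:
        hit = most_frequent_continuation(trigrams, words)
        if hit is not None:
            return hit
    if words:
        hit = most_frequent_continuation(bigrams, words[-1:])
        if hit is not None:
            return hit
    return unigrams.most_common(1)[0][0]

# Toy corpus (hypothetical, for illustration only).
tokens = "the cat sat on the mat and the dog sat on the rug".split()
uni, bi, tri = build_tables(tokens)
print(predict_backoff("sat on", uni, bi, tri))   # trigram match
print(predict_backoff("xyz sat", uni, bi, tri))  # falls back to the bigram table
print(predict_backoff("foo bar", uni, bi, tri))  # falls back to the top unigram
```

Scanning the whole table on every lookup keeps the sketch short. A real application would more likely index the tables by context so that each lookup is a single dictionary access.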
Simple Good-Turing
This algorithm accounts for the possibility that the user enters a word that is not in the dictionary, so it estimates probabilities for such unseen words to make a better prediction. It checks the 3-word table first and, if no match is found, the 2-word table. If no match is found in either table, it returns a “not found” message.
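To show how probability can be set aside for unseen words, here is a minimal sketch of the basic Good-Turing estimate. It is not the full Simple Good-Turing procedure, which also smooths the frequency-of-frequency counts, and it is not taken from the application's code. The function name `good_turing` and the example counts are hypothetical. With N_c denoting the number of items seen exactly c times and N the total count, the adjusted count is c* = (c + 1) N_{c+1} / N_c, and the probability reserved for unseen items is N_1 / N.

```python
from collections import Counter

def good_turing(counts):
    """Return (adjusted_counts, p_unseen) using the basic Good-Turing estimate."""
    total = sum(counts.values())
    # N_c: how many distinct items occur exactly c times
    freq_of_freq = Counter(counts.values())
    # Probability mass reserved for items never seen: N_1 / N
    p_unseen = freq_of_freq[1] / total
    adjusted = {}
    for item, c in counts.items():
        n_c = freq_of_freq[c]
        n_next = freq_of_freq[c + 1]
        # Assumption for this sketch: keep the raw count when N_{c+1} is zero,
        # which is the gap that Simple Good-Turing fills by smoothing N_c.
        adjusted[item] = (c + 1) * n_next / n_c if n_next else float(c)
    return adjusted, p_unseen

# Hypothetical counts, for illustration only.
counts = Counter({"a": 1, "b": 1, "c": 1, "d": 2, "e": 3})
adjusted, p_unseen = good_turing(counts)
print(p_unseen)
print(adjusted)
```

In this sketch, items seen once have their counts discounted, and the mass that is freed becomes the estimate for words the tables have never seen.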
1. The user enters input, chooses a prediction method, and clicks the submit button in the sidebar.
2. The resulting prediction appears in the main panel.
The application is accessible at https://libra22.shinyapps.io/TextPredictor/
Performance
Limitations