Monnappa Somanna
23-Dec-2016
The goal of this project is to predict the most likely word the user wants to type, based on the previous two or three words.
This application is useful in mobile texting, where it improves the user experience by making typing faster.
Millions of news articles, blog posts and tweets are used as the “corpus” for training the model.
Following are the key links:
Link to the Application
Link to the GitHub repository
Following are the key steps:
2. Tokenize the text by breaking it up into units called tokens. A token may be a word, a number or a punctuation mark.
3. Create n-gram sequences from the above data. An n-gram is a contiguous sequence of N items from a given sequence of text or speech. … An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram”; size 3 is a “trigram”.
4. Count the number of occurrences of each n-gram. N is limited to at most 4 because of memory limitations.
5. Calculate the probability of each n-gram using Maximum Likelihood Estimation and simple linear interpolation.
6. Look up the user input in the unigram, bigram and trigram data.
7. Extract the last three tokens (e.g. prev1, prev2) from the phrase. If the phrase is not long enough, extract the last two tokens or the last token.
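The steps listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the application's actual code: the function names (tokenize, count_ngrams, mle, interpolated, predict), the tokenizer regular expression, the toy corpus and the interpolation weights (0.1, 0.2, 0.3, 0.4) are all hypothetical. In practice the weights would normally be tuned on held-out data.

```python
import re
from collections import Counter

MAX_N = 4  # n-gram order is capped at 4, as in the steps above


def tokenize(text):
    """Split text into lowercase word, number and punctuation tokens."""
    # Assumed tokenizer: words (with an optional apostrophe part), digits, punctuation.
    return re.findall(r"[a-z]+(?:'[a-z]+)?|\d+|[^\w\s]", text.lower())


def count_ngrams(sentences, max_n=MAX_N):
    """Return {n: Counter of n-gram tuples} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def mle(counts, context, word):
    """Maximum likelihood estimate: count(context + word) / count(context)."""
    n = len(context) + 1
    if n == 1:
        total = sum(counts[1].values())
        return counts[1][(word,)] / total if total else 0.0
    denom = counts[n - 1][context]
    return counts[n][context + (word,)] / denom if denom else 0.0


def interpolated(counts, context, word, lambdas):
    """Simple linear interpolation of the MLE estimates of every usable order."""
    # lambdas[k] weights the estimate that conditions on the last k tokens.
    weights = lambdas[:len(context) + 1]
    norm = sum(weights)  # renormalise when the context is shorter than three tokens
    score = 0.0
    for k, weight in enumerate(weights):
        ctx = context[len(context) - k:]
        score += (weight / norm) * mle(counts, ctx, word)
    return score


def predict(counts, phrase, lambdas=(0.1, 0.2, 0.3, 0.4), top=3):
    """Rank vocabulary words as the continuation of the last (up to) three tokens."""
    max_context = max(counts) - 1
    tokens = tokenize(phrase)
    context = tuple(tokens[-max_context:])  # shorter phrases give a shorter context
    vocab = [w for (w,) in counts[1]]
    ranked = sorted(
        vocab,
        key=lambda w: interpolated(counts, context, w, lambdas),
        reverse=True,
    )
    return ranked[:top]


# Toy corpus standing in for the news, blog and tweet data.
corpus = [
    "i want to eat pizza",
    "i want to eat pasta",
    "i want to eat pizza",
    "you want to go home",
]
counts = count_ngrams(corpus)
print(predict(counts, "I want to eat", top=2))  # ['pizza', 'pasta']
```

The sketch scores every vocabulary word, which is fine for a toy corpus. For a corpus of millions of documents, one would more likely precompute a lookup table of the best few continuations per context to keep memory and response time manageable.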
Instructions for using the App:
Limitations of the App:
References: