Yanfei Chen
Thu Jul 30 12:16:16 2020
This app is the final project of the Data Science Specialization offered by Johns Hopkins University on Coursera. It takes one or more words as input and predicts the next word. This kind of prediction is widely used in products today; one familiar application is the keyboard on a mobile phone. SwiftKey is a company that builds word predictors, and this project was co-initiated by JHU and SwiftKey.
This app supports three languages: English, German, and Finnish. Users type words into the left box, and the predicted word(s) appear on the right. Between one and three predicted words are shown, and the leftmost one is the most likely.
I tokenize the corpus and build 1-gram to 4-gram lists. The 4-gram lists are as follows:
English:
     w1      w2     w3      w4     Freq
1    in      the    middle  of     6
2    the     end    of      the    6

German:
     w1      w2      w3      w4     Freq
1    auf     den     ersten  blick  5
2    am      montag  in      paris  4

Finnish:
     w1      w2      w3      w4     Freq
1    tiimin  voima   on      site   5
2    voima   on      site    kun    5
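As a rough illustration of how frequency lists like these can be built, here is a minimal Python sketch. It is not the author's code: the tokenizer regex, the function names `tokenize` and `ngram_counts`, and the tiny sample corpus are all assumptions made for the example.

```python
import re
from collections import Counter

def tokenize(text):
    # Hypothetical tokenizer: lowercase the text and keep runs of letters
    # (Unicode-aware, so umlauts survive), allowing one inner apostrophe.
    return re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)?", text.lower())

def ngram_counts(tokens, n):
    # Slide a window of length n over the tokens and count each tuple.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample corpus, used only to show the shape of the result.
corpus = "in the middle of the night in the middle of the road"
tokens = tokenize(corpus)

# One frequency table per n-gram order, 1-gram through 4-gram.
tables = {n: ngram_counts(tokens, n) for n in range(1, 5)}

# Most frequent 4-grams first, mirroring the w1..w4 / Freq layout.
for gram, freq in tables[4].most_common(2):
    print(*gram, freq)
```

Each table maps a tuple of words to its frequency, so sorting by count gives the same kind of ranked list as the tables above.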
The prediction model uses only the last 3 words of the input text as predictors. If these 3 words exactly match the first three words of an entry in the 4-gram list, the 4th word of that entry is returned. If they do not match any entry, the leftmost word is dropped and the remaining 2 words are looked up in the 3-gram list. The same step is repeated with the last word alone. If even the last word yields no match, the top 3 entries of the 1-gram list are returned.
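The back-off lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's implementation: the function name `predict`, the dictionary layout, and every frequency outside the two English 4-gram rows shown earlier are made up for the example.

```python
def predict(text, tables, k=3):
    """Return up to k candidate next words, most frequent first."""
    words = text.lower().split()
    # Try the longest context first: 3 words (4-gram), then 2, then 1.
    for n in (4, 3, 2):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough input words for this n-gram order
        matches = [(gram[-1], freq) for gram, freq in tables[n].items()
                   if gram[:-1] == context]
        if matches:
            matches.sort(key=lambda m: -m[1])
            return [word for word, _ in matches[:k]]
    # No context matched: fall back to the most frequent single words.
    top = sorted(tables[1].items(), key=lambda kv: -kv[1])
    return [gram[0] for gram, _ in top[:k]]

# Toy tables. Only the two 4-gram rows come from the lists above;
# all other entries and counts are hypothetical.
tables = {
    4: {("in", "the", "middle", "of"): 6, ("the", "end", "of", "the"): 6},
    3: {("end", "of", "the"): 9, ("of", "the", "day"): 5, ("of", "the", "year"): 4},
    2: {("of", "the"): 40, ("the", "end"): 12, ("the", "day"): 8},
    1: {("the",): 100, ("of",): 60, ("in",): 50, ("end",): 10},
}

print(predict("in the middle", tables))      # matched in the 4-gram table
print(predict("at the end of the", tables))  # backs off to the 3-gram table
print(predict("hello world", tables))        # falls back to top 1-grams
```

Scanning every entry keeps the sketch short; a real app would more likely index each table by its context so that a lookup does not walk the whole list.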