Word Predictor Presentation

Yanfei Chen
Thu Jul 30 12:16:16 2020

Overview

This app is the final project of the Data Science Specialization offered by Johns Hopkins University on Coursera. It takes one or more words as input and predicts the next word. This kind of prediction is widely used today; one familiar application is the keyboard on a mobile phone. SwiftKey is a company that designs word predictors, and this project was co-initiated by JHU and SwiftKey.

This app supports 3 languages: English, German and Finnish. Users type words into the left box, and the predicted word(s) appear on the right. Between 1 and 3 predicted words are shown, with the most likely one leftmost.

Methodology

I tokenize the corpus and build frequency lists of 1-grams through 4-grams. Sample entries from the 4-gram lists are as follows:

English:
   w1  w2     w3  w4 Freq
1  in the middle  of    6
2 the end     of the    6

German:
   w1     w2     w3    w4 Freq
1 auf    den ersten blick    5
2  am montag     in paris    4

Finnish:
      w1    w2   w3   w4 Freq
1 tiimin voima   on site    5
2  voima    on site  kun    5
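
The n-gram counting step can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual code: the whitespace tokenizer, the function names `tokenize` and `ngram_counts`, and the two-sentence toy corpus are all assumptions made for the example.

```python
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # The app's real tokenization rules are not described in the slides.
    return text.lower().split()


def ngram_counts(sentences, n):
    """Count every run of n consecutive tokens within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus, for illustration only.
corpus = [
    "in the middle of the night",
    "in the middle of the road",
]

one_grams = ngram_counts(corpus, 1)
four_grams = ngram_counts(corpus, 4)

# List the 4-grams from most to least frequent.
for gram, freq in four_grams.most_common():
    print(" ".join(gram), freq)
```

Each key is a tuple of n words and each value is its frequency, which mirrors the w1..w4 and Freq columns in the tables above.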

Methodology

The prediction model uses only the last 3 words of the input text as predictors. If these 3 words exactly match the first three words of an entry in the 4-gram list, the 4th word of that entry is returned. If they do not match any entry, the leftmost word is dropped and the remaining 2 words are looked up in the 3-gram list. The same back-off step is repeated with shorter and shorter contexts. If even the last word of the input text yields no match, the top 3 entries of the 1-gram list are returned.
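
The back-off procedure described above might look like the sketch below. It is an illustration under stated assumptions rather than the app's implementation: the function name `predict_next`, the layout of `tables` (a dict from n to a dict of n-gram tuples and frequencies), and all toy frequencies other than the two English 4-grams shown earlier are hypothetical.

```python
def predict_next(text, tables, k=3):
    """Return up to k candidate next words, most frequent first.

    tables maps n (1..4) to a dict of {n-gram tuple: frequency}.
    """
    tokens = text.lower().split()
    # Try the longest context first, then back off to shorter ones.
    for n in range(max(tables), 1, -1):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough input words for this n-gram order
        matches = [
            (gram[-1], freq)
            for gram, freq in tables.get(n, {}).items()
            if gram[:-1] == context
        ]
        if matches:
            matches.sort(key=lambda m: -m[1])
            return [word for word, _ in matches[:k]]
    # Nothing matched: fall back to the most frequent single words.
    top = sorted(tables[1].items(), key=lambda kv: -kv[1])
    return [gram[0] for gram, _ in top[:k]]


# Hypothetical toy tables, for illustration only.
tables = {
    4: {("in", "the", "middle", "of"): 6, ("the", "end", "of", "the"): 6},
    3: {("middle", "of", "the"): 4, ("middle", "of", "nowhere"): 2},
    2: {("of", "the"): 10, ("of", "a"): 3},
    1: {("the",): 50, ("of",): 30, ("and",): 20, ("a",): 10},
}

print(predict_next("sitting in the middle", tables))
print(predict_next("a piece of", tables))
print(predict_next("hello xyz", tables))
```

The first call matches a 4-gram directly, the second backs off to the 2-gram list, and the third finds no match at any level and so returns the top unigram entries. Scanning every entry per lookup keeps the sketch short; a real app would more likely index each table by its context.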

Example

[Image: example]