Treatment of swear words
- I believe that what matters is not swear words themselves but how words are used. Therefore, I set up the model to show a warning whenever one of the seven dirty words is used.
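The warning behavior could be sketched as follows. This is only an illustration, not the app's actual code: the function name `profanity_warning`, the message text, and the placeholder entries in `FLAGGED_WORDS` are all assumptions, since the word list itself is not spelled out here.

```python
import re

# Hypothetical placeholders; the actual seven words are not listed in this document.
FLAGGED_WORDS = {"badword1", "badword2"}

def profanity_warning(text, flagged=FLAGGED_WORDS):
    """Return a warning string if any flagged word appears in text, else None."""
    # Lowercase and split into simple word tokens so matching ignores case.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    hits = sorted(set(tokens) & set(flagged))
    if hits:
        return "Warning: input contains flagged word(s): " + ", ".join(hits)
    return None
```

The input is not blocked or altered in this sketch; it is only flagged, which matches the idea that usage matters more than the words themselves.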
Methodology to improve the prediction accuracy
- When 3 or more words are given, only the last 3 words are used to predict the next word (because the model considers 4-grams at most). If that 3-word sequence does not match the start of any 4-gram in the database, the last 2 words are matched against the 3-grams in the database. If that also fails, the same process is repeated with the last word and the 2-grams.
- If no match is found at any level, the model returns a word chosen at random from the most frequently used words in the database.
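The fallback order described above can be sketched as a simple backoff lookup. This is a minimal sketch under assumptions, not the actual implementation: `predict_next`, `ngram_tables`, and `top_words` are hypothetical names, and the database is assumed to be a dictionary that maps each context length to a table of context tuples and their predicted next word.

```python
import random

def predict_next(words, ngram_tables, top_words, rng=random):
    """Predict the next word by backing off from 4-grams to 2-grams.

    ngram_tables (assumed layout): {context_length: {context_tuple: next_word}},
    where context length 3 corresponds to 4-grams, 2 to 3-grams, 1 to 2-grams.
    top_words (assumed): list of the most frequently used words.
    """
    # Try the longest context first, then progressively shorter ones.
    for n in (3, 2, 1):
        if len(words) < n:
            continue
        context = tuple(words[-n:])
        table = ngram_tables.get(n, {})
        if context in table:
            return table[context]
    # No n-gram matched: fall back to a random frequent word.
    return rng.choice(top_words)
```

For example, with `tables = {2: {("thank", "you"): "very"}}`, the call `predict_next(["i", "want", "to", "thank", "you"], tables, ["the"])` finds no 4-gram match for the last 3 words, then matches the last 2 words against the 3-gram table.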