Language Model
Bruno Tavares
July 10, 2020
Main Points
- A language model predicts a word based on the words that precede it
- In our case, only the 2 previous words are used
- Our algorithm is based on the n-gram model
- To get faster results, we limited our database to just 10 MB
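As a rough sketch of what "predicting from the 2 previous words" means, a 3-gram model approximates the probability of a word given its full history by conditioning only on the two words before it. The count-based estimate below is the standard maximum-likelihood form, shown here as an assumption; the slides do not state how candidate words are scored. The symbols $w_i$ (the i-th word) and $C(\cdot)$ (the number of times a sequence occurs in the training text) are introduced for illustration.

$$
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-2}, w_{i-1}) = \frac{C(w_{i-2}\, w_{i-1}\, w_i)}{C(w_{i-2}\, w_{i-1})}
$$

Under this estimate, the predicted word is the $w_i$ with the highest count among the 3-grams that start with the two given words.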
Some Details
- We implemented the n-gram model with the tidytext package
- Only the Twitter dataset was used, for performance reasons
- A large data frame was generated with the 3-gram sequences
- The app simply filters this data frame by the 1st and 2nd words
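The steps above (build a table of 3-gram sequences, then filter on the 1st and 2nd words) can be sketched as follows. This is a minimal, language-agnostic illustration in plain Python, not the tidytext code behind the app. The function names, the whitespace tokenization, the sample sentences, and the ranking of matches by frequency are all assumptions made for the example.

```python
from collections import Counter


def build_trigram_table(texts):
    """Count every 3-word sequence in the given texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        # Assumption: lowercase and split on whitespace; the real app may clean text differently.
        words = text.lower().split()
        for i in range(len(words) - 2):
            counts[(words[i], words[i + 1], words[i + 2])] += 1
    return counts


def predict_next(table, first, second, k=3):
    """Filter the table on the 1st and 2nd words, then rank the 3rd words.

    Assumption: candidates are ranked by count, with ties broken alphabetically.
    """
    matches = [
        (w3, n)
        for (w1, w2, w3), n in table.items()
        if w1 == first and w2 == second
    ]
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [w3 for w3, _ in matches[:k]]


# Made-up sample text standing in for the Twitter dataset.
sample = [
    "thanks for the follow",
    "thanks for the love",
    "thanks for the follow back",
    "happy for you",
]
table = build_trigram_table(sample)
print(predict_next(table, "for", "the"))  # ['follow', 'love']
```

If no 3-gram starts with the two given words, the filter returns an empty list, which is one reason to consider combining models of different orders.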
Improvements
- We know that the prediction algorithm can be improved a lot
- We could simply enlarge the dataset
- We could combine the 3-gram model with n-gram models of other orders
- The text preprocessing should be more robust
- etc
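One possible way to combine n-gram models of different orders is a simple back-off: try the 3-gram table first, and fall back to 2-grams and then single-word counts when nothing matches. The sketch below is only an illustration of that idea in plain Python. It is not something the current app does, and the function names, tokenization, and sample sentences are assumptions.

```python
from collections import Counter


def count_ngrams(texts, n):
    """Count every n-word sequence in the given texts (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def predict_with_backoff(tables, context):
    """Return the most frequent next word, trying the highest order first.

    `tables` maps an n-gram order (1, 2, 3, ...) to its Counter of n-grams.
    Returns None when no table yields a candidate.
    """
    words = [w.lower() for w in context]
    for n in sorted(tables, reverse=True):
        # An order-n model conditions on the last n - 1 words of the context.
        prefix = tuple(words[-(n - 1):]) if n > 1 else ()
        if len(prefix) != n - 1:
            continue  # context too short for this order
        candidates = {
            gram[-1]: c for gram, c in tables[n].items() if gram[:-1] == prefix
        }
        if candidates:
            # Highest count wins; ties are broken alphabetically.
            return min(candidates, key=lambda w: (-candidates[w], w))
    return None


# Made-up sample text standing in for a larger corpus.
sample = [
    "thanks for the follow",
    "thanks for the love",
    "thanks for the follow back",
    "happy for you",
]
tables = {n: count_ngrams(sample, n) for n in (1, 2, 3)}
print(predict_with_backoff(tables, ["for", "the"]))   # follow (3-gram match)
print(predict_with_backoff(tables, ["love", "the"]))  # follow (falls back to 2-grams)
```

A plain back-off like this always returns some word once single-word counts are included, at the cost of weaker predictions for unseen contexts. Smoothed schemes weigh the orders against each other instead of switching between them.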