We can predict the next word, with a reasonable degree of accuracy, using only a computer and a lot of text from online sources.
Using an approach utilizing
The model is built from 500,000 lines of combined blog, Twitter and news text. Mixing these sources keeps it reasonably applicable across different settings.
The app in use
The model uses a mix of 1-, 2-, 3-, 4-, 5- and 6-grams. The shortest n-grams are used for short pieces of text, while for longer text n-grams of up to six words are used so that the prediction reflects more of the surrounding context.
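The mix of n-gram lengths can be illustrated with a small sketch. This is not the app's actual code: it is written in Python for illustration, it assumes a simple "longest matching context first" back-off rule (the text above does not say exactly how the n-grams are combined), and the names `build_model` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

MAX_N = 6  # longest n-gram considered, matching the 1- to 6-gram mix


def build_model(lines, max_n=MAX_N):
    """Count which word follows each context of 0 to max_n - 1 words."""
    model = defaultdict(Counter)
    for line in lines:
        words = line.lower().split()
        for i, word in enumerate(words):
            # A context of k preceding words plus the word itself is a (k+1)-gram.
            for k in range(max_n):
                if i - k < 0:
                    break
                context = tuple(words[i - k:i])
                model[context][word] += 1
    return model


def predict_next(model, text, max_n=MAX_N):
    """Return the most frequent follower of the longest context seen in training.

    Assumed back-off rule: try the last max_n - 1 words first, then drop the
    earliest word and retry, down to the empty context (plain word frequency).
    """
    words = text.lower().split()
    for k in range(min(len(words), max_n - 1), -1, -1):
        context = tuple(words[len(words) - k:])
        followers = model.get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # only reached when the model is empty
```

With a toy corpus such as `["the cat sat on the mat", "the cat sat on the sofa", "the cat sat on the mat"]`, calling `predict_next(model, "the cat sat on the")` matches the full five-word context and so uses 6-gram counts, while a one-word input can only use 2-gram or 1-gram counts.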
We built the model into an app and published it as a Shiny app.
You can try it out at:
https://rasmusklitgaard.shinyapps.io/coursera_data_science_capstone/