Data Science Coursera Capstone

Rasmus Klitgaard

Word prediction based on online blogs, news and Twitter

We can predict the next word, with a reasonable degree of accuracy, using only a computer and a lot of text from online sources.

Our approach applies n-gram based text profiling to three kinds of text:

  • Blog posts
  • Twitter posts
  • News articles
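N-gram based text profiling starts by counting how often each sequence of 1 to n words occurs in the corpus. The sketch below shows that counting step in Python. It is an illustration, not the author's code: the function names `tokenize` and `count_ngrams` are hypothetical, and the tokenizer (lowercase, keep only letters and apostrophes) is a simplifying assumption.

```python
from collections import Counter
import re

def tokenize(line):
    # Assumption: lowercase and keep runs of letters/apostrophes as words
    return re.findall(r"[a-z']+", line.lower())

def count_ngrams(lines, max_n=6):
    """Return a dict mapping n -> Counter of n-gram tuples, for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            # Slide a window of n words over the line
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts

sample = [
    "I love reading blog posts",
    "I love reading the news",
]
counts = count_ngrams(sample, max_n=3)
print(counts[2][("i", "love")])                # 2
print(counts[3][("love", "reading", "the")])   # 1
```

On the full corpus the same counts would be built from all sampled lines; in practice rare n-grams are often pruned to keep the tables small enough for a web app.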

The model

The model is based on 500,000 lines of combined blog, Twitter and news text. Mixing the three sources helps the model work reasonably well across different settings.

The app in use

About the model

The model uses a mix of 1-, 2-, 3-, 4-, 5- and 6-grams. For short input only the shortest n-grams can be matched, while longer input allows n-grams of up to 6 words, which capture more of the context in which the next word appears.
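One common way to combine n-grams of different lengths is backoff: look up the longest context the input allows, and fall back to shorter contexts when that context was never seen. The Python sketch below assumes a simple longest-match backoff that returns the most frequent continuation; the source does not say which backoff or smoothing scheme the app uses, so this is not necessarily the author's method. The names `build_tables` and `predict_next` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict
import re

def tokenize(text):
    # Assumption: lowercase and keep runs of letters/apostrophes as words
    return re.findall(r"[a-z']+", text.lower())

def build_tables(lines, max_n=6):
    """Map each context (tuple of 0 to max_n - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens)):
            # Every context length that fits before position i, up to max_n - 1 words
            for k in range(0, min(i, max_n - 1) + 1):
                context = tuple(tokens[i - k:i])
                tables[context][tokens[i]] += 1
    return tables

def predict_next(text, tables, max_n=6):
    """Longest-match backoff: try the longest context first, then shorter ones."""
    tokens = tokenize(text)
    for k in range(min(len(tokens), max_n - 1), -1, -1):
        context = tuple(tokens[len(tokens) - k:])  # k == 0 gives (), the unigram table
        if context in tables:
            return tables[context].most_common(1)[0][0]
    return None  # only reached if the corpus was empty

lines = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the support",
    "see you at the game",
]
tables = build_tables(lines)
print(predict_next("thanks for the", tables))  # follow (4-gram match)
print(predict_next("I enjoyed the", tables))   # follow (backs off to "the")
print(predict_next("zebra", tables))           # the (backs off to unigrams)
```

Longer contexts are more specific, so they are tried first; shorter ones guarantee that some prediction is always available. Schemes such as Katz backoff or Stupid Backoff refine this by weighting the lower-order estimates instead of simply taking the first match.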

Test it out yourself!

We built the word predictor into a Shiny app and published it on shinyapps.io.

You can try it out at
https://rasmusklitgaard.shinyapps.io/coursera_data_science_capstone/