Final project (Data Science capstone)

Vipavee Trivittayasil
2016.7.17

About

This is a slide deck describing the app developed as part of the final project for the Data Science Capstone, which is the final course of the Data Science Specialization by Johns Hopkins University.

For more details on the course, please visit https://www.coursera.org/learn/data-science-project/.

Purpose of the app

This app was developed to automatically predict the next word you are going to type.

Over two million lines of Twitter text were used to build the prediction model. Tokens were extracted from this data, and only the most frequent ones, which together cover 50% of all token occurrences, were kept.
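The frequency cutoff described above can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function name `tokens_covering` and the toy corpus are assumptions made for the example, and "coverage" is read here as the share of all token occurrences.

```python
from collections import Counter


def tokens_covering(counts, coverage):
    """Return the most frequent tokens whose cumulative count
    reaches the given share (0 to 1) of all token occurrences."""
    total = sum(counts.values())
    kept, running = [], 0
    for token, count in counts.most_common():
        if running >= coverage * total:
            break
        kept.append(token)
        running += count
    return kept


# Toy corpus standing in for the Twitter lines (illustrative data only).
lines = ["i love this", "i love that", "i like this song"]
counts = Counter(word for line in lines for word in line.split())

# Keep the tokens that together account for 50% of all occurrences.
vocabulary = tokens_covering(counts, 0.5)
```

With a real corpus, a small number of very frequent tokens usually reaches the 50% mark, which is what keeps the model compact.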

Algorithms

A Markov model is a stochastic model used to describe randomly changing systems. It assumes that future states depend only on the current state, not on the events that occurred before it.
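Applied to text, the states are words. Writing $w_t$ for the $t$-th word (notation introduced here for illustration), the assumption can be stated as an approximation of the full conditional probability. The first line corresponds to a bigram model, and the second line relaxes the assumption to the two previous words, as in a trigram model:

```latex
\begin{align}
P(w_t \mid w_1, \dots, w_{t-1}) &\approx P(w_t \mid w_{t-1}) \\
P(w_t \mid w_1, \dots, w_{t-1}) &\approx P(w_t \mid w_{t-2}, w_{t-1})
\end{align}
```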

From all tokens, bigram and trigram models were built to predict the next word. To reduce computation cost, only the most frequent n-grams were kept: the bigrams covering 50% of all bigram occurrences and the trigrams covering 30% of all trigram occurrences. The prediction logic is as follows. If there is no input, the five most common starting words are shown. If one word is given, the next word is predicted from the bigram model. If two or more words are given, the app searches the trigram model first and, if no next word is found there, falls back to the bigram model.
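The lookup order described above can be sketched in Python. This is an illustration under stated assumptions rather than the app's implementation: `build_model` and `predict` are hypothetical names, the corpus is toy data, the coverage pruning is left out for brevity, and returning an empty list when neither model has a match is a choice made for this example.

```python
from collections import Counter, defaultdict


def build_model(lines):
    """Count starting words, bigrams and trigrams from raw text lines."""
    starts = Counter()
    bigrams = defaultdict(Counter)   # previous word -> next-word counts
    trigrams = defaultdict(Counter)  # (word1, word2) -> next-word counts
    for line in lines:
        words = line.lower().split()
        if not words:
            continue
        starts[words[0]] += 1
        for prev, nxt in zip(words, words[1:]):
            bigrams[prev][nxt] += 1
        for w1, w2, nxt in zip(words, words[1:], words[2:]):
            trigrams[(w1, w2)][nxt] += 1
    return starts, bigrams, trigrams


def predict(text, model, k=5):
    """Return up to k candidate next words, trying the trigram model
    first and backing off to the bigram model."""
    starts, bigrams, trigrams = model
    words = text.lower().split()
    if not words:
        # No input: suggest the most common starting words.
        return [w for w, _ in starts.most_common(k)]
    if len(words) >= 2:
        candidates = trigrams.get(tuple(words[-2:]))
        if candidates:
            return [w for w, _ in candidates.most_common(k)]
    # One word given, or no trigram match: use the bigram model.
    candidates = bigrams.get(words[-1])
    if candidates:
        return [w for w, _ in candidates.most_common(k)]
    return []  # assumption: nothing is suggested when both lookups fail


# Toy corpus (illustrative data only).
model = build_model(["i love this song", "i love that movie", "you love this"])
suggestions = predict("we love", model)
```

Here "we love" never occurs as a trigram prefix in the toy corpus, so the call falls back to the bigram counts for "love".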

Try it out!

Feel free to try the app out here https://chengvt.shinyapps.io/predicting_the_next_word/.

Five candidates for the next word will appear!