Leandro Jimenez
This presentation gives a comprehensive explanation of an application that predicts the next word.
The application is the capstone project for the Coursera Data Science specialization.
The main objective of this project is to build an application that predicts the next word after the user types a phrase and shows the result.
The project was spread over seven weeks and covered data cleaning, exploratory analysis and, above all, the creation of a predictive model, putting into practice the knowledge that I have acquired during this amazing specialization.
All the text data used to create the frequency dictionaries, and therefore the predictions, comes from a corpus called HC Corpora. The data was processed using well-known R packages.
After creating the data sets from the three HC Corpora sources, the data was cleaned, eliminating:
The cleaned data sample was then tokenized into n-grams, that is, sequences of n consecutive words.
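As a rough illustration of n-gram tokenization, here is a minimal Python sketch. It is not the code used in the project, which relied on R packages. The function names `tokenize` and `ngrams` are hypothetical, and the lowercase-and-split tokenizer is an assumption that stands in for the actual cleaning rules.

```python
from typing import List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it on whitespace (an assumed, simplified tokenizer)."""
    return text.lower().split()


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    """Return every run of n consecutive tokens, in order of appearance."""
    # A window of width n slides over the tokens; texts shorter than n yield nothing.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


tokens = tokenize("Thanks for the follow")
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

For the sample phrase, `bigrams` holds `("thanks", "for")`, `("for", "the")` and `("the", "follow")`, and `trigrams` holds two three-word tuples.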
For each data set, the resulting frequency matrices were converted into frequency dictionaries of bigrams, trigrams and quadgrams.
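A frequency dictionary of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project itself used R, the helper name `ngram_frequencies` is hypothetical, and the three sample sentences are invented for the example rather than taken from HC Corpora.

```python
from collections import Counter
from typing import Dict, Iterable


def ngram_frequencies(sentences: Iterable[str], n: int) -> Counter:
    """Count how often each n-gram occurs across the given sentences."""
    counts: Counter = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # simplified tokenizer (assumption)
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


# Invented sample data, for illustration only.
sample = ["i love data science", "i love r", "we love data"]

# One dictionary per n-gram size: bigrams, trigrams and quadgrams.
dictionaries: Dict[int, Counter] = {n: ngram_frequencies(sample, n) for n in (2, 3, 4)}
```

Here `dictionaries[2][("i", "love")]` is 2, because the bigram appears in two of the sample sentences, while the only quadgram is `("i", "love", "data", "science")`.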
The prediction model uses the n-gram data sets to make its predictions. A backoff model first compares the input against the first three words of each quadgram and proposes the quadgram's last word as the prediction. If there is no match, it backs off to two words (trigrams) and lastly to one word (bigrams). The frequency column is used to sort the candidates, and the highest frequency is taken as the best prediction.
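The backoff idea can be sketched as follows. This is a simplified Python illustration, not the R implementation behind the app: it picks the single most frequent continuation at the longest matching context and applies no smoothing or discounting. The names `build_tables` and `predict_next` are hypothetical, and the sample sentences are invented.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional


def build_tables(sentences: Iterable[str], max_n: int = 4) -> Dict[int, defaultdict]:
    """For each context length k (1..max_n-1), map a k-word context to next-word counts."""
    tables = {k: defaultdict(Counter) for k in range(1, max_n)}
    for sentence in sentences:
        tokens = sentence.lower().split()  # simplified tokenizer (assumption)
        for k in tables:
            for i in range(len(tokens) - k):
                context = tuple(tokens[i:i + k])
                tables[k][context][tokens[i + k]] += 1
    return tables


def predict_next(phrase: str, tables: Dict[int, defaultdict]) -> Optional[str]:
    """Try the longest context first, then back off to shorter ones."""
    tokens = phrase.lower().split()
    for k in sorted(tables, reverse=True):
        if len(tokens) < k:
            continue  # the phrase is too short for this context length
        candidates = tables[k].get(tuple(tokens[-k:]))
        if candidates:
            # The highest frequency wins, mirroring the sort on the frequency column.
            return candidates.most_common(1)[0][0]
    return None  # no n-gram matched at any level


# Invented sample data, for illustration only.
sample = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
    "see you at the office",
]
tables = build_tables(sample)
```

With this sample, `predict_next("many thanks for the", tables)` returns `"follow"` from the three-word context, `predict_next("look at the", tables)` backs off to the two-word context and returns `"office"`, and an unknown word such as `"zzz"` yields `None`.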
After opening the app: https://jleandroj1.shinyapps.io/capstonedatascience/
enter a phrase
wait a moment
see the predicted word
see the complete phrase that you typed
This app only works with English text.
To contact: jleandroj@gmail.com
A short message
To my teachers: thanks for everything, you changed my life.
To everybody: you should take this specialization, it is really amazing.