Leandro Jimenez
This presentation explains an application that predicts the next word.
The application is the capstone project for the Coursera Data Science specialization.
The main objective of this project is to build an application that predicts the next word after the user types a phrase and shows the result.
The project was divided into seven weeks, covering data cleaning, exploratory analysis and, mainly, the creation of a predictive model, putting into practice the knowledge that I acquired during this amazing specialization.
All the text data used to create the frequency dictionaries, and therefore the predictions, comes from a corpus called HC Corpora and was processed using well-known R packages.
After creating the data sets from the three sources in HC Corpora, the data was cleaned, eliminating:
This data sample was then tokenized into the so-called n-grams.
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech.
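The n-gram definition above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R code behind the app; the helper name `ngrams` and the sample phrase are assumptions, and splitting on whitespace stands in for a real tokenizer.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample phrase, split on whitespace for simplicity.
tokens = "thanks for the follow".split()

bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words
```

A phrase with fewer than n words simply yields an empty list, since no contiguous run of that length exists.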
For each data set, the n-gram frequency matrices were converted into frequency dictionaries of bigrams, trigrams, and quadgrams.
Finally, those dictionaries were used to predict the next word.
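One possible way to go from frequency dictionaries to a prediction is sketched below. This is a Python illustration, not the app's R implementation: the function names `build_dictionary` and `predict_next`, the toy corpus, and the strategy of trying the quadgram dictionary first and backing off to trigrams and bigrams are all assumptions, since the slides do not say how the dictionaries are combined.

```python
from collections import Counter, defaultdict


def build_dictionary(sentences, n):
    """Map each (n-1)-word prefix to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - n + 1):
            prefix = tuple(tokens[i:i + n - 1])
            table[prefix][tokens[i + n - 1]] += 1
    return table


def predict_next(phrase, dictionaries):
    """Try the longest n-gram dictionary first, then back off to shorter ones."""
    tokens = phrase.lower().split()
    for n in sorted(dictionaries, reverse=True):
        if len(tokens) < n - 1:
            continue  # phrase too short to form a prefix of this order
        prefix = tuple(tokens[-(n - 1):])
        followers = dictionaries[n].get(prefix)
        if followers:
            # Most frequent word seen after this prefix.
            return followers.most_common(1)[0][0]
    return None  # no dictionary contains a matching prefix


# Hypothetical toy corpus standing in for the cleaned HC Corpora sample.
toy_corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
    "see you at the party",
]
dictionaries = {n: build_dictionary(toy_corpus, n) for n in (2, 3, 4)}

print(predict_next("Thanks for the", dictionaries))
print(predict_next("meet me at the", dictionaries))
```

In this sketch the second phrase has no matching quadgram prefix, so the lookup falls back to the trigram dictionary. Returning `None` when nothing matches is a design choice of the sketch; an application could instead fall back to the most frequent single word.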
After opening the app: https://jleandroj1.shinyapps.io/capstonedatascience/
enter a phrase
wait a moment
see the word
see the complete phrase that you typed
This app only works with English text.
To contact: jleandroj@gmail.com
A short message
To my teachers: thanks for everything, you changed my life.
To everybody: you need to take this specialization, it is really amazing.