Charles-Antoine de Thibault
19 March 2018
The application is the Capstone Project for the Data Science Specialisation from Johns Hopkins University on Coursera. https://www.coursera.org/specializations/jhu-data-science
The objective was to process a raw dataset of blogs, news and Twitter text in order to build a model that predicts the next word you are about to write, given the last words written.
The whole process can be found on GitHub at https://github.com/charlesdethibault/DataScienceCaptone.
The work is divided into 5 different steps.
To build a model light enough to run on shinyapps.io with acceptable accuracy, I decided to use a back-off model, which compares the sequence of words typed against the same sequence of words in the initial database.
If the sequence does not exist, it removes the first word of the sequence and compares the new, shorter sequence to the database.
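The back-off lookup described above can be sketched roughly as follows. This is an illustration in Python, not the code behind the app: the function names `build_ngram_table` and `predict_next`, the three-word limit and the whitespace tokenisation are all assumptions made for the example.

```python
from collections import Counter, defaultdict


def build_ngram_table(sentences, max_n=3):
    """Count, for every context of 1..max_n-1 words, the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()  # naive tokenisation (assumption)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                table[context][words[i + n - 1]] += 1
    return table


def predict_next(table, text, max_n=3):
    """Return (predicted word, context used), backing off to shorter contexts."""
    words = text.lower().split()
    context = tuple(words[-(max_n - 1):])  # keep only the last max_n-1 words
    while context:
        if context in table:
            word, _count = table[context].most_common(1)[0]
            return word, context
        context = context[1:]  # back off: drop the first word and retry
    return None, ()  # nothing matched, even for a single word
```

With a toy corpus such as `["the cat sat on the mat", "the cat ate the fish"]`, the input "my dog the" has no match for the two-word context ("dog", "the"), so the lookup backs off to ("the",) and returns "cat" together with that one-word context.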
The app is available at https://charlesdethibault.shinyapps.io/SwiftFinal/.
If you enter your word or sentence on the left, the predicted next word will appear on the right.
The n-gram used to predict the word will appear below the prediction.