Guillem Mitjà
09/05/2021
This presentation was created as part of the requirements for the Coursera Data Science Capstone course.
The goal of the project is to build a predictive text model, combined with a Shiny app UI, that predicts the next word as the user types a sentence, similar to the way most smartphone keyboards work today, such as those built on SwiftKey technology.
Shiny App: https://guillemmitja.shinyapps.io/Capstone/
Before building the word prediction algorithm, the data are first processed and cleaned in the following steps:
A subset of the original data was sampled from each of the three sources (blogs, Twitter and news), and the samples were then merged into one corpus.
Next, the data are cleaned by converting text to lowercase, stripping white space, and removing punctuation and numbers.
The corresponding n-grams are then created (quadgrams, trigrams and bigrams).
Next, term-count tables are extracted from the n-grams and sorted by frequency in descending order.
Lastly, the n-gram objects are saved as compressed R data files (.RData).
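The preprocessing steps above can be sketched in Python as follows. This is a minimal illustration, not the author's R code: the sampling fraction, the random seed, the ASCII-only cleaning regex, and the helper names (sample_lines, clean, ngram_counts, build_tables) are all hypothetical assumptions.

```python
import random
import re
from collections import Counter


def sample_lines(lines, fraction, seed=42):
    """Keep a random subset of lines (hypothetical sampling rate and seed)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def clean(text):
    """Lowercase, remove numbers and punctuation, and collapse white space.

    Assumption: only ASCII letters a-z are kept, so non-ASCII characters
    are dropped along with digits and punctuation.
    """
    text = text.lower()
    text = re.sub(r"[^a-z\s]", "", text)
    return " ".join(text.split())


def ngram_counts(sentences, n):
    """Count n-grams and return (ngram, count) pairs sorted by count, descending."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common()


def build_tables(raw_sources, fraction=0.1, seed=42):
    """Sample each source, merge, clean, and build bigram/trigram/quadgram tables."""
    merged = []
    for lines in raw_sources.values():
        merged.extend(sample_lines(lines, fraction, seed))
    cleaned = [clean(line) for line in merged]
    return {n: ngram_counts(cleaned, n) for n in (2, 3, 4)}


if __name__ == "__main__":
    sources = {
        "blogs": ["Hello world again"],
        "news": ["Hello, World! 2021 was a good year."],
    }
    tables = build_tables(sources, fraction=1.0)
    print(tables[2][0])  # most frequent bigram
```

With fraction=1.0 every line is kept, and the most frequent bigram in this toy corpus is ("hello world", 2). In the original R workflow these tables are stored as .RData files; in Python, the pickle module would play a similar role.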
The next-word prediction flow is explained below:
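The prediction flow itself is not reproduced in this text version of the slides. As an illustration only, one common way to use frequency-sorted quadgram, trigram and bigram tables is a simple back-off lookup: match the last three typed words against the quadgram table, and fall back to shorter n-grams when nothing matches. This is an assumption, not necessarily the author's algorithm; the function predict_next, the table layout (n mapped to a list of (ngram, count) pairs sorted by count), and the parameter k are hypothetical.

```python
import re


def normalize(text):
    """Lowercase and keep only a-z words (assumed to match the cleaning step)."""
    return re.sub(r"[^a-z\s]", "", text.lower()).split()


def predict_next(phrase, tables, k=3):
    """Hypothetical simple back-off: quadgrams, then trigrams, then bigrams.

    tables maps n -> list of (ngram, count) pairs sorted by count, descending,
    so the first matches found are also the most frequent ones.
    """
    words = normalize(phrase)
    for n in (4, 3, 2):
        if len(words) < n - 1:
            continue  # not enough context for this n-gram order
        prefix = " ".join(words[-(n - 1):]) + " "
        hits = [gram[len(prefix):] for gram, _ in tables[n] if gram.startswith(prefix)]
        if hits:
            return hits[:k]
    return []  # no match at any order


if __name__ == "__main__":
    toy_tables = {
        4: [("i want to go", 3)],
        3: [("want to be", 5), ("want to go", 2)],
        2: [("to the", 9), ("to be", 4)],
    }
    print(predict_next("You want to", toy_tables))
```

For "You want to" no quadgram starts with "you want to", so the lookup backs off to the trigram table and returns ["be", "go"]. A real app would typically also add a fallback such as the most frequent unigrams when every table misses.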
A Shiny application was developed based on the next-word prediction model described above, as shown below.