- Clean the data corpus and develop a predictive model
- Develop a Shiny app for a word prediction model
- Launch the Shiny app
6 April 2021
The data was cleaned and a sample file was created. In the data cleaning process, numbers, punctuation, extra whitespace, and stopwords were removed, and the text was converted to lowercase.
Using a tokenization function, further analysis and some basic exploratory data analysis (EDA) were performed.
For modeling, we created unigrams, bigrams, trigrams, and quadgrams.
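The cleaning and tokenization steps described above could look roughly like the following. This is a minimal sketch in Python for illustration only, not the code used in the app; the function name `clean_and_tokenize` and the tiny `STOPWORDS` set are hypothetical, since the source does not give the actual stopword list.

```python
import re

# Hypothetical, deliberately small stopword list (the real list is not given).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def clean_and_tokenize(text, stopwords=STOPWORDS):
    """Lowercase the text, strip non-letters, and drop stopwords."""
    text = text.lower()
    # Replace digits, punctuation, and other symbols with spaces.
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() with no arguments also collapses repeated whitespace.
    tokens = text.split()
    return [t for t in tokens if t not in stopwords]

print(clean_and_tokenize("The 3 cats, and 2 dogs!  Ran   home."))
```

Replacing symbols with spaces rather than deleting them keeps words apart when they are joined only by punctuation.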
The predictive model has been developed using n-gram frequency matrices.
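As an illustration of what an n-gram frequency table contains, the sketch below counts n-grams over a token list. It is written in Python and is an assumption about the general technique, not the app's implementation; `ngram_counts` is a hypothetical helper name.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(
        tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )

tokens = "i like green tea i like green apples".split()
bigrams = ngram_counts(tokens, 2)
print(bigrams[("i", "like")])  # frequency of the bigram "i like"
```

Calling the same function with `n` set to 1, 2, 3, and 4 gives the unigram, bigram, trigram, and quadgram tables.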
URL for my Shiny application: https://gabyom.shinyapps.io/Word_predict/
The Shiny app is built for next-word prediction.
The main goal is to predict the best next word for a sentence. The predicted word is shown in the app's display.
Quadgrams have the highest priority when looking for a match. If no match is found, the model proceeds to trigrams, then bigrams, and finally unigrams.
Predicting the next term of the user's input sentence:
1. The quadgrams are used first (the first three words of the quadgram are the last three words of the user-provided sentence).
2. If no quadgram is found, back off to trigrams (the first two words of the trigram are the last two words of the sentence).
3. If no trigram is found, back off to bigrams (the first word of the bigram is the last word of the sentence).
4. If no bigram is found, back off to the most frequent word, and 'the' is returned.
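The back-off steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's code: `build_tables` and `predict_next` are hypothetical names, the toy corpus is made up, and the final fallback returns the most frequent unigram in that corpus (which here happens to be 'the').

```python
from collections import Counter

def build_tables(tokens, max_n=4):
    """Map n -> Counter of n-gram tuples, for n = 1..max_n."""
    return {
        n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        for n in range(1, max_n + 1)
    }

def predict_next(sentence, tables, max_n=4):
    """Try quadgrams, then trigrams, then bigrams, then the top unigram."""
    words = sentence.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # sentence too short to form this context
        context = tuple(words[-(n - 1):])
        # Collect every word that followed this context, with its count.
        candidates = {
            gram[-1]: count
            for gram, count in tables[n].items()
            if gram[:-1] == context
        }
        if candidates:
            return max(candidates, key=candidates.get)
    # No n-gram matched: fall back to the most frequent single word.
    return tables[1].most_common(1)[0][0][0]

corpus = "the cat sat on the mat the cat sat on the sofa the dog ran".split()
tables = build_tables(corpus)
print(predict_next("the cat sat", tables))  # matched by a quadgram
print(predict_next("a dog", tables))        # backs off to a bigram
print(predict_next("zebra", tables))        # backs off to the top unigram
```

The sketch assumes a non-empty corpus and breaks ties between equally frequent candidates arbitrarily.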
Thank you.