Keith Wheeles
April 4, 2016
March 7, 2016: An assignment to develop a word prediction app was provided along with three data files. Together these files contain over 70 million words, which served as the training and validation data for the application. Preprocessing code written in Python built a dictionary of words and counted bigrams (two consecutive words) and trigrams (three consecutive words). These tables of words and counts were then processed further in R to assemble the final tables, optimized for use by the Shiny app. The app uses these counts to predict the next word that the user may type.
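The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the preprocessing code actually used for the app: the function names `tokenize` and `count_ngrams`, the tokenization rule, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes).
    This tokenization rule is an assumption made for the sketch."""
    return re.findall(r"[a-z']+", text.lower())

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens, keyed by a tuple of n words."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample text standing in for the three data files.
tokens = tokenize("the cat sat on the mat and the cat slept")

unigrams = count_ngrams(tokens, 1)   # single words
bigrams = count_ngrams(tokens, 2)    # two consecutive words
trigrams = count_ngrams(tokens, 3)   # three consecutive words

print(bigrams[("the", "cat")])
```

On a corpus of tens of millions of words, the same idea would likely need streaming input and pruning of rare n-grams to keep the tables small enough for an interactive app.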
Resulting app:
Back-off model using trigram information if available, stepping back to bigram and then unigram if necessary
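The back-off lookup can be illustrated with a short Python sketch. It assumes the simplest possible rule, picking the most frequent continuation at each level with no smoothing or discounting, which may differ from the scoring the app applies. The names `build_tables` and `predict_next` and the sample sentence are hypothetical.

```python
from collections import Counter, defaultdict

def build_tables(tokens):
    """Build lookup tables that map a context to a Counter of following words."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)    # previous word -> next-word counts
    trigrams = defaultdict(Counter)   # previous two words -> next-word counts
    for w1, w2 in zip(tokens, tokens[1:]):
        bigrams[w1][w2] += 1
    for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
        trigrams[(w1, w2)][w3] += 1
    return unigrams, bigrams, trigrams

def predict_next(words, unigrams, bigrams, trigrams):
    """Return the most frequent next word, backing off trigram -> bigram -> unigram."""
    words = [w.lower() for w in words]
    # Use the last two words if that context was seen in training.
    if len(words) >= 2 and tuple(words[-2:]) in trigrams:
        return trigrams[tuple(words[-2:])].most_common(1)[0][0]
    # Otherwise fall back to the last word alone.
    if len(words) >= 1 and words[-1] in bigrams:
        return bigrams[words[-1]].most_common(1)[0][0]
    # With no usable context, fall back to the most common word overall.
    return unigrams.most_common(1)[0][0]

# Hypothetical training text for the sketch.
tokens = "the cat sat on the mat and the cat sat down".split()
unigrams, bigrams, trigrams = build_tables(tokens)

print(predict_next(["the", "cat"], unigrams, bigrams, trigrams))      # trigram hit
print(predict_next(["big", "mat"], unigrams, bigrams, trigrams))      # bigram fallback
print(predict_next(["hello", "world"], unigrams, bigrams, trigrams))  # unigram fallback
```

Keying the tables by context means each prediction is a dictionary lookup, which suits an interactive app that must respond as the user types.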
App “keyboard” implemented for:
Full “proof of concept” prototype. Further refinement could be applied:
I hope you enjoyed the app and appreciate your time in reviewing it!