Data Science Capstone: Simple Word Prediction

Rafael Reséndiz Ramírez
Monday, April 27, 2015

Simple Word Prediction

1.- This app was developed by Rafael Reséndiz Ramírez.
2.- The databases and algorithms were developed in accordance with the Coursera Capstone Project.
3.- The underlying databases are large, but the app itself is kept very lightweight.
4.- The model follows natural language processing (NLP) principles and is a predictive language model.
5.- This Shiny app is based on an n-gram model with Stupid Back-off smoothing and Kneser-Ney smoothing.

Rafael Reséndiz Ramírez
Mon Apr 27 17:45:52 2015

How to use

- Enter a phrase or sentence in the top-left panel, then use the next panel to select how many predicted words you would like to see. By default, the app starts with a sample sentence, 2 predicted words, and Back-off smoothing for the n-gram model.
- Try different phrases, values of 'number of words', or smoothing methods, then press the 'SUBMIT' button.

Features

Smoothing Models

   The model predicts the words most likely to continue a sentence, ranking candidates by their probability of occurrence. In computational linguistics, an n-gram is a contiguous sequence of n items from a given sequence (text or speech). The n-grams are collected from a text or speech corpus.
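
As a rough illustration of how n-grams are collected from a corpus, here is a minimal Python sketch. It is not the app's code: the toy corpus, the whitespace tokenizer, and the `ngrams` helper are all assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus, invented for illustration only.
corpus = "i like green tea . i like black coffee . i drink green tea"
tokens = corpus.split()  # naive whitespace tokenization (an assumption)

# Frequency tables for bigrams (n = 2) and trigrams (n = 3).
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(bigram_counts[("i", "like")])            # how often "i like" occurs
print(trigram_counts[("i", "like", "green")])  # how often "i like green" occurs
```

Tables like these are the raw material for the model: the more often an n-gram appears, the more probable its last word is after the preceding words.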

Smoothing

      A good n-gram model needs reliable probability estimates, including for word sequences that never appear in the training corpus. Smoothing provides these estimates.
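
To see why smoothing is needed, consider the unsmoothed (maximum-likelihood) estimate below. The notation is an assumption of this note, not taken from the app: c(·) is a corpus count and w_a^b is the word sequence from position a to position b.

```latex
P_{\mathrm{ML}}\left(w_i \mid w_{i-n+1}^{i-1}\right)
  = \frac{c\left(w_{i-n+1}^{i}\right)}{c\left(w_{i-n+1}^{i-1}\right)}
```

This estimate is zero for every n-gram missing from the corpus and undefined when the context itself was never seen. Smoothing methods correct this by moving some probability mass to unseen events or by falling back to shorter contexts.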

Implementation: Kneser-Ney and Stupid Back-off

    The Kneser-Ney model was slow to display predicted words. Stupid Back-off speeds up the model with less code, at the cost of some prediction accuracy. With a large corpus, Stupid Back-off reaches a prediction accuracy similar to that of Kneser-Ney smoothing.
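
The following Python sketch shows why Stupid Back-off needs so little code. It is only an illustration under stated assumptions, not the app's implementation: the toy corpus, the function names (`build_counts`, `score`, `predict`), and the penalty value 0.4 (the commonly cited choice) are all hypothetical here.

```python
from collections import Counter

ALPHA = 0.4  # back-off penalty; commonly cited value, assumed for this sketch

def build_counts(tokens, max_n=3):
    """Count all n-grams of order 1..max_n, keyed by tuple."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(word, context, counts, total_tokens):
    """Stupid Back-off score of `word` following the tuple `context`.

    Scores are relative frequencies, not normalized probabilities.
    """
    penalty = 1.0
    context = tuple(context)
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            # Longest matching n-gram found: use its relative frequency.
            return penalty * seen / counts[context]
        context = context[1:]   # drop the oldest word and back off
        penalty *= ALPHA        # pay a fixed penalty per back-off step
    # No context left: fall back to the unigram frequency.
    return penalty * counts[(word,)] / total_tokens

def predict(phrase, counts, total_tokens, vocab, k=2, max_n=3):
    """Return the k highest-scoring next words (ties broken alphabetically)."""
    words = phrase.lower().split()
    context = tuple(words[max(0, len(words) - (max_n - 1)):])
    ranked = sorted(
        vocab, key=lambda w: (-score(w, context, counts, total_tokens), w)
    )
    return ranked[:k]

# Toy corpus, invented for illustration only.
tokens = "i like green tea . i like black coffee . i drink green tea".split()
counts = build_counts(tokens)
vocab = set(tokens)

print(predict("i like", counts, len(tokens), vocab))
```

Because the scores are plain count ratios with a fixed penalty, no discount parameters have to be estimated, which is where the speed advantage over Kneser-Ney comes from.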

Statistics

Here are my databases, records and variables.

References

 1. M. C. Körner, "Implementation of Modified Kneser-Ney Smoothing on Top of Generalized Language Models for Next Word Prediction", Bachelorarbeit (bachelor's thesis), September 2013.
 2. J. Schalkwyk, D. Beeferman, F. Beaufays, B. Byrne, C. Chelba, M. Cohen, M. Kamvar, and B. Strope, "'Your word is my command': Google search by voice: A case study", in Advances in Speech Recognition, Amy Neustein, Ed., pp. 61–90. Springer US, 2010.
 3. X. Lei, A. Senior, A. Gruenstein, and J. Sorensen, "Accurate and compact large vocabulary speech recognition on mobile devices", in Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech), 2013, pp. 662–665.
 4. [Coursera Discussion Board](https://class.coursera.org/dsscapstone-001/forum).