Next word algorithm

TD
03-06-2019

Programmers' RSI problems

Repetitive stress injury (RSI) is very costly to programmers and their employers

  • 1.8 million US employees suffer from RSI annually*
  • 60% of workers suffer from wrist pain during their work*
  • Imagine the total extra cost you are paying for sick leave and lost productivity of your programmers

* https://consumer.healthday.com/encyclopedia/pain-management-30/pain-health-news-520/repetitive-stress-injury-rsi-646236.html

Demo program

We made a demo program that shows how easily we could predict the next word your programmers are going to type

We used:

  • 180852 bi-grams
  • 68095 tri-grams
  • 12376 quad-grams

The n-grams were built from Twitter messages, blogs and news articles: the text was cleaned (stopwords removed, etc.) and then split into n-grams.
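The cleaning and counting step can be sketched roughly as follows. This is a minimal illustration and not the code behind the demo: the function names `clean_tokens` and `count_ngrams`, the tiny stopword list, and the letters-only normalisation are all assumptions made for the example.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Lowercase the text, treat every non-letter as a separator, drop stopwords.
// The stopword list is a tiny hypothetical stand-in for a real one.
std::vector<std::string> clean_tokens(const std::string& text) {
    static const std::set<std::string> stopwords = {"the", "a", "an", "of", "to", "is"};
    std::string normalized;
    for (unsigned char c : text) {
        normalized += std::isalpha(c) ? static_cast<char>(std::tolower(c)) : ' ';
    }
    std::istringstream stream(normalized);
    std::vector<std::string> tokens;
    std::string word;
    while (stream >> word) {
        if (stopwords.count(word) == 0) tokens.push_back(word);
    }
    return tokens;
}

// Count every run of n consecutive tokens, joined by single spaces.
std::map<std::string, int> count_ngrams(const std::vector<std::string>& tokens,
                                        std::size_t n) {
    assert(n >= 1);
    std::map<std::string, int> counts;
    if (tokens.size() < n) return counts;
    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        std::string key = tokens[i];
        for (std::size_t k = 1; k < n; ++k) key += " " + tokens[i + k];
        ++counts[key];
    }
    return counts;
}
```

Calling `count_ngrams(clean_tokens(text), 2)` gives the bi-gram counts for a piece of text; passing 3 or 4 gives tri-grams and quad-grams in the same way.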

The algorithm first checks whether the last three words typed (after cleaning) appear as the first three words of a quadgram. If not, it checks whether the last two words typed appear as the first two words of a trigram, and if that also fails, whether the last word typed appears as the first word of a bigram. At the first level that matches, it returns the n-gram with the highest occurrence in the texts among all n-grams that contain the last word(s) typed.
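A minimal sketch of this back-off lookup is shown below. It assumes the n-grams are stored as a map from the leading words (the context) to candidate next words with their counts; the `NgramTable` layout and the names `last_words` and `predict_next` are hypothetical and chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical layout: context ("w1 w2 w3", "w1 w2" or "w1") mapped to
// candidate next words and their occurrence counts.
using NgramTable = std::map<std::string, std::map<std::string, int>>;

// Join the last n words of `words` with single spaces.
std::string last_words(const std::vector<std::string>& words, std::size_t n) {
    assert(n <= words.size());
    std::string key;
    for (std::size_t i = words.size() - n; i < words.size(); ++i) {
        if (!key.empty()) key += " ";
        key += words[i];
    }
    return key;
}

// tables[0] = bigrams, tables[1] = trigrams, tables[2] = quadgrams.
// Try the longest context first and back off to shorter ones.
// Returns an empty string when no table contains the context.
std::string predict_next(const std::vector<std::string>& typed,
                         const std::vector<NgramTable>& tables) {
    for (std::size_t n = tables.size(); n >= 1; --n) {
        if (typed.size() < n) continue;  // not enough words for this level
        const NgramTable& table = tables[n - 1];
        auto hit = table.find(last_words(typed, n));
        if (hit == table.end()) continue;  // back off to a shorter context
        std::string best;
        int best_count = -1;
        for (const auto& candidate : hit->second) {
            if (candidate.second > best_count) {
                best = candidate.first;
                best_count = candidate.second;
            }
        }
        return best;
    }
    return "";
}
```

With a trigram table that maps "in new" to {york: 3, zealand: 4} and an empty quadgram table, typing "live in new" falls through the quadgram level and returns "zealand" from the trigram level.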

Extra fast embedded C++

The algorithm embeds C++ code through Rcpp's cppFunction for better performance

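library(Rcpp)

cppFunction('
    // Sum the values in v_vec into one frequency slot per n-gram.
    // j_vec holds the 1-based n-gram index that each value belongs to.
    NumericVector cpp_make_frequency_vector(int n_names, NumericVector j_vec, NumericVector v_vec){
        // NumericVector(n) is zero-initialised, so no explicit reset loop is needed.
        NumericVector ngram_frequencies(n_names);
        int id_max = j_vec.size();

        for (int id = 0; id < id_max; id++){
            int j = j_vec[id] - 1;  // convert the 1-based R index to 0-based
            ngram_frequencies[j] += v_vec[id];
        }
        return ngram_frequencies;
    }
')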
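To make the accumulation concrete, here is a plain C++ analogue with a small worked input. It assumes that `j_vec` and `v_vec` are the index and value columns of a sparse count table (one entry per n-gram occurrence record). The name `make_frequency_vector` and the bounds check are additions for this sketch and are not part of the function above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Plain C++ analogue of cpp_make_frequency_vector (hypothetical name).
// j_vec holds 1-based n-gram indices, v_vec the matching counts.
std::vector<double> make_frequency_vector(std::size_t n_names,
                                          const std::vector<int>& j_vec,
                                          const std::vector<double>& v_vec) {
    assert(j_vec.size() == v_vec.size());
    std::vector<double> ngram_frequencies(n_names, 0.0);
    for (std::size_t id = 0; id < j_vec.size(); ++id) {
        // Skip indices outside 1..n_names instead of writing out of bounds.
        if (j_vec[id] < 1 || static_cast<std::size_t>(j_vec[id]) > n_names) continue;
        ngram_frequencies[j_vec[id] - 1] += v_vec[id];
    }
    return ngram_frequencies;
}
```

For three n-grams with indices {1, 3, 1, 2} and counts {2, 1, 4, 5}, the result is {6, 5, 1}: the two entries for n-gram 1 are summed, and the others are copied through.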

Experience the sensation yourself

Go to https://t-publish.shinyapps.io/Word_predictor/ and experience the magic first hand!

Imagine how much typing we could save your programmers if we used the same algorithm to predict their next word

Type a word or a couple of words and wait for the program to predict the next word