TD
03-06-2019
Repetitive strain injury (RSI) is very costly to programmers and their employers
We made a demo program that shows how easily we can predict the next word your programmers are going to type
We used:
The n-grams were made by cleaning Twitter messages, blogs and news articles (removing stopwords, etc.) and then creating n-grams from the cleaned text.
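As a rough illustration of that cleaning and counting step, here is a minimal C++ sketch. The function names `clean_tokens` and `count_ngrams`, the letters-only cleaning rule and the stopword set passed in by the caller are all assumptions made for this example. The actual pipeline behind the demo may differ.

```cpp
#include <cctype>
#include <cstddef>
#include <map>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: lowercase the text, treat every non-letter as a
// separator, and drop any word found in the caller-supplied stopword set.
std::vector<std::string> clean_tokens(const std::string& text,
                                      const std::set<std::string>& stopwords) {
    std::string lowered;
    for (unsigned char c : text) {
        lowered += std::isalpha(c) ? static_cast<char>(std::tolower(c)) : ' ';
    }
    std::istringstream in(lowered);
    std::vector<std::string> tokens;
    std::string word;
    while (in >> word) {
        if (stopwords.count(word) == 0) tokens.push_back(word);
    }
    return tokens;
}

// Hypothetical helper: count every run of n consecutive tokens,
// keyed by the words joined with single spaces.
std::map<std::string, int> count_ngrams(const std::vector<std::string>& tokens,
                                        std::size_t n) {
    std::map<std::string, int> counts;
    if (n == 0 || tokens.size() < n) return counts;
    for (std::size_t i = 0; i + n <= tokens.size(); ++i) {
        std::string gram = tokens[i];
        for (std::size_t k = 1; k < n; ++k) gram += " " + tokens[i + k];
        ++counts[gram];
    }
    return counts;
}
```

Calling `count_ngrams(tokens, 2)`, `count_ngrams(tokens, 3)` and `count_ngrams(tokens, 4)` on the same cleaned tokens would give bigram, trigram and quadgram tables.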
The algorithm checks whether the last three words typed (after cleaning) appear as the first three words of a quadgram. If not, it checks the last two words typed against the trigrams, and failing that, the last word typed against the bigrams. As soon as there is a match, it returns the n-gram with the highest occurrence in the texts among all n-grams that start with the last word(s) typed.
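The back-off lookup described above could be sketched in C++ roughly as follows. This is an illustration under stated assumptions, not the code behind the demo: the names `NgramCounts`, `join_tail`, `best_completion` and `predict_next` are hypothetical, the tables are assumed to be ordered as bigrams, trigrams, quadgrams, and ties are broken by taking the alphabetically first n-gram.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Assumed layout: an n-gram ("w1 w2 ... wn") mapped to its occurrence count.
using NgramCounts = std::map<std::string, int>;

// Join words[from..end) with single spaces.
static std::string join_tail(const std::vector<std::string>& words,
                             std::size_t from) {
    std::string out;
    for (std::size_t i = from; i < words.size(); ++i) {
        if (!out.empty()) out += ' ';
        out += words[i];
    }
    return out;
}

// Last word of the most frequent n-gram that starts with `prefix`
// followed by a space; empty string when nothing matches.
static std::string best_completion(const NgramCounts& counts,
                                   const std::string& prefix) {
    const std::string key = prefix + " ";
    std::string best;
    int best_count = 0;
    // std::map is sorted, so all keys that start with `key` are contiguous.
    for (auto it = counts.lower_bound(key);
         it != counts.end() && it->first.compare(0, key.size(), key) == 0;
         ++it) {
        if (it->second > best_count) {
            best_count = it->second;
            best = it->first.substr(it->first.rfind(' ') + 1);
        }
    }
    return best;
}

// tables[0] = bigrams, tables[1] = trigrams, tables[2] = quadgrams (assumed).
// Try the longest context first and back off to shorter ones.
std::string predict_next(const std::vector<std::string>& typed,
                         const std::vector<NgramCounts>& tables) {
    for (std::size_t context = tables.size(); context >= 1; --context) {
        if (typed.size() < context) continue;
        const std::string prefix = join_tail(typed, typed.size() - context);
        const std::string guess = best_completion(tables[context - 1], prefix);
        if (!guess.empty()) return guess;
    }
    return "";  // no n-gram starts with the words typed
}
```

Keeping the tables in a sorted map makes the prefix search a single range scan. A hash map keyed on the prefix would be a reasonable alternative if lookup speed mattered more than simplicity.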
The algorithm uses embedded C++ (compiled from R with Rcpp's cppFunction) for speed
library(Rcpp)

cppFunction('
NumericVector cpp_make_frequency_vector(int n_names, NumericVector j_vec, NumericVector v_vec){
  // One frequency slot per n-gram; NumericVector(n) starts out filled with zeros.
  NumericVector ngram_frequencies(n_names);
  int id_max = j_vec.size();
  for (int id = 0; id < id_max; id++){
    // j_vec holds 1-based R indices; shift to 0-based for C++.
    int j = j_vec[id] - 1;
    // Add the count of this entry to the total of its n-gram.
    ngram_frequencies[j] += v_vec[id];
  }
  return ngram_frequencies;
}
')
Go to https://t-publish.shinyapps.io/Word_predictor/ and experience the magic first hand!
Imagine how much typing we could save your programmers if we used the same algorithm to predict their next word
Type a word or a couple of words and wait for the program to predict the next word