A word prediction app

Coursera capstone project

Joyce Clemente

2024-10-17 (v.1); 2024-10-17 (last update)

Objective and method

Number of unique n-grams (n_size) or collocates (c_size) used in the models.
ngram   n_size    c    c_size
ugram    398317   -5   1506385
bgram   4689189   -6   1415277
tgram   2593472   -7   1367120
qgram   1868130
# Compute weights (the user provides wh in field #2 of the app).
w1 <- ((1 - wh) / 3) * 2
w2 <- (1 - wh) / 3
w3 <- (1 - wh)
# Weight the n-gram probabilities (e.g. for 2-, 3-, and 4-grams).
pw4 <- pw4 * wh
pw3 <- pw3 * w1
pw2 <- pw2 * w2
# The n-grams and weights involved change depending on the highest matched n-gram.
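
The weighting scheme for the 2-, 3-, and 4-gram case can be checked with a small worked example. The sketch below is illustrative only: the helper name ngram_weights, the value of wh, and the three probabilities are made up, and summing the weighted probabilities into a single score is an assumption, since the slide shows only the weighting step. It shows that the three weights sum to 1 for any wh between 0 and 1.

```r
# Hypothetical helper (not part of the original app): returns the weights
# applied to the 4-, 3-, and 2-gram probabilities when a 4-gram is the
# highest matched n-gram. Elements are named after the probability they scale.
ngram_weights <- function(wh) {
  stopifnot(is.numeric(wh), length(wh) == 1, wh >= 0, wh <= 1)
  rest <- 1 - wh
  c(pw4 = wh, pw3 = (rest / 3) * 2, pw2 = rest / 3)
}

# Made-up probabilities of one candidate word under each model.
p <- c(pw4 = 0.40, pw3 = 0.25, pw2 = 0.10)

# Made-up user choice for the weight of the highest-order n-gram.
w <- ngram_weights(wh = 0.7)

# Assumption: the weighted probabilities are summed into one score.
score <- sum(w * p)

print(w)       # individual weights
print(sum(w))  # total weight
print(score)   # combined score for the candidate word
```

With wh = 0.7 the remaining 0.3 is split 2:1 between the 3-gram and the 2-gram, so a larger wh shifts the prediction toward the longest matched context.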

App performance

Number of matches per phrase (test set).
             Min.  1st Qu.  Median  Mean  3rd Qu.   Max.
no_letter       1      462    2725  9173    12396  44264
one_letter      0       23     122   510      642   4986
two_letters     0        4      20    96       92   2347

How to use the app

What information does the app provide?