Johns Hopkins University Coursera Data Science Specialization & SwiftKey
Eric Thompson
January 2018
corpus object and created the tidy input dataset for our model.

In the prediction function below:
- a is the phrase (n-gram) entered by the user;
- toks_list is the list of skip-grams generated from the user's phrase; we use the rev() function to ensure the algorithm first searches for the user's full n-gram and, if it is not found, proceeds in a logical, sequential manner through the possible skip-grams;
- last is the final word in the phrase entered by the user and a required part of any matching skip-gram.
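The function looks up each candidate skip-gram in the tidy input dataset, so input needs to behave as a lookup table keyed by the preceding skip-gram. Below is a minimal sketch of what such a table might look like, assuming input is a data.table whose "n-1" column holds the preceding skip-gram and whose "n" column holds the predicted word; the object name toy_input and its rows are purely illustrative and not taken from the actual model.

library(data.table)

# Hypothetical example of the tidy lookup table: one row per observed
# skip-gram prefix (column "n-1") together with the word predicted to
# follow it (column "n"). The values below are illustrative only.
toy_input <- data.table(
  `n-1` = c("thanks_for_the", "one_of_the"),
  n     = c("follow",         "best")
)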
library(quanteda)    # tokens() and tokens_skipgrams()
library(data.table)  # the "input" lookup table is queried with data.table syntax

f <- function(a = NULL) {
  # User provides an n-gram "a" and we create a list of its tokens
  toks <- tokens(a)
  toks_list <- tokens_skipgrams(toks,
                                n = 1:length(toks[[1]]),
                                skip = 0:4,
                                concatenator = "_")
  # Re-order the skip-gram list so it begins with the full n-gram (no skips)
  # and proceeds sequentially through the sparser skip-grams
  toks_list <- rev(toks_list[[1]])
  # Require the last word entered by the user to appear in the skip-gram
  # (anchored so that only whole tokens match, not substrings of longer words)
  last <- toks[[1]][length(toks[[1]])]
  toks_list <- toks_list[grep(paste0("(^|_)", last, "($|_)"), toks_list)]
  # Loop through the skip-gram list until a match is found in the input dataset
  for (i in seq_along(toks_list)) {
    if (nrow(input[`n-1` == toks_list[i], ]) != 0) {
      # Return the predicted word for the first matching skip-gram
      return(input[`n-1` == toks_list[i], n])
    }
  }
}
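For illustration, assuming the hypothetical toy_input table sketched earlier is assigned to input, the function could be called as follows; the phrase and the predicted result are illustrative only.

input <- toy_input         # hypothetical lookup table from the sketch above
f("happy thanks for the")
# The full phrase "happy_thanks_for_the" is not in the table, so the function
# backs off through the skip-grams containing the last word ("the") until
# "thanks_for_the" matches, and the predicted word "follow" is returned.

Because the last word must appear in every candidate skip-gram, the back-off stays anchored to the most recent word the user typed rather than drifting to unrelated parts of the phrase.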