GusEsq
9 October 2016
I tried to keep my algorithm simple.

e.g., let's try the word "rock":

> nextwordpred("rock")
[1] "star" "i" "chair"
# I upload to the Shiny server a CSV file with the corpus 2-grams ordered by frequency.
# This is the main reason the algorithm is quick.
freqtwoGramw <- read.csv("database.csv")
# function for predicting the next word
nextwordpred <- function(word) {
  # split the phrase and keep only its last word
  part1 <- strsplit(word, " ")
  part2 <- length(part1[[1]])
  lookfor <- part1[[1]][part2]
  # "comilla" is the caret that anchors the pattern to the start of each 2-gram
  comilla <- "^"
  check <- paste(comilla, lookfor, sep = "")
  # show the 3 most frequent matches; since the table is already ordered by
  # frequency, the first three hits are the three best candidates
  options <- grep(check, freqtwoGramw$term, ignore.case = FALSE)
  option1 <- options[1]
  option2 <- options[2]
  option3 <- options[3]
  result1 <- as.character(freqtwoGramw$term[option1])
  result2 <- as.character(freqtwoGramw$term[option2])
  result3 <- as.character(freqtwoGramw$term[option3])
  # the prediction is the second word of each matching 2-gram
  word1 <- strsplit(result1, split = " ")[[1]][2]
  word2 <- strsplit(result2, split = " ")[[1]][2]
  word3 <- strsplit(result3, split = " ")[[1]][2]
  final <- c(word1, word2, word3)
  print(final)
}
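
The speed of the app comes from doing all the heavy counting offline. Here is a minimal sketch of how a 2-gram frequency file like database.csv might be built with base R; the sentences vector and its sample text are hypothetical, not the real corpus, and only the term column is actually used by the app:

# illustrative input: a character vector of cleaned, lowercased sentences
sentences <- c("rock star on stage", "a rocking chair", "rock star again")
tokens <- strsplit(tolower(sentences), " ")
# pair each word with the word that follows it
bigrams <- unlist(lapply(tokens, function(w) {
  if (length(w) < 2) return(character(0))
  paste(head(w, -1), tail(w, -1))
}))
# count the 2-grams and sort so the most frequent come first
freq <- sort(table(bigrams), decreasing = TRUE)
write.csv(data.frame(term = names(freq), freq = as.integer(freq)),
          "database.csv", row.names = FALSE)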
I always follow the rule that understanding the problem is the most important step in working out a solution. The 2-gram frequency look-up gives a basic but powerful approach to predicting the next word.
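
To make the look-up concrete, here is a toy slice of the table in the same frequency-ordered layout; the rows and counts are invented for illustration:

# illustrative only: invented rows in the frequency-ordered layout
toy <- data.frame(term = c("rock star", "rock i", "rocking chair"),
                  freq = c(120, 80, 15))
grep("^rock", toy$term)  # 1 2 3: every 2-gram whose first word starts with "rock"

Taking the second word of each match gives "star", "i", "chair", the example output above. It also shows why the caret alone matches "rocking": the pattern does not require a space after the word, so any 2-gram whose first word merely starts with the input is a candidate.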
This course was really challenging; the coding was the hardest part for me, since I'm more used to statistical work. But this kind of development finally becomes a product for an end user, and that is where the value is added.