Rui La
12/27/2016
2014 Data Science Capstone
The purpose of this Data Science Capstone project is to apply the skills acquired in the previous courses to build an application based on a predictive text model. Given a word or a phrase as input, the application returns a short list of predicted next words.
The next-word prediction app is hosted on shinyapps.io:
The app can be slow to return predictions, so please be patient. Thanks.
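# Read the three English-language corpora line by line.
# skipNul = TRUE skips embedded nul characters, which would otherwise cut a line short (with a warning).
us_twitter = readLines("./final/en_US/en_US.twitter.txt", encoding = "UTF-8", skipNul = TRUE)
us_blog = readLines("./final/en_US/en_US.blogs.txt", encoding = "UTF-8", skipNul = TRUE)
us_news = readLines("./final/en_US/en_US.news.txt", encoding = "UTF-8", skipNul = TRUE)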
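The slides later refer to "the data sample" without saying how it was drawn. A minimal sketch of one way to sample lines reproducibly is shown below; the helper name `sample_lines`, the 1% default fraction, and the seed are all assumptions for illustration, not the author's values.

```r
# Hypothetical helper: draw a reproducible random sample of lines.
# The 1% default fraction and the seed are assumptions, not the author's values.
sample_lines <- function(lines, fraction = 0.01, seed = 2016) {
  set.seed(seed)                                # make the draw reproducible
  n <- max(1, round(length(lines) * fraction))  # keep at least one line
  sample(lines, n)                              # sample without replacement
}

# Toy stand-in for c(us_twitter, us_blog, us_news)
toy_corpus <- c("first tweet", "second tweet", "a blog post", "another blog post",
                "news story one", "news story two", "news story three", "last line")
data_sample <- sample_lines(toy_corpus, fraction = 0.5)
print(data_sample)  # four distinct lines; which ones depends on the seed
```

On the real data this would be called as `sample_lines(c(us_twitter, us_blog, us_news))`. Sampling keeps memory use and n-gram counting manageable, at the cost of missing rarer n-grams.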
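library(stringr)  # provides str_replace_all()
# Clean the sampled text before tokenizing
string = str_replace_all(string, "[[:punct:]]", "")  # remove punctuation
string = str_replace_all(string, "[[:digit:]]", "")  # remove digits
string = tolower(string)                             # convert to lower case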
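For comparison, here is a base-R sketch of the same cleaning steps (so it runs without stringr), with two additions that are not in the original code: runs of whitespace are collapsed and the ends are trimmed. Without them, removing "10" from "Top 10 Tips" leaves a double space, which can make a space-delimited next-word lookup return an empty string. The function name `clean_text` is hypothetical.

```r
# Hypothetical base-R version of the cleaning steps above.
clean_text <- function(string) {
  string <- gsub("[[:punct:]]", "", string)    # drop punctuation (apostrophes too)
  string <- gsub("[[:digit:]]", "", string)    # drop digits
  string <- gsub("[[:space:]]+", " ", string)  # added: collapse runs of whitespace
  string <- trimws(string)                     # added: strip leading/trailing spaces
  tolower(string)                              # normalise case
}

cleaned <- clean_text(c("Don't stop!", "Top 10 Tips, 2016."))
print(cleaned)  # "dont stop" "top tips"
```

Note that stripping all punctuation also turns contractions such as "don't" into "dont", in both this sketch and the original code.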
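# Keep the first "<term> <next word> " match in each line (the lazy .*? stops at the first space)
match = regmatches(datasource, regexpr(paste(term, "(.*?) "), datasource))
# Remove the n-gram and the trailing space, leaving only the following word
matchlist = gsub(paste(term, "| $"), "", match)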
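To make the lookup concrete, the same two lines are applied below to a small hypothetical `datasource` vector with `term = "want to"`. The toy sentences are invented for illustration.

```r
# Hypothetical toy corpus standing in for the cleaned sample
datasource <- c("i want to go home now",
                "i want to eat",
                "they want to go out",
                "we want a break")
term <- "want to"

# Same two steps as above: grab "<term> <word> ", then strip the term and trailing space
match <- regmatches(datasource, regexpr(paste(term, "(.*?) "), datasource))
matchlist <- gsub(paste(term, "| $"), "", match)
print(matchlist)  # "go" "go"
```

Two properties of the pattern show up here. The line "i want to eat" produces no match, because the pattern requires a space after the next word, so an n-gram followed by the last word of a line is never counted (appending a space to each line, e.g. `paste0(datasource, " ")`, would be one possible workaround). The term is also not anchored to word boundaries, so a short term such as "to" would match inside "into".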
The data sample was then tokenized into n-grams, i.e. sequences of n consecutive words (bigrams for n = 2, trigrams for n = 3).
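The slides do not say which package or function produced the n-grams, so the following is only a minimal base-R sketch of the idea: split a cleaned line into words and join every window of n consecutive words. The helper name `make_ngrams` is hypothetical.

```r
# Hypothetical base-R n-gram tokenizer for one cleaned line of text.
make_ngrams <- function(text, n) {
  words <- strsplit(text, "[[:space:]]+")[[1]]  # split the line into words
  words <- words[words != ""]                   # drop empty tokens
  if (length(words) < n) return(character(0))   # too short for any n-gram
  starts <- seq_len(length(words) - n + 1)      # start index of every window
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

bigrams  <- make_ngrams("i want to go home", 2)
trigrams <- make_ngrams("i want to go home", 3)
print(bigrams)   # "i want" "want to" "to go" "go home"
print(trigrams)  # "i want to" "want to go" "to go home"

# Counting n-grams across several lines
bigram_counts <- table(unlist(lapply(c("i want to go", "i want to eat"), make_ngrams, n = 2)))
```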
For a given n-gram term, find the most frequent words that follow it.
# nwText[i] is the i-th most frequent word observed after the preceding n-gram term
nwText[i] = names(sort(table(next.word), decreasing = TRUE)[i])
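The indexed assignment above fills one prediction at a time. A vectorised sketch of the same ranking is shown below; it returns up to `k` words at once and copes with fewer than `k` distinct followers. The helper name `top_next_words` and the default `k = 3` are hypothetical.

```r
# Hypothetical helper: the k most frequent words in next.word, most frequent first.
top_next_words <- function(next.word, k = 3) {
  if (length(next.word) == 0) return(character(0))     # no matches for this n-gram
  counts <- sort(table(next.word), decreasing = TRUE)  # frequency table, descending
  head(names(counts), k)                               # at most k words
}

next.word <- c("go", "eat", "go", "be", "go", "eat")
nwText <- top_next_words(next.word, k = 2)
print(nwText)  # "go" "eat"
```

`head()` simply returns fewer elements when fewer than `k` distinct words were observed, so no index can run past the end of the table.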
The app is free to use and runs directly in the browser using Shiny.