Pham Ngoc Hieu - Kuriboh Kuet
29-10-2020
Capstone project for the Data Science Specialization on Coursera.
The project was sponsored by SwiftKey.
The goal is to predict the next word a user is likely to type and to suggest it so that typing is faster.
The data set was provided by SwiftKey.
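# Requires the tm and magrittr packages; replacePunctuation and bad_words_bank
# must be defined elsewhere in the project.
library(magrittr)

clean_corpus <- function(corpus) {
  corpus %>%
    tm::tm_map(tm::stripWhitespace) %>%
    tm::tm_map(replacePunctuation) %>%
    tm::tm_map(tm::removeNumbers) %>%
    tm::tm_map(tm::content_transformer(tolower)) %>%
    # The corpus is supplied by the pipe, so it is not passed again here.
    tm::tm_map(tm::removeWords, bad_words_bank)
}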
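To make the effect of each cleaning step easier to see, the following is a minimal base-R sketch that applies comparable steps to a plain character vector instead of a tm corpus. It is not the project's implementation: `replace_punctuation`, `clean_text`, and the contents of `bad_words_bank` are hypothetical stand-ins, since the original helper and word list are not shown. Whitespace is collapsed last here so that gaps left by the earlier steps are removed.

```r
# Hypothetical stand-ins: the real bad_words_bank and replacePunctuation
# are not shown in this document.
bad_words_bank <- c("darn", "heck")
replace_punctuation <- function(text) gsub("[[:punct:]]+", " ", text)

clean_text <- function(text, bad_words) {
  text <- replace_punctuation(text)          # punctuation becomes a space
  text <- gsub("[[:digit:]]+", "", text)     # drop numbers
  text <- tolower(text)
  # Remove whole bad words only (\\b marks a word boundary).
  pattern <- paste0("\\b(", paste(bad_words, collapse = "|"), ")\\b")
  text <- gsub(pattern, "", text, perl = TRUE)
  # Collapse the whitespace left behind by the earlier steps.
  trimws(gsub("[[:space:]]+", " ", text))
}

cleaned <- clean_text("Well, DARN it: 3 cups of tea!", bad_words_bank)
print(cleaned)
```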
Build a transition matrix for the Markov Chain.
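build_transition_matrix <- function(distribution_matrix) {
  # Normalize each row of counts so that it sums to 1.
  row_sums <- rowSums(distribution_matrix)
  # Rows with no observed successors stay all zero instead of becoming NaN.
  row_sums[row_sums == 0] <- 1
  distribution_matrix / row_sums
}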
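As an illustration of what the row normalization produces, here is a small self-contained sketch that builds a count matrix from a toy token sequence and converts it to transition probabilities. The token sequence and the variable names are assumptions made for this example; they are not taken from the SwiftKey data or from the project code.

```r
# Toy token sequence (hypothetical data, not from the SwiftKey corpus).
tokens <- c("i", "like", "tea", "i", "like", "coffee", "i", "drink", "tea")

# Count how often each word (row) is followed by each other word (column).
vocab <- sort(unique(tokens))
current_words <- factor(head(tokens, -1), levels = vocab)
next_words <- factor(tail(tokens, -1), levels = vocab)
distribution_matrix <- unclass(table(current_words, next_words))

# Row-normalize the counts into transition probabilities,
# leaving rows without any observed successor at zero.
row_sums <- rowSums(distribution_matrix)
row_sums[row_sums == 0] <- 1
transition_matrix <- distribution_matrix / row_sums

print(round(transition_matrix, 2))
```

In this toy sequence "i" is followed by "like" twice and by "drink" once, so the row for "i" assigns 2/3 to "like" and 1/3 to "drink".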
Predict the next word the user might use, based on the transition matrix.
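One way such a prediction step could look is sketched below: take the row of the transition matrix for the current word and return the most probable next words. The function name `predict_next_words`, its parameters, and the example matrix are hypothetical and chosen only for illustration; the project's actual prediction function is not shown here.

```r
# Hypothetical transition matrix: rows are current words, columns are next words.
words <- c("good", "morning", "night")
transition_matrix <- matrix(
  c(0.0, 0.7, 0.3,
    0.5, 0.0, 0.5,
    0.9, 0.1, 0.0),
  nrow = 3, byrow = TRUE,
  dimnames = list(words, words)
)

# Return up to n of the most probable next words for a given current word.
predict_next_words <- function(transition_matrix, current_word, n = 3) {
  if (!current_word %in% rownames(transition_matrix)) {
    return(character(0))  # unseen word: nothing to suggest
  }
  probabilities <- transition_matrix[current_word, ]
  ranked <- sort(probabilities[probabilities > 0], decreasing = TRUE)
  names(head(ranked, n))
}

suggestions <- predict_next_words(transition_matrix, "good", n = 2)
print(suggestions)
```

Returning an empty vector for an unseen word keeps the caller in control of the fallback, for example suggesting the most frequent words overall.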
Smooth the transition matrix using Katz's back-off model.
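The idea behind back-off smoothing can be sketched for bigram counts as follows. This is a simplified variant and not the project's code: it assumes a fixed absolute discount in place of the Good-Turing discounting used in Katz's original formulation, and it assumes that `unigram_counts` is ordered like the columns of `bigram_counts`. The function name and the example counts are hypothetical.

```r
# Simplified Katz-style back-off for a bigram count matrix.
# Assumption: a fixed absolute discount replaces Good-Turing discounting.
katz_backoff_smooth <- function(bigram_counts, unigram_counts, discount = 0.5) {
  unigram_probs <- unigram_counts / sum(unigram_counts)
  smoothed <- t(apply(bigram_counts, 1, function(counts) {
    total <- sum(counts)
    seen <- counts > 0
    # Back off completely when the history word was never observed.
    if (total == 0) return(unigram_probs)
    # Keep the plain estimate when there is no unseen word to back off to.
    if (all(seen)) return(counts / total)
    probs <- numeric(length(counts))
    probs[seen] <- (counts[seen] - discount) / total
    # Mass freed by discounting, shared among unseen words
    # in proportion to their unigram probability.
    leftover <- discount * sum(seen) / total
    probs[!seen] <- leftover * unigram_probs[!seen] / sum(unigram_probs[!seen])
    probs
  }))
  dimnames(smoothed) <- dimnames(bigram_counts)
  smoothed
}

# Hypothetical counts: rows are history words, columns are next words.
vocab <- c("a", "b", "c")
bigram_counts <- matrix(
  c(0, 3, 1,
    2, 0, 0,
    0, 0, 0),
  nrow = 3, byrow = TRUE, dimnames = list(vocab, vocab)
)
unigram_counts <- c(a = 4, b = 3, c = 1)

smoothed <- katz_backoff_smooth(bigram_counts, unigram_counts)
print(smoothed)
```

After smoothing, every row sums to 1 and word pairs that were never observed receive a small non-zero probability, so the predictor can still rank candidates for rare histories.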