Arturo Cardenas
August 23rd, 2015
Throughout Coursera's Data Science Specialization, we built up our data scientist toolkit, from learning how to install R to creating a Natural Language Processing model. This app, the Specialization's Capstone project, is the cherry on top: it emulates the behavior of predictive text tools such as SwiftKey.
This app is the result of intensive data processing with KNIME and model development in R, and it leverages the advantages of RStudio and shinyapps.
The app's backend consists of three frequency tables of 2-, 3- and 4-grams.
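To make the idea of the backend tables concrete, here is a minimal Python sketch of how 2-, 3- and 4-gram frequency tables can be counted from text. It is not the author's pipeline: the original pre-processing was done in KNIME and the tables were used from R. The function names `ngram_table` and `build_tables` are hypothetical, and the naive lowercase whitespace tokenization is an assumption standing in for real corpus cleaning.

```python
from collections import Counter


def ngram_table(tokens, n):
    """Count every n-gram (a tuple of n consecutive tokens) in `tokens`."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_tables(text):
    """Build hypothetical 2-, 3- and 4-gram frequency tables from raw text.

    Assumption: a naive lowercase whitespace split replaces the real
    pre-processing (cleaning, profanity filtering, etc.) done upstream.
    """
    tokens = text.lower().split()
    return {n: ngram_table(tokens, n) for n in (2, 3, 4)}


if __name__ == "__main__":
    tables = build_tables("I want to eat I want to sleep I want to eat")
    for n, table in tables.items():
        print(n, table.most_common(2))
```

Keying each table by the full n-gram tuple keeps the counting step trivial; a lookup step can then match on the first n-1 words to find candidate next words.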
…length(vector) = 3 (filling it with NAs when needed, e.g. c(NA, NA, "I"))
…data.table package, it finds the next word through the N-gram tables using a hierarchical approach: …
…NA is displayed, the app is ready to be used.
I used KNIME to pre-process the corpus. Once I understood the problem, I realized that the core work was going to be creating the best N-grams possible. For this task I created the following workflow, which helped me process the entire corpus in just a couple of hours.
KNIME is “VBA macros on steroids”!
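The prediction step described earlier (reducing the input to a length-3 vector padded with NAs, then searching the N-gram tables hierarchically) can be sketched in Python as a simple backoff. This is an illustration under stated assumptions, not the app's R/data.table code: it assumes the hierarchy starts at the 4-gram table and falls back to the 3-gram and then 2-gram tables, and that the most frequent match wins. `None` plays the role of R's `NA`, and `TABLES`, `last_three` and `predict_next` are hypothetical names with toy counts.

```python
from collections import Counter

# Hypothetical toy tables standing in for the app's 2-, 3- and 4-gram
# frequency tables: each maps an n-gram tuple to its corpus count.
TABLES = {
    4: Counter({("i", "want", "to", "eat"): 2, ("i", "want", "to", "sleep"): 1}),
    3: Counter({("want", "to", "eat"): 2, ("to", "go", "home"): 1}),
    2: Counter({("to", "eat"): 2, ("the", "end"): 1}),
}


def last_three(text):
    """Return the last three words, left-padded with None (like c(NA, NA, "I"))."""
    words = text.lower().split()[-3:]
    return [None] * (3 - len(words)) + words


def predict_next(text, tables=TABLES):
    """Predict the next word, backing off from 4-grams to 3-grams to 2-grams."""
    context = last_three(text)
    for n in (4, 3, 2):
        prefix = tuple(context[-(n - 1):])  # last n-1 words of the input
        if None in prefix:
            continue  # too few input words for this order; try a lower one
        candidates = {gram[-1]: count for gram, count in tables[n].items()
                      if gram[:-1] == prefix}
        if candidates:
            return max(candidates, key=candidates.get)  # most frequent match
    return None  # no prediction found at any order
```

For example, `predict_next("I want to")` matches in the 4-gram table, while `predict_next("I want to go")` misses there and backs off to the 3-gram prefix `("to", "go")`. A production version would index tables by prefix (as a keyed data.table allows) instead of scanning every entry.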