Arkadiusz Oliwa
12 December 2018
This application is the capstone project for the Coursera Data Science Specialization, taught by professors at Johns Hopkins University.
The main goal of the capstone project is to build a Shiny application that predicts the next word of a phrase.
The work was divided into several tasks: data cleansing, exploratory analysis, and the creation of a predictive model.
The general idea is to look at each pair (or triple, set of four, etc.) of words that occur next to each other. In a large corpus, you’re likely to see ‘the red’ and ‘red apple’ several times, but less likely to see ‘apple red’ and ‘red the’. These counts can be used to predict the next word as the user types.
These co-occurring words are known as ‘n-grams’, where ‘n’ is the number of consecutive words considered.
The n-gram extraction and all other text mining were done with a variety of R packages, such as tm and quanteda.
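The n-gram idea can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the R code behind the application; the toy corpus and the helper name `ngrams` are assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus, made up for illustration only.
text = "the red apple fell and the red apple rolled past the red door"
tokens = text.lower().split()

# Count how often each pair (n = 2) and triple (n = 3) of words occurs.
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(bigram_counts[("the", "red")])    # seen several times
print(bigram_counts[("apple", "red")])  # never seen in this corpus
```

Even in this tiny corpus, ‘the red’ and ‘red apple’ appear repeatedly while ‘apple red’ and ‘red the’ do not appear at all, which is the asymmetry a next-word predictor relies on.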
The Shiny application has an input text box to enter a partial sentence or phrase for which the user would like to predict the next word.
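What could happen to the entered phrase is sketched below. This is a hedged illustration in Python, not the application's actual R code: it assumes a bigram-only model that suggests the words most often seen after the last word typed, and the names `build_bigram_model` and `predict_next` as well as the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_bigram_model(tokens):
    """Map each word to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        followers[prev][nxt] += 1
    return followers

def predict_next(model, phrase, k=3):
    """Suggest up to k words most often seen after the phrase's last word."""
    words = phrase.lower().split()
    if not words or words[-1] not in model:
        return []  # empty input or unseen word: no suggestion
    return [word for word, _ in model[words[-1]].most_common(k)]

# Toy corpus, made up for illustration only.
corpus = ("the red apple is sweet and the red apple is ripe "
          "and the red door is open")
model = build_bigram_model(corpus.split())

print(predict_next(model, "I bought a red"))
```

A model built on a real corpus would typically also consult longer n-grams and fall back to shorter ones when a phrase has not been seen; that step is left out here to keep the sketch short.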
The instructions are quite simple: