The goal of the capstone in the Data Science Specialization is to demonstrate the skills acquired throughout the courses by creating a public data product. In this case, the product is a Shiny app that predicts the next word as the user types a sentence (similar to the keyboards on mobile devices).
The data used to train and test the model, specifically a statistical language model, come from the HC Corpora website (http://www.corpora.heliohost.org). Three text files were available (blogs, news and tweets), but only 10% of the lines of each file were used, in order to avoid long processing times while loading the data. The capstone is presented in partnership with SwiftKey (https://swiftkey.com), one of the world's leaders in applying data science techniques to build keyboards for Android and iOS devices.
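As an illustration of the 10% subsampling described above, here is a minimal Python sketch. It is not the code used for the app: the function name `sample_lines`, the fixed seed, and the choice of sampling without replacement are all assumptions, since the write-up only states that 10% of the lines of each file were used.

```python
import random


def sample_lines(lines, fraction=0.10, seed=1234):
    """Return a random subset of `lines`, keeping the original order.

    Hypothetical helper: `fraction` and `seed` are illustrative defaults.
    """
    lines = list(lines)
    k = int(len(lines) * fraction)          # number of lines to keep
    rng = random.Random(seed)               # fixed seed for reproducibility
    keep = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in keep]


# Usage with stand-in data (a real run would read the blogs, news
# and tweets files line by line instead).
blogs = [f"blog line {i}" for i in range(1000)]
subset = sample_lines(blogs)
print(len(subset))  # 100
```

Fixing the seed makes the subset reproducible, so the exploration and the model can be rebuilt from the same sample.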
To develop the app, the data was first explored (https://rpubs.com/victorsalda/234366). Then, after considering several models, such as n-gram, neural network and positional statistical language models, the n-gram model was chosen. To keep the app running smoothly, only a database (.csv files) of 2-grams (bigrams) and 3-grams (trigrams) was used.
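To make the bigram and trigram idea concrete, the following Python sketch builds both tables from a few sentences and predicts the next word. It is only an illustration under stated assumptions: the whitespace tokenizer, the function names `build_tables` and `predict_next`, and the rule of trying the trigram table first and falling back to the bigram table are hypothetical, and the app itself may combine the two tables differently.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def build_tables(sentences):
    """Count next-word frequencies for one-word and two-word contexts."""
    bigrams = defaultdict(Counter)    # word -> Counter of next words
    trigrams = defaultdict(Counter)   # (word1, word2) -> Counter of next words
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - 1):
            bigrams[tokens[i]][tokens[i + 1]] += 1
        for i in range(len(tokens) - 2):
            trigrams[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return bigrams, trigrams


def predict_next(text, bigrams, trigrams):
    """Most frequent next word: trigram context first, then bigram (assumed rule)."""
    tokens = tokenize(text)
    if len(tokens) >= 2:
        candidates = trigrams.get(tuple(tokens[-2:]))
        if candidates:
            return candidates.most_common(1)[0][0]
    if tokens:
        candidates = bigrams.get(tokens[-1])
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # no known context


corpus = [
    "i love data science",
    "i love data analysis",
    "i love data science projects",
    "we love pizza",
]
bigrams, trigrams = build_tables(corpus)
print(predict_next("i love", bigrams, trigrams))     # data
print(predict_next("they love", bigrams, trigrams))  # data (bigram fallback)
```

Storing the counts as lookup tables, as the .csv database does, keeps prediction to a couple of dictionary lookups per keystroke.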
The Shiny app is available at: https://victorsalda.shinyapps.io/data_science_capstone/