Fabiano Borsto
This presentation briefly but comprehensively pitches an application for predicting the next word.
The application is the capstone project for the Coursera Data Science Specialization, taught by professors of Johns Hopkins University in cooperation with SwiftKey.
The main goal of this capstone project is to build a Shiny application that can predict the next word.
The project was divided into seven subtasks, including data cleansing, exploratory analysis, the creation of a predictive model and more.
All text data used to create the frequency dictionaries, and thus to predict the next word, comes from a corpus called HC Corpora.
All text mining and natural language processing was done using a variety of well-known R packages.
After drawing a data sample from the HC Corpora data, this sample was cleaned by converting it to lowercase and removing punctuation, links, whitespace, numbers and all kinds of special characters. This data sample was then tokenized into so-called n-grams.
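The cleaning steps listed above can be sketched roughly as follows. This is a minimal Python illustration, not the R code used in the project; the function name `clean_text` and the regular expressions are assumptions chosen for the example. Note that this sketch also splits contractions such as "don't" into "don t", which the original pipeline may have handled differently.

```python
import re

def clean_text(text: str) -> str:
    """Lowercase the text and strip links, numbers, punctuation and extra whitespace."""
    text = text.lower()
    # Remove links first, while "://" and "." are still present to match on.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    # Replace everything that is not a lowercase ASCII letter or whitespace
    # (numbers, punctuation, special characters) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Visit https://example.com NOW!!  It costs 20 dollars."))
```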
In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. (Source)
The aggregated bi-, tri- and quadgram term frequency matrices are then converted into frequency dictionaries.
The resulting data.frames are used to predict the next word, based on the text entered by a user of the described application and the frequencies in the underlying n-gram tables.
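To make the idea of n-grams and frequency dictionaries concrete, here is a small Python sketch. It is only an illustration of the concept, not the project's R implementation; the helper names `ngrams` and `ngram_frequencies` are hypothetical, and the text is assumed to be already cleaned and split on spaces.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all contiguous sequences of n tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_frequencies(sentences, n):
    """Count how often each n-gram occurs across a list of cleaned sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts

sample = ["i love new york", "i love cake"]
bigram_counts = ngram_frequencies(sample, 2)
print(bigram_counts.most_common(3))
```

In this toy sample the bigram ("i", "love") occurs twice and every other bigram once, which is exactly the kind of information a frequency dictionary stores.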
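The lookup described above might work along these lines. The sketch below is in Python rather than R, and it makes an assumption the pitch does not state: that the longest matching context (quadgram table first) is tried first, falling back to trigrams and then bigrams. The names `build_tables` and `predict_next` are hypothetical.

```python
from collections import Counter, defaultdict

def build_tables(sentences, orders=(2, 3, 4)):
    """For each n, map every (n-1)-word context to a Counter of following words."""
    tables = {n: defaultdict(Counter) for n in orders}
    for sentence in sentences:
        tokens = sentence.split()
        for n in orders:
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(text, tables):
    """Return the most frequent follower of the longest known context, or None."""
    tokens = text.lower().split()
    # Assumed strategy: try the highest-order table first, then shorter contexts.
    for n in sorted(tables, reverse=True):
        if len(tokens) < n - 1:
            continue  # not enough input words for this table
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None

corpus = ["i love new york", "i love new shoes", "i love new york pizza", "we love cake"]
tables = build_tables(corpus)
print(predict_next("I love new", tables))
print(predict_next("they love", tables))
```

With this toy corpus, the first call is answered from the quadgram table, while the second falls back to the bigram table because "they love" was never seen as a context.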
The user interface of this application was designed with Mobile First in mind. While the text is being entered (1), the field with the predicted next word (2) refreshes instantly, and the full text input (3) is displayed as well.
The next word prediction app is hosted on shinyapps.io: https://fborsato.shinyapps.io/CAPSTONE
This pitch deck is located here: http://rpubs.com/fborsato/capstone
Learn more about the Coursera Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1