M Fisher (Coursera Student)
December 16, 2021
This presentation and accompanying app were developed for the capstone course of Coursera's Data Science Specialization, offered by Johns Hopkins University. You can read more about the specialization on the Coursera website. You can view the Shiny app at Next Word Prediction.
What is NLP?
Source: Wikipedia
Uses of NLP
This prediction model uses a data set generated by analyzing natural language data. For each phrase of one, two, three, or four words, the data set stores the words that most often follow it.
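To make the shape of that data set concrete, here is a minimal sketch in base R. It is not the actual processing script: the toy corpus, the function names (`tokenize`, `phrase_pairs`, `top_next_words`), and the choice to keep only a single top word per phrase are all assumptions made for illustration.

```r
# Toy corpus standing in for the blog, news, and Twitter text (made up).
corpus <- c(
  "thank you so much for the follow",
  "thank you so much for coming",
  "thank you for the follow"
)

# Split a line into lowercase word tokens.
tokenize <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Collect every (phrase of n words, next word) pair found in the lines.
phrase_pairs <- function(lines, n) {
  pairs <- lapply(lines, function(line) {
    words <- tokenize(line)
    if (length(words) <= n) return(NULL)
    starts <- seq_len(length(words) - n)
    data.frame(
      phrase = vapply(
        starts,
        function(i) paste(words[i:(i + n - 1)], collapse = " "),
        character(1)
      ),
      next_word = words[starts + n],
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, Filter(Negate(is.null), pairs))
}

# Count the pairs and keep the most frequent next word for each phrase
# (ties are broken alphabetically; an assumption, not the author's rule).
top_next_words <- function(lines, n) {
  pairs <- phrase_pairs(lines, n)
  pairs$count <- 1L
  counts <- aggregate(count ~ phrase + next_word, data = pairs, FUN = sum)
  counts <- counts[order(counts$phrase, -counts$count, counts$next_word), ]
  counts[!duplicated(counts$phrase), ]
}

# One table covering phrases of one to four words.
lookup <- do.call(rbind, lapply(1:4, function(n) top_next_words(corpus, n)))
```

Looking up `"thank you"` in `lookup` gives `"so"`, because "thank you so" occurs twice in the toy corpus and "thank you for" only once.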
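The prediction model reads the user's input, compares it to the data set, and suggests a next word using a simple back-off model: it looks for the longest trailing phrase of the input that appears in the data set and, if that phrase is not found, backs off to a shorter one. It always returns a suggestion, though the longer the matching phrase, the better the prediction tends to be.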
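The back-off idea can be sketched in a few lines of base R. This is an illustration under stated assumptions, not the app's code: the small `lookup` vector, the `predict_next_word` function, and the `fallback` word returned when nothing matches are all hypothetical.

```r
# Hypothetical lookup: phrase -> most frequent next word.
lookup <- c(
  "thank you so much" = "for",
  "you so much"       = "for",
  "so much"           = "for",
  "you"               = "so",
  "the"               = "follow"
)

# Suggest a next word, trying the longest trailing phrase first.
# `fallback` is an assumed default so that a suggestion is always returned.
predict_next_word <- function(input, lookup, fallback = "the", max_words = 4) {
  words <- strsplit(tolower(trimws(input)), "[^a-z']+")[[1]]
  words <- words[nzchar(words)]

  n <- min(max_words, length(words))
  while (n >= 1) {
    phrase <- paste(tail(words, n), collapse = " ")
    if (phrase %in% names(lookup)) {
      return(lookup[[phrase]])
    }
    n <- n - 1  # back off: drop the leading word and try again
  }
  fallback
}

predict_next_word("Thank you so much", lookup)  # matches the four-word phrase
predict_next_word("I love you", lookup)         # backs off to "you"
predict_next_word("xyzzy", lookup)              # nothing matches, uses fallback
```

The first call matches all four words; the second finds neither "i love you" nor "love you" and backs off to the single word "you"; the third falls through to the default.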
The model was built using R and the resulting application was published as a Shiny app. Try it out. It's awesome! ;)
We used text from blogs, news articles, and Twitter to generate the data set used by the model. Using that data, the processing script performed the following actions in order to prepare the data set for the Shiny app.
For more information, you can look at packages like tm, NLP, and quanteda. I used tidytext, and a big a-ha moment came when I read the first four chapters of Text Mining with R: A Tidy Approach. (I highly recommend that book!)
Ideally, one would apply this approach to the largest possible amount of training text to get the most accurate predictions. However, there are limitations: