How to predict the next word?
Discussion materials
By: Nicholas Ong
Date: Wed Dec 30 00:27:10 2020
This presentation applies up to seven months of work on the Data Science Specialization by Johns Hopkins University.
The aim of this exercise is to create a product that showcases the prediction algorithm built for the project and to provide an interface that others can access.
The work covered the following steps:
Overview, understanding the problem, and getting the data
Exploratory data analysis and modeling
Build and evaluate the prediction model
Reducing computational runtime and model complexity
After these steps, the result was split into three n-gram files: gram2, gram3, and gram4.
In short, the model takes the last few words of a sentence (a 4-gram if four words are used, a 3-gram for three words, etc.) and uses statistics from a large collection of English sentences to find the word most likely to come next.
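The lookup described above can be sketched in a few lines. This is a minimal illustration only, not the code behind the app: it is written in Python rather than R, the function names `build_tables` and `predict_next` and the four-sentence corpus are hypothetical, and it assumes the simplest possible backoff (try the longest context first, then drop to a shorter one when nothing matches).

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_n=4):
    """For each n in 2..max_n, count which word follows every (n-1)-word context."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                next_word = words[i + n - 1]
                tables[n][context][next_word] += 1
    return tables


def predict_next(phrase, tables, max_n=4):
    """Return the most frequent follower of the longest matching context, or None."""
    words = phrase.lower().split()
    for n in range(max_n, 1, -1):          # 4-gram table, then 3-gram, then 2-gram
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:           # phrase too short for this table
            continue
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None                            # no table contains the context


# Hypothetical toy corpus, used only to exercise the functions.
toy_corpus = [
    "who are you",
    "who are you today",
    "who are they",
    "how are you",
]
tables = build_tables(toy_corpus)
suggestion = predict_next("Who are", tables)
```

With this toy corpus the two-word context "who are" is followed by "you" twice and by "they" once, so the sketch suggests "you". A real model trained on a large corpus would usually also smooth or weight the counts rather than rely on raw frequencies.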
The resulting language model is stored in a set of data files: a file for unigrams, a file for bigrams, etc. The information in these files is similar to the ARPA format files for n-gram models.
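As a rough picture of what one such data file might hold, the sketch below counts trigrams and writes them as tab-separated lines, one n-gram per line. The layout, the function names, and the toy corpus are assumptions made for illustration; the project's actual files may differ, and a true ARPA file stores log probabilities and backoff weights rather than raw counts.

```python
import csv
import io
from collections import Counter


def count_ngrams(sentences, n):
    """Count every run of n consecutive words in the given sentences."""
    counts = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts


def write_ngrams(counts, handle):
    """Write one 'count<TAB>n-gram' line per entry, most frequent first."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for ngram, count in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        writer.writerow([count, " ".join(ngram)])


def read_ngrams(handle):
    """Load the lines written by write_ngrams back into a Counter."""
    counts = Counter()
    for count, ngram in csv.reader(handle, delimiter="\t"):
        counts[tuple(ngram.split(" "))] = int(count)
    return counts


# Hypothetical toy corpus; an in-memory buffer stands in for a file such as gram3.
toy_corpus = [
    "who are you",
    "who are you today",
    "who are they",
    "how are you",
]
gram3 = count_ngrams(toy_corpus, 3)
buffer = io.StringIO()
write_ngrams(gram3, buffer)
buffer.seek(0)
restored = read_ngrams(buffer)
```

Keeping one file per n-gram order means the app only needs to load the tables it actually consults, which fits the goal of reducing runtime and model size.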
For this project we must submit:
A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.
[My Shiny App] - [https://huaigim.shinyapps.io/JHU_DataScienceCapstone/]
[My Github Repo] - [https://github.com/huaigim/JHU_DataScienceCapstone]
The capstone project class allows students to create a usable, public data product that showcases their skills to potential employers. Projects are drawn from real-world problems and conducted with industry, government, and academic partners.
For more details: [https://www.coursera.org/specializations/jhu-data-science]
A Shiny application was developed based on the next-word prediction model described previously, as shown below.
For example, if you write “Who are”