This presentation was created as part of the Data Science Specialisation capstone project. It briefly describes how the current n-gram-based next-word prediction model was developed.
The next slides will briefly describe how the model was trained.
The code used for the following steps has been shared on GitHub.
Data Split: The data was split into training, test and validation sets.
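As an illustration of this step, here is a minimal Python sketch of a three-way split. The slides do not state the split proportions or the tooling, so the 80/10/10 fractions, the fixed seed, and the function name `split_dataset` are assumptions made for the example, not the project's actual settings.

```python
import random


def split_dataset(items, train_frac=0.8, test_frac=0.1, seed=42):
    """Shuffle items and split them into (train, test, validation) lists.

    The fractions and seed are illustrative assumptions; whatever is left
    after the train and test portions goes to the validation set.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # reproducible shuffle
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Hypothetical stand-in for the raw text chunks.
chunks = [f"text chunk {i}" for i in range(100)]
train, test, validation = split_dataset(chunks)
```

Shuffling before slicing keeps the three sets comparable, and fixing the seed makes the split repeatable between runs.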
N Gram Modelling:
Due to limited hardware resources, the model was trained only on the first 200,000 chunks of text in the training dataset.
The raw chunks were split into sentences, and the sentences were split into words (tokens), which were then cleaned.
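The pipeline described above (cap the number of chunks, split into sentences, tokenise, clean, count n-grams) can be sketched in Python as follows. This is only a rough illustration: the sentence-splitting rule, the cleaning rule (lowercase, letters and apostrophes only), and all function names are assumptions, since the slides do not specify them.

```python
import re
from collections import Counter
from itertools import islice


def split_sentences(chunk):
    """Split a raw chunk on ., ! or ? followed by whitespace (assumed rule)."""
    return [s for s in re.split(r"(?<=[.!?])\s+", chunk.strip()) if s]


def clean_tokens(sentence):
    """Lowercase the sentence and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", sentence.lower())


def count_ngrams(chunks, n, max_chunks=200_000):
    """Count n-grams within sentences, reading at most max_chunks chunks."""
    counts = Counter()
    for chunk in islice(chunks, max_chunks):
        for sentence in split_sentences(chunk):
            tokens = clean_tokens(sentence)
            # Slide a window of length n over the tokens of one sentence.
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical sample text standing in for the training chunks.
sample = ["I love data science. I love R!", "Data science is fun."]
bigrams = count_ngrams(sample, n=2)
```

Counting inside each sentence, rather than across the whole chunk, avoids n-grams that straddle a sentence boundary.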
Accuracy: The model currently has an accuracy of 0.14, which is very low.
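The slides do not say how accuracy was measured. One common choice, assumed here, is top-1 accuracy: the share of held-out positions where the model's single best guess matches the actual next word. The Python sketch below uses a toy bigram predictor and made-up sentences purely to show the calculation; it is not the project's model or data.

```python
from collections import Counter, defaultdict


def train_bigram_model(sentences):
    """Map each word to a Counter of the words that follow it."""
    followers = defaultdict(Counter)
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def predict_next(model, word):
    """Return the most frequent follower of word, or None if word is unseen."""
    if word not in model:
        return None
    return model[word].most_common(1)[0][0]


def top1_accuracy(model, sentences):
    """Fraction of (previous word, next word) pairs predicted correctly."""
    correct = total = 0
    for tokens in sentences:
        for prev, actual in zip(tokens, tokens[1:]):
            total += 1
            correct += predict_next(model, prev) == actual
    return correct / total if total else 0.0


# Hypothetical tokenised sentences for illustration only.
train_sentences = [["i", "love", "data"], ["i", "love", "data"], ["i", "like", "r"]]
test_sentences = [["i", "love", "r"], ["we", "like", "r"]]

model = train_bigram_model(train_sentences)
accuracy = top1_accuracy(model, test_sentences)
```

Words never seen in training count as misses under this definition, which is one reason a model trained on a small sample scores low.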
Limitation: the model is currently constrained by limited hardware resources.
In future, state-of-the-art deep learning methods such as Transformers could be used.
Shiny App: The Shiny app was created using the Shiny library in R. The app is hosted on shinyapps.io, and the associated code is shared through the GitHub repository.
Sahil Sharma
PhD Student
Data Science for Tourism Research
Email: sahilsharmahimalaya@gmail.com
LinkedIn
Twitter