2022-08-19

R Markdown

The goal of this task is to create a prediction model: given some input text, the model predicts what the next word will be.

Prediction algorithms of this kind appear in everyday activities, for example when typing on your phone or doing a simple Google search.

Methodology and Limitations

Creating a prediction model involves analysing large data sets. Due to memory and time constraints, only some of the available data sets were analysed. The data sets used were:

- Blogs
- Twitter

Even then, the data was still very large for the current system to handle.

Initial work involved cleansing the data: removing extra spaces and non-alphabetical characters, as well as making everything lower case.
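The cleansing steps described above can be sketched in base R as follows. This is only an illustration under assumptions: the helper name `clean_text()` and the sample lines are hypothetical, and the exact rules used in the original project (for instance how apostrophes were treated) are not stated in this report.

```r
# Hypothetical helper: clean_text() is an illustrative name, not the project's own code.
clean_text <- function(lines) {
  lines <- tolower(lines)               # make everything lower case
  lines <- gsub("[^a-z ]", "", lines)   # remove non-alphabetical characters (spaces are kept)
  lines <- gsub(" +", " ", lines)       # collapse runs of spaces into a single space
  trimws(lines)                         # drop leading and trailing spaces
}

# Made-up sample input, for illustration only
sample_lines <- c("Hello,   World!! 123", "  It's 5pm: time to GO home ")
cleaned <- clean_text(sample_lines)
print(cleaned)
```

Note that removing characters outright joins contractions ("It's" becomes "its"); replacing them with a space instead would be an equally reasonable choice.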

Once the data was cleansed, n-grams were created, for example trigrams for the blogs data:

  • trigram_blogs <- ngram(lines, n=3);
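To make clear what a trigram frequency table contains, here is a minimal base-R sketch that counts trigrams in a few toy sentences. It is a stand-in for illustration, not the `ngram()` call shown above; the function name `count_ngrams()` and the example sentences are assumptions.

```r
# Hypothetical helper: count_ngrams() is an illustrative base-R stand-in,
# not the ngram() function used in the report.
count_ngrams <- function(lines, n = 3) {
  grams <- unlist(lapply(strsplit(lines, " +"), function(words) {
    words <- words[nzchar(words)]                  # drop empty tokens
    if (length(words) < n) return(character(0))    # line too short for an n-gram
    starts <- seq_len(length(words) - n + 1)
    # paste each window of n consecutive words into one string
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)            # most frequent n-grams first
}

# Toy input, already cleansed
trigram_counts <- count_ngrams(c("i want to go home", "i want to sleep"), n = 3)
print(trigram_counts)
```

N-grams are built per line here, so no trigram spans two separate lines of input.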

Shiny App

To use the Shiny App, enter text into the input box. The algorithm then suggests the most likely next word, based on how frequently each candidate word follows the entered text in the data samples.
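The frequency-based lookup can be sketched roughly as below. This is a simplified illustration under assumptions, not the app's actual code: the names `build_trigram_table()` and `predict_next()` are hypothetical, the training data is a toy example, and only trigrams are used (the last two words of the input select the candidates).

```r
# Hypothetical names throughout; a toy sketch of frequency-based next-word lookup.
build_trigram_table <- function(lines) {
  grams <- unlist(lapply(strsplit(lines, " +"), function(w) {
    w <- w[nzchar(w)]
    if (length(w) < 3) return(character(0))
    idx <- seq_len(length(w) - 2)
    paste(w[idx], w[idx + 1], w[idx + 2])          # all trigrams in this line
  }))
  counts <- table(grams)
  parts <- strsplit(names(counts), " ")
  data.frame(
    prefix    = vapply(parts, function(p) paste(p[1:2], collapse = " "), character(1)),
    next_word = vapply(parts, function(p) p[3], character(1)),
    freq      = as.vector(counts),
    stringsAsFactors = FALSE
  )
}

predict_next <- function(text, trigrams) {
  words <- strsplit(tolower(trimws(text)), " +")[[1]]
  if (length(words) < 2) return(NA_character_)     # need two words of context
  key  <- paste(tail(words, 2), collapse = " ")    # last two words entered
  hits <- trigrams[trigrams$prefix == key, ]
  if (nrow(hits) == 0) return(NA_character_)       # context never seen in the data
  hits$next_word[which.max(hits$freq)]             # most frequent continuation
}

# Toy training data, for illustration only
toy <- c("i want to go home", "i want to sleep", "you want to go out")
trigrams <- build_trigram_table(toy)
prediction <- predict_next("They want to", trigrams)
print(prediction)
```

A fuller version would typically fall back to bigram or single-word frequencies when the two-word context has not been seen, rather than returning `NA`.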

The Shiny App can be found at: https://niteshchampaneri.shinyapps.io/Prediction_Model/

Future Improvements

- Going forward, using all of the available data sets would enhance 
  the model and improve its predictions. Unfortunately, with current 
  resources this is not possible. 
- The app could be improved to show more than just frequency charts.