Language Model

Bruno Tavares

July 10, 2020

Introduction

These slides present some details of the app published at https://bbtavares.shinyapps.io/language_model/
It is a language model that predicts the next word from the last 2 words typed

Main Points

A language model predicts the next word based on the words that precede it
In our case, only the 2 preceding words are used
Our algorithm is based on the n-gram model
To return results faster, we limited our database to only 10 MB
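As a sketch of what an n-gram model with 2 words of context computes, the usual count-based estimate can be written as below. The notation is ours and is an assumption: the slides do not state a formula, and the app may rank candidates differently. Here C(.) is the number of times a word sequence occurs in the training text, and w_1, w_2 are the 2 words typed.

```latex
\hat{w}_3 = \arg\max_{w} P(w \mid w_1, w_2),
\qquad
P(w_3 \mid w_1, w_2) \approx \frac{C(w_1\, w_2\, w_3)}{C(w_1\, w_2)}
```

Because the denominator is the same for every candidate, picking the most probable word amounts to picking the 3-gram with the highest count among those that start with w_1 w_2.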

Some Details

We implemented the n-gram model with the tidytext package
Only the Twitter dataset was used, for performance reasons
A large data frame was generated with the 3-gram sequences
The app simply filters this data frame on the 1st and 2nd words
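The steps above (build a table of 3-grams, then filter it on the first 2 words) can be sketched as follows. This is a minimal illustration in plain Python, not the app's tidytext code: the function names, the tokenization rule, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (a deliberately simple rule)."""
    return re.findall(r"[a-z']+", text.lower())


def build_trigram_table(lines):
    """Count every 3-word sequence; sequences never span two lines."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        counts.update(zip(words, words[1:], words[2:]))
    return counts


def predict(table, word1, word2, k=3):
    """Filter the table on the 1st and 2nd words, then return the k most frequent 3rd words."""
    matches = Counter({
        w3: n
        for (w1, w2, w3), n in table.items()
        if (w1, w2) == (word1, word2)
    })
    return [word for word, _ in matches.most_common(k)]


# Hypothetical toy corpus standing in for the Twitter data
corpus = [
    "thanks for the follow",
    "thanks for the follow back",
    "thanks for the retweet",
    "happy for the team",
]
table = build_trigram_table(corpus)
print(predict(table, "for", "the", k=1))  # ['follow']
```

When no 3-gram starts with the 2 given words, the filter matches nothing and `predict` returns an empty list, which is the case the improvements below would need to handle.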

Improvements

We know that the prediction algorithm can be improved considerably
We can simply enlarge the dataset
We can combine it with other n-gram models
The text cleaning should be more robust
etc
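One possible way to combine n-gram models is a simple fallback: try the 3-gram context first, and if it was never seen, retry with the last word only, then with overall word frequency. The sketch below illustrates that idea only. It is not part of the published app, it applies no discounting or smoothing, and every name in it is hypothetical.

```python
from collections import Counter, defaultdict


def build_models(sentences, max_n=3):
    """For each order n = 1..max_n, map an (n-1)-word context to next-word counts."""
    models = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                models[n][context][words[i + n - 1]] += 1
    return models


def predict_with_backoff(models, context_words, max_n=3):
    """Try the longest context first and shorten it until some n-gram matches.

    Returns (predicted_word, order_used), or (None, 0) if the models are empty.
    """
    context = [w.lower() for w in context_words]
    for n in range(max_n, 0, -1):
        # The unigram model (n = 1) uses the empty context
        key = tuple(context[-(n - 1):]) if n > 1 else ()
        candidates = models[n].get(key)
        if candidates:
            return candidates.most_common(1)[0][0], n
    return None, 0


# Hypothetical toy corpus
sentences = [
    "i love new york",
    "i love new music",
    "i love new york pizza",
    "we visit new york",
]
models = build_models(sentences)
print(predict_with_backoff(models, ["love", "new"]))   # ('york', 3)
print(predict_with_backoff(models, ["brand", "new"]))  # ('york', 2)
```

The second call shows the fallback at work: "brand new" never appears in the corpus, so the prediction comes from the 2-gram model using "new" alone.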