Natural Language Word Prediction App

Iair Kleiman
August 2015

Coursera - JHU

App Introduction

The objective of this app is to let the user type a sentence while it continuously predicts the intended next word.

Prediction Algorithm Preprocessing

  1. Corpus Loading
  2. Corpus Sampling
  3. Corpus Cleaning (whitespace, punctuation, numbers, lowercasing)
  4. Construction of N-Grams (5-, 4-, 3-, 2- and 1-grams)
  5. N-Gram Size Tuning (minimum frequency filter)
  6. Conversion of N-Gram Frequencies into N-Gram Probabilities
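Steps 3 to 6 can be sketched in Python as below. This is a minimal illustration, not the app's R code. The regular expressions, the choice to keep apostrophes, and the maximum-likelihood estimate P(w | history) = count(history, w) / Σ count(history, ·) are assumptions, because the deck does not say how the probabilities were computed. The helper names `clean`, `count_ngrams`, `prune` and `to_probabilities` are hypothetical.

```python
import re
from collections import Counter, defaultdict

def clean(text):
    """Step 3: lowercase, drop numbers and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)      # numbers -> space
    text = re.sub(r"[^a-z' ]+", " ", text)   # punctuation and other symbols -> space (apostrophes kept)
    return " ".join(text.split())            # collapse runs of whitespace

def count_ngrams(lines, max_n=5):
    """Step 4: count every 1..max_n gram, never crossing a line boundary."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = clean(line).split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts

def prune(counts, min_freq=4):
    """Step 5: keep only N-grams seen at least min_freq times."""
    return {n: Counter({g: c for g, c in table.items() if c >= min_freq})
            for n, table in counts.items()}

def to_probabilities(counts):
    """Step 6: MLE estimate of P(last word | preceding words) for each order."""
    probs = {}
    for n, table in counts.items():
        totals = defaultdict(int)
        for gram, c in table.items():
            totals[gram[:-1]] += c   # for unigrams the history is (), so this is the total count
        probs[n] = {gram: c / totals[gram[:-1]] for gram, c in table.items()}
    return probs
```

Calling `to_probabilities(prune(count_ngrams(lines)))` normalizes over the N-grams that survive the frequency filter, so the probabilities for each history still sum to 1.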

Important Information

  • This app was developed entirely in R
  • The training sample consisted of 5% of the blogs and Twitter corpora and 25% of the news corpus
  • A minimum frequency of 4 was required for every N-Gram
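Sampling each corpus at its own fraction can be sketched as follows. The fractions come from the deck. The corpus keys, the seed, and the decision to draw an exact number of lines rather than flip a coin per line are hypothetical choices, since the deck does not specify them.

```python
import random

# Fractions from the deck; the corpus names are hypothetical keys.
SAMPLE_FRACTIONS = {"blogs": 0.05, "twitter": 0.05, "news": 0.25}

def sample_lines(lines, fraction, seed=2015):
    """Draw round(fraction * len(lines)) lines at random, keeping their original order.

    The fixed seed (a hypothetical choice) makes the sample reproducible.
    """
    rng = random.Random(seed)
    k = round(fraction * len(lines))
    keep = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in keep]

def build_training_sample(corpora, fractions=SAMPLE_FRACTIONS):
    """corpora maps a corpus name to its list of lines; returns one combined sample."""
    sample = []
    for name, lines in corpora.items():
        sample.extend(sample_lines(lines, fractions[name]))
    return sample
```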

Prediction Algorithm

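The prediction algorithm used is simple linear interpolation.

In this case it is a better method than Stupid Backoff: interpolation always combines evidence from every N-gram order, whereas Stupid Backoff only falls back to a lower order when the higher-order N-gram has not been seen.

It sums the weighted probabilities of a candidate word under each N-gram model, from 4-grams down to 1-grams: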

\[ P_{\text{interp}}(w) = \lambda_{4}P_{\text{4-gram}}(w) + \lambda_{3}P_{\text{3-gram}}(w) + \lambda_{2}P_{\text{2-gram}}(w) + \lambda_{1}P_{\text{1-gram}}(w) \]
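The formula translates directly into code. The λ values below are hypothetical placeholders because the deck does not publish its weights. They are chosen to sum to 1, as is standard for linear interpolation, so the result remains a probability. A candidate with no entry for some order contributes 0 for that term.

```python
# Hypothetical weights: the deck does not publish its lambda values.
# They sum to 1 so the interpolated score is still a probability.
LAMBDAS = {4: 0.4, 3: 0.3, 2: 0.2, 1: 0.1}

def interpolated_prob(p_by_order, lambdas=LAMBDAS):
    """P_interp(w) = sum over n of lambda_n * P_n(w); a missing order contributes 0."""
    return sum(lam * p_by_order.get(n, 0.0) for n, lam in lambdas.items())

def rank_candidates(candidates, lambdas=LAMBDAS, top=3):
    """candidates maps word -> {order: P_n(word | history)}; returns the best words first."""
    scored = {w: interpolated_prob(p, lambdas) for w, p in candidates.items()}
    return sorted(scored.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

Giving larger weights to higher orders favors longer contexts when they are available. A positive λ₁ keeps any word seen in training from scoring exactly zero.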

Example

For the phrase “Can you follow me please? It would mean the”, the top candidates for the next word are:

last_word  prob4  prob3  prob2   prob1      Prob_Interp
world      0.35   0.35   0.0325  0.0003426  0.7328427
kind       0.00   0.00   0.0250  0.0000414  0.0250414
climate    0.00   0.00   0.0175  0.0000035  0.0175035
time       0.00   0.00   0.0150  0.0002928  0.0152928
board      0.00   0.00   0.0150  0.0000472  0.0150472
horse      0.00   0.00   0.0150  0.0000109  0.0150109
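The arithmetic in the table can be checked with a few lines of Python. The rows are copied from the table. The check shows that each Prob_Interp equals the sum of its four prob columns to within rounding. This suggests, though the deck does not say so, that the prob columns already include their λ weights.

```python
# Rows copied from the example table: (prob4, prob3, prob2, prob1, Prob_Interp).
TABLE = {
    "world":   (0.35, 0.35, 0.0325, 0.0003426, 0.7328427),
    "kind":    (0.00, 0.00, 0.0250, 0.0000414, 0.0250414),
    "climate": (0.00, 0.00, 0.0175, 0.0000035, 0.0175035),
    "time":    (0.00, 0.00, 0.0150, 0.0002928, 0.0152928),
    "board":   (0.00, 0.00, 0.0150, 0.0000472, 0.0150472),
    "horse":   (0.00, 0.00, 0.0150, 0.0000109, 0.0150109),
}

def check_rows(table, tol=1e-6):
    """Return the words whose Prob_Interp differs from the sum of the four columns by more than tol."""
    return [w for w, (*terms, total) in table.items() if abs(sum(terms) - total) > tol]

# Candidates ordered by interpolated probability, best first.
ranking = sorted(TABLE, key=lambda w: TABLE[w][4], reverse=True)
```

An empty result from `check_rows(TABLE)` confirms the sums, and `ranking[0]` is the word the app would suggest.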

App Instructions

Open the app at https://ikleiman.shinyapps.io/Capstone2

  1. Type a phrase of any length, with or without numbers and/or punctuation
  2. Do not press the SPACEBAR until the predicted next word appears
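Because the phrase may contain numbers and punctuation, the input presumably has to be cleaned the same way as the training corpus before lookup. The sketch below is minimal. It assumes cleaning rules like those in the preprocessing steps and a history of at most three words, which is the longest history the 4-gram term in the interpolation formula needs. `context_words` is a hypothetical helper.

```python
import re

def context_words(phrase, max_history=3):
    """Normalize user input like the corpus and keep the last words as the N-gram history."""
    text = re.sub(r"[0-9]+", " ", phrase.lower())   # drop numbers
    text = re.sub(r"[^a-z' ]+", " ", text)          # drop punctuation (apostrophes kept)
    tokens = text.split()
    # tokens[-0:] would return the whole list, so max_history == 0 is handled explicitly
    return tuple(tokens[-max_history:]) if max_history else ()
```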