Capstone

Iair Kleiman
August 2015

Coursera - JHU

Prediction Algorithm Pre-Process

  1. Corpus Loading
  2. Corpus Sampling
  3. Corpus Cleaning (whitespace, punctuation, numbers, lowercasing)
  4. Construction of N-Grams (5-, 4-, 3-, 2- and 1-grams)
  5. N-Gram size tuning (filtering by minimum frequency)
  6. Construction of N-Gram probabilities from the N-Gram frequencies
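
The steps above can be sketched roughly as follows. This is a minimal Python illustration, not the code behind the app: the function names (`clean`, `ngram_counts`, `prune`, `ngram_probabilities`), the tiny corpus and the `min_count` threshold are all assumptions made for the example. Probabilities are assumed to be plain maximum-likelihood estimates (the count of an N-gram divided by the count of its prefix); the slides do not say which estimate was used.

```python
import re
from collections import Counter

# All names below are hypothetical helpers for illustration only.

def clean(text):
    """Lowercase, drop numbers and punctuation, collapse whitespace."""
    text = text.lower()
    text = re.sub(r"[0-9]+", " ", text)      # remove numbers
    text = re.sub(r"[^a-z'\s]", " ", text)   # remove punctuation (keeping apostrophes is an assumption)
    return text.split()                      # split() also collapses runs of whitespace

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def prune(counts, min_count):
    """Keep only N-grams seen at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}

def ngram_probabilities(counts, prefix_counts=None):
    """Maximum-likelihood estimate: count(w1..wn) / count(w1..wn-1).

    For unigrams (prefix_counts is None) the denominator is the total count.
    prefix_counts should be the unpruned counts of order n-1.
    """
    if prefix_counts is None:
        total = sum(counts.values())
        return {gram: c / total for gram, c in counts.items()}
    return {gram: c / prefix_counts[gram[:-1]] for gram, c in counts.items()}

corpus = "It would mean the world. It would mean the WORLD to me, 2 times!"
tokens = clean(corpus)

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)

kept_bigrams = prune(bigrams, min_count=2)            # drops bigrams seen only once
bigram_probs = ngram_probabilities(kept_bigrams, unigrams)
unigram_probs = ngram_probabilities(unigrams)

print(bigram_probs[("the", "world")])  # 1.0
```

Pruning shrinks the tables that the app has to hold in memory, at the cost of dropping rare continuations such as ("world", "to") in this toy corpus.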

Prediction Algorithm

The prediction algorithm used is Simple Interpolation.

In this case it is a better method than Stupid Backoff.

It sums the weighted probabilities of a candidate word under each N-Gram model (orders 1 to 4):

\[ P_{\text{interp}} = \lambda_{4}P(\text{4-gram}) + \lambda_{3}P(\text{3-gram}) + \lambda_{2}P(\text{2-gram}) + \lambda_{1}P(\text{1-gram}) \]
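
As a rough illustration of the formula, the sketch below combines the four model probabilities for a single candidate word. The function name `interpolate`, the weights and the input probabilities are all made up for the example; the slides do not state which λ values were used. When the weights sum to 1, the result is itself a valid probability.

```python
def interpolate(probs, lambdas):
    """Weighted sum of per-order probabilities.

    probs and lambdas are ordered from the 4-gram model down to the 1-gram model.
    """
    if len(probs) != len(lambdas):
        raise ValueError("need exactly one weight per N-gram order")
    return sum(lam * p for lam, p in zip(lambdas, probs))

# Hypothetical weights favouring the higher orders; they sum to 1.
lambdas = [0.4, 0.3, 0.2, 0.1]

# Made-up probabilities of one candidate under the 4-, 3-, 2- and 1-gram models.
probs = [0.5, 0.25, 0.1, 0.01]

score = interpolate(probs, lambdas)
print(round(score, 3))  # 0.296
```

Unlike a backoff scheme, which relies on the highest order that has a match, every order contributes to the score here, so a candidate unseen in the 4-gram table can still be ranked by its lower-order evidence.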

Example

For the phrase “Can you follow me please? It would mean the”, the top candidates for the next word are:

last_word  prob4  prob3  prob2   prob1      Prob_Interp
world      0.35   0.35   0.0325  0.0003426  0.7328427
kind       0.00   0.00   0.0250  0.0000414  0.0250414
climate    0.00   0.00   0.0175  0.0000035  0.0175035
time       0.00   0.00   0.0150  0.0002928  0.0152928
board      0.00   0.00   0.0150  0.0000472  0.0150472
horse      0.00   0.00   0.0150  0.0000109  0.0150109
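
The Prob_Interp column can be reproduced from the other four. In the rows shown it equals the plain sum prob4 + prob3 + prob2 + prob1, which corresponds to setting every λ to 1. This is an observation from the table values, not a statement of how the app sets its weights. With weights of 1 the result is best read as a ranking score rather than a normalized probability. A short Python check, with the rows copied from the table:

```python
# Rows copied from the example table:
# (last_word, prob4, prob3, prob2, prob1, Prob_Interp)
rows = [
    ("world",   0.35, 0.35, 0.0325, 0.0003426, 0.7328427),
    ("kind",    0.00, 0.00, 0.0250, 0.0000414, 0.0250414),
    ("climate", 0.00, 0.00, 0.0175, 0.0000035, 0.0175035),
    ("time",    0.00, 0.00, 0.0150, 0.0002928, 0.0152928),
    ("board",   0.00, 0.00, 0.0150, 0.0000472, 0.0150472),
    ("horse",   0.00, 0.00, 0.0150, 0.0000109, 0.0150109),
]

# Largest gap between the plain sum of the four probabilities and the
# reported Prob_Interp; the displayed values are rounded, so allow a tolerance.
max_gap = max(abs(sum(r[1:5]) - r[5]) for r in rows)

# The predicted word is the candidate with the highest interpolated score.
best = max(rows, key=lambda r: r[5])[0]

print(max_gap < 1e-6)  # True
print(best)            # world
```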

App Instructions

Follow the link https://ikleiman.shinyapps.io/Capstone

  1. Write a phrase of any length, with or without numbers and/or punctuation
  2. Click the Submit button