Predictive Model
- Approximately 10% of the data is sampled from each of the 3 files
- N-grams (1-, 2-, 3- and 4-grams) are generated and structured as a lookup table (data.table)
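
The two steps above can be sketched roughly as follows. This is an illustrative Python sketch, not the data.table code behind the model: the function names `sample_lines` and `build_ngram_table`, the whitespace tokenizer, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import Counter


def sample_lines(lines, fraction=0.10, seed=42):
    """Keep each line independently with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]


def build_ngram_table(lines, max_n=4):
    """Return a dict mapping n -> Counter of n-gram tuples, for n = 1..max_n."""
    table = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = line.lower().split()  # naive whitespace tokenizer (assumption)
        for n in range(1, max_n + 1):
            # Slide a window of width n over the tokens of this line.
            for i in range(len(tokens) - n + 1):
                table[n][tuple(tokens[i:i + n])] += 1
    return table
```

Keeping one counter per n-gram order mirrors the idea of a keyed lookup table: the n-gram is the key and its count is the value.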
\[ P(\text{sweet} \mid \text{sun is shining the weather is}) \approx P(\text{sweet} \mid \text{the weather is}) \]
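
The approximation says that, with n-grams up to order 4, only the last three words of the input are used as context for predicting the next word. A minimal Python sketch of such a lookup is given below. The names `build_lookup` and `predict_next` are hypothetical, and falling back to a shorter context when the three-word context was never seen is an assumption of this sketch, not something stated above.

```python
from collections import Counter, defaultdict


def build_lookup(sentences, max_n=4):
    """Map each context tuple (0 to max_n-1 words) to a Counter of next words."""
    lookup = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer (assumption)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                # Split each n-gram into its leading context and its final word.
                *context, next_word = tokens[i:i + n]
                lookup[tuple(context)][next_word] += 1
    return lookup


def predict_next(lookup, text, max_n=4):
    """Predict the next word from at most the last max_n-1 words of `text`."""
    tokens = text.lower().split()
    # Truncate the history: earlier words are ignored, as in the approximation.
    context = tokens[-(max_n - 1):] if max_n > 1 else []
    while True:
        candidates = lookup.get(tuple(context))
        if candidates:
            return candidates.most_common(1)[0][0]
        if not context:
            return None  # nothing known, not even unigram counts
        context = context[1:]  # assumed fallback: drop the oldest context word
```

With this sketch, the inputs "sun is shining the weather is" and "the weather is" reach the same table entry, so they yield the same prediction.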