Text Forecast

wraphaeljr
January 18, 2016

What It Does

The “Text Forecast” application takes a user's input and attempts to identify which word is likely to come next.

The main prediction is set apart
- Alternatives are listed below
Two metrics are provided to help indicate the strength of the prediction.
- The PropIndex is listed for the strongest prediction (max. score is 91)
- The LocalScore is provided for every possibility (max. score is 100)

How the App Works

The use of this application is fairly straightforward:

The user types in a phrase and hits the “Predict” button.
- When the page is first loaded, the application tries predicting the first word, which is unoriginally the word “the”. I considered removing this feature, but it made me chuckle and I realized that technically that's a great prediction for the first word of a sentence.
- The user can adjust the maximum number of predictions displayed. By default, it's set to ten.
The algorithm that's used (more on that on the next slide) is pretty simple, so the results are returned almost instantly.

The Prediction Algorithm

It was difficult to find a satisfying algorithm for this task. After many, many hours and many mistakes, I managed to extract unigrams, bigrams, trigrams, quadgrams, and quintgrams from blogs, Twitter, and the news. I lopped off the final word of each term and applied a Markov (“memoryless”) assumption: the most critical information is contained in the last few words.
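To make the n-gram step concrete, here is a minimal sketch in base R. It is not the app's actual code: the helper name build_ngram_table, the column names, and the two-line toy corpus are all assumptions made for illustration. It counts n-grams and splits each one into its leading words (the lookup key) and its final word (the candidate prediction).

```r
# Hypothetical helper (not from the app): count the n-grams in a character
# vector and split each into a prefix (all words but the last) and the
# final word.
build_ngram_table <- function(lines, n) {
  # Lower-case, then split on anything that is not a letter or apostrophe.
  tokens <- strsplit(tolower(lines), "[^a-z']+")
  grams <- unlist(lapply(tokens, function(words) {
    words <- words[words != ""]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(prefix = character(0), nextword = character(0),
                      count = integer(0), stringsAsFactors = FALSE))
  }
  counts <- table(grams)
  gram_words <- strsplit(names(counts), " ", fixed = TRUE)
  data.frame(
    prefix = vapply(gram_words,
                    function(w) paste(w[-n], collapse = " "), character(1)),
    nextword = vapply(gram_words, function(w) w[n], character(1)),
    count = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Toy corpus, invented for this example.
corpus <- c("The cat sat on the mat.", "The cat ate the fish.")
bigrams <- build_ngram_table(corpus, 2)
```

In this toy table, the row with prefix "the" and nextword "cat" has a count of 2, because that bigram appears once in each line. A real corpus would need far more careful cleaning than the single regular expression used here.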

The algorithm basically cleans the text input, subsets it, and then uses multi-level if-else structures to search through different tables for the most popular matches. The tables each have different restrictions and variables to try to home in on the best matches, but some bad ones still get through.

clean2(clean1(input)) %>% if_elif() %>% subf2() %>% if_elif2() %>% sub_filt() %>% ... #etc
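The general idea of searching the tables from longest match to shortest can be sketched as a simple back-off lookup. This is an illustration under stated assumptions, not the app's implementation: clean_input, predict_next, the table layout (prefix, nextword, count), and the toy counts are all hypothetical, and the real tables apply extra restrictions that are not shown here.

```r
# Hypothetical cleaning step: lower-case and keep only letters/apostrophes.
clean_input <- function(text) {
  words <- strsplit(tolower(text), "[^a-z']+")[[1]]
  words[words != ""]
}

# Hypothetical back-off lookup. tables[[n]] is assumed to hold n-grams as a
# data frame with columns prefix, nextword, and count.
predict_next <- function(text, tables, max_hits = 10) {
  words <- clean_input(text)
  # Try the longest n-grams first, then fall back to shorter ones.
  for (n in rev(seq_along(tables))) {
    k <- n - 1                       # number of prefix words needed
    if (length(words) < k) next
    key <- paste(tail(words, k), collapse = " ")
    hits <- tables[[n]][tables[[n]]$prefix == key, ]
    if (nrow(hits) > 0) {
      hits <- hits[order(-hits$count), ]
      return(head(hits$nextword, max_hits))
    }
  }
  character(0)
}

# Toy tables with invented counts: unigrams, bigrams, trigrams.
tables <- list(
  data.frame(prefix = "", nextword = c("the", "cat", "sat"),
             count = c(5L, 2L, 1L), stringsAsFactors = FALSE),
  data.frame(prefix = c("the", "the", "cat"),
             nextword = c("cat", "mat", "sat"),
             count = c(2L, 1L, 1L), stringsAsFactors = FALSE),
  data.frame(prefix = "the cat", nextword = "sat",
             count = 1L, stringsAsFactors = FALSE)
)

predict_next("I saw the cat", tables)   # matches the trigram table
predict_next("Look at the", tables)     # backs off to the bigram table
predict_next("", tables, max_hits = 1)  # backs off to the unigram table
```

With empty input, the lookup falls all the way back to the most frequent unigram, which in this toy table is “the”. The max_hits argument plays the role of the maximum number of predictions displayed.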

Further Development

The general design of this application seems well suited to expansion, and there are a lot of different directions a person could take it. I would be curious to see how the algorithm and the application could be made more accurate, and I would love to see a more sophisticated model implemented. There are a lot of NLP utilities out there, and I think that with more experience, some really impressive results could be achieved.

I also think it would be cool to try to harness the behavior of the user to help weight the different predictions in the future.
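One possible way to do that, sketched here purely as an assumption about how it might work, is to boost each candidate's corpus count by how often the user has chosen that word before. The function name rerank_with_history, the boost parameter, and the scoring formula are all hypothetical.

```r
# Hypothetical re-ranking step: hits is a data frame of candidate words
# with corpus counts; user_counts is a named numeric vector recording how
# often this user previously picked each word.
rerank_with_history <- function(hits, user_counts, boost = 1) {
  seen <- unname(user_counts[hits$nextword])  # NA for words never chosen
  seen[is.na(seen)] <- 0
  hits$score <- hits$count * (1 + boost * seen)
  hits[order(-hits$score), ]
}

# Invented example data.
hits <- data.frame(nextword = c("cat", "mat"), count = c(2, 1),
                   stringsAsFactors = FALSE)
user_counts <- c(mat = 3)
reranked <- rerank_with_history(hits, user_counts)
```

In this example “mat” moves ahead of “cat” because the user picked it three times, giving it a score of 4 against 2. How strongly to weight user history is an open design choice; the multiplicative boost here is only one option.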

Thank you for taking the time to check out the app! :-D
Have a great day!