wraphaeljr
January 18, 2016
The “Text Forecast” application takes a user's input and attempts to predict which word is most likely to come next.
Using this application is fairly straightforward:
The user types in a phrase and clicks the “Predict” button.
The algorithm it uses (more on that in the next slide) is fairly simple, so results are returned almost instantly.
Finding a satisfying algorithm for this task was difficult. After many hours, and many mistakes, I managed to extract unigrams, bigrams, trigrams, quadgrams, and quintgrams from blog, Twitter, and news text. I split off the final word of each n-gram (the word to be predicted) from the words before it, and applied a Markov (“memoryless”) assumption: that the most critical information for predicting the next word is contained in the last few words.
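As a rough illustration of the n-gram step, here is a minimal base-R sketch, assuming a simple regex tokenizer and a small in-memory corpus. `tokenize()` and `build_ngram_table()` are hypothetical names, not the app's actual code. Each n-gram is split into a prefix (its first n - 1 words) and the final word it would predict, which is what lets the Markov assumption work: only the last few words are used as context.

```r
# Hypothetical helper: lowercase the text and split it into word tokens.
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]  # drop empty strings left by leading punctuation
}

# Hypothetical helper: build an n-gram table (n >= 2). Each n-gram is split
# into its first n - 1 words (the lookup "prefix") and its final word (the
# candidate prediction); identical prefix/next-word pairs are counted.
build_ngram_table <- function(words, n) {
  stopifnot(n >= 2)
  if (length(words) < n) {
    return(data.frame(prefix = character(0), next_word = character(0),
                      count = integer(0), stringsAsFactors = FALSE))
  }
  idx <- seq_len(length(words) - n + 1)
  prefix <- vapply(idx, function(i) paste(words[i:(i + n - 2)], collapse = " "),
                   character(1))
  next_word <- words[idx + n - 1]
  counts <- aggregate(count ~ prefix + next_word,
                      data = data.frame(prefix, next_word, count = 1L,
                                        stringsAsFactors = FALSE),
                      FUN = sum)
  # Most frequent continuations first
  counts <- counts[order(-counts$count, counts$prefix, counts$next_word), ]
  rownames(counts) <- NULL
  counts
}

# Toy usage: bigram through quintgram tables plus unigram counts
corpus <- "I want to go home. I want to eat. I want to go out."
words <- tokenize(corpus)
tables <- lapply(2:5, function(n) build_ngram_table(words, n))
names(tables) <- 2:5
unigrams <- sort(table(words), decreasing = TRUE)
head(tables[["3"]], 3)
```

For simplicity this tokenizer lets n-grams run across sentence boundaries (for example "home i want"); a fuller pipeline would likely split on sentences first and run over a much larger corpus.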
The algorithm cleans the text input, subsets it, and then uses multi-level if-else structures to search different tables for the most popular matches. Each table has its own restrictions and variables to help home in on the best matches, but some bad ones still get through.
clean2(clean1(input)) %>% if_elif() %>% subf2() %>% if_elif2() %>% sub_filt() %>% ... #etc
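The cleaning and table-search steps above can be approximated with a simple backoff lookup. This is a minimal sketch under stated assumptions, not the app's actual if-else code: `clean_input()` and `predict_next()` are hypothetical names, each table is assumed to be a data frame with `prefix`, `next_word`, and `count` columns keyed by n, and the counts are made-up toy values. The function tries the longest table first, using the last n - 1 input words as the prefix, then falls back to shorter tables and finally to the most common unigrams.

```r
# Hypothetical input cleaner: lowercase, keep letters/apostrophes, split into words.
clean_input <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]
}

# Backoff search: longest n-gram table first, then shorter ones, then unigrams.
predict_next <- function(input, tables, unigrams, k = 3) {
  words <- clean_input(input)
  for (n in sort(as.integer(names(tables)), decreasing = TRUE)) {
    if (length(words) < n - 1) next          # not enough context for this table
    prefix <- paste(tail(words, n - 1), collapse = " ")
    tbl <- tables[[as.character(n)]]
    hits <- tbl[tbl$prefix == prefix, ]
    if (nrow(hits) > 0) {
      hits <- hits[order(-hits$count), ]     # most popular match first
      return(head(hits$next_word, k))
    }
  }
  head(names(unigrams), k)                   # last resort: most common words
}

# Toy tables (made-up counts, for illustration only)
tables <- list(
  "3" = data.frame(prefix = c("want to", "want to"), next_word = c("go", "eat"),
                   count = c(2L, 1L), stringsAsFactors = FALSE),
  "2" = data.frame(prefix = c("to", "to", "i"), next_word = c("go", "eat", "want"),
                   count = c(2L, 1L, 3L), stringsAsFactors = FALSE)
)
unigrams <- c(i = 3L, want = 3L, to = 3L, go = 2L)

predict_next("I really want to", tables, unigrams)
predict_next("hello", tables, unigrams)
```

Checking longer contexts first favors specific matches, while the unigram fallback guarantees that some prediction is always returned, even if it is just a frequent word.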
The general design of this application lends itself to expansion, and I think there are many directions a person could take it. I would be curious to see how the algorithm and the application could be extended to be more accurate. As I mentioned before, I would love to see a more sophisticated model implemented. There are many NLP utilities out there, and I think that with more experience, some really impressive results could be achieved.
In the future, I think it would also be cool to use the user's behavior to help weight the different predictions.
Thank you for taking the time to check out the app! :-D
Have a great day!