26/10/2020

Describing the App

Predictive text shows users the words or phrases they are likely to type next, letting them write sentences with fewer taps. This kind of feature is commonly used by search engines and text messaging services. Our app suggests three options for the next word in a sentence.

Our model identifies multi-word expressions in a collection of texts made up of Twitter messages, blogs and news publications, and matches them against the phrase a user is typing. Still at a development stage, our app has provided accurate suggestions 15% of the time. Its accuracy could increase in the future by customizing the collection of common expressions to include a user’s own expressions, and by expanding the app to handle longer multi-word expressions.

Using the App

Our app is available at:

https://guar.shinyapps.io/PredictiveText/

To use the app, enter an incomplete phrase in the text box under the “Enter your phrase here” header and click the “Get Suggestions” button. Three suggested phrases will be displayed in the panel to the right, under the “Suggested phrases” header.

What’s going on in the App

Once the user has provided the input, the app searches a database populated with multi-word expressions. The database was built using an algorithm provided by the quanteda package (Benoit et al., 2018) to identify multi-word expressions. We then split each expression, separating its last word from the rest. This gives a “history” field holding the first (n-1) words of each n-word expression, and a “nextword” field holding its last word. We kept the three most frequent “nextword” values for each “history”.
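
The split into “history” and “nextword” can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the app's actual code; the function name build_table, the sample expressions and their counts are all hypothetical, and it assumes the multi-word expressions and their frequencies have already been extracted.

```python
from collections import defaultdict

def build_table(expressions, top_k=3):
    """Map each history (all words but the last) to its top_k most frequent next words.

    `expressions` is a dict of {multi-word expression: frequency}.
    """
    counts = defaultdict(dict)
    for expression, frequency in expressions.items():
        words = expression.split()
        if len(words) < 2:
            continue  # a single word has no history to split off
        history, nextword = " ".join(words[:-1]), words[-1]
        counts[history][nextword] = counts[history].get(nextword, 0) + frequency

    table = {}
    for history, nextwords in counts.items():
        # Sort by descending frequency; break ties alphabetically for stable output.
        ranked = sorted(nextwords.items(), key=lambda item: (-item[1], item[0]))
        table[history] = [word for word, _ in ranked[:top_k]]
    return table

# Hypothetical expression counts, for illustration only.
expressions = {
    "thanks for the": 50,
    "thanks for your": 30,
    "thanks for following": 20,
    "thanks for being": 10,
    "for the first": 40,
}
table = build_table(expressions)
print(table["thanks for"])  # ['the', 'your', 'following']
```

Here “thanks for being” is dropped because only the three most frequent next words are kept for each history.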

When a “history” record does not provide three different next words, the algorithm looks for a match using the input phrase without its first word. If options are still missing after this search, the algorithm also drops the second word of the original input phrase and searches the database for this shorter “history”. This process is repeated until three suggested words are found.
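
The shortening search described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the app's implementation: the function name suggest and the lookup table are hypothetical, and the sketch assumes the search simply stops when the input phrase runs out of words, a case the description above does not cover.

```python
def suggest(phrase, table, wanted=3):
    """Collect up to `wanted` next-word suggestions, shortening the history as needed.

    `table` maps a history string to a list of next words, most frequent first.
    """
    words = phrase.split()
    suggestions = []
    while words and len(suggestions) < wanted:
        for candidate in table.get(" ".join(words), []):
            # Skip duplicates already found with a longer history.
            if candidate not in suggestions and len(suggestions) < wanted:
                suggestions.append(candidate)
        words = words[1:]  # drop the leading word and try the shorter history
    return suggestions

# Hypothetical lookup table, for illustration only.
table = {
    "thanks for the": ["follow"],
    "for the": ["first", "follow", "last"],
    "the": ["same", "first", "best"],
}
print(suggest("thanks for the", table))  # ['follow', 'first', 'last']
```

Suggestions found with a longer history come first, on the assumption that a longer match is more specific to what the user is typing.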

References

Benoit K, Watanabe K, Wang H, Nulty P, Obeng A, Müller S, Matsuo A (2018). “quanteda: An R package for the quantitative analysis of textual data.” Journal of Open Source Software, 3(30), 774. doi: 10.21105/joss.00774, https://quanteda.io.