Text-as-Data
- How to Quantify Text & High Dimensionality
- Bag of Words Model
- Text-as-Data model
tidytext
- Introduction to tidytext
- Dictionary-based Sentiment Analysis
- Word Counts, TF-IDF
November 16, 2017
The Problem of High Dimensionality
"A sample of 30-word Twitter messages that use only the 1,000 most common words in the English language, for example, has roughly as many dimensions as there are atoms in the universe."
Credit: Chris Manning
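To make the scale in the quote concrete, the short Python sketch below counts the distinct 30-word messages that can be built from a 1,000-word vocabulary. It assumes that "dimensions" means the number of distinct possible messages when word order matters. The second count discards word order, as a bag-of-words model does. That count is an added illustration and is not part of the quote.

```python
import math

VOCAB_SIZE = 1_000   # the 1,000 most common English words (from the quote)
MESSAGE_LENGTH = 30  # words per message (from the quote)

# Ordered messages: each of the 30 positions can hold any of the 1,000 words,
# so there are 1000^30 = 10^90 possible sequences.
n_sequences = VOCAB_SIZE ** MESSAGE_LENGTH

# Bag of words (illustrative assumption): ignore order and keep only counts,
# i.e. count multisets of size 30 drawn from 1,000 word types: C(V + L - 1, L).
n_bags = math.comb(VOCAB_SIZE + MESSAGE_LENGTH - 1, MESSAGE_LENGTH)

print(f"ordered messages: about 10^{math.log10(n_sequences):.0f}")
print(f"bags of words:    about 10^{math.log10(n_bags):.0f}")
```

Dropping word order shrinks the count by many orders of magnitude, but the result is still astronomically large. This is why text-as-data models rely on further simplifications and validation rather than on enumerating the raw space.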
All quantitative models of language are wrong – but some are useful.
Quantitative methods for text amplify human abilities; they do not replace them.
There is no globally best method for text analysis.
Validate, validate, validate.
Grimmer and Stewart, 2013
Quinn et al., 2010