Introduction to Text as Data

Blog post 1, describing the data that I am interested in working on as part of the course “Text as Data”

Rahul Gundeti (Graduate student, Data Analytics & Computational Social Sciences (DACSS), UMass Amherst)
2022-05-03

The growing importance of data can’t be ignored given the rate at which it is being generated today, and technological advancements all point towards its potential to drive the change that everyone wants. These ubiquitous transformations underline how much we rely on information and what value it holds in the 21st century, a.k.a. the digital age of human evolution. The phrase “Data is the new oil” describes something no one can deny, however badly they might want to. That’s where the digital revolution has headed.

With rapid technological advancements on all fronts and in all industries, one thing that remains constant, and that everyone considers vital these days, is the data they can gather by any means. Data is just random information if you don’t know what to do with it, and technically speaking there are many kinds of information being collected. Of all the types, textual information is considered especially important, and many people are working to develop methods to synthesize this information and make it understandable to machines as naturally as humans process it. But why?

One possible answer is “communication.” Communication is the key. Machines communicate in binary, unlike us humans! If we succeed in making machines understand our language and interpret its meaning as effectively as we do, then the applications of that technology are limitless. It will make everyone’s life easier, allowing them to concentrate more on the things that matter to them while leaving the rest to machines. To establish that communication at scale, we first need to understand text as data and then form a story around the data, because in the end all data helps us create a compelling story that adds meaning to it.

While all this sounds okay, it struck me when I learnt about creating stories from textual data, because I had always perceived data to be numbers. I imagine many of you are like me: an Excel file with numbers comes to mind when I hear the word data. Suddenly it occurred to me: what if I could gain insights from stories? I decided to give it a try with my idea to analyse stories from the website “Humans of New York (HONY),” where Brandon Stanton demystifies lives on the streets of New York one story at a time.