Framing my research questions on the HONY dataset

Blog post 2, describing the collection of the HONY dataset and framing my research questions as part of the course “Text as Data”

Rahul Gundeti (Graduate student, Data Analytics & Computational Social Sciences (DACSS), UMass Amherst.)
2022-05-03

What do I want to find out from the HONY dataset?

My approach to the Humans of New York (HONY) dataset is fairly simple. I want to see what influence the HONY platform has on its readers. I am trying to understand the intent behind the stories, how readers receive them, and what happens in between. This can be achieved by answering the following questions:

Methodology:

The primary place to read all the stories is the HONY website. Snippets of these stories are shared across social media platforms with a link back to the website. Another source is a published book of curated HONY stories.

I have decided to consider two sources for conducting my analysis:

The first source is the website, which is essentially a one-way medium: I can get all the published stories and extract the sentiment from them.

Twitter is my secondary source. It is a two-way medium, where I can collect not only the sentiment of the stories but also readers’ responses to them and how the stories influence those readers.

Comparing the sentiment across the two data sources can help me identify the overall influence of the HONY project.
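To make the comparison concrete, here is a minimal sketch in Python. It assumes a tiny hand-made word list (`POSITIVE` and `NEGATIVE` are hypothetical stand-ins for a real sentiment dictionary) and made-up example texts. It only illustrates the idea of scoring each source and looking at the difference, and is not the method this project will necessarily end up using.

```python
import re

# Hypothetical toy lexicon; a real analysis would use a full sentiment dictionary.
POSITIVE = {"love", "hope", "happy", "grateful", "proud"}
NEGATIVE = {"sad", "lost", "afraid", "angry", "alone"}


def sentiment_score(text):
    """Return (positive words - negative words) / total words for one text."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)


def mean_sentiment(texts):
    """Average the per-text scores; an empty collection scores 0.0."""
    scores = [sentiment_score(text) for text in texts]
    return sum(scores) / len(scores) if scores else 0.0


# Made-up examples standing in for the two sources.
website_stories = ["I lost my job and felt alone.", "My daughter gives me hope."]
twitter_replies = ["So proud of her!", "This made me happy and grateful."]

# A positive gap means the replies read as more positive than the stories.
gap = mean_sentiment(twitter_replies) - mean_sentiment(website_stories)
```

Dividing by the word count keeps long stories and short replies on a comparable scale, which matters when one source is full stories and the other is tweets.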

The next step of my project, and one of the most important steps when dealing with data, is data collection. There are many ways to collect data; here I use web scraping to collect the stories from the website.

HONY data collection

Website

I tried to scrape the website using R code, but it wasn’t successful due to some technical constraints. Instead, I used an online extension called Open Web Scraper to help me with the scraping. With this tool I was able to scrape 1607 stories from the HONY website and save them as a txt file.
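Once the stories are in a txt file, they need to be read back in as separate documents. The sketch below is in Python and assumes that stories are separated by blank lines and that the file is called `hony_stories.txt`. Both are assumptions for illustration, since the actual export layout depends on how the scraper was configured.

```python
import re
from pathlib import Path


def split_stories(raw_text):
    """Split raw text into stories on blank lines, dropping empty chunks."""
    chunks = re.split(r"\n\s*\n", raw_text)
    return [chunk.strip() for chunk in chunks if chunk.strip()]


def load_stories(path):
    """Read a UTF-8 text file and return its stories as a list of strings."""
    return split_stories(Path(path).read_text(encoding="utf-8"))


# Small in-memory example so the sketch runs without the real file.
sample = "First story.\n\nSecond story,\nwith two lines.\n\n\nThird story.\n"
stories = split_stories(sample)

# With the real export (hypothetical file name):
# stories = load_stories("hony_stories.txt")
```

A quick sanity check after loading is to compare `len(stories)` with the number of stories the scraper reported.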

Twitter

Twitter data can be collected with the help of the Twitter API. Sign up for a Twitter developer account, fill in the form with the requested details, and wait about a day for the account to be activated. Can’t wait that long? No problem: start the project with the basic Twitter developer account.
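Once the account is active, it issues a bearer token that goes into every request. As a rough sketch in Python, a request to the Twitter API v2 recent search endpoint could be built as follows. The `humansofny` handle, the chosen fields, and the placeholder token are assumptions for illustration, and the endpoint and access levels should be checked against the current API documentation before relying on them. The code only builds the request and does not send it.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Recent search endpoint of the Twitter API v2 (verify against current docs).
SEARCH_URL = "https://api.twitter.com/2/tweets/search/recent"


def build_search_request(query, bearer_token, max_results=10):
    """Build (but do not send) an authenticated recent-search request."""
    params = urlencode({
        "query": query,
        "max_results": max_results,
        "tweet.fields": "created_at,public_metrics,conversation_id",
    })
    return Request(
        f"{SEARCH_URL}?{params}",
        headers={"Authorization": f"Bearer {bearer_token}"},
    )


# Assumed handle and placeholder token; replace both with real values.
request = build_search_request("from:humansofny -is:retweet", "YOUR_BEARER_TOKEN")

# To actually send it:
# import json, urllib.request
# with urllib.request.urlopen(request) as response:
#     tweets = json.load(response)
```

Asking for `conversation_id` is useful here because it lets the replies to a story be grouped with the tweet that started the thread.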