Data and Methods

For this project I use a dataset created by Farris et al. It compiles over 14,000 human rights texts from four sources: Amnesty International, Human Rights Watch, the Lawyers Committee for Human Rights, and the US Department of State. Media narratives are fragile, and there may be no single source of information we can fully trust. These organizations, however, have access to situations on the ground, are known for their work around the globe, and have an audience of readers who want information beyond the daily news. Because this is my first time exploring this topic, I have reduced the data to half of the articles published between 2000 and 2015, the last year covered by the corpus. The authors have already created term matrices and corpora that are easily accessible to a general audience. Even so, I will carry out the pre-processing steps in my own code in order to gain a better understanding of how the coding is done.
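To make the year restriction and pre-processing described above concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the records, the field names (`source`, `year`, `text`), the small stopword list, and the helper functions `in_study_window` and `preprocess` are hypothetical stand-ins, not the layout of the Farris et al. files or the steps the authors used.

```python
import re
from collections import Counter

# Hypothetical stand-in records; the real corpus layout and field names may differ.
documents = [
    {"source": "Amnesty International", "year": 1998,
     "text": "Reports of torture in detention."},
    {"source": "Human Rights Watch", "year": 2004,
     "text": "The government detained journalists and activists."},
    {"source": "US Department of State", "year": 2015,
     "text": "Activists reported arbitrary detention by the police."},
]

# Tiny illustrative stopword list (an assumption, not the authors' list).
STOPWORDS = {"the", "of", "in", "and", "by", "a", "an", "to"}


def in_study_window(doc, start=2000, end=2015):
    """Keep only documents published within the study window (inclusive)."""
    return start <= doc["year"] <= end


def preprocess(text):
    """Lowercase, keep alphabetic tokens only, and drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


# Restrict to 2000-2015, then count terms per document.
subset = [d for d in documents if in_study_window(d)]
term_counts = [Counter(preprocess(d["text"])) for d in subset]

# Build a simple document-term matrix: one row per document, one column per term.
vocabulary = sorted(set().union(*term_counts))
doc_term_matrix = [[counts[term] for term in vocabulary] for counts in term_counts]
```

Each row of `doc_term_matrix` corresponds to one retained document and each column to one term in `vocabulary`, which is the same general shape as the term matrices the authors provide.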

The data can be found at:

https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/IAH8OY