---
title: "Notebook Performing an Invisibility Spell"
output: html_notebook
---

# Welcome!
Welcome to the notebook accompanying the research project "Performing an
Invisibility Spell". This notebook adds transparency to the project by showing
the code and procedures that produced several of the data visualizations used in
the paper. For the operations carried out in Cortext, kindly refer to the link
provided in the paper. To make this notebook as transparent as possible, each
snippet of code is explained, either in the text before it or in comments next
to the code (comments are introduced by a hash sign, `#`).

# Introductory Steps
We start by loading the "tidyverse" package, which will aid us throughout
the project. It provides the pipe operator `%>%` and the `ggplot` function that
we will use to produce graphs. To search for particular occurrences in our
corpus we will use `grepl`, which is part of base R.

```{r}
library(tidyverse)
```
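
As a minimal sketch of what these two tools do, the chunk below uses a made-up vector (`numbers`) and made-up strings that are not part of our corpus. It shows that a piped call gives the same result as the equivalent nested call, and that `grepl` returns `TRUE` or `FALSE` for each string it checks.

```{r}
library(tidyverse)

# A small vector used only for illustration (not part of the corpus)
numbers <- c(4, 9, 16)

# Nested call: read from the inside out
nested_result <- round(mean(sqrt(numbers)), 1)

# Piped call: read from left to right
piped_result <- numbers %>% sqrt() %>% mean() %>% round(1)

# Both ways of writing the call give the same value
identical(nested_result, piped_result)

# grepl() checks each string for a pattern and returns TRUE or FALSE
flood_mentions <- grepl("flood", c("kerala floods", "monsoon season"))
flood_mentions
```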

Next, we tell R to read our 252 articles, which are stored as a table in
tab-separated values (TSV) format. Please note that this file is a
simplification of the results gathered from Factiva. It does not contain some
of the rich metadata indexed by Factiva (e.g. regions mentioned in the article)
simply because we will not use them in this exploration. Furthermore, to make
our visualizations readable, I re-coded the column "Publicatio" (the
publication date of a given article) to hold the month of publication (format:
Year/Month) rather than the exact day.
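
The re-coding itself was done before this notebook and is not shown here. As a hedged sketch of how such a re-coding could be done in R, the chunk below assumes day-level dates written as `YYYY-MM-DD`. Both that format and the column name `full_date` are assumptions made for illustration, not a description of the original Factiva export.

```{r}
library(tidyverse)

# Hypothetical day-level dates; the real export format is not documented here
toy_dates <- tibble(full_date = c("2019-08-15", "2019-09-02", "2020-05-11"))

# Convert each date to the Year/Month format used in the column "Publicatio"
toy_dates <- toy_dates %>%
  mutate(Publicatio = format(as.Date(full_date), "%Y/%m"))

toy_dates$Publicatio
```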

```{r}
article_by_month = read_tsv("252good.tsv")

```
The next command allows us to manually explore our corpus.

```{r}
view(article_by_month)
```

We can see that our corpus contains, for each article, its headline (column
"Headline"), its full text (column "article_fu") and its publication date
(column "Publicatio").

# Basic Exploration of our corpus: Frequencies of Publication

The next piece of code will produce a bar chart showing the frequency of our
articles over time.

```{r}
ggplot(article_by_month)+ #We are asking the ggplot function to take our corpus as the data to graph
  aes(Publicatio) + #The X axis should be the publication date
  geom_bar(colour = 'green', alpha=0.2) + #We want a bar chart with green outlines; alpha=0.2 makes the fill of the bars mostly transparent
  labs(x="Month of Publication", y="Number of Articles", title = "Articles talking about migration and the Kerala Floods/Landslides (August 2019-May 2020)") #These are the labels we want
```

In the graph above, we can see that the media published articles on human
migration in the context of the Kerala floods and landslides of 2019 chiefly in
August and September 2019, the months during and immediately after the
disaster. Indeed, most of our articles belong to this time period (around 200).
The publication of articles on human mobility and the Kerala floods/landslides
never died off. However, it decreased from October 2019 onwards and then surged
in March 2020, a surge that continued throughout April 2020. Lastly, the count
for May 2020 should be taken with a pinch of salt, since we only have data for
articles published in the first 11 days of May (hence, it makes sense for that
month to contain fewer articles).
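
The heights of the bars can also be read as numbers. The chunk below is a small sketch on a made-up table (`toy_corpus`, four invented rows, not our data) showing how `count()` returns the number of articles per month, which is what the bar chart draws.

```{r}
library(tidyverse)

# Toy stand-in for the corpus: one row per article, with its month of publication
toy_corpus <- tibble(Publicatio = c("2019/08", "2019/08", "2019/09", "2020/03"))

# count() returns one row per month, with the number of articles in column n
monthly_counts <- toy_corpus %>% count(Publicatio)
monthly_counts
```

Running the same `count(Publicatio)` on `article_by_month` would give the exact monthly figures behind the chart.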

Now, just to visualize our data differently, let's create a new graph. This time
we will ignore the limited data we have for May 2020.

We start by selecting the articles published from August 2019 to April 2020.

```{r}
articles_August_April = article_by_month %>%
  select(Publicatio) %>%
  filter(grepl("2019/08|2019/09|2019/10|2019/11|2019/12|2020/01|2020/02|2020/03|2020/04", Publicatio))
```
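
As an aside, listing every month we want to keep is one way of writing this filter. A shorter alternative, sketched below on a made-up table (`toy_corpus`, not our data), is to drop the one month we want to exclude. The sketch assumes that every value in "Publicatio" follows the Year/Month format.

```{r}
library(tidyverse)

# Toy stand-in for the corpus, with one article per month
toy_corpus <- tibble(Publicatio = c("2019/08", "2019/12", "2020/04", "2020/05"))

# Approach used in this notebook: keep the months that match a pattern
kept_by_pattern <- toy_corpus %>%
  filter(grepl("2019/08|2019/12|2020/04", Publicatio))

# Alternative: drop the single month we want to exclude
kept_by_exclusion <- toy_corpus %>%
  filter(Publicatio != "2020/05")

# Both approaches keep the same months
identical(kept_by_pattern$Publicatio, kept_by_exclusion$Publicatio)
```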
We then create a new graph from `articles_August_April`:

```{r}
ggplot(articles_August_April)+ #We are asking the ggplot function to take our corpus as the data to graph
  aes(Publicatio) + #The X axis should be the publication date
  geom_bar(colour = 'blue', alpha=0.2) + #We want blue outlines to differentiate this chart from our previous graph
  labs(x="Month of Publication", y="Number of Articles", title = "Articles talking about migration and the Kerala Floods/Landslides (August 2019-April 2020)") #These are the labels we want
```


Lastly, to find out exactly how many articles were published in the two months
with the highest frequencies, we run the following script.

```{r}
articles_August_September = article_by_month %>%
  select(Publicatio) %>% #We are asking R to select the date of publication as our variable of interest
  filter(grepl("2019/08|2019/09", Publicatio)) #Now, we are searching for articles published in either August 2019 or September 2019
count(articles_August_September) #This counts the matching articles
```


A total of 183 articles were published during the first two months (August and
September 2019).
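
To put that figure in proportion, the short calculation below expresses it as a share of the 252 articles in the corpus. The variable names are made up for this sketch.

```{r}
# Figures reported in this notebook
articles_first_two_months <- 183  # articles from August and September 2019
articles_total <- 252             # size of the whole corpus

# Share of the corpus published in those two months, in percent
share_percent <- round(articles_first_two_months / articles_total * 100, 1)
share_percent
```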

# Checking for mentions of the Adivasi communities

We understand that it is a bold claim to say that articles about human mobility
in the context of the Kerala floods and landslides of 2019 did not portray
Adivasi communities. We would therefore not be surprised if readers were
unconvinced by the argument that Adivasis were not represented by the
anglophone media simply because no mention of these communities appeared in our
extracted list of the 500 most frequent terms.

The following script is a rudimentary test of that claim: we search for any
reference to Adivasis made by the media when they talked about human migration
in the context of the Kerala floods and landslides from August 2019 to May 2020.

```{r}
articles_adivasis = article_by_month %>%
  select(Headline, article_fu, Publicatio) %>% #We are asking R to select the headlines, the full text and the publication date of our articles
  mutate(article_fu = tolower(article_fu)) %>% #Here, we convert the full text of our articles to lower case, because our next function is case sensitive
  filter(grepl("adivasi|adivasis|tribal|tribe|tribes|forest dwelling|forest dweller|forest dwellers", article_fu)) #Now, we are searching for articles that mention Adivasis using any of the terms collected in our dictionary. Note: "forest dwelling" and "forest dwellers" are legalistic terms used by the Indian government to refer to Adivasis in laws like "The Scheduled Tribes and Other Traditional Forest Dwellers Act"
view(articles_adivasis) #This command allows us to manually explore our results
count(articles_adivasis) #This counts the matching articles
```

The count shows that only 8 out of 252 articles mentioned Adivasis, a small
number indeed. Of those 8 articles, 5 mentioned Adivasis only in passing. The
main topic of these five articles, published throughout August and September
2019, was the visit of Rahul Gandhi (the leader of the Congress Party) to his
constituency, Wayanad in Northern Kerala, where he "patiently heard the woes of
the people who were displaced" (The Week, September 2019). Two further mentions
appeared in specialized media such as "Economic and Political Weekly", a weekly
journal of the social sciences: one article cited Adivasis as an example of how
vulnerable migrants carry out informal jobs in New Delhi, and another cited
them as an example of how the credibility of official statistics needs to be
increased. The last article identified by our search query was a letter to the
editor of the newspaper "Scroll In", in which the author talked about tribal
communities in the Narmada valley (North India). Thus, we can say not only that
very few articles used words like "Adivasi", "tribal" or "forest dweller", but
also that, when they did, these were passing mentions in articles whose main
focus was not the Adivasis.
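
One caveat of a plain `grepl` search is that it also matches longer words that merely contain a search term (for example, "tribal" inside "tribalism"). The chunk below is a hedged sketch of a stricter variant, not the search used for the results above. It runs on three invented sentences (`toy_articles`) and uses `str_detect` with word boundaries (`\\b`) and `ignore_case = TRUE`. Whether such false matches occur in our corpus is not something this sketch establishes.

```{r}
library(tidyverse)

# Three invented sentences, used only for illustration
toy_articles <- tibble(
  article_fu = c(
    "Adivasi families were relocated after the landslide.",
    "The minister spoke of tribalism in politics.",
    "Forest dwellers lost their homes."
  )
)

# Whole-word pattern; ignore_case removes the need for tolower()
adivasi_pattern <- regex(
  "\\b(adivasis?|tribals?|tribes?|forest[- ]dwell(ing|ers?))\\b",
  ignore_case = TRUE
)

# Keep only the sentences where a whole word matches the pattern
matches <- toy_articles %>%
  filter(str_detect(article_fu, adivasi_pattern))

nrow(matches)
```

In this toy example the first and third sentences are kept, while the sentence about "tribalism" is not.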

# Discerning between two discourses: Checking for mentions of COVID-19

As shown in our first bar chart, there was a resurgence of articles
talking about human mobility in the context of the Kerala floods
and landslides in March 2020.

Following our distant reading conducted on Cortext, we suspect that this surge
is due to the COVID-19 epidemic, with articles using the Kerala floods and
landslides of 2019 as a reference point for human migration related to COVID-19.
We will check for this below.

We will start by finding mentions of COVID-19 in our corpus.
```{r}
covid_ocurrances = article_by_month %>% 
  select(article_fu, Publicatio) %>% # We are selecting our fields of interest, the full text of a given article and its publication date
  mutate(article_fu = tolower(article_fu)) %>% #Here, we convert the full text of our articles to lower case, because our next function is case sensitive
  filter(grepl("covid|corona|coronavirus|covid-19|covid 19", article_fu)) #Lastly, we are asking R to find articles that used words referring to COVID-19 in their full texts.
view(covid_ocurrances) #This command allows us to manually explore our results
count(covid_ocurrances) #This counts the matching articles
```

We have a total of 35 articles that talk about COVID-19. Let's graph their
distribution across time next.


```{r}
ggplot(covid_ocurrances)+ #We are asking the ggplot function to take our search for mentions of COVID-19 as the data to graph
  aes(Publicatio) + #We specify that the X axis should be the publication date
  geom_bar(colour = 'deeppink3', alpha=0.2) + #We want a bar chart with pink outlines; the alpha of 0.2 makes the fill of the bars mostly transparent
  labs(x="Month of Publication", y="Number of Articles", title = "Articles talking about COVID-19 in the context of migration and the Kerala Floods and Landslides (August 2019-May 2020)") #Lastly, we are labeling our graph
```

This is all well and good, but a curious reader might ask how the frequency of
articles that talk about COVID-19 compares to the frequency of those that do
not. Fear not, we will take care of it now by counting how many articles were
published from February 2020 onwards.


```{r}
articles_from_february_onwards = article_by_month %>%
  select(Publicatio) %>% #This time we are focusing on the publication date rather than the text of articles
  filter(grepl("2020/02|2020/03|2020/04|2020/05", Publicatio)) #Now, we are searching for articles in our corpus published from the first mention of COVID-19 (February 2020) onward.
view(articles_from_february_onwards) #This command allows us to manually explore our results
count(articles_from_february_onwards) #This counts the matching articles
```
In total, 43 articles were published from February 2020 onwards. Comparing this
figure to the 35 articles that mentioned COVID-19 (35 of 43, roughly 81%), we
can say that from February onwards COVID-19 dominated the media's discourse
surrounding human mobility and the Kerala floods/landslides of 2019.
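
The same comparison could also be made in a single step, by flagging each article and counting per month. The chunk below is a sketch on four invented articles (`toy_corpus`, not our data). The column name `mentions_covid` is made up for this example, and the pattern is shortened to "covid|corona", which matches every term in the longer pattern used above because each of those terms contains one of these two strings.

```{r}
library(tidyverse)

# Four invented articles, used only for illustration
toy_corpus <- tibble(
  Publicatio = c("2020/02", "2020/03", "2020/03", "2020/04"),
  article_fu = c(
    "Floods displaced thousands in Kerala.",
    "Covid-19 lockdown strands migrant workers.",
    "After the floods, the coronavirus hits migrants.",
    "Rebuilding homes after the landslides."
  )
)

# Flag each article, then count articles per month and per flag
covid_by_month <- toy_corpus %>%
  mutate(mentions_covid = grepl("covid|corona", tolower(article_fu))) %>%
  count(Publicatio, mentions_covid)

covid_by_month
```

Each row of the result gives, for one month, the number of articles that do or do not mention COVID-19.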

