---
title: "Notebook Performing an Invisibility Spell"
output: html_notebook
---

# Welcome!
Welcome to the notebook accompanying the research project "Performing an
Invisibility Spell". This notebook adds transparency to that project by showing
the code and procedures behind several data visualizations used in the paper.
For the operations made using Cortex, kindly refer to the link provided in the
paper. To make this notebook as transparent as possible, each snippet of code
is explained, either in the text before it or in comments next to the code
(comments are introduced by a hash sign, #).

# Introductory Steps
We start by loading the package "tidyverse", which will aid us throughout
the project. It provides the pipe operator %>% as well as the ggplot function
that we will use to produce graphs. To search for particular occurrences in our
corpus we rely on grepl, which ships with base R rather than with tidyverse.

```{r}
library(tidyverse)
```

Next, we tell R to read our 252 articles, which are stored as a table in the
tsv (tab-separated values) format. Please note that this is a simplification
of the results gathered from Factiva: it leaves out some of the rich metadata
indexed by Factiva (e.g. the regions mentioned in an article) because we do not
use them in this exploration. Furthermore, to keep our visualisations readable,
we recoded the column "Publicatio" (the publication date of a given article) to
hold the month of publication (format: Year/Month) rather than the exact day.
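
For readers curious about how such a recoding can be done in R, here is a
minimal sketch. It is not necessarily the procedure applied to the Factiva
export: the vector toy_dates and its day-level date format are assumptions made
purely for illustration.

```{r}
# Hypothetical toy data: the day-level dates of the real export are not shown in this notebook
toy_dates <- as.Date(c("2019-08-09", "2019-08-21", "2020-05-11"))

# format() turns each Date into a zero-padded "Year/Month" string such as "2019/08"
toy_months <- format(toy_dates, "%Y/%m")
toy_months
```

Because the month is zero-padded, these strings sort in chronological order,
which keeps the bars of a chart in the right sequence.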

```{r}
article_by_month = read_tsv("252good.tsv")

```
The next command allows us to manually explore our corpus.

```{r}
view(article_by_month)
```

We can see that our corpus contains the headline of each article (column
"Headline"), its full text (column "article_fu") and its publication date
(column "Publicatio").

# Basic Exploration of our corpus: Frequencies of Publication

The next piece of code produces a bar chart showing the number of articles
published in each month.

```{r}
ggplot(article_by_month) + # We ask the ggplot function to take our corpus as the data to graph
  aes(Publicatio) + # The x axis should be the publication month
  geom_bar(colour = 'green', alpha = 0.2) + # The bars get a green outline; alpha = 0.2 makes their fill mostly transparent
  labs(x = "Month of Publication", y = "Number of Articles", title = "Articles talking about migration and the Kerala Floods/Landslides (August 2019-May 2020)") # These are the labels we want
```

In the graph above, we can see that the media published articles on human
migration in the context of the Kerala floods and landslides of 2019 chiefly in
August and September 2019, the months during and immediately after the disaster.
Indeed, most of our articles belong to this period (183 of 252, as counted
further below). The publication of articles on human mobility and the Kerala
floods and landslides never died off. However, it decreased from October 2019
onwards and surged again in March 2020, a surge that continued throughout April
2020. Lastly, the month of May 2020 should be taken with a pinch of salt, since
we only have data for articles published in the first 11 days of May (hence, it
makes sense for our corpus to contain fewer articles for that month).

Given that we do not want to show the limited data we have for May 2020, let us
create a new plot.

We start by selecting the articles published from August 2019 to April 2020.

```{r}
articles_August_April = article_by_month %>%
  select(Publicatio) %>%
  filter(grepl("2019/08|2019/09|2019/10|2019/11|2019/12|2020/01|2020/02|2020/03|2020/04", Publicatio))
```
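
As an aside, the same selection can be written without listing every month.
The sketch below is an alternative, not the filter used for the paper; it
assumes a hypothetical table toy_articles whose column follows the same
Year/Month format as "Publicatio".

```{r}
library(tidyverse)

# Hypothetical toy data in the same "Year/Month" format as the column "Publicatio"
toy_articles <- tibble(Publicatio = c("2019/08", "2019/12", "2020/04", "2020/05"))

# Zero-padded "Year/Month" strings sort chronologically,
# so a simple range check can stand in for the long search pattern
toy_august_april <- toy_articles %>%
  filter(Publicatio >= "2019/08", Publicatio <= "2020/04")

toy_august_april
```

The range check only works because every value has the same fixed-width
format; with free-form dates the explicit pattern is the safer choice.
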

We then create a new graph.

```{r}
ggplot(articles_August_April) + # We ask the ggplot function to take the articles from August 2019 to April 2020 as the data to graph
  aes(Publicatio) + # The x axis should be the publication month
  geom_bar(colour = 'blue', alpha = 0.2) + # We want blue outlines to differentiate this chart from our previous graph
  labs(x = "Month of Publication", y = "Number of Articles", title = "Articles talking about migration and the Kerala Floods/Landslides (August 2019-April 2020)") # These are the labels we want
```



Lastly, since we would like to know exactly how many articles were published
in the two months with the highest frequencies, we run the following script.

```{r}
articles_August_September = article_by_month %>%
  select(Publicatio) %>% # We ask R to select the date of publication as our variable of interest
  filter(grepl("2019/08|2019/09", Publicatio)) # Now, we search for articles published in either August 2019 or September 2019
count(articles_August_September) # A separate statement: it returns the number of matching articles
```


A total of 183 articles were published during these two months, August and
September 2019.
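
If a breakdown for every month is wanted rather than a single total, count can
also be given a column name. The following is a minimal sketch on a
hypothetical table, toy_corpus, invented for illustration; its numbers are not
those of our corpus.

```{r}
library(tidyverse)

# Hypothetical toy data standing in for the column "Publicatio"
toy_corpus <- tibble(Publicatio = c("2019/08", "2019/08", "2019/09", "2020/03"))

# count() with a column name returns one row per month, with the frequency in column n
toy_monthly_counts <- toy_corpus %>%
  count(Publicatio)

# Add up the two months of interest
toy_peak_total <- toy_monthly_counts %>%
  filter(Publicatio %in% c("2019/08", "2019/09")) %>%
  summarise(total = sum(n)) %>%
  pull(total)

toy_peak_total
```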

# Checking for mentions of the Adivasi communities

We understand that claiming that articles on human mobility in the context of
the Kerala floods and landslides of 2019 did not portray Adivasi communities is
bold. We would therefore not be surprised if readers remained unconvinced by
the argument that Adivasis were not represented by the anglophone media merely
because no mention of these communities appears in our extracted list of the
500 most frequent terms.

This script is a rudimentary qualification of that claim: we search for any
reference to Adivasis made by the media when they talked about human migration
in the context of the Kerala floods and landslides from August 2019 to May 2020.

```{r}
articles_adivasis = article_by_month %>%
  select(Headline, article_fu) %>% # We ask R to select the headlines and the full text of our articles
  mutate(article_fu = tolower(article_fu)) %>% # We convert the full text to lower case to get around the case sensitivity of grepl
  # Now, we search the full text for any of the terms in our dictionary that refer to Adivasis.
  # Note: "forest dwelling" and "forest dwellers" are legalistic terms used by the Indian government
  # to refer to Adivasis in laws like "The Scheduled Tribes and Other Traditional Forest Dwellers Act"
  filter(grepl("adivasi|adivasis|tribal|tribe|tribes|forest dwelling|forest dweller|forest dwellers", article_fu))
view(articles_adivasis) # This command allows us to manually explore the matching articles
count(articles_adivasis) # This returns the number of matching articles
```

The count shows that only 8 of our 252 articles mentioned Adivasis, a small
number indeed.
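
A stricter variant of this search is sketched below. It is an illustration
only, not the search behind the figure of 8 articles: it looks at both the
headline and the full text, and it uses word boundaries so that a term like
"tribe" does not match inside an unrelated word such as "diatribe". The table
toy_texts and the pattern toy_pattern are hypothetical.

```{r}
library(tidyverse)

# Hypothetical toy data with the same column names as our corpus
toy_texts <- tibble(
  Headline = c("Tribal hamlets cut off", "Relief camps open", "A long diatribe"),
  article_fu = c("Roads were washed away.",
                 "Adivasi families were relocated.",
                 "The minister's diatribe continued.")
)

# \\b marks a word boundary; the optional "s" covers plural forms
toy_pattern <- "\\b(adivasis?|tribals?|tribes?|forest[ -]dwell(ing|ers?))\\b"

# ignore.case = TRUE replaces the tolower() step; perl = TRUE selects Perl-style regular expressions
toy_matches <- toy_texts %>%
  filter(grepl(toy_pattern, Headline, ignore.case = TRUE, perl = TRUE) |
           grepl(toy_pattern, article_fu, ignore.case = TRUE, perl = TRUE))

toy_matches
```

Applied to the real corpus, a stricter pattern of this kind could return a
different count, so it should be read as a robustness check rather than as a
replacement for the search above.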

# Discerning between two discourses: Checking for mentions of COVID 19

We suspect that the surge of March and April 2020 observed above is due to the
COVID-19 epidemic, with articles using the Kerala floods and landslides of 2019
as a reference point for human migration related to COVID-19. We check this in
the present section.

We start by finding mentions of COVID-19 in our corpus.

```{r}
covid_ocurrances = article_by_month %>%
  select(article_fu, Publicatio) %>% # We select our fields of interest: the full text of each article and its publication date
  mutate(article_fu = tolower(article_fu)) %>% # We convert the full text to lower case to get around the case sensitivity of grepl
  filter(grepl("covid|corona|coronavirus|covid-19|covid 19", article_fu)) # Lastly, we ask R to find articles whose full text uses words referring to COVID-19
view(covid_ocurrances) # This command allows us to manually explore the matching articles
count(covid_ocurrances) # This returns the number of matching articles
```

We have a total of 35 articles that talk about COVID-19. Next, let's graph
their distribution over time.


```{r}
ggplot(covid_ocurrances) + # We ask the ggplot function to take our search for mentions of COVID-19 as the data to graph
  aes(Publicatio) + # We specify that the x axis should be the publication month
  geom_bar(colour = 'deeppink3', alpha = 0.2) + # We want pink outlines; alpha = 0.2 sets the transparency of the bars' fill
  labs(x = "Month of Publication", y = "Number of Articles", title = "Articles talking about COVID-19 in the context of migration and the Kerala Floods and Landslides (August 2019-May 2020)") # Lastly, we label our graph
```

This is all well and good, but a curious reader might wonder how the number of
articles that mention COVID-19 compares with the number that do not. Fear not:
we address this now by counting how many articles were published from February
2020 onwards.


```{r}
articles_from_february_onwards = article_by_month %>%
  select(Publicatio) %>% # This time we focus on the publication date rather than the text of the articles
  filter(grepl("2020/02|2020/03|2020/04|2020/05", Publicatio)) # Now, we search for articles in our corpus published from the first mention of COVID-19 (February 2020) onwards
view(articles_from_february_onwards) # This command allows us to manually explore our results
count(articles_from_february_onwards) # This returns the number of matching articles
```

In total, 43 articles were published from February 2020 onwards. Since the first
mention of COVID-19 in our corpus dates from February 2020, all 35 articles that
mention COVID-19 fall within this period. We can therefore say that, from
February onwards, COVID-19 dominated the media's discourse surrounding human
mobility and the Kerala floods/landslides of 2019.
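
To put a number on this comparison, the short sketch below computes the share
of articles from February 2020 onwards that mention COVID-19. The two counts
are the ones reported in this section; the variable names are hypothetical.

```{r}
# Counts reported above (the variable names are hypothetical)
n_covid <- 35           # articles mentioning COVID-19
n_since_february <- 43  # articles published from February 2020 onwards

# Share of the February-onwards articles that mention COVID-19, as a percentage
covid_share <- 100 * n_covid / n_since_february
round(covid_share, 1)
```

Roughly four out of five articles published in this period therefore mention
COVID-19.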

