Research Questions
specific: How have the frequency and life impact of events in the India-Pakistan conflict changed over 1990-2020?
specific: How many India-Pakistan conflict events took place in each province in 1990-2020? Which demographic of Pakistan is most frequently reported by the media as a victim of India-Pakistan conflict events in 1990-2020? How is the violence reported between 1990 and 2020 described?
Background
The India-Pakistan conflict began during the partition of the two countries, which formerly existed together as the Indian subcontinent under the British Raj. During partition in the 1940s, a resource-rich piece of land (Kashmir) was divided between the two countries. As a result, the countries have engaged in on-and-off conflict over control of this region. They have experienced phases of heightened conflict as well as phases of positivity and peace. However, with the passing of time, tensions have run high as conflict events have increased and spread to civilians, especially along the border, or “line of control.”
The hope for the project is to shed light on the nature of this conflict and emphasize the need for preventative policies on Pakistan’s end (since I am using data on Pakistan). Casualties will be the main variable of observation for this project, to establish how this butting of heads at the national level is actually a human rights issue at the ground level. The goal of the project is to humanize and highlight those experiencing the conflict. I expect there to be a correlation between the number of casualties and the number of events, as more events likely mean a higher chance of death. I also expect there to be a correlation between the province of Kashmir and the count of casualties, because it is a border province and the center of the conflict.
In an ideal world, the policy implication of a project like this would be to demonstrate a need for Pakistan to pass policies converting the existing cease-fire line into a permanent boundary. Additionally, steps would be taken to establish some sort of India-Pakistan Corridor to move forward in formal negotiations and decision making on the allocation of land and resources1.
The first step is to load all packages needed to clean the data, along with those I will use for the rest of the analysis. I will then preview the overall data to see what needs to be ‘tidied.’
library("kableExtra")
library("stringr")
library(tidyverse)
library(readxl)
library(RColorBrewer)
library(patchwork)
conflict_data <- read_excel("conflict_data_pak (1).xlsx")
conflict_data
## # A tibble: 149 x 52
## id relid year start_year end_year active_year code_status
## <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <chr>
## 1 59413 PAK-1990-1-345-4 1990 1990 1990 1 Clear
## 2 59553 PAK-1990-1-345-5 1990 1990 1990 1 Clear
## 3 51736 PAK-1991-1-345-5 1991 1991 1991 1 Clear
## 4 59651 PAK-1991-1-345-7 1991 1991 1991 1 Clear
## 5 54315 PAK-1991-1-345-8 1991 1991 1991 1 Clear
## 6 54880 PAK-1991-1-345-4 1991 1991 1991 1 Clear
## 7 54882 PAK-1991-1-345-6 1991 1991 1991 1 Clear
## 8 59664 PAK-1993-1-345-2 1993 1993 1993 0 Clear
## 9 53298 PAK-1994-1-345-7 1994 1994 1994 0 Clear
## 10 53814 PAK-1994-1-345-8 1994 1994 1994 0 Clear
## # ... with 139 more rows, and 45 more variables: type_of_violence <dbl>,
## # conflict_dset_id <dbl>, conflict_new_id <dbl>, conflict_name <chr>,
## # dyad_dset_id <dbl>, dyad_new_id <dbl>, dyad_name <chr>,
## # side_a_dset_id <dbl>, side_a_new_id <dbl>, side_a <chr>,
## # side_b_dset_id <dbl>, side_b_new_id <dbl>, side_b <chr>,
## # number_of_sources <dbl>, source_article <chr>, source_office <chr>,
## # source_date <chr>, source_headline <chr>, source_original <chr>, ...
Taking a look at the data above, I see that the principles of ‘tidy data’ are already adhered to:
The unit of analysis is an India-Pakistan conflict event, identified by the unique conflict event id (variable 1). The majority of variable names are intuitive (see the list in the tibble preview above). However, I will rename some of the relevant variables to more suitable names in the next step. The number of observations, or conflict events, captured in this data set is 149.
The next step is to see whether the relevant variables are ready to work with. I identified from the overall preview above that certain columns I would like to work with need to be renamed. First I rename the columns as shown below, and then I preview the relevant columns:
conflict_data =
rename(conflict_data, Province = adm_1, deaths_Pakistan = deaths_b, deaths_India = deaths_a, accurate_countcasualty = high)
conflict_data %>%
select(deaths_Pakistan, deaths_India, deaths_civilians, Province, year, accurate_countcasualty)
## # A tibble: 149 x 6
## deaths_Pakistan deaths_India deaths_civilians Province year accurate_countc~
## <dbl> <dbl> <dbl> <chr> <dbl> <dbl>
## 1 0 0 0 Azad Ja~ 1990 5
## 2 2 0 0 Azad Ja~ 1990 2
## 3 0 0 2 Azad Ja~ 1991 2
## 4 0 0 0 Azad Ja~ 1991 4
## 5 0 0 0 Azad Ja~ 1991 4
## 6 7 0 0 Azad Ja~ 1991 7
## 7 0 0 0 Azad Ja~ 1991 9
## 8 3 0 0 Azad Ja~ 1993 3
## 9 0 0 1 Azad Ja~ 1994 1
## 10 0 0 1 Azad Ja~ 1994 1
## # ... with 139 more rows
The preview looks good. I have verified that the desired columns are now named intuitively and that no columns need to be combined or separated for my analysis (for example, if I needed a date and it was split into month, day, and year columns, I would need to combine them). Next I check for missing values; there do not seem to be any.
conflict_data2 =
conflict_data %>%
select(deaths_Pakistan, deaths_India, deaths_civilians, Province, id, year, accurate_countcasualty)
which(is.na(conflict_data2))
## integer(0)
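Calling `which(is.na(...))` on a whole data frame returns linear positions, which makes it hard to tell which column a gap sits in. A minimal base-R sketch of a per-column count follows; the data frame `toy` and its values are hypothetical and only borrow column names from the selection above.

```r
# Hypothetical toy data reusing some column names from conflict_data2;
# the values are invented for illustration only.
toy <- data.frame(
  deaths_Pakistan  = c(0, 2, NA),
  deaths_civilians = c(0, 0, 2),
  Province         = c("Azad Jammu & Kashmir", NA, "Punjab Province"),
  stringsAsFactors = FALSE
)

# Count missing values in each column
na_per_column <- colSums(is.na(toy))
print(na_per_column)

# TRUE only when no column contains a missing value
no_missing <- all(na_per_column == 0)
print(no_missing)
```

An all-zero result from `colSums(is.na(...))` says the same thing as `integer(0)` from `which()`, but it also names the column if a gap ever appears.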
This next step is not data tidying: I define a custom theme so that the project’s visuals stay consistent, which improves the overall organization of the project.
custom_theme =
theme_minimal() +
theme(
plot.title = element_text(face = "bold", size = 12),
plot.subtitle = element_text(face = "italic", size = 10),
axis.title = element_text(size = 10),
axis.text = element_text(size = 10),
legend.title = element_text(size = 8),
legend.text = element_text(size = 8))
##Creating table of summary statistics on casualties
conflict_data %>%
##Selecting two columns reflecting casualties that I would like to include
select(deaths_civilians, accurate_countcasualty) %>%
##Calling to summarize
summary() %>%
##Calling to create table
kbl(col.names = c("Civilian Casualties", "Total Casualties"),
align = "ccc",
caption = "Summary Statistics on Casualties",
format.args = list(big.mark = ",")) %>%
#Aesthetics
column_spec(c(1,2), bold = F, background = "white") %>%
row_spec(0, bold = T, background = "lightblue") %>%
kable_minimal()
| Civilian Casualties | Total Casualties |
|---|---|
| Min. :0.0000 | Min. : 1.000 |
| 1st Qu.:0.0000 | 1st Qu.: 1.000 |
| Median :0.0000 | Median : 2.000 |
| Mean :0.7919 | Mean : 2.738 |
| 3rd Qu.:1.0000 | 3rd Qu.: 3.000 |
| Max. :5.0000 | Max. :24.000 |
##Finding the event which resulted in 24 deaths to mention in text
conflict_data %>%
filter(accurate_countcasualty == 24)
The minimum number of total casualties observed in an India-Pakistan conflict event is 1: every conflict event resulted in at least one death. The average number of deaths per conflict event is approximately 3 (mean 2.74). The largest number of deaths in a single conflict event is 24; this was a 1995 event in Neelam Valley, Kashmir. The maximum number of civilian deaths in one event is five. On average, approximately one civilian (mean 0.79) dies as a result of each India-Pakistan conflict event.
##Viewing relevant descriptive statistics, mentioned in text
range(table(conflict_data$year))
mean(table(conflict_data$year))
##Creating part 1 of plot
p1<-
##Calling data and selecting variable
ggplot(data=conflict_data, aes(x=year)) +
##Setting aesthetics
geom_bar(mapping = aes(x = year),
fill = c("darkblue")) +
geom_text(size = 2.5, stat = 'count', aes(label = after_stat(count)), vjust = -.5) +
##Adding labels
labs(title = "Count of India-Pakistan Conflict Events",
subtitle = "Years 1990 to 2020",
x = "Year",
y = "Number of India-Pakistan Conflict Events") +
##Adding theme which I created earlier
custom_theme
#Creating part 2, majority same process as above
p2<-
ggplot(data=conflict_data, aes(x=year)) +
#Setting y-limit to exclude year 2003 (y=67)
ylim(c(0, 20)) +
geom_bar(mapping = aes(x = year),
fill = c("lightblue")) +
geom_text(size = 2.5, stat = 'count', aes(label = after_stat(count)), vjust = -.5) +
labs(title = "Count Excluding 2003",
subtitle = "Years 1990 to 2020",
x = "Year",
y = "Number of India-Pakistan Conflict Events",
##Adding a caption in the bottom-right corner to source the data
caption = "Data from UCDP Georeferenced Event Dataset, 2020") +
custom_theme
#Combining both parts to create overall graph
p1+p2
The average number of conflict events per year is 10.64, taken across the 14 years in which at least one event was recorded. The count of India-Pakistan conflict events per year has fluctuated widely, reaching a maximum of 67. The graph confirms this: between some years there are dramatic increases in conflict events. We can also identify periods of peace, approximately 1995-1999 and 2003-2009. The year 2003 was the all-time peak in conflict events, with a total of 67, and it was followed by several years of peace.
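Two details of the descriptive statistics are easy to miss: `range()` returns the smallest and largest yearly counts rather than a single spread, and `table()` only includes years that actually appear in the data. The sketch below uses invented years (the vector `event_years` is hypothetical) to show how the per-year average changes once zero-event years are counted.

```r
# Hypothetical event years, one entry per event (not the real data)
event_years <- c(1990, 1990, 1991, 1993, 1993, 1993)

# Counts for years that have at least one event
counts_observed <- table(event_years)
print(range(counts_observed))               # smallest and largest yearly count
spread <- diff(range(counts_observed))      # difference between the two
mean_observed <- mean(counts_observed)

# Counts over every year in the window, including years with no events
all_years  <- 1990:1993
counts_all <- table(factor(event_years, levels = all_years))
mean_all   <- mean(counts_all)

print(c(observed_years = mean_observed, all_years = mean_all))
```

Applied to the figures reported here, 149 events over the 14 years with at least one event gives 10.64, whereas dividing by all 31 years from 1990 to 2020 would give about 4.81.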
The historical context of this time frame (2003 onwards) is that tensions peaked at the line of control, or border. Over the few years prior, the frequency of conflict events had been increasing; the graph shows this as a steep increase from 1 to 9 events between 2000 and 2002. In 2003 the violence was rampant, and a ‘cease-fire’ was passed towards the end of the year, in November 2003, to control the state of affairs3.
Setting aside the dramatic spike in 2003, we can view a more natural trend of increases and decreases in conflict events in the right panel of the graph, which shows the count of conflict events for all years excluding 2003. Here we can see that there is usually a flare-up around every five years.
##Creating new, longer dataset, separating rows by side (Deaths side India or Pakistan)
conflict_data2 =
pivot_longer(conflict_data,
# Identify the column(s) of interest
c(`deaths_Pakistan`, `deaths_India`),
# New column where to store the names of the columns
names_to = "side",
# New column where to store the values
values_to = "death_count")
##Relevant descriptive statistics
##sum deaths by side, mentioned in text
conflict_data2 %>%
group_by(side) %>%
summarise(sum(death_count))
##Creating plot using new dataset
conflict_data2%>%
##calling data, variables, and fill by (side) and where to place bars relative to each other (dodge)
ggplot() +
geom_bar(mapping = aes(x = year, y = death_count,
fill = as.character(side)),
stat = "identity",
position = "dodge") +
##Coloring
scale_fill_manual(values = c("lightblue", "darkblue")) +
##Adding labels
labs(title = "Count of Deaths, India and Pakistan",
subtitle = "Year total across conflict events, Years 1990-2020",
x = "Year",
y = "Number of Deaths",
caption = "Data from UCDP Georeferenced Event Dataset, 2020") +
##Adding my custom theme
custom_theme
Pakistan has a clear stake in taking preventative measures, as there is a marked difference in total deaths between the two sides: 86 Pakistani deaths and 20 Indian deaths. This is visualized above, with bars representing Pakistani casualties being on average much higher than those representing Indian casualties. However, it is important to keep the overall message in sight, which is that deaths on both sides have continued to occur over the past 30 years. From approximately 1994 to 2019 there have been Indian deaths in the India-Pakistan conflict, and from 1990 to 2019 there have been Pakistani deaths in the conflict.
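The side totals quoted above come from reshaping the data so that each event contributes one row per side. As a dependency-free sketch of the same idea, base R's `reshape()` can stand in for `pivot_longer()`; the data frame `toy_wide` and its values are invented for illustration.

```r
# Hypothetical wide data: one row per event (values invented)
toy_wide <- data.frame(
  id              = 1:3,
  year            = c(1990, 1990, 1991),
  deaths_Pakistan = c(0, 2, 7),
  deaths_India    = c(0, 0, 1)
)

# Wide to long: one row per event and side
toy_long <- reshape(
  toy_wide,
  direction = "long",
  varying   = c("deaths_Pakistan", "deaths_India"),
  v.names   = "death_count",
  timevar   = "side",
  times     = c("deaths_Pakistan", "deaths_India"),
  idvar     = "id"
)

# Total deaths per side across all events
totals_by_side <- tapply(toy_long$death_count, toy_long$side, sum)
print(totals_by_side)
```

Each of the three events appears twice in `toy_long`, once per side, which is why the long table has twice as many rows as the wide one.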
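I predicted in my background section that there would be a correlation between the number of conflict events and the count of casualties. The table below suggests otherwise: many of the yearly casualty values in the graph above (intended to show the year-wise sum of casualties) and in this table (showing the yearly maximum for a single event) are the same. This would indicate that in those years a single event accounted for the casualties, while the other, smaller events had none. One caveat: with stat = "identity" and position = "dodge", bars for events that share a year and side are drawn on top of one another, so the visible bar height is the largest single-event value rather than the sum, which may itself explain the match.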
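One way to check the predicted relationship directly is to aggregate to one row per year and compare event counts with casualty totals. The sketch below uses base R and an invented data frame (`toy_events` is hypothetical); it also places the yearly sum next to the single-event maximum so the two can be compared without relying on bar heights.

```r
# Hypothetical per-event data (values invented for illustration)
toy_events <- data.frame(
  year        = c(1990, 1990, 1991, 1991, 1991, 1993),
  death_count = c(0, 2, 7, 0, 1, 3)
)

# One row per year: number of events, total deaths, largest single event
agg <- aggregate(
  death_count ~ year, data = toy_events,
  FUN = function(x) c(events = length(x), total = sum(x), max_single = max(x))
)
# aggregate() returns the three statistics as a matrix column; flatten it
per_year <- data.frame(year = agg$year, agg$death_count)
print(per_year)

# Correlation between events per year and total deaths per year
events_vs_deaths <- cor(per_year$events, per_year$total)
print(events_vs_deaths)
```

With only a handful of years the coefficient is a rough indication at best, so it is worth reading alongside the per-year table rather than on its own.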
##max deaths in a single event by side by year, table
conflict_data2 %>%
group_by(side, year) %>%
summarise(max(death_count)) %>%
kbl(col.names = c("Country", "Year", "Max Death"),
align = "ccc",
caption = "India-Pakistan Maximum Count of Death in a Single Event, by Year",
format.args = list(big.mark = ""))%>%
column_spec(c(1,2), bold = F, background = "white") %>%
row_spec(0, bold = T, background = "lightblue") %>%
kable_minimal()
| Country | Year | Max Death |
|---|---|---|
| deaths_India | 1990 | 0 |
| deaths_India | 1991 | 0 |
| deaths_India | 1993 | 0 |
| deaths_India | 1994 | 4 |
| deaths_India | 1995 | 0 |
| deaths_India | 2000 | 0 |
| deaths_India | 2002 | 1 |
| deaths_India | 2003 | 5 |
| deaths_India | 2010 | 2 |
| deaths_India | 2013 | 0 |
| deaths_India | 2014 | 0 |
| deaths_India | 2017 | 0 |
| deaths_India | 2018 | 0 |
| deaths_India | 2019 | 5 |
| deaths_Pakistan | 1990 | 2 |
| deaths_Pakistan | 1991 | 7 |
| deaths_Pakistan | 1993 | 3 |
| deaths_Pakistan | 1994 | 0 |
| deaths_Pakistan | 1995 | 0 |
| deaths_Pakistan | 2000 | 1 |
| deaths_Pakistan | 2002 | 3 |
| deaths_Pakistan | 2003 | 5 |
| deaths_Pakistan | 2010 | 1 |
| deaths_Pakistan | 2013 | 2 |
| deaths_Pakistan | 2014 | 1 |
| deaths_Pakistan | 2017 | 8 |
| deaths_Pakistan | 2018 | 2 |
| deaths_Pakistan | 2019 | 2 |
India has experienced up to 5 deaths in a single event, in the years 2003 and 2019. Pakistan has experienced up to 8 deaths in a single event, in 2017. For Pakistan, there have been only two years with recorded events (1994 and 1995) in which the maximum was 0, that is, in which no event recorded Pakistani deaths.
#Creating dataset for table of total deaths civilian, by province
per_province=
conflict_data %>%
group_by(Province) %>%
summarise(sum(deaths_civilians))
as.data.frame(per_province)
##Creating table
per_province %>%
#Naming column adding heading and formatting of observations
kbl(col.names = c("Province", "Civilian Deaths"),
align = "ccc",
caption = "Total Civilian Deaths in India-Pakistan Conflict, by Province",
format.args = list(big.mark = ",")) %>%
#Aesthetics
column_spec(c(1,2), bold = F, background = "white") %>%
row_spec(0, bold = T, background = "lightblue") %>%
kable_minimal()
| Province | Civilian Deaths |
|---|---|
| Azad Jammu & Kashmir | 87 |
| Gilgit-Baltistan | 6 |
| Punjab Province | 24 |
| Sindh Province | 1 |
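The figures in this table can be checked against the earlier summary statistics with a few lines of arithmetic. The values below are copied from the table above and from the reported 149 events; the variable names are illustrative.

```r
# Civilian deaths per province, copied from the table above
civ_deaths <- c("Azad Jammu & Kashmir" = 87,
                "Gilgit-Baltistan"     = 6,
                "Punjab Province"      = 24,
                "Sindh Province"       = 1)

total_civ <- sum(civ_deaths)                     # all civilian deaths
ranked    <- sort(civ_deaths, decreasing = TRUE)
gap       <- unname(ranked[1] - ranked[2])       # lead over the next province
share     <- round(100 * civ_deaths / total_civ, 1)

# Mean civilian deaths per event, given 149 events in the dataset
mean_per_event <- total_civ / 149

print(total_civ)
print(gap)
print(share)
print(round(mean_per_event, 4))
```

The total of 118 divided by 149 events reproduces the mean of 0.7919 in the summary statistics table, and the gap between Azad Jammu & Kashmir and Punjab comes to 63.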
As expected, Kashmir has the greatest number of civilian deaths by a large margin: 63 more than the next highest province (87 versus 24 in Punjab). The emotions of Kashmiri citizens echo what is reflected by the data. The following conclusions of the UNICEF survey of Jammu and Kashmir, covering both the Indian and Pakistani sides, reflect Kashmiri sentiments about the India-Pakistan conflict:
80% of Kashmiris on both sides say that the dispute is important to them personally2.
19% of Pakistani Kashmiris and 43% of Indian Kashmiris expressed concerns of human rights abuses2.
##Plotting by province deaths over time, line graph
p3<-
##Creating Plot
conflict_data %>%
group_by(year,
Province) %>%
##Creating object for yearly civilian deaths
summarize(civdeaths_yearly = sum(deaths_civilians, na.rm = TRUE)) %>%
ggplot() +
##Defining x y, and different lines (province)
geom_line(aes(x = year,
y = civdeaths_yearly,
color = Province)) +
##Aesthetics
scale_color_manual(values=c("lightblue", "violet", "blue", "purple")) +
##Labels
labs(title = "Count of Civilian Deaths by Province, Pakistan",
subtitle = "Year totals across conflict events, Years 1990-2020",
x = "Year",
y = "Number of Civilian Deaths",
caption = "Data from UCDP Georeferenced Event Dataset, 2020") +
##Applying theme
custom_theme
##Creating textbox for peak point in graph
p3 +
##Defining text and text aesthetics
annotate("text", label = "Flare up of border violence for\nover 23 months, resulting in the passing\nof a cease-fire motion in November\n2003", fontface = "italic", x = 2007, y = 45, size = 2) +
##Creating box and defining aesthetics
annotate("rect", xmin = 2002.25, xmax = 2012, ymin = 40, ymax = 50,
alpha = .1,fill = "blue")
Gilgit-Baltistan has seen comparatively little India-Pakistan conflict, with 6 civilian deaths recorded over the period. Sindh province remains low, with one casualty in the year 2019. Punjab province has seen an increase in India-Pakistan conflict-related casualties since 2010, because portions of Punjab border India’s state of Punjab. Civilian deaths in this area rose steadily between 2010 and 2015. Since then, they have decreased rapidly, and we have not seen India-Pakistan violence in Punjab since approximately 2018. Violence in Kashmir is dramatic, with periods of rapid escalation and de-escalation creating peaks. These periods are followed by periods of little to no casualties, 2000-2003 for example. With many periods of peace followed by rapid increases in violence, Kashmiris have been forced into hopelessness:
Using source article headlines, the following table counts the appearance of certain keywords. The keywords point either to the nature of the event (for example, shelling, bombing, shooting) or to personal characteristics (for example, job, age, gender) of those impacted.
#Creating object to hold keywords
demographic_words <- c("soldier", "civilian", "shelling", "teen", "dolescent", "villager", "women", "woman", "firing", "shooting")
#Creating object appearence_words to hold keywords extracted from source headlines
appearence_words <- str_extract(conflict_data$source_headline, paste0(demographic_words, collapse='|'))
##Creating data frame from object appearence_words
keywords <- as.data.frame(table(appearence_words))
keywords
##Creating table with key word frequency count
keywords %>%
kbl(col.names = c("Keyword", "Frequency"),
align = "ccc",
caption = "Count of Keyword Appearances Across All Source Article Headlines",
format.args = list(big.mark = ","))%>%
column_spec(c(1,2), bold = F, background = "white") %>%
row_spec(0, bold = T, background = "lightblue") %>%
kable_minimal()
| Keyword | Frequency |
|---|---|
| civilian | 21 |
| dolescent | 1 |
| firing | 13 |
| shelling | 29 |
| shooting | 1 |
| soldier | 8 |
| villager | 2 |
| women | 1 |
The table above implies that a great deal of India-Pakistan violence is shelling. The keyword shelling was picked up in 29 source article headlines, 19.5% of all 149 headlines in the dataset. Shelling refers to bombardment with artillery shells, and it is a high-impact act because its targeting is difficult to control. In comparison, there is 1 mention of shooting and 13 of firing. There are mentions of villagers, adolescents, and women in the headlines, so it is clear that there is spillover of impact beyond those directly involved in conflict events. Furthermore, the word civilian is mentioned in 21 headlines, 14.1% of the total.
Keeping in mind that these are only a select few key words and there are a large variety of ways to refer to the same demographic, the estimates above should be viewed as underestimations. The picking up of these keywords should reflect the existence of impact on a certain group only and not be used to measure impact.
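Part of the undercount is mechanical: `str_extract()` returns only the first match in each headline and is case-sensitive, so a headline mentioning both "shelling" and "civilian" is counted once. A hedged base-R alternative counts, for each keyword, how many headlines contain it regardless of case. The headlines and the shortened keyword list below are invented for illustration.

```r
# Hypothetical headlines (invented; not taken from the dataset)
headlines <- c("Shelling kills civilian near Line of Control",
               "Two soldiers die in cross-border firing",
               "Villagers flee after shelling")

keywords <- c("soldier", "civilian", "shelling", "villager", "firing")

# For each keyword, count the headlines that contain it (case-insensitive)
keyword_counts <- sapply(keywords, function(k) {
  sum(grepl(k, headlines, ignore.case = TRUE))
})
print(keyword_counts)

# Share of all headlines mentioning each keyword, in percent
keyword_share <- round(100 * keyword_counts / length(headlines), 1)
print(keyword_share)
```

Because a headline can now count toward several keywords, the shares no longer sum to 100%, which fits the reading that each keyword signals the presence of an impact rather than measuring it.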
Key Findings from Graph 1: Count of India-Pakistan Conflict Events over 1990-2020
Key Findings from Graph 2: Count of Casualties in India-Pakistan Conflict Events over 1990-2020
Key Findings from Table 2: Maximum Count of Deaths as a Result of an India-Pakistan Conflict Event, 1990-2020
Key Findings from Graph 3: Civilian Deaths Yearly by Province, 1990-2020
Key Findings from Table 4: India-Pakistan Conflict Impact, Demographic and Nature of Violence
1 Sundberg, Ralph, and Erik Melander, 2013, “Introducing the UCDP Georeferenced Event Dataset”, Journal of Peace Research, vol.50, no.4
2 UNICEF, 2021, “Azad Jammu and Kashmir: Survey Findings Report 2020-21”, Multiple Indicators Survey
3 Hashim, Asad, 2019, “Timeline: India-Pakistan Relations”, Aljazeera.com https://www.aljazeera.com/news/2019/3/1/timeline-india-pakistan-relations