Research Questions and Background

Research Questions

  1. How has India-Pakistan violence changed over the years?

specific: How have the frequency and life impact of events in the India-Pakistan conflict changed over 1990-2020?

  2. Who is impacted by this violence and how?

specific: How many India-Pakistan conflict events took place in each province in 1990-2020? Which demographic in Pakistan is most frequently reported by the media as a victim of India-Pakistan conflict events in 1990-2020? How is the violence reported on between 1990-2020 described?

Background

The India-Pakistan conflict began during the partition of the two countries, which formerly existed together as the Indian subcontinent under the British Raj. During partition in the 1940s, a resource-rich piece of land (Kashmir) was divided between the two countries. As a result, the countries have engaged in on-and-off conflict over control of this region. They have experienced phases of heightened conflict as well as phases of positivity and peace. However, with the passing of time, tensions have run high as conflict events have increased, and the violence has spread to civilians, especially along the border, or “line of control.”

The hope for the project is to shed light on the nature of this conflict and emphasize the need for preventative policies on Pakistan’s end (since I am using data on Pakistan). Casualties will be the main variable of observation for this project, to establish how this butting of heads at the national level is actually a human rights issue at the ground level. The goal of the project is to humanize and highlight those experiencing the conflict. I expect there to be a correlation between the number of casualties and the number of events, as more events likely mean a higher chance of death. I also expect there to be a correlation between the province of Kashmir and the count of casualties because it is a border province and the center of the conflict.

In an ideal world, the policy implication of a project like this would be to demonstrate the need for Pakistan to pass policies that convert the existing cease-fire line into a permanent boundary. Additionally, steps would be taken to establish some sort of India-Pakistan corridor to move forward with formal negotiations and decision making on the allocation of land/resources1.

Data Wrangling and Prep Work

Step 1: Loading the Data and Packages, Previewing Data

The first step is to load all packages needed to clean the data. I will load those packages along with those I will use to complete the rest of the analysis. I will then preview the overall data to see what needs to be ‘tidied.’

library("kableExtra") 
library("stringr")
library(tidyverse)
library(readxl)
library(RColorBrewer)
library(patchwork)
conflict_data <- read_excel("conflict_data_pak (1).xlsx")
conflict_data
## # A tibble: 149 x 52
##       id relid             year start_year end_year active_year code_status
##    <dbl> <chr>            <dbl>      <dbl>    <dbl>       <dbl> <chr>      
##  1 59413 PAK-1990-1-345-4  1990       1990     1990           1 Clear      
##  2 59553 PAK-1990-1-345-5  1990       1990     1990           1 Clear      
##  3 51736 PAK-1991-1-345-5  1991       1991     1991           1 Clear      
##  4 59651 PAK-1991-1-345-7  1991       1991     1991           1 Clear      
##  5 54315 PAK-1991-1-345-8  1991       1991     1991           1 Clear      
##  6 54880 PAK-1991-1-345-4  1991       1991     1991           1 Clear      
##  7 54882 PAK-1991-1-345-6  1991       1991     1991           1 Clear      
##  8 59664 PAK-1993-1-345-2  1993       1993     1993           0 Clear      
##  9 53298 PAK-1994-1-345-7  1994       1994     1994           0 Clear      
## 10 53814 PAK-1994-1-345-8  1994       1994     1994           0 Clear      
## # ... with 139 more rows, and 45 more variables: type_of_violence <dbl>,
## #   conflict_dset_id <dbl>, conflict_new_id <dbl>, conflict_name <chr>,
## #   dyad_dset_id <dbl>, dyad_new_id <dbl>, dyad_name <chr>,
## #   side_a_dset_id <dbl>, side_a_new_id <dbl>, side_a <chr>,
## #   side_b_dset_id <dbl>, side_b_new_id <dbl>, side_b <chr>,
## #   number_of_sources <dbl>, source_article <chr>, source_office <chr>,
## #   source_date <chr>, source_headline <chr>, source_original <chr>, ...

Taking a look at the data above, I see that the principles of ‘tidy data’ are already adhered to:

  1. Each variable has its own column
  2. Each observation has its own row
  3. Each value has its own cell

The unit of analysis is an India-Pakistan conflict event, identified by the unique conflict event id (variable 1). The majority of variable names are intuitive (see the list in the tibble preview above). However, I will rename some of the relevant variables to better-suited names in the next step. The number of observations, or conflict events, captured in this data set is 149.

Step 2: Taking a Closer Look, Previewing Relevant Variables

The next step is to see if the relevant variables are ready to work with. I identified from the overall preview above that certain columns I would like to work with need to be renamed. First I rename the columns as described below, and then I preview the relevant columns:

  • deaths_b to deaths_Pakistan
  • deaths_a to deaths_India
  • high to accurate_countcasualty
  • adm_1 to Province
conflict_data = 
  rename(conflict_data, Province = adm_1, deaths_Pakistan = deaths_b, deaths_India = deaths_a, accurate_countcasualty = high)
conflict_data %>%
  select(deaths_Pakistan, deaths_India, deaths_civilians, Province, year, accurate_countcasualty)
## # A tibble: 149 x 6
##    deaths_Pakistan deaths_India deaths_civilians Province  year accurate_countc~
##              <dbl>        <dbl>            <dbl> <chr>    <dbl>            <dbl>
##  1               0            0                0 Azad Ja~  1990                5
##  2               2            0                0 Azad Ja~  1990                2
##  3               0            0                2 Azad Ja~  1991                2
##  4               0            0                0 Azad Ja~  1991                4
##  5               0            0                0 Azad Ja~  1991                4
##  6               7            0                0 Azad Ja~  1991                7
##  7               0            0                0 Azad Ja~  1991                9
##  8               3            0                0 Azad Ja~  1993                3
##  9               0            0                1 Azad Ja~  1994                1
## 10               0            0                1 Azad Ja~  1994                1
## # ... with 139 more rows

The preview looks good. I have verified that the desired columns are now named in an intuitive manner and that no columns need to be combined or separated for my analysis (for example, if I needed to use a date and the columns were separated as month, day, and year, I would need to combine them). Now I will check for missing values. The check below returns no positions, so there are none in the selected columns.

conflict_data2 = 
conflict_data %>%
  select(deaths_Pakistan, deaths_India, deaths_civilians, Province, id, year, accurate_countcasualty) 
which(is.na(conflict_data2))
## integer(0)
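
As a complementary check, a per-column count makes it easier to see where missing values sit if any do appear. The sketch below is only an illustration: it uses a small made-up data frame (`toy`, a hypothetical name) rather than the project data, with column names that mirror the renamed variables above.

```r
# Toy stand-in for the selected columns; the values are invented.
toy <- data.frame(
  deaths_Pakistan  = c(0, 2, NA),
  deaths_India     = c(0, 0, 0),
  deaths_civilians = c(0, NA, 2)
)

# Count the missing values in each column (0 means the column is complete).
na_per_column <- colSums(is.na(toy))
na_per_column

# Single yes/no answer for the whole table.
anyNA(toy)
```

On the project data, every per-column count would be expected to be zero, matching the empty result above.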

Step 3: Creating a Custom Theme For the Project

This is not a data tidying step but a step I am doing for the project to keep the plots and tables visually consistent. This will increase overall organization of the project.

custom_theme = 
  theme_minimal() +
  theme(
    plot.title = element_text(face = "bold", size = 12), 
    plot.subtitle = element_text(face = "italic", size = 10),
    axis.title = element_text(size = 10),
    axis.text = element_text(size = 10),
    legend.title = element_text(size = 8),
    legend.text = element_text(size = 8))

Analysis

Table 1: Overview of Casualties

##Creating table of summary statistics on casualties
conflict_data %>%
  ##Selecting two columns reflecting casualties that I would like to include
  select(deaths_civilians, accurate_countcasualty) %>%
  ##Calling to summarize
  summary() %>%
  ##Calling to create table
  kbl(col.names = c("Civilian Casualties", "Total Casualties"),
      align = "ccc",
      caption = "Summary Statistics on Casualties",
      format.args = list(big.mark = ",")) %>%
  #Aesthetics
  column_spec(c(1,2), bold = F, background = "white") %>%
  row_spec(0, bold = T, background = "lightblue") %>%
  kable_minimal()
Summary Statistics on Casualties
Civilian Casualties    Total Casualties
Min.   :0.0000         Min.   : 1.000
1st Qu.:0.0000         1st Qu.: 1.000
Median :0.0000         Median : 2.000
Mean   :0.7919         Mean   : 2.738
3rd Qu.:1.0000         3rd Qu.: 3.000
Max.   :5.0000         Max.   :24.000
##Finding the event which resulted in 24 deaths to mention in text
conflict_data %>%
  filter(accurate_countcasualty == 24)

The minimum number of total casualties observed in an India-Pakistan conflict event is 1: every conflict event in the data resulted in at least one death. The average number of deaths per conflict event is approximately 3 (mean 2.74). The largest number of deaths in a single conflict event is 24; this was a 1995 event in Neelam Valley, Kashmir. The maximum number of civilian deaths in one event is five. On average, approximately one civilian (mean 0.79) dies as a result of each India-Pakistan conflict event.

Graph 1: Count of India Pakistan Conflict Events over 1990-2020

##Viewing relevant descriptive statistics, mentioned in text
range(table(conflict_data$year))
mean(table(conflict_data$year))
##Creating part 1 of plot
p1<-
##Calling data and selecting variable
ggplot(data=conflict_data, aes(x=year)) +
##Setting aesthetics
  geom_bar(mapping = aes(x = year),
           fill = c("darkblue")) +
  geom_text(size = 2.5, stat ='count', aes(label =..count..), vjust=-.5)+
  ##Adding labels
  labs(title = "Count of India-Pakistan Conflict Events",
       subtitle = "Years 1990 to 2020",
       x = "Year",
       y = "Number of India-Pakistan Conflict Events") +
  ##Adding theme which I created earlier
  custom_theme

#Creating part 2, majority same process as above
p2<-
ggplot(data=conflict_data, aes(x=year)) +
  #Setting y-limit to exclude year 2003 (y=67)
  ylim(c(0, 20)) +
  geom_bar(mapping = aes(x = year),
           fill = c("lightblue")) +
  geom_text(size = 2.5, stat ='count', aes(label =..count..), vjust=-.5)+
  labs(title = "Count Excluding 2003",
       subtitle = "Years 1990 to 2020",
       x = "Year",
       y = "Number of India-Pakistan Conflict Events", 
       ##Adding a caption to appear in the bottom right corner to source the data
       caption = "Data from UCDP Georeferenced Event Dataset, 2020") +
  custom_theme

#Combining both parts to create overall graph
p1+p2

The average number of conflict events per year, taken over the 14 years in which at least one event was recorded, is 10.64. The count of India-Pakistan conflict events per year has fluctuated widely, reaching as many as 67 in a single year. Looking at the graph above, we can further confirm this because there are years between which we see dramatic increases in conflict events. We are also able to identify years of peace, approximately between 1995-1999 and 2003-2009. We can also note that the year 2003 was an all-time peak in conflict events, with a total of 67 events, and that years of peace followed.
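
One detail worth illustrating is that `table()` only lists years that actually occur in the data, so years with no events drop out of the mean and the range. Dividing the 149 events by the 14 years that appear gives 10.64. If the window is instead taken to be all 31 calendar years from 1990 to 2020 (an assumption about how the average should be defined), the figure would be about 4.8. A minimal sketch with invented years shows the difference:

```r
# Invented event years for illustration: three events, none in 1991.
toy_years <- c(1990, 1990, 1992)

# table() only lists years that occur, so 1991 is silently left out.
mean(table(toy_years))

# Declaring every year in the window as a factor level keeps zero-count years.
all_years  <- factor(toy_years, levels = 1990:1992)
counts_all <- table(all_years)
mean(counts_all)
range(counts_all)
```

Which denominator is appropriate depends on the question: the first describes how busy an active year is, while the second describes the whole period, quiet years included.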

The historical context of this time frame (2003 onwards) is that there was a peak in tensions at the line of control, or border. Over the few years prior, the frequency of conflict events had been increasing; we are able to see this in the graph as well, with the steep increase from 1 to 9 events between 2000 and 2002. In 2003 the violence was rampant, and a ‘cease-fire’ was passed towards the end of the year, in November 2003, to control the state of affairs3.

Setting aside the one dramatic year above, we are able to view a more natural trend of increase and decrease in conflict events in the right panel of the graph above, which depicts the count of conflict events for all years excluding 2003. Here we can see that there is usually a flare-up around every five years.

Graph 2: Count of Casualties by Side in India-Pakistan Conflict Events, 1990-2020

##Creating new, longer dataset, separating rows by side (Deaths side India or Pakistan)
conflict_data2 =
  pivot_longer(conflict_data,
               # Identify the column(s) of interest
               c(`deaths_Pakistan`, `deaths_India`),
               # New column where to store the names of the columns
               names_to = "side",
               # New column where to store the values
               values_to = "death_count")

##Relevant descriptive statistics
##sum deaths by side, mentioned in text
conflict_data2 %>%
  group_by(side) %>%
  summarise(sum(death_count))
##Creating plot using new dataset
conflict_data2%>%
  ##calling data, variables, and fill by (side) and where to place bars relative to each other (dodge)
  ggplot() +
  geom_bar(mapping = aes(x = year, y = death_count,
                         fill = as.character(side)),
           stat = "identity",
           position = "dodge") +
  ##Coloring
  scale_fill_manual(values = c("lightblue", "darkblue")) +
  ##Adding labels
  labs(title = "Count of Deaths, India and Pakistan",
       subtitle = "Year total across conflict events, Years 1990-2020",
       x = "Year",
       y = "Number of Deaths", 
       caption = "Data from UCDP Georeferenced Event Dataset, 2020") +
  ##Adding my custom theme
  custom_theme
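
A point to keep in mind when reading this plot: when several rows share the same year and side, dodged bars drawn with `stat = "identity"` are placed over one another rather than added together, so the visible height is the largest single value, not the yearly total. The sketch below shows one way to plot true yearly totals by aggregating first. It is an illustration under stated assumptions, using invented data and hypothetical object names (`toy_long`, `toy_totals`, `p_toy`), and is not the code that produced the graph above.

```r
library(dplyr)
library(ggplot2)

# Invented long-format data: two events in the same year for each side.
toy_long <- tibble(
  year        = c(2003, 2003, 2003, 2003),
  side        = c("deaths_India", "deaths_India",
                  "deaths_Pakistan", "deaths_Pakistan"),
  death_count = c(1, 5, 3, 5)
)

# Collapse to one row per year and side before plotting.
toy_totals <- toy_long %>%
  group_by(year, side) %>%
  summarise(death_count = sum(death_count), .groups = "drop")

# geom_col() is shorthand for geom_bar(stat = "identity");
# with one row per year and side, each bar is a yearly total.
p_toy <- ggplot(toy_totals, aes(x = year, y = death_count, fill = side)) +
  geom_col(position = "dodge")
```

Printing `p_toy` draws bars of height 6 and 8, whereas plotting `toy_long` directly with dodged bars would show 5 and 5.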

It is clear that Pakistan stands to benefit more from taking preventative measures, as there is a clear difference in the total number of deaths between India and Pakistan: 86 Pakistani deaths against 20 Indian deaths. This is visualized above, with the bars representing Pakistani casualties being on average much higher than the Indian ones. However, it is important to keep the overall message in sight, which is that deaths on both sides have continually taken place over the past 30-year time frame. Indian deaths in the India-Pakistan conflict are recorded from 1994 to 2019, and Pakistani deaths from 1990 to 2019.

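I predicted in my background section that there would be a correlation between the number of conflict events and the count of casualties. Table 2 below suggests otherwise, because many of the yearly casualty values in the graph above (intended to depict the year-wise sum of casualties) and in the table (depicting the largest casualty count of a single event in each year) are the same. This would indicate that there was one event per year that resulted in casualties of a certain count, alongside smaller events with no casualties. The reading should be treated with caution, however: the overall totals of 86 Pakistani and 20 Indian deaths exceed the sums of the yearly maxima in Table 2 (37 and 17), so in several years more than one event caused deaths.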
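
The prediction could also be checked more directly by correlating the yearly number of events with the yearly number of deaths. The following is a minimal sketch on invented data, with hypothetical names (`toy_events`, `toy_yearly`); on the project data the `deaths` column would be replaced by whichever casualty variable is of interest.

```r
library(dplyr)

# Invented event-level data; one row per conflict event.
toy_events <- tibble(
  year   = c(1990, 1990, 1991, 1992, 1992, 1992),
  deaths = c(1, 2, 1, 3, 2, 4)
)

# One row per year: number of events and total deaths.
toy_yearly <- toy_events %>%
  group_by(year) %>%
  summarise(n_events = n(), total_deaths = sum(deaths), .groups = "drop")

# Pearson correlation between yearly event count and yearly deaths.
cor(toy_yearly$n_events, toy_yearly$total_deaths)
```

With only 14 active years in the real data, any such coefficient would rest on few points and should be read as descriptive rather than conclusive.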

Table 2: Maximum Count of Deaths as a Result of an India-Pakistan Conflict Event, 1990-2020

##max deaths in a single event, by side and year, table
conflict_data2 %>%
  group_by(side, year) %>%
  summarise(max(death_count)) %>%
  kbl(col.names = c("Country", "Year", "Max Death"),
      align = "ccc",
      caption = "India-Pakistan Maximum Count of Death in a Single Event, by Year",
      format.args = list(big.mark = ""))%>%
  column_spec(c(1,2), bold = F, background = "white") %>%
  row_spec(0, bold = T, background = "lightblue") %>%
  kable_minimal()
India-Pakistan Maximum Count of Death in a Single Event, by Year
Country          Year  Max Death
deaths_India     1990  0
deaths_India     1991  0
deaths_India     1993  0
deaths_India     1994  4
deaths_India     1995  0
deaths_India     2000  0
deaths_India     2002  1
deaths_India     2003  5
deaths_India     2010  2
deaths_India     2013  0
deaths_India     2014  0
deaths_India     2017  0
deaths_India     2018  0
deaths_India     2019  5
deaths_Pakistan  1990  2
deaths_Pakistan  1991  7
deaths_Pakistan  1993  3
deaths_Pakistan  1994  0
deaths_Pakistan  1995  0
deaths_Pakistan  2000  1
deaths_Pakistan  2002  3
deaths_Pakistan  2003  5
deaths_Pakistan  2010  1
deaths_Pakistan  2013  2
deaths_Pakistan  2014  1
deaths_Pakistan  2017  8
deaths_Pakistan  2018  2
deaths_Pakistan  2019  2

India has experienced up to 5 deaths in a single event, in the years 2003 and 2019. Pakistan has experienced up to 8 deaths in a single event, in 2017. Among the years with recorded events, there are only two (1994 and 1995) in which Pakistan’s maximum is 0, and only one (1995) in which neither side recorded a death in the India-Pakistan conflict.

Table 3: Provincial Breakdown of Civilian Casualties

#Creating dataset for table of total deaths civilian, by province
per_province= 
conflict_data %>%
  group_by(Province) %>%
  summarise(sum(deaths_civilians))
as.data.frame(per_province)
##Creating table
per_province %>%
  #Naming column adding heading and formatting of observations
  kbl(col.names = c("Province", "Civilian Deaths"),
      align = "ccc",
      caption = "Total Civilian Deaths in India-Pakistan Conflict, by Province",
      format.args = list(big.mark = ",")) %>%
  #Aesthetics
  column_spec(c(1,2), bold = F, background = "white") %>%
  row_spec(0, bold = T, background = "lightblue") %>%
  kable_minimal()
Total Civilian Deaths in India-Pakistan Conflict, by Province
Province              Civilian Deaths
Azad Jammu & Kashmir  87
Gilgit-Baltistan      6
Punjab Province       24
Sindh Province        1

As expected, Kashmir has the greatest number of civilian casualties, with a large difference of 63 between Kashmir (87) and the next highest province, Punjab (24). The emotions of Kashmiri citizens echo what is reflected by the data. The following conclusions of the UNICEF survey of Jammu and Kashmir, covering both the Indian and Pakistani sides, reflect Kashmiri sentiments about the India-Pakistan conflict:

  • 80% of Kashmiris on both sides say that the dispute is important to them personally2.

  • 19% of Pakistani Kashmiris and 43% of Indian Kashmiris expressed concerns of human rights abuses2.

Graph 3: Civilian Deaths Yearly by Province, 1990-2020

##Plotting by province deaths over time, line graph
p3<-
  ##Creating Plot 
conflict_data %>%
  group_by(year,
           Province) %>%
  ##Creating object for yearly civilian deaths
  summarize(civdeaths_yearly = sum(deaths_civilians, na.rm = TRUE)) %>%
  ggplot() +
  ##Defining x y, and different lines (province)
  geom_line(aes(x = year,
                y = civdeaths_yearly,
                color = Province)) +
  ##Aesthetics
  scale_color_manual(values=c("lightblue", "violet", "blue", "purple")) +
  ##Labels
  labs(title = "Count of Civilian Deaths by Province, Pakistan",
       subtitle = "Year totals across conflict events, Years 1990-2020",
       x = "Year",
       y = "Number of Civilian Deaths", 
       caption = "Data from UCDP Georeferenced Event Dataset, 2020") +
  ##Applying theme
  custom_theme

##Creating textbox for peak point in graph
p3 +
  ##Defining text (explicit \n line breaks keep code comments out of the label)
  annotate("text", label = "Flare up of border violence for\nover 23 months, resulting in the passing\nof a cease-fire motion in November\n2003",
           ##Text aesthetics
           fontface = "italic", x = 2007, y = 45, size = 2) +
  ##Creating box and defining aesthetics
  annotate("rect", xmin = 2002.25, xmax = 2012, ymin = 40, ymax = 50,
           alpha = .1,fill = "blue")

The province of Gilgit-Baltistan has seen little India-Pakistan conflict, with 6 civilian deaths in total. Sindh province remains low, with one casualty in the year 2019. Punjab province has seen an increase in India-Pakistan conflict related casualties since 2010. This is because portions of Pakistan’s Punjab border India’s state of Punjab. Civilian deaths in this area were on a steady rise between 2010 and 2015. Since then, they have decreased rapidly, and we have not seen India-Pakistan violence in Punjab since approximately 2018. Violence in Kashmir is dramatic, with periods of rapid escalation and de-escalation creating peaks. These periods are followed by a period of little to no casualties, 2000-2003 for example. With many periods of peace followed by rapid increases in violence, Kashmiris have been forced into hopelessness:

  • Only 27% of Pakistani Kashmiris and 57% of Indian Kashmiris believed that peace talks would ever succeed.2

Table 4: India-Pakistan Conflict Impact, Demographic and Nature of Violence

Using source article headlines, the following table counts the appearance of certain keywords. The keywords point either to the nature of the event (for example, shelling, bombing, shooting) or to personal characteristics (for example, job, age, gender) of those impacted.

#Creating object to hold keywords 
demographic_words <- c("soldier", "civilian", "shelling",  "teen", "dolescent", "villager", "women", "woman", "firing", "shooting")
#Creating object appearence_words to hold the first keyword matched in each source headline
appearence_words <- str_extract(conflict_data$source_headline, paste0(demographic_words, collapse='|'))
##Creating data frame from object appearence_words
keywords <- as.data.frame(table(appearence_words))
keywords
##Creating table with key word frequency count
keywords %>%
  kbl(col.names = c("Keyword", "Frequency"),
      align = "ccc",
      caption = "Count of Keyword Appearances Across All Source Article Headlines",
      format.args = list(big.mark = ","))%>%
  column_spec(c(1,2), bold = F, background = "white") %>%
  row_spec(0, bold = T, background = "lightblue") %>%
  kable_minimal()
Count of Keyword Appearances Across All Source Article Headlines
Keyword    Frequency
civilian   21
dolescent  1
firing     13
shelling   29
shooting   1
soldier    8
villager   2
women      1

The table above implies that a great deal of India-Pakistan violence is shelling. The keyword shelling was picked up in 29 source article headlines, a large 19.5% of all headlines in the dataset. Shelling refers to bombardment with artillery shells, and it is a high-impact act because of the uncontrollable nature of its targeting. In comparison, there is 1 mention of shooting and there are 13 of firing. There are mentions of villagers, adolescents, and women in the headlines, so it is clear that there is spillover of impact beyond those directly involved in conflict events. Furthermore, the word civilian is mentioned in 21 headlines, or 14.1% of the total.
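
The shares quoted here are simple proportions of the keyword counts over the number of headlines. A short worked calculation follows, assuming one headline per event (149 rows) and taking the counts from the keyword table; the object names are hypothetical.

```r
# Counts taken from the keyword table; 149 is the number of events,
# assumed here to equal the number of headlines.
n_headlines  <- 149
keyword_freq <- c(shelling = 29, civilian = 21, firing = 13, shooting = 1)

# Share of all headlines in which each keyword was picked up, in percent.
keyword_share <- round(100 * keyword_freq / n_headlines, 1)
keyword_share
```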

Keeping in mind that these are only a select few keywords and that there is a large variety of ways to refer to the same demographic, the estimates above should be viewed as underestimates. The picking up of these keywords should be taken only as evidence that a certain group is impacted, not as a measure of that impact.
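
Part of the undercount comes from the matching itself: `str_extract()` returns only the first keyword found in each headline and is case-sensitive by default, so a headline that mentions both shelling and civilians is counted once, and a capitalized "Shelling" is missed. One possible alternative, sketched here on invented headlines with hypothetical object names, is to test each keyword separately and ignore case.

```r
library(stringr)

# Invented headlines for illustration only.
toy_headlines <- c(
  "Shelling kills two civilians near border",
  "Soldier wounded in cross-border firing",
  "Adolescent injured as shelling resumes"
)
toy_keywords <- c("soldier", "civilian", "shelling", "adolescent", "firing")

# For each keyword, count the headlines that contain it, ignoring case.
keyword_counts <- sapply(toy_keywords, function(word) {
  sum(str_detect(toy_headlines, regex(word, ignore_case = TRUE)))
})
keyword_counts
```

Because every keyword is tested against every headline, the counts can sum to more than the number of headlines, which is the intended behavior when a single headline describes both the type of violence and who was affected.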

Key Take Aways

How has India-Pakistan violence changed over the years?

Key Findings from Graph 1: “Count of India Pakistan Conflict Events over 1990-2020”

  • India-Pakistan conflict has fluctuated over the years, ranging from 0 to 67 events per year.
  • There seems to be a flare-up every 5 years or so.
  • There have been conflict events in every year from 2017 through 2019.
  • The peak of the India-Pakistan conflict was the year 2003.
  • There are a few windows of peaceful years, each about 3-7 years long.
  • Count of conflict events per year was lower in the first 5 years of the study period.
  • Average number of conflict events per year has increased in the last 5 years.

Key Findings from Graph 2: Count of Casualties by Side in India-Pakistan Conflict Events, 1990-2020

  • Total number of deaths per decade, on both sides, has increased since 1990-2000.
  • Years 2000-2003 saw a high in casualties on both sides.
  • Pakistani casualties are higher than or equal to Indian casualties in most years where casualties are observed; 1994 is a clear exception, with Indian deaths but no Pakistani deaths.
  • There are eight years in which only Pakistani casualties occur.
  • In the year 2019, there are over twice as many Indian casualties as Pakistani casualties, with 5 Indian deaths and 2 Pakistani deaths.
  • The highest number of casualties shown for any year in Graph 2, as a result of India-Pakistan conflict events, was 8, for Pakistan in 2017.

Key Findings from Table 2: Maximum Count of Deaths as a Result of an India-Pakistan Conflict Event, 1990-2020

  • The maximum number of casualties by event has been increasing for both sides since 1990.
  • In 2003 the maximum casualties from a single event were equal across sides, at 5 for both India and Pakistan.
  • Sum of deaths across conflict events (previous graph) and maximum deaths seem to be the same for the majority of years, which may indicate that usually one larger-scale casualty-causing event occurs per year.
  • The maximum number of casualties for a single event was 8 for Pakistan, occurring in 2017, and 5 for India, occurring in both 2003 and 2019.

Key Findings from Graph 3: Civilian Deaths Yearly by Province, 1990-2020

  • Kashmir has experienced periods of quick escalation and deescalation of violence, the entire cycle lasting approximately 5-10 years.
  • Kashmir has experienced 3-5 years of peace following these “cycles.”
  • Punjab experienced a steady increase in casualties as a result of India-Pakistan conflict events between 2010 and 2015, eventually decreasing and stopping altogether.

Who is impacted by this violence and how?

Key Findings from Graph 3: Civilian Deaths Yearly by Province, 1990-2020

  • Civilians in four out of the five provinces of Pakistan have experienced at least one death as a result of the India-Pakistan conflict since 1990.
  • Kashmir has lost up to 47 civilians in one year as a result of the India-Pakistan conflict.
  • Sindh has experienced one civilian death as a result of the India-Pakistan conflict.
  • Punjabi civilians account for the second highest number of deaths in the India-Pakistan conflict, grouping by province.

Key Findings from Table 4: India-Pakistan Conflict Impact, Demographic and Nature of Violence

  • At least one conflict event has impacted the demographic groups:
    • Women
    • Villagers
    • Adolescents
    • Soldiers
  • The word “shelling” commonly appears in headlines reporting on the India-Pakistan conflict, 29 times in total, indicating that shelling is a primary form of violence experienced in India-Pakistan conflict events.
  • “Shooting” and “firing” appear a total of 14 times, indicating that gunfire is another primary form of violence experienced in this conflict.
  • The word “soldier” appears 8 times and “civilian” 21 times. This may indicate that soldier deaths go unreported more often.
  • The combination of the comparatively high appearance of the words “shelling” and “civilian” may indicate that this conflict is experienced by civilians more through mass destructive events than through shootings across the border.

Limitations

  • It is possible that the ‘key word search’ element does not paint a very accurate picture as many headers may use different words to reflect the same thing.
  • It is also possible that the numbers captured in this data set are not the most accurate as these areas are poorly serviced areas and some smaller incidents may go unreported.
  • There are discrepancies between the total for deaths and death side a/b/civilian. As a result, it is possible that certain deaths were not categorized and only accounted for in the total.
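
The last limitation can be made concrete by computing, per event, how many deaths appear in the total but not in any of the three categories. This is a minimal sketch on invented rows that reuse the renamed column names from this project; `toy` and `unclassified` are hypothetical names.

```r
# Invented rows using the renamed columns from this project.
toy <- data.frame(
  deaths_Pakistan        = c(0, 2, 0),
  deaths_India           = c(0, 0, 1),
  deaths_civilians       = c(0, 0, 2),
  accurate_countcasualty = c(5, 2, 3)
)

# Deaths counted in the total but not assigned to either side or to civilians.
toy$unclassified <- toy$accurate_countcasualty -
  (toy$deaths_Pakistan + toy$deaths_India + toy$deaths_civilians)

toy$unclassified

# Number of events where the total exceeds the categorized deaths.
sum(toy$unclassified > 0)
```

Running the same calculation on the full data would show how many events, and how many deaths, fall outside the side and civilian categories.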

References

1 Sundberg, Ralph, and Erik Melander, 2013, “Introducing the UCDP Georeferenced Event Dataset”, Journal of Peace Research, vol.50, no.4

2 UNICEF, 2021, “Azad Jammu and Kashmir: Survey Findings Report 2020-21”, Multiple Indicators Survey

3 Hashim, Asad, 2019, “Timeline: India-Pakistan Relations”, Aljazeera.com https://www.aljazeera.com/news/2019/3/1/timeline-india-pakistan-relations