1. Introduction

1.1 Project Summary:

This project aims to provide a comprehensive analysis of political violence in Ethiopia by leveraging the ACLED dataset. The dataset contains detailed information on conflict events, including actors, event types, locations, and fatalities. Through exploratory data analysis (EDA), we will uncover insights into the dynamics of political violence, patterns of actor interactions, and geographical trends within Ethiopia. By examining this data, we seek to better understand the nature and context of political conflicts in the region.

This project will answer the following questions:

1. What are the fatality rates?

2. Which types of conflict events are most frequent?

3. Who are the major actors?

4. Which places are most prone to conflict?

5. Which media outlets have been the main sources of conflict news?

6. Is there a correlation between specific actor interactions and the level of violence in conflict events?

7. What is the impact of civilian targeting in conflict events, and are there any trends related to this?

8. How do different administrative divisions (ADMIN1, ADMIN2, ADMIN3) relate to the frequency and nature of conflict events?

1.2 Install Packages

#install.packages("tidyverse")
#install.packages("dplyr")
#install.packages("ggplot2")  
#install.packages("tidyr")
#install.packages("lubridate")
#install.packages("readr")
#install.packages("gridExtra")
#install.packages("sf")
#install.packages("patchwork")
#install.packages("rnaturalearth")
library("tidyverse")
library("ggplot2")
library("dplyr")
library("tidyr")
library("lubridate")
library("readr")
library("gridExtra")
library("patchwork")
library("sf")
library("rnaturalearth")

2. Prepare Data

2.1. Load Data

eth_data <- read.csv("C:/Users/selam/Downloads/ethiopia1 (3).csv")
head(eth_data)

2.2. About the Data

The dataset covers events from 10/08/1997 to 10/20/2023.

Copyright: ACLED (Armed Conflict Location & Event Data) is the source of these data and the data are publicly available

The ACLED dataset updates weekly.

Users must adhere to ACLED’s Terms of Use, utilizing data responsibly and in good faith. Attribution Policy requires clear acknowledgment of ACLED in any use, specifying the date of access, manipulated data details, and proper citation formats. ACLED encourages responsible academic use but monitors and addresses misuse.

The dataset contains 10,660 observations of 31 variables.

→ The dataset used for this analysis was filtered to include only Ethiopia.

2.3. Data Cleaning

Converting Date

eth_data$EVENT_DATE <- as.POSIXct(eth_data$EVENT_DATE, format = "%Y-%m-%d %H:%M:%S")
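Because as.POSIXct() silently returns NA for any string that does not match the supplied format, it can be worth checking how many values failed to parse. The snippet below is a minimal sketch on a hypothetical vector of raw date strings (raw_dates is not part of the dataset, and the third value is deliberately malformed); the format string is the one used above.

```r
# Hypothetical raw date strings; the last one does not match the format
raw_dates <- c("1997-10-08 00:00:00", "2023-10-20 00:00:00", "20/10/2023")

# Parse with the same format string used for EVENT_DATE
parsed <- as.POSIXct(raw_dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Strings that do not match the format become NA
n_failed <- sum(is.na(parsed))
print(n_failed)

# Date range of the values that parsed successfully
print(range(parsed, na.rm = TRUE))
```

If n_failed is greater than zero on the real column, the format string probably does not match how the CSV stores its dates.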

Missing Value

missing_values <- colSums(is.na(eth_data))
print(missing_values)
##      EVENT_ID_CNTY         EVENT_DATE               YEAR     TIME_PRECISION 
##                  0                  0                  0                  0 
##      DISORDER_TYPE         EVENT_TYPE     SUB_EVENT_TYPE             ACTOR1 
##                  0                  0                  0                  0 
##      ASSOC_ACTOR_1             INTER1             ACTOR2      ASSOC_ACTOR_2 
##                  0                  0                  0                  0 
##             INTER2        INTERACTION CIVILIAN_TARGETING                ISO 
##                  0                  0                  0                  0 
##             REGION            COUNTRY             ADMIN1             ADMIN2 
##                  0                  0                  0                  0 
##             ADMIN3           LOCATION           LATITUDE          LONGITUDE 
##                  0                  0                  0                  0 
##      GEO_PRECISION             SOURCE       SOURCE_SCALE              NOTES 
##                  0                  0                  0                  0 
##         FATALITIES               TAGS          TIMESTAMP 
##                  0                  0                  0
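Note that is.na() does not flag empty strings, and text columns read with read.csv() can hold "" where a value is absent (the actor analysis later filters on actor2 != "" for this reason). The following is a minimal sketch, using a small hypothetical data frame (toy) and a hypothetical helper (count_blank_or_na), of how blank strings could be counted alongside NA values.

```r
# Hypothetical stand-in for eth_data with one blank and one NA in ACTOR2
toy <- data.frame(
  ACTOR1 = c("Protesters (Ethiopia)", "Rioters (Ethiopia)", "Protesters (Ethiopia)"),
  ACTOR2 = c("", "Civilians (Ethiopia)", NA),
  FATALITIES = c(0, 2, 1),
  stringsAsFactors = FALSE
)

# Count cells per column that are NA or contain only whitespace
count_blank_or_na <- function(df) {
  vapply(df, function(col) {
    blank <- if (is.character(col)) trimws(col) == "" else rep(FALSE, length(col))
    sum(is.na(col) | blank)
  }, integer(1))
}

blank_counts <- count_blank_or_na(toy)
print(blank_counts)
```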

Column name consistency

eth_data <- setNames(eth_data, tolower(colnames(eth_data)))

3. Analyzing

3.1. Fatality Rate Over Years

summary_by_year <- eth_data %>% 
  group_by(year) %>% 
  summarise(
    total_fatalities = sum(fatalities, na.rm = TRUE),
    total_events = n(),
    fatality_rate = total_fatalities/total_events
    )
head(summary_by_year)
## # A tibble: 6 × 4
##    year total_fatalities total_events fatality_rate
##   <int>            <int>        <int>         <dbl>
## 1  1997               84           23          3.65
## 2  1998              958           60         16.0 
## 3  1999            18839           73        258.  
## 4  2000             1413          146          9.68
## 5  2001              894           56         16.0 
## 6  2002             4197          233         18.0
ggplot(summary_by_year, aes(x = year, y = fatality_rate)) +
  geom_line() +
  labs(title = "Fatality Rate Trend Over Years",
       x = "Year",
       y = "Fatality Rate") +
  theme_minimal()+
  theme(panel.grid = element_blank())

Summary by Year: Fatality Rate, Total Events, and Total Fatality

ggplot(summary_by_year, aes(x = year))+
  geom_bar(aes(y = total_fatalities, fill = "Total Fatalities"), position = "dodge", stat = "identity")+
  geom_bar(aes(y = total_events, fill = "Total Events"), position = "dodge", stat = "identity")+
  geom_line(aes(y = fatality_rate * 50, group = 1, color = "Fatality Rate"), size = 1) +  # Multiply by 50 so the rate is visible next to the counts
  labs(title = "Summary by Year",
       x = "Year",
       y = "Count / Rate") +
  scale_fill_manual(values = c("Total Fatalities" = "lightcoral", "Total Events" = "skyblue")) +
  scale_color_manual(values = c("Fatality Rate" = "green")) +
  theme_minimal() +
  theme(legend.position = "top", legend.title = element_blank())+
  theme(panel.grid = element_blank())
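Multiplying the rate by a constant makes the line visible, but the y-axis then no longer shows the true rate. One possible alternative is a secondary axis that reverses the scaling. The sketch below uses a small hypothetical data frame (toy_summary) and a hypothetical scale_factor; it is an illustration of ggplot2's sec_axis(), not the plot used in this report.

```r
library(ggplot2)

# Hypothetical yearly summary with the same column names as summary_by_year
toy_summary <- data.frame(
  year = 2019:2021,
  total_events = c(100, 200, 400),
  fatality_rate = c(1.5, 3.0, 2.0)
)

# Factor used to stretch the rate onto the count scale (assumed value)
scale_factor <- 50

p <- ggplot(toy_summary, aes(x = year)) +
  geom_col(aes(y = total_events), fill = "skyblue") +
  geom_line(aes(y = fatality_rate * scale_factor), color = "darkgreen") +
  # The secondary axis divides by the same factor, so it shows the true rate
  scale_y_continuous(
    name = "Total Events",
    sec.axis = sec_axis(~ . / scale_factor, name = "Fatality Rate")
  ) +
  theme_minimal()

p
```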

Top Fatalities and Events by year

top_fatalities <- summary_by_year %>%
  arrange(desc(total_fatalities)) %>%
  head(5)

# Arrange the data by total_events in descending order and select the top 5
top_events <- summary_by_year %>%
  arrange(desc(total_events)) %>%
  head(5)

# Create a bar plot for top 5 years with high fatalities
plot_fatalities <- ggplot(top_fatalities, aes(x = factor(year), y = total_fatalities, fill = factor(year))) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Years with High Fatalities",
       x = "Year",
       y = "Total Fatalities") +
  theme_minimal() +
  theme(panel.grid = element_blank())+
  guides(fill = "none")+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Create a bar plot for top 5 years with high number of events
plot_events <- ggplot(top_events, aes(x = factor(year), y = total_events, fill = factor(year))) +
  geom_bar(stat = "identity") +
  labs(title = "Top 5 Years with High Number of Events",
       x = "Year",
       y = "Total Events") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  theme(panel.grid = element_blank())+
  guides(fill = "none")
# Arrange the plots side by side
grid.arrange(plot_fatalities, plot_events, ncol = 2)

Key Observation

  1. Total Fatalities and Events:
  • The total number of fatalities varies widely from year to year. The 1999 peak is an outlier caused by the Eritrean–Ethiopian War.

  • 2021 and 2022 also stand out, with both high total fatalities and a high number of events.

  2. Top 5 Years by Total Events:
  • 2021 records the highest number of events, followed by 2022, 2016, 2023, and 2018. Note that the 2023 data run only through 10/20/2023.

  3. Top 5 Years by Total Fatalities:
  • 1999 has the highest total fatalities, followed by 2021, 2022, 2002, and 2020.

3.2. Which types of conflict events are most frequent?

event_count <- eth_data %>% 
  group_by(event_type) %>% 
  summarise(total_event = n()) %>% 
  arrange(desc(total_event))
head(event_count)
## # A tibble: 6 × 2
##   event_type                 total_event
##   <chr>                            <int>
## 1 Battles                           4428
## 2 Protests                          2391
## 3 Violence against civilians        2226
## 4 Strategic developments             672
## 5 Riots                              558
## 6 Explosions/Remote violence         385
ggplot(event_count, aes(x = reorder(event_type, -total_event), y = total_event)) +
  geom_bar(stat = "identity", fill = "skyblue") +
  labs(title = "Distribution of Event Types",
       x = "Event Type",
       y = "Total Events") +
  theme_minimal() +
  theme(panel.grid = element_blank())+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

  • ‘Battles’ are the most frequent event type, accounting for 4,428 of the 10,660 events (about 42%), followed by protests (2,391) and violence against civilians (2,226). Armed clashes are therefore the most common form of conflict event in the dataset.

3.3 Who are the major actors?

actor_count1 <- eth_data %>% 
  group_by(actor1) %>% 
  summarise(total_events = n())%>%
  arrange(desc(total_events))

eth_data_filtered <- eth_data %>%
   filter(actor2 != "")
actor_count2 <- eth_data_filtered %>% 
  group_by(actor2) %>% 
  summarise(total_events = n())%>%
  arrange(desc(total_events))
head(actor_count1)
## # A tibble: 6 × 2
##   actor1                                               total_events
##   <chr>                                                       <int>
## 1 Protesters (Ethiopia)                                        2374
## 2 Military Forces of Ethiopia (2018-)                          1632
## 3 Military Forces of Ethiopia (1991-2018)                      1432
## 4 TPLF: Tigray People's Liberation Front                        737
## 5 OLF: Oromo Liberation Front (Shane Splinter Faction)          677
## 6 Rioters (Ethiopia)                                            554
head(actor_count2)
## # A tibble: 6 × 2
##   actor2                                  total_events
##   <chr>                                          <int>
## 1 Civilians (Ethiopia)                            2832
## 2 TPLF: Tigray People's Liberation Front           911
## 3 Military Forces of Ethiopia (1991-2018)          794
## 4 Military Forces of Ethiopia (2018-)              760
## 5 ONLF: Ogaden National Liberation Front           540
## 6 OLF: Oromo Liberation Front                      311
plot_actor1 <- ggplot(actor_count1 [1:5, ], aes(x = reorder(actor1, -total_events), y = total_events))+
  geom_bar(stat = "identity", fill = "lightcoral")+
  labs(title = "Major Actors 1",
       x = "Actor1",
       y = "Total Events")+
  theme_minimal() +
  theme(panel.grid = element_blank())+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot_actor2 <- ggplot(actor_count2 [1:5, ], aes(x = reorder(actor2, -total_events), y = total_events))+
  geom_bar(stat = "identity", fill = "skyblue")+
  labs(title = "Major Actors 2",
       x = "Actor2",
       y = "Total Events")+
  theme_minimal() +
  theme(panel.grid = element_blank())+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

combined_plots <- plot_actor1 + plot_actor2

# View the combined plot
combined_plots

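  • As the graphs show, “Protesters (Ethiopia)” is the most frequent actor1 entry (2,374 events), while “Civilians (Ethiopia)” is the most frequent actor2 entry (2,832 events).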
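One caveat when ranking actors: the table lists the Military Forces of Ethiopia under two separate labels, one per period (1991-2018 and 2018-). A minimal sketch of merging such labels before ranking is shown below. It uses four rows copied from the actor1 table above; the actor_base column and the regular expression are illustrative assumptions, not part of the original analysis.

```r
# Four rows taken from the actor1 table printed above
actor_counts <- data.frame(
  actor1 = c("Protesters (Ethiopia)",
             "Military Forces of Ethiopia (2018-)",
             "Military Forces of Ethiopia (1991-2018)",
             "Rioters (Ethiopia)"),
  total_events = c(2374, 1632, 1432, 554),
  stringsAsFactors = FALSE
)

# Strip a trailing "(YYYY-)" or "(YYYY-YYYY)" period suffix, if present
actor_counts$actor_base <- sub("\\s*\\(\\d{4}-(\\d{4})?\\)$", "",
                               actor_counts$actor1, perl = TRUE)

# Sum events per merged label and sort in descending order
merged <- aggregate(total_events ~ actor_base, data = actor_counts, FUN = sum)
merged <- merged[order(-merged$total_events), ]
print(merged)
```

With the two periods combined, the military accounts for 3,064 events as actor1, more than the 2,374 recorded for protesters.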

3.4. Which places are most prone to conflict?

world_map <- ne_download(returnclass = "sf")

eth_data_sf <- st_as_sf(eth_data, coords = c("longitude", "latitude"), crs = 4326)
# Filter the map data for Ethiopia
ethiopia_map <- world_map[world_map$ADM0_A3 == "ETH", ]

# Plot the Ethiopian map
ggplot() +
  geom_sf(data = ethiopia_map, fill = "lightgrey", color = "black") +
  geom_sf(data = eth_data_sf, aes(color = fatalities), size = 2) +
  scale_color_gradient(low = "lightgreen", high = "red", guide = "legend", name = "Fatalities") +
  labs(title = "Conflict Events in Ethiopia",
       subtitle = "Prone areas based on Fatalities",
       caption = "Source: ACLED Conflict Data for Ethiopia") +
  theme_minimal()+
  theme(panel.grid = element_blank())

event_by_reg <- eth_data %>% 
   group_by(admin1) %>% 
   summarise(total_event = n()) %>% 
   arrange(desc(total_event))
event_by_loc <- eth_data %>% 
    group_by(location) %>% 
    summarise(total_event = n()) %>% 
    arrange(desc(total_event))
head(event_by_reg)
## # A tibble: 6 × 2
##   admin1      total_event
##   <chr>             <int>
## 1 Oromia             4594
## 2 Amhara             1970
## 3 Tigray             1271
## 4 Somali             1003
## 5 Addis Ababa         468
## 6 Afar                338
head(event_by_loc)
## # A tibble: 6 × 2
##   location    total_event
##   <chr>             <int>
## 1 Addis Ababa         376
## 2 Jijiga              269
## 3 Ambo                140
## 4 Gonder              137
## 5 Nekemt              133
## 6 Gambella            114
plot1 <- ggplot(event_by_reg[1:5, ], aes(x = reorder(admin1, -total_event), y = total_event, fill = total_event))+
  geom_bar(stat = "identity")+
  scale_fill_gradient(low = "skyblue", high = "red") +
  labs(title = "Event by Region",
       x = "Regions",
       y = "Total Events")+
  theme_minimal() +
  theme(panel.grid = element_blank())+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  guides(fill = "none")

plot2 <- ggplot(event_by_loc[1:5, ], aes(x = reorder(location, -total_event), y = total_event, fill = total_event))+
  geom_bar(stat = "identity")+
  scale_fill_gradient(low = "lightblue", high = "lightcoral") +
  labs(title = "Event by Location",
       x = "Location",
       y = "Total Events")+
  theme_minimal() +
  theme(panel.grid = element_blank())+
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  guides(fill = "none")
combined_plots <- plot1 + plot2

# View the combined plot
combined_plots

From the above analysis:

1. High Incidence Regions:-

Oromia, with a total of 4594 events, stands out as the region with the highest incidence of political violence.

Amhara and Tigray follow with 1970 and 1271 events, respectively.

2. Location-Specific Hotspots:-

Addis Ababa is the single most affected location, with 376 events (468 across the whole Addis Ababa region), indicating a concentration of political events in the capital city.

Jijiga, with 269 events, is a notable hotspot outside the three highest-incidence regions.

These results suggest that Oromia, Amhara, and Tigray experience the highest frequency of political events, with additional concentrations in specific locations such as Addis Ababa and Jijiga.

3.5. Which media outlets have been the main sources of conflict news?

source_summary <- eth_data %>% 
  group_by(source) %>% 
  summarise(total_events = n())

source_summary <- source_summary[order(-source_summary$total_events), ]
ggplot(source_summary[1:5, ], aes(reorder(source, -total_events), y = total_events))+
  geom_bar(stat = "identity", fill = "skyblue")+
  labs( title = "Top Media Sources of Conflict News",
        x = "Source",
        y = "Total Events")+
  theme_minimal()+
  theme(panel.grid = element_blank())+
  theme(axis.text.x  = element_text(angle = 45, hjust = 1))

Based on the graph above, Oromiya Media Network is the most frequently cited source of conflict news.
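Grouping on the raw source column treats each distinct string as one source. If a single event can list several outlets in one field separated by semicolons (an assumption about the file that should be checked against the data), each outlet could be counted separately. The sketch below uses hypothetical outlet names (Outlet A, B, C) rather than real values from the dataset.

```r
# Hypothetical source strings; some list more than one outlet
toy_sources <- c("Outlet A; Outlet B", "Outlet B", "Outlet C; Outlet A", "Outlet B")

# Split on semicolons and trim surrounding whitespace
split_sources <- trimws(unlist(strsplit(toy_sources, ";", fixed = TRUE)))

# Count how often each individual outlet appears
source_counts <- sort(table(split_sources), decreasing = TRUE)
print(source_counts)
```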

3.6. Is there a correlation between specific actor interactions and the level of violence in conflict events?

actor_intr_summary <- eth_data %>%
  group_by(actor1, actor2, interaction) %>% 
  summarise(total_fatalities = sum(fatalities), .groups = "drop")
correl_coeffi <- cor(actor_intr_summary$interaction, actor_intr_summary$total_fatalities)

print(paste("Correlation Coefficient: ", correl_coeffi))
## [1] "Correlation Coefficient:  -0.0369581507249406"

Create a scatter plot

plot(actor_intr_summary$interaction, actor_intr_summary$total_fatalities,
      xlab = "Actor Interaction", ylab = "Total Fatalities",
     main = "Correlation between Actor Interaction and Fatalities")

Observation:

A correlation coefficient of -0.037 indicates essentially no linear relationship between the interaction code and total fatalities. Note that the interaction column is a numeric code identifying the pairing of actor types rather than an ordered quantity, so a Pearson coefficient computed on it should be interpreted with caution.
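Since the interaction codes label categories, one alternative is to treat them as a factor and compare fatalities across groups. The following is a minimal sketch on hypothetical data (the codes and fatality values in toy are made up for illustration); it swaps the Pearson correlation for per-group medians and a Kruskal-Wallis rank test, which is a different technique from the one used above.

```r
# Hypothetical events: three interaction codes with made-up fatality counts
toy <- data.frame(
  interaction = c(12, 12, 12, 17, 17, 17, 60, 60, 60),
  fatalities  = c(10, 14, 8, 3, 5, 2, 0, 0, 1)
)

# Median fatalities per interaction code, treating the code as a category
median_by_code <- tapply(toy$fatalities, factor(toy$interaction), median)
print(median_by_code)

# Kruskal-Wallis test: do fatality distributions differ between the codes?
kw <- kruskal.test(fatalities ~ factor(interaction), data = toy)
print(kw)
```

A small p-value here would suggest that at least one interaction category differs in its fatality distribution, which a single correlation coefficient on the raw codes cannot show.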

3.8. How do different administrative divisions (ADMIN1, ADMIN2, ADMIN3) relate to the frequency and nature of conflict events?

grouped_data <- eth_data %>%
  group_by(admin1,admin2,admin3)
# Summarize by ADMIN1
events_by_admin1 <- grouped_data %>%
  group_by(admin1) %>% 
  summarise(total_events = n())

# Summarize by ADMIN2
events_by_admin2 <- grouped_data %>%
  group_by(admin2) %>% 
  summarise(total_events = n())

# Summarize by ADMIN3
events_by_admin3 <- grouped_data %>%
  group_by(admin3) %>% 
  summarise(total_events = n())
# Function to plot bar chart for an administrative level
plot_admin_level <- function(data, level) {
  ggplot(data, aes(x = reorder(get(level), -total_events), y = total_events)) +
    geom_bar(stat = "identity", fill = "skyblue") +
    labs(title = paste("Top 5 Events by", level),
         x = level,
         y = "Total Events") +
    theme_minimal() +
    theme(panel.grid = element_blank(),
          axis.text.x = element_text(angle = 45, hjust = 1))
}

# Select top 5 for each administrative level
top5_events_by_admin1 <- events_by_admin1 %>% top_n(5)
## Selecting by total_events
top5_events_by_admin2 <- events_by_admin2 %>% top_n(5)
## Selecting by total_events
top5_events_by_admin3 <- events_by_admin3 %>% top_n(5)
## Selecting by total_events
head(top5_events_by_admin1)
## # A tibble: 5 × 2
##   admin1      total_events
##   <chr>              <int>
## 1 Addis Ababa          468
## 2 Amhara              1970
## 3 Oromia              4594
## 4 Somali              1003
## 5 Tigray              1271
head(top5_events_by_admin2)
## # A tibble: 5 × 2
##   admin2       total_events
##   <chr>               <int>
## 1 East Hararge          535
## 2 North Shewa           541
## 3 North Wello           472
## 4 Region 14             468
## 5 West Shewa            581
head(top5_events_by_admin3)
## # A tibble: 5 × 2
##   admin3       total_events
##   <chr>               <int>
## 1 Ambo town             140
## 2 Gonder town           141
## 3 Jigjiga town          269
## 4 Lideta                382
## 5 Nekemte town          133
# Plot each administrative level side by side
plot1 <- plot_admin_level(top5_events_by_admin1, "admin1")
plot2 <- plot_admin_level(top5_events_by_admin2, "admin2")
plot3 <- plot_admin_level(top5_events_by_admin3, "admin3")

# Arrange plots side by side
library(gridExtra)
grid.arrange(plot1, plot2, plot3, ncol = 3)

Administrative Divisions and Conflict Events:

Significant variation in conflict events is observed across administrative divisions.

Certain regions, such as Oromia and Amhara, consistently stand out with high conflict frequencies.

The capital city Addis Ababa, including its Lideta sub-city, and towns such as Jigjiga show concentrated political events.

West Shewa, East Hararge, and North Shewa emerge as hotspots at the ADMIN2 level, emphasizing regional disparities.

Specific towns like Gonder, Ambo, and Nekemte play noteworthy roles at the ADMIN3 level.

These patterns suggest that understanding conflict dynamics requires considering both administrative and geographical contexts.

4. Conclusions and Recommendations

Conclusion:

Regional Disparities:

Conflict events in Ethiopia are not uniform and exhibit regional disparities. Regions like Oromia and Amhara consistently experience higher frequencies of conflict.

Urban Concentration:

Urban centers, including the capital city Addis Ababa (notably its Lideta sub-city) and towns such as Jigjiga, are focal points for political events.

Temporal Trends:

Event counts and fatalities fluctuate over time, with peaks in 1999 and again in 2021 and 2022, suggesting a dynamic socio-political landscape.

Recommendations:

Enhanced Monitoring:

Implement a robust monitoring system, especially in high-conflict regions, to promptly identify and respond to emerging issues.

Community Engagement:

Foster community engagement and dialogue to address underlying issues contributing to conflict, emphasizing local context.

Urban Security Measures:

Implement targeted security measures in urban centers to mitigate the impact of conflict events, especially in capital regions.

Data-Driven Policies:

Utilize data analytics for evidence-based policy formulation, adapting strategies based on changing conflict dynamics.

Collaborative Initiatives:

Collaborate with local and international stakeholders to address conflict at various administrative levels and promote peace-building efforts.

Harmony in Diversity!

Progress in Unity!

Nurturing Peace for a Flourishing Ethiopia!