This project aims to provide a comprehensive analysis of political violence in Ethiopia by leveraging the ACLED dataset. The dataset contains detailed information on conflict events, including actors, event types, locations, and fatalities. Through exploratory data analysis (EDA), we will uncover insights into the dynamics of political violence, patterns of actor interactions, and geographical trends within Ethiopia. By examining this data, we seek to better understand the nature and context of political conflicts in the region.
This project will answer the following questions:
1. What are the fatality rates?
2. Which types of conflict events occur most frequently?
3. Who are the major actors?
4. Which places are most prone to conflict?
5. Which media outlets have been the main sources of conflict news?
6. Is there a correlation between specific actor interactions and the level of violence in conflict events?
7. What is the impact of civilian targeting in conflict events, and are there any trends related to this?
8. How do different administrative divisions (ADMIN1, ADMIN2, ADMIN3) relate to the frequency and nature of conflict events?
#install.packages("tidyverse")
#install.packages("dplyr")
#install.packages("ggplot2")
#install.packages("tidyr")
#install.packages("lubridate")
#install.packages("readr")
#install.packages("gridExtra")
#install.packages("sf")
#install.packages("patchwork")
#install.packages("rnaturalearth")
library("tidyverse")
library("ggplot2")
library("dplyr")
library("tidyr")
library("lubridate")
library("readr")
library("gridExtra")
library("patchwork")
library("sf")
library("rnaturalearth")
eth_data <- read.csv("C:/Users/selam/Downloads/ethiopia1 (3).csv")
head(eth_data)
The dataset covers events from 10/08/1997 to 10/20/2023.
Copyright: ACLED (Armed Conflict Location & Event Data) is the source of these data and the data are publicly available
The ACLED dataset updates weekly.
Users must adhere to ACLED’s Terms of Use, utilizing data responsibly and in good faith. Attribution Policy requires clear acknowledgment of ACLED in any use, specifying the date of access, manipulated data details, and proper citation formats. ACLED encourages responsible academic use but monitors and addresses misuse.
The dataset contains 10,660 observations of 31 variables.
→ The dataset used for this analysis was filtered to include only Ethiopia.
Converting Date
eth_data$EVENT_DATE <- as.POSIXct(eth_data$EVENT_DATE, format = "%Y-%m-%d %H:%M:%S")
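Because as.POSIXct() returns NA rather than raising an error when a string does not match the given format, it can be worth counting how many dates failed to parse. The following is a minimal sketch on a made-up vector; raw_dates is a hypothetical stand-in for the EVENT_DATE column, not the actual data.

```r
# Toy vector standing in for eth_data$EVENT_DATE (values are made up)
raw_dates <- c("2021-11-04 00:00:00", "2022-01-15 00:00:00", "15 January 2022")

# Same format string as in the conversion above; tz is set explicitly here
parsed <- as.POSIXct(raw_dates, format = "%Y-%m-%d %H:%M:%S", tz = "UTC")

# Entries that were present in the raw data but became NA after parsing
n_failed <- sum(is.na(parsed) & !is.na(raw_dates))

# Date range of the successfully parsed entries
date_range <- range(parsed, na.rm = TRUE)

print(n_failed)
print(date_range)
```

If n_failed is greater than zero on the real column, the format string probably does not match how the dates are stored in the CSV.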
Missing Values
missing_values <- colSums(is.na(eth_data))
print(missing_values)
## EVENT_ID_CNTY EVENT_DATE YEAR TIME_PRECISION
## 0 0 0 0
## DISORDER_TYPE EVENT_TYPE SUB_EVENT_TYPE ACTOR1
## 0 0 0 0
## ASSOC_ACTOR_1 INTER1 ACTOR2 ASSOC_ACTOR_2
## 0 0 0 0
## INTER2 INTERACTION CIVILIAN_TARGETING ISO
## 0 0 0 0
## REGION COUNTRY ADMIN1 ADMIN2
## 0 0 0 0
## ADMIN3 LOCATION LATITUDE LONGITUDE
## 0 0 0 0
## GEO_PRECISION SOURCE SOURCE_SCALE NOTES
## 0 0 0 0
## FATALITIES TAGS TIMESTAMP
## 0 0 0
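Note that is.na() does not count empty strings, and the later filter on actor2 suggests that some fields are blank rather than NA. As a rough sketch on toy data (the data frame toy_events and the helper count_blank_or_na are hypothetical), both kinds of missingness could be counted together like this:

```r
# Toy data: one blank actor2 and one NA fatality count (values are made up)
toy_events <- data.frame(
  actor1 = c("A", "B", "C"),
  actor2 = c("X", "", "Y"),
  fatalities = c(0, 3, NA),
  stringsAsFactors = FALSE
)

# Count entries that are NA or, for character columns, empty after trimming
count_blank_or_na <- function(col) {
  blank <- if (is.character(col)) trimws(col) == "" else rep(FALSE, length(col))
  sum(is.na(col) | blank, na.rm = TRUE)
}

blank_counts <- sapply(toy_events, count_blank_or_na)
print(blank_counts)
```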
Column name consistency
eth_data <- setNames(eth_data, tolower(colnames(eth_data)))
summary_by_year <- eth_data %>%
group_by(year) %>%
summarise(
total_fatalities = sum(fatalities, na.rm = TRUE),
total_events = n(),
fatality_rate = total_fatalities/total_events
)
head(summary_by_year)
## # A tibble: 6 × 4
## year total_fatalities total_events fatality_rate
## <int> <int> <int> <dbl>
## 1 1997 84 23 3.65
## 2 1998 958 60 16.0
## 3 1999 18839 73 258.
## 4 2000 1413 146 9.68
## 5 2001 894 56 16.0
## 6 2002 4197 233 18.0
ggplot(summary_by_year, aes(x = year, y = fatality_rate)) +
geom_line() +
labs(title = "Fatality Rate Trend Over Years",
x = "Year",
y = "Fatality Rate") +
theme_minimal()+
theme(panel.grid = element_blank())
Summary by Year: Fatality Rate, Total Events, and Total Fatalities
ggplot(summary_by_year, aes(x = year))+
geom_bar(aes(y = total_fatalities, fill = "Total Fatalities"), position = "dodge", stat = "identity")+
geom_bar(aes(y = total_events, fill = "Total Events"), position = "dodge", stat = "identity")+
geom_line(aes(y = fatality_rate * 50, group = 1, color = "Fatality Rate"), size = 1) + # Multiply by 50 so the rate is visible next to the counts
labs(title = "Summary by Year",
x = "Year",
y = "Count / Rate") +
scale_fill_manual(values = c("Total Fatalities" = "lightcoral", "Total Events" = "skyblue")) +
scale_color_manual(values = c("Fatality Rate" = "green")) +
theme_minimal() +
theme(legend.position = "top", legend.title = element_blank())+
theme(panel.grid = element_blank())
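Since the fatality rate is multiplied by 50 before plotting, the left axis no longer shows the rate in its own units. One possible way to make the line readable is a secondary axis that reverses the scaling. This is only a sketch on made-up numbers; toy_summary and scale_factor are hypothetical names, and the columns mirror summary_by_year.

```r
library(ggplot2)

# Toy yearly summary (made-up numbers, same column names as summary_by_year)
toy_summary <- data.frame(
  year = 2019:2022,
  total_events = c(300, 450, 900, 800),
  fatality_rate = c(2.1, 4.0, 9.5, 7.2)
)

scale_factor <- 50  # same multiplier as in the plot above

p <- ggplot(toy_summary, aes(x = year)) +
  geom_col(aes(y = total_events), fill = "skyblue") +
  geom_line(aes(y = fatality_rate * scale_factor), color = "green") +
  scale_y_continuous(
    name = "Total Events",
    # The secondary axis divides by the multiplier so the line reads in rate units
    sec.axis = sec_axis(~ . / scale_factor, name = "Fatality Rate")
  ) +
  theme_minimal()

p
```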
Top Fatalities and Events by year
top_fatalities <- summary_by_year %>%
arrange(desc(total_fatalities)) %>%
head(5)
# Arrange the data by total_events in descending order and select the top 5
top_events <- summary_by_year %>%
arrange(desc(total_events)) %>%
head(5)
# Create a bar plot for top 5 years with high fatalities
plot_fatalities <- ggplot(top_fatalities, aes(x = factor(year), y = total_fatalities, fill = factor(year))) +
geom_bar(stat = "identity") +
labs(title = "Top 5 Years with High Fatalities",
x = "Year",
y = "Total Fatalities") +
theme_minimal() +
theme(panel.grid = element_blank())+
guides(fill = "none")+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
# Create a bar plot for top 5 years with high number of events
plot_events <- ggplot(top_events, aes(x = factor(year), y = total_events, fill = factor(year))) +
geom_bar(stat = "identity") +
labs(title = "Top 5 Years with High Number of Events",
x = "Year",
y = "Total Events") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
theme(panel.grid = element_blank())+
guides(fill = "none")
# Arrange the plots side by side
grid.arrange(plot_fatalities, plot_events, ncol = 2)
Key Observation
The total number of fatalities varies significantly across the years, with a notable peak in 1999. That year is an outlier because of the Eritrean–Ethiopian War.
Years such as 2021 and 2022 also stand out, with both an exceptionally high fatality rate and a high number of events.
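To illustrate how strongly the 1999 outlier affects summary figures, the sketch below recomputes the fatality rate (fatalities per event) from the six rows printed in the yearly summary above and compares the mean with the median. The data frame name summary_head is hypothetical; the numbers are copied from that output.

```r
# First six rows of summary_by_year, copied from the printed output
summary_head <- data.frame(
  year = 1997:2002,
  total_fatalities = c(84, 958, 18839, 1413, 894, 4197),
  total_events = c(23, 60, 73, 146, 56, 233)
)

# Fatality rate = fatalities per recorded event
summary_head$fatality_rate <- summary_head$total_fatalities / summary_head$total_events

# The mean is pulled upward by 1999, while the median is barely affected
mean_rate <- mean(summary_head$fatality_rate)
median_rate <- median(summary_head$fatality_rate)

print(round(summary_head$fatality_rate, 2))
print(c(mean = mean_rate, median = median_rate))
```

For these six years the mean is several times the median, which is why a single line chart of the rate is dominated by 1999.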
event_count <- eth_data %>%
group_by(event_type) %>%
summarise(total_event = n()) %>%
arrange(desc(total_event))
head(event_count)
## # A tibble: 6 × 2
## event_type total_event
## <chr> <int>
## 1 Battles 4428
## 2 Protests 2391
## 3 Violence against civilians 2226
## 4 Strategic developments 672
## 5 Riots 558
## 6 Explosions/Remote violence 385
ggplot(event_count, aes(x = reorder(event_type, -total_event), y = total_event)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = "Distribution of Event Types",
x = "Event Type",
y = "Total Events") +
theme_minimal() +
theme(panel.grid = element_blank())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
actor_count1 <- eth_data %>%
group_by(actor1) %>%
summarise(total_events = n())%>%
arrange(desc(total_events))
eth_data_filtered <- eth_data %>%
filter(actor2 != "")
actor_count2 <- eth_data_filtered %>%
group_by(actor2) %>%
summarise(total_events = n())%>%
arrange(desc(total_events))
head(actor_count1)
## # A tibble: 6 × 2
## actor1 total_events
## <chr> <int>
## 1 Protesters (Ethiopia) 2374
## 2 Military Forces of Ethiopia (2018-) 1632
## 3 Military Forces of Ethiopia (1991-2018) 1432
## 4 TPLF: Tigray People's Liberation Front 737
## 5 OLF: Oromo Liberation Front (Shane Splinter Faction) 677
## 6 Rioters (Ethiopia) 554
head(actor_count2)
## # A tibble: 6 × 2
## actor2 total_events
## <chr> <int>
## 1 Civilians (Ethiopia) 2832
## 2 TPLF: Tigray People's Liberation Front 911
## 3 Military Forces of Ethiopia (1991-2018) 794
## 4 Military Forces of Ethiopia (2018-) 760
## 5 ONLF: Ogaden National Liberation Front 540
## 6 OLF: Oromo Liberation Front 311
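The two tables count an actor separately depending on whether it appears as actor1 or actor2. If overall involvement is of interest, one option is to stack both columns and count once. The following is a sketch on toy data (toy_events and actor_involvement are hypothetical names), not the method used in this report.

```r
library(dplyr)
library(tidyr)

# Toy events: actor2 is blank when there is no second actor (values are made up)
toy_events <- tibble(
  actor1 = c("A", "B", "A", "C"),
  actor2 = c("B", "", "C", "A")
)

actor_involvement <- toy_events %>%
  # Stack actor1 and actor2 into a single column
  pivot_longer(c(actor1, actor2), names_to = "role", values_to = "actor") %>%
  # Drop blank second actors, as in the filter above
  filter(actor != "") %>%
  # Count appearances regardless of role
  count(actor, name = "total_events", sort = TRUE)

print(actor_involvement)
```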
plot_actor1 <- ggplot(actor_count1 [1:5, ], aes(x = reorder(actor1, -total_events), y = total_events))+
geom_bar(stat = "identity", fill = "lightcoral")+
labs(title = "Major Actors 1",
x = "Actor1",
y = "Total Events")+
theme_minimal() +
theme(panel.grid = element_blank())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
plot_actor2 <- ggplot(actor_count2 [1:5, ], aes(x = reorder(actor2, -total_events), y = total_events))+
geom_bar(stat = "identity", fill = "skyblue")+
labs(title = "Major Actors 2",
x = "Actor2",
y = "Total Events")+
theme_minimal() +
theme(panel.grid = element_blank())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
combined_plots <- plot_actor1 + plot_actor2
# View the combined plot
combined_plots
world_map <- ne_download(returnclass = "sf")
eth_data_sf <- st_as_sf(eth_data, coords = c("longitude", "latitude"), crs = 4326)
# Filter the map data for Ethiopia
ethiopia_map <- world_map[world_map$ADM0_A3 == "ETH", ]
# Plot the Ethiopian map
ggplot() +
geom_sf(data = ethiopia_map, fill = "lightgrey", color = "black") +
geom_sf(data = eth_data_sf, aes(color = fatalities), size = 2) +
scale_color_gradient(low = "lightgreen", high = "red", guide = "legend", name = "Fatalities") +
labs(title = "Conflict Events in Ethiopia",
subtitle = "Prone areas based on Fatalities",
caption = "Source: ACLED Conflict Data for Ethiopia") +
theme_minimal()+
theme(panel.grid = element_blank())
event_by_reg <- eth_data %>%
group_by(admin1) %>%
summarise(total_event = n()) %>%
arrange(desc(total_event))
event_by_loc <- eth_data %>%
group_by(location) %>%
summarise(total_event = n()) %>%
arrange(desc(total_event))
head(event_by_reg)
## # A tibble: 6 × 2
## admin1 total_event
## <chr> <int>
## 1 Oromia 4594
## 2 Amhara 1970
## 3 Tigray 1271
## 4 Somali 1003
## 5 Addis Ababa 468
## 6 Afar 338
head(event_by_loc)
## # A tibble: 6 × 2
## location total_event
## <chr> <int>
## 1 Addis Ababa 376
## 2 Jijiga 269
## 3 Ambo 140
## 4 Gonder 137
## 5 Nekemt 133
## 6 Gambella 114
plot1 <- ggplot(event_by_reg[1:5, ], aes(x = reorder(admin1, -total_event), y = total_event, fill = total_event))+
geom_bar(stat = "identity")+
scale_fill_gradient(low = "skyblue", high = "red") +
labs(title = "Event by Region",
x = "Regions",
y = "Total Events")+
theme_minimal() +
theme(panel.grid = element_blank())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
guides(fill = "none")
plot2 <- ggplot(event_by_loc[1:5, ], aes(x = reorder(location, -total_event), y = total_event, fill = total_event))+
geom_bar(stat = "identity")+
scale_fill_gradient(low = "lightblue", high = "lightcoral") +
labs(title = "Event by Location",
x = "Location",
y = "Total Events")+
theme_minimal() +
theme(panel.grid = element_blank())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))+
guides(fill = "none")
combined_plots <- plot1 + plot2
# View the combined plot
combined_plots
From the above analysis:
1. High-Incidence Regions:
Oromia, with a total of 4594 events, stands out as the region with the highest incidence of political violence.
Amhara and Tigray follow with 1970 and 1271 events, respectively.
2. Location-Specific Hotspots:
Addis Ababa is the location with the most events (376); the Addis Ababa region as a whole accounts for 468, indicating a concentration of political events in the capital city.
Jijiga, with 269 events, is a notable hotspot outside the three highest-incidence regions.
These points suggest that Oromia, Amhara, and Tigray experience a relatively higher frequency of political events, as do specific locations such as Addis Ababa and Jijiga.
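Raw counts can be easier to compare as shares of all recorded events. The sketch below uses the six regional counts printed above and the dataset total of 10660 observations; the object names region_counts and region_share are hypothetical.

```r
# Regional event counts copied from the event_by_reg output
region_counts <- c(
  Oromia = 4594, Amhara = 1970, Tigray = 1271,
  Somali = 1003, `Addis Ababa` = 468, Afar = 338
)

# Total number of observations reported for the dataset
total_events <- 10660

# Share of all events per region, in percent
region_share <- round(100 * region_counts / total_events, 1)
print(region_share)

# Combined share of the three highest-incidence regions
top3_share <- 100 * sum(region_counts[1:3]) / total_events
print(round(top3_share, 1))
```

On these figures Oromia alone accounts for roughly 43% of all recorded events.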
source_summary <- eth_data %>%
group_by(source) %>%
summarise(total_events = n())
source_summary <- source_summary[order(-source_summary$total_events), ]
ggplot(source_summary[1:5, ], aes(reorder(source, -total_events), y = total_events))+
geom_bar(stat = "identity", fill = "skyblue")+
labs( title = "Top Media Sources of Conflict News",
x = "Source",
y = "Total Events")+
theme_minimal()+
theme(panel.grid = element_blank())+
theme(axis.text.x = element_text(angle = 45, hjust = 1))
→ Oromiya Media Network is the leading source of conflict news, as the graph above shows.
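Grouping on the raw source field treats every distinct string as one outlet. If a single event lists several outlets in one field, a per-outlet count would need the field to be split first. The sketch below assumes a semicolon separator, which should be checked against the actual data; toy_sources and the outlet names are made up.

```r
library(dplyr)
library(tidyr)

# Toy source column; the "; " separator is an assumption about the data
toy_sources <- tibble(
  source = c("Outlet A; Outlet B", "Outlet A", "Outlet C; Outlet A")
)

source_counts <- toy_sources %>%
  # One row per outlet mentioned in each event
  separate_rows(source, sep = ";\\s*") %>%
  # Count how many events each outlet contributed to
  count(source, name = "total_events", sort = TRUE)

print(source_counts)
```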
actor_intr_summary <- eth_data %>%
group_by(actor1, actor2, interaction) %>%
summarise(total_fatalities = sum(fatalities))
## `summarise()` has grouped output by 'actor1', 'actor2'. You can override using
## the `.groups` argument.
correl_coeffi <- cor(actor_intr_summary$interaction, actor_intr_summary$total_fatalities)
print(paste("Correlation Coefficient: ", correl_coeffi))
## [1] "Correlation Coefficient: -0.0369581507249406"
Create a scatter plot
plot(actor_intr_summary$interaction, actor_intr_summary$total_fatalities,
xlab = "Actor Interaction", ylab = "Total Fatalities",
main = "Correlation between Actor Interaction and Fatalities")
Observation:
A correlation coefficient of -0.03696 indicates a very weak negative correlation between the actor interaction code and total fatalities. Because the value is so close to zero, there is no clear linear association between the two. Note also that interaction is a numeric code for a category of actor pairing rather than a measured quantity, so the coefficient should be read with caution.
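One alternative that does not rely on the numeric ordering of the interaction codes is to compare fatalities across the codes as groups. The following is a sketch on toy data, not a result from this dataset; toy_interactions and its values are made up, and the Kruskal-Wallis test is just one of several possible choices.

```r
# Toy data: three interaction codes with two events each (values are made up)
toy_interactions <- data.frame(
  interaction = c(12, 12, 37, 37, 60, 60),
  fatalities = c(10, 14, 3, 5, 0, 0)
)

# Mean fatalities per interaction code, treating the code as a label
mean_by_code <- tapply(
  toy_interactions$fatalities,
  factor(toy_interactions$interaction),
  mean
)
print(mean_by_code)

# Rank-based test of whether fatalities differ between the groups
kw <- kruskal.test(fatalities ~ factor(interaction), data = toy_interactions)
print(kw$p.value)
```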
grouped_data <- eth_data %>%
group_by(admin1,admin2,admin3)
# Summarize by ADMIN1
events_by_admin1 <- grouped_data %>%
group_by(admin1) %>%
summarise(total_events = n())
# Summarize by ADMIN2
events_by_admin2 <- grouped_data %>%
group_by(admin2) %>%
summarise(total_events = n())
# Summarize by ADMIN3
events_by_admin3 <- grouped_data %>%
group_by(admin3) %>%
summarise(total_events = n())
# Function to plot bar chart for an administrative level
plot_admin_level <- function(data, level) {
ggplot(data, aes(x = reorder(get(level), -total_events), y = total_events)) +
geom_bar(stat = "identity", fill = "skyblue") +
labs(title = paste("Top 5 Events by", level),
x = level,
y = "Total Events") +
theme_minimal() +
theme(panel.grid = element_blank(),
axis.text.x = element_text(angle = 45, hjust = 1))
}
# Select top 5 for each administrative level
top5_events_by_admin1 <- events_by_admin1 %>% top_n(5)
## Selecting by total_events
top5_events_by_admin2 <- events_by_admin2 %>% top_n(5)
## Selecting by total_events
top5_events_by_admin3 <- events_by_admin3 %>% top_n(5)
## Selecting by total_events
head(top5_events_by_admin1)
## # A tibble: 5 × 2
## admin1 total_events
## <chr> <int>
## 1 Addis Ababa 468
## 2 Amhara 1970
## 3 Oromia 4594
## 4 Somali 1003
## 5 Tigray 1271
head(top5_events_by_admin2)
## # A tibble: 5 × 2
## admin2 total_events
## <chr> <int>
## 1 East Hararge 535
## 2 North Shewa 541
## 3 North Wello 472
## 4 Region 14 468
## 5 West Shewa 581
head(top5_events_by_admin3)
## # A tibble: 5 × 2
## admin3 total_events
## <chr> <int>
## 1 Ambo town 140
## 2 Gonder town 141
## 3 Jigjiga town 269
## 4 Lideta 382
## 5 Nekemte town 133
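The top_n() results above keep the original row order, so they are not ranked by event count. As a sketch, slice_max() from dplyr returns the rows already sorted from highest to lowest. The tibble below reuses the admin1 counts printed above; the object names are hypothetical.

```r
library(dplyr)

# admin1 counts copied from the top5_events_by_admin1 output
admin1_counts <- tibble(
  admin1 = c("Addis Ababa", "Amhara", "Oromia", "Somali", "Tigray"),
  total_events = c(468, 1970, 4594, 1003, 1271)
)

# Keep the three regions with the most events, ordered from highest to lowest
ranked_admin1 <- admin1_counts %>%
  slice_max(total_events, n = 3)

print(ranked_admin1)
```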
# Plot each administrative level side by side
plot1 <- plot_admin_level(top5_events_by_admin1, "admin1")
plot2 <- plot_admin_level(top5_events_by_admin2, "admin2")
plot3 <- plot_admin_level(top5_events_by_admin3, "admin3")
# Arrange plots side by side
library(gridExtra)
grid.arrange(plot1, plot2, plot3, ncol = 3)
Administrative Divisions and Conflict Events:
→ Significant variation in conflict events is observed across administrative divisions.
→ Certain regions, such as Oromia and Amhara, consistently stand out with high conflict frequencies.
→ The capital city Addis Ababa (including its Lideta sub-city) and towns like Jigjiga show concentrated political events.
→ West Shewa, East Hararge, and North Shewa emerge as hotspots at the ADMIN2 level, emphasizing regional disparities.
→ Specific towns like Gonder, Ambo, and Nekemte play noteworthy roles at the ADMIN3 level.
These patterns suggest that understanding conflict dynamics requires considering both administrative and geographical contexts.
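The summaries above cover how often events occur in each division. To look at the nature of events as well, one option is to compute the mix of event types within each region. This is a sketch on toy data (toy_events and event_mix are hypothetical, and the rows are made up), using the admin1 and event_type columns.

```r
library(dplyr)

# Toy events with a region and an event type (values are made up)
toy_events <- tibble(
  admin1 = c("Oromia", "Oromia", "Oromia", "Amhara", "Amhara"),
  event_type = c("Battles", "Protests", "Battles", "Battles", "Riots")
)

event_mix <- toy_events %>%
  # Number of events per region and event type
  count(admin1, event_type) %>%
  # Share of each event type within its region
  group_by(admin1) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

print(event_mix)
```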
Conclusion:
Regional Disparities:
Conflict events in Ethiopia are not uniform and exhibit regional disparities. Regions like Oromia and Amhara consistently experience higher frequencies of conflict.
Urban Concentration:
Urban centers, including the capital city Addis Ababa (notably its Lideta sub-city) and towns like Jigjiga, are focal points for political events.
Temporal Trends:
Examining trends over time reveals fluctuations, suggesting a dynamic socio-political landscape.
Recommendations:
Enhanced Monitoring:
Implement a robust monitoring system, especially in high-conflict regions, to promptly identify and respond to emerging issues.
Community Engagement:
Foster community engagement and dialogue to address underlying issues contributing to conflict, emphasizing local context.
Urban Security Measures:
Implement targeted security measures in urban centers to mitigate the impact of conflict events, especially in capital regions.
Data-Driven Policies:
Utilize data analytics for evidence-based policy formulation, adapting strategies based on changing conflict dynamics.
Collaborative Initiatives:
Collaborate with local and international stakeholders to address conflict at various administrative levels and promote peace-building efforts.
Harmony in Diversity!
Progress in Unity!
Nurturing Peace for a Flourishing Ethiopia!