The Countries by Intentional Homicide Rate dataset provides information on the intentional homicide rate in countries around the world. The dataset contains information on more than 150 countries and territories, including both developed and developing nations. It provides a comprehensive overview of the variation in homicide rates across different regions and countries around the world.
The aim of this project is to explore global homicide data to understand how homicide rates differ across regions and countries.
1 Which year has the most records?
2 Which region has the highest homicide rate?
3 Which subregion has the highest homicide rate?
4 Which locations have the highest homicide rates?
5 Which region and subregion have the lowest homicide rates?
6 Which part of Africa has the highest homicide rate?
library(tidyverse)
Before starting any analysis, I first had to load the data and get acquainted with it. I loaded the dataset and performed a quick check: I looked at the first few rows, the structure of the columns, and a statistical summary.
This initial look showed that the data covers 195 locations, with key information such as the homicide Rate (per 100,000 people), the total Count, and the Year the data was recorded.
Understanding each column:
Location, Region, Subregion: These three columns describe where the record comes from. Location names the exact country or territory, Subregion names the broader area it belongs to, and Region names the continent.
Count: This is the total number of homicides that were recorded in that location.
Rate: This is the most important number for making comparisons. It’s the homicide count adjusted for the country’s population size, specifically showing the number of homicides “per 100,000 people”.
Year: This tells us the year the data was collected.
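As a rough sanity check on the Rate definition, the relationship can be written as Rate = Count / Population * 100,000. The dataset has no population column, so the sketch below backs out an implied population from Rate and Count for three rows of the data (Afghanistan, Albania, Algeria). The names `sample_rows`, `implied_population`, and `rate_check` are introduced here for illustration only. Because Rate is rounded to one decimal place, the implied population is only approximate, and the calculation would not work for rows where Rate is 0.

```r
# Three rows copied from the dataset (Rate is per 100,000 people)
sample_rows <- data.frame(
  Location = c("Afghanistan", "Albania", "Algeria"),
  Rate     = c(6.7, 2.1, 1.3),
  Count    = c(2474, 61, 580)
)

# Rate = Count / Population * 100000, so Population = Count / Rate * 100000
# (approximate, because Rate is rounded; undefined when Rate is 0)
sample_rows$implied_population <- sample_rows$Count / sample_rows$Rate * 1e5

# Recomputing the rate from the implied population recovers the original Rate
sample_rows$rate_check <- sample_rows$Count / sample_rows$implied_population * 1e5

print(sample_rows)
```

This is why Rate, not Count, is the right column for comparing locations of very different sizes.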
crime_data1<- read.csv("C:\\Users\\USER\\Desktop\\R folder\\countries-by-intentional-homicide-rate.csv")
# view the head of the data
head(crime_data1)
## Location Region Subregion Rate Count Year
## 1 Afghanistan Asia Southern Asia 6.7 2474 2018
## 2 Albania Europe Southern Europe 2.1 61 2020
## 3 Algeria Africa Northern Africa 1.3 580 2020
## 4 Andorra Europe Southern Europe 2.6 2 2020
## 5 Angola Africa Middle Africa 4.8 1217 2012
## 6 Anguilla Americas Caribbean 28.3 4 2014
Checking the structure of the data:
str(crime_data1)
## 'data.frame': 195 obs. of 6 variables:
## $ Location : chr "Afghanistan" "Albania" "Algeria" "Andorra" ...
## $ Region : chr "Asia" "Europe" "Africa" "Europe" ...
## $ Subregion: chr "Southern Asia" "Southern Europe" "Northern Africa" "Southern Europe" ...
## $ Rate : num 6.7 2.1 1.3 2.6 4.8 28.3 9.2 5.3 1.8 1.9 ...
## $ Count : int 2474 61 580 2 1217 4 9 2416 52 2 ...
## $ Year : int 2018 2020 2020 2020 2012 2014 2020 2020 2020 2014 ...
# convert the Region and Subregion columns to factors
crime_data1$Subregion<-as.factor(crime_data1$Subregion)
crime_data1$Region<- as.factor(crime_data1$Region)
summary(crime_data1)
## Location Region Subregion Rate
## Length:195 Africa :40 Caribbean :25 Min. : 0.000
## Class :character Americas:51 Western Asia :20 1st Qu.: 1.100
## Mode :character Asia :51 Southern Europe:17 Median : 2.600
## Europe :51 Eastern Africa :15 Mean : 6.845
## Oceania : 2 Northern Europe:15 3rd Qu.: 7.850
## South America :13 Max. :49.300
## (Other) :90
## Count Year
## Min. : 0 Min. :2006
## 1st Qu.: 28 1st Qu.:2016
## Median : 128 Median :2019
## Mean : 1943 Mean :2017
## 3rd Qu.: 785 3rd Qu.:2020
## Max. :47722 Max. :2021
##
colSums(is.na(crime_data1))
## Location Region Subregion Rate Count Year
## 0 0 0 0 0 0
Interpretation:
The dataset consists of 195 observations and 6 variables: Location (country or territory), Region, Subregion, Rate, Count, and Year.
- Rate is a numerical variable representing the homicide rate per 100,000 population.
- Count is the absolute number of homicides.
- Year indicates when the data was recorded, ranging from 2006 to 2021, with most records concentrated around 2019-2020.
- There are no NA values in any column according to the colSums(is.na()) check, although empty strings could still be present in the character columns, since is.na() does not flag them.
- This dataset provides a snapshot of homicide statistics across different parts of the world.
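Since `is.na()` does not catch empty strings, a separate check for blank text values may be worthwhile. The following is a minimal sketch on a small made-up data frame (`toy` and the helper `count_blank` are hypothetical names, not part of the original analysis); the same function could be applied to `crime_data1`.

```r
# Made-up example data with one empty string and one whitespace-only value
toy <- data.frame(
  Location = c("A", "", "C"),
  Region   = c("Asia", "Europe", " "),
  Rate     = c(1.2, 3.4, 5.6),
  stringsAsFactors = FALSE
)

# Count values that are empty after trimming whitespace.
# Only character and factor columns are checked; other columns return 0.
count_blank <- function(x) {
  if (is.character(x) || is.factor(x)) {
    sum(trimws(as.character(x)) == "", na.rm = TRUE)
  } else {
    0L
  }
}

blank_counts <- vapply(toy, count_blank, integer(1))
print(blank_counts)
```

A column with a non-zero count here would need cleaning before grouping, because a blank Region or Subregion would otherwise form its own group.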
country_by_year <- crime_data1 %>% count(Year)
country_by_year
## Year n
## 1 2006 2
## 2 2007 1
## 3 2008 1
## 4 2009 6
## 5 2010 3
## 6 2011 5
## 7 2012 11
## 8 2013 5
## 9 2014 5
## 10 2015 9
## 11 2016 9
## 12 2017 10
## 13 2018 13
## 14 2019 20
## 15 2020 94
## 16 2021 1
ggplot(country_by_year, aes(x = Year, y = n, fill = Year)) +
geom_col() +
scale_x_continuous(breaks = seq(min(crime_data1$Year), max(crime_data1$Year), by = 2)) +
labs(title = "Number of Countries recorded by Year", x = "Year", y = "Number of Countries")
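I found that 2020 has by far the most records: 94 of the 195 locations report their data for 2020. This is a count of records per year, not a measure of how many homicides occurred. Because 2020 gives the largest set of countries measured in the same year, most of the analysis below is restricted to it.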
crime_data <- crime_data1%>% filter(Year == 2020)
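The choice of 2020 can also be derived from the counts rather than read off the chart. The sketch below rebuilds the year table from the printed output and picks the year with the most records using base R; `busiest_year` and `share` are names introduced here for illustration.

```r
# Year counts copied from the printed country_by_year table
country_by_year <- data.frame(
  Year = 2006:2021,
  n    = c(2, 1, 1, 6, 3, 5, 11, 5, 5, 9, 9, 10, 13, 20, 94, 1)
)

# Year with the largest number of records
busiest_year <- country_by_year$Year[which.max(country_by_year$n)]

# Fraction of all records that fall in that year
share <- max(country_by_year$n) / sum(country_by_year$n)

cat("Year with most records:", busiest_year, "\n")
cat("Share of all records (%):", round(100 * share, 1), "\n")
```

Filtering to a single year keeps the comparisons like-for-like, at the cost of dropping roughly half of the locations.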
I wanted to know more about the data, in particular whether there are major differences between regions.
To answer this, I grouped all the countries by their Region and calculated the average homicide rate for each one. Note that this regional summary uses the full dataset (all years), not only the 2020 records.
# Find all unique values in the 'Region' column
Region <- unique(crime_data1$Region)
Region
## [1] Asia Europe Africa Americas Oceania
## Levels: Africa Americas Asia Europe Oceania
# Group by Region and calculate the mean rate
regional_summary <- crime_data1 %>%
group_by(Region) %>%
summarise(
Average_Rate = mean(Rate, na.rm = TRUE),
Country_Count = n() # Also useful to count how many countries are in each region
) %>%
arrange(desc(Average_Rate)) # Sort the results for clarity
# Print the summary table
print(regional_summary)
## # A tibble: 5 × 3
## Region Average_Rate Country_Count
## <fct> <dbl> <int>
## 1 Americas 16.0 51
## 2 Africa 7.40 40
## 3 Asia 2.79 51
## 4 Oceania 1.75 2
## 5 Europe 1.48 51
# Create a bar chart of average homicide rates by region
ggplot(regional_summary, aes(x = reorder(Region, Average_Rate), y = Average_Rate)) +
geom_bar(stat = "identity", fill = "skyblue3") +
geom_text(aes(label = round(Average_Rate, 1)), vjust = -0.3) + # Add rounded labels on top of bars
labs(
title = "Average Intentional Homicide Rate by Region",
subtitle = "The Americas show a significantly higher average rate",
x = "Region",
y = "Average Homicide Rate (per 100,000 people)"
) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) # Angle the x-axis labels for readability
This chart shows that the Americas have a much higher average homicide rate than any other region in the world, more than double that of Africa, the next highest.
A commonly cited explanation is that many countries in the region face strong influence from organized crime and drug-trafficking groups. These criminal networks often compete violently for control of drug routes, territory, and markets, leading to frequent shootings, assassinations, and clashes between rival gangs, all of which add directly to the number of intentional killings.
Next, I wanted to know which parts of the Americas are driving this high average.
I filtered the 2020 data to include only countries in the Americas and then grouped them by Subregion.
# Filter for the Americas, group by Subregion, and calculate the average rate
americas_subregion_summary <- crime_data %>%
filter(Region == "Americas") %>%
group_by(Subregion) %>%
summarise(
Average_Rate = mean(Rate, na.rm = TRUE),
Country_Count = n() # Count countries in each subregion
) %>%
arrange(desc(Average_Rate))
# Print the resulting summary table
print(americas_subregion_summary)
## # A tibble: 4 × 3
## Subregion Average_Rate Country_Count
## <fct> <dbl> <int>
## 1 Central America 21.7 6
## 2 Caribbean 19.4 10
## 3 South America 12.1 9
## 4 Northern America 4.25 2
# Create a bar chart for the subregional averages in the Americas
ggplot(americas_subregion_summary, aes(x = reorder(Subregion, Average_Rate), y = Average_Rate)) +
geom_bar(stat = "identity", fill = "tomato3") +
geom_text(aes(label = round(Average_Rate, 1)), vjust = -0.3) + # Add labels above bars
labs(
title = "Average Homicide Rate by Subregion in the Americas",
subtitle = "The Caribbean and Central America drive the high regional average",
x = "Subregion",
y = "Average Homicide Rate (per 100,000 people)"
) +
theme_minimal()
This shows that the high average in the Americas is not uniform. Central America and the Caribbean have much higher rates than the other subregions.
Central America lies on a major drug-trafficking route, so gangs and cartels fight over territory and control of smuggling. This violence, combined with poverty and weak policing, is commonly cited as the reason homicide rates there are so high.
The Caribbean is a key transit point for drug shipments, and illegal guns are widely available. Small island states struggle with limited policing, so gang disputes and organized crime often turn deadly.
Northern America has stronger police systems, better economic stability, and fewer powerful cartels. These conditions reduce violent conflict and help keep homicide rates comparatively low.
Next, I checked which countries have the highest homicide rates within Central America.
# Filter for Central America, arrange by Rate.
central_america <- crime_data %>%
filter(Subregion == "Central America") %>%
arrange(desc(Rate))
# Print the resulting table
print(central_america)
## Location Region Subregion Rate Count Year
## 1 Honduras Americas Central America 36.3 3598 2020
## 2 Mexico Americas Central America 28.4 36579 2020
## 3 Belize Americas Central America 25.7 102 2020
## 4 Guatemala Americas Central America 17.5 3129 2020
## 5 Costa Rica Americas Central America 11.2 570 2020
## 6 Panama Americas Central America 11.1 480 2020
# Create a bar chart for the top Central American locations
ggplot(central_america, aes(x = reorder(Location, Rate), y = Rate)) +
geom_col(aes(fill = Location), show.legend = FALSE, width = 0.7) +
coord_flip() +
geom_text(aes(label = Rate), hjust = -0.1, size = 3.5) + # Label each bar with its rate
# Expands the plot limits to ensure text fits
scale_y_continuous(limits = c(0, 42)) +
labs(
title = "Top Homicide Rates in Central America",
subtitle = "Rate per 100,000 people, 2020 data.",
x = "Location",
y = "Homicide Rate"
) +
theme_minimal(base_size = 12)
This gives us a ranking of the hotspots in the highest-risk subregion, led by Honduras and Mexico.
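One caveat about the subregional averages: `mean(Rate)` is an unweighted mean of country rates, so Belize (102 homicides) counts as much as Mexico (36,579). The sketch below contrasts that with a pooled rate for the six Central American rows printed above. It assumes populations can be approximated as Count / Rate * 100,000, which is only as precise as the rounded Rate values; `ca_2020`, `implied_population`, and `pooled_rate` are illustrative names, not part of the original analysis.

```r
# Central America rows copied from the 2020 table
ca_2020 <- data.frame(
  Location = c("Honduras", "Mexico", "Belize", "Guatemala", "Costa Rica", "Panama"),
  Rate     = c(36.3, 28.4, 25.7, 17.5, 11.2, 11.1),
  Count    = c(3598, 36579, 102, 3129, 570, 480)
)

# Unweighted mean: every country counts equally (what the analysis reports)
unweighted_mean <- mean(ca_2020$Rate)

# Approximate each population from the rounded rate (an assumption)
implied_population <- ca_2020$Count / ca_2020$Rate * 1e5

# Pooled rate: total homicides over total (implied) population
pooled_rate <- sum(ca_2020$Count) / sum(implied_population) * 1e5

cat("Unweighted mean rate:", round(unweighted_mean, 1), "\n")
cat("Pooled rate:", round(pooled_rate, 1), "\n")
```

The two figures answer different questions: the unweighted mean describes a typical country in the subregion, while the pooled rate describes the risk faced by a typical resident and is dominated by the most populous country.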
I tried to find out why the homicide rate was so high in Honduras and learned that the country has very powerful street gangs, notably Mara Salvatrucha (MS-13) and 18th Street (Barrio 18).
Mara Salvatrucha (MS-13) is a violent transnational gang that operates heavily in countries like Honduras, El Salvador, and Guatemala. They are mainly involved in extortion, drug distribution, territory control, and violent retaliation. They use fear and violence to control neighborhoods and make money from local businesses, transportation routes, and drug sales.
Barrio 18 is another powerful gang in the same region, known for competing directly with MS-13. They are involved in extortion, drug dealing, robbery, and controlling local territories. Their rivalry with MS-13 often leads to deadly clashes, which heavily contributes to high homicide rates in Central America.
These gangs are deeply embedded in society: they control territory, extort businesses, recruit young people, and are heavily involved in violent crime.
Mexico also has very powerful drug cartels and street gangs, which reportedly used the pandemic as an opportunity to expand their activities.
Since the Caribbean also has a high homicide rate, I decided to check which specific locations within it have the highest rates. This takes us from a broad trend to a list of specific hot spots.
# Filter for Caribbean
caribbean <- crime_data %>%
filter(Subregion == "Caribbean") %>%
arrange(desc(Rate))%>%
head(10)
# Print the resulting table
print(caribbean)
## Location Region Subregion Rate Count Year
## 1 Jamaica Americas Caribbean 44.7 1323 2020
## 2 Saint Lucia Americas Caribbean 28.3 52 2020
## 3 Dominica Americas Caribbean 20.8 15 2020
## 4 Saint Kitts and Nevis Americas Caribbean 18.8 10 2020
## 5 Bahamas Americas Caribbean 18.6 73 2020
## 6 Puerto Rico Americas Caribbean 18.5 529 2020
## 7 Barbados Americas Caribbean 14.3 41 2020
## 8 Grenada Americas Caribbean 12.4 14 2020
## 9 Antigua and Barbuda Americas Caribbean 9.2 9 2020
## 10 Dominican Republic Americas Caribbean 8.9 961 2020
# Create a bar chart for Caribbean locations
ggplot(caribbean, aes(x = reorder(Location, Rate), y = Rate)) +
geom_col(aes(fill = Location), show.legend = FALSE, width = 0.7) + # geom_col is a shortcut for geom_bar(stat="identity")
coord_flip() +
geom_text(aes(label = Rate), hjust = -0.1, size = 3.5) + # Label each bar with its rate
# Expands the plot limits to ensure text fits
scale_y_continuous(limits = c(0, 55)) +
labs(
title = "Top Homicide Rates in the Caribbean",
x = "Location",
y = "Homicide Rate"
) +
theme_minimal(base_size = 12)
Explanation
I found that Jamaica has the highest homicide rate in the Caribbean, so I looked into the underlying causes. A major driver commonly identified is the widespread availability of illegal firearms, many of which are trafficked from the U.S. mainland into the Caribbean. This inflow of weapons fuels gang conflicts, organised crime, and retaliatory violence, all of which contribute to Jamaica’s exceptionally high homicide rate.
Saint Lucia also had a high rate of illegal firearm possession, and during this period its criminal justice system was reportedly weak, so many killings went unpunished.
In this analysis, Europe has the lowest average homicide rate of any region, so I decided to see which subregions and countries in Europe are the safest.
# Filter for Europe, group by Subregion, and calculate the average rate
europe_subregion_summary <- crime_data %>%
filter(Region == "Europe") %>%
group_by(Subregion) %>%
summarise(
Average_Rate = mean(Rate, na.rm = TRUE),
Country_Count = n()
) %>%
arrange(desc(Average_Rate))
# Print the resulting summary table
print(europe_subregion_summary)
## # A tibble: 4 × 3
## Subregion Average_Rate Country_Count
## <fct> <dbl> <int>
## 1 Eastern Europe 1.94 8
## 2 Northern Europe 1.79 9
## 3 Southern Europe 1.32 12
## 4 Western Europe 0.957 7
# Create a bar chart for the subregional averages in Europe
ggplot(europe_subregion_summary, aes(x = reorder(Subregion, Average_Rate), y = Average_Rate)) +
geom_col(fill = "blue4") +
geom_text(aes(label = round(Average_Rate, 2)), vjust = -0.5, size = 4) +
labs(
title = "Average Homicide Rate by Subregion in Europe",
subtitle = "Eastern Europe shows a higher average rate compared to other European subregions",
x = "Subregion",
y = "Average Homicide Rate (per 100,000 people)"
) +
# Fix the y-axis range at 0 to 3 so the labels above the bars have room
coord_cartesian(ylim = c(0, 3)) +
theme_minimal(base_size = 12)
Explanation
Western Europe is the safest subregion in this analysis, followed by Southern Europe, while Eastern Europe is the least safe.
Step 7
Next, I looked at the individual countries in two of these subregions: Western Europe and Eastern Europe.
# Filter for the two subregions and select key columns
european_locations <- crime_data %>%
filter(Subregion %in% c("Western Europe", "Eastern Europe")) %>%
select(Location, Subregion, Rate, Year) %>%
arrange(Subregion, Rate) # Sort by subregion, then by rate
# Print the full list
european_locations
## Location Subregion Rate Year
## 1 Czech Republic Eastern Europe 0.7 2020
## 2 Poland Eastern Europe 0.7 2020
## 3 Hungary Eastern Europe 0.8 2020
## 4 Bulgaria Eastern Europe 1.0 2020
## 5 Slovakia Eastern Europe 1.2 2020
## 6 Romania Eastern Europe 1.5 2020
## 7 Moldova Eastern Europe 2.3 2020
## 8 Russia Eastern Europe 7.3 2020
## 9 Luxembourg Western Europe 0.2 2020
## 10 Switzerland Western Europe 0.5 2020
## 11 Netherlands Western Europe 0.6 2020
## 12 Austria Western Europe 0.7 2020
## 13 Germany Western Europe 0.8 2020
## 14 France Western Europe 1.3 2020
## 15 Liechtenstein Western Europe 2.6 2020
# Filter for the two subregions (this is the same data as before)
european_locations <- crime_data %>%
filter(Subregion %in% c("Western Europe", "Eastern Europe"))
# Create the plot using ggplot2
ggplot(european_locations, aes(x = reorder(Location, Rate), y = Rate, fill = Subregion)) +
geom_col(show.legend = FALSE) + # Creates the bars, hides the redundant legend
coord_flip() + # Flips the chart to make country names readable
facet_wrap(~ Subregion, scales = "free_y") + # Creates separate panels for each subregion
labs(
title = "Homicide Rates in Western vs. Eastern Europe",
subtitle = "Comparing individual countries in Europe's lowest- and highest-rate subregions",
x = "Country / Location",
y = "Homicide Rate (per 100,000 people)"
) +
theme_bw() + # A clean black and white theme
# Use different colors for each panel to make them distinct
scale_fill_manual(values = c("Western Europe" = "steelblue","Eastern Europe" = "firebrick3"))+
geom_text(
aes(label = paste(Rate, sep = "")),
hjust = 1,
size = 2
)
Explanation
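Russia recorded the highest homicide rate in Eastern Europe in 2020 (7.3 per 100,000), more than three times the rate of Moldova, the next highest. Factors commonly cited include drug trafficking, widespread alcohol-related violence, strong organised crime networks, and a high level of domestic violence. These combined factors kept Russia’s homicide rate well above the regional average despite an overall long-term decline. Russia also pulls that average up: without it, the Eastern European average would drop from 1.94 to about 1.17.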
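To see how much a single country can move a subregional average, the sketch below recomputes the Eastern Europe figure from the eight rates printed above, with and without Russia, and adds the median for comparison. The variable names are introduced here for illustration.

```r
# Eastern Europe rates copied from the 2020 table
eastern_europe <- data.frame(
  Location = c("Czech Republic", "Poland", "Hungary", "Bulgaria",
               "Slovakia", "Romania", "Moldova", "Russia"),
  Rate     = c(0.7, 0.7, 0.8, 1.0, 1.2, 1.5, 2.3, 7.3)
)

# Mean across all eight countries (matches the 1.94 in the summary table)
mean_all <- mean(eastern_europe$Rate)

# Median is far less sensitive to one extreme value
median_all <- median(eastern_europe$Rate)

# Mean after dropping Russia
mean_without_russia <- mean(eastern_europe$Rate[eastern_europe$Location != "Russia"])

cat("Mean, all countries:  ", round(mean_all, 2), "\n")
cat("Median, all countries:", round(median_all, 2), "\n")
cat("Mean without Russia:  ", round(mean_without_russia, 2), "\n")
```

With so few countries per subregion, reporting the median alongside the mean is one way to keep a single outlier from defining the whole group.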
I also looked into why Western Europe is so safe. The explanations I found include well-funded police and court systems, strict gun laws, and strong social systems. I also noticed that many international organizations have their headquarters in Western Europe, for example:
- Interpol (France)
- World Trade Organization (Switzerland)
- FIFA (Switzerland)
- European Union (Belgium)
- NATO (North Atlantic Treaty Organization) (Belgium)
# 1. Filter for Africa and group by Subregion
africa_subregion_summary <- crime_data %>%
filter(Region == "Africa") %>%
group_by(Subregion) %>%
summarise(
Average_Rate = mean(Rate, na.rm = TRUE)
) %>%
arrange(desc(Average_Rate))
# 2. Print the summary table
print(africa_subregion_summary)
## # A tibble: 4 × 2
## Subregion Average_Rate
## <fct> <dbl>
## 1 Southern Africa 22.7
## 2 Western Africa 6.5
## 3 Eastern Africa 5.5
## 4 Northern Africa 2.47
ggplot(africa_subregion_summary, aes(x = reorder(Subregion, Average_Rate), y = Average_Rate)) +
geom_col(fill = "darkgoldenrod3") + # geom_col is a shortcut for bar charts
geom_text(aes(label = round(Average_Rate, 1)), vjust = -0.3) + # Add rate labels
labs(
title = "Average Homicide Rate by Subregion in Africa",
subtitle = "Southern Africa shows a significantly higher average rate",
x = "Subregion",
y = "Average Homicide Rate"
) +
theme_minimal()
Explanation
In Africa, Southern Africa has the highest average homicide rate and Northern Africa has the lowest. Only four African subregions appear in the 2020 data.
Southern Africa’s high homicide rate is commonly attributed to a strong gang presence, economic inequality, high unemployment, and widespread access to guns, which make everyday conflicts more deadly.
Next, I checked which countries in Africa have the highest homicide rates.
# Filter for Africa to create a dataset of only African countries
africa_data <- crime_data %>%
filter(Region == "Africa")
# Find the 4 countries with the HIGHEST homicide rates
highest_in_africa <- africa_data %>%
arrange(desc(Rate)) %>%
head(4)
# Find the 4 countries with the LOWEST homicide rates
lowest_in_africa <- africa_data %>%
arrange(Rate) %>%
head(4)
# Print both tables
cat("--- Top 4 Highest Homicide Rates in Africa ---\n")
## --- Top 4 Highest Homicide Rates in Africa ---
print(highest_in_africa)
## Location Region Subregion Rate Count Year
## 1 South Africa Africa Southern Africa 33.5 19846 2020
## 2 Namibia Africa Southern Africa 11.9 303 2020
## 3 Uganda Africa Eastern Africa 9.7 4460 2020
## 4 Cape Verde Africa Western Africa 6.5 36 2020
cat("\n--- Top 4 Lowest Homicide Rates in Africa ---\n")
##
## --- Top 4 Lowest Homicide Rates in Africa ---
print(lowest_in_africa)
## Location Region Subregion Rate Count Year
## 1 Algeria Africa Northern Africa 1.3 580 2020
## 2 Morocco Africa Northern Africa 1.3 487 2020
## 3 Mauritius Africa Eastern Africa 2.8 35 2020
## 4 Kenya Africa Eastern Africa 4.0 2151 2020
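The Africa summary earlier reports only the average rate per subregion, without the number of countries behind each average. The sketch below adds that count using base R on the eight rows printed in the two tables above. This is an illustration on a subset only: at least one Northern African country from the 2020 data is not among the printed rows, so the Northern Africa figure here will not match the 2.47 in the summary table. `africa_rows` and `africa_summary` are hypothetical names.

```r
# The eight African rows printed in the highest/lowest tables (a subset of the 2020 data)
africa_rows <- data.frame(
  Location  = c("South Africa", "Namibia", "Uganda", "Cape Verde",
                "Algeria", "Morocco", "Mauritius", "Kenya"),
  Subregion = c("Southern Africa", "Southern Africa", "Eastern Africa", "Western Africa",
                "Northern Africa", "Northern Africa", "Eastern Africa", "Eastern Africa"),
  Rate      = c(33.5, 11.9, 9.7, 6.5, 1.3, 1.3, 2.8, 4.0)
)

# Average rate and number of countries per subregion
average_rate  <- tapply(africa_rows$Rate, africa_rows$Subregion, mean)
country_count <- tapply(africa_rows$Rate, africa_rows$Subregion, length)

africa_summary <- data.frame(
  Subregion     = names(average_rate),
  Average_Rate  = as.numeric(average_rate),
  Country_Count = as.integer(country_count)
)

# Sort from highest to lowest average rate
africa_summary <- africa_summary[order(-africa_summary$Average_Rate), ]
print(africa_summary)
```

Seeing the counts next to the averages makes it clear when a subregional figure rests on only one or two countries.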
# --- Rebuild the highest/lowest tables, this time from the Southern and Northern Africa subregions ---
africa_data <- crime_data %>%
filter(Subregion == "Southern Africa")
africa_data1<-crime_data %>%
filter(Subregion=="Northern Africa")
highest_in_africa <- africa_data %>%
arrange(desc(Rate)) %>%
head(5) %>%
mutate(Category = "Southern Africa") # Add a category label
lowest_in_africa <- africa_data1 %>%
arrange(Rate) %>%
head(5) %>%
mutate(Category = "Northern Africa") # Add a category label
# Combine the two dataframes into one
africa_extremes <- bind_rows(highest_in_africa, lowest_in_africa)
# --- New code to create the plot ---
ggplot(africa_extremes, aes(x = reorder(Location, Rate), y = Rate, fill = Category)) +
geom_col(show.legend = FALSE) +
coord_flip() + # Flip coordinates to make country names readable
facet_wrap(~ Category, scales = "free_y") + # Create a separate panel for each subregion
labs(
title = "Homicide Rates in Southern vs. Northern Africa",
subtitle = "Countries in Africa's highest- and lowest-rate subregions (2020)",
x = "Country / Location",
y = "Homicide Rate"
) +
theme_bw() + # A clean theme
# Add the rate as a label on each bar
geom_text(
aes(label = Rate),
vjust = 1,
size = 3
)
Explanation
I found out that in 2020 South Africa had an understaffed and under-resourced police force, and many officers were infected with COVID-19, which forced some police stations to close.
Countries with the highest homicide rates in 2020
top_recent_hotspots <- crime_data %>%
arrange(desc(Rate)) %>%
head(15)
# Print the resulting table
print(top_recent_hotspots)
## Location Region Subregion Rate Count Year
## 1 Jamaica Americas Caribbean 44.7 1323 2020
## 2 Honduras Americas Central America 36.3 3598 2020
## 3 South Africa Africa Southern Africa 33.5 19846 2020
## 4 Mexico Americas Central America 28.4 36579 2020
## 5 Saint Lucia Americas Caribbean 28.3 52 2020
## 6 Belize Americas Central America 25.7 102 2020
## 7 Colombia Americas South America 22.6 11520 2020
## 8 Brazil Americas South America 22.5 47722 2020
## 9 Dominica Americas Caribbean 20.8 15 2020
## 10 Guyana Americas South America 20.0 157 2020
## 11 Saint Kitts and Nevis Americas Caribbean 18.8 10 2020
## 12 Bahamas Americas Caribbean 18.6 73 2020
## 13 Puerto Rico Americas Caribbean 18.5 529 2020
## 14 Guatemala Americas Central America 17.5 3129 2020
## 15 Barbados Americas Caribbean 14.3 41 2020
# Create the bar chart for the top 15 recent hotspots
ggplot(top_recent_hotspots, aes(x = reorder(Location, Rate), y = Rate, fill = Region)) +
geom_col() + # Creates the bar chart
coord_flip() + # Flips axes to make country names readable
# Add text labels for Rate and Year to each bar for clarity
geom_text(aes(label = paste(Rate, " (", Year, ")", sep = "")), hjust = -0.1, size = 3) +
# Manually set colors for each region for better visual distinction
scale_fill_manual(values = c("Americas" = "tomato3", "Africa" = "darkgoldenrod3", "Asia" = "mediumseagreen")) +
# Expand plot limits to ensure text labels fit
scale_y_continuous(limits = c(0, 50)) +
labs(
title = "Top 15 Homicide Rates from 2020",
subtitle = "Rates are per 100,000 people. Year of data shown in parentheses.",
x = "Country / Location",
y = "Homicide Rate",
fill = "Region"
) +
theme_minimal(base_size = 12) +
theme(legend.position = "bottom") # Move legend to the bottom
Explanation
This was the period of COVID-19, and the lockdowns caused economic hardship that pushed a lot of people into desperation, leading to property crimes such as theft and robbery. The increase in these regions appears to have been driven mainly by organized crime: gangs saw the lockdowns as an opportunity to extort and to expand their criminal activities, which is where intentional homicide comes in.