I find crime data analysis fascinating. I have previously analyzed the Global Terrorism Dataset.
1. Global Terrorism Dataset Shiny Application
I believe that visualizing the history of crime helps people understand crime and terrorism better.
I also think that a good understanding of terrorism has never been more important.
So I hope this article helps you understand crime and terrorism data a little better.
library(data.table)
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000 ✔ purrr 0.2.4
## ✔ tibble 1.4.2 ✔ dplyr 0.7.4
## ✔ tidyr 0.8.0 ✔ stringr 1.3.0
## ✔ readr 1.1.1 ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between() masks data.table::between()
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::first() masks data.table::first()
## ✖ dplyr::lag() masks stats::lag()
## ✖ dplyr::last() masks data.table::last()
## ✖ purrr::transpose() masks data.table::transpose()
## ✖ dplyr::vars() masks ggplot2::vars()
library(readr)
library(stringr)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
library(scales)
##
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
##
## discard
## The following object is masked from 'package:readr':
##
## col_factor
pakistan10 <- read.csv("https://s3-ap-southeast-2.amazonaws.com/koki25ando/PakistanSuicideAttacks+Ver+6+(10-October-2017).csv", stringsAsFactors=FALSE, fileEncoding="latin1")
pakistan11 <- read.csv("https://s3-ap-southeast-2.amazonaws.com/koki25ando/PakistanSuicideAttacks+Ver+11+(30-November-2017).csv", stringsAsFactors=FALSE, fileEncoding="latin1")
pakistan <- bind_rows(pakistan10, pakistan11)
pakistan$Longitude <- as.numeric(pakistan$Longitude)
## Warning: NAs introduced by coercion
pakistan$Location.Sensitivity <- as.factor(pakistan$Location.Sensitivity)
pakistan$Injured.Max <- as.numeric(pakistan$Injured.Max)
## Warning: NAs introduced by coercion
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "civilian", replacement = "Civilian")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "foreigner", replacement = "Foreigner")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "police", replacement = "Police")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "Government official", replacement = "Government Official")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "religious", replacement = "Religious")
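One thing to watch out for: the two files look like different versions of the same dataset, so stacking them may put the same attack in the table more than once. I have not checked this here. The sketch below only shows how such a check could look, using a small made-up data frame and assuming that Date, City and Location together identify an attack (the names `old_version`, `new_version`, `dup_keys` and `deduplicated` are mine, not part of the analysis).

```r
library(dplyr)

# Hypothetical miniature of two overlapping dataset versions (made-up rows)
old_version <- data.frame(
  Date = c("1995-11-19", "2000-11-06"),
  City = c("Islamabad", "Karachi"),
  Location = c("Embassy", "Newspaper office"),
  stringsAsFactors = FALSE
)
new_version <- data.frame(
  Date = c("2000-11-06", "2002-05-08"),
  City = c("Karachi", "Karachi"),
  Location = c("Newspaper office", "Hotel"),
  stringsAsFactors = FALSE
)

stacked <- bind_rows(old_version, new_version)

# Key combinations that occur more than once in the stacked table
dup_keys <- stacked %>%
  count(Date, City, Location) %>%
  filter(n > 1)

# Keep only the first row for each (Date, City, Location) combination
deduplicated <- stacked %>%
  distinct(Date, City, Location, .keep_all = TRUE)

print(dup_keys)
print(nrow(deduplicated))
```

If `dup_keys` comes back empty on the real data, no deduplication is needed.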
# Split the weekday off the Date column, then parse the rest as a date
pakistan <-
  pakistan %>%
  select(-S.) %>%
  separate(Date, sep = "-", into = c("Day", "Date"), extra = "merge")
pakistan$Date <-
  pakistan$Date %>%
  str_replace(" ", "-")
pakistan$Date <- pakistan$Date %>% as.Date("%b-%d-%Y")
## Warning in strptime(x, format, tz = "GMT"): unknown timezone 'zone/tz/
## 2018c.1.0/zoneinfo/Australia/Sydney'
pakistan$Day <- as.factor(pakistan$Day)
# year() comes from data.table, which is loaded above
pakistan <-
  pakistan %>%
  mutate(Year = year(Date))
pakistan %>% head()
## Day Date Islamic.Date Blast.Day.Type
## 1 Sunday 1995-11-19 25 Jumaada al-THaany 1416 A.H Holiday
## 2 Monday 2000-11-06 10 SHa`baan 1421 A.H Working Day
## 3 Wednesday 2002-05-08 25 safar 1423 A.H Working Day
## 4 Friday 2002-06-14 3 Raby` al-THaany 1423 A.H Working Day
## 5 Friday 2003-07-04 4 Jumaada al-awal 1424 A.H Working Day
## 6 Thursday 2003-12-25 2 Thw al-Qi`dah 1424 A.H. Holiday
## Holiday.Type Time City
## 1 Weekend N/A Islamabad
## 2 N/A Karachi
## 3 7:45 AM Karachi
## 4 11:10:00 AM Karachi
## 5 N/A Quetta
## 6 Christmas/birthday of Quaid-e-Azam 1:40:00 PM/1:42:00 PM Rawalpindi
## Latitude Longitude Province
## 1 33.7180 73.0718 Capital
## 2 24.9918 66.9911 Sindh
## 3 24.9918 66.9911 Sindh
## 4 24.9918 66.9911 Sindh
## 5 30.2095 67.0182 Baluchistan
## 6 33.6058 73.0437 Punjab
## Location
## 1 Egyptian Embassy
## 2 office of Nawa-e-Waqt
## 3 Pakistan Navy bus Parked outside Five Star Sheraton Hotel
## 4 US Consulate Civil Lines Area
## 5 Imambargah MeCongy Road Quetta
## 6 Jhanda Chichi area rawalpindi
## Location.Category Location.Sensitivity Open.Closed.Space
## 1 Foreign High Closed
## 2 Office Building Low Closed
## 3 Hotel Medium Closed
## 4 Foreign High Closed
## 5 Religious Medium Closed
## 6 Mobile Low Open
## Influencing.Event.Event Target.Type
## 1 Foreigner
## 2 Media
## 3 Foreigner
## 4 Foreigner
## 5 during Friday prayer Religious
## 6 president's/chief of army staff convoy passing from there Military
## Targeted.Sect.if.any Killed.Min Killed.Max Injured.Min Injured.Max
## 1 None 14 15 NA 60
## 2 None NA 3 NA 3
## 3 Christian 13 15 20 40
## 4 Christian NA 12 NA 51
## 5 Shiite 44 47 NA 65
## 6 None 16 18 NA 50
## No..of.Suicide.Blasts Explosive.Weight..max.
## 1 2
## 2 1
## 3 1 2.5 Kg
## 4 1 <NA>
## 5 1 <NA>
## 6 2 30kg in each car
## Hospital.Names
## 1
## 2
## 3 1.Jinnah Postgraduate Medical Center 2. Civil Hospital Karachi 3. PN Shifa
## 4 <NA>
## 5 1.CMH Quetta \n2.Civil Hospital 3. Boland Medical Complex
## 6 1.District headquarters \nHospital
## Temperature.C. Temperature.F. Year
## 1 15.835 60.503 1995
## 2 23.770 74.786 2000
## 3 31.460 88.628 2002
## 4 31.430 88.574 2002
## 5 33.120 91.616 2003
## 6 9.445 49.001 2003
The data has many interesting variables, such as date and location.
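To make the date handling easier to follow, here is a base R sketch of the same steps on a single value. The raw format "Sunday-November 19-1995" is my assumption, inferred from the cleaning code rather than shown in the data, and the variable names are only for illustration.

```r
# Use English month names regardless of the system locale
Sys.setlocale("LC_TIME", "C")

raw <- "Sunday-November 19-1995"  # assumed raw format of the Date column

# Split on "-": the first piece is the weekday, the rest is the date
parts <- strsplit(raw, "-", fixed = TRUE)[[1]]
day_name <- parts[1]
date_text <- paste(parts[-1], collapse = "-")

# Replace the first space with "-" so the whole string uses one separator
date_text <- sub(" ", "-", date_text, fixed = TRUE)

# Parse "Month-day-year" and extract the year
parsed <- as.Date(date_text, format = "%B-%d-%Y")
year_value <- as.integer(format(parsed, "%Y"))

print(day_name)
print(parsed)
print(year_value)
```

Month names are locale dependent, which is why the sketch fixes the time locale first.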
world.map <- map_data("world")
##
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
##
## map
world.map <- world.map %>% filter(region == "Pakistan")
ggplot() +
  geom_map(data=world.map, map=world.map,
           aes(x=long, y=lat, group=group, map_id=region),
           fill="white", colour="black") +
  geom_point(data = pakistan, aes(x = Longitude, y = Latitude), colour = "red") +
  labs(x = "Longitude", y = "Latitude",
       title = "Pakistan", subtitle ="Where did the attacks happen?")
## Warning: Ignoring unknown aesthetics: x, y
## Warning: Removed 7 rows containing missing values (geom_point).
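The "Removed 7 rows" warning means some points have no usable coordinates. A quick way to see which raw values fail the numeric conversion is sketched below, using a made-up vector rather than the real column (`longitude_raw` and `failed` are illustrative names).

```r
# Hypothetical raw longitude strings; two of them are not numbers
longitude_raw <- c("73.0718", "66.9911", "", "N/A", "71.5249")

# Convert to numeric; values that cannot be parsed become NA
longitude <- suppressWarnings(as.numeric(longitude_raw))

# Flag the failures and look at the original text behind them
failed <- is.na(longitude)
print(sum(failed))
print(longitude_raw[failed])
```

Looking at the raw text behind each NA shows whether a value is truly missing or just badly formatted.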
pp <- ggplot(as.data.frame(table(pakistan$Province)) %>%
               arrange(desc(Freq)),
             aes(reorder(Var1, -Freq), Freq, fill = Var1)) +
  geom_bar(stat = "identity") +
  labs(x="Province", title = "Top 10 Provinces that experienced most Suicide Bombing Attacks")
ggplotly(pp)
KPK Province has experienced by far the most attacks.
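To put a number on "by far", the counts can be turned into shares. This is a minimal sketch with made-up values, not the real figures; `province`, `counts` and `shares` are illustrative names.

```r
# Hypothetical province labels, one per attack (made-up data)
province <- c("KPK", "KPK", "KPK", "Punjab", "Sindh", "KPK", "Punjab", "FATA")

# Count attacks per province, largest first
counts <- sort(table(province), decreasing = TRUE)

# Convert the counts to shares of all attacks
shares <- prop.table(counts)

print(counts)
print(round(shares, 2))
```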
kpk <- pakistan %>%
  filter(Province == "KPK")
bp <- kpk %>%
  ggplot(aes(x = Year, fill = Location.Category)) +
  geom_bar() +
  labs(title = "Where did Suicide Bombing Attacks in KPK province from 2004 to 2017 happen?",
       subtitle = "Where did they happen?") +
  scale_fill_discrete(name="Location")
ggplotly(bp)
## Warning: position_stack requires non-overlapping x intervals
2009 was the peak year, and police stations were the main target that year.
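A reading like this can also be checked without the chart. The sketch below finds the peak year and the most common location category in that year. It runs on a small made-up data frame (`kpk_toy`), so the output illustrates the method and is not the real result.

```r
library(dplyr)

# Hypothetical subset with the same column names as the kpk data frame
kpk_toy <- data.frame(
  Year = c(2008, 2009, 2009, 2009, 2009, 2010, 2010),
  Location.Category = c("Police", "Police", "Police", "Market",
                        "Police", "Religious", "Police"),
  stringsAsFactors = FALSE
)

# Number of attacks per year, then the year with the highest count
per_year <- kpk_toy %>% count(Year, name = "attacks")
peak_year <- per_year$Year[which.max(per_year$attacks)]

# Most frequent location category within the peak year
top_category <- kpk_toy %>%
  filter(Year == peak_year) %>%
  count(Location.Category, sort = TRUE) %>%
  slice(1) %>%
  pull(Location.Category)

print(peak_year)
print(top_category)
```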
ggplot() +
  geom_map(data=world.map, map=world.map,
           aes(x=long, y=lat, group=group, map_id=region),
           fill="white", colour="black") +
  geom_point(data = kpk, aes(x = Longitude, y = Latitude, colour = Injured.Min, size = Killed.Max)) +
  labs(x = "Longitude", y = "Latitude", size = "Killed", title = "Suicide Bombing Attacks in KPK Province") +
  scale_colour_gradient(low = "yellow", high = "red", name = "Injured") +
  xlim(67,74) +
  ylim(30,35.5)
## Warning: Ignoring unknown aesthetics: x, y
## Warning: Removed 20 rows containing missing values (geom_point).
bar.plot <- kpk %>%
  ggplot(aes(x = Year, fill = Target.Type)) +
  geom_bar() +
  labs(title = "Who were the targets in KPK?", fill = "Target Type")
ggplotly(bar.plot)
One remarkable thing is that the police have been the main targets.
city.p <- ggplot(as.data.frame(table(kpk$City)) %>%
                   arrange(desc(Freq)) %>% head(10),
                 aes(reorder(Var1, -Freq), Freq, fill = Var1)) +
  geom_bar(stat = "identity") +
  labs(x="City", title = "Which city has experienced the most Attacks?")
ggplotly(city.p)
Peshawar has experienced by far the most suicide bombing attacks.
Now let’s focus on Peshawar.
peshawar <- kpk %>% filter(City == "Peshawar")
peshawar.p <- peshawar %>%
  ggplot(aes(Day, fill = Location.Category)) +
  geom_bar() +
  scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday",
                            "Wednesday", "Thursday", "Friday", "Saturday")) +
  labs(title = "When did attacks happen?")
ggplotly(peshawar.p)
Surprisingly, weekends have seen fewer attacks than working days.
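One caveat: a week has five working days and only two weekend days, so raw totals favor working days. A fairer comparison is attacks per day of each type. Here is a rough sketch with made-up data, assuming Saturday and Sunday count as the weekend (that split, and all the names below, are my assumptions).

```r
# Hypothetical weekday of each attack (made-up data)
day <- c("Monday", "Monday", "Tuesday", "Wednesday", "Thursday",
         "Friday", "Friday", "Saturday", "Sunday", "Tuesday")

# Assumption: Saturday and Sunday are the weekend
weekend_days <- c("Saturday", "Sunday")
day_type <- ifelse(day %in% weekend_days, "Weekend", "Working day")

# Total attacks per day type
totals <- table(day_type)

# Divide by the number of days of each type in a week
days_per_type <- c("Weekend" = 2, "Working day" = 5)
per_day <- as.vector(totals[names(days_per_type)]) / days_per_type

print(totals)
print(per_day)
```

If the per-day averages are still clearly apart on the real data, the weekday pattern is more than an artifact of counting five days against two.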
peshawar <-
  peshawar %>%
  mutate(Month = month(Date))
# Fix the levels at 1 to 12 so each label below matches its month,
# even if some month has no attacks
peshawar$Month <- factor(peshawar$Month, levels = 1:12)
peshawar.month <- peshawar %>%
  ggplot(aes(Month)) +
  geom_bar() +
  scale_x_discrete(drop = FALSE,
                   labels = c('Jan','Feb','Mar', 'Apr','May','Jun',
                              'Jul','Aug','Sep','Oct','Nov','Dec')) +
  labs(title = "When did attacks happen?")
ggplotly(peshawar.month)
plot_ly(as.data.frame(table(peshawar$Target.Type))[-1,],
        labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(title = 'Who were most targeted in Peshawar?',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
OK, that’s it. In this article I found:
1. The most dangerous province and city in Pakistan
2. When those attacks happened
3. Who was targeted in those places
Thanks for reading my article.
Koki