1 Introduction

Analyzing crime data is really interesting to me. I have previously analyzed a global terrorism dataset:
1. Global Terrorism Dataset Shiny Application

I believe visualizing crime history helps people better understand crime and terrorism.
I also think a good understanding of terrorism has never been more important in human history.

So, I hope this article helps you understand a little bit more about crime and terrorism data.

2 Code

2.1 Preparation

library(data.table)
library(tidyverse)
## ── Attaching packages ────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.4     
## ✔ tidyr   0.8.0          ✔ stringr 1.3.0     
## ✔ readr   1.1.1          ✔ forcats 0.3.0
## ── Conflicts ───────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::between()   masks data.table::between()
## ✖ dplyr::filter()    masks stats::filter()
## ✖ dplyr::first()     masks data.table::first()
## ✖ dplyr::lag()       masks stats::lag()
## ✖ dplyr::last()      masks data.table::last()
## ✖ purrr::transpose() masks data.table::transpose()
## ✖ dplyr::vars()      masks ggplot2::vars()
library(readr)
library(stringr)
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(scales)
## 
## Attaching package: 'scales'
## The following object is masked from 'package:purrr':
## 
##     discard
## The following object is masked from 'package:readr':
## 
##     col_factor
pakistan10 <- read.csv("https://s3-ap-southeast-2.amazonaws.com/koki25ando/PakistanSuicideAttacks+Ver+6+(10-October-2017).csv", stringsAsFactors=FALSE, fileEncoding="latin1")
pakistan11 <- read.csv("https://s3-ap-southeast-2.amazonaws.com/koki25ando/PakistanSuicideAttacks+Ver+11+(30-November-2017).csv", stringsAsFactors=FALSE, fileEncoding="latin1")
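
The two files are different versions of the same dataset, so they may share many rows. Before combining them, it can be worth checking how much they overlap. The following is a minimal sketch on toy data, not the real files: the data frames `older` and `newer` and their values are hypothetical stand-ins, and only two columns are used for brevity.

```r
library(dplyr)

# Hypothetical stand-ins for two versions of the same dataset
older <- data.frame(
  Date = c("day-1", "day-2"),
  City = c("CityA", "CityB"),
  stringsAsFactors = FALSE
)
newer <- data.frame(
  Date = c("day-1", "day-2", "day-3"),
  City = c("CityA", "CityB", "CityC"),
  stringsAsFactors = FALSE
)

# Rows present in both versions, matched on the shared columns
overlap <- inner_join(older, newer, by = c("Date", "City"))
n_overlap <- nrow(overlap)

# Stack both versions, then keep each distinct row only once
combined <- bind_rows(older, newer) %>% distinct()
n_combined <- nrow(combined)
```

Note that `distinct()` only removes rows that are identical in every column. If a later version corrected a value in an existing row, both variants would survive and would need to be reconciled by hand.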

2.1.1 Data cleaning

pakistan <- bind_rows(pakistan10, pakistan11)
pakistan$Longitude <- as.numeric(pakistan$Longitude)
## Warning: NAs introduced by coercion
pakistan$Location.Sensitivity <- as.factor(pakistan$Location.Sensitivity)
pakistan$Injured.Max <- as.numeric(pakistan$Injured.Max)
## Warning: NAs introduced by coercion
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "civilian", replacement = "Civilian")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "foreigner", replacement = "Foreigner")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "police", replacement = "Police")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "Government official", replacement = "Government Official")
pakistan$Target.Type <- str_replace(pakistan$Target.Type, pattern = "religious", replacement = "Religious")

# Create a date
pakistan <- 
  pakistan %>% 
  select(-S.) %>% 
  separate(Date, sep = "-", into = c("Day", "Date"), extra = "merge")
pakistan$Date <- 
  pakistan$Date %>% 
  str_replace(" ", "-")
pakistan$Date <- pakistan$Date %>% as.Date("%b-%d-%Y")
## Warning in strptime(x, format, tz = "GMT"): unknown timezone 'zone/tz/
## 2018c.1.0/zoneinfo/Australia/Sydney'
pakistan$Day <- as.factor(pakistan$Day)

pakistan <- 
  pakistan %>% 
  mutate(Year = year(Date))
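
To make the date steps above easier to follow, here is a small worked example on two strings. The raw format `"Sunday-November 19-1995"` is an assumption inferred from the cleaning code and the printed output, not copied from the source file. On input, `%b` accepts full month names as well as abbreviations, but the match depends on the session locale, so this sketch assumes an English locale.

```r
library(tidyr)
library(stringr)

# Assumed raw format: "<weekday>-<month> <day>-<year>"
raw <- data.frame(
  Date = c("Sunday-November 19-1995", "Monday-November 6-2000"),
  stringsAsFactors = FALSE
)

# Split at the first "-" only: Day = "Sunday", Date = "November 19-1995"
parsed <- separate(raw, Date, sep = "-", into = c("Day", "Date"), extra = "merge")

# Replace the remaining space so the string reads "November-19-1995"
parsed$Date <- str_replace(parsed$Date, " ", "-")

# Parse month name, day and four-digit year
parsed$Date <- as.Date(parsed$Date, format = "%b-%d-%Y")
```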

2.2 Analysis

2.2.1 Data Component

pakistan %>% head()
##         Day       Date                  Islamic.Date Blast.Day.Type
## 1    Sunday 1995-11-19 25 Jumaada al-THaany 1416 A.H        Holiday
## 2    Monday 2000-11-06          10 SHa`baan 1421 A.H    Working Day
## 3 Wednesday 2002-05-08             25 safar 1423 A.H    Working Day
## 4    Friday 2002-06-14    3 Raby` al-THaany 1423 A.H    Working Day
## 5    Friday 2003-07-04    4 Jumaada al-awal 1424 A.H    Working Day
## 6  Thursday 2003-12-25     2 Thw al-Qi`dah 1424 A.H.        Holiday
##                         Holiday.Type                  Time       City
## 1                            Weekend                   N/A  Islamabad
## 2                                                      N/A    Karachi
## 3                                                  7:45 AM   Karachi 
## 4                                              11:10:00 AM    Karachi
## 5                                                      N/A     Quetta
## 6 Christmas/birthday of Quaid-e-Azam 1:40:00 PM/1:42:00 PM Rawalpindi
##   Latitude Longitude    Province
## 1  33.7180   73.0718     Capital
## 2  24.9918   66.9911       Sindh
## 3  24.9918   66.9911       Sindh
## 4  24.9918   66.9911       Sindh
## 5  30.2095   67.0182 Baluchistan
## 6  33.6058   73.0437      Punjab
##                                                     Location
## 1                                           Egyptian Embassy
## 2                                      office of Nawa-e-Waqt
## 3 Pakistan Navy bus Parked outside Five Star Sheraton Hotel 
## 4                             US Consulate Civil Lines Area 
## 5                             Imambargah MeCongy Road Quetta
## 6                             Jhanda Chichi area rawalpindi 
##   Location.Category Location.Sensitivity Open.Closed.Space
## 1           Foreign                 High            Closed
## 2   Office Building                  Low            Closed
## 3             Hotel               Medium            Closed
## 4           Foreign                 High            Closed
## 5         Religious               Medium            Closed
## 6            Mobile                  Low              Open
##                                     Influencing.Event.Event Target.Type
## 1                                                             Foreigner
## 2                                                                 Media
## 3                                                             Foreigner
## 4                                                             Foreigner
## 5                                      during Friday prayer   Religious
## 6 president's/chief of army staff convoy passing from there    Military
##   Targeted.Sect.if.any Killed.Min Killed.Max Injured.Min Injured.Max
## 1                 None         14         15          NA          60
## 2                 None         NA          3          NA           3
## 3            Christian         13         15          20          40
## 4            Christian         NA         12          NA          51
## 5               Shiite         44         47          NA          65
## 6                 None         16         18          NA          50
##   No..of.Suicide.Blasts Explosive.Weight..max.
## 1                     2                       
## 2                     1                       
## 3                     1                 2.5 Kg
## 4                     1                   <NA>
## 5                     1                   <NA>
## 6                     2       30kg in each car
##                                                               Hospital.Names
## 1                                                                           
## 2                                                                           
## 3 1.Jinnah Postgraduate Medical Center 2. Civil Hospital Karachi 3. PN Shifa
## 4                                                                       <NA>
## 5                  1.CMH Quetta \n2.Civil Hospital 3. Boland Medical Complex
## 6                                        1.District headquarters \nHospital 
##   Temperature.C. Temperature.F. Year
## 1         15.835         60.503 1995
## 2         23.770         74.786 2000
## 3         31.460         88.628 2002
## 4         31.430         88.574 2002
## 5         33.120         91.616 2003
## 6          9.445         49.001 2003

The data has many interesting variables, such as the date and location of each attack.

2.2.2 Pakistan Map Visualisation

world.map <- map_data ("world")
## 
## Attaching package: 'maps'
## The following object is masked from 'package:purrr':
## 
##     map
world.map <- world.map %>% filter(region == "Pakistan")
ggplot() + 
  geom_map(data=world.map, map=world.map,
           aes(x=long, y=lat, group=group, map_id=region),
           fill="white", colour="black") + 
  geom_point(data = pakistan, aes(x = Longitude, y = Latitude), colour = "red") + 
  labs(x = "Longitude", y = "Latitude", 
       title = "Pakistan", subtitle = "Where did the attacks happen?")
## Warning: Ignoring unknown aesthetics: x, y
## Warning: Removed 7 rows containing missing values (geom_point).

2.2.3 Where is the Dangerous Zone?

pp <- ggplot(as.data.frame(table(pakistan$Province)) %>% 
         arrange(desc(Freq)),
       aes(reorder(Var1, -Freq), Freq, fill = Var1)) + 
  geom_bar(stat = "identity") + 
  labs(x="Province", title = "Provinces that experienced the most Suicide Bombing Attacks")
ggplotly(pp)

KPK Province has experienced by far the most attacks.
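
To put a number on "by far the most", the counts and the share of the leading province can be computed directly. This is a sketch on a hypothetical toy data frame (`toy` and its values are made up for illustration); with the real data, `pakistan` would take its place.

```r
library(dplyr)

# Hypothetical toy data: one row per attack
toy <- data.frame(
  Province = c("KPK", "KPK", "Punjab", "Sindh", "KPK", "Punjab"),
  stringsAsFactors = FALSE
)

# Number of attacks per province, largest first
province_counts <- toy %>% count(Province, sort = TRUE)

# Share of all attacks that fall in the leading province
top_share <- province_counts$n[1] / sum(province_counts$n)
```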

2.2.4 KPK Province’s data analysis

kpk <- pakistan %>% 
  filter(Province == "KPK")
bp <- kpk %>% 
  ggplot(aes(x = Year, fill = Location.Category)) + 
  geom_bar() +
  labs(title = "Suicide Bombing Attacks in KPK province from 2004 to 2017", 
       subtitle = "Where did they happen?") +
  scale_fill_discrete(name="Location")
ggplotly(bp)
## Warning: position_stack requires non-overlapping x intervals

2009 was the peak, and that year police stations were the main target.

2.2.5 Map Visualisation with the sizes of attacks

ggplot() + 
  geom_map(data=world.map, map=world.map,
           aes(x=long, y=lat, group=group, map_id=region),
           fill="white", colour="black") + 
  geom_point(data = kpk, aes(x = Longitude, y = Latitude, colour = Injured.Min, size = Killed.Max)) + 
  labs(x = "Longitude", y = "Latitude", size = "Killed", title = "Suicide Bombing Attacks in KPK Province") + 
  scale_colour_gradient(low = "yellow", high = "red", name = "Injured") + 
  xlim(67,74) + 
  ylim(30,35.5)
## Warning: Ignoring unknown aesthetics: x, y
## Warning: Removed 20 rows containing missing values (geom_point).

bar.plot <- kpk %>% 
  ggplot(aes(x = Year, fill = Target.Type)) + 
  geom_bar() +
  labs(title = "Who were the targets in KPK?", fill = "Target Type")
ggplotly(bar.plot)

One remarkable thing is that the police have been the main targets.

city.p <- ggplot(as.data.frame(table(kpk$City)) %>% 
         arrange(desc(Freq)) %>% head(10),
       aes(reorder(Var1, -Freq), Freq, fill = Var1)) + 
  geom_bar(stat = "identity") + 
  labs(x="City", title = "Which city has experienced the most Attacks?")
ggplotly(city.p)

Peshawar has experienced by far the most suicide bombing attacks.
Now let’s focus on that city.

peshawar <- kpk %>% filter(City == "Peshawar")
peshawar.p <- peshawar %>% 
  ggplot(aes(Day, fill = Location.Category)) + 
  geom_bar() + 
  scale_x_discrete(limits=c("Sunday", "Monday", "Tuesday", 
                            "Wednesday", "Thursday", "Friday", "Saturday")) + 
  labs(title = "When did attacks happen?")
ggplotly(peshawar.p)

Surprisingly, weekend days have experienced fewer attacks than working days.

peshawar <- 
  peshawar %>% 
  mutate(Month = month(Date))
peshawar$Month <- as.factor(peshawar$Month)

peshawar.month <- peshawar %>% 
  ggplot(aes(Month)) + 
  geom_bar() +
  scale_x_discrete(labels = c('Jan','Feb','Mar', 'Apr','May','Jun',
                              'Jul','Aug','Sep','Oct','Nov','Dec')) + 
  labs(title = "In which months did attacks happen?")
ggplotly(peshawar.month)
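
One caveat with the month plot: the twelve labels are assigned by position, so if some month had no attacks in the data, the factor would have fewer than twelve levels and the labels could end up on the wrong bars. A possible safeguard, sketched here on hypothetical dates, is to build the factor with all twelve levels up front using R's built-in `month.abb`.

```r
# Hypothetical dates; several months have no observations
dates <- as.Date(c("2009-01-15", "2009-03-02", "2009-03-20", "2010-12-01"))

# Month number from the date, independent of locale
month_num <- as.integer(format(dates, "%m"))

# Fix the levels to 1..12 so every month keeps its own label
month_fac <- factor(month_num, levels = 1:12, labels = month.abb)

# Empty months are kept with a count of zero
month_counts <- table(month_fac)
```

When plotting such a factor with ggplot2, `scale_x_discrete(drop = FALSE)` keeps the empty months on the axis.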

2.2.6 Who were most targeted?

plot_ly(as.data.frame(table(peshawar$Target.Type))[-1,], 
        labels = ~Var1, values = ~Freq, type = 'pie') %>%
  layout(title = 'Who were most targeted in Peshawar?',
         xaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE),
         yaxis = list(showgrid = FALSE, zeroline = FALSE, showticklabels = FALSE))
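
The `[-1,]` in the pie chart code drops the first row of the frequency table, presumably the row for attacks with an empty target type. Dropping by position is fragile if the data changes, so filtering by value may be safer. A minimal sketch, using a hypothetical `target` vector in place of `peshawar$Target.Type`:

```r
# Hypothetical target types, including empty entries
target <- c("", "Police", "Civilian", "Police", "Military", "")

# Frequency table as a data frame, with the same column names as in the article
counts <- as.data.frame(table(target), stringsAsFactors = FALSE)
names(counts) <- c("Var1", "Freq")

# Drop the empty category by value instead of by row position
counts <- counts[counts$Var1 != "", ]
```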

2.3 Conclusion

OK, that’s it. In this article, I looked at:
1. The most dangerous zones and cities in Pakistan
2. When those attacks happened
3. Who was targeted in those places

Thanks for reading my article.
Koki