I’ve decided to start exploring my deepest interest: conflict resolution. To do so, I had to find relevant data online, preprocess it, and learn about causal relationships. The database I used was compiled by the RAND Corporation (see Resources) and is considered a benchmark in this field.

The terrorism data I found was poorly formatted, so I had to clean it first:

Data Preprocessing:


#clean environment:
rm(list = ls())
dat <- read.csv("/Users/oba2311/Desktop/Minerva/Junior/SS154/assignment1/date_clean.csv", header = TRUE)
head(dat[,1:8])
##   X       Date             City            Country Perpetrator
## 1 0 1968-02-09     Buenos Aires          Argentina     Unknown
## 2 1 1968-02-12    Santo Domingo Dominican Republic     Unknown
## 3 2 1968-02-13       Montevideo            Uruguay     Unknown
## 4 3 1968-02-20         Santiago              Chile     Unknown
## 5 4 1968-02-21 Washington, D.C.      United States     Unknown
## 6 5 1968-02-21     Neot Hakikar             Israel     Unknown
##             Weapon Injuries Fatalities
## 1         Firearms        0          0
## 2       Explosives        0          0
## 3 Fire or Firebomb        0          0
## 4       Explosives        0          0
## 5       Explosives        0          0
## 6          Unknown        0          0
typeof(dat$Date)
## [1] "integer"
#Date was read as a factor (stored as integer codes); convert it to a numeric yyyymmdd value:
dat$Date<-as.integer(format(as.Date(dat$Date), "%Y%m%d"))
head(dat[,1:8])
##   X     Date             City            Country Perpetrator
## 1 0 19680209     Buenos Aires          Argentina     Unknown
## 2 1 19680212    Santo Domingo Dominican Republic     Unknown
## 3 2 19680213       Montevideo            Uruguay     Unknown
## 4 3 19680220         Santiago              Chile     Unknown
## 5 4 19680221 Washington, D.C.      United States     Unknown
## 6 5 19680221     Neot Hakikar             Israel     Unknown
##             Weapon Injuries Fatalities
## 1         Firearms        0          0
## 2       Explosives        0          0
## 3 Fire or Firebomb        0          0
## 4       Explosives        0          0
## 5       Explosives        0          0
## 6          Unknown        0          0
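
The conversion above packs each date into a single yyyymmdd integer. As a minimal sketch of the same step, assuming a small made-up vector in place of `dat$Date` (the names `dates_raw`, `dates`, `years` and `ymd` are hypothetical), one can also keep a proper `Date` object and pull out the year separately, which may be easier to group by later:

```r
# Toy stand-in for dat$Date; these values are illustrative, not the RAND data
dates_raw <- factor(c("1968-02-09", "1968-02-12", "2006-07-01"))

# Go through character first so the factor labels, not its codes, are parsed
dates <- as.Date(as.character(dates_raw))

# Calendar year of each incident
years <- as.integer(format(dates, "%Y"))

# Same yyyymmdd encoding as used in the post
ymd <- as.integer(format(dates, "%Y%m%d"))

print(years)
print(ymd)
```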

Let’s plot a histogram of the number of attacks over time, to learn about the trend:

date <- format(round(dat$Date, 4))
head(as.numeric(date))
## [1] 19680209 19680212 19680213 19680220 19680221 19680221
his<- hist(as.numeric(date))

#Height of one character, used to lift the count labels above the bars:
strh <- strheight('W')
his <- hist(as.numeric(date), border = "red", breaks = 41,
            main = "Frequency of Terror Attacks Over Time",
            sub = substitute(paste(italic("Notice the increase in incidents in recent years"))),
            ylab = "Number of Attacks", xlab = "Time")
#Label each bar with its count, rotated 90 degrees:
text(his$mids, strh + his$counts, labels = his$counts, adj = c(0, 0.5), srt = 90)

We see that the current millennium is much worse than the previous one. We should point out that this can also be an artifact of the data: as the years go by, documentation and media coverage become more accurate and robust, so we should expect more incidents in the data even if there were no real growth. That said, the numbers shown are dramatic, and it is fair to assume that the number of attacks is indeed growing.

Before exploring further, we can compare these results with another source to validate the data: “Incidents over Time” by Our World in Data. We learn that the trend is indeed the same, even when using different data sources. Let’s find the outlier:

his$mids
##  [1] 19685000 19695000 19705000 19715000 19725000 19735000 19745000
##  [8] 19755000 19765000 19775000 19785000 19795000 19805000 19815000
## [15] 19825000 19835000 19845000 19855000 19865000 19875000 19885000
## [22] 19895000 19905000 19915000 19925000 19935000 19945000 19955000
## [29] 19965000 19975000 19985000 19995000 20005000 20015000 20025000
## [36] 20035000 20045000 20055000 20065000 20075000 20085000 20095000
#map counts per year to a year:
names(his$breaks) <- his$counts
outl<-max(his$counts)
names(outl)
## NULL
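
The lookup above comes back as NULL, so here is a minimal sketch of two other ways to locate the peak year. It assumes a short made-up vector of years rather than the RAND data, and the names `years`, `per_year`, `peak_year`, `h`, `peak_bin` and `peak_mid` are hypothetical:

```r
# Hypothetical toy years, not the RAND data
years <- c(2005, 2006, 2006, 2006, 2007, 2007)

# Option 1: tabulate attacks per year and take the label of the largest count
per_year <- table(years)
peak_year <- names(per_year)[which.max(per_year)]

# Option 2: read the peak bin off a histogram object without plotting it
h <- hist(years, breaks = seq(2004.5, 2007.5, by = 1), plot = FALSE)
peak_bin <- which.max(h$counts)  # index of the tallest bar
peak_mid <- h$mids[peak_bin]     # midpoint of that bar

print(peak_year)
print(peak_bin)
print(peak_mid)
```

Applied to the histogram in this post, the second option would correspond to `his$mids[which.max(his$counts)]`.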

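`names(outl)` returns NULL because `max()` returns a bare number with no name attached. Still, we see from the histogram that 2006 has the highest number of attacks, and that it sits in the 39th bin. Let’s verify: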

his$breaks[39]
##     6660 
## 20060000
summary(dat[,1:8]) #Omit the description column.
##        X              Date               City                 Country     
##  Min.   :    0   Min.   :19680209          : 4974   Iraq          :10763  
##  1st Qu.:10032   1st Qu.:19990806   Baghdad: 4103   West Bank/Gaza: 2038  
##  Median :20064   Median :20041125   Kirkuk :  853   Afghanistan   : 2025  
##  Mean   :20064   Mean   :20004767   Mosul  :  839   Thailand      : 2009  
##  3rd Qu.:30096   3rd Qu.:20060823   Baqubah:  630   Colombia      : 1913  
##  Max.   :40128   Max.   :20091231   Athens :  435   Israel        : 1687  
##                                     (Other):28295   (Other)       :19694  
##                                         Perpetrator   
##  Unknown                                      :26190  
##  Other                                        : 2057  
##  Taliban                                      : 1000  
##  Revolutionary Armed Forces of Colombia (FARC):  616  
##  Hamas (Islamic Resistance Movement)          :  576  
##  Basque Fatherland and Freedom (ETA)          :  418  
##  (Other)                                      : 9272  
##                         Weapon         Injuries          Fatalities      
##  Explosives                :20523   Min.   :   0.000   Min.   :   0.000  
##  Firearms                  :11222   1st Qu.:   0.000   1st Qu.:   0.000  
##  Unknown                   : 3213   Median :   0.000   Median :   0.000  
##  Fire or Firebomb          : 2778   Mean   :   3.647   Mean   :   1.601  
##  Remote-detonated explosive: 1593   3rd Qu.:   1.000   3rd Qu.:   1.000  
##  Knives & sharp objects    :  418   Max.   :5000.000   Max.   :2749.000  
##  (Other)                   :  382

We see that the biggest terror attack led to the death of 2749 people (September 11).

We see that Iraq has the most recorded incidents (10,763) and that, among identified groups, the Taliban is responsible for the most attacks (1,000). Note that most incidents (26,190) have an unknown perpetrator. Let’s verify this information once again: “Incidents all over the World”.
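
Rankings like these can also be read off directly instead of from `summary()`. A minimal sketch, assuming a made-up `country` vector in place of `dat$Country` (both `country` and `counts` are hypothetical names):

```r
# Hypothetical toy data, not the RAND data
country <- c("Iraq", "Iraq", "Colombia", "Iraq", "Israel", "Colombia")

# Count incidents per country and sort from most to fewest
counts <- sort(table(country), decreasing = TRUE)

print(counts)
print(names(counts)[1])  # country with the most incidents in the toy data
```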

We know that September 11 is a huge outlier, so let’s see how the model does when fatalities are capped at 501:

no_9_11<-ifelse(dat$Fatalities>=501,501,dat$Fatalities)
no_outliers <- data.frame(dat,no_9_11)
#Check that the max of the new column does not exceed 501:
summary(no_outliers[,10])
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   0.000   0.000   1.545   1.000 501.000
#The model predicts the number of fatalities, based on the number of injuries:
mdl<-lm(no_outliers$no_9_11 ~no_outliers$Injuries)
plot(no_outliers$no_9_11 ~no_outliers$Injuries, main="Injuries as a Regressor for Fatalities", xlab="Injuries", ylab = "Fatalities", xlim=c(0,300), ylim=c(0,250))
abline(mdl, col = "red")
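
Note that the `ifelse()` call caps large values at 501 rather than removing those rows. A minimal sketch of the difference, assuming a made-up `fatalities` vector (the names `capped_ifelse`, `capped_pmin` and `dropped` are hypothetical); `pmin()` is a shorter way to write the same cap:

```r
# Hypothetical toy values, not the RAND data
fatalities <- c(0, 3, 501, 2749)

# Capping as done in the post: anything at or above 501 becomes 501
capped_ifelse <- ifelse(fatalities >= 501, 501, fatalities)

# Equivalent cap using the element-wise minimum
capped_pmin <- pmin(fatalities, 501)

# Dropping instead of capping: these incidents leave the sample entirely
dropped <- fatalities[fatalities < 501]

print(capped_pmin)
print(dropped)
```

Capping keeps every incident in the regression while limiting its pull; dropping changes the sample size.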

Most incidents cluster at low numbers of both fatalities and injuries, while a few extreme values stretch the axes, which makes the plot hard to examine. Let’s take a look at the summary:

summary(mdl)
## 
## Call:
## lm(formula = no_outliers$no_9_11 ~ no_outliers$Injuries)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -427.59   -1.22   -1.22   -0.22  398.78 
## 
## Coefficients:
##                       Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          1.2249463  0.0362104   33.83   <2e-16 ***
## no_outliers$Injuries 0.0876721  0.0008471  103.49   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 7.227 on 40127 degrees of freedom
## Multiple R-squared:  0.2107, Adjusted R-squared:  0.2107 
## F-statistic: 1.071e+04 on 1 and 40127 DF,  p-value: < 2.2e-16

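We see prima facie that both the intercept and the number of injuries are statistically significant regressors. Significance alone does not make injuries a strong predictor, though: the R-squared of 0.21 means that injuries explain only about 21% of the variance in fatalities. Let’s perform a significance test: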

As the p-value is much less than 0.05 (\(p < 2 \times 10^{-16}\)), we reject the null hypothesis that the slope on injuries is zero (\(\beta = 0\)). Hence there is a significant relationship between the variables in the linear regression model for this specific dataset. It seems plausible that this relationship also holds outside of the data (i.e. out of sample), but that is an assumption based on common sense, not something this test shows.
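
As a rough check on the numbers in the summary, the test statistic can be recomputed by hand. This sketch copies the estimate, standard error and degrees of freedom from the printed output, so it agrees with R only up to display rounding; the variable names are mine. For a regression with a single regressor, the F-statistic equals the squared t value, and R-squared can be recovered as F / (F + residual degrees of freedom):

```r
# Values copied from the printed summary(mdl) output (rounded as displayed)
estimate  <- 0.0876721  # slope on Injuries
std_error <- 0.0008471  # its standard error
df_resid  <- 40127      # residual degrees of freedom

# t statistic for H0: beta = 0
t_stat <- estimate / std_error

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df = df_resid)

# With one regressor, F = t^2 and R^2 = F / (F + df_resid)
f_stat <- t_stat^2
r_squared <- f_stat / (f_stat + df_resid)

cat("t =", t_stat, " F =", f_stat, " R^2 =", r_squared, " p =", p_value, "\n")
```

The t value comes out at about 103.5 and R-squared at about 0.21, in line with the reported 103.49 and 0.2107.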

Resources:

“RAND Database of Worldwide Terrorism Incidents” - https://www.rand.org/nsrd/projects/terrorism-incidents.html