Brief introduction

This document presents the procedures and outcomes of an analysis of the relationship between inequality indicators and the incidence of terrorism. Because the output is verbose, and because they can be found in the scientific paper “Differentiating the ‘roots’: intertwined inequalities as predictors of region-specific incidence of terrorism”, explanations and comments regarding the choice of analyses and research questions are omitted; only the basic information essential for understanding the results is provided. To make the analyses easier to follow, they are presented in separate tabs. First, general information on the analyses is provided, followed by the original analysis. The next tab presents the first robustness check, “Releveled analysis”, which differs from the original in how the quantitative explanatory variables were categorized, followed by the second robustness check, “Redefined analysis”, in which the dependent variables were operationalized differently. Several “What if…?” analyses are included to strengthen the conclusions. Finally, two more specific hypotheses about the inequality-terrorism relationship were tested: one regarding the relationship per continent, and another that differentiates between countries with a large number of Muslim citizens and those with a relatively lower number of Muslim citizens.

General information

  • the analyses were conducted in R (version 3.5.1)
  • macro-level data were obtained from various sources (The World Bank, UNDP,…) in order to test the relationship between different macro-level inequality indicators and incidence of terrorism
  • terrorism data were obtained from the Global Terrorism Database (GTD), and a distinction was made between domestic and transnational terrorist incidents, which served as the two dependent variables of this study
  • independent variables included in this study were (variable markers are in brackets): GDP p.c. (GDP), annual GDP growth in % (GDP_gr), GINI coefficient (GINI), Human development index (HDI), durability of the political regime (durable), political rights (polR), civil liberties (civL), protection of human rights (HRP), population density (pop_density), major episodes of political violence (major_violence_ep), education index (education) and unemployment rate (unemployment)
  • data were collected for approximately 150 countries worldwide, but because some missing values could not be reasonably imputed, the number of countries varies slightly between analyses
  • since the data set contained entries from 2001 to 2014, and in order to avoid autocorrelation, a separate decision tree was fitted for each year (2001-2013) and used to predict the test set (2014); these 13 trees form an ensembled forest whose predictions follow the majority-vote rule
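
The per-year design described in the last bullet can be sketched as follows. This is a minimal illustration on simulated data, not the study's pipeline: the panel, the predictors `x1` and `x2`, and the outcome `incident` are hypothetical stand-ins, and only `rpart` (used later in this document) is assumed to be installed.

```r
library(rpart)

set.seed(619)
# Simulated stand-in for the country-year panel (NOT the study data):
# 60 hypothetical countries observed each year from 2001 to 2014.
years <- 2001:2014
panel <- do.call(rbind, lapply(years, function(y) {
  x1 <- sample(1:8, 60, replace = TRUE)   # hypothetical binned predictor
  x2 <- sample(1:8, 60, replace = TRUE)   # hypothetical binned predictor
  # the outcome is made more likely when x1 is low
  y_bin <- rbinom(60, 1, ifelse(x1 <= 3, 0.7, 0.1))
  data.frame(year = y, x1 = x1, x2 = x2,
             incident = factor(y_bin, levels = c(0, 1)))
}))

train_years <- 2001:2013
test <- subset(panel, year == 2014)

# One classification tree per training year
trees <- lapply(train_years, function(y) {
  rpart(incident ~ x1 + x2, data = subset(panel, year == y), method = "class",
        control = rpart.control(minsplit = 6, cp = 0.025))
})

# Every tree predicts the 2014 test set; rows = countries, columns = trees
votes <- sapply(trees, function(fit) {
  as.character(predict(fit, test, type = "class"))
})

# Majority vote: predict 1 when more than half of the 13 trees say 1
ensemble <- as.integer(rowMeans(votes == "1") > 0.5)

table(predicted = ensemble, observed = test$incident)
```

Because each tree sees a single year, no tree is trained on repeated observations of the same country, and with 13 trees a vote can never end in a tie.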

Below is the correlation matrix of the predictors based on the 2014 data. It shows that the predictors are highly interrelated, which might have contributed to the uncertain conclusions of previous analyses and which undermines the practical distinction between economic and socio-political inequality.

lapply(c("Hmisc", "readr"), library, character.only = T)
## [[1]]
##  [1] "Hmisc"     "ggplot2"   "Formula"   "survival"  "lattice"  
##  [6] "stats"     "graphics"  "grDevices" "utils"     "datasets" 
## [11] "methods"   "base"     
## 
## [[2]]
##  [1] "readr"     "Hmisc"     "ggplot2"   "Formula"   "survival" 
##  [6] "lattice"   "stats"     "graphics"  "grDevices" "utils"    
## [11] "datasets"  "methods"   "base"
data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)
data2 <- data1[complete.cases(data1), ]
data3 <- subset(data2, year == 2014)
correlations <- rcorr(as.matrix(data3[,c("GDP", "GDP_gr", "HDI", "pop_density", "major_violence_ep", "polR", "civL", "HRP", "unemployment", "education")]))
round(correlations$r, 2)
##                     GDP GDP_gr   HDI pop_density major_violence_ep  polR
## GDP                1.00  -0.14  0.70        0.18             -0.15 -0.40
## GDP_gr            -0.14   1.00 -0.28        0.03              0.05  0.22
## HDI                0.70  -0.28  1.00        0.14             -0.14 -0.47
## pop_density        0.18   0.03  0.14        1.00             -0.02  0.02
## major_violence_ep -0.15   0.05 -0.14       -0.02              1.00  0.12
## polR              -0.40   0.22 -0.47        0.02              0.12  1.00
## civL              -0.47   0.23 -0.52        0.03              0.21  0.94
## HRP                0.70  -0.19  0.57        0.09             -0.43 -0.63
## unemployment      -0.07  -0.15  0.13       -0.12             -0.10 -0.21
## education          0.63  -0.26  0.95        0.09             -0.22 -0.49
##                    civL   HRP unemployment education
## GDP               -0.47  0.70        -0.07      0.63
## GDP_gr             0.23 -0.19        -0.15     -0.26
## HDI               -0.52  0.57         0.13      0.95
## pop_density        0.03  0.09        -0.12      0.09
## major_violence_ep  0.21 -0.43        -0.10     -0.22
## polR               0.94 -0.63        -0.21     -0.49
## civL               1.00 -0.71        -0.25     -0.53
## HRP               -0.71  1.00         0.09      0.55
## unemployment      -0.25  0.09         1.00      0.17
## education         -0.53  0.55         0.17      1.00
round(correlations$P, 3)
##                     GDP GDP_gr   HDI pop_density major_violence_ep  polR
## GDP                  NA  0.108 0.000       0.038             0.083 0.000
## GDP_gr            0.108     NA 0.001       0.698             0.537 0.010
## HDI               0.000  0.001    NA       0.099             0.089 0.000
## pop_density       0.038  0.698 0.099          NA             0.856 0.796
## major_violence_ep 0.083  0.537 0.089       0.856                NA 0.169
## polR              0.000  0.010 0.000       0.796             0.169    NA
## civL              0.000  0.008 0.000       0.741             0.015 0.000
## HRP               0.000  0.022 0.000       0.298             0.000 0.000
## unemployment      0.429  0.077 0.118       0.144             0.230 0.012
## education         0.000  0.002 0.000       0.294             0.011 0.000
##                    civL   HRP unemployment education
## GDP               0.000 0.000        0.429     0.000
## GDP_gr            0.008 0.022        0.077     0.002
## HDI               0.000 0.000        0.118     0.000
## pop_density       0.741 0.298        0.144     0.294
## major_violence_ep 0.015 0.000        0.230     0.011
## polR              0.000 0.000        0.012     0.000
## civL                 NA 0.000        0.003     0.000
## HRP               0.000    NA        0.282     0.000
## unemployment      0.003 0.282           NA     0.040
## education         0.000 0.000        0.040        NA
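
To make the "high relatedness" concrete, the strongest pairs can be extracted programmatically. The sketch below re-enters six rows of the correlation matrix printed above by hand, so it runs without the data file; the 0.9 cut-off is an illustrative assumption, not a threshold used in the study.

```r
# Subset of the 2014 correlation matrix, copied from the output above
vars <- c("GDP", "HDI", "polR", "civL", "HRP", "education")
r <- matrix(c(
   1.00,  0.70, -0.40, -0.47,  0.70,  0.63,
   0.70,  1.00, -0.47, -0.52,  0.57,  0.95,
  -0.40, -0.47,  1.00,  0.94, -0.63, -0.49,
  -0.47, -0.52,  0.94,  1.00, -0.71, -0.53,
   0.70,  0.57, -0.63, -0.71,  1.00,  0.55,
   0.63,  0.95, -0.49, -0.53,  0.55,  1.00
), nrow = 6, byrow = TRUE, dimnames = list(vars, vars))

threshold <- 0.9  # illustrative cut-off (assumption)

# Look only at the upper triangle so each pair is reported once
idx <- which(abs(r) >= threshold & upper.tri(r), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
high_pairs
```

In this subset the two flagged pairs are polR with civL (0.94) and HDI with education (0.95), so within each pair the two measures carry largely overlapping information.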

Original analysis

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]
data2$GDP <- as.numeric(cut2(data2$GDP, g=8))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=8))
data2$HDI <- as.numeric(cut2(data2$HDI, g=8))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=8))
data2$durable  <- as.numeric(cut2(data2$durable, g=8))
data2$HRP <- as.numeric(cut2(data2$HRP, g=8))
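# major_violence_ep, polR and civL are already ordinal and are left unbinned
# (the three assignments below leave the columns unchanged)
data2$major_violence_ep <- data2$major_violence_ep
data2$polR  <- data2$polR
data2$civL <- data2$civL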
data2$GINI  <- as.numeric(cut2(data2$GINI, g=8))
data2$education <- as.numeric(cut2(data2$education, g=8))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=8))
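
The calls above replace each quantitative predictor with the index (1 to 8) of its quantile group. The sketch below shows the idea with base R only; `quantile_bin` is a hypothetical helper, the input values are simulated, and `Hmisc::cut2` may place group boundaries slightly differently, particularly when values are tied.

```r
# Bin a numeric vector into g quantile-based groups and return the group index.
# Base-R approximation of as.numeric(Hmisc::cut2(x, g = g)); not identical to it.
quantile_bin <- function(x, g = 8) {
  breaks <- unique(quantile(x, probs = seq(0, 1, length.out = g + 1),
                            na.rm = TRUE))
  as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))
}

set.seed(619)
gdp_example <- rlnorm(160, meanlog = 9, sdlog = 1)  # hypothetical values, not study data
gdp_bin <- quantile_bin(gdp_example, g = 8)

table(gdp_bin)  # group sizes
```

Since the binned variables only take the values 1 to 8, a tree split such as `education< 1.5` later in this document separates the lowest group from the other seven.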

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
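
The 28 `subset()` calls can also be expressed with `split()`, which returns one data frame per year in a named list. The sketch below uses a small hypothetical data frame (`gtdd_demo`) in place of `gtdd`; the column names follow the document, the values are simulated.

```r
# Hypothetical miniature of gtdd: a year column, one predictor and the outcome
set.seed(619)
gtdd_demo <- data.frame(
  year  = rep(2001:2014, each = 5),
  HRP   = sample(1:8, 70, replace = TRUE),
  dom_a = rbinom(70, 1, 0.2)
)

by_year    <- split(gtdd_demo, gtdd_demo$year)   # named list, one element per year
train_sets <- by_year[as.character(2001:2013)]   # the 13 training years
test_set   <- by_year[["2014"]]                  # the held-out year

names(train_sets)
nrow(test_set)
```

A list keeps the yearly data sets addressable by name (`train_sets[["2007"]]`) and lets later steps run through `lapply()` instead of one line per year.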

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpctrl)

Optimization

The initial complexity parameter (cp) was set to .025 in order to exclude variables that would not contribute substantially to the predictive power of the model. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.05555556
optimal_complexity(dtr2002) 
## [1] 0.06451613
optimal_complexity(dtr2003) 
## [1] 0.025
optimal_complexity(dtr2004)
## [1] 0.28
optimal_complexity(dtr2005)
## [1] 0.2692308
optimal_complexity(dtr2006)
## [1] 0.03448276
optimal_complexity(dtr2007)
## [1] 0.025
optimal_complexity(dtr2008)
## [1] 0.07407407
optimal_complexity(dtr2009)
## [1] 0.02564103
optimal_complexity(dtr2010)
## [1] 0.07407407
optimal_complexity(dtr2011)
## [1] 0.03428571
optimal_complexity(dtr2012)
## [1] 0.05982906
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.0625
optimal_complexity(ttr2002)
## [1] 0.05405405
optimal_complexity(ttr2003)
## [1] 0.1458333
optimal_complexity(ttr2004)
## [1] 0.05263158
optimal_complexity(ttr2005)
## [1] 0.0625
optimal_complexity(ttr2006)
## [1] 0.03947368
optimal_complexity(ttr2007)
## [1] 0.04081633
optimal_complexity(ttr2008)
## [1] 0.03589744
optimal_complexity(ttr2009)
## [1] 0.03409091
optimal_complexity(ttr2010)
## [1] 0.05405405
optimal_complexity(ttr2011)
## [1] 0.07142857
optimal_complexity(ttr2012)
## [1] 0.04054054
optimal_complexity(ttr2013)
## [1] 0.025

The decision trees were then refitted with the complexity parameter of each tree set to its estimated optimum, truncated to three decimal places.
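
The selection rule used by `optimal_complexity()` picks the row of the cp table with the lowest cross-validated error (`xerror`). The sketch below illustrates it on `kyphosis`, a data set shipped with `rpart`, as a stand-in for one year's training data. It also shows `prune()`, which trims an existing tree to a given cp; this is an alternative to the refitting done in this document, not the procedure the analysis used.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for one year's training data
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 50, cp = 0.025))

# Row of the cp table with the lowest cross-validated error
best_row <- which.min(fit$cptable[, "xerror"])
best_cp  <- fit$cptable[best_row, "CP"]

# Trim the fitted tree to that cp without refitting
pruned <- prune(fit, cp = best_cp)

printcp(fit)
best_cp
```

Because the cross-validation folds are drawn at random, `xerror` (and therefore the selected cp) depends on the seed, which is why the document calls `set.seed(619)` before every fit.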

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .055))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .064))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .28))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .269))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .034))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .074))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .074))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .034))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .059))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .054))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .145))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .052))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .039))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .040))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .035))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .034))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .054))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .071))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .040))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation
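
The code in this section averages the predictions with `rowMeans()` and compares the result with 1.50. That threshold works because `cbind()` turns factors into their integer codes, so class "0" becomes 1 and class "1" becomes 2. A small sketch with three hypothetical trees and four hypothetical countries:

```r
# Class predictions of three hypothetical trees for four countries
pred_a <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
pred_b <- factor(c("0", "1", "0", "0"), levels = c("0", "1"))
pred_c <- factor(c("1", "1", "0", "0"), levels = c("0", "1"))

codes <- cbind(pred_a, pred_b, pred_c)  # integer codes: "0" -> 1, "1" -> 2
share <- rowMeans(codes)                # 1 = all trees vote 0, 2 = all trees vote 1
vote  <- ifelse(share < 1.5, 0, 1)      # 1.5 is the halfway point between the codes

codes
vote
```

A row mean below 1.5 means that fewer than half of the trees predicted class "1", so the rule is a majority vote; with an odd number of trees, such as the 13 used here, the mean can never equal 1.5 exactly.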

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 102  22
##          1   0  15
##                                           
##                Accuracy : 0.8417          
##                  95% CI : (0.7702, 0.8981)
##     No Information Rate : 0.7338          
##     P-Value [Acc > NIR] : 0.001792        
##                                           
##                   Kappa : 0.5002          
##  Mcnemar's Test P-Value : 7.562e-06       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.4054          
##          Pos Pred Value : 0.8226          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7338          
##          Detection Rate : 0.7338          
##    Detection Prevalence : 0.8921          
##       Balanced Accuracy : 0.7027          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 88 31
##          1  6 14
##                                           
##                Accuracy : 0.7338          
##                  95% CI : (0.6522, 0.8051)
##     No Information Rate : 0.6763          
##     P-Value [Acc > NIR] : 0.08535         
##                                           
##                   Kappa : 0.2891          
##  Mcnemar's Test P-Value : 7.961e-05       
##                                           
##             Sensitivity : 0.9362          
##             Specificity : 0.3111          
##          Pos Pred Value : 0.7395          
##          Neg Pred Value : 0.7000          
##              Prevalence : 0.6763          
##          Detection Rate : 0.6331          
##    Detection Prevalence : 0.8561          
##       Balanced Accuracy : 0.6236          
##                                           
##        'Positive' Class : 0               
## 
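
The summary statistics can be checked by hand from the confusion matrix. The sketch below re-enters the domestic-terrorism matrix reported above; note that `confusionMatrix()` treated class 0 (no incidents) as the "positive" class, so sensitivity describes countries without incidents and specificity describes countries with them.

```r
# Domestic-terrorism confusion matrix, copied from the output above
# (rows = prediction, columns = reference)
cm <- matrix(c(102, 22,
                 0, 15),
             nrow = 2, byrow = TRUE,
             dimnames = list(prediction = c("0", "1"), reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)        # (102 + 15) / 139
sensitivity <- cm["0", "0"] / sum(cm[, "0"])  # true 0s predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])  # true 1s predicted as 1
balanced    <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced = balanced), 4)
```

These reproduce the reported values (0.8417, 1.0000, 0.4054 and 0.7027). Read this way, the domestic ensemble identified 15 of the 37 countries that had incidents in 2014 and raised no false alarms.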

Variable importance for domestic terrorism

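# with surrogates only
# (dtr2004 is left out of all three variants: after optimization it consists of the root node only, so it has no splits)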

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates (actual tree)

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
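
The repeated per-year blocks above can be condensed into a loop that gathers importance values in long format. The sketch below is an assumption-laden alternative rather than a rewrite of the code above: it reads the `variable.importance` component that `rpart` stores in each fitted tree, which is not the same quantity as `caret::varImp()`, and it uses two halves of `rpart`'s `kyphosis` data, labelled "2001" and "2002", as stand-ins for yearly fits.

```r
library(rpart)

# Two stand-in "yearly" fits on halves of the kyphosis data (not the study data)
set.seed(619)
halves <- split(kyphosis, rep(c("2001", "2002"), length.out = nrow(kyphosis)))
fits <- lapply(halves, function(d) {
  rpart(Kyphosis ~ Age + Number + Start, data = d, method = "class",
        control = rpart.control(minsplit = 6, cp = 0.025))
})

# Collect importance in long format; a tree without splits has no
# variable.importance component and is skipped
importance_long <- do.call(rbind, lapply(names(fits), function(yr) {
  imp <- fits[[yr]]$variable.importance
  if (is.null(imp)) return(NULL)
  data.frame(year = yr, variable = names(imp), value = as.numeric(imp))
}))

importance_long
```

The long format (one row per year and variable) is the shape that `melt()` produces above, so a result like this could be passed to the same kind of `ggplot()` call.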

Location of splits

dtr2001
## n= 152 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 152 27 0 (0.82236842 0.17763158)  
##    2) major_violence_ep< 0.5 128 11 0 (0.91406250 0.08593750)  
##      4) education>=1.5 111  5 0 (0.95495495 0.04504505) *
##      5) education< 1.5 17  6 0 (0.64705882 0.35294118)  
##       10) HRP>=3.5 10  1 0 (0.90000000 0.10000000) *
##       11) HRP< 3.5 7  2 1 (0.28571429 0.71428571)  
##         22) GINI>=5.5 2  0 0 (1.00000000 0.00000000) *
##         23) GINI< 5.5 5  0 1 (0.00000000 1.00000000) *
##    3) major_violence_ep>=0.5 24  8 1 (0.33333333 0.66666667)  
##      6) HDI< 1.5 12  6 0 (0.50000000 0.50000000)  
##       12) polR< 5.5 3  0 0 (1.00000000 0.00000000) *
##       13) polR>=5.5 9  3 1 (0.33333333 0.66666667) *
##      7) HDI>=1.5 12  2 1 (0.16666667 0.83333333) *
dtr2002
## n= 153 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 153 31 0 (0.79738562 0.20261438)  
##    2) HRP>=2.5 116  9 0 (0.92241379 0.07758621) *
##    3) HRP< 2.5 37 15 1 (0.40540541 0.59459459)  
##      6) pop_density< 4.5 19  7 0 (0.63157895 0.36842105)  
##       12) GINI>=3.5 15  4 0 (0.73333333 0.26666667)  
##         24) durable>=1.5 11  1 0 (0.90909091 0.09090909) *
##         25) durable< 1.5 4  1 1 (0.25000000 0.75000000) *
##       13) GINI< 3.5 4  1 1 (0.25000000 0.75000000) *
##      7) pop_density>=4.5 18  3 1 (0.16666667 0.83333333) *
dtr2003
## n= 153 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 153 24 0 (0.84313725 0.15686275)  
##    2) major_violence_ep< 0.5 130  8 0 (0.93846154 0.06153846)  
##      4) HRP>=1.5 127  6 0 (0.95275591 0.04724409) *
##      5) HRP< 1.5 3  1 1 (0.33333333 0.66666667) *
##    3) major_violence_ep>=0.5 23  7 1 (0.30434783 0.69565217)  
##      6) GDP_gr< 1.5 3  0 0 (1.00000000 0.00000000) *
##      7) GDP_gr>=1.5 20  4 1 (0.20000000 0.80000000)  
##       14) HRP>=4 2  0 0 (1.00000000 0.00000000) *
##       15) HRP< 4 18  2 1 (0.11111111 0.88888889) *
dtr2004
## n= 154 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 154 25 0 (0.8376623 0.1623377) *
dtr2005
## n= 155 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 155 26 0 (0.83225806 0.16774194)  
##   2) HRP>=1.5 132 11 0 (0.91666667 0.08333333) *
##   3) HRP< 1.5 23  8 1 (0.34782609 0.65217391) *
dtr2006
## n= 156 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 156 29 0 (0.81410256 0.18589744)  
##    2) major_violence_ep< 1.5 137 12 0 (0.91240876 0.08759124)  
##      4) HRP>=2.5 113  5 0 (0.95575221 0.04424779) *
##      5) HRP< 2.5 24  7 0 (0.70833333 0.29166667)  
##       10) GINI>=3 20  4 0 (0.80000000 0.20000000) *
##       11) GINI< 3 4  1 1 (0.25000000 0.75000000) *
##    3) major_violence_ep>=1.5 19  2 1 (0.10526316 0.89473684) *
dtr2007
## n= 156 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 156 35 0 (0.77564103 0.22435897)  
##    2) HRP>=1.5 134 15 0 (0.88805970 0.11194030)  
##      4) major_violence_ep< 0.5 125  9 0 (0.92800000 0.07200000) *
##      5) major_violence_ep>=0.5 9  3 1 (0.33333333 0.66666667)  
##       10) civL>=4.5 4  1 0 (0.75000000 0.25000000) *
##       11) civL< 4.5 5  0 1 (0.00000000 1.00000000) *
##    3) HRP< 1.5 22  2 1 (0.09090909 0.90909091) *
dtr2008
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 141 27 0 (0.80851064 0.19148936)  
##    2) major_violence_ep< 1.5 123 14 0 (0.88617886 0.11382114) *
##    3) major_violence_ep>=1.5 18  5 1 (0.27777778 0.72222222)  
##      6) GDP_gr< 1.5 2  0 0 (1.00000000 0.00000000) *
##      7) GDP_gr>=1.5 16  3 1 (0.18750000 0.81250000)  
##       14) pop_density< 1.5 2  0 0 (1.00000000 0.00000000) *
##       15) pop_density>=1.5 14  1 1 (0.07142857 0.92857143) *
dtr2009
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 141 26 0 (0.81560284 0.18439716)  
##     2) major_violence_ep< 1.5 124 14 0 (0.88709677 0.11290323)  
##       4) HRP>=5.5 48  1 0 (0.97916667 0.02083333) *
##       5) HRP< 5.5 76 13 0 (0.82894737 0.17105263)  
##        10) HDI< 6.5 73 10 0 (0.86301370 0.13698630)  
##          20) pop_density< 7.5 67  7 0 (0.89552239 0.10447761)  
##            40) GINI>=4.5 38  2 0 (0.94736842 0.05263158) *
##            41) GINI< 4.5 29  5 0 (0.82758621 0.17241379)  
##              82) unemployment< 6.5 22  2 0 (0.90909091 0.09090909) *
##              83) unemployment>=6.5 7  3 0 (0.57142857 0.42857143)  
##               166) HRP< 3.5 3  0 0 (1.00000000 0.00000000) *
##               167) HRP>=3.5 4  1 1 (0.25000000 0.75000000) *
##          21) pop_density>=7.5 6  3 0 (0.50000000 0.50000000)  
##            42) GINI>=4.5 3  0 0 (1.00000000 0.00000000) *
##            43) GINI< 4.5 3  0 1 (0.00000000 1.00000000) *
##        11) HDI>=6.5 3  0 1 (0.00000000 1.00000000) *
##     3) major_violence_ep>=1.5 17  5 1 (0.29411765 0.70588235)  
##       6) education>=5.5 3  0 0 (1.00000000 0.00000000) *
##       7) education< 5.5 14  2 1 (0.14285714 0.85714286) *
dtr2010
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 27 0 (0.8085106 0.1914894)  
##   2) major_violence_ep< 0.5 122 13 0 (0.8934426 0.1065574) *
##   3) major_violence_ep>=0.5 19  5 1 (0.2631579 0.7368421)  
##     6) pop_density< 1.5 2  0 0 (1.0000000 0.0000000) *
##     7) pop_density>=1.5 17  3 1 (0.1764706 0.8235294) *
dtr2011
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 141 35 0 (0.75177305 0.24822695)  
##    2) major_violence_ep< 1.5 127 22 0 (0.82677165 0.17322835)  
##      4) HRP>=5.5 49  3 0 (0.93877551 0.06122449) *
##      5) HRP< 5.5 78 19 0 (0.75641026 0.24358974)  
##       10) civL>=3.5 51  9 0 (0.82352941 0.17647059) *
##       11) civL< 3.5 27 10 0 (0.62962963 0.37037037)  
##         22) GDP< 6.5 25  8 0 (0.68000000 0.32000000)  
##           44) HDI>=4.5 10  1 0 (0.90000000 0.10000000) *
##           45) HDI< 4.5 15  7 0 (0.53333333 0.46666667)  
##             90) HRP>=3.5 11  3 0 (0.72727273 0.27272727) *
##             91) HRP< 3.5 4  0 1 (0.00000000 1.00000000) *
##         23) GDP>=6.5 2  0 1 (0.00000000 1.00000000) *
##    3) major_violence_ep>=1.5 14  1 1 (0.07142857 0.92857143) *
dtr2012
## n= 139 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 139 39 0 (0.71942446 0.28057554)  
##    2) major_violence_ep< 0.5 121 23 0 (0.80991736 0.19008264)  
##      4) durable>=3.5 84 11 0 (0.86904762 0.13095238) *
##      5) durable< 3.5 37 12 0 (0.67567568 0.32432432)  
##       10) HRP>=4.5 18  1 0 (0.94444444 0.05555556) *
##       11) HRP< 4.5 19  8 1 (0.42105263 0.57894737)  
##         22) GDP_gr>=6.5 4  0 0 (1.00000000 0.00000000) *
##         23) GDP_gr< 6.5 15  4 1 (0.26666667 0.73333333) *
##    3) major_violence_ep>=0.5 18  2 1 (0.11111111 0.88888889) *
dtr2013
## n= 139 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 139 39 0 (0.71942446 0.28057554)  
##     2) major_violence_ep< 0.5 120 23 0 (0.80833333 0.19166667)  
##       4) HRP>=4.5 70  5 0 (0.92857143 0.07142857) *
##       5) HRP< 4.5 50 18 0 (0.64000000 0.36000000)  
##        10) polR>=4.5 29  6 0 (0.79310345 0.20689655)  
##          20) pop_density< 7.5 24  2 0 (0.91666667 0.08333333) *
##          21) pop_density>=7.5 5  1 1 (0.20000000 0.80000000) *
##        11) polR< 4.5 21  9 1 (0.42857143 0.57142857)  
##          22) durable>=5.5 5  1 0 (0.80000000 0.20000000) *
##          23) durable< 5.5 16  5 1 (0.31250000 0.68750000)  
##            46) education< 1.5 2  0 0 (1.00000000 0.00000000) *
##            47) education>=1.5 14  3 1 (0.21428571 0.78571429)  
##              94) GDP_gr< 5.5 8  3 1 (0.37500000 0.62500000)  
##               188) GDP_gr>=4.5 3  0 0 (1.00000000 0.00000000) *
##               189) GDP_gr< 4.5 5  0 1 (0.00000000 1.00000000) *
##              95) GDP_gr>=5.5 6  0 1 (0.00000000 1.00000000) *
##     3) major_violence_ep>=0.5 19  3 1 (0.15789474 0.84210526)  
##       6) GDP_gr< 2.5 3  1 0 (0.66666667 0.33333333) *
##       7) GDP_gr>=2.5 16  1 1 (0.06250000 0.93750000) *
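
Each line of the printed trees follows the header `node), split, n, loss, yval, (yprob)`: the node number, the splitting rule, the number of observations in the node, the number of those observations that do not belong to the predicted class, the predicted class, and the class proportions. As a rough check of that reading, the sketch below recomputes the proportions for node 3 of the last tree above from its `n` and `loss`. It assumes the default priors of rpart, under which `yprob` is simply the observed class shares.

```r
# Node 3 of dtr2013 as printed above: n = 19, loss = 3, yval = 1
n    <- 19
loss <- 3   # observations in the node that do not belong to the predicted class

# The predicted class is 1, so the 3 misclassified cases are the class-0 cases
yprob <- c(`0` = loss / n, `1` = (n - loss) / n)
print(round(yprob, 8))
```

The two values should agree with the `(yprob)` pair printed for that node.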

Although prediction of transnational terrorism was unsuccessful, the variables performed fairly well in predicting domestic terrorism. The two strongest and most consistent predictors were major episodes of political violence and protection of human rights; several weaker predictors varied from year to year.

Releveled analysis

Activation of libraries and dataset

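# library activation
# (rpart and ggplot2 are listed explicitly because rpart(), rpart.control() and ggplot() are called directly below)

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "rpart", "ggplot2", "caret", "randomForest", "rpart.plot", "rattle", "plotly"), library, character.only = TRUE)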

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]
data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
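
To make the manual recoding above easier to check, the following sketch applies the same nested `ifelse()` rules to a handful of made-up integer scores. The input vectors `mve_raw` and `polR_raw` are illustrative assumptions and are not taken from the dataset; the quintile coding done with `cut2()` is not reproduced here.

```r
# Hypothetical raw scores (not taken from the dataset)
mve_raw  <- 0:7   # major_violence_ep
polR_raw <- 1:7   # political rights (the same rule is applied to civL)

# Same rule as in the categorization code; for integer scores:
# 0 -> 0, 1-2 -> 1, 3-5 -> 2, above 5 -> 3
mve_cat <- ifelse(mve_raw < 1, 0,
           ifelse(mve_raw > 5, 3,
           ifelse(mve_raw > 0 & mve_raw < 3, 1, 2)))

# Same rule as for polR and civL: below 3 -> 0, 3-5 -> 1, above 5 -> 2
polR_cat <- ifelse(polR_raw < 3, 0, ifelse(polR_raw > 5, 2, 1))

print(data.frame(raw = mve_raw, category = mve_cat))
print(data.frame(raw = polR_raw, category = polR_cat))
```

These codes are what the split points such as `major_violence_ep< 0.5` or `polR>=1.5` in the printed trees refer to.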

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
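
The per-year subsets above could also be produced in one step. The sketch below is a compact base-R alternative, shown on a small made-up data frame (`toy` is hypothetical and only mimics the `year` and `dom_a` columns); it is not the code used in the analysis.

```r
# Hypothetical stand-in for gtdd with a year column and an outcome
toy <- data.frame(
  year  = c(2001, 2001, 2002, 2014, 2014, 2014),
  dom_a = c(0, 1, 0, 0, 1, 1)
)

# split() returns a named list with one data frame per year
by_year <- split(toy, toy$year)

print(names(by_year))
print(by_year[["2014"]])   # plays the role of gtdd2014, the test set
```

Keeping the subsets in a list makes it possible to fit and evaluate the yearly models with `lapply()` instead of one line per year.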

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpctrl)
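
The repeated `set.seed()` and `rpart()` pairs above follow one pattern: reset the seed, then fit a classification tree on a single year. A minimal sketch of that pattern as a loop is given below. It runs on simulated data (`toy` and the helper `fit_year()` are hypothetical names introduced here) and only illustrates the structure, not the reported results.

```r
library(rpart)

# Hypothetical toy data standing in for gtdd (two training years)
set.seed(619)
toy <- data.frame(
  year = rep(c(2001, 2002), each = 60),
  HRP = sample(1:5, 120, replace = TRUE),
  major_violence_ep = sample(0:3, 120, replace = TRUE)
)
toy$dom_a <- as.integer(toy$major_violence_ep >= 2)

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# Fit one classification tree per year, resetting the seed before each fit
# so that the cross-validation folds are reproducible
fit_year <- function(yr) {
  set.seed(619)
  rpart(as.factor(dom_a) ~ . - year, data = toy[toy$year == yr, ],
        method = "class", control = rpctrl)
}

years <- c(2001, 2002)
trees <- setNames(lapply(years, fit_year), paste0("dtr", years))
print(names(trees))
```

Resetting the seed inside the helper mirrors the original code, where `set.seed(619)` precedes every call because the cross-validation used for `xerror` is random.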

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the model's predictive performance would not be retained. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter: the CP value in the tree's cptable with the lowest cross-validated error (xerror)

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.02777778
optimal_complexity(dtr2002) 
## [1] 0.025
optimal_complexity(dtr2003) 
## [1] 0.125
optimal_complexity(dtr2004)
## [1] 0.12
optimal_complexity(dtr2005)
## [1] 0.07692308
optimal_complexity(dtr2006)
## [1] 0.03448276
optimal_complexity(dtr2007)
## [1] 0.04285714
optimal_complexity(dtr2008)
## [1] 0.1111111
optimal_complexity(dtr2009)
## [1] 0.07692308
optimal_complexity(dtr2010)
## [1] 0.07407407
optimal_complexity(dtr2011)
## [1] 0.04285714
optimal_complexity(dtr2012)
## [1] 0.03846154
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.0625
optimal_complexity(ttr2002)
## [1] 0.06756757
optimal_complexity(ttr2003)
## [1] 0.125
optimal_complexity(ttr2004)
## [1] 0.05263158
optimal_complexity(ttr2005)
## [1] 0.0625
optimal_complexity(ttr2006)
## [1] 0.02631579
optimal_complexity(ttr2007)
## [1] 0.04081633
optimal_complexity(ttr2008)
## [1] 0.02564103
optimal_complexity(ttr2009)
## [1] 0.03977273
optimal_complexity(ttr2010)
## [1] 0.05405405
optimal_complexity(ttr2011)
## [1] 0.04761905
optimal_complexity(ttr2012)
## [1] 0.03474903
optimal_complexity(ttr2013)
## [1] 0.03846154

Each tree was then refitted with its complexity parameter set to the estimated optimum, truncated to three decimal places.
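
To illustrate what `optimal_complexity()` does, the sketch below applies it to a hand-made object that only mimics the `cptable` component of a fitted tree. The numbers in `toy_fit` are invented for illustration and do not come from any of the models above.

```r
optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# Hypothetical cptable: rows are ordered from the simplest to the largest tree
toy_fit <- list(cptable = matrix(
  c(0.300, 0, 1.0, 1.00, 0.10,
    0.100, 1, 0.7, 0.80, 0.09,
    0.025, 3, 0.5, 0.85, 0.09),
  ncol = 5, byrow = TRUE,
  dimnames = list(c("1", "2", "3"),
                  c("CP", "nsplit", "rel error", "xerror", "xstd"))
))

# The second row has the lowest xerror (0.80), so its CP is returned
best_cp <- optimal_complexity(toy_fit)
print(best_cp)
```

Because `which.min()` returns the first minimum, a tie in `xerror` is resolved in favour of the row that appears earlier in the table, which is the smaller tree.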

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .027))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .12))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .076))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .034))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .042))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .111))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .076))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .074))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .042))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .038))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .067))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .052))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .026))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .040))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .039))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .054))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .047))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .034))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .038))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

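# domestic
# cbind() stores the predicted factors as integer codes (1 = class "0", 2 = class "1"),
# so a row mean below 1.5 means that the majority of trees predicted class 0
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)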
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 102  21
##          1   0  16
##                                          
##                Accuracy : 0.8489         
##                  95% CI : (0.7784, 0.904)
##     No Information Rate : 0.7338         
##     P-Value [Acc > NIR] : 0.0008782      
##                                          
##                   Kappa : 0.5279         
##  Mcnemar's Test P-Value : 1.275e-05      
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.4324         
##          Pos Pred Value : 0.8293         
##          Neg Pred Value : 1.0000         
##              Prevalence : 0.7338         
##          Detection Rate : 0.7338         
##    Detection Prevalence : 0.8849         
##       Balanced Accuracy : 0.7162         
##                                          
##        'Positive' Class : 0              
## 
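
The summary statistics above can be recomputed by hand from the four cells of the table. The sketch below does this for the domestic confusion matrix, treating class 0 as the positive class as `confusionMatrix()` reports; the object names are introduced here for illustration.

```r
# Domestic confusion matrix as printed above (rows: prediction, columns: reference)
cm <- matrix(c(102, 0, 21, 16), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))
n <- sum(cm)

accuracy    <- sum(diag(cm)) / n                 # share of correct predictions
sensitivity <- cm["0", "0"] / sum(cm[, "0"])     # correct among true class 0
specificity <- cm["1", "1"] / sum(cm[, "1"])     # correct among true class 1
balanced    <- (sensitivity + specificity) / 2
nir         <- max(colSums(cm)) / n              # no information rate

# Cohen's kappa: agreement beyond what the marginal totals would give by chance
expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa    <- (accuracy - expected) / (1 - expected)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced,
              nir = nir, kappa = kappa), 4))
```

Since the positive class is 0, the specificity of about 0.43 is the share of countries with domestic terrorism in 2014 that the ensemble identified (16 of 37).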
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 88 31
##          1  6 14
##                                           
##                Accuracy : 0.7338          
##                  95% CI : (0.6522, 0.8051)
##     No Information Rate : 0.6763          
##     P-Value [Acc > NIR] : 0.08535         
##                                           
##                   Kappa : 0.2891          
##  Mcnemar's Test P-Value : 7.961e-05       
##                                           
##             Sensitivity : 0.9362          
##             Specificity : 0.3111          
##          Pos Pred Value : 0.7395          
##          Neg Pred Value : 0.7000          
##              Prevalence : 0.6763          
##          Detection Rate : 0.6331          
##    Detection Prevalence : 0.8561          
##       Balanced Accuracy : 0.6236          
##                                           
##        'Positive' Class : 0               
## 
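
The ensembling step in both blocks above is a majority vote, although the threshold of 1.50 may look odd at first. A minimal sketch with three made-up prediction vectors (`p1` to `p3` are hypothetical) shows why it works: `cbind()` turns the predicted factors into their integer codes, 1 for class "0" and 2 for class "1".

```r
# Three hypothetical class predictions for four countries; as with
# predict(type = "class"), each is a factor with levels "0" and "1"
p1 <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
p2 <- factor(c("0", "1", "0", "0"), levels = c("0", "1"))
p3 <- factor(c("0", "1", "1", "1"), levels = c("0", "1"))

votes <- cbind(p1, p2, p3)     # factors become integer codes: "0" -> 1, "1" -> 2
mean_code <- rowMeans(votes)   # between 1 (all vote "0") and 2 (all vote "1")

# A mean code of 1.5 or more means that at least half of the trees voted "1"
ensemble <- ifelse(mean_code < 1.50, 0, 1)
print(ensemble)
```

With 13 trees, as in the analysis, the vote cannot end in a tie, so the treatment of a mean of exactly 1.5 never comes into play.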

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink"))+ theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates (actual model)

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink"))+ theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
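
The three blocks above repeat the same steps: extract the importance table for each year, combine the tables, and reshape the result to long format for plotting. The sketch below shows a base-R alternative that stacks the tables directly into long format, skipping the wide intermediate table built with `join_all()` and `melt()`. The list `imp_by_year` holds invented numbers standing in for `varImp()` output.

```r
# Hypothetical per-year importance tables standing in for varImp() output
imp_by_year <- list(
  `2001` = data.frame(variable = c("HRP", "GINI"), Overall = c(12.5, 3.0)),
  `2002` = data.frame(variable = c("HRP", "GINI"), Overall = c(9.0, 0.0))
)

# Stack the tables and record the year each row came from (long format)
imp_long <- do.call(rbind, lapply(names(imp_by_year), function(yr) {
  cbind(year = yr, imp_by_year[[yr]])
}))
rownames(imp_long) <- NULL
print(imp_long)
```

The resulting data frame has one row per variable and year, which is the shape that `ggplot()` expects for the `x = year`, `y = value` mapping used above (with `Overall` in place of `value`).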

Location of splits

dtr2001
## n= 152 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 152 27 0 (0.82236842 0.17763158)  
##    2) HRP>=1.5 120  8 0 (0.93333333 0.06666667)  
##      4) education>=1.5 98  3 0 (0.96938776 0.03061224) *
##      5) education< 1.5 22  5 0 (0.77272727 0.22727273)  
##       10) GINI>=3.5 8  0 0 (1.00000000 0.00000000) *
##       11) GINI< 3.5 14  5 0 (0.64285714 0.35714286)  
##         22) HRP>=3.5 5  0 0 (1.00000000 0.00000000) *
##         23) HRP< 3.5 9  4 1 (0.44444444 0.55555556)  
##           46) durable>=3 2  0 0 (1.00000000 0.00000000) *
##           47) durable< 3 7  2 1 (0.28571429 0.71428571) *
##    3) HRP< 1.5 32 13 1 (0.40625000 0.59375000)  
##      6) major_violence_ep< 0.5 11  4 0 (0.63636364 0.36363636)  
##       12) GDP>=1.5 5  0 0 (1.00000000 0.00000000) *
##       13) GDP< 1.5 6  2 1 (0.33333333 0.66666667)  
##         26) GDP_gr< 3.5 3  1 0 (0.66666667 0.33333333) *
##         27) GDP_gr>=3.5 3  0 1 (0.00000000 1.00000000) *
##      7) major_violence_ep>=0.5 21  6 1 (0.28571429 0.71428571)  
##       14) GDP_gr>=3.5 5  2 0 (0.60000000 0.40000000) *
##       15) GDP_gr< 3.5 16  3 1 (0.18750000 0.81250000)  
##         30) unemployment< 2.5 7  3 1 (0.42857143 0.57142857)  
##           60) pop_density< 4.5 4  1 0 (0.75000000 0.25000000) *
##           61) pop_density>=4.5 3  0 1 (0.00000000 1.00000000) *
##         31) unemployment>=2.5 9  0 1 (0.00000000 1.00000000) *
dtr2002
## n= 153 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 153 31 0 (0.79738562 0.20261438)  
##    2) HRP>=1.5 121 11 0 (0.90909091 0.09090909) *
##    3) HRP< 1.5 32 12 1 (0.37500000 0.62500000)  
##      6) pop_density< 1.5 6  1 0 (0.83333333 0.16666667) *
##      7) pop_density>=1.5 26  7 1 (0.26923077 0.73076923)  
##       14) civL>=1.5 9  4 0 (0.55555556 0.44444444)  
##         28) durable>=1.5 6  1 0 (0.83333333 0.16666667) *
##         29) durable< 1.5 3  0 1 (0.00000000 1.00000000) *
##       15) civL< 1.5 17  2 1 (0.11764706 0.88235294) *
dtr2003
## n= 153 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 153 24 0 (0.84313725 0.15686275)  
##   2) major_violence_ep< 0.5 130  8 0 (0.93846154 0.06153846) *
##   3) major_violence_ep>=0.5 23  7 1 (0.30434783 0.69565217) *
dtr2004
## n= 154 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 154 25 0 (0.83766234 0.16233766)  
##   2) major_violence_ep< 0.5 129  9 0 (0.93023256 0.06976744) *
##   3) major_violence_ep>=0.5 25  9 1 (0.36000000 0.64000000) *
dtr2005
## n= 155 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 155 26 0 (0.8322581 0.1677419)  
##    2) HRP>=1.5 120  6 0 (0.9500000 0.0500000) *
##    3) HRP< 1.5 35 15 1 (0.4285714 0.5714286)  
##      6) GINI>=2.5 24 11 0 (0.5416667 0.4583333)  
##       12) HDI< 2.5 15  4 0 (0.7333333 0.2666667) *
##       13) HDI>=2.5 9  2 1 (0.2222222 0.7777778)  
##         26) HDI>=3.5 2  0 0 (1.0000000 0.0000000) *
##         27) HDI< 3.5 7  0 1 (0.0000000 1.0000000) *
##      7) GINI< 2.5 11  2 1 (0.1818182 0.8181818) *
dtr2006
## n= 156 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 156 29 0 (0.81410256 0.18589744)  
##    2) major_violence_ep< 0.5 132 10 0 (0.92424242 0.07575758) *
##    3) major_violence_ep>=0.5 24  5 1 (0.20833333 0.79166667)  
##      6) major_violence_ep< 1.5 10  4 1 (0.40000000 0.60000000)  
##       12) polR>=1.5 2  0 0 (1.00000000 0.00000000) *
##       13) polR< 1.5 8  2 1 (0.25000000 0.75000000) *
##      7) major_violence_ep>=1.5 14  1 1 (0.07142857 0.92857143) *
dtr2007
## n= 156 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 156 35 0 (0.77564103 0.22435897)  
##    2) HRP>=1.5 123 10 0 (0.91869919 0.08130081) *
##    3) HRP< 1.5 33  8 1 (0.24242424 0.75757576)  
##      6) polR>=1.5 15  7 1 (0.46666667 0.53333333)  
##       12) durable>=1.5 9  3 0 (0.66666667 0.33333333) *
##       13) durable< 1.5 6  1 1 (0.16666667 0.83333333) *
##      7) polR< 1.5 18  1 1 (0.05555556 0.94444444) *
dtr2008
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 27 0 (0.8085106 0.1914894)  
##   2) major_violence_ep< 0.5 119 13 0 (0.8907563 0.1092437) *
##   3) major_violence_ep>=0.5 22  8 1 (0.3636364 0.6363636)  
##     6) HRP>=1.5 3  0 0 (1.0000000 0.0000000) *
##     7) HRP< 1.5 19  5 1 (0.2631579 0.7368421) *
dtr2009
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 141 26 0 (0.81560284 0.18439716)  
##    2) HRP>=1.5 108  9 0 (0.91666667 0.08333333) *
##    3) HRP< 1.5 33 16 1 (0.48484848 0.51515152)  
##      6) major_violence_ep< 1.5 20  7 0 (0.65000000 0.35000000)  
##       12) unemployment>=1.5 16  4 0 (0.75000000 0.25000000) *
##       13) unemployment< 1.5 4  1 1 (0.25000000 0.75000000) *
##      7) major_violence_ep>=1.5 13  3 1 (0.23076923 0.76923077) *
dtr2010
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 27 0 (0.8085106 0.1914894)  
##   2) major_violence_ep< 0.5 122 13 0 (0.8934426 0.1065574) *
##   3) major_violence_ep>=0.5 19  5 1 (0.2631579 0.7368421)  
##     6) pop_density< 1.5 2  0 0 (1.0000000 0.0000000) *
##     7) pop_density>=1.5 17  3 1 (0.1764706 0.8235294) *
dtr2011
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 141 35 0 (0.7517730 0.2482270)  
##    2) major_violence_ep< 0.5 121 19 0 (0.8429752 0.1570248) *
##    3) major_violence_ep>=0.5 20  4 1 (0.2000000 0.8000000)  
##      6) polR>=1.5 9  4 1 (0.4444444 0.5555556)  
##       12) major_violence_ep< 1.5 5  1 0 (0.8000000 0.2000000) *
##       13) major_violence_ep>=1.5 4  0 1 (0.0000000 1.0000000) *
##      7) polR< 1.5 11  0 1 (0.0000000 1.0000000) *
dtr2012
## n= 139 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 139 39 0 (0.7194245 0.2805755)  
##    2) major_violence_ep< 0.5 121 23 0 (0.8099174 0.1900826)  
##      4) durable>=2.5 82 10 0 (0.8780488 0.1219512) *
##      5) durable< 2.5 39 13 0 (0.6666667 0.3333333)  
##       10) HRP>=2.5 23  3 0 (0.8695652 0.1304348)  
##         20) GDP_gr>=3.5 12  0 0 (1.0000000 0.0000000) *
##         21) GDP_gr< 3.5 11  3 0 (0.7272727 0.2727273)  
##           42) HRP>=3.5 8  0 0 (1.0000000 0.0000000) *
##           43) HRP< 3.5 3  0 1 (0.0000000 1.0000000) *
##       11) HRP< 2.5 16  6 1 (0.3750000 0.6250000)  
##         22) GDP_gr>=4.5 3  0 0 (1.0000000 0.0000000) *
##         23) GDP_gr< 4.5 13  3 1 (0.2307692 0.7692308) *
##    3) major_violence_ep>=0.5 18  2 1 (0.1111111 0.8888889) *
dtr2013
## n= 139 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 139 39 0 (0.71942446 0.28057554)  
##    2) major_violence_ep< 0.5 120 23 0 (0.80833333 0.19166667)  
##      4) HRP>=2.5 80  9 0 (0.88750000 0.11250000) *
##      5) HRP< 2.5 40 14 0 (0.65000000 0.35000000)  
##       10) durable>=3.5 14  1 0 (0.92857143 0.07142857) *
##       11) durable< 3.5 26 13 0 (0.50000000 0.50000000)  
##         22) HDI< 2.5 16  5 0 (0.68750000 0.31250000)  
##           44) unemployment< 4.5 13  3 0 (0.76923077 0.23076923) *
##           45) unemployment>=4.5 3  1 1 (0.33333333 0.66666667) *
##         23) HDI>=2.5 10  2 1 (0.20000000 0.80000000) *
##    3) major_violence_ep>=0.5 19  3 1 (0.15789474 0.84210526)  
##      6) durable>=1.5 11  3 1 (0.27272727 0.72727273)  
##       12) durable< 3.5 4  1 0 (0.75000000 0.25000000) *
##       13) durable>=3.5 7  0 1 (0.00000000 1.00000000) *
##      7) durable< 1.5 8  0 1 (0.00000000 1.00000000) *
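
The printed trees can be read through their node numbers: the children of node k are numbered 2k and 2k + 1, so node 47 of the 2001 tree (durable< 3) descends from node 23 (HRP< 3.5). The following is a minimal sketch of how the path to every terminal node could be listed with path.rpart(). It uses the kyphosis data shipped with rpart as a stand-in for the study data, and parent_of() is a hypothetical helper introduced only for illustration.

```r
library(rpart)

# Illustrative tree on rpart's bundled kyphosis data (not the study data)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class")

# Terminal nodes are the rows of fit$frame whose split variable is "<leaf>"
leaf_ids <- as.numeric(rownames(fit$frame)[fit$frame$var == "<leaf>"])

# For each requested node, path.rpart() returns the splits from the root down
paths <- path.rpart(fit, nodes = leaf_ids, print.it = FALSE)

# Hypothetical helper: the parent of node k is k %/% 2
parent_of <- function(k) k %/% 2
parent_of(47)
```

Each element of paths starts at "root" and ends at the split that defines the terminal node, which is the same information as the indented printout above in a form that is easier to filter.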

This robustness analysis generally confirmed the conclusions of the original analysis.

Redefined analysis

Activation of libraries and dataset

# library activation
# (rpart and ggplot2 are attached as dependencies of rpart.plot and plotly)

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart.plot", "plotly"), library, character.only = TRUE)

# dataset preparation

data1 <- read_delim("post-robust.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]
data2$GDP <- as.numeric(cut2(data2$GDP, g=8))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=8))
data2$HDI <- as.numeric(cut2(data2$HDI, g=8))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=8))
data2$durable  <- as.numeric(cut2(data2$durable, g=8))
data2$HRP <- as.numeric(cut2(data2$HRP, g=8))
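# major_violence_ep, polR and civL are kept on their original scales (the assignments below leave them unchanged)
data2$major_violence_ep <- data2$major_violence_ep
data2$polR  <- data2$polR
data2$civL <- data2$civL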
data2$GINI  <- as.numeric(cut2(data2$GINI, g=8))
data2$education <- as.numeric(cut2(data2$education, g=8))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=8))
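
In this analysis each quantitative indicator is cut into eight quantile groups and stored as the group number 1 to 8, which is why the trees further below split at values such as 1.5 or 4.5. A rough base-R analogue is sketched here, assuming a hypothetical continuous indicator x. cut() on quantile() breaks is used as a stand-in for cut2(x, g = 8) and may treat tied or boundary values differently.

```r
set.seed(1)
x <- rexp(200, rate = 1 / 5000)  # hypothetical right-skewed indicator, not study data

# Nine break points delimit eight quantile groups
breaks <- quantile(x, probs = seq(0, 1, length.out = 9))

# as.numeric() turns the interval factor into group codes 1..8
x_cat <- as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))
table(x_cat)
```

A split reported as, for example, GDP>=1.5 therefore separates the lowest group from all higher groups.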

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,13)] # domestic
gtdt <- data2[, -c(1,12)] # transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
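
The yearly subsets could also be produced in one step. The sketch below assumes a hypothetical toy data frame with the same year column. split() returns a named list with one data frame per year, which is equivalent to the repeated subset() calls above.

```r
# Hypothetical stand-in for gtdd: a few rows per year
toy <- data.frame(
  year  = rep(2001:2003, each = 4),
  HRP   = c(1, 2, 3, 4, 2, 2, 5, 1, 3, 4, 4, 2),
  dom_a = c(1, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1)
)

# One list element per year, named "2001", "2002", ...
by_year <- split(toy, toy$year)
names(by_year)
nrow(by_year[["2002"]])
```

Keeping the subsets in a list makes it possible to apply the same modelling step to every year with lapply() instead of repeating the call.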

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpctrl)
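
The repeated calls above follow a single pattern: reset the seed, then fit one classification tree per year with the shared control settings. A compact sketch of that pattern is given below. The toy data and the helper fit_year() are hypothetical and serve only to make the example self-contained.

```r
library(rpart)

# Hypothetical toy data: three years, 60 country rows each
set.seed(619)
toy <- data.frame(
  year = rep(2001:2003, each = 60),
  HRP = sample(1:8, 180, replace = TRUE),
  major_violence_ep = sample(0:3, 180, replace = TRUE)
)
toy$dom_a <- as.integer(toy$HRP <= 2 | toy$major_violence_ep >= 2)

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# Hypothetical helper: same seed and control for every yearly tree
fit_year <- function(d) {
  set.seed(619)
  rpart(as.factor(dom_a) ~ . - year, data = d, method = "class", control = rpctrl)
}

trees <- lapply(split(toy, toy$year), fit_year)
names(trees)
```

Resetting the seed inside the helper mirrors the document's approach of calling set.seed(619) before each fit, so the cross-validation folds are reproducible for every tree.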

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the predictive performance of the model would not be retained. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.0617284
optimal_complexity(dtr2002) 
## [1] 0.06451613
optimal_complexity(dtr2003) 
## [1] 0.04166667
optimal_complexity(dtr2004)
## [1] 0.08
optimal_complexity(dtr2005)
## [1] 0.07692308
optimal_complexity(dtr2006)
## [1] 0.03448276
optimal_complexity(dtr2007)
## [1] 0.02702703
optimal_complexity(dtr2008)
## [1] 0.07407407
optimal_complexity(dtr2009)
## [1] 0.05128205
optimal_complexity(dtr2010)
## [1] 0.07142857
optimal_complexity(dtr2011)
## [1] 0.03428571
optimal_complexity(dtr2012)
## [1] 0.04878049
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.125
optimal_complexity(ttr2002)
## [1] 0.07692308
optimal_complexity(ttr2003)
## [1] 0.1458333
optimal_complexity(ttr2004)
## [1] 0.03076923
optimal_complexity(ttr2005)
## [1] 0.06060606
optimal_complexity(ttr2006)
## [1] 0.04054054
optimal_complexity(ttr2007)
## [1] 0.03061224
optimal_complexity(ttr2008)
## [1] 0.03418803
optimal_complexity(ttr2009)
## [1] 0.04545455
optimal_complexity(ttr2010)
## [1] 0.05128205
optimal_complexity(ttr2011)
## [1] 0.05681818
optimal_complexity(ttr2012)
## [1] 0.04385965
optimal_complexity(ttr2013)
## [1] 0.03921569

The decision trees were then retrained with the complexity parameter of each tree set to the optimum obtained above (truncated to three decimal places).
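
As an alternative to refitting, a grown tree can be cut back to the selected complexity parameter with prune(). This is not the procedure used in this document, only a minimal sketch of the same idea. It again uses the kyphosis data from rpart as a stand-in for the study data.

```r
library(rpart)

set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 50, cp = .025))

# Same selection rule as in the document: cp with the lowest cross-validated error
optimal_complexity <- function(x) x$cptable[which.min(x$cptable[, "xerror"]), "CP"]
cp_opt <- optimal_complexity(fit)

# prune() removes the splits that do not meet cp_opt, without refitting
pruned <- prune(fit, cp = cp_opt)
nrow(pruned$frame)
```

Because the smallest cp in the table equals the value passed to rpart.control(), the selected optimum can never fall below the initial .025.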

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .061))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .064))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .041))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .08))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .076))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .034))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .027))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .074))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .051))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .071))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .034))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .048))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .025))

# transnational 

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .125))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .076))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .145))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .030))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .060))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .040))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .030))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .034))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .045))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .051))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .056))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .043))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 25, cp = .039))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 100  21
##          1   0  16
##                                           
##                Accuracy : 0.8467          
##                  95% CI : (0.7753, 0.9025)
##     No Information Rate : 0.7299          
##     P-Value [Acc > NIR] : 0.0008575       
##                                           
##                   Kappa : 0.5266          
##  Mcnemar's Test P-Value : 1.275e-05       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.4324          
##          Pos Pred Value : 0.8264          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7299          
##          Detection Rate : 0.7299          
##    Detection Prevalence : 0.8832          
##       Balanced Accuracy : 0.7162          
##                                           
##        'Positive' Class : 0               
## 
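
The 1.50 threshold in the ensembling step is a majority vote. cbind() stores factor predictions as their integer codes, so class 0 becomes 1 and class 1 becomes 2, and a row mean of at least 1.5 means that at least half of the trees predicted an incident. A small sketch with three hypothetical prediction vectors:

```r
# Hypothetical class predictions from three trees for five test countries
lev <- c("0", "1")
p1 <- factor(c("0", "1", "1", "0", "1"), levels = lev)
p2 <- factor(c("0", "0", "1", "0", "1"), levels = lev)
p3 <- factor(c("1", "0", "1", "0", "0"), levels = lev)

# cbind() keeps the integer codes of the factors: "0" -> 1, "1" -> 2
votes <- cbind(p1, p2, p3)

# Row mean below 1.5: most trees voted "0"; otherwise the ensemble predicts 1
ens <- ifelse(rowMeans(votes) < 1.50, 0, 1)
ens
```

With 13 trees, as in this analysis, the number of votes is odd and ties cannot occur.
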
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 85 32
##          1  6 14
##                                           
##                Accuracy : 0.7226          
##                  95% CI : (0.6397, 0.7957)
##     No Information Rate : 0.6642          
##     P-Value [Acc > NIR] : 0.08591         
##                                           
##                   Kappa : 0.2771          
##  Mcnemar's Test P-Value : 5.002e-05       
##                                           
##             Sensitivity : 0.9341          
##             Specificity : 0.3043          
##          Pos Pred Value : 0.7265          
##          Neg Pred Value : 0.7000          
##              Prevalence : 0.6642          
##          Detection Rate : 0.6204          
##    Detection Prevalence : 0.8540          
##       Balanced Accuracy : 0.6192          
##                                           
##        'Positive' Class : 0               
## 
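
Because 0 (no incidence) is treated as the 'Positive' class, sensitivity refers to countries without incidents, while the share of countries with incidents that were correctly identified appears as specificity. The arithmetic can be checked from the counts in the two confusion matrices above. cm_stats() and make_cm() are hypothetical helpers written for this check.

```r
# Hypothetical helper: 2 x 2 table with rows = prediction, columns = reference
make_cm <- function(counts) {
  matrix(counts, nrow = 2,
         dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))
}

# Hypothetical helper: statistics with class "0" as the positive class
cm_stats <- function(cm) {
  c(accuracy    = sum(diag(cm)) / sum(cm),
    sensitivity = cm["0", "0"] / sum(cm[, "0"]),  # class 0 correctly predicted
    specificity = cm["1", "1"] / sum(cm[, "1"]))  # class 1 correctly predicted
}

# Counts copied column by column from the printed matrices
domestic      <- make_cm(c(100, 0, 21, 16))
transnational <- make_cm(c(85, 6, 32, 14))

round(cm_stats(domestic), 4)
round(cm_stats(transnational), 4)
```

The results reproduce the reported values. In particular, 16 of the 37 countries with domestic incidents (0.4324) and 14 of the 46 countries with transnational incidents (0.3043) were identified by the ensembles.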

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
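
The per-year importance tables could also be collected in a loop. The sketch below is an assumption-laden stand-in: it reads the variable.importance component that rpart stores in each fitted tree rather than calling caret's varImp(), so the values are not identical to those plotted above, and the toy data are hypothetical.

```r
library(rpart)

# Hypothetical toy data: three years, 60 country rows each
set.seed(619)
toy <- data.frame(
  year = rep(2001:2003, each = 60),
  HRP = sample(1:8, 180, replace = TRUE),
  major_violence_ep = sample(0:3, 180, replace = TRUE),
  GINI = sample(1:8, 180, replace = TRUE)
)
toy$dom_a <- as.integer(toy$HRP <= 2 | toy$major_violence_ep >= 2)

trees <- lapply(split(toy, toy$year), function(d)
  rpart(as.factor(dom_a) ~ . - year, data = d, method = "class"))

# One long table: year, variable, importance value
importance_long <- do.call(rbind, lapply(names(trees), function(y) {
  imp <- trees[[y]]$variable.importance
  if (is.null(imp)) return(NULL)  # a tree without splits has no importance values
  data.frame(year = y, variable = names(imp), value = unname(imp))
}))
head(importance_long)
```

The long format matches the structure that melt() produces above, so the same ggplot() call could be applied to it.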

Location of splits

dtr2001
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 141 27 0 (0.80851064 0.19148936)  
##    2) HRP>=1.5 120 12 0 (0.90000000 0.10000000)  
##      4) HDI>=2.5 88  3 0 (0.96590909 0.03409091) *
##      5) HDI< 2.5 32  9 0 (0.71875000 0.28125000)  
##       10) GINI>=5.5 13  0 0 (1.00000000 0.00000000) *
##       11) GINI< 5.5 19  9 0 (0.52631579 0.47368421)  
##         22) HRP>=4.5 8  1 0 (0.87500000 0.12500000) *
##         23) HRP< 4.5 11  3 1 (0.27272727 0.72727273) *
##    3) HRP< 1.5 21  6 1 (0.28571429 0.71428571) *
dtr2002
## n= 142 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 142 31 0 (0.78169014 0.21830986)  
##    2) HRP>=2.5 107 10 0 (0.90654206 0.09345794) *
##    3) HRP< 2.5 35 14 1 (0.40000000 0.60000000)  
##      6) pop_density< 4.5 17  6 0 (0.64705882 0.35294118)  
##       12) durable>=2.5 9  1 0 (0.88888889 0.11111111) *
##       13) durable< 2.5 8  3 1 (0.37500000 0.62500000)  
##         26) major_violence_ep>=5 2  0 0 (1.00000000 0.00000000) *
##         27) major_violence_ep< 5 6  1 1 (0.16666667 0.83333333) *
##      7) pop_density>=4.5 18  3 1 (0.16666667 0.83333333) *
dtr2003
## n= 142 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 142 24 0 (0.83098592 0.16901408)  
##    2) major_violence_ep< 0.5 119  8 0 (0.93277311 0.06722689)  
##      4) HRP>=1.5 116  6 0 (0.94827586 0.05172414) *
##      5) HRP< 1.5 3  1 1 (0.33333333 0.66666667) *
##    3) major_violence_ep>=0.5 23  7 1 (0.30434783 0.69565217)  
##      6) GDP_gr< 1.5 3  0 0 (1.00000000 0.00000000) *
##      7) GDP_gr>=1.5 20  4 1 (0.20000000 0.80000000)  
##       14) HRP>=4.5 2  0 0 (1.00000000 0.00000000) *
##       15) HRP< 4.5 18  2 1 (0.11111111 0.88888889) *
dtr2004
## n= 143 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 143 25 0 (0.82517483 0.17482517)  
##   2) HRP>=1.5 122 10 0 (0.91803279 0.08196721) *
##   3) HRP< 1.5 21  6 1 (0.28571429 0.71428571) *
dtr2005
## n= 146 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 146 26 0 (0.82191781 0.17808219)  
##    2) HRP>=2.5 108  6 0 (0.94444444 0.05555556) *
##    3) HRP< 2.5 38 18 1 (0.47368421 0.52631579)  
##      6) GDP_gr< 3.5 9  1 0 (0.88888889 0.11111111) *
##      7) GDP_gr>=3.5 29 10 1 (0.34482759 0.65517241)  
##       14) HDI>=5.5 2  0 0 (1.00000000 0.00000000) *
##       15) HDI< 5.5 27  8 1 (0.29629630 0.70370370)  
##         30) polR>=6.5 4  1 0 (0.75000000 0.25000000) *
##         31) polR< 6.5 23  5 1 (0.21739130 0.78260870) *
dtr2006
## n= 148 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 148 29 0 (0.80405405 0.19594595)  
##    2) major_violence_ep< 1.5 129 12 0 (0.90697674 0.09302326)  
##      4) HRP>=2.5 107  5 0 (0.95327103 0.04672897) *
##      5) HRP< 2.5 22  7 0 (0.68181818 0.31818182)  
##       10) GINI>=2.5 18  4 0 (0.77777778 0.22222222)  
##         20) polR>=3.5 15  2 0 (0.86666667 0.13333333) *
##         21) polR< 3.5 3  1 1 (0.33333333 0.66666667) *
##       11) GINI< 2.5 4  1 1 (0.25000000 0.75000000) *
##    3) major_violence_ep>=1.5 19  2 1 (0.10526316 0.89473684) *
dtr2007
## n= 148 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 148 37 0 (0.75000000 0.25000000)  
##    2) major_violence_ep< 0.5 126 17 0 (0.86507937 0.13492063)  
##      4) HRP>=1.5 118 11 0 (0.90677966 0.09322034) *
##      5) HRP< 1.5 8  2 1 (0.25000000 0.75000000) *
##    3) major_violence_ep>=0.5 22  2 1 (0.09090909 0.90909091)  
##      6) GDP_gr< 4.5 7  2 1 (0.28571429 0.71428571)  
##       12) major_violence_ep< 1.5 2  0 0 (1.00000000 0.00000000) *
##       13) major_violence_ep>=1.5 5  0 1 (0.00000000 1.00000000) *
##      7) GDP_gr>=4.5 15  0 1 (0.00000000 1.00000000) *
dtr2008
## n= 137 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 137 27 0 (0.80291971 0.19708029)  
##    2) major_violence_ep< 1.5 119 14 0 (0.88235294 0.11764706) *
##    3) major_violence_ep>=1.5 18  5 1 (0.27777778 0.72222222)  
##      6) GDP_gr< 1.5 2  0 0 (1.00000000 0.00000000) *
##      7) GDP_gr>=1.5 16  3 1 (0.18750000 0.81250000)  
##       14) pop_density< 1.5 2  0 0 (1.00000000 0.00000000) *
##       15) pop_density>=1.5 14  1 1 (0.07142857 0.92857143) *
dtr2009
## n= 137 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 137 26 0 (0.81021898 0.18978102)  
##    2) major_violence_ep< 1.5 120 14 0 (0.88333333 0.11666667)  
##      4) HRP>=5.5 50  2 0 (0.96000000 0.04000000) *
##      5) HRP< 5.5 70 12 0 (0.82857143 0.17142857)  
##       10) GINI>=4.5 39  3 0 (0.92307692 0.07692308) *
##       11) GINI< 4.5 31  9 0 (0.70967742 0.29032258)  
##         22) pop_density< 6.5 25  4 0 (0.84000000 0.16000000) *
##         23) pop_density>=6.5 6  1 1 (0.16666667 0.83333333) *
##    3) major_violence_ep>=1.5 17  5 1 (0.29411765 0.70588235)  
##      6) education>=5.5 3  0 0 (1.00000000 0.00000000) *
##      7) education< 5.5 14  2 1 (0.14285714 0.85714286) *
dtr2010
## n= 138 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 138 28 0 (0.7971014 0.2028986)  
##   2) major_violence_ep< 0.5 119 13 0 (0.8907563 0.1092437) *
##   3) major_violence_ep>=0.5 19  4 1 (0.2105263 0.7894737)  
##     6) pop_density< 1.5 2  0 0 (1.0000000 0.0000000) *
##     7) pop_density>=1.5 17  2 1 (0.1176471 0.8823529) *
dtr2011
## n= 138 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 138 35 0 (0.74637681 0.25362319)  
##    2) major_violence_ep< 1.5 124 22 0 (0.82258065 0.17741935)  
##      4) HRP>=5.5 49  3 0 (0.93877551 0.06122449) *
##      5) HRP< 5.5 75 19 0 (0.74666667 0.25333333)  
##       10) unemployment< 5.5 53 10 0 (0.81132075 0.18867925)  
##         20) pop_density< 6.5 38  3 0 (0.92105263 0.07894737) *
##         21) pop_density>=6.5 15  7 0 (0.53333333 0.46666667)  
##           42) GINI>=5.5 6  1 0 (0.83333333 0.16666667) *
##           43) GINI< 5.5 9  3 1 (0.33333333 0.66666667)  
##             86) HDI< 2.5 2  0 0 (1.00000000 0.00000000) *
##             87) HDI>=2.5 7  1 1 (0.14285714 0.85714286) *
##       11) unemployment>=5.5 22  9 0 (0.59090909 0.40909091)  
##         22) civL>=2.5 19  6 0 (0.68421053 0.31578947) *
##         23) civL< 2.5 3  0 1 (0.00000000 1.00000000) *
##    3) major_violence_ep>=1.5 14  1 1 (0.07142857 0.92857143) *
dtr2012
## n= 136 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 136 41 0 (0.6985294 0.3014706)  
##    2) major_violence_ep< 0.5 118 25 0 (0.7881356 0.2118644)  
##      4) HRP>=2.5 104 18 0 (0.8269231 0.1730769) *
##      5) HRP< 2.5 14  7 0 (0.5000000 0.5000000)  
##       10) HDI>=4.5 4  0 0 (1.0000000 0.0000000) *
##       11) HDI< 4.5 10  3 1 (0.3000000 0.7000000) *
##    3) major_violence_ep>=0.5 18  2 1 (0.1111111 0.8888889) *
dtr2013
## n= 136 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 136 41 0 (0.69852941 0.30147059)  
##    2) HRP>=2.5 110 21 0 (0.80909091 0.19090909)  
##      4) HRP>=4.5 70  7 0 (0.90000000 0.10000000) *
##      5) HRP< 4.5 40 14 0 (0.65000000 0.35000000)  
##       10) civL>=3.5 29  7 0 (0.75862069 0.24137931)  
##         20) pop_density< 6.5 22  2 0 (0.90909091 0.09090909) *
##         21) pop_density>=6.5 7  2 1 (0.28571429 0.71428571) *
##       11) civL< 3.5 11  4 1 (0.36363636 0.63636364)  
##         22) pop_density>=5.5 4  0 0 (1.00000000 0.00000000) *
##         23) pop_density< 5.5 7  0 1 (0.00000000 1.00000000) *
##    3) HRP< 2.5 26  6 1 (0.23076923 0.76923077)  
##      6) GDP_gr< 2.5 3  0 0 (1.00000000 0.00000000) *
##      7) GDP_gr>=2.5 23  3 1 (0.13043478 0.86956522)  
##       14) GDP_gr>=7.5 2  0 0 (1.00000000 0.00000000) *
##       15) GDP_gr< 7.5 21  1 1 (0.04761905 0.95238095) *

This robustness check also confirmed the conclusions of the original analysis.

What if…?

What would happen if only the two strongest and most consistent predictors of terrorism, protection of human rights (HRP) and major episodes of political violence (major_violence_ep), were used in the prediction?

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart.plot", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]
data2$GDP <- as.numeric(cut2(data2$GDP, g=8))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=8))
data2$HDI <- as.numeric(cut2(data2$HDI, g=8))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=8))
data2$durable  <- as.numeric(cut2(data2$durable, g=8))
data2$HRP <- as.numeric(cut2(data2$HRP, g=8))
# major_violence_ep, polR and civL keep their original coding (no binning)
data2$GINI  <- as.numeric(cut2(data2$GINI, g=8))
data2$education <- as.numeric(cut2(data2$education, g=8))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=8))
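
The categorization above turns each quantitative variable into eight ordered groups of roughly equal size. A minimal base-R sketch of the same idea is given here, assuming a toy vector `x` (illustrative values only) and using `quantile()` with `cut()` as an approximate stand-in for `Hmisc::cut2(x, g = 8)`; the interval boundaries may be handled slightly differently by `cut2`.

```r
# Toy stand-in for one explanatory variable (values are illustrative only)
set.seed(1)
x <- sample(1:80)

# Octile cut points: nine boundaries delimit eight groups
breaks <- quantile(x, probs = seq(0, 1, length.out = 9))

# Assign each observation to its group and keep only the group number (1 to 8)
bins <- as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))

# Number of observations per group
table(bins)
```

After this step the variable no longer carries its original units; a split such as `HRP>=2.5` in the printed trees therefore separates group 2 and below from group 3 and above.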

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
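
The per-year subsets above can also be produced in one step. The sketch below assumes a small hypothetical data frame `toy` (the country labels and outcome values are made up) and uses base-R `split()`; the last year is then set aside as the test set, mirroring the use of 2014 later in this tab.

```r
# Hypothetical country-year data (values are illustrative only)
toy <- data.frame(
  country = rep(c("A", "B", "C"), times = 3),
  year    = rep(c(2012, 2013, 2014), each = 3),
  dom_a   = c(0, 1, 0, 0, 1, 1, 0, 0, 1)
)

# One data frame per year, stored in a list named by year
by_year <- split(toy, toy$year)

# Earlier years are used for training, the final year for testing
train_years <- by_year[names(by_year) != "2014"]
test_set    <- by_year[["2014"]]

names(train_years)
nrow(test_set)
```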

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2013, method = "class", control = rpctrl)
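
The repeated calls above fit one tree per year with identical settings. A compact sketch of the same pattern follows, assuming synthetic data (`toy`, with an outcome rule invented for illustration) and a smaller number of cross-validation folds (`xval = 10`) to keep it quick; the list name `trees` is hypothetical and does not appear in the original analysis.

```r
library(rpart)

# Synthetic data for two years (illustrative only, not the study data)
set.seed(619)
n <- 60
toy <- data.frame(
  year = rep(c(2001, 2002), each = n),
  HRP = sample(1:8, 2 * n, replace = TRUE),
  major_violence_ep = sample(0:3, 2 * n, replace = TRUE)
)
# Invented outcome rule so that both classes are present
toy$dom_a <- as.integer(toy$HRP <= 2 | toy$major_violence_ep >= 2)

# Same kind of control object as above, with fewer folds
rpctrl_toy <- rpart.control(minsplit = 6, maxcompete = 1, xval = 10, cp = .025)

# Fit one classification tree per year, resetting the seed before each fit
trees <- lapply(split(toy, toy$year), function(d) {
  set.seed(619)
  rpart(as.factor(dom_a) ~ HRP + major_violence_ep,
        data = d, method = "class", control = rpctrl_toy)
})

names(trees)
```

Keeping the trees in a named list makes later steps (optimization, prediction, variable importance) expressible with `lapply()` instead of one line per year.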

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the predictive performance of the model would not be included. This value served as the starting point for estimating the optimal complexity parameter, shown below.
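
To illustrate the role of cp described above: in `rpart`, a split is not attempted unless it decreases the overall lack of fit by at least a factor of cp, so a larger cp can only yield an equal or smaller tree. The sketch below assumes the `kyphosis` example data shipped with `rpart` (not the study data) and compares a permissive threshold with the one used here.

```r
library(rpart)

# Permissive threshold: almost any improvement justifies a split
loose <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
               control = rpart.control(minsplit = 6, cp = 0.001, xval = 0))

# Threshold used in this analysis: weak splits are dropped
strict <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
                control = rpart.control(minsplit = 6, cp = 0.025, xval = 0))

# Number of nodes (internal and terminal) in each tree
c(loose = nrow(loose$frame), strict = nrow(strict$frame))
```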

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.03703704
optimal_complexity(dtr2002) 
## [1] 0.025
optimal_complexity(dtr2003) 
## [1] 0.08333333
optimal_complexity(dtr2004)
## [1] 0.025
optimal_complexity(dtr2005)
## [1] 0.2692308
optimal_complexity(dtr2006)
## [1] 0.025
optimal_complexity(dtr2007)
## [1] 0.025
optimal_complexity(dtr2008)
## [1] 0.025
optimal_complexity(dtr2009)
## [1] 0.025
optimal_complexity(dtr2010)
## [1] 0.025
optimal_complexity(dtr2011)
## [1] 0.025
optimal_complexity(dtr2012)
## [1] 0.025
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.125
optimal_complexity(ttr2002)
## [1] 0.02702703
optimal_complexity(ttr2003)
## [1] 0.025
optimal_complexity(ttr2004)
## [1] 0.025
optimal_complexity(ttr2005)
## [1] 0.025
optimal_complexity(ttr2006)
## [1] 0.025
optimal_complexity(ttr2007)
## [1] 0.025
optimal_complexity(ttr2008)
## [1] 0.025
optimal_complexity(ttr2009)
## [1] 0.025
optimal_complexity(ttr2010)
## [1] 0.025
optimal_complexity(ttr2011)
## [1] 0.025
optimal_complexity(ttr2012)
## [1] 0.025
optimal_complexity(ttr2013)
## [1] 0.025

The decision trees were then retrained with the complexity parameters set to the values obtained above.
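
The selection rule used above picks the cp value of the row in the tree's `cptable` with the lowest cross-validated error (`xerror`). A minimal sketch is given here, assuming the `kyphosis` example data shipped with `rpart`; it applies the selected value with `prune()`, which is an alternative to retraining with a hand-typed cp and is not the procedure used in this analysis.

```r
library(rpart)

# Same selection rule as defined above
optimal_complexity <- function(x) {
  x$cptable[which.min(x$cptable[, "xerror"]), "CP"]
}

# Example tree on illustrative data, with 10-fold cross-validation
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 10, cp = 0.025))

# The cp table lists, per tree size, the training and cross-validated error
fit$cptable

# Select the cp with the lowest cross-validated error and prune to it
cp_opt <- optimal_complexity(fit)
pruned <- prune(fit, cp = cp_opt)

c(before = nrow(fit$frame), after = nrow(pruned$frame))
```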

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .037))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .083))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
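# optimal_complexity(dtr2005) returned 0.2692308; cp = .026 is the value the tree reported under "Location of splits" was fitted with
dtr2005 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .026))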
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .125))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .027))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 101  20
##          1   1  17
##                                          
##                Accuracy : 0.8489         
##                  95% CI : (0.7784, 0.904)
##     No Information Rate : 0.7338         
##     P-Value [Acc > NIR] : 0.0008782      
##                                          
##                   Kappa : 0.5376         
##  Mcnemar's Test P-Value : 8.568e-05      
##                                          
##             Sensitivity : 0.9902         
##             Specificity : 0.4595         
##          Pos Pred Value : 0.8347         
##          Neg Pred Value : 0.9444         
##              Prevalence : 0.7338         
##          Detection Rate : 0.7266         
##    Detection Prevalence : 0.8705         
##       Balanced Accuracy : 0.7248         
##                                          
##        'Positive' Class : 0              
## 
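
The ensemble step above amounts to a majority vote. When factors are combined with `cbind()`, R stores their integer codes (1 for the first level, here class 0, and 2 for the second level, class 1), so a row mean below 1.5 means that most trees predicted class 0. A small sketch with three hypothetical prediction vectors (`p1` to `p3`, values made up) illustrates this.

```r
# Three hypothetical sets of class predictions for four countries
lv <- c("0", "1")
p1 <- factor(c(0, 0, 1, 1), levels = lv)
p2 <- factor(c(0, 1, 1, 0), levels = lv)
p3 <- factor(c(0, 1, 1, 1), levels = lv)

# cbind() keeps the integer codes: 1 = class 0, 2 = class 1
votes <- cbind(p1, p2, p3)

# Mean code per country; below 1.5 means the majority voted for class 0
share <- rowMeans(votes)
ens <- ifelse(share < 1.5, 0, 1)

ens
```

With an odd number of trees (13 in this analysis) the row mean can never equal exactly 1.5, so ties do not occur.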
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 88 31
##          1  6 14
##                                           
##                Accuracy : 0.7338          
##                  95% CI : (0.6522, 0.8051)
##     No Information Rate : 0.6763          
##     P-Value [Acc > NIR] : 0.08535         
##                                           
##                   Kappa : 0.2891          
##  Mcnemar's Test P-Value : 7.961e-05       
##                                           
##             Sensitivity : 0.9362          
##             Specificity : 0.3111          
##          Pos Pred Value : 0.7395          
##          Neg Pred Value : 0.7000          
##              Prevalence : 0.6763          
##          Detection Rate : 0.6331          
##    Detection Prevalence : 0.8561          
##       Balanced Accuracy : 0.6236          
##                                           
##        'Positive' Class : 0               
## 
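
The main statistics in the output above can be recomputed directly from the four cell counts. The sketch below re-enters the transnational confusion matrix by hand; because the 'Positive' class is 0, sensitivity refers to correctly predicted non-incidence and specificity to correctly predicted incidence.

```r
# Transnational confusion matrix from the output above
# (rows = prediction, columns = reference)
cm <- matrix(c(88, 6, 31, 14), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)            # correct predictions / all cases
sensitivity <- cm["0", "0"] / sum(cm[, "0"])      # share of true 0s predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])      # share of true 1s predicted as 1
balanced    <- (sensitivity + specificity) / 2    # balanced accuracy

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced = balanced), 4)
```

The low specificity shows that the two-predictor ensemble misses most countries that did experience transnational incidents, even though overall accuracy stays above the no-information rate.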

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates (actual tree)

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
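
Each importance table above is built by joining 13 per-year data frames and melting the result into long format for plotting. A base-R sketch of that reshaping is shown here, assuming two hypothetical per-year tables (the importance values are made up); it stacks the tables directly instead of using `join_all()` and `melt()`.

```r
# Hypothetical per-year importance tables (values are illustrative only)
vimp_2001 <- data.frame(variable = c("HRP", "major_violence_ep"),
                        Overall = c(4.2, 9.1))
vimp_2002 <- data.frame(variable = c("HRP", "major_violence_ep"),
                        Overall = c(7.5, 3.3))
tables <- list(`2001` = vimp_2001, `2002` = vimp_2002)

# Stack the tables into long format: one row per variable and year
long <- do.call(rbind, lapply(names(tables), function(y) {
  data.frame(variable = as.character(tables[[y]]$variable),
             year = y,
             value = tables[[y]]$Overall)
}))

long
```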

Location of splits

dtr2001
## n= 152 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 152 27 0 (0.8223684 0.1776316)  
##   2) major_violence_ep< 0.5 128 11 0 (0.9140625 0.0859375) *
##   3) major_violence_ep>=0.5 24  8 1 (0.3333333 0.6666667)  
##     6) HRP>=1.5 5  2 0 (0.6000000 0.4000000) *
##     7) HRP< 1.5 19  5 1 (0.2631579 0.7368421) *
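
Each line of a printout like the one above gives the node number, the split rule, the number of observations in the node (n), the number of observations that do not belong to the predicted class (loss), the predicted class (yval) and the class probabilities (yprob). As a short check, the probabilities of nodes 2 and 3 of `dtr2001` can be recomputed from n and loss; the values are copied by hand from the printout.

```r
# Nodes 2 and 3 of dtr2001, copied from the printout above
node <- data.frame(id = c(2, 3), n = c(128, 24), loss = c(11, 8), yval = c(0, 1))

# Share of observations in the node that belong to the predicted class
p_pred <- 1 - node$loss / node$n

# Probabilities of class 0 and class 1, in the order used by the printout
yprob0 <- ifelse(node$yval == 0, p_pred, 1 - p_pred)
yprob1 <- 1 - yprob0

cbind(node, yprob0 = round(yprob0, 7), yprob1 = round(yprob1, 7))
```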
dtr2002
## n= 153 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 153 31 0 (0.79738562 0.20261438)  
##    2) HRP>=2.5 116  9 0 (0.92241379 0.07758621) *
##    3) HRP< 2.5 37 15 1 (0.40540541 0.59459459)  
##      6) major_violence_ep< 0.5 17  7 0 (0.58823529 0.41176471) *
##      7) major_violence_ep>=0.5 20  5 1 (0.25000000 0.75000000)  
##       14) major_violence_ep>=5.5 3  1 0 (0.66666667 0.33333333) *
##       15) major_violence_ep< 5.5 17  3 1 (0.17647059 0.82352941) *
dtr2003
## n= 153 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 153 24 0 (0.84313725 0.15686275)  
##   2) major_violence_ep< 0.5 130  8 0 (0.93846154 0.06153846) *
##   3) major_violence_ep>=0.5 23  7 1 (0.30434783 0.69565217)  
##     6) HRP>=4 2  0 0 (1.00000000 0.00000000) *
##     7) HRP< 4 21  5 1 (0.23809524 0.76190476) *
dtr2004
## n= 154 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 154 25 0 (0.83766234 0.16233766)  
##   2) major_violence_ep< 0.5 129  9 0 (0.93023256 0.06976744)  
##     4) HRP>=1.5 124  6 0 (0.95161290 0.04838710) *
##     5) HRP< 1.5 5  2 1 (0.40000000 0.60000000) *
##   3) major_violence_ep>=0.5 25  9 1 (0.36000000 0.64000000)  
##     6) HRP>=2.5 2  0 0 (1.00000000 0.00000000) *
##     7) HRP< 2.5 23  7 1 (0.30434783 0.69565217) *
dtr2005
## n= 155 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 155 26 0 (0.83225806 0.16774194)  
##    2) HRP>=1.5 132 11 0 (0.91666667 0.08333333)  
##      4) major_violence_ep< 0.5 124  7 0 (0.94354839 0.05645161) *
##      5) major_violence_ep>=0.5 8  4 0 (0.50000000 0.50000000)  
##       10) HRP>=2.5 2  0 0 (1.00000000 0.00000000) *
##       11) HRP< 2.5 6  2 1 (0.33333333 0.66666667) *
##    3) HRP< 1.5 23  8 1 (0.34782609 0.65217391)  
##      6) major_violence_ep< 4.5 18  8 1 (0.44444444 0.55555556)  
##       12) major_violence_ep>=0.5 12  5 0 (0.58333333 0.41666667) *
##       13) major_violence_ep< 0.5 6  1 1 (0.16666667 0.83333333) *
##      7) major_violence_ep>=4.5 5  0 1 (0.00000000 1.00000000) *
dtr2006
## n= 156 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 156 29 0 (0.81410256 0.18589744)  
##   2) major_violence_ep< 1.5 137 12 0 (0.91240876 0.08759124) *
##   3) major_violence_ep>=1.5 19  2 1 (0.10526316 0.89473684) *
dtr2007
## n= 156 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 156 35 0 (0.77564103 0.22435897)  
##   2) HRP>=1.5 134 15 0 (0.88805970 0.11194030)  
##     4) major_violence_ep< 0.5 125  9 0 (0.92800000 0.07200000) *
##     5) major_violence_ep>=0.5 9  3 1 (0.33333333 0.66666667) *
##   3) HRP< 1.5 22  2 1 (0.09090909 0.90909091) *
dtr2008
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 27 0 (0.8085106 0.1914894)  
##   2) major_violence_ep< 1.5 123 14 0 (0.8861789 0.1138211) *
##   3) major_violence_ep>=1.5 18  5 1 (0.2777778 0.7222222) *
dtr2009
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 26 0 (0.8156028 0.1843972)  
##   2) major_violence_ep< 1.5 124 14 0 (0.8870968 0.1129032) *
##   3) major_violence_ep>=1.5 17  5 1 (0.2941176 0.7058824) *
dtr2010
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 27 0 (0.8085106 0.1914894)  
##   2) major_violence_ep< 0.5 122 13 0 (0.8934426 0.1065574) *
##   3) major_violence_ep>=0.5 19  5 1 (0.2631579 0.7368421) *
dtr2011
## n= 141 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 141 35 0 (0.75177305 0.24822695)  
##   2) major_violence_ep< 1.5 127 22 0 (0.82677165 0.17322835) *
##   3) major_violence_ep>=1.5 14  1 1 (0.07142857 0.92857143) *
dtr2012
## n= 139 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 139 39 0 (0.7194245 0.2805755)  
##   2) major_violence_ep< 0.5 121 23 0 (0.8099174 0.1900826) *
##   3) major_violence_ep>=0.5 18  2 1 (0.1111111 0.8888889) *
dtr2013
## n= 139 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 139 39 0 (0.7194245 0.2805755)  
##   2) major_violence_ep< 0.5 120 23 0 (0.8083333 0.1916667) *
##   3) major_violence_ep>=0.5 19  3 1 (0.1578947 0.8421053) *

As this analysis shows, the two strongest predictors perform fairly well on their own, without any additional explanatory variables.

What if…? (2)

What would happen if we excluded the two strongest and most consistent predictors of terrorism (HRP and major_violence_ep) from the prediction?

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "rpart.plot", "plotly"), library, character.only = TRUE)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]
data2$GDP <- as.numeric(cut2(data2$GDP, g=8))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=8))
data2$HDI <- as.numeric(cut2(data2$HDI, g=8))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=8))
data2$durable  <- as.numeric(cut2(data2$durable, g=8))
data2$HRP <- as.numeric(cut2(data2$HRP, g=8))
# major_violence_ep, polR and civL are kept on their original scales
data2$GINI  <- as.numeric(cut2(data2$GINI, g=8))
data2$education <- as.numeric(cut2(data2$education, g=8))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=8))
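
The categorization above turns each continuous indicator into the index of its quantile group. The following is a minimal sketch of that idea in base R, assuming invented values in `x`; it uses `cut()` with `quantile()` rather than `Hmisc::cut2()`, so the handling of interval boundaries may differ slightly from the original analysis.

```r
# Invented values standing in for one continuous indicator (e.g. GDP)
x <- (1:40)^2 / 10

# Break points at the octiles: 0%, 12.5%, ..., 100%
breaks <- quantile(x, probs = seq(0, 1, length.out = 9))

# labels = FALSE returns the bin index (1 to 8) instead of an interval label
bins <- cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)

# Number of observations per group
table(bins)
```

With eight groups, each value is replaced by an integer from 1 to 8, so the trees split on group rank rather than on the raw scale of the variable.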

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
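
The year-by-year subsetting above could also be written more compactly. The following is a minimal sketch, assuming a toy data frame `toy` with invented values in place of `gtdd`; `by_year` is a hypothetical name that does not appear in the original analysis.

```r
# Toy stand-in for gtdd: two countries observed over three years (invented values)
toy <- data.frame(
  year  = rep(2001:2003, each = 2),
  GINI  = c(3, 5, 2, 6, 4, 4),
  dom_a = c(0, 1, 0, 0, 1, 0)
)

# split() returns a named list with one data frame per year
by_year <- split(toy, toy$year)

# The element named "2002" holds the rows that subset(toy, year == 2002) would select
by_year[["2002"]]
```

Keeping the yearly data sets in one list makes it possible to train all trees with a single `lapply()` call instead of one line per year.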

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2013, method = "class", control = rpctrl)
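
The formulas above remove the two strongest predictors by subtracting them from the `.` shorthand. The following is a minimal sketch of how that exclusion works, assuming a toy data frame `toy` with invented values; `f` and `remaining` are hypothetical names used only for illustration.

```r
# Toy data frame with the same kind of columns as the study data (invented values)
toy <- data.frame(
  year              = 2001,
  HRP               = c(1, 3, 2, 4),
  major_violence_ep = c(0, 2, 0, 5),
  GINI              = c(2, 5, 3, 1),
  GDP               = c(4, 1, 3, 2),
  dom_a             = factor(c(0, 1, 0, 1))
)

# "." expands to every column except the response; "- name" then drops a term
f <- dom_a ~ . - year - HRP - major_violence_ep

# Predictors that remain available to the tree
remaining <- attr(terms(f, data = toy), "term.labels")
remaining
```

Only the columns left after the subtraction can be chosen as split variables, which is what restricts these trees to the remaining inequality indicators.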

Optimization

The initial complexity parameter (cp) was set to .025 so that splits which would not contribute substantially to the predictive performance of the model were not attempted. This value served as a starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.09259259
optimal_complexity(dtr2002) 
## [1] 0.03225806
optimal_complexity(dtr2003) 
## [1] 0.05
optimal_complexity(dtr2004)
## [1] 0.05333333
optimal_complexity(dtr2005)
## [1] 0.05769231
optimal_complexity(dtr2006)
## [1] 0.03448276
optimal_complexity(dtr2007)
## [1] 0.02857143
optimal_complexity(dtr2008)
## [1] 0.03703704
optimal_complexity(dtr2009)
## [1] 0.03846154
optimal_complexity(dtr2010)
## [1] 0.06481481
optimal_complexity(dtr2011)
## [1] 0.05714286
optimal_complexity(dtr2012)
## [1] 0.07692308
optimal_complexity(dtr2013)
## [1] 0.05128205
# transnational

optimal_complexity(ttr2001)
## [1] 0.0625
optimal_complexity(ttr2002)
## [1] 0.05405405
optimal_complexity(ttr2003)
## [1] 0.075
optimal_complexity(ttr2004)
## [1] 0.05263158
optimal_complexity(ttr2005)
## [1] 0.03125
optimal_complexity(ttr2006)
## [1] 0.03684211
optimal_complexity(ttr2007)
## [1] 0.06122449
optimal_complexity(ttr2008)
## [1] 0.06410256
optimal_complexity(ttr2009)
## [1] 0.04545455
optimal_complexity(ttr2010)
## [1] 0.0472973
optimal_complexity(ttr2011)
## [1] 0.025
optimal_complexity(ttr2012)
## [1] 0.06756757
optimal_complexity(ttr2013)
## [1] 0.05769231
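
The selection rule used above picks the cp value of the row with the lowest cross-validated error (`xerror`). The following is a minimal sketch on a hand-made table, assuming invented values; `fake_tree` is a hypothetical stand-in for a fitted `rpart` object and only mimics its `cptable` element.

```r
# Hypothetical cptable with the column layout produced by rpart (values invented)
cptable <- matrix(
  c(0.200, 0, 1.00, 1.05, 0.10,
    0.060, 1, 0.80, 0.90, 0.09,
    0.025, 3, 0.68, 0.95, 0.09),
  ncol = 5, byrow = TRUE,
  dimnames = list(NULL, c("CP", "nsplit", "rel error", "xerror", "xstd"))
)
fake_tree <- list(cptable = cptable)

# Same helper as in the analysis
optimal_complexity <- function(x) {
  x$cptable[which.min(x$cptable[, "xerror"]), "CP"]
}

# Returns the CP of the second row, which has the smallest xerror
optimal_complexity(fake_tree)
```

Note that `which.min()` returns the first minimum, so when several rows tie on `xerror` the largest cp (the simplest tree) among them is chosen.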

The trees were then retrained with the complexity parameter of each tree set to its estimated optimum (truncated to three decimal places).

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .092))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .032))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .05))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .053))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .057))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .034))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .028))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .037))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .038))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .064))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .057))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .076))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .051))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .062))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .054))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .075))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .052))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .031))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .036))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .061))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .064))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .045))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .047))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .067))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .057))
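
Instead of refitting each tree with a hand-copied cp value, a fitted tree can also be cut back with `prune()`. The following is a minimal sketch of that alternative, not the procedure used in the analysis; it assumes the `kyphosis` data set that ships with `rpart` as a stand-in for one year of the study data, and `best_cp` and `pruned` are hypothetical names.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for one year of the study data
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 10, cp = .025))

# cp value with the lowest cross-validated error, as in the analysis
best_cp <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]

# prune() trims the fitted tree back to that cp instead of refitting it
pruned <- prune(fit, cp = best_cp)

# The pruned tree can never have more nodes than the original one
nrow(pruned$frame) <= nrow(fit$frame)
```

Passing the cp value programmatically avoids transcription errors that can occur when thirteen values per outcome are typed in by hand.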

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 102  32
##          1   0   5
##                                           
##                Accuracy : 0.7698          
##                  95% CI : (0.6908, 0.8369)
##     No Information Rate : 0.7338          
##     P-Value [Acc > NIR] : 0.195           
##                                           
##                   Kappa : 0.1865          
##  Mcnemar's Test P-Value : 4.251e-08       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.1351          
##          Pos Pred Value : 0.7612          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.7338          
##          Detection Rate : 0.7338          
##    Detection Prevalence : 0.9640          
##       Balanced Accuracy : 0.5676          
##                                           
##        'Positive' Class : 0               
## 
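
The ensembling code above is a majority vote over the yearly trees. The following is a minimal sketch of why the 1.50 threshold works, assuming three hypothetical trees and four countries with invented predictions; `p1`, `p2`, `p3`, `votes` and `ensemble` are illustrative names.

```r
# Class predictions of three hypothetical trees for four countries (invented)
p1 <- factor(c("0", "1", "0", "1"), levels = c("0", "1"))
p2 <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
p3 <- factor(c("0", "1", "0", "0"), levels = c("0", "1"))

# cbind() drops the factor labels and keeps the integer codes: "0" -> 1, "1" -> 2
votes <- cbind(p1, p2, p3)

# A row mean of 1.5 or more means at least half of the trees voted for class "1"
vote_mean <- rowMeans(votes)
ensemble  <- ifelse(vote_mean < 1.50, 0, 1)
ensemble
```

Because the analysis combines thirteen trees, an odd number, the row mean can never equal exactly 1.5 and ties cannot occur.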
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 94 40
##          1  0  5
##                                           
##                Accuracy : 0.7122          
##                  95% CI : (0.6294, 0.7858)
##     No Information Rate : 0.6763          
##     P-Value [Acc > NIR] : 0.2084          
##                                           
##                   Kappa : 0.1446          
##  Mcnemar's Test P-Value : 6.984e-10       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.1111          
##          Pos Pred Value : 0.7015          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.6763          
##          Detection Rate : 0.6763          
##    Detection Prevalence : 0.9640          
##       Balanced Accuracy : 0.5556          
##                                           
##        'Positive' Class : 0               
## 

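The results above suggest that the inequality variables included in these trees generally perform poorly in predicting terrorism. In both cases the accuracy is not significantly higher than the no-information rate (p = 0.195 for domestic and p = 0.2084 for transnational terrorism), and the ensembles correctly identify only 5 of the 37 (domestic) and 5 of the 45 (transnational) observations in class 1. This points to protection of human rights (rather than general inequality) and major episodes of political violence as the only relevant global predictors of terrorism.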
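
The summary statistics in the domestic confusion matrix above can be reproduced by hand from its four cell counts. The following is a minimal sketch of that arithmetic; the variable names are illustrative, and class 0 is treated as the "positive" class, as in the `confusionMatrix()` output.

```r
# Counts taken from the domestic confusion matrix above
# (rows = prediction, columns = reference)
tp <- 102  # predicted 0, reference 0 ("positive" class is 0)
fn <- 0    # predicted 1, reference 0
fp <- 32   # predicted 0, reference 1
tn <- 5    # predicted 1, reference 1
n  <- tp + fn + fp + tn

accuracy    <- (tp + tn) / n
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
balanced    <- (sensitivity + specificity) / 2
nir         <- max(tp + fn, fp + tn) / n  # share of the larger reference class

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, balanced = balanced, nir = nir), 4)
```

The low specificity (5 / 37) shows that nearly all class 1 observations are predicted as class 0, which is why the accuracy stays close to the no-information rate.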

Asia

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "rpart.plot", "plotly"), library, character.only = TRUE)

# dataset preparation

data1 <- read_delim("post-robust.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2a <- data1[complete.cases(data1), ]

# continent selection

data2 <- subset(data2a, cont == 1)

# categorization

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
# major_violence_ep: 0 = below 1, 1 = 1 to 2, 2 = 3 to 5, 3 = above 5
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
# polR and civL: 0 = below 3, 1 = 3 to 5, 2 = above 5
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
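
The nested `ifelse()` recoding of `major_violence_ep` above can be checked against an equivalent `cut()` call. The following is a minimal sketch, assuming the scores are whole numbers (for fractional values between 0 and 1 the two versions would differ); `nested` and `banded` are hypothetical names.

```r
# Integer scores covering the range recoded in the analysis
x <- 0:10

# Nested ifelse() as used for major_violence_ep
nested <- ifelse(x < 1, 0, ifelse(x > 5, 3, ifelse(x > 0 & x < 3, 1, 2)))

# Same bands via cut(): (-Inf,0] -> 0, (0,2] -> 1, (2,5] -> 2, (5,Inf] -> 3
banded <- cut(x, breaks = c(-Inf, 0, 2, 5, Inf), labels = FALSE) - 1

# Side-by-side comparison of both recodings
data.frame(x, nested, banded)
all(nested == banded)
```

Listing the break points in one vector makes the band boundaries easier to audit than reading them out of three nested conditions.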

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

gtdd <- data2[, -c(1,2,14)] # domestic
gtdt <- data2[, -c(1,2,13)] # transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpctrl)

Optimization

The initial complexity parameter (cp) was set to .025 so that splits which would not contribute substantially to the predictive performance of the model were not attempted. This value served as a starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.025
optimal_complexity(dtr2002) 
## [1] 0.07142857
optimal_complexity(dtr2003) 
## [1] 0.03846154
optimal_complexity(dtr2004)
## [1] 0.025
optimal_complexity(dtr2005)
## [1] 0.1538462
optimal_complexity(dtr2006)
## [1] 0.07142857
optimal_complexity(dtr2007)
## [1] 0.1111111
optimal_complexity(dtr2008)
## [1] 0.06666667
optimal_complexity(dtr2009)
## [1] 0.025
optimal_complexity(dtr2010)
## [1] 0.025
optimal_complexity(dtr2011)
## [1] 0.1176471
optimal_complexity(dtr2012)
## [1] 0.025
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.25
optimal_complexity(ttr2002)
## [1] 0.09375
optimal_complexity(ttr2003)
## [1] 0.07692308
optimal_complexity(ttr2004)
## [1] 0.06666667
optimal_complexity(ttr2005)
## [1] 0.06666667
optimal_complexity(ttr2006)
## [1] 0.025
optimal_complexity(ttr2007)
## [1] 0.025
optimal_complexity(ttr2008)
## [1] 0.1333333
optimal_complexity(ttr2009)
## [1] 0.2666667
optimal_complexity(ttr2010)
## [1] 0.07142857
optimal_complexity(ttr2011)
## [1] 0.025
optimal_complexity(ttr2012)
## [1] 0.1111111
optimal_complexity(ttr2013)
## [1] 0.125

The trees were then retrained with the complexity parameter of each tree set to its estimated optimum (truncated to three decimal places).

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .071))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .038))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .153))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .071))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .111))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .066))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .117))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))

# transnational 

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .25))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .093))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .076))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .066))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .066))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .133))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .266))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .071))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .111))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic

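# cbind() replaces the factor predictions with their integer codes (1 = class 0, 2 = class 1)
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
# mean code per country across the 13 yearly trees
ensdtr1 <- rowMeans(ensdtr)
# majority vote: a mean code below 1.5 means that most trees predicted class 0
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)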
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 18  6
##          1  1 12
##                                           
##                Accuracy : 0.8108          
##                  95% CI : (0.6484, 0.9204)
##     No Information Rate : 0.5135          
##     P-Value [Acc > NIR] : 0.0001781       
##                                           
##                   Kappa : 0.6186          
##  Mcnemar's Test P-Value : 0.1305700       
##                                           
##             Sensitivity : 0.9474          
##             Specificity : 0.6667          
##          Pos Pred Value : 0.7500          
##          Neg Pred Value : 0.9231          
##              Prevalence : 0.5135          
##          Detection Rate : 0.4865          
##    Detection Prevalence : 0.6486          
##       Balanced Accuracy : 0.8070          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 12 11
##          1  3 11
##                                           
##                Accuracy : 0.6216          
##                  95% CI : (0.4476, 0.7754)
##     No Information Rate : 0.5946          
##     P-Value [Acc > NIR] : 0.43777         
##                                           
##                   Kappa : 0.2765          
##  Mcnemar's Test P-Value : 0.06137         
##                                           
##             Sensitivity : 0.8000          
##             Specificity : 0.5000          
##          Pos Pred Value : 0.5217          
##          Neg Pred Value : 0.7857          
##              Prevalence : 0.4054          
##          Detection Rate : 0.3243          
##    Detection Prevalence : 0.6216          
##       Balanced Accuracy : 0.6500          
##                                           
##        'Positive' Class : 0               
## 
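
The `< 1.50` threshold used in the ensembling step works because cbind() replaces factor predictions with their internal integer codes (1 for class 0, 2 for class 1), so a row mean below 1.5 means that most trees voted for class 0. A minimal sketch with three hypothetical toy predictions (p1, p2 and p3 are made-up vectors, not output of the models above) illustrates the idea:

```r
# three toy "trees" voting on four cases (hypothetical values, for illustration only)
p1 <- factor(c(0, 1, 1, 0), levels = c(0, 1))
p2 <- factor(c(0, 1, 0, 0), levels = c(0, 1))
p3 <- factor(c(1, 1, 0, 0), levels = c(0, 1))

# cbind() drops the factor class and keeps the integer codes: 1 = class 0, 2 = class 1
votes <- cbind(p1, p2, p3)

# mean code per case; below 1.5 means the majority voted for class 0
mean_code <- rowMeans(votes)
ens <- ifelse(mean_code < 1.50, 0, 1)
ens
```

With an odd number of trees, as with the 13 yearly trees used here, the mean code can never equal exactly 1.5, so ties do not occur.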

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
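
The join-and-reshape step used for the importance plots (join_all() followed by melt()) can be illustrated in base R on toy data. This is only a sketch: the importance values, the data frame names and the `y2001`/`y2002` column names are hypothetical and chosen for illustration.

```r
# toy importance tables for two years (made-up numbers)
imp2001 <- data.frame(variable = c("HRP", "major_violence_ep", "GDP"),
                      y2001 = c(5.1, 7.3, 0.0))
imp2002 <- data.frame(variable = c("HRP", "major_violence_ep"),
                      y2002 = c(4.2, 6.8))

# left join on the variable name: rows of the first table are kept,
# variables missing from a later year get NA
wide <- Reduce(function(a, b) merge(a, b, by = "variable", all.x = TRUE),
               list(imp2001, imp2002))

# wide -> long: one row per variable-year pair, as needed for plotting
year_cols <- names(wide)[-1]
long <- data.frame(
  variable = rep(wide$variable, times = length(year_cols)),
  year     = rep(sub("^y", "", year_cols), each = nrow(wide)),
  value    = unlist(wide[year_cols], use.names = FALSE)
)
long
```

Because the join is a left join, any variable absent from the first table would be dropped from the combined table, which is worth keeping in mind when reading the plots.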

Location of splits

dtr2001
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 34 12 0 (0.6470588 0.3529412)  
##   2) major_violence_ep< 0.5 25  3 0 (0.8800000 0.1200000) *
##   3) major_violence_ep>=0.5 9  0 1 (0.0000000 1.0000000) *
dtr2002
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 34 14 0 (0.5882353 0.4117647)  
##    2) major_violence_ep< 0.5 25  5 0 (0.8000000 0.2000000)  
##      4) education>=2.5 14  0 0 (1.0000000 0.0000000) *
##      5) education< 2.5 11  5 0 (0.5454545 0.4545455)  
##       10) GDP< 2.5 7  2 0 (0.7142857 0.2857143) *
##       11) GDP>=2.5 4  1 1 (0.2500000 0.7500000) *
##    3) major_violence_ep>=0.5 9  0 1 (0.0000000 1.0000000) *
dtr2003
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 34 13 0 (0.61764706 0.38235294)  
##    2) major_violence_ep< 0.5 23  2 0 (0.91304348 0.08695652)  
##      4) HRP>=3.5 16  0 0 (1.00000000 0.00000000) *
##      5) HRP< 3.5 7  2 0 (0.71428571 0.28571429)  
##       10) pop_density< 2.5 4  0 0 (1.00000000 0.00000000) *
##       11) pop_density>=2.5 3  1 1 (0.33333333 0.66666667) *
##    3) major_violence_ep>=0.5 11  0 1 (0.00000000 1.00000000) *
dtr2004
## n= 35 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 35 16 0 (0.54285714 0.45714286)  
##   2) HRP>=2.5 20  2 0 (0.90000000 0.10000000)  
##     4) major_violence_ep< 0.5 17  0 0 (1.00000000 0.00000000) *
##     5) major_violence_ep>=0.5 3  1 1 (0.33333333 0.66666667) *
##   3) HRP< 2.5 15  1 1 (0.06666667 0.93333333) *
dtr2005
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 13 0 (0.6486486 0.3513514)  
##   2) HRP>=2.5 20  1 0 (0.9500000 0.0500000) *
##   3) HRP< 2.5 17  5 1 (0.2941176 0.7058824)  
##     6) civL>=1.5 4  1 0 (0.7500000 0.2500000) *
##     7) civL< 1.5 13  2 1 (0.1538462 0.8461538) *
dtr2006
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 14 0 (0.62162162 0.37837838)  
##    2) HRP>=2.5 21  1 0 (0.95238095 0.04761905) *
##    3) HRP< 2.5 16  3 1 (0.18750000 0.81250000)  
##      6) GDP>=2.5 6  3 0 (0.50000000 0.50000000)  
##       12) GDP< 3.5 4  1 0 (0.75000000 0.25000000) *
##       13) GDP>=3.5 2  0 1 (0.00000000 1.00000000) *
##      7) GDP< 2.5 10  0 1 (0.00000000 1.00000000) *
dtr2007
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 18 0 (0.51351351 0.48648649)  
##   2) HRP>=2.5 20  3 0 (0.85000000 0.15000000)  
##     4) durable>=1.5 18  1 0 (0.94444444 0.05555556) *
##     5) durable< 1.5 2  0 1 (0.00000000 1.00000000) *
##   3) HRP< 2.5 17  2 1 (0.11764706 0.88235294) *
dtr2008
## n= 36 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 36 15 0 (0.5833333 0.4166667)  
##   2) HRP>=2.5 20  2 0 (0.9000000 0.1000000) *
##   3) HRP< 2.5 16  3 1 (0.1875000 0.8125000)  
##     6) polR>=1.5 5  2 0 (0.6000000 0.4000000) *
##     7) polR< 1.5 11  0 1 (0.0000000 1.0000000) *
dtr2009
## n= 36 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 36 12 0 (0.66666667 0.33333333)  
##   2) HRP>=2.5 20  1 0 (0.95000000 0.05000000) *
##   3) HRP< 2.5 16  5 1 (0.31250000 0.68750000)  
##     6) HDI>=3.5 4  0 0 (1.00000000 0.00000000) *
##     7) HDI< 3.5 12  1 1 (0.08333333 0.91666667) *
dtr2010
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 13 0 (0.64864865 0.35135135)  
##   2) major_violence_ep< 0.5 26  3 0 (0.88461538 0.11538462)  
##     4) HDI>=1.5 23  1 0 (0.95652174 0.04347826) *
##     5) HDI< 1.5 3  1 1 (0.33333333 0.66666667) *
##   3) major_violence_ep>=0.5 11  1 1 (0.09090909 0.90909091) *
dtr2011
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 17 0 (0.5405405 0.4594595)  
##    2) major_violence_ep< 0.5 26  6 0 (0.7692308 0.2307692)  
##      4) education>=3.5 15  0 0 (1.0000000 0.0000000) *
##      5) education< 3.5 11  5 1 (0.4545455 0.5454545)  
##       10) unemployment< 2.5 7  2 0 (0.7142857 0.2857143) *
##       11) unemployment>=2.5 4  0 1 (0.0000000 1.0000000) *
##    3) major_violence_ep>=0.5 11  0 1 (0.0000000 1.0000000) *
dtr2012
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 16 0 (0.5675676 0.4324324)  
##    2) major_violence_ep< 0.5 26  5 0 (0.8076923 0.1923077)  
##      4) pop_density< 3.5 16  0 0 (1.0000000 0.0000000) *
##      5) pop_density>=3.5 10  5 0 (0.5000000 0.5000000)  
##       10) durable>=3.5 4  0 0 (1.0000000 0.0000000) *
##       11) durable< 3.5 6  1 1 (0.1666667 0.8333333) *
##    3) major_violence_ep>=0.5 11  0 1 (0.0000000 1.0000000) *
dtr2013
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 18 0 (0.5135135 0.4864865)  
##    2) major_violence_ep< 0.5 26  7 0 (0.7307692 0.2692308)  
##      4) pop_density< 3.5 16  1 0 (0.9375000 0.0625000) *
##      5) pop_density>=3.5 10  4 1 (0.4000000 0.6000000)  
##       10) durable>=4 4  0 0 (1.0000000 0.0000000) *
##       11) durable< 4 6  0 1 (0.0000000 1.0000000) *
##    3) major_violence_ep>=0.5 11  0 1 (0.0000000 1.0000000) *

Generally, the analysis of Asian countries produced results similar to the previous global analyses: protection of human rights (HRP) and major episodes of political violence (major_violence_ep) emerged as the dominant predictors of domestic terrorism. For transnational terrorism, the accuracy of the ensemble on the 2014 test set (0.62) did not significantly exceed the no-information rate (0.59; p = 0.44), so this analysis yielded no significant results.
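
The statement about transnational terrorism can be checked against the confusion matrix reported in the final evaluation above. The sketch below recomputes accuracy and the no-information rate from the printed counts, assuming that the "P-Value [Acc > NIR]" line comes from a one-sided exact binomial test of accuracy against the no-information rate; the object names are illustrative.

```r
# transnational confusion matrix as printed above (rows = prediction, columns = reference)
cm <- matrix(c(12, 3, 11, 11), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n        # share of correctly classified countries
nir      <- max(colSums(cm)) / n     # accuracy of always predicting the larger class

# one-sided exact binomial test: is accuracy greater than the no-information rate?
p_value <- binom.test(sum(diag(cm)), n, p = nir, alternative = "greater")$p.value

round(c(accuracy = accuracy, nir = nir), 4)
p_value
```

The same calculation applied to the domestic confusion matrix uses 30 correct classifications out of 37 and a no-information rate of 19/37.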

What if…? Asia

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart.plot", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-robust.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2a <- data1[complete.cases(data1), ]

# continent selection

data2 <- subset(data2a, cont == 1)

# categorization

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
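
The nested ifelse() calls above can be hard to read. The following sketch traces the recoding of major_violence_ep on a hypothetical score vector (0 to 7, assumed here only for illustration) and compares it with a more compact base-R formulation using findInterval():

```r
# hypothetical raw scores, for illustration only
x <- 0:7

# recoding as written in the analysis
nested <- ifelse(x < 1, 0,
          ifelse(x > 5, 3,
          ifelse(x > 0 & x < 3, 1, 2)))

# equivalent compact form: 0 below 1, 1 in [1, 3), 2 in [3, 5], 3 above 5
compact <- findInterval(x, c(1, 3)) + (x > 5)

data.frame(raw = x, nested = nested, compact = compact)
```

Both versions map scores below 1 to category 0, scores from 1 up to (but not including) 3 to category 1, scores from 3 to 5 to category 2, and scores above 5 to category 3.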

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

gtdd <- data2[, -c(1,2,14)] # domestic
gtdt <- data2[, -c(1,2,13)] # transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
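
The yearly subsets above are created one object at a time. As an illustration of the same step, the sketch below uses split() on a small made-up data frame (the `toy` data, its values and the list names are hypothetical) to obtain one data frame per year in a single call:

```r
# toy country-year data (made-up values)
toy <- data.frame(
  country = rep(c("A", "B", "C"), times = 3),
  year    = rep(2001:2003, each = 3),
  dom_a   = c(0, 1, 0, 0, 1, 1, 1, 1, 0)
)

# one data frame per year, stored in a named list
by_year <- split(toy, toy$year)

# earlier years serve as training sets, the last year as the test set
train_sets <- by_year[as.character(2001:2002)]
test_set   <- by_year[["2003"]]

names(by_year)
```

Keeping the subsets in a list makes it possible to fit and evaluate the yearly models with lapply() instead of repeating each call.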

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2013, method = "class", control = rpctrl)

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the predictive performance of the model would not enter the tree. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter:
# returns the CP value from the cptable row with the lowest cross-validated error (xerror)

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.025
optimal_complexity(dtr2002) 
## [1] 0.025
optimal_complexity(dtr2003) 
## [1] 0.025
optimal_complexity(dtr2004)
## [1] 0.025
optimal_complexity(dtr2005)
## [1] 0.025
optimal_complexity(dtr2006)
## [1] 0.025
optimal_complexity(dtr2007)
## [1] 0.025
optimal_complexity(dtr2008)
## [1] 0.06666667
optimal_complexity(dtr2009)
## [1] 0.025
optimal_complexity(dtr2010)
## [1] 0.025
optimal_complexity(dtr2011)
## [1] 0.025
optimal_complexity(dtr2012)
## [1] 0.025
optimal_complexity(dtr2013)
## [1] 0.02777778
# transnational

optimal_complexity(ttr2001)
## [1] 0.025
optimal_complexity(ttr2002)
## [1] 0.025
optimal_complexity(ttr2003)
## [1] 0.025
optimal_complexity(ttr2004)
## [1] 0.06666667
optimal_complexity(ttr2005)
## [1] 0.025
optimal_complexity(ttr2006)
## [1] 0.025
optimal_complexity(ttr2007)
## [1] 0.025
optimal_complexity(ttr2008)
## [1] 0.025
optimal_complexity(ttr2009)
## [1] 0.2666667
optimal_complexity(ttr2010)
## [1] 0.025
optimal_complexity(ttr2011)
## [1] 0.025
optimal_complexity(ttr2012)
## [1] 0.025
optimal_complexity(ttr2013)
## [1] 0.03125

The decision trees were then refitted using the complexity parameters obtained above.
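
As an alternative to retyping each value by hand, the selected cp could be passed to rpart's prune() function. This is not what the analysis does; it is only a sketch of the same idea, shown on the kyphosis example dataset bundled with rpart rather than on the terrorism data.

```r
library(rpart)  # recommended package shipped with standard R installations

# same helper as defined above
optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

set.seed(619)
# kyphosis is an example dataset from rpart, used here for illustration only
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 50, cp = .025))

# cp with the lowest cross-validated error, taken directly from the cptable
best_cp <- optimal_complexity(fit)

# prune the fitted tree back to that complexity instead of refitting it
pruned <- prune(fit, cp = best_cp)
printcp(pruned)
```

Taking the value directly from the cptable avoids rounding or truncating it when it is copied by hand.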

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .066))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~HRP + major_violence_ep, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .027))

# transnational 

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .066))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .026))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~HRP + major_violence_ep, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .031))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic

ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 19  7
##          1  0 11
##                                           
##                Accuracy : 0.8108          
##                  95% CI : (0.6484, 0.9204)
##     No Information Rate : 0.5135          
##     P-Value [Acc > NIR] : 0.0001781       
##                                           
##                   Kappa : 0.6174          
##  Mcnemar's Test P-Value : 0.0233422       
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.6111          
##          Pos Pred Value : 0.7308          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.5135          
##          Detection Rate : 0.5135          
##    Detection Prevalence : 0.7027          
##       Balanced Accuracy : 0.8056          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 12 11
##          1  3 11
##                                           
##                Accuracy : 0.6216          
##                  95% CI : (0.4476, 0.7754)
##     No Information Rate : 0.5946          
##     P-Value [Acc > NIR] : 0.43777         
##                                           
##                   Kappa : 0.2765          
##  Mcnemar's Test P-Value : 0.06137         
##                                           
##             Sensitivity : 0.8000          
##             Specificity : 0.5000          
##          Pos Pred Value : 0.5217          
##          Neg Pred Value : 0.7857          
##              Prevalence : 0.4054          
##          Detection Rate : 0.3243          
##    Detection Prevalence : 0.6216          
##       Balanced Accuracy : 0.6500          
##                                           
##        'Positive' Class : 0               
## 
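The 1.50 threshold in the ensembling step works because `cbind()` binds factors as their integer codes (1 for level "0", 2 for level "1"). A row mean below 1.5 therefore means that fewer than half of the trees voted for class 1. The sketch below illustrates this with made-up votes from three hypothetical trees (`p1` to `p3` are assumed names, not objects from the analysis).

```r
# Toy predictions from three hypothetical trees for four countries;
# the values are made up for illustration only.
lv <- c("0", "1")
p1 <- factor(c("0", "1", "1", "0"), levels = lv)
p2 <- factor(c("0", "1", "0", "0"), levels = lv)
p3 <- factor(c("1", "1", "0", "0"), levels = lv)

# cbind() on factors binds their integer codes: "0" -> 1, "1" -> 2
votes <- cbind(p1, p2, p3)

# A row mean below 1.5 means fewer than half of the trees voted "1"
vote_mean <- rowMeans(votes)
ens_fin <- ifelse(vote_mean < 1.50, 0, 1)
print(data.frame(votes, vote_mean, ens_fin))
```

With the 13 yearly trees used in the analysis, the row mean equals 1 + k/13 for k votes in favour of class 1, so the ensemble predicts class 1 only when at least 7 trees agree, and ties cannot occur.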

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
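The repeated `varImp()` / `cbind()` / `join_all()` / `melt()` steps all produce one long table with the columns `variable`, `year` and `value`. As a rough illustration of that target shape, the sketch below builds it in base R from hypothetical per-year importance tables. The numbers are invented and the list `imp` is an assumed stand-in for the real `varImp()` results, so this is a swapped-in technique rather than the code used in the analysis.

```r
# Hypothetical per-year importance tables shaped like the output of varImp():
# one "Overall" column, variable names as row names (values are made up).
imp <- list(
  "2001" = data.frame(Overall = c(1.2, 6.4), row.names = c("HRP", "major_violence_ep")),
  "2002" = data.frame(Overall = c(0.8, 5.9), row.names = c("HRP", "major_violence_ep")),
  "2003" = data.frame(Overall = c(2.1, 7.3), row.names = c("HRP", "major_violence_ep"))
)

# Stack the tables into long format: one row per variable-year pair
long <- do.call(rbind, lapply(names(imp), function(yr) {
  data.frame(variable = rownames(imp[[yr]]),
             year     = yr,
             value    = imp[[yr]]$Overall,
             stringsAsFactors = FALSE)
}))
print(long)
```

Keeping the yearly results in a named list means the year labels travel with the data instead of being assigned afterwards through `colnames()`.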

Location of splits

dtr2001
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 34 12 0 (0.6470588 0.3529412)  
##   2) major_violence_ep< 0.5 25  3 0 (0.8800000 0.1200000) *
##   3) major_violence_ep>=0.5 9  0 1 (0.0000000 1.0000000) *
dtr2002
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 34 14 0 (0.5882353 0.4117647)  
##   2) major_violence_ep< 0.5 25  5 0 (0.8000000 0.2000000) *
##   3) major_violence_ep>=0.5 9  0 1 (0.0000000 1.0000000) *
dtr2003
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 34 13 0 (0.61764706 0.38235294)  
##   2) major_violence_ep< 0.5 23  2 0 (0.91304348 0.08695652) *
##   3) major_violence_ep>=0.5 11  0 1 (0.00000000 1.00000000) *
dtr2004
## n= 35 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 35 16 0 (0.54285714 0.45714286)  
##   2) HRP>=2.5 20  2 0 (0.90000000 0.10000000)  
##     4) major_violence_ep< 0.5 17  0 0 (1.00000000 0.00000000) *
##     5) major_violence_ep>=0.5 3  1 1 (0.33333333 0.66666667) *
##   3) HRP< 2.5 15  1 1 (0.06666667 0.93333333) *
dtr2005
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 13 0 (0.6486486 0.3513514)  
##   2) HRP>=2.5 20  1 0 (0.9500000 0.0500000) *
##   3) HRP< 2.5 17  5 1 (0.2941176 0.7058824) *
dtr2006
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 14 0 (0.62162162 0.37837838)  
##   2) HRP>=2.5 21  1 0 (0.95238095 0.04761905) *
##   3) HRP< 2.5 16  3 1 (0.18750000 0.81250000) *
dtr2007
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 18 0 (0.5135135 0.4864865)  
##   2) HRP>=2.5 20  3 0 (0.8500000 0.1500000) *
##   3) HRP< 2.5 17  2 1 (0.1176471 0.8823529) *
dtr2008
## n= 36 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 36 15 0 (0.5833333 0.4166667)  
##   2) HRP>=2.5 20  2 0 (0.9000000 0.1000000) *
##   3) HRP< 2.5 16  3 1 (0.1875000 0.8125000)  
##     6) major_violence_ep< 0.5 5  2 0 (0.6000000 0.4000000) *
##     7) major_violence_ep>=0.5 11  0 1 (0.0000000 1.0000000) *
dtr2009
## n= 36 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 36 12 0 (0.6666667 0.3333333)  
##    2) HRP>=2.5 20  1 0 (0.9500000 0.0500000) *
##    3) HRP< 2.5 16  5 1 (0.3125000 0.6875000)  
##      6) major_violence_ep< 2.5 13  5 1 (0.3846154 0.6153846)  
##       12) HRP< 1.5 7  3 0 (0.5714286 0.4285714)  
##         24) major_violence_ep< 1.5 3  0 0 (1.0000000 0.0000000) *
##         25) major_violence_ep>=1.5 4  1 1 (0.2500000 0.7500000) *
##       13) HRP>=1.5 6  1 1 (0.1666667 0.8333333) *
##      7) major_violence_ep>=2.5 3  0 1 (0.0000000 1.0000000) *
dtr2010
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 13 0 (0.64864865 0.35135135)  
##   2) major_violence_ep< 0.5 26  3 0 (0.88461538 0.11538462) *
##   3) major_violence_ep>=0.5 11  1 1 (0.09090909 0.90909091) *
dtr2011
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 17 0 (0.5405405 0.4594595)  
##   2) major_violence_ep< 0.5 26  6 0 (0.7692308 0.2307692) *
##   3) major_violence_ep>=0.5 11  0 1 (0.0000000 1.0000000) *
dtr2012
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 16 0 (0.5675676 0.4324324)  
##   2) major_violence_ep< 0.5 26  5 0 (0.8076923 0.1923077) *
##   3) major_violence_ep>=0.5 11  0 1 (0.0000000 1.0000000) *
dtr2013
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 18 0 (0.51351351 0.48648649)  
##    2) major_violence_ep< 0.5 26  7 0 (0.73076923 0.26923077)  
##      4) HRP>=3.5 13  1 0 (0.92307692 0.07692308) *
##      5) HRP< 3.5 13  6 0 (0.53846154 0.46153846)  
##       10) HRP>=2.5 10  4 0 (0.60000000 0.40000000) *
##       11) HRP< 2.5 3  1 1 (0.33333333 0.66666667) *
##    3) major_violence_ep>=0.5 11  0 1 (0.00000000 1.00000000) *
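Each node line in the printouts above lists the split, the number of cases (n), the number of cases not belonging to the predicted class (loss), the predicted class (yval) and the class proportions (yprob). The sketch below recomputes the proportions for `dtr2001` from n and loss. It is a worked arithmetic check that only uses numbers copied from the printout; the variable names are introduced here for illustration.

```r
# Fields copied from the dtr2001 printout above
n_root <- 34; loss_root <- 12   # node 1 (root), yval = 0
n_low  <- 25; loss_low  <- 3    # node 2: major_violence_ep <  0.5, yval = 0
n_high <- 9;  loss_high <- 0    # node 3: major_violence_ep >= 0.5, yval = 1

# yprob is ordered (class 0, class 1); loss counts the non-predicted class
p_root <- c(n_root - loss_root, loss_root) / n_root
p_low  <- c(n_low - loss_low, loss_low) / n_low
p_high <- c(loss_high, n_high - loss_high) / n_high

# Training misclassification rate of the two-leaf tree
train_error <- (loss_low + loss_high) / n_root

print(round(rbind(p_root, p_low, p_high), 7))
print(train_error)
```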

Overall, these results suggest that a model restricted to the two strongest predictors predicted about as accurately as a model using the entire set of predictors.

What if…? Asia (2)

What would happen if we excluded the two strongest and most consistent predictors of terrorism from the prediction? Would Asian countries exhibit the same trends as those found globally, or did the global sample introduce noise that may have hidden the true relationship between inequality and terrorism?

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart.plot", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-robust.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2a <- data1[complete.cases(data1), ]

# continent selection

data2 <- subset(data2a, cont == 1)

# categorization

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
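The nested `ifelse()` recodes above are easier to verify when the resulting mapping is printed for a range of inputs. The sketch below wraps the same logic in two hypothetical helper functions (`recode_violence` and `recode_rating` are names introduced here) and applies them to assumed integer scores; the actual value ranges in the dataset may differ.

```r
# Same nested ifelse() logic as in the categorization step, wrapped in
# hypothetical helper functions so the mapping can be inspected directly.
recode_violence <- function(x) {
  ifelse(x < 1, 0, ifelse(x > 5, 3, ifelse(x > 0 & x < 3, 1, 2)))
}
recode_rating <- function(x) {
  ifelse(x < 3, 0, ifelse(x > 5, 2, 1))
}

# Assumed integer inputs; the actual ranges in the data may differ
violence_map <- data.frame(score = 0:7, category = recode_violence(0:7))
rating_map   <- data.frame(score = 1:7, category = recode_rating(1:7))
print(violence_map)
print(rating_map)
```

Under these assumptions, violence scores of 0, 1 to 2, 3 to 5, and 6 or more fall into categories 0 to 3, while ratings of 1 to 2, 3 to 5, and 6 to 7 fall into categories 0 to 2.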

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

gtdd <- data2[, -c(1,2,14)] # domestic
gtdt <- data2[, -c(1,2,13)] # transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
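The year-by-year `subset()` calls can also be expressed with `split()`, which returns one data frame per year in a named list. The sketch below shows the idea on a made-up data frame (`toy` and its values are assumptions), with the last year held out as the test set in the same way 2014 is held out in the analysis.

```r
# Toy stand-in for gtdd (values are made up)
toy <- data.frame(
  year  = rep(2001:2003, each = 2),
  GINI  = c(1, 3, 2, 5, 4, 1),
  dom_a = c(0, 1, 0, 0, 1, 1)
)

# One data frame per year, stored in a named list
by_year <- split(toy, toy$year)

# Earlier years for training, the last year held out for testing
train_years <- by_year[names(by_year) != "2003"]
test_set    <- by_year[["2003"]]

print(names(train_years))
print(test_set)
```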

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2013, method = "class", control = rpctrl)
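Because every yearly tree is trained with the same formula, seed and control object, the training step can be written as a single `lapply()` over the yearly subsets. The sketch below does this on simulated data: `toy`, its two predictors and the rule generating `dom_a` are all assumptions made for illustration, and the outcome is stored as a factor in the data rather than converted inside the formula.

```r
library(rpart)

# Simulated stand-in for the yearly training sets; all values are made up.
set.seed(619)
toy <- data.frame(
  year = rep(2001:2003, each = 60),
  GINI = sample(1:5, 180, replace = TRUE),
  HDI  = sample(1:5, 180, replace = TRUE)
)
toy$dom_a <- factor(as.integer(toy$HDI <= 2))   # outcome driven by HDI only

ctrl <- rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025)

# Fit one classification tree per year with the same control settings
trees <- lapply(split(toy, toy$year), function(d) {
  set.seed(619)   # same seed before every fit, as in the code above
  rpart(dom_a ~ . - year, data = d, method = "class", control = ctrl)
})

print(trees[["2001"]])
```

The resulting list can be passed to `lapply()` or `sapply()` again for the later steps (finding the optimal cp, predicting on the test year), which keeps the per-year objects in one place.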

Optimization

The initial complexity parameter (cp) was set to .025 in order to exclude variables that would not contribute meaningfully to the model's predictive performance. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.08333333
optimal_complexity(dtr2002) 
## [1] 0.1071429
optimal_complexity(dtr2003) 
## [1] 0.3076923
optimal_complexity(dtr2004)
## [1] 0.09375
optimal_complexity(dtr2005)
## [1] 0.2307692
optimal_complexity(dtr2006)
## [1] 0.1428571
optimal_complexity(dtr2007)
## [1] 0.05555556
optimal_complexity(dtr2008)
## [1] 0.1333333
optimal_complexity(dtr2009)
## [1] 0.25
optimal_complexity(dtr2010)
## [1] 0.025
optimal_complexity(dtr2011)
## [1] 0.1176471
optimal_complexity(dtr2012)
## [1] 0.04166667
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.2142857
optimal_complexity(ttr2002)
## [1] 0.25
optimal_complexity(ttr2003)
## [1] 0.1538462
optimal_complexity(ttr2004)
## [1] 0.1333333
optimal_complexity(ttr2005)
## [1] 0.06666667
optimal_complexity(ttr2006)
## [1] 0.2142857
optimal_complexity(ttr2007)
## [1] 0.025
optimal_complexity(ttr2008)
## [1] 0.025
optimal_complexity(ttr2009)
## [1] 0.1333333
optimal_complexity(ttr2010)
## [1] 0.2142857
optimal_complexity(ttr2011)
## [1] 0.05882353
optimal_complexity(ttr2012)
## [1] 0.3333333
optimal_complexity(ttr2013)
## [1] 0.15625

The complexity parameters of the decision trees were then optimized according to the obtained results.
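The `optimal_complexity()` helper reads the `cptable` stored in each fitted tree and returns the CP value of the row with the lowest cross-validated error (`xerror`). The sketch below shows that table and the selection rule on the `kyphosis` data shipped with rpart, used purely as a stand-in dataset. The final `prune()` call is an assumed alternative to refitting with a retyped cp value; it is not the procedure used in this analysis.

```r
library(rpart)

# Built-in example data shipped with rpart, used here only as a stand-in
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(minsplit = 6, xval = 10, cp = .025))

# cptable has one row per candidate tree size; "xerror" is the
# cross-validated error, scaled so that the root-only tree has error 1
print(fit$cptable)

# Same rule as optimal_complexity(): CP of the row with the lowest xerror
best_row <- which.min(fit$cptable[, "xerror"])
best_cp  <- fit$cptable[best_row, "CP"]

# Assumed alternative to refitting: cut the existing tree back to that CP
pruned <- prune(fit, cp = best_cp)
print(pruned)
```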

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .083))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .107))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .307))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .093))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .230))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .142))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .055))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .133))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .25))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .117))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .041))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-HRP-major_violence_ep, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))

# transnational 

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .214))
set.seed(619)
# cp = .025 is used here, although optimal_complexity(ttr2002) above returned 0.25
ttr2002 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .153))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .133))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .066))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .214))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .133))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .214))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .058))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .333))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-HRP-major_violence_ep, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .156))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic

ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 19  8
##          1  0 10
##                                           
##                Accuracy : 0.7838          
##                  95% CI : (0.6179, 0.9017)
##     No Information Rate : 0.5135          
##     P-Value [Acc > NIR] : 0.000667        
##                                           
##                   Kappa : 0.5621          
##  Mcnemar's Test P-Value : 0.013328        
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.5556          
##          Pos Pred Value : 0.7037          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.5135          
##          Detection Rate : 0.5135          
##    Detection Prevalence : 0.7297          
##       Balanced Accuracy : 0.7778          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 12  9
##          1  3 13
##                                           
##                Accuracy : 0.6757          
##                  95% CI : (0.5021, 0.8199)
##     No Information Rate : 0.5946          
##     P-Value [Acc > NIR] : 0.2023          
##                                           
##                   Kappa : 0.3675          
##  Mcnemar's Test P-Value : 0.1489          
##                                           
##             Sensitivity : 0.8000          
##             Specificity : 0.5909          
##          Pos Pred Value : 0.5714          
##          Neg Pred Value : 0.8125          
##              Prevalence : 0.4054          
##          Detection Rate : 0.3243          
##    Detection Prevalence : 0.5676          
##       Balanced Accuracy : 0.6955          
##                                           
##        'Positive' Class : 0               
## 
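The summary statistics reported by `confusionMatrix()` follow directly from the four cell counts. As a worked check, the sketch below recomputes several of them for the domestic confusion matrix above (19, 8, 0, 10), keeping in mind that class 0 is treated as the positive class. The object names are introduced here for illustration.

```r
# Counts copied from the domestic confusion matrix above
# (rows = prediction, columns = reference; positive class = 0)
cm <- matrix(c(19, 0, 8, 10), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

n <- sum(cm)
accuracy    <- sum(diag(cm)) / n                 # (19 + 10) / 37
sensitivity <- cm["0", "0"] / sum(cm[, "0"])     # 19 / 19
specificity <- cm["1", "1"] / sum(cm[, "1"])     # 10 / 18
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
p_chance    <- sum(rowSums(cm) * colSums(cm)) / n^2
cohen_kappa <- (accuracy - p_chance) / (1 - p_chance)

print(round(c(accuracy = accuracy, sensitivity = sensitivity,
              specificity = specificity, balanced = balanced,
              kappa = cohen_kappa), 4))
```

The rounded values (0.7838, 1, 0.5556, 0.7778 and 0.5621) match the Accuracy, Sensitivity, Specificity, Balanced Accuracy and Kappa lines of the output.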

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
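
The per-year blocks above repeat the same steps for every tree, so the collection of importance values can also be written as a loop. The sketch below is an assumption-laden illustration rather than the procedure used in the analysis: it runs on simulated data, and it reads the variable.importance component stored in each rpart object, which is a related but different measure from the one returned by varImp() in caret. The names toy, trees and imp_long are hypothetical.

```r
library(rpart)

# Simulated stand-in for the yearly data; all values are hypothetical
set.seed(619)
toy <- data.frame(
  year      = rep(2001:2003, each = 60),
  education = sample(1:5, 180, replace = TRUE),
  GDP       = sample(1:5, 180, replace = TRUE)
)
toy$dom_a <- as.integer(toy$education < 3)

# One tree per year, fitted with the same formula as in the analysis
trees <- lapply(split(toy, toy$year), function(d) {
  rpart(as.factor(dom_a) ~ . - year, data = d, method = "class",
        control = rpart.control(minsplit = 6, cp = .025))
})

# Long table with one row per variable and year, ready for plotting
imp_long <- do.call(rbind, lapply(names(trees), function(y) {
  imp <- trees[[y]]$variable.importance
  data.frame(variable = names(imp), year = y, value = unname(imp))
}))
imp_long
```

A tree without any split has no importance values, so it has to be left out of such a loop. This is presumably why the 2009 tree, which consists of the root node only, is absent from the blocks above.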

Location of splits

dtr2001
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 34 12 0 (0.6470588 0.3529412)  
##    2) HDI>=1.5 25  5 0 (0.8000000 0.2000000)  
##      4) pop_density< 2.5 11  0 0 (1.0000000 0.0000000) *
##      5) pop_density>=2.5 14  5 0 (0.6428571 0.3571429)  
##       10) durable>=3.5 7  1 0 (0.8571429 0.1428571) *
##       11) durable< 3.5 7  3 1 (0.4285714 0.5714286)  
##         22) polR>=0.5 4  1 0 (0.7500000 0.2500000) *
##         23) polR< 0.5 3  0 1 (0.0000000 1.0000000) *
##    3) HDI< 1.5 9  2 1 (0.2222222 0.7777778)  
##      6) GDP_gr>=3.5 3  1 0 (0.6666667 0.3333333) *
##      7) GDP_gr< 3.5 6  0 1 (0.0000000 1.0000000) *
dtr2002
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 34 14 0 (0.5882353 0.4117647)  
##    2) education>=2.5 17  3 0 (0.8235294 0.1764706)  
##      4) pop_density< 3.5 11  0 0 (1.0000000 0.0000000) *
##      5) pop_density>=3.5 6  3 0 (0.5000000 0.5000000)  
##       10) pop_density>=4.5 3  0 0 (1.0000000 0.0000000) *
##       11) pop_density< 4.5 3  0 1 (0.0000000 1.0000000) *
##    3) education< 2.5 17  6 1 (0.3529412 0.6470588)  
##      6) pop_density< 2.5 9  4 0 (0.5555556 0.4444444)  
##       12) GDP_gr< 4.5 7  2 0 (0.7142857 0.2857143) *
##       13) GDP_gr>=4.5 2  0 1 (0.0000000 1.0000000) *
##      7) pop_density>=2.5 8  1 1 (0.1250000 0.8750000) *
dtr2003
## n= 34 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 34 13 0 (0.6176471 0.3823529)  
##   2) pop_density< 3.5 22  5 0 (0.7727273 0.2272727) *
##   3) pop_density>=3.5 12  4 1 (0.3333333 0.6666667) *
dtr2004
## n= 35 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 35 16 0 (0.5428571 0.4571429)  
##    2) education>=2.5 19  3 0 (0.8421053 0.1578947)  
##      4) pop_density< 3.5 13  0 0 (1.0000000 0.0000000) *
##      5) pop_density>=3.5 6  3 0 (0.5000000 0.5000000)  
##       10) pop_density>=4.5 3  0 0 (1.0000000 0.0000000) *
##       11) pop_density< 4.5 3  0 1 (0.0000000 1.0000000) *
##    3) education< 2.5 16  3 1 (0.1875000 0.8125000) *
dtr2005
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 13 0 (0.6486486 0.3513514)  
##   2) education>=2.5 20  3 0 (0.8500000 0.1500000) *
##   3) education< 2.5 17  7 1 (0.4117647 0.5882353)  
##     6) GINI>=3.5 5  1 0 (0.8000000 0.2000000) *
##     7) GINI< 3.5 12  3 1 (0.2500000 0.7500000) *
dtr2006
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 14 0 (0.6216216 0.3783784)  
##    2) GDP>=2.5 22  4 0 (0.8181818 0.1818182) *
##    3) GDP< 2.5 15  5 1 (0.3333333 0.6666667)  
##      6) GDP_gr>=3.5 8  3 0 (0.6250000 0.3750000)  
##       12) pop_density< 3 6  1 0 (0.8333333 0.1666667) *
##       13) pop_density>=3 2  0 1 (0.0000000 1.0000000) *
##      7) GDP_gr< 3.5 7  0 1 (0.0000000 1.0000000) *
dtr2007
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 18 0 (0.5135135 0.4864865)  
##    2) durable>=1.5 28  9 0 (0.6785714 0.3214286)  
##      4) GINI>=4.5 6  0 0 (1.0000000 0.0000000) *
##      5) GINI< 4.5 22  9 0 (0.5909091 0.4090909)  
##       10) pop_density< 3.5 14  4 0 (0.7142857 0.2857143)  
##         20) education>=2.5 9  1 0 (0.8888889 0.1111111) *
##         21) education< 2.5 5  2 1 (0.4000000 0.6000000) *
##       11) pop_density>=3.5 8  3 1 (0.3750000 0.6250000)  
##         22) GDP>=4 3  1 0 (0.6666667 0.3333333) *
##         23) GDP< 4 5  1 1 (0.2000000 0.8000000) *
##    3) durable< 1.5 9  0 1 (0.0000000 1.0000000) *
dtr2008
## n= 36 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 36 15 0 (0.5833333 0.4166667)  
##    2) HDI>=3.5 15  2 0 (0.8666667 0.1333333) *
##    3) HDI< 3.5 21  8 1 (0.3809524 0.6190476)  
##      6) GDP_gr>=3.5 10  4 0 (0.6000000 0.4000000)  
##       12) durable>=1.5 8  2 0 (0.7500000 0.2500000) *
##       13) durable< 1.5 2  0 1 (0.0000000 1.0000000) *
##      7) GDP_gr< 3.5 11  2 1 (0.1818182 0.8181818) *
dtr2009
## n= 36 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 36 12 0 (0.6666667 0.3333333) *
dtr2010
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 13 0 (0.6486486 0.3513514)  
##    2) education>=3.5 16  1 0 (0.9375000 0.0625000) *
##    3) education< 3.5 21  9 1 (0.4285714 0.5714286)  
##      6) pop_density< 3.5 11  4 0 (0.6363636 0.3636364)  
##       12) unemployment< 3 5  0 0 (1.0000000 0.0000000) *
##       13) unemployment>=3 6  2 1 (0.3333333 0.6666667)  
##         26) GDP_gr< 3.5 3  1 0 (0.6666667 0.3333333) *
##         27) GDP_gr>=3.5 3  0 1 (0.0000000 1.0000000) *
##      7) pop_density>=3.5 10  2 1 (0.2000000 0.8000000)  
##       14) HDI>=3.5 2  0 0 (1.0000000 0.0000000) *
##       15) HDI< 3.5 8  0 1 (0.0000000 1.0000000) *
dtr2011
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 37 17 0 (0.5405405 0.4594595)  
##   2) education>=3.5 17  2 0 (0.8823529 0.1176471) *
##   3) education< 3.5 20  5 1 (0.2500000 0.7500000)  
##     6) GDP>=4.5 2  0 0 (1.0000000 0.0000000) *
##     7) GDP< 4.5 18  3 1 (0.1666667 0.8333333) *
dtr2012
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 16 0 (0.5675676 0.4324324)  
##    2) education>=3.5 18  3 0 (0.8333333 0.1666667)  
##      4) GDP_gr>=2.5 8  0 0 (1.0000000 0.0000000) *
##      5) GDP_gr< 2.5 10  3 0 (0.7000000 0.3000000)  
##       10) GINI< 2.5 4  0 0 (1.0000000 0.0000000) *
##       11) GINI>=2.5 6  3 0 (0.5000000 0.5000000)  
##         22) GINI>=3.5 4  1 0 (0.7500000 0.2500000) *
##         23) GINI< 3.5 2  0 1 (0.0000000 1.0000000) *
##    3) education< 3.5 19  6 1 (0.3157895 0.6842105)  
##      6) pop_density< 3.5 10  4 0 (0.6000000 0.4000000)  
##       12) durable>=1.5 7  1 0 (0.8571429 0.1428571) *
##       13) durable< 1.5 3  0 1 (0.0000000 1.0000000) *
##      7) pop_density>=3.5 9  0 1 (0.0000000 1.0000000) *
dtr2013
## n= 37 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 37 18 0 (0.5135135 0.4864865)  
##    2) durable>=2.5 23  7 0 (0.6956522 0.3043478)  
##      4) pop_density< 2.5 8  0 0 (1.0000000 0.0000000) *
##      5) pop_density>=2.5 15  7 0 (0.5333333 0.4666667)  
##       10) unemployment< 1.5 4  0 0 (1.0000000 0.0000000) *
##       11) unemployment>=1.5 11  4 1 (0.3636364 0.6363636)  
##         22) GDP_gr< 2.5 5  1 0 (0.8000000 0.2000000) *
##         23) GDP_gr>=2.5 6  0 1 (0.0000000 1.0000000) *
##    3) durable< 2.5 14  3 1 (0.2142857 0.7857143)  
##      6) pop_density< 3.5 7  3 1 (0.4285714 0.5714286)  
##       12) GDP>=2.5 4  1 0 (0.7500000 0.2500000) *
##       13) GDP< 2.5 3  0 1 (0.0000000 1.0000000) *
##      7) pop_density>=3.5 7  0 1 (0.0000000 1.0000000) *

To conclude this brief analysis: protection of human rights and major episodes of political violence are not the only predictors of terrorism in Asia; the interaction of macro-level social and economic inequality may also be related to terrorism.

Africa

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart.plot", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-robust.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2a <- data1[complete.cases(data1), ]

# continent selection

data2 <- subset(data2a, cont == 3)

# categorization

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
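
The categorization above turns each quantitative variable into five groups of roughly equal size and collapses the political rights and civil liberties scores into three levels. The sketch below illustrates both steps in base R on invented values. It approximates cut2(x, g = 5) with cut() and quantile(), which may place boundary values differently than Hmisc does; the names gdp, gdp_cat, recode3 and polR_cat are hypothetical.

```r
# Hypothetical values standing in for one quantitative explanatory variable
gdp <- c(1200, 560, 43000, 8700, 15200, 300, 2900, 51000, 990, 7600)

# Quintile break points; unique() guards against duplicated breaks
breaks <- unique(quantile(gdp, probs = seq(0, 1, by = 0.2)))

# Group membership coded 1 (lowest fifth) to 5 (highest fifth)
gdp_cat <- as.numeric(cut(gdp, breaks = breaks, include.lowest = TRUE))
table(gdp_cat)

# Three-level recoding applied to polR and civL in the analysis:
# scores below 3 become 0, scores above 5 become 2, the rest become 1
recode3 <- function(x) ifelse(x < 3, 0, ifelse(x > 5, 2, 1))
polR_cat <- recode3(c(1, 2, 3, 4, 5, 6, 7))
polR_cat
```

Note that the groups are formed within the selected continent, so the same raw value can fall into different categories in different continent-specific analyses.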

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

gtdd <- data2[, -c(1,2,14)] # domestic
gtdt <- data2[, -c(1,2,13)] # transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
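
The yearly subsets could equally be produced in a single call, which keeps them together in one named list. The following base R sketch uses a small invented data frame; toy and by_year are hypothetical names, and the approach is an alternative to, not a description of, the code above.

```r
# Toy stand-in for gtdd: hypothetical values, same column roles
toy <- data.frame(
  year  = rep(2001:2003, each = 4),
  GDP   = c(1, 2, 3, 4, 2, 3, 4, 5, 1, 3, 5, 2),
  dom_a = c(0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 1)
)

# One data frame per year, named by the year
by_year <- split(toy, toy$year)

names(by_year)
nrow(by_year[["2002"]])
```

With this layout the final year can serve as the test set (for example by_year[["2003"]] here) while the remaining elements are passed to the training step.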

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpctrl)
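
Since every tree is trained with the same seed, formula and control settings, the training step can be condensed into one helper applied to each yearly subset. The sketch below runs on simulated data with more rows per year than the real subsets, so that the 50 cross-validation groups are all populated; toy, fit_year and trees are hypothetical names.

```r
library(rpart)

# Simulated stand-in for the yearly data; all values are hypothetical
set.seed(619)
toy <- data.frame(
  year      = rep(2001:2003, each = 60),
  education = sample(1:5, 180, replace = TRUE),
  GDP       = sample(1:5, 180, replace = TRUE)
)
toy$dom_a <- as.integer(toy$education < 3)

# Same control settings as in the analysis
rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# Resetting the seed before each fit makes the cross-validation
# folds reproducible for every year, as in the code above
fit_year <- function(d) {
  set.seed(619)
  rpart(as.factor(dom_a) ~ . - year, data = d,
        method = "class", control = rpctrl)
}

trees <- lapply(split(toy, toy$year), fit_year)
names(trees)
```

Individual trees are then available as trees[["2001"]] and so on, and can be inspected or passed to predict() in the same way as the separately named objects.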

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the predictive performance of the model would not be retained. This value served as the starting point for estimating the optimal complexity parameter of each tree, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){
  # CP value of the cptable row with the lowest cross-validated error (xerror)
  x$cptable[which.min(x$cptable[,"xerror"]),"CP"]
}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.2
optimal_complexity(dtr2002) 
## [1] 0.2222222
optimal_complexity(dtr2003) 
## [1] 0.3333333
optimal_complexity(dtr2004)
## [1] 0.125
optimal_complexity(dtr2005)
## [1] 0.2142857
optimal_complexity(dtr2006)
## [1] 0.05
optimal_complexity(dtr2007)
## [1] 0.025
optimal_complexity(dtr2008)
## [1] 0.16
optimal_complexity(dtr2009)
## [1] 0.125
optimal_complexity(dtr2010)
## [1] 0.125
optimal_complexity(dtr2011)
## [1] 0.1111111
optimal_complexity(dtr2012)
## [1] 0.08333333
optimal_complexity(dtr2013)
## [1] 0.1428571
# transnational

optimal_complexity(ttr2001)
## [1] 0.1481481
optimal_complexity(ttr2002)
## [1] 0.07142857
optimal_complexity(ttr2003)
## [1] 0
optimal_complexity(ttr2004)
## [1] 0.2222222
optimal_complexity(ttr2005)
## [1] 0.4285714
optimal_complexity(ttr2006)
## [1] 0.125
optimal_complexity(ttr2007)
## [1] 0.1
optimal_complexity(ttr2008)
## [1] 0.1666667
optimal_complexity(ttr2009)
## [1] 0.1333333
optimal_complexity(ttr2010)
## [1] 0.2307692
optimal_complexity(ttr2011)
## [1] 0.1818182
optimal_complexity(ttr2012)
## [1] 0.025
optimal_complexity(ttr2013)
## [1] 0.1333333

The decision trees were then retrained with the estimated optimal complexity parameters (truncated to three decimal places). In cases where the optimal cp was below .025, the value of .025 was used.
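
The selection rule can be illustrated on the kyphosis data shipped with rpart. The sketch below picks the cp with the lowest cross-validated error, applies the .025 floor, and then prunes the fitted tree instead of refitting it. Pruning is a swapped-in shortcut that should usually give the same tree structure as retraining with the higher cp, but it is not the procedure used in this analysis; fit, cp_opt and fit_pruned are hypothetical names.

```r
library(rpart)

optimal_complexity <- function(x) {
  x$cptable[which.min(x$cptable[, "xerror"]), "CP"]
}

# Example data shipped with rpart, not the terrorism data
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(minsplit = 6, xval = 50, cp = .025))

# cp with the lowest cross-validated error, never below the .025 floor
cp_opt <- max(optimal_complexity(fit), .025)

# Remove every split that does not meet the selected cp
fit_pruned <- prune(fit, cp = cp_opt)

printcp(fit_pruned)
```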

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .2))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .222))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .333))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .214))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .05))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .16))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .111))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .083))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .142))

# transnational 

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .148))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .071))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .222))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .428))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .1))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .166))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .133))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .230))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .181))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .133))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic

ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 31  9
##          1  0  2
##                                          
##                Accuracy : 0.7857         
##                  95% CI : (0.6319, 0.897)
##     No Information Rate : 0.7381         
##     P-Value [Acc > NIR] : 0.306815       
##                                          
##                   Kappa : 0.247          
##  Mcnemar's Test P-Value : 0.007661       
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.1818         
##          Pos Pred Value : 0.7750         
##          Neg Pred Value : 1.0000         
##              Prevalence : 0.7381         
##          Detection Rate : 0.7381         
##    Detection Prevalence : 0.9524         
##       Balanced Accuracy : 0.5909         
##                                          
##        'Positive' Class : 0              
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 29  8
##          1  2  3
##                                           
##                Accuracy : 0.7619          
##                  95% CI : (0.6055, 0.8795)
##     No Information Rate : 0.7381          
##     P-Value [Acc > NIR] : 0.4413          
##                                           
##                   Kappa : 0.2527          
##  Mcnemar's Test P-Value : 0.1138          
##                                           
##             Sensitivity : 0.9355          
##             Specificity : 0.2727          
##          Pos Pred Value : 0.7838          
##          Neg Pred Value : 0.6000          
##              Prevalence : 0.7381          
##          Detection Rate : 0.6905          
##    Detection Prevalence : 0.8810          
##       Balanced Accuracy : 0.6041          
##                                           
##        'Positive' Class : 0               
## 
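
The ensembling code above amounts to a majority vote. cbind() stores factor predictions as their integer codes (1 for class 0, 2 for class 1), so a row mean below 1.5 means that most trees predicted class 0. The sketch below shows this on invented predictions from three models; p1, p2, p3, votes and ens are hypothetical names, and both factor levels are assumed to be present in every prediction vector.

```r
# Hypothetical class predictions from three models for four countries
lv <- c("0", "1")
p1 <- factor(c("0", "1", "1", "0"), levels = lv)
p2 <- factor(c("0", "1", "0", "0"), levels = lv)
p3 <- factor(c("1", "1", "0", "0"), levels = lv)

# cbind() keeps the integer codes: 1 = class "0", 2 = class "1"
votes <- cbind(p1, p2, p3)

# Mean code below 1.5 -> majority voted "0", otherwise "1"
ens <- ifelse(rowMeans(votes) < 1.5, 0, 1)
ens
# → 0 1 0 0
```

With thirteen trees per outcome the number of voters is odd, so ties cannot occur and the 1.5 threshold always yields a clear majority.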

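Unlike in Asia, the variables used did not significantly improve the prediction of terrorism in Africa: the accuracy of neither ensemble was significantly higher than the no-information rate (p = .31 for domestic and p = .44 for transnational terrorism).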

Europe

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart.plot", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-robust.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2a <- data1[complete.cases(data1), ]

# continent selection

data2 <- subset(data2a, cont == 2)

# categorization

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

gtdd <- data2[, -c(1,2,14)] # domestic
gtdt <- data2[, -c(1,2,13)] # transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
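
The year-by-year subsetting above can also be expressed with a single split() call. The following is only a sketch on a hypothetical toy data frame (gtdd_toy), not a replacement for the objects used later in this document.

```r
# Hypothetical stand-in for gtdd: two rows per year, 2001-2014
gtdd_toy <- data.frame(year  = rep(2001:2014, each = 2),
                       dom_a = rep(c(0, 1), times = 14))

# One data frame per year, stored in a list named by year
by_year <- split(gtdd_toy, gtdd_toy$year)

# 2001-2013 serve as training years, 2014 as the test set
train_years <- by_year[as.character(2001:2013)]
test_2014   <- by_year[["2014"]]

names(by_year)
```

A list keeps the yearly subsets together, so later steps could iterate over it instead of addressing fourteen separate objects.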

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpctrl)
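
The per-year training above repeats one call thirteen times. A more compact version might look like the sketch below, which uses simulated toy data (the data frame toy and its columns are hypothetical) and converts the outcome to a factor before fitting. The seed is reset before each fit, as in the original code.

```r
library(rpart)

# Hypothetical toy panel: 3 years x 60 units, two quintile-coded predictors
set.seed(619)
toy <- data.frame(
  year = rep(2001:2003, each = 60),
  GINI = sample(1:5, 180, replace = TRUE),
  HDI  = sample(1:5, 180, replace = TRUE)
)
toy$dom_a <- factor(as.integer(toy$GINI + rnorm(180) > 3.5), levels = c(0, 1))

# Same control settings as in the document
ctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# One classification tree per year; year itself is excluded from the predictors
trees <- lapply(split(toy, toy$year), function(d) {
  set.seed(619)
  rpart(dom_a ~ . - year, data = d, method = "class", control = ctrl)
})

names(trees)
```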

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the model's predictive power would not enter the model. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0
optimal_complexity(dtr2002) 
## [1] 0.1666667
optimal_complexity(dtr2003) 
## [1] 0
optimal_complexity(dtr2004)
## [1] 0.1666667
optimal_complexity(dtr2005)
## [1] 0.025
optimal_complexity(dtr2006)
## [1] 0.3333333
optimal_complexity(dtr2007)
## [1] 0.1666667
optimal_complexity(dtr2008)
## [1] 0.25
optimal_complexity(dtr2009)
## [1] 0.125
optimal_complexity(dtr2010)
## [1] 0.25
optimal_complexity(dtr2011)
## [1] 0.3333333
optimal_complexity(dtr2012)
## [1] 0.2
optimal_complexity(dtr2013)
## [1] 0.2
# transnational

optimal_complexity(ttr2001)
## [1] 0.3333333
optimal_complexity(ttr2002)
## [1] 0.15
optimal_complexity(ttr2003)
## [1] 0.3333333
optimal_complexity(ttr2004)
## [1] 0.025
optimal_complexity(ttr2005)
## [1] 0.2857143
optimal_complexity(ttr2006)
## [1] 0.025
optimal_complexity(ttr2007)
## [1] 0.1212121
optimal_complexity(ttr2008)
## [1] 0.2222222
optimal_complexity(ttr2009)
## [1] 0.2272727
optimal_complexity(ttr2010)
## [1] 0.3
optimal_complexity(ttr2011)
## [1] 0.1666667
optimal_complexity(ttr2012)
## [1] 0.025
optimal_complexity(ttr2013)
## [1] 0.1481481

The complexity parameters of the decision trees were then optimized according to the obtained results. In cases where the optimal cp was below .025, the value of .025 was used.
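
As an illustration of this selection rule, the sketch below picks the cp with the lowest cross-validated error from a tree's cptable and applies the .025 floor. It uses the kyphosis data bundled with rpart as a stand-in (not the study data), and it prunes the fitted tree with prune() rather than refitting with the chosen cp as is done in this document.

```r
library(rpart)

# Stand-in example on rpart's bundled kyphosis data (an assumption for illustration)
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 10, cp = .025))

# cptable holds one row per candidate tree size: CP, nsplit, rel error, xerror, xstd
cp_best <- fit$cptable[which.min(fit$cptable[, "xerror"]), "CP"]

# Never go below the initial cp of .025
cp_used <- max(cp_best, .025)

# Prune back to the selected complexity
pruned <- prune(fit, cp = cp_used)
cp_used
```

Because xerror comes from random cross-validation folds, the selected cp depends on the seed, which is why set.seed() precedes every fit.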

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .166))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .166))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .333))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .166))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .25))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .25))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .333))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .2))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .2))

# transnational 

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .333))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .15))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .333))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .285))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .121))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .222))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .227))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .3))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .166))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .148))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic

ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 28  5
##          1  0  0
##                                          
##                Accuracy : 0.8485         
##                  95% CI : (0.681, 0.9489)
##     No Information Rate : 0.8485         
##     P-Value [Acc > NIR] : 0.61636        
##                                          
##                   Kappa : 0              
##  Mcnemar's Test P-Value : 0.07364        
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.0000         
##          Pos Pred Value : 0.8485         
##          Neg Pred Value :    NaN         
##              Prevalence : 0.8485         
##          Detection Rate : 0.8485         
##    Detection Prevalence : 1.0000         
##       Balanced Accuracy : 0.5000         
##                                          
##        'Positive' Class : 0              
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 17  9
##          1  5  2
##                                           
##                Accuracy : 0.5758          
##                  95% CI : (0.3922, 0.7452)
##     No Information Rate : 0.6667          
##     P-Value [Acc > NIR] : 0.9001          
##                                           
##                   Kappa : -0.05           
##  Mcnemar's Test P-Value : 0.4227          
##                                           
##             Sensitivity : 0.7727          
##             Specificity : 0.1818          
##          Pos Pred Value : 0.6538          
##          Neg Pred Value : 0.2857          
##              Prevalence : 0.6667          
##          Detection Rate : 0.5152          
##    Detection Prevalence : 0.7879          
##       Balanced Accuracy : 0.4773          
##                                           
##        'Positive' Class : 0               
## 

Again, neither ensemble performed significantly better than the no-information rate.
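
A note on how the ensembling step works: cbind() applied to factors stores their integer codes (1 and 2) rather than the labels "0" and "1", so a row mean below 1.5 means that most trees voted for class 0. The toy example below illustrates this with three hypothetical prediction vectors and assumes every factor has the levels "0" and "1" in that order.

```r
# Three hypothetical prediction vectors with levels "0" and "1"
p1 <- factor(c(0, 1, 1, 0), levels = c(0, 1))
p2 <- factor(c(0, 1, 0, 0), levels = c(0, 1))
p3 <- factor(c(1, 1, 0, 0), levels = c(0, 1))

# cbind() keeps the integer codes: 1 for label "0", 2 for label "1"
votes <- cbind(p1, p2, p3)

# Mean code below 1.5 means the majority voted "0"
vote_mean <- rowMeans(votes)
ens <- ifelse(vote_mean < 1.5, 0, 1)
ens  # 0 1 0 0
```

With thirteen trees the number of votes is odd, so ties cannot occur.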

Muslim countries

Muslim countries are defined as countries in which more than 50% of citizens are Muslim OR which have more than 1 million Muslim citizens AND are located in Asia or Africa (according to the 2009 estimate by the Pew Research Center).

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "plotly"), library, character.only = TRUE)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]

# formation of splitting variable
muslim_countries <- c(
  "Morocco", "Iraq", "Saudi Arabia", "Kuwait", "Gambia", "Afghanistan", "Tunisia",
  "Iran", "Azerbaijan", "Yemen", "Mauritania", "Niger", "Somalia", "Maldives",
  "Comoros", "Jordan", "Algeria", "Djibouti", "Libya", "Pakistan", "Uzbekistan",
  "Senegal", "Egypt", "Turkmenistan", "Mali", "Syria", "Bangladesh", "Indonesia",
  "Oman", "Kyrgyzstan", "Guinea", "Tajikistan", "Bahrain", "Qatar",
  "United Arab Emirates", "Sudan", "Sierra Leone", "Brunei", "Malaysia", "Lebanon",
  "Burkina Faso", "Kazakhstan", "Chad", "Nigeria", "India", "China",
  "Russian Federation", "Tanzania", "Cote d'Ivoire", "Mozambique", "Philippines",
  "Uganda", "Thailand", "Ghana", "Cameroon", "Kenya", "Benin", "Malawi", "Myanmar",
  "Eritrea", "Nepal", "Israel"
)
# 1 = country on the list, 2 = all other countries
data2$Muslim <- ifelse(data2$country %in% muslim_countries, 1, 2)

data2 <- subset(data2, Muslim == 1)

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
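
For readers without Hmisc, the quintile coding produced by as.numeric(cut2(x, g = 5)) can be approximated in base R. The helper below (quintile_code, a hypothetical name) is only a sketch: cut2() treats interval boundaries and ties differently, so codes may differ for values that fall exactly on a cut point.

```r
# Hypothetical base-R approximation of as.numeric(Hmisc::cut2(x, g = 5))
quintile_code <- function(x) {
  # Quintile boundaries; unique() guards against duplicated breaks caused by ties
  breaks <- unique(quantile(x, probs = seq(0, 1, by = .2), na.rm = TRUE))
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

quintile_code(1:10)  # 1 1 2 2 3 3 4 4 5 5
```

Note that in this document the quintiles are computed on the selected subset pooled across all years, so a given code refers to a country's position within that subset rather than within a single year.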

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2013, method = "class", control = rpctrl)

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the model's predictive power would not enter the model. This value served as the starting point for estimating the optimal complexity parameter, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.2666667
optimal_complexity(dtr2002) 
## [1] 0.0625
optimal_complexity(dtr2003) 
## [1] 0.03333333
optimal_complexity(dtr2004)
## [1] 0.08823529
optimal_complexity(dtr2005)
## [1] 0.2666667
optimal_complexity(dtr2006)
## [1] 0.025
optimal_complexity(dtr2007)
## [1] 0.06060606
optimal_complexity(dtr2008)
## [1] 0.05882353
optimal_complexity(dtr2009)
## [1] 0.125
optimal_complexity(dtr2010)
## [1] 0.1
optimal_complexity(dtr2011)
## [1] 0.08695652
optimal_complexity(dtr2012)
## [1] 0.04347826
optimal_complexity(dtr2013)
## [1] 0.05797101
# transnational

optimal_complexity(ttr2001)
## [1] 0.025
optimal_complexity(ttr2002)
## [1] 0.05555556
optimal_complexity(ttr2003)
## [1] 0.025
optimal_complexity(ttr2004)
## [1] 0.09090909
optimal_complexity(ttr2005)
## [1] 0.1111111
optimal_complexity(ttr2006)
## [1] 0.06818182
optimal_complexity(ttr2007)
## [1] 0.025
optimal_complexity(ttr2008)
## [1] 0.08695652
optimal_complexity(ttr2009)
## [1] 0.06
optimal_complexity(ttr2010)
## [1] 0.025
optimal_complexity(ttr2011)
## [1] 0.08
optimal_complexity(ttr2012)
## [1] 0.08333333
optimal_complexity(ttr2013)
## [1] 0.1818182

The complexity parameters of the decision trees were then optimized according to the obtained results.

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .266))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .033))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .088))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .266))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .060))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .058))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .1))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .086))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .043))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .057))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .055))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .090))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .111))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .068))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .086))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .06))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .08))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .083))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .181))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")
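
The prediction step can also be written so that the class labels, rather than factor codes, are carried into the vote. The sketch below uses simulated toy data (all names are hypothetical), collects the predicted labels from each yearly tree in a character matrix, and takes the majority label per row. It is an alternative formulation, not the code used for the results reported here.

```r
library(rpart)

# Hypothetical toy panel: 2001-2003 for training, 2004 as the test year
set.seed(619)
toy <- data.frame(
  year = rep(2001:2004, each = 60),
  GINI = sample(1:5, 240, replace = TRUE),
  HDI  = sample(1:5, 240, replace = TRUE)
)
toy$dom_a <- factor(as.integer(toy$GINI + rnorm(240) > 3.5), levels = c(0, 1))

train <- subset(toy, year < 2004)
test  <- subset(toy, year == 2004)

# One tree per training year
trees <- lapply(split(train, train$year), function(d) {
  rpart(dom_a ~ . - year, data = d, method = "class",
        control = rpart.control(minsplit = 6, cp = .025))
})

# One column of predicted labels ("0"/"1") per training year
preds <- vapply(trees,
                function(m) as.character(predict(m, newdata = test, type = "class")),
                character(nrow(test)))

# Share of trees voting "1"; the majority label wins
ens <- factor(as.integer(rowMeans(preds == "1") > .5), levels = c(0, 1))
table(Prediction = ens, Reference = test$dom_a)
```

Comparing labels directly avoids relying on the order of factor levels, which matters if a training year happens to contain only one outcome class.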

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 29 11
##          1  0 12
##                                          
##                Accuracy : 0.7885         
##                  95% CI : (0.653, 0.8894)
##     No Information Rate : 0.5577         
##     P-Value [Acc > NIR] : 0.0004469      
##                                          
##                   Kappa : 0.5489         
##  Mcnemar's Test P-Value : 0.0025688      
##                                          
##             Sensitivity : 1.0000         
##             Specificity : 0.5217         
##          Pos Pred Value : 0.7250         
##          Neg Pred Value : 1.0000         
##              Prevalence : 0.5577         
##          Detection Rate : 0.5577         
##    Detection Prevalence : 0.7692         
##       Balanced Accuracy : 0.7609         
##                                          
##        'Positive' Class : 0              
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 18 18
##          1  4 12
##                                          
##                Accuracy : 0.5769         
##                  95% CI : (0.432, 0.7127)
##     No Information Rate : 0.5769         
##     P-Value [Acc > NIR] : 0.558525       
##                                          
##                   Kappa : 0.2011         
##  Mcnemar's Test P-Value : 0.005578       
##                                          
##             Sensitivity : 0.8182         
##             Specificity : 0.4000         
##          Pos Pred Value : 0.5000         
##          Neg Pred Value : 0.7500         
##              Prevalence : 0.4231         
##          Detection Rate : 0.3462         
##    Detection Prevalence : 0.6923         
##       Balanced Accuracy : 0.6091         
##                                          
##        'Positive' Class : 0              
## 
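
The thresholding used for both ensembles amounts to a majority vote: `cbind()` replaces the factor predictions with their internal codes (1 for class 0, 2 for class 1), so a row mean below 1.5 means that most trees predicted 0. A minimal sketch with three hypothetical trees and four hypothetical countries (all values are made up for illustration):

```r
# Hypothetical predictions of three trees for four countries.
# Each is a factor with levels "0" and "1", as returned by predict(..., type = "class").
p1 <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
p2 <- factor(c("0", "1", "0", "0"), levels = c("0", "1"))
p3 <- factor(c("1", "1", "0", "0"), levels = c("0", "1"))

votes <- cbind(p1, p2, p3)        # factors become integer codes: "0" -> 1, "1" -> 2
mean_code <- rowMeans(votes)      # 1 = all trees voted 0, 2 = all trees voted 1
majority <- ifelse(mean_code < 1.5, 0, 1)
majority
```

With an odd number of trees (13 in the analyses here) ties cannot occur, so the cut-off of 1.50 never has to break one.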

Variable importance for domestic terrorism

# with surrogates only
xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates (actual tree)

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,8,16,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "gray34", "goldenrod", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
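
The year-by-year repetition above could be collapsed into a loop. The sketch below is only an illustration under stated assumptions: the data frame `toy` and its columns are hypothetical, and the scores come from rpart's built-in `variable.importance` component rather than from `varImp()`, so the numbers are not comparable to the plots above. It produces a long table with the same three columns as `vimps_d1m`.

```r
library(rpart)

# Hypothetical toy panel: two years, two predictors, outcome driven by x1.
set.seed(1)
toy <- data.frame(
  year = rep(c(2001, 2002), each = 40),
  x1   = rep(1:5, times = 16),
  x2   = sample(1:5, 80, replace = TRUE)
)
toy$y <- factor(as.integer(toy$x1 >= 3))

# One tree per year, kept in a named list instead of separate objects.
fits <- lapply(split(toy, toy$year), function(d) {
  rpart(y ~ x1 + x2, data = d, method = "class",
        control = rpart.control(minsplit = 6, cp = 0.025))
})

# Stack the importance scores of all trees into one long table.
imp_long <- do.call(rbind, lapply(names(fits), function(yr) {
  vi <- fits[[yr]]$variable.importance   # NULL when a tree has no splits
  if (is.null(vi)) return(NULL)
  data.frame(variable = names(vi), year = yr, value = unname(vi))
}))
imp_long
```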

Location of splits

dtr2001
## n= 53 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 53 15 0 (0.7169811 0.2830189)  
##   2) major_violence_ep< 0.5 37  5 0 (0.8648649 0.1351351) *
##   3) major_violence_ep>=0.5 16  6 1 (0.3750000 0.6250000) *
dtr2002
## n= 53 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 53 16 0 (0.6981132 0.3018868)  
##   2) major_violence_ep< 0.5 39  5 0 (0.8717949 0.1282051) *
##   3) major_violence_ep>=0.5 14  3 1 (0.2142857 0.7857143)  
##     6) pop_density< 2.5 5  2 0 (0.6000000 0.4000000) *
##     7) pop_density>=2.5 9  0 1 (0.0000000 1.0000000) *
dtr2003
## n= 53 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 53 15 0 (0.71698113 0.28301887)  
##    2) major_violence_ep< 0.5 37  1 0 (0.97297297 0.02702703) *
##    3) major_violence_ep>=0.5 16  2 1 (0.12500000 0.87500000)  
##      6) pop_density< 2.5 6  2 1 (0.33333333 0.66666667)  
##       12) durable>=2.5 3  1 0 (0.66666667 0.33333333) *
##       13) durable< 2.5 3  0 1 (0.00000000 1.00000000) *
##      7) pop_density>=2.5 10  0 1 (0.00000000 1.00000000) *
dtr2004
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 54 17 0 (0.68518519 0.31481481)  
##    2) HRP>=2.5 31  2 0 (0.93548387 0.06451613) *
##    3) HRP< 2.5 23  8 1 (0.34782609 0.65217391)  
##      6) pop_density< 3.5 13  6 0 (0.53846154 0.46153846)  
##       12) GINI< 4.5 11  4 0 (0.63636364 0.36363636) *
##       13) GINI>=4.5 2  0 1 (0.00000000 1.00000000) *
##      7) pop_density>=3.5 10  1 1 (0.10000000 0.90000000) *
dtr2005
## n= 55 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 55 15 0 (0.72727273 0.27272727)  
##   2) HRP>=2.5 31  1 0 (0.96774194 0.03225806) *
##   3) HRP< 2.5 24 10 1 (0.41666667 0.58333333) *
dtr2006
## n= 55 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 55 19 0 (0.65454545 0.34545455)  
##    2) HRP>=2.5 31  2 0 (0.93548387 0.06451613) *
##    3) HRP< 2.5 24  7 1 (0.29166667 0.70833333)  
##      6) GDP>=2.5 12  6 0 (0.50000000 0.50000000)  
##       12) polR>=1.5 8  2 0 (0.75000000 0.25000000)  
##         24) GINI>=1.5 5  0 0 (1.00000000 0.00000000) *
##         25) GINI< 1.5 3  1 1 (0.33333333 0.66666667) *
##       13) polR< 1.5 4  0 1 (0.00000000 1.00000000) *
##      7) GDP< 2.5 12  1 1 (0.08333333 0.91666667) *
dtr2007
## n= 55 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 55 22 0 (0.60000000 0.40000000)  
##    2) HRP>=2.5 31  4 0 (0.87096774 0.12903226) *
##    3) HRP< 2.5 24  6 1 (0.25000000 0.75000000)  
##      6) major_violence_ep< 0.5 11  5 1 (0.45454545 0.54545455)  
##       12) GDP>=1.5 8  3 0 (0.62500000 0.37500000)  
##         24) civL< 1.5 6  1 0 (0.83333333 0.16666667) *
##         25) civL>=1.5 2  0 1 (0.00000000 1.00000000) *
##       13) GDP< 1.5 3  0 1 (0.00000000 1.00000000) *
##      7) major_violence_ep>=0.5 13  1 1 (0.07692308 0.92307692) *
dtr2008
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 54 17 0 (0.68518519 0.31481481)  
##    2) major_violence_ep< 0.5 39  6 0 (0.84615385 0.15384615)  
##      4) GINI>=1.5 29  2 0 (0.93103448 0.06896552) *
##      5) GINI< 1.5 10  4 0 (0.60000000 0.40000000)  
##       10) HRP>=3.5 4  0 0 (1.00000000 0.00000000) *
##       11) HRP< 3.5 6  2 1 (0.33333333 0.66666667)  
##         22) pop_density< 3.5 3  1 0 (0.66666667 0.33333333) *
##         23) pop_density>=3.5 3  0 1 (0.00000000 1.00000000) *
##    3) major_violence_ep>=0.5 15  4 1 (0.26666667 0.73333333)  
##      6) HRP>=1.5 5  2 0 (0.60000000 0.40000000) *
##      7) HRP< 1.5 10  1 1 (0.10000000 0.90000000) *
dtr2009
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 54 16 0 (0.7037037 0.2962963)  
##   2) major_violence_ep< 1.5 44  8 0 (0.8181818 0.1818182) *
##   3) major_violence_ep>=1.5 10  2 1 (0.2000000 0.8000000) *
dtr2010
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 54 15 0 (0.7222222 0.2777778)  
##   2) major_violence_ep< 0.5 40  5 0 (0.8750000 0.1250000) *
##   3) major_violence_ep>=0.5 14  4 1 (0.2857143 0.7142857) *
dtr2011
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 54 23 0 (0.5740741 0.4259259)  
##    2) major_violence_ep< 1.5 45 14 0 (0.6888889 0.3111111)  
##      4) GINI>=3.5 19  2 0 (0.8947368 0.1052632)  
##        8) unemployment< 4.5 17  0 0 (1.0000000 0.0000000) *
##        9) unemployment>=4.5 2  0 1 (0.0000000 1.0000000) *
##      5) GINI< 3.5 26 12 0 (0.5384615 0.4615385)  
##       10) pop_density< 4.5 22  8 0 (0.6363636 0.3636364)  
##         20) education>=4.5 6  0 0 (1.0000000 0.0000000) *
##         21) education< 4.5 16  8 0 (0.5000000 0.5000000)  
##           42) durable< 2.5 8  2 0 (0.7500000 0.2500000) *
##           43) durable>=2.5 8  2 1 (0.2500000 0.7500000) *
##       11) pop_density>=4.5 4  0 1 (0.0000000 1.0000000) *
##    3) major_violence_ep>=1.5 9  0 1 (0.0000000 1.0000000) *
dtr2012
## n= 52 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 52 23 0 (0.55769231 0.44230769)  
##    2) major_violence_ep< 0.5 39 11 0 (0.71794872 0.28205128)  
##      4) durable>=3.5 18  1 0 (0.94444444 0.05555556) *
##      5) durable< 3.5 21 10 0 (0.52380952 0.47619048)  
##       10) HDI< 2.5 9  1 0 (0.88888889 0.11111111) *
##       11) HDI>=2.5 12  3 1 (0.25000000 0.75000000)  
##         22) HRP>=3.5 5  2 0 (0.60000000 0.40000000) *
##         23) HRP< 3.5 7  0 1 (0.00000000 1.00000000) *
##    3) major_violence_ep>=0.5 13  1 1 (0.07692308 0.92307692) *
dtr2013
## n= 52 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 52 23 0 (0.55769231 0.44230769)  
##    2) major_violence_ep< 0.5 38 10 0 (0.73684211 0.26315789)  
##      4) HRP>=4.5 9  0 0 (1.00000000 0.00000000) *
##      5) HRP< 4.5 29 10 0 (0.65517241 0.34482759)  
##       10) pop_density< 2.5 10  1 0 (0.90000000 0.10000000) *
##       11) pop_density>=2.5 19  9 0 (0.52631579 0.47368421)  
##         22) durable>=2.5 13  4 0 (0.69230769 0.30769231) *
##         23) durable< 2.5 6  1 1 (0.16666667 0.83333333) *
##    3) major_violence_ep>=0.5 14  1 1 (0.07142857 0.92857143) *

The results confirm major episodes of political violence and protection of human rights as the dominant predictors of domestic terrorism.
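
The dominance of these two variables can be checked by tallying the first (root) split of each of the thirteen trees printed above. The vector below is transcribed by hand from that output; the object name `root_split` is introduced here only for illustration.

```r
# Root-split variable of each yearly tree, copied from the printed trees above.
root_split <- c(
  "2001" = "major_violence_ep", "2002" = "major_violence_ep",
  "2003" = "major_violence_ep", "2004" = "HRP",
  "2005" = "HRP",               "2006" = "HRP",
  "2007" = "HRP",               "2008" = "major_violence_ep",
  "2009" = "major_violence_ep", "2010" = "major_violence_ep",
  "2011" = "major_violence_ep", "2012" = "major_violence_ep",
  "2013" = "major_violence_ep"
)

# Number of trees in which each variable provides the first split.
root_counts <- sort(table(root_split), decreasing = TRUE)
root_counts
```

Major episodes of political violence open nine of the trees and protection of human rights the remaining four; no other variable appears at the root.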

Non-Muslim countries

Non-Muslim countries are defined as countries in which Muslims make up less than 50% of the population AND number fewer than 1 million (according to the 2009 estimate by the Pew Research Center).
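
The definition can be written as a single logical condition. The sketch below uses made-up figures and hypothetical column names (`muslim_share`, `muslim_count`); the analysis itself hard-codes the list of country names instead.

```r
# Hypothetical inputs: share of Muslim citizens (%) and their absolute number.
countries <- data.frame(
  country      = c("A", "B", "C", "D"),
  muslim_share = c(2, 60, 5, 55),
  muslim_count = c(50000, 400000, 3000000, 20000000)
)

# Both conditions must hold for a country to count as non-Muslim.
countries$non_muslim <- countries$muslim_share < 50 & countries$muslim_count < 1e6
countries
```

Only country A satisfies both conditions; B fails on the share, C on the absolute number and D on both.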

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]

# formation of splitting variable
muslim_countries <- c("Morocco", "Iraq", "Saudi Arabia", "Kuwait", "Gambia", "Afghanistan", "Tunisia", "Iran",
                      "Azerbaijan", "Yemen", "Mauritania", "Niger", "Somalia", "Maldives", "Comoros", "Jordan",
                      "Algeria", "Djibouti", "Libya", "Pakistan", "Uzbekistan", "Senegal", "Egypt", "Turkmenistan",
                      "Mali", "Syria", "Bangladesh", "Indonesia", "Oman", "Kyrgyzstan", "Guinea", "Tajikistan",
                      "Bahrain", "Qatar", "United Arab Emirates", "Sudan", "Sierra Leone", "Brunei", "Malaysia",
                      "Lebanon", "Burkina Faso", "Kazakhstan", "Chad", "Nigeria", "India", "China",
                      "Russian Federation", "Tanzania", "Cote d'Ivoire", "Mozambique", "Philippines", "Uganda",
                      "Thailand", "Ghana", "Cameroon", "Kenya", "Benin", "Malawi", "Myanmar", "Eritrea", "Nepal",
                      "Israel")
# 1 = Muslim country, 2 = non-Muslim country
data2$Muslim <- ifelse(data2$country %in% muslim_countries, 1, 2)
data2 <- subset(data2, Muslim == 2)

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
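
To make the recoding above easier to read, the sketch below applies the same nested `ifelse()` rules to hypothetical raw scores 0 to 7 and shows a base R approximation of binning into five quantile groups. The helper names and the input vectors are assumptions made for illustration, and `cut()` with `quantile()` is a stand-in for `cut2(x, g = 5)` that may place boundary values differently.

```r
# Same rules as above, wrapped in helper functions (names are hypothetical).
recode_violence <- function(v) {
  ifelse(v < 1, 0, ifelse(v > 5, 3, ifelse(v > 0 & v < 3, 1, 2)))
}
recode_rights <- function(v) ifelse(v < 3, 0, ifelse(v > 5, 2, 1))

raw <- 0:7
data.frame(raw = raw,
           major_violence_ep = recode_violence(raw),
           polR_civL = recode_rights(raw))

# Base R approximation of quintile binning on ten made-up values.
v <- c(3, 8, 15, 22, 40, 41, 57, 63, 78, 90)
quintile <- as.numeric(cut(v, breaks = quantile(v, probs = 0:5 / 5),
                           include.lowest = TRUE))
quintile
```

Because every explanatory variable ends up as a small integer code, the split points reported by the trees (for example `pop_density< 2.5`) refer to these codes, not to the original units.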

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
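
The same yearly split can be obtained in one call with base R's `split()`, which returns a named list with one data frame per year. The sketch uses a small hypothetical panel (`panel`), not the study data.

```r
# Hypothetical panel with three years and two countries.
panel <- data.frame(
  year    = rep(2001:2003, each = 2),
  country = rep(c("A", "B"), times = 3),
  dom_a   = c(0, 1, 0, 0, 1, 1)
)

# One list element per year, named by the year value.
by_year <- split(panel, panel$year)
names(by_year)
by_year[["2002"]]
```

Keeping the yearly subsets in a list makes it possible to fit, predict and summarise with `lapply()` rather than repeating each call thirteen times.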

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2013, method = "class", control = rpctrl)

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the predictive performance of the model would not be added. This value served as the starting point for estimating the optimal complexity parameter of each tree, shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.04166667
optimal_complexity(dtr2002) 
## [1] 0.1333333
optimal_complexity(dtr2003) 
## [1] 0.1111111
optimal_complexity(dtr2004)
## [1] 0.125
optimal_complexity(dtr2005)
## [1] 0.09090909
optimal_complexity(dtr2006)
## [1] 0.025
optimal_complexity(dtr2007)
## [1] 0.03846154
optimal_complexity(dtr2008)
## [1] 0.1
optimal_complexity(dtr2009)
## [1] 0.1
optimal_complexity(dtr2010)
## [1] 0.04166667
optimal_complexity(dtr2011)
## [1] 0.05208333
optimal_complexity(dtr2012)
## [1] 0.0625
optimal_complexity(dtr2013)
## [1] 0.0625
# transnational

optimal_complexity(ttr2001)
## [1] 0.08333333
optimal_complexity(ttr2002)
## [1] 0.1052632
optimal_complexity(ttr2003)
## [1] 0.09090909
optimal_complexity(ttr2004)
## [1] 0.1875
optimal_complexity(ttr2005)
## [1] 0.07142857
optimal_complexity(ttr2006)
## [1] 0.125
optimal_complexity(ttr2007)
## [1] 0.04347826
optimal_complexity(ttr2008)
## [1] 0.04166667
optimal_complexity(ttr2009)
## [1] 0.1052632
optimal_complexity(ttr2010)
## [1] 0.1282051
optimal_complexity(ttr2011)
## [1] 0.05882353
optimal_complexity(ttr2012)
## [1] 0.07692308
optimal_complexity(ttr2013)
## [1] 0.04545455

Each tree was then refitted with its complexity parameter set to the optimal value obtained above (truncated to three decimals).
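
The selection step can be illustrated on the `kyphosis` data that ship with rpart (not part of this study). The sketch picks the cp value with the lowest cross-validated error, as `optimal_complexity()` does, and then applies it with `prune()`. Pruning is shown only as an alternative to refitting; it is an assumption of this sketch, not the procedure used in this document.

```r
library(rpart)

# Grow a tree with a permissive starting cp, then inspect the cp table.
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis, method = "class",
             control = rpart.control(minsplit = 6, xval = 10, cp = 0.025))

# Row of the cp table with the smallest cross-validated error.
best_row <- which.min(fit$cptable[, "xerror"])
best_cp  <- fit$cptable[best_row, "CP"]

# Cut the tree back to the size associated with that cp value.
pruned <- prune(fit, cp = best_cp)
best_cp
```

Because the cross-validation folds are drawn at random, the selected value depends on the seed, which is why `set.seed()` precedes every fit above.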

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .041))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .133))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .111))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .090))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .025))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .038))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .1))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .1))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .041))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .052))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-Muslim, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .062))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .083))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .105))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .090))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .187))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .071))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .125))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .043))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .041))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .105))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .128))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .058))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .076))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-Muslim, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 11, xval = 50, cp = .045))
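
Refitting with a different cp per year can also be expressed with `Map()`, pairing each yearly data set with its own value. Everything in the sketch is hypothetical (the toy data, the column names and the two cp values); it only illustrates the pattern.

```r
library(rpart)

# Hypothetical toy panel: two years, outcome driven by x1.
set.seed(1)
toy <- data.frame(
  year = rep(c(2001, 2002), each = 40),
  x1   = rep(1:5, times = 16),
  x2   = sample(1:5, 80, replace = TRUE)
)
toy$y <- factor(as.integer(toy$x1 >= 3))

by_year    <- split(toy, toy$year)
cp_by_year <- c("2001" = 0.041, "2002" = 0.133)   # illustrative values only

# Fit one tree per year, each with its own complexity parameter.
fits <- Map(function(d, cp) {
  set.seed(619)                                   # same seed before every fit
  rpart(y ~ x1 + x2, data = d, method = "class",
        control = rpart.control(minsplit = 6, maxcompete = 11, xval = 10, cp = cp))
}, by_year, cp_by_year[names(by_year)])

sapply(fits, function(f) nrow(f$frame))           # number of nodes per tree
```

Indexing `cp_by_year` by `names(by_year)` guards against the two inputs being in a different order.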

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 73  9
##          1  0  5
##                                           
##                Accuracy : 0.8966          
##                  95% CI : (0.8127, 0.9516)
##     No Information Rate : 0.8391          
##     P-Value [Acc > NIR] : 0.089579        
##                                           
##                   Kappa : 0.4825          
##  Mcnemar's Test P-Value : 0.007661        
##                                           
##             Sensitivity : 1.0000          
##             Specificity : 0.3571          
##          Pos Pred Value : 0.8902          
##          Neg Pred Value : 1.0000          
##              Prevalence : 0.8391          
##          Detection Rate : 0.8391          
##    Detection Prevalence : 0.9425          
##       Balanced Accuracy : 0.6786          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 68 11
##          1  4  4
##                                           
##                Accuracy : 0.8276          
##                  95% CI : (0.7316, 0.9002)
##     No Information Rate : 0.8276          
##     P-Value [Acc > NIR] : 0.5684          
##                                           
##                   Kappa : 0.2589          
##  Mcnemar's Test P-Value : 0.1213          
##                                           
##             Sensitivity : 0.9444          
##             Specificity : 0.2667          
##          Pos Pred Value : 0.8608          
##          Neg Pred Value : 0.5000          
##              Prevalence : 0.8276          
##          Detection Rate : 0.7816          
##    Detection Prevalence : 0.9080          
##       Balanced Accuracy : 0.6056          
##                                           
##        'Positive' Class : 0               
## 
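
The 1.50 cut-off in the ensembling step follows from how R coerces factors. `predict(..., type = "class")` returns a factor with levels "0" and "1", and `cbind()` stores the underlying integer codes (1 and 2), so a row mean below 1.5 means that most trees voted for class 0. The sketch below illustrates this on three made-up prediction vectors; the object names and values are hypothetical and are not taken from the study data.

```r
# Three hypothetical "trees" voting on four countries (toy predictions only)
p1 <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
p2 <- factor(c("0", "1", "0", "0"), levels = c("0", "1"))
p3 <- factor(c("1", "1", "0", "0"), levels = c("0", "1"))

# cbind() keeps the integer codes of the factors: "0" -> 1, "1" -> 2
votes <- cbind(p1, p2, p3)

# Row means range from 1 (all trees vote 0) to 2 (all trees vote 1)
mean_code <- rowMeans(votes)

# Majority vote: below 1.5 means more than half of the trees predicted class 0
majority <- ifelse(mean_code < 1.5, 0, 1)
majority
```

With an odd number of trees (13 in the analyses here) the row mean can never equal 1.5 exactly, so no tie-breaking rule is needed.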

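The results above suggest that terrorism is not related to objective inequality in non-Muslim countries: neither ensemble was significantly more accurate than the no-information rate (p = .09 for domestic and p = .57 for transnational terrorism).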

What if…? Muslim countries

Muslim countries are defined as countries in which more than 50% of citizens are Muslim OR which have more than 1 million Muslim citizens (according to the 2009 estimate by the Pew Research Center). Only the two strongest predictors of domestic terrorism are used in this analysis.

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]

# formation of splitting variable
muslim_countries <- c(
  "Morocco", "Iraq", "Saudi Arabia", "Kuwait", "Gambia", "Afghanistan", "Tunisia", "Iran",
  "Azerbaijan", "Yemen", "Mauritania", "Niger", "Somalia", "Maldives", "Comoros", "Jordan",
  "Algeria", "Djibouti", "Libya", "Pakistan", "Uzbekistan", "Senegal", "Egypt", "Turkmenistan",
  "Mali", "Syria", "Bangladesh", "Kosovo", "Indonesia", "Oman", "Kyrgyzstan", "Guinea",
  "Tajikistan", "Bahrain", "Albania", "Qatar", "United Arab Emirates", "Sudan", "Sierra Leone",
  "Brunei", "Malaysia", "Lebanon", "Burkina Faso", "Kazakhstan", "Chad", "Nigeria", "India",
  "China", "Russian Federation", "Tanzania", "Cote d'Ivoire", "Mozambique", "Philippines",
  "Uganda", "Thailand", "Ghana", "Cameroon", "Kenya", "United States", "Benin", "Malawi",
  "Myanmar", "Eritrea", "Bosnia and Herzegovina", "Nepal", "Israel"
)
data2$Muslim <- ifelse(data2$country %in% muslim_countries, 1, 2)

data2 <- subset(data2, Muslim == 1)

# quantitative predictors: binned into quantile groups with Hmisc::cut2 (g = 5, i.e. quintiles)
data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
# major_violence_ep: 0 = below 1, 1 = 1 to below 3, 2 = 3 to 5, 3 = above 5
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2)))
# polR and civL: 0 = below 3, 1 = 3 to 5, 2 = above 5
data2$polR <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
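
To make the categorization rules easier to check, the sketch below applies them to small made-up vectors. The helper names (`recode_violence`, `recode_rights`, `quintile_code`) are hypothetical, and `quintile_code` is only a base-R stand-in for `cut2(x, g = 5)`, not the function used in the analysis: `cut()` closes intervals on the right, while `cut2()` is described in its documentation as closing them on the left, so values lying exactly on a cut point may be assigned differently.

```r
# Recoding rules copied from the analysis code, wrapped in hypothetical helpers
recode_violence <- function(x) {
  ifelse(x < 1, 0, ifelse(x > 5, 3, ifelse(x > 0 & x < 3, 1, 2)))
}
recode_rights <- function(x) {
  ifelse(x < 3, 0, ifelse(x > 5, 2, 1))
}

violence_codes <- recode_violence(0:7)  # toy input covering every category
rights_codes <- recode_rights(1:7)      # toy input on a 1-7 scale

# Base-R approximation of quintile binning (assumption: stands in for cut2)
quintile_code <- function(x) {
  breaks <- quantile(x, probs = seq(0, 1, by = 0.2))
  as.numeric(cut(x, breaks = breaks, include.lowest = TRUE))
}
gdp_codes <- quintile_code(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10))

violence_codes
rights_codes
gdp_codes
```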

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
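
As a compact alternative to the repeated `subset()` calls, the same split can be sketched with base R's `split()`, which returns a named list with one data frame per year. The toy data frame and the names `by_year`, `train` and `test` are assumptions made for illustration only.

```r
# Toy panel standing in for gtdd (hypothetical values)
toy <- data.frame(
  year  = rep(2001:2003, each = 2),
  dom_a = c(0, 1, 0, 0, 1, 1),
  HRP   = c(1, 5, 2, 4, 3, 3)
)

# One data frame per year, named by the year
by_year <- split(toy, toy$year)

# Last year held out as the test set, earlier years used for training
test  <- by_year[["2003"]]
train <- by_year[names(by_year) != "2003"]

names(train)
nrow(test)
```

Keeping the yearly frames in a list makes it possible to fit and evaluate the models with `lapply()` instead of one statement per year.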

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2013, method = "class", control = rpctrl)
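
The per-year fitting pattern above can also be written as a loop. The following is a minimal sketch on simulated data, assuming the `rpart` package is installed; the data, the helper `fit_year` and the reduced `xval = 10` are illustrative choices and do not reproduce the study's models.

```r
library(rpart)

set.seed(619)
# Simulated panel (hypothetical): 3 years x 40 countries
toy <- data.frame(
  year = rep(2001:2003, each = 40),
  HRP = sample(1:5, 120, replace = TRUE),
  major_violence_ep = sample(0:3, 120, replace = TRUE)
)
# Made-up outcome that depends on both predictors
toy$dom_a <- as.integer(toy$major_violence_ep >= 1 & toy$HRP <= 3)

rpctrl <- rpart.control(minsplit = 6, maxcompete = 1, xval = 10, cp = .025)

fit_year <- function(d) {
  # Same seed before every fit, as in the analysis code, so that the
  # cross-validation folds are reproducible
  set.seed(619)
  rpart(as.factor(dom_a) ~ HRP + major_violence_ep,
        data = d, method = "class", control = rpctrl)
}

# One classification tree per year, stored in a named list
trees <- lapply(split(toy, toy$year), fit_year)
names(trees)
```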

Optimization

The initial complexity parameter (cp) was set to .025 in order to exclude splits (and thus variables) that would not contribute appreciably to the model's predictive performance. This value served as the starting point for estimating the optimal complexity parameter, shown below; it is also the smallest value the search can return.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.2
optimal_complexity(dtr2002) 
## [1] 0.025
optimal_complexity(dtr2003) 
## [1] 0.025
optimal_complexity(dtr2004)
## [1] 0.025
optimal_complexity(dtr2005)
## [1] 0.025
optimal_complexity(dtr2006)
## [1] 0.025
optimal_complexity(dtr2007)
## [1] 0.025
optimal_complexity(dtr2008)
## [1] 0.1176471
optimal_complexity(dtr2009)
## [1] 0.0625
optimal_complexity(dtr2010)
## [1] 0.025
optimal_complexity(dtr2011)
## [1] 0.04166667
optimal_complexity(dtr2012)
## [1] 0.025
optimal_complexity(dtr2013)
## [1] 0.025
# transnational

optimal_complexity(ttr2001)
## [1] 0.025
optimal_complexity(ttr2002)
## [1] 0.025
optimal_complexity(ttr2003)
## [1] 0.025
optimal_complexity(ttr2004)
## [1] 0.025
optimal_complexity(ttr2005)
## [1] 0.025
optimal_complexity(ttr2006)
## [1] 0.025
optimal_complexity(ttr2007)
## [1] 0.03703704
optimal_complexity(ttr2008)
## [1] 0.025
optimal_complexity(ttr2009)
## [1] 0.025
optimal_complexity(ttr2010)
## [1] 0.025
optimal_complexity(ttr2011)
## [1] 0.025
optimal_complexity(ttr2012)
## [1] 0.04166667
optimal_complexity(ttr2013)
## [1] 0.025

The decision trees were then refitted with cp set to the values obtained above, truncated to three decimal places.
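
The selection rule applied by `optimal_complexity()` can be illustrated without fitting a tree: it picks the row of the `cptable` with the smallest cross-validated error (`xerror`) and returns that row's `CP`. In the sketch below, `toy_fit` is a hypothetical stand-in that mimics only the `cptable` element of a fitted `rpart` object, and its numbers are invented.

```r
optimal_complexity <- function(x) {
  x$cptable[which.min(x$cptable[, "xerror"]), "CP"]
}

# Hypothetical cptable: rows are ordered by decreasing CP, as in rpart output
toy_fit <- list(cptable = matrix(
  c(0.2000, 0, 1.00,
    0.0625, 1, 0.80,
    0.0250, 3, 0.95),
  ncol = 3, byrow = TRUE,
  dimnames = list(c("1", "2", "3"), c("CP", "nsplit", "xerror"))
))

best_cp <- optimal_complexity(toy_fit)
best_cp
```

Because `which.min()` returns the first minimum and the table is ordered by decreasing CP, ties in `xerror` are resolved in favour of the larger CP, that is, the simpler tree.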

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .2))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .117))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .062))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .041))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~HRP+major_violence_ep, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .037))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .041))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~HRP+major_violence_ep, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 1, xval = 50, cp = .025))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 29 10
##          1  1 14
##                                           
##                Accuracy : 0.7963          
##                  95% CI : (0.6647, 0.8937)
##     No Information Rate : 0.5556          
##     P-Value [Acc > NIR] : 0.0001925       
##                                           
##                   Kappa : 0.5714          
##  Mcnemar's Test P-Value : 0.0158613       
##                                           
##             Sensitivity : 0.9667          
##             Specificity : 0.5833          
##          Pos Pred Value : 0.7436          
##          Neg Pred Value : 0.9333          
##              Prevalence : 0.5556          
##          Detection Rate : 0.5370          
##    Detection Prevalence : 0.7222          
##       Balanced Accuracy : 0.7750          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 20 19
##          1  3 12
##                                           
##                Accuracy : 0.5926          
##                  95% CI : (0.4503, 0.7243)
##     No Information Rate : 0.5741          
##     P-Value [Acc > NIR] : 0.448024        
##                                           
##                   Kappa : 0.2355          
##  Mcnemar's Test P-Value : 0.001384        
##                                           
##             Sensitivity : 0.8696          
##             Specificity : 0.3871          
##          Pos Pred Value : 0.5128          
##          Neg Pred Value : 0.8000          
##              Prevalence : 0.4259          
##          Detection Rate : 0.3704          
##    Detection Prevalence : 0.7222          
##       Balanced Accuracy : 0.6283          
##                                           
##        'Positive' Class : 0               
## 
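
The statistics reported by `confusionMatrix()` can be recomputed by hand, which helps when reading the output. The sketch below does so for the domestic confusion matrix above (29, 10 / 1, 14), with class 0 treated as the positive class as in the output; the variable names are chosen for illustration.

```r
# Domestic confusion matrix from the output above (rows = prediction, columns = reference)
cm <- matrix(c(29, 1, 10, 14), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)           # share of correct predictions
nir         <- max(colSums(cm)) / sum(cm)        # no-information rate: share of the largest class
sensitivity <- cm["0", "0"] / sum(cm[, "0"])     # class 0 cases predicted as 0
specificity <- cm["1", "1"] / sum(cm[, "1"])     # class 1 cases predicted as 1
balanced_accuracy <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, nir = nir, sensitivity = sensitivity,
        specificity = specificity, balanced_accuracy = balanced_accuracy), 4)
```

Since 0 (no attacks) is the positive class, the specificity is the share of countries with attacks that the ensemble identified correctly.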

Variable importance for domestic terrorism

# with surrogates only (dtr2001 is not included; it has no splits, see "Location of splits")

xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = T, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = T)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates

xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2009, surrogates = F, competes = F)
vimp9 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp9, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, col = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(8, 16)) + scale_color_manual(values = c("gray34", "goldenrod")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2)

Location of splits

dtr2001
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 56 15 0 (0.7321429 0.2678571) *
dtr2002
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 56 17 0 (0.6964286 0.3035714)  
##   2) major_violence_ep< 0.5 42  6 0 (0.8571429 0.1428571) *
##   3) major_violence_ep>=0.5 14  3 1 (0.2142857 0.7857143) *
dtr2003
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 56 15 0 (0.73214286 0.26785714)  
##   2) major_violence_ep< 0.5 39  1 0 (0.97435897 0.02564103) *
##   3) major_violence_ep>=0.5 17  3 1 (0.17647059 0.82352941) *
dtr2004
## n= 57 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 57 17 0 (0.7017544 0.2982456)  
##   2) HRP>=2.5 32  1 0 (0.9687500 0.0312500) *
##   3) HRP< 2.5 25  9 1 (0.3600000 0.6400000)  
##     6) major_violence_ep< 0.5 8  3 0 (0.6250000 0.3750000) *
##     7) major_violence_ep>=0.5 17  4 1 (0.2352941 0.7647059) *
dtr2005
## n= 58 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 58 15 0 (0.7413793 0.2586207)  
##    2) HRP>=2.5 32  1 0 (0.9687500 0.0312500) *
##    3) HRP< 2.5 26 12 1 (0.4615385 0.5384615)  
##      6) major_violence_ep< 2.5 23 11 0 (0.5217391 0.4782609)  
##       12) major_violence_ep< 0.5 9  3 0 (0.6666667 0.3333333)  
##         24) HRP>=1.5 6  1 0 (0.8333333 0.1666667) *
##         25) HRP< 1.5 3  1 1 (0.3333333 0.6666667) *
##       13) major_violence_ep>=0.5 14  6 1 (0.4285714 0.5714286) *
##      7) major_violence_ep>=2.5 3  0 1 (0.0000000 1.0000000) *
dtr2006
## n= 58 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 58 20 0 (0.6551724 0.3448276)  
##   2) major_violence_ep< 0.5 39  5 0 (0.8717949 0.1282051) *
##   3) major_violence_ep>=0.5 19  4 1 (0.2105263 0.7894737) *
dtr2007
## n= 58 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 58 23 0 (0.60344828 0.39655172)  
##   2) HRP>=2.5 34  5 0 (0.85294118 0.14705882)  
##     4) major_violence_ep< 0.5 31  3 0 (0.90322581 0.09677419) *
##     5) major_violence_ep>=0.5 3  1 1 (0.33333333 0.66666667) *
##   3) HRP< 2.5 24  6 1 (0.25000000 0.75000000) *
dtr2008
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 56 17 0 (0.6964286 0.3035714)  
##   2) major_violence_ep< 0.5 40  6 0 (0.8500000 0.1500000) *
##   3) major_violence_ep>=0.5 16  5 1 (0.3125000 0.6875000)  
##     6) HRP>=2.5 2  0 0 (1.0000000 0.0000000) *
##     7) HRP< 2.5 14  3 1 (0.2142857 0.7857143) *
dtr2009
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 56 16 0 (0.7142857 0.2857143)  
##    2) HRP>=2.5 32  3 0 (0.9062500 0.0937500) *
##    3) HRP< 2.5 24 11 1 (0.4583333 0.5416667)  
##      6) major_violence_ep< 1.5 14  5 0 (0.6428571 0.3571429)  
##       12) HRP< 1.5 4  0 0 (1.0000000 0.0000000) *
##       13) HRP>=1.5 10  5 0 (0.5000000 0.5000000)  
##         26) major_violence_ep< 0.5 8  3 0 (0.6250000 0.3750000) *
##         27) major_violence_ep>=0.5 2  0 1 (0.0000000 1.0000000) *
##      7) major_violence_ep>=1.5 10  2 1 (0.2000000 0.8000000)  
##       14) HRP>=1.5 2  0 0 (1.0000000 0.0000000) *
##       15) HRP< 1.5 8  0 1 (0.0000000 1.0000000) *
dtr2010
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 56 15 0 (0.7321429 0.2678571)  
##   2) major_violence_ep< 0.5 41  5 0 (0.8780488 0.1219512) *
##   3) major_violence_ep>=0.5 15  5 1 (0.3333333 0.6666667) *
dtr2011
## n= 56 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 56 24 0 (0.5714286 0.4285714)  
##    2) major_violence_ep< 1.5 47 15 0 (0.6808511 0.3191489)  
##      4) major_violence_ep< 0.5 40 11 0 (0.7250000 0.2750000) *
##      5) major_violence_ep>=0.5 7  3 1 (0.4285714 0.5714286)  
##       10) HRP< 1.5 3  1 0 (0.6666667 0.3333333) *
##       11) HRP>=1.5 4  1 1 (0.2500000 0.7500000) *
##    3) major_violence_ep>=1.5 9  0 1 (0.0000000 1.0000000) *
dtr2012
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 54 24 0 (0.55555556 0.44444444)  
##   2) major_violence_ep< 0.5 40 11 0 (0.72500000 0.27500000) *
##   3) major_violence_ep>=0.5 14  1 1 (0.07142857 0.92857143) *
dtr2013
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 54 24 0 (0.55555556 0.44444444)  
##   2) major_violence_ep< 0.5 39 10 0 (0.74358974 0.25641026) *
##   3) major_violence_ep>=0.5 15  1 1 (0.06666667 0.93333333) *
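
Each node line in the printouts above lists the split, the number of observations (n), the number of misclassified observations (loss), the predicted class (yval) and the class probabilities (yprob). The sketch below recomputes yprob from n, loss and yval for the two terminal nodes of dtr2002; the helper `node_probs` is a hypothetical name introduced for illustration.

```r
# Class probabilities (class 0, class 1) implied by a node's n, loss and yval
node_probs <- function(n, loss, yval) {
  majority <- (n - loss) / n   # share of the predicted class
  minority <- loss / n         # share of the other class
  if (yval == 0) c(majority, minority) else c(minority, majority)
}

node2 <- node_probs(n = 42, loss = 6, yval = 0)  # major_violence_ep < 0.5
node3 <- node_probs(n = 14, loss = 3, yval = 1)  # major_violence_ep >= 0.5

round(node2, 7)
round(node3, 7)
```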

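The results suggest that protection of human rights and major episodes of political violence are sufficient predictors of domestic terrorism in Muslim countries: the domestic ensemble was significantly more accurate than the no-information rate (accuracy = .80 vs. .56, p < .001), whereas the transnational ensemble was not (accuracy = .59 vs. .57, p = .45).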

What if…? Muslim countries (2)

Muslim countries are defined here as countries located in Africa or Asia in which Muslims either make up more than 50% of the population or number more than 1 million (according to the 2009 estimate by the Pew Research Center).

Activation of libraries and dataset

# library activation

lapply(c("readr", "reshape2", "plyr", "dplyr", "tidyr", "Hmisc", "caret", "rpart", "plotly"), library, character.only = T)

# dataset preparation

data1 <- read_delim("post-prepared.csv", ";", escape_double = FALSE, locale = locale(decimal_mark = ",", grouping_mark = "."), trim_ws = TRUE)

Categorization of explanatory variables

data2 <- data1[complete.cases(data1), ]

# formation of splitting variable
muslim_countries <- c("Morocco", "Iraq", "Saudi Arabia", "Kuwait", "Gambia", "Afghanistan", "Tunisia", "Iran",
                      "Azerbaijan", "Yemen", "Mauritania", "Niger", "Somalia", "Maldives", "Comoros", "Jordan",
                      "Algeria", "Djibouti", "Libya", "Pakistan", "Uzbekistan", "Senegal", "Egypt", "Turkmenistan",
                      "Mali", "Syria", "Bangladesh", "Indonesia", "Oman", "Kyrgyzstan", "Guinea", "Tajikistan",
                      "Bahrain", "Qatar", "United Arab Emirates", "Sudan", "Sierra Leone", "Brunei", "Malaysia",
                      "Lebanon", "Burkina Faso", "Kazakhstan", "Chad", "Nigeria", "India", "China",
                      "Russian Federation", "Tanzania", "Cote d'Ivoire", "Mozambique", "Philippines", "Uganda",
                      "Thailand", "Ghana", "Cameroon", "Kenya", "Benin", "Malawi", "Myanmar", "Eritrea", "Nepal",
                      "Israel")
# 1 = country on the list above, 2 = any other country
data2$Muslim <- ifelse(data2$country %in% muslim_countries, 1, 2)

data2 <- subset(data2, Muslim == 1)

data2$GDP <- as.numeric(cut2(data2$GDP, g=5))
data2$GDP_gr <- as.numeric(cut2(data2$GDP_gr, g=5))
data2$HDI <- as.numeric(cut2(data2$HDI, g=5))
data2$pop_density  <- as.numeric(cut2(data2$pop_density, g=5))
data2$durable  <- as.numeric(cut2(data2$durable, g=5))
data2$HRP <- as.numeric(cut2(data2$HRP, g=5))
data2$major_violence_ep <- ifelse(data2$major_violence_ep < 1, 0, ifelse(data2$major_violence_ep > 5, 3, ifelse(data2$major_violence_ep > 0 & data2$major_violence_ep < 3, 1, 2))) 
data2$polR  <- ifelse(data2$polR < 3, 0, ifelse(data2$polR > 5, 2, 1))
data2$civL <- ifelse(data2$civL < 3, 0, ifelse(data2$civL > 5, 2, 1))
data2$GINI  <- as.numeric(cut2(data2$GINI, g=5))
data2$education <- as.numeric(cut2(data2$education, g=5))
data2$unemployment <- as.numeric(cut2(data2$unemployment, g=5))
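
To make the threshold recoding above easier to check, the following minimal sketch applies the same nested ifelse() rules to a few hypothetical input values. The helper names recode_violence and recode_rights and the example vectors are illustrative only and are not part of the original analysis.

```r
# Same rule as applied to major_violence_ep above:
# 0 -> 0, 1-2 -> 1, 3-5 -> 2, values above 5 -> 3
recode_violence <- function(x) {
  ifelse(x < 1, 0, ifelse(x > 5, 3, ifelse(x > 0 & x < 3, 1, 2)))
}

# Same rule as applied to polR and civL above:
# values below 3 -> 0, 3-5 -> 1, values above 5 -> 2
recode_rights <- function(x) {
  ifelse(x < 3, 0, ifelse(x > 5, 2, 1))
}

violence_example <- c(0, 1, 2, 3, 5, 6)  # hypothetical scores
rights_example   <- 1:7                   # hypothetical ratings

recode_violence(violence_example)  # 0 1 1 2 2 3
recode_rights(rights_example)      # 0 0 1 1 1 2 2
```

The remaining quantitative variables are binned into quantile groups by cut2(x, g = 5) and converted with as.numeric(), so they typically take the values 1 to 5. This is why the split points reported in the tree outputs fall at values such as 3.5 or 4.5.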

Splitting the database

First, the irrelevant variables were removed; the data were then split by year.

data2 <- data2[,-1]
gtdd <- data2[, -c(1,3,4,5,6,16,17)] #domestic
gtdt <- data2[, -c(1,3,4,5,6,15,17)] #transnational

# domestic

gtdd2001 <- subset(gtdd, year == 2001)
gtdd2002 <- subset(gtdd, year == 2002)
gtdd2003 <- subset(gtdd, year == 2003)
gtdd2004 <- subset(gtdd, year == 2004)
gtdd2005 <- subset(gtdd, year == 2005)
gtdd2006 <- subset(gtdd, year == 2006)
gtdd2007 <- subset(gtdd, year == 2007)
gtdd2008 <- subset(gtdd, year == 2008)
gtdd2009 <- subset(gtdd, year == 2009)
gtdd2010 <- subset(gtdd, year == 2010)
gtdd2011 <- subset(gtdd, year == 2011)
gtdd2012 <- subset(gtdd, year == 2012)
gtdd2013 <- subset(gtdd, year == 2013)
gtdd2014 <- subset(gtdd, year == 2014)

# transnational

gtdt2001 <- subset(gtdt, year == 2001)
gtdt2002 <- subset(gtdt, year == 2002)
gtdt2003 <- subset(gtdt, year == 2003)
gtdt2004 <- subset(gtdt, year == 2004)
gtdt2005 <- subset(gtdt, year == 2005)
gtdt2006 <- subset(gtdt, year == 2006)
gtdt2007 <- subset(gtdt, year == 2007)
gtdt2008 <- subset(gtdt, year == 2008)
gtdt2009 <- subset(gtdt, year == 2009)
gtdt2010 <- subset(gtdt, year == 2010)
gtdt2011 <- subset(gtdt, year == 2011)
gtdt2012 <- subset(gtdt, year == 2012)
gtdt2013 <- subset(gtdt, year == 2013)
gtdt2014 <- subset(gtdt, year == 2014)
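
The per-year subsets above could also be produced in one step with base R's split(). The sketch below shows the idea on a small hypothetical data frame (toy and its values are made up for illustration). It is an alternative formulation, not the code used in the analysis.

```r
# Hypothetical miniature of the prepared data: two years, five rows
toy <- data.frame(
  year  = c(2001, 2001, 2002, 2002, 2002),
  dom_a = c(0, 1, 0, 0, 1),
  GINI  = c(3, 5, 2, 4, 1)
)

# One list element per year, named after the year
by_year <- split(toy, toy$year)

names(by_year)           # "2001" "2002"
nrow(by_year[["2002"]])  # 3
by_year[["2001"]]        # same rows as subset(toy, year == 2001)
```

Keeping the yearly data in a named list would allow later steps to be written with lapply() instead of one line per year, at the cost of changing the object names used throughout this document.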

Training of decision trees

rpctrl <- rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025)

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2001, method = "class", control = rpctrl)
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2002, method = "class", control = rpctrl)
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2003, method = "class", control = rpctrl)
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2004, method = "class", control = rpctrl)
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2005, method = "class", control = rpctrl)
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2006, method = "class", control = rpctrl)
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2007, method = "class", control = rpctrl)
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2008, method = "class", control = rpctrl)
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2009, method = "class", control = rpctrl)
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2010, method = "class", control = rpctrl)
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2011, method = "class", control = rpctrl)
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2012, method = "class", control = rpctrl)
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2013, method = "class", control = rpctrl)

# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2001, method = "class", control = rpctrl)
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2002, method = "class", control = rpctrl)
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2003, method = "class", control = rpctrl)
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2004, method = "class", control = rpctrl)
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2005, method = "class", control = rpctrl)
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2006, method = "class", control = rpctrl)
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2007, method = "class", control = rpctrl)
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2008, method = "class", control = rpctrl)
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2009, method = "class", control = rpctrl)
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2010, method = "class", control = rpctrl)
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2011, method = "class", control = rpctrl)
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2012, method = "class", control = rpctrl)
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2013, method = "class", control = rpctrl)

Optimization

The initial complexity parameter (cp) was set to .025 so that splits contributing little to the predictive performance of the model would not be retained. This value served as the starting point for estimating the optimal complexity parameter of each tree, i.e. the cp value with the lowest cross-validated error (xerror) in the tree's cp table, as shown below.

# finding the optimal complexity parameter 

optimal_complexity <- function(x){x$cptable[which.min(x$cptable[,"xerror"]),"CP"]}

# domestic

optimal_complexity(dtr2001) 
## [1] 0.08
optimal_complexity(dtr2002) 
## [1] 0.1875
optimal_complexity(dtr2003) 
## [1] 0.1333333
optimal_complexity(dtr2004)
## [1] 0.07843137
optimal_complexity(dtr2005)
## [1] 0.1333333
optimal_complexity(dtr2006)
## [1] 0.2105263
optimal_complexity(dtr2007)
## [1] 0.09090909
optimal_complexity(dtr2008)
## [1] 0.1764706
optimal_complexity(dtr2009)
## [1] 0.025
optimal_complexity(dtr2010)
## [1] 0.06666667
optimal_complexity(dtr2011)
## [1] 0.02898551
optimal_complexity(dtr2012)
## [1] 0.025
optimal_complexity(dtr2013)
## [1] 0.2608696
# transnational

optimal_complexity(ttr2001)
## [1] 0.1176471
optimal_complexity(ttr2002)
## [1] 0.1388889
optimal_complexity(ttr2003)
## [1] 0.2307692
optimal_complexity(ttr2004)
## [1] 0.09090909
optimal_complexity(ttr2005)
## [1] 0.1388889
optimal_complexity(ttr2006)
## [1] 0.2272727
optimal_complexity(ttr2007)
## [1] 0.025
optimal_complexity(ttr2008)
## [1] 0.1304348
optimal_complexity(ttr2009)
## [1] 0.04
optimal_complexity(ttr2010)
## [1] 0.04166667
optimal_complexity(ttr2011)
## [1] 0.12
optimal_complexity(ttr2012)
## [1] 0.0625
optimal_complexity(ttr2013)
## [1] 0.04545455

Each decision tree was then retrained with its complexity parameter set according to the value estimated above.
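
As a self-contained illustration of this step, the sketch below applies the same optimal_complexity() helper to the kyphosis data set that ships with rpart, used here as a stand-in for the yearly data, which are not reproduced in this document. It also shows prune(), which trims an already fitted tree at a given cp. Pruning is an alternative to refitting with a new cp (the approach taken in this document) and is included only for comparison.

```r
library(rpart)

# Helper from the analysis: cp of the cp-table row with the lowest
# cross-validated error
optimal_complexity <- function(x) {
  x$cptable[which.min(x$cptable[, "xerror"]), "CP"]
}

# Stand-in data: kyphosis is bundled with rpart
set.seed(619)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class",
             control = rpart.control(minsplit = 6, xval = 10, cp = 0.025))

fit$cptable                      # columns: CP, nsplit, rel error, xerror, xstd
best_cp <- optimal_complexity(fit)

# Alternative to refitting: prune the existing tree at the selected cp
pruned <- prune(fit, cp = best_cp)
```

Because the cross-validation folds are drawn at random, the selected cp depends on the seed, which is presumably why set.seed(619) precedes every rpart() call in the analysis.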

Post-optimization training of decision trees

# domestic

set.seed(619)
dtr2001 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .08))
set.seed(619)
dtr2002 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .187))
set.seed(619)
dtr2003 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .133))
set.seed(619)
dtr2004 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .078))
set.seed(619)
dtr2005 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .133))
set.seed(619)
dtr2006 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .210))
set.seed(619)
dtr2007 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .090))
set.seed(619)
dtr2008 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .176))
set.seed(619)
dtr2009 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
dtr2010 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .066))
set.seed(619)
dtr2011 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .028))
set.seed(619)
dtr2012 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
# note: the optimum estimated above for 2013 was 0.2608696, whereas cp = .026 is used here
set.seed(619)
dtr2013 <- rpart(as.factor(dom_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdd2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .026))


# transnational

set.seed(619)
ttr2001 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2001, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .117))
set.seed(619)
ttr2002 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2002, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .138))
set.seed(619)
ttr2003 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2003, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .230))
set.seed(619)
ttr2004 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2004, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .090))
set.seed(619)
ttr2005 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2005, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .138))
set.seed(619)
ttr2006 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2006, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .227))
set.seed(619)
ttr2007 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2007, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .025))
set.seed(619)
ttr2008 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2008, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .130))
set.seed(619)
ttr2009 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2009, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .04))
set.seed(619)
ttr2010 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2010, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .041))
set.seed(619)
ttr2011 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2011, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .12))
set.seed(619)
ttr2012 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2012, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .062))
set.seed(619)
ttr2013 <- rpart(as.factor(trans_a)~.-year-Muslim-HRP-major_violence_ep, data = gtdt2013, method = "class", control = rpart.control(minsplit = 6, maxcompete = 9, xval = 50, cp = .045))

Prediction against the test set

# domestic

preddtr2001 <- predict(dtr2001, gtdd2014, type = "class")
preddtr2002 <- predict(dtr2002, gtdd2014, type = "class")
preddtr2003 <- predict(dtr2003, gtdd2014, type = "class")
preddtr2004 <- predict(dtr2004, gtdd2014, type = "class")
preddtr2005 <- predict(dtr2005, gtdd2014, type = "class")
preddtr2006 <- predict(dtr2006, gtdd2014, type = "class")
preddtr2007 <- predict(dtr2007, gtdd2014, type = "class")
preddtr2008 <- predict(dtr2008, gtdd2014, type = "class")
preddtr2009 <- predict(dtr2009, gtdd2014, type = "class")
preddtr2010 <- predict(dtr2010, gtdd2014, type = "class")
preddtr2011 <- predict(dtr2011, gtdd2014, type = "class")
preddtr2012 <- predict(dtr2012, gtdd2014, type = "class")
preddtr2013 <- predict(dtr2013, gtdd2014, type = "class")

# transnational

predttr2001 <- predict(ttr2001, gtdt2014, type = "class")
predttr2002 <- predict(ttr2002, gtdt2014, type = "class")
predttr2003 <- predict(ttr2003, gtdt2014, type = "class")
predttr2004 <- predict(ttr2004, gtdt2014, type = "class")
predttr2005 <- predict(ttr2005, gtdt2014, type = "class")
predttr2006 <- predict(ttr2006, gtdt2014, type = "class")
predttr2007 <- predict(ttr2007, gtdt2014, type = "class")
predttr2008 <- predict(ttr2008, gtdt2014, type = "class")
predttr2009 <- predict(ttr2009, gtdt2014, type = "class")
predttr2010 <- predict(ttr2010, gtdt2014, type = "class")
predttr2011 <- predict(ttr2011, gtdt2014, type = "class")
predttr2012 <- predict(ttr2012, gtdt2014, type = "class")
predttr2013 <- predict(ttr2013, gtdt2014, type = "class")

Model ensembling and final evaluation

# domestic
ensdtr <- cbind(preddtr2001, preddtr2002, preddtr2003, preddtr2004, preddtr2005, preddtr2006, preddtr2007, preddtr2008, preddtr2009, preddtr2010, preddtr2011, preddtr2012, preddtr2013)
ensdtr1 <- rowMeans(ensdtr)
ensdtr_fin <- ifelse(ensdtr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensdtr_fin), as.factor(gtdd2014$dom_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 26 10
##          1  3 13
##                                           
##                Accuracy : 0.75            
##                  95% CI : (0.6105, 0.8597)
##     No Information Rate : 0.5577          
##     P-Value [Acc > NIR] : 0.003303        
##                                           
##                   Kappa : 0.4768          
##  Mcnemar's Test P-Value : 0.096092        
##                                           
##             Sensitivity : 0.8966          
##             Specificity : 0.5652          
##          Pos Pred Value : 0.7222          
##          Neg Pred Value : 0.8125          
##              Prevalence : 0.5577          
##          Detection Rate : 0.5000          
##    Detection Prevalence : 0.6923          
##       Balanced Accuracy : 0.7309          
##                                           
##        'Positive' Class : 0               
## 
# transnational

ensttr <- cbind(predttr2001, predttr2002, predttr2003, predttr2004, predttr2005, predttr2006, predttr2007, predttr2008, predttr2009, predttr2010, predttr2011, predttr2012, predttr2013)
ensttr1 <- rowMeans(ensttr)
ensttr_fin <- ifelse(ensttr1 < 1.50, 0, 1)
confusionMatrix(as.factor(ensttr_fin), as.factor(gtdt2014$trans_a))
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  0  1
##          0 19 14
##          1  3 16
##                                           
##                Accuracy : 0.6731          
##                  95% CI : (0.5289, 0.7967)
##     No Information Rate : 0.5769          
##     P-Value [Acc > NIR] : 0.10225         
##                                           
##                   Kappa : 0.3722          
##  Mcnemar's Test P-Value : 0.01529         
##                                           
##             Sensitivity : 0.8636          
##             Specificity : 0.5333          
##          Pos Pred Value : 0.5758          
##          Neg Pred Value : 0.8421          
##              Prevalence : 0.4231          
##          Detection Rate : 0.3654          
##    Detection Prevalence : 0.6346          
##       Balanced Accuracy : 0.6985          
##                                           
##        'Positive' Class : 0               
## 
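
Two details of the code above may be worth spelling out. First, cbind() on factors keeps only their internal integer codes, so a predicted class "0" is stored as 1 and a class "1" as 2; a row mean below 1.5 therefore means that most trees voted for class 0. Second, the reported statistics follow directly from the confusion matrix counts. The sketch below illustrates both, using three hypothetical trees (p1 to p3 are made-up predictions) and the domestic confusion matrix printed above.

```r
# Made-up class predictions of three trees for four cases
p1 <- factor(c("0", "1", "1", "0"), levels = c("0", "1"))
p2 <- factor(c("0", "1", "0", "0"), levels = c("0", "1"))
p3 <- factor(c("1", "1", "0", "0"), levels = c("0", "1"))

votes <- cbind(p1, p2, p3)        # integer codes: "0" -> 1, "1" -> 2
vote_mean <- rowMeans(votes)      # 1.33 2.00 1.33 1.00
ensemble <- ifelse(vote_mean < 1.5, 0, 1)
ensemble                          # 0 1 0 0

# Domestic confusion matrix as reported above (rows = prediction,
# columns = reference); matrix() fills column by column
cm <- matrix(c(26, 3, 10, 13), nrow = 2,
             dimnames = list(Prediction = c("0", "1"),
                             Reference  = c("0", "1")))

accuracy    <- sum(diag(cm)) / sum(cm)          # 39 / 52 = 0.75
sensitivity <- cm["0", "0"] / sum(cm[, "0"])    # 26 / 29, about 0.8966
specificity <- cm["1", "1"] / sum(cm[, "1"])    # 13 / 23, about 0.5652
```

Since the positive class is 0, sensitivity here is the share of country-years without attacks that were classified correctly. With 13 trees, an odd number, the vote cannot end in a tie.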

Variable importance for domestic terrorism

# with surrogates only

xxx <- varImp(dtr2001, surrogates = T, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = T, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = T, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = T, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = T, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = T, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = T, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = T, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = T, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = T, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = T, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = T, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1)
# with competitors only

xxx <- varImp(dtr2001, surrogates = F, competes = T)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = T)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = T)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = T)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = T)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = T)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = T)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = T)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = T)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = T)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = T)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = T)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot1 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot1) 
# without competitors and surrogates

xxx <- varImp(dtr2001, surrogates = F, competes = F)
vimp1 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2002, surrogates = F, competes = F)
vimp2 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2003, surrogates = F, competes = F)
vimp3 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2004, surrogates = F, competes = F)
vimp4 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2005, surrogates = F, competes = F)
vimp5 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2006, surrogates = F, competes = F)
vimp6 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2007, surrogates = F, competes = F)
vimp7 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2008, surrogates = F, competes = F)
vimp8 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2010, surrogates = F, competes = F)
vimp10 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2011, surrogates = F, competes = F)
vimp11 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2012, surrogates = F, competes = F)
vimp12 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))
xxx <- varImp(dtr2013, surrogates = F, competes = F)
vimp13 <- cbind(rownames(xxx), data.frame(xxx, row.names=NULL))

vimps_d1 <- join_all(list(vimp1, vimp2, vimp3, vimp4, vimp5, vimp6, vimp7, vimp8, vimp10, vimp11, vimp12, vimp13), type = "left", by = "rownames(xxx)")
colnames(vimps_d1) <- c("variable", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008", "2010", "2011", "2012", "2013")
vimps_d1m <- melt(vimps_d1, id.vars = "variable", variable.name = "year")
VarImpPlot2 <- ggplot(vimps_d1m, aes(x = year, y = value, color = variable, shape = variable)) + geom_point(size = 3) + scale_shape_manual(values = c(0,1,2,3,4,5,6,15,17,18)) + scale_color_manual(values = c("plum", "purple", "red", "turquoise", "royalblue", "yellow", "springgreen", "darkgreen", "cornsilk3", "deeppink")) + theme(panel.grid.major = element_blank(), panel.background = element_rect(fill = "white"), axis.line = element_line(size = 0.5, linetype = "solid", color = "black")) + ylab("reduction in the loss function")
ggplotly(VarImpPlot2) 
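
The three blocks above repeat the same extraction for every yearly tree. As a rough sketch of how such a step could be written as a loop, the example below fits one tree per group and stacks the importance scores into a long table. Note the assumptions: it uses the kyphosis data bundled with rpart, split into two arbitrary halves "A" and "B" in place of years, and it reads rpart's own variable.importance component rather than calling caret's varImp(), so the values are not comparable with the plots above.

```r
library(rpart)

# Two arbitrary groups standing in for years
groups <- split(kyphosis, rep(c("A", "B"), length.out = nrow(kyphosis)))

# One classification tree per group
fits <- lapply(groups, function(d) {
  rpart(Kyphosis ~ ., data = d, method = "class",
        control = rpart.control(minsplit = 6, cp = 0.025))
})

# Stack the importance scores in long format; a tree without splits
# has no variable.importance component and is skipped
imp_long <- do.call(rbind, lapply(names(fits), function(g) {
  vi <- fits[[g]]$variable.importance
  if (is.null(vi)) return(NULL)
  data.frame(variable = names(vi), group = g,
             value = as.numeric(vi), row.names = NULL)
}))

imp_long
```

The resulting table has the same long layout (variable, grouping column, value) as vimps_d1m, so it could be passed to a ggplot() call of the kind used above.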

Location of splits

dtr2001
## n= 53 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 53 15 0 (0.7169811 0.2830189)  
##   2) pop_density< 4.5 44  9 0 (0.7954545 0.2045455) *
##   3) pop_density>=4.5 9  3 1 (0.3333333 0.6666667)  
##     6) polR>=1.5 4  1 0 (0.7500000 0.2500000) *
##     7) polR< 1.5 5  0 1 (0.0000000 1.0000000) *
dtr2002
## n= 53 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 53 16 0 (0.6981132 0.3018868)  
##   2) pop_density< 3.5 35  6 0 (0.8285714 0.1714286) *
##   3) pop_density>=3.5 18  8 1 (0.4444444 0.5555556)  
##     6) GINI>=4.5 4  0 0 (1.0000000 0.0000000) *
##     7) GINI< 4.5 14  4 1 (0.2857143 0.7142857) *
dtr2003
## n= 53 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 53 15 0 (0.7169811 0.2830189)  
##    2) pop_density< 3.5 35  5 0 (0.8571429 0.1428571) *
##    3) pop_density>=3.5 18  8 1 (0.4444444 0.5555556)  
##      6) GINI>=4.5 4  0 0 (1.0000000 0.0000000) *
##      7) GINI< 4.5 14  4 1 (0.2857143 0.7142857)  
##       14) GDP_gr>=4.5 4  1 0 (0.7500000 0.2500000) *
##       15) GDP_gr< 4.5 10  1 1 (0.1000000 0.9000000) *
dtr2004
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 54 17 0 (0.6851852 0.3148148)  
##    2) pop_density< 3.5 36  7 0 (0.8055556 0.1944444)  
##      4) durable>=1.5 28  3 0 (0.8928571 0.1071429) *
##      5) durable< 1.5 8  4 0 (0.5000000 0.5000000)  
##       10) GDP< 2.5 6  2 0 (0.6666667 0.3333333)  
##         20) GINI< 4.5 4  0 0 (1.0000000 0.0000000) *
##         21) GINI>=4.5 2  0 1 (0.0000000 1.0000000) *
##       11) GDP>=2.5 2  0 1 (0.0000000 1.0000000) *
##    3) pop_density>=3.5 18  8 1 (0.4444444 0.5555556)  
##      6) GINI>=4.5 4  0 0 (1.0000000 0.0000000) *
##      7) GINI< 4.5 14  4 1 (0.2857143 0.7142857) *
dtr2005
## n= 55 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 55 15 0 (0.7272727 0.2727273)  
##   2) pop_density< 4.5 45  9 0 (0.8000000 0.2000000) *
##   3) pop_density>=4.5 10  4 1 (0.4000000 0.6000000)  
##     6) GINI>=1.5 6  2 0 (0.6666667 0.3333333) *
##     7) GINI< 1.5 4  0 1 (0.0000000 1.0000000) *
dtr2006
## n= 55 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 55 19 0 (0.6545455 0.3454545)  
##   2) pop_density< 4.5 45 12 0 (0.7333333 0.2666667) *
##   3) pop_density>=4.5 10  3 1 (0.3000000 0.7000000) *
dtr2007
## n= 55 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 55 22 0 (0.6000000 0.4000000)  
##   2) durable>=1.5 40 11 0 (0.7250000 0.2750000)  
##     4) pop_density< 4.5 36  8 0 (0.7777778 0.2222222) *
##     5) pop_density>=4.5 4  1 1 (0.2500000 0.7500000) *
##   3) durable< 1.5 15  4 1 (0.2666667 0.7333333) *
dtr2008
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
## 1) root 54 17 0 (0.6851852 0.3148148)  
##   2) pop_density< 4.5 45 11 0 (0.7555556 0.2444444) *
##   3) pop_density>=4.5 9  3 1 (0.3333333 0.6666667) *
dtr2009
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 54 16 0 (0.70370370 0.29629630)  
##    2) pop_density< 4.5 44  9 0 (0.79545455 0.20454545)  
##      4) durable>=2.5 28  2 0 (0.92857143 0.07142857) *
##      5) durable< 2.5 16  7 0 (0.56250000 0.43750000)  
##       10) civL< 1.5 12  3 0 (0.75000000 0.25000000)  
##         20) GDP_gr>=1.5 6  0 0 (1.00000000 0.00000000) *
##         21) GDP_gr< 1.5 6  3 0 (0.50000000 0.50000000)  
##           42) durable< 1.5 4  1 0 (0.75000000 0.25000000) *
##           43) durable>=1.5 2  0 1 (0.00000000 1.00000000) *
##       11) civL>=1.5 4  0 1 (0.00000000 1.00000000) *
##    3) pop_density>=4.5 10  3 1 (0.30000000 0.70000000)  
##      6) GDP_gr< 1.5 4  1 0 (0.75000000 0.25000000) *
##      7) GDP_gr>=1.5 6  0 1 (0.00000000 1.00000000) *
dtr2010
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 54 15 0 (0.72222222 0.27777778)  
##    2) pop_density< 4.5 43  8 0 (0.81395349 0.18604651)  
##      4) durable>=3.5 23  1 0 (0.95652174 0.04347826) *
##      5) durable< 3.5 20  7 0 (0.65000000 0.35000000)  
##       10) GDP_gr>=2.5 17  4 0 (0.76470588 0.23529412)  
##         20) civL< 1.5 14  2 0 (0.85714286 0.14285714) *
##         21) civL>=1.5 3  1 1 (0.33333333 0.66666667) *
##       11) GDP_gr< 2.5 3  0 1 (0.00000000 1.00000000) *
##    3) pop_density>=4.5 11  4 1 (0.36363636 0.63636364)  
##      6) GINI>=3.5 4  1 0 (0.75000000 0.25000000) *
##      7) GINI< 3.5 7  1 1 (0.14285714 0.85714286) *
dtr2011
## n= 54 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 54 23 0 (0.57407407 0.42592593)  
##     2) education>=4.5 12  1 0 (0.91666667 0.08333333) *
##     3) education< 4.5 42 20 1 (0.47619048 0.52380952)  
##       6) GINI>=3.5 17  5 0 (0.70588235 0.29411765)  
##        12) durable>=1.5 15  3 0 (0.80000000 0.20000000)  
##          24) unemployment< 4.5 13  1 0 (0.92307692 0.07692308) *
##          25) unemployment>=4.5 2  0 1 (0.00000000 1.00000000) *
##        13) durable< 1.5 2  0 1 (0.00000000 1.00000000) *
##       7) GINI< 3.5 25  8 1 (0.32000000 0.68000000)  
##        14) HDI< 1.5 5  1 0 (0.80000000 0.20000000) *
##        15) HDI>=1.5 20  4 1 (0.20000000 0.80000000)  
##          30) unemployment>=3.5 11  4 1 (0.36363636 0.63636364)  
##            60) civL< 1.5 7  3 0 (0.57142857 0.42857143)  
##             120) durable< 1.5 2  0 0 (1.00000000 0.00000000) *
##             121) durable>=1.5 5  2 1 (0.40000000 0.60000000) *
##            61) civL>=1.5 4  0 1 (0.00000000 1.00000000) *
##          31) unemployment< 3.5 9  0 1 (0.00000000 1.00000000) *
dtr2012
## n= 52 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 52 23 0 (0.55769231 0.44230769)  
##    2) durable>=2.5 32  9 0 (0.71875000 0.28125000)  
##      4) polR>=1.5 16  1 0 (0.93750000 0.06250000) *
##      5) polR< 1.5 16  8 0 (0.50000000 0.50000000)  
##       10) GDP_gr>=4.5 3  0 0 (1.00000000 0.00000000) *
##       11) GDP_gr< 4.5 13  5 1 (0.38461538 0.61538462)  
##         22) GDP< 2.5 4  1 0 (0.75000000 0.25000000) *
##         23) GDP>=2.5 9  2 1 (0.22222222 0.77777778)  
##           46) durable>=4.5 3  1 0 (0.66666667 0.33333333) *
##           47) durable< 4.5 6  0 1 (0.00000000 1.00000000) *
##    3) durable< 2.5 20  6 1 (0.30000000 0.70000000)  
##      6) HDI< 2.5 7  2 0 (0.71428571 0.28571429)  
##       12) polR< 1.5 4  0 0 (1.00000000 0.00000000) *
##       13) polR>=1.5 3  1 1 (0.33333333 0.66666667) *
##      7) HDI>=2.5 13  1 1 (0.07692308 0.92307692) *
dtr2013
## n= 52 
## 
## node), split, n, loss, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 52 23 0 (0.5576923 0.4423077)  
##    2) durable>=1.5 42 15 0 (0.6428571 0.3571429)  
##      4) polR>=1.5 20  3 0 (0.8500000 0.1500000)  
##        8) pop_density< 2.5 11  0 0 (1.0000000 0.0000000) *
##        9) pop_density>=2.5 9  3 0 (0.6666667 0.3333333)  
##         18) GDP_gr< 2.5 3  0 0 (1.0000000 0.0000000) *
##         19) GDP_gr>=2.5 6  3 0 (0.5000000 0.5000000)  
##           38) unemployment>=3.5 2  0 0 (1.0000000 0.0000000) *
##           39) unemployment< 3.5 4  1 1 (0.2500000 0.7500000) *
##      5) polR< 1.5 22 10 1 (0.4545455 0.5454545)  
##       10) GINI>=3.5 10  3 0 (0.7000000 0.3000000)  
##         20) unemployment< 4 8  1 0 (0.8750000 0.1250000) *
##         21) unemployment>=4 2  0 1 (0.0000000 1.0000000) *
##       11) GINI< 3.5 12  3 1 (0.2500000 0.7500000)  
##         22) HDI< 1.5 2  0 0 (1.0000000 0.0000000) *
##         23) HDI>=1.5 10  1 1 (0.1000000 0.9000000) *
##    3) durable< 1.5 10  2 1 (0.2000000 0.8000000)  
##      6) education< 1.5 3  1 0 (0.6666667 0.3333333) *
##      7) education>=1.5 7  0 1 (0.0000000 1.0000000) *

These results confirm that the interplay of different forms of inequality is also related to the incidence of domestic terrorist attacks.
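As a reading aid for the printed trees, the minimal sketch below recomputes the training (resubstitution) error of the 2010 tree from its terminal nodes. The numbers are transcribed by hand from the `dtr2010` output above; the data frame `leaves` and its column names are illustrative and not part of the original analysis. It assumes the usual reading of the printed columns, in which `n` is the number of cases in a node and `loss` is the number of those cases that do not belong to the predicted class `yval`.

```r
# Terminal nodes (marked with *) transcribed by hand from the dtr2010 output.
# The data frame and its column names are illustrative assumptions.
leaves <- data.frame(
  node = c(4, 20, 21, 11, 6, 7),
  n    = c(23, 14, 3, 3, 4, 7),  # cases falling into each leaf
  loss = c(1, 2, 1, 0, 1, 1),    # cases in the leaf that differ from the predicted class
  yval = c(0, 0, 1, 1, 0, 1)     # class predicted by the leaf
)

root_n    <- 54  # cases at the root node
root_loss <- 15  # errors made when always predicting the majority class (0)

# The leaves partition the sample, so their sizes must add up to the root size
stopifnot(sum(leaves$n) == root_n)

train_error    <- sum(leaves$loss) / root_n     # share of misclassified training cases
baseline_error <- root_loss / root_n            # error of the tree without any splits
relative_error <- sum(leaves$loss) / root_loss  # tree error relative to the root

round(c(train = train_error, baseline = baseline_error, relative = relative_error), 3)
```

For 2010 the tree misclassifies 6 of the 54 training cases, against 15 of 54 for the root alone. Because these figures are computed on the same data the tree was grown on, they are likely to be optimistic and should not be read as out-of-sample accuracy.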

Brief conclusions

  • of the spectrum of inequality indicators, protection of human rights and major episodes of political violence seem to be the most relevant predictors of domestic terrorism worldwide
  • however, the inequality-(domestic) terrorism relationship seems to be driven by Asia, as it is the only continent where terrorism could be predicted by the inequality variables
  • the two strongest predictors of terrorism, protection of human rights and major episodes of political violence, also predict terrorism well in countries with a large number of Muslim citizens
  • in Asia and in countries with a large number of Muslim citizens, a wide range of inequality indices seems to be related to domestic terrorism
  • transnational terrorism was not successfully predicted by the chosen inequality indicators

Thank you for your time!