Which factors predict whether a suicide attack is claimed by a terrorist organization?
The data set I am investigating is called “suicide_attacks.csv”. Its source is the CORGIS Dataset Project, found at https://corgis-edu.github.io/corgis/csv/suicide_attacks/ . The data come from the Chicago Project on Security and Terrorism (CPOST), which maintains a searchable database of all suicide attacks from 1982 through October 2020. The database also provides information on the location of attacks, the target type, the weapon used, and systematic information on the demographic and general biographical characteristics of suicide attackers. I chose this data set because I thought it would be interesting to know what predicts whether a suicide attack is claimed by a terrorist organization. I chose a logistic regression analysis because the outcome is binary: an attack is either claimed (1) or not (0). The data set has 10,018 observations and 39 variables.
claim (categorical): The response variable. I converted it to binary (1 for claimed, 0 otherwise).
Predictors
These 5 predictors were chosen because they capture different characteristics that could influence whether an attack is claimed.
statistics_#_killed_high (numerical): The number of people killed, using the highest estimated death count.
statistics_#_attackers (numerical): The number of attackers involved. This reflects how coordinated the attack was and the capacity of the attackers; a higher number could indicate a more planned attack.
target_type (categorical): The victim or target of the attack, which is civilian, political, or security. This could reflect specific target goals.
target_region (categorical): The region where the attack took place. In this data set there are four regions: Asia, Americas, Europe, and Africa. This covers geographic differences in suicide attack claiming patterns.
date_year (numerical): The year in which the attack occurred. This predictor captures how claiming behavior changes over time, which could help anticipate claiming behavior in years to come.
library(tidyverse)
setwd("~/Desktop/Data 101")
suicide_attacks <-read_csv("suicide_attacks.csv")
I started to analyze my data by checking the head and structure using the “head” and “str” functions. I then cleaned the column names by using ‘gsub’ to replace the ‘.’ and the spaces between words with underscores. After that, I checked for NAs using ‘colSums’ on ‘is.na’; every column had a count of 0, so there were no NAs. I then checked how many different target types there were using the ‘table’ function and saw that only 4 attacks had an unknown target type. Additionally, I used dplyr functions such as rename, filter, select and mutate. First, I renamed ‘statistics_#_killed_high’ and ‘statistics_#_attackers’ to num_killed and num_attackers. Then, I filtered out the attacks with an unknown target type, because that category had far fewer observations than the others, using filter(target_type != “Unknown”). Next, I used select to keep the variables I was focused on: claim and my predictors num_killed, num_attackers, target_type, date_year, and target_region. Lastly, I used mutate to convert claim to binary: 1 for claimed and 0 for everything else. The plot I will generate to help answer my research question is a ROC curve with its AUC.
head(suicide_attacks)
## # A tibble: 6 × 39
## groups claim status statistics.sources date.year date.month date.day
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Islamic State Susp… Confi… 2 2015 6 2
## 2 Islamic State Susp… Possi… 3 2017 1 6
## 3 Islamic State Susp… Possi… 3 2017 1 6
## 4 Unknown Group Uncl… Confi… 4 2004 10 5
## 5 Taliban (IEA) Clai… Possi… 5 2017 7 4
## 6 Al-Jaysh al-Isl… Clai… Confi… 4 2012 10 3
## # ℹ 32 more variables: `statistics.# wounded_low` <dbl>,
## # `statistics.# wounded_high` <dbl>, `statistics.# killed_low` <dbl>,
## # `statistics.# killed_high` <dbl>, `statistics.# killed_low_civilian` <dbl>,
## # `statistics.# killed_high_civilian` <dbl>,
## # `statistics.# killed_low_political` <dbl>,
## # `statistics.# killed_high_political` <dbl>,
## # `statistics.# killed_low_security` <dbl>, …
str(suicide_attacks)
## spc_tbl_ [10,018 × 39] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
## $ groups : chr [1:10018] "Islamic State" "Islamic State" "Islamic State" "Unknown Group" ...
## $ claim : chr [1:10018] "Suspected" "Suspected" "Suspected" "Unclaimed" ...
## $ status : chr [1:10018] "Confirmed Suicide" "Possible - Too Few Sources" "Possible - Too Few Sources" "Confirmed Suicide" ...
## $ statistics.sources : num [1:10018] 2 3 3 4 5 4 4 4 4 4 ...
## $ date.year : num [1:10018] 2015 2017 2017 2004 2017 ...
## $ date.month : num [1:10018] 6 1 1 10 7 10 10 10 10 10 ...
## $ date.day : num [1:10018] 2 6 6 5 4 3 3 3 3 3 ...
## $ statistics.# wounded_low : num [1:10018] 8 0 0 10 2 100 100 100 100 100 ...
## $ statistics.# wounded_high : num [1:10018] 8 0 0 15 2 120 120 120 120 120 ...
## $ statistics.# killed_low : num [1:10018] 5 40 40 1 0 31 31 31 31 31 ...
## $ statistics.# killed_high : num [1:10018] 5 40 40 10 0 40 40 40 40 40 ...
## $ statistics.# killed_low_civilian : num [1:10018] 0 20 20 1 0 31 31 31 31 31 ...
## $ statistics.# killed_high_civilian : num [1:10018] 0 20 20 10 0 40 40 40 40 40 ...
## $ statistics.# killed_low_political : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
## $ statistics.# killed_high_political: num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
## $ statistics.# killed_low_security : num [1:10018] 5 20 20 0 0 0 0 0 0 0 ...
## $ statistics.# killed_high_security : num [1:10018] 5 20 20 0 0 0 0 0 0 0 ...
## $ statistics.# belt_bomb : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
## $ statistics.# truck_bomb : num [1:10018] 0 0 0 0 1 0 0 0 0 0 ...
## $ statistics.# car_bomb : num [1:10018] 1 0 0 1 0 1 1 1 1 1 ...
## $ statistics.# weapon_oth : num [1:10018] 0 1 1 0 0 0 0 0 0 0 ...
## $ statistics.# weapon_unk : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
## $ target.weapon : chr [1:10018] "Car bomb" "Unspecified" "Unspecified" "Car bomb" ...
## $ target.region : chr [1:10018] "Asia" "Asia" "Asia" "Asia" ...
## $ target.subregion : chr [1:10018] "Western Asia" "Western Asia" "Western Asia" "Western Asia" ...
## $ target.country : chr [1:10018] "Syria" "Syria" "Syria" "Iraq" ...
## $ target.province : chr [1:10018] "Hasaka (Al Haksa)" "Deir ez-Zor" "Deir ez-Zor" "Baghdad" ...
## $ target.city : chr [1:10018] "Al Hasakah" "Deir ez-Zor" "Deir ez-Zor" "Baghdad" ...
## $ target.location : chr [1:10018] "close to a children's hospital" "Route between City & Deir ez-Zor Airport" "Route between City & Deir ez-Zor Airport" "Al Dora neighborhood, near refinery and cathedral" ...
## $ target.latitude : num [1:10018] 36.5 35.3 35.3 33.3 31.8 ...
## $ target.longtitude : num [1:10018] 40.8 40.1 40.1 44.4 64.5 ...
## $ target.desc : chr [1:10018] "Syrian Army checkpoint" "Syrian regime forces" "Syrian regime forces" "Iraqi Police patrol" ...
## $ target.type : chr [1:10018] "Security" "Security" "Security" "Security" ...
## $ target.nationality : chr [1:10018] "Syrian" "Syrian" "Syrian" "Iraqi" ...
## $ statistics.# attackers : num [1:10018] 1 2 2 1 1 3 3 3 3 3 ...
## $ statistics.# female_attackers : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
## $ statistics.# male_attackers : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
## $ statistics.# unknown_attackers : num [1:10018] 1 2 2 1 1 3 3 3 3 3 ...
## $ attacker.gender : chr [1:10018] "Unknown" "Unknown" "Unknown" "Unknown" ...
## - attr(*, "spec")=
## .. cols(
## .. groups = col_character(),
## .. claim = col_character(),
## .. status = col_character(),
## .. statistics.sources = col_double(),
## .. date.year = col_double(),
## .. date.month = col_double(),
## .. date.day = col_double(),
## .. `statistics.# wounded_low` = col_double(),
## .. `statistics.# wounded_high` = col_double(),
## .. `statistics.# killed_low` = col_double(),
## .. `statistics.# killed_high` = col_double(),
## .. `statistics.# killed_low_civilian` = col_double(),
## .. `statistics.# killed_high_civilian` = col_double(),
## .. `statistics.# killed_low_political` = col_double(),
## .. `statistics.# killed_high_political` = col_double(),
## .. `statistics.# killed_low_security` = col_double(),
## .. `statistics.# killed_high_security` = col_double(),
## .. `statistics.# belt_bomb` = col_double(),
## .. `statistics.# truck_bomb` = col_double(),
## .. `statistics.# car_bomb` = col_double(),
## .. `statistics.# weapon_oth` = col_double(),
## .. `statistics.# weapon_unk` = col_double(),
## .. target.weapon = col_character(),
## .. target.region = col_character(),
## .. target.subregion = col_character(),
## .. target.country = col_character(),
## .. target.province = col_character(),
## .. target.city = col_character(),
## .. target.location = col_character(),
## .. target.latitude = col_double(),
## .. target.longtitude = col_double(),
## .. target.desc = col_character(),
## .. target.type = col_character(),
## .. target.nationality = col_character(),
## .. `statistics.# attackers` = col_double(),
## .. `statistics.# female_attackers` = col_double(),
## .. `statistics.# male_attackers` = col_double(),
## .. `statistics.# unknown_attackers` = col_double(),
## .. attacker.gender = col_character()
## .. )
## - attr(*, "problems")=<externalptr>
names(suicide_attacks) <- gsub("[(). \\-]", "_", names(suicide_attacks)) # replace ., (, ), space, and - with an underscore
names(suicide_attacks) <- gsub("_$", "", names(suicide_attacks)) # remove trailing underscore
head(suicide_attacks) #verify
## # A tibble: 6 × 39
## groups claim status statistics_sources date_year date_month date_day
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Islamic State Susp… Confi… 2 2015 6 2
## 2 Islamic State Susp… Possi… 3 2017 1 6
## 3 Islamic State Susp… Possi… 3 2017 1 6
## 4 Unknown Group Uncl… Confi… 4 2004 10 5
## 5 Taliban (IEA) Clai… Possi… 5 2017 7 4
## 6 Al-Jaysh al-Isl… Clai… Confi… 4 2012 10 3
## # ℹ 32 more variables: `statistics_#_wounded_low` <dbl>,
## # `statistics_#_wounded_high` <dbl>, `statistics_#_killed_low` <dbl>,
## # `statistics_#_killed_high` <dbl>, `statistics_#_killed_low_civilian` <dbl>,
## # `statistics_#_killed_high_civilian` <dbl>,
## # `statistics_#_killed_low_political` <dbl>,
## # `statistics_#_killed_high_political` <dbl>,
## # `statistics_#_killed_low_security` <dbl>, …
colSums(is.na(suicide_attacks))
## groups claim
## 0 0
## status statistics_sources
## 0 0
## date_year date_month
## 0 0
## date_day statistics_#_wounded_low
## 0 0
## statistics_#_wounded_high statistics_#_killed_low
## 0 0
## statistics_#_killed_high statistics_#_killed_low_civilian
## 0 0
## statistics_#_killed_high_civilian statistics_#_killed_low_political
## 0 0
## statistics_#_killed_high_political statistics_#_killed_low_security
## 0 0
## statistics_#_killed_high_security statistics_#_belt_bomb
## 0 0
## statistics_#_truck_bomb statistics_#_car_bomb
## 0 0
## statistics_#_weapon_oth statistics_#_weapon_unk
## 0 0
## target_weapon target_region
## 0 0
## target_subregion target_country
## 0 0
## target_province target_city
## 0 0
## target_location target_latitude
## 0 0
## target_longtitude target_desc
## 0 0
## target_type target_nationality
## 0 0
## statistics_#_attackers statistics_#_female_attackers
## 0 0
## statistics_#_male_attackers statistics_#_unknown_attackers
## 0 0
## attacker_gender
## 0
There were no NAs, so I did not need to filter any out.
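One caveat worth checking: `is.na()` only catches true NA values, not numeric placeholder codes. The `xtabs` output for num_killed later in this report shows a value of -1 for 37 attacks, which cannot be a real death count. The sketch below uses a small made-up data frame (`toy` is hypothetical, and treating -1 as a "missing" code is an assumption, since the source does not document it) to show how such values could be counted.

```r
# Toy data: -1 stands in for a possible "missing" code (an assumption)
toy <- data.frame(
  num_killed    = c(5, 40, -1, 0, 10),
  num_attackers = c(1, 2, 1, 1, 3)
)

# is.na() does not flag placeholder codes such as -1
na_counts <- colSums(is.na(toy))

# Count negative values in each numeric column instead
negative_counts <- sapply(toy, function(x) sum(x < 0))

na_counts
negative_counts
```

If the -1 values do turn out to mark missing counts, one option would be to recode them to NA before modeling; I have not done that in this analysis.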
table(suicide_attacks$target_type)
##
## Civilian Political Security Unknown
## 2386 1080 6548 4
The Unknown category had too few observations (only 4), so I filtered it out.
filtering_suicide_attacks<-suicide_attacks |>
rename(num_killed =`statistics_#_killed_high`, num_attackers =`statistics_#_attackers`) |> #Learned in data 110
filter(target_type != "Unknown") |> # removing unknown as a target type because it only had 4
select(claim,num_killed, num_attackers, target_type, date_year, target_region)
filtering_suicide_attacks
## # A tibble: 10,014 × 6
## claim num_killed num_attackers target_type date_year target_region
## <chr> <dbl> <dbl> <chr> <dbl> <chr>
## 1 Suspected 5 1 Security 2015 Asia
## 2 Suspected 40 2 Security 2017 Asia
## 3 Suspected 40 2 Security 2017 Asia
## 4 Unclaimed 10 1 Security 2004 Asia
## 5 Claimed 0 1 Security 2017 Asia
## 6 Claimed 40 3 Civilian 2012 Asia
## 7 Claimed 40 3 Civilian 2012 Asia
## 8 Claimed 40 3 Civilian 2012 Asia
## 9 Claimed 40 3 Civilian 2012 Asia
## 10 Claimed 40 3 Civilian 2012 Asia
## # ℹ 10,004 more rows
table(filtering_suicide_attacks$claim)
##
## Claimed Denied Suspected Unclaimed
## 4237 128 1438 4211
suicide_attacks_clean <-filtering_suicide_attacks |>
mutate(claim = ifelse(claim =="Claimed", 1,0), #making claim to be binary for the logistic regression model
target_type=as.factor(target_type),
target_region=as.factor(target_region))
suicide_attacks_clean
## # A tibble: 10,014 × 6
## claim num_killed num_attackers target_type date_year target_region
## <dbl> <dbl> <dbl> <fct> <dbl> <fct>
## 1 0 5 1 Security 2015 Asia
## 2 0 40 2 Security 2017 Asia
## 3 0 40 2 Security 2017 Asia
## 4 0 10 1 Security 2004 Asia
## 5 1 0 1 Security 2017 Asia
## 6 1 40 3 Civilian 2012 Asia
## 7 1 40 3 Civilian 2012 Asia
## 8 1 40 3 Civilian 2012 Asia
## 9 1 40 3 Civilian 2012 Asia
## 10 1 40 3 Civilian 2012 Asia
## # ℹ 10,004 more rows
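Because the recoding sends three different categories (Denied, Suspected and Unclaimed) to 0, it can help to cross-tabulate the original and recoded values. The sketch below rebuilds the claim column from the counts in the `table` output above rather than from the real file, so `claim_raw` and `claim_bin` are illustrative names only.

```r
# Rebuild the claim column from the counts reported by table() above
claim_raw <- rep(c("Claimed", "Denied", "Suspected", "Unclaimed"),
                 times = c(4237, 128, 1438, 4211))

# Same recoding rule as in the mutate() call: Claimed = 1, everything else = 0
claim_bin <- ifelse(claim_raw == "Claimed", 1, 0)

# Each original category should land in exactly one binary column
recode_check <- table(original = claim_raw, binary = claim_bin)
recode_check
```

This gives 4237 attacks coded 1 and 5777 coded 0, which matches the column totals of the confusion matrix later in the report.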
For the logistic regression, I first did quality control to make sure my categorical predictors had at least 5 observations in each category for both outcomes. I then fitted the logistic regression model to predict whether an attack is claimed or not, and interpreted the effects of the predictors that are statistically significant (p<0.05). Next, I checked the R^2 value to see how much of the variation in claimed attacks is explained by the number of attackers, number of people killed, target type, year and target_region. Furthermore, I analyzed the p-value to understand whether the model is statistically significant. Lastly, I produced a confusion matrix and key performance metrics to understand the errors, predictive strengths and limitations of the model.
xtabs(~ claim + num_attackers, data=suicide_attacks_clean)
## num_attackers
## claim 1 2 3 4 5 6 7 8 9 10 13 15 50
## 0 3941 934 372 200 120 36 28 40 18 60 13 15 0
## 1 2771 674 327 136 115 72 28 16 0 20 13 15 50
xtabs(~ claim + target_type, data=suicide_attacks_clean)
## target_type
## claim Civilian Political Security
## 0 1566 518 3693
## 1 820 562 2855
xtabs(~ claim + num_killed, data=suicide_attacks_clean)
## num_killed
## claim -1 0 1 2 3 4 5 6 7 8 9 10 11 12
## 0 20 1298 527 522 474 369 350 231 243 194 107 181 119 92
## 1 17 722 288 347 265 286 261 196 168 114 108 142 90 130
## num_killed
## claim 13 14 15 16 17 18 19 20 21 22 23 24 25 26
## 0 79 51 87 50 63 53 29 75 22 23 27 15 52 22
## 1 60 67 85 61 61 51 34 53 36 23 31 19 27 15
## num_killed
## claim 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 0 18 12 4 40 10 15 12 8 12 7 8 4 8 26
## 1 21 20 13 37 18 11 6 6 32 7 8 5 1 74
## num_killed
## claim 41 42 43 44 45 46 47 48 49 50 51 52 53 54
## 0 13 8 6 2 8 2 1 7 6 15 3 1 8 4
## 1 5 4 6 6 6 17 2 4 1 5 1 3 3 7
## num_killed
## claim 55 56 57 58 59 60 61 62 63 64 65 66 67 68
## 0 6 6 4 6 1 7 7 0 1 0 2 1 4 4
## 1 3 3 2 6 4 9 9 3 0 3 1 2 1 3
## num_killed
## claim 69 70 71 72 73 74 75 76 78 80 83 84 85 88
## 0 1 11 3 0 3 0 1 4 2 2 2 0 6 4
## 1 1 3 1 5 1 2 0 4 0 1 7 1 1 1
## num_killed
## claim 89 90 91 92 93 94 95 96 100 101 102 103 105 106
## 0 0 2 2 0 0 0 0 2 4 0 2 0 0 0
## 1 3 1 1 1 2 1 2 0 3 1 3 1 2 2
## num_killed
## claim 114 115 117 118 120 125 126 130 135 143 149 150 152 156
## 0 0 0 1 0 3 0 1 0 1 1 0 7 0 1
## 1 1 2 0 2 1 1 0 1 0 1 1 1 2 3
## num_killed
## claim 165 184 200 202 213 241 250 324 358 2753
## 0 1 5 0 0 0 2 0 0 1 10
## 1 0 5 13 2 1 1 2 1 0 10
xtabs(~ claim + target_region, data=suicide_attacks_clean)
## target_region
## claim Africa Americas Asia Europe
## 0 919 22 4738 98
## 1 442 16 3718 61
xtabs(~ claim + date_year, data=suicide_attacks_clean)
## date_year
## claim 1974 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993
## 0 0 0 0 5 1 3 1 0 0 0 0 3 1 2
## 1 3 3 2 4 2 40 2 3 3 2 7 7 0 20
## date_year
## claim 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
## 0 4 5 2 10 6 4 15 25 14 88 129 269 285 448
## 1 16 31 23 19 26 38 34 71 71 63 135 159 154 174
## date_year
## claim 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
## 0 320 221 168 194 212 396 525 787 553 579 303 199
## 1 195 171 206 153 199 264 403 313 446 416 188 171
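Each level of the categorical predictors (target_type and target_region) has more than 5 observations in both claim groups, so these variables are safe to use in modeling and there is no need to drop or merge any levels. The numerical predictors do have some empty cells (for example, no unclaimed attacks with 50 attackers), but they enter the model as numbers rather than as categories.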
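The rule of at least 5 per cell can also be checked programmatically instead of by eye. This is a minimal sketch that retypes the counts from the two `xtabs` tables above into matrices (`type_counts`, `region_counts` and `min_cells` are names I made up for illustration).

```r
# Counts copied from the xtabs output for target_type
type_counts <- matrix(c(1566, 518, 3693,
                         820, 562, 2855),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(claim = c("0", "1"),
                                      target_type = c("Civilian", "Political", "Security")))

# Counts copied from the xtabs output for target_region
region_counts <- matrix(c(919, 22, 4738, 98,
                          442, 16, 3718, 61),
                        nrow = 2, byrow = TRUE,
                        dimnames = list(claim = c("0", "1"),
                                        target_region = c("Africa", "Americas", "Asia", "Europe")))

# Smallest cell in each table, then the "at least 5" check
min_cells <- c(target_type = min(type_counts), target_region = min(region_counts))
min_cells
all(min_cells >= 5)
```

The smallest cell is 16 (claimed attacks in the Americas), so the check passes, although that region is thin compared with the others.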
logistic <- glm(claim ~num_attackers+num_killed+target_type+date_year+target_region, data=suicide_attacks_clean, family="binomial")
summary(logistic)
##
## Call:
## glm(formula = claim ~ num_attackers + num_killed + target_type +
## date_year + target_region, family = "binomial", data = suicide_attacks_clean)
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 73.5021814 7.6032238 9.667 < 2e-16 ***
## num_attackers 0.0524707 0.0075206 6.977 3.02e-12 ***
## num_killed 0.0007869 0.0002855 2.756 0.005853 **
## target_typePolitical 0.6876375 0.0761571 9.029 < 2e-16 ***
## target_typeSecurity 0.3224261 0.0520843 6.190 6.00e-10 ***
## date_year -0.0369949 0.0037745 -9.801 < 2e-16 ***
## target_regionAmericas -1.6511216 0.6300264 -2.621 0.008774 **
## target_regionAsia 0.2417477 0.0657500 3.677 0.000236 ***
## target_regionEurope -0.0941257 0.1776315 -0.530 0.596186
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 13645 on 10013 degrees of freedom
## Residual deviance: 13323 on 10005 degrees of freedom
## AIC: 13341
##
## Number of Fisher Scoring iterations: 4
Interpretation
Using the fitted logistic regression model to predict whether an attack is claimed or not, we can take the predictors that are statistically significant (p<0.05) and interpret their effects on an attack being claimed. The coefficients, which are on the log-odds scale, are listed below.
num_attackers = 0.0524707 (p=3.02e-12): Each additional attacker increases the log odds of an attack being claimed. This predictor is statistically significant, meaning that the number of attackers is important in predicting whether an attack is claimed.
num_killed = 0.0007869 (p=0.005853): Each additional person killed increases the log odds of an attack being claimed, but only slightly. This predictor is statistically significant, meaning that the number of people killed plays a small role in predicting whether an attack is claimed or not.
target_type Political = 0.6876375 (p < 2e-16): Attacks on political targets have higher log odds of being claimed than attacks on the reference category, civilians. This is highly statistically significant.
target_type Security = 0.322 (p=6.00e-10): Attacks on security targets have higher log odds of being claimed than attacks on civilians. This is very significant.
date_year = -0.037 (p < 2e-16): Negative effect, very significant. Each later year lowers the log odds of an attack being claimed by about 0.037, so the year of the attack matters in determining whether an attack is claimed or not.
target_region Americas = -1.65 (p=0.008774): Attacks in the Americas have much lower log odds of being claimed than attacks in the reference region, Africa. This is statistically significant.
target_region Asia = 0.24 (p=0.000236): Attacks in Asia have higher log odds of being claimed than attacks in the reference region, Africa. This is statistically significant.
target_region Europe = -0.0941257 (p=0.596186): Slight negative effect, but not significant. Europe as a region alone, adjusted for the other variables, does not add much here.
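Log-odds coefficients are hard to read directly, so a common step is to exponentiate them into odds ratios. The sketch below retypes the estimates from the `summary(logistic)` output above into a named vector (`coefs` is just an illustrative name), which keeps the example self-contained.

```r
# Estimates copied from the summary(logistic) output (intercept left out)
coefs <- c(num_attackers         =  0.0524707,
           num_killed            =  0.0007869,
           target_typePolitical  =  0.6876375,
           target_typeSecurity   =  0.3224261,
           date_year             = -0.0369949,
           target_regionAmericas = -1.6511216,
           target_regionAsia     =  0.2417477,
           target_regionEurope   = -0.0941257)

# exp() turns a log-odds coefficient into an odds ratio
odds_ratios <- exp(coefs)
round(odds_ratios, 3)
```

With the fitted model in memory, `exp(coef(logistic))` should give the same numbers. Read this way, political targets have roughly twice the odds of being claimed compared with civilian targets (exp(0.688) is about 1.99), and each later year multiplies the odds by about 0.96, holding the other predictors fixed.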
The overall model has a lower residual deviance (13323) than null deviance (13645), which means that the predictors above improve the model fit compared to an intercept-only model.
r_square <- 1 - (logistic$deviance/logistic$null.deviance)
r_square
## [1] 0.02353615
The R^2 is 2.3%, which means that the model explains about 2.3% of the variation in claimed attacks based on number of attackers, number of people killed, target type, year and target_region. Because it is computed from the deviances, this is a pseudo-R^2 (McFadden's) rather than the R^2 of a linear regression. Low values are common in logistic regressions on behavioral data, such as whether a terrorist organization claims an attack. A low R^2 does not mean the model is incorrect; it indicates that claiming a suicide attack depends on many other behavioral and social factors. Disclaimer: I am aware that the R^2 is low and that my model could benefit from additional predictors to improve its explanatory power.
1 - pchisq((logistic$null.deviance - logistic$deviance), df=(length(logistic$coefficients) -1))
## [1] 0
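The p-value prints as 0 because it is too small to be distinguished from 0 in this calculation. This means that the model as a whole is statistically significant compared to an intercept-only model.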
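The same likelihood-ratio test can be written so that the tiny p-value is not lost to rounding. This sketch uses the rounded deviances and degrees of freedom printed in the model summary, so the result is approximate; `lower.tail = FALSE` asks `pchisq` for the upper tail directly instead of computing 1 minus a number that is almost exactly 1.

```r
# Values copied from the summary(logistic) output (deviances are rounded)
null_dev  <- 13645
resid_dev <- 13323
df_diff   <- 10013 - 10005   # number of predictors terms added to the null model

# Likelihood-ratio statistic and its upper-tail chi-square p-value
lr_stat <- null_dev - resid_dev
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

lr_stat
format(p_value, scientific = TRUE)
```

The statistic is 322 on 8 degrees of freedom, and the p-value comes out as a very small positive number rather than an exact 0.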
predicted.probs <- logistic$fitted.values
predicted.classes <- ifelse (predicted.probs > 0.5, 1, 0)
confusion <- table(
Predicted = factor (predicted.classes, levels = c(0, 1)),
Actual = factor (suicide_attacks_clean$claim, levels = c(0, 1)))
confusion
## Actual
## Predicted 0 1
## 0 5192 3455
## 1 585 782
5192 attacks were not claimed, and the model correctly predicted them as not claimed. This is a True Negative.
3455 attacks were claimed, but the model mistakenly predicted them as not claimed. This is a False Negative.
585 attacks were not claimed, but the model mistakenly predicted them as claimed. This is a False Positive.
782 attacks were actually claimed, and the model correctly predicted them as claimed. This is a True Positive.
#Extract Values:
TN <- 5192
FP <- 585
FN <- 3455
TP <- 782
#Metrics
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN) # also called recall or true positive rate
specificity <- TN / (TN + FP) # true negative rate
precision <- TP / (TP + FP) # positive predictive value
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)
cat("Accuracy:", round(accuracy, 3), "\nSensitivity:", round(sensitivity, 3), "\nSpecificity:", round(specificity, 3), "\nPrecision:", round(precision, 3))
## Accuracy: 0.597
## Sensitivity: 0.185
## Specificity: 0.899
## Precision: 0.572
The model has an accuracy of 59.7%, which is only slightly above the baseline of 57.7% obtained by always predicting “not claimed” (5777 of the 10014 attacks). It correctly identifies some cases, but overall it is a weak predictive model.
The model is better at identifying attacks that were not claimed (specificity of 89.9%) than at detecting attacks that were claimed. The sensitivity is 18.5%, which is poor: the model misses most of the attacks that were claimed.
The precision is 57.2%, so just over half of the positive predictions are correct, and the low sensitivity shows that the model misses a lot of claims.
In conclusion, the model is better at identifying suicide attacks that were not claimed than attacks that were claimed. This is important to consider because, in a research context, being able to identify attacks that are claimed is far more meaningful than identifying those that are not. The model’s inability to identify claimed attacks suggests that the predictors included are not sufficient to predict claims. This highlights that whether terrorists claim an attack does not depend only on the size of the attack, the type of target, the target region and the year in which it happened.
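The metrics can also be read straight off the confusion matrix by name, which avoids retyping the four counts by hand. The sketch below rebuilds the matrix from the output above so that it runs on its own, and adds two quantities not printed earlier: the F1 score and the majority-class baseline accuracy (`baseline` is my own name for it).

```r
# Confusion matrix copied from the output above (rows = Predicted, cols = Actual)
confusion <- matrix(c(5192, 585, 3455, 782), nrow = 2,
                    dimnames = list(Predicted = c("0", "1"),
                                    Actual    = c("0", "1")))

# Pull each cell out by its row and column label
TN <- confusion["0", "0"]
FP <- confusion["1", "0"]
FN <- confusion["0", "1"]
TP <- confusion["1", "1"]

accuracy    <- (TP + TN) / sum(confusion)
sensitivity <- TP / (TP + FN)
precision   <- TP / (TP + FP)
f1_score    <- 2 * precision * sensitivity / (precision + sensitivity)

# Accuracy of always predicting the larger actual class ("not claimed")
baseline <- max(colSums(confusion)) / sum(confusion)

round(c(accuracy = accuracy, baseline = baseline, f1 = f1_score), 3)
```

The accuracy of 0.597 sits only about two percentage points above the baseline of 0.577, and the F1 score of about 0.279 reflects the low sensitivity.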
library(pROC)
## Type 'citation("pROC")' for a citation.
##
## Attaching package: 'pROC'
## The following objects are masked from 'package:stats':
##
## cov, smooth, var
# ROC curve & AUC on full data
roc_obj <- roc(response = suicide_attacks_clean$claim,
predictor = logistic$fitted.values,
levels = c(0, 1),
direction = "<")
# Print AUC value
auc_val <- auc(roc_obj); auc_val
## Area under the curve: 0.5784
# Plot ROC with AUC displayed
plot.roc(roc_obj, print.auc = TRUE, legacy.axes = TRUE,
xlab = "False Positive Rate (1 - Specificity)",
ylab = "True Positive Rate (Sensitivity)")
The AUC = 0.578 means that the model is only marginally better than chance (AUC = 0.5) at distinguishing between claimed attacks and unclaimed attacks.
On the plot, the curve is above the diagonal “random guess” line, which shows the model is better than chance.
In plain words: if you randomly pick one claimed attack and one unclaimed attack, the model has about a 58% chance of ranking the claimed attack higher (that is, giving it the higher predicted probability of being claimed).
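The pairwise reading of the AUC can be checked by hand on a tiny example. The predicted probabilities below are made up for illustration and are not from my model; the sketch compares every claimed attack with every unclaimed attack and counts how often the claimed one gets the higher score, with ties counted as half.

```r
# Hypothetical predicted probabilities (not from the fitted model)
scores_claimed   <- c(0.62, 0.48, 0.55, 0.40)
scores_unclaimed <- c(0.35, 0.50, 0.45)

# Compare every claimed score with every unclaimed score
wins <- outer(scores_claimed, scores_unclaimed, ">")
ties <- outer(scores_claimed, scores_unclaimed, "==")

# AUC = share of pairs where the claimed attack ranks higher (ties count half)
auc_pairwise <- mean(wins + 0.5 * ties)
auc_pairwise
```

Here the claimed attack ranks higher in 9 of the 12 pairs, so the AUC is 0.75. My model's 0.578 means it wins only slightly more than half of such comparisons.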
Overall, this logistic regression model examined 5 different predictors of whether a suicide attack is claimed. Target type was one of the strongest predictors: political targets had the highest log odds of an attack being claimed compared to the reference category, civilians. Among target regions, Asia had higher log odds of claiming than Africa, while the Americas had the lowest. All coefficients were statistically significant except the one for the target region Europe. However, the model had several predictive constraints. The model is better at identifying attacks that were not claimed (specificity of 89.9%) than at detecting attacks that were claimed. The sensitivity is 18.5%, which is poor: the model misses most of the attacks that were claimed. The precision is 57.2%, so just over half of the positive predictions are correct.
As stated earlier, the R^2 of 2.3% means that the model explains about 2.3% of the variation in claimed attacks based on number of attackers, number of people killed, target type, year and target_region. Low values are common in logistic regressions on behavioral data, such as whether a terrorist organization claims an attack.
Additionally, the AUC of 0.578 means that the model is only slightly better than chance at distinguishing between claimed attacks and unclaimed attacks. Thus, if you randomly pick one claimed attack and one unclaimed attack, the model has about a 58% chance of ranking the claimed attack higher (that is, giving it the higher predicted probability of being claimed).
A potential avenue for future work is a different data set on suicide attack claims that incorporates other factors, since adding more factors from this data set did not help the model, which explains only about 2.3% of the variation. A low R^2 does not mean the model is incorrect. It indicates that claiming a suicide attack depends on many other behavioral and social factors, which I could explore using another data set and compare against this one.