Introduction:

Research question:

Which factors predict whether a suicide attack is claimed by a terrorist organization?

The data set I am investigating is “suicide_attacks.csv”, from the CORGIS Dataset Project (https://corgis-edu.github.io/corgis/csv/suicide_attacks/). The data come from the Chicago Project on Security and Terrorism (CPOST), which maintains a searchable database of all suicide attacks from 1982 through October 2020. The database also provides information on the location of attacks, the target type, the weapon used, and systematic information on the demographic and general biographical characteristics of suicide attackers. I chose this data set because I thought it would be interesting to know what predicts whether a suicide attack is claimed by a terrorist organization. I chose a logistic regression analysis because the outcome is binary: whether an attack is claimed (1) or not (0). The data set has 10,018 observations and 39 variables.

The variables I chose to focus on were:

claim (categorical): The outcome variable. I converted it to binary (1 = claimed, 0 = not claimed).

Predictors: These five predictors were chosen because they capture different characteristics that could influence whether an attack is claimed.

statistics_#_killed_high (numerical): The highest estimated number of people killed in the attack.

statistics_#_attackers (numerical): The number of attackers involved. This indicates how coordinated the attack was and the capacity of the attackers; a higher number could indicate a more planned attack.

target_type (categorical): The victim or target of the attack, which is either civilian, political, or security. This could reflect specific target goals.

target_region (categorical): The target area of the attack. In this data set there are four regions: Asia, Americas, Europe, and Africa. This covers the geographic differences in suicide attack claiming patterns.

date_year (numerical): The year in which the attack occurred. This predictor captures how claiming behavior changes over time, which could help anticipate claiming behavior in years to come.

Loading library

library(tidyverse)

Setting working directory and reading the data

setwd("~/Desktop/Data 101")
suicide_attacks <-read_csv("suicide_attacks.csv")

Data Analysis:

I started analyzing my data by checking the head and structure using the “head” and “str” functions. I then cleaned the column names by using ‘gsub’ to replace the ‘.’ and spaces between the words with underscores. After that, I checked for NA’s using ‘colSums’ and saw that every column had a count of 0, so there were no NA’s. I then checked how many different target types there were using the ‘table’ function and saw that only 4 attacks had an unknown target type. Additionally, I used dplyr functions such as rename, filter, select, and mutate. First, I renamed ‘statistics_#_killed_high’ and ‘statistics_#_attackers’ to num_killed and num_attackers. Then, I filtered out the attacks with an unknown target type, because that category had far fewer observations than the others, using filter(target_type != “Unknown”). Next, I used select to keep the variables I was focused on: claim and my predictors num_killed, num_attackers, target_type, date_year, and target_region. Lastly, I used mutate to convert claim to binary, with 1 for claimed and 0 for everything else. The plot I will generate to help answer my research question is an ROC curve with its AUC.

Analyzing the head and structure of the dataset

head(suicide_attacks)

## # A tibble: 6 × 39
##   groups           claim status statistics.sources date.year date.month date.day
##   <chr>            <chr> <chr>               <dbl>     <dbl>      <dbl>    <dbl>
## 1 Islamic State    Susp… Confi…                  2      2015          6        2
## 2 Islamic State    Susp… Possi…                  3      2017          1        6
## 3 Islamic State    Susp… Possi…                  3      2017          1        6
## 4 Unknown Group    Uncl… Confi…                  4      2004         10        5
## 5 Taliban (IEA)    Clai… Possi…                  5      2017          7        4
## 6 Al-Jaysh al-Isl… Clai… Confi…                  4      2012         10        3
## # ℹ 32 more variables: `statistics.# wounded_low` <dbl>,
## #   `statistics.# wounded_high` <dbl>, `statistics.# killed_low` <dbl>,
## #   `statistics.# killed_high` <dbl>, `statistics.# killed_low_civilian` <dbl>,
## #   `statistics.# killed_high_civilian` <dbl>,
## #   `statistics.# killed_low_political` <dbl>,
## #   `statistics.# killed_high_political` <dbl>,
## #   `statistics.# killed_low_security` <dbl>, …

str(suicide_attacks)

## spc_tbl_ [10,018 × 39] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ groups                            : chr [1:10018] "Islamic State" "Islamic State" "Islamic State" "Unknown Group" ...
##  $ claim                             : chr [1:10018] "Suspected" "Suspected" "Suspected" "Unclaimed" ...
##  $ status                            : chr [1:10018] "Confirmed Suicide" "Possible - Too Few Sources" "Possible - Too Few Sources" "Confirmed Suicide" ...
##  $ statistics.sources                : num [1:10018] 2 3 3 4 5 4 4 4 4 4 ...
##  $ date.year                         : num [1:10018] 2015 2017 2017 2004 2017 ...
##  $ date.month                        : num [1:10018] 6 1 1 10 7 10 10 10 10 10 ...
##  $ date.day                          : num [1:10018] 2 6 6 5 4 3 3 3 3 3 ...
##  $ statistics.# wounded_low          : num [1:10018] 8 0 0 10 2 100 100 100 100 100 ...
##  $ statistics.# wounded_high         : num [1:10018] 8 0 0 15 2 120 120 120 120 120 ...
##  $ statistics.# killed_low           : num [1:10018] 5 40 40 1 0 31 31 31 31 31 ...
##  $ statistics.# killed_high          : num [1:10018] 5 40 40 10 0 40 40 40 40 40 ...
##  $ statistics.# killed_low_civilian  : num [1:10018] 0 20 20 1 0 31 31 31 31 31 ...
##  $ statistics.# killed_high_civilian : num [1:10018] 0 20 20 10 0 40 40 40 40 40 ...
##  $ statistics.# killed_low_political : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
##  $ statistics.# killed_high_political: num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
##  $ statistics.# killed_low_security  : num [1:10018] 5 20 20 0 0 0 0 0 0 0 ...
##  $ statistics.# killed_high_security : num [1:10018] 5 20 20 0 0 0 0 0 0 0 ...
##  $ statistics.# belt_bomb            : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
##  $ statistics.# truck_bomb           : num [1:10018] 0 0 0 0 1 0 0 0 0 0 ...
##  $ statistics.# car_bomb             : num [1:10018] 1 0 0 1 0 1 1 1 1 1 ...
##  $ statistics.# weapon_oth           : num [1:10018] 0 1 1 0 0 0 0 0 0 0 ...
##  $ statistics.# weapon_unk           : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
##  $ target.weapon                     : chr [1:10018] "Car bomb" "Unspecified" "Unspecified" "Car bomb" ...
##  $ target.region                     : chr [1:10018] "Asia" "Asia" "Asia" "Asia" ...
##  $ target.subregion                  : chr [1:10018] "Western Asia" "Western Asia" "Western Asia" "Western Asia" ...
##  $ target.country                    : chr [1:10018] "Syria" "Syria" "Syria" "Iraq" ...
##  $ target.province                   : chr [1:10018] "Hasaka (Al Haksa)" "Deir ez-Zor" "Deir ez-Zor" "Baghdad" ...
##  $ target.city                       : chr [1:10018] "Al Hasakah" "Deir ez-Zor" "Deir ez-Zor" "Baghdad" ...
##  $ target.location                   : chr [1:10018] "close to a children's hospital" "Route between City & Deir ez-Zor Airport" "Route between City & Deir ez-Zor Airport" "Al Dora neighborhood, near refinery and cathedral" ...
##  $ target.latitude                   : num [1:10018] 36.5 35.3 35.3 33.3 31.8 ...
##  $ target.longtitude                 : num [1:10018] 40.8 40.1 40.1 44.4 64.5 ...
##  $ target.desc                       : chr [1:10018] "Syrian Army checkpoint" "Syrian regime forces" "Syrian regime forces" "Iraqi Police patrol" ...
##  $ target.type                       : chr [1:10018] "Security" "Security" "Security" "Security" ...
##  $ target.nationality                : chr [1:10018] "Syrian" "Syrian" "Syrian" "Iraqi" ...
##  $ statistics.# attackers            : num [1:10018] 1 2 2 1 1 3 3 3 3 3 ...
##  $ statistics.# female_attackers     : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
##  $ statistics.# male_attackers       : num [1:10018] 0 0 0 0 0 0 0 0 0 0 ...
##  $ statistics.# unknown_attackers    : num [1:10018] 1 2 2 1 1 3 3 3 3 3 ...
##  $ attacker.gender                   : chr [1:10018] "Unknown" "Unknown" "Unknown" "Unknown" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   groups = col_character(),
##   ..   claim = col_character(),
##   ..   status = col_character(),
##   ..   statistics.sources = col_double(),
##   ..   date.year = col_double(),
##   ..   date.month = col_double(),
##   ..   date.day = col_double(),
##   ..   `statistics.# wounded_low` = col_double(),
##   ..   `statistics.# wounded_high` = col_double(),
##   ..   `statistics.# killed_low` = col_double(),
##   ..   `statistics.# killed_high` = col_double(),
##   ..   `statistics.# killed_low_civilian` = col_double(),
##   ..   `statistics.# killed_high_civilian` = col_double(),
##   ..   `statistics.# killed_low_political` = col_double(),
##   ..   `statistics.# killed_high_political` = col_double(),
##   ..   `statistics.# killed_low_security` = col_double(),
##   ..   `statistics.# killed_high_security` = col_double(),
##   ..   `statistics.# belt_bomb` = col_double(),
##   ..   `statistics.# truck_bomb` = col_double(),
##   ..   `statistics.# car_bomb` = col_double(),
##   ..   `statistics.# weapon_oth` = col_double(),
##   ..   `statistics.# weapon_unk` = col_double(),
##   ..   target.weapon = col_character(),
##   ..   target.region = col_character(),
##   ..   target.subregion = col_character(),
##   ..   target.country = col_character(),
##   ..   target.province = col_character(),
##   ..   target.city = col_character(),
##   ..   target.location = col_character(),
##   ..   target.latitude = col_double(),
##   ..   target.longtitude = col_double(),
##   ..   target.desc = col_character(),
##   ..   target.type = col_character(),
##   ..   target.nationality = col_character(),
##   ..   `statistics.# attackers` = col_double(),
##   ..   `statistics.# female_attackers` = col_double(),
##   ..   `statistics.# male_attackers` = col_double(),
##   ..   `statistics.# unknown_attackers` = col_double(),
##   ..   attacker.gender = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

Cleaning the data

names(suicide_attacks) <- gsub("[(). \\-]", "_", names(suicide_attacks)) # replace ., (, ), space, and - with an underscore
names(suicide_attacks) <- gsub("_$", "", names(suicide_attacks))  # remove trailing underscore

head(suicide_attacks) #verify

## # A tibble: 6 × 39
##   groups           claim status statistics_sources date_year date_month date_day
##   <chr>            <chr> <chr>               <dbl>     <dbl>      <dbl>    <dbl>
## 1 Islamic State    Susp… Confi…                  2      2015          6        2
## 2 Islamic State    Susp… Possi…                  3      2017          1        6
## 3 Islamic State    Susp… Possi…                  3      2017          1        6
## 4 Unknown Group    Uncl… Confi…                  4      2004         10        5
## 5 Taliban (IEA)    Clai… Possi…                  5      2017          7        4
## 6 Al-Jaysh al-Isl… Clai… Confi…                  4      2012         10        3
## # ℹ 32 more variables: `statistics_#_wounded_low` <dbl>,
## #   `statistics_#_wounded_high` <dbl>, `statistics_#_killed_low` <dbl>,
## #   `statistics_#_killed_high` <dbl>, `statistics_#_killed_low_civilian` <dbl>,
## #   `statistics_#_killed_high_civilian` <dbl>,
## #   `statistics_#_killed_low_political` <dbl>,
## #   `statistics_#_killed_high_political` <dbl>,
## #   `statistics_#_killed_low_security` <dbl>, …
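As a quick check of the renaming step, the same two gsub calls can be run on a few of the original column names. This is a minimal sketch; the vector sample_names is a hypothetical stand-in for names(suicide_attacks).

```r
# Hypothetical sample of the original column names
sample_names <- c("date.year", "statistics.# killed_high", "target.type")

# Replace ., (, ), space, and - with an underscore
cleaned <- gsub("[(). \\-]", "_", sample_names)

# Remove any trailing underscore
cleaned <- gsub("_$", "", cleaned)

print(cleaned)
```

The "#" stays in the names, which is why those columns still need backticks when they are used later.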

Checking for NA’s

colSums(is.na(suicide_attacks))

##                             groups                              claim 
##                                  0                                  0 
##                             status                 statistics_sources 
##                                  0                                  0 
##                          date_year                         date_month 
##                                  0                                  0 
##                           date_day           statistics_#_wounded_low 
##                                  0                                  0 
##          statistics_#_wounded_high            statistics_#_killed_low 
##                                  0                                  0 
##           statistics_#_killed_high   statistics_#_killed_low_civilian 
##                                  0                                  0 
##  statistics_#_killed_high_civilian  statistics_#_killed_low_political 
##                                  0                                  0 
## statistics_#_killed_high_political   statistics_#_killed_low_security 
##                                  0                                  0 
##  statistics_#_killed_high_security             statistics_#_belt_bomb 
##                                  0                                  0 
##            statistics_#_truck_bomb              statistics_#_car_bomb 
##                                  0                                  0 
##            statistics_#_weapon_oth            statistics_#_weapon_unk 
##                                  0                                  0 
##                      target_weapon                      target_region 
##                                  0                                  0 
##                   target_subregion                     target_country 
##                                  0                                  0 
##                    target_province                        target_city 
##                                  0                                  0 
##                    target_location                    target_latitude 
##                                  0                                  0 
##                  target_longtitude                        target_desc 
##                                  0                                  0 
##                        target_type                 target_nationality 
##                                  0                                  0 
##             statistics_#_attackers      statistics_#_female_attackers 
##                                  0                                  0 
##        statistics_#_male_attackers     statistics_#_unknown_attackers 
##                                  0                                  0 
##                    attacker_gender 
##                                  0

There were no NA’s, so I did not need to filter any out.
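A count of zero NA’s does not rule out missing values that are stored as placeholder numbers. The quality-control table for num_killed later in this report includes a value of -1, which may be such a placeholder (this is an assumption, not something confirmed here). A minimal sketch of how that could be checked and recoded, using a small made-up data frame:

```r
# Made-up example data; in the real data the column is `statistics_#_killed_high`
toy <- data.frame(num_killed = c(5, -1, 0, 40, -1, 12))

# Count values that cannot be real death counts
n_negative <- sum(toy$num_killed < 0)

# Recode them to NA (assumes -1 means "unknown")
toy$num_killed[toy$num_killed < 0] <- NA

print(n_negative)
print(colSums(is.na(toy)))
```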

Checking how many different target types there are

table(suicide_attacks$target_type)

## 
##  Civilian Political  Security   Unknown 
##      2386      1080      6548         4

The Unknown category had too few observations (only 4), so I filtered it out.

Filtering, renaming and selecting variables

filtering_suicide_attacks<-suicide_attacks |>
  rename(num_killed =`statistics_#_killed_high`, num_attackers =`statistics_#_attackers`) |> #Learned in data 110
  filter(target_type != "Unknown") |> # removing unknown as a target type because it only had 4
  select(claim,num_killed, num_attackers, target_type, date_year, target_region)
filtering_suicide_attacks

## # A tibble: 10,014 × 6
##    claim     num_killed num_attackers target_type date_year target_region
##    <chr>          <dbl>         <dbl> <chr>           <dbl> <chr>        
##  1 Suspected          5             1 Security         2015 Asia         
##  2 Suspected         40             2 Security         2017 Asia         
##  3 Suspected         40             2 Security         2017 Asia         
##  4 Unclaimed         10             1 Security         2004 Asia         
##  5 Claimed            0             1 Security         2017 Asia         
##  6 Claimed           40             3 Civilian         2012 Asia         
##  7 Claimed           40             3 Civilian         2012 Asia         
##  8 Claimed           40             3 Civilian         2012 Asia         
##  9 Claimed           40             3 Civilian         2012 Asia         
## 10 Claimed           40             3 Civilian         2012 Asia         
## # ℹ 10,004 more rows

Checking how many different types of claims there are

table(filtering_suicide_attacks$claim)

## 
##   Claimed    Denied Suspected Unclaimed 
##      4237       128      1438      4211

Claimed vs. not claimed

suicide_attacks_clean <- filtering_suicide_attacks |>
  # binary outcome for the logistic regression model: 1 = Claimed, 0 = Denied, Suspected, or Unclaimed
  mutate(claim = ifelse(claim == "Claimed", 1, 0),
         target_type = as.factor(target_type),
         target_region = as.factor(target_region))
  

suicide_attacks_clean

## # A tibble: 10,014 × 6
##    claim num_killed num_attackers target_type date_year target_region
##    <dbl>      <dbl>         <dbl> <fct>           <dbl> <fct>        
##  1     0          5             1 Security         2015 Asia         
##  2     0         40             2 Security         2017 Asia         
##  3     0         40             2 Security         2017 Asia         
##  4     0         10             1 Security         2004 Asia         
##  5     1          0             1 Security         2017 Asia         
##  6     1         40             3 Civilian         2012 Asia         
##  7     1         40             3 Civilian         2012 Asia         
##  8     1         40             3 Civilian         2012 Asia         
##  9     1         40             3 Civilian         2012 Asia         
## 10     1         40             3 Civilian         2012 Asia         
## # ℹ 10,004 more rows
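The recoding can be cross-checked against the claim table above. This sketch re-enters the four counts by hand (copied from the table output) and shows how many attacks fall into each binary group; the object names are only for illustration.

```r
# Counts copied from table(filtering_suicide_attacks$claim)
claim_counts <- c(Claimed = 4237, Denied = 128, Suspected = 1438, Unclaimed = 4211)

# Everything that is not "Claimed" becomes 0 in the binary outcome
binary_counts <- c(
  not_claimed = sum(claim_counts[names(claim_counts) != "Claimed"]),
  claimed     = unname(claim_counts["Claimed"])
)

print(binary_counts)
print(round(binary_counts / sum(binary_counts), 3))
```

Denied and Suspected attacks are therefore treated the same as Unclaimed ones in the model.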

Logistic Regression:

For the logistic regression, I first did quality control to check that my categorical predictors had at least 5 observations in each category for both claimed and unclaimed attacks. I then fitted the logistic regression model to predict whether an attack is claimed or not, and interpreted the effects of the statistically significant predictors (p<0.05) on an attack being claimed. Next, I checked the R^2 value to see how much of the variation in claimed attacks is explained by number of attackers, number of people killed, target type, year, and target region. Furthermore, I calculated the overall p-value to understand whether the model is statistically significant. Lastly, I produced a confusion matrix and key performance metrics to understand the model’s errors, predictive strengths, and limitations.

Quality control

xtabs(~ claim + num_attackers, data=suicide_attacks_clean)

##      num_attackers
## claim    1    2    3    4    5    6    7    8    9   10   13   15   50
##     0 3941  934  372  200  120   36   28   40   18   60   13   15    0
##     1 2771  674  327  136  115   72   28   16    0   20   13   15   50

xtabs(~ claim + target_type, data=suicide_attacks_clean)

##      target_type
## claim Civilian Political Security
##     0     1566       518     3693
##     1      820       562     2855

xtabs(~ claim + num_killed, data=suicide_attacks_clean)

##      num_killed
## claim   -1    0    1    2    3    4    5    6    7    8    9   10   11   12
##     0   20 1298  527  522  474  369  350  231  243  194  107  181  119   92
##     1   17  722  288  347  265  286  261  196  168  114  108  142   90  130
##      num_killed
## claim   13   14   15   16   17   18   19   20   21   22   23   24   25   26
##     0   79   51   87   50   63   53   29   75   22   23   27   15   52   22
##     1   60   67   85   61   61   51   34   53   36   23   31   19   27   15
##      num_killed
## claim   27   28   29   30   31   32   33   34   35   36   37   38   39   40
##     0   18   12    4   40   10   15   12    8   12    7    8    4    8   26
##     1   21   20   13   37   18   11    6    6   32    7    8    5    1   74
##      num_killed
## claim   41   42   43   44   45   46   47   48   49   50   51   52   53   54
##     0   13    8    6    2    8    2    1    7    6   15    3    1    8    4
##     1    5    4    6    6    6   17    2    4    1    5    1    3    3    7
##      num_killed
## claim   55   56   57   58   59   60   61   62   63   64   65   66   67   68
##     0    6    6    4    6    1    7    7    0    1    0    2    1    4    4
##     1    3    3    2    6    4    9    9    3    0    3    1    2    1    3
##      num_killed
## claim   69   70   71   72   73   74   75   76   78   80   83   84   85   88
##     0    1   11    3    0    3    0    1    4    2    2    2    0    6    4
##     1    1    3    1    5    1    2    0    4    0    1    7    1    1    1
##      num_killed
## claim   89   90   91   92   93   94   95   96  100  101  102  103  105  106
##     0    0    2    2    0    0    0    0    2    4    0    2    0    0    0
##     1    3    1    1    1    2    1    2    0    3    1    3    1    2    2
##      num_killed
## claim  114  115  117  118  120  125  126  130  135  143  149  150  152  156
##     0    0    0    1    0    3    0    1    0    1    1    0    7    0    1
##     1    1    2    0    2    1    1    0    1    0    1    1    1    2    3
##      num_killed
## claim  165  184  200  202  213  241  250  324  358 2753
##     0    1    5    0    0    0    2    0    0    1   10
##     1    0    5   13    2    1    1    2    1    0   10

xtabs(~ claim + target_region, data=suicide_attacks_clean)

##      target_region
## claim Africa Americas Asia Europe
##     0    919       22 4738     98
##     1    442       16 3718     61

xtabs(~ claim + date_year, data=suicide_attacks_clean)

##      date_year
## claim 1974 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993
##     0    0    0    0    5    1    3    1    0    0    0    0    3    1    2
##     1    3    3    2    4    2   40    2    3    3    2    7    7    0   20
##      date_year
## claim 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007
##     0    4    5    2   10    6    4   15   25   14   88  129  269  285  448
##     1   16   31   23   19   26   38   34   71   71   63  135  159  154  174
##      date_year
## claim 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019
##     0  320  221  168  194  212  396  525  787  553  579  303  199
##     1  195  171  206  153  199  264  403  313  446  416  188  171

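For the categorical predictors (target_type and target_region), every level has more than 5 observations in both claim groups, so there is no need to drop or merge any levels. The numeric predictors (num_attackers, num_killed, and date_year) do have some sparse values, for example no claimed attacks with 9 attackers and no unclaimed attacks in several early years, but they enter the model as continuous variables rather than as separate categories. The num_killed table also shows a value of -1 for 37 attacks, which may be a placeholder for an unknown count rather than a real number.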
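The cell-count check for the two categorical predictors can also be done in code rather than by eye. A minimal sketch, with the counts typed in from the xtabs output above (the object names are only for illustration):

```r
# Counts copied from xtabs(~ claim + target_type)
target_type_tab <- matrix(c(1566, 518, 3693,
                             820, 562, 2855),
                          nrow = 2, byrow = TRUE,
                          dimnames = list(claim = c("0", "1"),
                                          target_type = c("Civilian", "Political", "Security")))

# Counts copied from xtabs(~ claim + target_region)
target_region_tab <- matrix(c(919, 22, 4738, 98,
                              442, 16, 3718, 61),
                            nrow = 2, byrow = TRUE,
                            dimnames = list(claim = c("0", "1"),
                                            target_region = c("Africa", "Americas", "Asia", "Europe")))

# Smallest cell in each table, and whether all cells exceed 5
min_cells <- c(target_type = min(target_type_tab),
               target_region = min(target_region_tab))
all_ok <- all(min_cells > 5)

print(min_cells)
print(all_ok)
```

With the cleaned data in memory, the same check could be written as min(xtabs(~ claim + target_type, data = suicide_attacks_clean)).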

Logistic Regression Model

logistic <- glm(claim ~num_attackers+num_killed+target_type+date_year+target_region, data=suicide_attacks_clean, family="binomial")

summary(logistic)

## 
## Call:
## glm(formula = claim ~ num_attackers + num_killed + target_type + 
##     date_year + target_region, family = "binomial", data = suicide_attacks_clean)
## 
## Coefficients:
##                         Estimate Std. Error z value Pr(>|z|)    
## (Intercept)           73.5021814  7.6032238   9.667  < 2e-16 ***
## num_attackers          0.0524707  0.0075206   6.977 3.02e-12 ***
## num_killed             0.0007869  0.0002855   2.756 0.005853 ** 
## target_typePolitical   0.6876375  0.0761571   9.029  < 2e-16 ***
## target_typeSecurity    0.3224261  0.0520843   6.190 6.00e-10 ***
## date_year             -0.0369949  0.0037745  -9.801  < 2e-16 ***
## target_regionAmericas -1.6511216  0.6300264  -2.621 0.008774 ** 
## target_regionAsia      0.2417477  0.0657500   3.677 0.000236 ***
## target_regionEurope   -0.0941257  0.1776315  -0.530 0.596186    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 13645  on 10013  degrees of freedom
## Residual deviance: 13323  on 10005  degrees of freedom
## AIC: 13341
## 
## Number of Fisher Scoring iterations: 4

Interpretation: Using the fitted logistic regression model to predict whether an attack is claimed or not, we can take the predictors that are statistically significant (p<0.05) and interpret their effects on an attack being claimed. The log-odds coefficients for the variables are given below.

num_attackers = 0.0525 (p=3.02e-12): Each additional attacker increases the log-odds of an attack being claimed by about 0.052, holding the other predictors constant. This predictor is statistically significant, meaning that the number of attackers helps predict whether an attack is claimed.

num_killed = 0.0007869 (p=0.005853): Each additional person killed increases the log-odds of an attack being claimed, but only very slightly. This predictor is statistically significant, meaning that the number of people killed plays a small role in predicting whether an attack is claimed or not.

target_type Political = 0.688 (p<2e-16): Attacks on political targets have higher log-odds of being claimed than attacks on the reference category, civilian targets. This is highly significant.

target_type Security = 0.322 (p=6.00e-10): Attacks on security targets have higher log-odds of being claimed than attacks on civilian targets. This is very significant.

date_year = -0.037 (p<2e-16): Negative effect. Each later year lowers the log-odds of an attack being claimed by about 0.037, so attacks became less likely to be claimed over time. This is highly significant.

target_region Americas = -1.65 (p=0.008774): Attacks in the Americas have much lower log-odds of being claimed than attacks in the reference region, Africa. This is statistically significant.

target_region Asia = 0.24 (p=0.000236): Attacks in Asia have higher log-odds of being claimed than attacks in the reference region, Africa. This is statistically significant.

target_region Europe = -0.094 (p=0.596186): Slight negative effect compared to Africa, but not significant. Adjusted for the other variables, Europe as a region does not add much here.

The overall model has a lower residual deviance (13323) than null deviance (13645), which means that the predictors above do improve the model fit compared to an intercept-only model.
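Log-odds coefficients are easier to read once they are exponentiated into odds ratios. The sketch below re-enters the estimates from the summary output by hand, so it runs on its own; the object names are only for illustration.

```r
# Estimates copied from summary(logistic)
coefs <- c(num_attackers         =  0.0524707,
           num_killed            =  0.0007869,
           target_typePolitical  =  0.6876375,
           target_typeSecurity   =  0.3224261,
           date_year             = -0.0369949,
           target_regionAmericas = -1.6511216,
           target_regionAsia     =  0.2417477,
           target_regionEurope   = -0.0941257)

# Odds ratio = exp(log-odds coefficient)
odds_ratios <- exp(coefs)

print(round(odds_ratios, 3))
```

An odds ratio above 1 means higher odds of a claim and below 1 means lower odds. For example, the odds of a claim for a political target are roughly double those for a civilian target. With the fitted model in memory, exp(coef(logistic)) gives the same values directly.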

Calculating R^2

r_square <- 1 - (logistic$deviance/logistic$null.deviance)

r_square

## [1] 0.02353615

The R^2 is 0.0235, which means that the model explains about 2.4% of the variation in claimed attacks, based on number of attackers, number of people killed, target type, year, and target region. Because it is calculated from the deviances, this is a pseudo-R^2 (McFadden’s) rather than the R^2 of a linear regression. Low values like this are common in logistic regressions on behavioral data, such as whether a terrorist group claims an attack. A low R^2 does not mean the model is incorrect; it just indicates that claiming a suicide attack depends on many other behavioral and social factors. Disclaimer: I am aware that the R^2 is low and that my model could benefit from additional predictors to improve its explanatory power.

Calculating P-value

1 - pchisq((logistic$null.deviance - logistic$deviance), df=(length(logistic$coefficients) -1))

## [1] 0

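The p-value prints as 0 because it is too small for this calculation to represent, meaning that the model as a whole is a statistically significant improvement over an intercept-only model.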
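The exact 0 comes from subtracting a number that rounds to 1 from 1. Asking pchisq for the upper tail directly avoids that rounding. A minimal sketch, using the rounded deviances and degrees of freedom printed in the model summary:

```r
# Values copied (rounded) from summary(logistic)
null_dev  <- 13645   # null deviance, 10013 degrees of freedom
resid_dev <- 13323   # residual deviance, 10005 degrees of freedom

# Likelihood-ratio statistic and its degrees of freedom
lr_stat <- null_dev - resid_dev
df_diff <- 10013 - 10005

# Upper-tail probability computed directly instead of 1 - pchisq(...)
p_value <- pchisq(lr_stat, df = df_diff, lower.tail = FALSE)

print(lr_stat)
print(df_diff)
print(p_value)
```

The result is a very small positive number rather than 0, which leads to the same conclusion.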

Confusion Matrix

predicted.probs <- logistic$fitted.values
predicted.classes <- ifelse (predicted.probs > 0.5, 1, 0)
confusion <- table(
Predicted = factor (predicted.classes, levels = c(0, 1)),
Actual = factor (suicide_attacks_clean$claim, levels = c(0, 1)))

confusion

##          Actual
## Predicted    0    1
##         0 5192 3455
##         1  585  782

5192 attacks were not claimed, and the model correctly predicted them as not claimed. This is a True Negative.

3455 attacks were claimed, but the model mistakenly predicted them as not claimed. This is a False Negative.

585 attacks were not claimed, but the model mistakenly predicted them as claimed. This is a False Positive.

782 attacks were actually claimed, and the model correctly predicted them as claimed. This is a True Positive.

Calculating key performance metrics

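#Extract values from the confusion matrix (rows = Predicted, columns = Actual):
TN <- confusion["0", "0"]
FP <- confusion["1", "0"]
FN <- confusion["0", "1"]
TP <- confusion["1", "1"]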

#Metrics 
accuracy <- (TP + TN) / (TP + TN + FP + FN)
sensitivity <- TP / (TP + FN)   # also called recall or true positive rate
specificity <- TN / (TN + FP)   # true negative rate
precision <- TP / (TP + FP)     # positive predictive value
f1_score <- 2 * (precision * sensitivity) / (precision + sensitivity)

cat("Accuracy:", round(accuracy, 3), "\nSensitivity:", round(sensitivity, 3), "\nSpecificity:", round(specificity, 3), "\nPrecision:", round(precision, 3))

## Accuracy: 0.597 
## Sensitivity: 0.185 
## Specificity: 0.899 
## Precision: 0.572

The model has an accuracy of 59.7%, which is only slightly above the baseline of 57.7% that would come from predicting every attack as not claimed (5,777 of the 10,014 attacks were not claimed). Overall, it is a weak predictive model.

The model is better at identifying attacks that were not claimed than attacks that were claimed. It has a high specificity of 89.9%, but its sensitivity is only 18.5%, which is poor: the model struggles to detect attacks that were claimed and misses a lot of them.

The precision is 57.2%, meaning that just over half of the attacks the model predicts as claimed really were claimed. Together with the low sensitivity, this shows that the model misses a lot of claims.
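Two more numbers help put these metrics in context: the accuracy of always guessing the larger class, and the F1 score that combines precision and sensitivity. A minimal sketch using the four counts from the confusion matrix:

```r
# Counts copied from the confusion matrix
TN <- 5192
FP <- 585
FN <- 3455
TP <- 782
n  <- TN + FP + FN + TP

# Accuracy of always predicting the larger class (not claimed)
baseline_accuracy <- max(TN + FP, FN + TP) / n

# Model accuracy, precision, sensitivity, and F1
accuracy    <- (TP + TN) / n
precision   <- TP / (TP + FP)
sensitivity <- TP / (TP + FN)
f1_score    <- 2 * (precision * sensitivity) / (precision + sensitivity)

cat("Baseline accuracy:", round(baseline_accuracy, 3),
    "\nModel accuracy:", round(accuracy, 3),
    "\nF1 score:", round(f1_score, 3), "\n")
```

The gap between the model and the baseline is about two percentage points, and the F1 score is low because the sensitivity is low.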

In conclusion, the model is better at identifying suicide attacks that were not claimed than attacks that were claimed. This is important to consider because, in a research context, being able to identify attacks that are claimed is far more meaningful than identifying attacks that are not. The model’s inability to identify claimed attacks suggests that the predictors included are not sufficient to predict claims. This highlights that whether a terrorist organization claims an attack does not depend only on the size and deadliness of the attack, the type and region of the target, and the year in which it happened.
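The sensitivity and specificity above depend on the 0.5 cutoff used to turn probabilities into predictions. The sketch below uses made-up probabilities and outcomes (not the project data) to illustrate how lowering the cutoff trades specificity for sensitivity; rates_at is a hypothetical helper written for this example.

```r
# Made-up predicted probabilities and true outcomes, for illustration only
probs  <- c(0.62, 0.48, 0.45, 0.41, 0.55, 0.38, 0.44, 0.35)
actual <- c(1,    1,    1,    0,    0,    0,    1,    0)

# Sensitivity and specificity at a given cutoff
rates_at <- function(cutoff) {
  predicted <- ifelse(probs > cutoff, 1, 0)
  c(sensitivity = sum(predicted == 1 & actual == 1) / sum(actual == 1),
    specificity = sum(predicted == 0 & actual == 0) / sum(actual == 0))
}

print(rates_at(0.5))
print(rates_at(0.4))
```

Whether a lower cutoff would help on the real data is not tested here; the ROC curve in the next section summarizes this trade-off across all cutoffs.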

ROC and AUC Curve

library(pROC)

## Type 'citation("pROC")' for a citation.

## 
## Attaching package: 'pROC'

## The following objects are masked from 'package:stats':
## 
##     cov, smooth, var

# ROC curve & AUC on full data
roc_obj <- roc(response = suicide_attacks_clean$claim,
               predictor = logistic$fitted.values,
               levels = c(0, 1),
               direction = "<")  

# Print AUC value
auc_val <- auc(roc_obj); auc_val

## Area under the curve: 0.5784

# Plot ROC with AUC displayed
plot.roc(roc_obj, print.auc = TRUE, legacy.axes = TRUE,
         xlab = "False Positive Rate (1 - Specificity)",
         ylab = "True Positive Rate (Sensitivity)")

The AUC of 0.578 means that the model is only marginally better than chance (an AUC of 0.5) at distinguishing between claimed attacks and unclaimed attacks.

On the plot, the curve is above the diagonal “random guess” line, which shows the model is better than chance.

In plain words: if you randomly pick one claimed attack and one unclaimed attack, the model has about a 58% chance of ranking the claimed attack higher (giving it a higher predicted probability of being claimed).
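This pairwise reading of the AUC can be checked by hand on a tiny example. The scores below are made up for illustration and are not from the project data; the calculation compares every claimed attack with every unclaimed attack and counts how often the claimed one gets the higher score.

```r
# Made-up predicted probabilities, for illustration only
claimed_scores   <- c(0.6, 0.4, 0.7)
unclaimed_scores <- c(0.3, 0.5)

# Compare every claimed attack with every unclaimed attack
wins <- outer(claimed_scores, unclaimed_scores, ">")
ties <- outer(claimed_scores, unclaimed_scores, "==")

# Share of pairs where the claimed attack is ranked higher (ties count half)
auc_by_pairs <- mean(wins + 0.5 * ties)

print(auc_by_pairs)
```

Here 5 of the 6 pairs are ranked correctly. Applying the same idea to the real fitted values is what the AUC of 0.578 summarizes.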

Conclusion and Future Directions:

Overall, this logistic regression model examined five predictors of whether a suicide attack is claimed. Target type was noted as one of the strongest predictors of a suicide attack being claimed. In particular, political targets had the highest log-odds of an attack being claimed compared to the reference, civilian targets. Among the target regions, Asia had higher log-odds of a claim than the reference region Africa, while the Americas had the lowest. All coefficients were statistically significant except the one for the target region Europe. However, the model had several predictive constraints. The model is better at identifying attacks that were not claimed than attacks that were claimed: it has a high specificity of 89.9%, but its sensitivity is only 18.5%, which is poor and means that it misses a lot of claimed attacks. The precision is 57.2%, so just over half of the attacks predicted as claimed really were claimed.

As stated earlier, the R^2 of 0.0235 means that the model explains about 2.4% of the variation in claimed attacks, based on number of attackers, number of people killed, target type, year, and target region. Low values like this are common in logistic regressions on behavioral data, such as whether a terrorist group claims an attack.

Additionally, the AUC of 0.578 means that the model is only slightly better than chance at distinguishing between claimed attacks and unclaimed attacks. Thus, if you randomly pick one claimed attack and one unclaimed attack, the model has about a 58% chance of ranking the claimed attack higher (giving it a higher predicted probability of being claimed).

Potential avenues:

A potential avenue is to use a different data set on suicide attack claims that incorporates different factors. I tried adding more factors from this data set, but they were not helpful, and the model still explains only about 2.4% of the variation. A low R^2 does not mean the model is incorrect; it just indicates that claiming a suicide attack depends on many other behavioral and social factors, which I could explore with another data set and compare against this one.

Project3

Ayan Elmi

2025-12-02