The topic of the dataset is suicide attacks. The data were collected by the Chicago Project on Security and Terrorism (CPOST) at the University of Chicago, which maintains a searchable database of all known suicide attacks across the globe from 1982 to October 2020. The dataset includes 10018 observations with 39 variables. I chose this topic because I wanted to see what factors make a suicide attack more harmful, which led to my question: what characteristics of a suicide attack make it more deadly? The variables I intend to use to answer this question are the dataset's low and high estimates of the number of individuals killed, the type of weapon used by the attacker, the type of target attacked, the subregion of the attack, the number of attackers involved, the year of the attack, and the month of the attack.
statistics.# killed_low, low estimate of the number of deaths caused by the attack
statistics.# killed_high, high estimate of the number of deaths caused by the attack
target.weapon, the weapon used by the attacker
target.type, the target of the attack (civilian, political, security)
target.subregion, the subregion of the attack (e.g. Southern Asia, Western Asia)
statistics.# attackers, the number of attackers involved in the attack
date.year, the year the attack occurred
date.month, the month the attack occurred
library(tidyverse)
library(highcharter)
library(RColorBrewer)
library(ggthemes)
library(GGally)
setwd("C:/Users/wesle/Downloads")
sads <- read_csv("suicide_attacks.csv")
colSums(is.na(sads)) # checking the dataset for any NAs
## groups claim
## 0 0
## status statistics.sources
## 0 0
## date.year date.month
## 0 0
## date.day statistics.# wounded_low
## 0 0
## statistics.# wounded_high statistics.# killed_low
## 0 0
## statistics.# killed_high statistics.# killed_low_civilian
## 0 0
## statistics.# killed_high_civilian statistics.# killed_low_political
## 0 0
## statistics.# killed_high_political statistics.# killed_low_security
## 0 0
## statistics.# killed_high_security statistics.# belt_bomb
## 0 0
## statistics.# truck_bomb statistics.# car_bomb
## 0 0
## statistics.# weapon_oth statistics.# weapon_unk
## 0 0
## target.weapon target.region
## 0 0
## target.subregion target.country
## 0 0
## target.province target.city
## 0 0
## target.location target.latitude
## 0 0
## target.longtitude target.desc
## 0 0
## target.type target.nationality
## 0 0
## statistics.# attackers statistics.# female_attackers
## 0 0
## statistics.# male_attackers statistics.# unknown_attackers
## 0 0
## attacker.gender
## 0
names(sads) <- gsub("\\.","_",names(sads)) # replace the . that separated words with _ to keep names consistent throughout the dataset
names(sads) <- gsub(" ","_",names(sads)) # replace the spaces that separated parts of a variable's name with _ for the same reason
names(sads) <- gsub("#","num",names(sads)) # replace the # that stood for "number" with num so the names are easier to read at first glance
sads1 <- sads |>
  filter(statistics_num_killed_low >= 0) # several low and high estimates contain a -1, which most likely stands for NA since an attack cannot cause a negative number of deaths; rows with a negative low estimate are dropped
There were no NAs in the dataset. However, the low and high estimates of deaths contained a placeholder, -1, that stood in for NA. I removed the observations with a -1 in their death estimates (the filter is applied to the low estimate), as it isn't possible to have a negative number of deaths in an attack. This shouldn't have much impact on later work, as the dataset has 10018 observations and the placeholders accounted for only 37 of them, leaving 9981 observations. I also changed the column names, mainly for consistency. The dataset used a space, a period, or an underscore to separate words in a variable name, so I changed every separator to an underscore. I also replaced the # that symbolized "number" with num, to make the variable names easier to understand at first glance.
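Before dropping rows, it can help to count how often the placeholder appears in each estimate column. The following is a minimal sketch on a made-up data frame: `toy` and its values are hypothetical, not taken from the CPOST data, and only the column names mirror the renamed variables.

```r
# Hypothetical toy data; the values are made up for illustration
toy <- data.frame(
  statistics_num_killed_low  = c(3, -1, 0, 12, -1),
  statistics_num_killed_high = c(5, -1, 0, 15, 20)
)

# Count the -1 placeholders in each estimate column
placeholder_counts <- colSums(toy == -1)
placeholder_counts

# Keep only rows where both estimates are non-negative
toy_clean <- toy[toy$statistics_num_killed_low >= 0 &
                 toy$statistics_num_killed_high >= 0, ]
nrow(toy_clean)
```

Checking both columns is an extra safeguard in this sketch; it assumes the placeholder might appear in one estimate without appearing in the other, which may or may not happen in the real data.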
colors <- c("red", "orange", "yellow", "green", "blue", "purple", "pink", "hotpink", "black", "maroon", "mediumseagreen", "dodgerblue", "aquamarine", "skyblue", "violet", "orangered", "firebrick") # Colors List
sadsawh <- sads1 |>
group_by(target_weapon) |>
summarize(highnumavg = round(mean(statistics_num_killed_high))) # grouping by weapon type and taking the average of the high estimated number killed to make it easier to create the visualization
highchart() |>
hc_add_series(data = sadsawh, type = "column", hcaes(x = target_weapon, y = highnumavg, group = target_weapon)) |>
hc_title(text = "Average High Estimate of the Number of Individuals Killed vs Attacker Weapon Type") |>
hc_caption(text = "University of Chicago, Chicago Project on Security and Terrorism (CPOST)") |>
hc_xAxis(title = list(text = "Attacker Weapon Type")) |>
hc_yAxis(title = list(text = "Average High Estimate of the Number of Individuals Killed")) |>
hc_colors(colors) |>
hc_add_theme(hc_theme_darkunica())
Although this graph shows only the high estimates of the number of deaths, it suggests that weapon type is a factor in how many deaths an attack causes.
sadsawl <- sads1 |>
group_by(target_weapon) |>
summarize(lownumavg = round(mean(statistics_num_killed_low))) # grouping by weapon type and taking the average of the low estimated number killed to make it easier to create the visualization
highchart() |>
hc_add_series(data = sadsawl, type = "column", hcaes(x = target_weapon, y = lownumavg, group = target_weapon)) |>
hc_title(text = "Average Low Estimate of the Number of Individuals Killed vs Attacker Weapon Type") |>
hc_caption(text = "University of Chicago, Chicago Project on Security and Terrorism (CPOST)") |>
hc_xAxis(title = list(text = "Attacker Weapon Type")) |>
hc_yAxis(title = list(text = "Average Low Estimate of the Number of Individuals Killed")) |>
hc_colors(colors) |>
hc_add_theme(hc_theme_superheroes())
Both of these visualizations are similar, which is expected, as they both look at the number of deaths caused by an attack. The first shows the high estimate and the second shows the low estimate. There is a slight difference between the two, as expected, but the pattern across weapon types is the same: in both, animal bombs are the lowest and airplanes are the highest by a significant margin. This suggests a relationship between the number of individuals killed and the type of weapon used, which is useful in answering the question in the introduction. A possible reason for airplanes having such a high average is that the data include the 9/11 attacks in New York, which fall within the 1982 to 2020 range the data cover. A later Tableau visualization supports this, as it shows an airplane attack in the US with a very large number of deaths.
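Because a group with only a few attacks can have its average pulled up by one extreme event, comparing the mean with the median per weapon type is one way to check how much a single attack drives the result. The sketch below uses a hypothetical data frame (`toy`, with invented values) and base R's `aggregate()`; it is an illustration, not a result from the CPOST data.

```r
# Hypothetical toy data: one extreme attack in the "Airplane" group
toy <- data.frame(
  target_weapon = c("Airplane", "Airplane", "Airplane",
                    "Car bomb", "Car bomb", "Car bomb"),
  statistics_num_killed_high = c(1000, 10, 5, 12, 8, 10)
)

# Mean and median deaths per weapon type
avg <- aggregate(statistics_num_killed_high ~ target_weapon, data = toy, FUN = mean)
med <- aggregate(statistics_num_killed_high ~ target_weapon, data = toy, FUN = median)

# Combine into one table; the suffixes distinguish the two summaries
summary_tbl <- merge(avg, med, by = "target_weapon",
                     suffixes = c("_mean", "_median"))
summary_tbl
```

In this made-up example the two groups share the same median while their means differ greatly, which is the signature of an outlier-driven average.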
mlrmh <- lm(statistics_num_killed_high ~ target_weapon + target_type + target_subregion + statistics_num_attackers + date_year + date_month, data = sads1)
summary(mlrmh)
##
## Call:
## lm(formula = statistics_num_killed_high ~ target_weapon + target_type +
## target_subregion + statistics_num_attackers + date_year +
## date_month, data = sads1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1797.19 -8.46 -2.36 5.45 902.51
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2587.8266 294.5619 8.785 <2e-16 ***
## target_weaponAnimal bomb -1837.1517 62.5406 -29.375 <2e-16 ***
## target_weaponBackpack bomb -1827.6357 52.4744 -34.829 <2e-16 ***
## target_weaponBelt bomb -1827.4068 51.6916 -35.352 <2e-16 ***
## target_weaponBoat bomb -1821.4653 51.9091 -35.090 <2e-16 ***
## target_weaponCar bomb -1825.8123 51.6800 -35.329 <2e-16 ***
## target_weaponCart bomb -1824.2801 55.2664 -33.009 <2e-16 ***
## target_weaponMixed -1825.1668 52.0971 -35.034 <2e-16 ***
## target_weaponMotorcycle bomb -1830.1434 51.8057 -35.327 <2e-16 ***
## target_weaponNon-suicide IED -1836.9574 71.9185 -25.542 <2e-16 ***
## target_weaponOther PBIED -1832.7142 51.6637 -35.474 <2e-16 ***
## target_weaponOther VBIED -1823.7340 52.9828 -34.421 <2e-16 ***
## target_weaponScuba bomb -1838.8717 53.8733 -34.133 <2e-16 ***
## target_weaponTruck bomb -1816.2487 51.7556 -35.093 <2e-16 ***
## target_weaponTurban bomb -1816.7367 65.8456 -27.591 <2e-16 ***
## target_weaponUnspecified -1828.8970 51.8009 -35.306 <2e-16 ***
## target_weaponUnspecified PBIED -1830.9245 51.8372 -35.321 <2e-16 ***
## target_typePolitical -33.1355 2.7221 -12.173 <2e-16 ***
## target_typeSecurity -18.9712 1.8517 -10.245 <2e-16 ***
## target_typeUnknown -19.3273 35.4881 -0.545 0.5860
## target_subregionCentral Asia 5.7659 73.4332 0.079 0.9374
## target_subregionEastern Africa 11.9441 70.9199 0.168 0.8663
## target_subregionEastern Asia -1.2571 72.0374 -0.017 0.9861
## target_subregionEastern Europe 6.3098 71.0301 0.089 0.9292
## target_subregionMiddle Africa -6.0390 71.0017 -0.085 0.9322
## target_subregionNorthern Africa 3.0322 70.9252 0.043 0.9659
## target_subregionNorthern America -13.7132 86.6990 -0.158 0.8743
## target_subregionNorthern Europe -5.1706 75.2581 -0.069 0.9452
## target_subregionSouth-Eastern Asia 4.2347 71.5865 0.059 0.9528
## target_subregionSouth America 39.2840 79.0576 0.497 0.6193
## target_subregionSouthern Asia 7.8231 70.7855 0.111 0.9120
## target_subregionSouthern Europe -6.1683 74.1888 -0.083 0.9337
## target_subregionWestern Africa -0.5815 70.8253 -0.008 0.9934
## target_subregionWestern Asia 4.5067 70.7803 0.064 0.9492
## target_subregionWestern Europe 9.1847 73.6682 0.125 0.9008
## statistics_num_attackers 1.6098 0.1928 8.349 <2e-16 ***
## date_year -0.3705 0.1426 -2.598 0.0094 **
## date_month 0.1915 0.2146 0.892 0.3723
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.61 on 9943 degrees of freedom
## Multiple R-squared: 0.6779, Adjusted R-squared: 0.6767
## F-statistic: 565.6 on 37 and 9943 DF, p-value: < 2.2e-16
mlrml <- lm(statistics_num_killed_low ~ target_weapon + target_type + target_subregion + statistics_num_attackers + date_year + date_month, data = sads1)
summary(mlrml)
##
## Call:
## lm(formula = statistics_num_killed_low ~ target_weapon + target_type +
## target_subregion + statistics_num_attackers + date_year +
## date_month, data = sads1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1798.35 -7.03 -1.78 4.92 902.91
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2560.6451 291.4103 8.787 <2e-16 ***
## target_weaponAnimal bomb -1835.8772 61.8715 -29.672 <2e-16 ***
## target_weaponBackpack bomb -1829.5225 51.9129 -35.242 <2e-16 ***
## target_weaponBelt bomb -1828.3145 51.1385 -35.752 <2e-16 ***
## target_weaponBoat bomb -1824.8424 51.3537 -35.535 <2e-16 ***
## target_weaponCar bomb -1826.4898 51.1270 -35.725 <2e-16 ***
## target_weaponCart bomb -1828.6661 54.6751 -33.446 <2e-16 ***
## target_weaponMixed -1829.4242 51.5397 -35.495 <2e-16 ***
## target_weaponMotorcycle bomb -1829.8416 51.2514 -35.703 <2e-16 ***
## target_weaponNon-suicide IED -1835.1306 71.1490 -25.793 <2e-16 ***
## target_weaponOther PBIED -1831.9166 51.1109 -35.842 <2e-16 ***
## target_weaponOther VBIED -1824.5932 52.4159 -34.810 <2e-16 ***
## target_weaponScuba bomb -1839.3827 53.2969 -34.512 <2e-16 ***
## target_weaponTruck bomb -1818.6847 51.2018 -35.520 <2e-16 ***
## target_weaponTurban bomb -1815.9827 65.1411 -27.878 <2e-16 ***
## target_weaponUnspecified -1828.9980 51.2466 -35.690 <2e-16 ***
## target_weaponUnspecified PBIED -1831.1880 51.2826 -35.708 <2e-16 ***
## target_typePolitical -32.1443 2.6930 -11.936 <2e-16 ***
## target_typeSecurity -17.1242 1.8319 -9.348 <2e-16 ***
## target_typeUnknown -15.8063 35.1084 -0.450 0.6526
## target_subregionCentral Asia 4.5491 72.6475 0.063 0.9501
## target_subregionEastern Africa 7.1702 70.1611 0.102 0.9186
## target_subregionEastern Asia -3.8177 71.2666 -0.054 0.9573
## target_subregionEastern Europe 3.2975 70.2701 0.047 0.9626
## target_subregionMiddle Africa -7.9775 70.2420 -0.114 0.9096
## target_subregionNorthern Africa 0.6006 70.1664 0.009 0.9932
## target_subregionNorthern America -14.2349 85.7713 -0.166 0.8682
## target_subregionNorthern Europe -4.7834 74.4529 -0.064 0.9488
## target_subregionSouth-Eastern Asia 1.7268 70.8205 0.024 0.9805
## target_subregionSouth America 38.0726 78.2117 0.487 0.6264
## target_subregionSouthern Asia 4.2577 70.0282 0.061 0.9515
## target_subregionSouthern Europe -7.5425 73.3950 -0.103 0.9182
## target_subregionWestern Africa -4.0628 70.0675 -0.058 0.9538
## target_subregionWestern Asia 1.1861 70.0230 0.017 0.9865
## target_subregionWestern Europe 8.8245 72.8800 0.121 0.9036
## statistics_num_attackers 1.5925 0.1908 8.348 <2e-16 ***
## date_year -0.3566 0.1411 -2.527 0.0115 *
## date_month 0.1508 0.2123 0.710 0.4775
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69.85 on 9943 degrees of freedom
## Multiple R-squared: 0.6829, Adjusted R-squared: 0.6817
## F-statistic: 578.6 on 37 and 9943 DF, p-value: < 2.2e-16
For both the high and low estimates, the overall multiple linear regression model was significant, with a p-value of < 2.2e-16, far below the default α of 0.05. Both models also had similar adjusted r-squared values. The low estimate model was slightly higher at 0.6817, meaning it explained 68.17% of the variance in the low estimate of the number of individuals killed in an attack, while the high estimate model had a value of 0.6767, meaning it explained 67.67% of the variance in the high estimate. Most of the terms were also significant with a p-value of < 2e-16: every weapon type, the political and security target types, and the number of attackers. Year was still significant, with a p-value of 0.0094 in the high estimate model and 0.0115 in the low estimate model. Month was not significant in either model, with a p-value of 0.3723 in the high estimate model and 0.4775 in the low estimate model. The Unknown target type was not significant either (0.5860 and 0.6526). Subregion was not significant at all, as the p-value of every subregion level in both models was greater than 0.6.
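When reading the weapon coefficients, it helps to remember that `lm()` by default encodes a categorical predictor as differences from a baseline level, which does not get its own row in the coefficient table. That would explain why every weapon coefficient above is close to -1800: each is measured against the omitted weapon category (presumably airplanes, though the output does not name the baseline). The sketch below shows the mechanism on a hypothetical data frame with a single factor; `toy`, `weapon`, and `killed` are invented names and values.

```r
# Hypothetical toy data with a three-level factor
toy <- data.frame(
  weapon = factor(c("A", "A", "B", "B", "C", "C")),
  killed = c(100, 120, 10, 14, 20, 22)
)

# With one factor predictor, the intercept is the baseline group's mean
# and each other coefficient is that group's mean minus the baseline mean
fit <- lm(killed ~ weapon, data = toy)
coef(fit)

# Group means, for comparison with the coefficients
group_means <- tapply(toy$killed, toy$weapon, mean)
group_means
```

With several predictors, as in the models above, the intercept is no longer a simple group mean, but the factor coefficients are still read as differences from the baseline level with the other predictors held fixed.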
mlrmh1 <- lm(statistics_num_killed_high ~ target_weapon + target_type + statistics_num_attackers + date_year, data = sads1)
summary(mlrmh1)
##
## Call:
## lm(formula = statistics_num_killed_high ~ target_weapon + target_type +
## statistics_num_attackers + date_year, data = sads1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1799.81 -8.08 -2.42 5.10 903.50
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2696.2224 274.6473 9.817 < 2e-16 ***
## target_weaponAnimal bomb -1819.5012 37.5837 -48.412 < 2e-16 ***
## target_weaponBackpack bomb -1811.1100 15.5423 -116.528 < 2e-16 ***
## target_weaponBelt bomb -1808.7810 12.9078 -140.131 < 2e-16 ***
## target_weaponBoat bomb -1801.2317 13.9085 -129.505 < 2e-16 ***
## target_weaponCar bomb -1807.0376 12.9389 -139.659 < 2e-16 ***
## target_weaponCart bomb -1805.7215 23.4405 -77.034 < 2e-16 ***
## target_weaponMixed -1806.2302 14.4331 -125.145 < 2e-16 ***
## target_weaponMotorcycle bomb -1810.4299 13.3670 -135.440 < 2e-16 ***
## target_weaponNon-suicide IED -1816.7080 51.6025 -35.206 < 2e-16 ***
## target_weaponOther PBIED -1813.6496 13.9245 -130.249 < 2e-16 ***
## target_weaponOther VBIED -1805.5395 17.3651 -103.975 < 2e-16 ***
## target_weaponScuba bomb -1818.3026 20.0395 -90.736 < 2e-16 ***
## target_weaponTruck bomb -1797.7196 13.1906 -136.288 < 2e-16 ***
## target_weaponTurban bomb -1796.6113 42.7809 -41.996 < 2e-16 ***
## target_weaponUnspecified -1810.9341 13.3062 -136.097 < 2e-16 ***
## target_weaponUnspecified PBIED -1813.4302 13.4250 -135.079 < 2e-16 ***
## target_typePolitical -30.5489 2.6182 -11.668 < 2e-16 ***
## target_typeSecurity -17.5360 1.7708 -9.903 < 2e-16 ***
## target_typeUnknown -16.0804 35.4691 -0.453 0.65030
## statistics_num_attackers 1.5847 0.1916 8.270 < 2e-16 ***
## date_year -0.4311 0.1372 -3.143 0.00168 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.61 on 9959 degrees of freedom
## Multiple R-squared: 0.6774, Adjusted R-squared: 0.6767
## F-statistic: 995.7 on 21 and 9959 DF, p-value: < 2.2e-16
mlrml1 <- lm(statistics_num_killed_low ~ target_weapon + target_type + statistics_num_attackers + date_year, data = sads1)
summary(mlrml1)
##
## Call:
## lm(formula = statistics_num_killed_low ~ target_weapon + target_type +
## statistics_num_attackers + date_year, data = sads1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1800.53 -6.72 -1.72 4.29 903.81
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2670.0568 271.6769 9.828 < 2e-16 ***
## target_weaponAnimal bomb -1821.0214 37.1772 -48.982 < 2e-16 ***
## target_weaponBackpack bomb -1815.0216 15.3742 -118.056 < 2e-16 ***
## target_weaponBelt bomb -1812.4285 12.7682 -141.949 < 2e-16 ***
## target_weaponBoat bomb -1807.5518 13.7581 -131.381 < 2e-16 ***
## target_weaponCar bomb -1810.4902 12.7990 -141.456 < 2e-16 ***
## target_weaponCart bomb -1813.0651 23.1870 -78.193 < 2e-16 ***
## target_weaponMixed -1813.3082 14.2770 -127.009 < 2e-16 ***
## target_weaponMotorcycle bomb -1812.9353 13.2224 -137.111 < 2e-16 ***
## target_weaponNon-suicide IED -1817.6555 51.0444 -35.609 < 2e-16 ***
## target_weaponOther PBIED -1815.6155 13.7739 -131.816 < 2e-16 ***
## target_weaponOther VBIED -1809.2394 17.1773 -105.327 < 2e-16 ***
## target_weaponScuba bomb -1821.8262 19.8227 -91.906 < 2e-16 ***
## target_weaponTruck bomb -1802.8981 13.0479 -138.175 < 2e-16 ***
## target_weaponTurban bomb -1798.6968 42.3182 -42.504 < 2e-16 ***
## target_weaponUnspecified -1813.7201 13.1623 -137.796 < 2e-16 ***
## target_weaponUnspecified PBIED -1816.3554 13.2798 -136.776 < 2e-16 ***
## target_typePolitical -29.7622 2.5899 -11.492 < 2e-16 ***
## target_typeSecurity -15.7539 1.7516 -8.994 < 2e-16 ***
## target_typeUnknown -12.7371 35.0855 -0.363 0.71659
## statistics_num_attackers 1.5698 0.1895 8.282 < 2e-16 ***
## date_year -0.4181 0.1357 -3.082 0.00206 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 69.84 on 9959 degrees of freedom
## Multiple R-squared: 0.6824, Adjusted R-squared: 0.6818
## F-statistic: 1019 on 21 and 9959 DF, p-value: < 2.2e-16
Removing both month and subregion had virtually no effect on either model. The high model's adjusted r-squared stayed at 0.6767, and the low model's rose by 0.0001 (from 0.6817 to 0.6818), meaning that the model without month and subregion explains 0.01 percentage points more of the variance in the low estimate of deaths in an attack.
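Adjusted r-squared is one way to compare the full and reduced models; a partial F-test on the two nested fits is another, and it tests all dropped terms at once rather than level by level. The sketch below is a minimal illustration on simulated data (`toy`, `x1`, `month`, and `y` are hypothetical, and `y` is built so that it does not depend on `month`). With the objects from this analysis, the equivalent call would presumably be `anova(mlrmh1, mlrmh)`.

```r
set.seed(1)
n <- 200

# Simulated data: y depends on x1 only, not on month (by construction)
toy <- data.frame(
  x1 = rnorm(n),
  month = sample(1:12, n, replace = TRUE)
)
toy$y <- 3 + 2 * toy$x1 + rnorm(n)

reduced <- lm(y ~ x1, data = toy)          # model without month
full    <- lm(y ~ x1 + month, data = toy)  # model with month

# Partial F-test: does adding month improve the fit?
cmp <- anova(reduced, full)
cmp

# Adjusted r-squared of each model, for comparison
c(reduced = summary(reduced)$adj.r.squared,
  full    = summary(full)$adj.r.squared)
```

A large p-value in the second row of the `anova()` table would indicate that the extra term does not significantly improve the fit, which is the same conclusion the adjusted r-squared comparison points to.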
https://public.tableau.com/shared/5PXMGTQZ7?:display_count=n&:origin=viz_share_link
This visualization maps where each attack occurred. The size and color of each point reflect how many individuals were killed in the attack, using the estimate variables from the dataset. Hovering over a point in Tableau shows both the high and low estimates, along with the longitude, latitude, weapon type, and target type. The plot can also be filtered by weapon type. It supports the earlier point that 9/11 is a large contributor to airplanes having the highest average estimated deaths: there were only 3 airplane attacks, one of them being the 9/11 attacks, which had the most deaths of any attack in the data by a significant margin.
Source(s):
https://corgis-edu.github.io/corgis/csv/suicide_attacks/
https://cpost.uchicago.edu/research/suicide_attacks/database_on_suicide_attacks/
https://time.com/5575956/sri-lanka-history-suicide-bombings-birthplace-invented/