STAT545A hw05: Visualize anything with ggplot2

Rebecca Johnston

I have chosen to explore the Global Terrorism Database (GTD), available for download here. To do this, I will use the ggplot2 package in R, to produce “elegant graphics for data analysis” (one can only hope). I will also use plyr to complete data aggregation tasks where necessary.

Load required R packages and GTD data
Short note on ggplot() function
“geom point” + “facet wraps”
“geom histogram” and plyr
Other variables in GTD data

Load required R packages and GTD data

Load required packages for this session:

library(ggplot2)
library(plyr)

Upload GTD data from local source, and check data structure:

terr <- read.table("globalterrorismdb_1012dist.txt", sep = "\t", quote = "\"", 
    header = TRUE, fill = TRUE)
str(terr)

## 'data.frame':    104457 obs. of  123 variables:
##  $ eventid           : Factor w/ 104438 levels " ","  January 19, 2009.",..: 62 63 64 65 66 67 68 69 70 71 ...
##  $ iyear             : int  1970 1970 1970 1970 1970 1970 1970 1970 1970 1970 ...
##  $ imonth            : int  0 0 1 1 1 1 1 1 1 1 ...
##  $ iday              : int  0 0 0 0 0 1 2 2 2 3 ...
##  $ approxdate        : Factor w/ 275 levels "","01/04/2000",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ extended          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ resolution        : Factor w/ 1836 levels "","1/01/1978",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ country           : int  58 130 160 78 101 217 218 217 217 217 ...
##  $ country_txt       : Factor w/ 210 levels "","Afghanistan",..: 53 123 147 75 97 195 196 195 195 195 ...
##  $ region            : int  2 1 5 8 4 1 3 1 1 1 ...
##  $ region_txt        : Factor w/ 14 levels "","Australasia & Oceania",..: 3 8 12 14 5 8 10 8 8 8 ...
##  $ provstate         : Factor w/ 3581 levels ""," Azad Kashmir",..: 1 1 1 1 1 1391 1 687 3491 3491 ...
##  $ city              : Factor w/ 24688 levels "","'A'Ishah Bakkar",..: 20133 14517 22055 1762 7868 4207 14875 16310 13417 13417 ...
##  $ latitude          : num  18.5 19.4 NA 38 33.6 ...
##  $ longitude         : num  -70 -99.1 NA 23.7 130.4 ...
##  $ specificity       : int  1 1 NA 1 1 1 NA 1 1 1 ...
##  $ vicinity          : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ location          : Factor w/ 21943 levels "","-9","\"Colony 39\" settlement in uncleared areas of Ampara district, in eastern Sri Lanka",..: 1 1 1 1 1 1 1 2426 21792 1 ...
##  $ summary           : Factor w/ 13865 levels "","01/00/2006: American soldiers discovered three decapitated bodies on a soccer field west of Baghdad, Iraq. It is unknown when t"| __truncated__,..: 1 1 1 1 1 1 1 9964 1 1 ...
##  $ crit1             : int  1 1 1 1 1 NA 1 1 NA NA ...
##  $ crit2             : int  1 1 1 1 1 NA 1 1 NA NA ...
##  $ crit3             : int  1 1 1 1 1 NA 1 1 NA NA ...
##  $ doubtterr         : int  -9 -9 -9 -9 -9 NA -9 1 NA NA ...
##  $ alternative       : int  NA NA NA NA NA NA NA 2 NA NA ...
##  $ alternative_txt   : Factor w/ 6 levels "",".","Insurgency/Guerilla Action",..: 2 2 2 2 2 1 2 6 1 1 ...
##  $ multiple          : int  0 0 0 0 0 NA 0 0 NA NA ...
##  $ success           : int  1 1 1 1 1 NA 0 1 NA NA ...
##  $ suicide           : int  0 0 0 0 0 NA 0 0 NA NA ...
##  $ attacktype1       : int  1 6 1 3 7 NA 1 3 NA NA ...
##  $ attacktype1_txt   : Factor w/ 10 levels "","Armed Assault",..: 3 8 3 4 5 1 3 4 1 1 ...
##  $ attacktype2       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ attacktype2_txt   : Factor w/ 8 levels "",".","Armed Assault",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ attacktype3       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ attacktype3_txt   : Factor w/ 6 levels "",".","Armed Assault",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ targtype1         : int  14 7 10 7 7 NA 3 21 NA NA ...
##  $ targtype1_txt     : Factor w/ 23 levels "","Abortion Related",..: 15 7 9 7 7 1 14 22 1 1 ...
##  $ corp1             : Factor w/ 14672 levels "","'Private Network' column",..: 1 1670 14375 1 1 1 14138 9624 1 1 ...
##  $ target1           : Factor w/ 47777 levels "","'Add al-Fattah Isma'il, secretery general",..: 23089 28153 14829 44363 44343 1 22964 14020 1 1 ...
##  $ natlty1           : int  58 21 217 217 217 NA 218 217 NA NA ...
##  $ natlty1_txt       : Factor w/ 212 levels "",".","Afghanistan",..: 55 21 197 197 197 1 199 197 1 1 ...
##  $ targtype2         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ targtype2_txt     : Factor w/ 24 levels "",".","Abortion Related",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ corp2             : Factor w/ 735 levels "","\"L'evenement Du Jeudi\", a magazine",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ target2           : Factor w/ 1305 levels "","(District) Survey Office",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ natlty2           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ natlty2_txt       : Factor w/ 122 levels "",".","Afghanistan",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ targtype3         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ targtype3_txt     : Factor w/ 19 levels "",".","Airports & Airlines",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ corp3             : Factor w/ 103 levels "","African National Congress (ANC)",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ target3           : Factor w/ 173 levels "","*","1 Citroen dealership",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ natlty3           : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ natlty3_txt       : Factor w/ 64 levels "",".","Afghanistan",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ gname             : Factor w/ 2473 levels "","1 May","14th of December Command",..: 1333 11 2389 2389 2389 1 2295 2389 1 1 ...
##  $ gsubname          : Factor w/ 433 levels "","10th Front",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gname2            : Factor w/ 91 levels "","1 May","Abu Baker Martyr Group",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gsubname2         : Factor w/ 6 levels "","al-Quds Brigades",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gname3            : Factor w/ 8 levels "","Armed Revolutionary Nuclei (NAR)",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ gsubname3         : logi  NA NA NA NA NA NA ...
##  $ motive            : Factor w/ 1346 levels "","\"Disgruntled tribesmen have frequently targeted the pipeline in the past few years to try to force the government to improve s"| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ guncertain1       : int  0 0 0 0 0 NA 0 0 NA NA ...
##  $ guncertain2       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ guncertain3       : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ nperps            : int  NA 7 NA NA NA NA 3 -99 NA NA ...
##  $ nperpcap          : int  NA NA NA NA NA NA NA -99 NA NA ...
##  $ claimed           : int  NA NA NA NA NA NA NA 0 NA NA ...
##  $ claimmode         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ claimmode_txt     : Factor w/ 12 levels "",".","Call (post-incident)",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ claim2            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ claimmode2        : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ claimmode2_txt    : Factor w/ 6 levels "",".","Call (post-incident)",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ claim3            : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ claimmode3        : logi  NA NA NA NA NA NA ...
##  $ claimmode3_txt    : Factor w/ 2 levels "",".": 2 2 2 2 2 1 2 2 1 1 ...
##  $ compclaim         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weaptype1         : int  13 13 13 6 8 NA 5 6 NA NA ...
##  $ weaptype1_txt     : Factor w/ 13 levels "","Biological",..: 12 12 12 4 7 1 6 4 1 1 ...
##  $ weapsubtype1      : int  NA NA NA 16 NA NA 2 16 NA NA ...
##  $ weapsubtype1_txt  : Factor w/ 30 levels "",".","Arson/Fire",..: 2 2 2 27 2 1 4 27 1 1 ...
##  $ weaptype2         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weaptype2_txt     : Factor w/ 12 levels "",".","Chemical",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ weapsubtype2      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weapsubtype2_txt  : Factor w/ 26 levels "",".","Arson/Fire",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ weaptype3         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weaptype3_txt     : Factor w/ 11 levels "",".","Chemical",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ weapsubtype3      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weapsubtype3_txt  : Factor w/ 21 levels "",".","Arson/Fire",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ weaptype4         : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weaptype4_txt     : Factor w/ 8 levels "",".","Explosives/Bombs/Dynamite",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ weapsubtype4      : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ weapsubtype4_txt  : Factor w/ 17 levels "",".","Arson/Fire",..: 2 2 2 2 2 1 2 2 1 1 ...
##  $ weapdetail        : Factor w/ 3925 levels "","\"Necklacing\" - burning tire around neck",..: 1 1 1 1319 1911 1 739 1 1 1 ...
##  $ nkill             : num  1 0 1 NA NA NA 0 0 NA NA ...
##  $ nkillus           : int  NA NA NA NA NA NA NA 0 NA NA ...
##  $ nkillter          : num  NA NA NA NA NA NA NA 0 NA NA ...
##  $ nwound            : num  0 0 0 NA NA NA 0 0 NA NA ...
##  $ nwoundus          : int  NA NA NA NA NA NA NA 0 NA NA ...
##  $ nwoundte          : int  NA NA NA NA NA NA NA 0 NA NA ...
##  $ property          : int  0 0 0 1 1 NA 0 1 NA NA ...
##  $ propextent        : int  NA NA NA NA NA NA NA 3 NA NA ...
##   [list output truncated]

This is a huge dataset, over 100 000 rows. It contains both quantitative and qualitative data to work with. Also note, there are quite a few blank and NA entries which may become an issue when it comes to plotting data.

Short note on ggplot() function

ggplot(df, aes(x, y, )) + …

Arguments:

“df” = name of dataframe to plot
“aes” = aesthetics. Spell out what x and y variables should equal. Also specify aesthetics such as colours here. THEN CLOSE BRACKETS to ggplot function!
“+…” = add ons, separated by “+” sign. Bare minimum is to add the type of graph (geom point/histogram/boxplot…) after you close the “ggplot” function (weird but true). Then add on other features such as “scale”, theme", “guides”, “facet_wrap”… think of them as layers.

“geom point”

Plot the number of casualties per year using a scatter plot, known as “geom_point” in ggplot2:

ggplot(terr, aes(x = iyear, y = nkill)) + geom_point()

## Warning: Removed 30819 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-3

Can already see outliers, the most extreme being in year 1994.

Let's work in colour by grouping the points by region:

ggplot(terr, aes(x = iyear, y = nkill, colour = region_txt)) + geom_point()

## Warning: Removed 30819 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-4

Comments on the above graph:

The blank entries for region are an issue. There is a blank entry for region in the legend. We should remove entries withour region data.

There are no data for 1993!

I would like the increments/tick marks to be more frequent on the x axis. Every 2 years perhaps, instead of every decade.

Find and remove entries with no region entry:

which(terr$region_txt == "")

##   [1]  73869  82898  83627  85255  90081  90082  90083  90084  90156  90157
##  [11]  90180  90182  90184  90185  90197  90198  90199  90200  90202  90211
##  [21]  90212  90213  90217  90218  90219  90220  90221  90240  90242  90247
##  [31]  90272  90273  90274  90291  90357  90358  90360  90361  90365  90366
##  [41]  90392  90439  90449  90466  90467  90468  90469  90470  90471  90472
##  [51]  90473  90546  90548  90549  90600  90601  90647  90648  90650  90651
##  [61]  90652  90653  90655  90656  90685  90688  90689  90754  90755  90790
##  [71]  90791  90792  90793  90794  90795  90877  90886  90904  90905  90906
##  [81]  90940  90953  90954  90990  92266 103378 103396 103399 103401 103459
##  [91] 103499 103566 103578 103693 103705 103740 103813 103910 103987 103988
## [101] 103991 103992 103993 104097 104107 104241 104424 104432

terr[73869, c(1:10)]  # Double check we truly are picking up blank entries

##            eventid iyear imonth iday        approxdate extended resolution
## 73869 200203250004  2002      3    0 March 18-25, 2002        0           
##       country country_txt region
## 73869      NA                 NA

terr <- terr[!(terr$region_txt == ""), ]  # Remove rows without region data
dim(terr)

## [1] 104349    123

Rerun ggplot without entries with unknown regions
Change from the default number of breaks and gridlines. I will make a “break” every 2 years, based on the range of the years. Note I also played with the angle of the x axis text, and the vertical justification:

killPerYr <- ggplot(terr, aes(x = iyear, y = nkill, colour = region_txt)) + 
    geom_point()
killPerYr + theme(axis.text.x = element_text(angle = 90, vjust = 0.25)) + scale_x_continuous(breaks = seq(min(terr$iyear), 
    max(terr$iyear), 10))

## Warning: Removed 30711 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-6

This graph appears much less cramped in R Studio, but in the transition from .rmd to HTML with knitr the graph became cramped along the x axis! Is it possible to change the appearance? Also, if I wanted to be really finicky (and had more time up my sleeve) I would change the names of the x and y axis and the legend from the defaults.

“geom point” + “facet wraps”

Let's start working with “facets”. Plot the same point graph with “facets” by regions:

killPerYr + theme(axis.text.x = element_text(angle = 90, vjust = 0.25)) + facet_wrap(~region_txt)

## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Removed 2461 rows containing missing values (geom_point).
## Warning: Removed 55 rows containing missing values (geom_point).
## Warning: Removed 105 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).
## Warning: Removed 7287 rows containing missing values (geom_point).
## Warning: Removed 1067 rows containing missing values (geom_point).
## Warning: Removed 1104 rows containing missing values (geom_point).
## Warning: Removed 2844 rows containing missing values (geom_point).
## Warning: Removed 8680 rows containing missing values (geom_point).
## Warning: Removed 2586 rows containing missing values (geom_point).
## Warning: Removed 2075 rows containing missing values (geom_point).
## Warning: Removed 2162 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-7

Graph is super cramped, all thanks to the legend. And really, in this case the legend is redundant since the facets are named!

Remove the legend by specifying “legend.position” within the “theme” argument as “none”:

killPerYr + theme(axis.text.x = element_text(angle = 90, vjust = 0.25), legend.position = "none") + 
    facet_wrap(~region_txt)

## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Removed 2461 rows containing missing values (geom_point).
## Warning: Removed 55 rows containing missing values (geom_point).
## Warning: Removed 105 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).
## Warning: Removed 7287 rows containing missing values (geom_point).
## Warning: Removed 1067 rows containing missing values (geom_point).
## Warning: Removed 1104 rows containing missing values (geom_point).
## Warning: Removed 2844 rows containing missing values (geom_point).
## Warning: Removed 8680 rows containing missing values (geom_point).
## Warning: Removed 2586 rows containing missing values (geom_point).
## Warning: Removed 2075 rows containing missing values (geom_point).
## Warning: Removed 2162 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-8

Now to fix the names on the facets, since we can't read the Whole title for Russia:

levels(terr$region_txt)

##  [1] ""                                           
##  [2] "Australasia & Oceania"                      
##  [3] "Central America & Caribbean"                
##  [4] "Central Asia"                               
##  [5] "East Asia"                                  
##  [6] "Eastern Europe"                             
##  [7] "Middle East & North Africa"                 
##  [8] "North America"                              
##  [9] "Russia & the Newly Independent States (NIS)"
## [10] "South America"                              
## [11] "South Asia"                                 
## [12] "Southeast Asia"                             
## [13] "Sub-Saharan Africa"                         
## [14] "Western Europe"

levels(terr$region_txt) <- c("", "Oceania", "Central America", "Central Asia", 
    "East Asia", "Eastern Europe", "MENA", "North America", "Russia", "South America", 
    "South Asia", "Southeast Asia", "Sub-Saharan Africa", "Western Europe")

Then replot the whole thing…

killPerYr <- ggplot(terr, aes(x = iyear, y = nkill, colour = region_txt)) + 
    geom_point()
killPerYr + theme(axis.text.x = element_text(angle = 90, vjust = 0.25), legend.position = "none") + 
    facet_wrap(~region_txt)

## Warning: Removed 31 rows containing missing values (geom_point).
## Warning: Removed 2461 rows containing missing values (geom_point).
## Warning: Removed 55 rows containing missing values (geom_point).
## Warning: Removed 105 rows containing missing values (geom_point).
## Warning: Removed 254 rows containing missing values (geom_point).
## Warning: Removed 7287 rows containing missing values (geom_point).
## Warning: Removed 1067 rows containing missing values (geom_point).
## Warning: Removed 1104 rows containing missing values (geom_point).
## Warning: Removed 2844 rows containing missing values (geom_point).
## Warning: Removed 8680 rows containing missing values (geom_point).
## Warning: Removed 2586 rows containing missing values (geom_point).
## Warning: Removed 2075 rows containing missing values (geom_point).
## Warning: Removed 2162 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-10

“geom histogram” and plyr

Use “ddply” to aggregate data based on maximum casualties per region

ddply(terr, ~region_txt, summarize, maxKillPerYr = max(nkill))

##            region_txt maxKillPerYr
## 1             Oceania           NA
## 2     Central America           NA
## 3        Central Asia           NA
## 4           East Asia           NA
## 5      Eastern Europe           NA
## 6                MENA           NA
## 7       North America           NA
## 8              Russia           NA
## 9       South America           NA
## 10         South Asia           NA
## 11     Southeast Asia           NA
## 12 Sub-Saharan Africa           NA
## 13     Western Europe           NA

What?!?! Lies.

max(terr$nkill)

## [1] NA

Returns NA!? We know this isn't true… hmm… Turns out there is an argument in max to ignore NA entries!

max(terr$nkill, na.rm = TRUE)

## [1] 1180

ddply(terr, ~region_txt, summarize, maxKillPerYr = max(nkill, na.rm = TRUE))

##            region_txt maxKillPerYr
## 1             Oceania           17
## 2     Central America          300
## 3        Central Asia           22
## 4           East Asia           25
## 5      Eastern Europe          180
## 6                MENA          422
## 7       North America          329
## 8              Russia          121
## 9       South America          107
## 10         South Asia          200
## 11     Southeast Asia          114
## 12 Sub-Saharan Africa         1180
## 13     Western Europe          270

Success!

Let's name this ddply result and graph it using ggplot2

maxKillReg <- ddply(terr, ~region_txt, summarize, maxKillPerYr = max(nkill, 
    na.rm = TRUE))
ggplot(maxKillReg, aes(x = region_txt, y = maxKillPerYr)) + geom_histogram() + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.25))

## Mapping a variable to y and also using stat="bin".
##   With stat="bin", it will attempt to set the y value to the count of cases in each group.
##   This can result in unexpected behavior and will not be allowed in a future version of ggplot2.
##   If you want y to represent counts of cases, use stat="bin" and don't map a variable to y.
##   If you want y to represent values in the data, use stat="identity".
##   See ?geom_bar for examples. (Deprecated; last used in version 0.9.2)

plot of chunk unnamed-chunk-14

Returns a massive warning about mapping a variable to y without specifying bin…

To remove warnings, state stat = “identity”:

ggplot(maxKillReg, aes(x = region_txt, y = maxKillPerYr)) + geom_histogram(stat = "identity") + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.25))

plot of chunk unnamed-chunk-15

Now, let's reorder data based on maximum value, and replot the result:

maxKillReg <- within(maxKillReg, region_txt <- reorder(region_txt, as.integer(maxKillPerYr), 
    FUN = min))
ggplot(maxKillReg, aes(x = region_txt, y = maxKillPerYr)) + geom_histogram(stat = "identity") + 
    theme(axis.text.x = element_text(angle = 90, vjust = 0.25, hjust = 1))

plot of chunk unnamed-chunk-16

Other variables in GTD data

Let's work with some different data: attack type
I want to find the attack type that caused the most casualties

ggplot(terr, aes(x = iyear, y = nkill)) + geom_point() + theme(axis.text.x = element_text(angle = 90, 
    vjust = 0.25)) + facet_wrap(~attacktype1_txt)

## Warning: Removed 24216 rows containing missing values (geom_point).
## Warning: Removed 1774 rows containing missing values (geom_point).
## Warning: Removed 72 rows containing missing values (geom_point).
## Warning: Removed 3208 rows containing missing values (geom_point).
## Warning: Removed 529 rows containing missing values (geom_point).
## Warning: Removed 13 rows containing missing values (geom_point).
## Warning: Removed 83 rows containing missing values (geom_point).
## Warning: Removed 588 rows containing missing values (geom_point).
## Warning: Removed 15 rows containing missing values (geom_point).
## Warning: Removed 213 rows containing missing values (geom_point).

plot of chunk unnamed-chunk-17

How do you deal with an empty facet???

Let's use data aggregation to find the total kills per attack type:

killByAtt <- ddply(terr, ~attacktype1_txt, summarize, totalKill = sum(nkill, 
    na.rm = TRUE))
killByAtt

##                        attacktype1_txt totalKill
## 1                                              0
## 2                        Armed Assault     89930
## 3                        Assassination     16854
## 4                    Bombing/Explosion     40823
## 5       Facility/Infrastructure Attack      2107
## 6                            Hijacking       467
## 7  Hostage Taking (Barricade Incident)       911
## 8          Hostage Taking (Kidnapping)      2146
## 9                      Unarmed Assault       629
## 10                             Unknown      9213

# Reorder levels based on minimum of totalKill
killByAtt <- within(killByAtt, attacktype1_txt <- reorder(attacktype1_txt, as.integer(totalKill), 
    FUN = min))
# Plot result
ggplot(killByAtt, aes(x = attacktype1_txt, y = totalKill)) + geom_histogram(stat = "identity") + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1))

plot of chunk unnamed-chunk-18

Let's find out more information about the attacks that had the most casualties per region.
Write our own function within ddply to return the max entry information using which.max together with indexing:

maxKillSumm <- ddply(terr, ~region_txt, function(x) {
    indMax <- which.max(x$nkill)
    x[indMax, c("nkill", "attacktype1_txt", "weaptype1_txt", "gname")]
})
maxKillSUmm <- within(maxKillSumm, region_txt <- reorder(region_txt, as.integer(nkill), 
    FUN = min))

Colour number of casualties by attack type using “fill”:

ggplot(maxKillSUmm, aes(x = region_txt, y = nkill, fill = attacktype1_txt)) + 
    geom_histogram(stat = "identity") + theme(axis.text.x = element_text(angle = 45, 
    hjust = 1))

plot of chunk unnamed-chunk-20

Issues:

How to deal with blank entries and NA values??? I had a huge amount of “Warnings” from R about removing rows with missing values each time I called on the ggplot function. And I had graphs that displayed empty facets.

Best practice renaming long entries that are displayed? (e.g. “Russia” here). Rename levels?