Introduction

Today is 20 years since the terrorist attacks on New York’s World Trade Center. I thought I would take a look at the data on the US soldiers who died in Afghanistan.

This exercise illustrates the use of:

The packages that I will use:

library(tidyverse)
library(readxl)
library(lubridate)

Data on the US dead in the Afghanistan war

I downloaded an Excel file on the US military personnel who died in Operation Enduring Freedom (Afghanistan 2001 – 2014) from https://dcas.dmdc.osd.mil/dcas/pages/report_oef_namesalp.xhtml#. Similarly, I downloaded an Excel file on the US military personnel who died in Operation Freedom’s Sentinel (Afghanistan 2014 and after) from https://dcas.dmdc.osd.mil/dcas/pages/report_ofs_namesalp.xhtml#.

OEF.file <- "data/OEFNames of Fallen.xls"
OFS.file <- "data/OFSNames of Fallen.xls"

Reading data from Excel to R using the readxl package

OEF.fallen <- read_excel(OEF.file, skip = 5, .name_repair = "universal")

The .name_repair = "universal"attribute converts the variable names in the Excel file into formats that R can deal with easily.

Convert calendar dates read as character data by read_excel() into data that R recognizes as a calendar date

This uses the ymd() function in the lubridate package.

OEF.fallen <- OEF.fallen %>%
  mutate(Date.of.Death = ymd(Date.of.Death..yyyy.mm.dd.))

I used the ymd() function because my Excel dates were in the yyyymmdd format. Had they been in the ddmmyyyy format I would have used the dmy() function instead.

Also, I used the mutate() function of the dplyr package to give the date variable a nicer name.

Repeat the above steps for Operation Freedom’s Sentinel data

OFS.fallen <- read_excel(OFS.file, skip = 5, .name_repair = "universal") %>%
  mutate(Date.of.Death = ymd(Date.of.Death..yyyy.mm.dd.))

Stack two data frames that have the same variables

fallen <- rbind(OFS.fallen, OEF.fallen)

Find the time period covered by the data

This tally of the Afghan war dead covers the period from October 10, 2001 through August 26, 2021. The last US soldiers left Afghanistan on August 31, 2021.

Extract from a date the year that the date is in

Here, I use the year() function of the lubridate package. (I could have used the month() function to extract the month, etc.)

fallen <- fallen %>%
  mutate(Year = year(Date.of.Death)) %>%
  select(Year, Age, Gender)

Also, I used the select() function of the dplyr package to select the three variables I need for this exercise.

Calculate the number of observations that fall into specific groups

Each observation in the fallen data frame I have created above is one soldier who died. Now, I need to count the number of the dead for each year-and-gender group. The grouping is done using the group_by() function of the dplyr package. Then the counting of observations within each group is done using the n() function.

fallen <- fallen %>%
  group_by(Year, Gender) %>%
  summarise(Deaths = n()) %>%
  na.omit()

Bar chart

Finally, I use the ggplot2 package to make a bar chart:

ggplot(fallen, aes(x = Year, y = Deaths)) +   # Deaths gives you the total Deaths in each year for both Genders.
  geom_col(aes(fill = Gender))

Table

I will now use the pivot_wider() command of the dplyr package to show deaths for males and females in separate columns. Then, I will use the kableExtra package to present my data in tabular form.

fallen <- fallen %>% pivot_wider(names_from = Gender, values_from = Deaths)
fallen[is.na(fallen)] = 0
fallen <- fallen %>%
  mutate(TOTAL = MALE + FEMALE) %>%
  knitr::kable(caption = "U.S. Deaths in Afghanistan") %>%
  kableExtra::kable_styling(full_width = FALSE)
fallen
U.S. Deaths in Afghanistan
Year MALE FEMALE TOTAL
2001 11 0 11
2002 46 3 49
2003 42 1 43
2004 51 1 52
2005 95 4 99
2006 95 3 98
2007 116 2 118
2008 155 1 156
2009 305 6 311
2010 494 4 498
2011 402 9 411
2012 305 9 314
2013 125 7 132
2014 54 1 55
2015 17 4 21
2016 10 0 10
2017 15 0 15
2018 16 0 16
2019 23 0 23
2020 10 1 11
2021 11 2 13

The End