Benefits

The purpose of exploring U.S. military death data is to present these deaths to the public so that something can be done to reduce them. The U.S. military involves politicians, technology, industry, healthcare, and government. By making this data visible, each of these groups can contribute, at its own level of influence, to major decisions that could save more lives in the military. Such decisions could include improving military mechanics, helping politicians make better policy, adjusting military strategy, and prompting doctors and paramedical staff to rethink and design appropriate health plans for military personnel. I plan to become a consultant, using my skills as a data scientist across various domains of society to present meaningful reports to government entities, companies, and organizations and help them make decisions. This project will therefore help build the skills needed to succeed in data science.

Research questions

What is the death rate of military personnel over the period covered by the data (1980-2010)? What is the death rate of military personnel on active duty over the same period? What is the ratio of deaths by accident to deaths by illness among military personnel? Do military personnel die more often by homicide than in combat? Do military personnel die more often from illness than from accidents?

Data source

We searched open data sources such as kaggle.com and found an interesting military dataset that no one had yet contributed to. The original source of the dataset (‘ActiveDutyDeathNo’) is the Defense Casualty Analysis System (DCAS): https://dcas.dmdc.osd.mil/dcas/pages/report_by_year_manner.xhtml. The data is completely free and covers 31 years (1980-2010) of U.S. active-duty military deaths. The first six rows and the structure of the dataset are shown below:

##   Calendar.Year Active.Duty Full.Time..est..Guard.Reserve Selected.Reserve.FTE
## 1          1980     2050758                         22000                86872
## 2          1981     2093032                         22000                91719
## 3          1982     2112609                         41000                97458
## 4          1983     2123909                         49000               100455
## 5          1984     2138339                         55000               104583
## 6          1985     2150379                         64000               108806
##   Total.Military.FTE Total.Deaths Accident Hostile.Action Homicide Illness
## 1            2159630         2392     1556              0      174     419
## 2            2206751         2380     1524              0      145     457
## 3            2251067         2319     1493              0      108     446
## 4            2273364         2465     1413             18      115     419
## 5            2297922         1999     1293              1       84     374
## 6            2323185         2252     1476              0      111     363
##   Pending Self.Inflicted Terrorist.Attack Undetermined
## 1       0            231                1           11
## 2       0            241                0           13
## 3       0            254                2           16
## 4       0            218              263           19
## 5       0            225                6           16
## 6       0            275                5           22
## 'data.frame':    31 obs. of  14 variables:
##  $ Calendar.Year                : int  1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 ...
##  $ Active.Duty                  : int  2050758 2093032 2112609 2123909 2138339 2150379 2177845 2166611 2121659 2112128 ...
##  $ Full.Time..est..Guard.Reserve: int  22000 22000 41000 49000 55000 64000 69000 71000 72000 74200 ...
##  $ Selected.Reserve.FTE         : int  86872 91719 97458 100455 104583 108806 113010 115086 115836 117056 ...
##  $ Total.Military.FTE           : int  2159630 2206751 2251067 2273364 2297922 2323185 2359855 2352697 2309495 2303384 ...
##  $ Total.Deaths                 : int  2392 2380 2319 2465 1999 2252 1984 1983 1819 1636 ...
##  $ Accident                     : num  1556 1524 1493 1413 1293 ...
##  $ Hostile.Action               : int  0 0 0 18 1 0 2 37 0 23 ...
##  $ Homicide                     : int  174 145 108 115 84 111 103 104 90 58 ...
##  $ Illness                      : int  419 457 446 419 374 363 384 383 321 294 ...
##  $ Pending                      : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ Self.Inflicted               : int  231 241 254 218 225 275 269 260 285 224 ...
##  $ Terrorist.Attack             : int  1 0 2 263 6 5 0 2 17 0 ...
##  $ Undetermined                 : int  11 13 16 19 16 22 27 25 26 37 ...
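
As a minimal sketch of how the death rate in the research questions could be computed, the Python snippet below uses only the six rows printed above. It assumes the rate is defined as deaths per 100,000 personnel with Total.Military.FTE as the denominator; the project may settle on a different denominator, such as Active.Duty. The function name death_rate_per_100k is hypothetical.

```python
# First six rows of the dataset, copied from the printout above:
# (Calendar.Year, Total.Military.FTE, Total.Deaths)
ROWS = [
    (1980, 2159630, 2392),
    (1981, 2206751, 2380),
    (1982, 2251067, 2319),
    (1983, 2273364, 2465),
    (1984, 2297922, 1999),
    (1985, 2323185, 2252),
]


def death_rate_per_100k(deaths, population):
    """Deaths per 100,000 personnel (an assumed definition of 'death rate')."""
    return deaths / population * 100_000


# Map each year to its rate so years can be compared directly.
rates = {year: death_rate_per_100k(deaths, fte) for year, fte, deaths in ROWS}

for year, rate in rates.items():
    print(f"{year}: {rate:.1f} deaths per 100,000")
```

Expressing deaths relative to force size, rather than as raw counts, keeps years with different numbers of personnel comparable.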

Overall Workflow

We will follow the OSEMN process:
    1. Obtain Data
      • GitHub
      • R programming
      • Spyder (Python)
    2. Scrub Data
      • Organize the data
      • Tidy up the data
    3. Explore Data
      • Inspect the data and understand its characteristics
      • Look for relationships, patterns, and notable values
    4. Model Data
      • Explore forecasting
      • Explore other data visualization charts
      • Explore building apps to display plots: Shiny, Dash
    5. Interpret Results
      • Explain the findings (answer the research questions)
      • Provide actionable information
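
The first three steps above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the project's actual pipeline: the embedded CSV text stands in for the downloaded file and holds only the 1980-1985 values printed in the Data source section, the function names obtain, scrub, and explore are hypothetical, and "combat" in the research questions is assumed to correspond to the Hostile.Action column.

```python
import csv
import io

# Assumption: a small CSV excerpt stands in for the downloaded file.
# Values are the 1980-1985 rows printed in the Data source section.
CSV_TEXT = """Calendar.Year,Accident,Hostile.Action,Homicide,Illness
1980,1556,0,174,419
1981,1524,0,145,457
1982,1493,0,108,446
1983,1413,18,115,419
1984,1293,1,84,374
1985,1476,0,111,363
"""


def obtain(text):
    """Obtain: read the raw rows as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Scrub: convert every field to an integer (also accepts values like '1556.0')."""
    return [{key: int(float(value)) for key, value in row.items()} for row in rows]


def explore(rows):
    """Explore: total deaths per cause across the rows provided."""
    causes = [key for key in rows[0] if key != "Calendar.Year"]
    return {cause: sum(row[cause] for row in rows) for cause in causes}


totals = explore(scrub(obtain(CSV_TEXT)))
print(totals)
print("Homicide exceeds hostile action:", totals["Homicide"] > totals["Hostile.Action"])
print("Illness exceeds accident:", totals["Illness"] > totals["Accident"])
```

Because this excerpt covers only six of the 31 years, the comparisons it prints do not answer the research questions for the full period.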

Challenges

There are a few challenges to overcome in this project:
      • Because of the sensitivity of the dataset, it can be tough to remain neutral.
      • Interpreting results: This is going to be crucial. How do we present the data? Which format will be most suitable?