SYNOPSIS This document will provide historic data for civilian casualities and economic cost to help Federal Emergency Management Agency (FEMA) prepare for future natural disasters. Damage and casualties results are based from the National Weather Service Storm DATA taken from 1950-2011.

Casulaties
Data results for casuality are based from Storm Data FATALITIES column(23) and INJURIES column (24). Since combining both these factors will not yield which factor has more impact on the population health, fatalities and injuries columns were tabulated separately. Each factor or column were summed up and at other instances averaged.
NOTE: there is no distinction between direct and inderect fatalities and injuries.

Economic Damage
Data results for economic damage are based from Storm Data PRPDMG column (25) and CRPDMG column (27). Both columns were summed up to get the total damage.

NOTE: results are not adjusted for inflation


Data Processing

Strom Data was processed using read_csv from the readr package and then loaded as data.table from the datatable package for easy data manipulation and viewing.

library(readr)
library(dplyr)
library(ggplot2)
library(data.table)
library(yarrr)
library(stats)
download.file('https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2','FStormData.csv.bz2') 
dataRead <- read_csv("FStormData.csv.bz2")
rawdata <- data.table(dataRead)

Result: Storm Data

head(rawdata,5)
##    STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1:       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2:       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3:       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4:       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5:       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
##     EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1: TORNADO         0      NA         NA       NA       NA          0
## 2: TORNADO         0      NA         NA       NA       NA          0
## 3: TORNADO         0      NA         NA       NA       NA          0
## 4: TORNADO         0      NA         NA       NA       NA          0
## 5: TORNADO         0      NA         NA       NA       NA          0
##    COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1:         NA         0      NA         NA   14.0   100 3   0          0
## 2:         NA         0      NA         NA    2.0   150 2   0          0
## 3:         NA         0      NA         NA    0.1   123 2   0          0
## 4:         NA         0      NA         NA    0.0   100 2   0          0
## 5:         NA         0      NA         NA    0.0   150 2   0          0
##    INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1:       15    25.0          K       0         NA  NA         NA        NA
## 2:        0     2.5          K       0         NA  NA         NA        NA
## 3:        2    25.0          K       0         NA  NA         NA        NA
## 4:        2     2.5          K       0         NA  NA         NA        NA
## 5:        2     2.5          K       0         NA  NA         NA        NA
##    LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1:     3040      8812       3051       8806      NA      1
## 2:     3042      8755          0          0      NA      2
## 3:     3340      8742          0          0      NA      3
## 4:     3458      8626          0          0      NA      4
## 5:     3412      8642          0          0      NA      5

Population Health

Most Fatalities by Event

Fatalities results are based from adding the total numbers from Fatalities column per event.
All other columns from raw data were discarded except for EVTYPE,FATALITIES and INJURIES columns.

NOTE: Observation column added to quantify number of records entered per event

PopHealth <- rawdata %>% 
            select(EVTYPE,FATALITIES,INJURIES) %>%
            group_by(EVTYPE)%>%
            summarise(
                     Observation = n(),
                    TotalFatalities = sum(FATALITIES),
                    TotalInjuries = sum(INJURIES)
                    )
MostFatalities <-   PopHealth %>% #code for getting the top ten fatalities data per event
                    select(-TotalInjuries, -Observation) %>%
                    arrange(desc(TotalFatalities))%>%
                    head(10)

mycolors <- piratepal("basel")#color pallete

ggplot(MostFatalities, aes(x= reorder(MostFatalities$EVTYPE,1:10 ),MostFatalities$TotalFatalities)) + geom_col(fill=mycolors) + theme(axis.text.x = element_text(angle = 70, vjust = 0.5)) + labs(title= "Top 10 Events with Most Fatalities", x = "Natural Disasters", y= "Total Fatalities")     


Table Results: Top 10 Events with Most Fatalities

print(MostFatalities)
## # A tibble: 10 × 2
##            EVTYPE TotalFatalities
##             <chr>           <dbl>
## 1         TORNADO            5633
## 2  EXCESSIVE HEAT            1903
## 3     FLASH FLOOD             978
## 4            HEAT             937
## 5       LIGHTNING             816
## 6       TSTM WIND             504
## 7           FLOOD             470
## 8     RIP CURRENT             368
## 9       HIGH WIND             248
## 10      AVALANCHE             224

Most Fatalities per Incident

Fatalities per incident results are based from adding the total numbers from Fatalities column per event divided by the number of observations or incidents.

MostFatal.Incident <-   PopHealth %>% #code for getting the top 10 fatalities per reported incident
                        mutate( Fatality.incident = TotalFatalities / Observation) %>% 
                        select(-TotalFatalities,-Observation, -TotalInjuries) %>%
                        arrange(desc(Fatality.incident)) %>%
                        head(10)

ggplot(MostFatal.Incident, aes(x= MostFatal.Incident$EVTYPE,MostFatal.Incident$Fatality.incident),MostFatalities$TotalFatalities) + geom_col(fill=mycolors) + theme(axis.text.x = element_text(angle = 70, vjust = 0.5)) + labs(title= "Top 10 Events with Most Fatalies Per Incident", x = "Natural Disasters", y= "Ave. Fatalities per incident") 


Table Results: Top 10 Events with Most Fatalities Per Incident

print(MostFatal.Incident)
## # A tibble: 10 × 2
##                        EVTYPE Fatality.incident
##                         <chr>             <dbl>
## 1  TORNADOES, TSTM WIND, HAIL         25.000000
## 2               COLD AND SNOW         14.000000
## 3       TROPICAL STORM GORDON          8.000000
## 4       RECORD/EXCESSIVE HEAT          5.666667
## 5                EXTREME HEAT          4.363636
## 6           HEAT WAVE DROUGHT          4.000000
## 7              HIGH WIND/SEAS          4.000000
## 8               MARINE MISHAP          3.500000
## 9               WINTER STORMS          3.333333
## 10        Heavy surf and wind          3.000000

Most Injuries by Event

Injuries results are based from adding the total numbers from Injuries column per event.

MostInjuries <- PopHealth %>% #code for getting the top 10 injuries by event
                select(-TotalFatalities, -Observation) %>%
                arrange(desc(TotalInjuries, order))%>%
                head(10)

ggplot(MostInjuries, aes(x= reorder(MostInjuries$EVTYPE,1:10 ),MostInjuries$TotalInjuries)) + geom_col(fill=mycolors) + theme(axis.text.x = element_text(angle = 70, vjust = 0.5)) + labs(title= "Top 10 Events with Most Injuries", x = "Natural Disasters", y= "Total Injuries")   


Table Results: Top 10 Events with Most Injuries

print(MostInjuries)
## # A tibble: 10 × 2
##               EVTYPE TotalInjuries
##                <chr>         <dbl>
## 1            TORNADO         91346
## 2          TSTM WIND          6957
## 3              FLOOD          6789
## 4     EXCESSIVE HEAT          6525
## 5          LIGHTNING          5230
## 6               HEAT          2100
## 7          ICE STORM          1975
## 8        FLASH FLOOD          1777
## 9  THUNDERSTORM WIND          1488
## 10              HAIL          1361

Most Injuries by Incident

Injuries results are based from adding the total numbers from Injuries column per event divided the number of observations or incidents.

MostInjury.Incident <- PopHealth %>% #code for getting the top 10 injuries per reported incident
                        mutate( Injury.incident = TotalInjuries / Observation) %>%
                        select(-TotalInjuries,-Observation, -TotalFatalities) %>%
                        arrange(desc(Injury.incident)) %>%
                        head(10)

ggplot(MostInjuries, aes(MostInjury.Incident$EVTYPE,MostInjury.Incident$Injury.incident))+ geom_col(fill=mycolors) + theme(axis.text.x = element_text(angle = 70, vjust = 0.5)) + labs(title= "Top 10 Events with Most Injuries per Incident", x = "Natural Disasters", y= "Ave. Injuries per incident")                         


Table Results: Top 10 Events with Most Injuries per Incident

print(MostInjury.Incident)
## # A tibble: 10 × 2
##                     EVTYPE Injury.incident
##                      <chr>           <dbl>
## 1                Heat Wave        70.00000
## 2    TROPICAL STORM GORDON        43.00000
## 3               WILD FIRES        37.50000
## 4            THUNDERSTORMW        27.00000
## 5       HIGH WIND AND SEAS        20.00000
## 6          SNOW/HIGH WINDS        18.00000
## 7          GLAZE/ICE STORM        15.00000
## 8        HEAT WAVE DROUGHT        15.00000
## 9  WINTER STORM HIGH WINDS        15.00000
## 10       HURRICANE/TYPHOON        14.48864

Economic Cost

Most Property Damage by Event
Total economic cost damage were divided by $10,000.

NOTE: Observation column added to quantify number of records entered per event

EcoDamage <-rawdata %>%
    select(EVTYPE,PROPDMG,CROPDMG) %>%
    group_by(EVTYPE)%>%
    summarise(
        Observation = n(),
        TotalProp.Damage = sum(PROPDMG),
        TotalCrop.Damage = sum(CROPDMG)
        )%>%
    mutate(TotalDamage = floor((TotalProp.Damage + TotalCrop.Damage)/ 10000) ) #Cost diveded by 10,000
        
EcoTotalDamage <-   EcoDamage %>% 
                    select (-Observation,-TotalCrop.Damage, -TotalProp.Damage)%>% 
                    arrange(desc(TotalDamage)) %>%
                    head(10)

ggplot(EcoTotalDamage, aes(x= reorder(EcoTotalDamage$EVTYPE,1:10),EcoTotalDamage$TotalDamage))+ geom_col(fill=mycolors) + theme(axis.text.x = element_text(angle = 70, vjust = 0.5)) + labs(title= "Top 10 Events with Most Economic Damage", x = "Natural Disasters", y= "Damage in $10k units ") 


Table Results: Top 10 Events with Most Economic Damage

print(EcoTotalDamage)
## # A tibble: 10 × 2
##                EVTYPE TotalDamage
##                 <chr>       <dbl>
## 1             TORNADO         331
## 2         FLASH FLOOD         159
## 3           TSTM WIND         144
## 4                HAIL         126
## 5               FLOOD         106
## 6   THUNDERSTORM WIND          94
## 7           LIGHTNING          60
## 8  THUNDERSTORM WINDS          46
## 9           HIGH WIND          34
## 10       WINTER STORM          13

Conclusion

Events that are most harmful to population health: Tornadoes,Thunderstorm and Hail

Event that have the greatest economic consequences: Tornadoes and FlashFlood