1. Introduction:

Storm disasters are a regular occurrence in the United States of America (US). The National Oceanic and Atmospheric Administration (NOAA) is an American scientific agency within the United States Department of Commerce that focuses on the conditions of the oceans, major waterways and the atmosphere. NOAA maintains a database of storm events that have occurred in the US. This study analyzes that database to identify which types of events are most detrimental to human health and to the economy.


2. Synopsis:

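The storm disaster dataset spans the years 1950 to 2011. The dataset was first downloaded and processed, and then analyzed to answer the following questions:

- Which types of events are most harmful to population health (fatalities and injuries)?
- Which types of events have the greatest economic consequences (property and crop damage)?
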
Bar charts have been created to visualize the results. The following sections describe the process in detail.


3. Data Processing:

library(data.table)
library(ggplot2)
library(gridExtra)


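# Read only the first 10 rows to inspect the column names without loading the whole file.
# fread() decompresses .bz2 input only when the R.utils package is installed.
info_temporary <- fread(input='https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',
                        nrows=10)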
print(colnames(info_temporary))
 [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
 [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
[11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
[16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
[21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
[26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
[31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
[36] "REMARKS"    "REFNUM"    


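Detailed information about the columns is available in the NOAA Storm Data Documentation. Only the event type (EVTYPE), the health columns (FATALITIES, INJURIES) and the damage columns (PROPDMG, CROPDMG and their exponent columns PROPDMGEXP, CROPDMGEXP) are needed for this analysis.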

columns_to_keep <- c('EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')


# Read the full dataset, keeping only the columns needed for the analysis.
info <- fread(input='https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2',
              select=columns_to_keep)


sapply(info, typeof)
     EVTYPE  FATALITIES    INJURIES     PROPDMG  PROPDMGEXP     CROPDMG 
"character"    "double"    "double"    "double" "character"    "double" 
 CROPDMGEXP 
"character" 


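# PROPDMGEXP and CROPDMGEXP hold a magnitude symbol rather than a number:
# k/K = thousands, m/M = millions, b/B = billions; digits are treated as powers of ten.
# Any other symbol (blank, '+', '-', '?', ...) maps to exponent 0, i.e. a multiplier of 1.
exponent_map <- c('1'=1, '2'=2, '3'=3, '4'=4, '5'=5, '6'=6, '7'=7, '8'=8, '9'=9,
                  'k'=3, 'K'=3, 'm'=6, 'M'=6, 'b'=9, 'B'=9)

# Vectorized lookup: unknown symbols (including "" and NA) give NA, which is replaced by 0.
exponent_symbol_to_num <- function(exponent_symbol) {
  exponent <- unname(exponent_map[exponent_symbol])
  exponent[is.na(exponent)] <- 0
  exponent
}

info[, `:=`(PROPDMGEXP_NUM = exponent_symbol_to_num(PROPDMGEXP),
            CROPDMGEXP_NUM = exponent_symbol_to_num(CROPDMGEXP))]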
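Because every symbol that the mapping does not cover silently becomes exponent 0, it can help to list which exponent symbols actually occur and which of them are unmapped. The base R sketch below does this on a small hypothetical sample; `exp_symbols` stands in for `info$PROPDMGEXP`, and the names `exp_symbols`, `exponent_map`, `symbol_counts` and `unmapped` are illustrative, not part of the original analysis.

```r
# Hypothetical sample of exponent symbols; in the analysis these would come from info$PROPDMGEXP.
exp_symbols <- c("K", "M", "", "K", "B", "+", "h", "0", "m", "?")

# Symbol-to-exponent mapping (same entries as the mapping used in the analysis).
exponent_map <- c('1'=1, '2'=2, '3'=3, '4'=4, '5'=5, '6'=6, '7'=7, '8'=8, '9'=9,
                  'k'=3, 'K'=3, 'm'=6, 'M'=6, 'b'=9, 'B'=9)

# Count how often each symbol occurs, then keep the symbols with no entry in the map.
symbol_counts <- table(exp_symbols, useNA = "ifany")
unmapped <- setdiff(names(symbol_counts), names(exponent_map))

print(symbol_counts)
print(unmapped)
```

Run on the full column, a non-empty `unmapped` result points to symbols whose interpretation should be decided explicitly rather than left to the default of 0.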


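# Convert each damage entry to US dollars: value * 10^exponent.
info[, `:=`(PROPDMG_VAL = PROPDMG*10^PROPDMGEXP_NUM,
            CROPDMG_VAL = CROPDMG*10^CROPDMGEXP_NUM)]
# Total economic damage per record: property damage plus crop damage.
info[, Sum_Damage := PROPDMG_VAL+CROPDMG_VAL]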
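As a worked example of the conversion, an entry with `PROPDMG = 25` and `PROPDMGEXP = "K"` corresponds to 25 × 10^3 = 25,000 USD. The base R sketch below applies the same arithmetic to a few hypothetical rows; the data frame `toy`, the helper `to_exponent` and the reduced letter-only mapping are assumptions made for brevity.

```r
# Hypothetical rows mimicking the columns used in the analysis (not real records).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 0),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 500, 10),
  CROPDMGEXP = c("", "M", "K"),
  stringsAsFactors = FALSE
)

# Reduced mapping with only the letter symbols; anything else gets exponent 0.
exponent_map <- c('k'=3, 'K'=3, 'm'=6, 'M'=6, 'b'=9, 'B'=9)
to_exponent <- function(sym) {
  e <- unname(exponent_map[sym])
  e[is.na(e)] <- 0
  e
}

# Damage in USD = value * 10^exponent, then property + crop.
toy$PROPDMG_VAL <- toy$PROPDMG * 10^to_exponent(toy$PROPDMGEXP)
toy$CROPDMG_VAL <- toy$CROPDMG * 10^to_exponent(toy$CROPDMGEXP)
toy$Sum_Damage  <- toy$PROPDMG_VAL + toy$CROPDMG_VAL

print(toy[, c("EVTYPE", "PROPDMG_VAL", "CROPDMG_VAL", "Sum_Damage")])
```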


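# Health impact: total fatalities and injuries per event type.
health_info <- info[, .(Total_Fatalities = sum(FATALITIES), Total_Injuries = sum(INJURIES)), by=EVTYPE]
# Economic impact: total property and crop damage per event type, in billions of USD.
economic_info <- info[, .(Total_Damage = sum(Sum_Damage, na.rm=TRUE)/10^9), by=EVTYPE]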
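The spelling of `na.rm` in the aggregation matters: `sum()` has no argument called `n.rm`, so a misspelled `n.rm=TRUE` is absorbed by `...` and `TRUE` is added to each group's total as the value 1. The base R sketch below shows that pitfall and then computes per-event totals with `tapply()` on hypothetical toy data; `events`, `fatalities_by_event` and `injuries_by_event` are illustrative names.

```r
# sum() has no 'n.rm' argument: the misspelled name is captured by '...',
# so TRUE is summed as the value 1.
misspelled <- sum(c(10, 20), n.rm = TRUE)       # 31
correct    <- sum(c(10, 20, NA), na.rm = TRUE)  # 30

# Hypothetical per-record data (not taken from the dataset).
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(5, 1, 2, 0, 0),
  INJURIES   = c(40, 3, 10, 1, NA),
  stringsAsFactors = FALSE
)

# Group totals per event type; na.rm = TRUE is passed through to sum().
fatalities_by_event <- tapply(events$FATALITIES, events$EVTYPE, sum, na.rm = TRUE)
injuries_by_event   <- tapply(events$INJURIES,   events$EVTYPE, sum, na.rm = TRUE)

print(fatalities_by_event)
print(injuries_by_event)
```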

4. Results:

# Top 7 event types by total fatalities; reorder() puts the largest bar at the top after coord_flip().
g1 <- ggplot(data=health_info[order(Total_Fatalities, decreasing=TRUE)][1:7],
             mapping=aes(x=reorder(EVTYPE, Total_Fatalities), y=Total_Fatalities)) +
  geom_col(fill='coral2') + coord_flip() + theme_bw() + xlab('Type of Event') +
  ylab('Total Fatalities') + labs(title='Total Fatalities by Event Type for top 7 Events (1950-2011)')

g2 <- ggplot(data=health_info[order(Total_Injuries, decreasing=TRUE)][1:7], 
             mapping=aes(x=reorder(EVTYPE, Total_Injuries), y=Total_Injuries)) +  
  geom_col(fill='tan2') + coord_flip() + theme_bw()+ xlab('Type of Event') + 
  ylab('Total Injuries') + labs(title='Total Injuries by Event Type for top 7 Events (1950-2011)')

grid.arrange(g1, g2)
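Both health charts follow the same pattern: sort the totals in decreasing order, keep the first seven, and order the bars so that the largest one appears at the top after `coord_flip()`. A minimal base R sketch of that selection, using hypothetical totals and an illustrative helper `top_n_events()`:

```r
# Hypothetical per-event totals (not results from the dataset).
totals <- c(TORNADO = 50, FLOOD = 3, HAIL = 1, LIGHTNING = 20, HEAT = 35)

# Keep the n largest values, largest first (names are preserved).
top_n_events <- function(x, n) {
  head(sort(x, decreasing = TRUE), n)
}

top3 <- top_n_events(totals, 3)
print(top3)

# Levels in increasing order: with coord_flip() the first level is drawn at the
# bottom, so the largest bar ends up at the top (the effect reorder() has above).
plot_levels <- names(sort(top3))
print(plot_levels)
```

One design caveat: when several event types tie at the cut-off, a fixed `[1:7]` keeps whichever tied rows happen to come first.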


ggplot(data=economic_info[order(Total_Damage, decreasing=TRUE)][1:7],
       mapping=aes(x=reorder(EVTYPE, Total_Damage), y=Total_Damage)) +
  geom_col(fill='wheat3') + coord_flip() + theme_bw() + xlab('Type of Event') +
  ylab('Economic Damage, Property & Crop (USD Billions)') +
  labs(title='Economic Damage by Event Type for top 7 Events (1950-2011)')


5. Conclusion:

This is a preliminary analysis of the NOAA Storm Data to identify the types of events that cause the greatest damage to human health and the economy. Further temporal and spatial studies are needed to build on these findings and to design policy responses.