The goal of this research is to examine how severe weather events and storms have affected people financially and physically in the United States since the 1950s. The data comes from the United States National Oceanic and Atmospheric Administration (NOAA) storm database. Within the dataset, I examine the attributes that record physical harm and property damage, then summarize and plot them to see which events have caused the most physical harm and which have been the most expensive to recover from. I first show and graph the 10 event types with the most injuries and deaths, then the 10 event types with the most property and crop damage.

Data Processing

First, let's load the NOAA storm dataset and keep only the event type, the property and crop damage, and the fatalities and injuries. Because the damage exponent columns contain many invalid entries, I also keep only rows where the property or crop damage exponent is a recognized scale code: PROPDMGEXP equal to "H", "K", "M" or "B", or CROPDMGEXP equal to "K", "M" or "B", where H = 100, K = 1,000, M = 1,000,000 and B = 1,000,000,000. For example, an observation with PROPDMG = 5 and PROPDMGEXP = "M" has a property damage cost of 5 million dollars.

stormdata <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
#get the appropriate attributes to answer the questions
stormdata_property_physical <- stormdata[,c("EVTYPE","INJURIES","FATALITIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
# keep rows where PROPDMGEXP is "H","K","M" or "B", or CROPDMGEXP is "K","M" or "B"
stormdata_property_physical <- subset(stormdata_property_physical,
                                      PROPDMGEXP == "H" | PROPDMGEXP == "K" | PROPDMGEXP == "M" | PROPDMGEXP == "B"
                                      | CROPDMGEXP == "K" | CROPDMGEXP == "M" | CROPDMGEXP == "B")
# finally, map PROPDMGEXP and CROPDMGEXP to numbers: H=100, K=1000, M=1000000, B=1000000000
stormdata_property_physical$PROPDMGEXP <- gsub("H",100,stormdata_property_physical$PROPDMGEXP)
n <- 3  # power of ten for "K"; increases by 3 for "M" and again for "B"
for (i in c("K","M","B")) {
    stormdata_property_physical$PROPDMGEXP <- gsub(i,10^n,stormdata_property_physical$PROPDMGEXP)
    stormdata_property_physical$CROPDMGEXP <- gsub(i,10^n,stormdata_property_physical$CROPDMGEXP)
    n <- n + 3
}
stormdata_property_physical$PROPDMGEXP <- as.numeric(stormdata_property_physical$PROPDMGEXP)
stormdata_property_physical$CROPDMGEXP <- as.numeric(stormdata_property_physical$CROPDMGEXP)
## Warning: NAs introduced by coercion
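As an illustration of the exponent mapping described above, here is a minimal sketch that uses a named lookup vector instead of repeated gsub calls. The helper name `exp_multiplier` and the toy data frame are hypothetical and not part of the original analysis; codes outside H, K, M and B are assumed to map to NA.

```r
# Hypothetical helper (not in the original analysis): translate an exponent
# code into its numeric multiplier using a named lookup vector.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Codes that are not in the lookup (e.g. "?", "0") come back as NA
  unname(lookup[as.character(code)])
}

# Toy rows standing in for the PROPDMG / PROPDMGEXP columns
toy <- data.frame(PROPDMG    = c(5, 2.5, 10, 1),
                  PROPDMGEXP = c("M", "K", "H", "?"))

# Dollar cost = reported amount times the multiplier for its code
toy$PROPCOST <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

With this approach the first toy row (PROPDMG = 5, PROPDMGEXP = "M") works out to 5 million dollars, matching the example in the text, and the unrecognized code yields NA rather than a coercion warning.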


First Question to answer: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

# use tapply to sum injuries and fatalities for each event type, sort the totals
# in descending order, and keep the 10 event types that contribute the most deaths and injuries
highest_deaths_injuries <- sort(with(stormdata_property_physical,tapply(INJURIES+FATALITIES,EVTYPE,sum)),decreasing=TRUE)[1:10]

##           TORNADO             FLOOD         TSTM WIND       FLASH FLOOD 
##             95981              7152              2911              2272 
##         ICE STORM         LIGHTNING              HEAT THUNDERSTORM WIND 
##              1891              1821              1781              1599 
##              1337              1137
m <- barplot(highest_deaths_injuries,xaxt="n",ylab="Total Injuries and Deaths",main="Total Injuries and Fatalities due to Weather Events",cex.axis=.8,cex.main=1.3,cex.lab=.9)

Abbreviations: TOR = Tornadoes; TSTMW = Thunderstorm Wind; FF = Flash Flood; IC = Ice Storm; LGT = Lightning; TW = Thunderstorm Wind (same as TSTMW); H/T = Hurricane/Typhoon; EH = Excessive Heat; HR/HS = Heavy Rain/High Surf; LF = Lakeshore Flood; HWHR = High Winds Heavy Rain; FF = Forest Fires
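Because the bar plot is drawn with xaxt="n", the abbreviations have to be placed on the x-axis separately. A minimal sketch of one way to do that, using the bar midpoints that barplot() returns; the toy totals and the abbreviation vector here are made up for illustration and are not the values from the analysis.

```r
# Toy totals standing in for highest_deaths_injuries (values are made up)
totals <- c(TORNADO = 50, FLOOD = 20, LIGHTNING = 10)
# Hypothetical abbreviations, one per bar, in the same order as 'totals'
abbrev <- c("TOR", "FLD", "LGT")

pdf(NULL)  # throw away graphics output so the sketch runs without a display
m <- barplot(totals, xaxt = "n", ylab = "Total Injuries and Deaths")
# barplot() returns the bar midpoints; use them as tick positions
axis(1, at = m, labels = abbrev, cex.axis = 0.8)
invisible(dev.off())
```

Dropping the pdf(NULL) and dev.off() lines sends the plot to the default graphics device instead.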

As the printout shows, tornadoes account for by far the most deaths and injuries (95,981 in total), while the other most physically devastating weather events are far less severe; floods, the next highest, account for 7,152.
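The ranking step can be sketched on a small toy data frame to show what tapply and sort are doing. The rows below are invented for illustration; only the column names follow the dataset.

```r
# Invented rows; column names mirror the storm dataset
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "LIGHTNING", "FLOOD"),
  INJURIES   = c(10, 2, 5, 1, 0),
  FATALITIES = c(1, 0, 2, 1, 3)
)

# Sum injuries plus fatalities within each event type
casualties <- with(toy, tapply(INJURIES + FATALITIES, EVTYPE, sum))

# Rank event types from most to least harmful and keep the top 2
top2 <- sort(casualties, decreasing = TRUE)[1:2]
print(top2)
```

Here TORNADO totals 18 and FLOOD totals 5, so those two event types are kept and LIGHTNING is dropped.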

Second Question to answer: Across the United States, which types of events have the greatest economic consequences?

# same approach as for deaths and injuries, but summing the dollar cost of property and crop damage
highest_property_crop_dmg <- sort(with(stormdata_property_physical,tapply((PROPDMG*PROPDMGEXP) + (CROPDMG*CROPDMGEXP),EVTYPE,sum)),decreasing=TRUE)[1:10]

## TORNADOES, TSTM WIND, HAIL                    TSUNAMI 
##                 1602500000                  144082000 
##                  117500000                  110000000 
##                   65000000                   20600000 
##       Heavy Rain/High Surf            LAKESHORE FLOOD 
##                   15000000                    7540000 
##                    7510000                    5500000
m <- barplot(highest_property_crop_dmg,xaxt="n",ylab="Total Property and Crop Damage in US Dollars",main="Expense of Property and Crop Damage due to Weather Events",cex.axis=.8,cex.main=1.3,cex.lab=.9)

Abbreviations: TTWH = Tornadoes, Thunderstorms and Hail; TST = Tsunami; HW/C = High Winds/Cold; HO/HW = Hurricane Opal/High Winds; WSHW = Winter Storm High Winds; TSJ = Tropical Storm Jerry; HR/HS = Heavy Rain/High Surf; LF = Lakeshore Flood; HWHR = High Winds Heavy Rain; FF = Forest Fires

In terms of financial damage, the printout shows that the event type recorded as "TORNADOES, TSTM WIND, HAIL" caused the most combined property and crop damage, about 1.6 billion US dollars, followed by tsunamis at about 144 million US dollars.
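One caveat about these totals: the coercion warning in the data processing step means some multipliers are NA, and sum() returns NA for any event type that contains such a row, which sort() then drops by default. This may explain why only a narrow set of event types appears in the ranking. The sketch below reproduces the effect on invented rows and shows one possible workaround; treating an unmapped multiplier as zero damage is an assumption of this sketch, not something the original analysis does, and `zero_na` is a hypothetical helper.

```r
# Invented rows; multipliers are already numeric, NA marks an unmapped code
toy <- data.frame(
  EVTYPE     = c("HAIL", "HAIL", "TSUNAMI"),
  PROPDMG    = c(10, 5, 2),
  PROPDMGEXP = c(1e3, 1e6, 1e6),
  CROPDMG    = c(0, 1, 3),
  CROPDMGEXP = c(NA, 1e3, 1e3)
)

# As in the analysis: a single NA multiplier makes the whole group sum NA,
# and sort() silently removes NA entries
cost   <- with(toy, PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP)
strict <- sort(tapply(cost, toy$EVTYPE, sum), decreasing = TRUE)

# Assumption of this sketch: count an unmapped multiplier as zero damage
zero_na <- function(x) ifelse(is.na(x), 0, x)
cost2   <- with(toy, PROPDMG * zero_na(PROPDMGEXP) + CROPDMG * zero_na(CROPDMGEXP))
lenient <- sort(tapply(cost2, toy$EVTYPE, sum), decreasing = TRUE)

print(strict)   # HAIL is missing from this ranking
print(lenient)  # HAIL is retained
```

Whether zero is the right substitute depends on how the blank and irregular exponent codes should be interpreted, so the result of such a change would need to be checked against the dataset documentation.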