NOAA Storm Data

Reproducible Research Peer Assessment 2

The US NOAA storm database was used to analyze the effects of major storm and weather events in the United States with respect to human fatalities, human injuries, and property and crop damage. The data used for the analysis covers the years 1950 to 2011 and is taken only as reported by NOAA. Overview and instructions for the database: NOAA Storm Data Preparation

Data Processing

  • load the data set; assumes setwd() has been set to the working directory
  • calculate total fatalities and injuries by storm type, using aggregate
  • rank the top 10 storm types
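# stringsAsFactors=TRUE keeps text columns as factors (the default before R 4.0.0),
# which the levels() calls further down rely on
storm<-read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors=TRUE)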

s<-aggregate(FATALITIES ~ EVTYPE, storm, sum, na.rm=TRUE)
s1<-aggregate(INJURIES ~ EVTYPE, storm, sum, na.rm=TRUE)
fatal<-s[order(s$FATALITIES, decreasing=TRUE),]
injury<-s1[order(s1$INJURIES, decreasing=TRUE),]
fatal<-fatal[1:10,]
injury<-injury[1:10,]
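The aggregate-then-rank pattern used above can be illustrated on a small made-up data frame. This is only a sketch: the `toy` data frame, its values, and the names `totals`, `ranked`, and `top2` are invented for illustration and are not part of the NOAA data or the analysis itself.

```r
# Toy data (hypothetical values, not from the NOAA database)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort from largest to smallest and keep the top two rows
ranked <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
top2 <- head(ranked, 2)
print(top2)
```

Note that head(x, 10) selects the same rows as x[1:10,] when at least ten groups exist, but it does not pad the result with NA rows when there are fewer.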
  • create a new data frame for the damage cost calculation
  • eliminate unused columns
  • create a column that converts the cost exponent codes to numeric multipliers, using the gsub function for B,M,m,K,H,h,8,7,6,5,4,3,2 and then converting the column to numeric
  • multiply the damage value by the multiplier to get the individual damage costs
  • use the levels function on the …EXP columns to identify the exponent codes
  • +,-,?,0,_,1 were not converted because no definition could be found in the documentation; entries that are not numbers after substitution (such as +, -, ? and blank) become NA, while 0 and 1 remain the numbers 0 and 1
  • create separate columns for property damage and crop damage, using the same technique for both
  • NA values for the exponent are omitted from the calculations
# keep only the event type and the damage columns (selected by name rather than by position)
cost<-storm[,c("EVTYPE","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
cost$pexp<-cost$PROPDMGEXP
levels(cost$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
cost$pexp<-gsub("B", "1000000000", cost$pexp)
cost$pexp<-gsub("M|m", "1000000", cost$pexp)
cost$pexp<-gsub("K", "1000", cost$pexp)
cost$pexp<-gsub("H|h", "100", cost$pexp)
cost$pexp<-gsub("8", "100000000", cost$pexp)
cost$pexp<-gsub("7", "10000000", cost$pexp)
cost$pexp<-gsub("6", "1000000", cost$pexp)
cost$pexp<-gsub("5", "100000", cost$pexp)
cost$pexp<-gsub("4", "10000", cost$pexp)
cost$pexp<-gsub("3", "1000", cost$pexp)
cost$pexp<-gsub("2", "100", cost$pexp)
cost$pexp<-as.numeric(cost$pexp)
## Warning: NAs introduced by coercion
cost$property<-cost$PROPDMG * cost$pexp

cost$cexp<-cost$CROPDMGEXP
levels(cost$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
cost$cexp<-gsub("B", "1000000000", cost$cexp)
cost$cexp<-gsub("M|m", "1000000", cost$cexp)
cost$cexp<-gsub("K|k", "1000", cost$cexp)
cost$cexp<-gsub("2", "100", cost$cexp)
cost$cexp<-as.numeric(cost$cexp)
## Warning: NAs introduced by coercion
cost$crop<-cost$CROPDMG * cost$cexp
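As an alternative way to see the exponent conversion, the same mapping can be written as a lookup table. The following is a minimal sketch, not the method used in this analysis: the names `exp_multiplier` and `exp_to_number` and the toy values are assumptions made for illustration. Unlike the gsub sequence, which leaves "0" and "1" as the numbers 0 and 1, this sketch maps every unlisted code to NA; because NA values are later dropped from the sums, the totals would not be expected to differ.

```r
# Multipliers for the exponent codes handled in this analysis.
# Codes not listed here ("", "+", "-", "?", "0", "1") map to NA in this sketch.
exp_multiplier <- c(
  H = 1e2, K = 1e3, M = 1e6, B = 1e9,
  "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
  "6" = 1e6, "7" = 1e7, "8" = 1e8
)

# Hypothetical helper: convert a vector of exponent codes to numeric multipliers
exp_to_number <- function(code) {
  code <- toupper(as.character(code))   # treat "k", "m", "h" like "K", "M", "H"
  unname(exp_multiplier[code])          # codes without an entry yield NA
}

# Toy values (not from the NOAA database)
dmg  <- c(25, 2.5, 10, 5)
code <- c("K", "m", "?", "")
result <- dmg * exp_to_number(code)
print(result)
```

A lookup table keeps all code definitions in one place, so adding or changing a multiplier does not depend on the order of substitutions.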
  • costs for property and crop damage are still in separate columns
  • combine them into a total cost using apply
  • calculate cost by storm type using aggregate
  • rank the top ten storm types
cost$total<-apply(cbind(cost$property, cost$crop), 1, sum, na.rm=TRUE)

c<-aggregate(total ~ EVTYPE, cost, sum, na.rm=TRUE)
c1<-c[order(c$total, decreasing=TRUE),]
c1<-c1[1:10,]
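The row-wise total above relies on na.rm=TRUE to skip missing costs. The sketch below shows that behaviour on made-up values and compares apply with rowSums, which is a vectorized alternative and not what this analysis used. The vectors `property` and `crop` here are hypothetical toy data.

```r
# Toy per-event costs (hypothetical values); NA marks an unusable exponent code
property <- c(1000, NA, 250, NA)
crop     <- c(NA, 500, 50, NA)

# Row-wise total that skips NA values; a row with only NA values sums to 0
total_apply   <- apply(cbind(property, crop), 1, sum, na.rm = TRUE)
total_rowsums <- rowSums(cbind(property, crop), na.rm = TRUE)

print(total_rowsums)
print(all.equal(total_apply, total_rowsums))
```

One consequence worth keeping in mind: an event whose property and crop costs are both NA contributes 0 to its storm type's total rather than being dropped.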

Results

  • Top ten storm types for human fatalities:
print(fatal)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
  • Top ten storm types for human injuries:
print(injury)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
  • Bar charts of the top ten storm types for fatalities and injuries:
par(mfrow=c(1,2))
barplot(fatal$FATALITIES, names.arg=fatal$EVTYPE, las=2, 
        main="Storm Event \nFatalities", ylab="Fatalities", col="red",
        cex.axis=0.75, cex.names=0.6)
barplot(injury$INJURIES, names.arg=injury$EVTYPE, las=2, 
        main="Storm Event \nInjuries", ylab="Injuries", col="blue",
        cex.axis=0.75, cex.names=0.6)

  • Top ten storm types for property and crop damage:
print(c1)
##                EVTYPE        total
## 170             FLOOD 150319678250
## 411 HURRICANE/TYPHOON  71913712800
## 834           TORNADO  57362333590
## 670       STORM SURGE  43323541000
## 244              HAIL  18761221670
## 153       FLASH FLOOD  18243990610
## 95            DROUGHT  15018672000
## 402         HURRICANE  14610229010
## 590       RIVER FLOOD  10148404500
## 427         ICE STORM   8967041310
  • Bar chart of the top ten storm types with respect to damage cost:
par(mfrow=c(1,1))
barplot(c1$total, names.arg=c1$EVTYPE, las=2, 
        main="Storm Event \nDamage Cost", ylab="Cost USD", col="green",
        cex.axis=0.75, cex.names=0.6)

Conclusion

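  1. Tornadoes are the most dangerous storm type for human health, causing the most fatalities (5,633) and the most injuries (91,346)
  2. Floods have the greatest economic consequences, with about 150 billion USD in combined property and crop damage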