Storm Report

Synopsis

The U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database tracks characteristics of major storms and weather events in the United States.

With that data, this analysis tries to find out which severe weather conditions cost the most, either in terms of fatalities and injuries or in terms of damaged goods.

Data Processing

Data is read from the compressed file "repdata_data_StormData.csv.bz2", which must be located in the working directory of the project. The file is rather large, so reading it takes some time. The result is stored in the data.frame SD (Storm Data). Strings are read as factors, because the exponent columns are later recoded through their factor levels.

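# stringsAsFactors=TRUE is required from R 4.0.0 on, where the default is FALSE
SD <- read.csv("repdata_data_StormData.csv.bz2", header=TRUE, stringsAsFactors=TRUE)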

Out of the 37 variables, we use only 7:

EVTYPE: Factor with 985 levels giving the event type of each record
FATALITIES: Number of deaths caused by the event
INJURIES: Number of people injured by the event
PROPDMG and PROPDMGEXP: Together they give the property damage in dollars. They must be combined as PROPDMG · 10^PROPDMGEXP
CROPDMG and CROPDMGEXP: Together they give the crop damage in dollars. They must be combined as CROPDMG · 10^CROPDMGEXP

The first problem we find is that the exponent of these two quantities is not always a numeric value. For example, "k" or "K" is sometimes used to mean an exponent of three (thousands). So we create a "conversion dictionary", the named list of vectors conv, and assign it to the levels of the exponent variables, PROPDMGEXP and CROPDMGEXP. Each name in the list is a numeric exponent, and the vector under it lists the raw codes that map to that exponent.

Once the EXP variables are corrected, we can calculate the total amounts of PROPDMG and CROPDMG.

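# Conversion dictionary: name = exponent, value = raw codes mapped to it.
# Blank and symbolic codes ("", "-", "?", "+") are treated as exponent 0.
conv <- list("0"=c("","-","?","+","0"),"1"="1","2"=c("2","h","H"),
             "3"=c("3","K","k"),"4"="4","5"="5","6"=c("6","M","m"),"7"="7",
             "8"="8","9"=c("9","B","b"))
levels(SD$PROPDMGEXP) <- conv
levels(SD$CROPDMGEXP) <- conv
# Scale each damage figure by its power of ten
SD$PROPDMG <- SD$PROPDMG * 10^as.integer(as.character(SD$PROPDMGEXP))
SD$CROPDMG <- SD$CROPDMG * 10^as.integer(as.character(SD$CROPDMGEXP))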
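
To make the effect of the levels() assignment easier to see, here is a minimal sketch on a toy factor. The names exp_codes and toy_conv and all the values are made up for illustration and are not taken from the database. Assigning a named list to levels() merges every old level listed under a name into a single new level with that name.

```r
# Toy exponent codes (hypothetical values, not from StormData)
exp_codes <- factor(c("K", "k", "M", "", "B", "3"))

# A subset of the conversion dictionary: name = new level, value = old levels
toy_conv <- list("0" = "", "3" = c("3", "K", "k"),
                 "6" = c("M", "m"), "9" = c("B", "b"))
levels(exp_codes) <- toy_conv

# Convert the merged levels to integer exponents
exponents <- as.integer(as.character(exp_codes))

# Scale made-up damage figures by the corresponding power of ten
damage <- c(25, 2.5, 1.2, 10, 0.5, 7) * 10^exponents
```

Codes that appear in the list but not in the factor (here "m" and "b") are simply skipped, which is why the same dictionary can be reused for both exponent columns.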

There are some missing values and errors in EVTYPE, but we ignore them in this analysis.

For each event type EVTYPE, we calculate the total sum of FATALITIES, INJURIES, PROPDMG and CROPDMG. We use the pair of functions melt() and dcast() from the library reshape2. (Yes, I know that I should have loaded the library at the beginning; it is loaded here just for the sake of clarity ;)

library(reshape2) 

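# Long format: one row per (EVTYPE, variable, value)
SDC <- melt(data=SD[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","CROPDMG")],
            id.vars=c("EVTYPE"),
            measure.vars=c("FATALITIES", "INJURIES", "PROPDMG","CROPDMG"))
# Back to wide format, summing each variable within each event type
SDC <- dcast(SDC, EVTYPE ~ variable, fun.aggregate=sum)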

The result is stored in the new data.frame SDC (Storm Data-Clean).
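
As a cross-check of the per-event totals, the same kind of aggregation can be sketched with base R alone. This is not the method used in this report, only an illustration on a toy data.frame whose name (toy) and values are invented.

```r
# Toy data (hypothetical values, not from StormData)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(1, 3, 0, 2, 0),
  INJURIES   = c(4, 10, 1, 5, 2)
)

# Sum every measure column within each event type
toy_totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                        data = toy, FUN = sum)
toy_totals
```

Note that the formula interface of aggregate() drops rows containing NA by default, so on real data the two approaches may differ if the measure columns have missing values.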

We are now ready to process the data and answer the two questions.

Which events are most harmful to population health?

We calculate the sum of the two totals for each EVTYPE, FATALITIES+INJURIES, sort the result in decreasing order, and select the 8 top values.

Finally we use a barplot() to display the results.

healthD <- head(SDC[order(SDC$FATALITIES + SDC$INJURIES, decreasing=TRUE), 
                     c("EVTYPE", "FATALITIES", "INJURIES")], 8)

barplot(height=(healthD$FATALITIES + healthD$INJURIES)/1000, names.arg=healthD$EVTYPE, 
        beside=TRUE, width=2, las=2, ylab="Thousands of Persons", col="red", ylim=c(0,100), 
        main="Health Damage in USA (Fatalities + Injuries)", cex.names=0.6)

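[Figure: bar plot "Health Damage in USA (Fatalities + Injuries)", in thousands of persons, for the eight event types listed below]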

rownames(healthD)<-NULL; healthD
##           EVTYPE FATALITIES INJURIES
## 1        TORNADO       5633    91346
## 2 EXCESSIVE HEAT       1903     6525
## 3      TSTM WIND        504     6957
## 4          FLOOD        470     6789
## 5      LIGHTNING        816     5230
## 6           HEAT        937     2100
## 7    FLASH FLOOD        978     1777
## 8      ICE STORM         89     1975

So it is easy to see that TORNADO is the most dangerous event type in terms of harm to population health, followed by EXCESSIVE HEAT, TSTM WIND and FLOOD.
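
To put a number on how far TORNADO is ahead, the following sketch computes each event type's share of the combined fatalities and injuries. It uses only the eight rows of the table above, so the shares are relative to the top-eight total, not to the whole database. The names casualties and share are introduced here for illustration.

```r
# Fatalities + injuries per event type, copied from the table above
casualties <- c(TORNADO = 5633 + 91346, "EXCESSIVE HEAT" = 1903 + 6525,
                "TSTM WIND" = 504 + 6957, FLOOD = 470 + 6789,
                LIGHTNING = 816 + 5230, HEAT = 937 + 2100,
                "FLASH FLOOD" = 978 + 1777, "ICE STORM" = 89 + 1975)

# Fraction of the top-eight total attributable to each event type
share <- casualties / sum(casualties)
round(100 * share, 1)
```

With these figures, TORNADO accounts for a little over 72% of the fatalities and injuries among the eight event types shown.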

Which types of events have the greatest economic consequences?

As in the previous question, we calculate the sum of the two totals for each EVTYPE, PROPDMG+CROPDMG, sort the result in decreasing order, and select the 8 top values.

Finally we use a barplot() to display the results.

properD <- head(SDC[order(SDC$PROPDMG + SDC$CROPDMG, decreasing=TRUE), 
                     c("EVTYPE", "PROPDMG", "CROPDMG")], 8)

barplot(height=(properD$PROPDMG + properD$CROPDMG)/1.E+9, names.arg=properD$EVTYPE, 
        beside=TRUE, width=2, las=2, ylab="Billions of Dollars", col="blue", 
        main="Total Damage in USA (Properties + Crops)", ylim = c(0,160), cex.names=0.7)

[Figure: bar plot "Total Damage in USA (Properties + Crops)", in billions of dollars, for the eight event types listed below]

rownames(properD)<-NULL; properD
##              EVTYPE   PROPDMG   CROPDMG
## 1             FLOOD 1.447e+11 5.662e+09
## 2 HURRICANE/TYPHOON 6.931e+10 2.608e+09
## 3           TORNADO 5.695e+10 4.150e+08
## 4       STORM SURGE 4.332e+10 5.000e+03
## 5              HAIL 1.574e+10 3.026e+09
## 6       FLASH FLOOD 1.682e+10 1.421e+09
## 7           DROUGHT 1.046e+09 1.397e+10
## 8         HURRICANE 1.187e+10 2.742e+09

So it is easy to see that FLOOD is the most damaging event type in terms of economic damage, followed by HURRICANE/TYPHOON, TORNADO and STORM SURGE.
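
The table lists HURRICANE/TYPHOON and HURRICANE as separate event types. As a rough robustness check, the sketch below merges the two using the figures from the table above. Treating both labels as the same kind of event is an assumption made here for illustration, not something done in the analysis itself.

```r
# Property + crop damage in dollars, copied from the table above
flood     <- 1.447e+11 + 5.662e+09
hurr_typh <- 6.931e+10 + 2.608e+09
hurricane <- 1.187e+10 + 2.742e+09
tornado   <- 5.695e+10 + 4.150e+08

# Assumption: both hurricane labels describe the same kind of event
hurricane_all <- hurr_typh + hurricane

# Combined hurricane damage in billions of dollars
hurricane_all / 1e9

# Does the merged category stay between FLOOD and TORNADO?
c(below_flood = hurricane_all < flood, above_tornado = hurricane_all > tornado)
```

Under that assumption the merged category comes to roughly 86.5 billion dollars and keeps second place, so the ranking of the top three is unchanged.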

Results

Two questions have been answered in this report:

1. Which events are most harmful to population health?
   TORNADO events are, by far, the most dangerous storm events in terms of fatalities and injuries.

2. Which types of events have the greatest economic consequences?
   FLOODs, HURRICANE/TYPHOONs, TORNADOs and STORM SURGEs are the four worst events in terms of economic consequences.