Synopsis

This study examines storm data from the U.S. National Oceanic and Atmospheric Administration (NOAA) on health and economic damage in the USA. It identifies the most damaging event types by injuries, fatalities, property damage and crop damage.

Data Processing

This part describes the data processing (getting, loading and preparing the data). First, some libraries are loaded (they have to be installed beforehand).

library(data.table)
## Warning: package 'data.table' was built under R version 3.0.3
library(datasets)
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.0.3
library(R.utils)

Getting data

The raw data (902297 observations of 37 variables) can be obtained as a bzip2 archive from the following URL:

dataurl="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
archivename="Stormdata.bz2"
#uncomment the following line, if not yet downloaded!
#download.file(dataurl,destfile=archivename)
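Instead of commenting the download line in and out by hand, the download could be guarded by a check for the archive on disk. The following is only a minimal sketch: `fetch_once` is a hypothetical helper name that does not appear in the original analysis, and `mode = "wb"` is an assumption made so that the binary archive is not altered on Windows.

```r
# Hypothetical helper (not part of the original analysis):
# download a file only if it is not already present on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary bzip2 archive intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Intended use with the variables defined above (left commented out here
# so that the sketch itself does not trigger a download):
# fetch_once(dataurl, archivename)
```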

Reading in data

Next, the downloaded archive is decompressed and the resulting CSV file is loaded into R.

rawdatafilename="StormData.csv"
#keep the archive (remove=FALSE) and allow re-runs (overwrite=TRUE)
bunzip2(archivename,rawdatafilename,overwrite=TRUE,remove=FALSE)
data<-read.csv(rawdatafilename,header=TRUE)
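As an aside, base R can usually read a bzip2-compressed CSV without a separate decompression step, because `read.csv` opens the path through `file()`, which detects the compression when reading. The sketch below demonstrates this on a tiny made-up file rather than on the storm data; `toy_path` and `toy` are illustrative names only.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, not NOAA data).
toy_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy_path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv reads the compressed file directly; no bunzip2 call is needed.
toy <- read.csv(toy_path, header = TRUE, stringsAsFactors = FALSE)
print(toy)
```

Under this assumption, the archive from the previous step could be passed to `read.csv` in the same way, at the cost of decompressing it on every read.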

Melting and summing up data

Only the sums of the fatalities, injuries, property damage and crop damage per event type are relevant in this study.

meltdata<-melt(data, id=c("EVTYPE"),measure.vars=c("FATALITIES","INJURIES","PROPDMG","CROPDMG"))
summed_data <- dcast(meltdata, EVTYPE ~ variable, sum)
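The melt and cast steps amount to one sum per event type and measured variable. To illustrate what the resulting table looks like, here is a small sketch on made-up data using base R's `aggregate` instead of reshape2; the data frame `toy` and its values are invented for illustration and are not taken from the storm data.

```r
# Toy data standing in for the storm data (values are invented).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 0, 5, 1),
  stringsAsFactors = FALSE
)

# One row per event type, one summed column per measured variable.
toy_sums <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toy_sums)
```

The result has the same shape as `summed_data`: an `EVTYPE` column followed by one column of totals per variable.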

Data Analysis

In the analysis, the sums (injuries, fatalities, property damage and crop damage) are compared by event type. First, the health damages (injuries, fatalities) are presented (the top 5 in the plots, the top 20 in the tables):

par(mfrow=c(1,2))

ranked_fatatilties <- summed_data[order(summed_data$FATALITIES, decreasing = TRUE), ]
ranked_fatatilties$RANK <- c(1:nrow(summed_data))
plotH <-ranked_fatatilties$FATALITIES[1:5]
plotN <-ranked_fatatilties$EVTYPE[1:5]
plotName<-"fatalities"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "red",  xlab = "Event name", ylab = paste("Number of",plotName), ylim=ylim, main = paste("Number of",plotName,"\nby event"))
text(x = bp, y = plotH, label = plotH, pos = 3,  col = "red",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)

ranked_injuries <- summed_data[order(summed_data$INJURIES, decreasing = TRUE), ]
ranked_injuries$RANK <- c(1:nrow(summed_data))
plotH <-ranked_injuries$INJURIES[1:5]
plotN <-ranked_injuries$EVTYPE[1:5]
plotName<-"injuries"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "blue",  xlab = "Event name", ylab = paste("Number of",plotName),ylim=ylim, main = paste("Number of",plotName,"\nby event"))
text(x = bp, y = plotH, label = plotH, pos = 3,  col = "blue",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)

Figure: Top 5 event types by number of fatalities (left, red) and number of injuries (right, blue).

print(ranked_fatatilties[1:20,c("RANK","EVTYPE","FATALITIES")],row.names=FALSE)
##  RANK                  EVTYPE FATALITIES
##     1                 TORNADO       5633
##     2          EXCESSIVE HEAT       1903
##     3             FLASH FLOOD        978
##     4                    HEAT        937
##     5               LIGHTNING        816
##     6               TSTM WIND        504
##     7                   FLOOD        470
##     8             RIP CURRENT        368
##     9               HIGH WIND        248
##    10               AVALANCHE        224
##    11            WINTER STORM        206
##    12            RIP CURRENTS        204
##    13               HEAT WAVE        172
##    14            EXTREME COLD        160
##    15       THUNDERSTORM WIND        133
##    16              HEAVY SNOW        127
##    17 EXTREME COLD/WIND CHILL        125
##    18             STRONG WIND        103
##    19                BLIZZARD        101
##    20               HIGH SURF        101
print(ranked_injuries[1:20,c("RANK","EVTYPE","INJURIES")],row.names=FALSE)
##  RANK             EVTYPE INJURIES
##     1            TORNADO    91346
##     2          TSTM WIND     6957
##     3              FLOOD     6789
##     4     EXCESSIVE HEAT     6525
##     5          LIGHTNING     5230
##     6               HEAT     2100
##     7          ICE STORM     1975
##     8        FLASH FLOOD     1777
##     9  THUNDERSTORM WIND     1488
##    10               HAIL     1361
##    11       WINTER STORM     1321
##    12  HURRICANE/TYPHOON     1275
##    13          HIGH WIND     1137
##    14         HEAVY SNOW     1021
##    15           WILDFIRE      911
##    16 THUNDERSTORM WINDS      908
##    17           BLIZZARD      805
##    18                FOG      734
##    19   WILD/FOREST FIRE      545
##    20         DUST STORM      440
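The ranking code above repeats the same three steps for each variable: order by the column, attach a rank, and take the first rows. One possible way to factor this out is sketched below; `top_events` is a hypothetical helper that is not part of the original analysis, and the toy totals are invented.

```r
# Hypothetical helper: rank event types by one column and return the top n rows.
top_events <- function(sums, column, n = 5) {
  ranked <- sums[order(sums[[column]], decreasing = TRUE), ]
  ranked$RANK <- seq_len(nrow(ranked))
  head(ranked[, c("RANK", "EVTYPE", column)], n)
}

# Toy totals standing in for summed_data (values are invented).
toy_sums <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(1, 0, 5),
  INJURIES   = c(0, 1, 15),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy_sums, "FATALITIES", n = 2)
print(top2, row.names = FALSE)
```

With such a helper, a call like `top_events(summed_data, "INJURIES", 20)` would produce a table of the same form as the ones printed above.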

Next, the economic damages (property and crop) are presented (again the top 5 in the plots, the top 20 in the tables):

par(mfrow=c(1,2))

ranked_prop <- summed_data[order(summed_data$PROPDMG, decreasing = TRUE), ]
ranked_prop$RANK <- c(1:nrow(summed_data))
plotH <-ranked_prop$PROPDMG[1:5]
plotN <-ranked_prop$EVTYPE[1:5]
plotName<-"property damage"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "brown",  xlab = "Event name", ylab = paste("Total",plotName),ylim=ylim, main = paste("Total",plotName,"\nby event"))
#text(x = bp, y = plotH, label = plotH, pos = 3,  col = "brown",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)

ranked_crop <- summed_data[order(summed_data$CROPDMG, decreasing = TRUE), ]
ranked_crop$RANK <- c(1:nrow(summed_data))
plotH <-ranked_crop$CROPDMG[1:5]
plotN <-ranked_crop$EVTYPE[1:5]
plotName<-"crop damage"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "orange",  xlab = "Event name", ylab = paste("Total",plotName), ylim=ylim, main = paste("Total",plotName,"\nby event"))
#text(x = bp, y = plotH, label = plotH, pos = 3,  col = "orange",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)

Figure: Top 5 event types by total property damage (left, brown) and total crop damage (right, orange).

print(ranked_prop[1:20,c("RANK","EVTYPE","PROPDMG")],row.names=FALSE)
##  RANK               EVTYPE PROPDMG
##     1              TORNADO 3212258
##     2          FLASH FLOOD 1420125
##     3            TSTM WIND 1335966
##     4                FLOOD  899938
##     5    THUNDERSTORM WIND  876844
##     6                 HAIL  688693
##     7            LIGHTNING  603352
##     8   THUNDERSTORM WINDS  446293
##     9            HIGH WIND  324732
##    10         WINTER STORM  132721
##    11           HEAVY SNOW  122252
##    12             WILDFIRE   84459
##    13            ICE STORM   66001
##    14          STRONG WIND   62994
##    15           HIGH WINDS   55625
##    16           HEAVY RAIN   50842
##    17       TROPICAL STORM   48424
##    18     WILD/FOREST FIRE   39345
##    19       FLASH FLOODING   28497
##    20 URBAN/SML STREAM FLD   26052
print(ranked_crop[1:20,c("RANK","EVTYPE","CROPDMG")],row.names=FALSE)
##  RANK             EVTYPE CROPDMG
##     1               HAIL  579596
##     2        FLASH FLOOD  179200
##     3              FLOOD  168038
##     4          TSTM WIND  109203
##     5            TORNADO  100019
##     6  THUNDERSTORM WIND   66791
##     7            DROUGHT   33899
##     8 THUNDERSTORM WINDS   18685
##     9          HIGH WIND   17283
##    10         HEAVY RAIN   11123
##    11       FROST/FREEZE    7034
##    12       EXTREME COLD    6121
##    13     TROPICAL STORM    5899
##    14          HURRICANE    5339
##    15     FLASH FLOODING    5126
##    16  HURRICANE/TYPHOON    4798
##    17           WILDFIRE    4364
##    18     TSTM WIND/HAIL    4357
##    19   WILD/FOREST FIRE    4190
##    20          LIGHTNING    3581
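Note that the totals above are sums of the raw `PROPDMG` and `CROPDMG` columns. In the NOAA storm data these columns come with companion magnitude columns (`PROPDMGEXP`, `CROPDMGEXP`) holding codes such as K, M and B, so the raw sums are not dollar amounts. A possible way to scale the values before summing is sketched below. The helper `scale_damage`, the toy data, and the decision to treat any unrecognized code as a multiplier of 1 are assumptions made for illustration, not part of the original analysis.

```r
# Hypothetical helper: convert a damage value and its magnitude code to dollars.
# Assumption: H = hundreds, K = thousands, M = millions, B = billions;
# any other code (including an empty string) is treated as a multiplier of 1.
scale_damage <- function(value, exp_code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1
  value * unname(m)
}

# Toy rows (invented values) showing the effect of the scaling.
toy_dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)
toy_dmg$PROPDMG_USD <- scale_damage(toy_dmg$PROPDMG, toy_dmg$PROPDMGEXP)
print(toy_dmg)
```

Summing a scaled column instead of the raw one could change the ranking, since a single event recorded in billions would outweigh many events recorded in thousands.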

Results

This part summarizes the results of the analysis. Tornadoes cause by far the most fatalities (5633) and the most injuries (91346). Next are excessive heat (1903) and flash flood (978) for fatalities, and TSTM wind (6957) and flood (6789) for injuries. In terms of property damage, tornadoes (3212258), flash flood (1420125) and TSTM wind (1335966) are the most devastating; for crop damage, the leading event types are hail (579596), flash flood (179200) and flood (168038).
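The tables list several event types under more than one spelling (for example TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS, or RIP CURRENT and RIP CURRENTS), so their totals are split across rows. A rough sketch of how such labels could be merged before summing follows. The function `normalize_evtype` and its small set of rules are hypothetical and cover only the variants named here; the toy counts are invented.

```r
# Hypothetical helper: collapse a few spelling variants of event type labels.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))
  x <- sub("^TSTM", "THUNDERSTORM", x)                   # TSTM WIND -> THUNDERSTORM WIND
  x <- sub("^(THUNDERSTORM WIND|RIP CURRENT)S$", "\\1", x)  # drop the plural "S"
  x
}

# Toy rows (invented counts) with duplicated labels.
toy_events <- data.frame(
  EVTYPE     = c("TSTM WIND", "THUNDERSTORM WINDS", "RIP CURRENTS", "RIP CURRENT"),
  FATALITIES = c(5, 1, 2, 3),
  stringsAsFactors = FALSE
)
toy_events$EVTYPE <- normalize_evtype(toy_events$EVTYPE)

# After normalization, each event type appears once in the summed table.
merged <- aggregate(FATALITIES ~ EVTYPE, data = toy_events, FUN = sum)
print(merged)
```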