This study examines data from the U.S. National Oceanic and Atmospheric Administration (NOAA) on the health and economic damage caused by storms in the USA. It presents the most damaging event types in terms of injuries, fatalities, property damage and crop damage.
This part describes the data processing (getting, loading and preparing the data). First, some libraries are loaded (these have to be installed beforehand).
library(data.table)
## Warning: package 'data.table' was built under R version 3.0.3
library(datasets)
library(reshape2)
## Warning: package 'reshape2' was built under R version 3.0.3
library(R.utils)
Observations: 902297
Variables: 37
The raw data can be obtained as a bzip2 archive from the following URL:
dataurl="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
archivename="Stormdata.bz2"
#uncomment the following line, if not yet downloaded!
#download.file(dataurl,destfile=archivename)
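Next, the downloaded archive is unpacked and the data is loaded into R.
rawdatafilename="StormData.csv"
#keep the archive and overwrite an existing csv, so that the script can be re-run
bunzip2(archivename,rawdatafilename,overwrite=TRUE,remove=FALSE)
data<-read.csv(rawdatafilename,header=TRUE)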
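As a side note on the loading step: base R can read a bzip2-compressed CSV through a `bzfile` connection, so the separate unpacking step is optional. The following is a minimal sketch on a small made-up file (the file contents and the name `toy` are assumptions for illustration, not the NOAA data):

```r
# Write a tiny CSV into a bzip2-compressed temporary file (toy data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,2", "HAIL,0"), con)
close(con)

# read.csv accepts a connection; it opens and closes it on its own
toy <- read.csv(bzfile(tmp), header = TRUE)
print(toy)

# Clean up the temporary file
unlink(tmp)
```

With the real archive, the same call would presumably be `read.csv(bzfile(archivename), header=TRUE)`, at the cost of decompressing on every read.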
Only the sums of fatalities, injuries, property damage and crop damage per event type are relevant in this study.
meltdata<-melt(data, id=c("EVTYPE"),measure.vars=c("FATALITIES","INJURIES","PROPDMG","CROPDMG"))
summed_data <- dcast(meltdata, EVTYPE ~ variable, sum)
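To make the summing step easier to follow, here is a small sketch of the same idea on made-up data. It uses base R's `aggregate` instead of `melt`/`dcast` (a swapped-in technique, chosen so that the example runs without extra packages); the data frame `toy` and its values are assumptions for illustration only.

```r
# Toy data with the same column names as the storm data (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES   = c(10, 1, 5, 0),
  PROPDMG    = c(25, 5, 2.5, 50),
  CROPDMG    = c(0, 10, 0, 5),
  stringsAsFactors = FALSE
)

# One row per event type, each measure summed over all events of that type
toy_sums <- aggregate(
  cbind(FATALITIES, INJURIES, PROPDMG, CROPDMG) ~ EVTYPE,
  data = toy,
  FUN = sum
)
print(toy_sums)
```

The result has the same shape as `summed_data`: one row per `EVTYPE` and one column per summed measure.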
In the analysis, the sums (injuries, fatalities, property damage and crop damage) are compared by event type. First, the health damages (injuries, fatalities) are presented (the top 5 in the plots, the top 20 in the tables):
par(mfrow=c(1,2))
ranked_fatatilties <- summed_data[order(summed_data$FATALITIES, decreasing = TRUE), ]
ranked_fatatilties$RANK <- c(1:nrow(summed_data))
plotH <-ranked_fatatilties$FATALITIES[1:5]
plotN <-ranked_fatatilties$EVTYPE[1:5]
plotName<-"fatalities"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "red", xlab = "Event name", ylab = paste("Number of",plotName), ylim=ylim, main = paste("Number of",plotName,"\nby event"))
text(x = bp, y = plotH, label = plotH, pos = 3, col = "red",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)
ranked_injuries <- summed_data[order(summed_data$INJURIES, decreasing = TRUE), ]
ranked_injuries$RANK <- c(1:nrow(summed_data))
plotH <-ranked_injuries$INJURIES[1:5]
plotN <-ranked_injuries$EVTYPE[1:5]
plotName<-"injuries"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "blue", xlab = "Event name", ylab = paste("Number of",plotName),ylim=ylim, main = paste("Number of",plotName,"\nby event"))
text(x = bp, y = plotH, label = plotH, pos = 3, col = "blue",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)
print(ranked_fatatilties[1:20,c("RANK","EVTYPE","FATALITIES")],row.names=FALSE)
## RANK EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
## 11 WINTER STORM 206
## 12 RIP CURRENTS 204
## 13 HEAT WAVE 172
## 14 EXTREME COLD 160
## 15 THUNDERSTORM WIND 133
## 16 HEAVY SNOW 127
## 17 EXTREME COLD/WIND CHILL 125
## 18 STRONG WIND 103
## 19 BLIZZARD 101
## 20 HIGH SURF 101
print(ranked_injuries[1:20,c("RANK","EVTYPE","INJURIES")],row.names=FALSE)
## RANK EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
## 11 WINTER STORM 1321
## 12 HURRICANE/TYPHOON 1275
## 13 HIGH WIND 1137
## 14 HEAVY SNOW 1021
## 15 WILDFIRE 911
## 16 THUNDERSTORM WINDS 908
## 17 BLIZZARD 805
## 18 FOG 734
## 19 WILD/FOREST FIRE 545
## 20 DUST STORM 440
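The ranking code above repeats the same three steps for each measure (order, add a rank column, take the top rows). One way to avoid the repetition is a small helper function. The sketch below is only a suggestion: `rank_top` is a hypothetical name, and the toy data frame stands in for `summed_data`.

```r
# Hypothetical helper: rank a summed data frame by one column and
# return the top n rows with RANK, EVTYPE and the chosen measure
rank_top <- function(df, column, n = 5) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  ranked$RANK <- seq_len(nrow(ranked))  # also safe for an empty data frame
  head(ranked[, c("RANK", "EVTYPE", column)], n)
}

# Toy stand-in for summed_data (values are made up)
toy_sums <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  FATALITIES = c(3, 0, 5),
  INJURIES   = c(0, 1, 15),
  stringsAsFactors = FALSE
)

top2 <- rank_top(toy_sums, "FATALITIES", n = 2)
print(top2, row.names = FALSE)
```

Applied to the real data, a call such as `rank_top(summed_data, "INJURIES", 20)` should yield the same rows as the corresponding table.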
Next, the economic damages (property and crop) are presented (again the top 5 in the plots, the top 20 in the tables):
par(mfrow=c(1,2))
ranked_prop <- summed_data[order(summed_data$PROPDMG, decreasing = TRUE), ]
ranked_prop$RANK <- c(1:nrow(summed_data))
plotH <-ranked_prop$PROPDMG[1:5]
plotN <-ranked_prop$EVTYPE[1:5]
plotName<-"property damage"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "brown", xlab = "Event name", ylab = paste("Total",plotName),ylim=ylim, main = paste("Total",plotName,"\nby event"))
#text(x = bp, y = plotH, label = plotH, pos = 3, col = "brown",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)
ranked_crop <- summed_data[order(summed_data$CROPDMG, decreasing = TRUE), ]
ranked_crop$RANK <- c(1:nrow(summed_data))
plotH <-ranked_crop$CROPDMG[1:5]
plotN <-ranked_crop$EVTYPE[1:5]
plotName<-"crop damage"
ylim <- c(0, 1.1*max(plotH))
bp<-barplot(plotH, col = "orange", xlab = "Event name", ylab = paste("Total",plotName), ylim=ylim, main = paste("Total",plotName,"\nby event"))
#text(x = bp, y = plotH, label = plotH, pos = 3, col = "orange",cex=0.5)
axis(1, at=bp, labels=plotN, tick=FALSE, las=2, line=-0.5, cex.axis=0.5)
print(ranked_prop[1:20,c("RANK","EVTYPE","PROPDMG")],row.names=FALSE)
## RANK EVTYPE PROPDMG
## 1 TORNADO 3212258
## 2 FLASH FLOOD 1420125
## 3 TSTM WIND 1335966
## 4 FLOOD 899938
## 5 THUNDERSTORM WIND 876844
## 6 HAIL 688693
## 7 LIGHTNING 603352
## 8 THUNDERSTORM WINDS 446293
## 9 HIGH WIND 324732
## 10 WINTER STORM 132721
## 11 HEAVY SNOW 122252
## 12 WILDFIRE 84459
## 13 ICE STORM 66001
## 14 STRONG WIND 62994
## 15 HIGH WINDS 55625
## 16 HEAVY RAIN 50842
## 17 TROPICAL STORM 48424
## 18 WILD/FOREST FIRE 39345
## 19 FLASH FLOODING 28497
## 20 URBAN/SML STREAM FLD 26052
print(ranked_crop[1:20,c("RANK","EVTYPE","CROPDMG")],row.names=FALSE)
## RANK EVTYPE CROPDMG
## 1 HAIL 579596
## 2 FLASH FLOOD 179200
## 3 FLOOD 168038
## 4 TSTM WIND 109203
## 5 TORNADO 100019
## 6 THUNDERSTORM WIND 66791
## 7 DROUGHT 33899
## 8 THUNDERSTORM WINDS 18685
## 9 HIGH WIND 17283
## 10 HEAVY RAIN 11123
## 11 FROST/FREEZE 7034
## 12 EXTREME COLD 6121
## 13 TROPICAL STORM 5899
## 14 HURRICANE 5339
## 15 FLASH FLOODING 5126
## 16 HURRICANE/TYPHOON 4798
## 17 WILDFIRE 4364
## 18 TSTM WIND/HAIL 4357
## 19 WILD/FOREST FIRE 4190
## 20 LIGHTNING 3581
This part summarizes the results of the analysis. Tornadoes cause by far the most fatalities (5633) and injuries (91346). Next are excessive heat (1903) and flash flood (978) for fatalities, and TSTM wind (6957) and flood (6789) for injuries. In terms of property damage, tornadoes (3212258), flash flood (1420125) and TSTM wind (1335966) are the most devastating; for crop damage, these are hail (579596), flash flood (179200) and flood (168038).