This article identifies the deadliest extreme weather events (coded as “Storm Events”) and the most economically harmful ones, according to data from the US National Oceanic and Atmospheric Administration (NOAA). By deadliest, I mean the Storm Events that kill and injure the most people. By most harmful, I mean the Storm Events that cause the greatest loss of property and crops. The article reports two kinds of numbers for the impact on people’s health and wealth: a total and an average. The total is the sum of all the consequences recorded for one type of Storm Event. The average is the mean of those consequences over all recorded events of that type.
There are two main steps. First, reading and cleaning the data. Second, calculating and sorting the data.
Two processes are involved in this step.
Configure the environment and read the data directly from the bz2 file
Sys.setlocale('LC_ALL', 'English')
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
library("plyr")
conn <- bzfile("StormData.csv.bz2", "rt")
theData <- read.csv(conn)
close(conn)
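As a side note, `read.csv` can usually open a bzip2-compressed file by path, because R's file connections detect compression when reading. The following is only a sketch on a throwaway file; the temporary file and its two rows are made up for illustration and are not part of the NOAA data.

```r
# Write a tiny CSV through a bzip2 connection (illustrative data only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "wt")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HEAT,1"), con)
close(con)

# read.csv opens the path itself; no explicit bzfile() connection is needed.
toy <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
print(toy)
```

Opening the connection explicitly, as done above with `bzfile`, remains a perfectly valid choice and makes the compression format obvious to the reader.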
Extract the entries that I assume to be correct. The standard of “correctness” is that the exponent of property damage (PROPDMGEXP) and the exponent of crop damage (CROPDMGEXP) must each be either empty or one of the following four capital letters: H (hundreds), K (thousands), M (millions), B (billions). I treat any other value as incorrect and discard the record. The number of entries discarded is 370, which is extremely small compared to the total of 902297 entries, so this filtering should not have any major impact on the results.
Then the items of interest, which relate to people’s health (FATALITIES, INJURIES), property damage (PROPDMG) and crop damage (CROPDMG), are extracted.
validExp <- c("", "H", "K", "M", "B")
data <- subset(theData, PROPDMGEXP %in% validExp & CROPDMGEXP %in% validExp)
data <- data[c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
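To make the filtering rule concrete, here is a small sketch on made-up rows showing how the kept and discarded counts could be checked. The toy data frame and the names `valid`, `keep` and `discarded` are illustrative assumptions, not part of the original analysis.

```r
# Toy rows; the real data set has the same two exponent columns.
toy <- data.frame(
  PROPDMGEXP = c("K", "", "m", "B", "+"),
  CROPDMGEXP = c("", "K", "M", "?", "H"),
  stringsAsFactors = FALSE
)

# Accepted exponent codes: empty, or one of four capital letters.
valid <- c("", "H", "K", "M", "B")

# A row is kept only if BOTH exponent columns hold an accepted code.
keep <- toy$PROPDMGEXP %in% valid & toy$CROPDMGEXP %in% valid
kept <- toy[keep, ]
discarded <- sum(!keep)

print(nrow(kept))
print(discarded)
```

On the real data, comparing `discarded` with the total row count is one way to confirm that only a small fraction of records is dropped.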
Transform the exponent letters (“H”, “K”, “M”, “B”) into the numbers 100, 1000, 1000000 and 1000000000, and multiply the property damage and crop damage by these numbers to get damage amounts in US dollars that can be used for ranking.
data$PROPDMGEXP <- sub("", "1", as.character(data$PROPDMGEXP))
data$PROPDMGEXP <- sub("1H", "100", as.character(data$PROPDMGEXP))
data$PROPDMGEXP <- sub("1K", "1000", as.character(data$PROPDMGEXP))
data$PROPDMGEXP <- sub("1M", "1000000", as.character(data$PROPDMGEXP))
data$PROPDMGEXP <- sub("1B", "1000000000", as.character(data$PROPDMGEXP))
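data$CROPDMGEXP <- sub("", "1", as.character(data$CROPDMGEXP))
# "H" is accepted by the filter above, so it needs a multiplier here too
data$CROPDMGEXP <- sub("1H", "100", as.character(data$CROPDMGEXP))
data$CROPDMGEXP <- sub("1K", "1000", as.character(data$CROPDMGEXP))
data$CROPDMGEXP <- sub("1M", "1000000", as.character(data$CROPDMGEXP))
data$CROPDMGEXP <- sub("1B", "1000000000", as.character(data$CROPDMGEXP))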
PROPDMGCASH <- data$PROPDMG * as.numeric(data$PROPDMGEXP)
CROPDMGCASH <- data$CROPDMG * as.numeric(data$CROPDMGEXP)
data <- subset(data, select = c(-PROPDMG, -PROPDMGEXP, -CROPDMG, -CROPDMGEXP))
data = data.frame(data, PROPDMGCASH = PROPDMGCASH, CROPDMGCASH = CROPDMGCASH)
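The same conversion can also be expressed with a named lookup vector instead of chained `sub` calls. This is an alternative sketch, not the method used above; `multiplier` and `to_dollars` are hypothetical names introduced only for this example.

```r
# Multiplier for each accepted exponent letter.
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert an amount and its exponent code into US dollars.
# An empty exponent means the amount is already in dollars.
to_dollars <- function(amount, exponent) {
  exponent <- as.character(exponent)
  scale_by <- unname(multiplier[exponent])  # NA for "" and unknown codes
  scale_by[exponent == ""] <- 1
  amount * scale_by
}

dollars <- to_dollars(c(25, 2.5, 10, 1, 3), c("K", "M", "", "B", "H"))
print(dollars)
```

A lookup vector keeps the mapping in one place, and any unexpected code surfaces as `NA` rather than being silently mis-scaled.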
For the four items, fatalities (FATALITIES), injuries (INJURIES), property damage in dollars (PROPDMGCASH) and crop damage in dollars (CROPDMGCASH), calculate the total and the average for each type of Storm Event.
totalFatalities <- ddply(data,.(EVTYPE),summarize,TotalFatalities=sum(FATALITIES))
averageFatalities <- ddply(data,.(EVTYPE),summarize,AverageFatalities=mean(FATALITIES))
totalInjuries <- ddply(data,.(EVTYPE),summarize,TotalInjuries=sum(INJURIES))
averageInjuries <- ddply(data,.(EVTYPE),summarize,AverageInjuries=mean(INJURIES))
totalPorpertyDamage <- ddply(data,.(EVTYPE),summarize,TotalPorpertyDamage=sum(PROPDMGCASH))
averagePorpertyDamage <- ddply(data,.(EVTYPE),summarize,AveragePorpertyDamage=mean(PROPDMGCASH))
totalCropsDamage <- ddply(data,.(EVTYPE),summarize,TotalCropsDamage=sum(CROPDMGCASH))
averageCropsDage <- ddply(data,.(EVTYPE),summarize,AverageCropsDamage=mean(CROPDMGCASH))
calData <- merge(totalFatalities, averageFatalities, by = "EVTYPE")
calData <- merge(calData, totalInjuries, by = "EVTYPE")
calData <- merge(calData, averageInjuries, by = "EVTYPE")
calData <- merge(calData, totalPorpertyDamage, by = "EVTYPE")
calData <- merge(calData, averagePorpertyDamage, by = "EVTYPE")
calData <- merge(calData, totalCropsDamage, by = "EVTYPE")
calData <- merge(calData, averageCropsDage, by = "EVTYPE")
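To illustrate what the total and the average mean for one event type, here is a base R sketch on made-up numbers. It uses `tapply` rather than `ddply`, and the toy data and the `summary_tbl` name are assumptions for this example only.

```r
# Made-up fatalities for two event types.
toy <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "HEAT", "HEAT"),
  FATALITIES = c(4, 0, 2, 1, 3)
)

# Total = sum per type, average = mean per type, events = number of records.
total   <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
average <- tapply(toy$FATALITIES, toy$EVTYPE, mean)
events  <- tapply(toy$FATALITIES, toy$EVTYPE, length)

summary_tbl <- data.frame(
  EVTYPE  = names(total),
  Total   = as.vector(total),
  Average = as.vector(average),
  Events  = as.vector(events)
)
print(summary_tbl)
```

Keeping the event count next to the average is useful later, because an average over a single record is just that record's value.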
Then sort these events in descending order to get the top 20 events for each number. There are eight ranking lists overall: most total fatalities, most average fatalities, most total injuries, most average injuries, most total property damage, most average property damage, most total crop damage and most average crop damage for a single type of event.
maximunNumber <- 20
sortedTotalFatalities <- subset(calData[order(calData$TotalFatalities, decreasing = TRUE),], select = c("EVTYPE", "TotalFatalities"))[1:maximunNumber,]
sortedAverageFatalities <- subset(calData[order(calData$AverageFatalities, decreasing = TRUE),], select = c("EVTYPE", "AverageFatalities"))[1:maximunNumber,]
sortedTotalInjuries <- subset(calData[order(calData$TotalInjuries, decreasing = TRUE),], select = c("EVTYPE", "TotalInjuries"))[1:maximunNumber,]
sortedAverageInjuries <- subset(calData[order(calData$AverageInjuries, decreasing = TRUE),], select = c("EVTYPE", "AverageInjuries"))[1:maximunNumber,]
sortedTotalPorpertyDamage <- subset(calData[order(calData$TotalPorpertyDamage, decreasing = TRUE),], select = c("EVTYPE", "TotalPorpertyDamage"))[1:maximunNumber,]
sortedAveragePorpertyDamage <- subset(calData[order(calData$AveragePorpertyDamage, decreasing = TRUE),], select = c("EVTYPE", "AveragePorpertyDamage"))[1:maximunNumber,]
sortedTotalCropsDamage <- subset(calData[order(calData$TotalCropsDamage, decreasing = TRUE),], select = c("EVTYPE", "TotalCropsDamage"))[1:maximunNumber,]
sortedAverageCropsDamage <- subset(calData[order(calData$AverageCropsDamage, decreasing = TRUE),], select = c("EVTYPE", "AverageCropsDamage"))[1:maximunNumber,]
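The eight rankings follow the same pattern, so it could be wrapped in a small helper. This is a sketch only; `top_by` is a hypothetical function name and the toy table is made up.

```r
# Return the n rows with the largest values in `column`,
# keeping only the event type and that column.
top_by <- function(df, column, n = 20) {
  ranked <- df[order(df[[column]], decreasing = TRUE), c("EVTYPE", column)]
  head(ranked, n)  # head() does not pad with NA rows if fewer than n exist
}

toy <- data.frame(
  EVTYPE = c("A", "B", "C", "D"),
  TotalFatalities = c(5, 40, 12, 0),
  TotalInjuries = c(9, 1, 30, 2)
)

top_fatal <- top_by(toy, "TotalFatalities", n = 2)
top_injury <- top_by(toy, "TotalInjuries", n = 10)
print(top_fatal)
```

Using `head` instead of `[1:n, ]` matters only when a table has fewer than `n` rows, in which case indexing would append rows of `NA`.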
par(mfrow = c(2,2))
par(mar = c(5,6,4,4)+0.1)
barplot(sortedTotalFatalities$TotalFatalities[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedTotalFatalities$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Total Fatalities Storm Event Type", xlab = "Different Events", ylab = "Direct Death Count",args.legend = list(ncol = 2, cex=3), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedTotalFatalities$TotalFatalities[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedTotalFatalities$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Total Fatalities Storm Event Type", xlab = "Different Events Type", ylab = "Direct Death Count",args.legend = list(ncol = 2, cex=2),cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAverageFatalities$AverageFatalities[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedAverageFatalities$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Average Fatalities Storm Event Type", xlab = "Different Events", ylab = "Direct Average Death",args.legend = list(ncol = 2, cex=2),cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAverageFatalities$AverageFatalities[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedAverageFatalities$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Average Fatalities Storm Event Type", xlab = "Different Events Type", ylab = "Direct Average Death",args.legend = list(ncol = 2, cex=2),cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
From the figure, tornadoes kill the most people (5630 deaths). However, the database contains 60625 tornado events, so this large total partly reflects how frequently tornadoes are recorded. Second and fourth place belong to excessive heat (1903) and heat (937). It is worth noting that both are high-temperature events.
Then the average fatalities are examined carefully. The first, second and third events are “TORNADOES, TSTM WIND, HAIL”, “COLD AND SNOW” and “TROPICAL STORM GORDON”. However, each of them happened only once; they are unique events that took many lives rather than true averages. Next are “RECORD/EXCESSIVE HEAT” (3 times), “EXTREME HEAT” (1678 times), and other high-temperature events in the top 20. This suggests that recorded high-temperature events are consistently deadly.
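One way to separate single-occurrence events from true averages is to count the records per event type before ranking. The sketch below uses made-up data; the `min_events` threshold of 2 is an arbitrary assumption for illustration, not a value used in this analysis.

```r
# Made-up records: one event type occurs once, the others repeat.
toy <- data.frame(
  EVTYPE = c("COLD AND SNOW", "EXTREME HEAT", "EXTREME HEAT",
             "EXTREME HEAT", "TORNADO", "TORNADO"),
  FATALITIES = c(14, 2, 0, 4, 1, 0)
)

counts <- table(toy$EVTYPE)                          # records per type
avg <- tapply(toy$FATALITIES, toy$EVTYPE, mean)      # mean fatalities per type

# Keep only types recorded at least `min_events` times, then rank them.
min_events <- 2
repeated <- names(counts)[counts >= min_events]
ranked <- avg[repeated]
ranked <- ranked[order(ranked, decreasing = TRUE)]
print(ranked)
```

Here “COLD AND SNOW” has the highest average but is excluded because it rests on a single record.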
par(mfrow = c(2,2))
par(mar = c(5,6,4,4)+0.1)
barplot(sortedTotalInjuries$TotalInjuries[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedTotalInjuries$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Total Injuries Storm Event Type", xlab = "Different Events", ylab = "Injury Count",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedTotalInjuries$TotalInjuries[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedTotalInjuries$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Total Injuries Storm Event Type", xlab = "Different Events Type", ylab = "Injury Count",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAverageInjuries$AverageInjuries[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedAverageInjuries$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Average Injuries Storm Event Type", xlab = "Different Events", ylab = "Average Injury",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAverageInjuries$AverageInjuries[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedAverageInjuries$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Average Injuries Storm Event Type", xlab = "Different Events Type", ylab = "Average Injury",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
Tornado events cause the most total injuries. The total for tornadoes is far greater than for the next event type: 91285 injuries versus 6957. Again, it is interesting to look at the average injuries. First is the event “Heat Wave”; note the lowercase letters in this type. It is actually “HEAT WAVE” mistyped as “Heat Wave”, and it is a single event. Many entries in the top 20 average injuries are single events rather than averages over many events. The types that are true averages include “HURRICANE/TYPHOON”, “EXTREME HEAT”, “HEAT WAVE” and so on. This result broadly agrees with the average fatality ranking: high temperatures are very likely to hurt people.
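Case differences such as “Heat Wave” versus “HEAT WAVE” could be merged by normalizing the event type before grouping. This step is not part of the analysis above; it is a sketch of one possible clean-up, on made-up labels.

```r
# Made-up labels with inconsistent case and stray whitespace.
raw <- c("HEAT WAVE", "Heat Wave", " heat wave ", "TORNADO")

# Trim surrounding whitespace, then convert to upper case.
clean <- toupper(trimws(raw))

label_counts <- table(clean)
print(label_counts)
```

After this step the three heat wave spellings count as one event type, so their average would be computed over all three records.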
par(mfrow = c(2,2))
par(mar = c(5,6,4,4)+0.1)
barplot(sortedTotalPorpertyDamage$TotalPorpertyDamage[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedTotalPorpertyDamage$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Total Property Damage Storm Event Type", xlab = "Different Events", ylab = "Total Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedTotalPorpertyDamage$TotalPorpertyDamage[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedTotalPorpertyDamage$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Total Property Damage Storm Event Type", xlab = "Different Events Type", ylab = "Total Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAveragePorpertyDamage$AveragePorpertyDamage[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedAveragePorpertyDamage$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Average Property Damage Storm Event Type", xlab = "Different Events", ylab = "Average Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAveragePorpertyDamage$AveragePorpertyDamage[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedAveragePorpertyDamage$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Average Property Damage Storm Event Type", xlab = "Different Events Type", ylab = "Average Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
The Storm Event with the greatest total economic consequences is flood. The total property damage from floods is about 144.6 billion US dollars. It is also worth noting that the similar events “FLASH FLOOD” and “RIVER FLOOD” also appear in the ranking list. So among all these events, flooding causes the greatest loss of property.
I also checked the average property damage ranking list. Again, it contains many single events. However, several hurricane-related events such as “HURRICANE/TYPHOON”, “HURRICANE OPAL”, “STORM SURGE”, “HURRICANE” and “TYPHOON” are not single events. These kinds of Storm Events tend to cause property damage every time they occur.
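Since several related labels (for example the flood and hurricane variants) appear separately in the rankings, one could also sum damage over keyword-based groups. The sketch below is an illustration on made-up amounts; the grouping rules and the `group` labels are assumptions, not categories defined by NOAA or used in this analysis.

```r
# Made-up damage amounts for a few event labels.
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLASH FLOOD", "RIVER FLOOD",
             "HURRICANE/TYPHOON", "HURRICANE OPAL", "HAIL"),
  PROPDMGCASH = c(100, 20, 5, 60, 3, 1)
)

# Assign each label to a coarse group by keyword (illustrative rules only).
group <- ifelse(grepl("FLOOD", toy$EVTYPE), "FLOOD",
         ifelse(grepl("HURRICANE|TYPHOON", toy$EVTYPE), "HURRICANE", "OTHER"))

# Total damage per group.
grouped_total <- tapply(toy$PROPDMGCASH, group, sum)
print(grouped_total)
```

Keyword grouping is coarse, and a label that matches two keywords falls into whichever rule is tested first, so the rules would need review before being applied to the full data set.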
par(mfrow = c(2,2))
par(mar = c(5,6,4,4)+0.1)
barplot(sortedTotalCropsDamage$TotalCropsDamage[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedTotalCropsDamage$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Total Crops Damage Storm Event Type", xlab = "Different Events", ylab = "Total Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedTotalCropsDamage$TotalCropsDamage[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedTotalCropsDamage$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Total Crops Damage Storm Event Type", xlab = "Different Events Type", ylab = "Total Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAverageCropsDamage$AverageCropsDamage[1:round(maximunNumber/2)], col = 1:round(maximunNumber/2),legend.text = sortedAverageCropsDamage$EVTYPE[1:round(maximunNumber/2)], main = "Top 10 Average Crops Damage Storm Event Type", xlab = "Different Events", ylab = "Average Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
barplot(sortedAverageCropsDamage$AverageCropsDamage[(round(maximunNumber/2)+1):maximunNumber], col = (round(maximunNumber/2)+1):maximunNumber,legend.text = sortedAverageCropsDamage$EVTYPE[(round(maximunNumber/2)+1):maximunNumber], main = "Top 11-20 Average Crops Damage Storm Event Type", xlab = "Different Events Type", ylab = "Average Damage (US dollar)",args.legend = list(ncol = 2, cex=2), cex.axis = 2, cex.names = 2, cex.lab=3, cex.main=3)
As we might expect, drought has the greatest impact on agriculture. Drought events caused a total loss of 14.0 billion US dollars in crops. Flood-related events (“FLOOD”, “RIVER FLOOD” and “FLASH FLOOD”) and hurricane-related events (“HURRICANE”, “HURRICANE/TYPHOON” and “TROPICAL STORM”) are all in the top 20 ranking list for total crop damage.
As usual, there are some single events in the average crop damage list. However, several flood and hurricane event types that are not unique cases are also on the list, which suggests that floods and hurricanes typically cause crop damage. Drought, which ranks first in total crop damage, is also on the list. Drought does have a major impact on agriculture.