This is an analysis of the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which records major storms and weather events in the United States from 1950 to November 2011. The aim is to identify which event types have caused the most harm to human life, property and crops.
The investigation finds that hurricanes, floods and storms have damaged property and crops the most, with drought the second most destructive. Deaths and injuries have been caused most by tornadoes and then excessive heat, with flood the third most harmful event type.
We start by reading the file into a data frame, 'stormd'. Next we load the data.table package, convert 'stormd' to a data table and set the column 'REFNUM' as its key.
Since we are only analysing the economic and human cost of the events, we can drop all rows that report no fatalities, no injuries and no property or crop damage.
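# read.csv() can read the bzip2-compressed file directly
stormd <- read.csv("repdata-data-StormData.csv.bz2")
library(data.table)
# Convert to a data.table keyed (and sorted) by the record ID
stormd <- data.table(stormd)
setkey(stormd, REFNUM)
# Keep only events with at least one recorded human or economic impact
stormd <- stormd[FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0]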
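The filter keeps a row if any one of the four impact columns is positive. Here is a minimal base-R sketch of the same rule applied to a small made-up data frame (the values are illustrative only, not taken from the NOAA file):

```r
# Toy data frame with the same impact columns as the storm data (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(0, 0, 2, 0),
  INJURIES   = c(5, 0, 0, 0),
  PROPDMG    = c(0, 0, 0, 0),
  CROPDMG    = c(0, 0, 0, 1.5),
  stringsAsFactors = FALSE
)

# Logical matrix of "impact > 0"; a row is kept when at least one entry is TRUE
impact_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
has_impact  <- rowSums(toy[impact_cols] > 0) > 0
toy_impact  <- toy[has_impact, ]
toy_impact$EVTYPE
```

Only the HAIL row, with zeros everywhere, is dropped.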
Event types have not been recorded according to a consistent naming scheme. As a result, the same type of event often appears under different names, which leaves the 'EVTYPE' column somewhat jumbled.
For instance, thunderstorm winds have been recorded separately as 'THUNDERSTORM WIND', 'THUNDERSTORM WINDS' and 'TSTM WIND'.
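This analysis works with the raw labels, but one way to merge such variants is a set of regular-expression rewrites. Below is a minimal sketch with a hypothetical helper, `normalise_evtype()`, whose rules cover only the thunderstorm-wind spellings above. A full clean-up would need many more rules.

```r
# Hypothetical helper: map spelling variants of an event type to one label.
# Only the thunderstorm-wind variants are handled here.
normalise_evtype <- function(x) {
  x <- toupper(trimws(x))                                  # unify case, strip spaces
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)  # expand the abbreviation
  x <- gsub("\\bWINDS\\b", "WIND", x, perl = TRUE)         # singularise WINDS
  x
}

raw_labels <- c("Thunderstorm Wind", "Thunderstorm Winds", "TSTM Wind", "TORNADO")
normalise_evtype(raw_labels)
```

All three thunderstorm spellings collapse to "THUNDERSTORM WIND", while "TORNADO" is left unchanged.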
Let's look at the 15 most frequent events to get a sense of the data. To count the occurrences of each 'EVTYPE', we add a 'count' column to 'stormd' and sum it grouped by 'EVTYPE'. The resulting data table is sorted in descending order and the first 15 rows are kept.
Before plotting, we create a palette of colours, 'cpal'.
Using the ggplot2 package, we then plot the bar graph.
# Add a helper column of 1s so that summing it counts records
stormd[, count := 1]
# Number of records per event type, most frequent first
sumeve <- stormd[, list(sum = sum(count)), by = EVTYPE]
sumeve <- sumeve[order(sum, decreasing = TRUE)]
sumeve_t15 <- sumeve[1:15]
# 20 colours, repeated so that plots with many event types never run out
cpal <- c("#6B1D00", "#912700", "#593D9B", "#422091", "#087945", "#139259", "#AF970B", "#E3CC48", "#004A28", "#E37248", "#D4B91C", "#AF370B", "#240963", "#321378", "#D44E1C", "#917C00", "#006437", "#190449", "#6B5C00", "#329D6C")
cpal <- rep(cpal, 10)
library(ggplot2)
library(grid)  # unit(), used for the plot margins further down
ggplot(sumeve_t15, aes(x = EVTYPE, y = sum)) +
  geom_col(aes(fill = EVTYPE)) +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5)) +
  guides(fill = "none") +
  labs(x = "Event Type", y = "Occurrence", title = "15 Most Frequent Events") +
  scale_fill_manual(values = cpal)
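The helper column plus a grouped sum is one way to tally records per event type. In data.table, `stormd[, .N, by = EVTYPE]` would give the same tally without the helper column. For comparison, here is a base-R sketch of the same top-n count using `table()` on a small illustrative vector (made-up labels, not the NOAA data):

```r
# Illustrative vector of event labels (made up)
ev <- c("HAIL", "TORNADO", "HAIL", "FLOOD", "HAIL", "TORNADO")

# table() counts occurrences; sort() puts the most frequent first
counts <- sort(table(ev), decreasing = TRUE)
top2 <- counts[1:2]   # keep the two most frequent labels
top2
```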
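Thunderstorm winds are the most frequently recorded severe weather events: 'TSTM WIND', 'THUNDERSTORM WIND' and 'THUNDERSTORM WINDS' together account for 118,975 records. We also see associated events such as flash flood, lightning, flood and heavy rain, which belong to the wet weather group.
Tornadoes are the second most frequent event type once the thunderstorm wind variants are combined.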
sumeve_t15
## EVTYPE sum
## 1: TSTM WIND 63234
## 2: THUNDERSTORM WIND 43655
## 3: TORNADO 39944
## 4: HAIL 26130
## 5: FLASH FLOOD 20967
## 6: LIGHTNING 13293
## 7: THUNDERSTORM WINDS 12086
## 8: FLOOD 10175
## 9: HIGH WIND 5522
## 10: STRONG WIND 3370
## 11: WINTER STORM 1508
## 12: HEAVY SNOW 1342
## 13: HEAVY RAIN 1105
## 14: WILDFIRE 857
## 15: ICE STORM 708
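To check the ranking once spelling variants are merged, we can re-aggregate the printed counts. This is a small base-R sketch using the first eight rows of the `sumeve_t15` output above. The combined group label is just an illustrative name.

```r
# Counts copied from the first eight rows of the sumeve_t15 output above
top <- c(
  "TSTM WIND" = 63234, "THUNDERSTORM WIND" = 43655, "TORNADO" = 39944,
  "HAIL" = 26130, "FLASH FLOOD" = 20967, "LIGHTNING" = 13293,
  "THUNDERSTORM WINDS" = 12086, "FLOOD" = 10175
)

# Collapse the three thunderstorm-wind spellings into one group
tstm_variants <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS")
group <- ifelse(names(top) %in% tstm_variants,
                "THUNDERSTORM WIND (all variants)", names(top))

# Sum within each group and sort, largest first
grouped <- sort(vapply(split(top, group), sum, numeric(1)), decreasing = TRUE)
grouped
```

After grouping, thunderstorm winds total 118,975 records and TORNADO moves up to second place.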
Deaths and injuries are recorded in the 'FATALITIES' and 'INJURIES' columns. To identify the most life-threatening events, let's look at the event types behind the 40 records with the most fatalities and the 40 records with the most injuries.
We sort the data by fatalities in descending order and keep the top 40 rows, then do the same for injuries. Next we combine both subsets into the data table 'fatinj'.
Using the ggplot2 package, we plot a stacked bar chart in which each segment shows how many of these 80 records belong to one event type.
# 40 records with the highest death tolls
stormd <- stormd[order(FATALITIES, decreasing = TRUE)]
top_fat <- stormd[1:40]
# 40 records with the highest injury counts
stormd <- stormd[order(INJURIES, decreasing = TRUE)]
top_inj <- stormd[1:40]
# Stack both lists into one table
fatinj <- rbind(top_fat, top_inj)
# Single stacked bar: each segment is the number of records of one event type
p1 <- ggplot(fatinj, aes(x = factor(1), fill = factor(EVTYPE))) + geom_bar(width = 1) +
  labs(x = "Event Type", y = "Count", title = "The 80 most fatal and injurious events") +
  scale_fill_manual(values = cpal) + guides(fill = guide_legend(title = "Event Type"))
p1
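Because the two top-40 lists are stacked with `rbind()`, a record that ranks high on both fatalities and injuries is counted twice in 'fatinj'. The following small base-R sketch uses made-up data to pick the top n records by each column and check how many appear in both lists.

```r
# Made-up records; REFNUM identifies each row as in the storm data
toy <- data.frame(
  REFNUM     = 1:6,
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(50, 30, 5, 2, 1, 0),
  INJURIES   = c(900, 10, 300, 1, 0, 200),
  stringsAsFactors = FALSE
)

n <- 3
top_fat <- head(toy[order(-toy$FATALITIES), ], n)  # n highest death tolls
top_inj <- head(toy[order(-toy$INJURIES), ], n)    # n highest injury counts

# Record IDs present in both lists would be double counted after rbind()
both <- intersect(top_fat$REFNUM, top_inj$REFNUM)
both
```

If double counting is not wanted, one option is to deduplicate on REFNUM before tallying event types.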
Tornadoes (49 of the 80 records) and excessive heat (13) are the two biggest causes of fatalities and injuries, followed by flood (8).
fatinj.l <- fatinj[,list(Count=sum(count)), by=EVTYPE]
fatinj.l[order(Count, decreasing=TRUE)]
## EVTYPE Count
## 1: TORNADO 49
## 2: EXCESSIVE HEAT 13
## 3: FLOOD 8
## 4: HEAT 2
## 5: HURRICANE/TYPHOON 2
## 6: EXTREME HEAT 1
## 7: HEAT WAVE 1
## 8: TSUNAMI 1
## 9: UNSEASONABLY WARM AND DRY 1
## 10: ICE STORM 1
## 11: BLIZZARD 1
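To express these counts as shares of the 80 stacked records, we can divide by their total. This base-R sketch uses the counts copied from the `fatinj.l` output above. The heat total simply sums every label containing "HEAT", which is one possible grouping.

```r
# Counts copied from the fatinj.l output above
fat_counts <- c(
  "TORNADO" = 49, "EXCESSIVE HEAT" = 13, "FLOOD" = 8, "HEAT" = 2,
  "HURRICANE/TYPHOON" = 2, "EXTREME HEAT" = 1, "HEAT WAVE" = 1, "TSUNAMI" = 1,
  "UNSEASONABLY WARM AND DRY" = 1, "ICE STORM" = 1, "BLIZZARD" = 1
)

total  <- sum(fat_counts)                 # all 80 stacked records
shares <- 100 * fat_counts / total        # percentage of the 80 records

# Combined count of every label that mentions HEAT
heat <- sum(fat_counts[grepl("HEAT", names(fat_counts))])

shares[1:3]
heat
```

Tornadoes make up 61.25% of the records. The four heat labels together account for 17 records.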
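Next, let's look at the events that caused the most property and crop damage. Each damage figure is stored as an amount ('PROPDMG', 'CROPDMG') plus an exponent code ('PROPDMGEXP', 'CROPDMGEXP'). We keep the records with damage in billions ("B") or millions ("M") and sort them by exponent code and then by amount in descending order, so damages in the billions come first, followed by millions. We keep the top 40 records for property damage and the top 40 for crop damage, and combine both into 'prpcro'.
Using the ggplot2 package, we plot another stacked bar chart.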
# Property damage recorded in billions (B) or millions (M)
top_prp <- stormd[PROPDMGEXP %in% c("B", "M")]
# "B" sorts before "M", so billions come first, each sorted by amount, largest first
top_prp <- top_prp[order(PROPDMGEXP, -PROPDMG)]
top_prp <- top_prp[1:40]
# Same selection for crop damage
top_cro <- stormd[CROPDMGEXP %in% c("B", "M")]
top_cro <- top_cro[order(CROPDMGEXP, -CROPDMG)]
top_cro <- top_cro[1:40]
prpcro <- rbind(top_prp, top_cro)
# Single stacked bar; extra margins leave room for the long legend
p2 <- ggplot(prpcro, aes(x = factor(1), fill = factor(EVTYPE))) + geom_bar(width = 1) +
  labs(x = "Event Type", y = "Count", title = "The 80 most property and crop damaging events") +
  scale_fill_manual(values = cpal) + guides(fill = guide_legend(title = "Event Type")) +
  theme(plot.margin = unit(c(3, 2, 3, 2), "lines"), legend.key.height = unit(0.7, "lines"))
p2
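Sorting by exponent code first and amount second puts every "B" record ahead of every "M" record. That matches a ranking by dollar value only if every billion-level amount exceeds every million-level one. Here is a sketch that converts each amount to dollars before ranking. It assumes the codes mean K = thousand, M = million and B = billion, and gives any other code a multiplier of 1. The helper `to_dollars()` and the toy values are hypothetical.

```r
# Hypothetical helper: turn an amount plus exponent code into dollars.
# Assumes K/M/B mean thousand/million/billion; any other code gets multiplier 1.
to_dollars <- function(amount, exp_code) {
  mult <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- mult[toupper(exp_code)]
  m[is.na(m)] <- 1
  amount * unname(m)
}

# Made-up records: 0.1 B is 100 million, which is less than 900 M
amount   <- c(0.1, 900, 2.5)
exp_code <- c("B", "M", "B")
dollars  <- to_dollars(amount, exp_code)

order(exp_code, -amount)   # exponent first, then amount: 3 1 2
order(-dollars)            # by dollar value:             3 2 1
```

With these toy values, the two orderings disagree on whether the 0.1 B record or the 900 M record ranks higher.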
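Taken together, hurricanes, storms and floods have caused the most damage. The four hurricane labels alone (HURRICANE/TYPHOON, HURRICANE, HURRICANE OPAL and HURRICANE OPAL/HIGH WINDS) account for 25 of the 80 records. Drought comes next and is the most frequent single label (18 records). Tornadoes also feature in the list (3 records), though it is mainly wet event types that have been the most harmful.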
prpcro.l <- prpcro[,list(Count=sum(count)), by=EVTYPE]
prpcro.l[order(Count, decreasing=TRUE)]
## EVTYPE Count
## 1: DROUGHT 18
## 2: HURRICANE/TYPHOON 15
## 3: FLOOD 10
## 4: HURRICANE 7
## 5: TORNADO 3
## 6: STORM SURGE 2
## 7: TROPICAL STORM 2
## 8: RIVER FLOOD 2
## 9: HURRICANE OPAL 2
## 10: EXTREME COLD 2
## 11: WINTER STORM 1
## 12: STORM SURGE/TIDE 1
## 13: HEAVY RAIN/SEVERE WEATHER 1
## 14: HAIL 1
## 15: TORNADOES, TSTM WIND, HAIL 1
## 16: WILD/FOREST FIRE 1
## 17: HIGH WIND 1
## 18: SEVERE THUNDERSTORM 1
## 19: WILDFIRE 1
## 20: FLASH FLOOD 1
## 21: HURRICANE OPAL/HIGH WINDS 1
## 22: ICE STORM 1
## 23: HEAT 1
## 24: FREEZE 1
## 25: EXCESSIVE HEAT 1
## 26: FROST/FREEZE 1
## 27: DAMAGING FREEZE 1
## EVTYPE Count
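The grouped totals quoted above can be reproduced from the printed counts by summing every label that matches a pattern. Here is a base-R sketch using the `prpcro.l` output. The pattern-based grouping is a simple illustrative choice, not a complete clean-up of the labels.

```r
# Counts copied from the prpcro.l output above
dmg_counts <- c(
  "DROUGHT" = 18, "HURRICANE/TYPHOON" = 15, "FLOOD" = 10, "HURRICANE" = 7,
  "TORNADO" = 3, "STORM SURGE" = 2, "TROPICAL STORM" = 2, "RIVER FLOOD" = 2,
  "HURRICANE OPAL" = 2, "EXTREME COLD" = 2, "WINTER STORM" = 1,
  "STORM SURGE/TIDE" = 1, "HEAVY RAIN/SEVERE WEATHER" = 1, "HAIL" = 1,
  "TORNADOES, TSTM WIND, HAIL" = 1, "WILD/FOREST FIRE" = 1, "HIGH WIND" = 1,
  "SEVERE THUNDERSTORM" = 1, "WILDFIRE" = 1, "FLASH FLOOD" = 1,
  "HURRICANE OPAL/HIGH WINDS" = 1, "ICE STORM" = 1, "HEAT" = 1, "FREEZE" = 1,
  "EXCESSIVE HEAT" = 1, "FROST/FREEZE" = 1, "DAMAGING FREEZE" = 1
)

# Sum the counts of every label whose name matches a regular expression
group_total <- function(pattern) sum(dmg_counts[grepl(pattern, names(dmg_counts))])

c(
  hurricane = group_total("HURRICANE"),
  drought   = group_total("DROUGHT"),
  flood     = group_total("FLOOD")
)
```

This gives 25 hurricane records, 18 drought records and 13 flood records out of 80.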