This report analyzes the damage caused by storms in the United States from 1950 to 2011. Damage is considered in two ways: harm to population health and damage to the economy. Harm to health counts both deaths and injuries. Economic damage covers damage to crops and property.
Load the packages and the data.
library(data.table)
library(ggplot2)
# Read the bzip2-compressed storm data CSV and convert it to a data.table
data <- data.table(read.csv(bzfile("StormData.csv.bz2")))
Use the “data.table” package to find the 10 weather event types that caused the most deaths and injuries in the USA.
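# Total injuries and fatalities per event type (missing values ignored)
dataHealth <- data[, lapply(.SD, sum, na.rm = TRUE), by = EVTYPE, .SDcols = c("INJURIES", "FATALITIES")]
# Combined number of people harmed (modified in place by :=)
dataHealth[, ALL := INJURIES + FATALITIES]
# Keep event types that harmed anyone, ranked from most to least harmful
dataHealth <- dataHealth[ALL > 0][order(-ALL)]
# Top 10 event types
dataHealth <- head(dataHealth, 10)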
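For readers less familiar with data.table, the group, sum, filter and rank steps above can be sketched in base R with `aggregate()` and `order()`. The small data frame `toy` is made-up input (not from the storm data), used only to show what each step does.

```r
# Toy input: hypothetical rows, not taken from StormData.csv.bz2
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  INJURIES   = c(10, 0, 5, 2, NA),
  FATALITIES = c(1, 0, 2, 1, 0),
  stringsAsFactors = FALSE
)

# Sum injuries and fatalities per event type, ignoring missing values
# (mirrors lapply(.SD, sum, na.rm = TRUE) with by = EVTYPE)
toyHealth <- aggregate(toy[, c("INJURIES", "FATALITIES")],
                       by = list(EVTYPE = toy$EVTYPE),
                       FUN = sum, na.rm = TRUE)

# Combined count of people harmed
toyHealth$ALL <- toyHealth$INJURIES + toyHealth$FATALITIES

# Drop event types that harmed nobody, then rank from most to least harmful
toyHealth <- toyHealth[toyHealth$ALL > 0, ]
toyHealth <- toyHealth[order(toyHealth$ALL, decreasing = TRUE), ]

# Keep at most 10 rows (the toy data has fewer)
toyHealth <- head(toyHealth, 10)
print(toyHealth)
```

HAIL drops out because its only injury value is missing and its fatalities sum to zero, leaving TORNADO ahead of FLOOD.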
Use the “data.table” package to find the 10 weather event types that caused the most economic damage to crops and property in the USA.
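# Sum the recorded property (PROPDMG) and crop (CROPDMG) damage values per event type
dataEc <- data[, lapply(.SD, sum, na.rm = TRUE), by = EVTYPE, .SDcols = c("PROPDMG", "CROPDMG")]
# Combined damage (modified in place by :=)
dataEc[, ALL := PROPDMG + CROPDMG]
# Keep event types with any recorded damage, ranked from most to least damaging
dataEc <- dataEc[ALL > 0][order(-ALL)]
# Top 10 event types
dataEc <- head(dataEc, 10)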
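The totals above add up the PROPDMG and CROPDMG values as stored. In the storm data these amounts come with magnitude codes in the PROPDMGEXP and CROPDMGEXP columns ("K" for thousands, "M" for millions, "B" for billions), which the code above does not apply. A minimal sketch of converting them to dollar amounts follows, assuming only those three codes are meaningful and treating every other code as a multiplier of 1. That treatment is a simplifying assumption, and the helper `exp_to_multiplier` is hypothetical.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: only K/M/B (any case) are meaningful; every other code,
# including empty strings and NA, counts as a multiplier of 1.
exp_to_multiplier <- function(code) {
  code <- toupper(trimws(as.character(code)))
  multiplier <- rep(1, length(code))
  # %in% returns FALSE for NA, so missing codes keep the default of 1
  multiplier[code %in% "K"] <- 1e3
  multiplier[code %in% "M"] <- 1e6
  multiplier[code %in% "B"] <- 1e9
  multiplier
}

# Toy example with made-up values (not from StormData.csv.bz2)
damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)
damage$PROPDMG_USD <- damage$PROPDMG * exp_to_multiplier(damage$PROPDMGEXP)
print(damage)
```

Applied to the real data, the scaled columns would replace PROPDMG and CROPDMG in the `lapply(.SD, sum, ...)` step. The resulting ranking may differ from the one shown here.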
Use the “ggplot2” package to visualize the results.
dataHealth
## EVTYPE INJURIES FATALITIES ALL
## 1: TORNADO 91346 5633 96979
## 2: EXCESSIVE HEAT 6525 1903 8428
## 3: TSTM WIND 6957 504 7461
## 4: FLOOD 6789 470 7259
## 5: LIGHTNING 5230 816 6046
## 6: HEAT 2100 937 3037
## 7: FLASH FLOOD 1777 978 2755
## 8: ICE STORM 1975 89 2064
## 9: THUNDERSTORM WIND 1488 133 1621
## 10: WINTER STORM 1321 206 1527
# Fatalities per event type; bars follow the combined (ALL) ranking, highest at the top
ggplot(data = dataHealth, aes(x = EVTYPE, y = FATALITIES)) +
  ggtitle("Number of fatalities") +
  geom_col() +
  scale_x_discrete(limits = rev(dataHealth[, EVTYPE]), name = "Event") +
  coord_flip() +
  scale_y_continuous(name = "Count")
# Injuries per event type; bars follow the combined (ALL) ranking, highest at the top
ggplot(data = dataHealth, aes(x = EVTYPE, y = INJURIES)) +
  ggtitle("Number of injuries") +
  geom_col() +
  scale_x_discrete(limits = rev(dataHealth[, EVTYPE]), name = "Event") +
  coord_flip() +
  scale_y_continuous(name = "Count")
These bar plots show the number of deaths and injuries caused by the 10 most harmful weather event types. Tornadoes are by far the most harmful weather events in the USA. They caused 5,633 deaths and 91,346 injuries, more than any other event type in both categories.
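To put "by far" in numbers, the ALL column printed above can be used directly. The short base-R calculation below uses values copied from the `dataHealth` output. It computes the tornado share of all deaths and injuries among the top 10 event types, and how many times larger the tornado total is than the runner-up.

```r
# Combined deaths + injuries (ALL) for the top 10 event types,
# copied from the dataHealth output above
all_harm <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# Share of the top-10 total attributable to tornadoes
tornado_share <- all_harm[["TORNADO"]] / sum(all_harm)
print(round(100 * tornado_share, 1))  # percent

# Ratio of tornadoes to the second-ranked event type
runner_up_ratio <- all_harm[["TORNADO"]] / all_harm[["EXCESSIVE HEAT"]]
print(round(runner_up_ratio, 1))
```

The top 10 total is 137,177, of which tornadoes account for roughly 70.7%, about 11.5 times the figure for excessive heat.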
dataEc
## EVTYPE PROPDMG CROPDMG ALL
## 1: TORNADO 3212258 100019 3312277
## 2: FLASH FLOOD 1420125 179200 1599325
## 3: TSTM WIND 1335966 109203 1445168
## 4: HAIL 688693 579596 1268290
## 5: FLOOD 899938 168038 1067976
## 6: THUNDERSTORM WIND 876844 66791 943636
## 7: LIGHTNING 603352 3581 606932
## 8: THUNDERSTORM WINDS 446293 18685 464978
## 9: HIGH WIND 324732 17283 342015
## 10: WINTER STORM 132721 1979 134700
# Combined property and crop damage per event type, most damaging at the top
ggplot(data = dataEc, aes(x = EVTYPE, y = ALL)) +
  ggtitle("Economic damage") +
  geom_col() +
  scale_x_discrete(limits = rev(dataEc[, EVTYPE]), name = "Event") +
  coord_flip() +
  scale_y_continuous(name = "Damage (PROPDMG + CROPDMG)")
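This bar plot shows the combined property and crop damage of the 10 most damaging weather event types. Tornadoes again rank first, with a combined total of 3,312,277, about twice that of flash floods (1,599,325). Note that these totals add the PROPDMG and CROPDMG values as stored, without applying the accompanying PROPDMGEXP and CROPDMGEXP magnitude codes, so they are not dollar amounts.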
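Both tables list what appears to be the same phenomenon under several labels: TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS. As a rough robustness check on the economic ranking, the sketch below merges those three labels and re-sums their ALL values, copied from the `dataEc` output. The helper `normalize_evtype` and its two rewrite rules are illustrative assumptions, not a full cleaning of EVTYPE.

```r
# Hypothetical helper: collapse a few spelling variants of thunderstorm wind.
# Assumptions: "TSTM" abbreviates "THUNDERSTORM", and a trailing "WINDS" means "WIND".
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))
  x <- sub("^TSTM ", "THUNDERSTORM ", x)
  x <- sub(" WINDS$", " WIND", x)
  x
}

# ALL values copied from the dataEc output above
ec_all <- c(
  "TORNADO" = 3312277, "FLASH FLOOD" = 1599325, "TSTM WIND" = 1445168,
  "HAIL" = 1268290, "FLOOD" = 1067976, "THUNDERSTORM WIND" = 943636,
  "LIGHTNING" = 606932, "THUNDERSTORM WINDS" = 464978,
  "HIGH WIND" = 342015, "WINTER STORM" = 134700
)

# Re-sum the totals under the normalized labels and rank them again
merged <- sapply(split(ec_all, normalize_evtype(names(ec_all))), sum)
merged <- sort(merged, decreasing = TRUE)
print(merged)
```

Under this merge the thunderstorm wind labels total 2,853,782. That is still below the tornado total, so tornadoes keep first place among these 10 event types.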