This report analyzes the Storm Data set provided by the National Oceanic and Atmospheric Administration (NOAA) with respect to the impact of weather events on population health and the economic damage they cause across the US. Basic data exploration reveals data quality issues before 1993, so the earlier years are excluded from the analysis. The event types with the highest total numbers of fatalities and injuries are identified. Similarly, the damage to property and crops is extracted for each event type.
The data are processed in the following steps:
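library(ggplot2)
# reshape2 provides dcast() for the aggregations below
library(reshape2)
# Source file and local download location
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
data_dir <- "data"
destfile <- "FStormData.csv.bz2"
if (!dir.exists(data_dir)) {
  dir.create(data_dir)
}
dest_path_file <- file.path(data_dir, destfile)
# Download only once; binary mode keeps the bz2 archive intact on Windows
if (!file.exists(dest_path_file)) {
  download.file(url, dest_path_file, mode = "wb")
}
# read.csv() decompresses .bz2 files transparently
FStorm <- read.csv(dest_path_file)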
# Combined damage as the plain sum of the PROPDMG and CROPDMG values
FStorm$TOTDMG <- FStorm$PROPDMG + FStorm$CROPDMG
# Extract the year of each event from its begin date
FStorm$YEAR <- format(as.POSIXlt(as.character(FStorm$BGN_DATE), format="%m/%d/%Y %H:%M:%S"), "%Y")
# Yearly totals over the full period, used for the exploratory plots
FAT_by_YEAR <- dcast(FStorm, YEAR ~ . , fun.aggregate=sum, value.var = "FATALITIES")
INJ_by_YEAR <- dcast(FStorm, YEAR ~ . , fun.aggregate=sum, value.var = "INJURIES")
PROP_by_YEAR <- dcast(FStorm, YEAR ~ . , fun.aggregate=sum, value.var = "PROPDMG")
CROP_by_YEAR <- dcast(FStorm, YEAR ~ . , fun.aggregate=sum, value.var = "CROPDMG")
# Restrict the analysis to the years 1993 to 2011
select_years <- 1993:2011
FStorm <- FStorm[FStorm$YEAR %in% select_years,]
# Totals per event type
FAT_by_EVTYPE <- dcast(FStorm, EVTYPE ~ . , fun.aggregate=sum, value.var = "FATALITIES")
INJ_by_EVTYPE <- dcast(FStorm, EVTYPE ~ . , fun.aggregate=sum, value.var = "INJURIES")
PROP_by_EVTYPE <- dcast(FStorm, EVTYPE ~ . , fun.aggregate=sum, value.var = "PROPDMG")
CROP_by_EVTYPE <- dcast(FStorm, EVTYPE ~ . , fun.aggregate=sum, value.var = "CROPDMG")
TOT_by_EVTYPE <- dcast(FStorm, EVTYPE ~ . , fun.aggregate=sum, value.var = "TOTDMG")
# Row order of each table by descending total
FAT_ranked <- order(FAT_by_EVTYPE$., decreasing = TRUE)
INJ_ranked <- order(INJ_by_EVTYPE$., decreasing = TRUE)
PROP_ranked <- order(PROP_by_EVTYPE$., decreasing = TRUE)
CROP_ranked <- order(CROP_by_EVTYPE$., decreasing = TRUE)
TOT_ranked <- order(TOT_by_EVTYPE$., decreasing = TRUE)
# Reorder the tables accordingly
FAT_ranked <- FAT_by_EVTYPE[FAT_ranked,]
INJ_ranked <- INJ_by_EVTYPE[INJ_ranked,]
PROP_ranked <- PROP_by_EVTYPE[PROP_ranked,]
CROP_ranked <- CROP_by_EVTYPE[CROP_ranked,]
TOT_ranked <- TOT_by_EVTYPE[TOT_ranked,]
# Put the top ten rows of each measure side by side
POP_ranked <- cbind(FAT_ranked[1:10,], INJ_ranked[1:10,])
ECO_ranked <- cbind(PROP_ranked[1:10,], CROP_ranked[1:10,])
# Descriptive column headings
colnames(POP_ranked) <- c("Eventtype(Fat)","Fatalities","Eventtype(Inj)","Injuries")
colnames(ECO_ranked) <- c("Eventtype(Prop)","Property Damage","Eventtype(Crop)","Crop Damage")
# Arrange the four yearly plots in a 2 x 2 grid
par(mfrow = c(2,2))
plot(FAT_by_YEAR, ylab = "Fatalities")
plot(PROP_by_YEAR, ylab = "Property Damage")
plot(INJ_by_YEAR, ylab = "Injuries")
plot(CROP_by_YEAR, ylab = "Crop Damage")
dev.off()
## null device
## 1
Fig. 1: Fatalities, injuries, property damage and crop damage over the years 1950 - 2011
As the plots show, the data can be divided into two phases: 1950 to 1992 and 1993 to 2011. Before 1993, crop damage was not recorded at all, and fatalities and injuries appear to be much lower. The simple conclusion drawn here is that the dataset has quality issues before 1993, and hence only the data from 1993 onward have been analyzed.
The numbers of fatalities and injuries are summed per event type and then sorted in descending order. The top ten event types are extracted for Fig. 2 and Fig. 3, and the columns are given meaningful headings.
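The split by year can be illustrated on a small scale. The following is a minimal sketch in base R, not the code used for the report: the data frame `toy` and its values are invented for illustration, and `aggregate()` stands in for the `dcast()` calls.

```r
# Toy rows mimicking BGN_DATE and CROPDMG; all values are invented.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1992 0:00:00",
               "1/6/1993 0:00:00", "7/12/1995 0:00:00", "3/2/1995 0:00:00"),
  CROPDMG  = c(0, 0, 5, 10, 2.5),
  stringsAsFactors = FALSE
)

# Parse only the date part; trailing time characters are ignored by the format.
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Sum crop damage per year.
crop_by_year <- aggregate(CROPDMG ~ YEAR, data = toy, FUN = sum)

# Keep only the years used in the analysis.
recent <- toy[toy$YEAR %in% 1993:2011, ]
```

Storing the year as an integer rather than as text keeps later comparisons and plots numeric.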
print(POP_ranked)
##     Eventtype(Fat) Fatalities    Eventtype(Inj) Injuries
## 130 EXCESSIVE HEAT       1903           TORNADO    23310
## 834        TORNADO       1621             FLOOD     6789
## 153    FLASH FLOOD        978    EXCESSIVE HEAT     6525
## 275           HEAT        937         LIGHTNING     5230
## 464      LIGHTNING        816         TSTM WIND     3631
## 170          FLOOD        470              HEAT     2100
## 585    RIP CURRENT        368         ICE STORM     1975
## 359      HIGH WIND        248       FLASH FLOOD     1777
## 856      TSTM WIND        241 THUNDERSTORM WIND     1488
## 19       AVALANCHE        224      WINTER STORM     1321
Fig. 2: Fatalities and injuries over the years 1993 - 2011, top ten event types in descending order
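The sum-then-rank step behind such a table can be sketched in a few lines of base R. This is an illustrative example on invented data (`toy` and its numbers are hypothetical), using `aggregate()` in place of `dcast()`:

```r
# Toy event records; the counts are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(3, 5, 1, 2, 4, 0),
  stringsAsFactors = FALSE
)

# Total fatalities per event type.
fat_by_type <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Sort by descending total and keep the leading rows.
fat_ranked <- fat_by_type[order(fat_by_type$FATALITIES, decreasing = TRUE), ]
top_n <- head(fat_ranked, 3)
```

Unlike indexing with `1:10`, `head()` does not produce `NA` rows when fewer event types exist than requested.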
print(ECO_ranked)
##        Eventtype(Prop) Property Damage    Eventtype(Crop) Crop Damage
## 153        FLASH FLOOD       1420124.6               HAIL   579596.28
## 834            TORNADO       1387757.1        FLASH FLOOD   179200.46
## 856          TSTM WIND       1335965.6              FLOOD   168037.88
## 170              FLOOD        899938.5          TSTM WIND   109202.60
## 760  THUNDERSTORM WIND        876844.2            TORNADO   100018.52
## 244               HAIL        688693.4  THUNDERSTORM WIND    66791.45
## 464          LIGHTNING        603351.8            DROUGHT    33898.62
## 786 THUNDERSTORM WINDS        446293.2 THUNDERSTORM WINDS    18684.93
## 359          HIGH WIND        324731.6          HIGH WIND    17283.21
## 972       WINTER STORM        132720.6         HEAVY RAIN    11122.80
Fig. 3: Property and crop damage over the years 1993 - 2011, top ten event types in descending order
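The damage figures in this table are sums of the raw PROPDMG and CROPDMG columns. The Storm Data file also carries a magnitude code for each value in the PROPDMGEXP and CROPDMGEXP columns. The following is a minimal sketch of how these codes could be applied, assuming that K, M and B stand for thousands, millions and billions. The helper `scale_damage()` is hypothetical, and treating every other code as a multiplier of 1 is a simplifying assumption.

```r
# Hypothetical helper: convert a damage value and its magnitude code to dollars.
scale_damage <- function(value, exp_code) {
  # Assumed meaning of the codes: K = thousands, M = millions, B = billions.
  multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multiplier[toupper(as.character(exp_code))]
  # Simplifying assumption: empty or unknown codes leave the value unscaled.
  m[is.na(m)] <- 1
  value * unname(m)
}

# Example with invented values.
scaled <- scale_damage(c(25, 2.5, 10, 1), c("K", "m", "", "B"))
```

With the real data, the scaled values would replace PROPDMG and CROPDMG before the totals per event type are computed.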
Surprisingly, excessive heat is the leading cause of fatalities for the years 1993 - 2011, whereas injuries are by far most often caused by tornadoes. It is assumed that people with health problems are at high risk during periods of excessive heat, so hospitals need to be in a state of heightened alert.
The largest property damage totals come from flooding (flash flood ranks #1, flood #4), tornadoes and strong wind. A suitable precaution would be to protect buildings through a variety of appropriate measures. Crops are most endangered by hail; the recommendation is to have suitable insurance in place.
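Strong wind appears in the tables under several labels (TSTM WIND, THUNDERSTORM WIND, THUNDERSTORM WINDS), which are ranked separately. A minimal sketch of how such labels could be merged before aggregating is given below. The helper `normalize_evtype()` and its two rules are assumptions made for illustration and cover only the variants visible in the tables.

```r
# Hypothetical helper: map spelling variants of an event type to one label.
normalize_evtype <- function(x) {
  x <- toupper(trimws(x))                                    # unify case and spacing
  x <- sub("^TSTM", "THUNDERSTORM", x)                       # expand the abbreviation
  x <- sub("^THUNDERSTORM WINDS?$", "THUNDERSTORM WIND", x)  # drop the plural
  x
}

# Example with labels taken from the tables plus one invented variant.
ev <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", " tornado ", "HAIL")
cleaned <- normalize_evtype(ev)
```

Applied to EVTYPE before the totals are computed, this would pool the three wind labels into a single row.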