By analysing the events in the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which covers the period from 1950 to November 2011, we identify the most harmful storm events that happened in the US.

The analysis shows the damage that the top storm events caused to population health and to the economy. The results make it easy to see that the tornado is the most destructive event type.

Data Processing

The raw data is obtained from this link.

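library(dplyr)
library(ggplot2)

destFile <- "StormData.csv.bz2"
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Download the archive only if it is not already present locally
if(!file.exists(destFile)) {
        download.file(fileUrl, destfile = destFile, method = "curl")
}

# Always read the data, whether or not it was just downloaded.
# read.csv() reads .bz2 files directly; factors are needed for the levels() calls below.
df <- read.csv(destFile, na.strings = "", stringsAsFactors = TRUE)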

Since we want to evaluate the effect of storm events on population health and on the economy across the United States, we consider the variables EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP from the provided data.

stormDf <- df %>% select(c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

The variable EVTYPE holds the name of the storm event. We normalise its values by converting them to lower case and trimming surrounding white space.

levels(stormDf$EVTYPE) <- tolower(trimws(levels(stormDf$EVTYPE)))
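As a small illustration of why this normalisation matters, the sketch below applies the same call to a toy factor. The labels are made up for the example and are not taken from the NOAA data. Assigning duplicate names to the levels of a factor merges the matching levels.

```r
# Toy factor with inconsistent case and stray white space (hypothetical values)
evtype <- factor(c("TORNADO", " Tornado", "tornado ", "FLOOD", "Flood"))
nlevels(evtype)   # five distinct raw labels before cleaning

# Lower-casing and trimming produces duplicate level names,
# and R merges levels that receive the same name
levels(evtype) <- tolower(trimws(levels(evtype)))

levels(evtype)
table(evtype)
```

After the assignment, only the levels "tornado" and "flood" remain, so later grouping by EVTYPE counts these records together.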

The variables FATALITIES and INJURIES record the harm to population health caused by the corresponding storm.

The remaining variables record property and crop damage. PROPDMGEXP and CROPDMGEXP give the exponent (multiplier) of PROPDMG and CROPDMG, respectively. In general, the two variables take the values H, h, K, k, M, m, B, b, +, -, ?, 0 to 8, and blank. Each value maps to a numeric multiplier. In this analysis, H and h map to 100, K and k to 1,000, M and m to 1,000,000, B and b to 1,000,000,000, the digits 0 to 8 to 10, + to 1, and -, ? and blank to 0.

The following code converts the values of PROPDMG and CROPDMG to their actual amounts using PROPDMGEXP and CROPDMGEXP, respectively.

# Map each exponent code to its multiplier by name, so the result does not depend on level order
expMultiplier <- c("-" = 0, "?" = 0, "+" = 1, setNames(rep(10, 9), 0:8),
                   "h" = 100, "H" = 100, "k" = 1e3, "K" = 1e3,
                   "m" = 1e6, "M" = 1e6, "b" = 1e9, "B" = 1e9)
# Look up via as.character(): as.numeric() on a factor returns level codes, not the labels' values
stormDf$PROPDMGEXP <- unname(expMultiplier[as.character(stormDf$PROPDMGEXP)])
stormDf$PROPDMGEXP[is.na(stormDf$PROPDMGEXP)] <- 0
stormDf$CROPDMGEXP <- unname(expMultiplier[as.character(stormDf$CROPDMGEXP)])
stormDf$CROPDMGEXP[is.na(stormDf$CROPDMGEXP)] <- 0

stormDf <- stormDf %>% mutate(economicDMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP,
                              healthDMG = FATALITIES + INJURIES) %>% select(-c(PROPDMGEXP, CROPDMGEXP))
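To make the conversion concrete, here is a minimal worked example on a toy data frame. The records and the helper names `toy` and `multiplier` are hypothetical and exist only for this illustration. It covers only the K, M and B codes and treats a missing exponent as 0, which is how the report's code handles it.

```r
# Hypothetical records: a damage amount plus its exponent code
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2, 10),
                  PROPDMGEXP = c("K", "M", "B", NA),
                  stringsAsFactors = TRUE)

# Multipliers for the codes used in this toy example only
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up by label; as.character() avoids using the factor's integer codes
toy$mult <- unname(multiplier[as.character(toy$PROPDMGEXP)])
toy$mult[is.na(toy$mult)] <- 0   # missing exponent treated as 0

toy$damage <- toy$PROPDMG * toy$mult
toy
```

Here 25 with code K becomes 25,000, 2.5 with code M becomes 2,500,000, 1.2 with code B becomes 1,200,000,000, and the record without an exponent contributes 0.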

Data Analysis

Total harm to population health caused by each storm event type, in descending order

healthDf <- stormDf %>% group_by(EVTYPE) %>% summarise(totalDMG = sum(healthDMG)) %>% arrange(desc(totalDMG))

Total economic damage caused by each storm event type, in descending order

economicDf <- stormDf %>% group_by(EVTYPE) %>% summarise(totalDMG = sum(economicDMG)) %>% arrange(desc(totalDMG))
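The two summaries above follow the same group, sum and sort pattern. The sketch below runs that pattern on a toy data frame so the intermediate result is easy to check by hand. The values and the names `toy` and `toyTotals` are hypothetical.

```r
library(dplyr)

# Hypothetical per-record health damage for three event types
toy <- data.frame(EVTYPE = c("tornado", "flood", "tornado", "heat", "flood"),
                  healthDMG = c(10, 2, 5, 7, 1))

# One row per event type, total damage, largest first
toyTotals <- toy %>%
        group_by(EVTYPE) %>%
        summarise(totalDMG = sum(healthDMG)) %>%
        arrange(desc(totalDMG))

toyTotals
```

The totals are 15 for tornado, 7 for heat and 3 for flood, so taking the first rows of such a table gives the top event types directly.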

Results

The tornado is the storm event most harmful to US population health.

g <- ggplot(healthDf[1:10, ], aes(x = EVTYPE, y = totalDMG))
g <- g + geom_bar(stat = "identity", aes(fill = totalDMG), position = "dodge")
g <- g + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Storm Event") + ylab("Total Damage")
g + guides(fill=guide_legend(title="Total Damage")) + ggtitle("Top 10 most harmful events to US population health")

The tornado also caused the most economic damage in the US.

g <- ggplot(economicDf[1:10, ], aes(x = EVTYPE, y = totalDMG))
g <- g + geom_bar(stat = "identity", aes(fill = totalDMG), position = "dodge")
g <- g + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Storm Event") + ylab("Total Damage")
g + guides(fill=guide_legend(title="Total Damage")) + ggtitle("Top 10 storm events by economic damage in the US")
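In the plots above, ggplot2 places the bars in the order of the EVTYPE levels rather than by height. If a ranked chart is preferred, one possible variation is to reorder the x variable by the plotted value with `reorder()`. This is a sketch on hypothetical toy totals, not the code used to produce the figures in this report.

```r
library(ggplot2)

# Hypothetical totals for three event types
toy <- data.frame(EVTYPE = c("flood", "heat", "tornado"),
                  totalDMG = c(3, 7, 15))

# reorder() sorts the event types by descending total, so the tallest bar comes first
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$totalDMG)

p <- ggplot(toy, aes(x = EVTYPE, y = totalDMG)) +
        geom_col(aes(fill = totalDMG)) +
        theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
        xlab("Storm Event") + ylab("Total Damage")
p
```

`geom_col()` is equivalent to `geom_bar(stat = "identity")` and is used here only for brevity.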