US Storm Data Analysis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

The present analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

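The analysis addresses two questions: which types of events are most harmful to population health, measured by injuries and fatalities, and which have the greatest economic consequences, measured by property and crop damage.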

Data processing

  1. The data file is downloaded if it is not already present.
  2. It is decompressed and read as a data frame.
# Download the data file if needed, then read it into a data frame
read_data <- function() {
    fname <- "repdata-data-StormData.csv.bz2"
    source_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
    if (!file.exists(fname)) {
        # mode = "wb" keeps the compressed file intact on Windows
        download.file(source_url, destfile = fname, mode = "wb")
    }
    # read.csv decompresses .bz2 files transparently
    read.csv(fname)
}
stormData <- read_data()
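The later steps rely on a handful of columns from the file. As a minimal sketch, a check like the following could confirm they are present before any aggregation runs. The helper name `missing_columns` and the toy data frame are hypothetical and only serve to illustrate the idea.

```r
# Columns used later in this analysis
required_cols <- c("EVTYPE", "INJURIES", "FATALITIES", "PROPDMG", "CROPDMG")

# Hypothetical helper: return the required columns missing from a data frame
missing_columns <- function(data, required = required_cols) {
    setdiff(required, names(data))
}

# Toy data frame for illustration only; it deliberately lacks CROPDMG
toy <- data.frame(EVTYPE = "FLOOD", INJURIES = 1, FATALITIES = 0, PROPDMG = 2)
missing <- missing_columns(toy)
print(missing)
```

On the real data, the same check could be written as `stopifnot(length(missing_columns(stormData)) == 0)`.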

For each event type, total the injuries and fatalities, drop types with a zero count, rank the rest, and keep the 20 highest. Injuries and fatalities are separate counts, so they are kept in two different data sets.

# Total injuries per event type, drop zero totals, rank, keep the top 20
injuriesByEventType <- aggregate(INJURIES ~ EVTYPE, stormData, sum)
injuriesByEventType <- injuriesByEventType[injuriesByEventType$INJURIES > 0, ]
injuriesByEventType <- injuriesByEventType[order(injuriesByEventType$INJURIES, decreasing = TRUE), ]
highest20InjuriesThreats <- head(injuriesByEventType, 20)
# Same steps for fatalities
fatalitiesByEventType <- aggregate(FATALITIES ~ EVTYPE, stormData, sum)
fatalitiesByEventType <- fatalitiesByEventType[fatalitiesByEventType$FATALITIES > 0, ]
fatalitiesByEventType <- fatalitiesByEventType[order(fatalitiesByEventType$FATALITIES, decreasing = TRUE), ]
highest20FatalitiesThreats <- head(fatalitiesByEventType, 20)
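The injuries and fatalities code repeats the same four steps: aggregate, filter, rank, truncate. One way to avoid the repetition is a small helper, sketched below. The function name `top_events` and the toy data are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: total one numeric column per event type,
# drop zero totals, and return the n largest in descending order.
top_events <- function(data, column, n = 20) {
    totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
    names(totals)[2] <- column
    totals <- totals[totals[[column]] > 0, , drop = FALSE]
    totals <- totals[order(totals[[column]], decreasing = TRUE), , drop = FALSE]
    head(totals, n)
}

# Toy data for illustration only (not taken from the NOAA file)
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
    INJURIES = c(10, 3, 5, 0, 4),
    FATALITIES = c(2, 1, 0, 0, 0)
)
top_injuries <- top_events(toy, "INJURIES", n = 2)
print(top_injuries)
```

With the real data, `top_events(stormData, "INJURIES")` and `top_events(stormData, "FATALITIES")` would be expected to produce the same two tables as the longer code.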

For economic damage, sum the property damage and crop damage figures for each event type, then filter and rank them in the same way.

# Total property plus crop damage per event type
economicDamageByEventType <- aggregate(PROPDMG+CROPDMG ~ EVTYPE, stormData, sum)
# aggregate() names the summed column "PROPDMG + CROPDMG"; rename it to PROPDMG_CROPDMG
names(economicDamageByEventType) <- sub(" \\+ ", "_", names(economicDamageByEventType))
economicDamageByEventType <- economicDamageByEventType[economicDamageByEventType$PROPDMG_CROPDMG > 0, ]
economicDamageByEventType <- economicDamageByEventType[order(economicDamageByEventType$PROPDMG_CROPDMG, decreasing = TRUE), ]
highest20EconomicThreats <- head(economicDamageByEventType, 20)
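The sum above adds the PROPDMG and CROPDMG values as they stand. The NOAA file also carries PROPDMGEXP and CROPDMGEXP columns holding a magnitude code for each figure, so a refinement would be to scale each value before summing. The sketch below assumes K means thousand, M million, and B billion, and treats every other code as a multiplier of 1, which is a simplification. The names `exp_multiplier` and `DAMAGE_USD` and the toy data are hypothetical.

```r
# Assumed mapping from magnitude code to multiplier; any other code
# (blank, digits, symbols) is treated as 1, which is a simplification.
exp_multiplier <- function(code) {
    lookup <- c(K = 1e3, M = 1e6, B = 1e9)
    mult <- lookup[toupper(as.character(code))]
    mult[is.na(mult)] <- 1
    unname(mult)
}

# Toy data for illustration only (not taken from the NOAA file)
toy <- data.frame(
    EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
    PROPDMG = c(25, 2.5, 1, 500),
    PROPDMGEXP = c("K", "M", "B", ""),
    CROPDMG = c(10, 0, 0, 50),
    CROPDMGEXP = c("k", "", "", "K")
)

# Scale each figure by its multiplier, then total per event type
toy$DAMAGE_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
    toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
damage_by_type <- aggregate(DAMAGE_USD ~ EVTYPE, toy, sum)
damage_by_type <- damage_by_type[order(damage_by_type$DAMAGE_USD, decreasing = TRUE), ]
print(damage_by_type)
```

Scaling first matters because a single event recorded in billions can outweigh thousands of events recorded in thousands, which may change the ranking.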

Results

Based on these data, the following figures show the most dangerous weather events across the US in terms of injuries and fatalities.

library(ggplot2)
# binwidth does not apply to bars drawn with stat = "identity", so it is omitted
ggplot(highest20InjuriesThreats, aes(x = EVTYPE, y = INJURIES)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    labs(title = "Weather Events Causing the Most Injuries", x = "Event Type", y = "Number of Injuries") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = -90, hjust = 0))

[Figure: bar chart of total injuries for the 20 event types with the most injuries]

# binwidth does not apply to bars drawn with stat = "identity", so it is omitted
ggplot(highest20FatalitiesThreats, aes(x = EVTYPE, y = FATALITIES)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    labs(title = "Weather Events Causing the Most Fatalities", x = "Event Type", y = "Number of Fatalities") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = -90, hjust = 0))

[Figure: bar chart of total fatalities for the 20 event types with the most fatalities]

Concerning economic damage:

# binwidth does not apply to bars drawn with stat = "identity", so it is omitted
ggplot(highest20EconomicThreats, aes(x = EVTYPE, y = PROPDMG_CROPDMG)) +
    geom_bar(stat = "identity", fill = "steelblue") +
    labs(title = "Weather Events Causing the Most Economic Damage", x = "Event Type", y = "Economic Damage") +
    theme_bw() +
    theme(axis.text.x = element_text(angle = -90, hjust = 0))

[Figure: bar chart of combined property and crop damage for the 20 event types with the highest totals]
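ggplot2 places a discrete x axis in factor-level order, which for plain text labels is alphabetical, so bar charts like these do not appear in ranked order even though the data frames are sorted. A minimal sketch of one way to rank the bars is to set the factor levels by descending total before plotting. The toy totals below are made up for illustration and are not results from the NOAA data.

```r
# Toy totals for illustration only (not results from the NOAA data)
toy_totals <- data.frame(
    EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
    INJURIES = c(7, 2, 15)
)

# Order the factor levels by descending total so the tallest bar comes first
ranked_levels <- as.character(toy_totals$EVTYPE)[order(toy_totals$INJURIES, decreasing = TRUE)]
toy_totals$EVTYPE <- factor(as.character(toy_totals$EVTYPE), levels = ranked_levels)

# Build the chart only if ggplot2 is installed; print(p) would display it
if (requireNamespace("ggplot2", quietly = TRUE)) {
    p <- ggplot2::ggplot(toy_totals, ggplot2::aes(x = EVTYPE, y = INJURIES)) +
        ggplot2::geom_col(fill = "steelblue") +
        ggplot2::theme_bw()
}
```

The same two lines that set the levels could be applied to each of the top-20 data frames before its plotting call.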