This report analyzes which types of weather events across the United States are most harmful to population health and which have the greatest economic consequences.

We used the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Our analysis shows that tornadoes are the most harmful to public health: they account for the highest total number of fatalities and injuries. From an economic perspective, floods and hurricanes/typhoons are the most damaging, with the highest total property damage.

Data Processing

To process the NOAA data, we download the compressed file from its source URL, save it locally, and read it into R for further analysis. The download is skipped if the file already exists locally, and this portion of the code is cached for efficiency.

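# Source URL and local file name for the compressed NOAA storm data
url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileName <- "StormData.csv.bz2"

# Download only if the file is not already present locally
if (!file.exists(fileName)) {
    download.file(url, destfile = fileName, method = "curl")
}

# read.csv() reads the bzip2-compressed file directly
data <- read.csv(fileName)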
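
As a small illustration of why the compressed file can be passed straight to read.csv, the sketch below writes a tiny bzip2-compressed CSV to a temporary file and reads it back. The toy rows and the names `tmp` and `toy` are assumptions made for illustration; only the column names are taken from the storm data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy rows, not real data)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,3",
             "FLOOD,0,2"), con)
close(con)

# read.csv() opens the file through a connection that detects the compression
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)
```

The same call pattern is what loads StormData.csv.bz2 above, so no separate decompression step is needed.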

Results

First, we evaluated which weather events have the greatest impact on public health. We considered two primary measures, fatalities and injuries, and computed both the total and the average number of each per event type.

The table below summarizes the top 10 weather events, sorted by total fatalities and then by total injuries in descending order. The per-event averages are also shown for context. Tornadoes are by far the most harmful to public health, having caused the highest total number of fatalities and injuries over the period covered. Excessive heat follows with the second-highest total number of fatalities.

library(dplyr)

# Total and average fatalities and injuries per event type
healthSummary <- summarise(group_by(data, EVTYPE),
    TotalFatalities = sum(FATALITIES),
    TotalInjuries = sum(INJURIES),
    AvgFatalities = round(mean(FATALITIES), 3),
    AvgInjuries = round(mean(INJURIES), 3)
)

# Sort by totals first, then by averages, all in descending order
summaryTable <- arrange(healthSummary,
    desc(TotalFatalities),
    desc(TotalInjuries),
    desc(AvgFatalities),
    desc(AvgInjuries)
)

summaryTable
## Source: local data frame [985 x 5]
## 
##            EVTYPE TotalFatalities TotalInjuries AvgFatalities AvgInjuries
## 1         TORNADO            5633         91346         0.093       1.506
## 2  EXCESSIVE HEAT            1903          6525         1.134       3.889
## 3     FLASH FLOOD             978          1777         0.018       0.033
## 4            HEAT             937          2100         1.222       2.738
## 5       LIGHTNING             816          5230         0.052       0.332
## 6       TSTM WIND             504          6957         0.002       0.032
## 7           FLOOD             470          6789         0.019       0.268
## 8     RIP CURRENT             368           232         0.783       0.494
## 9       HIGH WIND             248          1137         0.012       0.056
## 10      AVALANCHE             224           170         0.580       0.440
## ..            ...             ...           ...           ...         ...
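
Totals and averages can rank events differently: in the table above, excessive heat has a lower total than tornadoes but a much higher average per recorded event. The sketch below illustrates the distinction on a made-up data frame using base R only (a stand-in for the dplyr calls above, not the code used for the report). The toy values and the names `toy`, `totals`, `means`, and `health` are assumptions.

```r
# Toy data: a few recorded events per type (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "FLOOD"),
  FATALITIES = c(0, 3, 2, 4, 1),
  INJURIES   = c(10, 20, 5, 5, 0),
  stringsAsFactors = FALSE
)

# Per-type totals and per-type averages
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
means  <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = mean)

# Combine both views; the suffixes distinguish the total and average columns
health <- merge(totals, means, by = "EVTYPE", suffixes = c("Total", "Avg"))

# Sort by total fatalities, then total injuries, both descending
health <- health[order(-health$FATALITIESTotal, -health$INJURIESTotal), ]
health
```

Sorting by totals favors event types that occur often, while the averages highlight event types that are severe each time they occur.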

Second, we evaluated the economic impact of weather events using the property damage field (PROPDMG) in the data.

A caveat is that the data uses a secondary field (PROPDMGEXP) to express the magnitude of the property damage, where K, M, and B signify thousands, millions, and billions respectively. The R code below creates a field (PROPDMGNUM) that multiplies the property damage value by the corresponding magnitude to produce a single monetary amount. We noticed some data collection issues: not all rows contain one of the damage expressions as specified. Our analysis therefore includes only rows with a K, M (or m), or B damage expression.

# Convert PROPDMG to a single monetary amount using the PROPDMGEXP magnitude;
# rows with any other expression are excluded
kData <- mutate(data[data$PROPDMGEXP == 'K',], PROPDMGNUM = PROPDMG * 1000)
mData <- mutate(data[data$PROPDMGEXP == 'M' | data$PROPDMGEXP == 'm',], PROPDMGNUM = PROPDMG * 1000000)
bData <- mutate(data[data$PROPDMGEXP == 'B',], PROPDMGNUM = PROPDMG * 1000000000)
demageData <- rbind(kData, mData, bData)
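
The same conversion can also be written as a single lookup, which may be easier to extend if more expressions need handling. The following is a minimal sketch on made-up rows, not the code used for this report; the `multiplier` vector and the `toy` and `kept` names are assumptions for illustration.

```r
# Toy rows covering each handled expression plus one unspecified ("") expression
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 5),
  PROPDMGEXP = c("K", "M", "B", "m", ""),
  stringsAsFactors = FALSE
)

# Named lookup vector: expression -> multiplier
multiplier <- c(K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# as.character() guards against PROPDMGEXP being a factor;
# expressions missing from the lookup yield NA
toy$PROPDMGNUM <- toy$PROPDMG * unname(multiplier[as.character(toy$PROPDMGEXP)])

# Keep only rows where a recognized expression was specified
kept <- toy[!is.na(toy$PROPDMGNUM), ]
kept
```

Rows whose expression is not in the lookup drop out at the last step, which mirrors the choice above to analyze only rows with a specified damage expression.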

The table below summarizes the top 10 weather events sorted by total property damage in descending order. Flood and hurricane/typhoon were the top two, while tornado ranked third even though it ranked first from a health impact perspective.

demageSummary <- summarise(group_by(demageData, EVTYPE), totalDemage = sum(PROPDMGNUM))
demageSummaryTable <- arrange(demageSummary, desc(totalDemage))
demageSummaryTable
## Source: local data frame [404 x 2]
## 
##               EVTYPE  totalDemage
## 1              FLOOD 144657709800
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56937160480
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16140811510
## 6               HAIL  15732266720
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497250
## 10         HIGH WIND   5270046260
## ..               ...          ...
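
Because the totals above are printed as raw amounts, they can be hard to compare at a glance. As a small worked example, the sketch below rescales the top three totals from the table to billions; the `totals` and `billions` names are assumptions for illustration.

```r
# Top three totals copied from the table above
totals <- c(
  "FLOOD"             = 144657709800,
  "HURRICANE/TYPHOON" = 69305840000,
  "TORNADO"           = 56937160480
)

# Express each total in billions, rounded to one decimal place
billions <- round(totals / 1e9, 1)
billions
```

On this scale, floods account for roughly 144.7 billion in property damage, about twice the hurricane/typhoon total of 69.3 billion.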