Synopsis

This study aims to identify the types of weather events that are most harmful to population health and those that have the greatest economic consequences. The report is based on a broad variety of events recorded between 1950 and 2011. The analysis explores the total number of reported casualties (fatalities and injuries) per event type, as well as the monetary losses from property and crop damage.

Data processing

Loading Raw Data

The data were obtained from the National Oceanic and Atmospheric Administration’s (NOAA) National Weather Service. The Storm Data file (47 MB) documents “…the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce”, according to the National Weather Service Storm Data Documentation.

Environment setting

## load needed libraries
library(dplyr)
library(lattice)
library(ggplot2)
library(grid)
library(R.utils)
options(scipen = 1)

Reading in data

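## setting variables
## setInternet2() is Windows-only and deprecated in recent versions of R,
## so call it only when it is available
if (exists("setInternet2")) {
        setInternet2(use = TRUE)
}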
urlData <-
        "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
fileName <- "repdata-data-StormData.csv.bz2"
destName <- "repdata-data-StormData.csv"
## Download the file if it isn't yet downloaded
if (!file.exists(fileName)) {
        download.file(url = urlData, destfile = fileName, mode = "wb")
}
if (!file.exists(destName)) {
        bunzip2(fileName, destName, overwrite = TRUE, remove = FALSE)
}
## read the file if it isn't yet read
if (!"stormData" %in% ls()) {
        stormData <- read.csv(destName)
}
## check the amount of data
dim(stormData)
## [1] 902297     37
head(stormData, 3)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1 2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
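As a side note on the loading step, `read.csv()` can usually read a bzip2-compressed file directly, because R detects the compression when it opens the file for reading, so the separate `bunzip2()` call may not be strictly necessary. The sketch below illustrates this with a small made-up data frame; `tmp`, `toy` and `toyRead` are hypothetical names, not part of the analysis.

```r
## Write a tiny, made-up CSV through a bzip2 connection ...
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

## ... and read it back without decompressing it to disk first
toyRead <- read.csv(tmp, stringsAsFactors = FALSE)
dim(toyRead)
```

Decompressing once, as the report does, can still be the faster option if the file is read repeatedly.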

The first rows show that only a few variables are relevant to this analysis, so we subset the dataset to keep just those columns.

## lowercase the column names, then keep only the columns of interest
names(stormData) <- tolower(names(stormData))
healthdata <- select(stormData, evtype, fatalities, injuries)
damagedata <- select(stormData, evtype, propdmg, propdmgexp, cropdmg, cropdmgexp)

Results

Across the United States, which types of events are most harmful with respect to population health?

To evaluate how dangerous each event type is to population health, we sum the deaths and injuries caused by each event type and keep the 10 worst.

totalcases <- aggregate(.~ evtype, data = healthdata, FUN = sum)
summary(totalcases)
##                    evtype      fatalities         injuries      
##     HIGH SURF ADVISORY:  1   Min.   :   0.00   Min.   :    0.0  
##   COASTAL FLOOD       :  1   1st Qu.:   0.00   1st Qu.:    0.0  
##   FLASH FLOOD         :  1   Median :   0.00   Median :    0.0  
##   LIGHTNING           :  1   Mean   :  15.38   Mean   :  142.7  
##   TSTM WIND           :  1   3rd Qu.:   0.00   3rd Qu.:    0.0  
##   TSTM WIND (G45)     :  1   Max.   :5633.00   Max.   :91346.0  
##  (Other)              :979
fatalmost <- totalcases[order(-totalcases$fatalities),][1:10,]
injurmost <- totalcases[order(-totalcases$injuries),][1:10,]
fatalmost[,c("evtype", "fatalities")]
##             evtype fatalities
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
injurmost[,c("evtype", "injuries")]
##                evtype injuries
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
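The tables above list "TSTM WIND" and "THUNDERSTORM WIND" as separate event types, and the summary shows labels with leading spaces, so some totals are split across spelling variants. The following is a minimal sketch of how such labels could be normalized before aggregating. It uses made-up toy data, and `clean_evtype()` is a hypothetical helper that covers only the few variants shown here, not a complete cleaning of the dataset.

```r
## Toy data with spelling variants of the same event types (made-up numbers)
toy <- data.frame(
        evtype = c("TSTM WIND", " Thunderstorm Winds", "THUNDERSTORM WIND",
                   "TORNADO", "tornado "),
        fatalities = c(1, 2, 3, 10, 5),
        injuries = c(4, 0, 1, 20, 2),
        stringsAsFactors = FALSE
)

## Hypothetical helper: trim whitespace, uppercase, and merge a few variants
clean_evtype <- function(x) {
        x <- toupper(trimws(as.character(x)))
        x <- gsub("^TSTM", "THUNDERSTORM", x)   # expand the abbreviation
        x <- gsub("WINDS$", "WIND", x)          # drop the plural
        x
}

toy$evtype <- clean_evtype(toy$evtype)
toyTotals <- aggregate(. ~ evtype, data = toy, FUN = sum)
toyTotals[order(-toyTotals$fatalities), ]
```

Applied to the real data, a step like this would probably change the counts and possibly the ordering of the wind-related entries, so the rankings above should be read with that caveat in mind.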

The following graph compares the number of fatalities and injuries caused by the 10 event types most harmful to population health.

par(mar = c(6,4,4, 2), mfrow = c(1, 2))
barplot(
        fatalmost$fatalities,
        names.arg = fatalmost$evtype,
        las = 2,
        cex.names = 0.5,
        ylim = c(0, 6000),
        main = "Fatalities caused by \n Severe Weather Events \n in U.S. (1950 - 2011)",
        ylab = "Number of Fatalities"
)
barplot(
        injurmost$injuries / 1000,
        names.arg = injurmost$evtype,
        las = 2,
        cex.names = 0.5,
        ylim = c(0, 100),
        main = "Injuries caused by \n Severe Weather Events \n in U.S. (1950 - 2011)",
        ylab = "Number of Injuries (thousands)"
)

Across the United States, which types of events have the greatest economic consequences?

To evaluate the economic damage caused by each event type, we add up the property damage and crop damage attributed to it and keep the 10 worst.

## convert a damage amount to dollars using its exponent code (H, K, M, B)
adjustvalues <- function(dmg,exp){
        if (is.na(dmg) || is.null(dmg)) {return(0)}
        if (is.na(exp) || is.null(exp)) {return(ifelse(is.numeric(dmg),dmg,0))}
        if (toupper(exp)=='B') {return(dmg*10^9)}
        if (toupper(exp)=='M') {return(dmg*10^6)}
        if (toupper(exp)=='K') {return(dmg*10^3)}
        if (toupper(exp)=='H') {return(dmg*10^2)}
        if (exp=='0') {return(dmg)}
        return(dmg)
}

damagedata$losses <- adjustvalues(damagedata$propdmg, damagedata$propdmgexp) + adjustvalues(damagedata$cropdmg, damagedata$cropdmgexp)
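The helper above relies on scalar `if` tests, which inspect a single value, so it cannot branch row by row when it is handed whole columns. As a minimal sketch of a vectorized alternative, the exponent codes can be mapped to multipliers through a named lookup vector. `exp_multiplier()` and the toy data are hypothetical and only mirror the rules of `adjustvalues()` (blank, "0" and unrecognized codes leave the amount unchanged); they are not the function used to produce the figures in this report.

```r
## Hypothetical vectorized helper: map exponent codes to numeric multipliers
exp_multiplier <- function(exp) {
        codes <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
        mult <- codes[toupper(as.character(exp))]
        mult[is.na(mult)] <- 1   # blank, "0" or unknown code: keep the amount as is
        unname(mult)
}

## Toy data with mixed exponent codes (made-up numbers)
toy <- data.frame(
        propdmg = c(25, 2.5, 1, 3),
        propdmgexp = c("K", "m", "B", ""),
        cropdmg = c(0, 10, 0, 5),
        cropdmgexp = c("", "K", "", "0"),
        stringsAsFactors = FALSE
)

## Each row gets its own multiplier
toy$losses <- toy$propdmg * exp_multiplier(toy$propdmgexp) +
        toy$cropdmg * exp_multiplier(toy$cropdmgexp)
toy$losses
```

With row-wise multipliers, events recorded in millions or billions are weighted accordingly, which could change the ranking of event types by total loss.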

## compute the sum of losses by event type
totalloss <- aggregate(losses ~ evtype, data = damagedata, FUN = sum)
summary(filter(totalloss, losses > 0))
##                    evtype        losses          
##     HIGH SURF ADVISORY:  1   Min.   :         1  
##   FLASH FLOOD         :  1   1st Qu.:      5000  
##   TSTM WIND           :  1   Median :     50000  
##   TSTM WIND (G45)     :  1   Mean   :  25257257  
##  ?                    :  1   3rd Qu.:    501008  
##  AGRICULTURAL FREEZE  :  1   Max.   :3212358179  
##  (Other)              :425
damagemost <- totalloss[order(-totalloss$losses),][1:10,]
damagemost
##                 evtype     losses
## 834            TORNADO 3212358179
## 153        FLASH FLOOD 1420303790
## 856          TSTM WIND 1336074813
## 170              FLOOD  900106518
## 760  THUNDERSTORM WIND  876910961
## 244               HAIL  689272976
## 464          LIGHTNING  603355361
## 786 THUNDERSTORM WINDS  446311865
## 359          HIGH WIND  324748843
## 972       WINTER STORM  132722569

The following graph compares the economic losses caused by the 10 most costly weather event types.

par(mar = c(6,4,4, 2))
barplot(
        damagemost$losses / 1000000,
        names.arg = damagemost$evtype,
        las = 2,
        cex.names = 0.5,
        main = "Damages caused by \n Severe Weather Events \n in U.S. (1950 - 2011)",
        ylim = c(0, 3500),
        ylab = "Damages (in Million US$)"
)

Conclusion

In this analysis, tornadoes are the most harmful weather events for population health (5,633 fatalities and 91,346 injuries) and also rank first in economic losses.