Synopsis

The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database was explored to determine which event types are most harmful to population health and to the economy of the United States. The data were downloaded and only the columns of interest were kept. Some processing was needed to evaluate the economic losses, because the damage values are stored together with separate exponent columns.

Next, the 10 most harmful events in terms of fatalities and injuries were ranked, and tornadoes turned out to be by far the most harmful.

For the economy, the 10 most harmful events were also ranked. Floods caused the greatest property damage, whereas droughts were responsible for the greatest losses in crop value.

Data Processing

First, the file is downloaded and loaded into the variable “stormdata”.

# Download the raw data only if it is not already in the working directory
if (!file.exists("./stormdata.csv.bz2")) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                      destfile = "./stormdata.csv.bz2")
}

# read.csv reads the bz2-compressed file directly
stormdata<-read.csv("./stormdata.csv.bz2",header = TRUE, sep = ",")

Next, the first rows are inspected to see how the data set is structured.

head(stormdata)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

To answer the questions of interest, the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP will be used.

event <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", 
           "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
data <- stormdata[event]

To calculate the total property and crop damages by events, there is a need to deal with the exponent columns, which indicate the magnitude of each damage value.

unique(data$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

The digits give the power of 10 by which each value is multiplied, and the letters stand for hundreds (h, H), thousands (K), millions (M, m), or billions (B). A blank or 0 means units, so the multiplier is 1. The values “-”, “?” and “+” are treated as invalid and get a multiplier of zero, which removes those rows from the damage totals.

data$PROPEXP[data$PROPDMGEXP == ""] <- 1e+00
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "1"] <- 1e+01
data$PROPEXP[data$PROPDMGEXP == "2"] <- 1e+02
data$PROPEXP[data$PROPDMGEXP == "h"] <- 1e+02
data$PROPEXP[data$PROPDMGEXP == "H"] <- 1e+02
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1e+03
data$PROPEXP[data$PROPDMGEXP == "K"] <- 1e+03
data$PROPEXP[data$PROPDMGEXP == "4"] <- 1e+04
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "m"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09

data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0

data$PROPVAL <- data$PROPDMG * data$PROPEXP
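The mapping above can also be written as a lookup table. The following is only a minimal sketch of that alternative, not the code used to produce the results in this report. The helper name `exp_to_multiplier` and the toy data frame are hypothetical, and upper-casing the codes is an assumption that would also accept a lowercase "b" or "k".

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
exp_to_multiplier <- function(code) {
  # Named vector used as a lookup table (digits are powers of 10).
  lookup <- c("0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
              "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
              "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  # as.character() also handles factor columns; toupper() folds h/k/m.
  code <- toupper(as.character(code))
  mult <- unname(lookup[code])
  mult[is.na(mult)] <- 0     # "+", "-", "?" and anything unrecognised
  mult[code %in% ""] <- 1    # blank means units
  mult
}

# Toy data, for illustration only.
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "m", "", "?"),
                  stringsAsFactors = FALSE)
toy$PROPVAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

Keeping the codes in one table makes it easier to apply the same rules to both the property and the crop exponent columns.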

The same was done for the exponent column of crop damage.

unique(data$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

As before, the digits give the power of 10 by which each value is multiplied, and the letters stand for thousands (k, K), millions (M, m) or billions (B). A blank or 0 means units, so the multiplier is 1. The value “?” is treated as invalid and gets a multiplier of zero.

data$CROPEXP[data$CROPDMGEXP == ""] <- 1e+00
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "2"] <- 1e+02
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1e+03
data$CROPEXP[data$CROPDMGEXP == "k"] <- 1e+03
data$CROPEXP[data$CROPDMGEXP == "m"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09

data$CROPEXP[data$CROPDMGEXP == "?"] <- 0

data$CROPVAL <- data$CROPDMG * data$CROPEXP
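Because the multiplier column is filled in one code at a time, any exponent code that is not listed stays NA, and rows with NA are dropped by the formula interface of aggregate under its default na.action. The following is a hedged sketch of a sanity check on toy data. The code "x" is a made-up unmapped value that does not occur in the storm data.

```r
# Toy stand-in for the crop columns; values are illustrative only.
toy <- data.frame(CROPDMG = c(5, 1, 2, 7),
                  CROPDMGEXP = c("K", "", "?", "x"),
                  stringsAsFactors = FALSE)

# Same assignment pattern as in the report, for a subset of codes.
toy$CROPEXP <- NA_real_
toy$CROPEXP[toy$CROPDMGEXP == ""] <- 1
toy$CROPEXP[toy$CROPDMGEXP == "K"] <- 1e3
toy$CROPEXP[toy$CROPDMGEXP == "?"] <- 0

# Codes that never received a multiplier are still NA.
unmapped <- unique(toy$CROPDMGEXP[is.na(toy$CROPEXP)])
print(unmapped)
```

An empty result from this check on the real data would indicate that every exponent code was covered.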

Results

Types of events that are most harmful to population health

The events were ranked by total number of fatalities and by total number of injuries, and the top 10 of each ranking were kept.

fatalities<-aggregate(FATALITIES~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
fatalities<-head(fatalities[order(fatalities$FATALITIES,decreasing = TRUE),],10)

injuries<-aggregate(INJURIES~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
injuries<-head(injuries[order(injuries$INJURIES,decreasing = TRUE),],10)
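The aggregate, order and head pattern used here repeats for each measure, so it could be wrapped in a small function. This is a sketch under stated assumptions: `top_events` is a hypothetical helper and the toy data frame is invented for illustration.

```r
# Hypothetical helper: sum one numeric column per event type, keep the top n.
top_events <- function(df, column, n = 10) {
  totals <- aggregate(df[[column]], by = list(EVTYPE = df$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- column  # aggregate() names the summed column "x"
  head(totals[order(totals[[column]], decreasing = TRUE), ], n)
}

# Toy data, for illustration only.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  stringsAsFactors = FALSE
)
top_fatal <- top_events(toy, "FATALITIES", n = 2)
print(top_fatal)
```

The same call with a different column name would cover the injuries, property damage and crop damage rankings.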

print(fatalities)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
print(injuries)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
par(mfrow=c(1,2),mar=c(12,5,3,2),cex=0.75,mgp=c(4,1,0))

barplot(fatalities$FATALITIES,
        names.arg = fatalities$EVTYPE,
        las=2,
        ylab = "Fatalities",
        main="Fatalities vs Events")

barplot(injuries$INJURIES,
        names.arg = injuries$EVTYPE,
        las=2,
        ylab = "Injuries",
        main="Injuries vs Events")

Figure: Types of events that are most harmful to population health (left panel: Fatalities vs Events; right panel: Injuries vs Events)

As we can see, tornadoes are by far the most harmful events to population health, both in fatalities and in injuries.

Types of events that have the greatest economic consequences

The events were ranked by total property damage and by total crop damage, and the top 10 of each ranking were kept.

prop<-aggregate(PROPVAL~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
prop<-head(prop[order(prop$PROPVAL,decreasing = TRUE),],10)

crop<-aggregate(CROPVAL~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
crop<-head(crop[order(crop$CROPVAL,decreasing = TRUE),],10)

print(prop)
##                EVTYPE      PROPVAL
## 170             FLOOD 144657709807
## 411 HURRICANE/TYPHOON  69305840000
## 834           TORNADO  56947380617
## 670       STORM SURGE  43323536000
## 153       FLASH FLOOD  16822673979
## 244              HAIL  15735267513
## 402         HURRICANE  11868319010
## 848    TROPICAL STORM   7703890550
## 972      WINTER STORM   6688497251
## 359         HIGH WIND   5270046260
print(crop)
##                EVTYPE     CROPVAL
## 95            DROUGHT 13972566000
## 170             FLOOD  5661968450
## 590       RIVER FLOOD  5029459000
## 427         ICE STORM  5022113500
## 244              HAIL  3025954473
## 402         HURRICANE  2741910000
## 411 HURRICANE/TYPHOON  2607872800
## 153       FLASH FLOOD  1421317100
## 140      EXTREME COLD  1292973000
## 212      FROST/FREEZE  1094086000
par(mfrow=c(1,2),mar=c(12,5,3,2),cex=0.75,mgp=c(4,1,0))

barplot(prop$PROPVAL*1e-9,
        names.arg = prop$EVTYPE,
        las=2,
        ylab = "Property Damage (in billions of USD)",
        main="Property Damage vs Events")

barplot(crop$CROPVAL*1e-9,
        names.arg = crop$EVTYPE,
        las=2,
        ylab = "Crop Damage (in billions of USD)",
        main="Crop Damage vs Events")

Figure: Types of events that are most harmful to the economy (left panel: Property Damage vs Events; right panel: Crop Damage vs Events)

As we can see, floods are responsible for the greatest property damage and droughts account for the greatest crop losses.