Introduction

We analysed data on natural disasters in the US from 1950 to 2011. The data was downloaded from

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Documentation is provided in the NATIONAL WEATHER SERVICE INSTRUCTION document, which gives details on the variables and the data characteristics.

We selected the columns we needed and converted the event types to upper case.

setwd("~/Desktop/DATA/Reproducible Research")

stormdata <- read.csv("repdata%2Fdata%2FStormData.csv", header = TRUE, sep = ",")
storm <- stormdata[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(storm)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
storm$EVTYPE = toupper(storm$EVTYPE)

Data Processing

Once the data is downloaded, we use it to answer questions about which natural causes lead to the most damage and death in the US.

The data is loaded with event types in upper case so that names differing only in case can be combined.

Looking at page 6 of the cookbook, we see that several events appear under different names, possibly because naming conventions changed over the long period covered. Example: “TSTM WIND” and “THUNDERSTORM WIND”.

storm[storm$EVTYPE == "TSTM WIND", ]$EVTYPE = "THUNDERSTORM WIND"
storm[storm$EVTYPE == "THUNDERSTORM WINDS", ]$EVTYPE = "THUNDERSTORM WIND"
storm[storm$EVTYPE == "RIVER FLOOD", ]$EVTYPE = "FLOOD"
storm[storm$EVTYPE == "HURRICANE/TYPHOON", ]$EVTYPE = "HURRICANE-TYPHOON"
storm[storm$EVTYPE == "HURRICANE", ]$EVTYPE = "HURRICANE-TYPHOON"
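
The same renaming can also be written as a lookup table, which keeps all synonyms in one place. The sketch below is only an illustration on a small made-up vector; `events` and `synonyms` are hypothetical names that are not part of the analysis above.

```r
# Toy event types standing in for storm$EVTYPE (made-up values).
events <- c("TSTM WIND", "tornado", "HURRICANE", "River Flood", "HAIL")

# Map each synonym (name) to its canonical label (value).
synonyms <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "RIVER FLOOD"        = "FLOOD",
  "HURRICANE/TYPHOON"  = "HURRICANE-TYPHOON",
  "HURRICANE"          = "HURRICANE-TYPHOON"
)

events <- toupper(events)            # same casing step as in the analysis
hit <- events %in% names(synonyms)   # entries that have a known synonym
events[hit] <- unname(synonyms[events[hit]])
events
```

Adding another synonym then only requires one more entry in `synonyms`.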

Results

Most harmful events to human health

This data set records four types of damage: fatalities (FATALITIES), injuries (INJURIES), property damage (PROPDMG) and crop damage (CROPDMG). The last two must be scaled by their magnitude columns, PROPDMGEXP and CROPDMGEXP. The first two relate directly to human health, so a summary of each is presented.

##                EVTYPE FATALITIES
## 755           TORNADO       5633
## 116    EXCESSIVE HEAT       1903
## 138       FLASH FLOOD        978
## 243              HEAT        937
## 417         LIGHTNING        816
## 683 THUNDERSTORM WIND        701

The table above shows the fatality data aggregated by event type and ranked in decreasing order. We see that tornadoes and excessive heat caused the most fatalities from 1950 until 2011.
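
The aggregation code itself is not shown in this report. A minimal sketch of what it might look like, modeled on the damage aggregation used later in this report and run on made-up data, is given here. The data frame `toy` and the intermediate names `fatal` and `fatal1` are assumptions; only `fatalorder` appears elsewhere in the report.

```r
# Toy data standing in for the storm data frame (made-up numbers).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "HAIL"),
  FATALITIES = c(3, 2, 4, 1, 4, 0)
)

# Sum fatalities per event type.
fatal <- aggregate(FATALITIES ~ EVTYPE, data = toy, sum)
# Drop event types with no fatalities.
fatal1 <- fatal[fatal$FATALITIES > 0, ]
# Rank from most to fewest fatalities.
fatalorder <- fatal1[order(fatal1$FATALITIES, decreasing = TRUE), ]
head(fatalorder)
```

Replacing FATALITIES with INJURIES would give the corresponding injury ranking.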

To see the injury results we create the same summary:

##                EVTYPE INJURIES
## 755           TORNADO    91346
## 683 THUNDERSTORM WIND     9353
## 154             FLOOD     6791
## 116    EXCESSIVE HEAT     6525
## 417         LIGHTNING     5230
## 243              HEAT     2100

The two events causing the most injuries are tornadoes and thunderstorm wind, while excessive heat ranks fourth. It makes sense that tornadoes lead both lists, as events that cause many injuries are also likely to cause fatalities. The code has the same structure as the one used for fatalities, and that structure will be used again to determine the other top causes of mayhem.

We can plot the data to visualise it better.

barplot(fatalorder[1:10, 2], col = heat.colors(10), legend.text = fatalorder[1:10, 1], ylab = "Number of fatalities", main = "Top 10 Natural Causes of Fatalities")

barplot(injuryorder[1:10, 2], col = heat.colors(10), legend.text = injuryorder[1:10, 1], ylab = "Number of injured people", main = "Top 10 Natural Causes of Injury")

We can check which events are major causes of both fatalities and injuries.

intersect(fatalorder[1:10, 1], injuryorder[1:10, 1])
## [1] "TORNADO"           "EXCESSIVE HEAT"    "FLASH FLOOD"      
## [4] "HEAT"              "LIGHTNING"         "THUNDERSTORM WIND"
## [7] "FLOOD"

Of the top 10 causes of fatalities and the top 10 causes of injuries, 7 appear in both lists. Tornadoes are the most harmful event to human health, while others such as excessive heat, flash flood, and thunderstorm wind are listed as well.

Now we will look at the damage caused by these natural catastrophes.

First we list the magnitude codes used for crop and property damage.

unique(storm$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
unique(storm$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

According to page 12 of the cookbook, the damage figures follow a convention: “K” stands for kilo, meaning thousands, “M” for millions and “B” for billions. These codes appear in both upper and lower case, so we need to handle both, and we have to convert each value and its magnitude code into an actual number.

# Scale property damage by its magnitude code
storm[storm$PROPDMGEXP == "K", ]$PROPDMG <- storm[storm$PROPDMGEXP == "K", ]$PROPDMG * 1000
storm[storm$PROPDMGEXP == "M", ]$PROPDMG <- storm[storm$PROPDMGEXP == "M", ]$PROPDMG * 1000000
storm[storm$PROPDMGEXP == "m", ]$PROPDMG <- storm[storm$PROPDMGEXP == "m", ]$PROPDMG * 1000000
storm[storm$PROPDMGEXP == "B", ]$PROPDMG <- storm[storm$PROPDMGEXP == "B", ]$PROPDMG * 1000000000
# Scale crop damage by its magnitude code
storm[storm$CROPDMGEXP == "K", ]$CROPDMG <- storm[storm$CROPDMGEXP == "K", ]$CROPDMG * 1000
storm[storm$CROPDMGEXP == "k", ]$CROPDMG <- storm[storm$CROPDMGEXP == "k", ]$CROPDMG * 1000
storm[storm$CROPDMGEXP == "M", ]$CROPDMG <- storm[storm$CROPDMGEXP == "M", ]$CROPDMG * 1000000
storm[storm$CROPDMGEXP == "m", ]$CROPDMG <- storm[storm$CROPDMGEXP == "m", ]$CROPDMG * 1000000
storm[storm$CROPDMGEXP == "B", ]$CROPDMG <- storm[storm$CROPDMGEXP == "B", ]$CROPDMG * 1000000000
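
The conversion can also be sketched with a single multiplier table instead of one statement per code. The example below uses made-up values; `toy`, `multiplier`, `factor_by_row` and the new column `PROPDMG_USD` are hypothetical names. Like the statements above, it leaves any other code (blank, digits, "h", "?" and so on) unscaled, which is a simplification rather than a rule taken from the cookbook.

```r
# Toy damage figures and magnitude codes (made-up values).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 3),
  PROPDMGEXP = c("K", "M", "B", "", "m"),
  stringsAsFactors = FALSE
)

# Multiplier for each recognised code.
multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Find the position of each row's code in the table; unknown codes give NA.
# match() also accepts a factor column and compares it as character.
idx <- match(toy$PROPDMGEXP, names(multiplier))
factor_by_row <- unname(multiplier[idx])
factor_by_row[is.na(factor_by_row)] <- 1   # unrecognised codes: no scaling

toy$PROPDMG_USD <- toy$PROPDMG * factor_by_row
toy
```

Writing the result to a new column keeps the original figures intact, so the step can be rerun without multiplying twice.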

Now we can aggregate property and crop damage by event type and rank them in decreasing order, the same way we did before.

damage <- aggregate(PROPDMG ~ EVTYPE, data = storm, sum)
damage1 <- damage[damage$PROPDMG > 0, ]
damageorder <- damage1[order(damage1$PROPDMG, decreasing = TRUE), ]
head(damageorder)
##                EVTYPE      PROPDMG
## 154             FLOOD 149776655307
## 371 HURRICANE-TYPHOON  81174159010
## 755           TORNADO  56937160779
## 597       STORM SURGE  43323536000
## 138       FLASH FLOOD  16140812067
## 212              HAIL  15732267048

We can see that floods are the natural disaster causing the most property damage.

cropdmg <- aggregate(CROPDMG ~ EVTYPE, data = storm, sum)
cropdmg1 <- cropdmg[cropdmg$CROPDMG > 0, ]
cropdmgorder <- cropdmg1[order(cropdmg1$CROPDMG, decreasing = TRUE), ]
head(cropdmgorder)
##                EVTYPE     CROPDMG
## 84            DROUGHT 13972566000
## 154             FLOOD 10691427450
## 371 HURRICANE-TYPHOON  5349782800
## 386         ICE STORM  5022113500
## 212              HAIL  3025954473
## 138       FLASH FLOOD  1421317100

Unsurprisingly, drought comes first as the natural cause most damaging to crops.

We can visualize this better in graph form.

barplot(damageorder[1:10, 2], col = heat.colors(10), legend.text = damageorder[1:10, 1], ylab = "Dollars of damage", main = "Top 10 Natural Causes of Property Damage")

barplot(cropdmgorder[1:10, 2], col = heat.colors(10), legend.text = cropdmgorder[1:10, 1], ylab = "Dollars of Crop damage", main = "Top 10 Natural Causes of Crop Damage")

We can see that the two rankings have a different order. What if we added them together?

totaldmg <- merge(damageorder, cropdmgorder, by = "EVTYPE")
totaldmg$total = totaldmg$PROPDMG + totaldmg$CROPDMG
totaldmgorder <- totaldmg[order(totaldmg$total, decreasing = TRUE), ]
totaldmgorder[1:5, ]
##               EVTYPE      PROPDMG     CROPDMG        total
## 19             FLOOD 149776655307 10691427450 160468082757
## 52 HURRICANE-TYPHOON  81174159010  5349782800  86523941810
## 82           TORNADO  56937160779   414953270  57352114049
## 66       STORM SURGE  43323536000        5000  43323541000
## 31              HAIL  15732267048  3025954473  18758221521
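
One caveat: by default `merge` keeps only the event types present in both tables, so an event with property damage but no crop damage (or the reverse) drops out of the combined ranking. A minimal sketch of an alternative, on made-up numbers and with hypothetical names (`prop`, `crop`, `both`, `ranked`), keeps every event and treats the missing side as zero:

```r
# Toy per-event totals (made-up numbers); the two tables do not
# list exactly the same event types.
prop <- data.frame(EVTYPE  = c("FLOOD", "TORNADO", "STORM SURGE"),
                   PROPDMG = c(150, 57, 43))
crop <- data.frame(EVTYPE  = c("FLOOD", "DROUGHT", "TORNADO"),
                   CROPDMG = c(11, 14, 0.4))

# all = TRUE keeps events found in only one table; the other side is NA.
both <- merge(prop, crop, by = "EVTYPE", all = TRUE)

# Treat a missing damage figure as zero before adding.
both$PROPDMG[is.na(both$PROPDMG)] <- 0
both$CROPDMG[is.na(both$CROPDMG)] <- 0

both$total <- both$PROPDMG + both$CROPDMG
ranked <- both[order(both$total, decreasing = TRUE), ]
ranked
```

Whether this changes the top five for the real data is not checked here.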

That is a lot of damage. We have determined the most harmful natural disasters in terms of death, injury, and economic consequences. It would be interesting to see how that damage has changed over time; my guess is that it will increase with global warming, making these events more dangerous than ever.