We analysed US storm data on natural events from 1950 to 2011. The data was downloaded from
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
The National Weather Service Instruction documentation describes the variables and the characteristics of the data.
We read the file and keep only the columns we need:
setwd("~/Desktop/DATA/Reproducible Research")
stormdata <- read.csv("repdata%2Fdata%2FStormData.csv", header = TRUE, sep = ",")
storm <- stormdata[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(storm)
## 'data.frame': 902297 obs. of 7 variables:
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
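As a possible shortcut, read.csv() should be able to read the downloaded .bz2 archive without decompressing it first, because the file() connection it opens detects bzip2 compression when reading. The sketch below checks this on a small temporary file standing in for StormData.csv.bz2; the toy data frame is made up for illustration.

```r
# Sketch: read.csv() reads a bzip2-compressed CSV directly, since the file()
# connection it opens detects bzip2 compression in read mode.
# A small temporary file stands in for StormData.csv.bz2 (assumption).
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "tstm wind"), FATALITIES = c(1, 0),
                  stringsAsFactors = FALSE)

# Write the toy data through a bzip2 connection
con <- bzfile(tmp, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read it back from the compressed file, no manual decompression
back <- read.csv(tmp, header = TRUE, stringsAsFactors = FALSE)
print(back)
```

With the real download this would be `stormdata <- read.csv("repdata%2Fdata%2FStormData.csv.bz2")`, assuming the archive was saved under that name.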
storm$EVTYPE <- toupper(storm$EVTYPE)
Converting the event types to upper case lets us combine entries that differ only in capitalization.
With the data loaded, we can answer questions about which natural events in the US cause the most death, injury and damage.
The documentation (page 6) shows that several event types appear under different names, possibly because the naming changed over the long period the data covers; for example, “TSTM WIND” and “THUNDERSTORM WIND”. We merge these variants:
storm$EVTYPE[storm$EVTYPE == "TSTM WIND"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "THUNDERSTORM WINDS"] <- "THUNDERSTORM WIND"
storm$EVTYPE[storm$EVTYPE == "RIVER FLOOD"] <- "FLOOD"
storm$EVTYPE[storm$EVTYPE == "HURRICANE/TYPHOON"] <- "HURRICANE-TYPHOON"
storm$EVTYPE[storm$EVTYPE == "HURRICANE"] <- "HURRICANE-TYPHOON"
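The replacements above use one statement per alias. A minimal sketch that keeps the aliases in a single named lookup vector instead; the alias list is copied from those statements, and `canonical` and `normalize_evtype` are hypothetical names.

```r
# Hypothetical alias table: names are variants, values are canonical event types.
# Entries copied from the replacements above; others would need checking
# against the NWS documentation.
canonical <- c(
  "TSTM WIND"          = "THUNDERSTORM WIND",
  "THUNDERSTORM WINDS" = "THUNDERSTORM WIND",
  "RIVER FLOOD"        = "FLOOD",
  "HURRICANE/TYPHOON"  = "HURRICANE-TYPHOON",
  "HURRICANE"          = "HURRICANE-TYPHOON"
)

normalize_evtype <- function(x) {
  x <- toupper(as.character(x))            # same upper-casing as above
  hit <- x %in% names(canonical)           # entries that are known aliases
  x[hit] <- unname(canonical[x[hit]])      # swap aliases for canonical names
  x
}

events <- c("tstm wind", "TORNADO", "Hurricane", "RIVER FLOOD")
print(normalize_evtype(events))
```

It would be applied as `storm$EVTYPE <- normalize_evtype(storm$EVTYPE)`; a newly found alias then only needs one more entry in `canonical`.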
Most harmful events to human health
This data set records four types of damage: fatalities (FATALITIES), injuries (INJURIES), property damage (PROPDMG) and crop damage (CROPDMG). The last two must be scaled by their magnitude codes, PROPDMGEXP and CROPDMGEXP; for example, PROPDMG = 25 with PROPDMGEXP = “K” means $25,000. The first two relate directly to human health, so we summarize them first, starting with fatalities by event type:
## EVTYPE FATALITIES
## 755 TORNADO 5633
## 116 EXCESSIVE HEAT 1903
## 138 FLASH FLOOD 978
## 243 HEAT 937
## 417 LIGHTNING 816
## 683 THUNDERSTORM WIND 701
The table above totals the fatalities by event type and ranks them in decreasing order. Tornadoes and excessive heat caused the most fatalities from 1950 to 2011.
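The code behind the fatality and injury tables is not shown. Assuming it follows the same aggregate, filter and order steps as the property-damage code further down, it could be wrapped in one helper; `rank_events` and the toy data are hypothetical.

```r
# Hypothetical helper: total one measure per event type, drop zero totals,
# and sort in decreasing order (same steps as the PROPDMG code later on).
rank_events <- function(df, measure) {
  f <- as.formula(paste(measure, "~ EVTYPE"))          # e.g. FATALITIES ~ EVTYPE
  totals <- aggregate(f, data = df, FUN = sum)
  totals <- totals[totals[[measure]] > 0, ]             # keep non-zero totals
  totals[order(totals[[measure]], decreasing = TRUE), ] # largest first
}

# Made-up toy data with the same column names as `storm`
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HAIL", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 1),
  INJURIES   = c(10, 0, 5, 1, 2),
  stringsAsFactors = FALSE
)

fatal <- rank_events(toy, "FATALITIES")  # HAIL dropped: zero fatalities
inj   <- rank_events(toy, "INJURIES")
print(fatal)
print(inj)
```

On the real data this would read `fatalorder <- rank_events(storm, "FATALITIES")` and `injuryorder <- rank_events(storm, "INJURIES")`.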
We create the same summary for injuries:
## EVTYPE INJURIES
## 755 TORNADO 91346
## 683 THUNDERSTORM WIND 9353
## 154 FLOOD 6791
## 116 EXCESSIVE HEAT 6525
## 417 LIGHTNING 5230
## 243 HEAT 2100
Tornadoes again come first by a wide margin, followed by thunderstorm wind, flood and excessive heat. This is consistent with the fatality ranking, since events that injure many people are also likely to kill some. The injury summary is built the same way as the fatality summary, and the same structure is used below to find the top causes of economic damage.
Bar plots of the top 10 events make the comparison easier to see.
barplot(fatalorder[1:10, 2], col = heat.colors(10), legend.text = fatalorder[1:10, 1], ylab = "Number of fatalities", main = "Top 10 Natural Causes of Fatalities")
barplot(injuryorder[1:10, 2], col = heat.colors(10), legend.text = injuryorder[1:10, 1], ylab = "Number of injuries", main = "Top 10 Natural Causes of Injuries")
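With ten bars, the colour legend can be hard to match to the bars. One alternative, sketched here with a hypothetical three-row table in the same layout as `fatalorder`, writes the event names under the bars with `names.arg` and rotates them with `las = 2`. The counts are made up, and `pdf(NULL)` is only there so the sketch runs without a screen device.

```r
# Hypothetical stand-in for fatalorder: column 1 = EVTYPE, column 2 = count
toporder <- data.frame(EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
                       FATALITIES = c(50, 20, 10), stringsAsFactors = FALSE)

pdf(NULL)                       # null device: nothing is written to disk
op <- par(mar = c(9, 4, 3, 1))  # wider bottom margin for vertical labels
mids <- barplot(toporder$FATALITIES,
                names.arg = toporder$EVTYPE,  # label each bar directly
                las = 2,                      # rotate axis labels to vertical
                cex.names = 0.7,
                col = heat.colors(nrow(toporder)),
                ylab = "Number of fatalities",
                main = "Top natural causes of fatalities")
par(op)                         # restore the previous margins
invisible(dev.off())
```

In the report itself, the `pdf(NULL)` and `dev.off()` lines would be dropped so the plot is drawn as usual.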
We can check which events rank in the top 10 for both fatalities and injuries.
intersect(fatalorder[1:10, 1], injuryorder[1:10, 1])
## [1] "TORNADO" "EXCESSIVE HEAT" "FLASH FLOOD"
## [4] "HEAT" "LIGHTNING" "THUNDERSTORM WIND"
## [7] "FLOOD"
Seven events appear in both top-10 lists. Tornadoes are the most harmful event to human health, while excessive heat, flash flood and thunderstorm wind, among others, also rank highly on both measures.
Now we look at the economic damage caused by these natural events.
First we list the magnitude codes used in the two exponent columns:
unique(storm$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
unique(storm$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
The documentation (page 12) explains how to read these codes: “K” stands for thousands, “M” for millions and “B” for billions. Because the codes appear in both upper and lower case (for example “m” and “M”), both forms must be handled, and each code has to be turned into a multiplier to obtain actual dollar amounts. Codes not covered there (such as “?”, “+”, “h” or digits) are left unscaled below.
# Scale property damage by its magnitude code
storm$PROPDMG[storm$PROPDMGEXP == "K"] <- storm$PROPDMG[storm$PROPDMGEXP == "K"] * 1e3
storm$PROPDMG[storm$PROPDMGEXP == "M"] <- storm$PROPDMG[storm$PROPDMGEXP == "M"] * 1e6
storm$PROPDMG[storm$PROPDMGEXP == "m"] <- storm$PROPDMG[storm$PROPDMGEXP == "m"] * 1e6
storm$PROPDMG[storm$PROPDMGEXP == "B"] <- storm$PROPDMG[storm$PROPDMGEXP == "B"] * 1e9
# Scale crop damage by its magnitude code
storm$CROPDMG[storm$CROPDMGEXP == "K"] <- storm$CROPDMG[storm$CROPDMGEXP == "K"] * 1e3
storm$CROPDMG[storm$CROPDMGEXP == "k"] <- storm$CROPDMG[storm$CROPDMGEXP == "k"] * 1e3
storm$CROPDMG[storm$CROPDMGEXP == "M"] <- storm$CROPDMG[storm$CROPDMGEXP == "M"] * 1e6
storm$CROPDMG[storm$CROPDMGEXP == "m"] <- storm$CROPDMG[storm$CROPDMGEXP == "m"] * 1e6
storm$CROPDMG[storm$CROPDMGEXP == "B"] <- storm$CROPDMG[storm$CROPDMGEXP == "B"] * 1e9
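The nine statements above can also be written as a single lookup from magnitude code to multiplier. This sketch treats upper and lower case alike and, like those statements, leaves every undocumented code unscaled (multiplier 1); `multiplier` and the sample values are hypothetical.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Only K/M/B are documented; any other code ("", "?", "+", digits, "h")
# gets multiplier 1, matching the explicit statements above (assumption).
multiplier <- function(codes) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  code <- toupper(as.character(codes))  # "m" and "M" treated the same
  m <- unname(lookup[code])             # NA for codes not in the table
  m[is.na(m)] <- 1
  m
}

dmg   <- c(25, 2.5, 1.5, 3)
codes <- c("K", "m", "B", "?")
print(dmg * multiplier(codes))
```

Applied as `storm$PROPDMG <- storm$PROPDMG * multiplier(storm$PROPDMGEXP)` (and likewise for CROPDMG) instead of the individual statements, it scales the same codes on this data, since the unique() listings show no lower-case “k” or “b” in PROPDMGEXP and no “b” in CROPDMGEXP.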
Now we can aggregate property and crop damage by event type and sort them in decreasing order, the same way we did before.
damage <- aggregate(PROPDMG ~ EVTYPE, data = storm, sum)
damage1 <- damage[damage$PROPDMG > 0, ]
damageorder <- damage1[order(damage1$PROPDMG, decreasing = TRUE), ]
head(damageorder)
## EVTYPE PROPDMG
## 154 FLOOD 149776655307
## 371 HURRICANE-TYPHOON 81174159010
## 755 TORNADO 56937160779
## 597 STORM SURGE 43323536000
## 138 FLASH FLOOD 16140812067
## 212 HAIL 15732267048
Floods cause the most property damage, about $150 billion in total.
cropdmg <- aggregate(CROPDMG ~ EVTYPE, data = storm, sum)
cropdmg1 <- cropdmg[cropdmg$CROPDMG > 0, ]
cropdmgorder <- cropdmg1[order(cropdmg1$CROPDMG, decreasing = TRUE), ]
head(cropdmgorder)
## EVTYPE CROPDMG
## 84 DROUGHT 13972566000
## 154 FLOOD 10691427450
## 371 HURRICANE-TYPHOON 5349782800
## 386 ICE STORM 5022113500
## 212 HAIL 3025954473
## 138 FLASH FLOOD 1421317100
Unsurprisingly, drought causes the most crop damage, followed by flood.
Again, bar plots make the comparison easier to see.
barplot(damageorder[1:10, 2], col = heat.colors(10), legend.text = damageorder[1:10, 1], ylab = "Property damage (USD)", main = "Top 10 Natural Causes of Property Damage")
barplot(cropdmgorder[1:10, 2], col = heat.colors(10), legend.text = cropdmgorder[1:10, 1], ylab = "Crop damage (USD)", main = "Top 10 Natural Causes of Crop Damage")
The two rankings differ, so we add property and crop damage together to get the total economic damage per event type.
totaldmg <- merge(damageorder, cropdmgorder, by = "EVTYPE")
totaldmg$total = totaldmg$PROPDMG + totaldmg$CROPDMG
totaldmgorder <- totaldmg[order(totaldmg$total, decreasing = TRUE), ]
totaldmgorder[1:5, ]
## EVTYPE PROPDMG CROPDMG total
## 19 FLOOD 149776655307 10691427450 160468082757
## 52 HURRICANE-TYPHOON 81174159010 5349782800 86523941810
## 82 TORNADO 56937160779 414953270 57352114049
## 66 STORM SURGE 43323536000 5000 43323541000
## 31 HAIL 15732267048 3025954473 18758221521
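merge() performs an inner join by default, so `totaldmg` only contains events with both property and crop damage greater than zero. A sketch of a full outer join with `all = TRUE`, using hypothetical amounts, keeps every event and counts missing damage as zero.

```r
# Made-up amounts in the same layout as damageorder and cropdmgorder
prop <- data.frame(EVTYPE = c("FLOOD", "STORM SURGE", "TSUNAMI"),
                   PROPDMG = c(100, 40, 5), stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"),
                   CROPDMG = c(10, 30), stringsAsFactors = FALSE)

inner <- merge(prop, crop, by = "EVTYPE")              # only FLOOD survives
full  <- merge(prop, crop, by = "EVTYPE", all = TRUE)  # all four events kept
full[is.na(full)] <- 0                                 # absent damage counts as 0
full$total <- full$PROPDMG + full$CROPDMG
full <- full[order(full$total, decreasing = TRUE), ]
print(full)
```

From the figures printed above, an event with only one type of damage would need more than about $18.8 billion of it to reach the top five, and no event missing from the merged table has that much, so the ranking shown should not change; the outer join matters mainly for events further down the list.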
That is a lot of damage: floods alone account for about $160 billion. We have identified the natural events most harmful to human health (tornadoes above all) and those with the greatest economic consequences (floods, hurricanes/typhoons and tornadoes). It would be interesting to see how this damage has changed over time; my guess is that it has grown with global warming and that these events are more dangerous than ever.