author: angelayuan
date: Friday, March 20, 2015
In this report we investigate which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences across the U.S. To this end, we obtained Storm Data from the National Climatic Data Center (NCDC), which regularly receives Storm Data from the National Weather Service (NWS). Summing over all recorded events across the U.S., we found that (1) tornadoes are the most harmful with respect to population health, causing the most fatalities and injuries; and (2) floods have the greatest economic consequences, causing the largest combined property and crop damage.
We downloaded the Storm Data as a bzip2-compressed CSV file (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2).
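The download step itself is not shown in the report. A minimal sketch, assuming the file should only be fetched when no local copy exists; the helper name `fetch_storm_data` is hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not from the original report): download the
# compressed Storm Data file only if a local copy does not already exist.
fetch_storm_data <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the binary .bz2 file intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  destfile
}

storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# Usage (performs a network download on the first run only):
# fetch_storm_data(storm_url, "repdata-data-StormData.csv.bz2")
```

Skipping the download when the file is present keeps repeated knits of the report fast and avoids re-fetching a large file.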
We read in the data and check the first few rows.
data <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
head(data)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
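The sorting step below uses a table `data2` whose construction is not shown in this section. A minimal sketch, assuming `data2` holds the total FATALITIES and INJURIES per EVTYPE: `aggregate()` with a list passed to `by` names the grouping column `Group.1`, which matches the column name in the outputs below. The helper name `summarise_health` and the toy values are illustrative only.

```r
# Sketch (assumption): total fatalities and injuries per event type.
# aggregate() names the grouping column "Group.1" when `by` is an unnamed list.
summarise_health <- function(data) {
  aggregate(data[, c("FATALITIES", "INJURIES")],
            by = list(data$EVTYPE), FUN = sum)
}

# Toy input with the same column names as the Storm Data (illustrative values)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 4, 5, 7)
)
data2_toy <- summarise_health(toy)
data2_toy

# On the full data set this would be: data2 <- summarise_health(data)
```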
Here we make plots to address the two questions: across the U.S., (1) which types of events are most harmful with respect to population health, and (2) which types of events have the greatest economic consequences?
We first sort the aggregated data by fatalities in decreasing order (breaking ties by injuries), and separately by injuries in decreasing order (breaking ties by fatalities).
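library(dplyr)  # provides arrange(), desc() and mutate()
# data2 holds one row per event type (Group.1) with total FATALITIES and INJURIES
fat <- arrange(data2, desc(FATALITIES), desc(INJURIES))
inj <- arrange(data2, desc(INJURIES), desc(FATALITIES))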
head(fat)
## Group.1 FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 EXCESSIVE HEAT 1903 6525
## 3 FLASH FLOOD 978 1777
## 4 HEAT 937 2100
## 5 LIGHTNING 816 5230
## 6 TSTM WIND 504 6957
head(inj)
## Group.1 FATALITIES INJURIES
## 1 TORNADO 5633 91346
## 2 TSTM WIND 504 6957
## 3 FLOOD 470 6789
## 4 EXCESSIVE HEAT 1903 6525
## 5 LIGHTNING 816 5230
## 6 HEAT 937 2100
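# Share of all fatalities / injuries due to the 10 highest-ranked event types
sum(fat$FATALITIES[1:10]) / sum(fat$FATALITIES)
## [1] 0.797689
sum(inj$INJURIES[1:10]) / sum(inj$INJURIES)
## [1] 0.893402
# Share due to the single highest-ranked event type (TORNADO in both rankings)
fat$FATALITIES[1] / sum(fat$FATALITIES)
## [1] 0.3719379
inj$INJURIES[1] / sum(inj$INJURIES)
## [1] 0.6500199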
We can see that the 10 event types with the most fatalities account for 79.8% of all fatalities, and the 10 event types with the most injuries account for 89.3% of all injuries. Moreover, the single most harmful event type, TORNADO, caused 37.2% of all fatalities and 65.0% of all injuries. These results indicate that preventing and mitigating harmful events, especially tornadoes, should be a priority.
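The four share computations above follow one pattern: the fraction of a total contributed by the k largest values. A small sketch of that pattern as a function; the name `top_k_share` is hypothetical. It sorts its input itself, so it does not rely on the table having been arranged first.

```r
# Hypothetical helper: fraction of sum(x) contributed by the k largest values.
top_k_share <- function(x, k = 10) {
  x_sorted <- sort(x, decreasing = TRUE)
  # head() returns all elements if k exceeds length(x)
  sum(head(x_sorted, k)) / sum(x)
}

# Usage on the report's tables (equivalent to the expressions above):
# top_k_share(fat$FATALITIES, 10)
# top_k_share(inj$INJURIES, 1)
```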
We plot the 10 event types with the most fatalities and the 10 with the most injuries in separate bar charts.
# Vertical axis labels (las = 2) and wide margins leave room for event names
par(mar = c(10, 10, 2, 2), las = 2)
barplot(height = fat$FATALITIES[1:10] / 1000, names.arg = fat$Group.1[1:10],
        col = heat.colors(10), main = "Top 10 Harmful Events Cause Most Fatalities",
        ylab = "Total number of fatalities (thousand persons)")
par(mar = c(10, 8, 2, 2), las = 2)
barplot(height = inj$INJURIES[1:10] / 1000, names.arg = inj$Group.1[1:10],
        col = heat.colors(10), main = "Top 10 Harmful Events Cause Most Injuries",
        ylab = "Total number of injuries (thousand persons)")
From the results above, we conclude that tornadoes are the most harmful event type with respect to population health.
Next, we sort the aggregated damage data by property damage in decreasing order, and separately by crop damage in decreasing order.
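The next step uses a table `data4` whose construction is not shown in this section. Its values are in dollars, whereas the raw PROPDMG and CROPDMG columns are small numbers paired with codes such as `K`, so the raw values appear to have been scaled by PROPDMGEXP and CROPDMGEXP before aggregation. A minimal sketch, assuming the mapping H = 10^2, K = 10^3, M = 10^6, B = 10^9 (case-insensitive) and treating every other code as a multiplier of 1; the helper names and toy values are hypothetical.

```r
# Assumed mapping from exponent codes to multipliers; other codes -> 1.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[code]  # unmatched codes give NA
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Sketch: dollar-valued property and crop damage totals per event type.
summarise_damage <- function(data) {
  dmg <- data.frame(
    PROPDMG = data$PROPDMG * exp_multiplier(data$PROPDMGEXP),
    CROPDMG = data$CROPDMG * exp_multiplier(data$CROPDMGEXP)
  )
  aggregate(dmg, by = list(data$EVTYPE), FUN = sum)
}

# Toy input with the same column names as the Storm Data (illustrative values)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "DROUGHT", "FLOOD"),
  PROPDMG    = c(2.5, 0, 1),
  PROPDMGEXP = c("B", "", "M"),
  CROPDMG    = c(0, 3, 5),
  CROPDMGEXP = c("", "M", "k")
)
data4_toy <- summarise_damage(toy)
```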
prop <- arrange(data4,desc(PROPDMG),desc(CROPDMG))
crop <- arrange(data4,desc(CROPDMG),desc(PROPDMG))
head(prop)
## Group.1 PROPDMG CROPDMG
## 1 FLOOD 144657709807 5661968450
## 2 HURRICANE/TYPHOON 69305840000 2607872800
## 3 TORNADO 56937160779 414953270
## 4 STORM SURGE 43323536000 5000
## 5 FLASH FLOOD 16140812067 1421317100
## 6 HAIL 15732267048 3025954473
head(crop)
## Group.1 PROPDMG CROPDMG
## 1 DROUGHT 1046106000 13972566000
## 2 FLOOD 144657709807 5661968450
## 3 RIVER FLOOD 5118945500 5029459000
## 4 ICE STORM 3944927860 5022113500
## 5 HAIL 15732267048 3025954473
## 6 HURRICANE 11868319010 2741910000
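# Share of all property / crop damage due to the 10 highest-ranked event types
sum(prop$PROPDMG[1:10]) / sum(prop$PROPDMG)
## [1] 0.8837154
sum(crop$CROPDMG[1:10]) / sum(crop$CROPDMG)
## [1] 0.8526812
# Share due to the single highest-ranked event type (FLOOD / DROUGHT)
prop$PROPDMG[1] / sum(prop$PROPDMG)
## [1] 0.3385242
crop$CROPDMG[1] / sum(crop$CROPDMG)
## [1] 0.2845494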
We can see that the 10 event types with the most property damage account for 88.4% of all property damage, and the 10 event types with the most crop damage account for 85.3% of all crop damage. Moreover, the single most damaging event type accounts for 33.9% of property damage (FLOOD) and 28.5% of crop damage (DROUGHT).
We then compute the total economic damage of each event type as the sum of its property and crop damage.
data5 <- mutate(data4, DMG = PROPDMG + CROPDMG)
data5 <- arrange(data5,desc(DMG))
head(data5)
## Group.1 PROPDMG CROPDMG DMG
## 1 FLOOD 144657709807 5661968450 150319678257
## 2 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3 TORNADO 56937160779 414953270 57352114049
## 4 STORM SURGE 43323536000 5000 43323541000
## 5 HAIL 15732267048 3025954473 18758221521
## 6 FLASH FLOOD 16140812067 1421317100 17562129167
From the results above, floods cause the largest total damage. We plot the 10 event types with the greatest total damage as follows.
par(mar = c(11, 10, 3, 3), las = 2, mfrow = c(1, 1))
barplot(height = data5$DMG[1:10] / 1e9, names.arg = data5$Group.1[1:10],
        col = heat.colors(10), main = "Top 10 Harmful Events Cause the Greatest Economic Loss",
        ylab = "Total economic loss (billion dollars)")
From the results above, we conclude that floods have the greatest economic consequences.