This report examines the impact of severe weather events in the U.S. on population health and the economy, and outlines the approach and the results. It is the final (week 4) project of the course “Reproducible Research”. The analysis is based on the U.S. National Oceanic and Atmospheric Administration (NOAA) storm database, which records characteristics of severe storms and weather events across the United States, including estimates of fatalities, injuries and property damage.
This report addresses two questions:
1. Which types of events are most harmful to the health of the population in the U.S.?
2. Which types of events have the greatest economic impact in the U.S.?
We found the following results:
For the first question, tornadoes top the list for both deaths and injuries, by a wide margin.
For the second question, floods have the greatest impact on property, and drought has the greatest impact on crops.
The storm event data used in this analysis can be found at Storm Data [47Mb].
The storm data are documented in two accompanying documents, available from the same source.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
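The uneven coverage over time can be checked by counting recorded events per year. The following is a minimal sketch on made-up rows, assuming the raw data carry a `BGN_DATE` column with dates written like `4/18/1950 0:00:00`; the toy data frame and the `year` column are illustrative and not part of the original analysis.

```r
# Toy stand-in for the raw data; the dates and their format are assumptions.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "5/3/2011 0:00:00", "5/3/2011 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part (trailing time text is ignored) and extract the year.
toy$year <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Count recorded events per year.
events_per_year <- table(toy$year)
events_per_year
```

Applied to the full data set, a table like this would show how sparse the early years are compared with recent ones.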
Download and read the storm event data from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and extract only the columns that we need.
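url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
dest_file <- "StormData.csv.bz2"
# Download only once; binary mode keeps the compressed file intact on Windows
if (!file.exists(dest_file)) {
  download.file(url, dest_file, mode = "wb")
}
# read.csv() decompresses the bzip2 file transparently
NOAA_data <- read.csv(dest_file)
# Drop the columns that are not needed (keeps columns 8 and 23 to 28)
Storm_data <- NOAA_data[, -c(1:7, 9:22, 29:37)]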
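Selecting columns by position is compact but breaks silently if the column order changes. A possible alternative is to select by name. The sketch below uses a small made-up data frame and assumes that the columns kept above are the seven fields used later in this report (`EVTYPE`, `FATALITIES`, `INJURIES`, `PROPDMG`, `PROPDMGEXP`, `CROPDMG`, `CROPDMGEXP`); `toy`, `keep` and `toy_subset` are illustrative names.

```r
# Toy stand-in for NOAA_data with two extra columns; all values are made up.
toy <- data.frame(
  STATE = c("AL", "TX"), EVTYPE = c("TORNADO", "FLOOD"),
  FATALITIES = c(0, 1), INJURIES = c(15, 0),
  PROPDMG = c(25, 2.5), PROPDMGEXP = c("K", "M"),
  CROPDMG = c(0, 10), CROPDMGEXP = c("", "K"),
  REFNUM = c(1, 2), stringsAsFactors = FALSE
)

# Columns assumed to be the ones needed for the analysis.
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if an expected column is missing, then subset by name.
stopifnot(all(keep %in% names(toy)))
toy_subset <- toy[, keep]
names(toy_subset)
```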
# Total fatalities per event type, then the ten largest
fatal <- aggregate(FATALITIES ~ EVTYPE, data = Storm_data, sum)
fatal_10 <- fatal[order(-fatal$FATALITIES), ][1:10, ]
fatal_10
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
# Total injuries per event type, then the ten largest
inj <- aggregate(INJURIES ~ EVTYPE, data = Storm_data, sum)
inj_10 <- inj[order(-inj$INJURIES), ][1:10, ]
inj_10
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
The values for fatalities and injuries differ widely. For clarity, we look only at the top 10 event types for each.
par(mfrow = c(1, 2), mar = c(12, 8, 3, 2), mgp = c(4, 1, 0), cex = 0.8)
with(fatal_10, barplot(FATALITIES, las = 2, names.arg = EVTYPE,
                       ylab = 'Number of Fatalities', main = 'Top 10 Highest Fatalities Events',
                       col = 'steelblue', ylim = c(0, 6000)))
with(inj_10, barplot(INJURIES, las = 2, names.arg = EVTYPE,
                     ylab = 'Injuries', main = 'Top 10 Highest Injuries Events',
                     col = 'steelblue'))
Top 10 Fatalities and Injuries events
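To put a number on how far tornadoes lead, one can compute their share of the top-10 totals. This is a small worked sketch using the values printed in the two tables above; note that it is the share among the ten listed event types only, not among all events in the database. The vector names are illustrative.

```r
# Top-10 totals as printed above; the first entry in each vector is TORNADO.
fatal_top10 <- c(5633, 1903, 978, 937, 816, 504, 470, 368, 248, 224)
inj_top10   <- c(91346, 6957, 6789, 6525, 5230, 2100, 1975, 1777, 1488, 1361)

# Tornado share of the top-10 totals.
tornado_share <- c(
  fatalities = fatal_top10[1] / sum(fatal_top10),
  injuries   = inj_top10[1] / sum(inj_top10)
)

# Shares in percent, rounded to one decimal place.
round(100 * tornado_share, 1)
```

Within the top 10, tornadoes account for roughly 47% of the fatalities and 73% of the injuries.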
From the storm data, we can see that there are PROPDMGEXP and CROPDMGEXP fields, which presumably correspond to the PROPDMG and CROPDMG fields, respectively. Unfortunately, we did not find information on how to decode these exponent fields at NOAA or in the course project description. Through the discussion forum and an Internet search, we found a key to decode PROPDMGEXP. We assume that it is correct and use it for both exponent fields in the further analysis: https://github.com/dsong99/Reproducible-Proj-2/blob/master/storm_exp_code.csv
code <- data.frame(
  'Expo' = factor(c('K','k','M','m','B','b','H','h','0','1','2','3','4','5','6','7','8','+','-','?','')),
  'Multiplier' = c(1000,1000,1000000,1000000,1000000000,1000000000,100,100,1,1,1,1,1,1,1,1,1,1,0,0,0)
)
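To illustrate how such a key turns a value and an exponent code into a dollar amount, here is a minimal sketch on made-up records. It uses only the letter codes from the key above, stored as a named vector rather than a data frame; the code `"X"` is deliberately absent from the key to show what happens to an unmatched code. All names (`multiplier`, `toy`, `damage_usd`, `unmatched`) are illustrative.

```r
# Letter codes from the key above, as a named lookup vector.
multiplier <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6,
                B = 1e9, b = 1e9, H = 100, h = 100)

# Toy records (made-up values); "X" is not part of the key.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3),
  PROPDMGEXP = c("K", "M", "B", "X"),
  stringsAsFactors = FALSE
)

# Look up each code by name; codes missing from the key yield NA.
toy$Multiplier <- unname(multiplier[toy$PROPDMGEXP])
toy$damage_usd <- toy$PROPDMG * toy$Multiplier

# List the codes that could not be decoded.
unmatched <- unique(toy$PROPDMGEXP[is.na(toy$Multiplier)])
toy
unmatched
```

Checking for unmatched codes is worthwhile because the formula method of `aggregate()` drops rows with `NA` values by default, so undecoded records would otherwise vanish from the totals without a warning.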
prop <- Storm_data[,c('EVTYPE','PROPDMG','PROPDMGEXP')]
unique(prop$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
# Attach the multiplier for each exponent code and convert to US dollars
prop <- merge(x = prop, y = code, by.x = 'PROPDMGEXP', by.y = 'Expo', all.x = TRUE)
prop$PROPDMG <- prop$PROPDMG * prop$Multiplier
# Total property damage per event type, largest first
prop_damages_analysis <- aggregate(PROPDMG ~ EVTYPE, data = prop, sum)
prop_damages_analysis <- prop_damages_analysis[order(-prop_damages_analysis$PROPDMG), ]
prop_damages_10 <- head(prop_damages_analysis, 10)
prop_damages_10
## EVTYPE PROPDMG
## 170 FLOOD 144657709800
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56937160776
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16140811860
## 244 HAIL 15732267486
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046280
# Express the totals in billions of US dollars
prop_damages_10$PROPDMG <- round(prop_damages_10$PROPDMG / 1e+9, 3)
prop_damages_10
## EVTYPE PROPDMG
## 170 FLOOD 144.658
## 411 HURRICANE/TYPHOON 69.306
## 834 TORNADO 56.937
## 670 STORM SURGE 43.324
## 153 FLASH FLOOD 16.141
## 244 HAIL 15.732
## 402 HURRICANE 11.868
## 848 TROPICAL STORM 7.704
## 972 WINTER STORM 6.688
## 359 HIGH WIND 5.270
crop <- Storm_data[,c('EVTYPE','CROPDMG','CROPDMGEXP')]
unique(crop$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
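# Attach the multiplier for each exponent code and convert to US dollars
crop <- merge(x = crop, y = code, by.x = 'CROPDMGEXP', by.y = 'Expo', all.x = TRUE)
crop$CROPDMG <- crop$CROPDMG * crop$Multiplier
# Total crop damage per event type, then the ten largest
crop_damages <- aggregate(CROPDMG ~ EVTYPE, data = crop, FUN = sum)
crop_damages_10 <- crop_damages[order(-crop_damages$CROPDMG), ][1:10, ]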
crop_damages_10
## EVTYPE CROPDMG
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954470
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
# Express the totals in billions of US dollars
crop_damages_10$CROPDMG <- round(crop_damages_10$CROPDMG / 1e+9, 3)
crop_damages_10
## EVTYPE CROPDMG
## 95 DROUGHT 13.973
## 170 FLOOD 5.662
## 590 RIVER FLOOD 5.029
## 427 ICE STORM 5.022
## 244 HAIL 3.026
## 402 HURRICANE 2.742
## 411 HURRICANE/TYPHOON 2.608
## 153 FLASH FLOOD 1.421
## 140 EXTREME COLD 1.293
## 212 FROST/FREEZE 1.094
par(mfrow = c(1, 2), mar = c(12, 8, 3, 2), mgp = c(4, 1, 0), cex = 0.8)
with(prop_damages_10, barplot(PROPDMG, las = 2, names.arg = EVTYPE,
                              ylab = 'Property Damages in Billions',
                              main = 'Top 10 Highest Property Damages Events', col = 'steelblue'))
with(crop_damages_10, barplot(CROPDMG, las = 2, names.arg = EVTYPE,
                              ylab = 'Crop Damages in Billions',
                              main = 'Top 10 Highest Crop Damages Events', ylim = c(0, 14), col = 'steelblue'))
Top 10 Property and Crop Damages events
This report set out to answer two questions:
1. Which types of events are most harmful to the health of the population in the U.S.?
2. Which types of events have the greatest economic impact in the U.S.?
For the first question, we found that tornadoes top the list for both deaths and injuries, by a wide margin.
For the second question, floods have the greatest impact on property, and drought has the greatest impact on crops.