This analysis is performed as part of a peer-graded assignment for the Coursera course ‘Reproducible Research’.
The objective of this analysis is to answer two questions about severe weather events: (1) Which types of severe weather events are most harmful to population health? (2) Which types of severe weather events have the greatest economic consequences?
The questions are answered using a subset of the National Oceanic and Atmospheric Administration’s (NOAA) Storm Database, covering events in the US from 1950 to 2011. Impacts on human health are measured as the total fatalities and injuries caused by severe weather, and economic consequences as the total cost of property and crop damage.
Results: From 1950 to 2011, tornadoes had by far the greatest impact on human health, in terms of both fatalities and injuries. The weather types with the biggest economic consequences are floods and hurricanes for property damage, and drought and floods for crop damage.
Load the data in R as ‘StormData’, without first decompressing:
library(dplyr)  # provides mutate(), arrange(), slice() and %>% used below
if (!exists("StormData")) {
  StormData <- read.csv("repdata_data_StormData.csv.bz2")
}
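As a small check of the claim that no separate decompression step is needed, the following sketch writes a toy data frame to a temporary bzip2-compressed CSV and reads it back with `read.csv`. The toy values and the temporary file are illustrative assumptions, not part of the storm dataset.

```r
# Toy data standing in for the storm file (values are made up for illustration)
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "TORNADO"),
                  FATALITIES = c(2, 0, 5))

# Write it to a temporary bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the compressed file directly
back <- read.csv(tmp, stringsAsFactors = FALSE)
unlink(tmp)
back
```

The same mechanism is what lets the real `.csv.bz2` file be passed straight to `read.csv`.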
We don’t need most of the geographical or temporal information, so we remove it from the dataset and call the reduced dataset ‘Storm’.
Storm<-StormData[-c(1:6,9:20,29:37)]
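Dropping columns by position works only as long as the column order of the file never changes. A possible alternative, sketched here on a toy data frame, is to keep columns by name. The `keep` vector is an assumption about which columns the later steps rely on; it is not the exact set retained by the index-based call above.

```r
# Toy stand-in for StormData using a few of the real column names
toy <- data.frame(STATE = "TX", BGN_DATE = "4/18/1950", EVTYPE = "TORNADO",
                  FATALITIES = 0, INJURIES = 15,
                  PROPDMG = 25, PROPDMGEXP = "K",
                  CROPDMG = 0, CROPDMGEXP = "", REMARKS = "",
                  stringsAsFactors = FALSE)

# Columns used by the health and damage questions (assumed list)
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Selecting by name does not depend on the column order in the file
reduced <- toy[, keep]
names(reduced)
```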
Exploring how many event types there are:
length(unique(Storm$EVTYPE))
## [1] 985
We will clean this column, as it is crucial for our research questions. We notice that quite a few of the event types contain numbers, which should all be removed since the official event types do not contain numbers:
#remove numbers
Storm$EVTYPE<-gsub('[[:digit:]]+', '', Storm$EVTYPE)
# replace each run of blank or punctuation characters with a single space
Storm$EVTYPE <- gsub("[[:blank:][:punct:]]+", " ", Storm$EVTYPE)
#remove leading & trailing white spaces
Storm$EVTYPE<-trimws(Storm$EVTYPE)
# Replace a number of strings that don't appear in the list of 48 official event types on https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.
# Punctuation has already been replaced by spaces, so the patterns use spaces too.
orig_event_types <- c("RIP CURRENTS", "TSTM WIND", "EXTREME COLD WIND CHILL", "EXTREME HEAT", "HURRICANE TYPHOON")
new_event_types <- c("RIP CURRENT", "THUNDERSTORM WIND", "EXTREME COLD", "EXCESSIVE HEAT", "HURRICANE")
for (i in seq_along(orig_event_types)) {
  Storm$EVTYPE <- gsub(orig_event_types[i], new_event_types[i], Storm$EVTYPE, ignore.case = TRUE)
}
Storm$EVTYPE<-gsub(".*Hurricane.*","HURRICANE",Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE<-gsub(".*Tornado.*","TORNADO",Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE <- gsub("TH.*WIND.*", "THUNDERSTORM WIND", Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE <- gsub("HI.*WIND.*", "HIGH WIND", Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE<-gsub(".*HE.*SNOW.*","HEAVY SNOW",Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE<-gsub(".*HE.*RAIN.*","HEAVY RAIN",Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE<-gsub(".*FLOOD.*","FLOOD",Storm$EVTYPE,ignore.case = TRUE)
Storm$EVTYPE<-gsub(".*BLIZZ.*","BLIZZARD",Storm$EVTYPE,ignore.case = TRUE)
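To see what the normalisation steps do, and to find labels that still need a manual mapping, the sketch below runs them on a few toy labels and compares the result with a handful of official names. The `raw` labels and the shortened `official` vector are assumptions chosen for illustration, and `toupper()` is used here in place of the `ignore.case = TRUE` matching above.

```r
# Illustrative raw labels (made up for this sketch)
raw <- c("TSTM WIND/HAIL", " Flash Flood 12 ", "RIP CURRENTS", "blizzard?")

clean <- gsub("[[:digit:]]+", "", raw)              # drop digits
clean <- gsub("[[:blank:][:punct:]]+", " ", clean)  # blank/punctuation runs -> one space
clean <- toupper(trimws(clean))                     # trim, then use a single case

# A few of the official event type names, upper-cased (not the full list of 48)
official <- c("THUNDERSTORM WIND", "FLASH FLOOD", "RIP CURRENT", "BLIZZARD")

# Labels that are still not official and need a replacement rule
unmatched <- setdiff(clean, official)
unmatched
```

Running a check like this after each round of replacements gives a shrinking list of labels left to handle.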
Clean the PROPDMGEXP and CROPDMGEXP columns:
unique(Storm$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(Storm$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
# replace H/h, K/k, M/m and B/b with their powers of ten; map "-", "+" and "?" to 1
orig_exp_values <- c("H", "K", "M", "B", "\\-", "\\+", "\\?")
new_exp_values <- c(2, 3, 6, 9, 1, 1, 1)
for (i in seq_along(orig_exp_values)) {
  Storm$PROPDMGEXP <- gsub(orig_exp_values[i], new_exp_values[i], Storm$PROPDMGEXP, ignore.case = TRUE)
  Storm$CROPDMGEXP <- gsub(orig_exp_values[i], new_exp_values[i], Storm$CROPDMGEXP, ignore.case = TRUE)
}
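The same mapping can also be written as a small vectorised helper, which makes the treatment of each code explicit. `exp_to_power` is a hypothetical name introduced for this sketch, and treating a blank code as a power of 0 is an assumption; the loop above leaves blank codes unchanged.

```r
# Hypothetical helper: convert an exponent code to a power of ten
exp_to_power <- function(code) {
  code <- toupper(code)
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)
  power <- suppressWarnings(as.numeric(code))      # digit codes are used as-is
  is_letter <- code %in% names(letters_map)
  power[is_letter] <- unname(letters_map[code[is_letter]])
  power[code %in% c("-", "+", "?")] <- 1           # same choice as the loop above
  power[code == ""] <- 0                           # assumption: blank means no multiplier
  power
}

exp_to_power(c("K", "m", "", "5", "?"))
```

For the five codes in the call, the helper returns the powers 3, 6, 0, 5 and 1.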
Calculate the actual property and crop damage and store them as two new columns in the Storm data frame:
Storm<-mutate(Storm,PropertyDamage=PROPDMG*10^as.numeric(PROPDMGEXP),
CropDamage=CROPDMG*10^as.numeric(CROPDMGEXP))
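A short worked example of the formula, damage = PROPDMG × 10^PROPDMGEXP, on toy rows. It also shows one consequence worth knowing: a blank exponent converts to `NA`, and the formula interface of `aggregate()` drops `NA` rows by default. The toy values are assumptions for illustration.

```r
toy <- data.frame(EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
                  PROPDMG = c(25, 10, 2.5),
                  PROPDMGEXP = c("3", "", "6"),
                  stringsAsFactors = FALSE)

# 25 * 10^3 = 25000; the blank exponent gives NA; 2.5 * 10^6 = 2500000
toy$PropertyDamage <- toy$PROPDMG * 10^as.numeric(toy$PROPDMGEXP)

# The NA row is omitted, so FLOOD sums to 25000 rather than NA
totals <- aggregate(PropertyDamage ~ EVTYPE, toy, sum)
totals
```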
# total fatalities per event type
fatal_events<-aggregate(FATALITIES~EVTYPE,Storm,sum)
#list the top 10 most fatal events
top_fatal_events <- fatal_events %>% arrange(desc(FATALITIES)) %>% slice(1:10)
top_fatal_events
##               EVTYPE FATALITIES
## 1          HURRICANE       5796
## 2     EXCESSIVE HEAT       1999
## 3              FLOOD       1524
## 4               HEAT        937
## 5          LIGHTNING        817
## 6  THUNDERSTORM WIND        710
## 7        RIP CURRENT        572
## 8          HIGH WIND        293
## 9       EXTREME COLD        285
## 10         AVALANCHE        224
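The aggregate-then-rank pattern used for each of these tables can be tried on a toy data frame. This sketch uses base R `order()` and `head()` instead of the dplyr pipeline, purely to keep it dependency-free; the toy values are made up.

```r
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "FLOOD", "LIGHTNING", "HAIL", "FLOOD"),
                  FATALITIES = c(2, 0, 3, 1, 0, 4),
                  stringsAsFactors = FALSE)

# Sum fatalities within each event type
totals <- aggregate(FATALITIES ~ EVTYPE, toy, sum)

# Sort in descending order and keep the top two rows
top <- head(totals[order(totals$FATALITIES, decreasing = TRUE), ], 2)
top
```

Here FLOOD (9 fatalities) ranks first and LIGHTNING (1) second, with HAIL (0) cut off.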
#plot
par(mar=c(5,8,4,2))
par(oma=c(8,1,3,3))
barplot(top_fatal_events$FATALITIES,names.arg=top_fatal_events$EVTYPE,las=2,main="10 most fatal event types",ylab="number of fatalities")
# total injuries per event type
injury_events<-aggregate(INJURIES~EVTYPE,Storm,sum)
#list the 10 event types with most injuries
top_injury_events <-injury_events %>% arrange(desc(INJURIES)) %>% slice(1:10)
top_injury_events
##               EVTYPE INJURIES
## 1          HURRICANE    92735
## 2  THUNDERSTORM WIND     9469
## 3              FLOOD     8604
## 4     EXCESSIVE HEAT     6680
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8          HIGH WIND     1471
## 9               HAIL     1361
## 10      WINTER STORM     1321
#plot
par(mar=c(5,8,4,2))
par(oma=c(8,1,3,3))
barplot(top_injury_events$INJURIES,names.arg=top_injury_events$EVTYPE,las=2,main="10 event types causing most injuries",ylab="number of injuries")
PropertyDamage_events<-aggregate(PropertyDamage~EVTYPE,Storm,sum)
#list the 10 event types with most property damage
top_PropertyDamage_events <-PropertyDamage_events %>% arrange(desc(PropertyDamage)) %>% slice(1:10)
top_PropertyDamage_events
##               EVTYPE PropertyDamage
## 1              FLOOD   168190218789
## 2          HURRICANE   143359498474
## 3        STORM SURGE    43323536000
## 4               HAIL    15735819456
## 5  THUNDERSTORM WIND     9970370523
## 6     TROPICAL STORM     7703890550
## 7       WINTER STORM     6688497251
## 8          HIGH WIND     6003356490
## 9           WILDFIRE     4765114000
## 10  STORM SURGE TIDE     4641188000
par(mar=c(5,5,4,2))
par(oma=c(5,1,3,3))
# Plot for Property Damage
barplot(top_PropertyDamage_events$PropertyDamage,names.arg=top_PropertyDamage_events$EVTYPE,las=2,main="10 event types causing most property damage")
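Totals of this size are hard to read on a plot axis. One option, sketched below with three of the totals from the table above, is to rescale to billions of dollars before plotting. The vector `damage` and the rounding to one decimal are choices made for this sketch.

```r
# Three totals copied from the property damage table above (in dollars)
damage <- c("FLOOD" = 168190218789,
            "STORM SURGE" = 43323536000,
            "HAIL" = 15735819456)

# Rescale to billions of dollars for more readable axis labels
damage_bn <- round(damage / 1e9, 1)
damage_bn
```

The rescaled vector could then be passed to `barplot()` with a label such as `ylab = "property damage (billion USD)"`.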
CropDamage_events<-aggregate(CropDamage~EVTYPE,Storm,sum)
#list the 10 event types with most crop damage
top_CropDamage_events <-CropDamage_events %>% arrange(desc(CropDamage)) %>% slice(1:10)
top_CropDamage_events
##               EVTYPE  CropDamage
## 1            DROUGHT 13972566000
## 2              FLOOD 12379706100
## 3          HURRICANE  5932754320
## 4          ICE STORM  5022113500
## 5               HAIL  3026044470
## 6       EXTREME COLD  1293023000
## 7  THUNDERSTORM WIND  1224408980
## 8       FROST FREEZE  1094086000
## 9         HEAVY RAIN   795752800
## 10         HIGH WIND   686301900
# Plot for Crop Damage
#par(mar=c(5,5,4,2))
#par(oma=c(5,1,3,3))
# barplot(top_CropDamage_events$CropDamage,names.arg=top_CropDamage_events$EVTYPE,las=2,main="10 event types causing most crop damage")
When looking at the impact of severe weather on public health, we distinguish between the impact on fatalities and on injuries. The weather types that caused the most fatalities between 1950 and 2011 are tornadoes, excessive heat and floods. The weather types that caused the most injuries between 1950 and 2011 are tornadoes, thunderstorm winds and floods.
When looking at the impact of severe weather on the economy, we distinguish between the impact on property damage and on crop damage. The weather types that caused the most property damage between 1950 and 2011 are floods, hurricanes and tornadoes. The weather types that caused the most crop damage between 1950 and 2011 are drought, floods and hurricanes.