In this report we identify the most harmful types of weather events across the United States between 1991 and 2011 and examine whether most of the damage, fatalities, and injuries are caused by only a few event types. We found that floods and hurricanes are the most costly event types, while excessive heat, tornadoes, and flash floods are the most lethal. Identifying these event types correctly helps direct investigation and investment toward proper preparation for them. Where a weather event cannot be prevented, mitigating its consequences should be the priority.
Originally provided by the National Weather Service, the raw 1950-2011 natural disaster data can be downloaded from the R Foundation Course servers.
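library(dplyr); library(tidyr); library(ggplot2)
library(data.table)  # assumed source of the %like% operator used below
myFile<-"repdata_data_StormData.csv.bz2"
if (!file.exists(myFile)) { download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile=myFile, method='curl') }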
First we load the raw data and check its dimensions (the dataset has 902297 rows and 37 columns):
storm_data<-read.csv(myFile, na.strings=c(""))
dim(storm_data)
## [1] 902297 37
Although some data has been collected since 1950, too much of it is missing in the early years. To get meaningful summaries we consider only events from 1991 onward.
storm_data$YEAR<-format(as.Date(storm_data$BGN_DATE,"%m/%d/%Y"),'%Y')
damage_data<-storm_data %>% filter (YEAR>1990)
dim(damage_data)
## [1] 740794 38
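The filter above compares a character YEAR with the number 1990; R coerces the number to "1990" and four-digit strings happen to sort like numbers, so the result is as intended. A minimal sketch of a more explicit variant with an integer year, on made-up dates. The toy vector and the assumption that BGN_DATE values look like "4/18/1950 0:00:00" are illustrative, not taken from the report.

```r
# Toy dates in the month/day/year layout assumed for BGN_DATE (hypothetical values)
bgn_date <- c("4/18/1950 0:00:00", "6/1/1990 0:00:00",
              "1/15/1991 0:00:00", "11/30/2011 0:00:00")

# Parse the date part; the trailing time text is ignored once the format matches
year <- as.integer(format(as.Date(bgn_date, "%m/%d/%Y"), "%Y"))

# A numeric comparison makes the 1991 cut-off explicit
keep <- year > 1990
year[keep]
```

Storing the year as an integer avoids relying on string ordering if the cut-off logic is later changed.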
Next we keep only events that have recorded property or crop damage, fatalities, or injuries:
damage_data <- damage_data %>%
filter (!is.na(CROPDMGEXP) | !is.na(PROPDMGEXP) | FATALITIES>0 | INJURIES>0 )
dim(damage_data)
## [1] 414696 38
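Because event names in the raw data are often inconsistent, we rename some of the most common ones according to Table 2.1.1 of permitted events in the National Weather Service Instruction. Note that flash floods are labeled FLASH, which keeps the broader FLOOD pattern applied right after it from absorbing them.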
damage_data[damage_data$EVTYPE %like% "HURRICANE","EVTYPE"]<-"HURRICANE"
damage_data[damage_data$EVTYPE %like% "HAIL","EVTYPE"]<-"HAIL"
damage_data[damage_data$EVTYPE %like% "THUNDERSTORMS WIND","EVTYPE"]<-"THUNDERSTORM WIND"
damage_data[damage_data$EVTYPE %like% "TSTM WIND","EVTYPE"]<-"THUNDERSTORM WIND"
damage_data[damage_data$EVTYPE %like% "RIP CURRENT","EVTYPE"]<-"RIP CURRENT"
damage_data[damage_data$EVTYPE %like% "COLD","EVTYPE"]<-"EXTREME COLD"
damage_data[damage_data$EVTYPE %like% "FLASH","EVTYPE"]<-"FLASH"
damage_data[damage_data$EVTYPE %like% "FLOOD","EVTYPE"]<-"FLOOD"
damage_data[damage_data$EVTYPE =="HEAT","EVTYPE"]<-"EXCESSIVE HEAT"
damage_data[damage_data$EVTYPE %like% "SURGE","EVTYPE"]<-"STORM SURGE"
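Because each renaming rule sees the result of the previous ones, the order of the rules matters. A minimal sketch on a handful of hypothetical labels, using base R `grepl()` with fixed patterns in place of `%like%`; only a subset of the rules is reproduced.

```r
# Hypothetical raw labels; not taken from the dataset
evtype <- c("FLASH FLOOD", "FLASH FLOODING", "RIVER FLOOD",
            "TSTM WIND/HAIL", "HEAT", "HEAT WAVE")

# Apply a subset of the rules in the same order as the report
evtype[grepl("HAIL", evtype, fixed = TRUE)]      <- "HAIL"
evtype[grepl("TSTM WIND", evtype, fixed = TRUE)] <- "THUNDERSTORM WIND"
evtype[grepl("FLASH", evtype, fixed = TRUE)]     <- "FLASH"
evtype[grepl("FLOOD", evtype, fixed = TRUE)]     <- "FLOOD"
evtype[evtype == "HEAT"]                         <- "EXCESSIVE HEAT"  # exact match only

evtype
```

In this sketch the combined wind and hail label ends up as HAIL because the HAIL rule runs first, and "HEAT WAVE" is left untouched because the heat rule matches exact names only.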
To analyze the cost of damage we have to translate the magnitude codes stored in the CROPDMGEXP and PROPDMGEXP variables (“H” for hundreds, “K” for thousands, “M” for millions, “B” for billions, in either case) into numeric multipliers that can be summed. Any other code, including a missing one, gets a multiplier of 0. Two new columns, PROPDMGSCALE and CROPDMGSCALE, are added.
damage_data<- damage_data %>%
mutate(
PROPDMGSCALE = case_when(
PROPDMGEXP == 'b' | PROPDMGEXP == 'B' ~ 1000000000,
PROPDMGEXP == 'm' | PROPDMGEXP == 'M' ~ 1000000,
PROPDMGEXP == 'k' | PROPDMGEXP == 'K' ~ 1000,
PROPDMGEXP == 'h' | PROPDMGEXP == 'H' ~ 100,
TRUE ~ 0
),
CROPDMGSCALE = case_when(
CROPDMGEXP == 'b' | CROPDMGEXP == 'B' ~ 1000000000,
CROPDMGEXP == 'm' | CROPDMGEXP == 'M' ~ 1000000,
CROPDMGEXP == 'k' | CROPDMGEXP == 'K' ~ 1000,
CROPDMGEXP == 'h' | CROPDMGEXP == 'H' ~ 100,
TRUE ~ 0
))
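The same mapping can also be written once as a named lookup vector and reused for both columns. This is a minimal base R sketch of that alternative, not the report's method; `scale_lookup` and the toy codes are hypothetical.

```r
# Named lookup vector: exponent code -> multiplier
scale_lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical exponent codes, including a missing value and an unrecognized symbol
dmg_exp <- c("K", "m", "B", NA, "+", "h")

# Upper-case the codes, then index the lookup by name; unknown codes yield NA
multiplier <- unname(scale_lookup[toupper(dmg_exp)])

# Same fallback as the TRUE ~ 0 branch of case_when
multiplier[is.na(multiplier)] <- 0
multiplier
```

With such a vector, a damage figure of 2.5 with code "M" would be scaled to 2,500,000 dollars, and a figure with a missing or unknown code would contribute nothing to the totals.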
Using the new columns we add fields for the property, crop, and total damage cost of each event.
damage_data<- damage_data %>%
mutate (TOTALPROPDMG= PROPDMGSCALE*PROPDMG, TOTALCROPDMG= CROPDMGSCALE*CROPDMG) %>%
mutate (TOTALDMG=TOTALPROPDMG+TOTALCROPDMG)
Let us start by determining the cumulative property, crop, and total damage (in billions of dollars):
damage_data %>% summarize (TOTAL_CROP_DMG=sum(TOTALCROPDMG)/1000/1000/1000, TOTAL_PROP_DMG=sum(TOTALPROPDMG)/1000/1000/1000, TOTAL_DMG=sum(TOTALDMG)/1000/1000/1000)
## TOTAL_CROP_DMG TOTAL_PROP_DMG TOTAL_DMG
## 1 49.10419 399.2794 448.3836
We can see that severe weather caused almost half a trillion dollars of damage between 1991 and 2011.
Next we create summaries of fatalities, injuries, and damage (in billions of dollars) for each weather event type:
by_EV<-group_by(damage_data, EVTYPE)
storm_summary<-summarize(by_EV, fatalities=sum(FATALITIES), injuries=sum(INJURIES), damage=sum(TOTALDMG)/1000000000, .groups = 'drop')
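To make the grouping step concrete, here is a minimal sketch of the same per-type aggregation on a tiny made-up table. It uses base R `aggregate()` rather than the `group_by()` and `summarize()` calls of the report, and all values in `toy` are hypothetical.

```r
# Hypothetical events: two floods, two tornadoes, one hail storm
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(1, 3, 0, 2, 0),
  INJURIES   = c(4, 10, 1, 7, 0),
  TOTALDMG   = c(2e9, 5e8, 1e9, 2.5e8, 1e7)
)

# Sum every measure within each event type
toy_summary <- aggregate(cbind(FATALITIES, INJURIES, TOTALDMG) ~ EVTYPE,
                         data = toy, FUN = sum)

# Express damage in billions of dollars, as in the report
toy_summary$damage_bn <- toy_summary$TOTALDMG / 1e9

# Most costly event types first
toy_summary[order(-toy_summary$damage_bn), ]
```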
We are going to check the most deadly and damaging events:
damage<-storm_summary %>% arrange(desc(damage)) %>% head
damage[,c(1,4)]
## # A tibble: 6 x 2
## EVTYPE damage
## <chr> <dbl>
## 1 FLOOD 161.
## 2 HURRICANE 90.3
## 3 STORM SURGE 48.0
## 4 TORNADO 29.3
## 5 HAIL 20.7
## 6 FLASH 18.4
fatalities<-storm_summary %>% arrange(desc(fatalities)) %>% head
fatalities[,c(1,2)]
## # A tibble: 6 x 2
## EVTYPE fatalities
## <chr> <dbl>
## 1 EXCESSIVE HEAT 2840
## 2 TORNADO 1699
## 3 FLASH 1035
## 4 LIGHTNING 816
## 5 RIP CURRENT 577
## 6 FLOOD 488
injuries<-storm_summary %>% arrange(desc(injuries)) %>% head
injuries[,c(1,3)]
## # A tibble: 6 x 2
## EVTYPE injuries
## <chr> <dbl>
## 1 TORNADO 25497
## 2 EXCESSIVE HEAT 8625
## 3 FLOOD 6801
## 4 THUNDERSTORM WIND 5943
## 5 LIGHTNING 5230
## 6 ICE STORM 1975
As we can see, EXCESSIVE HEAT causes the most fatalities, TORNADO causes by far the most injuries and the second-most fatalities, and floods and hurricanes cause the most costly damage. To summarize, we keep only the event types that appear in at least one of the three top-six lists:
top_events<-unique(c(injuries$EVTYPE, fatalities$EVTYPE, damage$EVTYPE))
top<-storm_summary[storm_summary$EVTYPE %in% top_events,]
print(top[1])
## # A tibble: 11 x 1
## EVTYPE
## <chr>
## 1 EXCESSIVE HEAT
## 2 FLASH
## 3 FLOOD
## 4 HAIL
## 5 HURRICANE
## 6 ICE STORM
## 7 LIGHTNING
## 8 RIP CURRENT
## 9 STORM SURGE
## 10 THUNDERSTORM WIND
## 11 TORNADO
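The list above has 11 entries rather than 18 because the three top-six lists overlap and `unique()` keeps each name once. A minimal sketch of that union, shortened to the top three names of each table above; the variable names are hypothetical.

```r
# Top three event types from each of the three tables above
top_by_injuries   <- c("TORNADO", "EXCESSIVE HEAT", "FLOOD")
top_by_fatalities <- c("EXCESSIVE HEAT", "TORNADO", "FLASH")
top_by_damage     <- c("FLOOD", "HURRICANE", "STORM SURGE")

# Concatenate, then drop repeated names while keeping first-seen order
union_top <- unique(c(top_by_injuries, top_by_fatalities, top_by_damage))
union_top
length(union_top)
```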
Next we will plot the summary and comparison for the top storm events:
p<-top %>%
pivot_longer(fatalities:damage, names_to="harm_type", values_to="numbers") %>%
ggplot(aes(EVTYPE, numbers, fill=harm_type))+
geom_col()+
theme(axis.text.x=element_text(angle=50, size=7,hjust=1))+
facet_wrap(~ harm_type, scales="free_y", ncol=1, strip.position = "left", labeller = as_labeller(c(damage = 'Billions$', fatalities = 'Fatalities', injuries='Injuries')) ) + ylab(NULL) + ggtitle("Top Weather Event Types Harm Summary (1991-2011)") + xlab(NULL)+
theme(strip.background = element_blank(), strip.placement = "outside")
print(p)
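The `pivot_longer()` step turns one row per event type into one row per event type and harm measure, which is what lets `facet_wrap()` draw one panel per measure. A minimal base R sketch of the same reshaping, using two rows of figures from the tables above; the hand-built data frame stands in for `pivot_longer()`, and its row order differs from what tidyr would produce.

```r
# Wide layout: one row per event type, one column per harm measure
wide <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO"),
  fatalities = c(488, 1699),
  injuries   = c(6801, 25497)
)

# Long layout: one row per (event type, harm measure) pair
long <- data.frame(
  EVTYPE    = rep(wide$EVTYPE, times = 2),
  harm_type = rep(c("fatalities", "injuries"), each = nrow(wide)),
  numbers   = c(wide$fatalities, wide$injuries)
)
long
```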
Next we plot yearly fatalities and injuries for the three deadliest weather event types:
c<-damage_data %>% filter (EVTYPE %in% c("TORNADO","EXCESSIVE HEAT","FLASH")) %>% group_by(EVTYPE, YEAR) %>%
summarize(fatalities=sum(FATALITIES), injuries=sum(INJURIES), .groups = 'drop') %>%
pivot_longer(fatalities:injuries, names_to="harm_type", values_to="numbers") %>% mutate (YEAR=as.Date(YEAR,'%Y')) %>%
ggplot(aes(x=YEAR,y=numbers, color=EVTYPE, group=EVTYPE)) +
scale_x_date( date_breaks = "1 years", date_labels = "%Y")+
theme(axis.text.x=element_text(angle=70)) + facet_wrap(~ harm_type, scales="free_y", ncol=1)+geom_line()+geom_point() +ggtitle("Casualties in USA by top 3 Weather Event Types (1991-2011)")+ylab(NULL) +xlab(NULL)
print(c)
Plotting historical data for the four most costly event types:
d<-damage_data %>% filter (EVTYPE %in% c("FLOOD","HURRICANE","STORM SURGE","TORNADO")) %>% group_by(EVTYPE, YEAR) %>% summarize(damage=sum(TOTALDMG), .groups = 'drop') %>%
mutate (YEAR=as.Date(YEAR,'%Y')) %>%
ggplot(aes(x=YEAR,y=damage/1000/1000/1000, color=EVTYPE, group=EVTYPE)) +
scale_x_date( date_breaks = "1 year", date_labels = "%Y")+
theme(axis.text.x=element_text(angle=90)) +geom_line()+geom_point() +
ggtitle("Top 4 Costly Weather Event Types (1991-2011)") +ylab("Billions$") + xlab(NULL)
print(d)