Synopsis

In this report we identify the most harmful types of weather events in the United States between 1991 and 2011, and we check whether most of the damage, fatalities, and injuries are caused by only a few event types. We found that floods and hurricanes are the most costly event types, while excessive heat, flash floods, and tornadoes are the most lethal for the population. Identifying these event types correctly can help direct investigation and investment toward preparing for them: when a weather event cannot be prevented, limiting its consequences should be the priority.

Loading and Processing the Raw Data

The raw data on natural disasters covers 1950 to 2011. It was originally provided by the National Weather Service and can be downloaded from the course's data server.

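# dplyr, tidyr and ggplot2 for data wrangling and plots; data.table provides %like%
library(dplyr); library(tidyr); library(ggplot2); library(data.table)
myFile <- "repdata_data_StormData.csv.bz2"
if (!file.exists(myFile)) download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = myFile, method = "curl")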

First, we load the raw data and check its dimensions. The dataset has 902,297 rows and 37 columns.

storm_data<-read.csv(myFile, na.strings=c(""))
dim(storm_data)
## [1] 902297     37
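The loading step relies on two behaviours of read.csv(). First, it opens the file through file(), which decompresses bzip2 input transparently. Second, na.strings = c("") turns empty fields into NA, and the later damage filter tests is.na(PROPDMGEXP). A minimal, self-contained check on a tiny CSV with made-up contents:

```r
# Write a tiny, hypothetical bzip2-compressed CSV to a temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES,PROPDMGEXP",
             "TORNADO,1,K",
             "HAIL,0,"), con)
close(con)

# read.csv() decompresses the file on the fly; the empty PROPDMGEXP field becomes NA
small <- read.csv(tmp, na.strings = c(""))
print(small)
```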

Although records go back to 1950, the early years have too much missing data. To get meaningful summaries, we consider only events that began in 1991 or later.

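# YEAR is stored as text ("1991", ...); it is converted to integer for the comparison
storm_data$YEAR <- format(as.Date(storm_data$BGN_DATE, "%m/%d/%Y"), "%Y")
damage_data <- storm_data %>% filter(as.integer(YEAR) > 1990)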
dim(damage_data)
## [1] 740794     38
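BGN_DATE is stored as text with a trailing time component. We assume values like "4/18/1950 0:00:00" here. as.Date() with the format "%m/%d/%Y" parses the date part and ignores the trailing characters. The sketch below uses made-up values in that layout and converts the year to an integer, so the comparison with 1990 is numeric rather than a comparison of character strings:

```r
# Hypothetical BGN_DATE values in an assumed "m/d/Y h:mm:ss" layout
bgn_date <- c("4/18/1950 0:00:00", "1/6/1991 0:00:00", "11/30/2011 0:00:00")

# as.Date() parses the leading date and ignores the trailing time
year <- as.integer(format(as.Date(bgn_date, "%m/%d/%Y"), "%Y"))
print(year)          # 1950 1991 2011
print(year > 1990)   # FALSE TRUE TRUE
```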

Next, we keep only events that caused damage, fatalities, or injuries. An event counts as damaging when it has a recorded magnitude code in PROPDMGEXP or CROPDMGEXP.

damage_data <- damage_data %>% 
filter (!is.na(CROPDMGEXP) | !is.na(PROPDMGEXP) | FATALITIES>0 | INJURIES>0 ) 
dim(damage_data)
## [1] 414696     38

Because event names in the raw data are often inconsistent, we rename some of the most common event types to match Table 2.1.1 (permitted storm data events) of the National Weather Service Instruction.

# %like% (from data.table) is a case-sensitive regular-expression match.
# The rules run in order: FLASH comes before FLOOD, and its short label "FLASH"
# does not match the FLOOD pattern, so flash floods stay a separate category.
damage_data[damage_data$EVTYPE %like% "HURRICANE","EVTYPE"]<-"HURRICANE"
damage_data[damage_data$EVTYPE %like% "HAIL","EVTYPE"]<-"HAIL"
damage_data[damage_data$EVTYPE %like% "THUNDERSTORMS WIND","EVTYPE"]<-"THUNDERSTORM WIND"
damage_data[damage_data$EVTYPE %like% "TSTM WIND","EVTYPE"]<-"THUNDERSTORM WIND"
damage_data[damage_data$EVTYPE %like% "RIP CURRENT","EVTYPE"]<-"RIP CURRENT"
damage_data[damage_data$EVTYPE %like% "COLD","EVTYPE"]<-"EXTREME COLD"
damage_data[damage_data$EVTYPE %like% "FLASH","EVTYPE"]<-"FLASH"
damage_data[damage_data$EVTYPE %like% "FLOOD","EVTYPE"]<-"FLOOD"
damage_data[damage_data$EVTYPE =="HEAT","EVTYPE"]<-"EXCESSIVE HEAT"
damage_data[damage_data$EVTYPE %like% "SURGE","EVTYPE"]<-"STORM SURGE"
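The renaming rules are applied one after another, so their order matters. Below is a minimal base R sketch of the same idea. It uses a hypothetical ordered rule table and a hypothetical helper, normalize_evtype(). The sketch calls grepl(), which performs the same kind of case-sensitive regular-expression match that %like% applies to character vectors:

```r
# Hypothetical ordered rule table; rules are applied top to bottom
rules <- data.frame(
  pattern = c("HURRICANE", "TSTM WIND", "FLASH", "FLOOD"),
  label   = c("HURRICANE", "THUNDERSTORM WIND", "FLASH", "FLOOD"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: relabel every value matching each pattern, in rule order
normalize_evtype <- function(evtype, rules) {
  for (i in seq_len(nrow(rules))) {
    evtype[grepl(rules$pattern[i], evtype)] <- rules$label[i]
  }
  evtype
}

ev <- c("HURRICANE OPAL", "TSTM WIND", "FLASH FLOOD", "RIVER FLOOD", "TORNADO")

# FLASH runs before FLOOD, so "FLASH FLOOD" ends up as "FLASH"
print(normalize_evtype(ev, rules))

# Reversing the rule order lets FLOOD absorb the flash flood first
print(normalize_evtype(ev, rules[rev(seq_len(nrow(rules))), ]))
```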

To analyze damage costs, we translate the magnitude codes stored in PROPDMGEXP and CROPDMGEXP into numeric multipliers that can be summed. The codes are “H” for hundreds, “K” for thousands, “M” for millions, and “B” for billions, in either upper or lower case. The multipliers are stored in two new columns, PROPDMGSCALE and CROPDMGSCALE. Any other code, including a missing one, gets a multiplier of 0.

damage_data <- damage_data %>%
  mutate(
    PROPDMGSCALE = case_when(
      PROPDMGEXP %in% c("b", "B") ~ 1e9,
      PROPDMGEXP %in% c("m", "M") ~ 1e6,
      PROPDMGEXP %in% c("k", "K") ~ 1e3,
      PROPDMGEXP %in% c("h", "H") ~ 1e2,
      TRUE                        ~ 0
    ),
    CROPDMGSCALE = case_when(
      CROPDMGEXP %in% c("b", "B") ~ 1e9,
      CROPDMGEXP %in% c("m", "M") ~ 1e6,
      CROPDMGEXP %in% c("k", "K") ~ 1e3,
      CROPDMGEXP %in% c("h", "H") ~ 1e2,
      TRUE                        ~ 0
    )
  )
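The same mapping can be sketched in base R as a lookup table keyed by the upper-cased code. Every unlisted code (digits, symbols, or NA) falls back to 0, as in the TRUE ~ 0 branch. The names exp_multiplier and scale_exp() are hypothetical and used only for this illustration:

```r
# Hypothetical lookup table: magnitude code -> multiplier
exp_multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: upper-case the code, look it up, and use 0 for anything unknown
scale_exp <- function(code) {
  m <- unname(exp_multiplier[toupper(code)])
  m[is.na(m)] <- 0
  m
}

# expected multipliers: 1e3, 1e6, 1e9, 1e2, 0, 0
print(scale_exp(c("K", "m", "B", "h", "5", NA)))
```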

Using the new PROPDMGSCALE and CROPDMGSCALE multipliers, we compute the property, crop, and total damage in dollars for each event record.

damage_data<- damage_data %>% 
mutate (TOTALPROPDMG= PROPDMGSCALE*PROPDMG, TOTALCROPDMG= CROPDMGSCALE*CROPDMG) %>%
mutate (TOTALDMG=TOTALPROPDMG+TOTALCROPDMG)
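Written as a formula, the total for one record is shown below. Here s(code) is the multiplier assigned to a magnitude code, and it is 0 for any code not listed. The formula is followed by the arithmetic for a hypothetical record with PROPDMG = 2.5, PROPDMGEXP = "M", CROPDMG = 500 and CROPDMGEXP = "K":

```latex
\mathrm{TOTALDMG} = \mathrm{PROPDMG}\cdot s(\mathrm{PROPDMGEXP}) + \mathrm{CROPDMG}\cdot s(\mathrm{CROPDMGEXP}),
\qquad s(\mathrm{H}) = 10^{2},\ s(\mathrm{K}) = 10^{3},\ s(\mathrm{M}) = 10^{6},\ s(\mathrm{B}) = 10^{9}

2.5 \cdot 10^{6} + 500 \cdot 10^{3} = 2{,}500{,}000 + 500{,}000 = 3{,}000{,}000 \ \text{dollars}
```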

Results

We start by computing the cumulative property, crop, and total damage (in billions of dollars):

damage_data %>%
  summarize(TOTAL_CROP_DMG = sum(TOTALCROPDMG) / 1e9,
            TOTAL_PROP_DMG = sum(TOTALPROPDMG) / 1e9,
            TOTAL_DMG      = sum(TOTALDMG) / 1e9)
##   TOTAL_CROP_DMG TOTAL_PROP_DMG TOTAL_DMG
## 1       49.10419       399.2794  448.3836

Severe weather caused about $448 billion of damage, almost half a trillion dollars, between 1991 and 2011.

Next, we summarize fatalities, injuries, and damage (in billions of dollars) for each event type:

by_EV<-group_by(damage_data, EVTYPE)

storm_summary<-summarize(by_EV, fatalities=sum(FATALITIES), injuries=sum(INJURIES), damage=sum(TOTALDMG)/1000000000, .groups = 'drop')  
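The per-type sums computed by group_by() and summarize() correspond to base R's aggregate() with a formula. A minimal sketch on a few hypothetical records, not taken from the dataset:

```r
# Hypothetical event records
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 0, 1, 0),
  INJURIES   = c(10, 3, 0, 1),
  TOTALDMG   = c(5e6, 2e9, 1e5, 4e4),
  stringsAsFactors = FALSE
)

# One row per event type with the column sums; damage converted to billions
by_type <- aggregate(cbind(FATALITIES, INJURIES, TOTALDMG) ~ EVTYPE,
                     data = events, FUN = sum)
by_type$TOTALDMG <- by_type$TOTALDMG / 1e9
print(by_type)
```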

We then list the six event types with the highest damage, fatalities, and injuries:

damage<-storm_summary %>% arrange(desc(damage)) %>% head
damage[,c(1,4)]
## # A tibble: 6 x 2
##   EVTYPE      damage
##   <chr>        <dbl>
## 1 FLOOD        161. 
## 2 HURRICANE     90.3
## 3 STORM SURGE   48.0
## 4 TORNADO       29.3
## 5 HAIL          20.7
## 6 FLASH         18.4
fatalities<-storm_summary %>% arrange(desc(fatalities)) %>% head
fatalities[,c(1,2)]
## # A tibble: 6 x 2
##   EVTYPE         fatalities
##   <chr>               <dbl>
## 1 EXCESSIVE HEAT       2840
## 2 TORNADO              1699
## 3 FLASH                1035
## 4 LIGHTNING             816
## 5 RIP CURRENT           577
## 6 FLOOD                 488
injuries<-storm_summary %>% arrange(desc(injuries)) %>% head 
injuries[,c(1,3)]
## # A tibble: 6 x 2
##   EVTYPE            injuries
##   <chr>                <dbl>
## 1 TORNADO              25497
## 2 EXCESSIVE HEAT        8625
## 3 FLOOD                 6801
## 4 THUNDERSTORM WIND     5943
## 5 LIGHTNING             5230
## 6 ICE STORM             1975

Tornadoes cause by far the most injuries and the second-highest number of fatalities after excessive heat. Floods and hurricanes cause the costliest damage. To compare these event types side by side, we keep only the types that appear in at least one of the three top-six lists:

top_events<-unique(c(injuries$EVTYPE, fatalities$EVTYPE, damage$EVTYPE))

top<-storm_summary[storm_summary$EVTYPE %in% top_events,]

print(top[1])
## # A tibble: 11 x 1
##    EVTYPE           
##    <chr>            
##  1 EXCESSIVE HEAT   
##  2 FLASH            
##  3 FLOOD            
##  4 HAIL             
##  5 HURRICANE        
##  6 ICE STORM        
##  7 LIGHTNING        
##  8 RIP CURRENT      
##  9 STORM SURGE      
## 10 THUNDERSTORM WIND
## 11 TORNADO
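The combined list is the union of three top-six lists. It can therefore hold at most 18 names, and it is shorter when the same type ranks high on several measures (here 11). A minimal sketch of the same union follows, using a made-up summary table, a hypothetical helper top_n_types(), and n = 2:

```r
# Hypothetical per-type summary (values are made up)
summary_tbl <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E"),
  fatalities = c(50, 10, 40, 0, 5),
  injuries   = c(100, 300, 20, 5, 0),
  damage     = c(1, 2, 0.5, 9, 0.1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: event types with the n largest values of one column
top_n_types <- function(tbl, column, n) {
  tbl$EVTYPE[order(tbl[[column]], decreasing = TRUE)][seq_len(n)]
}

n <- 2
top_types <- unique(c(top_n_types(summary_tbl, "injuries", n),
                      top_n_types(summary_tbl, "fatalities", n),
                      top_n_types(summary_tbl, "damage", n)))
print(top_types)  # "E" never makes a top-2 list, so it is dropped
```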

Next, we plot fatalities, injuries, and damage for these top event types, one panel per measure:

p <- top %>%
  pivot_longer(fatalities:damage, names_to = "harm_type", values_to = "numbers") %>%
  ggplot(aes(x = EVTYPE, y = numbers, fill = harm_type)) +  # fill colors the bars; color would only outline them
  geom_col() +
  facet_wrap(~ harm_type, scales = "free_y", ncol = 1, strip.position = "left",
             labeller = as_labeller(c(damage = "Billions$", fatalities = "Fatalities", injuries = "Injuries"))) +
  ggtitle("Top Weather Event Types Harm Summary (1991-2011)") +
  xlab(NULL) + ylab(NULL) +
  theme(axis.text.x = element_text(angle = 50, size = 7, hjust = 1),
        strip.background = element_blank(), strip.placement = "outside")
print(p)
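pivot_longer() turns the three measure columns into one harm_type/numbers pair per row. This long format is what lets facet_wrap(~ harm_type) draw one panel per measure. Below is a base R sketch of the same reshaping on two rows copied from the tables above, with damage rounded as printed. The row order differs from pivot_longer(), which keeps each event type's measures together:

```r
# Two rows from the summary tables above (damage in billions, rounded as printed)
top_small <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO"),
  fatalities = c(488, 1699),
  injuries   = c(6801, 25497),
  damage     = c(161, 29.3),
  stringsAsFactors = FALSE
)

# Long format: one row per (event type, harm type) pair
measures <- c("fatalities", "injuries", "damage")
long <- data.frame(
  EVTYPE    = rep(top_small$EVTYPE, times = length(measures)),
  harm_type = rep(measures, each = nrow(top_small)),
  numbers   = unlist(top_small[measures], use.names = FALSE),
  stringsAsFactors = FALSE
)
print(long)
```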

Next, we plot yearly fatalities and injuries for the three event types that caused the most fatalities (excessive heat, tornadoes, and flash floods):

casualty_plot <- damage_data %>%
  filter(EVTYPE %in% c("TORNADO", "EXCESSIVE HEAT", "FLASH")) %>%
  group_by(EVTYPE, YEAR) %>%
  summarize(fatalities = sum(FATALITIES), injuries = sum(INJURIES), .groups = "drop") %>%
  pivot_longer(fatalities:injuries, names_to = "harm_type", values_to = "numbers") %>%
  mutate(YEAR = as.Date(paste0(YEAR, "-01-01"))) %>%  # anchor each year at January 1
  ggplot(aes(x = YEAR, y = numbers, color = EVTYPE, group = EVTYPE)) +
  geom_line() + geom_point() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  facet_wrap(~ harm_type, scales = "free_y", ncol = 1) +
  theme(axis.text.x = element_text(angle = 70)) +
  ggtitle("Casualties in the USA by Top 3 Weather Event Types (1991-2011)") +
  xlab(NULL) + ylab(NULL)
print(casualty_plot)
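scale_x_date() needs a full date, but YEAR holds only a year. According to R's strptime() documentation, an unspecified month or day is taken to be the current one. So as.Date() with a bare "%Y" format gives dates that change with the day the report is knitted. Anchoring each year at January 1 keeps the points aligned with the yearly axis breaks. A minimal base R check:

```r
year_text <- c("1991", "2005", "2011")

# Only the year is given, so month and day come from the current date
partial <- as.Date(year_text, "%Y")

# Anchoring each year at January 1 gives a reproducible date
anchored <- as.Date(paste0(year_text, "-01-01"))

print(partial)
print(anchored)
```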

Finally, we plot the yearly damage for the four costliest event types (flood, hurricane, storm surge, and tornado):

damage_plot <- damage_data %>%
  filter(EVTYPE %in% c("FLOOD", "HURRICANE", "STORM SURGE", "TORNADO")) %>%
  group_by(EVTYPE, YEAR) %>%
  summarize(damage = sum(TOTALDMG) / 1e9, .groups = "drop") %>%  # billions of dollars
  mutate(YEAR = as.Date(paste0(YEAR, "-01-01"))) %>%             # anchor each year at January 1
  ggplot(aes(x = YEAR, y = damage, color = EVTYPE, group = EVTYPE)) +
  geom_line() + geom_point() +
  scale_x_date(date_breaks = "1 year", date_labels = "%Y") +
  theme(axis.text.x = element_text(angle = 90)) +
  ggtitle("Top 4 Costliest Weather Event Types (1991-2011)") +
  ylab("Billions$") + xlab(NULL)
print(damage_plot)