Drought, flood and other catastrophic events

The basic goal of this assignment is to explore the NOAA Storm Database to answer the following questions:

Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?

Reading in the dataset

First, we read the unzipped dataset in CSV format from the working directory.

storm<-read.csv("repdata_data_StormData.csv",stringsAsFactors = TRUE)

Data Understanding

The dataset has 37 variables, as shown below:

names(storm)

##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Most of these variables are not needed for this analysis. We keep only the following:

EVTYPE, the type of event, Factor
FATALITIES, the number of deaths, Numeric
INJURIES, the number of injured people, Numeric
PROPDMG, the property damage amount, Numeric
PROPDMGEXP, the unit (magnitude code) of the previous number, Factor
CROPDMG, the crop damage amount, Numeric
CROPDMGEXP, the unit (magnitude code) of the previous number, Factor

We then subset the initial dataset to these columns.

Data Processing

storm_ext<-storm[,c("EVTYPE", "FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
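If memory is a concern, the unneeded columns could also be skipped while reading rather than dropped afterwards. The following is only a sketch of that idea on a small temporary file: the file contents and the names `tmp`, `wanted`, `classes` and `small` are made up for illustration. It relies on `read.csv` skipping any column whose `colClasses` entry is the string "NULL".

```r
# Write a tiny stand-in CSV (hypothetical values, not the NOAA data)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("FLOOD", "TORNADO"),
                     FATALITIES = c(0, 2),
                     REMARKS = c("a", "b")),
          tmp, row.names = FALSE)

# Read one row just to learn the column names
header <- names(read.csv(tmp, nrows = 1))
wanted <- c("EVTYPE", "FATALITIES")

# NA lets read.csv guess the type; "NULL" tells it to skip the column
classes <- ifelse(header %in% wanted, NA, "NULL")
small <- read.csv(tmp, colClasses = classes, stringsAsFactors = TRUE)
names(small)
unlink(tmp)
```

On the real file the same pattern would apply with the seven column names listed above in `wanted`.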

There are 985 different types of event.

length(unique(storm_ext$EVTYPE))

## [1] 985
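A count of distinct labels treats every spelling as its own type, so labels that differ only in letter case or surrounding spaces would be counted separately. Whether and how often that happens in EVTYPE is not checked here. The sketch below only shows how such a check could be done, using made-up labels.

```r
# Hypothetical labels; these variants are assumptions for illustration,
# not values taken from the dataset
evtype <- factor(c("FLOOD", "Flood", " FLOOD", "TORNADO", "tornado "))

# Normalise: convert to character, trim surrounding whitespace, upper-case
evtype_norm <- toupper(trimws(as.character(evtype)))

n_raw  <- length(unique(evtype))       # distinct labels as written
n_norm <- length(unique(evtype_norm))  # distinct labels after normalising
c(raw = n_raw, normalised = n_norm)
```

Applied to `storm_ext$EVTYPE`, a large drop from the raw to the normalised count would suggest that some of the 985 types are spelling variants of each other.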

Focusing on the attributes PROPDMGEXP and CROPDMGEXP we observe the distinct values:

unique(storm_ext$PROPDMGEXP)

##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M

unique(storm_ext$CROPDMGEXP)

## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

We may suppose that ‘H’ and ‘h’ multiply the corresponding PROPDMG/CROPDMG value by 100, ‘K’ and ‘k’ by 1,000, ‘M’ and ‘m’ by 1,000,000, and ‘B’ and ‘b’ by 1,000,000,000. The other codes are unintelligible, but only a few observations pair one of them with a PROPDMG/CROPDMG value greater than zero:

sum(!storm_ext$PROPDMGEXP %in% c("B","b", "H","h", "K","k", "M","m") & storm_ext$PROPDMG>0)

## [1] 320

sum(!storm_ext$CROPDMGEXP %in% c("B","b", "H","h", "K","k", "M","m") & storm_ext$CROPDMG>0)

## [1] 15

We can therefore delete these records from the dataset and create two new columns, pmul and cmul, holding the property and crop damage values multiplied by the corresponding unit.

# Unit codes we can interpret (hundreds, thousands, millions, billions)
valid_exp<-c("B","b", "H","h", "K","k", "M","m")
# Keep a record only if each damage value is zero or has an interpretable unit code
storm_ext_clean<-storm_ext[(storm_ext$PROPDMGEXP %in% valid_exp | storm_ext$PROPDMG==0)&
                           (storm_ext$CROPDMGEXP %in% valid_exp | storm_ext$CROPDMG==0),]

# Multiplier for each unit code; any other code looks up as NA and is set to 0 below
mult<-c(H=100, K=1000, M=1000000, B=1000000000)
pfac<-unname(mult[toupper(as.character(storm_ext_clean$PROPDMGEXP))])
cfac<-unname(mult[toupper(as.character(storm_ext_clean$CROPDMGEXP))])
pfac[is.na(pfac)]<-0
cfac[is.na(cfac)]<-0

# Damage in dollars, computed for all rows at once instead of row by row
pmul<-pfac*storm_ext_clean$PROPDMG
cmul<-cfac*storm_ext_clean$CROPDMG
storm_ext_clean_mul<-cbind(storm_ext_clean,pmul,cmul)
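The filtering rule and the unit conversion can be checked by hand on a few rows. The sketch below uses made-up values (not NOAA records) and hypothetical names (`toy`, `keep`, `fac`). It covers property damage only, and the same logic would apply to crop damage.

```r
# Toy records (hypothetical values)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 0, 7),
  PROPDMGEXP = c("K", "m", "?", "+"),
  stringsAsFactors = TRUE
)
valid_exp <- c("B", "b", "H", "h", "K", "k", "M", "m")

# Keep a row if its code is interpretable or its damage value is zero;
# the last row (7 with code "+") is the only one dropped
keep <- toy$PROPDMGEXP %in% valid_exp | toy$PROPDMG == 0
toy_clean <- toy[keep, ]

# Multiplier per code; codes outside the table give NA, which becomes 0
mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
fac <- unname(mult[toupper(as.character(toy_clean$PROPDMGEXP))])
fac[is.na(fac)] <- 0

toy_clean$pmul <- toy_clean$PROPDMG * fac
toy_clean
```

The row with code "?" survives the filter because its damage value is zero, and it contributes nothing to the totals.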

Results

Finally, we can view the results after loading the dplyr and ggplot2 libraries.

library(dplyr)
library(ggplot2)

The top 10 events with the greatest economic consequences are shown in the figure below.

eco_cons<-storm_ext_clean_mul %>%  group_by(EVTYPE) %>% summarise(totalsum = sum(pmul+cmul)) %>% top_n(n = 10, wt = totalsum) %>% arrange(desc(totalsum/1000000))
bp<- ggplot(eco_cons, aes(x=EVTYPE, y=totalsum/1000000,fill=EVTYPE))+
  geom_bar(width = 1, stat = "identity")+
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())+labs(title = "Economic Consequences", x = "Event Type", y = "Damage (millions of dollars)")
bp
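The aggregation behind this figure (sum property and crop damage per event type, then keep the largest totals) can be verified on a handful of rows. The sketch below is a base-R equivalent on made-up values, not the dplyr pipeline used above. The names `toy`, `by_type` and `top2` are hypothetical, and it keeps the top 2 instead of the top 10.

```r
# Made-up damage values in dollars (not NOAA records)
toy <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  pmul   = c(1e6, 5e5, 2e6, 1e4, 0),
  cmul   = c(0, 0, 5e5, 2e4, 1e5)
)

# Total damage per record, then summed per event type
toy$total <- toy$pmul + toy$cmul
by_type <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)

# Sort from largest to smallest and keep the first two types
by_type <- by_type[order(by_type$total, decreasing = TRUE), ]
top2 <- head(by_type, 2)
top2
```

One difference to keep in mind: `top_n` can return more than n rows when totals are tied, whereas `head` always cuts at exactly n.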

The top 10 events with the greatest health consequences are shown in the figure below.

health_cons<-storm_ext_clean_mul %>%  group_by(EVTYPE) %>% summarise(totalsum = sum(FATALITIES+INJURIES)) %>% top_n(n = 10, wt = totalsum) %>% arrange(desc(totalsum))
bp<- ggplot(health_cons, aes(x=EVTYPE, y=totalsum,fill=EVTYPE))+
  geom_bar(width = 1, stat = "identity")+
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())+labs(title = "Health Consequences", x = "Event Type", y = "Number of injuries and fatalities")
bp
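A discrete x axis in ggplot2 follows the order of the factor levels, which is alphabetical by default, so sorting the rows with `arrange` does not by itself sort the bars. If bars ordered by size are wanted, one option is to reorder the factor levels by the plotted value. The sketch below shows the reordering on made-up totals using base R's `reorder`; the names `health_toy` and `ordered_type` are hypothetical.

```r
# Made-up totals (not results from the dataset)
health_toy <- data.frame(
  EVTYPE   = c("EXCESSIVE HEAT", "TORNADO", "FLOOD"),
  totalsum = c(300, 900, 500)
)

# Default factor levels are alphabetical
levels(factor(health_toy$EVTYPE))

# Reorder levels by decreasing total (negate so the largest comes first)
ordered_type <- reorder(health_toy$EVTYPE, -health_toy$totalsum)
levels(ordered_type)
```

In the plots above this would amount to writing `aes(x = reorder(EVTYPE, -totalsum), ...)` in place of `aes(x = EVTYPE, ...)`. That is a suggestion, not what the figures shown here use.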

Andrea Terlizzi

27 February 2017
