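The basic goal of this assignment is to explore the NOAA Storm Database to answer two questions: which types of events have the greatest health consequences, and which have the greatest economic consequences?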

Reading in the dataset

First, we read the unzipped dataset in CSV format from the working directory.

storm<-read.csv("repdata_data_StormData.csv",stringsAsFactors = TRUE)
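
If the download is still in compressed form, the manual unzip step is not strictly necessary, since read.csv can read bzip2-, gzip- and xz-compressed files directly. The sketch below demonstrates this on a small made-up table written to a temporary file; the file and its two columns are assumptions for illustration only, not part of the assignment data.

```r
# Write a tiny, made-up CSV through a bzip2 connection (illustration only).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.csv detects the compression on its own when opening the file for reading.
demo <- read.csv(tmp, stringsAsFactors = TRUE)
str(demo)
```

The same call pattern should work on the compressed storm file, at the cost of a somewhat slower read.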

Data Understanding

The dataset has 37 variables, as shown below.

names(storm)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

Some variables are unnecessary for the purpose of this analysis. We keep only EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, subsetting the initial dataset accordingly.

Data Processing

storm_ext<-storm[,c("EVTYPE", "FATALITIES", "INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

There are 985 different types of event.

length(unique(storm_ext$EVTYPE))
## [1] 985

Focusing on the attributes PROPDMGEXP and CROPDMGEXP, we observe the following distinct values:

unique(storm_ext$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(storm_ext$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

We may suppose that ‘H’ and ‘h’ multiply the corresponding PROPDMG/CROPDMG value by 100, ‘K’ and ‘k’ by 1,000, ‘M’ and ‘m’ by 1,000,000, and ‘B’ and ‘b’ by 1,000,000,000. The other codes are unintelligible, but only in very few observations is one of these unknown codes associated with a PROPDMG/CROPDMG value greater than zero.

sum(!storm_ext$PROPDMGEXP %in% c("B","b", "H","h", "K","k", "M","m") & storm_ext$PROPDMG>0)
## [1] 320
sum(!storm_ext$CROPDMGEXP %in% c("B","b", "H","h", "K","k", "M","m") & storm_ext$CROPDMG>0)
## [1] 15
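
To judge whether dropping such rows is harmless, it can help to look at their share of the data and at which codes are involved. The sketch below does this on a made-up stand-in for storm_ext; the toy values and the names toy, known and unknown_positive are assumptions introduced for illustration.

```r
# Toy stand-in for storm_ext; the values are made up for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 0, 2.5, 10, 0, 1),
  PROPDMGEXP = c("K", "", "M", "+", "?", "5")
)
known <- c("B", "b", "H", "h", "K", "k", "M", "m")

# Rows with a positive damage value but an unrecognised exponent code.
unknown_positive <- !(toy$PROPDMGEXP %in% known) & toy$PROPDMG > 0

sum(unknown_positive)                     # number of affected rows
mean(unknown_positive)                    # share of all rows
table(toy$PROPDMGEXP[unknown_positive])   # which codes are involved
```

Applied to the real data, the same three lines would put the 320 and 15 affected records in proportion to the full dataset.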

So we can delete these records from the dataset and create two additional columns holding the damage value multiplied by the corresponding unit.

# Keep rows whose exponent code is recognised, or whose damage value is zero
valid_exp<-c("B","b", "H","h", "K","k", "M","m")
storm_ext_clean<-storm_ext[(storm_ext$PROPDMGEXP %in% valid_exp | storm_ext$PROPDMG==0)&
                           (storm_ext$CROPDMGEXP %in% valid_exp | storm_ext$CROPDMG==0),]
size<-nrow(storm_ext_clean)
pmul<-vector(mode="numeric", length=size)
cmul<-vector(mode="numeric", length=size)

for(i in 1:size)
{
  if(storm_ext_clean$PROPDMGEXP[i] %in% c('H','h')){
    pmul[i]=100*storm_ext_clean$PROPDMG[i]}
  else if(storm_ext_clean$PROPDMGEXP[i] %in% c('K','k')){
    pmul[i]=1000*storm_ext_clean$PROPDMG[i]}
  else if(storm_ext_clean$PROPDMGEXP[i] %in% c('M','m')){
    pmul[i]=1000000*storm_ext_clean$PROPDMG[i]}
  else if(storm_ext_clean$PROPDMGEXP[i] %in% c('B','b')){
    pmul[i]=1000000000*storm_ext_clean$PROPDMG[i]}
  else{pmul[i]=0}
  
  if(storm_ext_clean$CROPDMGEXP[i] %in% c('H','h')){
    cmul[i]=100*storm_ext_clean$CROPDMG[i]}
  else if(storm_ext_clean$CROPDMGEXP[i] %in% c('K','k')){
    cmul[i]=1000*storm_ext_clean$CROPDMG[i]}
  else if(storm_ext_clean$CROPDMGEXP[i] %in% c('M','m')){
    cmul[i]=1000000*storm_ext_clean$CROPDMG[i]}
  else if(storm_ext_clean$CROPDMGEXP[i] %in% c('B','b')){
    cmul[i]=1000000000*storm_ext_clean$CROPDMG[i]}
  else{cmul[i]=0}
}
storm_ext_clean_mul<-cbind(storm_ext_clean,pmul,cmul)
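
The row-by-row loop above can be slow on a dataset of this size. One possible alternative, sketched here on made-up values, is a vectorized lookup of the multiplier by exponent code. The helper to_dollars and the multiplier table are hypothetical names introduced for this sketch, not part of the original analysis.

```r
# Multiplier per exponent code, keyed by the lower-case letter.
multiplier <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

# Hypothetical helper: convert a value/exponent pair into dollars.
# Codes outside the lookup table yield 0, mirroring the else branch of the loop.
to_dollars <- function(value, exp_code) {
  mult <- unname(multiplier[tolower(as.character(exp_code))])
  mult[is.na(mult)] <- 0
  value * mult
}

# Toy stand-in for storm_ext_clean; the values are made up for illustration.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 0, 3),
  PROPDMGEXP = factor(c("K", "m", "B", "", "h"))
)
toy$pmul <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
toy
```

On the real data, the same helper would be called once for PROPDMG/PROPDMGEXP and once for CROPDMG/CROPDMGEXP to obtain pmul and cmul.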

Results

Finally, we can view the results after loading the dplyr and ggplot2 libraries.

library(dplyr)
library(ggplot2)

The top 10 events with the greatest economic consequences are shown in the figure below.

eco_cons<-storm_ext_clean_mul %>%
  group_by(EVTYPE) %>%
  summarise(totalsum = sum(pmul+cmul)) %>%
  top_n(n = 10, wt = totalsum) %>%
  arrange(desc(totalsum/1000000))
bp<- ggplot(eco_cons, aes(x=EVTYPE, y=totalsum/1000000,fill=EVTYPE))+
  geom_bar(width = 1, stat = "identity")+
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())+labs(title = "Economic Consequences", x = "Event Type", y = "Money for damages (million dollars)")
bp
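
Note that ggplot2 places the bars in the order of the factor levels of EVTYPE, so sorting eco_cons with arrange() does not by itself sort the bars. If a descending chart is wanted, one option is to reorder the levels by the total, as in the following sketch. The data frame eco_demo and its values are made up for illustration.

```r
library(ggplot2)

# Made-up totals in million dollars, standing in for eco_cons.
eco_demo <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  total_musd = c(150, 20, 57)
)

# reorder() sorts the factor levels by the negated total,
# so the event with the largest total comes first on the x axis.
eco_demo$EVTYPE <- reorder(eco_demo$EVTYPE, -eco_demo$total_musd)

p <- ggplot(eco_demo, aes(x = EVTYPE, y = total_musd, fill = EVTYPE)) +
  geom_col() +
  labs(title = "Economic Consequences", x = "Event Type",
       y = "Damage (million dollars)")
p
```

The same reordering could be applied to the health chart by using its totalsum column instead.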

The top 10 events with the greatest health consequences are shown in the figure below.

health_cons<-storm_ext_clean_mul %>%
  group_by(EVTYPE) %>%
  summarise(totalsum = sum(FATALITIES+INJURIES)) %>%
  top_n(n = 10, wt = totalsum) %>%
  arrange(desc(totalsum))
bp<- ggplot(health_cons, aes(x=EVTYPE, y=totalsum,fill=EVTYPE))+
  geom_bar(width = 1, stat = "identity")+
  theme(axis.text.x=element_blank(),
        axis.ticks.x=element_blank())+labs(title = "Health Consequences", x = "Event Type", y = "Number of injuries and fatalities")
bp