Synopsis

We want to answer two main questions using this rather large database: which environmental events have the greatest effect on the health of people in the United States, and which have the greatest effect on the economy. My approach is to first explore the dataset to find the variables that matter for these questions. From the metadata I worked out the meaning of the variable names and how they differ. The columns “EVTYPE”, “PROPDMG”, “CROPDMG”, “FATALITIES” and “INJURIES” are the ones that mainly matter to us. PROPDMG and CROPDMG have to be interpreted together with their suffixes (K, M, B, H and so on), which are stored in PROPDMGEXP and CROPDMGEXP. Since I set echo = TRUE throughout the file, I think it is better to stop explaining the project steps and dive into the data.

Data Processing

This section is devoted to reading the data and preparing the base for our analysis.
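A minimal sketch of the reading step, assuming the raw data is a (possibly bz2-compressed) CSV file. The helper name read_storm_data and the file name in the commented usage line are my own placeholders, not taken from the original analysis; the column names are the ones used elsewhere in this report, plus BGN_DATE for the event date.

```r
# Hypothetical helper: read the storm data and keep only the columns used later.
read_storm_data <- function(path) {
  # read.csv can read .bz2-compressed files directly
  raw <- read.csv(path, stringsAsFactors = FALSE)
  keep <- c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES",
            "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
  # keep only those columns that are actually present in the file
  raw[, intersect(keep, names(raw)), drop = FALSE]
}

# Usage (file name is an assumption):
# data <- read_storm_data("repdata_data_StormData.csv.bz2")
```

Dropping the unused columns early keeps the later steps lighter, since the full table is large.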

Analysis

We want to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

We need to explore our dataset to see what we can obtain from it and how we can answer these questions.

Effect on Health

To get a better idea of the casualties caused by these events, I first plot the number of fatalities and injuries over the years (1951-2011).
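A rough sketch of how such yearly totals can be computed and plotted, shown on a few made-up rows rather than the real data. It assumes the event date sits in a BGN_DATE column formatted like "4/18/1950 0:00:00"; the names toy_years and yearly are placeholders of mine.

```r
# Toy rows standing in for the real data; the values are invented for illustration.
toy_years <- data.frame(
  BGN_DATE   = c("4/18/1950 0:00:00", "6/8/1951 0:00:00", "7/1/1951 0:00:00"),
  FATALITIES = c(0, 1, 2),
  INJURIES   = c(15, 0, 4)
)

# Extract the year; as.Date ignores the trailing time part beyond the given format.
toy_years$year <- as.integer(format(as.Date(toy_years$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Sum fatalities and injuries per year.
yearly <- aggregate(cbind(FATALITIES, INJURIES) ~ year, data = toy_years, FUN = sum)

# One line per measure: solid for injuries, dashed for fatalities.
plot(yearly$year, yearly$INJURIES, type = "l", xlab = "year", ylab = "count",
     ylim = range(0, yearly$INJURIES, yearly$FATALITIES))
lines(yearly$year, yearly$FATALITIES, lty = 2)
legend("topleft", legend = c("injuries", "fatalities"), lty = c(1, 2))
```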

I believe the gradual increase in the number of casualties is mainly due to technological progress. Looking at the last 20 years of the plot, we can say that the number of injuries is decreasing over time, apart from some extraordinary events such as Hurricane Katrina.

In the next step we answer the first question by using the EVTYPE variable in the dataset.
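A small sketch of this kind of ranking, again on invented rows. Treating "harm" as the sum of fatalities and injuries per event type is an assumption made for the sketch, and the names toy_events, health and top are placeholders.

```r
# Toy rows; the values are invented for illustration only.
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "LIGHTNING"),
  FATALITIES = c(3, 1, 2, 1),
  INJURIES   = c(40, 2, 10, 0)
)

# Total fatalities and injuries per event type.
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy_events, FUN = sum)

# Assumption: casualties = fatalities + injuries.
health$casualties <- health$FATALITIES + health$INJURIES

# Sort from most to least harmful and keep the top entries.
health <- health[order(-health$casualties), ]
top <- head(health, 10)
```

A weighted sum that counts a fatality more heavily than an injury would be an equally reasonable choice and could change the order of the lower ranks.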

Here we can see that tornadoes have been the most dangerous weather event for people’s health in the US, by a significant margin over the other events. Excessive heat, flood and lightning come next.

Effect on Economy

Damages fall into two classes: crop damage, and property damage, that is, direct harm to infrastructure, buildings and so on.

library(dplyr)
library(ggplot2)

# Map an exponent suffix (H, K, M, B; case-insensitive) to its multiplier.
# Any other code (blank, digits, symbols) leaves the value unscaled,
# which matches the behaviour of the original row-by-row loops.
exp_multiplier <- function(exp) {
  mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)[toupper(as.character(exp))]
  unname(ifelse(is.na(mult), 1, mult))
}

# Keep events with any recorded damage, convert to dollars, total per event type.
dmg <- data %>%
  filter(PROPDMG > 0 | CROPDMG > 0) %>%
  mutate(PROPDMG = PROPDMG * exp_multiplier(PROPDMGEXP),
         CROPDMG = CROPDMG * exp_multiplier(CROPDMGEXP)) %>%
  group_by(EVTYPE) %>%
  summarise(prop = sum(PROPDMG), crop = sum(CROPDMG)) %>%
  mutate(sum = prop + crop)

# The 20 event types with the largest total damage.
dmg1 <- dmg %>%
  select(EVTYPE, sum) %>%
  arrange(desc(sum)) %>%
  head(20)

g <- ggplot(dmg1, aes(x = reorder(EVTYPE, sum), y = sum, fill = sum))
g <- g + geom_bar(stat = "identity")
g <- g + theme(axis.text.x = element_text(angle = 90))
g <- g + xlab("event type") + ylab("amount of damage in $")
g

Result

Here we can see that the economic damage from floods is many times greater than that of most other weather events, with Typhoon, Tornado and storm surge in the next places.

I think it is better to look at both factors, health and economy, at the same time to decide which event is more harmful, since the money available to prevent these damages is scarce and we need to set priorities. If we look at both analyses, we can see that tornadoes and floods are the first events to prepare for.