Summary (synopsis)

In this report, we import and clean the storm data, then analyze how different types of events affect population health and the economy. We show that tornadoes are the most harmful to population health, as they killed or injured more people than any other event type we considered. We then address the question of which types of events have the greatest economic consequences, and show that floods caused more combined property and crop damage than any other event type.

Data processing

First we import the data using the following code.

storm <- read.csv("repdata_data_StormData.csv.bz2")

Next we extract the year from the begin date of each event.

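# Store the year as an integer so that later comparisons are numeric, not string-based.
storm$YEAR <- as.integer(format(as.Date(storm$BGN_DATE, "%m/%d/%Y %H:%M:%S"), format="%Y"))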
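
As a quick check of the date format, the sketch below parses a single made-up timestamp written in the layout that the format string expects. The value of sample_date is an assumption for illustration, not a row copied from the data.

```r
# Hypothetical sample value in the "%m/%d/%Y %H:%M:%S" layout.
sample_date <- "4/18/1950 0:00:00"

# as.Date() keeps only the calendar date; format() then returns the year as text.
parsed <- as.Date(sample_date, "%m/%d/%Y %H:%M:%S")
year_text <- format(parsed, format = "%Y")

# as.integer() turns the text into a number that can be compared numerically.
year_num <- as.integer(year_text)
print(year_num)
```

Because format() returns a character string, converting it with as.integer() avoids relying on string comparison when filtering by year.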

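Next we convert the property and crop damage values to US dollars by applying the magnitude codes stored in PROPDMGEXP and CROPDMGEXP (H = 10^2, K = 10^3, M = 10^6, B = 10^9). Digit codes are treated as powers of ten.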

# Property damage: scale PROPDMG by the magnitude coded in PROPDMGEXP.
storm$PROPDMG[storm$PROPDMGEXP == "B"] <- storm$PROPDMG[storm$PROPDMGEXP == "B"] * 10^9
storm$PROPDMG[storm$PROPDMGEXP %in% c("m", "M")] <- storm$PROPDMG[storm$PROPDMGEXP %in% c("m", "M")] * 10^6
storm$PROPDMG[storm$PROPDMGEXP %in% c("k", "K")] <- storm$PROPDMG[storm$PROPDMGEXP %in% c("k", "K")] * 10^3
storm$PROPDMG[storm$PROPDMGEXP %in% c("h", "H")] <- storm$PROPDMG[storm$PROPDMGEXP %in% c("h", "H")] * 10^2
# Digit codes 1-9 are treated as powers of ten.
for (i in 1:9) {
      sel <- storm$PROPDMGEXP == as.character(i)
      storm$PROPDMG[sel] <- storm$PROPDMG[sel] * 10^i
}
# Crop damage: same scaling, using the codes in CROPDMGEXP.
storm$CROPDMG[storm$CROPDMGEXP == "B"] <- storm$CROPDMG[storm$CROPDMGEXP == "B"] * 10^9
storm$CROPDMG[storm$CROPDMGEXP %in% c("m", "M")] <- storm$CROPDMG[storm$CROPDMGEXP %in% c("m", "M")] * 10^6
storm$CROPDMG[storm$CROPDMGEXP %in% c("k", "K")] <- storm$CROPDMG[storm$CROPDMGEXP %in% c("k", "K")] * 10^3
storm$CROPDMG[storm$CROPDMGEXP == "2"] <- storm$CROPDMG[storm$CROPDMGEXP == "2"] * 10^2
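
The same scaling can also be written with a lookup table, which keeps the code-to-multiplier mapping in one place. The sketch below is an alternative under stated assumptions, not the code used for this report: the helper name exp_multiplier and the toy data frame are hypothetical, lower-case codes are treated like upper-case ones, and unrecognized codes (such as "" or "?") are assumed to mean a multiplier of 1.

```r
# Hypothetical helper: map a vector of exponent codes to numeric multipliers.
exp_multiplier <- function(codes) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  # Digit codes "0".."9" are treated as powers of ten.
  lookup[as.character(0:9)] <- 10^(0:9)
  # Look up each code after upper-casing it; unknown codes yield NA.
  mult <- unname(lookup[toupper(as.character(codes))])
  # Assumption: anything not in the table leaves the value unchanged.
  mult[is.na(mult)] <- 1
  mult
}

# Made-up values for illustration only.
toy <- data.frame(DMG = c(2.5, 10, 3, 7, 1),
                  EXP = c("K", "m", "B", "", "2"))
toy$DMG_USD <- toy$DMG * exp_multiplier(toy$EXP)
print(toy)
```

Applied to the real data, one call per damage column would replace the repeated assignments, for example storm$PROPDMG * exp_multiplier(storm$PROPDMGEXP).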

Because the data set is large and most of its variables are not relevant to this analysis, we keep only the variables we need.

clean_storm <- storm[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'CROPDMG','YEAR')]

Because events are not well recorded in the earlier years, we consider only events from 1980 onward.

clean_storm <- clean_storm[clean_storm$YEAR>=1980, ]

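As a final cleaning step, we add fatalities and injuries to get the total harm to population health (HDMG), and add property and crop damage to get the total economic damage (ECONDMG).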

clean_storm$HDMG <- clean_storm$FATALITIES+clean_storm$INJURIES
clean_storm$ECONDMG <- clean_storm$PROPDMG+clean_storm$CROPDMG

Results

In this section we present the damage to population health and to the economy.

First we find the five event types that killed or injured the most people.

HUMAN_DMG <- aggregate(HDMG ~ EVTYPE, data=clean_storm, sum)
HUMAN_DMG <- HUMAN_DMG[order(-HUMAN_DMG$HDMG),][1:5,]
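
To make the aggregate-then-sort pattern concrete, here is a small sketch on a made-up data frame. The values and the helper name top_events are hypothetical. The helper wraps the same two steps and uses head() instead of 1:5 indexing.

```r
# Hypothetical helper: total a numeric column per event type and keep the n largest.
top_events <- function(data, column, n = 5) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- column
  # Sort by the summed column in decreasing order.
  totals <- totals[order(-totals[[column]]), ]
  head(totals, n)
}

# Made-up values for illustration only.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  HDMG   = c(10, 3, 5, 8, 1, 0)
)
top2 <- top_events(toy, "HDMG", n = 2)
print(top2)
```

One design note: head(totals, n) returns fewer rows when there are fewer than n event types, whereas indexing with 1:5 would pad the result with NA rows.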

The following plot shows the top five event types and the number of people each killed or injured.

library(ggplot2)
ggplot(data=HUMAN_DMG, aes(x=reorder(EVTYPE, -HDMG), y= HDMG)) + geom_bar(stat="identity") + xlab("Event type") + ylab("Number of people") + ggtitle("Number of people killed or injured by event type")

Next we find the five event types that caused the most economic damage.

ECON_DMG <- aggregate(ECONDMG ~ EVTYPE, data=clean_storm, sum)
ECON_DMG <- ECON_DMG[order(-ECON_DMG$ECONDMG),][1:5,]

The following plot shows the top five event types and the economic damage each caused, in US dollars.

library(ggplot2)
ggplot(data=ECON_DMG, aes(x=reorder(EVTYPE, -ECONDMG), y= ECONDMG)) + geom_bar(stat="identity") + xlab("Event type") + ylab("Economic damage in USD") + ggtitle("Economic damage by event type")
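
Totals in raw US dollars can produce very long axis labels. One option, sketched below on made-up numbers, is to rescale the totals to billions of dollars before plotting. The data frame toy_econ and the column name ECONDMG_BN are hypothetical.

```r
# Made-up totals for illustration only.
toy_econ <- data.frame(EVTYPE  = c("FLOOD", "HAIL"),
                       ECONDMG = c(1.5e11, 2.0e10))

# Express the totals in billions of USD.
toy_econ$ECONDMG_BN <- toy_econ$ECONDMG / 1e9
print(toy_econ)
```

The rescaled column could then be mapped to the y axis, with the axis label changed to state the unit.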