A Russian data scientist made a new discovery! The most harmful disaster for the population is… the tornado, and the most destructive disaster for the economy is… the flood!

Synopsis:

This research was conducted by Sergey Chernov in January 2015. The project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The data were loaded and cleaned. We then determined the ten disaster types most harmful to the population and the ten most destructive to the economy. Tornadoes are the most harmful to the population, and floods are the most destructive to the economy. The government must increase the budget for tornado prevention.

Data processing: Loading and preprocessing the data

  1. Load the data from the raw file repdata-data-StormData.csv.bz2. We load all information from the raw file.
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
setwd("D:/Program/RepProject2")
storm<-read.csv(con<-bzfile("repdata-data-StormData.csv.bz2"),sep=",")
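
The loading step can be tried without the full NOAA file. The following is a minimal sketch, assuming a tiny made-up file (`toy-storm.csv.bz2` and its three columns are hypothetical stand-ins, not the real data set), that writes a bzip2-compressed CSV and reads it back through a `bzfile()` connection in the same way as above.

```r
# Hypothetical toy file standing in for repdata-data-StormData.csv.bz2
toy_path <- file.path(tempdir(), "toy-storm.csv.bz2")

# Write a small bzip2-compressed CSV (made-up rows, for illustration only)
con <- bzfile(toy_path, open = "w")
writeLines(c("EVTYPE,INJURIES,FATALITIES",
             "TORNADO,3,1",
             "FLOOD,0,2"), con)
close(con)

# read.csv() opens the bzfile() connection, decompresses, and closes it again
toy <- read.csv(bzfile(toy_path), stringsAsFactors = FALSE)
toy
```

Passing the connection directly to `read.csv()` avoids keeping a separate connection variable around after the read.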
  2. Exclude events without effects

We excluded events without consequences, since they affect neither the total damage nor the total number of victims.

tidy_storm<-storm[which((storm$INJURIES!=0)|(storm$FATALITIES!=0)|(storm$PROPDMG!=0)),c("EVTYPE","INJURIES","FATALITIES","PROPDMG","PROPDMGEXP")]

After that, we prepared a multiplier variable (DMGEXP) for calculating the amount of property damage. The code book (Storm Events) was used for the conversion.

tidy_storm<-mutate(tidy_storm,DMGEXP=0)

tidy_storm[tidy_storm$PROPDMGEXP=="K","DMGEXP"]<-1000
tidy_storm[tidy_storm$PROPDMGEXP=="M","DMGEXP"]<-1e+06
tidy_storm[tidy_storm$PROPDMGEXP=="","DMGEXP"]<-1
tidy_storm[tidy_storm$PROPDMGEXP=="B","DMGEXP"]<-1e+09
tidy_storm[tidy_storm$PROPDMGEXP=="m","DMGEXP"]<-1e+06
tidy_storm[tidy_storm$PROPDMGEXP=="0","DMGEXP"]<-1
tidy_storm[tidy_storm$PROPDMGEXP=="8","DMGEXP"]<-1e+08
tidy_storm[tidy_storm$PROPDMGEXP=="7","DMGEXP"]<-1e+07
tidy_storm[tidy_storm$PROPDMGEXP=="5","DMGEXP"]<-1e+05
tidy_storm[tidy_storm$PROPDMGEXP=="6","DMGEXP"]<-1e+06
tidy_storm[tidy_storm$PROPDMGEXP=="4","DMGEXP"]<-1e+04
tidy_storm[tidy_storm$PROPDMGEXP=="3","DMGEXP"]<-1e+03
tidy_storm[tidy_storm$PROPDMGEXP=="2","DMGEXP"]<-1e+02
tidy_storm[tidy_storm$PROPDMGEXP=="h","DMGEXP"]<-100
tidy_storm[tidy_storm$PROPDMGEXP=="H","DMGEXP"]<-100
tidy_storm[tidy_storm$PROPDMGEXP=="1","DMGEXP"]<-10
tidy_storm[tidy_storm$PROPDMGEXP=="+","DMGEXP"]<-0
tidy_storm[tidy_storm$PROPDMGEXP=="-","DMGEXP"]<-0
tidy_storm[tidy_storm$PROPDMGEXP=="?","DMGEXP"]<-0
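
The same conversion can also be written as a single lookup. The sketch below is an alternative formulation, not the code used for the results in this report: `codes`, `mult`, and `exp_to_multiplier` are hypothetical names, and the values simply repeat the assignments above. Codes that are not listed get 0, which matches the default set by `mutate(tidy_storm,DMGEXP=0)`.

```r
# Exponent codes and their multipliers, in the same pairing as the assignments above
codes <- c("", "0", "1", "2", "h", "H", "3", "K", "4", "5",
           "6", "m", "M", "7", "8", "B", "+", "-", "?")
mult  <- c(1, 1, 10, 1e2, 1e2, 1e2, 1e3, 1e3, 1e4, 1e5,
           1e6, 1e6, 1e6, 1e7, 1e8, 1e9, 0, 0, 0)

# Hypothetical helper: map a vector of PROPDMGEXP codes to numeric multipliers
exp_to_multiplier <- function(x) {
  m <- mult[match(as.character(x), codes)]  # match() also handles the empty code ""
  m[is.na(m)] <- 0                          # unlisted codes fall back to 0
  m
}

# Made-up rows, for illustration only
demo <- data.frame(PROPDMG = c(25, 2.5, 5),
                   PROPDMGEXP = c("K", "M", ""),
                   stringsAsFactors = FALSE)
demo$damage <- demo$PROPDMG * exp_to_multiplier(demo$PROPDMGEXP)
demo
```

`match()` is used instead of indexing a named vector because an empty string cannot serve as a lookup name in R.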
  3. Group by event type (sum injuries and fatalities for the first analysis, and sum the property damage for the second analysis).
tidy_storm<-group_by(tidy_storm,EVTYPE)
all_lethal<-summarize(tidy_storm, inj=sum(INJURIES+FATALITIES))
all_damage<-summarize(tidy_storm, damage=sum(PROPDMG*DMGEXP))
  4. Find the top 10 lethal events and the top 10 destructive events, sorted by their effect.
top_lethal<-arrange(all_lethal,desc(inj))[1:10,]
top_damage<-arrange(all_damage,desc(damage))[1:10,]
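
To make the group, sum, and rank steps easier to follow, here is a minimal sketch on made-up rows. It uses base R (`aggregate()` and `order()`) rather than the dplyr calls above, so it is an equivalent illustration and not the code that produced the results; `toy`, `victims`, `totals`, and `top2` are hypothetical names.

```r
# Made-up events, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3),
  stringsAsFactors = FALSE
)

# Victims per event = injuries + fatalities, as in sum(INJURIES+FATALITIES)
toy$victims <- toy$INJURIES + toy$FATALITIES

# Sum victims per event type, then sort in descending order
totals <- aggregate(victims ~ EVTYPE, data = toy, FUN = sum)
ranked <- totals[order(-totals$victims), ]

# Keep the top 2 rows (the report keeps the top 10)
top2 <- head(ranked, 2)
top2
```

In this toy data TORNADO sums to 18 victims and FLOOD to 6, so those two rows come out on top in that order.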

Results:

Question: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Answer: The disasters most harmful to the population are:

top_lethal
## Source: local data frame [10 x 2]
## 
##               EVTYPE   inj
## 1            TORNADO 96979
## 2     EXCESSIVE HEAT  8428
## 3          TSTM WIND  7461
## 4              FLOOD  7259
## 5          LIGHTNING  6046
## 6               HEAT  3037
## 7        FLASH FLOOD  2755
## 8          ICE STORM  2064
## 9  THUNDERSTORM WIND  1621
## 10      WINTER STORM  1527
qplot(data=top_lethal[1:5,],x=EVTYPE,y=inj,xlab="Type of disaster",ylab="The number of victims", main="Number of victims caused by disasters", geom="area",size=20,colour=EVTYPE)
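
As `qplot()` is deprecated in recent ggplot2 releases, the same figure can be sketched as a bar chart with `ggplot()` and `geom_col()`. This is an alternative rendering under that assumption, not the plot produced in this report; the `top5` data frame is a hypothetical name, and its values are copied from the first five rows of the `top_lethal` table above.

```r
library(ggplot2)

# First five rows of the top_lethal table, typed in by hand
top5 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  inj    = c(96979, 8428, 7461, 7259, 6046)
)

# One bar per event type, ordered from most to fewest victims
p <- ggplot(top5, aes(x = reorder(EVTYPE, -inj), y = inj)) +
  geom_col() +
  labs(x = "Type of disaster",
       y = "The number of victims",
       title = "Number of victims caused by disasters")
```

Calling `print(p)` draws the chart. The same pattern applies to the economic damage plot by swapping in the `top_damage` values and labels.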

Question: Across the United States, which types of events have the greatest economic consequences?

Answer: The most destructive disasters are:

top_damage
## Source: local data frame [10 x 2]
## 
##               EVTYPE       damage
## 1              FLOOD 144657709807
## 2  HURRICANE/TYPHOON  69305840000
## 3            TORNADO  56947380617
## 4        STORM SURGE  43323536000
## 5        FLASH FLOOD  16822673979
## 6               HAIL  15735267513
## 7          HURRICANE  11868319010
## 8     TROPICAL STORM   7703890550
## 9       WINTER STORM   6688497251
## 10         HIGH WIND   5270046260
qplot(data=top_damage[1:5,],x=EVTYPE,y=damage,xlab="Type of disaster",ylab="Economic consequences, $", main="Economic consequences ($) caused by disasters", geom="area",size=20,colour=EVTYPE)