Introduction

In this work we present a small analysis of the damage and injuries caused by different weather conditions.

The analysis is based on data from the National Climatic Data Center Storm Events database, available (as of April 26, 2015) at https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2. The code book and additional information about the data were available (as of April 26, 2015) at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf and https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

In the analysis we addressed two questions:

  1. Across the United States, which types of events are most harmful with respect to population health?

  2. Across the United States, which types of events have the greatest economic consequences?

The analysis was carried out in the R programming language, and the code is included in this report. Questions (1) and (2) are addressed using simple graphs.

Data processing

The computer specifications:

MacBook Pro (13-inch, Mid 2010), Processor 2.4 GHz Intel Core 2 Duo, 8 GB 1067 MHz DDR3, NVIDIA GeForce 320M 256 MB, OSX Yosemite, Version 10.10.3

The R environment settings are summarised below:

sessionInfo()
## R version 3.1.2 (2014-10-31)
## Platform: x86_64-apple-darwin13.4.0 (64-bit)
## 
## locale:
## [1] C
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## loaded via a namespace (and not attached):
## [1] digest_0.6.6     evaluate_0.5.5   formatR_1.1      htmltools_0.2.6 
## [5] knitr_1.9        rmarkdown_0.3.10 stringr_0.6.2    tools_3.1.2     
## [9] yaml_2.1.13

Downloading the file:

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","./repdata-data-StormData.csv.bz2",method="curl")

Reading the .bz2 file from the local directory (read.csv decompresses it through a bzfile connection, so no separate unzipping step is needed):

data <- read.csv(bzfile("./repdata-data-StormData.csv.bz2"))
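
R can read a bzip2-compressed CSV file without a separate decompression step. The following is a minimal sketch of that idea using a small temporary file; the file and its two rows are made up for illustration and are not taken from the storm data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, illustration only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,INJURIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv() decompresses through the bzfile() connection while reading
toy <- read.csv(bzfile(tmp))
print(toy)

# Remove the temporary file again
unlink(tmp)
```

The same call pattern should apply to the downloaded storm data file, assuming it was saved under the name used in the download step.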

Loading the necessary libraries:

library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(ggplot2)
library(gridExtra)
## Loading required package: grid

We grouped the data by EVTYPE, which the code book identifies as the type of weather event. For later convenience we also added a factor variable “Injuries” holding the same totals.

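# Sum the injuries for each event type
g <- group_by(data, EVTYPE)
gn <- summarise(g, sum(INJURIES, na.rm = TRUE))
names(gn) <- c("WeatherType", "inj")
# Add a factor copy of the totals, used for the point-size legend in the plots
gn <- cbind(gn, factor(gn$inj))
names(gn) <- c("WeatherType", "inj", "Injuries")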

We plotted the number of injuries for the ten weather conditions that caused the most injuries. The first graph includes Tornado, which caused the most injuries (91346 in total). The second graph omits Tornado, so that the distribution of injuries among the remaining conditions is easier to see.

myplot1 <- ggplot(subset(gn, inj > 1331), aes(WeatherType, inj)) +
  geom_point(aes(color = WeatherType, size = Injuries)) +
  scale_color_brewer(palette = "Spectral") +
  theme(panel.background = element_rect(fill = "azure4")) +
  theme(axis.text.x = element_text(size = rel(0.5))) +
  xlab("Weather type") + ylab("Nr. of Injuries") +
  labs(title = "Injuries due to weather") +
  coord_flip() +
  theme(legend.box = "horizontal", legend.key.height = unit(0.3, "cm"), legend.text = element_text(size = rel(0.5)))
# Same plot without the Tornado row (91346 injuries)
myplot2 <- ggplot(subset(gn, inj != 91346 & inj > 1331), aes(WeatherType, inj)) +
  geom_point(aes(color = WeatherType, size = Injuries)) +
  scale_color_brewer(palette = "Spectral") +
  theme(panel.background = element_rect(fill = "azure4")) +
  theme(axis.text.x = element_text(size = rel(0.5))) +
  xlab("Weather type") + ylab("Nr. of Injuries") +
  labs(title = "Injuries due to weather, tornado removed") +
  coord_flip() +
  theme(legend.box = "horizontal", legend.key.height = unit(0.3, "cm"), legend.text = element_text(size = rel(0.5)))
# grid.arrange() draws both plots; its return value does not need to be stored or printed
grid.arrange(myplot1, myplot2, nrow = 2)
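
The thresholds `inj>1331` and `inj!=91346` are tied to this particular data set. As a hedged sketch, the same selection could be made by rank and by event name instead. The example uses base R and a made-up data frame; `toy`, `top_n_events` and all numbers are illustrative assumptions, not results of the analysis.

```r
# Toy totals per event type (invented numbers, for illustration only)
toy <- data.frame(
  WeatherType = c("A", "B", "C", "D", "E"),
  inj = c(50, 5, 20, 1, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: return the n rows with the largest 'inj',
# optionally dropping some event types by name first
top_n_events <- function(df, n, exclude = character(0)) {
  df <- df[!(df$WeatherType %in% exclude), , drop = FALSE]
  df <- df[order(df$inj, decreasing = TRUE), , drop = FALSE]
  head(df, n)
}

top3 <- top_n_events(toy, 3)
top3_without_A <- top_n_events(toy, 3, exclude = "A")
print(top3$WeatherType)
print(top3_without_A$WeatherType)
```

Selecting by rank keeps the plots at a fixed number of event types even if the underlying data are updated.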

In the second section we examined the economic damage caused by weather. Because the data table was over 500 MB after unzipping, we subset the original table to a subtable containing only the rows in which both the property damage (PROPDMG) and the crop damage (CROPDMG) are nonzero. That subtable needs less memory than the original table.

We also converted the numbers given in PROPDMG and CROPDMG to actual damage values. The numbers in those two columns are amounts that still have to be multiplied by the scale given in the columns PROPDMGEXP and CROPDMGEXP respectively. We converted the symbols in those columns into numerical scale factors according to the code book and, because not all symbols are described in the code book, partly by our own estimate. We tried to keep the estimate as conservative as we could.

Our conversion was (symbol -> multiplier):

“+” -> 1
“-” -> 1
“?” -> 1
“0” -> 1
“1” -> 10
“2” -> 100
“3” -> 1000
“4” -> 10000
“5” -> 100000
“6” -> 1000000
“7” -> 10000000
“8” -> 100000000
“H” -> 100
“h” -> 100
“K” -> 1000
“M” -> 1000000
“m” -> 1000000

The code below additionally maps “B” to 1000000000, “k” to 1000, and an empty symbol to 1.

# Keep only rows where both property and crop damage are nonzero
sub <- subset(data, PROPDMG > 0 & CROPDMG > 0)
# Multipliers listed in the order of the PROPDMGEXP factor levels (C locale sort order)
yh <- table(sub$PROPDMGEXP)
prop_scale <- cbind(names(yh), c(1,1,1,1,10,100,1000,10000,100000,1000000,10000000,100000000,1,1000000000,100,1000,1000000,100,1000000))
for (i in 1:nrow(prop_scale)) {
  rows <- which(sub$PROPDMGEXP == prop_scale[i, 1])
  sub[rows, "PROPDMG"] <- as.numeric(prop_scale[i, 2]) * sub[rows, "PROPDMG"]
}
# Same conversion for the CROPDMGEXP factor levels
yh1 <- table(sub$CROPDMGEXP)
crop_scale <- cbind(names(yh1), c(1,1,100,1,1000000000,1000,1000000,1000,1000000))
for (i in 1:nrow(crop_scale)) {
  rows <- which(sub$CROPDMGEXP == crop_scale[i, 1])
  sub[rows, "CROPDMG"] <- as.numeric(crop_scale[i, 2]) * sub[rows, "CROPDMG"]
}
# Total property plus crop damage for each event type
sun <- group_by(sub, EVTYPE)
gin <- summarise(sun, sum(PROPDMG) + sum(CROPDMG))
names(gin) <- c("WeatherType", "dem")
gin <- cbind(gin, factor(gin$dem))
names(gin) <- c("WeatherType", "dem", "PropAndCorpDemages")
rr <- summary(gin)
head(rr)
##            WeatherType        dem              PropAndCorpDemages
##  "BLIZZARD        : 1  " "Min.   :1.000e+02  " "15000  : 4  "    
##  "COASTAL FLOODING: 1  " "1st Qu.:4.238e+05  " "1e+06  : 4  "    
##  "COLD AIR TORNADO: 1  " "Median :8.755e+06  " "55000  : 3  "    
##  "DROUGHT         : 1  " "Mean   :2.165e+09  " "550000 : 3  "    
##  "DRY MICROBURST  : 1  " "3rd Qu.:1.421e+08  " "100    : 1  "    
##  "DUST STORM      : 1  " "Max.   :1.260e+11  " "550    : 1  "
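
The multiplier vectors in the code above are matched to the exponent symbols by position, which depends on the sort order of the factor levels. A sketch of a less fragile alternative is a lookup vector keyed by the symbol itself. The names `exp_scale` and `scale_damage` and the toy input are assumptions made for illustration; the multipliers follow the conversion described in the text.

```r
# Multipliers keyed by symbol, following the conversion in the text;
# "B" and "k" are included because the code above also scales them
exp_scale <- c(
  "+" = 1, "-" = 1, "?" = 1, "0" = 1, "1" = 10, "2" = 100, "3" = 1e3,
  "4" = 1e4, "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "H" = 100, "h" = 100, "K" = 1e3, "k" = 1e3, "M" = 1e6, "m" = 1e6, "B" = 1e9
)

# Hypothetical helper: scale damage values by their exponent symbols;
# empty or unknown symbols are treated as a multiplier of 1
scale_damage <- function(value, symbol) {
  mult <- unname(exp_scale[as.character(symbol)])
  mult[is.na(mult)] <- 1
  value * mult
}

# Toy input (invented values, for illustration only)
scaled <- scale_damage(c(2.5, 10, 3, 7), c("K", "M", "", "h"))
print(scaled)
```

Because the lookup is by name, the result does not change with the locale or with the set of symbols that happen to be present in a subset.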

For plotting we used the same code as in part one, except that we now picked the seven most economically severe weather conditions.

myplot1 <- ggplot(subset(gin, dem > 2.594e+9), aes(WeatherType, PropAndCorpDemages)) +
  geom_point(aes(color = WeatherType, size = PropAndCorpDemages)) +
  scale_color_brewer(palette = "Spectral") +
  theme(panel.background = element_rect(fill = "azure4")) +
  theme(axis.text.x = element_text(size = rel(0.5))) +
  xlab("Weather type") + ylab("Property and crop damages") +
  labs(title = "Property and Crop Damages due to weather") +
  coord_flip() +
  theme(legend.box = "vertical", legend.key.height = unit(0.3, "cm"), legend.text = element_text(size = rel(0.5))) +
  theme(axis.text.y = element_text(size = rel(0.5)))
myplot1

Results

Question (1) is addressed in plot one, Injuries due to weather. It shows that the weather condition most harmful to population health, measured by the number of injuries, is Tornado.

We also looked at the subleading weather conditions. The second part of plot one, Injuries due to weather, tornado removed, shows that among the subleading conditions the injuries are more evenly distributed, led by TSTM Wind, Flood and Excessive Heat.

Question (2) is addressed in plot two, Property and Crop Damages due to weather.

It shows that the most economically disastrous weather condition is Flood. However, the two subleading weather conditions are two types of hurricane: Hurricane and Hurricane/Typhoon. If one combines those two and treats them as a single Hurricane type, the most economically challenging weather condition is actually Hurricane.
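
Combining the two hurricane categories amounts to recoding the event type before summing. A minimal sketch in base R follows. The data frame `toy`, the label "HURRICANE (ALL)" and every number in it are invented for illustration and are not results from the storm data.

```r
# Toy damage totals per event type (invented numbers, not results from the data)
toy <- data.frame(
  WeatherType = c("FLOOD", "HURRICANE", "HURRICANE/TYPHOON", "TORNADO"),
  dem = c(10, 4, 7, 5),
  stringsAsFactors = FALSE
)

# Recode both hurricane labels to one category, then re-aggregate and sort
toy$Combined <- ifelse(grepl("^HURRICANE", toy$WeatherType), "HURRICANE (ALL)", toy$WeatherType)
combined <- aggregate(dem ~ Combined, data = toy, FUN = sum)
combined <- combined[order(combined$dem, decreasing = TRUE), ]
print(combined)
```

Applied to the real totals in `gin`, the same recoding would allow the combined hurricane damage to be compared directly with Flood.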