Synopsis

In this report I describe the harmful impact that weather events have on (1) public health and (2) property and crops.
The hypothesis is that some weather events are far more harmful than others; to show this, I use data downloaded from NOAA. From these data I found that tornadoes are the most harmful to public health: they cause the largest combined number of injuries and fatalities.
Floods are the most harmful to property and crops. Together with hurricanes, tornadoes, and storm surges, they account for most of the damage to property and crops.

Data Processing

The data were downloaded from the National Weather Service, which is part of NOAA. The events in the database start in the year 1950 and end in November 2011.

destfile <- "RR_CP2_stormdata.csv.bz2"
# Check whether the archive already exists before downloading it.
if (!file.exists(destfile)){
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, destfile=destfile)
}  
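# stringsAsFactors = TRUE keeps text columns as factors (the default before R 4.0),
# which the factor-based mapping later in this report relies on.
storm_data <- read.csv(destfile, stringsAsFactors = TRUE)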

With the storm data read in, load the packages used for plotting (ggplot2) and value mapping (plyr):

library(ggplot2)
library(plyr)

Now we will prepare the data for both questions.

Reading and preparing data for public health impact

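# Sum injuries and fatalities per event type.
sdaggfat <- with(storm_data, aggregate(INJURIES + FATALITIES ~ EVTYPE, FUN = "sum"))
# Keep the five event types with the largest combined count.
t5 <- head(sdaggfat[order(-sdaggfat[[2]]), ], 5)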
colnames(t5)<-c("EVTYPE","COUNT")

Many event types are recorded, but only the top 5 are shown, which is enough to identify the event type with the greatest impact.
The data for the top 5 events are now prepared:

t5
##             EVTYPE COUNT
## 834        TORNADO 96979
## 130 EXCESSIVE HEAT  8428
## 856      TSTM WIND  7461
## 170          FLOOD  7259
## 464      LIGHTNING  6046
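
The aggregation step can be checked on a small made-up data frame. The sketch below uses hypothetical values (`toy` is not part of the NOAA data): it sums injuries and fatalities per event type, then sorts in decreasing order and keeps the top rows.

```r
# Toy data: hypothetical values, not taken from the NOAA database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
  INJURIES   = c(10, 2, 5, 0, 1),
  FATALITIES = c(1, 0, 2, 0, 3)
)

# Sum injuries and fatalities per event type.
toy_agg <- aggregate(INJURIES + FATALITIES ~ EVTYPE, data = toy, FUN = sum)
colnames(toy_agg) <- c("EVTYPE", "COUNT")

# Sort by the combined count, largest first, and keep the top 2.
toy_top <- head(toy_agg[order(-toy_agg$COUNT), ], 2)
print(toy_top)
```

The same pattern scales to the full data set; only the number of rows kept by `head()` changes.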

Let’s prepare the property and crop damage data to answer the second question:

substorm <- storm_data[storm_data$PROPDMG > 0 | storm_data$CROPDMG > 0,
                       c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

I take the subset of the data for which damage is greater than zero, and keep only the columns relevant to this analysis.

I identified the units stored in the xxxEXP columns (PROPDMGEXP and CROPDMGEXP). They were quite confusing, and since there was no description of the values, I mapped them according to my own understanding. When in doubt, I kept the damage value as it is (a multiplier of 1).

unique(substorm$PROPDMGEXP)
##  [1] K M B m   + 0 5 6 4 h 2 7 3 H -
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(substorm$CROPDMGEXP)
## [1]   M K m B ? 0 k
## Levels:  ? 0 2 B k K m M

Mapping:

mapPROPDMG <- mapvalues(substorm$PROPDMGEXP,
                         c("K","M","", "B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"), 
                         c(1e3,1e6, 1, 1e9,1e6, 1,  1, 1e5,1e6, 1, 1e4,1e2,1e3, 1, 1e7,1e2, 1, 10, 1e8))
mapCROPDMG <- mapvalues(substorm$CROPDMGEXP,
                         c("","M","K","m","B","?","0","k","2"),
                         c( 1,1e6,1e3,1e6,1e9, 1,  1, 1e3,1e2))
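
A similar mapping can be sketched in base R with a named lookup vector. This is an illustrative alternative, not the code used in this report: `exp_to_multiplier` is a hypothetical helper that covers only the letter codes and falls back to a multiplier of 1 for everything else. Digit codes, which the mapping above treats as powers of ten, are not handled here.

```r
# Hypothetical helper: translate exponent codes into numeric multipliers.
# Only the letter codes are listed; any other code falls back to 1.
exp_to_multiplier <- function(codes) {
  lookup <- c(K = 1e3, k = 1e3, M = 1e6, m = 1e6, B = 1e9)
  mult <- unname(lookup[as.character(codes)])
  mult[is.na(mult)] <- 1  # unknown or empty code: keep the value as it is
  mult
}

codes <- c("K", "M", "", "B", "?", "m")
multipliers <- exp_to_multiplier(codes)
print(multipliers)
```

A lookup vector keeps each code next to its multiplier, which makes the assumptions behind the mapping easier to review than two parallel vectors.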

The final data preparation step calculates the property and crop damage totals separately, and then their sum.

substorm$PROPTOTALDMG <- as.numeric(levels(mapPROPDMG))[mapPROPDMG] * substorm$PROPDMG 
substorm$CROPTOTALDMG <- as.numeric(levels(mapCROPDMG))[mapCROPDMG] * substorm$CROPDMG
substorm$TOTALDMG<-substorm$PROPTOTALDMG+substorm$CROPTOTALDMG 
agg_dmg <- aggregate(TOTALDMG ~ EVTYPE, data = substorm, FUN = sum)
ord_agg_dmg <- agg_dmg[order(-agg_dmg$TOTALDMG),]
head(ord_agg_dmg)
##                EVTYPE     TOTALDMG
## 72              FLOOD 150319678257
## 197 HURRICANE/TYPHOON  71913712800
## 354           TORNADO  57362333946
## 299       STORM SURGE  43323541000
## 116              HAIL  18761221986
## 59        FLASH FLOOD  18243991078
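
The expression `as.numeric(levels(f))[f]` used above converts a factor whose labels are numbers into those numbers. A minimal sketch with made-up values shows why calling `as.numeric()` on the factor directly would not work: it returns the internal level codes, not the labels.

```r
# Made-up factor whose labels look like numbers.
f <- factor(c("1000", "1e+06", "1", "1000"))

# as.numeric() on a factor returns the integer level codes.
codes <- as.numeric(f)

# Converting the levels first and indexing by the factor recovers the values.
values <- as.numeric(levels(f))[f]

print(codes)
print(values)
```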

Input data is ready to present the results!

Results

Across the United States, which types of events are most harmful with respect to population health?

We will plot the top 5 most harmful events. The plot shows combined totals of injuries and fatalities, without separating the two.

jColors <- c('chartreuse3', 'cornflowerblue', 'darkgoldenrod1', 'peachpuff3',
             'mediumorchid2')
p1 <- ggplot(t5, aes(EVTYPE, COUNT)) + geom_col(fill=jColors)+
  labs(x = "Event Type", y = "Count") +
  labs(title = "Top 5 harmful events") 
print(p1)

We can see that TORNADO has the greatest impact on public health. The runner-up is EXCESSIVE HEAT.

Across the United States, which types of events have the greatest economic consequences?

We will plot the 5 events causing the most damage to property and crops. The plot shows total damage, without separating it into property and crop damage.

tp5<-head(ord_agg_dmg, 5)
p2 <- ggplot(tp5, aes(EVTYPE, TOTALDMG)) + geom_col(fill=jColors)+
  labs(x = "Event Type", y = "Total damage (USD)") +
  labs(title = "Top 5 events by property and crop damage")
print(p2)

We can see that FLOOD has the greatest impact on property and crops. The runner-up is HURRICANE/TYPHOON, and the second runner-up is TORNADO.