In this assignment we investigate the impact of severe weather events on population health and the economic consequences of these events. The exploratory analysis is based on the storm database collected by the U.S. National Oceanic and Atmospheric Administration (NOAA) from 1948 to 2013. We focus on four variables recorded over this 65-year period: fatalities, injuries, property damage and crop damage. The aim of this report is to present the results clearly and to raise awareness that can help prevent or minimize the impact of severe weather events.

Purpose of the project:
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

Basic settings and libraries

library(ggplot2)
library(plyr)
library(grid)
library(gridExtra)

Data processing and creating R working environment

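# Read the raw storm data; keep text columns as character vectors.
data <- read.csv("repdata-data-StormData.csv", stringsAsFactors = FALSE, sep = ",", header = TRUE)
# Parse the event start date (month/day/year plus time) and keep only the date part.
data$BGN_DATE <- as.Date(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")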
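
As a small illustration of the date conversion, the sketch below parses two strings written in the month/day/year layout used above. The sample values are made up for illustration and are not taken from the dataset.

```r
# Hypothetical sample values in the "%m/%d/%Y %H:%M:%S" layout.
raw_dates <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# as.Date() with an explicit format parses the strings and drops the time part.
parsed <- as.Date(raw_dates, format = "%m/%d/%Y %H:%M:%S")

# Extract the calendar year as an integer, e.g. for grouping events by year.
years <- as.integer(format(parsed, "%Y"))

print(parsed)
print(years)
```

Strings that do not match the format come back as NA, so checking sum(is.na(parsed)) after the conversion is a quick way to spot malformed dates.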

Plot a histogram of the number of recorded events over the years

hist(data$BGN_DATE, breaks = 30, main = "Histogram of events by year", xlab = "Year")
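
With breaks = 30, hist() groups the dates into roughly 30 intervals rather than one bin per calendar year. If exact yearly counts are wanted, a minimal sketch using table() could look like the following; the vector event_dates is a hypothetical stand-in for data$BGN_DATE.

```r
# Hypothetical event dates; in the report these would come from data$BGN_DATE.
event_dates <- as.Date(c("1950-04-18", "1950-06-08", "1951-02-20", "2011-11-28"))

# Tabulate the number of events in each calendar year.
events_per_year <- table(format(event_dates, "%Y"))

print(events_per_year)
```

The resulting table can be passed to barplot() to draw one bar per year.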

For data manipulation, we replace the unit codes in the PROPDMGEXP and CROPDMGEXP columns, thousand (K), million (M) and billion (B), with their numeric values.

# Unit codes found in the exponent columns and their numeric equivalents.
from <- c("k", "K", "M", "m", "B")
to <- c("1000", "1000", "1e+06", "1e+06", "1e+09")

# Apply gsub() once for each pattern/replacement pair, in order.
gsub1 <- function(pattern, replacement, x, ...) {
  for (i in seq_along(pattern))
    x <- gsub(pattern[i], replacement[i], x, ...)
  x
}

data$PROPDMGEXP <- gsub1(from, to, data$PROPDMGEXP)
data$CROPDMGEXP <- gsub1(from, to, data$CROPDMGEXP)
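
The substitution above only rewrites the exponent columns. To express damage in dollars, each PROPDMG or CROPDMG value would still have to be multiplied by its multiplier. The following is a minimal sketch under stated assumptions: the toy data frame and the names multiplier_for and PROPDMG_USD are hypothetical, and treating empty or unrecognized codes as a multiplier of 1 is an assumption, not something the report specifies.

```r
# Toy data in the same shape as the storm columns (values are made up).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map an exponent code to a numeric multiplier.
multiplier_for <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- lookup[toupper(exp_code)]
  m[is.na(m)] <- 1  # assumption: empty or unknown codes mean "no scaling"
  unname(m)
}

# Damage in dollars = recorded value times its multiplier.
toy$PROPDMG_USD <- toy$PROPDMG * multiplier_for(toy$PROPDMGEXP)

print(toy)
```

A named lookup vector keeps the mapping in one place and handles lower-case codes through toupper(), so no chain of gsub() calls is needed.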

In this section, we define a function that aggregates each of the four variables (fatalities, injuries, crop damage, property damage) by event type and returns the 10 most severe event types from the EVTYPE column.

# Sum one column by event type and return the 10 event types with the largest totals.
# Note: this definition masks stats::filter() for the rest of the session.
filter <- function(colonne, data) {
  i <- which(colnames(data) == colonne)
  d <- aggregate(data[, i], by = list(data$EVTYPE), FUN = "sum", na.rm = TRUE)
  names(d) <- c("EVTYPE", colonne)
  # Sort by the total, largest first.
  d <- arrange(d, d[, 2], decreasing = TRUE)
  # Fix the factor levels in sorted order so that plots keep this ordering.
  d <- within(d, EVTYPE <- factor(x = EVTYPE, levels = d$EVTYPE))
  d <- d[1:10, ]
  return(d)
}

fatalities <- filter("FATALITIES", data = data)
injuries <- filter("INJURIES", data = data)
property <- filter("PROPDMG", data = data)
crop <- filter("CROPDMG", data = data)
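
To make the aggregation step concrete, here is a small base-R sketch of the same idea (sum by event type, sort in decreasing order, keep the top entries) applied to a made-up data frame. The function name top_events and the toy values are hypothetical, and the sketch uses order() instead of plyr.

```r
# Toy data (values are made up for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 2, 0, 2, NA),
  stringsAsFactors = FALSE
)

# Hypothetical helper: total one column per event type and keep the n largest.
top_events <- function(df, column, n = 10) {
  totals <- aggregate(df[[column]], by = list(EVTYPE = df$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- column
  totals <- totals[order(totals[[column]], decreasing = TRUE), ]
  rownames(totals) <- NULL  # renumber rows 1..n after sorting
  head(totals, n)
}

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```

Using head() rather than indexing with 1:10 avoids rows of NA when fewer than n event types are present.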

Results

For the impact of severe weather events on population health, we obtained two sorted lists and the corresponding plots. They show the number of people affected by each type of weather event.

fatalities
##            EVTYPE FATALITIES
## 1         TORNADO       5633
## 2  EXCESSIVE HEAT       1903
## 3     FLASH FLOOD        978
## 4            HEAT        937
## 5       LIGHTNING        816
## 6       TSTM WIND        504
## 7           FLOOD        470
## 8     RIP CURRENT        368
## 9       HIGH WIND        248
## 10      AVALANCHE        224
injuries
##               EVTYPE INJURIES
## 1            TORNADO    91346
## 2          TSTM WIND     6957
## 3              FLOOD     6789
## 4     EXCESSIVE HEAT     6525
## 5          LIGHTNING     5230
## 6               HEAT     2100
## 7          ICE STORM     1975
## 8        FLASH FLOOD     1777
## 9  THUNDERSTORM WIND     1488
## 10              HAIL     1361
fatalities_plot <- ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "orange", colour = "brown") +
  scale_x_discrete(name = "EVENTS") +
  scale_y_continuous(name = "FATALITIES") +
  ggtitle("FATALITIES") +
  theme(plot.title = element_text(lineheight = 1, face = "bold"),
        axis.title.x = element_text(face = "bold", colour = "#990000"),
        axis.title.y = element_text(face = "bold", colour = "#990000"),
        axis.text.x = element_text(angle = 45, hjust = 1, face = "bold", colour = "blue")) +
  # Value labels: constant settings go outside aes(); show.legend replaces the deprecated show_guide.
  geom_text(aes(label = FATALITIES), angle = 90, size = 4, hjust = -0.1, color = "brown", show.legend = FALSE)

injuries_plot <- ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity", fill = "orange", colour = "brown") +
  scale_x_discrete(name = "EVENTS") +
  scale_y_continuous(name = "INJURIES") +
  ggtitle("INJURIES") +
  theme(plot.title = element_text(lineheight = 1, face = "bold"),
        axis.title.x = element_text(face = "bold", colour = "#990000"),
        axis.title.y = element_text(face = "bold", colour = "#990000"),
        axis.text.x = element_text(angle = 45, hjust = 1, face = "bold", colour = "blue")) +
  geom_text(aes(label = INJURIES), angle = 90, size = 4, hjust = -0.1, color = "brown", show.legend = FALSE)

# Show both plots side by side.
grid.arrange(fatalities_plot, injuries_plot, ncol = 2)

Summary: Based on the evidence above, tornadoes, excessive heat and floods caused the most fatalities and injuries in the United States from 1948 to 2013.

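For the impact of severe weather events on total property damage and total crop damage, we obtained two sorted lists and the corresponding plots. They show the recorded damage totals by type of weather event. These totals sum the PROPDMG and CROPDMG columns directly, so they are not scaled by the PROPDMGEXP and CROPDMGEXP multipliers.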

property
##                EVTYPE   PROPDMG
## 1             TORNADO 3212258.2
## 2         FLASH FLOOD 1420124.6
## 3           TSTM WIND 1335965.6
## 4               FLOOD  899938.5
## 5   THUNDERSTORM WIND  876844.2
## 6                HAIL  688693.4
## 7           LIGHTNING  603351.8
## 8  THUNDERSTORM WINDS  446293.2
## 9           HIGH WIND  324731.6
## 10       WINTER STORM  132720.6
crop
##                EVTYPE   CROPDMG
## 1                HAIL 579596.28
## 2         FLASH FLOOD 179200.46
## 3               FLOOD 168037.88
## 4           TSTM WIND 109202.60
## 5             TORNADO 100018.52
## 6   THUNDERSTORM WIND  66791.45
## 7             DROUGHT  33898.62
## 8  THUNDERSTORM WINDS  18684.93
## 9           HIGH WIND  17283.21
## 10         HEAVY RAIN  11122.80
# Bar height is the summed CROPDMG or PROPDMG value per event type.
cropPlot <- ggplot(crop, aes(x = EVTYPE, y = CROPDMG, fill = EVTYPE)) + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1)) + scale_y_continuous("Crop Damage (CROPDMG total)") + xlab("Event Type") + ggtitle("Crop Damage/Events 1948 - 2013")

propertyPlot <- ggplot(property, aes(x = EVTYPE, y = PROPDMG, fill = EVTYPE)) + geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 30, hjust = 1)) + scale_y_continuous("Property Damage (PROPDMG total)") + xlab("Event Type") + ggtitle("Property damage/Events 1948 - 2013")
grid.arrange(propertyPlot, cropPlot, ncol = 1)

Summary: Based on the evidence above, tornadoes and floods caused the most total property damage, and hail caused the most total crop damage, in the United States from 1948 to 2013.
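
The property and crop tables list TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types. One possible refinement, sketched here as an assumption rather than as part of the analysis above, is to normalize such labels before aggregating. The helper name normalize_evtype and the two rewriting rules are hypothetical.

```r
# Hypothetical helper: fold a few spelling variants of EVTYPE into one label.
normalize_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))
  # Treat "TSTM" as an abbreviation of "THUNDERSTORM".
  x <- gsub("\\bTSTM\\b", "THUNDERSTORM", x, perl = TRUE)
  # Fold the plural "WINDS" into "WIND" (this also affects e.g. "HIGH WINDS").
  x <- gsub("\\bWINDS\\b", "WIND", x, perl = TRUE)
  x
}

labels <- c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "HAIL")
merged <- unique(normalize_evtype(labels))
print(merged)
```

Applying such a function to data$EVTYPE before the aggregation would combine the three thunderstorm wind rows into one, which could change the rankings shown above.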