Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

The data analysis in this document addresses the following questions:

  1. Which types of events are most harmful with respect to population health?
  2. Which types of events have the greatest economic consequences?

suppressWarnings(library(dplyr))
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
suppressWarnings(library(ggplot2))
suppressWarnings(library(knitr))

Data Processing

The data for this assignment come as a comma-separated-value (CSV) file, which is described in the National Weather Service Storm Data Documentation. The events in the database start in 1950 and end in November 2011.

Loading data

data <- read.csv("StormData.csv.bz2")

Subsetting required data

colnames <- c("BGN_DATE","EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
Storm_data <- data.frame(data[, colnames])

Tidying data for event types

length(unique(Storm_data$EVTYPE))
## [1] 985

This shows 985 distinct event type labels, while the Storm Data documentation lists only 48 major types. The following code folds the most common variants into their major event types.

# Normalize case so that variants differing only in capitalization match
Storm_data$EVTYPE <- toupper(Storm_data$EVTYPE)
# Fold known variants into a single canonical label per event type
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("FLOOD", "RIVER FLOOD")] <- "FLOOD"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("HURRICANE/TYPHOON", "HURRICANE", "HURRICANE OPAL")] <- "HURRICANE/TYPHOON"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("RIP CURRENT", "RIP CURRENTS")] <- "RIP CURRENT"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("STORM SURGE/TIDE", "STORM SURGE")] <- "STORM SURGE/TIDE"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("THUNDERSTORM WINDS", "THUNDERTORM WINDS", "TSTM WIND (G45)", "TSTM WIND")] <- "THUNDERSTORM WIND"
Storm_data$EVTYPE[Storm_data$EVTYPE %in% c("TORNADOES, TSTM WIND, HAIL")] <- "TORNADO"

These groupings were chosen after several runs of the code and inspection of the results, so that the variants of the top 10 events are combined under one label.
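The manual grouping above can also be written as a lookup table, which keeps the variant-to-label mapping in one place. The following is only a minimal sketch on toy data: the names `canonical` and `recode_evtype` are hypothetical and not part of the original analysis, and the map covers just a few of the variants handled above.

```r
# Toy event labels standing in for Storm_data$EVTYPE (illustrative only).
evtype <- c("Flood", "RIVER FLOOD", "tstm wind", "Rip Currents", "HAIL")

# Hypothetical lookup: names are raw variants, values are canonical labels.
canonical <- c(
  "RIVER FLOOD"  = "FLOOD",
  "TSTM WIND"    = "THUNDERSTORM WIND",
  "RIP CURRENTS" = "RIP CURRENT"
)

# Upper-case the labels, then replace only those found in the map.
recode_evtype <- function(x, map) {
  x <- toupper(x)
  hit <- x %in% names(map)
  x[hit] <- unname(map[x[hit]])
  x
}

recoded <- recode_evtype(evtype, canonical)
recoded
```

Labels that are not in the map, such as "HAIL", pass through unchanged apart from the upper-casing.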

Refining data for Damage cost calculation

Storm_data$PROPDMGEXP <- as.character(Storm_data$PROPDMGEXP)
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("","?","-")] <- "0"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("+")] <- "1"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("h","H")] <- "100"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("k","K")] <- "1000"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("m","M")] <- "1000000"
Storm_data$PROPDMGEXP[Storm_data$PROPDMGEXP %in% c("b","B")] <- "1000000000"
Storm_data$PROPDMGEXP <- as.numeric(Storm_data$PROPDMGEXP)

Storm_data$CROPDMGEXP <- as.character(Storm_data$CROPDMGEXP)
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("","?","-")] <- "0"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("+")] <- "1"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("h","H")] <- "100"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("k","K")] <- "1000"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("m","M")] <- "1000000"
Storm_data$CROPDMGEXP[Storm_data$CROPDMGEXP %in% c("b","B")] <- "1000000000"
Storm_data$CROPDMGEXP <- as.numeric(Storm_data$CROPDMGEXP)
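The two blocks above apply the same mapping to PROPDMGEXP and CROPDMGEXP, so the logic could be factored into one function. The sketch below is an assumption about how that might look, not the code used to produce the results in this report. The helper name `exp_to_multiplier` is hypothetical. It mirrors the behavior above, including the choice that an empty code maps to 0 and that digit codes are kept as literal multipliers.

```r
# Hypothetical helper: maps an exponent code to a numeric multiplier,
# following the same rules as the replacement steps above.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  lookup <- c("?" = 0, "-" = 0, "+" = 1,
              "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  mult <- unname(lookup[code])           # NA for anything not in the table
  mult[code %in% ""] <- 0                # empty code is treated as 0, as above
  is_digit <- grepl("^[0-9]+$", code)    # digit codes stay as literal numbers
  mult[is_digit] <- as.numeric(code[is_digit])
  mult
}

multipliers <- exp_to_multiplier(c("", "K", "m", "B", "+", "5", "?"))
multipliers
```

With such a helper, each column could be converted in a single call, for example `Storm_data$PROPDMGEXP <- exp_to_multiplier(Storm_data$PROPDMGEXP)`.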

Results

  1. Events most harmful with respect to population health
injury <- aggregate(INJURIES ~ EVTYPE, data = Storm_data, FUN = sum)
fatality <- aggregate(FATALITIES ~ EVTYPE, data = Storm_data, FUN = sum)

Total_health_data<-merge(injury,fatality,by="EVTYPE")
#sum of total harm
Total_health_data$sum<-Total_health_data$INJURIES+Total_health_data$FATALITIES

Total_health_data <- Total_health_data[order(Total_health_data$sum, decreasing = TRUE), ]
#Top 10 events
Total_health_harm <- Total_health_data[1:10, ]
Total_health_harm
##                EVTYPE INJURIES FATALITIES   sum
## 751           TORNADO    91346       5658 97004
## 680 THUNDERSTORM WIND     9356        702 10058
## 116    EXCESSIVE HEAT     6525       1903  8428
## 154             FLOOD     6791        472  7263
## 416         LIGHTNING     5230        816  6046
## 243              HEAT     2100        937  3037
## 138       FLASH FLOOD     1777        978  2755
## 385         ICE STORM     1975         89  2064
## 878      WINTER STORM     1321        206  1527
## 370 HURRICANE/TYPHOON     1322        126  1448

The table shows that tornadoes are by far the most harmful event type, leading in both injuries (91,346) and fatalities (5,658).
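As a quick check of this reading, the top of the table can be re-entered by hand and queried directly. This is a small sketch using values copied from the output above. The names `health`, `top_injuries`, `top_fatalities`, and `tornado_share` are introduced here for illustration only.

```r
# Values copied from the top-10 table above (sum = INJURIES + FATALITIES).
health <- data.frame(
  EVTYPE = c("TORNADO", "THUNDERSTORM WIND", "EXCESSIVE HEAT", "FLOOD",
             "LIGHTNING", "HEAT", "FLASH FLOOD", "ICE STORM",
             "WINTER STORM", "HURRICANE/TYPHOON"),
  INJURIES   = c(91346, 9356, 6525, 6791, 5230, 2100, 1777, 1975, 1321, 1322),
  FATALITIES = c(5658, 702, 1903, 472, 816, 937, 978, 89, 206, 126),
  stringsAsFactors = FALSE
)

# Event type with the largest count in each column.
top_injuries   <- health$EVTYPE[which.max(health$INJURIES)]
top_fatalities <- health$EVTYPE[which.max(health$FATALITIES)]

# Tornado share of all harm recorded within these ten event types.
total_harm    <- health$INJURIES + health$FATALITIES
tornado_share <- total_harm[health$EVTYPE == "TORNADO"] / sum(total_harm)
```

Note that `tornado_share` is relative to the ten listed event types only, not to the full database.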

Plot of total population health harm

g<- ggplot(Total_health_harm, aes(EVTYPE, sum)) + 
    labs(title="Total Harm to Population Health") +
    xlab("Top 10 population harm events") + ylab("Sum of injuries and fatalities")
plot1<- g + geom_bar(stat="identity",aes(fill = EVTYPE)) + theme(axis.text.x = element_text(angle = 90, hjust = 1))
plot1
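Because EVTYPE is a character column, ggplot2 places the bars in alphabetical order rather than by magnitude. One possible way to sort them, sketched here in base R on a few rows copied from the table above, is to turn EVTYPE into a factor whose levels follow the totals. The data frame name `harm` is an assumption for this example.

```r
# A few rows copied from the health table above.
harm <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "LIGHTNING", "EXCESSIVE HEAT", "THUNDERSTORM WIND"),
  sum    = c(7263, 97004, 6046, 8428, 10058),
  stringsAsFactors = FALSE
)

# Factor levels in order of descending total harm; plotting functions
# that respect factor levels will then draw the largest bar first.
harm$EVTYPE <- factor(
  harm$EVTYPE,
  levels = harm$EVTYPE[order(harm$sum, decreasing = TRUE)]
)

levels(harm$EVTYPE)
```

Applying the same conversion to `Total_health_harm$EVTYPE` before building the plot would be expected to sort the bars from most to least harmful.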

  2. Events which have the greatest economic consequences
Storm_data$PROPDMG <- Storm_data$PROPDMGEXP * Storm_data$PROPDMG
Storm_data$CROPDMG <- Storm_data$CROPDMGEXP * Storm_data$CROPDMG
#sum of total damage
Storm_data$Total_damage_sum<- Storm_data$PROPDMG+Storm_data$CROPDMG
Total_damage <- aggregate(Total_damage_sum ~ EVTYPE, data = Storm_data, FUN = sum)
Total_damage <- Total_damage[order(Total_damage$Total_damage_sum, decreasing = TRUE), ]
#Top 10 events
Total_damage<- Total_damage[1:10, ]
Total_damage
##                EVTYPE Total_damage_sum
## 154             FLOOD     160468082750
## 370 HURRICANE/TYPHOON      89715787810
## 751           TORNADO      58954614161
## 595  STORM SURGE/TIDE      47965579000
## 212              HAIL      18758221820
## 138       FLASH FLOOD      17562129187
## 84            DROUGHT      15018672000
## 680 THUNDERSTORM WIND      10864367818
## 385         ICE STORM       8967041310
## 764    TROPICAL STORM       8382236550

The table shows that floods cause the greatest economic damage, with roughly 160 billion US dollars in combined property and crop damage.
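The raw totals are hard to compare at a glance, so it can help to express them in billions. A minimal sketch follows, using the first five rows copied from the table above. The data frame name `damage` and the column `billions` are introduced here for illustration.

```r
# First five rows copied from the damage table above.
damage <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE/TIDE", "HAIL"),
  Total_damage_sum = c(160468082750, 89715787810, 58954614161,
                       47965579000, 18758221820),
  stringsAsFactors = FALSE
)

# Rescale to billions and round to one decimal place for display.
damage$billions <- round(damage$Total_damage_sum / 1e9, 1)
damage[, c("EVTYPE", "billions")]
```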

Plot of total economic damage

# qplot() and its stat argument are deprecated in current ggplot2, so the chart is built with ggplot()
p <- ggplot(Total_damage, aes(EVTYPE, Total_damage_sum)) +
    geom_bar(stat = "identity", aes(fill = EVTYPE)) +
    labs(title = "Economic damage due to severe weather events\nin the U.S. from 1950 to 2011") +
    xlab("Top 10 economic consequences events") + ylab("Economic damage")
p + theme(axis.text.x = element_text(angle = 90, hjust = 1))

Conclusion

We can conclude that tornadoes cause the most harm to population health and floods cause the most economic harm.