Synopsis

Many severe weather events, such as storms, hail, and tornadoes, can result in fatalities, injuries, and property damage. This project identifies which types of weather events have the greatest impact on the health and well-being of the population, using the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks the main characteristics of major storms and weather events in the United States, including estimates of any fatalities, injuries, and property damage.

In this report, we present an analysis of the weather events that cause the most harm to people and property, based on data extracted from the database. The top 10 weather events by fatalities and by injuries are plotted. The results indicate that tornadoes caused the most fatalities and injuries. In addition, floods account for the highest amount of property damage, while drought affected crop production most significantly.

Data Processing

This section describes the data processing activities for analysis.

Data Retrieval and Extraction

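With the compressed file StormData.csv.bz2 in the working directory, the data can be read directly (read.csv decompresses the bz2 file on the fly) and its structure inspected using the following commands: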

stormdset = read.csv("StormData.csv.bz2")
str(stormdset)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "000","0000","0001",..: 152 167 2645 1563 2524 3126 122 1563 3126 3126 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 826 826 826 826 826 826 826 826 826 826 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","+","-","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","0","2","?",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
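
The commands above assume the compressed file is already present locally. A minimal sketch of a download step is shown below. The helper name fetch_storm_data is hypothetical, and the download URL is not given in this report, so it is left as a parameter to be supplied by the reader.

```r
# Hypothetical helper (not part of the original analysis): download the
# compressed storm data only if it is not already present, then return its path.
# `url` must be supplied by the caller; this report does not state it.
fetch_storm_data <- function(url, dest = "StormData.csv.bz2") {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary bz2 archive intact on all platforms
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}
```

Under these assumptions the file could then be loaded with read.csv(fetch_storm_data(url)), and re-running the report would not trigger a second download.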

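We keep only the fields needed for the analysis: the event type, the fatality and injury counts, and the property and crop damage amounts with their exponent codes: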

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
fatalData = select(stormdset, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
str(fatalData)
## 'data.frame':    902297 obs. of  7 variables:
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 826 826 826 826 826 826 826 826 826 826 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","+","-","0",..: 16 16 16 16 16 16 16 16 16 16 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","0","2","?",..: 1 1 1 1 1 1 1 1 1 1 ...

Data Cleaning and Preparation

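Before the analysis, the damage figures must be converted to dollar amounts. PROPDMG and CROPDMG hold a numeric value, while PROPDMGEXP and CROPDMGEXP hold an exponent code (for example K for thousands, M for millions, B for billions) that gives the multiplier to apply.

We first list the distinct exponent codes for property damage and crop damage, then map each code to its multiplier: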

unique(fatalData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  + - 0 1 2 3 4 5 6 7 8 ? B H K M h m
unique(fatalData$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  0 2 ? B K M k m
# Assigning values for the property exponent data 
fatalData$PROPEXP[fatalData$PROPDMGEXP == "K"] <- 1000
fatalData$PROPEXP[fatalData$PROPDMGEXP == "M"] <- 1e+06
fatalData$PROPEXP[fatalData$PROPDMGEXP == ""] <- 1
fatalData$PROPEXP[fatalData$PROPDMGEXP == "B"] <- 1e+09
fatalData$PROPEXP[fatalData$PROPDMGEXP == "m"] <- 1e+06
fatalData$PROPEXP[fatalData$PROPDMGEXP == "0"] <- 1
fatalData$PROPEXP[fatalData$PROPDMGEXP == "5"] <- 1e+05
fatalData$PROPEXP[fatalData$PROPDMGEXP == "6"] <- 1e+06
fatalData$PROPEXP[fatalData$PROPDMGEXP == "4"] <- 10000
fatalData$PROPEXP[fatalData$PROPDMGEXP == "2"] <- 100
fatalData$PROPEXP[fatalData$PROPDMGEXP == "3"] <- 1000
fatalData$PROPEXP[fatalData$PROPDMGEXP == "h"] <- 100
fatalData$PROPEXP[fatalData$PROPDMGEXP == "7"] <- 1e+07
fatalData$PROPEXP[fatalData$PROPDMGEXP == "H"] <- 100
fatalData$PROPEXP[fatalData$PROPDMGEXP == "1"] <- 10
fatalData$PROPEXP[fatalData$PROPDMGEXP == "8"] <- 1e+08
# Assigning '0' to invalid exponent data
fatalData$PROPEXP[fatalData$PROPDMGEXP == "+"] <- 0
fatalData$PROPEXP[fatalData$PROPDMGEXP == "-"] <- 0
fatalData$PROPEXP[fatalData$PROPDMGEXP == "?"] <- 0
# Calculating the property damage value
fatalData$PROPDMGVAL <- fatalData$PROPDMG * fatalData$PROPEXP

# Assigning values for the crop exponent data 
fatalData$CROPEXP[fatalData$CROPDMGEXP == "M"] <- 1e+06
fatalData$CROPEXP[fatalData$CROPDMGEXP == "K"] <- 1000
fatalData$CROPEXP[fatalData$CROPDMGEXP == "m"] <- 1e+06
fatalData$CROPEXP[fatalData$CROPDMGEXP == "B"] <- 1e+09
fatalData$CROPEXP[fatalData$CROPDMGEXP == "0"] <- 1
fatalData$CROPEXP[fatalData$CROPDMGEXP == "k"] <- 1000
fatalData$CROPEXP[fatalData$CROPDMGEXP == "2"] <- 100
fatalData$CROPEXP[fatalData$CROPDMGEXP == ""] <- 1
# Assigning '0' to invalid exponent data
fatalData$CROPEXP[fatalData$CROPDMGEXP == "?"] <- 0
# calculating the crop damage value
fatalData$CROPDMGVAL <- fatalData$CROPDMG * fatalData$CROPEXP
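
The one-assignment-per-code approach above can be written more compactly as a single function. The sketch below is one possible alternative, not the method used to produce the results in this report; the function name exp_multiplier is hypothetical. It reproduces the same mapping: a blank code means a multiplier of 1, letters are case-insensitive, a digit d means 10^d, and any other symbol (+, -, ?) yields 0.

```r
# Hypothetical helper: convert exponent codes to numeric multipliers.
exp_multiplier <- function(code) {
  code <- toupper(as.character(code))   # treat "k"/"K", "m"/"M", "h"/"H" alike
  mult <- rep(0, length(code))          # unrecognised symbols (+, -, ?) map to 0
  mult[code == ""]  <- 1                # no exponent given: value is in dollars
  mult[code == "H"] <- 100
  mult[code == "K"] <- 1e3
  mult[code == "M"] <- 1e6
  mult[code == "B"] <- 1e9
  digit <- grepl("^[0-9]$", code)       # single digit d means 10^d
  mult[digit] <- 10^as.numeric(code[digit])
  mult
}
```

With this helper, the damage values could be computed as fatalData$PROPDMG * exp_multiplier(fatalData$PROPDMGEXP), and likewise for the crop columns, so both columns share one mapping.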

The totals for fatalities, injuries, property damage, and crop damage can then be computed per event type using the following code:

fatal <- aggregate(FATALITIES ~ EVTYPE, fatalData, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, fatalData, FUN = sum)
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, fatalData, FUN = sum)
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, fatalData, FUN = sum)
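
The str() output earlier shows that EVTYPE has 985 levels, at least one of which carries leading whitespace ("   HIGH SURF ADVISORY"), so labels that differ only in spacing or case are summed separately. The sketch below illustrates, on a small made-up data frame, how normalising the labels before aggregating merges such variants. This step is not part of the original analysis, and whether it changes the rankings reported here has not been checked.

```r
# Toy data (invented for illustration): the same event types spelled inconsistently.
toy <- data.frame(
  EVTYPE = c("TORNADO", " tornado", "Flood", "FLOOD  "),
  FATALITIES = c(3, 2, 1, 4),
  stringsAsFactors = FALSE
)

# Trim surrounding whitespace and upper-case the labels so variants collapse.
toy$EVTYPE <- toupper(trimws(toy$EVTYPE))

# Same aggregate() call as in the analysis, now on the normalised labels.
totals <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)
print(totals)
```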

Results

Plots of weather events with highest fatalities and highest injuries

The following charts show the ten weather event types with the highest total fatalities and injuries from 1950 to 2011:

# Listing events with highest fatalities
fatalHigh <- fatal[order(-fatal$FATALITIES), ][1:10, ]
# Listing events with highest injuries
injuryHigh <- injury[order(-injury$INJURIES), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(fatalHigh$FATALITIES, las = 3, names.arg = fatalHigh$EVTYPE, main = "Weather Events with Highest Fatalities", 
        ylab = "Number of fatalities", col = "red")
barplot(injuryHigh$INJURIES, las = 3, names.arg = injuryHigh$EVTYPE, main = "Weather Events with Highest Injuries", 
        ylab = "Number of injuries", col = "red")

Note that tornadoes caused the highest number of both fatalities and injuries.
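
The order-then-subset pattern used to pick the top ten rows recurs for each measure, so it can be wrapped in a small helper. The sketch below is an illustration only; the name top_events and its arguments are hypothetical.

```r
# Hypothetical helper: return the n rows of `totals` with the largest values
# in the column named by `value_col`, sorted in decreasing order.
top_events <- function(totals, value_col, n = 10) {
  ranked <- totals[order(-totals[[value_col]]), ]
  head(ranked, n)
}

# Small made-up example
toy <- data.frame(EVTYPE = c("A", "B", "C"), FATALITIES = c(1, 5, 3))
top_events(toy, "FATALITIES", n = 2)
```

Using head() rather than indexing with [1:10, ] avoids rows of NA when fewer than n event types are present.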

Plots of weather events with highest crop and property damages

The following charts show the ten weather event types with the highest property and crop damage costs from 1950 to 2011:

# Finding events with highest property damage
propdmg10 <- propdmg[order(-propdmg$PROPDMGVAL), ][1:10, ]
# Finding events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$CROPDMGVAL), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg10$PROPDMGVAL/(10^9), las = 3, names.arg = propdmg10$EVTYPE, 
        main = "Events with Highest Property Damages", ylab = "Damage Cost ($ billions)", 
        col = "green")
barplot(cropdmg10$CROPDMGVAL/(10^9), las = 3, names.arg = cropdmg10$EVTYPE, 
        main = "Events With Highest Crop Damages", ylab = "Damage Cost ($ billions)", 
        col = "blue")

Note that floods caused the highest property damage, while drought had the largest effect on crop production.