Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents:

This project is part of the Reproducible Research course offered on Coursera. It explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
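The uneven coverage over time can be checked by counting records per year. The following is a minimal sketch on a few made-up rows, not the real data; it assumes only that the begin date is stored as text in the month/day/year form seen in the bgn_date column of the raw file. The names toy and events_per_year are illustrative.

```r
# Toy stand-in for the bgn_date column (made-up rows, same text format)
toy <- data.frame(
  bgn_date = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "11/15/1951 0:00:00", "1/6/2011 0:00:00",
               "4/27/2011 0:00:00", "5/22/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the text into dates, then pull out the calendar year
dates <- as.Date(toy$bgn_date, format = "%m/%d/%Y %H:%M:%S")
years <- format(dates, "%Y")

# Count events per year
events_per_year <- table(years)
print(events_per_year)
```

On the full dataset the same count would use dataset$bgn_date in place of the toy column.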

Data Processing

# Loading the data
dataset <- read.csv('repdata_data_StormData.csv.bz2')

# Change column names to lowercase
colnames(dataset) <- tolower(colnames(dataset))

str(dataset)
## 'data.frame':    902297 obs. of  37 variables:
##  $ state__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ bgn_date  : Factor w/ 16335 levels "10/10/1954 0:00:00",..: 6523 6523 4213 11116 1426 1426 1462 2873 3980 3980 ...
##  $ bgn_time  : Factor w/ 3608 levels "000","0000","00:00:00 AM",..: 212 257 2645 1563 2524 3126 122 1563 3126 3126 ...
##  $ time_zone : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ county    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ countyname: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ state     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ evtype    : Factor w/ 985 levels "?","ABNORMALLY DRY",..: 830 830 830 830 830 830 830 830 830 830 ...
##  $ bgn_range : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ bgn_azi   : Factor w/ 35 levels "","E","Eas","EE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ bgn_locati: Factor w/ 54429 levels "","?","(01R)AFB GNRY RNG AL",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ end_date  : Factor w/ 6663 levels "","10/10/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ end_time  : Factor w/ 3647 levels "","?","0000",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ county_end: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ countyendn: logi  NA NA NA NA NA NA ...
##  $ end_range : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ end_azi   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ end_locati: Factor w/ 34506 levels "","(0E4)PAYSON ARPT",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ length    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ width     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ f         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ mag       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ fatalities: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ injuries  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ propdmg   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ propdmgexp: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ cropdmg   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ cropdmgexp: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ wfo       : Factor w/ 542 levels "","2","43","9V9",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ stateoffic: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ zonenames : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ latitude  : num  3040 3042 3340 3458 3412 ...
##  $ longitude : num  8812 8755 8742 8626 8642 ...
##  $ latitude_e: num  3051 0 0 0 0 ...
##  $ longitude_: num  8806 0 0 0 0 ...
##  $ remarks   : Factor w/ 436781 levels ""," ","  ","   ",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ refnum    : num  1 2 3 4 5 6 7 8 9 10 ...
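The loading step works without an explicit decompression call because read.csv() opens its input through a connection that recognises bzip2-compressed files. The sketch below illustrates this on a tiny made-up file written to a temporary location; the rows and the name toy are illustrative only.

```r
# Write a tiny CSV (made-up rows) into a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() decompresses the file on the fly while reading it
toy <- read.csv(tmp, stringsAsFactors = FALSE)

# Same column-name normalisation as in the processing step
colnames(toy) <- tolower(colnames(toy))
str(toy)

# Remove the temporary file
unlink(tmp)
```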

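Since the analysis needs only a few columns, we subset the dataset. We also keep only events that caused at least one fatality or injury, or some property or crop damage, and drop records whose event type is "?".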

# Subset on the parameters of interest
data <- subset(x=dataset,
               subset=(evtype != "?" & (injuries > 0 | fatalities > 0 | propdmg > 0 | cropdmg > 0)),
               select=c('evtype',
                        'fatalities',
                        'injuries',
                        'propdmg',
                        'propdmgexp',
                        'cropdmg',
                        'cropdmgexp'))    
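To see what the row filter and column selection do, here is a small sketch on made-up records. The toy data frame and the name harmful are illustrative; the filter condition is the same one used above.

```r
# Four made-up records: one harmful tornado, one harmless hail event,
# one record with an unknown event type, and one flood with crop damage
toy <- data.frame(
  evtype     = c("TORNADO", "HAIL", "?", "FLOOD"),
  fatalities = c(0, 0, 1, 0),
  injuries   = c(15, 0, 0, 0),
  propdmg    = c(25, 0, 5, 0),
  cropdmg    = c(0, 0, 0, 10),
  refnum     = 1:4,
  stringsAsFactors = FALSE
)

# Keep rows with a known event type and at least one non-zero outcome,
# and keep only the columns needed for the analysis
harmful <- subset(toy,
                  subset = evtype != "?" &
                    (injuries > 0 | fatalities > 0 | propdmg > 0 | cropdmg > 0),
                  select = c("evtype", "fatalities", "injuries", "propdmg", "cropdmg"))
print(harmful)
```

Only the TORNADO and FLOOD rows survive: the HAIL row has no recorded harm, and the "?" row is excluded by its event type.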

Now, map the property and crop damage exponent alphabetic multipliers to numeric values.

# Change all damage exponents to uppercase.
data$propdmgexp <- toupper(data$propdmgexp)
data$cropdmgexp <- toupper(data$cropdmgexp)

# Map alphanumeric exponents to numeric multipliers.
# Blank and "?" codes have no entry: a lookup by "" never matches a name in R,
# so both come back as NA and are set to 10^0 below.
DmgKey <- c("-" = 10^0,
            "+" = 10^0,
            "0" = 10^0,
            "1" = 10^1,
            "2" = 10^2,
            "3" = 10^3,
            "4" = 10^4,
            "5" = 10^5,
            "6" = 10^6,
            "7" = 10^7,
            "8" = 10^8,
            "9" = 10^9,
            "H" = 10^2,
            "K" = 10^3,
            "M" = 10^6,
            "B" = 10^9)

data$propdmgexp <- DmgKey[as.character(data$propdmgexp)]
data$propdmgexp[is.na(data$propdmgexp)] <- 10^0
    
data$cropdmgexp <- DmgKey[as.character(data$cropdmgexp)]
data$cropdmgexp[is.na(data$cropdmgexp)] <- 10^0
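As a worked example of the lookup, the sketch below applies a reduced version of the key to a handful of made-up damage records. The names key, amount, code, mult, and loss are illustrative and do not appear in the analysis.

```r
# Reduced lookup table: letter codes only
key <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Made-up records: 25 K, 2.5 m (lowercase), 1 B, 10 with a blank code, 3 with "?"
amount <- c(25, 2.5, 1, 10, 3)
code   <- c("K", "m", "B", "", "?")

# Look up each code after upper-casing it; unmatched codes come back as NA
mult <- unname(key[toupper(code)])

# Treat unmatched codes as a multiplier of 1
mult[is.na(mult)] <- 1

# Damage in US dollars
loss <- amount * mult
print(loss)
```

The first record becomes 25 * 1000 = 25000 dollars, while the records with a blank or "?" code keep their raw amounts.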

Analysis

# Aggregate fatalities and injuries by event type
health <- aggregate(cbind(fatalities, injuries) ~ evtype, data=data, FUN=sum)
health$total <- health$fatalities + health$injuries
# Keep event types with at least one casualty and rank them by total
health <- health[health$total > 0, ]
health <- health[order(health$total, decreasing=TRUE), ]
# Row names are row indices here, so tolower() leaves them unchanged
rownames(health) <- tolower(rownames(health))
# Keep the five most harmful event types
healthTop <- health[1:5, ]
summary(healthTop)
##             evtype    fatalities      injuries         total      
##  EXCESSIVE HEAT:1   Min.   : 470   Min.   : 5230   Min.   : 6046  
##  FLOOD         :1   1st Qu.: 504   1st Qu.: 6525   1st Qu.: 7259  
##  LIGHTNING     :1   Median : 816   Median : 6789   Median : 7461  
##  TORNADO       :1   Mean   :1865   Mean   :23369   Mean   :25235  
##  TSTM WIND     :1   3rd Qu.:1903   3rd Qu.: 6957   3rd Qu.: 8428  
##  ?             :0   Max.   :5633   Max.   :91346   Max.   :96979  
##  (Other)       :0

# Combine propdmg and propdmgexp into a single column, propertyloss (US dollars).
data$propertyloss <- data$propdmg * data$propdmgexp

# Combine cropdmg and cropdmgexp into a single column, croploss (US dollars).
data$croploss <- data$cropdmg * data$cropdmgexp

# Aggregate property loss and crop loss by event type
economic <- aggregate(cbind(propertyloss, croploss) ~ evtype, data=data, FUN=sum)
economic$total <- economic$propertyloss + economic$croploss
# Keep event types with non-zero losses and rank them by total
economic <- economic[economic$total > 0, ]
economic <- economic[order(economic$total, decreasing=TRUE), ]
# Row names are row indices here, so tolower() leaves them unchanged
rownames(economic) <- tolower(rownames(economic))
# Keep the five costliest event types
economicTop <- economic[1:5, ]
summary(economicTop)
##                evtype   propertyloss          croploss        
##  FLOOD            :1   Min.   :1.574e+10   Min.   :5.000e+03  
##  HAIL             :1   1st Qu.:4.332e+10   1st Qu.:4.150e+08  
##  HURRICANE/TYPHOON:1   Median :5.695e+10   Median :2.608e+09  
##  STORM SURGE      :1   Mean   :6.599e+10   Mean   :2.342e+09  
##  TORNADO          :1   3rd Qu.:6.931e+10   3rd Qu.:3.026e+09  
##  ?                :0   Max.   :1.447e+11   Max.   :5.662e+09  
##  (Other)          :0                                          
##      total          
##  Min.   :1.876e+10  
##  1st Qu.:4.332e+10  
##  Median :5.736e+10  
##  Mean   :6.834e+10  
##  3rd Qu.:7.191e+10  
##  Max.   :1.503e+11  
## 
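The aggregate, rank, and keep-the-top pattern used twice above can be illustrated on a small made-up table. The sketch also applies droplevels(), which is not part of the original analysis; it is shown as one possible way to remove unused factor levels such as the "?" and "(Other)" rows with a count of 0 in the summaries. The names toy, agg, and top2 are illustrative.

```r
# Six made-up records across three event types
toy <- data.frame(
  evtype     = factor(c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HAIL")),
  fatalities = c(2, 1, 3, 0, 0, 0),
  injuries   = c(10, 0, 20, 1, 2, 0)
)

# Sum both outcome columns within each event type
agg <- aggregate(cbind(fatalities, injuries) ~ evtype, data = toy, FUN = sum)
agg$total <- agg$fatalities + agg$injuries

# Rank by total, largest first, and keep the top two rows
agg  <- agg[order(agg$total, decreasing = TRUE), ]
top2 <- head(agg, 2)

# Drop factor levels that no longer occur (here: HAIL)
top2 <- droplevels(top2)
print(top2)
```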

Results

Plotting the human health impact data

library(ggplot2)
library(gridExtra)
# Plot fatalities and store the plot in fatalities_plot
fatalities_plot <- ggplot() + geom_bar(data = healthTop, aes(x = evtype,
    y = fatalities, fill = interaction(fatalities, evtype)), stat = "identity",
    show.legend = FALSE) + theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
    xlab("Harmful Events") + ylab("No. of Fatalities") + ggtitle("Weather events causing Fatalities")

# Plot injuries and store the plot in injuries_plot
injuries_plot <- ggplot() + geom_bar(data = healthTop, aes(x = evtype, y = injuries,
    fill = interaction(injuries, evtype)), stat = "identity", show.legend = FALSE) +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Harmful Events") +
    ylab("No. of Injuries") + ggtitle("Weather events causing Injuries")
# Draw the two plots side by side in two columns

grid.arrange(fatalities_plot, injuries_plot, ncol = 2)
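ggplot2 places the categories of a factor on the x axis in level order, which is alphabetical by default, so the bars are not sorted by height. One possible refinement, not used in the plots above, is to reorder the factor levels by value with base R's reorder() before plotting. The sketch below uses made-up totals and only base R; toy is an illustrative name.

```r
# Made-up totals for three event types
toy <- data.frame(
  evtype = factor(c("FLOOD", "LIGHTNING", "TORNADO")),
  total  = c(30, 10, 90)
)

# Default level order is alphabetical
print(levels(toy$evtype))   # "FLOOD" "LIGHTNING" "TORNADO"

# reorder() sorts levels by a summary of its second argument, ascending,
# so negating the totals puts the largest category first
toy$evtype <- reorder(toy$evtype, -toy$total)
print(levels(toy$evtype))   # "TORNADO" "FLOOD" "LIGHTNING"

# A ggplot2 bar chart with aes(x = evtype, y = total) would now draw
# the bars from largest to smallest.
```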

Inference: Tornadoes are the leading cause of both fatalities and injuries in the US, far exceeding every other weather event.

Plotting the economic impact data

# Plot property loss and store the plot in propertyloss_plot
propertyloss_plot <- ggplot() + geom_bar(data = economicTop, aes(x = evtype,
    y = propertyloss, fill = interaction(propertyloss, evtype)), stat = "identity",
    show.legend = FALSE) + theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
    xlab("Harmful Events") + ylab("Amount of Property Loss (in US dollars)") + ggtitle("Weather events causing Property damage")

# Plot crop loss and store the plot in croploss_plot
croploss_plot <- ggplot() + geom_bar(data = economicTop, aes(x = evtype, y = croploss,
    fill = interaction(croploss, evtype)), stat = "identity", show.legend = FALSE) +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Harmful Events") +
    ylab("Amount of Crop Loss (in US dollars)") + ggtitle("Weather events causing Crop damage")
# Draw the two plots side by side in two columns

grid.arrange(propertyloss_plot, croploss_plot, ncol = 2)

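Inference: Floods are the leading cause of economic loss. Among the five events with the largest total losses, they rank first in both property damage and crop damage. Because these five events are selected by total loss, the crop panel does not necessarily show the events with the largest crop losses overall.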
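Both panels are drawn from economicTop, which is ranked by total loss, so a ranking by crop loss alone could pick different events. The sketch below shows the effect with made-up numbers and hypothetical event names (EVENT_A to EVENT_D); it says nothing about the real ranking.

```r
# Made-up losses for four hypothetical event types
toy <- data.frame(
  evtype       = c("EVENT_A", "EVENT_B", "EVENT_C", "EVENT_D"),
  propertyloss = c(100, 80, 1, 60),
  croploss     = c(5, 0, 40, 2),
  stringsAsFactors = FALSE
)
toy$total <- toy$propertyloss + toy$croploss

# Top two by total loss versus top two by crop loss alone
top_total <- toy$evtype[order(toy$total, decreasing = TRUE)][1:2]
top_crop  <- toy$evtype[order(toy$croploss, decreasing = TRUE)][1:2]

print(top_total)   # "EVENT_A" "EVENT_B"
print(top_crop)    # "EVENT_C" "EVENT_A"
```

On the real data, the equivalent check would order the economic data frame by croploss instead of total.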

Plotting the total impact

# Plot total health impact and store the plot in health_total_plot
health_total_plot <- ggplot() + geom_bar(data = healthTop, aes(x = evtype,
    y = total, fill = interaction(total, evtype)), stat = "identity",
    show.legend = FALSE) + theme(axis.text.x = element_text(angle = 30, hjust = 1)) +
    xlab("Harmful Events") + ylab("Total health impact") + ggtitle("Weather events impacting Health")

# Plot total economic loss and store the plot in economic_total_plot
economic_total_plot <- ggplot() + geom_bar(data = economicTop, aes(x = evtype, y = total,
    fill = interaction(total, evtype)), stat = "identity", show.legend = FALSE) +
    theme(axis.text.x = element_text(angle = 30, hjust = 1)) + xlab("Harmful Events") +
    ylab("Amount of Economic loss (in US dollars)") + ggtitle("Weather events causing Economic damage")
# Draw the two plots side by side in two columns

grid.arrange(health_total_plot, economic_total_plot, ncol = 2)

Inference: Tornadoes have the greatest impact on human health among all weather-related events, whereas floods cause more economic damage than tornadoes or any other weather event.
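The size of these gaps can be estimated from the summary() outputs in the Analysis section. With exactly five rows, the minimum, quartiles, and maximum that summary() prints are the five sorted values themselves under R's default quantile definition. The sketch below uses those printed totals to compute the share of each top-five total that comes from the largest event; the economic figures are rounded as printed, and the variable names are illustrative.

```r
# Sorted totals read from the summary() output for healthTop (fatalities + injuries)
health_total <- c(6046, 7259, 7461, 8428, 96979)

# Sorted totals read from the summary() output for economicTop (rounded, US dollars)
economic_total <- c(1.876e10, 4.332e10, 5.736e10, 7.191e10, 1.503e11)

# Share of the top-five total contributed by the largest event
health_share   <- max(health_total) / sum(health_total)
economic_share <- max(economic_total) / sum(economic_total)

print(round(100 * c(health = health_share, economic = economic_share), 1))
```

Under these assumptions the largest event accounts for roughly 77% of the top-five health impact and about 44% of the top-five economic loss.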