Title: US Storm Data Investigation for Storm Damage and Health Consequences

Synopsis

An analysis of the U.S. National Oceanic and Atmospheric Administration’s storm event database, covering events recorded between 1950 and 2011, was performed with emphasis on the major causes of personal health issues and of damage to both crops and property. Summing the totals over the full range of years showed that tornadoes are the single largest contributing factor to both injuries and fatalities in the United States. The economic impact of weather events was summed over the same range of years. Tornadoes ranked first in property damage, and hail caused the largest damage to crops.

Introduction

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data

The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. You can download the file from the course web site:

Storm Data [47Mb]

There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.

National Weather Service Storm Data Documentation

National Climatic Data Center Storm Events FAQ

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
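Because the file is distributed as a bzip2-compressed CSV, it can be read without decompressing it first. The following is a minimal sketch of that behavior on a tiny stand-in file; the three rows and the name `demo` are hypothetical and are not taken from the storm database.

```r
# Build a tiny bzip2-compressed CSV in a temporary file (hypothetical data).
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,1,14",
             "HAIL,0,2"), con)
close(con)

# read.csv() opens the path through a connection that detects the
# bzip2 compression and decompresses while reading.
demo <- read.csv(tmp, stringsAsFactors = FALSE)
print(dim(demo))

# Remove the temporary file.
unlink(tmp)
```

The same call pattern applies to the full storm data file, which simply takes longer to read because of its size.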

Data Processing

The code chunk below performs the required processing of the data for publication. It prepares the necessary data tables to be displayed in the Results section.

setwd("C:/Albert/Coursera/Reproduceable Research/Project 2")

library(ggplot2) 
library(scales)
library(dplyr)
library(data.table)
library(tidyr)
library(lubridate) 
library(magrittr)
library("devtools")
library(rCharts)
require(knitr) 
devtools::install_github('jbryer/DataCache')
## Skipping install of 'DataCache' from a github remote, the SHA1 (c1889dab) has not changed since last install.
##   Use `force = TRUE` to force installation
require(markdown) # required for md to html 
library("knitr")
library('DataCache')


 SD<-read.csv("repdata%2Fdata%2FStormData.csv.bz2") ## read in the data set
 dim(SD) ##rows and columns
## [1] 902297     37
 str(SD)##show data fields, object type, sample fields ... etc.
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
library(lattice)
library(dplyr)  ## Kludge to prevent select from hiccuping

p<-select(SD,FATALITIES,INJURIES,EVTYPE) # fatalities injuries and events from DB
p<-filter(p,FATALITIES>0|INJURIES>0) #filter out zero values
pF<-select(p,EVTYPE,FATALITIES)#split into Fatalities
pI<-select(p,EVTYPE,INJURIES)# and Injuries
pF<-filter(pF,FATALITIES>0) # remove any residual zeros
pI<-filter(pI,INJURIES>0) # ditto
names(pI)<-c("EVTYPE","number") #change column names
names(pF)<-c("EVTYPE","number")
I <- aggregate(x = list(number = pI$number), by = list(EVTYPE = pI$EVTYPE),
               FUN = sum, na.rm = TRUE) # take sum of all the values for each event
F <- aggregate(x = list(number = pF$number), by = list(EVTYPE = pF$EVTYPE),
                     FUN = sum, na.rm = TRUE) #Ditto
F<-mutate(F,lognum=log10(number)) # take the base 10 log of the total number of fatalities
I<-mutate(I,lognum=log10(number))# take the base 10 log of the total number of injuries
names(I)<-c("EVTYPE","number","lognum") #add lognum name for the column
names(F)<-c("EVTYPE","number","lognum") #ditto

F<-data.frame(F,type="Fatalities") # add the type name to data frame fatalities
I<-data.frame(I,type="Injuries") # same for injuries
SP<-rbind(F,I) #bind the 2 frames together for the plot
SP<-filter(SP,lognum>2.47748) #remove totals <= 300


P<-select(SD,EVTYPE,PROPDMG)# same thing all the way down
C<-select(SD,EVTYPE,CROPDMG)
P<-filter(P,PROPDMG>0)
C<-filter(C,CROPDMG>0)
names(P)<-c("EVTYPE","number")
names(C)<-c("EVTYPE","number")
PD <- aggregate(x = list(number = P$number), by = list(EVTYPE = P$EVTYPE),
               FUN = sum, na.rm = TRUE)
CD<- aggregate(x = list(number = C$number), by = list(EVTYPE = C$EVTYPE),
                     FUN = sum, na.rm = TRUE)
PD<-mutate(PD,lognum=log10(number))
CD<-mutate(CD,lognum=log10(number))
CD<-data.frame(CD,type="Crop Damage")
PD<-data.frame(PD,type="Property Damage")
SX<-rbind(PD,CD)

SX<-filter(SX,lognum>3.47718) # keep only event types whose total exceeds 3000
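The processing above sums PROPDMG and CROPDMG as they are stored, while the str() output shows separate PROPDMGEXP and CROPDMGEXP columns that hold a magnitude code for each record. A possible way to fold those codes into the values before summing is sketched below. It assumes that H, K, M and B (in either case) stand for hundred, thousand, million and billion, and that every other code is treated as a multiplier of 1; that last choice is an assumption of this sketch. The names `exp_multiplier`, `demo` and `prop_dollars` are hypothetical.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
# Assumption: H/K/M/B (any case) mean hundred/thousand/million/billion;
# every other code ("", "?", "+", digits, ...) is treated as 1.
exp_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(as.character(code))])
  mult[is.na(mult)] <- 1
  mult
}

# Small made-up data frame with the same column names as the storm data.
demo <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 2.5, 10, 1.2),
  PROPDMGEXP = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Scale each record, then total by event type.
demo$prop_dollars <- demo$PROPDMG * exp_multiplier(demo$PROPDMGEXP)
totals <- aggregate(prop_dollars ~ EVTYPE, data = demo, FUN = sum)
print(totals[order(-totals$prop_dollars), ])
```

Applying a step like this before the aggregate() calls would change the totals and could change the ranking, so any comparison with the tables in this report should keep that difference in mind.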

Results

Part II: Storm Damage

In the tables below, the 12 largest storm sources of property damage and crop damage, respectively, are provided. Note that TSTM is the database's abbreviation for thunderstorm, so TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS are the same kind of event recorded under different labels. Note also that to the right of the number column there is a column called lognum, which contains the base 10 logarithm of the number column and is used as the ordinate in the figure.

The figure clearly delineates the sources of damage for both crops and property.

head(PD[order(-PD[,2]),],12) ## put PROPERTY DAMAGE in descending order
##                 EVTYPE     number   lognum            type
## 334            TORNADO 3212258.16 6.506810 Property Damage
## 51         FLASH FLOOD 1420124.59 6.152326 Property Damage
## 348          TSTM WIND 1335965.61 6.125795 Property Damage
## 64               FLOOD  899938.48 5.954213 Property Damage
## 296  THUNDERSTORM WIND  876844.17 5.942922 Property Damage
## 106               HAIL  688693.38 5.838026 Property Damage
## 209          LIGHTNING  603351.78 5.780571 Property Damage
## 309 THUNDERSTORM WINDS  446293.18 5.649620 Property Damage
## 159          HIGH WIND  324731.56 5.511524 Property Damage
## 399       WINTER STORM  132720.59 5.122938 Property Damage
## 133         HEAVY SNOW  122251.99 5.087256 Property Damage
## 389           WILDFIRE   84459.34 4.926648 Property Damage
head(CD[order(-CD[,2]),],12) ## put CROP DAMAGE in descending order
##                 EVTYPE    number   lognum        type
## 42                HAIL 579596.28 5.763126 Crop Damage
## 23         FLASH FLOOD 179200.46 5.253339 Crop Damage
## 27               FLOOD 168037.88 5.225407 Crop Damage
## 115          TSTM WIND 109202.60 5.038233 Crop Damage
## 107            TORNADO 100018.52 5.000080 Crop Damage
## 94   THUNDERSTORM WIND  66791.45 4.824721 Crop Damage
## 10             DROUGHT  33898.62 4.530182 Crop Damage
## 97  THUNDERSTORM WINDS  18684.93 4.271491 Crop Damage
## 60           HIGH WIND  17283.21 4.237624 Crop Damage
## 54          HEAVY RAIN  11122.80 4.046214 Crop Damage
## 37        FROST/FREEZE   7034.14 3.847211 Crop Damage
## 19        EXTREME COLD   6121.14 3.786832 Crop Damage
## plot lognum directly; as.integer() would truncate each value to its integer part
xyplot(SX$lognum ~ SX$EVTYPE|SX$type,
       main="Types of events that have the greatest economic consequences",
       xlab="Source of Damage", ylab="Base 10 Log of Total Cost in US Dollars",
       layout=c(1,2), type="h", scales=list(x=list(rot=90)))
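
The tables above list TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate rows. One way to combine such labels before aggregating is sketched here, using the four relevant property damage totals from the table as input. The helper `normalize_evtype` is hypothetical, and its two rules cover only these spellings, not the full set of EVTYPE variants in the database.

```r
# Hypothetical helper: collapse spelling variants of the same event type.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  x <- sub("^TSTM", "THUNDERSTORM", x)                      # expand the abbreviation
  x <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", x)  # drop the plural
  x
}

# Property damage totals copied from the table above.
pd <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS"),
  number = c(3212258.16, 1335965.61, 876844.17, 446293.18),
  stringsAsFactors = FALSE
)

# Relabel, then re-aggregate and sort in descending order.
pd$EVTYPE <- normalize_evtype(pd$EVTYPE)
merged <- aggregate(number ~ EVTYPE, data = pd, FUN = sum)
merged <- merged[order(-merged$number), ]
print(merged)
```

With these inputs the three thunderstorm wind labels combine to 2659102.96, which stays below the tornado total of 3212258.16, so the first place in the property damage table is unchanged.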