Title: Impact of severe weather events on US population and its economy.

Synopsis

Severe weather events can cause injuries and loss of human life, as well as economic damages. This analysis focuses on identifying severe weather events in US that are most harmful to the human population and those that impact economy by causing property and crop damages.

Storm data was originally obtained from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database has storm data from 1950 through 2011. In earlier years only fewer severe weather events were recorded, but as the years progressed storm data collection methods improved resulting in growth of number and type of storm events being recorded. NOAA’s website cautions about possible miscodings being present in this storm data. Data processing/cleansing techniques were applied on this data for this paper, but be informed of potential data anomalies.

Data Processing

Initial exploratory data analysis of this storm data identified the attributes that needed some data processing. Only selective attributes were included in the analytics data for this analysis to speed up processing time. Several data transformations were applied in order to prepare this data for reporting. National Weather Services storm data documentation was referred to heavily to improve quality of the storm data.

Storm Data for this analysis is a packed CSV file.

    options(width=200)
    ##Set current working directory in R 
        
    ## Download the packed CSV file and load the data
        download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="repdata-data-StormData.csv.bz2")
        storms<-read.csv("repdata-data-StormData.csv.bz2",header=TRUE,na.strings = "NA",stringsAsFactors=FALSE)
    
    ## Include R libraries needed for the analysis
    
        library(plyr)
        library(ggplot2)
    library(reshape2)
    library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
## Loading required package: RSQLite.extfuns
    ## Attribute selection for the analysis
        nws_data<-subset(storms, select=c(BGN_DATE,STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP))

Attributes included are described below. National Climatic Data Center document was referred to for this purpose.

Attributes from the raw data -
    . *BGN_DATE*   : Date the storm event began
    . *STATE*      : State postal code
    . *EVTYPE*     : Type of storm events. 
    . *FATALITIES* : Number directly killed
    . *INJURIES*   : Number directly injured
    . *PROPDMG*    : Property damage in whole numbers and hundredths
    . *PROPDMGEXP* : A multiplier where Hundred (H), Thousand (K), Million (M), Billion (B) etc, any non-numeric invalids values were translated to 1.
    . *CROPDMG*    : Crop damage in whole numbers and hundredths
    . *CROPDMGEXP* : A multiplier where Hundred (H), Thousand (K), Million (M), Billion (B) etc, any non-numeric invalids values were translated to 1.

Attributes derived from the raw data:
    . year         : Year derived from the BGN_DATE, and indicates the year in which the event occurred.
    . propdmgcost  : Total cost of property damages.
    . cropdmgcost  : Total cost of crop damages.
    . totaldmgcost : Total property and crop damage costs.

Property and crop damage costs in this data set must be calculated (or derived) for this reporting. For property and cost damages, a whole number field (attribute named like “***DMG“) and its unit (attribute named like CROPDMGEXP) are given in the data set. The derivation for cost attributes is done in two steps. Step one, convert units (defined above) to numeric equivalent. Step two, multiply the whole number damages with their numeric unit value from step one.

Event begin date attribute is used to populate the year attribute.

  ## Add "year" attribute to the data frame using BGN_DATE attribute.
        nws_data<- transform(nws_data,year=format(as.Date(nws_data$BGN_DATE,"%m/%d/%Y"),"%Y"))
    
  ## Set numeric unit values for property damage
  ## Multiplier where Hundred (H), Thousand (K), Million (M), Billion (B), number as given, 
  ## any invalid units are translated to 1

  ##Step One - Property Damage Costs derivation
        nws_data$PROPDMGEXP<-as.character(nws_data$PROPDMGEXP)
        nws_data[nws_data$PROPDMGEXP=="+"|nws_data$PROPDMGEXP=="-"|nws_data$PROPDMGEXP==""|
               nws_data$PROPDMGEXP=="?",]$PROPDMGEXP<-1
        nws_data[toupper(nws_data$PROPDMGEXP)=="H",]$PROPDMGEXP<-100
        nws_data[toupper(nws_data$PROPDMGEXP)=="K",]$PROPDMGEXP<-1000
        nws_data[toupper(nws_data$PROPDMGEXP)=="M",]$PROPDMGEXP<-1000000
        nws_data[toupper(nws_data$PROPDMGEXP)=="B",]$PROPDMGEXP<-1000000000
        nws_data$PROPDMGEXP<-as.numeric(nws_data$PROPDMGEXP)
            
  ## Set numeric unit values for crop damage
  ## Multiplier where Hundred (H), Thousand (K), Million (M), Billion (B), number as given, 
  ## any invalid units are translated to 1
  ##Step One - Crop Damage Costs derivation

        nws_data$CROPDMGEXP<-as.character(nws_data$CROPDMGEXP)
        nws_data[nws_data$CROPDMGEXP==""|nws_data$CROPDMGEXP=="?",]$CROPDMGEXP<-1
        nws_data[toupper(nws_data$CROPDMGEXP)=="K",]$CROPDMGEXP<-1000
        nws_data[toupper(nws_data$CROPDMGEXP)=="M",]$CROPDMGEXP<-1000000
        nws_data[toupper(nws_data$CROPDMGEXP)=="B",]$CROPDMGEXP<-1000000000
        nws_data$CROPDMGEXP<-as.numeric(nws_data$CROPDMGEXP)


    ## Step Two - Property, Crop and total Damage costs derivation. 
        nws_data<- transform(nws_data,propdmgcost=PROPDMGEXP*PROPDMG,
                                        cropdmgcost=CROPDMGEXP*CROPDMG)
        nws_data<- transform(nws_data,totaldmgcost=propdmgcost+cropdmgcost)

After reviewing the events in the data set and reading through NWS storm data documentation, we decided to exclude events corresponding to periodic summaries (ex: May Summary), rare unusual events (ex: Late Snowfall) and other meteorogical events (ex: Record Temperatures).

    ## Convert event type attribute (EVTYPE) to upper case
        nws_data$EVTYPE<-toupper(nws_data$EVTYPE)
    ## Remove rows corresponding to SUMMARY data, Rare unusual events and other meteorogical events
        nws_data<-nws_data[-grep("SUMMARY|MONTHLY TEMPERATURE|SNOWFALL|MONTHLY RAINFALL|
                             RECORD HIGH|SNOW ADVISORY|UNUSUALLY LATE SNOW|FIRST SNOW|
                             RECORD MAY SNOW|RECORD WINTER SNOW|LATE SNOW|METRO STORM|
                             PROLONG COLD/SNOW|LACK OF SNOW|EARLY|RECORD",nws_data$EVTYPE),]

National weather documentation was also used to improve the data quality for the events attribute (EVTYPE). For example: Similar storm events were worded slightly differently, e.g. “coastal flood” and “cstl flood;”FLASH FLOOD" and “FLASH FLOOODING”; or “TSTM for Thunderstorm” etc.

    ## Improve storm events (EVTYPE) attribute data quality based on 2.1.1 Storm Data Event Table in the NWS document
        nws_data$EVTYPE[grep("LOW TIDE",nws_data$EVTYPE)]<-"ASTRONOMICAL LOW TIDE"
        nws_data$EVTYPE[grep("AVA",nws_data$EVTYPE)]<-"AVALANCHE"
        nws_data$EVTYPE[grep("BLIZ",nws_data$EVTYPE)]<-"BLIZZARD"
        ##Look for both words in the pattern for an event
        pattern<-c("COASTAL","FLOOD")
        idx <- Reduce(`&`, lapply(pattern, grepl, nws_data$EVTYPE))
        nws_data$EVTYPE[idx]<-"COASTAL FLOOD"
        nws_data$EVTYPE[grep("CSTL FLOOD",nws_data$EVTYPE)]<-"COASTAL FLOOD"
        nws_data$EVTYPE[grep("^(COLD|WIND|COLD/WIND)",nws_data$EVTYPE)]<-"COLD/WIND CHILL"
        nws_data$EVTYPE[grep("EXTREME COLD|RECORD COLD|
                    EXTREME WIND CHILL|UNUSUALLY COLD",nws_data$EVTYPE)]<-"EXTREME COLD/WIND CHILL"  
        nws_data$EVTYPE[grep("FOG AND|FREEZING FOG|ICE FOG",nws_data$EVTYPE)]<-"FREEZING FOG"  
        nws_data$EVTYPE[grep("^(DENSE FOG|FOG|PATCHY D)",nws_data$EVTYPE)]<-"DENSE FOG"  
        nws_data$EVTYPE[grep("DROUGHT",nws_data$EVTYPE)]<-"DROUGHT"
        nws_data$EVTYPE[grep("DUST DE|BLOWING DUST",nws_data$EVTYPE)]<-"DUST DEVIL"
        ##Look for both words in the pattern for an event
        pattern<-c("FLASH","FLOOD")
        idx <- Reduce(`&`, lapply(pattern, grepl, nws_data$EVTYPE))
        nws_data$EVTYPE[idx]<-"FLASH FLOOD"
        nws_data$EVTYPE[grep("FLASH FLOOOD",nws_data$EVTYPE)]<-"FLASH FLOOD"
        nws_data$EVTYPE[grep("EXCESSIVE HEAT|EXTREME HEAT|RECORD HIGH",nws_data$EVTYPE)]<-"EXCESSIVE HEAT"
        nws_data$EVTYPE[grep("FROST|FREEZE",nws_data$EVTYPE)]<-"FROST/FREEZE"
        nws_data$EVTYPE[grep("^(HAIL)|SMALL HAIL|DEEP HAIL|NON SEVERE HAIL",nws_data$EVTYPE)]<-"HAIL"
        nws_data$EVTYPE[grep("^(HEAT)|HOT|WARM|HYPERTHERMIA/EXPOSURE|MONTHLY TEMPERATURE",nws_data$EVTYPE)]<-"HEAT"
        nws_data$EVTYPE[grep("FUNNEL",nws_data$EVTYPE)]<-"FUNNEL CLOUD"
        nws_data$EVTYPE[grep("HEAVY SNOW|RECORD SNOW|EXCESSIVE SNOW|
                         ACCUMULATED SNOWFALL|HEAVY LAKE SNOW",nws_data$EVTYPE)]<-"HEAVY SNOW"
        nws_data$EVTYPE[grep("HEAVY RAIN|EXCESSIVE RAIN|PROLONGED RAIN|(HEAVY)|EXCESSIVE PRECI|
            HEAVY PRECI|TORRENTIAL RAIN|EXTREMELY WET|EXCESSIVE WET|HEAVY SHOWER|HVY RAIN",nws_data$EVTYPE)]<-"HEAVY RAIN"
        nws_data$EVTYPE[grep("HIGH SURF|ROUGH SURF|HIGH WAVES|HIGH TIDES|HIGH SEAS|HEAVY SURF|
                         HAZARDOUS SURF",nws_data$EVTYPE)]<-"HIGH SURF"
        nws_data$EVTYPE[grep("HIGH WIND",nws_data$EVTYPE)]<-"HIGH WIND"
        nws_data$EVTYPE[grep("HURRICANE|TYPHOON",nws_data$EVTYPE)]<-"HURRICANE/TYPHOON"
        nws_data$EVTYPE[grep("ICE STORM",nws_data$EVTYPE)]<-"ICE STORM"
        nws_data$EVTYPE[grep("LAKE-E|LAKE E",nws_data$EVTYPE)]<-"LAKE-EFFECT SNOW"
        nws_data$EVTYPE[grep("LIGHTNING|LIGHTING|LIGNTNING",nws_data$EVTYPE)]<-"LIGHTNING"
        nws_data$EVTYPE[grep("SLEET|HEAVY WET SNOW|FREEZING RAIN|FREEZING DRIZZLE",nws_data$EVTYPE)]<-"SLEET"
        nws_data$EVTYPE[grep("^(STRONG WIND)",nws_data$EVTYPE)]<-"STRONG WIND"
        nws_data$EVTYPE[grep("MIXED PRECI|NORMAL PREC|RAINFALL|PRECIPITATION",nws_data$EVTYPE)]<-"RAIN"
        nws_data[nws_data$EVTYPE %in% unique(nws_data$EVTYPE)[
                  grep("THUN|THUDER|TSTM",unique(nws_data[nws_data$EVTYPE!=
              "MARINE THUNDERSTORM WIND",]$EVTYPE))],]$EVTYPE <-"THUNDERSTORM WIND"
        # Section 7.39 for NWS documentation indicates MICROBURST/GUSTNADO as Thunderstorm Wind.
        nws_data$EVTYPE[grep("MICROBURST|MICOBURST|GUST",nws_data$EVTYPE)]<-"MICROBURST"
        nws_data$EVTYPE[grep("TORN",nws_data$EVTYPE)]<-"TORNADO"
        nws_data$EVTYPE[grep("TROPICAL STORM",nws_data$EVTYPE)]<-"TROPICAL STORM"
        nws_data$EVTYPE[grep("VOLCANIC ASH",nws_data$EVTYPE)]<-"VOLCANIC ASH"
        nws_data$EVTYPE[grep("WATERSPOUT|WAYTERSPOUT|WATER SPOUT",nws_data$EVTYPE)]<-"WATERSPOUT"
        nws_data$EVTYPE[grep("WALL CLOUD",nws_data$EVTYPE)]<-"WALL CLOUD"
        nws_data$EVTYPE[grep("FIRE",nws_data$EVTYPE)]<-"WILDFIRE"
        nws_data$EVTYPE[grep("WINTER STORM",nws_data$EVTYPE)]<-"WINTER STORM"
        nws_data$EVTYPE[grep("WINTER WEATHER|WINTERY MIX|WINTRY MIX|ICY ROADS|SNOWFALL",nws_data$EVTYPE)]<-"WINTER WEATHER"
        nws_data$EVTYPE[grep("DRY",nws_data$EVTYPE)]<-"DRY WEATHER"
        ## Based on 7.14.1 section in the NWS documentation, a mudslide/landslide is categorized as Debris Flow
        nws_data$EVTYPE[grep("SLIDE|LANDSLUMP",nws_data$EVTYPE)]<-"DEBRIS FLOW"
        flood_idx<-grep("FLOOD|FLD",nws_data$EVTYPE)
        flash_idx<-grep("FLASH FLOOD",nws_data$EVTYPE)
        coastal_idx<-grep("COASTAL FLOOD",nws_data$EVTYPE)
        lake_idx<-grep("LAKESHORE FLOOD",nws_data$EVTYPE)
        nws_data[setdiff(flood_idx,union(union(flash_idx,coastal_idx),lake_idx)),]$EVTYPE<-"FLOOD"
        nws_data$EVTYPE[grep("RIP CURRENT",nws_data$EVTYPE)]<-"RIP CURRENTS"
        ## Categorize any event that has ICE or FREEZ in it and is not a standard event in 2.1.1 event table as SLEET.
            nws_data[intersect(setdiff(grep("ICE|FREEZ",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|LAKE-E",nws_data$EVTYPE)),
                    setdiff(grep("SNOW",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|LAKE-E",nws_data$EVTYPE))),]$EVTYPE<-"SLEET"
        ## Categorize remaining events that has ICE and is not a standard event as WINTER WEATHER
            nws_data[setdiff(grep("ICE",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|
                                              LAKE-E",nws_data$EVTYPE)),]$EVTYPE<-"WINTER WEATHER"
            nws_data[setdiff(grep("ICE|SNOW",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|
                                              LAKE-E|LATE SEASON SNOW|SNOWS",nws_data$EVTYPE)),]$EVTYPE<-"WINTER WEATHER"
            nws_data[setdiff(grep("WIND",nws_data$EVTYPE),grep("EXTREME COLD/WIND CHILL|
                                          THUNDERSTORM WIND|MARINE|HIGH WIND|CHILL",nws_data$EVTYPE)),]$EVTYPE<-"HIGH WIND"
            nws_data$EVTYPE[grep("CHILL|UNSEASONABLY COLD",nws_data$EVTYPE)]<-"EXTREME COLD/WIND CHILL" 

A few outliers were observed in the data set. But for the purposes of this analysis a decision was made to ignore any outliers.

    ## Sample Outliers in the dataset
    ## May 22, 2011 Joplin, Missouri EF5 Tornado 
      nws_data[nws_data$FATALITIES>150 & nws_data$EVTYPE=="TORNADO",]
##                 BGN_DATE STATE  EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year propdmgcost cropdmgcost totaldmgcost
## 862634 5/22/2011 0:00:00    MO TORNADO        158     1150     2.8      1e+09       0       1000 2011     2.8e+09           0      2.8e+09
    ## The heat wave of [Chicago](http://www.crh.noaa.gov/lot?n=top20events_1900to1999)
          nws_data[nws_data$FATALITIES>500 & nws_data$EVTYPE=="HEAT",] 
##                 BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year propdmgcost cropdmgcost totaldmgcost
## 198704 7/12/1995 0:00:00    IL   HEAT        583        0       0          1       0          1 1995           0           0            0
    ## Potential data issue
          nws_data[nws_data$totaldmgcost>1e+10 & nws_data$EVTYPE=="FLOOD",] 
##                BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year propdmgcost cropdmgcost totaldmgcost
## 605953 1/1/2006 0:00:00    CA  FLOOD          0        0     115      1e+09    32.5      1e+06 2006    1.15e+11    32500000     1.15e+11

Summarize data Metrics like total, average and median fatalities, injuries , costs (property damage costs, crop damage costs and total costs) are computed for each event and for each year. The data are further summarized by event type using these same metrics.

        ## Yearly totals by Event Type
                nws_yearly_aggregate <- ddply(nws_data, .(year,EVTYPE), summarize, 
                                    totFatalities = sum(FATALITIES), 
                                    totInjuries = sum(INJURIES), 
                                    totPropcost = sum(propdmgcost),
                                    totCropcost = sum(cropdmgcost),
                                    totalcost=sum(propdmgcost) +sum(cropdmgcost)
                                    )
                                    
        ## Overall Totals and Averages by Event Type. Calculating median for totalcost as an outlier was observed in the data set.

        eventSummary <-sqldf("select EVTYPE,sum(totFatalities) totFatalities,
                    AVG(totFatalities) avgFatalities,
                    AVG(CASE WHEN year='2011' THEN totFatalities END) AVG_1yr_FATAL,
                    AVG(CASE WHEN year>'2000' THEN totFatalities END) AVG_10yr_FATAL,
                    AVG(CASE WHEN year>'1980' THEN totFatalities END) AVG_30yr_FATAL,
                    sum(totInjuries) totInjuries,
                    AVG(totInjuries) avgInjuries,
                    AVG(CASE WHEN year='2011' THEN totInjuries END) AVG_1yr_Injuries,
                    AVG(CASE WHEN year>'2000' THEN totInjuries END) AVG_10yr_Injuries,
                    AVG(CASE WHEN year>'1980' THEN totInjuries END) AVG_30yr_Injuries,
                    sum(totPropcost) totPropcost,
                    SUM(totCropcost) totCropcost,
                    SUM(totalcost) totalcost,
                    AVG(totalcost) AVGtotalcost,
                    AVG(CASE WHEN year='2011' THEN totalcost END) AVG_1yr_totalcost,
                    AVG(CASE WHEN year>'2000' THEN totalcost END) AVG_10yr_totalcost,
                    AVG(CASE WHEN year>'1980' THEN totalcost END) AVG_30yr_totalcost,
                    median(totalcost) mediantotalcost,
                    median(CASE WHEN year='2011' THEN totalcost END) median_1yr_totalcost,
                    median(CASE WHEN year>'2000' THEN totalcost END) median_10yr_totalcost,
                    median(CASE WHEN year>'1980' THEN totalcost END) median_30yr_totalcost,
                    median(totFatalities) medianFatalities,
                    median(CASE WHEN year='2011' THEN totFatalities END) median_1yr_FATAL,
                    median(CASE WHEN year>'2000' THEN totFatalities END) median_10yr_FATAL,
                    median(CASE WHEN year>'1980' THEN totFatalities END) median_30yr_FATAL,
                    median(totInjuries) medianInjuries,
                    median(CASE WHEN year='2011' THEN totInjuries END) median_1yr_Injuries,
                    median(CASE WHEN year>'2000' THEN totInjuries END) median_10yr_Injuries,
                    median(CASE WHEN year>'1980' THEN totInjuries END) median_30yr_Injuries
                from nws_yearly_aggregate
                group by EVTYPE")
## Loading required package: tcltk

Top 10 events causing fatalities in human population and their metrics (total, average and median)

                subset(eventSummary[order(eventSummary$totFatalities,eventSummary$totInjuries, decreasing=T),],
                select=c(EVTYPE,totFatalities,avgFatalities,medianFatalities,totInjuries,avgInjuries,medianInjuries,totalcost))[1:10,]
##                     EVTYPE totFatalities avgFatalities medianFatalities totInjuries avgInjuries medianInjuries totalcost
## 98                 TORNADO          5636         90.90             56.5       91407     1474.31         1060.0 5.741e+10
## 34          EXCESSIVE HEAT          1999        111.06             59.5        6680      371.11          293.5 5.053e+08
## 53               HIGH WIND          1156         20.28              3.0       11348      199.09           60.0 2.095e+10
## 46                    HEAT          1155         72.19             14.5        2498      156.12           15.0 4.193e+08
## 37             FLASH FLOOD          1035         54.47             58.0        1802       94.84           44.0 1.844e+10
## 62               LIGHTNING           817         43.00             44.0        5232      275.37          256.0 9.461e+08
## 83            RIP CURRENTS           572         31.78             30.5         529       29.39           22.0 1.630e+05
## 38                   FLOOD           511         26.89             22.0        6874      361.79           14.0 1.611e+11
## 36 EXTREME COLD/WIND CHILL           489         25.74             27.0         407       21.42           14.0 1.513e+09
## 47              HEAVY RAIN           291         15.32             11.0        1404       73.89           61.0 5.285e+09

Percent of total fatalities caused by Tornado

        round(eventSummary[eventSummary$EVTYPE=="TORNADO",]$totFatalities/sum(eventSummary$totFatalities),2)*100
## [1] 37

Top 10 events causing injuries in human population and their metrics (total, average and median)

        subset(eventSummary[order(eventSummary$totInjuries,eventSummary$totFatalities, decreasing=T),],
                select=c(EVTYPE,totInjuries,avgInjuries,medianInjuries,totFatalities,avgFatalities,medianFatalities))[1:10,]
##             EVTYPE totInjuries avgInjuries medianInjuries totFatalities avgFatalities medianFatalities
## 98         TORNADO       91407     1474.31         1060.0          5636        90.903             56.5
## 53       HIGH WIND       11348      199.09           60.0          1156        20.281              3.0
## 38           FLOOD        6874      361.79           14.0           511        26.895             22.0
## 34  EXCESSIVE HEAT        6680      371.11          293.5          1999       111.056             59.5
## 62       LIGHTNING        5232      275.37          256.0           817        43.000             44.0
## 46            HEAT        2498      156.12           15.0          1155        72.188             14.5
## 57       ICE STORM        1990      104.74            8.0            89         4.684              3.0
## 37     FLASH FLOOD        1802       94.84           44.0          1035        54.474             58.0
## 121       WILDFIRE        1608       84.63           82.0            90         4.737              2.0
## 47      HEAVY RAIN        1404       73.89           61.0           291        15.316             11.0

Percent of total injuries caused by Tornado

            round(eventSummary[eventSummary$EVTYPE=="TORNADO",]$totInjuries/sum(eventSummary$totInjuries),2)*100
## [1] 65

Percent of Total Fatalities caused by Excessive Heat and/or Heat

            round(sum(eventSummary[grep("HEAT",eventSummary$EVTYPE),]$totFatalities)/sum(eventSummary$totFatalities),2)*100
## [1] 21
        ## Get data for top 10 most fatalities storm events.

            eventSummaryFATsort<-eventSummary[order(eventSummary$totFatalities, decreasing=T),]
            eventSummaryFATsort<-eventSummaryFATsort[1:10,] 
            
            ## Average Fatalities for Analysis
                eventAVGFatalities<-melt(subset(eventSummaryFATsort,
                                    select=c(EVTYPE,avgFatalities,AVG_1yr_FATAL,AVG_10yr_FATAL,AVG_30yr_FATAL))
                                    , id.vars="EVTYPE") 
                eventAVGFatalities$value <-as.numeric(eventAVGFatalities$value)
                eventAVGFatalities$variable <-as.character(eventAVGFatalities$variable)
                eventAVGFatalities[eventAVGFatalities$variable=="avgFatalities",]$variable <-"1950-2011"
                eventAVGFatalities[eventAVGFatalities$variable=="AVG_1yr_FATAL",]$variable <-"2011"
                eventAVGFatalities[eventAVGFatalities$variable=="AVG_10yr_FATAL",]$variable <-"2001-2011"
                eventAVGFatalities[eventAVGFatalities$variable=="AVG_30yr_FATAL",]$variable <-"1981-2011)"


            ## Median Fatalities for Analysis
                eventMedianFatalities<-melt(subset(eventSummaryFATsort,
                                    select=c(EVTYPE,medianFatalities,median_1yr_FATAL,median_10yr_FATAL,median_30yr_FATAL))
                                    , id.vars="EVTYPE")  
                eventMedianFatalities$value <-as.numeric(eventMedianFatalities$value)
                eventMedianFatalities<-eventMedianFatalities[order(eventMedianFatalities$value),]
                eventMedianFatalities$variable <-as.character(eventMedianFatalities$variable)
                eventMedianFatalities[eventMedianFatalities$variable=="medianFatalities",]$variable <-"1950-2011"
                eventMedianFatalities[eventMedianFatalities$variable=="median_1yr_FATAL",]$variable <-"2011"
                eventMedianFatalities[eventMedianFatalities$variable=="median_10yr_FATAL",]$variable <-"2001-2011"
                eventMedianFatalities[eventMedianFatalities$variable=="median_30yr_FATAL",]$variable <-"1981-2011"
        
        ## Get data for top 10 storm events causing injuries.

                eventSummaryInjsort<-eventSummary[order(eventSummary$totInjuries, decreasing=T),]
                eventSummaryInjsort<-eventSummaryInjsort[1:10,]  

            ## Average Injuries for Analysis
                eventAVGInjuries<-melt(subset(eventSummaryInjsort,
                                    select=c(EVTYPE,avgInjuries,AVG_1yr_Injuries,AVG_10yr_Injuries,AVG_30yr_Injuries))
                                    , id.vars="EVTYPE")  
                eventAVGInjuries$value <-as.numeric(eventAVGInjuries$value)
                eventAVGInjuries$variable <-as.character(eventAVGInjuries$variable)
                eventAVGInjuries[eventAVGInjuries$variable=="avgInjuries",]$variable       <-"1950-2011"
                eventAVGInjuries[eventAVGInjuries$variable=="AVG_1yr_Injuries",]$variable  <-"2011"
                eventAVGInjuries[eventAVGInjuries$variable=="AVG_10yr_Injuries",]$variable <-"2001-2011"
                eventAVGInjuries[eventAVGInjuries$variable=="AVG_30yr_Injuries",]$variable <-"1981-2011)"


            ## Median Injuries for Analysis
                eventMedianInjuries<-melt(subset(eventSummaryInjsort,
                                        select=c(EVTYPE,medianInjuries,median_1yr_Injuries,median_10yr_Injuries,median_30yr_Injuries))
                                        , id.vars="EVTYPE")  
                eventMedianInjuries$value <-as.numeric(eventMedianInjuries$value)
                eventMedianInjuries<-eventMedianInjuries[order(eventMedianInjuries$value),]
                eventMedianInjuries$variable <-as.character(eventMedianInjuries$variable)
                eventMedianInjuries[eventMedianInjuries$variable=="medianInjuries",]$variable <-"1950-2011"
                eventMedianInjuries[eventMedianInjuries$variable=="median_1yr_Injuries",]$variable <-"2011"
                eventMedianInjuries[eventMedianInjuries$variable=="median_10yr_Injuries",]$variable <-"2001-2011"
                eventMedianInjuries[eventMedianInjuries$variable=="median_30yr_Injuries",]$variable <-"1981-2011"       
        
## 2011 data for severe weather events
        sqldf("select EVTYPE,sum(totFatalities) totFatalities,                  
                        sum(totInjuries) totInjuries,
                        sum(totPropcost) totPropcost,
                        SUM(totCropcost) totCropcost,
                        SUM(totalcost) totalcost
                    from nws_yearly_aggregate
                    where year='2011'
                    group by EVTYPE
                    order by 4  desc
                    ")[1:10,]
##              EVTYPE totFatalities totInjuries totPropcost totCropcost totalcost
## 1           TORNADO           587        6163   9.820e+09    31361000 9.851e+09
## 2             FLOOD            58          10   7.718e+09   154872000 7.873e+09
## 3       FLASH FLOOD            68          30   1.384e+09    88447000 1.472e+09
## 4          WILDFIRE             6         116   6.483e+08     9797000 6.581e+08
## 5              HAIL             0          31   4.513e+08    82334000 5.337e+08
## 6         HIGH WIND            68         417   4.414e+08   199184000 6.406e+08
## 7    TROPICAL STORM             4           1   1.387e+08    24501000 1.632e+08
## 8           TSUNAMI             1           0   5.355e+07           0 5.355e+07
## 9         LIGHTNING            26         194   4.698e+07      112000 4.709e+07
## 10 STORM SURGE/TIDE             0           0   4.070e+07           0 4.070e+07
  ## Economic Damages Data Analysis

      subset(eventSummary[order(eventSummary$totalcost, decreasing=T),],
            select=c(EVTYPE,totalcost,totPropcost,totCropcost,AVGtotalcost,mediantotalcost,
                AVG_10yr_totalcost,median_10yr_totalcost))[1:10,]
##                EVTYPE totalcost totPropcost totCropcost AVGtotalcost mediantotalcost AVG_10yr_totalcost median_10yr_totalcost
## 38              FLOOD 1.611e+11   1.502e+11   1.086e+10    8.477e+09       1.313e+09   12508050284.5455          1173603400.0
## 54  HURRICANE/TYPHOON 9.076e+10   8.526e+10   5.506e+09    5.042e+09       8.407e+08       7540017781.0           524977505.0
## 98            TORNADO 5.741e+10   5.699e+10   4.150e+08    9.259e+08       6.314e+08   1751364542.72727           802053210.0
## 96        STORM SURGE 4.332e+10   4.332e+10   5.000e+03    3.610e+09       8.164e+06       7194962500.0            26894000.0
## 53          HIGH WIND 2.095e+10   1.891e+10   2.046e+09    3.676e+08       0.000e+00   1053791499.09091           607908770.0
## 44               HAIL 1.902e+10   1.597e+10   3.047e+09    3.337e+08       0.000e+00   1200699378.18182           606047300.0
## 37        FLASH FLOOD 1.844e+10   1.691e+10   1.532e+09    9.705e+08       8.689e+08   1107210019.09091           991720900.0
## 26            DROUGHT 1.502e+10   1.046e+09   1.397e+10    7.905e+08       5.548e+08   685767545.454545           374677000.0
## 57          ICE STORM 8.968e+09   3.946e+09   5.022e+09    4.720e+08       1.132e+08   180310436.363636            87469000.0
## 121          WILDFIRE 8.900e+09   8.497e+09   4.033e+08    4.684e+08       2.019e+08   482018111.818182           202833630.0
      round(eventSummary[eventSummary$EVTYPE=="FLOOD",]$totalcost/sum(eventSummary$totalcost),2)*100            
## [1] 34
      round(eventSummary[eventSummary$EVTYPE=="HURRICANE/TYPHOON",]$totalcost/sum(eventSummary$totalcost),2)*100                
## [1] 19
      round(eventSummary[eventSummary$EVTYPE=="TORNADO",]$totalcost/sum(eventSummary$totalcost),2)*100              
## [1] 12
      eventSummarytotcostsort<-eventSummary[order(eventSummary$totalcost, decreasing=T),]
      eventSummarytotcostsort<-eventSummarytotcostsort[1:10,]   

    
      eventTotalcost<-melt(eventSummarytotcostsort[,c(1,15,16,17,18)], id.vars="EVTYPE")  
        eventTotalcost$value <-as.numeric(eventTotalcost$value)
        eventTotalcost$variable <-as.character(eventTotalcost$variable)
        eventTotalcost[eventTotalcost$variable=="AVGtotalcost",]$variable <-"1950-2011"
        eventTotalcost[eventTotalcost$variable=="AVG_1yr_totalcost",]$variable <-"2011"
        eventTotalcost[eventTotalcost$variable=="AVG_10yr_totalcost",]$variable <-"2001-2011"
        eventTotalcost[eventTotalcost$variable=="AVG_30yr_totalcost",]$variable <-"1981-2011"
  
    ## Median Cost Analysis
        eventMedianCost<-melt(eventSummarytotcostsort[,c(1,19,20,21,22)], id.vars="EVTYPE")  
        eventMedianCost$value <-as.numeric(eventMedianCost$value)
        eventMedianCost<-eventMedianCost[order(eventMedianCost$value),]
        eventMedianCost$variable <-as.character(eventMedianCost$variable)
        eventMedianCost[eventMedianCost$variable=="mediantotalcost",]$variable <-"1950-2011"
        eventMedianCost[eventMedianCost$variable=="median_1yr_totalcost",]$variable <-"2011"
        eventMedianCost[eventMedianCost$variable=="median_10yr_totalcost",]$variable <-"2001-2011"
        eventMedianCost[eventMedianCost$variable=="median_30yr_totalcost",]$variable <-"1981-2011"

Results

Storm events having most harmful effect on human population in US between 1950 and 2011

Tornado is identified to be the most harmful severe weather event causing over 37% of fatalities and 65% of injuries in US between 1950 and 2011. A total of 5636 fatalities and 91407 injuries have been recorded during these years. In the year 2011 especially fatalities (587) and injuries (6163) recorded from tornados were way above the normal averages and medians between 1950 and 2011.

Excessive heat and heat have proven to be the second most dangerous storm event for the human population. There were 1999 fatalities recorded from “excessive heat” and 1155 fatalities from “heat”. together “excessive heat” and “heat” accounted for over 21% of fatalities and 7% of injuries.

Flash flooding, lightning, thunderstorms winds and flooding have been observed as the next set of events causing most fatalities in human population. It is also observed that the thunderstorm winds, flood, excessive heat, lightning seem to cause most injuries in human population. All these observations and much more are clearly depicted in the analysis above and plots in the figure below.

    ## **FIGURE - 1.**

    ## Generate plots for FATALITIES
        plot1<-ggplot(data=eventSummaryFATsort, aes(x=reorder(EVTYPE, -totFatalities), y=totFatalities,las=2)) +
              geom_bar(stat="identity", position='dodge',fill="#FF9999") +
              labs( x = "Severe Storm Events (1950-2011)",y = "Total Fatalities", 
        title = "Top 10 events in US causing most fatalities") +
              theme(axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold"),
        axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
        axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
        plot.title = element_text(lineheight=3, face="bold", color="black", size=25))

        plot3<-ggplot(data=eventAVGFatalities, aes(x=EVTYPE, y=value,las=2)) +
              geom_bar(aes(fill=variable),stat="identity", position='dodge') +
              labs( x = "Severe Storm Events (1950-2011)",y = "Average Fatalities",
        title = "Average of Top 10 events in US causing most fatalities") +
        theme(axis.text=element_text(size=12),axis.title=element_text(size=14,face="bold"),
        axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
        axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
        legend.position="bottom", legend.text=element_text(size=14, face="bold"),
        plot.title = element_text(lineheight=3, face="bold", color="black", size=25))

    ## Generate plots for INJURIES

        plot2<-ggplot(data=eventSummaryInjsort, aes(x=reorder(EVTYPE, -totInjuries), y=totInjuries,las=2)) +
                        geom_bar(stat="identity", position='dodge',fill="#FF9999") +
                        labs( x = "Severe Storm Events (1950-2011)",y = "Total Injuries", 
                        title = "Top 10 events in US causing most injuries") +
                        theme(axis.text=element_text(size=12),axis.title=element_text(size=14,face="bold"),
            axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
            axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
            plot.title = element_text(lineheight=3, face="bold", color="black", size=25))

        plot4<-ggplot(data=eventAVGInjuries, aes(x=EVTYPE, y=value,las=2)) +
                  geom_bar(aes(fill=variable),stat="identity", position='dodge') +
                  labs( x = "Severe Storm Events (1950-2011)",y = "Average Injuries", 
            title = "Average of Top 10 events in US causing most injuries") +
                  theme(axis.text=element_text(size=12),axis.title=element_text(size=14,face="bold"),
            axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
            axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
            legend.position="bottom",  legend.text=element_text(size=14, face="bold"),
            plot.title = element_text(lineheight=3, face="bold", color="black", size=25))
    
        require(grid)
## Loading required package: grid
        grid.newpage()
        pushViewport(viewport(layout = grid.layout(2, 2)))   
        print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))         
        print(plot2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
        print(plot3, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))         
        print(plot4, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))

plot of chunk unnamed-chunk-12 Total storm effect on Economy in US between 1950 and 2011

Floods, hurricane/typhoon and tornados seems to be top 3 severe storm events that caused most economic damages in US between 1950-2011. Floods constituted 34%, hurricane/typhoon constituted 19% and TORNADO constituted 12% of total economic damages during that period. Plots in the figure below show these results clearly.

    ## **FIGURE - 2.**

    ## Generate Plots for TOTALCOST
        plot1<-ggplot(data=eventSummarytotcostsort, aes(x=reorder(EVTYPE, -totalcost), y=totalcost,las=2)) +
              geom_bar(stat="identity", position='dodge',fill="#FF9999") +
              labs( x = "Severe Weather Events (1950-2011)",y = "Total Economic Damages", 
            title = "Top 10 storm events that cause most economic damage in US") +
            theme(axis.text=element_text(size=14,face="bold"),axis.title=element_text(size=14,face="bold"),
            axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
            axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
            plot.title = element_text(lineheight=3, face="bold", color="black", size=20))

        plot2<-ggplot(data=eventTotalcost, aes(x=EVTYPE, y=value,las=2)) +
                  geom_bar(aes(fill=variable),stat="identity", position='dodge') +
                  labs( x = "Severe Weather Events (1950-2011)",y = "Average Economic Damages", 
            title = "Average economic damage from storm events (top 10) in US") +
            theme(axis.text=element_text(size=14,face="bold"),axis.title=element_text(size=14,face="bold"),
            axis.text.x = element_text(size = 14, colour = 'black',face="bold",angle = 90),
            axis.text.y = element_text(size = 14, colour = 'black',face="bold"),
            legend.position="bottom", legend.text=element_text(size=14, face="bold"),
            plot.title = element_text(lineheight=3, face="bold", color="black", size=20))


        plot3<-ggplot(data=eventMedianCost, aes(x=EVTYPE, y=value,las=2)) +
                  geom_bar(aes(fill=variable),stat="identity", position='dodge') +
                  labs( x = "Severe Weather Events (1950-2011)",y = "Median Economic Damages", 
            title = "Median storm events causing most damage to US economy") +
                theme(axis.text=element_text(size=14,face="bold"),axis.title=element_text(size=14,face="bold"),
            axis.text.x = element_text(size = 14, colour = 'black',face="bold",angle = 90),
            axis.text.y = element_text(size = 14, colour = 'black',face="bold"),
            legend.position="bottom", legend.text=element_text(size=14, face="bold"),
            plot.title = element_text(lineheight=3, face="bold", color="black", size=20))
            
    require(grid)
        grid.newpage()
        pushViewport(viewport(layout = grid.layout(2,2)))   
        print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))         
        print(plot2, vp = viewport(layout.pos.row = 2, layout.pos.col = 1)) 
        print(plot3, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))     

plot of chunk unnamed-chunk-13