Synopsis

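This report examines NOAA data on extreme weather events in the US. The statistical analysis documented below answers two questions about severe weather events:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?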

Data Processing

The NOAA data for this project (47 MB) come in a bzip2-compressed comma-separated-value file, which can be downloaded here: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Documentation for the database (variable definitions, etc.) is available here: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

National Climatic Data Center Storm Events FAQ:
https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2FNCDC%20Storm%20Events-FAQ%20Page.pdf

Events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete. Additionally, the data documentation reveals that the current damage categories were introduced in 1995.

Consequently, the data set used for this project covers 1996 through 2011.
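You can check whether earlier years really contain fewer records by counting events per year. The sketch below uses a hypothetical helper, `events_per_year()`, and a few made-up date strings in the raw file's `m/d/Y H:M:S` layout. On the real data it could be applied to `myData$BGN_DATE` before that column is converted.

```r
# Hypothetical helper: count storm records per calendar year
# from BGN_DATE strings such as "4/18/1950 0:00:00".
events_per_year <- function(bgn_date) {
  dates <- as.Date(bgn_date, format = "%m/%d/%Y")
  table(format(dates, "%Y"))
}

# Made-up sample values, not taken from the NOAA file
sample_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00",
                  "1/6/1996 0:00:00", "3/2/1996 0:00:00", "7/4/2011 0:00:00")
counts <- events_per_year(sample_dates)
print(counts)
```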


  1. Load the libraries
        library(ggplot2)
        library(knitr)
        library(dplyr)
        library("gridExtra") 


  1. Load the raw CSV file containing the data
        myData <- read.csv("repdata_data_StormData.csv")  # local copy of the data file
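read.csv() passes a file name to file(), and file() recognizes gzip-, bzip2- and xz-compressed files when reading. The downloaded StormData.csv.bz2 can therefore be read without decompressing it first. The sketch below uses a hypothetical helper, `load_storm_data()`, that downloads the file only if it is missing. Its self-check writes a tiny made-up bzip2 CSV to a temporary file, so no network access is needed.

```r
# Hypothetical helper: download the NOAA file once, then read it directly from .bz2.
load_storm_data <- function(dest = "StormData.csv.bz2") {
  url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  if (!file.exists(dest)) {
    download.file(url, destfile = dest, mode = "wb")  # binary mode matters on Windows
  }
  read.csv(dest)  # file() detects and decompresses bzip2 transparently
}

# Self-check on a small made-up file instead of the 47 MB original
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,1", "HAIL,0"), con)
close(con)
sample_df <- load_storm_data(tmp)
str(sample_df)
```

On the real data, `myData <- load_storm_data()` could stand in for the read.csv() call above.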


  1. Inspect the structure and first rows of the data to decide what processing is needed
        str(myData)  # column names, types, and sample values
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
        head(myData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
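The damage columns store magnitudes as codes (PROPDMGEXP, CROPDMGEXP), so it is worth listing which codes occur before converting them. The sketch below uses a hypothetical helper, `exp_codes()`, on made-up values. On the real data it could be called as `exp_codes(myData$PROPDMGEXP)`.

```r
# Hypothetical helper: tabulate magnitude codes after upper-casing,
# so "k" and "K" are counted together; NA is shown if present.
exp_codes <- function(x) table(toupper(x), useNA = "ifany")

# Made-up sample values, not taken from the NOAA file
exp_sample <- c("K", "k", "M", "", "B", "K", "0")
codes <- exp_codes(exp_sample)
print(codes)
```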


  1. BGN_DATE is stored as a character string; convert it to Date (the trailing time component is ignored)
        myData$BGN_DATE <- as.Date(myData$BGN_DATE, format = "%m/%d/%Y")
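as.Date() parses only as much of each string as the format describes, so the trailing time component is ignored rather than causing an error. Here is a quick check on BGN_DATE values copied from the str() output above:

```r
# Values copied from the BGN_DATE column shown by str() above
raw_dates <- c("4/18/1950 0:00:00", "2/20/1951 0:00:00", "6/8/1951 0:00:00")
parsed <- as.Date(raw_dates, format = "%m/%d/%Y")  # " 0:00:00" is ignored
print(parsed)
```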


  1. Keep only events from 1996 onward, since earlier event data are incomplete. This reduces the data from 902,297 to 653,530 observations.
        myData2 <- myData[which(myData$BGN_DATE >= as.Date("1996-01-01")), ]
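Filtering with an explicit Date cutoff inside which() makes the boundary unambiguous. It also drops rows whose date failed to parse (NA) instead of returning them as all-NA rows. The sketch below uses a hypothetical helper, `filter_since()`, on a made-up data frame.

```r
# Hypothetical helper: keep rows dated on or after the cutoff;
# which() discards NA comparisons instead of producing NA rows.
filter_since <- function(df, cutoff = as.Date("1996-01-01")) {
  df[which(df$BGN_DATE >= cutoff), , drop = FALSE]
}

# Made-up sample rows, not taken from the NOAA file
sample_df <- data.frame(
  BGN_DATE = as.Date(c("1995-12-31", "1996-01-01", NA, "2011-11-30")),
  EVTYPE   = c("HAIL", "TORNADO", "FLOOD", "TSTM WIND"),
  stringsAsFactors = FALSE
)
recent <- filter_since(sample_df)
print(recent)
```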


  1. EVTYPE mixes upper- and lower-case labels; convert them all to upper case
        myData2$EVTYPE <- toupper(myData2$EVTYPE)
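Upper-casing merges labels that differ only in case. Some labels may also differ only in leading or trailing spaces; that is an assumption worth checking with unique() on the real data. If so, base R's trimws() can be applied in the same step. The sketch below uses a hypothetical helper, `normalize_evtype()`, on made-up labels.

```r
# Hypothetical helper: collapse EVTYPE labels that differ only in case
# or in surrounding whitespace.
normalize_evtype <- function(x) toupper(trimws(x))

# Made-up labels, not taken from the NOAA file
labels <- c("Tornado", "TORNADO", " tstm wind", "TSTM WIND ", "Flash Flood")
cleaned <- normalize_evtype(labels)
print(unique(cleaned))
```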


  1. Subset the columns related to population health and convert the event type to a factor
        health <- subset(myData2, select = c(EVTYPE, FATALITIES, INJURIES))
        health$EVTYPE <- as.factor(health$EVTYPE)
        # aggregate fatalities (V1) and injuries (V2) by event type
        agg_health <- aggregate(cbind(health$FATALITIES, health$INJURIES),
                                by = list(Category = health$EVTYPE), FUN = sum)
        # sort by fatalities to get the 10 deadliest event types
        deaths <- agg_health[order(-agg_health$V1), ]
        deathsTop <- deaths[1:10, ]

        # sort by injuries to get the 10 event types causing the most injuries
        injury <- agg_health[order(-agg_health$V2), ]
        injuryTop <- injury[1:10, ]
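The same aggregation can be written with aggregate()'s formula interface, which keeps the original column names (FATALITIES, INJURIES) instead of V1 and V2. Using head() instead of `[1:10, ]` also avoids NA rows when there are fewer than ten event types. The sketch below uses a hypothetical helper, `top_n_by()`, on made-up numbers.

```r
# Hypothetical helper: sort an aggregated table by one column (descending)
# and keep at most n rows.
top_n_by <- function(agg, column, n = 10) {
  head(agg[order(-agg[[column]]), ], n)
}

# Made-up sample rows, not taken from the NOAA file
health_sample <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 5, 1, 0, 3),
  INJURIES   = c(30, 4, 10, 1, 2),
  stringsAsFactors = FALSE
)
# Sum both outcome columns per event type
agg <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = health_sample, FUN = sum)
top_deaths   <- top_n_by(agg, "FATALITIES", n = 2)
top_injuries <- top_n_by(agg, "INJURIES", n = 2)
print(top_deaths)
print(top_injuries)
```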


  1. Subset the columns related to economic damage and convert the damage exponents to numeric multipliers
        economic <- subset(myData2, select = c(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP))

        # convert the exponent codes to upper case
        economic$PROPDMGEXP <- toupper(economic$PROPDMGEXP)
        economic$CROPDMGEXP <- toupper(economic$CROPDMGEXP)

        # map exponent codes to multipliers: K = thousands, M = millions, B = billions
        # (per the NOAA documentation); blank or other codes count as 0
        economic$c_multi <- 0
        economic$c_multi[economic$CROPDMGEXP == "K"] <- 1e3
        economic$c_multi[economic$CROPDMGEXP == "M"] <- 1e6
        economic$c_multi[economic$CROPDMGEXP == "B"] <- 1e9

        economic$p_multi <- 0
        economic$p_multi[economic$PROPDMGEXP == "K"] <- 1e3
        economic$p_multi[economic$PROPDMGEXP == "M"] <- 1e6
        economic$p_multi[economic$PROPDMGEXP == "B"] <- 1e9


  1. Calculate crop, property, and total damage in dollars
        economic$cropCost<-economic$CROPDMG * economic$c_multi
        economic$propCost<-economic$PROPDMG * economic$p_multi
        # total damage
        economic$total <- economic$cropCost + economic$propCost
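A named lookup vector is a compact alternative to chains of per-code assignments. The sketch below follows the documentation's K/M/B convention (thousands, millions, billions). It maps any other code, including blank, to 0, which matches treating blank codes as contributing no damage; counting unrecognized codes as zero is an assumption. `exp_multiplier()` is a hypothetical helper, and the sample values are made up.

```r
# Hypothetical helper: map magnitude codes to dollar multipliers.
# K/M/B follow the NOAA documentation; blank or unrecognized codes become 0 (assumption).
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unmatched codes, including "", give NA
  mult[is.na(mult)] <- 0
  mult
}

# Made-up sample rows, not taken from the NOAA file
econ_sample <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 10),
  PROPDMGEXP = c("K", "B", ""),
  CROPDMG    = c(0, 2, 5),
  CROPDMGEXP = c("", "M", "k"),
  stringsAsFactors = FALSE
)
# Total damage = property value * multiplier + crop value * multiplier
econ_sample$total <- econ_sample$PROPDMG * exp_multiplier(econ_sample$PROPDMGEXP) +
  econ_sample$CROPDMG * exp_multiplier(econ_sample$CROPDMGEXP)
print(econ_sample$total)
```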


  1. Aggregate the economic results by event type
        # sum total damage by event type
        agg_economic <- aggregate(cbind(economic$total), by = list(Category = economic$EVTYPE), FUN = sum)
        # sort the table to get the 10 costliest event types
        sort_econ <- agg_economic[order(-agg_economic$V1), ]
        econTop <- sort_econ[1:10, ]
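For a quick look before plotting, tapply() combined with sort() gives the same per-event totals as a named vector. The sketch below uses a hypothetical helper, `top_damage()`, on made-up totals. Passing `na.rm = TRUE` is a deliberate choice: it prevents a single unconverted row from turning an event type's whole sum into NA.

```r
# Hypothetical helper: sum total damage per event type and return the n largest.
top_damage <- function(total, evtype, n = 10) {
  sums <- tapply(total, evtype, sum, na.rm = TRUE)  # na.rm is passed through to sum()
  head(sort(sums, decreasing = TRUE), n)
}

# Made-up sample values, not taken from the NOAA file
totals  <- c(25000, 1.502e9, 5000, 3e6, NA)
evtypes <- c("TORNADO", "FLOOD", "HAIL", "TORNADO", "HAIL")
top <- top_damage(totals, evtypes, n = 2)
print(top)
```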




Results


  1. Across the United States, these are the 10 deadliest weather events:
        ggplot(deathsTop, aes(x = reorder(Category, V1), y = V1)) +
          geom_col(fill = "red", colour = "black") +
          geom_text(aes(label = round(V1, 1)), vjust = 0.5, hjust = 1.1,
                    color = "black", size = 3) +
          coord_flip() +
          labs(x = "Weather Event", y = "Deaths",
               title = paste("Top 10 Weather Events by Fatalities", "1996-2011", sep = "\n")) +
          theme(axis.text.x = element_text(angle = 90, hjust = 1))


  1. Across the United States, these 10 weather events cause the most injuries:
        ggplot(injuryTop, aes(x = reorder(Category, V2), y = V2)) +
          geom_col(fill = "blue", colour = "black") +
          geom_text(aes(label = round(V2, 1)), vjust = 0.5, hjust = 1.1,
                    color = "white", size = 3) +
          coord_flip() +
          labs(x = "Weather Event", y = "Total Injuries",
               title = paste("Top 10 Weather Events by Injuries", "1996-2011", sep = "\n")) +
          theme(axis.text.x = element_text(angle = 90, hjust = 1))


  1. Across the United States, these 10 weather event types had the greatest economic consequences from 1996 to 2011:
        # convert units to billions of dollars so the graph is easier to read
        econTop$billions <- econTop$V1 / 1e9
        # plot the economic impact findings
        ggplot(econTop, aes(x = reorder(Category, billions), y = billions)) +
          geom_col(fill = "lightgreen", colour = "darkgreen") +
          geom_text(aes(label = round(billions, 2)), vjust = 0.5, hjust = 1.2,
                    color = "darkgreen", size = 3) +
          coord_flip() +
          labs(x = "Weather Event", y = "Total Damage (billions of dollars)",
               title = paste("Top 10 Weather Events by Economic Impact", "1996-2011", sep = "\n")) +
          theme(axis.text.x = element_text(angle = 90, hjust = 1),
                plot.title = element_text(hjust = 0.5))
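The three plots above differ only in the data frame, the value column, the colours and the labels, so a small helper could remove the repetition. The sketch below assumes ggplot2 3.0 or later, which provides the `.data` pronoun. `plot_top10()` and its arguments are hypothetical. `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical helper: horizontal bar chart of one value column, largest bar at the top.
plot_top10 <- function(df, value_col, fill, ylab, title, digits = 1) {
  ggplot(df, aes(x = reorder(Category, .data[[value_col]]), y = .data[[value_col]])) +
    geom_col(fill = fill, colour = "black") +
    geom_text(aes(label = round(.data[[value_col]], digits)), hjust = 1.1, size = 3) +
    coord_flip() +
    labs(x = "Weather Event", y = ylab,
         title = paste(title, "1996-2011", sep = "\n"))
}

# Made-up values in the same shape as deathsTop (columns Category and V1)
sample_top <- data.frame(Category = c("HEAT", "TORNADO", "FLOOD"), V1 = c(8, 3, 1),
                         stringsAsFactors = FALSE)
p <- plot_top10(sample_top, "V1", fill = "red", ylab = "Deaths",
                title = "Top 10 Weather Events by Fatalities")
print(p)
```

With the real tables, for example, `plot_top10(injuryTop, "V2", fill = "blue", ylab = "Total Injuries", title = "Top 10 Weather Events by Injuries")` would draw the injury chart.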



Conclusions

Excessive heat and tornadoes were the deadliest weather events in the US from 1996 to 2011. Over the same period, tornadoes caused by far the most injuries. In economic terms, wind-related events and ice storms had the greatest impact.
Thanks for reviewing.