Synopsis

In this report we use the R Statistical package to summarize the most hazardous weather events to both human and physical capital with Data from the U.S. National Oceanic and Atmospheric Administration’s Storm Database. The approach that was taken was to roll the data up to all time and group it by weather event type for one data set and event type and weather event type and state in another. We further limit the datasets to only look at weather events that have caused more damage to physical and human capital than either of the means of those measurements: injurries, fatalities, property damage and agricultural crop damage. What we find is that tornadoes cost the greatest cost to human and physical capital. Other than tornadoes, excessive heat poses the largest threat to human capital and floods are the cause of large amounts of physical capital damage.

Data Processing

This report uses a file generated by the U.S. National Oceanic and Atmospheric Administration’s Storm Database and is processed using the R Statistical Package and published using RStudio. This report includes the R code used to produce the report so verification of the results can be easily achieved.

The storm database was recieved as comma seperated value file and was imported into R as a data table using the below code. In addition, the below code loads all required libraries for analysis as well as a quick data overview.

library(knitr)
## Warning: package 'knitr' was built under R version 3.1.1
opts_chunk$set(echo = TRUE, results = 'hold')
library(data.table)
## Warning: package 'data.table' was built under R version 3.1.1
library(sqldf)
## Warning: package 'sqldf' was built under R version 3.1.1
## Loading required package: gsubfn
## Warning: package 'gsubfn' was built under R version 3.1.1
## Loading required package: proto
## Warning: package 'proto' was built under R version 3.1.1
## Loading required package: RSQLite
## Warning: package 'RSQLite' was built under R version 3.1.1
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 3.1.1
## Loading required package: RSQLite.extfuns
## Warning: package 'RSQLite.extfuns' was built under R version 3.1.1
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.1
library(lattice)
temp<-"repdata-data-StormData.csv.bz2"
unz(temp,"repdata-data-StormData.csv")
##                                                 description 
## "repdata-data-StormData.csv.bz2:repdata-data-StormData.csv" 
##                                                       class 
##                                                       "unz" 
##                                                        mode 
##                                                         "r" 
##                                                        text 
##                                                      "text" 
##                                                      opened 
##                                                    "closed" 
##                                                    can read 
##                                                       "yes" 
##                                                   can write 
##                                                       "yes"
storm<-data.table(read.csv("repdata-data-StormData.csv",head=TRUE,sep=","))
## Warning: closing unused connection 5
## (repdata-data-StormData.csv.bz2:repdata-data-StormData.csv)
unlink(temp)
str(storm)
## Classes 'data.table' and 'data.frame':   902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
##  - attr(*, ".internal.selfref")=<externalptr>
cache = TRUE

Now that the data is loaded we need to make two subsets of data: inuries and fatalities group by weather event and economic costs group by weather event over time. The SQLDF package is the quickest way to get to these two subsets:

storm_health<-sqldf('select sum(FATALITIES) as FATALITIES, SUM(INJURIES) as INJURIES, STATE, EVTYPE from storm group by EVTYPE,STATE')
## Loading required package: tcltk
storm_cost<-sqldf('select sum(PROPDMG) as PROPDMG, SUM(CROPDMG) as CROPDMG, STATE, EVTYPE from storm group by EVTYPE,STATE')
cache = TRUE

Summary of The storm_health Data Subset

summary(storm_health)
##    FATALITIES       INJURIES        STATE               EVTYPE    
##  Min.   :  0.0   Min.   :   0   CA     : 143   FLASH FLOOD :  55  
##  1st Qu.:  0.0   1st Qu.:   0   PA     : 141   FLOOD       :  55  
##  Median :  0.0   Median :   0   TX     : 135   FUNNEL CLOUD:  55  
##  Mean   :  3.6   Mean   :  33   NY     : 127   HEAVY RAIN  :  55  
##  3rd Qu.:  1.0   3rd Qu.:   1   OK     : 125   HIGH WIND   :  55  
##  Max.   :653.0   Max.   :8207   NJ     : 122   LIGHTNING   :  55  
##                                 (Other):3465   (Other)     :3928

Summary of The storm_cost Data Subset

summary(storm_cost)
##     PROPDMG          CROPDMG           STATE               EVTYPE    
##  Min.   :     0   Min.   :     0   CA     : 143   FLASH FLOOD :  55  
##  1st Qu.:     0   1st Qu.:     0   PA     : 141   FLOOD       :  55  
##  Median :     0   Median :     0   TX     : 135   FUNNEL CLOUD:  55  
##  Mean   :  2556   Mean   :   324   NY     : 127   HEAVY RAIN  :  55  
##  3rd Qu.:   232   3rd Qu.:     0   OK     : 125   HIGH WIND   :  55  
##  Max.   :283097   Max.   :201031   NJ     : 122   LIGHTNING   :  55  
##                                    (Other):3465   (Other)     :3928

As we can see the vast majority of weather events are not hazardous to public health or property. We will create two more subsets that will require INURIES, FATALITIES, PROPDMG or CROPDMG to be greater than their respective means and combine INJURIES and FATALITIES to one measurement, which is a measurement of how many individual’s health was negatively effected, and PROPDMG and CROPDMG will also be combine which will be measuring total economic damage by weather event.

storm_health<-data.table(sqldf('select MAX(INJURIES)+MAX(FATALITIES) as COST_HEALTH, EVTYPE, STATE  from storm_health where INJURIES > 143 or FATALITIES > 15 group by EVTYPE,STATE order by COST_HEALTH'))
storm_cost<-data.table(sqldf('select MAX(PROPDMG)+MAX(CROPDMG) as COST_ECON,EVTYPE, STATE from storm_cost where PROPDMG > 11050 or CROPDMG > 1399 group by EVTYPE, STATE order by COST_ECON'))
total_health<-data.table(sqldf('select SUM(COST_HEALTH) as TOTAL_HEALTH, EVTYPE from storm_health group by EVTYPE order by TOTAL_HEALTH'))
total_econ<-data.table(sqldf('select SUM(COST_ECON) as TOTAL_ECON, EVTYPE from storm_cost group by EVTYPE order by TOTAL_ECON'))
total_cost<-sqldf('select TOTAL_HEALTH, TOTAL_ECON, total_health.EVTYPE from total_health join total_econ on total_health.EVTYPE=total_econ.EVTYPE order by TOTAL_HEALTH')   

RESULTS

Below are the weather events that cost the most in human injuries and fatalities for the entire US for the entirety of the dataset, 1950-2011.

total_health
##     TOTAL_HEALTH                     EVTYPE
##  1:           17      RECORD/EXCESSIVE HEAT
##  2:           25 TORNADOES, TSTM WIND, HAIL
##  3:           29  UNSEASONABLY WARM AND DRY
##  4:           41                  LANDSLIDE
##  5:           51            COLD/WIND CHILL
##  6:           65                  HURRICANE
##  7:           69    EXTREME COLD/WIND CHILL
##  8:           70               EXTREME COLD
##  9:           86               EXTREME HEAT
## 10:          108                  HIGH SURF
## 11:          131                 HEAVY RAIN
## 12:          153                 WILD FIRES
## 13:          157          THUNDERSTORM WIND
## 14:          161                    TSUNAMI
## 15:          166                      GLAZE
## 16:          170             WINTER WEATHER
## 17:          171                  DENSE FOG
## 18:          194                 DUST STORM
## 19:          230             TROPICAL STORM
## 20:          291                       HAIL
## 21:          342                  AVALANCHE
## 22:          359           WILD/FOREST FIRE
## 23:          377               RIP CURRENTS
## 24:          386                  HIGH WIND
## 25:          409                  HEAT WAVE
## 26:          431                        FOG
## 27:          501                RIP CURRENT
## 28:          568                   BLIZZARD
## 29:          578                 HEAVY SNOW
## 30:          741               WINTER STORM
## 31:          823                   WILDFIRE
## 32:         1291          HURRICANE/TYPHOON
## 33:         1695                  ICE STORM
## 34:         2315                FLASH FLOOD
## 35:         2747                       HEAT
## 36:         4544                  LIGHTNING
## 37:         6233                  TSTM WIND
## 38:         6837                      FLOOD
## 39:         8041             EXCESSIVE HEAT
## 40:        96300                    TORNADO
##     TOTAL_HEALTH                     EVTYPE

As we can see tornadoes cause the vast majority of injuries and fatalities in the US.

Below are the weather events that cost the most in terms of property damage and crop damage in the entire US for the entirety of the dataset, 1950-2011.

total_econ
##     TOTAL_ECON               EVTYPE
##  1:       1802           SMALL HAIL
##  2:       2336           DUST STORM
##  3:       2465    HURRICANE/TYPHOON
##  4:       2615      HIGH WINDS/COLD
##  5:       4350             FLOODING
##  6:       4778            HURRICANE
##  7:       8352 URBAN/SML STREAM FLD
##  8:      11662          RIVER FLOOD
##  9:      12338          URBAN FLOOD
## 10:      16290       FLASH FLOODING
## 11:      22360     WILD/FOREST FIRE
## 12:      29526              DROUGHT
## 13:      32018           HEAVY RAIN
## 14:      50986           HEAVY SNOW
## 15:      52340             WILDFIRE
## 16:      57216         WINTER STORM
## 17:     148194            HIGH WIND
## 18:     358728   THUNDERSTORM WINDS
## 19:     486990            LIGHTNING
## 20:     868922    THUNDERSTORM WIND
## 21:     968907                FLOOD
## 22:    1219731                 HAIL
## 23:    1396417            TSTM WIND
## 24:    1498017          FLASH FLOOD
## 25:    3221176              TORNADO
##     TOTAL_ECON               EVTYPE

Below is the weather events that cost the most in terms of both human life and property.

total_cost
##    TOTAL_HEALTH TOTAL_ECON            EVTYPE
## 1            65       4778         HURRICANE
## 2           131      32018        HEAVY RAIN
## 3           157     868922 THUNDERSTORM WIND
## 4           194       2336        DUST STORM
## 5           291    1219731              HAIL
## 6           359      22360  WILD/FOREST FIRE
## 7           386     148194         HIGH WIND
## 8           578      50986        HEAVY SNOW
## 9           741      57216      WINTER STORM
## 10          823      52340          WILDFIRE
## 11         1291       2465 HURRICANE/TYPHOON
## 12         2315    1498017       FLASH FLOOD
## 13         4544     486990         LIGHTNING
## 14         6233    1396417         TSTM WIND
## 15         6837     968907             FLOOD
## 16        96300    3221176           TORNADO

The following plots are the total cost in human capital and physical capital, respectively, based on the State.

ggplot(storm_health, aes(y=COST_HEALTH, x=EVTYPE)) +geom_histogram(stat="identity",aes(fill=EVTYPE)) + facet_wrap(~ STATE)

plot of chunk unnamed-chunk-9

As we can see by this plot we find that the majority of the states do not have severe injuries or fatalities due to weather except for those states in “tornado alley.”

ggplot(storm_cost, aes(y=COST_ECON, x=EVTYPE)) +geom_histogram(stat="identity",aes(fill=EVTYPE)) + facet_wrap(~ STATE)

plot of chunk unnamed-chunk-10

Flood damage is a problem in nearly every state.