In this report we use the R Statistical package to summarize the most hazardous weather events to both human and physical capital with Data from the U.S. National Oceanic and Atmospheric Administration’s Storm Database. The approach that was taken was to roll the data up to all time and group it by weather event type for one data set and event type and weather event type and state in another. We further limit the datasets to only look at weather events that have caused more damage to physical and human capital than either of the means of those measurements: injurries, fatalities, property damage and agricultural crop damage. What we find is that tornadoes cost the greatest cost to human and physical capital. Other than tornadoes, excessive heat poses the largest threat to human capital and floods are the cause of large amounts of physical capital damage.
This report uses a file generated by the U.S. National Oceanic and Atmospheric Administration’s Storm Database and is processed using the R Statistical Package and published using RStudio. This report includes the R code used to produce the report so verification of the results can be easily achieved.
The storm database was recieved as comma seperated value file and was imported into R as a data table using the below code. In addition, the below code loads all required libraries for analysis as well as a quick data overview.
library(knitr)
## Warning: package 'knitr' was built under R version 3.1.1
opts_chunk$set(echo = TRUE, results = 'hold')
library(data.table)
## Warning: package 'data.table' was built under R version 3.1.1
library(sqldf)
## Warning: package 'sqldf' was built under R version 3.1.1
## Loading required package: gsubfn
## Warning: package 'gsubfn' was built under R version 3.1.1
## Loading required package: proto
## Warning: package 'proto' was built under R version 3.1.1
## Loading required package: RSQLite
## Warning: package 'RSQLite' was built under R version 3.1.1
## Loading required package: DBI
## Warning: package 'DBI' was built under R version 3.1.1
## Loading required package: RSQLite.extfuns
## Warning: package 'RSQLite.extfuns' was built under R version 3.1.1
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.1
library(lattice)
temp<-"repdata-data-StormData.csv.bz2"
unz(temp,"repdata-data-StormData.csv")
## description
## "repdata-data-StormData.csv.bz2:repdata-data-StormData.csv"
## class
## "unz"
## mode
## "r"
## text
## "text"
## opened
## "closed"
## can read
## "yes"
## can write
## "yes"
storm<-data.table(read.csv("repdata-data-StormData.csv",head=TRUE,sep=","))
## Warning: closing unused connection 5
## (repdata-data-StormData.csv.bz2:repdata-data-StormData.csv)
unlink(temp)
str(storm)
## Classes 'data.table' and 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436774 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
## - attr(*, ".internal.selfref")=<externalptr>
cache = TRUE
Now that the data is loaded we need to make two subsets of data: inuries and fatalities group by weather event and economic costs group by weather event over time. The SQLDF package is the quickest way to get to these two subsets:
storm_health<-sqldf('select sum(FATALITIES) as FATALITIES, SUM(INJURIES) as INJURIES, STATE, EVTYPE from storm group by EVTYPE,STATE')
## Loading required package: tcltk
storm_cost<-sqldf('select sum(PROPDMG) as PROPDMG, SUM(CROPDMG) as CROPDMG, STATE, EVTYPE from storm group by EVTYPE,STATE')
cache = TRUE
Summary of The storm_health Data Subset
summary(storm_health)
## FATALITIES INJURIES STATE EVTYPE
## Min. : 0.0 Min. : 0 CA : 143 FLASH FLOOD : 55
## 1st Qu.: 0.0 1st Qu.: 0 PA : 141 FLOOD : 55
## Median : 0.0 Median : 0 TX : 135 FUNNEL CLOUD: 55
## Mean : 3.6 Mean : 33 NY : 127 HEAVY RAIN : 55
## 3rd Qu.: 1.0 3rd Qu.: 1 OK : 125 HIGH WIND : 55
## Max. :653.0 Max. :8207 NJ : 122 LIGHTNING : 55
## (Other):3465 (Other) :3928
Summary of The storm_cost Data Subset
summary(storm_cost)
## PROPDMG CROPDMG STATE EVTYPE
## Min. : 0 Min. : 0 CA : 143 FLASH FLOOD : 55
## 1st Qu.: 0 1st Qu.: 0 PA : 141 FLOOD : 55
## Median : 0 Median : 0 TX : 135 FUNNEL CLOUD: 55
## Mean : 2556 Mean : 324 NY : 127 HEAVY RAIN : 55
## 3rd Qu.: 232 3rd Qu.: 0 OK : 125 HIGH WIND : 55
## Max. :283097 Max. :201031 NJ : 122 LIGHTNING : 55
## (Other):3465 (Other) :3928
As we can see the vast majority of weather events are not hazardous to public health or property. We will create two more subsets that will require INURIES, FATALITIES, PROPDMG or CROPDMG to be greater than their respective means and combine INJURIES and FATALITIES to one measurement, which is a measurement of how many individual’s health was negatively effected, and PROPDMG and CROPDMG will also be combine which will be measuring total economic damage by weather event.
storm_health<-data.table(sqldf('select MAX(INJURIES)+MAX(FATALITIES) as COST_HEALTH, EVTYPE, STATE from storm_health where INJURIES > 143 or FATALITIES > 15 group by EVTYPE,STATE order by COST_HEALTH'))
storm_cost<-data.table(sqldf('select MAX(PROPDMG)+MAX(CROPDMG) as COST_ECON,EVTYPE, STATE from storm_cost where PROPDMG > 11050 or CROPDMG > 1399 group by EVTYPE, STATE order by COST_ECON'))
total_health<-data.table(sqldf('select SUM(COST_HEALTH) as TOTAL_HEALTH, EVTYPE from storm_health group by EVTYPE order by TOTAL_HEALTH'))
total_econ<-data.table(sqldf('select SUM(COST_ECON) as TOTAL_ECON, EVTYPE from storm_cost group by EVTYPE order by TOTAL_ECON'))
total_cost<-sqldf('select TOTAL_HEALTH, TOTAL_ECON, total_health.EVTYPE from total_health join total_econ on total_health.EVTYPE=total_econ.EVTYPE order by TOTAL_HEALTH')
Below are the weather events that cost the most in human injuries and fatalities for the entire US for the entirety of the dataset, 1950-2011.
total_health
## TOTAL_HEALTH EVTYPE
## 1: 17 RECORD/EXCESSIVE HEAT
## 2: 25 TORNADOES, TSTM WIND, HAIL
## 3: 29 UNSEASONABLY WARM AND DRY
## 4: 41 LANDSLIDE
## 5: 51 COLD/WIND CHILL
## 6: 65 HURRICANE
## 7: 69 EXTREME COLD/WIND CHILL
## 8: 70 EXTREME COLD
## 9: 86 EXTREME HEAT
## 10: 108 HIGH SURF
## 11: 131 HEAVY RAIN
## 12: 153 WILD FIRES
## 13: 157 THUNDERSTORM WIND
## 14: 161 TSUNAMI
## 15: 166 GLAZE
## 16: 170 WINTER WEATHER
## 17: 171 DENSE FOG
## 18: 194 DUST STORM
## 19: 230 TROPICAL STORM
## 20: 291 HAIL
## 21: 342 AVALANCHE
## 22: 359 WILD/FOREST FIRE
## 23: 377 RIP CURRENTS
## 24: 386 HIGH WIND
## 25: 409 HEAT WAVE
## 26: 431 FOG
## 27: 501 RIP CURRENT
## 28: 568 BLIZZARD
## 29: 578 HEAVY SNOW
## 30: 741 WINTER STORM
## 31: 823 WILDFIRE
## 32: 1291 HURRICANE/TYPHOON
## 33: 1695 ICE STORM
## 34: 2315 FLASH FLOOD
## 35: 2747 HEAT
## 36: 4544 LIGHTNING
## 37: 6233 TSTM WIND
## 38: 6837 FLOOD
## 39: 8041 EXCESSIVE HEAT
## 40: 96300 TORNADO
## TOTAL_HEALTH EVTYPE
As we can see tornadoes cause the vast majority of injuries and fatalities in the US.
Below are the weather events that cost the most in terms of property damage and crop damage in the entire US for the entirety of the dataset, 1950-2011.
total_econ
## TOTAL_ECON EVTYPE
## 1: 1802 SMALL HAIL
## 2: 2336 DUST STORM
## 3: 2465 HURRICANE/TYPHOON
## 4: 2615 HIGH WINDS/COLD
## 5: 4350 FLOODING
## 6: 4778 HURRICANE
## 7: 8352 URBAN/SML STREAM FLD
## 8: 11662 RIVER FLOOD
## 9: 12338 URBAN FLOOD
## 10: 16290 FLASH FLOODING
## 11: 22360 WILD/FOREST FIRE
## 12: 29526 DROUGHT
## 13: 32018 HEAVY RAIN
## 14: 50986 HEAVY SNOW
## 15: 52340 WILDFIRE
## 16: 57216 WINTER STORM
## 17: 148194 HIGH WIND
## 18: 358728 THUNDERSTORM WINDS
## 19: 486990 LIGHTNING
## 20: 868922 THUNDERSTORM WIND
## 21: 968907 FLOOD
## 22: 1219731 HAIL
## 23: 1396417 TSTM WIND
## 24: 1498017 FLASH FLOOD
## 25: 3221176 TORNADO
## TOTAL_ECON EVTYPE
Below is the weather events that cost the most in terms of both human life and property.
total_cost
## TOTAL_HEALTH TOTAL_ECON EVTYPE
## 1 65 4778 HURRICANE
## 2 131 32018 HEAVY RAIN
## 3 157 868922 THUNDERSTORM WIND
## 4 194 2336 DUST STORM
## 5 291 1219731 HAIL
## 6 359 22360 WILD/FOREST FIRE
## 7 386 148194 HIGH WIND
## 8 578 50986 HEAVY SNOW
## 9 741 57216 WINTER STORM
## 10 823 52340 WILDFIRE
## 11 1291 2465 HURRICANE/TYPHOON
## 12 2315 1498017 FLASH FLOOD
## 13 4544 486990 LIGHTNING
## 14 6233 1396417 TSTM WIND
## 15 6837 968907 FLOOD
## 16 96300 3221176 TORNADO
The following plots are the total cost in human capital and physical capital, respectively, based on the State.
ggplot(storm_health, aes(y=COST_HEALTH, x=EVTYPE)) +geom_histogram(stat="identity",aes(fill=EVTYPE)) + facet_wrap(~ STATE)
As we can see by this plot we find that the majority of the states do not have severe injuries or fatalities due to weather except for those states in “tornado alley.”
ggplot(storm_cost, aes(y=COST_ECON, x=EVTYPE)) +geom_histogram(stat="identity",aes(fill=EVTYPE)) + facet_wrap(~ STATE)
Flood damage is a problem in nearly every state.