Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This data analysis explores the NOAA Storm Database and answers some basic questions about severe weather events. Based on the results described below, tornados have the most adverse impact on public health across the United States. Tornados also have the greatest economic impact based on property damage data. Crop damage is also discussed in this analysis.
This data analysis draws on two public documents that describe the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database:
- National Weather Service Storm Data Documentation.
- National Climatic Data Center Storm Events FAQ.
Download the compressed data file, repdata-data-StormData.csv.bz2, then read it as follows (bzfile() decompresses the file on the fly, so no separate decompression step is needed):
df_rawDataFromBz2file <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
#returns data frame with 902297 obs. of 37 variables
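As a quick check of the reading step, the sketch below round-trips a tiny made-up table through a temporary bzip2-compressed CSV. It only illustrates that read.csv() accepts a bzfile() connection; the names toy, tmp, and toy_back are hypothetical and not part of the analysis.

```r
# Toy data standing in for the real storm file (values are made up)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

# Write the toy table to a temporary bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() decompresses on the fly when given a bzfile() connection
toy_back <- read.csv(bzfile(tmp))
unlink(tmp)
```

The same call pattern applies to the full file, which takes noticeably longer to read because of its size.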
The data is then split by event type for analysis. The EVTYPE column is a factor with 985 levels, including TORNADO, WINTER STORM, HIGH SURF ADVISORY, etc. The structure of the dataset published by NOAA is shown below:
#split by event type, $ EVTYPE: factor
splitPerEventType <- split(df_rawDataFromBz2file, df_rawDataFromBz2file$EVTYPE, drop=TRUE)
str(df_rawDataFromBz2file)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
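The EVTYPE levels are not fully consistent: the listing above shows a label with a leading space (" HIGH SURF ADVISORY"), and the results later in this analysis list TSTM WIND, THUNDERSTORM WIND, and THUNDERSTORM WINDS separately. The sketch below shows one possible way to normalize such labels before splitting. It uses toy labels, and treating the three wind labels as the same event type is an assumption, not something this analysis does.

```r
# Toy labels mimicking the inconsistencies visible in the EVTYPE levels
evtype <- c(" HIGH SURF ADVISORY", "TSTM WIND", "THUNDERSTORM WIND",
            "THUNDERSTORM WINDS")

# Trim surrounding whitespace; upper-casing guards against mixed-case labels
clean <- toupper(trimws(evtype))

# Assumption: these labels all refer to the same event type
clean[clean %in% c("TSTM WIND", "THUNDERSTORM WINDS")] <- "THUNDERSTORM WIND"

# Count how many records fall under each cleaned label
table(clean)
```

Merging labels this way would change the per-type totals, so any such mapping should be stated alongside the results.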
First, as shown below, fatalities are highest for the following event types:
1. Tornados,
2. Excessive heat,
3. Flash flood,
4. Heat, and
5. Lightning.
#add fatalities for each event type
splitPerEventTypeSum <- sapply(splitPerEventType, function(x) { sum(x$FATALITIES)})
#sort the totals starting with highest counts
sortedEvents <- sort(splitPerEventTypeSum, decreasing = TRUE)
topEvents <- head(sortedEvents, 10)
#plot the results
op <- par(mar = c(14,8,4,2) + 0.1)
barplot(topEvents,
        main = "Event types with highest number of fatalities",
        ylab = "Fatality count",
        ylim = c(0, 1000 + max(topEvents)),
        las = 2)
#restore the previous graphics settings
par(op)
More specifically, the fatality count per event type is listed below:
topEvents
## TORNADO EXCESSIVE HEAT FLASH FLOOD HEAT LIGHTNING
## 5633 1903 978 937 816
## TSTM WIND FLOOD RIP CURRENT HIGH WIND AVALANCHE
## 504 470 368 248 224
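The split-and-sum step above can also be written as a single grouped sum with tapply(). The sketch below compares both approaches on a toy data frame with made-up values; the names toy, by_split, by_tapply, and sorted_totals are hypothetical.

```r
# Toy data frame with the two columns the aggregation needs (made-up values)
toy <- data.frame(
  EVTYPE     = factor(c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT")),
  FATALITIES = c(3, 1, 2, 0, 2)
)

# Approach used in this analysis: split by event type, then sum each piece
by_split <- sapply(split(toy, toy$EVTYPE, drop = TRUE),
                   function(x) sum(x$FATALITIES))

# Equivalent one-step grouped sum
by_tapply <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

# Sort the totals starting with the highest count
sorted_totals <- sort(by_tapply, decreasing = TRUE)
```

Both approaches give the same totals per event type. tapply() avoids building a list of per-type data frames, which may matter on a table with roughly 900,000 rows.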
Second, as shown below, injuries are highest for the following event types:
1. Tornados,
2. Thunderstorm wind,
3. Flood,
4. Excessive heat, and
5. Lightning.
#add injuries for each event type
splitSumInjuries <- sapply(splitPerEventType, function(x) { sum(x$INJURIES)})
#sort the totals starting with highest counts
sortedInjuries <- sort(splitSumInjuries, decreasing = TRUE)
topInjuries <- head(sortedInjuries, 10)
#plot the results
op <- par(mar = c(14,8,4,2) + 0.1)
barplot(topInjuries,
        main = "Event types and the number of injuries",
        ylab = "Injury count",
        ylim = c(0, 1000 + max(topInjuries)),
        las = 2)
#restore the previous graphics settings
par(op)
More specifically, the injury count per event type is listed below:
topInjuries
## TORNADO TSTM WIND FLOOD EXCESSIVE HEAT
## 91346 6957 6789 6525
## LIGHTNING HEAT ICE STORM FLASH FLOOD
## 5230 2100 1975 1777
## THUNDERSTORM WIND HAIL
## 1488 1361
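To assess the economic consequences, we look at property damage (PROPDMG) and crop damage (CROPDMG); the sum of both per event type is used as the measure. Note that the code below sums the raw values of these two columns and does not apply the magnitude codes stored in PROPDMGEXP and CROPDMGEXP.
As shown below, the event types with the greatest economic consequences are: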
1. Tornados,
2. Flash floods,
3. Thunderstorm wind,
4. Hail, and
5. Flood.
#add property and crop damage for each event type
splitSumEconomic <- sapply(splitPerEventType, function(x) { sum(x$PROPDMG) + sum(x$CROPDMG)})
#sort the totals starting with highest counts
sortedEconomic <- sort(splitSumEconomic, decreasing = TRUE)
topEconomic <- head(sortedEconomic, 10)
#plot the results
op <- par(mar = c(14,8,4,2) + 0.1)
barplot(topEconomic,
        main = "Highest economic cost per event type",
        ylab = "PROPDMG + CROPDMG (raw values)",
        ylim = c(0, max(topEconomic) * 1.1),
        las = 2)
#restore the previous graphics settings
par(op)
More specifically, the top 10 event types with the greatest economic consequences are listed below:
topEconomic
## TORNADO FLASH FLOOD TSTM WIND
## 3312276.7 1599325.1 1445168.2
## HAIL FLOOD THUNDERSTORM WIND
## 1268289.7 1067976.4 943635.6
## LIGHTNING THUNDERSTORM WINDS HIGH WIND
## 606932.4 464978.1 342014.8
## WINTER STORM
## 134699.6
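The totals above add the raw PROPDMG and CROPDMG values. The dataset also carries PROPDMGEXP and CROPDMGEXP columns, and the Storm Data documentation uses K, M, and B in them for thousands, millions, and billions. The sketch below shows one way the codes could be applied to get dollar amounts, using toy records with made-up values. The helper exp_to_multiplier and the column PROPDMG_USD are hypothetical names, and treating every other code (blank, digits, "?", "+", "-") as a multiplier of 1 is an assumption.

```r
# Toy damage records (made-up values); PROPDMGEXP holds the magnitude code
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 500, 10),
  PROPDMGEXP = c("K", "B", "K", ""),
  stringsAsFactors = FALSE
)

# K, M, B stand for thousands, millions, billions.
# Assumption: any other code is treated as a multiplier of 1.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Damage in dollars for each record
toy$PROPDMG_USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Dollar totals per event type, largest first
usd_by_type <- sort(tapply(toy$PROPDMG_USD, toy$EVTYPE, sum),
                    decreasing = TRUE)
```

In the toy data a single record coded B outweighs all the others, so a ranking computed this way may differ from one based on the raw sums.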
Based on these results, suggested future analyses on this dataset include:
- Comparing the trends for severe events in each 10-year period from 1950 to 2011.
- Looking at specific regions with similar weather patterns, e.g. states in the Northeast are likely to show different trends than states in the Southwest.
- Comparing individual states within regions that experience similar weather patterns and determining whether the impacts can be correlated with demographic data.
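As a starting point for the 10-year comparison, the sketch below groups toy records by decade. It assumes the BGN_DATE text format seen in the structure listing (month/day/year followed by a time); the dates and counts are made up, and DECADE and fatalities_by_decade are hypothetical names.

```r
# Toy records using the BGN_DATE text format seen in the structure listing
toy <- data.frame(
  BGN_DATE   = c("1/1/1966 0:00:00", "4/18/1950 0:00:00",
                 "6/5/1995 0:00:00", "12/31/1999 0:00:00"),
  FATALITIES = c(2, 1, 4, 3),
  stringsAsFactors = FALSE
)

# Parse the date part (the trailing time is ignored) and extract the year
year <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Floor each year to the start of its decade, e.g. 1966 becomes 1960
toy$DECADE <- (year %/% 10) * 10

# Total fatalities per decade
fatalities_by_decade <- tapply(toy$FATALITIES, toy$DECADE, sum)
```

The same grouping could be combined with EVTYPE to compare event types within each decade.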