This report identifies the severe weather events with the greatest impact on public health and the economy in the United States. The study is based on the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, covering weather events from 1950 to 2011. The database records major storms and weather events, when and where they occurred, and the damage they caused (injuries, fatalities, property damage, crop damage). The 10 most damaging event types in each category were identified. Between 1950 and 2011, tornadoes had the strongest impact on public health, causing about 5,600 fatalities and 91,000 injuries. Floods caused the greatest economic damage, about 180 billion dollars.
## Loading required package: knitr
## Loading required package: downloader
## Loading required package: stringr
The data for this study come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The database tracks characteristics of major storms and weather events in the United States [1]. The data are provided as a comma-separated-value file compressed with the bzip2 algorithm [2].
bzFilename <- "StormData.csv.bz2"
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download(fileUrl,destfile=bzFilename, mode="wb")
The file was downloaded at 2015-05-23 09:58:31
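Downloading the archive again on every run is unnecessary once a local copy exists. The following is a minimal sketch of a guarded download, assuming base R's download.file is an acceptable substitute for the downloader package; fetchIfMissing and its fetch argument are hypothetical names introduced only for illustration.

```r
# Sketch: download only when the local copy is missing.
# 'fetchIfMissing' and 'fetch' are hypothetical names, not part of the report.
fetchIfMissing <- function(url, destfile, fetch = utils::download.file) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive byte-exact on Windows
    fetch(url, destfile = destfile, mode = "wb")
  }
  # modification time of the local copy, usable as the download timestamp
  file.info(destfile)$mtime
}

# Usage (performs a network request on the first run only):
# downloadedAt <- fetchIfMissing(fileUrl, bzFilename)
```

Passing the download function as an argument keeps the helper easy to exercise without network access.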
The data are decompressed while loading. To get an overview of the data, their structure and the first two rows are printed.
# read and unzip datafile (can take several minutes)
data <- read.csv(bzfile(bzFilename),sep=",",quote="\"")
# structure of data
str(data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
# print first rows
head(data,2)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14 100 3 0 0
## 2 NA 0 2 150 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
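The str output above shows the text columns stored as factors, which was the read.csv default before R 4.0.0. The sketch below, using made-up toy rows, writes a small bzip2-compressed CSV and reads it back through bzfile in the same way the storm file is read, this time keeping text columns as character vectors.

```r
# Sketch: round-trip a tiny bzip2-compressed CSV (toy rows, made up).
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))
tmp <- tempfile(fileext = ".csv.bz2")

# write the toy data through a bzip2 connection
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens and closes the bzfile connection itself;
# stringsAsFactors = FALSE keeps text columns as character vectors
back <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
str(back)
```

Character columns make the later string clean-up of EVTYPE more direct, since no factor levels have to be managed.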
As we are only interested in events that caused fatalities, injuries or economic damage (property and crop), we subset the data, selecting only those rows where at least one of the columns FATALITIES, INJURIES, PROPDMG and CROPDMG has a value greater than zero.
eventData <- subset(data, FATALITIES >0 | INJURIES >0 | PROPDMG > 0 | CROPDMG > 0)
dim(eventData)
## [1] 254633 37
Only 254,633 of the 902,297 records (about 28%) contain damage-relevant data.
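The filter keeps a record when any one of the four measures is positive, not only when all of them are. A small sketch on made-up toy records shows the effect; the column names match the storm data, everything else is illustrative.

```r
# Toy records (made up) to illustrate the row filter
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  FATALITIES = c(0, 0, 2, 0),
  INJURIES   = c(15, 0, 0, 0),
  PROPDMG    = c(25, 0, 0, 0),
  CROPDMG    = c(0, 0, 0, 0)
)

# keep a row when ANY of the four measures is positive
harmCols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
keep <- rowSums(toy[, harmCols] > 0) > 0
toyEvents <- toy[keep, ]

# share of records that survive the filter
mean(keep)
```

Here the TORNADO and FLOOD rows are kept, while the HAIL and HEAT rows, which report no harm at all, are dropped.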
Next we convert the column EVTYPE to upper case and trim leading and trailing whitespace.
eventData$EVTYPE <- toupper(str_trim(eventData$EVTYPE))
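Then misspelled EVTYPE values are corrected and related event types are merged into common categories. The replacements are applied in order, so a value relabelled by an earlier rule is matched against the later patterns in its new form. For details of the different event types see chapter 7 of the Storm Data Documentation [1].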
eventData[grep("AVALANC",eventData$EVTYPE),]$EVTYPE <- "AVALANCHE"
eventData[grep("BLIZZARD",eventData$EVTYPE),]$EVTYPE <- "BLIZZARD"
eventData[grep("HAIL",eventData$EVTYPE),]$EVTYPE <- "HAIL"
eventData[grep("HEAVY RAIN",eventData$EVTYPE),]$EVTYPE <- "HEAVY RAIN"
eventData[grep("WATERSPOUT",eventData$EVTYPE),]$EVTYPE <- "WATERSPOUT"
eventData[grep("HURRICANE",eventData$EVTYPE),]$EVTYPE <- "HURRICANE"
eventData[grep("THU.*TORM|THUNDER.*WIND|TSTM|TUND.*STORM",eventData$EVTYPE),]$EVTYPE <- "THUNDERSTORM"
eventData[grep("TORNADO|TORND",eventData$EVTYPE),]$EVTYPE <- "TORNADO"
eventData[grep("RIP CURRENT",eventData$EVTYPE),]$EVTYPE <- "RIP CURRENT"
eventData[grep("STRONG WIND",eventData$EVTYPE),]$EVTYPE <- "STRONG WIND"
eventData[grep("LIG.*ING",eventData$EVTYPE),]$EVTYPE <- "LIGHTNING"
eventData[grep("WINTER WEATHER",eventData$EVTYPE),]$EVTYPE <- "WINTER WEATHER"
eventData[grep("WINTER STORM",eventData$EVTYPE),]$EVTYPE <- "WINTER STORM"
eventData[grep("TROPICAL STORM",eventData$EVTYPE),]$EVTYPE <- "TROPICAL STORM"
eventData[grep("HEAVY SNOW",eventData$EVTYPE),]$EVTYPE <- "HEAVY SNOW"
eventData[grep("H.*VY RAIN",eventData$EVTYPE),]$EVTYPE <- "HEAVY RAIN"
eventData[grep("WILD.*FIRE",eventData$EVTYPE),]$EVTYPE <- "WILDFIRE"
eventData[grep("HURRICANE",eventData$EVTYPE),]$EVTYPE <- "HURRICANE"
eventData[grep("FLOOD",eventData$EVTYPE),]$EVTYPE <- "FLOOD"
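The same clean-up can be written as an ordered rule table applied in a loop. The sketch below is one possible formulation, not the code used for the results in this report: normalizeEvtype is a hypothetical name, only four of the rules are listed, and base R's trimws stands in for str_trim. Logical indexing with grepl simply does nothing when a pattern has no match.

```r
# Sketch: ordered pattern -> label rules, applied top to bottom.
# Only a few of the report's rules are reproduced here.
rules <- c(
  "AVALANC"                                  = "AVALANCHE",
  "THU.*TORM|THUNDER.*WIND|TSTM|TUND.*STORM" = "THUNDERSTORM",
  "TORNADO|TORND"                            = "TORNADO",
  "FLOOD"                                    = "FLOOD"
)

# 'normalizeEvtype' is a hypothetical helper name
normalizeEvtype <- function(evtype, rules) {
  evtype <- toupper(trimws(evtype))
  for (pattern in names(rules)) {
    hit <- grepl(pattern, evtype)      # logical vector, may be all FALSE
    evtype[hit] <- rules[[pattern]]    # no-op when nothing matches
  }
  evtype
}

normalizeEvtype(c(" tstm wind", "FLASH FLOOD", "TORNDAO", "Avalance", "DROUGHT"), rules)
```

Keeping the rules in one vector makes their order explicit, which matters because a value such as FLASH FLOOD is only merged into FLOOD by the last rule.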
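The PROPDMGEXP and CROPDMGEXP columns hold magnitude codes such as 0, 1, 2, 3, 4, 5, 6, 7, 8, B, h, H, K, m, M. To simplify the calculation of expenses, each code is mapped to a multiplier expressed in billions of dollars. In this analysis a digit n is treated as 10^n dollars, H as hundreds, K as thousands, M as millions and B as billions.
We create a magnitude table to store this mapping.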
magnitude <- c(0,1,2,3,4,5,6,7,8,9,"h","H","k","K","m","M","b","B")
magNumber <- c(1e-9,1e-8,1e-7,1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1, 1e-7,1e-7, 1e-6,1e-6, 1e-3,1e-3, 1,1)
magTable <- data.frame(magnitude=magnitude,magNumber=magNumber)
magTable
## magnitude magNumber
## 1 0 1e-09
## 2 1 1e-08
## 3 2 1e-07
## 4 3 1e-06
## 5 4 1e-05
## 6 5 1e-04
## 7 6 1e-03
## 8 7 1e-02
## 9 8 1e-01
## 10 9 1e+00
## 11 h 1e-07
## 12 H 1e-07
## 13 k 1e-06
## 14 K 1e-06
## 15 m 1e-03
## 16 M 1e-03
## 17 b 1e+00
## 18 B 1e+00
Next, the magnitude codes are converted to numeric multipliers with the fv function, which looks each code up in the magnitude table and returns 0 for codes that are not listed. The multipliers are stored in two new columns, CROPDMGEXP2 and PROPDMGEXP2.
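# transform a magnitude code to a multiplier in billion dollars
fv <- function(x){
  # factor elements are compared by their label, not their integer code
  x <- as.character(x)
  if(x %in% magTable$magnitude) {
    magTable$magNumber[magTable$magnitude == x]
  } else {
    0
  }
}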
# new columns with expenses as numeric in billions
eventData$CROPDMGEXP2 <- sapply(eventData$CROPDMGEXP, fv )
eventData$PROPDMGEXP2 <- sapply(eventData$PROPDMGEXP, fv )
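Calling fv once per row through sapply works but is slow on several hundred thousand rows. A vectorized alternative is sketched below under the assumption that the same mapping is wanted; magLookup and expToBillions are hypothetical names, and the multipliers are copied from the magnitude table above.

```r
# Sketch: the magnitude table as a named vector (values in billion dollars)
magLookup <- c(
  "0" = 1e-9, "1" = 1e-8, "2" = 1e-7, "3" = 1e-6, "4" = 1e-5,
  "5" = 1e-4, "6" = 1e-3, "7" = 1e-2, "8" = 1e-1, "9" = 1,
  h = 1e-7, H = 1e-7, k = 1e-6, K = 1e-6,
  m = 1e-3, M = 1e-3, b = 1, B = 1
)

# 'expToBillions' is a hypothetical helper name
expToBillions <- function(code) {
  # look up all codes at once; unknown codes yield NA
  out <- magLookup[as.character(code)]
  # unknown codes (for example "", "-", "?", "+") contribute nothing
  out[is.na(out)] <- 0
  unname(out)
}

expToBillions(c("K", "M", "", "B", "5", "?"))
```

The call returns one multiplier per input code, so it could replace both sapply calls with a single lookup per column.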
Each damage value is multiplied by its magnitude multiplier (CROPDMG * CROPDMGEXP2 and PROPDMG * PROPDMGEXP2), and the two products are added to create a new column TOTALEXP, the total expense in billions of dollars.
eventData$TOTALEXP <- eventData$PROPDMG*eventData$PROPDMGEXP2 +
eventData$CROPDMG*eventData$CROPDMGEXP2
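As a worked check of the formula, take the first record printed by head(data, 2): PROPDMG is 25 with code K, and CROPDMG is 0 with an empty code. The short sketch below repeats the arithmetic by hand; the variable names are illustrative only.

```r
# First record from head(data, 2): PROPDMG = 25, PROPDMGEXP = "K",
# CROPDMG = 0, CROPDMGEXP = "" (an empty code maps to a multiplier of 0)
propBillions <- 25 * 1e-6   # "K" -> 1e-06 billion dollars per unit
cropBillions <- 0 * 0
totalExp <- propBillions + cropBillions

totalExp          # total expense in billion dollars
totalExp * 1e9    # the same amount in dollars
```

The record therefore contributes 25,000 dollars of property damage to the tornado total.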
The events most harmful to public health are found by ranking the event types by their total fatalities and total injuries.
The eventTop function returns the 10 event types with the highest totals for a given damage column (FATALITIES, INJURIES or TOTALEXP).
eventTop <- function(data, column){
  # sum of 'column' per EVTYPE (as a named array)
  ev <- tapply(data[,column], data$EVTYPE, sum)
  # convert ev to a data.frame with numeric totals
  evdf <- data.frame(EVTYPE=names(ev), value=as.numeric(ev))
  # sort by value in descending order
  evdf <- evdf[order(evdf$value,decreasing=TRUE),]
  # return the top 10 event types
  evdf[1:10,]
}
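The ranking can also be expressed with aggregate instead of tapply. The following sketch is an alternative formulation on made-up toy records, not the function used for the tables in this report; topByType and its n argument are hypothetical.

```r
# Sketch: top-n totals per event type with aggregate().
# 'topByType' is a hypothetical name; the toy records are made up.
topByType <- function(data, column, n = 10) {
  totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE), FUN = sum)
  names(totals)[2] <- "value"
  totals <- totals[order(totals$value, decreasing = TRUE), ]
  # head() returns fewer rows when fewer than n event types exist
  head(totals, n)
}

toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)
topByType(toy, "FATALITIES", n = 2)
```

Using head rather than indexing with 1:10 avoids rows of NA values when the data contain fewer event types than requested.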
We first calculate the total number of fatalities for the event types that caused the most fatalities.
fatEvents <- eventTop(eventData, "FATALITIES")
print(fatEvents)
## EVTYPE value
## 190 TORNADO 5633
## 40 EXCESSIVE HEAT 1903
## 51 FLOOD 1524
## 78 HEAT 937
## 133 LIGHTNING 817
## 189 THUNDERSTORM 725
## 158 RIP CURRENT 577
## 102 HIGH WIND 248
## 6 AVALANCHE 225
## 216 WINTER STORM 217
The following bar plot shows the 10 event types that caused the most fatalities.
par(mar=c(13,7,3.5,1),las=2)
barplot(fatEvents$value/1000, names.arg=fatEvents$EVTYPE,
ylab="Total Number of Fatalities (thousand)",
ylim=c(0,6),
main="Total Fatalities per Event Type\n(Top 10 most harmful event types)")
title(xlab = "Event Type", line=9)
We then calculate the total number of injuries for the event types that caused the most injuries.
injEvents <- eventTop(eventData, "INJURIES")
print(injEvents)
## EVTYPE value
## 190 TORNADO 91364
## 189 THUNDERSTORM 9448
## 51 FLOOD 8604
## 40 EXCESSIVE HEAT 6525
## 133 LIGHTNING 5231
## 78 HEAT 2100
## 122 ICE STORM 1975
## 210 WILDFIRE 1606
## 75 HAIL 1467
## 216 WINTER STORM 1353
The following bar plot shows the 10 event types that caused the most injuries.
par(mar=c(13,7,3.5,1),las=2)
barplot(injEvents$value/1000, names.arg=injEvents$EVTYPE,
ylab="Total Number of Injuries (thousand)",
ylim=c(0,100),
main="Total Injuries per Event Type\n(Top 10 most harmful event types)")
title(xlab = "Event Type", line=9)
Both bar plots (fatalities and injuries) show that tornadoes have by far the largest impact on population health, with 5,633 fatalities and 91,364 injuries over the 62 years from 1950 to 2011 inclusive.
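The size of the gap can be quantified from the totals in the two tables. The sketch below copies the relevant values from those tables and computes how the tornado totals compare with the runner-up in each category, along with the yearly averages over the 62 years; the variable names are illustrative.

```r
# Totals copied from the two top-10 tables above
fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903)
injuries   <- c(TORNADO = 91364, THUNDERSTORM = 9448)

# how many times larger the tornado total is than the runner-up
fatRatio <- round(fatalities[["TORNADO"]] / fatalities[["EXCESSIVE HEAT"]], 1)
injRatio <- round(injuries[["TORNADO"]] / injuries[["THUNDERSTORM"]], 1)

# average per year over the 62 years covered (1950 to 2011 inclusive)
perYear <- round(c(fatalities = 5633, injuries = 91364) / 62)

fatRatio
injRatio
perYear
```

Tornado fatalities are roughly three times those of excessive heat, and tornado injuries are almost ten times those of thunderstorms.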
We calculate the total expense for the event types that caused the greatest economic damage.
expEvents <- eventTop(eventData, "TOTALEXP")
print(expEvents)
## EVTYPE value
## 51 FLOOD 180.574425
## 112 HURRICANE 90.271473
## 190 TORNADO 57.367114
## 185 STORM SURGE 43.323541
## 75 HAIL 20.737204
## 31 DROUGHT 15.018672
## 189 THUNDERSTORM 12.346958
## 122 ICE STORM 8.967041
## 210 WILDFIRE 8.894345
## 193 TROPICAL STORM 8.409287
The following barplot shows the top 10 events causing the greatest economic consequences.
par(mar=c(13,7,3.5,1),las=2)
barplot(expEvents$value, names.arg=expEvents$EVTYPE,
ylab="Total Expense (billion $)",
ylim=c(0,200),
main="Total Expense per Event Type\n(Top 10 most harmful event types)")
title(xlab = "Event Type", line=9)
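The three bar plots in this report share the same margins, label orientation and x-axis title. As a sketch of how the repeated code could be factored out, the helper below wraps those settings; plotTop and its arguments are hypothetical names, and the usage example takes the first three rows of the expense table above.

```r
# Sketch: one helper for the three bar plots.
# 'plotTop' is a hypothetical name; 'top' is a data frame as returned by eventTop().
plotTop <- function(top, ylab, main, scale = 1) {
  oldPar <- par(mar = c(13, 7, 3.5, 1), las = 2)
  on.exit(par(oldPar))   # restore the previous graphics settings
  mids <- barplot(top$value / scale, names.arg = top$EVTYPE,
                  ylab = ylab, main = main)
  title(xlab = "Event Type", line = 9)
  invisible(mids)        # bar midpoints, as returned by barplot()
}

# Usage with the first three rows of the expense table
expTop3 <- data.frame(EVTYPE = c("FLOOD", "HURRICANE", "TORNADO"),
                      value  = c(180.574425, 90.271473, 57.367114))
plotTop(expTop3, ylab = "Total Expense (billion $)",
        main = "Total Expense per Event Type")
```

The scale argument covers the fatalities and injuries plots, which divide the totals by 1000 before plotting. Unlike the plots in this report, the helper leaves the y-axis range to barplot's default.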
The graph shows that FLOOD has the greatest economic consequences, with expenses of about 180 billion dollars. HURRICANE and TORNADO follow in second and third place with about 90 and 57 billion dollars, respectively.
This report was based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which contains weather events from 1950 to 2011. We showed that tornadoes had the strongest impact on public health, causing about 5,600 fatalities and 91,000 injuries. We also showed that floods caused the greatest economic damage, about 180 billion dollars.
[1] Storm Data Documentation. URL: https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf
[2] Storm Data. URL: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2