Synopsis

This report identifies the severe weather events with the greatest impact on public health and the economy. The study is based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which covers weather events from 1950 to 2011. The database records major storms and weather events, when and where they occurred, and the damage they caused (injuries, fatalities, property damage, crop damage). The 10 most damaging event types were identified for each category. Between 1950 and 2011, tornadoes had the strongest impact on public health, causing about 5,600 fatalities and 91,000 injuries. Floods caused the greatest economic damage, at about $180 billion.

## Loading required package: knitr
## Loading required package: downloader
## Loading required package: stringr

Data Processing

The data for this study comes from the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA). The database tracks characteristics of major storms and weather events in the United States [1]. The data comes as a comma-separated-value (CSV) file compressed with bzip2 [2].

Getting the Datafile

bzFilename <- "StormData.csv.bz2"
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download(fileUrl,destfile=bzFilename, mode="wb")

The file was downloaded at 2015-05-23 09:58:31
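
Re-running the report repeats the download every time. A minimal sketch of a guard that skips the download when the file already exists is shown below. The helper name fetchOnce and its fetch parameter are hypothetical and not part of the original analysis; base R's download.file is assumed as the default downloader instead of the downloader package.

```r
# Hypothetical helper (not part of the original analysis): download a file
# only if it is not already present locally.
# 'fetch' defaults to base R's download.file; it is a parameter so that the
# helper can be tried out without network access.
fetchOnce <- function(url, destfile, fetch = utils::download.file) {
    if (file.exists(destfile)) {
        return(invisible(FALSE))   # nothing to do, keep the existing file
    }
    fetch(url, destfile = destfile, mode = "wb")
    invisible(TRUE)                # a download was performed
}

# Usage with the names from the report (commented out: needs network access)
# fetchOnce(fileUrl, bzFilename)
```

With this guard, the download time reported above would only change when the local file is deleted first.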

Loading the Data

The data is decompressed while loading. To get an overview of the data, its structure and the first two rows are printed.

# read and unzip datafile (can take several minutes)
data <- read.csv(bzfile(bzFilename),sep=",",quote="\"")

# structure of data
str(data) 
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
# print first rows
head(data,2)
##   STATE__          BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1 4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1 4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                        14   100 3   0          0
## 2         NA         0                         2   150 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
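
The on-the-fly decompression used when loading can be tried on a small file. The sketch below writes a made-up two-row CSV through a bzip2 connection and reads it back the same way as the report does. The toy data and the temporary file name are assumptions for illustration. Note that since R 4.0.0 read.csv no longer converts strings to factors by default, so the Factor columns in the structure shown above reflect an older R version; stringsAsFactors is set explicitly here to make the choice visible.

```r
# Write a tiny CSV through a bzip2 connection, then read it back.
# File name and contents are made up for illustration.
tmpFile <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

con <- bzfile(tmpFile, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv decompresses while reading, as in the report
toyBack <- read.csv(bzfile(tmpFile), stringsAsFactors = FALSE)
str(toyBack)
```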

Filtering and Cleaning the Data

As we are only interested in events that caused fatalities, injuries or economic damage (property and crop), we subset the data, selecting only those rows where at least one of the columns FATALITIES, INJURIES, PROPDMG and CROPDMG has a value greater than zero.

  • FATALITIES/INJURIES - Number of fatalities and injuries caused by the weather event
  • PROPDMG/CROPDMG - Property and crop damage caused by the weather event
  • PROPDMGEXP/CROPDMGEXP - Character signifying the magnitude of the damage number

eventData <- subset(data, FATALITIES >0 | INJURIES >0 | PROPDMG > 0 | CROPDMG > 0)  
dim(eventData)
## [1] 254633     37

Only about 28% of the events in the file (254,633 of 902,297) contain damage-relevant data.
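
The filter condition and the share of rows it keeps can be checked on a small example. The toy data frame below is made up; the last line uses the two row counts printed in this report.

```r
# Toy data (made up) with the four damage columns used in the report
toy <- data.frame(
    FATALITIES = c(0, 1, 0, 0),
    INJURIES   = c(0, 0, 0, 3),
    PROPDMG    = c(0, 0, 2.5, 0),
    CROPDMG    = c(0, 0, 0, 0)
)

# Same OR condition as in the report: keep a row if any damage column is positive
kept <- subset(toy, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
nrow(kept) / nrow(toy)   # share of rows kept in the toy data

# Share for the real data, from the row counts printed in the report (in percent)
sharePct <- round(100 * 254633 / 902297, 1)
sharePct
```

Only the first toy row is dropped, because it is the only one in which all four columns are zero.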

Next we convert the column EVTYPE to upper case and trim leading and trailing whitespace.

  • EVTYPE - Type of weather event

eventData$EVTYPE <- toupper(str_trim(eventData$EVTYPE))

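Then misspelled EVTYPE values are corrected and related event types are merged into a single category. The rules are applied in order, so a label that matches several patterns ends up in the category of the last matching rule. For details on the event types, see chapter 7 of the Storm Data Documentation [1].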

eventData[grep("AVALANC",eventData$EVTYPE),]$EVTYPE <- "AVALANCHE"
eventData[grep("BLIZZARD",eventData$EVTYPE),]$EVTYPE <- "BLIZZARD"
eventData[grep("HAIL",eventData$EVTYPE),]$EVTYPE <- "HAIL"
eventData[grep("HEAVY RAIN",eventData$EVTYPE),]$EVTYPE <- "HEAVY RAIN"
eventData[grep("WATERSPOUT",eventData$EVTYPE),]$EVTYPE <- "WATERSPOUT"
eventData[grep("HURRICANE",eventData$EVTYPE),]$EVTYPE <- "HURRICANE"
eventData[grep("THU.*TORM|THUNDER.*WIND|TSTM|TUND.*STORM",eventData$EVTYPE),]$EVTYPE <- "THUNDERSTORM"
eventData[grep("TORNADO|TORND",eventData$EVTYPE),]$EVTYPE <- "TORNADO"
eventData[grep("RIP CURRENT",eventData$EVTYPE),]$EVTYPE <- "RIP CURRENT"
eventData[grep("STRONG WIND",eventData$EVTYPE),]$EVTYPE <- "STRONG WIND"
eventData[grep("LIG.*ING",eventData$EVTYPE),]$EVTYPE <- "LIGHTNING"
eventData[grep("WINTER WEATHER",eventData$EVTYPE),]$EVTYPE <- "WINTER WEATHER"
eventData[grep("WINTER STORM",eventData$EVTYPE),]$EVTYPE <- "WINTER STORM"
eventData[grep("TROPICAL STORM",eventData$EVTYPE),]$EVTYPE <- "TROPICAL STORM"
eventData[grep("HEAVY SNOW",eventData$EVTYPE),]$EVTYPE <- "HEAVY SNOW"
eventData[grep("H.*VY RAIN",eventData$EVTYPE),]$EVTYPE <- "HEAVY RAIN"
eventData[grep("WILD.*FIRE",eventData$EVTYPE),]$EVTYPE <- "WILDFIRE"
eventData[grep("HURRICANE",eventData$EVTYPE),]$EVTYPE <- "HURRICANE"
eventData[grep("FLOOD",eventData$EVTYPE),]$EVTYPE <- "FLOOD"
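
The same kind of clean-up can also be written as a table of rules applied in a loop. The sketch below is an illustration on made-up labels using a subset of the patterns above, not the code that produced the results in this report; the names rules and normalizeEvtype are hypothetical. Logical indexing with grepl does not depend on every pattern matching at least one row.

```r
# Made-up raw labels for illustration
raw <- c("TSTM WIND", "THUNDERSTORM WINDS", "TORNADO F2", "FLASH FLOOD",
         "LIGHTING", "HEAT")

# Pattern -> canonical label, applied in order (a subset of the rules above)
rules <- c(
    "THU.*TORM|THUNDER.*WIND|TSTM|TUND.*STORM" = "THUNDERSTORM",
    "TORNADO|TORND"                            = "TORNADO",
    "LIG.*ING"                                 = "LIGHTNING",
    "FLOOD"                                    = "FLOOD",
    "BLIZZARD"                                 = "BLIZZARD"   # matches nothing here
)

# Hypothetical helper: apply each rule in turn to a character vector
normalizeEvtype <- function(x, rules) {
    for (pattern in names(rules)) {
        # logical indexing leaves x unchanged when nothing matches
        x[grepl(pattern, x)] <- rules[[pattern]]
    }
    x
}

cleaned <- normalizeEvtype(raw, rules)
cleaned
```

Labels that match no rule, such as HEAT, are passed through unchanged.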

Converting expense values

To simplify the calculation of expenses, the codes in the PROPDMGEXP and CROPDMGEXP columns (values such as 0, 1, 2, 3, 4, 5, 6, 7, 8, B, h, H, K, m, M) are converted to multipliers expressed in billion $:

  • 0 becomes 1e-9 in terms of billions
  • 1 becomes 1e-8 in terms of billions
  • 2 becomes 1e-7 in terms of billions
  • 3 to 9 continue this pattern, up to 9, which becomes 1 in terms of billions
  • h|H (hundred) becomes 1e-7 in terms of billions
  • k|K (kilo=thousand) becomes 1e-6 in terms of billions
  • m|M (million) becomes 1e-3 in terms of billions
  • b|B (billion) becomes 1 in terms of billions

We create a magnitude table to store this information.

magnitude <- c(0,1,2,3,4,5,6,7,8,9,"h","H","k","K","m","M","b","B")
magNumber <- c(1e-9,1e-8,1e-7,1e-6,1e-5,1e-4,1e-3,1e-2,1e-1,1, 1e-7,1e-7, 1e-6,1e-6, 1e-3,1e-3, 1,1)
magTable <- data.frame(magnitude=magnitude,magNumber=magNumber)
magTable
##    magnitude magNumber
## 1          0     1e-09
## 2          1     1e-08
## 3          2     1e-07
## 4          3     1e-06
## 5          4     1e-05
## 6          5     1e-04
## 7          6     1e-03
## 8          7     1e-02
## 9          8     1e-01
## 10         9     1e+00
## 11         h     1e-07
## 12         H     1e-07
## 13         k     1e-06
## 14         K     1e-06
## 15         m     1e-03
## 16         M     1e-03
## 17         b     1e+00
## 18         B     1e+00

Next the damage codes are converted with the fv function, based on the magnitude table. Codes that are not in the table (for example blank entries) are converted to 0. The resulting multipliers are stored in two new columns, CROPDMGEXP2 and PROPDMGEXP2.

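# transform magnitude character to billion-dollar-number
fv <- function(x){
    x <- as.character(x)   # exponent codes may be stored as factors
    if(x %in% magTable$magnitude) {
        # look the code up in magTable itself, not in the global 'magnitude' vector
        magTable$magNumber[magTable$magnitude == x]
    } else {
        0   # blank or unrecognised codes count as zero
    }
}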

# new columns with expenses as numeric in billions
eventData$CROPDMGEXP2 <- sapply(eventData$CROPDMGEXP, fv )
eventData$PROPDMGEXP2 <- sapply(eventData$PROPDMGEXP, fv )
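
Because sapply calls fv once per row, the conversion can also be expressed as a single vectorized lookup. The following is a sketch of that alternative on made-up codes, not the code used for the results in this report; the names magLookup and toMultiplier are hypothetical, and the multipliers are copied from the magnitude table.

```r
# Toy exponent codes (made up), including blank and unrecognised ones
codes <- c("K", "M", "", "B", "?", "5", "h")

# Lookup table as a named vector: code -> multiplier in billion $
# (same values as in the magnitude table of the report)
magLookup <- c(1e-9, 1e-8, 1e-7, 1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1,
               1e-7, 1e-7, 1e-6, 1e-6, 1e-3, 1e-3, 1, 1)
names(magLookup) <- c(0:9, "h", "H", "k", "K", "m", "M", "b", "B")

# Hypothetical helper: translate all codes at once
toMultiplier <- function(codes, lookup) {
    idx <- match(as.character(codes), names(lookup))   # NA where code is unknown
    out <- unname(lookup[idx])
    out[is.na(out)] <- 0   # unknown or blank codes count as zero, as in fv
    out
}

multipliers <- toMultiplier(codes, magLookup)
multipliers
```

The as.character call makes the helper work for factor columns as well as for character columns.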

Calculating the total expense

Each damage value is multiplied by its multiplier (CROPDMG * CROPDMGEXP2 and PROPDMG * PROPDMGEXP2), and the two products are added to create a new column TOTALEXP, the total expense in billion $.

eventData$TOTALEXP <- eventData$PROPDMG*eventData$PROPDMGEXP2 + 
                      eventData$CROPDMG*eventData$CROPDMGEXP2        
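
As a worked example, the calculation can be followed for the first row of the data set printed earlier (PROPDMG = 25 with PROPDMGEXP = K, CROPDMG = 0 with a blank CROPDMGEXP). The variable names below are introduced only for this illustration.

```r
# First row of the data set: property damage 25 with code "K",
# crop damage 0 with a blank code (multiplier 0 under the rules above)
propDmg <- 25;  propMult <- 1e-6   # "K" = thousand, expressed in billions
cropDmg <- 0;   cropMult <- 0      # blank code

totalExp <- propDmg * propMult + cropDmg * cropMult
totalExp          # total expense in billion $
totalExp * 1e9    # the same amount in $
```

The result corresponds to 25 thousand dollars, that is 2.5e-5 billion $.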

Results

Most harmful events to population health

The events most harmful to public health are determined by taking the top event types for fatalities and for injuries.

The eventTop function returns the 10 event types with the highest totals for a given damage column (FATALITIES, INJURIES, TOTALEXP).

eventTop <- function(data, column){
    # sum of 'column' per EVTYPE (as a named array)
    ev <- tapply(data[,column], data$EVTYPE, sum)
    # convert ev to a data.frame with one row per event type
    evdf <- data.frame(EVTYPE=names(ev), value=as.numeric(ev))
    # sort by value, largest first
    evdf <- evdf[order(evdf$value,decreasing=TRUE),]
    # return the top 10 event types
    head(evdf, 10)
}
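
The aggregation steps can be followed on a small example. The sketch below uses made-up data and a hypothetical variant named topEvents that takes the number of rows as a parameter; it is an illustration of the same steps, not the function used for the results in this report.

```r
# Toy data (made up)
toy <- data.frame(
    EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
    FATALITIES = c(3, 1, 2, 0, 5),
    stringsAsFactors = FALSE
)

# Hypothetical variant of eventTop with the number of rows as a parameter
topEvents <- function(data, column, n = 10) {
    ev <- tapply(data[[column]], data$EVTYPE, sum)       # sum per event type
    evdf <- data.frame(EVTYPE = names(ev), value = as.numeric(ev),
                       stringsAsFactors = FALSE)
    evdf <- evdf[order(evdf$value, decreasing = TRUE), ] # largest first
    head(evdf, n)                                        # at most n event types
}

top2 <- topEvents(toy, "FATALITIES", n = 2)
top2
```

In the toy data FLOOD totals 6 and TORNADO totals 5, so these two event types are returned in that order. The row names in the result are the positions in the alphabetically ordered list of event types, which is why the tables in this report show row numbers such as 190 for TORNADO.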

Fatalities:

We first calculate the total number of fatalities per event type and list the 10 event types that caused the most fatalities.

fatEvents <- eventTop(eventData, "FATALITIES")
print(fatEvents)
##             EVTYPE value
## 190        TORNADO  5633
## 40  EXCESSIVE HEAT  1903
## 51           FLOOD  1524
## 78            HEAT   937
## 133      LIGHTNING   817
## 189   THUNDERSTORM   725
## 158    RIP CURRENT   577
## 102      HIGH WIND   248
## 6        AVALANCHE   225
## 216   WINTER STORM   217

The following barplot shows the 10 event types that caused the most fatalities.

par(mar=c(13,7,3.5,1),las=2)

barplot(fatEvents$value/1000, names.arg=fatEvents$EVTYPE,
        ylab="Total Number of Fatalities (thousand)",
        ylim=c(0,6),
        main="Total Fatalities per Event Type\n(Top 10 most harmful event types)")

title(xlab = "Event Type", line=9)

Injuries:

We then calculate the total number of injuries per event type and list the 10 event types that caused the most injuries.

injEvents <- eventTop(eventData, "INJURIES")
print(injEvents)
##             EVTYPE value
## 190        TORNADO 91364
## 189   THUNDERSTORM  9448
## 51           FLOOD  8604
## 40  EXCESSIVE HEAT  6525
## 133      LIGHTNING  5231
## 78            HEAT  2100
## 122      ICE STORM  1975
## 210       WILDFIRE  1606
## 75            HAIL  1467
## 216   WINTER STORM  1353

The following barplot shows the 10 event types that caused the most injuries.

par(mar=c(13,7,3.5,1),las=2)

barplot(injEvents$value/1000, names.arg=injEvents$EVTYPE,     
        ylab="Total Number of Injuries (thousand)",
        ylim=c(0,100),
        main="Total Injuries per Event Type\n(Top 10 most harmful event types)")

title(xlab = "Event Type", line=9)

Both barplots (fatalities and injuries) show that tornadoes (TORNADO) have by far the largest impact on population health, with about 5,600 fatalities and 91,000 injuries over the 62 years from 1950 to 2011 inclusive.

Events with the greatest economic consequences

We calculate the total expense per event type and list the 10 event types that caused the greatest economic damage.

expEvents <- eventTop(eventData, "TOTALEXP")
print(expEvents)
##             EVTYPE      value
## 51           FLOOD 180.574425
## 112      HURRICANE  90.271473
## 190        TORNADO  57.367114
## 185    STORM SURGE  43.323541
## 75            HAIL  20.737204
## 31         DROUGHT  15.018672
## 189   THUNDERSTORM  12.346958
## 122      ICE STORM   8.967041
## 210       WILDFIRE   8.894345
## 193 TROPICAL STORM   8.409287

The following barplot shows the 10 event types with the greatest economic consequences.

par(mar=c(13,7,3.5,1),las=2)

barplot(expEvents$value, names.arg=expEvents$EVTYPE,    
        ylab="Total Expense (billion $)",
        ylim=c(0,200),
        main="Total Expense per Event Type\n(Top 10 most harmful event types)")

title(xlab = "Event Type", line=9)

The graph shows that FLOOD has the greatest economic consequences, with expenses of about $180 billion. HURRICANE and TORNADO follow in second and third place, with about $90 billion and $57 billion, respectively.

Conclusion

This report was based on the storm database of the U.S. National Oceanic and Atmospheric Administration (NOAA), which contains weather events from 1950 to 2011. We showed that tornadoes had the strongest impact on public health, causing about 5,600 fatalities and 91,000 injuries. We also showed that floods caused the greatest economic damage, at about $180 billion.

References

[1] Storm Data Documentation, URL:https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf

[2] Storm Data, URL:https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2