Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
This document is an elementary analysis of the NOAA storm data from 1950 to 2011. It focuses on two points: (1) population casualties and (2) economic damage. I describe some simple processing of the data and use plots to present the results.
The data for this assignment come in the form of a comma-separated-value file compressed via the bzip2 algorithm to reduce its size. The file is available from the course web site; the download URL is given in the code below.
There is also some documentation of the database available. Here you will find how some of the variables are constructed/defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The first step is to read the data into a data frame.
> download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
+ "NOAAStormData.csv.bz2")
> storm <- read.csv("NOAAStormData.csv.bz2")
Before the analysis, the data need some preprocessing. Event types are not recorded in a consistent format, so the raw data contain a large number of distinct values, as shown below.
> length(unique(storm$EVTYPE))
[1] 985
> # translate all letters to lowercase
> eventTypes <- tolower(storm$EVTYPE)
> # replace every blank or punctuation character with a space
> # ("+" is itself a punctuation character, so it needs no separate entry)
> eventTypes <- gsub("[[:blank:][:punct:]]", " ", eventTypes)
> length(unique(eventTypes))
[1] 874
> # update the data frame
> storm$EVTYPE <- eventTypes
After the cleaning, as expected, the number of unique event types drops from 985 to 874. No further preprocessing was performed, although the event type field could be processed further to merge types such as tstm wind and thunderstorm wind. The cleaned event types are used in the rest of the analysis.
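The merging of near-duplicate event types mentioned above was not part of this analysis. A minimal sketch of how it might be done is given here, assuming a hypothetical helper named normalizeEventType and only two illustrative rules (expanding "tstm" and singularizing "winds"). Unlike the cleaning step above, this sketch also collapses runs of blanks and punctuation into a single space.

```r
# Hypothetical helper, not part of the original analysis: collapse a few
# obvious spelling variants of the same event type.
normalizeEventType <- function(x) {
  x <- tolower(x)
  # Runs of blanks/punctuation become one space (differs from the step above,
  # which replaces each character separately).
  x <- gsub("[[:blank:][:punct:]]+", " ", x)
  x <- trimws(x)
  x <- gsub("\\btstm\\b", "thunderstorm", x, perl = TRUE)  # expand abbreviation
  x <- gsub("\\bwinds\\b", "wind", x, perl = TRUE)         # singularize "winds"
  x
}

examples <- c("TSTM WIND", "THUNDERSTORM WINDS", "Thunderstorm Wind", " TSTM WIND/HAIL")
normalizeEventType(examples)
```

The first three inputs map to the same label, "thunderstorm wind", while the combined wind/hail entry stays a separate type. A real merge would need a longer, manually reviewed rule list.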
To find the event types that are most harmful to population health, the numbers of fatalities and injuries are aggregated by event type.
> library(plyr)
> casualties <- ddply(storm, .(EVTYPE), summarize,
+ fatalities = sum(FATALITIES),
+ injuries = sum(INJURIES))
>
> # Find the events that caused the most deaths and injuries
> fatalEvents <- head(casualties[order(casualties$fatalities, decreasing = TRUE), ], 10)
> injuryEvents <- head(casualties[order(casualties$injuries, decreasing = TRUE), ], 10)
The 10 event types that caused the largest number of deaths are:
> fatalEvents[, c("EVTYPE", "fatalities")]
            EVTYPE fatalities
741        tornado       5633
116 excessive heat       1903
138    flash flood        978
240           heat        937
410      lightning        816
762      tstm wind        504
154          flood        470
515    rip current        368
314      high wind        248
19       avalanche        224
The 10 event types that caused the largest number of injuries are:
> injuryEvents[, c("EVTYPE", "injuries")]
               EVTYPE injuries
741           tornado    91346
762         tstm wind     6957
154             flood     6789
116    excessive heat     6525
410         lightning     5230
240              heat     2100
382         ice storm     1975
138       flash flood     1777
671 thunderstorm wind     1488
209              hail     1361
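The aggregate-then-rank pattern used above can also be written in base R without plyr. The sketch below assumes a small made-up data frame (toy, with invented values) only to make the steps easy to follow; it is not the author's code.

```r
# Toy data frame; the values are invented for illustration only.
toy <- data.frame(
  EVTYPE     = c("tornado", "heat", "tornado", "flood", "heat"),
  FATALITIES = c(3, 1, 2, 0, 2),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum both casualty columns within each event type.
toyCasualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                           data = toy, FUN = sum)

# Rank by fatalities, largest first, and keep the top two rows.
toyTop <- head(toyCasualties[order(toyCasualties$FATALITIES, decreasing = TRUE), ], 2)
toyTop
```

Here tornado (5 fatalities) ranks ahead of heat (3 fatalities), mirroring how fatalEvents and injuryEvents are built from the full data set.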
To analyze the impact of weather events on the economy, the reported or estimated property and crop damage figures were used.
In the raw data, property damage is represented with two fields: a numeric value PROPDMG and a magnitude code PROPDMGEXP (for example, K for thousands, M for millions, B for billions), so that the dollar amount is PROPDMG multiplied by the corresponding power of ten. Similarly, crop damage is represented using two fields, CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage in dollars for each event.
> expTransform <- function(exp) {
+ # h -> hundred, k -> thousand, m -> million, b -> billion
+ if (exp %in% c('h', 'H'))
+ return(2)
+ else if (exp %in% c('k', 'K'))
+ return(3)
+ else if (exp %in% c('m', 'M'))
+ return(6)
+ else if (exp %in% c('b', 'B'))
+ return(9)
+ else if (!is.na(as.numeric(exp)))
+ return(as.numeric(exp))
+ else if (exp %in% c('', '-', '?', '+'))
+ return(0)
+ else {
+ stop("Invalid exponent value.")
+ }
+ }
> propDmgExp <- sapply(storm$PROPDMGEXP, FUN=expTransform)
> storm$propDmg <- storm$PROPDMG * (10 ^ propDmgExp)
> cropDmgExp <- sapply(storm$CROPDMGEXP, FUN=expTransform)
> storm$cropDmg <- storm$CROPDMG * (10 ^ cropDmgExp)
> # Compute the economic loss by event type
> library(plyr)
> econLoss <- ddply(storm, .(EVTYPE), summarize,
+ propDmg = sum(propDmg),
+ cropDmg = sum(cropDmg))
>
> # filter out events that caused no economic loss
> econLoss <- econLoss[(econLoss$propDmg > 0 | econLoss$cropDmg > 0), ]
> propDmgEvents <- head(econLoss[order(econLoss$propDmg, decreasing = TRUE), ], 10)
> cropDmgEvents <- head(econLoss[order(econLoss$cropDmg, decreasing = TRUE), ], 10)
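The exponent decoding above can also be sketched as a vectorized lookup. The function name expToPower is hypothetical and this is not the original code. It converts its input to character first, because calling as.numeric on a factor returns the internal level codes rather than the digits shown. Unlike expTransform, it treats any unrecognized code as a power of zero instead of stopping with an error, which is an assumption of this sketch.

```r
# Hypothetical vectorized variant of expTransform (an assumption, not the
# original code). Works for character vectors and for factors.
expToPower <- function(exp) {
  exp <- tolower(as.character(exp))            # factor-safe: use labels, not level codes
  lookup <- c(h = 2, k = 3, m = 6, b = 9)      # hundred, thousand, million, billion
  power <- unname(lookup[exp])                 # NA for anything not in the lookup
  isDigit <- grepl("^[0-9]$", exp)
  power[isDigit] <- as.numeric(exp[isDigit])   # digit codes read as the power of ten
  power[is.na(power)] <- 0                     # '', '-', '?', '+' and other codes -> 10^0
  power
}

dmg  <- c(25, 2.5, 1, 5, 3)
code <- factor(c("K", "M", "B", "", "2"))
dmg * 10^expToPower(code)
```

With these made-up inputs, 25 with code K becomes 25,000 dollars and 3 with code 2 becomes 300 dollars. Reading digit codes as powers of ten follows the original function and is itself an interpretation of the data.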
The 10 event types that caused the most property damage (in dollars) are as follows:
> propDmgEvents[, c("EVTYPE", "propDmg")]
                EVTYPE      propDmg
138        flash flood 6.834763e+12
697 thunderstorm winds 2.088094e+12
741            tornado 1.591385e+11
154              flood 1.446577e+11
366  hurricane typhoon 6.930584e+10
209               hail 4.573462e+10
585        storm surge 4.332354e+10
410          lightning 1.813012e+10
357          hurricane 1.186832e+10
755     tropical storm 7.703891e+09
Similarly, the 10 event types that caused the most crop damage (in dollars) are:
> cropDmgEvents[, c("EVTYPE", "cropDmg")]
               EVTYPE     cropDmg
84            drought 13972566000
154             flood  5661968450
519       river flood  5029459000
382         ice storm  5022113500
209              hail  3025956480
357         hurricane  2741910000
366 hurricane typhoon  2607872800
138       flash flood  1421317100
125      extreme cold  1312973000
185      frost freeze  1094186000
The following plots show the event types that are most harmful to population health.
> library(ggplot2)
> # Set the levels in order
> p1 <- ggplot(data=fatalEvents,
+ aes(x=reorder(EVTYPE, fatalities), y=fatalities, fill=fatalities)) +
+ geom_bar(stat="identity") +
+ coord_flip() +
+ ylab("Total number of fatalities") +
+ xlab("Event type") +
+ theme(legend.position="none")
>
> p2 <- ggplot(data=injuryEvents,
+ aes(x=reorder(EVTYPE, injuries), y=injuries, fill=injuries)) +
+ geom_bar(stat="identity") +
+ coord_flip() +
+ ylab("Total number of injuries") +
+ xlab("Event type") +
+ theme(legend.position="none")
>
> p1; p2
Tornadoes cause by far the most deaths and injuries of all event types: more than 5,000 deaths and more than 90,000 injuries in the United States over roughly 60 years. By fatalities, the next most dangerous event types are excessive heat and flash floods.
The following plots show the weather event types that have caused the greatest economic losses since 1950.
> library(ggplot2)
> # Set the levels in order
> p1 <- ggplot(data=propDmgEvents,
+ aes(x=reorder(EVTYPE, propDmg), y=log10(propDmg), fill=propDmg )) +
+ geom_bar(stat="identity") +
+ coord_flip() +
+ xlab("Event type") +
+ ylab("Property damage in dollars (log-scale)") +
+ theme(legend.position="none")
>
> p2 <- ggplot(data=cropDmgEvents,
+ aes(x=reorder(EVTYPE, cropDmg), y=cropDmg, fill=cropDmg)) +
+ geom_bar(stat="identity") +
+ coord_flip() +
+ xlab("Event type") +
+ ylab("Crop damage in dollars") +
+ theme(legend.position="none")
>
> p1; p2
Property damage is plotted on a logarithmic scale because of the large range of values. The data show that flash floods and thunderstorm winds cause the largest property damage among weather-related natural disasters. Both totals are in the trillions of dollars, well above every other event type, so they depend heavily on how the PROPDMGEXP codes are interpreted. Note that, because the available data are untidy, flood and flash flood are recorded as separate event types and should be merged for more accurate data-driven conclusions.
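To make the case for the logarithmic scale concrete, the short calculation below compares the largest and smallest totals in the property damage table above. The variable names are illustrative only.

```r
# Largest and smallest totals among the top 10 property-damage event types,
# copied from the table above.
propDmgTop <- c("flash flood" = 6.834763e12, "tropical storm" = 7.703891e09)

# Ratio between the two extremes and the same gap in orders of magnitude.
ratio <- propDmgTop[["flash flood"]] / propDmgTop[["tropical storm"]]
ordersOfMagnitude <- log10(ratio)

round(ratio)                  # roughly 887
round(ordersOfMagnitude, 2)   # roughly 2.95
```

A gap of nearly three orders of magnitude means that, on a linear axis, the smallest bars would be barely visible next to the largest one.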
The most severe weather event type in terms of crop damage is drought, which has caused almost 14 billion dollars in damage over the last half century. Other event types with severe crop damage are floods, ice storms, and hail.