This report uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to analyze the impact of severe weather on both human health and the economy.
The analysis in this report determines which types of severe weather events are most harmful to human health and which have the greatest economic consequences.
The results of the human health impact analysis clearly indicate that tornados are by far the most harmful to human health, causing 96,979 casualties (deaths plus injuries) since 1950. The results of the economic consequences analysis indicate that floods cause the greatest combined damage to property and crops, followed by hurricanes and tornados.
The data used for this report is from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The raw data is a comma-separated values (CSV) file compressed with the bzip2 algorithm to reduce its size. The data is available here:
Storm Data [47Mb]
Documentation of the database is also available. It describes how some of the variables are constructed and defined:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in 1950 and end in November 2011. Records from more recent years are more complete.
## Load required R packages (library() stops with an error if a package is missing)
suppressPackageStartupMessages(library(ggplot2))
suppressPackageStartupMessages(library(reshape2))
The data is read with read.csv() from the CSV file extracted from the .bz2 archive.
## Load raw data
data <- read.csv("./data/repdata-data-StormData.csv")
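Because R's file() connections recognize bzip2 compression when a file is opened for reading, read.csv() can also read the .bz2 download directly, without a separate extraction step. The sketch below illustrates this on a small hypothetical data frame written to a temporary file; the sample values and the names sample_df and bz_path are illustrative assumptions, not part of the NOAA data.

```r
# Hypothetical sample standing in for the NOAA storm data
sample_df <- data.frame(
  EVTYPE     = c("TORNADO", "Flood", "tornado"),
  FATALITIES = c(2, 0, 1),
  INJURIES   = c(10, 3, 4),
  stringsAsFactors = FALSE
)

# Write the sample as a bzip2-compressed CSV in a temporary file
bz_path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(bz_path, open = "w")
write.csv(sample_df, con, row.names = FALSE)
close(con)

# read.csv() opens the path via file(), which detects bzip2 compression
# automatically, so no manual decompression is needed
storm <- read.csv(bz_path, stringsAsFactors = FALSE)
str(storm)
```

Reading the compressed file directly keeps the raw download as the single source of truth, at the cost of decompressing it on every run.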
The event type labels (EVTYPE) mix upper- and lowercase letters. They are converted to uppercase so that labels differing only in case are counted as the same event type and appear consistently in the figures in this report.
## Convert all event types to uppercase
data$EVTYPE <- toupper(data$EVTYPE)
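The effect of this step on grouping can be seen on a handful of labels. The vector below is hypothetical; it only shows that labels differing in case are counted separately until toupper() collapses them.

```r
# Hypothetical event labels that differ only in letter case
evtype <- c("Tornado", "TORNADO", "tornado", "Flood", "FLOOD")

# Before normalization, each spelling is counted as a separate event type
counts_before <- table(evtype)
print(counts_before)

# After toupper(), case variants collapse into a single label
evtype_upper <- toupper(evtype)
counts_after <- table(evtype_upper)
print(counts_after)
```

Note that toupper() only addresses case; labels that differ in spelling or spacing would still be treated as distinct event types.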
The data is then prepared for the human health impact analysis. Fatalities and injuries are summed for each weather event type (EVTYPE), giving death and injury totals for events such as floods and tornados since 1950.
In addition to EVTYPE, the analysis uses the FATALITIES and INJURIES columns, which record the number of deaths and injuries attributed to each weather event.
## Condition raw data for human health impact analysis
## Aggregate (sum) FATALITIES and INJURIES data by EVTYPE. Omit NA data.
d <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data=data, FUN=sum, na.action = na.omit)
## Add a column for total casualties by summing fatalities and injuries.
d$CASUALTIES <- d$FATALITIES + d$INJURIES
## Order descending by CASUALTIES. Only the top 10 will be shown in a bar chart.
p <- d[order(-d$CASUALTIES),]
p$EVTYPE <- factor(p$EVTYPE, levels = p$EVTYPE)
## Reshape the data to long format for ggplot2, omitting the CASUALTIES column.
## Exclude TORNADO and keep only the top 10 remaining events.
mx <- melt(p[p$EVTYPE != "TORNADO", c("EVTYPE", "FATALITIES", "INJURIES")][1:10, ], id.vars = "EVTYPE")
mx$EVTYPE <- factor(mx$EVTYPE, levels = p$EVTYPE)
colnames(mx) <- c("EVTYPE", "Casualty", "Total" )
## Values quoted in the results text
tornado_fatalities <- p[p$EVTYPE == "TORNADO",]$FATALITIES
tornado_injuries <- p[p$EVTYPE == "TORNADO",]$INJURIES
tornado_casualties <- p[p$EVTYPE == "TORNADO",]$CASUALTIES
exheat_casualties <- p[p$EVTYPE == "EXCESSIVE HEAT",]$CASUALTIES
t_exh_diff <- tornado_casualties - exheat_casualties
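The aggregation and ranking steps above can be sketched on a toy data set. The event records below are hypothetical; the point is how aggregate() with a cbind() formula sums several columns per EVTYPE, and how na.action = na.omit treats a missing value.

```r
# Hypothetical per-event records (one row per storm event)
events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 0, 1, 2, NA),
  INJURIES   = c(20, 4, 6, 1, 2),
  stringsAsFactors = FALSE
)

# Sum both columns per event type; na.omit drops the row with a missing value
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                    data = events, FUN = sum, na.action = na.omit)

# Casualties = fatalities + injuries, then rank from most to least harmful
totals$CASUALTIES <- totals$FATALITIES + totals$INJURIES
totals <- totals[order(-totals$CASUALTIES), ]

# Keep the top n event types (n = 2 here purely for illustration)
print(head(totals, 2))
```

Because na.omit drops the entire row before summing, the FLOOD record with a missing FATALITIES value also loses its 2 injuries, so FLOOD ends up with 4 injuries rather than 6. The same applies to the full storm data wherever either column is NA.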
The data is then prepared for the economic impact analysis. Property and crop damage are summed for each weather event type, giving damage totals for weather events since 1950.
In addition to EVTYPE, the analysis uses the PROPDMG and CROPDMG columns, which contain the property and crop damage estimates for each weather event before scaling by the multipliers described next.
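The columns PROPDMGEXP and CROPDMGEXP give a power-of-ten multiplier for the corresponding PROPDMG or CROPDMG amount in each row. They contain the letters B, M, K, and H (in upper- or lowercase), numeric digits, or the symbols "+", "-", and "?", or are blank.
Each value is translated to a power of ten: B, M, K, and H become 9, 6, 3, and 2 (billions, millions, thousands, and hundreds); digits are used as the exponent directly; and blanks, "+", "-", and "?" are treated as 0. Each damage amount is then multiplied by 10 raised to that power to obtain the actual dollar amount.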
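The translation can also be expressed as a single lookup function. The helper below, exp_to_multiplier(), is a hypothetical sketch rather than the report's code (which rewrites the exponent columns in place); it follows the same mapping and leaves any unrecognized code as NA so it can be inspected.

```r
# Hypothetical helper: convert PROPDMGEXP/CROPDMGEXP codes to dollar multipliers.
# Letters map to fixed exponents, digits are used directly as the exponent,
# and blank, "+", "-" and "?" are treated as exponent 0 (multiplier 1).
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  letter_exp <- c(B = 9, M = 6, K = 3, H = 2)
  exponent <- rep(NA_real_, length(code))

  is_letter <- code %in% names(letter_exp)
  exponent[is_letter] <- letter_exp[code[is_letter]]

  is_digit <- grepl("^[0-9]$", code)
  exponent[is_digit] <- as.numeric(code[is_digit])

  exponent[code %in% c("", "+", "-", "?")] <- 0

  # Any other code keeps an NA exponent, so its multiplier is NA
  10^exponent
}

# 25 with "K" is $25,000; 1.5 with "m" is $1,500,000; 3 with "2" is $300; 7 with "?" is $7
amount <- c(25, 1.5, 3, 7)
code   <- c("K", "m", "2", "?")
dollars <- amount * exp_to_multiplier(code)
print(dollars)
```

One helper applied to both PROPDMGEXP and CROPDMGEXP would avoid duplicating the property and crop steps, while keeping the mapping in one place.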
## Condition raw economic data for economic impact analysis
## Trim the dataset down to the five data columns we're interested in.
de <- data[c("EVTYPE", "PROPDMGEXP", "PROPDMG", "CROPDMGEXP", "CROPDMG")]
## Convert property damage exponents to uppercase. Set NA damage amounts to 0. Set all "", "+", "-", "?"
## characters in the PROPDMGEXP column to "0".
de$PROPDMGEXP <- toupper(de$PROPDMGEXP)
de$PROPDMG[which(is.na(de$PROPDMG))] <- 0
de$PROPDMGEXP[de$PROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
## Translate B, M, K, and H in the PROPDMGEXP to 9 (Billions), 6 (Millions), 3 (Thousands),
## 2 (Hundreds) respectively
de$PROPDMGEXP[de$PROPDMGEXP == "B"] <- "9"
de$PROPDMGEXP[de$PROPDMGEXP == "M"] <- "6"
de$PROPDMGEXP[de$PROPDMGEXP == "K"] <- "3"
de$PROPDMGEXP[de$PROPDMGEXP == "H"] <- "2"
## Convert each exponent to a multiplier (10^exponent) and apply it to get the full property damage amount
de$PROPDMGEXP <- 10^(as.numeric(de$PROPDMGEXP))
de$property.damage <- de$PROPDMG * de$PROPDMGEXP
## Convert crop damage exponents to uppercase. Set NA damage amounts to 0. Set all "", "+", "-", "?"
## characters in the CROPDMGEXP column to "0".
de$CROPDMGEXP <- toupper(de$CROPDMGEXP)
de$CROPDMG[which(is.na(de$CROPDMG))] <- 0
de$CROPDMGEXP[de$CROPDMGEXP %in% c("", "+", "-", "?")] <- "0"
## Translate B, M, K, and H in the CROPDMGEXP to 9 (Billions), 6 (Millions), 3 (Thousands),
## 2 (Hundreds) respectively
de$CROPDMGEXP[de$CROPDMGEXP == "B"] <- "9"
de$CROPDMGEXP[de$CROPDMGEXP == "M"] <- "6"
de$CROPDMGEXP[de$CROPDMGEXP == "K"] <- "3"
de$CROPDMGEXP[de$CROPDMGEXP == "H"] <- "2"
## Convert each exponent to a multiplier (10^exponent) and apply it to get the full crop damage amount
de$CROPDMGEXP <- 10^(as.numeric(de$CROPDMGEXP))
de$crop.damage <- de$CROPDMG * de$CROPDMGEXP
## Aggregate (sum) property.damage and crop.damage data by EVTYPE. Omit NA data.
dea <- aggregate(cbind(property.damage, crop.damage) ~ EVTYPE, data=de, FUN=sum, na.action = na.omit)
## Add a column for total economic damage by summing property.damage and crop.damage.
dea$total.damage <- dea$property.damage + dea$crop.damage
## Order descending by total.damage. Only the top 10 will be shown in a bar chart.
dea <- dea[order(-dea$total.damage),]
## Reshape the data to long format for ggplot2. Omit the total.damage column.
## Only use the top 10 events.
dx <- melt(dea[1:10, c("EVTYPE", "property.damage", "crop.damage")], id.vars = "EVTYPE")
dx$EVTYPE <- factor(dx$EVTYPE, levels = dea$EVTYPE)
colnames(dx) <- c("EVTYPE", "Type", "Damage")
## Scale the damage amounts to display in billions.
dx$Damage <- dx$Damage / 1000000000
dx$Type <- factor(dx$Type, levels=c("property.damage", "crop.damage"), labels=c("Property Damage", "Crop Damage"))
flood_prop_damage <- round(dea[dea$EVTYPE == "FLOOD",]$property.damage / 10^9)
flood_damage <- round(dea[dea$EVTYPE == "FLOOD",]$total.damage / 10^9)
drought_crop_damage <- round(dea[dea$EVTYPE == "DROUGHT",]$crop.damage / 10^9)
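The melt() call turns the wide table of per-event totals into the long format that ggplot2 expects, with one row per event type and damage type. For readers without reshape2, the sketch below builds an equivalent table in base R with stack(); the two-row input is hypothetical, and the column names simply mirror those used for dx.

```r
# Hypothetical wide-format totals (one row per event type), in dollars
wide <- data.frame(
  EVTYPE          = c("FLOOD", "DROUGHT"),
  property.damage = c(4e9, 1e9),
  crop.damage     = c(1e9, 3e9),
  stringsAsFactors = FALSE
)

# stack() places the two damage columns one after the other:
# 'values' holds the amounts, 'ind' records which column each came from
long <- stack(wide[c("property.damage", "crop.damage")])

# Repeat the id column once per stacked column so rows stay aligned
long$EVTYPE <- rep(wide$EVTYPE, times = 2)
long <- long[c("EVTYPE", "ind", "values")]
colnames(long) <- c("EVTYPE", "Type", "Damage")

# Scale to billions of dollars, as done for the plot
long$Damage <- long$Damage / 1e9
print(long)
```

The alignment relies on stack() emitting all rows of the first column before the second, which is why EVTYPE is repeated with times = 2 rather than each = 2.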
The following bar chart shows the top 10 severe weather events by total casualties (deaths plus injuries). Tornados are clearly the most harmful to human health, with 96,979 casualties since 1950, including 5,633 deaths and 91,346 injuries. Since 1950, tornados have caused 88,551 more casualties than the second-ranked event type, excessive heat.
print(ggplot(p[1:10,], aes(x=EVTYPE, y=CASUALTIES)) +
geom_bar(stat="identity", fill="#2579B2", colour="black") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Total Casualties due to Weather Events\n for the Period 1950 to November 2011\n(Source NOAA Storm Database)") +
ylab("Number of Casualties") + xlab("Weather Event")
)
The following figure shows the top 10 severe weather events causing death or injury, with tornados omitted so that the remaining events can be compared on a readable scale. It shows that heat has been a significant cause of death over the period from 1950 to November 2011.
print(ggplot(mx, aes(x=EVTYPE, y=Total, fill=Casualty)) +
geom_bar(stat="identity", colour="black") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Total Casualties (Excluding Tornados) due to Weather Events\n for the Period 1950 to November 2011\n(Source NOAA Storm Database)") +
ylab("Number of Casualties") + xlab("Weather Event") +
theme(legend.position = c(0.9, 0.9), legend.background = element_rect()) + scale_fill_brewer(palette="Paired")
)
The following bar chart shows the top 10 severe weather events by combined property and crop damage. Floods have caused the most property damage and the most total damage, about $145 billion and $150 billion respectively. Drought appears to have the greatest economic impact on crops, having caused about $14 billion in crop damage from 1950 to November 2011.
print(ggplot(dx, aes(x=EVTYPE, y=Damage, fill=Type)) +
geom_bar(stat="identity", colour="black") + theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
ggtitle("Total Property and Crop Damage Due to Weather Events\n for the Period 1950 to November 2011\n(Source NOAA Storm Database)") +
ylab("Damage (Billions of Dollars, $)") + xlab("Weather Event") +
theme(legend.position = c(0.9, 0.9), legend.background = element_rect()) + scale_fill_brewer(palette="Paired")
)
The figures and analysis above indicate that tornados have had the greatest impact on human health from 1950 to November 2011, while floods, hurricanes, and tornados have had the greatest economic consequences over the same period.