In this article, we analyze data from the 2007 NOAA Storm Database, which documents the occurrence of U.S. storms and other significant weather phenomena intense enough to cause loss of life, injuries, significant property damage or disruption of commerce. A list of captured events is displayed below.
Collection of this data began in 1950 and continues today. Our interest is in determining which types of weather events have the greatest negative impact on the well-being of the U.S. populace, and which events have the greatest negative impact on the U.S. economy.
We produce charts which show the “top ten” weather events with respect to fatalities, injuries, property damage and crop damage. Our results show that tornados, flooding, thunderstorms and extreme heat and cold cause many injuries and deaths, while drought, flooding, hurricanes and thunderstorms have the largest economic impact.
event_types
## [1] "ASTRONOMICAL LOW TIDE" "AVALANCHE"
## [3] "BLIZZARD" "COASTAL FLOOD"
## [5] "COLD" "DEBRIS FLOW"
## [7] "DENSE FOG" "DENSE SMOKE"
## [9] "DROUGHT" "DUST DEVIL"
## [11] "DUST STORM" "EXCESSIVE HEAT"
## [13] "EXTREME COLD" "FLASH FLOOD"
## [15] "FLOOD" "FROST/FREEZE"
## [17] "FUNNEL CLOUD" "FREEZING FOG"
## [19] "HAIL" "HEAT"
## [21] "HEAVY RAIN" "HEAVY SNOW"
## [23] "HIGH SURF" "HIGH WIND"
## [25] "HURRICANE" "ICE STORM"
## [27] "LAKE-EFFECT SNOW" "LAKESHORE FLOOD"
## [29] "LIGHTNING" "MARINE HAIL"
## [31] "MARINE HIGH WIND" "MARINE STRONG WIND"
## [33] "MARINE THUNDERSTORM WIND" "RIP CURRENT"
## [35] "SEICHE" "SLEET"
## [37] "STORM SURGE/TIDE" "STRONG WIND"
## [39] "THUNDERSTORM WIND" "TORNADO"
## [41] "TROPICAL DEPRESSION" "TROPICAL STORM"
## [43] "TSUNAMI" "VOLCANIC ASH"
## [45] "WATERSPOUT" "WILDFIRE"
## [47] "WINTER STORM" "WINTER WEATHER"
The data is read from a CSV file that was compressed with bzip2, a compression format built on the Burrows-Wheeler transform. The R function read.csv reads this format directly, with no separate decompression step:
df <- read.csv("StormData.csv.bz2", stringsAsFactors = FALSE)
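To illustrate how read.csv handles bzip2 input, the following minimal sketch writes a small data frame to a temporary .bz2 file and reads it back. The data frame, its values and the temporary file are invented for illustration and are not part of the storm data.

```r
# Toy data frame standing in for the storm data (values are made up).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                  FATALITIES = c(3, 1),
                  stringsAsFactors = FALSE)

# Write it through a bzip2-compressing connection.
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the path with file(), which detects the bzip2 header
# and decompresses while reading.
toy_back <- read.csv(path, stringsAsFactors = FALSE)
print(toy_back)
```

The same call works unchanged on an uncompressed CSV, so the compression is transparent to the rest of the analysis.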
We relied on the dplyr, stringr and stringdist packages to help us clean the data. The steps are explained in detail below.
library(dplyr); library(stringr); library(stringdist)
# Keep only events with casualties or damage, standardize the event names,
# convert damage to dollars, and total each measure by event type.
df <- as_tibble(df) %>%
  filter(FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0) %>%
  mutate(UEVENT = str_trim(str_to_upper(gsub("tstm", "thunderstorm", EVTYPE, ignore.case = TRUE))),
         EVENT = event_types[amatch(UEVENT, event_types, method = "jw", p = 0.1, maxDist = 0.2)],
         PROPD = PROPDMG * sapply(str_to_upper(PROPDMGEXP), switch, B = 1e9, M = 1e6, K = 1e3, 0),
         CROPD = CROPDMG * sapply(str_to_upper(CROPDMGEXP), switch, B = 1e9, M = 1e6, K = 1e3, 0)) %>%
  filter(!is.na(EVENT)) %>%
  select(EVENT, FATALITIES, INJURIES, PROPD, CROPD) %>%
  group_by(EVENT) %>%
  summarize(INJURIES = sum(INJURIES), FATALITIES = sum(FATALITIES), PROPD = sum(PROPD), CROPD = sum(CROPD))
Filtering - We filtered out weather events that had no recorded fatalities, injuries, property damage, or crop damage.
Cleaning - The reported weather event names appear to have been typed by hand and include misspellings, transpositions and abbreviations, so we used some string manipulation and the fuzzy string matching algorithms provided in the stringdist package to standardize them.
* We converted the abbreviation tstm to thunderstorm. It occurred thousands of times, and none of the fuzzy string matching algorithms converted it reliably.
* We removed leading and trailing spaces from all of the strings and converted them to uppercase.
* We then used the Jaro-Winkler distance algorithm to find the closest match of each resulting string in the list provided above. Entries with no match within a distance of 0.2 were discarded. For more information on this and other string matching algorithms, [click here](http://www.joyofdata.de/blog/comparison-of-string-distance-algorithms/).
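The cleaning steps can be tried on a few strings in isolation. The sketch below is illustrative only: the raw entries are hypothetical examples of the kind of input found in EVTYPE, only four of the official event names are used, and base R's toupper and trimws stand in for str_to_upper and str_trim. It assumes the stringdist package is installed.

```r
library(stringdist)  # assumed to be installed

# A handful of the official event names from the list above.
official <- c("FLASH FLOOD", "FLOOD", "THUNDERSTORM WIND", "TORNADO")

# Hypothetical raw entries (made up for this example).
raw <- c(" Tstm Wind ", "TORNADOS", "flash flooding", "?")

# Same normalization as in the pipeline: expand "tstm", trim, uppercase.
clean <- toupper(trimws(gsub("tstm", "thunderstorm", raw, ignore.case = TRUE)))

# amatch returns the index of the closest official name, or NA when
# no candidate lies within maxDist.
idx <- amatch(clean, official, method = "jw", p = 0.1, maxDist = 0.2)

result <- data.frame(raw = raw, clean = clean, matched = official[idx],
                     stringsAsFactors = FALSE)
print(result)
```

With these inputs the first three entries resolve to THUNDERSTORM WIND, TORNADO and FLASH FLOOD, while "?" has no sufficiently close candidate and is left as NA, which is why the pipeline drops rows with a missing EVENT.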
Property and crop damage were each provided in two columns: a numeric value and an alphabetic code indicating its magnitude (K for thousands, M for millions, B for billions). For example, 1.55 B indicates $1,550,000,000. We combined each pair into a single column of dollar amounts, treating any unrecognized code as zero.
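As a worked example of this conversion, the sketch below uses a named lookup vector rather than the sapply/switch call in the pipeline; it is meant to give the same result for the K, M and B codes and for unrecognized ones. The amounts and codes are made up for illustration.

```r
# Multipliers for the magnitude codes handled in the pipeline.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical amounts and codes; "" and "?" are unrecognized.
amount <- c(1.55, 25, 500, 10, 3)
code   <- c("B", "k", "M", "", "?")

# Look up each code case-insensitively. Unrecognized codes yield NA,
# which is replaced by 0 so those rows contribute no damage.
mult <- unname(multiplier[toupper(code)])
mult[is.na(mult)] <- 0

dollars <- amount * mult
print(dollars)
```

Here 1.55 B becomes 1,550,000,000, 25 k becomes 25,000 and 500 M becomes 500,000,000, while the two rows with unrecognized codes become 0.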
Next, we kept only the columns used in the analysis in the resulting dataset.
Finally, we grouped the data by weather event and calculated the total injuries, fatalities, property damage and crop damage for each. The table of results is shown below.
# One color per bar; two charts side by side.
c <- rainbow(10)
par(mfcol=c(1,2))
# Top ten event types by total fatalities.
f_df <- df %>% arrange(desc(FATALITIES))
plot_df <- head(f_df, n=10)
barplot(plot_df$FATALITIES, horiz=TRUE, col = c, cex.axis=.5, xlab="FATALITIES", ylab="TOP TEN WEATHER EVENTS")
# Horizontal bars are drawn bottom-up, so reverse the legend to match the bars from top to bottom.
legend("topright", rev(plot_df$EVENT), cex=.7, col=rev(c), pch=19, bty="n")
# Top ten event types by total injuries.
i_df <- df %>% arrange(desc(INJURIES))
plot_df <- head(i_df, n=10)
barplot(plot_df$INJURIES, horiz=TRUE, col = c, cex.axis=.5, xlab="INJURIES", ylab="")
legend("topright", rev(plot_df$EVENT), cex=.7, col=rev(c), pch=19, bty="n")
The charts above show that tornados, flooding, thunderstorms and extreme heat and cold account for most weather-related injuries and fatalities.
# Two charts side by side for economic impact.
par(mfcol=c(1,2))
# Top ten event types by total property damage (in dollars).
p_df <- df %>% arrange(desc(PROPD))
plot_df <- head(p_df, n=10)
barplot(plot_df$PROPD, horiz=TRUE, col = c, cex.axis=.5, xlab="PROPERTY DAMAGE", ylab="TOP TEN WEATHER EVENTS")
legend("topright", rev(plot_df$EVENT), cex=.7, col=rev(c), pch=19, bty="n")
# Top ten event types by total crop damage (in dollars).
c_df <- df %>% arrange(desc(CROPD))
plot_df <- head(c_df, n=10)
barplot(plot_df$CROPD, horiz=TRUE, col = c, cex.axis=.5, xlab="CROP DAMAGE", ylab="")
legend("topright", rev(plot_df$EVENT), cex=.7, col=rev(c), pch=19, bty="n")
The greatest property and crop damage results from drought, flooding, hurricanes and thunderstorms. These events have a large economic impact in the U.S.