Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This analysis involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data analysis must address the following questions:
1. Across the US, which types of events are most harmful with respect to population health?
2. Across the US, which types of events have the greatest economic consequences?
The National Climatic Data Center (NCDC) receives Storm Data from the National Weather Service. The National Weather Service receives its information from a variety of sources, which include but are not limited to: county, state and federal emergency management officials, local law enforcement officials, Skywarn spotters, NWS damage surveys, newspaper clipping services, the insurance industry and the general public.
Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) which documents the occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce. In addition, it is a partial record of other significant meteorological events, such as record maximum or minimum temperatures or precipitation that occurs in connection with another event. Some information appearing in Storm Data may be provided by or gathered from sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies, individuals, etc. An effort is made to use the best available information but because of time and resource constraints, information from these sources may be unverified by the NWS. Therefore, when using information from Storm Data, customers should be cautious as the NWS does not guarantee the accuracy or validity of the information.
The data file was downloaded from the following link:
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
The events in the database start in the year 1950 and end in November 2011.
In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
Loading Storm Data into R:
storm.url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
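download.file(storm.url, "repdata_data_StormData.csv.bz2", mode = "wb")
# read.csv reads the bzip2 file directly; factors are kept so summary() tabulates the levels
Storms <- read.csv("repdata_data_StormData.csv.bz2", header = TRUE, stringsAsFactors = TRUE)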
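R can read a bzip2-compressed CSV without unpacking it first, which is why the archive can be passed to the reader as-is. The following is a minimal sketch of that behavior on a tiny made-up file written to a temporary location; the names `toy.in`, `toy.path` and `toy.back` and all values are hypothetical and are not taken from the NOAA data.

```r
# Toy data frame standing in for the storm data (values are made up).
toy.in <- data.frame(EVTYPE = c("HAIL", "FLOOD"), INJURIES = c(0, 2))

# Write it as a bzip2-compressed CSV in a temporary location.
toy.path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(toy.path, open = "w")
write.csv(toy.in, con, row.names = FALSE)
close(con)

# Reading by file name: the compression is detected and handled on the fly.
toy.back <- read.csv(toy.path, stringsAsFactors = TRUE)
print(toy.back)
```

Setting `stringsAsFactors = TRUE` keeps text columns as factors, so that `summary()` on such a column returns counts per level.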
Loading relevant packages:
library(rmarkdown)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(cowplot)
##
## Attaching package: 'cowplot'
## The following object is masked from 'package:ggplot2':
##
## ggsave
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Warning in doTryCatch(return(expr), name, parentenv, handler): unable to load shared object '/Library/Frameworks/R.framework/Resources/modules//R_X11.so':
## dlopen(/Library/Frameworks/R.framework/Resources/modules//R_X11.so, 6): Library not loaded: /opt/X11/lib/libSM.6.dylib
## Referenced from: /Library/Frameworks/R.framework/Resources/modules//R_X11.so
## Reason: image not found
## Could not load tcltk. Will use slower R code instead.
## Loading required package: RSQLite
## Loading required package: DBI
library(pander)
Let's check how many events of each type the data table contains:
summary(Storms$EVTYPE)
## HAIL TSTM WIND THUNDERSTORM WIND
## 288661 219940 82563
## TORNADO FLASH FLOOD FLOOD
## 60652 54277 25326
## THUNDERSTORM WINDS HIGH WIND LIGHTNING
## 20843 20212 15754
## HEAVY SNOW HEAVY RAIN WINTER STORM
## 15708 11723 11433
## WINTER WEATHER FUNNEL CLOUD MARINE TSTM WIND
## 7026 6839 6175
## MARINE THUNDERSTORM WIND WATERSPOUT STRONG WIND
## 5812 3796 3566
## URBAN/SML STREAM FLD WILDFIRE BLIZZARD
## 3392 2761 2719
## DROUGHT ICE STORM EXCESSIVE HEAT
## 2488 2006 1678
## HIGH WINDS WILD/FOREST FIRE FROST/FREEZE
## 1533 1457 1342
## DENSE FOG WINTER WEATHER/MIX TSTM WIND/HAIL
## 1293 1104 1028
## EXTREME COLD/WIND CHILL HEAT HIGH SURF
## 1002 767 725
## TROPICAL STORM FLASH FLOODING EXTREME COLD
## 690 682 655
## COASTAL FLOOD LAKE-EFFECT SNOW FLOOD/FLASH FLOOD
## 650 636 624
## LANDSLIDE SNOW COLD/WIND CHILL
## 600 587 539
## FOG RIP CURRENT MARINE HAIL
## 538 470 442
## DUST STORM AVALANCHE WIND
## 427 386 340
## RIP CURRENTS STORM SURGE FREEZING RAIN
## 304 261 250
## URBAN FLOOD HEAVY SURF/HIGH SURF EXTREME WINDCHILL
## 249 228 204
## STRONG WINDS DRY MICROBURST ASTRONOMICAL LOW TIDE
## 196 186 174
## HURRICANE RIVER FLOOD LIGHT SNOW
## 174 173 154
## STORM SURGE/TIDE RECORD WARMTH COASTAL FLOODING
## 148 146 143
## DUST DEVIL MARINE HIGH WIND UNSEASONABLY WARM
## 141 135 126
## FLOODING ASTRONOMICAL HIGH TIDE MODERATE SNOWFALL
## 120 103 101
## URBAN FLOODING WINTRY MIX HURRICANE/TYPHOON
## 98 90 88
## FUNNEL CLOUDS HEAVY SURF RECORD HEAT
## 87 84 81
## FREEZE HEAT WAVE COLD
## 74 74 72
## RECORD COLD ICE THUNDERSTORM WINDS HAIL
## 64 61 61
## TROPICAL DEPRESSION SLEET UNSEASONABLY DRY
## 60 59 56
## FROST GUSTY WINDS THUNDERSTORM WINDSS
## 53 53 51
## MARINE STRONG WIND OTHER SMALL HAIL
## 48 48 47
## FUNNEL FREEZING FOG THUNDERSTORM
## 46 45 45
## Temperature record TSTM WIND (G45) Coastal Flooding
## 43 39 38
## WATERSPOUTS MONTHLY PRECIPITATION WINDS
## 37 36 36
## (Other)
## 2940
985 event types are too many to analyze directly, so let's simplify by grouping them into roughly ten broader categories:
Storms$EVTYPE <- as.character(Storms$EVTYPE)
Storms$event <- ""
Storms$event[Storms$EVTYPE %in% c("FLOOD","FLASH FLOOD","FLOOD/FLASH FLOOD",
"URBAN FLOOD","URBAN FLOODING","COASTAL FLOODING","RIVER FLOOD",
"FLOODING","COASTAL FLOOD","Coastal Flooding","FLASH FLOODING",
"FLASH FLOODING/FLOOD", "TIDAL FLOODING", "FLASH FLOOD/FLOOD",
"LAKESHORE FLOOD", "RIVER FLOODING", "URBAN/SMALL STREAM FLOOD",
"FLASH FLOODS", "SMALL STREAM FLOOD", "URBAN/SMALL STREAM FLOODING",
"SMALL STREAM FLOODING", "ICE JAM FLOODING",
"URBAN AND SMALL STREAM FLOODIN", "FLOOD/RAIN/WINDS")] <- "Flood"
Storms$event[Storms$EVTYPE %in% c("RECORD COLD","WINTER WEATHER","EXCESSIVE HEAT",
"WINTER WEATHER/MIX","EXTREME COLD","EXTREME COLD/WIND CHILL","COLD","FREEZE",
"RECORD HEAT","HEAT WAVE","FROST","HEAT","FROST/FREEZE","Temperature record",
"UNSEASONABLY HOT", "Cold", "Record temperature", "UNSEASONABLY COOL",
"PROLONG COLD", "WIND CHILL", "Winter Weather", "EXTREME WINDCHILL TEMPERATURES",
"FREEZING DRIZZLE", "EXTREME HEAT", "UNSEASONABLY COLD", "FREEZING FOG",
"EXTREME/RECORD COLD", "DROUGHT/EXCESSIVE HEAT",
"UNUSUALLY COLD", "HARD FREEZE", "LOW TEMPERATURE", "AGRICULTURAL FREEZE",
"BITTER WIND CHILL TEMPERATURES", "HIGH WINDS/COLD", "RECORD COOL")] <- "Temperature"
Storms$event[Storms$EVTYPE %in% c("TSTM WIND","THUNDERSTORM WIND","STRONG WIND",
"TSTM WIND/HAIL","HIGH WIND", "MARINE TSTM WIND","MARINE THUNDERSTORM WIND",
"STRONG WINDS", "THUNDERSTORM WINDSS", "MARINE STRONG WIND",
"MARINE HIGH WIND","WIND","GUSTY WINDS","TSTM WIND (G45)","WINDS",
"GRADIENT WINDS","TSTM WIND (G40)", "Gusty Winds", "WIND ADVISORY",
"GUSTY WIND", "THUNDERSTORM WINDS/HAIL", "WIND DAMAGE", "HIGH WINDS",
"COLD/WIND CHILL", "Strong Winds", "Wind Damage")] <- "Wind"
Storms$event[Storms$EVTYPE %in% c("HAIL","HEAVY RAIN","HEAVY SNOW","WINTER STORM",
"ICE STORM","BLIZZARD","SMALL HAIL","MARINE HAIL","FREEZING RAIN",
"THUNDERSTORM WINDS HAIL","MODERATE SNOWFALL","LIGHT SNOW","SNOW","ICE",
"LAKE-EFFECT SNOW","SLEET","WINTRY MIX", "HEAVY RAINS/FLOODING",
"FREEZING RAIN/SLEET","FIRST SNOW", "SNOW/SLEET", "SNOW FREEZING RAIN",
"MONTHLY RAINFALL", "BLOWING SNOW", "RAIN", "RECORD RAINFALL",
"BLACK ICE", "HEAVY SNOW-SQUALLS", "Heavy Rain", "SNOW/ICE STORM",
"SNOW SQUALLS", "HAIL 0.75", "SNOW SQUALL", "Light Snow",
"LAKE EFFECT SNOW", "LIGHT FREEZING RAIN", "HEAVY LAKE SNOW",
"EXCESSIVE SNOW", "HEAVY RAINS", "ICY ROADS", "HAIL 75",
"SNOW AND ICE", "EXCESSIVE RAINFALL", "HEAVY SNOW SQUALLS",
"Snow", "HAIL 100", "HAIL 175", "RECORD SNOW", "Freezing Rain",
"NON SEVERE HAIL", "SNOW DROUGHT", "SNOW/BLOWING SNOW",
"SNOW/ICE", "SNOW/SLEET/FREEZING RAIN", "Blowing Snow",
"Black Ice", "SNOW AND SLEET")] <- "Rain/Snow"
Storms$event[Storms$EVTYPE %in% c("TORNADO", "TORNADO F0", "HURRICANE ERIN",
"TORNADO F1")] <- "Tornado"
Storms$event[Storms$EVTYPE %in% c("WATERSPOUT", "WATERSPOUT-", "WATERSPOUTS",
"WATERSPOUT/TORNADO")] <- "Waterspout"
Storms$event[Storms$EVTYPE %in% c("TYPHOON")] <- "Typhoon"
Storms$event[Storms$EVTYPE %in% c("WILDFIRES", "WILD FIRES", "BRUSH FIRE")] <- "Fire"
Storms$event[Storms$EVTYPE %in% c("LIGHTNING","WILDFIRE","WILD/FOREST FIRE")] <- "Fire"
Storms$event[Storms$EVTYPE %in% c("DROUGHT","OTHER","UNSEASONABLY WARM",
"EXTREME WINDCHILL","DUST DEVIL","AVALANCHE","FOG","RIP CURRENT",
"UNSEASONABLY DRY", "LANDSLIDES", "HIGH SEAS", "HEAVY MIX",
"MUDSLIDE", "DRY", "UNUSUAL WARMTH", "MIXED PRECIP", "DENSE SMOKE",
"SMOKE", "Glaze", "UNSEASONABLY WARM AND DRY", "UNSEASONABLY WET",
"SEICHE", "VOLCANIC ASH", "TROPICAL DEPRESSION", "RECORD WARMTH",
"DRY MICROBURST", "FUNNEL CLOUD", "DENSE FOG", "LANDSLIDE",
"DRY WEATHER", "URBAN/SML STREAM FLD", "RIP CURRENTS",
"MONTHLY PRECIPITATION", "MIXED PRECIPITATION", "GLAZE")] <- "Others"
Storms$event[Storms$EVTYPE %in% c("HIGH SURF","HEAVY SURF/HIGH SURF",
"ASTRONOMICAL LOW TIDE","HEAVY SURF","STORM SURGE/TIDE", "High Surf",
"TSUNAMI", "ASTRONOMICAL HIGH TIDE")] <- "Tides"
Storms$event[Storms$EVTYPE %in% c("TROPICAL STORM","DUST STORM","FUNNEL",
"THUNDERSTORM","HURRICANE/TYPHOON", "FUNNEL CLOUDS","HURRICANE",
"COASTAL STORM", "HURRICANE OPAL", "SLEET STORM", "SEVERE THUNDERSTORMS",
"THUNDERSTORM WINDS", "FUNNEL CLOUD", "STORM SURGE",
"THUNDERSTORMS WINDS", "SEVERE THUNDERSTORM", "THUNDERSTORM WINDS",
"THUNDERSTORM WINDS LIGHTNING", "THUNDERSTORMS",
"THUNDERSTORM WIND/ TREES", "THUNDERSTORM WIND G50",
"THUNDERSTORM WIND 60 MPH", "SEVERE THUNDERSTORM WINDS",
"GUSTY THUNDERSTORM WINDS")] <- "Storm"
Storms$event[which(Storms$event=="")] <- "Others"
Storms$event <- as.factor(Storms$event)
summary(Storms$event)
## Fire Flood Others Rain/Snow Storm Temperature
## 19987 82571 12038 335337 29619 14319
## Tides Tornado Typhoon Waterspout Wind
## 1491 60682 11 3851 342391
Now we have just 11 groups of events.
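The explicit lists above give full control over every label, but they are long to maintain. As an alternative, here is a minimal sketch of keyword-based grouping with regular expressions. This is not the method used in this analysis and would not reproduce the counts above exactly (for example, "THUNDERSTORM WINDS" would land in Wind rather than Storm); the `patterns` vector and the `classify_event` helper are hypothetical names introduced only for illustration.

```r
# Hypothetical keyword patterns; when several match, the earliest entry wins.
patterns <- c(
  "Flood"     = "FLOOD",
  "Tornado"   = "TORNADO",
  "Wind"      = "WIND",
  "Rain/Snow" = "HAIL|SNOW|RAIN"
)

classify_event <- function(evtype, patterns) {
  # Normalize case and surrounding whitespace before matching.
  evtype <- toupper(trimws(evtype))
  group <- rep("Others", length(evtype))
  # Apply patterns in reverse so that earlier entries overwrite later ones.
  for (name in rev(names(patterns))) {
    group[grepl(patterns[[name]], evtype)] <- name
  }
  group
}

toy.types <- c("Coastal Flooding", "TSTM WIND", "HAIL 0.75",
               "TORNADO F1", "FLOOD/RAIN/WINDS", "SEICHE")
toy.events <- classify_event(toy.types, patterns)
print(data.frame(EVTYPE = toy.types, event = toy.events))
```

Normalizing case first means that variants such as "Coastal Flooding" and "COASTAL FLOODING" do not need separate entries. The price is less precise control over edge cases, so the resulting groups should still be reviewed by hand.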
Next, let's find the dollar amount for each event. The damage figures use the following magnitude codes:
summary(Storms$PROPDMGEXP)
## - ? + 0 1 2 3 4 5
## 465934 1 8 5 216 25 13 4 4 28
## 6 7 8 B h H K m M
## 4 5 1 40 1 6 424665 7 11330
summary(Storms$CROPDMGEXP)
## ? 0 2 B k K m M
## 618413 7 19 1 9 21 281832 1 1994
Let’s convert them to lower case:
Storms$PROPDMGEXP <- as.character(Storms$PROPDMGEXP)
Storms$CROPDMGEXP <- as.character(Storms$CROPDMGEXP)
Storms$PROPDMGEXP <- tolower(Storms$PROPDMGEXP)
Storms$CROPDMGEXP <- tolower(Storms$CROPDMGEXP)
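Map each code to its numeric multiplier (h = hundred, k = thousand, m = million, b = billion) in the same Storm Data table; any other code keeps a multiplier of 1: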
Storms$Crop.factor <- 1
Storms$Crop.factor[which(Storms$CROPDMGEXP=="h")] <- 100
Storms$Crop.factor[which(Storms$CROPDMGEXP=="k")] <- 1000
Storms$Crop.factor[which(Storms$CROPDMGEXP=="m")] <- 1000000
Storms$Crop.factor[which(Storms$CROPDMGEXP=="b")] <- 1000000000
Storms$Prop.factor <- 1
Storms$Prop.factor[which(Storms$PROPDMGEXP=="h")] <- 100
Storms$Prop.factor[which(Storms$PROPDMGEXP=="k")] <- 1000
Storms$Prop.factor[which(Storms$PROPDMGEXP=="m")] <- 1000000
Storms$Prop.factor[which(Storms$PROPDMGEXP=="b")] <- 1000000000
and create a new column with the total damage (property plus crop) in millions of dollars:
Storms$Damage <- (Storms$PROPDMG*Storms$Prop.factor+
Storms$CROPDMG*Storms$Crop.factor)/1000000 # in millions of USD
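The same code-to-multiplier mapping can also be written as a single named lookup vector, which avoids repeating one assignment per code and per column. The sketch below uses made-up damage figures; `exp.lookup`, `to_multiplier` and `toy.dmg` are hypothetical names, and the rule that unrecognized codes map to 1 simply mirrors the default used above.

```r
# Multipliers for the recognized magnitude codes.
exp.lookup <- c(h = 1e2, k = 1e3, m = 1e6, b = 1e9)

to_multiplier <- function(code) {
  # Look each code up by name; unknown or empty codes come back as NA.
  mult <- unname(exp.lookup[tolower(code)])
  # Mirror the default above: anything unrecognized gets a multiplier of 1.
  mult[is.na(mult)] <- 1
  mult
}

# Made-up example rows.
toy.dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 3),
  PROPDMGEXP = c("K", "M", "", "5")
)
toy.dmg$Damage <- toy.dmg$PROPDMG * to_multiplier(toy.dmg$PROPDMGEXP) / 1e6  # millions of USD
print(toy.dmg)
```

Because `to_multiplier` lower-cases its input, it can be applied to the property and crop code columns alike without a separate `tolower` step.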
We are now ready to build the summary table, which totals the injuries, fatalities, and damage for each event group:
Storm.Groups = sqldf("SELECT event,
sum(INJURIES) as inj,
sum(Damage) as prop,
sum(FATALITIES) as fat,
COUNT(*) AS 'freq'
FROM Storms GROUP BY 1")
Storm.Groups$event <- as.character(Storm.Groups$event)
Now compute the average injuries, losses, and fatalities per event:
Storm.Groups$inj.per.event <- Storm.Groups$inj/Storm.Groups$freq
Storm.Groups$prpDMG.per.event <- Storm.Groups$prop/Storm.Groups$freq
Storm.Groups$fat.per.event <- Storm.Groups$fat/Storm.Groups$freq
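Since dplyr is already loaded, the same aggregation and per-event averages could be expressed as one dplyr pipeline instead of an SQL query followed by separate assignments. The sketch below runs on a small made-up data frame (`toy.storms`, with hypothetical values) and keeps the column names used above; it illustrates an equivalent approach and is not the code that produced the results in this report.

```r
library(dplyr)

# Made-up records: one row per event, already labeled with an event group.
toy.storms <- data.frame(
  event      = c("Flood", "Flood", "Tornado", "Wind", "Wind", "Wind"),
  INJURIES   = c(1, 3, 10, 0, 0, 3),
  FATALITIES = c(0, 2, 4, 0, 0, 0),
  Damage     = c(0.5, 1.5, 6, 0.1, 0.1, 0.1)
)

toy.summary <- toy.storms %>%
  group_by(event) %>%
  # Totals per group, plus the number of events in the group.
  summarise(
    inj  = sum(INJURIES),
    prop = sum(Damage),
    fat  = sum(FATALITIES),
    freq = n()
  ) %>%
  # Averages per event.
  mutate(
    inj.per.event    = inj / freq,
    prpDMG.per.event = prop / freq,
    fat.per.event    = fat / freq
  ) %>%
  arrange(desc(inj.per.event))

print(toy.summary)
```

Totals and per-event averages answer different questions: a frequent event type can dominate the totals while ranking low per event, as the "Wind" rows in the toy data illustrate.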
Then prepare the result table, ranking the event groups from highest to lowest on each per-event measure:
Storm.Ordered <- as.data.frame(x = c(1:11))
Storm.Ordered$Injuries <- arrange(Storm.Groups, desc(inj.per.event))[,1]
Storm.Ordered$Fatalities <- arrange(Storm.Groups, desc(fat.per.event))[,1]
Storm.Ordered$Property.Damaged <- arrange(Storm.Groups, desc(prpDMG.per.event))[,1]
Here is the ranking of the event groups by losses per event:
names(Storm.Ordered) <- c("#", "by Injuries", "by Fatalities", "by Economic Losses")
pander(Storm.Ordered)
| # | by Injuries | by Fatalities | by Economic Losses |
|---|---|---|---|
| 1 | Tornado | Temperature | Typhoon |
| 2 | Temperature | Tides | Storm |
| 3 | Typhoon | Others | Tides |
| 4 | Fire | Tornado | Flood |
| 5 | Tides | Fire | Others |
| 6 | Others | Flood | Tornado |
| 7 | Flood | Storm | Fire |
| 8 | Storm | Wind | Temperature |
| 9 | Wind | Rain/Snow | Rain/Snow |
| 10 | Rain/Snow | Waterspout | Wind |
| 11 | Waterspout | Typhoon | Waterspout |
The lower the position number, the greater the per-event impact on public health or the US economy.
Let’s plot the event groups by human injuries, in total and per event:
a1 <- ggplot(Storm.Groups, aes(x = event, y = inj, fill = event)) +
geom_bar(stat = "identity") +
scale_fill_hue(l=30) +
coord_flip() +
xlab("Event") +
ylab("Number of injuries") +
theme_minimal(base_size = 10) +
guides(fill = "none") +
ggtitle("Events with highest level of injuries in the US")
a2 <- ggplot(Storm.Groups, aes(x = event, y = inj.per.event, fill = event)) +
geom_bar(stat = "identity") +
scale_fill_hue(l=30) +
coord_flip() +
xlab("Event") +
ylab("Injuries per event") +
theme_minimal(base_size = 10) +
guides(fill = "none") +
ggtitle("Injuries per natural event")
plot_grid(a1, a2, ncol = 2, nrow = 1)
a1 <- ggplot(Storm.Groups, aes(x = event, y = fat, fill = event)) +
geom_bar(stat = "identity") +
scale_fill_hue(l=30) +
coord_flip() +
xlab("Event") +
ylab("Number of deaths") +
theme_minimal(base_size = 10) +
guides(fill = "none") +
ggtitle("Events with highest level of deaths in the US")
a2 <- ggplot(Storm.Groups, aes(x = event, y = fat.per.event, fill = event)) +
geom_bar(stat = "identity") +
scale_fill_hue(l=30) +
coord_flip() +
xlab("Event") +
ylab("Deaths per event") +
theme_minimal(base_size = 10) +
guides(fill = "none") +
ggtitle("Deaths per natural event")
plot_grid(a1, a2, ncol = 2, nrow = 1)
a1 <- ggplot(Storm.Groups, aes(x = event, y = prop, fill = event)) +
geom_bar(stat = "identity") +
scale_fill_hue(l=30) +
coord_flip() +
xlab("Event") +
ylab("Losses in mln US dollars") +
theme_minimal(base_size = 10) +
guides(fill = "none") +
ggtitle("Economic losses of natural events in the US")
a2 <- ggplot(Storm.Groups, aes(x = event, y = prpDMG.per.event, fill = event)) +
geom_bar(stat = "identity") +
scale_fill_hue(l=30) +
coord_flip() +
xlab("Event") +
ylab("Losses per event in mln US dollars") +
theme_minimal(base_size = 10) +
guides(fill = "none") +
ggtitle("Economic losses per natural event")
plot_grid(a1, a2, ncol = 2, nrow = 1)
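The six plots above differ only in the plotted column, the axis label, and the title, so the shared layers could be wrapped in a small helper. The following is a minimal sketch under that assumption; `plot_metric` and the toy data frame `toy.plot.data` are hypothetical, and `geom_col()` is used here as the equivalent of `geom_bar(stat = "identity")`.

```r
library(ggplot2)

# Hypothetical helper: horizontal bar chart of one column by event group.
plot_metric <- function(data, metric, y.label, title) {
  # .data[[metric]] selects the column whose name is given as a string.
  ggplot(data, aes(x = event, y = .data[[metric]], fill = event)) +
    geom_col() +
    scale_fill_hue(l = 30) +
    coord_flip() +
    labs(x = "Event", y = y.label, title = title) +
    theme_minimal(base_size = 10) +
    guides(fill = "none")
}

# Made-up summary table with the same column names as Storm.Groups.
toy.plot.data <- data.frame(
  event         = c("Flood", "Tornado", "Wind"),
  inj           = c(4, 10, 3),
  inj.per.event = c(2, 10, 1)
)

p1 <- plot_metric(toy.plot.data, "inj", "Number of injuries",
                  "Events with highest level of injuries")
p2 <- plot_metric(toy.plot.data, "inj.per.event", "Injuries per event",
                  "Injuries per natural event")
```

The two resulting plots can be placed side by side with `plot_grid(p1, p2, ncol = 2, nrow = 1)`, as in the code above.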