The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
Your data analysis must address the following questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
Only the five most destructive event types were used to draw the plots.
The first figure shows the number of injuries and fatalities per event
type. Tornadoes were by far the most life-threatening events, with more
than 90,000 injuries and about 5,600 deaths. Thunderstorm wind ranked
second for injuries, and excessive heat ranked second for fatalities.
Floods, on the other hand, were the costliest events: they caused
approximately 150 billion dollars in property damage and 11 billion
dollars in crop damage. Hurricanes/typhoons caused roughly half as much
property damage (about 85 billion dollars) and 5.5 billion dollars in
crop damage. Drought caused the most crop damage (almost 14 billion
dollars) but comparatively little property damage in that period.
First, the data were loaded using read.csv(). The event types were then cleaned using the Storm Data Event Table from the National Weather Service Storm Data Documentation as a reference. The table lists 48 event types, and the records were classified accordingly. When a record named two event types, the more destructive one was used. Therefore, the rules at the top of the code take priority over those below. The code is shown below. Additionally, a new class (NA/OTHERS) was added for unknown events.
## Read the data
storm <- read.csv("repdata_data_StormData.csv.bz2")
## Modify the data
# Trim surrounding whitespace and convert everything to uppercase
storm$EVTYPE <- trimws(storm$EVTYPE)
storm$EVTYPE <- toupper(storm$EVTYPE)
# Rules nearer the top take priority over those below when assigning an event type
# BLIZZARD
storm$EVTYPE[grep("BLIZZARD", storm$EVTYPE)] <- "BLIZZARD"
# (EXTREME) COLD/WIND
storm$EVTYPE[grepl("COLD|CHILL", storm$EVTYPE) & !grepl("EXTREME", storm$EVTYPE)] <- "COLD/WIND CHILL"
storm$EVTYPE[grepl("LOW", storm$EVTYPE) & grepl("TEMP|RECORD", storm$EVTYPE)] <- "COLD/WIND CHILL"
storm$EVTYPE[grep("COOL", storm$EVTYPE)] <- "COLD/WIND CHILL"
storm$EVTYPE[grepl("COLD|CHILL", storm$EVTYPE) & grepl("EXTREME", storm$EVTYPE)] <- "EXTREME COLD/WIND CHILL"
# LIGHTNING
storm$EVTYPE[grepl("LIGHT", storm$EVTYPE)& !grepl("RAIN", storm$EVTYPE)] <- "LIGHTNING"
storm$EVTYPE[agrepl("LIGHT", storm$EVTYPE)& !grepl("RAIN", storm$EVTYPE)] <- "LIGHTNING"
# TORNADO
storm$EVTYPE[grep("TORNADO|GUSTN", storm$EVTYPE)] <- "TORNADO"
storm$EVTYPE[agrep("TORNADO|GUSTN", storm$EVTYPE)] <- "TORNADO"
# THUNDERSTORM
storm$EVTYPE[grep("TSTM|THUNDER|DOWNBURST|MICROBURST|METRO STORM", storm$EVTYPE)] <- "THUNDERSTORM WIND"
storm$EVTYPE[agrep("TSTM|THUNDER|DOWNBURST", storm$EVTYPE)] <- "THUNDERSTORM WIND"
# FOG
storm$EVTYPE[grepl("FOG|VOG", storm$EVTYPE)& !grepl("FREEZE", storm$EVTYPE)] <- "DENSE FOG"
storm$EVTYPE[agrepl("FOG|VOG", storm$EVTYPE)& !grepl("FREEZE", storm$EVTYPE)] <- "DENSE FOG"
storm$EVTYPE[grepl("FOG", storm$EVTYPE)& grepl("FREEZE", storm$EVTYPE)] <- "FREEZE FOG"
# DENSE SMOKE
storm$EVTYPE[grep("SMOKE", storm$EVTYPE)] <- "DENSE SMOKE"
# (EXCESSIVE) HEAT
storm$EVTYPE[grepl("HEAT|WARM", storm$EVTYPE) & grepl("EXTREME|EXCESSIVE|ABNORM", storm$EVTYPE)] <- "EXCESSIVE HEAT"
storm$EVTYPE[grepl("HEAT|WARM|HOT|HYPERTHERMIA", storm$EVTYPE) & !grepl("EXCESSIVE", storm$EVTYPE)] <- "HEAT"
storm$EVTYPE[grepl("HIGH", storm$EVTYPE) & grepl("TEMP|RECORD|PREC", storm$EVTYPE)] <- "HEAT"
# WATERSPOUT
storm$EVTYPE[grep("SPOUT", storm$EVTYPE)] <- "WATERSPOUT"
# FLOOD
storm$EVTYPE[grep("COASTAL|EROS", storm$EVTYPE)] <- "COASTAL FLOOD"
storm$EVTYPE[grep("FLASH|URBAN", storm$EVTYPE)] <- "FLASH FLOOD"
storm$EVTYPE[grep("LAKE FLOOD", storm$EVTYPE)] <- "LAKESHORE FLOOD"
storm$EVTYPE[grepl("FLOOD|STEAM|STREAM|RISING|FLOYD|DAM|DROWN|\\bWATER\\b", storm$EVTYPE) & !grepl("FLASH|COASTAL|LAKESHORE", storm$EVTYPE)] <- "FLOOD"
storm$EVTYPE[agrepl("FLOOD|STEAM|STREAM|RISING|FLOYD|DAM", storm$EVTYPE) & !grepl("FLASH|COASTAL|LAKESHORE", storm$EVTYPE)] <- "FLOOD"
# HURRICANE/TYPHOON
storm$EVTYPE[grep("HURRICANE|TYPHOON", storm$EVTYPE)] <- "HURRICANE/TYPHOON"
# DROUGHT
storm$EVTYPE[grep("DROUGHT|DRY|DRIEST", storm$EVTYPE)] <- "DROUGHT"
# DUST
storm$EVTYPE[grepl("DUST", storm$EVTYPE) & grepl("STORM", storm$EVTYPE)] <- "DUST STORM"
storm$EVTYPE[grepl("DUST", storm$EVTYPE) & !grepl("STORM", storm$EVTYPE)] <- "DUST DEVIL"
# FROST/FREEZE
storm$EVTYPE[grep("FROST|FREEZ", storm$EVTYPE)] <- "FROST/FREEZE"
# FUNNEL CLOUD
storm$EVTYPE[grep("CLOUD|FUNNEL", storm$EVTYPE)] <- "FUNNEL CLOUD"
# MARINE
storm$EVTYPE[grepl("MARINE", storm$EVTYPE) & !grepl("WIND", storm$EVTYPE)] <- "MARINE HAIL"
# HAIL
storm$EVTYPE[grepl("HAIL", storm$EVTYPE) & !grepl("MARINE", storm$EVTYPE)] <- "HAIL"
# SNOW
storm$EVTYPE[grepl("SNOW", storm$EVTYPE) & grepl("LAKE", storm$EVTYPE)] <- "LAKE-EFFECT SNOW"
storm$EVTYPE[agrepl("SNOW|AVALA", storm$EVTYPE) & !grepl("LAKE", storm$EVTYPE)] <- "HEAVY SNOW"
storm$EVTYPE[grepl("SNOW|AVALA", storm$EVTYPE) & !grepl("LAKE", storm$EVTYPE)] <- "HEAVY SNOW"
# ICE STORM
storm$EVTYPE[grep("ICE|ICY|GLAZ", storm$EVTYPE)] <- "ICE STORM"
# HIGH SURF
storm$EVTYPE[grep("SURF|SEA|WAVE|SWELL", storm$EVTYPE)] <- "HIGH SURF"
# RAIN
storm$EVTYPE[grep("RAIN|WET|SHOWER|PRECIP", storm$EVTYPE)] <- "HEAVY RAIN"
# WIND
storm$EVTYPE[grepl("WIND|TURBULENCE", storm$EVTYPE) & !grepl("STORM|COLD|WINTER|MARINE|STRONG", storm$EVTYPE)] <- "HIGH WIND"
storm$EVTYPE[grepl("WIND", storm$EVTYPE)& !grepl("THUNDERSTORM|COLD|WINTER|MARINE|HIGH", storm$EVTYPE)] <- "STRONG WIND"
# RIP CURRENT
storm$EVTYPE[grep("CURRENT", storm$EVTYPE)] <- "RIP CURRENT"
# SLEET
storm$EVTYPE[grep("SLEET", storm$EVTYPE)] <- "SLEET"
# STORM SURGE/TIDE
storm$EVTYPE[grepl("TIDE|SURGE", storm$EVTYPE)& !grepl("\\bLOW\\b", storm$EVTYPE)] <- "STORM SURGE/TIDE"
# TROPICAL STORM
storm$EVTYPE[grep("TROPICAL STORM", storm$EVTYPE)] <- "TROPICAL STORM"
# VOLCANIC ASH
storm$EVTYPE[grep("VOLCANIC", storm$EVTYPE)] <- "VOLCANIC ASH"
# WILDFIRE
storm$EVTYPE[grep("WILD|FIRE|RED", storm$EVTYPE)] <- "WILDFIRE"
# WINTER
storm$EVTYPE[grepl("WINTER", storm$EVTYPE)& grepl("STORM", storm$EVTYPE)] <- "WINTER STORM"
storm$EVTYPE[grepl("WINTER|WINT|WND|MIX", storm$EVTYPE)& !grepl("STORM", storm$EVTYPE)] <- "WINTER WEATHER"
# UNKNOWN Events
storm$EVTYPE[grep("SUMMARY|\\?|RECORD|ROCK|SLIDE|OTHER|\\bNA\\b|\\bNO\\b", storm$EVTYPE)] <- "NA/OTHERS"
storm$EVTYPE[grep("COUNTY|LAND|SOUTHEAST|^HIGH$|^EXCESSIVE$|MILD|MONTH|NONE", storm$EVTYPE)] <- "NA/OTHERS"
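The effect of rule order can be checked on a small example. The following is a minimal sketch using a hypothetical toy vector `ev` (the values are made up for illustration, not taken from the results): once an entry has been relabelled, later patterns no longer match it, which is how the earlier rules keep their priority.

```r
# Hypothetical raw labels, already trimmed and uppercased
ev <- c("BLIZZARD AND EXTREME WIND CHILL", "TSTM WIND/HAIL", "HAIL 75")

# Rules are applied top to bottom, as in the cleaning code
ev[grepl("BLIZZARD", ev)] <- "BLIZZARD"                  # wins over COLD/CHILL rules
ev[grepl("TSTM|THUNDER", ev)] <- "THUNDERSTORM WIND"     # wins over the HAIL rule
ev[grepl("HAIL", ev) & !grepl("MARINE", ev)] <- "HAIL"   # only "HAIL 75" still matches

print(ev)
```

The second entry mentions both thunderstorm wind and hail, but it is relabelled before the HAIL rule runs, so it stays in the thunderstorm class.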
According to the data, population health is represented by injuries and fatalities. The data were grouped by event type, and the totals of both measures were calculated. The top five event types in each category are shown as bar plots.
# Injuries
inj <- aggregate(INJURIES ~ EVTYPE, data = storm, FUN = sum)
inj<- inj[order(inj$INJURIES, decreasing = TRUE), ]
top.inj <- inj[1:5, ]
# Fatalities
death <- aggregate(FATALITIES ~ EVTYPE, data = storm, FUN = sum)
death <- death[order(death$FATALITIES , decreasing = TRUE), ]
top.dd <- death[1:5, ]
# Draw plots
par(mfrow = c(1, 2))
barplot(top.inj$INJURIES, names.arg = top.inj$EVTYPE,
        main = "Number of Injuries Per Storm Type", xlab = "Event",
        ylab = "Number of Injuries", col = "blue", las = 2)
barplot(top.dd$FATALITIES, names.arg = top.dd$EVTYPE,
        main = "Number of Deaths Per Storm Type", xlab = "Event",
        ylab = "Number of Deaths", ylim = c(0, 6000), col = "blue", las = 2)
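As a quick check of the aggregation step, the sketch below applies the same aggregate/order/subset pattern to a small made-up data frame (`toy`, with hypothetical values). It uses cbind() inside the formula to total injuries and fatalities in one call; this is an alternative to the two separate calls above, not the method used in this analysis.

```r
# Hypothetical records; the values are for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES   = c(10, 2, 5, 7, 1),
  FATALITIES = c(1, 0, 2, 3, 1)
)

# Sum both measures per event type in a single call
health <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by injuries and keep the top two event types
health <- health[order(health$INJURIES, decreasing = TRUE), ]
top2 <- head(health, 2)
print(top2)
```

A combined table like this ranks by one measure at a time, so the separate rankings for injuries and fatalities are still needed when the two top-five lists differ.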
Economic consequences were measured through estimated property and crop damage. The damage values were first converted to a common unit (dollars) using the exponent columns. The data were then grouped by event type, and the total cost per type was calculated. The top five event types in each category are shown as bar plots.
# Property Damage Estimates
# Exponent codes: K = thousands, M = millions, B = billions; any other code is treated as a multiplier of 1
storm$PROPDMGEXP <- toupper(storm$PROPDMGEXP)
storm$PROP <- ifelse(storm$PROPDMGEXP == "K", 1e3,
              ifelse(storm$PROPDMGEXP == "M", 1e6,
              ifelse(storm$PROPDMGEXP == "B", 1e9, 1)))
storm$PROP2 <- storm$PROPDMG * storm$PROP
pde <- aggregate(PROP2 ~ EVTYPE, data = storm, FUN = sum)
pde <- pde[order(pde$PROP2, decreasing = TRUE), ]
top.pde <- pde[1:5, ]
# Crop Damage Estimates
# Same exponent codes as for property damage; any code other than K, M, or B is treated as a multiplier of 1
storm$CROPDMGEXP <- toupper(storm$CROPDMGEXP)
storm$crop <- ifelse(storm$CROPDMGEXP == "K", 1e3,
              ifelse(storm$CROPDMGEXP == "M", 1e6,
              ifelse(storm$CROPDMGEXP == "B", 1e9, 1)))
storm$crop2 <- storm$CROPDMG * storm$crop
cde <- aggregate(crop2 ~ EVTYPE, data = storm, FUN = sum)
cde <- cde[order(cde$crop2, decreasing = TRUE), ]
top.cde <- cde[1:5, ]
# Draw plots
par(mfrow = c(1, 2))
barplot(top.pde$PROP2, names.arg = top.pde$EVTYPE,
        main = "Property Damage Estimates Per Storm Type", xlab = "Event",
        ylab = "Property Damage Estimates (USD)", col = "blue", las = 2)
barplot(top.cde$crop2, names.arg = top.cde$EVTYPE,
        main = "Crop Damage Estimates Per Storm Type", xlab = "Event",
        ylab = "Crop Damage Estimates (USD)", col = "blue", las = 2)
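The unit conversion can also be written with a named lookup vector. This is a minimal sketch, not the code used for the results above; the helper `exp_to_multiplier` and the sample values are hypothetical. As in the nested ifelse() calls, any code other than K, M, or B falls back to a multiplier of 1.

```r
# Hypothetical helper: map exponent codes to numeric multipliers
exp_to_multiplier <- function(exp_code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  # Indexing by name returns NA for codes that are not in the lookup
  mult <- unname(lookup[toupper(as.character(exp_code))])
  mult[is.na(mult)] <- 1   # unrecognised or empty codes: leave the value unscaled
  mult
}

# Made-up damage values and exponent codes for illustration
dmg      <- c(25, 2.5, 1, 10, 3)
exp_code <- c("K", "m", "B", "", "5")

dollars <- dmg * exp_to_multiplier(exp_code)
print(dollars)
```

A lookup vector keeps the code-to-multiplier mapping in one place, which makes it easier to extend if other exponent codes are given a meaning later.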