Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The goal of this assignment is to explore the NOAA storm database and answer some basic questions about severe weather events. The analysis below uses the database to answer the following questions and shows the code for the entire analysis:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The data is downloaded and stored in the project directory. It is then processed to total the fatalities and injuries for each event type, using a data frame to aggregate both fatal and non-fatal injuries. The economic impact is assessed by calculating the value of property and crop damage.
Using the read.csv function, the downloaded data is read into R for processing.
setwd("C:/Training/Data Science/Reproducible Research/RepData_PeerAssessment2")
stormData <- read.csv("repdata_data_storm.csv.bz2")
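The download step itself is not shown above. A minimal sketch of how it could be made repeatable follows, assuming a hypothetical helper named fetch_if_missing (not part of the original analysis) and a placeholder for the archive location, which the report does not state:

```r
# Hypothetical helper (an assumption, not the original author's code):
# download a file only when it is not already present in the project
# directory, so re-running the report does not fetch the archive again.
fetch_if_missing <- function(url, dest) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Example call; storm_url is a placeholder for wherever the archive is hosted.
# storm_url <- "<URL of the storm data archive>"
# fetch_if_missing(storm_url, "repdata_data_storm.csv.bz2")
```

Because read.csv can read the bz2-compressed file directly, as in the call above, the archive does not need to be decompressed separately.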
To find the event types that are most harmful, the numbers of FATALITIES and INJURIES are aggregated by event type.
# FATALITIES and INJURIES are already numeric; using the bare column names
# keeps them as the column names of the aggregated result
Fatalities_Per_Event <- aggregate(FATALITIES ~ EVTYPE, data=stormData, sum)
Injuries_Per_Event <- aggregate(INJURIES ~ EVTYPE, data=stormData, sum)
# Top 10 event types for FATALITIES
Fatalities_Per_Event_Top10 <- head(Fatalities_Per_Event[order(Fatalities_Per_Event$FATALITIES, decreasing = TRUE),],10)
# Top 10 event types for INJURIES
Injuries_Per_Event_Top10 <- head(Injuries_Per_Event[order(Injuries_Per_Event$INJURIES, decreasing = TRUE),],10)
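To illustrate the aggregation step on a small scale, the following sketch sums fatalities and injuries per event type in one data frame. The toy data frame and the name harm are made up for illustration and are not part of the original analysis:

```r
# Toy data standing in for stormData (values are invented for illustration).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 0, 5, 2, 1)
)

# cbind() on the left-hand side of the formula sums both columns in one call.
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, highest first.
harm <- harm[order(harm$FATALITIES, decreasing = TRUE), ]
print(harm)
```

Keeping both totals in a single data frame makes it straightforward to compare the fatality and injury rankings side by side.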
library(ggplot2)
library(gridExtra)
# Bar charts of the top 10 event types, ordered by total count
p1 <- ggplot(data=Fatalities_Per_Event_Top10,
             aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES, fill=FATALITIES)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total number of fatalities") +
  xlab("Event type") +
  theme(legend.position="none")
p2 <- ggplot(data=Injuries_Per_Event_Top10,
             aes(x=reorder(EVTYPE, INJURIES), y=INJURIES, fill=INJURIES)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total number of injuries") +
  xlab("Event type") +
  theme(legend.position="none")
#grid.arrange(p1, p2, top=textGrob("Top deadly weather events in the US (1950-2011)",gp=gpar(fontsize=14,font=3)))
grid.arrange(p1, p2, ncol=2, top = "Top 10 deadly weather events in the US (1950-2011)")
Based on the analysis above, tornadoes cause the most deaths and injuries in the United States. Over the roughly 60 years covered by the data, tornadoes caused more than 5,000 deaths and more than 10,000 injuries.
To analyze the impact of weather events on the economy, the available property damage and crop damage reports and estimates were used.
In the raw data, property damage is represented by two fields: a numeric value PROPDMG and a magnitude code PROPDMGEXP (for example, K for thousands, M for millions, B for billions). Similarly, crop damage is represented by the two fields CROPDMG and CROPDMGEXP. The first step in the analysis is to calculate the property and crop damage in dollars for each event.
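As a small worked illustration of this value-plus-code encoding, the sketch below converts a few made-up rows to dollar amounts. It covers only the letter codes; the names toy, power_of_ten, and damage_usd are assumptions introduced for this example:

```r
# Illustrative rows only; the values are invented.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3),
  PROPDMGEXP = c("K", "M", "B", "h"),
  stringsAsFactors = FALSE
)

# Lookup from letter code to power of ten (letter codes only, for brevity).
power_of_ten <- c(H = 2, K = 3, M = 6, B = 9)

# toupper() lets lower- and upper-case codes share one lookup entry.
toy$damage_usd <- toy$PROPDMG * 10^unname(power_of_ten[toupper(toy$PROPDMGEXP)])
print(toy)
```

For instance, a PROPDMG of 25 with code K corresponds to 25 * 10^3 = 25,000 dollars.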
# Function to translate the exponent code into a power of ten:
exp_transform <- function(e) {
  # h -> hundred, k -> thousand, m -> million, b -> billion
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (!is.na(as.numeric(e))) # if a digit, use it as the power of ten
    return(as.numeric(e))
  else if (e %in% c('', '-', '?', '+')) # blank or symbol: power of ten is 1
    return(1)
  else {
    stop("Invalid exponent value.")
  }
}
prop_dmg_exp <- sapply(stormData$PROPDMGEXP, FUN=exp_transform)
stormData$PROPDMG_incl_exp <- stormData$PROPDMG * (10 ^ prop_dmg_exp)
crop_dmg_exp <- sapply(stormData$CROPDMGEXP, FUN=exp_transform)
stormData$CROPDMG_incl_exp <- stormData$CROPDMG * (10 ^ crop_dmg_exp)
# Aggregate economic loss by event type
PROPDMG_incl_exp_per_evt <- aggregate(PROPDMG_incl_exp ~ EVTYPE, data=stormData, sum)
CROPDMG_incl_exp_per_evt <- aggregate(CROPDMG_incl_exp ~ EVTYPE, data=stormData, sum)
# Top 10 event types for property damage
PROPDMG_Top10 <- head(PROPDMG_incl_exp_per_evt[order(PROPDMG_incl_exp_per_evt$PROPDMG_incl_exp, decreasing = TRUE),],10)
PROPDMG_Top10
##                 EVTYPE PROPDMG_incl_exp
## 153        FLASH FLOOD     6.820237e+13
## 786 THUNDERSTORM WINDS     2.086532e+13
## 834            TORNADO     1.078951e+12
## 244               HAIL     3.157558e+11
## 464          LIGHTNING     1.729433e+11
## 170              FLOOD     1.446577e+11
## 411  HURRICANE/TYPHOON     6.930584e+10
## 185           FLOODING     5.920826e+10
## 670        STORM SURGE     4.332354e+10
## 310         HEAVY SNOW     1.793259e+10
# Top 10 event types for crop damage
CROPDMG_Top10 <- head(CROPDMG_incl_exp_per_evt[order(CROPDMG_incl_exp_per_evt$CROPDMG_incl_exp, decreasing = TRUE),],10)
CROPDMG_Top10
##                EVTYPE CROPDMG_incl_exp
##  95           DROUGHT      13972566000
## 170             FLOOD       5661968450
## 590       RIVER FLOOD       5029459000
## 427         ICE STORM       5022113500
## 244              HAIL       3025974480
## 402         HURRICANE       2741910000
## 411 HURRICANE/TYPHOON       2607872800
## 153       FLASH FLOOD       1421317100
## 140      EXTREME COLD       1292973000
## 212      FROST/FREEZE       1094086000
p3 <- ggplot(data=PROPDMG_Top10,
             aes(x=reorder(EVTYPE, PROPDMG_incl_exp), y=PROPDMG_incl_exp, fill=PROPDMG_incl_exp)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total property damage (USD)") +
  xlab("Event type") +
  theme(legend.position="none")
p4 <- ggplot(data=CROPDMG_Top10,
             aes(x=reorder(EVTYPE, CROPDMG_incl_exp), y=CROPDMG_incl_exp, fill=CROPDMG_incl_exp)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total crop damage (USD)") +
  xlab("Event type") +
  theme(legend.position="none")
grid.arrange(p3, p4, ncol=2, top = "Top 10 weather events that caused the most economic damage in the US (1950-2011)")
Based on the analysis, flash floods and thunderstorm winds cause the largest property damage among weather-related natural disasters. The most severe weather event in terms of crop damage is drought.