Severe weather events can be harmful with respect to population health and can have economic consequences. In this report, we found out which events impact most considering these two aspects (health and economic).
Using the NOAA Storm Database, we checked the number of fatalites, injuries and the total damage caused by the events available in the database.
The event with more fatalities associated with is the TORNADO, followed by EXCESSIVE HEAT, FLASH FLOOD, HEAD, and LIGHTNING.
The event with more injuries associated with is the TORNADO, followed by TSTM WIND, FLOOD, EXCESSIVE HEAT, and LIGHTNING
The event with the greatest economic consequences is FLOOD, followed by HURRICANY/TYPHOON, TORNADO, STORM SURGE, and HAIL.
First we download the data to the workspace directory and open it using read.csv function. The download is done only if the file does not exists. The read.csv function read the data in the zipped file. Since it can take some time, the following code is cached, so it is only readed once. So, from now on, we will not update the variable original_data.
# Check if file exists in the working directory
local_file <- "StormData.csv.bz2"
fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
if(!file.exists(local_file)){
download.file(fileURL, local_file, method="curl")
}
original_data <- read.csv(bzfile(local_file), sep=",", header=T)
We can check the structure of original_data:
str(original_data)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
As we can observe, there are some variables that will help us in this analysis:
Let’s create a new table with only those variables. To make it easier to analyse the total damage, we will consider the total damage as the sum of property damage and crop damage. This operation can be cached:
prop_dmg <- original_data[, "PROPDMG"]
prop_dmg <- prop_dmg * ((original_data[, "PROPDMGEXP"] == "H") * 1e2) +
prop_dmg * ((original_data[, "PROPDMGEXP"] == "K") * 1e3) +
prop_dmg * ((original_data[, "PROPDMGEXP"] == "M") * 1e6) +
prop_dmg * ((original_data[, "PROPDMGEXP"] == "B") * 1e9)
crop_dmg <- original_data[, "CROPDMG"]
crop_dmg <- crop_dmg * ((original_data[, "CROPDMGEXP"] == "H") * 1e2) +
crop_dmg * ((original_data[, "CROPDMGEXP"] == "K") * 1e3) +
crop_dmg * ((original_data[, "CROPDMGEXP"] == "M") * 1e6) +
crop_dmg * ((original_data[, "CROPDMGEXP"] == "B") * 1e9)
# Get the total damage scalled to billion
total_dmg <- (prop_dmg + crop_dmg)/1e9
subset_original <- original_data[, c("EVTYPE", "FATALITIES", "INJURIES")]
subset_original <- cbind(subset_original, TOTALDMG = total_dmg)
Is there any NA in this subset?
sum(is.na(subset_original))
## [1] 0
So, there are no NA in this subset.
Finally, we intend to know about the effects of these events in the economics and in the population health. So, we need to sum the fatalities, injuries and the total damage for all the events. This can be achieved using the aggregate function:
result <- aggregate(cbind(FATALITIES, INJURIES, TOTALDMG)~EVTYPE,
data=subset_original, sum)
Now we can use the variable result to find the events that impacts more in the population health and with the greatest economic consequences. So, let’s sort the variable result in the
print(paste(
result[which.max(result$FATALITIES),"EVTYPE"],
"is the event that causes more fatalities (",
max(result$FATALITIES),
") fatalities"))
## [1] "TORNADO is the event that causes more fatalities ( 5633 ) fatalities"
print(paste(
result[which.max(result$INJURIES),"EVTYPE"],
"is the event that causes more injuries (",
max(result$INJURIES),
") injuries"))
## [1] "TORNADO is the event that causes more injuries ( 91346 ) injuries"
print(paste(
result[which.max(result$TOTALDMG),"EVTYPE"],
"is the event with the greatest economic consequences (",
max(result$TOTALDMG)/1e9,
"billion dollars )"))
## [1] "FLOOD is the event with the greatest economic consequences ( 1.5031967825e-07 billion dollars )"
We can also check the top 5 most important events in these aspects (population health and economic consequences).
First, the top 5 events with more fatalities:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.4.1
# Sort FATALITIES in decreasing order
result <- result[with(result, order(-FATALITIES)), ]
ggplot(result[1:5,], aes(x=reorder(EVTYPE, -FATALITIES), y=FATALITIES)) +
labs(x="Event", y="Fatalities") +
ggtitle("Top 5 events that causes more fatalities") +
theme(panel.background = element_blank(),
panel.grid.major = element_line(colour="grey"),
axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_bar(stat="identity", fill=rgb(56,146,208, maxColorValue = 255))
Second, let’s see the top 5 events that cause more injuries:
# Sort INJURIES in decreasing order
result <- result[with(result, order(-INJURIES)), ]
ggplot(result[1:5,], aes(x=reorder(EVTYPE, -INJURIES), y=INJURIES)) +
labs(x="Event", y="Injuries") +
ggtitle("Top 5 events that causes more injuries") +
theme(panel.background = element_blank(),
panel.grid.major = element_line(colour="grey"),
axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_bar(stat="identity", fill=rgb(56,146,208, maxColorValue = 255))
Finally, let’s check the top 5 events with more economic consequences:
# Sort TOTALDMG in decreasing order
result <- result[with(result, order(-TOTALDMG)), ]
ggplot(result[1:5,], aes(x=reorder(EVTYPE, -TOTALDMG), y=TOTALDMG)) +
labs(x="Event", y="Total damage (billion dollars)") +
ggtitle("Top 5 events with more economic consequences") +
theme(panel.background = element_blank(),
panel.grid.major = element_line(colour="grey"),
axis.text.x = element_text(angle = 45, hjust = 1)) +
geom_bar(stat="identity", fill=rgb(56,146,208, maxColorValue = 255))