The present work studies the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The events considered span the years 1950 to 2011.
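We aim to answer the following questions: (1) Across the United States, which types of events are most harmful concerning population health? (2) Across the United States, which types of events have the most significant economic consequences?
To answer those questions, we consider only the following columns in the data set: EVTYPE, FATALITIES, INJURIES, and PROPDMG.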
With that information, we proceed as follows. First, we split the fatalities by event type and computed their sums. Next, we did the same for the injuries. Then, we found the event type that caused the most fatalities and the most injuries. We also made two plots showing those results.
After that, we split the property damage values (PROPDMG) by event type and computed their sums. Then, we found the event type with the highest total damage. Finally, we made a plot showing those results.
The data set used can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, and its documentation is available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.
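A minimal sketch of fetching the archive reproducibly, assuming the URL above is still reachable. The helper name `fetch_storm_data` and the local file name `StormData.csv.bz2` are illustrative choices, not part of the original analysis. Note that `read.csv` can read a bzip2-compressed file directly, so a separate decompression step is optional.

```r
# Hypothetical helper: download the compressed data set once and reuse it.
fetch_storm_data <- function(url, dest = "StormData.csv.bz2") {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary archive intact on Windows
    download.file(url, destfile = dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (not run here, since the archive is large):
# path <- fetch_storm_data("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2")
# df_all <- read.csv(path)   # reads the .bz2 file without unpacking it first
```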
First, we create a directory where the generated figures will be saved:
if ( !dir.exists("figure") ) dir.create("figure")
The file repdata_data_StormData.csv contains 37 columns; as mentioned above, not all of them are needed. We can load only the necessary columns as follows:
df <- read.csv("repdata_data_StormData.csv",
               colClasses = c(rep("NULL", 7),     # columns 1-7: skipped
                              "character",        # column 8: EVTYPE
                              rep("NULL", 14),    # columns 9-22: skipped
                              rep("numeric", 3),  # columns 23-25: FATALITIES, INJURIES, PROPDMG
                              rep("NULL", 12)))   # columns 26-37: skipped
We can verify that the correct columns have been loaded
names(df)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG"
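The positional `colClasses` vector depends on the exact column order of the file. As an alternative, here is a minimal sketch that builds the same kind of vector from column names. The helper `col_classes_by_name` and the toy file are hypothetical and only illustrate the idea; they are not part of the original analysis.

```r
# Hypothetical helper: build a colClasses vector from column names
# instead of counting positions. Columns not listed in `wanted` are
# set to "NULL", which tells read.csv to skip them.
col_classes_by_name <- function(path, wanted) {
  header <- names(read.csv(path, nrows = 1))  # read one row just to get the header
  stopifnot(all(names(wanted) %in% header))   # fail early on a misspelled column
  classes <- rep("NULL", length(header))
  classes[match(names(wanted), header)] <- unname(wanted)
  classes
}

# Toy file standing in for the storm data (made-up values)
toy_path <- tempfile(fileext = ".csv")
writeLines(c("STATE,EVTYPE,FATALITIES,REMARKS",
             "AL,TORNADO,0,none",
             "TX,HAIL,1,none"), toy_path)

wanted <- c(EVTYPE = "character", FATALITIES = "numeric")
toy_df <- read.csv(toy_path, colClasses = col_classes_by_name(toy_path, wanted))
names(toy_df)
```

For the real file, `wanted` would list EVTYPE, FATALITIES, INJURIES, and PROPDMG with the classes used above.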
Also, we can see that there are no missing values in our data set
sapply(df, function(x) sum(is.na(x)))
## EVTYPE FATALITIES INJURIES PROPDMG
## 0 0 0 0
Hence, we can proceed with our analysis.
First, we can split the fatalities/injuries by events, like this
fatalities_by_type <- split(df$FATALITIES, df$EVTYPE)
injuries_by_type <- split(df$INJURIES, df$EVTYPE)
And now, we can calculate the sums across all the events, just like this
s_f <- sapply(fatalities_by_type, sum)
s_i <- sapply(injuries_by_type, sum)
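The split-then-sum pattern above can also be written as a single `tapply` call. The sketch below compares both forms on a toy data frame with made-up values; only the column names are taken from the storm data.

```r
# Toy data with made-up values, using the same column names as the storm data
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(3, 0, 2, 1, 0)
)

# split() followed by sapply(sum), as in the analysis
by_split <- sapply(split(toy$FATALITIES, toy$EVTYPE), sum)

# tapply() groups and sums in one call
by_tapply <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

# Both give the same per-type totals; pick the largest one
by_tapply[which.max(by_tapply)]
```

Either form yields a vector of totals named by event type, so the rest of the analysis works unchanged.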
Now, we are ready to answer the first question: Across the United States, which types of events are most harmful concerning population health? These are the most harmful events concerning population health in terms of fatalities and injuries
s_f[which.max(s_f)]
## TORNADO
## 5633
s_i[which.max(s_i)]
## TORNADO
## 91346
Although the totals differ, the event type is the same: tornado. The following plots show how harmful this event type is in comparison with the rest
plot(s_f, main="Fatalities by event type", xaxt="n", xlab = "Event type", ylab="")
points(which.max(s_f), s_f[which.max(s_f)], pch=19, col="red")
legend("topleft", legend="Tornado", fill = "red")
dev.copy(png, "figure/fatalities.png")
## png
## 3
dev.off()
## png
## 2
plot(s_i, main="Injuries by event type", xaxt="n", xlab = "Event type", ylab="")
points(which.max(s_i), s_i[which.max(s_i)], pch=19, col="red")
legend("topleft", legend="Tornado", fill = "red")
dev.copy(png, "figure/injuries.png")
## png
## 3
dev.off()
## png
## 2
Notice that both figures have been saved in the figure directory.
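Because the plots above suppress the x axis, the individual event types cannot be read from them. One possible complement, sketched here under that assumption, is a bar chart of the few largest totals written directly to a file device. The helper `save_top_barplot` and the output file name are hypothetical and not part of the original analysis.

```r
# Hypothetical helper: draw the n largest totals as a horizontal bar chart
# and write it straight to a file through a graphics device.
save_top_barplot <- function(totals, file, n = 10, main = "", device = png) {
  top <- head(sort(totals, decreasing = TRUE), n)  # n largest totals, descending
  device(file)                  # open the file device (png by default)
  on.exit(dev.off())            # close the device even if plotting fails
  par(mar = c(4, 10, 3, 1))     # wide left margin for the event type labels
  barplot(rev(top), horiz = TRUE, las = 1, main = main)  # largest bar on top
  invisible(top)
}

# Usage with the totals computed above (not run here):
# save_top_barplot(s_f, "figure/fatalities_top10.png", main = "Fatalities, top 10 event types")
```

Opening the device before plotting also avoids the extra `dev.copy` step, at the cost of not showing the plot on screen.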
To answer the second question, we can conduct a similar analysis. First, we can split the damage cost by event and calculate the sums across all the events like this
damage_by_type <- split(df$PROPDMG, df$EVTYPE)
s_d <- sapply(damage_by_type, sum)
Now, we are prepared to answer the second question: Across the United States, which types of events have the most significant economic consequences? This is the event type with the largest total property damage
s_d[which.max(s_d)]
## TORNADO
## 3212258
Once again, the answer is tornado. The following plot shows how costly this event type is in comparison with the rest
par(mfrow=c(1,1))
plot(s_d, main="Damage by event type", xaxt="n", xlab = "Event type", ylab="Dollars")
points(which.max(s_d), s_d[which.max(s_d)], pch=19, col="red")
legend("topleft", legend="Tornado", fill = "red")
dev.copy(png, "figure/cost.png")
## png
## 3
dev.off()
## png
## 2
Finally, notice that we have also saved the latter plot in the figure directory.
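One point to keep in mind when reading the damage totals: in the full data set, PROPDMG is accompanied by a PROPDMGEXP column holding a magnitude code (the documentation describes K for thousands, M for millions, and B for billions), and that column was not loaded here. The following is a minimal sketch of scaling the values before summing, run on made-up toy rows. Treating any other code as a multiplier of 1 is an assumption made for illustration only.

```r
# Toy rows with made-up values; PROPDMGEXP was not loaded in the analysis above
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 500, 10),
  PROPDMGEXP = c("K", "B", "K", "")
)

# Multipliers for the documented codes
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
scale <- unname(multiplier[toupper(toy$PROPDMGEXP)])
scale[is.na(scale)] <- 1   # assumption: unknown or empty codes leave the value unscaled

# Damage in dollars, then totals per event type
toy$damage_usd <- toy$PROPDMG * scale
s_usd <- tapply(toy$damage_usd, toy$EVTYPE, sum)
s_usd[which.max(s_usd)]
```

With the real data, this would require loading PROPDMGEXP alongside PROPDMG, and the ranking of event types may differ from the one based on unscaled sums.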