Synopsis

The present work studies the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The events considered span the years 1950 to 2011.

We aim to answer the following questions:

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the most significant economic consequences?

To answer those questions, we consider only the following columns in the data set:

  1. EVTYPE
  2. FATALITIES
  3. INJURIES
  4. PROPDMG

With that information, we proceeded as follows. First, we split the fatalities by event type and computed their sums. Next, we split the injuries by event type and computed their sums. Then, we identified the event type that caused the most fatalities and the most injuries. We also made two plots showing those results.

After that, we split the property damage values (PROPDMG) by event type and computed their sums. Then, we identified the event type with the largest total damage. Finally, we made a plot showing those results.

The data set used can be downloaded from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2, and its documentation is available at https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf.
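The download step can be scripted. The following is a minimal sketch, not part of the original analysis: the helper name `fetch_storm_data` and the default destination file name are assumptions made for illustration. It only downloads when the file is not already present.

```r
# URL taken from the text above.
data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Hypothetical helper: download the compressed file once and return its path.
# The default file name "StormData.csv.bz2" is an assumption.
fetch_storm_data <- function(dest = "StormData.csv.bz2", url = data_url) {
  if (!file.exists(dest)) {
    # mode = "wb" keeps the binary archive intact on every platform
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}
```

Calling `fetch_storm_data()` would return the local path. Note that `read.csv` can read a bzip2-compressed file directly, so decompressing it by hand is optional.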

Data Processing

First, we can create a directory where the generated figures are going to be saved:

if ( !dir.exists("figure") )  dir.create("figure")

The file repdata_data_StormData.csv contains 37 columns; as mentioned above, not all of them are needed. We can load only the necessary columns as follows:

# Skip every column ("NULL") except EVTYPE (8), FATALITIES (23), INJURIES (24) and PROPDMG (25)
col_classes <- rep("NULL", 37)
col_classes[8] <- "character"
col_classes[23:25] <- "numeric"
df <- read.csv("repdata_data_StormData.csv", colClasses = col_classes)
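Selecting columns by position depends on the column order of the file. As an alternative, here is a minimal sketch that builds `colClasses` from the column names. It runs on a small toy CSV written to a temporary file; the toy rows and the names `wanted` and `toy` are hypothetical and only serve the illustration.

```r
# Toy CSV standing in for the storm file (hypothetical rows, for illustration only).
tmp <- tempfile(fileext = ".csv")
writeLines(c("STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,REMARKS",
             "AL,TORNADO,0,15,25,first",
             "AL,HAIL,0,0,2.5,second"), tmp)

# Read a single row just to learn the column names.
header <- names(read.csv(tmp, nrows = 1))

# Columns to keep, with the class each should be read as.
wanted <- c(EVTYPE = "character", FATALITIES = "numeric",
            INJURIES = "numeric", PROPDMG = "numeric")

# Default every column to "NULL" (skipped), then switch on the wanted ones by name.
col_classes <- setNames(rep("NULL", length(header)), header)
col_classes[names(wanted)] <- wanted

# unname() so the classes are applied strictly by position, in header order.
toy <- read.csv(tmp, colClasses = unname(col_classes))
names(toy)
```

The same pattern should carry over to the full file by replacing `tmp` with its path, at the cost of reading the header row twice.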

We can verify that the correct columns have been loaded:

names(df)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"

We can also check that there are no missing values in our data set:

sapply(df, function(x) sum(is.na(x)))
##     EVTYPE FATALITIES   INJURIES    PROPDMG 
##          0          0          0          0

Hence, we can proceed with our analysis.

Results

First, we split the fatalities and the injuries by event type:

fatalities_by_type <- split(df$FATALITIES, df$EVTYPE)
injuries_by_type <- split(df$INJURIES, df$EVTYPE)

Next, we compute the total for each event type:

s_f <- sapply(fatalities_by_type, sum)
s_i <- sapply(injuries_by_type, sum)
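The split-then-sum pattern above can also be written as a single grouped call. The sketch below uses a small toy data frame (hypothetical values, not taken from the storm data) to show that `tapply` yields the same per-type totals as `split` followed by `sapply`.

```r
# Toy data (hypothetical values) with the same column names as the analysis.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 0, 3, 1),
  stringsAsFactors = FALSE
)

# Two-step version, as in the text: split into groups, then sum each group.
by_split <- sapply(split(toy$FATALITIES, toy$EVTYPE), sum)

# One-step version: tapply groups and applies sum in a single call.
by_tapply <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

by_split
by_tapply
```

Both results are ordered alphabetically by event type, so they can be compared element by element.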

We are now ready to answer the first question: across the United States, which types of events are most harmful with respect to population health? The event types with the highest total fatalities and the highest total injuries are:

s_f[which.max(s_f)]
## TORNADO 
##    5633
s_i[which.max(s_i)]
## TORNADO 
##   91346
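`which.max` returns only the single largest entry. To see how the runners-up compare, the totals can be sorted instead. This is a minimal sketch on a toy named vector; the names and values are hypothetical placeholders, not results from the storm data.

```r
# Toy totals (hypothetical): a named vector shaped like s_f or s_i.
s_toy <- c(A = 3, B = 10, C = 7, D = 1)

# Sort from largest to smallest and keep the first three entries.
top3 <- head(sort(s_toy, decreasing = TRUE), 3)
top3
```

Applied to `s_f` or `s_i`, the same call would list the leading event types with their totals.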

Although the totals differ (5,633 fatalities versus 91,346 injuries), the event type is the same: tornado. The following plots show how harmful this event is compared with the rest:

plot(s_f, main="Fatalities by event type", xaxt="n", xlab = "Event type", ylab="")
points(which.max(s_f), s_f[which.max(s_f)], pch=19, col="red")
legend("topleft", legend="Tornado", fill = "red")

dev.copy(png, "figure/fatalities.png")
## png 
##   3
dev.off()
## png 
##   2
plot(s_i, main="Injuries by event type", xaxt="n", xlab = "Event type", ylab="")
points(which.max(s_i), s_i[which.max(s_i)], pch=19, col="red")
legend("topleft", legend="Tornado", fill = "red")

dev.copy(png, "figure/injuries.png")
## png 
##   3
dev.off()
## png 
##   2

Notice that both figures have been saved in the figure directory.
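`dev.copy` duplicates whatever is currently drawn on the active device. When a script runs without a screen device, an alternative is to open the `png` device first and draw straight into the file. The sketch below illustrates that variant; the output location (a temporary directory) and the toy values are assumptions made so that the example is self-contained.

```r
# Write into a temporary "figure" directory (assumed location, for illustration only).
out_dir <- file.path(tempdir(), "figure")
if (!dir.exists(out_dir)) dir.create(out_dir)
out_file <- file.path(out_dir, "toy.png")

# Open the file device, draw, then close the device so the file is flushed to disk.
png(out_file, width = 480, height = 480)
plot(c(1, 5, 2), main = "Toy plot", xlab = "Index", ylab = "Value")
dev.off()

file.exists(out_file)
```

With this approach the plot is not shown on screen; it is written only to the file.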

To answer the second question, we can conduct a similar analysis. First, we split the property damage values by event type and compute the total for each type:

damage_by_type <- split(df$PROPDMG, df$EVTYPE)
s_d <- sapply(damage_by_type, sum)

We are now prepared to answer the second question: across the United States, which types of events have the most significant economic consequences? The event type with the largest total recorded property damage is:

s_d[which.max(s_d)]
## TORNADO 
## 3212258

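Once again, tornado is the event type with the largest total. Note that PROPDMG holds only the numeric part of each damage estimate; its magnitude (K for thousands, M for millions, B for billions) is recorded in the separate PROPDMGEXP column, which was not loaded here. The sum above is therefore in unscaled units rather than dollars, and the ranking should be read with that caveat. The following plot shows how this event compares with the rest: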

par(mfrow=c(1,1))
plot(s_d, main="Damage by event type", xaxt="n", xlab = "Event type", ylab="Sum of PROPDMG (unscaled)")
points(which.max(s_d), s_d[which.max(s_d)], pch=19, col="red")
legend("topleft", legend="Tornado", fill = "red")

dev.copy(png, "figure/cost.png")
## png 
##   3
dev.off()
## png 
##   2

Finally, notice that this plot has also been saved in the figure directory.
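One refinement concerns units. In the NOAA data set, PROPDMG stores the numeric part of each damage estimate, and a companion column, PROPDMGEXP, stores its magnitude ("K" for thousands, "M" for millions, "B" for billions). That column was not loaded in the analysis above. The sketch below shows, on a toy data frame with hypothetical values, how the two columns could be combined into dollar amounts before summing. Treating any other code as a multiplier of 1 is an assumption of this sketch, not a rule from the documentation.

```r
# Toy data (hypothetical values); PROPDMGEXP is assumed to be loaded as character.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(25, 1.5, 2, 500),
  PROPDMGEXP = c("K", "M", "M", ""),
  stringsAsFactors = FALSE
)

# Map the documented magnitude codes to numeric multipliers.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)
scale <- unname(multiplier[toupper(toy$PROPDMGEXP)])

# Assumption: empty or unrecognised codes leave the value unscaled.
scale[is.na(scale)] <- 1

# Damage in dollars, then totals per event type.
toy$damage_usd <- toy$PROPDMG * scale
s_usd <- tapply(toy$damage_usd, toy$EVTYPE, sum)
s_usd
```

Because the multipliers span six orders of magnitude, a ranking based on scaled totals could differ from one based on the raw PROPDMG sums.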