Synopsis

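This analysis uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm data to answer two questions: which types of events are most harmful to population health, and which have the greatest economic consequences?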

The analysis showed that tornadoes are the most harmful to population health, with 91,346 injuries and 5,633 deaths.

The events with the greatest economic consequences are floods, with 150 billion dollars in damages.

Read the full analysis to see the complete report.

Loading the necessary libraries

library(ggplot2)

Data Processing

setwd("C:/Users/Reinaldo/Desktop/coursera-JHU/reproducibleresearch/week4")

## read the full file, then keep only the columns needed for the analysis
df <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE, strip.white=TRUE, header=TRUE)[,c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
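The call above parses every column and then discards most of them. As a possible alternative, the columns can be skipped while reading by passing "NULL" entries in colClasses. The sketch below uses a small temporary CSV with made-up columns (toy, tmp and keep are illustrative names, not part of the original analysis), so it only shows the idiom and does not reproduce the storm data import.

```r
# Toy CSV standing in for the storm file (columns and values are illustrative)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     STATE = c("AL", "TX"),
                     FATALITIES = c(1, 0)),
          tmp, row.names = FALSE)

keep <- c("EVTYPE", "FATALITIES")

# Read a single row just to learn the column names
hdr <- names(read.csv(tmp, nrows = 1))

# "NULL" drops a column at read time; NA lets read.csv guess the type
classes <- ifelse(hdr %in% keep, NA_character_, "NULL")

toy <- read.csv(tmp, colClasses = classes, stringsAsFactors = FALSE)
names(toy)
```

On a file as large as the storm data set, skipping columns this way may reduce memory use, although the whole file still has to be decompressed and scanned.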

Checking the data

head(df,10)
##     EVTYPE           BGN_DATE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1  TORNADO  4/18/1950 0:00:00          0       15    25.0          K
## 2  TORNADO  4/18/1950 0:00:00          0        0     2.5          K
## 3  TORNADO  2/20/1951 0:00:00          0        2    25.0          K
## 4  TORNADO   6/8/1951 0:00:00          0        2     2.5          K
## 5  TORNADO 11/15/1951 0:00:00          0        2     2.5          K
## 6  TORNADO 11/15/1951 0:00:00          0        6     2.5          K
## 7  TORNADO 11/16/1951 0:00:00          0        1     2.5          K
## 8  TORNADO  1/22/1952 0:00:00          0        0     2.5          K
## 9  TORNADO  2/13/1952 0:00:00          1       14    25.0          K
## 10 TORNADO  2/13/1952 0:00:00          0        0    25.0          K
##    CROPDMG CROPDMGEXP
## 1        0           
## 2        0           
## 3        0           
## 4        0           
## 5        0           
## 6        0           
## 7        0           
## 8        0           
## 9        0           
## 10       0

From this first look at the data frame, we see that the data needs pre-processing. Here I’ll fix the date format, convert the column names to lowercase, and aggregate the columns that matter for our analysis.

The NOAA documentation says that “h” or “H” means 10^2, “k” or “K” means 10^3, “m” or “M” means 10^6, and “b” or “B” means 10^9. I will ignore any other characters and treat those amounts as plain dollars.
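The exponent rule described above can also be written as a lookup table. This is only a sketch: exp_mult and to_dollars are hypothetical names that do not appear in the analysis, and it returns plain dollars rather than the millions of dollars used in the rest of this report.

```r
# Multipliers for the NOAA exponent codes listed above
exp_mult <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: convert an amount and its exponent code to dollars.
# Any code outside H/K/M/B (including the empty string) gets a multiplier of 1.
to_dollars <- function(value, code) {
  mult <- exp_mult[toupper(code)]
  mult[is.na(mult)] <- 1
  unname(value * mult)
}

to_dollars(c(25, 2.5, 3, 10), c("K", "m", "B", ""))
```

With these inputs, 25 with code "K" becomes 25,000 dollars, and the amount with an empty code is left as 10 dollars.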

  • Fixing Datetime
df$BGN_DATE <- as.POSIXct(df$BGN_DATE,format="%m/%d/%Y %H:%M:%S")
  • Converting headers to lowercase
names(df) <- tolower(names(df))
  • Aggregating Data
fatals <- aggregate(fatalities ~ evtype, data = df, FUN = sum)

fatals <- fatals[order(fatals$fatalities, decreasing = T), ]


injuries <- aggregate(injuries ~ evtype, data = df, FUN = sum)

injuries <- injuries[order(injuries$injuries, decreasing = T), ]
  • Fixing the dollar values
pd <- df$propdmg
pde <- df$propdmgexp
cd <- df$cropdmg
cde <- df$cropdmgexp

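## convert every property amount to millions of dollars
pd[pde %in% c("B", "b")] <- pd[pde %in% c("B", "b")] * 1000
pd[pde %in% c("M", "m")] <- pd[pde %in% c("M", "m")] * 1
pd[pde %in% c("K", "k")] <- pd[pde %in% c("K", "k")] * 0.001
pd[pde %in% c("H", "h")] <- pd[pde %in% c("H", "h")] * 1e-04
## any other code: treat the amount as plain dollars
pd[!(pde %in% c("B", "b", "M", "m", "K", "k", "H", "h"))] <- pd[!(pde %in% c("B", "b", "M", "m", "K", "k", "H", "h"))] * 1e-06

## same conversion for the crop amounts
cd[cde %in% c("B", "b")] <- cd[cde %in% c("B", "b")] * 1000
cd[cde %in% c("M", "m")] <- cd[cde %in% c("M", "m")] * 1
cd[cde %in% c("K", "k")] <- cd[cde %in% c("K", "k")] * 0.001
cd[cde %in% c("H", "h")] <- cd[cde %in% c("H", "h")] * 1e-04
cd[!(cde %in% c("B", "b", "M", "m", "K", "k", "H", "h"))] <- cd[!(cde %in% c("B", "b", "M", "m", "K", "k", "H", "h"))] * 1e-06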


## summarize the data (amounts are in millions of dollars)
econdmg <- cd + pd
edt <- aggregate(econdmg ~ df$evtype, FUN = sum)
oedt <- edt[order(edt$econdmg, decreasing = TRUE), ]
names(oedt)[1] <- "evtype"
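The aggregate-then-order pattern used in the steps above is easier to check on a few rows. The sketch below runs it on a small made-up data frame; the event names and amounts are illustrative only and are not taken from the NOAA data.

```r
# Toy damage records in millions of dollars (values are invented)
toy <- data.frame(evtype  = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
                  econdmg = c(10, 2.5, 5, 0.1),
                  stringsAsFactors = FALSE)

# Sum the damage per event type, then sort from largest to smallest
totals <- aggregate(econdmg ~ evtype, data = toy, FUN = sum)
totals <- totals[order(totals$econdmg, decreasing = TRUE), ]
totals
```

Here the two FLOOD rows collapse into a single total of 15, which sorts to the top, in the same way that oedt ranks the real event types.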

Results

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  • Fatalities
ggplot(fatals[1:5, ], aes(x=reorder(evtype, -fatalities), y= fatalities)) + geom_bar(stat = "identity", fill="red") + geom_text(aes(label=fatalities), size=3, vjust="inward") +
    ylab("Fatalities") + xlab("Event Type") + ggtitle("Top Five Types of Events Causing Deaths Across the U.S")

  • Injuries
ggplot(injuries[1:5, ], aes(x=reorder(evtype, -injuries), y= injuries)) + geom_bar(stat = "identity", fill="blue") + geom_text(aes(label=injuries), size=3, vjust="inward") +
    ylab("Injuries") + xlab("Event Type") + ggtitle("Top Five Types of Events Causing Injuries Across the U.S")
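The two plots rank fatalities and injuries separately. If a single ranking of people hurt or killed is wanted, the two tables could be merged and summed, as in the sketch below. It uses invented numbers and hypothetical names (fatals_toy, injuries_toy, harm), and adding deaths to injuries with equal weight is an assumption made only for illustration.

```r
# Toy per-event totals (values are invented, not NOAA figures)
fatals_toy   <- data.frame(evtype = c("A", "B", "C"), fatalities = c(5, 20, 1),
                           stringsAsFactors = FALSE)
injuries_toy <- data.frame(evtype = c("A", "B", "C"), injuries = c(100, 30, 2),
                           stringsAsFactors = FALSE)

# Join the two tables on event type and add the counts together
harm <- merge(fatals_toy, injuries_toy, by = "evtype")
harm$casualties <- harm$fatalities + harm$injuries

# Rank event types by combined casualties
harm <- harm[order(harm$casualties, decreasing = TRUE), ]
harm
```

In this toy case event "A" ranks first with 105 casualties, even though "B" has more fatalities, which shows how the combined ranking can differ from the fatality ranking.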

2. Across the United States, which types of events have the greatest economic consequences?

ggplot(oedt[1:5, ], aes(x=reorder(evtype, -econdmg), y= econdmg)) + geom_bar(stat = "identity", fill="green") + ylab("Economic Damages (million dollars)") + 
    xlab("Event Type") + ggtitle("Top Five Types of Events Causing Economic Damages Across the U.S")

Final Conclusion

  • Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Tornadoes are the most harmful, with more than 90,000 people injured or killed.

  • Across the United States, which types of events have the greatest economic consequences?

Floods have the greatest economic consequences, with more than 150 billion dollars in damages.