Synopsis

This study characterizes the effects of different weather events on people’s lives and property. It uses the U.S. National Oceanic and Atmospheric Administration’s (NOAA) Storm Database, which records when and where each event occurred, its nature, and its impact on population health and property. The study addresses two questions: which types of events are most harmful to population health, and which have the greatest adverse economic consequences.

Data Processing

Download the database and load it into a data frame. To keep the original data untouched, the data is copied to a second data frame (dta), which is the one modified in the steps that follow.

if(!file.exists("data")){
  dir.create("data")
}
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","./data/data.csv.bz2", mode="wb")
data <- read.csv("./data/data.csv.bz2")
dta <- data
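Because the compressed file is large, re-running the report downloads it again each time. A minimal sketch of a cached download follows; `fetch_once` is a hypothetical helper name introduced here for illustration, not part of the original analysis.

```r
# Hypothetical helper: download `url` to `dest` only when `dest` does not exist yet.
fetch_once <- function(url, dest) {
  # Create the target directory if needed.
  if (!dir.exists(dirname(dest))) {
    dir.create(dirname(dest), recursive = TRUE)
  }
  # Skip the download when a local copy is already present.
  if (!file.exists(dest)) {
    download.file(url, dest, mode = "wb")
  }
  invisible(dest)
}

# Usage (commented out because it needs network access):
# path <- fetch_once(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "./data/data.csv.bz2"
# )
# data <- read.csv(path)
```

The helper returns the destination path invisibly, so it can be passed straight to read.csv.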

Data structure

dim(dta)
## [1] 902297     37

The data set has 902297 rows and 37 columns.

Available fields.

names(dta)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

To answer the first question, besides the event type (EVTYPE), we will use both fatalities (FATALITIES) and injuries (INJURIES) fields.

sapply(data[23:24], function(x) sum(is.na(x)))
## FATALITIES   INJURIES 
##          0          0
str(data[23:24])
## 'data.frame':    902297 obs. of  2 variables:
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...

Both fields are numeric and contain no missing values.
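The same check can be written with columns selected by name rather than by position, which keeps working if the column order changes. This is a sketch on a toy data frame with invented values, not on the storm data itself.

```r
# Toy data frame standing in for two columns of the storm data (values invented).
toy <- data.frame(FATALITIES = c(0, 1, NA), INJURIES = c(15, 0, 2))

# Select the columns by name instead of by index.
health <- toy[c("FATALITIES", "INJURIES")]

# Count missing values per column.
na_counts <- colSums(is.na(health))

# Check that every selected column is numeric.
all_numeric <- vapply(health, is.numeric, logical(1))

print(na_counts)
print(all_numeric)
```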

For the second question, besides the event type (EVTYPE), we will use the property damage (PROPDMG, PROPDMGEXP) and crop damage (CROPDMG, CROPDMGEXP) fields. Each damage value has a companion unit field (the EXP column), which must be combined with it before any further calculation.

sapply(data[25:28], function(x) sum(is.na(x)))
##    PROPDMG PROPDMGEXP    CROPDMG CROPDMGEXP 
##          0          0          0          0
str(data[25:28])

The unit fields take the following values:

unique(dta$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
unique(dta$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M

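The unit codes (H, K, M, B) must be converted to their respective multipliers (100, 1000, 1000000, 1000000000). Codes that contain no digit (blank, -, ?, +) are set to zero. Note that, as written, the code leaves the single-digit codes (0 to 8) unchanged, so they act as literal multipliers rather than being set to zero. Each damage value is then multiplied by its multiplier to obtain the damage in dollars.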

# Normalise the unit codes to upper case so that "m" and "M" are treated alike.
dta$PROPDMGEXP <- toupper(as.character(dta$PROPDMGEXP))
dta$CROPDMGEXP <- toupper(as.character(dta$CROPDMGEXP))
# Replace the unit letters with their multipliers (property damage).
dta$PROPDMGEXP[dta$PROPDMGEXP=="H"] <- 100
dta$PROPDMGEXP[dta$PROPDMGEXP=="K"] <- 1000
dta$PROPDMGEXP[dta$PROPDMGEXP=="M"] <- 1000000
dta$PROPDMGEXP[dta$PROPDMGEXP=="B"] <- 1000000000
# Set every code that contains no digit (blank, -, ?, +) to zero.
# grepl() is used instead of -grep() so the line still works when nothing matches.
dta$PROPDMGEXP[!grepl("[0-9]",dta$PROPDMGEXP)] <- 0
dta$PROPDMGEXP <- as.numeric(dta$PROPDMGEXP)
# Same conversion for crop damage.
dta$CROPDMGEXP[dta$CROPDMGEXP=="H"] <- 100
dta$CROPDMGEXP[dta$CROPDMGEXP=="K"] <- 1000
dta$CROPDMGEXP[dta$CROPDMGEXP=="M"] <- 1000000
dta$CROPDMGEXP[dta$CROPDMGEXP=="B"] <- 1000000000
dta$CROPDMGEXP[!grepl("[0-9]",dta$CROPDMGEXP)] <- 0
dta$CROPDMGEXP <- as.numeric(dta$CROPDMGEXP)
# Damage in dollars: reported value times its multiplier.
dta$PROPDMGTOT <- as.numeric(dta$PROPDMG)*dta$PROPDMGEXP
dta$CROPDMGTOT <- as.numeric(dta$CROPDMGEXP)*0 + as.numeric(dta$CROPDMG)*dta$CROPDMGEXP
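The conversion above can also be expressed as a lookup table, which avoids one assignment per unit letter. The following is a sketch under a stated assumption: unlike the code above, it treats every code other than H, K, M and B (including the single-digit codes) as zero. `exp_multiplier` is a hypothetical helper name, and the toy values are invented.

```r
# Hypothetical helper: map an exponent code to its multiplier.
# Assumption: only H, K, M and B (in any case) carry a unit; every other
# code, including the single-digit ones, maps to 0.
exp_multiplier <- function(code) {
  units <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- units[toupper(as.character(code))]  # unknown codes give NA
  mult[is.na(mult)] <- 0
  unname(mult)
}

# Toy values (invented for illustration).
toy <- data.frame(PROPDMG = c(25, 2.5, 10, 3),
                  PROPDMGEXP = c("K", "m", "", "?"))
toy$PROPDMGTOT <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
print(toy)
```

For example, a PROPDMG of 25 with unit K becomes 25 * 1000 = 25000 dollars, while rows with a blank or "?" unit contribute zero.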

Results

Necessary libraries.

library(reshape2)
library(ggplot2)

Evaluation of the Most Harmful Events for Population Health

To quantify the impact of each event type on people’s health, we sum the number of fatalities and injuries caused by each event type across the United States.

type_tot <- aggregate(cbind(data$INJURIES,data$FATALITIES), list(Events = data$EVTYPE), FUN = sum)

Although a fatality and an injury differ in severity, this study adds them together and treats the events with the highest combined count as the most harmful to population health. A new column holds the total of injuries and fatalities, and the event factor is reordered by that total so that the plot shows the events in decreasing order of impact.

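# V1 holds the summed injuries and V2 the summed fatalities (column order from cbind above).
type_tot$total <- type_tot$V1+type_tot$V2
# Use the full column name; `$Event` only worked through partial matching.
type_tot$Events <- reorder(type_tot$Events, -type_tot$total)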

The results are sorted by population health impact and the 10 most harmful events are selected.

names(type_tot) <- c("Event", "Injuries", "Fatalities", "total")
type_tot2 <- head(type_tot[order(-type_tot$total),1:3],10)
type_tot3 <- melt(type_tot2, id.var = c("Event"))
str(type_tot3)
## 'data.frame':    20 obs. of  3 variables:
##  $ Event   : Factor w/ 985 levels "TORNADO","EXCESSIVE HEAT",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ variable: Factor w/ 2 levels "Injuries","Fatalities": 1 1 1 1 1 1 1 1 1 1 ...
##  $ value   : num  91346 6525 6957 6789 5230 ...
names(type_tot3) <- c("Event", "Type", "Casualties")
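The aggregate-then-rank steps above can be illustrated on a small scale. This sketch uses base R only and the formula interface of aggregate(), which keeps the original column names instead of V1 and V2; the toy records are invented.

```r
# Toy records (invented): one row per event report.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(10, 1, 5, 3, 0),
  FATALITIES = c(2, 0, 1, 4, 1)
)

# Sum injuries and fatalities per event type.
by_event <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)
by_event$total <- by_event$INJURIES + by_event$FATALITIES

# Keep the top 2 event types by combined casualties.
top <- head(by_event[order(-by_event$total), ], 2)
print(top)
```

Here TORNADO (15 injuries, 3 fatalities) ranks first and HEAT (3 injuries, 4 fatalities) second.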

The final chart shows the number of fatalities and injuries by the different weather events.

ggplot(data = type_tot3, aes(x = Event, y = Casualties, fill=Type)) + 
  geom_bar(stat = "identity")+ 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  ggtitle("Most Harmful Events to Population Health")

For both injuries and fatalities, the plot shows the tornado as the most harmful event for population health.

Evaluation of the Natural Events with the Greatest Adverse Economic Consequences

To quantify the impact of each event type on the economy, we sum property damage and crop damage across the United States.

dmg_tot <- aggregate(cbind(dta$PROPDMGTOT ,dta$CROPDMGTOT), list(Events = data$EVTYPE), FUN = sum)
dmg_tot$total <- dmg_tot$V1+dmg_tot$V2
dmg_tot$Events <- reorder(dmg_tot$Events, -dmg_tot$total)
dmg_tot1 <- subset(dmg_tot, total>0)
names(dmg_tot1) <- c("Event", "Properties", "Crops", "total")

Selection of the 10 most destructive events.

dmg_tot2 <- head(dmg_tot1[order(-dmg_tot1$total),1:3],10)

dmg_tot3 <- melt(dmg_tot2, id.var = c("Event"))
str(dmg_tot3)
## 'data.frame':    20 obs. of  3 variables:
##  $ Event   : Factor w/ 985 levels "FLOOD","HURRICANE/TYPHOON",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ variable: Factor w/ 2 levels "Properties","Crops": 1 1 1 1 1 1 1 1 1 1 ...
##  $ value   : num  1.45e+11 6.93e+10 5.69e+10 4.33e+10 1.57e+10 ...

The final chart shows the property and crop damage caused by each of these weather events.

names(dmg_tot3) <- c("Event", "Damage", "Dollars")
ggplot(data = dmg_tot3, aes(x = Event, y = Dollars, fill=Damage)) + 
  geom_bar(stat = "identity")+ 
  theme(axis.text.x = element_text(angle = 45, hjust = 1))+
  ggtitle("Most Harmful Events to the Economy")

Unlike the case of population health, the tornado does not rank first here: according to the ordering in the output above, floods cause the greatest economic damage, followed by hurricanes/typhoons. Regarding the type of damage, events such as tornadoes mainly affect property, while events such as droughts mainly affect crops.