1. Synopsis

The basic goal of this assignment is to explore the NOAA Storm Database and answer two basic questions about severe weather events. The code for the entire analysis is shown, and the results are summarized as tables and figures. Several R packages (dplyr, data.table, knitr and ggplot2) support the analysis.

  1. Across the United States, which types of events are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

2. Data Processing

2.1 Data Loading

The data processing below uses the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database, downloaded from the link provided for the second peer assessment of the Coursera Reproducible Research course from Johns Hopkins University.

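# Libraries used in this document
library(dplyr)

# Create the data directory if it does not exist
if (!dir.exists("data")) {
     dir.create("data")
}

# Download the compressed file from the web, only if it is not already present.
# mode = "wb" keeps the binary .bz2 file intact on Windows.
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
destFile <- "./data/StormDataUSA.csv.bz2"
if (!file.exists(destFile)) {
     download.file(fileUrl, destfile = destFile, mode = "wb")
}

# Load the file; read.csv() decompresses .bz2 files transparently
StormData <- read.csv(destFile)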

2.2 Examining variables

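NOAA also provides documentation of the database (the National Weather Service Storm Data Documentation), which describes how some of the variables are constructed and defined.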

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
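The claim that earlier years hold fewer records can be checked by counting events per year. The sketch below assumes that BGN_DATE strings follow a "month/day/year time" layout such as "4/18/1950 0:00:00"; the dates in `toyDates` are made up for illustration and are not taken from the database.

```r
# Made-up BGN_DATE values, assuming the "month/day/year time" layout
toyDates <- c("4/18/1950 0:00:00", "11/15/1951 0:00:00",
              "6/1/2011 0:00:00", "7/4/2011 0:00:00", "11/30/2011 0:00:00")

# as.Date() parses the date part; the trailing time is ignored by the format
years <- format(as.Date(toyDates, format = "%m/%d/%Y"), "%Y")

# Number of recorded events per year
eventsPerYear <- table(years)
eventsPerYear
```

On the full dataset the same idea would be `table(format(as.Date(StormData$BGN_DATE, format = "%m/%d/%Y"), "%Y"))`, again assuming that date layout.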

The following variables are available in the dataset:

colnames(StormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

2.3 Subsetting and cleaning Data

According to the Storm Data Documentation, a direct fatality or injury is defined as “a fatality or injury directly attributable to the hydro-meteorological event itself, or impact by airborne/falling/moving debris, i.e., missiles generated by wind, water, ice, lightning, tornado, etc.” Fatalities and injuries directly caused by a weather event are recorded as “FATALITIES” and “INJURIES” in the Storm Dataset.

The other parameter used is the overall damage to property and crops. According to the documentation, property damage estimates “should be entered as actual dollar amounts, if a reasonably accurate estimate from an insurance company or other qualified individual is available”. These estimates are described in Appendix B of the reference documentation, under Property Damage Estimates, for the variables “PROPDMG” and “CROPDMG”. “PROPDMGEXP” and “CROPDMGEXP” hold, as a character code, the exponent (multiplier) applied to each value of “PROPDMG” and “CROPDMG”.
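As an illustration of how a damage value and its exponent code combine into a dollar amount, here is a minimal sketch with made-up records. It assumes only the letter codes K, M and B (thousands, millions and billions, the same multipliers used in the conversion maps of this analysis); the names `toy` and `multiplier` are hypothetical.

```r
# Made-up records for illustration only (not from the Storm Database)
toy <- data.frame(PROPDMG = c(25, 2.5, 1.2),
                  PROPDMGEXP = c("K", "M", "B"),
                  stringsAsFactors = FALSE)

# Hypothetical multiplier map for the letter codes K, M and B
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Dollar amount = value x multiplier looked up by its exponent code
toy$PROPDMGCOST <- toy$PROPDMG * unname(multiplier[toy$PROPDMGEXP])
toy
```

The three costs are 25 thousand, 2.5 million and 1.2 billion USD. Keeping PROPDMGEXP as character matters here: indexing a named vector with a factor would use the factor's integer codes instead of its labels.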

# Get only data that will be used in this analysis.
SDsub <- subset(StormData, select=c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

The character exponent codes (such as “-”, “+”, “K”, “M” and “B”) must be converted to numeric multipliers, according to the maps below:

library(data.table)

# Convert PROPDMGEXP and CROPDMGEXP to numeric multipliers.

# Map property damage alphanumeric exponents to numeric values.
propDmgConvert <- c("-" = 10^0, "+" = 10^0,
                    "0" = 10^0, "1" = 10^1, "2" = 10^2, "3" = 10^3, "4" = 10^4,
                    "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9,
                    "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)
# Map crop damage alphanumeric exponents to numeric values.
cropDmgConvert <- c("?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Replace the character codes with numeric multipliers. The lookup is
# case-sensitive; any code that is not a name in the map (including the
# empty string) yields NA and is then set to 10^0.
SDsub <- as.data.table(SDsub)
SDsub[, PROPDMGEXP := unname(propDmgConvert[as.character(PROPDMGEXP)])]
SDsub[is.na(PROPDMGEXP), PROPDMGEXP := 10^0]
SDsub[, CROPDMGEXP := unname(cropDmgConvert[as.character(CROPDMGEXP)])]
SDsub[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
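Any code that is not a name in `propDmgConvert` or `cropDmgConvert` silently becomes 10^0 through the `is.na()` step, so it can be worth listing which codes fall through before trusting the totals. The sketch below uses a hypothetical helper, `unmappedCodes()`, with a toy map and toy codes; it is an optional check, not part of the original analysis.

```r
# Hypothetical helper: distinct codes that are not names in the map and
# would therefore fall back to the default multiplier 10^0
unmappedCodes <- function(codes, map) {
  sort(setdiff(unique(as.character(codes)), names(map)))
}

# Toy example: the lookup is case-sensitive, so "m" is not matched by "M"
toyMap <- c("K" = 10^3, "M" = 10^6, "B" = 10^9)
toyCodes <- c("K", "M", "m", "", "?", "K", "B")
unmappedCodes(toyCodes, toyMap)
```

On the real data, the check would run on the raw codes before conversion, for example `unmappedCodes(StormData$PROPDMGEXP, propDmgConvert)`.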

2.4 Economic Calculations

Once the exponent codes are converted to numeric multipliers, two new variables, “PROPDMGCOST” and “CROPCOST”, are added to the subset to hold the property and crop damage costs (each value multiplied by its multiplier).

# Property and crop damage cost = value x numeric multiplier
SDsub[, PROPDMGCOST := PROPDMG * PROPDMGEXP]
SDsub[, CROPCOST := CROPDMG * CROPDMGEXP]

3. Results

This analysis addresses the following two questions about the impact of weather events in the USA:

3.1 Across the United States, which types of events are most harmful with respect to population health?

Summing fatalities and injuries for each event type identifies the event types most harmful to population health. The top 15 event types are listed below.

library(knitr)

# Sum fatalities and injuries per event type
FATALITIES <- aggregate(SDsub$FATALITIES, by = list(SDsub$EVTYPE), sum)
INJURIES <- aggregate(SDsub$INJURIES, by = list(SDsub$EVTYPE), sum)

# Merge and rename columns
TOTALHARM <- merge(FATALITIES, INJURIES, by = "Group.1")
colnames(TOTALHARM) <- c("EVTYPE", "TOT_FATALITIES", "TOT_INJURIES")

# Combined total of fatalities and injuries
TOTALHARM$TOTAL <- TOTALHARM$TOT_FATALITIES + TOTALHARM$TOT_INJURIES

# Order from most to least harmful and prepare the table
TOTALHARM <- TOTALHARM[order(-TOTALHARM$TOTAL), ]
colnames(TOTALHARM) <- c("Event Type", "Total Fatalities", "Total Injuries", "Total")
rownames(TOTALHARM) <- 1:nrow(TOTALHARM)

# table Top 15 Event Types
kable(TOTALHARM[1:15,])
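| Event Type        | Total Fatalities | Total Injuries | Total |
|:------------------|-----------------:|---------------:|------:|
| TORNADO           |             5633 |          91346 | 96979 |
| EXCESSIVE HEAT    |             1903 |           6525 |  8428 |
| TSTM WIND         |              504 |           6957 |  7461 |
| FLOOD             |              470 |           6789 |  7259 |
| LIGHTNING         |              816 |           5230 |  6046 |
| HEAT              |              937 |           2100 |  3037 |
| FLASH FLOOD       |              978 |           1777 |  2755 |
| ICE STORM         |               89 |           1975 |  2064 |
| THUNDERSTORM WIND |              133 |           1488 |  1621 |
| WINTER STORM      |              206 |           1321 |  1527 |
| HIGH WIND         |              248 |           1137 |  1385 |
| HAIL              |               15 |           1361 |  1376 |
| HURRICANE/TYPHOON |               64 |           1275 |  1339 |
| HEAVY SNOW        |              127 |           1021 |  1148 |
| WILDFIRE          |               75 |            911 |   986 |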
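The two `aggregate()` calls and the `merge()` above can also be written as a single call to the formula interface of `aggregate()`, which sums several columns per group at once. This is a minimal sketch on made-up data; the names `toy` and `harm` are hypothetical. Note that the formula interface drops rows containing NA by default.

```r
# Made-up records for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(2, 1, 0, 0, 3),
                  INJURIES = c(10, 0, 5, 1, 2),
                  stringsAsFactors = FALSE)

# Sum fatalities and injuries per event type in one call
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined total, ordered from most to least harmful
harm$TOTAL <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(-harm$TOTAL), ]
harm
```

Applied to this analysis, the call would be `aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = SDsub, FUN = sum)`, which yields named columns directly and avoids the intermediate `Group.1` key.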

The plot below shows the top 15 event types most harmful to population health.

library(ggplot2)

# Get data to plot Top 15
plotData <- TOTALHARM[1:15,]

# Rename columns to syntactic names for use in aes()
colnames(plotData) <- c("EventType", "TotalFatalities", "TotalInjuries", "Total")

# Plot a single filled bar layer, ordered from most to least harmful
ggplot(plotData, aes(x = reorder(EventType, -Total), y = Total)) +
  geom_col(fill = "forestgreen") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.4, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("Total Fatalities and Injuries per Event Type (Top 15)") +
  xlab("Event Type") +
  ylab("Total (Fatalities + Injuries)")

3.2 Across the United States, which types of events have the greatest economic consequences?

Summing property and crop damage costs for each event type identifies the event types with the greatest economic consequences. The top 15 event types are listed below.

library(knitr)

# Sum property damage cost and crop damage cost per event type
PROPDMGTOTAL <- aggregate(SDsub$PROPDMGCOST, by=list(SDsub$EVTYPE), sum)
CROPTOTAL <- aggregate(SDsub$CROPCOST, by=list(SDsub$EVTYPE), sum)

# Merge and rename columns
TOTALCOST <- merge(PROPDMGTOTAL, CROPTOTAL, "Group.1")
colnames(TOTALCOST) <- c("EVTYPE", "PROPDMGTOTAL", "CROPTOTAL")

# All Total
TOTALCOST$TOTAL <- TOTALCOST$PROPDMGTOTAL + TOTALCOST$CROPTOTAL

# Order by max to min and preparing table.
TOTALCOST <- TOTALCOST[order(-TOTALCOST$TOTAL),]
colnames(TOTALCOST) <- c("Event Type", "Property Damage Total [USD]", "Crop Damage Total [USD]", "Total Damage Cost [USD]")
rownames(TOTALCOST) <- 1:nrow(TOTALCOST)

# table Top 15 Event Types
kable(TOTALCOST[1:15,])
| Event Type        | Property Damage Total [USD] | Crop Damage Total [USD] | Total Damage Cost [USD] |
|:------------------|----------------------------:|------------------------:|------------------------:|
| FLOOD             |                144657709807 |              5661968450 |            150319678257 |
| HURRICANE/TYPHOON |                 69305840000 |              2607872800 |             71913712800 |
| TORNADO           |                 56935880688 |               414953270 |             57350833958 |
| STORM SURGE       |                 43323536000 |                    5000 |             43323541000 |
| HAIL              |                 15730367518 |              3025537890 |             18755905408 |
| FLASH FLOOD       |                 16822673978 |              1421317100 |             18243991078 |
| DROUGHT           |                  1046106000 |             13972566000 |             15018672000 |
| HURRICANE         |                 11868319010 |              2741910000 |             14610229010 |
| RIVER FLOOD       |                  5118945500 |              5029459000 |             10148404500 |
| ICE STORM         |                  3944927860 |              5022113500 |              8967041360 |
| TROPICAL STORM    |                  7703890550 |               678346000 |              8382236550 |
| WINTER STORM      |                  6688497251 |                26944000 |              6715441251 |
| HIGH WIND         |                  5270046295 |               638571300 |              5908617595 |
| WILDFIRE          |                  4765114000 |               295472800 |              5060586800 |
| TSTM WIND         |                  4484928495 |               554007350 |              5038935845 |
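As an alternative to `aggregate()` plus `merge()`, base R's `rowsum()` sums several numeric columns per group in one step and returns the groups as row names. This is a swapped-in technique, shown as a minimal sketch on made-up cost values; the names `toy` and `sums` are hypothetical.

```r
# Made-up cost records for illustration only
toy <- data.frame(EVTYPE = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
                  PROPDMGCOST = c(5e6, 2e5, 1e6, 0),
                  CROPCOST = c(0, 3e5, 5e5, 4e6),
                  stringsAsFactors = FALSE)

# Sum both cost columns per event type; groups become the row names
sums <- rowsum(toy[, c("PROPDMGCOST", "CROPCOST")], group = toy$EVTYPE)

# Total damage cost, ordered from highest to lowest
sums$TOTAL <- sums$PROPDMGCOST + sums$CROPCOST
sums <- sums[order(-sums$TOTAL), ]
sums
```

With the analysis data this would be `rowsum(SDsub[, c("PROPDMGCOST", "CROPCOST")], group = SDsub$EVTYPE)` after converting `SDsub` with `as.data.frame()`, which keeps the column selection in plain data frame syntax.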

The plot below shows the top 15 event types with the greatest economic consequences.

library(ggplot2)

# Get data to plot Top 15
plotData <- TOTALCOST[1:15,]

# Rename columns to syntactic names for use in aes()
colnames(plotData) <- c("EventType", "PropertyDamageTotalCost", "CropDamageTotalCost", "DamageTotalCost")

# Plot a single filled bar layer, ordered from highest to lowest total cost
ggplot(plotData, aes(x = reorder(EventType, -DamageTotalCost), y = DamageTotalCost)) +
  geom_col(fill = "forestgreen") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.4, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("Total Damage Cost per Event Type (Top 15)") +
  xlab("Event Type") +
  ylab("Total Damage Cost (Property + Crop) [USD]")