The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
The data processing below uses the Storm Database of the U.S. National Oceanic and Atmospheric Administration (NOAA), downloaded from the link provided in the second assignment of the Coursera course Reproducible Research from Johns Hopkins University.
# Libraries used in this document
library(dplyr)
# Create the data directory if it does not exist
if (!file.exists("data")) {
  dir.create("data")
}
# Download file from the web
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileUrl, destfile = "./data/StormDataUSA.csv.bz2")
# Load file
StormData <- read.csv("./data/StormDataUSA.csv.bz2")
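As written, the download runs again every time the document is rebuilt. A guard that skips the download when the file is already present could look like the sketch below. `fetch_if_missing` is a hypothetical helper, not part of the original analysis, and its `fetch` argument stands in for the `download.file` call above.

```r
# Hypothetical helper (not in the original analysis): call `fetch(path)`
# only when `path` does not exist yet, creating its directory if needed.
fetch_if_missing <- function(path, fetch) {
  if (!file.exists(path)) {
    dir.create(dirname(path), showWarnings = FALSE, recursive = TRUE)
    fetch(path)
  }
  invisible(path)
}

# Intended usage with the URL defined above (commented out: needs network):
# fetch_if_missing("./data/StormDataUSA.csv.bz2",
#                  function(p) download.file(fileUrl, destfile = p, mode = "wb"))
```

Passing the download step as a function keeps the guard independent of the network, so it can be tried out with any function that writes a file.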
There is also some documentation of the database available.
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
The following variables are available in the dataset provided:
colnames(StormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
According to the Storm Data documentation, a direct fatality or injury is defined as “a fatality or injury directly attributable to the hydro-meteorological event itself, or impact by airborne/falling/moving debris, i.e., missiles generated by wind, water, ice, lightning, tornado, etc.” Fatalities and injuries directly caused by a weather event are recorded in the “FATALITIES” and “INJURIES” variables of the Storm dataset. The other measure used is the overall damage to property. According to the documentation, property damage estimates “should be entered as actual dollar amounts, if a reasonably accurate estimate from an insurance company or other qualified individual is available”. The guidance for the variables “PROPDMG” and “CROPDMG” is in Appendix B of the reference documentation, entitled Property Damage Estimates. “PROPDMGEXP” and “CROPDMGEXP” hold, as characters, the exponential factor for each value of “PROPDMG” and “CROPDMG”.
# Get only data that will be used in this analysis.
SDsub <- subset(StormData, select=c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
The exponent variables are stored as characters such as “-”, “+”, “K”, “M”, “B” and others, so they must be converted to numeric multipliers according to the maps below:
library(data.table)
# Convert PROPDMGEXP and CROPDMGEXP to numerical.
# Map property damage alphanumeric exponents to numeric values.
propDmgConvert <- c("\"\"" = 10^0, "-" = 10^0, "+" = 10^0,
"0" = 10^0,"1" = 10^1, "2" = 10^2, "3" = 10^3, "4" = 10^4, "5" = 10^5,"6" = 10^6, "7" = 10^7,"8" = 10^8,"9" = 10^9,"H" = 10^2,"K" = 10^3, "M" = 10^6, "B" = 10^9)
# Map crop damage alphanumeric exponents to numeric values
cropDmgConvert <- c("\"\"" = 10^0, "?" = 10^0, "0" = 10^0,"K" = 10^3,"M" = 10^6, "B" = 10^9)
# Replace character to numerical
SDsub <- as.data.table(SDsub)
SDsub[, PROPDMGEXP := propDmgConvert[as.character(SDsub[,PROPDMGEXP])]]
SDsub[is.na(PROPDMGEXP), PROPDMGEXP := 10^0 ]
SDsub[, CROPDMGEXP := cropDmgConvert[as.character(SDsub[,CROPDMGEXP])] ]
SDsub[is.na(CROPDMGEXP), CROPDMGEXP := 10^0 ]
Once the exponential factors are converted to numeric values, the economic cost is calculated by adding two new variables to the subset, named “PROPDMGCOST” and “CROPCOST”.
# Calculate Cost of Damage and Crop
SDsub$PROPDMGCOST <- SDsub$PROPDMG * SDsub$PROPDMGEXP
SDsub$CROPCOST <- SDsub$CROPDMG * SDsub$CROPDMGEXP
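To make the arithmetic concrete, here is a small worked example of the lookup-and-multiply step on a toy data frame. The four records and the shortened `expMap` vector are hypothetical illustrations, not values or objects from the Storm Database analysis.

```r
# Toy records (hypothetical values, not taken from the Storm Database).
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Same idea as the maps above: a named vector used as a lookup table.
expMap <- c("H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)

# Look up each code; codes missing from the map (here "") come back as NA.
multiplier <- unname(expMap[toy$PROPDMGEXP])

# Treat unmatched codes as a multiplier of 1, as the analysis does.
multiplier[is.na(multiplier)] <- 1

# Cost in dollars = recorded amount * multiplier.
toy$PROPDMGCOST <- toy$PROPDMG * multiplier
toy$PROPDMGCOST
# 25 "K" -> 25,000; 2.5 "M" -> 2,500,000; 1.2 "B" -> 1,200,000,000; 10 "" -> 10
```

The lookup is case-sensitive, so a code such as "m" would fall back to a multiplier of 1 unless the codes are upper-cased first with `toupper()`.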
This analysis addresses two questions about the impact of weather events in the USA: which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences.
Totalling fatalities and injuries per event type identifies the events most harmful to population health; the 15 most harmful event types are listed below.
library(knitr)
# Sum fatalities and injuries per event type
FATALITIES <- aggregate(SDsub$FATALITIES, by=list(SDsub$EVTYPE), sum)
INJURIES <- aggregate(SDsub$INJURIES, by=list(SDsub$EVTYPE), sum)
# Merge and rename columns
TOTALHARM <- merge(FATALITIES, INJURIES, "Group.1")
colnames(TOTALHARM) <- c("EVTYPE", "TOT_FATALITIES", "TOT_INJURIES")
# All Total
TOTALHARM$TOTAL <- TOTALHARM$TOT_FATALITIES + TOTALHARM$TOT_INJURIES
# Order by max to min and preparing table.
TOTALHARM <- TOTALHARM[order(-TOTALHARM$TOTAL),]
colnames(TOTALHARM) <- c("Event Type", "Total Fatalities", "Total Injuries", "Total")
rownames(TOTALHARM) <- 1:nrow(TOTALHARM)
# Table of the top 15 event types
kable(TOTALHARM[1:15,])
| Event Type | Total Fatalities | Total Injuries | Total |
|---|---|---|---|
| TORNADO | 5633 | 91346 | 96979 |
| EXCESSIVE HEAT | 1903 | 6525 | 8428 |
| TSTM WIND | 504 | 6957 | 7461 |
| FLOOD | 470 | 6789 | 7259 |
| LIGHTNING | 816 | 5230 | 6046 |
| HEAT | 937 | 2100 | 3037 |
| FLASH FLOOD | 978 | 1777 | 2755 |
| ICE STORM | 89 | 1975 | 2064 |
| THUNDERSTORM WIND | 133 | 1488 | 1621 |
| WINTER STORM | 206 | 1321 | 1527 |
| HIGH WIND | 248 | 1137 | 1385 |
| HAIL | 15 | 1361 | 1376 |
| HURRICANE/TYPHOON | 64 | 1275 | 1339 |
| HEAVY SNOW | 127 | 1021 | 1148 |
| WILDFIRE | 75 | 911 | 986 |
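The aggregate, merge, and order steps can also be written more compactly with the formula interface of `aggregate`. The sketch below runs on a toy data frame with hypothetical counts; it is an alternative formulation for illustration, not the code that produced the table above.

```r
# Toy data (hypothetical counts, not taken from the Storm Database).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 1),
  INJURIES   = c(10, 4, 5, 0, 2)
)

# Sum both columns per event type in one call; cbind() on the left-hand
# side of the formula aggregates several variables at once.
harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined harm, then sort from most to least harmful.
harm$TOTAL <- harm$FATALITIES + harm$INJURIES
harm <- harm[order(-harm$TOTAL), ]
harm
# TORNADO totals 18, FLOOD totals 7, HEAT totals 3
```

Because both sums come from a single call, no separate merge on the grouping column is needed.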
The plot below shows the 15 event types most harmful with respect to population health.
library(ggplot2)
# Get data to plot Top 15
plotData <- TOTALHARM[1:15,]
# Rename variables for plotting
colnames(plotData) <- c("EventType", "TotalFatalities", "TotalInjuries", "Total")
# Plot data with a single bar layer, so the green fill is not covered by a second default-coloured layer
ggplot(plotData, aes(x = reorder(EventType, -Total), y = Total)) +
  geom_bar(stat = "identity", fill = "forestgreen") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.4, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("Total Most Harmful per Event Type") +
  xlab("Event Type") +
  ylab("Total (Fatalities + Injuries)")
Totalling the property damage cost and the crop damage cost per event type identifies the events with the greatest economic consequences; the 15 costliest event types are listed below.
library(knitr)
# Sum property damage cost and crop damage cost per event type
PROPDMGTOTAL <- aggregate(SDsub$PROPDMGCOST, by=list(SDsub$EVTYPE), sum)
CROPTOTAL <- aggregate(SDsub$CROPCOST, by=list(SDsub$EVTYPE), sum)
# Merge and rename columns
TOTALCOST <- merge(PROPDMGTOTAL, CROPTOTAL, "Group.1")
colnames(TOTALCOST) <- c("EVTYPE", "PROPDMGTOTAL", "CROPTOTAL")
# All Total
TOTALCOST$TOTAL <- TOTALCOST$PROPDMGTOTAL + TOTALCOST$CROPTOTAL
# Order by max to min and preparing table.
TOTALCOST <- TOTALCOST[order(-TOTALCOST$TOTAL),]
colnames(TOTALCOST) <- c("Event Type", "Property Damage Total [USD]", "Crop Damage Total [USD]", "Total Damage Cost [USD]")
rownames(TOTALCOST) <- 1:nrow(TOTALCOST)
# table Top 15 Event Types
kable(TOTALCOST[1:15,])
| Event Type | Property Damage Total [USD] | Crop Damage Total [USD] | Total Damage Cost [USD] |
|---|---|---|---|
| FLOOD | 144657709807 | 5661968450 | 150319678257 |
| HURRICANE/TYPHOON | 69305840000 | 2607872800 | 71913712800 |
| TORNADO | 56935880688 | 414953270 | 57350833958 |
| STORM SURGE | 43323536000 | 5000 | 43323541000 |
| HAIL | 15730367518 | 3025537890 | 18755905408 |
| FLASH FLOOD | 16822673978 | 1421317100 | 18243991078 |
| DROUGHT | 1046106000 | 13972566000 | 15018672000 |
| HURRICANE | 11868319010 | 2741910000 | 14610229010 |
| RIVER FLOOD | 5118945500 | 5029459000 | 10148404500 |
| ICE STORM | 3944927860 | 5022113500 | 8967041360 |
| TROPICAL STORM | 7703890550 | 678346000 | 8382236550 |
| WINTER STORM | 6688497251 | 26944000 | 6715441251 |
| HIGH WIND | 5270046295 | 638571300 | 5908617595 |
| WILDFIRE | 4765114000 | 295472800 | 5060586800 |
| TSTM WIND | 4484928495 | 554007350 | 5038935845 |
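Dollar totals of this size are easier to compare when expressed in billions. As a small illustration, the sketch below rescales the first three totals from the table above; the `totalCost` and `billions` names are introduced here for illustration only.

```r
# Totals copied from the first three rows of the table above (USD).
totalCost <- c(FLOOD = 150319678257,
               `HURRICANE/TYPHOON` = 71913712800,
               TORNADO = 57350833958)

# Rescale to billions of dollars and round to two decimals.
billions <- round(totalCost / 1e9, 2)
billions
# FLOOD 150.32, HURRICANE/TYPHOON 71.91, TORNADO 57.35
```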
The plot below shows the 15 event types with the greatest economic consequences.
library(ggplot2)
# Get data to plot Top 15
plotData <- TOTALCOST[1:15,]
# Rename variables for plotting
colnames(plotData) <- c("EventType", "PropertyDamageTotalCost", "CropDamageTotalCost", "DamageTotalCost")
# Plot data with a single bar layer, so the green fill is not covered by a second default-coloured layer
ggplot(plotData, aes(x = reorder(EventType, -DamageTotalCost), y = DamageTotalCost)) +
  geom_bar(stat = "identity", fill = "forestgreen") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.4, hjust = 1),
        plot.title = element_text(hjust = 0.5)) +
  ggtitle("Total Damage Cost per Event Type") +
  xlab("Event Type") +
  ylab("Total Damage Cost (Property + Crop) [USD]")