In this report we analyse the weather events registered in the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. The NOAA storm database tracks characteristics of major weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, property damage and crop damage. Events in the database start in 1950 and end in November 2011. Fewer events were recorded in the earlier years, most likely due to a lack of standard recording procedures; data for more recent years are considerably more complete. For this study, we focused on the variables we considered most important: fatalities, property damage and crop damage.
We obtained the data from the NOAA Storm Database, which records all major weather events across the U.S.
# Downloading the bzip2-compressed file from the internet:
datasetURL <- "https://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
download.file(datasetURL, destfile = "StormData.csv.bz2", method = "curl")
As the dataset comes as a bz2 file, we use read.csv, which is capable of reading bz2 files directly.
# Reading the dataset:
stormData <- read.csv("StormData.csv.bz2", header = TRUE, na.strings = "NA")
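To illustrate why no explicit decompression step is needed, the sketch below writes a small toy data frame (illustrative values, not NOAA data) through a bzfile() connection and reads it back with a plain read.csv() call. It relies on base R's file(), which read.csv() uses internally and which detects bzip2 compression when opening a file for reading. The names toy, tmpFile and toyBack are hypothetical and used only for this example.

```r
# Toy data frame used only to demonstrate reading a .bz2 file (not NOAA data)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2L, 0L))

# Write the toy data through a bzip2 connection to a temporary .csv.bz2 file
tmpFile <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmpFile, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path with file(), which detects the bzip2 compression,
# so the call looks the same as for an uncompressed CSV
toyBack <- read.csv(tmpFile, header = TRUE)
print(toyBack)
```

The same mechanism is what allows the report to pass "StormData.csv.bz2" straight to read.csv().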
The next chunk is entirely commented out. Uncommenting it saves the dataset in the working directory so that it can be reloaded much faster later, if needed.
# Save the dataset to disk, and load it back later if needed:
#save(stormData, file = "storm.RData")
#load("storm.RData")
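A variant of the same caching idea can be written so that the slow CSV read only happens the first time. The sketch below is not the report's method: it swaps save()/load() for saveRDS()/readRDS(), and the helper loadStormData() and its csvPath/cachePath arguments are hypothetical names. The usage lines run it on a tiny temporary CSV rather than the real StormData file.

```r
# Hypothetical helper: read the CSV once, cache it as an .rds file,
# and reuse the cache on later calls.
loadStormData <- function(csvPath, cachePath) {
  if (file.exists(cachePath)) {
    # Fast path: reuse the previously cached R object
    return(readRDS(cachePath))
  }
  # Slow path: parse the CSV, then cache the result for next time
  data <- read.csv(csvPath, header = TRUE, na.strings = "NA")
  saveRDS(data, cachePath)
  data
}

# Usage on a tiny temporary CSV (illustrative values only)
csvPath <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = "FLOOD", CROPDMG = 950), csvPath, row.names = FALSE)
cachePath <- tempfile(fileext = ".rds")
first <- loadStormData(csvPath, cachePath)   # reads the CSV and writes the cache
second <- loadStormData(csvPath, cachePath)  # reads the cache
```

Unlike load(), which restores objects under the names they were saved with, readRDS() returns the object, so the caller decides what to call it.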
The next chunk displays the first few rows of the raw dataset, which consists of 902,297 observations of 37 variables.
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
To make the dataset more manageable, we first selected the most important variables. Then, for each of the three outcomes (fatalities, property damage and crop damage), we kept only the individual events with a value greater than zero and sorted them in decreasing order. From each sorted set we kept the first 200 events, since we noticed that little damage is caused beyond the 200th observation.
From each of these three subsets we kept only the event type (EVTYPE) and the corresponding damage measure, which are the two variables plotted in the results section. Below, we show the first six observations of each subset.
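# Creating a dataset with only the variables of interest
# (selected by name; equivalent to columns 2, 3, 6, 7, 8, 23, 24, 25 and 27):
selectedVar <- stormData[, c("BGN_DATE", "BGN_TIME", "COUNTYNAME", "STATE", "EVTYPE",
                             "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]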
# FATALITIES DATA
fatalitiesData <- selectedVar[which(selectedVar$FATALITIES >0), ]
fatalitiesOrderInd <- order(fatalitiesData$FATALITIES, decreasing = TRUE)
first200byFatalities <- fatalitiesData[fatalitiesOrderInd[c(1:200)],c("EVTYPE","FATALITIES")]
head(first200byFatalities)
## EVTYPE FATALITIES
## 198704 HEAT 583
## 862634 TORNADO 158
## 68670 TORNADO 116
## 148852 TORNADO 114
## 355128 EXCESSIVE HEAT 99
## 67884 TORNADO 90
# PROPERTY DAMAGE DATA
propDamData <- selectedVar[which(selectedVar$PROPDMG >0), ]
propDamOrderInd <- order(propDamData$PROPDMG , decreasing = TRUE)
first200byPropDam <- propDamData[propDamOrderInd[c(1:200)],c("EVTYPE","PROPDMG")]
head(first200byPropDam)
## EVTYPE PROPDMG
## 778568 THUNDERSTORM WIND 5000
## 808182 FLASH FLOOD 5000
## 808183 FLASH FLOOD 5000
## 900685 WATERSPOUT 5000
## 791403 LANDSLIDE 4800
## 750967 TORNADO 4410
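The ranking above sorts on the raw PROPDMG figure. In the NOAA file each PROPDMG value is paired with a magnitude code in PROPDMGEXP (the raw rows shown earlier carry "K"), commonly interpreted as K = thousands, M = millions and B = billions of dollars. Sorting on PROPDMG alone therefore treats, say, 5000 thousand and 2.5 billion as 5000 versus 2.5. The sketch below converts values to dollars before sorting, assuming only those three codes matter and treating any other code as a multiplier of 1 (a simplification). The helper expToMultiplier() and the toy data are hypothetical. Note that selectedVar, as built above, does not include PROPDMGEXP or CROPDMGEXP, so those columns would have to be added to it first.

```r
# Hypothetical helper: map PROPDMGEXP / CROPDMGEXP codes to numeric multipliers.
# Assumption: only K, M and B (case-insensitive) are meaningful; anything else,
# including blanks and NA, is treated as 1.
expToMultiplier <- function(expCode) {
  code <- toupper(trimws(as.character(expCode)))
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  out <- unname(multipliers[code])  # unknown codes give NA here
  out[is.na(out)] <- 1
  out
}

# Toy rows (illustrative values only) showing how the ranking can change
toyDamage <- data.frame(
  EVTYPE = c("FLASH FLOOD", "FLOOD", "TORNADO"),
  PROPDMG = c(5000, 2.5, 25),
  PROPDMGEXP = c("K", "B", "M")
)
toyDamage$PROPDMG_USD <- toyDamage$PROPDMG * expToMultiplier(toyDamage$PROPDMGEXP)
ranked <- toyDamage[order(toyDamage$PROPDMG_USD, decreasing = TRUE), ]
print(ranked)
```

The same conversion would apply to CROPDMG with CROPDMGEXP.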
# CROP DAMAGE DATA
cropDamData <- selectedVar[which(selectedVar$CROPDMG >0), ]
cropDamOrderInd <- order(cropDamData$CROPDMG, decreasing = TRUE)
first200byCropDam <- cropDamData[cropDamOrderInd[c(1:200)],c("EVTYPE","CROPDMG")]
head(first200byCropDam)
## EVTYPE CROPDMG
## 544253 DROUGHT 990
## 631126 TROPICAL STORM 985
## 322172 FLOOD 978
## 387863 FLOOD 975
## 279930 River Flooding 950
## 743347 FLASH FLOOD 950
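The three chunks above repeat the same filter, sort and truncate steps. As a sketch, those steps could be factored into one helper; topEventsBy() is a hypothetical name, and the example runs it on toy data rather than on selectedVar.

```r
# Hypothetical helper: keep rows where `outcome` > 0, sort them in decreasing
# order of `outcome`, and return at most n rows with EVTYPE and the outcome.
topEventsBy <- function(data, outcome, n = 200) {
  # which() drops NA comparisons, as in the original chunks
  positive <- data[which(data[[outcome]] > 0), , drop = FALSE]
  sorted <- positive[order(positive[[outcome]], decreasing = TRUE), ]
  # head() returns fewer than n rows instead of padding with NA rows
  head(sorted[, c("EVTYPE", outcome)], n)
}

# Toy data (illustrative values only)
toyEvents <- data.frame(
  EVTYPE = c("HEAT", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(583, 1, 158, 0)
)
res <- topEventsBy(toyEvents, "FATALITIES", n = 2)
print(res)
```

With the report's data, topEventsBy(selectedVar, "FATALITIES") would correspond to first200byFatalities, and likewise for "PROPDMG" and "CROPDMG". One design difference: indexing with 1:200 produces NA rows when fewer than 200 events qualify, whereas head() simply returns what is available.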
This section presents the results of the analysis, illustrated with ggplot2 graphics.
# Loading ggplot2 package:
library(ggplot2)
# Graphic for most dangerous events by fatalities:
ggplot(first200byFatalities, aes(x = FATALITIES, colour = EVTYPE)) + geom_density()
This density plot shows that, among the 200 deadliest events, the most frequent types are floods and flash floods, each of which caused very few victims. At the other extreme, heat events are much less frequent but can cause massive loss of life: one notable heat event caused 583 fatalities.
# Graphic for most dangerous events by property damage:
ggplot(first200byPropDam, aes(x = PROPDMG, colour = EVTYPE)) + geom_density()
This density plot shows that many common weather events cause relatively little property damage, while other events occur seldom but cause heavy damage when they do. Flash floods are one such case: they occur rarely but cause large economic losses. Other rare but harmful events worth taking into account are tornadoes, high winds, floods and hail.
# Graphic for most dangerous events by crop damage:
ggplot(first200byCropDam, aes(x = CROPDMG, colour = EVTYPE)) + geom_density()
This density plot shows that crops are exposed to many harmful weather events, the most common being hail, flash floods and droughts. Among these, one isolated drought event caused severe damage.
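The readings above contrast how often an event type appears among the top 200 events with how large its individual values are. A small numeric summary could back up what the density curves suggest. The sketch below uses a hypothetical helper, summariseByEvent(), on toy data; applied to first200byFatalities, first200byPropDam or first200byCropDam it would give one row per event type. Event type labels are not fully standardised in the raw data (for example, "River Flooding" appears alongside "FLOOD" in the crop damage table), so closely related types would show up as separate rows.

```r
# Hypothetical helper: per event type, count how often it appears among the
# selected events and report the median and maximum of the outcome.
summariseByEvent <- function(data, outcome) {
  # as.character() avoids empty groups if EVTYPE is a factor with unused levels
  groups <- split(data[[outcome]], as.character(data$EVTYPE))
  stats <- data.frame(
    EVTYPE = names(groups),
    n = vapply(groups, length, integer(1)),
    median = vapply(groups, median, numeric(1)),
    max = vapply(groups, max, numeric(1)),
    row.names = NULL
  )
  # Most frequent event types first
  stats[order(stats$n, decreasing = TRUE), ]
}

# Toy data (illustrative values only)
toyFatal <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "FLOOD", "HEAT", "TORNADO", "TORNADO"),
  FATALITIES = c(1, 2, 1, 583, 158, 116)
)
summary <- summariseByEvent(toyFatal, "FATALITIES")
print(summary)
```

A frequent type with a small median (FLOOD in the toy data) versus a rare type with a large maximum (HEAT) is the pattern the density plots are read for.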
This brief study is intended as a first look at the main weather events with major human and economic consequences. It can be used as a guide to prioritize resources when responding to disasters.