Synopsis

In this report we analyse the weather events registered in the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The NOAA storm database tracks characteristics of major weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, property damage and crop damage. Events in the database start in the year 1950 and end in November 2011. For the earlier years the database contains fewer recorded events, most likely due to a lack of standard recording procedures. Data for more recent years are considerably more complete. For this study, we selected what we considered the most important variables: fatalities, property damage and crop damage.


Data Processing

We obtained the data from the NOAA storm database, which registers all major weather events across the U.S.

# Retrieving the zipped file from the internet:
datasetURL <- "https://d396qusza40orc.cloudfront.net/repdata/data/StormData.csv.bz2"
download.file(datasetURL, destfile = "StormData.csv.bz2", method = "curl")

As the dataset comes as a bz2 file, we use the read.csv command, which is capable of reading bz2 files directly.

# Reading the dataset:
stormData <- read.csv("StormData.csv.bz2", header = TRUE, na.strings = "NA")
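
The behaviour relied on above can be checked on a small scale. The following is a minimal sketch on toy data (not the NOAA file): it writes a two-row CSV through a bzip2 connection and reads it back with a plain read.csv call. The names tmp, toy and roundTrip are illustrative only.

```r
# Write a tiny CSV through a bzip2 connection (toy data, not the NOAA file).
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(3, 0))
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() receives only the file name; the compression is detected
# when the file is opened, so no explicit decompression step is needed.
roundTrip <- read.csv(tmp, header = TRUE)
```

If the round trip works as expected, roundTrip holds the same two rows as toy.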

The next chunk is fully commented out; it saves the dataset to the working directory so that it can be reloaded much faster if needed.

# Saving the dataset and load from the HD, in case it is needed:
#save(stormData, file = "storm.RData")
#load("storm.RData")
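
The same idea can be made automatic so that nothing has to be commented in or out. The sketch below is one possible approach, not the method used in this report: loadCached is a hypothetical helper, the cache file name is an assumption, and it swaps save/load for saveRDS/readRDS so that the object name is chosen by the caller.

```r
# Hypothetical helper: return the cached copy if it exists; otherwise read
# the source file once and write a cache for later sessions.
loadCached <- function(sourceFile, cacheFile) {
  if (file.exists(cacheFile)) {
    return(readRDS(cacheFile))
  }
  data <- read.csv(sourceFile, header = TRUE, na.strings = "NA")
  saveRDS(data, cacheFile)
  data
}

# Possible usage with the file downloaded above ("storm.rds" is an assumed name):
# stormData <- loadCached("StormData.csv.bz2", "storm.rds")
```

The first call pays the full cost of parsing the compressed CSV; later calls only read the cache file.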

The next chunk displays the first few rows of the raw dataset, which consists of 902297 observations of 37 variables.

head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

To make the dataset more readable, we first selected the most important variables. We considered only the individual events with a recorded value greater than zero. We then sorted the events by each selected variable (fatalities, property damage and crop damage) in decreasing order. From each sorted list we kept the first 200 individual events, as we noticed that very little damage is caused beyond the 200th observation.

From these three new datasets, we kept only the event type and the amount of damage, which can be displayed in a graphic. Below, we show only the first six observations of each of the three datasets.

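# Creating a dataset only with important variables (selected by name; same columns as indices 2, 3, 6, 7, 8, 23, 24, 25, 27):
selectedVar <- stormData[, c("BGN_DATE", "BGN_TIME", "COUNTYNAME", "STATE", "EVTYPE",
                             "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")]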

# FATALITIES DATA
fatalitiesData <- selectedVar[which(selectedVar$FATALITIES >0), ]
fatalitiesOrderInd <- order(fatalitiesData$FATALITIES, decreasing = TRUE)
first200byFatalities <- fatalitiesData[fatalitiesOrderInd[c(1:200)],c("EVTYPE","FATALITIES")]
head(first200byFatalities)
##                EVTYPE FATALITIES
## 198704           HEAT        583
## 862634        TORNADO        158
## 68670         TORNADO        116
## 148852        TORNADO        114
## 355128 EXCESSIVE HEAT         99
## 67884         TORNADO         90
# PROPERTY DAMAGE DATA
propDamData <- selectedVar[which(selectedVar$PROPDMG >0), ]
propDamOrderInd <- order(propDamData$PROPDMG , decreasing = TRUE)
first200byPropDam <- propDamData[propDamOrderInd[c(1:200)],c("EVTYPE","PROPDMG")]
head(first200byPropDam)
##                   EVTYPE PROPDMG
## 778568 THUNDERSTORM WIND    5000
## 808182       FLASH FLOOD    5000
## 808183       FLASH FLOOD    5000
## 900685        WATERSPOUT    5000
## 791403         LANDSLIDE    4800
## 750967           TORNADO    4410
# CROP DAMAGE DATA
cropDamData <- selectedVar[which(selectedVar$CROPDMG >0), ]
cropDamOrderInd <- order(cropDamData$CROPDMG, decreasing = TRUE)
first200byCropDam <- cropDamData[cropDamOrderInd[c(1:200)],c("EVTYPE","CROPDMG")]
head(first200byCropDam)
##                EVTYPE CROPDMG
## 544253        DROUGHT     990
## 631126 TROPICAL STORM     985
## 322172          FLOOD     978
## 387863          FLOOD     975
## 279930 River Flooding     950
## 743347    FLASH FLOOD     950
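
The head() output of the raw data shows PROPDMG next to a PROPDMGEXP column holding codes such as K. In this database K, M and B stand for thousands, millions and billions of dollars, so a ranking on PROPDMG or CROPDMG alone compares values that may be on different scales. The following is a minimal sketch, on toy rows rather than real records, of rescaling before sorting. The names expToMultiplier and PROPDMG_USD are hypothetical, and treating every other code as a multiplier of 1 is a simplifying assumption. Applying it to the real data would also require keeping PROPDMGEXP and CROPDMGEXP in the selected variables.

```r
# Toy rows mimicking the PROPDMG / PROPDMGEXP columns (not real records).
toy <- data.frame(
  EVTYPE     = c("FLASH FLOOD", "TORNADO", "HAIL", "FLOOD"),
  PROPDMG    = c(5000, 2.5, 10, 1.2),
  PROPDMGEXP = c("K", "M", "", "B"),
  stringsAsFactors = FALSE
)

# Map the letter codes to multipliers. Any code other than K, M or B
# (including the empty string) is treated as 1, which is an assumption.
expToMultiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Damage in dollars, then the same decreasing sort used in this report.
toy$PROPDMG_USD <- toy$PROPDMG * expToMultiplier(toy$PROPDMGEXP)
ranked <- toy[order(toy$PROPDMG_USD, decreasing = TRUE), c("EVTYPE", "PROPDMG_USD")]
```

In this toy example the row with PROPDMG 1.2 and code B ranks first, although its raw PROPDMG value is the smallest of the four.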

Results

This section presents the results of the analysis, supported by ggplot2 graphics.

Most harmful weather events with respect to fatalities

# Loading ggplot2 package:
library(ggplot2)

# Graphic for most dangerous events by fatalities:
qplot(FATALITIES, data = first200byFatalities , geom = "density", colour = EVTYPE)

This density plot shows that the most frequent events among the selected records, Flood and Flash Flood, each cause very few fatalities (close to one per event). In contrast, Heat events are much less frequent but can cause massive loss of life: the plot shows one notable case with nearly 600 fatalities.
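
A density plot describes individual events. As a complementary view, the sketch below sums fatalities per event type and counts how often each type appears, using base R on toy data (the values are made up, not taken from the database). The names toy, totals and counts are illustrative only.

```r
# Toy data: one row per event (made-up values, not real records).
toy <- data.frame(
  EVTYPE     = c("HEAT", "TORNADO", "TORNADO", "FLOOD", "FLOOD", "FLOOD"),
  FATALITIES = c(50, 20, 10, 1, 1, 2)
)

# Total fatalities per event type, largest first.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Number of events per type, to separate "frequent" from "severe".
counts <- table(toy$EVTYPE)
```

Reading totals and counts side by side separates event types that are frequent but mild from those that are rare but severe.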

Most harmful weather events with respect to property damage

# Graphic for most dangerous events by property damage:
qplot(PROPDMG, data = first200byPropDam , geom = "density", colour = EVTYPE)

This density plot shows that many weather events occur frequently and cause relatively little property damage. In contrast, some weather events seldom occur but cause a lot of damage when they do. That is the case of Flash Flood, which rarely occurs but causes great economic losses. Other harmful but rare events worth taking into account are Tornado, High Wind, Flood and Hail.

Most harmful weather events with respect to crop damage

# Graphic for most dangerous events by crop damage:
qplot(CROPDMG, data = first200byCropDam , geom = "density", colour = EVTYPE)

This density plot shows that crops are exposed to many harmful weather events. Of these, the most common are Hail, Flash Flood and Drought. Among them, there is an isolated case of Drought which caused severe damage.


This brief study is intended to serve as a first look at the main weather events that cause severe human and economic consequences. It can be used as a guide to prioritize resources for responding to disaster situations.