Synopsis

Storms and other severe weather events can cause public health problems and economic losses for communities and municipalities. The most severe events can result in fatalities, injuries, and property damage. Preventing such outcomes, as far as possible, is paramount to limiting the impact on human lives and property.

The study suggests that, between 1993 and November 2011, Heavy Snow was the weather event that caused the most fatalities, followed by Tornado.

For injuries, the positions are reversed: Tornado is the most harmful weather event, followed by Heavy Snow.

Flood causes the greatest economic loss to property, followed by Hurricane.

Paradoxically, Drought and Flood, in that order, cause the greatest economic loss to crops.

Data

In this report we analyze the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

You can download the file from the web site:

. Storm Data [47Mb]

Further documentation can be found at the following links:

. National Weather Service Storm Data Documentation

. National Climatic Data Center Storm Events FAQ

They explain how some of the variables are constructed/defined.

Load the data from the downloaded file:

# Load the data from the downloaded file
data <- read.csv("~/datasciencecoursera/Downloads/repdata-data-StormData.csv.bz2",stringsAsFactors = FALSE)
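
`read.csv` can read a bzip2-compressed file directly, so the download does not need to be decompressed first. The following is a minimal sketch of that behaviour on a small temporary file; the file and its two rows are made up for illustration and are not taken from the storm database.

```r
# Write a tiny CSV through a bzip2 connection (illustrative data only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,0"), con)
close(con)

# read.csv opens the compressed file transparently and parses it as usual
sample_data <- read.csv(tmp, stringsAsFactors = FALSE)
str(sample_data)
```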

Given the purpose of the study, we subset the relevant variables.

# Derive the year of each event from BGN_DATE and store it in a new variable
data$YEAR <- as.POSIXlt(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")$year + 1900

# Subset the data set to just the relevant columns that are useful
damage <- data[,c('BGN_DATE', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG', 
                             'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP', 'YEAR')]
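
The year extraction above works because `POSIXlt` stores the year as an offset from 1900. A small sketch on two sample strings, assuming the same `"%m/%d/%Y %H:%M:%S"` layout as `BGN_DATE` (the sample values and the `tz = "UTC"` setting are illustrative choices):

```r
# Sample values in the layout assumed for BGN_DATE
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# POSIXlt keeps the year as "years since 1900", hence the "+ 1900"
parsed <- as.POSIXlt(bgn_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
year <- parsed$year + 1900
print(year)
```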

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records (see plot below). More recent years are considered more complete. Therefore, only events from 1993 onward are selected.

Then, we select only the observations that contain values greater than zero in at least one of the variables of interest.

hist(damage$YEAR, breaks = 61, col = rgb(1,0,0,1),main = 'Weather events by year',xlab = 'year')

damage <- subset(damage,FATALITIES!= 0|INJURIES!= 0|PROPDMG!= 0|CROPDMG!= 0)
damage <- subset(damage,YEAR > 1992)

# dimensionality of the dataset
dim(damage)
## [1] 227257      9

Next, the data are cleaned and transformed.

damage$PROPDMGEXP <- toupper(damage$PROPDMGEXP)
damage$CROPDMGEXP <- toupper(damage$CROPDMGEXP)
table(damage$PROPDMGEXP)
## 
##             -      +      0      2      3      4      5      6      7 
##  10207      1      5    210      1      1      4     18      3      3 
##      B      H      K      M 
##     40      7 208203   8554
damage[damage$PROPDMGEXP == '','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '-','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '+','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '0','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '2','PROPDMGEXP'] <- 10^2
damage[damage$PROPDMGEXP == '3','PROPDMGEXP'] <- 10^3
damage[damage$PROPDMGEXP == '4','PROPDMGEXP'] <- 10^4
damage[damage$PROPDMGEXP == '5','PROPDMGEXP'] <- 10^5
damage[damage$PROPDMGEXP == '6','PROPDMGEXP'] <- 10^6
damage[damage$PROPDMGEXP == '7','PROPDMGEXP'] <- 10^7
damage[damage$PROPDMGEXP == 'K','PROPDMGEXP'] <- 10^3
damage[damage$PROPDMGEXP == 'H','PROPDMGEXP'] <- 10^2
damage[damage$PROPDMGEXP == 'M','PROPDMGEXP'] <- 10^6
damage[damage$PROPDMGEXP == 'B','PROPDMGEXP'] <- 10^9
table(damage$PROPDMGEXP)
## 
##      0    100   1000  10000  1e+05  1e+06  1e+07  1e+09 
##  10423      8 208204      4     18   8557      3     40
table(damage$CROPDMGEXP)      
## 
##             ?      0      B      K      M 
## 125288      6     17      7  99953   1986
damage[damage$CROPDMGEXP == '','CROPDMGEXP'] <- 0
damage[damage$CROPDMGEXP == '?','CROPDMGEXP'] <- 0
damage[damage$CROPDMGEXP == '0','CROPDMGEXP'] <- 0
damage[damage$CROPDMGEXP == 'K','CROPDMGEXP'] <- 10^3
damage[damage$CROPDMGEXP == 'M','CROPDMGEXP'] <- 10^6
damage[damage$CROPDMGEXP == 'B','CROPDMGEXP'] <- 10^9
table(damage$CROPDMGEXP ) 
## 
##      0   1000  1e+06  1e+09 
## 125311  99953   1986      7
damage <- subset(damage,PROPDMGEXP!= 0|CROPDMGEXP!= 0|FATALITIES!= 0|INJURIES!= 0)
damage$EVENT <- damage$EVTYPE 
# replace double spaces with one space
damage$EVENT<- gsub("  ", " ",damage$EVTYPE)
# remove leading and trailing spaces
damage$EVENT<- trimws(damage$EVENT, which = 'both') 
# convert the string to upper case
damage$EVENT <- toupper(damage$EVENT)
# replace punctuation and runs of whitespace with a single space
damage$EVENT<- gsub('([[:punct:]])|\\s+',' ',damage$EVENT)
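
The exponent recoding above can also be written as a lookup in a named vector, which keeps the mapping in one place. This is only a sketch under stated assumptions: the `multiplier` table covers just the K, M and B codes, every other code is treated as 0, and the `toy` data frame holds illustrative rows rather than records from the storm database.

```r
# Multipliers for the letter codes; only K, M and B are shown in this sketch
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Illustrative rows, not taken from the storm database
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "B", ""),
                  stringsAsFactors = FALSE)

# Look up each code by name; codes missing from the table (e.g. "") become 0
code <- toupper(toy$PROPDMGEXP)
factor_value <- unname(multiplier[code])
factor_value[is.na(factor_value)] <- 0

# Damage in dollars = reported figure times its multiplier
toy$PROPDMG_USD <- toy$PROPDMG * factor_value
print(toy)
```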

NOAA publishes a Storm Data Event Table listing the official event types. Our next goal is to make the event types recorded in the dataset match the NOAA table. After correcting typographical errors, we map the recorded labels to the official names. However, not all events in the database can be matched. The unmatched labels are also reported below for readers who wish to examine them further.

noaaevent = c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood","Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke","Drought", "Dust Devil", "Dust Storm", "Excessive Heat","Extreme Cold/Wind Chill", "Flash Flood", "Flood","Freezing Fog","Frost Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain","Heavy Snow", "High Surf", "High Wind","Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow","Lightning","Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet","Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")

noaaevent= toupper(noaaevent)

damage$EVENT <- gsub(pattern = "TSTM", replacement = 'THUNDERSTORM', 
                      x= damage$EVENT, ignore.case = TRUE)

idx <- grep("^(?=.*HAIL)^(?!.*TORNADOES)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HAIL"
idx <- grep("^(?=.*TORN)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "TORNADO"
idx <- grep("^(LIGHTNING)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "LIGHTNING"
idx <- grep("^(FLASH FLOOD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLASH FLOOD"
idx <- grep("^(?=.*FLOOD)^(?!.*FLASH)(?!.*LAKESHORE)(?!.*COASTAL)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLOOD"
idx <- grep("^(?=.*THUNDER)^(?!.*SNOW)^(?!.*MARINE)^(?!.*NON)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "THUNDERSTORM WIND"
idx <- grep("^(?=.*HEAVY RAIN)^(?!.*HIGH WINDS)^(?!.*FLOOD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY RAIN"
idx <- grep("^(HURRI)|^(TYPHOON)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HURRICANE/TYPHOON"
idx <- grep("^(?=.*HEAVY SNOW)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY SNOW"
idx <- grep("^(HIGH WIND)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HIGH WIND"
idx <- grep("^(HIGH )", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY SNOW"
idx <- grep("^(?=.*EXCESSIVE HEAT)(?!.*DROUGHT)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY SNOW"
idx <- grep("^(?=.*DROUGHT)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "DROUGHT"
idx <- grep("^(?=.*TROPICAL STORM)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "TROPICAL STORM"
idx <- grep("^(?=.*FIRE)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WILDFIRE"
idx <- grep("^(?=.*ICE)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "ICE STORM"
idx <- grep("^(?=.*WINTER)(?!.*WEATHER)(?!.*BLIZZARD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WINTER STORM"
idx <- grep("^(?=.*AVALAN)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "AVALANCHE"
idx <- grep("^(?=.*BLIZZARD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "BLIZZARD"
idx <- grep("^(?=.*COASTAL FLOOD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "COASTAL FLOOD"
idx <- grep("^(?=.*STORM SURGE)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "STORM TIDE"
idx <- grep("^(?=.*RIP CURRENT)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "RIP CURRENT"
idx <- grep("^(?=.*STREAM)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLOOD"
idx <- grep("^(?=.*WINTER)(?!.*STORM)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WINTER WEATHER"
idx <- grep("^(?=.*COLD)(?!.*SNOW)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WINTER WEATHER"
idx <- grep("LAKE EFFECT SNOW", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "LAKE-EFFECT SNOW"
idx <- grep("FLOOD FLASH FLOOD", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLASH FLOOD"
idx <- grep("LANDSLIDE", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "DEBRIS FLOW"
idx <- grep("FOG", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FREEZING FOG"
idx <- grep("SLEET", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "SLEET"
idx <- grep("DUST", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "DUST DEVIL"
idx <- grep("WATERSPOUT", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WATERSPOUT"

# Events in the database that cannot be matched to a NOAA event type
dif <- damage$EVENT[!(damage$EVENT %in% noaaevent)]
# number of unmatched observations
length(dif)
## [1] 1080
# first unique unmatched event types
head(sort(unique(dif)),10)
##  [1] " "                      "AGRICULTURAL FREEZE"   
##  [3] "APACHE COUNTY"          "ASTRONOMICAL HIGH TIDE"
##  [5] "BEACH EROSION"          "BLOWING SNOW"          
##  [7] "COASTAL EROSION"        "COASTAL STORM"         
##  [9] "COASTAL SURGE"          "COASTALSTORM"
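
The matching rules above rely on Perl-style lookaheads: `(?=.*X)` requires that `X` appears somewhere in the label, and `(?!.*Y)` rejects labels that contain `Y`. A minimal sketch of the FLOOD rule applied to a few illustrative labels (the `events` vector is made up for this example):

```r
# Illustrative raw labels, already upper-cased
events <- c("FLOOD", "FLASH FLOOD", "COASTAL FLOOD", "RIVER FLOOD", "URBAN FLOODING")

# (?=.*FLOOD) requires "FLOOD" somewhere in the string;
# each (?!...) rejects strings that contain the excluded word
pattern <- "^(?=.*FLOOD)(?!.*FLASH)(?!.*LAKESHORE)(?!.*COASTAL)"
is_flood <- grepl(pattern, events, perl = TRUE)

# Collapse every matching label to the official name
events[is_flood] <- "FLOOD"
print(events)
```

Because each rule overwrites the labels it matches, the order in which the rules run affects the final assignment.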

Results

if (!'ggplot2' %in% installed.packages()) install.packages('ggplot2')
library(ggplot2)
if (!'dplyr' %in% installed.packages()) install.packages('dplyr')
library(dplyr)
## 
## Attaching package: 'dplyr'
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
# Fatalities

fatalities <- aggregate(FATALITIES ~ EVENT, data = damage, FUN = sum)
fatalities <- head(arrange(fatalities,-FATALITIES),10)
print(fatalities)
##                EVENT FATALITIES
## 1         HEAVY SNOW       2456
## 2            TORNADO       1649
## 3        FLASH FLOOD       1035
## 4               HEAT        937
## 5          LIGHTNING        817
## 6        RIP CURRENT        577
## 7              FLOOD        512
## 8     WINTER WEATHER        497
## 9  THUNDERSTORM WIND        443
## 10         AVALANCHE        225
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command. 
fatalities <- transform(fatalities, EVENT = reorder(EVENT, -FATALITIES))

# Make the plot
ggplot(fatalities, aes(x=EVENT, y=FATALITIES)) +
    geom_bar(stat="identity", fill="dark blue") +
    geom_text(aes(label=FATALITIES), vjust=-0.4) +
    xlab("Event Types") + 
    ylab("fatalities") + 
    theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
    theme(plot.title = element_text(lineheight=.5, face="bold"))+
    ggtitle("Top 10 most harmful weather events (1993-2011): Fatalities counts")
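
The aggregate, sort and top-10 steps are repeated for each outcome, so they could be wrapped in a small helper. The function below is a hypothetical base-R sketch (the name `top_n_by_sum` and the `toy` data are not part of the report), shown only to make the pattern explicit.

```r
# Hypothetical helper: sum `value` by `group` and keep the n largest totals
top_n_by_sum <- function(df, value, group, n = 10) {
  totals <- aggregate(df[[value]], by = list(df[[group]]), FUN = sum)
  names(totals) <- c(group, value)
  # Sort by the summed value, largest first
  totals <- totals[order(-totals[[value]]), ]
  rownames(totals) <- NULL
  head(totals, n)
}

# Illustrative data, not taken from the storm database
toy <- data.frame(EVENT = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD"),
                  FATALITIES = c(3, 1, 2, 0, 2),
                  stringsAsFactors = FALSE)
top <- top_n_by_sum(toy, "FATALITIES", "EVENT", n = 2)
print(top)
```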

# Injuries

injuries <- aggregate(INJURIES ~ EVENT, data = damage, FUN = sum)
injuries <- head(arrange(injuries,-INJURIES),10)
print(injuries)
##                EVENT INJURIES
## 1            TORNADO    23371
## 2         HEAVY SNOW     9194
## 3              FLOOD     6874
## 4  THUNDERSTORM WIND     6086
## 5          LIGHTNING     5232
## 6          ICE STORM     2156
## 7               HEAT     2100
## 8        FLASH FLOOD     1800
## 9           WILDFIRE     1608
## 10      WINTER STORM     1353
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command. 
injuries <- transform(injuries, EVENT = reorder(EVENT, -INJURIES))

# Make the plot
ggplot(injuries, aes(x=EVENT, y=INJURIES)) +
    geom_bar(stat="identity", fill="dark blue") +
    geom_text(aes(label=INJURIES), vjust=-0.4) +
    xlab("Event Types") + 
    ylab("injuries") + 
    theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
    theme(plot.title = element_text(lineheight=.5, face="bold"))+
    ggtitle("Top 10 most harmful weather events (1993-2011): Injuries counts")

# Property

property <- round(damage$PROPDMG*as.numeric(damage$PROPDMGEXP)/10^9,1)
property.by.event <- aggregate(property ~ EVENT, data = damage, FUN = sum)
property.by.event <- head(arrange(property.by.event,-property),10)
print(property.by.event)
##                EVENT property
## 1              FLOOD    140.0
## 2  HURRICANE/TYPHOON     84.5
## 3         STORM TIDE     47.6
## 4            TORNADO     17.5
## 5               HAIL     10.6
## 6        FLASH FLOOD      7.7
## 7           WILDFIRE      7.2
## 8     TROPICAL STORM      6.9
## 9       WINTER STORM      6.0
## 10 THUNDERSTORM WIND      4.5
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command. 
property <- transform(property.by.event, EVENT = reorder(EVENT, -property))

# Make the plot
ggplot(property, aes(x=EVENT, y=property)) +
    geom_bar(stat="identity", fill="dark blue") +
    geom_text(aes(label=property), vjust=-0.4) +
    xlab("Event Types") + 
    ylab("billions ($)") + 
    theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
    theme(plot.title = element_text(lineheight=.5, face="bold"))+
    ggtitle("Top 10 most harmful weather events (1993-2011): Damage to the Property")
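
When damage figures are converted to billions of dollars, the point at which rounding happens can affect the totals: a record below 0.05 billion rounds to zero on its own but still contributes to a sum. The sketch below uses made-up amounts to compare the two orders of operation; it is not a recomputation of the figures in this report.

```r
# Made-up damage amounts in dollars for a single event type
usd <- c(40e6, 30e6, 45e6, 2e9)

# Round each record to 0.1 billion first, then sum
round_then_sum <- sum(round(usd / 1e9, 1))

# Sum first, then round the total
sum_then_round <- round(sum(usd) / 1e9, 1)

print(c(round_then_sum = round_then_sum, sum_then_round = sum_then_round))
```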

# Crop

crop <- round(damage$CROPDMG*as.numeric(damage$CROPDMGEXP)/10^9, 1)
crop.by.event <- aggregate(crop ~ EVENT, data = damage, FUN = sum)
crop.by.event <-head(arrange(crop.by.event,-crop),10)
print(crop.by.event)
##                EVENT crop
## 1            DROUGHT 13.4
## 2              FLOOD  7.5
## 3          ICE STORM  5.0
## 4  HURRICANE/TYPHOON  4.8
## 5     WINTER WEATHER  1.4
## 6         HEAVY SNOW  0.9
## 7       FROST FREEZE  0.8
## 8        FLASH FLOOD  0.7
## 9         HEAVY RAIN  0.5
## 10              HEAT  0.4
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command. 
crop <- transform(crop.by.event, EVENT = reorder(EVENT, -crop))

# Make the plot
ggplot(crop, aes(x=EVENT, y=crop)) +
    geom_bar(stat="identity", fill="dark blue") +
    geom_text(aes(label=crop), vjust=-0.4) +
    xlab("Event Types") + 
    ylab("billions ($)") + 
    theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
    theme(plot.title = element_text(lineheight=.5, face="bold"))+
    ggtitle("Top 10 most harmful weather events (1993-2011): Damage to the Crop")

Summary

The study suggests that Heavy Snow and Tornado cause the greatest harm to public health, while Flood and Hurricane have the most severe economic consequences.

This report aims to guide action toward the events that cause the most damage.