Storms and other severe weather events can cause public health problems and economic losses for communities and municipalities. The more severe events can result in fatalities, injuries, and property damage. Preventing such events is paramount to mitigating, as far as possible, the impact on human lives and property.
The study suggests that, from 1993 through November 2011, Heavy Snow is the weather event that caused the most fatalities, followed by Tornado.
For injuries, the positions are reversed: Tornado is the most harmful weather event, followed by Heavy Snow.
Flood causes the largest economic losses to property, followed by Hurricane.
Paradoxically, Drought and Flood, in that order, cause the largest economic losses to crops.
In this report we analyze the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
You can download the file from the web site:
. Storm Data [47Mb]
Additional documentation can be found at the following links:
. National Weather Service Storm Data Documentation
. National Climatic Data Center Storm Events FAQ
They explain how some of the variables are constructed/defined.
Load the data from the downloaded file:
# Load the data from the downloaded file
data <- read.csv("~/datasciencecoursera/Downloads/repdata-data-StormData.csv.bz2",stringsAsFactors = FALSE)
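The call above passes the `.bz2` file straight to `read.csv()`, with no separate decompression step. A minimal sketch of that behavior, using a made-up two-row file written to a temporary location (the file contents and the name `toy` are assumptions for illustration only):

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "HAIL,0"), con)
close(con)

# read.csv() opens the compressed file directly and decompresses on the fly
toy <- read.csv(tmp, stringsAsFactors = FALSE)
str(toy)

# Remove the temporary file
unlink(tmp)
```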
Given the purpose of the study, we subset the relevant variables.
# Create a YEAR variable from the event start date (BGN_DATE)
data$YEAR <- as.POSIXlt(data$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")$year + 1900
# Subset the data set to just the relevant columns that are useful
damage <- data[,c('BGN_DATE', 'EVTYPE', 'FATALITIES', 'INJURIES', 'PROPDMG',
'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP', 'YEAR')]
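The year extraction relies on `as.POSIXlt()`, whose `year` component counts years since 1900. A small sketch with made-up date strings in the same format as the format string above (the sample values and the explicit `tz = "UTC"` are assumptions of this sketch):

```r
# Made-up sample dates in "%m/%d/%Y %H:%M:%S" format
bgn_date <- c("4/18/1950 0:00:00", "11/28/2011 0:00:00")

# Parse the strings; POSIXlt stores the year as an offset from 1900
parsed <- as.POSIXlt(bgn_date, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")
year <- parsed$year + 1900
year
```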
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records (see plot below). More recent years are considered more complete. Therefore, only events from 1993 onward are selected.
Then, we select only the observations that contain values greater than zero in at least one of the variables of interest.
hist(damage$YEAR, breaks = 61, col = rgb(1,0,0,1),main = 'Weather events by year',xlab = 'year')
damage <- subset(damage,FATALITIES!= 0|INJURIES!= 0|PROPDMG!= 0|CROPDMG!= 0)
damage <- subset(damage,YEAR > 1992)
# dimensionality of the dataset
dim(damage)
## [1] 227257 9
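The two filters above keep a row only if it has some recorded impact and falls after 1992. A minimal sketch of the same logic on a made-up data frame, written with `rowSums()` instead of a chain of `|` conditions (the toy values and the names `impact_cols`, `has_impact`, and `kept` are hypothetical):

```r
# Made-up rows: only the last two have an impact AND fall after 1992
toy <- data.frame(
  YEAR       = c(1990, 1995, 2001, 2010),
  FATALITIES = c(1, 0, 0, 2),
  INJURIES   = c(0, 0, 4, 0),
  PROPDMG    = c(0, 0, 0, 0),
  CROPDMG    = c(0, 0, 0, 5)
)

impact_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")

# TRUE when at least one impact variable is non-zero in the row
has_impact <- rowSums(toy[, impact_cols] != 0) > 0

kept <- toy[has_impact & toy$YEAR > 1992, ]
kept$YEAR
```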
The data are then cleaned and transformed.
damage$PROPDMGEXP <- toupper(damage$PROPDMGEXP)
damage$CROPDMGEXP <- toupper(damage$CROPDMGEXP)
table(damage$PROPDMGEXP)
##
##           -     +     0     2     3     4     5     6     7
## 10207     1     5   210     1     1     4    18     3     3
##      B      H      K      M
##     40      7 208203   8554
damage[damage$PROPDMGEXP == '','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '-','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '+','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '0','PROPDMGEXP'] <- 0
damage[damage$PROPDMGEXP == '2','PROPDMGEXP'] <- 10^2
damage[damage$PROPDMGEXP == '3','PROPDMGEXP'] <- 10^3
damage[damage$PROPDMGEXP == '4','PROPDMGEXP'] <- 10^4
damage[damage$PROPDMGEXP == '5','PROPDMGEXP'] <- 10^5
damage[damage$PROPDMGEXP == '6','PROPDMGEXP'] <- 10^6
damage[damage$PROPDMGEXP == '7','PROPDMGEXP'] <- 10^7
damage[damage$PROPDMGEXP == 'K','PROPDMGEXP'] <- 10^3
damage[damage$PROPDMGEXP == 'H','PROPDMGEXP'] <- 10^5
damage[damage$PROPDMGEXP == 'M','PROPDMGEXP'] <- 10^6
damage[damage$PROPDMGEXP == 'B','PROPDMGEXP'] <- 10^9
table(damage$PROPDMGEXP)
##
##      0    100   1000  10000  1e+05  1e+06  1e+07  1e+09
##  10423      1 208204      4     25   8557      3     40
table(damage$CROPDMGEXP)
##
##             ?      0      B      K      M
## 125288      6     17      7  99953   1986
damage[damage$CROPDMGEXP == '','CROPDMGEXP'] <- 0
damage[damage$CROPDMGEXP == '?','CROPDMGEXP'] <- 0
damage[damage$CROPDMGEXP == '0','CROPDMGEXP'] <- 0
damage[damage$CROPDMGEXP == 'K','CROPDMGEXP'] <- 10^3
damage[damage$CROPDMGEXP == 'M','CROPDMGEXP'] <- 10^6
damage[damage$CROPDMGEXP == 'B','CROPDMGEXP'] <- 10^9
table(damage$CROPDMGEXP )
##
##      0   1000  1e+06  1e+09
## 125311  99953   1986      7
damage <- subset(damage,PROPDMGEXP!= 0|CROPDMGEXP!= 0|FATALITIES!= 0|INJURIES!= 0)
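The exponent codes can also be translated with a single named lookup vector rather than one assignment per code. The sketch below is an illustration only: it covers just the K, M, and B codes, treats every other code as zero, and uses hypothetical names (`multiplier`, `codes`, `factor_by_code`) and made-up sample codes.

```r
# Hypothetical lookup table: exponent code -> multiplier in dollars
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Made-up sample codes, including a blank and an unrecognized symbol
codes <- c("K", "M", "", "B", "?")

# Indexing by name returns NA for codes that are not in the table
factor_by_code <- unname(multiplier[codes])

# Treat blank or unrecognized codes as a zero multiplier
factor_by_code[is.na(factor_by_code)] <- 0
factor_by_code
```

Keeping the mapping in one vector makes it easier to review which multiplier each code receives.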
damage$EVENT <- damage$EVTYPE
# replace runs of spaces with one space
damage$EVENT<- gsub(" +", " ",damage$EVTYPE)
# remove leading and trailing spaces
damage$EVENT<- trimws(damage$EVENT, which = 'both')
# convert the string to upper case
damage$EVENT <- toupper(damage$EVENT)
# replace punctuation and runs of whitespace with a single space
damage$EVENT<- gsub('([[:punct:]])|\\s+',' ',damage$EVENT)
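The cleaning steps above can be gathered into one small function and checked on a few strings. This is a sketch under assumptions: `normalize_event` is a hypothetical helper name, the sample strings are made up, and the steps are applied in a slightly different order than in the report.

```r
# Hypothetical helper that bundles the text-cleaning steps
normalize_event <- function(x) {
  x <- toupper(trimws(x))                  # trim ends, convert to upper case
  x <- gsub("[[:punct:]]|\\s+", " ", x)    # punctuation / whitespace runs -> space
  x <- gsub(" +", " ", x)                  # collapse repeated spaces
  trimws(x)                                # trim again in case punctuation was at an end
}

# Made-up sample event names
normalize_event(c("  Frost/Freeze", "heavy  snow ", "Flood/Flash Flood"))
```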
NOAA has a Storm Event Data Table. Our next goal is to make the event types recorded in the dataset match the NOAA table. After correcting typographical errors, we map each recorded event to a NOAA event type. However, not all events in the database can be matched. The unmatched events are also listed for readers who wish to examine them further.
noaaevent = c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood","Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke","Drought", "Dust Devil", "Dust Storm", "Excessive Heat","Extreme Cold/Wind Chill", "Flash Flood", "Flood","Freezing Fog","Frost Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain","Heavy Snow", "High Surf", "High Wind","Hurricane/Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow","Lightning","Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet","Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
noaaevent= toupper(noaaevent)
damage$EVENT <- gsub(pattern = "TSTM", replacement = 'THUNDERSTORM',
x= damage$EVENT, ignore.case = TRUE)
idx <- grep("^(?=.*HAIL)^(?!.*TORNADOES)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HAIL"
idx <- grep("^(?=.*TORN)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "TORNADO"
idx <- grep("^(LIGHTNING)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "LIGHTNING"
idx <- grep("^(FLASH FLOOD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLASH FLOOD"
idx <- grep("^(?=.*FLOOD)^(?!.*FLASH)(?!.*LAKESHORE)(?!.*COASTAL)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLOOD"
idx <- grep("^(?=.*THUNDER)^(?!.*SNOW)^(?!.*MARINE)^(?!.*NON)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "THUNDERSTORM WIND"
idx <- grep("^(?=.*HEAVY RAIN)^(?!.*HIGH WINDS)^(?!.*FLOOD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY RAIN"
idx <- grep("^(HURRI)|^(TYPHOON)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HURRICANE/TYPHOON"
idx <- grep("^(?=.*HEAVY SNOW)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY SNOW"
idx <- grep("^(HIGH WIND)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HIGH WIND"
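# Note: the next two rules assign the label "HEAVY SNOW" to every event whose
# name begins with "HIGH " (including the HIGH WIND label set just above) and
# to the EXCESSIVE HEAT events, so the HEAVY SNOW totals reported below
# include those events.
idx <- grep("^(HIGH )", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY SNOW"
idx <- grep("^(?=.*EXCESSIVE HEAT)(?!.*DROUGHT)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "HEAVY SNOW"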
idx <- grep("^(?=.*DROUGHT)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "DROUGHT"
idx <- grep("^(?=.*TROPICAL STORM)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "TROPICAL STORM"
idx <- grep("^(?=.*FIRE)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WILDFIRE"
idx <- grep("^(?=.*ICE)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "ICE STORM"
idx <- grep("^(?=.*WINTER)(?!.*WEATHER)(?!.*BLIZZARD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WINTER STORM"
idx <- grep("^(?=.*AVALAN)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "AVALANCHE"
idx <- grep("^(?=.*BLIZZARD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "BLIZZARD"
idx <- grep("^(?=.*COASTAL FLOOD)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "COASTAL FLOOD"
idx <- grep("^(?=.*STORM SURGE)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "STORM TIDE"
idx <- grep("^(?=.*RIP CURRENT)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "RIP CURRENT"
idx <- grep("^(?=.*STREAM)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLOOD"
idx <- grep("^(?=.*WINTER)(?!.*STORM)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WINTER WEATHER"
idx <- grep("^(?=.*COLD)(?!.*SNOW)", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WINTER WEATHER"
idx <- grep("LAKE EFFECT SNOW", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "LAKE-EFFECT SNOW"
idx <- grep("FLOOD FLASH FLOOD", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FLASH FLOOD"
idx <- grep("LANDSLIDE", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "DEBRIS FLOW"
idx <- grep("FOG", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "FREEZING FOG"
idx <- grep("SLEET", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "SLEET"
idx <- grep("DUST", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "DUST DEVIL"
idx <- grep("WATERSPOUT", damage$EVENT, perl=TRUE,value = FALSE)
damage$EVENT[idx] = "WATERSPOUT"
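Most of the rules above combine a positive lookahead `(?=...)`, which requires a word to appear somewhere in the name, with negative lookaheads `(?!...)`, which exclude names containing other words. A small sketch of one such rule on made-up event names (the sample vector and the name `is_plain_flood` are assumptions for illustration):

```r
# Made-up sample event names
events <- c("RIVER FLOOD", "FLASH FLOOD", "COASTAL FLOOD", "URBAN FLOODING")

# Must contain FLOOD, must not contain FLASH or COASTAL
pattern <- "^(?=.*FLOOD)(?!.*FLASH)(?!.*COASTAL)"

# Lookaheads require the Perl-compatible regex engine
is_plain_flood <- grepl(pattern, events, perl = TRUE)

events[is_plain_flood] <- "FLOOD"
events
```

Because each rule overwrites the labels in place, later rules see the labels written by earlier ones, so the order of the rules affects the result.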
# Events in the database that cannot be matched to a NOAA event type
dif <- damage$EVENT[!(damage$EVENT %in% noaaevent)]
# observations counts
length(dif)
## [1] 1080
# unique event types with no match
head(sort(unique(dif)),10)
## [1] " " "AGRICULTURAL FREEZE"
## [3] "APACHE COUNTY" "ASTRONOMICAL HIGH TIDE"
## [5] "BEACH EROSION" "BLOWING SNOW"
## [7] "COASTAL EROSION" "COASTAL STORM"
## [9] "COASTAL SURGE" "COASTALSTORM"
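To see which unmatched labels occur most often, the leftover events can be tabulated. A minimal sketch on made-up data (the vectors `official` and `recorded` are hypothetical stand-ins for the NOAA list and the cleaned event column):

```r
# Hypothetical stand-ins for the official list and the recorded events
official <- toupper(c("Flood", "Tornado", "Hail"))
recorded <- c("FLOOD", "BEACH EROSION", "TORNADO", "COASTAL STORM", "BEACH EROSION")

# Keep the recorded events that are not in the official list
unmatched <- recorded[!(recorded %in% official)]
length(unmatched)

# Frequency of each unmatched label, most common first
sort(table(unmatched), decreasing = TRUE)
```

Logical indexing with `!(x %in% y)` also behaves correctly when every event matches, whereas a negative `which()` index would be empty in that case and return no elements at all.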
if (!requireNamespace('ggplot2', quietly = TRUE)) install.packages('ggplot2')
library(ggplot2)
if (!requireNamespace('dplyr', quietly = TRUE)) install.packages('dplyr')
library(dplyr)
##
## Attaching package: 'dplyr'
##
## The following object is masked from 'package:stats':
##
## filter
##
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
#fatalities
fatalities <- aggregate(FATALITIES ~ EVENT, data = damage, FUN = sum)
fatalities <- head(arrange(fatalities,-FATALITIES),10)
print(fatalities)
## EVENT FATALITIES
## 1 HEAVY SNOW 2456
## 2 TORNADO 1649
## 3 FLASH FLOOD 1035
## 4 HEAT 937
## 5 LIGHTNING 817
## 6 RIP CURRENT 577
## 7 FLOOD 512
## 8 WINTER WEATHER 497
## 9 THUNDERSTORM WIND 443
## 10 AVALANCHE 225
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command.
fatalities <- transform(fatalities, EVENT = reorder(EVENT, -FATALITIES))
# Make the plot
ggplot(fatalities, aes(x=EVENT, y=FATALITIES)) +
geom_bar(stat="identity", fill="dark blue") +
geom_text(aes(label=FATALITIES), vjust=-0.4) +
xlab("Event Types") +
ylab("fatalities") +
theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
theme(plot.title = element_text(lineheight=.5, face="bold"))+
ggtitle("Top 10 most harmful weather events (1993-2011): Fatalities counts")
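The same aggregate, sort, and truncate pattern is repeated for each outcome variable. One way to express it once is a small base R helper; the function name `top_events` and the toy data below are assumptions for illustration, not part of the original analysis.

```r
# Hypothetical helper: total of one variable per event type, largest first
top_events <- function(df, var, n = 10) {
  totals <- aggregate(df[[var]], by = list(EVENT = df$EVENT), FUN = sum)
  names(totals)[2] <- var                      # rename the summed column
  totals <- totals[order(-totals[[var]]), ]    # sort in decreasing order
  rownames(totals) <- NULL
  head(totals, n)
}

# Made-up data to exercise the helper
toy <- data.frame(
  EVENT      = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(3, 1, 2, 0),
  stringsAsFactors = FALSE
)

top2 <- top_events(toy, "FATALITIES", n = 2)
print(top2)
```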
# Injuries
injuries <- aggregate(INJURIES ~ EVENT, data = damage, FUN = sum)
injuries <- head(arrange(injuries,-INJURIES),10)
print(injuries)
## EVENT INJURIES
## 1 TORNADO 23371
## 2 HEAVY SNOW 9194
## 3 FLOOD 6874
## 4 THUNDERSTORM WIND 6086
## 5 LIGHTNING 5232
## 6 ICE STORM 2156
## 7 HEAT 2100
## 8 FLASH FLOOD 1800
## 9 WILDFIRE 1608
## 10 WINTER STORM 1353
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command.
injuries <- transform(injuries, EVENT = reorder(EVENT, -INJURIES))
# Make the plot
ggplot(injuries, aes(x=EVENT, y=INJURIES)) +
geom_bar(stat="identity", fill="dark blue") +
geom_text(aes(label=INJURIES), vjust=-0.4) +
xlab("Event Types") +
ylab("injuries") +
theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
theme(plot.title = element_text(lineheight=.5, face="bold"))+
ggtitle("Top 10 most harmful weather events (1993-2011): Injuries counts")
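# Property
# Damage per record in billions of dollars (PROPDMG x multiplier / 10^9),
# rounded to one decimal place before the totals are aggregated
property <- round(damage$PROPDMG*as.numeric(damage$PROPDMGEXP)/10^9,1)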
property.by.event <- aggregate(property ~ EVENT, data = damage, FUN = sum)
property.by.event <- head(arrange(property.by.event,-property),10)
print(property.by.event)
## EVENT property
## 1 FLOOD 140.0
## 2 HURRICANE/TYPHOON 84.5
## 3 STORM TIDE 47.6
## 4 TORNADO 17.5
## 5 HAIL 10.6
## 6 FLASH FLOOD 7.7
## 7 WILDFIRE 7.2
## 8 TROPICAL STORM 6.9
## 9 WINTER STORM 6.0
## 10 THUNDERSTORM WIND 4.5
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command.
property <- transform(property.by.event, EVENT = reorder(EVENT, -property))
# Make the plot
ggplot(property, aes(x=EVENT, y=property)) +
geom_bar(stat="identity", fill="dark blue") +
geom_text(aes(label=property), vjust=-0.4) +
xlab("Event Types") +
ylab("billions ($)") +
theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
theme(plot.title = element_text(lineheight=.5, face="bold"))+
ggtitle("Top 10 most harmful weather events (1993-2011): Damage to the Property")
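The damage figures come from multiplying the recorded amount by its multiplier and converting to billions of dollars. A short worked example with made-up values; in this sketch the amounts are summed first and rounded afterwards, which is an assumption of the sketch, and the names `dmg`, `mult`, `dollars`, and `total_billions` are hypothetical.

```r
# Made-up records: 25 thousand, 2.5 million, and 1.2 billion dollars
dmg  <- c(25, 2.5, 1.2)
mult <- c(1e3, 1e6, 1e9)

# Dollar amount per record
dollars <- dmg * mult

# Total in billions of dollars, rounded once after summing
total_billions <- round(sum(dollars) / 1e9, 1)
total_billions
```

Rounding after summing lets records far below 0.1 billion still contribute to the total.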
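# Crop
# Damage per record in billions of dollars (CROPDMG x multiplier / 10^9),
# rounded to one decimal place before the totals are aggregated
crop <- round(damage$CROPDMG*as.numeric(damage$CROPDMGEXP)/10^9, 1)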
crop.by.event <- aggregate(crop ~ EVENT, data = damage, FUN = sum)
crop.by.event <-head(arrange(crop.by.event,-crop),10)
print(crop.by.event)
## EVENT crop
## 1 DROUGHT 13.4
## 2 FLOOD 7.5
## 3 ICE STORM 5.0
## 4 HURRICANE/TYPHOON 4.8
## 5 WINTER WEATHER 1.4
## 6 HEAVY SNOW 0.9
## 7 FROST FREEZE 0.8
## 8 FLASH FLOOD 0.7
## 9 HEAVY RAIN 0.5
## 10 HEAT 0.4
# Make EVENT an ordered factor
# We can do this with the re-order command and transform command.
crop <- transform(crop.by.event, EVENT = reorder(EVENT, -crop))
# Make the plot
ggplot(crop, aes(x=EVENT, y=crop)) +
geom_bar(stat="identity", fill="dark blue") +
geom_text(aes(label=crop), vjust=-0.4) +
xlab("Event Types") +
ylab("billions ($)") +
theme(axis.title.x = element_blank(), axis.text.x = element_text(angle = 45,hjust = 1))+
theme(plot.title = element_text(lineheight=.5, face="bold"))+
ggtitle("Top 10 most harmful weather events (1993-2011): Damage to the Crop")
The study suggests that Heavy Snow and Tornado cause the most damage to public health, while Flood and Hurricane have the most negative economic consequences.
This report aims to guide action toward the events that cause the most damage.