Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Across the United States, we aim to answer two questions:
Which types of events are most harmful with respect to population health?
Which types of events have the greatest economic consequences?
In summary, tornadoes are the most harmful event type, causing 5,633 deaths and 91,346 injuries. In terms of economic consequences, floods are responsible for most property losses, while droughts are responsible for most crop losses.
The data for this analysis can be downloaded from the Storm Data web site.
Some documentation of the database is also available, describing how some of the variables are constructed or defined:
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.
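To see how unevenly records are spread over time, one could count events per year. The sketch below uses a small made-up data frame in place of stormData and assumes BGN_DATE is stored as month/day/year text followed by a time (for example "4/18/1950 0:00:00"); the names toy and events_per_year are illustrative only.

```r
# Toy stand-in for the BGN_DATE column (values invented for illustration)
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "11/15/2011 0:00:00", "11/28/2011 0:00:00", "3/2/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Parse the date part; as.Date ignores the trailing time text
toy$YEAR <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Count events per year
events_per_year <- table(toy$YEAR)
print(events_per_year)
```

Applied to the full data set, the same table could be plotted to check whether the early years really hold fewer records.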
First, we download the compressed data file from the Storm Data site, unless it is already present in the working directory.
if (!file.exists("StormData.csv.bz2")) {
  fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
  download.file(fileURL, destfile = 'StormData.csv.bz2')
}
Then we read the CSV data directly from the compressed file.
stormData <- read.csv(bzfile('StormData.csv.bz2'),header=TRUE, stringsAsFactors = FALSE)
The file contains 902297 rows and 37 columns. The names of the columns are:
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
We create a subset with the information relevant to this analysis: the event date and type, plus the health and economic impact variables. These columns are:
col<-c("BGN_DATE","EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
subsetData <-stormData[col]
even<-unique(subsetData$EVTYPE)
length(even)
## [1] 985
There are 985 distinct event types. We will focus on those with the greatest impact.
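Part of that variety can come from inconsistent spelling rather than from distinct phenomena. As a sketch only (the analysis that follows keeps the raw EVTYPE values), the example below shows on invented labels how trimming whitespace and upper-casing reduces the number of distinct types; toy_types and clean_types are hypothetical names.

```r
# Invented labels that differ only in case or leading whitespace
toy_types <- c("TSTM WIND", " TSTM WIND", "Tstm Wind", "FLOOD", "Flood", "HAIL")
length(unique(toy_types))          # distinct raw labels

# Normalize: strip surrounding whitespace, then upper-case
clean_types <- toupper(trimws(toy_types))
length(unique(clean_types))        # distinct labels after normalization
```

Whether such a cleanup is appropriate for the real data depends on how the labels were recorded, so it is left here as an optional step.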
PROPDMGEXP and CROPDMGEXP are character columns that encode the magnitude of the damage figures. Before the analysis, we convert each of them to a numeric power-of-ten exponent, stored in two new columns. Digits are kept as exponents, and empty or unrecognised symbols (+, -, ?) are mapped to 0. Note: the letters stand for Hundred (H), Thousand (K), Million (M) and Billion (B).
unique(subsetData$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
## Convert PROPDMGEXP to a numeric exponent
library(plyr)
subsetData$PROPex<-subsetData$PROPDMGEXP
subsetData$PROPex <- revalue(subsetData$PROPex, c("K"="3", "M"="6","m"="6","B"="9","+"="0","h"="2","H"="2","-"="0","?"="0"))
subsetData$PROPex[subsetData$PROPex==""]<-"0"
subsetData$PROPex<-as.numeric(subsetData$PROPex)
unique(subsetData$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
## Convert CROPDMGEXP to a numeric exponent
subsetData$CROPex<-subsetData$CROPDMGEXP
subsetData$CROPex <- revalue(subsetData$CROPex, c("K"="3","k"="3", "M"="6","m"="6","B"="9","?"="0"))
subsetData$CROPex[subsetData$CROPex==""]<-"0"
subsetData$CROPex<-as.numeric(subsetData$CROPex)
We then calculate the total property damage and the total crop damage as the reported value multiplied by 10 raised to the corresponding exponent.
subsetData$TOTALPROPDMG <- subsetData$PROPDMG * (10^subsetData$PROPex)
subsetData$TOTALCROPDMG <- subsetData$CROPDMG * (10^subsetData$CROPex)
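As a worked example of the conversion, a record with PROPDMG = 25 and PROPDMGEXP = "K" becomes 25 × 10^3 = 25,000. The sketch below reproduces that arithmetic on made-up rows, using a base R lookup vector instead of plyr's revalue; exp_map and toy are illustrative names, and it covers only the letter codes, with anything else treated as exponent 0 as an assumption.

```r
# Lookup from exponent letter to power of ten (letter codes only)
exp_map <- c(H = 2, h = 2, K = 3, k = 3, M = 6, m = 6, B = 9)

# Invented records for illustration
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Named lookup; symbols not in exp_map (here "") come back as NA
toy$PROPex <- unname(exp_map[toy$PROPDMGEXP])
toy$PROPex[is.na(toy$PROPex)] <- 0   # assumption: treat them as 10^0

# Damage value scaled by its power of ten
toy$TOTALPROPDMG <- toy$PROPDMG * 10^toy$PROPex
print(toy)
```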
1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
In this section we look at the 10 weather event types that caused the most fatalities and injuries.
Fatalities
agrfatalities<-aggregate(FATALITIES~EVTYPE, data = subsetData, "sum")
fatalities<-agrfatalities[order(-agrfatalities$FATALITIES), ][1:10, ]
fatalities
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
Injuries
agrinjuries<-aggregate(INJURIES~EVTYPE, data = subsetData, "sum")
injuries<-agrinjuries[order(-agrinjuries$INJURIES), ][1:10, ]
injuries
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
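The two aggregations above can also be done in a single call by binding both outcome columns on the left-hand side of the formula. A minimal sketch on invented numbers (toy and health are illustrative names):

```r
# Invented records for illustration
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 5, 0),
  INJURIES   = c(10, 4, 1, 0, 2)
)

# Sum both columns per event type in one pass
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, breaking ties by injuries, and show the top rows
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
head(health, 2)
```

Keeping both totals in one data frame makes it easier to compare how an event type ranks on each measure.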
Let's look at this graphically:
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.2.3
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.2.3
# First plot
plotfatalities <-qplot(EVTYPE, data = fatalities, weight = FATALITIES, geom = "bar", fill = I("cyan")) +
xlim("TORNADO","EXCESSIVE HEAT","FLASH FLOOD","HEAT","LIGHTNING", "TSTM WIND","FLOOD","RIP CURRENT","HIGH WIND","AVALANCHE")+
scale_y_continuous("Number of Fatalities") + theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") +
ggtitle("The Top 10 Weather Events\n by fatalities")
# Second plot
plotinjuries<-qplot(EVTYPE, data = injuries, weight = INJURIES, geom = "bar", fill = I("violet")) + xlim("TORNADO","TSTM WIND","FLOOD","EXCESSIVE HEAT","LIGHTNING", "HEAT","ICE STORM","FLASH FLOOD","THUNDERSTORM WIND","HAIL")+
scale_y_continuous("Number of injuries") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") +
ggtitle("The Top 10 Weather Events\n by injuries")
grid.arrange(plotfatalities, plotinjuries, ncol = 2)
Tornadoes are the weather event that causes the most fatalities and injuries.
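The plots above fix the bar order by listing every event name in xlim(). An alternative that avoids retyping the names is to turn EVTYPE into a factor ordered by the plotted value. The sketch below uses ggplot() with geom_col() rather than qplot(), on four of the fatality totals from the table above; it is one possible approach, not the code that produced the figures, and the plot is only drawn if ggplot2 is installed.

```r
# Four fatality totals taken from the top-10 table, deliberately unsorted
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HEAT", "LIGHTNING"),
  FATALITIES = c(470, 5633, 937, 816),
  stringsAsFactors = FALSE
)

# Order factor levels from largest to smallest count
toy$EVTYPE <- reorder(toy$EVTYPE, -toy$FATALITIES)
levels(toy$EVTYPE)

# Draw the bars in factor-level order (skipped when ggplot2 is unavailable)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col(fill = "cyan") +
    labs(x = "Weather Type", y = "Number of Fatalities") +
    theme(axis.text.x = element_text(angle = 45, hjust = 1))
  print(p)
}
```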
2. Across the United States, which types of events have the greatest economic consequences?
In this section we look at the 10 weather event types with the greatest economic impact on property and crops.
Property Damage
agrproperties<-aggregate(TOTALPROPDMG~EVTYPE, data = subsetData, "sum")
properties<-agrproperties[order(-agrproperties$TOTALPROPDMG), ][1:10, ]
properties
## EVTYPE TOTALPROPDMG
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380677
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046295
Crop Damage
agrcrop<-aggregate(TOTALCROPDMG~EVTYPE, data = subsetData, "sum")
crop<-agrcrop[order(-agrcrop$TOTALCROPDMG), ][1:10, ]
crop
## EVTYPE TOTALCROPDMG
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
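Totals of this size are easier to compare in billions. A small sketch, reusing three of the property damage totals printed above and assuming the estimates are in US dollars (the helper name to_billions is hypothetical):

```r
# Three property damage totals copied from the table above
top_prop <- data.frame(
  EVTYPE       = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO"),
  TOTALPROPDMG = c(144657709807, 69305840000, 56947380677)
)

# Convert to billions, rounded to one decimal place
to_billions <- function(x) round(x / 1e9, 1)
top_prop$BILLIONS <- to_billions(top_prop$TOTALPROPDMG)
print(top_prop)
```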
# First plot
plotproperties<-qplot(EVTYPE, data = properties, weight = TOTALPROPDMG, geom = "bar", fill = I("cyan")) +
xlim("FLOOD","HURRICANE/TYPHOON","TORNADO","STORM SURGE","FLASH FLOOD","HAIL","HURRICANE","TROPICAL STORM","WINTER STORM","HIGH WIND")+
scale_y_continuous("Property Damage (USD)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") +
ggtitle("The Top 10 Weather Events\n with Highest Property Damage")
# Second plot
plotcrop<-qplot(EVTYPE, data = crop, weight = TOTALCROPDMG, geom = "bar", fill = I("violet")) +
xlim("DROUGHT","FLOOD","RIVER FLOOD","ICE STORM","HAIL","HURRICANE","HURRICANE/TYPHOON","FLASH FLOOD","EXTREME COLD","FROST/FREEZE")+
scale_y_continuous("Crop Damage (USD)") +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) + xlab("Weather Type") +
ggtitle("The Top 10 Weather Events\n with Highest Crop Damage")
grid.arrange(plotproperties, plotcrop, ncol = 2)
Floods have the greatest impact on property, and droughts have the greatest impact on crops.