Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
We will explore the NOAA Storm Database to answer two questions about weather events: which types of events are most harmful to population health, and which have the greatest economic consequences.
The analysis found that tornadoes are the most harmful event type, causing 5,633 deaths and 91,346 injuries. In terms of economic losses, floods are responsible for the most property damage, while drought is the largest contributor to crop losses.
Getting and loading the data
The NOAA Storm Database is downloaded from the URL given in the code below.
Once the file is in place, it is read directly from the compressed archive. We also record the date of the download for future reference.
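# Download the compressed data set only if it is not already present
if (!file.exists("StormData.csv.bz2")) {
    fileURL <- 'https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2'
    download.file(fileURL, destfile = 'StormData.csv.bz2', method = 'curl')
}
# read.csv can read through a bzfile connection, so no separate extraction step is needed
stormData <- read.csv(bzfile('StormData.csv.bz2'), header = TRUE, stringsAsFactors = FALSE)
# Record when the data was obtained
downloadedData <- date()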
We start with a basic inspection of the data.
summary(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE
## Min. : 1.0 Length:902297 Length:902297 Length:902297
## 1st Qu.:19.0 Class :character Class :character Class :character
## Median :30.0 Mode :character Mode :character Mode :character
## Mean :31.2
## 3rd Qu.:45.0
## Max. :95.0
##
## COUNTY COUNTYNAME STATE EVTYPE
## Min. : 0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 31 Class :character Class :character Class :character
## Median : 75 Mode :character Mode :character Mode :character
## Mean :101
## 3rd Qu.:131
## Max. :873
##
## BGN_RANGE BGN_AZI BGN_LOCATI END_DATE
## Min. : 0 Length:902297 Length:902297 Length:902297
## 1st Qu.: 0 Class :character Class :character Class :character
## Median : 0 Mode :character Mode :character Mode :character
## Mean : 1
## 3rd Qu.: 1
## Max. :3749
##
## END_TIME COUNTY_END COUNTYENDN END_RANGE
## Length:902297 Min. :0 Mode:logical Min. : 0
## Class :character 1st Qu.:0 NA's:902297 1st Qu.: 0
## Mode :character Median :0 Median : 0
## Mean :0 Mean : 1
## 3rd Qu.:0 3rd Qu.: 0
## Max. :0 Max. :925
##
## END_AZI END_LOCATI LENGTH WIDTH
## Length:902297 Length:902297 Min. : 0.0 Min. : 0
## Class :character Class :character 1st Qu.: 0.0 1st Qu.: 0
## Mode :character Mode :character Median : 0.0 Median : 0
## Mean : 0.2 Mean : 8
## 3rd Qu.: 0.0 3rd Qu.: 0
## Max. :2315.0 Max. :4400
##
## F MAG FATALITIES INJURIES
## Min. :0 Min. : 0 Min. : 0 Min. : 0.0
## 1st Qu.:0 1st Qu.: 0 1st Qu.: 0 1st Qu.: 0.0
## Median :1 Median : 50 Median : 0 Median : 0.0
## Mean :1 Mean : 47 Mean : 0 Mean : 0.2
## 3rd Qu.:1 3rd Qu.: 75 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :5 Max. :22000 Max. :583 Max. :1700.0
## NA's :843563
## PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## Min. : 0 Length:902297 Min. : 0.0 Length:902297
## 1st Qu.: 0 Class :character 1st Qu.: 0.0 Class :character
## Median : 0 Mode :character Median : 0.0 Mode :character
## Mean : 12 Mean : 1.5
## 3rd Qu.: 0 3rd Qu.: 0.0
## Max. :5000 Max. :990.0
##
## WFO STATEOFFIC ZONENAMES LATITUDE
## Length:902297 Length:902297 Length:902297 Min. : 0
## Class :character Class :character Class :character 1st Qu.:2802
## Mode :character Mode :character Mode :character Median :3540
## Mean :2875
## 3rd Qu.:4019
## Max. :9706
## NA's :47
## LONGITUDE LATITUDE_E LONGITUDE_ REMARKS
## Min. :-14451 Min. : 0 Min. :-14455 Length:902297
## 1st Qu.: 7247 1st Qu.: 0 1st Qu.: 0 Class :character
## Median : 8707 Median : 0 Median : 0 Mode :character
## Mean : 6940 Mean :1452 Mean : 3509
## 3rd Qu.: 9605 3rd Qu.:3549 3rd Qu.: 8735
## Max. : 17124 Max. :9706 Max. :106220
## NA's :40
## REFNUM
## Min. : 1
## 1st Qu.:225575
## Median :451149
## Mean :451149
## 3rd Qu.:676723
## Max. :902297
##
head(stormData)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Of all the columns, only a few are relevant to our analysis. These are:
* EVTYPE: the type of weather event
* FATALITIES: the number of fatalities
* INJURIES: the number of injuries
* PROPDMG: the amount of property damage (in US dollars)
* PROPDMGEXP: a multiplier for PROPDMG
* CROPDMG: the amount of crop damage (in US dollars)
* CROPDMGEXP: a multiplier for CROPDMG
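The column selection described above can be sketched as follows. This is only an illustration on a small made-up data frame (`toy`, `relevantCols`, and `toySubset` are hypothetical names that do not appear in the original analysis); the same indexing should work on stormData.

# Toy stand-in for stormData with one extra column that the analysis does not use
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD"),
    FATALITIES = c(0, 1),
    INJURIES = c(15, 0),
    PROPDMG = c(25, 2.5),
    PROPDMGEXP = c("K", "M"),
    CROPDMG = c(0, 1),
    CROPDMGEXP = c("", "K"),
    REMARKS = c("", ""),
    stringsAsFactors = FALSE
)

relevantCols <- c("EVTYPE", "FATALITIES", "INJURIES",
                  "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Fail early if an expected column is missing, then keep only the relevant ones
stopifnot(all(relevantCols %in% names(toy)))
toySubset <- toy[, relevantCols]
str(toySubset)

Working on the reduced data frame is optional; it mainly keeps later output shorter and lowers memory use.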
For more information about the database, documentation is available from the following sources:
* National Weather Service Storm Data Documentation
* National Climatic Data Center Storm Events FAQ
As explained in section 2.7 of the manual, the database stores each damage amount as a numeric value plus a character code for its magnitude. For example, K stands for thousands, M for millions, and B for billions. For this reason we convert each code to a power of ten and multiply it by the recorded value. A new variable called TOTALPROPDMG will contain the total property damage cost.
stormData$PROPDMGEXP <- as.character(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'H'] <- "2"
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'K'] <- "3"
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'M'] <- "6"
stormData$PROPDMGEXP[toupper(stormData$PROPDMGEXP) == 'B'] <- "9"
stormData$PROPDMGEXP <- as.numeric(stormData$PROPDMGEXP)
stormData$PROPDMGEXP[is.na(stormData$PROPDMGEXP)] <- 0
stormData$TOTALPROPDMG <- stormData$PROPDMG * 10^stormData$PROPDMGEXP
Now for the crop damage values, which are stored in TOTALCROPDMG.
stormData$CROPDMGEXP <- as.character(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'H'] <- "2"
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'K'] <- "3"
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'M'] <- "6"
stormData$CROPDMGEXP[toupper(stormData$CROPDMGEXP) == 'B'] <- "9"
stormData$CROPDMGEXP <- as.numeric(stormData$CROPDMGEXP)
stormData$CROPDMGEXP[is.na(stormData$CROPDMGEXP)] <- 0
stormData$TOTALCROPDMG <- stormData$CROPDMG * 10^stormData$CROPDMGEXP
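The two conversions above repeat the same steps, so they could be factored into one function. The following is a minimal sketch, not the code used in this report: `expToPower` is a hypothetical helper, and it assumes the same mapping as above (H, K, M, B as 2, 3, 6, 9; single digits taken as-is; anything else treated as 0).

# Hypothetical helper: map an exponent code to a power of ten
expToPower <- function(code) {
    code <- toupper(as.character(code))
    lookup <- c(H = 2, K = 3, M = 6, B = 9)
    power <- unname(lookup[code])          # NA for codes that are not in the lookup
    isDigit <- grepl("^[0-9]$", code)
    power[isDigit] <- as.numeric(code[isDigit])
    power[is.na(power)] <- 0               # blank or unknown codes multiply by 10^0 = 1
    power
}

# Codes of each kind: letter, lower-case letter, blank, digit, unknown symbol
expToPower(c("K", "m", "B", "", "5", "?"))

# Worked example from the first row shown by head(): 25.0 with exponent "K"
25.0 * 10^expToPower("K")

With such a helper, each total would reduce to one line, for example stormData$PROPDMG * 10^expToPower(stormData$PROPDMGEXP), and the original exponent columns would not need to be overwritten.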
To answer the first question, we present the results separately for fatalities and injuries. First, we use aggregate to sum the fatalities caused by each event type. We then sort the table in decreasing order and keep the top 10.
sumFatalities <- aggregate(stormData$FATALITIES, by = list(stormData$EVTYPE), "sum")
names(sumFatalities) <- c("Event", "Fatalities")
sumFatalities <- sumFatalities[order(-sumFatalities$Fatalities), ][1:10, ]
sumFatalities
## Event Fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
The same will be done for injuries.
sumInjuries <- aggregate(stormData$INJURIES, by = list(stormData$EVTYPE), "sum")
names(sumInjuries) <- c("Event", "Injuries")
sumInjuries <- sumInjuries[order(-sumInjuries$Injuries), ][1:10, ]
sumInjuries
## Event Injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
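Both rankings follow the same pattern: sum per event type, sort in decreasing order, keep the first rows. As an illustration only, the pattern can be wrapped in a small function and tried on made-up data; `topEvents` and `toy` are hypothetical names, not part of the original analysis.

# Hypothetical helper: total a numeric column per event type and keep the n largest
topEvents <- function(data, valueCol, n = 10) {
    totals <- aggregate(data[[valueCol]], by = list(Event = data$EVTYPE), FUN = sum)
    names(totals)[2] <- valueCol
    totals <- totals[order(-totals[[valueCol]]), ]
    head(totals, n)
}

# Made-up data: TORNADO totals 5, HEAT 4, FLOOD 1
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(3, 1, 2, 4, 0),
    stringsAsFactors = FALSE
)
topFatal <- topEvents(toy, "FATALITIES", n = 2)
topFatal

The sketch uses head() instead of indexing with 1:10, so that a table with fewer than n event types does not produce rows of NA.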
We will plot both tables using bar plots.
par(mfrow = c(1, 2), mar = c(12, 5, 3, 2), mgp = c(3, 1, 0), cex = 0.8, las = 3)
barplot(sumFatalities$Fatalities, names.arg = sumFatalities$Event, col = 'red',
        main = 'Top 10 Weather Events for Fatalities', ylab = 'Number of Fatalities')
barplot(sumInjuries$Injuries, names.arg = sumInjuries$Event, col = 'blue',
        main = 'Top 10 Weather Events for Injuries', ylab = 'Number of Injuries')
An approach similar to the one used for the first question will be used to find the events with the greatest economic consequences.
Events that caused the highest cost in property damage:
sumPropDmg <- aggregate(stormData$TOTALPROPDMG, by = list(stormData$EVTYPE), "sum")
names(sumPropDmg) <- c("Event", "Cost")
sumPropDmg <- sumPropDmg[order(-sumPropDmg$Cost), ][1:10, ]
sumPropDmg
## Event Cost
## 170 FLOOD 1.447e+11
## 411 HURRICANE/TYPHOON 6.931e+10
## 834 TORNADO 5.695e+10
## 670 STORM SURGE 4.332e+10
## 153 FLASH FLOOD 1.682e+10
## 244 HAIL 1.574e+10
## 402 HURRICANE 1.187e+10
## 848 TROPICAL STORM 7.704e+09
## 972 WINTER STORM 6.688e+09
## 359 HIGH WIND 5.270e+09
Events that caused the highest cost in crop damage:
sumCropDmg <- aggregate(stormData$TOTALCROPDMG, by = list(stormData$EVTYPE), "sum")
names(sumCropDmg) <- c("Event", "Cost")
sumCropDmg <- sumCropDmg[order(-sumCropDmg$Cost), ][1:10, ]
sumCropDmg
## Event Cost
## 95 DROUGHT 1.397e+10
## 170 FLOOD 5.662e+09
## 590 RIVER FLOOD 5.029e+09
## 427 ICE STORM 5.022e+09
## 244 HAIL 3.026e+09
## 402 HURRICANE 2.742e+09
## 411 HURRICANE/TYPHOON 2.608e+09
## 153 FLASH FLOOD 1.421e+09
## 140 EXTREME COLD 1.293e+09
## 212 FROST/FREEZE 1.094e+09
library(reshape2)
library(ggplot2)
# Combine the two top-10 tables; events present in only one table get NA in the other column
fatalitiesAndDamage <- merge(x = sumPropDmg, y = sumCropDmg, by = "Event", all = TRUE,
                             suffixes = c(".Property", ".Crop"))
# Reshape to long format: one row per event and damage type
fatalitiesAndDamage <- melt(fatalitiesAndDamage, id.vars = 'Event')
ggplot(fatalitiesAndDamage, aes(Event, value)) +
    geom_bar(aes(fill = variable), position = "dodge", stat = "identity") +
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + xlab("Event Type") +
    ylab("Damage, USD") + ggtitle("Crop/Property damage by type")
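To make the merge and reshape step easier to follow, here is a small sketch on made-up numbers using base R only. The data frames `prop`, `crop`, `both`, and `long` are hypothetical, and the long table is built by hand here as a stand-in for what melt is expected to return.

# Made-up top lists: FLOOD is in both, TORNADO and DROUGHT in only one each
prop <- data.frame(Event = c("FLOOD", "TORNADO"), Cost = c(100, 50),
                   stringsAsFactors = FALSE)
crop <- data.frame(Event = c("FLOOD", "DROUGHT"), Cost = c(10, 80),
                   stringsAsFactors = FALSE)

# all = TRUE keeps events from either table; the missing side is filled with NA.
# suffixes gives the two Cost columns readable names instead of Cost.x and Cost.y.
both <- merge(prop, crop, by = "Event", all = TRUE,
              suffixes = c(".Property", ".Crop"))
both

# Long format: one row per event and damage type
long <- data.frame(
    Event = rep(both$Event, times = 2),
    variable = rep(c("Cost.Property", "Cost.Crop"), each = nrow(both)),
    value = c(both$Cost.Property, both$Cost.Crop),
    stringsAsFactors = FALSE
)
long

Rows with an NA value correspond to events that reached the top list for only one kind of damage, which is why some event types show a single bar in a dodged bar chart.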