The basic goal of this assignment is to explore the U.S. National Oceanic and Atmospheric Administration (NOAA) Storm Database and answer some basic questions about severe weather events.
This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The events in the database start in the year 1950 and end in November 2011.
This report answers the following questions:
Across the United States, which types of events are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
From the analysis we can say:
Tornadoes are responsible for the largest number of fatalities and injuries.
Floods cause the greatest property damage, while droughts cause the greatest crop damage.
library(dplyr)
library(ggplot2)
library(gridExtra)
library(rmarkdown)
library(knitr)
data <- read.csv("repdata_data_StormData.csv")
dim(data)
## [1] 902297 37
str(data)
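# View(data)  # opens the interactive data viewer; leave commented out when knitting
data <- as_tibble(data)  # tbl_df() is deprecated in current dplyr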
fatalities <- aggregate(FATALITIES~EVTYPE, data, sum)
fatalities <- arrange(fatalities, desc(FATALITIES))
fatalities10 <- fatalities[1:10, ]
injuries <- aggregate(INJURIES~EVTYPE, data, sum)
injuries <- arrange(injuries, desc(INJURIES))
injuries10 <- injuries[1:10,]
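The two separate `aggregate()` calls above can also be folded into one, because the formula interface of `aggregate()` accepts several response columns through `cbind()`. A minimal sketch on a small hypothetical data frame (`toyStorm`, made-up counts, not the NOAA data); the helper `topN()` is also hypothetical:

```r
# Hypothetical toy data with the same column names as the NOAA file
toyStorm <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3),
  stringsAsFactors = FALSE
)

# Sum both health measures per event type in a single call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toyStorm, FUN = sum)

# Hypothetical helper: sort by one column (descending) and keep the first n rows
topN <- function(df, column, n = 10) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)
}

topN(health, "FATALITIES", n = 2)
topN(health, "INJURIES", n = 3)
```

One difference from the two-call version: with the default `na.action`, a row with `NA` in either column is dropped from both sums, which does not matter for these fully populated columns.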
fatalities10
## EVTYPE FATALITIES
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
injuries10
## EVTYPE INJURIES
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
We plot these results to get an overview of them.
# Bars are ordered from the largest to the smallest count
plotfatalities <- ggplot(fatalities10, aes(x = reorder(EVTYPE, -FATALITIES), y = FATALITIES)) +
  geom_col(fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8), plot.title = element_text(size = 10)) +
  xlab("Event Type") + ylab("Fatalities") + ggtitle("Fatalities by top 10 Weather Event Types")
plotinjuries <- ggplot(injuries10, aes(x = reorder(EVTYPE, -INJURIES), y = INJURIES)) +
  geom_col(fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, size = 8), plot.title = element_text(size = 10)) +
  xlab("Event Type") + ylab("Injuries") + ggtitle("Injuries by top 10 Weather Event Types")
grid.arrange(plotfatalities, plotinjuries, ncol = 2, top = "Most Harmful Events with Respect to Population Health")
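The property damage (PROPDMG) and crop damage (CROPDMG) columns do not hold dollar amounts directly: each value must be multiplied by the factor encoded in its companion column, PROPDMGEXP or CROPDMGEXP (for example "K" for thousands, "M" for millions, "B" for billions).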
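As a worked illustration of this transformation: a PROPDMG of 25 with code "K" means 25 × 1,000 = 25,000 dollars, and 2.5 with "M" means 2,500,000 dollars. A minimal sketch, assuming the same code-to-multiplier rules this report uses for property damage, that stores them in a named lookup vector; `expMultiplier` and `toMultiplier()` are hypothetical names, and any code missing from the table yields `NA`:

```r
# Hypothetical lookup table mirroring the property-damage rules in this report
expMultiplier <- c(
  "0" = 1, "1" = 10, "2" = 100, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "h" = 100, "H" = 100, "K" = 1e3, "m" = 1e6, "M" = 1e6, "B" = 1e9,
  "+" = 0, "-" = 0, "?" = 0
)

# Hypothetical helper: translate EXP codes into numeric multipliers
toMultiplier <- function(code) {
  code <- as.character(code)           # the column may be read as a factor
  mult <- unname(expMultiplier[code])  # NA for codes not in the table
  mult[which(code == "")] <- 1         # blank code: value is already in dollars
  mult                                 # (empty names never match, so handled here)
}

propdmg    <- c(25, 2.5, 1.5, 10)
propdmgexp <- c("K", "M", "B", "")
propdmg * toMultiplier(propdmgexp)
```

The blank code is handled outside the table because R never matches an empty string against names when subsetting.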
Calculating property damage
unique(data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
damageProperty <- select(data, EVTYPE, PROPDMG, PROPDMGEXP)
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "0"] <- 1
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "1"] <- 10
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "2"] <- 100
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "3"] <- 1000
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "4"] <- 10000
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "5"] <- 1e+05
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "6"] <- 1e+06
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "7"] <- 1e+07
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "8"] <- 1e+08
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "B"] <- 1e+09
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "h"] <- 100
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "H"] <- 100
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "K"] <- 1000
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "m"] <- 1e+06
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "M"] <- 1e+06
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == ""] <- 1
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "+"] <- 0
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "-"] <- 0
damageProperty$ChangeExp[damageProperty$PROPDMGEXP == "?"] <- 0
damageProperty$damageValue <- damageProperty$PROPDMG*damageProperty$ChangeExp
damageValueProperty <- aggregate(damageValue ~ EVTYPE, damageProperty, sum)
damageValueProperty <- arrange(damageValueProperty, desc(damageValue))
damageValueProperty10 <- damageValueProperty[1:10,]
The 10 event types with the greatest property damage (in dollars):
damageValueProperty10
## EVTYPE damageValue
## 1 FLOOD 144657709807
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56947380617
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16822673979
## 6 HAIL 15735267513
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497251
## 10 HIGH WIND 5270046260
Calculating crop damage
unique(data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
damageCrop <- select(data, EVTYPE, CROPDMG,CROPDMGEXP)
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "0"] <- 1
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "2"] <- 100
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "B"] <- 1e+09
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "K"] <- 1000
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "k"] <- 1000
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "m"] <- 1e+06
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "M"] <- 1e+06
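damageCrop$ChangeExp[damageCrop$CROPDMGEXP == "?"] <- 0
# Blank code: treat the value as dollars, as for property damage
damageCrop$ChangeExp[damageCrop$CROPDMGEXP == ""] <- 1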
damageCrop$damageValue <- damageCrop$CROPDMG*damageCrop$ChangeExp
damageValueCrop <- aggregate(damageValue ~ EVTYPE, damageCrop, sum)
damageValueCrop <- arrange(damageValueCrop, desc(damageValue))
damageValueCrop10 <- damageValueCrop[1:10,]
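One pitfall with the chained assignments above: any EXP code that is not explicitly mapped leaves `ChangeExp` as `NA`, and the formula method of `aggregate()` drops rows containing `NA` by default (`na.action = na.omit`), so those rows silently disappear from the totals. A small check on hypothetical rows (`toyCrop`, made-up values, with an invented unmapped code "X") shows the effect; on the real data, `sum(is.na(damageCrop$ChangeExp))` counts the affected rows:

```r
# Hypothetical rows: the last one uses a code ("X") that is not mapped
toyCrop <- data.frame(
  EVTYPE     = c("DROUGHT", "DROUGHT", "HAIL"),
  CROPDMG    = c(5, 2, 7),
  CROPDMGEXP = c("M", "K", "X"),
  stringsAsFactors = FALSE
)

# Same mapping style as the report; unmapped codes stay NA
toyCrop$ChangeExp <- NA_real_
toyCrop$ChangeExp[toyCrop$CROPDMGEXP == "K"] <- 1000
toyCrop$ChangeExp[toyCrop$CROPDMGEXP == "M"] <- 1e6
toyCrop$damageValue <- toyCrop$CROPDMG * toyCrop$ChangeExp

# Count rows whose multiplier was never assigned
unmapped <- sum(is.na(toyCrop$ChangeExp))

# The formula interface drops the NA row, so HAIL vanishes from the totals
totals <- aggregate(damageValue ~ EVTYPE, data = toyCrop, FUN = sum)
totals
```

Checking this count before aggregating, or mapping every code that `unique()` reports, guards against losing rows without notice.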
The 10 event types with the greatest crop damage (in dollars):
damageValueCrop10
## EVTYPE damageValue
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954470
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
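The question about economic consequences covers property and crop losses together. A minimal sketch of combining the two per-event totals with `merge()`, shown on hypothetical numbers (`prop`, `crop` and `econ` are illustrative names); on the real data the inputs would be `damageValueProperty` and `damageValueCrop`, and `all = TRUE` keeps event types that appear in only one of them:

```r
# Hypothetical per-event totals shaped like damageValueProperty / damageValueCrop
prop <- data.frame(EVTYPE = c("FLOOD", "TORNADO"), damageValue = c(100, 50),
                   stringsAsFactors = FALSE)
crop <- data.frame(EVTYPE = c("FLOOD", "DROUGHT"), damageValue = c(20, 80),
                   stringsAsFactors = FALSE)

# Full outer join on EVTYPE; suffixes distinguish the two damage columns
econ <- merge(prop, crop, by = "EVTYPE", all = TRUE, suffixes = c(".prop", ".crop"))

# An event type missing from one table contributes zero for that component
econ[is.na(econ)] <- 0
econ$total <- econ$damageValue.prop + econ$damageValue.crop

# Rank event types by combined damage
econ <- econ[order(econ$total, decreasing = TRUE), ]
econ
```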
We plot these results to get an overview of them.
par(mfrow=c(1,2),mar=c(11,3,3,2))
barplot(damageValueProperty10$damageValue / 1e9, names.arg = damageValueProperty10$EVTYPE, las = 2,
        col = "blue", ylab = "Property damage (billions of dollars)", main = "Property Damages")
barplot(damageValueCrop10$damageValue / 1e9, names.arg = damageValueCrop10$EVTYPE, las = 2,
        col = "green", ylab = "Crop damage (billions of dollars)", main = "Crop Damages")
The analysis shows that tornadoes caused the greatest number of fatalities and injuries over the recorded period: 5,633 and 91,346, respectively.
Floods caused the greatest property damage, 144,657,709,807 dollars (about 144.7 billion) in the records, while droughts caused the greatest crop losses, 13,972,566,000 dollars (about 14.0 billion).