The goal of this assignment is to explore the NOAA Storm Database and answer the following questions:

1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
2. Across the United States, which types of events have the greatest economic consequences?

The analysis finds that:

1. Tornadoes are the most harmful events to population health, in terms of both fatalities and injuries.
2. Floods have the greatest economic consequences, measured as total damage in dollars across property and crop damage combined.
We use fread to read the CSV file into a data.table.
library(data.table)  # provides fread and the data.table syntax used below
stormData <- fread("StormData.csv", header = TRUE, sep = ",")  # fread already returns a data.table
Number of records in the file: 902297
Number of columns in the file: 37
Number of Unique Weather Events: 985
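The three counts above can be obtained with standard calls. The following is a minimal sketch on a small made-up table (the `toy` object and its values are hypothetical, not NOAA data); applying the same calls to `stormData` should correspond to the figures reported above.

```r
library(data.table)

# Hypothetical stand-in for stormData; values are illustrative only
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HAIL"),
  FATALITIES = c(2, 0, 1, 0),
  INJURIES   = c(10, 1, 4, 0)
)

n_records <- nrow(toy)            # number of records
n_columns <- ncol(toy)            # number of columns
n_events  <- uniqueN(toy$EVTYPE)  # number of distinct event types
```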
Group by event types and sum up the fatalities to get total fatalities for each event
popFatal <- stormData[,.(FATALITIES = sum(FATALITIES)), by = EVTYPE][order(-FATALITIES)]
Group by event types and sum up the injuries to get total injuries for each event
popInjury <- stormData[,.(INJURIES = sum(INJURIES)), by = EVTYPE][order(-INJURIES)]
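As a quick illustration of the group-and-sum pattern used above, here is a minimal sketch on a hypothetical five-row table (the `toy` data and the `toyFatal` name are made up for this example).

```r
library(data.table)

# Hypothetical data; not taken from the NOAA file
toy <- data.table(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0)
)

# Sum within each event type, then sort descending by the total
toyFatal <- toy[, .(FATALITIES = sum(FATALITIES)), by = EVTYPE][order(-FATALITIES)]
print(toyFatal)
```

The first bracket aggregates by `EVTYPE`; the chained second bracket only reorders the aggregated rows, so the original table is left unchanged.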
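Property and crop damages are reported separately. Each damage amount is paired with an exponent code (PROPDMGEXP or CROPDMGEXP) indicating whether it is expressed in thousands (K), millions (M), or billions (B) of dollars. The code below converts the amounts to dollars; rows with any other exponent code are left unscaled.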
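# Keep only the event type and the damage columns (columns 8 and 25 to 28 of the file)
propDamage <- stormData[, .(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)]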
# Convert property damage to dollars according to its exponent code
propDamage[PROPDMGEXP == 'K', PROPDMG := PROPDMG * 1e3]
propDamage[PROPDMGEXP %in% c('M', 'm'), PROPDMG := PROPDMG * 1e6]
propDamage[PROPDMGEXP == 'B', PROPDMG := PROPDMG * 1e9]
# Convert crop damage to dollars according to its exponent code
propDamage[CROPDMGEXP %in% c('K', 'k'), CROPDMG := CROPDMG * 1e3]
propDamage[CROPDMGEXP %in% c('M', 'm'), CROPDMG := CROPDMG * 1e6]
propDamage[CROPDMGEXP == 'B', CROPDMG := CROPDMG * 1e9]
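The same conversion can also be written as a lookup, which keeps the code-to-multiplier mapping in one place. The following is only a sketch under stated assumptions: the `toy` rows, the `multiplier` vector, and the `MULT` and `PROPDMG_USD` columns are hypothetical names introduced here, and the mapping covers only the property codes handled above (K, M, m, B).

```r
library(data.table)

# Hypothetical rows; exponent codes mirror those handled for property damage
toy <- data.table(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO", "DROUGHT", "HEAT"),
  PROPDMG    = c(2.5, 10, 1.2, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "m", "")
)

# Named vector mapping each handled code to its multiplier
multiplier <- c(K = 1e3, M = 1e6, m = 1e6, B = 1e9)

# Look up each row's multiplier; codes not in the mapping come back as NA
toy[, MULT := unname(multiplier[PROPDMGEXP])]
# Leave amounts with unhandled codes unscaled, as the code above does
toy[is.na(MULT), MULT := 1]
toy[, PROPDMG_USD := PROPDMG * MULT]
```

Writing the result to a new column rather than overwriting `PROPDMG` makes the step safe to re-run, since a second pass does not scale the amounts again.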
We now identify the events that caused the highest combined property and crop damage. The code below adds the two damage amounts for each record and then sums the result by weather event type:
propDamage[,TOTALDMG := (PROPDMG + CROPDMG)]
totalPropDmg <- propDamage[,.(Damage = sum(TOTALDMG)), by = EVTYPE][order(-Damage)]
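One way to sanity-check this step is to confirm that each event type's combined total equals its property total plus its crop total. This is a minimal sketch on hypothetical figures (the `toy` data and the `byEvent`, `Property`, and `Crop` names are made up for illustration), assuming the amounts are already in dollars.

```r
library(data.table)

# Hypothetical damage figures, already converted to dollars
toy <- data.table(
  EVTYPE  = c("FLOOD", "HAIL", "FLOOD", "DROUGHT"),
  PROPDMG = c(5e6, 2e5, 1e6, 0),
  CROPDMG = c(1e6, 3e5, 0, 4e6)
)

toy[, TOTALDMG := PROPDMG + CROPDMG]
byEvent <- toy[, .(Property = sum(PROPDMG),
                   Crop     = sum(CROPDMG),
                   Damage   = sum(TOTALDMG)),
               by = EVTYPE][order(-Damage)]

# The combined total should equal the two components added together
stopifnot(isTRUE(all.equal(byEvent$Damage, byEvent$Property + byEvent$Crop)))
print(byEvent)
```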
We examine the events most harmful to population health separately for fatalities and injuries. Only the top 10 event types are shown in each case.

#### Fatalities

Below is the summary of event types ranked by total fatalities.
head(popFatal, 10)
##             EVTYPE FATALITIES
##  1:        TORNADO       5633
##  2: EXCESSIVE HEAT       1903
##  3:    FLASH FLOOD        978
##  4:           HEAT        937
##  5:      LIGHTNING        816
##  6:      TSTM WIND        504
##  7:          FLOOD        470
##  8:    RIP CURRENT        368
##  9:      HIGH WIND        248
## 10:      AVALANCHE        224
library(ggplot2)  # plotting library used for the bar charts
ggplot(popFatal[1:10], aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity") +
  ggtitle("Top 10 Weather Events by Fatalities") +
  labs(x = "Event Type", y = "Fatalities") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
Tornadoes caused the highest number of fatalities across the US (5,633), roughly three times as many as the next event type, excessive heat (1,903).
#### Injuries

Below is the summary of event types ranked by total injuries.
head(popInjury, 10)
##                EVTYPE INJURIES
##  1:           TORNADO    91346
##  2:         TSTM WIND     6957
##  3:             FLOOD     6789
##  4:    EXCESSIVE HEAT     6525
##  5:         LIGHTNING     5230
##  6:              HEAT     2100
##  7:         ICE STORM     1975
##  8:       FLASH FLOOD     1777
##  9: THUNDERSTORM WIND     1488
## 10:              HAIL     1361
ggplot(popInjury[1:10], aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity") +
  ggtitle("Top 10 Weather Events by Injuries") +
  labs(x = "Event Type", y = "Injuries") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
Tornadoes also caused the highest number of injuries across the US (91,346), more than thirteen times as many as the next event type, TSTM WIND (6,957).
Property and crop damages were combined in the data pre-processing section by summing them into a total damage figure for each weather event type. Below is the summary of the top 10 event types:
head(totalPropDmg, 10)
##                EVTYPE       Damage
##  1:             FLOOD 150319678257
##  2: HURRICANE/TYPHOON  71913712800
##  3:           TORNADO  57352114049
##  4:       STORM SURGE  43323541000
##  5:              HAIL  18758221521
##  6:       FLASH FLOOD  17562129167
##  7:           DROUGHT  15018672000
##  8:         HURRICANE  14610229010
##  9:       RIVER FLOOD  10148404500
## 10:         ICE STORM   8967041360
ggplot(totalPropDmg[1:10], aes(x = reorder(EVTYPE, Damage), y = Damage/10^9)) +
  geom_bar(stat = "identity") +
  ggtitle("Top 10 Weather Events by Damages to Property and Crop") +
  labs(x = "Event Type", y = "Damages (in Billion $s)") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
Floods caused the greatest combined damage to property and crops, about $150 billion, roughly twice the total for the next event type, HURRICANE/TYPHOON (about $72 billion).