Author: Bharat S Raj
The goal of this report is to show the most harmful weather events affecting the United States, in order to inform decisions about future investments in the planning and management of damages. The report identifies the weather event types with the largest impact on population health (as measured by the number of fatalities and injuries) and the largest economic consequences (as measured by the property damage and crop damage sustained during the event). The data was collected from 1950 to November 2011. The purpose of this analysis is to answer two questions: across the United States, which types of events are most harmful with respect to population health, and which types of events have the greatest economic consequences?
Loading the data
The dataset, Storm Data [46.9 MB], is distributed as a bzip2-compressed CSV file.
We assume the data file is in the working directory.
storm <- read.csv(bzfile("repdata-data-StormData.csv.bz2"), header = TRUE, stringsAsFactors = FALSE)
# convert letter exponents to integers
storm[(storm$PROPDMGEXP == "K" | storm$PROPDMGEXP == "k"), ]$PROPDMGEXP <- 3
storm[(storm$PROPDMGEXP == "M" | storm$PROPDMGEXP == "m"), ]$PROPDMGEXP <- 6
storm[(storm$PROPDMGEXP == "B" | storm$PROPDMGEXP == "b"), ]$PROPDMGEXP <- 9
storm[(storm$CROPDMGEXP == "K" | storm$CROPDMGEXP == "k"), ]$CROPDMGEXP <- 3
storm[(storm$CROPDMGEXP == "M" | storm$CROPDMGEXP == "m"), ]$CROPDMGEXP <- 6
storm[(storm$CROPDMGEXP == "B" | storm$CROPDMGEXP == "b"), ]$CROPDMGEXP <- 9
# multiply property and crop damage by 10 raised to the power of the exponent;
# non-numeric exponents (for example blank or "?") become NA, so the damage value does too
suppressWarnings(storm$PROPDMG <- storm$PROPDMG * 10^as.numeric(storm$PROPDMGEXP))
suppressWarnings(storm$CROPDMG <- storm$CROPDMG * 10^as.numeric(storm$CROPDMGEXP))
# compute combined economic damage (property damage + crop damage)
# stored as a column of storm so that aggregate() finds it in the data frame
storm$TOTECODMG <- storm$PROPDMG + storm$CROPDMG
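The exponent handling above can be illustrated on a small made-up data frame. The sketch below uses a named lookup vector in place of the repeated row assignments; the `toy` data and the `exp_power` helper are hypothetical and not part of the original analysis. For brevity it covers only the letter codes K, M and B and ignores the digit codes that the original code passes through `as.numeric()`. It also shows that a code outside the lookup, such as a blank, produces NA.

```r
# Hypothetical toy data standing in for storm$PROPDMG / storm$PROPDMGEXP
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "m", "B", ""),
  stringsAsFactors = FALSE
)

# Illustrative helper (not from the original report):
# map letter codes to powers of ten, case-insensitively
exp_power <- function(code) {
  lookup <- c(K = 3, M = 6, B = 9)
  unname(lookup[toupper(code)])  # unknown or blank codes give NA
}

# Damage in USD; the row with a blank exponent ends up as NA
toy$DAMAGE_USD <- toy$PROPDMG * 10^exp_power(toy$PROPDMGEXP)
print(toy)
```

Whether a blank exponent should count as NA or as a multiplier of 1 is a modelling decision; the report's code effectively chooses NA.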
Across the United States, which types of events are most harmful with respect to population health?
Aggregate data for fatalities
fatalities <- aggregate(FATALITIES ~ EVTYPE, data = storm, FUN = sum)
fatalities <- fatalities[order(fatalities$FATALITIES, decreasing = TRUE), ]
# 5 most harmful causes of fatalities
fatalitiesMax <- fatalities[1:5, ]
print(fatalitiesMax)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
Aggregate data for injuries
injuries <- aggregate(INJURIES ~ EVTYPE, data = storm, FUN = sum)
injuries <- injuries[order(injuries$INJURIES, decreasing = TRUE), ]
# 5 most harmful causes of injuries
injuriesMax <- injuries[1:5, ]
print(injuriesMax)
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
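The aggregate, sort and truncate pattern used for fatalities and injuries can be tried on a few made-up records. This is a minimal sketch: the `toy` data frame and the name `top2` are hypothetical, and `head()` is swapped in for the `[1:5, ]` indexing used above because it does not pad with NA rows when fewer rows are available.

```r
# Hypothetical toy records; the real analysis uses the full storm data frame
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 0, 2, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type, then sort from largest to smallest
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Keep the two largest groups (TORNADO with 5, HEAT with 3)
top2 <- head(totals, 2)
print(top2)
```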
Plotting the five most harmful event types for fatalities and for injuries
The plots in this analysis use the ggplot2 package.
library(ggplot2)
# refer to columns by name inside aes(); the data argument supplies the data frame
ggplot(data = fatalitiesMax, aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(colour = "black", fill = "blue", stat = "identity") +
  xlab("Event Type") + ylab("Number of fatalities") +
  ggtitle("Total number of fatalities, 1950 - 2011") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
ggplot(data = injuriesMax, aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(colour = "black", fill = "blue", stat = "identity") +
  xlab("Event Type") + ylab("Number of injuries") +
  ggtitle("Total number of injuries, 1950 - 2011") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1))
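ggplot2 places character values on the x axis in alphabetical order, so the bars in the plots above do not appear in ranked order. One possible refinement, sketched here with the values copied from the fatalities table, is to convert EVTYPE to a factor whose levels follow the counts, using base R `reorder()`. The data frame name `top` is illustrative, and the ggplot2 call is guarded so the snippet still runs where the package is not installed.

```r
# Values copied from the fatalities table above; `top` is an illustrative name
top <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 978, 937, 816),
  stringsAsFactors = FALSE
)

# Order the factor levels by decreasing count so bars run from largest to smallest
top$EVTYPE <- reorder(top$EVTYPE, -top$FATALITIES)
print(levels(top$EVTYPE))

# Build the plot only if ggplot2 is available; print(p) would draw it
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top, aes(x = EVTYPE, y = FATALITIES)) +
    geom_col(colour = "black", fill = "blue") +
    labs(x = "Event Type", y = "Number of fatalities")
}
```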
Across the United States, which types of events have the greatest economic consequences?
Aggregate data for property damage
propdmg <- aggregate(PROPDMG ~ EVTYPE, data = storm, FUN = sum)
propdmg <- propdmg[order(propdmg$PROPDMG, decreasing = TRUE), ]
# 5 event types with the largest property damage
propdmgMax <- propdmg[1:5, ]
print(propdmgMax)
## EVTYPE PROPDMG
## 62 FLOOD 144657709800
## 179 HURRICANE/TYPHOON 69305840000
## 332 TORNADO 56947380614
## 281 STORM SURGE 43323536000
## 50 FLASH FLOOD 16822673772
Aggregate data for crop damage
cropdmg <- aggregate(CROPDMG ~ EVTYPE, data = storm, FUN = sum)
cropdmg <- cropdmg[order(cropdmg$CROPDMG, decreasing = TRUE), ]
# 5 event types with the largest crop damage
cropdmgMax <- cropdmg[1:5, ]
print(cropdmgMax)
## EVTYPE CROPDMG
## 16 DROUGHT 13972566000
## 34 FLOOD 5661968450
## 98 RIVER FLOOD 5029459000
## 85 ICE STORM 5022113500
## 52 HAIL 3025954470
Aggregate total economic damage
ecodmg <- aggregate(TOTECODMG ~ EVTYPE, data = storm, FUN = sum)
ecodmg <- ecodmg[order(ecodmg$TOTECODMG, decreasing = TRUE), ]
# 5 event types with the largest total economic damage
ecodmgMax <- ecodmg[1:5, ]
print(ecodmgMax)
## EVTYPE TOTECODMG
## 23 FLOOD 138007444500
## 62 HURRICANE/TYPHOON 29348167800
## 99 TORNADO 16570326363
## 57 HURRICANE 12405268000
## 75 RIVER FLOOD 10108369000
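The combined FLOOD total printed above is lower than the FLOOD property damage total alone. This is consistent with how R treats missing values: a sum of two columns is NA whenever either term is NA, and the formula interface of `aggregate()` drops rows with NA by default. The sketch below reproduces the effect on hypothetical numbers. Replacing NA with 0 is shown only as one possible alternative; it is an assumption, not what the report does.

```r
# Hypothetical records: the second row has a missing crop damage value
toy <- data.frame(
  EVTYPE  = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG = c(100, 500, 20),
  CROPDMG = c(10, NA, 5),
  stringsAsFactors = FALSE
)
toy$TOTAL <- toy$PROPDMG + toy$CROPDMG  # NA for the second row

# The formula interface uses na.action = na.omit, so the NA row is dropped
prop_sum  <- aggregate(PROPDMG ~ EVTYPE, data = toy, FUN = sum)
total_sum <- aggregate(TOTAL ~ EVTYPE, data = toy, FUN = sum)
print(prop_sum)   # FLOOD property damage: 600
print(total_sum)  # FLOOD combined damage: 110, lower than property alone

# Alternative (assumption): count a missing component as zero before adding
toy$TOTAL0 <- ifelse(is.na(toy$PROPDMG), 0, toy$PROPDMG) +
  ifelse(is.na(toy$CROPDMG), 0, toy$CROPDMG)
total0_sum <- aggregate(TOTAL0 ~ EVTYPE, data = toy, FUN = sum)
print(total0_sum) # FLOOD combined damage: 610
```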
Plot of total economic damage:
# total economic damage (property + crops)
ggplot(data = ecodmgMax, aes(x = EVTYPE, y = TOTECODMG/10^9)) +
geom_bar(colour = "black", fill = "blue", stat = "identity") + xlab("Event Type") +
ylab("Total damage, bln USD") + ggtitle("Total economic damage 1950 - 2011, billions USD") +
theme(axis.text.x = element_text(angle = 90, hjust = 1))
Tornadoes caused the greatest number of fatalities (5,633) and injuries (91,346). Excessive heat ranks second for fatalities (1,903); its 6,525 injuries place it fourth, behind thunderstorm wind (6,957 injuries, the second most harmful cause) and flood (6,789 injuries).
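Floods caused the greatest economic damage, 138,007,444,500 USD (property and crop damage combined), followed by hurricanes and typhoons with 29,348,167,800 USD. The combined totals include only records for which both the property and the crop damage value could be computed, because rows with a missing value are dropped during aggregation; this is why they are lower than the property damage totals reported above.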