The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.
This analysis attempts to answer the following questions:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Across the United States, which types of events have the greatest economic consequences?
The intended audience for this report is a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events.
This report will provide an analysis of the data but will not make specific recommendations.
The data is sourced from the NOAA Storm Database data set provided here.
The first step, as always, is to read the data:
# The file is bzip2-compressed, so open it through a bzfile connection.
noaa <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
To determine which types of events are most harmful to health, both the fatalities and the injuries caused by an event are relevant. The summary below aggregates the fatalities per event type:
require(plyr)
## Loading required package: plyr
harmful_index<-ddply(noaa, c("EVTYPE"), summarise, N=length(FATALITIES),
TOTAL_FATALITIES=sum(FATALITIES),
AVG_FATALITIES=mean(FATALITIES),
MEDIAN_FATALITIES=median(FATALITIES)
)
And sort it according to the total number of fatalities:
harmful_index<-harmful_index[order(-harmful_index$TOTAL_FATALITIES),]
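Since injuries are also relevant to population health, the same kind of summary could be extended with injury counts. The following is a minimal sketch in base R on a small made-up data frame (the values are illustrative, not taken from the database). It assumes the data set's `INJURIES` column and introduces a hypothetical `TOTAL_CASUALTIES` column as the sum of both measures.

```r
# Toy data standing in for the storm database (values are made up).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HEAT", "HEAT", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3, 0),
  INJURIES   = c(10, 4, 0, 2, 1)
)

# Sum fatalities and injuries per event type.
casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Hypothetical combined measure: fatalities plus injuries.
casualties$TOTAL_CASUALTIES <- casualties$FATALITIES + casualties$INJURIES

# Sort from most to least harmful.
casualties <- casualties[order(-casualties$TOTAL_CASUALTIES), ]
print(casualties)
```

Whether fatalities and injuries should be weighted equally is a judgment call; the plain sum is used here only to keep the sketch short.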
To answer the second question, we need to determine the total cost of each event. This is done by multiplying each reported damage amount by its magnitude and then adding the crop damage and property damage:
# Scale a damage amount by its magnitude code, row by row.
# H = hundreds, K = thousands, M = millions, B = billions;
# any other code leaves the amount unscaled.
calc_exp <- function(x, exp) {
  exp <- toupper(as.character(exp))
  mult <- ifelse(exp == "B", 1e9,
          ifelse(exp == "M", 1e6,
          ifelse(exp == "K", 1e3,
          ifelse(exp == "H", 1e2, 1))))
  x * mult
}
noaa$TOTCOST <- calc_exp(noaa$PROPDMG, noaa$PROPDMGEXP) + calc_exp(noaa$CROPDMG, noaa$CROPDMGEXP)
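As a small worked example of the magnitude scaling, the sketch below applies a lookup of multipliers to a few made-up damage values. The helper name `scale_damage` and the lookup vector `multipliers` are hypothetical, and treating unrecognised codes as a multiplier of 1 is an assumption rather than something taken from the data documentation.

```r
# Multiplier for each magnitude code: hundreds, thousands, millions, billions.
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: scale each amount by the multiplier of its own code.
scale_damage <- function(amount, code) {
  m <- multipliers[toupper(as.character(code))]  # NA for unrecognised codes
  m[is.na(m)] <- 1                               # assumption: leave those unscaled
  amount * unname(m)
}

# Made-up amounts and codes, including an empty and a lowercase code.
amount <- c(25, 2.5, 1.2, 5, 10)
code   <- c("K", "M", "B", "", "k")
scaled <- scale_damage(amount, code)
print(scaled)
```

Each amount is scaled by the code on its own row, so 25 with code K becomes 25,000 while the amount with an empty code stays at 5.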
This is then summarised into a total and an average cost per event type:
require(plyr)
cost_index<-ddply(noaa, c("EVTYPE"), summarise,
N=length(TOTCOST),
TOT_COST=sum(TOTCOST),
AVG_COST=mean(TOTCOST))
And sort it according to the total cost:
cost_index<-cost_index[order(-cost_index$TOT_COST),]
The assumption is made that the number of fatalities is the best indicator of which types of events are most harmful to population health. Event types are ranked by total fatalities, with the average and median number of fatalities per event reported alongside.
The following table lists the 10 most harmful event types:
harmful_index[1:10,]
##              EVTYPE      N TOTAL_FATALITIES AVG_FATALITIES MEDIAN_FATALITIES
## 834         TORNADO  60652             5633       0.092874                 0
## 130  EXCESSIVE HEAT   1678             1903       1.134088                 0
## 153     FLASH FLOOD  54277              978       0.018019                 0
## 275            HEAT    767              937       1.221643                 0
## 464       LIGHTNING  15754              816       0.051796                 0
## 856       TSTM WIND 219940              504       0.002292                 0
## 170           FLOOD  25326              470       0.018558                 0
## 585     RIP CURRENT    470              368       0.782979                 1
## 359       HIGH WIND  20212              248       0.012270                 0
## 19        AVALANCHE    386              224       0.580311                 0
Or in graphical format:
TOP10<-harmful_index[1:10,]
barplot(TOP10$TOTAL_FATALITIES,
main="Total fatalities per event type",
xlab="Event type",
names.arg=TOP10$EVTYPE)
The assumption is made that the total cost per event type is the best indicator of which types of events have the largest economic consequences.
cost_index[1:10,]
##                 EVTYPE      N  TOT_COST  AVG_COST
## 834            TORNADO  60652 3.312e+15 5.461e+10
## 153        FLASH FLOOD  54277 1.599e+15 2.947e+10
## 856          TSTM WIND 219940 1.445e+15 6.571e+09
## 244               HAIL 288661 1.268e+15 4.394e+09
## 170              FLOOD  25326 1.068e+15 4.217e+10
## 760  THUNDERSTORM WIND  82563 9.436e+14 1.143e+10
## 464          LIGHTNING  15754 6.069e+14 3.853e+10
## 786 THUNDERSTORM WINDS  20843 4.650e+14 2.231e+10
## 359          HIGH WIND  20212 3.420e+14 1.692e+10
## 972       WINTER STORM  11433 1.347e+14 1.178e+10
This can be graphically summarised as follows:
TOP10COST<-cost_index[1:10,]
barplot(TOP10COST$TOT_COST,
main="Total cost per event type",
xlab="Event type",
names.arg=TOP10COST$EVTYPE)
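The cost table lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate event types, although they appear to describe the same kind of event. One possible refinement, sketched below on made-up values, is to map such variants to a single label before aggregating. The helper `normalise_evtype` and its replacement rules are hypothetical and cover only these three spellings.

```r
# Hypothetical helper: collapse spelling variants of thunderstorm wind events.
normalise_evtype <- function(evtype) {
  ev <- toupper(trimws(as.character(evtype)))
  ev <- sub("^TSTM WIND$", "THUNDERSTORM WIND", ev)
  ev <- sub("^THUNDERSTORM WINDS$", "THUNDERSTORM WIND", ev)
  ev
}

# Toy data with made-up costs.
toy_cost <- data.frame(
  EVTYPE  = c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS", "HAIL"),
  TOTCOST = c(100, 200, 50, 300)
)
toy_cost$EVTYPE <- normalise_evtype(toy_cost$EVTYPE)

# Total cost per cleaned event type.
merged <- aggregate(TOTCOST ~ EVTYPE, data = toy_cost, FUN = sum)
print(merged)
```

Applied to the full data set, a mapping like this would change both the counts and the ranking, so any set of rules should be checked against the event type definitions before it is used.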