Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
For now we present only a descriptive analysis.
data<-read.csv(bzfile('repdata_data_StormData.csv.bz2'))
The first problem we face is that the classification of event types is too large.
# unique() works whether EVTYPE was read as a factor or as character
length(unique(data$EVTYPE))
## [1] 985
Some of these event types have very low frequencies, so we could aggregate or collapse them into more general categories. However, since we are not experts in the subject, we do not attempt this and treat each type as distinct from the rest.
Our objective is to identify which types of events are the most harmful to population health, and which have the greatest economic consequences. For public health we created a new variable that adds the fatalities and the injuries of each event, giving the number of 'VICTIMS'; for the economic damage we add the property damage and the crop damage, giving a single economic 'DAMAGE' variable. Note that this sum uses the raw PROPDMG and CROPDMG values without applying the PROPDMGEXP and CROPDMGEXP multipliers, so DAMAGE is not expressed in dollars and should be read as a rough indicator only.
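Even without domain expertise, some of the duplication among event types is purely typographic (case and stray spaces). The following is a minimal sketch of that kind of cleanup, run on a made-up vector rather than the real database; `normalize_evtype` is a hypothetical helper, not part of the analysis above.

```r
# Toy stand-in for data$EVTYPE; these labels are illustrative only
evtype <- c("TSTM WIND", "tstm wind", " TSTM WIND", "Heat Wave", "HEAT  WAVE", "HAIL")

# Normalise case and whitespace only; no domain knowledge is involved
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # same case, no leading/trailing blanks
  gsub("[[:space:]]+", " ", x)           # runs of whitespace become one space
}

evtype_clean <- normalize_evtype(evtype)
length(unique(evtype))        # distinct labels before cleaning
length(unique(evtype_clean))  # distinct labels after cleaning
```

This only merges labels that differ in spelling conventions; deciding whether two genuinely different labels describe the same phenomenon would still need expert judgment.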
data$VICTIMS<-data$FATALITIES+data$INJURIES
data$DAMAGE<-data$PROPDMG+data$CROPDMG
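The storm database stores each damage figure together with a magnitude code (PROPDMGEXP for property, CROPDMGEXP for crops), which the sum above does not use. The sketch below shows one way the codes could be applied, on made-up rows. The mapping (H, K, M, B as hundreds, thousands, millions, billions, anything else as 1) is an assumption, and `exp_to_multiplier` and `PROPDMG_USD` are hypothetical names.

```r
# Toy rows; the values are made up for illustration
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed multipliers: H = 1e2, K = 1e3, M = 1e6, B = 1e9; any other code counts as 1
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]  # unknown or empty codes give NA
  mult[is.na(mult)] <- 1
  unname(mult)
}

toy$PROPDMG_USD <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy
```

The same function could be applied to CROPDMG and CROPDMGEXP before adding the two amounts. The results reported in this document were computed without this step.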
The next figure summarizes this information, showing the top five event types for each variable.
# Three stacked panels: frequency, total victims, total damage
par(mfrow=c(3,1))
barplot(sort(table(data$EVTYPE),decreasing=TRUE)[1:5],main='Frequency')
# Total victims per event type, top five
vic_tot<-aggregate(VICTIMS~EVTYPE,FUN=sum,data=data)
vic_top<-vic_tot[order(vic_tot$VICTIMS,decreasing=TRUE),][1:5,]
barplot(vic_top$VICTIMS,main='Total victims',names.arg=vic_top$EVTYPE)
# Total damage per event type, top five (raw PROPDMG + CROPDMG, not scaled to dollars)
dmg_tot<-aggregate(DAMAGE~EVTYPE,FUN=sum,data=data)
dmg_top<-dmg_tot[order(dmg_tot$DAMAGE,decreasing=TRUE),][1:5,]
barplot(dmg_top$DAMAGE,main='Total damage',names.arg=dmg_top$EVTYPE,ylab='DAMAGE (unscaled)')
Another way to assess the impact is to take into account how often each type occurs. We can accomplish this by considering the mean per event for each storm type. A boxplot by type could also help here; below we simply list the five types with the largest mean for each variable:
vic_rel<-aggregate(VICTIMS~EVTYPE,FUN=mean,data=data)
vic_rel[order(vic_rel$VICTIMS,decreasing=TRUE),][1:5,]
##                         EVTYPE VICTIMS
## 273                  Heat Wave   70.00
## 847      TROPICAL STORM GORDON   51.00
## 955                 WILD FIRES   38.25
## 756              THUNDERSTORMW   27.00
## 833 TORNADOES, TSTM WIND, HAIL   25.00
dam_rel<-aggregate(DAMAGE~EVTYPE,FUN=mean,data=data)
dam_rel[order(dam_rel$DAMAGE,decreasing=TRUE),][1:5,]
##                     EVTYPE DAMAGE
## 847  TROPICAL STORM GORDON   1000
## 43         COASTAL EROSION    766
## 285   HEAVY RAIN AND FLOOD    600
## 585 RIVER AND STREAM FLOOD    600
## 440              Landslump    570
But when we count how many events of these five types were recorded, we find only 8 in total, although they were severe in terms of victims:
nrow(data[data$EVTYPE %in% as.character(vic_rel[order(vic_rel$VICTIMS,decreasing=TRUE),][1:5,1]),])
## [1] 8
The five types with the largest mean damage account for just 6 events:
nrow(data[data$EVTYPE %in% as.character(dam_rel[order(dam_rel$DAMAGE,decreasing=TRUE),][1:5,1]),])
## [1] 6
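Because a mean computed from one or two events can dominate the ranking, it can help to look at the mean, the total, and the event count side by side. The sketch below does this on made-up data; the type names, the values, and the `min_events` threshold are all hypothetical.

```r
# Toy data; event names and values are illustrative only
toy <- data.frame(
  EVTYPE  = c("A", "A", "A", "A", "B", "C", "C"),
  VICTIMS = c(2, 0, 4, 2, 70, 1, 3),
  stringsAsFactors = FALSE
)

# Mean, total and number of events per type (each tapply uses the same group order)
means <- tapply(toy$VICTIMS, toy$EVTYPE, mean)
by_type <- data.frame(
  EVTYPE = names(means),
  mean   = as.vector(means),
  total  = as.vector(tapply(toy$VICTIMS, toy$EVTYPE, sum)),
  n      = as.vector(tapply(toy$VICTIMS, toy$EVTYPE, length)),
  stringsAsFactors = FALSE
)

# Keep only types with at least min_events records before ranking by mean
min_events <- 2  # hypothetical threshold, chosen only for this example
frequent <- by_type[by_type$n >= min_events, ]
frequent[order(frequent$mean, decreasing = TRUE), ]
```

In this toy example type B has by far the largest mean but rests on a single event, so it drops out of the filtered ranking while remaining visible in `by_type`.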
A better analysis could be accomplished by someone with a stronger background in the topic.
But it is clear to us that this type of data comes with some complications.
First, prevention should obviously focus on the types that have shown the largest total impact, but we cannot ignore those that, although rare, have a large individual impact. These two priorities may conflict.
Second, the nature of the phenomenon is itself complex: is this classification accurate, and is it the best one available for addressing this problem? We really have no answer.