Synopsis

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

For now, we present only a descriptive analysis.

Data Processing

data<-read.csv(bzfile('repdata_data_StormData.csv.bz2'))

The first problem we face is that the classification of event types has too many categories.

length(unique(data$EVTYPE))
## [1] 985

Some of these event types have very low frequencies, so we could aggregate or collapse them into more general categories. However, since we are not experts in the matter, we do not attempt this and treat each type as distinct from the rest.
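Even without domain expertise, some labels may differ only in letter case or stray whitespace, and those can be unified mechanically. The following is a minimal sketch of such a clean-up, shown on a small made-up vector rather than the real data. The helper name `normalize_evtype` is our own invention and is not used in the analysis that follows.

```r
# Hypothetical helper: normalise labels mechanically, without merging
# categories that would require expert judgement.
normalize_evtype <- function(x) {
  x <- toupper(trimws(as.character(x)))  # unify case, drop outer whitespace
  gsub("\\s+", " ", x)                   # collapse repeated inner whitespace
}

# Made-up labels, for illustration only
evtype_example <- c("Heat Wave", "HEAT WAVE", " heat  wave", "TORNADO")
evtype_clean <- normalize_evtype(evtype_example)

length(unique(evtype_example))  # number of distinct labels before
length(unique(evtype_clean))    # number of distinct labels after
```

This only removes purely cosmetic differences. Deciding whether two genuinely different labels describe the same phenomenon would still need someone who knows the subject.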

Our objective is to identify which types of events are the most harmful to population health, and which have the greatest economic consequences. For public health, we create a new variable that adds the fatalities and the injuries of each event, giving the number of ‘VICTIMS’. For the economic consequences, we add the property damage and the crop damage, giving a single economic ‘DAMAGE’ variable. Note that this sum uses the raw PROPDMG and CROPDMG values and does not apply the magnitude codes stored in PROPDMGEXP and CROPDMGEXP, so the damage figures should be read as rough indicators.

data$VICTIMS<-data$FATALITIES+data$INJURIES
data$DAMAGE<-data$PROPDMG+data$CROPDMG
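The storm data also carries the columns PROPDMGEXP and CROPDMGEXP, which hold magnitude codes for the damage figures. The sketch below shows one way the codes could be applied before summing, using a small made-up data frame. It assumes that K, M and B stand for thousands, millions and billions, and it treats every other code (including blanks) as a multiplier of 1, which is an assumption that would need checking against the real data. The helper `exp_to_multiplier` is hypothetical.

```r
# Hypothetical helper: map a magnitude code to a numeric multiplier.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 1  # assumption: unknown or blank codes leave the value unscaled
  unname(mult)
}

# Made-up rows, for illustration only
toy_dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 1),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG    = c(0, 1, 0),
  CROPDMGEXP = c("", "k", ""),
  stringsAsFactors = FALSE
)

# Scale each component by its own code, then add them
toy_dmg$DAMAGE_USD <- toy_dmg$PROPDMG * exp_to_multiplier(toy_dmg$PROPDMGEXP) +
  toy_dmg$CROPDMG * exp_to_multiplier(toy_dmg$CROPDMGEXP)
toy_dmg$DAMAGE_USD
```

Applied to the full data set, a scaling like this would likely change which event types rank highest, so the results reported here should not be assumed to carry over.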

Total impact

The next figure summarizes this information for the five highest-ranking event types for each variable.

par(mfrow=c(3,1))
# Five most frequent event types
barplot(sort(table(data$EVTYPE),decreasing=TRUE)[1:5],main='Frequency')
# Five event types with the most total victims
vic_tot<-aggregate(VICTIMS~EVTYPE,FUN=sum,data=data)
vic_top<-vic_tot[order(vic_tot$VICTIMS,decreasing=TRUE),][1:5,]
barplot(vic_top$VICTIMS,main='Total victims',names.arg=as.character(vic_top$EVTYPE))
# Five event types with the most total damage
dmg_tot<-aggregate(DAMAGE~EVTYPE,FUN=sum,data=data)
dmg_top<-dmg_tot[order(dmg_tot$DAMAGE,decreasing=TRUE),][1:5,]
barplot(dmg_top$DAMAGE,main='Total damage',names.arg=as.character(dmg_top$EVTYPE),ylab='$')

Relative impact

Another way to look at the impact is to take into account the frequency of each type. We can accomplish this by considering the mean per event for each storm type; a boxplot analysis could also help with this. We consider the five types with the highest mean for each variable:

vic_rel<-aggregate(VICTIMS~EVTYPE,FUN=mean,data=data)
vic_rel[order(vic_rel$VICTIMS,decreasing=TRUE),][1:5,]
##                         EVTYPE VICTIMS
## 273                  Heat Wave   70.00
## 847      TROPICAL STORM GORDON   51.00
## 955                 WILD FIRES   38.25
## 756              THUNDERSTORMW   27.00
## 833 TORNADOES, TSTM WIND, HAIL   25.00
dam_rel<-aggregate(DAMAGE~EVTYPE,FUN=mean,data=data)
dam_rel[order(dam_rel$DAMAGE,decreasing=TRUE),][1:5,]
##                     EVTYPE DAMAGE
## 847  TROPICAL STORM GORDON   1000
## 43         COASTAL EROSION    766
## 285   HEAVY RAIN AND FLOOD    600
## 585 RIVER AND STREAM FLOOD    600
## 440              Landslump    570

However, when we look at how often these types occurred, we find that together they account for only 8 events, although quite large ones in terms of victims:

nrow(data[data$EVTYPE %in% as.character(vic_rel[order(vic_rel$VICTIMS,decreasing=TRUE),][1:5,1]),])
## [1] 8

The types with the highest mean damage account for just 6 events:

nrow(data[data$EVTYPE %in% as.character(dam_rel[order(dam_rel$DAMAGE,decreasing=TRUE),][1:5,1]),])
## [1] 6
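Since the highest means come from event types with very few records, it may help to report the number of events next to each mean and to rank only the types that occur often enough. The following is a sketch on made-up data; the threshold `min_events` is a hypothetical choice, not a value taken from this analysis.

```r
# Made-up events, for illustration only
toy_events <- data.frame(
  EVTYPE  = c("A", "A", "A", "A", "B", "C", "C", "C"),
  VICTIMS = c(2, 4, 0, 6, 70, 1, 1, 4),
  stringsAsFactors = FALSE
)

# Mean victims and number of events per type (same ordering in both results)
mean_by_type <- tapply(toy_events$VICTIMS, toy_events$EVTYPE, mean)
n_by_type    <- tapply(toy_events$VICTIMS, toy_events$EVTYPE, length)

summary_by_type <- data.frame(
  EVTYPE       = names(mean_by_type),
  MEAN_VICTIMS = as.vector(mean_by_type),
  N_EVENTS     = as.vector(n_by_type),
  stringsAsFactors = FALSE
)

min_events <- 3  # hypothetical threshold: ignore types seen fewer times than this
frequent <- summary_by_type[summary_by_type$N_EVENTS >= min_events, ]
ranked <- frequent[order(frequent$MEAN_VICTIMS, decreasing = TRUE), ]
ranked
```

In this toy example, type B has by far the highest mean but occurs only once, so it drops out of the ranking. Whether such rare types should be excluded or studied separately is a judgement call.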

Results

A better analysis could be accomplished by someone with a stronger background in the topic.

But it is clear to us that this type of data comes with some complications.

First, prevention should obviously focus on the types that have shown the largest total impact, but we cannot forget those that, even when rare, have a large individual impact. These two priorities may conflict.

On the other hand, the nature of the phenomenon is itself complex: is this classification accurate, and is it the best we could have for addressing this problem? We really have no answer.