Synopsis

There are four variables that describe the effect of weather events on population health and the economy: Fatalities and Injuries for population health, and Property Damage and Crop Damage for economic consequences. Tornadoes are the only weather event with a large impact on all four variables; in fact, they are the leading event type for Injuries, Fatalities and Property Damage. Flash Floods and Floods also rank among the most harmful events for several of the variables. Thunderstorm wind (TSTM WIND) matters mainly for the economic consequences, whereas Lightning mainly affects population health.

Data Processing

The dataset is loaded directly from the compressed file into a raw data.frame called df:

# bzfile() opens the bzip2 file; header, sep and dec repeat read.csv() defaults
df <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), header=TRUE, sep=',', dec='.')
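A minimal sketch of why no separate decompression step is needed: read.csv() accepts a connection, so a bzip2 file can be read through bzfile(). The tiny synthetic file below is a hypothetical stand-in for the real storm data, written to a temporary path and read back.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (illustrative data only)
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,2", "FLOOD,0"), con)
close(con)

# read.csv() opens the connection, decompresses on the fly and closes it again
small <- read.csv(bzfile(path))
str(small)
```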

The first question is:

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Population health is described by two variables: FATALITIES and INJURIES. For each of them, the values are summed by event type, sorted in decreasing order, and normalized to a percentage of the total.

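# Total fatalities per event type, sorted decreasing, as % of all fatalities
fat<-tapply(df$FATALITIES,df$EVTYPE,sum)
fat<-fat[order(-fat)]
fat<-fat/sum(fat)*100
# Same three steps for injuries
inj<-tapply(df$INJURIES,df$EVTYPE,sum)
inj<-inj[order(-inj)]
inj<-inj/sum(inj)*100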
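To illustrate what the sum, sort and normalize steps produce, here is a minimal sketch on a hypothetical toy data frame (illustrative values, not the storm data) that follows the same idiom used for fat and inj.

```r
# Hypothetical toy records standing in for df (illustrative values only)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "LIGHTNING", "FLOOD"),
  FATALITIES = c(5, 1, 3, 1, 0)
)

# Same three steps: sum per event type, sort decreasing, convert to %
share <- tapply(toy$FATALITIES, toy$EVTYPE, sum)
share <- share[order(-share)]
share <- share / sum(share) * 100

print(share)  # TORNADO holds 8 of the 10 toy fatalities
```

Because each entry is a percentage of its own total, the values sum to 100, which is what makes the Fatalities and Injuries bars comparable despite their different raw scales.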

The second question is:

Across the United States, which types of events have the greatest economic consequences?

The economic consequences are described by two variables: Property Damage (PROPDMG) and Crop Damage (CROPDMG). For each of them, the values are summed by event type and sorted in decreasing order; unlike the health variables, they are not normalized.

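# Total property damage per event type, sorted in decreasing order
prop<-tapply(df$PROPDMG,df$EVTYPE,sum)
prop<-prop[order(-prop)]
# Total crop damage per event type, sorted in decreasing order
crop<-tapply(df$CROPDMG,df$EVTYPE,sum)
crop<-crop[order(-crop)]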
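The four aggregations above repeat one pattern. A minimal sketch of factoring it into a single function follows; rank_by_event() is a hypothetical helper, not part of the original analysis, and the toy data are illustrative.

```r
# Hypothetical helper: total a numeric column per event type, sort in decreasing
# order and, optionally, express each total as a percentage of the grand total
rank_by_event <- function(values, events, normalize = FALSE) {
  totals <- tapply(values, events, sum)
  totals <- totals[order(-totals)]
  if (normalize) totals <- totals / sum(totals) * 100
  totals
}

# Toy records (illustrative only)
toy <- data.frame(
  EVTYPE  = c("TSTM WIND", "HAIL", "TSTM WIND", "FLOOD"),
  PROPDMG = c(10, 2.5, 5, 20),
  CROPDMG = c(0, 4, 1, 3)
)

prop_toy <- rank_by_event(toy$PROPDMG, toy$EVTYPE)
crop_toy <- rank_by_event(toy$CROPDMG, toy$EVTYPE)
print(prop_toy)
print(crop_toy)
```

With such a helper, the health vectors would correspond to normalize = TRUE and the economic vectors to the default normalize = FALSE, matching the code above.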

Results

The four variables of interest are highly skewed, as the following summary table shows: the median is zero for all four, the third quartile is zero for all but PROPDMG, and the maxima are very large.

summary(df[,c('FATALITIES','INJURIES','PROPDMG','CROPDMG')])
##    FATALITIES          INJURIES            PROPDMG           CROPDMG       
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00   Min.   :  0.000  
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00   1st Qu.:  0.000  
##  Median :  0.0000   Median :   0.0000   Median :   0.00   Median :  0.000  
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06   Mean   :  1.527  
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50   3rd Qu.:  0.000  
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00   Max.   :990.000
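The quartiles suggest that most records report no harm at all. A minimal sketch for quantifying this directly is to compute the proportion of zeros in each column; the toy data frame below is a hypothetical stand-in for df, and on the real data the same expression would be applied to df[, c('FATALITIES','INJURIES','PROPDMG','CROPDMG')].

```r
# Toy stand-in for the four outcome columns of df (illustrative values only)
outcomes <- data.frame(
  FATALITIES = c(0, 0, 0, 2),
  INJURIES   = c(0, 0, 5, 0),
  PROPDMG    = c(0, 1.5, 25, 0),
  CROPDMG    = c(0, 0, 0, 0)
)

# Comparing a data frame with 0 yields a logical matrix; colMeans() then gives
# the fraction of records equal to zero in each column
zero_share <- colMeans(outcomes == 0)
print(zero_share)
```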

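The distributions of the variables differ. On a log scale, the frequencies of the health variables decrease steadily, whereas the economic variables are more centered, as their histograms show. Since log(0) is -Inf and hist() discards non-finite values, these histograms include only the non-zero records.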

# 2x2 grid of log-scale histograms, one per outcome variable
par(mfrow=c(2,2),cex.axis=0.5,cex.main=1.5,mar=c(5,4,3,4))
hist(log(df$FATALITIES),main='Fatalities',xlab='Log Scale')
hist(log(df$INJURIES),main='Injuries',xlab='Log Scale')
hist(log(df$PROPDMG),main='Property Damages',xlab='Log Scale')
hist(log(df$CROPDMG),main='Crop Damages',xlab='Log Scale')
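Because most values are zero, it is worth checking how many records the log-scale histograms actually display. A minimal sketch on a hypothetical vector: log(0) evaluates to -Inf and hist() drops non-finite values, so only non-zero records contribute to the bars. log1p() is shown as one possible alternative (an assumption, not the choice made in this report) that keeps zero records as a bar at 0.

```r
# Hypothetical damage values; most records are zero, as in the storm data
x <- c(0, 0, 0, 0.5, 2, 150)

logged <- log(x)                 # zeros become -Inf
kept   <- sum(is.finite(logged)) # records hist(log(x)) can actually show
cat("records shown by hist(log(x)):", kept, "of", length(x), "\n")

# log1p(x) = log(1 + x) maps 0 to 0, so zero records stay in the histogram
shifted <- log1p(x)
h <- hist(shifted, plot = FALSE)
```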

The following plot shows the share of total fatalities and injuries caused by the five most harmful event types:

# Top five event types for each health variable, as % of the total
par(mfrow=c(1,2),cex.axis=0.5,cex.main=1.5,mar=c(5,4,3,4))
barplot(fat[1:5],ylab='% of total',main='Fatalities',las=2)
barplot(inj[1:5],ylab='% of total',main='Injuries',las=2)

The following plot shows the total damage caused by the five most damaging event types:

# Top five event types for each economic variable, rescaled by 1e6
par(mfrow=c(1,2),cex.axis=0.5,cex.main=1.5,mar=c(5,4,3,4))
barplot(prop[1:5]/1e6,ylab='Damage (M$)',main='Property Damages',las=2)
barplot(crop[1:5]/1e6,ylab='Damage (M$)',main='Crop Damages',las=2)
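The barplot() calls take the first five entries with [1:5]. A small sketch of a more defensive alternative, assuming the same sorted named vectors: head(x, 5) returns at most five entries, whereas [1:5] pads with NA when fewer than five event types exist (for example, when the analysis is rerun on a filtered subset of df). The vector below is hypothetical.

```r
# Hypothetical sorted totals with only three event types
totals <- c(FLOOD = 20, "TSTM WIND" = 15, HAIL = 2.5)

by_index <- totals[1:5]       # positions 4 and 5 do not exist, so they are NA
by_head  <- head(totals, 5)   # returns just the three available entries

print(by_index)
print(by_head)
```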