Four variables describe the effect of weather events on population health and the economy: Fatalities and Injuries for population health, and Property Damage and Crop Damage for economic consequences. Tornado is the only event type with a large impact on all four variables; it is the leading event for Injuries, Fatalities and Property Damage. Flash Flood and Flood also rank high in many cases. TstmWind matters mainly for the economic consequences, whereas Lightning mainly affects population health.
The dataset is loaded directly from the compressed file into a raw data.frame called df:
df <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), header=TRUE, sep=',', dec='.')
The first question is:
Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Population health is described by two variables: Fatalities and Injuries. Each is summed by event type, sorted in decreasing order, and normalized to a percentage of the total.
fat<-tapply(df$FATALITIES,df$EVTYPE,sum)
fat<-fat[order(-fat)]
fat<-fat/sum(fat)*100
inj<-tapply(df$INJURIES,df$EVTYPE,sum)
inj<-inj[order(-inj)]
inj<-inj/sum(inj)*100
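To illustrate the aggregation and normalization steps on a small scale, here is a minimal sketch using a made-up data frame. The name `toy` and its values are hypothetical and are not taken from the storm data. It sums fatalities per event type, sorts the totals in decreasing order, and converts them to percentages of the overall sum, following the same idiom as the code above.

```r
# Hypothetical toy data, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 1, 3, 2, 1)
)

# Sum fatalities within each event type
toy_fat <- tapply(toy$FATALITIES, toy$EVTYPE, sum)

# Sort in decreasing order of the totals
toy_fat <- toy_fat[order(-toy_fat)]

# Express each total as a percentage of all fatalities
toy_pct <- toy_fat / sum(toy_fat) * 100
print(round(toy_pct, 1))
```

Because the values are shares of the total, they always sum to 100, which makes the top few event types directly comparable across Fatalities and Injuries.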
The second question is:
Across the United States, which types of events have the greatest economic consequences?
The economic consequences are described by two variables: Property Damage and Crop Damage. Each is summed by event type and sorted in decreasing order.
prop<-tapply(df$PROPDMG,df$EVTYPE,sum)
prop<-prop[order(-prop)]
crop<-tapply(df$CROPDMG,df$EVTYPE,sum)
crop<-crop[order(-crop)]
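The sums above use the raw `PROPDMG` and `CROPDMG` values only. In the NOAA storm data these columns are normally read together with the companion columns `PROPDMGEXP` and `CROPDMGEXP`, which encode a magnitude (commonly `K`, `M`, `B`). The following is a minimal sketch of how such multipliers could be applied, assuming that only the codes `K`, `M` and `B` (in either case) are meaningful and that every other code means a multiplier of 1. The helper `exp_to_mult`, the column `PROP_USD` and the toy data are hypothetical and are not part of the analysis in this report.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: only K, M and B carry meaning; any other code maps to 1.
exp_to_mult <- function(code) {
  code <- toupper(as.character(code))
  mult <- c(K = 1e3, M = 1e6, B = 1e9)[code]
  mult[is.na(mult)] <- 1
  unname(mult)
}

# Toy data for illustration only
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 1, 50),
  PROPDMGEXP = c("M", "K", "B", "")
)

# Scale each record to dollars, then aggregate and sort as before
toy$PROP_USD <- toy$PROPDMG * exp_to_mult(toy$PROPDMGEXP)
toy_prop <- tapply(toy$PROP_USD, toy$EVTYPE, sum)
toy_prop <- toy_prop[order(-toy_prop)]
print(toy_prop)
```

Applying the multipliers can change the ranking of event types, so results based on the raw values should be read with that caveat in mind.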
The four variables of interest span a wide range and are dominated by zeros, as is evident from the following summary table.
summary(df[,c('FATALITIES','INJURIES','PROPDMG','CROPDMG')])
##    FATALITIES          INJURIES            PROPDMG           CROPDMG
##  Min.   :  0.0000   Min.   :   0.0000   Min.   :   0.00   Min.   :  0.000
##  1st Qu.:  0.0000   1st Qu.:   0.0000   1st Qu.:   0.00   1st Qu.:  0.000
##  Median :  0.0000   Median :   0.0000   Median :   0.00   Median :  0.000
##  Mean   :  0.0168   Mean   :   0.1557   Mean   :  12.06   Mean   :  1.527
##  3rd Qu.:  0.0000   3rd Qu.:   0.0000   3rd Qu.:   0.50   3rd Qu.:  0.000
##  Max.   :583.0000   Max.   :1700.0000   Max.   :5000.00   Max.   :990.000
The distributions of the variables differ. The health variables are decreasing, whereas the economic variables are more centered, as their histograms show.
par(mfrow=c(2,2),cex.axis=0.5,cex.main=1.5,mar=c(5,4,3,4))
hist(log(df$FATALITIES),main='Fatalities',xlab='Log Scale')
hist(log(df$INJURIES),main='Injuries',xlab='Log Scale')
hist(log(df$PROPDMG),main='Property Damages',xlab='Log Scale')
hist(log(df$CROPDMG),main='Crop Damages',xlab='Log Scale')
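Because most records have a value of zero, `log()` maps them to `-Inf`, and `hist()` drops non-finite values before binning. The histograms above therefore describe only the non-zero records. A minimal sketch of making that filtering explicit, and of reporting how much of the data it removes, is shown below; the vector `x` is made-up toy data, not the storm data.

```r
# Toy vector for illustration only; many entries are zero, as in the real data
x <- c(0, 0, 0, 0, 1, 2, 5, 10, 50)

# Share of records that a log-scale histogram cannot show
zero_share <- mean(x == 0) * 100

# Keep the strictly positive values and take the log explicitly
x_pos <- x[x > 0]
h <- hist(log(x_pos), plot = FALSE)  # plot = FALSE only computes the bins

cat(sprintf("%.1f%% of values are zero; %d values are binned\n",
            zero_share, sum(h$counts)))
```

Reporting the share of zeros alongside each histogram makes clear how much of the data each panel actually represents.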
The plot of the health variables vs the event type is the following:
par(mfrow=c(1,2),cex.axis=0.5,cex.main=1.5,mar=c(5,4,3,4))
barplot(fat[1:5],ylab='% of total',main='Fatalities',las=2)
barplot(inj[1:5],ylab='% of total',main='Injuries',las=2)
The plot of the economic variables vs the event type is the following:
par(mfrow=c(1,2),cex.axis=0.5,cex.main=1.5,mar=c(5,4,3,4))
barplot(prop[1:5]/1e6,ylab='Damage (M$)',main='Property Damages',las=2)
barplot(crop[1:5]/1e6,ylab='Damage (M$)',main='Crop Damages',las=2)