INJURIES and FATALITIES column grouped by EVTYPE.PROPDMG and CROPDMG columns grouped by EVTYPE.| Variable Name | Meaning |
|---|---|
| EVTYPE | Event Type |
| INJURIES | Number of human injuries |
| FATALITIES | Number of human deaths |
| PROPDMG | Property Damage amount in USD |
| PROPDMGEXP | Exponent of PROPDMG |
| CROPDMG | Crop Damage amount in USD |
| CROPDMGEXP | Exponent of CROPDMG |
Download the csv file compressed in bz2 format(around 47MB) from here. After downloading the bz2, let’s import the csv.
if(!file.exists("storm.csv.bz2"))
{
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","storm.csv.bz2")
}
st=read.csv("storm.csv.bz2")
st = st[,c(8,23,24,25,26,27,28)]
We are concerned only about the 7 imported columns.
q1md = aggregate(st$FATALITIES, list(st$EVTYPE),FUN=median)
q1md[,3] = aggregate(st$INJURIES, list(st$EVTYPE),FUN=median)[,2]
q1sm = aggregate(st$FATALITIES, list(st$EVTYPE),FUN=sum)
q1sm[,3] = aggregate(st$INJURIES, list(st$EVTYPE),FUN=sum)[,2]
names(q1md)=c("EVTYPE","FATALITIES","INJURIES")
names(q1sm)=c("EVTYPE","FATALITIES","INJURIES")
Now let’s move to the 2nd question. The maximum economic damage.
The possible values of CROPDMGEXP and PROPDMGEXP are K, M, , B, m, +, 0, 5, 6, ?, 4, 2, 3, h, 7, H, -, 1, 8, k and blank/space character. The two columns contain the exponents of their respective predecessor columns.
Exponent Chart
| Character | Exponent |
|---|---|
| H,h | 10^2 |
| K,k | 10^3 |
| M,m | 10^6 |
| B,b | 10^9 |
| (+) | 1 |
| (-) | 0 |
| (?) | 0 |
| space | 0 |
| numeric | 10^1 |
To find the culprit events causing maximum economic damage, let’s multiply the exponents to their respective base values.
digit = as.character(0:9)
nulll = c(" ","-","?")
#property
st$PROPDMG[st$PROPDMGEXP=="+"]=st$PROPDMG[st$PROPDMGEXP=="+"]*1
st$PROPDMG[toupper(st$PROPDMGEXP)=='H']=st$PROPDMG[toupper(st$PROPDMGEXP)=='H'] * (10^2)
st$PROPDMG[toupper(st$PROPDMGEXP)=='K']=st$PROPDMG[toupper(st$PROPDMGEXP)=='K'] * (10^3)
st$PROPDMG[toupper(st$PROPDMGEXP)=='M']=st$PROPDMG[toupper(st$PROPDMGEXP)=='M'] * (10^6)
st$PROPDMG[toupper(st$PROPDMGEXP)=='B']=st$PROPDMG[toupper(st$PROPDMGEXP)=='B'] * (10^9)
st$PROPDMG[st$PROPDMGEXP %in% digit]=st$PROPDMG[st$PROPDMGEXP %in% digit]*10
st$PROPDMG[st$PROPDMGEXP %in% nulll]=st$PROPDMG[st$PROPDMGEXP %in% nulll]*0
#now crop
st$CROPDMG[st$CROPDMGEXP=="+"]=st$CROPDMG[st$CROPDMGEXP=="+"]*1
st$CROPDMG[toupper(st$CROPDMGEXP)=='H']=st$CROPDMG[toupper(st$CROPDMGEXP)=='H'] * (10^2)
st$CROPDMG[toupper(st$CROPDMGEXP)=='K']=st$CROPDMG[toupper(st$CROPDMGEXP)=='K'] * (10^3)
st$CROPDMG[toupper(st$CROPDMGEXP)=='M']=st$CROPDMG[toupper(st$CROPDMGEXP)=='M'] * (10^6)
st$CROPDMG[toupper(st$CROPDMGEXP)=='B']=st$CROPDMG[toupper(st$CROPDMGEXP)=='B'] * (10^9)
st$CROPDMG[st$CROPDMGEXP %in% digit]=st$CROPDMG[st$CROPDMGEXP %in% digit]*10
st$CROPDMG[st$CROPDMGEXP %in% nulll]=st$CROPDMG[st$CROPDMGEXP %in% nulll]*0
Answer to the first question
bigkillavg = q1md$EVTYPE[q1md$FATALITIES==max(q1md$FATALITIES)]
biginjravg = q1md$EVTYPE[q1md$INJURIES==max(q1md$INJURIES)]
bigkillsum = q1sm$EVTYPE[q1sm$FATALITIES==max(q1sm$FATALITIES)]
biginjrsum = q1sm$EVTYPE[q1sm$INJURIES==max(q1sm$INJURIES)]
paste(biginjravg,biginjrsum,bigkillavg,bigkillsum, sep=" ")
## [1] "Heat Wave TORNADO TORNADOES, TSTM WIND, HAIL TORNADO"
On average highest fatality is caused by tornadoes, tstm wind, hail, while heat wave caused the highest average injuries. Altogether, the biggest killer event was tornado and in total, it caused maximum number of injuries too.
Answer to the 2nd question
q2md = aggregate(st[,c(4,6)],list(st$EVTYPE), FUN=median)
q2sm = aggregate(st[,c(4,6)],list(st$EVTYPE), FUN=sum)
names(q2md)[1] = names(q2sm)[1] = "EVTYPE"
bigpropavg = q2md$EVTYPE[q2md$PROPDMG==max(q2md$PROPDMG)]
bigcropavg = q2md$EVTYPE[q2md$CROPDMG==max(q2md$CROPDMG)]
bigpropsum = q2sm$EVTYPE[q2sm$PROPDMG==max(q2sm$PROPDMG)]
bigcropsum = q2sm$EVTYPE[q2sm$CROPDMG==max(q2sm$CROPDMG)]
print(bigpropavg); print(bigpropsum);print(bigcropavg);print(bigcropsum)
## [1] "TORNADOES, TSTM WIND, HAIL"
## [1] "FLOOD"
## [1] "EXCESSIVE WETNESS"
## [1] "DROUGHT"
On average highest property damage is caused by tornadoes, tstm wind, hail, while excessive wetness caused the highest average crop damage. Altogether, the biggest property damaging events were flood and in total, drought caused maximum crop damage.
On a side note, let’s find out the correlation between injuries and fatalities.
plot(q1md[,3],q1md[,2], xlab = "Injuries", ylab = "Fatalities",main = "Correlation between injuries and fatalities")
lama= lm(q1md[,3]~q1md[,2])
abline(lama)
Thus we found that injuries and fatalities have very high correlation.
And at last,let’s see the most devastating event types.
head(q1sm[order(q1sm$INJURIES+q1sm$FATALITIES,decreasing=TRUE),],10)
## EVTYPE FATALITIES INJURIES
## 834 TORNADO 5633 91346
## 130 EXCESSIVE HEAT 1903 6525
## 856 TSTM WIND 504 6957
## 170 FLOOD 470 6789
## 464 LIGHTNING 816 5230
## 275 HEAT 937 2100
## 153 FLASH FLOOD 978 1777
## 427 ICE STORM 89 1975
## 760 THUNDERSTORM WIND 133 1488
## 972 WINTER STORM 206 1321
head(q2sm[order(q2sm$PROPDMG+q2sm$CROPDMG,decreasing=TRUE),],10)
## EVTYPE PROPDMG CROPDMG
## 170 FLOOD 144657709807 5661968450
## 411 HURRICANE/TYPHOON 69305840000 2607872800
## 834 TORNADO 56937162900 414954710
## 670 STORM SURGE 43323536000 5000
## 244 HAIL 15732269934 3025954653
## 153 FLASH FLOOD 16140815218 1421317100
## 95 DROUGHT 1046106000 13972566000
## 402 HURRICANE 11868319010 2741910000
## 590 RIVER FLOOD 5118945500 5029459000
## 427 ICE STORM 3944928310 5022113500