Extreme weather Events

Synopsis

  • This is analysis of a dataset created from the U.S. National Oceanic and Atmospheric Administration(NOAA)’s storm database.
  • The dataset contains the entries of extreme weather events recorded from the year 1950 to November 2011.
  • The dataset gives important information(eg: location, time, injuries, fatalities, crop and property losses, intensity etc.) about the weather events.
  • To find out the type of events causing maximum direct harm to the population and maximum economic devastation the 8th,23rd,24th,25th,26th,27th and the 28th columns were selected for further analysis.
  • To find the worst event types for public health, we aggregated the INJURIES and FATALITIES column grouped by EVTYPE.
  • We found that in total, tornado caused both the maximum injuries and fatalities.
  • We also got to know that on average, tornadoes, thunderstorms and hailstorms cause the highest fatalities on each single events, while heat-waves on average cause highest injuries on each single events.
  • Before going further, we multiplied the crop and property damage values with their respective exponent figures. Then we aggregated the PROPDMG and CROPDMG columns grouped by EVTYPE.
  • Here we learned that on average tornadoes, thunderstorms and hailstorms caused the maximum property damage, while excessive wetness is the most damaging event as far as crop damages are concerned.
  • On the other hand, in total, flood causes the highest property damage, while drought causes maximum crop damage.
Variable Name Meaning
EVTYPE Event Type
INJURIES Number of human injuries
FATALITIES Number of human deaths
PROPDMG Property Damage amount in USD
PROPDMGEXP Exponent of PROPDMG
CROPDMG Crop Damage amount in USD
CROPDMGEXP Exponent of CROPDMG

Data Processing

Download the csv file compressed in bz2 format(around 47MB) from here. After downloading the bz2, let’s import the csv.

if(!file.exists("storm.csv.bz2"))
{
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","storm.csv.bz2")
}
st=read.csv("storm.csv.bz2")
st = st[,c(8,23,24,25,26,27,28)]

We are concerned only about the 7 imported columns.

q1md = aggregate(st$FATALITIES, list(st$EVTYPE),FUN=median)
q1md[,3] = aggregate(st$INJURIES, list(st$EVTYPE),FUN=median)[,2]
q1sm = aggregate(st$FATALITIES, list(st$EVTYPE),FUN=sum)
q1sm[,3] = aggregate(st$INJURIES, list(st$EVTYPE),FUN=sum)[,2]
names(q1md)=c("EVTYPE","FATALITIES","INJURIES")
names(q1sm)=c("EVTYPE","FATALITIES","INJURIES")

Now let’s move to the 2nd question. The maximum economic damage.

The possible values of CROPDMGEXP and PROPDMGEXP are K, M, , B, m, +, 0, 5, 6, ?, 4, 2, 3, h, 7, H, -, 1, 8, k and blank/space character. The two columns contain the exponents of their respective predecessor columns.

Exponent Chart

Character Exponent
H,h 10^2
K,k 10^3
M,m 10^6
B,b 10^9
(+) 1
(-) 0
(?) 0
space 0
numeric 10^1

To find the culprit events causing maximum economic damage, let’s multiply the exponents to their respective base values.

digit = as.character(0:9)
nulll = c(" ","-","?")
#property
st$PROPDMG[st$PROPDMGEXP=="+"]=st$PROPDMG[st$PROPDMGEXP=="+"]*1
st$PROPDMG[toupper(st$PROPDMGEXP)=='H']=st$PROPDMG[toupper(st$PROPDMGEXP)=='H'] * (10^2)
st$PROPDMG[toupper(st$PROPDMGEXP)=='K']=st$PROPDMG[toupper(st$PROPDMGEXP)=='K'] * (10^3)
st$PROPDMG[toupper(st$PROPDMGEXP)=='M']=st$PROPDMG[toupper(st$PROPDMGEXP)=='M'] * (10^6)
st$PROPDMG[toupper(st$PROPDMGEXP)=='B']=st$PROPDMG[toupper(st$PROPDMGEXP)=='B'] * (10^9)
st$PROPDMG[st$PROPDMGEXP %in% digit]=st$PROPDMG[st$PROPDMGEXP %in% digit]*10
st$PROPDMG[st$PROPDMGEXP %in% nulll]=st$PROPDMG[st$PROPDMGEXP %in% nulll]*0
#now crop
st$CROPDMG[st$CROPDMGEXP=="+"]=st$CROPDMG[st$CROPDMGEXP=="+"]*1
st$CROPDMG[toupper(st$CROPDMGEXP)=='H']=st$CROPDMG[toupper(st$CROPDMGEXP)=='H'] * (10^2)
st$CROPDMG[toupper(st$CROPDMGEXP)=='K']=st$CROPDMG[toupper(st$CROPDMGEXP)=='K'] * (10^3)
st$CROPDMG[toupper(st$CROPDMGEXP)=='M']=st$CROPDMG[toupper(st$CROPDMGEXP)=='M'] * (10^6)
st$CROPDMG[toupper(st$CROPDMGEXP)=='B']=st$CROPDMG[toupper(st$CROPDMGEXP)=='B'] * (10^9)
st$CROPDMG[st$CROPDMGEXP %in% digit]=st$CROPDMG[st$CROPDMGEXP %in% digit]*10
st$CROPDMG[st$CROPDMGEXP %in% nulll]=st$CROPDMG[st$CROPDMGEXP %in% nulll]*0

Results

Answer to the first question

bigkillavg = q1md$EVTYPE[q1md$FATALITIES==max(q1md$FATALITIES)]
biginjravg = q1md$EVTYPE[q1md$INJURIES==max(q1md$INJURIES)]
bigkillsum = q1sm$EVTYPE[q1sm$FATALITIES==max(q1sm$FATALITIES)]
biginjrsum = q1sm$EVTYPE[q1sm$INJURIES==max(q1sm$INJURIES)]
paste(biginjravg,biginjrsum,bigkillavg,bigkillsum, sep="    ")
## [1] "Heat Wave    TORNADO    TORNADOES, TSTM WIND, HAIL    TORNADO"

On average highest fatality is caused by tornadoes, tstm wind, hail, while heat wave caused the highest average injuries. Altogether, the biggest killer event was tornado and in total, it caused maximum number of injuries too.

Answer to the 2nd question

q2md = aggregate(st[,c(4,6)],list(st$EVTYPE), FUN=median)
q2sm = aggregate(st[,c(4,6)],list(st$EVTYPE), FUN=sum)
names(q2md)[1] = names(q2sm)[1] = "EVTYPE"
bigpropavg = q2md$EVTYPE[q2md$PROPDMG==max(q2md$PROPDMG)]
bigcropavg = q2md$EVTYPE[q2md$CROPDMG==max(q2md$CROPDMG)]
bigpropsum = q2sm$EVTYPE[q2sm$PROPDMG==max(q2sm$PROPDMG)]
bigcropsum = q2sm$EVTYPE[q2sm$CROPDMG==max(q2sm$CROPDMG)]
print(bigpropavg); print(bigpropsum);print(bigcropavg);print(bigcropsum)
## [1] "TORNADOES, TSTM WIND, HAIL"
## [1] "FLOOD"
## [1] "EXCESSIVE WETNESS"
## [1] "DROUGHT"

On average highest property damage is caused by tornadoes, tstm wind, hail, while excessive wetness caused the highest average crop damage. Altogether, the biggest property damaging events were flood and in total, drought caused maximum crop damage.

On a side note, let’s find out the correlation between injuries and fatalities.

plot(q1md[,3],q1md[,2], xlab = "Injuries", ylab = "Fatalities",main = "Correlation between injuries and fatalities")
lama= lm(q1md[,3]~q1md[,2])
abline(lama)

Thus we found that injuries and fatalities have very high correlation.

And at last,let’s see the most devastating event types.

head(q1sm[order(q1sm$INJURIES+q1sm$FATALITIES,decreasing=TRUE),],10)
##                EVTYPE FATALITIES INJURIES
## 834           TORNADO       5633    91346
## 130    EXCESSIVE HEAT       1903     6525
## 856         TSTM WIND        504     6957
## 170             FLOOD        470     6789
## 464         LIGHTNING        816     5230
## 275              HEAT        937     2100
## 153       FLASH FLOOD        978     1777
## 427         ICE STORM         89     1975
## 760 THUNDERSTORM WIND        133     1488
## 972      WINTER STORM        206     1321
head(q2sm[order(q2sm$PROPDMG+q2sm$CROPDMG,decreasing=TRUE),],10)
##                EVTYPE      PROPDMG     CROPDMG
## 170             FLOOD 144657709807  5661968450
## 411 HURRICANE/TYPHOON  69305840000  2607872800
## 834           TORNADO  56937162900   414954710
## 670       STORM SURGE  43323536000        5000
## 244              HAIL  15732269934  3025954653
## 153       FLASH FLOOD  16140815218  1421317100
## 95            DROUGHT   1046106000 13972566000
## 402         HURRICANE  11868319010  2741910000
## 590       RIVER FLOOD   5118945500  5029459000
## 427         ICE STORM   3944928310  5022113500