Weather and Storm event damages

The following analysis uses data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer two questions:

  1. Which types of weather events have caused the greatest harm to population health across the US?
  2. Which types of weather events have caused the greatest economic damage across the US?

Synopsis

In the analysis, a raw CSV file containing storm event data was loaded into R. Since the aim was to answer questions about health and economic losses, only the columns related to those losses were retained. The type of each event was identified and stored in a new column. Economic losses (property and crop) were calculated from the corresponding value and exponent columns. Health and economic losses were then summed per event type, stored in new variables, and plotted. According to the data, tornadoes caused the greatest health loss and floods caused the greatest economic loss in the US.


Data processing

The data is provided in the compressed file repdata-data-StormData.csv.bz2, which contains 902297 observations of 37 columns.

Reading data

The code begins by reading the data. The function read.csv was used to load it into the variable data; although repdata-data-StormData.csv.bz2 is bzip2-compressed, read.csv decompresses it transparently. The data was then subset to the columns for event type (EVTYPE), health losses (FATALITIES, INJURIES) and economic losses (PROPDMG, CROPDMG and their exponents PROPDMGEXP, CROPDMGEXP). Because loading the data takes a considerable amount of time, the chunk was cached with cache=TRUE.

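data <- read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = FALSE)
# keep EVTYPE (column 8) and FATALITIES to CROPDMGEXP (columns 23 to 28)
data <- data[, c(8, 23:28)]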
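read.csv can open the .bz2 file directly because R's file connections detect gzip, bzip2 and xz compression when a file is opened for reading. A minimal sketch on a tiny, made-up file (the temporary file and its two rows are hypothetical, not taken from the storm data) shows the round trip:

```r
# Write a tiny, made-up CSV through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLASH FLOOD"),
                     FATALITIES = c(1, 0)),
          file = con, row.names = FALSE)
close(con)

# read.csv needs no explicit decompression step
small <- read.csv(tmp, stringsAsFactors = FALSE)
str(small)
```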

Functions for data processing

The following functions were written to process the data. They keep the main processing code simple and readable.

eventType Function

Since the EVTYPE column contains many kinds of events under many different names, a function was written to identify which category each observation belongs to. The function takes a string as input and returns a string naming the event category. The following categories were identified for the analysis:

  • Thunderstorm
  • Flood
  • Rain
  • Tornado
  • Tide
  • Avalanche
  • Blizzard
  • Drought
  • Wind
  • Hail
  • Snow
  • Storm
  • Lightening
  • Winter
  • Others (rest of the events)

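The function uses grepl to match EVTYPE against each of the strings above, in the order listed. Since EVTYPE contains both uppercase and lowercase letters, each pattern lists both cases of every letter (for example [Ff][Ll][Oo][Oo][Dd]). Some observations match more than one pattern; for those, the first match is stored, so "WINTER STORM", for instance, is classified as STORM rather than WINTER. Note also that the lightning pattern is spelled LIGHTENING, so entries recorded as "LIGHTNING" do not match it and fall into OTHERS, which is why no LIGHTENING row appears in the result tables.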

# Map a raw EVTYPE string to one broad event category.
# Patterns are tested in order and the first match wins;
# each [Xx] class matches a letter in either case.
eventType <- function(a){
    if(grepl("[Tt][Hh][Uu][Nn][Dd][Ee][Rr][Ss][Tt][Oo][Rr][Mm]", a)){
        b <- "THUNDERSTORM"
    } else if(grepl("[Ff][Ll][Oo][Oo][Dd]", a)){
        b <- "FLOOD"
    } else if(grepl("[Rr][Aa][Ii][Nn]", a)){
        b <- "RAIN"
    } else if(grepl("[Tt][Oo][Rr][Nn][Aa][Dd][Oo]", a)){
        b <- "TORNADO"
    } else if(grepl("[Tt][Ii][Dd][Ee]", a)){
        b <- "TIDE"
    } else if(grepl("[Aa][Vv][Aa][Ll][Aa][Nn][Cc][Hh][Ee]", a)){
        b <- "AVALANCHE"
    } else if(grepl("[Bb][Ll][Ii][Zz][Zz][Aa][Rr][Dd]", a)){
        b <- "BLIZZARD"
    } else if(grepl("[Dd][Rr][Oo][Uu][Gg][Hh][Tt]", a)){
        b <- "DROUGHT"
    } else if(grepl("[Ww][Ii][Nn][Dd]", a)){
        b <- "WIND"
    } else if(grepl("[Hh][Aa][Ii][Ll]", a)){
        b <- "HAIL"
    } else if(grepl("[Ss][Nn][Oo][Ww]", a)){
        b <- "SNOW"
    } else if(grepl("[Ss][Tt][Oo][Rr][Mm]", a)){
        b <- "STORM"
    } else if(grepl("[Ll][Ii][Gg][Hh][Tt][Ee][Nn][Ii][Nn][Gg]", a)){
        b <- "LIGHTENING"
    } else if(grepl("[Ww][Ii][Nn][Tt][Ee][Rr]", a)){
        b <- "WINTER"
    } else {
        b <- "OTHERS"    # no pattern matched
    }
    b
}
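The fourteen if/else branches can be collapsed into a loop over a vector of category names, with grepl(..., ignore.case = TRUE) replacing the [Xx] character classes. A minimal sketch, not the report's own code (the name eventTypeCompact is hypothetical); it keeps the same first-match order, including the LIGHTENING spelling, so it should return the same labels as eventType:

```r
# Hypothetical compact equivalent of eventType(): same categories, same order
eventTypeCompact <- function(a) {
    categories <- c("THUNDERSTORM", "FLOOD", "RAIN", "TORNADO", "TIDE",
                    "AVALANCHE", "BLIZZARD", "DROUGHT", "WIND", "HAIL",
                    "SNOW", "STORM", "LIGHTENING", "WINTER")
    for (category in categories) {
        # ignore.case = TRUE matches any mix of upper- and lowercase letters
        if (grepl(category, a, ignore.case = TRUE)) {
            return(category)
        }
    }
    "OTHERS"   # nothing matched
}

examples <- c("Thunderstorm Winds", "FLASH FLOOD", "TSTM WIND",
              "WINTER STORM", "LIGHTNING")
vapply(examples, eventTypeCompact, character(1), USE.NAMES = FALSE)
```

Because the category names are plain words, default regular-expression matching is sufficient; fixed = TRUE would override ignore.case, so it is not used here.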

expAsNumeric Function

For economic losses there are two kinds of columns: one holding a value and one holding an exponent. The exponent columns contain codes such as “H” and “K”, which stand for hundred and thousand respectively. The function takes an exponent code as input and returns a number. The following exponent codes were identified:

  • H/h - hundred - 2
  • K/k - thousand - 3
  • M/m - million - 6
  • B/b - billion - 9
  • digits 1 to 9 - the digit's own value
  • all other symbols (including empty values) - 0

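The function returns 10 raised to the corresponding exponent, i.e. the multiplier applied to the damage value. For example, a PROPDMG of 25 with PROPDMGEXP “K” gives 25 × 10^3 = 25,000 USD.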

# Convert an exponent code (e.g. "K", "m", "5") into the multiplier 10^b
expAsNumeric <- function(a){
    a <- as.character(a)                  # also works if a is a factor
    if(a %in% c("K", "k")){
        b <- 3                            # thousand
    } else if(a %in% c("M", "m")){
        b <- 6                            # million
    } else if(a %in% c("B", "b")){
        b <- 9                            # billion
    } else if(a %in% c("H", "h")){
        b <- 2                            # hundred
    } else if(a %in% as.character(1:9)){
        b <- as.numeric(a)                # a digit is the exponent itself
    } else {
        b <- 0                            # "", "+", "-", "?", "0", NA, ...
    }
    10^b
}
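Because expAsNumeric handles one value at a time, it has to be called once per row. A vectorised sketch based on a named lookup vector (the name expAsNumericVec is hypothetical, and the example values are made up) produces the same multipliers for a whole column at once:

```r
# Hypothetical vectorised equivalent of expAsNumeric()
expAsNumericVec <- function(a) {
    a <- toupper(as.character(a))
    # exponent for each known code; digits 1-9 stand for themselves
    lookup <- c(H = 2, K = 3, M = 6, B = 9,
                setNames(1:9, as.character(1:9)))
    b <- unname(lookup[a])    # NA for codes missing from the table
    b[is.na(b)] <- 0          # "", "+", "-", "?", "0", NA -> multiplier 1
    10^b
}

value <- c(25, 2.5, 3, 10)
code  <- c("K", "m", "5", "")
value * expAsNumericVec(code)   # values: 25000, 2500000, 300000, 10
```

With such a helper, a whole damage column could be computed in one step, e.g. expAsNumericVec(data$PROPDMGEXP) * data$PROPDMG.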

Processing of data

A loop was run over all observations, using the functions defined above. In each iteration the following tasks are performed:

  • Identifying the event type with eventType and storing it in column EVENTTYPE
  • Converting the exponent with expAsNumeric and multiplying it by the coefficient to obtain the property damage (PROPDAMAGE) and crop damage (CROPDAMAGE)

Because processing the data takes a considerable amount of time, this chunk was also cached with cache=TRUE.

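# Pre-allocate the derived columns so each iteration fills one element
data$EVENTTYPE <- character(nrow(data))
data$PROPDAMAGE <- numeric(nrow(data))
data$CROPDAMAGE <- numeric(nrow(data))
for(i in seq_len(nrow(data))){
    data$EVENTTYPE[i] <- eventType(data$EVTYPE[i])
    # damage in USD = coefficient * 10^exponent
    data$PROPDAMAGE[i] <- expAsNumeric(data$PROPDMGEXP[i]) * data$PROPDMG[i]
    data$CROPDAMAGE[i] <- expAsNumeric(data$CROPDMGEXP[i]) * data$CROPDMG[i]
}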
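Calling eventType once per row repeats the same pattern tests for every occurrence of a repeated EVTYPE string. A sketch of a faster pattern, assuming EVTYPE contains many repeated values: classify each distinct string once and map the labels back with match(), and compute the damage column with vapply() instead of an explicit loop. The toy data frame and the two stand-in functions classify and multiplier are hypothetical, there only so the block runs on its own; in the report, eventType and expAsNumeric would take their place.

```r
# Hypothetical stand-ins for eventType() and expAsNumeric()
classify <- function(a) if (grepl("flood", a, ignore.case = TRUE)) "FLOOD" else "OTHERS"
multiplier <- function(e) {
    e <- toupper(e)
    if (e == "K") 1e3 else if (e == "M") 1e6 else 1
}

toy <- data.frame(EVTYPE = c("FLASH FLOOD", "TORNADO", "FLASH FLOOD", "Flood"),
                  PROPDMG = c(5, 2, 1, 3),
                  PROPDMGEXP = c("K", "M", "", "k"),
                  stringsAsFactors = FALSE)

# Classify each distinct EVTYPE once, then map the labels back to every row
types <- unique(toy$EVTYPE)
labels <- vapply(types, classify, character(1), USE.NAMES = FALSE)
toy$EVENTTYPE <- labels[match(toy$EVTYPE, types)]

# Compute one multiplier per row, then multiply the whole column at once
toy$PROPDAMAGE <- vapply(toy$PROPDMGEXP, multiplier, numeric(1),
                         USE.NAMES = FALSE) * toy$PROPDMG
toy
```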

Event wise summary

Once the event type and the economic losses of each observation have been computed, the data is summarised. The data frame health stores the total number of fatalities and injuries per event, computed with aggregate and FUN = sum; similarly, eco stores the total property and crop damages per event.

health <- merge(aggregate(FATALITIES~EVENTTYPE,data,FUN=sum),aggregate(INJURIES~EVENTTYPE,data,FUN=sum),by="EVENTTYPE")
eco <- merge(aggregate(PROPDAMAGE~EVENTTYPE,data,FUN=sum),aggregate(CROPDAMAGE~EVENTTYPE,data,FUN=sum),by="EVENTTYPE")
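On a small, made-up data frame, a single aggregate() call with cbind() on the left-hand side of the formula gives the same per-event sums as merging two separate aggregate() results. This is a sketch of an equivalent formulation, not the code used for the report:

```r
# Made-up per-observation data
toy <- data.frame(EVENTTYPE = c("FLOOD", "TORNADO", "FLOOD"),
                  FATALITIES = c(1, 4, 2),
                  INJURIES = c(10, 20, 5))

# Two aggregates merged by event, as in the report
merged <- merge(aggregate(FATALITIES ~ EVENTTYPE, toy, FUN = sum),
                aggregate(INJURIES ~ EVENTTYPE, toy, FUN = sum),
                by = "EVENTTYPE")

# One aggregate with several summed columns on the left-hand side
combined <- aggregate(cbind(FATALITIES, INJURIES) ~ EVENTTYPE, data = toy, FUN = sum)

combined   # FLOOD: 3 fatalities, 15 injuries; TORNADO: 4 and 20
```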

Results

With the health and economic losses summed per event, we can now look at the summary and answer the questions posed at the beginning. The losses were plotted with ggplot2 and the tables were printed with xtable, so both libraries were loaded.

library(ggplot2)
library(xtable)

Health

The following graph helps determine which event caused the most harm to population health. For each event, the numbers of fatalities and injuries were added and plotted.

g <- ggplot(health, aes(EVENTTYPE))
g + geom_point(aes(y = INJURIES + FATALITIES, col = "Health Loss"), size = 5, shape = 17) +
    labs(title = "Harm to population health", x = "Events", y = "Number of fatalities and injuries") +
    theme(axis.text.x = element_text(angle = 90))

The graph above shows the combined number of fatalities and injuries for each event.
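In the health graph the events appear in alphabetical order along the x axis; ordering them by total loss would make the ranking easier to read. A sketch with made-up totals (the TOTAL column is hypothetical) using base R reorder(); the ggplot2 call is left as a comment so the block runs without the package:

```r
toy <- data.frame(EVENTTYPE = c("FLOOD", "TORNADO", "WIND"),
                  FATALITIES = c(3, 4, 1),
                  INJURIES = c(15, 20, 2))
toy$TOTAL <- toy$FATALITIES + toy$INJURIES

# reorder() sets the factor levels by increasing -TOTAL, i.e. largest loss first
toy$EVENTTYPE <- reorder(toy$EVENTTYPE, -toy$TOTAL)
levels(toy$EVENTTYPE)   # "TORNADO" "FLOOD" "WIND"

# With ggplot2 loaded, the x axis then follows that order:
# ggplot(toy, aes(EVENTTYPE, TOTAL)) + geom_point(size = 5, shape = 17)
```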

The following code finds the event with the maximum health loss and prints the health losses of all events as a table, sorted in decreasing order.

maxHealthLoss <- max(health$FATALITIES + health$INJURIES)
maxHealthEvent <- health[which(health$FATALITIES + health$INJURIES == maxHealthLoss), 1]
# sort events by total health loss, largest first, and renumber the rows
health <- health[with(health, order(-FATALITIES - INJURIES)), ]
rownames(health) <- NULL
healthTable <- xtable(health)
maxHealthLoss <- round(maxHealthLoss / 10^3, 2)   # in thousands, for the text below
print(healthTable, type = "html")
   EVENTTYPE    FATALITIES  INJURIES
1  TORNADO         5661.00  91407.00
2  OTHERS          5433.00  20454.00
3  WIND            1216.00   9059.00
4  FLOOD           1525.00   8604.00
5  STORM            410.00   4191.00
6  THUNDERSTORM     210.00   2479.00
7  HAIL              15.00   1371.00
8  SNOW             159.00   1120.00
9  BLIZZARD         101.00    805.00
10 WINTER            61.00    538.00
11 RAIN             113.00    305.00
12 AVALANCHE        224.00    171.00
13 DROUGHT            6.00     19.00
14 TIDE              11.00      5.00

We can see that TORNADO caused the maximum harm to population health: 97.07 thousand fatalities and injuries combined.
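The maximum and the ranking can also be obtained with which.max() and order(..., decreasing = TRUE), while rownames(...) <- NULL renumbers the rows without hard-coding the number of event categories. A sketch on made-up data; note that which.max() returns only the first event if several tie for the maximum, whereas which(... == max(...)) returns all of them:

```r
toy <- data.frame(EVENTTYPE = c("FLOOD", "TORNADO", "WIND"),
                  FATALITIES = c(3, 4, 1),
                  INJURIES = c(15, 20, 2),
                  stringsAsFactors = FALSE)
total <- toy$FATALITIES + toy$INJURIES

maxEvent <- toy$EVENTTYPE[which.max(total)]   # "TORNADO"

# Rank events by total loss and renumber rows 1..n, whatever n is
ranked <- toy[order(total, decreasing = TRUE), ]
rownames(ranked) <- NULL
ranked
```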

Economic

The following graph helps determine which event caused the most economic damage. For each event, the total property and crop damages were added and plotted.

g <- ggplot(eco, aes(EVENTTYPE))
g + geom_point(aes(y = PROPDAMAGE + CROPDAMAGE, col = "Economic Loss"), size = 5, shape = 17) +
    labs(title = "Economic damages", x = "Events", y = "Damages in USD (property and crop)") +
    theme(axis.text.x = element_text(angle = 90))

The graph above shows the total property and crop damage for each event.

The following code finds the event with the maximum economic loss and prints the economic losses of all events as a table, sorted in decreasing order.

maxEcoLoss <- max(eco$PROPDAMAGE + eco$CROPDAMAGE)
maxEcoEvent <- eco[which(eco$PROPDAMAGE + eco$CROPDAMAGE == maxEcoLoss), 1]
# sort events by total economic loss, largest first, and renumber the rows
eco <- eco[with(eco, order(-PROPDAMAGE - CROPDAMAGE)), ]
rownames(eco) <- NULL
ecoTable <- xtable(eco)
maxEcoLoss <- round(maxEcoLoss / 10^12, 2)   # in trillions of USD, for the text below
print(ecoTable, type = "html")
   EVENTTYPE           PROPDAMAGE     CROPDAMAGE
1  FLOOD        68415029215834.91 12380079100.00
2  THUNDERSTORM 20869552594005.10   653005388.00
3  TORNADO       1080593097926.50   417461520.00
4  HAIL           315974043512.70  3046837473.00
5  OTHERS         267547951169.50 10408668370.00
6  STORM           61677950661.00  5747558500.00
7  SNOW            18011019750.00   134663100.00
8  DROUGHT          1046306000.00 13972621780.00
9  WIND            10904848618.00  1409224150.00
10 TIDE             4650933150.00      850000.00
11 RAIN             3254758190.00   806162800.00
12 BLIZZARD          659913950.00   112060000.00
13 WINTER             27298000.00    15000000.00
14 AVALANCHE           8721800.00           0.00

We can see that FLOOD caused the maximum economic damage: 68.43 trillion USD in property and crop damages combined.
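As an arithmetic check, the two headline figures can be recomputed from the first rows of the health and economic tables, using the same scaling as the code (thousands for health losses, trillions, i.e. 10^12, for USD):

```r
# TORNADO row of the health table
tornadoHealth <- 5661 + 91407
round(tornadoHealth / 10^3, 2)    # 97.07 thousand

# FLOOD row of the economic table
floodEco <- 68415029215834.91 + 12380079100.00
round(floodEco / 10^12, 2)        # 68.43 trillion USD
```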