A Study of the Influence of Severe Weather Based on the NOAA Storm Database

Feb. 2015 by Alfred Lu

Synopsis

In this report, we use the NOAA storm dataset to estimate the influence of severe weather events on human health and on economic loss.

The dataset is available on the website along with a codebook, which describes the dataset in detail.

Data Pre-processing

data <- read.csv("repdata-data-StormData.csv")

A brief glance at the data shows 902297 records, organized in the following columns:

cat(colnames(data))
STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM

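The columns important to us are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, together with BGN_DATE for the yearly trend. The exponent columns PROPDMGEXP and CROPDMGEXP take the following values: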

levels(data$PROPDMGEXP)
 [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
[18] "m" "M"

According to the codebook on the NOAA website, 'h' or 'H' means hundred and 'k' or 'K' means thousand; likewise, 'm' or 'M' means million and 'B' means billion. Other symbols are not mentioned in that document, so we treat them as unknown (in the conversion code later in this report they receive a multiplier of 1).

levels(data$CROPDMGEXP)
[1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
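The split between documented and undocumented exponent symbols can be checked on a small example. The sketch below uses a made-up vector of symbols rather than the real columns, and the names `expSym`, `knownExp` and `isKnown` are illustrative only.

```r
# Toy vector of exponent symbols; the real columns are
# data$PROPDMGEXP and data$CROPDMGEXP
expSym <- c("K", "k", "M", "", "?", "5", "B", "h")

# Symbols documented in the codebook, compared case-insensitively
knownExp <- c("H", "K", "M", "B")
isKnown <- toupper(expSym) %in% knownExp

# Count documented versus undocumented symbols
table(isKnown)

# List the undocumented symbols that would be treated as unknown
unique(expSym[!isKnown])
```

On this toy vector, five symbols are documented and the empty string, "?" and "5" fall into the unknown group.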

Explore the most harmful events with respect to population health

The influence on population health is reflected by fatalities and injuries. First, the data set is split by weather event and the fatalities in each subset are summed over all years. After sorting in decreasing order, the following table shows the 10 most harmful weather events.

dataFatality <- split(data$FATALITIES, as.factor(data$EVTYPE))
sumFatality <- sapply(dataFatality, sum, na.rm=T)
fatalityTab <- data.frame(sumFatalities=sumFatality, evtName=names(dataFatality))
fatalityTab <- fatalityTab[with(fatalityTab, order(sumFatalities,decreasing=T)), ]
head(fatalityTab, 10)
               sumFatalities        evtName
TORNADO                 5633        TORNADO
EXCESSIVE HEAT          1903 EXCESSIVE HEAT
FLASH FLOOD              978    FLASH FLOOD
HEAT                     937           HEAT
LIGHTNING                816      LIGHTNING
TSTM WIND                504      TSTM WIND
FLOOD                    470          FLOOD
RIP CURRENT              368    RIP CURRENT
HIGH WIND                248      HIGH WIND
AVALANCHE                224      AVALANCHE
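The split, sum and sort steps used for this table can also be written with `tapply`. The following is a minimal sketch on a toy data frame (the values are invented for illustration and are not taken from the NOAA data); it is an alternative formulation, not the code used for the table above.

```r
# Toy records standing in for data$EVTYPE and data$FATALITIES
toy <- data.frame(
    EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
    FATALITIES = c(3, 1, 2, NA, 3)
)

# Sum fatalities per event type, ignoring missing values
sumByEvent <- tapply(toy$FATALITIES, toy$EVTYPE, sum, na.rm = TRUE)

# Sort in decreasing order and show the top entries
topEvents <- sort(sumByEvent, decreasing = TRUE)
head(topEvents, 2)
```

Here TORNADO sums to 5 and FLOOD to 4, while HEAT sums to 0 because its only value is missing and `na.rm = TRUE` drops it.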

The following chart shows how the fatalities of the most deadly weather event develop over the years:

dataMostHarmful <- data[data$EVTYPE == fatalityTab[1,]$evtName, ]
years <- as.numeric(format(as.Date(dataMostHarmful$BGN_DATE, 
                                   format = "%m/%d/%Y %H:%M:%S"), "%Y"))
sumofYear <- split(dataMostHarmful$FATALITIES, years)
sumofYear <- sapply(sumofYear, sum, na.rm=T)

plot(sumofYear~as.numeric(names(sumofYear)), type="o", 
     xlab='Year', ylab='Total death',
     main=c('Developing of most deadly weather over years'))

[Figure: total deaths per year for the most deadly weather event (tornado)]
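The year extraction behind this chart can be followed on a few sample strings. The sketch below assumes that BGN_DATE uses the month/day/year layout implied by the format string in the code above; the sample dates and fatality counts are invented for illustration.

```r
# Sample strings in the layout assumed for BGN_DATE
bgnDate <- c("4/18/1950 0:00:00", "6/8/1951 0:00:00", "11/15/1951 0:00:00")

# Parse the date, then keep only the four-digit year as a number
parsed <- as.Date(bgnDate, format = "%m/%d/%Y %H:%M:%S")
years <- as.numeric(format(parsed, "%Y"))

# Group a toy fatality vector by year and sum within each year
fatalities <- c(0, 1, 2)
sumofYear <- sapply(split(fatalities, years), sum, na.rm = TRUE)
sumofYear
```

The names of `sumofYear` are the years as character strings, which is why the plotting code converts them back with `as.numeric(names(sumofYear))`.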

A similar calculation is carried out on the number of injuries, which gives another table.

dataInjuries <- split(data$INJURIES, as.factor(data$EVTYPE))
sumInjuries <- sapply(dataInjuries, sum, na.rm=T)
injuryTab <- data.frame(sumInjuries=sumInjuries, evtName=names(dataInjuries))
injuryTab <- injuryTab[with(injuryTab, order(sumInjuries, decreasing=T)), ]
head(injuryTab, 10)
                  sumInjuries           evtName
TORNADO                 91346           TORNADO
TSTM WIND                6957         TSTM WIND
FLOOD                    6789             FLOOD
EXCESSIVE HEAT           6525    EXCESSIVE HEAT
LIGHTNING                5230         LIGHTNING
HEAT                     2100              HEAT
ICE STORM                1975         ICE STORM
FLASH FLOOD              1777       FLASH FLOOD
THUNDERSTORM WIND        1488 THUNDERSTORM WIND
HAIL                     1361              HAIL

Comparing these tables and the figure, one can claim that the tornado is likely the most dangerous weather event for people, based on the NOAA database records from 1950 to 2011. Its fatality count also rebounds to a high number in 2011.

Explore the weather event with the greatest economic consequences

In this section, we try to find the weather event associated with the greatest economic loss.

The first step is to combine the base number with its magnitude and to add a new column to the right of the raw data, which gives the total loss for each occurrence of hazardous weather:

symToNum <- function(x) {
    if (x == "H") {
        y <- 10^2
    } else if (x == "K") {
        y <- 10^3
    } else if (x == "M") {
        y <- 10^6
    } else if (x == "B") {
        y <- 10^9
    } else {
        y <- 1
    }
    return(y)
}
dmgBase <- data$PROPDMG
dmgExpSym <- as.character(data$PROPDMGEXP)
dmgExpSym <- toupper(dmgExpSym)
dmgNum <- sapply(dmgExpSym, symToNum)
dmgAll <- dmgBase * dmgNum

dmgBase2 <- data$CROPDMG
dmgExpSym2 <- as.character(data$CROPDMGEXP)
dmgExpSym2 <- toupper(dmgExpSym2)
dmgNum2 <- sapply(dmgExpSym2, symToNum)
dmgAll2 <- dmgBase2 * dmgNum2

data$totEcoDmg <- dmgAll + dmgAll2
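The conversion above calls a scalar function once per record. A named-vector lookup is one possible vectorized alternative; the sketch below uses the hypothetical names `expMultiplier` and `toDollars` and keeps the same convention as `symToNum`, where undocumented symbols get a multiplier of 1.

```r
# Multipliers for the documented symbols (same values as symToNum)
expMultiplier <- c(H = 10^2, K = 10^3, M = 10^6, B = 10^9)

# Hypothetical helper: base amount times the multiplier of its symbol
toDollars <- function(base, sym) {
    mult <- expMultiplier[toupper(as.character(sym))]
    # Symbols missing from the lookup give NA; use 1, as symToNum does
    mult[is.na(mult)] <- 1
    base * unname(mult)
}

# Toy values: 25 K, 1.5 m, 2 with an empty symbol, 3 with an undocumented "?"
toDollars(c(25, 1.5, 2, 3), c("K", "m", "", "?"))
```

As worked arithmetic, 25 with "K" becomes 25 * 10^3 = 25000 and 1.5 with "m" becomes 1.5 * 10^6 = 1500000, while the last two values are returned unchanged.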

Now we can split by weather event and sum up the economic loss, as we did in the last section.

dataEcoLoss <- split(data$totEcoDmg, as.factor(data$EVTYPE))
sumEcoLoss <- sapply(dataEcoLoss, sum, na.rm=T)
ecoLossTab <- data.frame(sumEcoLoss=sumEcoLoss, evtName=names(dataEcoLoss))
ecoLossTab <- ecoLossTab[with(ecoLossTab, order(sumEcoLoss, decreasing=T)), ]
head(ecoLossTab, 10)
                    sumEcoLoss           evtName
FLOOD             150319678257             FLOOD
HURRICANE/TYPHOON  71913712800 HURRICANE/TYPHOON
TORNADO            57352114049           TORNADO
STORM SURGE        43323541000       STORM SURGE
HAIL               18758222016              HAIL
FLASH FLOOD        17562129167       FLASH FLOOD
DROUGHT            15018672000           DROUGHT
HURRICANE          14610229010         HURRICANE
RIVER FLOOD        10148404500       RIVER FLOOD
ICE STORM           8967041360         ICE STORM
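Totals of this size are easier to compare in billions. The sketch below copies the first three totals from the table above and rescales them; treating the unit as US dollars is an assumption, and `lossBillions` is an illustrative name.

```r
# Totals copied from the first three rows of the table above
# (unit assumed to be US dollars)
sumEcoLoss <- c(FLOOD = 150319678257,
                `HURRICANE/TYPHOON` = 71913712800,
                TORNADO = 57352114049)

# Express each total in billions, rounded to one decimal place
lossBillions <- round(sumEcoLoss / 10^9, 1)
lossBillions
```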

Finally, a different conclusion follows here: floods bring the greatest economic loss, based on the NOAA database records from 1950 to 2011.

Conclusion

This report explores the influence of hazardous weather events based on the NOAA storm database. Different results emerge when we evaluate from different aspects: the tornado is the most dangerous weather event for human health, while the flood causes the greatest economic loss.