Feb. 2015 by Alfred Lu
In this report, we use the NOAA storm dataset to estimate the impact of severe weather events on human health and on the economy.
The dataset is available on the website together with a codebook that documents its variables in detail.
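# Read the raw storm data. stringsAsFactors = TRUE keeps the text columns as factors,
# which the levels() calls below rely on (the default became FALSE in R 4.0.0)
data <- read.csv("repdata-data-StormData.csv", stringsAsFactors = TRUE)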
A quick look at the data shows 902,297 records, organized in the following 37 columns:
cat(colnames(data))
STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
The columns relevant to this analysis are:
'EVTYPE': a string code identifying the type of hazardous weather event;
'FATALITIES' and 'INJURIES': numeric counts of the deaths and injuries caused by the event;
'PROPDMG' and 'PROPDMGEXP': property damage, where 'PROPDMGEXP' gives the magnitude of the base value recorded in 'PROPDMG'. That column takes the following levels:
levels(data$PROPDMGEXP)
[1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
[18] "m" "M"
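According to the codebook on the NOAA website, 'h' or 'H' means hundred and 'k' or 'K' means thousand. Likewise, 'm' or 'M' means million and 'B' means billion. The other symbols are not mentioned in that document, so we treat them as unknown. In the calculation below, such records keep their base value unchanged (a multiplier of 1). Crop damage is recorded in the same way in 'CROPDMG' and 'CROPDMGEXP', whose levels are: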
levels(data$CROPDMGEXP)
[1] "" "?" "0" "2" "B" "k" "K" "m" "M"
The impact on population health is measured by fatalities and injuries. First, the data set is split by weather event type and the fatalities in each group are summed over all years. Sorting these totals in decreasing order gives the ten most harmful event types:
# Split fatalities by event type and sum each group over all years
dataFatality <- split(data$FATALITIES, as.factor(data$EVTYPE))
sumFatality <- sapply(dataFatality, sum, na.rm = TRUE)
fatalityTab <- data.frame(sumFatalities = sumFatality, evtName = names(dataFatality))
# Sort in decreasing order and show the ten deadliest event types
fatalityTab <- fatalityTab[with(fatalityTab, order(sumFatalities, decreasing = TRUE)), ]
head(fatalityTab, 10)
sumFatalities evtName
TORNADO 5633 TORNADO
EXCESSIVE HEAT 1903 EXCESSIVE HEAT
FLASH FLOOD 978 FLASH FLOOD
HEAT 937 HEAT
LIGHTNING 816 LIGHTNING
TSTM WIND 504 TSTM WIND
FLOOD 470 FLOOD
RIP CURRENT 368 RIP CURRENT
HIGH WIND 248 HIGH WIND
AVALANCHE 224 AVALANCHE
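The split/sapply/order sequence above can also be written with base R `tapply()`, which groups and summarises in a single call. This is a minimal sketch on a small hypothetical data frame `toy` with made-up values, not NOAA records. Applied to `data`, it should give the same totals as the table above.

```r
# Hypothetical toy records (made-up values, not NOAA data)
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(3, 1, 5, 2, NA, 4)
)

# Group by event type and sum in one step; na.rm is passed on to sum()
totals <- tapply(toy$FATALITIES, toy$EVTYPE, sum, na.rm = TRUE)

# Sort in decreasing order and keep the n largest totals
n <- 2
topN <- head(sort(totals, decreasing = TRUE), n)
```

`topN` holds TORNADO (8) followed by HEAT (6). The grouping labels end up as names rather than in a separate `evtName` column.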
The following chart shows how the annual death toll of the deadliest event type has developed over the years:
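# Keep only the records of the deadliest event type (first row of the sorted table)
dataMostHarmful <- data[data$EVTYPE == as.character(fatalityTab$evtName[1]), ]
# Extract the year from BGN_DATE, e.g. "4/18/1950 0:00:00" -> 1950
years <- as.numeric(format(as.Date(dataMostHarmful$BGN_DATE,
                                   format = "%m/%d/%Y %H:%M:%S"), "%Y"))
# Sum the fatalities of each year and plot them
sumofYear <- split(dataMostHarmful$FATALITIES, years)
sumofYear <- sapply(sumofYear, sum, na.rm = TRUE)
plot(sumofYear ~ as.numeric(names(sumofYear)), type = "o",
     xlab = 'Year', ylab = 'Total deaths',
     main = 'Annual deaths caused by the deadliest weather event')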
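The only non-obvious step in the chart code is turning `BGN_DATE` into a year. The sketch below isolates that step. It uses three illustrative date strings in the same "%m/%d/%Y %H:%M:%S" layout and hypothetical death counts, and sums them per year with `tapply()` instead of split/sapply.

```r
# Illustrative BGN_DATE strings in the layout used by the storm data
bgnDate <- c("4/18/1950 0:00:00", "1/6/1951 0:00:00", "11/15/1951 0:00:00")
deaths  <- c(4, 0, 2)  # made-up death counts for these records

# Parse the date part, then keep only the year as an integer
years <- as.integer(format(as.Date(bgnDate, format = "%m/%d/%Y %H:%M:%S"), "%Y"))

# Sum deaths per year (one entry per distinct year, named by the year)
perYear <- tapply(deaths, years, sum)
```

`perYear` can then be plotted with `plot(as.integer(names(perYear)), perYear, type = "o")`, as in the chart code above.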
A similar calculation on the number of injuries gives the following table:
# Same procedure for injuries: split by event type, sum, sort decreasingly
dataInjuries <- split(data$INJURIES, as.factor(data$EVTYPE))
sumInjuries <- sapply(dataInjuries, sum, na.rm = TRUE)
injuryTab <- data.frame(sumInjuries = sumInjuries, evtName = names(dataInjuries))
injuryTab <- injuryTab[with(injuryTab, order(sumInjuries, decreasing = TRUE)), ]
head(injuryTab, 10)
sumInjuries evtName
TORNADO 91346 TORNADO
TSTM WIND 6957 TSTM WIND
FLOOD 6789 FLOOD
EXCESSIVE HEAT 6525 EXCESSIVE HEAT
LIGHTNING 5230 LIGHTNING
HEAT 2100 HEAT
ICE STORM 1975 ICE STORM
FLASH FLOOD 1777 FLASH FLOOD
THUNDERSTORM WIND 1488 THUNDERSTORM WIND
HAIL 1361 HAIL
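Fatalities and injuries are ranked with the same procedure, so both totals can also be computed in one pass. The sketch below uses the formula interface of base R `aggregate()` on a hypothetical data frame `toy`. Note that this interface drops rows with an NA in either column by default (`na.action = na.omit`), which differs from the `na.rm = TRUE` sums above.

```r
# Hypothetical toy records (made-up values, not NOAA data)
toy <- data.frame(
  EVTYPE = c("TORNADO", "TSTM WIND", "TORNADO", "FLOOD"),
  FATALITIES = c(2, 0, 1, 3),
  INJURIES = c(10, 4, 6, 1)
)

# Sum both columns per event type in a single call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by injuries, as in the table above
byInjuries <- health[order(health$INJURIES, decreasing = TRUE), ]
```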
Comparing these tables and the figure, tornadoes appear to be the most dangerous weather events for people, based on the NOAA records from 1950 to 2011. Their annual death toll also rises sharply again at the end of the record, in 2011.
In this section, we look for the weather event type associated with the greatest economic loss.
First, we combine each base value with its magnitude symbol. We then add a new column, totEcoDmg, to the raw data, holding the total (property plus crop) loss of each event:
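# Map a magnitude symbol (already converted to upper case) to its multiplier;
# any other symbol ("", "-", "?", "+", digits) gets a multiplier of 1
symToNum <- function(x) {
  if (x == "H") {
    y <- 10^2
  } else if (x == "K") {
    y <- 10^3
  } else if (x == "M") {
    y <- 10^6
  } else if (x == "B") {
    y <- 10^9
  } else {
    y <- 1
  }
  return(y)
}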
# Property damage: base value times the multiplier of its magnitude symbol
dmgBase <- data$PROPDMG
dmgExpSym <- toupper(as.character(data$PROPDMGEXP))
dmgNum <- sapply(dmgExpSym, symToNum)
dmgAll <- dmgBase * dmgNum
# Crop damage, converted in the same way
dmgBase2 <- data$CROPDMG
dmgExpSym2 <- toupper(as.character(data$CROPDMGEXP))
dmgNum2 <- sapply(dmgExpSym2, symToNum)
dmgAll2 <- dmgBase2 * dmgNum2
# Total economic damage (property + crop) of each event
data$totEcoDmg <- dmgAll + dmgAll2
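Calling `symToNum()` through `sapply()` runs an R function once for each of the 902,297 records. A vectorized alternative, sketched below, looks the symbols up in a named vector instead. The helper name `expToMultiplier()` is hypothetical. It is intended to reproduce `symToNum()`, including the multiplier of 1 for undocumented symbols, and it accepts factors directly.

```r
# Multipliers for the documented magnitude symbols
multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: vectorized equivalent of sapply(toupper(sym), symToNum)
expToMultiplier <- function(sym) {
  # Indexing by a name that is absent (e.g. "", "+", "?") yields NA
  m <- unname(multipliers[toupper(as.character(sym))])
  m[is.na(m)] <- 1  # undocumented symbols keep the base value
  m
}
```

On the real data, this would read `data$totEcoDmg <- data$PROPDMG * expToMultiplier(data$PROPDMGEXP) + data$CROPDMG * expToMultiplier(data$CROPDMGEXP)`.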
Now we can split the total loss by weather event and sum it up, as in the previous section:
# Split the total economic damage by event type, sum, and sort as before
dataEcoLoss <- split(data$totEcoDmg, as.factor(data$EVTYPE))
sumEcoLoss <- sapply(dataEcoLoss, sum, na.rm = TRUE)
ecoLossTab <- data.frame(sumEcoLoss = sumEcoLoss, evtName = names(dataEcoLoss))
ecoLossTab <- ecoLossTab[with(ecoLossTab, order(sumEcoLoss, decreasing = TRUE)), ]
head(ecoLossTab, 10)
sumEcoLoss evtName
FLOOD 150319678257 FLOOD
HURRICANE/TYPHOON 71913712800 HURRICANE/TYPHOON
TORNADO 57352114049 TORNADO
STORM SURGE 43323541000 STORM SURGE
HAIL 18758222016 HAIL
FLASH FLOOD 17562129167 FLASH FLOOD
DROUGHT 15018672000 DROUGHT
HURRICANE 14610229010 HURRICANE
RIVER FLOOD 10148404500 RIVER FLOOD
ICE STORM 8967041360 ICE STORM
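Totals of this size are easier to compare in billions of dollars. The sketch below converts the top three totals to billions, rounded to one decimal place. These values are copied by hand from the table above, so they are only illustrative. On the real table, `ecoLossTab$sumEcoLoss / 1e9` does the same for every row.

```r
# Top three totals copied from the table above (US dollars)
topLoss <- c(FLOOD = 150319678257,
             `HURRICANE/TYPHOON` = 71913712800,
             TORNADO = 57352114049)

# Convert to billions and round to one decimal place; names are kept
lossBillions <- round(topLoss / 1e9, 1)
```

In these units, the flood total (about 150.3 billion) is roughly twice that of hurricanes/typhoons (about 71.9 billion).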
This leads to a different conclusion: based on the NOAA records from 1950 to 2011, floods cause the greatest economic loss.
This report explores the impact of hazardous weather events using the NOAA storm database. The answer depends on the criterion used. Tornadoes are the most dangerous event type for human health, while floods cause the greatest economic loss.