Synopsis
This report analyses the U.S. National Oceanic and Atmospheric Administration’s storm database and reports the impact of severe weather event types on public health (in terms of fatalities and injuries) and public wealth (in terms of property damage and crop damage).
This report aims to assist government officials and policy makers in prioritizing resources for mitigating severe weather events in the future.
This report follows the principles of Reproducible Research: the R code chunks are presented interspersed with the results and documentation, so that anyone can check and/or reproduce the results.
Only data recorded since 1996 are considered in this analysis, because data from that year onward were entered in a more consistent style.
Data Processing
Assumption: the data file and this program file are in the same directory.
Load the data file into a data frame (full data - fdat).
fdat <- read.csv("repdatA-data-StormData.csv",header=TRUE)
Injuries and fatalities are used to quantify the impact on population health.
Property damage and crop damage, each with its exponent, are used to quantify the economic impact.
Select the required columns into a data frame for further processing (trimmed data - tdat).
tdat <- fdat[c("INJURIES","FATALITIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP","EVTYPE")]
Event type entries appear in both lower and upper case.
Convert all entries to lower case so that the same event type is not counted twice during aggregation.
The damage exponents are 0, K, M and B, standing for units, thousands, millions and billions respectively. These exponents also appear in both cases, so all damage exponents are converted to lower case too.
tdat$EVTYPE <- tolower(tdat$EVTYPE)
tdat$PROPDMGEXP <- tolower(tdat$PROPDMGEXP)
tdat$CROPDMGEXP <- tolower(tdat$CROPDMGEXP)
## Drop records whose event type contains "summary"; these are summaries, not individual events
tdat <- subset(tdat,!grepl("summary",EVTYPE))
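A toy illustration of the two normalisation steps above, using made-up event labels (not taken from the NOAA file): lower-casing merges spellings that differ only in case, and the `grepl("summary", ...)` filter removes summary rows.

```r
# Made-up event labels with inconsistent capitalisation and one summary row
ev <- c("TORNADO", "Tornado", "tornado", "HAIL", "Hail", "Summary of March 24")

n_raw <- length(unique(ev))        # 6 distinct raw spellings
ev <- tolower(ev)
n_lower <- length(unique(ev))      # 3 after lower-casing
ev <- ev[!grepl("summary", ev)]    # drop the summary row
table(ev)                          # hail: 2, tornado: 3
```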
Recent data (from 1996 onward) have crop damage exponents of “0”, “k”, “m” or “b”. This criterion, applied to CROPDMGEXP, is used to isolate the recent data for further analysis.
After this filtering, about 0.28 million records (283,876) remain, as seen below.
tdat <- tdat[tdat$CROPDMGEXP %in% c("0","k","m","b"),]
nrow(tdat)
## [1] 283876
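The exponent test above is an indirect way of selecting post-1996 records. A more direct alternative, sketched here as an assumption rather than the method used in this report, is to filter on the event start date. It assumes the raw file has a `BGN_DATE` column holding strings such as "4/18/1950 0:00:00"; since `tdat` does not keep that column, the filter would have to be applied to `fdat` before the columns are trimmed. The records below are made up.

```r
# Made-up records; BGN_DATE mimics the assumed "m/d/Y h:mm:ss" format
toy <- data.frame(BGN_DATE   = c("4/18/1950 0:00:00", "1/6/1996 0:00:00",
                                 "11/30/2011 0:00:00"),
                  CROPDMGEXP = c("", "K", "K"),
                  stringsAsFactors = FALSE)

# as.Date() parses via strptime(), which ignores the trailing time part
yr <- as.numeric(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))

# Keep only records whose event started in 1996 or later
recent <- toy[yr >= 1996, ]
nrow(recent)
```

Comparing `nrow()` under both filters on the real data would show how closely the exponent heuristic tracks the calendar cut-off.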
## Correct some typographic errors and expand abbreviations
tdat$EVTYPE <- gsub("avalance","avalanche",tdat$EVTYPE)
tdat$EVTYPE <- gsub("erosin","erosion",tdat$EVTYPE)
tdat$EVTYPE <- gsub("cstl","coastal",tdat$EVTYPE)
tdat$EVTYPE <- gsub("hvy","heavy",tdat$EVTYPE)
tdat$EVTYPE <- gsub("wnd","wind",tdat$EVTYPE)
tdat$EVTYPE <- gsub("w inds","winds",tdat$EVTYPE)
## Replace "/", "\\", " and", " &" and other special characters with a blank space
tdat$EVTYPE <- gsub("/"," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub("\\\\"," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub(" and"," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub(" &"," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub(","," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub("\\("," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub("\\)"," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub(";"," ",tdat$EVTYPE)
tdat$EVTYPE <- gsub("-"," ",tdat$EVTYPE)
## Collapse runs of blank spaces into a single space
tdat$EVTYPE <- gsub(" +"," ",tdat$EVTYPE)
## Replace some common plurals with singulars to avoid duplicate classification
tdat$EVTYPE <- gsub("winds","wind",tdat$EVTYPE)
tdat$EVTYPE <- gsub("temperatures","temperature",tdat$EVTYPE)
tdat$EVTYPE <- gsub("fires","fire",tdat$EVTYPE)
tdat$EVTYPE <- gsub("funnels","funnel",tdat$EVTYPE)
tdat$EVTYPE <- gsub("floods","flood",tdat$EVTYPE)
tdat$EVTYPE <- gsub("storms","storm",tdat$EVTYPE)
tdat$EVTYPE <- gsub("rains","rain",tdat$EVTYPE)
tdat$EVTYPE <- gsub("tornados","tornado",tdat$EVTYPE)
tdat$EVTYPE <- gsub("tornadoes","tornado",tdat$EVTYPE)
tdat$EVTYPE <- gsub("currents","current",tdat$EVTYPE)
## Trim leading or trailing blank spaces
# Source: http://stackoverflow.com/questions/2261079/how-to-trim-leading-and-trailing-whitespace-in-r
trim <- function (x) gsub("^\\s+|\\s+$", "", x)
tdat$EVTYPE <- trim(tdat$EVTYPE)
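For reuse, the cleaning steps above could be wrapped into one function. The sketch below is a hypothetical refactoring (the name `normalise_evtype` is not from the original): it lower-cases the input, applies the same literal replacements in the same order with `fixed = TRUE` (so "(" or "\\" need no regex escaping), collapses runs of spaces with one regular expression, and uses base R's `trimws()` (R >= 3.2.0) in place of the custom `trim()`. The "summary" filter is not included.

```r
# Hypothetical helper reproducing the EVTYPE cleaning steps in one place
normalise_evtype <- function(x) {
  x <- tolower(x)
  # Literal replacements, applied in the same order as the step-by-step code
  repl <- c(
    "avalance" = "avalanche", "erosin" = "erosion", "cstl" = "coastal",
    "hvy" = "heavy", "wnd" = "wind", "w inds" = "winds",
    "/" = " ", "\\" = " ", " and" = " ", " &" = " ", "," = " ",
    "(" = " ", ")" = " ", ";" = " ", "-" = " ",
    "winds" = "wind", "temperatures" = "temperature", "fires" = "fire",
    "funnels" = "funnel", "floods" = "flood", "storms" = "storm",
    "rains" = "rain", "tornados" = "tornado", "tornadoes" = "tornado",
    "currents" = "current"
  )
  for (i in seq_along(repl)) {
    x <- gsub(names(repl)[i], repl[[i]], x, fixed = TRUE)
  }
  x <- gsub(" +", " ", x)   # collapse runs of blanks into one space
  trimws(x)                 # strip leading and trailing white space
}

normalise_evtype(c("THUNDERSTORM WINDS", "HURRICANE/TYPHOON", " Rip Currents ",
                   "TORNADOES, TSTM WIND, HAIL", "HVY RAINS"))
```

Keeping the replacements in a named vector makes the mapping easy to review and extend; note that "tstm" is left untouched because the original steps do not expand it.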
Analysis & Results
Impact of event types on Fatalities:
## Aggregate the fatalities data by event type (fatalities data - fataldat)
fataldat <- aggregate(tdat$FATALITIES,by=list(tdat$EVTYPE),FUN=sum,na.rm=T)
colnames(fataldat)<-c("EVTYPE","FATALITIES")
## Retain only the rows with non-zero fatalities
fataldat <- fataldat[fataldat$FATALITIES!=0,]
nrow(fataldat)
## [1] 55
## Sort the data by fatalities in descending order
fataldat <- fataldat[order(-fataldat$FATALITIES),]
## Select the top ten fatalities (Select fatalities data - sfdat)
sfdat <- fataldat[1:10,]
sfdat[,1] <- toupper(sfdat[,1])
print(sfdat)
##                EVTYPE FATALITIES
## 110           TORNADO       1064
## 28        FLASH FLOOD        388
## 32              FLOOD        262
## 54               HEAT        219
## 87        RIP CURRENT        211
## 80          LIGHTNING        173
## 22     EXCESSIVE HEAT        171
## 103 THUNDERSTORM WIND        141
## 10    COLD WIND CHILL         94
## 4           AVALANCHE         83
## Dot chart for selected fatalities data
maintxt="Figure 1: Top 10 causes of Fatality"
xlabtxt="Number of fatalities since 1996"
dotchart(sfdat[,2],labels=sfdat[,1],xlab=xlabtxt,main=maintxt)
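The sequence used here (aggregate by event type, drop zero totals, sort in descending order, keep the top ten, upper-case the labels) is repeated for injuries, property damage and crop damage. One possible refactoring, sketched below with a hypothetical helper `top_by_evtype`, puts it in a single function. It uses `head()` instead of `[1:10,]`, which avoids rows of NAs if fewer than ten event types remain. The example data are made up.

```r
# Hypothetical helper: sum `value` per event type, drop zero totals,
# sort in descending order and keep the first n rows.
top_by_evtype <- function(evtype, value, n = 10) {
  agg <- aggregate(value, by = list(EVTYPE = evtype), FUN = sum, na.rm = TRUE)
  names(agg)[2] <- "TOTAL"
  agg <- agg[agg$TOTAL != 0, ]
  agg <- agg[order(-agg$TOTAL), ]
  agg <- head(agg, n)               # safe even if fewer than n rows remain
  agg$EVTYPE <- toupper(agg$EVTYPE)
  agg
}

# Made-up example (not NOAA data): "hail" sums to 0, "heat" has only NA
res <- top_by_evtype(c("flood", "tornado", "flood", "hail", "heat"),
                     c(2, 5, 4, 0, NA), n = 2)
res
```

On the real data, `top_by_evtype(tdat$EVTYPE, tdat$FATALITIES)` should give the same rows as `sfdat`, apart from the name of the value column.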
Impact of event types on Injuries:
The cleaned injury data are aggregated by event type, and event types with zero injuries are removed, leaving 50 event types, as shown below.
These event types are sorted by injuries in descending order, and the top ten are listed below.
## Aggregate the injuries data by event type (injury data - injdat)
injdat <- aggregate(tdat$INJURIES,by=list(tdat$EVTYPE),FUN=sum,na.rm=T)
colnames(injdat)<-c("EVTYPE","INJURIES")
## Retain only the rows with non-zero injuries
injdat <- injdat[injdat$INJURIES!=0,]
nrow(injdat)
## [1] 50
## Sort the data by injuries in descending order
injdat <- injdat[order(-injdat$INJURIES),]
## Select the top ten injuries (select injury data - sidat)
sidat <- injdat[1:10,]
sidat[,1] <- toupper(sidat[,1])
print(sidat)
##                EVTYPE INJURIES
## 110           TORNADO    11960
## 32              FLOOD     6495
## 75          ICE STORM     1616
## 54               HEAT     1554
## 103 THUNDERSTORM WIND     1491
## 80          LIGHTNING     1014
## 73  HURRICANE TYPHOON      909
## 22     EXCESSIVE HEAT      899
## 28        FLASH FLOOD      667
## 132          WILDFIRE      560
The three event types causing the most injuries since 1996 are Tornado, Flood and Ice Storm.
Note, however, that if Heat and Excessive Heat are combined, the combined event type (1554 + 899 = 2453 injuries) would be the third most injurious.
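The arithmetic behind this remark can be checked in a few lines of R. The sketch uses injury totals copied from the printed table above (a subset of its rows only) and merges the two heat categories before re-ranking.

```r
# Injury totals copied from the printed table above (subset of rows)
inj <- c("tornado" = 11960, "flood" = 6495, "ice storm" = 1616,
         "heat" = 1554, "thunderstorm wind" = 1491, "excessive heat" = 899)

# Replace the two heat categories with their sum
heat_like <- c("heat", "excessive heat")
merged <- c(inj[!names(inj) %in% heat_like],
            "heat (combined)" = sum(inj[heat_like]))

top3 <- head(sort(merged, decreasing = TRUE), 3)
top3
```

On the full data, a comparable variation (not part of the original analysis) would be to relabel the event type before aggregating, e.g. `tdat$EVTYPE[tdat$EVTYPE == "excessive heat"] <- "heat"`.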
Impact of event types on Property damage:
In the database, each damage value is stored as a significand (PROPDMG, CROPDMG) in one column and its exponent (PROPDMGEXP, CROPDMGEXP) in a separate column.
The property damage value in dollars is computed for the cleaned-up data. The property damage values are then aggregated by event type, and event types with zero damage are removed, leaving 109 event types, as shown below.
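To make the significand/exponent encoding concrete, here is a small sketch of the named-vector lookup on made-up records (not NOAA data). Indexing a named vector with a character vector looks each code up by name, and a code that is not in the table (here the empty string) yields NA.

```r
# Lookup table: lower-case exponent code -> multiplier
expref <- c("0" = 1, "k" = 1000, "m" = 1e6, "b" = 1e9)

# Made-up records; "" stands for an exponent code missing from the table
dmg      <- c(25, 2.5, 1.5, 10)
exp_code <- c("k", "m", "b", "")

# Multiply each significand by its multiplier; unknown codes give NA
value <- as.numeric(dmg * expref[exp_code])
value
```

Records whose exponent is not in the table therefore become NA and are silently excluded by `na.rm=T` when the values are aggregated by event type.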
## Property damage value computation: property damage value - pdmgval
## A lookup vector is used to convert damage exponents to dollar value
## Idea Source: https://class.coursera.org/repdata-006/forum/thread?thread_id=131
expref <- c("0" = 1, "k" = 1000, "m" = 1E6, "b" = 1E9)
pdmgval <- as.numeric(tdat$PROPDMG*expref[tdat$PROPDMGEXP])
## Aggregate the property damage data by event type
pdat <- aggregate(pdmgval,by=list(tdat$EVTYPE),FUN=sum,na.rm=T)
colnames(pdat)<-c("EVTYPE","PROPDMG")
## Retain only the rows with non-zero property damage value
pdat <- pdat[pdat$PROPDMG!=0,]
nrow(pdat)
## [1] 109
## Sort the data by property damages in descending order
pdat <- pdat[order(-pdat$PROPDMG),]
## Select the top ten property damages (select property data - spdat)
spdat <- pdat[1:10,]
spdat[,1] <- toupper(spdat[,1])
print(spdat)
##                EVTYPE   PROPDMG
## 32              FLOOD 1.328e+11
## 73  HURRICANE TYPHOON 2.674e+10
## 110           TORNADO 1.617e+10
## 68          HURRICANE 9.716e+09
## 46               HAIL 7.992e+09
## 28        FLASH FLOOD 7.328e+09
## 88        RIVER FLOOD 5.080e+09
## 98   STORM SURGE TIDE 4.641e+09
## 103 THUNDERSTORM WIND 3.679e+09
## 132          WILDFIRE 3.499e+09
## Dot chart for selected property damage data
maintxt="Figure 2: Top 10 causes of Property Damage"
xlabtxt="Property damage in Billion US $s"
dotchart(spdat[,2]/1E9,labels=spdat[,1],xlab=xlabtxt,main=maintxt)
Impact of event types on Crop damage:
The crop damage analysis follows the same steps as the property damage analysis above.
After aggregation by event type, 112 event types have non-zero crop damage.
The top ten crop damage event types are listed below and shown in a dot chart.
## Crop damage value computation: crop damage value - cdmgval
cdmgval <- as.numeric(tdat$CROPDMG*expref[tdat$CROPDMGEXP])
## Aggregate the crop damage data by event type (crop damage data - cdat)
cdat <- aggregate(cdmgval,by=list(tdat$EVTYPE),FUN=sum,na.rm=T)
colnames(cdat)<-c("EVTYPE","CROPDMG")
## Retain only the rows with non-zero crop damage value
cdat <- cdat[cdat$CROPDMG!=0,]
nrow(cdat)
## [1] 112
## Sort the data by crop damages in descending order
cdat <- cdat[order(-cdat$CROPDMG),]
## Select the top ten crop damages (select crop damage data - scdat)
scdat <- cdat[1:10,]
scdat[,1] <- toupper(scdat[,1])
print(scdat)
##               EVTYPE   CROPDMG
## 15           DROUGHT 1.397e+10
## 32             FLOOD 5.662e+09
## 88       RIVER FLOOD 5.029e+09
## 75         ICE STORM 5.022e+09
## 46              HAIL 3.026e+09
## 68         HURRICANE 2.742e+09
## 73 HURRICANE TYPHOON 2.608e+09
## 28       FLASH FLOOD 1.421e+09
## 24      EXTREME COLD 1.313e+09
## 41      FROST FREEZE 1.094e+09
## Dot chart for selected crop damage data
maintxt="Figure 3: Top 10 causes of Crop Damage"
xlabtxt="Crop damage in Billion US $s"
dotchart(scdat[,2]/1E9,labels=scdat[,1],xlab=xlabtxt,main=maintxt)
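If a single economic ranking is wanted, the property and crop totals (`pdat` and `cdat` above) could be merged by event type and summed. This is a suggested extension, not part of the original analysis; the sketch runs on small made-up stand-ins with the same column layout, and the `TOTALDMG` column name is hypothetical.

```r
# Made-up stand-ins for pdat and cdat (not NOAA data)
pdat_toy <- data.frame(EVTYPE = c("flood", "hail", "tornado"),
                       PROPDMG = c(100, 30, 50), stringsAsFactors = FALSE)
cdat_toy <- data.frame(EVTYPE = c("drought", "flood", "hail"),
                       CROPDMG = c(80, 20, 10), stringsAsFactors = FALSE)

# Outer join keeps event types that appear in only one of the tables
econ <- merge(pdat_toy, cdat_toy, by = "EVTYPE", all = TRUE)
econ[is.na(econ)] <- 0            # missing side counts as zero damage
econ$TOTALDMG <- econ$PROPDMG + econ$CROPDMG
econ <- econ[order(-econ$TOTALDMG), ]
econ
```

The outer join (`all = TRUE`) matters here: an inner join would silently drop event types such as drought that cause crop damage but no property damage.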