Severe weather events can cause injuries and loss of human life, as well as economic damages. This analysis focuses on identifying severe weather events in US that are most harmful to the human population and those that impact economy by causing property and crop damages.
Storm data was originally obtained from U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database has storm data from 1950 through 2011. In earlier years only fewer severe weather events were recorded, but as the years progressed storm data collection methods improved resulting in growth of number and type of storm events being recorded. NOAA’s website cautions about possible miscodings being present in this storm data. Data processing/cleansing techniques were applied on this data for this paper, but be informed of potential data anomalies.
Initial exploratory data analysis of this storm data identified the attributes that needed some data processing. Only selective attributes were included in the analytics data for this analysis to speed up processing time. Several data transformations were applied in order to prepare this data for reporting. National Weather Services storm data documentation was referred to heavily to improve quality of the storm data.
Storm Data for this analysis is a packed CSV file.
options(width=200)
##Set current working directory in R
## Download the packed CSV file and load the data
download.file("http://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile="repdata-data-StormData.csv.bz2")
storms<-read.csv("repdata-data-StormData.csv.bz2",header=TRUE,na.strings = "NA",stringsAsFactors=FALSE)
## Include R libraries needed for the analysis
library(plyr)
library(ggplot2)
library(reshape2)
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
## Loading required package: DBI
## Loading required package: RSQLite.extfuns
## Attribute selection for the analysis
nws_data<-subset(storms, select=c(BGN_DATE,STATE,EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP))
Attributes included are described below. National Climatic Data Center document was referred to for this purpose.
Attributes from the raw data -
. *BGN_DATE* : Date the storm event began
. *STATE* : State postal code
. *EVTYPE* : Type of storm events.
. *FATALITIES* : Number directly killed
. *INJURIES* : Number directly injured
. *PROPDMG* : Property damage in whole numbers and hundredths
. *PROPDMGEXP* : A multiplier where Hundred (H), Thousand (K), Million (M), Billion (B) etc, any non-numeric invalids values were translated to 1.
. *CROPDMG* : Crop damage in whole numbers and hundredths
. *CROPDMGEXP* : A multiplier where Hundred (H), Thousand (K), Million (M), Billion (B) etc, any non-numeric invalids values were translated to 1.
Attributes derived from the raw data:
. year : Year derived from the BGN_DATE, and indicates the year in which the event occurred.
. propdmgcost : Total cost of property damages.
. cropdmgcost : Total cost of crop damages.
. totaldmgcost : Total property and crop damage costs.
Property and crop damage costs in this data set must be calculated (or derived) for this reporting. For property and cost damages, a whole number field (attribute named like “***DMG“) and its unit (attribute named like CROPDMGEXP) are given in the data set. The derivation for cost attributes is done in two steps. Step one, convert units (defined above) to numeric equivalent. Step two, multiply the whole number damages with their numeric unit value from step one.
Event begin date attribute is used to populate the year attribute.
## Add "year" attribute to the data frame using BGN_DATE attribute.
nws_data<- transform(nws_data,year=format(as.Date(nws_data$BGN_DATE,"%m/%d/%Y"),"%Y"))
## Set numeric unit values for property damage
## Multiplier where Hundred (H), Thousand (K), Million (M), Billion (B), number as given,
## any invalid units are translated to 1
##Step One - Property Damage Costs derivation
nws_data$PROPDMGEXP<-as.character(nws_data$PROPDMGEXP)
nws_data[nws_data$PROPDMGEXP=="+"|nws_data$PROPDMGEXP=="-"|nws_data$PROPDMGEXP==""|
nws_data$PROPDMGEXP=="?",]$PROPDMGEXP<-1
nws_data[toupper(nws_data$PROPDMGEXP)=="H",]$PROPDMGEXP<-100
nws_data[toupper(nws_data$PROPDMGEXP)=="K",]$PROPDMGEXP<-1000
nws_data[toupper(nws_data$PROPDMGEXP)=="M",]$PROPDMGEXP<-1000000
nws_data[toupper(nws_data$PROPDMGEXP)=="B",]$PROPDMGEXP<-1000000000
nws_data$PROPDMGEXP<-as.numeric(nws_data$PROPDMGEXP)
## Set numeric unit values for crop damage
## Multiplier where Hundred (H), Thousand (K), Million (M), Billion (B), number as given,
## any invalid units are translated to 1
##Step One - Crop Damage Costs derivation
nws_data$CROPDMGEXP<-as.character(nws_data$CROPDMGEXP)
nws_data[nws_data$CROPDMGEXP==""|nws_data$CROPDMGEXP=="?",]$CROPDMGEXP<-1
nws_data[toupper(nws_data$CROPDMGEXP)=="K",]$CROPDMGEXP<-1000
nws_data[toupper(nws_data$CROPDMGEXP)=="M",]$CROPDMGEXP<-1000000
nws_data[toupper(nws_data$CROPDMGEXP)=="B",]$CROPDMGEXP<-1000000000
nws_data$CROPDMGEXP<-as.numeric(nws_data$CROPDMGEXP)
## Step Two - Property, Crop and total Damage costs derivation.
nws_data<- transform(nws_data,propdmgcost=PROPDMGEXP*PROPDMG,
cropdmgcost=CROPDMGEXP*CROPDMG)
nws_data<- transform(nws_data,totaldmgcost=propdmgcost+cropdmgcost)
After reviewing the events in the data set and reading through NWS storm data documentation, we decided to exclude events corresponding to periodic summaries (ex: May Summary), rare unusual events (ex: Late Snowfall) and other meteorogical events (ex: Record Temperatures).
## Convert event type attribute (EVTYPE) to upper case
nws_data$EVTYPE<-toupper(nws_data$EVTYPE)
## Remove rows corresponding to SUMMARY data, Rare unusual events and other meteorogical events
nws_data<-nws_data[-grep("SUMMARY|MONTHLY TEMPERATURE|SNOWFALL|MONTHLY RAINFALL|
RECORD HIGH|SNOW ADVISORY|UNUSUALLY LATE SNOW|FIRST SNOW|
RECORD MAY SNOW|RECORD WINTER SNOW|LATE SNOW|METRO STORM|
PROLONG COLD/SNOW|LACK OF SNOW|EARLY|RECORD",nws_data$EVTYPE),]
National weather documentation was also used to improve the data quality for the events attribute (EVTYPE). For example: Similar storm events were worded slightly differently, e.g. “coastal flood” and “cstl flood;”FLASH FLOOD" and “FLASH FLOOODING”; or “TSTM for Thunderstorm” etc.
## Improve storm events (EVTYPE) attribute data quality based on 2.1.1 Storm Data Event Table in the NWS document
nws_data$EVTYPE[grep("LOW TIDE",nws_data$EVTYPE)]<-"ASTRONOMICAL LOW TIDE"
nws_data$EVTYPE[grep("AVA",nws_data$EVTYPE)]<-"AVALANCHE"
nws_data$EVTYPE[grep("BLIZ",nws_data$EVTYPE)]<-"BLIZZARD"
##Look for both words in the pattern for an event
pattern<-c("COASTAL","FLOOD")
idx <- Reduce(`&`, lapply(pattern, grepl, nws_data$EVTYPE))
nws_data$EVTYPE[idx]<-"COASTAL FLOOD"
nws_data$EVTYPE[grep("CSTL FLOOD",nws_data$EVTYPE)]<-"COASTAL FLOOD"
nws_data$EVTYPE[grep("^(COLD|WIND|COLD/WIND)",nws_data$EVTYPE)]<-"COLD/WIND CHILL"
nws_data$EVTYPE[grep("EXTREME COLD|RECORD COLD|
EXTREME WIND CHILL|UNUSUALLY COLD",nws_data$EVTYPE)]<-"EXTREME COLD/WIND CHILL"
nws_data$EVTYPE[grep("FOG AND|FREEZING FOG|ICE FOG",nws_data$EVTYPE)]<-"FREEZING FOG"
nws_data$EVTYPE[grep("^(DENSE FOG|FOG|PATCHY D)",nws_data$EVTYPE)]<-"DENSE FOG"
nws_data$EVTYPE[grep("DROUGHT",nws_data$EVTYPE)]<-"DROUGHT"
nws_data$EVTYPE[grep("DUST DE|BLOWING DUST",nws_data$EVTYPE)]<-"DUST DEVIL"
##Look for both words in the pattern for an event
pattern<-c("FLASH","FLOOD")
idx <- Reduce(`&`, lapply(pattern, grepl, nws_data$EVTYPE))
nws_data$EVTYPE[idx]<-"FLASH FLOOD"
nws_data$EVTYPE[grep("FLASH FLOOOD",nws_data$EVTYPE)]<-"FLASH FLOOD"
nws_data$EVTYPE[grep("EXCESSIVE HEAT|EXTREME HEAT|RECORD HIGH",nws_data$EVTYPE)]<-"EXCESSIVE HEAT"
nws_data$EVTYPE[grep("FROST|FREEZE",nws_data$EVTYPE)]<-"FROST/FREEZE"
nws_data$EVTYPE[grep("^(HAIL)|SMALL HAIL|DEEP HAIL|NON SEVERE HAIL",nws_data$EVTYPE)]<-"HAIL"
nws_data$EVTYPE[grep("^(HEAT)|HOT|WARM|HYPERTHERMIA/EXPOSURE|MONTHLY TEMPERATURE",nws_data$EVTYPE)]<-"HEAT"
nws_data$EVTYPE[grep("FUNNEL",nws_data$EVTYPE)]<-"FUNNEL CLOUD"
nws_data$EVTYPE[grep("HEAVY SNOW|RECORD SNOW|EXCESSIVE SNOW|
ACCUMULATED SNOWFALL|HEAVY LAKE SNOW",nws_data$EVTYPE)]<-"HEAVY SNOW"
nws_data$EVTYPE[grep("HEAVY RAIN|EXCESSIVE RAIN|PROLONGED RAIN|(HEAVY)|EXCESSIVE PRECI|
HEAVY PRECI|TORRENTIAL RAIN|EXTREMELY WET|EXCESSIVE WET|HEAVY SHOWER|HVY RAIN",nws_data$EVTYPE)]<-"HEAVY RAIN"
nws_data$EVTYPE[grep("HIGH SURF|ROUGH SURF|HIGH WAVES|HIGH TIDES|HIGH SEAS|HEAVY SURF|
HAZARDOUS SURF",nws_data$EVTYPE)]<-"HIGH SURF"
nws_data$EVTYPE[grep("HIGH WIND",nws_data$EVTYPE)]<-"HIGH WIND"
nws_data$EVTYPE[grep("HURRICANE|TYPHOON",nws_data$EVTYPE)]<-"HURRICANE/TYPHOON"
nws_data$EVTYPE[grep("ICE STORM",nws_data$EVTYPE)]<-"ICE STORM"
nws_data$EVTYPE[grep("LAKE-E|LAKE E",nws_data$EVTYPE)]<-"LAKE-EFFECT SNOW"
nws_data$EVTYPE[grep("LIGHTNING|LIGHTING|LIGNTNING",nws_data$EVTYPE)]<-"LIGHTNING"
nws_data$EVTYPE[grep("SLEET|HEAVY WET SNOW|FREEZING RAIN|FREEZING DRIZZLE",nws_data$EVTYPE)]<-"SLEET"
nws_data$EVTYPE[grep("^(STRONG WIND)",nws_data$EVTYPE)]<-"STRONG WIND"
nws_data$EVTYPE[grep("MIXED PRECI|NORMAL PREC|RAINFALL|PRECIPITATION",nws_data$EVTYPE)]<-"RAIN"
nws_data[nws_data$EVTYPE %in% unique(nws_data$EVTYPE)[
grep("THUN|THUDER|TSTM",unique(nws_data[nws_data$EVTYPE!=
"MARINE THUNDERSTORM WIND",]$EVTYPE))],]$EVTYPE <-"THUNDERSTORM WIND"
# Section 7.39 for NWS documentation indicates MICROBURST/GUSTNADO as Thunderstorm Wind.
nws_data$EVTYPE[grep("MICROBURST|MICOBURST|GUST",nws_data$EVTYPE)]<-"MICROBURST"
nws_data$EVTYPE[grep("TORN",nws_data$EVTYPE)]<-"TORNADO"
nws_data$EVTYPE[grep("TROPICAL STORM",nws_data$EVTYPE)]<-"TROPICAL STORM"
nws_data$EVTYPE[grep("VOLCANIC ASH",nws_data$EVTYPE)]<-"VOLCANIC ASH"
nws_data$EVTYPE[grep("WATERSPOUT|WAYTERSPOUT|WATER SPOUT",nws_data$EVTYPE)]<-"WATERSPOUT"
nws_data$EVTYPE[grep("WALL CLOUD",nws_data$EVTYPE)]<-"WALL CLOUD"
nws_data$EVTYPE[grep("FIRE",nws_data$EVTYPE)]<-"WILDFIRE"
nws_data$EVTYPE[grep("WINTER STORM",nws_data$EVTYPE)]<-"WINTER STORM"
nws_data$EVTYPE[grep("WINTER WEATHER|WINTERY MIX|WINTRY MIX|ICY ROADS|SNOWFALL",nws_data$EVTYPE)]<-"WINTER WEATHER"
nws_data$EVTYPE[grep("DRY",nws_data$EVTYPE)]<-"DRY WEATHER"
## Based on 7.14.1 section in the NWS documentation, a mudslide/landslide is categorized as Debris Flow
nws_data$EVTYPE[grep("SLIDE|LANDSLUMP",nws_data$EVTYPE)]<-"DEBRIS FLOW"
flood_idx<-grep("FLOOD|FLD",nws_data$EVTYPE)
flash_idx<-grep("FLASH FLOOD",nws_data$EVTYPE)
coastal_idx<-grep("COASTAL FLOOD",nws_data$EVTYPE)
lake_idx<-grep("LAKESHORE FLOOD",nws_data$EVTYPE)
nws_data[setdiff(flood_idx,union(union(flash_idx,coastal_idx),lake_idx)),]$EVTYPE<-"FLOOD"
nws_data$EVTYPE[grep("RIP CURRENT",nws_data$EVTYPE)]<-"RIP CURRENTS"
## Categorize any event that has ICE or FREEZ in it and is not a standard event in 2.1.1 event table as SLEET.
nws_data[intersect(setdiff(grep("ICE|FREEZ",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|LAKE-E",nws_data$EVTYPE)),
setdiff(grep("SNOW",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|LAKE-E",nws_data$EVTYPE))),]$EVTYPE<-"SLEET"
## Categorize remaining events that has ICE and is not a standard event as WINTER WEATHER
nws_data[setdiff(grep("ICE",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|
LAKE-E",nws_data$EVTYPE)),]$EVTYPE<-"WINTER WEATHER"
nws_data[setdiff(grep("ICE|SNOW",nws_data$EVTYPE),grep("ICE STORM|HEAVY SNOW|
LAKE-E|LATE SEASON SNOW|SNOWS",nws_data$EVTYPE)),]$EVTYPE<-"WINTER WEATHER"
nws_data[setdiff(grep("WIND",nws_data$EVTYPE),grep("EXTREME COLD/WIND CHILL|
THUNDERSTORM WIND|MARINE|HIGH WIND|CHILL",nws_data$EVTYPE)),]$EVTYPE<-"HIGH WIND"
nws_data$EVTYPE[grep("CHILL|UNSEASONABLY COLD",nws_data$EVTYPE)]<-"EXTREME COLD/WIND CHILL"
A few outliers were observed in the data set. But for the purposes of this analysis a decision was made to ignore any outliers.
## Sample Outliers in the dataset
## May 22, 2011 Joplin, Missouri EF5 Tornado
nws_data[nws_data$FATALITIES>150 & nws_data$EVTYPE=="TORNADO",]
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year propdmgcost cropdmgcost totaldmgcost
## 862634 5/22/2011 0:00:00 MO TORNADO 158 1150 2.8 1e+09 0 1000 2011 2.8e+09 0 2.8e+09
## The heat wave of [Chicago](http://www.crh.noaa.gov/lot?n=top20events_1900to1999)
nws_data[nws_data$FATALITIES>500 & nws_data$EVTYPE=="HEAT",]
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year propdmgcost cropdmgcost totaldmgcost
## 198704 7/12/1995 0:00:00 IL HEAT 583 0 0 1 0 1 1995 0 0 0
## Potential data issue
nws_data[nws_data$totaldmgcost>1e+10 & nws_data$EVTYPE=="FLOOD",]
## BGN_DATE STATE EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP year propdmgcost cropdmgcost totaldmgcost
## 605953 1/1/2006 0:00:00 CA FLOOD 0 0 115 1e+09 32.5 1e+06 2006 1.15e+11 32500000 1.15e+11
Summarize data Metrics like total, average and median fatalities, injuries , costs (property damage costs, crop damage costs and total costs) are computed for each event and for each year. The data are further summarized by event type using these same metrics.
## Yearly totals by Event Type
nws_yearly_aggregate <- ddply(nws_data, .(year,EVTYPE), summarize,
totFatalities = sum(FATALITIES),
totInjuries = sum(INJURIES),
totPropcost = sum(propdmgcost),
totCropcost = sum(cropdmgcost),
totalcost=sum(propdmgcost) +sum(cropdmgcost)
)
## Overall Totals and Averages by Event Type. Calculating median for totalcost as an outlier was observed in the data set.
eventSummary <-sqldf("select EVTYPE,sum(totFatalities) totFatalities,
AVG(totFatalities) avgFatalities,
AVG(CASE WHEN year='2011' THEN totFatalities END) AVG_1yr_FATAL,
AVG(CASE WHEN year>'2000' THEN totFatalities END) AVG_10yr_FATAL,
AVG(CASE WHEN year>'1980' THEN totFatalities END) AVG_30yr_FATAL,
sum(totInjuries) totInjuries,
AVG(totInjuries) avgInjuries,
AVG(CASE WHEN year='2011' THEN totInjuries END) AVG_1yr_Injuries,
AVG(CASE WHEN year>'2000' THEN totInjuries END) AVG_10yr_Injuries,
AVG(CASE WHEN year>'1980' THEN totInjuries END) AVG_30yr_Injuries,
sum(totPropcost) totPropcost,
SUM(totCropcost) totCropcost,
SUM(totalcost) totalcost,
AVG(totalcost) AVGtotalcost,
AVG(CASE WHEN year='2011' THEN totalcost END) AVG_1yr_totalcost,
AVG(CASE WHEN year>'2000' THEN totalcost END) AVG_10yr_totalcost,
AVG(CASE WHEN year>'1980' THEN totalcost END) AVG_30yr_totalcost,
median(totalcost) mediantotalcost,
median(CASE WHEN year='2011' THEN totalcost END) median_1yr_totalcost,
median(CASE WHEN year>'2000' THEN totalcost END) median_10yr_totalcost,
median(CASE WHEN year>'1980' THEN totalcost END) median_30yr_totalcost,
median(totFatalities) medianFatalities,
median(CASE WHEN year='2011' THEN totFatalities END) median_1yr_FATAL,
median(CASE WHEN year>'2000' THEN totFatalities END) median_10yr_FATAL,
median(CASE WHEN year>'1980' THEN totFatalities END) median_30yr_FATAL,
median(totInjuries) medianInjuries,
median(CASE WHEN year='2011' THEN totInjuries END) median_1yr_Injuries,
median(CASE WHEN year>'2000' THEN totInjuries END) median_10yr_Injuries,
median(CASE WHEN year>'1980' THEN totInjuries END) median_30yr_Injuries
from nws_yearly_aggregate
group by EVTYPE")
## Loading required package: tcltk
Top 10 events causing fatalities in human population and their metrics (total, average and median)
subset(eventSummary[order(eventSummary$totFatalities,eventSummary$totInjuries, decreasing=T),],
select=c(EVTYPE,totFatalities,avgFatalities,medianFatalities,totInjuries,avgInjuries,medianInjuries,totalcost))[1:10,]
## EVTYPE totFatalities avgFatalities medianFatalities totInjuries avgInjuries medianInjuries totalcost
## 98 TORNADO 5636 90.90 56.5 91407 1474.31 1060.0 5.741e+10
## 34 EXCESSIVE HEAT 1999 111.06 59.5 6680 371.11 293.5 5.053e+08
## 53 HIGH WIND 1156 20.28 3.0 11348 199.09 60.0 2.095e+10
## 46 HEAT 1155 72.19 14.5 2498 156.12 15.0 4.193e+08
## 37 FLASH FLOOD 1035 54.47 58.0 1802 94.84 44.0 1.844e+10
## 62 LIGHTNING 817 43.00 44.0 5232 275.37 256.0 9.461e+08
## 83 RIP CURRENTS 572 31.78 30.5 529 29.39 22.0 1.630e+05
## 38 FLOOD 511 26.89 22.0 6874 361.79 14.0 1.611e+11
## 36 EXTREME COLD/WIND CHILL 489 25.74 27.0 407 21.42 14.0 1.513e+09
## 47 HEAVY RAIN 291 15.32 11.0 1404 73.89 61.0 5.285e+09
Percent of total fatalities caused by Tornado
round(eventSummary[eventSummary$EVTYPE=="TORNADO",]$totFatalities/sum(eventSummary$totFatalities),2)*100
## [1] 37
Top 10 events causing injuries in human population and their metrics (total, average and median)
subset(eventSummary[order(eventSummary$totInjuries,eventSummary$totFatalities, decreasing=T),],
select=c(EVTYPE,totInjuries,avgInjuries,medianInjuries,totFatalities,avgFatalities,medianFatalities))[1:10,]
## EVTYPE totInjuries avgInjuries medianInjuries totFatalities avgFatalities medianFatalities
## 98 TORNADO 91407 1474.31 1060.0 5636 90.903 56.5
## 53 HIGH WIND 11348 199.09 60.0 1156 20.281 3.0
## 38 FLOOD 6874 361.79 14.0 511 26.895 22.0
## 34 EXCESSIVE HEAT 6680 371.11 293.5 1999 111.056 59.5
## 62 LIGHTNING 5232 275.37 256.0 817 43.000 44.0
## 46 HEAT 2498 156.12 15.0 1155 72.188 14.5
## 57 ICE STORM 1990 104.74 8.0 89 4.684 3.0
## 37 FLASH FLOOD 1802 94.84 44.0 1035 54.474 58.0
## 121 WILDFIRE 1608 84.63 82.0 90 4.737 2.0
## 47 HEAVY RAIN 1404 73.89 61.0 291 15.316 11.0
Percent of total injuries caused by Tornado
round(eventSummary[eventSummary$EVTYPE=="TORNADO",]$totInjuries/sum(eventSummary$totInjuries),2)*100
## [1] 65
Percent of Total Fatalities caused by Excessive Heat and/or Heat
round(sum(eventSummary[grep("HEAT",eventSummary$EVTYPE),]$totFatalities)/sum(eventSummary$totFatalities),2)*100
## [1] 21
## Get data for top 10 most fatalities storm events.
eventSummaryFATsort<-eventSummary[order(eventSummary$totFatalities, decreasing=T),]
eventSummaryFATsort<-eventSummaryFATsort[1:10,]
## Average Fatalities for Analysis
eventAVGFatalities<-melt(subset(eventSummaryFATsort,
select=c(EVTYPE,avgFatalities,AVG_1yr_FATAL,AVG_10yr_FATAL,AVG_30yr_FATAL))
, id.vars="EVTYPE")
eventAVGFatalities$value <-as.numeric(eventAVGFatalities$value)
eventAVGFatalities$variable <-as.character(eventAVGFatalities$variable)
eventAVGFatalities[eventAVGFatalities$variable=="avgFatalities",]$variable <-"1950-2011"
eventAVGFatalities[eventAVGFatalities$variable=="AVG_1yr_FATAL",]$variable <-"2011"
eventAVGFatalities[eventAVGFatalities$variable=="AVG_10yr_FATAL",]$variable <-"2001-2011"
eventAVGFatalities[eventAVGFatalities$variable=="AVG_30yr_FATAL",]$variable <-"1981-2011)"
## Median Fatalities for Analysis
eventMedianFatalities<-melt(subset(eventSummaryFATsort,
select=c(EVTYPE,medianFatalities,median_1yr_FATAL,median_10yr_FATAL,median_30yr_FATAL))
, id.vars="EVTYPE")
eventMedianFatalities$value <-as.numeric(eventMedianFatalities$value)
eventMedianFatalities<-eventMedianFatalities[order(eventMedianFatalities$value),]
eventMedianFatalities$variable <-as.character(eventMedianFatalities$variable)
eventMedianFatalities[eventMedianFatalities$variable=="medianFatalities",]$variable <-"1950-2011"
eventMedianFatalities[eventMedianFatalities$variable=="median_1yr_FATAL",]$variable <-"2011"
eventMedianFatalities[eventMedianFatalities$variable=="median_10yr_FATAL",]$variable <-"2001-2011"
eventMedianFatalities[eventMedianFatalities$variable=="median_30yr_FATAL",]$variable <-"1981-2011"
## Get data for top 10 storm events causing injuries.
eventSummaryInjsort<-eventSummary[order(eventSummary$totInjuries, decreasing=T),]
eventSummaryInjsort<-eventSummaryInjsort[1:10,]
## Average Injuries for Analysis
eventAVGInjuries<-melt(subset(eventSummaryInjsort,
select=c(EVTYPE,avgInjuries,AVG_1yr_Injuries,AVG_10yr_Injuries,AVG_30yr_Injuries))
, id.vars="EVTYPE")
eventAVGInjuries$value <-as.numeric(eventAVGInjuries$value)
eventAVGInjuries$variable <-as.character(eventAVGInjuries$variable)
eventAVGInjuries[eventAVGInjuries$variable=="avgInjuries",]$variable <-"1950-2011"
eventAVGInjuries[eventAVGInjuries$variable=="AVG_1yr_Injuries",]$variable <-"2011"
eventAVGInjuries[eventAVGInjuries$variable=="AVG_10yr_Injuries",]$variable <-"2001-2011"
eventAVGInjuries[eventAVGInjuries$variable=="AVG_30yr_Injuries",]$variable <-"1981-2011)"
## Median Injuries for Analysis
eventMedianInjuries<-melt(subset(eventSummaryInjsort,
select=c(EVTYPE,medianInjuries,median_1yr_Injuries,median_10yr_Injuries,median_30yr_Injuries))
, id.vars="EVTYPE")
eventMedianInjuries$value <-as.numeric(eventMedianInjuries$value)
eventMedianInjuries<-eventMedianInjuries[order(eventMedianInjuries$value),]
eventMedianInjuries$variable <-as.character(eventMedianInjuries$variable)
eventMedianInjuries[eventMedianInjuries$variable=="medianInjuries",]$variable <-"1950-2011"
eventMedianInjuries[eventMedianInjuries$variable=="median_1yr_Injuries",]$variable <-"2011"
eventMedianInjuries[eventMedianInjuries$variable=="median_10yr_Injuries",]$variable <-"2001-2011"
eventMedianInjuries[eventMedianInjuries$variable=="median_30yr_Injuries",]$variable <-"1981-2011"
## 2011 data for severe weather events
sqldf("select EVTYPE,sum(totFatalities) totFatalities,
sum(totInjuries) totInjuries,
sum(totPropcost) totPropcost,
SUM(totCropcost) totCropcost,
SUM(totalcost) totalcost
from nws_yearly_aggregate
where year='2011'
group by EVTYPE
order by 4 desc
")[1:10,]
## EVTYPE totFatalities totInjuries totPropcost totCropcost totalcost
## 1 TORNADO 587 6163 9.820e+09 31361000 9.851e+09
## 2 FLOOD 58 10 7.718e+09 154872000 7.873e+09
## 3 FLASH FLOOD 68 30 1.384e+09 88447000 1.472e+09
## 4 WILDFIRE 6 116 6.483e+08 9797000 6.581e+08
## 5 HAIL 0 31 4.513e+08 82334000 5.337e+08
## 6 HIGH WIND 68 417 4.414e+08 199184000 6.406e+08
## 7 TROPICAL STORM 4 1 1.387e+08 24501000 1.632e+08
## 8 TSUNAMI 1 0 5.355e+07 0 5.355e+07
## 9 LIGHTNING 26 194 4.698e+07 112000 4.709e+07
## 10 STORM SURGE/TIDE 0 0 4.070e+07 0 4.070e+07
## Economic Damages Data Analysis
subset(eventSummary[order(eventSummary$totalcost, decreasing=T),],
select=c(EVTYPE,totalcost,totPropcost,totCropcost,AVGtotalcost,mediantotalcost,
AVG_10yr_totalcost,median_10yr_totalcost))[1:10,]
## EVTYPE totalcost totPropcost totCropcost AVGtotalcost mediantotalcost AVG_10yr_totalcost median_10yr_totalcost
## 38 FLOOD 1.611e+11 1.502e+11 1.086e+10 8.477e+09 1.313e+09 12508050284.5455 1173603400.0
## 54 HURRICANE/TYPHOON 9.076e+10 8.526e+10 5.506e+09 5.042e+09 8.407e+08 7540017781.0 524977505.0
## 98 TORNADO 5.741e+10 5.699e+10 4.150e+08 9.259e+08 6.314e+08 1751364542.72727 802053210.0
## 96 STORM SURGE 4.332e+10 4.332e+10 5.000e+03 3.610e+09 8.164e+06 7194962500.0 26894000.0
## 53 HIGH WIND 2.095e+10 1.891e+10 2.046e+09 3.676e+08 0.000e+00 1053791499.09091 607908770.0
## 44 HAIL 1.902e+10 1.597e+10 3.047e+09 3.337e+08 0.000e+00 1200699378.18182 606047300.0
## 37 FLASH FLOOD 1.844e+10 1.691e+10 1.532e+09 9.705e+08 8.689e+08 1107210019.09091 991720900.0
## 26 DROUGHT 1.502e+10 1.046e+09 1.397e+10 7.905e+08 5.548e+08 685767545.454545 374677000.0
## 57 ICE STORM 8.968e+09 3.946e+09 5.022e+09 4.720e+08 1.132e+08 180310436.363636 87469000.0
## 121 WILDFIRE 8.900e+09 8.497e+09 4.033e+08 4.684e+08 2.019e+08 482018111.818182 202833630.0
round(eventSummary[eventSummary$EVTYPE=="FLOOD",]$totalcost/sum(eventSummary$totalcost),2)*100
## [1] 34
round(eventSummary[eventSummary$EVTYPE=="HURRICANE/TYPHOON",]$totalcost/sum(eventSummary$totalcost),2)*100
## [1] 19
round(eventSummary[eventSummary$EVTYPE=="TORNADO",]$totalcost/sum(eventSummary$totalcost),2)*100
## [1] 12
eventSummarytotcostsort<-eventSummary[order(eventSummary$totalcost, decreasing=T),]
eventSummarytotcostsort<-eventSummarytotcostsort[1:10,]
eventTotalcost<-melt(eventSummarytotcostsort[,c(1,15,16,17,18)], id.vars="EVTYPE")
eventTotalcost$value <-as.numeric(eventTotalcost$value)
eventTotalcost$variable <-as.character(eventTotalcost$variable)
eventTotalcost[eventTotalcost$variable=="AVGtotalcost",]$variable <-"1950-2011"
eventTotalcost[eventTotalcost$variable=="AVG_1yr_totalcost",]$variable <-"2011"
eventTotalcost[eventTotalcost$variable=="AVG_10yr_totalcost",]$variable <-"2001-2011"
eventTotalcost[eventTotalcost$variable=="AVG_30yr_totalcost",]$variable <-"1981-2011"
## Median Cost Analysis
eventMedianCost<-melt(eventSummarytotcostsort[,c(1,19,20,21,22)], id.vars="EVTYPE")
eventMedianCost$value <-as.numeric(eventMedianCost$value)
eventMedianCost<-eventMedianCost[order(eventMedianCost$value),]
eventMedianCost$variable <-as.character(eventMedianCost$variable)
eventMedianCost[eventMedianCost$variable=="mediantotalcost",]$variable <-"1950-2011"
eventMedianCost[eventMedianCost$variable=="median_1yr_totalcost",]$variable <-"2011"
eventMedianCost[eventMedianCost$variable=="median_10yr_totalcost",]$variable <-"2001-2011"
eventMedianCost[eventMedianCost$variable=="median_30yr_totalcost",]$variable <-"1981-2011"
Storm events having most harmful effect on human population in US between 1950 and 2011
Tornado is identified to be the most harmful severe weather event causing over 37% of fatalities and 65% of injuries in US between 1950 and 2011. A total of 5636 fatalities and 91407 injuries have been recorded during these years. In the year 2011 especially fatalities (587) and injuries (6163) recorded from tornados were way above the normal averages and medians between 1950 and 2011.
Excessive heat and heat have proven to be the second most dangerous storm event for the human population. There were 1999 fatalities recorded from “excessive heat” and 1155 fatalities from “heat”. together “excessive heat” and “heat” accounted for over 21% of fatalities and 7% of injuries.
Flash flooding, lightning, thunderstorms winds and flooding have been observed as the next set of events causing most fatalities in human population. It is also observed that the thunderstorm winds, flood, excessive heat, lightning seem to cause most injuries in human population. All these observations and much more are clearly depicted in the analysis above and plots in the figure below.
## **FIGURE - 1.**
## Generate plots for FATALITIES
plot1<-ggplot(data=eventSummaryFATsort, aes(x=reorder(EVTYPE, -totFatalities), y=totFatalities,las=2)) +
geom_bar(stat="identity", position='dodge',fill="#FF9999") +
labs( x = "Severe Storm Events (1950-2011)",y = "Total Fatalities",
title = "Top 10 events in US causing most fatalities") +
theme(axis.text=element_text(size=12), axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=25))
plot3<-ggplot(data=eventAVGFatalities, aes(x=EVTYPE, y=value,las=2)) +
geom_bar(aes(fill=variable),stat="identity", position='dodge') +
labs( x = "Severe Storm Events (1950-2011)",y = "Average Fatalities",
title = "Average of Top 10 events in US causing most fatalities") +
theme(axis.text=element_text(size=12),axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
legend.position="bottom", legend.text=element_text(size=14, face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=25))
## Generate plots for INJURIES
plot2<-ggplot(data=eventSummaryInjsort, aes(x=reorder(EVTYPE, -totInjuries), y=totInjuries,las=2)) +
geom_bar(stat="identity", position='dodge',fill="#FF9999") +
labs( x = "Severe Storm Events (1950-2011)",y = "Total Injuries",
title = "Top 10 events in US causing most injuries") +
theme(axis.text=element_text(size=12),axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=25))
plot4<-ggplot(data=eventAVGInjuries, aes(x=EVTYPE, y=value,las=2)) +
geom_bar(aes(fill=variable),stat="identity", position='dodge') +
labs( x = "Severe Storm Events (1950-2011)",y = "Average Injuries",
title = "Average of Top 10 events in US causing most injuries") +
theme(axis.text=element_text(size=12),axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
legend.position="bottom", legend.text=element_text(size=14, face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=25))
require(grid)
## Loading required package: grid
grid.newpage()
pushViewport(viewport(layout = grid.layout(2, 2)))
print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(plot2, vp = viewport(layout.pos.row = 1, layout.pos.col = 2))
print(plot3, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(plot4, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))
Total storm effect on Economy in US between 1950 and 2011
Floods, hurricane/typhoon and tornados seems to be top 3 severe storm events that caused most economic damages in US between 1950-2011. Floods constituted 34%, hurricane/typhoon constituted 19% and TORNADO constituted 12% of total economic damages during that period. Plots in the figure below show these results clearly.
## **FIGURE - 2.**
## Generate Plots for TOTALCOST
plot1<-ggplot(data=eventSummarytotcostsort, aes(x=reorder(EVTYPE, -totalcost), y=totalcost,las=2)) +
geom_bar(stat="identity", position='dodge',fill="#FF9999") +
labs( x = "Severe Weather Events (1950-2011)",y = "Total Economic Damages",
title = "Top 10 storm events that cause most economic damage in US") +
theme(axis.text=element_text(size=14,face="bold"),axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 12, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 12, colour = 'black',face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=20))
plot2<-ggplot(data=eventTotalcost, aes(x=EVTYPE, y=value,las=2)) +
geom_bar(aes(fill=variable),stat="identity", position='dodge') +
labs( x = "Severe Weather Events (1950-2011)",y = "Average Economic Damages",
title = "Average economic damage from storm events (top 10) in US") +
theme(axis.text=element_text(size=14,face="bold"),axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 14, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 14, colour = 'black',face="bold"),
legend.position="bottom", legend.text=element_text(size=14, face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=20))
plot3<-ggplot(data=eventMedianCost, aes(x=EVTYPE, y=value,las=2)) +
geom_bar(aes(fill=variable),stat="identity", position='dodge') +
labs( x = "Severe Weather Events (1950-2011)",y = "Median Economic Damages",
title = "Median storm events causing most damage to US economy") +
theme(axis.text=element_text(size=14,face="bold"),axis.title=element_text(size=14,face="bold"),
axis.text.x = element_text(size = 14, colour = 'black',face="bold",angle = 90),
axis.text.y = element_text(size = 14, colour = 'black',face="bold"),
legend.position="bottom", legend.text=element_text(size=14, face="bold"),
plot.title = element_text(lineheight=3, face="bold", color="black", size=20))
require(grid)
grid.newpage()
pushViewport(viewport(layout = grid.layout(2,2)))
print(plot1, vp = viewport(layout.pos.row = 1, layout.pos.col = 1))
print(plot2, vp = viewport(layout.pos.row = 2, layout.pos.col = 1))
print(plot3, vp = viewport(layout.pos.row = 2, layout.pos.col = 2))