This document reports on the public health and economic impact of major storms in the USA. The questions it answers are:
Which types of storm event have caused the most deaths and injuries?
Which types of storm event have cost the most in terms of total damage?
Which types of storm event have cost the most in terms of damage per event?
The data were obtained from the following link.
https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
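This link points to a bzip2-compressed CSV file. The R code used to download and read the data is:
{
# Working directory used by the author; adjust to your own machine.
setwd("C:/Users/Warwick/Documents/DataScience/repdata")
# Skip the slow read if the data are already loaded in this session.
if(!exists("rawStormData"))
{
# Download the compressed file only if it is not already on disk.
if(!file.exists("storm.csv.bz2"))
{
download_url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(download_url, destfile="storm.csv.bz2", mode="wb")
}
# read.csv reads bzip2-compressed files directly, so no unzip step is needed.
rawStormData <- read.csv("storm.csv.bz2", stringsAsFactors = FALSE, sep="," )
}
}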
I checked the size of the dataset (rows and columns) with dim:
{
dim(rawStormData)
}
## [1] 902297 37
I identified the columns of interest from the dimnames of the first few rows:
{
look_StormData<-head(rawStormData)
dimnames(look_StormData)
}
## [[1]]
## [1] "1" "2" "3" "4" "5" "6"
##
## [[2]]
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
The data columns required are event type (EVTYPE), number of fatalities, number of injuries, property damage, the property damage exponent (PROPDMGEXP, a magnitude code for the property damage figure), crop damage and the crop damage exponent (CROPDMGEXP).
I created a data table for the analysis by selecting the columns above.
{
library(dplyr)
storm_table<-tbl_df(rawStormData)
storm_effects<-select(storm_table, EVTYPE, FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
}
##
## Attaching package: 'dplyr'
## The following object is masked from 'package:stats':
##
## filter
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
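To measure harm to population health I used the number of fatalities and injuries. There are at least two ways of measuring this: the total number of deaths and injuries for each event type, and the mean number of deaths and injuries per event.
The R code to find both is: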
{
# Check for missing values for fatalities and injuries.
sum(is.na(storm_effects$FATALITIES))
sum(is.na(storm_effects$INJURIES))
# Combine deaths and injuries into one measure of harm per event.
PERSONDMG<-storm_effects$FATALITIES+storm_effects$INJURIES
storm_damage<-cbind(storm_effects, PERSONDMG)
# Total and mean harm for each event type, sorted by total.
by_event_type<-group_by(storm_damage, EVTYPE)
pop_health<-summarise(by_event_type, total_people=sum(PERSONDMG), mean_people=mean(PERSONDMG))
worst_event_types<-arrange(pop_health,desc(total_people))
}
The ten types of events that have killed or injured the most people are given by:
{
worst_totals<-select(worst_event_types, "Type of Event"=EVTYPE, "Deaths and Injuries"=total_people)
head(worst_totals,10)
}
## Source: local data frame [10 x 2]
##
## Type of Event Deaths and Injuries
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 TSTM WIND 7461
## 4 FLOOD 7259
## 5 LIGHTNING 6046
## 6 HEAT 3037
## 7 FLASH FLOOD 2755
## 8 ICE STORM 2064
## 9 THUNDERSTORM WIND 1621
## 10 WINTER STORM 1527
The figure below shows this. It was derived using the following code:
{
worst_types<-head(worst_event_types$EVTYPE,10)
worst<-head(worst_event_types$total_people,10)
barplot(worst, legend=worst_types,
col=c("red","orange", "yellow", "green","blue","purple", "violet", "lavender", "lightblue", "pink"),
args.legend=list(x="topright",cex=0.5), las=2, xlab="Event Type",
main="Deaths and Injuries in USA\n Caused By the Top Ten Types of Storm Events")
}
There is a problem with the coding of event types. Flood and flash flood could be considered as one category, as could heat and excessive heat, and thunderstorm wind and tstm wind.
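One way to merge such labels before grouping is sketched below. The helper name normalise_evtype and its mapping are illustrative assumptions, not part of the analysis above, and the toy vector stands in for storm_effects$EVTYPE. The sketch uses base R only.

```r
# Hypothetical helper: collapse a few near-duplicate EVTYPE labels.
# The mapping covers only the overlaps seen in the tables; it is an assumption.
normalise_evtype <- function(evtype) {
  x <- toupper(trimws(evtype))
  x[x %in% c("TSTM WIND", "THUNDERSTORM WIND", "THUNDERSTORM WINDS")] <- "THUNDERSTORM WIND"
  x[x %in% c("FLOOD", "FLASH FLOOD")] <- "FLOOD"
  x[x %in% c("HEAT", "EXCESSIVE HEAT")] <- "HEAT"
  x
}

# Toy input standing in for storm_effects$EVTYPE.
toy_types <- c("TSTM WIND", "Flash Flood", "EXCESSIVE HEAT", "TORNADO")
merged_types <- normalise_evtype(toy_types)
print(merged_types)
```

Applying a helper like this to EVTYPE before group_by would change the rankings, so the tables in this report would need to be regenerated.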
The ten types of events with the highest number of deaths and injuries per event can be found by
{
worst_event_rate_types<-arrange(pop_health,desc(mean_people))
worst_rates<-select(worst_event_rate_types, "Type of Event"=EVTYPE, "Deaths and Injuries Per Event"=mean_people)
head(worst_rates,10)
}
## Source: local data frame [10 x 2]
##
## Type of Event Deaths and Injuries Per Event
## 1 Heat Wave 70.00000
## 2 TROPICAL STORM GORDON 51.00000
## 3 WILD FIRES 38.25000
## 4 THUNDERSTORMW 27.00000
## 5 TORNADOES, TSTM WIND, HAIL 25.00000
## 6 HIGH WIND AND SEAS 23.00000
## 7 HEAT WAVE DROUGHT 19.00000
## 8 SNOW/HIGH WINDS 18.00000
## 9 WINTER STORM HIGH WINDS 16.00000
## 10 HURRICANE/TYPHOON 15.21591
Tornadoes have killed or injured by far the greatest number of people of any type of storm event, but the mean number of deaths and injuries per event is greater for heat waves, wild fires and thunderstorms with high winds.
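A per-event mean may be driven by labels that occur only a handful of times, so it can help to report the number of records next to each mean. The sketch below uses made-up toy data, and the threshold min_events is an arbitrary assumption chosen for illustration.

```r
# Toy data: one rare label with a single severe event, one common label.
toy <- data.frame(
  EVTYPE = c("RARE LABEL", "TORNADO", "TORNADO", "TORNADO", "TORNADO"),
  PERSONDMG = c(70, 10, 0, 2, 0),
  stringsAsFactors = FALSE
)

# Mean harm and number of records for each event type.
mean_people <- tapply(toy$PERSONDMG, toy$EVTYPE, mean)
n_events <- tapply(toy$PERSONDMG, toy$EVTYPE, length)
rates <- data.frame(EVTYPE = names(mean_people),
                    mean_people = as.numeric(mean_people),
                    n_events = as.integer(n_events))

# Keep only types with at least min_events records (assumed threshold).
min_events <- 3
reliable_rates <- rates[rates$n_events >= min_events, ]
print(reliable_rates)
```

In the toy data the rare label has the highest mean but rests on a single record, so it is dropped by the filter.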
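To measure the damage caused by storm events, the estimates of damage to property and crops are combined for each event. The R code below adds PROPDMG and CROPDMG as recorded, without applying the magnitude codes in PROPDMGEXP and CROPDMGEXP, so the totals are unscaled figures rather than amounts in a single dollar unit: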
{
ECONDMG=storm_damage$PROPDMG+storm_damage$CROPDMG
all_storm_damage<-cbind(storm_damage,ECONDMG)
}
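PROPDMG and CROPDMG are recorded alongside magnitude codes in PROPDMGEXP and CROPDMGEXP. A minimal sketch of applying those codes before summing is given below. The helper exp_multiplier is hypothetical, the K, M and B values follow the usual thousand, million and billion convention, and treating every other code as a multiplier of 1 is an assumption made only for this sketch.

```r
# Hypothetical helper: turn an exponent code into a dollar multiplier.
# Any code other than K, M or B (in either case) is treated as 1 here,
# which is an assumption, not a documented rule.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Toy rows standing in for storm_damage.
toy <- data.frame(PROPDMG = c(25, 2.5, 1), PROPDMGEXP = c("K", "M", ""),
                  CROPDMG = c(0, 1, 0), CROPDMGEXP = c("", "k", ""),
                  stringsAsFactors = FALSE)

# Damage in dollars: scale each figure by its own code, then add.
toy$ECONDMG_USD <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)
print(toy$ECONDMG_USD)
```

Totals computed this way would differ from the unscaled totals reported here, so the rankings would need to be rechecked.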
The ten types of events that have caused the most costly estimated damage are given by:
{
by_damage_event_type<-group_by(all_storm_damage, EVTYPE)
econ_damage<-summarise(by_damage_event_type, total_damage=sum(ECONDMG), mean_damage=mean(ECONDMG))
worst_econ_event_types<-arrange(econ_damage,desc(total_damage))
worst_damage_totals<-select(worst_econ_event_types, "Type of Event"=EVTYPE, "Total Damage"=total_damage)
head(worst_damage_totals,10)
}
## Source: local data frame [10 x 2]
##
## Type of Event Total Damage
## 1 TORNADO 3312276.7
## 2 FLASH FLOOD 1599325.1
## 3 TSTM WIND 1445168.2
## 4 HAIL 1268289.7
## 5 FLOOD 1067976.4
## 6 THUNDERSTORM WIND 943635.6
## 7 LIGHTNING 606932.4
## 8 THUNDERSTORM WINDS 464978.1
## 9 HIGH WIND 342014.8
## 10 WINTER STORM 134699.6
The figure below shows the cost of property and crop damage for the ten most costly event types. It was derived using the following code:
{
worst_damage_types<-head(worst_econ_event_types$EVTYPE,10)
worst_damage<-head(worst_econ_event_types$total_damage,10)
# The title gives no unit because the totals are sums of unscaled PROPDMG and CROPDMG values.
barplot(worst_damage, legend=worst_damage_types,
col=c("red","orange", "yellow", "green","blue","purple", "violet", "lavender", "lightblue", "pink"),
args.legend=list(x="topright",cex=0.5), las=2, xlab="Event Type",
main="Estimated Property and Crop Damage in USA\n Caused By the Top Ten Most Costly Types of Storm Events")
}
Tornadoes are the most costly category of storm events. Again there is a problem with the coding of categories. Flood and Flash Flood could be considered one category, as could Thunderstorm Wind and Tstm Wind. With those pairs combined, floods and thunderstorm winds are the second and third most costly storm event types.
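The effect of combining the categories can be checked with a little arithmetic on the totals printed in the damage table. The sketch below copies those figures by hand. The choice of which labels to merge is an assumption that follows the pairs named in the text.

```r
# Totals copied from the printed damage table.
totals <- c("TORNADO" = 3312276.7, "FLASH FLOOD" = 1599325.1,
            "TSTM WIND" = 1445168.2, "HAIL" = 1268289.7,
            "FLOOD" = 1067976.4, "THUNDERSTORM WIND" = 943635.6)

# Merge Flood with Flash Flood, and Thunderstorm Wind with Tstm Wind.
combined <- c(
  "TORNADO" = totals[["TORNADO"]],
  "FLOOD" = totals[["FLOOD"]] + totals[["FLASH FLOOD"]],
  "THUNDERSTORM WIND" = totals[["THUNDERSTORM WIND"]] + totals[["TSTM WIND"]],
  "HAIL" = totals[["HAIL"]]
)

# Rank the merged categories from most to least costly.
ranked <- sort(combined, decreasing = TRUE)
print(ranked)
```

Adding the separate THUNDERSTORM WINDS row from the same table to the thunderstorm total would move that category ahead of floods.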