Study to Investigate the Effect of Storms on Public Health in the United States of America.

Abstract

This document reports on the public health and economic impact of major storms in the USA. The questions it answers are:

Which types of storm event have caused the most deaths and injuries?

Which types of storm event have cost the most in terms of total damage?

Which types of storm event have cost the most in terms of damage per event?

Obtaining and Preprocessing the Data

The data were obtained from the following link.

https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

The link points to a bzip2-compressed CSV file, which read.csv can read directly without a separate decompression step. The R code used to download and load the data is:

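{
  # Machine-specific path: change this to your own working directory.
  setwd("C:/Users/Warwick/Documents/DataScience/repdata")
  # Load the data only once per session.
  if(!exists("rawStormData"))
    {
    # Download the archive only if it is not already on disk.
    # file.exists checks the file system; exists only checks for R objects.
    if(!file.exists("storm.csv.bz2"))
      {
      download_url<-"https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
      # mode="wb" keeps the binary archive intact on Windows.
      download.file(download_url, destfile="storm.csv.bz2", mode="wb")
      }
    # read.csv decompresses the .bz2 file transparently.
    rawStormData <- read.csv("storm.csv.bz2", stringsAsFactors = FALSE, sep="," )
    }
}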

I checked the size of the dataset with dim:

{
  dim(rawStormData)
}
## [1] 902297     37

I identified the columns of interest by listing the dimnames of the first few rows:

{
  look_StormData<-head(rawStormData)
  dimnames(look_StormData)
}
## [[1]]
## [1] "1" "2" "3" "4" "5" "6"
## 
## [[2]]
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

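The columns required are event type (EVTYPE), number of fatalities (FATALITIES), number of injuries (INJURIES), property damage (PROPDMG), the property damage exponent (PROPDMGEXP), crop damage (CROPDMG) and the crop damage exponent (CROPDMGEXP). The two exponent columns hold a magnitude code for the corresponding damage figure (for example K for thousands and M for millions); they are not expenditure figures.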

I created a data table for the analysis by selecting the columns above.

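{
  library(dplyr)
  # tbl_df is deprecated in recent dplyr releases; as_tibble is the current equivalent.
  storm_table<-tbl_df(rawStormData)
  # Keep only the columns needed for the health and damage analysis.
  storm_effects<-select(storm_table, EVTYPE, FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)
}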
## 
## Attaching package: 'dplyr'
## The following object is masked from 'package:stats':
## 
##     filter
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

Results

Effect of Storms on Public Health

Public health impact is measured here by the number of fatalities and injuries. There are at least two ways of using these figures:

  1. the event types with the highest total number of fatalities and injuries over the reporting period; and
  2. the event types with the highest number of fatalities plus injuries per event.

The R code to compute both is:

{
  # Count missing values for fatalities and injuries.
  sum(is.na(storm_effects$FATALITIES))
  sum(is.na(storm_effects$INJURIES))
  # Combined casualties (fatalities plus injuries) for each recorded event.
  PERSONDMG<-storm_effects$FATALITIES+storm_effects$INJURIES
  storm_damage<-cbind(storm_effects, PERSONDMG)
  # Total and mean casualties per event type, sorted by total.
  by_event_type<-group_by(storm_damage, EVTYPE)
  pop_health<-summarise(by_event_type, total_people=sum(PERSONDMG), mean_people=mean(PERSONDMG))
  worst_event_types<-arrange(pop_health,desc(total_people))
}

The ten types of events that have killed or injured the most people are given by

{
  worst_totals<-select(worst_event_types, "Type of Event"=EVTYPE, "Deaths and Injuries"=total_people)
  head(worst_totals,10)
  }
## Source: local data frame [10 x 2]
## 
##        Type of Event Deaths and Injuries
## 1            TORNADO               96979
## 2     EXCESSIVE HEAT                8428
## 3          TSTM WIND                7461
## 4              FLOOD                7259
## 5          LIGHTNING                6046
## 6               HEAT                3037
## 7        FLASH FLOOD                2755
## 8          ICE STORM                2064
## 9  THUNDERSTORM WIND                1621
## 10      WINTER STORM                1527

The figure below shows this. It was derived using the following code:

{
  worst_types<-head(worst_event_types$EVTYPE,10)
  worst<-head(worst_event_types$total_people,10)
  bar_colours<-c("red","orange", "yellow", "green","blue","purple", "violet", "lavender", "lightblue", "pink")
  barplot(worst, legend=worst_types, col=bar_colours,
          args.legend=list(x="topright",cex=0.5), las=2, xlab="Event Type",
          main="Deaths and Injuries in USA\n Caused By the Top Ten Types of Storm Events")
}
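
The plot above identifies the bars only through the legend colours. As an alternative, each bar can be labelled directly. The sketch below is a minimal, self-contained variant that re-enters the ten totals printed above by hand (in the report itself they would come from worst_event_types). The names casualty_totals and bar_midpoints are illustrative and are not part of the original analysis.

```r
# Totals copied from the printed top-ten table above (illustrative re-entry).
casualty_totals <- c(
  "TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
  "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037,
  "FLASH FLOOD" = 2755, "ICE STORM" = 2064,
  "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527
)

# Widen the bottom margin so the rotated labels fit.
old_par <- par(mar = c(9, 5, 4, 1))

# names.arg labels each bar; las = 2 rotates the labels to vertical.
bar_midpoints <- barplot(
  casualty_totals,
  names.arg = names(casualty_totals),
  las = 2, cex.names = 0.7, col = "steelblue",
  main = "Deaths and Injuries in USA\nby Storm Event Type (Top Ten)"
)

# Restore the previous graphics settings.
par(old_par)
```

Labelling the bars removes the need to match ten legend colours by eye, at the cost of a larger bottom margin.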

There is a problem with the coding of event types. Flood and flash flood could be considered one category, as could heat and excessive heat, and thunderstorm wind and tstm wind.
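
As a rough illustration of how merging these categories would change the ranking, the sketch below recombines the ten totals printed above. The mapping in merged_label is an assumption about which labels belong together. It covers only the ten listed types, so other spellings of the same events elsewhere in the dataset are not included.

```r
# Top-ten totals re-entered from the printed table above.
top_ten <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING",
             "HEAT", "FLASH FLOOD", "ICE STORM", "THUNDERSTORM WIND",
             "WINTER STORM"),
  total_people = c(96979, 8428, 7461, 7259, 6046, 3037, 2755, 2064, 1621, 1527),
  stringsAsFactors = FALSE
)

# Hypothetical mapping from raw labels to merged categories.
merged_label <- c(
  "EXCESSIVE HEAT" = "HEAT",
  "FLASH FLOOD" = "FLOOD",
  "TSTM WIND" = "THUNDERSTORM WIND"
)

# Replace a label when it has a mapping; otherwise keep it unchanged.
has_mapping <- top_ten$EVTYPE %in% names(merged_label)
top_ten$CATEGORY <- ifelse(has_mapping,
                           merged_label[top_ten$EVTYPE],
                           top_ten$EVTYPE)

# Sum the totals within each merged category and sort in descending order.
merged_totals <- aggregate(total_people ~ CATEGORY, data = top_ten, FUN = sum)
merged_totals <- merged_totals[order(-merged_totals$total_people), ]
print(merged_totals, row.names = FALSE)
```

On these ten rows alone, the merged heat (11465), flood (10014) and thunderstorm wind (9082) categories would rank second to fourth behind tornadoes.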

The ten types of events with the highest number of deaths and injuries per event can be found by

{
  worst_event_rate_types<-arrange(pop_health,desc(mean_people))
  worst_rates<-select(worst_event_rate_types, "Type of Event"=EVTYPE, "Deaths and Injuries Per Event"=mean_people)
  head(worst_rates,10)
  }
## Source: local data frame [10 x 2]
## 
##                 Type of Event Deaths and Injuries Per Event
## 1                   Heat Wave                      70.00000
## 2       TROPICAL STORM GORDON                      51.00000
## 3                  WILD FIRES                      38.25000
## 4               THUNDERSTORMW                      27.00000
## 5  TORNADOES, TSTM WIND, HAIL                      25.00000
## 6          HIGH WIND AND SEAS                      23.00000
## 7           HEAT WAVE DROUGHT                      19.00000
## 8             SNOW/HIGH WINDS                      18.00000
## 9     WINTER STORM HIGH WINDS                      16.00000
## 10          HURRICANE/TYPHOON                      15.21591

Tornadoes have killed or injured by far the greatest number of people of any type of storm event, but the mean number of deaths and injuries per event is greater for heat waves, wild fires and thunderstorms with high winds.
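
A per-event mean is sensitive to event types that were recorded only a handful of times. One way to examine this, sketched below on made-up data rather than the storm dataset, is to report the number of events alongside the mean and to keep only types with a minimum number of records. The toy data frame, the min_events threshold of 3 and the output names are all assumptions for illustration.

```r
# Made-up example data, not taken from the storm dataset.
toy <- data.frame(
  EVTYPE = c("A", "A", "A", "A", "B", "C", "C", "C"),
  PERSONDMG = c(2, 0, 4, 2, 50, 10, 0, 5),
  stringsAsFactors = FALSE
)

# One row per event type, with the event count, total and mean.
per_type <- data.frame(EVTYPE = sort(unique(toy$EVTYPE)),
                       stringsAsFactors = FALSE)
per_type$n_events <- as.vector(table(toy$EVTYPE)[per_type$EVTYPE])
per_type$total_people <- as.vector(
  tapply(toy$PERSONDMG, toy$EVTYPE, sum)[per_type$EVTYPE]
)
per_type$mean_people <- per_type$total_people / per_type$n_events

# Keep only types recorded at least min_events times, then rank by mean.
min_events <- 3
frequent <- per_type[per_type$n_events >= min_events, ]
frequent <- frequent[order(-frequent$mean_people), ]
print(frequent, row.names = FALSE)
```

In this toy data, type B has the highest mean (50) but only one record, so it drops out once the threshold is applied and type C leads the filtered ranking.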

Property Damage

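To estimate the damage caused by storm events, the property damage and crop damage figures are added together for each event. The sum uses the PROPDMG and CROPDMG values as recorded and does not apply the PROPDMGEXP and CROPDMGEXP codes, so the totals are not in a single dollar unit. Below is the R code for doing this: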

{
  ECONDMG=storm_damage$PROPDMG+storm_damage$CROPDMG
  all_storm_damage<-cbind(storm_damage,ECONDMG)
}
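
The sum above adds PROPDMG and CROPDMG as they appear in the data. To express damage in dollars, each figure would first need to be scaled by its exponent code. The sketch below shows one way to do that on made-up rows. It assumes that K, M and B (in either case) mean thousands, millions and billions, and it treats a blank or any other code as a multiplier of 1, which is a simplifying assumption rather than a documented rule. The helper exp_to_multiplier and the column ECONDMG_DOLLARS are hypothetical names.

```r
# Hypothetical helper: convert an exponent code to a numeric multiplier.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  multiplier <- lookup[toupper(code)]
  # Assumption: blank or unrecognised codes leave the value unscaled.
  multiplier[is.na(multiplier)] <- 1
  unname(multiplier)
}

# Made-up rows, not taken from the storm dataset.
toy_damage <- data.frame(
  PROPDMG = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  CROPDMG = c(0, 1, 500),
  CROPDMGEXP = c("", "m", "K"),
  stringsAsFactors = FALSE
)

# Scale each damage figure by its own multiplier before adding.
toy_damage$ECONDMG_DOLLARS <-
  toy_damage$PROPDMG * exp_to_multiplier(toy_damage$PROPDMGEXP) +
  toy_damage$CROPDMG * exp_to_multiplier(toy_damage$CROPDMGEXP)

print(toy_damage)
```

Applied to the real data, the same scaling step would sit between selecting the columns and grouping by EVTYPE. The treatment of unusual exponent codes would need to be checked against the dataset's documentation first.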

The ten types of events that have caused the most costly estimated damage are given by

{
  by_damage_event_type<-group_by(all_storm_damage, EVTYPE)
  econ_damage<-summarise(by_damage_event_type, total_damage=sum(ECONDMG), mean_damage=mean(ECONDMG))
  worst_econ_event_types<-arrange(econ_damage,desc(total_damage))
  worst_damage_totals<-select(worst_econ_event_types, "Type of Event"=EVTYPE, "Total Damage"=total_damage)
  head(worst_damage_totals,10)
  }
## Source: local data frame [10 x 2]
## 
##         Type of Event Total Damage
## 1             TORNADO    3312276.7
## 2         FLASH FLOOD    1599325.1
## 3           TSTM WIND    1445168.2
## 4                HAIL    1268289.7
## 5               FLOOD    1067976.4
## 6   THUNDERSTORM WIND     943635.6
## 7           LIGHTNING     606932.4
## 8  THUNDERSTORM WINDS     464978.1
## 9           HIGH WIND     342014.8
## 10       WINTER STORM     134699.6

The figure below shows the cost of property and crop damage for the ten most costly event types. It was derived using the following code:

{
  worst_damage_types<-head(worst_econ_event_types$EVTYPE,10)
  worst_damage<-head(worst_econ_event_types$total_damage,10)
  bar_colours<-c("red","orange", "yellow", "green","blue","purple", "violet", "lavender", "lightblue", "pink")
  # The totals are unscaled PROPDMG + CROPDMG sums, so the title states no dollar unit.
  barplot(worst_damage, legend=worst_damage_types, col=bar_colours,
          args.legend=list(x="topright",cex=0.5), las=2, xlab="Event Type",
          main="Estimated Property and Crop Damage in USA\n Caused By the Top Ten Most Costly Types of Storm Events")
}

Tornadoes are the most costly category of storm events. Again there is a problem with the coding of categories: Flood and Flash Flood could be considered one category, as could Thunderstorm Wind, Thunderstorm Winds and Tstm Wind. Flooding and thunderstorm winds are the next most costly types of storm event after tornadoes.