Synopsis

The analysis uses data provided by the National Oceanic and Atmospheric Administration (NOAA). The data was collected between 1950 and 2011. However, only events after 1996 are taken into account, because few events were recorded in earlier years. The analysis aims to identify the weather events that are most harmful to population health, and it also evaluates the effects of such events on the country’s economy. It produces two rankings: the weather events that were most dangerous to public health and those that caused the most damage to the country’s economy.

Introduction

Weather events in the USA often have negative consequences for population health and the country’s economy. The government needs to take measures to tackle such disasters. However, it requires reproducible reports to base these measures on. This analysis provides two main insights:

  1. Weather events having the worst impact on population health in the USA.
  2. Weather events causing the greatest economic damage in the USA.

This study is based on data collected between 1996 and 2011.

The steps of the study are:

  1. Collecting the raw data.
  2. Cleaning the raw data to produce a tidy data set.
  3. Creating two subsets from the tidy data set, based on the variables that determine the impact on population health and on the economy.
  4. Totaling the damage caused by each event type in both subsets and sorting both subsets.
  5. Selecting the 10 events that have the most impact on health and on the economy.

Data

The data set used in this analysis is provided by the National Oceanic and Atmospheric Administration (NOAA) (https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2).

Data processing

Loading the data

The required R packages are loaded.

library(R.utils)
library(plyr)
library(ggplot2)

If the data set hasn’t been downloaded, this chunk downloads it to the working directory.

if (!file.exists("stormdata.csv")) {
  
  file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  file_name <- "stormdata.csv.bz2"
  download.file(file_url, file_name, method = "curl")
  
  bunzip2(file_name)
}
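
As an alternative sketch, base R can read a bzip2-compressed CSV through a bzfile() connection, so unpacking the archive first is not strictly required. The small file below is hypothetical toy data, created only to keep the example self-contained; it is not the NOAA file.

```r
# Write a small bzip2-compressed CSV to a temporary file (toy data, not the NOAA file).
tmp <- tempfile(fileext = ".csv.bz2")
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() accepts a connection, so the compressed file is read without unpacking it.
toy_read <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
unlink(tmp)
```

Reading through a connection decompresses on every load, so unpacking once, as the chunk above does, may still be preferable for repeated runs.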

The data set is loaded into org_storm_data.

 org_storm_data <- read.csv('stormdata.csv')

org_storm_data has the following variables.

names(org_storm_data)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

This analysis takes the following variables into consideration:

  • BGN_DATE: a date variable, used for subsetting the data set to observations between 1996 and 2011.
  • EVTYPE: a variable indicating the event type.
  • FATALITIES: a variable indicating the number of fatalities caused by the particular observation. Used for impact on population health.
  • INJURIES: a variable indicating the number of injuries caused by the particular observation. Used for impact on population health.
  • PROPDMG: a variable indicating the estimated monetary damage to property caused by the particular observation. Used to calculate damage to the economy.
  • PROPDMGEXP: a variable indicating the multiplier for PROPDMG; can be “K” for 1,000, “M” for 1,000,000 or “B” for 1,000,000,000.
  • CROPDMG: a variable indicating the estimated monetary value of damage to agricultural property (crops) caused by the particular observation. Used to calculate damage to the economy.
  • CROPDMGEXP: a variable indicating the multiplier for CROPDMG; can be “K” for 1,000, “M” for 1,000,000 or “B” for 1,000,000,000.

Cleaning the data

The initial data set, org_storm_data, is subset to the above variables, resulting in storm_data.

storm_data <- org_storm_data[, c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

storm_data is then subset to include only events after 1996.

storm_data$BGN_DATE <- as.Date(as.character(storm_data$BGN_DATE), "%m/%d/%Y %H:%M:%S")
# Compare years as numbers; the comparison is strict, so 1996 itself is excluded.
storm_data <- subset(storm_data, as.numeric(format(storm_data$BGN_DATE, "%Y")) > 1996)
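
The date parsing and year filter can be tried on a few made-up rows. This is a minimal sketch: the toy data frame and the YEAR helper column are assumptions for illustration, and it contrasts the strict comparison with an inclusive one.

```r
# Toy BGN_DATE values in the same "month/day/year hour:minute:second" layout.
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "12/31/1996 0:00:00",
               "1/1/1997 0:00:00", "8/29/2005 0:00:00"),
  stringsAsFactors = FALSE
)
toy$BGN_DATE <- as.Date(toy$BGN_DATE, format = "%m/%d/%Y %H:%M:%S")

# Hypothetical helper column holding the year as a number.
toy$YEAR <- as.numeric(format(toy$BGN_DATE, "%Y"))

after_1996 <- subset(toy, YEAR > 1996)   # strict: events from 1996 are dropped
from_1996  <- subset(toy, YEAR >= 1996)  # inclusive alternative
```

Which of the two comparisons is appropriate depends on whether 1996 itself is meant to be part of the study period.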

storm_data is also subset to include only event types as defined by NWS Directive 10-1605. The storm_events list holds all the event types defined in NWS Directive 10-1605, plus event types that are constituent parts of an event type with a slash character (e.g. “Cold/Wind Chill” also results in “Cold” and “Wind Chill”).

storm_events <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Cold", "Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Extreme Cold", "Flash Flood", "Flood", "Freezing Fog", "Frost/Freeze", "Frost", "Freeze", "Funnel Cloud", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane/Typhoon", "Hurricane", "Typhoon", "Ice Storm", "Lakeshore Flood", "Lake-Effect Snow", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")

The EVTYPE variable is first converted to upper case to ensure consistent matching, and then converted to a factor. storm_data is then subset to the event types included in storm_events, which are also converted to upper case for the comparison. Finally, the unused factor levels are dropped from EVTYPE.

storm_data$EVTYPE <- factor(toupper(storm_data$EVTYPE))
storm_data <- subset(storm_data, (storm_data$EVTYPE %in% toupper(storm_events)))
storm_data$EVTYPE <- droplevels(storm_data$EVTYPE)
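
The matching step can be illustrated on toy data. In this sketch the two-entry allowed list and the toy data frame are hypothetical; it shows that unused levels survive subsetting until the result of droplevels() is assigned back.

```r
allowed <- c("Flood", "Tornado")  # hypothetical short list of official event types
toy <- data.frame(EVTYPE = c("flood", "TORNADO", "Tornado", "TSTM WIND"),
                  stringsAsFactors = FALSE)

# Upper-case both sides so that the comparison ignores case.
toy$EVTYPE <- factor(toupper(toy$EVTYPE))
kept <- subset(toy, EVTYPE %in% toupper(allowed))

levels_before <- levels(kept$EVTYPE)    # still contains "TSTM WIND"
kept$EVTYPE <- droplevels(kept$EVTYPE)  # result must be assigned to take effect
levels_after <- levels(kept$EVTYPE)
```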

Subsetting data with harmful impact on population health

Only observations with at least one fatality or injury are taken into account. storm_data is thus subset to include only observations where FATALITIES or INJURIES is greater than 0. The resulting subset is stored in storm_data_health.

storm_data_health <- subset(storm_data, storm_data$FATALITIES > 0 | storm_data$INJURIES > 0 )

For analysing consequences on population health, only EVTYPE, FATALITIES and INJURIES will be considered.

storm_data_health <- storm_data_health[,c("EVTYPE", "FATALITIES", "INJURIES")]

To approximate the total effect of an observation on population health, FATALITIES and INJURIES are summed together in a HARM variable.

storm_data_health$HARM <- storm_data_health$FATALITIES + storm_data_health$INJURIES

HARM (along with the other numeric columns) is then summed per EVTYPE and stored in storm_data_harm_by_event.

storm_data_harm_by_event <- ddply(storm_data_health, .(EVTYPE), numcolwise(sum))
head(storm_data_harm_by_event)
##            EVTYPE FATALITIES INJURIES HARM
## 1       AVALANCHE        218      151  369
## 2        BLIZZARD         47      215  262
## 3   COASTAL FLOOD          3        2    5
## 4            COLD         15       12   27
## 5 COLD/WIND CHILL         95       12  107
## 6       DENSE FOG          9      143  152
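
The per-event totals can also be computed without plyr. The following is a minimal sketch using base R's aggregate() on made-up numbers; it is an alternative to the ddply() call above, not the method used in this report.

```r
# Toy observations (made-up values).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "FLOOD", "TORNADO"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES   = c(4, 10, 2, 0)
)
toy$HARM <- toy$FATALITIES + toy$INJURIES

# Sum every numeric column within each event type.
harm_by_event <- aggregate(cbind(FATALITIES, INJURIES, HARM) ~ EVTYPE,
                           data = toy, FUN = sum)
```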

Subsetting data with negative consequences for the economy

In order to determine the values of observations from the PROPDMG and CROPDMG variables, the multiplier (PROPDMGEXP and CROPDMGEXP respectively) needs to be known. storm_data is thus firstly subset to include only observations where PROPDMGEXP and CROPDMGEXP are not missing. The result is stored in storm_data_for_economy.

storm_data_for_economy <- subset(storm_data, storm_data$PROPDMGEXP != "" & storm_data$CROPDMGEXP != "" )

Secondly, for an observation to be relevant for determining negative consequences on the economy, at least one of PROPDMG and CROPDMG has to be greater than 0. storm_data_for_economy is thus subset to include only observations where PROPDMG or CROPDMG is greater than 0.

storm_data_for_economy <- subset(storm_data_for_economy, storm_data_for_economy$PROPDMG > 0 | storm_data_for_economy$CROPDMG > 0 )

For analysing consequences on the economy, only EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP will be considered.

storm_data_for_economy <- storm_data_for_economy[,c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

In order to work with full amounts in PROPDMG and CROPDMG, two new variables will be created (PROPDMGFULL and CROPDMGFULL respectively), which will be the result of multiplying PROPDMG with PROPDMGEXP and multiplying CROPDMG with CROPDMGEXP respectively.

PROPDMGEXP and CROPDMGEXP are first converted to upper case to ensure consistent matching.

storm_data_for_economy$PROPDMGEXP <- factor(toupper(storm_data_for_economy$PROPDMGEXP))
storm_data_for_economy$CROPDMGEXP <- factor(toupper(storm_data_for_economy$CROPDMGEXP))

PROPDMGFULL and CROPDMGFULL are calculated based on the following rules:

  • If PROPDMGEXP or CROPDMGEXP is “K”, the multiplier is 1,000 (e.g. PROPDMGFULL = PROPDMG * 1000).
  • If PROPDMGEXP or CROPDMGEXP is “M”, the multiplier is 1,000,000 (e.g. PROPDMGFULL = PROPDMG * 1000000).
  • If PROPDMGEXP or CROPDMGEXP is “B”, the multiplier is 1,000,000,000 (e.g. PROPDMGFULL = PROPDMG * 1000000000).
  • For any other value, the full amount is set to 0.

storm_data_for_economy$PROPDMGFULL <- ifelse(storm_data_for_economy$PROPDMGEXP == "K", storm_data_for_economy$PROPDMG * 1000,
                                      ifelse(storm_data_for_economy$PROPDMGEXP == "M", storm_data_for_economy$PROPDMG * 1000000,
                                      ifelse(storm_data_for_economy$PROPDMGEXP == "B", storm_data_for_economy$PROPDMG * 1000000000, 0)))
storm_data_for_economy$CROPDMGFULL <- ifelse(storm_data_for_economy$CROPDMGEXP == "K", storm_data_for_economy$CROPDMG * 1000,
                                      ifelse(storm_data_for_economy$CROPDMGEXP == "M", storm_data_for_economy$CROPDMG * 1000000,
                                      ifelse(storm_data_for_economy$CROPDMGEXP == "B", storm_data_for_economy$CROPDMG * 1000000000, 0)))
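
The same rules can be expressed as a lookup in a named vector, which may be easier to read and extend than nested ifelse() calls. This is a sketch on toy data; the multipliers vector and the toy values are assumptions for illustration.

```r
# Named lookup vector: exponent code -> multiplier.
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)

toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "B", "5"),
                  stringsAsFactors = FALSE)

# Look up each code by name; codes not in the vector come back as NA.
mult <- multipliers[toupper(toy$PROPDMGEXP)]
mult[is.na(mult)] <- 0  # unrecognised codes contribute nothing, as in the nested ifelse

toy$PROPDMGFULL <- toy$PROPDMG * unname(mult)
```

If the exponent column is a factor, wrapping it in toupper() as above converts it to character first, so the lookup is by name rather than by the factor's integer codes.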

To approximate the total effect of an observation on the economy, PROPDMGFULL and CROPDMGFULL are summed together in a DAMAGE variable.

storm_data_for_economy$DAMAGE <- storm_data_for_economy$PROPDMGFULL + storm_data_for_economy$CROPDMGFULL

DAMAGE (along with the other numeric columns) is then summed per EVTYPE and stored in storm_data_for_economy_grouped_per_event_type.

storm_data_for_economy_grouped_per_event_type <- ddply(storm_data_for_economy, .(EVTYPE), numcolwise(sum))
head(storm_data_for_economy_grouped_per_event_type)
##                  EVTYPE  PROPDMG CROPDMG PROPDMGFULL CROPDMGFULL    DAMAGE
## 1 ASTRONOMICAL LOW TIDE   320.00       0      320000           0    320000
## 2             AVALANCHE   287.90       0     2385800           0   2385800
## 3              BLIZZARD 10709.80      67    39481000     7060000  46541000
## 4         COASTAL FLOOD  7340.96       0   167580560           0 167580560
## 5       COLD/WIND CHILL  1990.00     600     1990000      600000   2590000
## 6             DENSE FOG  2842.00       0     2842000           0   2842000

Results

Population health

A ranking can be created from storm_data_harm_by_event by ordering the data set by HARM in descending order (i.e. events causing more fatalities and injuries will be at the top).

top_10_events_for_harmfulness <- storm_data_harm_by_event[order(storm_data_harm_by_event$HARM, decreasing = TRUE), ][1:10, ]
print(top_10_events_for_harmfulness[, c("EVTYPE", "HARM")])
##               EVTYPE  HARM
## 33           TORNADO 21447
## 10    EXCESSIVE HEAT  8093
## 14             FLOOD  7121
## 26         LIGHTNING  4424
## 13       FLASH FLOOD  2425
## 32 THUNDERSTORM WIND  1530
## 18              HEAT  1459
## 24 HURRICANE/TYPHOON  1339
## 39      WINTER STORM  1209
## 22         HIGH WIND  1181
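
The ranking step amounts to sorting by the total and keeping the first rows. A minimal sketch on made-up totals follows; it uses head() instead of indexing with 1:10, which avoids rows of NA when fewer rows are available than requested.

```r
# Toy per-event totals (made-up values).
harm_by_event <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "TORNADO", "LIGHTNING"),
  HARM   = c(7, 1, 15, 4),
  stringsAsFactors = FALSE
)

# Sort by HARM, largest first, then keep the top three rows.
ranked <- harm_by_event[order(harm_by_event$HARM, decreasing = TRUE), ]
top_3  <- head(ranked, 3)
```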

For a more visual effect, a smaller ranking can be created in a similar fashion and plotted. The area of each slice represents the number of fatalities and injuries caused by the particular weather event in the U.S. between 1996 and 2011.

top_5_events_for_harmfulness <- storm_data_harm_by_event[order(storm_data_harm_by_event$HARM, decreasing = TRUE), ][1:5, ]
ggplot(top_5_events_for_harmfulness) + aes(x = factor(1), y = HARM, fill = factor(EVTYPE), order = HARM) + geom_bar(stat = "identity") + coord_polar(theta = "y") + labs(title = 'Five most dangerous weather types in the U.S.', x = "", y = "", fill = "Event types")

Economy

A ranking can be created from storm_data_for_economy_grouped_per_event_type by ordering the data set by DAMAGE in descending order (i.e. events causing more damage to property and crops will be at the top).

top_10_events_for_damage <- storm_data_for_economy_grouped_per_event_type[order(storm_data_for_economy_grouped_per_event_type$DAMAGE, decreasing = TRUE), ][1:10, ]
print(top_10_events_for_damage[, c("EVTYPE", "DAMAGE")])
##               EVTYPE       DAMAGE
## 15             FLOOD 136877233900
## 27 HURRICANE/TYPHOON  29348167800
## 40           TORNADO  16203902150
## 26         HURRICANE  11474663000
## 20              HAIL   9172124220
## 14       FLASH FLOOD   8246133530
## 39 THUNDERSTORM WIND   3780985440
## 46          WILDFIRE   3684468370
## 25         HIGH WIND   2873328540
## 42    TROPICAL STORM   1496252350

For a more visual effect, a smaller ranking can be created in a similar fashion and plotted. The area of each slice represents the monetary value of damage caused to property and crops by the particular weather event in the U.S. (in U.S. dollars) between 1996 and 2011.

top_5_events_for_damage <- storm_data_for_economy_grouped_per_event_type[order(storm_data_for_economy_grouped_per_event_type$DAMAGE, decreasing = TRUE), ][1:5, ]
ggplot(top_5_events_for_damage) + aes(x = factor(1), y = DAMAGE, fill = factor(EVTYPE)) + geom_bar(stat = "identity") + coord_polar(theta = "y") + labs(title = 'Five most damaging weather types in the U.S', x = "", y = "", fill = "Event types")

Conclusion

The study shows that excessive heat, floods, lightning and tornadoes have the most negative consequences on population health in the United States, whereas floods, hail, hurricanes and tornadoes have the most negative consequences on the economy of the United States.