Author: Cleverson

Synopsis

This report calculates the total fatalities, injuries and economic damages caused by the different storm, weather and meteorological events that occurred in the United States from 1950 to 2011. The data were collected from NOAA (National Oceanic and Atmospheric Administration). Only events with a valid event type entry (see “National Weather Service Instruction - Storm Data Preparation”) were considered in this report. Tornadoes were the most harmful events with respect to population health, while floods caused the highest economic damages.

Data Processing

Loading the needed libraries

# plyr is loaded before dplyr so that dplyr's verbs are not masked by plyr's
library(plyr)
library(dplyr)
library(ggplot2)
library(magrittr)
library(reshape2)

The file should be downloaded from: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Reading the file and loading the data

storm_data<-read.csv("repdata%2Fdata%2FStormData.csv.bz2")

In several entries, thunderstorm was entered as the event type by its abbreviation (TSTM). The gsub command below replaces the abbreviation with Thunderstorm and consequently retrieves more valid observations. Event types are converted to upper case to make the entries uniform and to facilitate the join with the event_name data.frame, which is created from the event names listed in the Storm Data Event Table (page 6) of “NATIONAL WEATHER SERVICE INSTRUCTION - STORM DATA PREPARATION”. The objective is to keep only observations whose event type matches an official event name. Some entries, for example, listed two event types for the same observation; such entries do not match a single official name and are therefore dropped by the join.

storm_data$EVTYPE<-as.factor(gsub("TSTM","Thunderstorm",storm_data$EVTYPE, ignore.case=TRUE))
storm_data$EVTYPE<-as.factor(toupper(storm_data$EVTYPE))
event_name<- data.frame(EVTYPE=c("Astronomical Low Tide","Avalanche", "Blizzard",
                              "Coastal Flood", "Cold/Wind Chill", "Debris Flow",
                              "Dense Fog","Dense Smoke","Drought","Dust Devil",
                              "Dust Storm","Excessive Heat","Extreme Cold/Wind Chill",
                              "Flash Flood","Flood","Frost/Freeze","Funnel Cloud",
                              "Freezing Fog","Hail","Heat","Heavy Rain","Heavy Snow",
                              "High Surf","High Wind","Hurricane (Typhoon)","Ice Storm",
                              "Lake-Effect Snow","Lakeshore Flood","Lightning",
                              "Marine Hail","Marine High Wind","Marine Strong Wind",
                              "Marine Thunderstorm Wind","Rip Current","Seiche",
                              "Sleet","Storm Surge/Tide","Strong Wind","Thunderstorm Wind",
                              "Tornado","Tropical Depression","Tropical Storm",
                              "Tsunami","Volcanic Ash","Waterspout","Wildfire",
                              "Winter Storm","Winter Weather"))
event_name$EVTYPE<-as.factor(toupper(event_name$EVTYPE))
# Inner join on EVTYPE: keeps only observations with an official event name
storm_eventname<-plyr::join(storm_data,event_name, by="EVTYPE", type="inner")
storm_eventname$EVTYPE<-droplevels(storm_eventname$EVTYPE)
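
The effect of the normalisation and the inner join can be illustrated on a few rows. This is a minimal sketch on made-up data, not on the NOAA file: the raw entries below are hypothetical, the list of official names is truncated to three, and base R's merge stands in for plyr's join.

```r
# Toy data: hypothetical raw entries, not taken from the NOAA file
raw <- data.frame(EVTYPE = c("TSTM WIND", "Tornado", "tornado/waterspout", "HAIL"),
                  FATALITIES = c(0, 3, 1, 0),
                  stringsAsFactors = FALSE)

# Expand the abbreviation, then convert everything to upper case
raw$EVTYPE <- toupper(gsub("TSTM", "Thunderstorm", raw$EVTYPE, ignore.case = TRUE))

# A small subset of the official event names, also in upper case
official <- data.frame(EVTYPE = toupper(c("Hail", "Thunderstorm Wind", "Tornado")),
                       stringsAsFactors = FALSE)

# Inner join on EVTYPE: only rows with an official name survive
kept <- merge(raw, official, by = "EVTYPE")
print(kept)
```

In this sketch the entry that combines two event types has no match among the official names, so it is the only row dropped.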

The whole data set (storm_data) had 902297 observations, while the data restricted to valid event type entries (storm_eventname) had 861468 observations. About 95% of the observations could therefore be kept, which is more than enough to run the analysis.
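
The share of observations kept follows directly from the two counts. The sketch below hard-codes the counts reported above; the variable names are illustrative, and in the actual workflow the counts would presumably come from nrow(storm_data) and nrow(storm_eventname).

```r
total_obs <- 902297  # observations in storm_data, as reported above
kept_obs  <- 861468  # observations in storm_eventname, as reported above

# Fraction of observations that survived the join
share_kept <- kept_obs / total_obs
print(round(100 * share_kept, 1))
```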

Population Most Harmful Events

Date, event type, fatalities and injuries were selected to analyse which events were most harmful with respect to population health. Only observations that caused at least one fatality or one injury were kept.

health_harmful<-storm_eventname[,c("BGN_DATE","EVTYPE","FATALITIES","INJURIES")]
health_harmful<-filter(health_harmful,FATALITIES!=0 | INJURIES!=0)
health_harmful$EVTYPE<-droplevels(health_harmful$EVTYPE)

To find the most harmful events, the data are grouped by event type and the total number of fatalities and injuries is calculated for each event type.

fatalities<-dplyr::summarise(group_by(health_harmful,EVTYPE),
                             FATALITIES=sum(FATALITIES,na.rm = TRUE))
fatalities<-fatalities[order(fatalities$FATALITIES,decreasing = TRUE),]
injuries<-dplyr::summarise(group_by(health_harmful,EVTYPE),
                           INJURIES=sum(INJURIES,na.rm = TRUE))
injuries<-injuries[order(injuries$INJURIES,decreasing = TRUE),]
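
The group-and-sum step can be checked by hand on a small table. This is a minimal sketch on made-up numbers; it uses base R's aggregate in place of dplyr's group_by and summarise, so it illustrates the idea rather than the exact calls used in the report.

```r
# Toy data: hypothetical counts, not taken from the NOAA file
toy <- data.frame(EVTYPE = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
                  FATALITIES = c(2, 5, 1, 0, 4),
                  INJURIES = c(10, 0, 7, 3, 1))

# Sum both columns within each event type
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank the event types by total fatalities, highest first
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]
print(totals)
```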

The Greatest Economic Consequences Events

Date, event type, property and crop damages were selected to analyse which events caused the greatest economic impact. Only observations that had any property or crop damage were kept.

economy_damage<-storm_eventname[,c("BGN_DATE","EVTYPE","PROPDMG","PROPDMGEXP",
                                   "CROPDMG","CROPDMGEXP")]
economy_damage<-filter(economy_damage,PROPDMG!=0 | CROPDMG!=0)
economy_damage<-droplevels(economy_damage)

Two variables (PROPDMGEXP and CROPDMGEXP) hold alphabetical codes that denote magnitudes: “K” for thousands, “M” for millions, and “B” for billions. Only observations with one of these codes in at least one of the two variables were kept, and the codes were replaced by their numerical values to simplify the calculation of the economic damage.

economy_damage$PROPDMGEXP<-toupper(economy_damage$PROPDMGEXP)
economy_damage$CROPDMGEXP<-toupper(economy_damage$CROPDMGEXP)
economy_damage<-filter(economy_damage,PROPDMGEXP %in% c("K","M","B") |
                         CROPDMGEXP %in% c("K","M","B"))
economy_damage$PROPDMGEXP %<>%
  gsub("K","1000",.) %>%
  gsub("M","1000000",.) %>%
  gsub("B","1000000000",.)
economy_damage$CROPDMGEXP %<>%
  gsub("K","1000",.) %>%
  gsub("M","1000000",.) %>%
  gsub("B","1000000000",.)
economy_damage$PROPDMGEXP<-as.numeric(economy_damage$PROPDMGEXP)
economy_damage$CROPDMGEXP<-as.numeric(economy_damage$CROPDMGEXP)
economy_damage$PROPDMG<-(economy_damage$PROPDMG * economy_damage$PROPDMGEXP) 
economy_damage$CROPDMG<-(economy_damage$CROPDMG * economy_damage$CROPDMGEXP)
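
The same conversion can also be written with a named lookup vector. The sketch below is an alternative on made-up rows, not the method used above: the names multiplier and PROP_USD are hypothetical, and any code other than K, M or B becomes NA here, which a later sum with na.rm = TRUE would ignore.

```r
# Multiplier for each magnitude code
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Toy data: hypothetical damage entries, not taken from the NOAA file
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("k", "M", "B", "?"),
                  stringsAsFactors = FALSE)

# Look up the multiplier by code; codes not in the table yield NA
toy$PROP_USD <- toy$PROPDMG * unname(multiplier[toupper(toy$PROPDMGEXP)])
print(toy)
```

A lookup vector avoids the character-to-number round trip and makes unrecognised codes explicit as NA.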

To find the events with the highest economic damages to properties and crops, the data are grouped by event type and the total property and crop damages are calculated for each event type.

property<-dplyr::summarise(group_by(economy_damage,EVTYPE),
                           PROPDMG = sum(PROPDMG,na.rm = TRUE))
property<-property[order(property$PROPDMG,decreasing = TRUE),]
crop<-dplyr::summarise(group_by(economy_damage,EVTYPE),
                       CROPDMG = sum(CROPDMG,na.rm = TRUE))
crop<-crop[order(crop$CROPDMG,decreasing = TRUE),]

Results

The graphs show the most harmful events with respect to population health.

plot_fatalities<-ggplot(filter(fatalities,FATALITIES > 0),
                        aes(x=reorder(EVTYPE,FATALITIES),y=FATALITIES)) +
  geom_bar(stat="identity",aes(fill=FATALITIES)) +
  theme(axis.text.y = element_text()) +
  geom_text(aes(label=FATALITIES), size=2.5, hjust=0) +
  coord_flip() + xlab("Events") + ylab("Total of Fatalities") +
  scale_y_continuous(limits = c(0, 6000)) + labs(fill = "")
plot_injuries<-ggplot(filter(injuries,INJURIES > 0),
                      aes(x=reorder(EVTYPE,INJURIES),y=INJURIES)) +
  geom_bar(stat="identity",aes(fill=INJURIES)) +
  theme(axis.text.y = element_text()) +
  geom_text(aes(label=INJURIES), size=2.5, hjust=0) +
  coord_flip() + xlab("Events") + ylab("Total of Injuries") +
  scale_y_continuous(limits = c(0, 100000)) + labs(fill = "")

Fatalities

Injuries

Tornadoes caused more fatalities and injuries than any other event type in the United States from 1950 to 2011. Thunderstorm winds, floods and flash floods, excessive heat and heat, and lightning were also very harmful events with respect to population health.

The graph below shows the events with the highest economic impact, that is, those that caused more than 1 billion USD in combined property and crop damages.

plot_damage<-merge(property,crop) %>%
  filter((PROPDMG+CROPDMG) > 1e+09) %>%
  melt(id.vars="EVTYPE") %>%
  ggplot(aes(x=reorder(EVTYPE,value), y = value/1e+06,
             fill = factor(variable,levels=c("CROPDMG","PROPDMG")))) +
  geom_bar(stat="identity") +
  theme(axis.text.x = element_text(angle = 45,hjust = 1)) +
  labs(fill="Damage") + xlab("Events") + ylab("Damage in Millions of USD")

Property and Crop damages

Floods were responsible for the largest share of property damages. Tornadoes and hail also had an important economic impact on properties. Droughts were responsible for the largest share of crop damages. However, floods were the events whose combined property and crop damages caused the highest economic impact in the United States from 1950 to 2011.