Assignment

This assignment explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer some basic questions about severe weather events.

Synopsis

The National Oceanic and Atmospheric Administration (NOAA) maintains a public database of storm events. The data include the type of storm event and details such as location, date, estimated property damage, and the number of human victims. In this report we investigate which types of events are most harmful to the population and which are most costly financially.

As global climate change makes the weather more volatile, our plans for protecting ourselves against the variability of future weather depend on knowing what weather has cost in the past. The events in this database start in the year 1950 and end in November 2011. The analysis found that tornadoes cause the most bodily harm and flooding causes the most property damage.

The conclusion is that the impact on humans, be it injuries or fatalities, isn’t directly correlated with the economic damage weather events cause. Tornadoes are by far the leading cause of injuries and rank second in fatalities, whilst Heat & Drought cause the most fatalities but rank fourth in injuries. Both are in the top 5 for injuries and fatalities, next to Thunderstorms, Flooding and Snow & Ice. In economic damage, property damage dominates the total, except for Heat & Drought, where more than 90% of the damage is crop damage. The #1 and #2 sources of weather damage, respectively Flooding & High Surf and Wind & Storm, account for more than 80% of all economic cost, while Wind & Storm isn’t even in the top 5 for victims.

Data Processing

suppressWarnings(library(plyr))
library(knitr)
suppressWarnings(library(ggplot2))

setwd("D:/coursera/reproducible/final assignment")

system.time(df <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), 
                           header = TRUE, 
                           #quote = "", 
                           strip.white=TRUE,
                           stringsAsFactors = FALSE))
str(df)
colnames(df)

The data were downloaded earlier and are loaded from a locally saved folder.

There are 902,297 weather events described in this dataset, which includes location, date, and type of the event, as well as a count of fatalities and injuries and an estimate of property and crop damage.

In this study, it’s assumed that harm to population health is captured by the variables FATALITIES and INJURIES.

Select useful data

df <- df[ , c("EVTYPE", "BGN_DATE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
str(df)
dim(df)
length(unique(df$EVTYPE))

head(df$BGN_DATE)
df$BGN_DATE <- as.POSIXct(df$BGN_DATE,format="%m/%d/%Y %H:%M:%S")
head(df$BGN_DATE)
unique(df$PROPDMGEXP)
unique(df$CROPDMGEXP)
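
The date conversion above can be checked on a single value before applying it to the whole column. The following is a minimal sketch, not part of the original analysis: the sample string is hypothetical, and `tz = "UTC"` is an assumption added here so the result does not depend on the local time zone.

```r
# Hypothetical sample string in the month/day/year layout the format expects
raw <- "4/18/1950 0:00:00"

# tz = "UTC" is an assumption made for this sketch (the report uses the default)
parsed <- as.POSIXct(raw, format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

# A string that does not match the format yields NA rather than an error,
# so counting NAs after conversion is a cheap sanity check
bad <- as.POSIXct("1950-04-18", format = "%m/%d/%Y %H:%M:%S", tz = "UTC")

print(format(parsed, "%Y-%m-%d"))
print(is.na(bad))
```

On the full data set, `sum(is.na(df$BGN_DATE))` after the conversion would show how many rows failed to parse.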

Create new variables: TOTAL_PROPDMG, TOTAL_CROPDMG and TOTALDMG, with TOTALDMG = TOTAL_PROPDMG + TOTAL_CROPDMG

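# Map exponent codes to multipliers: letters (H, K, M, B) in either case,
# digits as powers of ten, and anything else ("", "+", "-", "?") as 1.
# "h" is mapped to 1e2 so that it agrees with "H".
tmpPROPDMG <- mapvalues(df$PROPDMGEXP,
                         c("K","M","", "B","m","+","0","5","6","?","4","2","3","h","7","H","-","1","8"), 
                         c(1e3,1e6, 1, 1e9,1e6,  1,  1,1e5,1e6,  1,1e4,1e2,1e3,1e2,1e7,1e2,  1, 10,1e8))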

tmpCROPDMG <- mapvalues(df$CROPDMGEXP,
                         c("","M","K","m","B","?","0","k","2"),
                         c( 1,1e6,1e3,1e6,1e9,1,1,1e3,1e2))
#colnames(df)
df$TOTAL_PROPDMG <- as.numeric(tmpPROPDMG) * df$PROPDMG
df$TOTAL_CROPDMG <- as.numeric(tmpCROPDMG) * df$CROPDMG
colnames(df)
remove(tmpPROPDMG)
remove(tmpCROPDMG)

df$TOTALDMG <- df$TOTAL_PROPDMG + df$TOTAL_CROPDMG
head(unique(df$EVTYPE))
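
The exponent-to-multiplier step above can be illustrated on a toy data frame. This is a sketch under stated assumptions rather than the code used in the report: the values are made up, only three exponent codes are covered, and base R `match()` stands in for `plyr::mapvalues` so the example runs without extra packages.

```r
# Toy data: hypothetical values, not taken from the NOAA file
toy <- data.frame(PROPDMG    = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)

# Lookup table: exponent code -> multiplier (a subset of the codes handled above)
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; codes without an entry (here the empty string) fall back to 1
idx <- match(toy$PROPDMGEXP, names(multiplier))
m <- unname(multiplier[idx])
m[is.na(m)] <- 1

# Dollar amount = reported figure * multiplier
toy$TOTAL_PROPDMG <- toy$PROPDMG * m
print(toy)
```

A row reported as 25 with exponent "K" therefore becomes 25,000 dollars, and a row with no exponent keeps its reported figure.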

Results

  1. Population health impact: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

# Aggregate per event type. Note: propdamage holds the sum of TOTALDMG,
# i.e. property plus crop damage; persdamage is injuries plus fatalities.
summary1 <- ddply(df, "EVTYPE", summarize,
                  propdamage = sum(TOTALDMG),
                  injuries = sum(INJURIES),
                  fatalities = sum(FATALITIES),
                  persdamage = sum(INJURIES) + sum(FATALITIES))

summary1 <- summary1[order(summary1$propdamage, decreasing = TRUE),]
#head(summary1,10)
#tmp = head(summary1,10)

summary2 <- summary1[order(summary1$persdamage, decreasing = TRUE),]
#head(summary2,10)

plot2 <- ggplot(data=head(summary2,10), aes(x=EVTYPE, y=persdamage, fill=persdamage)) + 
  geom_bar(stat="identity",position=position_dodge()) +
  labs(x = "event type", y = "personal damage (injuries and fatalities)") + 
  scale_fill_gradient("personal damage", low = "lightblue", high = "darkblue") + 
  ggtitle("Most harmful events on Public Health") +
  theme(axis.text.x = element_text(angle=90, hjust=1))
print(plot2)

From the above figure we can see that TORNADOES have the most significant impact on public health.
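
The aggregation behind this ranking can be sketched on a toy data frame. The example below is illustrative only: the counts are invented, and base R `aggregate()` is used in place of `ddply` so that it runs without plyr.

```r
# Toy data: hypothetical counts, not taken from the NOAA file
toy <- data.frame(EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
                  INJURIES   = c(10, 2, 5, 1),
                  FATALITIES = c(1, 0, 2, 3),
                  stringsAsFactors = FALSE)

# Sum injuries and fatalities per event type
agg <- aggregate(cbind(INJURIES, FATALITIES) ~ EVTYPE, data = toy, FUN = sum)

# Combined measure of harm, then rank from most to least harmful
agg$persdamage <- agg$INJURIES + agg$FATALITIES
ranked <- agg[order(agg$persdamage, decreasing = TRUE), ]
print(ranked)
```

Because injuries and fatalities are simply added, one fatality weighs the same as one injury in this measure; a different weighting would change the ranking.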

  2. Economic consequences: Across the United States, which types of events have the greatest economic consequences?
plot1 <- ggplot(data=head(summary1,10), aes(x=EVTYPE, y=propdamage, fill=propdamage)) + 
  geom_bar(stat="identity",position=position_dodge()) +
  labs(x = "event type", y = "total damage, property + crop (in $USD)")  +
  scale_fill_gradient("$USD", low = "lightblue", high = "darkblue") +
  ggtitle("Events with greatest economic consequences on the U.S.") +
  theme(axis.text.x = element_text(angle=90, hjust=1))
print(plot1)

FLOODS, HURRICANES/TYPHOONS and TORNADOES are the events with the greatest economic consequences.