Sys.setlocale("LC_TIME", "English")

Load libraries for plotting, data processing, and table formatting.

library(ggplot2)
library(dplyr)
library(kableExtra)
library(knitr)

Synopsis

Every weather event has an impact. Whether small or large, it can change people's lives. Some major weather events lead to deaths, injuries, and property damage. This report briefly describes the most lethal and the most damaging weather events.

Description of raw data and justification for data transformations

Data source

The data for this report come from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The database can be downloaded from: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2

Timeline of events

The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

Justification

Most of the event types in the database were entered inconsistently. There are no naming rules covering capitalization, special characters, word order, and so on. As a result, many records with different event labels may describe the same event. Moreover, many records should be coerced to a general type. For example, “flash flood”, “flood”, “flash flood/flood”, “flash flooding”, “flash flooding/flood”, “flash floods”, “flood & heavy rain”, “flood/flash flood”, “flood/flood”, “flooding”, “flooding/flood”, “floods”, “flood/river flood” could all be named “Flood”. But this cannot be done until general terms and rules are agreed upon. It is obvious that “heat wave” and “heat waves” are the same weather event (as are “thunderstorm” and “tstm”), but whether “hurricane/typhoon” can be coerced to “hurricane” has to be agreed first. For this first draft of the report, no modifications were made to the event types except converting them to lower case for uniformity.
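
To make the discussion concrete, here is a minimal sketch of what such a normalization could look like once rules are agreed. The function name `normalize_evtype` and every rule inside it are hypothetical illustrations; none of them were applied in this report.

```r
# Hypothetical normalization rules (assumptions for illustration only).
normalize_evtype <- function(x) {
  x <- tolower(trimws(x))                                   # uniform case, no outer spaces
  x <- gsub("[[:space:]]+", " ", x)                         # collapse repeated whitespace
  x <- gsub("\\btstm\\b", "thunderstorm", x, perl = TRUE)   # expand the "tstm" abbreviation
  x <- sub("^heat waves$", "heat wave", x)                  # plural to singular
  x[grepl("flood", x, fixed = TRUE)] <- "flood"             # any label mentioning a flood
  x
}

raw <- c("Flash Flood", "FLOODING/FLOOD", " tstm wind", "Heat Waves", "hurricane/typhoon")
normalize_evtype(raw)
# "flood" "flood" "thunderstorm wind" "heat wave" "hurricane/typhoon"
```

Note that “hurricane/typhoon” is left untouched here, because whether to merge it is exactly the kind of decision that needs agreement first.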

Data Processing

Download the data from the source and save it locally “as-is”.

download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "StormData.csv.bz2")

Read the compressed CSV file, including its header, into a raw data frame using the “read.table” function. There is no need to decompress a “*.csv.bz2” file first, because “read.table” reads it directly.

rawData <- read.table("StormData.csv.bz2", sep = ",", header = TRUE)

Take only six columns from the raw data frame to study the target indicators and speed up processing. These six are:
1. EVTYPE - weather event type
2. FATALITIES - number of deaths caused by the event
3. PROPDMG - amount of property damage
4. PROPDMGEXP - unit (magnitude) of the property damage
5. CROPDMG - amount of crop damage
6. CROPDMGEXP - unit (magnitude) of the crop damage

stormData <- rawData[, c("EVTYPE", "FATALITIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Convert all event types to lower case for uniformity.

stormData$EVTYPE <- tolower(stormData$EVTYPE)

The units of property and crop damage in the raw data are inconsistent, so they must be coerced to a common notation. First, convert the *EXP columns to character strings and list their unique values.

stormData$PROPDMGEXP <- as.character(stormData$PROPDMGEXP)
stormData$CROPDMGEXP <- as.character(stormData$CROPDMGEXP) 
unique(stormData$PROPDMGEXP)
##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(stormData$CROPDMGEXP)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

Then replace every unit that is not a recognized abbreviation with an empty string.

stormData[stormData$PROPDMGEXP %in% c("-", "?", "+", "1", "2", "3", "4", "5", "6", "7", "8", "h", "H"), "PROPDMGEXP"] <- ""
stormData[stormData$CROPDMGEXP %in% c("?", "2"), "CROPDMGEXP"] <- ""

Now replace each abbreviation with the corresponding exponent according to this table:

abbr <- data.frame(Abbreviation = c("k, K", "m, M", "B", ""), Value = c("Thousands", "Millions", "Billions", "Nothing"), Exp = c("3", "6", "9", "0"))
kable(abbr, "html") %>%
    kable_styling(bootstrap_options = "striped", full_width = FALSE)

Abbreviation  Value      Exp
k, K          Thousands  3
m, M          Millions   6
B             Billions   9
(empty)       Nothing    0

stormData[stormData$PROPDMGEXP == "M", "PROPDMGEXP"] <- "6"
stormData[stormData$PROPDMGEXP == "m", "PROPDMGEXP"] <- "6"
stormData[stormData$PROPDMGEXP == "K", "PROPDMGEXP"] <- "3"
stormData[stormData$PROPDMGEXP == "B", "PROPDMGEXP"] <- "9"
stormData[stormData$PROPDMGEXP == "", "PROPDMGEXP"] <- "0"
stormData[stormData$CROPDMGEXP == "M", "CROPDMGEXP"] <- "6"
stormData[stormData$CROPDMGEXP == "m", "CROPDMGEXP"] <- "6"
stormData[stormData$CROPDMGEXP == "K", "CROPDMGEXP"] <- "3"
stormData[stormData$CROPDMGEXP == "k", "CROPDMGEXP"] <- "3"
stormData[stormData$CROPDMGEXP == "B", "CROPDMGEXP"] <- "9"
stormData[stormData$CROPDMGEXP == "", "CROPDMGEXP"] <- "0"

Now make all units numeric, then multiply the property and crop damage values by 10 to the power of the unit.

stormData$PROPDMGEXP <- as.numeric(stormData$PROPDMGEXP)
stormData$CROPDMGEXP <- as.numeric(stormData$CROPDMGEXP)
stormData$propertyDamage <- stormData$PROPDMG * (10 ^ stormData$PROPDMGEXP)
stormData$cropDamage <- stormData$CROPDMG * (10 ^ stormData$CROPDMGEXP)
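
The unit cleanup and scaling steps above can also be expressed with a single lookup vector. The following is a minimal sketch on toy data (not taken from the storm database); the names `toy`, `exp_lookup`, and `power` are assumptions introduced only for this illustration, and the report itself uses the step-by-step replacements shown earlier.

```r
# Toy data, invented for illustration
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10, 7),
  PROPDMGEXP = c("K", "m", "B", "", "?"),
  stringsAsFactors = FALSE
)

# Named lookup: abbreviation -> power of ten
exp_lookup <- c(k = 3, K = 3, m = 6, M = 6, B = 9)

# match() returns NA for anything not in the lookup (empty string, "?", digits, ...)
power <- unname(exp_lookup[match(toy$PROPDMGEXP, names(exp_lookup))])
power[is.na(power)] <- 0   # unrecognized units mean "no scaling"

toy$propertyDamage <- toy$PROPDMG * 10^power
toy$propertyDamage
# 25000 2500000 1e+09 10 7
```

For example, an amount of 25 with unit "K" becomes 25 * 10^3 = 25,000, while an amount with an empty or unrecognized unit is kept unchanged.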

Drop the original damage amount and unit columns, since they are no longer needed. Also, convert the event type to a factor.

stormData <- stormData[, -c(3, 4, 5, 6)]
stormData$EVTYPE <- as.factor(stormData$EVTYPE)

Results

Fatalities caused by weather Events

Take the processed data frame, group it by event type, sum the deaths for each event type, and sort in descending order of total deaths. Finally, remove all event types with zero fatalities.

fatalEvents <- stormData %>% 
    group_by(EVTYPE) %>% 
    summarise(total = sum(FATALITIES)) %>% 
    arrange(desc(total))
fatalEvents <- fatalEvents[fatalEvents$total != 0,]
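
To show what the group, sum, sort, and filter steps do, here is a small sketch on invented data. It uses base R (`aggregate` and `order`) instead of dplyr so that it runs without extra packages; the data frame `toy` and its values are assumptions made up for this example.

```r
# Invented data: one row per recorded event
toy <- data.frame(
  EVTYPE     = c("tornado", "flood", "tornado", "hail", "flood", "heat"),
  FATALITIES = c(3, 1, 2, 0, 3, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities per event type
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
names(totals)[2] <- "total"

# Sort by total, largest first, then drop event types without fatalities
totals <- totals[order(-totals$total), ]
totals <- totals[totals$total != 0, ]
totals$EVTYPE
# "tornado" "flood" "heat"
```

Here “hail” disappears because its total is zero, and the remaining rows are ordered by their totals of 5, 4, and 1.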

Show the number of event types that caused deaths.

nrow(fatalEvents)
## [1] 160

Take the top 10 most harmful event types and make a bar plot.

g1 <- ggplot(data = fatalEvents[1:10,], aes(x = EVTYPE, y = total))
g1 + 
    geom_bar(stat = "identity", fill="steelblue") + 
    geom_text(aes(label = total), vjust = -0.3, size = 3.5) + 
    theme_minimal() + 
    theme(axis.text.x = element_text(angle=90)) +
    labs(y = "Victims", x = "Event type", title = "Most lethal weather events (Top-10)")
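
One detail worth knowing: ggplot2 places the bars of a discrete axis in the order of the factor levels, and `as.factor` creates levels alphabetically, so the bars are not necessarily drawn from largest to smallest even though the rows are sorted. A possible remedy, sketched here on invented data, is to reorder the levels with `reorder()` from base R; the data frame `top` is hypothetical.

```r
# Invented data with alphabetical factor levels
top <- data.frame(
  EVTYPE = factor(c("tornado", "flood", "heat")),
  total  = c(50, 20, 30)
)
levels(top$EVTYPE)
# "flood" "heat" "tornado"

# reorder() sorts the levels by a second variable;
# negating total puts the largest value first
top$EVTYPE <- reorder(top$EVTYPE, -top$total)
levels(top$EVTYPE)
# "tornado" "heat" "flood"
```

In a plot call, the same idea can be written inline as `aes(x = reorder(EVTYPE, -total), y = total)`.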

Tornadoes are clearly the most lethal weather event, causing more than 5,500 deaths.

Damage caused by weather Events

Take the processed data frame, group it by event type, sum the property and crop damage for each event type, and sort in descending order of total damage. Finally, remove all event types with zero damage.

totalDamage <- stormData %>% 
    group_by(EVTYPE) %>% 
    summarise(total = sum(propertyDamage) + sum(cropDamage)) %>%
    arrange(desc(total))
totalDamage <- totalDamage[totalDamage$total != 0,]

Show the number of event types that caused damage.

nrow(totalDamage)
## [1] 397

Take the top 10 most damaging event types and make a bar plot.

g2 <- ggplot(data = totalDamage[1:10,], aes(x = EVTYPE, y = total))
g2 + 
    geom_bar(stat = "identity", fill="steelblue") + 
    geom_text(aes(label = sprintf("%1.2e", total)), vjust = -0.3, size = 3.5) + 
    theme_minimal() + 
    theme(axis.text.x = element_text(angle=90)) +
    labs(y = "Damage", x = "Event type", title = "Events with most economic effect (Top-10)")

According to this analysis, tornadoes caused the most economic damage.