Synopsis

This analysis evaluates how harmful different types of natural events are to human health and to the economy. It uses the storm database maintained by the National Oceanic and Atmospheric Administration (NOAA), which covers the years 1950-2011.

Data Processing

Preprocessing starts by downloading the bzip2-compressed CSV file from its original online source and reading it into R.

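library(lubridate)

# Download the compressed data file (mode = "wb" keeps the binary file intact on Windows)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
              destfile = "stormdata", mode = "wb")
# read.csv() decompresses the bzip2 file on the fly
data <- read.csv("stormdata", header = TRUE, sep = ",")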
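The file does not have to be decompressed before reading: read.csv() opens the path through R's file() connection, which detects bzip2 compression when reading. The sketch below shows this offline with a tiny made-up CSV written to a temporary file. The column values are hypothetical and not taken from the storm data.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (toy data, for illustration only)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,2", "HAIL,0"), con)
close(con)

# read.csv() reads the compressed file directly, without a separate unzip step
toy <- read.csv(tmp, header = TRUE, sep = ",")
print(toy)
dim(toy)
# [1] 2 2
```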

# There are 902297 records and 37 variables in the data 
dim(data)
## [1] 902297     37

Subsetting the dataset involved several steps. Because all event types have been recorded in the database only from 1996 onwards, the subset was restricted to the years 1996-2011.
Only the variables STATE__, EVTYPE, BGN_DATE, END_DATE, BGN_YEAR, END_YEAR, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES and INJURIES are kept in the subset (BGN_YEAR and END_YEAR are derived from the begin and end dates below).

# Convert 'begin date' and 'end date' from text (character or factor) to date-time (POSIXct) and extract the years

data$BGN_DATE <- mdy_hms(data$BGN_DATE)
data$END_DATE <- mdy_hms(data$END_DATE)
data$BGN_YEAR <- year(data$BGN_DATE)
data$END_YEAR <- year(data$END_DATE)
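To show what the conversion above does, here is a small sketch using the same lubridate functions on a few example strings. It assumes the raw BGN_DATE values use the "month/day/year hour:min:sec" layout (e.g. "4/18/1950 0:00:00"); the three values themselves are made up.

```r
library(lubridate)

# Hypothetical example values in the assumed "m/d/Y H:M:S" layout
bgn <- c("4/18/1950 0:00:00", "1/6/1996 0:00:00", "11/30/2011 0:00:00")

bgn_dt <- mdy_hms(bgn)    # parsed to POSIXct date-times (UTC by default)
bgn_year <- year(bgn_dt)  # numeric year component, as used for BGN_YEAR

print(bgn_dt)
print(bgn_year)
# [1] 1950 1996 2011
```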

# Check occurrences of events per year
table(data$BGN_YEAR)
## 
##  1950  1951  1952  1953  1954  1955  1956  1957  1958  1959  1960  1961  1962 
##   223   269   272   492   609  1413  1703  2184  2213  1813  1945  2246  2389 
##  1963  1964  1965  1966  1967  1968  1969  1970  1971  1972  1973  1974  1975 
##  1968  2348  2855  2388  2688  3312  2926  3215  3471  2168  4463  5386  4975 
##  1976  1977  1978  1979  1980  1981  1982  1983  1984  1985  1986  1987  1988 
##  3768  3728  3657  4279  6146  4517  7132  8322  7335  7979  8726  7367  7257 
##  1989  1990  1991  1992  1993  1994  1995  1996  1997  1998  1999  2000  2001 
## 10410 10946 12522 13534 12607 20631 27970 32270 28680 38128 31289 34471 34962 
##  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011 
## 36293 39752 39363 39184 44034 43289 55663 45817 48161 62174
# From 1996 onwards, all event types have been recorded, which is consistent with the larger number of events recorded per year from the mid-1990s onwards.
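Yearly event counts only hint at the change in recording practice; counting the distinct EVTYPE values per year is a more direct check. The sketch below does this with tapply() on a hypothetical toy data frame, not on the real records.

```r
# Toy records (hypothetical) with a year and an event type per row
toy <- data.frame(
  BGN_YEAR = c(1990, 1990, 1990, 1996, 1996, 1996, 1996),
  EVTYPE   = c("TORNADO", "TSTM WIND", "TORNADO", "HAIL", "FLOOD", "TORNADO", "HEAT"),
  stringsAsFactors = FALSE
)

# Count the unique event types within each year
n_types <- tapply(toy$EVTYPE, toy$BGN_YEAR, function(x) length(unique(x)))
print(n_types)
```

On the full dataset the same call would be `tapply(data$EVTYPE, data$BGN_YEAR, function(x) length(unique(x)))`.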

# The subset of the data includes the years 1996-2011 and only the variables STATE__, EVTYPE, BGN_DATE, END_DATE, BGN_YEAR, END_YEAR, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES and INJURIES

myvars <- c("STATE__", "EVTYPE", "BGN_DATE", "END_DATE", "BGN_YEAR", "END_YEAR", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "FATALITIES", "INJURIES")

data9611 <- data[data$BGN_YEAR >= 1996, ]

subdata <- data9611[myvars]
dim(subdata)
## [1] 653530     12

The exponent values in PROPDMGEXP and CROPDMGEXP had to be converted to the corresponding powers of ten (e.g. "b" for billions becomes 9). After that, the total damage values could be calculated.
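A minimal sketch of this conversion, assuming a mapping in which "" means no multiplier, h/H = hundreds, k/K = thousands, m/M = millions and b/B = billions, single digits are taken as the exponent itself, and any other symbol ("+", "-", "?") counts as 0. Only the billions case is stated above; the rest of the mapping, the helper name `exp_to_power` and the damage values are hypothetical.

```r
# Hypothetical helper: map exponent codes to powers of ten (assumed mapping, see above)
exp_to_power <- function(code) {
  code <- toupper(trimws(as.character(code)))
  power <- rep(0, length(code))          # default: no multiplier ("", "+", "-", "?")
  power[code %in% "H"] <- 2
  power[code %in% "K"] <- 3
  power[code %in% "M"] <- 6
  power[code %in% "B"] <- 9
  is_digit <- grepl("^[0-9]$", code)     # single digits are used as the exponent
  power[is_digit] <- as.numeric(code[is_digit])
  power
}

# Toy damage records (values are made up)
dmg <- data.frame(
  PROPDMG    = c(25, 2.5, 10, 5),
  PROPDMGEXP = c("K", "M", "b", ""),
  CROPDMG    = c(0, 1, 0, 3),
  CROPDMGEXP = c("", "K", "", "5"),
  stringsAsFactors = FALSE
)

# Total damage = property damage + crop damage, each scaled by 10^exponent
dmg$PROP_TOTAL <- dmg$PROPDMG * 10^exp_to_power(dmg$PROPDMGEXP)
dmg$CROP_TOTAL <- dmg$CROPDMG * 10^exp_to_power(dmg$CROPDMGEXP)
dmg$TOTAL_DMG  <- dmg$PROP_TOTAL + dmg$CROP_TOTAL
print(dmg)
```

Using `%in%` rather than `==` keeps missing exponent codes from producing NA indices; they simply fall back to a power of 0.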

# The 48 official event types that the raw EVTYPE values should be mapped to
EventTypes <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")

# The values of the EVTYPE variable have several inconsistencies that need to be cleaned:

# Switch all text to lower case 
subdata$EVTYPE <- tolower(subdata$EVTYPE)
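Lower-casing is only the first step. One possible continuation, sketched here on hypothetical toy values, trims and collapses whitespace, applies a few manual recodes for variants, and then matches the result against the lower-cased official names, leaving unmatched values as NA. The recode table and the four-name excerpt of the official list are illustrative assumptions, not the full cleaning rules.

```r
# Toy EVTYPE values (hypothetical) showing typical inconsistencies
raw <- c("TORNADO", " Tstm Wind", "THUNDERSTORM WIND ", "Flash  Flood",
         "HURRICANE/TYPHOON", "unknown")

# Excerpt of the official list; the full 48-name EventTypes vector works the same way
official <- c("Flash Flood", "Hurricane (Typhoon)", "Thunderstorm Wind", "Tornado")

clean <- tolower(raw)               # same first step as above
clean <- trimws(clean)              # drop leading/trailing spaces
clean <- gsub("\\s+", " ", clean)   # collapse repeated internal spaces

# Hypothetical manual recodes for abbreviations and spelling variants
recode <- c("tstm wind" = "thunderstorm wind",
            "hurricane/typhoon" = "hurricane (typhoon)")
hit <- clean %in% names(recode)
clean[hit] <- recode[clean[hit]]

# Match against the lower-cased official names; unmatched values become NA
matched <- official[match(clean, tolower(official))]
print(data.frame(raw, matched))
```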