The code evaluates how harmful natural events are to human health and the economy. The database used in this evaluation is maintained by the National Oceanic and Atmospheric Administration (NOAA) and covers the years 1950-2011.
Data preprocessing included reading the data from its original source online.
library(lubridate)
# Reading the data
# mode = "wb" keeps the compressed file intact on Windows
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "stormdata", mode = "wb")
data <- read.csv("stormdata",header=TRUE, sep=",")
# There are 902297 records and 37 variables in the data
dim(data)
## [1] 902297 37
Subsetting the dataset involved several steps. Because all event types have been recorded in the database only from 1996 onwards, the subset was restricted to the years 1996-2011.
The variables STATE__, EVTYPE, BGN_DATE, END_DATE, BGN_YEAR, END_YEAR, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES and INJURIES are included in the subset; BGN_YEAR and END_YEAR are derived from the date variables.
# Converting 'begin date' and 'end date' from factor to date-time format, then extracting the year
data$BGN_DATE <- mdy_hms(data$BGN_DATE)
data$END_DATE <- mdy_hms(data$END_DATE)
data$BGN_YEAR <- year(data$BGN_DATE)
data$END_YEAR <- year(data$END_DATE)
# Check occurrences of events per year
table(data$BGN_YEAR)
##
## 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 1960 1961 1962
## 223 269 272 492 609 1413 1703 2184 2213 1813 1945 2246 2389
## 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975
## 1968 2348 2855 2388 2688 3312 2926 3215 3471 2168 4463 5386 4975
## 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988
## 3768 3728 3657 4279 6146 4517 7132 8322 7335 7979 8726 7367 7257
## 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001
## 10410 10946 12522 13534 12607 20631 27970 32270 28680 38128 31289 34471 34962
## 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011
## 36293 39752 39363 39184 44034 43289 55663 45817 48161 62174
# From 1996 onwards, all event types have been recorded, which also shows in the number of events recorded per year.
# The subset of the data includes the years 1996-2011 and only the variables STATE__, EVTYPE, BGN_DATE, END_DATE, BGN_YEAR, END_YEAR, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP, FATALITIES, INJURIES
myvars <- c("STATE__", "EVTYPE", "BGN_DATE", "END_DATE", "BGN_YEAR", "END_YEAR", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP", "FATALITIES", "INJURIES")
data9611 <- data[data$BGN_YEAR >= 1996, ]
subdata <- data9611[myvars]
dim(subdata)
## [1] 653530 12
The exponent values had to be converted to the corresponding powers of ten (e.g. billions: b -> 9). After that, the total damage values could be calculated.
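The exponent conversion described above can be sketched as follows. This is a minimal illustration, not necessarily the approach used in the analysis: the helper `exp_to_multiplier`, the toy data frame and the `*_TOTAL` column names are hypothetical, and the sketch assumes that only the codes K, M and B (in any case) carry meaning, with every other code treated as a multiplier of 1.

```r
# Hypothetical helper: map an exponent code to its power-of-ten multiplier.
# Assumption: only "K", "M", "B" (any case) are meaningful; anything else gives 10^0 = 1.
exp_to_multiplier <- function(code) {
  lookup <- c(k = 3, m = 6, b = 9)
  power <- lookup[tolower(as.character(code))]
  power[is.na(power)] <- 0   # unknown or empty codes
  unname(10^power)
}

# Toy rows standing in for subdata (illustrative values, not taken from the database)
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1, 7),
  PROPDMGEXP = c("K", "M", "B", ""),
  CROPDMG    = c(0, 5, 0, 3),
  CROPDMGEXP = c("", "K", "", "M"),
  stringsAsFactors = FALSE
)

# Damage in dollars = reported value * multiplier
toy$PROPDMG_TOTAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$CROPDMG_TOTAL <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
toy$DMG_TOTAL     <- toy$PROPDMG_TOTAL + toy$CROPDMG_TOTAL
```

The same two multiplications could be applied to subdata directly. Exponent codes other than K, M and B would need a separate decision before the totals are trusted.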
# There are 48 standard event types; the EVTYPE values in the data are compared against this list
EventTypes <- c("Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood", "Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke", "Drought", "Dust Devil", "Dust Storm", "Excessive Heat", "Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze", "Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain", "Heavy Snow", "High Surf", "High Wind", "Hurricane (Typhoon)", "Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning", "Marine Hail", "Marine High Wind", "Marine Strong Wind", "Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet", "Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado", "Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash", "Waterspout", "Wildfire", "Winter Storm", "Winter Weather")
# The event types in EVTYPE variable have multiple issues that need to be cleaned:
# Switch all text to lower case
subdata$EVTYPE <- tolower(subdata$EVTYPE)
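As a rough sketch of how cleaned values might be checked against the standard list, the example below lower-cases both sides, trims whitespace and flags entries with no exact match. The sample strings and the three-item excerpt of the list are made up for illustration, and stray whitespace is only an assumed example of the kind of issue EVTYPE may contain.

```r
# Short excerpt of the standard list, lower-cased for comparison (illustrative only)
official <- tolower(c("Flood", "Thunderstorm Wind", "Tornado"))

# Made-up raw values showing case and whitespace variation
raw <- c("  TORNADO", "Thunderstorm  Wind", "flood ", "TSTM WIND")

clean <- tolower(raw)
clean <- trimws(clean)             # drop leading/trailing whitespace
clean <- gsub("\\s+", " ", clean)  # collapse repeated spaces

# TRUE where the cleaned value is one of the standard event types
matched <- clean %in% official

# Values that still need manual recoding
unmatched <- unique(clean[!matched])
```

Here "tstm wind" remains unmatched, so abbreviations like it would still have to be recoded by hand or with pattern rules.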