The Health and Economic Effects of Severe Weather Events

Summary:

This document covers the loading, processing, and statistical results of an analysis of the NOAA Storm Database, carried out in the R programming language.

Synopsis:

The analysis aims to find the weather events most harmful to population health and to the economy. For health, the injuries and fatalities data are used, and, measured by the total number of injuries and fatalities, tornado is found to be the most harmful event. For the economy, the monetary damage to property and crops, recorded in the variables PROPDMG and CROPDMG, is used to identify the most harmful weather event. By this measure, flood is found to be the most harmful event for the economy.

Data Processing:

This section describes the data processing steps. The NOAA Storm Database file is downloaded from “https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2”, and the data is imported into a data frame named “storm”. To create a data frame suited to detecting the health effect of events, the storm data frame is reduced to EVTYPE and the sum of INJURIES and FATALITIES, aggregated by EVTYPE.

For the analysis of the economic effect, PROPDMGEXP and CROPDMGEXP are assumed to indicate the number of zeros to append to the damage amount; for example, k means thousand (000) and 3 means three zeros (000). Each value of PROPDMGEXP and CROPDMGEXP is converted into a string of zeros and pasted onto the rounded PROPDMG and CROPDMG values to create the numeric variables prop and crop. Finally, the storm data frame is reduced to EVTYPE and the sum of prop and crop, aggregated by EVTYPE.

Due to time constraints, the data is not restricted to a time range that would avoid periods with sparse records. Also, because of the difficulty of mapping EVTYPE onto the 42 event types described in the Storm Data Documentation, no such classification is implemented.

# Download the compressed file from "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
# The same local path is used for downloading and reading, so the two steps stay consistent.

destfile <- "repdata-data-StormData.csv.bz2"
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile, method = "curl")

# Read the CSV file into the data frame storm.
storm <- read.csv(bzfile(destfile), stringsAsFactors = FALSE)

# Sum INJURIES and FATALITIES by lower-cased EVTYPE:
stormH <- aggregate(INJURIES + FATALITIES ~ tolower(EVTYPE), data = storm, sum)

# Rename the columns of stormH (the injuries column holds injuries plus fatalities):
names(stormH) <- c("evtype", "injuries")

# Sort stormH by injuries in descending order:

stormH <- stormH[order(-stormH$injuries), ]
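To make the aggregation step concrete, here is a minimal sketch on made-up rows (the `toy` data frame and its values are illustrative assumptions, not NOAA records). It uses the same call shape as above and shows that lower-casing EVTYPE merges case variants such as "TORNADO" and "Tornado" into one group.

```r
# Toy data, invented for illustration only (not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "Tornado", "FLOOD", "HEAT"),
  INJURIES   = c(10, 5, 2, 0),
  FATALITIES = c(1, 0, 3, 4),
  stringsAsFactors = FALSE
)

# Sum injuries plus fatalities per lower-cased event type
toyH <- aggregate(INJURIES + FATALITIES ~ tolower(EVTYPE), data = toy, sum)
names(toyH) <- c("evtype", "injuries")

# Sort in descending order of the combined count
toyH <- toyH[order(-toyH$injuries), ]
print(toyH)
```

Lower-casing only merges variants that differ in case; misspellings and abbreviations still end up in separate groups.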

## Economic Events:

# Map the characters of PROPDMGEXP and CROPDMGEXP to strings of zeros:

# Unknown characters such as "+", "-" and "?" are mapped to an empty string (no zeros). The remaining characters are mapped to zeros as follows:

# 1 = 0
# h and 2 = 00 
# k and 3 = 000
# 4 = 0000
# 5 = 00000
# m and 6 = 000000
# 7 = 0000000
# 8 = 00000000
# b = 000000000

storm$propdmgexp <-unlist(sapply(tolower(storm$PROPDMGEXP), switch,              
              "1" = "0",
              "2" = "00",
              "3" ="000",
              "4" ="0000",
              "5" = "00000",
              "6" = "000000",
              "7" = "0000000",
              "8" = "00000000",
              h = "00",
              k = "000",
              m="000000",
              b="000000000",
              "+"="",
              "-"="",
              "?"="",
              ""
              ))

storm$cropdmgexp <-unlist(sapply(tolower(storm$CROPDMGEXP), switch,              
              "1" = "0",
              "2" = "00",
              "3" ="000",
              "4" ="0000",
              "5" = "00000",
              "6" = "000000",
              "7" = "0000000",
              "8" = "00000000",
              h = "00",
              k = "000",
              m="000000",
              b="000000000",
              "+"="",
              "-"="",
              "?"="",
              ""
              ))

# Round PROPDMG and CROPDMG to integers so the zero strings can be appended (fractional amounts are lost):
storm$propdmg <- round(storm$PROPDMG)
storm$cropdmg <- round(storm$CROPDMG)

# Paste propdmg with propdmgexp and cropdmg with cropdmgexp, then convert the results to the numeric variables prop and crop:

storm$prop <- with(storm, as.numeric(paste(propdmg, propdmgexp, sep = "")))

storm$crop <- with(storm, as.numeric(paste(cropdmg, cropdmgexp, sep = "")))

# Aggregate the prop and crop in the storm dataframe by EVTYPE:

stormEcon<-aggregate(prop+crop ~tolower(EVTYPE), data = storm, sum)

# Change the column names of stormEcon:

names(stormEcon)<-c("evtype","cost")

# Sort the stormEcon by cost in descending order:

stormEcon<-stormEcon[order(-stormEcon$cost),]
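The exponent conversion above works by pasting strings of zeros onto rounded amounts. As a cross-check, the same mapping can be sketched numerically by multiplying the amount by a power of ten, which also keeps fractional amounts such as 2.5K. The helper `exp_to_multiplier` and the `toy` rows below are hypothetical and not part of the original analysis; the mapping follows the comment table above (h = 2, k = 3, m = 6, b = 9, digits as given, anything else as multiplier 1).

```r
# Hypothetical helper: turn an exponent code into a numeric multiplier
exp_to_multiplier <- function(code) {
  code  <- tolower(code)
  power <- c(h = 2, k = 3, m = 6, b = 9)[code]    # letter codes; NA if no match
  digit <- suppressWarnings(as.numeric(code))     # digit codes; NA for non-digits
  power <- ifelse(is.na(power), digit, power)
  power[is.na(power)] <- 0                        # "", "+", "-", "?" give multiplier 1
  unname(10^power)
}

# Toy rows, invented for illustration only
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 3, 7),
  PROPDMGEXP = c("K", "m", "", "?"),
  stringsAsFactors = FALSE
)

# Numeric version of the conversion
toy$prop <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)

# Paste-based version for the first row, as in the analysis above
pasted_first <- as.numeric(paste(round(toy$PROPDMG[1]), "000", sep = ""))

print(toy)
print(pasted_first)
```

For the first row the two approaches differ: the numeric version keeps the fraction, while the paste-based version rounds the amount before appending zeros. Whether that difference matters for the ranking of event types is not examined here.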

Results:

The processed data frames are simplistic, because the event types were not cleaned and no suitable time range was selected. Even so, the most harmful events for health and the economy can be reasonably identified from these data sets.
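For reference, the two cleaning steps that were left out could look roughly like the sketch below. It is illustrative only and was not applied to the results in this report. It assumes a BGN_DATE column holding text dates in month/day/year form, uses an arbitrary placeholder cutoff year, and the regular expression covers just one family of spelling variants; the `toy` rows are invented.

```r
# Toy rows, invented for illustration; BGN_DATE format is an assumption
toy <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/1/1998 0:00:00", "7/4/2005 0:00:00"),
  EVTYPE   = c("TORNADO", " TSTM WIND", "Thunderstorm Winds"),
  stringsAsFactors = FALSE
)

# Step 1: keep only events from a placeholder cutoff year onwards
cutoff_year <- 1996
toy$year <- as.integer(format(as.Date(toy$BGN_DATE, format = "%m/%d/%Y"), "%Y"))
recent <- toy[toy$year >= cutoff_year, ]

# Step 2: normalise case and whitespace, then collapse a few variants into one label
recent$evtype <- trimws(tolower(recent$EVTYPE))
recent$evtype <- sub("^(tstm|thunderstorm) winds?$", "thunderstorm wind", recent$evtype)

print(unique(recent$evtype))
```

A full classification would need one such rule for each documented event type, which is the effort this report did not undertake.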

In terms of the total number of injuries and fatalities, tornado is the most harmful event for health, as the plot below indicates:

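# Load the ggplot2 library

library(ggplot2)
# head() returns six rows by default, so request five explicitly to match the title
qplot(evtype, injuries, data = head(stormH, 5), main = "Total Injuries and Fatalities of the Top Five Weather Events", ylab = "Injuries and Fatalities", xlab = "Event")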

Similarly, flood is the most harmful event for the economy, as the plot below indicates:

# head() returns six rows by default, so request five explicitly to match the title
qplot(evtype, cost, data = head(stormEcon, 5), main = "The Total Economic Cost of the Top Five Weather Events", ylab = "Cost", xlab = "Event")