author: "Farzad Ravari"
date: "April 12, 2017"
output: html_document
The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
The National Oceanic and Atmospheric Administration (NOAA) maintains a public database of storm events. The data record the type of each storm event along with details such as location, date, estimated property damage, and the number of human casualties. The analysis must address the following questions: a) Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health? b) Across the United States, which types of events have the greatest economic consequences?
getwd()
[1] "C:/Users/farza/OneDrive/Documents"
setwd("C:/Users/farza/Desktop/Data Science/Course 5/Project 2/data/data")
getwd()
[1] "C:/Users/farza/Desktop/Data Science/Course 5/Project 2/data/data"
library(RCurl) # for loading external dataset
library(plyr) # for count & aggregate method
library(reshape2) # for melt
library(ggplot2) # for plots
library(grid) # for grids
library(gridExtra) # for advanced plots
library(scales) # for plot scaling
dataDir <- "C:/Users/farza/Desktop/Data Science/Course 5/Project 2/data/data"
filePath <- file.path(dataDir, "StormData.csv.bz2") # compressed download
destPath <- file.path(dataDir, "StormData.csv") # decompressed copy
if (!file.exists(destPath)) {
  # unzip() only handles .zip archives; bunzip2() from the R.utils package decompresses .bz2
  R.utils::bunzip2(filePath, destPath, overwrite = TRUE, remove = FALSE)
}
storm <- read.csv(destPath)
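As a side note on the loading step: base R's `read.csv` can usually read a bzip2-compressed CSV directly, because the underlying `file()` connection detects the compression, so the explicit decompression may be optional. A minimal sketch on a throwaway file (the temporary path and the toy data frame are made up for illustration, not taken from the storm data):

```r
# Toy data frame standing in for the storm data (hypothetical values).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(3, 1))

# Write it out as a bzip2-compressed CSV in a temporary location.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv opens the compressed file directly; no separate decompression step.
toy_back <- read.csv(tmp)
print(toy_back)
```

Reading the compressed file directly avoids keeping a second, much larger copy of the data on disk, at the cost of decompressing on every load.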
event <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP",
"CROPDMG", "CROPDMGEXP")
data <- storm[event]
unique(data$PROPDMGEXP)
[1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
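data$PROPEXP[data$PROPDMGEXP == "K"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "m"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09
data$PROPEXP[data$PROPDMGEXP == ""] <- 1
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06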
data$PROPEXP[data$PROPDMGEXP == "4"] <- 10000
data$PROPEXP[data$PROPDMGEXP == "2"] <- 100
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1000
data$PROPEXP[data$PROPDMGEXP == "h"] <- 100
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "H"] <- 100
data$PROPEXP[data$PROPDMGEXP == "1"] <- 10
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
data$PROPDMGVAL <- data$PROPDMG * data$PROPEXP
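The one-assignment-per-code pattern above could also be written as a single lookup table, which makes it easier to see at a glance whether every code has a multiplier. The sketch below is only an illustration: `exp_lookup` and `exp_to_multiplier` are hypothetical names, the toy data frame is made up, and the multipliers are assumed to follow the mappings used in this analysis (H = hundred, K = thousand, M = million, B = billion, digits as powers of ten except "0", blank as 1, and the symbols `+`, `-`, `?` as 0).

```r
# Multipliers keyed by exponent code; an illustrative table that is
# assumed to mirror the mappings used in this analysis.
exp_lookup <- c(
  "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
  "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0, "-" = 0, "?" = 0
)

# Hypothetical helper: upper-case the code, look up its multiplier,
# and treat a blank code as 1. Unknown codes come back as NA.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- unname(exp_lookup[code])
  mult[code == ""] <- 1
  mult
}

# Toy rows (made-up values) covering upper case, lower case, and blank codes.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "m", "B", ""))
toy$PROPDMGVAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
print(toy)
```

With a table like this, `sum(is.na(exp_to_multiplier(data$PROPDMGEXP)))` would be one way to check that no code in the real column was left unmapped.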
unique(data$CROPDMGEXP)
[1] M K m B ? 0 k 2
Levels: ? 0 2 B k K m M
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "m"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "k"] <- 1000
data$CROPEXP[data$CROPDMGEXP == "2"] <- 100
data$CROPEXP[data$CROPDMGEXP == ""] <- 1
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
data$CROPDMGVAL <- data$CROPDMG * data$CROPEXP
fatal <- aggregate(FATALITIES ~ EVTYPE, data, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, data, FUN = sum)
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, data, FUN = sum)
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, data, FUN = sum)
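To make the aggregate-then-rank pattern used in this analysis concrete, here is a small sketch on made-up records. The data frame `toy` and its numbers are hypothetical; only the `aggregate` and `order` calls follow the report.

```r
# Toy records (made-up numbers) with repeated event types.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(5, 2, 7, 4, 1, 0)
)

# Sum fatalities per event type, as aggregate() does in the analysis.
toy_fatal <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)

# Rank in decreasing order and keep the top 3 rows.
toy_top3 <- head(toy_fatal[order(-toy_fatal$FATALITIES), ], 3)
print(toy_top3)
```

One point worth keeping in mind: the formula interface of `aggregate` drops rows containing `NA` by default, so any damage value left as `NA` by an unmapped exponent code would silently disappear from the totals rather than raise an error.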
fatal8 <- fatal[order(-fatal$FATALITIES), ][1:8, ]
injury8 <- injury[order(-injury$INJURIES), ][1:8, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(fatal8$FATALITIES, las = 3, names.arg = fatal8$EVTYPE, main = "Highest Fatalities Events", ylab = "Fatalities No.", col = "red")
barplot(injury8$INJURIES, las = 3, names.arg = injury8$EVTYPE, main = "Highest Injuries Events", ylab = "Injuries No.", col = "purple")
propdmg8 <- propdmg[order(-propdmg$PROPDMGVAL), ][1:8, ]
cropdmg8 <- cropdmg[order(-cropdmg$CROPDMGVAL), ][1:8, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg8$PROPDMGVAL/(10^9), las = 3, names.arg = propdmg8$EVTYPE,
main = "Highest Property Damages Events", ylab = "Damage Cost ($ billions)", col = "grey")
barplot(cropdmg8$CROPDMGVAL/(10^9), las = 3, names.arg = cropdmg8$EVTYPE,
main = "Highest Crop Damages Events", ylab = "Damage Cost ($ billions)",col = "orange")
Fatalities and injuries:
Tornadoes cause the most fatalities and the most injuries. Excessive heat ranks second for fatalities, and thunderstorm wind ranks second for injuries.
Property and crop damage:
Property damage is caused mainly by floods, followed by hurricanes/typhoons. Crop damage is caused mainly by drought, followed by floods.