This assignment explores the NOAA Storm Database to answer some basic questions about severe weather events. First, we determined which event type was the most harmful in the US from 1950 to 2011 in terms of fatalities and injuries: tornadoes. We then used the database to study the economic impact of these events. Flash floods and thunderstorm winds caused several billion dollars of property damage in the US from 1950 to 2011, while the highest crop damage costs were due to drought and flood.
First, the database was loaded as follows:
setwd("C:/Users/maxim/Desktop")
storm_data <- read.csv("repdata_data_StormData.csv.bz2")
stormdata<- storm_data[c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
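Two attached files describe how the database was put together.
We are interested in seven variables: EVTYPE (event type), FATALITIES, INJURIES, PROPDMG and PROPDMGEXP (property damage and its magnitude code), and CROPDMG and CROPDMGEXP (crop damage and its magnitude code).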
The data were then processed as follows:
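# Convert an exponent code into the matching power of ten
transform_exposant <- function(e) {
  e <- as.character(e)  # guard against factor input: as.numeric() on a factor returns level codes
  # h -> hundred, k -> thousand, m -> million, b -> billion
  if (e %in% c('h', 'H'))
    return(2)
  else if (e %in% c('k', 'K'))
    return(3)
  else if (e %in% c('m', 'M'))
    return(6)
  else if (e %in% c('b', 'B'))
    return(9)
  else if (e %in% c('', '-', '?', '+'))  # blank or symbol: no multiplier
    return(0)  # 10^0 = 1, so the damage value is left unchanged
  else if (!is.na(suppressWarnings(as.numeric(e))))  # if a digit
    return(as.numeric(e))
  else {
    stop("Invalid exponent value.")
  }
}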
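# Scale each damage figure by its power of ten. The *EXP columns are
# overwritten with the resulting dollar amounts, which the later steps aggregate.
property_damage <- sapply(stormdata$PROPDMGEXP, FUN=transform_exposant)
stormdata$PROPDMGEXP <- stormdata$PROPDMG * (10 ^ property_damage)
crop_damage <- sapply(stormdata$CROPDMGEXP, FUN=transform_exposant)
stormdata$CROPDMGEXP <- stormdata$CROPDMG * (10 ^ crop_damage)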
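As an alternative to calling the conversion function once per row, the same mapping can be sketched in vectorized form. The helper name exponent_to_power, the lookup vector, the PROPERTY_COST column and the toy data frame are illustrative assumptions, not part of the original analysis. The sketch also assumes that blank or symbol codes mean "no multiplier" (power 0), and it maps any unrecognized code to 0 instead of stopping with an error.

```r
# Hypothetical lookup table: exponent letter -> power of ten
exponent_lookup <- c(h = 2, k = 3, m = 6, b = 9)

# Hypothetical vectorized helper (not from the original report)
exponent_to_power <- function(codes) {
  codes <- tolower(as.character(codes))         # also handles factor input
  power <- unname(exponent_lookup[codes])       # NA for codes not in the table
  is_digit <- grepl("^[0-9]$", codes)           # single digits are used as-is
  power[is_digit] <- as.numeric(codes[is_digit])
  power[is.na(power)] <- 0                      # assumption: anything else means 10^0
  power
}

# Made-up rows for illustration only
toy_damage <- data.frame(
  PROPDMG    = c(25, 2.5, 1, 10),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)
toy_damage$PROPERTY_COST <- toy_damage$PROPDMG * 10^exponent_to_power(toy_damage$PROPDMGEXP)
print(toy_damage)
```

Writing the result to a separate column keeps the original exponent codes available for inspection, at the cost of departing from the column names used in the rest of this report.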
We can then extract the 15 most harmful event types in the US from 1950 to 2011:
fatal_event <- aggregate(FATALITIES ~ EVTYPE, data=stormdata, sum)
injured_event <- aggregate(INJURIES ~ EVTYPE, data=stormdata, sum)
fatal_event_15 <- head(fatal_event[order(fatal_event$FATALITIES, decreasing = TRUE),],15)
injured_event_15 <- head(injured_event[order(injured_event$INJURIES, decreasing = TRUE),],15)
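The aggregate, order and head pattern used here can be checked on a small example. The data frame below is made up for illustration and is not taken from the NOAA database; the names toy_events, toy_totals, toy_ranked and toy_top2 are likewise hypothetical.

```r
# Made-up events for illustration only
toy_events <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0)
)

# Sum fatalities per event type
toy_totals <- aggregate(FATALITIES ~ EVTYPE, data = toy_events, sum)

# Sort by total, largest first, and keep the top two rows
toy_ranked <- toy_totals[order(toy_totals$FATALITIES, decreasing = TRUE), ]
toy_top2 <- head(toy_ranked, 2)
print(toy_top2)
```

Here TORNADO (3 + 2 = 5) and HEAT (4) come out on top, which is the same ranking logic applied to the full data set with 15 rows instead of 2.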
Similarly, we can extract the 15 event types that caused the highest costs in the US from 1950 to 2011:
property_damage_event <- aggregate(PROPDMGEXP ~ EVTYPE, data=stormdata, sum)
crop_damage_event <- aggregate(CROPDMGEXP ~ EVTYPE, data=stormdata, sum)
property_damage_event_15 <- head(property_damage_event[order(property_damage_event$PROPDMGEXP, decreasing = TRUE),],15)
crop_damage_event_15 <- head(crop_damage_event[order(crop_damage_event$CROPDMGEXP, decreasing = TRUE),],15)
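# Plotting packages: ggplot2 for the charts, gridExtra for grid.arrange(),
# grid for textGrob() and gpar()
library(ggplot2)
library(gridExtra)
library(grid)
graph_fatalities <- ggplot(data=fatal_event_15, aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES, fill=FATALITIES)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total number of fatalities") +
  xlab("Event type") +
  theme(legend.position="none")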
graph_injuries <- ggplot(data=injured_event_15, aes(x=reorder(EVTYPE, INJURIES), y=INJURIES, fill=INJURIES)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total number of injuries") +
  xlab("Event type") +
  theme(legend.position="none")
grid.arrange(graph_fatalities, graph_injuries, top=textGrob("Most harmful events in the US (1950-2011)",gp=gpar(fontsize=14,font=3)))
graph_property_damage <- ggplot(data=property_damage_event_15,
                                aes(x=reorder(EVTYPE, PROPDMGEXP), y=PROPDMGEXP, fill=PROPDMGEXP)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total property damage (USD)") +
  xlab("Event type") +
  theme(legend.position="none")
graph_crop_damage <- ggplot(data=crop_damage_event_15,
                            aes(x=reorder(EVTYPE, CROPDMGEXP), y=CROPDMGEXP, fill=CROPDMGEXP)) +
  geom_bar(stat="identity") +
  coord_flip() +
  ylab("Total crop damage (USD)") +
  xlab("Event type") +
  theme(legend.position="none")
grid.arrange(graph_property_damage, graph_crop_damage, top=textGrob("Events that caused greatest economic consequences in the US (1950-2011)",gp=gpar(fontsize=14,font=3)))
First, we determined that tornadoes were the most harmful event type in the US from 1950 to 2011. We then used the database to study the economic impact of these events and found that flash floods and thunderstorm winds caused the highest damage costs.