Synopsis

The goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

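Our focus is on two questions: which types of events are most harmful to population health, and which types of events have the greatest economic consequences?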

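Our analysis indicates that tornadoes are the most harmful to population health, in terms of both fatalities and injuries, and that floods have the greatest economic consequences.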

Data processing

Preprocessing: load the data and create data frames

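# Load ggplot2, which is used for the plots below
library(ggplot2)

# Download the data file only if it is not already in the working directory
if (!file.exists("datafile.csv.bz2")) {
        download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "datafile.csv.bz2", mode = "wb")
}

# Read the data only if it is not already loaded; read.csv() reads the bz2 file directly
if (!exists("weatherdata")) {
        weatherdata <- read.csv("datafile.csv.bz2")
}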

# Data Frame for event type, fatalities and injuries

weatherdataclean <- weatherdata[, c("EVTYPE", "FATALITIES", "INJURIES")]

# Data Frame for event type, property damage and crop damage

damagedataclean <- weatherdata[, c("EVTYPE", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

# Map the exponent code to a dollar multiplier (K = thousands, M = millions, B = billions).
# toupper() also catches lowercase codes; any other code is given a multiplier of 0.
propexp <- toupper(damagedataclean$PROPDMGEXP)
damagedataclean$PROPDMGMult <- ifelse(propexp == "K", 1e3, ifelse(propexp == "M", 1e6, ifelse(propexp == "B", 1e9, 0)))


# Convert the damage figures to dollars using the multipliers, then create a combined property + crop damage total

damagedataclean$PROPDMGAMT <- damagedataclean$PROPDMG * damagedataclean$PROPDMGMult

# Same exponent mapping for crop damage; lowercase codes are treated like uppercase ones
cropexp <- toupper(damagedataclean$CROPDMGEXP)
damagedataclean$CROPDMGMult <- ifelse(cropexp == "K", 1e3, ifelse(cropexp == "M", 1e6, ifelse(cropexp == "B", 1e9, 0)))

damagedataclean$CROPDMGAMT <- damagedataclean$CROPDMG * damagedataclean$CROPDMGMult

damagedataclean$TOTALDMGAMT <- damagedataclean$PROPDMGAMT + damagedataclean$CROPDMGAMT
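
The exponent-to-multiplier conversion above can also be written as a named-vector lookup, which avoids the nested ifelse() calls. The following is a minimal sketch on a small made-up data frame. The helper name exp_to_mult and the toy values are assumptions for illustration, not part of the original analysis. The sketch treats lowercase codes like their uppercase counterparts and gives every other code a multiplier of 0.

```r
# Hypothetical helper: translate exponent codes into dollar multipliers.
exp_to_mult <- function(code) {
        lookup <- c(K = 1e3, M = 1e6, B = 1e9)
        mult <- unname(lookup[toupper(as.character(code))])
        mult[is.na(mult)] <- 0   # unrecognised or blank codes count as 0
        mult
}

# Toy data (made up), using the same column names as the storm data
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 10),
                  PROPDMGEXP = c("K", "M", "B", ""),
                  stringsAsFactors = FALSE)
toy$PROPDMGAMT <- toy$PROPDMG * exp_to_mult(toy$PROPDMGEXP)
toy
```

Adding a new code to the lookup vector is then a one-element change, rather than another level of nesting.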

Harm to population health is measured by fatalities (FATALITIES in the data set) and injuries (INJURIES in the data set).

Event most harmful to population health (fatalities) - Tornadoes

# summary of events based on total number of fatalities by event type

weatherfatalities <- aggregate(weatherdataclean$FATALITIES, by = list(weatherdataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(weatherfatalities) = c("EVTYPE", "FATALITIES")
weatherfatalities <- weatherfatalities[order(-weatherfatalities$FATALITIES),]
topweatherfatalities <- weatherfatalities[1: 10, ]

p <- ggplot(topweatherfatalities, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES))
p + geom_bar(stat = "identity", fill = "grey") +
        ggtitle("Weather Events by # Fatalities") +
        labs(x = "Event Type", y = "# Fatalities") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

Fig 1: Tornadoes are the most dangerous events for population health in the U.S. in terms of fatalities.

The event type with the most total fatalities was tornado (5633), followed by excessive heat (1903) and flash flood (978). On a per-event basis, however, other event types are deadlier: the highest average number of fatalities per event was for "tornadoes, TSTM wind, hail" (25), followed by cold and snow (14) and Tropical Storm Gordon (8).
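
The per-event figures quoted above are averages: the total for an event type divided by the number of recorded events of that type. The code for that step is not shown in this report, so the following is only a sketch of one way to compute it, using aggregate() with mean on a small made-up data frame (the toy values are assumptions, not storm data).

```r
# Toy data (made up) with the same column names as weatherdataclean
toy <- data.frame(EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "HEAT", "HEAT", "COLD AND SNOW"),
                  FATALITIES = c(0, 3, 0, 4, 6, 14),
                  stringsAsFactors = FALSE)

# Mean fatalities per recorded event, by event type, largest first
avgfatalities <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = mean)
avgfatalities <- avgfatalities[order(-avgfatalities$FATALITIES), ]
avgfatalities
```

An event type that was recorded only once can top this ranking, which is why the per-event ranking can differ so much from the ranking by totals.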

Event most harmful to population health (injuries) - Tornadoes

# summary of events based on total number of injuries by event type.

weatherinjury <- aggregate(weatherdataclean$INJURIES, by = list(weatherdataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(weatherinjury) = c("EVTYPE", "INJURIES")
weatherinjury <- weatherinjury[order(-weatherinjury$INJURIES),]
topweatherinjury <- weatherinjury[1: 10, ]

q <- ggplot(topweatherinjury, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES))
q + geom_bar(stat = "identity", fill = "grey") +
        ggtitle("Weather Events by # Injuries") +
        labs(x = "Event Type", y = "# Injuries") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

Fig 2: Tornadoes are the most dangerous events for population health in the U.S. in terms of injuries.

Tornadoes were also the event type with the most injuries (91346), followed by TSTM wind (6957) and flood (6789). On a per-event basis, the average number of injuries was highest for heat wave (70), followed by Tropical Storm Gordon (43) and wild fires (37.5).

Event with greatest economic consequence - Flood

# summary of events based on total damage($) by event type

TOTALDMGAMT <- aggregate(damagedataclean$TOTALDMGAMT, by = list(damagedataclean$EVTYPE), FUN = sum, na.rm = TRUE)
colnames(TOTALDMGAMT) = c("EVTYPE", "TOTALDMGAMT")
TOTALDMGAMT <- TOTALDMGAMT[order(-TOTALDMGAMT$TOTALDMGAMT),]
TOPTOTALDMGAMT <- TOTALDMGAMT[1: 10, ]

r <- ggplot(TOPTOTALDMGAMT, aes(x = reorder(EVTYPE, TOTALDMGAMT / 1e9), y = TOTALDMGAMT / 1e9))
r + geom_bar(stat = "identity", fill = "grey") +
        ggtitle("Weather Events by Total Damage (in $ Billions)") +
        labs(x = "Event Type", y = "Total Damage (in $ Billions)") +
        theme(axis.text.x = element_text(angle = 45, hjust = 1))

Fig 3: Floods have the greatest economic consequences.

The damage cost analysis combines property damage and crop damage. The event type with the highest total cost of damages was flood ($15 billion), followed by hurricane/typhoon ($7.2 billion) and tornado ($5.7 billion). On a per-event basis, the highest cost of damages was for "tornadoes, TSTM wind, hail" ($160.2 million), followed by heavy rain/severe weather ($125 million) and hurricane/typhoon ($81.7 million).
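
The three summaries in this report follow the same steps: sum a column by event type, sort in descending order, and keep the top ten rows. As a sketch of how that repetition could be factored out, the hypothetical helper top_events below wraps those steps. The helper name and the toy data are assumptions for illustration and are not part of the original analysis.

```r
# Hypothetical helper: total a numeric column by EVTYPE and keep the n largest.
top_events <- function(data, column, n = 10) {
        totals <- aggregate(data[[column]], by = list(EVTYPE = data$EVTYPE),
                            FUN = sum, na.rm = TRUE)
        colnames(totals) <- c("EVTYPE", column)
        totals <- totals[order(-totals[[column]]), ]
        head(totals, n)
}

# Toy data (made up) to show the call
toy <- data.frame(EVTYPE = c("FLOOD", "TORNADO", "FLOOD", "HAIL"),
                  TOTALDMGAMT = c(5e9, 2e9, 1e9, 5e8),
                  stringsAsFactors = FALSE)
res <- top_events(toy, "TOTALDMGAMT", n = 2)
res
```

With the real data, a call such as top_events(damagedataclean, "TOTALDMGAMT") would be expected to give the same kind of table as TOPTOTALDMGAMT above.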

Results