The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.
The National Oceanic and Atmospheric Administration (NOAA) maintains a public database of major weather events that have occurred throughout the country since 1950. Using this data source, I prepare an analytical report on which types of storm events are most harmful to population health and which cause the greatest financial damage. I use the recorded estimates of fatalities, injuries, property damage, and crop damage to decide which types of weather events are most detrimental to human health and to the economy.
Population health impact (injuries and fatalities):
Fatalities: tornadoes and excessive heat cause the most fatalities.
Injuries: tornadoes cause the most injuries.
Economic damage attributed to weather events:
Property damage: flash floods, thunderstorm winds, and tornadoes cause the most damage.
Crop damage: drought, floods, and ice storms cause the most damage.
# Package library selection
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.1
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.3.1
# Setting the directory path to read StormData file
setwd("C:/Documents and Settings/Owner/Desktop/ReProducible_Research/Week_04/Project_02")
# reading the 'StormData.csv' file
StormData <- read.csv(bzfile("repdata-data-StormData.csv.bz2"))
# total number of rows and column
dim(StormData)
## [1] 902297 37
# number of unique event types
length(unique(StormData$EVTYPE))
## [1] 985
EVTYPE variation: The ‘EVTYPE’ column contains 985 unique event labels. Many of them describe essentially the same kind of event and could be merged into a single category. For instance, every label matching “wind|storm|wnd|hurricane|typhoon” could be merged into one “Wind-Storm” category. This spread of near-duplicate labels needs to be addressed.
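One way to approach this consolidation is pattern matching on the upper-cased labels. The sketch below is illustrative only: the helper name `consolidate_evtype`, the sample labels, and the "Flood", "Heat", and "Other" groups are assumptions made for the example, and the analysis in this report keeps the raw `EVTYPE` values.

```r
# Hypothetical helper: collapse raw EVTYPE labels into broader groups.
# Group names and patterns are illustrative assumptions, not NOAA categories.
consolidate_evtype <- function(evtype) {
  x <- toupper(trimws(as.character(evtype)))
  out <- rep("Other", length(x))
  # Later assignments override earlier ones, so a label matching
  # several patterns ends up in the last matching group.
  out[grepl("WIND|STORM|WND|HURRICANE|TYPHOON", x)] <- "Wind-Storm"
  out[grepl("FLOOD", x)] <- "Flood"
  out[grepl("HEAT", x)] <- "Heat"
  out
}

sample_types <- c("TSTM WIND", "THUNDERSTORM WIND", "HURRICANE/TYPHOON",
                  "FLASH FLOOD", "EXCESSIVE HEAT", "AVALANCHE")
grouped <- consolidate_evtype(sample_types)
table(grouped)
```

Note that a broad pattern such as "STORM" also captures labels like WINTER STORM and ICE STORM, so the patterns would need review before being applied to the full dataset.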
Selecting a subset of data columns
# selecting columns for analysis as per the assignment
SelectedColumn <- which(colnames(StormData) %in% c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))
# creating a new dataset with the selected columns
StormData <- StormData[, SelectedColumn]
head(StormData)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
4.1 FATALITIES Aggregated and Plotted
# aggregating 'fatalities' by 'Event_Type'
Fatality_Aggregated <- aggregate(FATALITIES ~ EVTYPE, data = StormData, FUN = sum, na.rm=TRUE)
# selecting the 12 event types with the most fatalities
Fatalities <- Fatality_Aggregated[order(-Fatality_Aggregated$FATALITIES), ][1:12, ]
# Displaying the data aggregation
Fatalities[, c("EVTYPE", "FATALITIES")]
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
## 972 WINTER STORM 206
## 586 RIP CURRENTS 204
# Plotting the fatalities on a barplot using ggplot
Fatality_Plot <- ggplot(Fatalities, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "saddlebrown") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Fatalities by Weather Events\n from 1950 - 2011", x = "", y = "Total Fatalities")
Fatality result analysis: Tornadoes are clearly the deadliest weather event, with 5,633 recorded fatalities. Excessive heat is the second deadliest with 1,903 fatalities, and flash floods come third with 978.
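The fatalities table above lists some hazards under more than one label (HEAT and EXCESSIVE HEAT, RIP CURRENT and RIP CURRENTS), which splits their totals. As a rough check of how much this matters, the sketch below re-sums the twelve printed totals after merging those two pairs. The merged label names are assumptions made for illustration, and the aggregation used in this report is unchanged.

```r
# Totals copied from the fatalities table above.
fatalities <- c(TORNADO = 5633, `EXCESSIVE HEAT` = 1903, `FLASH FLOOD` = 978,
                HEAT = 937, LIGHTNING = 816, `TSTM WIND` = 504, FLOOD = 470,
                `RIP CURRENT` = 368, `HIGH WIND` = 248, AVALANCHE = 224,
                `WINTER STORM` = 206, `RIP CURRENTS` = 204)

# Illustrative merge: map near-duplicate labels onto one (assumed) name.
label <- names(fatalities)
label[label %in% c("HEAT", "EXCESSIVE HEAT")] <- "HEAT (ALL)"
label[label %in% c("RIP CURRENT", "RIP CURRENTS")] <- "RIP CURRENT (ALL)"

# Sum within each merged label, then rank from highest to lowest.
merged <- sort(vapply(split(fatalities, label), sum, numeric(1)),
               decreasing = TRUE)
merged
```

With these numbers the merged heat total is 2,840 and the merged rip-current total is 572, which would place rip currents ahead of TSTM WIND (504) in the ranking.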
4.2 INJURIES Aggregated and Plotted
# Aggregating injuries by event type
Injuries_Aggregated <- aggregate(INJURIES ~ EVTYPE, data = StormData, FUN = sum, na.rm = TRUE)
# selecting the 12 event types with the most injuries (decreasing order)
Injuries <- Injuries_Aggregated[order(-Injuries_Aggregated$INJURIES), ][1:12, ]
# Displaying the selected rows
Injuries[, c("EVTYPE", "INJURIES")]
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
## 972 WINTER STORM 1321
## 411 HURRICANE/TYPHOON 1275
# Plotting the injuries on a barplot using ggplot
Injury_Plot <- ggplot(Injuries, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", fill = "skyblue4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Injuries by Weather Events\n from 1950 - 2011", x = "", y = "Total Injuries")
Injury result analysis: Tornadoes are also by far the leading cause of injuries, with 91,346 recorded, more than ten times the total for the next event type (TSTM WIND, 6,957).
# Displaying "Fatality_Plot" and "Injury_Plot" side by side
grid.arrange(Fatality_Plot, Injury_Plot, ncol = 2)
# Convert 'PROPDMG'/'PROPDMGEXP' and 'CROPDMG'/'CROPDMGEXP' to absolute dollar values
# function mapping an exponent code to a power of ten
Redefine_Exponent <- function(exp) {
  # read.csv may return a factor; compare on the label, not the level code
  exp <- as.character(exp)
  # h: hundred, k: thousand, m: million, b: billion
  if (exp %in% c('H', 'h'))
    return(2)
  else if (exp %in% c('K', 'k'))
    return(3)
  else if (exp %in% c('M', 'm'))
    return(6)
  else if (exp %in% c('B', 'b'))
    return(9)
  else if (exp %in% c('', '-', '?', '+'))  # blank or symbol: no multiplier
    return(0)
  else if (grepl("^[0-9]+$", exp))  # a digit is used directly as the power
    return(as.numeric(exp))
  else {
    stop("Not a valid exponent value.")
  }
}
# calculating the dollar value of property and crop damage
Damage_Total_Amount <- function(value, exp) {
  poE <- Redefine_Exponent(exp)
  # multiplying the damage value by 10 raised to the exponent
  if (!is.numeric(value)) {
    stop("Invalid damage value.")
  }
  value * (10^poE)
}
# calculating Property Damage with exponents
StormData$prop_Damage <- mapply(Damage_Total_Amount, StormData$PROPDMG, StormData$PROPDMGEXP)
head(StormData$prop_Damage)
## [1] 25000 2500 25000 2500 2500 2500
# calculating Crop Damage with exponent
StormData$crop_Damage <- mapply(Damage_Total_Amount, StormData$CROPDMG, StormData$CROPDMGEXP)
head(StormData$crop_Damage)
## [1] 0 0 0 0 0 0
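Calling a function once per row with `mapply()` can be slow on roughly 900,000 records. A vectorized lookup is one possible alternative. The sketch below is an illustration under stated assumptions, not the code used for the results in this report: the helper name `exponent_power` and the toy data frame are hypothetical, and the conversion follows the same convention as the function above (letters map to powers of ten, a single digit is used as the power itself, and blank or symbol codes map to zero).

```r
# Hypothetical vectorized version of the exponent conversion.
exponent_power <- function(exp) {
  exp <- toupper(as.character(exp))
  lookup <- c(H = 2, K = 3, M = 6, B = 9)
  power <- unname(lookup[exp])            # NA where no letter code matches
  is_digit <- grepl("^[0-9]$", exp)
  power[is_digit] <- as.numeric(exp[is_digit])
  power[exp %in% c("", "-", "?", "+")] <- 0
  if (anyNA(power)) stop("Not a valid exponent value.")
  power
}

# Toy data frame standing in for the PROPDMG / PROPDMGEXP columns.
toy <- data.frame(PROPDMG = c(25, 2.5, 1, 3, 7),
                  PROPDMGEXP = c("K", "k", "M", "", "2"),
                  stringsAsFactors = FALSE)
toy$prop_Damage <- toy$PROPDMG * 10^exponent_power(toy$PROPDMGEXP)
toy$prop_Damage
```

Because the whole column is converted in one call, the multiplication runs as a single vector operation instead of one function call per record.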
# aggregating property damage by weather event type
Aggregate_by_PropDmg <- aggregate(prop_Damage ~ EVTYPE, data = StormData, FUN = sum)
head(Aggregate_by_PropDmg)
## EVTYPE prop_Damage
## 1 HIGH SURF ADVISORY 200000
## 2 COASTAL FLOOD 0
## 3 FLASH FLOOD 50000
## 4 LIGHTNING 0
## 5 TSTM WIND 8100000
## 6 TSTM WIND (G45) 8000
# selecting the 12 event types with the most property damage
Property_Damage <- Aggregate_by_PropDmg[order(-Aggregate_by_PropDmg$prop_Damage), ][1:12, ]
# plotting the property damage
Damage_Property_Total <- ggplot(data = Property_Damage, aes(x = reorder(EVTYPE, prop_Damage), y = log10(prop_Damage))) +
  geom_bar(stat = "identity", fill = "royalblue4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Property Damage Total", x = "Weather Event type", y = "Property damage (log10 of dollars)")
Property damage result analysis: Flash floods, thunderstorm winds, and tornadoes are the three leading causes of property damage.
# aggregating crop damage by weather event type
aggregate_by_Crop_Damage <- aggregate(crop_Damage ~ EVTYPE, data = StormData, FUN = sum)
# selecting the 12 event types with the most crop damage (decreasing order)
Crop_Damage <- aggregate_by_Crop_Damage[order(-aggregate_by_Crop_Damage$crop_Damage), ][1:12, ]
head(Crop_Damage)
## EVTYPE crop_Damage
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025974480
## 402 HURRICANE 2741910000
# plotting crop damage by event type
Crop_Damage_Total <- ggplot(data = Crop_Damage, aes(x = reorder(EVTYPE, crop_Damage), y = crop_Damage)) +
  geom_bar(stat = "identity", fill = "goldenrod4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Crop Damage Total", x = "Weather Event type", y = "Crop damage in dollars")
Crop damage result analysis: Drought has the most severe effect on crops, causing nearly $14 billion in damage since 1950. Floods, river floods, and ice storms have also caused significant crop damage, at more than $5 billion each.
5.5 Displaying Property vs. Crop Damage Side by Side
grid.arrange(Damage_Property_Total, Crop_Damage_Total ,ncol = 2)