Reproducible Research: Peer Assessment 2

1. Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

2. Synopsis

The National Oceanic and Atmospheric Administration (NOAA) maintains a public database of major weather events that have occurred across the United States since 1950. Using this data source, I prepare an analytical report on which types of storm events are most harmful to population health and which cause the greatest financial damage. I use the estimates of fatalities, injuries, property damage and crop damage to decide which types of weather events are most detrimental to both human health and the economy.

Population health impact (injuries and fatalities):

Fatalities: tornadoes and excessive heat cause the most fatalities.

Injuries: tornadoes cause the most injuries.

Economic damage attributed to weather events:

Property damage: flash floods, thunderstorm winds and tornadoes cause the most damage.

Crop damage: drought, floods and ice storms cause the most damage.

Impact of Severe Weather Events on Public Health and US Economy

3. Data Processing

# Package library selection
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.1
library(gridExtra)
## Warning: package 'gridExtra' was built under R version 3.3.1
# Setting the directory path to read StormData file
setwd("C:/Documents and Settings/Owner/Desktop/ReProducible_Research/Week_04/Project_02")

# reading the 'StormData.csv' file
StormData  <-  read.csv(bzfile("repdata-data-StormData.csv.bz2"))

# total number of rows and column
dim(StormData)
## [1] 902297     37
# number of unique event types
length(unique(StormData$EVTYPE))
## [1] 985

EVTYPE variation: The ‘EVTYPE’ column contains 985 unique event types. Many of them are near-duplicates and can be merged into a single event category by description. For instance, every value matching “wind|storm|wnd|hurricane|typhoon” can be merged into “Wind-Storm” as one category of weather event. This variation in the descriptions needs to be addressed.
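The merging described above could be sketched as follows. This is a minimal illustration on a toy vector, not part of the analysis in this report: the helper name `group_event_type` is hypothetical, and the pattern and the “Wind-Storm” label are the ones quoted in the paragraph.

```r
# Hypothetical helper (not used elsewhere in this report): collapse raw
# EVTYPE strings that match a pattern into one grouped label.
group_event_type <- function(evtype,
                             pattern = "wind|storm|wnd|hurricane|typhoon",
                             label = "Wind-Storm") {
  evtype <- as.character(evtype)   # works for factor and character input
  hit <- grepl(pattern, evtype, ignore.case = TRUE)
  evtype[hit] <- label
  evtype
}

# Toy input using event names that appear in the tables of this report
raw <- c("TSTM WIND", "THUNDERSTORM WIND", "HURRICANE/TYPHOON",
         "WINTER STORM", "TORNADO", "DROUGHT")
grouped <- group_event_type(raw)
table(grouped)
```

Note that a broad pattern like this one also captures WINTER STORM, so a real grouping would probably need finer rules per category.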

Selecting subgroup of data columns

# selecting the columns needed for the assignment
SelectedColumn <- which(colnames(StormData) %in% c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP"))

# creating a new dataset with selected columns
StormData <- StormData [, SelectedColumn]

head(StormData)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

Impact analysis on public health

Analysing human ‘Fatality’ and ‘Injury’ by weather events

4.1 FATALITIES Aggregated and Plotted

# aggregating 'fatalities' by 'Event_Type'
Fatality_Aggregated <- aggregate(FATALITIES ~ EVTYPE, data = StormData, FUN = sum, na.rm=TRUE)

# selecting the 12 event types with the most fatalities, in decreasing order
Fatalities <- Fatality_Aggregated [order( -Fatality_Aggregated$FATALITIES),][1:12,]

# Displaying the data aggregation
Fatalities[, c("EVTYPE", "FATALITIES")]
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224
## 972   WINTER STORM        206
## 586   RIP CURRENTS        204
# Plotting the fatalities as a bar plot using ggplot
Fatality_Plot <- ggplot(Fatalities, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES)) +
  geom_bar(stat = "identity", fill = "saddlebrown") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Fatalities by Weather Events\n from 1950 - 2011", x = "", y = "Total Fatalities")

Fatality Result analysis: Tornadoes are by far the deadliest weather event, with 5,633 fatalities. Excessive heat is second with 1,903, and flash floods are third with 978.
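Because EVTYPE is not normalised, some rows in the fatalities table above are variants of the same event. As a rough illustration that uses only the totals printed above, merging two such pairs changes the ranking. Which labels count as the same event is an assumption here, and the names `fatal` and `merge_map` are introduced only for this sketch.

```r
# Totals copied from the fatalities table above
fatal <- c("TORNADO" = 5633, "EXCESSIVE HEAT" = 1903, "FLASH FLOOD" = 978,
           "HEAT" = 937, "LIGHTNING" = 816, "TSTM WIND" = 504,
           "FLOOD" = 470, "RIP CURRENT" = 368, "RIP CURRENTS" = 204)

# Assumed mapping of variant labels to a common label
merge_map <- c("RIP CURRENTS" = "RIP CURRENT", "EXCESSIVE HEAT" = "HEAT")

# Replace variant labels, leave all other labels unchanged
labels <- names(fatal)
idx <- labels %in% names(merge_map)
labels[idx] <- merge_map[labels[idx]]

# Sum the totals per merged label and rank them
merged <- sort(sapply(split(fatal, labels), sum), decreasing = TRUE)
merged
```

Under this grouping, heat-related fatalities total 2,840 and rip currents 572, which moves rip currents above TSTM WIND (504).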

4.2 INJURIES Aggregated and Plotted

# aggregating injuries by event type
Injuries_Aggregated <- aggregate( INJURIES ~ EVTYPE, data = StormData, FUN = sum, na.rm=TRUE)

# selecting the 12 event types with the most injuries, in decreasing order
Injuries <- Injuries_Aggregated[order( - Injuries_Aggregated$INJURIES),][1:12,]

# Displaying the selected data
Injuries[,c("EVTYPE", "INJURIES")]
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361
## 972      WINTER STORM     1321
## 411 HURRICANE/TYPHOON     1275
# Plotting the injuries as a bar plot using ggplot
Injury_Plot <- ggplot(Injuries, aes(x = reorder(EVTYPE, INJURIES), y = INJURIES)) +
  geom_bar(stat = "identity", fill = "skyblue4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Total Injuries by Weather Events\n from 1950 - 2011", x = "", y = "Total Injuries")

Injury Result analysis: Tornadoes are also by far the leading cause of injuries nationwide, with 91,346 recorded, more than ten times the total for any other event type.

# Displaying "Fatality_Plot" and "Injury_Plot" side by side
grid.arrange(Fatality_Plot, Injury_Plot ,ncol = 2)

Economic impact by weather events in USA

5.1 Property Damage And Crop Damage By Stormy Weather Events

# Convert 'PROPDMG'/'PROPDMGEXP' and 'CROPDMG'/'CROPDMGEXP' to absolute dollar values

# function to translate an exponent code into a power of ten
Redefine_Exponent <- function(exp) {
    # h: hundred, k: thousand, m: million, b: billion
    # convert to character first: as.numeric() on a factor returns the
    # internal level index, not the digit that is printed
    exp <- as.character(exp)

    if (exp %in% c('H', 'h'))
        return(2)
    else if (exp %in% c('K', 'k'))
        return(3)
    else if (exp %in% c('M', 'm'))
        return(6)
    else if (exp %in% c('B', 'b'))
        return(9)
    else if (exp %in% c('', '-', '?', '+'))
        return(0)
    else if (grepl("^[0-9]+$", exp)) # a digit is used directly as the exponent
        return(as.numeric(exp))
    else {
        stop("Not a valid exponent value.")
    }
}

# calculating the dollar value of property or crop damage
Damage_Total_Amount <- function(value, exp){

  poE <- Redefine_Exponent(exp)

  # multiplying the damage value by 10 raised to the exponent
  if (!is.numeric(value)) {
    stop("Invalid damage value.")
  }
  value * (10^poE)
}


# calculating Property Damage with exponents
StormData$prop_Damage <- mapply(Damage_Total_Amount, StormData$PROPDMG, StormData$PROPDMGEXP)

head(StormData$prop_Damage)
## [1] 25000  2500 25000  2500  2500  2500
# calculating Crop Damage with exponent
StormData$crop_Damage <- mapply(Damage_Total_Amount, StormData$CROPDMG, StormData$CROPDMGEXP) 

head(StormData$crop_Damage)
## [1] 0 0 0 0 0 0
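The exponent decoding above can also be written as a vectorised lookup, which avoids calling a function once per row. The sketch below is an alternative under stated assumptions, not the code that produced the results in this report: the names `exponent_lookup` and `decode_damage` are hypothetical, and every unrecognised code is assumed to mean a multiplier of 1. It converts the code to character first because, if the column was read as a factor (the `read.csv` default before R 4.0), `as.numeric()` returns the factor's internal level index rather than the printed digit.

```r
# Hypothetical lookup: exponent code -> power of ten
exponent_lookup <- c(h = 2, k = 3, m = 6, b = 9)

decode_damage <- function(value, exp_code) {
  code <- tolower(as.character(exp_code))    # factor-safe conversion
  power <- unname(exponent_lookup[code])     # NA for anything not h/k/m/b
  is_digit <- grepl("^[0-9]$", code)
  power[is_digit] <- as.numeric(code[is_digit])
  power[is.na(power)] <- 0                   # assumption: '', '-', '?', '+' mean 10^0
  value * 10^power
}

# Pitfall check: the numeric value of a factor is its level index
f <- factor(c("5", "K", ""))
level_index <- as.numeric(f)   # indices into levels(f), not exponents
level_index

decoded <- decode_damage(c(25, 2.5, 3, 7), factor(c("K", "K", "5", "")))
decoded
```

The first two decoded values agree with the `head(StormData$prop_Damage)` output above (25000 and 2500); a digit code such as "5" is read as 10^5 rather than as a level index.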

5.2 PROPERTY damage aggregated and plotted

# aggregating property damage by weather event type
Aggregate_by_PropDmg <- aggregate(prop_Damage ~ EVTYPE, data = StormData, FUN = sum)

head(Aggregate_by_PropDmg)
##                  EVTYPE prop_Damage
## 1    HIGH SURF ADVISORY      200000
## 2         COASTAL FLOOD           0
## 3           FLASH FLOOD       50000
## 4             LIGHTNING           0
## 5             TSTM WIND     8100000
## 6       TSTM WIND (G45)        8000
# selecting the 12 event types with the highest property damage
Property_Damage <- Aggregate_by_PropDmg[order(-Aggregate_by_PropDmg$prop_Damage),][1:12,]

# plotting the property damage (bar heights are on a log10 scale)
Damage_Property_Total <- ggplot(data = Property_Damage,
             aes(x = reorder(EVTYPE, prop_Damage), y = log10(prop_Damage))) +
  geom_bar(stat = "identity", fill = "royalblue4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Property Damage Total", x = "Weather Event type", y = "Property damage (log10 of dollars)")

Property Damage Result analysis: Flash floods, thunderstorm winds and tornadoes are the top three causes of property damage.

5.4 CROP damage aggregated and plotted

# aggregating crop damage by weather event type
aggregate_by_Crop_Damage <- aggregate(crop_Damage ~ EVTYPE, data = StormData, FUN = sum)

# selecting the 12 event types with the highest crop damage, in decreasing order
Crop_Damage <- aggregate_by_Crop_Damage[order(-aggregate_by_Crop_Damage$crop_Damage),][1:12,]

head( Crop_Damage )
##          EVTYPE crop_Damage
## 95      DROUGHT 13972566000
## 170       FLOOD  5661968450
## 590 RIVER FLOOD  5029459000
## 427   ICE STORM  5022113500
## 244        HAIL  3025974480
## 402   HURRICANE  2741910000
# plotting crop damage by event type
Crop_Damage_Total <- ggplot(data = Crop_Damage, aes(x = reorder(EVTYPE, crop_Damage), y = crop_Damage)) +
  geom_bar(stat = "identity", fill = "goldenrod4") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Crop Damage Total", x = "Weather Event type", y = "Crop damage in dollars")

Crop Damage Result analysis: Drought has the most severe effect on crops, causing about $14 billion in damage since 1950. Flood, river flood and ice storm each caused more than $5 billion in crop damage.
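To make the dollar figures easier to compare, the totals can be expressed in billions. The snippet below only re-uses the six totals printed in the `head(Crop_Damage)` output above; the vector names `crop_top6` and `crop_billions` are introduced for illustration.

```r
# Totals copied from the head(Crop_Damage) output above (US dollars)
crop_top6 <- c("DROUGHT" = 13972566000, "FLOOD" = 5661968450,
               "RIVER FLOOD" = 5029459000, "ICE STORM" = 5022113500,
               "HAIL" = 3025974480, "HURRICANE" = 2741910000)

# Convert to billions of dollars, rounded to two decimals
crop_billions <- round(crop_top6 / 1e9, 2)
crop_billions

# Drought relative to the next largest category
drought_vs_flood <- round(crop_top6[["DROUGHT"]] / crop_top6[["FLOOD"]], 1)
drought_vs_flood
```

On these figures, drought accounts for roughly 2.5 times the crop damage of the next category, flood.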

5.5 Displaying Property vs. Crop Damage Side by Side

grid.arrange(Damage_Property_Total, Crop_Damage_Total ,ncol = 2)