Assignment

The basic goal of this assignment is to explore the NOAA Storm Database and answer some basic questions about severe weather events. You must use the database to answer the questions below and show the code for your entire analysis. Your analysis can consist of tables, figures, or other summaries. You may use any R package you want to support your analysis.

Synopsis

Storm Data is an official publication of the National Oceanic and Atmospheric Administration (NOAA) that documents:

  a. The occurrence of storms and other significant weather phenomena having sufficient intensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce
  b. Rare, unusual weather phenomena
  c. Other significant meteorological events

This data analysis project addresses the following questions:

  1. Which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Which types of events have the greatest economic consequences?

Aggregating the data by storm event type shows that:

  1. Tornadoes cause the most harm to population health (fatalities and injuries), and
  2. Floods cause the greatest economic damage (property and crop damage combined).

  library(knitr)
  library(ggplot2)

Data Processing

  # Download the storm data file if it is not already present
  if(!file.exists("repdata_data_StormData.csv.bz2")) {
    download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                  destfile = "repdata_data_StormData.csv.bz2")
  }
  
  # Load the data (bzfile() reads the compressed file directly)
  fullStormData <- read.csv(bzfile("repdata_data_StormData.csv.bz2"), sep = ",", header = TRUE)
  names(fullStormData)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"
  dim(fullStormData)
## [1] 902297     37
  # Subset (NOAA) storm database
  subset <- fullStormData[,c('EVTYPE','FATALITIES','INJURIES', 'PROPDMG', 'PROPDMGEXP', 'CROPDMG', 'CROPDMGEXP')]
  #head(subset)

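To prepare the data for analysis, the recorded damage figures must be scaled by the PROPDMGEXP and CROPDMGEXP codes to obtain the actual damage amounts.

PROPDMG and CROPDMG: amount (without currency units) of property damage and crop damage

PROPDMGEXP and CROPDMGEXP: codes giving the power-of-10 multiplier for the variables above
H -> Hundreds (10^2)
K -> Thousands (10^3)
M -> Millions (10^6)
B -> Billions (10^9)

Rows whose code is not one of these four uppercase letters keep a damage amount of 0 in the code below.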

  # Convert H, K, M, B units to calculate Property Damage 
  
  ## Property Damage  = PROPDMG * PROPDMGEXP
  subset$PROPDMGAMOUNT = 0
  
  ## fill in the data with correct units
  subset[subset$PROPDMGEXP == "H", ]$PROPDMGAMOUNT = subset[subset$PROPDMGEXP == "H", ]$PROPDMG * 10^2
  subset[subset$PROPDMGEXP == "K", ]$PROPDMGAMOUNT = subset[subset$PROPDMGEXP == "K", ]$PROPDMG * 10^3
  subset[subset$PROPDMGEXP == "M", ]$PROPDMGAMOUNT = subset[subset$PROPDMGEXP == "M", ]$PROPDMG * 10^6
  subset[subset$PROPDMGEXP == "B", ]$PROPDMGAMOUNT = subset[subset$PROPDMGEXP == "B", ]$PROPDMG * 10^9
  # Convert H, K, M, B units to calculate Crop Damage 
  
  ## Crop Damage  = CROPDMG * CROPDMGEXP
  subset$CROPDMGAMOUNT = 0
  
  ## assign correct values based on parameters
  subset[subset$CROPDMGEXP == "H", ]$CROPDMGAMOUNT = subset[subset$CROPDMGEXP == "H", ]$CROPDMG * 10^2
  subset[subset$CROPDMGEXP == "K", ]$CROPDMGAMOUNT = subset[subset$CROPDMGEXP == "K", ]$CROPDMG * 10^3
  subset[subset$CROPDMGEXP == "M", ]$CROPDMGAMOUNT = subset[subset$CROPDMGEXP == "M", ]$CROPDMG * 10^6
  subset[subset$CROPDMGEXP == "B", ]$CROPDMGAMOUNT = subset[subset$CROPDMGEXP == "B", ]$CROPDMG * 10^9
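
The same conversion can also be written as a lookup table rather than one assignment per code. The following is only a minimal sketch on a toy data frame: the values in `toy` and the name `multiplier` are made up for illustration and are not part of the original analysis. As in the code above, any code other than H, K, M, or B is treated as zero damage.

```r
# Toy data (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  PROPDMG    = c(2.5, 10, 1.2, 3, 7),
  PROPDMGEXP = c("K", "M", "B", "H", ""),
  stringsAsFactors = FALSE
)

# Named vector mapping each exponent code to its multiplier
multiplier <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Look up the multiplier for every row; codes missing from the table give NA
mult <- unname(multiplier[as.character(toy$PROPDMGEXP)])

# Treat unknown or empty codes as zero damage, as the row-by-row code does
mult[is.na(mult)] <- 0

# Damage amounts: 2500, 1e7, 1.2e9, 300, 0
toy$PROPDMGAMOUNT <- toy$PROPDMG * mult
print(toy)
```

A lookup table keeps the code-to-multiplier mapping in one place, so the same vector could be reused for CROPDMGEXP.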

Results

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

  # plot number of fatalities with the most harmful event type
  
  fatalities <- aggregate(FATALITIES ~ EVTYPE, data=subset, sum)
  
  fatalities <- fatalities[order(-fatalities$FATALITIES), ][1:10, ]
  fatalities$EVTYPE <- factor(fatalities$EVTYPE, levels = fatalities$EVTYPE)
  
  ggplot(fatalities, aes(x = EVTYPE, y = FATALITIES)) + 
    geom_bar(stat = "identity", fill = "black") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Fatalities") + ggtitle("Number of fatalities by top 10 Events")

  # plot number of injuries for the most harmful event types
  
  injuries <- aggregate(INJURIES ~ EVTYPE, data=subset, sum)
  
  injuries <- injuries[order(-injuries$INJURIES), ][1:10, ]
  injuries$EVTYPE <- factor(injuries$EVTYPE, levels = injuries$EVTYPE)
  
  ggplot(injuries, aes(x = EVTYPE, y = INJURIES)) + 
    geom_bar(stat = "identity", fill = "black") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Injuries") + ggtitle("Number of injuries by top 10 Weather Events")

Conclusion 1: Tornadoes cause the most harm to population health. The graphs above show that they are the largest cause of both fatalities and injuries among weather events.
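
Fatalities and injuries can also be summarised in a single table by aggregating both columns at once. The sketch below uses a small made-up data frame (`toy` and its values are hypothetical, not NOAA figures) to show the pattern: `cbind()` on the left-hand side of the formula asks `aggregate()` to sum each column per event type.

```r
# Toy data (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 1, 3)
)

# Sum both columns per event type in a single call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank by fatalities, breaking ties by injuries
health <- health[order(-health$FATALITIES, -health$INJURIES), ]
print(health)
```

Applied to the real data, the same two steps followed by `head(health, 10)` would give a top-10 table to accompany the plots.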

Question 2: Across the United States, which types of events have the greatest economic consequences?

  # plot of total damages caused to property and crops
  
  damages <- aggregate(PROPDMGAMOUNT + CROPDMGAMOUNT ~ EVTYPE, data=subset, sum)
  names(damages) = c("EVTYPE", "TOTALDAMAGE")
  
  # express totals in millions of dollars
  damages$TOTALDAMAGE <- damages$TOTALDAMAGE/10^6
  damages <- damages[order(-damages$TOTALDAMAGE), ][1:10, ]
  damages$EVTYPE <- factor(damages$EVTYPE, levels = damages$EVTYPE)
  
  ggplot(damages, aes(x = EVTYPE, y = TOTALDAMAGE)) + 
    geom_bar(stat = "identity", fill = "red") + 
    theme(axis.text.x = element_text(angle = 90, hjust = 1)) + 
    xlab("Event Type") + ylab("Damages ($ MILLION)") + ggtitle("Property & Crop Damages by top 10 Weather Events")

Conclusion 2: Floods cause the greatest economic damage, measured as property and crop damage combined.
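
Because the total combines two quantities, it can be useful to see how much of it comes from property and how much from crops. The following is a minimal sketch on made-up numbers (the data frame `toy` and the name `split_damage` are hypothetical); it keeps the two components as separate columns and adds the total afterwards.

```r
# Toy data (hypothetical values, not taken from the NOAA file)
toy <- data.frame(
  EVTYPE        = c("FLOOD", "DROUGHT", "FLOOD", "HAIL", "DROUGHT"),
  PROPDMGAMOUNT = c(5e6, 1e5, 2e6, 3e6, 0),
  CROPDMGAMOUNT = c(1e6, 4e6, 0, 5e5, 6e6)
)

# Sum property and crop damage separately for each event type
split_damage <- aggregate(cbind(PROPDMGAMOUNT, CROPDMGAMOUNT) ~ EVTYPE,
                          data = toy, FUN = sum)

# Add the combined total and rank by it
split_damage$TOTALDAMAGE <- split_damage$PROPDMGAMOUNT + split_damage$CROPDMGAMOUNT
split_damage <- split_damage[order(-split_damage$TOTALDAMAGE), ]
print(split_damage)
```

In this toy example the top-ranked event owes almost all of its total to crop damage, which a combined figure alone would not reveal.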