Analysis of NOAA storm data to identify the most damaging event types

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This analysis explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database and identifies the event types that cause the most harm to human health and the greatest financial damage.

Summary of analysis

  1. The data from the US NOAA contains observations covering event types and attributes describing the impact of each event.
  2. The analysis considers the impact of events in terms of fatalities, injuries, property damage, and crop damage.
  3. Other information, such as the timing and location of each event, is out of scope for this analysis.
  4. R was used to analyse, summarise, and present the data, and the document was published using RMarkdown.
  5. The code, associated graphs, and final results are presented in the subsequent sections of this document.

Code Base

  1. Code for reading in the dataset and/or processing the data.
  2. Read the full data set.
  3. Select only the columns relevant to human fatalities, injuries, and financial loss to property and crops.
  4. Convert the event type to uppercase so that, when the data is grouped, entries that differ only in letter case are treated as the same event type.
  5. Aggregate the sums by event type for four columns: fatalities, injuries, property damage, and crop damage.
  6. Filter out event types where the value is zero in all four columns.
  7. Update the column name.
  8. Identify the event type with the maximum damage to health and to financials, and store each in a new data frame.
##Reproducible Research Wk4 Assignment

##Defining Library for the program
library(dplyr)
library(ggplot2)
library(scales)

##clean up R memory before executing the code
rm(list=ls())

## Set Working Directory
setwd("~/Documents/Coursera/5/Wk4/")

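##1. Code for reading in the dataset and/or processing the data
## Download in binary mode so the compressed file is not corrupted on Windows
temp <- tempfile()
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", temp, mode = "wb")

## Read full data (read.csv decompresses the bz2 file transparently)
storm <- read.csv(temp, header = TRUE)
unlink(temp)
rm(temp)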

## (1) Select only the columns related to human fatalities, injuries and
## financial loss to property and crops. (2) Convert the event type to upper
## case so that entries differing only in case are grouped together.
## (3) Aggregate the sums of the four columns by event type. (4) Filter out
## event types where all four sums are zero. (5) Update the column name.
storm <- select(storm, EVTYPE, FATALITIES, INJURIES, PROPDMG, CROPDMG)
storm$EVTYPE <- toupper(storm$EVTYPE)
storm <- aggregate(storm[, 2:5], by = list(storm$EVTYPE), sum)
storm <- filter(storm, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)
names(storm)[1] <- "EVTYPE"

## New data frame to store the event with maximum damage to health.
## The FATALITIES column here holds fatalities + injuries combined, in thousands.
max_health <- head(storm %>%
                     select(FATALITIES, INJURIES, EVTYPE) %>%
                     group_by(EVTYPE) %>%
                     summarise(FATALITIES = sum(round((FATALITIES + INJURIES) / 1000, 2))) %>%
                     arrange(desc(FATALITIES)), 1)

## New data frame to store the event with maximum damage to financials.
## The PROPDMG column here holds property + crop damage combined, divided by 1,000,000.
max_fin <- head(storm %>%
                  select(PROPDMG, CROPDMG, EVTYPE) %>%
                  group_by(EVTYPE) %>%
                  summarise(PROPDMG = sum(round((PROPDMG + CROPDMG) / 1000000, 2))) %>%
                  arrange(desc(PROPDMG)), 1)
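
The code above sums the raw PROPDMG and CROPDMG magnitudes. The NOAA file also carries PROPDMGEXP and CROPDMGEXP columns holding a unit code for each magnitude (for example K for thousands, M for millions, B for billions), and those columns are dropped by the select() step. The following is a minimal sketch of how such codes could be applied before aggregating, shown on toy data rather than the real file. The helper `to_dollars`, the multiplier table, and the choice to treat any unlisted code as a multiplier of 1 are assumptions made for illustration and are not part of the original analysis.

```r
## Toy rows standing in for the NOAA columns; the values are illustrative only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)

## Assumed mapping from unit code to multiplier. Codes not listed here
## (including blanks) are treated as 1, which is a simplifying assumption.
exp_multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

## Hypothetical helper: scale a magnitude by the multiplier for its unit code.
to_dollars <- function(value, exp_code) {
  mult <- exp_multiplier[toupper(exp_code)]
  mult[is.na(mult)] <- 1
  value * unname(mult)
}

toy$PROPDMG_USD <- to_dollars(toy$PROPDMG, toy$PROPDMGEXP)
```

With a column like `PROPDMG_USD` in place, the aggregation by event type could sum scaled values instead of raw magnitudes; how to treat unusual codes in the real data would need to be decided separately.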

Results

  1. Bar plot of casualties by event type, where casualties are the sum of fatalities and injuries. TORNADO causes the most harm to human health, with a combined total of 96.98 thousand fatalities and injuries.

  1. Bar plot of financial damage by event type, where financial damage is the sum of property damage and crop damage. TORNADO causes the most financial damage, with a combined property and crop damage total of 3.31 million.

  1. The top 5 events by fatalities, in descending order (in thousands):
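
The plotting code is not shown in this document. The following is a minimal sketch of how a bar plot of totals per event type could be drawn with ggplot2, assuming a data frame with one row per event type. The helper `plot_totals`, the `TOTAL` column name, and the toy values are hypothetical and do not come from the NOAA data.

```r
library(ggplot2)

## Illustrative values only, not the NOAA totals.
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "HAIL"),
  TOTAL  = c(9.5, 4.2, 1.3),
  stringsAsFactors = FALSE
)

## Hypothetical helper: bar plot of one total per event type,
## with bars ordered from largest to smallest.
plot_totals <- function(df, y_label) {
  ggplot(df, aes(x = reorder(EVTYPE, -TOTAL), y = TOTAL)) +
    geom_col() +
    labs(x = "Event type", y = y_label)
}

p <- plot_totals(toy, "Fatalities + injuries (thousands)")
```

Calling `print(p)` renders the plot. The same helper could be reused for the financial totals by passing a different data frame and axis label.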
## # A tibble: 5 x 2
##   EVTYPE         FATALITIES
##   <chr>               <dbl>
## 1 TORNADO              5.63
## 2 EXCESSIVE HEAT       1.9 
## 3 FLASH FLOOD          0.98
## 4 HEAT                 0.94
## 5 LIGHTNING            0.82
  1. The top 5 events by injuries, in descending order (in thousands):
## # A tibble: 5 x 2
##   EVTYPE         INJURIES
##   <chr>             <dbl>
## 1 TORNADO           91.4 
## 2 TSTM WIND          6.96
## 3 FLOOD              6.79
## 4 EXCESSIVE HEAT     6.53
## 5 LIGHTNING          5.23
  1. The top 5 events by property damage, in descending order (in millions):
## # A tibble: 5 x 2
##   EVTYPE            PROPDMG
##   <chr>               <dbl>
## 1 TORNADO              3.21
## 2 FLASH FLOOD          1.42
## 3 TSTM WIND            1.34
## 4 FLOOD                0.9 
## 5 THUNDERSTORM WIND    0.88
  1. The top 5 events by crop damage, in descending order (in millions):
## # A tibble: 5 x 2
##   EVTYPE      CROPDMG
##   <chr>         <dbl>
## 1 HAIL          0.580
## 2 FLASH FLOOD   0.18 
## 3 FLOOD         0.17 
## 4 TSTM WIND     0.11 
## 5 TORNADO       0.1
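
The code that produced the four top-5 tables is not shown in this document. The following is a minimal sketch of one way such a table could be built from a data frame of per-event totals, using base R rather than the dplyr pipeline used earlier. The helper `top_events` and the toy values are hypothetical.

```r
## Illustrative totals per event type, not the NOAA figures.
toy <- data.frame(
  EVTYPE     = c("A", "B", "C", "D", "E", "F"),
  FATALITIES = c(1200, 300, 5600, 40, 900, 0),
  stringsAsFactors = FALSE
)

## Hypothetical helper: top n event types for one column,
## with the values divided by `divisor` and rounded to 2 decimals.
top_events <- function(df, column, n = 5, divisor = 1000) {
  out <- df[, c("EVTYPE", column)]
  out[[column]] <- round(out[[column]] / divisor, 2)
  out <- out[order(out[[column]], decreasing = TRUE), ]
  head(out, n)
}

top_fatal <- top_events(toy, "FATALITIES")
```

Passing `divisor = 1e6` together with a damage column would give a table scaled to millions, matching the layout of the property and crop tables above.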