Synopsis

In this assignment, the NOAA storm data on weather events is explored. The data provides information on weather events such as hail, thunder, drought and others across the United States. The most harmful event types are identified from the number of injuries and fatalities they caused. The property and crop losses due to these weather events are also analysed.

Data Processing

The following library is loaded for data processing:

library(dplyr)
## Warning: package 'dplyr' was built under R version 3.6.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

The data for the analysis was obtained as a comma-separated value (CSV) file compressed with bzip2. It was downloaded and stored locally as follows:

if (!file.exists("data")){
  dir.create("data")
}

file_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(file_url, destfile ="./data/Dataset.csv")

The stored dataset is read and its structure is inspected as follows:

Weather <- read.csv("./data/Dataset.csv")
str(Weather)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
##  $ BGN_TIME  : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
##  $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
##  $ STATE     : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ EVTYPE    : Factor w/ 985 levels "   HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : Factor w/ 35 levels "","  N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_LOCATI: Factor w/ 54429 levels "","- 1 N Albion",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_DATE  : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_TIME  : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ END_LOCATI: Factor w/ 34506 levels "","- .5 NNW",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ WFO       : Factor w/ 542 levels ""," CI","$AC",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ ZONENAMES : Factor w/ 25112 levels "","                                                                                                               "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : Factor w/ 436781 levels "","-2 at Deer Park\n",..: 1 1 1 1 1 1 1 1 1 1 ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
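
The download is a bzip2-compressed file saved under a plain .csv name, and read.csv() still reads it because R's file connections detect the compression when opening a file for reading. A minimal, self-contained sketch of that behaviour is given here; the toy table and the temporary path are made up for illustration and are not part of the storm data:

```r
# Toy data standing in for the storm table (values are invented for illustration)
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(1, 0))

# Write it as a bzip2-compressed CSV, but under a plain ".csv" file name
path <- tempfile(fileext = ".csv")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path as a file connection, which detects the compression
toy_back <- read.csv(path, stringsAsFactors = FALSE)
print(toy_back)
```

Passing stringsAsFactors = FALSE keeps text columns as character vectors, which is often easier to work with when matching keywords in columns such as EVTYPE.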

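There are 902297 observations of 37 variables. In the EVTYPE column, similar weather events are recorded under many different names (985 distinct values). To consolidate them, each record is assigned to a broad category by keyword matching. When several keywords match the same record, the last matching assignment below takes precedence, and records that match none of the keywords are labelled 'Other':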

# Default category for records that match none of the keywords below
Weather$Event <- 'Other'
# Later assignments overwrite earlier ones when several keywords match
Weather$Event[grep('tide', Weather$EVTYPE, ignore.case = TRUE)] <- 'Tide'
# Note: 'avalance' does not match the spelling "AVALANCHE"
Weather$Event[grep('avalance', Weather$EVTYPE, ignore.case = TRUE)] <- 'Avalanche'
Weather$Event[grep('blizzard', Weather$EVTYPE, ignore.case = TRUE)] <- 'Blizzard'
Weather$Event[grep('cold', Weather$EVTYPE, ignore.case = TRUE)] <- 'Cold'
Weather$Event[grep('heat', Weather$EVTYPE, ignore.case = TRUE)] <- 'Heat'
Weather$Event[grep('flood', Weather$EVTYPE, ignore.case = TRUE)] <- 'Flood'
# Note: 'lighting' does not match the spelling "LIGHTNING"
Weather$Event[grep('lighting', Weather$EVTYPE, ignore.case = TRUE)] <- 'Lightning'
Weather$Event[grep('drought', Weather$EVTYPE, ignore.case = TRUE)] <- 'Drought'
Weather$Event[grep('storm', Weather$EVTYPE, ignore.case = TRUE)] <- 'Storm'
Weather$Event[grep('tornado', Weather$EVTYPE, ignore.case = TRUE)] <- 'Tornado'
Weather$Event[grep('wildfire', Weather$EVTYPE, ignore.case = TRUE)] <- 'Wildfire'
Weather$Event[grep('smoke', Weather$EVTYPE, ignore.case = TRUE)] <- 'Smoke'
Weather$Event[grep('hail', Weather$EVTYPE, ignore.case = TRUE)] <- 'Hail'
Weather$Event[grep('rain', Weather$EVTYPE, ignore.case = TRUE)] <- 'Rain'
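
The keyword matching above can also be written as an ordered lookup table, which keeps the precedence rule (later patterns win) in one place. The following is only a sketch: the function name classify_event, the table event_patterns and the sample strings are hypothetical, and the patterns 'avalanc' and 'lightning' are assumed spellings. Applying it to the full dataset would therefore give totals that differ from the ones reported in this document.

```r
# Ordered lookup table: label = keyword. Later entries override earlier ones.
# 'avalanc' and 'lightning' are assumed patterns chosen for this sketch.
event_patterns <- c(
  Tide = "tide", Avalanche = "avalanc", Blizzard = "blizzard", Cold = "cold",
  Heat = "heat", Flood = "flood", Lightning = "lightning", Drought = "drought",
  Storm = "storm", Tornado = "tornado", Wildfire = "wildfire", Smoke = "smoke",
  Hail = "hail", Rain = "rain"
)

# Hypothetical helper: assign each event type string to a broad category
classify_event <- function(evtype, patterns = event_patterns) {
  event <- rep("Other", length(evtype))
  for (label in names(patterns)) {
    hit <- grepl(patterns[[label]], evtype, ignore.case = TRUE)
    event[hit] <- label
  }
  event
}

# Made-up sample of event type strings
sample_types <- c("TORNADO", "AVALANCHE", "LIGHTNING", "THUNDERSTORM WIND/HAIL", "FOG")
sample_events <- classify_event(sample_types)
print(sample_events)
```

In this sample, "THUNDERSTORM WIND/HAIL" matches both 'storm' and 'hail' and ends up as 'Hail' because that pattern comes later in the table, mirroring the order of the assignments above.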

The fatalities are totalled for each event category and sorted in decreasing order as follows:

health_fatal <- tapply(Weather$FATALITIES, Weather$Event, sum)
health_fatal <- sort(health_fatal, decreasing = TRUE)
head(health_fatal)
## Tornado   Other    Heat   Flood   Storm    Cold 
##    5636    3427    3132    1524     633     451

Similarly, the injuries due to the different weather events are totalled as follows:

health_injuries <- tapply(Weather$INJURIES, Weather$Event, sum)
health_injuries <- sort(health_injuries, decreasing = TRUE)
head(health_injuries)
## Tornado   Other    Heat   Flood   Storm    Hail 
##   91407   20791    9209    8602    6691    1467
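
The two health totals can also be computed in a single step, which keeps fatalities and injuries side by side for each category. This is a sketch on a small invented data frame that only borrows the column names Event, FATALITIES and INJURIES; it uses base R's aggregate() rather than the tapply() calls above:

```r
# Invented toy records; only the column names follow the analysis above
toy <- data.frame(
  Event = c("Tornado", "Heat", "Tornado", "Flood", "Heat"),
  FATALITIES = c(2, 1, 3, 0, 1),
  INJURIES = c(10, 0, 5, 2, 1)
)

# Sum both columns per event category in one call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ Event, data = toy, FUN = sum)

# Sort by fatalities, largest first
health <- health[order(health$FATALITIES, decreasing = TRUE), ]
print(health)
```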

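Now, let us look at the damage to property and crops. The magnitude of each damage figure is coded in PROPDMGEXP and CROPDMGEXP as ‘k’, ‘m’ or ‘b’, which correspond to thousands, millions and billions. Records with any other code receive a multiplier of zero in this analysis. The value of the damage is computed as follows: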

# Property loss
Weather$Prop_dam <- 0
Weather$Prop_dam[grep('k', Weather$PROPDMGEXP, ignore.case = TRUE)] <- 10^3
Weather$Prop_dam[grep('m', Weather$PROPDMGEXP, ignore.case = TRUE)] <- 10^6
Weather$Prop_dam[grep('b', Weather$PROPDMGEXP, ignore.case = TRUE)] <- 10^9
Weather$P_value <- Weather$PROPDMG*Weather$Prop_dam

Prop_dam <- tapply(Weather$P_value, Weather$Event, sum)
Prop_dam <- sort(Prop_dam, decreasing = TRUE)
head(Prop_dam)
##        Flood        Other        Storm      Tornado         Hail     Wildfire 
## 167513743320 102255776340  72812914700  56993097730  17619990720   4865614000
# Crop damage
Weather$Crop_dam <- 0
Weather$Crop_dam[grep('k', Weather$CROPDMGEXP, ignore.case = TRUE)] <- 10^3
Weather$Crop_dam[grep('m', Weather$CROPDMGEXP, ignore.case = TRUE)] <- 10^6
Weather$Crop_dam[grep('b', Weather$CROPDMGEXP, ignore.case = TRUE)] <- 10^9
Weather$C_value <- Weather$CROPDMG*Weather$Crop_dam

Crop_dam <- tapply(Weather$C_value, Weather$Event, sum)
Crop_dam <- sort(Crop_dam, decreasing = TRUE)
head(Crop_dam)
##     Drought       Flood       Other       Storm        Hail        Cold 
## 13972621780 12266926100  9280152420  6406789800  3114212850  1416765500
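
The conversion from exponent codes to multipliers can be illustrated on a few made-up values. The helper name exp_multiplier is hypothetical, and the sketch uses exact matching on the upper-cased code instead of grep(), which is assumed to be equivalent for single-character codes:

```r
# Hypothetical helper: map exponent codes to numeric multipliers
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- lookup[toupper(as.character(code))]
  mult[is.na(mult)] <- 0  # any other code contributes nothing, as in the analysis above
  unname(mult)
}

# Made-up damage figures and their exponent codes
toy_dmg <- c(25, 2.5, 1, 3)
toy_exp <- c("K", "m", "B", "")

toy_value <- toy_dmg * exp_multiplier(toy_exp)
print(toy_value)
```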

Results

The processed data is used to analyse the damage to property and life.

The harm to life, in the form of fatalities and injuries computed from the processed data, is plotted here:

barplot(health_injuries[1:10], xlab = 'Event', ylab = 'Injuries', main = 'Injuries', las = 2)

barplot(health_fatal[1:10], xlab = 'Event', ylab = 'Fatalities', main = 'Fatalities', las = 2)

The fatalities data show that tornadoes caused the highest number of fatalities (5636), followed by heat (3132) and floods (1524), setting aside the catch-all 'Other' category (3427).

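Similarly, the total monetary loss due to these weather events is plotted as follows. Because Prop_dam and Crop_dam were sorted separately, they are aligned by event name before being added:

# Reorder Crop_dam to the event order of Prop_dam, then add and re-sort
damage <- Prop_dam + Crop_dam[names(Prop_dam)]
damage <- sort(damage, decreasing = TRUE)
barplot(damage[1:10], xlab = 'Event', ylab = 'Monetary loss', main = 'Monetary damage', las = 2)

From the property and crop totals above, floods caused the largest combined economic loss (about 180 billion), followed by storms (about 79 billion) and tornadoes, setting aside the catch-all 'Other' category (about 112 billion). Drought caused the largest crop loss (about 14 billion).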
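
When two named totals have been sorted separately, `+` adds them by position rather than by event name, so it helps to reorder one vector by the other's names before adding. A toy sketch with invented numbers shows the difference:

```r
# Invented totals, each sorted independently in decreasing order
prop <- sort(c(Flood = 100, Storm = 50, Drought = 1), decreasing = TRUE)
crop <- sort(c(Flood = 10, Storm = 5, Drought = 30), decreasing = TRUE)
# prop is ordered Flood, Storm, Drought; crop is ordered Drought, Flood, Storm

# Positional addition pairs up different events and keeps the names of crop
by_position <- crop + prop

# Reordering crop by the names of prop pairs each event with itself
by_name <- prop + crop[names(prop)]

print(by_position)
print(by_name)
```

In the positional result, the entry labelled Drought combines the drought crop loss with the flood property loss, which is why the alignment step matters for the ranking.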