Synopsis

This report explores the “Storm Data” from the NOAA Storm Database to answer the following questions:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

Approach

  1. To answer the first question, we first rank event types by total fatalities and identify the leading events. Similarly, we rank event types by total injuries and identify the leading events.

  2. To answer the second question, we first clean the columns describing economic damage, add up property and crop damage, and identify the top 5 events by total damage.

Data Processing

Approach

  • We shall read the downloaded file from the working directory
  • Pick the columns which are of use to us and create a subset accordingly
  • Identify the columns which need to be modified
  • Modify the required columns & use it for our analysis
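
The read-and-subset steps above can be tried on a small in-memory CSV before running them on the full file. This is only a sketch: the rows are made up for illustration, and `csv_text`, `raw`, `keep` and `toy` are hypothetical names, not part of the analysis.

```r
# Toy CSV standing in for the real file (all values are made up)
csv_text <- "EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,REMARKS
TORNADO,3,15,25,K,0,,demo
FLOOD,0,2,1.5,M,10,K,demo
HAIL,0,0,0,,5,K,demo"

# Read the data, keeping text columns as character rather than factor
raw <- read.csv(text = csv_text, stringsAsFactors = FALSE)

# Keep only the columns needed for the health and economic questions
keep <- c("EVTYPE", "FATALITIES", "INJURIES",
          "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
toy <- raw[keep]

# Inspect column types to see which ones need cleaning
str(toy)
```

Checking `str()` at this point shows that the damage amounts are numeric while the exponent columns are text, which is why the exponent columns need recoding later.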

Reading the file

 Storm <- read.csv("repdata%2Fdata%2FStormData.csv")
 storm <- Storm[c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]

Checking for events which are harmful for the population

Checking the reason for highest number of fatalities

library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
storm_types <- aggregate(FATALITIES ~ EVTYPE,sum,data = storm)

# sorting the data on the basis of number of fatalities
storm_types <- storm_types[order(storm_types$FATALITIES,decreasing = TRUE),]

#picking the top 10 events 
storm_types10 <- storm_types[1:10,]

#plotting a barplot
qplot(EVTYPE,FATALITIES,data = storm_types10,color = EVTYPE,xlab = "Top  Fatalities Causing Events",ylab="No. of Fatalities",main = "Most Harmful Events - Fatalities (1950 - 2011)")+theme(axis.text.x = element_text(angle = 60, hjust = 1))+geom_bar(stat = "identity")

We observe that “Tornado” causes the largest number of fatalities, followed by “Excessive Heat”.
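
With default settings, a discrete x-axis is typically laid out in alphabetical order rather than by count, which can make the ranking harder to read. One possible way to order the bars by value is to turn EVTYPE into a factor whose levels follow the counts. The sketch below uses base R's `reorder()` on made-up counts; `top` is a hypothetical name.

```r
# Made-up counts for three event types
top <- data.frame(
  EVTYPE = c("EXCESSIVE HEAT", "FLASH FLOOD", "TORNADO"),
  FATALITIES = c(40, 10, 100),
  stringsAsFactors = FALSE
)

# reorder() sorts factor levels by ascending score; negating the
# count therefore puts the largest count first
top$EVTYPE <- reorder(top$EVTYPE, -top$FATALITIES)

# The level order is what a plotting function would use for the x-axis
levels(top$EVTYPE)
```

Passing the reordered column as the x variable should then draw the bars from the largest count to the smallest.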

Checking the reason for highest number of injuries

storm_types1 <- aggregate(INJURIES ~ EVTYPE,sum,data = storm)

# sorting the data on the basis of number of injuries
storm_types1 <- storm_types1[order(storm_types1$INJURIES,decreasing = TRUE),]

#picking the top 10 events 
storm_injuries10 <- storm_types1[1:10,]

#plotting a barplot
qplot(EVTYPE,INJURIES,data = storm_injuries10,color = EVTYPE,xlab = "Top  Injuries Causing Events",ylab="No. of Injuries",main = "Most Harmful Events - Injuries (1950-2011)")+theme(axis.text.x = element_text(angle = 60, hjust = 1))+geom_bar(stat = "identity")

In this case too, “Tornado” causes the most harm, far ahead of the other events. It is followed by “Thunderstorms”, “Flood” and “Excessive Heat”.
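
The two rankings above can also be viewed side by side by aggregating both columns in one call. This is a minimal sketch on made-up rows, not the analysis itself; `toy` and `health` are hypothetical names.

```r
# Made-up event records
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "EXCESSIVE HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES = c(20, 5, 10, 8, 1),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in a single aggregate() call
health <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by fatalities first, using injuries to break ties
health <- health[order(health$FATALITIES, health$INJURIES, decreasing = TRUE), ]
print(health)
```

A combined table like this makes it easier to see whether an event type ranks high on both measures or only on one.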

Identifying events that have great economic consequences

Checking the Property Damage expenses

The damage amounts are stored across two columns: “PROPDMG” holds the value and “PROPDMGEXP” holds the exponent (a power of 10, given either as a digit or as a letter code such as K, M or B), so we need to standardise all the values.

We shall represent all values in millions.
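
As an alternative way to express this standardisation, the exponent codes can be mapped to multipliers with a named lookup vector. This is a sketch under the same assumptions as this report (digits are powers of 10, H/K/M/B are hundred/thousand/million/billion, anything else counts as 0); `mult` and `exp_to_multiplier` are hypothetical names.

```r
# Lookup table: digit codes are powers of ten, letters are H/K/M/B
mult <- c(setNames(10^(0:8), 0:8), H = 1e2, K = 1e3, M = 1e6, B = 1e9)

# Convert exponent codes to multipliers; unknown or blank codes become 0
exp_to_multiplier <- function(code) {
  m <- unname(mult[toupper(as.character(code))])
  m[is.na(m)] <- 0
  m
}

# Made-up amounts and codes, converted to millions
amount <- c(25, 1.5, 2)
code <- c("K", "m", "")
damage_millions <- amount * exp_to_multiplier(code) / 1e6
damage_millions
```

Keeping the mapping in one vector means the same function can be reused for both the property and the crop exponent columns.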

Standardising the values for Property Damage

storm$prop[storm$PROPDMGEXP == "0"]<- 1
storm$prop[storm$PROPDMGEXP == "1"]<- 10
storm$prop[storm$PROPDMGEXP == "2"|storm$PROPDMGEXP == "h"|storm$PROPDMGEXP == "H"]<- 100
storm$prop[storm$PROPDMGEXP == "3" | storm$PROPDMGEXP == "K"]<- 1000
storm$prop[storm$PROPDMGEXP == "4"]<- 10000
storm$prop[storm$PROPDMGEXP == "5"]<- 100000
storm$prop[storm$PROPDMGEXP == "6"|storm$PROPDMGEXP == "m"|storm$PROPDMGEXP == "M"]<- 1000000
storm$prop[storm$PROPDMGEXP == "7"]<- 10000000
storm$prop[storm$PROPDMGEXP == "8"]<- 100000000
storm$prop[storm$PROPDMGEXP == "B"]<- 1000000000
storm$prop[storm$PROPDMGEXP == ""|storm$PROPDMGEXP == "?"|storm$PROPDMGEXP == "+"|storm$PROPDMGEXP == "-"]<- 0

storm$prop <- as.numeric(storm$prop)
storm$prop <- storm$prop/1000000
# keep PROPDMG numeric so fractional amounts (e.g. 2.5) are not truncated
storm$PROPDMG <- as.numeric(storm$PROPDMG)

storm$prop_expense <- storm$prop*storm$PROPDMG

Standardising the values for Crop Damage

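# scale CROPDMG by its own exponent column, CROPDMGEXP
storm$crop[storm$CROPDMGEXP == "0"]<- 1
storm$crop[storm$CROPDMGEXP == "2"]<- 100
storm$crop[storm$CROPDMGEXP == "B"]<- 1000000000
storm$crop[storm$CROPDMGEXP == "k"|storm$CROPDMGEXP == "K"]<- 1000
storm$crop[storm$CROPDMGEXP == "m"|storm$CROPDMGEXP == "M"]<- 1000000
# treat blank or unknown exponents as zero, as done for property damage
storm$crop[storm$CROPDMGEXP == ""|storm$CROPDMGEXP == "?"]<- 0

storm$crop <- as.numeric(storm$crop)
storm$crop <- storm$crop/1000000
# keep CROPDMG numeric so fractional amounts are not truncated
storm$CROPDMG <- as.numeric(storm$CROPDMG)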

storm$crop_expense <- storm$crop*storm$CROPDMG

Calculate the total damage caused to both Property & Crop and sort

storm$total_exp <- storm$prop_expense + storm$crop_expense

storm_total_exp <-aggregate(total_exp ~ EVTYPE,sum,data = storm)

storm_total_exp <- storm_total_exp[order(storm_total_exp$total_exp,decreasing = TRUE),]
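
One thing worth checking before trusting the totals: the formula interface of `aggregate()` omits rows containing NA by default (`na.action = na.omit`), so any row whose multiplier was never assigned drops out of the sums. The sketch below shows the effect on made-up data; `toy`, `n_missing` and `by_event` are hypothetical names.

```r
# Made-up totals in millions, with one missing value
toy <- data.frame(
  EVTYPE = c("FLOOD", "FLOOD", "HAIL"),
  total_exp = c(1.5, NA, 0.2),
  stringsAsFactors = FALSE
)

# Count rows that would be dropped from the aggregation
n_missing <- sum(is.na(toy$total_exp))
n_missing

# The NA row is omitted, so FLOOD sums to 1.5 rather than NA
by_event <- aggregate(total_exp ~ EVTYPE, data = toy, FUN = sum)
print(by_event)
```

Running a count such as `sum(is.na(storm$total_exp))` before aggregating gives a quick sense of how many records are excluded.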

Let's look at the top 5 causes of property and crop damage:

head(storm_total_exp,5)
##                EVTYPE total_exp
## 170         HURRICANE 814035.95
## 178 HURRICANE/TYPHOON 796839.98
## 62              FLOOD 230614.68
## 331           TORNADO  82541.93
## 50        FLASH FLOOD  54734.72
qplot(EVTYPE,total_exp,data = storm_total_exp[1:5,],color = EVTYPE,xlab="Top Events causing economic loss",ylab = "Economic Loss in Millions",main = "Events causing economic loss to US 1950-2011")+theme(axis.text.x = element_text(angle = 60, hjust = 1))+geom_bar(stat = "identity")

In this table, hurricanes (“HURRICANE” and “HURRICANE/TYPHOON”) account for by far the largest economic damage to the country.

Results

From the above results we can summarize that

1. “Tornado” causes the largest number of both fatalities and injuries. It is followed by “Excessive Heat” for fatalities, and by “Thunderstorms”, “Flood” and “Excessive Heat” for injuries.

2. According to the table above, the top 5 events causing economic loss to the country are “Hurricane”, “Hurricane/Typhoon”, “Flood”, “Tornado” and “Flash Flood”, in descending order.