In this report we will explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Loading the necessary libraries:
library(reshape2)
library(ggplot2)
To explore the schema of the data in further detail, please refer to : We will focus on the following columns of this storm data set: EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG.
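# Read only the columns of interest; a colClasses entry of "NULL" skips that column.
weather_data <- read.table(bzfile("repdata-data-StormData.csv.bz2"), header=TRUE, sep=",",
                           colClasses=c(rep("NULL",7), "factor",      # column 8: EVTYPE
                                        rep("NULL",14),
                                        "numeric","numeric","numeric", # columns 23-25: FATALITIES, INJURIES, PROPDMG
                                        "NULL", "numeric",             # column 27: CROPDMG
                                        rep("NULL",10)))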
rows <- nrow(weather_data)
summary(weather_data)
##                EVTYPE         FATALITIES    INJURIES         PROPDMG
##  HAIL             :288661   Min.   :  0   Min.   :   0.0   Min.   :   0
##  TSTM WIND        :219940   1st Qu.:  0   1st Qu.:   0.0   1st Qu.:   0
##  THUNDERSTORM WIND: 82563   Median :  0   Median :   0.0   Median :   0
##  TORNADO          : 60652   Mean   :  0   Mean   :   0.2   Mean   :  12
##  FLASH FLOOD      : 54277   3rd Qu.:  0   3rd Qu.:   0.0   3rd Qu.:   0
##  FLOOD            : 25326   Max.   :583   Max.   :1700.0   Max.   :5000
##  (Other)          :170878
##     CROPDMG
##  Min.   :  0.0
##  1st Qu.:  0.0
##  Median :  0.0
##  Mean   :  1.5
##  3rd Qu.:  0.0
##  Max.   :990.0
##
The number of observations is 902297.
The data set is large, so let's subset it and keep only the rows that record at least one fatality or injury, or some property or crop damage.
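The selective read above relies on `colClasses`: an entry of `"NULL"` drops a column while the file is parsed, and any other entry sets the type of the column that is kept. The following is a minimal sketch of that behaviour on a tiny inline CSV. The column layout and values are made up for illustration and are much smaller than the real storm file.

```r
# Toy CSV standing in for the storm file (hypothetical columns and values).
csv_text <- "STATE,EVTYPE,REMARKS,FATALITIES,INJURIES
AL,TORNADO,note a,0,15
TX,HAIL,note b,0,0
FL,FLOOD,note c,1,2"

# "NULL" skips a column; "factor" and "numeric" set the type of kept columns.
toy <- read.table(text = csv_text, header = TRUE, sep = ",",
                  colClasses = c("NULL", "factor", "NULL", "numeric", "numeric"))

# Only EVTYPE, FATALITIES and INJURIES remain.
str(toy)
```

Skipping columns this way keeps memory use down, which matters for a file with several hundred thousand rows and long free-text fields.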
weather_data <- subset(weather_data, FATALITIES + INJURIES + PROPDMG + CROPDMG > 0)
rows <- nrow(weather_data)
summary(weather_data)
##                EVTYPE        FATALITIES       INJURIES         PROPDMG
##  TSTM WIND        :63234   Min.   :  0.0   Min.   :   0.0   Min.   :   0
##  THUNDERSTORM WIND:43655   1st Qu.:  0.0   1st Qu.:   0.0   1st Qu.:   2
##  TORNADO          :39944   Median :  0.0   Median :   0.0   Median :   5
##  HAIL             :26130   Mean   :  0.1   Mean   :   0.6   Mean   :  43
##  FLASH FLOOD      :20967   3rd Qu.:  0.0   3rd Qu.:   0.0   3rd Qu.:  25
##  LIGHTNING        :13293   Max.   :583.0   Max.   :1700.0   Max.   :5000
##  (Other)          :47410
##     CROPDMG
##  Min.   :  0.0
##  1st Qu.:  0.0
##  Median :  0.0
##  Mean   :  5.4
##  3rd Qu.:  0.0
##  Max.   :990.0
##
The number of observations after subsetting is 254633, about 28% of the original 902297.
Aggregate the injuries and fatalities by type of weather event.
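The filter keeps a row when the sum of its four impact columns is positive. A small sketch of the same idea on a hypothetical data frame follows. It assumes, as in the summaries above, that the columns are non-negative and contain no missing values, since `subset()` drops rows whose condition evaluates to `NA`.

```r
# Hypothetical rows: two have no recorded impact at all.
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(0, 3, 0, 0),
  PROPDMG    = c(0, 25, 0, 0),
  CROPDMG    = c(0, 0, 5, 0)
)

# Same condition as in the report: keep rows with any non-zero impact.
kept <- subset(toy, FATALITIES + INJURIES + PROPDMG + CROPDMG > 0)

# An equivalent form using rowSums() and logical indexing.
impact_cols <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
kept2 <- toy[rowSums(toy[, impact_cols]) > 0, ]

nrow(kept)
```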
health_impact<-with(weather_data, aggregate(list("FATALITIES"=FATALITIES,"INJURIES"=INJURIES),list("EVTYPE"=EVTYPE), sum))
# Sort by combined fatalities and injuries, largest first.
health_impact<-health_impact[order(-(health_impact$FATALITIES+health_impact$INJURIES)),]
head(health_impact)
##             EVTYPE FATALITIES INJURIES
## 407        TORNADO       5633    91346
## 61  EXCESSIVE HEAT       1903     6525
## 423      TSTM WIND        504     6957
## 86           FLOOD        470     6789
## 258      LIGHTNING        816     5230
## 151           HEAT        937     2100
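The aggregation step sums each measure within an event type and then sorts by the combined total. The sketch below repeats that pattern on a hypothetical five-row data frame, using the formula interface of `aggregate()` as an alternative to the list-based call in the report.

```r
# Hypothetical events; the counts are made up for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 4, 0),
  INJURIES   = c(10, 0, 5, 1, 3)
)

# Sum both measures within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by combined casualties, largest first.
totals <- totals[order(-(totals$FATALITIES + totals$INJURIES)), ]
print(totals)
```

Note that the grouping is by exact label, so near-duplicate labels such as TSTM WIND and THUNDERSTORM WIND in the summaries above are totalled separately.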
To answer the question of which events are most harmful to population health, let's examine the five events that correspond to the most combined injuries and fatalities.
health_impact<-health_impact[1:5,]
melted_health_impact<-melt(health_impact,id=c("EVTYPE"))
melted_health_impact<-melted_health_impact[order(-melted_health_impact$value),]
# Stacked horizontal bars; stat="identity" plots the values as given instead of counting rows.
ggplot(melted_health_impact, aes(x=EVTYPE, y=value, fill=variable))+geom_bar(stat="identity")+xlab('Weather Event')+ylab('Number of People Affected')+coord_flip()
As the graph shows, tornadoes account for by far the largest number of injuries, and also the largest number of fatalities, among these events.
Aggregate the crop and property damage by type of weather event.
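The plot depends on reshaping the table from one column per measure to one row per event and measure, which is what `melt()` from reshape2 does. As a sketch, the five rows below are copied from the `head(health_impact)` output above; the variable names `top5` and `long` are illustrative only.

```r
library(reshape2)

# Top five rows taken from the head(health_impact) output shown earlier.
top5 <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "TSTM WIND", "FLOOD", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 504, 470, 816),
  INJURIES   = c(91346, 6525, 6957, 6789, 5230)
)

# Wide to long: each (EVTYPE, measure) pair becomes one row with
# the measure name in `variable` and its count in `value`.
long <- melt(top5, id.vars = "EVTYPE")
long
```

The long layout lets a single `fill = variable` mapping colour fatalities and injuries separately within each bar.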
economic_impact<-with(weather_data, aggregate(list("PROPDMG"=PROPDMG,"CROPDMG"=CROPDMG),list("EVTYPE"=EVTYPE), sum))
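# Sort by combined property and crop damage, largest first.
economic_impact<-economic_impact[order(-(economic_impact$PROPDMG+economic_impact$CROPDMG)),]
To answer the question of which events are most damaging economically, let's examine the five events with the largest combined property and crop damage.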
economic_impact<-economic_impact[1:5,]
melted_economic_impact<-melt(economic_impact,id=c("EVTYPE"))
melted_economic_impact<-melted_economic_impact[order(-melted_economic_impact$value),]
# Stacked horizontal bars; stat="identity" plots the values as given instead of counting rows.
ggplot(melted_economic_impact, aes(x=EVTYPE, y=value, fill=variable))+geom_bar(stat="identity")+xlab('Weather Event')+ylab('Damages in $')+coord_flip()
Once again, tornadoes cause the most economic damage among the event types.
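The totals above sum PROPDMG and CROPDMG as stored. In the NOAA file each of these values is paired with an exponent column (PROPDMGEXP, CROPDMGEXP) that this report does not load. A hedged sketch of how such a magnitude code could be applied is shown below, assuming the common codes K, M, and B stand for thousands, millions, and billions, and treating any other code as a multiplier of 1. The rows and the `PROPDMG_USD` column name are hypothetical.

```r
# Hypothetical rows; PROPDMGEXP is not among the columns read in this report.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 1.5, 500),
  PROPDMGEXP = c("K", "M", ""),
  stringsAsFactors = FALSE
)

# Assumed multipliers for the common codes.
multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Look up each code; unknown or empty codes yield NA and fall back to 1.
mult <- unname(multiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

# Damage expressed in dollars under the assumptions above.
toy$PROPDMG_USD <- toy$PROPDMG * mult
toy
```

Applying a scaling like this before aggregation could change the ranking of events, so the comparison above is best read as a ranking of the unscaled recorded values.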