Synopsis

In this report we will explore the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Processing

Loading the necessary libraries

library(reshape2)
library(ggplot2)

To explore the schema of the data in further detail, please refer to the storm data documentation. We will focus on the following columns from the storm data set: EVTYPE, FATALITIES, INJURIES, PROPDMG, and CROPDMG.

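# Read only the columns of interest; a "NULL" class makes read.table skip that column.
weather_data <- read.table(
  bzfile("repdata-data-StormData.csv.bz2"), header = TRUE, sep = ",",
  colClasses = c(rep("NULL", 7), "factor",          # EVTYPE
                 rep("NULL", 14),
                 "numeric", "numeric", "numeric",   # FATALITIES, INJURIES, PROPDMG
                 "NULL", "numeric",                 # CROPDMG
                 rep("NULL", 10)))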
rows <- nrow(weather_data)
summary(weather_data)
##                EVTYPE         FATALITIES     INJURIES         PROPDMG    
##  HAIL             :288661   Min.   :  0   Min.   :   0.0   Min.   :   0  
##  TSTM WIND        :219940   1st Qu.:  0   1st Qu.:   0.0   1st Qu.:   0  
##  THUNDERSTORM WIND: 82563   Median :  0   Median :   0.0   Median :   0  
##  TORNADO          : 60652   Mean   :  0   Mean   :   0.2   Mean   :  12  
##  FLASH FLOOD      : 54277   3rd Qu.:  0   3rd Qu.:   0.0   3rd Qu.:   0  
##  FLOOD            : 25326   Max.   :583   Max.   :1700.0   Max.   :5000  
##  (Other)          :170878                                                
##     CROPDMG     
##  Min.   :  0.0  
##  1st Qu.:  0.0  
##  Median :  0.0  
##  Mean   :  1.5  
##  3rd Qu.:  0.0  
##  Max.   :990.0  
## 
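The `colClasses` vector in the loading step relies on the fact that `read.table` skips any column whose class is given as `"NULL"`. A minimal sketch of that behaviour on a made-up three-column CSV (the `ID` column and all values are assumptions for illustration, not taken from the storm data):

```r
# Toy CSV with three columns; names and values are made up for illustration.
csv_text <- "ID,EVTYPE,INJURIES
1,HAIL,0
2,TORNADO,3
3,HAIL,1"

# "NULL" drops the first column; the other two are read as factor and numeric.
demo <- read.table(text = csv_text, header = TRUE, sep = ",",
                   colClasses = c("NULL", "factor", "numeric"))

str(demo)  # only EVTYPE and INJURIES remain
```

Skipping columns at read time keeps memory use down, which matters for a file with several hundred thousand rows.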

The number of observations is 902297.
The data set is large, so let's subset it and keep only the rows that report at least one fatality, injury, or damage value.

weather_data <- subset(weather_data, FATALITIES + INJURIES + PROPDMG + CROPDMG > 0)
rows <- nrow(weather_data)
summary(weather_data)
##                EVTYPE        FATALITIES       INJURIES         PROPDMG    
##  TSTM WIND        :63234   Min.   :  0.0   Min.   :   0.0   Min.   :   0  
##  THUNDERSTORM WIND:43655   1st Qu.:  0.0   1st Qu.:   0.0   1st Qu.:   2  
##  TORNADO          :39944   Median :  0.0   Median :   0.0   Median :   5  
##  HAIL             :26130   Mean   :  0.1   Mean   :   0.6   Mean   :  43  
##  FLASH FLOOD      :20967   3rd Qu.:  0.0   3rd Qu.:   0.0   3rd Qu.:  25  
##  LIGHTNING        :13293   Max.   :583.0   Max.   :1700.0   Max.   :5000  
##  (Other)          :47410                                                  
##     CROPDMG     
##  Min.   :  0.0  
##  1st Qu.:  0.0  
##  Median :  0.0  
##  Mean   :  5.4  
##  3rd Qu.:  0.0  
##  Max.   :990.0  
## 

The number of observations after subsetting is 254633.
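The filter works because all four measures are non-negative, so their sum is zero only when every one of them is zero. A small sketch on a made-up data frame (the rows are assumptions chosen for illustration):

```r
# Toy data: two rows report nothing, two rows report some impact.
toy <- data.frame(
  EVTYPE     = c("HAIL", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(0, 1, 0, 0),
  INJURIES   = c(0, 4, 0, 0),
  PROPDMG    = c(0, 25, 5, 0),
  CROPDMG    = c(0, 0, 0, 0)
)

# Keep rows where at least one measure is positive.
kept <- subset(toy, FATALITIES + INJURIES + PROPDMG + CROPDMG > 0)
nrow(kept)
```

Only the TORNADO and FLOOD rows survive; both all-zero HAIL rows are dropped.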

Results

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Aggregate the injuries and fatalities by type of weather event, then sort by their combined total.

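# Sum fatalities and injuries per event type.
health_impact <- with(weather_data, aggregate(list("FATALITIES" = FATALITIES, "INJURIES" = INJURIES), list("EVTYPE" = EVTYPE), sum))
# Sort by combined total, largest first (numeric vectors, not one-column data frames, go to order()).
health_impact <- health_impact[order(-(health_impact$FATALITIES + health_impact$INJURIES)), ]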
head(health_impact)
##             EVTYPE FATALITIES INJURIES
## 407        TORNADO       5633    91346
## 61  EXCESSIVE HEAT       1903     6525
## 423      TSTM WIND        504     6957
## 86           FLOOD        470     6789
## 258      LIGHTNING        816     5230
## 151           HEAT        937     2100
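The same aggregate-then-sort step can also be written with the formula interface of `aggregate`. This is a sketch on made-up numbers, not the report's own code path:

```r
# Toy data; the counts are invented for illustration.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 0, 3, 1),
  INJURIES   = c(10, 0, 5, 2, 0)
)

# Sum both measures within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Sort by combined casualties, largest first.
totals <- totals[order(-(totals$FATALITIES + totals$INJURIES)), ]
totals
```

Ordering on the sum of two numeric columns ranks events by total people affected, which is the criterion used for the table above.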

To answer the question, let's examine the five event types that correspond to the most injuries and fatalities.

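# Keep the five most harmful event types and reshape to long format for plotting.
health_impact <- health_impact[1:5, ]
melted_health_impact <- melt(health_impact, id = c("EVTYPE"))
melted_health_impact <- melted_health_impact[order(-melted_health_impact$value), ]
# Stacked horizontal bars: one bar per event type, split into fatalities and injuries.
ggplot(melted_health_impact, aes(x = EVTYPE, y = value, fill = variable)) + geom_col() + labs(x = 'Weather Event', y = 'Number of People Affected') + coord_flip()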

[Figure: fatalities and injuries for the five most harmful event types]

As the graph shows, tornadoes account for by far the largest number of injuries (91,346), and they also cause the most fatalities (5,633).
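The reshaping step before the plot turns one row per event type into one row per event type and measure, which is the layout a stacked bar chart needs. A base-R sketch of that long format, using two rows from the table above (the construction is an illustration of the result, not how `melt` is implemented):

```r
# Two rows taken from the health impact table.
wide <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT"),
  FATALITIES = c(5633, 1903),
  INJURIES   = c(91346, 6525)
)

measures <- c("FATALITIES", "INJURIES")

# One row per (event type, measure) pair.
long <- data.frame(
  EVTYPE   = rep(wide$EVTYPE, times = length(measures)),
  variable = rep(measures, each = nrow(wide)),
  value    = unlist(wide[measures], use.names = FALSE)
)
long
```

Each event type now appears twice, once per measure, so `variable` can drive the fill colour.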

Across the United States, which types of events have the greatest economic consequences?

Aggregate the crop and property damage by type of weather event, then sort by their combined total.

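# Sum property and crop damage per event type.
economic_impact <- with(weather_data, aggregate(list("PROPDMG" = PROPDMG, "CROPDMG" = CROPDMG), list("EVTYPE" = EVTYPE), sum))
# Sort by combined damage, largest first.
economic_impact <- economic_impact[order(-(economic_impact$PROPDMG + economic_impact$CROPDMG)), ]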
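The loading step keeps PROPDMG and CROPDMG but skips the columns next to them. In the NOAA storm data these magnitudes are normally paired with exponent codes (PROPDMGEXP and CROPDMGEXP, e.g. "K" for thousands). If dollar amounts are wanted, a conversion along the following lines could be applied before aggregating. This is a sketch under assumptions: `scale_damage` is a hypothetical helper, only the codes H, K, M and B are handled, and any other code is treated as "no scaling".

```r
# Hypothetical helper: convert a damage magnitude plus exponent code to dollars.
scale_damage <- function(value, exp_code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(as.character(exp_code))]
  m[is.na(m)] <- 1  # assumption: blank or unrecognised codes mean no scaling
  value * unname(m)
}

# Toy rows; the values are invented for illustration.
toy <- data.frame(PROPDMG = c(25, 2.5, 10), PROPDMGEXP = c("K", "M", ""))
toy$PROPDMG_USD <- scale_damage(toy$PROPDMG, toy$PROPDMGEXP)
toy
```

Using this on the real data would require loading the two exponent columns as well, and the ranking of event types could differ from the one obtained from the unscaled magnitudes.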

To answer the question, let's examine the five event types that correspond to the most damage.

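# Keep the five most damaging event types and reshape to long format for plotting.
economic_impact <- economic_impact[1:5, ]
melted_economic_impact <- melt(economic_impact, id = c("EVTYPE"))
melted_economic_impact <- melted_economic_impact[order(-melted_economic_impact$value), ]
# The values are sums of the PROPDMG and CROPDMG magnitudes as loaded (no exponent applied).
ggplot(melted_economic_impact, aes(x = EVTYPE, y = value, fill = variable)) + geom_col() + labs(x = 'Weather Event', y = 'Reported Damage (sum of PROPDMG and CROPDMG)') + coord_flip()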

[Figure: property and crop damage for the five most damaging event types]

Once again, tornadoes cause the greatest economic damage among all event types, as measured by the summed PROPDMG and CROPDMG values.