Created by James Lim - June 2015
Explore the NOAA Storm Database and answer some basic questions about severe weather events.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The data for this assignment come in the form of a comma-separated-value (CSV) file compressed with the bzip2 algorithm to reduce its size. You can download the 47 MB file from the web site: Storm Data
Some documentation of the database is also available. It describes how some of the variables are constructed and defined.
National Weather Service Storm Data Documentation
National Climatic Data Center Storm Events FAQ
The compressed CSV file is downloaded to the local hard disk, and the user must decompress it before running the code below. Reading the CSV takes a few minutes.
setwd("e:\\module5")
stormdata <- read.csv("repdata_data_StormData.csv", sep = ",")
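As an alternative to decompressing by hand, R can read a bzip2 archive through a connection. The sketch below is illustrative only: it writes a tiny made-up CSV to a temporary `.bz2` file and reads it back, so the file name and the sample rows are assumptions, not the real storm data.

```r
# Write a tiny stand-in CSV through a bzip2 connection (hypothetical sample rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,FATALITIES", "TORNADO,3", "FLOOD,1"), con)
close(con)

# read.csv accepts a connection, so the archive is decompressed while reading
sample_storm <- read.csv(bzfile(tmp), stringsAsFactors = FALSE)
print(sample_storm)
```

The same pattern should apply to the downloaded archive by passing its path to `bzfile()` instead of `tmp`.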
Subset data for fatalities, injuries, and property damage
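# Keep only the event type and the harm/damage columns
newDataName <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
dataDanger <- subset(stormdata, select = newDataName)
# Upper-case the event names so that case variants aggregate together
dataDanger$EVTYPE <- toupper(dataDanger$EVTYPE)
# Drop summary rows; match on the upper-cased names so "Summary" and "SUMMARY" are both caught
dataDanger <- dataDanger[!grepl("SUMMARY", dataDanger$EVTYPE), ]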
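The effect of this cleaning step is easier to see on a small example. The labels below are made up to imitate the inconsistencies found in `EVTYPE`, and trimming whitespace with `trimws()` is an extra step that the analysis above does not perform.

```r
# Hypothetical event labels with mixed case, stray whitespace, and summary rows
ev <- c("Tornado", "TORNADO", " flood", "Summary of May 22", "SUMMARY OF JUNE 3")

# Upper-case and trim so that spelling variants collapse to one label
ev_clean <- trimws(toupper(ev))

# Drop summary rows; matching on the cleaned names catches both spellings
ev_clean <- ev_clean[!grepl("^SUMMARY", ev_clean)]

print(table(ev_clean))
```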
Aggregate Data
damageType <- c("FATALITIES", "INJURIES", "PROPDMG", "CROPDMG")
# Total each measure per event type; aggregate() names the grouping column Group.1
damages <- aggregate(dataDanger[damageType], list(dataDanger$EVTYPE), sum)
# Delete rows with zero in all columns
damages <- damages[rowSums(damages[, -1] > 0) != 0, ]
# Sum damage on property and crop
damages$finantial <- damages$PROPDMG + damages$CROPDMG
damages$PROPDMG <- NULL
damages$CROPDMG <- NULL
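The sum above adds the raw `PROPDMG` and `CROPDMG` figures. In the NOAA data these figures are accompanied by magnitude codes in `PROPDMGEXP` and `CROPDMGEXP` ("K", "M", "B"), which the subset above does not keep. The sketch below shows one way the codes could be applied on toy rows. The helper `exp_to_multiplier` is hypothetical, and treating every other code as a multiplier of 1 is an assumption.

```r
# Toy rows; the *EXP columns hold the magnitude codes
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(2.5, 10, 1),
  PROPDMGEXP = c("M", "K", "B"),
  CROPDMG    = c(0, 5, 0),
  CROPDMGEXP = c("", "K", ""),
  stringsAsFactors = FALSE
)

# Hypothetical helper: map K/M/B to multipliers, anything else to 1 (assumption)
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])
  mult[is.na(mult)] <- 1
  mult
}

# Scale each figure to dollars before summing per event type
toy$dollars <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
dollar_damage <- aggregate(dollars ~ EVTYPE, data = toy, FUN = sum)
print(dollar_damage)
```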
myData <- data.frame(damages$Group.1, damages$FATALITIES)
names(myData) <- c("x", "y")
# Order descending
myData <- myData[order(-myData$y), ]
# Delete event types with zero fatalities
myData <- myData[myData$y > 0, ]
# Keep top 10 events
myData <- head(myData, 10)
barplot(myData$y, las = 3, names.arg = myData$x, main = "Top 10 Highest Fatalities", ylab = "Number of Fatalities", col = "blue")
myData <- data.frame(damages$Group.1, damages$INJURIES)
names(myData) <- c("x", "y")
# Order descending
myData <- myData[order(-myData$y), ]
# Delete event types with zero injuries
myData <- myData[myData$y > 0, ]
# Keep top 10 events
myData <- head(myData, 10)
barplot(myData$y, las = 3, names.arg = myData$x, main = "Top 10 Highest Injuries", ylab = "Number of Injuries", col = "green")
Top 10 weather events classified by economic damage to crops and properties
myData <- data.frame(damages$Group.1, damages$finantial)
names(myData) <- c("x", "y")
# Order descending
myData <- myData[order(-myData$y), ]
# Delete event types with zero damage
myData <- myData[myData$y > 0, ]
# Keep top 10 events
myData <- head(myData, 10)
# The values are raw PROPDMG + CROPDMG sums, so the axis is not labeled in dollars
barplot(myData$y, las = 3, names.arg = myData$x, main = "Top 10 Highest Economic Damages", ylab = "Economic damage (PROPDMG + CROPDMG, unscaled)", col = "red")
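The three plots repeat the same steps: select one measure, drop zeros, sort descending, and keep the top rows. A minimal sketch of how these steps could be factored into one function is shown below. The name `top_events` and the toy table are hypothetical; only the `Group.1` column name comes from the `aggregate()` output used above.

```r
# Hypothetical helper: return the n event types with the largest positive totals
top_events <- function(totals, measure, n = 10) {
  out <- data.frame(x = totals$Group.1, y = totals[[measure]],
                    stringsAsFactors = FALSE)
  out <- out[out$y > 0, ]        # drop event types with a zero total
  out <- out[order(-out$y), ]    # order descending
  head(out, n)                   # keep the top n rows
}

# Toy stand-in for the aggregated `damages` table
toy_damages <- data.frame(
  Group.1    = c("TORNADO", "FLOOD", "HAIL", "FOG"),
  FATALITIES = c(50, 20, 0, 5),
  stringsAsFactors = FALSE
)

top_fatal <- top_events(toy_damages, "FATALITIES", n = 3)
barplot(top_fatal$y, las = 3, names.arg = top_fatal$x,
        main = "Top 3 Highest Fatalities (toy data)",
        ylab = "Number of Fatalities", col = "blue")
```

With the real table, a call such as `top_events(damages, "INJURIES")` would produce the data for the injuries plot.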