This report attempts to identify severe weather events in the U.S. which have had the greatest impact on human health and economic conditions. For the purpose of this analysis the Storm Data was used. This data is published bu the the National Oceanic and Atmospheric Administration (NOAA) which documents the occurrence of storms and other significant weather phenomena having sufficientintensity to cause loss of life, injuries, significant property damage, and/or disruption to commerce.Economic impact is analysed in terms of the damage caused to property and crops.The results of the analysis show that floods, droughts, hurricane and tornado have been most harmful to the economic conditions. Effect on health is measured by the number of fatalities and injuries casued by the weather events. Analysis of these varaibles show that tornado, thunderstorm wind, floods and excessive heat were top most to have adversely effected human health.
Loading Data
#####Working directory
setwd("C:\\Users\\Sara\\Desktop\\Sara\\5_Reproducible Research\\PA_2")
#####Loading data
rawdata<-read.csv(bzfile("repdata-data-StormData.csv.bz2"), header=TRUE, stringsAsFactors=FALSE)
names(rawdata)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Selecting Variables
The dataset has a totao of 37 variables. For this analysis we require only seven of these variables. These are:
EVTYPE
FATALITIES
INJURIES
PROPDMG
PROPDMGEXP
CROPDMG
CROPDMGEXP
The first step of data processing is to select the variables of our interest.
variables<-c("EVTYPE","FATALITIES","INJURIES","PROPDMG", "PROPDMGEXP","CROPDMG","CROPDMGEXP")
data<-rawdata[variables]
Selecting Health Cases
The next step is selecting 10 events that cause the most fatalities and injuries.
fatal <- aggregate(FATALITIES ~ EVTYPE, data = data, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, data = data, FUN = sum)
fatal10 <- fatal[order(-fatal$FATALITIES),][1:10, ]
injury10 <- injury[order(-injury$INJURIES),][1:10, ]
Preparing crop and production data
In the dataset there are two variables that capture the cost of property damage. One variable records the estimate rounded to three significant digits, whereas the other variable has an alphabetical character which signifies that magnitute for example, include “K” for thousands, “M” for millions, and “B” for billions. Thus, as part of data processing these alphabetical characters were converted to their numeric value. A new variable was generated which was a product of these two variables.
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
data$PROPEXP[data$PROPDMGEXP == "K" ] <- 1000
data$PROPEXP[data$PROPDMGEXP == "M" ] <- 10^6
data$PROPEXP[data$PROPDMGEXP == "" ] <- 1
data$PROPEXP[data$PROPDMGEXP == "B" ] <- 10^9
data$PROPEXP[data$PROPDMGEXP == "m" ] <- 10^6
data$PROPEXP[data$PROPDMGEXP == "+" ] <- 0
data$PROPEXP[data$PROPDMGEXP == "0" ] <- 1
data$PROPEXP[data$PROPDMGEXP == "5" ] <- 10^5
data$PROPEXP[data$PROPDMGEXP == "6" ] <- 10^6
data$PROPEXP[data$PROPDMGEXP == "?" ] <- 0
data$PROPEXP[data$PROPDMGEXP == "4" ] <- 10000
data$PROPEXP[data$PROPDMGEXP == "2" ] <- 100
data$PROPEXP[data$PROPDMGEXP == "3" ] <- 1000
data$PROPEXP[data$PROPDMGEXP == "h" ] <- 100
data$PROPEXP[data$PROPDMGEXP == "7" ] <- 10^7
data$PROPEXP[data$PROPDMGEXP == "H" ] <- 100
data$PROPEXP[data$PROPDMGEXP == "-" ] <- 0
data$PROPEXP[data$PROPDMGEXP == "1" ] <- 10
data$PROPEXP[data$PROPDMGEXP == "8" ] <- 10^8
data$PROPDMGVAL <- data$PROPDMG * data$PROPEXP
Similar variables exist for crop damage. Thus, the same exercise was carried out to achieve a variable which has the numeric value of estimate of crop damage.
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
data$CROPEXP[data$CROPDMGEXP == "" ] <- 1
data$CROPEXP[data$CROPDMGEXP == "M" ] <- 10^6
data$CROPEXP[data$CROPDMGEXP == "K" ] <- 1000
data$CROPEXP[data$CROPDMGEXP == "m" ] <- 10^9
data$CROPEXP[data$CROPDMGEXP == "B" ] <- 10^6
data$CROPEXP[data$CROPDMGEXP == "?" ] <- 0
data$CROPEXP[data$CROPDMGEXP == "0" ] <- 1
data$CROPEXP[data$CROPDMGEXP == "k" ] <- 1000
data$CROPEXP[data$CROPDMGEXP == "2" ] <- 100
data$CROPDMGVAL <- data$CROPDMG * data$CROPEXP
Selecting Crop Damage and Property Damage Cases
The next step is selecting 10 events that cause the most crop and property damage.
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, data = data, FUN = sum)
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, data = data, FUN = sum)
propdmg10<-propdmg[order(-propdmg$PROPDMGVAL), ][1:10,]
cropdmg10<-cropdmg[order(-cropdmg$CROPDMGVAL), ][1:10,]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8)
barplot(fatal10$FATALITIES, names.arg=fatal10$EVTYPE, ylim= c(0,7000),col=heat.colors(10),ylab="Number of Fatalities", main=" Top 10 Events with Highest Fatalities")
barplot(injury10$INJURIES, names.arg=injury10$EVTYPE,ylim= c(0,10000), col=terrain.colors(10), ylab="Number of Injuries", main=" Top 10 Events with Highest Injuries")
The plot above shows that tornado has caused the highest number of fatalities and injuries.
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), las=3,cex = 0.8, cex.main = 0.9)
barplot((propdmg10$PROPDMGVAL)/(1*10^9), names.arg=propdmg10$EVTYPE, col=heat.colors(10, alpha = 1), ylab=" Cost of Property Damage($ billions)", main="Top 10 Events Causing Highest Property Damage")
barplot((cropdmg10$CROPDMGVAL)/(1*10^9), names.arg=cropdmg10$EVTYPE, col=terrain.colors(10, alpha = 1), ylab=" Cost of Crop Damage($ billions)", main="Top 10 Events Causing Highest Crop Damage")
The plot above shows that flood has caused the most property damage, whereas droughts have been the most harmful to crops.