The data can be downloaded from the link https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
The National Oceanic and Atmospheric Administration (NOAA) is an American scientific agency within the United States Department of Commerce that focuses on the conditions of the oceans, major waterways, and the atmosphere. NOAA warns of dangerous weather, charts seas, guides the use and protection of ocean and coastal resources, and conducts research to provide understanding and improve stewardship of the environment. NOAA was officially formed in 1970 and in 2017 had over 11,000 civilian employees. Its research and operations are further supported by 321 uniformed service members who make up the NOAA Commissioned Corps. Since October 2017, NOAA has been headed by Timothy Gallaudet, as acting Under Secretary of Commerce for Oceans and Atmosphere and NOAA interim administrator.
Some information appearing in Storm Data may be provided by or gathered from sources outside the National Weather Service (NWS), such as the media, law enforcement and/or other government agencies, private companies, individuals, etc. An effort is made to use the best available information, but because of time and resource constraints, information from these sources may be unverified by the NWS. Accordingly, the NWS does not guarantee the accuracy or validity of the information. Further, when information appearing in Storm Data originated from a source outside the NWS (frequently credit is provided), Storm Data users requiring additional information should contact that source directly.

## 1.1 Downloading the data
#data<-download.file(url="https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",destfile = "data")
library(plyr)
library(ggplot2)
library(gridExtra)
library(grid)
stormDataRed<-read.csv("repdata%2Fdata%2FStormData.csv")
dim(stormDataRed)
## [1] 902297 37
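The download call above is commented out, so the report assumes the file is already on disk. A minimal sketch of a more reproducible variant follows. The helper name `fetch_if_missing` and the local file name `StormData.csv.bz2` are illustrative assumptions, not part of the original analysis. `read.csv` can read a bzip2-compressed file directly, so manual decompression should not be needed.

```r
# Hypothetical helper: download a file only when it is not already on disk.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # mode = "wb" keeps the compressed archive intact on Windows
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

storm_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Usage (commented out because it needs network access and takes a while):
# stormDataRed <- read.csv(fetch_if_missing(storm_url, "StormData.csv.bz2"))
```

Skipping the download when the file exists keeps repeated knits of the report fast.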
The analysis focuses only on the health and economic consequences of severe weather events, so we subset the columns to EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
stormData <- stormDataRed[,c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")]
Fatalities and injuries are summarized by event type.
harm2health <- ddply(stormDataRed, .(EVTYPE), summarize,fatalities = sum(FATALITIES),injuries = sum(INJURIES))
fatal <- harm2health[order(harm2health$fatalities, decreasing = T), ]
injury <- harm2health[order(harm2health$injuries, decreasing = T), ]
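To make the grouping step concrete, here is a small sketch on made-up data using base R's `aggregate`, which for this purpose should give the same per-event totals as the `ddply` call above. The `toy` data frame and its values are invented for illustration only.

```r
# Invented toy data; not taken from the storm database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(10, 0, 5, 4)
)

# Sum fatalities and injuries within each event type.
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)

# Rank event types by total fatalities, largest first.
totals[order(totals$FATALITIES, decreasing = TRUE), ]
```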
Because the exponents are stored in a separate column that encodes their value with letters (h = hundred, k = thousand, m = million, b = billion), calculating the financial damage is slightly tricky. As a first step, a function is implemented that converts the letter code of the exponent to a usable number.
getExp <- function(e) {
  if (e %in% c("h", "H"))
    return(2)
  else if (e %in% c("k", "K"))
    return(3)
  else if (e %in% c("m", "M"))
    return(6)
  else if (e %in% c("b", "B"))
    return(9)
  else if (!is.na(as.numeric(e)))
    return(as.numeric(e))
  else if (e %in% c("", "-", "?", "+"))
    return(0)
  else {
    stop("Invalid value.")
  }
}
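As an alternative sketch, the same mapping can be written as a vectorised lookup, which avoids calling the function once per row. The names `exp_lookup` and `getExpVec` are hypothetical. The sketch converts its input to character first, because `as.numeric()` applied to a factor returns level codes rather than the digits. For that reason it may not reproduce the figures in this report exactly if the exponent columns were read in as factors.

```r
# Hypothetical vectorised alternative to getExp(); assumes the same letter codes.
exp_lookup <- c(h = 2, k = 3, m = 6, b = 9)

getExpVec <- function(e) {
  # Convert first: as.numeric() on a factor returns level codes, not the digits.
  e <- tolower(as.character(e))
  out <- unname(exp_lookup[e])              # letters -> exponent, NA otherwise
  is_digit <- grepl("^[0-9]+$", e)
  out[is_digit] <- as.numeric(e[is_digit])  # numeric codes are used as-is
  out[e %in% c("", "-", "?", "+")] <- 0     # blank or symbol -> exponent 0
  if (anyNA(out)) stop("Invalid value.")
  out
}

# Worked example: a damage value of 2.5 with exponent "K" is 2.5 * 10^3.
2.5 * 10^getExpVec("K")
```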
Values for property damage and crop damage are then calculated.
propExp <- sapply(stormDataRed$PROPDMGEXP, FUN=getExp)
stormDataRed$propDamage <- stormDataRed$PROPDMG * (10 ** propExp)
cropExp <- sapply(stormDataRed$CROPDMGEXP, FUN=getExp)
stormDataRed$cropDamage <- stormDataRed$CROPDMG * (10 ** cropExp)
Financial damage to crops and property is then summarized by event type.
econDamage <- ddply(stormDataRed, .(EVTYPE), summarize,propDamage = sum(propDamage), cropDamage = sum(cropDamage))
Events that caused no financial damage are removed.
econDamage <- econDamage[(econDamage$propDamage > 0 | econDamage$cropDamage > 0), ]
The data is then sorted by property damage and by crop damage.
propDmgSorted <- econDamage[order(econDamage$propDamage, decreasing = T), ]
cropDmgSorted <- econDamage[order(econDamage$cropDamage, decreasing = T), ]
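The filter-then-sort pattern used above can be checked on a small example. The `toyDamage` data frame below is invented for illustration and does not come from the storm database.

```r
# Invented toy totals; not taken from the storm database.
toyDamage <- data.frame(
  EVTYPE     = c("HAIL", "FOG", "FLOOD"),
  propDamage = c(2e6, 0, 5e8),
  cropDamage = c(3e6, 0, 1e6)
)

# Keep rows with damage in at least one category, then rank by property damage.
toyDamage <- toyDamage[toyDamage$propDamage > 0 | toyDamage$cropDamage > 0, ]
toySorted <- toyDamage[order(toyDamage$propDamage, decreasing = TRUE), ]
toySorted
```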
The top 5 weather events affecting the population's health (injuries and deaths) are shown.
head(injury[, c("EVTYPE", "injuries")],5)
## EVTYPE injuries
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
head(fatal[, c("EVTYPE", "fatalities")],5)
## EVTYPE fatalities
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
Lists of the top 5 weather events causing financial damage to property and to crops are shown below.
head(propDmgSorted[, c("EVTYPE", "propDamage")], 5)
## EVTYPE propDamage
## 153 FLASH FLOOD 6.820237e+13
## 786 THUNDERSTORM WINDS 2.086532e+13
## 834 TORNADO 1.078951e+12
## 244 HAIL 3.157558e+11
## 464 LIGHTNING 1.729433e+11
head(cropDmgSorted[, c("EVTYPE", "cropDamage")], 5)
## EVTYPE cropDamage
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025974480
p1 <- ggplot(data=head(injury,10), aes(x=reorder(EVTYPE, injuries), y=injuries)) +
  geom_bar(fill="#999999",stat="identity") + coord_flip() +
  ylab("Total number of injuries") + xlab("Event type") +
  ggtitle("Health impact of weather events in the US - Top 10") +
  theme(legend.position="none")
p2 <- ggplot(data=head(fatal,10), aes(x=reorder(EVTYPE, fatalities), y=fatalities)) +
  geom_bar(fill="#E69F00",stat="identity") + coord_flip() +
  ylab("Total number of fatalities") + xlab("Event type") +
  theme(legend.position="none")
grid.arrange(p1, p2, nrow =2)
p1 <- ggplot(data=head(propDmgSorted,10), aes(x=reorder(EVTYPE, propDamage), y=log10(propDamage), fill=propDamage )) +
  geom_bar(fill="#999999", stat="identity",col="blue") + coord_flip() +
  xlab("Event type") + ylab("Property damage in dollars (log10)") +
  ggtitle("Economic impact of weather events in the US - Top 10") +
  theme(plot.title = element_text(hjust = 0))
p2 <- ggplot(data=head(cropDmgSorted,10), aes(x=reorder(EVTYPE, cropDamage), y=cropDamage, fill=cropDamage)) +
  geom_bar(fill="#E69F00", stat="identity",col="blue") + coord_flip() +
  xlab("Event type") + ylab("Crop damage in dollars") +
  theme(legend.position="none")
grid.arrange(p1, p2, ncol=1, nrow =2)