The U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database was explored to determine which types of events are most harmful to population health and to the economy of the United States. The data were downloaded and only the columns of interest were kept. Some processing was needed to evaluate the economic losses, because each damage value is stored together with a separate exponent column that gives its magnitude.
The 10 most harmful events in terms of fatalities and injuries are then presented, showing that tornadoes are by far the most harmful.
For the economy, the 10 most harmful events are also presented: floods caused the greatest property damage, whereas droughts were responsible for the greatest crop losses.
First, the file is downloaded and loaded into the variable “stormdata”.
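# Download the compressed file only if it is not already in the working directory
if (!file.exists("stormdata.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "stormdata.csv.bz2", mode = "wb")
}
# read.csv decompresses .bz2 files transparently
stormdata <- read.csv("stormdata.csv.bz2", header = TRUE, sep = ",")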
Next, the first rows are inspected to see how the data set is structured.
head(stormdata)
## STATE__ BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1 1 4/18/1950 0:00:00 0130 CST 97 MOBILE AL
## 2 1 4/18/1950 0:00:00 0145 CST 3 BALDWIN AL
## 3 1 2/20/1951 0:00:00 1600 CST 57 FAYETTE AL
## 4 1 6/8/1951 0:00:00 0900 CST 89 MADISON AL
## 5 1 11/15/1951 0:00:00 1500 CST 43 CULLMAN AL
## 6 1 11/15/1951 0:00:00 2000 CST 77 LAUDERDALE AL
## EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO 0 0
## 2 TORNADO 0 0
## 3 TORNADO 0 0
## 4 TORNADO 0 0
## 5 TORNADO 0 0
## 6 TORNADO 0 0
## COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1 NA 0 14.0 100 3 0 0
## 2 NA 0 2.0 150 2 0 0
## 3 NA 0 0.1 123 2 0 0
## 4 NA 0 0.0 100 2 0 0
## 5 NA 0 0.0 150 2 0 0
## 6 NA 0 1.5 177 2 0 0
## INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1 15 25.0 K 0
## 2 0 2.5 K 0
## 3 2 25.0 K 0
## 4 2 2.5 K 0
## 5 2 2.5 K 0
## 6 6 2.5 K 0
## LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1 3040 8812 3051 8806 1
## 2 3042 8755 0 0 2
## 3 3340 8742 0 0 3
## 4 3458 8626 0 0 4
## 5 3412 8642 0 0 5
## 6 3450 8748 0 0 6
To answer the questions of interest, the columns EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP will be used.
event <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG",
"PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
data <- stormdata[event]
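As an aside, the same column subset could be applied while reading the file rather than afterwards, which avoids holding all columns in memory. The sketch below is only an illustration: `read_columns` is a hypothetical helper that is not part of the original analysis, and it is demonstrated on a small temporary file rather than on the storm data.

```r
# Hypothetical helper (not part of the original analysis): read only the
# columns named in `keep` from a CSV file.
read_columns <- function(path, keep) {
  # Read the header plus one row just to learn the column names
  header <- names(read.csv(path, nrows = 1, check.names = FALSE))
  # NA lets read.csv guess the type; "NULL" tells it to skip the column
  classes <- ifelse(header %in% keep, NA, "NULL")
  read.csv(path, colClasses = classes)
}

# Toy file standing in for the storm data
toy_path <- tempfile(fileext = ".csv")
write.csv(data.frame(EVTYPE = c("TORNADO", "FLOOD"),
                     REMARKS = c("x", "y"),
                     FATALITIES = c(3, 1)),
          toy_path, row.names = FALSE)
toy <- read_columns(toy_path, c("EVTYPE", "FATALITIES"))
names(toy)
```

With the real file, the call would presumably be `read_columns("stormdata.csv.bz2", event)`; the analysis here keeps the simpler approach of reading everything and subsetting.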
To calculate the total property and crop damages by events, there is a need to deal with the exponent columns, which indicate the magnitude of each damage value.
unique(data$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
The digits give the power of 10 by which each value is multiplied, and the letters stand for hundreds (h, H), thousands (K), millions (m, M) or billions (B). A blank or 0 means units. The values “-”, “?” and “+” are invalid and are assigned a multiplier of zero, so the corresponding damage values do not contribute to the totals.
data$PROPEXP[data$PROPDMGEXP == ""] <- 1e+00
data$PROPEXP[data$PROPDMGEXP == "0"] <- 1
data$PROPEXP[data$PROPDMGEXP == "1"] <- 1e+01
data$PROPEXP[data$PROPDMGEXP == "2"] <- 1e+02
data$PROPEXP[data$PROPDMGEXP == "h"] <- 1e+02
data$PROPEXP[data$PROPDMGEXP == "H"] <- 1e+02
data$PROPEXP[data$PROPDMGEXP == "3"] <- 1e+03
data$PROPEXP[data$PROPDMGEXP == "K"] <- 1e+03
data$PROPEXP[data$PROPDMGEXP == "4"] <- 1e+04
data$PROPEXP[data$PROPDMGEXP == "5"] <- 1e+05
data$PROPEXP[data$PROPDMGEXP == "6"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "m"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "M"] <- 1e+06
data$PROPEXP[data$PROPDMGEXP == "7"] <- 1e+07
data$PROPEXP[data$PROPDMGEXP == "8"] <- 1e+08
data$PROPEXP[data$PROPDMGEXP == "B"] <- 1e+09
data$PROPEXP[data$PROPDMGEXP == "+"] <- 0
data$PROPEXP[data$PROPDMGEXP == "-"] <- 0
data$PROPEXP[data$PROPDMGEXP == "?"] <- 0
data$PROPVAL <- data$PROPDMG * data$PROPEXP
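The line-by-line assignments above can also be written as a single lookup table. The following is a minimal sketch of that alternative, using the same code-to-multiplier rules as described in the text; the names `exp_multiplier` and `to_multiplier` are hypothetical and do not appear in the original analysis.

```r
# Hypothetical lookup table: exponent code -> multiplier.
# Codes are upper-cased before the lookup, so "h"/"H", "k"/"K" and "m"/"M"
# share one entry each. Invalid codes map to 0.
exp_multiplier <- c("H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9,
                    "+" = 0, "-" = 0, "?" = 0)
# The digits 0-8 are powers of ten
exp_multiplier[as.character(0:8)] <- 10^(0:8)

to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- unname(exp_multiplier[code])
  # A blank code means units; "" cannot be used as a lookup name
  mult[code %in% ""] <- 1
  mult
}

to_multiplier(c("K", "m", "", "5", "?"))
# First row of the data: PROPDMG 25.0 with exponent "K"
25.0 * to_multiplier("K")
```

Any code missing from the table comes back as `NA` rather than being silently skipped, which makes unexpected codes easier to spot.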
The same was done for the exponent column of crop damage.
unique(data$CROPDMGEXP)
## [1] M K m B ? 0 k 2
## Levels: ? 0 2 B k K m M
The digits give the power of 10 by which each value is multiplied, and the letters stand for thousands (k, K), millions (m, M) or billions (B). A blank or 0 means units. The value “?” is invalid and is assigned a multiplier of zero.
data$CROPEXP[data$CROPDMGEXP == ""] <- 1e+00
data$CROPEXP[data$CROPDMGEXP == "0"] <- 1
data$CROPEXP[data$CROPDMGEXP == "2"] <- 1e+02
data$CROPEXP[data$CROPDMGEXP == "K"] <- 1e+03
data$CROPEXP[data$CROPDMGEXP == "k"] <- 1e+03
data$CROPEXP[data$CROPDMGEXP == "m"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "M"] <- 1e+06
data$CROPEXP[data$CROPDMGEXP == "B"] <- 1e+09
data$CROPEXP[data$CROPDMGEXP == "?"] <- 0
data$CROPVAL <- data$CROPDMG * data$CROPEXP
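A row whose exponent code was not covered by the assignments above would keep an `NA` multiplier and would later be left out of the totals without warning. A small check along the following lines could confirm that every code was mapped; `unmapped_codes` is a hypothetical helper, shown here on invented toy values.

```r
# Hypothetical check: list the exponent codes that received no multiplier.
unmapped_codes <- function(codes, multipliers) {
  sort(unique(as.character(codes[is.na(multipliers)])))
}

# Toy data: "Z" is an invented code that was never given a multiplier
codes <- c("K", "M", "Z", "", "Z")
mult  <- c(1e3, 1e6, NA, 1, NA)
unmapped_codes(codes, mult)
```

On the analysis data the calls would be `unmapped_codes(data$PROPDMGEXP, data$PROPEXP)` and `unmapped_codes(data$CROPDMGEXP, data$CROPEXP)`.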
The events were ranked regarding the number of fatalities and injuries.
fatalities<-aggregate(FATALITIES~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
fatalities<-head(fatalities[order(fatalities$FATALITIES,decreasing = TRUE),],10)
injuries<-aggregate(INJURIES~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
injuries<-head(injuries[order(injuries$INJURIES,decreasing = TRUE),],10)
print(fatalities)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
print(injuries)
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
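The aggregate, order and head steps used for the two rankings follow the same pattern, so they could be wrapped in one function. The sketch below assumes a hypothetical helper named `top_events`, which is not part of the original analysis, and runs it on a small invented data frame.

```r
# Hypothetical helper: total of column `value` per event type, largest first.
top_events <- function(df, value, n = 10) {
  totals <- aggregate(df[[value]], by = list(EVTYPE = df$EVTYPE),
                      FUN = sum, na.rm = TRUE)
  names(totals)[2] <- value
  head(totals[order(totals[[value]], decreasing = TRUE), ], n)
}

# Toy data standing in for the storm data
toy_events <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(3, 1, 2, 4)
)
top2 <- top_events(toy_events, "FATALITIES", n = 2)
print(top2)
```

The same function would then serve for all four rankings in this report, for example `top_events(data, "INJURIES")`.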
par(mfrow=c(1,2),mar=c(12,5,3,2),cex=0.75,mgp=c(4,1,0))
barplot(fatalities$FATALITIES,
names.arg = fatalities$EVTYPE,
las=2,
ylab = "Fatalities",
main="Fatalities vs Events")
barplot(injuries$INJURIES,
names.arg = injuries$EVTYPE,
las=2,
ylab = "Injuries",
main="Injuries vs Events")
Types of events that are most harmful to population health
As we can see, tornadoes are by far the most harmful events to population health, both in fatalities and in injuries.
The events were ranked regarding property and crop damage.
prop<-aggregate(PROPVAL~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
prop<-head(prop[order(prop$PROPVAL,decreasing = TRUE),],10)
crop<-aggregate(CROPVAL~EVTYPE,data=data,FUN=sum,na.rm=TRUE)
crop<-head(crop[order(crop$CROPVAL,decreasing = TRUE),],10)
print(prop)
## EVTYPE PROPVAL
## 170 FLOOD 144657709807
## 411 HURRICANE/TYPHOON 69305840000
## 834 TORNADO 56947380617
## 670 STORM SURGE 43323536000
## 153 FLASH FLOOD 16822673979
## 244 HAIL 15735267513
## 402 HURRICANE 11868319010
## 848 TROPICAL STORM 7703890550
## 972 WINTER STORM 6688497251
## 359 HIGH WIND 5270046260
print(crop)
## EVTYPE CROPVAL
## 95 DROUGHT 13972566000
## 170 FLOOD 5661968450
## 590 RIVER FLOOD 5029459000
## 427 ICE STORM 5022113500
## 244 HAIL 3025954473
## 402 HURRICANE 2741910000
## 411 HURRICANE/TYPHOON 2607872800
## 153 FLASH FLOOD 1421317100
## 140 EXTREME COLD 1292973000
## 212 FROST/FREEZE 1094086000
par(mfrow=c(1,2),mar=c(12,5,3,2),cex=0.75,mgp=c(4,1,0))
barplot(prop$PROPVAL*1e-9,
names.arg = prop$EVTYPE,
las=2,
ylab = "Property Damage (in billions of USD)",
main="Property Damage vs Events")
barplot(crop$CROPVAL*1e-9,
names.arg = crop$EVTYPE,
las=2,
ylab = "Crop Damage (in billions of USD)",
main="Crop Damage vs Events")
Types of events that are most harmful to the economy
As we can see, floods are responsible for the greatest property damage and droughts account for the greatest crop losses.