Storms and other severe weather events can cause both public health and economic problems in the United States. Using data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, this report reads, orders and sums data relating to human health and economic damage, organised by weather event. The data in the database is organised by county/municipality, but this analysis extracts, aggregates and displays the weather events with the highest total impact on human health and economic cost for the entire United States.
The results show that Flood, Hurricane/Typhoon & Tornado cause the most economic damage, and Tornado, Excessive Heat & Thunderstorm Wind have the highest impact on human health.
The script imports the NOAA Storm Database file, “repdata-data-StormData.csv.bz2”. When reading the CSV file, blank lines are ignored and data is delimited by comma “,”.
datacsv <- read.csv("repdata-data-StormData.csv.bz2",sep=",", na.strings = c("NA","","<NA>"),stringsAsFactors = FALSE, blank.lines.skip=TRUE)
There are 902297 rows and 37 columns:
dim(datacsv)
## [1] 902297 37
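To illustrate what the `na.strings` argument does during the import, here is a minimal sketch on a toy file. The three data rows are hypothetical values, not NOAA records; the file is written to a temporary bzip2 archive and read back with the same arguments as above.

```r
# Toy data (hypothetical values) written to a temporary bzip2-compressed CSV.
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,PROPDMG,PROPDMGEXP",
             "TORNADO,25,K",
             "HAIL,0,",
             "FLOOD,1.5,M"), con)
close(con)

# Same arguments as the real import: empty fields are read as NA.
toy <- read.csv(tmp, sep = ",", na.strings = c("NA", "", "<NA>"),
                stringsAsFactors = FALSE, blank.lines.skip = TRUE)

dim(toy)                # number of rows and columns
is.na(toy$PROPDMGEXP)   # the empty exponent on the HAIL row is NA
unlink(tmp)             # remove the temporary file
```

The empty exponent field becomes `NA` rather than an empty string, which is why the exponent columns of the real data set show NA counts later in the report.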
We are primarily interested in two questions: which types of weather event are most harmful to human health, and which have the greatest economic consequences.
The available variables from “repdata-data-StormData.csv” are as follows:
colnames(datacsv)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
We are interested in the following variables, which relate to human health and economic impact: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
We can limit the dataset to these columns:
data <- datacsv[c("BGN_DATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Checking for NA values in the data variable:
sum(is.na(data))
## [1] 1084347
There are NA values.
Checking for NA values in the fields:
Data.NA.Total <- data[1,c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Data.NA.Total$EVTYPE <- sum(is.na(data$EVTYPE))
Data.NA.Total$FATALITIES <- sum(is.na(data$FATALITIES))
Data.NA.Total$INJURIES <- sum(is.na(data$INJURIES))
Data.NA.Total$PROPDMG <- sum(is.na(data$PROPDMG))
Data.NA.Total$PROPDMGEXP <- sum(is.na(data$PROPDMGEXP))
Data.NA.Total$CROPDMG <- sum(is.na(data$CROPDMG))
Data.NA.Total$CROPDMGEXP <- sum(is.na(data$CROPDMGEXP))
Data.NA.Total
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 0 0 0 0 465934 0 618413
The property and crop exponents have a large number of NA values. This will be taken into account when calculating the total economic damage.
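The per-column NA counts above can also be obtained in a single call. The following is a sketch on a toy data frame with hypothetical values; `na_counts` is an illustrative name, not part of the original script.

```r
# Toy data frame (hypothetical values) with some missing exponents.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD"),
  PROPDMG    = c(25, 0, 1.5),
  PROPDMGEXP = c("K", NA, "M"),
  CROPDMGEXP = c(NA, NA, "K"),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column.
na_counts <- colSums(is.na(toy))
na_counts
```

Applied to the real data, `colSums(is.na(data))` should give the same counts as the column-by-column approach, since both count `NA` entries per column.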
We extract the data relating to human health (fatalities and injuries) by weather event, then aggregate it and compute the totals:
sample <- data[,c("EVTYPE", "FATALITIES", "INJURIES")]
health <- aggregate(. ~ EVTYPE, data=sample, FUN=sum)
health$TOTAL <- health$FATALITIES + health$INJURIES
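To make the aggregation step concrete, here is a small sketch of the same `aggregate` call on a toy data frame. The values and the names `toy` and `toy_health` are hypothetical.

```r
# Toy event records (hypothetical values); each row is one reported event.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(2, 1, 3, 0, 1),
  INJURIES   = c(10, 4, 5, 0, 2),
  stringsAsFactors = FALSE
)

# Sum every other column within each EVTYPE group.
toy_health <- aggregate(. ~ EVTYPE, data = toy, FUN = sum)

# Combined casualties per event type.
toy_health$TOTAL <- toy_health$FATALITIES + toy_health$INJURIES

# Most harmful event types first.
toy_health[order(toy_health$TOTAL, decreasing = TRUE), ]
```

Note that the formula interface of `aggregate` drops rows containing `NA` in any of the variables by default (`na.action = na.omit`). That has no effect here because FATALITIES and INJURIES contain no NA values, as checked earlier.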
We can see the top 10 weather events ordered by total fatalities:
hf <- health [order(health$FATALITIES, decreasing=TRUE),]
head(hf[,c("EVTYPE","FATALITIES")],10)
## EVTYPE FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
The main cause of fatalities is tornados.
We can see the top 10 weather events ordered by total injuries:
hi <- health [order(health$INJURIES, decreasing=TRUE),]
head(hi[,c("EVTYPE","INJURIES")],10)
## EVTYPE INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
The main cause of injuries is tornados, but the order changes for some of the other events in comparison to ordering by fatalities.
We can see a barplot of total fatalities by weather event:
colours <- rainbow (25, start=0, end=0.5)
barplot(hf[1:10,2]/1e3,beside=TRUE, main="Top 10 US Weather Events Causing Fatalities",names.arg= hf[1:10,1],density = -1,xlab="",
ylab="Total Fatalities (thousands)",cex.axis=0.8,cex.names = 0.7,las=2, col = colours)
We can see the top 10 weather events ordered by total casualties (fatalities + injuries):
ht <- health [order(health$TOTAL, decreasing=TRUE),]
head(ht[,c("EVTYPE","TOTAL")],10)
## EVTYPE TOTAL
## 834 TORNADO 96979
## 130 EXCESSIVE HEAT 8428
## 856 TSTM WIND 7461
## 170 FLOOD 7259
## 464 LIGHTNING 6046
## 275 HEAT 3037
## 153 FLASH FLOOD 2755
## 427 ICE STORM 2064
## 760 THUNDERSTORM WIND 1621
## 972 WINTER STORM 1527
The main impact on human health (fatalities + injuries) is tornados followed by excessive heat and thunderstorm wind.
We can see a barplot of total injuries and fatalities.
colours <- rainbow (25, start=0, end=0.5)
barplot(ht[1:10,4]/1e3,beside=TRUE, main="Top 10 US Weather Events Harmful to Human Health",names.arg= ht[1:10,1],density = -1,xlab="",
ylab = "Total People Affected (Fatalities + Injuries), thousands"
,cex.axis=0.8,cex.names = 0.7,las=2, col = colours)
Turning to economic damage, let’s start by ignoring rows that have neither property nor crop damage, as these do not contribute to economic damage:
data <- data[!((data$PROPDMG == 0) & (data$CROPDMG == 0)),]
Set the NA property and crop exponent values to zero, so that these values do not affect our calculations.
data$CROPDMGEXP[is.na(data$CROPDMGEXP)] <- 0
data$PROPDMGEXP[is.na(data$PROPDMGEXP)] <- 0
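The two preparation steps above can be tried on a toy data frame. This is a sketch with hypothetical values; it uses the character code `"0"` for the replacement, which is what the numeric `0` becomes when assigned into a character column.

```r
# Toy damage records (hypothetical values).
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "FLOOD", "HEAT"),
  PROPDMG    = c(25, 0, 1.5, 0),
  PROPDMGEXP = c("K", NA, "M", NA),
  CROPDMG    = c(0, 0, 2, 5),
  CROPDMGEXP = c(NA, NA, "K", NA),
  stringsAsFactors = FALSE
)

# Drop rows where both damage figures are zero (here: the HAIL row).
toy <- toy[!(toy$PROPDMG == 0 & toy$CROPDMG == 0), ]

# Replace missing exponent codes with "0".
toy$PROPDMGEXP[is.na(toy$PROPDMGEXP)] <- "0"
toy$CROPDMGEXP[is.na(toy$CROPDMGEXP)] <- "0"
toy
```

In this report the `"0"` code is later given a multiplier of 1, so a damage figure with a missing exponent is counted at face value rather than being dropped.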
Let’s check the property and crop exponents present in the dataset.
Property exponent:
unique(data$PROPDMGEXP)
## [1] "K" "M" "B" "m" "0" "+" "5" "6" "4" "h" "2" "7" "3" "H" "-"
Crop exponent:
unique(data$CROPDMGEXP)
## [1] "0" "M" "K" "m" "B" "?" "k"
Looking at the National Weather Service Instruction document on page 12 in the section that relates to economic damage:
“Alphabetical characters used to signify magnitude include ‘K’ for thousands, ‘M’ for millions, and ‘B’ for billions.”
All exponents are used to calculate the property & crop damage. They are first converted to their equivalent numeric value, e.g. H or h = Hecto = 100.
# "3" is mapped to 1e3 because the other EXP field equals "K" for the same record.
# The list must contain every exponent present in the data, or the lookup will lead to an error.
EXPLIST <- list(h=1e2,H=1e2, K=1e3,k=1e3,M=1e6,m=1e6,B=1e9, "+"=1,"-"=1, "?"=1, "0"=1, "2"=1e2, "3"=1e3, "4"=1e4, "5"=1e5, "6"=1e6,"7"=1e7)
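As an illustration of how such a lookup table turns an exponent code into a dollar value, here is a sketch on toy data. It uses a named numeric vector instead of a list, and only a subset of the codes; `exp_lookup`, `multiplier` and the toy values are hypothetical.

```r
# Named numeric vector as a lookup table (subset of the codes, for illustration).
exp_lookup <- c(H = 1e2, h = 1e2, K = 1e3, k = 1e3,
                M = 1e6, m = 1e6, B = 1e9, "0" = 1)

# Toy damage records (hypothetical values).
toy <- data.frame(
  PROPDMG    = c(25, 1.5, 2, 10),
  PROPDMGEXP = c("K", "M", "B", "0"),
  stringsAsFactors = FALSE
)

# Indexing by name returns one multiplier per row; unname() drops the labels.
multiplier <- unname(exp_lookup[toy$PROPDMGEXP])

# A code missing from the table would yield NA, so fail early rather than silently.
stopifnot(!anyNA(multiplier))

# Damage in dollars = reported figure times its multiplier.
toy$PROPDMGVAL <- toy$PROPDMG * multiplier
toy
```

With a named vector, an unknown code produces `NA` instead of a length mismatch, so the explicit `anyNA` check plays the role of the error mentioned in the comment above.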
The total property & crop damage is calculated by multiplying each damage figure by its exponent value.
The values are then aggregated and ordered to determine the most damaging weather events.
data$PROPDMGVAL <- unlist(EXPLIST[data$PROPDMGEXP])*data$PROPDMG
data$CROPDMGVAL <- unlist(EXPLIST[data$CROPDMGEXP])*data$CROPDMG
sample.dmg <- data[,c("EVTYPE", "PROPDMGVAL", "CROPDMGVAL")]
damage <- aggregate(. ~ EVTYPE, data=sample.dmg, FUN=sum)
damage$TOTAL <- damage$PROPDMGVAL + damage$CROPDMGVAL
damage <- damage [order(damage$TOTAL, decreasing=TRUE),]
head(damage,10)
## EVTYPE PROPDMGVAL CROPDMGVAL TOTAL
## 72 FLOOD 144657709807 5661968450 150319678257
## 197 HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 354 TORNADO 56947380677 414953270 57362333947
## 299 STORM SURGE 43323536000 5000 43323541000
## 116 HAIL 15735267513 3025954473 18761221986
## 59 FLASH FLOOD 16822673979 1421317100 18243991079
## 39 DROUGHT 1046106000 13972566000 15018672000
## 189 HURRICANE 11868319010 2741910000 14610229010
## 262 RIVER FLOOD 5118945500 5029459000 10148404500
## 206 ICE STORM 3944927860 5022113500 8967041360
The weather events causing the most economic damage in the US are Floods, followed by Hurricane/Typhoons & then Tornados.
Plotting these results:
colours <- rainbow (25, start=0, end=0.5)
barplot(damage[1:10,4]/1e9,beside=TRUE, main="Top 10 US Weather Events by Economic Damage",names.arg= damage[1:10,1],density = -1,xlab="",
ylab = "Total Cost / Billions $"
,cex.axis=0.8, cex.names = 0.75,las = 2, col = colours)
The weather events in the US that pose the main danger to human health are tornados, followed by excessive heat & thunderstorm wind; they cause the most injuries and fatalities.
The weather events with the greatest economic consequences are Floods, followed by Hurricane/Typhoons & then Tornados.