Synopsis

Storms and other severe weather events can cause both public health and economic problems in the United States. Using data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, this report reads, orders and sums data relating to human health and economic damage, organised by weather event. The data in the database is organised by county/municipality, but this analysis extracts, aggregates and displays the weather events with the highest total impact on human health and economic cost for the entire United States.
The results show that Flood, Hurricane/Typhoon and Tornado cause the most economic damage, and that Tornado, Excessive Heat and Thunderstorm Wind have the highest impact on human health.

Data Processing

The script imports the NOAA Storm Database file, “repdata-data-StormData.csv.bz2”. When reading the CSV file, blank lines are ignored and data is delimited by comma “,”.

datacsv <- read.csv("repdata-data-StormData.csv.bz2",sep=",", na.strings = c("NA","","<NA>"),stringsAsFactors = FALSE, blank.lines.skip=TRUE)

There are 902297 rows and 37 columns:

dim(datacsv)
## [1] 902297     37

Before looking at the variables available in the data set, recall the two questions we are primarily interested in:

  1. Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
  2. Across the United States, which types of events have the greatest economic consequences?

The variables available in “repdata-data-StormData.csv” are as follows:

colnames(datacsv)
##  [1] "STATE__"    "BGN_DATE"   "BGN_TIME"   "TIME_ZONE"  "COUNTY"    
##  [6] "COUNTYNAME" "STATE"      "EVTYPE"     "BGN_RANGE"  "BGN_AZI"   
## [11] "BGN_LOCATI" "END_DATE"   "END_TIME"   "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE"  "END_AZI"    "END_LOCATI" "LENGTH"     "WIDTH"     
## [21] "F"          "MAG"        "FATALITIES" "INJURIES"   "PROPDMG"   
## [26] "PROPDMGEXP" "CROPDMG"    "CROPDMGEXP" "WFO"        "STATEOFFIC"
## [31] "ZONENAMES"  "LATITUDE"   "LONGITUDE"  "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS"    "REFNUM"

We are interested in the following variables, which relate to human health and economic impact: BGN_DATE, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

We can limit the dataset to these columns:

data <- datacsv[c("BGN_DATE",  "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]

Checking for NA values in the data variable:

sum(is.na(data))
## [1] 1084347

There are NA values.

Checking for NA values in the fields:

Data.NA.Total <- data[1,c("EVTYPE","FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")]
Data.NA.Total$EVTYPE     <- sum(is.na(data$EVTYPE))
Data.NA.Total$FATALITIES <- sum(is.na(data$FATALITIES))
Data.NA.Total$INJURIES   <- sum(is.na(data$INJURIES))
Data.NA.Total$PROPDMG    <- sum(is.na(data$PROPDMG))
Data.NA.Total$PROPDMGEXP <- sum(is.na(data$PROPDMGEXP))
Data.NA.Total$CROPDMG    <- sum(is.na(data$CROPDMG))
Data.NA.Total$CROPDMGEXP <- sum(is.na(data$CROPDMGEXP))
Data.NA.Total
##   EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1      0          0        0       0     465934       0     618413
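The same per-field count can be obtained in one step. The following is a minimal sketch on a made-up data frame (`toy` and its values are illustrative assumptions, not the storm data): `is.na()` applied to a data frame returns a logical matrix, and `colSums()` counts the `TRUE` values in each column.

```r
# Toy data frame standing in for the storm data (values are made up).
toy <- data.frame(
  EVTYPE     = c("FLOOD", "TORNADO", "HAIL"),
  PROPDMG    = c(10, 0, 2.5),
  PROPDMGEXP = c("K", NA, "M"),
  stringsAsFactors = FALSE
)

# is.na() gives a logical matrix; colSums() counts the NA values per column.
na.counts <- colSums(is.na(toy))
print(na.counts)
```

Applied to the report's `data` variable, `colSums(is.na(data))` should give the same counts as the field-by-field approach above.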

The property and crop exponents have a large number of NA values. This will be taken into account during the total economic damage calculations.

Results

Weather events most harmful to human health in the United States

We extract the data relating to human health (fatalities and injuries) by weather event, then aggregate it to get the totals.

sample <- data[,c("EVTYPE", "FATALITIES", "INJURIES")]
health <- aggregate(. ~ EVTYPE, data=sample, FUN=sum)
health$TOTAL <- health$FATALITIES + health$INJURIES
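To illustrate what the aggregation step does, here is a small sketch on invented numbers (the `toy` data frame is an assumption for illustration only). The formula `. ~ EVTYPE` sums every other column within each event type. Note that the formula interface of `aggregate()` drops rows containing NA by default; that does not matter here because the health fields contain no NA values.

```r
# Made-up records: two tornado rows, two flood rows, one heat row.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(2, 1, 3, 0, 4),
  INJURIES   = c(10, 0, 5, 2, 1),
  stringsAsFactors = FALSE
)

# Sum FATALITIES and INJURIES within each event type.
toy.health <- aggregate(. ~ EVTYPE, data = toy, FUN = sum)

# Combined measure, as in the report.
toy.health$TOTAL <- toy.health$FATALITIES + toy.health$INJURIES

# Order from most to least harmful.
toy.health <- toy.health[order(toy.health$TOTAL, decreasing = TRUE), ]
print(toy.health)
```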

We can see the 10 weather events ordered by total fatalities:

hf <- health [order(health$FATALITIES, decreasing=TRUE),]
head(hf[,c("EVTYPE","FATALITIES")],10)
##             EVTYPE FATALITIES
## 834        TORNADO       5633
## 130 EXCESSIVE HEAT       1903
## 153    FLASH FLOOD        978
## 275           HEAT        937
## 464      LIGHTNING        816
## 856      TSTM WIND        504
## 170          FLOOD        470
## 585    RIP CURRENT        368
## 359      HIGH WIND        248
## 19       AVALANCHE        224

The main cause of fatalities is tornados.

We can see the 10 weather events ordered by total injuries:

hi <- health [order(health$INJURIES, decreasing=TRUE),]
head(hi[,c("EVTYPE","INJURIES")],10)
##                EVTYPE INJURIES
## 834           TORNADO    91346
## 856         TSTM WIND     6957
## 170             FLOOD     6789
## 130    EXCESSIVE HEAT     6525
## 464         LIGHTNING     5230
## 275              HEAT     2100
## 427         ICE STORM     1975
## 153       FLASH FLOOD     1777
## 760 THUNDERSTORM WIND     1488
## 244              HAIL     1361

The main cause of injuries is tornados, but the order changes for some of the other events in comparison to ordering by fatalities.

We can see a barplot of total fatalities by weather event:

colours <- rainbow (25, start=0, end=0.5)
# Values are divided by 1e3, so the axis is in thousands of fatalities.
barplot(hf[1:10,2]/1e3,beside=TRUE, main="Top 10 US Weather Events Causing Fatalities",names.arg= hf[1:10,1],density = -1,xlab="",
        ylab="Total Fatalities (thousands)",cex.axis=0.8,cex.names = 0.7,las=2, col = colours)

We can see the 10 weather events ordered by total casualties (fatalities + injuries):

ht <- health [order(health$TOTAL, decreasing=TRUE),]
head(ht[,c("EVTYPE","TOTAL")],10)
##                EVTYPE TOTAL
## 834           TORNADO 96979
## 130    EXCESSIVE HEAT  8428
## 856         TSTM WIND  7461
## 170             FLOOD  7259
## 464         LIGHTNING  6046
## 275              HEAT  3037
## 153       FLASH FLOOD  2755
## 427         ICE STORM  2064
## 760 THUNDERSTORM WIND  1621
## 972      WINTER STORM  1527

Tornados have by far the greatest impact on human health (fatalities + injuries), followed by excessive heat and thunderstorm wind.
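The tables above list TSTM WIND and THUNDERSTORM WIND as separate rows, because totals are computed per distinct EVTYPE label. This report does not merge such labels. If one wanted to, a rough sketch of normalising labels before aggregating might look like the following (the `raw` vector and the single substitution rule are illustrative assumptions, not a complete cleaning scheme):

```r
# Made-up labels showing the kind of variation seen in EVTYPE.
raw <- c("TSTM WIND", "THUNDERSTORM WIND", " tstm wind", "HEAT")

# Trim surrounding whitespace and use a single case.
clean <- toupper(trimws(raw))

# Expand one known abbreviation (assumed rule for this example only).
clean <- gsub("TSTM", "THUNDERSTORM", clean, fixed = TRUE)

print(table(clean))
```

Whether merging labels this way is appropriate depends on how the event types are defined, so any such rule should be checked against the data documentation.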

We can see a barplot of total injuries and fatalities.

colours <- rainbow (25, start=0, end=0.5)
# Values are divided by 1e3, so the axis is in thousands of people.
barplot(ht[1:10,4]/1e3,beside=TRUE, main="Top 10 US Weather Events Harmful to Human Health",names.arg= ht[1:10,1],density = -1,xlab="",
        ylab = "Total People Affected (Fatalities + Injuries), thousands"
        ,cex.axis=0.8,cex.names = 0.7,las=2, col = colours)

Weather events with greatest economic consequences in the United States

Let’s start by dropping rows that have neither property nor crop damage, as these do not contribute to economic damage:

# PROPDMG and CROPDMG are numeric, so compare against the number 0.
data <- data[!((data$PROPDMG == 0) & (data$CROPDMG == 0)),]

Set NA property and crop exponent values to zero so that these values do not affect our calculations.

# The exponent columns are character, so the replacement is the string "0".
data$CROPDMGEXP[is.na(data$CROPDMGEXP)] <- "0"
data$PROPDMGEXP[is.na(data$PROPDMGEXP)] <- "0"

Let’s check the property and crop exponents present in the dataset.

Property exponent:

unique(data$PROPDMGEXP)
##  [1] "K" "M" "B" "m" "0" "+" "5" "6" "4" "h" "2" "7" "3" "H" "-"

Crop exponent:

unique(data$CROPDMGEXP)
## [1] "0" "M" "K" "m" "B" "?" "k"

Using crop and property damage exponents

The National Weather Service Instruction document, on page 12 in the section that relates to economic damage, states:

“Alphabetical characters used to signify magnitude include ‘K’ for thousands, ‘M’ for millions, and ‘B’ for billions.”

All exponents found in the data are used to calculate the property and crop damage. Each is first converted to its equivalent numeric multiplier, e.g. H or h = hecto = 100.

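# "3" is read as 1e3 because the other exponent on the same record is "K".
# The list must contain every exponent present in the data; a missing entry
# is dropped by unlist() and leads to an error in the calculation below.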
EXPLIST <- list(h=1e2,H=1e2, K=1e3,k=1e3,M=1e6,m=1e6,B=1e9, "+"=1,"-"=1, "?"=1, "0"=1, "2"=1e2, "3"=1e3, "4"=1e4, "5"=1e5, "6"=1e6,"7"=1e7)
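As a worked illustration of the conversion, the sketch below applies a reduced multiplier table to invented records. It uses a named numeric vector rather than a list, and upper-cases the exponent before lookup; both are choices made for this example only, and `exp.map` and `toy` are hypothetical names. With a named vector, an exponent missing from the table yields NA instead of being silently dropped, which makes gaps easy to detect.

```r
# Reduced multiplier table for the example (not the full EXPLIST).
exp.map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9, "0" = 1)

# Made-up damage figures with their exponents.
toy <- data.frame(
  DMG = c(25, 1.5, 2, 10),
  EXP = c("K", "m", "B", "0"),
  stringsAsFactors = FALSE
)

# Look up each multiplier by name (case-insensitive) and scale the figure.
toy$VAL <- toy$DMG * unname(exp.map[toupper(toy$EXP)])

# Any exponent absent from exp.map would show up here as NA.
stopifnot(!anyNA(toy$VAL))
print(toy)
```

Here 25 with exponent K becomes 25,000 and 1.5 with exponent m becomes 1,500,000, while the "0" exponent leaves the recorded figure unchanged.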

The total property & crop damage is calculated using the exponent.

The values are aggregated and ordered to determine the most damaging weather events.

data$PROPDMGVAL <- unlist(EXPLIST[data$PROPDMGEXP])*data$PROPDMG
data$CROPDMGVAL <- unlist(EXPLIST[data$CROPDMGEXP])*data$CROPDMG

sample.dmg <- data[,c("EVTYPE", "PROPDMGVAL", "CROPDMGVAL")]
damage <- aggregate(. ~ EVTYPE, data=sample.dmg, FUN=sum)
damage$TOTAL <- damage$PROPDMGVAL + damage$CROPDMGVAL 
damage <- damage [order(damage$TOTAL, decreasing=TRUE),]
head(damage,10)
##                EVTYPE   PROPDMGVAL  CROPDMGVAL        TOTAL
## 72              FLOOD 144657709807  5661968450 150319678257
## 197 HURRICANE/TYPHOON  69305840000  2607872800  71913712800
## 354           TORNADO  56947380677   414953270  57362333947
## 299       STORM SURGE  43323536000        5000  43323541000
## 116              HAIL  15735267513  3025954473  18761221986
## 59        FLASH FLOOD  16822673979  1421317100  18243991079
## 39            DROUGHT   1046106000 13972566000  15018672000
## 189         HURRICANE  11868319010  2741910000  14610229010
## 262       RIVER FLOOD   5118945500  5029459000  10148404500
## 206         ICE STORM   3944927860  5022113500   8967041360

The weather events causing the most economic damage in the US are Floods, followed by Hurricanes/Typhoons and then Tornados.

Plotting these results:

colours <- rainbow (25, start=0, end=0.5)
barplot(damage[1:10,4]/1e9,beside=TRUE, main="Top 10 US Weather Events by Economic Damage",names.arg= damage[1:10,1],density = -1,xlab="",
        ylab = "Total Cost / Billions $"
        ,cex.axis=0.8, cex.names = 0.75,las = 2, col = colours)

Conclusion

The weather events that pose the greatest danger to human health in the US are tornados, followed by excessive heat and thunderstorm wind. They cause the most injuries and fatalities combined.

The weather events with the greatest economic consequences are Floods, followed by Hurricanes/Typhoons and then Tornados.