This report analyzes a U.S. National Oceanic and Atmospheric Administration (NOAA) dataset that records natural events across the United States from 1950 to November 2011. The analysis attempts to answer two questions: which type of event is most harmful to the United States population, and which events have the greatest economic consequences.
The results show that Tornado is the most harmful event to the U.S. population, with more than 5,000 fatalities and around 90,000 injuries. Economic damage falls into two main categories, property and crop: Flood caused about $145 billion in property losses, and Drought caused about $14 billion in crop losses.
First, the data is downloaded from Storm Data to the user's working directory and then loaded into R as a data.table.
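# The file is bzip2-compressed; read.csv decompresses it transparently.
download.file(url = "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
destfile = "StormData.csv.bz2", method = "curl");
library(data.table);
# Load the full CSV, keeping text columns as character rather than factor.
rawData <- data.table(read.csv("StormData.csv.bz2", header = TRUE,
stringsAsFactors = FALSE, row.names = NULL));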
The data set description can be found at Data Description.
For our analysis we are interested in the following variables:
1. EVTYPE Type of natural event
2. FATALITIES Number of fatalities
3. INJURIES Number of injuries
4. PROPDMG Amount of property damage
5. PROPDMGEXP Order of magnitude for property damage
6. CROPDMG Amount of crop damage
7. CROPDMGEXP Order of magnitude for crop damage
The following R code selects the specified variables and removes the raw data from memory.
variables <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP");
data <- rawData[ , variables, with = FALSE];
rm(rawData);
Next, all crop and property damage amounts are converted to billions of dollars.
# The exponent variable has the following values
# ("K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8")
# Any value other than K, M or B (upper or lower case) is treated as plain dollars, i.e. a factor of 10^-9.
transform <- function(amount, exponent) {
switch(exponent,
K = (amount*10^-6) , k = (amount*10^-6),
M = (amount*10^-3), m = (amount*10^-3),
B = (amount), b = (amount),
(amount*10^-9))
}
data[ , PROPDMG := transform(PROPDMG, PROPDMGEXP), by = 1:NROW(data)];
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1: TORNADO 0 15 2.5e-05 K 0
## 2: TORNADO 0 0 2.5e-06 K 0
## 3: TORNADO 0 2 2.5e-05 K 0
## 4: TORNADO 0 2 2.5e-06 K 0
## 5: TORNADO 0 2 2.5e-06 K 0
## ---
## 902293: HIGH WIND 0 0 0.0e+00 K 0
## 902294: HIGH WIND 0 0 0.0e+00 K 0
## 902295: HIGH WIND 0 0 0.0e+00 K 0
## 902296: BLIZZARD 0 0 0.0e+00 K 0
## 902297: HEAVY SNOW 0 0 0.0e+00 K 0
## CROPDMGEXP
## 1:
## 2:
## 3:
## 4:
## 5:
## ---
## 902293: K
## 902294: K
## 902295: K
## 902296: K
## 902297: K
data[ , CROPDMG := transform(CROPDMG, CROPDMGEXP), by = 1:NROW(data)];
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG
## 1: TORNADO 0 15 2.5e-05 K 0
## 2: TORNADO 0 0 2.5e-06 K 0
## 3: TORNADO 0 2 2.5e-05 K 0
## 4: TORNADO 0 2 2.5e-06 K 0
## 5: TORNADO 0 2 2.5e-06 K 0
## ---
## 902293: HIGH WIND 0 0 0.0e+00 K 0
## 902294: HIGH WIND 0 0 0.0e+00 K 0
## 902295: HIGH WIND 0 0 0.0e+00 K 0
## 902296: BLIZZARD 0 0 0.0e+00 K 0
## 902297: HEAVY SNOW 0 0 0.0e+00 K 0
## CROPDMGEXP
## 1:
## 2:
## 3:
## 4:
## 5:
## ---
## 902293: K
## 902294: K
## 902295: K
## 902296: K
## 902297: K
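The row-wise calls above evaluate `switch` once per record, which is slow on roughly 900,000 rows. As a minimal sketch, the same conversion rule can be written as a vectorized lookup. The names `multipliers` and `to_billions` are hypothetical and not part of the original analysis; the rule itself (K, M, B in either case, everything else treated as plain dollars) is the one stated in the comment above.

```r
# Multipliers that convert an amount to billions of USD, keyed by exponent code.
multipliers <- c(K = 1e-6, M = 1e-3, B = 1)

# Hypothetical helper: vectorized equivalent of the report's conversion rule.
to_billions <- function(amount, exponent) {
  # Look up each upper-cased code; unknown or empty codes give NA.
  factor <- unname(multipliers[toupper(exponent)])
  # Assumption carried over from the report: unknown codes mean plain dollars.
  factor[is.na(factor)] <- 1e-9
  amount * factor
}

# Small illustrative input: 25K, 2.5m, 10B and 7 with an unknown code.
to_billions(c(25, 2.5, 10, 7), c("K", "m", "B", "?"))
```

With this helper, the conversion could be applied to a whole column in one step, for example `data[ , PROPDMG := to_billions(PROPDMG, PROPDMGEXP)]`, in place of the row-wise call rather than in addition to it.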
For simplicity, this report does not merge labels in the EVTYPE variable that refer to the same kind of event, such as “TSTM WINDS” and “THUNDERSTORM WINDS”. Further work is necessary to clean up the data by analysing these special cases in detail.
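As an illustration of what such a cleanup might look like, the sketch below normalizes only the thunderstorm-wind example mentioned above. The helper name `normalize_evtype` and its two rules are assumptions made for this example; a real cleanup would need a much more complete mapping, and the second rule also shortens any other label ending in "WINDS".

```r
# Hypothetical helper: covers only the thunderstorm-wind case from the text.
normalize_evtype <- function(evtype) {
  # Remove surrounding whitespace and unify the case.
  cleaned <- toupper(trimws(evtype))
  # Expand the "TSTM" abbreviation.
  cleaned <- gsub("\\bTSTM\\b", "THUNDERSTORM", cleaned, perl = TRUE)
  # Drop the plural "S" from labels that end in "WINDS".
  cleaned <- sub("\\bWINDS$", "WIND", cleaned, perl = TRUE)
  cleaned
}

normalize_evtype(c("TSTM WIND", "THUNDERSTORM WINDS", " Tstm Wind ", "HAIL"))
```

Applied before aggregation, for example with `data[ , EVTYPE := normalize_evtype(EVTYPE)]`, labels such as TSTM WIND and THUNDERSTORM WIND would be counted as a single category instead of appearing as separate rows in the rankings.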
The following R code displays the 10 natural events with the most fatalities and the most injuries, respectively.
fatalities <- data[, sum(FATALITIES) , by = EVTYPE][order(V1 , decreasing = TRUE)];
injuries <- data[ , sum(INJURIES), by = EVTYPE][order(V1, decreasing = TRUE)];
setnames(fatalities, "V1", "TotalFatalities")
setnames(injuries, "V1", "TotalInjuries");
fatalities[1:10];
## EVTYPE TotalFatalities
## 1: TORNADO 5633
## 2: EXCESSIVE HEAT 1903
## 3: FLASH FLOOD 978
## 4: HEAT 937
## 5: LIGHTNING 816
## 6: TSTM WIND 504
## 7: FLOOD 470
## 8: RIP CURRENT 368
## 9: HIGH WIND 248
## 10: AVALANCHE 224
injuries[1:10];
## EVTYPE TotalInjuries
## 1: TORNADO 91346
## 2: TSTM WIND 6957
## 3: FLOOD 6789
## 4: EXCESSIVE HEAT 6525
## 5: LIGHTNING 5230
## 6: HEAT 2100
## 7: ICE STORM 1975
## 8: FLASH FLOOD 1777
## 9: THUNDERSTORM WIND 1488
## 10: HAIL 1361
The following charts show the top 10 natural events by number of fatalities and by number of injuries
between 1950 and 2011 in the United States.
library(ggplot2);
qplot(x = fatalities$EVTYPE[1:10], y = fatalities$TotalFatalities[1:10],
main = "Total Number of Fatalities", xlab = "Natural Event",
ylab = "Number of Fatalities") + theme(axis.title.x = element_text(face="bold"),
axis.text.x = element_text(angle=90, vjust=1));
qplot(x = injuries$EVTYPE[1:10] , y = injuries$TotalInjuries[1:10],
xlab = "Natural Event", ylab = "Number of Injuries", main = "Total Number of Injuries"
) + theme(axis.title.x = element_text(face="bold"),
axis.text.x = element_text(angle=90, vjust=1));
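Because EVTYPE is a character vector, the plots above place the events in alphabetical order on the x axis rather than in rank order. A minimal sketch of one way to keep the ranking visible is to turn the labels into a factor whose levels follow the sorted table. The data frame `top_fatalities` below is a hypothetical stand-in that copies the first five rows of the fatalities table, and the plotting step assumes ggplot2 is installed.

```r
# First five fatality counts copied from the table above.
top_fatalities <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  TotalFatalities = c(5633, 1903, 978, 937, 816),
  stringsAsFactors = FALSE
)

# Fix the factor levels in rank order so the axis is not sorted alphabetically.
top_fatalities$EVTYPE <- factor(top_fatalities$EVTYPE,
                                levels = top_fatalities$EVTYPE)

# Draw the bars only if ggplot2 is available; otherwise skip the plot.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  fatality_plot <- ggplot(top_fatalities, aes(x = EVTYPE, y = TotalFatalities)) +
    geom_col() +
    labs(title = "Total Number of Fatalities",
         x = "Natural Event", y = "Number of Fatalities") +
    theme(axis.text.x = element_text(angle = 90, vjust = 1))
  print(fatality_plot)
}
```

The same factor conversion could be applied to the first ten rows of the `fatalities` and `injuries` tables before plotting them.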
The economic consequences are separated into two categories: property damage and crop damage. First, we compute the 10 natural events that caused the most property damage and the most crop damage.
property <- data[, sum(PROPDMG) , by = EVTYPE][order(V1 , decreasing = TRUE)];
crop <- data[ , sum(CROPDMG), by = EVTYPE][order(V1, decreasing = TRUE)];
setnames(property, "V1", "TotalPropDamageBlnUSD")
setnames(crop, "V1", "TotalCropDamageBlnUSD");
property[1:10];
## EVTYPE TotalPropDamageBlnUSD
## 1: FLOOD 144.657710
## 2: HURRICANE/TYPHOON 69.305840
## 3: TORNADO 56.937161
## 4: STORM SURGE 43.323536
## 5: FLASH FLOOD 16.140812
## 6: HAIL 15.732267
## 7: HURRICANE 11.868319
## 8: TROPICAL STORM 7.703891
## 9: WINTER STORM 6.688497
## 10: HIGH WIND 5.270046
crop[1:10];
## EVTYPE TotalCropDamageBlnUSD
## 1: DROUGHT 13.972566
## 2: FLOOD 5.661968
## 3: RIVER FLOOD 5.029459
## 4: ICE STORM 5.022113
## 5: HAIL 3.025954
## 6: HURRICANE 2.741910
## 7: HURRICANE/TYPHOON 2.607873
## 8: FLASH FLOOD 1.421317
## 9: EXTREME COLD 1.292973
## 10: FROST/FREEZE 1.094086
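The two tables can also be joined to get a combined figure per event. The sketch below is illustrative only: it copies the values for the five events that appear in both top-10 lists, so it is not a full ranking (Tornado and Drought, for instance, are each missing from one list and are left out). The names `property_top`, `crop_top` and `combined` are hypothetical.

```r
# Values copied from the two top-10 tables above, in billions of USD.
property_top <- data.frame(
  EVTYPE = c("FLOOD", "HURRICANE/TYPHOON", "FLASH FLOOD", "HAIL", "HURRICANE"),
  PropBlnUSD = c(144.657710, 69.305840, 16.140812, 15.732267, 11.868319),
  stringsAsFactors = FALSE
)
crop_top <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "HURRICANE", "HURRICANE/TYPHOON", "FLASH FLOOD"),
  CropBlnUSD = c(5.661968, 3.025954, 2.741910, 2.607873, 1.421317),
  stringsAsFactors = FALSE
)

# Join on the event type, add the two damage columns, and sort by the total.
combined <- merge(property_top, crop_top, by = "EVTYPE")
combined$TotalBlnUSD <- combined$PropBlnUSD + combined$CropBlnUSD
combined <- combined[order(combined$TotalBlnUSD, decreasing = TRUE), ]
combined
```

On the full data, the same join could be run directly on the `property` and `crop` tables with `merge(property, crop, by = "EVTYPE")`, which would cover every event type present in both rather than this small subset.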
The following charts show the 10 natural events that caused the most property damage and the most crop damage in the United States between 1950 and 2011.
library(ggplot2);
qplot(x = property$EVTYPE[1:10], y = property$TotalPropDamageBlnUSD[1:10],
main = "Total Property Damage", xlab = "Natural Event",
ylab = "Cost in bln USD") +
theme(axis.text.x = element_text(angle=90, vjust=1));
qplot(x = crop$EVTYPE[1:10] , y = crop$TotalCropDamageBlnUSD[1:10],
xlab = "Natural Event", ylab = "Cost in bln USD", main = "Total Crop Damage"
) + theme(axis.text.x = element_text(angle=90, vjust=1));
The most harmful natural event to the United States population is Tornado, with 5633 fatalities and 91346 injuries, followed by Excessive Heat in terms of fatalities (1903) and TSTM Wind in terms of injuries (6957).
In terms of economic consequences, Flood (144.657710 bln USD), Hurricane/Typhoon (69.305840 bln USD) and Tornado (56.937161 bln USD) are the natural events that cause the most property damage. Drought (13.972566 bln USD), Flood (5.661968 bln USD) and River Flood (5.029459 bln USD) are the natural events that cause the most crop damage.