In this report, the NOAA Storm Database is analyzed to answer questions about severe weather events across the United States.
The database documents the occurrence of storms and other weather events that cause loss of life, injuries, property damage and/or disruption to commerce. We use it to identify the top event types by their impact on population health (deaths and injuries) and on the economy (property and crop damage).
The analysis shows that tornadoes are the most harmful events to population health, while floods cause the greatest economic damage.
This report is useful for a government or municipal manager who might be responsible for preparing for severe weather events and will need to prioritize resources for different types of events.
The data for this assignment can be downloaded from the course web site: Storm Data [47Mb]. The events in the database start in the year 1950 and end in November 2011.
In the earlier years there are generally fewer events recorded, while the records for more recent years are more complete.
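One way to check how record counts vary over time is to count events per year. The sketch below is illustrative only: it assumes a `BGN_DATE` column holding dates in month/day/year format, and it uses a small hypothetical data frame (`toy.events`) in place of the real data.

```r
# Toy stand-in for the raw data; the BGN_DATE column and its format are assumptions
toy.events <- data.frame(
  BGN_DATE = c("4/18/1950 0:00:00", "6/8/1951 0:00:00",
               "5/22/2011 0:00:00", "4/27/2011 0:00:00", "11/28/2011 0:00:00"),
  EVTYPE = c("TORNADO", "TORNADO", "TORNADO", "HAIL", "FLOOD"),
  stringsAsFactors = FALSE
)

# Parse the date part of the string and extract the four-digit year
event.dates <- as.Date(toy.events$BGN_DATE, format = "%m/%d/%Y")
event.year <- as.integer(format(event.dates, "%Y"))

# Count the number of recorded events in each year
events.per.year <- table(event.year)
print(events.per.year)
```

Applied to the full data set, a table like this would make it easy to see how much the number of recorded events grows in later years.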
setwd("F:/Data Science Course/ProgammingAssignment53")
# Check whether the compressed data file "repdata_data_StormData.csv.bz2" already exists
if (!file.exists("repdata_data_StormData.csv.bz2")) {
  # If not, download it
  # (setInternet2 is a Windows-only setting used by older R versions for HTTPS downloads)
  setInternet2(use = TRUE)
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", "F:/Data Science Course/ProgammingAssignment53/repdata_data_StormData.csv.bz2")
}
# Read the data directly from the compressed file
storm.raw.data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"),
                           header = TRUE,
                           nrows = -1,
                           sep = ",",
                           stringsAsFactors = FALSE)
print(nrow(storm.raw.data))
## [1] 902297
print(ncol(storm.raw.data))
## [1] 37
The raw data frame contains 902297 observations and 37 fields.
To reduce the computational load, we keep only the rows where fatalities, injuries, property damage (PROPDMG) or crop damage (CROPDMG) are greater than 0, and select the 7 columns used in this analysis (EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP).
storm.data <- subset(storm.raw.data, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0, select = c(8, 23:28))
message("There are ", nrow(storm.data), " selected observations and ", ncol(storm.data), " fields")
## There are 254633 selected observations and 7 fields
# Check the structure of the selected columns
str(storm.data)
## 'data.frame': 254633 obs. of 7 variables:
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
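The same filtering step can be written with column names instead of positions, which may be easier to read and does not depend on the column order. This is a minimal sketch on a hypothetical data frame (`toy.raw`), not the code used for the results in this report.

```r
# Toy stand-in for the raw data (hypothetical values)
toy.raw <- data.frame(
  STATE = c("AL", "AL", "TX"),
  EVTYPE = c("TORNADO", "HAIL", "FLOOD"),
  FATALITIES = c(0, 0, 2),
  INJURIES = c(15, 0, 0),
  PROPDMG = c(25, 0, 1.5),
  PROPDMGEXP = c("K", "", "M"),
  CROPDMG = c(0, 0, 0),
  CROPDMGEXP = c("", "", ""),
  stringsAsFactors = FALSE
)

# Columns to keep, referenced by name rather than by index
keep.cols <- c("EVTYPE", "FATALITIES", "INJURIES",
               "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")

# Rows with at least one non-zero health or damage figure
has.impact <- with(toy.raw, FATALITIES > 0 | INJURIES > 0 | PROPDMG > 0 | CROPDMG > 0)

toy.selected <- toy.raw[has.impact, keep.cols]
print(toy.selected)
```

The HAIL row is dropped because all four of its impact columns are zero.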
To analyze the impact on population health, we prepare the dataset to include both injuries and fatalities.
# Calculating events with fatalities
fatalities.data <- aggregate(FATALITIES ~ EVTYPE, data = storm.data, sum, na.rm = TRUE)
names(fatalities.data) <- c("EVENT_TYPE", "FATALITIES")
fatalities.data <- fatalities.data[order(-fatalities.data$FATALITIES), ]
fatalities.data[1:15, ]
## EVENT_TYPE FATALITIES
## 407 TORNADO 5633
## 61 EXCESSIVE HEAT 1903
## 73 FLASH FLOOD 978
## 151 HEAT 937
## 258 LIGHTNING 816
## 423 TSTM WIND 504
## 86 FLOOD 470
## 306 RIP CURRENT 368
## 200 HIGH WIND 248
## 11 AVALANCHE 224
## 481 WINTER STORM 206
## 307 RIP CURRENTS 204
## 153 HEAT WAVE 172
## 67 EXTREME COLD 160
## 364 THUNDERSTORM WIND 133
# Calculating events with injuries
injuries.data <- aggregate(INJURIES ~ EVTYPE, data = storm.data, sum, na.rm = TRUE)
names(injuries.data) = c("EVENT_TYPE", "INJURIES")
injuries.data = injuries.data[order(-injuries.data$INJURIES), ]
injuries.data[1:15, ]
## EVENT_TYPE INJURIES
## 407 TORNADO 91346
## 423 TSTM WIND 6957
## 86 FLOOD 6789
## 61 EXCESSIVE HEAT 6525
## 258 LIGHTNING 5230
## 151 HEAT 2100
## 238 ICE STORM 1975
## 73 FLASH FLOOD 1777
## 364 THUNDERSTORM WIND 1488
## 134 HAIL 1361
## 481 WINTER STORM 1321
## 224 HURRICANE/TYPHOON 1275
## 200 HIGH WIND 1137
## 170 HEAVY SNOW 1021
## 471 WILDFIRE 911
# Calculating events with fatalities and injuries
impact.on.health <- aggregate(FATALITIES + INJURIES ~ EVTYPE, data = storm.data, sum,
na.rm = TRUE)
names(impact.on.health) <- c("EVENT_TYPE", "FATALITIES.AND.INJURIES")
impact.on.health = impact.on.health[order(-impact.on.health$FATALITIES.AND.INJURIES), ]
#Merge the data for future use in results section
casualties0 <- merge(fatalities.data, injuries.data)
casualties.data <- merge(casualties0,impact.on.health)
casualties.data <- casualties.data[order(-casualties.data$FATALITIES.AND.INJURIES), ]
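As an alternative to three separate aggregations followed by two merges, both totals can be computed in one call by putting `cbind(FATALITIES, INJURIES)` on the left-hand side of the formula. The sketch below uses a hypothetical data frame (`toy.storm`) and is only meant to illustrate the idea.

```r
# Toy stand-in for storm.data (hypothetical values)
toy.storm <- data.frame(
  EVTYPE = c("TORNADO", "TORNADO", "HEAT", "FLOOD", "HEAT"),
  FATALITIES = c(1, 0, 3, 2, 1),
  INJURIES = c(15, 2, 0, 4, 6),
  stringsAsFactors = FALSE
)

# Sum fatalities and injuries per event type in a single aggregation
toy.casualties <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE,
                            data = toy.storm, FUN = sum)

# Add the combined total and sort from most to least harmful
toy.casualties$FATALITIES.AND.INJURIES <-
  toy.casualties$FATALITIES + toy.casualties$INJURIES
toy.casualties <- toy.casualties[order(-toy.casualties$FATALITIES.AND.INJURIES), ]
print(toy.casualties)
```

This produces one row per event type with all three totals, so no merge step is needed.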
To analyze the impact on the economy, we prepare the dataset to include both property and crop damage in order to find the events with the greatest economic consequences.
Each damage value is represented by two parts, “-DMG” (numeric amount) and “-DMGEXP” (alphanumeric exponent code), so we use the following steps:
Keeping only the rows whose exponent codes are valid (“”, K, M, B)
expData <- storm.data[storm.data$PROPDMGEXP %in% c("", "K", "M", "B") & storm.data$CROPDMGEXP %in% c("", "K", "M", "B"), ]
We need to transform the exponent codes (“”, K, M, B) for both crop and property damage into numeric multipliers and multiply them by the recorded damage amounts. The function below converts an exponent code to a number and returns the total damage using the formula DMG * Exponent, where K = 1,000, M = 1,000,000, B = 1,000,000,000 and an empty code leaves the amount unchanged.
convExponent <- function(dmg, exp) {
  if (exp == "K") {
    dmg * 1000        # thousands
  } else if (exp == "M") {
    dmg * 1e+06       # millions
  } else if (exp == "B") {
    dmg * 1e+09       # billions
  } else if (exp == "") {
    dmg               # no exponent: amount is used as-is
  } else {
    stop("NOT VALID DATA")
  }
}
Applying conversion function to CROPDMG and PROPDMG, and adding two new fields with total damage amounts
expData$PROP_DMG <- mapply(convExponent, expData$PROPDMG, expData$PROPDMGEXP)
expData$CROP_DMG <- mapply(convExponent, expData$CROPDMG, expData$CROPDMGEXP)
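Because `mapply` calls the conversion function once per row, it can be slow on a few hundred thousand rows. A vectorized variant based on a named lookup vector is sketched below; the names `exp.multiplier` and `convExponentVec` are hypothetical and not part of the original analysis.

```r
# Multiplier for each non-empty exponent code
exp.multiplier <- c(K = 1e3, M = 1e6, B = 1e9)

# Vectorized conversion: looks up all multipliers at once
convExponentVec <- function(dmg, exp) {
  mult <- unname(exp.multiplier[exp])  # NA for "" and for unknown codes
  mult[exp == ""] <- 1                 # empty code: amount is used as-is
  if (any(is.na(mult))) {
    stop("NOT VALID DATA")             # same behaviour as convExponent for bad codes
  }
  dmg * mult
}

# Usage on a few hypothetical values
toy.dmg <- convExponentVec(c(25, 2.5, 1, 7), c("K", "M", "B", ""))
print(toy.dmg)
```

With this variant the two new columns could be filled with a single call each, for example `convExponentVec(expData$PROPDMG, expData$PROPDMGEXP)`, under the same assumption that invalid codes have already been filtered out.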
Calculate which events have the greatest economic consequences (crop and property damage) and convert the total economic impact to “million dollars”
#calculation crop damages
crop.damage <- aggregate(expData$CROP_DMG ~ EVTYPE, data = expData, sum, na.rm = TRUE)
names(crop.damage) <- c("EVENT_TYPE", "CROP_TOTAL_DMG")
crop.damage <- crop.damage[order(-crop.damage$CROP_TOTAL_DMG),]
crop.damage$cropMILLS <- crop.damage$CROP_TOTAL_DMG/10^6
crop.damage[1:15,c(1,3)]
## EVENT_TYPE cropMILLS
## 48 DROUGHT 13972.5660
## 84 FLOOD 5661.9685
## 305 RIVER FLOOD 5029.4590
## 233 ICE STORM 5022.1100
## 131 HAIL 3000.5375
## 210 HURRICANE 2741.9100
## 219 HURRICANE/TYPHOON 2607.8728
## 72 FLASH FLOOD 1420.7271
## 66 EXTREME COLD 1292.9730
## 111 FROST/FREEZE 1094.0860
## 156 HEAVY RAIN 733.3998
## 411 TROPICAL STORM 678.3460
## 196 HIGH WIND 638.5713
## 417 TSTM WIND 554.0073
## 60 EXCESSIVE HEAT 492.4020
#calculate the property damages
property.damage <- aggregate(expData$PROP_DMG ~ EVTYPE, data = expData, sum, na.rm = TRUE)
names(property.damage) <- c("EVENT_TYPE", "PROP_TOTAL_DMG")
property.damage <- property.damage[order(-property.damage$PROP_TOTAL_DMG),]
property.damage$propMILLS <- property.damage$PROP_TOTAL_DMG/10^6
property.damage[1:15,c(1,3)]
## EVENT_TYPE propMILLS
## 84 FLOOD 144657.710
## 219 HURRICANE/TYPHOON 69305.840
## 401 TORNADO 56925.485
## 345 STORM SURGE 43323.536
## 72 FLASH FLOOD 16140.812
## 131 HAIL 15727.166
## 210 HURRICANE 11868.319
## 411 TROPICAL STORM 7703.891
## 475 WINTER STORM 6688.497
## 196 HIGH WIND 5270.046
## 305 RIVER FLOOD 5118.945
## 465 WILDFIRE 4765.114
## 346 STORM SURGE/TIDE 4641.188
## 417 TSTM WIND 4484.928
## 233 ICE STORM 3944.928
#calculate the combined crop and property damages
economic.damage <- aggregate(expData$CROP_DMG + expData$PROP_DMG ~ EVTYPE,
                             data = expData, sum, na.rm = TRUE)
names(economic.damage) <- c("EVENT_TYPE", "CROP_PROP_TOTAL_DMG")
economic.damage<- economic.damage[order(-economic.damage$CROP_PROP_TOTAL_DMG),]
economic.damage$ECODMGMILLS <- economic.damage$CROP_PROP_TOTAL_DMG/10^6
#merge data for future use in result section
economic0 <- merge(crop.damage,property.damage)
economic.data <- merge(economic0,economic.damage)
economic.data <- economic.data[order(-economic.data$CROP_PROP_TOTAL_DMG), ]
Question 1. Across the United States, which types of events (as indicated in the [EVTYPE] variable) are most harmful with respect to human health?
casualties.data[1:15, ]
## EVENT_TYPE FATALITIES INJURIES FATALITIES.AND.INJURIES
## 407 TORNADO 5633 91346 96979
## 61 EXCESSIVE HEAT 1903 6525 8428
## 423 TSTM WIND 504 6957 7461
## 86 FLOOD 470 6789 7259
## 258 LIGHTNING 816 5230 6046
## 151 HEAT 937 2100 3037
## 73 FLASH FLOOD 978 1777 2755
## 238 ICE STORM 89 1975 2064
## 364 THUNDERSTORM WIND 133 1488 1621
## 481 WINTER STORM 206 1321 1527
## 200 HIGH WIND 248 1137 1385
## 134 HAIL 15 1361 1376
## 224 HURRICANE/TYPHOON 64 1275 1339
## 170 HEAVY SNOW 127 1021 1148
## 471 WILDFIRE 75 911 986
The following plot shows that tornadoes are the weather event most harmful to population health (91,346 injuries and 5,633 fatalities, for a total of 96,979 casualties).
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.1.2
ggplot(impact.on.health[1:15, ], aes(x = reorder(EVENT_TYPE, FATALITIES.AND.INJURIES), y = FATALITIES.AND.INJURIES)) +
geom_bar(stat = "identity",fill="red") + coord_flip() +
labs(x = "Event types", y = "Fatalities & Injuries", title = "Top 15 Weather Events with Fatalities & Injuries" )
Question 2. Across the United States, which types of events (as indicated in the [EVTYPE] variable) have the greatest economic consequences?
economic.data[1:15,c(1,3,5,7)]
## EVENT_TYPE cropMILLS propMILLS ECODMGMILLS
## 84 FLOOD 5661.9685 144657.710 150319.678
## 219 HURRICANE/TYPHOON 2607.8728 69305.840 71913.713
## 401 TORNADO 364.9501 56925.485 57290.436
## 345 STORM SURGE 0.0050 43323.536 43323.541
## 131 HAIL 3000.5375 15727.166 18727.703
## 72 FLASH FLOOD 1420.7271 16140.812 17561.539
## 48 DROUGHT 13972.5660 1046.106 15018.672
## 210 HURRICANE 2741.9100 11868.319 14610.229
## 305 RIVER FLOOD 5029.4590 5118.945 10148.405
## 233 ICE STORM 5022.1100 3944.928 8967.038
## 411 TROPICAL STORM 678.3460 7703.891 8382.237
## 475 WINTER STORM 26.9440 6688.497 6715.441
## 196 HIGH WIND 638.5713 5270.046 5908.618
## 465 WILDFIRE 295.4728 4765.114 5060.587
## 417 TSTM WIND 554.0073 4484.928 5038.936
The following plot shows that floods are the weather event most harmful to the economy (crop damage = $5,661.97M and property damage = $144,657.71M, for a total economic impact of $150,319.68M).
ggplot(economic.damage[1:15, ], aes(x = reorder(EVENT_TYPE, ECODMGMILLS), y = ECODMGMILLS)) +
geom_bar(stat = "identity",fill="red") + coord_flip() + labs(x = "Event types", y = "Crop & Property Damage (in Millions) ",
title = "Top 15 Weather Events with Crop & Property Damages")
Tornadoes are the weather events most harmful to population health, with 96,979 casualties (91,346 injuries and 5,633 deaths).
Floods caused the most economic damage: $150,319.68M ($5,661.97M in crop damage and $144,657.71M in property damage).