This course project challenged us to analyze and explore the National Oceanic and Atmospheric Administration (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
The events in the database start in the year 1950 and end in November 2011. The earlier years contain fewer recorded events, most likely because of incomplete record keeping, so more recent years should be considered more complete. The data can be found here, additional information about the data and its documentation can be found here, and an FAQ about the National Climatic Data Center Storm Events can be found here.
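We were tasked with answering two questions: across the United States, which types of weather events are most harmful with respect to population health, and which types have the greatest economic consequences?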
After analyzing the data, we found that tornadoes cause the most harm to population health (fatalities plus injuries), while floods have the greatest economic consequences (property plus crop damage).
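We first load the libraries and read in the data. read.csv can read the bzip2-compressed file directly, so it does not need to be decompressed beforehand.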
library(data.table)
library(ggplot2)
data <- read.csv("./repdata_data_StormData.csv.bz2", header = TRUE, sep = ",")
head(data)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE  EVTYPE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL TORNADO
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL TORNADO
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL TORNADO
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL TORNADO
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL TORNADO
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL TORNADO
##   BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END COUNTYENDN
## 1         0                                               0         NA
## 2         0                                               0         NA
## 3         0                                               0         NA
## 4         0                                               0         NA
## 5         0                                               0         NA
## 6         0                                               0         NA
##   END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES INJURIES PROPDMG
## 1         0                      14.0   100 3   0          0       15    25.0
## 2         0                       2.0   150 2   0          0        0     2.5
## 3         0                       0.1   123 2   0          0        2    25.0
## 4         0                       0.0   100 2   0          0        2     2.5
## 5         0                       0.0   150 2   0          0        2     2.5
## 6         0                       1.5   177 2   0          0        6     2.5
##   PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES LATITUDE LONGITUDE
## 1          K       0                                         3040      8812
## 2          K       0                                         3042      8755
## 3          K       0                                         3340      8742
## 4          K       0                                         3458      8626
## 5          K       0                                         3412      8642
## 6          K       0                                         3450      8748
##   LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1       3051       8806              1
## 2          0          0              2
## 3          0          0              3
## 4          0          0              4
## 5          0          0              5
## 6          0          0              6
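The introduction notes that earlier years of the database contain fewer recorded events. One way to check this is to count events per year from BGN_DATE before that column is dropped. The sketch below assumes BGN_DATE keeps the "month/day/year h:mm:ss" format shown in the head() output; the helper name events_per_year is hypothetical and not part of the original analysis.

```r
# Hypothetical helper: count recorded events per calendar year.
# Assumes strings like "4/18/1950 0:00:00"; as.Date ignores the trailing time part.
events_per_year <- function(bgn_date) {
  dates <- as.Date(bgn_date, format = "%m/%d/%Y")
  table(format(dates, "%Y"))
}

# Example on a few of the dates shown in the head() output
events_per_year(c("4/18/1950 0:00:00", "4/18/1950 0:00:00",
                  "2/20/1951 0:00:00", "6/8/1951 0:00:00"))
```

Applied to data$BGN_DATE right after loading (before the subsetting step removes BGN_DATE), this would show whether the yearly counts grow over time as the introduction suggests.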
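Next, we subset the data to the columns describing health and economic impact, and drop rows that have an unknown event type ("?") or no recorded fatalities, injuries, property damage, or crop damage. Rows where all four measures are zero contribute nothing to the totals, so removing them only makes the data smaller and faster to process.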
# Keep only the columns needed for the health and economic analysis
subset_data <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
data <- as.data.table(data[, subset_data])
# Drop rows with an unknown event type ("?") or no recorded harm of any kind
data <- data[EVTYPE != "?" & (INJURIES > 0 | FATALITIES > 0 | PROPDMG > 0 | CROPDMG > 0)]
summary(data)
##       EVTYPE           FATALITIES           INJURIES            PROPDMG
##  Length:254632      Min.   :  0.00000   Min.   :   0.0000   Min.   :   0.00
##  Class :character   1st Qu.:  0.00000   1st Qu.:   0.0000   1st Qu.:   2.00
##  Mode  :character   Median :  0.00000   Median :   0.0000   Median :   5.00
##                     Mean   :  0.05948   Mean   :   0.5519   Mean   :  42.75
##                     3rd Qu.:  0.00000   3rd Qu.:   0.0000   3rd Qu.:  25.00
##                     Max.   :583.00000   Max.   :1700.0000   Max.   :5000.00
##     PROPDMGEXP          CROPDMG          CROPDMGEXP
##  Length:254632      Min.   :  0.000   Length:254632
##  Class :character   1st Qu.:  0.000   Class :character
##  Mode  :character   Median :  0.000   Mode  :character
##                     Mean   :  5.411
##                     3rd Qu.:  0.000
##                     Max.   :990.000
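The filter keeps a row only when the event type is known and at least one of the four harm measures is positive. A minimal base-R sketch of the same rule, run on a small made-up data frame (the function name keep_harmful and all toy values are hypothetical), makes it easy to confirm that the dropped all-zero rows do not change the totals:

```r
# Hypothetical helper mirroring the data.table filter above, in base R
keep_harmful <- function(df) {
  harmful <- df$INJURIES > 0 | df$FATALITIES > 0 | df$PROPDMG > 0 | df$CROPDMG > 0
  df[df$EVTYPE != "?" & harmful, , drop = FALSE]
}

# Made-up example: the HAIL row records no harm, the "?" row has an unknown type
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HAIL", "?"),
  FATALITIES = c(1, 0, 0),
  INJURIES   = c(3, 0, 2),
  PROPDMG    = c(25, 0, 5),
  CROPDMG    = c(0, 0, 0),
  stringsAsFactors = FALSE
)
kept <- keep_harmful(toy)
nrow(kept)  # only the TORNADO row survives
# Dropping the all-zero HAIL row leaves the injury total (excluding "?") unchanged
sum(toy$INJURIES[toy$EVTYPE != "?"]) == sum(kept$INJURIES)
```

Note that the "?" row is excluded on purpose, so its injuries do not count toward any event type.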
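Next, we convert the damage figures into dollar amounts. PROPDMG and CROPDMG hold the damage values, while PROPDMGEXP and CROPDMGEXP hold magnitude codes: letters such as H (hundreds), K (thousands), M (millions), and B (billions), or digits, which are treated here as powers of ten. Each code is mapped to a numeric multiplier, with empty or unrecognised codes treated as 1, and the cost is the damage value times the multiplier. Having actual dollar figures lets the costs be summed and compared directly in the graphs.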
# Normalise the magnitude codes to upper case (e.g. "k" -> "K", "m" -> "M")
damage <- c("PROPDMGEXP", "CROPDMGEXP")
data[, (damage) := lapply(.SD, toupper), .SDcols = damage]
# Map each magnitude code to a numeric multiplier
prop_damage <- c("-" = 10^0, "+" = 10^0, "0" = 10^0, "1" = 10^1, "2" = 10^2, "3" = 10^3,
                 "4" = 10^4, "5" = 10^5, "6" = 10^6, "7" = 10^7, "8" = 10^8, "9" = 10^9,
                 "H" = 10^2, "K" = 10^3, "M" = 10^6, "B" = 10^9)
crop_damage <- c("?" = 10^0, "0" = 10^0, "K" = 10^3, "M" = 10^6, "B" = 10^9)
# Look up each row's multiplier; empty or unrecognised codes give NA, which is then set to 1
data[, PROPDMGEXP := unname(prop_damage[PROPDMGEXP])]
data[is.na(PROPDMGEXP), PROPDMGEXP := 10^0]
data[, CROPDMGEXP := unname(crop_damage[CROPDMGEXP])]
data[is.na(CROPDMGEXP), CROPDMGEXP := 10^0]
# Dollar cost = damage value x multiplier
data <- data[, .(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,
                 PROPCOST = PROPDMG * PROPDMGEXP,
                 CROPDMG, CROPDMGEXP, CROPCOST = CROPDMG * CROPDMGEXP)]
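To see what the lookup does to individual records, here is a minimal standalone sketch that applies the same property-damage mapping to a few values. The function name to_dollars is hypothetical; like the code above, it upper-cases each code and treats empty or unrecognised codes as a multiplier of 1.

```r
# Hypothetical helper using the same code-to-multiplier mapping as prop_damage
to_dollars <- function(value, code) {
  multipliers <- c("-" = 1, "+" = 1, "0" = 1,
                   "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4, "5" = 1e5,
                   "6" = 1e6, "7" = 1e7, "8" = 1e8, "9" = 1e9,
                   "H" = 1e2, "K" = 1e3, "M" = 1e6, "B" = 1e9)
  m <- unname(multipliers[toupper(code)])
  m[is.na(m)] <- 1  # "" or unknown codes: keep the value as recorded
  value * m
}

# 25 "K" -> 25,000; 2.5 "m" -> 2,500,000; 5 "B" -> 5 billion; 1.5 "" -> 1.5
to_dollars(c(25, 2.5, 5, 1.5), c("K", "m", "B", ""))
```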
Next, we total the fatalities and injuries for each event type and keep the ten event types with the highest combined count.
harmful_events <- data[, .(FATALITIES = sum(FATALITIES), INJURIES = sum(INJURIES),
TOTAL_HARMFUL_EVENTS = sum(FATALITIES) + sum(INJURIES)), by = .(EVTYPE)]
harmful_events <- harmful_events[order(-TOTAL_HARMFUL_EVENTS),]
harmful_events <- harmful_events[1:10,]
head(harmful_events, 10)
## EVTYPE FATALITIES INJURIES TOTAL_HARMFUL_EVENTS
## <char> <num> <num> <num>
## 1: TORNADO 5633 91346 96979
## 2: EXCESSIVE HEAT 1903 6525 8428
## 3: TSTM WIND 504 6957 7461
## 4: FLOOD 470 6789 7259
## 5: LIGHTNING 816 5230 6046
## 6: HEAT 937 2100 3037
## 7: FLASH FLOOD 978 1777 2755
## 8: ICE STORM 89 1975 2064
## 9: THUNDERSTORM WIND 133 1488 1621
## 10: WINTER STORM 206 1321 1527
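How dominant tornadoes are can be quantified from this table alone. The short sketch below, with the ten totals copied by hand from the output above so it runs without the full data, computes each event type's percentage share of the combined harm among these ten and the ratio of tornado harm to the runner-up:

```r
# TOTAL_HARMFUL_EVENTS for the top ten event types, copied from the table above
totals <- c("TORNADO" = 96979, "EXCESSIVE HEAT" = 8428, "TSTM WIND" = 7461,
            "FLOOD" = 7259, "LIGHTNING" = 6046, "HEAT" = 3037, "FLASH FLOOD" = 2755,
            "ICE STORM" = 2064, "THUNDERSTORM WIND" = 1621, "WINTER STORM" = 1527)
# Percentage share of the combined fatalities plus injuries among these ten
share <- round(100 * totals / sum(totals), 1)
share[1:3]
# How many times more harm tornadoes cause than the next event type
totals[["TORNADO"]] / totals[["EXCESSIVE HEAT"]]
```

Among these ten event types, tornadoes account for roughly 71% of all fatalities and injuries.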
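In the same way, we total the property and crop damage costs (in US dollars) for each event type and keep the ten event types with the highest combined cost.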
eco_result <- data[, .(PROPCOST = sum(PROPCOST), CROPCOST = sum(CROPCOST),
TOTAL_ECO_RESULT = sum(PROPCOST) + sum(CROPCOST)), by = .(EVTYPE)]
eco_result <- eco_result[order(-TOTAL_ECO_RESULT),]
eco_result <- eco_result[1:10,]
head(eco_result, 10)
## EVTYPE PROPCOST CROPCOST TOTAL_ECO_RESULT
## <char> <num> <num> <num>
## 1: FLOOD 144657709807 5661968450 150319678257
## 2: HURRICANE/TYPHOON 69305840000 2607872800 71913712800
## 3: TORNADO 56947380677 414953270 57362333947
## 4: STORM SURGE 43323536000 5000 43323541000
## 5: HAIL 15735267513 3025954473 18761221986
## 6: FLASH FLOOD 16822673979 1421317100 18243991079
## 7: DROUGHT 1046106000 13972566000 15018672000
## 8: HURRICANE 11868319010 2741910000 14610229010
## 9: RIVER FLOOD 5118945500 5029459000 10148404500
## 10: ICE STORM 3944927860 5022113500 8967041360
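The table also shows that the two cost components rank differently. The sketch below uses the top-ten values copied by hand from the output above to find which event type leads each component, and expresses the combined totals in billions of dollars, the unit used in the economic plot:

```r
# PROPCOST and CROPCOST for the top ten event types, copied from the table above
eco <- data.frame(
  EVTYPE   = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL",
               "FLASH FLOOD", "DROUGHT", "HURRICANE", "RIVER FLOOD", "ICE STORM"),
  PROPCOST = c(144657709807, 69305840000, 56947380677, 43323536000, 15735267513,
               16822673979, 1046106000, 11868319010, 5118945500, 3944927860),
  CROPCOST = c(5661968450, 2607872800, 414953270, 5000, 3025954473,
               1421317100, 13972566000, 2741910000, 5029459000, 5022113500),
  stringsAsFactors = FALSE
)
# Event type with the largest value in each cost column
leaders <- sapply(c("PROPCOST", "CROPCOST"),
                  function(col) eco$EVTYPE[which.max(eco[[col]])])
leaders
# Combined cost per event type in billions of dollars
round((eco$PROPCOST + eco$CROPCOST) / 1e9, 1)
```

Floods lead property damage, but drought leads crop damage among these ten.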
population_health <- melt(harmful_events, id.vars = "EVTYPE", variable.name = "Labels")
ggplot(population_health, aes(x = reorder(EVTYPE, -value), y = value)) +
geom_bar(stat = "identity", aes(fill = Labels), position = "dodge") +
ylab("Total Fatalities/Injuries") + xlab("Event") +
theme(axis.text.x = element_text(angle=30, hjust=0.5)) +
ggtitle("Weather Events That Are Most Harmful With Respect To Population Health")
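This graph shows that tornadoes cause by far the most fatalities and injuries: 96,979 in total, more than eleven times the 8,428 caused by excessive heat, the next most harmful event type.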
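The plotting code relies on data.table's melt to turn the wide table (one column per measure) into the long format ggplot2 expects, with one row per event type and measure; the Labels column then drives the bar colours through fill = Labels. A minimal illustration on two rows copied from the table above:

```r
library(data.table)

# Two rows of harmful_events, copied from the table above
wide <- data.table(EVTYPE = c("TORNADO", "FLOOD"),
                   FATALITIES = c(5633, 470),
                   INJURIES = c(91346, 6789),
                   TOTAL_HARMFUL_EVENTS = c(96979, 7259))
# Same call as in the plotting code: EVTYPE is kept, the other columns are stacked
long <- melt(wide, id.vars = "EVTYPE", variable.name = "Labels")
long
```

Two event types times three measures gives six rows, each holding one bar's height in the value column.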
eco_consequences <- melt(eco_result, id.vars = "EVTYPE", variable.name = "Labels")
ggplot(eco_consequences, aes(x = reorder(EVTYPE, -value), y = value/1e9)) +
geom_bar(stat = "identity", aes(fill = Labels), position = "dodge") +
ylab("Cost/Damage In Billions") + xlab("Event") +
theme(axis.text.x = element_text(angle=30, hjust=0.5)) +
ggtitle("Weather Events That Have The Greatest Economic Consequence")
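This graph shows that floods cause the greatest combined property and crop damage, about $150 billion, roughly twice the $72 billion caused by hurricanes/typhoons. Among the ten event types shown, drought causes the most crop damage (about $14 billion), although its property damage is comparatively small.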