Weather events differ widely in their consequences for public health and the economy. Some have severe effects, while others can largely be ignored. It is therefore important to study how these events affect health and the economy, so that precautions can be taken.
In this case study, we use the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. It tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
The data is loaded into R only if it is not already in the workspace.
# Reading the full file is slow, so skip it when stormData already exists
if (!exists("stormData")) {
  stormData <- read.csv(file = "repdata_data_StormData.csv.bz2")
}
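No separate decompression step is needed above because read.csv() reads bzip2-compressed files directly. The following is a minimal sketch of that behaviour using a temporary file and made-up rows; the names `tmp`, `con`, and `toy` are illustrative and not part of the original analysis.

```r
# Write a tiny, made-up data frame to a bzip2-compressed CSV file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
write.csv(
  data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0)),
  con,
  row.names = FALSE
)
close(con)

# read.csv() detects the compression and reads the file as ordinary CSV
toy <- read.csv(tmp)
toy_dim <- dim(toy)
print(toy)

# Remove the temporary file
unlink(tmp)
```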
Initial exploration of stormData
dim(stormData)
## [1] 902297 37
str(stormData)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
The column that records the event type is EVTYPE.
For the health impact, we consider the following columns: FATALITIES and INJURIES.
For the economic impact, we consider the following columns: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
These columns are extracted from stormData to get a smaller, focused dataset.
req <- c("EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP", "CROPDMG", "CROPDMGEXP")
req_stormData <- stormData[, req]
dim(req_stormData)
## [1] 902297 7
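The economic damage is given in two parts each for property and for crops. PROPDMGEXP and CROPDMGEXP hold the power of 10 by which PROPDMG and CROPDMG are multiplied.
The characters in PROPDMGEXP and CROPDMGEXP follow the same notation: H or h stands for hundreds (10^2), K or k for thousands (10^3), M or m for millions (10^6), and B or b for billions (10^9). All other values are assigned an exponent of 0.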
# Function to convert the letter codes into the matching power of 10
exp_unit <- function(dmgexp){
  # Normalise case so that "k" and "K" are handled alike
  code <- toupper(as.character(dmgexp))
  # Blank or unrecognised codes keep the default exponent 0 (multiplier of 1)
  exponent <- rep(0, length(code))
  exponent[code == "H"] <- 2  # hundreds
  exponent[code == "K"] <- 3  # thousands
  exponent[code == "M"] <- 6  # millions
  exponent[code == "B"] <- 9  # billions
  exponent
}
# Calculating the total cost in USD for each entry in the dataset
req_stormData <- req_stormData %>%
  mutate(propusd = PROPDMG * 10^exp_unit(PROPDMGEXP)) %>%
  mutate(cropusd = CROPDMG * 10^exp_unit(CROPDMGEXP)) %>%
  mutate(totalusd = cropusd + propusd)
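As a worked example of the conversion: the first record in the str() output has PROPDMG 25 with PROPDMGEXP "K", which corresponds to 25 * 10^3 = 25,000 USD. The sketch below repeats that arithmetic in base R on a few made-up rows. The lookup vector `exp_lookup` and the data frame `toy` are hypothetical names used only for illustration, and treating unknown codes as exponent 0 is an assumption of this sketch.

```r
# Toy records (made-up values) shaped like PROPDMG / PROPDMGEXP
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3, 10),
  PROPDMGEXP = c("K", "k", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Hypothetical lookup table: letter code -> power of ten
exp_lookup <- c(H = 2, K = 3, M = 6, B = 9)

# Codes that are not in the lookup come back as NA; treat them as exponent 0
toy$exponent <- unname(exp_lookup[toupper(toy$PROPDMGEXP)])
toy$exponent[is.na(toy$exponent)] <- 0

# Damage in USD = amount * 10^exponent
toy$propusd <- toy$PROPDMG * 10^toy$exponent
print(toy)
```

A named lookup vector keeps the code-to-exponent mapping in one place, which makes it easy to inspect or extend.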
Total fatalities for each event type
# Grouping all occurrences of each event type and summing their fatalities
total_fatalities <- req_stormData %>%
  select(EVTYPE, FATALITIES) %>%
  group_by(EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES)) %>%
  arrange(desc(total_fatalities))
head(total_fatalities, 10)
## # A tibble: 10 x 2
## EVTYPE total_fatalities
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
Total injuries for each event type
# Grouping all occurrences of each event type and summing their injuries
total_injuries <- req_stormData %>%
  select(EVTYPE, INJURIES) %>%
  group_by(EVTYPE) %>%
  summarise(total_injuries = sum(INJURIES)) %>%
  arrange(desc(total_injuries))
head(total_injuries, 10)
## # A tibble: 10 x 2
## EVTYPE total_injuries
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
Total cost in USD for each event type
# Grouping all occurrences of each event type and summing their total cost
total_cost <- req_stormData %>%
  select(EVTYPE, totalusd) %>%
  group_by(EVTYPE) %>%
  summarise(total_cost = sum(totalusd, na.rm = TRUE)) %>%
  arrange(desc(total_cost))
head(total_cost, 10)
## # A tibble: 10 x 2
## EVTYPE total_cost
## <chr> <dbl>
## 1 FLOOD 138007444500
## 2 HURRICANE/TYPHOON 29348167800
## 3 TORNADO 16570326363
## 4 HURRICANE 12405268000
## 5 RIVER FLOOD 10108369000
## 6 HAIL 10048596590
## 7 FLASH FLOOD 8716525177
## 8 ICE STORM 5925150850
## 9 STORM SURGE/TIDE 4641493000
## 10 THUNDERSTORM WIND 3813647990
plot1 <- ggplot(total_injuries[1:10, ], aes(x = EVTYPE, y = total_injuries)) +
  geom_bar(stat = "identity") +
  labs(x = "Event Type", y = "Number of Injuries", title = "Events with Highest Injuries") +
  theme(axis.text = element_text(size = 5))
plot1
plot2 <- ggplot(total_fatalities[1:10, ], aes(x = EVTYPE, y = total_fatalities)) +
  geom_bar(stat = "identity") +
  labs(x = "Event Type", y = "Number of Fatalities", title = "Events with Highest Fatalities") +
  theme(axis.text = element_text(size = 5))
plot2
As the figures above show, tornadoes account for the most fatalities and the most injuries.
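Because EVTYPE is a character column, ggplot2 places the bars in alphabetical order rather than by count. A minimal sketch of one way to sort the bars by value uses reorder() on the top five fatality totals from the table above. The names `top5`, `bar_order`, and `sorted_plot` are illustrative, and the plotting step is skipped if ggplot2 is not installed.

```r
# Top five rows of the fatalities table, typed in by hand for this sketch
top5 <- data.frame(
  EVTYPE = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  total_fatalities = c(5633, 1903, 978, 937, 816),
  stringsAsFactors = FALSE
)

# reorder() turns EVTYPE into a factor whose levels follow the negated totals,
# so the event type with the largest total becomes the first level
top5$EVTYPE <- reorder(top5$EVTYPE, -top5$total_fatalities)
bar_order <- levels(top5$EVTYPE)
print(bar_order)

# ggplot2 draws discrete axes in factor-level order
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  sorted_plot <- ggplot(top5, aes(x = EVTYPE, y = total_fatalities)) +
    geom_col() +
    labs(x = "Event Type", y = "Number of Fatalities")
}
```

Sorting through factor levels leaves the underlying data frame untouched, so the same summary table can still feed other steps.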
# Named plot3 so that the object does not mask the base plot() function
plot3 <- ggplot(total_cost[1:10, ], aes(x = EVTYPE, y = total_cost / 10^9)) +
  geom_bar(stat = "identity") +
  labs(x = "Event Type", y = "Total Cost (billion USD)", title = "Events with Highest Economic Impact") +
  theme(axis.text = element_text(size = 5))
plot3
As the figure above shows, floods are responsible for the highest economic impact.