Load dplyr and read the data into the environment.
library(dplyr)
weather = read.csv("StormData.csv", stringsAsFactors = FALSE)
For the purposes of our analysis, we will look at the number of injuries and fatalities caused by each type of event (indicated by EVTYPE).
weather_pop = weather %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES), injuries = sum(INJURIES))
weather_fatalities = head(weather_pop[order(weather_pop$fatalities, decreasing = TRUE), c(1, 2)], 10)
weather_injuries = head(weather_pop[order(weather_pop$injuries, decreasing = TRUE), c(1, 3)], 10)
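The top-10 step above picks columns by position, which breaks silently if the column order of the summary ever changes. As a minimal sketch of an alternative, the same ranking can be written with dplyr verbs that refer to columns by name. The toy data frame and the names toy, toy_pop and top_fatalities are made up for illustration and are not part of the analysis.

```r
library(dplyr)

# Toy stand-in for the storm data (values are invented for illustration)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 0, 1),
  stringsAsFactors = FALSE
)

# Same aggregation as in the analysis: one row per event type
toy_pop <- toy %>%
  group_by(EVTYPE) %>%
  summarise(fatalities = sum(FATALITIES), injuries = sum(INJURIES))

# Pick columns by name, sort in descending order, keep the first n rows
top_fatalities <- toy_pop %>%
  select(EVTYPE, fatalities) %>%
  arrange(desc(fatalities)) %>%
  head(2)

top_fatalities
```

With the real data, replacing head(2) with head(10) should give the same table as weather_fatalities.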
The ten event types with the most fatalities:
weather_fatalities
## # A tibble: 10 x 2
## EVTYPE fatalities
## <chr> <dbl>
## 1 TORNADO 5633
## 2 EXCESSIVE HEAT 1903
## 3 FLASH FLOOD 978
## 4 HEAT 937
## 5 LIGHTNING 816
## 6 TSTM WIND 504
## 7 FLOOD 470
## 8 RIP CURRENT 368
## 9 HIGH WIND 248
## 10 AVALANCHE 224
The ten event types with the most injuries:
weather_injuries
## # A tibble: 10 x 2
## EVTYPE injuries
## <chr> <dbl>
## 1 TORNADO 91346
## 2 TSTM WIND 6957
## 3 FLOOD 6789
## 4 EXCESSIVE HEAT 6525
## 5 LIGHTNING 5230
## 6 HEAT 2100
## 7 ICE STORM 1975
## 8 FLASH FLOOD 1777
## 9 THUNDERSTORM WIND 1488
## 10 HAIL 1361
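Property damage (PROPDMG) and crop damage (CROPDMG) are each stored as a base number plus a magnitude code (PROPDMGEXP and CROPDMGEXP). Before summing, we convert them to their actual values: H means hundreds, K thousands, M millions and B billions. Any other code, including a blank, is given a multiplier of 0.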
weather_adj = weather %>%
  mutate(prop_mod_num = ifelse(PROPDMGEXP %in% c('h','H'),100,ifelse(PROPDMGEXP %in% c('k','K'), 1000, ifelse(PROPDMGEXP %in% c('m','M'),1000000,ifelse(PROPDMGEXP %in% c('b','B'),1000000000,0))))) %>%
  mutate(crop_mod_num = ifelse(CROPDMGEXP %in% c('h','H'),100,ifelse(CROPDMGEXP %in% c('k','K'), 1000, ifelse(CROPDMGEXP %in% c('m','M'),1000000,ifelse(CROPDMGEXP %in% c('b','B'),1000000000,0))))) %>%
  mutate(prop_dmg_adj = PROPDMG*prop_mod_num, crop_dmg_adj = CROPDMG*crop_mod_num)
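The nested ifelse calls are hard to read and are repeated for both columns. A minimal sketch of the same mapping as a named-vector lookup follows. The helper name exp_to_multiplier is hypothetical and not part of the original analysis. It follows the same rule as the code above: H, K, M and B are matched in either case, and everything else maps to 0.

```r
# Hypothetical helper: map a magnitude code to its numeric multiplier.
# Codes other than H, K, M, B (blanks, digits, symbols) map to 0.
exp_to_multiplier <- function(code) {
  lookup <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unmatched codes come back as NA
  mult[is.na(mult)] <- 0
  mult
}

codes <- c("K", "m", "B", "", "5", "h")
multipliers <- exp_to_multiplier(codes)
multipliers

# Worked example: a recorded damage of 25 with code "K" means 25 * 1000
damage <- 25 * exp_to_multiplier("K")
damage
```

Inside the pipeline, this would replace each nested ifelse with a call such as mutate(prop_mod_num = exp_to_multiplier(PROPDMGEXP)).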
Now that we have transformed the data, we can group by EVTYPE and sum the damages for each event type.
damages = weather_adj %>%
  group_by(EVTYPE) %>%
  summarise(cost_prop = sum(prop_dmg_adj), cost_crop = sum(crop_dmg_adj))
The ten event types with the highest property damage cost:
weather_prop = head(damages[order(damages$cost_prop, decreasing = TRUE), 1:2], 10)
weather_prop
## # A tibble: 10 x 2
## EVTYPE cost_prop
## <chr> <dbl>
## 1 FLOOD 144657709800
## 2 HURRICANE/TYPHOON 69305840000
## 3 TORNADO 56937160480
## 4 STORM SURGE 43323536000
## 5 FLASH FLOOD 16140811510
## 6 HAIL 15732267220
## 7 HURRICANE 11868319010
## 8 TROPICAL STORM 7703890550
## 9 WINTER STORM 6688497250
## 10 HIGH WIND 5270046260
The ten event types with the highest crop damage cost:
weather_crop = head(damages[order(damages$cost_crop, decreasing = TRUE), c(1, 3)], 10)
weather_crop
## # A tibble: 10 x 2
## EVTYPE cost_crop
## <chr> <dbl>
## 1 DROUGHT 13972566000
## 2 FLOOD 5661968450
## 3 RIVER FLOOD 5029459000
## 4 ICE STORM 5022113500
## 5 HAIL 3025954450
## 6 HURRICANE 2741910000
## 7 HURRICANE/TYPHOON 2607872800
## 8 FLASH FLOOD 1421317100
## 9 EXTREME COLD 1292973000
## 10 FROST/FREEZE 1094086000
Plot for fatalities:
library(ggplot2)
ggplot(weather_fatalities, aes(x = EVTYPE, y = fatalities)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Fatalities by EVTYPE")
Plot for property damage:
ggplot(weather_prop, aes(x = EVTYPE, y = cost_prop)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Property Damage by EVTYPE")
Plot for crop damage:
ggplot(weather_crop, aes(x = EVTYPE, y = cost_crop)) +
  geom_bar(stat = "identity") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Crop Damage by EVTYPE")
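In the plots above, ggplot2 places the event types on the x axis in alphabetical order, so the bars do not appear in ranked order even though the tables are sorted. One possible way to draw the bars from largest to smallest is to wrap the x variable in reorder(). This is a sketch on a three-row toy table whose values are copied from the fatalities output above. The names toy_fatalities, ordered_levels and p are illustrative only.

```r
library(ggplot2)

# Toy table: the top three rows of the fatalities output
toy_fatalities <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD"),
  fatalities = c(5633, 1903, 978),
  stringsAsFactors = FALSE
)

# reorder() builds a factor whose levels are sorted by -fatalities,
# so the event type with the most fatalities comes first
ordered_levels <- levels(reorder(toy_fatalities$EVTYPE, -toy_fatalities$fatalities))
ordered_levels

p <- ggplot(toy_fatalities, aes(x = reorder(EVTYPE, -fatalities), y = fatalities)) +
  geom_col() +  # same bars as geom_bar(stat = "identity")
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  labs(title = "Fatalities by EVTYPE", x = "EVTYPE")
print(p)
```

The same change to aes() should carry over to the property and crop damage plots, with cost_prop or cost_crop in place of fatalities.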
As we can see, floods cause the most property damage and droughts cause the most crop damage. Tornadoes cause by far the most injuries and fatalities, yet less property damage than floods. This suggests that our evacuation procedures work better for floods than for tornadoes, which is worth looking into.