Using the storm data collected by the U.S. National Oceanic and Atmospheric Administration (NOAA), the human and economic impact of atmospheric phenomena can be quantified.
This data gives us both a visual and an analytic overview of that impact.
Analyzing it can help minimize the losses caused by weather disasters, reducing their economic impact and, above all, their human impact.
If we know which kinds of phenomena are the most dangerous, we can focus our efforts on designing better countermeasures against them.
The data comes from the Storm Data file at the URL below and is used as is in this document. The following code downloads it and loads it into a data frame.
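```r
# Download the compressed Storm Data file once and read it into a data frame
zip.file.URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zip.file.name <- "StormData.csv.bz2"
if (!file.exists(zip.file.name)) {
  # mode = "wb" keeps the binary file intact on Windows
  download.file(zip.file.URL, zip.file.name, mode = "wb")
}
# read.csv() reads the .bz2 file directly; no manual decompression is needed
data.storm <- read.csv(zip.file.name)
```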
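R's file connections detect bzip2 compression when a file is opened for reading, which is why `read.csv()` can load `StormData.csv.bz2` without a separate decompression step. The sketch below checks this on a tiny bzip2 file written to a temporary path; the rows are made up for illustration and are not taken from the Storm Data set.

```r
# Write a tiny bzip2-compressed CSV to a temporary file (made-up rows)
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
writeLines(c("EVTYPE,PROPDMG,PROPDMGEXP",
             "FLOOD,1.55,B",
             "HAIL,25,K"), con)
close(con)

# read.csv() opens the path with file(), which detects the compression
sample.data <- read.csv(tmp)
str(sample.data)
```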
From the documentation and the FAQ we know that the property damage estimate PROPDMG is expressed in US dollars, scaled by a magnitude code stored in PROPDMGEXP.
The documentation describes these estimates as follows:
“Estimates can be obtained from emergency managers, U.S. Geological Survey, U.S. Army Corps of Engineers, power utility companies, and newspaper articles. If the values provided are rough estimates, then this should be stated as such in the narrative. Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions. If additional precision is available, it may be provided in the narrative part of the entry. When damage is due to more than one element of the storm, indicate, when possible, the amount of damage caused by each element. If the dollar amount of damage is unknown, or not available, check the “no information available” box.”
So we need to convert the amount in PROPDMG into its actual dollar value.
The following chunk computes this value and also adds the crop damage, CROPDMG, scaled in the same way by CROPDMGEXP, to obtain a total cost per event.
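```r
library(dplyr)

# Total damage per event = property damage + crop damage, each scaled by its
# magnitude code ("K" = 1e3, "M" = 1e6, "B" = 1e9). Any other code is mapped
# to 0, so those amounts are left out of the total.
data.storm.totalized <- mutate(
  data.storm,
  sum_cost = PROPDMG * ifelse(PROPDMGEXP == "K", 1e3,
                       ifelse(PROPDMGEXP == "M", 1e6,
                       ifelse(PROPDMGEXP == "B", 1e9, 0))) +
             CROPDMG * ifelse(CROPDMGEXP == "K", 1e3,
                       ifelse(CROPDMGEXP == "M", 1e6,
                       ifelse(CROPDMGEXP == "B", 1e9, 0)))
)
```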
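The chunk above maps every code other than the uppercase "K", "M" and "B" to 0, so if the raw data also contains lowercase variants (such as "k" or "m"), those amounts are silently dropped. One possible variant, sketched below, uses a named lookup vector and `toupper()`. The helper `exp_to_multiplier()` and the sample rows are hypothetical, and this variant is not what produced the tables in this document.

```r
# Hypothetical helper: convert magnitude codes into numeric multipliers.
# Codes are upper-cased first, so "k" and "K" are treated alike; any code not
# in the lookup table (e.g. "", "?", "+") maps to 0, as in the chunk above.
exp_to_multiplier <- function(code) {
  multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
  m <- multipliers[toupper(as.character(code))]
  m[is.na(m)] <- 0
  unname(m)
}

# Made-up rows for illustration (not taken from the Storm Data set)
demo <- data.frame(
  PROPDMG    = c(1.55, 25, 10, 3),
  PROPDMGEXP = c("B", "K", "m", ""),
  CROPDMG    = c(0, 5, 0, 2),
  CROPDMGEXP = c("", "K", "", "?")
)
demo$sum_cost <- demo$PROPDMG * exp_to_multiplier(demo$PROPDMGEXP) +
  demo$CROPDMG * exp_to_multiplier(demo$CROPDMGEXP)
print(demo$sum_cost)
```

The same helper could replace the nested `ifelse()` calls inside `mutate()`; note that it would change the totals wherever lowercase codes occur.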
To get a clearer view of the human and economic impact of each kind of phenomenon, we summarize the total cost, injuries, and fatalities, as well as their means per event, for each event type (EVTYPE).
```r
# Totals and per-event means of cost, fatalities and injuries for each event type
data.storm.summarized <- data.storm.totalized %>%
  group_by(EVTYPE) %>%
  summarize(total_cost       = sum(sum_cost),
            total_fatalities = sum(FATALITIES),
            total_injuries   = sum(INJURIES),
            mean_cost        = mean(sum_cost),
            mean_fatalities  = mean(FATALITIES),
            mean_injuries    = mean(INJURIES))
```
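To check the grouped summary on a small case without extra packages, here is a base-R sketch of the same totals and means using `aggregate()` and `merge()`. It also adds an event count per type (`n_events`, a column that is not part of the original summary), which shows how many events each mean is based on. The events are made up for illustration.

```r
# Made-up events (not from the Storm Data set)
events <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "FLOOD", "TORNADO", "HEAT"),
  sum_cost   = c(100, 0, 20, 50, 0),
  FATALITIES = c(0, 1, 0, 3, 2)
)

# Totals and means of each measure per EVTYPE
totals <- aggregate(cbind(sum_cost, FATALITIES) ~ EVTYPE, data = events, FUN = sum)
means  <- aggregate(cbind(sum_cost, FATALITIES) ~ EVTYPE, data = events, FUN = mean)

# Number of events per EVTYPE (not part of the original summary)
counts <- aggregate(sum_cost ~ EVTYPE, data = events, FUN = length)
names(counts)[2] <- "n_events"

# Join everything on EVTYPE; the suffixes tell totals and means apart
summary.by.type <- merge(totals, means, by = "EVTYPE", suffixes = c("_total", "_mean"))
summary.by.type <- merge(summary.by.type, counts, by = "EVTYPE")
print(summary.by.type)
```

In this toy data FLOOD has the largest total cost (120) but TORNADO has the larger mean (50 versus 40), because FLOOD's total is spread over three events.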
Next, we sort the summary by each measure in descending order, so that the phenomena with the largest impact come first.
```r
# One copy of the summary per ranking criterion, largest values first
orderedbycosts          <- data.storm.summarized %>% arrange(desc(total_cost))
orderedbyfatalities     <- data.storm.summarized %>% arrange(desc(total_fatalities))
orderedbyinjuries       <- data.storm.summarized %>% arrange(desc(total_injuries))
orderedbymeancosts      <- data.storm.summarized %>% arrange(desc(mean_cost))
orderedbymeanfatalities <- data.storm.summarized %>% arrange(desc(mean_fatalities))
orderedbymeaninjuries   <- data.storm.summarized %>% arrange(desc(mean_injuries))
```
The following table lists the 10 event types with the highest total economic impact (property plus crop damage, in US dollars).
```r
library(xtable)
print(xtable(head(select(orderedbycosts, EVTYPE, total_cost), 10)), type = "html")
```
| | EVTYPE | total_cost (USD) |
|---|---|---|
| 1 | FLOOD | 150319678250.00 |
| 2 | HURRICANE/TYPHOON | 71913712800.00 |
| 3 | TORNADO | 57340613590.00 |
| 4 | STORM SURGE | 43323541000.00 |
| 5 | HAIL | 18752904170.00 |
| 6 | FLASH FLOOD | 17562128610.00 |
| 7 | DROUGHT | 15018672000.00 |
| 8 | HURRICANE | 14610229010.00 |
| 9 | RIVER FLOOD | 10148404500.00 |
| 10 | ICE STORM | 8967041310.00 |
The 10 event types with the most total fatalities:
```r
print(xtable(head(select(orderedbyfatalities, EVTYPE, total_fatalities), 10)), type = "html")
```
| | EVTYPE | total_fatalities |
|---|---|---|
| 1 | TORNADO | 5633.00 |
| 2 | EXCESSIVE HEAT | 1903.00 |
| 3 | FLASH FLOOD | 978.00 |
| 4 | HEAT | 937.00 |
| 5 | LIGHTNING | 816.00 |
| 6 | TSTM WIND | 504.00 |
| 7 | FLOOD | 470.00 |
| 8 | RIP CURRENT | 368.00 |
| 9 | HIGH WIND | 248.00 |
| 10 | AVALANCHE | 224.00 |
The 10 event types with the most total injuries:
```r
print(xtable(head(select(orderedbyinjuries, EVTYPE, total_injuries), 10)), type = "html")
```
| | EVTYPE | total_injuries |
|---|---|---|
| 1 | TORNADO | 91346.00 |
| 2 | TSTM WIND | 6957.00 |
| 3 | FLOOD | 6789.00 |
| 4 | EXCESSIVE HEAT | 6525.00 |
| 5 | LIGHTNING | 5230.00 |
| 6 | HEAT | 2100.00 |
| 7 | ICE STORM | 1975.00 |
| 8 | FLASH FLOOD | 1777.00 |
| 9 | THUNDERSTORM WIND | 1488.00 |
| 10 | HAIL | 1361.00 |
Even though these tables show which phenomena have the largest overall impact, the ranking can be misleading: an event type with many occurrences accumulates a large total even if each individual event is relatively mild.
For that reason, the following tables show the average damage per event for each event type.
Average cost per event:
```r
print(xtable(head(select(orderedbymeancosts, EVTYPE, mean_cost), 10)), type = "html")
```
| | EVTYPE | mean_cost (USD) |
|---|---|---|
| 1 | TORNADOES, TSTM WIND, HAIL | 1602500000.00 |
| 2 | HEAVY RAIN/SEVERE WEATHER | 1250000000.00 |
| 3 | HURRICANE/TYPHOON | 817201281.82 |
| 4 | HURRICANE OPAL | 351316222.22 |
| 5 | STORM SURGE | 165990578.54 |
| 6 | WILD FIRES | 156025000.00 |
| 7 | EXCESSIVE WETNESS | 142000000.00 |
| 8 | HURRICANE OPAL/HIGH WINDS | 110000000.00 |
| 9 | SEVERE THUNDERSTORM | 92735384.62 |
| 10 | HURRICANE | 83966833.39 |
Average fatalities per event:
```r
print(xtable(head(select(orderedbymeanfatalities, EVTYPE, mean_fatalities), 10)), type = "html")
```
| | EVTYPE | mean_fatalities |
|---|---|---|
| 1 | TORNADOES, TSTM WIND, HAIL | 25.00 |
| 2 | COLD AND SNOW | 14.00 |
| 3 | TROPICAL STORM GORDON | 8.00 |
| 4 | RECORD/EXCESSIVE HEAT | 5.67 |
| 5 | EXTREME HEAT | 4.36 |
| 6 | HEAT WAVE DROUGHT | 4.00 |
| 7 | HIGH WIND/SEAS | 4.00 |
| 8 | MARINE MISHAP | 3.50 |
| 9 | WINTER STORMS | 3.33 |
| 10 | Heavy surf and wind | 3.00 |
Average injuries per event:
```r
print(xtable(head(select(orderedbymeaninjuries, EVTYPE, mean_injuries), 10)), type = "html")
```
| | EVTYPE | mean_injuries |
|---|---|---|
| 1 | Heat Wave | 70.00 |
| 2 | TROPICAL STORM GORDON | 43.00 |
| 3 | WILD FIRES | 37.50 |
| 4 | THUNDERSTORMW | 27.00 |
| 5 | HIGH WIND AND SEAS | 20.00 |
| 6 | SNOW/HIGH WINDS | 18.00 |
| 7 | GLAZE/ICE STORM | 15.00 |
| 8 | HEAT WAVE DROUGHT | 15.00 |
| 9 | WINTER STORM HIGH WINDS | 15.00 |
| 10 | HURRICANE/TYPHOON | 14.49 |
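Rankings by total and by mean can disagree sharply, and a mean computed from a single event carries little weight. The sketch below uses two made-up event types to show the reversal, plus an optional guard that ranks by mean only among types with at least `min_events` records. The threshold of 10 is an arbitrary assumption; the original analysis applies no such filter.

```r
# Made-up per-type summary (not from the Storm Data set)
by.type <- data.frame(
  EVTYPE     = c("COMMON TYPE", "RARE TYPE"),
  total_cost = c(1000, 300),
  n_events   = c(100, 1)
)
by.type$mean_cost <- by.type$total_cost / by.type$n_events

# Ranking by total vs. by mean: the order flips
by.total <- by.type$EVTYPE[order(-by.type$total_cost)]
by.mean  <- by.type$EVTYPE[order(-by.type$mean_cost)]

# Hypothetical guard: rank by mean only among types with enough events
min_events <- 10
frequent <- by.type[by.type$n_events >= min_events, ]
by.mean.frequent <- frequent$EVTYPE[order(-frequent$mean_cost)]

print(by.total)
print(by.mean)
print(by.mean.frequent)
```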
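If we look more closely, the FLOOD case stands out: a single event accounts for more than 115 billion dollars, which is typical outlier behavior. To reduce its influence we could trim the dataset, or compare event types by their mean cost per event, which spreads that one value over all events of the type. We use the mean as a preliminary approach, keeping in mind that the mean is still affected by extreme values; a median or trimmed mean would be more robust. The following graph shows the top 5 event types ordered by their mean.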
```r
library(ggplot2)
library(gridExtra)

# Top 5 event types by mean cost per event; event names are printed inside
# the bars because they are too long for the x axis
plot4 <- ggplot(head(orderedbymeancosts, 5), aes(x = EVTYPE, y = as.numeric(mean_cost))) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black", width = 1) +
  labs(y = "mean cost (USD)", title = "Impact by cost.") +
  geom_text(aes(label = EVTYPE), size = 2, position = position_dodge(.9), vjust = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
# Same chart for mean fatalities per event
plot5 <- ggplot(head(orderedbymeanfatalities, 5), aes(x = EVTYPE, y = as.numeric(mean_fatalities))) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black", width = 1) +
  labs(y = "mean fatalities", title = "Impact by fatalities.") +
  geom_text(aes(label = EVTYPE), size = 2, position = position_dodge(.9), vjust = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
# Same chart for mean injuries per event
plot6 <- ggplot(head(orderedbymeaninjuries, 5), aes(x = EVTYPE, y = as.numeric(mean_injuries))) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black", width = 1) +
  labs(y = "mean injuries", title = "Impact by injuries.") +
  geom_text(aes(label = EVTYPE), size = 2, position = position_dodge(.9), vjust = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
# Stack the three charts vertically
grid.arrange(plot4, plot5, plot6, nrow = 3)
```
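Dividing by the number of events dilutes a single extreme value, but the mean itself is still pulled toward it; the median, or a trimmed mean that drops a fraction of the values at each end, is more robust. A small illustration with made-up costs (nine ordinary events and one extreme one, loosely mirroring the FLOOD case):

```r
# Made-up costs for one event type: nine ordinary events and one extreme one
costs <- c(rep(1e6, 9), 115e9)

mean(costs)              # pulled far up by the single extreme value
median(costs)            # not affected by it
mean(costs, trim = 0.1)  # drops 10% of the values at each end before averaging
```

Here the mean is about 1.15e10 while both the median and the 10% trimmed mean stay at 1e6.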
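To look at the distribution of individual events rather than a single summary number, we draw boxplots for the top 5 event types by total impact, on a log10 scale. Events with zero cost, fatalities, or injuries cannot be shown on that scale and are dropped, which is what the warnings below report.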
```r
# Events belonging to the top 5 types by total cost, plotted on a log10 scale
topeventsfilteredcost <- filter(data.storm.totalized, EVTYPE %in% head(orderedbycosts, 5)$EVTYPE)
plot7 <- ggplot(topeventsfilteredcost, aes(x = EVTYPE, y = log10(sum_cost), color = EVTYPE)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank())
# Same for the top 5 types by total fatalities
topeventsfilteredfatalities <- filter(data.storm.totalized, EVTYPE %in% head(orderedbyfatalities, 5)$EVTYPE)
plot8 <- ggplot(topeventsfilteredfatalities, aes(x = EVTYPE, y = log10(FATALITIES), color = EVTYPE)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank())
# Same for the top 5 types by total injuries
topeventsfilteredinjuries <- filter(data.storm.totalized, EVTYPE %in% head(orderedbyinjuries, 5)$EVTYPE)
plot9 <- ggplot(topeventsfilteredinjuries, aes(x = EVTYPE, y = log10(INJURIES), color = EVTYPE)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank())
# Stack the three boxplots vertically
grid.arrange(plot7, plot8, plot9, nrow = 3)
```
```
## Warning: Removed 299392 rows containing non-finite values (stat_boxplot).
## Warning: Removed 129375 rows containing non-finite values (stat_boxplot).
## Warning: Removed 309845 rows containing non-finite values (stat_boxplot).
```
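The removed-row counts in these warnings come from zero values: `log10(0)` is `-Inf`, which ggplot2 drops as a non-finite value before computing the boxplot statistics, so each box summarizes only events with a non-zero cost, fatality, or injury count. A sketch with made-up fatality counts:

```r
# Made-up fatality counts: most events cause no deaths
fatalities <- c(0, 0, 0, 1, 10, 100)

log_values <- log10(fatalities)            # zeros become -Inf
n_dropped  <- sum(!is.finite(log_values))  # rows ggplot2 would remove

# Filtering explicitly keeps the same points but makes the choice visible
nonzero <- fatalities[fatalities > 0]
log10(nonzero)
```

Filtering out zero values before plotting would silence the warnings without changing the boxes; a `log10(x + 1)` transform is another common option when the zeros should stay in the plot, at the cost of a less direct axis.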
So, among the top 5 event types by total cost, HURRICANE/TYPHOON events appear to be the most costly per event: their box reaches the highest values and spans the widest range.
For fatalities, the main concern is tornadoes, which is consistent with the total-fatalities table above.
Injuries per event appear to be most frequent with heat waves, as shown in the boxplot and confirmed by the tables.