Synopsis

Using the data gathered by the U.S. National Oceanic and Atmospheric Administration (NOAA), the human and economic impact of atmospheric phenomena can be quantified.

This data gives both a visual and an analytic overview of that impact.

Analyzing this data can help minimize the losses caused by weather disasters, both economic and, above all, human.

If we know which kinds of phenomena are the most dangerous, we can focus efforts on designing better countermeasures against them.


Data Processing

The data is obtained from the Storm Data URL and is used as is in this document. The code that downloads the file and loads it into a data frame is shown below.

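# Packages used throughout this analysis
library(dplyr)      # mutate, group_by, summarize, arrange, filter, select
library(xtable)     # HTML tables
library(ggplot2)    # plots
library(gridExtra)  # grid.arrange

zip.file.URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zip.file.name <- "StormData.csv.bz2"

# Download only once; mode = "wb" keeps the compressed file intact on Windows
if (!file.exists(zip.file.name)) {
  download.file(zip.file.URL, zip.file.name, mode = "wb")
}

# read.csv reads the .bz2 file directly, without manual decompression
data.storm <- read.csv(zip.file.name)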
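The chunk above passes the compressed `.bz2` file straight to `read.csv()`. This works because the `file()` connection that `read.csv()` opens detects bzip2 compression when reading. A small self-contained check, using a temporary file and a made-up two-row data frame (not the NOAA data), is sketched below.

```r
# Made-up data, for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL"), FATALITIES = c(2, 0))

# Write the data frame through a bzip2-compressing connection
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# The file really is bzip2: it starts with the "BZh" magic bytes
magic <- readBin(tmp, "raw", 3)

# read.csv decompresses transparently, as in the chunk above
round.trip <- read.csv(tmp)
print(round.trip)
```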

From the documentation and the FAQ we know that the property damage PROPDMG is expressed in US dollars, scaled by a magnitude multiplier encoded in PROPDMGEXP.

The documentation describes the damage estimates as follows:

“Estimates can be obtained from emergency managers, U.S. Geological Survey, U.S. Army Corps of Engineers, power utility companies, and newspaper articles. If the values provided are rough estimates, then this should be stated as such in the narrative. Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions. If additional precision is available, it may be provided in the narrative part of the entry. When damage is due to more than one element of the storm, indicate, when possible, the amount of damage caused by each element. If the dollar amount of damage is unknown, or not available, check the “no information available” box. "

Therefore, each PROPDMG value must be multiplied by the magnitude encoded in PROPDMGEXP to obtain the actual dollar amount.

The following chunk computes this value for every event. Note that it also includes crop damage (CROPDMG, scaled by CROPDMGEXP), so sum_cost is the total of property and crop damage.

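# Magnitude code -> multiplier: "K" = thousands, "M" = millions, "B" = billions.
# Any other code (including blanks) is mapped to 0, so those amounts are
# left out of the totals.
exp.multiplier <- function(code) {
  ifelse(code == "K", 1e3,
         ifelse(code == "M", 1e6,
                ifelse(code == "B", 1e9, 0)))
}

# Total economic cost per event: property damage plus crop damage, in US dollars
data.storm.totalized <- mutate(
  data.storm,
  sum_cost = PROPDMG * exp.multiplier(PROPDMGEXP) +
             CROPDMG * exp.multiplier(CROPDMGEXP)
)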
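To make the conversion concrete, here is a base-R sketch on a few made-up rows. It uses a named-vector lookup (a hypothetical helper, `lookup.multiplier`, not part of the original analysis) that gives the same multipliers as the nested `ifelse()` for the letter codes "K", "M" and "B", and also maps any other code to 0. The last row shows the consequence of that rule: a property amount with a blank exponent contributes nothing.

```r
# Hypothetical helper: look up the magnitude letter in a named vector
multipliers <- c(K = 1e3, M = 1e6, B = 1e9)
lookup.multiplier <- function(code) {
  m <- unname(multipliers[as.character(code)])
  m[is.na(m)] <- 0  # blank, unrecognized or NA codes count as 0
  m
}

# Made-up rows, for illustration only
toy <- data.frame(
  PROPDMG    = c(1.55, 25,  10,  3),
  PROPDMGEXP = c("B",  "K", "M", ""),
  CROPDMG    = c(0,    5,   0,   2),
  CROPDMGEXP = c("",   "M", "",  "K")
)

# Property plus crop damage, in US dollars
toy$sum_cost <- toy$PROPDMG * lookup.multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * lookup.multiplier(toy$CROPDMGEXP)
print(toy$sum_cost)
```

The first row reproduces the documentation's own example (1.55B for $1,550,000,000), while in the last row the 3 with a blank exponent is dropped and only the $2,000 of crop damage remains.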

To get a better view of the human and economic impact of these phenomena, we summarize the total and mean cost, injuries, and fatalities for each event type (EVTYPE).

data.storm.summarized<-data.storm.totalized %>% 
  group_by(EVTYPE) %>% summarize(total_cost=sum(sum_cost), 
                                 total_fatalities=sum(FATALITIES),
                                 total_injuries=sum(INJURIES),
                                 mean_cost=mean(sum_cost),
                                 mean_fatalities=mean(FATALITIES),
                                 mean_injuries=mean(INJURIES))
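For readers less familiar with dplyr, the grouped totals and means above can be sketched in base R with `aggregate()`. The rows below are made up and only show the shape of the result: one row per event type, with a total and a mean for each measure.

```r
# Made-up per-event rows, for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "HAIL", "HAIL", "HAIL"),
  sum_cost   = c(2e6, 0, 1e5, 3e5, 2e5),
  FATALITIES = c(3, 1, 0, 0, 0),
  INJURIES   = c(10, 4, 1, 0, 2)
)
measures <- c("sum_cost", "FATALITIES", "INJURIES")

# One row per event type, summing and averaging each measure
totals <- aggregate(toy[measures], by = list(EVTYPE = toy$EVTYPE), FUN = sum)
means  <- aggregate(toy[measures], by = list(EVTYPE = toy$EVTYPE), FUN = mean)

# Join both summaries; suffixes tell totals and means apart
toy.summary <- merge(totals, means, by = "EVTYPE", suffixes = c(".total", ".mean"))
print(toy.summary)
```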

Next, we sort the summary by each measure so that the event types can be listed in order of impact.

orderedbycosts<-data.storm.summarized %>% arrange(desc(total_cost))
orderedbyfatalities<-data.storm.summarized %>% arrange(desc(total_fatalities))
orderedbyinjuries<-data.storm.summarized %>% arrange(desc(total_injuries))

orderedbymeancosts<-data.storm.summarized %>% arrange(desc(mean_cost))
orderedbymeanfatalities<-data.storm.summarized %>% arrange(desc(mean_fatalities))
orderedbymeaninjuries<-data.storm.summarized %>% arrange(desc(mean_injuries))

The top 10 event types by total economic impact (property plus crop damage, in US dollars) are shown in the following table.

print(xtable(head(select(orderedbycosts,EVTYPE,total_cost),10)),type="html")
    EVTYPE                  total_cost
 1  FLOOD              150319678250.00
 2  HURRICANE/TYPHOON   71913712800.00
 3  TORNADO             57340613590.00
 4  STORM SURGE         43323541000.00
 5  HAIL                18752904170.00
 6  FLASH FLOOD         17562128610.00
 7  DROUGHT             15018672000.00
 8  HURRICANE           14610229010.00
 9  RIVER FLOOD         10148404500.00
10  ICE STORM            8967041310.00

The top 10 event types by total fatalities:

print(xtable(head(select(orderedbyfatalities,EVTYPE,total_fatalities),10)),type="html")
    EVTYPE          total_fatalities
 1  TORNADO                  5633.00
 2  EXCESSIVE HEAT           1903.00
 3  FLASH FLOOD               978.00
 4  HEAT                      937.00
 5  LIGHTNING                 816.00
 6  TSTM WIND                 504.00
 7  FLOOD                     470.00
 8  RIP CURRENT               368.00
 9  HIGH WIND                 248.00
10  AVALANCHE                 224.00

The top 10 event types by total injuries:

print(xtable(head(select(orderedbyinjuries,EVTYPE,total_injuries),10)),type="html")
    EVTYPE             total_injuries
 1  TORNADO                  91346.00
 2  TSTM WIND                 6957.00
 3  FLOOD                     6789.00
 4  EXCESSIVE HEAT            6525.00
 5  LIGHTNING                 5230.00
 6  HEAT                      2100.00
 7  ICE STORM                 1975.00
 8  FLASH FLOOD               1777.00
 9  THUNDERSTORM WIND         1488.00
10  HAIL                      1361.00

Although these tables show which phenomena have the largest cumulative impact, the totals can be misleading: an event type with many recorded events can build up a large total even if each individual event is relatively mild.
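A toy illustration of this effect, with made-up numbers and two hypothetical event types: 100 mild events outweigh 2 severe ones in the total, while the mean per event ranks them the other way round.

```r
# Made-up data: 100 events with 1 fatality each, and 2 events with 20 and 30
fatalities <- c(rep(1, 100), 20, 30)
evtype <- c(rep("FREQUENT MILD", 100), rep("RARE SEVERE", 2))

totals <- tapply(fatalities, evtype, sum)   # FREQUENT MILD: 100, RARE SEVERE: 50
means  <- tapply(fatalities, evtype, mean)  # FREQUENT MILD: 1,   RARE SEVERE: 25

# The "worst" event type depends on which summary is used
print(names(which.max(totals)))
print(names(which.max(means)))
```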

For this reason, the following tables show the average impact per event for each event type.

The top 10 event types by mean cost per event (US dollars):

print(xtable(head(select(orderedbymeancosts,EVTYPE,mean_cost),10)),type="html")
    EVTYPE                          mean_cost
 1  TORNADOES, TSTM WIND, HAIL  1602500000.00
 2  HEAVY RAIN/SEVERE WEATHER   1250000000.00
 3  HURRICANE/TYPHOON            817201281.82
 4  HURRICANE OPAL               351316222.22
 5  STORM SURGE                  165990578.54
 6  WILD FIRES                   156025000.00
 7  EXCESSIVE WETNESS            142000000.00
 8  HURRICANE OPAL/HIGH WINDS    110000000.00
 9  SEVERE THUNDERSTORM           92735384.62
10  HURRICANE                     83966833.39

The top 10 event types by mean fatalities per event:

print(xtable(head(select(orderedbymeanfatalities,EVTYPE,mean_fatalities),10)),type="html")
    EVTYPE                      mean_fatalities
 1  TORNADOES, TSTM WIND, HAIL            25.00
 2  COLD AND SNOW                         14.00
 3  TROPICAL STORM GORDON                  8.00
 4  RECORD/EXCESSIVE HEAT                  5.67
 5  EXTREME HEAT                           4.36
 6  HEAT WAVE DROUGHT                      4.00
 7  HIGH WIND/SEAS                         4.00
 8  MARINE MISHAP                          3.50
 9  WINTER STORMS                          3.33
10  Heavy surf and wind                    3.00

The top 10 event types by mean injuries per event:

print(xtable(head(select(orderedbymeaninjuries,EVTYPE,mean_injuries),10)),type="html")
    EVTYPE                   mean_injuries
 1  Heat Wave                        70.00
 2  TROPICAL STORM GORDON            43.00
 3  WILD FIRES                       37.50
 4  THUNDERSTORMW                    27.00
 5  HIGH WIND AND SEAS               20.00
 6  SNOW/HIGH WINDS                  18.00
 7  GLAZE/ICE STORM                  15.00
 8  HEAT WAVE DROUGHT                15.00
 9  WINTER STORM HIGH WINDS          15.00
10  HURRICANE/TYPHOON                14.49

Results

If we look a little closer, we realize that the case of FLOOD is quite particular: a single event accounts for more than 115 billion dollars, which is the behavior of an outlier. To reduce its influence we could trim the dataset, or compare event types by their mean impact per event instead of their total. We will use the mean as a preliminary approach. The following graphs show the top 5 event types ordered by mean cost, mean fatalities, and mean injuries.
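A caveat on that choice, shown with made-up costs: within one event type, an arithmetic mean is itself pulled up by a single extreme record, whereas the median and a trimmed mean (the "trim the dataset" option, here via the `trim` argument of base R's `mean()`) are not. This sketch only illustrates the trade-off; it is not the method used in this report.

```r
# Made-up per-event costs: nine ordinary events and one extreme record
costs <- c(rep(1e6, 9), 115e9)

plain.mean   <- mean(costs)              # about 1.15e10, dominated by the outlier
plain.median <- median(costs)            # 1e6
trimmed.mean <- mean(costs, trim = 0.1)  # drops 10% at each end, then averages

print(c(mean = plain.mean, median = plain.median, trimmed = trimmed.mean))
```

With ten values and `trim = 0.1`, one value is removed from each end, so the outlier never enters the trimmed average.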

# Top 5 event types by mean impact per event; reorder() sorts the bars from
# highest to lowest mean instead of alphabetically
plot4 <- ggplot(head(orderedbymeancosts, 5),
                aes(x = reorder(EVTYPE, -mean_cost), y = mean_cost)) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black", width = 1) +
  labs(y = "mean cost (US dollars)", title = "Impact by cost.") +
  geom_text(aes(label = EVTYPE), size = 2, vjust = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
plot5 <- ggplot(head(orderedbymeanfatalities, 5),
                aes(x = reorder(EVTYPE, -mean_fatalities), y = mean_fatalities)) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black", width = 1) +
  labs(y = "mean fatalities per event", title = "Impact by fatalities.") +
  geom_text(aes(label = EVTYPE), size = 2, vjust = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())
plot6 <- ggplot(head(orderedbymeaninjuries, 5),
                aes(x = reorder(EVTYPE, -mean_injuries), y = mean_injuries)) +
  geom_bar(stat = "identity", fill = "lightblue", colour = "black", width = 1) +
  labs(y = "mean injuries per event", title = "Impact by injuries.") +
  geom_text(aes(label = EVTYPE), size = 2, vjust = 1.5) +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank(), axis.ticks.x = element_blank())

grid.arrange(plot4,plot5,plot6,nrow=3)
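ggplot2 places a character x variable in alphabetical order unless told otherwise, so bars are not automatically sorted by their value. One way to sort them is `stats::reorder()`, which turns EVTYPE into a factor whose levels follow another variable. A minimal sketch with made-up means for three hypothetical event types:

```r
# Made-up means, listed alphabetically by event type
evtype    <- c("TYPE A", "TYPE B", "TYPE C")
mean.cost <- c(5, 20, 10)

# reorder() orders levels by increasing value, so negate for a descending order
ordered.evtype <- reorder(evtype, -mean.cost)
print(levels(ordered.evtype))
```

Inside a plot, the same idea is written as `aes(x = reorder(EVTYPE, -mean_cost), ...)`.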

We can also compare the per-event distributions of the top 5 event types by total impact using boxplots on a log10 scale.

# Events with a zero value give log10(0) = -Inf; ggplot2 drops them with a
# warning, so each box only describes events with a nonzero value.

# Top 5 event types by total cost
topeventsfilteredcost <- filter(data.storm.totalized, EVTYPE %in% head(orderedbycosts, 5)$EVTYPE)

plot7 <- ggplot(topeventsfilteredcost, aes(x = EVTYPE, y = log10(sum_cost), color = EVTYPE)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank())

# Top 5 event types by total fatalities
topeventsfilteredfatalities <- filter(data.storm.totalized, EVTYPE %in% head(orderedbyfatalities, 5)$EVTYPE)

plot8 <- ggplot(topeventsfilteredfatalities, aes(x = EVTYPE, y = log10(FATALITIES), color = EVTYPE)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank())

# Top 5 event types by total injuries
topeventsfilteredinjuries <- filter(data.storm.totalized, EVTYPE %in% head(orderedbyinjuries, 5)$EVTYPE)

plot9 <- ggplot(topeventsfilteredinjuries, aes(x = EVTYPE, y = log10(INJURIES), color = EVTYPE)) +
  geom_boxplot() +
  scale_color_brewer(palette = "Dark2") +
  theme(axis.title.x = element_blank(), axis.text.x = element_blank())

grid.arrange(plot7, plot8, plot9, nrow = 3)
## Warning: Removed 299392 rows containing non-finite values (stat_boxplot).
## Warning: Removed 129375 rows containing non-finite values (stat_boxplot).
## Warning: Removed 309845 rows containing non-finite values (stat_boxplot).
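The three warnings come from taking log10 of zero: every event with no cost, no fatalities or no injuries becomes -Inf, and ggplot2 removes non-finite values before drawing the boxplot. The boxes therefore summarize only the events that caused some damage. A tiny sketch with made-up fatality counts shows the count of dropped rows and the equivalent explicit filter.

```r
# Made-up per-event fatalities: most events have none
fatalities <- c(0, 0, 0, 1, 4, 0, 12)

y <- log10(fatalities)
print(y)                      # the zeros become -Inf
dropped <- sum(!is.finite(y)) # rows a boxplot of y would remove: 4

# Equivalent explicit filter, which makes the plotted subset visible in code
nonzero <- fatalities[fatalities > 0]
```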

The boxplots suggest that HURRICANE/TYPHOON events are the most costly per event: their upper limit is the highest and their box spans the widest range.
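The visual reading of a boxplot can be backed by numbers: the box height is the interquartile range (IQR) and the top of the distribution is its maximum. A sketch that computes these per event type on the log10 scale, using made-up costs for two hypothetical event types:

```r
# Made-up per-event costs (US dollars) for two hypothetical event types
cost <- c(1e4, 5e4, 2e5, 1e6, 1e8, 5e9,   # TYPE A
          1e3, 2e3, 5e3, 1e4, 2e4)        # TYPE B
evtype <- c(rep("TYPE A", 6), rep("TYPE B", 5))
log.cost <- log10(cost)

# Median, box height (IQR) and maximum per event type, on the log10 scale
box.stats <- rbind(
  median = tapply(log.cost, evtype, median),
  iqr    = tapply(log.cost, evtype, IQR),
  max    = tapply(log.cost, evtype, max)
)
print(round(box.stats, 2))
```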

For fatalities, the main concern is tornadoes, which the tables above also rank first by total fatalities.

Injuries are more common with heat waves, as shown in the boxplot and confirmed by the tables.