Synopsis

Data gathered by the U.S. National Oceanic and Atmospheric Administration (NOAA) make it possible to quantify the human and economic impact of atmospheric phenomena.

From this data we can build visual and analytic summaries that give an overview of that impact.

Such an analysis can help minimize the losses caused by climate disasters, reducing the economic impact and, above all, the human impact.

If we know which kinds of phenomena are the most dangerous, we can focus efforts on designing better countermeasures for them.

Data Processing

The data is obtained from the Storm Data URL and is used as is in this document. The code to download it and load the corresponding data frame is shown below.

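library(dplyr)      # mutate, group_by, summarize, arrange, filter, select, %>%
library(ggplot2)    # ggplot, geom_bar, geom_boxplot
library(gridExtra)  # grid.arrange
library(xtable)     # xtable

zip.file.URL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
zip.file.name <- "StormData.csv.bz2"

# Download the compressed file only once; mode = "wb" keeps it intact on Windows
if (!file.exists(zip.file.name)) {
  download.file(zip.file.URL, zip.file.name, mode = "wb")
}

# read.csv() reads the bz2-compressed file directly
data.storm <- read.csv(zip.file.name)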

From the documentation and the FAQ we know that the property damage cost PROPDMG is expressed in US dollars with a multiplier defined by PROPDMGEXP.

The damage estimate is described as follows (text quoted from the documentation cited above):

“Estimates can be obtained from emergency managers, U.S. Geological Survey, U.S. Army Corps of Engineers, power utility companies, and newspaper articles. If the values provided are rough estimates, then this should be stated as such in the narrative. Estimates should be rounded to three significant digits, followed by an alphabetical character signifying the magnitude of the number, i.e., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions. If additional precision is available, it may be provided in the narrative part of the entry. When damage is due to more than one element of the storm, indicate, when possible, the amount of damage caused by each element. If the dollar amount of damage is unknown, or not available, check the “no information available” box.”

So we need to transform the amount in PROPDMG into a new value that reflects the actual dollar amount.

We use the following chunk to compute this value. Note that crop damage (CROPDMG, scaled by CROPDMGEXP) is included in the cost as well.

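# Effective cost per event: property damage plus crop damage, each scaled by
# its magnitude code ("K", "M" or "B"); any other code contributes 0.
data.storm.totalized <- mutate(
  data.storm,
  sum_cost =
    PROPDMG * ifelse(PROPDMGEXP == "K", 1e3,
              ifelse(PROPDMGEXP == "M", 1e6,
              ifelse(PROPDMGEXP == "B", 1e9, 0))) +
    CROPDMG * ifelse(CROPDMGEXP == "K", 1e3,
              ifelse(CROPDMGEXP == "M", 1e6,
              ifelse(CROPDMGEXP == "B", 1e9, 0)))
)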
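
The nested ifelse() calls above can also be written as a lookup table. The following is a minimal sketch, assuming the same rule as the chunk above: only uppercase "K", "M", and "B" are recognized, and any other code contributes zero. The helper name exp_multiplier and the toy values are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: map a magnitude code to its numeric multiplier.
# Assumption: as in the chunk above, only "K", "M" and "B" are recognized;
# every other code (blank, digits, lowercase letters, ...) maps to 0.
exp_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[as.character(code)])
  mult[is.na(mult)] <- 0
  mult
}

# Toy data, invented for illustration only.
toy <- data.frame(
  PROPDMG    = c(1.55, 25, 3, 10),
  PROPDMGEXP = c("B", "K", "", "m"),
  CROPDMG    = c(0, 5, 2, 0),
  CROPDMGEXP = c("", "M", "K", ""),
  stringsAsFactors = FALSE
)

# Same formula as sum_cost: scaled property damage plus scaled crop damage.
toy$sum_cost <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP) +
  toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

print(toy$sum_cost)
```

A lookup table keeps the rule in one place, so recognizing additional codes (if that were desired) would only mean adding entries to the lookup vector.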

To get a clearer view of the human and economic impact of these phenomena, we summarize the cost, injuries, and fatalities for each type of phenomenon, both as totals and as means per event.

data.storm.summarized<-data.storm.totalized %>% 
  group_by(EVTYPE) %>% summarize(total_cost=sum(sum_cost), 
                                 total_fatalities=sum(FATALITIES),
                                 total_injuries=sum(INJURIES),
                                 mean_cost=mean(sum_cost),
                                 mean_fatalities=mean(FATALITIES),
                                 mean_injuries=mean(INJURIES))

Next we sort the summarized data frame by each measure of impact, so that the most harmful event types appear first.

orderedbycosts<-data.storm.summarized %>% arrange(desc(total_cost))
orderedbyfatalities<-data.storm.summarized %>% arrange(desc(total_fatalities))
orderedbyinjuries<-data.storm.summarized %>% arrange(desc(total_injuries))

orderedbymeancosts<-data.storm.summarized %>% arrange(desc(mean_cost))
orderedbymeanfatalities<-data.storm.summarized %>% arrange(desc(mean_fatalities))
orderedbymeaninjuries<-data.storm.summarized %>% arrange(desc(mean_injuries))

The top 10 phenomena by total economic impact (in US dollars) are shown in the following table.

print(xtable(head(select(orderedbycosts,EVTYPE,total_cost),10)),type="html")

	EVTYPE	total_cost
1	FLOOD	150319678250.00
2	HURRICANE/TYPHOON	71913712800.00
3	TORNADO	57340613590.00
4	STORM SURGE	43323541000.00
5	HAIL	18752904170.00
6	FLASH FLOOD	17562128610.00
7	DROUGHT	15018672000.00
8	HURRICANE	14610229010.00
9	RIVER FLOOD	10148404500.00
10	ICE STORM	8967041310.00

The top 10 phenomena by total fatalities.

print(xtable(head(select(orderedbyfatalities,EVTYPE,total_fatalities),10)),type="html")

	EVTYPE	total_fatalities
1	TORNADO	5633.00
2	EXCESSIVE HEAT	1903.00
3	FLASH FLOOD	978.00
4	HEAT	937.00
5	LIGHTNING	816.00
6	TSTM WIND	504.00
7	FLOOD	470.00
8	RIP CURRENT	368.00
9	HIGH WIND	248.00
10	AVALANCHE	224.00

The top 10 phenomena by total injuries.

print(xtable(head(select(orderedbyinjuries,EVTYPE,total_injuries),10)),type="html")

	EVTYPE	total_injuries
1	TORNADO	91346.00
2	TSTM WIND	6957.00
3	FLOOD	6789.00
4	EXCESSIVE HEAT	6525.00
5	LIGHTNING	5230.00
6	HEAT	2100.00
7	ICE STORM	1975.00
8	FLASH FLOOD	1777.00
9	THUNDERSTORM WIND	1488.00
10	HAIL	1361.00

Although these tables show which phenomena have the largest total impact, the totals can be misleading: some event types are recorded far more often than others, so their cumulative effect is large even when each individual event is modest.

For that reason, the average damage per event is shown below for each event type.
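
The difference between ranking by total and ranking by mean can be seen on a small example. This is only a sketch with invented values, written in base R rather than with the dplyr pipeline used in this document; the n_events column is an addition for illustration and shows how many records stand behind each mean.

```r
# Toy data, invented for illustration only: one frequent event type with
# moderate per-event costs and one type that was recorded a single time.
toy <- data.frame(
  EVTYPE   = c("FLOOD", "FLOOD", "FLOOD", "FLOOD", "RARE EVENT"),
  sum_cost = c(100, 200, 300, 400, 500)
)

# Total, mean and number of records per event type.
totals <- tapply(toy$sum_cost, toy$EVTYPE, sum)
means  <- tapply(toy$sum_cost, toy$EVTYPE, mean)
counts <- tapply(toy$sum_cost, toy$EVTYPE, length)

per_type <- data.frame(
  EVTYPE     = names(totals),
  total_cost = as.vector(totals),
  mean_cost  = as.vector(means),
  n_events   = as.vector(counts)
)

print(per_type)
```

Here the frequent type leads by total while the single-record type leads by mean, so a mean based on very few records should be read with caution.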

The top 10 phenomena by mean cost per event (in US dollars).

print(xtable(head(select(orderedbymeancosts,EVTYPE,mean_cost),10)),type="html")

	EVTYPE	mean_cost
1	TORNADOES, TSTM WIND, HAIL	1602500000.00
2	HEAVY RAIN/SEVERE WEATHER	1250000000.00
3	HURRICANE/TYPHOON	817201281.82
4	HURRICANE OPAL	351316222.22
5	STORM SURGE	165990578.54
6	WILD FIRES	156025000.00
7	EXCESSIVE WETNESS	142000000.00
8	HURRICANE OPAL/HIGH WINDS	110000000.00
9	SEVERE THUNDERSTORM	92735384.62
10	HURRICANE	83966833.39

The top 10 phenomena by mean fatalities per event.

print(xtable(head(select(orderedbymeanfatalities,EVTYPE,mean_fatalities),10)),type="html")

	EVTYPE	mean_fatalities
1	TORNADOES, TSTM WIND, HAIL	25.00
2	COLD AND SNOW	14.00
3	TROPICAL STORM GORDON	8.00
4	RECORD/EXCESSIVE HEAT	5.67
5	EXTREME HEAT	4.36
6	HEAT WAVE DROUGHT	4.00
7	HIGH WIND/SEAS	4.00
8	MARINE MISHAP	3.50
9	WINTER STORMS	3.33
10	Heavy surf and wind	3.00

The top 10 phenomena by mean injuries per event.

print(xtable(head(select(orderedbymeaninjuries,EVTYPE,mean_injuries),10)),type="html")

	EVTYPE	mean_injuries
1	Heat Wave	70.00
2	TROPICAL STORM GORDON	43.00
3	WILD FIRES	37.50
4	THUNDERSTORMW	27.00
5	HIGH WIND AND SEAS	20.00
6	SNOW/HIGH WINDS	18.00
7	GLAZE/ICE STORM	15.00
8	HEAT WAVE DROUGHT	15.00
9	WINTER STORM HIGH WINDS	15.00
10	HURRICANE/TYPHOON	14.49

Results

Looking more closely, the flood case is quite particular: a single event accounts for more than 115 billion dollars, which is the behavior of an outlier. To reduce its influence we could trim the dataset or use the mean per event. We use the mean as a preliminary approach. The following graphs show the top 5 event types ordered by mean.

plot4<-ggplot(head(orderedbymeancosts,5), aes(x=EVTYPE, y=as.numeric(mean_cost))) + 
  geom_bar(stat="identity", fill="lightblue", colour="black", width=1)+
  labs(y="cost",title="Impact by cost.") +
  geom_text(aes(label=EVTYPE),size=2,position = position_dodge(.9), vjust=1.5)+
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank())
plot5<-ggplot(head(orderedbymeanfatalities,5), aes(x=EVTYPE, y=as.numeric(mean_fatalities)))+
  geom_bar(stat="identity", fill="lightblue", colour="black", width=1)+
  labs(y="fatalities",title="Impact by fatalities.") +
  geom_text(aes(label=EVTYPE),size=2,position = position_dodge(.9), vjust=1.5)+
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank())
plot6<-ggplot(head(orderedbymeaninjuries,5), aes(x=EVTYPE, y=as.numeric(mean_injuries))) +
  geom_bar(stat="identity", fill="lightblue", colour="black", width=1)+
  labs(y="injuries",title="Impact by injuries.") +
  geom_text(aes(label=EVTYPE),size=2,position = position_dodge(.9), vjust=1.5)+
  theme(axis.title.x=element_blank(),axis.text.x=element_blank(),axis.ticks.x=element_blank())

grid.arrange(plot4,plot5,plot6,nrow=3)
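
The two options mentioned for handling the outlier, trimming and averaging, can be compared on a small vector. This is a sketch with invented values, not a result from the storm data; it also shows the median for reference, which is not used elsewhere in this document.

```r
# Toy per-event costs for one event type, invented for illustration:
# nine moderate events and one extreme event.
costs <- c(1, 2, 2, 3, 3, 3, 4, 4, 5, 1000)

plain_mean   <- mean(costs)              # pulled up by the extreme event
trimmed_mean <- mean(costs, trim = 0.1)  # drops the lowest and highest 10% first
middle       <- median(costs)            # not affected by the extreme event

print(c(mean = plain_mean, trimmed = trimmed_mean, median = middle))
```

On this toy vector the plain mean is dominated by the single extreme value, while the trimmed mean and the median stay close to the typical event, which is worth keeping in mind when reading the mean-based rankings.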

We can also compare the distribution of the per-event impact, on a log10 scale, with boxplots of the top 5 event types by total impact.

topeventsfilteredcost<-filter(data.storm.totalized, EVTYPE %in% head(orderedbycosts,5)$EVTYPE) 

plot7<-ggplot(topeventsfilteredcost,aes(x=EVTYPE,y=log10(sum_cost), color=EVTYPE))+
  geom_boxplot()+
  scale_color_brewer(palette="Dark2")+
  theme(axis.title.x=element_blank(),axis.text.x=element_blank())

topeventsfilteredfatalities<-filter(data.storm.totalized, EVTYPE %in% head(orderedbyfatalities,5)$EVTYPE)

plot8<-ggplot(topeventsfilteredfatalities,aes(x=EVTYPE,y=log10(FATALITIES), color=EVTYPE))+
  geom_boxplot()+
  scale_color_brewer(palette="Dark2")+
  theme(axis.title.x=element_blank(),axis.text.x=element_blank())

topeventsfilteredinjuries<-filter(data.storm.totalized, EVTYPE %in% head(orderedbyinjuries,5)$EVTYPE)

plot9<-ggplot(topeventsfilteredinjuries,aes(x=EVTYPE,y=log10(INJURIES), color=EVTYPE))+
  geom_boxplot()+
  scale_color_brewer(palette="Dark2")+
  theme(axis.title.x=element_blank(),axis.text.x=element_blank())

grid.arrange(plot7,plot8,plot9,nrow=3)

## Warning: Removed 299392 rows containing non-finite values (stat_boxplot).

## Warning: Removed 129375 rows containing non-finite values (stat_boxplot).

## Warning: Removed 309845 rows containing non-finite values (stat_boxplot).
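
The warnings above most likely come from events with zero cost, fatalities, or injuries: log10(0) is -Inf, and the boxplot drops non-finite values. The sketch below reproduces this on invented counts and shows two common alternatives. Both are assumptions for illustration, not what the chunk above does.

```r
# Toy fatality counts, invented for illustration; half of the events have zero.
fatalities <- c(0, 0, 0, 1, 2, 10)

logged    <- log10(fatalities)       # zeros become -Inf
n_dropped <- sum(!is.finite(logged)) # values a boxplot would remove

# Alternative 1: keep only events that had an impact before taking the log.
positive_only <- log10(fatalities[fatalities > 0])

# Alternative 2: shift by one so that zero-impact events map to 0.
shifted <- log10(fatalities + 1)

print(n_dropped)
```

Either way, the boxplots describe only events with a non-zero impact unless the zeros are kept explicitly, as in the second alternative.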

So we can say that Hurricane/Typhoon events are the most costly per event: their boxplot has the highest upper limit and the widest spread.

For fatalities, the main concern is tornadoes, which also lead the total fatalities table shown above.

Injuries are more common with heat waves, as shown in the boxplot and confirmed by the tables.

Meteoric events analysis across the United States

Marco Antonio Béjar Villalba.

11/25/2020
