The Most Harmful Climate Events for U.S. Population Health and the Economy

author: liuyubobobo
date: Saturday, February 21, 2015

Synopsis

In this report, we use data from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to analyze which events in the U.S. are most harmful to population health and to the economy. From the analysis, we identify the top 10 event types that cause the most fatalities, injuries, and property damage. We also find that most of the damage is caused by a few event types. Among those, the most serious is TORNADO; FLOOD and TSTM WIND are also very harmful. If we can prevent or forecast these events, we can avoid a great deal of loss.

Data Processing

Load the data

We first read in the Storm Data, which comes from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

More information about this data can be found in:
- National Weather Service Storm Data Documentation
- National Climatic Data Center Storm Events FAQ

data <- read.csv( bzfile("StormData.csv.bz2") )

Data Analysis of harmful events with respect to population health

First of all, we aggregate our data by event types and calculate the total numbers of fatalities and injuries.

eventDataForPopulationHealth <- aggregate( cbind( FATALITIES , INJURIES ) ~ EVTYPE , data = data , FUN = sum)
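To make the aggregation step concrete, here is a minimal sketch on a toy data frame. The values and the names toy and toyTotals are made up for illustration and are not taken from the NOAA file; only the aggregate() call mirrors the one used above.

```r
# Toy stand-in for the storm data (hypothetical values, not from the NOAA file)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(3, 1, 2, 4, 0),
  INJURIES   = c(10, 2, 5, 0, 1),
  stringsAsFactors = FALSE
)

# cbind() on the left-hand side sums both columns in one pass;
# EVTYPE on the right-hand side defines the groups
toyTotals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy, FUN = sum)
print(toyTotals)
```

The result has one row per event type, with the fatalities and injuries of all matching rows added up.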

Then, we sort our new data frame, eventDataForPopulationHealth, by the total number of fatalities, using the number of injuries to break ties.

attach(eventDataForPopulationHealth)
eventDataOrderByFatalities <- eventDataForPopulationHealth[ order(FATALITIES , INJURIES , decreasing = TRUE) , ]
detach(eventDataForPopulationHealth)
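The same ordering can be written without attach() and detach(), which avoids leaving columns on the search path if a step fails in between. This is a minimal sketch on made-up totals (toyTotals and ranked are hypothetical names), not a change to the analysis above.

```r
# Hypothetical per-event totals, standing in for eventDataForPopulationHealth
toyTotals <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO", "HAIL"),
  FATALITIES = c(4, 4, 9, 0),
  INJURIES   = c(7, 2, 30, 5),
  stringsAsFactors = FALSE
)

# with() evaluates order() inside the data frame, so no attach() is needed.
# Rows are sorted by FATALITIES first; INJURIES only breaks ties.
ranked <- toyTotals[with(toyTotals, order(FATALITIES, INJURIES, decreasing = TRUE)), ]
print(ranked)
```

Here FLOOD and HEAT both have 4 fatalities, so FLOOD ranks higher because it has more injuries.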

Data Analysis according to fatalities

We can list the 10 event types that cause the most fatalities.

head( eventDataOrderByFatalities , 10)
##             EVTYPE FATALITIES INJURIES
## 834        TORNADO       5633    91346
## 130 EXCESSIVE HEAT       1903     6525
## 153    FLASH FLOOD        978     1777
## 275           HEAT        937     2100
## 464      LIGHTNING        816     5230
## 856      TSTM WIND        504     6957
## 170          FLOOD        470     6789
## 585    RIP CURRENT        368      232
## 359      HIGH WIND        248     1137
## 19       AVALANCHE        224      170

The share of all fatalities caused by these top 10 events can be calculated as follows:

attach(eventDataOrderByFatalities)
sum(FATALITIES[1:10]) / sum(FATALITIES)
## [1] 0.797689
detach(eventDataOrderByFatalities)

Roughly 80% of all recorded fatalities come from just these ten event types, so they deserve particular attention.
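The share above is the sum of the n largest totals divided by the sum of all totals. A minimal sketch of that arithmetic as a reusable helper follows; topShare is a hypothetical name introduced here, and the numbers are made up.

```r
# Share of the grand total contributed by the n largest values.
# topShare is an illustrative helper, not part of the original analysis.
topShare <- function(x, n = 10) {
  x <- sort(x, decreasing = TRUE)
  sum(head(x, n)) / sum(x)
}

# Made-up per-event totals: the two largest account for 80 of 100
totals <- c(50, 30, 10, 5, 3, 2)
share <- topShare(totals, n = 2)
print(share)
```

Because the helper sorts its input, it can be applied to an unsorted column such as FATALITIES or INJURIES directly.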

To understand the top 10 events better, we can plot the data as follows:

eventDataOrderByFatalities$FATALITIES <- eventDataOrderByFatalities$FATALITIES / 1000
par( mar = c(10,6,2,1) , las = 2 )
barplot( height = eventDataOrderByFatalities$FATALITIES[1:10] , names.arg = eventDataOrderByFatalities$EVTYPE[1:10] , col = heat.colors(10) , main = "Top 10 harmful events causing the most fatalities" , ylab = "Total number of fatalities (thousand people)" )

Data Analysis according to injuries

In the same way, we sort our new data frame, eventDataForPopulationHealth, by the total number of injuries, using the number of fatalities to break ties.

attach(eventDataForPopulationHealth)
eventDataOrderByInjuries <- eventDataForPopulationHealth[ order(INJURIES , FATALITIES , decreasing = TRUE) , ]
detach(eventDataForPopulationHealth)

Then, we can list the 10 event types that cause the most injuries.

head( eventDataOrderByInjuries , 10)
##                EVTYPE FATALITIES INJURIES
## 834           TORNADO       5633    91346
## 856         TSTM WIND        504     6957
## 170             FLOOD        470     6789
## 130    EXCESSIVE HEAT       1903     6525
## 464         LIGHTNING        816     5230
## 275              HEAT        937     2100
## 427         ICE STORM         89     1975
## 153       FLASH FLOOD        978     1777
## 760 THUNDERSTORM WIND        133     1488
## 244              HAIL         15     1361

The share of all injuries caused by these top 10 events can be calculated as follows:

attach(eventDataOrderByInjuries)
sum( INJURIES[1:10] ) / sum( INJURIES )
## [1] 0.893402
detach(eventDataOrderByInjuries)

At about 89%, this share is even higher, so these events also deserve attention.

To understand these top 10 events better, we can plot the data as follows:

eventDataOrderByInjuries$INJURIES <- eventDataOrderByInjuries$INJURIES / 1000
par( mar = c(10,6,2,1) , las = 2 )
barplot( height = eventDataOrderByInjuries$INJURIES[1:10] , names.arg = eventDataOrderByInjuries$EVTYPE[1:10] , col = heat.colors(10) , main = "Top 10 harmful events causing the most injuries" , ylab = "Total number of injuries (thousand people)" )

Another Aspect: Geographic Spread

To emphasize the harm of these events, we can also count in how many counties they occurred.
In our data, the total number of distinct county codes (values of the COUNTY column) is:

length( unique(data$COUNTY) )
## [1] 557

The number of county codes in which the top 10 events by fatalities occurred is:

length(unique(data[ data$EVTYPE %in% eventDataOrderByFatalities[1:10,"EVTYPE"] , "COUNTY"]))
## [1] 446

The percentage is:

length(unique(data[ data$EVTYPE %in% eventDataOrderByFatalities[1:10,"EVTYPE"] , "COUNTY"])) / length( unique(data$COUNTY) )
## [1] 0.8007181

The number of county codes in which the top 10 events by injuries occurred is:

length(unique(data[ data$EVTYPE %in% eventDataOrderByInjuries[1:10,"EVTYPE"] , "COUNTY"]))
## [1] 424

The percentage is:

length(unique(data[ data$EVTYPE %in% eventDataOrderByInjuries[1:10,"EVTYPE"] , "COUNTY"])) / length( unique(data$COUNTY) )
## [1] 0.7612208

These numbers are high: 76% to 80% of the county codes in the data recorded at least one of these events. The events therefore not only cause serious harm to population health, but also occur widely.
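One caveat on the counts above: COUNTY appears to hold a county code that is numbered within each state, so the same code can recur in different states and unique(data$COUNTY) may undercount distinct places. A minimal sketch of counting state and county pairs instead, assuming the data also carries a STATE column, is shown below on made-up rows. countPlaces and the toy values are hypothetical.

```r
# Made-up rows; county code 1 appears in two different states
toy <- data.frame(
  STATE  = c("AL", "AL", "TX", "TX", "TX"),
  COUNTY = c(1, 3, 1, 3, 5),
  EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
  stringsAsFactors = FALSE
)

# Count distinct (STATE, COUNTY) pairs rather than bare county codes.
# Assumes a STATE column exists alongside COUNTY.
countPlaces <- function(df) nrow(unique(df[, c("STATE", "COUNTY")]))

nCodes  <- length(unique(toy$COUNTY))   # bare codes, as in the counts above
nPlaces <- countPlaces(toy)             # state-qualified counties

# Share of places that saw at least one of the selected event types
topEvents     <- c("TORNADO", "FLOOD")
shareOfPlaces <- countPlaces(toy[toy$EVTYPE %in% topEvents, ]) / nPlaces
print(c(codes = nCodes, places = nPlaces, share = shareOfPlaces))
```

In this toy example there are only 3 distinct codes but 5 distinct places, which illustrates how far the two counts can diverge.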

Data Analysis of events with the greatest economic consequences

First of all, we aggregate our data by event type and calculate the total property damage.

eventDataForPropdmg <- aggregate( PROPDMG ~ EVTYPE , data = data , FUN = sum)

Then, we sort our new data frame, eventDataForPropdmg, by total property damage.

attach(eventDataForPropdmg)
eventDataOrderByPropdmg <- eventDataForPropdmg[ order(PROPDMG , decreasing = TRUE) , ]
detach(eventDataForPropdmg)

We can list the 10 event types with the greatest economic consequences.

head( eventDataOrderByPropdmg , 10)
##                 EVTYPE   PROPDMG
## 834            TORNADO 3212258.2
## 153        FLASH FLOOD 1420124.6
## 856          TSTM WIND 1335965.6
## 170              FLOOD  899938.5
## 760  THUNDERSTORM WIND  876844.2
## 244               HAIL  688693.4
## 464          LIGHTNING  603351.8
## 786 THUNDERSTORM WINDS  446293.2
## 359          HIGH WIND  324731.6
## 972       WINTER STORM  132720.6
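The table lists TSTM WIND, THUNDERSTORM WIND and THUNDERSTORM WINDS as separate rows, although they look like spellings of the same event. A minimal sketch of merging such labels before ranking follows. It uses the ten totals printed above, and the two substitution rules and the column name EVTYPE_CLEAN are assumptions made for illustration, not a complete cleaning of EVTYPE.

```r
# Top 10 rows as printed in the table above
top <- data.frame(
  EVTYPE  = c("TORNADO", "FLASH FLOOD", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND",
              "HAIL", "LIGHTNING", "THUNDERSTORM WINDS", "HIGH WIND", "WINTER STORM"),
  PROPDMG = c(3212258.2, 1420124.6, 1335965.6, 899938.5, 876844.2,
              688693.4, 603351.8, 446293.2, 324731.6, 132720.6),
  stringsAsFactors = FALSE
)

# Assumed normalization rules: expand the TSTM abbreviation, drop the plural
top$EVTYPE_CLEAN <- sub("^TSTM", "THUNDERSTORM", top$EVTYPE)
top$EVTYPE_CLEAN <- sub("WINDS$", "WIND", top$EVTYPE_CLEAN)

# Re-aggregate on the cleaned label and re-rank
merged <- aggregate(PROPDMG ~ EVTYPE_CLEAN, data = top, FUN = sum)
merged <- merged[order(merged$PROPDMG, decreasing = TRUE), ]
print(merged)
```

With the three labels merged, thunderstorm wind moves up to second place behind TORNADO among these rows. A full analysis would apply the cleaning to the raw data before aggregating, since other spellings may exist outside the top 10.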

To understand the top 10 events better, we can plot the data as follows:

eventDataOrderByPropdmg$PROPDMG <- eventDataOrderByPropdmg$PROPDMG / 1000000
par( mar = c(10,6,2,1) , las = 2 )
barplot( height = eventDataOrderByPropdmg$PROPDMG[1:10] , names.arg = eventDataOrderByPropdmg$EVTYPE[1:10] , col = heat.colors(10) ,  main = "Top 10 events with the greatest economic consequences" , ylab = "Total Property Damages (Million Dollars)" )
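One caveat about units: as far as the Storm Data documentation describes it, PROPDMG is meant to be read together with a second column, PROPDMGEXP, which encodes the magnitude of each figure (for example K for thousands, M for millions, B for billions). The totals above sum PROPDMG as stored. A minimal sketch of scaling to dollars before summing follows, on made-up rows. The multiplier table covers only K, M and B and treats every other code as 1, which is an assumption, and PROPDMG_USD is a hypothetical column name.

```r
# Made-up rows with a magnitude code next to each damage figure
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 1.2, 500),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# Assumed multipliers; any code not listed here falls back to 1
expMultiplier <- c(K = 1e3, M = 1e6, B = 1e9)
mult <- unname(expMultiplier[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 1

# Scale each row to dollars, then aggregate and rank as before
toy$PROPDMG_USD <- toy$PROPDMG * mult
damageUsd <- aggregate(PROPDMG_USD ~ EVTYPE, data = toy, FUN = sum)
damageUsd <- damageUsd[order(damageUsd$PROPDMG_USD, decreasing = TRUE), ]
print(damageUsd)
```

In the toy data HAIL has the largest raw PROPDMG value but the smallest dollar amount, so the ranking can change once the magnitude code is applied.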

Results

From the above, we can see that the events in the three top 10 lists are serious for the U.S., in terms of both population health and economic damage. If we look at these events more closely, we find that there are not 30 different event types in total: some critical events cause not only fatalities and injuries, but also economic damage. Among them, TORNADO is the most serious. FLOOD and TSTM WIND are also very harmful. If we can prevent or forecast these events, we can avoid a great deal of loss.
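The overlap between the three lists can be checked directly. The sketch below copies the event names from the three top 10 tables in this report and counts how many distinct event types they contain and which ones appear in all three. The vector names are introduced here for illustration.

```r
# Event types copied from the three top 10 tables in this report
topFatalities <- c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING",
                   "TSTM WIND", "FLOOD", "RIP CURRENT", "HIGH WIND", "AVALANCHE")
topInjuries   <- c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING",
                   "HEAT", "ICE STORM", "FLASH FLOOD", "THUNDERSTORM WIND", "HAIL")
topPropdmg    <- c("TORNADO", "FLASH FLOOD", "TSTM WIND", "FLOOD", "THUNDERSTORM WIND",
                   "HAIL", "LIGHTNING", "THUNDERSTORM WINDS", "HIGH WIND", "WINTER STORM")

lists <- list(topFatalities, topInjuries, topPropdmg)

# Distinct event types across all three lists
allEvents <- Reduce(union, lists)

# Event types that appear in every list
inAllThree <- Reduce(intersect, lists)

print(length(allEvents))
print(inAllThree)
```

The three lists contain 15 distinct event types rather than 30, and five of them (TORNADO, FLASH FLOOD, LIGHTNING, TSTM WIND and FLOOD) appear in all three.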