Analysis of public health and economic impact of storms and other severe weather

Synopsis:

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. The analysis addresses two questions: 1) which types of events are most harmful to population health, and 2) which types of events have the greatest economic consequences? The dataset provides estimates of the fatalities, injuries, property damage, and crop damage for each event. The analysis concludes that tornadoes are the most harmful events for population health and that floods have the greatest economic consequences.

Data Processing

1. Download the storm data from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2 and save it to the data/ folder.
2. Read the bz2 file directly with read.csv.
3. The download command cannot be cached, so it is commented out after its one-time use.
4. Reading a CSV file of 40+ MB is time-consuming, so that step is cached.

##download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2","data/storm_data.csv.bz2",cacheOK=TRUE)

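library(dplyr)  # provides %>%, group_by, summarise, filter and arrange used below

# read.csv decompresses the bz2 archive on the fly
storm<-read.csv('./data/storm_data.csv.bz2')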
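
The manual comment-out described in step 3 could also be replaced by a guard that downloads only when the file is missing. This is a minimal sketch, not the code used for this report; `fetch_if_missing` and its `downloader` argument are hypothetical names introduced for illustration.

```r
# Hypothetical helper: download `url` to `dest` only if `dest` does not exist yet.
# `downloader` defaults to download.file; it is a parameter so the logic can be
# tried out without network access.
fetch_if_missing <- function(url, dest, downloader = utils::download.file) {
  if (!file.exists(dest)) {
    dir.create(dirname(dest), recursive = TRUE, showWarnings = FALSE)
    downloader(url, dest, mode = "wb")  # binary mode keeps the bz2 archive intact
    return(TRUE)   # a download happened
  }
  FALSE            # file already present, nothing done
}

# Usage (paths as in the steps above):
# fetch_if_missing(
#   "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#   "data/storm_data.csv.bz2")
# storm <- read.csv("data/storm_data.csv.bz2")
```

With such a guard the report can be re-knit repeatedly without fetching the archive again.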

Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

Results

This analysis uses fatalities and injuries as the measure of harm to population health. Both direct and indirect fatalities and injuries are included.

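# Total fatalities and injuries per event type.
# desc() takes a single vector, so each sort key gets its own desc() call.
storm %>% group_by(EVTYPE) %>%
  summarise(total_fatalities = sum(FATALITIES, na.rm = TRUE),
            total_injuries = sum(INJURIES, na.rm = TRUE)) %>%
  arrange(desc(total_fatalities), desc(total_injuries)) ->a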

# Grouped Bar Plot
b<-a[1:4,]
barplot(t(as.matrix(b[, 2:3])), main="Top 4 Fatalities/Injuries Distribution by Type of Events",names.arg=b$EVTYPE,
  xlab="Type of Events", legend=colnames(b[,2:3]),col=c("darkblue","red"),beside=TRUE)

The grouped bar plot shows the top 4 event types ranked by total fatalities, together with their total injuries. Tornadoes are the most harmful with respect to population health, causing 5,633 fatalities and 91,346 injuries.
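
The ranking logic can be checked on a small made-up table before trusting it on the full dataset. The sketch below uses base R only, so it runs without extra packages; the `toy` data frame and its numbers are invented for illustration and are not NOAA data.

```r
# Toy data (invented values, not taken from the NOAA dataset)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT"),
  FATALITIES = c(3, 5, 4, 1, 2),
  INJURIES   = c(40, 10, 25, 2, NA)
)

# Sum fatalities and injuries per event type; na.pass keeps rows with NA
# so that sum(..., na.rm = TRUE) can skip only the missing cell
totals <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = toy,
                    FUN = sum, na.rm = TRUE, na.action = na.pass)

# Sort by fatalities, breaking ties by injuries, both descending
ranked <- totals[order(-totals$FATALITIES, -totals$INJURIES), ]
print(ranked)
```

In this toy table TORNADO and HEAT tie on fatalities, so the injury totals decide their order.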

Across the United States, which types of events have the greatest economic consequences?

Two damage estimates are available in the dataset: property damage (PROPDMG) and crop damage (CROPDMG). PROPDMGEXP and CROPDMGEXP give their respective magnitudes.

Data Processing (question 2)

According to the storm data documentation (10-1605_StormDataPrep.doc), DMGEXP should be an alphabetical character signifying the magnitude of the number, e.g., 1.55B for $1,550,000,000. Alphabetical characters used to signify magnitude include “K” for thousands, “M” for millions, and “B” for billions.
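
As a worked example of the documented convention, the sketch below converts a value and its magnitude letter into dollars. `to_dollars` and `magnitude` are hypothetical names introduced here; treating lowercase letters like their uppercase forms is an assumption of this sketch, not something the documentation states.

```r
# Multipliers for the three documented magnitude letters
magnitude <- c(K = 1e3, M = 1e6, B = 1e9)

# Hypothetical helper: value and magnitude letter -> dollars.
# Codes other than K/M/B (any case) yield NA so they can be inspected separately.
to_dollars <- function(value, exp) {
  value * unname(magnitude[toupper(as.character(exp))])
}

to_dollars(1.55, "B")                # equals 1.55e9, i.e. $1,550,000,000
to_dollars(c(25, 2.5), c("K", "M"))  # vectorised over rows
to_dollars(13, "5")                  # NA: "5" is not a documented letter
```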

storm %>% group_by(EVTYPE) %>% select(PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP,EVTYPE,REMARKS) -> cost

summary(cost)
##     PROPDMG          PROPDMGEXP        CROPDMG          CROPDMGEXP    
##  Min.   :   0.00          :465934   Min.   :  0.000          :618413  
##  1st Qu.:   0.00   K      :424665   1st Qu.:  0.000   K      :281832  
##  Median :   0.00   M      : 11330   Median :  0.000   M      :  1994  
##  Mean   :  12.06   0      :   216   Mean   :  1.527   k      :    21  
##  3rd Qu.:   0.50   B      :    40   3rd Qu.:  0.000   0      :    19  
##  Max.   :5000.00   5      :    28   Max.   :990.000   B      :     9  
##                    (Other):    84                     (Other):     9  
##                EVTYPE      
##  HAIL             :288661  
##  TSTM WIND        :219940  
##  THUNDERSTORM WIND: 82563  
##  TORNADO          : 60652  
##  FLASH FLOOD      : 54277  
##  FLOOD            : 25326  
##  (Other)          :170878  
##                                            REMARKS      
##                                                :287433  
##                                                : 24013  
##  Trees down.\n                                 :  1110  
##  Several trees were blown down.\n              :   568  
##  Trees were downed.\n                          :   446  
##  Large trees and power lines were blown down.\n:   432  
##  (Other)                                       :588295

As the summary indicates, PROPDMGEXP and CROPDMGEXP are incomplete and contain undocumented codes.

There are three major issues:
1. Missing DMGEXP data
2. Category “5” is not clear
3. Category “0” is not clear
I examine each issue below and then map each category to a unit.

Examining missing DMGEXP data

cost %>%  filter(PROPDMGEXP=="") -> outliers_na
summary(outliers_na)
##     PROPDMG           PROPDMGEXP        CROPDMG           CROPDMGEXP    
##  Min.   : 0.00000          :465934   Min.   :  0.0000          :461616  
##  1st Qu.: 0.00000   -      :     0   1st Qu.:  0.0000   K      :  3865  
##  Median : 0.00000   ?      :     0   Median :  0.0000   M      :   443  
##  Mean   : 0.00113   +      :     0   Mean   :  0.5121   B      :     4  
##  3rd Qu.: 0.00000   0      :     0   3rd Qu.:  0.0000   0      :     3  
##  Max.   :75.00000   1      :     0   Max.   :990.0000   ?      :     2  
##                     (Other):     0                      (Other):     1  
##                 EVTYPE                                  REMARKS      
##  HAIL              :196662                                  :245306  
##  TSTM WIND         :157095                                  : 16990  
##  FLASH FLOOD       : 21319   Trees down.\n                  :   590  
##  THUNDERSTORM WINDS:  8951   Penny size hail was observed.\n:   315  
##  TORNADO           :  8805   Trees were downed.\n           :   279  
##  HEAVY SNOW        :  8695   Trees were blown down.\n       :   234  
##  (Other)           : 64407   (Other)                        :202220

Since the mean of PROPDMG is close to zero (0.00113) and the maximum is small (75), damage estimates with a blank DMGEXP are treated as negligible and are left out of the totals.
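
The summary above shows a maximum of 75, so at least some blank-exponent rows carry a non-zero PROPDMG. A quick count makes the size of that exception explicit. The sketch below runs on an invented `toy` data frame in base R; the same three lines could be pointed at the blank-exponent subset of the real data.

```r
# Toy data (invented): PROPDMG with its exponent code; "" means the code is blank
toy <- data.frame(
  PROPDMG    = c(0, 0, 75, 0, 12, 0.5),
  PROPDMGEXP = c("", "", "", "", "K", "M"),
  stringsAsFactors = FALSE
)

blank <- toy[toy$PROPDMGEXP == "", ]

n_blank   <- nrow(blank)             # rows with no exponent code
n_nonzero <- sum(blank$PROPDMG > 0)  # blank rows that still report damage
share     <- n_nonzero / n_blank     # fraction of blank rows affected

cat(n_nonzero, "of", n_blank, "blank rows have non-zero PROPDMG\n")
```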

Examining category “5”

cost %>%  filter(PROPDMGEXP=='5') -> outliers_5
tail(outliers_5)
## # A tibble: 6 x 6
## # Groups:   EVTYPE [6]
##   PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP EVTYPE             REMARKS        
##     <dbl> <fct>        <dbl> <fct>      <fct>              <fct>          
## 1   0.    5               0. ""         HAIL               "  "           
## 2   0.    5               0. ""         THUNDERSTORM WINDS "A large awnin~
## 3   0.200 5               0. ""         TORNADO            "  "           
## 4   0.700 5               0. ""         FLOODING           "  "           
## 5  13.0   5               0. ""         LIGHTNING          "Lightning set~
## 6   6.40  5             430. K          FLASH FLOOD        "  "

After examining one of the remarks in the “5” category, I found that “5” represents 5K. See row 5 of the output above.

Examining category “0”

cost %>%  filter(PROPDMGEXP=='0') -> outliers_0
summary(outliers_0)
##     PROPDMG         PROPDMGEXP     CROPDMG          CROPDMGEXP 
##  Min.   :  0.00   0      :216   Min.   :  0.000          :211  
##  1st Qu.: 10.00          :  0   1st Qu.:  0.000   K      :  4  
##  Median : 30.00   -      :  0   Median :  0.000   M      :  1  
##  Mean   : 32.91   ?      :  0   Mean   :  1.002   ?      :  0  
##  3rd Qu.: 50.00   +      :  0   3rd Qu.:  0.000   0      :  0  
##  Max.   :150.00   1      :  0   Max.   :160.000   2      :  0  
##                   (Other):  0                     (Other):  0  
##                 EVTYPE   
##  THUNDERSTORM WINDS:158  
##  LIGHTNING         : 13  
##  HAIL              : 12  
##  FLASH FLOOD       : 10  
##  TORNADO           :  9  
##  FLOOD/FLASH FLOOD :  2  
##  (Other)           : 12  
##                                                                                                               REMARKS   
##                                                                                                                   : 27  
##  Thunderstorm winds blew down a large tree east of Hampton and knocked power lines down in Hampton and McDonough. :  3  
##  Thunderstorm winds knocked down a couple of trees.                                                               :  2  
##  Thunderstorm winds knocked down a pine tree near Starrs Mill and Bradford pear tree west of Hampton.             :  2  
##  Thunderstorm winds knocked down trees and power lines.                                                           :  2  
##  Thunderstorm winds knocked numerous trees down on power lines.                                                   :  2  
##  (Other)                                                                                                          :178

I could not find any information about this category in the documentation or the remarks. Since the majority of these events are Thunderstorm Winds, I make an educated guess that the unit is “M”.

Convert each category to the proper unit, bind all categories together, and plot the top 5 events by total damage cost.

cost %>% group_by(EVTYPE) %>% filter(PROPDMGEXP=='K') %>% summarise(total=sum(PROPDMG*1000)) -> cost_p_k
cost %>% group_by(EVTYPE) %>% filter(PROPDMGEXP=='M') %>% summarise(total=sum(PROPDMG*1000000)) -> cost_p_m
cost %>% group_by(EVTYPE) %>% filter(PROPDMGEXP=='0') %>% summarise(total=sum(PROPDMG*1000000)) -> cost_p_0
cost %>% group_by(EVTYPE) %>% filter(PROPDMGEXP=='B') %>% summarise(total=sum(PROPDMG*1000000000)) -> cost_p_b
cost %>% group_by(EVTYPE) %>% filter(PROPDMGEXP=='5') %>% summarise(total=sum(PROPDMG*5000)) -> cost_p_5k

cost %>% group_by(EVTYPE) %>% filter(CROPDMGEXP=='K') %>% summarise(total=sum(CROPDMG*1000)) -> cost_c_k1
cost %>% group_by(EVTYPE) %>% filter(CROPDMGEXP=='M') %>% summarise(total=sum(CROPDMG*1000000)) -> cost_c_m
cost %>% group_by(EVTYPE) %>% filter(CROPDMGEXP=='k') %>% summarise(total=sum(CROPDMG*1000)) -> cost_c_k2
cost %>% group_by(EVTYPE) %>% filter(CROPDMGEXP=='0') %>% summarise(total=sum(CROPDMG*1000000)) -> cost_c_0
cost %>% group_by(EVTYPE) %>% filter(CROPDMGEXP=='B') %>% summarise(total=sum(CROPDMG*1000000000)) -> cost_c_b

bind_rows(cost_p_k,cost_p_m,cost_p_0,cost_p_b,cost_p_5k,cost_c_k1,cost_c_m,cost_c_k2,cost_c_0,cost_c_b) -> result
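
The ten pipelines above apply one multiplier per exponent code, which can also be written as a lookup table. The following is a base R sketch on invented data, not the code used for the reported totals; `dollars`, `prop_mult`, `crop_mult` and `toy` are hypothetical names. The multipliers mirror the choices made in this analysis, including the assumed values for “0” and “5”.

```r
# Multipliers as used in this analysis: "0" is treated as millions and
# "5" as 5K (both are the author's interpretation, not documented codes)
prop_mult <- c(K = 1e3, M = 1e6, "0" = 1e6, B = 1e9, "5" = 5e3)
crop_mult <- c(K = 1e3, k = 1e3, M = 1e6, "0" = 1e6, B = 1e9)

# Toy data (invented values)
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL", "TORNADO", "HAIL"),
  PROPDMG    = c(2, 500, 10, 3, 1),
  PROPDMGEXP = c("B", "K", "K", "M", ""),
  CROPDMG    = c(1, 0, 20, 0, 5),
  CROPDMGEXP = c("M", "", "k", "", "?"),
  stringsAsFactors = FALSE
)

# Look up each row's multiplier; codes missing from the table give NA
# and contribute 0, which matches filtering those rows out
dollars <- function(value, exp, mult) {
  m <- unname(mult[as.character(exp)])
  ifelse(is.na(m), 0, value * m)
}

toy$total <- dollars(toy$PROPDMG, toy$PROPDMGEXP, prop_mult) +
             dollars(toy$CROPDMG, toy$CROPDMGEXP, crop_mult)

# Total damage per event type, largest first
by_event <- aggregate(total ~ EVTYPE, data = toy, FUN = sum)
by_event <- by_event[order(-by_event$total), ]
print(by_event)
```

Keeping the multipliers in one place makes it easy to see, and later revise, the assumptions made for the undocumented codes.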

Results

result %>% group_by(EVTYPE) %>% summarise(total_exp=sum(total)) %>% arrange(desc(total_exp)) -> result2

top5<-result2[1:5,]
x <- barplot(top5$total_exp, main="Top 5 Event Types by Economic Consequences",names.arg=top5$EVTYPE,
  xlab="Type of Events",ylab="Total Damage Cost",beside=TRUE, las=2,xaxt="n",yaxt="n")
text(cex=0.6, x=x-.25, y=-2.25, top5$EVTYPE, xpd=TRUE, srt=45)
axis(2, at=top5$total_exp, labels=format(paste(round(top5$total_exp/1e9,1),"B"), scientific=FALSE), hadj=0.9, cex.axis=0.8, las=2)

Conclusion

As the two charts show, tornadoes caused the most harm to population health, with 5,633 deaths and 91,346 injuries. Floods caused the greatest economic loss, about 150.32 billion dollars.