Tornadoes, excessive heat, and floods are the severe weather types most harmful to public health and the economy

Synopsis

The NOAA storm data contains 985 distinct severe weather types in the records after 1990. This analysis examines the toll of those weather types on public health (injuries and deaths) and on the economy (property damage and crop damage). Across the US, tornadoes cause the most injuries, whereas excessive heat causes the most fatalities among all the weather types. In terms of economic damage, floods generate the most damage, ahead of hurricanes/typhoons.

Data processing

  1. load data
if(!file.exists("StormData.csv.bz2")){
        fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
        download.file(fileurl, destfile = "StormData.csv.bz2")
}

storm <- read.csv(bzfile("StormData.csv.bz2"), header = TRUE, stringsAsFactors = FALSE)
  2. transform the dataset
str(storm)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
dim(storm)
## [1] 902297     37
names(storm) <- tolower(names(storm))
storm$bgn_date <- as.Date(storm$bgn_date, "%m/%d/%Y %H:%M:%S")

storm$year <- as.numeric(format(storm$bgn_date, "%Y"))

hist(x= storm$year, breaks = 30)

The histogram shows far more records per year after 1990, so the analysis keeps only events with year > 1990.

storm1 <- storm[storm$year > 1990,]
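
The cutoff can also be checked numerically by counting records per year. The following is a minimal sketch on a small made-up vector of years (the values are illustrative only, not taken from the NOAA data); on the real data the equivalent call would be table(storm$year).

```r
# Toy vector of event years (illustrative values only, not the NOAA data)
years <- c(1950, 1950, 1989, 1990, 1991, 1991, 1991, 2011, 2011)

# Count records per year; the table names are the years
records_per_year <- table(years)
print(records_per_year)

# Keep only records strictly after 1990, mirroring storm$year > 1990
kept <- years[years > 1990]
print(length(kept))
```

Because the comparison is strict, records dated 1990 itself are dropped and the retained data starts in 1991.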

Results

Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?

load libraries

library(dplyr)
library(ggplot2)
health <- storm1 %>%
        select(injuries, fatalities, evtype) %>%
        group_by(events = as.factor(evtype)) %>%
        summarise(total_injuries = sum(injuries), total_death = sum(fatalities)) 

health %>% arrange(desc(total_injuries))
## # A tibble: 985 x 3
##               events total_injuries total_death
##               <fctr>          <dbl>       <dbl>
## 1            TORNADO          25497        1699
## 2              FLOOD           6789         470
## 3     EXCESSIVE HEAT           6525        1903
## 4          LIGHTNING           5230         816
## 5          TSTM WIND           4441         285
## 6               HEAT           2100         937
## 7          ICE STORM           1975          89
## 8        FLASH FLOOD           1777         978
## 9  THUNDERSTORM WIND           1488         133
## 10      WINTER STORM           1321         206
## # ... with 975 more rows
health %>% arrange(desc(total_death))
## # A tibble: 985 x 3
##            events total_injuries total_death
##            <fctr>          <dbl>       <dbl>
## 1  EXCESSIVE HEAT           6525        1903
## 2         TORNADO          25497        1699
## 3     FLASH FLOOD           1777         978
## 4            HEAT           2100         937
## 5       LIGHTNING           5230         816
## 6           FLOOD           6789         470
## 7     RIP CURRENT            232         368
## 8       TSTM WIND           4441         285
## 9       HIGH WIND           1137         248
## 10      AVALANCHE            170         224
## # ... with 975 more rows

We can see from the results that tornadoes cause the most injuries and excessive heat causes the most fatalities.

Make plots for the injuries and fatalities

injury1 <- health %>%
        filter(total_injuries > 300) %>%
        arrange(desc(total_injuries))

ggplot(injury1, aes( x = reorder(events, -total_injuries), y = total_injuries)) + geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = 90)) + xlab("Severe Weather Type") + ylab("Number of Injuries")

> From the plot, we conclude that tornadoes caused the most injuries after 1990, and floods are the second most injurious severe weather type.

fatal <- health %>%
        filter(total_death > 100)  %>%
        arrange(desc(total_death))
ggplot(fatal, aes ( x = reorder(events, -total_death), y = total_death)) + geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = 90)) + xlab("Severe Weather Type") + ylab("Number of Deaths")

> The ordered plot shows that excessive heat is the leading cause of severe weather fatalities (1903 deaths between 1991 and 2011). Tornadoes come second, with 1699 deaths over the same period.

Question 2: Across the United States, which types of events have the greatest economic consequences?

  1. reduce the dataset to what concerns the question: property damage and crop damage
dmg <- storm1 %>% select(evtype, bgn_date, propdmg, propdmgexp, cropdmg,cropdmgexp)

  2. transform the “dmg” dataset

* need to interpret the exponent multipliers

unique(dmg$propdmgexp)
##  [1] ""  "K" "M" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(dmg$cropdmgexp)
## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

“” or “?” = 0; “0”, “+” or “-” = 1; “1” = 10; “H”, “h” or “2” = 10^2; “K”, “k” or “3” = 10^3; “4” = 10^4; “5” = 10^5; “M”, “m” or “6” = 10^6; “7” = 10^7; “8” = 10^8; “B” or “b” = 10^9.

dmg$propdmgexp <- toupper(dmg$propdmgexp)
dmg$cropdmgexp <- toupper(dmg$cropdmgexp)

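# Note: the order of these substitutions matters, because the replacement values themselves contain digits and "+"
dmg$propdmgexp <- gsub("[1]", 10, dmg$propdmgexp)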
dmg$propdmgexp <- gsub("\\+|\\-|^0", 1, dmg$propdmgexp)
dmg$propdmgexp <- gsub("\\?", 0, dmg$propdmgexp)
dmg$propdmgexp <- gsub("^$", 0, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[H2]", 10^2, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[K3]", 10^3, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[M6]", 10^6, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[4]", 10^4, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[5]", 10^5, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[7]", 10^7, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[8]", 10^8, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[B]", 10^9, dmg$propdmgexp)


dmg$cropdmgexp <- gsub("\\+|\\-|^0", 1, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("\\?", 0, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("^$", 0, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("[2]", 10^2, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("K", 10^3, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("M", 10^6, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("B", 10^9, dmg$cropdmgexp)
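
The chain of gsub() calls works only because of the order in which the patterns run. As an alternative, the same mapping can be expressed as a lookup table. This is a minimal sketch, not the method used in this analysis: the names exp_lookup and to_multiplier are hypothetical, and the values simply restate the mapping listed above.

```r
# Hypothetical lookup table restating the exponent mapping described above
exp_lookup <- c("?" = 0, "+" = 1, "-" = 1, "0" = 1, "1" = 10,
                "H" = 1e2, "2" = 1e2, "K" = 1e3, "3" = 1e3,
                "4" = 1e4, "5" = 1e5, "M" = 1e6, "6" = 1e6,
                "7" = 1e7, "8" = 1e8, "B" = 1e9)

# to_multiplier is an illustrative helper, not part of the original analysis
to_multiplier <- function(code) {
  code <- toupper(code)
  out <- unname(exp_lookup[code])  # NA for codes missing from the table, including ""
  out[code == ""] <- 0             # an empty exponent is treated as 0, as in the text
  out
}

codes <- c("K", "m", "", "?", "+", "5", "B")
multipliers <- to_multiplier(codes)
print(multipliers)

# Worked example: a propdmg of 25 with exponent "K" is 25 * 10^3 dollars
print(25 * to_multiplier("K"))
```

A lookup table has no ordering pitfalls, and any code missing from the table surfaces as NA instead of passing through silently.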

Calculate the total damage and store it in a new variable “total_loss”

dmg <- dmg %>%
        mutate(prop_dmg = propdmg * as.numeric(propdmgexp), crop_dmg = cropdmg * as.numeric(cropdmgexp), total_loss = prop_dmg + crop_dmg)
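# Note: filter() keeps only single events with total_loss above 10^9 dollars before summing by type
dmg1990 <- dmg %>%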
        select(evtype, prop_dmg, crop_dmg, date = bgn_date, total_loss) %>%
        filter(total_loss > 10^9) %>%
        group_by(evtype) %>%
        summarise( total = sum(total_loss)) %>%
        arrange(desc(total))

print(dmg1990)
## # A tibble: 18 x 2
##                        evtype        total
##                         <chr>        <dbl>
## 1                       FLOOD 121532501000
## 2           HURRICANE/TYPHOON  66438500000
## 3                 STORM SURGE  42560000000
## 4                 RIVER FLOOD  10000000000
## 5                   HURRICANE   5501000000
## 6              TROPICAL STORM   5150000000
## 7                   ICE STORM   5000500000
## 8                WINTER STORM   5000000000
## 9                     TORNADO   4300000000
## 10           STORM SURGE/TIDE   4000000000
## 11  HEAVY RAIN/SEVERE WEATHER   2500000000
## 12                  HIGH WIND   2404000000
## 13             HURRICANE OPAL   2105000000
## 14                       HAIL   1800000000
## 15 TORNADOES, TSTM WIND, HAIL   1602500000
## 16           WILD/FOREST FIRE   1500000000
## 17        SEVERE THUNDERSTORM   1200000000
## 18                   WILDFIRE   1046500000

From the summary, we see that floods cause the most damage among events whose individual loss exceeds 10^9 dollars.

ggplot(dmg1990, aes( x = reorder(evtype, -total), y = total)) + geom_bar(stat = "identity") + theme(axis.text.x=element_text(angle =  90)) + xlab("Severe Weather Type") + ylab("Total Loss (property and crops) in Dollars")

> From the plot, we see that floods cause the most damage, followed by hurricanes/typhoons. This makes sense given that floods are more frequent than hurricanes: even though a hurricane is the stronger event, hurricanes usually cause flooding as well.
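
The dmg1990 summary filters on total_loss > 10^9 before grouping, so it ranks event types only by their individual billion-dollar events. The sketch below uses made-up numbers (not the NOAA data) to show how that choice can differ from summing every event per type; the data frame toy and its values are assumptions for illustration.

```r
# Toy losses (illustrative values only): one large flood, three mid-sized tornado events
toy <- data.frame(
  evtype     = c("FLOOD", "TORNADO", "TORNADO", "TORNADO"),
  total_loss = c(2e9, 8e8, 8e8, 8e8)
)

# Approach used in the analysis: keep single events above 10^9, then sum per type
big_only <- aggregate(total_loss ~ evtype,
                      data = toy[toy$total_loss > 10^9, ],
                      FUN = sum)

# Alternative: sum every event per type, with no per-event threshold
all_events <- aggregate(total_loss ~ evtype, data = toy, FUN = sum)

print(big_only)
print(all_events)
```

In this toy case the tornado events never pass the per-event threshold, yet their combined loss exceeds that of the single flood. Whether the same effect changes the ranking in the real data would need to be checked separately.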