Background

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Data Analysis Summary

Health-related data fields (FATALITIES and INJURIES) are selected. Get the total count of FATALITIES and INJURIES for each event type, and order the results in descending order. The top 4 records will show the most harmful events with respect to population health.

Economic loss related data fields (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) are selected. Firstly, transform the exponential power to digits. Then, calculate the total economic loss of the property damage and crop damage. Lastly, order the records by the total loss in descending order and get the top 4 records.

Source Data Descriptions and Summaries

For the most harmful events, count the number of fatalities and injuries for each type of event, we can get the FATALITIES_SUM and INJURIES_SUM for each event type. HEALTHHARM_SUM is the sum of FATALITIES_SUM and INJURIES_SUM.

For the greatest economic consequences, calculate the economic loss of property damage and crop damage for each record. Group the records by event type and get the economic loss of property damage and crop damage for each event type. Also, sum the two damages to get the total damage for each event type.

Data Transformation Approaches

“B” or “b” represents billion, so the exponential power is 9.
“M” or “m” represents million, so the exponential power is 6.
“K” or “k” represents thousand, so the exponential power is 3.
“H” or “h” represents hundred, so the exponential power is 2.
Digit from 0 - 9 represents the 10 to the power of n, where n is the digit.
Empty or other charaters will represents value 1, so the exponential power is 0.
The exponential power digit will be used to calculate the economic loss for the damages.

Data Processing

Download and select the relevant data fields.

library(tidyverse)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "/Users/wenwenxia/Desktop/DataScienceSpecialization/Course_5/Week4/NOAA_Storm_Data_Analysis/repdata_data_StormData.csv.bz2")
data <- read.csv("repdata_data_StormData.csv.bz2")
dt1 <- data %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
names(dt1)
## [1] "EVTYPE"     "FATALITIES" "INJURIES"   "PROPDMG"    "PROPDMGEXP"
## [6] "CROPDMG"    "CROPDMGEXP"

Tranform data for population health

dt2 <- dt1 %>% mutate(HEALTHHARM = FATALITIES + INJURIES) %>% 
    select(EVTYPE, FATALITIES, INJURIES, HEALTHHARM)
dt3 <- dt2 %>% group_by(EVTYPE) %>% summarise(FATALITIES_SUM = sum(FATALITIES),
                                              INJURIES_SUM = sum(INJURIES),
                                              HEALTHHARM_SUM = sum(HEALTHHARM)) %>%
    arrange(-HEALTHHARM_SUM, -FATALITIES_SUM, -INJURIES_SUM) %>% top_n(4)

dt3$num <-1:nrow(dt3)
dt3$ORDER_EVTYPE <- paste(dt3$num, dt3$EVTYPE, sep= "_")
head(dt3)
## # A tibble: 4 × 6
##   EVTYPE    FATALITIES_SUM INJURIES_SUM HEALTHHARM_SUM   num ORDER_EVTYPE
##   <chr>              <dbl>        <dbl>          <dbl> <int> <chr>       
## 1 TORNADO             4320        73715          78035     1 1_TORNADO   
## 2 TSTM WIND            348         4644           4992     2 2_TSTM WIND 
## 3 LIGHTNING            292         1923           2215     3 3_LIGHTNING 
## 4 ICE STORM             42         1837           1879     4 4_ICE STORM

Transform data for economic consequences

library(switchcase)

#convert the exponential character into multiplier integer
exp_to_multiplier <- function(dmgexp){
    digit <- switchCase(
        dmgexp,
        alt(
            ..expr  == "B" |  ..expr == "b",
            {},
            {"9"}
        ),
        alt(
            ..expr  == "M" |  ..expr == "m",
            {},
            "6"
        ),
        alt(
            ..expr  == "K" |  ..expr == "k",
            {},
            "3"
        ),
        alt(
            ..expr  == "H" |  ..expr == "h",
            {},
            "2"
        ),
        alt(
            ..expr  %in% c('0', '1', '2', '3', '4', '5', '6', '7', '8', '9'),
            {},
            dmgexp
        ),
        alt(
            ..expr %in% c('', '+', '-', '?'),
            {},
            "0"
        )
    )
    return (10 ^ as.numeric(digit))
}

Calculate the total economic loss for each event

dt4 <- dt1 %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
for(i in 1:nrow(dt4)){
    dt4$PROPDMGUSD[i] = exp_to_multiplier(dt4$PROPDMGEXP[i]) * dt4$PROPDMG[i]
    dt4$CROPDMGUSD[i] = exp_to_multiplier(dt4$CROPDMGEXP[i]) * dt4$CROPDMG[i]
    dt4$TOTALDMGUSD[i] = dt4$PROPDMGUSD[i] + dt4$CROPDMGUSD[i]
}

Get the events with the greatest economic consequences

dt5 <- dt4 %>% select(EVTYPE, PROPDMGUSD, CROPDMGUSD, TOTALDMGUSD) %>% 
  group_by(EVTYPE) %>% summarise(PROPDMGUSD_SUM=sum(PROPDMGUSD),
                                 CROPDMGUSD_SUM=sum(CROPDMGUSD),
                                 TOTALDMGUSD_SUM=sum(TOTALDMGUSD)) %>%
  arrange(-TOTALDMGUSD_SUM) %>% top_n(4)

dt5$num <-1:nrow(dt5)
dt5$ORDER_EVTYPE <- paste(dt5$num, dt5$EVTYPE, sep= "_")
head(dt5)
## # A tibble: 4 × 6
##   EVTYPE      PROPDMGUSD_SUM CROPDMGUSD_SUM TOTALDMGUSD_SUM   num ORDER_EVTYPE 
##   <chr>                <dbl>          <dbl>           <dbl> <int> <chr>        
## 1 TORNADO       35189550166.      157639060    35347189226.     1 1_TORNADO    
## 2 FLOOD          8897429557      1306553550    10203983107      2 2_FLOOD      
## 3 RIVER FLOOD    5118939500      5029459000    10148398500      3 3_RIVER FLOOD
## 4 ICE STORM       972185540      5012173500     5984359040      4 4_ICE STORM

Results

Plot for population health

library(ggplot2)
ggplot(dt3, aes(as.factor(ORDER_EVTYPE), group=1)) + 
  geom_line(aes(y = HEALTHHARM_SUM, colour = "Health Harmful")) + 
  geom_line(aes(y = FATALITIES_SUM, colour = "Fatalities")) +
  geom_line(aes(y = INJURIES_SUM, colour = "Injuries")) +
  ggtitle("Events Affacting Polulation Health") +
  xlab("Event Type")+
  ylab("Number of Cases")

The most harmful event types with respect to population health are TORNADO, TSTM WIND, LIGHTNING, and ICE STORM.

Plot for economic damage

ggplot(dt5, aes(as.factor(ORDER_EVTYPE), group=1)) + 
  geom_line(aes(y = TOTALDMGUSD_SUM, colour = "Total Damage")) + 
  geom_line(aes(y = PROPDMGUSD_SUM, colour = "Property Damage")) +
  geom_line(aes(y = CROPDMGUSD_SUM, colour = "Crop Damage")) +
  ggtitle("Events with Greatest Economic Consequences") +
  xlab("Event Type")+
  ylab("Economic Loss in USD")

The event types with greatest economic consequences are TORNADO, FLOOD, RIVER FLOOD and ICE_STORM.