Synopsis

Severe weather events can have significant impacts on individuals, the environment, the economy, and sometimes an entire country. Many severe weather events, such as tornadoes, floods, and storms, can cause large numbers of fatalities and injuries and extensive property damage.

This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States.

This project was done in RStudio. Exploratory data analysis is used to identify the weather events that caused the most fatalities and those that caused the greatest economic damage. The results are plotted accordingly.

Data Processing

CodeBook:

  • EVTYPE - event type
  • FATALITIES - fatalities, number of people killed by a given weather event type
  • INJURIES - injuries, number of people injured by a given weather event type
  • PROPDMG - property damage, dollar amount before scaling by PROPDMGEXP
  • CROPDMG - crop damage, dollar amount before scaling by CROPDMGEXP
  • PROPDMGEXP - magnitude of the dollar amount for property damage
  • CROPDMGEXP - magnitude of the dollar amount for crop damage
  • Alphabetical characters signify the magnitude of economic damage values: “K” for thousands of dollars, “M” for millions of dollars, and “B” for billions of dollars.
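
The magnitude codes in the codebook can be turned into numeric multipliers with a small lookup. The following is a minimal sketch rather than the method used in this report: the helper name `exp_to_multiplier` is hypothetical, and treating any code other than K, M, or B (including an empty string) as a multiplier of 1 is an assumption, as is accepting lowercase codes.

```r
# Hypothetical helper: map magnitude codes to numeric multipliers.
# Only "K", "M" and "B" are documented in the codebook; mapping every
# other code (including "") to 1 is an assumption made for this sketch.
exp_to_multiplier <- function(code) {
  lookup <- c(K = 1e3, M = 1e6, B = 1e9)
  mult <- unname(lookup[toupper(code)])  # unknown codes index to NA
  mult[is.na(mult)] <- 1
  mult
}

# Made-up example values, not taken from the storm database
damage_value <- c(25, 2.5, 1.5, 10)
damage_exp   <- c("K", "M", "B", "")
dollars <- damage_value * exp_to_multiplier(damage_exp)
print(dollars)
```

A lookup vector keeps the mapping in one place, so additional codes could be added without further string substitutions.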

2.1 Load the raw data file (i.e. the original .csv.bz2 file)

library(dplyr)
library(reshape2)
library(ggplot2)
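# Create a local "data" directory if it does not exist yet
if (!file.exists("data")) {
  dir.create("data")
}

# Download the compressed storm data into the working directory
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", 
              destfile = "StormData.csv.bz2", method = "curl")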

stormdata <- read.csv("StormData.csv.bz2", header = TRUE, sep = ",", stringsAsFactors = FALSE, na.strings = "NA")
str(stormdata)
## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
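
Because the compressed file is large, it can be convenient to skip the download when the file is already on disk. The sketch below shows one way to do that. The helper name `fetch_once` is hypothetical, the URL in the demonstration is a placeholder that is never contacted, and `mode = "wb"` is an assumption chosen so that binary files survive the transfer on Windows.

```r
# Hypothetical helper: download a file only when it is not already on disk.
fetch_once <- function(url, destfile) {
  if (!file.exists(destfile)) {
    # Create the target directory if needed (silently ignored if it exists)
    dir.create(dirname(destfile), recursive = TRUE, showWarnings = FALSE)
    download.file(url, destfile = destfile, mode = "wb")
  }
  invisible(destfile)
}

# Demonstration without network access: the file already exists,
# so the placeholder URL is never contacted.
existing <- tempfile(fileext = ".csv")
writeLines("a,b", existing)
fetch_once("https://example.invalid/unused.csv", existing)
```

For the storm data, the same helper could be called with the URL shown earlier and a destination such as `file.path("data", "StormData.csv.bz2")`.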

2.2 Check whether the subset data contain any missing values. is.na() finds no NA values in the selected columns (empty strings, as seen in CROPDMGEXP, are not counted as NA).

storm <- stormdata %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)

summary(is.na(storm))
##    EVTYPE        FATALITIES       INJURIES        PROPDMG       
##  Mode :logical   Mode :logical   Mode :logical   Mode :logical  
##  FALSE:902297    FALSE:902297    FALSE:902297    FALSE:902297   
##  PROPDMGEXP       CROPDMG        CROPDMGEXP     
##  Mode :logical   Mode :logical   Mode :logical  
##  FALSE:902297    FALSE:902297    FALSE:902297
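
Since `is.na()` does not flag empty strings, a complementary check is to count blank entries per column. The following sketch uses a small made-up data frame in place of `storm`; the values are illustrative only.

```r
# Toy data frame standing in for the `storm` subset (values are made up)
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  PROPDMGEXP = c("K", "", "M"),
  CROPDMGEXP = c("", "", "K"),
  stringsAsFactors = FALSE
)

# is.na() reports nothing, because "" is a valid (non-missing) string
na_counts <- colSums(is.na(toy))

# Counting empty strings per column reveals the blank entries
blank_counts <- colSums(toy == "")

print(na_counts)
print(blank_counts)
```

The same two calls could be run on the character columns of `storm` to see how many magnitude codes are blank.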

2.3 Process data for fatalities and injuries

2.3.1 In this data set, two variables are related to population health: FATALITIES and INJURIES.
2.3.2 Assumption: if fatalities or injuries equal zero, that weather event caused no harm to people.
2.3.3 Select records in which both FATALITIES and INJURIES are greater than zero. The filter below combines the two conditions with &, so records with fatalities but no injuries (or the reverse) are excluded.
2.3.4 The objective of this project is to find the most harmful weather type, so the data for these two variables are summed over all years.
2.3.5 The top ten weather events, ranked by total fatalities, are reported together with their injuries.

health_harm <- storm %>% select(EVTYPE, FATALITIES, INJURIES) %>% filter(FATALITIES > 0 & INJURIES > 0)
health_harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = health_harm, FUN = sum)
health_harm <- health_harm[order(health_harm$FATALITIES, decreasing = TRUE), ]
health_harm <- health_harm[1:10, ]
health_harm
##            EVTYPE FATALITIES INJURIES
## 70        TORNADO       5227    60187
## 11 EXCESSIVE HEAT        402     4791
## 50      LIGHTNING        283      649
## 73      TSTM WIND        199      646
## 16    FLASH FLOOD        171      641
## 17          FLOOD        104     2679
## 38      HIGH WIND        102      308
## 82   WINTER STORM         85      599
## 28           HEAT         73     1420
## 80       WILDFIRE         55      261
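
The choice between `&` and `|` in the filter changes which records contribute to the totals. The sketch below illustrates the difference on a made-up data frame; it is not a rerun of the analysis, and the event names and counts are invented.

```r
# Made-up records: event "B" has fatalities but no injuries
toy <- data.frame(
  EVTYPE     = c("A", "A", "B", "B"),
  FATALITIES = c(2, 1, 3, 0),
  INJURIES   = c(5, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Keep records where BOTH counts are positive (as in the report's filter)
both <- subset(toy, FATALITIES > 0 & INJURIES > 0)

# Keep records where EITHER count is positive
either <- subset(toy, FATALITIES > 0 | INJURIES > 0)

sum_both   <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = both,   FUN = sum)
sum_either <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = either, FUN = sum)

print(sum_both)    # only event "A", with 2 fatalities
print(sum_either)  # events "A" and "B", each with 3 fatalities
```

With `&`, event "B" disappears and event "A" loses one fatality, so totals computed this way are a lower bound on the full counts.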

2.4 Process data for economic damages

2.4.1 In this data set, four variables are related to economic damage: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
2.4.2 Assumption: damage recorded in billions of dollars outweighs damage recorded in millions or thousands.
2.4.3 Select records whose property damage is recorded in billions of dollars (PROPDMGEXP equal to "B"). Crop damage for those records may be in thousands, millions, or billions.
2.4.4 The objective of this project is to find the weather type that caused the greatest economic damage, so the data are summed over all years.
2.4.5 The magnitude codes are converted to numeric multipliers for further analysis. For example, 1.5 with code "B" becomes 1.5 * 10^9.
2.4.6 The top ten weather events, ranked by total property damage, are reported together with their crop damage.

damage <- storm %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% filter(PROPDMGEXP == "B")

damage$PROPDMGEXP <- as.numeric(gsub("B", 1e9, damage$PROPDMGEXP))

damage$CROPDMGEXP <- gsub("K", 1e3, damage$CROPDMGEXP)
damage$CROPDMGEXP <- gsub("B", 1e9, damage$CROPDMGEXP)
damage$CROPDMGEXP <- gsub("M", 1e6, damage$CROPDMGEXP)
damage$CROPDMGEXP <- as.numeric(damage$CROPDMGEXP)

damage_economic <- damage %>% as_tibble() %>% mutate(crop = CROPDMG * CROPDMGEXP, property = PROPDMG*PROPDMGEXP)
damage_economic <- aggregate(cbind(property, crop) ~ EVTYPE, data = damage_economic, FUN = sum)
damage_economic <- damage_economic[order(damage_economic$property, decreasing = TRUE), ]
damage_economic <- damage_economic[1:10, ]
damage_economic
##                        EVTYPE  property       crop
## 2                       FLOOD 1.195e+11   32501000
## 7           HURRICANE/TYPHOON 2.413e+10 1938500000
## 4                   HURRICANE 5.700e+09  801000000
## 10                    TORNADO 5.300e+09          0
## 8                 RIVER FLOOD 5.000e+09 5000000000
## 9            STORM SURGE/TIDE 4.000e+09          0
## 5              HURRICANE OPAL 2.100e+09    5000000
## 3                        HAIL 1.800e+09          0
## 11 TORNADOES, TSTM WIND, HAIL 1.600e+09    2500000
## 12                   WILDFIRE 1.040e+09    6500000
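
One detail of this conversion is worth checking: `as.numeric("")` yields NA, and the formula interface of `aggregate()` omits rows containing NA by default (`na.action = na.omit`). Records with a blank CROPDMGEXP may therefore be dropped from both the crop and the property totals. The sketch below reproduces the effect on made-up data and shows one possible workaround; treating a blank code as a multiplier of 1 is an assumption, not something the codebook states.

```r
# Made-up records; all property damage is assumed to be in billions
toy <- data.frame(
  EVTYPE     = c("FLOOD", "FLOOD", "TORNADO"),
  PROPDMG    = c(1, 2, 3),
  CROPDMG    = c(5, 0, 0),
  CROPDMGEXP = c("M", "", ""),
  stringsAsFactors = FALSE
)

# Same style of conversion as in the report: "" becomes NA
crop_exp <- as.numeric(gsub("M", 1e6, toy$CROPDMGEXP))

toy$property <- toy$PROPDMG * 1e9
toy$crop     <- toy$CROPDMG * crop_exp

# Rows 2 and 3 contain NA in `crop` and are omitted entirely
dropped <- aggregate(cbind(property, crop) ~ EVTYPE, data = toy, FUN = sum)

# Assumption: a blank code means the amount is already in dollars
crop_exp[is.na(crop_exp)] <- 1
toy$crop <- toy$CROPDMG * crop_exp
kept <- aggregate(cbind(property, crop) ~ EVTYPE, data = toy, FUN = sum)

print(dropped)  # FLOOD only, property 1e9
print(kept)     # FLOOD 3e9 and TORNADO 3e9
```

Comparing `nrow()` before and after aggregation on the real data would show whether any records are lost this way.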

Data Visualization

3.1 Plot the weather event types most harmful to population health

health_melt <- melt(health_harm, id.vars = "EVTYPE", measure.vars = c("FATALITIES", "INJURIES"))


g <- ggplot(health_melt, aes(x = EVTYPE, y = value, fill = variable))
g <- g + geom_bar(color = "black", stat = "identity", position = "dodge") + 
  ggtitle("The Most Harmful Weather Type") + xlab("Event Type") + ylab("Number of People") + 
  labs(caption = "Source: National Weather Service Instruction 10-1605") + 
  theme(axis.text.x = element_text(angle=85, vjust=0.5), plot.caption = element_text(hjust = 0.5)) + 
  scale_fill_manual(values = c("light blue", "navy"))

g
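
By default ggplot2 draws a character x-axis in alphabetical order, so the bars do not follow the ranking computed earlier. One option is to convert EVTYPE to a factor whose levels follow decreasing fatalities. The sketch below uses made-up values; the plotting call is guarded so the ordering logic runs even where ggplot2 is not installed.

```r
# Made-up totals, deliberately not in ranked order
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(10, 40, 90),
  stringsAsFactors = FALSE
)

# Factor levels follow decreasing fatalities; ggplot2 draws bars in level order
ranked <- toy$EVTYPE[order(toy$FATALITIES, decreasing = TRUE)]
toy$EVTYPE <- factor(toy$EVTYPE, levels = ranked)

print(levels(toy$EVTYPE))

# Optional: build the plot only if ggplot2 is available
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(toy, ggplot2::aes(x = EVTYPE, y = FATALITIES)) +
    ggplot2::geom_col()
}
```

The same conversion could be applied to `health_harm$EVTYPE` before melting, so that both the fatalities and injuries bars share the ranked order.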

3.2 Plot the weather event types that caused the greatest economic damage

damage_melt <- melt(damage_economic, id.vars = "EVTYPE", measure.vars = c("property", "crop"))
damage_melt
##                        EVTYPE variable      value
## 1                       FLOOD property 1.1950e+11
## 2           HURRICANE/TYPHOON property 2.4130e+10
## 3                   HURRICANE property 5.7000e+09
## 4                     TORNADO property 5.3000e+09
## 5                 RIVER FLOOD property 5.0000e+09
## 6            STORM SURGE/TIDE property 4.0000e+09
## 7              HURRICANE OPAL property 2.1000e+09
## 8                        HAIL property 1.8000e+09
## 9  TORNADOES, TSTM WIND, HAIL property 1.6000e+09
## 10                   WILDFIRE property 1.0400e+09
## 11                      FLOOD     crop 3.2501e+07
## 12          HURRICANE/TYPHOON     crop 1.9385e+09
## 13                  HURRICANE     crop 8.0100e+08
## 14                    TORNADO     crop 0.0000e+00
## 15                RIVER FLOOD     crop 5.0000e+09
## 16           STORM SURGE/TIDE     crop 0.0000e+00
## 17             HURRICANE OPAL     crop 5.0000e+06
## 18                       HAIL     crop 0.0000e+00
## 19 TORNADOES, TSTM WIND, HAIL     crop 2.5000e+06
## 20                   WILDFIRE     crop 6.5000e+06
g2 <- ggplot(damage_melt, aes(x = EVTYPE, y = value, fill = variable)) + 
  geom_bar(stat = "identity", position = "dodge") + 
  ggtitle("The Greatest Economic Damage") + 
  labs(caption = "Source: National Weather Service Instruction 10-1605") + xlab("Event Type") + ylab("Damage (USD)") + 
  theme(axis.text.x = element_text(angle=90, vjust=0.5), plot.caption = element_text(hjust = 0.5)) + 
  scale_fill_manual(values = c("dark green", "orange"))
g2

Result