Severe weather events can have significant impacts on individuals, the environment, the economy, and sometimes an entire country. Many severe weather events, such as tornadoes, floods, and storms, can result in large numbers of fatalities and injuries and in extensive property damage.
This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States.
The analysis is done in RStudio. Exploratory data analysis is conducted to identify the weather events that caused the most fatalities and those that caused the greatest economic damage. The results are plotted accordingly.
library(dplyr)
library(reshape2)
library(ggplot2)
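# Create a data directory relative to the working directory if it is missing
if (!dir.exists("data")) {
  dir.create("data")
}
# Download the compressed storm data only when it is not already present
bz2_file <- file.path("data", "StormData.csv.bz2")
if (!file.exists(bz2_file)) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = bz2_file, method = "curl")
}
# read.csv() can read the bz2-compressed file directly
stormdata <- read.csv(bz2_file, header = TRUE, sep = ",", stringsAsFactors = FALSE, na.strings = "NA")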
str(stormdata)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
storm <- stormdata %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
summary(is.na(storm))
## EVTYPE FATALITIES INJURIES PROPDMG
## Mode :logical Mode :logical Mode :logical Mode :logical
## FALSE:902297 FALSE:902297 FALSE:902297 FALSE:902297
## PROPDMGEXP CROPDMG CROPDMGEXP
## Mode :logical Mode :logical Mode :logical
## FALSE:902297 FALSE:902297 FALSE:902297
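The summary above reports no `NA` values, yet the `str()` output shows empty strings in columns such as `CROPDMGEXP`. Because `is.na()` does not flag empty strings, a separate check may be worthwhile. The following is a minimal sketch on a small made-up data frame; `toy` and its values are hypothetical and are not taken from the storm database.

```r
# Hypothetical toy data; the values are illustrative only.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  CROPDMG    = c(0, 10, 5),
  CROPDMGEXP = c("", "M", ""),
  stringsAsFactors = FALSE
)

# is.na() finds nothing because "" is a valid (non-missing) string
na_counts <- colSums(is.na(toy))

# Count empty strings per column; numeric values never compare equal to ""
empty_counts <- sapply(toy, function(col) sum(col == ""))

na_counts
empty_counts
```

Applied to the selected storm columns, a check like this would show how many exponent codes are blank even though none are `NA`.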
2.3.1 In this data set, two variables are related to population health: FATALITIES and INJURIES.
2.3.2 Assumption: if the fatality or injury count of a record is zero, that record is treated as causing no harm to people.
2.3.3 Accordingly, only records in which both FATALITIES and INJURIES are greater than zero are selected and processed.
2.3.4 The objective of this project is to find the most harmful weather type, so the values of these two variables are summed over all years for each event type.
2.3.5 The ten weather events with the highest total fatalities are reported, together with their injury totals.
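# Keep records with both fatalities and injuries greater than zero
health_harm <- storm %>% select(EVTYPE, FATALITIES, INJURIES) %>% filter(FATALITIES > 0 & INJURIES > 0)
# Sum fatalities and injuries over all years for each event type
health_harm <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = health_harm, FUN = sum)
# Rank by total fatalities and keep the top ten
health_harm <- health_harm[order(health_harm$FATALITIES, decreasing = TRUE), ]
health_harm <- health_harm[1:10, ]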
health_harm
## EVTYPE FATALITIES INJURIES
## 70 TORNADO 5227 60187
## 11 EXCESSIVE HEAT 402 4791
## 50 LIGHTNING 283 649
## 73 TSTM WIND 199 646
## 16 FLASH FLOOD 171 641
## 17 FLOOD 104 2679
## 38 HIGH WIND 102 308
## 82 WINTER STORM 85 599
## 28 HEAT 73 1420
## 80 WILDFIRE 55 261
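The filter above keeps only records in which both `FATALITIES` and `INJURIES` are greater than zero. The totals would differ if records with either value greater than zero were kept instead. A small sketch on made-up numbers (the `toy` data frame is hypothetical) illustrates the difference between the two conditions:

```r
# Hypothetical toy records; not taken from the storm database.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "TORNADO", "FLOOD", "FLOOD"),
  FATALITIES = c(2, 1, 0, 3),
  INJURIES   = c(5, 0, 4, 1),
  stringsAsFactors = FALSE
)

# Keep records where both counts are positive (the condition used above)
both <- toy[toy$FATALITIES > 0 & toy$INJURIES > 0, ]
# Keep records where at least one count is positive
either <- toy[toy$FATALITIES > 0 | toy$INJURIES > 0, ]

# Sum per event type under each condition
sum_both   <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = both, FUN = sum)
sum_either <- aggregate(cbind(FATALITIES, INJURIES) ~ EVTYPE, data = either, FUN = sum)

sum_both
sum_either
```

In this toy example the stricter condition drops one tornado record and one flood record, so the per-event totals are lower than with the looser condition.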
2.4.1 In this data set, four variables are related to economic damage: PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.
2.4.2 Assumption: property or crop damage recorded in billions of dollars outweighs damage recorded in millions.
2.4.3 Accordingly, only records whose property damage is recorded in billions of dollars (PROPDMGEXP equal to "B") are selected and processed.
2.4.4 The objective of this project is to find the weather type that caused the greatest economic damage, so the data are summed over all years for each event type.
2.4.5 Data are transformed to numeric format for further analysis. For example, 1.5 B is transformed to 1.5*10^9.
2.4.6 The ten weather events with the greatest property damage are reported, together with their crop damage.
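# Keep only records whose property damage is recorded in billions ("B")
damage <- storm %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) %>% filter(PROPDMGEXP == "B")
# Replace the exponent letters with their numeric multipliers
damage$PROPDMGEXP <- as.numeric(gsub("B", 1e9, damage$PROPDMGEXP))
damage$CROPDMGEXP <- gsub("K", 1e3, damage$CROPDMGEXP)
damage$CROPDMGEXP <- gsub("B", 1e9, damage$CROPDMGEXP)
damage$CROPDMGEXP <- gsub("M", 1e6, damage$CROPDMGEXP)
# Note: an empty CROPDMGEXP ("") becomes NA in this conversion
damage$CROPDMGEXP <- as.numeric(damage$CROPDMGEXP)
# Compute damage in dollars for crops and property
damage_economic <- damage %>% as_tibble() %>% mutate(crop = CROPDMG * CROPDMGEXP, property = PROPDMG*PROPDMGEXP)
# Sum per event type; aggregate() with a formula omits rows containing NA by default
damage_economic <- aggregate(cbind(property, crop) ~ EVTYPE, data = damage_economic, FUN = sum)
# Rank by property damage and keep the top ten
damage_economic <- damage_economic[order(damage_economic$property, decreasing = TRUE), ]
damage_economic <- damage_economic[1:10, ]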
damage_economic
## EVTYPE property crop
## 2 FLOOD 1.195e+11 32501000
## 7 HURRICANE/TYPHOON 2.413e+10 1938500000
## 4 HURRICANE 5.700e+09 801000000
## 10 TORNADO 5.300e+09 0
## 8 RIVER FLOOD 5.000e+09 5000000000
## 9 STORM SURGE/TIDE 4.000e+09 0
## 5 HURRICANE OPAL 2.100e+09 5000000
## 3 HAIL 1.800e+09 0
## 11 TORNADOES, TSTM WIND, HAIL 1.600e+09 2500000
## 12 WILDFIRE 1.040e+09 6500000
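The `gsub()` calls above convert exponent letters to multipliers one letter at a time. An alternative is a named lookup vector, sketched here on made-up data. The `exp_multiplier` helper, the `toy` data frame, and the choice to treat an empty code as a multiplier of 1 are assumptions for illustration; this is not the method that produced the table above, and it would give different totals on the real data.

```r
# Hypothetical helper: map an exponent code to a numeric multiplier.
# Assumption: "" means no scaling (multiplier 1); unknown codes become NA.
exp_multiplier <- function(code) {
  lookup <- c("K" = 1e3, "M" = 1e6, "B" = 1e9)
  code <- toupper(code)
  mult <- unname(lookup[code])     # codes missing from lookup yield NA
  mult[which(code == "")] <- 1     # empty code: leave the value unscaled
  mult
}

# Hypothetical toy records; not taken from the storm database.
toy <- data.frame(
  EVTYPE     = c("FLOOD", "HAIL", "TORNADO"),
  PROPDMG    = c(1.5, 25, 2.5),
  PROPDMGEXP = c("B", "K", "m"),
  CROPDMG    = c(10, 0, 5),
  CROPDMGEXP = c("M", "", "K"),
  stringsAsFactors = FALSE
)

# Damage in dollars, e.g. 1.5 B becomes 1.5 * 10^9
toy$property <- toy$PROPDMG * exp_multiplier(toy$PROPDMGEXP)
toy$crop     <- toy$CROPDMG * exp_multiplier(toy$CROPDMGEXP)

toy[, c("EVTYPE", "property", "crop")]
```

A lookup of this kind keeps the accepted codes in one place, and returning `NA` for unrecognized codes makes unexpected values easy to spot before aggregating.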
health_melt <- melt(health_harm, id.vars = "EVTYPE", measure.vars = c("FATALITIES", "INJURIES"))
g <- ggplot(health_melt, aes(x = EVTYPE, y = value, fill = variable))
g <- g + geom_bar(color = "black", stat = "identity", position = "dodge") +
  ggtitle("The Most Harmful Weather Type") + xlab("Event Type") + ylab("Harmfulness") +
  labs(caption = "Source: National Weather Service Instruction 10-1605") +
  theme(axis.text.x = element_text(angle=85, vjust=0.5), plot.caption = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("light blue", "navy"))
g
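In the plot above, ggplot2 places the event types on the x-axis in alphabetical order because `EVTYPE` is a character column. If bars sorted by magnitude are preferred, the factor levels can be reordered before plotting. A minimal sketch with base R's `reorder()` on made-up values follows; the `toy_melt` data frame is hypothetical.

```r
# Hypothetical melted data in the same shape as health_melt; values are made up.
toy_melt <- data.frame(
  EVTYPE   = c("FLOOD", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  variable = rep(c("FATALITIES", "INJURIES"), each = 3),
  value    = c(10, 30, 50, 200, 100, 600),
  stringsAsFactors = FALSE
)

# Order event types by their total value, largest first.
# reorder() sorts levels by increasing FUN result, so the values are negated.
toy_melt$EVTYPE <- reorder(toy_melt$EVTYPE, -toy_melt$value, FUN = sum)

levels(toy_melt$EVTYPE)
```

Mapping the reordered factor to `x` in `aes()` should then draw the bars in that level order rather than alphabetically.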
damage_melt <- melt(damage_economic, id.vars = "EVTYPE", measure.vars = c("property", "crop"))
damage_melt
## EVTYPE variable value
## 1 FLOOD property 1.1950e+11
## 2 HURRICANE/TYPHOON property 2.4130e+10
## 3 HURRICANE property 5.7000e+09
## 4 TORNADO property 5.3000e+09
## 5 RIVER FLOOD property 5.0000e+09
## 6 STORM SURGE/TIDE property 4.0000e+09
## 7 HURRICANE OPAL property 2.1000e+09
## 8 HAIL property 1.8000e+09
## 9 TORNADOES, TSTM WIND, HAIL property 1.6000e+09
## 10 WILDFIRE property 1.0400e+09
## 11 FLOOD crop 3.2501e+07
## 12 HURRICANE/TYPHOON crop 1.9385e+09
## 13 HURRICANE crop 8.0100e+08
## 14 TORNADO crop 0.0000e+00
## 15 RIVER FLOOD crop 5.0000e+09
## 16 STORM SURGE/TIDE crop 0.0000e+00
## 17 HURRICANE OPAL crop 5.0000e+06
## 18 HAIL crop 0.0000e+00
## 19 TORNADOES, TSTM WIND, HAIL crop 2.5000e+06
## 20 WILDFIRE crop 6.5000e+06
g2 <- ggplot(damage_melt, aes(x = EVTYPE, y = value, fill = variable)) +
  geom_bar(stat = "identity", position = "dodge") +
  ggtitle("The Greatest Economic Damages") +
  labs(caption = "Source: National Weather Service Instruction 10-1605") + xlab("Event Type") + ylab("Damage (USD)") +
  theme(axis.text.x = element_text(angle=90, vjust=0.5), plot.caption = element_text(hjust = 0.5)) +
  scale_fill_manual(values = c("dark green", "orange"))
g2
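The damage values plotted above are in dollars and range from zero to more than 10^11, which makes the axis labels hard to read. One option is to express the values in billions of dollars before plotting. A minimal sketch on made-up values follows; the `toy_damage` data frame and the `value_billion` column are hypothetical.

```r
# Hypothetical values in US dollars, shaped like damage_melt.
toy_damage <- data.frame(
  EVTYPE   = c("FLOOD", "HAIL", "FLOOD", "HAIL"),
  variable = c("property", "property", "crop", "crop"),
  value    = c(1.2e11, 1.8e9, 3.0e7, 0),
  stringsAsFactors = FALSE
)

# Express damage in billions of dollars for easier reading
toy_damage$value_billion <- toy_damage$value / 1e9

toy_damage
```

Plotting `value_billion` instead of `value`, with a matching axis label, would keep the same bar heights relative to each other while shortening the tick labels.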