Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. This project explores the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to answer two questions: (1) Across the United States, which types of events are most harmful with respect to population health? (2) Across the United States, which types of events have the greatest economic consequences? The answers are tornadoes and floods, respectively.
library(tidyverse)  # provides read_csv(), the dplyr verbs, str_to_upper() and ggplot()
StormData <- read_csv("StormData.csv")
dimension <- dim(StormData)
total_events <- length(table(StormData$EVTYPE))
The dataset contains 902297 observations and 37 variables. According to Section 2.1.1 (Storm Data Event Table) of the National Weather Service Storm Data Documentation (https://d396qusza40orc.cloudfront.net/repdata%2Fpeer2_doc%2Fpd01016005curr.pdf), storm data should contain 48 different event types. However, the imported dataset contains 977 distinct EVTYPE values. Hence, rows whose event type does not exactly match one of the documented names have to be removed.
event_name <- c(
"Astronomical Low Tide", "Avalanche", "Blizzard", "Coastal Flood",
"Cold/Wind Chill", "Debris Flow", "Dense Fog", "Dense Smoke",
"Drought", "Dust Devil", "Dust Storm", "Excessive Heat",
"Extreme Cold/Wind Chill", "Flash Flood", "Flood", "Frost/Freeze",
"Funnel Cloud", "Freezing Fog", "Hail", "Heat", "Heavy Rain",
"Heavy Snow", "High Surf", "High Wind", "Hurricane", "Typhoon",
"Ice Storm", "Lake-Effect Snow", "Lakeshore Flood", "Lightning",
"Marine Hail", "Marine High Wind", "Marine Strong Wind",
"Marine Thunderstorm Wind", "Rip Current", "Seiche", "Sleet",
"Storm Surge/Tide", "Strong Wind", "Thunderstorm Wind", "Tornado",
"Tropical Depression", "Tropical Storm", "Tsunami", "Volcanic Ash",
"Waterspout", "Wildfire", "Winter Storm", "Winter Weather"
)
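# "Hurricane (Typhoon)" is entered as two separate names, so the vector has 49 entries.
# Upper-case the names so they can be matched against EVTYPE.
event_name <- str_to_upper(event_name)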
StormData2 <- StormData %>%
  filter(EVTYPE %in% event_name) %>%
  select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
dimension2 <- dim(StormData2)
After the data transformation, the new data frame, StormData2, contains 635464 observations and 7 variables.
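The exact-match filter discards roughly 30% of the rows (902297 down to 635464), so it can be worth checking which labels fail to match before trusting the totals. The following is a minimal sketch on made-up values, not the real data: `raw` and `official` are hypothetical stand-ins for StormData and event_name, and the labels in them are illustrative only.

```r
library(dplyr)

# Hypothetical stand-ins for StormData and event_name (illustrative values only).
raw <- tibble(
  EVTYPE = c("TORNADO", "TSTM WIND", "TSTM WIND", "Flood", "HAIL", "THUNDERSTORM WINDS")
)
official <- c("TORNADO", "FLOOD", "HAIL", "THUNDERSTORM WIND")

# Count the labels that the exact-match filter would drop, most frequent first.
unmatched <- raw %>%
  filter(!(EVTYPE %in% official)) %>%
  count(EVTYPE, sort = TRUE)

unmatched
```

In this toy input, abbreviations, plural forms and mixed-case spellings all fail the match, which shows the kind of rows an exact comparison leaves out.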
To tackle the two questions, two subsets were created from StormData2: storm_health and storm_econ.
The storm_health subset contains the variables EVTYPE, FATALITIES and INJURIES, representing event type, fatalities and injuries, respectively. A new variable, HARM_SUM, representing the harm to population health, was created by adding FATALITIES and INJURIES. The subset was then grouped by event type, and the total number of people harmed by each event type was calculated.
storm_health <- StormData2 %>%
  select(EVTYPE, FATALITIES, INJURIES) %>%
  mutate(HARM_SUM = FATALITIES + INJURIES) %>%
  select(EVTYPE, HARM_SUM) %>%
  group_by(EVTYPE) %>%
  summarise(TOTAL_HARM = sum(HARM_SUM)) %>%
  arrange(desc(TOTAL_HARM))
head(storm_health)
## # A tibble: 6 x 2
## EVTYPE TOTAL_HARM
## <chr> <dbl>
## 1 TORNADO 96979
## 2 EXCESSIVE HEAT 8428
## 3 FLOOD 7259
## 4 LIGHTNING 6046
## 5 HEAT 3037
## 6 FLASH FLOOD 2755
The storm_econ subset contains the variables EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP, representing event type, property damage, property damage exponent, crop damage and crop damage exponent, respectively.
storm_econ <- StormData2 %>%
  select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
na_prop <- sum(is.na(StormData2$PROPDMGEXP))
na_crop <- sum(is.na(StormData2$CROPDMGEXP))
unique(StormData2$PROPDMGEXP)
## [1] "K" "M" NA "B" "+" "0" "5" "m" "2" "4" "7" "?" "-" "6" "3" "1" "8"
## [18] "H"
unique(StormData2$CROPDMGEXP)
## [1] NA "K" "M" "B" "0" "k"
According to the National Weather Service Storm Data Documentation, Section 2.7, both PROPDMGEXP and CROPDMGEXP should take only three values: K for thousand, M for million, and B for billion. However, in the dataset, they contain other values and many NAs (PROPDMGEXP has 279370 NAs; CROPDMGEXP has 360444 NAs).
Since the actual amount of property damage is the product of PROPDMG and the multiplier encoded by PROPDMGEXP, all NAs and unrecognised codes (+, -, ?, 0 and 1) were replaced by 1. The recognised codes were replaced by their corresponding multipliers: H and 2 by 10^2, K and 3 by 10^3, 4 by 10^4, 5 by 10^5, M, m and 6 by 10^6, 7 by 10^7, 8 by 10^8, and B by 10^9. The same strategy was applied to crop damage, which is the product of CROPDMG and the multiplier encoded by CROPDMGEXP; there, only K, k, M and B are scaled and everything else becomes 1. Finally, both PROPDMGEXP and CROPDMGEXP were converted from character to numeric so that the multiplication could be carried out later.
propdmgexp <- storm_econ$PROPDMGEXP
# Missing exponents count as a multiplier of 1.
for (i in seq_along(propdmgexp)) {
  if (is.na(propdmgexp[i])) {
    propdmgexp[i] <- 1
  }
}
# Map each code to its multiplier; unrecognised codes fall through to 1.
for (i in seq_along(propdmgexp)) {
  if (propdmgexp[i] %in% c("K", "3")) {
    propdmgexp[i] <- 10^3
  } else if (propdmgexp[i] %in% c("M", "m", "6")) {
    propdmgexp[i] <- 10^6
  } else if (propdmgexp[i] == "B") {
    propdmgexp[i] <- 10^9
  } else if (propdmgexp[i] == "5") {
    propdmgexp[i] <- 10^5
  } else if (propdmgexp[i] %in% c("2", "H")) {
    propdmgexp[i] <- 10^2
  } else if (propdmgexp[i] == "4") {
    propdmgexp[i] <- 10^4
  } else if (propdmgexp[i] == "7") {
    propdmgexp[i] <- 10^7
  } else if (propdmgexp[i] == "8") {
    propdmgexp[i] <- 10^8
  } else {
    propdmgexp[i] <- 1
  }
}
# The values were stored as character strings; convert them to numbers.
propdmgexp <- as.numeric(propdmgexp)
cropdmgexp <- storm_econ$CROPDMGEXP
# Missing exponents count as a multiplier of 1.
for (i in seq_along(cropdmgexp)) {
  if (is.na(cropdmgexp[i])) {
    cropdmgexp[i] <- 1
  }
}
# "K"/"k", "M" and "B" are scaled; every other code becomes 1.
for (i in seq_along(cropdmgexp)) {
  if (cropdmgexp[i] %in% c("K", "k")) {
    cropdmgexp[i] <- 10^3
  } else if (cropdmgexp[i] == "M") {
    cropdmgexp[i] <- 10^6
  } else if (cropdmgexp[i] == "B") {
    cropdmgexp[i] <- 10^9
  } else {
    cropdmgexp[i] <- 1
  }
}
# The values were stored as character strings; convert them to numbers.
cropdmgexp <- as.numeric(cropdmgexp)
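The element-by-element loops above visit every row twice, which is slow on several hundred thousand rows. A vectorised alternative is sketched below, assuming a single lookup table can serve both columns. `exp_lookup` and `exp_to_multiplier()` are hypothetical names introduced here, not part of the original analysis. For the codes that actually occur in PROPDMGEXP and CROPDMGEXP (listed by `unique()` above), the mapping should agree with the loops.

```r
# Multipliers implied by the loops; any code not listed here (and NA) maps to 1.
exp_lookup <- c(
  "H" = 1e2, "2" = 1e2,
  "K" = 1e3, "k" = 1e3, "3" = 1e3,
  "4" = 1e4, "5" = 1e5,
  "M" = 1e6, "m" = 1e6, "6" = 1e6,
  "7" = 1e7, "8" = 1e8,
  "B" = 1e9
)

# Hypothetical helper: convert a character vector of exponent codes to multipliers.
exp_to_multiplier <- function(codes) {
  mult <- unname(exp_lookup[codes])  # unknown codes and NA both index to NA
  mult[is.na(mult)] <- 1             # treat them as "no scaling"
  mult
}

exp_to_multiplier(c("K", "m", NA, "?", "B", "5"))
```

With this helper, the two columns could be converted with `exp_to_multiplier(storm_econ$PROPDMGEXP)` and `exp_to_multiplier(storm_econ$CROPDMGEXP)`, and the result is already numeric, so no `as.numeric()` step is needed.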
After handling the NAs and unrecognised values, the storm_econ subset was grouped by event type, and the total economic damage (property plus crop damage) was summed for each event type.
storm_econ <- storm_econ %>%
  mutate(PROPDMGEXP = propdmgexp,
         CROPDMGEXP = cropdmgexp,
         TOTAL_DMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarize(TOTAL = sum(TOTAL_DMG)) %>%
  arrange(desc(TOTAL))
head(storm_econ)
## # A tibble: 6 x 2
## EVTYPE TOTAL
## <chr> <dbl>
## 1 FLOOD 150319678257
## 2 TORNADO 57362333947
## 3 HAIL 18761221986
## 4 FLASH FLOOD 18244041079
## 5 DROUGHT 15018672000
## 6 HURRICANE 14610229010
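To make the damage formula concrete, here is a small worked example on invented rows (the numbers are not taken from the dataset). It applies the same `PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP` calculation and the same grouping as the pipeline above; `toy` and `toy_totals` are hypothetical names.

```r
library(dplyr)

# Invented rows; the multipliers are already numeric, as after the conversion step.
toy <- tibble(
  EVTYPE     = c("FLOOD", "FLOOD", "HAIL"),
  PROPDMG    = c(25, 2.5, 10),
  PROPDMGEXP = c(1e3, 1e6, 1e3),
  CROPDMG    = c(0, 1, 5),
  CROPDMGEXP = c(1, 1e6, 1e3)
)

toy_totals <- toy %>%
  mutate(TOTAL_DMG = PROPDMG * PROPDMGEXP + CROPDMG * CROPDMGEXP) %>%
  group_by(EVTYPE) %>%
  summarize(TOTAL = sum(TOTAL_DMG)) %>%
  arrange(desc(TOTAL))

toy_totals
# FLOOD: (25 * 1e3 + 0 * 1) + (2.5 * 1e6 + 1 * 1e6) = 3525000
# HAIL:  10 * 1e3 + 5 * 1e3 = 15000
```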
ggplot(head(storm_health, 10), aes(reorder(EVTYPE, TOTAL_HARM), TOTAL_HARM)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Storm Events' Consequences on Population Health",
       y = "Total Number",
       x = "Storm Event Types")
The graph’s vertical axis lists the ten event types with the largest totals, while its horizontal axis displays the total number of fatalities and injuries for each type.
ggplot(head(storm_econ, 10), aes(reorder(EVTYPE, TOTAL), TOTAL)) +
  geom_col() +
  coord_flip() +
  labs(title = "Top 10 Economic Damage Caused by Storm Events",
       y = "Damage Amount($)",
       x = "Storm Event Types")
The graph’s vertical axis lists the ten event types with the largest totals, while its horizontal axis displays the total amount of damage for each type.
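Because the damage totals run into the hundreds of billions of dollars, the default axis labels can be hard to read. One possible refinement, which is not part of the original analysis, is to label the axis in billions with `scales::label_number()`. The sketch below rebuilds a small data frame from the six totals printed by `head(storm_econ)`; `econ_top` and `p` are hypothetical names.

```r
library(ggplot2)

# The six totals printed by head(storm_econ) earlier in this report.
econ_top <- data.frame(
  EVTYPE = c("FLOOD", "TORNADO", "HAIL", "FLASH FLOOD", "DROUGHT", "HURRICANE"),
  TOTAL  = c(150319678257, 57362333947, 18761221986,
             18244041079, 15018672000, 14610229010)
)

p <- ggplot(econ_top, aes(reorder(EVTYPE, TOTAL), TOTAL)) +
  geom_col() +
  coord_flip() +
  # Divide the tick values by 1e9 and append "B" so they read as billions.
  scale_y_continuous(labels = scales::label_number(scale = 1e-9, suffix = "B")) +
  labs(title = "Economic Damage Caused by Storm Events",
       y = "Damage Amount (billion $)",
       x = "Storm Event Types")

print(p)
```

The scale is attached to `y` because `coord_flip()` only swaps how the axes are drawn; TOTAL is still the y aesthetic.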