Minh Tri 04/04/2022
Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The basic goal of this report is to explore the NOAA Storm Database by using R and answer some basic questions about severe weather events.
The data for the analysis were downloaded from the NOAA storm database. After the file is downloaded, it is decompressed and read into the R environment:
setwd("D:/Statistics/R/R data/Reproducible Research Project 2")
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "repdata_data_StormData.csv.bz2")
data <- read.csv(bzfile("repdata_data_StormData.csv.bz2"))
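As written, the archive is downloaded again every time the report is re-run. A minimal sketch of one way to avoid that, assuming a hypothetical helper `download_once` (not part of the original analysis) that skips the download when the file is already on disk:

```r
# Hypothetical helper (not in the original analysis): download a file only
# if it is not already present. `fetch` defaults to download.file and can be
# swapped for another function, for example when testing.
download_once <- function(url, destfile, fetch = download.file) {
  if (!file.exists(destfile)) {
    fetch(url, destfile = destfile, mode = "wb")  # binary mode for the .bz2 archive
    return(TRUE)   # a download was performed
  }
  FALSE            # file already present, nothing to do
}

# Usage with the URL and file name from the report (commented out so that
# pasting the sketch does not trigger a large download):
# download_once("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
#               "repdata_data_StormData.csv.bz2")
```

The return value only reports whether a download happened; the file is then read with `read.csv(bzfile(...))` exactly as above.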
To answer the first question, we create a bar chart showing the relationship between event type (x-axis) and the number of injuries (y-axis).
First, we sum the number of injuries by event type. We then sort the event types in decreasing order of injuries and draw a bar chart of the top six.
injuries <- aggregate(data$INJURIES, by = list(EVENT= data$EVTYPE), sum)
injuries <- injuries[order(injuries$x, decreasing = TRUE), ]
head(injuries)
## EVENT x
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
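To make the aggregate-then-sort step easier to follow, here is a small sketch on made-up data (the `toy` data frame and its values are assumptions for illustration, not taken from the NOAA file):

```r
# Toy data, invented for illustration only
toy <- data.frame(
  EVTYPE   = c("FLOOD", "TORNADO", "FLOOD", "HAIL", "TORNADO"),
  INJURIES = c(2, 10, 3, 0, 5)
)

# Sum injuries within each event type; aggregate() names the summed column "x"
toy_sum <- aggregate(toy$INJURIES, by = list(EVENT = toy$EVTYPE), sum)

# Sort so that the event type with the most injuries comes first
toy_sum <- toy_sum[order(toy_sum$x, decreasing = TRUE), ]
toy_sum
```

Here TORNADO (15) comes first, followed by FLOOD (5) and HAIL (0), which mirrors how the real table above is built.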
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 4.1.3
bar <- ggplot(injuries[1:6, ], aes(EVENT, x, fill = EVENT, label = x))
bar + geom_col() + labs(x = "Event Type", y = "Number of Injuries") + geom_text(nudge_y = 4000) + ggtitle("Event Types with the Most Injuries")
Looking at the graph, we can see that TORNADO is responsible for the most injuries (91346 injuries in total).
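One caveat: ggplot2 places a character x variable in alphabetical order, so sorting the rows of the data frame does not by itself change the order of the bars. A possible way to draw the bars from largest to smallest is to turn the event type into a factor with base R's `reorder()`. The sketch below uses the six totals from the `head(injuries)` output above; the data frame name `top_inj` is an assumption for illustration.

```r
# Top six injury totals, copied from the head(injuries) output above
top_inj <- data.frame(
  EVENT = c("TORNADO", "TSTM WIND", "FLOOD", "EXCESSIVE HEAT", "LIGHTNING", "HEAT"),
  x     = c(91346, 6957, 6789, 6525, 5230, 2100)
)

# reorder() returns a factor whose levels are sorted by -x,
# that is, from the largest total to the smallest
top_inj$EVENT <- reorder(top_inj$EVENT, -top_inj$x)
levels(top_inj$EVENT)

# Draw the chart only if ggplot2 is installed
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(top_inj, aes(EVENT, x, fill = EVENT)) +
    geom_col() +
    labs(x = "Event Type", y = "Number of Injuries")
  print(p)
}
```

The same idea would apply to the fatalities and damage charts.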
To answer this question, we create a bar chart showing the relationship between event type (x-axis) and the number of fatalities (y-axis).
First, we sum the number of fatalities by event type. We then sort the event types in decreasing order of fatalities and draw a bar chart of the top six.
fatalities <- aggregate(data$FATALITIES, by = list(EVENT= data$EVTYPE), sum)
fatalities <- fatalities[order(fatalities$x, decreasing = TRUE), ]
head(fatalities)
## EVENT x
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
library(ggplot2)
bar1 <- ggplot(fatalities[1:6, ], aes(EVENT, x, fill = EVENT, label = x))
bar1 + geom_col() + labs(x = "Event Type", y = "Number of Fatalities") + geom_text(nudge_y = 200) + ggtitle("Most Fatal Events")
Looking at the graph, we can see that TORNADO is responsible for the most fatalities (5633 fatalities in total).
Check all the distinct values of the PROPDMGEXP and CROPDMGEXP variables:
unique(data$PROPDMGEXP)
## [1] "K" "M" "" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"
unique(data$CROPDMGEXP)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
Convert these characters to upper case:
data$PROPDMGEXP <- toupper(data$PROPDMGEXP)
data$CROPDMGEXP <- toupper(data$CROPDMGEXP)
unique(data$PROPDMGEXP); unique(data$CROPDMGEXP)
## [1] "K" "M" "" "B" "+" "0" "5" "6" "?" "4" "2" "3" "H" "7" "-" "1" "8"
## [1] "" "M" "K" "B" "?" "0" "2"
Assign a numeric exponent to each character: billion (9), million (6), kilo (3), and hundred (2). Blank and symbol values ("", "+", "-", "?") are set to 0, and digits are kept as they are.
data[data$PROPDMGEXP == "B", "PROPDMGEXP"] <- 9
data[data$PROPDMGEXP == "M", "PROPDMGEXP"] <- 6
data[data$PROPDMGEXP == "K", "PROPDMGEXP"] <- 3
data[data$PROPDMGEXP == "H", "PROPDMGEXP"] <- 2
data[data$PROPDMGEXP %in% c("", "+", "-", "?"), "PROPDMGEXP"] <- "0"
data[data$CROPDMGEXP %in% c("", "+", "-", "?"), "CROPDMGEXP"] <- "0"
data[data$CROPDMGEXP == "B", "CROPDMGEXP"] <- 9
data[data$CROPDMGEXP == "M", "CROPDMGEXP"] <- 6
data[data$CROPDMGEXP == "K", "CROPDMGEXP"] <- 3
data[data$CROPDMGEXP == "H", "CROPDMGEXP"] <- 2
unique(c(data$PROPDMGEXP, data$CROPDMGEXP))
## [1] "3" "6" "0" "9" "5" "4" "2" "7" "1" "8"
Convert each exponent into a multiplier (a power of 10):
data$PROPDMGEXP <- 10^(as.numeric(data$PROPDMGEXP))
data$CROPDMGEXP <- 10^(as.numeric(data$CROPDMGEXP))
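The recoding steps above can also be expressed as a single lookup. The following is only a sketch under the same convention (letters map to 2, 3, 6, 9; digits are used directly; anything else counts as 0); the function name `exp_to_multiplier` and the vector `letters_map` are hypothetical and not part of the original analysis.

```r
# Hypothetical helper (not in the original analysis): map an exponent code
# to a multiplier, following the convention used in the steps above.
exp_to_multiplier <- function(code) {
  code <- toupper(code)
  letters_map <- c(H = 2, K = 3, M = 6, B = 9)   # letter codes -> power of 10
  power <- suppressWarnings(as.numeric(code))    # digit codes are used directly
  is_letter <- code %in% names(letters_map)
  power[is_letter] <- letters_map[code[is_letter]]
  power[is.na(power)] <- 0                       # "", "+", "-", "?" -> 10^0 = 1
  10^power
}

exp_to_multiplier(c("K", "m", "B", "h", "5", "", "?"))
```

For the sample input this gives multipliers of 1e3, 1e6, 1e9, 1e2, 1e5, 1 and 1. A function like this leaves the original PROPDMGEXP and CROPDMGEXP columns untouched, which can make it easier to check the mapping afterwards.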
Calculate the total damage (property damage plus crop damage):
data$DMGTOTAL <- data$PROPDMGEXP * data$PROPDMG + data$CROPDMGEXP * data$CROPDMG
Aggregate the total damage (DMGTOTAL) by event type (EVTYPE):
DamageByType <- aggregate(data$DMGTOTAL, by = list(EVENT= data$EVTYPE), sum)
DamageByType <- DamageByType[order(DamageByType$x, decreasing = TRUE), ]
head(DamageByType)
## EVENT x
## 170 FLOOD 150319678257
## 411 HURRICANE/TYPHOON 71913712800
## 834 TORNADO 57362333947
## 670 STORM SURGE 43323541000
## 244 HAIL 18761221986
## 153 FLASH FLOOD 18243991079
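Totals this large are hard to read at a glance. As a small illustration, the six values from the output above can be rescaled to billions; the data frame name `top_dmg` and the `billions` column are assumptions added for this sketch.

```r
# Totals copied from the head(DamageByType) output above
top_dmg <- data.frame(
  EVENT = c("FLOOD", "HURRICANE/TYPHOON", "TORNADO", "STORM SURGE", "HAIL", "FLASH FLOOD"),
  x     = c(150319678257, 71913712800, 57362333947, 43323541000, 18761221986, 18243991079)
)

# Hypothetical extra column: the same totals in billions, rounded to one decimal
top_dmg$billions <- round(top_dmg$x / 1e9, 1)
top_dmg[, c("EVENT", "billions")]
```

On this scale FLOOD comes to about 150.3 billion, roughly twice the next event type, HURRICANE/TYPHOON, at about 71.9 billion.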
Draw the plot of the six most damaging event types:
library(ggplot2)
bar2 <- ggplot(DamageByType[1:6, ], aes(EVENT, x, fill = EVENT))
bar2 + geom_bar(stat = "identity") + labs(x = "Event Type", y = "Total Damage") + ggtitle("Top 6 events that cause severe damage")
Looking at the graph, we can see that FLOOD is responsible for the largest share of total damage (about 150 billion in property and crop damage combined).
Based on the graphs above, FLOOD causes the most economic damage, while TORNADO is the leading cause of both injuries and fatalities.