This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Health-related data fields (FATALITIES and INJURIES) are selected. Get the total count of FATALITIES and INJURIES for each event type, and order the results in descending order. The top 4 records will show the most harmful events with respect to population health.
Economic loss related data fields (PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP) are selected. Firstly, transform the exponential power to digits. Then, calculate the total economic loss of the property damage and crop damage. Lastly, order the records by the total loss in descending order and get the top 4 records.
For the most harmful events, count the number of fatalities and injuries for each type of event, we can get the FATALITIES_SUM and INJURIES_SUM for each event type. HEALTHHARM_SUM is the sum of FATALITIES_SUM and INJURIES_SUM.
For the greatest economic consequences, calculate the economic loss of property damage and crop damage for each record. Group the records by event type and get the economic loss of property damage and crop damage for each event type. Also, sum the two damages to get the total damage for each event type.
“B” or “b” represents billion, so the exponential power is 9.
“M” or “m” represents million, so the exponential power is 6.
“K” or “k” represents thousand, so the exponential power is 3.
“H” or “h” represents hundred, so the exponential power is 2.
Digit from 0 - 9 represents the 10 to the power of n, where n is the digit.
Empty or other charaters will represents value 1, so the exponential power is 0.
The exponential power digit will be used to calculate the economic loss for the damages.
Download and select the relevant data fields.
library(tidyverse)
download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2", destfile = "/Users/wenwenxia/Desktop/DataScienceSpecialization/Course_5/Week4/NOAA_Storm_Data_Analysis/repdata_data_StormData.csv.bz2")
data <- read.csv("repdata_data_StormData.csv.bz2")
dt1 <- data %>% select(EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
names(dt1)
## [1] "EVTYPE" "FATALITIES" "INJURIES" "PROPDMG" "PROPDMGEXP"
## [6] "CROPDMG" "CROPDMGEXP"
Tranform data for population health
dt2 <- dt1 %>% mutate(HEALTHHARM = FATALITIES + INJURIES) %>%
select(EVTYPE, FATALITIES, INJURIES, HEALTHHARM)
dt3 <- dt2 %>% group_by(EVTYPE) %>% summarise(FATALITIES_SUM = sum(FATALITIES),
INJURIES_SUM = sum(INJURIES),
HEALTHHARM_SUM = sum(HEALTHHARM)) %>%
arrange(-HEALTHHARM_SUM, -FATALITIES_SUM, -INJURIES_SUM) %>% top_n(4)
dt3$num <-1:nrow(dt3)
dt3$ORDER_EVTYPE <- paste(dt3$num, dt3$EVTYPE, sep= "_")
head(dt3)
## # A tibble: 4 × 6
## EVTYPE FATALITIES_SUM INJURIES_SUM HEALTHHARM_SUM num ORDER_EVTYPE
## <chr> <dbl> <dbl> <dbl> <int> <chr>
## 1 TORNADO 4320 73715 78035 1 1_TORNADO
## 2 TSTM WIND 348 4644 4992 2 2_TSTM WIND
## 3 LIGHTNING 292 1923 2215 3 3_LIGHTNING
## 4 ICE STORM 42 1837 1879 4 4_ICE STORM
Transform data for economic consequences
library(switchcase)
#convert the exponential character into multiplier integer
exp_to_multiplier <- function(dmgexp){
digit <- switchCase(
dmgexp,
alt(
..expr == "B" | ..expr == "b",
{},
{"9"}
),
alt(
..expr == "M" | ..expr == "m",
{},
"6"
),
alt(
..expr == "K" | ..expr == "k",
{},
"3"
),
alt(
..expr == "H" | ..expr == "h",
{},
"2"
),
alt(
..expr %in% c('0', '1', '2', '3', '4', '5', '6', '7', '8', '9'),
{},
dmgexp
),
alt(
..expr %in% c('', '+', '-', '?'),
{},
"0"
)
)
return (10 ^ as.numeric(digit))
}
Calculate the total economic loss for each event
dt4 <- dt1 %>% select(EVTYPE, PROPDMG, PROPDMGEXP, CROPDMG, CROPDMGEXP)
for(i in 1:nrow(dt4)){
dt4$PROPDMGUSD[i] = exp_to_multiplier(dt4$PROPDMGEXP[i]) * dt4$PROPDMG[i]
dt4$CROPDMGUSD[i] = exp_to_multiplier(dt4$CROPDMGEXP[i]) * dt4$CROPDMG[i]
dt4$TOTALDMGUSD[i] = dt4$PROPDMGUSD[i] + dt4$CROPDMGUSD[i]
}
Get the events with the greatest economic consequences
dt5 <- dt4 %>% select(EVTYPE, PROPDMGUSD, CROPDMGUSD, TOTALDMGUSD) %>%
group_by(EVTYPE) %>% summarise(PROPDMGUSD_SUM=sum(PROPDMGUSD),
CROPDMGUSD_SUM=sum(CROPDMGUSD),
TOTALDMGUSD_SUM=sum(TOTALDMGUSD)) %>%
arrange(-TOTALDMGUSD_SUM) %>% top_n(4)
dt5$num <-1:nrow(dt5)
dt5$ORDER_EVTYPE <- paste(dt5$num, dt5$EVTYPE, sep= "_")
head(dt5)
## # A tibble: 4 × 6
## EVTYPE PROPDMGUSD_SUM CROPDMGUSD_SUM TOTALDMGUSD_SUM num ORDER_EVTYPE
## <chr> <dbl> <dbl> <dbl> <int> <chr>
## 1 TORNADO 35189550166. 157639060 35347189226. 1 1_TORNADO
## 2 FLOOD 8897429557 1306553550 10203983107 2 2_FLOOD
## 3 RIVER FLOOD 5118939500 5029459000 10148398500 3 3_RIVER FLOOD
## 4 ICE STORM 972185540 5012173500 5984359040 4 4_ICE STORM
library(ggplot2)
ggplot(dt3, aes(as.factor(ORDER_EVTYPE), group=1)) +
geom_line(aes(y = HEALTHHARM_SUM, colour = "Health Harmful")) +
geom_line(aes(y = FATALITIES_SUM, colour = "Fatalities")) +
geom_line(aes(y = INJURIES_SUM, colour = "Injuries")) +
ggtitle("Events Affacting Polulation Health") +
xlab("Event Type")+
ylab("Number of Cases")
The most harmful event types with respect to population health are TORNADO, TSTM WIND, LIGHTNING, and ICE STORM.
ggplot(dt5, aes(as.factor(ORDER_EVTYPE), group=1)) +
geom_line(aes(y = TOTALDMGUSD_SUM, colour = "Total Damage")) +
geom_line(aes(y = PROPDMGUSD_SUM, colour = "Property Damage")) +
geom_line(aes(y = CROPDMGUSD_SUM, colour = "Crop Damage")) +
ggtitle("Events with Greatest Economic Consequences") +
xlab("Event Type")+
ylab("Economic Loss in USD")
The event types with greatest economic consequences are TORNADO, FLOOD, RIVER FLOOD and ICE_STORM.