This study examines and compares the impacts of different weather events using data from the U.S. National Oceanic and Atmospheric Administration (NOAA). It ranks the most dangerous natural events from two perspectives: lethality and economic cost. For the 10 most lethal event types, fatalities and injuries measure lethality. For the 10 most costly event types, property damage and crop damage measure the economic cost. In sum, tornadoes, heat, and floods are the most lethal natural disasters in the United States; tornadoes, thunderstorm (TSTM) wind, and floods injure the most people; and floods, hurricanes/typhoons, and tornadoes cause the greatest economic loss.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(reshape2)
library(ggplot2)
# Download the raw data once, if it is not already present locally
fileUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
Dest_1 <- "~/JHU_DS4/PeerProject2/repdata-data-StormData.bz2"
if (!file.exists(Dest_1)) {
  download.file(fileUrl, destfile = Dest_1, method = "curl")
}
# read.csv() decompresses .bz2 files transparently, so no unzip step is needed
storm <- read.csv(Dest_1, stringsAsFactors = TRUE)
str(storm$EVTYPE)
## Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
summary(storm$EVTYPE)
## HAIL TSTM WIND THUNDERSTORM WIND
## 288661 219940 82563
## TORNADO FLASH FLOOD FLOOD
## 60652 54277 25326
## THUNDERSTORM WINDS HIGH WIND LIGHTNING
## 20843 20212 15754
## HEAVY SNOW HEAVY RAIN WINTER STORM
## 15708 11723 11433
## WINTER WEATHER FUNNEL CLOUD MARINE TSTM WIND
## 7026 6839 6175
## MARINE THUNDERSTORM WIND WATERSPOUT STRONG WIND
## 5812 3796 3566
## URBAN/SML STREAM FLD WILDFIRE BLIZZARD
## 3392 2761 2719
## DROUGHT ICE STORM EXCESSIVE HEAT
## 2488 2006 1678
## HIGH WINDS WILD/FOREST FIRE FROST/FREEZE
## 1533 1457 1342
## DENSE FOG WINTER WEATHER/MIX TSTM WIND/HAIL
## 1293 1104 1028
## EXTREME COLD/WIND CHILL HEAT HIGH SURF
## 1002 767 725
## TROPICAL STORM FLASH FLOODING EXTREME COLD
## 690 682 655
## COASTAL FLOOD LAKE-EFFECT SNOW FLOOD/FLASH FLOOD
## 650 636 624
## LANDSLIDE SNOW COLD/WIND CHILL
## 600 587 539
## FOG RIP CURRENT MARINE HAIL
## 538 470 442
## DUST STORM AVALANCHE WIND
## 427 386 340
## RIP CURRENTS STORM SURGE FREEZING RAIN
## 304 261 250
## URBAN FLOOD HEAVY SURF/HIGH SURF EXTREME WINDCHILL
## 249 228 204
## STRONG WINDS DRY MICROBURST ASTRONOMICAL LOW TIDE
## 196 186 174
## HURRICANE RIVER FLOOD LIGHT SNOW
## 174 173 154
## STORM SURGE/TIDE RECORD WARMTH COASTAL FLOODING
## 148 146 143
## DUST DEVIL MARINE HIGH WIND UNSEASONABLY WARM
## 141 135 126
## FLOODING ASTRONOMICAL HIGH TIDE MODERATE SNOWFALL
## 120 103 101
## URBAN FLOODING WINTRY MIX HURRICANE/TYPHOON
## 98 90 88
## FUNNEL CLOUDS HEAVY SURF RECORD HEAT
## 87 84 81
## FREEZE HEAT WAVE COLD
## 74 74 72
## RECORD COLD ICE THUNDERSTORM WINDS HAIL
## 64 61 61
## TROPICAL DEPRESSION SLEET UNSEASONABLY DRY
## 60 59 56
## FROST GUSTY WINDS THUNDERSTORM WINDSS
## 53 53 51
## MARINE STRONG WIND OTHER SMALL HAIL
## 48 48 47
## FUNNEL FREEZING FOG THUNDERSTORM
## 46 45 45
## Temperature record TSTM WIND (G45) Coastal Flooding
## 43 39 38
## WATERSPOUTS MONTHLY PRECIPITATION WINDS
## 37 36 36
## (Other)
## 2940
df_storm <- data.frame(summary(storm$EVTYPE))
Meanwhile, it is necessary to drop the variables that are unrelated to this analysis. After examining the assignment's questions, the following variables are extracted: EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP. After this extraction, ab_storm is ready for further analysis.
ab_storm <- data.frame(storm$EVTYPE, storm$FATALITIES,storm$INJURIES,storm$PROPDMG, storm$PROPDMGEXP, storm$CROPDMG, storm$CROPDMGEXP)
Examining the data again for mistakes in the variables, we found that PROPDMGEXP and CROPDMGEXP give the unit, or exponent, of PROPDMG and CROPDMG, the variables that are vital to our analysis.
## NULL
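The report does not show how the exponent columns were turned into dollar amounts, so the following is only a minimal sketch of one common approach, not the author's actual code. The helper name `exp_to_multiplier`, the toy data frame, and the mapping itself (H = 1e2, K = 1e3, M = 1e6, B = 1e9, a single digit d = 10^d, anything else = 1) are assumptions made for illustration; only the column names and the `PDMG`/`CDMG` names come from the report.

```r
# Hypothetical helper: translate a damage-exponent code into a numeric multiplier.
# Assumed mapping (not taken from the report): H = 1e2, K = 1e3, M = 1e6, B = 1e9,
# a single digit d = 10^d, and any other code (blank, "?", "+", "-") = 1.
exp_to_multiplier <- function(code) {
  code <- toupper(as.character(code))
  mult <- rep(1, length(code))

  letters_map <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  is_letter <- code %in% names(letters_map)
  mult[is_letter] <- unname(letters_map[code[is_letter]])

  is_digit <- grepl("^[0-9]$", code)
  mult[is_digit] <- 10^as.numeric(code[is_digit])

  mult
}

# Toy rows standing in for ab_storm (values are illustrative only)
toy <- data.frame(
  storm.PROPDMG    = c(25, 2.5, 1, 10),
  storm.PROPDMGEXP = c("K", "M", "B", ""),
  storm.CROPDMG    = c(0, 5, 0, 2),
  storm.CROPDMGEXP = c("", "k", "", "?")
)

# Damage in dollars = reported figure * multiplier implied by the exponent code
toy$PDMG <- toy$storm.PROPDMG * exp_to_multiplier(toy$storm.PROPDMGEXP)
toy$CDMG <- toy$storm.CROPDMG * exp_to_multiplier(toy$storm.CROPDMGEXP)
toy[, c("PDMG", "CDMG")]
```

Treating unrecognized codes as a multiplier of 1 keeps the reported figure unchanged; a stricter analysis might instead set such rows to `NA` and report how many were affected.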
Let us aggregate these data by event type.
# na.rm = TRUE is forwarded to sum() so that missing values are dropped
Fatality_rate <- aggregate(ab_storm$storm.FATALITIES ~ ab_storm$storm.EVTYPE, FUN = sum, na.rm = TRUE)
Injuries_rate <- aggregate(ab_storm$storm.INJURIES ~ ab_storm$storm.EVTYPE, FUN = sum, na.rm = TRUE)
By aggregating the data into these two groups, which together describe the cost in human life (lifecost), we can use plots to answer the first question. Let us identify the 10 event types that lead to the most fatalities and the most injuries.
We then draw one plot for each measure so that the two can be compared.
fatal10 <- Fatality_rate[order(-Fatality_rate$`ab_storm$storm.FATALITIES`), ][1:10, ]
Injuries10 <- Injuries_rate[order(-Injuries_rate$`ab_storm$storm.INJURIES`), ][1:10, ]
Based on fatalities, the most lethal event type is the tornado.
fatal10
## ab_storm$storm.EVTYPE ab_storm$storm.FATALITIES
## 834 TORNADO 5633
## 130 EXCESSIVE HEAT 1903
## 153 FLASH FLOOD 978
## 275 HEAT 937
## 464 LIGHTNING 816
## 856 TSTM WIND 504
## 170 FLOOD 470
## 585 RIP CURRENT 368
## 359 HIGH WIND 248
## 19 AVALANCHE 224
# Fatalities
fatality <- ggplot(fatal10, aes(x = reorder(`ab_storm$storm.EVTYPE`, -`ab_storm$storm.FATALITIES`), y = `ab_storm$storm.FATALITIES`)) +
  geom_bar(stat = "identity", fill = "Maroon", color = "Maroon") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events Led to Most Fatalities") +
  labs(x = "EVENT TYPE", y = "Total Fatality")
fatality
The plot confirms the table: tornadoes caused far more fatalities than any other event type.
# Injuries
Injuries10
## ab_storm$storm.EVTYPE ab_storm$storm.INJURIES
## 834 TORNADO 91346
## 856 TSTM WIND 6957
## 170 FLOOD 6789
## 130 EXCESSIVE HEAT 6525
## 464 LIGHTNING 5230
## 275 HEAT 2100
## 427 ICE STORM 1975
## 153 FLASH FLOOD 1777
## 760 THUNDERSTORM WIND 1488
## 244 HAIL 1361
injuries <- ggplot(Injuries10, aes(x = reorder(`ab_storm$storm.EVTYPE`, -`ab_storm$storm.INJURIES`), y = `ab_storm$storm.INJURIES`)) +
  geom_bar(stat = "identity", fill = "Orange", color = "Orange") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  ggtitle("Top 10 Events Led to Most Injuries") +
  labs(x = "EVENT TYPE", y = "Total Injuries")
injuries
Based on the tables and plots, the tornado is the most destructive event type in both categories: it kills and injures more people in the United States than any other event.
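The side-by-side comparison of fatalities and injuries can also be drawn as a single faceted figure. The sketch below is one possible way to do that with `reshape2::melt()` and `facet_wrap()`, both from packages the report already loads. The toy `lifecost` data frame, its column names, and its values are assumptions for illustration only and are not the report's results.

```r
library(reshape2)
library(ggplot2)

# Toy totals standing in for the per-event sums (illustrative values only)
lifecost <- data.frame(
  EVTYPE     = c("TORNADO", "EXCESSIVE HEAT", "FLOOD"),
  FATALITIES = c(50, 20, 5),
  INJURIES   = c(900, 60, 70)
)

# Long format: one row per (event type, measure) pair
lifecost_long <- melt(lifecost, id.vars = "EVTYPE",
                      variable.name = "measure", value.name = "total")

# One panel per measure, placed side by side, each with its own y axis
p <- ggplot(lifecost_long, aes(x = reorder(EVTYPE, -total), y = total)) +
  geom_bar(stat = "identity") +
  facet_wrap(~ measure, ncol = 2, scales = "free_y") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) +
  labs(x = "EVENT TYPE", y = "Total")
p
```

Free y scales are used because injury counts are typically much larger than fatality counts; with a shared axis the fatality panel would be hard to read. Note that `reorder()` orders the event types once for the whole figure, using each type's mean total across both measures, so bars are not necessarily in descending order within each panel.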
For question 2, the two major variables are the damage to property and the damage to crops, which can be found in PROPDMG and CROPDMG. As in the previous section, I use a table and a graph to present the two costs together.
Using the values of property damage (PDMG) and crop damage (CDMG), I can compare the event types.
library(ggplot2)
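# Select the 10 event types with the largest combined property and crop damage
# Keep the event type plus the PDMG and CDMG columns (positions 1, 9 and 11)
econ_storm <- ab_storm[,c(1,9,11)]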
econ_storm$EDMG <- econ_storm$PDMG + econ_storm$CDMG
EDMG_a <- aggregate(econ_storm$EDMG ~ econ_storm$storm.EVTYPE, FUN = sum)
EDMG10 <- EDMG_a[order(-EDMG_a$`econ_storm$EDMG`), ][1:10,]
# Plot
edmg<-ggplot(EDMG10, aes(x=reorder(`econ_storm$storm.EVTYPE`, -`econ_storm$EDMG`), y= `econ_storm$EDMG`))+ geom_bar(stat="identity") + theme(axis.text.x = element_text(angle=90, vjust=0.5, hjust=1))+ggtitle("Top 10 Events Caused Most Economic Damage of Properties and Crops") +labs(x="EVENT TYPE", y="Total Economic Loss")
edmg
Based on the result, floods, hurricanes, and tornadoes are the most economically destructive events in the United States.
Based on the NOAA data, after comparing economic damage and human costs, tornadoes and hurricanes/typhoons remain among the deadliest and most destructive events in the United States, which matches expectation and historical experience. Hence, governors, government staff, and scientists should continue to build advanced climate systems for detecting, preventing, and mitigating the impacts of these extreme weather events.