The NOAA storm data contains 985 distinct severe weather event types. This analysis examines their toll on public health (injuries and deaths) and the economic damage they cause (property damage and crop damage). Across the US, tornadoes cause the most injuries, whereas excessive heat causes the most fatalities among all weather types. In terms of economic damage, floods cause the most, followed by hurricanes/typhoons.
if(!file.exists("StormData.csv.bz2")){
fileurl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
download.file(fileurl, destfile = "StormData.csv.bz2")
}
storm <- read.csv(bzfile("StormData.csv.bz2"), header = TRUE, stringsAsFactors = FALSE)
str(storm)
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : chr "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
## $ BGN_TIME : chr "0130" "0145" "1600" "0900" ...
## $ TIME_ZONE : chr "CST" "CST" "CST" "CST" ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: chr "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
## $ STATE : chr "AL" "AL" "AL" "AL" ...
## $ EVTYPE : chr "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : chr "" "" "" "" ...
## $ BGN_LOCATI: chr "" "" "" "" ...
## $ END_DATE : chr "" "" "" "" ...
## $ END_TIME : chr "" "" "" "" ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : chr "" "" "" "" ...
## $ END_LOCATI: chr "" "" "" "" ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: chr "K" "K" "K" "K" ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: chr "" "" "" "" ...
## $ WFO : chr "" "" "" "" ...
## $ STATEOFFIC: chr "" "" "" "" ...
## $ ZONENAMES : chr "" "" "" "" ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : chr "" "" "" "" ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
dim(storm)
## [1] 902297 37
names(storm) <- tolower(names(storm))
storm$bgn_date <- as.Date(storm$bgn_date, "%m/%d/%Y %H:%M:%S")
storm$year <- as.numeric(format(storm$bgn_date, "%Y"))
hist(x= storm$year, breaks = 30)
The data set has many more records per year after 1990, so the analysis keeps only events with year > 1990 (that is, 1991 onward).
storm1 <- storm[storm$year > 1990,]
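The date parsing and year cutoff can be checked on a small example. The sketch below uses a made-up data frame `toy` (not rows from the NOAA file) and base R only; it tabulates records per year, which is the numeric counterpart of the histogram, and shows that the strict comparison `year > 1990` drops 1990 itself.

```r
# Toy data standing in for storm$bgn_date (made-up values, not from the NOAA file).
toy <- data.frame(
  bgn_date = c("4/18/1950 0:00:00", "6/8/1989 0:00:00", "1/5/1990 0:00:00",
               "7/4/1991 0:00:00", "7/9/1995 0:00:00", "11/28/2011 0:00:00"),
  stringsAsFactors = FALSE
)

# Same parsing steps as the analysis: character -> Date -> numeric year.
toy$bgn_date <- as.Date(toy$bgn_date, format = "%m/%d/%Y %H:%M:%S")
toy$year <- as.numeric(format(toy$bgn_date, "%Y"))

# Records per year, as a table instead of a histogram.
year_counts <- table(toy$year)
print(year_counts)

# Strict comparison: the 1990 record is excluded, 1991 onward is kept.
toy_recent <- toy[toy$year > 1990, ]
print(toy_recent)
```

If 1990 should be included, the condition would need to be `year >= 1990` instead.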
Question 1: Across the United States, which types of events (as indicated in the EVTYPE variable) are most harmful with respect to population health?
Load libraries
library(dplyr)
library(ggplot2)
health <- storm1 %>%
select(injuries, fatalities, evtype) %>%
group_by(events = as.factor(evtype)) %>%
summarise(total_injuries = sum(injuries), total_death = sum(fatalities))
health %>% arrange(desc(total_injuries))
## # A tibble: 985 x 3
## events total_injuries total_death
## <fctr> <dbl> <dbl>
## 1 TORNADO 25497 1699
## 2 FLOOD 6789 470
## 3 EXCESSIVE HEAT 6525 1903
## 4 LIGHTNING 5230 816
## 5 TSTM WIND 4441 285
## 6 HEAT 2100 937
## 7 ICE STORM 1975 89
## 8 FLASH FLOOD 1777 978
## 9 THUNDERSTORM WIND 1488 133
## 10 WINTER STORM 1321 206
## # ... with 975 more rows
health %>% arrange(desc(total_death))
## # A tibble: 985 x 3
## events total_injuries total_death
## <fctr> <dbl> <dbl>
## 1 EXCESSIVE HEAT 6525 1903
## 2 TORNADO 25497 1699
## 3 FLASH FLOOD 1777 978
## 4 HEAT 2100 937
## 5 LIGHTNING 5230 816
## 6 FLOOD 6789 470
## 7 RIP CURRENT 232 368
## 8 TSTM WIND 4441 285
## 9 HIGH WIND 1137 248
## 10 AVALANCHE 170 224
## # ... with 975 more rows
The results show that tornadoes cause the most injuries and excessive heat causes the most fatalities.
Plot the injuries and fatalities
injury1 <- health %>%
filter(total_injuries > 300) %>%
arrange(desc(total_injuries))
ggplot(injury1, aes( x = reorder(events, -total_injuries), y = total_injuries)) + geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = 90)) + xlab("Severe Weather Type") + ylab("Number of Injuries")
> From the plot, tornadoes caused the most injuries in the years after 1990, and floods are the second most injurious severe weather type.
fatal <- health %>%
filter(total_death > 100) %>%
arrange(desc(total_death))
ggplot(fatal, aes ( x = reorder(events, -total_death), y = total_death)) + geom_bar(stat = "identity", position = "dodge") + theme(axis.text.x = element_text(angle = 90)) + xlab("Severe Weather Type") + ylab("Number of Deaths")
> The ordered plot shows that excessive heat is the leading cause of severe-weather fatalities (1,903 deaths in the years after 1990 through 2011). Tornadoes are second, with 1,699 deaths over the same period.
Question 2: Across the United States, which types of events have the greatest economic consequences?
dmg <- storm1 %>% select(evtype, bgn_date, propdmg, propdmgexp, cropdmg,cropdmgexp)
2. Transform the “dmg” data
*The exponent columns hold multiplier codes that need to be interpreted
unique(dmg$propdmgexp)
## [1] "" "K" "M" "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-"
## [18] "1" "8"
unique(dmg$cropdmgexp)
## [1] "" "M" "K" "m" "B" "?" "0" "k" "2"
“” or “?” = 0; “0”, “+” or “-” = 1; “1” = 10; “H”, “h” or “2” = 10^2; “K”, “k” or “3” = 10^3; “4” = 10^4; “5” = 10^5; “M”, “m” or “6” = 10^6; “7” = 10^7; “8” = 10^8; “B” or “b” = 10^9.
dmg$propdmgexp <- toupper(dmg$propdmgexp)
dmg$cropdmgexp <- toupper(dmg$cropdmgexp)
dmg$propdmgexp <- gsub("[1]", 10, dmg$propdmgexp)
dmg$propdmgexp <- gsub("\\+|\\-|^0", 1, dmg$propdmgexp)
dmg$propdmgexp <- gsub("\\?", 0, dmg$propdmgexp)
dmg$propdmgexp <- gsub("^$", 0, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[H2]", 10^2, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[K3]", 10^3, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[M6]", 10^6, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[4]", 10^4, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[5]", 10^5, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[7]", 10^7, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[8]", 10^8, dmg$propdmgexp)
dmg$propdmgexp <- gsub("[B]", 10^9, dmg$propdmgexp)
dmg$cropdmgexp <- gsub("\\+|\\-|^0", 1, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("\\?", 0, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("^$", 0, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("[2]", 10^2, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("K", 10^3, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("M", 10^6, dmg$cropdmgexp)
dmg$cropdmgexp <- gsub("B", 10^9, dmg$cropdmgexp)
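The chain of `gsub()` calls above depends on its order: each numeric replacement is coerced to a string (for example, `10^6` becomes `"1e+06"`), and a later pattern could match digits produced by an earlier replacement if the calls were rearranged. As an alternative, the same mapping can be written as a single lookup. This is a minimal sketch, not the method used in the analysis; `exp_lookup` and `exp_to_multiplier` are hypothetical names, and it assumes that empty and unrecognized codes should map to 0.

```r
# Hypothetical lookup table: exponent code -> numeric multiplier.
# Mirrors the mapping described in the text; names are upper case.
exp_lookup <- c(
  "?" = 0,
  "+" = 1, "-" = 1, "0" = 1,
  "1" = 1e1,
  "H" = 1e2, "2" = 1e2,
  "K" = 1e3, "3" = 1e3,
  "4" = 1e4,
  "5" = 1e5,
  "M" = 1e6, "6" = 1e6,
  "7" = 1e7,
  "8" = 1e8,
  "B" = 1e9
)

# Convert a character vector of exponent codes to numeric multipliers.
# Assumption: "" and any code missing from the table become 0.
exp_to_multiplier <- function(code) {
  mult <- unname(exp_lookup[toupper(code)])
  mult[is.na(mult)] <- 0
  mult
}

# Made-up amounts and codes, only to show the call.
codes <- c("K", "m", "", "B", "?", "5")
amounts <- c(25, 2.5, 10, 1.5, 3, 2)
print(amounts * exp_to_multiplier(codes))
```

Because the result is already numeric, no `as.numeric()` conversion is needed before multiplying, and each code is translated exactly once regardless of the order of the table entries.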
Calculate the total damage and store it in a new variable “total_loss”
dmg <- dmg %>%
mutate(prop_dmg = propdmg * as.numeric(propdmgexp), crop_dmg = cropdmg * as.numeric(cropdmgexp), total_loss = prop_dmg + crop_dmg)
dmg1990 <- dmg %>%
select(evtype, prop_dmg, crop_dmg, date = bgn_date, total_loss) %>%
filter(total_loss > 10^9) %>%
group_by(evtype) %>%
summarise( total = sum(total_loss)) %>%
arrange(desc(total))
print(dmg1990)
## # A tibble: 18 x 2
## evtype total
## <chr> <dbl>
## 1 FLOOD 121532501000
## 2 HURRICANE/TYPHOON 66438500000
## 3 STORM SURGE 42560000000
## 4 RIVER FLOOD 10000000000
## 5 HURRICANE 5501000000
## 6 TROPICAL STORM 5150000000
## 7 ICE STORM 5000500000
## 8 WINTER STORM 5000000000
## 9 TORNADO 4300000000
## 10 STORM SURGE/TIDE 4000000000
## 11 HEAVY RAIN/SEVERE WEATHER 2500000000
## 12 HIGH WIND 2404000000
## 13 HURRICANE OPAL 2105000000
## 14 HAIL 1800000000
## 15 TORNADOES, TSTM WIND, HAIL 1602500000
## 16 WILD/FOREST FIRE 1500000000
## 17 SEVERE THUNDERSTORM 1200000000
## 18 WILDFIRE 1046500000
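From the summary, floods cause the most damage. The totals include only individual events whose loss exceeds 10^9 dollars, because the filter is applied before grouping.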
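The pipeline above applies `filter(total_loss > 10^9)` to individual events before grouping, so each total is the sum of only the billion-dollar events of that type. Whether the filter runs before or after the aggregation can change the ranking. The sketch below illustrates the difference with made-up numbers and placeholder event labels, using base R only so it runs without dplyr; it is an illustration, not a re-analysis of the NOAA data.

```r
# Made-up losses in dollars; "A" and "B" are placeholder event types.
toy_dmg <- data.frame(
  evtype = c("A", "A", "A", "B"),
  total_loss = c(4e8, 4e8, 4e8, 1.1e9),
  stringsAsFactors = FALSE
)

# (a) Filter events first, then sum per type.
#     Mirrors filtering on total_loss before group_by() and summarise().
big_events <- toy_dmg[toy_dmg$total_loss > 1e9, ]
filter_then_sum <- tapply(big_events$total_loss, big_events$evtype, sum)

# (b) Sum all events per type, then keep types whose total exceeds the threshold.
all_totals <- tapply(toy_dmg$total_loss, toy_dmg$evtype, sum)
sum_then_filter <- all_totals[all_totals > 1e9]

print(filter_then_sum)
print(sum_then_filter)
```

In this toy case, type "A" has the larger overall total but disappears under approach (a) because none of its individual events exceeds the threshold.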
ggplot(dmg1990, aes( x = reorder(evtype, -total), y = total)) + geom_bar(stat = "identity") + theme(axis.text.x=element_text(angle = 90)) + xlab("Severe Weather Type") + ylab("Total Loss (property and crops) in Dollars")
> From the plot, floods cause the most damage, followed by hurricanes/typhoons. This is plausible given how much more frequent floods are than hurricanes: a hurricane is usually the stronger event, but hurricanes also tend to cause flooding.