Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
The goal of this analysis is to determine which types of events are most harmful to population health and which have the greatest economic consequences in the US. The data come from the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database and can be downloaded from:
Storm Data [47 MB]
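if (!file.exists("repdata_data_StormData.csv.bz2")) {
  download.file("https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2",
                destfile = "repdata_data_StormData.csv.bz2", method = "curl")
}
# read.csv() reads the bzip2-compressed file directly; stringsAsFactors = TRUE
# reproduces the factor columns shown below under R >= 4.0
stormData <- read.csv("repdata_data_StormData.csv.bz2", stringsAsFactors = TRUE)
str(stormData)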
## 'data.frame': 902297 obs. of 37 variables:
## $ STATE__ : num 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_DATE : Factor w/ 16335 levels "1/1/1966 0:00:00",..: 6523 6523 4242 11116 2224 2224 2260 383 3980 3980 ...
## $ BGN_TIME : Factor w/ 3608 levels "00:00:00 AM",..: 272 287 2705 1683 2584 3186 242 1683 3186 3186 ...
## $ TIME_ZONE : Factor w/ 22 levels "ADT","AKS","AST",..: 7 7 7 7 7 7 7 7 7 7 ...
## $ COUNTY : num 97 3 57 89 43 77 9 123 125 57 ...
## $ COUNTYNAME: Factor w/ 29601 levels "","5NM E OF MACKINAC BRIDGE TO PRESQUE ISLE LT MI",..: 13513 1873 4598 10592 4372 10094 1973 23873 24418 4598 ...
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ BGN_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ BGN_AZI : Factor w/ 35 levels ""," N"," NW",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ BGN_LOCATI: Factor w/ 54429 levels ""," Christiansburg",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_DATE : Factor w/ 6663 levels "","1/1/1993 0:00:00",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_TIME : Factor w/ 3647 levels ""," 0900CST",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ COUNTY_END: num 0 0 0 0 0 0 0 0 0 0 ...
## $ COUNTYENDN: logi NA NA NA NA NA NA ...
## $ END_RANGE : num 0 0 0 0 0 0 0 0 0 0 ...
## $ END_AZI : Factor w/ 24 levels "","E","ENE","ESE",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ END_LOCATI: Factor w/ 34506 levels ""," CANTON"," TULIA",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LENGTH : num 14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
## $ WIDTH : num 100 150 123 100 150 177 33 33 100 100 ...
## $ F : int 3 2 2 2 2 2 2 1 3 3 ...
## $ MAG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ WFO : Factor w/ 542 levels ""," CI","%SD",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ STATEOFFIC: Factor w/ 250 levels "","ALABAMA, Central",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ ZONENAMES : Factor w/ 25112 levels ""," "| __truncated__,..: 1 1 1 1 1 1 1 1 1 1 ...
## $ LATITUDE : num 3040 3042 3340 3458 3412 ...
## $ LONGITUDE : num 8812 8755 8742 8626 8642 ...
## $ LATITUDE_E: num 3051 0 0 0 0 ...
## $ LONGITUDE_: num 8806 0 0 0 0 ...
## $ REMARKS : Factor w/ 436781 levels "","\t","\t\t",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ REFNUM : num 1 2 3 4 5 6 7 8 9 10 ...
library(dplyr)
library(ggplot2)
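Only the columns needed for the analysis were selected to build a tidy dataset: the event type, fatalities, injuries, and property and crop damage together with their exponent codes.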
stormDamage <- select(stormData, EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP,CROPDMG,CROPDMGEXP)
head(stormDamage)
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO 0 15 25.0 K 0
## 2 TORNADO 0 0 2.5 K 0
## 3 TORNADO 0 2 25.0 K 0
## 4 TORNADO 0 2 2.5 K 0
## 5 TORNADO 0 2 2.5 K 0
## 6 TORNADO 0 6 2.5 K 0
Two variables were added to the selected data: PROPDMGNUM, the property damage in dollars, and CROPDMGNUM, the crop damage in dollars. Each is calculated by multiplying the value in PROPDMG (or CROPDMG) by the multiplier encoded in PROPDMGEXP (or CROPDMGEXP): "H" = hundred (10^2), "K" = thousand (10^3), "M" = million (10^6) and "B" = billion (10^9), in either upper or lower case. Records with any other exponent code keep the initial value of 0.
# Initialize the new variables; records whose exponent code is not H, K, M or B keep 0
stormDamage$PROPDMGNUM <- 0
stormDamage$CROPDMGNUM <- 0
# Calculating property damage
stormDamage$PROPDMGNUM <- ifelse(stormDamage$PROPDMGEXP =="H" | stormDamage$PROPDMGEXP =="h", stormDamage$PROPDMG*10^2, stormDamage$PROPDMGNUM)
stormDamage$PROPDMGNUM <- ifelse(stormDamage$PROPDMGEXP =="K" | stormDamage$PROPDMGEXP =="k", stormDamage$PROPDMG*10^3, stormDamage$PROPDMGNUM)
stormDamage$PROPDMGNUM <- ifelse(stormDamage$PROPDMGEXP =="M" | stormDamage$PROPDMGEXP =="m", stormDamage$PROPDMG*10^6, stormDamage$PROPDMGNUM)
stormDamage$PROPDMGNUM <- ifelse(stormDamage$PROPDMGEXP =="B" | stormDamage$PROPDMGEXP =="b", stormDamage$PROPDMG*10^9, stormDamage$PROPDMGNUM)
# Calculating crop damage
stormDamage$CROPDMGNUM <- ifelse(stormDamage$CROPDMGEXP =="H" | stormDamage$CROPDMGEXP =="h", stormDamage$CROPDMG*10^2, stormDamage$CROPDMGNUM)
stormDamage$CROPDMGNUM <- ifelse(stormDamage$CROPDMGEXP =="K" | stormDamage$CROPDMGEXP =="k", stormDamage$CROPDMG*10^3, stormDamage$CROPDMGNUM)
stormDamage$CROPDMGNUM <- ifelse(stormDamage$CROPDMGEXP =="M" | stormDamage$CROPDMGEXP =="m", stormDamage$CROPDMG*10^6, stormDamage$CROPDMGNUM)
stormDamage$CROPDMGNUM <- ifelse(stormDamage$CROPDMGEXP =="B" | stormDamage$CROPDMGEXP =="b", stormDamage$CROPDMG*10^9, stormDamage$CROPDMGNUM)
# Combine fatalities and injuries into a health total, and property and crop damage into an economic total
stormDamage$health <- stormDamage$FATALITIES + stormDamage$INJURIES
stormDamage$economy <- stormDamage$PROPDMGNUM + stormDamage$CROPDMGNUM
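The eight ifelse() calls above can also be expressed as a single lookup in a named vector of multipliers. The sketch below is an alternative, not the method used in this analysis: the data frame `toy` holds hypothetical values (not NOAA records) and the helper `exp_to_multiplier()` is hypothetical. As in the ifelse() chain, codes other than H, K, M and B (in any case) map to 0.

```r
# Hypothetical toy records standing in for stormDamage (illustrative values, not NOAA data)
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.5, 10, 3),
  PROPDMGEXP = factor(c("K", "k", "M", "", "+")),
  CROPDMG    = c(0, 5, 2, 0, 1),
  CROPDMGEXP = factor(c("", "K", "B", "", "h"))
)

# Hypothetical helper: maps exponent codes (any case) to multipliers; unknown codes give 0
exp_to_multiplier <- function(code) {
  multipliers <- c(H = 1e2, K = 1e3, M = 1e6, B = 1e9)
  m <- unname(multipliers[toupper(as.character(code))])
  m[is.na(m)] <- 0  # "", "+", "-", "?" and digit codes are not in the lookup
  m
}

toy$PROPDMGNUM <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
toy$CROPDMGNUM <- toy$CROPDMG * exp_to_multiplier(toy$CROPDMGEXP)
print(toy[, c("PROPDMGNUM", "CROPDMGNUM")])
```

For example, a PROPDMG of 25 with code "K" becomes 25 × 10^3 = 25,000 dollars. Adding another code only requires one more entry in `multipliers`.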
# Group by event type and total the health and economic damage (propdmg holds property + crop damage)
d1 <- stormDamage %>%
select(EVTYPE, health, economy) %>%
group_by(EVTYPE) %>%
summarise(healthdamg = sum(health), propdmg = sum(economy))
d2 <- arrange(d1, desc(healthdamg))
d3 <- d2[1:10, ]
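The grouping and top-10 selection above (group_by() + summarise(), then arrange() and row indexing) can be sketched in base R for readers without dplyr. This is a swapped-in base-R equivalent using aggregate() and order(), not the code used in this analysis; the `events` data frame is hypothetical and `top_n_by()` is a hypothetical helper.

```r
# Hypothetical event records (illustrative values, not NOAA data)
events <- data.frame(
  EVTYPE  = c("TORNADO", "FLOOD", "TORNADO", "HAIL", "FLOOD", "HEAT"),
  health  = c(10, 1, 5, 0, 2, 7),
  economy = c(2e6, 5e7, 1e6, 3e5, 4e7, 0),
  stringsAsFactors = FALSE
)

# Total health and economic damage per event type (like group_by() + summarise())
totals <- aggregate(cbind(health, economy) ~ EVTYPE, data = events, FUN = sum)

# Hypothetical helper: keep the n rows with the largest values in `column`
top_n_by <- function(df, column, n) {
  ranked <- df[order(df[[column]], decreasing = TRUE), ]
  head(ranked, n)  # returns every row if fewer than n exist
}

top_health  <- top_n_by(totals, "health", 2)
top_economy <- top_n_by(totals, "economy", 2)
print(top_health)
print(top_economy)
```

With these toy values, TORNADO ranks first by health total (10 + 5 = 15) and FLOOD first by economic total (5e7 + 4e7 = 9e7), which shows that the two rankings can differ.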
# Set the factor levels in descending order of total so ggplot2 draws the bars in that order
d3$EVTYPE <- factor(d3$EVTYPE, levels = d3$EVTYPE)
g <- ggplot(d3, aes(EVTYPE, healthdamg))
g + geom_bar(stat = "identity", fill = "green") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Event Type", y = "Fatalities + Injuries") +
  ggtitle("Total health damage by top 10 weather events in the US")
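The factor(..., levels = ...) step matters because ggplot2 orders a discrete axis by factor levels, which default to alphabetical order. The sketch below shows the default levels and an alternative based on base R's reorder(), which sets the levels from a numeric score; the `totals` data frame is hypothetical and only illustrative.

```r
# Hypothetical per-event totals (illustrative values, not NOAA data)
totals <- data.frame(
  EVTYPE = c("FLOOD", "HAIL", "HEAT", "TORNADO"),
  health = c(3, 0, 7, 15),
  stringsAsFactors = FALSE
)

# Default factor levels are alphabetical, so bars would be drawn in this order
print(levels(factor(totals$EVTYPE)))

# reorder() sorts levels by the score; negating health puts the largest total first
totals$EVTYPE <- reorder(totals$EVTYPE, -totals$health)
print(levels(totals$EVTYPE))
```

The same idea can be written inline in the plot call, for example aes(reorder(EVTYPE, -healthdamg), healthdamg), which leaves the data frame unchanged.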
d4 <- arrange(d1, desc(propdmg))
d5 <- d4[1:10,]
d5$EVTYPE <- factor(d5$EVTYPE, levels = d5$EVTYPE)
g1 <- ggplot(d5, aes(EVTYPE, propdmg))
g1 + geom_bar(stat = "identity", fill = "blue") +
  theme(axis.text.x = element_text(angle = 90, hjust = 1)) +
  labs(x = "Event Type", y = "Property + Crop Damage (USD)") +
  ggtitle("Total economic impact by top 10 weather events in the US")
Tornadoes cause the greatest harm to population health (fatalities and injuries combined), while floods have the greatest economic consequences (property and crop damage combined).