Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.
I have explored the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database, as suggested by the assignment. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.
Download the required resource file repdata-data-StormData.csv from https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2.
Store the data file in the ./data directory under the working directory.
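The download step can also be scripted so that the report is reproducible. The following is a minimal sketch, not the procedure used in this report: the helper name `fetchStormData` and the destination file name `StormData.csv.bz2` are assumptions. It keeps the archive compressed, since `read.csv()` can read bzip2 files directly.

```r
# Source URL as given in the text above.
stormUrl <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# Hypothetical helper: download the archive into dataDir unless it is
# already present, and return the local path.
fetchStormData <- function(url = stormUrl, dataDir = "data",
                           destName = "StormData.csv.bz2") {
  if (!dir.exists(dataDir)) dir.create(dataDir, recursive = TRUE)
  dest <- file.path(dataDir, destName)
  if (!file.exists(dest)) {
    # Binary mode matters on Windows for compressed files.
    download.file(url, destfile = dest, mode = "wb")
  }
  dest
}

# Usage (not run here because the archive is large):
# stormData <- read.csv(fetchStormData())
```

Because the function skips the download when the file already exists, re-knitting the report does not fetch the data again.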
Load the necessary libraries.
library(knitr)
## Warning: package 'knitr' was built under R version 3.2.4
library(ggplot2)
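Load the storm data from the repdata-data-StormData.csv file.
# file.path() builds the path with the correct separator on any operating system
fileUrl <- file.path("data", "repdata-data-StormData.csv")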
stormData <- read.csv(file = fileUrl)
Check the dimensions of the data.
dim(stormData)
## [1] 902297 37
Names of the attributes.
names(stormData)
## [1] "STATE__" "BGN_DATE" "BGN_TIME" "TIME_ZONE" "COUNTY"
## [6] "COUNTYNAME" "STATE" "EVTYPE" "BGN_RANGE" "BGN_AZI"
## [11] "BGN_LOCATI" "END_DATE" "END_TIME" "COUNTY_END" "COUNTYENDN"
## [16] "END_RANGE" "END_AZI" "END_LOCATI" "LENGTH" "WIDTH"
## [21] "F" "MAG" "FATALITIES" "INJURIES" "PROPDMG"
## [26] "PROPDMGEXP" "CROPDMG" "CROPDMGEXP" "WFO" "STATEOFFIC"
## [31] "ZONENAMES" "LATITUDE" "LONGITUDE" "LATITUDE_E" "LONGITUDE_"
## [36] "REMARKS" "REFNUM"
Check whether the data contains missing (NA) values.
sum(is.na(stormData))
## [1] 1745947
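A single total does not show where the missing values sit. As a sketch on a made-up three-row data frame (`toy` and its values are illustrative, not taken from the storm data), `colSums(is.na(...))` breaks the count down by column:

```r
# Illustrative data only: one column with missing values, two without.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "HAIL"),
  F          = c(3, NA, NA),
  FATALITIES = c(0, 1, 0),
  stringsAsFactors = FALSE
)

# Count the NA values in each column.
naPerColumn <- colSums(is.na(toy))

# Keep only the columns that actually contain missing values.
naPerColumn[naPerColumn > 0]
```

Applied to `stormData`, the same call would show which of the 37 columns account for the 1,745,947 missing values.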
For our analysis we need only a few of the attributes: 8 out of the 37. Select this subset from the storm data.
stormD <- stormData[,c("STATE", "EVTYPE", "FATALITIES", "INJURIES", "PROPDMG", "PROPDMGEXP","CROPDMG", "CROPDMGEXP")]
Let's check the dimensions of this subset.
dim(stormD)
## [1] 902297 8
Also, let's look at the first and last few rows of the subset.
kable(rbind(head(stormD), tail(stormD)))
| | STATE | EVTYPE | FATALITIES | INJURIES | PROPDMG | PROPDMGEXP | CROPDMG | CROPDMGEXP |
|---|---|---|---|---|---|---|---|---|
| 1 | AL | TORNADO | 0 | 15 | 25.0 | K | 0 | |
| 2 | AL | TORNADO | 0 | 0 | 2.5 | K | 0 | |
| 3 | AL | TORNADO | 0 | 2 | 25.0 | K | 0 | |
| 4 | AL | TORNADO | 0 | 2 | 2.5 | K | 0 | |
| 5 | AL | TORNADO | 0 | 2 | 2.5 | K | 0 | |
| 6 | AL | TORNADO | 0 | 6 | 2.5 | K | 0 | |
| 902292 | TN | WINTER WEATHER | 0 | 0 | 0.0 | K | 0 | K |
| 902293 | WY | HIGH WIND | 0 | 0 | 0.0 | K | 0 | K |
| 902294 | MT | HIGH WIND | 0 | 0 | 0.0 | K | 0 | K |
| 902295 | AK | HIGH WIND | 0 | 0 | 0.0 | K | 0 | K |
| 902296 | AK | BLIZZARD | 0 | 0 | 0.0 | K | 0 | K |
| 902297 | AL | HEAVY SNOW | 0 | 0 | 0.0 | K | 0 | K |
A compact structure summary of the subset:
str(stormD)
## 'data.frame': 902297 obs. of 8 variables:
## $ STATE : Factor w/ 72 levels "AK","AL","AM",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ EVTYPE : Factor w/ 985 levels " HIGH SURF ADVISORY",..: 834 834 834 834 834 834 834 834 834 834 ...
## $ FATALITIES: num 0 0 0 0 0 0 0 0 1 0 ...
## $ INJURIES : num 15 0 2 2 2 6 1 0 14 0 ...
## $ PROPDMG : num 25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
## $ PROPDMGEXP: Factor w/ 19 levels "","-","?","+",..: 17 17 17 17 17 17 17 17 17 17 ...
## $ CROPDMG : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CROPDMGEXP: Factor w/ 9 levels "","?","0","2",..: 1 1 1 1 1 1 1 1 1 1 ...
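The first EVTYPE level shown above, " HIGH SURF ADVISORY", starts with whitespace, which suggests that some of the 985 levels may differ only in spacing or letter case. The sketch below, on made-up labels, shows one way such variants could be merged before aggregating. This normalization is not applied in the analysis that follows, and `normalizeEvtype` is a hypothetical helper name.

```r
# Hypothetical helper: trim outer whitespace, upper-case the label,
# and collapse runs of inner whitespace to a single space.
normalizeEvtype <- function(x) {
  x <- toupper(trimws(as.character(x)))
  gsub("\\s+", " ", x)
}

# Illustrative labels only.
raw <- c(" HIGH SURF ADVISORY", "High Surf Advisory",
         "TORNADO", "Tornado ", "HEAVY  SNOW")
clean <- normalizeEvtype(raw)

length(unique(raw))    # 5 distinct labels before normalization
length(unique(clean))  # 3 distinct labels after normalization
```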
Missing values in the sub-set.
sum(is.na(stormD))
## [1] 0
States in the data set.
unique(stormD$STATE)
## [1] AL AZ AR CA CO CT DE DC FL GA HI ID IL IN IA KS KY LA ME MD MA MI MN
## [24] MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD TN TX UT VT VA
## [47] WA WV WI WY PR AK ST AS GU MH VI AM LC PH GM PZ AN LH LM LE LS SL LO
## [70] PM PK XX
## 72 Levels: AK AL AM AN AR AS AZ CA CO CT DC DE FL GA GM GU HI IA ID ... XX
The following fatality and injury statistics are computed:
a. Event-wise fatality statistics
b. State-wise fatality statistics
c. Event- and state-wise fatality statistics
d. Event-wise injury statistics
e. State-wise injury statistics
f. Event- and state-wise injury statistics
Aggregate fatalities by event type and sort the totals in decreasing order.
fatal <- aggregate(FATALITIES ~ EVTYPE, stormD, sum)
fatal <- fatal[order(fatal$FATALITIES, decreasing = TRUE), ]
The event type with the most fatalities is TORNADO.
ggplot(fatal[1:10, ], aes(x = EVTYPE, y = FATALITIES)) +
  geom_bar(stat = "identity", aes(fill = EVTYPE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "Event Type", y = "Fatality Count for Events",
       title = "Top 10 Fatality-Causing Events")
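To make the aggregate-then-sort pattern concrete, here is a small sketch on made-up records (the `toy` values are illustrative only). `aggregate()` sums the counts within each event type and `order()` ranks the totals from highest to lowest:

```r
# Illustrative records only: six events of three types.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "TORNADO", "FLOOD", "HEAT", "TORNADO"),
  FATALITIES = c(3, 5, 2, 1, 1, 4)
)

# Sum FATALITIES within each EVTYPE group.
totals <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)

# Rank the groups from the highest total to the lowest.
totals <- totals[order(totals$FATALITIES, decreasing = TRUE), ]

# Top two groups: TORNADO (9) and HEAT (6).
head(totals, 2)
```

`head(totals, n)` selects the same rows as `totals[1:n, ]` when at least n groups exist, but it does not produce NA rows when there are fewer.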
Aggregate fatalities by state and sort the totals in decreasing order.
fatalStatewise <- aggregate(FATALITIES ~ STATE, stormD, sum)
fatalStatewise <- fatalStatewise[order(fatalStatewise$FATALITIES, decreasing = TRUE), ]
The state with the most fatalities is IL.
ggplot(fatalStatewise[1:10, ], aes(x = STATE, y = FATALITIES)) +
  geom_bar(stat = "identity", aes(fill = STATE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "State", y = "Fatality Count",
       title = "Top 10 States with the Highest Fatalities")
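Note that `ggplot2` orders a discrete x axis by factor level (alphabetically by default), not by row order, so sorting the data frame alone does not sort the bars. A possible refinement, sketched here on made-up totals, is to wrap the x variable in `reorder()`. The column name `STATE_sorted` is an assumption, and the plot is only built if `ggplot2` is installed.

```r
# Illustrative totals only.
toy <- data.frame(
  STATE      = c("AL", "IL", "TX"),
  FATALITIES = c(20, 50, 35)
)

# reorder() sets the factor levels by the given key;
# the minus sign puts the largest total first.
toy$STATE_sorted <- reorder(toy$STATE, -toy$FATALITIES)
levels(toy$STATE_sorted)  # "IL" "TX" "AL"

# Build the bar chart only when ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(toy, aes(x = STATE_sorted, y = FATALITIES)) +
    geom_bar(stat = "identity") +
    labs(x = "State", y = "Fatality Count")
}
```

With the reordered factor, the bars run from the highest to the lowest total, which matches the "Top 10" reading of the charts.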
Aggregate fatalities by event type and state, sort the totals, and label each of the top 10 combinations as EVENT@STATE.
fatalState <- aggregate(FATALITIES ~ EVTYPE + STATE, stormD, sum)
fatalState <- fatalState[order(fatalState$FATALITIES, decreasing = TRUE), ]
firstGroup <- fatalState[1:10, ]
firstGroup$EVT_STATE <- paste(firstGroup$EVTYPE, firstGroup$STATE, sep = "@")
The single event type and state combination with the most fatalities is HEAT in IL.
ggplot(firstGroup, aes(x = EVT_STATE, y = FATALITIES)) +
  geom_bar(stat = "identity", aes(fill = EVT_STATE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "Event@State", y = "Fatality Count",
       title = "Top 10 Event and State Combinations with the Highest Fatalities")
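The event-and-state ranking relies on a two-factor formula plus a pasted label. A minimal sketch on made-up rows (the `toy` values are illustrative only):

```r
# Illustrative records only.
toy <- data.frame(
  EVTYPE     = c("HEAT", "HEAT", "TORNADO", "TORNADO", "HEAT"),
  STATE      = c("IL", "IL", "TX", "IL", "TX"),
  FATALITIES = c(10, 5, 7, 2, 1)
)

# Sum FATALITIES within each (EVTYPE, STATE) combination that occurs.
byPair <- aggregate(FATALITIES ~ EVTYPE + STATE, data = toy, FUN = sum)
byPair <- byPair[order(byPair$FATALITIES, decreasing = TRUE), ]

# Build a single label per combination for use on a plot axis.
byPair$EVT_STATE <- paste(byPair$EVTYPE, byPair$STATE, sep = "@")
byPair$EVT_STATE[1]  # "HEAT@IL", with a total of 15
```

Only combinations that actually occur in the data get a row, so the result can be much smaller than the number of event types times the number of states.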
Aggregate injuries by event type and sort the totals in decreasing order.
injur <- aggregate(INJURIES ~ EVTYPE, stormD, sum)
injur <- injur[order(injur$INJURIES, decreasing = TRUE), ]
The event type with the most injuries is TORNADO.
ggplot(injur[1:10, ], aes(x = EVTYPE, y = INJURIES)) +
  geom_bar(stat = "identity", aes(fill = EVTYPE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "Event Type", y = "Injury Count for Events",
       title = "Top 10 Injury-Causing Events")
Aggregate injuries by state and sort the totals in decreasing order.
injurStatewise <- aggregate(INJURIES ~ STATE, stormD, sum)
dim(injurStatewise)
## [1] 72 2
injurStatewise <- injurStatewise[order(injurStatewise$INJURIES, decreasing = TRUE), ]
The state with the most injuries is TX.
ggplot(injurStatewise[1:10, ], aes(x = STATE, y = INJURIES)) +
  geom_bar(stat = "identity", aes(fill = STATE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "State", y = "Injury Count",
       title = "Top 10 States with the Highest Injuries")
Aggregate injuries by event type and state, sort the totals, and label each of the top 10 combinations as EVENT@STATE.
injurState <- aggregate(INJURIES ~ EVTYPE + STATE, stormD, sum)
dim(injurState)
## [1] 4258 3
injurState <- injurState[order(injurState$INJURIES, decreasing = TRUE), ]
firstGroup <- injurState[1:10, ]
firstGroup$EVT_STATE <- paste(firstGroup$EVTYPE, firstGroup$STATE, sep = "@")
The single event type and state combination with the most injuries is TORNADO in TX.
ggplot(firstGroup, aes(x = EVT_STATE, y = INJURIES)) +
  geom_bar(stat = "identity", aes(fill = EVT_STATE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "Event@State", y = "Injury Count",
       title = "Top 10 Event and State Combinations with the Highest Injuries")
The property damage value is stored in two columns: PROPDMG contains a base value and PROPDMGEXP contains an exponent symbol. The property damage for a given row is PROPDMG*10^e, where e is the power of ten that the PROPDMGEXP symbol stands for (for example, K means 3 and M means 6). Crop damage is stored the same way in the CROPDMG and CROPDMGEXP columns.
The PROPDMGEXP values are,
sort(unique(stormD$PROPDMGEXP))
## [1] - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
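Map each exponent symbol to its power of ten. The symbols "", "+", "-", and "?" are treated as exponent 0, that is, a multiplier of 1.
symbol <- c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B")
# Named exponent rather than factor to avoid masking base::factor
exponent <- c(rep(0, 4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9)
multiplier <- data.frame(symbol, exponent)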
Calculate property damage by PROPDMG*10^PROPDMGEXP. Similarly calculate crop damage by CROPDMG*10^CROPDMGEXP. Total will be the sum of property damage and crop damage.
stormD$damage.prop <- stormD$PROPDMG*10^multiplier[match(stormD$PROPDMGEXP,multiplier$symbol),2]
stormD$damage.crop <- stormD$CROPDMG*10^multiplier[match(stormD$CROPDMGEXP,multiplier$symbol),2]
stormD$damage <- stormD$damage.prop + stormD$damage.crop
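As a worked check of the lookup, the sketch below applies the same `match()` approach to a few made-up value and symbol pairs. The `toy` data frame is illustrative, and the table is rebuilt locally under the assumed names `lookup` and `exponent` so that the example runs on its own. For instance, 25 with symbol `K` becomes 25 × 10^3 = 25,000.

```r
# Local copy of the symbol-to-exponent table (names are assumptions).
lookup <- data.frame(
  symbol   = c("", "+", "-", "?", 0:9, "h", "H", "k", "K", "m", "M", "b", "B"),
  exponent = c(rep(0, 4), 0:9, 2, 2, 3, 3, 6, 6, 9, 9),
  stringsAsFactors = FALSE
)

# Illustrative rows only.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 7),
  PROPDMGEXP = c("K", "M", "B", ""),
  stringsAsFactors = FALSE
)

# match() finds the table row for each symbol; 10^exponent is the multiplier.
toy$damage.prop <- toy$PROPDMG *
  10^lookup$exponent[match(toy$PROPDMGEXP, lookup$symbol)]

# Values: 25000, 2500000, 1200000000, 7
toy$damage.prop
```

A symbol that is missing from the table would make `match()` return `NA`, so the damage for that row would be `NA`; checking `sum(is.na(...))` on the computed column is a cheap safeguard.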
The following damage statistics are computed:
a. Event-wise damage statistics
b. State-wise damage statistics
c. Event- and state-wise damage statistics
Aggregate total damage by event type, convert it to billions of dollars, and sort in decreasing order.
damage <- aggregate(damage ~ EVTYPE, stormD, sum)
damage$bilion <- damage$damage / 1e9
damage <- damage[order(damage$bilion, decreasing = TRUE), ]
The event type with the largest total damage (in billions of dollars) is FLOOD.
ggplot(damage[1:10, ], aes(x = EVTYPE, y = bilion)) +
  geom_bar(stat = "identity", aes(fill = EVTYPE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "Event Type", y = "Damage (Billion Dollars)",
       title = "Top 10 Damage-Causing Events (in Billion Dollars)")
Aggregate total damage by state, convert it to billions of dollars, and sort in decreasing order.
damageStatewise <- aggregate(damage ~ STATE, stormD, sum)
damageStatewise$bilion <- damageStatewise$damage / 1e9
damageStatewise <- damageStatewise[order(damageStatewise$bilion, decreasing = TRUE), ]
The state with the largest total damage (in billions of dollars) is CA.
ggplot(damageStatewise[1:10, ], aes(x = STATE, y = bilion)) +
  geom_bar(stat = "identity", aes(fill = STATE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "State", y = "Damage (Billion Dollars)",
       title = "Top 10 States with the Highest Damage")
Aggregate total damage by event type and state, convert it to billions of dollars, sort, and label each of the top 10 combinations as EVENT@STATE.
damageState <- aggregate(damage ~ EVTYPE + STATE, stormD, sum)
damageState$bilion <- damageState$damage / 1e9
damageState <- damageState[order(damageState$bilion, decreasing = TRUE), ]
firstGroup <- damageState[1:10, ]
firstGroup$EVT_STATE <- paste(firstGroup$EVTYPE, firstGroup$STATE, sep = "@")
The single event type and state combination with the largest damage (in billions of dollars) is FLOOD in CA.
ggplot(firstGroup, aes(x = EVT_STATE, y = bilion)) +
  geom_bar(stat = "identity", aes(fill = EVT_STATE)) +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.5)) +
  labs(x = "Event@State", y = "Damage (Billion Dollars)",
       title = "Top 10 Event and State Combinations with the Highest Damage")
Different events affect different states differently. Tornado is the leading cause of both fatalities and injuries. Interestingly, the state with the highest fatality count is Illinois, whereas Texas has the most injuries. Heat and heat-related events in Illinois account for the highest number of fatalities caused by a single event type in a single state. For injuries, the largest single combination is Tornado in Texas. In short, Tornado is the number one danger in terms of fatalities and injuries.
In terms of damage (property and crop damage combined), Flood is the most damaging event type, and California is the state most affected by floods. The second most damaging event type is Hurricane or Typhoon, which has had a large impact on the state of Florida. Some states therefore require more safety measures and awareness to save lives, while others require proper strategy and resource management to minimize damage.