Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This report analyses such events across the USA from 1950 to 2011 and lists the events with the greatest effects.
The primary goal of this report is to identify the types of storms and other events that are most harmful to human life and those that cause the highest property loss across the USA. The project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The analysis identifies the events with the greatest damage to human life and to property and crops. Tornadoes have the worst impact on human health, while floods and droughts have the worst economic effect.
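# Unzip the compressed file (bunzip2() comes from the R.utils package)
library(R.utils)
if (!file.exists("repdata-data-StormData.csv")) {
  bunzip2("repdata-data-StormData.csv.bz2")
}
# Read the unzipped .csv file; stringsAsFactors = TRUE keeps the text columns
# as factors, which the levels() calls later in the analysis rely on
rawdata <- read.csv("repdata-data-StormData.csv", stringsAsFactors = TRUE)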
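As a side note on the loading step, `read.csv()` can usually read a bzip2-compressed file directly, so the explicit unzip is optional. The sketch below shows this on a small toy file; the temporary path and the toy columns are illustrative assumptions, not part of the storm data.

```r
# Toy stand-in for the storm file: write a small bzip2-compressed CSV
# to a temporary path (path and contents are illustrative only).
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))
path <- tempfile(fileext = ".csv.bz2")
con <- bzfile(path, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() opens the path through file(), which detects the compression.
roundtrip <- read.csv(path, stringsAsFactors = TRUE)
print(dim(roundtrip))
```

Reading the compressed file directly avoids keeping an uncompressed copy on disk, at the cost of decompressing on every read.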
After reading the data, the dimensions of the dataset are checked with the dim() function: there are 902297 observations across 37 columns. To get an overall sense of the data, the first few rows are inspected with the head() function. The key variables required for the analysis are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.
dim(rawdata)
## [1] 902297 37
head(rawdata[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP")])
## EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 TORNADO 0 15 25.0 K
## 2 TORNADO 0 0 2.5 K
## 3 TORNADO 0 2 25.0 K
## 4 TORNADO 0 2 2.5 K
## 5 TORNADO 0 2 2.5 K
## 6 TORNADO 0 6 2.5 K
unique(rawdata$PROPDMGEXP)
## [1] K M B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels: - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Checking the number of error values
table(rawdata$PROPDMGEXP)
##
##             -      ?      +      0      1      2      3      4      5
## 465934      1      8      5    216     25     13      4      4     28
##      6      7      8      B      h      H      K      m      M
##      4      5      1     40      1      6 424665      7  11330
There are a total of 985 distinct event types. The total numbers of fatalities and injuries are computed for each event type. The top 10 event types by fatalities and by injuries are tabulated in the health dataset.
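# Total injuries per event type, sorted from highest to lowest
inj <- aggregate(rawdata$INJURIES, list(EVTYPE = rawdata$EVTYPE), sum)
inj <- inj[order(-inj$x), ]
# Total fatalities per event type, sorted from highest to lowest
fat <- aggregate(rawdata$FATALITIES, list(EVTYPE = rawdata$EVTYPE), sum)
fat <- fat[order(-fat$x), ]
# Top 10 event types for each measure, side by side
health <- data.frame(FATALEVENTS = fat$EVTYPE[1:10], FATALITIES = fat$x[1:10],
                     INJURYEVENTS = inj$EVTYPE[1:10], INJURIES = inj$x[1:10])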
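The aggregate-then-sort pattern used for the health table can be illustrated on a toy data frame. This is a minimal sketch: the rows are made up for illustration, and the formula interface to `aggregate()` is used here instead of the list form in the report.

```r
# Made-up rows for illustration; only the column names mirror the storm data.
toy <- data.frame(
  EVTYPE   = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD"),
  INJURIES = c(15, 2, 6, 4, 1)
)

# Sum injuries within each event type, then sort from highest to lowest.
totals <- aggregate(INJURIES ~ EVTYPE, data = toy, FUN = sum)
totals <- totals[order(-totals$INJURIES), ]

# Keep the top two event types.
top2 <- head(totals, 2)
print(top2)
```

Because the function passed is `sum`, the result is a total per event type, not an average; passing `mean` instead would rank events by typical severity per occurrence.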
The PROPDMGEXP variable should contain only the values B, M, K and H (in either case), corresponding to billion, million, thousand and hundred respectively. However, the unique() function shows other values as well, so the column is tabulated to count them. These other values occur in only 314 rows and are treated as zero multipliers, as are rows with an empty exponent. Similar adjustments are made to the CROPDMGEXP variable. Note that economic data is not available for every row in the raw dataset: only about 436,000 of the 902297 observations carry a valid property damage exponent.
# Check the levels of the Damage Exponent Variable
levels(rawdata$PROPDMGEXP)
## [1] "" "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
# Replace the values in PROPDMGEXP with a numeric value corresponding to Million Dollars
levels(rawdata$PROPDMGEXP) <- c(rep(0,13),1000,0.0001,0.0001,0.001,1,1)
levels(rawdata$PROPDMGEXP)
## [1] "0" "1000" "1e-04" "0.001" "1"
# Converting the variable into a numeric vector
rawdata$PROPDMGEXP <- as.numeric(as.character(rawdata$PROPDMGEXP))
# Creating a new variable DAMAGE estimating the actual damage in Million Dollars
rawdata$DAMAGE <- rawdata$PROPDMG * rawdata$PROPDMGEXP
# Aggregating the Damage corresponding to the event types
propDamage <- aggregate(rawdata$DAMAGE,list(EVENT=rawdata$EVTYPE),sum)
propDamage <- propDamage[order(-propDamage$x),]
# Check the levels of the Crop Damage Exponent Variable
levels(rawdata$CROPDMGEXP)
## [1] "" "?" "0" "2" "B" "k" "K" "m" "M"
# Replace the values in CROPDMGEXP with a numeric value corresponding to Million Dollars
levels(rawdata$CROPDMGEXP) <- c(0,0,0,0,1000,0.001,0.001,1,1)
levels(rawdata$CROPDMGEXP)
## [1] "0" "1000" "0.001" "1"
# Converting the variable into a numeric vector
rawdata$CROPDMGEXP <- as.numeric(as.character(rawdata$CROPDMGEXP))
# Creating a new variable CROPDAMAGE estimating the actual crop damage in Million Dollars
rawdata$CROPDAMAGE <- rawdata$CROPDMG * rawdata$CROPDMGEXP
# Aggregating the Damage corresponding to the event types
cropDamage <- aggregate(rawdata$CROPDAMAGE,list(EVENT=rawdata$EVTYPE),sum)
cropDamage <- cropDamage[order(-cropDamage$x),]
# Building a dataset specifying events for property and crop damages
damages <- data.frame(PROPEVENTS=propDamage$EVENT[1:10],PROPDAMAGE=propDamage$x[1:10],
CROPEVENTS=cropDamage$EVENT[1:10],CROPDAMAGE=cropDamage$x[1:10])
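The exponent conversion above depends on the order of the factor levels. An alternative that does not depend on level order is a named lookup vector. The sketch below is one way to do this on toy rows; the names `exp_to_millions` and `mult` and the toy values are assumptions introduced for illustration.

```r
# Multipliers that convert each valid exponent code to millions of dollars.
exp_to_millions <- c(H = 1e-4, K = 1e-3, M = 1, B = 1e3)

# Made-up rows covering valid, lower-case, invalid and empty codes.
toy <- data.frame(
  PROPDMG    = c(25, 2.5, 1.2, 3, 10),
  PROPDMGEXP = c("K", "m", "B", "?", ""),
  stringsAsFactors = FALSE
)

# Upper-case the codes, then look each one up by name; codes that are not
# in the table (such as "?" or "") come back as NA and are treated as 0.
mult <- unname(exp_to_millions[toupper(toy$PROPDMGEXP)])
mult[is.na(mult)] <- 0
toy$DAMAGE <- toy$PROPDMG * mult
print(toy)
```

The lookup keeps the code-to-multiplier mapping explicit in one place, so an unexpected level in the data cannot silently shift the multipliers.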
The main aim of the analysis is to find the events with the greatest consequences for public health and those with the highest economic damage. These events are summarized below in tables and bar plots.
- Tornado stands out with a total of 5633 fatalities and 91346 injuries.
- Excessive Heat, Flash Flood and Heat are the other events with high fatality counts.
- TSTM Wind, Flood, Excessive Heat and Lightning are major events with over 5000 total injuries each.
print(health)
## FATALEVENTS FATALITIES INJURYEVENTS INJURIES
## 1 TORNADO 5633 TORNADO 91346
## 2 EXCESSIVE HEAT 1903 TSTM WIND 6957
## 3 FLASH FLOOD 978 FLOOD 6789
## 4 HEAT 937 EXCESSIVE HEAT 6525
## 5 LIGHTNING 816 LIGHTNING 5230
## 6 TSTM WIND 504 HEAT 2100
## 7 FLOOD 470 ICE STORM 1975
## 8 RIP CURRENT 368 FLASH FLOOD 1777
## 9 HIGH WIND 248 THUNDERSTORM WIND 1488
## 10 AVALANCHE 224 HAIL 1361
library(ggplot2)
library(gridExtra)
# Plotting events with highest Fatalities
fatPlot <- ggplot(health[1:10,],aes(FATALEVENTS,FATALITIES,fill=FATALEVENTS))+
geom_bar(stat="identity")+coord_flip()+guides(fill="none")+
labs(title="Events with highest Fatalities")+
ylab("Number of Fatalities")+
xlab("")+
theme_bw()
#Plotting events with highest Injuries
injPlot <- ggplot(health[1:10,],aes(INJURYEVENTS,INJURIES,fill=INJURYEVENTS))+
geom_bar(stat="identity")+coord_flip()+guides(fill="none")+
labs(title="Events with highest Injuries")+
ylab("Number of Injuries")+
xlab("")+
theme_bw()
# Plotting the two figures in a row
grid.arrange(fatPlot,injPlot,ncol=2)
- **Flood** causes the highest property damage, about 144.7 billion dollars, and **Drought** the highest crop damage, about 14.0 billion dollars.
- **Hurricanes, Tornado and Storm Surge** are the other main events contributing to property damage.
- **Floods, River Floods and Ice Storms** are the other major events responsible for crop damage.
# Display the top 10 events with highest economic damage
print(damages)
## PROPEVENTS PROPDAMAGE CROPEVENTS CROPDAMAGE
## 1 FLOOD 144657.710 DROUGHT 13972.566
## 2 HURRICANE/TYPHOON 69305.840 FLOOD 5661.968
## 3 TORNADO 56937.160 RIVER FLOOD 5029.459
## 4 STORM SURGE 43323.536 ICE STORM 5022.114
## 5 FLASH FLOOD 16140.812 HAIL 3025.954
## 6 HAIL 15732.267 HURRICANE 2741.910
## 7 HURRICANE 11868.319 HURRICANE/TYPHOON 2607.873
## 8 TROPICAL STORM 7703.891 FLASH FLOOD 1421.317
## 9 WINTER STORM 6688.497 EXTREME COLD 1292.973
## 10 HIGH WIND 5270.046 FROST/FREEZE 1094.086
# Plot the top 10 Events and the respective damages
propPlot <- ggplot(propDamage[1:10,],aes(EVENT,x,fill=EVENT))+
geom_bar(stat="identity")+coord_flip()+guides(fill="none")+
labs(title="Events with highest property damage")+
ylab("Damage in Million Dollars")+
xlab("")+
theme_bw()
cropPlot <- ggplot(cropDamage[1:10,],aes(EVENT,x,fill=EVENT))+
geom_bar(stat="identity")+coord_flip()+guides(fill="none")+
labs(title="Events with highest crop damage")+
ylab("Damage in Million Dollars")+
xlab("")+
theme_bw()
grid.arrange(propPlot,cropPlot,ncol=2)
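The values in the damages table are expressed in millions of dollars, so a quick conversion helps when quoting headline figures in billions. The sketch below uses the two top entries copied from the printed table; the vector names are assumptions introduced for illustration.

```r
# Top property and crop damage totals, in millions of dollars,
# copied from the printed damages table.
top_millions <- c(FLOOD_property = 144657.710, DROUGHT_crop = 13972.566)

# 1 billion = 1000 million; round to one decimal place for reporting.
top_billions <- round(top_millions / 1000, 1)
print(top_billions)
```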