Effects of Severe Weather Events

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern. This report analyses such events across the USA from 1950 to 2011 and lists the events with the largest effects.

Synopsis

The primary goal of this report is to identify the types of storms and events that are most harmful to human health, and those that cause the highest economic losses, across the USA. The project explores the U.S. National Oceanic and Atmospheric Administration's (NOAA) storm database, which tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage. The analysis identifies the events with the largest toll on human life and on the economy: tornadoes have by far the worst impact on human health, while floods cause the greatest property damage and droughts the greatest crop damage.

Loading and Processing the Data

Reading raw data

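# Unzip the compressed file (bunzip2() is provided by the R.utils package)
library(R.utils)
if (!file.exists("repdata-data-StormData.csv")) {
    bunzip2(filename = "repdata-data-StormData.csv.bz2", remove = FALSE)
}
# Read the unzipped .csv file; stringsAsFactors = TRUE keeps the factor
# columns (and levels()) that the processing steps rely on in R >= 4.0
rawdata <- read.csv("repdata-data-StormData.csv", stringsAsFactors = TRUE)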
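As an alternative to decompressing first, base R's read.csv() can usually read the .bz2 file directly, because the file() connection it opens detects bzip2 compression. The sketch below demonstrates this on a small, hypothetical three-column sample written to a temporary file, not on the real StormData file.

```r
# Write a tiny, hypothetical storm-data sample to a bzip2-compressed CSV
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "wt")
writeLines(c("EVTYPE,FATALITIES,INJURIES",
             "TORNADO,0,15",
             "HAIL,0,0"), con)
close(con)

# read.csv() opens the path through file(), which detects bzip2 compression,
# so no explicit decompression step is needed
sample_data <- read.csv(tmp, stringsAsFactors = TRUE)
print(dim(sample_data))
```

On the full dataset this would amount to calling read.csv("repdata-data-StormData.csv.bz2", stringsAsFactors = TRUE), at the cost of decompressing on every read.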

After reading the data, the dimensions of the dataset are checked with the dim() function: there are 902,297 observations across 37 columns. To get an overall sense of the data, the first few rows are inspected with the head() function. The key variables required for the analysis are EVTYPE, FATALITIES, INJURIES, PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP.

dim(rawdata)
## [1] 902297     37
head(rawdata[,c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP")])
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP
## 1 TORNADO          0       15    25.0          K
## 2 TORNADO          0        0     2.5          K
## 3 TORNADO          0        2    25.0          K
## 4 TORNADO          0        2     2.5          K
## 5 TORNADO          0        2     2.5          K
## 6 TORNADO          0        6     2.5          K

Processing the data

unique(rawdata$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
# Count how often each exponent code occurs, including the non-standard ones
table(rawdata$PROPDMGEXP)
## 
##             -      ?      +      0      1      2      3      4      5 
## 465934      1      8      5    216     25     13      4      4     28 
##      6      7      8      B      h      H      K      m      M 
##      4      5      1     40      1      6 424665      7  11330

Effects on Population Health:

The dataset contains 985 distinct event types (EVTYPE values). The total number of fatalities and injuries is computed for each event type, and the 10 event types with the most fatalities and the 10 with the most injuries are tabulated in the health data frame.

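# Total injuries per event type, sorted in decreasing order
inj <- aggregate(rawdata$INJURIES, list(EVTYPE = rawdata$EVTYPE), sum)
inj <- inj[order(-inj$x), ]
# Total fatalities per event type, sorted in decreasing order
fat <- aggregate(rawdata$FATALITIES, list(EVTYPE = rawdata$EVTYPE), sum)
fat <- fat[order(-fat$x), ]
# Side-by-side table of the top 10 event types for each measure
health <- data.frame(FATALEVENTS = fat$EVTYPE[1:10], FATALITIES = fat$x[1:10],
                     INJURYEVENTS = inj$EVTYPE[1:10], INJURIES = inj$x[1:10])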
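The aggregate-then-sort pattern used above can be checked on a tiny, hypothetical data frame (the toy values below are invented for illustration). This sketch uses aggregate()'s formula interface, which names the summed column after the variable instead of x, and head() instead of [1:10], so it also behaves sensibly when there are fewer groups than requested.

```r
# Hypothetical mini dataset standing in for rawdata
toy <- data.frame(EVTYPE = c("TORNADO", "HAIL", "TORNADO", "FLOOD", "HAIL"),
                  FATALITIES = c(2, 0, 3, 1, 0))

# Sum fatalities per event type, then sort in decreasing order
fat_toy <- aggregate(FATALITIES ~ EVTYPE, data = toy, FUN = sum)
fat_toy <- fat_toy[order(-fat_toy$FATALITIES), ]

# Keep the top two event types
top_fat <- head(fat_toy, 2)
print(top_fat)
```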

Economic Effects:

The PROPDMGEXP variable should contain only the codes B, M, K and H, corresponding to billion, million, thousand and hundred respectively (lower-case variants also occur). As the unique() output shows, other values are present, so the column is tabulated to count them. These non-standard codes appear in only 314 rows, so they are neglected (mapped to zero). The same adjustment is made to the CROPDMGEXP variable. Note that economic data is not available for every row in the raw dataset: only about 436,000 observations have a non-blank PROPDMGEXP, and rows with a blank code are also mapped to zero. All damage values are converted to million dollars.
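The claim that the non-standard codes are negligible can be checked directly from the counts printed by table(rawdata$PROPDMGEXP). A minimal sketch, with the counts copied from that output by hand (the variable names are illustrative):

```r
# Exponent codes and their frequencies, copied from table(rawdata$PROPDMGEXP)
codes  <- c("", "-", "?", "+", "0", "1", "2", "3", "4", "5",
            "6", "7", "8", "B", "h", "H", "K", "m", "M")
counts <- c(465934, 1, 8, 5, 216, 25, 13, 4, 4, 28,
            4, 5, 1, 40, 1, 6, 424665, 7, 11330)

# Standard codes are H, K, M and B, in either upper or lower case
standard <- toupper(codes) %in% c("H", "K", "M", "B")
blank    <- codes == ""

n_blank        <- sum(counts[blank])
n_standard     <- sum(counts[standard])
n_non_standard <- sum(counts[!standard & !blank])

print(c(blank = n_blank, standard = n_standard, non_standard = n_non_standard))
```

This gives 465,934 blank rows, 436,049 rows with a standard code and 314 rows with a non-standard code, which together account for all 902,297 observations.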

# Check the levels of the Damage Exponent Variable
levels(rawdata$PROPDMGEXP)
##  [1] ""  "-" "?" "+" "0" "1" "2" "3" "4" "5" "6" "7" "8" "B" "h" "H" "K"
## [18] "m" "M"
# Replace the values in PROPDMGEXP with a numeric value corresponding to Million Dollars
levels(rawdata$PROPDMGEXP) <- c(rep(0,13),1000,0.0001,0.0001,0.001,1,1)
levels(rawdata$PROPDMGEXP)
## [1] "0"     "1000"  "1e-04" "0.001" "1"
# Converting the variable into a numeric vector
rawdata$PROPDMGEXP <- as.numeric(as.character(rawdata$PROPDMGEXP))
# Creating a new variable DAMAGE estimating the actual damage in Million Dollars
rawdata$DAMAGE <- rawdata$PROPDMG * rawdata$PROPDMGEXP
# Aggregating the Damage corresponding to the event types
propDamage <- aggregate(rawdata$DAMAGE,list(EVENT=rawdata$EVTYPE),sum)
propDamage <- propDamage[order(-propDamage$x),]

# Check the levels of the Crop Damage Exponent Variable
levels(rawdata$CROPDMGEXP)
## [1] ""  "?" "0" "2" "B" "k" "K" "m" "M"
# Replace the values in CROPDMGEXP with a numeric value corresponding to Million Dollars
levels(rawdata$CROPDMGEXP) <- c(0,0,0,0,1000,0.001,0.001,1,1)
levels(rawdata$CROPDMGEXP)
## [1] "0"     "1000"  "0.001" "1"
# Converting the variable into a numeric vector
rawdata$CROPDMGEXP <- as.numeric(as.character(rawdata$CROPDMGEXP))
# Creating a new variable CROPDAMAGE estimating the actual crop damage in Million Dollars
rawdata$CROPDAMAGE <- rawdata$CROPDMG * rawdata$CROPDMGEXP
# Aggregating the Damage corresponding to the event types
cropDamage <- aggregate(rawdata$CROPDAMAGE,list(EVENT=rawdata$EVTYPE),sum)
cropDamage <- cropDamage[order(-cropDamage$x),]
# Building a dataset specifying events for property and crop damages
damages <- data.frame(PROPEVENTS=propDamage$EVENT[1:10],PROPDAMAGE=propDamage$x[1:10],
            CROPEVENTS=cropDamage$EVENT[1:10],CROPDAMAGE=cropDamage$x[1:10])
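Assigning a vector to levels() maps codes by position, so the code above depends on the exact level order printed by levels(), which can differ with R version or locale. A hedged alternative is a named lookup vector keyed on the upper-cased code. The helper damage_in_millions() below is hypothetical, not part of the original analysis, and it keeps the report's choice of treating blank and non-standard codes as zero.

```r
# Multipliers that convert an exponent code into million dollars
exp_to_million <- c(H = 1e-4, K = 1e-3, M = 1, B = 1e3)

# Hypothetical helper: value * multiplier, with unknown or blank codes -> 0
damage_in_millions <- function(value, exp_code) {
  # as.character() handles factor input; toupper() folds h/k/m into H/K/M
  mult <- unname(exp_to_million[toupper(as.character(exp_code))])
  # Codes with no entry in the lookup (e.g. "", "?", "5") give NA; map to 0
  mult[is.na(mult)] <- 0
  value * mult
}

res <- damage_in_millions(c(25, 2.5, 1.5, 3, 7), c("K", "k", "B", "?", ""))
print(res)
```

On the full data this would be applied to the original exponent column, e.g. damage_in_millions(rawdata$PROPDMG, rawdata$PROPDMGEXP), before PROPDMGEXP has been overwritten by the level reassignment.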

Results:

The analysis focuses on finding the events with the greatest consequences for public health and those with the highest economic damage. These events are summarized below in tables and bar plots.

Events with drastic effects on human life:

- Tornado stands out with a total of 5633 fatalities and 91346 injuries.
- Excessive Heat, Flash Flood and Heat are the other events with the highest fatality counts.
- TSTM Wind, Flood, Excessive Heat and Lightning are other major events, each with over 5000
  total injuries.
print(health)
##       FATALEVENTS FATALITIES      INJURYEVENTS INJURIES
## 1         TORNADO       5633           TORNADO    91346
## 2  EXCESSIVE HEAT       1903         TSTM WIND     6957
## 3     FLASH FLOOD        978             FLOOD     6789
## 4            HEAT        937    EXCESSIVE HEAT     6525
## 5       LIGHTNING        816         LIGHTNING     5230
## 6       TSTM WIND        504              HEAT     2100
## 7           FLOOD        470         ICE STORM     1975
## 8     RIP CURRENT        368       FLASH FLOOD     1777
## 9       HIGH WIND        248 THUNDERSTORM WIND     1488
## 10      AVALANCHE        224              HAIL     1361
library(ggplot2)
library(gridExtra)
# Plotting events with highest Fatalities
# (guides(fill = "none") replaces the deprecated guides(fill = FALSE))
fatPlot <- ggplot(health, aes(FATALEVENTS, FATALITIES, fill = FATALEVENTS)) +
    geom_col() + coord_flip() + guides(fill = "none") +
    labs(title = "Events with highest Fatalities") +
    ylab("Number of Fatalities") +
    xlab("") +
    theme_bw()

# Plotting events with highest Injuries
injPlot <- ggplot(health, aes(INJURYEVENTS, INJURIES, fill = INJURYEVENTS)) +
    geom_col() + coord_flip() + guides(fill = "none") +
    labs(title = "Events with highest Injuries") +
    ylab("Number of Injuries") +
    xlab("") +
    theme_bw()

# Plotting the two figures in a row
grid.arrange(fatPlot, injPlot, ncol = 2)
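The bar charts above order the events by their factor levels, i.e. alphabetically. If the bars should instead be ranked by count, base R's reorder() can set the level order from the values before plotting. A minimal sketch on the top five fatality counts from the health table; with coord_flip() the first level is drawn at the bottom, so the largest bar would appear on top:

```r
# Top five fatality counts copied from the health table above
fat_top <- data.frame(
  EVENT = c("TORNADO", "EXCESSIVE HEAT", "FLASH FLOOD", "HEAT", "LIGHTNING"),
  FATALITIES = c(5633, 1903, 978, 937, 816)
)

# reorder() sets the factor levels in ascending order of FATALITIES
fat_top$EVENT <- reorder(fat_top$EVENT, fat_top$FATALITIES)
print(levels(fat_top$EVENT))
```

In the plotting code this would amount to writing aes(reorder(FATALEVENTS, FATALITIES), FATALITIES, fill = FATALEVENTS) in place of aes(FATALEVENTS, FATALITIES, fill = FATALEVENTS).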

Figure: Events with highest Fatalities (left) and Events with highest Injuries (right).

Events causing Economic Damage:

- **Flood** caused the largest property damage (about 144.7 billion dollars) and
  **Drought** the largest crop damage (about 14.0 billion dollars).
- **Hurricane/Typhoon, Tornado and Storm Surge** are the other leading causes of
  property damage.
- **Flood, River Flood and Ice Storm** are the other leading causes of crop
  damage.
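Because the damage totals are expressed in million dollars, converting them to billions is a division by 1,000. A minimal sketch using the leading property and crop values from the damages table (the names are illustrative):

```r
# Leading property and crop damage totals from the damages table (million USD)
top_millions <- c(FLOOD_PROPERTY = 144657.710, DROUGHT_CROP = 13972.566)

# 1 billion = 1,000 million, so divide by 1000 and round to one decimal
top_billions <- round(top_millions / 1000, 1)
print(top_billions)
```

This gives roughly 144.7 billion dollars of flood property damage and 14.0 billion dollars of drought crop damage.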
# Display the top 10 events with highest economic damage
print(damages)
##           PROPEVENTS PROPDAMAGE        CROPEVENTS CROPDAMAGE
## 1              FLOOD 144657.710           DROUGHT  13972.566
## 2  HURRICANE/TYPHOON  69305.840             FLOOD   5661.968
## 3            TORNADO  56937.160       RIVER FLOOD   5029.459
## 4        STORM SURGE  43323.536         ICE STORM   5022.114
## 5        FLASH FLOOD  16140.812              HAIL   3025.954
## 6               HAIL  15732.267         HURRICANE   2741.910
## 7          HURRICANE  11868.319 HURRICANE/TYPHOON   2607.873
## 8     TROPICAL STORM   7703.891       FLASH FLOOD   1421.317
## 9       WINTER STORM   6688.497      EXTREME COLD   1292.973
## 10         HIGH WIND   5270.046      FROST/FREEZE   1094.086
# Plot the top 10 events and their respective damages
# (guides(fill = "none") replaces the deprecated guides(fill = FALSE))
propPlot <- ggplot(propDamage[1:10, ], aes(EVENT, x, fill = EVENT)) +
    geom_col() + coord_flip() + guides(fill = "none") +
    labs(title = "Events with highest property damage") +
    ylab("Damage in Million Dollars") +
    xlab("") +
    theme_bw()
cropPlot <- ggplot(cropDamage[1:10, ], aes(EVENT, x, fill = EVENT)) +
    geom_col() + coord_flip() + guides(fill = "none") +
    labs(title = "Events with highest crop damage") +
    ylab("Damage in Million Dollars") +
    xlab("") +
    theme_bw()
grid.arrange(propPlot, cropPlot, ncol = 2)

Figure: Events with highest property damage (left) and Events with highest crop damage (right), in million dollars.