Project Title:

Fatalities, Injuries, and Property Damage from Weather Events

1. Synopsis:

This project analyzed the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database to determine the effects of weather events on the US population and economy. The impact on the populace, measured in injuries and fatalities, followed a similar pattern of weather events for both measures, with tornadoes inflicting the harshest toll. The economic impact, measured in crop and property damage, followed a very different pattern of weather events, with floods causing the largest total damage.

2. Data processing

Download the data set (if not already present) and load it into R.

stormData <- read.csv("repdata_data_StormData.csv")

# Inspect the first rows and the variables of the data frame
head(stormData)
##   STATE__           BGN_DATE BGN_TIME TIME_ZONE COUNTY COUNTYNAME STATE
## 1       1  4/18/1950 0:00:00     0130       CST     97     MOBILE    AL
## 2       1  4/18/1950 0:00:00     0145       CST      3    BALDWIN    AL
## 3       1  2/20/1951 0:00:00     1600       CST     57    FAYETTE    AL
## 4       1   6/8/1951 0:00:00     0900       CST     89    MADISON    AL
## 5       1 11/15/1951 0:00:00     1500       CST     43    CULLMAN    AL
## 6       1 11/15/1951 0:00:00     2000       CST     77 LAUDERDALE    AL
##    EVTYPE BGN_RANGE BGN_AZI BGN_LOCATI END_DATE END_TIME COUNTY_END
## 1 TORNADO         0                                               0
## 2 TORNADO         0                                               0
## 3 TORNADO         0                                               0
## 4 TORNADO         0                                               0
## 5 TORNADO         0                                               0
## 6 TORNADO         0                                               0
##   COUNTYENDN END_RANGE END_AZI END_LOCATI LENGTH WIDTH F MAG FATALITIES
## 1         NA         0                      14.0   100 3   0          0
## 2         NA         0                       2.0   150 2   0          0
## 3         NA         0                       0.1   123 2   0          0
## 4         NA         0                       0.0   100 2   0          0
## 5         NA         0                       0.0   150 2   0          0
## 6         NA         0                       1.5   177 2   0          0
##   INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP WFO STATEOFFIC ZONENAMES
## 1       15    25.0          K       0                                    
## 2        0     2.5          K       0                                    
## 3        2    25.0          K       0                                    
## 4        2     2.5          K       0                                    
## 5        2     2.5          K       0                                    
## 6        6     2.5          K       0                                    
##   LATITUDE LONGITUDE LATITUDE_E LONGITUDE_ REMARKS REFNUM
## 1     3040      8812       3051       8806              1
## 2     3042      8755          0          0              2
## 3     3340      8742          0          0              3
## 4     3458      8626          0          0              4
## 5     3412      8642          0          0              5
## 6     3450      8748          0          0              6

Note: The dataset is available at: https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2
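
The "download if not present" step mentioned at the start of this section could look like the minimal sketch below. The helper name fetch_if_missing and the local file name StormData.csv.bz2 are assumptions made for illustration and are not part of the original analysis. The final call is left commented out because it needs network access.

```r
# Hypothetical helper: download the file only when it is not already on disk.
fetch_if_missing <- function(url, destfile) {
  if (!file.exists(destfile)) {
    download.file(url, destfile, mode = "wb")  # binary mode keeps the archive intact
  }
  invisible(destfile)
}

data_url <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"

# read.csv() can read a bzip2-compressed file directly, so no manual unpacking is needed.
# stormData <- read.csv(fetch_if_missing(data_url, "StormData.csv.bz2"))
```

Returning the file path lets the helper be nested inside read.csv(), so the download and the load stay in one expression.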

The dataset contains many variables, most of which are not required for the present study. The code below extracts only the variables needed to analyze the health and economic impact of weather events.

# Keep only the columns that describe the event type and its health and economic impact
eventType <- c("EVTYPE","FATALITIES","INJURIES","PROPDMG","PROPDMGEXP","CROPDMG","CROPDMGEXP")
workingData <- stormData[eventType]
head(workingData)
##    EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1 TORNADO          0       15    25.0          K       0           
## 2 TORNADO          0        0     2.5          K       0           
## 3 TORNADO          0        2    25.0          K       0           
## 4 TORNADO          0        2     2.5          K       0           
## 5 TORNADO          0        2     2.5          K       0           
## 6 TORNADO          0        6     2.5          K       0

The exponent codes found in PROPDMGEXP were listed, and each code was mapped to a numeric multiplier. Invalid codes (+, -, ?) were excluded by assigning them a multiplier of 0. The property damage value was then calculated by multiplying PROPDMG by its multiplier. Crop damage was handled in the same way using CROPDMGEXP and CROPDMG. The code for this process is listed below.

# Find the property damage exponent codes and assign a multiplier to each
unique(workingData$PROPDMGEXP)
##  [1] K M   B m + 0 5 6 ? 4 2 3 h 7 H - 1 8
## Levels:  - ? + 0 1 2 3 4 5 6 7 8 B h H K m M
workingData$PROPEXP[workingData$PROPDMGEXP == "K"] <- 1000
workingData$PROPEXP[workingData$PROPDMGEXP == "M"] <- 1e+06
workingData$PROPEXP[workingData$PROPDMGEXP == ""] <- 1
workingData$PROPEXP[workingData$PROPDMGEXP == "B"] <- 1e+09
workingData$PROPEXP[workingData$PROPDMGEXP == "m"] <- 1e+06
workingData$PROPEXP[workingData$PROPDMGEXP == "0"] <- 1
workingData$PROPEXP[workingData$PROPDMGEXP == "5"] <- 1e+05
workingData$PROPEXP[workingData$PROPDMGEXP == "6"] <- 1e+06
workingData$PROPEXP[workingData$PROPDMGEXP == "4"] <- 10000
workingData$PROPEXP[workingData$PROPDMGEXP == "2"] <- 100
workingData$PROPEXP[workingData$PROPDMGEXP == "3"] <- 1000
workingData$PROPEXP[workingData$PROPDMGEXP == "h"] <- 100
workingData$PROPEXP[workingData$PROPDMGEXP == "7"] <- 1e+07
workingData$PROPEXP[workingData$PROPDMGEXP == "H"] <- 100
workingData$PROPEXP[workingData$PROPDMGEXP == "1"] <- 10
workingData$PROPEXP[workingData$PROPDMGEXP == "8"] <- 1e+08

workingData$PROPEXP[workingData$PROPDMGEXP == "+"] <- 0
workingData$PROPEXP[workingData$PROPDMGEXP == "-"] <- 0
workingData$PROPEXP[workingData$PROPDMGEXP == "?"] <- 0

workingData$PROPDMGVAL <- workingData$PROPDMG * workingData$PROPEXP

unique(workingData$CROPDMGEXP)
## [1]   M K m B ? 0 k 2
## Levels:  ? 0 2 B k K m M
workingData$CROPEXP[workingData$CROPDMGEXP == "M"] <- 1e+06
workingData$CROPEXP[workingData$CROPDMGEXP == "K"] <- 1000
workingData$CROPEXP[workingData$CROPDMGEXP == "m"] <- 1e+06
workingData$CROPEXP[workingData$CROPDMGEXP == "B"] <- 1e+09
workingData$CROPEXP[workingData$CROPDMGEXP == "0"] <- 1
workingData$CROPEXP[workingData$CROPDMGEXP == "k"] <- 1000
workingData$CROPEXP[workingData$CROPDMGEXP == "2"] <- 100
workingData$CROPEXP[workingData$CROPDMGEXP == ""] <- 1

workingData$CROPEXP[workingData$CROPDMGEXP == "?"] <- 0

workingData$CROPDMGVAL <- workingData$CROPDMG * workingData$CROPEXP
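
The same mapping can be written more compactly with a named lookup vector. The sketch below is an alternative formulation, not the code used for this report. The names exp_lookup and exp_to_multiplier are hypothetical, and treating any unlisted code as invalid (multiplier 0) is an assumption that goes beyond the assignments above.

```r
# Multipliers copied from the assignments above (property and crop codes combined).
exp_lookup <- c(
  "h" = 1e2, "H" = 1e2, "k" = 1e3, "K" = 1e3,
  "m" = 1e6, "M" = 1e6, "B" = 1e9,
  "0" = 1, "1" = 1e1, "2" = 1e2, "3" = 1e3, "4" = 1e4,
  "5" = 1e5, "6" = 1e6, "7" = 1e7, "8" = 1e8,
  "+" = 0, "-" = 0, "?" = 0
)

# Hypothetical helper: translate exponent codes into numeric multipliers.
exp_to_multiplier <- function(code) {
  code <- as.character(code)        # also works when the column is a factor
  out <- unname(exp_lookup[code])   # NA for codes that are not in the table
  out[code %in% ""] <- 1            # blank exponent: damage value is used as-is
  out[is.na(out)] <- 0              # assumption: unknown codes count as invalid
  out
}

# Small made-up example, not real storm records
toy <- data.frame(PROPDMG = c(25, 2.5, 3, 1), PROPDMGEXP = c("K", "M", "", "?"))
toy$PROPDMGVAL <- toy$PROPDMG * exp_to_multiplier(toy$PROPDMGEXP)
```

With this helper, the property and crop columns could each be converted in one line, for example workingData$PROPDMG * exp_to_multiplier(workingData$PROPDMGEXP).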

Harm to population health is measured by fatalities and injuries, so only these two variables were selected for the health analysis.

Economic harm is measured by property and crop damage, so only these two variables were selected for the economic analysis.

Then, for each measure (fatalities, injuries, property damage, and crop damage), the total per event type was calculated. The code is as follows.

# Totalling the data by event
fatal <- aggregate(FATALITIES ~ EVTYPE, workingData, FUN = sum)
injury <- aggregate(INJURIES ~ EVTYPE, workingData, FUN = sum)
propdmg <- aggregate(PROPDMGVAL ~ EVTYPE, workingData, FUN = sum)
cropdmg <- aggregate(CROPDMGVAL ~ EVTYPE, workingData, FUN = sum)
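
To make the aggregation step concrete, the sketch below applies the same aggregate() call to a small made-up data frame. The toy values are invented for illustration only. Note that the formula interface of aggregate() drops rows with missing values by default.

```r
# Made-up records: two tornado rows, two flood rows, one hail row
toy <- data.frame(
  EVTYPE = c("TORNADO", "FLOOD", "TORNADO", "FLOOD", "HAIL"),
  FATALITIES = c(2, 1, 3, 0, 0)
)

# One row per event type, with FATALITIES summed within each type
toy_fatal <- aggregate(FATALITIES ~ EVTYPE, toy, FUN = sum)
print(toy_fatal)
```

The result has one row per distinct EVTYPE, which is the shape the ranking and plotting steps rely on.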

3. Plotting data analysis

3.1. Plotting for events with highest fatalities and highest injuries

The 10 event types with the highest fatalities and the 10 with the highest injuries were identified. For easier understanding and comparison, these values were plotted as follows.

# Listing events with highest fatalities
fatal10 <- fatal[order(-fatal$FATALITIES), ][1:10, ]

# Listing events with highest injuries
injury10 <- injury[order(-injury$INJURIES), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(fatal10$FATALITIES, las = 3, names.arg = fatal10$EVTYPE, main = "Events with Highest Fatalities", 
        ylab = "Number of fatalities", col = "lightblue")
barplot(injury10$INJURIES, las = 3, names.arg = injury10$EVTYPE, main = "Events with Highest Injuries", 
        ylab = "Number of injuries", col = "lightblue")

3.2. Plotting for events with highest Property damage and highest crop damage

The 10 event types with the highest property damage and the 10 with the highest crop damage were identified. For easier understanding and comparison, these values were plotted as follows.

# Finding events with highest property damage
propdmg10 <- propdmg[order(-propdmg$PROPDMGVAL), ][1:10, ]

# Finding events with highest crop damage
cropdmg10 <- cropdmg[order(-cropdmg$CROPDMGVAL), ][1:10, ]
par(mfrow = c(1, 2), mar = c(12, 4, 3, 2), mgp = c(3, 1, 0), cex = 0.8)
barplot(propdmg10$PROPDMGVAL/(10^9), las = 3, names.arg = propdmg10$EVTYPE, 
        main = "Events with Highest Property Damages", ylab = "Damage Cost ($ billions)", 
        col = "light green")
barplot(cropdmg10$CROPDMGVAL/(10^9), las = 3, names.arg = cropdmg10$EVTYPE, 
        main = "Events With Highest Crop Damages", ylab = "Damage Cost ($ billions)", 
        col = "light green")
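
The top-10 selection in sections 3.1 and 3.2 repeats the same ordering pattern four times. It could be factored into a small helper, sketched below. The function name top_events is hypothetical and not part of the original analysis. Unlike indexing with [1:10, ], head() does not produce NA rows when fewer than n event types exist.

```r
# Hypothetical helper: return the n rows with the largest values in value_col.
top_events <- function(totals, value_col, n = 10) {
  ranked <- totals[order(totals[[value_col]], decreasing = TRUE), ]
  head(ranked, n)
}

# Made-up totals for illustration
toy <- data.frame(EVTYPE = c("A", "B", "C"), FATALITIES = c(5, 20, 1))
top2 <- top_events(toy, "FATALITIES", n = 2)
```

Applied to the report's data, calls such as top_events(fatal, "FATALITIES") and top_events(propdmg, "PROPDMGVAL") would select the same rows as the ordering code above.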

4. Results

Fatalities and injuries:

  1. Tornadoes cause the most fatalities and injuries.
  2. Excessive heat is the second-leading cause of fatalities.
  3. Thunderstorm wind is the second-leading cause of injuries.

Property damage and crop damage:

  1. Floods caused the most property damage, whereas drought caused the most crop damage.
  2. Hurricanes/typhoons were the second-largest cause of property damage, and floods were the second-largest cause of crop damage.