Impact of Weather Events On US Health And Economy

Storms and other severe weather events can cause both public health and economic problems for communities and municipalities. Many severe events can result in fatalities, injuries, and property damage, and preventing such outcomes to the extent possible is a key concern.

This project involves exploring the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. This database tracks characteristics of major storms and weather events in the United States, including when and where they occur, as well as estimates of any fatalities, injuries, and property damage.

Synopsis

The analysis has two parts.

  1. Health impacts: the first part measures the effect of weather events on public health, using the total number of casualties (fatalities and injuries combined) per event type. Tornadoes caused the most harm, with a total of 96979 casualties.
  2. Economic impacts: the second part measures the effect of weather events on the economy, using two criteria:
     * total property damage (in billions)
     * total crop damage (in billions)

Data Processing

The data for this assignment come as a comma-separated-value file compressed with the bzip2 algorithm to reduce its size. Some documentation of the database is available in the National Weather Service Storm Data Documentation and the National Climatic Data Center Storm Events FAQ. The events in the database start in the year 1950 and end in November 2011. In the earlier years of the database there are generally fewer events recorded, most likely due to a lack of good records. More recent years should be considered more complete.

The first step is to download the data if it does not already exist locally.

filename <- "repdata%2Fdata%2FStormData.csv.bz2"
if (!file.exists(filename)){
  fileURL <- "https://d396qusza40orc.cloudfront.net/repdata%2Fdata%2FStormData.csv.bz2"
  download.file(fileURL, filename, method="curl")
}  

Attaching the required packages

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(data.table)
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
library(ggplot2)

Next, let's read the data from the CSV file and keep only the essential columns.

raw_data <- read.csv(bzfile("repdata%2Fdata%2FStormData.csv.bz2"))[,c(8,23,24,25,26,27,28)]
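Selecting columns by position works, but it depends on the column order of the file. A minimal sketch of the same idea using column names is shown below. The inline CSV text is a toy stand-in for the real file, and the names `csv_text`, `wanted`, and `sample_data` are made up for this illustration.

```r
# Toy stand-in for the storm CSV; the real file has many more columns.
csv_text <- "STATE,EVTYPE,FATALITIES,INJURIES
AL,TORNADO,0,15
AL,HAIL,0,0"

# Columns to keep, selected by name rather than by position.
wanted <- c("EVTYPE", "FATALITIES", "INJURIES")

# read.csv() accepts literal text through its 'text' argument.
sample_data <- read.csv(text = csv_text, stringsAsFactors = FALSE)[, wanted]
print(names(sample_data))
```

With the real data, the same `wanted` vector could be extended with PROPDMG, PROPDMGEXP, CROPDMG, and CROPDMGEXP.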

Studying the Data

# Looking at the raw_data

head(raw_data,50)
##     EVTYPE FATALITIES INJURIES PROPDMG PROPDMGEXP CROPDMG CROPDMGEXP
## 1  TORNADO          0       15   25.00          K       0           
## 2  TORNADO          0        0    2.50          K       0           
## 3  TORNADO          0        2   25.00          K       0           
## 4  TORNADO          0        2    2.50          K       0           
## 5  TORNADO          0        2    2.50          K       0           
## 6  TORNADO          0        6    2.50          K       0           
## 7  TORNADO          0        1    2.50          K       0           
## 8  TORNADO          0        0    2.50          K       0           
## 9  TORNADO          1       14   25.00          K       0           
## 10 TORNADO          0        0   25.00          K       0           
## 11 TORNADO          0        3    2.50          M       0           
## 12 TORNADO          0        3    2.50          M       0           
## 13 TORNADO          1       26  250.00          K       0           
## 14 TORNADO          0       12    0.00          K       0           
## 15 TORNADO          0        6   25.00          K       0           
## 16 TORNADO          4       50   25.00          K       0           
## 17 TORNADO          0        2   25.00          K       0           
## 18 TORNADO          0        0   25.00          K       0           
## 19 TORNADO          0        0   25.00          K       0           
## 20 TORNADO          0        0   25.00          K       0           
## 21 TORNADO          0        0   25.00          K       0           
## 22 TORNADO          0        0    2.50          K       0           
## 23 TORNADO          0        0    2.50          K       0           
## 24 TORNADO          0        1   25.00          K       0           
## 25 TORNADO          0        1   25.00          K       0           
## 26 TORNADO          1        8   25.00          K       0           
## 27 TORNADO          0        2   25.00          K       0           
## 28 TORNADO          0        1   25.00          K       0           
## 29 TORNADO          0        6   25.00          K       0           
## 30 TORNADO          0        2    2.50          K       0           
## 31 TORNADO          0        0    2.50          K       0           
## 32 TORNADO          0       12    2.50          K       0           
## 33 TORNADO          0        0   25.00          K       0           
## 34 TORNADO          6      195    2.50          M       0           
## 35 TORNADO          0        2   25.00          K       0           
## 36 TORNADO          7       12  250.00          K       0           
## 37 TORNADO          0        0    2.50          K       0           
## 38 TORNADO          2        3   25.00          K       0           
## 39 TORNADO          0        2    2.50          K       0           
## 40 TORNADO          0        0   25.00          K       0           
## 41 TORNADO          0        0    2.50          K       0           
## 42 TORNADO          0        1   25.00          K       0           
## 43 TORNADO          0        0    2.50          K       0           
## 44 TORNADO          0        0   25.00          K       0           
## 45 TORNADO          0        0   25.00          K       0           
## 46 TORNADO          0        0    0.03          K       0           
## 47 TORNADO          0        1   25.00          K       0           
## 48 TORNADO          0        4  250.00          K       0           
## 49 TORNADO          0       26  250.00          K       0           
## 50 TORNADO          0        3    2.50          K       0
# Let's look at the different event types

head(unique(raw_data$EVTYPE),50)
##  [1] TORNADO                        TSTM WIND                     
##  [3] HAIL                           FREEZING RAIN                 
##  [5] SNOW                           ICE STORM/FLASH FLOOD         
##  [7] SNOW/ICE                       WINTER STORM                  
##  [9] HURRICANE OPAL/HIGH WINDS      THUNDERSTORM WINDS            
## [11] RECORD COLD                    HURRICANE ERIN                
## [13] HURRICANE OPAL                 HEAVY RAIN                    
## [15] LIGHTNING                      THUNDERSTORM WIND             
## [17] DENSE FOG                      RIP CURRENT                   
## [19] THUNDERSTORM WINS              FLASH FLOOD                   
## [21] FLASH FLOODING                 HIGH WINDS                    
## [23] FUNNEL CLOUD                   TORNADO F0                    
## [25] THUNDERSTORM WINDS LIGHTNING   THUNDERSTORM WINDS/HAIL       
## [27] HEAT                           WIND                          
## [29] LIGHTING                       HEAVY RAINS                   
## [31] LIGHTNING AND HEAVY RAIN       FUNNEL                        
## [33] WALL CLOUD                     FLOODING                      
## [35] THUNDERSTORM WINDS HAIL        FLOOD                         
## [37] COLD                           HEAVY RAIN/LIGHTNING          
## [39] FLASH FLOODING/THUNDERSTORM WI WALL CLOUD/FUNNEL CLOUD       
## [41] THUNDERSTORM                   WATERSPOUT                    
## [43] EXTREME COLD                   HAIL 1.75)                    
## [45] LIGHTNING/HEAVY RAIN           HIGH WIND                     
## [47] BLIZZARD                       BLIZZARD WEATHER              
## [49] WIND CHILL                     BREAKUP FLOODING              
## 985 Levels:    HIGH SURF ADVISORY  COASTAL FLOOD ... WND

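As you can see, the event types are ambiguous, and we need to deal with that first. For example, SNOW and SNOW/ICE actually describe the same kind of event, and names such as THUNDERSTORM WINDS/HAIL and THUNDERSTORM WINDS HAIL differ only in punctuation. So let's try to reduce the ambiguity in the event types.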

# Checking the number of event types before processing the event types

length(unique(raw_data$EVTYPE))
## [1] 985
event_types <- toupper(raw_data$EVTYPE)

# replace all punctuation characters with blank spaces
event_types <- gsub("[[:blank:][:punct:]+]", " ", event_types)
event_types<-gsub("^ ","",event_types)
length(unique(event_types))
## [1] 867
event_types <- trimws(event_types)          # remove leading & trailing spaces
event_types <- gsub(" +", " ", event_types) # collapse runs of spaces into a single space


# updating the data frame
raw_data$EVTYPE <- event_types

As you can see, we now have fewer event types than before: the number dropped from 985 to 867. Further cleaning of the data is possible, but for now let's move on to the analysis.
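To see how this kind of normalisation merges near-duplicate labels, here is a small sketch on a handful of event names taken from the listing above. The helper `clean_event_type` is hypothetical and is not the exact code used in this report; it upper-cases the text, turns each run of punctuation or whitespace into one space, and trims the ends.

```r
# Hypothetical helper for illustration; not part of the original analysis.
clean_event_type <- function(x) {
  x <- toupper(x)
  # Replace each run of punctuation or whitespace with a single space.
  x <- gsub("[[:punct:][:space:]]+", " ", x)
  # Drop leading and trailing spaces.
  trimws(x)
}

sample_types <- c("Tstm Wind",
                  " THUNDERSTORM WINDS/HAIL",
                  "THUNDERSTORM WINDS HAIL",
                  "HAIL 1.75)")

cleaned <- clean_event_type(sample_types)
print(cleaned)
print(length(unique(cleaned)))  # number of distinct labels after cleaning
```

The two THUNDERSTORM WINDS variants collapse into one label, while abbreviations such as TSTM WIND would still need an explicit mapping to be merged with THUNDERSTORM WIND.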

Casualties from these events

Let us try to analyse the total number of casualties caused by these events from the year 1950 to 2011.

raw_data$casualties <- raw_data$FATALITIES + raw_data$INJURIES
casualties <- aggregate(raw_data$casualties, by=list(Event=raw_data$EVTYPE), FUN=sum)

# Taking out the top 10 Events causing maximum number of casualties.

Top_Casualties <- head(casualties[order(casualties$x, decreasing = TRUE), ], 10)
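The sum-then-rank step can be checked by hand on a tiny data frame. The sketch below uses made-up numbers (the `toy` data frame is not taken from the storm database) and the formula interface of `aggregate`, which keeps the column name `casualties` instead of the default `x`.

```r
# Made-up records, only to illustrate the aggregation step.
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT"),
  FATALITIES = c(1, 2, 0, 3),
  INJURIES   = c(14, 0, 6, 1)
)

# Casualties are fatalities and injuries combined.
toy$casualties <- toy$FATALITIES + toy$INJURIES

# Sum casualties per event type.
totals <- aggregate(casualties ~ EVTYPE, data = toy, FUN = sum)

# Keep the two event types with the highest totals.
top <- head(totals[order(totals$casualties, decreasing = TRUE), ], 2)
print(top)
```

Here TORNADO totals 21 casualties and HEAT totals 4, so those two rows are kept and FLOOD (2) is dropped.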

Plotting the bar plot

library(ggplot2)
p <-ggplot(Top_Casualties, aes(Event, x))
p +geom_bar(stat = "identity") + xlab("Events") + ylab("Casualties") +coord_flip() + theme(axis.text.x = element_text(face="bold", color="#993333", 
                           size=6, angle=45),
          axis.text.y = element_text(face="bold", color="#993333", 
                           size=6, angle=45)
          )

Economic Impact from these events

Cleaning and transforming the Property and Crop data

raw_data$PROP_US <- 0
raw_data$CROP_US <- 0


raw_data$PROP_US <- ifelse(raw_data$PROPDMGEXP %in% "H" | raw_data$PROPDMGEXP %in% "h",
                               raw_data$PROPDMG*0.0000001, raw_data$PROP_US)
raw_data$CROP_US <- ifelse(raw_data$CROPDMGEXP %in%"H"|  raw_data$CROPDMGEXP %in%"h",
                               raw_data$CROPDMG*0.0000001, raw_data$CROP_US)

raw_data$PROP_US <- ifelse(raw_data$PROPDMGEXP %in% "K"|  raw_data$PROPDMGEXP %in% "k",
                               raw_data$PROPDMG*0.000001,  raw_data$PROP_US)
raw_data$CROP_US <- ifelse(raw_data$CROPDMGEXP %in% "K"|  raw_data$CROPDMGEXP %in% "k",
                               raw_data$CROPDMG*0.000001,  raw_data$CROP_US)

raw_data$PROP_US <- ifelse(raw_data$PROPDMGEXP %in% "M"|  raw_data$PROPDMGEXP %in% "m", 
                               raw_data$PROPDMG*0.001,     raw_data$PROP_US)
raw_data$CROP_US <- ifelse(raw_data$CROPDMGEXP %in% "M"|  raw_data$CROPDMGEXP %in% "m", 
                               raw_data$CROPDMG*0.001,     raw_data$CROP_US)

raw_data$PROP_US <- ifelse(raw_data$PROPDMGEXP %in% "B"|  raw_data$PROPDMGEXP %in% "b", 
                               raw_data$PROPDMG*1,         raw_data$PROP_US)
raw_data$CROP_US <- ifelse(raw_data$CROPDMGEXP %in% "B"|  raw_data$CROPDMGEXP %in% "b",
                               raw_data$CROPDMG*1,         raw_data$CROP_US)
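The same scaling can also be written as a lookup table, which keeps all multipliers in one place. This is a sketch under the same assumptions as the code above: H, K, M, and B stand for hundreds, thousands, millions, and billions, results are expressed in billions, and any other exponent code contributes 0. The names `to_billions`, `dmg`, and `exp_code` are made up for this example.

```r
# Multipliers that convert a damage figure to billions,
# mirroring the ifelse() chain: H = hundreds, K = thousands,
# M = millions, B = billions.
to_billions <- c(H = 1e-7, K = 1e-6, M = 1e-3, B = 1)

# Made-up sample values; "" and "?" stand for unrecognised codes.
dmg      <- c(25, 2.5, 1.2, 10, 5)
exp_code <- c("K", "m", "B", "", "?")

# Look up the multiplier for each code, ignoring case.
mult <- unname(to_billions[toupper(exp_code)])

# Codes that are not in the table yield NA; treat them as 0,
# which matches the initial value of PROP_US and CROP_US.
mult[is.na(mult)] <- 0

damage_bn <- dmg * mult
print(damage_bn)
```

Applied to the real data, `dmg` and `exp_code` would be replaced by the PROPDMG and PROPDMGEXP columns (or CROPDMG and CROPDMGEXP).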

Property Damage (In Billions)

Property_damage <- aggregate(raw_data$PROP_US, by=list(Event=raw_data$EVTYPE), FUN=sum)

# Taking out the top 10 events causing the most property damage.

Top_Property_Damages <- head(Property_damage[order(Property_damage$x, decreasing = TRUE), ], 10)

Plotting the bar plot

library(ggplot2)
p <-ggplot(Top_Property_Damages, aes(Event, x))
p +geom_bar(stat = "identity") + xlab("Events") + ylab("Property Damages") +coord_flip() + theme(axis.text.x = element_text(face="bold", color="#993333", 
                           size=6, angle=45),
          axis.text.y = element_text(face="bold", color="#993333", 
                           size=6, angle=45)
          )

The graph shows that the highest property damage was caused by FLOOD.

Crop Damage (In Billions)

Crop_damage <- aggregate(raw_data$CROP_US, by=list(Event=raw_data$EVTYPE), FUN=sum)

# Taking out the top 10 events causing the most crop damage.

Top_Crop_Damages <- head(Crop_damage[order(Crop_damage$x, decreasing = TRUE), ], 10)

Plotting the bar plot

library(ggplot2)
p <-ggplot(Top_Crop_Damages, aes(Event, x))
p +geom_bar(stat = "identity") + xlab("Events") + ylab("Crop Damages") +coord_flip() + theme(axis.text.x = element_text(face="bold", color="#993333", 
                           size=6, angle=45),
          axis.text.y = element_text(face="bold", color="#993333", 
                           size=6, angle=45)
          )

The graph shows that the highest crop damage was caused by DROUGHT.

Conclusion:

From the above analysis we can conclude that:

  1. Tornadoes caused the highest number of casualties, with a total of 96979 people killed or injured.
  2. Floods caused the highest property damage, with a total of 144.7 billion US dollars.
  3. Drought caused the highest crop damage, with a total of 14.0 billion US dollars.