Synopsis

In this report I discuss the economic and public health problems caused by storms and other severe weather events that occurred in the United States of America between 1950 and 2011. The data used for this analysis was obtained from the U.S. National Oceanic and Atmospheric Administration’s (NOAA) storm database. I found that tornadoes caused the most deaths and the most injuries in the US. From the economic point of view there are two conclusions: most crop damage was caused by drought, while most property damage was caused by floods.

To reproduce this analysis, the packages “dplyr” and “ggplot2” are needed; “plyr” is optional and is not loaded in the code shown here.

Loading the Dataset.

After downloading and unzipping the file from the given website, the first step is to read the CSV and store it in a variable.

data <- read.csv("repdata_data_StormData.csv") 
str(data)

## 'data.frame':    902297 obs. of  37 variables:
##  $ STATE__   : num  1 1 1 1 1 1 1 1 1 1 ...
##  $ BGN_DATE  : chr  "4/18/1950 0:00:00" "4/18/1950 0:00:00" "2/20/1951 0:00:00" "6/8/1951 0:00:00" ...
##  $ BGN_TIME  : chr  "0130" "0145" "1600" "0900" ...
##  $ TIME_ZONE : chr  "CST" "CST" "CST" "CST" ...
##  $ COUNTY    : num  97 3 57 89 43 77 9 123 125 57 ...
##  $ COUNTYNAME: chr  "MOBILE" "BALDWIN" "FAYETTE" "MADISON" ...
##  $ STATE     : chr  "AL" "AL" "AL" "AL" ...
##  $ EVTYPE    : chr  "TORNADO" "TORNADO" "TORNADO" "TORNADO" ...
##  $ BGN_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ BGN_AZI   : chr  "" "" "" "" ...
##  $ BGN_LOCATI: chr  "" "" "" "" ...
##  $ END_DATE  : chr  "" "" "" "" ...
##  $ END_TIME  : chr  "" "" "" "" ...
##  $ COUNTY_END: num  0 0 0 0 0 0 0 0 0 0 ...
##  $ COUNTYENDN: logi  NA NA NA NA NA NA ...
##  $ END_RANGE : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ END_AZI   : chr  "" "" "" "" ...
##  $ END_LOCATI: chr  "" "" "" "" ...
##  $ LENGTH    : num  14 2 0.1 0 0 1.5 1.5 0 3.3 2.3 ...
##  $ WIDTH     : num  100 150 123 100 150 177 33 33 100 100 ...
##  $ F         : int  3 2 2 2 2 2 2 1 3 3 ...
##  $ MAG       : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ FATALITIES: num  0 0 0 0 0 0 0 0 1 0 ...
##  $ INJURIES  : num  15 0 2 2 2 6 1 0 14 0 ...
##  $ PROPDMG   : num  25 2.5 25 2.5 2.5 2.5 2.5 2.5 25 25 ...
##  $ PROPDMGEXP: chr  "K" "K" "K" "K" ...
##  $ CROPDMG   : num  0 0 0 0 0 0 0 0 0 0 ...
##  $ CROPDMGEXP: chr  "" "" "" "" ...
##  $ WFO       : chr  "" "" "" "" ...
##  $ STATEOFFIC: chr  "" "" "" "" ...
##  $ ZONENAMES : chr  "" "" "" "" ...
##  $ LATITUDE  : num  3040 3042 3340 3458 3412 ...
##  $ LONGITUDE : num  8812 8755 8742 8626 8642 ...
##  $ LATITUDE_E: num  3051 0 0 0 0 ...
##  $ LONGITUDE_: num  8806 0 0 0 0 ...
##  $ REMARKS   : chr  "" "" "" "" ...
##  $ REFNUM    : num  1 2 3 4 5 6 7 8 9 10 ...
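As a side note on the loading step, `read.csv()` can usually read a bzip2-compressed CSV directly, because it opens the path through `file()`, which detects the compression. The sketch below shows this on a tiny toy table written to a temporary file; the toy data and the temporary file are assumptions for illustration and are not part of the storm dataset.

```r
# Toy table, invented for illustration only
toy <- data.frame(EVTYPE = c("TORNADO", "FLOOD"), FATALITIES = c(1, 0))

# Write it to a bzip2-compressed temporary file
tmp <- tempfile(fileext = ".csv.bz2")
con <- bzfile(tmp, open = "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# Read the compressed file back without unzipping it first
toy_back <- read.csv(tmp)
str(toy_back)
```

If the downloaded archive is in this format, the manual unzipping step could be skipped.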

Data Processing

After exploring the dataset I realised that only a few columns were relevant for the analysis of economic and public health impact, so I created a subset containing only those columns: FATALITIES and INJURIES for public health; PROPDMG, PROPDMGEXP, CROPDMG and CROPDMGEXP for the damage caused to property and crops; and EVTYPE, which identifies the type of weather event. For this I used the “dplyr” package.

library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

dat1 <- data %>% 
  select(EVTYPE,FATALITIES,INJURIES,PROPDMG,PROPDMGEXP,CROPDMG,CROPDMGEXP)

The economic damage columns PROPDMGEXP and CROPDMGEXP contain exponent codes, which need to be replaced with their corresponding numeric values.

unique(dat1$PROPDMGEXP)

##  [1] "K" "M" ""  "B" "m" "+" "0" "5" "6" "?" "4" "2" "3" "h" "7" "H" "-" "1" "8"

unique(dat1$CROPDMGEXP)

## [1] ""  "M" "K" "m" "B" "?" "0" "k" "2"

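So we need to convert these codes to powers of ten, ignoring case:
“K”: 1,000 (exponent 3)
“M”: 1,000,000 (exponent 6)
“B”: 1,000,000,000 (exponent 9)
Anything else (“”, “?”, “+”, “-”, “H”, digits) gets exponent 0 in the code below, so the recorded PROPDMG or CROPDMG value is kept as it is (multiplied by 10^0 = 1).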

dat1$PROPDMGEXP <- as.character(dat1$PROPDMGEXP)
dat1$PROPDMGEXP[is.na(dat1$PROPDMGEXP)] <- "0" # NAs get exponent 0
dat1$PROPDMGEXP[!grepl("K|M|B",dat1$PROPDMGEXP,ignore.case = TRUE)] <- "0" # anything other than K, M or B gets exponent 0
dat1$PROPDMGEXP[grep("K",dat1$PROPDMGEXP,ignore.case = TRUE)] <- "3"
dat1$PROPDMGEXP[grep("M",dat1$PROPDMGEXP,ignore.case = TRUE)] <- "6"
dat1$PROPDMGEXP[grep("B",dat1$PROPDMGEXP,ignore.case = TRUE)] <- "9"
dat1$PROPDMGEXP <- as.numeric(dat1$PROPDMGEXP)
dat1$TOTALPROPDMG <- dat1$PROPDMG*10^dat1$PROPDMGEXP

dat1$CROPDMGEXP <- as.character(dat1$CROPDMGEXP)
dat1$CROPDMGEXP[is.na(dat1$CROPDMGEXP)] <- "0" # NAs get exponent 0
dat1$CROPDMGEXP[!grepl("K|M|B",dat1$CROPDMGEXP,ignore.case = TRUE)] <- "0" # anything other than K, M or B gets exponent 0
dat1$CROPDMGEXP[grep("K",dat1$CROPDMGEXP,ignore.case = TRUE)] <- "3"
dat1$CROPDMGEXP[grep("M",dat1$CROPDMGEXP,ignore.case = TRUE)] <- "6"
dat1$CROPDMGEXP[grep("B",dat1$CROPDMGEXP,ignore.case = TRUE)] <- "9"
dat1$CROPDMGEXP <- as.numeric(dat1$CROPDMGEXP)
dat1$TOTALCROPDMG <- dat1$CROPDMG*10^dat1$CROPDMGEXP
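
The two blocks above repeat the same steps for property and crops. A minimal sketch of the same mapping as a reusable lookup is shown here; `exp_to_power` is a hypothetical helper name, not something used elsewhere in this report, and the sample codes and amounts are invented for illustration.

```r
# Hypothetical helper: map a damage-exponent code to a power of ten.
# K, M and B (any case) map to 3, 6 and 9; everything else, including
# "", "?", digits and NA, maps to 0, as in the processing code above.
exp_to_power <- function(code) {
  codes <- toupper(as.character(code))
  power <- c(3, 6, 9)[match(codes, c("K", "M", "B"))]
  power[is.na(power)] <- 0
  power
}

# Toy values, invented for illustration only
codes  <- c("K", "m", "B", "", "?", "5", NA)
amount <- c(25, 2, 1, 10, 10, 10, 10)
total  <- amount * 10^exp_to_power(codes)
print(total)
```

With a helper like this, each total could be computed in one line, for example `dat1$PROPDMG * 10^exp_to_power(dat1$PROPDMGEXP)` applied to the original character column.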

Now we finally have the correct dataset that can be used for analysis.

Results

Damage to public health and safety.

To find the harm caused by natural events to overall public health, we look at the number of FATALITIES and INJURIES for each EVTYPE.

fatality <- aggregate(FATALITIES ~ EVTYPE,data = dat1,FUN = "sum") #deaths from each EVTYPE
top5f <- fatality[order(-fatality$FATALITIES),][1:5, ] #top5 EVTYPE deaths

injury <- aggregate(INJURIES ~ EVTYPE,data = dat1,FUN = "sum") #injury from each EVTYPE
top5i <- injury[order(-injury$INJURIES),][1:5, ] #top5 EVTYPE injuries
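
As an aside, the same "sum per event type, then keep the largest" step can be sketched with dplyr, which is already loaded for the subsetting. The toy data frame and the name `top_fatal` below are assumptions for illustration, and the sketch keeps the top 2 rows only because the toy data is so small.

```r
library(dplyr)

# Toy data, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "FLOOD", "TORNADO", "HEAT", "FLOOD", "HAIL"),
  FATALITIES = c(3, 1, 2, 4, 0, 0)
)

# Sum fatalities per event type, sort descending, keep the first rows
top_fatal <- toy %>%
  group_by(EVTYPE) %>%
  summarise(FATALITIES = sum(FATALITIES)) %>%
  arrange(desc(FATALITIES)) %>%
  head(2)

print(top_fatal)
```

The `aggregate()` and `order()` version used in this report gives the same kind of result; the pipeline form is only a matter of preference.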

Now that we have the top 5 event types by number of deaths and by number of injuries, we plot the data. For the plotting I will be using the “ggplot2” package.

library(ggplot2)
ggplot(data=top5f, aes(x=reorder(EVTYPE, FATALITIES), y=FATALITIES, color=EVTYPE)) +
  geom_bar(stat="identity", fill="white") +
  xlab("Event Type") +
  ylab("Total number of fatalities") +
  ggtitle("Top 5 Events with most deaths") +
  coord_flip()

Now let’s make the same plot for injuries by EVTYPE so we can compare and draw a conclusion.

ggplot(data=top5i, aes(x=reorder(EVTYPE, INJURIES), y=INJURIES, color=EVTYPE)) +
  geom_bar(stat="identity", fill="white") +
  xlab("Event type") +
  ylab("Total number of injuries") +
  ggtitle("Top 5 events with most injuries") +
  coord_flip()

In both of these bar plots we observe that tornadoes are the main cause of both deaths and injuries.
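
A possible variation on the plotting code, shown only as a sketch: `geom_col()` is ggplot2's shorthand for `geom_bar(stat = "identity")`, and mapping `fill` instead of `color` gives solid bars rather than outlined ones. The toy totals below are invented for illustration and are not results from the storm data.

```r
library(ggplot2)

# Toy totals, invented for illustration only
toy <- data.frame(
  EVTYPE     = c("TORNADO", "HEAT", "FLOOD"),
  FATALITIES = c(5, 4, 1)
)

# Horizontal bar chart ordered by the plotted value
p <- ggplot(toy, aes(x = reorder(EVTYPE, FATALITIES), y = FATALITIES, fill = EVTYPE)) +
  geom_col() +
  coord_flip() +
  labs(x = "Event type", y = "Total number of fatalities")
```

Calling `print(p)` draws the chart. The plots in this report use the outlined style, so this is a stylistic alternative rather than a correction.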

Damage to the overall economy

To find the damage caused to the US economy, we look at crop damage and property damage.

cropdmg <- aggregate(TOTALCROPDMG ~ EVTYPE,data = dat1,FUN = "sum") #crop damage 
top5c <- cropdmg[order(-cropdmg$TOTALCROPDMG),][1:5,]

propmg <- aggregate(TOTALPROPDMG ~ EVTYPE,data = dat1,FUN = "sum")
top5p <- propmg[order(-propmg$TOTALPROPDMG),][1:5,]

Now let’s plot the data for further findings.

ggplot(data=top5c, aes(x=reorder(EVTYPE, TOTALCROPDMG), y=TOTALCROPDMG, color=EVTYPE)) + geom_bar(stat="identity",fill="white") + xlab("Event Type") +  ylab("Total damage to crops")

ggplot(data=top5p, aes(x=reorder(EVTYPE, TOTALPROPDMG), y=TOTALPROPDMG, color=EVTYPE)) + geom_bar(stat="identity",fill="white") + xlab("Event Type") +  ylab("Total damage to property") +coord_flip()

This gives an interesting result: the main cause of crop damage is drought, whereas the main cause of property damage is the opposite kind of event, floods.

This brings us to the end of my analysis.

Damage caused by Natural Disaster in US between 1950-2011

Raghav Pradhan

2024-08-04